Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
4479
Ian F. Akyildiz Raghupathy Sivakumar Eylem Ekici Jaudelice Cavalcante de Oliveira Janise McNair (Eds.)
NETWORKING 2007 Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet 6th International IFIP-TC6 Networking Conference Atlanta, GA, USA, May 14-18, 2007 Proceedings
Volume Editors

Ian F. Akyildiz, Raghupathy Sivakumar
Georgia Institute of Technology
School of Electrical and Computer Engineering
Atlanta, GA 30332, USA
E-mail: {ian, siva}@ece.gatech.edu

Eylem Ekici
The Ohio State University
Department of Electrical and Computer Engineering
Columbus, OH 43210, USA
E-mail: [email protected]

Jaudelice Cavalcante de Oliveira
Drexel University, ECE Department, Bossone 312
Philadelphia, PA 19104-2875, USA
E-mail: [email protected]

Janise McNair
University of Florida
Department of Electrical and Computer Engineering
Gainesville, FL 32611, USA
E-mail: [email protected]
Library of Congress Control Number: 2007926314
CR Subject Classification (1998): C.2, C.4, H.4, D.2, J.2, J.1, K.6, K.4
LNCS Sublibrary: SL 5 – Computer Communication Networks and Telecommunications
ISSN: 0302-9743
ISBN-10: 3-540-72605-5 Springer Berlin Heidelberg New York
ISBN-13: 978-3-540-72605-0 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © 2007 IFIP International Federation for Information Processing, Hofstraße 3, 2361 Laxenburg, Austria Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12066407 06/3180 543210
Preface
General Chairs’ Message

It is our great pleasure to welcome you to the Sixth IFIP Networking conference, held in Atlanta, May 14–18, 2007. This conference, the sixth of a planned series of annual meetings with a highly selective and highly competitive technical program, has been established to serve as the premier forum for research on all aspects of networking and communication. The conference covers multiple networking paradigms, such as wireless and wired networks, ad hoc networks, sensor networks, mesh networks, and optical networks.

An exciting and high-quality technical program was put together by Eylem Ekici, Jaudelice C. de Oliveira and Janise McNair with the help of an exceptional panel of experts who served on the Technical Program Committee. We were very fortunate to have Eylem, Jau and Janise as the Program Chairs to launch the IFIP Networking 2007 conference on a path of academic excellence and practical relevance; we express our sincere thanks to them. Together, they put together a high-quality program that educated attendees and at the same time inspired spirited discussions. A poster session and social events provided other opportunities for discussions, debates and exchange of information amongst conference participants.

Chuanyi Ji, the Local Arrangements Chair, did a great job overseeing all aspects of the meeting planning and organization. Our very special thanks go to Chuanyi for making sure it all happened just right. Faramarz Fekri, our Financial Chair, did an outstanding job of keeping the books orderly. Our sincere and special thanks go to Faramarz for his exceptional handling of the finances, and for going over and beyond his call of duty to take care of other aspects of the conference organization as well. Benny Bing did an excellent job of collecting the camera-ready papers from the authors and making sure the conference proceedings were published on time. The co-operation and support of Springer in this matter is also greatly appreciated.

We thank Cordai Farrar, who provided essential administrative and logistical assistance to Chuanyi and Faramarz in the planning and implementation of the conference. We also would like to acknowledge the efforts and contributions of Linda Jiang Xie and Ozgur B. Akan, who coordinated and implemented publicity for the conference as well as the management of the Web site. We express appreciation to Ed Knightly, Sherman (Xuemin) Shen and Josep Sole Pareta for raising sponsorship for the conference, to Mehmet Can Vuran and Vehbi Cagri Gungor, the Registration Chairs, to Giacomo Morabito, the Tutorials Chair, and to Giovanni Pau and George Kormentzas, the Workshop Chairs, for all their invaluable contributions. We also thank the members of the Steering Committee
for providing much needed support and guidance during the organization of the conference. Finally, we thank all our industry sponsors who provided financial assistance toward the conference organization.

We look forward to an exciting week of sharing technical ideas and visions with colleagues from around the world. We hope that you will find the conference to be very engaging and fruitful. We thank you for attending the conference and being a part of this very important event.

May 2007
Ian F. Akyildiz
Raghupathy Sivakumar
IFIP Networking 2007 General Co-chairs
Technical Program Chairs’ Message
Welcome to Networking 2007!

Networking 2007 was the sixth event in a series of International Conferences on Networking sponsored by the IFIP Technical Committee on Communication Systems (TC 6). Previous events were held in Paris (France) in 2000, Pisa (Italy) in 2002, Athens (Greece) in 2004, Waterloo (Canada) in 2005, and Coimbra (Portugal) in 2006.

Networking 2007 brought together active and proficient members of the networking community, from both academia and industry, thus contributing to scientific, strategic, and practical advances in the broad and fast-evolving field of communications. The conference comprised highly technical sessions organized thematically, keynote talks, tutorials offered by experts, as well as workshops and panel discussions on topical themes. Plenary sessions with keynote speeches opened the daily sessions, covering the three main tracks of the conference.

The Networking 2007 call for papers attracted 440 submissions from 40 different countries in Asia, Australia, Europe, North America, and South America. These were subject to thorough review by the Program Committee members and additional reviewers. A one-week discussion phase followed the regular review deadline, in which TPC leaders summarized the comments made by the other reviewers and made a final recommendation. A high-quality selection of 96 full papers and 27 posters, organized into 24 regular sessions and 1 poster session, made up the Networking 2007 main technical program, which covered next-generation networks: content distribution, quality of service, topology design, routing, buffer management, optical networks, TCP, security, network measurement; ad hoc and sensor networks: connectivity and coverage, scheduling and resource allocation, mobility and location, routing, and key management; and wireless networks: mesh networks, mobility, TCP, MAC performance, scheduling and resource allocation.

The technical program was complemented by three keynote speeches: “Looking into the Future: Grand Challenges for Wireless Networks,” by Ness Shroff (Purdue University); “Key Technologies and Architectures for Next-Generation Networks,” by Krishan Sabnani (Bell Labs); and “Urban Mesh Networks: Coming Soon to a City Near You,” by Ed Knightly (Rice University).

In addition to the main technical program, the day preceding the conference was dedicated to three excellent tutorials: “An Introduction to Network Coding,” by Muriel Médard (MIT); “Cognitive Radio Networks,” by Ian F. Akyildiz (Georgia Institute of Technology); and “WiMAX: Technology for Broadband Wireless Internet and QoS Driven Routing: Theoretical and Experimental Considerations,” by Shailender Timiri and Shantidev Mohanty (Intel Corporation).
The final day of Networking 2007 was dedicated to two one-day workshops on the following topics: “Security and Privacy in Mobile and Wireless Networking” and “Challenges for Next-Generation Networks.”

We would like to express our appreciation of the efforts of the many people who made Networking 2007 a successful event. To the authors: we are most grateful for the countless hours you spent preparing your submitted papers. To the Program Committee and all associated reviewers: we thank you for your hard work, your promptness in submitting reviews, and your willingness and patience to accept extra assignments. To the other Executive Committee members: we thank you for your hard work and dedication to the organizational issues surrounding Networking 2007. Last, but by no means least, we thank our sponsors and supporting institutions, all the people who helped us at the Georgia Institute of Technology, and especially all the volunteers.

March 2007
Eylem Ekici
Janise McNair
Jaudelice C. de Oliveira
Organization
Executive Committee

General Co-chairs: Ian F. Akyildiz (Georgia Institute of Technology, USA), Raghupathy Sivakumar (Georgia Institute of Technology, USA)
Technical Program Co-chairs: Eylem Ekici (Ohio State University, USA), Janise McNair (University of Florida, USA), Jaudelice C. de Oliveira (Drexel University, USA)
Tutorial Chair: Giacomo Morabito (University of Catania, Italy)
Workshop Co-chairs: Giovanni Pau (University of California at Los Angeles, USA), George Kormentzas (University of Aegean, Greece)
Publicity Co-chairs: Ozgur Akan (Middle East Technical University, Turkey), Jiang (Linda) Xie (University of North Carolina-Charlotte, USA)
Sponsor Co-chairs: Edward Knightly (Rice University, USA), Sherman Shen (University of Waterloo, Canada), Josep Sole Pareta (Universitat Politècnica de Catalunya, Spain)
Publication Chair: Benny Bing (Georgia Institute of Technology, USA)
Finance Chair: Faramarz Fekri (Georgia Institute of Technology, USA)
Registration Co-chairs: Mehmet Can Vuran (Georgia Institute of Technology, USA), Vehbi Cagri Gungor (Georgia Institute of Technology, USA)
Local Arrangements Chair: Chuanyi Ji (Georgia Institute of Technology, USA)
Steering Committee

Harry Perros (North Carolina State University, USA)
Augusto Casaca (IST/INESC, Portugal)
Guy Omidyar (TC6 WG6.8 Chair, USA)
Guy Pujolle (University of Paris 6, France)
Ioannis Stavrakakis (University of Athens, Greece)
Otto Spaniol (RWTH-Aachen University, Germany)
Technical Program Committee

Alhussein Abouzeid (Rensselaer Polytechnic Institute, USA)
Nael Abu-Ghazaleh (State University of New York at Binghamton, USA)
Rui Aguiar (Universidade de Aveiro, Portugal)
Ozgur Akan (Middle East Technical University, Turkey)
Kemal Akkaya (Southern Illinois University, USA)
Fatih Alagoz (Bogazici University, Turkey)
Kevin Almeroth (University of California at Santa Barbara, USA)
Mostafa Ammar (Georgia Institute of Technology, USA)
Tricha Anjali (Illinois Institute of Technology, USA)
Regina Araujo (Federal University of São Carlos, Brazil)
Lichun Bao (University of California at Irvine, USA)
Stefano Basagni (Northeastern University, USA)
Christian Bettstetter (University of Klagenfurt, Austria)
Raheem Beyah (Georgia State University, USA)
Andrea Bianco (Politecnico di Torino, Italy)
Semih Bilgen (Middle East Technical University (ODTU), Turkey)
Chris Blondia (University of Antwerp, Belgium)
Fernando Boavida (Coimbra University, Portugal)
Azzedine Boukerche (University of Ottawa, Canada)
Torsten Braun (University of Bern, Switzerland)
Milind Buddhikot (Bell Labs, Lucent Technologies, USA)
Wojciech Burakowski (Warsaw University of Technology, Poland)
Tracy Camp (Colorado School of Mines, USA)
Guohong Cao (Pennsylvania State University, USA)
Antonio Capone (Politecnico di Milano, Italy)
Matteo Cesana (Politecnico di Milano, Italy)
Chih-Yung Chang (Tamkang University, Taiwan)
Han-Chieh Chao (National Ilan University, Taiwan)
Edgar Chavez (Universidad Michoacana, Mexico)
Ling-Jyh Chen (Academia Sinica, Taiwan)
Yuh-Shyan Chen (National Taipei University, Taiwan)
Sunghyun Choi (Seoul National University, Korea)
Marco Conti (IIT-CNR, Italy)
Jun-Hong Cui (University of Connecticut, USA)
Jaudelice C. de Oliveira (Drexel University, USA)
Michel Diaz (LAAS, France)
Constantinos Dovrolis (Georgia Institute of Technology, USA)
Falko Dressler (University of Erlangen, Germany)
Stephan Eidenbenz (Los Alamos National Laboratory, USA)
Eylem Ekici (Ohio State University, USA)
Hesham El-Rewini (Southern Methodist University, USA)
Tamer ElBatt (San Diego Research Center, Inc., USA)
Mourad Elhadef (University of Ottawa, Canada)
Ehab Elmallah (University of Alberta, Canada)
Ozgur Ercetin (Sabanci University, Turkey)
Laura Feeney (Swedish Institute of Computer Science, Sweden)
Wu-chi Feng (Portland State University, USA)
Joe Finney (Lancaster University, UK)
Eric Fleury (Insa de Lyon / INRIA, France)
Mario Freire (University of Beira Interior, Portugal)
Andrea Fumagalli (University of Texas at Dallas, USA)
Laura Galluccio (University of Catania, Italy)
Sebastià Galmés (Universitat de les Illes Balears, Spain)
Javier Gomez (National University of Mexico, Mexico)
Isabelle Guerin Lassous (INRIA, France)
Ozgur Gurbuz (Sabanci University, Turkey)
Eren Gurses (Norwegian University of Science and Technology, Norway)
Guenter Haring (Universität Wien, Austria)
Paul Havinga (University of Twente, The Netherlands)
Ahmed Helmy (University of Southern California, USA)
Raquel Hill (Indiana University, USA)
Xiaoyan Hong (University of Alabama, USA)
Hung-Yun Hsieh (National Taiwan University, Taiwan)
Frank Huebner (AT&T Labs, USA)
David Hutchison (Lancaster University, UK)
Muhammad Jaseemuddin (Ryerson University, Canada)
Vana Kalogeraki (University of California at Riverside, USA)
Holger Karl (University of Paderborn, Germany)
Can Emre Koksal (Ohio State University, USA)
Kimon Kontovasilis (NCSR Demokritos, Greece)
Turgay Korkmaz (University of Texas at San Antonio, USA)
Sastri Kota (Harris Corporation, USA)
Yevgeni Koucheryavy (Tampere University of Technology, Finland)
Evangelos Kranakis (Carleton University, Canada)
Bhaskar Krishnamachari (University of Southern California, USA)
Srikanth Krishnamurthy (University of California at Riverside, USA)
Santosh Kumar (University of Memphis, USA)
Thomas Kunz (Carleton University, Canada)
Miguel Labrador (University of South Florida, USA)
Bu Sung Lee (Nanyang Technological University, Singapore)
Chang-Gun Lee (Seoul National University, Korea)
Sung-Ju Lee (HP Labs, USA)
Kenji Leibnitz (Osaka University, Japan)
Albert Levi (Sabanci University, Turkey)
Baochun Li (University of Toronto, Canada)
Jie Li (University of Tsukuba, Japan)
Wei Li (University of Toledo, USA)
Ben Liang (University of Toronto, Canada)
Leszek Lilien (Western Michigan University, USA)
Yung-Hsiang Lu (Purdue University, USA)
Athina Markopoulou (University of California, Irvine, USA)
Jose Marzo (Universitat de Girona, Spain)
Xavier Masip-Bruin (Technical University of Catalonia (UPC), Spain)
Mustafa Matalgah (University of Mississippi, USA)
Ibrahim Matta (Boston University, USA)
Ketan Mayer-Patel (University of North Carolina, USA)
Janise McNair (University of Florida, USA)
Sirisha Medidi (Washington State University, USA)
Abdelhamid Mellouk (University Paris XII, France)
Paulo Mendes (DoCoMo Eurolabs, Germany)
Michael Menth (University of Würzburg, Germany)
Jelena Misic (University of Manitoba, Canada)
Shantidev Mohanty (Intel Corporation, USA)
Edmundo Monteiro (University of Coimbra, Portugal)
Giacomo Morabito (University of Catania, Italy)
Nidal Nasser (University of Guelph, Canada)
Srihari Nelakuditi (University of South Carolina, USA)
Ioanis Nikolaidis (University of Alberta, Canada)
Stephan Olariu (Old Dominion University, USA)
Joao Orvalho (IPC, Portugal)
Giovanni Pau (University of California at Los Angeles, USA)
Harry Perros (North Carolina State University, USA)
Chiara Petrioli (University of Rome “La Sapienza,” Italy)
Niki Pissinou (Florida International University, USA)
Thomas Plagemann (University of Oslo, Norway)
Radha Poovendran (University of Washington, USA)
Konstantinos Psounis (University of Southern California, USA)
Ramon Puigjaner (UIB, Spain)
Guy Pujolle (University of Paris 6, France)
Hazem Refai (Oklahoma University, USA)
Reza Rejaie (University of Oregon, USA)
George Rouskas (North Carolina State University, USA)
Sergio Sánchez-López (Technical University of Catalonia, Spain)
Paolo Santi (CNR, Italy)
Caterina Scoglio (Kansas State University, USA)
Subhabrata Sen (AT&T Labs - Research, USA)
Harish Sethu (Drexel University, USA)
Ness Shroff (Purdue University, USA)
Jorge Silva (University of Coimbra, Portugal)
Harry Skianis (National Centre for Scientific Research ‘Demokritos’, Greece)
Josep Sole Pareta (UPC, Spain)
Otto Spaniol (Aachen University of Technology, Germany)
Burkhard Stiller (University of Zürich and ETH Zürich, Switzerland)
Ivan Stojmenovic (University of Ottawa, Canada)
Aaron Striegel (University of Notre Dame, USA)
Tony Sun (University of California at Los Angeles, USA)
Karthikeyan Sundaresan (Georgia Institute of Technology, USA)
Violet Syrotiuk (Arizona State University, USA)
Vassilis Tsaoussidis (Demokritos University, Greece)
Tuna Tugcu (Bogazici University, Turkey)
Damla Turgut (University of Central Florida, USA)
Piet Van Mieghem (Delft University of Technology, The Netherlands)
Ramanuja Vedantham (Georgia Institute of Technology, USA)
Wenye Wang (NC State University, USA)
Xudong Wang (Kiyon, Inc., USA)
Steven Weber (Drexel University, USA)
Cedric Westphal (Nokia, USA)
Lars Wolf (Technical University of Braunschweig, IBR, Germany)
Yang Xiao (University of Alabama, USA)
Jiang (Linda) Xie (University of North Carolina at Charlotte, USA)
Dong Xuan (The Ohio State University, USA)
Guoliang Xue (Arizona State University, USA)
Boon Sain Yeo (Wavex Technologies, Singapore)
Mohamed Younis (University of Maryland Baltimore County, USA)
Moustafa Youssef (University of Maryland, USA)
Honghai Zhang (Lucent Technologies, USA)
Lixia Zhang (University of California at Los Angeles, USA)
Yongbing Zhang (University of Tsukuba, Japan)
Rong Zheng (University of Houston, USA)
Fang Zhu (Verizon, USA)
Hao Zhu (Florida International University, USA)
Taieb Znati (University of Pittsburgh, USA)
Additional Reviewers Ibrahim Abualhaol Helmut Adam Rui Aguiar Markus Anwander Jesus Arango Baris Atakan
Jeroen Avonts Abdel Aziz Leonardo Badia Seung Baek Mario Barbera Pere Barlet-Ros
Sujoy Basu Véronique Baudin Osama Bazan Luca Becchetti Alper Bereketli Johan Bergs
Antoni Bibiloni Ali Bicak Steven Borbash Bart Braem Tiago Camilo Luca Campelli Davide Careglio Maxweel Carmo Eduardo Cerqueira Coskun Cetinkaya Sriram Chellappan Chao Chen Kuan-Ta Chen Yuh-Shyan Chen Maggie Cheng Roman Chertov Harshal Chhaya Philip Chimento Girish Chiruvolu Youngkyu Choi Cherita Corbett Vedat Coskun Paul Coulton Sérgio Crisóstomo Sukrit Dasgupta Peter De Cleyn Isabel Dietrich Stylianos Dimitriou Lun Dong Yu Dong Khalil Drira Roman Dunaytsev Christopher Edwards Hesham Elsayed Thierry Ernst Amir Esmailpour Ramon Fabregat Silvia Farraposo Wissam Fawaz Xin Fei Anja Feldmann Pep-Lluis Ferrer Stefano Ferretti Ilario Filippini Rosario Firrincieli
Pedro M. F. M. Ferreira Alexandre Fonte James Gadze Wilfried Gansterer María L. García-Osma Ghayathri Garudapuram Damianos Gavalas Samik Ghosh Paolo Giaccone Silvia Giordano Oscar Gonzalez de Dios Jorge Granjal Vehbi Cagri Gungor Vaibhav Gupta Burak Gurdag Michael Gyarmati Thomas Hacker Omar Hammouri Seon Yeong Han Yuning He Mohamed Hefeeda Helmut Hlavacs Philipp Hofmann Carl Hu Cunqing Hua Pai-Han Huang Mehmet Isik Mikel Izal Mandana Jafarian Gentian Jakllari Jakub Jakubiak Jiwoong Jeong Yusheng Ji Tao Jiang Yingxin Jiang Changhee Joo Teodor Jové Carlos Juiz Eleni Kamateri Arzad Kherani Anna Kim Seongkwan Kim Vinay Kolar Jiejun Kong Ibrahim Korpeoglu
Li Lao Anis Laouiti Lap Kong Law Gabriela Leao Mauro Leoncini Nicolas Letor Chengzhi Li Hyuk Lim Ching-Ju Lin Yuan Lin Anders Lindgren Changlei Liu David Liu Jun Liu Jun Liu Ke Liu Yunhuai Liu Mahdi Lotfinezhad Ngok-Wah Ma Dhia Mahjoub Lefteris Mamatas Devu Manikantan Shila Cesar Marcondes Gustavo Marfia Eva Marín-Tordera Ruediger Martin Saverio Mascolo Marco Mellia Michela Meo Lyudmila Mihaylova Fabio Milan Dragan Milic Ingrid Moerman Dmitri Moltchanov Edmundo Monteiro Paolo Monti Xenia Mountrouidou Bala Natarajan Konstantinos Oikonomou Sema Oktug Evgeny Osipov Philippe Owezarski Claudio Palazzi Alessandro Panconesi
Michael Pascoe Stefan Penz Fabio Picconi Jonathan Pitts Ioannis Psaras Chunming Qiao Sundaram Rajagopalan Thierry Rakotoarivelo Rabie Ramadan Ulrich Reimers Mauricio Resende Jorge Rodriguez Sanchez Utz Roedig Sylwia Romaszko Kevin Ross Dario Rossi Emilia Rosti Nararat Ruangchaijatupon Christos Samaras Kamil Sarac Lambros Sarakis Emre Sayin Matthias Scheidegger Udo Schilcher Francesco Scoto Karim Seddik Alexandro Sentinelli Bartomeu Serra Ghalib Shah Manolis Sifalakis Paulo Simoes
Paul Smith Aaron So Rute Sofia Fernando Solano Donado Christoph Sommer Hanhee Song Hui Song Kathleen Spaey Vladimir Stankovic Thomas Staub Daniel Stutzbach Weilian Su Kyoungwon Suh Oguz Sunay Violet Syrotiuk Lei Tang Saurabh Tewari George Theodorakopoulos Masoomeh Torabzadeh Laurent Toutain Ageliki Tsioliaridou Tuna Tugcu Li-Ping Tung Alexander Tyrrell Suleyman Uludag Anna Urra Nicolas Van Wambeke Giacomo Verticale Michael Voorhaen Serdar Vural Mehmet Vuran
Markus Waelchli Gerald Wagenknecht Jiong Wang Lan Wang Alicia Washington Song Wei Wei Wei Michael Welzl Ralf Wienzek Damon Wischik Rita Wouhaybi Kai Wu Yan Wu Ariton Xhafa Yufeng Xin Kaiqi Xiong Lisong Xu Guang Yang Sichao Yang Yi Yang Marcelo Yannuzzi Sakir Yucel Ahmed Zahran Qian Zhang Wensheng Zhang Jing Zhao Qunwei Zheng Shengli Zhou Yuanyuan Zhou Zhong Zhou Andre Zimmermann Michele Zorzi
Table of Contents
Ad Hoc and Sensor Networks

AHS - Connectivity and Coverage

On the Resilient Overlay Topology Formation in Multi-hop Wireless Networks ..... 1
Fei Xing and Wenye Wang

Placing and Maintaining a Core Node in Wireless Ad Hoc Sensor Networks ..... 13
Amit Dvir and Michael Segal

Flooding Speed in Wireless Multihop Networks with Randomized Beamforming ..... 25
Vasil Mizorov, Jörg Widmer, Robert Vilzmann, and Petri Mähönen

AHS - Scheduling and Resource Allocation

Power Amplifier Characteristic-Aware Energy-Efficient Transmission Strategy ..... 37
Kwanghun Han, Youngkyu Choi, Sunghyun Choi, and Youngwoo Kwon

Energy Efficient Throughput Optimization in Multi-hop Wireless Networks ..... 49
Dan Xu and Xin Liu

Election Based Hybrid Channel Access ..... 61
Xin Wang and J.J. Garcia-Luna-Aceves

Asynchronous Data Aggregation for Real-Time Monitoring in Sensor Networks ..... 73
Jie Feng, Derek L. Eager, and Dwight Makaroff

AHS - Mobility and Location Awareness

A Novel Agent-Based User-Network Communication Model in Wireless Sensor Networks ..... 85
Sang-Sik Kim and Ae-Soon Park
Realistic Mobility and Propagation Framework for MANET Simulations ..... 97
Mesut Güneş, Martin Wenig, and Alexander Zimmermann

Localization for Large-Scale Underwater Sensor Networks ..... 108
Zhong Zhou, Jun-Hong Cui, and Shengli Zhou

Location-Unaware Sensing Range Assignment in Sensor Networks ..... 120
Ossama Younis, Srinivasan Ramasubramanian, and Marwan Krunz

AHS - Routing I

A Distributed Energy-Efficient Topology Control Routing for Mobile Wireless Sensor Networks ..... 132
Yan Ren, Bo Wang, Sidong Zhang, and Hongke Zhang

Integrated Clustering and Routing Strategies for Large Scale Sensor Networks ..... 143
Ataul Bari, Arunita Jaekel, and Subir Bandyopadhyay

On-Demand Routing in Disrupted Environments ..... 155
Jay Boice, J.J. Garcia-Luna-Aceves, and Katia Obraczka

Delivery Guarantees in Predictable Disruption Tolerant Networks ..... 167
Jean-Marc François and Guy Leduc

AHS - Routing II

PWave: A Multi-source Multi-sink Anycast Routing Framework for Wireless Sensor Networks ..... 179
Haiyang Liu, Zhi-Li Zhang, Jaideep Srivastava, and Victor Firoiu

Simple Models for the Performance Evaluation of a Class of Two-Hop Relay Protocols ..... 191
Ahmad Al Hanbali, Arzad A. Kherani, and Philippe Nain

Maximum Energy Welfare Routing in Wireless Sensor Networks ..... 203
Changsoo Ok, Prasenjit Mitra, Seokcheon Lee, and Soundar Kumara

Analysis of Location Privacy/Energy Efficiency Tradeoffs in Wireless Sensor Networks ..... 215
Sergio Armenia, Giacomo Morabito, and Sergio Palazzo

Efficient Error Recovery Using Network Coding in Underwater Sensor Networks ..... 227
Zheng Guo, Bing Wang, and Jun-Hong Cui
AHS - Key Management

Key Predistribution Schemes for Sensor Networks for Continuous Deployment Scenario ..... 239
Abdülhakim Ünlü, Önsel Armağan, Albert Levi, Erkay Savaş, and Özgür Erçetin

Using Auxiliary Sensors for Pairwise Key Establishment in WSN ..... 251
Qi Dong and Donggang Liu

Privacy-Aware Multi-Context RFID Infrastructure Using Public Key Cryptography ..... 263
Selim Kaya, Erkay Savaş, Albert Levi, and Özgür Erçetin

Wireless Networks

WiNet - Mesh Networks

Minimum Cost Configuration of Relay and Channel Infrastructure in Heterogeneous Wireless Mesh Networks ..... 275
Aaron So and Ben Liang

Optimization Models for the Radio Planning of Wireless Mesh Networks ..... 287
Edoardo Amaldi, Antonio Capone, Matteo Cesana, and Federico Malucelli

Interference-Aware Multicasting in Wireless Mesh Networks ..... 299
Sudheendra Murthy, Abhishek Goswami, and Arunabha Sen

Characterizing the Capacity Gain of Stream Control Scheduling in MIMO Wireless Mesh Networks ..... 311
Yue Wang, Dah Ming Chiu, and John C.S. Lui

WiNet - Mobility

AP and MN-Centric Mobility Prediction: A Comparative Study Based on Wireless Traces ..... 322
Jean-Marc François and Guy Leduc

A Flexible and Distributed Home Agent Architecture for Mobile IPv6-Based Networks ..... 333
Albert Cabellos-Aparicio and Jordi Domingo-Pascual

Using PANA for Mobile IPv6 Bootstrapping ..... 345
Julien Bournelle, Jean-Michel Combes, Maryline Laurent-Maknavicius, and Sondes Larafa
Detecting 802.11 Wireless Hosts from Remote Passive Observations ..... 356
Valeria Baiamonte, Konstantina Papagiannaki, and Gianluca Iannaccone

WiNet - TCP

A Scheme for Enhancing TCP Fairness and Throughput in IEEE 802.11 WLANs ..... 368
Eun-Jong Lee, Hyung-Taig Lim, Seung-Joon Seok, and Chul-Hee Kang

TCP NJ+: Packet Loss Differentiated Transmission Mechanism Robust to High BER Environments ..... 380
Jungrae Kim, Jahwan Koo, and Hyunseung Choo

TCP WestwoodVT: A Novel Technique for Discriminating the Cause of Packet Loss in Wireless Networks ..... 391
Jahwan Koo, Sung-Gon Mun, and Hyunseung Choo

Modeling TCP in a Multi-rate Multi-user CDMA System ..... 403
Majid Ghaderi, Ashwin Sridharan, Hui Zang, Don Towsley, and Rene Cruz

WiNet - MAC Performance

IEEE 802.11b Cooperative Protocols: A Performance Study ..... 415
Niraj Agarwal, Divya ChanneGowda, Lakshmi Narasimhan Kannan, Marco Tacca, and Andrea Fumagalli

It Is Better to Give Than to Receive – Implications of Cooperation in a Real Environment ..... 427
Thanasis Korakis, Zhifeng Tao, Salik Makda, Boris Gitelman, and Shivendra Panwar

Modeling Approximations for an IEEE 802.11 WLAN Under Poisson MAC-Level Arrivals ..... 439
Ioannis Koukoutsidis and Vasilios A. Siris

Performance and Equilibrium Analysis of Heterogeneous IEEE 802.11 Based WLANs ..... 450
Hao Zhu

Exploring a New Approach to Collision Avoidance in Wireless Ad Hoc Networks ..... 462
Jun Peng and Liang Cheng
WiNet - Scheduling and Resource Allocation

Video Rate Adaptation and Scheduling in Multi-rate Wireless Networks ..... 475
Sourav Pal, Sumantra R. Kundu, Amin R. Mazloom, and Sajal K. Das

On Scheduling and Interference Coordination Policies for Multicell OFDMA Networks ..... 488
Gábor Fodor

Distributed Uplink Scheduling in CDMA Networks ..... 500
Ashwin Sridharan, Ramesh Subbaraman, and Roch Guérin

Resource Allocation in DVB-RCS Satellite Systems ..... 511
André-Luc Beylot, Riadh Dhaou, and Cédric Baudoin

WiNet - Miscellaneous

Enhanced Downlink Capacity in UMTS Supported by Direct Mobile-to-Mobile Data Transfer ..... 522
Larissa Popova, Thomas Herpel, and Wolfgang Koch

Impact of Technology Overlap in Next-Generation Wireless Heterogeneous Systems ..... 535
Ahmed Zahran, Ben Liang, and Aladdin Saleh

An On-Line Measurement-Based Admission Control for VBR Video Traffic in Wireless Multimedia Home Networks ..... 546
Yi-Hsien Tseng, Eric Hsiao-Kuang Wu, and Gen-Huey Chen

On Event Signal Reconstruction in Wireless Sensor Networks ..... 558
Barış Atakan and Özgür B. Akan

Next Generation Internet

NGI - Content Distribution

Peer-Assisted On-Demand Streaming of Stored Media Using BitTorrent-Like Protocols ..... 570
Niklas Carlsson and Derek L. Eager

Multiple Identities in BitTorrent Networks ..... 582
Jin Sun, Anirban Banerjee, and Michalis Faloutsos

Graph Based Modeling of P2P Streaming Systems ..... 594
Damiano Carra, Renato Lo Cigno, and Ernst W. Biersack
Modeling Seed Scheduling Strategies in BitTorrent ..... 606
Pietro Michiardi, Krishna Ramachandran, and Biplab Sikdar

NGI - QoS I

Streaming Performance in Multiple-Tree-Based Overlays ..... 617
György Dán, Viktória Fodor, and Ilias Chatzidrossos

Path Selection Using Available Bandwidth Estimation in Overlay-Based Video Streaming ..... 628
Manish Jain and Constantine Dovrolis

Fundamental Tradeoffs in Distributed Algorithms for Rate Adaptive Multimedia Streams ..... 640
Vilas Veeraraghavan and Steven Weber

Optimal Policies for Playing Buffered Media Streams ..... 652
Steven Weber

NGI - QoS II

Non-parametric and Self-tuning Measurement-Based Admission Control ..... 664
Thomas Michael Bohnert, Edmundo Monteiro, Yevgeni Koucheryavy, and Dmitri Moltchanov

Optimal Rate Allocation in Overlay Content Distribution ..... 678
Chuan Wu and Baochun Li

SLA Adaptation for Service Overlay Networks ..... 691
Con Tran, Zbigniew Dziong, and Michał Pióro

NGI - Topology Design

Virtual Private Network to Spanning Tree Mapping ..... 703
Yannick Brehon, Daniel Kofman, and Augusto Casaca

Optimal Topology Design for Overlay Networks ..... 714
Mina Kamel, Caterina Scoglio, and Todd Easton

Construction of a Proxy-Based Overlay Skeleton Tree for Large-Scale Real-Time Group Communications ..... 726
Jun Guo and Sanjay Jha

Increasing the Coverage of a Cooperative Internet Topology Discovery Algorithm ..... 738
Benoit Donnet, Bradley Huffaker, Timur Friedman, and K.C. Claffy
NGI - Routing I

Robust IP Link Costs for Multilayer Resilience ..... 749
Michael Menth, Matthias Hartmann, and Rüdiger Martin

Integer SPM: Intelligent Path Selection for Resilient Networks ..... 762
Rüdiger Martin, Michael Menth, and Ulrich Spörlein

Beyond Centrality - Classifying Topological Significance Using Backup Efficiency and Alternative Paths ..... 774
Yuval Shavitt and Yaron Singer

Incorporating Protection Mechanisms in the Dynamic Multi-layer Routing Schemes ..... 786
Anna Urra, Eusebi Calle Ortega, Jose L. Marzo, and Pere Vila

NGI - Routing II

Accelerated Packet Placement Architecture for Parallel Shared Memory Routers ..... 797
Brad Matthews, Itamar Elhanany, and Vahid Tabatabaee

RSVP-TE Extensions to Provide Guarantee of Service to MPLS ..... 808
Francisco J. Rodríguez-Pérez, José Luis González-Sánchez, and Alfonso Gazo-Cervero

An Adaptive Management Approach to Resolving Policy Conflicts ..... 820
Selma Yilmaz and Ibrahim Matta

Reinforcement Learning-Based Load Shared Sequential Routing ..... 832
Fariba Heidari, Shie Mannor, and Lorne G. Mason

NGI - Buffer Management

An Adaptive Neuron AQM for a Stable Internet ..... 844
Jinsheng Sun and Moshe Zukerman

Light-Weight Control of Non-responsive Traffic with Low Buffer Requirements ..... 855
Venkatesh Ramaswamy, Leticia Cuéllar, Stephan Eidenbenz, Nicolas Hengartner, Christoph Ambühl, and Birgitta Weber

The Effects of Fairness in Buffer Sizing ..... 867
Mei Wang and Yashar Ganjali

Time to Buffer Overflow in an MMPP Queue ..... 879
Andrzej Chydzinski
NGI - Miscellaneous

Fundamental Effects of Clustering on the Euclidean Embedding of Internet Hosts ..... 890
Sanghwan Lee, Zhi-Li Zhang, Sambit Sahu, Debanjan Saha, and Mukund Srinivasan

A Multihoming Based IPv4/IPv6 Transition Approach ..... 902
Lizhong Xie, Jun Bi, and Jianping Wu

Offline and Online Network Traffic Characterization ..... 912
Su Zhang and Mary K. Vernon

Catching IP Traffic Burstiness with a Lightweight Generator ..... 924
Chloé Rolland, Julien Ridoux, and Bruno Baynat

NGI - Optical Networks

Importance of the Maturity of Photonic Component Industry on the Business Prospects of Optical Access Networks: A Techno-Economic Analysis ..... 935
Dimitris Varoutas, Thomas Kamalakis, Dimitris Katsianis, Thomas Sphicopoulos, and Thomas Monath

The Token Based Switch: Per-Packet Access Authorisation to Optical Shortcuts ..... 945
Mihai-Lucian Cristea, Leon Gommans, Li Xu, and Herbert Bos

Online Multicasting in WDM Networks with Shared Light Splitter Bank ..... 958
Yuzhen Liu and Weifa Liang

Evaluation of Optical Burst-Switching as a Multiservice Environment ..... 970
Pablo Jesús Argibay-Losada, Andres Suárez-González, Manuel Fernández-Veiga, Raúl Rodríguez-Rubio, and Cándido López-García

NGI - TCP

The TCP Minimum RTO Revisited ..... 981
Ioannis Psaras and Vassilis Tsaoussidis

Improving XCP to Achieve Max-Min Fair Bandwidth Allocation ..... 992
Lei Zan and Xiaowei Yang

TCP Libra: Exploring RTT-Fairness for TCP ..... 1005
Gustavo Marfia, Claudio Palazzi, Giovanni Pau, Mario Gerla, M.Y. Sanadidi, and Marco Roccetti
Interactions of Intelligent Route Control with TCP Congestion Control ..... 1014
Ruomei Gao, Dana Blair, Constantine Dovrolis, Monique Morrow, and Ellen Zegura
NGI - Security

Fast and Scalable Classification of Structured Data in the Network ..... 1026
Sumantra R. Kundu, Sourav Pal, Christoph L. Schuba, and Sajal K. Das

An Efficient and Secure Event Signature (EASES) Protocol for Peer-to-Peer Massively Multiplayer Online Games ..... 1037
Mo-Che Chan, Shun-Yun Hu, and Jehn-Ruey Jiang

Unified Defense Against DDoS Attacks ..... 1047
M. Muthuprasanna, G. Manimaran, and Z. Wang

Integrity-Aware Bandwidth Guarding Approach in P2P Networks ..... 1060
Wen-Hui Chiang, Ling-Jyh Chen, and Cheng-Fu Chou
NGI - Network Measurement

Measuring Bandwidth Signatures of Network Paths ..... 1072
Mradula Neginhal, Khaled Harfoush, and Harry Perros

A Non-cooperative Active Measurement Technique for Estimating the Average and Variance of the One-Way Delay ..... 1084
Antonio A. de A. Rocha, Rosa M.M. Leão, and Edmundo de Souza e Silva

The P2P War: Someone Is Monitoring Your Activities! ..... 1096
Anirban Banerjee, Michalis Faloutsos, and Laxmi Bhuyan

On-Line Predictive Load Shedding for Network Monitoring ..... 1108
Pere Barlet-Ros, Diego Amores-López, Gianluca Iannaccone, Josep Sanjuàs-Cuxart, and Josep Solé-Pareta

On the Schedulability of Measurement Conflict in Overlay Networks ..... 1120
Mohammad Fraiwan and G. Manimaran
Poster Session

SEA-LABS: A Wireless Sensor Network for Sustained Monitoring of Coral Reefs ..... 1132
Matt Bromage, Katia Obraczka, and Donald Potts
Capacity-Fairness Performance of an Ad Hoc IEEE 802.11 WLAN with Noncooperative Stations ..... 1136
Jerzy Konorski

Multi-rate Support for Network-Wide Broadcasting in MANETs ..... 1140
Tolga Numanoglu, Wendi Heinzelman, and Bulent Tavli

BRD: Bilateral Route Discovery in Mobile Ad Hoc Networks ..... 1145
Rendong Bai and Mukesh Singhal

Correction, Generalisation and Validation of the “Max-Min d-Cluster Formation Heuristic” ..... 1149
Alexandre Delye de Clauzade de Mazieux, Michel Marot, and Monique Becker

Analytical Performance Evaluation of Distributed Multicast Algorithms for Directional Communications in WANETs ..... 1153
Song Guo, Oliver Yang, and Victor Leung

Beyond Proportional Fair: Designing Robust Wireless Schedulers ..... 1157
Soshant Bali, Sridhar Machiraju, and Hui Zang

A Voluntary Relaying MAC Protocol for Multi-rate Wireless Local Area Networks ..... 1161
Jaeeun Na, Yeonkwon Jeong, and Joongsoo Ma

Throughput Analysis Considering Capture Effect in IEEE 802.11 Networks ..... 1165
Ge Xiaohu, Yan Dong, and Zhu Yaoting

Performance Improvement of IEEE 802.15.4 Beacon-Enabled WPAN with Superframe Adaptation Via Traffic Indication ..... 1169
Zeeshan Hameed Mir, Changsu Suh, and Young-Bae Ko

Analysis of WLAN Traffic in the Wild ..... 1173
Caleb Phillips and Suresh Singh

Enhanced Rate Adaptation Schemes with Collision Awareness ..... 1179
Seongkwan Kim, Sunghyun Choi, Daji Qiao, and Jongseok Kim

A Study of Performance Improvement in EAP ..... 1183
Eun-Chul Cha and Hyoung-Kee Choi

Characterization of Ultra Wideband Channel in Data Centers ..... 1187
N. Udar, K. Kant, R. Viswanathan, and D. Cheung

Evaluating Internal BGP Networks from the Data Plane ..... 1192
Feng Zhao, Xicheng Lu, Baosheng Wang, and Peidong Zhu
Performance of a Partially Shared Buffer with Correlated Arrivals ..... 1196
Dieter Fiems, Bart Steyaert, and Herwig Bruneel

Filter-Based RFD: Can We Stabilize Network Without Sacrificing Reachability Too Much? ..... 1200
Ke Zhang and S. Felix Wu

Network Access in a Diversified Internet ..... 1204
Michael Wilson, Fred Kuhns, and Jonathan Turner

Outburst: Efficient Overlay Content Distribution with Rateless Codes ..... 1208
Chuan Wu and Baochun Li

Adaptive Window-Tuning Algorithm for Efficient Bandwidth Allocation on EPON ..... 1217
Sangho Lee, Tae-Jin Lee, Min Young Chung, and Hyunseung Choo

Optical Burst Control Algorithm for Reducing the Effect of Congestion Reaction Delay ..... 1221
Myungsik Yoo and Junho Hwang

Incremental Provision of QoS Discarding Non-feasible End-to-End Paths ..... 1225
Alfonso Gazo-Cervero, José Luis González-Sánchez, and Francisco J. Rodríguez-Pérez

Enhancing Guaranteed Delays with Network Coding ..... 1229
Ali Mahmino, Jérôme Lacan, and Christian Fraboul

LPD Based Route Optimization in Nested Mobile Network ..... 1233
Jungwook Song, Heemin Kim, Sunyoung Han, and Bokgyu Joo

PIBUS: A Network Memory-Based Peer-to-Peer IO Buffering Service ..... 1237
Yiming Zhang, Dongsheng Li, Rui Chu, Nong Xiao, and Xicheng Lu

A Subgradient Optimization Approach to Inter-domain Routing in IP/MPLS Networks ..... 1241
Artur Tomaszewski, Michał Pióro, Mateusz Dzida, Mariusz Mycek, and Michał Zagożdżon

Cost-Based Approach to Access Selection and Vertical Handover Decision in Multi-access Networks ..... 1245
Fanchun Jin, Hyeong-Ah Choi, Jae-Hoon Kim, Se-Hyun Oh, Jong-Tae Ihm, JungKyo Sohn, and Hyeong In Choi

Author Index ..... 1249
On the Resilient Overlay Topology Formation in Multi-hop Wireless Networks Fei Xing and Wenye Wang Department of Electrical and Computer Engineering North Carolina State University, Raleigh, NC 27695, USA
[email protected],
[email protected]
Abstract. In this paper, we study the problem of how to design overlay topologies in multi-hop wireless networks such that the overlays achieve perfect resilience, in terms of all cooperative nodes included but misbehaving nodes excluded, and preserve the k-connectivity with high probability. To address this problem, we propose a new distributed topology control protocol called PROACtive. By using PROACtive, every node pro-actively selects its cooperative adjacent nodes as neighbors by mutually exchanging neighbor request and reply messages. As a result, the union of all neighbor sets forms a resilient overlay for a given network. Our analysis finds that the PROACtive protocol is lightweight, with a message complexity of only O(m), where m is the number of links in the original network. Our simulation results validate the effectiveness of PROACtive and show that the overlays generated by our protocol preserve the k-connectivity with high probability (> 90%) and a low false positive ratio (< 5%).
1 Introduction
Multi-hop wireless networks, especially mobile ad hoc networks, are more vulnerable to failures than wired networks due to nodal mobility and error-prone wireless channels. In addition, node misbehaviors, such as selfishness by refusing to forward packets of other nodes and maliciousness by launching Denial of Service (DoS) attacks, can also cause failures. For example, two DoS attacks, Jellyfish and Blackhole, were shown in [1] to have a network partitioning effect which degrades network performance severely. In [2], a stochastic analysis of the node isolation problem also showed that misbehaving nodes may substantially damage the connectivity of mobile ad hoc networks. Since misbehaving nodes may not provide connectivity to other adjacent nodes, existing routing protocols cannot cope with the failures caused by misbehaving nodes, which leaves the design of resilient multi-hop wireless networks an open and challenging problem in the presence of misbehaving nodes.

To enhance the resilience to misbehaving nodes, efforts have been made using different approaches. Two techniques called watchdog and pathrater were proposed in [3] to identify misbehaving nodes and avoid them in routes. A credit-based system called Sprite was proposed in [4] to stimulate cooperation among
selfish nodes. In [5], a secure ad hoc routing protocol called Ariadne was presented to prevent attacks from tampering with routing control messages by using symmetric cryptographic primitives. The multi-path routing scheme in [6] introduced redundancy to avoid single-path failures caused by node failures or node misbehaviors. Nevertheless, the previous schemes are "passive" toward misbehaving nodes: even if a misbehaving node can be detected, it is very difficult to prevent it from being selected as an intermediate relay for all paths.

In this paper, we study the problem of how to design overlay topologies in multi-hop wireless networks such that the overlays achieve perfect resilience, in terms of all cooperative nodes included but misbehaving nodes excluded, and preserve the k-connectivity with high probability (w.h.p.). Through the formation of resilient overlays, routing and data transfer can be performed upon cooperative platforms. Our contributions are mainly on two aspects.

1. A new distributed and localized protocol called PROACtive is proposed to generate a resilient overlay for a given network. By using PROACtive, every node pro-actively selects only cooperative adjacent nodes as its neighbors, which results in the exclusion of misbehaving nodes from the overlay.

2. The PROACtive protocol is shown to be lightweight, with a message complexity of only O(m), where m is the number of links in the original network, and the overlays preserve k-connectivity w.h.p. (> 90%) with a low false positive ratio (< 5%).

Note that our objective distinguishes itself from existing topology control works [7,8,9], which usually focused on minimizing energy consumption while keeping networks connected. For example, in [7], the K-Neigh protocol, based on distance estimation, was proposed to preserve the connectivity of static multi-hop networks, with efficient power consumption, by selecting the K closest neighbors for each node. Our approach also differs from the existing resilience-enhancing works in that we employ the topology control technique in the PROACtive protocol to connect cooperative nodes dynamically by mutual neighbor selections.

The remainder of this paper is organized as follows. In Section 2, we formulate the problem. In Section 3, we describe the details of the PROACtive protocol. In Section 4, we validate our approach by simulations, followed by conclusions in Section 5.
2 Problem Statement
In this section, we describe the system model and formulate the perfect resilient overlay generation (PROG) problem.

2.1 System and Threat Model
In this paper, we denote multi-hop wireless networks by M(N), where N is the set of nodes. All nodes are assumed to be distributed independently and
uniformly on a two-dimensional plane, and they use omni-directional antennas with the same transmission radius r. A pair of nodes u and v are called adjacent if the distance between them, denoted by d(u, v), is no greater than r. When d(u, v) > r, u and v may communicate via multiple intermediate hops. It is known that the geometric random graph (GRG) [10], denoted by G(N, r), is a graph in which N vertices are independently and uniformly distributed in a metric space, with edges existing between any pair of vertices u and v if and only if d(u, v) ≤ r. Thus, we use the GRG to model the underlying topologies of multi-hop wireless networks.

In order to identify the types of misbehaviors our work targets, we loosely classify the different types of misbehaviors in a multi-hop wireless network below, though the classification is not intended to be comprehensive. (I) Nodes participate in routing but not in data forwarding, like Jellyfish and Blackhole; (II) nodes do not cooperate in forwarding control or data packets for others, like selfish nodes; (III) compromised nodes, though appearing to be legitimate, malfunction maliciously; (IV) malicious attacker nodes generate DoS traffic or signals, impersonate legitimate nodes, or tamper with routing messages. Our research focuses on misbehavior (I), while our approach is also applicable to misbehaviors (II) and (III). Our approach, however, does not address misbehavior (IV), which requires suitable authentication and privacy mechanisms. Further, colluding attacks are out of the scope of this work.

2.2 Problem Formulation
The objective of this work is to enhance the resilience of multi-hop wireless networks against node misbehaviors. As mentioned in Section 1, misbehaving nodes may undermine network connectivity and network performance. Here we take an example to look at the effect of misbehaving nodes on path reliability. For a path with h relay hops, let the probability of any relay node failing (due to node mobility or energy depletion) be P_f; then the path reliability, denoted by R_P, can be represented by R_P = (1 − P_f)^h. If any relay node may also misbehave to disrupt communications, the representation of R_P becomes R_P = (1 − P_f − P_m)^h, where P_m is the probability of any node misbehaving. We can then easily show that R_P is significantly decreased by the route disruption effect of misbehaving relays, and the negative impact becomes more pronounced as the number of hops h increases. Therefore, "excluding" misbehaving nodes has become an important issue in the design of resilient networks.

For a multi-hop wireless network in the presence of misbehaving nodes, we call a connected subnet consisting of all and only its cooperative nodes a perfect resilient overlay (PRO). If the routing and data transfer operations are conducted only on the induced PRO, then the communication between cooperative nodes is guaranteed to be resilient to misbehaving nodes. Here we formulate our problem as the perfect resilient overlay generation (PROG) problem, as follows:
Definition 1. PROG Problem: Given a connected multi-hop wireless network M and a connectivity requirement k, generate a perfect resilient overlay M− such that M− is k-connected with high probability.

To solve the PROG problem, we propose a distributed and localized protocol called PROACtive, by which every node can pro-actively select cooperative adjacent nodes as its neighbors, as described in detail next.
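To make the route-disruption effect in the reliability argument above concrete, the following is a small numerical sketch of R_P = (1 − P_f)^h versus R_P = (1 − P_f − P_m)^h; the values of P_f, P_m, and h are illustrative choices, not figures from the paper.

```python
# Illustrative computation of path reliability over h relay hops:
#   R_P = (1 - Pf)^h        when relays can only fail, and
#   R_P = (1 - Pf - Pm)^h   when relays may also misbehave.
def path_reliability(h, p_fail, p_misbehave=0.0):
    """Probability that every one of the h relays neither fails nor misbehaves."""
    return (1.0 - p_fail - p_misbehave) ** h

for h in (2, 5, 10):
    r_fail = path_reliability(h, p_fail=0.05)
    r_both = path_reliability(h, p_fail=0.05, p_misbehave=0.10)
    print(f"h={h:2d}: failures only {r_fail:.3f}, with misbehavior {r_both:.3f}")
```

With these sample values the gap grows from 0.90 vs. 0.72 at h = 2 to 0.60 vs. 0.20 at h = 10, matching the observation that longer paths suffer disproportionately from misbehaving relays.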
3
PROACtive Protocol Design
In this section, we propose the PROACtive protocol as the solution to the PROG problem.

3.1 Basic Idea
In the PROACtive protocol, each node is assumed to be able to know whether its adjacent nodes forward packets for other nodes. For example, if wireless cards operate in promiscuous mode, a node can use the Watchdog method [3] to tell if its next-hop node drops packets instead of forwarding them. In this way, a node u can quantitatively measure the cooperativity (a term borrowed from biochemistry) of its adjacent nodes, which indicates the likelihood that a node performs normal network operations. Based on the obtained cooperativities, a node u can select neighbors by sending soliciting messages called Neighbor Requests (Ngbr-Rqst) to its adjacent nodes with high cooperativities. Once u receives an acknowledgment message called Neighbor Reply (Ngbr-Rply) from one of its adjacent nodes, say node v, then u knows that v has agreed to accept u as v's neighbor, so u can add v into its neighbor set. By this mutual neighbor selection process, a cooperative node can easily build a cooperative neighborhood, while a misbehaving node can hardly acquire any neighbors. As a result, the union of the cooperative neighbor sets generates a perfect resilient overlay which excludes misbehaving nodes.

To satisfy the k-connectivity constraint, we refer to results shown in a few recent works, which reveal the probabilistic relations between the connectivity κ(G) and the minimum degree δ(G) of a graph G. It was proved in [10] (Theorem 1.1) that if N is sufficiently large, the GRG G(N, r), obtained by adding links between nodes in the order of increasing length, becomes k-connected w.h.p. at the instant when it achieves a minimum degree of k. In [11] (Theorem 3), it was shown that for a GRG G(N, r),

Pr(κ(G) = k) ≈ Pr(δ(G) ≥ k)
(1)
holds if N ≫ 1 and Pr(δ(G) ≥ k) ≈ 1. The result was further verified by extensive simulations in [11], [12], and [13]. Moreover, it was shown in [2] that to achieve k-connectivity in a multi-hop wireless network where misbehaving nodes are present, a necessary condition is that each node has at least k cooperative neighbors. This result implies that for a network M, if θ(M) denotes the minimum number of cooperative neighbors over the nodes of M, then

Pr(κ(M) = k) ≈ Pr(θ(M) ≥ k)    (2)
holds for sufficiently large system size N. Therefore, in our protocol, each (cooperative) node should maintain at least k cooperative neighbors so that k-connectivity is achievable in the overlay w.h.p. In the next section, we describe the details of our approach.

3.2 PROACtive Protocol Details
As described briefly in Section 3.1, the essential idea of the PROACtive protocol is to build up cooperative neighbor sets through mutual neighbor selection via Ngbr-Rqst and Ngbr-Rply message exchanges. In this section, we provide the detailed procedures for building up cooperative neighbor sets. For clarity, we denote the adjacent nodes and cooperative neighbors of a node u by Adj(u) and Ngbr(u), respectively, and denote the cooperativity of a node u by c(u).

We first discuss the procedure of querying potential neighbors. Once a node u knows the cooperativities of its adjacent nodes, u selects the node v with the highest cooperativity from the set Adj(u) as a potential neighbor, provided v is not already in Ngbr(u). Then u sends a Ngbr-Rqst message to v, indicating that u intends to add v to its neighbor set. If u receives a Ngbr-Rply message from v within a timeout, u adds v into Ngbr(u); otherwise, u queries the adjacent node with the next highest cooperativity. Node u continues the inquiries until k Ngbr-Rply messages from different adjacent nodes have been received, which guarantees that u has at least k neighbors. This procedure is summarized by Algorithm 1.

Algorithm 1. Procedure of querying potential neighbors
Input: k, node u, and Adj(u)
1: Initiate Ngbr(u) := ∅, create a temporary set Temp(u) := ∅, create a counter numRplyRcvd := 0
2: ∀v ∈ Adj(u), measure c(v)
3: while (numRplyRcvd < k AND Temp(u) ≠ Adj(u)) do
4:   Select v if c(v) = max{c(w) : ∀w ∈ Adj(u) − Temp(u)}
5:   Send Ngbr-Rqst to v
6:   Temp(u) := Temp(u) + v
7:   if (Receive Ngbr-Rply from v) then
8:     Ngbr(u) := Ngbr(u) + v
9:     numRplyRcvd := numRplyRcvd + 1
10:  end if
11: end while
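The following Python sketch mirrors Algorithm 1; the function send_ngbr_rqst is an assumed abstraction of the Ngbr-Rqst/Ngbr-Rply exchange (returning True iff a Ngbr-Rply arrives before the timeout), and all names are ours, not part of the protocol specification:

```python
# Minimal sketch of Algorithm 1, assuming cooperativities c(v) are already
# measured and send_ngbr_rqst(u, v) abstracts the message exchange.

def query_potential_neighbors(u, adj, coop, k, send_ngbr_rqst):
    """Build u's cooperative neighbor set Ngbr(u), aiming for >= k members."""
    ngbr = set()            # Ngbr(u)
    tried = set()           # Temp(u)
    replies = 0             # numRplyRcvd
    while replies < k and tried != set(adj[u]):
        # pick the untried adjacent node with the highest cooperativity
        v = max(set(adj[u]) - tried, key=lambda w: coop[w])
        tried.add(v)
        if send_ngbr_rqst(u, v):    # Ngbr-Rply received within the timeout
            ngbr.add(v)
            replies += 1
    return ngbr
```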
Next we discuss how a node processes incoming neighbor requests. In our approach, each node, say u, calculates its own threshold, based on information available from its local environment, to decide whether it should accept a querying node as its neighbor. We call this threshold the neighbor cooperativity threshold and denote node u's threshold by c∗(u). Since u can
measure the behaviors of its adjacent nodes quantitatively, u's threshold can be defined as the average cooperativity of its adjacent nodes, i.e.,

c∗(u) = (1/d) · Σ_{v∈Adj(u)} c(v), for d = |Adj(u)|.    (3)
Hence, when u receives a Ngbr-Rqst message from one of its adjacent nodes v, it compares c(v) to its threshold c∗(u). If c(v) ≥ c∗(u), u replies to v with a Ngbr-Rply message and adds v into u's neighbor set; otherwise, u discards the neighbor request and replies nothing. Algorithm 2 summarizes the procedure of processing neighbor requests.

Algorithm 2. Procedure of processing neighbor requests
Input: node u, and Adj(u)
1: ∀v ∈ Adj(u), measure c(v)
2: Calculate c∗(u) by Equation (3)
3: if (Receive Ngbr-Rqst from v ∈ Adj(u)) then
4:   if (c(v) ≥ c∗(u) AND v ∉ Ngbr(u)) then
5:     Send Ngbr-Rply to v
6:     Ngbr(u) := Ngbr(u) + v
7:   else
8:     Discard Ngbr-Rqst
9:   end if
10: end if
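Correspondingly, a sketch of the responder side (Algorithm 2), under the same assumptions and with our own naming:

```python
# Sketch of Algorithm 2: the responder u accepts a querying node v iff c(v)
# is at least u's local threshold c*(u), the average cooperativity of u's
# adjacent nodes (Equation (3)).

def coop_threshold(u, adj, coop):
    """c*(u): average cooperativity over Adj(u)."""
    return sum(coop[v] for v in adj[u]) / len(adj[u])

def process_ngbr_rqst(u, v, adj, coop, ngbr):
    """Handle a Ngbr-Rqst sent by v to u; return True iff a Ngbr-Rply is sent."""
    if v in adj[u] and coop[v] >= coop_threshold(u, adj, coop) and v not in ngbr[u]:
        ngbr[u].add(v)      # accept v and reply with Ngbr-Rply
        return True
    return False            # discard the request, reply nothing
```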
Through this mutual neighbor selection, a node with high cooperativity, in contrast to nodes with low cooperativities, may receive multiple requests and immediate replies from adjacent nodes. Consequently, a resilient overlay topology can be constructed from the neighbor sets. Algorithm 3 summarizes the perfect resilient overlay generation.

Algorithm 3. Generate a perfect resilient overlay M−
Input: k, and a multi-hop wireless network M
1: Let N− be the node set of M−, N− := ∅
2: for each u ∈ M do
3:   Build up Ngbr(u) by using Algorithms 1 and 2
4:   N− := N− ∪ Ngbr(u)
5: end for
6: return M− induced from N−
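The following toy driver wires the two sketches above together along the lines of Algorithm 3; the direct function call standing in for the wireless message exchange is, of course, a simplification:

```python
# Toy sketch of Algorithm 3: every node builds its neighbor set via the two
# procedures above, and the overlay node set is the union of all Ngbr(u).

def generate_overlay(adj, coop, k):
    ngbr = {u: set() for u in adj}
    # send(u, v): u's Ngbr-Rqst is handled by responder v (Algorithm 2)
    send = lambda u, v: process_ngbr_rqst(v, u, adj, coop, ngbr)
    for u in adj:                                  # Algorithms 1 and 2
        ngbr[u] |= query_potential_neighbors(u, adj, coop, k, send)
    overlay_nodes = set().union(*ngbr.values())    # N- := union of Ngbr(u)
    return overlay_nodes, ngbr
```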
3.3 Cooperativity Measurement Scheme
To measure the cooperativity of a generic node, we investigate the characteristics of misbehaving nodes at the network layer. A selfish node, for reasons such as saving energy, usually refuses to forward data packets for other nodes.
A malicious node may do anything, such as dropping a fraction of data packets in a random or periodic manner, or pretending to be adjacent to a node that is actually far away from it, thus trapping all packets destined to that node afterwards. Dropping "transient" packets is therefore one of the most common characteristics of misbehaviors, especially the Type-I misbehavior mentioned in Section 2.1. This observation implies that for a node u, we can use u's packet drop ratio, denoted by qdrp, to measure u's cooperativity c(u). Let nfwd(u) and ndrp(u) denote the numbers of packets that u should forward and that u drops, respectively; then we have

c(u) = 1 − qdrp(u) = 1 − ndrp(u) / nfwd(u).    (4)
We use the example in Fig. 1(a) to illustrate this method. In Fig. 1(a), every time node u asks node w to forward a packet to v, u increases a counter nfwd(w) by 1. If u cannot overhear w's forwarding within a timeout (e.g., a round-trip delay), u increases another counter ndrp(w) by 1. Moreover, when one of u's adjacent nodes, x, asks w to forward packets to v, u can record w's behavior as well. Based on the measurements from both "own experience" and "indirect observation", u can calculate w's cooperativity c(w) by (4).
Fig. 1. Measuring cooperativity: (a) by promiscuous mode, (b) by ACKs
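As a rough illustration of the bookkeeping behind Equation (4) and Fig. 1(a), consider the following hedged sketch; overheard is an assumed predicate for hearing w's retransmission in promiscuous mode:

```python
# Sketch of the watchdog-style counters from Section 3.3 (promiscuous-mode
# variant). Counter names follow Equation (4); all code names are ours.

from collections import defaultdict

n_fwd = defaultdict(int)   # packets node w was asked to forward
n_drp = defaultdict(int)   # packets node w was observed to drop

def record_forwarding(w, overheard: bool):
    """Update counters each time w is asked to relay a packet."""
    n_fwd[w] += 1
    if not overheard:          # no retransmission heard within the timeout
        n_drp[w] += 1

def cooperativity(w) -> float:
    """c(w) = 1 - n_drp(w) / n_fwd(w), per Equation (4)."""
    return 1.0 if n_fwd[w] == 0 else 1.0 - n_drp[w] / n_fwd[w]
```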
Notice that in our PROACtive protocol, the cooperativity measurement scheme is not limited to techniques employing the promiscuous mode; it can also use other techniques such as closed-loop feedback. For example, in Fig. 1(b), when u sends a data packet to w, it can piggyback an ACK request. Based on whether u receives an ACK from one of w's downstream nodes, say x, u can tell whether w has forwarded the packet successfully.

3.4 Features of PROACtive Protocol
In this section, we discuss some unique features of our approach. First, the "individual" threshold defined in (3) allows each node to reach a trade-off between system resilience and individual connectivity, compared to a global threshold. With a relatively high global threshold, the resulting overlay contains only cooperative nodes, but a node surrounded by nodes with relatively low cooperativities can hardly find enough neighbors. In contrast, with the individual threshold, a node u whose adjacent nodes have relatively low cooperativities sees its neighbor cooperativity threshold decrease accordingly, so u may still acquire enough "neighbors". Nevertheless, a global threshold is also applicable to our protocol, and more flexible neighbor-selection policies can be designed by combining global and individual thresholds.
Second, regarding neighbor set updating, our PROACtive protocol can deal with dynamic topology changes due to node mobility. For instance, a mobile node can refresh its neighbor set when it detects a disconnection from one of its neighbors. If the topology is highly dynamic (e.g., mobility is high), mobile nodes can keep records of their neighbors to avoid frequent neighbor set updates. The PROACtive protocol can also handle dynamic changes of node behaviors, thanks to the flexibility of our threshold design. For instance, the updating overhead can be reduced by deleting a neighbor only when its cooperativity falls below the minimum requirement of a specific application.

Third, our approach does not introduce new security vulnerabilities and avoids the false accusation problem. The cooperativity information measured by a node is not shared with others in our protocol, and neighbor selection depends only on each node's own knowledge of its neighborhood. Consequently, a node's cooperativity cannot be falsely rated low or high by several other (possibly malicious) nodes, which protects every node from false accusation. The absence of information sharing also avoids the complexity of deciding the actual cooperativity of a node when multiple different measurements are received. The integrity of Ngbr-Rqst and Ngbr-Rply messages can be protected by traditional cryptographic techniques.

Finally, the PROACtive protocol is completely distributed and localized, which makes our approach feasible to implement in real scenarios. Additionally, our protocol can be run locally, in an on-demand manner, whenever a mobile node detects a significant cooperativity change in its neighborhood. Further, our protocol is lightweight in terms of message exchange overhead. In the worst case, every node sends either a Ngbr-Rqst or a Ngbr-Rply message to each of its adjacent nodes in order to build up a neighbor set. This implies that to generate a resilient overlay, the total number of messages needed is no more than the number of links, denoted by m, in the original network; the message complexity of our protocol is therefore only O(m).

The PROG problem, i.e., how to generate a perfect resilient overlay, has now been solved by applying the PROACtive algorithm to given networks. We evaluate the effectiveness of our solution by simulations in the next section.
4 Simulation Evaluations
To evaluate our PROACtive protocol, we performed an extensive set of experiments using the ns-2 v2.28 and MATLAB v7sp3 tools. In our simulations, nodes are distributed randomly and uniformly in an area. Distinct node pairs randomly establish constant bit rate (CBR) connections, with a packet size of 512 bytes and a rate of 5 packets/sec, so that nodes can measure their adjacent nodes' cooperativities by the method described in Section 3.3. The AODV routing protocol is used, and the connectivity requirement k is 3 for all simulations.
Fig. 2. The topologies of the underlying network and the overlays generated: (a) no topology control, (b) with PROACtive applied, (c) K-Neigh with K = 9 (Phase I)
We first show how the PROACtive protocol generates overlays, and then show the effectiveness of our protocol with regard to k-connectivity preservation and the false positive (negative) rates.

4.1 Topology Generated by PROACtive Protocol
In this simulation, 500 nodes are distributed at random on a 1500 m × 1500 m area, all with the same transmission radius of 150 m. Among the 500 nodes, 150 misbehaving nodes drop packets they should forward or report false routes. Fig. 2(a) illustrates the network without any topology control, in which a pair of nodes is connected by a link whenever their distance is no larger than 150 m. Fig. 2(b) shows the network structure after applying our PROACtive protocol, in which cooperative and misbehaving nodes are represented by solid and hollow dots, respectively. From the figure, we can see that the overlay topology generated by PROACtive excludes most of the misbehaving nodes while containing almost all cooperative nodes. Although some links are removed in the overlay due to the mutual neighbor selection, the generated overlay remains connected. To highlight this feature of our protocol, the topology generated by the K-Neigh protocol (Phase 1 only, with K = 9) [7] is shown in Fig. 2(c), where all misbehaving nodes are included in the topology. This is because neighbor selection in K-Neigh is based only on the distance between nodes; that is, each node selects its K nearest adjacent nodes as neighbors.

4.2 Preservation of k-Connectivity
One of the major tasks in our simulations is to verify that k-connectivity is preserved, w.h.p., by the overlays generated by PROACtive whenever the original network is k-connected. To test whether a network is k-connected, we use breadth-first search (BFS) to compute the number of disjoint paths connecting two distinct nodes. In this simulation, the nodes are placed uniformly at random in a bounded region of 2000 m × 2000 m. The transmission radius is set to 100 m. The number of nodes N ranges from 500 to 5000 with an interval of 100,
Fig. 3. The k-connectivity probabilities of the overlays generated, compared to those of the original networks: (a) against N, (b) against PM
Fig. 4. The false positive and negative ratios produced by PROACtive: (a) against N, (b) against PM (FNR only)
which makes the network placements representative of both sparse and dense networks. For every value of N, a certain number of nodes misbehave; their ratio to the total, denoted by PM, ranges from about 1% up to 90% with an interval of 10%. The simulation results are shown in Fig. 3(a) and 3(b). For clarity, only the results for 2000 < N < 4000 are shown, and we omit the part for PM > 0.7 for a similar reason. From the two figures, we can see that the generated overlays preserve k-connectivity with probability greater than 90% when the original network is k-connected.
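For reference, here is a self-contained sketch of the disjoint-path test described above (our implementation, not the authors' simulation code): it counts internally vertex-disjoint s–t paths with BFS-based augmentation on a node-split, unit-capacity flow graph, stopping once k paths are found.

```python
from collections import deque

def disjoint_paths(adj, s, t, k):
    """Count (up to k) internally vertex-disjoint s-t paths via unit-capacity
    max-flow on a node-split graph, with BFS augmentation (Edmonds-Karp)."""
    INF = float("inf")
    cap, flow = {}, {}
    for v in adj:
        # split v into ("in", v) -> ("out", v); capacity 1 bounds vertex use
        cap[(("in", v), ("out", v))] = INF if v in (s, t) else 1
        for w in adj[v]:
            cap[(("out", v), ("in", w))] = 1
    flow = {e: 0 for e in cap}
    src, dst = ("out", s), ("in", t)
    paths = 0
    while paths < k:
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            x = queue.popleft()
            for (a, b) in cap:          # residual edges, forward and backward
                if a == x and b not in parent and flow[(a, b)] < cap[(a, b)]:
                    parent[b] = (a, (a, b), +1)
                    queue.append(b)
                elif b == x and a not in parent and flow[(a, b)] > 0:
                    parent[a] = (b, (a, b), -1)
                    queue.append(a)
        if dst not in parent:
            break                        # no more augmenting paths
        x = dst
        while parent[x] is not None:     # push one unit of flow back to src
            prev, edge, sign = parent[x]
            flow[edge] += sign
            x = prev
        paths += 1
    return paths

# By Menger's theorem, a network is k-connected iff every pair of distinct
# nodes s, t satisfies disjoint_paths(adj, s, t, k) >= k.
```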
4.3 False Positive and Negative Ratio
As described in Section 3.4, although the individual threshold helps nodes reach a trade-off between system resilience and individual connectivity, it is possible that a cooperative node u cannot build up its neighbor set if c(u) < c∗(v) for all v ∈ Adj(u). In this case, we say u is a false positive. Conversely, a misbehaving node u may be added to another node v's neighbor set if v cannot obtain enough neighbors without adding u. In this case, we say u is a false negative. Since the perfect resilient overlay (PRO) is an overlay
that contains all and only the cooperative nodes of the original network, we can use two metrics, the false positive rate (FPR) and the false negative rate (FNR), to evaluate the effectiveness of the PROACtive protocol in generating PROs. If NC and NM denote the numbers of cooperative and misbehaving nodes in the original network, then the FPR and FNR can be calculated as FPR = NC^m / NC and FNR = NM^c / NM, respectively, with NC^m and NM^c denoting the numbers of false positives and false negatives. Our simulation results are reported in Fig. 4(a) and 4(b). In Fig. 4(a), the FPRs are very low (< 5%) for all networks across different system sizes N as well as different PM; however, the FNRs are more significant for small N than for large N. This indicates that relatively more misbehaving nodes are added into the overlay to keep it connected when the network is sparse. Another observation is that the FNRs increase significantly as PM increases, which is further illustrated in Fig. 4(b), where the FNR for N = 4000 rises up to 50% when PM = 0.9. This is due to the fact that more false negatives are produced to keep enough neighbors for every node when many misbehaving nodes are present. These observations show that the PROACtive protocol is more "conservative" in satisfying the k-connectivity constraint.
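The two metrics can be computed directly from the node sets; a minimal sketch (names are ours):

```python
# Sketch of the FPR/FNR metrics: false positives are cooperative nodes left
# out of the overlay; false negatives are misbehaving nodes included in it.

def fpr_fnr(cooperative: set, misbehaving: set, overlay_nodes: set):
    fp = len(cooperative - overlay_nodes)   # N_C^m
    fn = len(misbehaving & overlay_nodes)   # N_M^c
    return fp / len(cooperative), fn / len(misbehaving)
```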
5 Conclusion and Future Work
In this paper, we proposed a distributed and localized protocol, PROACtive, to generate perfect resilient overlays that contain all and only the cooperative nodes of the original wireless multi-hop networks. The PROACtive protocol has a lightweight message complexity of O(m), and the overlays generated achieve k-connectivity with high probability and a low false positive ratio. The main advantage of applying our PROACtive protocol is that the generated resilient overlays provide cooperative platforms for multi-hop routing and data transmission in the presence of misbehaving nodes. Based on these resilient overlays, new routing strategies and data aggregation schemes can be designed, which will be our future work. Further, more advanced cooperativity measurement schemes also need to be explored.
References

1. Aad, I., Hubaux, J.P., Knightly, E.W.: Denial of Service Resilience in Ad Hoc Networks. In: Proc. of ACM MobiCom '04 (2004) 202–215
2. Xing, F., Wang, W.: Modeling and Analysis of Connectivity in Mobile Ad Hoc Networks with Misbehaving Nodes. In: Proc. of IEEE International Conference on Communications (ICC) '06 (2006) 1879–1884
3. Marti, S., Giuli, T.J., Lai, K., Baker, M.: Mitigating Routing Misbehavior in Mobile Ad hoc Networks. In: Proc. of ACM MobiCom '00 (2000) 255–265
4. Zhong, S., Chen, J., Yang, Y.R.: Sprite: A Simple, Cheat-Proof, Credit-based System for Mobile Ad-Hoc Networks. In: Proc. of IEEE INFOCOM '03 (Mar. 2003) 1987–1997
5. Hu, Y., Perrig, A., Johnson, D.B.: Ariadne: A Secure On-Demand Routing Protocol for Ad Hoc Networks. In: Proc. of ACM MobiCom '02, Atlanta, USA (Sep. 2002)
6. Ganesan, D., Govindan, R., Shenker, S., Estrin, D.: Highly-Resilient, Energy-Efficient Multipath Routing in Wireless Sensor Networks. Mobile Computing and Communications Review (MC2R) 5(4) (2001) 1–13
7. Blough, D.M., Leoncini, M.: The K-Neigh Protocol for Symmetric Topology Control in Ad Hoc Networks. In: Proc. of ACM MobiHoc '03, ACM Press (2003) 141–152
8. Shen, C.C., Srisathapornphat, C., Liu, R., Huang, Z., Jaikaeo, C., Lloyd, E.L.: CLTC: A Cluster-Based Topology Control Framework for Ad Hoc Networks. IEEE Transactions on Mobile Computing 3(1) (2004) 18–32
9. Cardei, M., Wu, J., Yang, S.: Topology Control in Ad Hoc Wireless Networks Using Cooperative Communication. IEEE Transactions on Mobile Computing 5(6) (Jun. 2006) 711–724
10. Penrose, M.D.: On k-connectivity for a Geometric Random Graph. Random Struct. Algorithms 15(2) (1999) 145–164
11. Bettstetter, C.: On the Minimum Node Degree and Connectivity of a Wireless Multihop Network. In: Proc. of ACM MobiHoc '02, ACM Press (Jun. 2002) 80–91
12. Li, X.Y., Wan, P.J., Wang, Y., Yi, C.W.: Fault Tolerant Deployment and Topology Control in Wireless Networks. In: Proc. of ACM MobiHoc '03 (Jan. 2003) 117–128
13. Bettstetter, C.: On the Connectivity of Ad Hoc Networks. The Computer Journal, Special Issue on Mobile and Pervasive Computing 47(4) (2004) 432–447
Placing and Maintaining a Core Node in Wireless Ad Hoc Sensor Networks

Amit Dvir and Michael Segal
Department of Communication Systems Engineering, Ben Gurion University of the Negev, Israel
azdvir, [email protected]
Abstract. Wireless ad hoc sensor networks are characterized by several constraints, such as bandwidth, delay, and power. These networks are examined by constructing a tree network. A core node is usually chosen to be the median or center of the multicast tree, with the aim of minimizing a performance metric such as delay or bandwidth. In this paper, we present a new efficient strategy for constructing and maintaining a core node in a multicast tree for wireless ad hoc sensor networks that undergo dynamic changes, based on local information. The new core (centdian) function is defined by a convex combination that captures both total bandwidth and delay constraints. We provide two bounds of O(d) and O(d + l) time for maintaining the centdian using local updates, where l is the hop count between the new center and the new centdian and d is the diameter. We also show an O(n log n) time solution for finding the centdian in the Euclidean complete network. Finally, a simulation study is presented. (This research has been partially supported by INTEL and the REMON consortium.)

Keywords: Sensor networks, Wireless Ad hoc Networks, Multicast tree, Core Node.
1 Introduction

Wireless ad hoc sensor networks are a network architecture that can be rapidly deployed without relying on pre-existing fixed network infrastructure. Wireless communication is used to deliver information between nodes, which may be mobile and rapidly change the network topology. The wireless connections between the nodes (referred to below as links or edges) may suffer from frequent failures and recoveries due to the motion of the nodes and due to additional problems related to the propagation channels (e.g., obstructions, noise) or power limitations. A wireless ad hoc sensor network consists of a number of sensors spread across a geographical area. Each sensor has wireless communication capability and some level of intelligence for signal processing and networking of the data. Recently, wireless sensor networks have been attracting a great deal of commercial and research interest [13,27,29]. In particular, the practical emergence of wireless ad hoc networks is widely considered revolutionary, both as a paradigm shift and as an enabler of new applications.
Group communication is the basis for numerous applications in which a single source delivers identical information concurrently to multiple destinations. This is usually achieved through efficient management of the network topology in the form of a tree with specific properties. For example, multicast routing refers to the construction of a spanning tree rooted at the source and spanning all destinations [3,11,25,30,36]. Delivering the information only through edges that belong to the tree yields an efficient form of group communication that uses the smallest possible amount of network resources. In contrast, with unicast routing one needs to find a path from the source to each destination, generating an inefficient form of group communication in which the same information is carried multiple times over the same network edges and the communication load on intermediate nodes may increase significantly. We note that wireless ad hoc sensor networks provide the reliable and efficient communication services necessary for distributed computing [6,31], while the objective functions considered are the most classical ones, involving the minimization of the average or the maximum distance to service facilities.

Generally, there are two well-known basic approaches to constructing multicast trees: the Steiner minimal tree (SMT) and the shortest path tree (SPT). The Steiner tree (or group-shared tree) minimizes the total cost of a tree spanning all group nodes, possibly with additional non-group-member nodes. The optimal construction of the SMT is known to be an NP-hard problem [14,22]. Heuristics that offer efficient solutions to this problem are given in [21,37]; the best solution to date, with an approximation factor of 1.55, is due to [38]. In contrast, the SPT minimizes the cost of each path from the root source to each destination, which can be achieved in polynomial time using the well-known algorithm in Dossey et al. [12]. The goal of an SPT is to preserve the minimal distances from the root to the nodes without any attempt to minimize the total cost of the tree.

Gupta and Srimani [16] present distributed core selection and migration protocols for multicast trees in MANETs with dynamically changing network topology. Their core location method is based on the median node of the current multicast tree instead of the median node of the entire network, and their adaptive distributed core selection and migration method uses the fact that the median of a tree is equivalent to the centroid of that tree. Gupta et al. [17] present an efficient core migration protocol for MANETs that migrates the core until the multicast tree branches reflect the desired QoS requirements of the multicast application; their core location method is based on the center node of the current multicast tree. Bing-Hong et al. [9] give a heuristic for the minimum non-leaf multicast tree problem that reduces the number of non-leaf nodes in the multicast tree; their experimental results show that, in the geometrically distributed network model, the multicast tree produced by their method has fewer non-leaf nodes than others.

The bandwidth of a tree is defined as the total distance of the packet transmissions required to deliver a packet from the core node v to all other nodes. The maximum delay of the tree is the maximum distance traversed by any packet traveling from the core node v to another node.
The transport of a node is defined as the total distance from the node to all other nodes in the tree. The corresponding solution concepts have been considered in the literature as the median and the center [26,28,39]. Since the median approach is based on
averaging, it often provides a solution in which remote and low-population-density areas are discriminated against in terms of accessibility to public facilities, as compared with centrally situated and high-population-density areas. For this reason, an alternative approach, involving the maximum distance between any customer and the closest facility, can be applied. This approach is referred to as the center solution concept [4]. The minmax objective primarily addresses geographical equity issues, which are of particular importance in the spatial organization of emergency service systems. On the other hand, locating a facility at the center may cause a large increase in the total distance, thus generating a substantial loss in spatial efficiency. The problems of using only the center or the median as a core lead to the search for a compromise solution concept called the centdian, whose function presents a trade-off between the center and the median functions [19]. The centdian function for a node v in the network is defined by

Dv = λ · sum(v) + (1 − λ) · dist(v), 0 ≤ λ ≤ 1,

where dist(v) is the maximum distance from node v to any other node in the network and sum(v) is the sum of the distances from node v to all other nodes in the network. Halpern [18] introduced the centdian model and studied the properties of the centdian in a tree. In a subsequent work, Carrizosa et al. [10] presented an axiomatic approach justifying the use of the centdian criterion. Tamir et al. [41] present the first polynomial time algorithm for the p-centdian problem on a tree, with O(pn^6) complexity, where p is the number of facilities. For more results on the centdian problem, see [2,7,20,32,33,40].

The related notion of the ordered median of a tree [5,23,34,35] generalizes the most common criteria mentioned above, i.e., median, center, and centdian. If there are n demand points in a tree T, this function is characterized by a sequence of reals κ = (κ1, . . . , κn) satisfying κ1 ≥ κ2 ≥ . . . ≥ κn ≥ 0. For a given subtree S ⊆ T, let X(S) = {x1, . . . , xn} be the set of weighted distances of the n points to S. The value of the ordered median objective at S is obtained as follows: sort the n elements of X(S) in non-increasing order, then compute the scalar product of the sorted list with the sequence κ. It is easy to see that when κi = 1 for i = 1, . . . , n, we get the median objective, and when κ1 = 1 and κi = 0 for i = 2, . . . , n, we obtain the center objective. For the case κ1 = 1 and κi = λ for i = 2, . . . , n, we get the centdian objective. Unfortunately, constructing and maintaining cores using the ordered median technique is not suitable for wireless ad hoc sensor networks, since it requires keeping global information about the nodes of the network, which is infeasible in this setting. Most protocols for constructing a core node are likewise not suitable for wireless ad hoc sensor networks, since they are not based on local updates.

In this paper, we present a new efficient strategy for constructing and maintaining a core node under the centdian criterion in a multicast tree for wireless ad hoc sensor networks with dynamic changes in the network topology. The new core node is defined by a convex combination of the sum of the weighted path distances (the sum of the weighted edges on the path) from all nodes in the tree network to the core node and the maximum weighted distance from the core node to the farthest node in the tree network, thus combining the center and median core functions.
We also provide two bounds, of O(d) and O(d + l) time, for maintaining the centdian after a change (adding/removing an edge/node) in the topology of the tree network, where l is the hop count between the new center and the new centdian of the multicast
tree and d is the diameter of the tree. We show an O(n log n) time algorithm for finding a centdian node in the Euclidean complete network, based on an observation in [8]. Finally, we present a simulation that compares our new core solution with well-known core strategies to exhibit the advantages and efficiency of our algorithms.

This paper is organized as follows: Section 2 presents a new algorithm that finds and maintains a centdian core in a multicast tree. In Section 2.2 we show a solution for a static Euclidean network. Next we present our simulation results, and finally, we conclude with several ideas for future work.
2 Algorithm to Find Centdian of Multicast Tree in Wireless Ad Hoc Sensor Networks

We model the topology of a wireless ad hoc sensor network by a weighted undirected graph G(V, E, We), where V is the set of nodes, E is the set of edges between neighboring nodes, and We is an edge weight function, e.g., the squared distance between the endpoints of an edge. Note that the edges represent logical connectivity between nodes; i.e., there is an edge between two nodes u and v if they can hear each other's local broadcast. Since the nodes are mobile, the network topology graph changes stochastically. Let T(V', E') be a weighted multicast tree of G. For a node v ∈ T and an adjacent node x, we denote by Kv the number of nodes in the connected component containing v that is created by removing edge e(v, x), and by Wv the total sum of the weighted distances from the nodes in that component to node v.

The center of a tree T is a node c1 ∈ T such that the maximal distance from c1 to any other node in T is minimized, i.e., dist(c1, T) = min_{v∈T} dist(v, T). To find a center of tree T, we can use the distributed algorithm described in [26], which requires r(I) + d(T)/2 time, where r(I) is the maximal weighted distance from the initiator node I to any other node in T and d(T) is the weighted diameter of the tree. This algorithm starts from an arbitrary node I, goes from the internal nodes towards the leaves and back to the new center, using the information from the leaves about the weighted path distances and the fact that the center of the tree lies on the diameter of the tree.

The median of a tree T is a node c2 ∈ T such that the sum of the weighted distances from c2 to all other nodes in T is minimized, i.e., sum(c2, T) = min_{v∈T} sum(v, T). To find a median of a tree T, we can use the distributed algorithm in [26], which requires max_{x∈T}(r(I) + d(x, c2)) time, where d(x, c2) is the weighted distance between node x and the new median. This algorithm starts from an arbitrary node I and goes from the internal nodes towards the leaves. Each leaf propagates the weight of its edge, and each internal node propagates the sum of the values obtained from its descendants plus the weight of the edge connecting it to its predecessor in the tree.

Next, we show a simple algorithm to find the number of nodes in each of node v's branches. We define Kvi, i = 1 . . . b, to be the number of nodes in the ith branch of node v, with b standing for the number of branches of node v. By a convergecast process from the leaves towards the center of the tree, we can find the total number of nodes in the tree. Knowing this number, we start a new process from the leaves to find, for each node v, its values Kvi. Each leaf sends its father w in the rooted tree T a num(1)
message. Each internal node w receives the num messages from its sons and sums the values in them. The process converges towards c1.

It is a well-known fact that a centdian is located on the path connecting the center c1 and the median c2 [18]. The following lemma presents an efficient way of calculating the centdian in a multicast tree based on knowledge of the locations of the center and the median.

Lemma 1. Dx > Dv iff λ((Kx − Kv) + 1) < 1.

Proof: A centdian node x must minimize the expression Dx = λ · sum(x, T) + (1 − λ) · dist(x, T). Denote by Dv = λ · sum(v, T) + (1 − λ) · dist(v, T) the cost of the node v that has the minimum value of D among the neighbors of the current centdian x. We should move the centdian towards v only if Dx > Dv. Notice that dist(x, T) = (Dx − λ · sum(x, T))/(1 − λ). We also have sum(x, T) = Wx + (Kv + 1) · d(x, v) + Wv and sum(v, T) = Wv + (Kx + 1) · d(x, v) + Wx. Therefore, Dv = λ(Wv + (Kx + 1) · d(x, v) + Wx) + (1 − λ) · dist(v, T).

It is easy to see that there are 5 different cases (out of 9) to deal with when Dx > Dv:
1) sum(v, T) < sum(x, T) and dist(v, T) < dist(x, T);
2) sum(v, T) > sum(x, T) and dist(v, T) < dist(x, T);
3) sum(v, T) = sum(x, T) and dist(v, T) < dist(x, T);
4) sum(v, T) < sum(x, T) and dist(v, T) = dist(x, T);
5) sum(v, T) < sum(x, T) and dist(v, T) > dist(x, T).

We present an analysis only for cases 1–3 (case 4 is handled separately below, and case 5 is equivalent to cases 1–3). In cases 1–3, |dist(x, T) − dist(v, T)| ≤ d(x, v); therefore, with dist(v, T) = dist(x, T) − d(x, v),

Dv = λ(Wv + (Kx + 1) · d(x, v) + Wx) + (1 − λ)(dist(x, T) − d(x, v))
   = λ(Wv + (Kx + 1) · d(x, v) + Wx) + (1 − λ)((Dx − λ · sum(x, T))/(1 − λ) − d(x, v))
   = λ(Wv + (Kx + 1) · d(x, v) + Wx) + Dx − λ · sum(x, T) − (1 − λ) · d(x, v)
   = λWv + λ(Kx + 1) · d(x, v) + λWx + Dx − λ(Wx + (Kv + 1) · d(x, v) + Wv) − (1 − λ) · d(x, v)
   = Dx + λ · d(x, v)(Kx − Kv) − (1 − λ) · d(x, v).

As stated above, we move the centdian node only if Dx > Dv. This happens when Dx − (Dx + λ · d(x, v)(Kx − Kv) − (1 − λ) · d(x, v)) > 0, or, in other words, λ(Kx − Kv + 1) < 1. In case 4, dist(x, T) = dist(v, T), thus Dv = λ(Wv + (Kx + 1) · d(x, v) + Wx) + (1 − λ) · dist(x, T); an analysis similar to the previous one shows that Dx > Dv only if λ · d(x, v)(Kx − Kv) < 0. In case 5, we get that Dx > Dv only if λ((Kv − Kx) − 1) < −1. It follows that the inequality Dx > Dv holds when λ(Kx − Kv + 1) < 1. □
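A minimal sketch of the local migration test that Lemma 1 yields (the function and argument names are ours):

```python
# Sketch of the Lemma 1 test: from the current centdian x, migrate to the
# best neighbor v iff lam * (Kx - Kv + 1) < 1, where Kx (Kv) counts the
# nodes on x's (v's) side when edge (x, v) is removed from the tree.

def should_move(lam: float, k_x: int, k_v: int) -> bool:
    """True iff D_v < D_x, i.e. the centdian should migrate from x to v."""
    return lam * (k_x - k_v + 1) < 1

# e.g. lam = 0.4, k_x = 3, k_v = 8: 0.4 * (3 - 8 + 1) = -1.6 < 1 -> migrate
# towards the larger (median-heavy) side of the tree, as the lemma suggests.
```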
Based on this lemma, we can locate the centdian in the multicast tree by starting either at the center or at the median and walking along the path between them, locally improving from one neighboring node to the next. Thus, the centdian in the tree can be found in O(l) time when the locations of the center and the median are known, with l standing for the number of nodes on the path connecting the center and the median.

2.1 Maintaining a Centdian in a Multicast Tree

When there is a change in the multicast tree, each node needs to update the number of nodes in each of its branches. Using the Kx values of the centdian node, our maintenance algorithms run on the subtree with the highest number of multicast group members. In what follows we show two different approaches to maintaining the centdian in a multicast tree. Both approaches use the fact that the centdian function is convex and therefore has only one minimum point.

The first approach is to maintain the center of the tree in O(d) time using the algorithm in [24]. The new centdian lies on the path between the center and the median of the updated tree. Therefore, starting from the new center and finding the node that locally improves the centdian function value points us in the direction of the new centdian of the tree. The centdian can thus be maintained in worst-case O(d + l) time, where l is the hop count between the new center and the new centdian of the multicast tree and d is the diameter of the tree.

The second approach uses the fact that the neighbor of the old centdian x that most improves the centdian function value lies on the path between the old centdian and the new centdian. Therefore, the centdian can be maintained in worst-case O(d) time. Since we want only multicast group members to be assigned the responsibility of the core node, the second approach needs to be modified: if the new centdian node is a multicast member, it becomes the actual new centdian of the tree; if not, we follow the path towards the old centdian to find a node that belongs to the multicast group and declare this node to be the new actual centdian of the multicast tree.

2.2 Algorithm to Find Centdian in Euclidean Plane

We model the topology of a planar network as explained above, with the edge weight function defined as the squared distance between the nodes. The motivation for choosing this function is that transmit power commonly behaves quadratically in the distance between the transmitting and receiving nodes. Using the observation in Bespamyatnikh et al. [8], we are able to solve the centdian problem in the Euclidean plane in O(n log n) time.

The farthest-point Voronoi diagram of a collection of points S in the plane is a partition of the plane into cells, each of which consists of the points farther from one particular point of S than from any other. This diagram can be constructed in O(n log n) time, supporting query requests in O(log n) time: for a given point p, a query returns the farthest neighbor of p in S. Thus, in O(n log n) total time we can find the farthest neighbor of each point by performing n queries; in other words, for every node v in the network we find dist(v) in total O(n log n) time.

Bespamyatnikh et al. [8] observed that the "squared" Euclidean metric is separable, i.e., the squared distance between two points is the sum of their squared x- and y-coordinate differences. We follow the notation of [8]. We sort our points according to their x- and
y-coordinates. Let {p1, . . . , pn} be the sorted points. For every point pi ∈ S we can compute the sum Σi^x of the x-distances from pi to the rest of the points in S. This is performed efficiently as follows. For the point p1 we compute Σ1^x by computing and summing up each of the n − 1 distances. For 1 < i ≤ n we define Σi^x recursively: assuming the x-distance between pi−1 and pi is δ, then Σi^x = Σi−1^x + δ·(i−1) − δ·(n−i+1). The sums Σi^x (for i = 1, . . . , n) can thus be computed in linear time when the points are sorted; the values Σi^y are computed similarly.

Next, let sumi^x = Σ_{j=1}^{n} (xj − xi)². The recursion formula for computing all the squared x-distances is sumi^x = sumi−1^x − 2δ·Si−1^x + nδ², where δ is the x-distance between pi−1 and pi and Si−1^x = Σ_j (xj − xi−1) denotes the signed sum of x-offsets. Assume the point p ∈ S is ith in the x-order and jth in the y-order; then the sum of the squared Euclidean distances from p to the points in S is sum(p) = sumi^x + sumj^y. It remains to compute, for every node v, the value of the centdian function based on the already computed sum(v) and dist(v) values, which is done in linear time. Thus, we can conclude:

Theorem 1. Given a set S of n nodes in a Euclidean complete graph where the cost of every edge equals the squared Euclidean distance between its endpoints, we can find the centdian node of this graph in O(n log n) time.
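A hedged sketch of the per-axis linear-time computation (our code, written for one axis; it tracks the signed offset sum used in the squared-distance recursion above):

```python
# Sketch of the separable prefix-sum trick for one axis: given sorted
# coordinates xs, compute sq[i] = sum_j (xs[j] - xs[i])**2 in linear time.
# The y-axis is handled identically; sum(p) adds the two per-axis values.

def squared_distance_sums(xs):
    n = len(xs)
    sq = [sum((x - xs[0]) ** 2 for x in xs)]   # sq[0] by direct summation
    signed = sum(x - xs[0] for x in xs)        # S_0 = sum_j (x_j - x_0)
    for i in range(1, n):
        d = xs[i] - xs[i - 1]                  # delta between consecutive points
        sq.append(sq[-1] - 2 * d * signed + n * d * d)
        signed -= n * d                        # signed sum shifts by -n*delta
    return sq

# e.g. squared_distance_sums([0, 1, 3]) == [10, 5, 13], matching the direct
# O(n^2) computation of the three squared-distance sums.
```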
3 Simulation

This section describes our medium-scale experiment in detail. The objectives of the experiment were to test whether the suggested maintenance algorithm actually works, and to compare its results with the performance of other core algorithms. For this simulation we chose to implement the second approach. While performing the simulation we also made an interesting observation about the runtime bound of the first approach to maintaining the centdian node.

3.1 Environment

The following assumptions have been made:

– For each node, the transmission and reception ranges are equal; however, different nodes can have different ranges. The radius value refers to the transmission range.
– All the nodes are equal in their functionalities and abilities.
– The movement of each node follows a random-walk mobility model based on random directions and speeds ([15]): each node moves from its current location to a new location by randomly choosing a speed and a direction in which to travel.
– There is no dependence between the nodes, and the boundary of the network is predefined.

In our simulation we used 5 different types of cores: center, median, continuous median, centdian, and continuous centdian (with different values of λ). The difference between a continuous and a non-continuous core is that a non-continuous core can "jump" from one node to another, while a continuous core keeps to the path from the previous core towards the newly computed core.
3.2 Results

The main goal of our simulation is to examine the influence of the multicast group on the cores' behavior. One of the parameters we wish to examine is the period of time during which cores are co-located at the same node (defined as a collision). From the obtained results we can learn that the behavior of the cores in the multicast tree is sometimes similar to the behavior of the cores in a regular network tree, for example:

– When λ = 1, the collision between the centdian core and the center core is 100%.
– When λ ≤ 0.5, the collision between the centdian core and the median core is 100%, as proved in [18].

In some cases, the collision value between the median/center core and the centdian core should be 1, but in our simulation the collision value in those cases is less than 1. The reason is the well-known fact that a tree may have two centers/medians; in the simulation we chose between them arbitrarily.

We simulated an ad hoc sensor network with 100 nodes and 50 multicast nodes over a range of radius and λ values in a network area of 600×600 meters. Figures 1–7 show the influence of the radius on the constraint values of the network. In particular, Figures 1–2 show the transport and delay values of the tree network as unimodal functions whose break point is the maximal value. The reason is that as the radius grows, the network becomes more connected and more nodes participate in the network; beyond some point the network becomes connected and the paths from the nodes to the core contain a small number of hops. Figure 3 shows the connection between the radius and the life span of the cores, the life span being the period of time/rounds during which the core does not change its location. It is easy to see that as we increase the radius, the life span grows as well. The collision between the cores for various values of λ and radius is depicted in Figures 4–7. In Figure 4 we focus on the collision between the centdian and the median, while in Figures 5–6 we examine the collision between the continuous centdian and the well-known cores. Figure 7 shows the collision between the new centdian and the new center. From Figure 8 we can learn that for most radius values, the value l (the number of hops between center and centdian) is small. The continuous centdian core
Fig. 1. Total transport of the cores for varying values of radius. Fig. 2. Total delay of the cores for varying values of radius
Fig. 3. Cores' life span for varying values of radius. Fig. 4. The collision between the Centdian and Median for varying values of radius and λ
Fig. 5. The collision between the Centdian and Continuous Centdian for varying values of radius and λ. Fig. 6. The collision between the Continuous Centdian and Continuous Median for varying values of radius and λ
Fig. 7. The collision between the Centdian and Center for varying values of radius and λ. Fig. 8. The average hop count between Centdian and Center for varying values of radius and λ
achieves better delay performance than the median core and better transport performance than the center core. The continuous centdian core achieves these properties in well-connected networks as well as in sparse ones.
4 Conclusion and Future Work

We have developed a new distributed algorithm for finding and maintaining a centdian core in an ad hoc sensor network, based on processing local information of the network. An analytic bound on the value l seems very interesting. One promising future direction is to adapt a self-stabilizing algorithm to the core selection problem in ad hoc sensor networks as they become partitioned and partitions reconnect. The analysis of a model that assumes some distribution for the velocities of the nodes also seems attractive.
References

1. S. Alstrup, J. Holm and M. Thorup, "Maintaining median and center in dynamic trees", In 7th Scandinavian Workshop on Algorithm Theory, (2000), pp. 46-56.
2. I. Averbakh and O. Berman, "Algorithms for path medi-centers of a tree", Computers & Operations Research, Vol. 26, (1999), pp. 1395-1409.
3. T. Ballardie, P. Francis, and J. Crowcroft, "Core Based Trees (CBT): An Architecture for Scalable Inter Domain Multicast Routing", Proc. ACM SIGCOMM, (1993), pp. 85-95.
4. J. Bar-Ilan, G. Kortsarz and D. Peleg, "How to allocate network centers", Journal of Algorithms, Vol. 15, (1993), pp. 385-415.
5. R. Benkoczi, B. Bhattacharya and A. Tamir, "Collection Depots Facility Location Problems in Trees", submitted, http://www.tau.ac.il/~atamir/tamirp.html.
6. Y. Ben-Shimol, A. Dvir and M. Segal, "SPLAST: A novel approach for multicasting in mobile ad hoc networks", IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Vol. 2, (2004), pp. 1011-1015.
7. O. Berman and E. K. Yang, "Medi-centre location problems", Journal of the Operational Research Society, Vol. 42, (1991), pp. 313-322.
8. S. Bespamyatnikh, K. Kedem, M. Segal and A. Tamir, "Optimal facility location under various distance functions", Int. Journal on Comp. Geometry and Appls., 10(5), (2000), pp. 523-534.
9. L. Bing-Hong, K. Wei-Chieh and T. Ming-Jer, "Distributed formation of core-based forwarding multicast trees in mobile ad hoc networks", Telecommunication Systems, Vol. 32, (2006), pp. 263-281.
10. E. J. Carrizosa, E. Conde, F. R. Fernandez and J. Puerto, "An axiomatic approach to the centdian criterion", Location Science, Vol. 2, (1994), pp. 165-171.
11. S. E. Deering, D. Estrin, D. Farinacci, V. Jacobson, C.-G. Liu, and L. Wei, "The PIM Architecture for Wide-Area Multicast Routing", IEEE/ACM Trans. Networking, Vol. 4, Num. 2, (1996), pp. 153-162.
12. J. Dossey, A. Otto, L. Spence, C. Eynden, Discrete Mathematics, Harper Collins College Publishers, (1993).
13. S. Dulman, M. Rossi, P. Havinga and M. Zorzi, "On the hop count statistics for randomly deployed wireless sensor networks", International Journal of Sensor Networks, Vol. 1, (2006), pp. 89-102.
14. M. R. Garey, R. L. Graham, D. S. Johnson, "The complexity of computing Steiner minimal trees", SIAM J. Appl. Math., Vol. 32, (1977), pp. 835-859.
15. L. M. Gavrilovska and V. M. Atanasovski, "Ad Hoc Networking Toward 4G: Challenges and QoS Solutions", International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services, (2005).
16. S. K. S. Gupta, P. K. Srimani, "Adaptive core selection and migration method for multicast routing in mobile ad hoc networks", IEEE Transactions on Parallel and Distributed Systems, Vol. 14, (2003), pp. 27-38.
17. S. K. S. Gupta, L. Schwiebert, C. Jiao and M. Kochhal, "An Efficient Core Migration Protocol for QoS in Mobile Ad Hoc Networks", IEEE Int'l Performance, Computing, and Communications Conference, (2002), pp. 387-391.
18. J. Halpern, "Finding minimal center-median convex combination (cent-dian) of a graph", Management Science, Vol. 16, (1978), pp. 534-544.
19. J. Halpern, "The location of a centdian convex combination on an undirected tree", Journal of Regional Science, Vol. 16, (1976), pp. 237-245.
20. G. Y. Handler, "Medi-centers of a tree", Transportation Science, Vol. 19, (1985), pp. 246-260.
21. F. Hwang, D. Richards, "Steiner Tree Problems", Networks, Vol. 22, Num. 1, (1992), pp. 55-89.
22. F. K. Hwang, D. S. Richards, P. Winter, The Steiner tree problem, North-Holland, (1992), pp. 93-202.
23. J. Kalcsics, S. Nickel, J. Puerto and A. Tamir, "Algorithmic results for ordered median problems defined on networks and the plane", Operations Research Letters, Vol. 30, (2002), pp. 149-158.
24. M. H. Karaata, S. V. Pemmaraju, S. C. Bruell and S. Ghosh, "Self-stabilizing algorithms for finding centers and medians of trees", 13th Annual ACM Symposium on Principles of Distributed Computing, (1994), pp. 374-395.
25. Y. B. Ko and N. H. Vaidya, "Geocasting in Mobile Ad hoc Networks: Location-Based Multicast Algorithms", In Second IEEE Workshop on Mobile Computing Systems and Applications, (1999), pp. 101-110.
26. E. Korach, D. Rotem, N. Santoro, "Distributed Algorithms for Finding Centers and Medians in Networks", ACM Trans. Program. Lang. Syst., Vol. 6, (1984), pp. 380-401.
27. J. Lansford and P. Bahl, "The Design And Implementation Of HomeRF: A Radio Frequency Wireless Networking Standard For The Connected Home", Proceedings of the IEEE, Vol. 88, (2000), pp. 1662-1676.
28. J. H. Lin and J. S. Vitter, "Approximation algorithms for geometric median problems", Information Processing Letters, Vol. 44, (1992), pp. 245-249.
29. S. Meguerdichian, F. Koushanfar, G. Qu and M. Potkonjak, "Exposure In Wireless Ad-Hoc Sensor Networks", Proc. ACM MobiCom, (2001).
30. Mobile Ad hoc Networks (MANET) Charter. MANET Homepage. http://www.ietf.org/html.charters/manet-charter.html.
31. E. Pagani and G. P. Rossi, "Reliable broadcast in mobile multi-hop packet networks", IEEE International Conference on Mobile Computing and Networking, (1997), pp. 34-42.
32. D. Perez-Brito, J. A. Moreno-Perez and I. Rodriguez-Martin, "The finite dominating set for the p-facility centdian network location problem", Location Science, Vol. 11, (1997), pp. 27-40.
33. D. Perez-Brito, J. A. Moreno-Perez and I. Rodriguez-Martin, "The 2-facility centdian network problem", Location Science, Vol. 6, (1998), pp. 369-381.
34. J. Puerto, A. M. Rodriguez-Chia, A. Tamir and D. Perez-Brito, "The bi-criteria doubly weighted center-median path problem on a tree", Networks, Vol. 47, Issue 4, (2006), pp. 237-247.
35. J. Puerto and A. Tamir, "Locating tree-shaped facilities using the ordered median objective", Mathematical Programming, Vol. 102, (2005), pp. 313-338.
36. T. Pusateri, Distance Vector Multicast Routing Protocol, Internet Draft draft-ietf-idmr-dvmrp-v3-09.txt, Internet Engineering Task Force, (1999).
37. R. Ravi, "Steiner trees and beyond: Approximation algorithms for network design", Ph.D. Dissertation, Brown University, (1993).
38. G. Robins, A. Zelikovsky, "Improved Steiner tree approximation in graphs", Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms, (2000), pp. 770-779.
39. A. Tamir, "An O(pn²) algorithm for the p-median and related problems on tree graphs", Operations Research Letters, Vol. 19, (1996), pp. 59-64.
40. A. Tamir, "Fully polynomial approximation schemes for locating a tree-shaped facility: A generalization of the knapsack problem", Discrete Applied Mathematics, Vol. 87, (1998), pp. 229-243.
41. A. Tamir, D. Perez-Brito and J. A. Moreno-Perez, "A polynomial algorithm for the p-centdian problem on a tree", Networks, Vol. 32, Num. 4, (1998), pp. 255-262.
Flooding Speed in Wireless Multihop Networks with Randomized Beamforming

Vasil Mizorov¹, Jörg Widmer², Robert Vilzmann³, and Petri Mähönen⁴

¹ Siemens AG, Corporate Technology, D-81730 Munich, Germany
² DoCoMo Euro-Labs, D-80687 Munich, Germany
³ Technische Universität München, D-80290 Munich, Germany
⁴ RWTH Aachen University, D-52072 Aachen, Germany
Abstract. This paper analyzes aspects of message propagation in multihop wireless networks with beamforming antennas. In particular, we focus our attention on message propagation in the time domain. Our work uses a simulation-based implementation of the 802.11 MAC protocol and a simplified version of a previously proposed MAC protocol called BeamMAC [1]. Both protocols are compared under different network scenarios with several antenna array implementations (including an omnidirectional antenna). Our conclusions confirm the advantages that beamforming antennas have over omnidirectional antennas in wireless multihop networks: reduced hop distances and reduced time for information dissemination speed up the flooding of messages. Moreover, we observe the impact that network topology parameters have on the overall performance of message propagation.

Keywords: Ad Hoc Networks, antenna arrays, randomized beamforming, flooding speed.
1 Introduction
The concept of wireless ad hoc networking has until recently been considered only with omnidirectional antennas. Their advantage is that they are small and compact and radiate power omnidirectionally, i.e., equally in all spatial directions. However, they cause higher interference and block transmissions of other network nodes, significantly reducing the capacity and throughput of the network. In search of a way to increase network capacity and throughput, directional (i.e., beamforming) antennas have been considered. Their most important feature, focusing the energy into specific spatial directions, has proven appealing for providing higher network capacity and greater spatial reuse.

There are some downsides to implementing beamforming antennas for wireless ad hoc networking. Firstly, an antenna array means increased hardware size, as opposed to the small size of wireless gadgets; however, the latest technology allows antenna arrays to be smaller in size, making their implementation easier. Secondly, beamforming antennas must "know" the direction of the intended recipient. Otherwise, they might "miss" and radiate in a nonoptimal
direction. Therefore, additional signal processing algorithms, such as Direction-of-Arrival (DoA) or Angle-of-Arrival (AoA) algorithms, are necessary for achieving optimal performance. Despite these facts, some research papers report significantly larger gains in terms of network throughput when deploying beamforming antennas. However, these gains rely on knowing the neighbors' locations so that each node can properly position its antenna beam ([2], [4], [3]). The location information can be obtained by means of the Global Positioning System (GPS) [6] or AoA or DoA estimation algorithms ([4], [5]). Bearing in mind battery consumption, complexity, and the limited processing capabilities of mobile devices, we believe that these algorithms would overburden the devices and reduce their usage time. Therefore, to reduce the implementation complexity and to simplify communication, we use randomized beamforming [7]: nodes choose the direction of radiation randomly and thus avoid signal processing complexity. This turns out to be a practical approach when no a priori information about the locations of the nodes is available.

With respect to network topology properties, the authors of [7] show that randomized beamforming improves the connectivity of the network. Thanks to the longer links that beamforming antennas provide, it is possible to "build a bridge" between previously isolated subnetworks [7]. In addition, [9] discusses hop distances when randomized beamforming is implemented, showing that the network diameter, as well as the hop distance of a random node pair, is significantly reduced in the case of randomized beamforming. These findings are very interesting considering that randomized beamforming introduces zigzag paths, which might increase the hop distances and lead to slower message dissemination. However, the authors of [7] and [9] do not consider the effects that a Medium Access Control (MAC) layer introduces into the process of message propagation.

The work presented in this paper aims to further investigate the communication features of wireless ad hoc networks with randomized beamforming. In particular, we focus our attention on the time domain of message propagation in these networks. As this type of study was missing in the related papers [9] and [7], our work represents a first step towards a more realistic approach to investigating the time-related features of message propagation in networks implementing randomized beamforming. For comparison purposes, we simulate the IEEE 802.11 MAC protocol with both omnidirectional and beamforming antennas. We use flooding speed as the main performance metric. In addition, we analyze the route discovery process and discuss its impact on message propagation.

The remainder of this paper is organized as follows. Section 2 describes the proposed BeamMAC protocol. Section 3 explains the antenna and link models, as well as the scenarios used in our simulations. In Section 4 we present the time aspects of message propagation; in addition, we discuss the route discovery process and the impact beamforming antennas have on it. Finally, Section 5 concludes the work.
2 Protocol Model
Work done in the field of implementing beamforming antennas for ad hoc networking has resulted in various modifications of the IEEE 802.11 MAC protocol. Some proposals extend the Network Allocation Vector (NAV) into a directional NAV (D-NAV) by keeping directionality-related information [3]. Other proposals implement directional and omnidirectional transmission of Request-To-Send (RTS) and Clear-To-Send (CTS) messages [2]. However, as the 802.11 protocol was designed for omnidirectional transmissions, network performance can deteriorate due to issues specific to directional antennas [8]. Therefore, in order to investigate the time aspects of message propagation in wireless multihop networks, we evaluate a simplified version of the BeamMAC [1] protocol. It gains access to the wireless transmission channel using the following control packets:

– Announcement (ANN)
– Ready-To-Receive (RTR)
– Objection (OBJ)

A node willing to initiate a data transmission must announce it beforehand. For this purpose, it sends an ANN to inform the transmitter’s surroundings of the forthcoming transmission. In other words, each desired transmission is “simulated” before being carried out. If the intended destination of the communication, for which the ANN packet is meant, is idle (i.e., not transmitting or receiving), it transmits an RTR packet back to the transmitter. The idea here is to inform the transmitter that the desired addressee is available. Upon reception of an ANN, each neighbor currently engaged in a parallel communication as a receiver determines the interference that would be caused by the forthcoming transmission.
Fig. 1. BeamMAC Channel Access (exchange of ANN, RTR, OBJ, DATA, and ACK among the source, the destination, and other receiving stations)
If the interference is high enough to degrade the ongoing communication, the receiver sends an OBJ back to the sender of the ANN. If the level of interference remains acceptable, the receiver does not send an OBJ back. In case an OBJ is received, the node enters a backoff state. Details of how an ANN packet is assessed by a receiving node are not discussed in this paper. Instead, we refer the interested reader to [1]. When the transmission is successfully “simulated” (ANN and RTR are sent, and no OBJ is received), the actual data packet can be sent. Upon error-free reception of the data packet, the receiving node transmits an acknowledgment (ACK) back.
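To make the channel-access rules concrete, the following Python sketch condenses the receiver- and transmitter-side logic of the ANN/RTR/OBJ handshake. It is only an illustration of the behavior described above, not the BeamMAC specification: the node states, the predicted-interference input, and the tolerance threshold are hypothetical placeholders.

    from dataclasses import dataclass

    @dataclass
    class Node:
        id: int
        state: str = "idle"           # "idle", "receiving", or "transmitting"
        tolerable_db: float = 10.0    # hypothetical interference margin

    def on_ann(node, dest_id, predicted_interference_db):
        """Receiver-side reaction to an ANN; returns the control packet
        this node answers with, or None (it stays silent)."""
        if dest_id == node.id and node.state == "idle":
            return "RTR"              # intended destination is available
        if node.state == "receiving":
            if predicted_interference_db > node.tolerable_db:
                return "OBJ"          # announced transmission would degrade us
        return None

    def may_transmit(replies):
        """Transmitter side: DATA may follow only if the 'simulated'
        transmission drew an RTR and no OBJ; on OBJ the node backs off."""
        return "RTR" in replies and "OBJ" not in replies

    neighbors = [Node(1, "idle"), Node(2, "receiving")]
    replies = [r for n in neighbors
               if (r := on_ann(n, dest_id=1, predicted_interference_db=3.0)) is not None]
    print(may_transmit(replies))      # True: destination idle, nobody objects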
3 Network Model

3.1 Antenna Model
Antenna arrays, used in Multiple Input Multiple Output (MIMO) systems to increase the user data rate, can have different shapes, most prominently linear or circular. Antennas with linear geometry are referred to as linear antenna arrays, whereas antennas with circular geometry are known as circular antenna arrays. Circular antenna arrays offer higher diversity and improved link capacity ([17], [18]). Thus, the antenna model used in our simulations is the Uniform Circular Array (UCA) [7]. A UCA comprises m identical isotropic radiators placed uniformly on a closed circumference. Each antenna element transmits with the same power pt/m at a wavelength λ = c/f, with c = 3·10⁸ m/s and carrier frequency f. By implementing a phase shift between the array elements, the resulting antenna beam pattern can be controlled. The shape of the resulting beam depends on the target direction Θb, known as the boresight direction, and on the number of antenna elements. Examples of antenna patterns for a UCA with m = 4 elements are shown in Fig. 2.
Fig. 2. Gain patterns of UCA with m = 4 elements: (a) Θb = 0°, (b) Θb = 30°, (c) Θb = 70°
In general, an antenna pattern consists of a main lobe and side lobes. The main lobe represents the radiation in the desired direction, whereas the side lobes refer to the radiation in all other directions. It is known from antenna theory [10] that with an increase in the number of elements in the antenna array, the radiated power in the direction of the main lobe increases. Note that due to antenna reciprocity, the gain characteristic is valid for both transmission and reception.
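As an illustration of how such patterns arise, the short sketch below evaluates a generic phased-UCA gain pattern numerically. This is a textbook-style array-factor computation rather than the exact formulas used in the simulator; in particular, the array radius (given relative to λ) is an assumed parameter.

    import numpy as np

    def uca_gain(theta, theta_b, m=4, radius_over_lambda=0.25):
        """Power gain of a uniform circular array of m isotropic elements,
        phase-steered to the boresight direction theta_b (azimuth plane)."""
        k_r = 2 * np.pi * radius_over_lambda        # k*r with k = 2*pi/lambda
        phi = 2 * np.pi * np.arange(m) / m          # element positions on the circle
        af = np.sum(np.exp(1j * k_r * (np.cos(theta[:, None] - phi)
                                       - np.cos(theta_b - phi))), axis=1)
        return np.abs(af) ** 2 / m                  # normalized per-element power

    theta = np.linspace(0, 2 * np.pi, 361)
    g = uca_gain(theta, theta_b=np.deg2rad(30))
    print(g.max())   # main-lobe gain equals m at the boresight direction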
3.2 Wireless Link Model
The wireless link model is based on a line-of-sight communication between two nodes, given their transmission parameters and their distance. Figure 3 depicts the implemented link model.
Fig. 3. Wireless link model (transmitter with power pt and gain gt, receiver with gain gr and received power pr, at distance d)
One node transmits the signal with power pt, which is received by the other node with power pr. The gain of the antenna at the transmitting node is gt. The gain of the receiver's antenna in the corresponding direction toward the transmitter is gr. Thus, we can write

$$\frac{p_r}{p_t} = g_t\, g_r \left(\frac{d}{1\,\mathrm{m}}\right)^{-\alpha}, \qquad (1)$$

where α represents the pathloss exponent of the propagation environment. The value of α is environment-dependent; it is approximately α = 2 for a free-space scenario and α = 3…5 for urban areas [10]. The establishment of a link between two nodes requires that the received power pr be above the receiver sensitivity pr0, that is,

$$p_r \ge p_{r0}. \qquad (2)$$

In the following, we assume that all nodes have the same transmission power pt and reception sensitivity pr0. Thus, considering the fact that antenna pattern reciprocity holds (same antenna pattern for transmission and reception), all links in the network can be considered bidirectional (or undirected) links. That is, if a node A can communicate with node B, then node B can communicate with node A as well. One should note that our simulation model does not implement propagation phenomena like fading.
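A direct transcription of Eqs. (1) and (2) shows how the link model and the resulting transmission range can be evaluated. The numeric values in the example are illustrative placeholders (linear units, unity antenna gains), not the calibrated parameters of the simulator.

    def received_power(pt, gt, gr, d, alpha):
        """Eq. (1): pr = pt * gt * gr * (d / 1 m)^(-alpha), with d in meters."""
        return pt * gt * gr * d ** (-alpha)

    def link_exists(pt, gt, gr, d, alpha, pr0):
        """Eq. (2): a (bidirectional) link exists iff pr >= pr0."""
        return received_power(pt, gt, gr, d, alpha) >= pr0

    def max_range(pt, gt, gr, alpha, pr0):
        """Largest distance satisfying Eq. (2), obtained by inverting Eq. (1)."""
        return (pt * gt * gr / pr0) ** (1.0 / alpha)

    # Illustrative values only:
    print(max_range(pt=0.1, gt=1.0, gr=1.0, alpha=3, pr0=5e-8))   # ~126 m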
3.3 Randomized Beamforming
As mentioned in Section 1, in order to avoid implementing complex signal processing algorithms, we use a communication paradigm referred to as randomized beamforming [7]. Its implementation is based on nodes choosing a random direction in which to point their antenna beams. By choosing both the boresight direction and the antenna array orientation uniformly at random in the interval [0, 2π], the shape of the resulting pattern is fully described. In addition, all nodes keep their beam direction constant the whole time, i.e., once chosen it does not change.
3.4 Network Topology and Scenarios
The network topology in our simulations comprises n nodes distributed uniformly at random on a square area with side length l. For obtaining the node coordinates
(x, y) we use the Mersenne Twister pseudo-random number generator [11]. Nodes in the network are static. We use a simple flooding mechanism to disseminate messages into the network: one node sends a packet, and all other nodes forward the packet until it has been received by every node. To model reality more closely, we borrow parameters from the IEEE 802.11 standard [12]. Namely, the parameters used in our simulations are: frequency f = 2 GHz, pathloss exponent α = 3 (urban area), maximum transmission power pt = 0.1 W, communication threshold Pr0 = −111 dB, sensitivity threshold (for 802.11) Ps0 = −121 dB, and transmission range for the omnidirectional antenna Tx-Range = 121 m. In addition, we simulate the 802.11 protocol only with omnidirectional antennas, whereas the BeamMAC protocol is simulated with both omnidirectional and UCA antennas (specifically UCA with m = 4, denoted UCA4, and with m = 10, denoted UCA10).

Table 1. Network scenarios

Network size    n      l (m)   Area (km²)
Small           100    577     0.33
Medium          500    1290    1.66
Large           2000   2580    6.65
We conduct our simulations with node density n/l² = 300 km⁻². In order to obtain at least 95% connectivity in the network, we use calculations taken from [13] and compute the area for a given number of nodes so as to guarantee the required connectivity. In fact, the connectivity in the network is above 99%. In order to perform a thorough analysis of the message dissemination process, we consider three scenarios in our simulations [15]. Namely, we use different network sizes, i.e., small, medium, and large, as we want to clearly investigate any connection between the size of the network and the performance of both protocols. The number of nodes for each network scenario is 100, 500, and 2000 nodes, respectively, and the network area is 0.33 km², 1.66 km², and 6.65 km², respectively. The simulated scenarios (number of nodes, side length, network area) are summarized in Table 1.
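A minimal sketch of the scenario setup just described: n static nodes placed uniformly at random on the l × l square, each with a fixed random boresight direction (Sect. 3.3). Python's random module happens to use the Mersenne Twister generator, matching [11]; the dictionary layout is an arbitrary choice for illustration.

    import math, random

    def make_scenario(n, l, seed=1):
        """Uniform random node placement plus randomized beamforming state."""
        rng = random.Random(seed)     # Mersenne Twister, as in [11]
        return [{
            "x": rng.uniform(0, l),
            "y": rng.uniform(0, l),
            "boresight": rng.uniform(0, 2 * math.pi),  # fixed for the whole run
        } for _ in range(n)]

    small = make_scenario(100, 577)   # Table 1: small network, 300 nodes/km^2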
4 Simulation Analysis
For the purpose of conducting the analysis, we apply a protocol-driven ad hoc network simulation tool (PANTS). It is an event-based simulator developed in C++, which incorporates realistic models for beamforming antenna patterns, calculated using the formulas provided by antenna theory [10], and which implements the two investigated protocols, IEEE 802.11 and BeamMAC. For visualizing the network topology and the distribution of the nodes in it, our simulation tool uses the Library of Efficient Data Types and Algorithms (LEDA) [14].
With respect to the simulated parameters, we were interested in the flooding speed, the flooding time, and the route reply time. The first describes the speed with which a message propagates in the network, in terms of how many nodes have received the message by a certain time instant. The second parameter tells us how long it takes to flood a message through the network. Finally, the third parameter helps us better understand the route reply process in networks implementing randomized beamforming. To be compliant with the protocol specifications, for 802.11 we implement broadcasting as defined in the IEEE 802.11 standard [12] (without the RTS/CTS control handshake), and we omit the ACK packet in the BeamMAC implementation. In addition, in the route reply process we make use of the 802.11 DCF function (RTS/CTS scheme) and all of the BeamMAC protocol functions. For accuracy, we run a large number of simulations for every scenario: for the small and medium networks we use 200 runs, and for the large network we use 100 runs (due to large memory consumption). In addition, in our graphs we give a confidence interval of one sigma (σ being the standard deviation), which corresponds to a confidence level of 68.27%.
4.1 Flooding Speed
We define the flooding speed as a curve that gives the percentage of flooded nodes as a function of time. We calculate the number of flooded nodes on every packet transmission, and the result (nodes, time) is represented as a point of the flooding speed curve. Figures 4–6 depict the results we obtained for the scenarios given in Table 1.
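The flooding-speed curve, and the flooding time derived from it in Sect. 4.2, can be computed from a log of first-reception events as sketched below; the event-log format is an assumption made for illustration.

    def flooding_speed_curve(events, n):
        """One (time, percentage of flooded nodes) point per first reception."""
        flooded, curve = set(), []
        for time, node in sorted(events):
            flooded.add(node)
            curve.append((time, 100.0 * len(flooded) / n))
        return curve

    def flooding_time(curve):
        """Time instant at which the curve first reaches 100% (Sect. 4.2)."""
        return next(t for t, pfn in curve if pfn >= 100.0)

    curve = flooding_speed_curve([(0.0, 1), (0.02, 2), (0.05, 3)], n=3)
    print(flooding_time(curve))       # 0.05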
Fig. 4. Flooding speed for 802.11 and BeamMAC protocol, small network (n = 100, node density 300 nodes/km²)
Figure 4 gives the results for the small network scenario (n = 100). It compares several combinations of antenna and MAC protocol, namely the 802.11 protocol (implemented with an OMNI antenna) and the BeamMAC protocol implemented with directional antennas (UCA4 and UCA10) as well as an OMNI antenna. The obtained flooding speed curves are close to one another, which means that the advantages of the beamforming antennas are hardly noticeable.
There is a minor difference in the Percentage of Flooded Nodes (PFN) between BeamMAC and 802.11 for a fixed value of the time, which is about 1%–5% for BeamMAC-UCA4 and BeamMAC-UCA10, respectively; the BeamMAC-UCA10 scheme has the highest flooding speed. The modest gains observed in the small network scenario are due to the impact of border effects on the message propagation. Namely, because of the random beamforming and the fact that the network area is relatively small, many nodes transmit outside the area. This is more pronounced in the beamforming case, as the transmission range of beamforming antennas is up to several times larger than that of the omnidirectional antennas.
Fig. 5. Flooding speed for 802.11 and BeamMAC protocol, medium network (n = 500, node density 300 nodes/km²)

Fig. 6. Flooding speed for 802.11 and BeamMAC protocol, large network (n = 2000, node density 300 nodes/km²)
The medium network scenario shown in Fig. 5, as well as the large network scenario in Fig. 6, shows an improvement in the message propagation for beamforming antennas. In particular, the larger the network size, the better the flooding speed of the beamforming antennas. In the large network, the difference between BeamMAC-UCA4 and 802.11 is more than 20%, which makes the UCA4 antenna a sensible and practical solution. The impact that border effects have on the message propagation is minor here, as a larger portion of the nodes radiate inside the network area and fewer transmissions are void.
It can be noted from Figures 4–6 that the combination of the BeamMAC protocol and an omnidirectional antenna (BeamMAC-OMNI) exhibits the poorest performance. The main reason for this inferior behavior is protocol-related: the BeamMAC protocol takes longer to access the wireless channel (ANN and RTR) than the 802.11 protocol (only DIFS – Distributed Inter Frame Space). This has proven to be the reason for the slower flooding speed of the BeamMAC-OMNI implementation.
4.2 Flooding Time
The flooding time is defined as the time it takes to flood the whole network. It can be derived from the flooding speed curves as the time instant at which the percentage of flooded nodes reaches 100 (i.e., the whole network is flooded). We show the values obtained from our simulations in Figure 7. The flooding time can be understood as a follow-up to the flooding speed analysis. In the small network scenario there is scarcely a difference between the omnidirectional and the directional antenna implementations. However, as the network size increases, the flooding time of the directional antenna implementations keeps decreasing relative to that of the omnidirectional implementations, meaning that beamforming antennas dominate in larger network areas. The reduction of the flooding time is about 20%–30% (large network scenario) for BeamMAC-UCA4 and BeamMAC-UCA10, respectively.
Fig. 7. Flooding time for 802.11 and BeamMAC protocol (time in seconds versus number of nodes, node density 300 nodes/km²)
4.3 Route Reply (RREP) Time
In addition to the flooding mechanism analyzed in Section 4.1, we implemented and analyzed a route reply mechanism as well. We believe that this parameter gives more insight into the time it takes two nodes to establish a path. The initiator of the route discovery (i.e., the source) broadcasts a packet to the destination, which, after receiving it, generates and sends a route reply packet back to the source. The route reply packet is not broadcast, but rather sent using
hop-by-hop unicasting. The route reply mechanism is based on a source routing concept, i.e., the reply packet follows the same path on which the route request arrived.
Fig. 8. Route reply time for 802.11 and BeamMAC protocol (time in seconds versus number of nodes, node density 300 nodes/km²)
As Figure 8 shows, the network size affects the route discovery process as well. On the one hand, considering the small network scenario, we confirm that there is only an insignificant reduction of the route reply time with beamforming antennas. On the other hand, considering the large network scenario, we notice a superior performance of the BeamMAC protocol: the route reply time is reduced by about 30%–40% in the case of BeamMAC-UCA4 and BeamMAC-UCA10, respectively. In line with the previous analysis of the flooding speed, we can see here that faster message propagation ensures quicker delivery to the desired destination, which in turn helps in achieving better route reply times.
5 Conclusion
Our work outlines the positive impact that beamforming antennas have on information dissemination in wireless multihop networks. Our study is performed by simulating realistic network scenarios, considering the very popular 802.11 MAC protocol and a simplified version of the BeamMAC protocol. In addition, we use parameters adopted directly from the IEEE 802.11 standard [12]. The results presented in this paper represent a performance comparison of the broadcast mechanism in multihop networks. Together with the findings in [7] and [9], we provide a thorough analysis of the parameters related to the network topology as well as to the message dissemination in these networks. Although our approach includes a simple flooding model and a simple source-routing-based route discovery process, we show that the beamforming antennas outperform the omnidirectional antennas: they provide faster message dissemination and a faster route discovery process. In addition, we identify their downsides (e.g., void transmissions) when implemented in small network scenarios. However,
in large network area scenarios the beamforming antennas have proven to be superior. This approach can be regarded as a worst-case analysis for the situation in which no other network information is available. As soon as more topology-related information becomes available, nodes can use sophisticated location-finding algorithms to adapt or optimize their radiation direction. Moreover, this particular study can be of further help in service discovery scenarios in wireless multihop networks: by simply defining a threshold at which a service announcement or service query is considered successful (e.g., 80% of nodes receive the packet), we can read from Figures 4–6 the time it takes for a certain service to be properly advertised. We strongly believe that the implementation of beamforming antennas has enormous potential in wireless multihop networks. Therefore, a more in-depth analysis is required, which will investigate the actual routing and will look into issues related to cross-layer optimization.
Acknowledgment

The authors would like to thank Imad Aad from DoCoMo Euro-Labs and Christian Bettstetter (previously with DoCoMo, now with the University of Klagenfurt) for various discussions and very useful comments.
References

1. R. Vilzmann, C. Bettstetter, C. Hartmann: BeamMAC: A New Paradigm for Medium Access in Wireless Networks, International Journal of Electronics and Communications (AEÜ), Volume 60, Number 1, (Jan. 2006) 3–7.
2. M. Takai, J. Martin, A. Ren, R. Bagrodia: Directional Virtual Carrier Sensing for Directional Antennas in Mobile Ad Hoc Networks, In Proc. 3rd ACM MobiHoc, Switzerland, (June 2002) 183–193.
3. R. Roy Choudhury, N. H. Vaidya: Ad Hoc Routing Using Directional Antennas, Technical Report, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign (Aug. 2002).
4. H. Singh, S. Singh: A MAC Protocol based on Adaptive Beamforming for Ad Hoc Networks, In Proc. IEEE PIMRC, China, (Sept. 2003) 1346–1350.
5. H. Singh, S. Singh: Smart-802.11b MAC Protocol for Use with Smart Antennas, In Proc. IEEE ICC, France, Volume 6, (June 2004) 3684–3688.
6. Y.-B. Ko, V. Shankarkumar, N. H. Vaidya: Medium Access Control Protocols using Directional Antennas in Ad hoc Networks, In Proc. IEEE Infocom, Israel, (March 2000) 13–21.
7. C. Bettstetter, C. Hartmann, C. Moser: How Does Randomized Beamforming Improve the Connectivity of Ad Hoc Networks?, In Proc. IEEE ICC, Korea, (May 2005) 3380–3385.
8. R. Vilzmann, C. Bettstetter: A Survey on MAC Protocols for Ad Hoc Networks with Directional Antennas, In Proc. EUNICE Open European Summer School, Spain, (July 2005) 268–274.
9. R. Vilzmann, C. Bettstetter, D. Medina, C. Hartmann: Hop Distances and Flooding in Wireless Multihop Networks with Randomized Beamforming, In Proc. ACM MSWIM, Canada, (Oct. 2005) 20–27.
10. Constantine A. Balanis: Antenna Theory, Analysis and Design, John Wiley & Sons, Inc., 2nd Edition, (1997).
11. M. Matsumoto, T. Nishimura: Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator, ACM Transactions on Modeling and Computer Simulation, Volume 8, Number 1, (1998) 3–30.
12. IEEE Standards Board: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, (1999), http://www.ieee802.org/11/.
13. C. Bettstetter: Mobility Modelling, Connectivity, and Adaptive Clustering in Ad Hoc Networks, PhD Thesis, Technische Universität München, Germany, (Oct. 2003).
14. Algorithmic Solutions: LEDA – Library of Efficient Data Types and Algorithms, (Dec. 2005), http://www.algorithmic-solutions.com/enleda.htm.
15. V. Mizorov: Routing in Ad Hoc Networks with Beamforming Antennas, Master Thesis, RWTH Aachen University, (Dec. 2005).
16. T. Korakis, G. Jakllari, L. Tassiulas: A MAC protocol for full exploitation of directional antennas in ad hoc wireless networks, In Proc. 4th ACM MobiHoc, USA, (June 2003) 98–107.
17. W. Weichselberger, G. L. de Echazarreta: Comparison of Circular and Linear Antenna Arrays with respect to the UMTS Link Level, COST 260 Management Committee and Working Groups Meeting, Göteborg, Sweden, (May 2001).
18. N. Razavi-Ghods, M. Abdalla, S. Salous: Characterisation of MIMO Propagation Channels Using Directional Antenna Arrays, In Proc. IEE International Conference on 3G Mobile Communication Technologies (3G2004), London, UK, (October 2004).
Power Amplifier Characteristic-Aware Energy-Efficient Transmission Strategy Kwanghun Han, Youngkyu Choi, Sunghyun Choi, and Youngwoo Kwon School of Electrical Engineering and INMC Seoul National University, Seoul, Korea {khhan,ykchoi}@mwnl.snu.ac.kr, {schoi,ykwon}@snu.ac.kr
Abstract. The energy consumed in transmitting an information bit, i.e., the energy-per-bit, has been known to decrease monotonically as the transmission time increases [1]. However, when the power amplifier (PA) characteristics are taken into account, we learn that the energy-per-bit starts to increase once the transmission time grows beyond a certain threshold. This is caused by the fact that, in a wireless device, it is not the transmission power that determines the energy consumed during transmissions, but the input power to the PA, whose output power is used as the transmission power. Based on our new trade-off model between the energy-per-bit and the transmission time, we revisit known energy-efficient scheduling algorithms. Finally, we evaluate the impact of the new trade-off model and the performance of the algorithms via simulations.
1 Introduction
For battery-powered hand-held mobile devices, it is a key concern to reduce the energy consumption in order to extend the device's lifetime. Since the wireless communication module is a major source of the overall energy consumption in such devices [2, 3], many studies in the literature have tried to minimize the energy consumption by turning off the unused parts of the wireless communication module after finishing packet transmission as quickly as possible [4, 5, 6]. In the meantime, the authors in [1, 7] proposed another approach, which reduces the transmission energy by controlling the transmit power when sending a given amount of data. They showed, based on Shannon's capacity equation, that the energy-per-bit monotonically decreases as the transmission time¹ increases. According to this trade-off relation, the authors proposed the Lazy scheduling algorithm, which minimizes the transmission energy by sending an information bit as slowly as possible; this is accomplished by using low-order modulation, a low code rate for channel coding, and low transmission power.
This research is in part supported by the University IT Research Center (ITRC) project and by the SNU-Samsung 4G collaboration project.
¹ Here, the exact meaning of the transmission time is the inverse of the spectral efficiency. Provided that it is not confusing, we will hereafter simply use this term with this meaning.
However, their model [1] accounts only for the transmission energy, while circuitry typically keeps consuming energy in order to maintain the transmission module consistently active even when there is no ongoing packet transmission. The power required by the circuitry is called electronics power, and it is almost constant irrespective of the transmission power. When considering the effect of electronics power, Yu et al. showed that the energy-per-bit no longer decreases monotonically with the transmission time, especially when the electronics power is comparable with the transmit power [8]. Such a situation can indeed happen when the transmission range is extremely small, as in sensor networks, while the transmit power dominates the electronics power in most wireless communication systems. Shuguang et al. verified through in-depth circuit-level analysis [9] that such a trade-off relation is really present in the context of sensor networks. Based on this trade-off model, the authors in [8] proposed a packet scheduling algorithm, which aims at minimizing the energy dissipation in multi-hop wireless sensor networks.

Then, in long-range communication, e.g., a conventional cellular network, where the transmit power dominates the electronics power, is it still reasonable to think that the energy-per-bit monotonically decreases as the transmission time increases? Our major finding, in this paper, is that even if the electronics power is negligibly small compared with the transmit power (i.e., long-range communications), the energy-per-bit does not monotonically decrease as the transmission time increases when the power amplifier (PA) characteristics are considered. In a strict sense, we should note that the actual energy consumption is caused not by the transmission energy but by the energy consumed by the PA to generate the transmit power. Accordingly, we revisit the energy-efficient scheduling problems considered in [1, 10] based on our new trade-off model. We expect that if the algorithms derived in this paper are used for uplink scheduling, especially in a cellular network, the lifetime of battery-powered devices could be extended.

The rest of this paper is organized as follows: In Sect. 2, we derive the trade-off model between the energy-per-bit and the transmission time considering the PA characteristics. In Sect. 3, we formulate the energy-efficient scheduling problems, and then present not only two offline algorithms, called modified Lazy scheduling and modified Move Right scheduling, but also the online version of each offline algorithm. In Sect. 4, we evaluate the performance of the algorithms via simulation. Finally, in Sect. 5, we conclude the paper with some remarks on future work.
2 Revised Trade-Off Model
The PA efficiency is defined as the ratio of the output power to the power provided by the direct current (DC) voltage source. Since the PA output power is equivalent to the transmit power in the context of a communication system, we denote the output power by Ptx, as shown in Fig. 1.
Fig. 1. A diagram of a power amplifier (input power Pin, output power Ptx, fed by a DC voltage source)
Let us denote the power efficiency by η(Ptx) (< 1), which is typically non-decreasing with Ptx [11, 12]. Then Ppa, which represents the actual power consumed by the PA, is given by

$$P_{pa} = \frac{P_{tx}}{\eta(P_{tx})}, \quad 0 \le P_{tx} \le P_{tx,max}, \qquad (1)$$
where Ptx,max is the maximum output power, a design parameter of a PA.² Due to the dependence of η on Ptx, (1) implies that as we reduce Ptx by Δ, the decrement of Ppa is not linearly proportional to Δ, but is altered by the operating range of Ptx. Under an Additive White Gaussian Noise (AWGN) channel with noise power N, the optimal channel coding gives Shannon's capacity as follows:

$$C = \frac{1}{2}\log_2\left(1 + \frac{\alpha P_{tx}}{N}\right), \qquad (2)$$
where α represents the power loss, due mainly to the path loss. Although the actual transmission rate should be less than C, for the convenience of discussion we just regard C as the achievable rate. When we denote the time necessary to transmit one information bit by t = 1/C, the energy-per-bit, Er(t), is given by

$$E_r(t) = t\,P_{pa} = t\,\frac{P_{tx}}{\eta(P_{tx})}, \qquad P_{tx} = \frac{N}{\alpha}\left(2^{2/t} - 1\right). \qquad (3)$$
If η(Ptx) is simply modeled by a constant, Er(t) is monotonically decreasing and convex in t, as shown in [1, 7]. Now, using the power efficiency model proposed in [13], we replace η(Ptx) with

$$\eta(P_{tx}) = \eta_{max}\sqrt{\frac{P_{tx}}{P_{tx,max}}}, \qquad (4)$$
² In fact, the additional power provided by the PA amounts to Ptx − Pin, and hence those who are interested in the PA performance itself typically use a metric called power-added efficiency (PAE), which is defined as (Ptx − Pin)/Ppa. Since the PA gain, defined as Ptx/Pin, typically ranges from 20 to 30 dB, Ptx − Pin ≈ Ptx, and hence the PAE can be thought of as nearly equal to the power efficiency.
where ηmax is the maximum PA efficiency, achieved when Ptx = Ptx,max. Accordingly, we obtain a new energy-per-bit Er(t) as follows:

$$E_r(t) = \frac{\sqrt{P_{tx}\,P_{tx,max}}}{\eta_{max}}\,t. \qquad (5)$$

We present two theorems, which show the properties of the trade-off relation between Er(t) and t. For the purpose of comparison, we denote by Ec(t) the energy-per-bit obtained when the power efficiency is considered constant, e.g., η = ηmax, as in the literature.

Theorem 1. Er(t) is neither monotonically decreasing nor convex in t.

Proof. It can be shown that as t becomes much smaller than 1, $E_r(t) \sim 2^{1/t}t$ and $E_c(t) \sim 2^{2/t}t$. Therefore, Er(t) is also monotonically decreasing and convex in t, as Ec(t) is. On the other hand, as t approaches infinity, Ec(t) approaches a constant, i.e., $\frac{2N\ln 2}{\alpha\,\eta_{max}}$, while $E_r(t) \sim \sqrt{t}$. Thus, Er(t) becomes an increasing and concave function in the region of large t. Since Er(t) changes from a decreasing convex function to an increasing concave function as t goes from 0 to infinity, we can conclude that there exist both a minimum and an inflection point of Er(t).

Theorem 2. When we denote the values of t yielding the minimum and inflection points of Er(t) by t* and t_o, respectively, t* < t_o.

Proof. From (5), $E_r(t) = \beta\sqrt{2^{2/t}-1}\;t$, where $\beta = \frac{\sqrt{N P_{tx,max}}}{\sqrt{\alpha}\,\eta_{max}}$. Differentiating Er(t) with respect to t, we obtain $\frac{dE_r}{dt} = \beta\,(2^{2/t}-1)^{-1/2}\,g(t)$, where $g(t) = \left(1 - \frac{\ln 2}{t}\right)2^{2/t} - 1$. Since Er(t) becomes an increasing function when t is large, g(t) < 0 when t < t*, and g(t) > 0 when t > t*. Differentiating dEr/dt again, we obtain $\frac{d^2E_r}{dt^2} = \beta\,(2^{2/t}-1)^{-3/2}\,2^{2/t}\,\frac{2\ln 2}{t^3}\,k(t)$, where $k(t) = (2\ln 2 - 1)\,2^{2/t} - 2\ln 2$. From $k(t_o) = 0$, $t_o = 2/\log_2\!\left(\frac{2\ln 2}{2\ln 2 - 1}\right) \approx 1.0849$. Since $g(t_o) \approx 0.296 > 0$, t* must be less than t_o. Indeed, a numerical solution of g(t) = 0 yields t* ≈ 0.8699, and hence t* < t_o is verified.
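The two characteristic points of Er(t) can be checked numerically. The short sketch below reproduces the values quoted in the proof: t* from a bisection on g(t) and t_o from the stated closed form. Since neither point depends on β, we set β = 1.

    import math

    ln2 = math.log(2.0)

    def g(t):   # sign of dEr/dt, from the proof of Theorem 2
        return (1.0 - ln2 / t) * 2.0 ** (2.0 / t) - 1.0

    lo, hi = 0.7, 1.0                 # g(lo) < 0 < g(hi)
    for _ in range(60):               # bisection for the minimizer t*
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    t_star = 0.5 * (lo + hi)

    t_o = 2.0 / math.log2(2.0 * ln2 / (2.0 * ln2 - 1.0))   # closed form above
    print(t_star, t_o)                # ~0.8699 and ~1.0849, as stated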
Figure 2 plots both Ec(t) and Er(t). Referring to the parameters of the RF2162 PA [13], we set Ptx,max to 1.41 W and ηmax to 0.5. Assuming that the received signal-to-noise ratio (SNR), αPtx/N, ranges from 0.11 dB to 20 dB, we set α appropriately so that Ptx does not exceed Ptx,max at the highest SNR (i.e., 20 dB). Obviously, we see that Er(t) becomes quite different from Ec(t). This observation tells us that energy-efficient scheduling algorithms should be revisited based on the new trade-off model. This motivation differentiates our work from other approaches [7, 8], because the trade-off model that considers only the effect of electronics power accounts just for short-range communication as in sensor networks, while the electronics power can be ignored in the energy consumption of many other communication systems.
Fig. 2. Energy-per-bit (watt·symbols) versus transmission time-per-bit (symbols) for Er(t) and Ec(t)
3 Energy Efficient Uplink Scheduling
In this section, we consider the energy-efficient uplink scheduling problem in cellular systems. Based on our trade-off model, we derive offline and online algorithms.
3.1 Problem Definition
In cellular systems, most subscriber stations are battery-powered while the base station is AC-powered. Accordingly, the energy efficient transmission strategy can be beneficial, especially for uplink transmissions. In order to highlight the impacts due to our new trade-off model, we restrict the discussion to the scenario of a point-to-point communication. Let us denote by w(τ ) the amount of energy required to transmit a packet over a time duration τ by subscriber stations. We assume that M packets should be transmitted within time [0, T ), which can be considered the time allocated to a subscriber station by the scheduler. For the purpose of analytical simplicity, all packets are assumed to be of equal size. We denote the arrival time of the ith packet by ti and the packet interarrival time by di = ti+1 − ti . Without loss of generality, the first packet is assumed to arrive at time zero, i.e., t1 = 0. The scheduler determines both the packet transmission duration τ = (τ1 , · · · , τM ) and the transmission start time s = (s1 , · · · , sM ). Based on this system description and assumption, we formulate an energy-efficient scheduling problem constrained by the total allowed transmission time for a group of packets, which is equivalent to the problem considered in [1].
Problem 1. Find the scheduling, τ and s, which minimizes the total transmission energy:

$$\min\; W(\tau) = \sum_{i=1}^{M} w(\tau_i)$$
$$\text{s.t.}\quad t_i \le s_i,\;\forall i \in \{1,\dots,M\}, \qquad s_M + \tau_M \le T.$$

For notational purposes, we denote this problem by ESP-I. As shown in Problem 1, the transmission of the last, Mth, packet should be completed within time T. However, since Problem 1 does not take care of the delay that an individual packet would experience, if T is quite large, the scheduling result may not be desirable for some types of traffic. Note that packets often have a transmission delay bound, and hence packets exceeding a certain delay bound would be considered a delivery failure or discarded. This is quite typical for packets of real-time multimedia applications. Therefore, we consider another formulation of energy-efficient scheduling with constraints on the per-packet delay bound, and denote the problem by ESP-II. Given a vector of per-packet delay bounds q = (q1, · · · , qM), where qi represents the maximally allowed transmission time for the ith packet, ESP-II can be written as follows:

Problem 2. Find the scheduling, τ and s, which minimizes the total transmission energy:

$$\min\; W(\tau) = \sum_{i=1}^{M} w(\tau_i)$$
$$\text{s.t.}\quad t_i \le s_i,\;\forall i \in \{1,\dots,M\}, \qquad s_i + \tau_i \le q_i,\;\forall i \in \{1,\dots,M\}.$$
3.2 Offline Algorithm
Modified Lazy Scheduling Algorithm. The Lazy scheduling algorithm was originally developed to solve ESP-I, and it satisfies optimality and feasibility (see the detailed algorithm and proofs in [1]). It achieves a reduction of the transmission energy by lowering the transmission power and increasing the transmission time as much as possible under a strictly convex and monotonically decreasing energy-per-bit function. Since the energy-per-bit function is changed in our model, we develop a modified Lazy scheduling algorithm, which is an extended version of Lazy scheduling. In order to devise it, we consider two lemmas induced from the trade-off relationship described in Section 2.

Lemma 1. There exists a packet transmission time τmax which minimizes the transmission energy consumption.

Proof. From Theorem 2, there exists a transmission time-per-bit t* which yields the minimum energy-per-bit value. Accordingly, we can obtain the packet
transmission time τmax, which minimizes the energy consumption needed to transmit a packet, by multiplying the packet size and t* together.

Lemma 1 tells us that we do not need to consider the range of τ longer than τmax in the optimization problem. The reason is that using τ (> τmax) helps neither in minimizing the energy consumption nor in satisfying the delay constraint.

Lemma 2. When the range of τ in which we are interested is upper-bounded by τmax, w(τ) can be regarded as a convex function of τ.

Proof. According to Theorem 2 in Sect. 2, Er(t) is convex when the transmission time-per-bit t is less than t*, since t* is less than the inflection point t_o. Therefore, w(τ) is also convex in τ when τ < τmax.

Our modified Lazy scheduling algorithm consists of two parts. The first part is called the minimum energy transmission part, where the scheduler finds the packets that can be transmitted with the minimum-energy packet transmission time τmax. The second part is called the legacy Lazy scheduling part, where Lazy scheduling is conducted for the set of the remaining packets. The modified algorithm is summarized in Algorithm 1.

Algorithm 1. Modified Lazy Scheduling
  s_i ⇐ t_i, i ∈ {1, ..., M}, and s_{M+1} ⇐ T
  t_sum ⇐ s_1 and s_max ⇐ 0
  for i = 1 to M do
    t_sum ⇐ t_sum + τ_max
    if t_sum ≤ s_{i+1} then
      s_max ⇐ i and t_sum ⇐ s_{i+1}
  for i = 1 to s_max do
    s_{i+1} ⇐ max{s_{i+1}, s_i + τ_max} and τ_i ⇐ τ_max
  if s_max < M then
    Do Lazy scheduling beginning from the (s_max + 1)th packet
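For clarity, here is a direct Python transcription of the first (minimum energy transmission) part of Algorithm 1; the legacy Lazy scheduling part is the algorithm of [1] and is left out. Zero-based list indexing and the sentinel s_{M+1} = T follow the pseudocode above.

    def modified_lazy_prefix(t, T, tau_max):
        """Find the prefix of packets that can all be sent with duration
        tau_max; the remaining packets go to legacy Lazy scheduling [1]."""
        M = len(t)
        s = list(t) + [T]             # s[M] = T acts as the sentinel deadline
        t_sum, s_max = s[0], 0
        for i in range(M):            # does packet i+1 still fit with tau_max?
            t_sum += tau_max
            if t_sum <= s[i + 1]:
                s_max, t_sum = i + 1, s[i + 1]
        for i in range(s_max):        # fix start times of the tau_max prefix
            s[i + 1] = max(s[i + 1], s[i] + tau_max)
        return s_max, s[:M], [tau_max] * s_max

    print(modified_lazy_prefix(t=[0.0, 0.1, 0.5], T=1.0, tau_max=0.2))
    # (3, [0.0, 0.2, 0.5], [0.2, 0.2, 0.2]): all three packets fit with tau_max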
Modified Move Right Algorithm. In order to solve ESP-II, we modify the Move Right scheduling algorithm [10]. Originally, the Move Right algorithm was developed to solve ESP-I when a non-identical energy-per-bit transmission function is used for each packet. We modify this algorithm to deal with the constraints of a per-packet delay bound. The main idea of the original Move Right algorithm is to find the optimal transmission start times in an iterative manner, and this notion of iteration is maintained in our proposed algorithm as well. The modified algorithm has two additional constraints: one is a delay bound for feasibility, and the other is a transmission duration bound for the minimum transmission energy. Initially, si and τi are set to ti and min{si+1 − si, qi − si, τmax}, respectively. For each pair of subsequently arriving packets, we find updated values s′2 (with s2 ≤ s′2 ≤ s2 + τ2), τ′1, and τ′2 such that w(τ′1) + w(τ′2)
is minimized. This operation proceeds until the last two packets. In order to obtain the optimum, the algorithm repeats this process until the scheduling results converge.

The detailed algorithm is described in Algorithm 2, where the superscript k in τ_i^k and s_i^k indicates the kth iteration. f(τ_i^{k−1}, τ_{i+1}^{k−1}, s_i^{k−1}, s_{i+1}^{k−1}) returns the updated set of values (τ_i^k, τ_{i+1}^k, s_{i+1}^k) which minimize w(τ_i^k) + w(τ_{i+1}^k) when τ_i^k + τ_{i+1}^k is fixed, where τ_i^k, τ_{i+1}^k, and s_{i+1}^k satisfy τ_i^k ≤ τmax, τ_{i+1}^k ≤ τmax, and s_{i+1}^{k−1} ≤ s_{i+1}^k, s_{i+1}^k + τ_{i+1}^k ≤ min{s_{i+1}^{k−1} + τmax, q_{i+1}}, respectively. If s_{i+1}^{k−1} > s_i^{k−1} + τ_i^{k−1}, then (τ_i^k, τ_{i+1}^k, s_{i+1}^k) are not changed from the values at the (k − 1)th iteration. The optimality shown in [10] is maintained in our algorithm as well, since we only consider the convex region of w(τ), given when τ is less than or equal to τmax.

Algorithm 2. Modified Move Right Scheduling
  k ⇐ 0 and flag ⇐ 0
  s_i^k ⇐ t_i, i ∈ {1, ..., M}
  τ_i^k ⇐ min{s_{i+1}^k − s_i^k, q_i − s_i^k, τ_max}, i ∈ {1, ..., M}
  while flag = 0 do
    k ⇐ k + 1
    τ^k ⇐ τ^{k−1} and s^k ⇐ s^{k−1}
    for i = 1 to M − 1 do
      if s_{i+1}^k = s_i^k + τ_i^k then
        (τ_i^k, τ_{i+1}^k, s_{i+1}^k) = f(τ_i^{k−1}, τ_{i+1}^{k−1}, s_i^{k−1}, s_{i+1}^{k−1})
    if τ^k = τ^{k−1} then
      flag ⇐ 1
3.3 Online Algorithm
Online Extension of the Modified Lazy Scheduling Algorithm. For the extension to an online algorithm, we assume that packets arrive according to a Poisson distribution with mean rate λ. Our goal is to achieve the optimal performance in an average sense. To do so, we need a more tractable form of the scheduling result achieved by the offline algorithm, and this was derived in [1] as

$$\tau_j^* = \max_{k\in\{1,\dots,M-(j+b_j)\}} \frac{1}{k+b_j}\sum_{i=1}^{k} C_i,$$

where M is the total number of packets arriving during [0, T), j is the current packet to be sent, and C_i, i ∈ {1, ..., M − j − b_j}, is the inter-arrival time between the (j + i − 1)th packet and the (j + i)th packet. b_j is the number of packets backlogged at the time when the jth packet is transmitted. From this expression, one can derive a random variable

$$\tau^*(b, t) = \max_{k\in\{1,\dots,M\}} \frac{1}{k+b}\sum_{i=1}^{k} D_i,$$

where b is the current backlog, M is a random variable representing the number of packet arrivals during [t, T), and D_i is the average inter-arrival time when M = i. Finally, the transmission duration of the current packet is determined by E[τ*(b, t)]. However, we simply modify it to τ* = min{τmax, E[τ*(b, t)]} for our algorithm by taking τmax into account.
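A Monte Carlo sketch of this rule may be helpful; it samples future Poisson(λ) arrivals over the remaining horizon T − t and, as an approximation of the expectation in the text, uses the realized inter-arrival times in place of the averages D_i. Parameter names are ours, and the no-future-arrival case is handled crudely (the inner maximum is taken as 0).

    import random

    def online_lazy_tau(b, horizon, lam, tau_max, runs=2000, seed=3):
        """Estimate tau* = min{tau_max, E[tau*(b, t)]} by simulation."""
        rng = random.Random(seed)
        total = 0.0
        for _ in range(runs):
            best, acc, elapsed, k = 0.0, 0.0, 0.0, 0
            while True:
                gap = rng.expovariate(lam)      # exponential inter-arrival
                elapsed += gap
                if elapsed >= horizon:          # no more arrivals in [t, T)
                    break
                k += 1
                acc += gap
                best = max(best, acc / (k + b)) # inner term of tau*(b, t)
            total += best
        return min(tau_max, total / runs)

    print(online_lazy_tau(b=2, horizon=1.0, lam=100.0, tau_max=0.01))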
Online Extension of the Modified Move Right Algorithm. An online extension of modified Move Right scheduling can be achieved by a method similar to the one used for the online modified Lazy scheduling. In [10], the authors derived the online algorithm using a look-ahead buffer; in other words, the scheduler waits for some time in order to gather packets. However, for ESP-I, this look-ahead buffer adds a constant value to the total transmission delay, and hence it can cause the actual duration of transmission to be shorter, which means that more energy consumption is required. Accordingly, instead of using the notion of the look-ahead buffer, we make use of the assumption that, within a given time interval, the packets arrive with the same inter-arrival time, which is determined as the ratio of the given time interval to the average number of packet arrivals during that interval. Then, we run Move Right scheduling to decide the transmission duration of the current packet using the information of the b backlogged packets and the expected packet arrivals, where each packet is assumed to have the same delay margin from its own arrival time.
4 Simulation Results
In this section, we evaluate the proposed algorithms using a MATLAB-based simulator. We assume that the size of a packet is 10 kbits and the system bandwidth is 10⁶ symbols/sec. A simulation run lasts for 10 sec, and during the last 0.5 sec no packet is assumed to arrive, in order to prevent the transmission energy from diverging. For the other parameters, we use the same values as in Sect. 2. Whenever a result needs to be averaged, we repeat the simulation runs 10 times.

First of all, we compare Lazy and modified Lazy scheduling for the same packet arrival patterns under the proposed trade-off model. As shown in Fig. 3, when the arrival rate is less than 150 packets/sec, Lazy scheduling tends to send each packet with a transmission duration longer than τmax, and hence more energy is consumed. In the meantime, modified Lazy scheduling limits the transmission duration to at most τmax, which yields the minimum energy consumption. However, as the packet arrival rate increases, the difference in energy-per-bit between the two algorithms becomes marginal, since both algorithms yield the same result if the optimal transmission duration is less than τmax.

Second, Fig. 4 compares the online modified Lazy scheduling algorithm with the corresponding offline version. The discrepancy between the two algorithms becomes larger as the packet arrival rate increases. This is due to the fact that the online algorithm only minimizes the average energy consumption.

Finally, Fig. 5 shows the energy-per-bit achieved by the offline and online modified Move Right scheduling algorithms. In this simulation, we assume that the per-packet delay bound qi is 200 msec for every packet. When the packet arrival rate is less than 90 packets/sec, the delay bound does not affect the energy-per-bit much, because the energy-optimal transmission duration is less than the delay bound.
Fig. 3. Comparison of Lazy scheduling with modified Lazy scheduling (energy-per-bit in J versus packet arrival rate in packets/sec)
Fig. 4. Offline modified Lazy scheduling versus online modified Lazy scheduling (energy-per-bit in J versus packet arrival rate in packets/sec)
However, as the packet arrival rate increases, the delay bound tends to limit the transmission duration, and hence the energy-per-bit also increases. This remark can be confirmed by comparison with the modified Lazy scheduling result, which does not impose the per-packet delay bound constraint.
Fig. 5. Offline modified Move Right scheduling versus online modified Move Right scheduling (energy-per-bit in J versus packet arrival rate in packets/sec; the offline modified Lazy scheduling curve is included for reference)
5 Conclusion
In this work, we show that the characteristics of the power amplifier lead to a non-convex energy-per-bit curve. Based on this trade-off model, we propose the modified Lazy scheduling and modified Move Right scheduling algorithms, which extend the algorithms originally proposed in [1, 10], to solve the energy minimization problem where the delay constraint is imposed on either a group of packets or each individual packet. Since the transmit power in the uplink of a cellular network should be controlled in order to deal with the near-far problem and the inter-cell interference, it is our ongoing work to solve the energy-efficient scheduling and power control problems jointly in a multi-cell environment.
References

1. Prabhakar, B., et al.: Energy-Efficient Transmission over a Wireless Link via Lazy Packet Scheduling. In: Proc. IEEE INFOCOM (April 2001)
2. Stemm, M., et al.: Reducing Power Consumption of Network Interfaces for Hand-held Devices. In: Proc. MoMuC (September 1996)
3. Flinn, J., Satyanarayanan, M.: Energy-aware Adaptation for Mobile Applications. In: Proc. ACM SOSP (December 1999)
4. Krashinsky, R., Balakrishnan, H.: Minimizing Energy for Wireless Web Access with Bounded Slowdown. In: Proc. ACM MobiCom (September 2002)
5. Anand, M., et al.: Self-tuning Wireless Network Power Management. In: Proc. ACM MobiCom (September 2003)
6. Qiao, D., Shin, K.G.: Smart Power-Saving Mode for IEEE 802.11 Wireless LANs. In: Proc. IEEE INFOCOM (March 2005)
7. Schurgers, C., et al.: Power Management for Energy-Aware Communication Systems. ACM Trans. Embedded Computing Sys. 2(3) (2003) 431–447
8. Yu, Y., et al.: Energy-Latency Tradeoffs for Data Gathering in Wireless Sensor Networks. In: Proc. IEEE INFOCOM (March 2004)
9. Cui, S., et al.: Energy-constrained Modulation Optimization. IEEE Trans. Wireless Commun. 4(5) (2005)
10. Gamal, A.E., et al.: Energy-Efficient Scheduling of Packet Transmissions over Wireless Networks. In: Proc. IEEE INFOCOM (2002)
11. Jäger, H., et al.: Broadband High-Efficiency Monolithic InGaP/GaAs HBT Power Amplifiers for 3G Handset Applications. IEEE MTT-S Inter. Microwave Sympo. Digest 2 (2002) 1035–1038
12. Pedro, J.C., et al.: Linearity versus Efficiency in Mobile Handset Power Amplifiers: A Battle without a Loser. Microwave Engineering Europe (August 2004)
13. Corte, F.D.: Power Management for Energy-Aware Communication Systems. RFDesign Magazine (May 2000)
Energy Efficient Throughput Optimization in Multi-hop Wireless Networks Dan Xu and Xin Liu Computer Science Department, University of California Davis, CA 95616, USA {xud,liu}@cs.ucdavis.edu
Abstract. Throughput, fairness, and energy consumption are often conflicting objectives in multi-hop wireless networks. In this paper, we propose the notion of lexicographical maxmin energy efficiency throughput fairness that achieves throughput fairness per unit energy. Compared with maxmin throughput fairness and maxmin time fairness, the proposed scheme allocates more bandwidth to nodes with relay requirements and provides satisfactory bandwidth to nodes far from the sink. We design an optimal bandwidth allocation algorithm to achieve the proposed fairness objective. Simulation results show that the proposed scheme results in more balanced throughput among users when they exhaust energy resources, compared to other fairness schemes.
1 Introduction
One of the most challenging issues in bandwidth allocation is the conflict between fairness and throughput. In [1], the authors indicate that in a multi-rate 802.11 MAC, throughput-based fairness degrades the network throughput considerably, since most of the channel access time is occupied by low bit rate links. Time-based fairness, proposed in [2], allows each user to fairly share the time resources, which results in low throughput on low-capacity links. In [3], the authors argue that in multi-hop WLANs [4], maxmin throughput fairness can improve the network throughput without penalizing low-capacity users, and that maxmin time fairness leads to an even higher network throughput by protecting high bit rate links.

In multi-hop networks, some nodes need to serve as routers and relay traffic for other nodes. Routers handle more traffic and thus consume much more energy than their descendant nodes. If nodes are energy-constrained, such as in sensor networks or when using devices on battery power (e.g., laptops or PDAs), energy consumption needs to be considered in the bandwidth allocation among users. This observation motivates the work in this paper.
The work was in part supported by NSF through CAREER Award #0448613 and Grant #0520126, and by Intel through a gift grant.
Fig. 1. An example of maxmin throughput fairness bandwidth allocation; in each subfigure, the left side of the slash is the allocated bandwidth and the right side is the node's link rate
We consider a multi-hop wireless network where all nodes need to connect to a wired sink or AP (access point) while taking energy consumption into account. When energy is a constraint, maxmin throughput fairness is unfair to routers, because they consume more energy and die much faster than other nodes. Consider the example illustrated by Fig. 1: nodes 2 and 3 choose node 1 as their router, since in (a) they both have low bit rates and therefore low throughput if directly connected to the sink. To simplify the example, we assume that at each node, transmitting one unit of data costs 2 J of energy, receiving one unit of data needs 1 J, and no energy is consumed in the idle state. Then in (b), where node 1 is a router, it costs node 1 8 J per unit time to transmit the same amount of data as node 2 or 3, which only consume 2 J per unit time. Therefore, node 1 will have a much shorter lifetime and consequently much less aggregate throughput compared with itself in (a) and with nodes 2 or 3 in (b). On the other hand, maxmin time fairness severely penalizes nodes far away from the sink. This is because, in order to fairly share a router's time, a child node's throughput is about 1/2 of that allocated to the router: the larger the number of network hops, the less bandwidth the leaf nodes receive. For example, consider a chain topology with 20 nodes and a sink, as shown in Fig. 2, where each node chooses the node in front of it as its router and each link's bit rate is 11 Mbps. In (b), under maxmin time fairness, routers receive much more bandwidth
0.282 0.282 19 18
…
0.282 0.282 0.282 3 2 1
Sink
a): Maxmin throughput fairness bandwidth allocation
0.027 0.055 0.082 20 19 18
…
0.495 0.522 0.550 3 2 1
Sink
b): Maxmin time fairness bandwidth allocation
0.185 0.193 0.201 20 19 18
…
0.382 0.398 0.416 3 2 1
Sink
c): Maxmin energy efficiency throughput fairness bandwidth allocation
Fig. 2. In the chain topology, compared with maxmin throughput fairness and maxmin time fairness, maxmin energy efficiency allocates more bandwidth to routers while still giving child nodes relatively high throughput. Here energy consumption for transmit, receive, and idle are 1.9, 1.55, and 0.75J/s, respectively. The energy consumption model is based on the measurements reported in [5].
compared to their descendants. The leaf node 20 only receives a bandwidth of 27 kbps, which is less than 1/20 of that of node 1.

Motivated by such limitations, we propose the notion of lexicographical maxmin energy efficiency throughput fairness, which achieves maxmin throughput fairness per unit energy. With this objective, a router is allocated more bandwidth when it consumes more energy per unit time to relay traffic for other nodes. In other words, routers are compensated for providing service to other nodes. Although a router has a shorter lifetime, it can still achieve a high total throughput before using up its battery power. Another benefit is illustrated in Fig. 2: using maxmin energy efficiency throughput fairness, a router can receive higher throughput than its child nodes, while even the leaf node can still receive a satisfactory throughput, which is 185 kbps in the example illustrated by Fig. 2. That is, maxmin energy efficiency throughput fairness provides a better balance among users than both maxmin throughput fairness and maxmin time fairness when energy is the constraint.

Our contributions are as follows. We observe the limitations of maxmin throughput fairness and maxmin time fairness when energy is constrained, and propose maxmin energy efficiency throughput fairness, which allocates more bandwidth to routers while still not severely penalizing the bandwidth of child nodes. To achieve this objective, we design an iterative optimal bandwidth allocation algorithm. We also propose and implement a maxmin energy fair scheme for comparison. Our results show that maxmin energy efficiency throughput fairness results in more balanced throughput among all nodes when they deplete their energy, compared with the maxmin throughput fair, maxmin time fair, and maxmin energy fair schemes.

The rest of the paper is organized as follows. In Section 2, we describe the network model and state the maxmin energy efficiency throughput fairness objective. To achieve the defined fairness objective, we design an optimal bandwidth allocation algorithm and validate its correctness in Section 3. Section 4 is the performance evaluation. We present related work in Section 5 and conclusions in Section 6.
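To make the motivating numbers concrete, the following back-of-the-envelope sketch recomputes the Fig. 1 example under the stated cost model (2 J per transmitted unit, 1 J per received unit, idle free); the function name is ours, not the paper's.

    def node_energy(own_units, relayed_units, e_tx=2.0, e_rx=1.0):
        """Per-unit-time energy in the Fig. 1(b) example."""
        return e_rx * relayed_units + e_tx * (own_units + relayed_units)

    router = node_energy(own_units=1, relayed_units=2)   # node 1: 2 + 6 = 8 J
    leaf = node_energy(own_units=1, relayed_units=0)     # node 2 or 3: 2 J
    print(router, leaf)                # 8.0 2.0, as in the text
    print((1 / router) / (1 / leaf))   # with equal bandwidth, node 1 delivers
                                       # only 1/4 the throughput per joule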
2 Formulation

2.1 Network Model
We consider a wireless multi-hop network where each node chooses only one pre-determined first-hop router to be connected to the sink or AP. Therefore, the network topology can be modeled by a tree structure rooted at the sink. In sensor networks or mesh networks, nodes are often static, and thus we consider a relatively stable network topology in this paper. Consider a node i; a_{i,P_i} denotes the achievable link rate between node i and its parent, which is denoted by P_i. Notice that a sink has no parent node. We use an indicator function I_i to denote whether a node is a sink or not, where I_i = 1 if i is a non-sink node and I_i = 0 otherwise.
Table 1. Notation for the main variables

b_i      bandwidth of node i
C_i      the set of child nodes of node i
P_i      the first-hop router of node i
T_i      the subtree rooted at node i
B_i      total bandwidth of subtree T_i
I_i      I_i = 1 if i is a non-sink node, I_i = 0 if i is a sink
a_{j,i}  actual bit rate from node j to node i
e_i^t    energy consumed per unit time in the transmit state of node i
e_i^r    energy consumed per unit time in the receive state of node i
e_i^i    energy consumed per unit time in the idle state of node i
W_i      workload of node i, i.e., the sum of the time lengths of the receive and transmit states
t_i^i    time length of the idle state of node i
E_i      total energy consumption of node i
e²b_i    energy efficiency throughput of node i
B_i      bandwidth vector of the subtree T_i (boldface in the original)
E²B_i    energy efficiency throughput vector of subtree T_i
Let C_i be the set of nodes whose first-hop router is i, and let C_i^+ be the set C_i ∪ {i}. Let the subtree rooted at node i be T_i and |T_i| be the number of nodes in T_i. Let b_i denote the bandwidth allocated to node i, and B_i the aggregate bandwidth of subtree T_i. The notation used in this paper is listed in Table 1 for reference.

The time fraction of node i consumed to transmit its own traffic is $I_i b_i / a_{i,P_i}$. Here the time fraction represents the workload added to node i. Notice that node i also needs to relay traffic for its child nodes j ∈ C_i. The time fraction to relay j's traffic is $B_j/a_{j,i} + I_i B_j/a_{i,P_i}$, where $B_j/a_{j,i}$ is the time consumed to receive j's packets. Since the time fraction at each node cannot exceed 1, the feasible bandwidth allocation condition is given as

$$\sum_{j\in C_i}\left(\frac{B_j}{a_{j,i}} + \frac{I_i B_j}{a_{i,P_i}}\right) + \frac{I_i b_i}{a_{i,P_i}} \le 1.$$

Similar models are widely used in the previous literature [3][8][12]. In such models, the effect of inter-link interference can be ignored or taken into account in the achievable link rate a_{i,P_i}, as in this article.

In wireless networks, a node may transmit, receive, or stay idle (we ignore the sleeping state in this paper, although all the studies in this article can be extended to include it). Energy consumed in all three states is taken into account in this paper. Let e_i^t, e_i^r, and e_i^i be the energy consumptions per unit time of node i in the transmit, receive, and idle states, respectively. These parameters are similar for the same kind of wireless terminals. In [5], the authors have measured these parameters for various wireless adapters; the reported values for an 802.11 WLAN module are used in our simulations.
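The feasibility condition translates directly into a per-node check, as sketched below; the data layout (dicts keyed by node id and by link endpoints) is an assumption made for illustration.

    def feasible(i, b, B, children, a, parent, is_sink):
        """Sect. 2.1: receive, relay, and own-traffic time fractions of node i
        must sum to at most 1."""
        load = 0.0
        up = None if is_sink[i] else a[(i, parent[i])]
        for j in children[i]:
            load += B[j] / a[(j, i)]          # time receiving from child j
            if up is not None:
                load += B[j] / up             # time relaying j's traffic upward
        if up is not None:
            load += b[i] / up                 # time for node i's own traffic
        return load <= 1.0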
2.2 Maxmin Energy Efficiency Throughput Fairness
As illustrated by the example in Fig. 1, a router node consumes much more energy, and therefore achieves much less overall throughput than its descendant nodes, when maxmin throughput fairness is implemented. On the other hand, a node many hops away from the sink is severely penalized when maxmin time fairness is used. The maxmin energy efficiency throughput fairness objective is motivated by these limitations and formulated in this section.
Consider a non-sink node $i$. Its energy consumption per unit time is

$$E_i = \sum_{j \in C_i} \left( e_{ri}\,\frac{B_j}{a_{j,i}} + e_{ti}\,\frac{B_j}{a_{i,P_i}} \right) + e_{ti}\,\frac{b_i}{a_{i,P_i}} + e_{ii} \left( 1 - \sum_{j \in C_i} \left( \frac{B_j}{a_{j,i}} + \frac{B_j}{a_{i,P_i}} \right) - \frac{b_i}{a_{i,P_i}} \right) \qquad (1)$$
In (1), $1 - \sum_{j \in C_i} \left( \frac{B_j}{a_{j,i}} + \frac{B_j}{a_{i,P_i}} \right) - \frac{b_i}{a_{i,P_i}}$ is the idle time of node $i$, denoted as $t_{ii}$, where $t_{ii} \ge 0$. A non-sink node $i$ consumes $E_i$ units of energy per unit time and achieves a bandwidth of $b_i$. We define its energy efficiency bandwidth as $e^2b_i = \frac{b_i}{E_i}$, the bandwidth per unit of energy. Suppose there are $n$ non-sink nodes in the network. The energy efficiency bandwidth vector is defined as $E^2B = \left( \frac{b_1}{E_1}, \frac{b_2}{E_2}, \ldots, \frac{b_n}{E_n} \right)$, where $\frac{b_1}{E_1} \le \frac{b_2}{E_2} \le \ldots \le \frac{b_n}{E_n}$. We give Definition 1 as follows.

Definition 1 (Maxmin Energy Efficiency Throughput Fairness). A feasible bandwidth allocation $B$ is maxmin energy efficiency throughput fair if its energy efficiency bandwidth vector $E^2B = \left( \frac{b_1}{E_1}, \frac{b_2}{E_2}, \ldots, \frac{b_n}{E_n} \right)$ is lexicographically equal to or larger than that of any other feasible bandwidth allocation.

Informally, a feasible bandwidth allocation is maxmin energy efficiency throughput fair if and only if there is no way to increase the energy efficiency throughput of any node without decreasing the energy efficiency throughput of some node with equal or already smaller energy efficiency throughput.

The objective of Definition 1 is to provide maxmin fairness of the bandwidth per unit energy. Intuitively, it first allocates the bandwidth per unit energy equally among all nodes. When some nodes are unable to consume their allocated bandwidth per unit energy, the remainder can be evenly shared by the rest of the nodes. A desirable property is that a router's energy efficiency throughput is never less than that of its child nodes. Under maxmin energy efficiency throughput fairness, routers that spend more energy get more bandwidth. If we view energy as the cost each node must pay for communications and bandwidth as the revenue, the more a node contributes, the higher the throughput it gets. Although routers have a shorter lifetime, they can still transmit a large amount of data before using up their battery power. Therefore, compared with maxmin throughput fairness, the aggregate throughput of a router when it depletes its energy is significantly improved. This mechanism is a good incentive for each node to serve others. Since the maximum energy consumption at node $i$ does not exceed $\frac{e_{ti}}{e_{rj}}$ times that of any other node $j$, the maximum throughput allocated to a router is not excessively higher than that of its descendants. Compared with maxmin time fairness, the proposed fairness objective provides satisfactory throughput to leaf nodes even with a large number of network hops, as shown in Fig. 2. Maxmin energy efficiency throughput fairness thus provides the fairest overall throughput after each node depletes its energy resource.
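The lexicographic comparison in Definition 1 is mechanical once the per-node energy efficiency values are sorted. A minimal Python sketch of the comparison used to rank allocations (not the allocation algorithm itself):

def e2b_vector(b, E):
    # Energy efficiency bandwidths e2b_i = b_i / E_i, in non-decreasing order
    return sorted(bi / Ei for bi, Ei in zip(b, E))

def lex_geq(v, w):
    # True if v is lexicographically equal to or larger than w
    for x, y in zip(v, w):
        if x != y:
            return x > y
    return True

# An allocation (b1, E1) is preferred over (b2, E2) under Definition 1
# when lex_geq(e2b_vector(b1, E1), e2b_vector(b2, E2)) holds.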
3 Solution

3.1 Algorithm Design
We design an optimal bandwidth allocation algorithm named E²TFA (maxmin Energy Efficiency Throughput Fairness bandwidth Allocation) to achieve the defined fairness objective. The structure of E²TFA is based on the Pump-Drain idea first proposed in [3]. We use Pump-Drain to convert the problem of maxmin energy efficiency throughput fair bandwidth allocation, which can be modeled as a serial quasi-optimization problem, into the simpler problem of solving a non-linear equation set, and then give an approximate algorithm to obtain the solution numerically.

E²TFA runs in a recursive and distributed way: the sink initiates E²TFA, which then recursively calls itself in top-down order. After the E²TFA execution at a node $j \in C_i$ returns, node $i$ performs Pump-Drain within $T_i$ to achieve maxmin energy efficiency throughput fairness among all the nodes in $T_i$. To perform Pump-Drain, each node $i$ maintains the following information, which is locally reported by its child nodes. After performing Pump-Drain, node $i$ in turn reports this information to its parent node.

• The bandwidth assigned to each node $k$ in $T_i$, namely $b_k$.
• The distinct amounts of energy efficiency bandwidth assigned in $T_i$, sorted in the array $\lambda_i$ in non-decreasing order; $|\lambda_i|$ is the current number of elements of $\lambda_i$.
• The array $\xi_i$, whose $k$th element $\xi_i[k]$ is the set of nodes whose energy efficiency bandwidth equals $\lambda_i[k]$.
• The structure of $T_i$.

The details of E²TFA are as follows.

Pump: Initially, the energy efficiency bandwidth of each node is zero (the bandwidth of each node is set to zero). After the execution of E²TFA returns from a child node of $T_i$, Pump is executed at node $i$ in the following steps.

I. If node $i$ is a sink, $b_i$ keeps the value 0 and there is no need to perform Pump. However, Drain may be needed, since the bandwidth of its child nodes may make $i$ overloaded. If node $i$ is a non-sink node, Pump goes to Step II.

II. For a non-sink node $i$, after the execution of E²TFA at $j \in C_i$ returns, the bandwidth allocation within $T_j$ is maxmin energy efficiency throughput fair and node $j$ is saturated. The aggregate bandwidth of $T_j$ is $B_j$, whose time share at node $i$ is $\frac{B_j}{a_{j,i}} + \frac{B_j}{a_{i,P_i}}$. The energy efficiency bandwidth of node $j$ is $e^2b_j = \frac{b_j}{E_j}$.
At node $i$, the time fraction left to support its own bandwidth is $1 - \sum_{j \in C_i} \left( \frac{B_j}{a_{j,i}} + \frac{B_j}{a_{i,P_i}} \right)$, so the bandwidth of node $i$ that can be supported is:

$$\phi = \left( 1 - \sum_{j \in C_i} \left( \frac{B_j}{a_{j,i}} + \frac{B_j}{a_{i,P_i}} \right) \right) a_{i,P_i} \qquad (2)$$
With the bandwidth $\phi$, there is no idle time at node $i$, and the corresponding energy efficiency bandwidth of node $i$ is

$$e^2b_i = \frac{\phi}{E_i} = \frac{\phi}{\sum_{j \in C_i} \left( \frac{e_{ri}}{a_{j,i}} + \frac{e_{ti}}{a_{i,P_i}} \right) B_j + \frac{e_{ti}}{a_{i,P_i}}\,\phi} \qquad (3)$$

When $\frac{\phi}{\sum_{j \in C_i} \left( \frac{e_{ri}}{a_{j,i}} + \frac{e_{ti}}{a_{i,P_i}} \right) B_j + \frac{e_{ti}}{a_{i,P_i}}\phi} \ge \lambda_i[|\lambda_i|]$, i.e., the energy efficiency bandwidth of node $i$ is larger than that of any other node in $T_i$, let $b_i = \phi$; Pump stops and there is no need to perform Drain. Otherwise, Pump goes to Step III.

III. When $\frac{\phi}{\sum_{j \in C_i} \left( \frac{e_{ri}}{a_{j,i}} + \frac{e_{ti}}{a_{i,P_i}} \right) B_j + \frac{e_{ti}}{a_{i,P_i}}\phi} < \lambda_i[|\lambda_i|]$, we set the energy efficiency bandwidth of node $i$ equal to $\lambda_i[|\lambda_i|]$. Node $i$ gets the bandwidth $\varphi$ by solving the following equation:

$$\lambda_i[|\lambda_i|] = \frac{\varphi}{\sum_{j \in C_i} \left( \frac{e_{ri}}{a_{j,i}} + \frac{e_{ti}}{a_{i,P_i}} \right) B_j + \frac{e_{ti}}{a_{i,P_i}}\,\varphi} \qquad (4)$$

Then $\varphi = \frac{\lambda_i[|\lambda_i|] \sum_{j \in C_i} \left( \frac{e_{ri}}{a_{j,i}} + \frac{e_{ti}}{a_{i,P_i}} \right) B_j}{1 - \lambda_i[|\lambda_i|]\,\frac{e_{ti}}{a_{i,P_i}}}$.
Now node $i$ also has the largest energy efficiency bandwidth among all the nodes in $T_i$. Since $\varphi > \phi$, node $i$ must be overloaded; Pump stops, and Drain is needed to decrease the bandwidth of $T_i$ to make the bandwidth allocation feasible.

Drain: The objective of Drain is to decrease the bandwidth of each node in $\xi_i[|\lambda_i|]$ so as to make the workload of node $i$ feasible, i.e., $W_i = 1$, while the energy efficiency bandwidth of each node stays the same. However, since decreasing the bandwidth of a node in $\xi_i[|\lambda_i|]$ will probably also decrease the energy consumption of the node itself and of all its ancestor nodes, decreasing the bandwidth of each node while keeping the energy efficiency bandwidths unchanged is a non-trivial task.

In $\xi_i[|\lambda_i|]$, there are three kinds of nodes: first, nodes without any children in $\xi_i[|\lambda_i|]$, which we denote by $k$; second, nodes with children in $\xi_i[|\lambda_i|]$, which we denote by $l$; and last, node $i$ itself. Mathematically, the above problem can be solved by the equation set (5). The equation marked $\{\cdot\}^*$ appears in (5) only when $i$ is a non-sink node. Notice that in (5) the idle time and idle energy consumption of node $i$ are 0, since $i$ is saturated before Drain completes. In (5), the bandwidth of each node is decreased to make the workload of node $i$ equal to 1, while keeping the energy efficiency bandwidth the same, denoted by $\eta$. When $W_i = 1$ is satisfied, if $\eta \ge \lambda_i[|\lambda_i| - 1]$, then Drain is done. If $\eta < \lambda_i[|\lambda_i| - 1]$, simply decreasing
the bandwidth of nodes in $\xi_i[|\lambda_i|]$ is not enough; the bandwidth of nodes in $\xi_i[|\lambda_i| - 1]$ should also be decreased. Drain is then performed in two steps: first, decrease the bandwidth of each node in $\xi_i[|\lambda_i|]$ to make their energy efficiency bandwidth equal to $\lambda_i[|\lambda_i| - 1]$, which can be done by replacing $\eta$ with $\lambda_i[|\lambda_i| - 1]$ in (5) and deleting the final equation; second, combine the sets $\xi_i[|\lambda_i|]$ and $\xi_i[|\lambda_i| - 1]$ and perform the same operation described by (5). If at this point the current $\lambda_i[|\lambda_i|]$ is still smaller than the current $\lambda_i[|\lambda_i| - 1]$, repeat the above two steps.

$$\begin{cases} \dfrac{b_k - \delta_k}{\sum_{j \in C_k} \left( \frac{e_{rk}}{a_{j,k}} + \frac{e_{tk}}{a_{k,P_k}} \right) B_j + \frac{e_{tk}(b_k - \delta_k)}{a_{k,P_k}} + e_{ik}\,t_{ik}} = \eta \\[2ex] t_{ik} = 1 - \sum_{j \in C_k} \left( \frac{B_j}{a_{j,k}} + \frac{B_j}{a_{k,P_k}} \right) - \frac{b_k - \delta_k}{a_{k,P_k}} \\[1ex] \quad\vdots \\[1ex] \dfrac{b_l - \delta_l}{\sum_{j \in C_l} \left( \frac{e_{rl}}{a_{j,l}} + \frac{e_{tl}}{a_{l,P_l}} \right) \left( B_j - \sum_{m \in \xi_i[|\lambda_i|],\, m \in T_l} \delta_m \right) + \frac{e_{tl}(b_l - \delta_l)}{a_{l,P_l}} + e_{il}\,t_{il}} = \eta \\[2ex] t_{il} = 1 - \sum_{j \in C_l} \left( \frac{1}{a_{j,l}} + \frac{1}{a_{l,P_l}} \right) \left( B_j - \sum_{m \in \xi_i[|\lambda_i|],\, m \in T_l} \delta_m \right) - \frac{b_l - \delta_l}{a_{l,P_l}} \\[1ex] \quad\vdots \\[1ex] \left\{ \dfrac{b_i - \delta_i}{\sum_{j \in C_i} \left( \frac{e_{ri}}{a_{j,i}} + \frac{e_{ti}}{a_{i,P_i}} \right) \left( B_j - \sum_{k \in \xi_i[|\lambda_i|],\, k \ne i} \delta_k \right) + \frac{e_{ti}(b_i - \delta_i)}{a_{i,P_i}}} = \eta \right\}^* \\[2ex] \sum_{j \in C_i} \dfrac{B_j - \sum_{n \in \xi_i[|\lambda_i|],\, n \ne i} \delta_n}{a_{j,i}} + \dfrac{I_i \sum_{j \in C_i} \left( B_j - \sum_{n \in \xi_i[|\lambda_i|],\, n \ne i} \delta_n \right)}{a_{i,P_i}} + \dfrac{I_i (b_i - \delta_i)}{a_{i,P_i}} = 1 \end{cases} \qquad (5)$$
Equation set (5) is non-linear but is guaranteed to have an optimal solution; numerically, however, it can only be solved approximately. We therefore design an efficient distributed algorithm to perform Drain that obtains approximately optimal numerical results.
Algorithm 1. The Drain procedure of E²TFA

for ($\eta = \lambda_i[|\lambda_i|]$; $\eta > \lambda_i[|\lambda_i| - 1]$; $\eta = \eta - \Delta$) do
  E2TFA_Drain(initialnode, $\eta$, initialnode)
end for
if initialnode's workload is still larger than 1 then
  Let $\lambda_i[|\lambda_i|] = \lambda_i[|\lambda_i| - 1]$ and $\xi_i[|\lambda_i|] = \xi_i[|\lambda_i|] \cup \xi_i[|\lambda_i| - 1]$
  Perform the algorithm from the beginning again
end if
/* $\Delta$ is a small value that determines the precision */
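In effect, Algorithm 1 sweeps $\eta$ downward in steps of $\Delta$ until the initial node's workload becomes feasible, merging the two highest priority classes when one sweep is insufficient. A schematic Python rendering, where the workload and per-node drain computations are stand-ins for equation set (5):

def drain(initial_node, lam, xi, delta, workload, drain_to):
    # lam: list of distinct energy efficiency bandwidths, non-decreasing
    # xi[k]: set of nodes whose energy efficiency bandwidth is lam[k]
    # workload(n): current workload W_n of node n
    # drain_to(eta): lowers bandwidths of nodes in xi[-1] (recursively,
    #                per equation set (5)) so each drained node's e2b is eta
    while workload(initial_node) > 1.0 and len(lam) > 1:
        eta = lam[-1]
        while eta - delta > lam[-2] and workload(initial_node) > 1.0:
            eta -= delta              # Delta sets the numerical precision
            drain_to(eta)
        if workload(initial_node) > 1.0:
            xi[-2] |= xi.pop()        # merge the two highest classes
            lam.pop()                 # and sweep again from the new top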
3.2 Correctness Validation
Proposition 1. E²TFA achieves maxmin energy efficiency throughput fairness. The proof is available in [6].
Algorithm 2. Function E2TFA_Drain($i$, $\eta$, initialnode)

if $i$ is initialnode then
  Reset all parameters, including bandwidth and idle time, to their values before the first call of the function
end if
for all $j \in \xi_i[|\lambda_i|]$ do
  if parent[$j$] is $i$ then
    E2TFA_Drain($j$, $\eta$, initialnode)
  end if
end for
if $i$ is not initialnode then
  Decrease the bandwidth of node $i$ by $\delta_i$, calculated from the first equation in (5). Update the parameters of each node in $\xi_i[|\lambda_i|]$, including the aggregate subtree bandwidth and idle time, that changed as a result of the decrease $\delta_i$.
else {$i$ is initialnode and $i$ is not a sink}
  Decrease the bandwidth of node $i$ by $\delta_i$, calculated from the first equation in (5), but with the idle time set to 0.
end if
4 Evaluation
In this section, we evaluate the performance of E²TFA and compare it with other fairness schemes, including maxmin throughput fair bandwidth allocation (MMFA), maxmin time fair bandwidth allocation (MTFA), and maxmin energy fair bandwidth allocation (MEFA). MMFA and MTFA have been studied in [3]. We propose MEFA for comparison: in MEFA, each router's energy resource is fairly shared by all its descendants and itself. The idea of MEFA is similar to that of MTFA, so MEFA should have similar performance to MTFA.

In E²TFA, routes are predetermined. Because finding an optimal routing that yields the best throughput is NP-hard, we implement two heuristic schemes. The first is the tree construction algorithm proposed in [3], which iteratively improves throughput; we call it ITCA here. ITCA provides good performance for networks with a small number of hops but becomes slow as the number of hops increases. Therefore, we also deploy a shortest distance routing (SDR) scheme, where a node chooses as its first-hop router the nearest node among those with a shorter distance to the sink. SDR is efficient in large wireless networks.

In the simulation, we consider an area of 150 m × 150 m. The link bit rate is determined as follows: 11 Mb/s when the distance between a transmitter and its receiver is smaller than 50 m, 5.5 Mb/s when the distance is smaller than 80 m, 2 Mb/s when smaller than 120 m, and 1 Mb/s when larger than 120 m. The unit energy consumptions in the transmit, receive, and idle states are 1.9 J/s, 1.55 J/s, and 0.75 J/s, respectively [5].
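For reference, the distance-to-rate step function and energy parameters used in the simulation can be written down directly (a transcription of the stated setup, not the authors' simulator code):

def link_rate_bps(distance_m):
    # Link bit rate as a function of transmitter-receiver distance
    if distance_m < 50:
        return 11e6
    if distance_m < 80:
        return 5.5e6
    if distance_m < 120:
        return 2e6
    return 1e6

# Per-state energy consumption in J/s, from [5]
E_TRANSMIT, E_RECEIVE, E_IDLE = 1.9, 1.55, 0.75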
We conduct simulations in the following two scenarios.

• Scenario A: 4 sinks are located at the corners and 30 nodes are randomly placed in the square. ITCA is used.
• Scenario B: 1 sink is located at a corner and 25 nodes are randomly placed in the square. SDR is used.

First, we study the aggregate throughput of each node when it depletes its energy. The initial energy of each node is set to 100 J. We compare MMFA, MTFA, MEFA, and E²TFA in both Scenarios A and B. When a node runs out of energy, it leaves the network, and the four schemes are executed for the remaining nodes. As shown in Fig. 3, in both scenarios E²TFA provides the fairest aggregate throughput among all nodes. In Scenario A, using MMFA, routers transmit only a small amount of data before they deplete their energy resources. In Scenario B, the aggregate throughput of leaf nodes under MTFA and MEFA is very low: since there is only one sink, the number of network hops is large, and the throughput of leaf nodes is severely penalized by MEFA and MTFA. Although leaf nodes have a longer lifetime, their aggregate throughput is still very low. Notice that routers have higher aggregate throughput, throughput, and energy efficiency throughput under MTFA and MEFA, which is the opposite of their standing under MMFA.

In Fig. 4, in both scenarios, MMFA results in the fairest throughput ($b_i$) among all the nodes. On the other hand, MTFA allocates much higher throughput to routers than to child nodes. In Scenario B, the throughput of leaf nodes allocated by MTFA is only 1/4 of that given by MMFA. However, in the same scenario, compared with MTFA, E²TFA allocates much more throughput to the leaf node, about 3 times more than MTFA, while still allocating similarly high throughput to a router. In both scenarios, E²TFA gives higher throughput to routers without severely penalizing any node.

In Fig. 5, we show that E²TFA achieves the fairest throughput per unit energy ($\frac{b_i}{E_i}$), which is the objective of E²TFA. Under MMFA, child nodes have smaller energy consumption and therefore higher energy efficiency throughput. MTFA and MEFA give routers excessively high throughput, which results in higher energy efficiency throughput for routers despite their relatively higher energy consumption. This also illustrates that MTFA is severely biased towards nodes near the sink.
Fig. 3. Each node’s aggregate throughput
Fig. 4. Throughput of each node
Fig. 5. Energy efficiency throughput
5 Related Work
In [3], the authors consider both maxmin throughput fairness and maxmin time fairness in multi-hop WLANs and design an optimal bandwidth allocation algorithm for each objective. In [8][9][10][11], the authors study MAC-layer scheduling or bandwidth allocation for ad hoc networks. In [12], the authors study maxmin fair bandwidth allocation in multi-AP single-hop WLANs through association control; since the problem is NP-hard, algorithms are proposed to determine user-AP associations that attain near-optimal maxmin fairness. In [13], the authors study maximum and maxmin fair bandwidth allocation in multi-channel wireless mesh networks. None of the above works considers energy constraints. In [14], the authors consider maxmin fair rate allocation in sensor networks; flow splitting is allowed, and thus the problem can be solved by a serial LP with lifetime constraints.
6 Conclusion
In this paper, we study throughput fairness and optimization in energy-constrained multi-hop wireless networks. We observe that maxmin throughput fairness is biased against routers with heavier traffic, while maxmin time fairness is biased against nodes with more hops to the sink. Motivated by these observations, we propose the notion
of lexicographical maxmin energy efficiency throughput fairness, with the following properties. First, the proposed fairness objective allocates more bandwidth to routers that relay packets for others and therefore encourages them to serve others. Second, the throughput discrepancy between routers and their descendants is bounded, so leaf nodes can still receive satisfactory throughput even in a large network. Third, by combining energy consumption and throughput, our scheme results in the most balanced aggregate throughput when all nodes use up their energy resources. We develop a distributed algorithm to achieve the above objective and validate its advantages through extensive simulations.
References

1. M. Heusse, F. Rousseau, G. Berger-Sabbatel, and A. Duda, "Performance anomaly of 802.11b," in IEEE Infocom, 2003.
2. G. Tan and J. Guttag, "Time-based fairness improves performance in multi-rate wireless LANs," in USENIX Annual Technical Conference, 2004.
3. Q. Dong, S. Banerjee, and B. Liu, "Throughput optimization and fair bandwidth allocation in multi-hop wireless LANs," in IEEE Infocom, 2006.
4. S. Lee, S. Banerjee, and B. Bhattacharjee, "The case for a multi-hop wireless local area network," in IEEE Infocom, 2004.
5. O. Kasten, "Energy consumption," available at http://www.inf.ethz.ch/personal/kasten/research/bathtub/energy_consumption.html
6. D. Xu and X. Liu, "Energy efficient throughput optimization in multi-hop wireless networks," http://www.cs.ucdavis.edu/~liu/.
7. T. Nandagopal, T.-E. Kim, X. Guo, and V. Bharghavan, "Achieving MAC layer fairness in wireless packet networks," in ACM MobiCom, 2000.
8. L. Tassiulas and S. Sarkar, "Maxmin fair scheduling in wireless networks," in IEEE Infocom, 2002.
9. S. Lu, H. Luo, and V. Bharghavan, "A new model for packet scheduling in multihop wireless networks," in ACM MobiCom, 2000.
10. X. L. Huang and B. Bensaou, "On max-min fairness and scheduling in wireless ad-hoc networks: analytical framework and implementation," in ACM MobiHoc, 2001.
11. A. Penttinen, I. Koutsopoulos, and L. Tassiulas, "Low-complexity distributed fair scheduling for wireless multi-hop networks," in IEEE WiOpt, 2005.
12. Y. Bejerano, S.-J. Han, and L. E. Li, "Fairness and load balancing in wireless LANs using association control," in ACM MobiCom, 2004.
13. J. Tang, G. Xue, and W. Zhang, "Maximum throughput and fair bandwidth allocation in multi-channel wireless mesh networks," in IEEE Infocom, 2006.
14. Y. Hou, Y. Shi, and H. Sherali, "Rate allocation in wireless sensor networks with network lifetime requirement," in ACM MobiHoc, 2004.
Election Based Hybrid Channel Access

Xin Wang¹ and J.J. Garcia-Luna-Aceves¹,²

¹ Computer Engineering Department, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
² Palo Alto Research Center (PARC), 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
{wangxin, [email protected]}
Abstract. We propose an Election based Hybrid Channel Access (EHCA) protocol for ad hoc networks that achieves high throughput and bounded channel access delay at the same time. EHCA reduces contention during channel schedule formation through fair node elections based on topology information. Only the elected nodes contend for the channel and broadcast the scheduling result. Numerical analysis and simulation results show that EHCA outperforms alternative designs.
1 Introduction
The analysis of the capacity of wireless networks [5] demonstrated that perfect scheduling is the ultimate way to achieve capacity at the MAC layer. However, in a distributed ad hoc network it is impossible to use perfect channel scheduling, and random channel access has to be used to some extent. We propose the Election based Hybrid Channel Access (EHCA) protocol to attain both high channel utilization and bounded channel access delay. The former is important for serving data-centric applications, while the latter is critical for voice-related applications.

In EHCA, the channel access period is divided into four time sections. The first section is used to exchange neighbor information. After that, all nodes perform fair elections to reduce the number of nodes that will contend for channel access. The nodes that fail in the election follow the scheduling result of the elected nodes. In the third section, the channel scheduling is distributed within the two-hop range, and contention-free transmissions happen in the fourth section.

We evaluate the performance of EHCA through analysis and simulation. Compared with an existing hybrid channel access scheme [3] and
This work was supported in part by the Baskin Chair of Computer Engineering at UCSC, the National Science Foundation under Grant CNS-0435522, and the U.S. Army Research Office under grant No.W911NF-041-1-0224. Any opinions, findings, and conclusions are those of the authors and do not necessarily reflect the views of the funding agencies.
IEEE 802.11, EHCA achieves much higher throughput and smaller channel access delay at the same time.

The rest of the paper is organized as follows. We describe related work in Section 2 and introduce the details of the proposed approach in Section 3. We analyze the properties of EHCA in Section 4 and evaluate its performance against alternative designs in Section 5. We conclude the paper in Section 6.
2 Related Work
Medium access control (MAC) protocols for ad hoc networks can be classified into contention-based and contention-free channel access. In contention-based MAC protocols, each node either detects transmission collisions (collision detection) or tries to avoid them through random back-offs (collision avoidance). Based on its observation of the channel status, each node contends for channel access in a distributed fashion. Contention-based MAC protocols may experience throughput degradation at high traffic loads and, due to their best-effort nature, cannot provide quality-of-service (QoS) support for real-time applications.

In contention-free MAC protocols, a set of timetables for individual nodes or links is prearranged. Each node or link can only transmit in its assigned time/frequency slots, so that the transmissions from these nodes or links are collision-free within the effective range of the transmissions. Dynamic transmission scheduling protocols can exploit spatial reuse of the wireless channel and achieve higher channel utilization than static scheduling approaches, e.g., TDMA. Based on whether the scheduling scheme needs topology information, scheduling-based channel access can be further divided into topology-dependent and topology-independent scheduling.

In topology-dependent scheduling, global topology information is required to form a correct channel schedule. Arikan [1] showed that the problem of establishing an optimal interference-free schedule, where optimality is measured in terms of throughput, is NP-complete. Chlamtac and Farago [4] first proposed a topology-transparent scheduling algorithm for wireless ad hoc networks. It uses polynomials over a Galois field to assign time slots, which guarantees that each node can transmit successfully at least once in a frame. This approach provides a minimum performance guarantee for each node; it needs only the overall number of nodes in the network and the number of neighbors of each node, and its frame length is much smaller than that of the classic TDMA approach. Oikonomou and Stavrakakis [8] proposed a probabilistic policy to increase the system throughput under various traffic loads. Ju and Li [7] proposed an approach based on coding theory to optimize the performance of Chlamtac's algorithm in terms of minimum throughput and maximum delay. However, Rentel [11] showed that the throughput of topology-transparent scheduling is at most that of slotted ALOHA.
Hybrid channel access is proposed to combine the advantages of contention-based channel access and topology-dependent scheduling. Nodes first use contention-based channel access to exchange the neighbor information needed to build the channel schedule or to reserve time slots in the scheduling-based transmission period. Examples of hybrid channel access protocols are NAMA [3] and CATA [13].

NAMA uses a hash function, which takes the node identifier and the current time slot number as input, to derive a random priority for every neighbor within two hops. If a node has the highest priority, it can access the channel within the corresponding time slot. The advantage of NAMA is that it completely eliminates the communication overhead of building the dynamic channel access schedule, except for collecting the two-hop neighbor information. However, NAMA has the following problems [2]: first, a node may probabilistically derive a low priority for a long period of time and never get access to the channel; second, there may be chain effects in the channel access opportunities, in which node priorities cascade from high to low across the network, reducing the spatial reuse of the whole system; third, channel bandwidth may be wasted when a node has no data to send in its allocated time slot. Because the wasted bandwidth can starve nodes with traffic, NAMA interacts badly with certain applications that are sensitive to delay, such as TCP congestion control [12] and the AODV route update mechanism [9].

In CATA, the transmission period is composed of a contention period and a group transmission period. During the contention period, nodes contend for channel access and reserve space in the group transmission period. Then, during the group transmission period, one or more nodes can transmit data packets without collisions. The problem with CATA is that when there is a large number of nodes, the contention period may not be long enough for each node to reserve a slot in the group transmission period.

In this paper, we propose a hybrid channel access protocol that reduces the control overhead during schedule formation through node elections. Compared with NAMA and CATA, it provides a bounded channel access delay, and only nodes with traffic access the channel. Since it reduces the number of contending nodes through elections, it allows more nodes to successfully reserve the channel through contention and is more suitable for mobile scenarios.
3 Election Based Hybrid Channel Access (EHCA)
We assume that all nodes are slot-synchronized and access the channel on slotted time boundaries. Each time slot is numbered relative to a consensus starting point. We divide the channel access period into four sections, as Figure 1 shows:
Fig. 1. Channel access period
3.1 Neighbor Information Exchange Period
The neighbor information exchange period is used to maintain neighbor information, send reservation requests, and distribute the reservation information within the two-hop range. All nodes use the 802.11 Distributed Coordination Function (DCF) to contend for channel access during this period. It can be further divided into two sections: a one-hop broadcast period and a one-hop re-broadcast period. Each node does the following in the one-hop broadcast period:

– If a node does not want to reserve a slot in the scheduling-based transmission period, it just sends a HELLO packet to maintain the neighbor information.
– Otherwise, a node sends a RESERVE REQUEST packet, which indicates the receiver of the transmission (NULL for a broadcast packet). The format of the RESERVE REQUEST is (src, dest, type); the type field indicates whether the previous scheduling failed.
– Node $i$ classifies the set of RESERVE REQUEST information it has collected during the one-hop broadcast period as the one-hop link set $l_i^1$.
– Each node chooses the node with the largest MAC address in its one-hop range as the leader of the network. The leader information is used for time synchronization across different networks, which is discussed further in Section 3.5.

Then, in the one-hop re-broadcast period, node $i$ forwards $l_i^1$ to its neighbors, which guarantees that the RESERVE REQUEST information is distributed within the two-hop range. We define the RESERVE REQUEST information node $i$ has received from node $j$ during the one-hop re-broadcast period as $l_{ij}^2$. The final RESERVE REQUEST information node $i$ has collected ($l_i$) is:

$$l_i = \{\, l_i^1 \cup l_{ij}^2 \mid \forall j \in N_i^1 \,\} \qquad (1)$$
where $N_i^1$ is the set of node $i$'s one-hop neighbors. After the one-hop re-broadcast, each node compares the MAC addresses of its two-hop neighbors with that of its current one-hop leader, and updates the node with the largest MAC address in its two-hop range as the network leader.

We denote by $N_{max1}$ the maximum number of one-hop neighbors; $N_{max1}$ is a predefined value that controls the node density in the network. The length of the neighbor information exchange period $T_{ne}$ needs to be long enough to allow every node to broadcast twice. In this paper, we set $T_{ne} = 2N_{max1} \times T_b$, where $T_b$ is the maximum time needed to send a broadcast packet using 802.11 DCF, including carrier sensing and exponential back-off.
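A node's two-hop view per equation (1) is just the union of what it heard directly and what its neighbors re-broadcast; a small Python sketch with illustrative data structures:

def two_hop_links(l1_i, l2_i):
    # l1_i: set of links heard in the one-hop broadcast period
    # l2_i: dict mapping neighbor j to the link set it re-broadcast (l2_ij)
    l_i = set(l1_i)
    for links in l2_i.values():
        l_i |= links          # equation (1): union over all neighbors
    return l_i

def t_ne(n_max1, t_b):
    # T_ne = 2 * N_max1 * T_b: long enough for every node to broadcast twice
    return 2 * n_max1 * t_b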
At the end of the neighbor information exchange period, we use a one-hop link contender election to reduce possible contention in the channel reservation period. Each node $i$ compares $l_i^1$ with $l_{ij}^2$ for each of its neighbors $j$. If node $i$ finds $l_i = l_i^1 \cup l_{ij}^2$ and $l_i^1 = l_{ij}^2$, it means nodes $i$ and $j$ have exactly the same set of one-hop links, which constitutes the total RESERVE REQUEST information node $i$ has collected; node $i$ then compares the MAC addresses of $i$ and $j$. If $i$ has a smaller MAC address, it gives up doing channel reservation and follows the scheduling result of node $j$. We define the node with the larger MAC address as the contender. When many nodes are close to each other, this approach elects the node with the largest MAC address as the contender, and it does the scheduling for all the one-hop links, which reduces possible channel access contention.
3.2 Contention-Based Channel Reservation Period
During the contention-based channel reservation period, we extend the 802.11 DCF to form the channel schedule in a distributed fashion. We define $D_{max}$ as the maximum delay a frame can tolerate, which depends on the specific application, and $N_{mt}$ as the maximum number of transmissions a frame can support in order to satisfy $D_{max}$. The priority of node $i$ ($i_{prio}$) is the overall number of links it has collected in the neighbor information exchange period:

$$i_{prio} = |l_i| \qquad (2)$$

The node priority is broadcast along with the scheduling results during the reservation information exchange period (Section 3.3); each node then compares the node priorities it has received. If for node $i$ there are two nodes $j$ and $k$ ($j, k \in N_i^1$) with the same highest priority (larger than $i_{prio}$), $i$ defines the contention link set ($l_{i\,cont}^1$) as follows:

$$l_{i\,cont}^1 = l_i^1 \cap l_{ij}^2 \cap l_{ik}^2, \qquad |l_{i\,cont}^1| = |l_i^1 \cap l_{ij}^2 \cap l_{ik}^2| \qquad (3)$$

The contention priority of node $i$ ($i_{cont\_prio}$) is the number of links in the contention link set ($l_{i\,cont}^1$):

$$i_{cont\_prio} = |l_{i\,cont}^1| \qquad (4)$$

The length of the back-off time ($T_{backoff}$) is decided by the number of links a node has observed and the type of those links, as Equation 5 shows:

$$T_{backoff} = \begin{cases} T_{sifs} + Random \times T_s & \text{if } |l_{i\,cont}^1| > 0 \\ T_{sifs} + (2N_{mt} - |l_i|) \times T_s & \text{if } |l_{i\,cont}^1| = 0 \end{cases} \qquad (5)$$

Each node keeps carrier-sensing the channel for a Short Inter Frame Space time ($T_{sifs}$), which is defined in IEEE 802.11 [6]. If the channel is idle and $|l_{i\,cont}^1| > 0$, the back-off time equals $T_{sifs} + Random \times T_s$, where $Random$ is
a random variable uniformly distributed in $[0, N_{mt}]$, and $T_s$ is the minimum time slot length defined in IEEE 802.11 [6]. When $|l_{i\,cont}^1| = 0$, the back-off time equals $T_{sifs} + (2N_{mt} - |l_i|) \times T_s$. Through this approach, we divide the back-off period into two sections: one for the contention links and the other for the remaining two-hop links, as Figure 2 shows. A node that has observed a contention link set always has a shorter back-off period than a node that has not.
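Equation 5's two back-off windows translate directly into code. In this sketch (illustrative names), Random is drawn as an integer slot count from [0, N_mt], per the text:

import random

def backoff_time(t_sifs, t_s, n_mt, links, contention_links):
    # Contenders that observed contention links back off within an
    # earlier window than nodes scheduling the remaining two-hop links
    if len(contention_links) > 0:
        return t_sifs + random.randint(0, n_mt) * t_s
    return t_sifs + (2 * n_mt - len(links)) * t_s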
Fig. 2. Node back-off scheme
During the back-off period, nodes that have contention links access the channel and broadcast their scheduling results before nodes that do not. In other words, contention links are scheduled before normal links. After the schedule for the contention links is formed, the node that can observe the largest number of remaining two-hop links has the shortest $T_{backoff}$; it accesses the channel first and builds the channel schedule for the remaining links. When a node gets channel access, it sends an ordered set that indicates the corresponding schedule as $(source_n, dest_n, type)$. After receiving the scheduling results from a neighbor $j$, a node first compares $j_{prio}$ and $j_{cont\_prio}$. The scheduling results for the contention link set $l_{i\,cont}^1$ are decided by the node with the highest contention priority, and the scheduling results for the remaining links ($l_i - l_{i\,cont}^1$) are decided by the node with the highest node priority. If all of a node's link schedules are already formed by a neighbor with a higher priority, the node gives up its attempt to contend for the channel.

We give a simple example of how the channel schedule is formed, as Figure 3 shows. We assume node D is the node with the highest priority, 4, observing links AB, BD, DE, and EG. Node D therefore accesses the channel first and broadcasts the scheduling result of those four links to nodes {B, C, E, F}, which distribute the scheduling information to nodes A and G during the reservation information exchange period (Section 3.3).

We also consider the case in which two nodes observe the same number of links: their node priorities ($Node_{prio}$) are then the same, and their channel accesses will collide. For example, assume the node priorities and the contention link sets in Figure 3 are:
– $A_{prio} = D_{prio} = G_{prio} = 6$
– $B_{prio} = C_{prio} = E_{prio} = F_{prio} = 2$
– $l_{B\,cont}^1 = l_{C\,cont}^1 = \{AB, DB\}$
– $l_{E\,cont}^1 = l_{F\,cont}^1 = \{DE, GE\}$
Fig. 3. Channel scheduling formation example
In the first round of scheduling, nodes A and D have the same $T_{backoff}$ and collide at hidden terminals B and C, and nodes D and G collide at hidden terminals E and F. We use a failed-link contender election to solve this problem, as Algorithm 1 shows.

We define $l_{i\,fail}^1$ as the set of links collected by node $i$ during the one-hop broadcast period with type failure, and $l_{ij\,fail}^2$ as the set of links node $i$ has received from node $j$ during the one-hop re-broadcast period with type failure. At the end of the neighbor information exchange period, we compare $l_{i\,fail}^1$ and $l_{ij\,fail}^2$ for each neighbor $j$. If $l_{i\,fail}^1 \subset l_{ij\,fail}^2$, node $i$ does not schedule the failed links and just follows the schedule result of node $j$. If $l_{i\,fail}^1 = l_{ij\,fail}^2$, we compare $i_{prio}$ and $j_{prio}$, and the node with the higher priority is elected as the contender. If $i_{prio} = j_{prio}$, we further compare the MAC addresses to break the tie.

Revisiting the previous example, nodes B and C will elect one node as the contender. According to the back-off scheme introduced above, this contender accesses the channel before nodes A and D to build the schedule for the contention link set {AB, DB}; nodes A and D then only need to schedule the remaining four links. The same holds for nodes {D, E, F, G}.

The length of the contention-based channel reservation period ($T_{cr}$) needs to be long enough for nodes with the same highest contention priority to send their reservation packets, which is the worst case for contention-based channel reservation. In this paper, based on simulation experiments, we set $T_{cr} = 6 \times (T_{max\_backoff} + T_r)$, where $T_{max\_backoff}$ is the maximum back-off time and $T_r$ is the time needed to send a reservation packet.
3.3 Reservation Information Exchange Period
During the reservation information exchange period, nodes broadcast the scheduling results they have received, together with the related $Node_{prio}$, to their neighbors; thus the scheduling results are distributed within the two-hop range. If a node receives a different channel schedule with the same priority, it marks the type of the corresponding link as failure.
Algorithm 1. Failed-link contender election algorithm

/* First step: get the contention link sets */
1: for each node $j, k \in N_i^1$ do
2:   if $j_{prio} == k_{prio} == \max(N_{i\,prio}^1)$ && $\max(N_{i\,prio}^1) > i_{prio}$ then
3:     /* nodes $j$ and $k$ have the highest priority among all of node $i$'s one-hop neighbors */
4:     $l_{i\,cont}^1 = l_i^1 \cap l_{ij}^2 \cap l_{ik}^2$
5:     $|l_{i\,cont}^1| = |l_i^1 \cap l_{ij}^2 \cap l_{ik}^2|$
6:   end if
7: end for
/* Second step: elect the contender for the failed links */
8: for each node $j \in N_i^1$ do
9:   if $l_{i\,fail}^1 \subset l_{ij\,fail}^2$ then
10:    contender = $j$
11:  end if
12:  if $l_{i\,fail}^1 == l_{ij\,fail}^2$ then
13:    if $i_{prio} \ne j_{prio}$ then
14:      contender = max_prio($i$, $j$)
15:    else
16:      contender = max_mac_address($i$, $j$)
17:    end if
18:  end if
19: end for
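The election step of Algorithm 1 can be rendered compactly in Python (a sketch; prio, mac, and the failed-link sets are assumed inputs):

def elect_failed_link_contender(i, neighbors, prio, mac, l1_fail, l2_fail):
    # l1_fail: node i's own failed-link set
    # l2_fail[j]: failed-link set re-broadcast by neighbor j
    contender = i
    for j in neighbors:
        if l1_fail < l2_fail[j]:          # proper subset: defer to j
            contender = j
        elif l1_fail == l2_fail[j]:
            if prio[i] != prio[j]:        # higher priority wins
                contender = i if prio[i] > prio[j] else j
            else:                         # tie broken by MAC address
                contender = i if mac[i] > mac[j] else j
    return contender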
The length of the reservation information exchange period ($T_{re}$) needs to be long enough to allow every node to broadcast once, which is half of the neighbor information exchange period ($T_{re} = N_{max1} \times T_b$).
3.4 Schedule-Based Transmission Period
In the schedule-based transmission period, each node follows the channel schedule with the highest priority to send its own packets. The length of the schedule-based transmission period ($T_{st}$) equals $N_{mt} \times T_{data}$, where $T_{data}$ is the time needed to send a data packet with the maximum payload length. When a node experiences a collision during the schedule-based transmission, it marks the type of the corresponding link as failure.
3.5 Network Merge Consideration
Under mobile scenarios, when two networks that are in different time sections merge into one, they need to synchronize on one time section to form a correct schedule. We address this problem by leader election. We obtain the network leader information during the neighbor information exchange period (Section 3.1), and then add the leader, $Node_{priority}$, and current time section information to the header of each frame sent after the neighbor information exchange period. The basic principle is to detect the network merge by identifying the leader of the network, and then to let the network with fewer transmissions synchronize to the
time section of the network with more transmissions. If two networks have the same number of transmissions, all nodes follow the network leader with the larger MAC address. If two networks merge before the neighbor information exchange period, and neither has yet formed leader and priority information, they can still form a new network, although it may not have all the transmission information. If one network is already in a time section after the neighbor information exchange period while the other is not, the latter synchronizes to the time section of the former.
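The merge policy reduces to a small decision rule; a sketch with an illustrative network-state structure:

def merged_time_section(net_a, net_b):
    # Each network is a dict with 'leader_mac', 'transmissions', and
    # 'time_section' fields (illustrative structure, not from the paper)
    if net_a["transmissions"] != net_b["transmissions"]:
        # the network with fewer transmissions resynchronizes
        winner = max(net_a, net_b, key=lambda n: n["transmissions"])
    else:
        # tie: follow the network leader with the larger MAC address
        winner = max(net_a, net_b, key=lambda n: n["leader_mac"])
    return winner["time_section"]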
4 Performance Analysis
Through fair node election, each node in EHCA can reserve $\frac{1}{N_{max2}+1}$ of the schedule-based transmission period and access the channel within at most two time frames, which are on the order of $\Theta(N_{max1})$ slots. We compare the per-node throughput and maximum channel access delay ($d_{max}$) of EHCA with those of other channel access schemes through numerical analysis, as Table 1 shows, where $N_{max2}$ is the maximum number of nodes in the two-hop range and $N$ is the overall number of nodes in the network. The comparison demonstrates that EHCA achieves a good balance between throughput and delay.
Table 1. Comparison with existing channel access schemes

Protocol | Per-node throughput | $d_{max}$ (unit: slot)
EHCA | $\Theta(\frac{1}{N_{max2}})$ | $\Theta(N_{max1})$
NAMA [3] | $O(\frac{1}{N_{max2}})$ | $\infty$
TDMA | $\Theta(\frac{1}{N})$ | $\Theta(N)$
Topology transparent [4] | $O\!\left(\frac{\log_2 N_{max2}}{N_{max2}^2 \log_2 N}\right)$ | $O\!\left(\frac{N_{max2}^2 \log_2 N}{\log_2 N_{max2}}\right)$
5 Performance Evaluation
We have implemented EHCA and NAMA in Qualnet [10]. In our simulation setting, 50 nodes are uniformly distributed across a 400 × 400 m area. Each node uses 802.11a as the physical layer, and the transmission rate is 54 Mbps. Packets are served in First-In First-Out (FIFO) order. The duration of each simulation is 90 seconds. The transmit power is set to 16 dBm and the receive sensitivity to -69 dBm. We set $N_{max1}$ to 20 and $N_{mt}$ to 400. The simulations are repeated with ten different seeds to average the results for each scenario.
5.1 Static Topology
We use twenty Constant Bit Rate (CBR) flows with varying inter-packet times to evaluate the performance for real-time applications. The packet length of the CBR flows is 512 bytes. The senders and destinations are more than two hops away from each other, which ensures that the measured metrics reflect multi-hop traffic. The simulation results are shown in Figure 4.
Fig. 4. Static Topology: (a) Flow Throughput, (b) Delay, (c) Jitter, (d) Packet Delivery Ratio, each plotted against offered load (packets per second) for 802.11, NAMA, and EHCA
5.2 Dynamic Topology
We use Random Waypoint as the mobility model, with waypoint speeds varying randomly from 1 to 10 meters/second and a pause time of 10 seconds. DSR is used as the routing protocol. The results are shown in Figure 5. Comparing Figures 4(a)-4(d) with Figures 5(a)-5(d), we can see that under light traffic loads EHCA performs similarly to IEEE 802.11 and NAMA, but as the traffic load increases, the contention-based protocol begins to perform badly. EHCA also outperforms NAMA because NAMA does not have enough spatial reuse. The end-to-end delay of EHCA remains almost constant.
5.3 Interaction with TCP
Fig. 5. Mobile Topology: (a) Flow Throughput, (b) Delay, (c) Jitter, (d) Packet Delivery Ratio, each plotted against offered load (packets per second) for 802.11, NAMA, and EHCA

Fig. 6. Interaction with TCP: (a) CBR Flow Throughput, (b) Delay of CBR Flow, plotted against offered load (packets per second) for 802.11, NAMA, and EHCA

We generate a traffic scenario that combines CBR and TCP traffic to evaluate the interaction between EHCA and TCP. Twenty FTP flows are
pumped into the network along with twenty CBR flows. The sources and destinations of the FTP flows are randomly chosen such that they are more than 2 hops away from each other. The results, shown in Figures 6(a) and 6(b), indicate that EHCA performs well with TCP traffic.
6 Conclusion
This paper introduced an Election based Hybrid Channel Access (EHCA) protocol. The advantage of EHCA is that it reduces the number of contending nodes through node elections, thus reducing the control overhead during channel schedule formation. EHCA can achieve high system throughput and bounded
channel access delay at the same time. It is particularly suited for multi-hop ad hoc networks over which both voice and data services must be provided. We have shown through analysis and simulation that EHCA outperforms TDMA, IEEE 802.11, and an existing hybrid channel access scheme.
References

1. E. Arikan, "Some complexity results about packet radio networks," IEEE Transactions on Information Theory, 30(4):681–685, July 1984.
2. L. Bao, "MALS: Multiple access scheduling based on Latin squares," in IEEE MILCOM, October 31-November 3, 2004.
3. L. Bao and J. J. Garcia-Luna-Aceves, "A new approach to channel access scheduling for ad hoc networks," in ACM Seventh Annual International Conference on Mobile Computing and Networking (MobiCom), 2001.
4. I. Chlamtac and A. Farago, "Making transmission schedules immune to topology changes in multi-hop packet radio networks," IEEE/ACM Transactions on Networking, 2(1):23–29, February 1994.
5. P. Gupta and P. R. Kumar, "The capacity of wireless networks," IEEE Transactions on Information Theory, 46:388–404, March 2000.
6. IEEE Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, November 1997.
7. J.-H. Ju and V. O. K. Li, "An optimal topology-transparent scheduling method in multi-hop packet radio networks," IEEE/ACM Transactions on Networking, 6(3):298–306, June 1998.
8. K. Oikonomou and I. Stavrakakis, "Analysis of a probabilistic topology-unaware TDMA MAC policy for ad-hoc networks," IEEE JSAC Special Issue on Quality-of-Service Delivery in Variable Topology Networks, 22(7):1286–1300, September 2004.
9. C. Perkins, E. Belding-Royer, and S. Das, "RFC 3561: Ad hoc On-Demand Distance Vector (AODV) routing," July 2003.
10. Qualnet Simulator, Scalable Network Technologies, http://www.scalable-networks.com/.
11. C. H. Rentel, "Network time synchronization and code-based scheduling for wireless ad hoc networks," Ph.D. thesis, Carleton University, January 2006.
12. T. Socolofsky and C. Kale, "RFC 1180: A TCP/IP tutorial," January 1991.
13. Z. Tang and J. J. Garcia-Luna-Aceves, "A protocol for topology-dependent transmission scheduling," in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), September 21-24, 1999.
Asynchronous Data Aggregation for Real-Time Monitoring in Sensor Networks

Jie Feng, Derek L. Eager, and Dwight Makaroff

Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
{jif226,eager,makaroff}@cs.usask.ca
Abstract. Real-time monitoring applications for sensor networks can require high sampling rates and low-delay forwarding of the sensor values to a sink node at which the data is to be further processed. High data collection rates can be efficiently supported by aggregating data as it is being forwarded to the sink. Since aggregation requires that some sensor data be delayed at intermediate nodes, while waiting for other data to be received, a key issue in the context of real-time monitoring is how to achieve effective aggregation with minimal forwarding delay. Previous work has advocated synchronous aggregation, in which a node’s position in the aggregation tree determines when it transmits to its parent. This paper shows that asynchronous aggregation, in which the time of each node’s transmission is determined adaptively based on its local history of past packet receptions from its children, outperforms synchronous aggregation by providing lower delay for a given end-to-end loss rate. Keywords: sensor networks, aggregation protocols, real-time monitoring, performance evaluation.
1 Introduction
Data aggregation is an important technique for reducing sensor network traffic and energy consumption [1,2,3,4,5,6]. Various aggregation protocols have been proposed for different applications, such as monitoring and periodic data collection [7,8,9], dynamic event detection [10], and target tracking [11]. This paper considers data aggregation in the context of sensor networks supporting real-time monitoring, specifically real-time monitoring systems where sensor data is sampled periodically and forwarded to a single sink node. A high sampling rate and a low delay in forwarding data to the sink are required in such systems so as to maintain a current "view" of the environment being monitored.

Aggregation protocols for sensor networks with periodic traffic transmit sensor values over a tree or cluster topology rooted at the sink [12,7,8,9]. Previous work has advocated synchronous aggregation protocols, in which a sequence of time intervals is statically defined for each round (the collection of one set of sensor values), with each interval dedicated to transmissions from particular sensor nodes. TAG is an example of an aggregation service using synchronous
aggregation [7]. In TAG, each node, beginning with the sink node, informs its children in the aggregation tree of the interval during which it will be receiving data. A child's transmission interval is fixed as the receiving interval of its parent, and the child's own receiving interval is chosen as the immediately preceding interval. Thus, all of the sensors at the $i$th level of the aggregation tree, $1 \le i \le H$, share transmission interval $H - i$, where $H$ denotes the height of the tree and the first interval in a round is numbered as interval zero. All intervals are of identical duration.

A potential disadvantage of synchronous aggregation is increased delay, since the interval duration must be conservatively chosen so as to provide a high probability that each node will be able to successfully transmit its data to its parent prior to the end of its transmission interval. A second potential disadvantage is that the constraints imposed on node transmission times may result in suboptimal use of spatial multiplexing.

Solis and Obraczka have described and evaluated two asynchronous aggregation protocols, called periodic simple and periodic per-hop aggregation [8]. In periodic simple aggregation, each node waits for a period of time equal to the round duration, aggregating all of the data received from its children over that period, before transmitting to its parent. This approach does not provide low delay; in fact, data generated during one round may not be received at the sink for a number of rounds equal to the height $H$ of the aggregation tree. Periodic per-hop aggregation is similar to periodic simple aggregation in that nodes may wait for a period of time equal to the round duration before transmitting to their parent, but each node may transmit earlier if data is received and aggregated from all children prior to the end of the round. Again, this approach may result in long delays, with the data generated in one round not being received at the sink until some subsequent round. These simple asynchronous protocols were found to yield poorer performance than synchronous aggregation.

In this paper, improved asynchronous aggregation protocols are designed through use of more aggressive methods for determining when a node should transmit to its parent. If a node receives data from all of its children prior to sending its own data to its parent in a given round, all of this data is aggregated and sent to the parent at that point. A node will also transmit to its parent if the time it has been waiting for its children exceeds an adaptively determined timeout value; in this case, any "late arrivals" from its children are simply dropped. The choice of timeout value is critical, since a long timeout value may cause excessive delay, while substantial data loss may be incurred if the timeout value is too short. In the proposed protocols, timeout values are adaptively determined based on the local history of past packet receptions.

The performance of the new asynchronous protocols, as well as that of synchronous aggregation, is evaluated using simulation. Asynchronous aggregation is found to outperform synchronous aggregation. Performance comparisons of the asynchronous protocols show that adaptation of timeout values based on a weighted average of history information from multiple rounds is preferable to adaptation based only on the immediately previous round. It is also found that randomizing the transmission times of leaf nodes to avoid congestion at
the beginning of each round, and the duration of the randomization interval, have a great impact on delay and end-to-end loss rate. A method is proposed for adaptively determining the duration of the randomization interval. The remainder of the paper is organized as follows. The new asynchronous aggregation protocols are presented in Section 2. Section 3 presents simulation results evaluating the performance of the new asynchronous protocols and of synchronous aggregation. Section 4 concludes the paper.
2 Asynchronous Aggregation
The main goal is to design asynchronous aggregation protocols that maximize aggregation efficiency by ensuring that as much aggregation occurs as possible, while still providing timely arrival of aggregation results at the sink. Three asynchronous protocols are proposed in the following subsections, beginning with the simplest of these protocols, and then making enhancements that yield improved performance as shown by our performance results in Section 3. The proposed protocols run above the network layer. Aggregation is performed as data packets are forwarded to the sink. The union of the routes to the sink forms an aggregation tree with the sink as its root node. For simplicity, it is assumed that a node can aggregate data from its subtree, together with its own data, into a fixed-size packet.
2.1 Basic Asynchronous Aggregation Protocol
In our basic asynchronous protocol, each non-leaf node sets a timeout in each round, establishing the maximum time it will wait to receive data from its children. The timeout value is determined adaptively, based on the timings of packet receptions from its children in the immediately preceding round. The node transmits its data packet for this round to its parent (aggregating its own data with whatever it has received from its children) either when it hears from all of its children, or when the timeout expires.

For simplicity, it is assumed that all nodes agree on the same base time $T_0$ defining the beginning of the first round. (In Section 3.5, however, it is shown that the proposed protocols are tolerant of substantial variability in the values of $T_0$ used at different nodes.) To avoid concurrent transmissions, each node $i$ (other than the sink node) picks a random value $r_i$ between 0 and $R$ at $T_0$, where $R$ is a protocol parameter. At time $T_0 + r_i$, node $i$ sends a packet containing its sensor data for the first round to its parent. At each subsequent round $j$, each node $i$ that is a leaf in the aggregation tree sends a packet containing its sensor data at time $T_0 + r_i + (j - 1) \times t$, where $t$ is the time between successive sensor readings at each node (i.e., the inverse of the sensor sampling rate).

Each non-leaf node operates as follows. Let $L_i^j$ denote the time by which non-leaf node $i$ receives the last packet for round $j$, and let $TO_i^j$ denote the timeout for round $j$ at node $i$. Node $i$ sets its timeout for the second round to $TO_i^2 = L_i^1 + t$.
For round j + 1, j + 1 > 2, the timeout of node i is updated as follows:
1. If node i received data packets for round j from all of its children before TO_i^j, it sets the timeout for round j + 1 to TO_i^{j+1} = L_i^j + t + e (since a packet from each child should arrive approximately once every time t). The protocol parameter e allows for some variance in the times at which packets are successfully transmitted.
2. If the timer for round j went off before node i received packets from all of its children, node i sets the timeout for round j + 1 to TO_i^{j+1} = TO_i^j + t. If node i receives one or more packets for round j after time TO_i^j ("late arrivals"), it updates TO_i^{j+1} to L_i^j + t. Such late arrivals have been received too late to be aggregated in node i's round j transmission to its parent, and are simply dropped, since only up-to-date data is of interest in real-time monitoring.
(A code sketch of these update rules is given at the end of this subsection.)
The choice of the protocol parameter e impacts the timeliness of the arrivals of data packets at the sink, and the number of late arrivals at the intermediate nodes in the aggregation tree. If e is set too small, timeouts may be set too aggressively, and data packets that experience normal variability in transmission times may arrive after the expiry of the respective timeout and be dropped. When e is set too large, latency may build up as nodes wait for data packets that will never be received owing to transmission failures. Our experiments show, however, that e can be set to a fixed value that yields good performance over a wide range of conditions. In contrast, we find that tuning the parameter R according to the particular network scenario can yield substantial improvements in performance. A method is proposed in Section 2.3 to adaptively determine the value for R.
The above protocol supports adaptivity to dynamics in the topology of the aggregation tree, as long as nodes have some mechanism for dynamically altering, when necessary, their set of child nodes and their parent. The timing of transmissions can be quickly adjusted according to the above rules.
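As a concrete illustration of these update rules, the following is a minimal sketch in Python (the paper specifies only the rules; the class structure and names here are illustrative assumptions):

```python
# Sketch of the basic asynchronous timeout update (rules 1 and 2 above).
# NodeTimeout, update, etc. are illustrative names, not from the paper.

class NodeTimeout:
    def __init__(self, t, e):
        self.t = t            # time between successive sensor readings
        self.e = e            # slack for variance in transmission times
        self.timeout = None   # TO_i^j: absolute timeout for the current round

    def init_second_round(self, L1):
        # TO_i^2 = L_i^1 + t, where L_i^1 is the arrival time of the
        # last round-1 packet from the children
        self.timeout = L1 + self.t

    def update(self, heard_all, L, late_arrival=None):
        """Compute TO_i^{j+1} at the end of round j.

        heard_all    -- True if all children were heard before TO_i^j
        L            -- L_i^j, reception time of the last round-j packet
        late_arrival -- reception time of a packet arriving after TO_i^j
        """
        if heard_all:
            # Rule 1: TO_i^{j+1} = L_i^j + t + e
            self.timeout = L + self.t + self.e
        else:
            # Rule 2: TO_i^{j+1} = TO_i^j + t, ...
            self.timeout = self.timeout + self.t
            if late_arrival is not None:
                # ... unless a late arrival is observed, in which case
                # TO_i^{j+1} = L_i^j + t (the late packet itself is dropped)
                self.timeout = late_arrival + self.t
```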
2.2 Asynchronous Aggregation Protocol with EWMA
While the basic protocol is straightforward, it may cause a "timeout chain" phenomenon under certain circumstances. Suppose, for example, that a node 1 has only one child, node 2, and only one grandchild, node 3. Suppose that node 2 receives the packet for round j from node 3 at time L_2^j, and sets its timeout for round j + 1 to L_2^j + t + e. Suppose further that the aggregate sent by node 2 arrives at node 1 after a transmission delay d, causing node 1 to set its timeout for round j + 1 to L_2^j + d + t + e. If the round j + 1 packet from node 3 is not successfully received by node 2, node 2 will time out and send its packet for round j + 1 at time L_2^j + t + e. If the transmission delay of this packet exceeds d, node 1 will also time out, causing the packet to be a late arrival and be dropped.
The main reason for the above phenomenon is that the timeout for the next round at a node is set too aggressively when packets are received from all children prior to timeout expiry. In the second asynchronous protocol that we propose, an Exponentially Weighted Moving Average (EWMA) strategy is used to adjust the
timeout value in this case. Specifically, if a node i heard from all of its children before timeout in round j, it sets its timeout for round j + 1 to TO_i^{j+1} = (1 − δ) × (TO_i^j + t) + δ × (L_i^j + t + e). The parameter δ controls how quickly the protocol reacts to changes in the network. The above adjustment method bears some similarity to the Additive Increase Multiplicative Decrease (AIMD) algorithm in TCP: both react slowly to "good news" but aggressively to "bad news". In our case, the timeout for the next round is adjusted cautiously when packets are received from all children prior to timeout expiry, but more aggressively when there is a late arrival.
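In code, the EWMA variant changes only rule 1 of the basic protocol; a minimal sketch (argument names as above, and again illustrative):

```python
def ewma_timeout(TO_prev, L, t, e, delta):
    """EWMA update applied when all children were heard before the timeout:
    TO^{j+1} = (1 - delta) * (TO^j + t) + delta * (L^j + t + e)."""
    return (1 - delta) * (TO_prev + t) + delta * (L + t + e)
```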
2.3 Adaptive Asynchronous Aggregation Protocol
It is important to randomize the transmission times of leaf nodes to avoid congestion at the beginning of each round. The parameter R controls the duration of the randomization interval. Choosing an appropriate value of R requires balancing the risk of congestion (if R is set too small) against increased delay (if R is set too large). The best value is network dependent. In our third asynchronous aggregation protocol, R is determined adaptively. Based on the observation that the "asynchronous with EWMA" protocol achieves good performance when the ratio of R to the average data collection delay D is within a certain range, the adaptive protocol works by calculating the average data collection delay at the sink and adjusting R when the ratio is out of range. The data collection delay for a round is defined as the maximum delay from when a sensor value is captured, until the corresponding aggregate arrives at the sink. Although sensor values may be captured at somewhat different times at the various nodes, in our simulation implementation the capture times are approximated for each node and round j as T_0 + (j − 1) × t.
The average data collection delay is calculated as D = α × D + (1 − α) × D*, where D* is the latest measurement of the data collection delay and α is a parameter determining the weight given to the previous value of the average. Suppose the desired range of R/D is [β − Δ, β + Δ]. When the sink observes that R/D is out of range, it updates R as follows. If R/D < β − Δ, R is updated to R = D × (β + Δ). If R/D > β + Δ, R is updated to R = D × (β − Δ). With suitable parameter value selections, changes to R with this protocol would be relatively rare, and we do not model any particular technique for communicating changes in R from the sink to the other network nodes.
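The sink-side adaptation of R can be sketched as follows (a hypothetical helper implementing the stated update rules; how the new R reaches the other nodes is, as noted, not modeled):

```python
def adapt_R(R, D_avg, D_latest, alpha, beta, delta):
    """Sketch of the adaptive update of the randomization interval R.
    D_avg is the running average data collection delay, D_latest the
    newest measurement; [beta - delta, beta + delta] is the target
    range for R / D_avg."""
    D_avg = alpha * D_avg + (1 - alpha) * D_latest   # D = alpha*D + (1-alpha)*D*
    ratio = R / D_avg
    if ratio < beta - delta:        # R too small: move to the upper bound
        R = D_avg * (beta + delta)
    elif ratio > beta + delta:      # R too large: move to the lower bound
        R = D_avg * (beta - delta)
    return R, D_avg
```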
3 Comparative Evaluation
3.1 Synchronous Aggregation
The synchronous aggregation protocol used here for comparison uses a synchronization structure similar to that in TAG [7] and the cascading timeout protocol [8]. In particular, it is assumed that each node i knows its hop count to the sink, h_i, and accordingly chooses its transmission interval within each round. Let I be the duration of the interval. For each round j, node i picks a random value
r_i^j between 0 and λ × I, 0 ≤ λ ≤ 1, aggregates the data it has received for this round, and sends out its packet at T_0 + t × (j − 1) + (H − h_i) × I + r_i^j. Randomizing transmissions over λ × I yields better performance than when all nodes at the same tree level attempt to transmit at the same time. Parameter λ is set to 0.8 in all experiments. Alternative values were tried, but did not yield better performance. The duration of the interval is the decisive performance factor once the network configuration is fixed. In the performance evaluation experiments, different interval durations are used to explore the best achievable performance.
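For reference, the synchronous transmission schedule reduces to a one-line computation; a sketch (the function name and defaults are illustrative):

```python
import random

def sync_send_time(T0, t, I, H, h_i, j, lam=0.8):
    """Transmission time of node i in round j under synchronous aggregation:
    T0 + t*(j - 1) + (H - h_i)*I + r, with r uniform in [0, lam*I].
    h_i is the node's hop count to the sink; H is the tree height."""
    r = random.uniform(0.0, lam * I)
    return T0 + t * (j - 1) + (H - h_i) * I + r
```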
3.2 Goals, Metrics, and Methodology
The performance of the asynchronous protocols and the synchronous protocol is evaluated through ns-2 simulation. The primary metrics are the end-to-end loss rate, equal to the ratio of the number of samples not included in the aggregates arriving at the sink to the total number of samples the nodes generate, and the maximum data age, which measures how old the data at the sink can be by the time the next samples arrive. The maximum data age is approximated by t plus the average data collection delay. An additional metric for which some results are presented is the average number of MAC layer data packet transmissions per round, which may yield insight into relative energy usage. Different sensor networks are generated by randomly scattering nodes in square areas of different sizes. The sink is located in the center of the network unless otherwise stated, and the aggregation tree is constructed as a shortest path tree. The physical layer packet loss rate is specified as a simulation input parameter. The uniform random error model is used for all experiments except those in which the two-state Gilbert error model is used to simulate channel errors (Section 3.5). An 802.11 MAC layer is simulated, without RTS/CTS [13], with a transmission range of 40 meters and a rate of 2 Mbps, and a fixed packet size of 52 bytes. A data packet is retransmitted up to three times before being discarded if an ACK is not received.
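The two primary metrics reduce to simple computations; a small sketch with illustrative names:

```python
def end_to_end_loss_rate(samples_in_sink_aggregates, samples_generated):
    """Fraction of generated samples not represented in the aggregates
    arriving at the sink."""
    return 1.0 - samples_in_sink_aggregates / samples_generated

def max_data_age(t, avg_collection_delay):
    """Approximation used here: sampling period t plus the average
    data collection delay."""
    return t + avg_collection_delay
```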
3.3 Parameter Analysis
While the asynchronous protocols have more protocol parameters than the synchronous protocol, experimental results show that e and δ can be easily fixed at 0.1 second and 0.05 respectively for all network settings. For adaptive aggregation, α, β, and Δ are fixed at 0.875, 0.7, and 0.15 respectively. Fig. 1 shows the impact of e on end-to-end loss rate and maximum data age. Fixing e at 0.1 second yields good performance for different physical layer loss rates. Experiments are also conducted with different sensor networks and the results show that good performance is achieved for all sensor networks when e is 0.1 second. Similar results are obtained showing that the chosen values of the other parameters (δ, α, β, and Δ) yield good performance across all of the simulated network configurations. Figures showing the impact of these parameters are omitted.
[Figure] Fig. 1. Impact of Parameter e (160 nodes, 250m × 250m, t = 0.5 sec., R = 0.3 sec.): (a) end-to-end loss rate vs. e; (b) maximum data age vs. e, for 10%, 20%, and 30% physical layer loss rates.
The performance of the synchronous protocol is very sensitive to the choice of the duration of the interval. Experimental results show that link quality and aggregation tree structure have a great impact on the choice of I. In practice, it may be difficult to set this parameter in a manner yielding consistently good performance.
3.4 Principal Performance Comparison
Fig. 2 shows the performance of the aggregation protocols at three different sampling rates. Each point for basic asynchronous and asynchronous with EWMA is achieved at a specific R. For t = 0.25, R ∈ {0, 0.1, 0.2, 0.25}. For t = 0.5 and 0.75, R ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5} and {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75} respectively. Similarly, each point for the synchronous protocol is achieved at a specific I. Two different values, 0 and t, are used as the initial value for R in adaptive asynchronous. For all three sampling rates, asynchronous aggregation achieves lower maximum data age than synchronous aggregation for a given end-to-end loss rate. The performance of basic asynchronous and asynchronous with EWMA is very close when t is 0.5 and 0.75 second. Fig. 2 shows that both protocols achieve end-to-end loss rates close to 1% at similar maximum data age. At such a low end-to-end loss rate, almost all packet transmissions are triggered when a parent hears from all children, so the timeout strategy doesn't have much impact. When t is 0.25 second, the end-to-end loss rate gets higher and the difference between the two protocols becomes bigger as more packet transmissions are triggered by timeout.
Fig. 3 shows the performance of the considered protocols with alternative sink placement. The same sensor network as for Fig. 2 is used, but with the sink at the corner. Fig. 3 (compare with Fig. 2) shows that the maximum data age of the synchronous protocol substantially increases when the sink is located at the corner, while that of the asynchronous protocols stays around the same. A close look at the tree structure shows that the aggregation tree with the sink at the corner is more than twice as deep as the one with the sink in the center, but the maximum number of nodes at the same tree level is quite similar in both cases.
[Figure] Fig. 2. Performance with Sink at Center (160 nodes, 250m × 250m, 20% physical layer loss rate): end-to-end loss rate vs. maximum data age for (a) t = 0.25 second, (b) t = 0.5 second, (c) t = 0.75 second.
[Figure] Fig. 3. Performance with Sink at Corner (160 nodes, 250m × 250m, 20% physical layer loss rate): end-to-end loss rate vs. maximum data age for (a) t = 0.25 second, (b) t = 0.5 second, (c) t = 0.75 second.
Thus, for synchronous aggregation, a similar interval duration is required in both cases, but twice as many intervals are required when the sink is at the corner. Another observation from Fig. 2 and Fig. 3 is that with the same 0.5 and 0.75 second sampling periods, asynchronous with EWMA performs much better than basic asynchronous with the sink at the corner. The reason for the difference can be traced back to the tree structure as well. When the sink is located at the corner, the sink only has four children, only one of these children has three children, and only one of these three grandchildren of the sink has its own children. The performance of basic asynchronous is very susceptible to the "timeout chain" phenomenon mentioned in Section 2.2 with such a tree structure. The number of late arrivals now differs enough to make a more significant difference in the end-to-end loss rate. The relative difference between basic asynchronous and asynchronous with EWMA, however, doesn't vary much with different sink placement when t is 0.25 second. This is because packet losses due to congestion now play an important role in the network, and the loss caused by the defects of basic asynchronous is less dominant.
Fig. 4 shows the average number of MAC layer data packet transmissions per round of the considered protocols with 10% and 30% physical layer loss rates. The performance improvements shown in Fig. 2 and Fig. 3 are achieved without impact on the number of MAC layer packet transmissions: the number of transmissions is very similar with all of the considered protocols, and in fact even slightly lower with the asynchronous protocols when the end-to-end loss rate is low.

[Figure] Fig. 4. Average Number of MAC Layer Data Packet Transmissions per Round (160 nodes, 250m × 250m, t = 0.5 sec.): (a) 10% physical layer loss rate; (b) 30% physical layer loss rate.
3.5 Other Factors
Fig. 5 shows that the maximum data age and the end-to-end loss rate of both synchronous and asynchronous aggregation get worse as the physical layer loss rate increases. Asynchronous aggregation outperforms synchronous aggregation across the different physical layer loss rates. When the number of nodes is fixed, the performance of the asynchronous protocols is not very sensitive to the size of the area. For the synchronous protocol, the average data collection delay increases as the tree gets deeper and narrower at lower density, and the maximum data age increases accordingly. As shown in Fig. 6, the performance improvement asynchronous aggregation achieves over synchronous aggregation increases as the size of the area increases.
[Figure] Fig. 5. Impact of Physical Layer Loss Rate (160 nodes, 250m × 250m, t = 0.5 sec.): (a) 10% physical layer loss rate; (b) 30% physical layer loss rate.
[Figure] Fig. 6. Impact of Size of Area (160 nodes, 20% physical layer loss rate, t = 0.5 sec.): (a) 200m × 200m; (b) 350m × 350m.
When the size of the area is fixed, the maximum data age for all of the considered protocols increases as the number of nodes increases. As seen in Fig. 7, the asynchronous protocols outperform the synchronous protocol for different numbers of nodes.

[Figure] Fig. 7. Impact of Number of Nodes (250m × 250m, 20% physical layer loss rate, t = 0.5 sec.): (a) 120 nodes; (b) 240 nodes.

For both synchronous and asynchronous aggregation, it was assumed that there is a common base time T_0 that defines the beginning of the first period at all sensor nodes. Here this assumption is relaxed by assuming that there is some variable clock shift away from this common base time, so that different nodes consider the first period to begin at somewhat different times. Fig. 8 (compare with Fig. 1(b)) shows that the asynchronous protocols are much more tolerant of clock shift than the synchronous protocol.

[Figure] Fig. 8. Performance with Clock Shift (clock shift uniform in [-50 ms, 50 ms], 160 nodes, 250m × 250m, 20% physical layer loss rate, t = 0.5 sec.)

Fig. 9 considers the impact of a more bursty physical layer packet loss process on the relative performance of the aggregation protocols. The two-state Gilbert error model is used, with a "good" state in which there is no physical layer packet loss, and a "bad" state in which there is a 40% physical layer packet loss rate. In each state, after a time duration of 5 seconds on average, a transition decision is made, with 20% probability of moving to the other state. Each link transitions between the two states independently. As seen in the figure, the relative performance of the various protocols is qualitatively consistent with that observed in the earlier experiments. Similar results have been obtained with other settings of the Gilbert error model parameters.

[Figure] Fig. 9. Performance with Two-state Gilbert Error Model (160 nodes, 250m × 250m, t = 0.5 sec.)
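A sketch of a per-link loss process matching this description (assuming, as one plausible reading of "5 seconds on average", exponentially distributed dwell times; names are illustrative):

```python
import random

def gilbert_state_trace(duration, mean_dwell=5.0, switch_prob=0.2,
                        bad_loss_rate=0.4):
    """Two-state Gilbert error model for one link: no loss in the 'good'
    state, 40% loss in the 'bad' state; after each dwell period a
    transition decision moves to the other state with probability 0.2.
    Returns a list of (time, loss_rate) state changes."""
    t, state = 0.0, "good"
    trace = [(0.0, 0.0)]
    while t < duration:
        t += random.expovariate(1.0 / mean_dwell)   # dwell time in this state
        if random.random() < switch_prob:           # transition decision
            state = "bad" if state == "good" else "good"
            trace.append((t, bad_loss_rate if state == "bad" else 0.0))
    return trace
```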
4 Conclusion
This paper presents three asynchronous protocols and compares them against each other and against synchronous aggregation in the context of real-time monitoring systems. Simulation results show that asynchronous aggregation outperforms synchronous aggregation in its ability to keep data "current" while achieving a low end-to-end loss rate. Results also show that the per-node transmission adaptation strategy is crucial in asynchronous aggregation.
References
1. Intanagonwiwat, C., Govindan, R., Estrin, D.: Directed diffusion: a scalable and robust communication paradigm for sensor networks. In: MobiCom '00, Boston, MA (2000) 56–67
2. Krishnamachari, B., Estrin, D., Wicker, S.B.: The impact of data aggregation in wireless sensor networks. In: ICDCSW '02, Vienna, Austria (2002) 575–578
3. Intanagonwiwat, C., Estrin, D., Govindan, R., Heidemann, J.: Impact of network density on data aggregation in wireless sensor networks. In: ICDCS '02, Vienna, Austria (2002) 457–458
4. He, T., Blum, B.M., Stankovic, J.A., Abdelzaher, T.: AIDA: Adaptive application-independent data aggregation in wireless sensor networks. Trans. on Embedded Computing Sys. 3(2) (2004) 426–457
5. Shrivastava, N., Buragohain, C., Agrawal, D., Suri, S.: Medians and beyond: new aggregation techniques for sensor networks. In: SenSys '04, Baltimore, MD (2004) 239–249
6. Nath, S., Gibbons, P.B., Seshan, S., Anderson, Z.R.: Synopsis diffusion for robust aggregation in sensor networks. In: SenSys '04, Baltimore, MD (2004) 250–262
7. Madden, S., Franklin, M.J., Hellerstein, J.M., Hong, W.: TAG: a tiny aggregation service for ad-hoc sensor networks. SIGOPS Oper. Syst. Rev. 36(SI) (2002) 131–146
8. Solis, I., Obraczka, K.: The impact of timing in data aggregation for sensor networks. In: ICC '04, Paris, France (2004) 3640–3645
9. Madden, S., Szewczyk, R., Franklin, M.J., Culler, D.: Supporting aggregate queries over ad-hoc wireless sensor networks. In: WMCSA '02, Washington, DC (2002) 49–58
10. Fan, K.W., Liu, S., Sinha, P.: On the potential of structure-free data aggregation in sensor networks. In: INFOCOM '06, Barcelona, Spain (2006)
11. Zhang, W., Cao, G.: DCTC: dynamic convoy tree-based collaboration for target tracking in sensor networks. IEEE Trans. Wireless Commun. 3(5) (2004) 1689–1701
12. Heinzelman, W.R., Chandrakasan, A., Balakrishnan, H.: Energy-efficient communication protocol for wireless microsensor networks. In: HICSS '00, Maui, HI (2000) 8020–8029
13. Xu, K., Gerla, M., Bae, S.: Effectiveness of RTS/CTS handshake in IEEE 802.11 based ad hoc networks. Ad Hoc Netw. 1(1) (2003) 107–123
A Novel Agent-Based User-Network Communication Model in Wireless Sensor Networks
Sang-Sik Kim and Ae-Soon Park
Electronics and Telecommunications Research Institute
{pstring,aspark}@etri.re.kr
Abstract. Wireless sensor networks generally have three kinds of objects: sensor nodes, sinks, and users that send queries and receive data via the sinks. The users and the sinks are mostly connected to each other by infrastructure networks. If users move within a sensor network that has no infrastructure, however, they must receive data from the sinks through multi-hop communication between disseminating sensor nodes. To support mobile users, previous work has studied various user mobility models. Such approaches, however, are not compatible with existing data-centric routing algorithms, and it is difficult for mobile users to gather data efficiently from sensor nodes due to their mobility. To address these shortcomings, we propose a new view of mobility and a model that supports user mobility independently of the sinks. The proposed model is evaluated by simulation in terms of delivery ratio, latency, and network lifetime. Keywords: User Mobility Support, Wireless Sensor Networks.
1 Introduction
Wireless sensor networks typically consist of three objects, as shown in Fig. 1: user, sink, and sensor node [1]. Firstly, a user is an object that disseminates an interest in the sensor field and collects data about the interest from sensor nodes. Secondly, a sink is an object that collects data: it receives an interest from a user, disseminates the interest inside the sensor field, receives sensing data from sensor nodes, and forwards the sensing data to the user. Lastly, a sensor node is an object that generates data about the interest and delivers the data to a sink. As shown in Fig. 1, the user and the sinks are mostly connected to each other by infrastructure networks. The users, however, must receive the data from the sinks through multi-hop communication between sensor nodes if they move around sensor networks without infrastructure. Recently, applications transmitting data to moving users inside sensor fields, such as rescue in a disaster area or maneuvers in a war zone, have been on the rise in large-scale sensor networks [5]. (Firefighters and soldiers are users gathering data from sensor networks.) To support mobile users in wireless sensor networks, previous work has studied various user mobility models. But, until now, only three models supported the mobility
of users for those applications: the direct user-network communication model, the GPS-based user-network communication model, and the topology-control-based user-network communication model.
[Figure] Fig. 1. Traditional Wireless Sensor Networks Model (sensor nodes in a sensor field, sinks, and a user/task manager node connected via the Internet and satellite)
The direct user-network communication model (D-COM) is shown in Fig. 2. It supports the mobility of a user on the assumption that the user communicates directly with sinks through infrastructure networks, namely the Internet, as in traditional sensor networks [1]; users can thus communicate with the sensor network via the sinks. But in applications such as rescue in a disaster area or maneuvers in a war zone, circumstances without any infrastructure other than the sensor network itself are more prevalent. Hence, the assumption that a user and a sink can communicate directly through the Internet is not always valid.
The GPS-based user-network communication model (G-COM) is shown in Fig. 3. G-COM uses a source-based topology [5], [6], [7]. In G-COM, sensor nodes proactively construct a GRID system using GPS receivers. G-COM assumes that all sensor nodes have their own GPS receivers and the ability to construct a GRID when a stimulus is detected. A sensor node that detects a stimulus, i.e., the source, constructs a GRID in the sensor field. Once a GRID is set up, a mobile user floods its interests only within the cell where the user is located. When a sensor node on the GRID receives an interest, it forwards the interest to the source along a GRID path, and data from the source are forwarded to the user along the reverse path.
The topology-control-based user-network communication model (T-COM) is shown in Fig. 4. It identifies a user with a sink, and supports the mobility of the user by reflecting the movement of the user [8], [9]. In T-COM, the user and sensor nodes construct a tree that is rooted at the user. The user always maintains the tree and gathers data from sensor nodes.
Intuitively, G-COM and T-COM seem to be suitable for the aforementioned applications, but these models also have various problems. First of all, they cannot use the existing effective data collection algorithms [2], [3], [4] between a sink and sensor nodes designed for static-sink sensor networks, because of low protocol compatibility. Accordingly, such algorithms can hardly be exploited if users in sensor networks are mobile.
[Figure] Fig. 2. Direct user-network communication model
[Figure] Fig. 3. GPS-based user-network communication model
[Figure] Fig. 4. Topology-control-based user-network communication model
The other problem is that the overhead of reorganizing the network topology and reconstructing dissemination paths from sensor nodes to the mobile user is expensive. In G-COM, all sensor nodes build the topology based on location information; accordingly, each sensor node must have its own GPS receiver. The cost of GPS receivers is decreasing, but the overall cost is still high. In T-COM, similarly, user mobility causes topology reconstruction. Each user in T-COM has a tree rooted at itself; if the user moves to a new location, the root of the tree must change, as seen in Fig. 4. This imposes enormous overhead on energy-constrained sensor nodes.
Hence, this paper proposes a novel agent-based user-network communication model (A-COM). A-COM collects data through a temporary agent and delivers the data to mobile users. In A-COM, if a user intends to obtain data while moving, the user appoints a sensor node to act as an agent, and the agent forwards interests to the sink via the sensor network. The sink, having received interests from the user, collects data from sensor nodes using an existing data collection algorithm for static-sink sensor networks [2], [3], [4]. The collected data are finally forwarded to the user. (If there is no sink, the agent directly disseminates interests to the whole network and collects data.)

Table 1. Taxonomy of Mobility Types

Mobility Type | Compatibility with existing static-sink routing protocols | Feasibility | GPS receivers for sensors | Control overhead under user mobility | Control overhead for multiple users | Help of infrastructure networks
D-COM | High | Low | Needless | Low | Low | Mandatory
G-COM | Low | Middle | Mandatory | Middle | Low | Needless
T-COM | Low | High | Needless | High | High | Needless
A-COM | High | High | Needless | Low | Low | Needless
A-COM has various advantages, as can be seen in Table 1. First of all, A-COM is compatible with existing static-sink routing protocols and needs no infrastructure networks. In addition, the users in A-COM do not build a topology (tree or GRID) and communicate only with agents. So the users, while moving, are free from
topology control. The users' freedom saves much energy and enables more users to participate in the proposed model, even if the sensors have no GPS receivers. Our simulation verifies that the lifetime of the sensor network is prolonged because the users' freedom decreases the energy consumption of the sensor nodes. We also verified that the data delivery ratio and the delay never degrade, even though the movement of the user is supported only by sensor nodes, without infrastructure networks. The rest of this paper is organized as follows. Section 2 describes the proposed model. Simulation results are presented in Section 3 to evaluate the effectiveness of our design and to analyze the impact of important parameters. Section 4 concludes the paper.
2 Model Analysis
In our model, if a user intends to obtain data while moving, the user appoints a sensor to act as an agent and forwards an interest to the agent. If there are one or more sinks, the agent forwards interests to the sensor network via the sink(s). The number of sinks, however, depends on the network policy. A network administrator might want to place one or more sinks in the sensor field, or the sensor field may be so hazardous that the administrator cannot reach it. Hence, we consider three scenarios according to the number of sinks and describe them based on the following assumptions.
• A user can communicate with static sinks only through sensors, because networks within sensor fields are infrastructure-less networks.
• Multiple sinks may be deployed in the sensor network and connected to each other via the Internet. How to connect a sink with other sinks is out of the scope of this paper, but such connections help maximize the efficiency of gathering data.
• Multiple static sinks are located on the outskirts of sensor fields.
• The data which one sink collects is aggregated by that sink. The aggregated data is shared by all static sinks through the infrastructure network, namely the Internet.
• The interest from a user describes how many times the sink forwards the gathered data set to the user.
[Figure] Fig. 5. Dissemination of Sink Announcement Message
[Figure] Fig. 6. Interest Dissemination of the User
2.1 Scenario 1: Sensor Fields with Only One Sink
Dissemination of sink announcement message. In the initial stage of the sensor network, the network administrator places a sink in a suitable position: the center, the outskirts, or a special position according to his or her policy. Wherever the sink is located in the sensor field, it floods a sink announcement message to announce itself throughout the whole sensor field (Fig. 5). As a result of this flooded announcement, every sensor node knows its hop count and next-hop neighbor toward the sink.
Dissemination of the user interest. While moving inside the sensor field, if a user wants to collect data, the user selects the nearest node as its first agent, as shown in Fig. 6, and delivers an interest to it. The first agent forwards the interest to its next-hop neighbor toward the sink, which in turn forwards it to its own next-hop neighbor. This process continues until the sink receives the interest of the user; it also establishes a route for replies from the sink back to the user. Based on our assumptions, the established route vanishes from the network when the period described in the interest is over.
Data collection of sink. A sensor network with a static sink is a network where sensing data from sensor nodes should be transmitted to the static sink through multi-hop communication. So existing routing algorithms for a static sink can be used (e.g., routing algorithms collecting data periodically, routing algorithms collecting a minority event, or routing algorithms detecting a moving object). Hence, the network administrator can support a suitable routing protocol for users, and a user can select and use the most appropriate routing algorithm with a static sink according to the policy of the network. Such research has already been advanced [2], [3], [4], so we will not discuss it further in this paper; rather, we use one of the existing routing algorithms to collect data. As shown in Fig. 7, the static sink forwards interests from users to sensors and gathers data from the sensor network according to the existing routing protocols. Once all data are gathered, the static sink aggregates them and forwards the aggregated data to the first agent.
Mobility support of the user. A user may move to another place after sending an interest to the first agent. In this case, the user selects another agent that can communicate with the first agent and makes a new connection between the newly selected agent and the original agent. (While moving inside the sensor field, the user can create further agents and connections.) These agents and connections are used for forwarding the aggregated data from the sink.
Data propagation of sink. The sink delivers the aggregated data to the first agent through the reverse path of interest forwarding. The first agent delivers the aggregated data to the last agent through the chain of agents, and the last agent delivers the aggregated data to the user. As shown in Fig. 8, the user can thus reliably receive the aggregated data from the sink.
[Figure] Fig. 7. Data Propagation to the User
[Figure] Fig. 8. Mobility Support of the User
2.2 Scenario 2: Sensor Fields with Multiple Sinks
Separation of the sensor fields. Basically, the only difference between Scenario 1 and Scenario 2 is the number of sinks. Having more than one sink in the sensor field effectively separates the sensor field: as a result of sink announcement message dissemination, each sensor node knows its nearest sink according to the hop counts. Accordingly, interest dissemination of the user targets the sink nearest to the agent, as shown in Fig. 9. The targeted sinks can change whenever the user sends its interests (see Fig. 10). Mobility support of the user and data propagation of the sink remain the same as in Scenario 1. In addition, users need not know how many static sinks are in the sensor field; the proposed model is independent of the number of sinks.
[Figure] Fig. 9. Separation of the Sensor Fields
[Figure] Fig. 10. Interest Dissemination of the User with Multiple Sinks
Data sharing of multiple static sinks. As shown in Fig. 1, a sink in typical sensor networks serves as a gateway to infrastructure networks [1]. Various papers on multiple static sinks likewise assume connections between each sink and an infrastructure network, and among all sinks [10], [11].
Therefore, in this paper, it is assumed that each sink, placed at the edge of the sensor field, can communicate with the other sinks via the infrastructure network. Hence, in the proposed model, data is collected through one sink in the sensor field, and the aggregated data is then shared by the multiple static sinks through the infrastructure network.
Advantages with multiple static sinks. The proposed model obtains various advantages from multiple static sinks. First of all, a user can receive data from the sink nearest to its position, so communication between a user and a sink needs only a few hops. This saves energy, enhances the data delivery ratio, and reduces delay. Also, because users disseminate interests through different static sinks, the locations of data collection are diverse. This relieves the hot-spot problem, in which sensor nodes near the sink carry a disproportionate amount of traffic [13]. As a result, the lifetime of the sensor network increases because the energy consumption of the sensor nodes is balanced.
2.3 Scenario 3: Sensor Fields with No Sink
Hazardous sensor fields. Sensor fields without a sink are a special type of sensor network. If the sensor field is so hazardous that the network administrator cannot reach it (e.g., a battlefield), the sensor field may not have any sinks. In a battlefield, users (or soldiers) may move into a sensor field that has no sink and gather data from sensors. In this case, users must gather data for themselves.
Dissemination of sink announcement message. Because there is no sink in the sensor field, the sensor network cannot perform the sink announcement message dissemination process by itself. In this case, users appoint the nearest sensor node as the first agent, and the first agent disseminates the sink announcement message. As shown in Figs. 11 and 12, users that want to gather data from sensors query nearby sensor nodes to determine whether there is a sink in the sensor field. If there is no sink, users appoint the nearest sensor node as the first agent. Once a sensor node becomes the first agent, it acts as the sink of Scenario 1. Hence, the other processes, such as sink announcement message dissemination, interest dissemination of the user, mobility support of the user, and data propagation of the sink, are the same as in Scenario 1.
[Figure] Fig. 11. First Agent Selection and Announcement
[Figure] Fig. 12. Mobility Support of the User
Advantages with idle sensor network. Based on our assumptions, the proposed model in this scenario may create many first agents over time; each first agent returns to its original state after the described period. This means that first agents are appointed whenever users want to send their interests: they are reactively selected and then perform all of the processes needed for user mobility. Therefore, the sensor network as a whole can remain in an idle state when there is no user in the sensor field. From the standpoint of the whole network, this is a positive effect, because no control messages or interests circulate in an idle sensor network.
3 Performance Evaluation
In this section, we evaluate the performance of the proposed model through simulations. We first describe our simulation model and simulation metrics. We then evaluate how environmental factors and control parameters affect the performance of the proposed algorithm.
3.1 Simulation Model and Metric
We evaluate the proposed model in Qualnet, a network simulator [12]. A sensor node's transmitting and receiving power consumption rates are 0.66 W and 0.39 W, respectively. The transceiver in the simulation has a 50 m radio range in an outdoor area. Each interest packet is 40 bytes long, and each data packet is 64 bytes. The sensor network consists of 100 sensor nodes, which are randomly deployed in a 300m x 300m field. We consider three scenarios for the proposed model according to the number of sinks; hence, the numbers of sinks and users are varied in this evaluation. The multiple static sinks are located on the outskirts of the sensor field. The user, which follows a random waypoint model with 10 m/s speed and 10-second pause time, moves within the sensor field and disseminates an interest every 10 seconds. Every sensor node receives the interest and generates exactly one sensing datum for it. This is defined as one interest round; that is, one interest round is 10 seconds. The simulation lasts for 500 seconds.
[Figure] Fig. 13. Network Lifetime for the Number of Sinks
[Figure] Fig. 14. Delay for the Number of Sinks
[Figure] Fig. 15. Data Delivery Ratio for the Number of Sinks
We use the following metrics to evaluate the performance of the proposed algorithm. The network lifetime is defined as the number of interest rounds until the first sensor node dies. The data delivery ratio is the ratio of the number of reports successfully received by a user to the total number of reports generated by all sensor nodes. The delay is defined as the average time between the moment a user transmits an interest and the moment the user receives the report. In the simulations we compare the three mobility types of Table 1 (D-COM, G-COM, and T-COM) with the proposed model.
3.2 Impact of the Number of Static Sinks
Scenarios 1 and 2 of A-COM can be compared with D-COM, because G-COM and T-COM have no static sink. We first study the impact of the number of sinks on A-COM's performance. The number of sinks varies from 1 to 5, and there is only one user in the sensor field. In this part, we compare Scenarios 1 and 2 to D-COM regarding lifetime, delay, and delivery ratio. The difference between A-COM and D-COM is how a user and a sink communicate.
Fig. 13 shows the number of interest rounds, namely, the network lifetime. As shown in Fig. 13, the number of interest rounds shows little difference between A-COM and D-COM. This means that A-COM can manage sensor fields as well as D-COM can, without infrastructure connecting users with sinks. In addition, the lifetime increases with the number of sinks. This is a side effect of multiple sinks: multiple sinks separate the sensor field, and users only use the nearest sink to send interests and receive replies, so users can use the shortest path to communicate with the sinks. As a result of this shorter communication, the lifetime in A-COM improves with the number of sinks. The delay is also improved by this side effect of multiple sinks. A-COM inherently incurs some delay due to multi-hop communication between users and sinks; however, the delay diminishes as the number of sinks grows, as shown in Fig. 14. The data delivery ratio of A-COM, meanwhile, is comparable with that of D-COM, as shown in Fig. 15. This also shows that the proposed model can manage sensor fields as well as D-COM without infrastructure.
[Figure] Fig. 16. Network Lifetime for the Number of Users
[Figure] Fig. 17. Delay for the Number of Users
[Figure] Fig. 18. Data Delivery Ratio for the Number of Users
3.3 Impact of the Number of Users
The number of users only results in more paths between users and sinks. D-COM uses direct communication between users and sinks, while A-COM uses multi-hop communication; A-COM therefore has more paths and consumes more energy (e.g., five users in A-COM consume five times the energy consumed by one user). However, this is a tradeoff between energy and infrastructure: although A-COM has more energy consumption and delay than D-COM, the merit of A-COM is an infrastructure-less communication system.
Scenario 3 of A-COM can be compared with G-COM and T-COM, because Scenario 3 of A-COM, G-COM, and T-COM all have no static sinks. There are no sinks, and the number of users varies from 1 to 5. In this part, we compare Scenario 3 of A-COM to G-COM and T-COM regarding lifetime, delay, and delivery ratio. G-COM and T-COM make and change the topology proactively, but Scenario 3 of A-COM makes it reactively and shares it among users. Generally, users merely move about the sensor field and generate interests only occasionally; hence, sensors in Scenario 3 can save considerable energy. In contrast, sensors in G-COM and T-COM maintain a topology continuously. Fig. 16 shows the lifetime of these sensor networks. As shown in Fig. 16, the lifetime of T-COM is considerably low due to frequent topology changes, and that of G-COM is relatively low due to GRID maintenance.
In Fig. 17, G-COM has little delay thanks to the proactive GRID topology enabled by the GPS receivers. T-COM also creates its topology proactively, but its frequent topology changes delay data delivery considerably. The delay of Scenario 3, as shown in Fig. 17, is only slightly higher, due to the reactive first-agent selection and topology construction. In the case of the data delivery ratio, A-COM and G-COM in Fig. 18 are similar, while T-COM is worse; the reason is frequent topology change, since topology change messages disturb data delivery.
3.4 Impact of the Number of Sensor Nodes
Evaluating A-COM against the other models according to the number of sensors is of no real consequence, since the result is closely related to the performance of the routing protocols used in the network. Hence, we do not evaluate the proposed model with this factor.
[Figure] Fig. 19. Delay for User Speed
[Figure] Fig. 20. Data Delivery Ratio for User Speed
3.5 Impact of the User Mobility
We lastly evaluate the impact of user speed on A-COM. We vary the maximum speed of a user from 8 to 20 m/s and assume that there is one user in the sensor field. In this part, we compare Scenario 3 to G-COM and T-COM, because D-COM is independent of user speed. Fig. 19 shows the delay in data delivery, which slightly increases as the user moves faster. The delay depends on the movement handling performed by the user: the faster a user moves, the more time is needed to establish a connection between the user and the network. Nevertheless, the delay of A-COM is comparable with G-COM, because A-COM creates only one communication path between the user and its first agent. The delay of T-COM, on the other hand, is relatively higher than the others due to frequent topology changes. Fig. 20 shows the data delivery ratio as the user's moving speed changes. The data delivery ratio of A-COM decreases slightly along with the delay, but remains around 0.8-0.9 even as the user moves faster. The data delivery ratio of G-COM remains high because the GPS receivers help the user with geographical routing. On the other hand, the data delivery ratio of T-COM is relatively lower than the others because it incurs too many topology changes when moving. The results in Fig. 19 and Fig. 20 show that A-COM is fast and stable without GPS receivers.
4 Conclusion and Further Work
In this paper, we propose a novel agent-based user-network communication model to support the mobility of users in wireless sensor networks. In the proposed network model, the user can receive data with a high data delivery ratio and low latency without infrastructure. We verified that the lifetime of the sensor network is prolonged because the reactive path construction decreases the energy consumption of sensor nodes. We also verified that the data delivery ratio and the delay never degrade, even though communication between the user and the network, which guarantees the movement of the user, is supported only by sensor nodes without infrastructure networks. There is further work related to this research: in a mobile environment, many sensor nodes can shift from one place to another frequently, making the sensor network more dynamic. The issue of node mobility requires further study.
References
1. I.F. Akyildiz, et al., "A survey on sensor networks," IEEE Communications Magazine, Vol. 40, pp. 102-114, Aug. 2002.
2. C. Intanagonwiwat, et al., "Directed diffusion: A scalable and robust communication paradigm for sensor networks," ACM MobiCom, 2000.
3. C. Schurgers and M.B. Srivastava, "Energy efficient routing in wireless sensor networks," IEEE MILCOM, 2001.
4. W. R. Heinzelman, et al., "Adaptive Protocols for Information Dissemination in Wireless Sensor Networks," ACM MobiCom, 1999.
5. F. Ye, et al., "A Two-Tier Data Dissemination Model for Large-scale Wireless Sensor Networks," ACM MobiCom, Sept. 2002.
6. S. Kim, et al., "SAFE: A Data Dissemination Protocol for Periodic Updates in Sensor Networks," Distributed Computing Systems Workshops, 2003.
7. H. L. Xuan and S. Lee, "A Coordination-based Data Dissemination Protocol for Wireless Sensor Networks," IEEE ISSNIP, Dec. 2004.
8. K. Hwang, et al., "Dynamic sink oriented tree algorithm for efficient target tracking of multiple mobile sink users in wide sensor field," IEEE VTC, Sep. 2004.
9. S. R. Gandham, et al., "Energy Efficient Schemes for Wireless Sensor Networks with Multiple Mobile Base Stations," IEEE GLOBECOM, Dec. 2003.
10. H. Ferriere, et al., "Efficient and Practical Query Scoping in Sensor Networks," IEEE International Conference on Mobile Ad-hoc and Sensor Systems, Oct. 2004.
11. E. I. Oyman and C. Erso, "Multiple Sink Network Design Problem in Large Scale Wireless Sensor Networks," IEEE ICC, Jun. 2004.
12. Scalable Network Technologies, Qualnet, [online] available: http://www.scalable-networks.com.
13. Hui Dai and Richard Han, "A node-centric load balancing algorithm for wireless sensor networks," IEEE GLOBECOM, Dec. 2003.
Realistic Mobility and Propagation Framework for MANET Simulations
Mesut Güneş, Martin Wenig, and Alexander Zimmermann
Chair of Computer Science, Informatik 4, RWTH Aachen University
{guenes,wenig,zimmermann}@i4.informatik.rwth-aachen.de
Abstract. Two main steps on the way to more realistic simulations of mobile ad-hoc networks are the introduction of realistic mobility models and sophisticated radio wave propagation models. Both have a strong impact on the performance of mobile ad-hoc networks; e.g., the performance of routing protocols changes with these models. In this paper we introduce a framework which combines realistic mobility and radio wave propagation models. Our approach consists of a zone-based mobility generator and a high-accuracy radio wave propagation model. For the mobility generation, a wide variety of well understood random mobility models is combined with a graph-based zone model, where each zone has its own mobility model. To achieve a realistic radio wave propagation model, a ray tracing approach is used. The integration of these two techniques makes it possible to create simulation setups that closely model reality.
1 Introduction
A mobile ad-hoc network is created by a collection of nodes which communicate using radio interfaces and do not rely on any pre-installed infrastructure. Furthermore, ad-hoc networks are supposed to be inherently adaptive and auto-configured; they therefore offer immense flexibility. In recent years the interest in the deployment of ad-hoc networks for real-world scenarios has grown. Still, the number of real-world ad-hoc networks is quite low, and most of the testbeds [1] consist of only a small number of nodes. The development and testing of new algorithms and methods nowadays relies heavily on network simulations. Simulating wireless networks, and especially mobile ad-hoc networks, is not a trivial task, and consequently there have been discussions about the validity of presented simulation results [2,3]. This work does not deal with the methodological background used to analyze the output of the simulation; instead, it deals with the simulator's accuracy. A key factor in accurate simulation results is accurate simulation models. In the authors' belief, the main weak points are 1) the unrealistic assumptions concerning radio wave propagation [2], 2) the currently used simplistic mobility models [4,5], and 3) the assumed workload of the network. This work proposes a solution to the first two of these problems.
Our contribution in this work is an integrated framework which allows the definition and control of the movement and the radio wave propagation model in higher detail than previous approaches. We propose a generation process which is based on partitioning the simulation area into zones with different, independent mobility models, together with a high-accuracy radio wave propagation model. The need for such a generation framework might be illustrated by a quick literature overview. Taking the publications of the MobiHoc conferences of the last two years as an example, it is obvious that there is a need for better tool support for simulation designers. Out of 52 papers, 35 presented simulation results (around 67%). Six papers did not give any information about the used mobility model, 10 used random waypoint to model mobility, and 14 considered static scenarios. Only two papers showed results obtained from considering more than one mobility model. Only two papers mentioned the used radio wave propagation model, ten papers gave no indication about the used model, and 22 used a fixed radius. Assuming that all papers which did not specify their propagation model used a fixed range, it can be concluded that all papers used circular, bidirectional links. None of the presented papers used a small-scale (fading) model. The structure of the paper is as follows: In Section 2, mobility and radio wave propagation models are presented. In Section 3 our approach is discussed in detail, and in Section 4 some simulation results obtained with ns-2 are discussed.
2 Related Work
2.1 Random Mobility Models
There are many random mobility models proposed in the literature. Detailed descriptions of these models are given in [6,7,8,9,10]. The simplest random mobility model is called Random Walk. In this model, a node randomly selects a direction and speed from predefined ranges [ϕmin : ϕmax] and [vmin : vmax], respectively. Each movement is bounded either by travel time or by distance. The Random Waypoint mobility model is an extension of Random Walk and integrates a pause time between two consecutive moves. A disadvantage of this model is the concentration of nodes in the center of the simulation area [11]. Besides these entity mobility models, there are group mobility models which specify how a set of nodes move with respect to each other [6]. In the Nomadic Community Mobility Model, all mobile nodes move together from one location to another, but each node uses its own entity mobility model. The Reference Point Group Mobility model specifies the movement of the group as well as the movements of the nodes within the group. There are also models which match the characteristics of car movements. In the Freeway model, there is at least one lane in each direction of a street, and the nodes move on the lanes; the speed of a node depends on other nodes on the same lane. In the Manhattan model, the lanes are organized around blocks of buildings, and a node can change its direction only at intersections. All mobility models discussed so far share the assumption that there are no obstacles. In [12] a refinement of random mobility models by integrating obstacles
is proposed. The obstacles represent buildings. Upon the definition of buildings, paths between them are calculated. The mobile nodes are randomly distributed on the paths, and the destinations of the nodes are selected randomly among the buildings. The nodes move on the defined paths from building to building. Additionally, the communication characteristics are also affected by the obstacles: a mobile node inside a building cannot communicate with a mobile node outside the building.
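To make the entity models above concrete, the following is a minimal sketch of a Random Waypoint trace generator under the usual assumptions (uniformly chosen waypoints, uniform speed in [vmin, vmax], fixed pause time); the function and parameter names are our own illustration, not part of any of the cited tools.

import random

def random_waypoint(width, height, v_min, v_max, pause, duration):
    """Yield (time, x, y) waypoints for one node (minimal sketch).

    Assumes v_min > 0 so that travel times are finite.
    """
    t = 0.0
    x, y = random.uniform(0, width), random.uniform(0, height)
    yield t, x, y
    while t < duration:
        nx, ny = random.uniform(0, width), random.uniform(0, height)
        speed = random.uniform(v_min, v_max)
        # travel time to the next waypoint at the chosen speed
        t += ((nx - x) ** 2 + (ny - y) ** 2) ** 0.5 / speed
        x, y = nx, ny
        yield t, x, y
        t += pause  # pause before the next move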
2.2 Models from Cellular Network Research
In cellular networks, the geographical area is divided into cells. There is a base station in each cell which provides communication service for the nodes. The mobility models in this area describe the mobility of the nodes with regard to the cell topology, i.e., when a node moves from a particular cell to another cell. In [13] a hierarchy of such models with regard to metropolitan, national, and international mobility is presented. Their results cannot directly be applied to MANET simulations, since here movements have to be described at a higher granularity.
2.3 Mobility Models from Real User Traces
The best input for simulations would be derived from real traces. However, it is very difficult for the research community to obtain such data; therefore, few studies based on real data have been reported [14]. In [15] the authors describe how real user traces can be used to build simulation models, based on the trace collection at Dartmouth College. Their interesting research can be used as input for our mobility generator. But using it to evaluate our models is not meaningful, since we could set up our model to deliver similar results by simply creating a similar geometry and using their parameters as input.
2.4 Radio Wave Propagation
Radio channels are more complicated to model than wired channels. Their characteristics may change rapidly and randomly, and they depend on their surroundings (buildings, terrain, etc.). Nevertheless, most wireless network simulators use very simplified propagation models. In general, propagation models can be divided into two groups: large-scale and small-scale propagation models. Large-scale models characterize how the transmission power between two nodes changes over long distances and over a long time. Small-scale models account for the fact that small movements (on the order of the wavelength) may have a large influence on the transmission quality; also, due to multipath propagation, the signal varies heavily even if the nodes do not move. Commonly used propagation models are the Free Space model, the Two-Ray Ground model and the Shadowing model [16]. In addition, Ricean and Rayleigh fading are often used as small-scale models [17]. None of these models is able to correctly model complex scenarios with obstacles. One way to overcome this
limitation is the use of ray tracing techniques. In [18] an approach using this technique is described. It allows the definition of obstacles in a graphical editor, and this scenario description is used in the simulation to feed a ray tracing algorithm. The algorithm is started once for every new position a node takes up. The authors state that this approach slows down the simulation by a factor of up to 100. Moreover, no movement information is generated by this tool. Other approaches [19,20] either do not scale well or their accuracy depends strongly on the selected grid resolution of the calculated scenario. Additionally, these models were developed for fixed wireless networks.
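For reference, the simple large-scale models named above can be stated in a few lines. The following sketch follows the standard Friis free-space and two-ray ground reflection equations (the forms also used in ns-2 [16,21]); unit antenna gains and a unit system-loss factor are assumed as defaults.

import math

def free_space(pt, d, wavelength, gt=1.0, gr=1.0, L=1.0):
    """Friis free-space received power (same unit as pt),
    at distance d for carrier wavelength (meters)."""
    return pt * gt * gr * wavelength ** 2 / ((4 * math.pi * d) ** 2 * L)

def two_ray_ground(pt, d, ht, hr, gt=1.0, gr=1.0):
    """Two-ray ground reflection model (valid for large d),
    with transmitter/receiver antenna heights ht and hr."""
    return pt * gt * gr * (ht * hr) ** 2 / d ** 4

Both models predict power that depends on distance only, which is exactly why they yield the circular, bidirectional links criticized above.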
3 Generation Framework
The proposed framework, named CosMos, addresses the generation of i) realistic mobility patterns and ii) accurate radio wave propagation information for a specific scenario. The whole process consists of three steps. First, the designer creates a scenario for the desired simulation setup. In the next two steps, the energy density maps for the radio wave propagation model are precomputed and the movement files are generated.
3.1 Scenario Creation
The scenario consists of movement zones (MZs) and obstacle zones (OZs). Both are basically described as polygons, and they divide the simulation area into smaller parts. The designer assigns a mobility model to each MZ. All models have their own set of parameters (e.g., maximum speed) and are independent of each other. When MZs overlap, nodes can change from one zone to the other; the probability of leaving the current zone can also be set by the designer. The obstacles have three parameters: their height and their reflection and transmission coefficients. The zones can be positioned by the designer as desired, e.g., to model an indoor scenario. The values for the mobility models must be chosen individually according to the intended scenario; here, researchers have gathered only very limited experience so far. The approach presented in [15] can be integrated into our framework. For the radio wave propagation model there are measurements which can be used, e.g., [16]. Another approach is to perform one's own measurements and use these as input for the ray tracer.
3.2 Mobility Generator
The movement zones form a weighted, directed graph: the zones are the vertices (V), and there is an edge between two vertices if the corresponding movement zones overlap. The function $w : E \to (0, 1)$ assigns a weight to every edge of the graph G(V, E). The weight $w_{i,j}$ of a directed edge $e_{i,j} \in E$ from zone i to zone j corresponds to the probability that a node leaves zone i for zone j.
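A minimal sketch of this zone graph follows: zones are vertices, overlapping zones share directed edges, and a node in zone i moves to zone j with probability w_ij (staying with the residual probability). The class and method names are our own illustration, not the CosMos interface.

import random

class ZoneGraph:
    def __init__(self):
        self.weights = {}  # (i, j) -> exit probability w_ij in (0, 1)

    def add_edge(self, i, j, w):
        self.weights[(i, j)] = w  # outgoing weights of i must sum to <= 1

    def next_zone(self, i):
        """Sample the next zone of a node currently in zone i.
        Returns None if the node stays in zone i."""
        r = random.random()
        acc = 0.0
        for (src, dst), w in self.weights.items():
            if src == i:
                acc += w
                if r < acc:
                    return dst
        return None  # stay in zone i with the residual probability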
Initially, all nodes are distributed randomly in the movement zones. During the generation process, nodes move inside the zones according to the mobility model of their current zone. If a node decides to leave for a neighbouring zone (depending on the exit probability), it moves towards the overlapping region of the two zones. When it arrives there, the mobility model of the new zone takes over and the node starts to move according to the new mobility model. Our approach allows the calculation of the spatial distribution of nodes over the simulation area as well as the distribution of nodes over the movement zones. Since initially all nodes are distributed randomly over all movement zones, there will be a point in time at which zones with a high exit probability 'lose' some nodes to the zones with a lower exit probability. After a while, the distribution of nodes should become stable. Figure 1 shows a small example with three movement zones: two large rectangles on each side and a narrow street connecting them. Zone A has a higher exit probability than Zone B. The street is only used to travel from zone A to B and vice versa.
Fig. 1. Visualizing the steady state: (a) spatial distribution of the nodes (probability over the width and height of the simulation area); (b) distribution of the nodes on the zones (percentage of nodes over simulation time [s], for Zone A, Zone B, and the street)
Figure 1(a) shows the (long-term) spatial probabilities for the simple scenario, i.e., the probability of a node being at a specific place over a simulation run of 10000 seconds. Figure 1(b) shows that initially all zones contain approximately one third of all nodes. During the simulation run the distribution slowly changes: all nodes in the street zone leave this zone for one of the neighbouring zones. The distribution stabilizes when most of the nodes are inside the zone with the lower exit probability. Nodes still travel from one zone to the other, but the long-term average remains relatively constant from second 2000 onwards. If this 'steady-state' behaviour is important for the considered simulation scenario, the first seconds have to be cut off. By default, the proposed framework creates several independent variants of the movements according to the given models. This helps the researcher to conduct simulations with independent replications.
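One simple (hypothetical) way to decide where to cut off the warm-up phase is to track the per-zone node shares over a sliding window and stop once they no longer change by more than a tolerance. This heuristic is our own illustration, not part of CosMos:

def warmup_cutoff(zone_shares, window=10, tol=0.01):
    """zone_shares: list of per-second dicts {zone: fraction of nodes}.
    Return the first second after which every zone's share stays within
    tol of its current value for the whole window; None if never."""
    for t in range(len(zone_shares) - window):
        stable = all(
            abs(zone_shares[t + k][z] - zone_shares[t][z]) <= tol
            for k in range(1, window)
            for z in zone_shares[t]
        )
        if stable:
            return t
    return None

In the scenario of Fig. 1, such a check would return a cutoff of roughly 2000 seconds, matching the visual impression.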
3.3 Radio Wave Propagation Model
The radio wave propagation model used in this work is based on a ray tracing approach. The obstacles defined in CosMos are used as input for the ray tracer. Triggering a ray tracing run for every position of the current sender is infeasible in mobile ad-hoc networks. Instead, our approach uses a set of predefined starting points for the ray tracing. The ray tracer is then started once for each of these points, creating an energy distribution map for each one. During the simulation, the energy distribution between the sender and the receivers is calculated using weighted interpolation, as detailed below. The ray tracer accounts for the following propagation phenomena: reflection, diffraction, and scattering. To use the generated energy distribution maps during the simulation, we modified the ns-2 network simulator [21]. We added a propagation model which reads in a given set of maps and the corresponding starting points. During the simulation, whenever a node $n_t$ wants to transmit a packet, a k-nearest-neighbor search is started (in our experiments, k = 3 gave good results). This search finds the k starting points nearest to the sender's position, together with their corresponding energy distribution maps. For each node inside the maximum interference range of an unobstructed radio wave, the transmission power is calculated. The formula used for the weighted interpolation is

\[ s_{t\text{-}r} = \frac{\sum_{i=0}^{k-1} s_i \,/\, \lVert pos_i - pos_t \rVert^{p}}{\sum_{i=0}^{k-1} 1 \,/\, \lVert pos_i - pos_t \rVert^{p}} \,, \]
where $s_{t\text{-}r}$ is the signal strength between the transmitter node $n_t$ and the receiver node $n_r$. The position of the transmitter is given as $pos_t$, and $pos_i$ denotes the position of the starting point of the i-th closest map. Note that $s_i$ is the predicted signal strength of map i at the position of the receiver, $pos_r$. The exponent p controls how much influence is given to maps that are further away (in our experiments, p was set to 3). The benefits of our approach are that it is not necessary to rerun the ray tracing algorithm during simulation time, it is not necessary to divide the simulation area into evenly sized squares, and the accuracy can be increased in areas with many obstacles simply by adding more starting points. A real-time evaluation tool has been developed to show the result of the interpolation. Our approach increases the simulation speed and allows the designer to choose between high accuracy and reduced memory needs [22].
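The interpolation above is an inverse-distance weighting over the k nearest precomputed maps. A minimal sketch of this step follows, with p = 3 and k = 3 as in our experiments; here a linear scan and a generic predict callback stand in for the kd-tree lookup and the stored energy distribution maps of the actual implementation.

def interpolate_signal(tx_pos, rx_pos, maps, k=3, p=3):
    """maps: list of (start_point, predict) pairs, where predict(rx_pos)
    returns that map's predicted signal strength s_i at the receiver."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    # k starting points nearest to the transmitter (kd-tree in practice)
    nearest = sorted(maps, key=lambda m: dist(m[0], tx_pos))[:k]
    num = den = 0.0
    for start, predict in nearest:
        d = dist(start, tx_pos)
        if d == 0.0:
            return predict(rx_pos)  # transmitter sits on a starting point
        w = 1.0 / d ** p            # weight of map i
        num += w * predict(rx_pos)
        den += w
    return num / den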
4 Results
In this section we discuss simulation results obtained with ns-2. The simulation scenario was created with CosMos; it models the office building that houses the authors' chair. The intention of the studies was
to show the impact of the mobility and radio wave propagation models on the performance of MANET routing protocols. Figure 2 shows the scenario outline; only the ground floor is modeled here.

Fig. 2. Indoor scenario

All nodes are equipped with IEEE 802.11 radio interfaces with a transmission rate of 11 Mbit/s and a transmission power of 0.1 mW. The receiving threshold was set to −88 dBm, a value taken from the specification of the Cisco Aironet 1240AG Series access point. The AODV implementation of the University of Uppsala [23] and the current version of DSR [11] in ns-2 were used. Thirty connections between randomly selected nodes were started, each one offering 32 kBytes of load. Movements inside offices are rare and relatively slow (the maximum speed was set to 1 m/s); movements in the hallways, on the other hand, are faster and follow the freeway model (maximum speed 2 m/s). Since we had detailed plans of our building, we were able to model it in high detail. To check the accuracy of the radio wave propagation model, we conducted measurements with real-life systems and compared the results to the calculated values. The mean error between the predicted and the measured values is 3.5 dB, which is as good as the results for the approaches mentioned in Section 2. We conducted simulations using the AODV and the DSR routing protocols in which the following combinations were considered: the CosMos mobility model together with the Two-Ray Ground propagation model, and the CosMos mobility model with the ray-traced propagation model. Figure 3 shows a comparison of the throughput achieved using AODV and DSR in the presented scenario. It is clear that the values measured without the ray-traced propagation model can be considered equal. But using the ray-traced propagation model, the DSR protocol suffers more heavily from performance loss; AODV seems to cope better with the situation. The decreasing performance for larger numbers of nodes can be explained by the higher number of hops between sender and destination.
The routes are getting longer because the node density is higher and nodes farther away can also be reached. The reason for the worse performance of DSR compared to AODV seems to be a larger number of discovered paths which were actually already invalid (stale) when they were to be used for the first time. Figure 4 compares the average end-to-end delay of packets between source and destination. As expected, the values using Two-Ray Ground can again be considered equal. Using the ray tracer, the delay naturally grows due to longer routes, a higher number of transmission errors, and thus higher routing overhead. Again, we see a strong influence on DSR. As a rule of thumb, one can say that if more than 90% of all packets have a delay of less than 150 ms, VoIP is possible with reasonable quality. In scenarios with more than 60 nodes, DSR is not able to fulfill this criterion. This is yet another example of why accurate simulation models are absolutely necessary: had one based the decision on the simple simulation setup, both algorithms would have been judged as equal, but in reality only AODV is actually able to fulfill the delay bound.
Fig. 3. Throughput comparison between AODV and DSR (throughput in KBytes/s over the number of nodes, for AODV and DSR with the ray tracer and with TRG)
Another result of our simulation study is that the mobility model is more important for larger scenarios: the smaller the simulation area compared to the transmission range of the nodes, the smaller the influence of the mobility model. We also measured the run-time of the simulations with and without our propagation model. Table 1 shows the times for the indoor simulation. The increase in run-time is relatively small, since during the simulation only lookups in the kd-tree have to be performed. The preprocessing time, namely the time needed to create the energy distribution maps, depends on the complexity of the scenario. For the presented indoor scenario, 112 starting points were used, and the ray tracer needs around 12 seconds per point (shooting 50000 photons).
Fig. 4. Delay comparison between AODV and DSR (end-to-end delay in seconds over the number of nodes)

Table 1. Runtime of the ns-2 simulator

# Nodes   Runtime (s) TwoRayGround   Runtime (s) CosMos   Factor
10        13.6                       16.5                 1.2
20        34.3                       61.9                 1.8
30        59.3                       91.1                 1.5
40        69.1                       119.2                1.7
50        90.2                       147.5                1.6

5 Conclusion
In this paper we have introduced a mobility and radio wave propagation scenario generator for MANETs. The goal is to aid researchers in the design of realistic simulation scenarios. The framework is very general and can be deployed to design scenarios with special requirements. Our approach combines a wide variety of well-understood random mobility models with a graph-based zone model and a sophisticated ray-traced radio wave propagation model; each zone can have a different mobility model. The framework generates the mobility definition and the ray tracer results from one common scenario, so the combination of realistic movement models and accurate radio wave propagation models becomes an easy task for the researcher. Furthermore, our approach allows the calculation of the spatial distribution of nodes over the simulation area as well as the distribution of the nodes over the defined zones. This allows us to determine the time at which the stationary state is reached, which matters because trustworthy MANET simulations should begin only once the stationary state is reached.
References
1. Güneş, M., Bouazizi, I.: From Biology to Technology: Demonstration Environment for the Ant Routing Algorithm for Mobile Ad-hoc Networks. In: Tenth Annual Int. Conference on Mobile Computing and Networking (ACM MobiCom 2004), Philadelphia, USA (September 2004)
2. Kotz, D., Newport, C., Gray, R.S., Liu, J., Yuan, Y., Elliott, C.: Experimental evaluation of wireless simulation assumptions. Technical Report TR2004-507, Dept. of Computer Science, Dartmouth College (June 2004)
3. Pawlikowski, K., Jeong, H.D.J., Lee, J.S.R.: On credibility of simulation studies of telecommunication networks. IEEE Communications 40(1) (January 2002) 132–139
4. Bettstetter, C., Hartenstein, H., Perez-Costa, X.: Stochastic properties of the random waypoint mobility model: epoch length, direction distribution, and cell change rate. In: MSWiM '02: Proc. of the 5th ACM Int. Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems, NY, USA, ACM Press (2002) 7–14
5. Bettstetter, C., Resta, G., Santi, P.: The node distribution of the random waypoint mobility model for wireless ad hoc networks. IEEE Trans. Mobile Computing 2(3) (2003) 257–269
6. Hong, X., Gerla, M., Pei, G., Chiang, C.C.: A group mobility model for ad hoc wireless networks. In: Proc. of the ACM Int. Workshop on Modeling and Simulation of Wireless and Mobile Systems (MSWiM) (August 1999) 53–60
7. Camp, T., Boleng, J., Davies, V.: A survey of mobility models for ad hoc network research. Wireless Communications and Mobile Computing (WCMC): Special issue on Mobile Ad Hoc Networking: Research, Trends and Applications 2(5) (2002) 483–502
8. Sanchez, M.: Mobility models. http://www.disca.upv.es/misan/mobmodel.htm (2005)
9. Bai, F., Sadagopan, N., Helmy, A.: The IMPORTANT framework for analyzing the impact of mobility on performance of routing for ad hoc networks. Ad Hoc Networks Journal – Elsevier Science 1(4) (November 2003) 383–403
10. Lin, G., Noubir, G., Rajaraman, R.: Mobility models for ad hoc network simulation. In: Proc. of the 23rd Conference of the IEEE Communications Society, Hong Kong, IEEE (March 7–11, 2004)
11. Johnson, D.B., Maltz, D.A.: Dynamic source routing in ad hoc wireless networks. In: Mobile Computing. Volume 353. Kluwer (1996)
12. Jardosh, A., Belding-Royer, E.M., Almeroth, K.C., Suri, S.: Towards realistic mobility models for mobile ad hoc networks. In: The Ninth Annual Int. Conference on Mobile Computing and Networking (ACM MobiCom 2003), San Diego, USA, ACM (September 14–19, 2003)
13. Lam, D., Cox, D.C., Widom, J.: Teletraffic modeling for personal communications services. IEEE Communications Magazine 35 (Feb. 1997) 79–87
14. Hsu, J., Bhatia, S., Takai, M., Bagrodia, R., Acriche, M.J.: Performance of mobile ad hoc networking routing protocols in realistic scenarios. In: MilCom 2003, Boston, Massachusetts (October 13–16, 2003)
15. Kim, M., Kotz, D., Kim, S.: Extracting a mobility model from real user traces. In: Proc. of the 25th Joint Conference of the IEEE Computer and Communications Societies, Barcelona, Spain, IEEE Computer Society Press (April 2006)
16. Rappaport, T.S.: Wireless Communications, Principles & Practice. Prentice Hall (1999)
17. Punnoose, R.J., Nikitin, P.V., Stancil, D.D.: Efficient simulation of Ricean fading within a packet simulator. In: Vehicular Technology Conference (Sep. 2000)
18. Dricot, J.M., Doncker, P.D.: High-accuracy physical layer model for wireless network simulations in ns-2. In: Proc. of the Int. Workshop on Wireless Ad-hoc Networks (2004)
19. Catedra, M., Perez, J., de Adana, F.S., Gutierrez, O.: Efficient ray-tracing techniques for three-dimensional analyses of propagation in mobile communications: application to picocell and microcell scenarios. Antennas and Propagation Magazine 40(2) (Apr 1998) 15–28
20. Schmeink, M., Mathar, R.: Preprocessed indirect 3D-ray launching for urban microcell field strength prediction. In: AP 2000 Millennium Conference on Antennas and Propagation (April 2000)
21. Fall, K., Varadhan, K.: The ns-2 manual. Technical report, The VINT Project, UC Berkeley, LBL and Xerox PARC (2003)
22. Schmitz, A., Wenig, M.: The effect of the radio wave propagation model in mobile ad hoc networks. In: MSWiM '06: Proc. of the 9th ACM Int. Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Torremolinos, Spain, ACM Press (2006)
23. Wiberg, B., Nordström, E.: Ad-hoc on-demand distance vector routing – for real world and simulation. http://core.it.uu.se/core/index.php/AODV-UU (October 2006)
Localization for Large-Scale Underwater Sensor Networks

Zhong Zhou¹, Jun-Hong Cui¹, and Shengli Zhou²

¹ Computer Science & Engineering Dept, University of Connecticut, Storrs, CT, USA, 06269
² Electrical & Computer Engineering Dept, University of Connecticut, Storrs, CT, USA, 06269
{zhong.zhou,jcui,shengli}@engr.uconn.edu
Abstract. In this paper, we study the localization problem in large-scale underwater sensor networks. The adverse aqueous environments, the node mobility, and the large network scale all pose new challenges, and most current localization schemes are not applicable. We propose a hierarchical approach which divides the whole localization process into two sub-processes: anchor node localization and ordinary node localization. Many existing techniques can be used in the former. For the ordinary node localization process, we propose a distributed localization scheme which integrates, in a novel way, a 3-dimensional Euclidean distance estimation method with a recursive location estimation method. Simulation results show that our proposed solution can achieve high localization coverage with relatively small localization error and low communication overhead in large-scale 3-dimensional underwater sensor networks.
1 Introduction
Recently, there has been a rapidly growing interest in monitoring aqueous environments for scientific exploration, commercial exploitation and coastline protection. The ideal vehicle for this type of extensive monitoring is a distributed underwater system with networked wireless sensors, referred to as an Underwater Wireless Sensor Network (UWSN) [1,9]. For most UWSNs, a localization service is an indispensable part. For example, in long-term non-time-critical aquatic monitoring [9,13], localization is a must-do task to obtain useful location-aware data. Location information is also needed for geo-routing, which has been shown to be more efficient than pure flooding in UWSNs [20]. In this paper, we investigate the localization issue for large-scale UWSNs. Localization has been widely explored for terrestrial wireless sensor networks, with many localization schemes having been proposed so far. Generally speaking, these schemes can be classified into two categories: range-based schemes and range-free schemes. The former covers the protocols that use absolute point-to-point distance (i.e., range) estimates or angle estimates to calculate locations [12,14,6,5,18,15], while the latter makes no assumptions about the availability or validity of such range information [7,17,16,11,19]. Although range-based protocols can provide more accurate position estimates, they need additional hardware for distance measurements, which increases the network cost. On the other hand, range-free schemes do not need additional
This work is supported in part by the NSF CAREER Grant No.0644190.
hardware support, but they can only provide coarse position estimates. In this paper, we are more interested in accurate localization, which is requested by a range of applications such as estuary monitoring and pollutant tracking [9]. Moreover, in UWSNs, acoustic channels are naturally employed, and range measurements using acoustic signals are much more accurate than those using radio [9,20]. Thus, range-based schemes are a potentially good choice for UWSNs. Due to the unique characteristics of UWSNs (such as low communication bandwidth, node mobility, and 3-dimensional node deployment) [1,9], however, the applicability of the existing range-based schemes is yet to be investigated. There are also several schemes proposed for the localization service in underwater acoustic networks [4,3,21,10]. These solutions are mainly designed for small-scale networks (usually with tens of nodes or fewer). For large-scale UWSNs, hundreds or thousands of sensor nodes are deployed in a wide underwater area, and directly applying the localization schemes proposed for small-scale underwater networks is often inefficient and costly. In this paper, for the first time, we explore the localization problem in large-scale UWSNs. We propose a hierarchical approach, dividing the whole localization process into two sub-processes: anchor node localization and ordinary node localization. Many existing approaches can be used for anchor node localization. For ordinary node localization, we propose a novel distributed method based on a 3-dimensional Euclidean distance estimation method and a recursive location estimation method. Simulation results show that our localization scheme can achieve high localization coverage with accurate location estimation and low communication overhead in large-scale 3-dimensional underwater sensor networks. The rest of this paper is organized as follows. In Section 2, we describe our localization scheme. Simulation results are presented in Section 3. Finally, we draw conclusions in Section 4.
2 Localization for Large-Scale UWSNs
2.1 Overview
We consider a typical UWSN environment as shown in Fig. 1. There are three types of nodes in the network: surface buoys, anchor nodes, and ordinary nodes. Surface buoys are nodes that drift on the water surface. These buoys are often equipped with GPS and can get their absolute locations from GPS or by other means. Anchor nodes are those that can directly contact the surface buoys to get their absolute positions; they can also communicate with ordinary nodes and assist them in localization. Ordinary nodes are those that cannot directly talk to the surface buoys, because of cost or other constraints, but can communicate with the anchor nodes to estimate their own positions. To handle the large scale of UWSNs, we propose a hierarchical localization approach, in which the whole localization process is divided into two sub-processes: anchor node localization and ordinary node localization. At the beginning, only the surface buoys know their locations, through GPS or by other means. Four or more buoys are needed in our system. These buoys work as the "satellites" for the whole network, and anchor nodes can be localized by
Fig. 1. A typical large-scale underwater sensor network setting
these surface buoys. Using surface buoys to locate underwater objects has been extensively investigated, and many existing systems, such as [4] and [3], can be employed in the anchor node localization process. In this paper, we do not contribute to this part. Instead, we mainly tackle the problem of ordinary node localization, for which we propose a distributed range-based scheme that integrates, in a novel way, a 3-dimensional Euclidean distance estimation method and a recursive location estimation method. We describe this scheme in the following section.
2.2 Ordinary Node Localization
In 3-dimensional UWSNs, under a range-based localization scheme, ordinary nodes have to estimate their distances to 4 or more anchor nodes and calculate their locations by triangulation methods, which are commonly used in GPS systems. In a large-scale UWSN, however, not all ordinary nodes can directly measure their distances to 4 or more anchor nodes; thus, multi-hop distance estimation methods have to be developed. In [18], the authors proposed and compared three multi-hop distance estimation methods: DV-Hop, DV-Distance and Euclidean. Even in two-dimensional terrestrial sensor networks, the performance of DV-Hop and DV-Distance degrades dramatically in anisotropic topologies, while the Euclidean method achieves much more accurate results and behaves more consistently in both anisotropic and isotropic networks [18]. In a UWSN, since the sensor nodes are constantly moving due to many environmental factors, the network topology may change unpredictably in time and space. Thus, the Euclidean method is expected to be more suitable for UWSNs than other approaches. In our scheme, we employ a hybrid approach based on a 3-dimensional Euclidean distance estimation method and a recursive location estimation method to obtain the ordinary node positions. When combined with the recursive method, the inherent problems of the Euclidean method, such as high communication cost and low localization coverage, can be greatly alleviated. Next, we first discuss these two methods, examining why they can be seamlessly integrated. Then we describe the ordinary node localization process in detail.
3-Dimensional Euclidean Distance Estimation. In [18], a Euclidean distance propagation method is proposed for two-dimensional sensor networks. Here, we extend it to 3-dimensional networks and use an example to illustrate the method. Referring to Fig. 2, if an ordinary node E wants to estimate its distance to anchor node A, it needs to know at least three (one-hop) neighbors (e.g., B, C, and D) which have distance estimates to A. Note that nodes A, B, C and D should not be coplanar, and no three nodes out of A, B, C, D and E should be collinear. Moreover, E needs to know its two-hop distance estimates; that is, E should have the length information of EB, BA, EC, CA, ED, DA, DB, DC, and BC. The 3-dimensional Euclidean distance estimation works as follows. First, node E uses the edges BA, CA and BC to construct the basic localization plane. Since the lengths of edges DB, DA and DC are already known (to E), the position of D can easily be estimated; there exist at most two possible positions for D. Because E knows the lengths of edges ED, EB and EC, corresponding to the two possible positions of D there are at most four possible solutions for E's position. The choice among these four possibilities is made locally by voting when E has more immediate neighbors with estimates to A. If it cannot be decided, the distance estimate to A is not available until E gets more information from its neighbors.
Fig. 2. 3-dimensional Euclidean estimation
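The geometric core of this estimation is standard trilateration: given three non-collinear points with known coordinates and measured distances to an unknown point, there are at most two candidate positions (mirror images across the plane of the three points). The following NumPy sketch illustrates the primitive; it is our own illustration of the textbook construction, not the authors' code.

import numpy as np

def trilaterate(p1, p2, p3, r1, r2, r3):
    """Return the (up to two) 3-D points at distances r1, r2, r3 from
    the non-collinear points p1, p2, p3. Builds an orthonormal frame
    in the plane of p1, p2, p3 (standard derivation)."""
    ex = (p2 - p1) / np.linalg.norm(p2 - p1)
    i = ex.dot(p3 - p1)
    ey = p3 - p1 - i * ex
    ey /= np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    d = np.linalg.norm(p2 - p1)
    j = ey.dot(p3 - p1)
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2 - 2 * i * x) / (2 * j)
    z2 = r1**2 - x**2 - y**2
    if z2 < 0:
        return []  # inconsistent (noisy) distance estimates
    z = z2 ** 0.5
    base = p1 + x * ex + y * ey
    return [base + z * ez, base - z * ez]

In the example above, E would first apply this primitive to place D relative to the plane of A, B, C, and then again to generate the candidate positions for itself.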
Recursive Location Estimation. In [2], the authors propose an iterative framework to extend the position estimation from a few reference nodes throughout the whole network. System coverage increases recursively as nodes with newly estimated positions join the reference node set, which is initialized to the set of anchor nodes. This recursive location estimation method is illustrated in Fig. 3. In the figure, node 1 can get its location information from four neighboring anchor nodes A, B, C and D. If the location estimation error is small enough, node 1 can be regarded as a new reference node for other nodes, and it then broadcasts its own location information. When node 2 gets to know the locations of C, D, E and 1 as well as the distances to these nodes, it can calculate its own location. On the other hand, if the location estimation error is large, node 1 cannot be treated as a reference node and does not broadcast its location. In our scheme, the following formula is used to estimate the location error δ:
\[ \delta = \sum_i \left| (u - x_i)^2 + (v - y_i)^2 + (w - z_i)^2 - l_i^2 \right| , \qquad (1) \]
where (u, v, w) are the estimated coordinates of the unknown node, $(x_i, y_i, z_i)$ is reference node i's location, and $l_i$ is the measured distance between the unknown node and node i.
Fig. 3. Recursive location estimation
In order to alleviate the error propagation effect, every reference node in the system has a confidence value η. For the initial reference nodes (i.e., the anchor nodes), η is set to the largest value, while for a new reference node, η is associated with its location error. In our scheme, η is calculated as follows:

\[ \eta = \begin{cases} 1 & \text{if the node is an initial anchor} \\[4pt] 1 - \dfrac{\delta}{\sum_i \left( (u - x_i)^2 + (v - y_i)^2 + (w - z_i)^2 \right)} & \text{otherwise} \end{cases} \qquad (2) \]
We can see that η is essentially a normalized δ. A critical value λ (referred to as the "confidence threshold" later) is set. When η > λ, the unknown node can become a reference node; otherwise, it continues to be non-localized. When a node gets to know its distances to more than four nodes, it chooses four according to their η values and calculates its location.
Ordinary Node Localization Process. In the ordinary node localization process, there are two types of nodes: reference nodes and non-localized nodes. In the initialization phase, all anchor nodes label themselves as reference nodes and set their confidence values to 1. All the ordinary nodes are non-localized nodes. As the localization process advances, more and more ordinary nodes are localized and become reference nodes. There are two types of messages: localization messages and beacon messages. Localization messages are used for information exchange among non-localized nodes and reference nodes, while beacon messages are designed for distance estimates. During the localization process, each node (including reference nodes and non-localized nodes) periodically broadcasts a beacon message containing its id. All the neighboring
Fig. 4. Ordinary node localization process
nodes which receive this beacon message can estimate their distances to this node using techniques such as TOA (time of arrival). We describe the actions of the two types of nodes as follows.
Reference Nodes: Each reference node periodically broadcasts a localization message which contains its coordinates, node id, and confidence value.
Non-localized Nodes: Each non-localized node maintains a counter, n, of the localization messages it has broadcast. We set a threshold N (referred to as the "localization message threshold") to limit the maximum number of localization messages each node can send; in other words, N is used to control the localization overhead. Besides, each non-localized node also keeps a counter, m, of the reference nodes to which it knows the distances. Once the localization process starts, each non-localized node keeps checking m. There are two cases: (1) m < 4. The non-localized node broadcasts a localization message which contains all its received reference nodes' locations and its estimated distances to these nodes; its measured distances to all one-hop neighbors are also included in this localization message. Besides, the node uses the 3-dimensional Euclidean distance estimation approach to estimate its distances to more non-neighboring reference nodes. After this step, the set of its known reference nodes is updated, m is updated correspondingly, and the node returns to the m-checking procedure. (2) m ≥ 4. The non-localized node selects 4 reference nodes with the highest confidence values for location estimation. After it gets its location, it computes the confidence
value η. If η is larger than or equal to the confidence threshold λ, the node is localized and labels itself as a new reference node. Otherwise, if η is smaller than λ, the node takes the same actions as described in case (1). The complete localization procedure of an ordinary node is illustrated in Fig. 4.
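A short sketch of the confidence computation and λ-thresholding follows, assuming our reconstruction of formulas (1) and (2); the function names and the (position, distance) pair representation are our own illustration.

def location_error(est, refs):
    """delta from formula (1). est = (u, v, w); refs = list of
    ((x_i, y_i, z_i), l_i) reference positions and measured distances."""
    u, v, w = est
    return sum(abs((u - x)**2 + (v - y)**2 + (w - z)**2 - l**2)
               for (x, y, z), l in refs)

def confidence(est, refs, is_anchor=False):
    """eta from formula (2); anchors get the maximum confidence 1."""
    if is_anchor:
        return 1.0
    u, v, w = est
    norm = sum((u - x)**2 + (v - y)**2 + (w - z)**2 for (x, y, z), _ in refs)
    return 1.0 - location_error(est, refs) / norm

def becomes_reference(est, refs, lam=0.98):
    """Apply the confidence threshold (0.98 in the static simulations)."""
    return confidence(est, refs) >= lam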
3 Performance Evaluation
In this section, we evaluate the performance of our proposed localization scheme through simulation.
3.1 Simulation Settings
In our simulation experiments, 500 sensor nodes are randomly distributed in a 100 m × 100 m × 100 m region. We define node density as the expected number of nodes in a node's neighborhood; hence, node density is equivalent to node degree. We control the node density by changing the communication range of each node while keeping the deployment area the same. Range (i.e., distance) measurements between nodes are assumed to follow normal distributions, with the real distances as mean values and standard deviations of one percent of the real distances. 5%, 10% and 20% anchor nodes are considered in our simulations. Besides our scheme, we also simulate a Euclidean scheme and a recursive scheme for comparison. The recursive scheme is the same as in [2]. As for the Euclidean scheme, we use the three-dimensional Euclidean distance estimation as the distance propagation method and then use the triangulation method to estimate an ordinary node's position once it gets to know four or more reference nodes; it works almost the same as the Euclidean scheme for two-dimensional networks [18]. We consider three performance metrics: localization coverage, localization error, and average communication cost. Localization coverage is defined as the ratio of localizable nodes to total nodes. Localization error is the average distance between the estimated positions and the real positions of all nodes; as in [18,8], we normalize this absolute localization error to the node communication range R. Average communication cost is defined as the overall number of messages (including beacon messages and localization messages) exchanged in the network divided by the number of localized nodes.
3.2 Performance in Static Networks
In this set of simulations, nodes in the network are fixed. The confidence threshold λ is set to 0.98, and the localization message threshold N is set to 5. We change the node density (i.e., node degree) from 8 to 16 and compare our scheme with the Euclidean scheme and the recursive scheme. The results are plotted in Fig. 5, Fig. 6, and Fig. 7.
Localization Coverage. Fig. 5 shows that our scheme outperforms both the Euclidean scheme and the recursive scheme in terms of localization coverage. This is reasonable, since any node which can be located by either the Euclidean scheme or the recursive scheme can also be located by our scheme. The localization coverage of our scheme increases monotonically with the node density. But when the node density is relatively large, the coverage
reaches a relatively high value and does not change much after that. For example, when the anchor percentage is 20%, the localization coverage reaches 94% at node density 12 and does not increase much as the node degree is lifted further. We can also see that the more anchors there are, the higher the localization coverage. For example, if the anchor percentage is 5%, the localization coverage only reaches 0.4 at node density 13, but if the anchor percentage is 10%, it reaches 0.8 at the same node density. This suggests that in sparse networks, we can increase the number of anchor nodes to achieve higher localization coverage.
Localization Error. Fig. 6 plots the relationship between the localization error and the node density. We can observe that when the node density is relatively small, the localization error of our scheme is almost the same as that of the other two schemes. With increasing node density, the localization error of our scheme increases and becomes a little larger than that of the recursive scheme, but remains much smaller than that of the Euclidean scheme. This is because, with increasing node density, the localization coverage of our scheme increases much faster than that of the other two schemes, which leads to growth of the localization error; this growth is, however, at a much slower rate than that of the localization coverage. As the node density continues to increase beyond some point, the localization error of our scheme decreases slowly. This can be explained as follows: when the node density reaches a certain point, most sensor nodes can localize themselves; if we continue to increase the node density, ordinary nodes get to know more anchor nodes and have more choices for calculating their locations, so the localization error decreases. But, as shown in Fig. 6, this decrease is very limited. For example, when the anchor percentage is 5%, increasing the node density from 13 to 16 only decreases the localization error from 0.3 to 0.27. Thus, in practice, we cannot expect to reduce the localization error by simply lifting the node density. Fig. 6 also shows that the localization error decreases noticeably with the anchor percentage. For example, at node density 13, when the anchor percentage is 5%, the localization error is 0.3; when the anchor percentage is enlarged to 20%, it reduces to 0.05. Thus, more anchor nodes translate into smaller localization errors.
Communication Cost. Fig. 7 shows the average communication cost for varying node density. In the recursive localization scheme, only nodes with known locations broadcast messages and other nodes keep silent; therefore, the average communication cost of this scheme is very small. Our scheme, when the node density is small, introduces a larger communication cost than the recursive scheme. This is because in our scheme, when the network is sparse, many nodes exchange beacon messages but cannot finally localize themselves; in other words, these beacon messages are actually "wasted" in the localization process. With increasing node density, this waste becomes smaller and smaller, and the average communication cost of our scheme gets closer and closer to that of the recursive scheme. From the figure, we can also observe that the average communication cost of our scheme decreases with increasing anchor percentage. Compared with the Euclidean localization scheme, our scheme always achieves a much lower communication cost. This is due to the fact that the recursive component in our scheme helps to find more reference nodes much faster than the Euclidean localization scheme.
Fig. 5. Localization coverage ((a) anchor percentage = 5%, (b) anchor percentage = 10%, (c) anchor percentage = 20%)
Fig. 6. Localization error ((a) anchor percentage = 5%, (b) anchor percentage = 10%, (c) anchor percentage = 20%)
Discussions. It is shown in [8] that range-based ad hoc localization schemes place high requirements on the node density of the network. That paper also shows that in a two-dimensional network, the node density needs to be at least 11 in order to localize 95% of the nodes with less than 5% localization error when 20% anchor nodes are present. From Fig. 6(c), we can observe that with 20% anchors, our scheme can localize more than 95% of the nodes with less than 5% localization error if the node density is 12 in a 3-dimensional UWSN. Compared with the results in [8] for two-dimensional networks, our scheme achieves the same performance in 3-dimensional networks with the connectivity requirement increased only from 11 to 12, which indicates the good performance of our proposed scheme. On the other hand, this connectivity requirement of 12 may still be a little high for UWSNs with expensive sensor nodes or sparse deployment. One possible solution is to distinguish between the sensor's localization range and communication range: we can increase the transmission power for the localization and beacon messages. In this way, the localization connectivity requirement can be satisfied while the contention among data traffic will not increase much. Besides the aforementioned results, we also study the impact of the confidence threshold λ, the impact of the localization message threshold N, and the performance in mobile networks. In the following, we briefly summarize our findings for each aspect; due to space limits, we do not include the detailed results in this paper. Interested readers can refer to our technical report [22].
Fig. 7. Average communication cost ((a) anchor percentage = 5%, (b) anchor percentage = 10%, (c) anchor percentage = 20%)
Impact of Confidence Threshold: This study suggests that by changing the confidence threshold, we can control the tradeoff between the localization error, the localization coverage, and the average communication cost. For example, with an increasing confidence threshold, the localization coverage and the localization error decrease, while the average communication cost increases. For UWSNs where location information is only used for geo-routing, high localization accuracy is not required [11], but high localization coverage is desired; for this type of network, the confidence threshold can be set to a relatively small value. For UWSNs which require highly precise location information, the confidence threshold should be set to a relatively large value. Adaptive algorithms can be used to control this important parameter to provide performance guarantees.
Impact of Localization Message Threshold: This study tells us that for a given network setting, there is a critical value of N. When N is smaller than this value, the localization coverage, the localization error, and the average communication cost increase rapidly with N. When N is larger than this value, the localization coverage and the localization error do not change much and are relatively stable, but the communication cost continues to increase. This indicates that beyond the critical value, increasing N only increases the communication cost without bringing any benefit. Thus, in practice we need to choose N carefully according to the network environment. In the previously presented simulations, we set N to 5, which is the critical value for the considered network setting.
Performance in Mobile Networks: We also conducted simulations to evaluate the performance of our scheme in mobile networks. The results show that the localization coverage and the average communication cost are not affected much by node mobility, while the localization error increases noticeably with the node moving speed. This is mainly due to the fact that the average distance measurement error increases with the average moving speed, which naturally causes an increase in the final localization error.
4 Conclusion
In this paper, we presented a hierarchical localization approach for large-scale UWSNs. In this approach, the whole localization process consists of two sub-processes: anchor
node localization and ordinary node localization. We focused on ordinary node localization, for which we proposed a distributed scheme that integrates, in a novel way, a 3-dimensional Euclidean distance estimation method and a recursive localization method. Simulation results showed that our scheme can achieve high localization coverage with relatively small localization error and low communication cost. Besides, we also investigated the tradeoffs among the node density, the anchor percentage, the localization error, the localization coverage, and the communication cost in our scheme. Different networks may have different requirements on these parameters; by changing the confidence threshold parameter of our scheme, we can control these tradeoffs well.
References
1. I. F. Akyildiz, D. Pompili, and T. Melodia. Challenges for efficient communication in underwater acoustic sensor networks. ACM SIGBED Review, 1(1):3–8, Jul 2004.
2. J. Albowitz, A. Chen, and L. Zhang. Recursive position estimation in sensor networks. In Proceedings of IEEE ICNP, pages 35–41, Nov 2001.
3. T. C. Austin, R. P. Stokey, and K. M. Sharp. PARADIGM: a buoy-based system for AUV navigation and tracking. In Proceedings of MTS/IEEE Oceans, 2000.
4. C. Bechaz and H. Thomas. GIB system: The underwater GPS solution. In Proceedings of the 5th Europe Conference on Underwater Acoustics, May 2000.
5. P. Biswas and Y. Ye. Theory of semidefinite programming relaxation for sensor network localization. To appear in Mathematical Programming.
6. P. Biswas and Y. Ye. Semidefinite programming for ad hoc wireless sensor network localization. In Proceedings of IPSN, pages 46–54, Apr 2004.
7. N. Bulusu, J. Heidemann, and D. Estrin. GPS-less low cost outdoor localization for very small devices. IEEE Personal Communications Magazine, pages 28–34, Oct 2000.
8. K. K. Chintalapudi, A. Dhariwal, R. Govindan, and G. Sukhatme. Ad-hoc localization using range and sectoring. In Proceedings of IEEE Infocom, pages 2662–2672, Mar 2004.
9. J.-H. Cui, J. Kong, M. Gerla, and S. Zhou. Challenges: building scalable mobile underwater wireless sensor networks for aquatic applications. IEEE Network, Special Issue on Wireless Sensor Networking, pages 12–18, May 2006.
10. J. E. Garcia. Ad hoc positioning for sensors in underwater acoustic networks. In Proceedings of MTS/IEEE Oceans, pages 2338–2340, 2004.
11. T. He, C. Huang, B. M. Blum, J. A. Stankovic, and T. Abdelzaher. Range-free localization schemes for large scale sensor networks. In Proceedings of the 9th Annual International Conference on Mobile Computing and Networking, pages 81–95, Sep 2003.
12. K. D. Frampton. Acoustic self-localization in a distributed sensor network. IEEE Sensors Journal, 6:166–172, Feb 2006.
13. J. Kong, J.-H. Cui, D. Wu, and M. Gerla. Building underwater ad-hoc networks and sensor networks for large scale real-time aquatic application. In Proceedings of IEEE Military Communications Conference (MILCOM'05), Atlantic City, New Jersey, USA, pages 1535–1541, Oct 2005.
14. A. Mahajian and M. Walworth. 3-D position sensing using the differences in the Time-of-Flights from a wave source to various receivers. IEEE Transactions on Robotics and Automation, 17:91–94, Feb 2001.
15. D. Moore, J. Leonard, D. Rus, and S. Teller. Robust distributed network localization with noisy range measurements. In Proceedings of SenSys, pages 50–61, Nov 2004.
16. R. Nagpal, H. Shrobe, and J. Bachrach. Organizing a global coordinate system from local information on an ad hoc sensor network. In Proceedings of IPSN, Apr 2003.
17. D. Niculescu and B. Nath. DV based positioning in ad hoc networks. Springer, Telecommunication Systems, pages 267–280, Oct 2003.
18. D. Niculescu and B. Nath. Ad hoc positioning system (APS). In Proceedings of IEEE Globecom, pages 2926–2931, Nov 2001.
19. H. Wu, C. Wang, and N.-F. Tzeng. Novel self-configurable positioning technique for multihop wireless networks. IEEE/ACM Transactions on Networking, pages 609–621, Jun 2005.
20. P. Xie, L. Lao, and J.-H. Cui. VBF: vector-based forwarding protocol for underwater sensor networks. In Proceedings of IFIP Networking, May 2006.
21. Y. Zhang and L. Cheng. A distributed protocol for multi-hop underwater robot positioning. In Proceedings of IEEE International Conference on Robotics and Biomimetics, pages 480–484, Aug 2004.
22. Z. Zhou, J.-H. Cui, and S. Zhou. Localization for large-scale underwater sensor networks. UCONN CSE Technical Report: UbiNet-TR06-04, http://www.cse.uconn.edu/~jcui/publications.html, Dec. 2006.
Location-Unaware Sensing Range Assignment in Sensor Networks

Ossama Younis, Srinivasan Ramasubramanian, and Marwan Krunz

Department of Electrical & Computer Engineering, University of Arizona, Tucson, AZ 85721
{younis,srini,krunz}@ece.arizona.edu
Abstract. We study field-monitoring applications in which sensors are deployed in large numbers and the sensing process is expensive. In such applications, nodes should use the minimum possible sensing ranges to prolong the "coverage time" of the network. We investigate how to determine such minimum ranges in a distributed fashion when the nodes are location-unaware. We develop a distributed protocol (SRAP) that assigns shorter ranges to nodes with less remaining battery energy. To handle location-unawareness, we develop a novel algorithm (VICON) for determining the virtual coordinates of the neighbors of each sensor. VICON relies on approximate neighbor distances and 2-hop neighborhood information. Our simulations indicate that SRAP results in significant coverage-time improvement even under inaccurate distance estimation.
1 Introduction
Sensor monitoring applications require node collaboration to maximize the network "coverage time," defined as the time during which a specified fraction of the area is continuously monitored. In this work, we focus on applications in which the sensing process is the dominant source of energy consumption and sensing ranges are adjustable. Examples of such applications are those requiring sensors to send continuous long-range pulses for object detection (e.g., RADAR systems); in these applications, sensing is a continuous active process, while communication and processing are only invoked whenever an object of interest is detected. Other examples are applications requiring each sensor to analyze the collected data (e.g., environmental traces or images) before reporting it; reducing the sensing range in such applications results in a significant reduction of the data set to be analyzed, thus conserving energy. Currently, some commercially available sensors are capable of adjusting their sensing levels to control the cost associated with the sensing process (e.g., the Osiris photoelectric sensor [6]). We study how to assign the minimum possible sensing range to every sensor without degrading field coverage. Selecting the optimal sensing ranges for all the sensors is an NP-hard problem [10] (the simplified version of this problem, in which each sensor is either ON or OFF, is also NP-hard [2]). In previous research that considered nodes with adjustable ranges, greedy techniques were proposed for target monitoring [1] or
This work was supported in part by the National Science Foundation under grants CNS-0627118, CNS-0313234, 0325979, and 0435490.
constructing connected covers [10]. However, the problem is more challenging in location-unaware networks, in which a sensor is not capable of determining its location or the directions of incoming signals. This occurs when the sensors cannot perform network-wide localization based on location-aware anchor nodes (e.g., in forests or in outer space).

1.1 Contributions

We develop a distributed sensing-range assignment protocol (SRAP) for location-unaware sensor networks, assuming that every node can tune its sensing range to one of an available set of ranges. In such networks, nodes are not aware of the field "boundary," and therefore the objective of every sensor is to cover its own maximum sensing region. SRAP employs a novel localized algorithm (VICON) for determining the virtual coordinates of the neighbors of every node prior to range selection (note that SRAP is independent of VICON). At a node v, VICON exploits the 2-hop neighborhood information and the estimated distances between v and its neighbors. VICON employs conservative heuristics to place as many neighbors of v as possible when the estimated distances are inaccurate or the graph of v's neighbors is disconnected. To prolong the lifetime of every sensor, SRAP assigns sensing ranges based on the remaining sensor batteries. SRAP is also superior to previous work in eliminating redundancy.

1.2 Related Work

Under fixed sensing ranges, a node can be either ON or OFF. All previously proposed protocols for this model assumed that node locations or directions of neighbors can be estimated (refer to [8] for a list of these protocols). More recent proposals assumed variable sensing ranges. Cardei et al. [1] proposed centralized and distributed heuristics for maximizing the number of set covers (AR-SC) under this model. Their approach assumes synchronized nodes, base station intervention, and knowledge of node positions. We do not assume any of these capabilities and study a more general model. However, we use the greedy approach in [1] as a baseline for comparison. Zhou et al. [10] proposed another greedy algorithm for selecting a connected cover to optimize query execution under variable sensing and communication ranges. They focused on maintaining both network connectivity and field coverage. Our approach can be integrated with the one in [10] to maintain connected covers in location-unaware networks. The rest of the paper is organized as follows. Section 2 provides the problem formulation. Section 3 introduces the VICON (VIrtual COordinates of Neighbors) algorithm. Section 4 provides details of the SRAP protocol and its properties. Section 5 evaluates the performance of SRAP. Finally, Section 6 gives concluding remarks.
2 Problem Statement

Assumptions: Let the maximum transmission range of each node be Rt. We refer to a node within distance ≤ Rt as a "neighbor." We assume the following: (1) nodes
are stationary; (2) each node has a set of k usable sensing levels, which correspond to sensing ranges R1, . . . , Rk, where Rk is the maximum sensing range (turning off the sensing component corresponds to R0 = 0); (3) energy depletion is proportional to Ri^m, where 1 ≤ i ≤ k and m is a constant ≥ 1; (4) a node can sense an event within a circular "sensing region" around it; (5) the sensing component in each node is continuously active and the sensing process is energy-intensive, while the radio component employs a low duty cycle; (6) neighbor locations and directions of received signals cannot be estimated; and (7) a node can estimate the distance between itself and a neighbor based on well-known techniques such as time of arrival, received signal strength, etc. [9]. For simplicity, we assume that Rt ≥ Rk. We use a conservative approach to estimate neighbor distances in which Rt is divided into a discrete set of nd distances and every range of signal strengths maps to one of these distances. Every node broadcasts the estimated distances to its neighbors so that every node is aware of its 2-hop neighborhood. We account for the inaccuracy in distance estimation in our algorithm presented in Section 3 and evaluate its effect in Section 5.

Objectives: Given a set of N deployed sensors, it is required to assign every sensor i, 1 ≤ i ≤ N, the minimum sensing range Rj, where 0 ≤ j ≤ k, such that i's sensing region is covered. Because the field boundary is unknown to individual sensors, the objective of every sensor is to ensure that its maximum sensing region is covered.
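As an illustration, the conservative mapping just described can be sketched in Python as follows; the function name and the rounding-up rule are our reading of the description above, with Rt = 5 and nd = 5 (the values used later in Section 5) as illustrative defaults.

import math

def discretize_distance(d, r_t=5.0, n_d=5):
    # Divide Rt into nd equal steps and round the estimated distance d
    # up to the nearest step boundary, so the reported discrete
    # distance d_hat always satisfies d_hat >= d (a conservative rule).
    step = r_t / n_d
    return step * math.ceil(d / step)

# Example: with Rt = 5 and nd = 5, an estimate of 3.2 maps to 4.0.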
3 The VICON Algorithm

In VICON, a node computes "virtual" coordinates of its neighbors. A virtual coordinate space (VCS) of node v's neighbors is one that keeps the connectivity profile of the real coordinate space (RCS). That is, the distances and angles between the neighbors of v are preserved. However, the VCS can have the neighbors rotated, which does not affect the coverage properties. The problem of assigning neighbor coordinates is a special instance of the "graph embedding" problem, which has been studied extensively in the graph theory and computational geometry literature [3]. Computing virtual coordinates has also been studied in the networking literature, e.g., [5, 7]. In these studies, the objective was to assign coordinates to all the nodes in the network. Such approaches are not suitable for our work for three reasons. First, we do not have anchor nodes in the network, since the entire network is location-unaware. Second, basic triangulation techniques do not handle disconnected graphs and fail when distances are inaccurate. Finally, we only require each node to compute the virtual coordinates of its neighbors, and do not need to compute network-wide coordinates. Our approach is a lightweight algorithm that can be easily employed in dynamic networks where new nodes are deployed at any time. We first describe VICON assuming accurate estimates of distances. Then, we extend it to mitigate the negative effects of inaccurate distance estimation.

3.1 Details of VICON

Prior to executing VICON, each node is aware of its 2-hop connectivity information (reachability and distances) through neighbor broadcasts. A node v executing VICON
proceeds as follows. Assume that v has three neighbors v1, v2, and v3, as depicted in Fig. 1(a). Node v assumes that it is positioned at the origin and places its first neighbor (v1) at (d1, 0), where d1 is the distance between v and v1 (see Fig. 1(b)). Using the distances between v1 and v2, between v and v1, and between v and v2, node v can compute the angle g1 shown in Fig. 1(a). To determine the virtual coordinates of v2, v2 is rotated by an angle g1 from the origin in the counter-clockwise direction. Similarly, v3 is rotated in the counter-clockwise direction by an angle g2 and assigned a tentative coordinate. The validity of this coordinate is then tested against all the already-placed sensors to determine whether the original connectivity is preserved. In this example, rotating v3 in the counter-clockwise direction causes it to be a neighbor of v2, which contradicts the RCS. Therefore, v3 is rotated by an angle g2 in the clockwise direction. Figure 1(b) illustrates that v is still covered by three nodes that are within a total angle of g1 + g2, all on one side of v.
Fig. 1. Executing VICON at a node v to determine the virtual coordinates of v’s neighbors
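To make the placement step concrete, the following Python sketch computes the rotation angle gi from the three pairwise distances via the law of cosines and produces a tentative coordinate; the connectivity test against already-placed neighbors is assumed to be performed elsewhere, and the function names are illustrative rather than the authors' code.

import math

def angle_at_v(d_v_a, d_v_b, d_a_b):
    # Angle (radians) between neighbors a and b as seen from v, from
    # the law of cosines; the cosine is clamped for numerical safety.
    cos_g = (d_v_a ** 2 + d_v_b ** 2 - d_a_b ** 2) / (2 * d_v_a * d_v_b)
    return math.acos(max(-1.0, min(1.0, cos_g)))

def place_neighbor(ref_xy, d_v_new, rotation):
    # Place a new neighbor at distance d_v_new from v (the origin),
    # rotated by 'rotation' relative to an already-placed reference
    # neighbor; pass -rotation to try the clockwise direction instead.
    theta = math.atan2(ref_xy[1], ref_xy[0]) + rotation
    return (d_v_new * math.cos(theta), d_v_new * math.sin(theta))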
Two problems have to be considered. The first problem is depicted in Figure 2(a), where v's neighbors form more than one connected component. This results in having a subset of the neighbors unable to find reference nodes that are already placed in the VCS. VICON handles this problem as follows. First, v's neighbors are divided into groups, where each group represents a connected component (e.g., Fig. 2(a) shows two groups: {v1, v2} and {v3, v4}). Second, the coordinates of the neighbors in each group are computed independently from the other groups. Finally, each group, other than the first one, is rotated to preserve the RCS connectivity. This is depicted in Fig. 2(b), where the two groups are placed closest to each other while preserving their disjointedness.
Fig. 2. Assigning coordinates to disjoint neighbor groups
The second problem is that a node may satisfy the connectivity requirements with the already-placed neighbors in both the clockwise and counter-clockwise directions. The
problem is demonstrated in Fig. 3(a), where node v3 is a neighbor of node v1 but not of v2 or v4. In the VCS, v1 and v2 are placed first. Node v3 is placed in the counter-clockwise direction from v1 (as shown in Fig. 3(b)), but it also satisfies the connectivity of the RCS when placed in the clockwise direction. As a result, v fails to determine a virtual coordinate for v4.
Fig. 3. Failure to compute the virtual coordinates of v4 due to incorrect placement of v3
The above problem can be addressed using the following recursive approach. Assume that node v has a list of Nnbr neighbors. Node v processes these neighbors in sequence and pushes the IDs of the successfully placed neighbors onto a stack named FinishedNbr. A neighbor that can be successfully placed in two positions is marked "UNSURE" in FinishedNbr, while a neighbor that can only be placed in one position is marked "SURE." If v fails to compute coordinates for a neighbor i (2 < i ≤ Nnbr), then it pops neighbor IDs from FinishedNbr until it finds one referring to an UNSURE neighbor. This neighbor is then placed in the alternative direction, marked SURE, and pushed back onto FinishedNbr. VICON then attempts to re-process the popped neighbors. This approach ensures that incorrectly selected coordinates are corrected as more neighbors are placed. In our example depicted in Fig. 3(b), node v3 is marked UNSURE when placed. When v fails to place v4, it pops v3 from FinishedNbr, places it in the clockwise direction relative to v1, and then successfully places v4. VICON does not preserve the directions of neighbors, which is not a problem since the objective of every node is to determine "how much" area is uncovered, not "which" area. Pseudo-code and a proof of correctness of VICON can be found in [8].

3.2 VICON Under Inaccurate Distance Estimation

Inaccurate distance estimation may cause failures in node placement due to either magnifying or shrinking the angles between a node and its neighbors. We conducted numerical experiments under different settings to study the reasons behind these failures. These experiments revealed two important observations: (1) placement inaccuracy within a maximum inaccuracy I = Rt/nd can be tolerated without sacrificing coverage, and (2) our distance estimation is overconservative for some distances, and less conservative for others. Based on these observations, we extend the basic VICON algorithm as follows (a small code sketch of these adjustments follows the list).

– Assume that the distance d between node v and one of its neighbors u corresponds to the discrete distance d̂ (d̂ ≥ d). We set u's distance from v to d̂ − I/2 to achieve average, rather than maximum, uncertainty in distance estimation.
– The computed virtual coordinate of a neighbor u is acceptable if it preserves neighborhood within a distance ≤ I of all of u's neighbors.

Note that under high densities, the shifts in the angles can add up and result in a failure to place some neighbors. Thus, the above measures do not ensure that all the neighbors will eventually be placed. In [8], we show the effect of node density on successful neighbor placement.
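A small sketch of these two adjustments, with I computed from Rt and nd as above (defaults and names are illustrative assumptions):

def adjusted_distance(d_hat, r_t=5.0, n_d=5):
    # Shift the discretized estimate d_hat down by I/2 so that the
    # residual error reflects average rather than maximum uncertainty.
    i_max = r_t / n_d
    return d_hat - i_max / 2.0

def placement_acceptable(distance_errors, r_t=5.0, n_d=5):
    # Accept a candidate virtual coordinate for a neighbor u only if
    # every neighborhood distance of u is preserved to within I.
    tolerance = r_t / n_d
    return all(abs(e) <= tolerance for e in distance_errors)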
4 SRAP Protocol

Protocol Design: SRAP assigns longer ranges to nodes with higher "weights," where a dynamic parameter is used to represent the weight of a node (e.g., remaining energy). In addition, SRAP is re-triggered at fixed intervals of time, referred to as the cover update interval tcu, to efficiently balance the load among sensors. The SRAP protocol is executed at every node in the network, typically via timer expiration (SRAP can also be triggered asynchronously when detecting events such as node failures or when new nodes are deployed). Since sensor clocks are typically unsynchronized, the node with the fastest clock in its 1-hop neighborhood sends a message to its neighbors to trigger the execution of SRAP. Consequently, every node that receives this message sends a similar triggering message prior to executing SRAP. We assume that a node can be in one of two states, DECIDED or UNDECIDED, and that all nodes start in the UNDECIDED state. SRAP has three phases: Phase I is for initialization, Phase II is the core operation of SRAP in which a node v decides on a sensing range R, and Phase III is for the optimization of R. A summary of the three phases is shown in Fig. 4.

Phase I. In the first phase of SRAP, v computes a real-valued weight wgt(v) as wgt(v) = E(v)/Emax, where E(v) is the remaining energy in v's battery and Emax is the maximum battery capacity. A neighbor discovery process is then initiated in which v broadcasts wgt(v). Based on the replies that v receives, it broadcasts its neighborhood table (which includes the estimated neighbor distances). In the second step of this phase, v executes VICON to compute the virtual coordinates of its neighbors (this step is independent of SRAP). The final step is to check whether v has to use its maximum sensing range Rk. This is done by having v assume that all its neighbors are using Rk and checking whether any part of its sensing region remains uncovered. If so, v quits SRAP and uses Rk. Otherwise, v executes Phase II.

Phase II. Node v computes its sensing range R based on its weight and the weights of its neighbors. Node v does not make a decision on R unless it has the highest weight among all of its undecided neighbors. This gives a chance for nodes with higher weights to decide first and choose longer ranges. At the same time, v sets a timer T1 (the same for all nodes) for this phase. If T1 expires before a decision is made, v computes its range assuming that all undecided neighbors use R0. The function Compute_R(v) proceeds as follows. Node v first sets its range R to Rk−1 and sets the range of every undecided neighbor u to the largest Rj smaller than (wgt(u)/wgt(v))^(1/m) × R, where j ≤ k − 1 and m is a constant.
Fig. 4. The SRAP protocol executed at node v
Note that wgt(u)/wgt(v) < 1, which means that v's undecided neighbors are assumed to use sensing ranges < Rk−1. Decided neighbors are set to the ranges that they have decided on. If this assignment results in covering the sensing region of v, v sets its range R to Rk−2 and the same process is repeated. If range Ri, 0 ≤ i < k, fails to ensure complete coverage of v's region, then v uses R = Ri+1, changes its state to DECIDED, and advertises R to its neighbors (loss of messages in Phase II may only result in a more conservative estimate of R; termination is not affected).

Phase III. After Phase II, v can terminate SRAP and use its selected R. However, redundancies may have been introduced due to the order of the decision-making process in Phase II. Therefore, we propose an iterative approach for removing redundancies. When the node with the least weight in its neighborhood selects its sensing range, it sends a token to its neighbors, allowing them to proceed with Phase III. A node v starts a timer T2 (with a granularity of a few seconds) when it receives the first token from one of its neighbors. It waits to receive tokens from all the neighbors with weights less than its own. Once these tokens are available, v computes its final sensing range based on the advertised ranges of its neighbors, and releases a token that advertises the new range of v. If T2 expires before v has received enough tokens, it keeps its range as computed in Phase II and releases its token.

Analysis of SRAP: We analyze the SRAP protocol in terms of its correctness, computational complexity, and message overhead.
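As a concrete reference point for this analysis, the Compute_R step of Phase II can be sketched in Python as follows. The range list R = [R0, ..., Rk] (with R0 = 0), the weight function wgt, the advertised ranges of decided neighbors, and the coverage test are all assumed to be provided elsewhere; this is our reading of the description above, not the authors' implementation.

def compute_r(v, neighbors, R, wgt, decided, covered, m=2):
    # decided(u) returns the advertised range of a DECIDED neighbor and
    # None otherwise; covered(v, r, assumed) tests whether v's sensing
    # region is covered under the assumed range assignment.
    k = len(R) - 1
    for i in range(k - 1, -1, -1):          # try R_{k-1}, R_{k-2}, ..., R_0
        assumed = {}
        for u in neighbors:
            if decided(u) is not None:
                assumed[u] = decided(u)     # decided neighbors keep their range
            else:
                cap = (wgt(u) / wgt(v)) ** (1.0 / m) * R[i]
                below = [r for r in R[:k] if r < cap]   # largest R_j < cap
                assumed[u] = max(below) if below else R[0]
        if not covered(v, R[i], assumed):
            return R[i + 1]                 # R_i fails, so use R_{i+1}
    return R[0]                             # even the zero range suffices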
Lemma 1. When SRAP terminates, the sensing region of every node with a non-depleted battery is completely covered (Coverage property).

Proof. When SRAP is executed at node v, two operations affect the final coverage of v's sensing region:
1. Selection of v's sensing range in Phases I and II. In Phase I, if v determines that its sensing region cannot be completely covered by its neighbors, it sets its sensing range to Rk and terminates SRAP. If v goes through Phase II, it selects its sensing range based on both the advertised ranges of its decided neighbors and the hypothetical ranges of its undecided neighbors. An undecided neighbor will not be able to select a sensing range less than the largest hypothetical range assumed by any of its neighbors, unless its region is covered. This ensures that v's sensing region is completely covered.
2. Reduction of sensing ranges in Phase III. A "hole" in field coverage may occur when two neighboring nodes (e.g., v1 and v2) are allowed to reduce their sensing ranges simultaneously. Such a scenario implies that both v1 and v2 had all the tokens they need from their lower-weight neighbors to start Phase III simultaneously. This is not possible, since we assume that the weight is a real number and either v1 or v2 will have less weight than the other. □

Lemma 2. Every node v selects the minimum sensing range that satisfies coverage of its sensing region if accurate distances are used and the neighbors of v do not form multiple disjoint components (Minimality property).

Proof. Let us first assume that the estimated neighbor distances are accurate; i.e., VICON computes virtual coordinates for all the neighbors of v. Also assume that v has selected R = Ri although R = Rj (j < i) was sufficient to have v's region covered. This may occur in Phase II, depending on the order of SRAP execution among the neighboring nodes of v. However, when Phase III is executed, v will be able to compute the minimum range R based on its neighbors' final decisions. Since nodes getting tokens after v are only allowed to reduce their sensing ranges, the selected R is minimal (this applies to all nodes). Along with selecting covers from higher-weight nodes and refreshing covers, this result has a significant impact on the perceived coverage time. If accurate distances are used for computing virtual coordinates, minimality can only be violated if the neighbors of v form multiple disjoint components. This is unlikely to occur, however, in dense networks. On the other hand, if inaccurate distances are used for obtaining virtual coordinates, minimality can be violated. The redundancy introduced in this case depends on node density and distribution in the field. □

Message overhead. Four types of message exchange are required: (1) neighbor discovery, which requires O(1) messages; (2) advertisement of a node's weight, which requires only one message whenever SRAP is re-triggered; (3) advertisement of the selected range, which requires O(1) messages; and (4) token exchange, which requires one message. Therefore, the total message overhead of SRAP is O(1) per node. Note that if, in future applications, more parameters are added to the node weight computation (e.g., mobility, remaining uncovered area, etc.), then a node v's weight has to be advertised whenever one of v's neighbors decides its range. This raises the message overhead to O(Nnbr), where Nnbr is the average number of neighbors per node
(to ensure connectivity, Nnbr must be O(log N) in randomly deployed networks [4]). Note that Phase III may have to be modified in this case, since it relies on a parameter that is assumed to be fixed during the range assignment process.

Time complexity. The time complexity of SRAP has two components: (1) the convergence speed of any node in the network (ignoring the timers in the protocol), and (2) the processing complexity. Phase I has O(1) convergence speed. The average-case convergence speed is proportional to the average number of neighbors of any node. The worst-case convergence speed for Phases II and III can be proportional to the number of nodes (under a very pessimistic distribution of nodes' remaining battery levels). (In our experiments, we found that the convergence speed of SRAP is significantly less than the worst case.) This justifies the use of timers T1 and T2 to limit the convergence time and avoid indefinite waits in case of failures. The main processing complexity in SRAP is in testing whether the node's sensing region is covered. This test is performed once in Phases I and III, and every time a neighbor selects its range in Phase II. If we discretize the sensing region into P points, then the complexity of the test is O(P · Nnbr), where Nnbr is the average number of neighbors. The other source of complexity is the VICON algorithm, which is executed only once in static networks. VICON has a complexity of O(α · Nnbr^2), where α = 2^5 in the worst case, since there can exist at most five neighbors of a node that are pairwise non-neighbors, and each of these neighbors can be assigned to at most two positions. Therefore, VICON does not introduce significant computational complexity.
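For reference, the discretized coverage test counted in the O(P · Nnbr) bound could look like the following sketch, which samples v's maximum sensing region on a grid of roughly P points; the grid layout and all names are illustrative assumptions.

import math

def region_covered(v_xy, r_max, assumed, P=64):
    # assumed: {(x, y): sensing range} for v itself and its neighbors
    # at their virtual coordinates. Every sample point inside v's
    # maximum sensing disk must lie within some node's assumed range.
    side = max(1, int(math.sqrt(P)))
    for i in range(side):
        for j in range(side):
            x = v_xy[0] - r_max + 2 * r_max * (i + 0.5) / side
            y = v_xy[1] - r_max + 2 * r_max * (j + 0.5) / side
            if (x - v_xy[0]) ** 2 + (y - v_xy[1]) ** 2 > r_max ** 2:
                continue                    # sample is outside the disk
            if not any((x - nx) ** 2 + (y - ny) ** 2 <= r * r
                       for (nx, ny), r in assumed.items()):
                return False                # found an uncovered point
    return True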
5 Performance Evaluation

We study an operational scenario in which a number of sensors send their reports to a base station via multi-hop communication. We assume that nodes are randomly distributed in a field extending from (0,0) to (50,50). A base station is placed at (25,25). All the nodes start with full batteries, and the network is considered dead when the base station is disconnected. The simulation parameters used in our experiments are as follows: N = 900 nodes, Rt = 5 meters, number of discretized distances nd = 5, k = 4, Rk = 5 meters, battery capacity = 1.0 Joule, communication energy Ecomm = 10^-6 Watt, energy consumption parameter m = 2, and cover update interval tcu = 2000 seconds. For radio communications, we assume that a fixed amount of power is consumed by every active node during its operation. We set the energy consumed in communication to correspond to the energy consumed at R1. We developed a discrete event-driven simulator that is scalable and efficient for large-scale networks. We compare SRAP to AR-SC [1]. AR-SC is a distributed protocol proposed for target coverage. However, we extend it to area coverage by discretizing the field into a large number of points. AR-SC gives priority in decision-making (range assignment) to nodes seeing more uncovered targets. This is similar to typical set cover algorithms that aim at reducing the size of the selected set (e.g., [2]). We assume ideal conditions for the operation of AR-SC, which include full node synchronization, an optimal sequence of decision-making according to node priorities, and knowledge of the exact node coordinates.
We also compare SRAP to a generic centralized greedy algorithm (which we refer to as "CentralizedApp"). In CentralizedApp, a centralized entity that is aware of the locations of all the nodes in the network is responsible for range assignment. The network operation is divided into phases of equal duration. Given the energy spent by each sensor at the end of phase i, the minimal cover for phase i + 1 is chosen such that the maximum energy spent by a sensor at the end of phase i + 1 is minimized. The algorithm selects a minimal cover as follows. All the sensors are initially assumed to employ the maximum sensing range for phase i + 1. The sensors are arranged in descending order of the expected energy spent at the end of phase i + 1. The algorithm selects the sensor with the highest value (say v). If reducing v's range by one step (i.e., from Rj to Rj−1, 0 < j ≤ k) violates coverage, then v's sensing range is kept at Rj. Otherwise, v's sensing range is reduced to Rj−1. The expected energy spent at the end of phase i + 1 is updated, as is the ordered set of sensors. The procedure is repeated until a minimal cover is obtained. Although the algorithm is described here in a centralized manner, it may be implemented in a distributed fashion using only one-hop neighborhood information. As the sensors reduce their ranges one step at a time, the worst-case running time of the algorithm is O(N k). We study the operation of SRAP under accurate distance measurements (SRAP-A) and under discretized (inaccurate) distances (SRAP-D). For a fair comparison, we assume that the network boundary is not known by the application employing any of the compared techniques. We assume no packets are lost at the MAC layer; packet losses may only add some redundancy to field coverage but have no impact on the operation of SRAP. The results provided below are the average of 10 experiments.
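The greedy reduction loop of CentralizedApp described above can be sketched as follows. Here sensors are simple records with a range index and an energy-spent counter, phase_cost maps a sensing range to the energy a sensor would spend in one phase, and cover_ok is the field coverage test; all of these names are illustrative assumptions rather than the authors' code.

def centralized_phase(sensors, R, phase_cost, cover_ok):
    # Start every sensor at the maximum range R_k, then repeatedly try
    # to reduce, by one step, the range of the sensor expected to have
    # spent the most energy by the end of the next phase.
    for s in sensors:
        s.range_index = len(R) - 1
    while True:
        order = sorted(sensors,
                       key=lambda s: s.spent + phase_cost(R[s.range_index]),
                       reverse=True)
        reduced = False
        for s in order:
            if s.range_index == 0:
                continue
            s.range_index -= 1              # tentative step R_j -> R_{j-1}
            if cover_ok(sensors):
                reduced = True              # keep it, re-sort, and repeat
                break
            s.range_index += 1              # coverage violated: keep R_j
        if not reduced:
            return                          # minimal cover reached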
Fig. 5. Properties of SRAP: (a) average number of nodes at each sensing level; (b) energy cost of a selected cover
Properties of SRAP and VICON. We first focus on the selection of one cover by any of the compared algorithms. All the nodes are assumed to be alive. We compute the number of nodes selected at the k sensing levels (in addition to R0 = 0), the cost of the selected cover (energy consumed in the network during its operation), and the ratio of successfully placed neighbors per node for SRAP-D. Figure 5(a) demonstrates the number of nodes at each sensing level for different node densities. SRAP shows more collaborative behavior under both accurate and discretized
distances than AR-SC. CentralizedApp shows the best collaborative behavior because it can reduce the ranges iteratively, rather than at one step per node as in SRAP and AR-SC. Figure 5(b) shows the cover cost for all algorithms. With maximum sensing ranges, the energy consumed in the network can be computed as N × (R4 × R4 + 1) × Ecomm (which is 0.0234 Joule for N = 900). As expected, CentralizedApp gives the smallest cover cost. SRAP significantly reduces the cover cost over AR-SC, especially when distances are accurately estimated. SRAP-D and AR-SC show, respectively, about 10-20% and 30-70% increases in cover cost over SRAP-A.
Fig. 6. Performance of SRAP in contrast to AR-SC and CentralizedApp: (a) coverage time while coverage quality is within five ranges: 0-20%, 20-40%, 40-60%, 60-80%, and 80-100%; (b) coverage quality over time in a complete run
Network Operation. We now evaluate the network operation when sensing range assignment is employed. We focus on three metrics. The first metric is the duration during which the coverage quality of the field is within a specified range. The second metric is the coverage quality over time as the network operates; coverage quality is the fraction of the field covered at specific instances of time. The third metric is coverage redundancy, which is the number of sensors covering the least-covered point within a sensor's region. A coverage redundancy of 1 means that there is at least one point within some sensor's region that is covered by only one sensor. This metric indicates how minimal the selected cover and ranges are. Figure 6(a) shows the coverage time during which the percentage of field coverage is within a specific range of coverage quality. CentralizedApp shows about a 50% coverage time improvement over SRAP. SRAP-A and SRAP-D significantly improve coverage time over AR-SC, especially at the higher coverage quality ranges (60-80% and 80-100%). This is a desirable effect for applications that try to maximize field coverage for the longest possible time. Figure 6(b) demonstrates the coverage quality of the field over time as the network is operating. We include results of the application when operated without sensing range adjustment (referred to as "No-Adjust"), i.e., all the sensors use their maximum sensing ranges (Rk). The figure shows that under CentralizedApp and SRAP, nodes die smoothly over time because the selected sensing ranges are periodically refreshed based on a dynamic parameter (battery level). We also study the redundancy in the selected covers under SRAP and AR-SC. CentralizedApp guarantees no
redundancy in the selected cover and is thus not included in this experiment. Results (reported in [8]) indicate that for SRAP-A, the redundancy does not exceed 1 by more than 2-3%. We closely examined these redundancies and found that they occur at sensors that are assigned range R0. For SRAP-D, redundancy may reach 9-10% due to the failure of VICON to place some neighbors of each node.
6 Conclusion

We studied the problem of sensing range assignment in location-unaware networks. To handle location-unawareness, we proposed a novel localized algorithm (VICON) that each node uses to compute the virtual coordinates of its neighbors. We then proposed a distributed protocol (SRAP) which periodically assigns sensing ranges to nodes based on their remaining battery power. SRAP has negligible message overhead and computational complexity. Our simulation results indicate that SRAP significantly improves coverage time, even under inaccurate distance estimation. To extend the functionality of SRAP to different applications, we plan to study how to incorporate other parameters into the node weights, such as mobility, node degree, or potential coverage.
References [1] M. Cardei, J. Wu, M. Lu, and M. Pervaiz. Maximum network lifetime in wireless sensor networks with adjustable sensing ranges. In Proc. of the IEEE Intl. Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), August 2005. [2] U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4): 634–652, July 1998. [3] J. L. Gross and T. Tucker. Topological Graph Theory. John Wiley and Sons, 2001. [4] P. Gupta and P. R. Kumar. Critical power for asymptotic connectivity in wireless networks. Stochastic Analysis, Control, Optimizations, and Applications: A Volume in Honor of W.H. Fleming, W.M. McEneaney, G. Yin, and Q. Zhang (Eds.), Birkhauser, 1998. [5] T. Moscibroda, R. O’Dell, M. Wattenhofer, and R. Wattenhofer. Virtual coordinates for ad hoc and sensor networks. In Proc. of DIALM-POMC, October 2004. [6] Osiris Photoelectric Sensors, http://schneider-electric.ca/www/en/products/sensors2000/ html/osiris.htm, 2007. [7] A. Rao, C. Papadimitriou, S. Ratnasamy, S. Shenker, and I. Stoica. Geographic routing without location information. In Proc. of the ACM MobiCom Conference, September 2003. [8] O. Younis, M. Krunz, and S. Ramasubramanian. Sensing range assignment in locationunaware networks. Technical report, University of Arizona, November 2006. [9] M. Youssef and A. Agrawala. The Horus WLAN location determination system. In Proc. of the ACM International Conference on Mobile Systems, Applications, and Services (ACM MobiSys), June 2005. [10] Z. Zhou, S. Das, and H. Gupta. Variable radii connected sensor cover in sensor networks. In Proc. of the IEEE Communications Society Conference on Sensors and Ad Hoc Comm. and Networks (SECON), September 2004.
A Distributed Energy-Efficient Topology Control Routing for Mobile Wireless Sensor Networks Yan Ren, Bo Wang, Sidong Zhang, and Hongke Zhang School of Electronics and Information Engineering, Beijing Jiaotong University, 100044 Beijing, China {yren,bwang,sdzhang,hkzhang}@center.njtu.edu.cn
Abstract. One of the fundamental issues in wireless sensor networks (WSNs) is the topology control (TC) problem, which determines how well energy consumption is reduced and network capacity is enhanced. In this paper, using computational geometry techniques, we present a fully distributed routing protocol, Cooperative Energy-efficient Topology Control (Co-ETC), whose goal is to achieve energy efficiency in mobile wireless sensor networks. Based on an underlying routing graph, the proposed scheme allows each node (with or without mobility) to locally select communication neighbors and dynamically adjust its transmission radius accordingly, such that all nodes together self-form an energy-efficient topology. The simulation results indicate that the proposed scheme is feasible. Compared with existing state-of-the-art algorithms and protocols, Co-ETC has better energy efficiency. Moreover, it adapts well to mobile environments.

Keywords: topology control, energy efficiency, routing protocol, wireless sensor networks.
1 Introduction

In recent years, extensive research has been conducted on wireless sensor networks (WSNs), considered one of the top research topics [1]. Such environments may contain a large number of small sensor nodes, each capable of collecting, storing, and processing observations and of communicating over short-range wireless interfaces and multiple hops to central locations called sinks. Since sensors may be spread in an arbitrary manner, one of the fundamental issues that arises naturally in WSNs is the topology control (TC) problem. The topology control technique lets each wireless node locally adjust its transmission range and select certain neighbors for communication, while maintaining a structure that can support energy-efficient routing and improve the overall network performance [2]. In general, it can be considered a measure of the quality of service of a WSN. For instance, one may ask how well the network can maintain some global graph property (e.g., connectivity). Furthermore, TC formulations can reduce energy consumption and enhance network capacity. Due to mobility, sensor nodes (and even the sink) may change their locations after initial deployment in many scenarios. Mobility can result from environmental influences
(e.g., wind or water), from mobile platforms available in the deployment area (e.g., robots in object tracking, soldiers in battlefield surveillance), or from mobile devices incorporated into the design of the WSN architecture (e.g., airborne platforms and vehicles) [3]. The impact of mobility on the effectiveness of TC is twofold: increased message overhead (especially in high-mobility scenarios) and nonuniform node spatial distribution. Considering the impact above, deriving results for mobile WSNs is even more challenging. Several TC algorithms and routing protocols [3], [4] that use different underlying routing graphs with several good properties have been proposed in the last few years. However, to our knowledge none of them has been designed to explicitly deal with all four key properties of a unicast routing graph for WSNs simultaneously: power spanner, sparse, localized, and degree-bounded. Using computational geometry techniques, we present a cooperative topology control routing protocol with the following properties:

1. Energy-efficient. It ensures that the routes calculated on the given routing graph are at most a constant factor (defined as the power stretch factor, specified later in this paper) away from the power-optimal routes.
2. Sparse. There is a linear number of edges in the given network. This eases the tasks of finding and maintaining a route path in the presence of node mobility, and it reduces the communication overhead.
3. Bounded node degree. A bounded node degree reduces bottlenecks and neighbor signal interference in the network.
4. Runs in a distributed fashion. Every node computes the scalable routing graph cooperatively using information provided only by its neighbors.
5. Adaptive to mobile environments. The topology requires little maintenance in the presence of mobility, which could otherwise change the routing graph to some extent.
The remainder of this article is organized as follows. In the next section, we give some preliminaries and define the network model. The basic design of the Co-ETC protocol is presented in Section 3. We then discuss several extensions in Section 4; specifically, we consider how to find an energy-efficient path and how to avoid "bottleneck" nodes in the communication graph. Finally, we evaluate the performance of Co-ETC in Section 5, and we conclude the paper and discuss possible future research directions in Section 6.
2 Preliminaries and Network Model

In our subsequent discussions, the network topology is represented by an undirected simple graph G = (V, E) when all the nodes transmit at maximum power, where V = {v1, v2, …, vn} is the set of n wireless sensor nodes and E is the set of links in the WSN. For the sake of simplicity, we assume that every node has the same maximum transmission range Rmax. A node can reach all nodes (called neighbors) inside its transmission region. We also assume that each node is assigned a unique identifier (ID) and knows its location.
Before we design the Co-ETC protocol, we first give some computational geometry concepts, which will serve as the geometric foundations of Co-ETC:

Definition 1 (Voronoi Diagram and Delaunay Triangulation). The Voronoi diagram, denoted by VD(V) (solid lines in Fig. 1), of a set of discrete nodes partitions the plane into a set of convex polygons (called Voronoi regions) such that all sites inside a Voronoi region are closest to exactly one node. This construction effectively produces polygons with edges that are equidistant from neighboring nodes. The Delaunay triangulation, denoted by DT(V) (dotted lines in Fig. 1), is the dual graph of the Voronoi diagram. It is the unique triangulation such that the circumcircle of every triangle contains no nodes of V in its interior.
Fig. 1. Voronoi Diagram and Delaunay Triangulation
Definition 2 (Gabriel Graph). The Gabriel Graph, denoted by GG(V), consists of all edges uv such that disk(u, v) does not contain any node from V, where disk(u, v) is the closed disk with diameter uv (see Fig. 2).
Fig. 2. Gabriel Graph
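A direct Python sketch of this empty-disk test (names are illustrative; nodes are coordinate tuples):

def is_gabriel_edge(u, v, points):
    # Edge uv is a Gabriel edge iff no other node of V lies inside the
    # closed disk whose diameter is the segment uv.
    cx, cy = (u[0] + v[0]) / 2.0, (u[1] + v[1]) / 2.0
    r2 = ((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2) / 4.0
    return all((w[0] - cx) ** 2 + (w[1] - cy) ** 2 > r2
               for w in points if w != u and w != v)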
Definition 3 (Power Stretch Factor). The power stretch factor with respect to G, denoted by ρp, is the maximum, over all possible node pairs, of the ratio between the cost of the minimum-power path in G′ (an arbitrary subgraph of G, e.g., DT(V) or GG(V)) and in G, represented by pG′(u, v) and pG(u, v), respectively. In other words,
ρp = max_{u,v ∈ V} pG′(u, v) / pG(u, v)    (1)
Notice that, generally, we would like to use a subgraph G′ (also called a routing graph) which has a low power stretch factor and which is sparser than the original graph G. Such a routing graph can be used to compute routes with the guarantee that the energy needed to communicate along them is almost minimal. In addition, computing optimal routes in G′ is easier and incurs less communication overhead than in G. Moreover, such a sparse routing graph requires little maintenance in the presence of node mobility. The power stretch factor and maximum node degree of the graphs defined previously have been analyzed in [5], and are reported in Table 1.

Table 1. Power stretch factor and maximum node degree of DT and GG

Graph   Power Stretch Factor       Maximum Node Degree
DT      (((1 + √5)/2) · π)^α       Θ(n)
GG      1                          n − 1
As shown, the Gabriel Graph is energy-optimal, since it has a power stretch factor of 1. However, neither DT(V) nor GG(V) has a constant maximum node degree. We will utilize these attractive routing graphs for the extensions of our Co-ETC protocol in Section 4.
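To illustrate Definition 3, the following sketch computes ρp for a subgraph by comparing all-pairs minimum-power path costs with Floyd-Warshall; it is O(n^3) and intended only as a centralized illustration of Eq. (1), with names and conventions of our choosing.

import math

def power_stretch_factor(points, edges_g, edges_sub, alpha=2.0):
    # points: list of (x, y); edges_g / edges_sub: undirected edge
    # lists of node indices for G and for the subgraph G'. Edge cost
    # is ||uv||^alpha, matching the power model.
    n = len(points)

    def all_pairs(edges):
        d = [[math.inf] * n for _ in range(n)]
        for i in range(n):
            d[i][i] = 0.0
        for a, b in edges:
            w = math.dist(points[a], points[b]) ** alpha
            d[a][b] = d[b][a] = min(d[a][b], w)
        for k in range(n):                  # Floyd-Warshall relaxation
            for i in range(n):
                for j in range(n):
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
        return d

    dg, ds = all_pairs(edges_g), all_pairs(edges_sub)
    # Ratio p_G'(u, v) / p_G(u, v), maximized over connected pairs.
    return max(ds[i][j] / dg[i][j] for i in range(n)
               for j in range(n) if i != j and 0 < dg[i][j] < math.inf)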
3 Design of Co-ETC Protocol

In this section, we consider the base design of the Co-ETC protocol, which enables each node to maintain the relevant part of the Voronoi routing graph efficiently and consistently, with knowledge provided only by its neighbors, and to manage topology changes due to mobility and node joining/leaving. The motivation for the proposed protocol stems from the commonalities encountered in mobile WSNs. Co-ETC is composed of three main procedures: routing graph setup, node joining and leaving, and node movement.

3.1 Routing Graph Setup

Initially, when a node wants to communicate with another one, no prior knowledge of the routing graph is available. For the sake of simplicity, we assume that every node has the same initial transmission range R. This ensures that the routing graph is consistent across adjacent neighbors. To avoid message collisions, every node waits for a random back-off time period Ti and then simply floods a query with its own ID and coordinates to its neighbors. After every node has all its neighbors' coordinates, each node generates its local VD according to the definition of VD in Section 2, and the union of the local graphs corresponds to all the sensors in V (see Fig. 3). Notice that each node has to have a large enough R to reach its neighbors at the beginning. Generally, we set it equal to the maximum transmission range Rmax.
Fig. 3. Voronoi Diagram in WSN
In the above routing graph, each node maintains a Voronoi diagram of all its neighbors and connects directly to them with minimal latency. As only a few neighbors are kept, the cost of maintaining such a routing graph at each node is low.

3.2 Node Joining and Leaving

A joining node (marked ▲) first sends a Query message with its coordinates and ID to its neighbors. Any acceptor (a node whose region contains the joining node's coordinates) responds by sending a list of the joining node's adjacency neighbors (the nodes whose Voronoi regions border the given node's; such nodes are marked ● in Fig. 4). Notice that there is exactly one acceptor unless the joining node lies on a Voronoi edge. The joining node first computes and organizes its local Voronoi diagram, and then connects to each of its adjacency neighbors (affected by the joining). Related neighbors also update their Voronoi diagrams to account for the joining node. When a node leaves the WSN, due to node failure or power duty-cycling reasons, the leaving node simply disconnects. Its adjacency neighbors update their Voronoi diagrams; replacements are learned via other still-connected adjacency neighbors. This procedure can be seen as the converse of Fig. 4 (node ▲ leaves the network).
Fig. 4. Node Joining Procedure. (Left) Before node joining. (Right) After node joining.
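Before turning to node movement, the adjacency-neighbor relation used by all three procedures can be computed centrally for illustration with SciPy's Delaunay triangulation, the dual of the Voronoi diagram from Definition 1; this sketch shows the geometry only, not the distributed protocol itself.

import numpy as np
from scipy.spatial import Delaunay

def adjacency_neighbors(positions):
    # positions: (n, 2) array-like of node coordinates. Two nodes are
    # adjacency neighbors iff their Voronoi regions share a border,
    # i.e., iff they are joined by a Delaunay edge.
    tri = Delaunay(np.asarray(positions, dtype=float))
    nbrs = {i: set() for i in range(len(positions))}
    for simplex in tri.simplices:           # each simplex is a triangle
        for i in simplex:
            for j in simplex:
                if i != j:
                    nbrs[int(i)].add(int(j))
    return nbrs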
3.3 Node Movement

When a node moves, its position updates and transmission radius are sent to all connected neighbors. If the recipient is a periphery neighbor (a node whose adjacency neighbors may partially lie outside the transmission range; such nodes are marked ■, and some adjacency neighbors may be periphery neighbors at the same time), it checks whether any of its adjacency neighbors tend to become visible to the moving node and sends a Pre-register message with the position update. In this way, new neighbors can estimate the connecting time, which enhances the resilience of the routing graph. When a new neighbor is detected, notifications are sent to the moving node to initiate communications. The moving node also disconnects any periphery neighbors that have left its transmission range. This procedure is illustrated in Fig. 5.
Fig. 5. Node Movement. (Left) Before node movement. (Right) After node movement.
The three main procedures of the Co-ETC protocol above illustrate that nodes communicate with each other efficiently and that changes to the routing graph are localized. Besides, the protocol is scalable and adaptive to mobile environments.
4 Extensions

In this section, we consider two extensions of the Co-ETC protocol and present efficient distributed algorithms: 1) finding an energy-efficient path between nodes u and v in V; and 2) dynamically adjusting the node degree of the existing routing graph to avoid "bottlenecks". From Section 2, we know that the Gabriel Graph is energy-optimal, since ρp(GG) = 1. However, we cannot use this graph directly in our extension because: 1) the Gabriel Graph may have long edges, while we are only allowed to connect points within a limited transmission range; and 2) the empty-disk rule is a global rule and is not suitable for local computation. To deal with these two problems, we consider the following algorithm to find an energy-efficient path constructed from edges of the Gabriel Graph according to the existing Voronoi routing graph:
Algorithm 1: Energy-Efficient-Path (V, u, v)
1. From the source point u, node u adds an edge uv1 if and only if disk(u, v1) does not contain any node from V.
2. Continually add all such edges into the new routing graph until no remaining edge belongs to the Gabriel Graph GG(V). Notice that, because the original graph is a subgraph of G, not every edge of GG(V) can be added into the new routing graph.
3. Assign each constructed edge vivj the weight ||dij||^α.
4. Run the distributed Bellman-Ford shortest-path algorithm [6] to compute the shortest path connecting u and v, i.e., the path with the minimum weight among all paths between u and v.

The correctness of the algorithm is based on the following observation: if there is a sensor node vk inside disk(vi, vj), then ||dik||^α ≤ ||dij||^α and ||djk||^α ≤ ||dij||^α. It is obvious that the path vi vk vj is in GG(V). Thus, the path obtained by substituting edge vivj with edges vivk and vkvj consumes less energy, which is a contradiction. Consequently, edge vivj must be a Gabriel edge. For this reason, extra long edges that do not belong to the old Delaunay Triangulation will also not be added into our new routing graph.

According to the analysis in Section 2, neither DT(V) nor GG(V) has a constant maximum node degree. This means that there may be bottleneck nodes in our routing graph, forced to connect beyond their capacities. So we adjust the node degree of the existing routing graph dynamically as follows:

Algorithm 2: Dynamic-Node-Degree-Adjustments (V)
1. Every node whose number of neighbors exceeds the upper bound Nmax decreases its transmission power by ∆P one step at a time, until it connects to Nmax neighbors. At the same time, the original node sends a Hello message, attaching its current link neighborhood list and its current transmission power, denoted by Poriginal-node.
2. Whenever another node that so far belongs to the neighborhood list hears the Hello message of the original node for the first time, it first compares Poriginal-node and Pitself in order to avoid one-directional links.
   - If Poriginal-node > Pitself, it leaves its transmission power as before (the smaller of the two).
   - Otherwise, it sets its transmission power toward the original node equal to Poriginal-node (adopting directional antennas for transmission) and then begins the same procedure as the original node.
3. A node answers a Hello message with a Reply message, attaching only its current transmission power.
4. A node restores its preferred transmission power if its number of neighbors falls below the lower bound Nmin.
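A compact sketch of Algorithm 1, reusing the is_gabriel_edge test from the illustration after Definition 2 and assuming the candidate edges of the existing routing graph are given (so no edge exceeds the transmission range); this is an illustration of the steps above, not the authors' implementation.

import math

def energy_efficient_path(points, edges, src, dst, alpha=2.0):
    # points: {node_id: (x, y)}; edges: iterable of (id, id) pairs from
    # the existing routing graph. Steps 1-2: keep only Gabriel edges;
    # step 3: weight each edge by ||d||^alpha; step 4: Bellman-Ford.
    coords = list(points.values())
    keep = [(a, b) for (a, b) in edges
            if is_gabriel_edge(points[a], points[b], coords)]

    def w(a, b):
        return math.dist(points[a], points[b]) ** alpha

    dist = {n: math.inf for n in points}
    pred = {n: None for n in points}
    dist[src] = 0.0
    for _ in range(len(points) - 1):        # Bellman-Ford rounds
        for a, b in keep:
            for x, y in ((a, b), (b, a)):   # edges are undirected
                if dist[x] + w(x, y) < dist[y]:
                    dist[y] = dist[x] + w(x, y)
                    pred[y] = x
    path, node = [], dst                    # reconstruct the path
    while node is not None:
        path.append(node)
        node = pred[node]
    return path[::-1], dist[dst]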
5 Evaluation and Simulation Results

To demonstrate the feasibility and effectiveness of our Co-ETC protocol, we have performed an across-the-board performance evaluation of the protocol according to several
vital criteria, including network connectivity, network lifetime, and robustness to mobility. We implemented our Co-ETC protocol in the NS-2 simulator using an idealized MAC layer with a fixed link error probability. The simulation environment was generated using MATLAB. In the simulations, the network nodes are uniformly distributed in a square area A of size 1000×1000.

5.1 Network Connectivity

Perhaps the most basic requirement of a topology is that it be connected. More precisely, to ensure that the routing graph can be constructed efficiently, nodes in the network must have a large enough initial transmission range R to reach their neighbors at the routing-graph-setup stage (Section 3.1). In the simulations, we set it equal to the maximum transmission range Rmax. For every value of R considered in the simulations, we generated 100 random placements and, for every placement, we evaluated the percentage of nodes belonging to the largest connected component. The average percentage of nodes in the largest strongly connected component, as a function of the number of nodes and for different values of R, is shown in Fig. 6.
Fig. 6. Percentage of nodes that belong to the largest connected component
As shown in Fig. 6, there are critical thresholds under different transmission ranges R. The sharpness of the threshold depends on R. Based on these results, we can choose an appropriate R for the initial scheme for different numbers of nodes. Fig. 7 shows the actual topologies for one simulated network with 100 nodes and maximum transmission range Rmax = 250. Fig. 7(a) and (b) are the topologies resulting from the original graph G and our Co-ETC routing graph, respectively. As can be seen, Co-ETC maintains network connectivity in its routing graph.
5.2 Network Lifetime

We compared the Co-ETC protocol (with the extensions added) presented in the previous sections to the K-NEIGH protocol [3] and the CBTC algorithm [4]. Notice that both the CBTC algorithm (with α = 2/3π) and the K-NEIGH protocol were run with pruning optimization in the simulations. Since the CBTC algorithm does not provide routing by itself, we used the AODV [7] routing protocol on top of it in the simulations. The same network setup was used to compare the implementations of the routing protocols: 100 sensor nodes were randomly placed in the area with R = 250. Each node had an initial energy of 0.5 Joule. Five of them were source nodes, which produced CBR data traffic. The length of a data packet was 64 bytes. One node was assigned as the sink node. Nodes moved randomly with a constant speed of 10 per unit time, and a node bounced back once it hit the boundary.
Co−ETC routing graph
1000
1000
800
800
600
600
400
400
200
200
0
0
200
400
600
800
1000
0
0
200
400
600
800
1000
Fig. 7. Topology graph. (Left) Original topology. (Right) Co-ETC routing graph.
In WSNs, the metric of actual interest is not the energy efficiency per packet, but the whole operational lifetime of the WSN. Therefore, we use network lifetime as the metric to evaluate the energy-efficiency performance of the protocols. We consider the network to be down when no data packet can be received by the sink node. Fig. 8 illustrates the normalized network-lifetime simulation results of Co-ETC and the reference schemes K-NEIGH and CBTC+AODV (α = 2/3π) under different network loads in both static and mobile scenarios. It is shown that Co-ETC extends the network lifetime significantly in the mobile scenario: lifetimes up to a factor of 2.2 greater than those of K-NEIGH and CBTC+AODV can be reached. Although Co-ETC is designed to be efficient in dynamic WSNs, it also prolongs the lifetime by 10-45 percent in the static scenario. Contrary to K-NEIGH and CBTC, our Co-ETC protocol is almost independent of the choice of scenario, static or mobile. This is due to the fact that the routing graph constructed by Co-ETC adapts well to mobile environments.
Fig. 8. Network lifetimes of different schemes
6 Conclusion

In this paper, we considered energy-efficient topology control routing in mobile wireless sensor networks. We presented a distributed routing protocol, Co-ETC, and showed that, based on its underlying routing graph, it achieves better energy efficiency. Moreover, it adapts well to mobile environments. Although Co-ETC outperforms other existing state-of-the-art algorithms and protocols in terms of network lifetime, and has desirable properties including adaptivity to mobility, sparseness, and localized computation, we have assumed that all nodes have circular communication ranges. In practice, this assumption is often violated due to fading and multipath effects. It would be interesting to extend the results of this paper to more realistic scenarios. Furthermore, we will carry out further research on using more complex mobility models, such as group mobility.

Acknowledgments. This work is supported by the National Science Foundation of China (No. 60473001, 60572037, 60573001), and the Innovation Foundation of Science and Technology for Excellent Doctorial Candidates of Beijing Jiaotong University (No. 48013).
References
1. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Computer Networks, Vol. 38. (2002) 393-422
2. Rajaraman, R.: Topology control and routing in ad hoc networks: A survey. SIGACT News, Vol. 33. (2002) 60-73
3. Blough, D.M., Leoncini, M., Resta, G., Santi, P.: The k-neighbors protocol for symmetric topology control in ad hoc networks. Proc. ACM Mobihoc'03, (2003) 141-152
4. Li, L., Halpern, J.Y., Bahl, P., Wang, Y.M., Wattenhofer, R.: A cone-based distributed topology-control algorithm for wireless multi-hop networks. IEEE/ACM Transactions on Networking (TON), Vol. 13. (2005) 147-159
5. Wang, W., Li, X., Moaveninejad, K., Wang, Y., Song, W.: The spanning ratio of β-skeletons. Proc. CCCG, (2003) 35-38
6. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. Massachusetts: MIT Press / New York: McGraw-Hill (1990)
7. Perkins, C.E., Royer, E.M.: Ad-hoc on-demand distance vector routing. Proc. IEEE WMCSA'99, (1999) 90-100
Integrated Clustering and Routing Strategies for Large Scale Sensor Networks Ataul Bari, Arunita Jaekel, and Subir Bandyopadhyay School of Computer Science, University of Windsor 401 Sunset Ave. Windsor, ON N9B 3P4, Canada {bari1,arunita,subir}@uwindsor.ca
Abstract. In two-tiered sensor networks using relay nodes, sensor nodes are arranged in clusters and the higher-powered relay nodes can be used as cluster heads. The lifetime of such networks is determined primarily by the lifetime of the relay nodes. Clustering techniques and routing schemes play a crucial role in determining the useful network lifetime. Traditionally, the clustering and routing problems for these networks have been considered independently and solved separately. In this paper, we present a new integer linear program (ILP) formulation that jointly optimizes both clustering and routing to maximize the lifetime of such networks. We show that our integrated approach can lead to significant improvements over techniques that consider clustering and routing separately, particularly for the non-flow-splitting (single-path) routing model. We also propose a heuristic, based on an LP-relaxation of the routing variables, which can be used for larger networks.
1 Introduction
A wireless sensor network (WSN) is a network of low-powered, multi-functional sensor nodes, each consisting, at a minimum, of a sensing circuit, a digital signal processor, and radio links [1], [2], [3]. It is extremely important to design routing protocols and algorithms that are energy-efficient, so that the overall lifetime of the network can be extended as much as possible. In a two-tier hierarchical architecture, the network is organized as a number of clusters, where each sensor node belongs to only one cluster. Some nodes are treated as cluster heads and have additional responsibilities (e.g., data gathering, data aggregation, and routing) compared to the remaining nodes. Recently, relay nodes, acting as cluster heads, have been proposed in two-tier sensor networks [2], [4], [5], [6], [7], [8], [9] for energy-efficient data gathering, load balancing, and fault tolerance [2], [5], [7]. In relay-based networks, each relay node collects data from the sensor nodes in its cluster and forwards this data to the base station (or sink) using either a single-hop or a multi-hop communication model. The multi-hop data transmission model (MHDTM) [7], [8], [9], [10] is particularly suitable for larger networks and is the model used in this paper. A number of routing schemes for two-tiered networks have been proposed in the literature [2], [4], [5], [7], [8], [10]. Most of these adopt the flow-splitting (also
In contrast, in a single-path routing model, a node is not allowed to split its traffic, and forwards all of its data to a single neighbor. This model avoids many limitations of the flow-splitting model [7]. An important factor affecting the lifetime of a two-tier network is the clustering scheme used to assign sensor nodes to the appropriate clusters [6], [11], particularly for single-path routing. Previous approaches to clustering and routing have considered the two problems independently. Typically, the assignment of sensor nodes to clusters is done first, and then a routing scheme that maximizes the network lifetime is calculated. In this paper, we present a new integer linear program (ILP) formulation that jointly optimizes both clustering and routing to maximize the lifetime of the upper-tier relay node network. We have assumed a network model where a) the roles of the sensor nodes and the relay nodes are not interchangeable, b) the relay nodes do not perform sensing tasks and are provisioned with higher energy, c) the relay nodes can transmit over larger distances compared to regular sensor nodes, d) each sensor node is located close enough to some relay node that it can transmit directly to that relay node, e) sensor nodes only communicate with their respective cluster heads and do not take part in the routing, and f) both the sensor nodes and the relay nodes communicate through an ideal shared medium where communication between nodes is handled by appropriate MAC protocols (as in [2], [5]). We focus on the non-flow-splitting (single-path) routing model and show that our integrated approach can lead to significant improvements over techniques that consider clustering and routing separately. To the best of our knowledge, this is the first technique that combines the clustering and the routing problem to maximize the network lifetime. We assume that the clustering and routing scheme is computed at the base station (or at some centralized entity where such computation may be carried out). There are two possible scenarios for determining the positions of the relay nodes and the sensor nodes: Case i) We place the sensor and the relay nodes at predetermined locations. Before the deployment of the network, we can compute the clustering and the routing decisions at some centralized location. The sensor nodes and the relay nodes may be pre-configured with this information. Case ii) We can find the locations of the nodes using a GPS system. GPS-equipped nodes have been widely proposed in the literature [1], [2], [5], [6]. The GPS system needs to be operated only for a very short period of time, to determine the locations of the sensor and the relay nodes. We also assume that the nodes are stationary after deployment. We can then compute, at some central location, the clustering and the routing decisions and broadcast the result to the entire network. Since the communication from each sensor or relay node, as well as the communication broadcast from the base station to the sensor and relay nodes, will be a single, small packet, the energy dissipated for sending/receiving these two packets is insignificant compared to the subsequent transmissions, and will not have any substantial impact on the lifetime of the network.
In this paper we have i) presented an integrated ILP for optimal clustering and single-path routing to maximize network lifetime, giving a significant increase in network lifetime compared to solving the two problems separately, ii) proposed an LP-relaxation of the routing variables, so that our approach can be used for multi-path routing as well, and iii) extended the above ILP to develop a new heuristic for single-path routing capable of handling large networks, and have shown, through simulations, that the network lifetimes achieved by the heuristic solutions are close to the theoretical upper bound.
2 Review

2.1 Load Balanced Clustering
Many researchers have investigated clustering of nodes in a wireless network [2], [3], [5]. In [2], the problem of forming clusters around a few high-energy gateway nodes has been investigated. In Fig. 1, the sensor nodes in the shaded region can be assigned to any one of clusters A, B, or C. The routing scheme and/or the energy dissipation of relay nodes A, B, and C may favor one assignment over the others. A load balanced clustering algorithm assigns each sensor node to an appropriate cluster to maximize the lifetime of the network. In [9], a minimum number of relay nodes has been used as cluster heads to cover all sensor nodes. However, [9] does not address the issue of clustering after the placement of the relay nodes. In [2], the "cardinality" of a cluster (the number of sensor nodes associated with the cluster) is used in a heuristic to minimize the variance of the cardinality of each cluster in the network. In [6], it is demonstrated that suitable clustering techniques can be used to increase the lifetime of the network.

2.2 Routing in Sensor Networks Using Relay Nodes
The problem of routing in wireless sensor networks, under the "flow-splitting" model, has been extensively covered in the literature. In [8], Hou et al. have attempted to maximize the lifetime of a sensor network by provisioning relay and sensor nodes with additional energy, using a mixed-integer non-linear program, and have proposed a heuristic. In [10], the authors have formulated the lifetime optimization problem under the flow-splitting model. In [12], Falck et al. have addressed the issue of balanced data gathering in sensor networks and have proposed an LP formulation that enforces some balancing constraints in the data gathering schedule. In [2], Gupta and Younis have focused on load balanced clustering and have proposed a heuristic solution for the optimization problem.
Fig. 1. Sensor nodes in overlapping coverage area (relay nodes A, B, and C, with the sensor nodes of the shaded region lying in the coverage overlap)
Routing without flow splitting (i.e., single-path routing) has been studied in [7], [13], and [14]. In [7], the authors have presented a transformation algorithm to convert a multiple-outgoing-flow routing model to a single-outgoing-flow routing model. In [14], the authors have investigated the problem of maximizing network lifetime by appropriately placing nodes that are not energy constrained (e.g., connected to a wall outlet). In [13], the authors propose a formulation for constructing minimum-energy data-aggregation trees for a flat architecture.
3 ILP Formulations for Optimal Clustering and Routing

3.1 Network Model
In this paper, we have assumed that the communication energy dissipation follows the first-order radio model [3], where the energy required to transmit (receive) b bits over a distance d is given by ETx(b, d) = α1·b + β·b·d^q (ERx(b) = α2·b), where α1 (α2) is the energy coefficient for the transmitter (receiver), β is the energy coefficient for the transmit amplifier, and q is the path-loss exponent. For our network model, we consider a two-tiered wireless sensor network with n sensor nodes, m relay nodes, and one base station. Each sensor node belongs to only one cluster and each relay node acts as the cluster head of exactly one cluster. We assign each node a unique label as follows: i) each sensor node receives a label i, 1 ≤ i ≤ n, ii) each relay node receives a label j, n < j ≤ n + m, and iii) the base station receives the label n + m + 1. Let S be the set of all sensor nodes.
In other words, if S^j, n + 1 ≤ j ≤ n + m, is the set of sensor nodes belonging to the j-th cluster, then S = S^{n+1} ∪ S^{n+2} ∪ … ∪ S^{n+m} and S^j ∩ S^k = ∅ for all j ≠ k, n + 1 ≤ j, k ≤ n + m. The set S^j constitutes the cluster with the relay node labeled j as its cluster head. As mentioned in the introduction, we assume that the locations of all the sensor nodes and the relay nodes are known (or can be determined), and that the average amount of data generated by each sensor node is also known a priori. The data rates of the sensor nodes need not be uniform, but can vary from node to node. We also assume that the placement strategy applied during the deployment phase of the network ensures proper coverage of each sensor node and the connectivity of the relay node network. In our model, data gathering is proactive, i.e., data are collected and forwarded to the base station periodically, following a predefined schedule. We refer to each period of data gathering as a round [10]. In each round of data gathering, each relay node gathers the data it receives from its own cluster and either transmits that data directly to the base station (single-hop model) or forwards it towards the base station along a multi-hop path (multi-hop model). In the case of multi-hop routing, in addition to the data generated by its own cluster, each relay node also relays any data it receives from neighboring relay nodes. We measure the lifetime of the network by the number of rounds the network operates from the start until a relay node depletes its energy completely and ceases to function.
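To make the energy accounting concrete, the following is a minimal Python sketch of the first-order radio model and of a relay node's per-round energy (an illustration of the model above, not the authors' code; the constant values are the ones adopted later in Section 5.1, and the function names are ours).

```python
# Minimal sketch of the first-order radio model, using the constant
# values adopted in Section 5.1 (alpha1 = alpha2 = 50 nJ/bit,
# beta = 100 pJ/bit/m^q, q = 2). Function names are ours.

ALPHA1 = 50e-9    # transmitter electronics energy, J/bit
ALPHA2 = 50e-9    # receiver electronics energy, J/bit
BETA = 100e-12    # transmit amplifier energy, J/bit/m^q
Q = 2             # path-loss exponent

def e_tx(bits, d):
    """E_Tx(b, d) = alpha1 * b + beta * b * d^q."""
    return ALPHA1 * bits + BETA * bits * d ** Q

def e_rx(bits):
    """E_Rx(b) = alpha2 * b."""
    return ALPHA2 * bits

def relay_energy_per_round(b_j, r_j, flows):
    """Energy spent by a relay in one round: receive B_j bits from its
    own cluster and R_j bits from other relays, then transmit each flow
    (f_{j,k} bits over distance d_{j,k}, given as (bits, meters) pairs)."""
    return e_rx(b_j + r_j) + sum(e_tx(f, d) for f, d in flows)

# Example: a relay gathering 4000 bits from its cluster, relaying 2000
# bits from other relays, and forwarding all 6000 bits over a 150 m hop.
print(relay_energy_per_round(4000, 2000, [(6000, 150.0)]))
```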
3.2 Notation Used
In our formulation we are given the following data as input:

• α1 (α2): energy coefficient for the transmitter (receiver).
• β: energy coefficient for the amplifier.
• q: path-loss exponent.
• bi: number of bits generated by sensor node i per round.
• n (m): total number of sensor (relay) nodes, with each sensor (relay) node having a unique index lying between 1 and n (n + 1 and n + m).
• n + m + 1: index of the base station.
• C: a large constant, greater than Σi bi, the total number of bits received by the base station in a round.
• rmax (dmax): transmission range of each sensor (relay) node.
• di,j: Euclidean distance from node i to node j.
We also define the following variables:

• Xi,j: binary variable, with Xi,j = 1 if sensor node i belongs to the cluster of relay node j (1 ≤ i ≤ n, n + 1 ≤ j ≤ n + m), and Xi,j = 0 otherwise.
• Yj,k: binary variable, with Yj,k = 1 if relay node j transmits to relay node k, and Yj,k = 0 otherwise.
• Bj: total number of bits generated by the sensor nodes belonging to cluster j in one round.
• fj,k: number of bits sent by relay node j to relay node k in one round.
• Rj: number of bits received by relay node j from other relay nodes in one round.
• Tj: number of bits transmitted by relay node j in one round.
• Fmax: the total energy spent per round by the relay node that is being depleted at the fastest rate.

3.3 ILP Formulation for the Non-Flow-Splitting Model (ILP-NFS)
Given the network described in Section 3.1, the objective of this formulation is to maximize the lifetime of the network by finding an optimal clustering and routing scheme. Since every relay node starts with the same initial energy, minimizing Fmax, the maximum energy any relay node spends per round, maximizes the number of rounds until the first relay node is depleted.

Minimize Fmax    (1)

Subject to:

a) A sensor node i can transmit to a cluster head j only if the distance between i and j is within the range rmax of the sensor node.

Xi,j · di,j ≤ rmax,   ∀i, 1 ≤ i ≤ n, ∀j, n < j ≤ n + m    (2)

b) A sensor node must belong to exactly one cluster.

Σ_{j=n+1}^{n+m} Xi,j = 1,   ∀i, 1 ≤ i ≤ n    (3)

c) Compute the total number of bits Bj received at relay node j from its own cluster in one round of data gathering.

Σ_{i=1}^{n} bi · Xi,j = Bj,   ∀j, n < j ≤ n + m    (4)

d) Relay node j can transmit to only one relay node or to the base station (non-flow-splitting constraint).

Σ_{k=n+1}^{n+m+1} Yj,k = 1,   ∀j, n < j ≤ n + m    (5)
e) Compute the total number of bits Tj transmitted by relay node j in one round of data gathering.

Tj = Σ_{k=n+1}^{n+m+1} fj,k,   ∀j, n < j ≤ n + m    (6)

f) Compute the total number of bits Rj received at node j (a relay node or the base station) from other relay nodes in one round of data gathering.

Rj = Σ_{k=n+1}^{n+m} fk,j,   ∀j, n < j ≤ n + m + 1    (7)

g) Ensure that relay node j transmits to relay node k only if relay node k is the next node on the multi-hop path from j to the base station (or k is the base station).

fj,k ≤ C · Yj,k,   ∀j, k, n < j ≤ n + m, n < k ≤ n + m + 1    (8)

h) Ensure flow conservation at each relay node: the bits transmitted by relay node j in one round equal the bits generated by its own cluster plus the bits it receives from other relay nodes.

Bj + Rj = Tj,   ∀j, n < j ≤ n + m    (9)

i) A source relay node can transmit to a destination relay node (or the base station) only if the destination node is within the transmission range of the source node.

Yj,k · dj,k ≤ dmax,   ∀j, k, n < j ≤ n + m, n < k ≤ n + m + 1    (10)

j) Bound, by Fmax, the total energy dissipated by relay node j in one round, following the first-order radio model of Section 3.1.

α2 · (Bj + Rj) + α1 · Tj + β · Σ_{k=n+1}^{n+m+1} fj,k · d^q_{j,k} ≤ Fmax,   ∀j, n < j ≤ n + m    (11)
In evaluating ILP-NFS, we have computed the lifetime as the number of rounds until one relay node becomes completely depleted. Although ILP-NFS minimizes the maximum load on any relay node in a round, it is possible that, after the first relay node is depleted, other relay nodes still have some residual power. We could use the surviving relay nodes to further extend the lifetime of the network. The rescheduling strategy we have proposed in [4] may be used, where the routes are recomputed at periodic intervals¹, taking into consideration the available residual energy of each relay node. We have shown in [4] that it is possible to significantly increase the lifetime of the network with this technique. We may apply the same technique to this formulation as well; we omit the details since the technique is essentially the same.
¹ Clustering was not considered in that paper.
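As an illustration only (not the authors' implementation; they solve the formulation with CPLEX rather than the open-source solver used here), the following sketch renders ILP-NFS with the PuLP modeling library on a hypothetical toy instance. The node positions and bit rates are invented, indices are 0-based, and constraint (11) follows the reconstruction above.

```python
# Hypothetical PuLP rendering of ILP-NFS on a toy instance (a sketch,
# not the authors' implementation). Sensors are 0..n-1, relays are
# n..n+m-1, and the base station is n+m (0-based indices).
import math
import pulp

n, m = 4, 2
BS = n + m
pos = {0: (0, 0), 1: (30, 0), 2: (0, 30), 3: (30, 30),  # sensors
       4: (15, 15), 5: (120, 15),                       # relays
       6: (250, 15)}                                    # base station
b = {i: 100 for i in range(n)}     # bits generated per sensor per round
r_max, d_max, q = 40.0, 200.0, 2
a1 = a2 = 50e-9                    # alpha_1, alpha_2 (J/bit)
beta = 100e-12                     # amplifier coefficient (J/bit/m^q)
C = sum(b.values()) + 1            # constant exceeding total bits per round

sensors, relays = list(range(n)), list(range(n, n + m))
dests = relays + [BS]
dist = lambda u, v: math.dist(pos[u], pos[v])

prob = pulp.LpProblem("ILP_NFS", pulp.LpMinimize)
X = pulp.LpVariable.dicts("X", (sensors, relays), cat="Binary")
Y = pulp.LpVariable.dicts("Y", (relays, dests), cat="Binary")
f = pulp.LpVariable.dicts("f", (relays, dests), lowBound=0)
Fmax = pulp.LpVariable("Fmax", lowBound=0)

prob += Fmax                                                   # objective (1)
for i in sensors:
    prob += pulp.lpSum(X[i][j] for j in relays) == 1           # (3)
    for j in relays:
        prob += X[i][j] * dist(i, j) <= r_max                  # (2)
for j in relays:
    ks = [k for k in dests if k != j]
    Bj = pulp.lpSum(b[i] * X[i][j] for i in sensors)           # (4)
    Tj = pulp.lpSum(f[j][k] for k in ks)                       # (6)
    Rj = pulp.lpSum(f[k][j] for k in relays if k != j)         # (7)
    prob += pulp.lpSum(Y[j][k] for k in ks) == 1               # (5)
    prob += Bj + Rj == Tj                                      # (9)
    for k in ks:
        prob += f[j][k] <= C * Y[j][k]                         # (8)
        prob += Y[j][k] * dist(j, k) <= d_max                  # (10)
    prob += (a2 * (Bj + Rj) + a1 * Tj +
             beta * pulp.lpSum(f[j][k] * dist(j, k) ** q for k in ks)
             <= Fmax)                                          # (11)

prob.solve()
# With 5 J of initial energy per relay (Section 5.1), the achieved
# lifetime in rounds is 5 / Fmax.
print("Fmax =", pulp.value(Fmax), "J/round")
```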
4 LP-Relaxation Based Heuristic
The ILP formulation presented in the previous section guarantees an optimal solution that maximizes the lifetime of the upper-tier relay node network. However, this formulation becomes computationally intractable for larger networks. In this section, we propose a heuristic approach based on an LP-relaxation of the routing variables Yj,k. In this approach, we first solve the ILP for the combined clustering and routing problem under the flow-splitting model (ILP-FS). This means that we allow the traffic to be split among different nodes and no longer require the integer variables Yj,k. The solution obtained for the flow-splitting model is then used to guide the ILP for the non-flow-splitting model. The idea is that we define a small set P^j of "promising" relay nodes for node j, for all j, n < j ≤ n + m. The nodes in P^j are selected based on the nodes that j transmits to in the solution generated by ILP-FS. When looking for the next node on the path from relay node j to the base station, the new formulation only considers the nodes in P^j.
4.1 ILP for the Flow-Splitting Model (ILP-FS)
The formulation for the flow-splitting model is similar to ILP-NFS, presented in the previous section, with the following modifications.

1. The variables Yj,k are eliminated. Since we are allowed to split the traffic arbitrarily, the integer routing variables Yj,k are no longer needed, and the traffic flows can be determined by the continuous flow variables fj,k.
2. Since a relay node can transmit to any number of other nodes, constraint (5) is no longer needed and is removed.
3. Since there can be non-zero flow to more than one node, constraint (8) is not needed and is removed.
4. Constraint (10) is modified as follows:

fj,k = 0,   ∀j, k such that dj,k > dmax    (12)
All other variables and constraints, as well as the objective function, are identical to those of ILP-NFS.
4.2 Non-Flow-Splitting Heuristic (NFS-H)
In the heuristic, we find the routing scheme in three steps. The first step is to run ILP-FS to identify the set P^j of promising relay nodes for the j-th relay node. In the second step, we set Yj,k = 0 for all relay nodes k not in P^j. Finally, we run ILP-NFS to generate a solution for combined clustering and routing under the non-flow-splitting model. By setting Yj,k = 0 in step 2 for all relay nodes not in P^j, ILP-NFS is forced to select one of the nodes in P^j as the next node on the path from node j to the base station. Through extensive simulation, we have observed that the size of P^j, for each node j, is typically between 1 and 4. This considerably reduces the complexity of ILP-NFS, and we find that it quickly converges to a solution.
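The three steps can be summarized by the following sketch; solve_ilp_fs, solve_ilp_nfs, and the network interface are hypothetical placeholders standing in for the formulations of Sections 4.1 and 3.3, not functions from the paper.

```python
# Sketch of the three-step NFS-H heuristic. The helpers solve_ilp_fs
# and solve_ilp_nfs, and the network object, are hypothetical.

def nfs_heuristic(network):
    # Step 1: solve the relaxed flow-splitting model (ILP-FS) and read
    # off, for every relay j, the set P_j of "promising" next hops:
    # the nodes that carry non-zero flow f_{j,k} in its solution.
    fs = solve_ilp_fs(network)
    promising = {j: {k for k in network.dests(j) if fs.flow(j, k) > 0}
                 for j in network.relays}

    # Step 2: pin Y_{j,k} = 0 for every candidate k outside P_j; per the
    # paper's observations this leaves only 1 to 4 choices per relay.
    fixed_zero = {(j, k) for j in network.relays
                  for k in network.dests(j) if k not in promising[j]}

    # Step 3: re-solve the non-flow-splitting model (ILP-NFS) over the
    # pruned search space, which now converges quickly.
    return solve_ilp_nfs(network, fixed_zero=fixed_zero)
```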
5 Experimental Results

5.1 Experimental Setup
We have carried out a number of experiments to test the effectiveness of our formulations, with different sensor and relay node distributions. We have measured the achieved lifetime of a network by the number of rounds until the first relay node runs out of battery power. The ILP formulations were solved using CPLEX version 9.1 [15]. In the first set of experiments, we consider networks with up to 15 relay nodes and sensor nodes randomly distributed over the region. We varied the number of sensor nodes from 100 to 500. To evaluate the performance of our approach, we compared the achieved lifetime with that of standard multi-hop routing schemes such as minimum hop (MH) routing and minimum transmission energy (MTE) routing. We also considered an optimal single-path routing scheme (ILP-R) [4], which maximizes the network lifetime for a specified clustering strategy. For each routing scheme, we experimented with a number of standard clustering techniques [11]:

i. Greedy Clustering (GC): each relay node greedily selects all sensor nodes that can communicate with it and have not yet been assigned to a cluster.
ii. Least Distance Clustering (LDC): each sensor node transmits to the closest relay node (see the sketch at the end of this subsection).
iii. Minimum Variance Clustering (MVC): the clustering algorithm attempts to distribute sensor nodes uniformly over the relay nodes, such that the variation in cluster size is minimized.

In the following sections, we show the results corresponding to the MVC clustering algorithm, which is the best performing algorithm as reported in [6]. The improvements obtained with respect to GC and LDC are comparable to those for MVC, or even higher. The first set of experiments was used to calibrate the performance of the heuristic with respect to the optimal. In the second set of experiments, we considered much larger networks, with up to 44 relay nodes and 5000 sensor nodes. For such networks, it was not possible to obtain optimal solutions using the ILP for the non-flow-splitting model, so we compared our heuristic with MTE and MH routing. We have assumed that the communication energy dissipation is based on the first-order radio model described in Section 3.1. The values of the constants are taken as α1 = α2 = 50 nJ/bit, β = 100 pJ/bit/m², and the path-loss exponent q = 2, which are similar to [3]. The range of each sensor (relay) node is set to 40 m (200 m), and the initial energy of each relay node is set to 5 J, as in [9].
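As a concrete example of the simplest baseline above, here is a hypothetical sketch of LDC (our illustration, not the code used in the experiments): each sensor joins the cluster of the nearest relay node within its range rmax.

```python
# Illustrative Least Distance Clustering (LDC) sketch: each sensor is
# assigned to its closest relay node, provided that relay is within the
# sensor's transmission range r_max. Positions are (x, y) tuples; the
# paper's placement strategy guarantees that every sensor is covered.
import math

def ldc(sensor_pos, relay_pos, r_max=40.0):
    clusters = {j: [] for j in relay_pos}
    for i, p in sensor_pos.items():
        j, d = min(((j, math.dist(p, rp)) for j, rp in relay_pos.items()),
                   key=lambda t: t[1])
        if d <= r_max:          # coverage is assumed, so this always holds
            clusters[j].append(i)
    return clusters

# Example: three sensors and two relays.
print(ldc({1: (0, 0), 2: (10, 0), 3: (60, 0)},
          {101: (5, 0), 102: (55, 0)}))
```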
5.2 Performance Evaluation for Moderate-Sized Networks
In this section, we present the results for a 160 m × 160 m network with 12 relay nodes. Figure 2 shows the network lifetime obtained by our integrated approach versus the MH, MTE, and ILP-R routing strategies combined with the MVC clustering scheme, for the non-flow-splitting model. For each data set, we
represent the lifetimes achieved by the different strategies in the following order: minimum-hop routing (MH), minimum transmission energy routing (MTE), optimal routing performed separately after clustering (ILP-R), LP-relaxation based heuristic for combined clustering and routing (NFS-H), and optimal solution for combined clustering and routing (ILP-NFS). 4
5
x 10
MH MTE ILP−R NFS−H ILP−NFS
4.5
4
Lifetime in rounds
3.5
3
2.5
2
1.5
1
0.5
0
100
200
300
400
500
Number of sensor nodes
Fig. 2. Comparison of achieved lifetimes with MVC scheme
The data clearly indicate that an integrated approach performs significantly better than traditional routing schemes. The average achieved lifetime obtained by ILP-NFS increased by over 300% (over 200%) compared to a traditional routing scheme such as MH (MTE). Even when compared to the optimal routing generated by ILP-R, ILP-NFS produced an average improvement of 23%. The results also show that the performance of our heuristic (NFS-H) is quite close to the optimal (within 10-15%) in all cases.

5.3 Performance Evaluation for Large-Scale Networks
In the previous sections we compared our heuristic (NFS-H) with the optimal solution and have shown that its performance is close to optimal for small networks. For large networks, it was not possible to generate optimal solutions for the non-flow-splitting model using ILP-NFS. The optimal routing formulation (ILP-R) in [4] was also unable to generate solutions for larger networks. Therefore, in this section, we compare the performance of our heuristic with the traditional MH and MTE routing schemes. We also report the achieved lifetime obtained using ILP-FS, which provides an upper bound on the maximum achievable lifetime for each experimental run.
Fig. 3. Comparison of achieved lifetimes for moderate to large networks (lifetime in rounds versus number of relay nodes, 12-44, for MH, MTE, NFS-H, and ILP-FS)
Figure 3 shows how the achieved lifetime varies with the size of the network. For each size of the relay node network, we selected the number of sensor nodes so that the average cluster size remains the same for all networks. The sensing area varied from 160 m × 160 m for a network with 12 relay nodes to 400 m × 280 m for a network with 44 relay nodes. As before, Figure 3 confirms that the combined approach clearly performs better than both MH and MTE for large networks. The average improvement of our combined approach is 300% over MH and 200% over MTE. The overall lifetime decreases with network size, since both the sensing area and the total amount of data to be transmitted increase, which in turn increases the load on the critical nodes close to the base station.
6 Conclusion
In this paper we have proposed an integrated approach that jointly optimizes clustering and routing in large-scale two-tier sensor networks. We have presented an ILP formulation that maximizes the lifetime of the upper-tier relay node network, as well as an LP-relaxation based heuristic that can be used for large networks with thousands of nodes. We have calibrated the performance of the heuristic by comparing it with optimal solutions for smaller networks. We have demonstrated that our combined approach significantly increases network lifetime for non-flow-splitting (single-path) routing. Our proposed heuristic clearly outperforms traditional routing schemes, such as minimum-hop routing and minimum-transmission-energy routing, for large-scale networks.
Acknowledgment. The work of A. Jaekel and S. Bandyopadhyay has been supported by research grants from the Natural Sciences and Engineering Research Council of Canada (NSERC).
References

1. I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor networks: a survey. Computer Networks, vol. 38, pp. 393-422, 2002.
2. G. Gupta and M. Younis. Load-balanced clustering of wireless sensor networks. In IEEE Intl. Conf. on Comm., vol. 3, pp. 1848-1852, 2003.
3. W. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy efficient communication protocol for wireless micro-sensor networks. In Proc. of the 33rd HICSS, pp. 3005-3014, 2000.
4. A. Bari, A. Jaekel, and S. Bandyopadhyay. Maximizing the lifetime of two-tiered sensor networks. In IEEE EIT, pp. 222-226, 2006.
5. G. Gupta and M. Younis. Fault-tolerant clustering of wireless sensor networks. In Proc. of IEEE WCNC, pp. 1579-1584, 2003.
6. G. Gupta and M. Younis. Performance evaluation of load-balanced clustering of wireless sensor networks. In 10th Inter. Conf. on Telecomm., vol. 2, pp. 1577-1583, 2003.
7. Y. T. Hou, Y. Shi, J. Pan, and S. F. Midkiff. Lifetime-optimal data routing in wireless sensor networks without flow splitting. In Workshop on Broadband Adv. Sensor Networks, San Jose, CA, 2004.
8. Y. T. Hou, Y. Shi, H. D. Sherali, and S. F. Midkiff. On energy provisioning and relay node placement for wireless sensor networks. In IEEE SECON, vol. 32, 2005.
9. J. Tang, B. Hao, and A. Sen. Relay node placement in large scale wireless sensor networks. Comp. Comm., vol. 29(4), pp. 490-501, 2006.
10. K. Kalpakis, K. Dasgupta, and P. Namjoshi. Maximum lifetime data gathering and aggregation in wireless sensor networks. In IEEE Inter. Conf. on Network., 2002.
11. A. Bari, A. Jaekel, and S. Bandyopadhyay. Optimal load balanced clustering in two-tiered sensor networks. In IEEE BaseNets, 2006.
12. E. Falck, P. Floren, P. Kaski, J. Kohonen, and P. Orponen. Balanced data gathering in energy-constrained sensor networks. LNCS, vol. 3121, pp. 59-70. Springer-Verlag, Berlin Heidelberg, 2004.
13. M. Borghini, F. Cuomo, T. Melodia, U. Monaco, and F. Ricciato. Optimal data delivery in wireless sensor networks in the energy and latency domains. In WICON, pp. 138-145, 2005.
14. M. Yarvis, N. Kushalnagar, H. Singh, A. Rangarajan, Y. Liu, and S. Singh. Exploiting heterogeneity in sensor networks. In IEEE INFOCOM, vol. 2, pp. 878-890, 2005.
15. ILOG CPLEX 9.1 documentation. Available at http://www.columbia.edu/~dano/resources/cplex91man/index.html
On-Demand Routing in Disrupted Environments Jay Boice, J.J. Garcia-Luna-Aceves, and Katia Obraczka Department of Computer Engineering University of California at Santa Cruz
Abstract. While current on-demand routing protocols are optimized to take into account unique features of mobile ad hoc networks (MANETs) such as frequent topology changes and limited battery life, they often do not consider the possibility of intermittent connectivity that may lead to temporary partitions. In this work, we introduce the Space-Contentadaptive-Time Routing (SCaTR) framework, which enables data delivery in the face of both temporary and long-lived MANET partitions. SCaTR takes advantage of past connectivity information to effectively route traffic towards destinations when no direct route from the source exists. We show through simulations that, when compared to traditional on-demand protocols, as well as Epidemic routing, SCaTR increases delivery ratio with lower signaling overhead in a variety of network scenarios with intermittent connectivity. We also show that SCaTR performs as well as on-demand routing in scenarios that are well-connected, and/or have no mobility predictability (e.g., scenarios with random mobility).
1 Introduction
The price, performance, and form factors of sensors, processors, storage elements, and radios today are enabling the development of network-supported applications in very disrupted environments, i.e., environments where end-to-end connectivity is not guaranteed at all times, because of either the characteristics of the environment or the normal operation of the network nodes. Examples of such applications and environments include monitoring of disrupted phenomena (e.g., wildfires), object tracking, establishment of on-demand network infrastructure for disaster relief or military purposes (in which case the ad hoc network can be disrupted by terrain, weather, and other natural phenomena, as well as jamming, interference, etc.), peer-to-peer vehicular or interpersonal networks [1] with very sparse connectivity, and mobile ad hoc networks (MANETs) that need not be connected at all times in order to limit interference and contention. In these scenarios, network disconnection is the normal state of operation rather than an exception. The demand for networking in environments prone to intermittent connectivity poses a challenge, because the architects of the IP Internet and of MANETs have assumed that physical connectivity exists on an end-to-end basis between sources and destinations for extended periods of time, or at least for the duration of a transaction among communicating parties.
This assumption has had profound implications on how communication bandwidth is shared, how routing is accomplished, and how messages are disseminated across computer networks. In particular, routing in packet-switching networks has been based on routing tables that specify the next hop to one or more destinations. Such routing information is derived entirely from topology (or connectivity) information that represents only a snapshot of the state and characteristics of network links at particular instants. Regardless of the specific mechanisms used in a routing protocol today (e.g., proactive or on-demand routing), computing the routing table entry for a given destination can be viewed as a particular form of searching a database. The routing database can be replicated (as is done in such topology broadcast approaches as TBRPF [2] and OLSR [3]) or distributed, as in AODV [4] or DSR [5]. Depending on whether the routing database is replicated or distributed, the search algorithm can be centralized (e.g., using Dijkstra's shortest-path-first algorithm) or distributed (e.g., using a flood search based on route requests and route replies). The routing databases constructed by traditional routing algorithms specify the instantaneous status of a link (up or down) and the value of its parameters, such as delay and bandwidth, at some specific point in time. The search for routes in such databases produces snapshot paths that have no temporal dimension. Hence, if the network connectivity or link parameters change, multiple paths to destinations may be affected; the only way most current routing protocols can recover is to search for new paths. This time-independent, reactive approach to changes in network connectivity and link parameters works well as long as the disruptions in network connectivity due to environmental or operational reasons are not so frequent and/or long-lived that they prevent the routing protocol from obtaining time-independent paths to intended destinations. Starting with the work in the Interplanetary Internet Research Group (IPNRG) [6] of the IRTF (Internet Research Task Force), considerable effort has recently been devoted to the study of networks with intermittent connectivity or very long latencies. Perhaps most prominent in this area is the work by the DTNRG (Delay Tolerant Networking Research Group) [7], which started in 2002 under the IRTF. Section 2 summarizes prior related work on routing in disrupted environments. From our summary of related work, it becomes apparent that no complete solution exists for on-demand routing that incorporates the network topology's time dependency. In this paper we describe the SCaTR (Space-Content-adaptive-Time Routing) framework to enable on-demand routing in MANETs with intermittent connectivity. Section 3 describes SCaTR, which we currently implement by extending the Ad hoc On-Demand Distance Vector routing protocol (AODV) [4]. Our current instantiation of SCaTR is such that, if the network is connected, it operates exactly as regular on-demand routing, in this case AODV. However, if no direct route is available from source to destination, a node that is deemed closer to the destination than the source will advertise itself as a proxy. In this manner, we are assured that the resulting protocol will do no worse than standard AODV in well-connected environments, and better in partitioned networks.
Section 5 addresses the performance of SCaTR compared to on-demand and Epidemic routing. Scenarios involving both random and predictable node mobility are investigated. In predictable-mobility scenarios, node schedules or trajectories are not assumed to be global knowledge. Instead, the routing algorithm in SCaTR uses mobility histories to improve performance. Our simulation results show that the added functionality of proxies in SCaTR improves delivery rates in both predictable and random mobility situations, while incurring lower signaling overhead. Given enough time, the protocol delivers all possible packets to their intended destinations, achieving optimal reliability. Section 6 summarizes our contributions and discusses ideas for future work.
2 Related Work
Recently, Message Ferrying [8] has been investigated for use in highly partitioned networks. The approach utilizes special nodes, called ferries, whose mobility can be controlled to maintain communication between partitions. Much of the work has focused on route scheduling of the ferries and synchronization between their routes, since a well-chosen schedule will have a great impact on timely and reliable message delivery. It has also been shown that the ferries can be used as an energy-saving device for other nodes in the network; if there are no ferries nearby, nodes can be turned off to conserve energy. This work makes the assumption that controllable nodes exist in the network; however, there are situations in which such nodes are not available. Our work addresses these situations. In the Epidemic routing [9] approach, data are transmitted through message exchanges between neighboring nodes. The premise behind the work is that by duplicating a message and sending it to all neighbors, it can quickly and reliably reach its destination. Similar work [10] has been done to improve Epidemic routing by limiting the number of nodes to which messages are sent. Epidemic routing, however, relies on data duplication to deliver messages, which can be prohibitively expensive in terms of energy consumption. Delay Tolerant Networking [7] has attracted considerable interest from the network research community for several years now. Additional layers atop the routing and transport layers have been proposed to provide store-and-forward capabilities, as well as interoperability services to partitioned networks. This work has also examined the idea of custody [11], which is a hop-by-hop method of reliability for disconnected networks. This is a much-needed approach to the issue of reliability, as end-to-end reliability is unmanageable in these scenarios. A more general framework for routing using knowledge of network mobility has also been proposed [12]. This approach adds the time dimension to routing tables and selects routes based on a combination of the data's destination and the time of message arrival. It uses global knowledge of node mobility schedules to construct these routing tables. In some scenarios, these schedules may not be available, or may be too expensive to maintain. Spray and Wait [13] is a recent improvement over a pure flooding protocol such as Epidemic routing. In this work, only the source can replicate a message, and the amount of replication is proportional to the number of nodes in the network.
It is shown that the method can bound the delay to a value proportional to the optimal delay. After 'spraying' several copies of a message, the host 'waits' until one is delivered. Location information is a useful tool for routing in disrupted networks. Both MobySpace [14] and MV routing [15] are methods that use location information to aid routing decisions. They assume that a node that has visited a particular location is likely to revisit it, and is therefore a good candidate to carry messages to that location. MV routing uses location information to facilitate buffer management, while MobySpace is a framework for generating probabilities that a node will move to specific locations in the future. Both methods require some localization method, such as GPS. While there has been significant prior work on the topic of partitioned networks, most approaches have made one of three assumptions. One assumption is global topology knowledge, which has proved to be useful in determining future connectivity. Another assumption is the existence of controllable nodes, which can be extremely useful in aiding delivery in sparse networks. The third assumption is that data can be duplicated freely among nodes, which can lead to excellent delivery ratios. Our work is an attempt to do away with these assumptions, yielding a more general framework applicable to any network scenario where such assumptions may be unrealistic.
3 SCaTR
The SCaTR framework extends on-demand routing, taking action only when direct routes cannot be established by the underlying protocol. In the case of a route discovery failure, i.e., when the source and destination are in separate partitions, SCaTR tries to route data to the node or nodes in the source's partition that are likely to have a route to the destination in the near future. These nodes act as proxies for the destination and buffer messages until either the destination is discovered or another node is selected as a better proxy for those messages. Messages are replicated at most one time, resulting in minimal data replication and duplicate message filtering overhead. Proxies are selected based on past connectivity information, which nodes keep in content-adaptive contact tables. As will become clear, contact tables, which are the equivalent of traditional routing tables, use time-dependent and space-dependent routing metrics. These metrics differ for different types of content or local constraints, such as buffer size. For instance, if a proxy is running low on buffer space, it may decide to select as the next proxy for a destination the first node it hears from that has been in contact with that destination; this is done even if that node's contact value is lower than its own. However, if the node has higher buffer availability, it can carry the data for a longer interval. Because SCaTR takes no action if routes are successfully established, we are guaranteed that it will perform no worse than the underlying on-demand routing protocol in any situation. This flexibility makes it well suited for adoption in any network scenario. SCaTR consists of several phases: contact table maintenance, route discovery, route selection, and proxy rediscovery.
Each phase is described in detail below. Each node in the network maintains a contact table containing a measure of the time-dependent distances to other nodes in the network. Each entry in the table consists of a destination address and its current contact value. Nodes maintain these tables with information that is piggybacked onto hello messages; when a node receives a hello message from a neighbor, it also receives that node's contact table, which it uses to update its own. Because it can be very expensive to maintain contact information about all nodes in a network, SCaTR initializes its contact tables on demand. Each node starts with an empty table and adds destinations only when it receives a request for a destination or meets another node that has an entry in its table for that destination. This method of initialization results in a delay for the first messages introduced into the network, because the contact table request must propagate to the destination and then back to the source before proxies are advertised. In large networks with predefined sinks or a limited number of destinations, however, this method saves significant computational and communication overhead. The mechanisms used to maintain and update contact values have several important characteristics that must be addressed:

– Ordering: the highest contact value advertised for any node must be advertised by the node itself. A node that will reach a destination sooner than another node should have a higher contact value for that destination.
– Stability: contact values should not fluctuate greatly over time, in order to minimize 'leapfrogging' of contact values (the case in which nodes alternately select each other as proxies due to asynchronous contact value updates).
– Simplicity: contact value calculation should not be computationally complex, as it must be frequently invoked.

To address these issues, we propose an algorithm that is initiated with a maximum contact value advertised by the destinations themselves. Time is broken into hello intervals. During a hello interval, nodes track the maxima of their neighbors' advertised contact values for each destination. These maxima are averaged over a hello period, and each node maintains a window of periods, used to generate the next interval's contact value for each destination. A more detailed description of contact value maintenance follows. The length of a hello interval is the amount of time between hello message advertisements. During each hello interval, a node maintains the maximum contact value it receives from its neighbors for each destination. A hello period consists of a sequential group of hello intervals, as defined by the parameter τ. At the end of a hello period, a node averages the values obtained during each hello interval to obtain a single value for the entire period for each destination. The length of time for which a node retains contact information is specified by the period window κ. At the end of each hello period, the value obtained during the most recent period is added to a queue, and the oldest value is removed. Each node's contact values are recalculated at the end of each period by averaging the most recent κ periods.
Until the node's contact value is updated again, it will advertise this value to its neighbors. The main mechanisms for route discovery in the SCaTR framework are the Proxy Request (PREQ) and Proxy Reply (PREP) messages. When the underlying routing protocol is unable to establish a route to a destination, the source node issues a PREQ to its current partition. This is a request for a candidate node or nodes in the source's partition to buffer messages and carry them towards their destination. The PREQ can request one or more destinations, to reduce signaling overhead: instead of sending out an individual PREQ for each destination, the node sends a single PREQ listing all destinations for which it has buffered data. The PREQ also contains the source node's contact value for each destination that is being requested. The contact value is included so that only nodes with a significantly better value reply to the request. In our implementation, we have used a threshold of 5% to determine whether or not a proxy is better; a proxy is only selected if its contact value is 5% higher than that of the node at which the messages currently reside. Different threshold values could be used and would determine how selective a node is in choosing proxies. In preliminary experiments, we found 5% to be an effective threshold for balancing overhead and delivery ratios. The PREP is a message that advertises the responding node as a possible proxy for one or more of the destinations in the PREQ. The source can then decide which proxy or proxies to use for each destination. The PREP contains the proxy's contact value for the destination, as well as its remaining buffer space and the number of messages it has already buffered for the source-destination pair. It is important to note that the PREP specifies the destination, not the proxy itself: the source need not know the address of the proxy; it must only know the next hop towards the destination. The source-destination pair obtained from the PREQ packet is entered into a buffering table at the proxy, and any packets later received by the proxy with this source-destination pair are buffered. As the source collects PREPs from the nodes in its partition, it compares contact values for each of the destinations. If a new PREP has a higher contact value than the one currently in its routing table, it replaces the entry. Routing tables may now contain two types of routes: one is an active, connected route, while the other is a route to a destination that was advertised by a proxy. These are distinguished only because a reply that advertises a direct route always takes precedence over a proxy route. After a proxy buffers packets, it must ensure that those packets reach their destination. Proxies have several methods of initiating the route discovery process for buffered messages. One method is 'listening' as updates to the contact table are received. If a proxy receives an improved contact value for a destination for which it has buffered packets, it assumes that a better route to the destination is available in its partition, or that the destination itself is nearby. This information initiates route discovery behavior in the proxy: the proxy sends out a single PREQ for all nodes for which it has buffered packets.
In the same manner as the route discovery process described above, nodes reply to this request, and the proxy selects the best next destination for the data. The proxy also listens to any relevant RREPs or PREPs that it relays; if route information for destinations in its buffer is relayed, a route is established along the advertised path. In the case that a proxy does not make any updates to the contact value for a buffered destination, or does not relay a reply for that destination, it sends a PREQ periodically. There are many approaches to message replication, ranging from the minimal approach of a single copy of each message to the unlimited duplication (up to the number of nodes in the network) of Epidemic routing. More message redundancy requires increased overhead in the form of replicated data, as well as the computational and memory resources required to maintain buffers and detect duplicate messages. One approach to the problem is to allow each proxy to replicate a message a certain number of times. However, as the network scales, it is clear that this number will grow exponentially. In addition, a proxy holding a message has no indication of whether another proxy has already delivered the message; thus, any replicas made by this proxy are wasted. This clearly is not a feasible solution. It is also possible to control replication in a tree fashion, with the message source as the root of the tree. There can be a defined branching factor, or replication degree, that each level of the tree is permitted to use. Using the depth of a message in the tree and the branching factor, one can specify an exact number of duplications for each message. This reduces message duplication, but has the same problem of communicating message deliveries mentioned above. In SCaTR, we have decided to minimize the amount of message replication. Prior work by one of the authors [16] has indicated analytically that, under constrained conditions, the optimal number of message duplications is one: the overhead resulting from any further duplication is not justified by the increase in delivery ratio. For these reasons, the source can select at most two initial proxies for message transmission, and proxies may not duplicate messages.
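The following is a minimal sketch (our illustration, not the authors' implementation) of the contact-value maintenance and the proxy-selection test described above; the class and method names are hypothetical, and the destination's own maximum-value advertisement is omitted for brevity.

```python
# Sketch of per-destination contact-value maintenance: during each
# hello interval, keep the maximum value advertised by any neighbor;
# every tau intervals (one hello period), average the interval maxima;
# advertise the mean of the most recent kappa period values.
from collections import defaultdict, deque

class ContactTable:
    def __init__(self, tau=4, kappa=8, threshold=0.05):
        self.tau = tau                  # hello intervals per hello period
        self.threshold = threshold      # 5% proxy-improvement test
        self.intervals_elapsed = 0
        self.interval_max = defaultdict(float)    # maxima, current interval
        self.interval_values = defaultdict(list)  # maxima, current period
        self.periods = defaultdict(lambda: deque(maxlen=kappa))
        self.advertised = defaultdict(float)      # value piggybacked on hellos

    def on_hello(self, neighbor_table):
        """Merge a neighbor's advertised contact values (piggybacked on
        its hello message) into the current interval's maxima."""
        for dest, value in neighbor_table.items():
            self.interval_max[dest] = max(self.interval_max[dest], value)

    def end_interval(self):
        for dest, value in self.interval_max.items():
            self.interval_values[dest].append(value)
        self.interval_max.clear()
        self.intervals_elapsed += 1
        if self.intervals_elapsed == self.tau:
            self._end_period()
            self.intervals_elapsed = 0

    def _end_period(self):
        for dest, values in self.interval_values.items():
            window = self.periods[dest]
            window.append(sum(values) / len(values))   # oldest value drops off
            self.advertised[dest] = sum(window) / len(window)
        self.interval_values.clear()

    def accept_prep(self, dest, candidate_value):
        """A PREP is accepted only if the candidate proxy's contact
        value beats the local one by the improvement threshold (5%)."""
        return candidate_value > self.advertised[dest] * (1 + self.threshold)
```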
4 Simulation Setup and Mobility Models
The SCaTR framework is implemented as an extension of the AODV routing protocol and simulated in GloMoSim [17] with several mobility scenarios and connectivity models. For comparison with SCaTR, Epidemic routing, in which messages are passed to all neighbors, was also implemented. Throughout the experiments, 18 CBR data flows generate messages at ten-second intervals for the first 400 seconds of the simulation. The experiments run for 2000 seconds, to give messages time to propagate to their destinations, and use an 802.11 MAC layer. All experiments were run with 5 seed values, and the results were averaged. The two mobility scenarios described below, in addition to the random waypoint model, show the effectiveness of the protocol over varied topologies and connectivity models. In the gridded random waypoint setup, nodes are arranged in a square field with a 377 m radio propagation range.
Within each square, a fixed number of nodes move according to the random waypoint mobility model: random locations are selected within the square, and a node moves there at a speed between 5 m/s and 10 m/s. After reaching its destination, a node pauses for 20 s. The flows exist between nodes on opposite ends of the grid, to provide maximum route lengths. Throughout the experiments with this mobility model, the number of nodes and flows is fixed, while the dimensions of the scenario are varied to provide more or less connectivity in the network. In the scheduled routes setup, 50 nodes are arranged in a square field with a radio propagation range of 377 m. Ten nodes are positioned around the perimeter of the network and act as sources and destinations. All other nodes are assigned a randomly sized and positioned rectangle over which to travel, at a random speed between 5 m/s and 20 m/s. Under these parameters, links will be somewhat predictable, although they will not always occur at the same interval. Throughout the experiments, the size of the scenario is increased to provide less connectivity in the network. We also run simulations with the random waypoint mobility model to illustrate the performance of SCaTR in a purely random network; in this case, past mobility information gives no indication of future topologies. This scenario is especially interesting, as it is not favorable to SCaTR. Forty nodes move according to the random waypoint mobility model within a 3000 m × 3000 m square area. Due to the sparseness of the network, the connectivity is poor. The experiments vary the speed of each node to change the duration of connected paths in the network. The pause time remains a constant 30 s throughout the simulations.
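As a small illustration of the gridded random waypoint model above, the following hypothetical generator (not the simulation code; all names are ours) produces movement legs for one node confined to its grid square, using the stated speed range and pause time.

```python
# Illustrative waypoint generator for the gridded random waypoint
# model: a node stays inside its assigned square, repeatedly picking a
# uniformly random point in it, moving there at a speed drawn from
# [5, 10] m/s, and pausing 20 s on arrival.
import random

def gridded_waypoints(square_x, square_y, square_size, num_legs,
                      v_min=5.0, v_max=10.0, pause=20.0):
    """Yield (x, y, speed, pause) legs for a node in square (square_x, square_y)."""
    x0, y0 = square_x * square_size, square_y * square_size
    for _ in range(num_legs):
        x = x0 + random.uniform(0.0, square_size)
        y = y0 + random.uniform(0.0, square_size)
        yield x, y, random.uniform(v_min, v_max), pause

# Example: the first three legs of a node in grid square (0, 0) of a
# 400 m x 400 m tiling.
for leg in gridded_waypoints(0, 0, 400.0, 3):
    print(leg)
```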
5 Performance Comparison
In this section we show that the addition of SCaTR to an on-demand routing protocol results in significantly improved performance, in terms of delivery ratio and signaling overhead, in all mobility scenarios that were tested. In these experiments, signaling overhead includes control messages, such as route requests and replies, as well as data packet replicas. Because the simulated networks are sparse, it is often not possible to deliver all messages in the allotted time. All simulations were run for 2000 seconds; it should be noted that packets undelivered after this time were not lost, but simply had not yet reached their destinations. It would be possible to run much longer simulations and obtain much higher delivery ratios, but the time selected gives a good indication of the relative performance of the protocols. In a subsequent experiment, we show how lengthening the simulation time affects performance. We first compare the routing methods in the scheduled routes mobility scenario. Connectivity is varied by increasing the size of the scenario; clearly, a larger scenario size results in less connectivity. Experiments with scheduled routes show that SCaTR takes advantage of predictability to provide high delivery ratios with low signaling overhead. Figure 1 illustrates delivery rates and overheads for the various protocols. At small scenario sizes with good network connectivity, all protocols successfully deliver some messages.
Epidemic routing delivers, as expected, 100% of messages, while SCaTR delivers approximately 96%. AODV, due to frequent route disruptions, delivers only 6% of messages. Epidemic routing maintains a 100% delivery ratio until the scenario size reaches 2500 m × 2500 m. Most notably, SCaTR sustains a delivery ratio similar to that of Epidemic routing throughout the experiment. At 4500 m × 4500 m, connectivity is extremely poor, and none of the protocols deliver more than 40% of messages. The plot of signaling overhead in Figure 1 shows that SCaTR maintains its high delivery rates with little overhead. Since those packets that reach their destination with Epidemic routing likely reach all nodes in the network, its overhead remains at nearly 50 messages per delivery. The addition of SCaTR enables AODV to maintain fairly constant overhead, much lower than that of Epidemic routing. The 'hump' in the plot for AODV occurred because the partitions surrounding the sources at the start of the simulations were large; thus, as AODV made its route establishment attempts, they reached a large number of nodes (but not the destination). This behavior had a large impact on overhead, since so few packets were delivered.
Fig. 1. Delivery ratio (left) and signaling overhead (right) for the scheduled routes mobility model with varied network connectivity (AODV, Epidemic, and SCaTR with one and two proxies, versus scenario width and height in meters)
The results for the gridded random waypoint model, shown in Figure 2, are similar to those for the scheduled routes scenario. AODV's performance decreases sharply as connectivity diminishes, while SCaTR maintains delivery ratios similar to Epidemic routing. In addition, Epidemic routing and SCaTR both show very high delivery ratios until the network is extremely disconnected. Notably, in both scenarios, the results show that adding a second proxy to SCaTR does not have a significant impact on its delivery ratio, although it does incur additional overhead. It is likely that a network with many more nodes would show the benefit of the second proxy more clearly; however, these results indicate that it is not a significant factor. When buffer limitations are introduced, SCaTR's performance significantly improves compared to that of Epidemic routing. All protocols employed a FIFO drop strategy for buffered messages. As shown in Figure 2, the duplicate messages in Epidemic routing have a detrimental effect with a small buffer. Only once
the size of the buffer reaches approximately 25% of the number of messages originated in the network does Epidemic routing’s delivery ratio surpass that of SCaTR. This improvement, however, is still offset by the high overhead of Epidemic routing. We can also see that with a very constrained buffer of 5% to 10%, it is beneficial to select only one proxy for each message.
Fig. 2. Left, delivery ratio for the gridded random waypoint mobility model with varied network connectivity. Right, delivery ratio for the scheduled routes mobility model with limited buffer space (node storage space as a percentage of total messages).
Because it utilizes mobility predictability, SCaTR was not designed with random mobility in mind, but we include this scenario to illustrate that it performs at least as well as the underlying on-demand protocol in any situation. Figure 3 illustrates performance when node speeds were varied to provide more or less connectivity in the network. SCaTR shows a higher delivery ratio than standard AODV, despite the fact that it is unable to take advantage of historical information to predict future topologies. In addition, this experiment shows the benefit of the additional message replication that can be provided with SCaTR. Epidemic routing has excellent delivery ratios, at the expense of very high overhead; in power-constrained environments, this would likely not be acceptable. It should be noted that the overhead of AODV is extremely high because it delivers so few messages. Figure 4 shows that SCaTR's performance improves as the length of the simulation increases. As the scenario time increases, delivery rates increase, and signaling overhead remains constant after an initial drop. Epidemic routing quickly achieves nearly a 100% delivery rate, while SCaTR is able to achieve better than 80% after approximately 2000 seconds. Despite the fact that SCaTR takes longer than Epidemic routing to deliver messages, it does so with far less signaling. After an initial drop in signaling overhead due to low delivery rates (most packets are en route to their destination at this point), SCaTR settles at approximately 10-20 messages per delivered packet. Epidemic routing settles at approximately 38 messages per delivered packet, since each message reaches a large portion of the network.
Fig. 3. Delivery ratio (left) and signaling overhead (right) for the random waypoint mobility model with varied node speeds (5-30 meters/second)
Fig. 4. Delivery ratio (left) and signaling overhead (right) for the scheduled routes mobility model with varied scenario length (600-2000 seconds)
6 Discussion and Future Work
We have introduced the Space-Content-adaptive-Time Routing (SCaTR) framework to efficiently and effectively route data in networked environments where connectivity is intermittent. SCaTR extends on-demand routing to operate in environments where there often may not be a direct route between source and destination. If the network is connected, SCaTR operates exactly as regular on-demand routing. However, if source and destination do not have a connected route, SCaTR chooses a node that is deemed closer to the destination than the source as a proxy for that destination. The proxy either delivers the data to the destination directly, or chooses another proxy closer to the destination than itself. In summary, the resulting protocol does no worse than standard AODV in well-connected environments, and far better in partitioned networks. Through extensive simulations in environments with varying connectivity and mobility patterns, we showed that SCaTR can yield considerably higher delivery ratios with lower signaling overhead than traditional on-demand routing.
Delivery Guarantees in Predictable Disruption Tolerant Networks

Jean-Marc François and Guy Leduc

Research Unit in Networking (RUN), DEECS, Institut Montefiore B28, Sart-Tilman, University of Liège, 4000 Liège, Belgium
{francois,leduc}@run.montefiore.ulg.ac.be
Abstract. This article studies disruption tolerant networks (DTNs) where each node knows the probabilistic distribution of contacts with other nodes. It proposes a framework that allows one to formalize the behaviour of such a network. It generalizes extreme cases that have been studied before, where either (a) nodes only know their contact frequency with each other or (b) they have a perfect knowledge of who meets whom and when. This paper then gives an example of how this framework can be used: it shows how one can find a packet forwarding algorithm optimized to meet the delay/bandwidth-consumption trade-off, in which packets are duplicated enough to (statistically) guarantee a given delay or delivery probability, but not more, so as to limit the bandwidth, energy, and memory consumption.
1 Introduction
Disruption (or Delay) Tolerant Networks (DTNs, [1]) have been the subject of much research activity in the last few years, pushing the concept of ad hoc networks further. Like ad hoc networks, DTNs are infrastructureless; packets are relayed from one node to the next until they reach their destination. However, in DTNs, node clusters can be completely disconnected from the rest of the network. In this case, nodes must buffer the packets and wait until node mobility changes the network's topology, allowing the packets to finally be delivered. A network of Bluetooth-enabled PDAs, a village intermittently connected via low Earth orbiting satellites, or even an interplanetary Internet ([2]) are examples of disruption tolerant networks. The atomic data unit is a group of packets to be delivered together. In DTN parlance, it is called a message or a bundle; we use the latter in the following. Routing in such networks is particularly challenging since it requires taking into account the uncertainty of node movements.
(This work has been supported by the Belgian Science Policy in the framework of the IAP program (Motion P5/11 project) and by the IST-FET ANA project.)
The first method proposed in the literature is quite radical: it forwards bundles in an "epidemic" way ([3,4]), i.e., copies them each time a new node is encountered. This method of course results in optimal delays and delivery probabilities, at the expense of an extremely high consumption of bandwidth (and, thus, energy) and memory. To mitigate those shortcomings, epidemic routing has been enhanced using heuristics that limit the propagation of bundles to a subset of all the nodes ([5,6]). More advanced heuristics have been introduced to cope with the nodes' limited memory. Cache mechanisms have been proposed, where the most interesting bundles are kept (i.e. those that are likely to reach their destination soon) and the others are discarded when the cache is full ([7,8,9,10]). Few papers explore how the expected delay could be more precisely estimated (notable exceptions are [11,12]). It has been proved ([13]) that a perfect knowledge of the future node meetings allows the computation of an optimal bundle routing. This short introduction emphasizes two shortcomings of previous work:
– Previous works suppose either that node contacts are perfectly deterministic, or that only the contact frequency is known for each pair of nodes. In this paper, we introduce a framework which generalizes those extreme cases and formalizes the predictability of node contacts. It allows one to compute the expected impact of a particular bundle forwarding strategy;
– Previous works only propose bundle forwarding heuristics. In what follows, we give an example of how the above-mentioned framework can be used to find a bundle routing strategy that fulfills delivery guarantees while limiting bandwidth/energy consumption.
2 Predictable Future Contacts
The network is composed of a finite set of wireless nodes N that can move and thus, from time to time, come into contact. In the sequel, a contact between two nodes happens when those nodes have set up a bi-directional wireless link between them. A contact is always considered long enough to allow all the required data exchanges to take place (a major difference with [13], which does not neglect bundle transmission times).

2.1 Contact Profiles
We expect the mobiles' motion to be predictable, yet the degree of predictability obviously varies from one network to another. Sometimes node motion is known in advance because the nodes must stick to a given schedule (e.g. a network of buses) or because their trajectory can easily be modelled (e.g. nodes embedded in a satellite). Other networks are less predictable, yet not totally random: colleagues can be pretty sure to meet every day during working hours, without any other time guarantee.
Fig. 1. Contact profile and first contact distribution: example. Top: A contact profile: the height of a bar gives the probability that two nodes meet (at least once) during the corresponding 12-hour time period. Bottom: The corresponding first contact distribution; each bar corresponds to a 12-hour period.
Mobile nodes' behaviour could also be learnt automatically so as to extract cyclical contact patterns. We therefore suppose that each node pair {a, b} ⊂ N can estimate its contact probability for each (discrete) time step in the near future. We call this a contact profile and denote it C_ab : ℕ → [0, 1]. In the following, we suppose the profile known for each node pair. Contact profiles can easily represent situations usually depicted in the literature:
– A constant profile C_ab(t) = k describes a node pair that only knows its contact frequency. For example, the profile C_ab(t) = 1/30 (contact probability per day) corresponds to two nodes a and b meeting once a month on average.
– Perfect knowledge of meeting times results in a profile made of peaks: ∀t ∈ ℕ : C_ab(t) ∈ {0, 1}.
In practice, unknown contact profiles can be replaced by a constant function equal to zero on its domain to get a conservative approximation of the corresponding behaviour. The following sections study how bundles propagate from one node to another in a network whose nodes' contact profiles are known.

2.2 First Contact Distribution
It is easy to deduce the probability distribution of a (first) contact at time t between nodes a and b ∈ N given their profile C_ab; we denote this distribution d_ab.
Since the probability of a first contact at time t is the probability of meeting at time step t times the probability of not meeting at time steps 0, 1, . . . , t − 1, we have (∀a, b ∈ N):

d_ab(t) = C_ab(t) ∏_{i=0}^{t−1} (1 − C_ab(i))   ∀t ∈ ℕ   (1)
The distributions' domain is ℕ since contact profiles have been defined using discrete time steps. We extend the distributions to ℝ to get rid of this artifact. Notice that d_ab is not a well-defined probability distribution since its integral over its domain need not equal 1: two nodes might never meet.

Definition 1. The first contact distribution set, C, is the set of functions f : ℝ⁺ → ℝ⁺ (where ℝ⁺ denotes the set of positive reals) such that ∫₀^∞ f(x) dx ≤ 1.

Contact profiles do not allow us to express contact interdependencies; for example, they cannot model that two nodes are certain to meet exactly once during the weekend without knowing exactly which day (if a probability of .5 is assigned to both Saturday and Sunday, there is a .25 probability that the nodes will meet twice). First contact distributions have no such limitations. Therefore, when it is possible, one may find it preferable to generate them directly without relying on contact profiles. Figure 1 gives an example contact profile C_ab (top) and the corresponding first contact distribution d_ab (bottom). Notice that if a bundle is delivered directly from a to b, knowing the first contact distribution allows an easy verification of a large spectrum of guarantees, such as the average delay or the probability of delivery before a certain date.
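As an illustration (ours, not the paper's), eq. (1) is straightforward to evaluate numerically. The following Python sketch computes a discrete first contact distribution from an arbitrary contact profile; the horizon length and the example profile are arbitrary choices for demonstration:

```python
def first_contact_distribution(profile, horizon):
    """Evaluate eq. (1): d_ab(t) = C_ab(t) * prod_{i<t} (1 - C_ab(i))."""
    d, survive = [], 1.0        # survive = P(no contact strictly before t)
    for t in range(horizon):
        c = profile(t)
        d.append(c * survive)
        survive *= 1.0 - c
    return d

# Two nodes meeting once a month on average: constant profile C_ab(t) = 1/30.
d = first_contact_distribution(lambda t: 1.0 / 30, horizon=365)
assert sum(d) <= 1.0            # sub-probability: the nodes might never meet
print(f"P(first contact within 30 days) = {sum(d[:30]):.3f}")
```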
3 Delivery Distributions

3.1 Definition
First contact distributions can be generalized to take into account the knowledge that no contact was made before a certain date. Let D_ab(T, t) be the probability distribution of the event that a and b require a delay of t time steps to meet for the first time after time step T. Since these distributions will be the building blocks that allow us to compute when a bundle can be delivered to its destination, we call them delivery distributions. D_ab can be derived directly from the contact profile C_ab (∀a, b ∈ N):

D_ab(T, t) = C_ab(T + t) ∏_{i=T}^{T+t−1} (1 − C_ab(i))   ∀T, t ∈ ℕ   (2)

As before, the domain of these functions can be extended to ℝ⁺.
Definition 2. The delivery distribution set, D, holds all the functions f : (ℝ⁺)² → ℝ⁺ such that ∀T ∈ ℝ⁺ : ∫₀^∞ f(T, x) dx ≤ 1. Notice the inequality.
Fig. 2. Contact probability density: the D_ab(T, t) delivery distribution matching the contact profile given in Fig. 1.
Figure 2 draws the D_ab(T, t) distribution corresponding to the contact profile given in Fig. 1. The D(T, ·) functions of course belong to C (∀T ≥ 0). Notice that D_ab(T, ·) is the expected delivery delay distribution for a bundle sent directly from a source a to a destination b if a decides to send it at time T.

3.2 Order Relation on Distributions
We define an order relation between first contact distributions. Intuitively, this relation allows one to compare two distributions to find which one represents more frequent or more predictable contacts. A rigorous definition is given below.

Definition 3. The first contact distribution d1 ∈ C is greater than or equal to d2 ∈ C (denoted d1 ⪰ d2) if and only if:

∀x ≥ 0 : ∫₀^x d1(t) dt ≥ ∫₀^x d2(t) dt   (3)

This relation is a partial order (but not a total order, as there exist d1, d2 ∈ C such that neither d1 ⪰ d2 nor d2 ⪰ d1; see [14] for more details). It appears difficult to define a total order on C: comparing two distributions that cannot be ordered using the ⪰ relation is a matter of choice and depends on the bundle delivery guarantees one wants to enforce. The ⪰ relation is thus a least common denominator, and could be replaced in what follows by a more restrictive order definition. The worst (smallest) element of C is the ⊥ (bottom) distribution: ⊥(t) = 0 (∀t ≥ 0). The best (greatest) first contact distribution is denoted ⊤ (top): ⊤(t) = δ(t) (∀t ≥ 0); the δ symbol denotes the Dirac distribution.
The ⪰ relation can be extended to D. For all D1, D2 ∈ D:

D1 ⪰ D2 ⟺ ∀T ≥ 0 : D1(T, ·) ⪰ D2(T, ·)

The D_⊥ delivery distribution is such that ∀T ≥ 0 : D_⊥(T, ·) ≡ ⊥. The definition of D_⊤ follows immediately.
4 Delivery Distribution Operators

4.1 The Forwarding Operator
Let D_sbd be the delivery distribution associated with the delivery of a bundle from a source node s to a destination d via node b. More precisely, if s decides to send a bundle at time T, it will reach d after a delay described by the distribution D_sbd(T, ·). D_sbd can be computed from D_sb and D_bd:

D_sbd ≡ D_sb ⊗ D_bd   (4)

The ⊗ (or forwarding) operator is a function defined for every pair of distributions, ⊗ : D² → D:

(D1 ⊗ D2)(T, t) = ∫₀^t D1(T, x) D2(T + x, t − x) dx   (5)
It is easy to see that this operator is associative but not commutative. Equation (5) simply states that, since the total delivery delay is equal to t, if the delay to reach b is equal to x, then the delay from b to d is t − x. Equation (4) can be generalized: a bundle could be forwarded through several intermediate hops before reaching its destination. We denote D_{s−d} (notice the dash) the delivery delay distribution for a bundle sent from a source s to a destination d at time T; from now on, ⊗ will thus be applied to any kind of delivery distributions. For example, the delivery path s → a → b → d, i.e. a sequence of forwarding nodes, has the delivery distribution

D_{s−d} ≡ D_{sa} ⊗ D_{ab} ⊗ D_{bd}
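For readers who want to experiment, here is a minimal numerical sketch of the ⊗ operator (our discretization, not part of the paper): a delivery distribution is stored as an H × H array indexed by send time T and delay t, and the integral in eq. (5) becomes a finite sum.

```python
import numpy as np

def forward(D1, D2):
    """Discrete version of eq. (5):
    (D1 (x) D2)(T, t) = sum_x D1(T, x) * D2(T + x, t - x)."""
    H = D1.shape[0]
    out = np.zeros((H, H))
    for T in range(H):
        for t in range(H):
            for x in range(t + 1):
                if T + x < H:
                    out[T, t] += D1[T, x] * D2[T + x, t - x]
    return out

# A three-hop delivery path s -> a -> b -> d composes by associativity:
# D_sd = forward(forward(D_sa, D_ab), D_bd)
```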
We say that two delivery paths with a common source s and destination d are disjoint if the intersection of the sets of nodes they involve is {s, d}.

4.2 The Duplication Operator
Let D_{s⇉d} be the delivery distribution associated with the delivery of a bundle from s to d if it is duplicated so as to follow the two disjoint delivery paths described by the distributions D′_{s−d} and D″_{s−d}. We have:

D_{s⇉d} ≡ D′_{s−d} ⊕ D″_{s−d}   (6)
The ⊕ (or duplication) operator is a function ⊕ : D² → D, defined as follows:

(D1 ⊕ D2)(T, t) = (1 − ∫₀^t D1(T, x) dx) D2(T, t) + (1 − ∫₀^t D2(T, x) dx) D1(T, t)   (7)
The expected delay computed is that of the first bundle copy to reach the destination d. It is easy to see that ⊕ is associative and commutative. Operators ⊗ and ⊕ can be combined to consider more complex forwarding strategies, assigning a higher precedence to ⊗. Equation (7) is the sum of two terms; each term is the probability that the bundle reaches the destination after a delay t using one path while the bundle following the other path has not yet arrived. It can be proven that we have both D1 ⊕ D2 ⪰ D1 and D1 ⊕ D2 ⪰ D2. This means that, contrary to what happens in deterministic networks, duplicating a bundle to send it along two paths can improve performance: it is not always the case that the best path delivers the bundle first. Figure 3 shows an example of the distributions obtained using the duplication operator. As expected, duplicating bundles shortens the delays and increases the delivery probability.
Fig. 3. Duplication (⊕) operator: example. We denote by D1 the delivery distribution depicted in Fig. 2. Let D2 be a distribution that corresponds to nodes that are certain to meet on days 9, 16 and 25. This plot depicts D1 ⊕ D2.
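A matching sketch of the ⊕ operator (again our discretization): rather than evaluating eq. (7) term by term, it computes the exact discrete law of the first arrival from the survival functions, which coincides with eq. (7) in the continuous limit, and then checks the claim D1 ⊕ D2 ⪰ D1 empirically on random sub-probability distributions.

```python
import numpy as np

def duplicate(D1, D2):
    """Discrete analogue of the duplication operator, eq. (7):
    the delay of the first of two independent copies to arrive."""
    H = D1.shape[0]
    out = np.zeros((H, H))
    for T in range(H):
        S1 = 1.0 - np.cumsum(D1[T])   # P(copy 1 not yet arrived by delay t)
        S2 = 1.0 - np.cumsum(D2[T])
        S = S1 * S2                   # P(neither copy arrived by delay t)
        out[T] = -np.diff(np.concatenate(([1.0], S)))  # P(first arrival = t)
    return out

def succeq(d1, d2):
    """d1 >= d2 in the sense of Definition 3: partial CDFs dominate pointwise."""
    return bool(np.all(np.cumsum(d1) >= np.cumsum(d2) - 1e-12))

# Duplication can only improve delivery, for every send time T.
rng = np.random.default_rng(0)
H = 16
D1 = rng.random((H, H)); D1 /= D1.sum(axis=1, keepdims=True) * 1.25
D2 = rng.random((H, H)); D2 /= D2.sum(axis=1, keepdims=True) * 1.25
D12 = duplicate(D1, D2)
assert all(succeq(D12[T], D1[T]) for T in range(H))
```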
4.3 The Scheduling Operator

Let D_{s⇄d} be the delivery distribution that, every time a bundle has to be sent, chooses the best delivery strategy out of D′_{s−d} and D″_{s−d}. We have:

D_{s⇄d} ≡ D′_{s−d} ⊘ D″_{s−d}   (8)
The definition of ⊘ is straightforward. It is a function ⊘ : D² → D such that:

(D1 ⊘ D2)(T, t) = D2(T, t) if D2(T, ·) ⪰ D1(T, ·), and D1(T, t) otherwise   (9)

If s sends a bundle at time T, it is delivered using D2(T, ·) if and only if D2(T, ·) ⪰ D1(T, ·). This operator is not commutative since ⪰ is not a total order: when D1(T, ·) and D2(T, ·) cannot be compared, D1(T, ·) is chosen. This new operator can be combined with the other two (⊗ and ⊕), assigning it a lower precedence. The following example involves all the operators defined above: the bundle is duplicated at s along the branches s → c → d and s → b, and node b schedules between the relay paths b → f → d and b → e → d (the latter being the second argument of ⊘, which emphasizes the operator's non-commutativity):

D_{s−d} ≡ D_{sc} ⊗ D_{cd} ⊕ D_{sb} ⊗ (D_{bf} ⊗ D_{fd} ⊘ D_{be} ⊗ D_{ed})

4.4 Delivery Schemes
We have defined a delivery path as a delivery strategy that only involves forwarding. A delivery scheme with source s and destination d is a general delivery strategy that allows a bundle to be delivered from s to d. It can use an arbitrary number of forwarding, duplication and scheduling operations.
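The scheduling operator fits the same discrete toolkit. The sketch below (ours) implements eq. (9) per send time, reusing the succeq helper from the previous sketch, and shows how the three operators compose into the delivery scheme of the example above; the per-hop distribution names are hypothetical.

```python
def schedule(D1, D2):
    """Discrete scheduling operator, eq. (9): for each send time T, D2's
    strategy is used only when it is provably better; when the two are
    incomparable, D1 is kept, hence the non-commutativity."""
    out = D1.copy()
    for T in range(D1.shape[0]):
        if succeq(D2[T], D1[T]):
            out[T] = D2[T]
    return out

# The delivery scheme from the example above, assuming per-hop delivery
# distributions stored in a dict D keyed by hop name (hypothetical names):
# D_sd = duplicate(forward(D['sc'], D['cd']),
#                  forward(D['sb'], schedule(forward(D['bf'], D['fd']),
#                                            forward(D['be'], D['ed']))))
```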
5 Delivery Guarantees
Knowing the delay distribution d ∈ C associated with the delivery of a bundle allows us to verify a large range of conditions on permissible delays or on delivery probabilities. For example, the condition

∫₀^∞ d(t) t dt ≤ d_max

imposes a maximum expected delay d_max, while

∫₀^{1h} d(t) dt ≥ .9   and   ∫₀^{24h} d(t) dt ≥ .99

matches distributions delivering a bundle in less than one hour nine times out of ten, and in less than a day with a probability of 99%. We naturally impose that a condition fulfilled for a certain delivery scheme must be fulfilled for better schemes.
Definition 4. A delivery condition C is a predicate C : C → {true, false} such that ∀d1, d2 ∈ C with d1 ⪰ d2, we have C(d2) ⟹ C(d1). A condition C can be extended to a delivery distribution D ∈ D: C(D) ⟺ ∀T ≥ 0 : C(D(T, ·)).
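For instance, a quantile-style delivery condition and its extension to delivery distributions can be written directly from Definition 4 (a sketch under our discretization; the helper names are ours):

```python
import numpy as np

def quantile_condition(t_max, p):
    """Delivery condition C(d): P(delay <= t_max) >= p. It is monotone as
    Definition 4 requires: if d1 >= d2 (partial CDFs dominate pointwise)
    and d2 satisfies C, then so does d1."""
    return lambda d: np.asarray(d)[: t_max + 1].sum() >= p

def holds_for(C, D):
    """Extension to a delivery distribution: C(D) iff C(D(T, .)) for all T."""
    return all(C(D[T]) for T in range(D.shape[0]))

# Example: deliver within 24 time steps with probability at least .99,
# whatever the send time.
C = quantile_condition(24, 0.99)
```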
6 Delivering Bundles with Guarantees

6.1 Probabilistic Bellman-Ford
Algorithm 1 adapts the Bellman-Ford algorithm to predictable disruption tolerant networks. In this section, we do not allow bundle duplication. Notice that, in general, the concept of "shortest path" is meaningless since the ⪰ relation is a partial order.
Algorithm 1. Probabilistic Bellman-Ford
Data: d is the destination node
1:  ∀x ∈ N \ {d} : B_x ← D_⊥;
2:  B_d ← D_⊤;
3:  repeat
4:     stabilized ← true;
5:     forall x ∈ N do
6:        forall y ∈ N do
7:           D_{xy−d} ← D_{xy} ⊗ B_y;
8:           if B_x ≠ B_x ⊘ D_{xy−d} then
9:              stabilized ← false;
10:             B_x ← B_x ⊘ D_{xy−d};
11: until stabilized;
Similarly to the Bellman-Ford algorithm, Algorithm 1 computes, for every node n ∈ N, the best distribution found so far leading to the destination (B_n). This distribution is propagated to its neighbours (i.e. all the other nodes, since the network is infrastructureless). Once node x receives the best delivery distribution B_y found by y, it computes the delivery distribution obtained if it were to send the bundle directly to y, and y were to forward it according to B_y. The resulting distribution is denoted D_{xy−d} (line 7). D_{xy−d} is compared to the best known distribution to the destination (B_x) by means of the ⊘ operator. If D_{xy−d} is better than B_x on some time intervals, B_x is updated (line 10). The algorithm terminates once no B_x distribution is updated any more.
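The following Python sketch shows the structure of Algorithm 1 on discretized distributions (our reconstruction, not the authors' implementation; it reuses the forward and succeq helpers sketched earlier, and replaces the ⊘ update with an equivalent per-send-time comparison):

```python
import numpy as np

def probabilistic_bellman_ford(nodes, dest, hop, H):
    """Sketch of Algorithm 1 on an H-step discrete grid.
    hop[(x, y)] is the (H, H) delivery distribution of the direct hop x -> y."""
    BOTTOM = np.zeros((H, H))                 # D_bot: never delivered
    TOP = np.zeros((H, H)); TOP[:, 0] = 1.0   # D_top: immediate delivery
    B = {x: BOTTOM.copy() for x in nodes}
    B[dest] = TOP.copy()
    stabilized = False
    while not stabilized:
        stabilized = True
        for x in nodes:
            if x == dest:
                continue
            for y in nodes:
                if y == x or (x, y) not in hop:
                    continue
                Dxyd = forward(hop[(x, y)], B[y])  # send to y, then follow B[y]
                for T in range(H):  # scheduling update, one send time at a time
                    if succeq(Dxyd[T], B[x][T]) and not np.allclose(Dxyd[T], B[x][T]):
                        B[x][T] = Dxyd[T]
                        stabilized = False
    return B
```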
As mentioned before, this algorithm generalizes both [15] (i.e. it converges to the "shortest expected path") and [13] (i.e. it finds the exact shortest path in the case of perfectly predictable networks; to be fair, that work also deals with message transmission delays, which are not considered here). The delivery scheme computed by this algorithm depends on the order in which the elements of N are picked (lines 5 and 6). In practice, it might be preferable to rely on a heuristic to choose the preferred elements first.

6.2 Guarantees
Our aim is now to find a way to deliver bundles that fulfills a given condition C as specified in Definition 4, while trying to minimize the network's bandwidth/energy/memory consumption. Ideally, the DTN is predictable enough to enforce condition C without duplicating any bundle. We thus propose to rely on Algorithm 1 to find a first delivery scheme (and, thus, a first delivery distribution D1). If C is not fulfilled by D1, we search for another fast bundle forwarding scheme using Algorithm 1; let D2 be its delivery distribution. We then duplicate the bundle on both delivery schemes, yielding a distribution D1 ⊕ D2. We have already pointed out that D1 ⊕ D2 ⪰ D1, thus C(D1 ⊕ D2) is more likely to be true than C(D1). This process is iterated until C is finally fulfilled; see Algorithm 2. As mentioned in Section 4.2, the distribution computed by the duplication (⊕) operator is biased if its operands are not independent distributions. To avoid this bias, we ensure that D1 and D2 are independent by forbidding D2 to rely on the nodes involved in D1 (source and destination nodes excluded, line 5). More details can be found in [14].
Algorithm 2. Constrained probabilistic delivery
Data: Delivery condition C
Data: Bundle source s and destination d
1: B ← D_⊥
2: repeat
3:    Using nodes in N, compute D ∈ D via Algorithm 1
4:    B ← B ⊕ D
5:    N ← (N \ {nodes involved in D}) ∪ {s, d}
6: until C(B) or N = {s, d}
Nothing guarantees, of course, that there exists a way to deliver bundles that satisfies C: even an epidemic broadcasting might not suffice.
7 Conclusion and Future Works
We propose to model contacts between a disruption tolerant network's mobile nodes as a random process, characterized by contact distributions. Such a description is more general than those usually encountered in the literature. We have set up a framework that shows how such contact distributions can be combined to compute the bundle delivery delay distribution corresponding to a given delivery strategy (i.e. a description of the nodes' forwarding decisions). This framework is formally defined and quite generic; it can be used to evaluate quantitatively the performance of new routing protocols. It could be expanded with new operators describing other (more subtle) forwarding schemes. A significant improvement would be to modify the framework so as to deal with bundle transmission delays. As future work, real network traces can be analysed so as to quantify their predictability; the delivery strategies elaborated using this framework could then be compared with the heuristics proposed in the literature. To demonstrate the applicability of the framework, we have used it to build a new routing algorithm. It uses a modified Bellman-Ford algorithm adapted to DTNs and asks the source to duplicate bundles. It tries to compute a routing strategy that fulfills a given delivery condition without consuming too many resources.
References
1. Zhang, Z.: Routing in intermittently connected mobile ad hoc networks and delay tolerant networks. IEEE Communications Surveys and Tutorials 8(1) (2006) 24-37
2. Burleigh, S., Hooke, A., et al.: Delay-tolerant networking: an approach to interplanetary internet. IEEE Communications Magazine 41(6) (2003) 128-136
3. Vahdat, A., Becker, D.: Epidemic routing for partially connected ad hoc networks. Technical Report TR CS-200006, Duke University (April 2000)
4. Juang, P., Oki, H., et al.: Energy-efficient computing for wildlife tracking: Design tradeoffs and early experiences with ZebraNet. In: ASPLOS, San Jose, CA (October 2002)
5. Spyropoulos, T., Psounis, K., Raghavendra, C.: Single-copy routing in intermittently connected mobile networks. In: Proceedings of IEEE SECON (October 2004)
6. Spyropoulos, T., Psounis, K., et al.: Spray and wait: An efficient routing scheme for intermittently connected mobile networks. In: Proc. of SIGCOMM'05 (2005)
7. Wang, Y., Jain, S., Martonosi, M., Fall, K.: Erasure-coding based routing for opportunistic networks. In: WDTN '05: Proceedings of the 2005 ACM SIGCOMM workshop on Delay-tolerant networking, New York, NY, USA, ACM Press (2005) 229-236
8. Lindgren, A., Doria, A., Schelén, O.: Probabilistic routing in intermittently connected networks. SIGMOBILE Mob. Comput. Commun. Rev. 7(3) (2003) 19-20
9. Jones, E., Li, L., Ward, P.: Practical routing in delay-tolerant networks. In: Proc. of WDTN'05, New York, NY, USA, ACM Press (2005) 237-243
10. Leguay, J., Friedman, T., Conan, V.: DTN routing in a mobility pattern space. In: Proc. of WDTN'05, New York, NY, USA, ACM Press (2005) 276-283
11. Shen, C., Borkar, G., Rajagopalan, S., Jaikaeo, C.: Interrogation-based relay routing for ad hoc satellite networks. In: IEEE Globecom, Taipei, Taiwan (November 17-21, 2002)
12. Musolesi, M., Hailes, S., Mascolo, C.: Adaptive routing for intermittently connected mobile ad hoc networks. In: Proc. of WoWMoM'05 (2005) 183-189
13. Merugu, S., Ammar, M., Zegura, E.: Routing in space and time in networks with predictable mobility. Technical Report GIT-CC-04-7, Georgia Institute of Technology (2004)
14. François, J.M., Leduc, G.: Predictable disruption tolerant networks and delivery guarantees. Technical Report arXiv:cs.NI/0612034 (2006)
15. Tan, K., Zhang, Q., Zhu, W.: Shortest path routing in partially connected ad hoc networks. In: Proc. of IEEE GLOBECOM'03, Volume 2 (December 2003) 1038-1042
PWave: A Multi-source Multi-sink Anycast Routing Framework for Wireless Sensor Networks

Haiyang Liu1, Zhi-Li Zhang1,*, Jaideep Srivastava1, and Victor Firoiu2

1 Dept. of Comp. Sci. and Eng., Univ. of Minnesota, Minneapolis, MN 55414
2 Advanced Information Technologies, BAE Systems, Burlington, MA 01803
Abstract. We propose a novel routing framework called PWave that supports multi-source multi-sink anycast routing for wireless sensor networks. A distributed and scalable potential field estimation algorithm and a probabilistic forwarding scheme are proposed to ensure low overhead and high resilience to network dynamics. Key properties of this framework are proved through theoretical analysis and verified through simulations. Using the network lifetime maximization problem as one example, we illustrate the power of this framework by showing a 2.7 to 8 times lifetime extension over Directed Diffusion and up to 5 times lifetime extension over energy-aware multipath routing.
1 Introduction

Wireless sensor networks (WSNs) are generally deployed to support specific missions or applications such as habitat monitoring and object tracking. Traffic is generated from a number of sensing sources and collected by (any one of) a few sinks [1]. Hence data communications in WSNs exhibit a multi-source, multi-sink anycast pattern, which is fundamentally different from that in general-purpose communication networks (whether wired or wireless), where any two nodes may serve as the two ends of an end-to-end communication. In addition, WSNs often operate in challenging environments and are subject to frequent disruptions and node failures. These unique settings and constraints call for a robust routing framework for WSNs that can quickly adapt to changes in traffic pattern, network conditions and environments. Existing routing schemes (e.g., [2]) used in WSNs are either variations, or even direct adoptions, of routing algorithms for general-purpose wired networks or mobile wireless ad hoc networks (MANETs), which are typically designed using the single shortest-path unicast routing paradigm (while offering some desirable local properties, the more recent geographic or trajectory-based routing paradigm is still unicast based, using "shortest distance" paths in a Euclidean or metric space; in addition, it requires some type of location information, which may not be easy to obtain). While several multi-path routing schemes (see, e.g., [3,4,5]) have been proposed, they tend to be extensions of the single shortest-path routing paradigm with the use of additional paths; the choice of these alternative paths is often decided based on somewhat ad hoc mechanisms.
* This work was supported in part by the NSF grant CNS-0435444 and a DoD Army High Performance Computing Research Center grant.
More importantly, these techniques only support a single-sink configuration (i.e., unicast routing), with no direct and easy extension to support multi-source, multi-sink anycast routing.

In this paper we propose a novel framework, referred to as PWave, to support the multi-source, multi-sink anycast routing that is inherent in WSNs. Inspired by the analogy between WSNs and electric networks, PWave constructs a potential field by assigning a "potential" (analogous to "voltage" in an electric network) to each node: a source or an intermediate node routes traffic (proportionally) to neighboring nodes with lower potentials towards the sinks, which have the lowest (zero) potentials. The PWave framework is designed with strong theoretical underpinnings. First of all, the constructed potential field realizes scalable, robust, proportional multi-source, multi-sink traffic allocation that optimizes a customizable quadratic function (based on an appropriate definition of "link costs"). Furthermore, it guarantees that there are no local minima (i.e. packets are never stuck in a local dead end) and thus ensures loop-free routing. It adapts to local changes rapidly, while dampening their global impact. These features enable compact and efficient protocol design with low execution overhead. We develop a fully distributed algorithm for constructing the potential field and implement PWave using probabilistic forwarding to achieve the properties described above. PWave scales with the density of the network because only one-hop neighborhood information exchange is needed. In addition, the algorithm is resilient to network dynamics in that local perturbations only have local effects. These features make PWave a suitable routing framework for WSNs. In a nutshell, the research contribution of this paper lies in the proposal of a novel routing framework that supports global optimization of custom objectives in a multi-source multi-sink anycast routing setting via fully localized computations. To the best of our knowledge, this is the first systematic routing framework for WSNs with this capability.

The remainder of this paper is organized as follows. We detail the PWave framework design in Section 2. The potential field estimation algorithm is described in Section 3, followed by the experimental evaluations in Section 4. We describe related work in Section 5 and conclude in Section 6.
2 The PWave Routing Framework

2.1 System Model and Problem Formulation

In order to formally define the PWave routing framework, we first introduce the system model, notations and assumptions. We assume that a WSN can be represented as a weighted (undirected) graph G = (N, E), where N is the set of nodes and E the edges (i.e., links) between nodes, which are assumed to be symmetric. Asymmetric links are blacklisted, as suggested in the literature. We use R_{x,y} to denote the weight of an edge e connecting nodes x and y, which is strictly positive, i.e., R_{x,y} > 0. This weight represents some measure of the unit cost for transmitting one bit of information between x and y, defined in a manner depending on the application and routing design objectives. (We will provide some examples of R_{x,y} later.) Hence if an amount I_{x,y} of data is transmitted from x to y, the total cost is I_{x,y} R_{x,y}. For simplicity, we assume that data traffic only flows along one direction of an edge; the cost for acknowledgements is implicitly accounted for in R_{x,y}. Under this assumption, if I_{x,y} denotes the data rate
flowing from x to y, we define I_{y,x} = −I_{x,y}; the same relation also holds if I_{y,x} is the data rate flowing from y to x. Let S ⊂ N denote the set of source nodes, and D ⊂ N the set of sink nodes. For each s ∈ S, I_s denotes the data rate generated by source node s. More generally, to account for potential in-network processing at intermediate nodes in a WSN that may increase or decrease the data rate flowing through them, for each x ∈ (N − D) we use I_x to denote the (internal) data generation/consumption rate at node x. Here I_x > 0 means that data is generated at node x, while I_x < 0 means that data is consumed at node x. For each node x ∈ N, we use Z(x) to denote the set of its neighboring nodes. The flow conservation law then requires that, for any node x that is not a sink, the total data rate flowing into node x equals the total data rate flowing out of node x plus or minus the rate generated or consumed at node x itself. Namely,

∑_{y∈Z(x)} I_{x,y} = I_x,   x ∈ (N − D).   (1)
Given the graph G representing a WSN and I := {I_x | x ∈ N}, we refer to the tuple (G, I) as a network configuration NC. Given an NC, routing for a WSN can be cast as a global multi-source multi-sink anycast flow allocation optimization problem: determine the flows {I_{x,y}} along the links, under the flow conservation constraints (1) and boundary conditions (2), such that a certain global objective function F(G, I, {I_{x,y} | (x, y) ∈ E}) is optimized. Single-path (or minimum-cost) routing, which computes a minimum-cost path from each source to one of the sinks, is such a flow allocation scheme; it allocates flows based only on the cost of the paths, not on the flow rates.

2.2 PWave Routing Framework

Intuitions and Principles. From physics, it is well known that if the energy level of a physical system is minimized, the system is in its most stable state, i.e. the system tends to return to this state after disturbances. A routing framework designed this way will thus be robust. With this intuition, we design our PWave routing framework to solve this optimization problem by minimizing a natural quadratic objective function (4), which is equivalent to the total energy of a corresponding electric network system (see Fig. 1). PWave solves the flow allocation optimization by assigning a potential field to the nodes in a WSN, namely, a function V : N → R⁺, where R⁺ denotes the set of nonnegative real numbers. The potential function V satisfies the following boundary conditions at the sink nodes

V_d = 0,   d ∈ D   (2)

and the flow distribution conditions at non-sink nodes

I_{x,y} = (V_x − V_y) / R_{x,y}   (3)

as well as the flow conservation constraints (1) at the non-sink nodes.
Fig. 1. Analogy between WSNs and Resistive Electric Networks: (a) Communication Network; (b) Equivalent Electric Network
Equation (3) specifies a localized rule for how data is routed at each node x based on local information, namely its own and its neighbors' potentials: data only flows from node x towards neighbors y with lower potentials (V_y < V_x), and the amount of data routed along the edge (x, y) is inversely proportional to R_{x,y}, or equivalently, proportional to g_{x,y} := 1/R_{x,y}, which is referred to as the conductance of edge (x, y). The boundary conditions (2) ensure that the sink nodes have the lowest possible potential (namely, zero potential) so that data always flows towards the sinks and is eventually "absorbed" there. As will be shown shortly, the potential field defined above guarantees the existence of a unique flow allocation {I_{x,y} : (x, y) ∈ E} that minimizes the following global objective function:

E = (1/2) ∑_{x,y} I²_{x,y} R_{x,y} = (1/2) ∑_{x,y} g_{x,y} (V_x − V_y)².   (4)
Moreover, we will see that the potential-based PWave routing framework also allows for a probabilistic routing/forwarding implementation at the packet level. More specifically, the flow allocation of equation (3) can be achieved in practice by forwarding a packet from node x to one of its lower-potential neighbors y with probability given by

p_{x→y} = I_{x,y} / ∑_{i ∈ Z(x), V(i)<V(x)} I_{x,i}   (5)
where Ix,i is computed from local potential values using eq.(3). In summary, the potential-based PWave routing framework enables us to achieve goals at three different levels: i) at the network-wide macroscopic level, it minimizes a natural global objective function (4); ii) at the intermediate flow level, it provides a localized rule to determine how data flows are routed; and iii) at microscopic packet level, it allows for a simple probabilistic packet forwarding mechanism to achieve both flow-level routing and network-wide design objectives.
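To make level iii) concrete, here is a short sketch (ours, not the authors' implementation) of the per-packet forwarding rule: given the local potentials, a node computes the downhill flows of eq. (3) and samples a next hop according to eq. (5). The data-structure choices (dictionaries keyed by node id and edge) are assumptions for illustration.

```python
import random

def next_hop(x, potential, neighbors, conductance):
    """PWave forwarding sketch (eqs. (3) and (5)): choose a lower-potential
    neighbor y with probability proportional to I_xy = g_xy * (V_x - V_y)."""
    downhill = [y for y in neighbors[x] if potential[y] < potential[x]]
    weights = [conductance[(x, y)] * (potential[x] - potential[y])
               for y in downhill]
    total = sum(weights)
    if total == 0:   # no lower-potential neighbor: x is a sink (no local minima)
        return None
    r = random.uniform(0, total)
    for y, w in zip(downhill, weights):
        r -= w
        if r <= 0:
            return y
    return downhill[-1]
```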
Example Applications. Before we leave this section, we remark that by choosing different interpretations (and thus different values) for the edge costs R_{x,y} (or equivalently, the edge conductances g_{x,y}), we can use eq. (4) to optimize different design objectives in routing for WSNs. For example, if we set R_{x,y} = 1, then data flows are approximately proportionally distributed and routed based on path hop counts. If we set R_{x,y} equal to the data loss rate on an edge, then minimizing eq. (4) yields a flow allocation/routing strategy that attempts to approximately equalize the data losses among different paths. As another example, if we set R_{x,y} as a combination of the per-unit power consumed by transmitting one bit of data, denoted CE_i, and the current energy level, denoted EN_i, as follows:

R_{i,j} = 1/g_{i,j} = (CE_i/EN_i + CE_j/EN_j)   (6)
the PWave routing framework yields a solution that approximately equalizes the power consumption among various paths to maximize the network lifetime. See [6] for detailed derivations.
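As a worked instance of eq. (6), the conductance of a link can be recomputed locally from the two endpoints' per-bit transmission costs and remaining energies (a sketch; CE and EN as arrays indexed by node id is an assumed representation):

```python
def energy_aware_conductance(CE, EN, i, j):
    """Eq. (6): the cost of edge (i, j) grows as either endpoint becomes
    expensive to use (high CE, per-bit transmission power) or energy-poor
    (low EN, remaining battery); traffic then shifts away accordingly."""
    R_ij = CE[i] / EN[i] + CE[j] / EN[j]
    return 1.0 / R_ij            # conductance g_ij = 1 / R_ij
```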
3 Potential Field Estimation

3.1 Principles and Algorithm

The potential field estimation problem is governed by eqs. (1) and (3) under the boundary conditions specified in eq. (2). The existence and uniqueness of the solution to this problem are well known from electric network theory [7]. The traditional way of solving this problem is to rewrite it in matrix form and compute the inverse of the matrix. This is obviously infeasible in a WSN environment, as centralized data collection and processing are needed. Inspired by the random walk interpretation of electric networks from [7,8], we propose an iterative and localized algorithm based on random walk games to progressively estimate the potential field. Consider experiments of random walks in graph G, illustrated in Fig. 2, where every node is marked with a fixed face value m. Starting from an arbitrary node x, a walker,
d1
33
S1 S1
x,3 ppx,3
IS1 IS1
Residential person
ppx,S2 x,S2 S2 S2
x,1 ppx,1
IS2 IS2
s1 s1
xx
22
ppx,2 x,2 Terminal
11
d2 Residential person
Fig. 2. Illustration of the Random Walk Game
initially with 0 money in hand, goes to one of its adjacent nodes with probability p_{x,i} for i ∈ Z(x). The walker keeps going in the same fashion inside the graph until it reaches one of the sink nodes, i.e. d1 or d2. Every time the walker passes a node i, it collects a fixed amount of money equal to m_i. After a large number of experiments, the central limit theorem guarantees that the expectation of the total money collected by the walker, f(x), converges [8] to:

f(x) = ∑_{i=1}^{deg(x)} p_{x,i} f(i) + m_x   (7)
Obviously f(i) = 0 for i ∈ D, since a walker starting from a sink node immediately stops. After rewriting eqs. (1) and (3) to yield the expression of the voltages, we observe that eq. (7) is equivalent to eqs. (1) and (3) when p_{x,i} and m_x are designed as follows:

p_{x,i} = g_{i,x} / ∑_{k=1}^{deg(x)} g_{k,x}   (8)

m_x = I_x / ∑_{k=1}^{deg(x)} g_{k,x}   (9)
This mathematical equivalence warrants that the expectation of the total money collected by a walker starting from node x converges to the potential value at node x. While this analogy provides a distributed way for individual nodes to estimate their potential values, the method is still impractical in a WSN environment because of the need for extra routing infrastructure support (due to the route-back of the final collected amount) and the huge communication overhead (due to the large number of experiments required and the long duration of each experiment). Motivated by the relaxation method [9], we address these issues by restricting the random walk game to the one-hop neighborhood of the starting point and applying equation (7) iteratively on all nodes until the whole network reaches an equilibrium state. With this method, only local broadcasting among adjacent nodes is needed. The overall communication overhead can be reduced by adjusting the accuracy requirement. In addition, the fact that the potential field is a smooth harmonic function ensures a localized effect of perturbations, which we prove in Section 3.2 and which further reduces the potential field maintenance overhead. The pseudocode of this algorithm is presented in Algorithm 1. The pseudocode describes how the entire network reaches global equilibrium; each node only needs to periodically execute the steps from line 12 to line 19. Although PotentialFieldConstruct specifies a uniform absolute tolerance threshold for all nodes, in practice non-uniform relative thresholds may be used to better reflect estimation accuracies at different potential levels. Our experimental results in Section 4 show that significantly lower overhead can be achieved by combining a coarse-grained estimation of the entire potential field with on-demand localized potential field refinements.
3.2 Properties

We now summarize the key properties of this algorithm that are of great value in ensuring efficient routing protocol design and execution in WSNs. All formal proofs of these properties are described in [6] and omitted in this paper due to space constraints.
Algorithm 1. PotentialFieldConstruct(N, I, Sinks, tolerance)
Require: Set of all nodes N, array of packet generation rates I, set of sink nodes Sinks, error tolerance tolerance
Ensure: The equilibrium potential field
1:  Variables: P nodePotentials, x nodeId
2:  for each x ∈ N
3:      if x ∈ Sinks then
4:          P[x] ← 0
5:      else
6:          P[x] ← arbitrary non-negative number
7:      endif
8:  endfor
9:  equilibrium ← false
10: while (equilibrium ≠ true)
11:     equilibrium ← true
12:     for each x ∈ N
13:         if x ∉ Sinks then newP ← apply eq. (7)
14:         else newP ← 0
15:         endif
16:         if |P[x] − newP| ≥ tolerance then
17:             equilibrium ← false
18:         endif
19:         P[x] ← newP
20:     endfor
21: endwhile
22: return P
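In Python, the same relaxation can be sketched as follows (our rendering of Algorithm 1, with eq. (7) expanded using eqs. (8) and (9); the in-place sweep makes it Gauss-Seidel-style, a minor variation):

```python
def potential_field_construct(nodes, neighbors, g, I, sinks, tolerance=1e-6):
    """Relaxation sketch of Algorithm 1. Each sweep applies eq. (7) at every
    non-sink node: P[x] <- (sum_y g[x,y] * P[y] + I[x]) / sum_y g[x,y]."""
    P = {x: 0.0 for x in nodes}          # arbitrary non-negative initial guess
    for s in sinks:
        P[s] = 0.0                       # boundary condition (2)
    while True:
        worst = 0.0
        for x in nodes:
            if x in sinks:
                continue
            gsum = sum(g[(x, y)] for y in neighbors[x])
            newP = (sum(g[(x, y)] * P[y] for y in neighbors[x]) + I[x]) / gsum
            worst = max(worst, abs(P[x] - newP))
            P[x] = newP
        if worst < tolerance:
            return P
```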
Properties:
1) Convergence: The PotentialFieldConstruct algorithm, as tolerance → 0, converges to the unique solution of the potential field estimation problem, from any non-negative initial guess of the potential values and in any iteration order.
2) Loop Freedom: Forwarding over the equilibrium potential field is loop free.
3) No Local Minima: The equilibrium potential field has no local minima, in the sense that node x has the minimum potential value among its neighbor nodes if and only if x is a sink node.
4) Proportional Traffic Allocation: Given two disjoint paths sharing only the starting and ending nodes, Path1 and Path2, with corresponding effective resistances Re1 and Re2: if Re1 < Re2, the flow rate allocated to Path1 will be higher than that allocated to Path2.
5) Rich Links Attract More Traffic: When more parallel links are added to a path, a higher flow rate will be allocated to that path.
6) Perturbation Spatial Decay: In a dense network, the absolute potential value changes caused by an event at node x are attenuated to zero over the number of hops from node x.

Property 1) guarantees the correctness of the algorithm. Properties 2) and 3) enable efficient protocol design in PWave, as there is no need for loop-prevention mechanisms or a last resort of flooding to get packets out of a local dead end. Property 4) comes directly from Ohm's Law and effectively guarantees that capable paths take more traffic. Property 5) comes from Rayleigh's Monotonicity Law [7] and warrants that any new
nodes or links always help to take more of the traffic share. Property 6) is the key property that ensures the scalability and resilience to network dynamics of PWave, since it guarantees that only local areas of the potential field need to be updated in reaction to local events.
4 Experimental Evaluations

4.1 Experimental Setup and Performance Metrics

We have implemented the PWave routing algorithm and protocol in TinyOS and evaluated its performance using the TOSSIM [10] simulation environment. Recognizing the asynchronous nature of the PWave protocol and the lack of precise MAC timing and interference models in TOSSIM, we developed our own plug-in modules to count the protocol iteration steps. We conducted our experiments with a network setting with a grid layout, denoted NS1, and a setting with a random layout, denoted NS2. Both layouts contain 400 nodes. In NS1, the nodes are placed on the grid locations of a 200 m × 200 m square. In NS2, the nodes are randomly distributed in a 400 m × 400 m square. The regular structure of NS1 enables better illustration of key concepts, while NS2 simulates a sensor layout closer to environmental monitoring applications.
35
40
Radio Range (m)
Fig. 3. Spatial Decaying of Perturbations for NS1 (a) and NS2 (b); PWave Sensitivity to Accuracy (c) and Radio Range (d)
PWave: A Multi-source Multi-sink Anycast Routing Framework for WSNs
187
We evaluated the performance of PWave in terms of convergence time, communication overhead, and locality of effects of network changes (spatial effect). The convergence time is measured by the number of iterations of the PWave algorithm until equilibrium is reached (all changes in potential are below the given threshold). The overhead is measured by the total number of messages broadcasting needed to reach equilibrium. The spatial effect of a single network change is measured by the maximum number of hops from the point of change where there is a node with potential value affected by the initial network change. 4.2 Evaluation of the PWave Protocol Here we investigate the dynamic performance of PWave protocol in reacting to network dynamics. The other aspect of PWave’s performance, the initialization of the potential field, has lesser impact on the overall performance of PWave since it is a one-time cost that can be minimized through pre-computed distribution guess and amortized through long communication session. We first evaluate the locality of impact following a network event and then the sensitivity of PWave performance to the estimation accuracy and network density. Locality of Perturbations (Spatial Effect). Fig. 3(a) shows the spatial decay of a 10% data rate increase at (50,50) inside network NS1 with source at (90,90), sink at (0,0) and the radio range of 13m. We observe that this change has an exponentially spatial decay, essential for the scalability and resilience of PWave. We also observe that the event perturbation decays faster with larger radius range, which corresponds to higher network density. Thus better robustness through locality can be achieved with denser deployment of senor nodes. Fig. 3(b) shows the spatial decay effect for a network event of 1/3 link cost increase around node x at (193,190) inside NS2 network with source at (389,379), sink at (32,1) and the radio range of 25m. Observe that PWave still shows exponential decay while the event in Smallest Cost Field (SCF) propagates with constant value throughout the network. This result shows the resilience of PWave to network event. Last, observe that the link const change event is general in that it can include link outages and new nodes joining the network, and thus the above observations hold for a wide range of network events. PWave Sensitivity. Using the same setting from which Fig. 3(b) was generated, Fig. 3(c) shows the PWave and SCF performance for a range of relative accuracy requirements. First observe that SCF is insensitive to the relative accuracy requirement since SCF has constant event propagation. On the other hand, PWave shows a clear decrease in convergence time, communication overhead, and reduced spatial effect for higher levels of error tolerance. In particular, for relative tolerance greater than 0.6%, PWave incurs lower overhead and faster convergency than SCF, primarily due to the tighter affected region. From our experiments, this accuracy is sufficient for traffic allocation purpose. Using the same setting with a fixed relative accuracy of 0.6%, Fig. 3(d) shows that denser networks (larger radio range) have decreased convergence times, require smaller refreshing overhead, and have a smaller number of nodes impacted by a network change, in both PWave and SCF. But PWave incurs less overhead
and converges much faster at higher density than SCF. We also observe that SCF's impacted area (spatial effect) is larger than PWave's.

4.3 Network Lifetime Maximization

In the following we illustrate the performance of PWave when applied in the context of network lifetime maximization (see Section 2.2) and compare it with two other state-of-the-art protocols, Directed Diffusion [2] and Energy-aware Routing [4]. Our experiments use the NS1 network setting with four source nodes located closely together at (144,135), (153,135), (153,144) and (144,144). Three sink nodes are located at (27,27), (144,27) and (27,144). All nodes in the network initially have the same battery energy level, except the sink and source nodes, which are set to a high energy level to guarantee that they do not run out of battery before the intermediate nodes (the purpose of this experiment is to evaluate the efficiency of the traffic balancing obtained from PWave and its competitors). To reduce simulation time, the battery energy level of each non-sink, non-source node is set to be able to handle (either receive or transmit) at most 10,000 packets. The lifetime of the network is defined as the duration from the start of the experiment until the first node runs out of energy.
Fig. 4. (a) Estimated Potential Field; (b) Network Lifetime vs Network Density
We illustrate in Fig. 4(a) the potential field constructed by PWave under a relative accuracy of 0.6%, which required 31 iteration rounds and 2314 messages of overhead, and observe its monotonicity and lack of local minima. Fig. 4(b) shows that PWave achieves a 2.7 to 8 times longer lifetime compared to the baseline shortest-path Directed Diffusion, and up to 5 times longer lifetime than the energy-aware routing scheme. PWave achieves the major lifetime extension through better traffic balancing and by taking advantage of the existence of multiple sinks, which none of the existing schemes exploit. Fig. 5 shows the traffic distribution normalized to the total traffic. Observe that less traffic is allocated to nodes that are along longer paths from sources to sinks. For example, the sink at (27,27) received less traffic than the other two sinks due to its longer distance from the sources. While all nodes around a sink received traffic allocation, the ones facing the source nodes received more due to their smaller distance to the sources.
Fig. 5. Traffic Spatial Distribution
This experiment thus verifies that PWave achieves balanced traffic allocations, with more traffic allocated to shorter paths.
5 Related Work

The well-studied field of network optimization is generally related to our work; the classic work in this area can be traced back to [11,9]. All of these works formulated the optimization problem over the traditional unicast/multicast communication pattern with a multi-commodity flow model. PWave addresses a single commodity flow between multiple sources and multiple sinks with an anycast communication pattern. The key mathematical difference between these two settings lies in the end-to-end flow rate constraints: the traditional model requires that the traffic (one commodity) generated by a source be equal to the traffic received by the corresponding destination, while PWave only requires that the sum of the traffic generated by all sources equal the sum of the traffic received by all destination nodes. Many multipath routing schemes for WSNs [3,4,5] are related to our work, but these schemes are based on heuristic alternative-path selection without a clear global optimization objective, and they do not address the anycast problem. The traffic-aware routing scheme proposed in [12] is also relevant: it builds a local potential field (using a taut elastic membrane analogy) around the shortest path as an accessory to route data around hot spots. This scheme builds on top of a link-state routing protocol and aims at Internet-type networks; it requires substantial message exchanges and intensive computations, and is thus not applicable to WSNs.
6 Conclusion and Future Work

In this paper, we presented a novel anycast routing framework that supports global optimization of custom objectives via a fully distributed, highly scalable and resilient protocol. Key properties of this framework are proved through theoretical analysis and verified through simulations. Using the network lifetime maximization problem as one example, we illustrated the power of this framework by showing a 2.7 to 8 times lifetime
extension over Directed Diffusion and up to 5 times lifetime extension over the energy-aware multipath routing scheme proposed in [4]. As our immediate future work, we plan to pursue systematic performance evaluation in real-world settings.
References

1. Intel: Intel Heterogeneous Sensor Networks Project (2004), http://www.intel.com/research/exploratory/heterogeneous.htm
2. Intanagonwiwat, C., et al.: Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks. In: Proc. of ACM MobiCom'00 (2000)
3. Ganesan, D., et al.: Highly-Resilient Energy-Efficient Multipath Routing in Wireless Sensor Networks. ACM Mobile Computing and Communication Review 5(4) (2001)
4. Shah, R.C., Rabaey, J.: Energy Aware Routing for Low Energy Ad Hoc Sensor Networks. In: Proc. of WCNC'02 (2002)
5. Ye, F., Zhong, G., Lu, S., Zhang, L.: GRAdient Broadcast: A Robust Data Delivery Protocol for Large Scale Sensor Networks. J. of Wireless Networks 11(3) (2005) 285-298
6. Liu, H., Zhang, Z., Srivastava, J., Firoiu, V.: PWave: Flexible Potential-based Routing Framework for Wireless Sensor Networks. Technical report (http://www-users.cs.umn.edu/~hliu/pwave.pdf)
7. Doyle, P.G., Snell, J.L.: Random Walks and Electric Networks. Mathematical Assn. of America (1984)
8. Qian, H., Nassif, S., Sapatnekar, S.: Power Grid Analysis Using Random Walks. IEEE Tran. on CAD of Integrated Circuits and Systems 24(8) (2005) 1204-1224
9. Stern, T.: A Class of Decentralized Routing Algorithms Using Relaxation. IEEE Trans. on Comm. 25(10) (1977)
10. Levis, P., Lee, N., Welsh, M., Culler, D.: TOSSIM: Accurate and Scalable Simulation of Entire TinyOS Applications. In: Proceedings of SenSys'03 (2003) 126-137
11. Gallager, R.: A Minimum Delay Routing Algorithm Using Distributed Computation. IEEE Trans. on Comm. 25(1) (1977)
12. Basu, A., Lin, A., Ramanathan, S.: Routing Using Potentials: A Dynamic Traffic-Aware Routing Algorithm. In: Proceedings of ACM SIGCOMM'03 (2003)
Simple Models for the Performance Evaluation of a Class of Two-Hop Relay Protocols

Ahmad Al Hanbali¹,³, Arzad A. Kherani², and Philippe Nain¹

¹ INRIA, Sophia-Antipolis, Cedex 06902, France
² Dept. of Computer Science and Engineering, IIT Delhi, New Delhi, India
³ Dept. of Computer Science, Université de Nice-Sophia Antipolis, France
Abstract. We evaluate the performance of a class of two-hop relay protocols for mobile ad hoc networks. The interest is in the multicopy two-hop relay (MTR) protocol, where the source may generate multiple copies of a packet and use relay nodes to deliver the packet (or a copy) to its destination, and in the two-hop relay protocol with erasure coding. Performance metrics of interest are the time to deliver a single packet to its destination, the number of copies of the packet at the delivery instant, and the total number of copies that the source generates. The packet copies at relay nodes have a limited lifetime (time-to-live, TTL). Via a Markovian analysis, the three performance metrics of the MTR protocol are obtained in closed form in the case where the number of copies in the network is limited. Also, we develop an approximate analysis in the case where the inter-meeting times between nodes are arbitrarily distributed and the TTLs of the copies are constant and all equal. In particular, we show that exponential inter-meeting times yield stochastically smaller delivery delays than hyper-exponential inter-meeting times, and that exponential TTLs yield stochastically larger delivery delays than constant TTLs. Finally, we characterize the delivery delay and the number of transmissions in the two-hop relay protocol with erasure coding and compare this scheme with the multicopy scheme.

Keywords: Mobile ad hoc network, Two-hop relay protocol, Erasure coding, Mobility model, Analytical model, Markovian analysis, Performance evaluation.
1 Introduction

In mobile ad hoc networks (MANETs), since there is no fixed infrastructure and nodes are mobile, routes between nodes are set up and torn down dynamically. For this reason, MANETs often experience route failures and network disconnectivity, especially when the nodes move frequently and the network is sparse. Grossglauser and Tse [9] propose to make the mobile nodes serve as relays in order to increase the network throughput in MANETs. Their relay mechanism, called the two-hop relay protocol, is simple: if there is no direct route between the source node and the destination node, the source node transmits its packets to the nearest neighbor node (called the relay node) for delivery to the destination. It was then shown that with this protocol it is possible
The authors acknowledge the support of the European IST project BIONETS and of the Network of Excellence (NoE) EuroNGI.
to schedule Θ(N) concurrent successful transmissions per time-slot, where N is the number of nodes [9]. It was later observed that the delay experienced by packets under this protocol is large [7,13,17]. In order to reduce this delay, it is proposed in [5,13] to allow the source to transmit the packet to all its neighboring nodes (not only to the nearest neighbor). In this paper we evaluate the performance of two variants of the basic two-hop relay protocol, the multicopy two-hop relay (MTR) protocol and the two-hop relay protocol with erasure coding, when N is finite. Before describing these two variants, which aim at improving the delay, let us first introduce the node mobility model.

All nodes move independently of each other according to the same random mobility model inside a two-dimensional bounded region. Two nodes may only communicate at certain points in time, called meeting times. The time that elapses between two consecutive meeting times of a given pair of nodes is called the inter-meeting time. The following will be assumed throughout: (A1) Transmission times are instantaneous. (A2) All inter-meeting times are independent and identically distributed (iid) random variables (rvs) with a common cumulative distribution function (CDF) G(·).

Assumption (A1) is justified in delay tolerant networks, where the delay incurred to send a packet may be very large with respect to the transmission times [1]. For Assumption (A2), it has been shown in [15] that when nodes move independently on a sphere and have a uniform (stationary) spatial distribution, the inter-meeting time distribution is approximately exponential with mean 1/λ for λt ≪ 1. Furthermore, in [8] the exponential approximation of the inter-meeting times distribution was validated for random mobility models inside a square with non-uniform spatial distribution, such as the random waypoint model [3], and with uniform spatial distribution, such as the random direction model [12]. Moreover, it was observed that assumption (A2) is "reasonable" as long as the node transmission range is not "too large" with respect to the area where nodes move. On the other hand, if nodes are humans moving in a conference space, it has been found that the inter-meeting times distribution shows a heavier-than-exponential tail [4]. In this paper, we will show the impact of considering arbitrary and exponential inter-meeting times distributions on the performance of the MTR protocol. See Section 3 for more details.

We consider the scenario consisting of N + 1 mobile nodes: one source node, one destination node, and N − 1 relay nodes. The source has a single packet to transmit to the destination. We now introduce the two variants.

MTR protocol. In the MTR protocol the source node may either transmit the packet directly to the destination node when both nodes come within transmission range of one another, or use the relay nodes. In the latter case, if the source meets a relay node before meeting the destination, then it sends a copy of the packet to this relay node; this relay node will only transmit the packet to the destination when it comes close to it (as opposed to the epidemic routing protocol [20], also called the unrestricted relay protocol [8], where a relay node is allowed to send a copy (of its copy) to another relay node). Section 2 evaluates the performance of the MTR protocol in the case where the number of copies in the network is limited.
We define Td, the (packet) delivery delay, as the first time when the destination receives the original packet or a copy, whichever arrives first at the destination. We assume
that the packet at the source cannot be dropped before the transmission has taken place, i.e., before time Td if the packet joins the source at time t = 0. On the other hand, each copy has a time-to-live (TTL) associated with it: when a TTL expires, the relay node that holds the copy drops it. This relay node then becomes eligible to receive another copy. We assume that the source cannot transmit a copy to a relay node that already holds a copy.

Two-hop relay protocol with erasure coding. The relay protocol is the basic two-hop relay protocol. A node may only transmit a piece of information either directly to the destination or to at most one relay node. In the two-hop relay protocol with erasure coding, the source introduces some redundancy in its transmissions and sends more data than the actual information. Upon receiving a piece of data (packet), the source produces n blocks of data. The transmission of the packet is completed when the destination receives the kth block, regardless of the identity of the k ≤ n blocks it has received [15,19]. More details are provided in Section 4, where the Laplace-Stieltjes transform (LST) of the delivery delay and the z-transform of the number of transmissions are derived in closed form.

Note that under the above assumptions, the delivery delay obtained in our setting gives a lower bound, as a consequence of the instantaneous transmission times. Second, the total number of copies generated per packet gives an upper bound. This is so because in a realistic setting the source will not systematically transmit a packet to every relay node that it encounters.

The rest of the paper is organized as follows: Section 2 evaluates the performance of the MTR protocol under the assumptions that the number of packet copies in the network is limited, and that inter-meeting times and TTLs are distributed exponentially. In Section 3, we show the impact of an arbitrary distribution of inter-meeting times and constant TTLs on the delivery delay of MTR. Section 4 derives the delivery delay and the number of transmissions for the two-hop relay protocol with erasure coding, and compares this scheme with the multicopy scheme.
2 Performance of MTR Protocol with Limited Number of Copies

In this section we consider the MTR protocol, with exponentially distributed node inter-meeting times with parameter λ (i.e., G(t) = 1 − e^{−λt}), exponentially distributed TTLs with parameter μ, and where the number of copies of a packet in the network may not exceed K (including the packet at the source), where K is an arbitrary integer less than or equal to N (in [2, Sec. 3] K = N). We recall that we only focus on the transmission of a single packet between a given source and a given destination, and that the packet at the source has no TTL (only copies have a TTL). The performance metrics of interest are: $T_d$, the time needed to deliver the packet to the destination; $C_d$, the number of copies of the packet before the delivery to the destination; and $G_d$, the total number of transmissions before the delivery to the destination. Note that the latter metric is related to the energy needed to deliver a packet to the destination. In this section we derive closed-form expressions for $E[T_d^n]$ for all $n \ge 1$, $P(C_d = j)$, and $E[G_d]$. We conclude this section by showing how these results can be used to find the value of $K$ that minimizes $E[G_d]$, subject to a constraint on $E[T_d]$.
Under the above assumptions it is easily seen that the system can be modeled as a finite-state absorbing Markov chain $I = \{I(t), t \ge 0\}$, where $I(t) \in \{1, 2, \ldots, K\}$ gives the number of copies of the packet at time $t < T_d$, and $I(t) = a$ if $t \ge T_d$. The states $1, 2, \ldots, K$ are the transient states and the state $a$ is the absorbing state of $I$. Let $q(i,j)$ denote the $(i,j)$-entry of $Q$, the infinitesimal generator of $I$. It is easily found that

$$q(i, i+1) = (N - i)\lambda, \quad i = 1, \ldots, K-1,$$
$$q(i, i-1) = (i - 1)\mu, \quad i = 2, \ldots, K,$$
$$q(i, i) = -[N\lambda + (i-1)\mu], \quad i = 1, \ldots, K-1,$$
$$q(K, K) = -[K\lambda + (K-1)\mu],$$
$$q(i, a) = i\lambda, \quad i = 1, \ldots, K,$$
$$q(i, j) = 0, \quad \text{otherwise}.$$

The matrix $Q$ can be written as

$$Q = \begin{pmatrix} Q_K & R \\ 0 & 0 \end{pmatrix}, \qquad (1)$$

where $Q_K = [q(i,j)]_{1 \le i,j \le K}$, $R = (q(1,a), \ldots, q(K,a))^T$, and $0$ is a $K$-dimensional row vector with all entries equal to 0. We will show below that for any initial state $I(0)$, $E[T_d^n]$, $P(C_d = j)$, and $E[G_d]$ can be derived in closed form if one has a closed-form expression for $Q_K^{-1}$, the inverse of $Q_K$.

We now derive $Q_K^{-1}$ in closed form. We note that $Q_K$ can be decomposed as $Q_K = \hat{Q}_K + b u u^T$, where $\hat{Q}_K$ is the $K$-by-$K$ sub-matrix composed of the first $K$ rows and columns of the matrix $Q_N$, $u = (0, \cdots, 0, 1)^T$, and $b = \lambda(N - K)$. By applying the Sherman-Morrison formula [16, p. 76] we find that

$$Q_K^{-1} = \hat{Q}_K^{-1} - \frac{b}{1 + b u^T \hat{Q}_K^{-1} u}\, \hat{Q}_K^{-1} u u^T \hat{Q}_K^{-1}. \qquad (2)$$
Let $q_K(i,j)$ be the $(i,j)$-entry of $Q_K^{-1}$ and $\hat{q}_K(i,j)$ be the $(i,j)$-entry of $\hat{Q}_K^{-1}$. Equivalently, (2) rewrites as

$$q_K(i,j) = \hat{q}_K(i,j) - \frac{\lambda(N-K)\,\hat{q}_K(i,K)\,\hat{q}_K(K,j)}{1 + \lambda(N-K)\,\hat{q}_K(K,K)}. \qquad (3)$$

It remains to find the entries $\hat{q}_K(i,j)$ of $\hat{Q}_K^{-1}$. The matrix $Q_N^{-1}$ was obtained in closed form in [2, Appendix I]. On the other hand, simple algebra shows that the $(i,j)$-entry of $\hat{Q}_K^{-1}$ is related to $Q_N^{-1}$ through the relation

$$\hat{q}_K(i,j) = q_N(i,j) + \frac{\lambda(N-K)\, q_N(i,K)\, q_N(K+1,j)}{1 - \lambda(N-K)\, q_N(K+1,K)} \qquad (4)$$

for $1 \le i, j \le K$. A closed-form expression for $q_N(i,j)$ was obtained in [2, Eq. 25] for any $i$ and $j$.
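As an illustration, here is a minimal numerical sketch in Python of the construction above; the function names (build_QK, sherman_morrison_inv) and the parameter values are ours, not the authors'. It assembles $Q_K$ from the rates of the generator, obtains $\hat{Q}_K$ as the top-left $K \times K$ block of $Q_N$, and checks the Sherman-Morrison form (2) against direct inversion:

```python
import numpy as np

def build_QK(N, K, lam, mu):
    """Transient block Q_K of the absorbing chain I; states 1..K count
    the number of packet copies, with the rates given above."""
    Q = np.zeros((K, K))
    for i in range(1, K + 1):
        if i < K:
            Q[i - 1, i] = (N - i) * lam                  # q(i, i+1)
            Q[i - 1, i - 1] = -(N * lam + (i - 1) * mu)  # q(i, i)
        else:
            Q[i - 1, i - 1] = -(K * lam + (K - 1) * mu)  # q(K, K)
        if i > 1:
            Q[i - 1, i - 2] = (i - 1) * mu               # q(i, i-1)
    return Q

def sherman_morrison_inv(QK_hat_inv, N, K, lam):
    """Q_K^{-1} from the inverse of Qhat_K via Eq. (2),
    with u = (0,...,0,1)^T and b = lam * (N - K)."""
    b = lam * (N - K)
    u = np.zeros((K, 1))
    u[K - 1] = 1.0
    v = QK_hat_inv @ u                       # Qhat_K^{-1} u  (column)
    w = u.T @ QK_hat_inv                     # u^T Qhat_K^{-1} (row)
    return QK_hat_inv - (b / (1.0 + b * float(w @ u))) * (v @ w)

N, K, lam, mu = 100, 10, 0.004, 0.004        # illustrative values
QN = build_QK(N, N, lam, mu)
QK_hat_inv = np.linalg.inv(QN[:K, :K])       # Qhat_K = top-left block of Q_N
QK_inv = sherman_morrison_inv(QK_hat_inv, N, K, lam)
assert np.allclose(QK_inv, np.linalg.inv(build_QK(N, K, lam, mu)))
```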
2.1 The Performance Metrics

Given that $I(0) = i$, the $n$th order moment of $T_d$, which is by definition equal to the time to absorption of $I$, is given by [14, Chap. 2, Eq. 2.2.7]

$$E_i[T_d^n] = (-1)^n n!\,(\alpha_i Q_K^{-n} e) = (-1)^n n! \sum_{j=1}^{K} q_K^{(n)}(i,j) \qquad (5)$$

for $i = 1, \ldots, K$, where $e$ is a $K$-dimensional column vector with all entries equal to 1, $\alpha_i$ is a $K$-dimensional row vector with all entries equal to 0 except the $i$th one that is equal to 1, and $q_K^{(n)}(i,j)$ is the $(i,j)$-entry of $Q_K^{-n}$, which can be expressed in closed form in terms of the $q_K(i,j)$ given in (3)-(4).

Given that $I(0) = i$, the probability distribution of the number of copies just at delivery time is [2, Sec. 3.2]

$$P_i(C_d = j) = -j\lambda\, q_K(i,j), \quad i = 1, \ldots, K. \qquad (6)$$

Given that $I(0) = i$, the expected total number of transmissions before delivery is given by [2, Sec. 3.3]

$$E_i[G_d] = \alpha E_i[T_d] + \frac{1}{2}\left(E_i[C_d] + b\, q_K(i,K) + \delta\right), \qquad (7)$$

where $\alpha := \lambda N - \mu$, $b = \lambda(N-K)$, and $\delta := \frac{1}{\rho} - i - 1$.
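Continuing the sketch (again with our own function names, and reusing build_QK from above), the first two metrics follow directly from $Q_K^{-1}$; the probabilities in (6) sum to one, which gives a quick sanity check:

```python
import math
import numpy as np

def moment_Td(QK, i, n=1):
    """E_i[T_d^n] = (-1)^n n! (alpha_i Q_K^{-n} e), Eq. (5)."""
    row = np.linalg.matrix_power(np.linalg.inv(QK), n)[i - 1]
    return (-1) ** n * math.factorial(n) * row.sum()

def dist_Cd(QK, i, lam):
    """P_i(C_d = j) = -j * lam * q_K(i, j), Eq. (6), for j = 1..K."""
    qK = np.linalg.inv(QK)
    return [-(j + 1) * lam * qK[i - 1, j] for j in range(QK.shape[0])]

QK = build_QK(N=100, K=10, lam=0.004, mu=0.004)
print("E[Td]   =", moment_Td(QK, i=1, n=1))
print("E[Td^2] =", moment_Td(QK, i=1, n=2))
assert abs(sum(dist_Cd(QK, i=1, lam=0.004)) - 1.0) < 1e-9
```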
2.2 Minimizing the Consumed Energy

The total number of copies that are transmitted is directly related to the energy consumed to deliver the packet to the destination. Our objective now is to use the above results to find $K_{opt}$, the optimal value of $K$ which minimizes the expected total number
Fig. 1. The optimal maximum number of copies ($K_{opt}$) as a function of the constraint $C$ on $E[T_d]$ for $N = 100$ and $C \in [E^{(100)}[T_d],\, 2E^{(100)}[T_d]]$
of copies that are transmitted before the delivery of the packet to the destination, subject to a constraint on the expected delivery delay, that is,

$$\min_{\{K : E^{(K)}[T_d] \le C\}} E^{(K)}[G_d],$$

where $E^{(K)}[T_d] := E_1[T_d]$ and $E^{(K)}[G_d] := E_1[G_d]$. The superscript $(K)$ emphasizes the dependency on the variable $K$. Since the integer mappings $K \to E^{(K)}[T_d]$ and $K \to E^{(K)}[G_d]$ are strictly decreasing and strictly increasing, respectively, the solution to this constrained optimization problem is obtained for the smallest integer $K$ in $[1, N]$ such that $E^{(K)}[T_d] \le C$ if $E^{(N)}[T_d] \le C$, and has no solution if $E^{(N)}[T_d] > C$. Figure 1 reports $K_{opt}$ as a function of $C$ for $\rho = 1$ and for three values of $\lambda$. We observe that $K_{opt}$ decays sharply as $C$ increases.
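Since $E^{(K)}[T_d]$ is strictly decreasing in $K$, $K_{opt}$ can be found by a simple scan; a sketch built on the helpers above (the delay constraint C is an input):

```python
def K_opt(N, lam, mu, C):
    """Smallest K in [1, N] with E^(K)[T_d] <= C; by the monotonicity
    argument above, this K also minimizes E^(K)[G_d]."""
    for K in range(1, N + 1):
        if moment_Td(build_QK(N, K, lam, mu), i=1) <= C:
            return K
    return None  # infeasible: even K = N gives E^(N)[T_d] > C
```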
3 Impact of Arbitrary Inter-meeting Times and Constant TTLs on MTR

In [4], it has been observed that the inter-meeting times distribution has a heavier-than-exponential tail. This finding was the motivation to investigate the impact of an arbitrary inter-meeting times distribution on the delivery delay of the MTR protocol. Throughout this section we assume that for any pair of nodes their inter-meeting times are iid with distribution $G(t)$, and all inter-meeting times are mutually independent. Let $X$ be a generic rv with distribution $G$. Also define $G^*(s) = E[e^{-sX}]$, the LST of $X$. We assume that TTLs are constant and all equal to $T$. As a result, the stochastic process $I$ is no longer a Markov process and a different approach has to be used in order to evaluate the delivery delay of the MTR protocol. For the sake of simplicity we consider the case where $K = N$, i.e., there is no restriction on the number of packet copies in the network. For convenience we label the nodes so that node 0 is the source, node $N$ is the destination, and nodes $1, 2, \ldots, N-1$ are the relay nodes. Since $K = N$, we have

$$T_d \stackrel{st}{=} \min(X_{sd}, D_1, \ldots, D_{N-1}), \qquad (8)$$

where $X_{sd} \stackrel{st}{=} X$ represents the inter-meeting time between the source and the destination, and $D_i$ is the time needed for relay node $i = 1, 2, \ldots, N-1$ to deliver a copy of the packet to the destination. Moreover, the rvs $X_{sd}, D_1, \ldots, D_{N-1}$ are mutually independent and the rvs $D_1, \ldots, D_{N-1}$ are identically distributed. Hence,

$$P(T_d < t) = 1 - (1 - G(t))\, P(D_i > t)^{N-1}. \qquad (9)$$
We need to determine $P(D_i > t)$. We shall actually find an approximation formula for $P(D_i > t)$, since finding an exact expression is a very difficult task unless $G(t)$ is the exponential distribution, a case considered at the end of this section. From now on $i$ is fixed in $\{1, \ldots, N-1\}$. We assume that the source, destination and relay node $i$ are in steady state at time $t = 0$, and that relay node $i$ does not hold a copy of the packet at $t = 0$ (only the source holds the original packet at $t = 0$).

Let $R$ record the number of times relay node $i$ has dropped a copy of the packet before it transmits it to the destination. On the event $R = m + 1$, let $a_k > 0$ be the arrival time of the $k$th copy at relay node $i$ for $k = 1, \ldots, m+1$, let $d_k > a_k$ be the time where the $k$th copy is dropped by relay node $i$ for $k = 1, \ldots, m$, and let $e_{m+1}$ be the time where copy $m + 1$ reaches the destination. Define $\hat{X} = a_1$, $Z_k = a_{k+1} - d_k$ for $k = 1, \ldots, m$, and $\hat{Z} = e_{m+1} - a_{m+1}$. Clearly,

$$D_i = \hat{X} + Z_1 + \cdots + Z_m + mT + \hat{Z} \qquad (10)$$

on the event $R = m + 1$. Given that $R = m + 1$, the rvs $\hat{X}, Z_1, \ldots, Z_m, \hat{Z}$ are mutually independent; moreover the rvs $Z_1, \ldots, Z_m$ are iid. Let $D^*(s) := E[e^{-sD_i}]$ be the LST of $D_i$. We have

$$D^*(s) = E[e^{-s\hat{X}}]\, E[e^{-s\hat{Z}}] \sum_{m \ge 0} \left(e^{-sT} E[e^{-sZ_k}]\right)^m P(R = m+1). \qquad (11)$$
1. Evaluation of $Z_T^*(s) := E[e^{-sZ_k}]$: Recall that $X$ denotes a generic inter-meeting time and that its probability density is $g(\cdot)$. Let $h_T(t) := dP(Z_k < t)/dt$ be the probability density of $Z_k$. The reason why we indicate the dependency on the parameter $T$ in $h_T(t)$ will soon become apparent. If the source does not meet relay node $i$ in $(a_k, a_k + T)$ then $Z_k \stackrel{st}{=} a_{k+1} - a_k - T = X - T$; otherwise $Z_k = a_{k+1} - a'_k - (T - (a'_k - a_k))$, where $a'_k$ is the first time the source meets relay node $i$ in $(a_k, a_k + T)$. The latter rewrites as $Z_k \stackrel{st}{=} X_1 + X_2 - T$ with $X_j \stackrel{st}{=} X$ for $j = 1, 2$. From this we deduce that $Z_T^*(s) = \int_0^\infty h_T(t) e^{-st}\, dt$ satisfies the following renewal equation:

$$Z_T^*(s) = \int_0^\infty e^{-st} g(T + t)\, dt + \int_0^T g(u)\, Z_{T-u}^*(s)\, du. \qquad (12)$$

We have shown that $Z_T^*(s)$ satisfies an integral equation (of Fredholm type) from which $Z_T^*(s)$ can be obtained numerically using standard techniques [10].

2. Probability distribution of $R$: Finding the probability distribution of $R$ is difficult. We will first assume that $R$ is a geometric rv with parameter $\pi = 1 - P(R = 1)$. It is possible to find an integral equation for $\pi$. However, for the sake of simplicity, we will assume that the destination node is at equilibrium at time $a_1$, so that $\pi = 1 - G_e(T)$, with $G_e(t)$ the excess probability distribution of $G(t)$, that is, $G_e(t) = (1/E[X]) \int_0^t (1 - G(u))\, du$. In summary, $P(R = m+1) \approx (1 - \pi)\pi^m$ for $m \ge 0$.

Plugging the approximation of $P(R = m+1)$ into (11) gives

$$D^*(s) \approx \frac{(1 - G^*(s))}{s E[X]}\, E[e^{-s\hat{Z}}]\, \frac{1 - \pi}{1 - \pi e^{-sT} Z_T^*(s)}, \qquad (13)$$

where $Z_T^*(s)$ is the solution of the integral equation (12), and $\pi = 1 - G_e(T)$ (hint: $E[e^{-s\hat{X}}] = (1 - G^*(s))/sE[X]$). It remains to evaluate $E[e^{-s\hat{Z}}]$. Again, this is not an easy task. Clearly, $e^{-sT} \le E[e^{-s\hat{Z}}] \le 1$. For the sake of simplicity, we will replace $E[e^{-s\hat{Z}}]$ by 1. This gives the final approximation

$$D^*(s) \approx \frac{(1 - G^*(s))}{s E[X]}\, \frac{1 - \pi}{1 - \pi e^{-sT} Z_T^*(s)}. \qquad (14)$$
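Equation (12) is of Volterra type in the TTL variable $T$ (the unknown $Z^*_{T-u}(s)$ appears only for smaller TTL values), so it can be solved by a forward recursion on a TTL grid. A rough numerical sketch, with our own names and a plain rectangle rule; g is a vectorized inter-meeting density $g(\cdot)$:

```python
import numpy as np

def ZT_star(s, T, g, dt=0.01, t_max=500.0):
    """Solve the renewal equation (12) for Z_T*(s) on the grid 0, dt, ..., T:
    Z_T*(s) = int_0^inf e^{-s t} g(T + t) dt + int_0^T g(u) Z_{T-u}*(s) du."""
    ts = np.arange(0.0, t_max, dt)           # truncation of the infinite integral
    M = int(round(T / dt))
    Z = np.zeros(M + 1)
    for m in range(M + 1):                   # m * dt is the current TTL value
        Z[m] = np.sum(np.exp(-s * ts) * g(m * dt + ts)) * dt
        Z[m] += sum(g(j * dt) * Z[m - j] for j in range(1, m + 1)) * dt
    return Z[M]

# Example with exponential inter-meeting times of rate 0.05:
print(ZT_star(s=0.1, T=1.0, g=lambda t: 0.05 * np.exp(-0.05 * t)))
```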
Finally, $P(D_i > t)$ is obtained by inverting $(1 - D^*(s))/s$ (since $(1 - D^*(s))/s = \int_0^\infty e^{-st} P(D_i > t)\, dt$) with the help of the complex inversion formula [18, Chap. 7], which yields

$$P(D_i > t) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} e^{ts}\, \frac{1 - D^*(s)}{s}\, ds, \quad t > 0, \qquad (15)$$

where the integration has to be performed along a line $s = \gamma$ in the complex plane (in (15) $i$ denotes the imaginary unit). The real number $\gamma$ must be chosen so that the line $s = \gamma$ lies to the right of all singularities. Note that since $P(D_i > t)$ is a bounded function it is sufficient to take $\gamma > 0$.

The approximation for $F_{T_d}(t) := P(T_d > t)$ has been computed when $G(t)$ is a hyper-exponential distribution, namely, $G(t) = 1 - \sum_{l=1}^{H} p_l e^{-\nu_l t}$, and compared to simulation results. The evaluation of the integral in (15) has been performed using the procedure described in [6]. Numerical results are reported in Figure 2. Two hyper-exponential distributions, represented by the tuple $(H, \nu_1, \ldots, \nu_H, p_1, \ldots, p_H)$, have been considered: (H1) $(3, 0.09, 0.08, 0.07, 0.6, 0.3, 0.1)$ with mean 11.84 sec., and (H2) $(3, 0.05, 0.04, 0.03, 0.6, 0.3, 0.1)$ with mean 22.83 sec. (the mean of a hyper-exponential distribution is $\sum_l p_l/\nu_l$). The numerical results of the mathematical model were obtained using a C program for a network composed of one source, one destination, and $N - 1$ relay nodes.

Let $T_d(\mathrm{app})$ and $T_d(\mathrm{sim})$ be the approximate and simulated delivery delays, respectively. Let $F_{T_d(\mathrm{app})}(\cdot)$ (resp. $F_{T_d(\mathrm{sim})}(\cdot)$) denote the complementary cumulative distribution function (CCDF) of $T_d(\mathrm{app})$ (resp. $T_d(\mathrm{sim})$). Figure 2(a) displays the mappings $t \to F_{T_d(\mathrm{app})}(t)$ and $t \to F_{T_d(\mathrm{sim})}(t)$ for the hyper-exponential distributions H1 and H2. We observe that the approximation is accurate for moderate values of $N$. Figure 2(b) compares $F_{T_d(\mathrm{sim})}(t)$ with the CCDF of $T_d$ in the case where the inter-meeting times are exponentially distributed; the latter distribution has been obtained in closed form (see (16) below for details). We conclude from these results that $T_d$ under exponential inter-meeting times is stochastically smaller than $T_d$ under hyper-exponential inter-meeting times. This is related to the fact that the hyper-exponential distribution has a fatter tail than the exponential distribution.

We conclude this section by briefly addressing the simple case where the inter-meeting times are distributed exponentially with rate $\lambda$, and the TTLs are constant and all equal to $T$. In this case we derive the closed-form expression of the delivery delay of the MTR protocol as follows:

$$P(D_i > t) = e^{-\lambda t}\left(1 + \sum_{m=0}^{\lfloor t/T \rfloor} \sum_{k=0}^{m} \frac{(A_m)^{k+1} - (B_m)^{k+1}}{(k+1)!}\right), \qquad (16)$$

where $A_m := \lambda(t - mT)$, $B_m := \lambda[t - (m+1)T]^+$, $\lfloor x \rfloor$ designates the largest integer $\le x$, and $[x]^+ := \max(0, x)$. Plugging (16) into (9) gives the CDF of $T_d$. By comparing the CCDF of $T_d$ with constant TTL $= T$ and with TTLs distributed exponentially with mean $T$, we deduce that $T_d$ with constant TTL is stochastically smaller than $T_d$ with exponential TTL.
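For the exponential inter-meeting, constant-TTL case, (16) and (9) can be transcribed directly; a small sketch (our names, and with the double-sum limits as reconstructed above, so treat it as illustrative):

```python
import math

def P_Di_gt(t, lam, T):
    """P(D_i > t) from Eq. (16): exponential inter-meeting times of rate
    lam, constant TTL T, with A_m = lam(t - mT), B_m = lam[t - (m+1)T]^+."""
    total = 1.0
    for m in range(int(t // T) + 1):
        A = lam * (t - m * T)
        B = lam * max(0.0, t - (m + 1) * T)
        for k in range(m + 1):
            total += (A ** (k + 1) - B ** (k + 1)) / math.factorial(k + 1)
    return math.exp(-lam * t) * total

def ccdf_Td(t, lam, T, N):
    """P(T_d > t) from Eq. (9): the source-destination meeting time and
    all N-1 relay delivery times must exceed t."""
    return math.exp(-lam * t) * P_Di_gt(t, lam, T) ** (N - 1)
```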
Fig. 2. (a) Mappings $t \to F_{T_d(\mathrm{app})}(t)$ and $t \to F_{T_d(\mathrm{sim})}(t)$ for two different hyper-exponential distributions ($N = 10$). (b) Comparison of the CCDF of $T_d$ in the case of hyper-exponential (simulated CCDF) and exponential (CCDF using (16)) inter-meeting time distributions ($N = 50$).
4 Two-Hop Relay with Erasure Coding

We now consider a system where the source introduces some redundancy in its transmissions and sends more data than the actual information. The advantage of this mechanism is that it can considerably reduce the variance of the delivery delay at the expense of an increase of its expectation. One of these techniques is known as erasure coding [11]. Erasure coding with replication factor r works as follows. Upon receiving a packet of size M, the source produces n = r · M/b equal-sized code blocks of size b, such that any k = (1 + ε) · M/b of the code blocks can be used to reconstruct the packet. Here ε is a small constant, close to zero [11]. Thus, the destination is able to decode the packet if it receives k ≤ n blocks. On the other hand, when k = 1, the size of a block becomes almost equal to M, the packet size, and in this case the destination needs to receive a single block in order to decode the packet.
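For concreteness, a small sketch of this block bookkeeping (our names; the value of the constant ε here is illustrative):

```python
import math

def erasure_params(M, b, r, eps=0.02):
    """Blocks produced (n = r*M/b) and blocks needed (k = (1+eps)*M/b)
    for erasure coding with replication factor r on a packet of size M
    split into blocks of size b."""
    n = math.ceil(r * M / b)
    k = math.ceil((1 + eps) * M / b)
    return n, k

print(erasure_params(M=1000, b=100, r=2))  # -> (20, 11): any 11 of 20 blocks suffice
```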
Fig. 3. Transition diagram of the Markov chain {A(t), t ≥ 0}
Thus, for k = 1, the erasure coding scheme is the same as a simple multicopy scheme in which the source sends exactly one copy of a packet to n different relay nodes [19]. We will exploit this observation to compare erasure coding with the multicopy scheme in the following.

Throughout this section the stochastic model is the following. There are N relay nodes, one source node, and one destination node. We assume that the source cannot send a packet directly to the destination. Inter-meeting times between any pair of nodes are exponentially distributed with rate λ, except for the source-destination pair. Under this setting, the only way to forward the data from the source to the destination is through the relay nodes. The source has only one packet to send to the destination, and the source implements the erasure coding algorithm with replication factor r and parameter k. Hence, the destination needs to receive k ≤ n of the blocks in order to decode the original packet. The forwarding mechanism used to deliver the blocks to the destination is the standard two-hop relay protocol. We assume that there is only one copy of a block in the network. A relay node can only relay one block at a time, and it is possible for a relay node that has already delivered a block to the destination to receive a new block when it again encounters the source. There is no TTL associated with the blocks.

Let $T_d$ and $G_d$ be the delivery delay and the total number of source-relay transmissions at the time when the kth block reaches the destination, respectively. Introduce the joint transform $H(s, z) := E[e^{-sT_d} z^{G_d}]$, $s \ge 0$, $|z| \le 1$. We now evaluate $H(s, z)$. Let $A(t) = (B(t), R(t))$ denote a two-dimensional process such that $A(t) = (m, l)$, $0 \le m \le n$, $0 \le l \le k-1$, $m + l \le n$, if there are m relay nodes that hold m blocks (one block for each relay node) and the destination has received l blocks at time $t < T_d$, and $A(t) = a$ when $t \ge T_d$ ($a$ is an absorbing state). Under the above assumptions, $\{A(t), t \ge 0\}$ is a finite-state absorbing Markov process. Figure 3 displays the transition diagram of this Markov chain, where the y-axis represents $R(t)$ and the x-axis represents the sum $B(t) + R(t)$. More precisely, a point $(i, l)$, $i \ge l$, in the transition diagram means that the destination has received l blocks and that there are $i - l$ relay nodes that hold $i - l$ blocks (one block for each). Let $j_i \ge 0$ denote the number of jumps (transitions) along the horizontal line of index $i \in \{0, \cdots, k-1\}$. Let $S_i$ denote the total number of jumps along the lines of index less than or equal to $i$. Given $S_{i-1}$, the probability of making $j_i$ jumps along the horizontal line $i$ is
$$P_1(j_i) = \frac{(N - S_{i-1} + i)!}{(N - S_i + i)!}\, \frac{S_i - i}{N^{j_i + 1}}, \quad S_i < n; \qquad P_2(j_i) = \frac{(N - S_{i-1} + i)!}{(N - n + i)!}\, \frac{1}{N^{j_i}}, \quad S_i = n, \qquad (17)$$

for $0 \le i \le k - 1$. Let $m^*$ denote the index of the horizontal line such that $S_{m^*-1} < n$ and $S_{m^*} = n$. Conditioned on all the possible paths before absorption at $\{a\}$ and given that $A(0) = (0, 0)$, $H(s, z)$ can be written as

$$H(s, z) = \sum_{j_0=1}^{n-1} \cdots \sum_{j_{k-1}=0}^{n-1} \left(\frac{zN\lambda}{s + N\lambda}\right)^{S_{k-1}+k} \prod_{l=0}^{k-1} P_1(j_l) \;+\; \sum_{j_0=1}^{n-1} \cdots \sum_{j_{m^*-1}=0}^{n-S_{m^*-2}} P_2(n - S_{m^*-1}) \left(\frac{zN\lambda}{s + N\lambda}\right)^{n+m^*} \prod_{l=0}^{m^*-1} P_1(j_l) \prod_{l=m^*}^{k-1} \frac{\lambda(n-l)}{s + \lambda(n-l)}. \qquad (18)$$
We now evaluate the expectation and the variance of $T_d$ for different values of $k$, $n$, and $N$. Let $\sigma_{T_d} := \sqrt{\mathrm{var}(T_d)}/E[T_d]$ denote the normalized standard deviation of $T_d$. As noted earlier, erasure coding when $k = 1$ is similar to the multicopy scheme. Table 1 shows that $E[T_d]$ increases with $k$ and that $\sigma_{T_d}$ decreases with $k$. For instance, for $N = 20$ and $n = 10$, when $k$ increases from 1 to 5, $E[T_d]$ increases by a factor of 3 while $\sigma_{T_d}$ decreases by a factor of 7.5, thereby showing that erasure coding has a much lower variability than the multicopy scheme. A similar result was found in [19] under the assumption that the source instantaneously transmits all its $n$ blocks to $n$ different relay nodes.

Table 1. Erasure coding (k > 1) vs. multicopy scheme (k = 1) for different values of n and N

                    (N, n) = (20, 10)        (N, n) = (40, 10)
  k                 1      2      5          1       2       5
  E[Td] (sec.)      31.51  42.04  96.69      22      33.7    79.76
  σ_Td              8.7    4.9    1.17       2.01    1.38    0.63
We conclude this section by investigating the behavior of $T_N^*(s) := E[e^{-sT_d}]$, the LST of $T_d$, when $N$ is large. We observed that as $N$ becomes large the most probable path (MPP) is the one where all $n$ blocks are first transmitted to $n$ relay nodes, and then these relay nodes start to deliver these blocks to the destination. Further, it is easy to see that when $N$ is large the system follows this deterministic path (the MPP) with probability one. Therefore,

$$T_N^*(s) \approx \left(\frac{N\lambda}{s + N\lambda}\right)^n \prod_{l=0}^{k-1} \frac{\lambda(n-l)}{s + \lambda(n-l)}. \qquad (19)$$
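Since (19) is a product of exponential-phase LSTs, for large $N$ the delivery delay behaves like a sum of independent exponential phases: $n$ source-to-relay transmissions at rate $N\lambda$ followed by $k$ deliveries at rates $\lambda n, \lambda(n-1), \ldots, \lambda(n-k+1)$. A sketch of the resulting mean and variance (our names):

```python
def mean_var_Td_largeN(N, n, k, lam):
    """Mean and variance of T_d under the large-N approximation (19),
    read as a sum of independent exponential phases."""
    rates = [N * lam] * n + [lam * (n - l) for l in range(k)]
    mean = sum(1.0 / r for r in rates)
    var = sum(1.0 / r ** 2 for r in rates)
    return mean, var

print(mean_var_Td_largeN(N=40, n=10, k=5, lam=0.01))
```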
5 Concluding Remarks

In this paper, we have studied a class of two-hop relay protocols. The interest was in the multicopy two-hop relay (MTR) protocol and in the two-hop relay protocol with erasure coding. Closed-form expressions have been derived for the nth order moment of the time to deliver a packet to its destination, the distribution of the number of copies at delivery, and the expected total number of copies generated before delivery, in the case where copies have a limited lifetime (TTL) and where the number of copies in the network is limited. We also investigated the impact of an arbitrary inter-meeting times distribution and constant TTLs on the delivery delay of the MTR protocol. In particular, we showed that exponential inter-meeting times yield stochastically smaller delivery delays than hyper-exponential inter-meeting times. Finally, for the two-hop relay protocol with erasure coding, the joint generating function of the delivery delay and of the number of transmissions was derived in closed form. By analyzing these results, we found that the delivery delay in the case of erasure coding has much lower dispersion than the delivery delay of the multicopy scheme. As future work, we will study the delay when there are multiple sources with multiple packets to transmit to a set of destinations and where the relay nodes may have different mobility.
References

1. Delay Tolerant Research Group, http://www.dtnrg.org
2. A. Al Hanbali, P. Nain, and E. Altman, Performance of Two-hop Relay Routing Protocol with Limited Packet Lifetime, Proc. of Valuetools 2006, Pisa, Italy, Oct. 2006.
3. C. Bettstetter, H. Hartenstein, and X. Pérez-Costa, Stochastic Properties of the Random Waypoint Mobility Model, ACM/Kluwer Wireless Networks, Special Issue on Modeling and Analysis of Mobile Networks, vol. 10, no. 5, pp. 555-567, Sept. 2004.
4. A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, Impact of Human Mobility on the Design of Opportunistic Forwarding Algorithms, Proc. of INFOCOM 2006, Barcelona, Spain, Apr. 2006.
5. R. De Moraes, H. Sadjadpour, and J. Garcia-Luna-Aceves, Throughput-Delay Analysis of Mobile Ad-hoc Networks with a Multi-Copy Relaying Strategy, Proc. of IEEE SECON, Santa Clara, CA, Oct. 2004.
6. H. Dubner and J. Abate, Numerical Inversion of Laplace Transforms by Relating Them to the Finite Fourier Cosine Transform, Journal of the ACM, vol. 15, no. 1, pp. 115-123, Jan. 1968.
7. A. El Gamal, J. Mammen, B. Prabhakar, and D. Shah, Throughput-Delay Trade-off in Wireless Networks, Proc. of INFOCOM 2004, Hong Kong, Apr. 2004.
8. R. Groenevelt, P. Nain, and G. Koole, The Message Delay in Mobile Ad Hoc Networks, Proc. of Performance 2005, Juan-les-Pins, France, Oct. 2005. Published in Performance Evaluation, vol. 62, issues 1-4, Oct. 2005, pp. 210-228.
9. M. Grossglauser and D. Tse, Mobility Increases the Capacity of Ad Hoc Wireless Networks, IEEE/ACM Transactions on Networking, vol. 10, no. 4, Aug. 2002, pp. 477-486.
10. S. G. Mikhlin, Integral Equations, Pergamon Press, Oxford, 1964.
11. M. Mitzenmacher, Digital Fountains: A Survey and Look Forward, Proc. of IEEE Information Theory Workshop, TX, USA, Oct. 2004.
12. P. Nain, D. Towsley, B. Liu, and Z. Liu, Properties of Random Direction Models, Proc. of INFOCOM 2005, Miami, FL, USA, Mar. 2005.
13. M. J. Neely and E. Modiano, Capacity and Delay Tradeoffs for Ad-Hoc Mobile Networks, IEEE Transactions on Information Theory, vol. 51, no. 6, June 2005.
14. M. Neuts, Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach, Johns Hopkins University Press, 1981.
15. E. Perevalov and R. Blum, Delay Limited Capacity of Ad Hoc Networks: Asymptotically Optimal Transmission and Relaying Strategy, Proc. of INFOCOM 2003, San Francisco, USA, Apr. 2003.
16. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 1988.
17. G. Sharma, R. Mazumdar, and N. Shroff, Delay and Capacity Trade-offs in Mobile Ad Hoc Networks: A Global Perspective, Proc. of INFOCOM 2006, Barcelona, Spain, Apr. 2006.
18. M. R. Spiegel, Schaum's Outline of Theory and Problems of Laplace Transforms, McGraw-Hill, New York, 1965.
19. Y. Wang, S. Jain, M. Martonosi, and K. Fall, Erasure-Coding Based Routing for Opportunistic Networks, Proc. of SIGCOMM Workshop on DTN, Philadelphia, PA, USA, Aug. 2005.
20. X. Zhang, G. Neglia, J. Kurose, and D. Towsley, Performance Modeling of Epidemic Routing, Proc. of NETWORKING 2006, Coimbra, Portugal, May 2006.
Maximum Energy Welfare Routing in Wireless Sensor Networks

Changsoo Ok¹, Prasenjit Mitra², Seokcheon Lee³, and Soundar Kumara¹

¹ The Department of Industrial Engineering, The Pennsylvania State University, University Park, PA 16802, USA, {cuo108,skumara}@psu.edu
² College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA, [email protected]
³ The School of Industrial Engineering, Purdue University, West Lafayette, IN 47907-2023, USA, [email protected]
Abstract. Most routing algorithms for sensor networks focus on finding energy-efficient paths to prolong the lifetime of sensor networks. As a result, the sensors on the efficient paths are depleted quickly, and consequently the sensor networks become incapable of monitoring events in some parts of their target areas. In many sensor network applications, the events have uncertainties in positions and generation patterns. Therefore, routing algorithms should be designed to consider not only energy efficiency, but also the amount of energy left in each sensor, to avoid sensors running out of power early. This paper introduces a new metric, called Energy-Welfare, devised to consider the average and balance of sensors' remaining energies simultaneously. Using this metric, we design the Maximum Energy Welfare Routing algorithm, which achieves energy efficiency and energy balance of sensor networks simultaneously. Moreover, we demonstrate the effectiveness of the proposed routing algorithm by comparing it with three existing routing algorithms.

Keywords: Sensor Network, Distributed algorithm, Energy aware routing, Energy Welfare, Social welfare, Energy balance.
1 Introduction

Sensor networks report predetermined events or transmit sensed data to the base station for further analysis [1, 2]. The sensors contain a fixed amount of stored power, and the process of sending data consumes some of that stored power. It is desirable that the sensors run as long as possible. In this work, we propose a routing algorithm that attempts to route messages efficiently so as to maximize the life of a sensor network. Consequently, the design of a routing algorithm for sensor networks should also incorporate the following factors [2]:

- Due to sensors' limited power, the routing algorithm should be designed to find efficient paths that prolong the lifetime of the sensor network.
- However, most energy-efficient routing algorithms inevitably drain the power of some sensors, those close to the base station or on energy-efficient paths, quickly. As a result, the sensor networks become unable to detect events from regions where all sensors are nonfunctioning. Thus, in sensor networks, data traffic should be dispersed and distributed over the whole network to extend its lifetime.
- Although most existing routing algorithms assume that events are generated uniformly at each sensor, events could occur randomly [3] or uniformly [4] over the target area, or repeatedly [5] at a specific part of the target area. Event patterns can change from one type to another over time. Therefore, the routing algorithm should be sufficiently robust for diverse event generation functions. This problem can be addressed by routing so as to utilize the energy uniformly over the entire sensor network.
- In addition, a sensor network can consist of a large number of nodes, for which a central control architecture does not apply. Therefore, the routing algorithm should adopt a local decision-making scheme.

Although the literature includes several routing algorithms, such as the direct communication approach, hierarchical routing methods [6, 7], the self-organized routing algorithm [4], and other routing algorithms [8], little evidence exists for the effectiveness and efficiency of these algorithms with respect to the considerations mentioned earlier.
[Figure 1 shows two sensors and a base station, with E1 = 50, E2 = 25, e1,BS = 10, e2,BS = 5, and e1,2 = 1; the resulting residual energies give Path 1 (S1→BS) an EW of 30.36 and Path 2 (S1→S2→BS) an EW of 27.2.]

Fig. 1. An explanatory example of the Energy Welfare Routing algorithm: node 1 routes data along the path that maximizes the Energy Welfare (Average × Equality) of sensors 1 and 2
We assume that the neighbors periodically get only the information about the energy left in each neighbor and the energy required to transmit to the base station from that neighbor. Individual sensors forward messages to neighbors that they think are on the "best" route to the base station. The determination of the optimal route is difficult because the individual sensors do not have information about the
dynamic topology of the entire network and the dynamic energy balances of each node on the network. This study proposes a new heuristic metric, called Energy Welfare, to achieve efficiency and balance of sensors' energies simultaneously. Based on this metric, we propose a localized routing algorithm, the Maximum Energy Welfare (MaxEW) routing, to accomplish the two objectives. Figure 1 gives a simple example of MaxEW routing. Ei and eij represent the residual energy of sensor i and the energy required to transmit from node i to node j, respectively. Sensor 1 has two paths to reach the base station. Path 2 is more energy efficient than Path 1. However, if sensor 1 keeps using Path 2, sensor 2 will run out of power while sensor 1 still has sufficient energy. In MaxEW, sensors can avoid traffic concentration at a single sensor by using the metric Energy Welfare (Average × Equality) as a decision criterion. The Equality is in inverse proportion to the difference between the energy levels of sensors 1 and 2. Based on the metric, sensor 1 chooses Path 1, which yields the higher energy welfare. The rest of this paper is organized as follows: Section 2 presents the details of the considered sensor network. Section 3 defines energy equality and the welfare of a sensor network as new metrics. After describing the details of the Maximum Energy Welfare routing algorithm in Section 4, Section 5 presents extensive simulation results, and Section 6 details conclusions.
2 Sensor Network Model

With n homogeneous sensors randomly and uniformly distributed over a target area, all sensed data must be sent to the base station. Each sensor has limited battery power. Sensors can control their respective transmission power for minimal consumption to transmit to a destination [6, 7], and they have discrete adjustable transmission power levels [9-12]. This ability is necessary to allow the routing algorithm to maximize sensor networks' operational times. Therefore, sensors can send data either to a neighbor or to the base station directly, according to their routing policies [4, 6, 7]. The details of the problem are:

2.1 Energy Consumption Model

Each sensor uses a fixed transmission power for communicating with its neighboring sensors, whereas it adjusts its transmission power when transmitting data to the base station. The neighboring distance is defined as the maximal reachable distance with the fixed transmission power for neighboring sensors. For a given sensor, the sensors within its neighboring distance are its "neighboring sensors" or "neighbors". In this scheme, each node can be aware of the current energy level of its neighbors, or the energy required to transmit from its neighboring nodes to the base station, by anticipating and/or eavesdropping on data from the neighbors. Generally, sensors use their energy when they sense, receive and transmit data. However, the amount of energy consumed for sensing is unaffected by the routing algorithm, and only a small difference exists between the power consumption of idle and receiving modes [13]. Therefore, we consider only the energy consumed by
transmission in the design of the routing algorithm to maximize the lifetime of the sensor network. By normalizing the radio characteristic constant and the size of sensed data [4], the energy consumption model is simplified to E = d², where E and d are the required energy and the transmission distance, respectively.

2.2 The Lifetime of a Sensor Network

To validate the effectiveness of the proposed MaxEW routing algorithm, we use the lifetime of the sensor network as the performance measure [4, 6, 8, 11]. The lifetime of a sensor network is defined as the time or number of rounds that occur until the first node, or a portion of the nodes, becomes incapable, due to energy depletion, of sending data to the base station directly or indirectly via its neighbors [4, 6, 8, 11, 14]. The portion (number of depleted nodes) can vary depending on the context of the sensor network. In this paper, the lifetime of a sensor network is the number of rounds until the first (L1), 10% (L10), or 20% (L20) of node(s) expend all their power [8, 11].

2.3 Event Generation Functions

For evaluation purposes, many previous studies of routing algorithms assumed that all sensors have uniform data or event generation rates [4, 6, 7]. In infrastructure monitoring applications, each sensor performs a sensing task every time period T and has a homogeneous event generation function. However, in many sensor network applications, this assumption becomes unrealistic. In a monitoring application for the migration of a herd of animals, the animals might move along a path in the target area repeatedly [5], while in the case of forest fire detection, events can occur rarely and randomly over the target area [3]. Furthermore, some event generation functions can be a combination of the uniform, random, and repeated types. Therefore, it is more reasonable to consider several event types for the evaluation of routing algorithms. The results section demonstrates that our algorithm is robust for the different types of event generation functions.
3 Energy Equality and Welfare

To keep detecting events at an unknown position in a target area for as long as possible, routing algorithms should be designed to find efficient paths and, at the same time, prevent a particular set of sensors from being depleted early by a concentration of data traffic. In other words, the routing algorithms must achieve an energy balance for sensor networks while guaranteeing that sensors use their energies efficiently. Designing the routing algorithm first requires a new measure that captures the energy balance of sensor networks as well as the energy efficiency of routing algorithms. For this purpose, we define Energy-Equality (EE) and Energy-Welfare (EW).
3.1 Energy-Equality (EE)

To measure how well energy-balanced a sensor network is, we define the Energy-Equality (EE) of a given sensor network by Equations (1) and (2):

$$EE(t, \varepsilon) = 1 - I_N(t, \varepsilon), \qquad (1)$$

$$I_N(t, \varepsilon) = 1 - \left[\frac{1}{n} \sum_{i \in N} \left(\frac{E_i(t)}{\bar{E}(t)}\right)^{1-\varepsilon}\right]^{\frac{1}{1-\varepsilon}}. \qquad (2)$$

$I_N(t, \varepsilon)$ is the inequality (explained below) of the sensors' residual batteries at time $t$. $N$ and $\varepsilon$ are the set of nodes in the sensor network and the inequality aversion parameter. $n$ and $E_i(t)$ are the number of sensors and the remaining energy of sensor $i$ at time $t$, respectively, and $\bar{E}(t)$ is the average remaining energy. Basically, the energy inequality index $I_N(t, \varepsilon)$ is derived from the Atkinson inequality index [15]. Social scientists use the index to measure the inequality among entities with respect to their income. The aversion parameter $\varepsilon$ reflects the strength of society's penalty for inequality, and can take values ranging from zero to infinity. When $\varepsilon$ equals zero, no penalty for inequality accrues. As $\varepsilon$ rises, society penalizes inequality more. The values of $\varepsilon$ that are typically used include 1.5 and 2.5 [16, 17]. This aversion parameter provides the flexibility to apply this metric to diverse sensor network applications.

3.2 Energy-Welfare (EW)

A drawback exists in considering only the energy balance of a sensor network. If a routing algorithm pursues only energy balance without considering energy efficiency, sensors' residual energies might converge to a low value. That is, sensors may use their energy in an inefficient way to achieve the energy balance of the sensor network. Therefore, Energy-Welfare (EW) considers the energy efficiency and energy balance of sensor networks simultaneously. EW is simply the average of sensors' residual energies weighted by EE. We can calculate EW using Equation (3):
$$EW(t) = \bar{E}(t) \times EE(t, \varepsilon). \qquad (3)$$
The equation for EW has the same form as the Atkinson welfare function [18]. The EW is high where the average and equality of sensors' remaining battery power are both high. A low average, or a low EE, leads to a low EW. Therefore, EW is an appropriate metric for designing a routing algorithm that improves sensor networks from both the energy efficiency and energy balance perspectives. Additionally, since a sensor
network having a high EW can monitor events at unknown positions for a long time, the EW can be regarded as the preparedness of a sensor network for upcoming events.
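As a sketch (with our own function name), Equations (1)-(3) translate into a few lines; run on the two alternatives of Fig. 1 with ε = 2.5, the function reproduces the values EW = 30.36 for Path 1 (residual energies 40 and 25) and EW = 27.2 for Path 2 (49 and 20):

```python
import numpy as np

def energy_welfare(E, eps=2.5):
    """EW = average residual energy x Atkinson-style equality, Eqs. (1)-(3)."""
    E = np.asarray(E, dtype=float)
    mean = E.mean()
    if eps == 1.0:  # the eps -> 1 limit of (2) is based on the geometric mean
        equality = np.exp(np.log(E / mean).mean())
    else:
        equality = np.mean((E / mean) ** (1.0 - eps)) ** (1.0 / (1.0 - eps))
    return mean * equality

print(energy_welfare([40, 25]))  # Path 1 of Fig. 1 -> about 30.36
print(energy_welfare([49, 20]))  # Path 2 of Fig. 1 -> about 27.2
```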
4 Maximum Energy Welfare (MaxEW) Routing

In this paper, we assume that each sensor uses a fixed transmission power for communicating with its neighboring nodes. To send a message from a sensor to the base station, the total transmission power required is minimized if the sensor communicates directly with the base station. The sensors are aware of the minimum transmission power required to send a message to the base station and the current battery level of their neighbors. The basic idea of the proposed routing algorithm is simply to use the path which maximizes the EW of the sensors. When a node i needs to send data to the base station, it can transmit the data to the base station directly or route the data to one of its neighbors (Ni). To evaluate these alternatives, node i calculates the EW of a local society, which consists of its neighbors and itself, for each alternative. That is, the node can anticipate the residual energies of its neighboring nodes and itself when the data is routed according to its decision, as in (4):
$$\{E_j(t) : j \in N_i + \{i\}\} \xrightarrow{D_i(t)} \{E_j(t+1) : j \in N_i + \{i\}\}. \qquad (4)$$
Also, this node can calculate the EWij of these expected residual energies, Ej(t+1), for each decision, Di(t), by (5). By comparing these expected EWs, the node routes data to the path allowing the maximum EW of the local society, Ni+{i}.
$$EW_{ij}(t+1) = \bar{E}(t+1)\left(1 - I_N(t+1, \varepsilon)\right), \qquad (5)$$

where the average and the inequality index are taken over the local society $N_i + \{i\}$.
Through this decision-making scheme, MaxEW tries to maximize the energy welfare of the entire sensor network. In MaxEW, each sensor keeps a small routing table only for its neighboring nodes. The routing table contains the node identification number, the minimum transmission power to the base station, and the available energy for each neighbor node. The details of the algorithm are:

Initialize routing table. During the setup period, each sensor finds its minimum transmission power to the base station. Then, each sensor broadcasts a setup message to its neighboring nodes with a pre-set transmission power. This setup message includes the node ID, the minimum transmission power to the base station, and the available energy. Every node receiving this broadcast message registers the transmitting node as one of its neighbors. Since all nodes have an identical neighbor distance, two nodes within the neighbor distance are neighbors of each other. After the setup period, all sensors initialize their routing tables.
Algorithm 1. MaxEW routing algorithm (at node i and time t)

For all j ∈ Ni + {i} do
    If j = i then
        For all k ∈ Ni + {i} do
            If k = i then Ek(t+1) = Ek(t) − TEBk
            Else Ek(t+1) = Ek(t)
            End If
        End For
    Else
        For all k ∈ Ni + {i} do
            If k = i then Ek(t+1) = Ek(t) − TENk
            Else If k = j then Ek(t+1) = Ek(t) − TEBk
            Else Ek(t+1) = Ek(t)
            End If
        End For
    End If
    Compute EWij by (5)
End For
Choose J = ArgMax_{j ∈ Ni + {BS}} (EWij) and route the data
Update routing table. A change in a neighbor's energy level should be reflected in the routing table. When a sensor transmits data, all of its neighbors receive this data and obtain the current battery level of the transmitting sensor. As a result, whenever a sensor's battery level changes, all routing tables including the corresponding sensor's information are updated.

Localized routing decision. Based on their routing tables, the nodes make local routing decisions. Algorithm 1 gives a high-level description of the MaxEW algorithm at node i and time t. TEBk and TENk are the transmission energies required from node k to the base station and to neighboring nodes, respectively. Also, EWij is the expected energy welfare of Ni + {i} when node i routes data to node j. Based on this algorithm, node i selects J as the best candidate for transmitting data to the base station, without considering whether J sends the data directly to the base station or not. If node i itself is selected as the best node, it sends the data to the base station, finishing the routing process. Otherwise, the data is routed to node J, and J performs the same process. This process continues until the base station receives the data. This localized decision-making process results in a monotonic increase of EW, because the best candidate can have a better indirect path than direct transmission.
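A compact sketch of the local decision in Algorithm 1 (our names; it reuses the energy_welfare helper above, and TEB/TEN map each node to its transmission energy to the base station and to a neighbor, respectively):

```python
def maxew_route(i, neighbors, E, TEB, TEN, eps=2.5):
    """Pick the next hop that maximizes the predicted EW of the local
    society N_i + {i}; returning i itself means "transmit directly"."""
    society = list(neighbors) + [i]
    best_j, best_ew = i, float("-inf")
    for j in society:
        E_next = dict(E)             # predicted residual energies E_k(t+1)
        if j == i:
            E_next[i] -= TEB[i]      # direct transmission to the BS
        else:
            E_next[i] -= TEN[i]      # hop to neighbor j ...
            E_next[j] -= TEB[j]      # ... which is charged a BS transmission
        ew = energy_welfare([E_next[k] for k in society], eps)
        if ew > best_ew:
            best_j, best_ew = j, ew
    return best_j
```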
Fig. 2. An example routing path: ni sends data to nj, nj to nk, then nk sends to the base station directly
Fig. 2 shows how the MaxEW algorithm operates over a sensor network. For a given data packet, ni chooses nj among several possible routes. After the data passes to nj, the energy level of ni changes and the routing table of nj also changes. nj then performs the same process. In the figure, nk sends data to the base station directly because transmission by nk itself results in the maximum EW of nk's local society compared to routing via its neighbors. MaxEW guarantees the elimination of loops in any routing path. In MaxEW, a sensor routes data to a neighbor only if the neighbor yields more energy welfare than the sensor itself. As this routing mechanism continues, the expected energy welfare increases strictly from each node to the next downstream node. Therefore, MaxEW always finds a routing path to the base station without loops.
5 Experimental Results

In this section, several experimental results validate the effectiveness of the MaxEW algorithm. The algorithm is compared with three other algorithms discussed in [4, 7]: Direct Communication (DC), Minimum Transmission Energy (MTE), and Self-Organized Routing (SOR). In DC, every sensor simply transmits data directly to the base station without considering any energy-efficient indirect path. MTE and SOR consider indirect routing to save sensor power, but make routing decisions based on energy efficiency only. The MaxEW algorithm tries to achieve an energy balance of the network by maximizing the EW of a local society in a decentralized manner. Experimental results show that this approach is valid for extending the lifetime of sensor networks and is robust for different event types.

The experiments use a sensor network in which 100 nodes are deployed randomly and uniformly in a 100m×100m square area, with the base station located at (50, 150) (see Fig. 3). In the sensor network, sensors have an initial battery level of 250,000. The initial energy levels are established by determining the amount of energy needed for the farthest node to transmit data to the base station 100 times with DC, as also used in [4]. To discuss the effect of different event generation types on the lifetime of
Fig. 3. The configuration of the experimental sensor network: 100 nodes, randomly and uniformly deployed in a 100m×100m square area, with the base station located at (50, 150) [6, 7]
a sensor network, simulations were performed using uniform, random, and repeat event generation functions. In the case of the random distribution, 25% of the sensors have events occurring randomly in each round, while for the repeat events, the assumption is that the sensors located from (0, 0) to (50, 50) incur repeated events. Because a sensor network is generated randomly, 100 repeated experiments for each condition provide an average of the results. Lastly, the neighbor distance of sensors (for MTE and MaxEW) and the aversion parameter ε (for MaxEW) are set to 15m and 2.5, respectively.

Table 1. Lifetime (L1, L10, L20) for Direct, MTE, SOR, and MaxEW with Uniform, Random, and Repeat Events

                 Uniform                    Random                       Repeat
           L1      L10     L20        L1       L10      L20        L1      L10     L20
  Direct   105.3   123.4   142.3      492.7    618.1    717.2      107.8   145.8   193.3
  MTE      14.4    67.1    115.5      74.6     337.4    581.1      30.0    223.7   415.6
  SOR      28.7    145.9   109.9      111.3    562.9    771.4      154.1   316.9   418.2
  MaxEW    202.5   258.3   268.9      1012.7   1239.9   1346.7     685.6   970.6   1020.6
Table 1 gives the lifetime of the sensor network (L1, L10, L20) for the Direct Communication, MTE, SOR, and MaxEW algorithms with the three different event generation types. As shown in Table 1, MaxEW dominates Direct, MTE and SOR over time. In particular, in the case of L1, MaxEW gives approximately two to eight times better performance than the others. Fig. 4 shows how well MaxEW achieves an energy balance of sensors over the network. As discussed in [7], under the Direct (Fig. 4(a)), MTE (Fig. 4(b)) and SOR (Fig. 4(c)) routing schemes, sensors both far away from and close to the base station run out of energy by around round 150, while in MaxEW all sensors remain alive and even have sufficient energy for responding to upcoming events (Fig. 4(d)). Also notable is that Direct, MTE, and SOR missed some events during the first 150 rounds, whereas MaxEW guaranteed that all data was transmitted to the base station over the same period.
Fig. 4. The remaining energy distributions of sensors with uniform events at round 150 for DC, MTE, SOR, and MaxEW
Fig. 5. Routing paths by Direct, MTE, SOR, and MaxEW with repeat events on the region from (0, 0) to (50, 50)
Fig. 5 shows the routing paths of the four algorithms with repeated events in the region from (0, 0) to (50, 50). In the cases of Direct, MTE, and SOR, data traffic concentrates on specific sensors located in that region or on the most energy-efficient path. On the other hand, MaxEW tries to spread energy usage over the whole network to achieve energy balance. As a result, MaxEW can keep all sensors operating for as long as possible.
6 Conclusion and Future Work

Sensor networks should be able to achieve energy balance as well as energy efficiency, yet most energy-aware routing algorithms are concerned only with energy efficiency. This paper has presented a performance measure, called Energy Welfare, that considers the energy balance and energy efficiency of sensor networks simultaneously. Based on this metric, we propose the Maximum Energy Welfare (MaxEW) routing algorithm. We demonstrate the superiority of this routing algorithm over Direct Communication, MTE, and SOR with a lifetime metric that is generally accepted for the evaluation of routing algorithms. Additionally, the experimental results show that MaxEW is robust to several event generation functions. To build the EW metric and the MaxEW algorithm, we here use the Atkinson welfare function and set the inequality aversion parameter to 2.5. Many alternative welfare functions are available in social science, and the inequality aversion parameter is tunable. In the future, we will apply alternative social welfare functions and different aversion parameters to enhance our results. We currently use three types of event generation functions for the evaluation of our routing algorithm; future work will involve the development of more diverse and detailed event generation functions. In addition, we can consider a general multi-hop communication scenario where only a few sensors can communicate with the base station. In this scenario, the required transmission energy from a sensor to the base station can be calculated from the number of hops to the base station. As future work, we will investigate how well MaxEW works in this general multi-hop scenario.

Acknowledgments. This work was supported in part by NSF under the grant NSFSST 0427840. Any opinions, findings, and conclusions or recommendations presented in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.
References
1. D. Estrin, R. Govindan, J. Heidemann, and S. Kumar, Next century challenges: scalable coordination in sensor networks, Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, Seattle, Washington, USA, 263-270 (1999).
2. I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, A survey on sensor networks, IEEE Communications Magazine, 40(8), 102-114 (2002).
3. D. Braginsky and D. Estrin, Rumor routing algorithm for sensor networks, Proceedings of the First Workshop on Sensor Networks and Applications (WSNA), 22-31 (2002).
4. A. Rogers, E. David, and N. R. Jennings, Self-organized routing for wireless microsensor networks, IEEE Transactions on Systems, Man and Cybernetics, Part A, 35(3), 349-359 (2005).
5. Z. Butler and D. Rus, Event-based motion control for mobile-sensor networks, IEEE Pervasive Computing, 2(4), 34-42 (2003).
6. S. Lindsey, C. Raghavendra, and K. M. Sivalingam, Data gathering algorithms in sensor networks using energy metrics, IEEE Transactions on Parallel and Distributed Systems, 13(9), 924-935 (2002).
7. W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, Energy-efficient communication protocol for wireless microsensor networks, Proceedings of the 33rd Hawaii International Conference on System Sciences (2000).
8. J. Chang and L. Tassiulas, Maximum lifetime routing in wireless sensor networks, IEEE/ACM Transactions on Networking, 12(4), 609-619 (2004).
9. J. Wagner and R. Cristescu, Power control for target tracking in sensor networks, 2005 Conference on Information Sciences and Systems, The Johns Hopkins University (2005).
10. R. Ramanathan and R. Rosales-Hain, Topology control of multihop wireless networks using transmit power adjustment, INFOCOM 2000, 404-413 (2000).
11. H. Zhang and J. C. Hou, Maximizing α-lifetime for wireless sensor networks, SenMetrics 2005, 70-77 (2005).
12. R. Wattenhofer, L. Li, P. Bahl, and Y. Wang, Distributed topology control for power efficient operation in multihop wireless ad hoc networks, INFOCOM 2001, Anchorage, AK, USA, 1388-1397 (2001).
13. M. Stemm and R. H. Katz, Measuring and reducing energy consumption of network interfaces in hand-held devices, IEICE Transactions on Communications, E80-B(8), 1125-1131 (1997).
14. S. R. Gandham, M. Dawande, R. Prakash, and V. S., Energy efficient schemes for wireless sensor networks with multiple mobile base stations, Proceedings of IEEE Globecom 2003, 377-381 (2003).
15. A. B. Atkinson, On the measurement of inequality, Journal of Economic Theory, 2(3), 244-263 (1970).
16. D. A. Seiver, A note on the measurement of income inequality with interval data, Review of Income and Wealth, 25(2), 229-233 (1979).
17. J. G. Williamson, Strategic wage goods, prices, and inequality, American Economic Review, 67(2), 29-41 (1977).
18. A. K. Sen and J. E. Foster, On Economic Inequality (Oxford: Clarendon Press, 1997).
Analysis of Location Privacy/Energy Efficiency Tradeoffs in Wireless Sensor Networks
Sergio Armenia¹, Giacomo Morabito², and Sergio Palazzo²
¹ CNIT - Research Unit of Catania, Italy, [email protected]
² Universita' di Catania, DIIT, V.le A. Doria, Catania, Italy, [email protected], [email protected]
Abstract. In this paper an analytical framework is proposed for the evaluation of the tradeoffs between location privacy and energy efficiency in wireless sensor networks. We assume that random routing is utilized to improve privacy. However, this involves an increase in the average path length and thus an increase in energy consumption. The privacy loss is measured using information theory concepts; indeed, it is calculated as the difference between the uncertainties on the target location before and after the attack. To evaluate both privacy loss and average energy consumption, the behavior of the routing protocol is modeled through a Markov chain in which states represent the nodes traversed by a packet on its way to the sink. The analytical framework can be used by designers to evaluate the most appropriate setting of the random routing parameters depending on the privacy and/or energy efficiency requirements.
1 Introduction
It is well known that wireless networks have serious privacy problems, mainly because the broadcast nature of the radio channel allows all stations in the proximity of a sender to overhear the frames it sends. Even if network devices make use of encryption algorithms, confidentiality is usually provided for the data field only, whereas the header/tail fields remain in plain text. Therefore, given that during normal activity packets are frequently sent using the broadcast address as destination, an eavesdropper can receive and process them without any effort and thus obtain information about the sender. This, combined with the fact that wireless devices usually have a fixed address, allows attackers to link a device address to a user identity, to a device position, or to the type of application utilized. In wireless sensor networks (WSNs) the above problems are amplified and new issues arise. In fact, WSNs are based on the wireless multihop communication paradigm and therefore eavesdropping attacks can be accomplished more easily. Furthermore, WSN applications are pervasive by nature and, as a consequence, a lot of sensitive user information can be stolen by attackers. In the recent past much attention has been devoted to key distribution in the WSN cryptography domain. Accordingly, several solutions have been proposed for pre-distributing keys or for reducing their size.
However, secure cryptography does not guarantee privacy, as already noted. Indeed, some research work has recently appeared that deals with the relationship between routing and location privacy in WSNs. In fact, radio activity at an intermediate node can be used to obtain information about the position of the information source. In [3] a formal model of the source-location privacy problem is provided, and two popular classes of routing protocols, namely flooding protocols and single-path protocols, are analysed from the privacy and energy consumption standpoints. Based on such analysis, a new technique called phantom routing is proposed that combines the advantages of both of the above classes of routing protocols and provides suitable protection of the source location while not causing a noticeable increase in energy consumption. In [5] the authors propose GROW (Greedy Random Walk), a two-way random walk to reduce the chance that an eavesdropper can collect the source-location information. Note that both of the above research contributions are simulation-based. Differently, in this paper we introduce an analytical framework for the evaluation of the tradeoff between location privacy and energy efficiency in wireless sensor networks. To this end we extend the definition of privacy loss based on information theory concepts, proposed in [1] for data mining systems, to the case of location privacy in sensor networks. More specifically, we focus on the relationship between random routing design choices and privacy loss as well as energy efficiency. Accordingly, we will derive a Markov-based model of the random routing behavior that allows us to calculate the privacy loss as well as the average energy consumption. Numerical results confirm that, as expected, energy efficiency and privacy are competing requirements. The framework can be used by protocol designers to set appropriate tradeoffs between the two above requirements. The remainder of this paper is organised as follows: in Section 2 we present the system model along with a statement of the problem. In Section 3 we evaluate the privacy loss and the energy consumption. Some numerical results are provided in Section 4 and, finally, conclusions are drawn in Section 5.
2 Problem Statement and System Model
In this section we first state the problem of location privacy in wireless sensor networks (WSNs). More in detail, in Section 2.1 we define the problem using the panda-hunter game scenario; then, in Section 2.2, we introduce the system model that will be utilized in the remainder of the paper.

2.1 Statement of the Problem: The Panda-Hunter Game
The panda-hunter game is a well-known reference scenario utilized for the study of source-location privacy in WSNs [4,3]. Suppose that a set of sensor nodes has been deployed in a random way by the Save The Panda Organisation within a large area in order to study and monitor
the panda habitat. Sensor nodes are able to detect the panda's presence. At any time, while the panda freely moves, there is always a sensor node, called the source node, that detects the panda's position. Such an observation must be periodically reported to a sink node via multihop routing techniques. In this way the current position of the panda is approximately the position of the current source node. Thus, when the sink node receives a message from the source node, it knows the panda's position. We suppose that transmissions are encrypted, so the source node ID field cannot be read by attackers. Moreover, we assume that the relationship between node ID and node location is known only by the sink node. In the area there is a hunter as well, with the role of adversary. He aims to catch the panda, and thus he is an enemy from the Save The Panda Organisation's standpoint. The hunter is not able to decrypt messages and therefore cannot learn, at least not directly, the location of the source node. To capture the worst case, we consider the hunter, as in [3], to be: non-malicious, i.e., he does not interfere with the proper functioning of the network; device-rich, i.e., he is equipped so that he can measure the signal strength and angle of arrival of any message; resource-rich, i.e., he has an unlimited amount of power; and informed, i.e., he knows the location of the sink and the network structure and protocols. Using his devices and resources the hunter can analyse messages at the RF level, so he can try to capture the panda by back-tracing the routing path used by messages until he reaches the source. As an example, consider the sensor network represented in Figure 1. There are N = 11 sensor nodes n0, n1, ..., n10, with n0 representing the sink.
Fig. 1. Example of hunting activity
In Figure 1 we show the shortest path routing tree connecting each sensor node ni to the sink n0 , i.e., node n0 is the root of the tree. If the hunter is located near node n6 and detects radio activity, then a node in the set {n7 , n8 , n9 , n10 } is the source node. Instead, if no activity is detected, then the panda is near one of the remaining nodes, i.e., a node in the set {n1 , n2 , n3 , n4 , n5 } is the source node.
218
S. Armenia, G. Morabito, and S. Palazzo
Observe that in either case the hunter splits the network and obtains information about the panda's location. This leads to a strict connection between location privacy and the routing protocol in a WSN. Routing protocols must be privacy-aware in order to save, or at least prolong, the panda's life.
Fig. 2. Example of random routing
A simple way to improve privacy is to introduce some randomness in the routing behavior. Indeed, in random routing the next relay is chosen randomly among all the neighbors of the current relay. As an example, in Figure 2 we show a path obtained by applying random routing in the same WSN shown in Figure 1. In this case, the fact that node n3 is forwarding a packet does not mean that the source node is in the set {n3, n4}. However, the length of the path followed by packets in random routing can increase significantly, which involves large energy consumption. In other words, the increase in privacy is achieved at the expense of higher energy consumption. It follows that appropriate tradeoffs are needed.

2.2 System Model
Let us consider a WSN composed of M nodes denoted as n0, n1, ..., n(M−1). For any node ni, with i < M, we define Φ(ni) as the set of neighbors¹ of ni and φ(ni) as the number of its neighbors, i.e., φ(ni) = |Φ(ni)|. Now suppose that n0 is the sink and let us call d(ni) the distance between node ni and the sink n0. Obviously, d(n0) = 0 and $d(n_i) = \min_{n \in \Phi(n_i)} \{d(n) + 1\}$. Observe that the routing of packets towards the sink in a sensor network can be modeled by means of a matrix $Q \in \mathbb{R}^{(M-1)\times(M-1)}$, the generic element of which, $[Q]_{i,j}$, represents the probability that the next relay of a packet transmitted by ni is nj, with i, j ∈ [1, M − 1]. We define Q as the routing matrix.
¹ We say that two nodes are neighbors if they are within the radio coverage of each other.
In order to model random routing we define the best next relay ψ(ni) as the neighbor of ni which is closest to the sink, i.e., a node that satisfies the following relationship:

$$d(\psi(n_i)) \le d(m), \quad \forall m \in \Phi(n_i) \qquad (1)$$

Let us stress that even if several nodes may satisfy the relationship in eq. (1), for each ni only one node ψ(ni) is selected. Accordingly, if shortest path routing is utilized, $[Q]_{i,j}$ is equal to 1 if nj is the best next relay, i.e., if nj = ψ(ni), and is equal to 0 otherwise. We define as p-random routing a routing algorithm which chooses the best next relay with probability p and any other neighbor node with equal probability. Accordingly, the routing matrix of a p-random routing protocol is

$$[Q]_{i,j} = \begin{cases} p & \text{if } n_j = \psi(n_i) \text{ and } \phi(n_i) > 1 \\ \frac{1-p}{\phi(n_i)-1} & \text{if } n_j \ne \psi(n_i) \text{ and } \phi(n_i) > 1 \\ 1 & \text{if } n_j = \psi(n_i) \text{ and } \phi(n_i) = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

Note that if p is equal to 1, then p-random routing reduces to shortest path routing.
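As a concrete illustration, the following sketch builds the routing matrix of eq. (2), assuming neighbor lists and hop distances to the sink are already available; the function name and input layout are hypothetical.

```python
import numpy as np

def p_random_routing_matrix(neighbors, dist, p, M):
    """Routing matrix Q of eq. (2) for p-random routing.

    neighbors[i]: list of neighbor ids of node n_i (id 0 is the sink n_0);
    dist[i]: hop distance d(n_i) to the sink. Rows/columns of Q cover nodes
    n_1..n_{M-1}; probability mass routed to the sink leaks out of Q, so
    rows of sink-adjacent nodes sum to less than 1.
    """
    Q = np.zeros((M - 1, M - 1))
    for i in range(1, M):
        phi = len(neighbors[i])
        best = min(neighbors[i], key=lambda m: dist[m])  # psi(n_i), eq. (1)
        for j in neighbors[i]:
            if j == 0:
                continue                 # transmission to the sink: leaks
            if phi == 1:
                Q[i - 1, j - 1] = 1.0    # single neighbor, necessarily best
            elif j == best:
                Q[i - 1, j - 1] = p
            else:
                Q[i - 1, j - 1] = (1.0 - p) / (phi - 1)
    return Q
```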
3 Performance Analysis
We define and derive the location privacy loss when p-random routing is applied in Section 3.1. Then, in Section 3.2, we derive the corresponding energy consumption. Such performance metrics will be evaluated as a function of the probability p. This allows us to evaluate appropriate tradeoffs between privacy loss and energy consumption.
Privacy Loss
A measure of privacy is crucial to the evaluation of privacy enhancement solutions. Accordingly, in the recent past some research effort has been devoted to the definition of an appropriate privacy metric. In [6] an overview of the most interesting solutions is provided. Here we extend the definition proposed in [1] to the location privacy case. Let S be the random variable representing the current position of the panda. We identify the location with the sensor node that detects the presence of the panda. Accordingly, at any time the random variable S can assume a value in the set {n0 , n1 , ..., nM−1 }. Now suppose that following an attack, the hunter can observe a variable X which is correlated to S and can assume one of the following N values: {x0 , x2 , · · · , xN −1 }. The loss of privacy is related to the amount of information gained by the hunter following the attack. Such information is given by the difference between
the uncertainty on S before and after knowing X. In the context of information theory, the uncertainty of a random variable can be evaluated as its entropy, H(S). In [1] the loss of privacy is calculated as

$$\rho = 1 - 2^{-I(S,X)} \qquad (3)$$

where I(S, X) is the mutual information between S and X and is given by I(S, X) = H(S) − H(S|X). The uncertainty on S is (see [7] for example) defined as

$$H(S) = -\sum_{m=0}^{M-1} p_S(n_m) \log_2 [p_S(n_m)] \qquad (4)$$

where $p_S(n_m)$ represents the probability that the source node is $n_m$, whereas the uncertainty on S given X is

$$H(S|X) = -\sum_{m=0}^{M-1} \sum_{n=0}^{N-1} p_{SX}(n_m, x_n) \log_2 [p_S(n_m | x_n)] \qquad (5)$$

where $p_{SX}(n_m, x_n)$ represents the joint probability that S assumes the value $n_m$ and X assumes the value $x_n$, whereas $p_S(n_m | x_n)$ represents the probability that S assumes the value $n_m$ given that X assumes the value $x_n$. Obviously, the probability $p_S(n_m | x_n)$ can be calculated as

$$p_S(n_m | x_n) = \frac{p_{SX}(n_m, x_n)}{p_X(x_n)} = \frac{p_{SX}(n_m, x_n)}{\sum_{i=0}^{M-1} p_{SX}(n_i, x_n)} \qquad (6)$$
Suppose that all locations are equiprobable, i.e., $p_S(n_m) = 1/M$ for any $n_m$. Accordingly, the uncertainty on S given in eq. (4) can be calculated as $H(S) = \log_2 M$. Also, suppose that the hunter attacks the WSN at node n*. Following the attack, the hunter detects radio activity if and only if the path between the source node and the sink passes through node n*. Accordingly, X can assume only two values:

$$X = \begin{cases} 0 & \text{if there is no radio activity at node } n^* \\ 1 & \text{if there is radio activity at node } n^* \end{cases} \qquad (7)$$

As a consequence, we can rewrite eq. (5) as

$$H(S|X) = \sum_{m=0}^{M-1} \sum_{x=0}^{1} p_{SX}(n_m, x) \log_2 \frac{1}{p_S(n_m | x)} \qquad (8)$$

In eq. (8) we need to calculate the probability $p_{SX}(n_m, x)$, which can also be used in eq. (6) to calculate $p_S(n_m | x_n)$. The probability $p_{SX}(n_m, x)$ is given by

$$p_{SX}(n_m, x) = p_X(x | n_m) \cdot p_S(n_m) = p_X(x | n_m)/M \qquad (9)$$
Observe that $p_X(x|n_m)$ represents the probability that a packet generated by node $n_m$ does not pass through n*, if x = 0, and that such a packet passes at least once through n*, if x = 1. We will now calculate $p_X(1|n_m)$; once this is known, $p_X(0|n_m)$ can be easily evaluated as

$$p_X(0|n_m) = 1 - p_X(1|n_m) \qquad (10)$$

Recall that $p_X(1|n_m)$ is the probability that a packet generated by node $n_m$ passes through node n* at least once before reaching the sink n0. Let V be the random variable representing the hop at which the packet visits node n* for the first time. Applying the theorem of total probability, $p_X(1|n_m)$ can be calculated as the sum of the probabilities that a packet generated by node $n_m$ visits node n* at the V-th hop, for any value of V, i.e.,

$$p_X(1|n_m) = \sum_{v=0}^{\infty} p_{XV}(1, v|n_m) \qquad (11)$$
The probability in the sum on the right-hand side of eq. (11) is the probability that the packet generated by $n_m$ does not visit node n* and does not reach the sink until hop (v − 1) and, finally, visits node n* at the v-th hop. This can be calculated as:

$$p_{XV}(1, v|n_m) = w^{(m)} \cdot G^v \cdot [w^{(n^*)}]^T \qquad (12)$$
where:
– $w^{(j)}$ is an array of M − 1 elements, $w^{(j)} \in \mathbb{R}^{M-1}$, all set equal to zero with the exception of the j-th element, which is equal to 1, i.e.,

$$[w^{(j)}]_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \quad 1 \le i \le M-1 \qquad (13)$$

– G is an (M − 1) × (M − 1) matrix, $G \in \mathbb{R}^{(M-1)\times(M-1)}$, whose generic element $[G]_{i,j}$ represents the probability that a packet received by node $n_i$ will be relayed to node $n_j$, with $n_j \ne n^*$, and is not relayed by node n*. This can be obtained as follows:

$$[G]_{i,j} = \begin{cases} [Q]_{i,j} & \text{if } n_j \ne n^* \\ 0 & \text{if } n_j = n^* \end{cases} \qquad (14)$$

– $[w]^T$ represents the transpose of the array w.

Substituting eq. (12) in eq. (11) we can easily obtain:

$$p_X(1|n_m) = w^{(m)} \cdot \left( \sum_{v=0}^{\infty} G^v \right) \cdot [w^{(n^*)}]^T \qquad (15)$$
By applying the spectral decomposition to matrix $G = D \cdot B \cdot D^{-1}$, where B is a diagonal matrix containing the eigenvalues $\beta_i$ of G and D is the matrix whose columns are the corresponding eigenvectors, we can rewrite eq. (15) as follows:

$$p_X(1|n_m) = w^{(m)} \cdot D \cdot \left( \sum_{v=0}^{\infty} B^v \right) \cdot D^{-1} \cdot [w^{(n^*)}]^T \qquad (16)$$
We call K the sum on the right-hand side of eq. (16). It is easy to see that K is a diagonal matrix whose generic element is

$$[K]_{i,j} = \begin{cases} 1/(1 - \beta_i) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad (17)$$

Accordingly, eq. (16) can be rewritten as

$$p_X(1|n_m) = w^{(m)} \cdot D \cdot K \cdot D^{-1} \cdot [w^{(n^*)}]^T \qquad (18)$$

where K has been calculated in eq. (17). Once $p_X(1|n_m)$ has been calculated we have all the parameters required for the calculation of the uncertainty on S given X, i.e., H(S|X). Note that the value of the privacy loss ρ depends on n*. Since the hunter knows the structure and protocols of the network, the node n* which maximizes the privacy loss will be selected. As a consequence, the WSN gives a privacy loss γ given by $\gamma = \max_{n^*} \{\rho\}$.
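A minimal numerical sketch of this computation is given below. It evaluates the series of eq. (15) in closed form as $(I - G)^{-1}$ instead of through the spectral decomposition, and it assumes that no radio activity is observed anywhere when the source is the sink itself; function names are hypothetical.

```python
import numpy as np

def privacy_loss(Q, n_star):
    """Privacy loss rho of eq. (3) for an attack at node n_{n_star+1}
    (n_star is a 0-based index into Q)."""
    M = Q.shape[0] + 1
    G = Q.copy()
    G[:, n_star] = 0.0                      # eq. (14): never enter n_star
    # probability of visiting n_star before the sink, decomposed over the
    # hop of first passage: sum_v G^v Q evaluated as (I - G)^{-1} Q
    p1 = (np.linalg.inv(np.eye(M - 1) - G) @ Q)[:, n_star]
    p1[n_star] = 1.0      # the attacked node's own transmissions are heard
    p1 = np.concatenate(([0.0], p1))        # assumed: no activity if S = n_0
    pSX = np.stack([(1.0 - p1) / M, p1 / M], axis=1)   # eq. (9)
    pX = pSX.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_cond = np.log2(pSX / pX)        # log2 p_S(n_m | x), eq. (6)
        H_cond = -np.where(pSX > 0, pSX * log_cond, 0.0).sum()  # eq. (8)
    mutual = np.log2(M) - H_cond            # I(S, X) with H(S) = log2 M
    return 1.0 - 2.0 ** (-mutual)           # eq. (3)

def network_privacy_loss(Q):
    """gamma = max over attack positions n_star of rho."""
    return max(privacy_loss(Q, k) for k in range(Q.shape[0]))
```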
3.2 Energy Consumption
The energy consumption for routing a packet from its source to the sink can be calculated as the product of the energy cost of a single hop transmission, c, and the number of hops between the source node and the sink². We call Z the random variable representing the number of hops needed for a packet to reach the sink. The average energy consumption, ε, needed to route a packet to the destination can be calculated as

$$\epsilon = c \cdot E\{Z\} = c \cdot \sum_{z=1}^{\infty} z \cdot p_Z(z) \qquad (19)$$

² Observe that c can also take possible retransmissions into account. In this sense, the analysis of c is simple and is not reported in this paper for space constraints.
where E{Z} represents the average value of Z and $p_Z(z)$ represents the probability that the number of hops between the source and the destination is equal to z. The probability $p_Z(z)$ is the probability that a packet does not reach the sink in (z − 1) hops and finally arrives at the sink at the z-th hop. Therefore, it is easy to show that $p_Z(z)$ can be written in compact form as

$$p_Z(z) = \pi^{(S)} \cdot P^{z-1} \cdot \omega^T \qquad (20)$$
where
– $\pi^{(S)}$ is an array of (M − 1) elements, $\pi^{(S)} \in \mathbb{R}^{M-1}$. Its generic element is given by:

$$[\pi^{(S)}]_m = p_S(n_m) = 1/M, \quad 1 \le m < M \qquad (21)$$

– P is an (M − 1) × (M − 1) matrix, i.e., $P \in \mathbb{R}^{(M-1)\times(M-1)}$. Its generic element $[P]_{i,j}$ represents the probability that a packet received by node $n_i$ is transmitted to node $n_j$, with $n_j \ne n_0$. Accordingly, the generic element of P is given by:

$$[P]_{i,j} = [Q]_{i,j}, \quad i, j \in \{1, 2, \ldots, M-1\} \qquad (22)$$

– ω is an array of M − 1 elements, i.e., $\omega \in \mathbb{R}^{M-1}$. Its generic element $[\omega]_m$, with 1 ≤ m < M, represents the probability that a packet is relayed by node $n_m$ to the destination. Accordingly,

$$[\omega]_m = [Q]_{m,0} \qquad (23)$$
Applying the spectral decomposition of P and following a procedure analogous to that presented in Section 3.1, we can rewrite eq. (19) as

$$\epsilon = c \cdot \pi^{(S)} \cdot T \cdot H \cdot T^{-1} \cdot \omega^T \qquad (24)$$

In eq. (24) the matrix H is a diagonal matrix whose generic element is

$$[H]_{i,j} = \begin{cases} 1/(1 - \lambda_i)^2 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad (25)$$

where $\lambda_i$ is the i-th eigenvalue of P and T is the matrix of the eigenvectors of P.
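The corresponding computation can be sketched in the same style, evaluating $\sum_z z P^{z-1}$ directly as $(I - P)^{-2}$ rather than through the spectral form of eqs. (24)-(25); recovering ω as the probability mass of each row of Q that leaks toward the sink is an assumption, consistent with the p-random routing matrix of eq. (2).

```python
import numpy as np

def average_hops(Q):
    """E{Z} of eq. (19), i.e., the normalized energy consumption eps/c."""
    M = Q.shape[0] + 1
    pi_S = np.full(M - 1, 1.0 / M)          # eq. (21)
    omega = 1.0 - Q.sum(axis=1)             # eq. (23): one-hop mass to n_0
    inv = np.linalg.inv(np.eye(M - 1) - Q)  # P = Q, eq. (22)
    return float(pi_S @ inv @ inv @ omega)  # sum_z z P^{z-1} = (I-P)^{-2}
```

Sweeping p, building Q for each value, and evaluating network_privacy_loss(Q) and average_hops(Q) reproduces the kind of tradeoff curves discussed in the next section.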
4 Numerical Examples
In this section we apply the proposed analytical framework to describe how it can be used to evaluate the tradeoffs between location privacy and energy efficiency in a WSN. We consider a network of M sensor nodes uniformly distributed over a square area of size 1 km × 1 km. We assume that all sensor nodes have a coverage radius equal to R = 200 m. Once the positions of the sensor nodes are set and the value of the parameter p characterizing the random routing is known, it is possible to construct the routing matrix Q as given in eq. (2). Starting from the routing matrix Q it is possible to evaluate the privacy loss γ and the average energy consumption ε as reported in Section 3. All values in the following figures have been evaluated as the average of the results obtained in 20 cases. For each case, a new distribution of sensor nodes has been generated. Moreover, for each case individual routes are chosen considering the same sink node and a source node chosen randomly according to a uniform distribution.
Fig. 3. Privacy loss, γ, versus the probability p for different values of the number of sensor nodes, i.e., M = 50 and M = 100
In Figure 3 we show the privacy loss, calculated as described in Section 3.1, versus the value of the probability p for two different values of the number of nodes, i.e., M = 50 and M = 100. In Figure 3 the privacy loss increases as the probability p becomes higher. This is an expected result. Indeed, using low values of p makes the routing behavior fuzzy and therefore the hunter cannot obtain significant information by attacking the network.
Fig. 4. Normalized energy consumption ε/c versus the probability p for different values of the number of nodes, i.e., M = 50 and M = 100
In Figure 4 we show the average energy consumption needed to deliver a packet to the sink, ε, versus the probability p for M = 50 and M = 100. More specifically, in Figure 4 we show the values of the ratio ε/c. We present normalized energy consumption values because c depends on the specific communication technology utilized, and not on the routing algorithm. As expected, the energy consumption decreases as the value of the probability p increases; furthermore, the higher the
Fig. 5. Upper plot: average energy consumption versus privacy loss γ. Bottom plot: the value of the probability p versus the corresponding privacy loss γ
number of sensor nodes M, the lower the energy consumption. This is because, when there are more nodes, it is more likely to find good next relays than when there are few nodes. To highlight the tradeoff between privacy loss and energy efficiency, in Figure 5 we show two plots. In the upper plot we represent the normalized energy consumption, ε/c, versus the corresponding value of the location privacy loss, γ. As expected, the privacy loss increases as the energy consumption decreases. This figure has been obtained considering M = 100 nodes and can be utilized by the designer to select an appropriate tradeoff between energy efficiency and privacy. Once a point on the curve is chosen, the designer can use the bottom plot to obtain the corresponding value of p that gives the selected performance.
5 Conclusion
In this paper we have presented an analytical framework for the evaluation of the tradeoff between location privacy and energy efficiency in a WSN that applies random routing to increase privacy protection. The proposed framework is based on a Markov-based model of the random routing behavior. The framework can be used by network designers to evaluate the most appropriate value of the probability p characterizing the random routing behavior in accordance with the application requirements.
Acknowledgments
This paper has been partially supported by the European Commission under contract DISCREET (FP6-2004-IST-4 contract no. 27679).
References
1. D. Agrawal and C. Aggarwal, On the Design and Quantification of Privacy Preserving Data Mining Algorithms, Proc. of the Twentieth ACM SIGACT-SIGMOD-SIGART, Santa Barbara, California, USA, May 2001.
2. M. Anand, Z. G. Ives, and I. Lee, Quantifying Eavesdropping Vulnerability in Sensor Networks, Department of Computer & Information Science, University of Pennsylvania, 2005.
3. P. Kamat, Y. Zhang, W. Trappe, and C. Ozturk, Enhancing Source-Location Privacy in Sensor Network Routing, Proc. of the International Conference on Distributed Computing Systems (ICDCS 2005), Columbus, OH, USA, June 2005.
4. C. Ozturk, Y. Zhang, W. Trappe, and M. Ott, Source-Location Privacy for Networks of Energy-Constrained Sensors, Proc. of the IEEE Workshop on Software Technologies for Embedded and Ubiquitous Computing Systems (WSTFEUS), Vienna, Austria, May 2004.
5. Y. Xi, L. Schwiebert, and W. Shi, Preserving Source Location Privacy in Monitoring-Based Wireless Sensor Networks, Department of Computer Science, Wayne State University, 2006.
6. DISCREET Project, State of the Art Deliverable, http://www.ist-discreet.org/Deliverables/D2103.pdf
7. S. Haykin, Communication Systems, 4th edition.
Efficient Error Recovery Using Network Coding in Underwater Sensor Networks
Zheng Guo, Bing Wang, and Jun-Hong Cui
Computer Science & Engineering Department, University of Connecticut, Storrs, CT 06269
{guozheng,bing,jcui}@engr.uconn.edu
Abstract. Before the wide deployment of underwater sensor networks becomes a reality, one of the challenges that needs to be resolved is efficient error recovery in the presence of high error rates, node mobility and long propagation delays. In this paper, we propose an efficient error-recovery scheme that carefully couples network coding and multipath routing. Through an analytical study, we provide guidance on how to choose parameters in our scheme and demonstrate that our scheme is efficient in both error recovery and energy consumption. We evaluate the performance of our scheme using simulation and our simulation confirms the results from the analytical study.
1 Introduction

Underwater sensor networks are ideal vehicles for monitoring aqueous environments. However, before the wide deployment of underwater sensor networks becomes a reality, a range of challenges must be tackled [1,2,3]. One such challenge is efficient error recovery in the presence of high error rates, node mobility and long propagation delays (caused by fast-fading acoustic channels, water currents and slow acoustic communication). Using common error-recovery techniques such as Automatic Repeat reQuest (ARQ) and Forward Error Correction (FEC) in underwater sensor networks has the following drawbacks. ARQ-based schemes require the receiver to detect losses and then request the sender to retransmit packets. This may lead to long delays. FEC-based schemes proactively add redundant packets to eliminate retransmission from the source. FEC can be applied on an end-to-end or hop-by-hop basis (as in [4]). However, in either case, the proper amount of redundancy is hard to decide due to the difficulty of obtaining accurate error-rate estimates [3]. In our prior study [5], we demonstrate that network coding is a promising technique for error recovery in underwater sensor networks. The main idea of network coding [6,7] is that, instead of simply forwarding a packet, a node may code several incoming packets into one or multiple outgoing packets. Network coding is suitable for underwater sensor networks because (1) underwater sensor nodes are usually larger than land-based sensors and possess more computational capabilities [8]; (2) the broadcast property of acoustic channels naturally renders multiple highly interleaved routes
This work is supported in part by the NSF CAREER Grant No. 0644190 and in part by the Uconn Large Grant FRS 449251.
from a source to a sink. The computational power at the sensor nodes coupled with the multiple routes provides ample opportunity to apply network coding. In this paper, building upon our preliminary work [5], we provide an in-depth study on using network coding in underwater sensor networks. Our main contributions are as follows. First, we propose an error-recovery scheme that carefully couples network coding and multipath routing. Second, we analytically study the performance of this scheme along with several other error-recovery schemes. Our analysis provides guidance on how to choose parameters in our scheme and demonstrates that, among the multiple schemes, our scheme is most efficient in terms of error recovery and energy consumption. Last, we evaluate the performance of our scheme using simulation and the simulation confirms the results from the analytical study. As related work, multipath routing schemes have been proposed for error resilience in sensor networks (e.g., [9,8]). Our scheme carefully combines network coding and multipath routing and provides much better error recovery than using multipath routing alone (see Sections 4 and 5). The study of [10] provides error resilience using multiple virtual sinks: a source forwards packets to multiple high-bandwidth virtual sinks, which then forward the packets to the final destination. This scheme requires a specialized delivery infrastructure while our scheme does not have such a requirement. The rest of the paper is organized as follows. Section 2 describes the problem setting. Section 3 describes our error-recovery scheme based on network coding. Sections 4 and 5 study the performance of our scheme along with several other schemes using analysis and simulation respectively. Finally, Section 6 concludes the paper and presents future work.
2 Problem Setting

We now describe the problem setting. Consider a source-sink pair in an underwater sensor network. The path (or multipath) from the source to the sink is determined by a single-path (or multipath) routing algorithm. We refer to the intermediate nodes on the path(s) as relays. We consider several error-recovery schemes including single-path forwarding, end-to-end FEC, hop-by-hop FEC, multipath forwarding and network coding. In single-path and multipath forwarding, packets are simply forwarded, without any coding. Single-path forwarding is a baseline scheme since it does not exploit any extra mechanism for error recovery. Multipath forwarding recovers errors through redundant packets over the multiple paths (a relay does not forward duplicate packets). FEC-based schemes use a single path from the source to the sink: end-to-end FEC encodes packets at the source and decodes them at the sink; in hop-by-hop FEC, each relay on the path decodes incoming packets, encodes the recovered packets, and then forwards them to the next hop. Network coding requires multiple paths from the source to the sink; a node encodes incoming packets into one or multiple outgoing packets, as described in detail in Section 3. A packet successfully received (under single or multipath forwarding) or recovered (under FEC or network coding) is referred to as a successfully delivered packet. Since efficient error-recovery schemes for underwater sensor networks must achieve high
error-recovery rates and conserve sensor node energy simultaneously, we consider the following two metrics. The first metric is the number of successfully delivered packets over the total number of packets from the source, referred to as the successful delivery ratio and denoted as R. The second metric is the total number of transmissions from the source to the sink (including transmissions from the source and relays) normalized by the successful delivery ratio. Since the number of transmissions roughly corresponds to the amount of energy consumed in the network, we refer to this metric as the normalized energy consumption, denoted as T. This metric represents the average number of transmissions required per successfully delivered packet. We next describe our network coding scheme for underwater sensor networks and then evaluate the various schemes using analysis and simulation.
3 Using Network Coding in Underwater Sensor Networks

We now describe our error-recovery scheme based on network coding. This scheme carefully couples network coding and multipath routing to achieve a good balance between error recovery and energy consumption. In the following, we first describe how to apply network coding (we use random linear coding [11] due to its simplicity) given a set of paths from a source to a sink. We then describe how to adapt the multiple paths or the amount of redundancy to improve the efficiency of network coding.

3.1 Network Coding Scheme

Packets from the source are divided into generations; each generation contains K packets. The source linearly combines the K packets in a generation using randomly generated coefficients. More specifically, let $X_1, \ldots, X_K$ denote the K packets in a generation. The source linearly combines these K packets to compute K′ outgoing packets, denoted as $Y_1, Y_2, \ldots, Y_{K'}$, where $Y_i = \sum_{j=1}^{K} g_{ij} X_j$. The coefficient $g_{ij}$ is picked randomly from a finite field $F_{2^q}$. The set of coefficients $(g_{i1}, \ldots, g_{iK})$ is referred to as the encoding vector of $Y_i$ [7] and is carried in the packet as overhead. We choose K′ ≥ K since adding a small amount of redundancy at the source (e.g., K′ = K + 2) reduces the impact of packet loss on the first hop (which cannot be recovered at later hops) and improves error recovery at the sink [5]. A relay on the forwarding paths stores incoming packets from different routes in a local buffer for a certain period of time, then linearly combines the buffered packets belonging to the same generation. Suppose a relay, r, receives M incoming packets, $X_1^r, \ldots, X_M^r$. Let $(f_{i1}, \ldots, f_{iK})$ denote the encoding vector carried by $X_i^r$, i = 1, ..., M. Since transmitting linearly dependent packets is not useful for decoding at the sink, relay r computes M′ outgoing packets, where M′ is the rank of the coefficient matrix $(f_{ij})$, i = 1, ..., M, j = 1, ..., K. Therefore, M′ ≤ min(M, K). Let $Y_1^r, \ldots, Y_{M'}^r$ denote the outgoing packets, $Y_i^r = \sum_{j=1}^{M} h_{ij} X_j^r$, where $h_{ij}$ is picked randomly from the finite field $F_{2^q}$. Let $(g_{i1}^r, \ldots, g_{iK}^r)$ denote the encoding vector of $Y_i^r$, i = 1, ..., M′. Then $g_{ij}^r = \sum_{l=1}^{M} h_{il} f_{lj}$. When the sink receives K packets with linearly independent encoding vectors, it recovers the original packets by matrix inversion [7]. The complexity is $O(K^3)$.
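To make the encoding step concrete, here is a small self-contained sketch of random linear coding over $F_{2^8}$ as used above; the choice of the field polynomial (the AES polynomial 0x11B) is an assumption, since the paper only fixes the field size.

```python
import os

def gf256_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B       # reduce modulo the field polynomial
        b >>= 1
    return r

def encode_generation(packets, n_out):
    """Random linear coding of one generation (K = len(packets) native
    packets) into n_out coded packets, each returned together with its
    encoding vector (carried as a 3-byte overhead when K = 3)."""
    k = len(packets)
    coded = []
    for _ in range(n_out):
        g = list(os.urandom(k))                  # random coefficients g_ij
        payload = bytearray(len(packets[0]))
        for j, pkt in enumerate(packets):
            for b, byte in enumerate(pkt):
                # addition in GF(2^8) is XOR, so the linear combination
                # sum_j g_ij * X_j accumulates byte-wise
                payload[b] ^= gf256_mul(g[j], byte)
        coded.append((g, bytes(payload)))
    return coded
```

A relay would proceed the same way on its buffered packets, re-mixing them with fresh coefficients $h_{ij}$, and the sink would invert the K × K coefficient matrix by Gaussian elimination over the same field.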
Fig. 1. Illustration of transmitting a packet along multiple paths from the source to the sink. Nodes in a dashed circle form a relay set.
3.2 Path or Redundancy Adaption for Network Coding

The efficiency of network coding relies on the quality of the underlying paths determined by a multipath routing algorithm. We next describe a multipath property under which network coding is efficient (in terms of both error recovery and energy consumption). Fig. 1 illustrates the process of transmitting a packet along a multipath. The source broadcasts the packet to its downstream neighbors (nodes within its transmission range and on the forwarding paths), referred to as a relay set. Nodes in the relay set further forward the packet to their neighbors, forming another relay set. Intuitively, a multipath suitable for network coding should contain a similar number of nodes in each relay set. This is because a relay set with too few nodes may not provide sufficient redundancy, while a relay set with too many nodes wastes energy by providing more redundancy than is necessary for error recovery. We develop two schemes to adjust the multipath or the amount of redundancy to improve the efficiency of network coding. In both schemes, a node uses the number of its downstream neighbors to approximate the size of its downstream relay set. This is because the former can be easily estimated through a localization service (e.g., [12]) and localized communication between a node and its neighbors, while the latter is difficult to estimate. The first scheme requires that sensor nodes have multiple levels of transmission power [13]. A node selects a transmission power so that the estimated number of downstream neighbors is between Nl and Nu, where Nl and Nu are lower and upper thresholds respectively. We refer to this scheme as transmission-range adaption. In the second scheme, each node has a fixed transmission range and adapts the amount of redundancy that it injects into the network. More specifically, a node with fewer than Nl downstream neighbors encodes more outgoing packets to increase the amount of redundancy. Similarly, a node with more than Nu downstream neighbors encodes fewer outgoing packets to reduce the amount of redundancy (we only do this when the coefficient matrix at the node has a full rank of K). We refer to this scheme as redundancy adaption. Our analytical results in the next section provide guidance on how to choose parameters for the above two adaption schemes. Note that both schemes only require localized
information and hence are easy to deploy. Furthermore, they can be applied to mobile underwater sensor networks when coupled with a multipath routing scheme that supports mobility (e.g., [8]).
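As a small illustration, transmission-range adaption might look like the following sketch; the power_levels input, the estimator callback, and the default thresholds Nl = 3, Nu = 4 (the values suggested by the analysis of Section 4) are assumptions.

```python
def pick_tx_power(power_levels, est_downstream, n_l=3, n_u=4):
    """Transmission-range adaption: choose the lowest power level whose
    estimated number of downstream neighbors falls within [n_l, n_u].

    est_downstream(p) is a hypothetical callback returning the node's
    estimate of its downstream neighbors at power level p (obtained in
    the paper through a localization service and localized message
    exchange with neighbors)."""
    for p in sorted(power_levels):
        if n_l <= est_downstream(p) <= n_u:
            return p
    return max(power_levels)   # fall back to maximum power if none fits
```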
4 Analytical Study

We now analytically study the performance of the various error-recovery schemes of Section 2. Our goal is two-fold: (1) to analytically compare the efficiency of the various schemes; (2) to provide guidance on how to choose parameters in network coding. In the interest of space, we only present the results for multipath forwarding and network coding; the results for the other schemes can be found in [14]. Multipath forwarding and network coding use the same multipath from the source to the sink. Assume that there are H relay sets from the source to the sink, indexed from 1 to H (see Fig. 1). The sink is in the H-th relay set. Let $N_i$ be the number of relays in the i-th relay set. For simplicity, we assume that the relay sets do not intersect. Furthermore, a node in a relay set can receive from all nodes in the previous relay set. Last, a node only uses packets forwarded from its previous relay set (i.e., packets received from nodes in the same relay set are discarded). For both schemes, we derive the normalized energy consumption, T, from the successful delivery ratio, R, as follows. Consider an arbitrary packet (regardless of whether it is successfully delivered or not), and let $T_i$ denote the average number of times that it is transmitted from the nodes in the previous relay set (or the source) to those in the i-th relay set. Then

$$T = \frac{\sum_{i=1}^{H} T_i}{R} \qquad (1)$$

We assume that the acoustic channels have a bit error rate of $p_b$. Let p be the probability that a packet has a bit error. Then $p = 1 - (1 - p_b)^L$ for independent bit errors and a packet size of L bits. We next present the analysis for multipath forwarding and network coding.

4.1 Analysis of Multipath Forwarding

Consider an arbitrary packet P. Let $\alpha_i$ be the probability that a node in the i-th relay set receives packet P. Let $\alpha_{i,n}$ be the probability that n nodes in the i-th relay set receive packet P, $n = 0, \ldots, N_i$. Assume that packet losses are independent. Then

$$\alpha_i = \begin{cases} 1 - p, & i = 1 \\ \sum_{n=0}^{N_{i-1}} \alpha_{i-1,n} (1 - p^n), & 2 \le i \le H \end{cases} \qquad (2)$$

This is because, for a node in the first relay set, the probability that it receives packet P from the source is 1 − p; when i ≥ 2, a node in the i-th relay set receives packet P when it receives at least one copy of this packet from the (i − 1)-th relay set. Assume that packet transmissions to nodes in a relay set are independent. Then

$$\alpha_{i,n} = \binom{N_i}{n} \alpha_i^n (1 - \alpha_i)^{N_i - n}, \quad n = 0, \ldots, N_i \qquad (3)$$
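The recursion of eqs. (2)-(3) is straightforward to evaluate numerically; the sketch below also accumulates the transmission counts of eqs. (4) and (1), which are derived next, under the assumption that all relay sets have the same size N.

```python
from math import comb

def multipath_forwarding_metrics(p, H, N):
    """Return (R, T) for multipath forwarding: the successful delivery
    ratio R = alpha_H and the normalized energy consumption T."""
    alpha = 1.0 - p                    # eq. (2), first relay set
    total_tx = 1.0                     # eq. (4): T_1 = 1
    for _ in range(2, H + 1):
        total_tx += alpha * N          # eq. (4): T_i = alpha_{i-1} N_{i-1}
        # eq. (3): the number of holders in the previous set is binomial
        dist = [comb(N, n) * alpha**n * (1.0 - alpha)**(N - n)
                for n in range(N + 1)]
        # eq. (2): a node receives P iff at least one of n copies survives
        alpha = sum(dist[n] * (1.0 - p**n) for n in range(N + 1))
    return alpha, total_tx / alpha     # eq. (1)
```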
Since packet P is an arbitrary packet and the sink is in the H-th set, we have $R = \alpha_H$. The above results indicate that $\alpha_H$ can be obtained in the following manner. We first obtain $\alpha_{1,n}$ from $\alpha_1$ (of value 1 − p), and then obtain $\alpha_2$ using $\alpha_{1,n}$. This process continues until eventually $\alpha_H$ is obtained. Since a node forwards packet P at most once, we have

$$T_i = \begin{cases} 1, & i = 1 \\ \alpha_{i-1} N_{i-1}, & 2 \le i \le H \end{cases} \qquad (4)$$

After obtaining R and $T_i$, we calculate the normalized energy consumption T from (1).

4.2 Analysis of Network Coding

Consider an arbitrary generation of K packets. Under random linear coding, when the sink receives at least K packets in the generation, the probability that it can recover the K original packets is high for a sufficiently large finite field [11]. Therefore, for simplicity, we assume that the sink recovers the K original packets as long as it receives at least K packets in the generation. We do not differentiate nodes in the same relay set. Let $\beta_{i,k}$ be the probability that a node in the i-th relay set receives k packets (when 0 ≤ k < K) or at least k packets (when k = K) from all nodes in the previous relay set, 1 ≤ i ≤ H. Since the sink is in the H-th relay set and the generation is arbitrary, we have $R = \beta_{H,K}$. We next derive $\beta_{i,k}$, 1 ≤ i ≤ H, 0 ≤ k ≤ K. The nodes in the first relay set receive packets from the source. Therefore

$$\beta_{1,k} = \begin{cases} \binom{K'}{k} (1-p)^k p^{K'-k}, & 0 \le k < K \\ 1 - \sum_{j=0}^{K-1} \beta_{1,j}, & k = K \end{cases} \qquad (5)$$

where K′ ≥ K is the number of encoded packets from the source. For i ≥ 1, 0 ≤ k < K, we obtain $\beta_{i+1,k}$ as follows. We index the nodes in the i-th relay set from 1 to $N_i$. Let $\gamma_{i,j,k}$ denote the probability that a node in the i-th relay set receives k packets from the j-th node in the previous relay set, 1 ≤ i ≤ H, 1 ≤ j ≤ $N_{i-1}$, 0 ≤ k < K. Since each relay transmits no more than K packets, we have
$$\gamma_{i,j,k} = \sum_{n=k}^{K} \beta_{i-1,n} \binom{n}{k} (1-p)^k p^{n-k} \qquad (6)$$
For a node in the (i+1)-th set, let $k_j$ be the number of packets that it receives from the j-th node in the previous relay set. To obtain $\beta_{i+1,k}$, we need to consider all combinations of the $k_j$'s such that $\sum_{j=1}^{N_i} k_j = k$, with $k_j = 0, \ldots, k$. That is,

$$\beta_{i+1,k} = \sum_{\substack{k_j = 0, \ldots, k \\ \text{s.t. } \sum_{j=1}^{N_i} k_j = k}} \; \prod_{j=1}^{N_i} \gamma_{i+1,j,k_j} \qquad (7)$$
For a small generation size K, the above quantity is easy to compute. We use a small K (e.g., K = 3) since our study [5] indicates that a small K is sufficient to achieve good performance (this is also confirmed by simulation in the settings of Section 5). We obtain $\beta_{i+1,K}$ from $\beta_{i+1,k}$, 0 ≤ k < K, as

$$\beta_{i+1,K} = 1 - \sum_{k=0}^{K-1} \beta_{i+1,k} \qquad (8)$$
From the above, we calculate $R = \beta_{H,K}$ as follows. We first obtain $\beta_{1,k}$, which is used to compute $\gamma_{2,j,n}$ and $\beta_{2,k}$, 0 ≤ k ≤ K. This process continues until eventually $\beta_{H,K}$ is obtained. Since a relay transmits no more than K packets, we have

$$T_i = \begin{cases} K'/K, & i = 1 \\ \frac{N_{i-1}}{K} \sum_{k=0}^{K} k\, \beta_{i-1,k}, & 2 \le i \le H \end{cases} \qquad (9)$$

After obtaining R and $T_i$, we calculate the normalized energy consumption T from (1).
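A compact evaluation of eqs. (5)-(8) is sketched below for equal relay-set sizes N; instead of enumerating the combinations of eq. (7) explicitly, it convolves the per-sender distribution of eq. (6) N times, capping the count at K, which also yields eq. (8).

```python
from math import comb

def network_coding_delivery_ratio(p, H, N, K, K_src):
    """R = beta_{H,K}; K_src plays the role of K' at the source."""
    # eq. (5): packets surviving the first hop, pooled at K
    beta = [comb(K_src, k) * (1.0 - p)**k * p**(K_src - k) for k in range(K)]
    beta.append(1.0 - sum(beta))
    for _ in range(2, H + 1):
        # eq. (6): packets received from a single upstream node
        gamma = [sum(beta[n] * comb(n, k) * (1.0 - p)**k * p**(n - k)
                     for n in range(k, K + 1)) for k in range(K + 1)]
        # eq. (7): combine the N upstream senders by capped convolution
        dist = [1.0] + [0.0] * K
        for _ in range(N):
            nxt = [0.0] * (K + 1)
            for a, pa in enumerate(dist):
                for b, pb in enumerate(gamma):
                    nxt[min(a + b, K)] += pa * pb
            dist = nxt
        beta = dist[:K] + [1.0 - sum(dist[:K])]   # eq. (8)
    return beta[K]
```

The transmission counts of eq. (9) can be accumulated inside the same loop from $\sum_k k\, \beta_{i-1,k}$.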
4.3 Numerical Results
We next compare the various schemes based on our analytical results. The bit error rate is in the range of 10^-4 to 1.5 × 10^-3 to account for the potentially high loss rates in underwater sensor networks (e.g., due to fast channel fading). For network coding, a generation contains 3 packets (i.e., K = 3). The source transmits K′ = 5 packets. For multipath forwarding and network coding, we set the number of relay sets, H, to 7 or 9, and assume all relay sets contain the same number of nodes, i.e., $N_i = N$, i = 1, ..., H. Similarly, for single-path forwarding and FEC, we set the number of hops from the source to the sink to 7 or 9. For FEC, each block contains 3 packets (the same as the generation size in network coding) and the amount of redundancy is 3N − 3 since a relay set contains N nodes in multipath forwarding and network coding.
10
5
10
single−path forwarding end−to−end FEC hop−by−hop FEC network coding multi−path forwarding
4
10
3
10
2
10
1
10
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5
Bit Error Rate
(b) Normalized energy consumption.
Fig. 2. Numerical results, H = 9, N = 3 unless otherwise specified
−3
x 10
Fig. 2 plots the successful delivery ratio and the normalized energy consumption for the various schemes when H = 9. We observe that network coding outperforms the other schemes: it achieves the highest successful delivery ratio and the lowest normalized energy consumption over the range of bit error rates when N = 3 (i.e., each relay set contains 3 nodes). Furthermore, network coding achieves similar performance when H = 7 (not plotted), indicating that it is insensitive to the length of the path (network size). We also observe that when the number of nodes in each relay set, N, is decreased from 3 to 2, the successful delivery ratio of network coding drops sharply. Based on the above results, we set Nl to 3 in our simulations (Section 5). From Fig. 2, we also observe that multipath forwarding achieves a similar normalized energy consumption and a lower successful delivery ratio than network coding for the same value of N. The successful delivery ratio under hop-by-hop FEC is sensitive to both the bit error rate and the number of hops on the path (network size), indicating that the amount of redundancy needs to be carefully selected according to these two parameters. The successful delivery ratio under single-path forwarding and end-to-end FEC decreases significantly as the bit error rate increases, indicating that they are not suitable for high error-rate underwater sensor networks.
5 Simulation Study

We now evaluate the performance of the various error-recovery schemes using simulation. The underwater sensor network is deployed in a cubic target area of 1 km × 1 km × 1 km, which is a reasonable network size for underwater sensor networks. The source and sink are deployed at a bottom corner and a surface corner respectively, on the diagonal of the cube. The MAC layer supports broadcasting. The routes from the source to the sink are determined by Vector-Based Forwarding (VBF) [8]. In VBF, a routing pipe is a pipe centered around the vector from the source to the sink. Nodes inside the routing pipe are responsible for routing packets from the source to the sink; nodes outside the routing pipe simply discard all incoming packets. Relay set formation depends on the node density and the routing protocol used; with VBF it is easy to form relay sets, and we propose two techniques in Section 5.2 to adjust the relay sets. Each packet is 50 bytes. For network coding, each generation contains K = 3 packets; the source outputs K′ = 5 packets for each generation and each relay outputs no more than 3 packets. We choose the finite field $F_{2^8}$ [11], leading to packets of 53 bytes (including a 3-byte encoding vector). A relay has memory to store 10 packets for each generation; packets transmitted by the node are removed from the memory. We look at two types of sensor deployment: grid random deployment and uniform random deployment. In grid random deployment, the target area is divided into grids; a number of nodes are randomly deployed in each grid. In uniform random deployment, nodes are uniformly randomly deployed in the area. Grid random deployment covers the area more evenly than uniform random deployment, while uniform random deployment is easier to realize. The comparative results of the various schemes from simulation are consistent with those from the analytical study. We focus on the performance of network coding and multipath forwarding in the following.
Fig. 3. Simulation results under grid random deployment: (a) successful delivery ratio, (b) normalized energy consumption
5.1 Performance Under Grid Random Deployment

In grid random deployment, the target area is divided into 125 grids, each of size 200 m × 200 m × 200 m. Each grid contains 2 nodes, randomly distributed in the grid. Based on the analytical results in Section 4, we set the transmission power and pipe radius of a node to cover 3 to 4 downstream neighbors (with an average of 3.1). This is achieved when each node uses a transmission range of 300 m [15] and a pipe radius of 150 m. Figures 3(a) and (b) plot the successful delivery ratio and normalized energy consumption for network coding and multipath forwarding. The confidence intervals (from 20 simulation runs) are tight and hence omitted. We also plot the analytical results when N = 3 (i.e., each relay set contains 3 nodes). For network coding, we observe that the simulation results are very close to those from the analysis, indicating that the analysis provides a good approximation and guidance on choosing parameters in network coding. For multipath forwarding, the analytical results are slightly (no more than 8%) higher than those from the simulation. This might be because we assume in the analysis that a node can hear from all nodes in its previous relay set, which leads to an overestimate of the successful delivery ratio. We observe that network coding provides significantly better error recovery than multipath forwarding for high bit error rates. The normalized energy consumption under network coding is slightly higher than that under multipath forwarding because the source adds redundancy and more packets are forwarded at a relay in network coding (a relay discards duplicate packets in multipath forwarding). For the sake of comparison, we also plot the analytical result under hop-by-hop FEC in Fig. 3. When using this scheme, the number of hops (on the single path) from the source to the sink is 9, and a block contains 3 packets (to be consistent with the generation size in network coding). Each block adds 28/9 × 3 − 3 ≈ 7 redundant packets since the routing pipe used in network coding and multipath forwarding contains 28 nodes. Note that, although we purposely add a higher amount of redundancy for hop-by-hop FEC, it still achieves a much lower successful delivery ratio than network coding for relatively high bit error rates.
We now demonstrate that it is indeed important for a node to have 3 to 4 downstream neighbors for efficient network coding, as indicated by the analytical results. For this purpose, we either fix the transmission range to 300 m and vary the pipe radius, or fix the pipe radius to 150 m and vary the transmission range. The results are plotted in Figures 4(a) and (b), respectively, where the bit error rate is 1.5 × 10^{-3}. In both cases, we observe that a good balance between error recovery and energy consumption is achieved when the transmission range is 300 m and the pipe radius is 150 m (i.e., when a node has 3 to 4 downstream neighbors).
Fig. 4. Successful delivery ratio and normalized energy consumption under grid random deployment: (a) transmission range fixed at 300 m, varying pipe radius; (b) pipe radius fixed at 150 m, varying transmission range
5.2 Performance Under Uniform Random Deployment

We now present the results under uniform random deployment. Under this type of deployment, we find that using the same transmission range and pipe radius for all nodes cannot ensure 3 to 4 downstream neighbors for each node. We therefore allow a node to adjust its transmission range or the amount of redundancy that it injects into the network.

We first present the results under transmission-range adaptation. The pipe radius is set to 150 m. A node sets its transmission range to have 3 to 4 downstream neighbors (with an average of 3.3). The resulting transmission ranges are from 100 to 400 m across the nodes. Fig. 5 plots the successful delivery ratio under network coding. We observe that transmission-range adaptation achieves a successful delivery ratio similar to the analytical result using N = 3, indicating that transmission-range adaptation is effective for error recovery. For comparison, we obtain the results when all nodes use a transmission range of 300 m. This achieves a significantly lower successful delivery ratio (see Fig. 5) and higher normalized energy consumption (not plotted) than transmission-range adaptation.

We next present the results when all nodes use the same transmission range of 300 m and each adjusts the amount of redundancy according to the number of its downstream neighbors.
Fig. 5. Transmission-range adaptation in uniform random deployment: successful delivery ratio vs. bit error rate (×10^{-3}) for network coding (simu., adapt.), network coding (ana., N = 3), and network coding (simu., non-adapt.)
Fig. 6. Redundancy adaptation in uniform random deployment: successful delivery ratio vs. bit error rate (×10^{-3}) for network coding (ana., N = 3), network coding (simu., adapt.), and network coding (simu., non-adapt.)
In Fig. 6, a node adds one more outgoing packet when it has fewer than 3 downstream neighbors and removes one outgoing packet when it has more than 6 downstream neighbors. We observe that this adaptation achieves a successful delivery ratio similar to the analytical result using N = 3, with only slightly higher normalized energy consumption (not plotted). The above results demonstrate that adjusting redundancy is also effective for efficient error recovery under network coding, as the sketch below illustrates.
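A minimal sketch of this redundancy-adaptation rule, assuming a per-relay decision made after neighbor discovery; the function name and the base packet count are our own illustration (the thresholds 3 and 6 come from the text):

    def outgoing_packets(base_count, num_downstream):
        # Sparse neighborhood: inject one extra coded packet.
        if num_downstream < 3:
            return base_count + 1
        # Dense neighborhood: trim one coded packet, keeping at least one.
        if num_downstream > 6:
            return max(1, base_count - 1)
        return base_count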
6 Conclusion and Future Work

In this paper, we first proposed an efficient error-recovery scheme that carefully couples network coding and multipath routing for underwater sensor networks. We analytically studied the performance of our scheme along with several other error-recovery schemes. Our analysis provided guidance on how to choose parameters in our scheme and demonstrated that our scheme is the most efficient among those considered. Finally, we evaluated the performance of our scheme using simulation. Our simulation results confirmed the analytical finding that our scheme is efficient in both error recovery and energy consumption.

As future work, we are pursuing three directions: (1) analyzing traffic congestion and delay for network coding; (2) using network coding in multicast applications
in underwater sensor networks, e.g., command distribution or software update from one source to all other nodes; (3) using network coding in the architecture with multiple virtual sinks.
References

1. I. F. Akyildiz, D. Pompili, and T. Melodia, "Challenges for efficient communication in underwater acoustic sensor networks," ACM SIGBED Review, vol. 1, July 2004.
2. J. Heidemann, W. Ye, J. Wills, A. Syed, and Y. Li, "Research challenges and applications for underwater sensor networking," in Proceedings of the IEEE Wireless Communications and Networking Conference, Las Vegas, Nevada, USA.
3. J.-H. Cui, J. Kong, M. Gerla, and S. Zhou, "Challenges: Building scalable mobile underwater wireless sensor networks for aquatic applications," IEEE Network, Special Issue on Wireless Sensor Networking, June 2006.
4. P. Xie and J.-H. Cui, "SDRT: A reliable data transport protocol for underwater sensor networks," tech. rep., University of Connecticut, Computer Science and Engineering Dept., February 2006.
5. Z. Guo, P. Xie, J.-H. Cui, and B. Wang, "On applying network coding to underwater sensor networks," in Proceedings of ACM WUWNet'06, Los Angeles, CA, September 2006.
6. R. Ahlswede, N. Cai, S. R. Li, and R. W. Yeung, "Network information flow," IEEE Transactions on Information Theory, vol. 46, July 2000.
7. C. Fragouli, J.-Y. Le Boudec, and J. Widmer, "Network coding: An instant primer," ACM SIGCOMM Computer Communication Review, January 2006.
8. P. Xie, J.-H. Cui, and L. Lao, "VBF: Vector-based forwarding protocol for underwater sensor networks," in Proceedings of IFIP Networking'06, Coimbra, Portugal, May 2006.
9. D. Ganesan, R. Govindan, S. Shenker, and D. Estrin, "Highly-resilient, energy-efficient multipath routing in wireless sensor networks," ACM SIGMOBILE Mobile Computing and Communication Review, vol. 5, no. 4, 2001.
10. W. K. Seah and H. Tan, "Multipath virtual sink architecture for underwater sensor networks," in Proceedings of the MTS/IEEE OCEANS 2006 Asia Pacific Conference, May 2006.
11. T. Ho, R. Koetter, M. Medard, D. R. Karger, and M. Effros, "The benefits of coding over routing in a randomized setting," in International Symposium on Information Theory (ISIT), 2003.
12. V. Chandrasekhar, W. K. Seah, Y. S. Choo, and H. V. Ee, "Localization in underwater sensor networks - survey and challenges," in Proceedings of ACM WUWNet'06, Los Angeles, CA.
13. J. Wills, W. Ye, and J. Heidemann, "Low-power acoustic modem for dense underwater sensor networks," in Proceedings of ACM WUWNet'06, Los Angeles, CA, September 2006.
14. Z. Guo, B. Wang, and J.-H. Cui, "Efficient error recovery using network coding in underwater sensor networks," tech. rep., University of Connecticut, Computer Science and Engineering Dept., November 2006.
15. D. B. Kilfoyle and A. B. Baggeroer, "The state of the art in underwater acoustic telemetry," IEEE Journal of Oceanic Engineering, vol. OE-25, no. 5, pp. 4-27, 2000.
Key Predistribution Schemes for Sensor Networks for Continuous Deployment Scenario*

Abdülhakim Ünlü, Önsel Armağan, Albert Levi, Erkay Savaş, and Özgür Erçetin

Sabanci University, Istanbul, Turkey
{aunlu,onsel}@su.sabanciuniv.edu
{levi,erkays,oercetin}@sabanciuniv.edu
Abstract. In sensor networks, secure communication among sensor nodes requires secure links and consequently secure key establishment. Due to resource constraints, achieving such key establishment is non-trivial. Recently, some random key predistribution techniques have been proposed to establish pairwise keys. Some of these approaches assume that certain deployment knowledge is available prior to deployment and that nodes are deployed in groups/bundles. In this paper, we propose another practical deployment model in which nodes are deployed over a line one by one in a continuous fashion. In this model, sensor nodes can also be deployed over multiple parallel lines to cover a two-dimensional area. Based on this model, we develop two key predistribution schemes. Analysis and simulation results show that our key predistribution schemes make better use of the deployment knowledge than the existing schemes. Thus they perform better than other location-aware protocols in terms of connectivity, resiliency, memory usage, and communication cost for key establishment.
1 Introduction

In sensor networks [1], confidentiality, privacy, and authenticity of communication between sensor nodes are important when nodes are deployed in an environment where there are adversaries. In order to fulfill these security requirements, cryptographic techniques are employed. Generally, symmetric cryptography is used to provide security in sensor networks. In order to use symmetric key cryptography, communicating sensor nodes must share the same key. Distribution of keys to a large number of sensor nodes, so that they can establish secure links, is an active research area. Generally, key predistribution schemes [2-8], where the keys are stored in sensor nodes before deployment, are used for this purpose. A naïve way of key predistribution is to generate a master key and install this master key in all nodes before deployment. However, in this scheme, when a node is captured, the master key is also captured and all secure links in the sensor network are compromised.
* Albert Levi and Abdülhakim Ünlü are supported by the Scientific and Technological Research Council of Turkey under project number 104E071. Erkay Savaş is supported by the Scientific and Technological Research Council of Turkey under project number 104E007.
Another extreme of key predistribution is to assign unique link keys to each node pair. In this method, compromise of one node leads to compromise of only that node's links. However, this method is not scalable, since the number of keys predistributed per node must be as large as the number of nodes in the network in order to guarantee that every neighboring node pair shares a key after deployment.

To overcome this scalability problem and use node memory effectively, Eschenauer and Gligor proposed a probabilistic key predistribution scheme [5]. In this scheme, before sensor deployment, a key server creates a key ring for each node by picking a limited number of random keys from a large key pool. The key server then loads the key ring into the memory of each node. After deployment, sensor nodes in the field let their neighbors know which keys they have. If two neighboring nodes share one or more identical keys, they can establish a secure link. After this shared-key discovery with direct neighbors, neighboring node pairs that do not share keys can establish secure links over multiple hops. If the local connectivity (in terms of secure links) is above a certain threshold, then random graph theory [9] states that the overall sensor network will be cryptographically connected with high probability.

Du et al. utilized Blom's key management scheme [12] in a key predistribution scheme for sensor networks [4]. This scheme shows a threshold property: until λ nodes are captured, the network is perfectly secure, but after λ nodes are compromised, all secure links are compromised.

Some recent papers on random key predistribution [3,7,10,11] utilize the expected location information of sensor nodes in their models. In all these location-aware approaches, it is assumed that nodes are prepared in small groups and deployed as bundles; e.g., groups of nodes can be dropped from a plane, similar to parachuting troops or dropping cargo. The nodes in the same group have a very large chance of being in range of each other. Moreover, node groups that are dropped next to each other also have a chance of being close to each other on the ground. Using this deployment location knowledge, key pools and key rings are arranged and the performance of key predistribution schemes can be improved substantially.

In location-aware schemes, the node deployment model is one of the most important design criteria and directly affects the performance of the scheme. As discussed above, a batch deployment strategy is assumed in the location-aware random key predistribution schemes proposed in the literature. Such a deployment strategy may not be appropriate for scenarios like borderline or perimeter defense: if sensors are deployed in bundles, it is likely that there will be places on the border with few or no sensor nodes. Moreover, there is still room to further improve the performance of location-aware key predistribution schemes, in terms of connectivity, resiliency, and memory usage, with more realistic deployment models.

1.1 Our Contribution

We introduce a new deployment model, called the continuous deployment model, and develop two key predistribution schemes on this model. The main idea behind the continuous deployment model is to drop the nodes one by one (i.e., not in batches) continuously from an aerial vehicle. The aerial vehicle may follow a continuous line for perimeter defense applications. In applications that need area coverage, the vehicle may follow a route with several parallel lines.
We use the latter scenario, which is
more complicated than the former one, in the development and the analysis of the key predistribution schemes based on the continuous deployment model.

In our first key predistribution scheme, it is assumed that we know the order in which the nodes are dropped for each line. For key predistribution in this scheme, we take a deterministic approach and assign pairwise keys to sensor nodes. In our second key predistribution scheme, we relax the order assumption: the dropping order of the sensor nodes is not known, but the nodes to be dropped for each line are grouped. Here we use a probabilistic key predistribution mechanism; for each line, each node is assigned some keys from the key pools. We anticipate that the use of more deployment knowledge, as in the methods we propose, improves the performance of the system.

We performed analytical and simulation-based performance evaluation of the proposed schemes and show that the proposed approach improves key predistribution performance over Du et al.'s scheme [3], in which the nodes are deployed in groups, in terms of connectivity, resiliency against node capture, and memory usage.
2 The Continuous Deployment Model

In this section, we introduce a practical deployment model in which nodes are deployed sequentially rather than in batches. In our deployment model, the nodes are dropped one by one following a trajectory. This model can easily be realized by dropping nodes through a pipe from a plane as the plane flies over a known route. For example, if a rectangular area is to be covered with sensor nodes, the plane takes a route where it scans the rectangular area line by line. Figure 1 shows an example sensor network deployed in this model.
Fig. 1. A sample sensor network
The point where a node is dropped out of the plane or helicopter is called its deployment point. However, due to several factors, its actual position drifts from the deployment point. The actual position of a node in the field after deployment is called its resident point. Both deployment and resident points are defined in two-dimensional space. In the rest of the paper, the deployment area is assumed to be rectangular. In this area, there are L parallel lines and N nodes per line. Our deployment model assumes fixed intervals between the deployment points of two consecutive nodes of a line. The deployment point of the ith node on the jth line is denoted d_{ji}, where j = 1..L and i = 1..N. Similarly, the resident point of that node is denoted r_{ji}.
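As a minimal sketch of this model (our own illustration, using the configuration values later given in Section 4: L = 50 lines 20 m apart, N = 200 nodes per line, spacing d = 5 m, and Gaussian drift with σ = 10 m), deployment points can be laid out on parallel lines and resident points sampled by adding two-dimensional Gaussian noise, assumed independent per axis:

    import random

    def resident_point(dx, dy, sigma):
        # Drift from deployment point to resident point, modeled as
        # independent Gaussian noise on each axis.
        return dx + random.gauss(0.0, sigma), dy + random.gauss(0.0, sigma)

    L_LINES, N_NODES, SPACING, LINE_GAP, SIGMA = 50, 200, 5.0, 20.0, 10.0
    deploy = [(j * LINE_GAP, i * SPACING)
              for j in range(L_LINES) for i in range(N_NODES)]
    resident = [resident_point(x, y, SIGMA) for x, y in deploy]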
The resident point of a node may drift away from its deployment point. Because of this, two nodes with deployment points d_{li} and d_{lj}, where d_{li} < d_{lj}, can end up at resident points r_{li} and r_{lj} with r_{li} > r_{lj}. In our model, two nodes can be neighbors according to their deployment points yet be out of each other's coverage after deployment. We call two nodes neighbors only if their resident points are close enough that they can directly communicate over radio. The density of the lines and the node dropping frequency are important system parameters for keeping the resulting sensor network connected. Our model uses a two-dimensional Gaussian distribution to determine the probability of a node being at a resident point given its deployment point.
3 Continuous Key Predistribution Scheme

Key distribution for our deployment model can be performed in two ways. In the first way, we assume that the deployment order of individual nodes is known; the neighboring relationships, in terms of deployment points, are then also known. Such knowledge yields a very efficient key distribution method, discussed later. However, to realize this, we have to transfer cryptographic material to the nodes just before dropping them, which requires a complex setup inside the plane. Alternatively, we may transfer the cryptographic material before loading the nodes onto the plane, but then we have to preserve the nodes' order, for example by keeping all the sensor nodes in pipes. In the second way, a line of nodes is treated as a single group. We do not assume knowledge of the order of nodes; we simply form groups of nodes, store cryptographic material according to the key distribution scheme explained later, and then deploy each group as a line in random order. This approach is simpler to realize than the first, but it has some performance deficiencies that will be discussed in this paper.

We propose two different key predistribution schemes, Scheme I and Scheme II, for the above two ways. Both follow the well-known three-phase approach used by other key predistribution schemes in the literature. The first phase is the "predistribution" phase, where keys are stored in nodes according to the method proposed by the scheme. The second phase is the "direct key establishment" phase, where nodes discover their neighbors and find out whether they share common keys with which to form secure links. The third phase is the "path key establishment" phase, where a node tries to find secure paths to those neighbors with which it does not share common keys, in order to establish secure links. A secure link exists between two nodes if they own at least one key in common and they are neighbors. We assume that all keys have unique IDs.

3.1 Key Predistribution Scheme I

The parameters and symbols used in this scheme are:

N: number of nodes on a line
L: number of lines that make up the sensor network
M: number of keys shared with nodes on the same line
Q: number of keys shared with nodes on an adjacent line
d: distance between deployment points on a line
A: radio range of a node
L_i: ith line, where i = 1..L
d_{ij}: deployment point of the jth node on line L_i, j = 1..N
s_{ij}: the ID of the sensor node with deployment point d_{ij}
r_{ij}: resident point of the jth node on line L_i, j = 1..N
Given the deployment model examined in the previous section, sensor nodes that are adjacent in the pipe have a high probability of being neighbors after deployment. Similarly, sensor nodes at similar positions in consecutive pipes are likely to be neighbors. From this observation, we infer that a pairwise key predistribution method would work efficiently, so we adopt such a strategy in our method. As described above, there are three phases in this scheme: (i) the predistribution phase, (ii) the direct key establishment phase, and (iii) the path key establishment phase.

Predistribution phase. This phase is split into two parts: inline key predistribution and cross-line key predistribution. Inline key predistribution is for nodes within the same line of deployment; cross-line key predistribution is for nodes in adjacent lines. Figure 2 depicts this phase.

Fig. 2. Node s_{ij} shares keys with square-shaped nodes
Inline Key Predistribution. The setup server creates and stores pairwise keys in sensor nodes such that each node shares keys with its M neighbors on its own line. More formally, for all i = 1..L and j = 1..N, the setup server creates M keys to be stored in s_{ij} such that s_{ij} and its M neighboring sensor nodes, s_{i(j-M/2)}, ..., s_{i(j-1)}, s_{i(j+1)}, ..., s_{i(j+M/2)}, share unique keys.

Cross-Line Key Predistribution. Sensor nodes also share keys with their neighbors in neighboring lines. For all i = 1..L and j = 1..N, the setup server creates 2Q keys to be stored in s_{ij} such that this node shares unique pairwise keys with Q nodes from the lower line, s_{(i-1)(j-Q/2)}, ..., s_{(i-1)(j+Q/2)}, and Q nodes from the upper line, s_{(i+1)(j-Q/2)}, ..., s_{(i+1)(j+Q/2)}. After these two processes, a sensor node holds M + 2Q keys before deployment. A sketch of this assignment follows.
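The following is a minimal sketch of this pairwise assignment (our own illustration; key IDs stand in for actual keys, boundary nodes simply get fewer keys, and the neighbor windows are approximate):

    def predistribute(L, N, M, Q):
        # Map each unordered node pair that must share a key to a fresh
        # key ID. Nodes are identified by (line, position).
        keys, next_id = {}, 0
        for i in range(L):
            for j in range(N):
                # Inline: pair with up to M/2 nodes ahead on the same line
                # (the M/2 behind are covered by earlier iterations).
                for k in range(j + 1, min(N, j + M // 2 + 1)):
                    keys[((i, j), (i, k))] = next_id
                    next_id += 1
                # Cross-line: pair with a window of about Q nodes centered
                # at position j on the next line.
                if i + 1 < L:
                    lo, hi = max(0, j - Q // 2), min(N, j + Q // 2 + 1)
                    for k in range(lo, hi):
                        keys[((i, j), (i + 1, k))] = next_id
                        next_id += 1
        return keys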
Direct Key Establishment Phase. After deployment, sensor nodes communicate with their neighbor nodes to discover shared keys and establish secure links. Shared-key discovery is trivial, since sensor nodes already hold the IDs of the nodes with which they share pairwise keys; a node only needs to learn the IDs of its neighbors. When a node finds a matching node in its neighborhood, the two can immediately start using their shared pairwise key. Unauthorized entities cannot learn the IDs of the keys used to secure links, or of the keys in any node, since only node IDs are transmitted over unencrypted links. This phase is identical for nodes on the same line and nodes on adjacent lines.

Path Key Establishment Phase. After the direct key establishment phase, a node may have neighbors with which it cannot find a shared key to establish a secure link. These two neighboring nodes without a secure link must then find a secure path, i.e., a path of secure links, through their other neighbors. The process of establishing a secure link over a secure path is called path key establishment, and it works as follows. Assume node s_{ij} does not have a secure link with its neighbor node s_{ik}. Node s_{ij} asks its 1-hop neighbors, with which it has secure links, whether they also have secure links with node s_{ik}. If any such neighbor, say s_{in}, has such a secure link, then s_{in} generates a random key and sends it to both s_{ij} and s_{ik} over secure links. The nodes s_{ij} and s_{ik} then use this key to establish a secure link. If none of the 1-hop neighbors has a secure link with node s_{ik}, node s_{ij} asks its 2-hop neighbors, and so on, extending the search one hop at a time until it finds a node that shares a key with s_{ik}. If the graph of secure links is connected, a node eventually finds a secure path to any node in the sensor network. In our analysis, we will show that a node can reach all its neighbors with high probability within three hops of secure links; a search sketch follows.
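A minimal sketch of the path search underlying this phase (our own illustration): breadth-first search over the graph of existing secure links naturally realizes the expanding 1-hop, 2-hop, ... inquiry described above.

    from collections import deque

    def secure_path(secure_links, src, dst):
        # secure_links: adjacency mapping of the secure-link graph.
        # Returns the shortest secure path from src to dst, or None.
        parent, queue = {src: None}, deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt in secure_links.get(node, ()):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None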
3.2 Key Predistribution Scheme II

The parameters and symbols used in this scheme are:

N: number of nodes on a line
L: number of lines that make up the sensor network
L_i: ith line, where i = 1..L
S_i: key space of line i
s_I: number of nodes to which a key in S_i is distributed on L_i
s_c: number of nodes to which a key in S_i is distributed on each neighbor of L_i
M_I: memory space of nodes of L_i for keys from S_i
M_c: memory space of nodes of neighbors of L_i for keys from S_i
K: number of unique keys in a key space
d: distance between deployment points on a line
A: radio range of a node
A_{ij}: circular area around node s_{ij} within which s_{ij} can send and receive radio signals
d_{li}: deployment point of node i on line l
r_{li}: resident point of node i on line l
s_{ij}: the ID of the sensor node with deployment point d_{ij}
k_{ij}: the ID of the jth key in key space S_i
In this scheme, we do not assume any particular order among the deployment points of the nodes of a line; thus we use less deployment knowledge than in predistribution Scheme I. Although the scheme is still based on pairwise key distribution, some redundancy must be added in order to achieve a reasonable level of connectivity. The setup server generates a group of unique keys for each line; these keys form the key space S_i of line i. There are K keys in each key space, and a node from line i gets keys from S_i, S_{i-1}, and S_{i+1} according to the key predistribution method. The number of copies of each key is limited and determined parametrically. Like Scheme I, this scheme has three phases: predistribution, direct key establishment, and path key establishment.

Predistribution Phase. In this step, we describe how keys are distributed to nodes on the various lines. The setup server generates a key space S_i for each line i = 1..L, then distributes s_I and s_c copies of each key as explained below. Our aim is to distribute the keys such that nodes expected to be near each other share more keys. The key predistribution method for each k_{ij}, where i = 1..L and j = 1..N, is as follows:
1. Key k_{ij} is randomly generated for the key space S_i of line L_i.
2. s_I nodes with sufficient space in their M_I are randomly selected on L_i, and k_{ij} is installed in those s_I nodes.
3. s_c nodes with sufficient space in their M_c are randomly selected from each neighboring line of L_i (so 2s_c nodes in total from the two neighboring lines), and k_{ij} is installed in those s_c nodes in each neighboring line.
At the end of the key predistribution phase, each key from key space S_i has a total of s_I + 2s_c copies across three lines: s_I copies on L_i and 2s_c copies on L_{i-1} and L_{i+1}. Each node has a total of M_I + M_c keys installed. We can calculate K, the size of each key space S_i, i = 1..L, from s_c, s_I, M_I, and M_c. Since there are N sensor nodes on line i, and the setup server loads exactly M_I unique keys from S_i into each node on line i, the setup server needs N·M_I key slots, and each key from S_i has s_I copies on line i. Also, s_c copies of keys from S_{i+1} and S_{i-1} are loaded into nodes of line i, and each node reserves M_c memory for keys from neighboring key spaces. The number of unique keys in each key space, K, is therefore:
K = N·M_I / s_I = N·M_c / (2·s_c)

Direct Key Establishment Phase. After deployment, nodes have to find shared keys with their neighbors. This phase is similar to the basic scheme [5]. Each node needs to know which keys its neighbors carry so that it can determine which keys they share. Each node therefore broadcasts a message containing the indices of the keys it carries, and nodes use these broadcast messages to find out whether they share common keys with their neighbors. If a node finds a shared key with one of its neighbors, it can use that key to establish a secure link between itself and that neighbor.

Path Key Establishment Phase. If two neighboring nodes cannot find a shared key directly, they have to reach a common key over a secure path. This method is identical to the path key establishment method in Scheme I.
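As a consistency check of the formula for K above (our own worked example; the parameter values are taken from the Section 4 configuration and the Fig. 5 setting s_I = s_c = 3, m = 90):

\[
K = \frac{N M_I}{s_I} = \frac{200 \times 30}{3} = 2000,
\qquad
\frac{N M_c}{2 s_c} = \frac{200 \times 60}{2 \times 3} = 2000,
\]

so a node stores m = M_I + M_c = 30 + 60 = 90 keys, and the 50 key spaces together contain 50K = 100000 unique keys, matching Section 4.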
4 Performance Analysis

In our analysis and simulations, we use the following configuration. The deployment area is 1000 m × 1000 m. There are 50 deployment lines, i.e., L = 50, and the distance between lines is 20 m. On each deployment line there are 200 nodes, i.e., N = 200, so the total number of sensor nodes, N × L, is 10000. The distance between two adjacent deployment points, d, is 5 m. The communication range, R, of each node is 40 m. The standard deviation of the normal distribution, σ, is 10 m. For Scheme II, the total number of unique keys is 50K = 100000 (i.e., K = 2000 per key space).

4.1 Local and Global Connectivity

In this section, we show our simulation results for the probability of a node sharing a key with its neighbors. This probability is called the local connectivity, P_local. The detailed formulation of P_local could not be given here due to space limitations.

Figure 3 shows local connectivity versus memory usage m. We compare the results for our Scheme I and Scheme II with Du et al.'s scheme [3]. Scheme II has higher connectivity than [3] for all values of m. For Scheme II, different values of s_I and s_c result in different P_local values even for the same memory usage; in our experiments for various m values, we obtained the best results when s_I and s_c are equal. Scheme I outperforms both Scheme II and Du's scheme for low m values. In our simulations, Scheme I reached a maximum local connectivity of 0.8518 at M = 28 and Q = 26, which yields m = 80. As the number of keys increases beyond m = 80, local connectivity stays the same: increasing M and Q, and consequently m, means that a node shares keys with distant nodes, which does not contribute to local connectivity because distant nodes have a very small probability of falling within the node's communication range. The simulation results in Figure 3 confirm this explanation.

There are two factors that make Scheme II's connectivity performance better than Du et al.'s scheme. Firstly, our schemes use more deployment knowledge: in Du's scheme there is a single deployment point for each bundle of nodes, whereas in Scheme II there is a deployment point for each node.
Fig. 3. Local connectivity P_local versus memory usage m, for our Scheme 1, our Scheme 2, and Du's scheme
Secondly, in Scheme II, we distribute copies of a key homogeneously: copies of a key are distributed to both the upper and the lower neighboring lines, so a node can use the keys in its M_c to establish secure links with nodes on the same line, on its directly neighboring lines, and on lines two away. In addition, by introducing s_I and s_c, we fix the number of copies of every key. Du's scheme can have the same average number of copies of keys for the same m values, but a particular key can have a much higher or much lower number of copies. Fixing the number of copies in Scheme II contributes to the homogeneity of the key distribution.

A high local connectivity value means that a node can communicate securely with most of its neighbors. However, a high local connectivity value does not guarantee that there are no isolated parts in the network. Thus, we need to examine whether our schemes can create too many isolated components. Through simulations, we measured global connectivity, the ratio of the size of the largest connected component to the size of the whole network. The results show that 100% global connectivity is reached when m is as low as 10 for Scheme I and 30 for Scheme II. Since we determine the deployment points of all nodes in Scheme I and fix the number of copies of a key in Scheme II, we minimize the possibility that the network has more than one isolated part. Our simulation results support this idea.

4.2 Resiliency Against Node Capture
We investigate the effects of compromised nodes on direct key establishment. We assume that a total of c randomly chosen nodes are compromised. The fraction of additional communications that can be compromised using the information from the compromised nodes defines the resiliency of our system. This section focuses on the resiliency of Scheme II against node capture attacks; Scheme I uses pairwise keys and is therefore 100% resilient against the compromise of sensor nodes.

Let k_{aj} denote a key generated for line L_a, and assume k_{aj} is used for a link between two nodes that are not compromised. There are s_I copies of k_{aj} on the nodes of L_a, s_c copies on the nodes of L_{a-1}, and s_c copies on the nodes of L_{a+1}. Thus, in order to compromise k_{aj}, the adversary must compromise nodes from L_{a-1}, L_a, and L_{a+1}. If there are j compromised nodes on L_a, the probability that k_{aj} is not compromised on line L_a is

p_{comp_i}(j) = \binom{N-j}{s_I} / \binom{N}{s_I}    (1)

If there are j compromised nodes on a line adjacent to L_a, the probability that k_{aj} is not compromised on that adjacent line is

p_{comp_c}(j) = \binom{N-j}{s_c} / \binom{N}{s_c}    (2)

Thus, if there are x compromised nodes on L_{a-1}, y compromised nodes on L_a, and z compromised nodes on L_{a+1}, the probability that k_{aj} is not compromised is p_{comp_c}(x) · p_{comp_i}(y) · p_{comp_c}(z). The probability that there are x compromised nodes on line L_{a-1}, y compromised nodes on line L_a, and z compromised nodes on line L_{a+1} is

p_{c_xyz} = \binom{c}{x} \binom{c-x}{y} \binom{c-x-y}{z} (1/L)^x (1/L)^y (1/L)^z ((L-3)/L)^{c-x-y-z}    (3)
By using equations (1), (2), and (3), we calculate the probability that an adversary obtains a key used for a link between two non-compromised nodes, out of c randomly compromised nodes, as:

p_{comp_all}(c) = 1 - \sum_{x=0}^{N} \sum_{y=0}^{N-2} \sum_{z=0}^{N} \binom{c}{x} \binom{c-x}{y} \binom{c-x-y}{z} (1/L)^{x+y+z} ((L-3)/L)^{c-x-y-z} p_{comp_c}(x) p_{comp_i}(y) p_{comp_c}(z)    (4)
A comparison of our Scheme II with Du et al.'s scheme [3] is shown in Figures 4 and 5. In both cases, the probability of a link being compromised, P_comp_all, is plotted against the number of nodes captured. In Figure 4, the number of keys in a node is 60 and the number of nodes is 10000 for both schemes. In Figure 5, we fix local connectivity to 0.86 for both schemes. Our scheme outperforms Du's scheme because we can reach a local connectivity of 0.86 with only m = 90 keys per node, whereas Du's scheme requires m = 140 to reach the same local connectivity.

The probability of a secure link being compromised when a number of nodes are captured is directly proportional to the number of copies of a key. In Scheme II, the number of copies of a key is a parameter determined by s_I and s_c; in Figure 5, for Scheme II there are 3 + 2×3 = 9 copies of a key. In Du et al.'s scheme, a key has a random number of copies, but we can find the average number of copies of a key using |S|, the number of unique keys in the sensor network, N, the total number of nodes, and m, the number of keys in each node: k_average = N·m / |S|. Because we used the same m values for both Scheme II and Du's scheme in Figure 4, there were six copies of a key in both schemes, and we obtained very similar results, as shown in the figure.
Fig. 4. Number of nodes captured vs. probability of a link being compromised, for m = 60, s_I = 2, s_c = 2
4.3 Path Key Establishment Overhead

As the number of hops used in the path key establishment phase increases, a node can reach more of its neighbors, but the communication cost also increases. We analyzed path key establishment through simulations for Scheme II and depict the results in Figure 6.
The ratio of neighbors that a node can reach in i hops is denoted p_l(i); p_l(1) is simply the local connectivity. Our scheme performs better than Du et al.'s scheme [3] in that it needs fewer hops for small m values. It can be observed from Figure 6 that for m = 60 or larger, a node can reach all its neighbors in at most two hops. In [3], only 63% of the nodes reach their neighbors in at most two hops when m = 60; moreover, in [3], m must be 200 for a node to reach all of its neighbors in at most two hops.
Fig. 5. Number of nodes captured vs. probability of a link being compromised, for P_local = 0.86. For our scheme: m = 90, s_I = 3, s_c = 3. For Du's scheme: m = 140.
Fig. 6. Path key establishment overhead: p_l(i) versus m for 1, 2, and 3 hops
5 Conclusions

In this paper, we proposed a new deployment model and two novel key predistribution schemes based on it. In our deployment model,
the nodes are deployed in lines in a continuous fashion. This model is practical and can be realized easily. In the proposed Scheme I, we assume that the deployment point of each node is known, and with that knowledge we distribute pairwise keys to each node for communication with its neighbors. In Scheme II, we loosen this assumption and assume only that a node can be at any deployment point on a known line. We compared our schemes with Du et al.'s key predistribution scheme [3]. The performance evaluation showed that Scheme I can reach high local connectivity even with small memory usage, owing to the assumption that the neighbors of each node are known from their deployment points. However, there is an upper limit on its local connectivity: other schemes can achieve better local connectivity with high memory usage, whereas the local connectivity of Scheme I plateaus around 0.85 after a certain point. Scheme II, on the other hand, achieves higher local connectivity than Du's scheme in all cases. Both Schemes I and II show good global connectivity, reaching 100% global connectivity with small memory usage. Moreover, Scheme II has better node capture resiliency than Du et al.'s scheme at the same local connectivity. Furthermore, the communication cost of path key establishment is smaller in our schemes.
References

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A survey on sensor networks," IEEE Communications Magazine, vol. 40, no. 8, pp. 102-114, August 2002.
[2] H. Chan, A. Perrig, and D. Song, "Random key predistribution schemes for sensor networks," in IEEE Symposium on Research in Security and Privacy, pages 197-213, 2003.
[3] W. Du, J. Deng, Y. S. Han, S. Chen, and P. Varshney, "A key management scheme for wireless sensor networks using deployment knowledge," in Proceedings of IEEE INFOCOM'04, March 2004.
[4] W. Du, J. Deng, Y. S. Han, and P. Varshney, "A pairwise key predistribution scheme for wireless sensor networks," in Proceedings of ACM CCS'03, pages 42-51, October 2003.
[5] L. Eschenauer and V. D. Gligor, "A key-management scheme for distributed sensor networks," in Proceedings of ACM CCS'02, pages 41-47, November 2002.
[6] D. Liu and P. Ning, "Establishing pairwise keys in distributed sensor networks," in Proceedings of ACM CCS'03, pages 52-61, October 2003.
[7] D. Liu and P. Ning, "Location-based pairwise key establishments for static sensor networks," in Proceedings of ACM SASN'03, pages 72-82, October 2003.
[8] S. Zhu, S. Setia, and S. Jajodia, "LEAP: Efficient security mechanisms for large-scale distributed sensor networks," in Proceedings of ACM CCS'03, pages 62-72, October 2003.
[9] J. Spencer, The Strange Logic of Random Graphs, Algorithms and Combinatorics 22, Springer-Verlag, 2000.
[10] D. Liu, P. Ning, and W. Du, "Group-based key pre-distribution in wireless sensor networks," in Proceedings of the 2005 ACM Workshop on Wireless Security.
[11] D. Huang, M. Mehta, D. Medhi, and L. Harn, "Location-aware key management scheme for wireless sensor networks," in Proceedings of ACM SASN'04, October 25, 2004, Washington, DC, USA.
[12] R. Blom, "An optimal class of symmetric key generation systems," in Proceedings of EUROCRYPT 84, 1985.
Using Auxiliary Sensors for Pairwise Key Establishment in WSN

Qi Dong and Donggang Liu

Department of Computer Science and Engineering
The University of Texas at Arlington
Box 19015, Arlington, Texas 76019-0015, USA
{qi.dong,dliu}@uta.edu
Abstract. Many techniques have been developed recently for establishing pairwise keys in sensor networks. However, they are either vulnerable to a small number of compromised sensor nodes or involve expensive protocols for establishing keys. This paper introduces a much better alternative that achieves high resilience to node compromises and high efficiency in key establishment. The main idea is to deploy additional sensor nodes, called assisting nodes, to help the key establishment between sensor nodes. The proposed approach has many advantages over existing approaches. In this approach, a sensor node only needs to make a few local contacts and perform a few hash operations to set up a key with any other sensor node in the network with very high probability. The majority of sensor nodes only need to store a single key in their memory. Besides these benefits, the approach still provides high resilience to node compromises. The implementation of this approach on TelosB motes also demonstrates its feasibility for pairwise key establishment in sensor networks.

Keywords: Key management, pairwise keys, sensor networks.
1 Introduction
Wireless sensor networks are ideal candidates for a wide range of applications in military and civilian operations such as health monitoring, data acquisition in hazardous environments, and target tracking. Security has been recognized as a critical requirement for many sensor applications, especially in military operations. Key management is the cornerstone for ensuring the security of many network operations. As one of the most fundamental security services, pairwise key establishment enables secure node-to-node communication using cryptographic methods such as encryption and authentication.

Many techniques have been developed recently to set up pairwise keys in sensor networks [1,2,3,4,5,6,7,8,9,10]. Perrig et al. developed the SNEP protocol to provide pairwise key establishment using a KDC [1]. This approach, however, introduces huge communication overhead and is vulnerable to a single point of failure. A number of key pre-distribution schemes were proposed to establish keys
without an online KDC [2,3,4,5,6]. These approaches preload a small set of secrets into every sensor node before deployment to make sure that, after deployment, every two sensor nodes can set up a shared key using their preloaded secrets. However, these approaches either require expensive protocols (e.g., path key establishment) to set up keys or are vulnerable to a small number of compromised sensor nodes. In addition, some techniques use the sensors' location information and assume static sensor nodes [7,8,11,10,9]; these two assumptions may not hold in practice.

This paper presents a novel technique for pairwise key establishment in sensor networks. The main idea is to deploy additional sensor nodes, called assisting nodes, to help the key establishment between sensor nodes. Different from the nodes in traditional networks, which are mainly used for sensing and forwarding, the assisting nodes are only responsible for key management in the network, exploiting a novel dimension of using sensor nodes. The proposed approach has many advantages over existing approaches. First, it can achieve a very high probability of establishing a shared key between any two sensor nodes. Second, a sensor node only needs to make a few local contacts and perform a few hash operations to set up a key with any other sensor node in the network. Third, the majority of sensor nodes only need to store a single key in their memory. Fourth, it does not depend on the sensors' location information and can be used in sensor networks with highly mobile sensor nodes. Besides these benefits, our approach still provides high resilience to node compromises. Finally, the implementation of this approach on TelosB motes [12] demonstrates its feasibility for key establishment in sensor networks.

The rest of the paper is organized as follows. Section 2 presents our pairwise key establishment protocol along with a detailed analysis. Section 3 discusses implementation issues. Section 4 reviews related work. Section 5 concludes this paper and points out some future work.
2 Pairwise Key Establishment
This section provides the technical details of, and the analysis for, establishing pairwise keys using auxiliary sensors. In this paper, we consider sensor networks consisting of a large number of tiny resource-constrained sensor nodes [13]. These sensor nodes can be static or highly mobile. We assume that the attacker can eavesdrop on, modify, forge, replay, or block any network traffic. We also assume that the attacker can compromise a few sensor nodes and learn all the secret information, including the keying materials, on those compromised nodes [14].

2.1 Protocol Description
Typically, sensor nodes are deployed to sense the conditions in their local surroundings and report observations for various uses. However, in this paper, we exploit a new dimension of using sensor nodes and believe that it is important
to deploy sensor nodes to facilitate certain network protocols such as key management. Hence, the main idea of our approach is to deploy additional sensor nodes, called assisting nodes, to help the pairwise key establishment between sensor nodes. The detailed protocol is presented below. Let n be the network size and m be the number of assisting sensor nodes. For convenience, we call the sensor nodes that are not assisting nodes the regular sensor nodes.

– Initialization: Before deployment, the base station generates a master key K_u for every sensor node u. The master key K_u is known only by the sensor node u and the base station. Every assisting node i is preloaded with a hash H(K_u||i) for every regular sensor node u, where H is a one-way hash function and "||" denotes concatenation. Hence, an assisting node needs to store n hash images. This clearly introduces considerable storage overhead at assisting sensor nodes. However, the only job of the assisting nodes is to help pairwise key establishment; as a result, they can use all their memory, including flash memory, to store these values. We therefore believe it is feasible for an assisting node to store n hash images. For instance, TelosB motes have 1 MB of flash memory and can store the hash images for a network of 128,000 sensor nodes if every hash is 8 bytes long. Additionally, research on high-capacity and energy-efficient storage subsystems for sensor network platforms has drawn a lot of attention, which will soon make it possible to equip a sensor node with a large flash memory [15] without increasing the cost significantly. Therefore, more and longer hash images can be stored in each assisting node for a very large sensor network.

– Pairwise Key Establishment: After deployment, every regular sensor node discovers the assisting nodes in its neighborhood. When a sensor node u needs to establish a pairwise key with another node v, it sends a request to every neighbor assisting node i. The request message includes the IDs of both sensor nodes and is protected by the key H(K_u||i), which has been preloaded to the assisting node i. The assisting node i serves as a KDC and generates a reply to u. This reply message includes two copies of a random key R, one protected by H(K_u||i) (for node u) and the other protected by H(K_v||i) (for node v). This procedure is similar to the Needham-Schroeder symmetric key protocol [16]. After the request, u will have obtained a random key from every neighbor assisting node. Let {R_1, ..., R_l} be the set of all these random keys. The final key K_{u,v} between u and v is simply the bit-wise XOR of all these keys, i.e., K_{u,v} = R_1 ⊕ R_2 ⊕ ... ⊕ R_l. Obviously, as long as at least one random key is secure, the final key is safe. A sketch of this derivation follows.
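A minimal sketch of the hashing and key-derivation steps (our own illustration; SHA-256 truncated to 8 bytes stands in for H, and the encryption of the KDC replies is elided):

    import hashlib

    def H(master_key: bytes, node_id: bytes) -> bytes:
        # One-way hash H(K_u || i) preloaded into assisting node i.
        return hashlib.sha256(master_key + node_id).digest()[:8]

    def final_key(random_keys):
        # K_{u,v} = R_1 XOR R_2 XOR ... XOR R_l: the key stays safe as
        # long as at least one contributing R remains secret.
        out = bytes(len(random_keys[0]))
        for R in random_keys:
            out = bytes(a ^ b for a, b in zip(out, R))
        return out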
Though our later analysis in Section 2.2 shows that even a small fraction of assisting nodes can guarantee a high probability of establishing pairwise keys using the above algorithm, it is still possible that a regular sensor node cannot find any assisting node in its neighborhood, since accurate deployment of assisting nodes may not be guaranteed in some scenarios. To deal with this issue, we add supplemental key establishment, in which a regular sensor node may discover the set of assisting nodes within a certain number of hops. This certainly increases the chance of finding an assisting node to use. An additional benefit is better security: as described above, u derives the final key by XORing all the random keys together, so the more random keys, the higher the security of the final key.

– Supplemental Key Establishment: In this step, a sensor node u discovers the assisting sensor nodes that are no more than h hops away from itself. This can easily be achieved by having node u's neighbors help collect the IDs of the assisting nodes around them: the neighbor nodes broadcast the inquiry message on behalf of u and forward to u the replies from assisting nodes. Once this set is discovered, the remaining steps are the same as in the pairwise key establishment discussed before.

The discovery and use of assisting nodes multiple hops away introduces additional communication overhead, since intermediate nodes are needed to relay the messages. However, this only involves communication in a local area, which we believe is not a big problem for the current generation of sensor networks. Also, supplemental key establishment is not likely to be invoked frequently.

2.2 Analysis and Discussion
This subsection presents the performance analysis of the proposed scheme, focusing on the probability of establishing pairwise keys, the resilience against node captures, and the overheads. For simplicity, we assume that all n regular sensor nodes and all m assisting nodes are evenly deployed in the field.

Probability of Establishing Keys: During pairwise key establishment, a sensor node u must communicate with at least one assisting node in its neighborhood to set up a key with another sensor node. Let d denote the average number of one-hop neighbors of a sensor node. The probability that a given assisting node i is not in the local area of the regular sensor node u can be estimated by 1 - d/(m+n). Thus, the probability that a regular sensor node fails to find any assisting node in its neighborhood can be estimated by (1 - d/(n+m))^m, and the probability P of establishing a pairwise key by P = 1 - (1 - d/(n+m))^m. As d << n usually holds in practice, P can be further approximated by 1 - e^{-dm/(n+m)}.

Figure 1 shows the relationship between the probability P of establishing a key and the fraction of assisting nodes m/n. From the figure, we see that a small fraction (0.1) guarantees a high probability (greater than 0.9) of establishing keys between sensor nodes. Moreover, since P increases faster for larger d, the proposed scheme achieves attractive performance in high-density networks.

During supplemental key establishment, the assisting nodes within h-hop range of u are used. Obviously, the larger the range, the higher the probability of finding an assisting node. Here we only analyze the situation when h = 2.
Fig. 1. The probability P that two sensor nodes can establish a pairwise key vs. the fraction m/n of assisting sensor nodes, for d = 30, 50, 70 (one-hop assisting nodes) and d = 30 (two-hop assisting nodes)
From the above discussion, the probability that none of u's neighbor nodes can find any assisting node can be estimated by ((1 - d/(n+m))^m)^d. Let P' denote the probability of establishing a pairwise key using the assisting nodes in two-hop range. Then P' = P + (1-P) × (1 - (1 - d/(n+m))^{md}). Figure 1 indicates that by using two-hop assisting nodes, even when the fraction m/n is as small as 0.003, the probability of establishing a key is still greater than 0.9.

Resilience against Node Captures: We assume that the base station is never compromised. Thus, the master key of any non-compromised node is always safe, since the assisting nodes are only equipped with hashes of the master keys. Even if an assisting node is captured, it is computationally infeasible to recover the original keys from the hashes, due to the one-way property of hash functions. We also note that a sensor node derives the final key by XORing all the random keys; the pairwise key is secure unless all the related assisting nodes are compromised. This indicates an attractive property of our scheme: a benign assisting node can guarantee the security of the keys established in its neighborhood as long as it can communicate with the sensor nodes in its neighborhood.

We then study the probability P_c of a key being compromised when a certain fraction of nodes are compromised. Assume the attacker randomly compromises a fraction f_c of the sensor nodes. From our earlier analysis, P_c equals the probability that all the corresponding random keys are compromised. The number of assisting nodes that might provide those random keys can be estimated by m×d/n. Hence, P_c can be estimated by f_c^{(m×d)/n}. Figure 2 shows that our approach is highly resilient to the node compromise attack. It also implies that we can enhance the security of key management by deploying more assisting nodes. A numerical sketch of these estimates appears below.
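A minimal numerical sketch of these estimates (our own illustration; function names are ours):

    def p_one_hop(d, n, m):
        # P = 1 - (1 - d/(n+m))^m: at least one assisting node in range.
        return 1 - (1 - d / (n + m)) ** m

    def p_two_hop(d, n, m):
        # P' = P + (1 - P) * (1 - (1 - d/(n+m))^(m*d)) for h = 2.
        p = p_one_hop(d, n, m)
        return p + (1 - p) * (1 - (1 - d / (n + m)) ** (m * d))

    def p_key_compromised(f_c, d, n, m):
        # P_c = f_c^(m*d/n): all contributing assisting nodes captured.
        return f_c ** (m * d / n)

    # Sanity check against Fig. 1: n = 20000, d = 30, m/n = 0.003
    # gives p_two_hop(30, 20000, 60) of roughly 0.93.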
Fig. 2. The fraction of compromised links between non-compromised nodes vs. the fraction of compromised nodes, for m/n = 0.1, 0.2, 0.3
Overheads: The proposed scheme only requires a single key per regular sensor node and n hash values per assisting node. As discussed before, an assisting node is only responsible for pairwise key establishment and can use all its memory, including flash memory, to store these hash images. The scheme involves small computation overhead: a regular sensor node only needs to apply a few symmetric key operations and hash operations to establish a key with any other sensor node. We note that in the pairwise key establishment phase, communication stays within one-hop range. Although multi-hop communication occurs when supplemental key establishment is needed, that communication is still limited to a local area; moreover, we have shown that in most cases only one-hop communication is needed. As a result, supplemental key establishment will not incur significant communication overhead for our protocol.

2.3 Comparison with Previous Schemes
This section compares the proposed scheme with previous techniques for pairwise key establishment: the basic probabilistic scheme [2], the q-composite scheme [3], the random pairwise keys scheme [3], the random subset assignment scheme [4], and the grid-based scheme [4].

Security Performance: We assume a network size n = 20,000 and an average number of neighbors d = 50. For the previous schemes, we assume each sensor can store 200 keys. Hence, for the grid-based scheme [4], the probability of two nodes sharing a direct key is 0.014. For the other schemes, we set P = 0.33 to make sure the network is well connected. In contrast, from our previous analysis, our approach can always guarantee the establishment of pairwise keys with very high probability using only a small fraction of assisting nodes.
Fig. 3. The fraction of compromised links between non-compromised nodes in different schemes: basic probabilistic (p=0.014 and p=0.33), q-composite (q=2 and q=3, p=0.33), random subset (p=0.014 and p=0.33), grid-based (p=0.014), and ours (m/n=0.1, p=0.99)
Figure 3 shows the fraction of compromised links in the presence of compromised nodes for different schemes. The figure indicates that, in terms of protecting the direct keys, our scheme provides resilience to node compromises similar to that of the random subset assignment scheme and the grid-based scheme [4]. In addition, our scheme guarantees a much higher probability of establishing a pairwise key between two sensor nodes in a densely deployed sensor network. Note that the previous schemes need to employ expensive protocols for path key establishment when two sensor nodes cannot directly set up a pairwise key. As a result, the attacker might discover not only the direct keys but also the indirect (path) keys by compromising the intermediate nodes used in the establishment of those indirect keys. In contrast, our approach does not need to set up path keys: even if the attacker has captured the nodes that relay the keying information, the key will not be disclosed. Figure 4 shows the fraction of compromised (direct or indirect) keys between non-compromised nodes in the presence of compromised nodes. The figure clearly shows that our scheme performs much better than the other schemes. For example, when 70% of the sensor nodes are compromised, the fraction of compromised pairwise keys between non-compromised sensor nodes is only around 0.18, whereas at least 88% of the keys have been exposed in the previous schemes. Additionally, our proposed scheme guarantees that a single benign assisting node can protect the keys established in its neighborhood as long as this node can talk to the sensor nodes in its neighborhood. Overheads: In the proposed scheme, only a single master key is stored in every regular sensor node, while the previous schemes have considerable storage requirements for achieving high performance. For instance, in Figures 3 and 4, the previous schemes require every node to store 200 entries to achieve
Fig. 4. The fraction of compromised (direct or indirect) keys between non-compromised nodes in different schemes (same scheme parameters as in Fig. 3)
the desired performance. In terms of computation overhead, our proposed approach involves only a small number of symmetric key operations and hash operations; hence, it will not incur much additional overhead. From the previous discussion, most of the communication overhead is introduced in the pairwise key establishment phase, which only requires direct communications in one-hop range. The multi-hop communication in the supplemental key establishment rarely occurs due to the high probability of finding a neighbor assisting node. On the other hand, for many previous schemes such as the grid-based scheme [4] and PIKE [6], a sensor node often needs to go through path key establishment to set up keys with other sensor nodes. Such path key establishment can be very expensive in practice since the intermediate sensor node that can help establish the pairwise key may be located far away from the two sensor nodes that want to establish a shared key. From the above discussion, we can clearly see that our proposed approach has significant advantages over existing schemes in terms of storage, computation and communication overheads.
2.4 Security Reinforcing Version
From our previous analysis, we note that once an assisting node is compromised, the attacker is able to discover all the random keys generated by this node. Although the actual pairwise key is combined from multiple random keys generated by different assisting nodes, it is still undesirable to let the attacker recover the old random keys. In the following, we give a simple extension that fixes this problem by updating the keys at the assisting nodes. The basic idea is to update the key at every assisting sensor node after the pairwise key establishment. In other words, the hash key at any assisting node will be changed immediately after it is used once. As a result, the attacker is not
able to reveal any random key generated before, even if the assisting sensor node is compromised at a certain point. To achieve this goal, we take advantage of the one-way hash function H(·). We also maintain a sequence number for every hash key shared between a regular sensor node and an assisting node. For example, initially, the hash key Hu = H(Ku||i) and the sequence number Su = 0 will be stored in the assisting node i for the regular sensor node u before deployment. Once the assisting node i receives the request message to set up a pairwise key between u and v, i will send a random key R to each of the two nodes along with the current sequence numbers Su and Sv via the secure links. At this time, the two copies of the random key R are protected by the hash values Hu and Hv respectively. After that, node i will replace Hu with H(Hu) and Hv with H(Hv), and also increase Su and Sv by 1. When the regular sensor node u receives the message from i, u will verify the authenticity and confidentiality of the message using the same hash Hu, which can be computed based on Ku, i and Su. Node u can then derive the random key R for pairwise key establishment. In the protocol, node u may choose to keep track of the sequence number Su for each neighbor assisting node in its local area to reduce the overhead of computing the hash key Hu. Such sequence numbers can also be used to defend against replay attacks. Therefore, by employing the one-way hash function, the improved approach enhances the resilience against node captures: a compromised node will not reveal any secret about the pairwise keys established before its capture. However, compromised assisting sensor nodes can still participate in the pairwise key establishment in the future when new nodes are added to the network, and these malicious assisting nodes may disclose valuable information to adversaries. Fortunately, our scheme guarantees that as long as there is at least one benign assisting node in a given area, the final pairwise key will remain safe no matter how many other sensor nodes are compromised. Based on this property, we may deploy new assisting sensor nodes to replace the old and untrustworthy ones to achieve better security during the pairwise key establishment.
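The following sketch illustrates the key-update logic described above from the assisting node's side. It is an illustrative model, not the authors' TinyOS code: the use of SHA-1 via Python's hashlib, the single-byte node identifier, and the XOR masking of R under the hash key are assumptions made so the chain-update step can be shown end to end.

```python
import hashlib
import os

def H(data: bytes) -> bytes:
    """One-way hash function H(.); SHA-1 is assumed here for illustration."""
    return hashlib.sha1(data).digest()

class AssistingNode:
    def __init__(self, node_id: int, master_keys: dict):
        # Before deployment, node i stores Hu = H(Ku || i) and Su = 0 for
        # every regular node u (it never stores Ku itself). node_id < 256
        # is assumed so it fits in one byte.
        self.node_id = node_id
        self.state = {u: [H(Ku + bytes([node_id])), 0]  # [Hu, Su]
                      for u, Ku in master_keys.items()}

    def serve_request(self, u: int, v: int):
        """Generate a random key R for the pair (u, v), protect each copy
        with the current hash key, then advance both hash chains."""
        R = os.urandom(8)  # 8-byte keys, matching the prototype's key size
        replies = {}
        for w in (u, v):
            Hw, Sw = self.state[w]
            # XOR is a stand-in for the (unspecified) protection of R under
            # Hw; the point here is the chain update, not the cipher.
            replies[w] = (bytes(a ^ b for a, b in zip(R, Hw[:8])), Sw)
            self.state[w] = [H(Hw), Sw + 1]  # Hw <- H(Hw), Sw <- Sw + 1
        return replies
```

A regular node u that knows Ku, i and the reported Su can recompute the current hash key by hashing H(Ku||i) a further Su times and then unmask R; capturing i after the exchange reveals only H(Hu), from which the earlier Hu (and hence R) cannot be recovered.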
3 Implementation Issues
Based on the previous analysis, we can see that our proposed pairwise key establishment approach is efficient for resource-constrained sensor nodes. In this section, we present implementation issues. We have implemented a prototype of the proposed scheme on the TinyOS platform [17]. We use the RC5 module [18] to implement the security primitives such as the hash and MAC operations, assuming 8-byte hash values and keys. This mechanism is transparent to the applications. In the protocol, the regular sensor node first sends request messages to its neighbor assisting nodes and waits for their responses. An assisting node generates the random key and sends a reply message to every requesting sensor node. After the regular sensor node collects the random keys, it combines these keys to derive the final pairwise key.
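On the regular node's side, the final key derivation then reduces to XOR-combining the collected random keys, as described above. The sketch below is a hedged illustration of this step (the helper name is ours; the 8-byte key size mirrors the prototype description).

```python
from functools import reduce

def derive_pairwise_key(random_keys: list) -> bytes:
    """Combine the random keys R1, ..., Rk collected from the assisting
    nodes into the final pairwise key by bitwise XOR. The key stays
    secret unless *all* contributing assisting nodes are compromised."""
    assert random_keys and all(len(r) == 8 for r in random_keys)
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), random_keys)

# Example: three assisting nodes each contributed one 8-byte random key.
k = derive_pairwise_key([b"\x01" * 8, b"\x02" * 8, b"\x04" * 8])
print(k.hex())  # 0707070707070707
```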
Our scheme has been tested on TelosB [12] motes. For the assisting nodes, the additional code space is 2598 bytes in ROM, and the extra usage of data space in RAM is 892 bytes. We also make use of the 1 MB flash memory on the chip to store the hash values of the regular nodes' master keys. For the regular nodes that need to set up secure communication links with 50 neighbors, the additional code space in ROM is 2102 bytes, and the extra data space in RAM is 682 bytes. Clearly, the proposed scheme is practical for sensor networks in terms of code size.
4 Related Work
The work most closely related to the techniques studied in this paper is pairwise key establishment. Many techniques have been proposed along this research direction, including the basic probabilistic scheme [2], the q-composite scheme [3], the random pairwise keys scheme [3], the two threshold-based schemes [4,5] and PIKE [6]. Additionally, prior-deployment and post-deployment knowledge has also been used to improve the performance of key pre-distribution in various situations [7,8,11,10]. This paper gives a better way to establish pairwise keys by exploiting a new dimension of using sensor nodes. There are many other studies on sensor network security, including broadcast authentication [1,19], tamper-resistant hardware [20], secure data aggregation [21], and vulnerabilities, attacks, and countermeasures [22]. We consider them complementary to our technique.
5 Conclusion and Future Work
In this paper, we developed a novel scheme to establish pairwise keys in sensor networks. This scheme takes advantage of special nodes (the assisting nodes) in the network for key management, representing a new dimension of using sensor nodes. The analysis indicates that our scheme has significant advantages over existing approaches. By making use of these cheap assisting nodes, we ease the burden on the regular sensor nodes and further extend the lifetime of the whole network. Several research directions are worth investigating. We are interested in conducting a thorough evaluation in a large-scale sensor network. We are particularly interested in how to tolerate communication errors and delays under bad channel conditions, and how to withstand high deployment failure rates, where a lot of multi-hop communication may be needed. In addition, the assisting nodes in the proposed scheme are currently deployed only to establish pairwise keys; we may make further use of those nodes to defend the network against various other attacks. Acknowledgment. The authors would like to thank the anonymous reviewers for their valuable comments.
References
1. Perrig, A., Szewczyk, R., Wen, V., Culler, D., Tygar, D.: SPINS: Security protocols for sensor networks. In: Proceedings of the Seventh Annual International Conference on Mobile Computing and Networks. (July 2001)
2. Eschenauer, L., Gligor, V.D.: A key-management scheme for distributed sensor networks. In: Proceedings of the 9th ACM Conference on Computer and Communications Security. (November 2002) 41–47
3. Chan, H., Perrig, A., Song, D.: Random key predistribution schemes for sensor networks. In: IEEE Symposium on Research in Security and Privacy. (2003) 197–213
4. Liu, D., Ning, P.: Establishing pairwise keys in distributed sensor networks. In: Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS'03). (October 2003) 52–61
5. Du, W., Deng, J., Han, Y.S., Varshney, P.: A pairwise key pre-distribution scheme for wireless sensor networks. In: Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS'03). (October 2003) 42–51
6. Chan, H., Perrig, A.: PIKE: Peer intermediaries for key establishment in sensor networks. In: Proceedings of IEEE INFOCOM. (March 2005)
7. Du, W., Deng, J., Han, Y.S., Chen, S., Varshney, P.: A key management scheme for wireless sensor networks using deployment knowledge. In: Proceedings of IEEE INFOCOM'04. (March 2004)
8. Liu, D., Ning, P.: Location-based pairwise key establishments for static sensor networks. In: 2003 ACM Workshop on Security in Ad Hoc and Sensor Networks (SASN '03). (October 2003) 72–82
9. Liu, D., Ning, P.: Improving key pre-distribution with deployment knowledge in static sensor networks. ACM Transactions on Sensor Networks (TOSN) 1(2) (2005)
10. Yu, Z., Guan, Y.: A key pre-distribution scheme using deployment knowledge for wireless sensor networks. In: Proceedings of the ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN). (April 2005)
11. Huang, D., Mehta, M., Medhi, D., Harn, L.: Location-aware key management scheme for wireless sensor networks. In: Proceedings of the 2nd ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN '04). (October 2004) 29–42
12. Crossbow Technology Inc.: Wireless sensor networks. http://www.xbow.com/Products/Wireless Sensor Networks.htm Accessed in February 2006.
13. Akyildiz, I., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: A survey. Computer Networks 38(4) (2002) 393–422
14. Hartung, C., Balasalle, J., Han, R.: Node compromise in sensor networks: The need for secure systems. Technical Report CU-CS-990-05, University of Colorado at Boulder (Jan. 2005)
15. Mathur, G., Desnoyers, P., Ganesan, D., Shenoy, P.: Ultra-low power data storage for sensor networks. In: Information Processing in Sensor Networks, 2006 (IPSN 2006). (April 2006)
16. Needham, R.M., Schroeder, M.D.: Using encryption for authentication in large networks of computers. Commun. ACM 21(12) (1978) 993–999
17. Hill, J., Szewczyk, R., Woo, A., Hollar, S., Culler, D., Pister, K.S.J.: System architecture directions for networked sensors. In: Architectural Support for Programming Languages and Operating Systems. (2000) 93–104
18. Rivest, R.: The RC5 encryption algorithm. In: Proceedings of the 1st International Workshop on Fast Software Encryption. Volume 809. (1994) 86–96
19. Liu, D., Ning, P.: Efficient distribution of key chain commitments for broadcast authentication in distributed sensor networks. In: Proceedings of the 10th Annual Network and Distributed System Security Symposium (NDSS'03). (February 2003) 263–276
20. Basagni, S., Herrin, K., Bruschi, D., Rosti, E.: Secure pebblenets. In: Proceedings of the ACM International Symposium on Mobile Ad Hoc Networking and Computing. (2001) 156–163
21. Przydatek, B., Song, D., Perrig, A.: SIA: Secure information aggregation in sensor networks. In: Proceedings of the First ACM Conference on Embedded Networked Sensor Systems (SenSys '03). (Nov 2003)
22. Wood, A.D., Stankovic, J.A.: Denial of service in sensor networks. IEEE Computer 35(10) (2002) 54–62
Privacy-Aware Multi-Context RFID Infrastructure Using Public Key Cryptography
Selim V. Kaya, Erkay Savas, Albert Levi, and Ozgur Ercetin
Sabanci University, Istanbul, Turkey
[email protected], {erkays,levi,oercetin}@sabanciuniv.edu
I.F. Akyildiz et al. (Eds.): NETWORKING 2007, LNCS 4479, pp. 263–274, 2007. © IFIP International Federation for Information Processing 2007
References
1. Weis, S., Sarma, S., Rivest, R., Engels, D.: Security and privacy aspects of low-cost radio frequency identification systems. In: SPC'03. LNCS, Vol. 2802. Springer-Verlag (2003) 454–469
2. Ohkubo, M., Suzuki, K., Kinoshita, S.: Cryptographic approach to privacy-friendly tags. In: RFID Privacy Workshop. MIT (2003)
3. Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press (1997)
4. Avoine, G., Dysli, E., Oechslin, P.: Reducing time complexity in RFID systems. In: SAC'05. LNCS, Vol. 3897. Springer-Verlag (2006) 291–306
5. Gaubatz, G., Kaps, J.P., Öztürk, E., Sunar, B.: State of the art in ultra-low power public key cryptography for wireless sensor networks. In: PerSec'05, Kauai Island, Hawaii (2005)
6. Hoffstein, J., Silverman, J., Whyte, W.: NTRU report 012, version 2. Estimated breaking times for NTRU lattices. Technical Report 12, NTRU Cryptosystems, Inc. (2003)
7. National Institute of Standards and Technology (NIST). FIPS-197: Advanced Encryption Standard, November (2001). Available online at http://www.itl.nist.gov/fipspubs/
8. Fujitsu web site, 2006. Referenced 2006 at http://www.fujitsu.com/us/services/edevices/microelectronics/memory/fram/
9. Feldhofer, M., Dominikus, S., Wolkerstorfer, J.: Strong authentication for RFID systems using the AES algorithm. In: CHES'04. LNCS, Vol. 3156. Springer-Verlag (2004) 357–370
10. Molnar, D., Wagner, D.: Privacy and security in library RFID: Issues, practices, and architectures. In: CCS'04. ACM Press, Washington, DC, USA (2004) 210–219
11. Juels, A., Weis, S.: Authenticating pervasive devices with human protocols. In: CRYPTO'05. LNCS, Vol. 3621. Springer-Verlag (2005) 293–308
12. NTRU RFID data sheet, 2006. http://www.ntru.com/products/NtruRFID.pdf
13. NTRU RFID white paper, 2006. www.ntru.com/products/RFID_White_paper_FNL.pdf
14. Hoffstein, J., Pipher, J., Silverman, J.H.: NTRU: A ring-based public key cryptosystem. In: Algorithmic Number Theory (ANTS III). LNCS, Vol. 1423. Springer-Verlag, Berlin (1998) 267–288
15. Graham, N.H., Nguyen, P., Pointcheval, D., Proos, J., Silverman, J.H., Singer, A., Whyte, W.: The impact of decryption failures on the security of NTRU encryption. In: CRYPTO'03, Santa Barbara, USA (2003)
16. Dimitriou, T.: A lightweight RFID protocol to protect against traceability and cloning attacks. In: SecureComm'05, 59–66
17. Capkun, S., Hubaux, J.P.: Secure positioning of wireless devices with application to sensor networks. In: INFOCOM'05
18. Avoine, G., Oeschlin, P.: A scalable protocol for RFID pseudonyms. In: PerSec (2004)
19. Molnar, D., Soppera, A., Wagner, D.: A scalable, delegatable pseudonym protocol enabling ownership transfer of RFID tags. In: Workshop on RFID and Light-Weight Crypto, July 14-15, Graz, Austria (2005)
20. Dai, W.: Crypto++, a free C++ library for cryptography. http://www.eskimo.com/~weidai (2004)
21. O'Rourke, C., Sunar, B.: Achieving NTRU with Montgomery multiplication. IEEE Trans. on Computers, Vol. 52, No. 4, April (2003)
22. Kaya, S.V., Savas, E., Levi, A., Ercetin, O.: Privacy-aware multi-context RFID infrastructure using public key cryptography. http://students.sabanciuniv.edu/~selimvolkan/MultiContext_RFID_Framework.pdf (November 19, 2006)
Minimum Cost Configuration of Relay and Channel Infrastructure in Heterogeneous Wireless Mesh Networks Aaron So and Ben Liang Department of Electrical and Computer Engineering University of Toronto, Toronto, Ontario, Canada M5S 3G4 {aaronso,liang}@comm.utoronto.ca
Abstract. Fixed broadband wireless access is a promising technology allowing Internet service providers to expand their customer base in sparsely populated rural areas. Because the target service area is very large, relay infrastructure is essential, and installing and maintaining this relay infrastructure is the main cost associated with such networks. We therefore develop an optimization framework which computes the minimum number of relay stations, and their corresponding channel configurations, such that a pre-specified subscriber traffic demand can be satisfied. Since the problem is a mixed-integer program, we propose an efficient optimization algorithm to compute the optimal solution in a reasonable amount of time. Our numerical results show that by using a few relay stations in a rural community, broadband Internet access can be established in a cost-effective manner. Keywords: Fixed wireless broadband Internet access, relay stations, optimal placement and channel assignment.
1 Introduction
Since high wiring cost is one of the biggest factors inhibiting wired broadband Internet access in sparsely populated rural areas, broadband wireless has long held the promise of delivering a wide range of data and information services to business and residential customers quickly and cost-effectively. With the publication of a comprehensive industry standard, namely IEEE 802.16, broadband wireless is ready to unleash its full potential. The IEEE 802.16 standard requires two separate physical layer specifications because the propagation characteristics of radio waves are so different in the lower- and upper-microwave regions. The WirelessMAN-OFDM and WirelessMAN-SC specifications utilize the 2-11 GHz and 10-66 GHz spectrum respectively. Lower frequency signals can penetrate walls and deflect from obstacles, while higher frequency transmissions must meet strict line-of-sight requirements. However, the advantage of using high frequency bands is an abundance of bandwidth. This intrinsic property of IEEE 802.16 technology makes it ideal for a heterogeneous architecture.
This research was made possible thanks to Bell Canada’s support through its Bell University Laboratories R&D program.
I.F. Akyildiz et al. (Eds.): NETWORKING 2007, LNCS 4479, pp. 275–286, 2007. c IFIP International Federation for Information Processing 2007
For the network under investigation, we assume that there is a base station wired to the ISP core network, and this base station is assigned to serve sedentary subscribers in a particular area. Because of the size of the coverage area, the base station usually cannot serve every subscriber by single-hop communication. As a result, several relay stations (RSs) are installed in the network, for example, on the subscribers’ rooftops, to relay traffic from and to the base station. If line-of-sight communications can be established among some RSs and the base station, the bandwidth abundant high frequency spectrum is used to form a backbone network. The lower spectrum, on the other hand, is used by the base station and RSs to communicate with the subscribers and form the corresponding local network. For this architecture, the cost of the network is dominated by the installation and maintenance cost of the RSs. Under this hypothetical heterogeneous mesh networking architecture, the focus of this work is to minimize the number of RSs used in the mesh network while maintaining the prespecified uplink and downlink demands of the subscribers. Note that the above IEEE 802.16 specifications are used only as an example; the analytical framework presented in this paper is general and can be applied to mesh networks based on other types of wireless technologies. To the best of our knowledge, this work is among the first solutions to address the problem of joint relay equipment placement and channel assignment in a heterogeneous wireless mesh network. In this work, we describe a heterogeneous wireless mesh network architecture with relay infrastructure and develop an analytical framework which determines whether a network with a particular relay station placement and channel assignment can satisfy the subscribers’ demands and interference constraints. Furthermore, we propose an optimization framework which combines a heuristic with Bender’s decomposition to calculate the minimum deployment and maintenance cost of a given heterogeneous wireless mesh network. The rest of this paper is organized as follows. In Section 2, we review the related work in multihop wireless networks. In Section 3, we describe the network infrastructure and equipment capabilities. In Section 4, we define our relay station placement and channel assignment problem mathematically, and describe an optimization solution. In Section 5, we discuss the convergence time and performance of the proposed optimization algorithm. Finally, concluding remarks are given in Section 6.
2 Related Works
Motivated by recent advances in ad hoc networking [1][2], wireless multihop mesh networking is now considered the next evolutionary step for wireless data networks. To bring wireless mesh networks closer to reality, in [3], Draves et al. conducted a detailed empirical evaluation of several link-quality metrics on route computation performance in wireless mesh networks. The issue of interference management in wireless mesh networks has been discussed in several contexts. In [4], Jain et al. considered the fundamental question of how much throughput a given wireless mesh network can achieve under different interference conditions. To address operational issues, an interference-aware channel assignment algorithm for multi-radio wireless mesh networks was proposed in [5] by Ramachandran et al.
The problem of wireless network equipment placement has also been addressed in several works. In [6], So and Liang proposed a Lagrangian approach to compute the optimal placement of a fixed number of relay nodes, which relay traffic in a two-hop fashion, to improve throughput in a WLAN. In the context of community mesh networks, innovative integration techniques were developed by Bejerano in [7] to minimize the number of wired access points in a mesh network, and hence the wiring cost, while maintaining users' QoS constraints. To the best of our knowledge, there is no existing work that addresses the problem of joint relay equipment placement and channel assignment in community wireless mesh networks, which is what we investigate in this work.
3 Infrastructure and Equipment Capabilities
To establish a network in a rural area, an operator needs to establish a site for the initial base station and the central office, which should have a high-capacity backhaul connection to the ISP core network. One cost-effective backhaul solution is to lease dark fiber from electrical utilities or railroad companies. However, with this approach the network point of access is already fixed, so the ISP does not have the freedom to choose the location of the central office and initial base station. In this work, we consider the case where the location of the base station and central office is given. Our goal is to place the minimum number of relay stations in the network such that the demands of the subscribers can be met. Since the subscriber locations are fixed and the high frequency spectrum is used, by using advanced antenna technologies and the three-dimensional space intelligently, we can effectively control interference in both the backbone and local networks. Adaptive array antenna technologies [8][9] have the ability to focus a beam very tightly toward a receiver, virtually eliminating the effects of interference. However, since such equipment is expensive, we assume that it is used only in the backbone network. For the local network, we assume a more affordable and common approach as follows [10]. A polarized directional antenna radiates its field only horizontally. When the antenna is tilted downward (or upward), beyond
Fig. 1. Tilted polarized directional antenna systems (a base station and three relay stations connected by backbone links, local network links to subscribers, and a high-capacity fiber optic link to the ISP core network)
a certain distance, the radiation will simply be absorbed into the ground (or outer space). As shown in Fig. 1, two relay stations, e.g., Relay Stations 2 and 3, can use the same channel for the local network and do not interfere with each other as long as they are placed far enough apart. However, since subscribers who use the same RS are located in the same multipath environment, they have to share the channel in a time-multiplexed fashion. In the next section, we define the problem mathematically.
4 System Model and Optimization Framework
Suppose there are N subscribers and one base station in the system, and they are represented by the set V = {0, 1, ..., N}, where the base station is represented by the index 0. Let VR ⊆ V be the set of nodes where the installation of relay stations is feasible. We can use the set VR to form a directed complete graph representing the backbone network. The link weight from node i to node j, denoted C^B_ij, represents the capacity in terms of bits per second from node i to node j using the backbone technology. The capacity between two nodes is zero if line-of-sight communication cannot be established. Similar to the backbone network, we can use the set V to form a directed complete graph representing the local network. Within the local technology, the capacity from node i to node j is denoted by C^L_ij. Since a link which handles local traffic has to be associated with the base station or a relay station, C^L_ij = 0 if both i and j are not in the set VR. As mentioned above, two relay stations using the same channel would interfere with each other's local network operations if they are placed in each other's interfering zones. Let N(i) be the set of nodes that interfere with the operation of node i, where i ∈ VR. Moreover, when there are NC local channels available, let Λ = {1, 2, ..., NC} be the local channel set. Furthermore, for each subscriber i, there is a pre-specified uplink demand, u_i, and downlink demand, d_i. Given the above as the input to our problem, we define the following decision variables. For the discrete decision variables, let us set X_iλ = 1 if an RS which uses channel λ is installed at node i; otherwise we set X_iλ = 0. We term X_iλ the location-channel variables. We also define the following continuous decision variables. Let f^d_ij and f^u_ij be the amount of downlink and uplink traffic flow from node i to node j using the backbone technology respectively. Let h^d_ij and h^u_ij be the amount of downlink and uplink traffic flow from node i to node j using the local technology respectively. All input and decision variables are non-negative. Moreover, we define X_01 = 1 since the base station is always present, and without loss of generality, we can also let the base station use channel 1 for its local network operation. Next, we formulate our problem as a mixed integer program.
4.1 Optimization Formulation
Our goal is to find the minimum number of RSs in the system which satisfies all the demand and interference constraints. The optimization formulation is as follows:
Whether a subscriber site is in VR or not depends on the willingness of the subscriber and other physical conditions. Furthermore, the base station, which has index 0, is included in the set VR .
min Σ_{i∈VR, λ∈Λ} X_iλ    (1)
s.t.
Σ_{i∈VR\{0}} f^u_i0 + Σ_{i∈V\{0}} h^u_i0 = Σ_{i∈V\{0}} u_i    (2)
Σ_{i∈VR\{0}} f^d_0i + Σ_{i∈V\{0}} h^d_0i = Σ_{i∈V\{0}} d_i    (3)
Σ_{j∈V\{0}} h^u_ji + Σ_{j∈VR\{0}, j≠i} f^u_ji = f^u_i0 + Σ_{j∈VR\{0}, j≠i} f^u_ij,    ∀ i ∈ VR\{0}    (4)
Σ_{j∈VR, j≠i} f^d_ji − Σ_{j∈VR, j≠i} f^d_ij = Σ_{j∈V\{0}} h^d_ij,    ∀ i ∈ VR\{0}    (5)
Σ_{j∈VR} h^u_ij ≥ u_i,    ∀ i ∈ V\{0}    (6)
Σ_{j∈VR} h^d_ji ≥ d_i,    ∀ i ∈ V\{0}    (7)
Σ_{j∈V, j≠i} (h^d_ij + h^u_ij)/C^L_ij + Σ_{j∈V, j≠i} (h^d_ji + h^u_ji)/C^L_ji ≤ (1 − X_iλ)k + 1,    ∀ i ∈ VR, λ ∈ Λ    (8)
Σ_{j∈V} (h^d_ij + h^u_ji) ≤ k Σ_{λ∈Λ} X_iλ,    ∀ i ∈ VR    (9)
Σ_{λ∈Λ} X_iλ ≤ 1,    ∀ i ∈ VR    (10)
f^u_ij + f^d_ij ≤ C^B_ij Σ_{λ∈Λ} X_iλ,    ∀ i, j ∈ VR, i ≠ j    (11)
f^u_ij + f^d_ij ≤ C^B_ij Σ_{λ∈Λ} X_jλ,    ∀ i, j ∈ VR, i ≠ j    (12)
X_iλ + Σ_{j∈N(i)} X_jλ ≤ 1,    ∀ i ∈ VR, λ ∈ Λ    (13)
The objective (1) minimizes the number of RSs to be installed in the network. Constraints (2) and (3) verify that the amount of traffic entering and exiting the base station equals the total uplink and downlink demands respectively. Constraints (4) and (5) verify that the amount of traffic entering each RS matches the amount of traffic exiting each RS (the conservation of flow at each RS). Constraints (6) and (7) verify that the uplink and downlink demands are met respectively. Constraints (8) and (9) work together with an arbitrarily large number k. If an RS which uses channel λ is placed at node i, then X_iλ = 1, and the right-hand side of constraint (8) is 1. Since local uplink and downlink traffic share the channel in a time-multiplexed fashion, constraint (8) verifies that the local traffic entering and exiting the ith RS does not exceed its capacity. If no RS is installed at node i, then X_iλ = 0 ∀λ ∈ Λ and the right-hand side of constraint (8) is k + 1; thus, constraint (8) does not impose any restriction on the traffic exiting and entering node i. However, the right-hand side of constraint (9) is then 0, which ensures that no local uplink traffic enters node i and no local downlink traffic exits node i.
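For readers who want to experiment with the formulation, the following sketch builds a toy instance of the model with the PuLP modeling library (an assumption on our part; the authors solve the problem with CPLEX). Only the objective and the purely combinatorial constraints (10) and (13) are shown; the flow and capacity constraints (2)-(9), (11)-(12) follow the same pattern with continuous variables. All data values are hypothetical.

```python
# Hypothetical toy instance of the RS placement / channel assignment model.
# pip install pulp  -- PuLP is assumed here; the paper's experiments use CPLEX.
import pulp

VR = [0, 1, 2, 3]                 # candidate RS sites (0 = base station)
channels = [1, 2, 3]              # local channel set Lambda
N_int = {0: [1], 1: [0, 2], 2: [1], 3: []}   # interfering zones N(i)

prob = pulp.LpProblem("rs_placement", pulp.LpMinimize)
X = pulp.LpVariable.dicts("X", (VR, channels), cat="Binary")

# Objective (1): minimize the number of installed relay stations.
prob += pulp.lpSum(X[i][l] for i in VR for l in channels)

prob += X[0][1] == 1              # the base station is always present (X_01 = 1)

for i in VR:
    # Constraint (10): at most one channel per candidate site.
    prob += pulp.lpSum(X[i][l] for l in channels) <= 1
    # Constraint (13): no co-channel RSs inside each other's interfering zone.
    for l in channels:
        prob += X[i][l] + pulp.lpSum(X[j][l] for j in N_int[i]) <= 1

# The demand/capacity constraints would be added analogously before solving.
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({(i, l): 1 for i in VR for l in channels if X[i][l].value()})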
Constraint (10) verifies that at most one channel can be assigned to an RS. Constraints (11) and (12) work together to ensure that positive backbone traffic between nodes i and j exists only if an RS is placed at node i and an RS is placed at node j. Finally, constraint (13) ensures that no two RSs which use the same channel are placed in each other's interfering zone.
4.2 Problem Reformulation and Bender's Decomposition
Traditionally, any mixed integer problem can be solved by branch-and-bound. However, such an approach is virtually intractable even for a small number of discrete variables, because an exponential number of linear programs have to be solved. Given that we have a large number of continuous variables and a relatively small number of integer variables, Bender's decomposition breaks the problem down into a sequence of small 0-1 integer problems [11] which can be solved efficiently by commercial optimization software such as CPLEX. In the following, we first reformulate the above analytical framework so that it can be decomposed by Bender's method. Then, we describe the algorithm that we use to solve the RS placement and channel assignment problem. To apply Bender's decomposition to a mixed integer program, the problem needs to be organized into the following form:
min_{x,y} c1 y + c2 x    (14)
s.t. A1 y + A2 x ≥ b    (15)
where x is a vector representing the location-channel variables X_iλ, y is a vector representing the set of continuous variables f^u_ij, f^d_ij, h^u_ij, h^d_ij, c1 = 0^t, c2 = 1^t, and (·)^t denotes vector transposition. For a fixed value of the location-channel variables x = x̂, problem (14) reduces to the following feasibility problem:
min_y T(y|x̂) = c1 y    (16)
s.t. A1 y ≥ b − A2 x̂    (17)
Obviously, given a particular RS placement and channel assignment x̂, the resulting problem (16)(17) may or may not be feasible. To make all location-channel variables feasible, let us introduce one positive continuous variable, v, and a very large infeasibility constant, P. We can then modify (16)(17) by changing A1 to A1' = [A1 | 1], c1 to c1' = [0^t | P], and y to y' = [y v]. Then, the modified feasibility problem is the following:
min_{y'} T'(y'|x̂) = c1' y' = Pv    (18)
s.t. A1' y' ≥ b − A2 x̂    (19)
For any x̂ which makes (16)(17) infeasible, problem (18)(19) is still feasible, but it will incur a very large infeasibility penalty Pv.
a = b is equivalent to a ≥ b and b ≥ a.
Now, let us consider the dual of the modified feasibility problem (18)(19). Let u be the vector of dual variables. The dual of the modified feasibility problem may now be formulated as follows:
max_u D(u|x̂) = (b − A2 x̂)^t u    (20)
s.t. A1'^t u ≤ c1'^t    (21)
u ≥ 0    (22)
Denote the optimal solutions to the linear programs (18) and (20) by y'* and u* respectively. Then, by duality theory,
c1' y'* = (b − A2 x̂)^t u*    (23)
We now consider all the extreme points of the dual problem (20). Note that the extreme points are defined by the feasible region described by (21) and (22), which is independent of the location-channel variables x. Thus, the extreme points can be generated without any knowledge of the RS locations and channel assignments. Let us denote the ith extreme point by u^i and the total number of extreme points by p. We know from the theory of linear programming that at least one optimal solution to any linear program occurs at an extreme point of the feasible region. Thus, the original problem can be reformulated as the following pure 0-1 problem:
min_x c2 x + D    (24)
s.t. D ≥ (b − A2 x)^t u^i,    ∀ i ∈ [1, p]    (25)
or equivalently,
min_x D'    (26)
s.t. D' ≥ c2 x + (b − A2 x)^t u^i,    ∀ i ∈ [1, p].    (27)
The difficulty with problem (26) is that the number of extreme points of the dual problem is potentially very large. Thus, we do not want to enumerate all of the constraints in (27) explicitly. Also, at the optimal solution to (26), only a small subset of the constraints (27) are likely to be tight. Thus, even if we could enumerate all of them, many of them would prove to be unnecessary. On the other hand, if we solve (26) with only a subset of the constraints in (27), we will obtain a valid lower bound on the optimal value of the original objective function. Furthermore, if all of the constraints that are tight in the optimal solution to (26) happen to be in the subset of constraints that we include, then the value of the objective function (26) will exactly equal the optimal value. To generate the desired subset of extreme points, Bender’s method adds constraints to the constraint set (27) one by one [11]. When a new constraint is added, the optimal solution of (26) returns either a better (larger) lower bound value or the optimal solution to the original problem (14) if a feasible RS placement and channel assignment exists.3 3
If no feasible solution exists, the algorithm will return a very large number.
4.3 Modified Bender's Method
In the original Bender's method, at each iteration one needs to solve a small pure 0-1 minimization problem. Even though this approach makes the problem manageable, it can still be time consuming. The purpose of solving (26) at each iteration is to find an appropriate extreme point to add to the constraint set (27). Instead of performing the minimization (26) at each iteration, we propose to use a heuristic to find a decent extreme point at each iteration, and only perform the minimization (26) when the extreme point generated by the heuristic is invalid. To further reduce the run time of Bender's method, we propose to use the results generated by each iteration to reduce the solution space. If a feasible RS placement and channel assignment exists, the optimal solution must be an integer which equals the minimum number of required relay stations. To take advantage of this observation, we add the constraint L_l ≤ Σ_{i∈VR, λ∈Λ} X_iλ ≤ L_u to the problem, where the minimum number of required relay stations must be greater than or equal to L_l and smaller than or equal to L_u. We initialize L_l = −∞ and L_u = ∞. We update L_u whenever a new (smaller) upper bound is found, and we update L_l whenever the lower bound, which is computed by the minimization of (26), increases by more than 1. Fig. 2 presents a flow chart of the modified Bender's decomposition approach to solving the RS placement and channel assignment problem.
Fig. 2. Flowchart of the modified Bender's decomposition approach for solving the RS placement and channel assignment problem
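As a complement to the flow chart, the sketch below mirrors the same loop on a generic toy problem of the form (14)-(15): solve the dual of the penalized feasibility subproblem to obtain an extreme point and an upper bound, add the corresponding cut to (27), and re-solve the master problem for a lower bound. SciPy's linprog is assumed for the LPs, and a brute-force search stands in for both the descent heuristic and the CPLEX master solve, so this is a structural sketch rather than the authors' implementation (the L_l/L_u pruning is omitted for brevity).

```python
import itertools
import numpy as np
from scipy.optimize import linprog

P = 1000.0  # infeasibility constant, as in Section 5

def dual_subproblem(A1, A2, b, x_hat):
    """Dual (20)-(22) of the penalized feasibility problem (18)-(19) for a
    fixed x_hat; returns an extreme point u of the dual feasible region."""
    rhs = b - A2 @ x_hat
    m = A1.shape[0]
    # linprog minimizes, so negate the maximization objective of (20);
    # the dual constraints are A1^t u <= 0 and 1^t u <= P, with u >= 0.
    res = linprog(-rhs,
                  A_ub=np.vstack([A1.T, np.ones((1, m))]),
                  b_ub=np.concatenate([np.zeros(A1.shape[1]), [P]]),
                  bounds=[(0, None)] * m, method="highs")
    return res.x

def master(cuts, c2, A2, b, n):
    """Brute-force stand-in for the pure 0-1 master problem (26)-(27)."""
    best_D, best_x = np.inf, None
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits, dtype=float)
        D = max(c2 @ x + (b - A2 @ x) @ u for u in cuts)
        if D < best_D:
            best_D, best_x = D, x
    return best_D, best_x

def benders(A1, A2, b, c2, tol=1e-6, max_iter=100):
    n = A2.shape[1]
    x_hat, cuts, incumbent = np.zeros(n), [], np.inf
    for _ in range(max_iter):
        u = dual_subproblem(A1, A2, b, x_hat)            # extreme point
        incumbent = min(incumbent,                       # incumbent upper bound
                        c2 @ x_hat + (b - A2 @ x_hat) @ u)
        cuts.append(u)                                   # new constraint in (27)
        lower, x_hat = master(cuts, c2, A2, b, n)        # lower bound
        if lower >= incumbent - tol:                     # bounds meet: optimal
            break
    return incumbent, x_hat

# Toy instance of min c2.x s.t. A1 y + A2 x >= b, y >= 0, x binary.
A1 = np.array([[1.0], [0.0], [1.0]])
A2 = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([2.0, 1.0, 2.0])
c2 = np.array([1.0, 1.0])
print(benders(A1, A2, b, c2))   # -> (1.0, array([0., 1.]))
```

On this toy instance the loop terminates after two cuts; in the paper's setting the master problem is the 0-1 program handled by the descent heuristic or CPLEX, and the L_l ≤ Σ X_iλ ≤ L_u bounds further prune its search space.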
We begin by using an empty constraint set (27), and we add new constraints to it iteratively. Using the RS locations and channel assignments x̂ generated by the previous iteration, we solve the modified feasibility problem (20). When this problem is solved, we obtain an extreme point u, and from it the minimum infeasibility penalty for the RS placement and channel assignment x̂ suggested by the previous iteration. The sum of the infeasibility penalty and the cost of the RSs constitutes an upper bound for the problem; that is to say, the upper bound = c2 x̂ + (b − A2 x̂)^t u. We keep the best (lowest) upper bound found so far and save it as the incumbent upper bound. The newly generated extreme point is then used to add a new constraint, D' ≥ c2 x + (b − A2 x)^t u, to the constraint set (27). In the original Bender's method, one solves the updated problem (26) and obtains a lower bound value and a new RS placement and channel assignment x̂'. In this work, we propose to use a simple heuristic to generate decent values of x̂', and we only perform the minimization if the lower bound value generated by the heuristic is higher than or equal to the incumbent upper bound; otherwise we set x̂ = x̂' and generate another extreme point. If the lower bound generated by the minimization and the incumbent upper bound are equal, we stop. Otherwise, we set x̂ = x̂' and go back to the beginning of the iterative phase.
5 Numerical Analysis
In this section, we present numerical results based on a hypothetical IEEE 802.16 network. A link capacity model, similar to that in [6], is used to determine the operational bit rate between any pair of nodes. The optimal RS placement and channel assignment in a typical rural environment is derived by the proposed modified Bender's decomposition method. Using the proposed optimization framework, we evaluate the cost of deploying a heterogeneous wireless mesh network with relay stations in a sparse rural area; the cost of the network is the minimum number of RSs required. We set the infeasibility constant P to 1000. For the backbone network, a 20 MHz spectrum is occupied and the IEEE 802.16 WirelessMAN-SC technology is used. For the local network, we use the IEEE 802.16 WirelessMAN-OFDM technology and a 20 MHz spectrum as well. As shown in Fig. 4, the subscribers are distributed in a 12 km × 12 km rural area, and the base station and central office are located at node 0, where they can be connected to the ISP core network via the fiber optic network of the railroad company, which is assumed to be underutilized. According to the IEEE 802.16 WirelessMAN-OFDM specifications, channel bandwidth can be adjusted dynamically; however, the bandwidth occupied by each channel is vendor specific. In this work, as an illustrative example, we assume that the local network spectrum is divided into three channels. Each RS or base station has a 4 km
5 Numerical Analysis In this section, we present numerical results based on a hypothetical IEEE 802.16 network. A link capacity model, similar to that in [6], is used to determine the operational bit rate between any pair of nodes. The optimal RS placement and channel assignment in a typical rural environment will be derived by the proposed modified Bender’s decomposition method. By using the proposed optimization framework, we evaluate the cost of deploying a heterogeneous wireless mesh network with relay stations in a sparse rural area. The cost of the network is the minimum number of RSs required. We set the infeasibility constant P to 1000. For the backbone network, a 20MHz spectrum is occupied, and the IEEE 802.16 WirelessMAN-SC technology is used. For the local network, we use the IEEE 802.16 WirelessMAN-OFDM technology and a 20MHz spectrum as well. As shown in Fig. 4, the subscribers are distributed in a 12km × 12km rural area, and the base station and central office are located at node 0, where they can be connected to the ISP core network via the fiber optic network of the railroad company, which is assumed to be underutilized. According to the IEEE 802.16 WirelessMAN-OFDM specifications, channel bandwidth can be adjusted dynamically. However, the bandwidth occupied by each channel is vendor specific. In this work, as an example of illustration, we assume that the local network spectrum is divided into three channels.7 Each RS or base station has a 4km 4 5 6
The link rate of one channel is one third of that of the original 20 MHz channel.
Fig. 3. Convergence of the original and modified Bender's decomposition method (incumbent upper and lower bounds, in units of 10^5, vs. time in hours)
Fig. 4. Network configuration of a heterogeneous wireless mesh network (58 nodes; the local channels 1-3 of the installed base station and RSs are indicated)
interference zone. In other words, if a base station or RS using a particular channel is placed in one location, another RS which uses the same channel cannot be placed within a 4 km radius of the former base station or RS. Among the 58 nodes, we assume the ISP has access to 50 of them for the installation of RSs. Furthermore, we set the uplink and downlink demand of each subscriber to 1 Mbps and 2 Mbps respectively. As mentioned in Section 4.2, the original version of Bender's decomposition method requires solving a small pure 0-1 optimization problem in each iteration, which can take a very long time. In the modified version shown in Fig. 2, we have integrated a classic descent algorithm [12] to reduce the runtime of the Bender's method. The convergence of the original and modified Bender's decomposition methods is shown in Fig. 3, and the resulting configuration of the network is shown in Fig. 4. It takes about 22 hours for the modified version of the Bender's decomposition method to converge to the optimal value, while in the same amount of time, the gap between the upper bound and the lower bound generated by the original method is still very large. Even though the extreme points generated by the heuristic at some iterations are not the desired extreme points, the heuristic can rapidly generate a set of useful extreme points, which leads to faster convergence than the original approach.
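The interfering sets N(i) used in constraint (13) follow directly from the 4 km interference zone described above. A minimal sketch of their construction from node coordinates is given below; the coordinates are hypothetical, not the paper's 58-node data set.

```python
import math

# Hypothetical node coordinates in km; the paper's layout is shown in Fig. 4.
coords = {0: (6.0, 6.0), 1: (5.2, 5.8), 2: (8.9, 3.1), 3: (1.5, 10.2)}
INTERFERENCE_RADIUS_KM = 4.0

def interfering_sets(coords, radius):
    """Build N(i): the candidate sites within the interference zone of i,
    i.e., those that cannot reuse i's local channel (constraint (13))."""
    N = {i: [] for i in coords}
    for i, (xi, yi) in coords.items():
        for j, (xj, yj) in coords.items():
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                N[i].append(j)
    return N

print(interfering_sets(coords, INTERFERENCE_RADIUS_KM))
```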
6 Conclusion
In this work, we investigate the optimal placement and channel assignment of wireless relay stations to minimize the operational cost of a wireless mesh network. We have presented a heterogeneous wireless mesh network architecture which uses relay stations to form a backbone and a local network. Furthermore, we have developed an analytical model to investigate whether a particular RS placement and channel assignment can satisfy the user demands and interference constraints. We use Bender's decomposition to compute the optimal number of RSs and their corresponding placement and channel assignment which minimize the operational cost of a heterogeneous wireless mesh network, and we integrate heuristics in the algorithm to reduce the runtime of the Bender's decomposition method. Given a set of network parameters, the proposed framework and optimization technique can offer significant run-time advantages when used by network designers to compute the optimal placement and channel assignment of relay stations and to provide design guidelines on network setup and maintenance cost estimation.
References
1. Perkins, C.E., ed.: Ad Hoc Networking. Addison-Wesley Longman (2001)
2. Haas, Z., Deng, J., Liang, B., Papadimitratos, P., Sajama, S.: Wiley Encyclopedia of Telecommunications. John Wiley & Sons (2002)
3. Draves, R., Padhye, J., Zill, B.: Comparison of routing metrics for multi-hop wireless networks. In: Proc. of ACM SIGCOMM, Portland, OR, U.S.A. (Sept. 2004)
4. Jain, K., Padhye, J., Padmanabhan, V.N., Qiu, L.: Impact of interference on multi-hop wireless network performance. In: Proc. of ACM MOBICOM, San Diego, CA, U.S.A. (Sept. 2003) 66–80
5. Ramachandran, K., Belding, E., Almeroth, K., Buddhikot, M.: Interference-aware channel assignment in multi-radio wireless mesh networks. In: Proc. of IEEE INFOCOM, Barcelona, Spain (April 2006)
6. So, A., Liang, B.: A Lagrangian approach for the optimal placement of wireless relay nodes in wireless local area networks. In: Proc. of the International IFIP-TC6 Networking Conference, Coimbra, Portugal (May 2006) 160–172 (extended version in press, to appear in the IEEE Transactions on Mobile Computing)
7. Bejerano, Y.: Efficient integration of multihop wireless and wired networks with QoS constraints. IEEE Transactions on Networking 12(6) (Dec. 2004) 1064–1078
8. Glapa, S.: Navigating the harsh realities of broadband wireless network economics. Technical report, ArrayComm, Inc. (2004)
9. Beaton, M.: Unwiring broadband, technology white paper. Technical report, Navini Networks (2002)
10. Sweeney, D., ed.: WiMax Operator's Manual: Building 802.16 Wireless Networks. 2nd edn. Apress (2005)
11. Martin, R.K.: Large Scale Linear and Integer Programming. 1st edn. Kluwer Academic Publishers (1999)
12. Wolsey, L.: Integer Programming. 1st edn. Wiley (1998)
Optimization Models for the Radio Planning of Wireless Mesh Networks Edoardo Amaldi, Antonio Capone, Matteo Cesana, and Federico Malucelli Politecnico di Milano, Dipartimento Elettronica ed Informazione, Milano, Italy {amaldi,capone,cesana,malucell}@elet.polimi.it
Abstract. In this paper we propose novel optimization models for the planning of Wireless Mesh Networks whose objective is to minimize the network installation cost, while providing full coverage to wireless mesh clients. Our mixed integer linear programming models aim at selecting the number and positions of mesh routers and access points, while taking into account in an accurate way traffic routing, interference, rate adaptation, and channel assignment. We provide the optimal solutions of the proposed problem formulations on a set of realistic-size instances and discuss the effect of different parameters on the characteristics of the planned networks.
1 Introduction
The network devices composing Wireless Mesh Networks (WMNs) [1] are of three types: Mesh Routers (MRs), Mesh Access Points (MAPs) and Mesh Clients (MCs). The functionality of both the MRs and the MAPs is twofold: they act as classical access points towards the MCs, and they have the capability to set up a Wireless Distribution System (WDS) by connecting to other mesh routers or access points through point-to-point wireless links. Both MRs and MAPs are often fixed and electrically powered devices. Furthermore, the MAPs are geared with some kind of broadband wired connectivity (LAN, ADSL, fiber, etc.) and act as gateways toward the wired backbone. MCs are user terminals connected to the network through MAPs or MRs. Several parameters concur to determine the effectiveness of a general wireless mesh network, including the number of radio interfaces for each device, the number of available radio channels, the access mechanism, the routing strategies and the specific wireless technology used to implement the mesh paradigm. All these parameters are de facto degrees of freedom the network designer can exploit to deploy an effective WMN; thus, optimization criteria are needed for the tuning of such parameters. To this end, many works have appeared in the literature with the purpose of providing optimized protocols for WMNs. So et al. propose in [2] a multichannel MAC protocol for the case where single-interface transceivers are used, whereas reference [3] analyzes networks where multiple radio interfaces per wireless node can be used, adapting the channel access protocol accordingly. Das et al. propose two
I.F. Akyildiz et al. (Eds.): NETWORKING 2007, LNCS 4479, pp. 287–298, 2007. © IFIP International Federation for Information Processing 2007
integer linear programming models to solve the fixed channel assignment problem with multiple radio interfaces [4], whilst [5] and [6] address the problems of channel assignment and routing jointly, providing different formulations of the optimization problem. Most of the previously published work assumes a given network topology, i.e., the general approach tends to optimize the channel assignment and/or the routing assuming given positions for the MRs and the MAPs. The purpose of the present work, on the other hand, is to model the radio planning problem of WMNs, providing quantitative methods to optimize the number, positions and coordination of MRs and MAPs and the overall topology of the network. The problem of planning WMNs differs from that of planning other wireless access networks, such as cellular systems [7] or WLANs [8]. In the latter cases, network planning involves selecting the locations in which to install the base stations or access points, setting their configuration parameters (emission power, antenna height, tilt, azimuth, etc.), and assigning channels so as to cover the service area and to guarantee enough capacity to each cell [9]. In the case of WMNs, each candidate site can host either a MAP or a MR, which have different installation costs. Roughly speaking, MAPs are more expensive than MRs since they must be directly connected to the wired backbone and might be more powerful than MRs in terms of both processing and transmission capabilities. Moreover, the traffic to/from the wired backbone has to be routed on a path connecting the MR to at least one MAP. In this context, the capacity limits of the radio links among MRs and between MRs and MAPs play a key role, since the traffic routed on a link must not exceed its capacity. The resulting network design problem must simultaneously consider the radio coverage of users, as in classical radio planning for wireless access networks [9], and the traffic routing, as in the design of wired networks [10]. Very few previous works consider the problem of planning WMNs or, more generally, fixed multi-hop wireless networks. The main attempt to address the problem appears in [11], where the focus is on locating internet transit access points (MAPs in the terminology adopted in this paper) in wireless neighborhood networks. Heuristic solutions and a lower bound are provided. Since the positions of all the other nodes (MRs) are given and such nodes are also the only traffic ending points in the network, the problem considered in [11] is actually a subproblem of that proposed in this paper, since the coverage part is not included. Moreover, the interference model there is based on the number of hops crossed by flows, while the model considered in this paper is a fluidic version [12] of the protocol interference model [14]. The problem in [11] was originally proposed in [13] where, on the other hand, a time division multiple access scheme is assumed and the slot scheduling optimization is included in the model. In this work we propose an optimization model for the problem of planning WMNs based on mathematical programming which takes into account both the local and the multihop connectivity requirements. The problem is NP-hard, but it can be solved to the optimum for realistic-size instances. We provide
optimal solutions for a set of synthetic instances and discuss the effect of different parameters on the characteristics of the solution. Our work is organized as follows: Section 2 gives the formulation of the proposed model and comments on its main features, whilst Section 3 reports numerical results. Concluding remarks are given in Section 4.
2 Wireless Mesh Network Planning
For the sake of simplicity, we define a basic version of the model for the WMN planning problem neglecting interference, adaptive modulation and multiple channels, and then we extend it step by step. Let us consider the network description presented in Figure 1. Similarly to the coverage problems commonly considered for wireless access networks [9], let S = {1, . . . , m} denote the set of candidate sites (CSs) and I = {1, . . . , n} the set of test points (TPs). A special node N represents the wired backbone network. The cost associated with installing a MR in CS j is denoted by cj, while the additional cost required to install a MAP in CS j is denoted by pj, j ∈ S. The total cost for installing a MAP in CS j is therefore (cj + pj). The traffic generated by TP i is given by parameter di, i ∈ I. The traffic capacity of the wireless link between CSs j and l is denoted by ujl, j, l ∈ S, while the capacity of the radio access interface of CS j is denoted by vj, j ∈ S. The sequence Si of CSs that can cover TP i is calculated for all TPs considering a non-increasing order of received signal strength, Si = {j1^(i), j2^(i), . . . , jLi^(i)}, i ∈ I. Based on the TP and CS locations and on propagation information, the connectivity parameters can be calculated. Let aij, i ∈ I, j ∈ S, be the coverage parameters, with aij = 1 if a MAP or MR in CS j covers TP i, and aij = 0 otherwise; and let bjl, j, l ∈ S, be the wireless connectivity parameters, with bjl = 1 if CSs j and l can be connected with a link, and bjl = 0 otherwise.
Fig. 1. WMN planning problem description
Decision variables of the problem include: TP assignment variables xij, i ∈ I, j ∈ S, with xij = 1 if TP i is assigned to CS j and xij = 0 otherwise; installation variables zj, j ∈ S, with zj = 1 if a MAP or a MR is installed in CS j and zj = 0 otherwise; wired backbone connection variables wjN, j ∈ S (if zj = 1, wjN denotes whether j is connected to the wired network N, i.e., whether it is a MAP or a MR), with wjN = 1 if a MAP is installed in CS j and wjN = 0 otherwise; wireless connection variables yjl, j, l ∈ S, with yjl = 1 if there is a wireless link between CSs j and l and yjl = 0 otherwise; and finally flow variables fjl, which denote the traffic flow routed on link (j, l), where the special variable fjN denotes the traffic flow on the wired link between MAP j and the backbone network. Given the above parameters and variables, the WMN planning problem can be stated as follows:

min Σ_{j∈S} (cj zj + pj wjN)                                              (1)

s.t.
Σ_{j∈S} xij = 1                                  ∀i ∈ I                   (2)
xij ≤ zj aij                                     ∀i ∈ I, ∀j ∈ S           (3)
Σ_{i∈I} di xij + Σ_{l∈S} (flj − fjl) − fjN = 0   ∀j ∈ S                   (4)
flj + fjl ≤ ujl yjl                              ∀j, l ∈ S                (5)
Σ_{i∈I} di xij ≤ vj                              ∀j ∈ S                   (6)
fjN ≤ M wjN                                      ∀j ∈ S                   (7)
yjl ≤ zj ,  yjl ≤ zl                             ∀j, l ∈ S                (8)
yjl ≤ bjl                                        ∀j, l ∈ S                (9)
z_{jℓ^(i)} + Σ_{h=ℓ+1}^{Li} x_{i,jh^(i)} ≤ 1     ∀ℓ = 1, . . . , Li − 1, ∀i ∈ I   (10)
xij, zj, yjl, wjN ∈ {0, 1}                       ∀i ∈ I, ∀j, l ∈ S        (11)
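To make the formulation concrete, the following is a minimal sketch of how (1)-(9) and (11) could be instantiated in Python with the PuLP modeling library; this is our own illustration, not the authors' code. The best-server constraints (10) are omitted for brevity, and the instance data (sets S, I and parameters c, p, d, u, v, a, b, M, passed as lists/dictionaries) are assumptions of the sketch:

import pulp

def build_frm(S, I, c, p, d, u, v, a, b, M):
    prob = pulp.LpProblem("FRM", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", [(i, j) for i in I for j in S], cat="Binary")
    z = pulp.LpVariable.dicts("z", S, cat="Binary")
    w = pulp.LpVariable.dicts("w", S, cat="Binary")            # w[j] stands for wjN
    y = pulp.LpVariable.dicts("y", [(j, l) for j in S for l in S], cat="Binary")
    f = pulp.LpVariable.dicts("f", [(j, l) for j in S for l in S], lowBound=0)
    fN = pulp.LpVariable.dicts("fN", S, lowBound=0)            # flow from MAP j to backbone
    prob += pulp.lpSum(c[j] * z[j] + p[j] * w[j] for j in S)   # objective (1)
    for i in I:
        prob += pulp.lpSum(x[i, j] for j in S) == 1            # (2) full coverage
        for j in S:
            prob += x[i, j] <= a[i][j] * z[j]                  # (3) coherence
    for j in S:
        prob += (pulp.lpSum(d[i] * x[i, j] for i in I)
                 + pulp.lpSum(f[l, j] - f[j, l] for l in S)
                 - fN[j]) == 0                                 # (4) flow balance
        prob += pulp.lpSum(d[i] * x[i, j] for i in I) <= v[j]  # (6) access capacity
        prob += fN[j] <= M * w[j]                              # (7) only MAPs reach the backbone
        for l in S:
            prob += f[l, j] + f[j, l] <= u[j][l] * y[j, l]     # (5) link capacity
            prob += y[j, l] <= z[j]                            # (8)
            prob += y[j, l] <= z[l]                            # (8)
            prob += y[j, l] <= b[j][l]                         # (9)
    return prob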
The objective function (1) accounts for the total cost of the network, including the installation costs cj and the costs pj related to connecting MAPs to the wired backbone. If for some practical reason only a MR and not a MAP can be installed in CS j, the corresponding variable wjN is set to zero. Constraints (2) provide full coverage of all TPs, while constraints (3) are coherence constraints ensuring that a TP i can be assigned to CS j only if a device (MAP or MR) is installed in j and i is within the coverage set of j. Constraints (4) define the flow balance in node j; they are the same as those adopted for classical multicommodity flow problems. The term Σ_{i∈I} di xij is the total traffic related to assigned TPs, Σ_{l∈S} flj is the total traffic received by j from neighboring nodes, Σ_{l∈S} fjl is the total traffic transmitted by j to neighboring nodes, and fjN is the traffic transmitted to the wired backbone. Even if these constraints assume that traffic from TPs is transmitted to the devices to which they are assigned and that this traffic is finally delivered by the network to the wired backbone, without loss of generality we can assume that di accounts for the sum of the uplink traffic (from TPs to the WMN) and the downlink traffic (from the WMN to the TPs), since radio resources are shared in the two directions. Constraints (5) impose that the total flow on the link between devices j and l does not exceed the capacity ujl of the link itself. Constraints (6) impose that the MCs' traffic serviced by a network device (MAP or MR) does not exceed the capacity of the wireless access link, whilst constraints (7) force the flow between device j and the wired backbone to zero if device j is not a MAP; the parameter M limits the capacity of the installed MAP. Constraints (8) and (9) define the existence of a wireless link between CS j and CS l, depending on the installation of nodes in j and l and on the wireless connectivity parameters bjl. Constraints (10) force the assignment of a TP to the best CS in which a MAP or MR is installed, according to a proper sorting criterion (such as received signal strength), whilst constraints (11) restrict the decision variables of the model to binary values. Obviously, the above model is NP-hard, since it includes the set covering and multicommodity flow problems as special cases. The model defined above considers fixed transmission rates for both the wireless access interface and the wireless distribution system, and it will be referred to as the Fixed Rate Model (FRM) throughout the paper. The FRM can easily be extended to incorporate transmission rate adaptation. As to the wireless distribution system, rate adaptation can be accounted for directly in the values ujl. Rate adaptation in the wireless access network can be accounted for with a slight modification of constraints (6). We consider several concentric regions centered in each CS, assigning to each region a maximum rate value. All the TPs falling in one of these regions can communicate with the node in the CS using the specific rate of the region. Formally, we can define the set of regions Rj = {1, . . . , K} for a given CS j and the sets Ij^k ⊂ I containing all the TPs falling in region k of CS j. Such sets can be determined for each CS j using the incidence variable a_ij^k, which is equal to
1 if TP i falls within region k of CS j and to zero otherwise. Each region k of a given CS j is assigned a maximum capacity vj^k. Using these definitions, the FRM can be extended to the case of rate adaptation in the wireless access part of the network by substituting constraints (6) with the following new constraints:

Σ_{k∈Rj} ( Σ_{i∈Ij^k} di xij ) / vj^k ≤ 1        ∀j ∈ S        (12)
The newly defined model with constraints (12) will be referred to as the Rate Adaptation Model (RAM) throughout the paper. Neither the FRM nor the RAM considers the effect of interference on the access capacity and on the capacity of the wireless links connecting mesh nodes. However, in practical cases interference effects must be taken into account. We focus here on the case of IEEE 802.11 and assume that all MAPs and MRs share the same radio channel for the access part and use another shared channel for the backbone links. Since the access capacity is now shared by all mesh nodes, we can take interference into account quite easily by modifying constraints (12) and considering not only the TPs assigned to the node in CS j but all TPs in its coverage range:

zj Σ_{k∈Rj} ( Σ_{i∈Ij^k} di ) / vj^k ≤ 1         ∀j ∈ S        (13)
The interference-limiting effect on the wireless link capacities is more difficult to account for, since it depends on the network topology and on the multiple access protocol. Considering the protocol interference model proposed in [14], we can define sets of links that cannot be active simultaneously. These sets depend on the specific multiple access protocol considered. In the case of CSMA/CA, adopted by IEEE 802.11, each set Cjl is associated with a link (j, l) and includes all links that are one and two hops away in the mesh-network graph (links connecting j and l to their neighbors, and their neighbors to the neighbors of their neighbors). To each set we can associate a constraint on the flows crossing its links:

yjl Σ_{(k,h)∈Cjl} fkh / ukh ≤ 1                  ∀j, l ∈ S      (14)
Obviously, by describing the capacity limitation due to set Cjl with a constraint on the flows crossing its links, we make an approximation, since we pass from a discrete model to a fluidic one [12]. The effect of this approximation, and of the one due to traffic dynamics, can be accounted for by properly reducing the capacity values ukh. We estimated through simulation of IEEE 802.11 multi-hop networks that a reduction of 5% is sufficient to achieve consistent results. Replacing constraints (5) with (14) and constraints (6) with (13), we get a new model referred to in the paper as the Interference Aware Model (IAM). Note that the nonlinear constraints (14) can easily be linearized.
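For instance, one standard big-M reformulation makes (14) linear (this is our own illustration; the paper does not state which linearization the authors use):

Σ_{(k,h)∈Cjl} fkh / ukh ≤ 1 + M′ (1 − yjl)       ∀j, l ∈ S

where M′ is any valid upper bound on the left-hand side (e.g., M′ = |Cjl| whenever every flow fkh is also bounded by its link capacity ukh). When yjl = 1 the constraint reduces to (14); when yjl = 0 it becomes vacuous.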
As a matter of fact, the FRM/RAM and the IAM can be considered extreme cases with respect to interference, corresponding respectively to scenarios where enough channels and radio interfaces are available at the mesh nodes that interference can be neglected, and to scenarios where only one channel is available for the mesh backbone. In all the intermediate scenarios, the channel assignment to mesh nodes must be included in the optimization model. The extension of the model to the multiple-channel case is quite straightforward and, due to length constraints, here we just outline the modifications needed. Let us assume that a set F of Q channels is available and that each mesh node is equipped with B radio cards. New installation variables zj^q (q ∈ F, j ∈ S) must be considered, which are equal to one if a mesh node is installed in j and is assigned channel q, and to zero otherwise. Link variables yjl^q are also extended to include the channel used by link (j, l), as are flow variables fjl^q. In the objective function we replace variable zj with a new variable tj, which is equal to 1 if a mesh node is installed in j. Constraints are easily modified to include the new dimension related to channels. In particular, constraints (3)-(5) are replaced with the following:

xij ≤ Σ_{q∈F} zj^q aij                           ∀i ∈ I, ∀j ∈ S           (15)
Σ_{i∈I} di xij + Σ_{l∈S, q∈F} (flj^q − fjl^q) − fjN = 0    ∀j ∈ S        (16)
yjl^q Σ_{(k,h)∈Cjl} fkh^q / ukh ≤ 1              ∀j, l ∈ S, ∀q ∈ F        (17)

To limit to B the maximum number of channels assigned to a mesh node, a new set of constraints must be added:

Σ_{q∈F} zj^q ≤ B                                 ∀j ∈ S                   (18)

and to define the new variables tj we add:

zj^q ≤ tj                                        ∀j ∈ S, ∀q ∈ F           (19)

The new model will be referred to as the Multi-Channel Model (MCM).
3 Numerical Results
In this section we test the sensitivity of the models proposed in the previous section to different parameters, such as the number of candidate sites, the traffic demands of the MCs and the installation costs. To this end, we implemented a generator of WMN topologies which considers specific parameter settings and makes some assumptions on propagation and device features. Obviously, these assumptions do not affect the proposed model, which is general and can be applied to any problem instance. The generator considers a square area with edge L = 1000 m,
and it randomly draws the positions of m CSs and of n = 100 TPs. The coverage area of a mesh node is assumed to be a circular region with radius RA = 100 m. Only feasible instances, where each TP is covered by at least one CS, are considered. The wireless range of the wireless backbone links is RB = 250 m, while the capacities of the access links, vj, and of the backbone links, ujl, are both set to 54 Mb/s for all j and l. The capacity of the links connecting MAPs to the wired network is M = 128 Mb/s, while the ratio between the cost of a MR and that of a MAP is β (β = 1/10 unless otherwise specified). All the results commented on hereafter are the optimal solutions of the considered instances, obtained by formalizing the model in AMPL [16] and solving it with CPLEX [15] on workstations equipped with an AMD Athlon processor operating at 1.2 GHz and with 1024 MB of RAM.
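A sketch of such an instance generator, under the stated assumptions (uniform random positions, circular coverage of radius RA, backbone range RB); the code, function and variable names are ours and not taken from the paper:

import math, random

def generate_instance(m, n=100, L=1000.0, RA=100.0, RB=250.0):
    # random candidate sites (CSs) and test points (TPs) in an L x L square
    cs = [(random.uniform(0, L), random.uniform(0, L)) for _ in range(m)]
    tp = [(random.uniform(0, L), random.uniform(0, L)) for _ in range(n)]
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # coverage parameters a[i][j] and wireless connectivity parameters b[j][l]
    a = [[1 if dist(tp[i], cs[j]) <= RA else 0 for j in range(m)] for i in range(n)]
    b = [[1 if j != l and dist(cs[j], cs[l]) <= RB else 0 for l in range(m)] for j in range(m)]
    feasible = all(any(row) for row in a)  # every TP covered by at least one CS
    return cs, tp, a, b, feasible

Infeasible draws (feasible == False) would simply be discarded and regenerated.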
3.1 Fixed Rate Model
Once the number and positions of the CSs and TPs are assigned, the quality of the deployed WMN, and consequently the overall installation cost, depends on two parameters: the traffic demand d of the MCs and the ratio β between the MR and MAP installation costs. In this section we analyze the sensitivity of the proposed model to these parameters.
Fig. 2. Sample WMNs planned by the FRM with increasing traffic demands of the MCs ((a) d = 600 Kb/s, (b) d = 3 Mb/s) and finite capacity of the installed MAPs (M = 128 Mb/s). Standard setting of the topological parameters.
Effect of the Traffic Demands. Figure 2 reports an example of the planned networks when applying the FRM to the same instance with two different requirements on the end-user traffic, d = 600 Kb/s and d = 3 Mb/s for all MCs. Table 1 analyzes the characteristics of the solutions of the FRM when varying the number of candidate sites. The results presented are obtained by averaging each point over 10 instances of the network topology. For each couple (m, d), the table
Table 1. Solutions provided by the FRM

              d = 600 Kb/s                        d = 3 Mb/s
       MAP    MR     Links   Time (s)      MAP    MR     Links   Time (s)
m=30   2.25   23.75  21.50   0.4           4      23.65  20.20   0.63
m=40   1.45   24     22.55   1.43          3.40   23.75  21.00   10.93
m=50   1.25   24.15  22.9    4.69          3.25   23.95  21.55   32.88
reports the number of installed MAPs, the number of installed MRs, the number of wireless links of the WDS, and the processing time needed to obtain the optimal solution. Two main results emerge from the table. First, the same effect of the traffic increase observed in Figure 2 is evident also in the averaged results; in fact, the number of installed MAPs increases when the traffic demands increase. Second, for a given traffic value, increasing the number of CSs to 50 increases the probability for a MC to be connected to a MAP through a multi-hop wireless path; therefore the model tends to install fewer MAPs and more MRs. On the other hand, if the number of CSs is lower (30), the model installs more MAPs, since not all the MCs can be connected to the installed MAPs through multi-hop wireless paths. In other words, with high m the solution space is larger and the model favors those solutions providing connectivity which have a lower impact on the network cost, i.e., those installing more MRs than MAPs.

Effect of the Cost Parameter. The number of installed MAPs and MRs intuitively depends on the installation cost ratio between a simple wireless router and a mesh access point.

Table 2. Solutions provided by the FRM when varying the installation cost ratio β. Number of CSs m = 30.
        d = 600 Kb/s            d = 3 Mb/s
β       MAP    MR     Links     MAP    MR     Links
1/10    2.10   23.40  21.30     3.80   23.30  20.20
1/7     2.40   24.00  21.60     4.10   23.80  20.10
1/5     2.40   24.30  22.70     4.10   23.80  20.10
1/3     2.40   24.30  21.60     4.10   23.80  20.10
1/2     2.80   23.60  20.80     4.20   23.60  19.60
Table 2 reports the characteristics of the solutions when varying the parameter β for different values of the offered traffic d. The results reported in the table show that if the cost of installing a MAP decreases with respect to the cost of a MR, the proposed model tends to install more MAPs. However, the differences observed with different cost ratios are much smaller than one might expect.
Table 3. Quality of the solutions provided by the RAM

              d = 200 Kb/s                        d = 600 Kb/s
       MAP    MR     Links   Time (s)      MAP    MR     Links   Time (s)
m=30   2.80   23.80  21.00   2.24          2.80   25.40  22.60   2.24
m=40   1.70   24.60  22.90   9.66          1.70   26.00  24.30   13.25
m=50   1.20   24.40  23.20   46.83         1.20   26.10  24.90   61.37
The reason is that in the considered scenario the optimization process is driven mainly by the capacity constraints. We obtained results (not shown here) with a bigger difference in the number of installed MAPs by allowing a higher capacity for the links connecting the MAPs to the wired backbone (setting high values of the parameter M).
3.2 Rate Adaptation Model
In real wireless networks, the capacity of a given wireless link depends on the distance between transmitter and receiver. The RAM captures this fact by defining three capacity regions around a MR (or MAP) and assigning the link between a MC and a MR (or MAP) an increasing capacity as the MC gets nearer to the MR (or MAP) location. The rate values vj^k = v^k ∀j adopted to obtain the numerical results emulate IEEE 802.11g transmission and depend on the distance r: v^k = 36 Mb/s for 0 m ≤ r ≤ 30 m, v^k = 18 Mb/s for 30 m < r ≤ 60 m, and v^k = 2 Mb/s for 60 m < r ≤ 100 m. The behavior of the RAM is similar to that of the FRM in terms of sensitivity to the model parameters. Table 3 summarizes the characteristics of the solutions of the RAM when varying the number of candidate sites. The traffic offered by the MCs is lower than the one used to test the FRM, since the link rates are on average lower than those in the FRM case. As a result, we observed that higher values of d may lead to infeasible instances. The results obtained highlight a behavior of the RAM very similar to the one already observed for the FRM in the same configuration.
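The distance-to-rate mapping above can be expressed as a small helper function (our own illustration):

def access_rate(r):
    # emulated IEEE 802.11g access rates v^k (in b/s) as a function of the TP-CS distance r (meters)
    if r <= 30:
        return 36e6
    if r <= 60:
        return 18e6
    if r <= 100:
        return 2e6
    return 0.0  # beyond the coverage radius RA, no access is possible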
3.3 Interference Aware Model
The IAM considers the impact of interference on the access capacity, through constraints (13), and on the capacities of the wireless links, through constraints (14). Table 4 reports the results obtained with the IAM for the special case in which rate adaptation is not included. The parameter settings are the same adopted for the FRM, and therefore the results in Table 4 can be directly compared with those reported in Table 1. The sets Cjl have been obtained considering the IEEE 802.11 multiple access scheme. We observe that the number of MAPs installed is remarkably higher, and the number of links lower, than in the FRM case. This is due to the capacity reduction of the wireless links, which favors solutions where MRs and MAPs are interconnected through paths with a small number of hops. In fact, with short paths between MRs and MAPs the effect of interference is weaker. Obviously, short paths
Table 4. Solutions provided by the IAM without rate adaptation

              d = 600 Kb/s                        d = 3 Mb/s
       MAP    MR     Links   Time (s)      MAP    MR     Links   Time (s)
m=30   3.40   22.20  19.40   4.37          8.50   22.00  14.20   4.16
m=40   2.50   23.10  20.90   258.09        7.80   22.50  16.10   53.98
m=50   2.30   23.40  21.60   1,706.63      7.70   23.40  17.80   1,345.32
Table 5. Solutions provided by the MCM without rate adaptation

              d = 600 Kb/s                        d = 3 Mb/s
       MAP    MR     Links   Time (s)      MAP    MR     Links   Time (s)
m=30   2.25   23.75  21.60   2.99          4.00   23.65  20.55   9.79
m=40   1.45   24.00  22.60   73.43         3.40   23.75  21.30   141.18
m=50   1.25   24.05  22.85   393.38        3.30   23.65  21.35   673.52
require a higher number of MAPs. Another relevant difference between the FRM and IAM results is the computation time, which is much higher for the IAM in most of the cases. As expected, this is due to the structure of constraints (14), which involve several flow variables simultaneously. A similar behavior can be observed in the results (not shown here) obtained with the IAM and rate adaptation.
3.4 Multiple Channel Model
The MCM adds the channel assignment for multi-radio devices to the planning problem. Table 5 reports the numerical results obtained by the MCM when considering Q = 11 channels and B = 3 radio interfaces, which are typical values for the IEEE 802.11a technology. We observe that the results are not very different from those obtained with the FRM (see Table 1) in terms of installed MAPs, MRs and links. On the other hand, the computational complexity of the MCM is much higher than that of the other models: with a limit of four hours on the computation time, we were able to obtain the optimal solution (reported in the table) in only 80% of the instances. Therefore, the MCM should be adopted for planning multi-channel/multi-radio WMNs only when the number of available channels and interfaces is very limited. In the other cases the FRM/RAM models can safely be used for planning the network, and the frequency assignment can be optimized in a second phase [4].
4 Conclusion
In this paper we proposed an optimization model based on mathematical programming whose objective function is the minimization of the overall network
installation cost, while taking into account the coverage of the end users, the wireless connectivity in the wireless distribution system, and the management of the traffic flows. Technology-dependent issues such as rate adaptation and interference effects have been considered. To test the quality of the solutions provided by the model, we generated synthetic instances of WMNs and solved them to the optimum using AMPL/CPLEX, varying several network parameters. The numerical results we gathered show that the model is able to capture the effect of all these parameters on the network configuration, providing a promising framework for the planning of WMNs.
References
1. I.F. Akyildiz, X. Wang, A Survey on Wireless Mesh Networks, IEEE Com. Mag., Sept. 2005, Vol. 43, No. 9, pp. 23–30.
2. J. So, N. Vaidya, Multi-Channel MAC for Ad Hoc Networks: Handling Multi-Channel Hidden Terminals Using a Single Transceiver, ACM MOBIHOC 2004, pp. 222–233.
3. A. Adya et al., A Multi-Radio Unification Protocol for IEEE 802.11 Wireless Networks, IEEE BROADNETS 2004, pp. 344–354.
4. A.K. Das, H.M.K. Alazemi, R. Vijayakumar, S. Roy, Optimization models for fixed channel assignment in wireless mesh networks with multiple radios, IEEE SECON 2005, pp. 463–474.
5. M. Alicherry, R. Bhatia, L.E. Li, Multi-radio, multi-channel communication: Joint channel assignment and routing for throughput optimization in multi-radio wireless mesh networks, ACM MOBICOM 2005, pp. 58–72.
6. A. Raniwala, K. Gopalan, T.C. Chiueh, Centralized channel assignment and routing algorithms for multichannel wireless mesh networks, ACM MC2R, Apr. 2004, Vol. 8, No. 2, pp. 50–65.
7. E. Amaldi, A. Capone, F. Malucelli, Planning UMTS Base Station Location: Optimization Models with Power Control and Algorithms, IEEE Trans. on Wireless Comm., Sept. 2003, Vol. 2, No. 5, pp. 939–952.
8. S. Bosio, A. Capone, M. Cesana, Radio Planning of Wireless Local Area Networks, ACM/IEEE Trans. on Net., to appear.
9. E. Amaldi, A. Capone, F. Malucelli, C. Mannino, Optimization Problems and Models for Planning Cellular Networks, in Handbook of Optimization in Telecommunications, Kluwer Academic Publishers, 2006.
10. M. Pioro, D. Medhi, Routing, Flow, and Capacity Design in Communication and Computer Networks, Morgan Kaufmann Publishers, 2004.
11. R. Chandra, L. Qiu, K. Jain, M. Mahdian, Optimizing the placement of integration points in multi-hop wireless networks, IEEE ICNP 2004, pp. 271–282.
12. M. Kodialam, T. Nandagopal, Characterizing the Capacity Region in Multi-Radio Multi-Channel Wireless Mesh Networks, ACM MOBICOM 2005, pp. 73–87.
13. Y. Bejerano, Efficient integration of multi-hop wireless and wired networks with QoS constraints, ACM MOBICOM 2002, pp. 215–226.
14. P. Gupta, P.R. Kumar, The capacity of wireless networks, IEEE Trans. on Inf. Theory, Mar. 2000, Vol. 46, No. 2, pp. 388–404.
15. ILOG CPLEX 8.0 User's Manual, 2002.
16. R. Fourer, D.M. Gay, B.W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, 1993.
Interference-Aware Multicasting in Wireless Mesh Networks
Sudheendra Murthy, Abhishek Goswami, and Arunabha Sen
School of Computing and Informatics, Arizona State University, Tempe, AZ 85287 [email protected], [email protected]
Mobile Devices Software, Motorola [email protected]
Abstract. Multicasting is one of the most important applications in wireless ad hoc networks and the currently emerging wireless mesh networks. In such networks, interference due to the shared wireless medium is a prime factor in determining the data rate achievable by a multicast application. In this research work, we present an interference-aware multicast routing algorithm that takes into account the effects of interference to determine the maximum-bandwidth multicast structure. We characterize the problem of computing the maximum-bandwidth multicast structure as the graph problem of finding a minimum-degree weakly induced subgraph subject to connectivity and interference constraints. We establish the intractability of the problem and provide an efficient heuristic that performs close to the optimal in most cases. We also present the design of a more practical distributed algorithm. The simulation results demonstrate the benefits of our heuristic over Shortest Path Tree and Minimum Steiner Tree approximation algorithms. Keywords: Wireless Mesh Network, Minimum Interference Multicast, Weakly Induced Connected Subgraph, NP-Complete.
1 Introduction
Research in Wireless Mesh Networks (WMNs) has recently gained tremendous momentum as a result of commercial deployments in many US cities, including Seattle, Philadelphia and Tempe [1]. WMNs are increasingly being used to provide cost-effective and reliable Internet connectivity to residents and businesses in these cities. These WMNs consist of a set of wireless routers (access points) to which the end users connect, a set of wireless routers that act as forwarding nodes, and a set of gateway routers that provide connectivity to the Internet. Data from the end users is routed in the WMN towards the gateway routers and on to the Internet.
This research was supported in part by ARO grant W911NF-06-1-0354. The information reported here does not reflect the position or the policy of the federal government.
The widespread deployment of WMNs has fueled research on providing better support for multimedia applications such as real-time video transport and Voice over the Internet (VoIP) services. Common to many of these applications is the need for a multicast framework that facilitates efficient distribution of datagrams to a cohort of hosts. Multicasting results in bandwidth savings compared to multiple unicast sessions. Early efforts at providing multicast support in WMNs failed to consider the interference effects of the shared wireless medium in which the mesh routers operate. Interference is an important factor that dictates the bandwidth available for the multicast transmission. The mesh routers are usually equipped with IEEE 802.11 a/b/g interfaces. Interference in the context of this paper refers to the bandwidth-sharing effect between nodes operating in close range, caused by the CSMA/CA nature of 802.11. In this paper, we present a novel multicast framework that considers the effects of interference and constructs a multicast structure that provides maximum bandwidth to the applications. Our objective is to identify multicast forwarding group nodes whose transmission induces the least amount of interference. To the best of our knowledge, this is the first paper that proposes such a framework. The main contributions of this work are as follows.
1. We provide a novel formulation of the interference-aware multicasting problem as a graph problem of finding a weakly induced subgraph in a graph representing the mesh routers.
2. We prove the NP-completeness of the problem, thereby establishing its intractable nature.
3. We provide an Integer Linear Program for optimally solving this problem.
4. We provide a centralized heuristic algorithm that performs close to the optimal and describe a distributed implementation of the algorithm. We compare the performance of the proposed heuristic with the optimal solution, the shortest path solution and the minimum Steiner tree solution.
The road map for the rest of the paper is as follows. In Section 2, we introduce the required notation and give the formal problem definition. The complexity analysis of the problem is provided in Section 3. The Integer Linear Program formulation to obtain the optimal solution is provided in Section 4. The centralized heuristic and the design of its distributed extension are provided in Section 5. Section 6 presents the evaluation of our heuristics. Section 7 reviews the related work in this area, while Section 8 concludes the paper.
2 Problem Formulation
2.1 Wireless Transmission and Interference Model
We assume a uniform transmission range RT and interference range RI for all the routers in the WMN. We represent a WMN by a directed potential communication graph G(V, E) in which the vertices represent the mesh routers.
Fig. 1. A Potential Communication Graph G(V, E) with transmission radius RT
Fig. 2. Potential Interference Graph H(V, F) with interference radius RI = 1.5RT
Fig. 3. S_H, the subgraph weakly induced by the vertex set S = {3, 6, 7} on the graph of Figure 2
Denote the Euclidean distance between routers u and v by dist(u, v). Two directed edges (u, v), (v, u) ∈ E exist if dist(u, v) ≤ RT, implying that mesh router u can communicate directly with mesh router v. We assume that each router is equipped with one radio and that all routers operate on the same channel. Any of the mesh routers can be the source node in the multicast tree. However, in many applications the multicast source comes from the wired network, and hence the multicast source is a gateway mesh router. An access point router becomes a receiver of the multicast structure if there is at least one end user connected to the access point who wants to receive the multicast stream. We assume an 802.11 CSMA/CA medium access control scheme (we consider the standard 802.11 protocol without RTS/CTS; our techniques can be applied to RTS/CTS 802.11 environments with some minor changes). Thus, a transmission between two nodes may prevent all nodes within the transmission range of the sender from transmitting, due to carrier sensing. We assume that the interference range RI is q × RT with q ≥ 1. We model the co-channel interference of the WMN with the help of a directed potential interference graph H(V, F), in which the vertex set corresponds to the mesh routers. Two directed edges (u, v), (v, u) ∈ F exist if dist(u, v) ≤ RI. A directed edge (u, v) in the potential interference graph implies that the transmission of router u can cause interference at router v. Figure 1 shows a sample potential communication graph with 7 mesh routers. The circles around the routers are drawn with radius RT/2, and hence two intersecting circles imply that the two routers are within transmission range of each other. Figure 2 shows the corresponding potential interference graph with RI = 1.5RT. Since RT ≤ RI, the potential communication graph is always a subgraph of the potential interference graph. This interference model is different from the frequently used conflict graph representation of interference in a WMN. In the conflict graph representation, a vertex v_ij is in the vertex set of the conflict graph if (i, j) is an edge in the potential communication graph. There exists an edge between two vertices v_ij and v_kl in the conflict graph if min(dist(i, k), dist(i, l), dist(j, k), dist(j, l)) ≤ RI. In the context of the problem explored in this paper, our interference model offers some benefits over the conflict graph model. Firstly, our interference model is more intuitive in the sense that it captures the idea of co-channel interference occurring at the receiving nodes in a WMN. Secondly, it can easily be seen that the conflict graph and the potential interference graph can be derived from each other when the potential communication graph is known. Finally, the potential interference graph modeling makes our problem definition more elegant.
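To make the two-graph model concrete, the following sketch builds G and H from router positions using networkx; the code and names are ours, not the authors':

import itertools, math
import networkx as nx

def build_graphs(pos, RT=100.0, q=1.5):
    # pos: dict mapping router id -> (x, y) coordinates; RI = q * RT
    RI = q * RT
    G, H = nx.DiGraph(), nx.DiGraph()
    G.add_nodes_from(pos)
    H.add_nodes_from(pos)
    for u, v in itertools.permutations(pos, 2):
        d = math.dist(pos[u], pos[v])
        if d <= RT:
            G.add_edge(u, v)   # potential communication edge
        if d <= RI:
            H.add_edge(u, v)   # potential interference edge (G is a subgraph of H)
    return G, H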
2.2 Graph Definitions and Notations
All graphs defined in this paper are directed graphs; we drop the prefix "directed" and henceforth a graph implies a directed graph. In this section, we introduce the required graph terminology.
– The in-degree of a vertex v in a graph G is the number of arcs coming into v, and the out-degree of v is the number of arcs going out of v. The in-degree and out-degree of a vertex are represented by δ⁻_G(v) and δ⁺_G(v) respectively.
– The maximum in-degree of a graph G, represented by Δ⁻(G), is the maximum of the in-degrees of its vertices, i.e., Δ⁻(G) = max_{v∈V(G)} δ⁻_G(v). Similarly, the maximum out-degree of a graph G, represented by Δ⁺(G), is the maximum of the out-degrees of its vertices, i.e., Δ⁺(G) = max_{v∈V(G)} δ⁺_G(v).
– A vertex u is a neighbor of vertex v if there is a directed edge from v to u. The (closed) neighborhood of vertex v in graph G(V, E), denoted by N[v], is the set that includes v and the neighbors of v. The (closed) neighborhood of a subset S ⊆ V of graph G(V, E), denoted by N[S], is the set that includes S and the neighbors of S. That is, N[S] = S ∪ (∪_{v∈S} N[v]).
– The subgraph weakly induced by a vertex set S ⊆ V in graph G(V, E) is defined as the graph with vertex set N[S] and edge set E ∩ (S × N[S]). In other words, the edge set of the subgraph weakly induced by S consists of all the edges induced by the vertices of S along with the directed edges from set S to its neighbors in the graph. The weakly induced subgraph of graph G on the vertex set S is denoted by S_G. An example of the subgraph weakly induced by the set of vertices S = {3, 6, 7} on the potential interference graph of Figure 2 is shown in Figure 3.
Throughout this paper, we use the notation G(V, E) to represent the potential communication graph and H(V, F) to represent the potential interference graph.
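Following this definition, the weakly induced subgraph can be computed directly; a sketch in the same networkx style as the earlier snippet (our code):

import networkx as nx

def weakly_induced(G, S):
    # N[S]: S together with the out-neighbors of every vertex in S
    NS = set(S)
    for v in S:
        NS.update(G.successors(v))
    W = nx.DiGraph()
    W.add_nodes_from(NS)
    # edge set E ∩ (S x N[S]): every edge of G whose tail lies in S
    W.add_edges_from((u, t) for u in S for t in G.successors(u))
    return W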
2.3 Problem Definition
The maximum data rate that can be achieved in a multicast structure is limited by the data rate of the bottleneck link. The data rate of the bottleneck link is determined by the amount of interference experienced by the receiver of the link. For instance, a mesh node present in the communication range of 5 transmissions
experiences more interference, and thus provides less throughput, than a mesh node present in the communication range of 4 transmissions. In an effort to maximize the data rate of the multicast structure, we try to select the forwarding group nodes that induce the least amount of interference. That is, given the locations of the wireless mesh routers, a multicast source node and a set of multicast receivers, our goal is to find the group of forwarding nodes that induce the least interference on the forwarding nodes and the receivers. In terms of the potential communication graph and the potential interference graph, the problem is formally stated as follows. Given a potential communication graph G(V, E), its corresponding interference graph H(V, F), a multicast source vertex s ∈ V and a set of receiver vertices R ⊆ V \ s, find a vertex set S ⊂ V such that
1. the subgraph of G weakly induced by S, i.e., S_G, has directed paths connecting s to each ri ∈ R, and
2. the maximum in-degree of the vertices of the subgraph of H weakly induced by S, i.e., Δ⁻(S_H), is minimized.
We term this problem the Minimum-Degree Weakly Induced Connected Subgraph (MDWICS) problem. The optimal subset S of the MDWICS problem contains the source vertex and the forwarding group vertices that result in the least interference. In the next section, we prove the hardness of the MDWICS problem.
3 Computational Complexity
To prove the NP-completeness of the MDWICS problem, we provide a polynomial-time transformation from the Exact Cover by 3-Sets (X3C) problem [2]. Consider the scenario in which the interference and transmission ranges of the transmitters are equal. In this case, the potential communication graph and the potential interference graph have the same vertex and edge sets and can thus be represented by a single graph, say G(V, E). The decision version of the MDWICS problem is then stated as follows.
INSTANCE: Directed graph G = (V, E), vertex s ∈ V designated as source, vertices R ⊆ V \ s designated as the set of receivers, and a positive integer K.
QUESTION: Is there a subset S ⊂ V such that the subgraph weakly induced by S in G, denoted by S_G, has paths from s to each ri ∈ R and Δ⁻(S_G) is at most K?
Theorem 1. The MDWICS problem is NP-complete.
Proof. Clearly the MDWICS problem is in NP, since a nondeterministic algorithm need only guess the vertex set S and check in polynomial time whether the subgraph weakly induced by S in G has paths from s to each ri ∈ R and whether the maximum in-degree of the subgraph is at most K. Suppose a finite set X = {x1, x2, . . . , x3q} and a collection C = {C1, C2, . . . , Cm} of 3-element subsets of X make up the instance of X3C. From this
instance, we construct an instance of the MDWICS problem using the local replacement technique. Corresponding to every element xi ∈ X, 1 ≤ i ≤ 3q, and every subset Cj ∈ C, 1 ≤ j ≤ m, introduce vertices xi and cj respectively in graph G(V, E). These vertices, together with an additional vertex s, make up the vertex set of G. The edge set of G consists of two types of edges. The first set includes directed edges (cl, xi), (cl, xj) and (cl, xk) for every subset Cl = {xi, xj, xk}. The second set includes directed edges from s to each cj. The graph constructed using this mechanism has m + 3q + 1 vertices and 4m edges. Designate vertex s as the source and the vertices R = {x1, . . . , x3q} as the set of receivers. This completes the construction procedure of the proof. Suppose that K = 1. We claim that there exists a set S ⊂ V such that S_G has paths from s to each xi, 1 ≤ i ≤ 3q, and Δ⁻(S_G) ≤ 1 if and only if the corresponding X3C instance contains an exact cover for X.
Fig. 4. Local replacement for subsets C1 = {x1, x2, x3}, C2 = {x2, x4, x6}, C3 = {x3, x5, x6}, C4 = {x1, x3, x5}, C5 = {x4, x5, x6}
It is easy to verify that if X3C has an exact cover, then the vertices corresponding to the subsets in the X3C solution, together with vertex s, form a solution to the MDWICS problem. Conversely, suppose S ⊆ V is such that S_G has paths from source s to each of the receivers xi and the maximum in-degree of S_G is at most 1. Consider the set S′ = S ∩ {c1, . . . , cm}. Note that each vertex ci ∈ V, 1 ≤ i ≤ m, has exactly 3 outgoing edges. |S′| = q, since otherwise there would be either no path (|S′| < q) or multiple paths (|S′| > q) to some xi, thereby violating the maximum in-degree of 1. It follows that the subsets corresponding to the vertices in S′ form an exact cover for the X3C problem.
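For concreteness, the local replacement construction can be written down in a few lines (a sketch; the naming conventions are ours):

import networkx as nx

def x3c_to_mdwics(X, C):
    # X: list of 3q elements; C: list of 3-element subsets of X
    G = nx.DiGraph()
    G.add_node("s")
    for j, Cj in enumerate(C):
        G.add_edge("s", ("c", j))           # edge from the source to subset vertex c_j
        for x in Cj:
            G.add_edge(("c", j), ("x", x))  # edges from subset vertex to its three elements
    receivers = [("x", x) for x in X]
    return G, "s", receivers                # X3C has an exact cover iff K = 1 is achievable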
4 Optimal Solution
In this section, we provide an Integer Linear Program (ILP) formulation [3] to solve the MDWICS problem optimally. Given a potential communication graph G(V, E), a potential interference graph H(V, F), a source s ∈ V and receivers R ⊆ V \ s, the problem is to find a subset S ⊂ V that minimizes the maximum in-degree of S_H subject to the constraint that S_G has paths from s to each ri ∈ R. The ILP finds the weakly induced subgraph that achieves this goal. From this subgraph, we can extract the desired node set S by removing leaf nodes, that is, nodes with zero out-degree.
The indicator variables are defined as follows. x_{i,j} = 1 if edge (i, j) ∈ E is in the optimal solution, and 0 otherwise. Define f^p_{i,j} = 1 if there is a flow from s to receiver p through link (i, j) ∈ E in the optimal solution, and 0 otherwise; f^p_{i,j} is used to ensure connectivity from s to receiver p. Define y_{i,j} = 1 for edge (i, j) ∈ F if node i is a transmitter in the optimal solution, and 0 otherwise; y_{i,j} is used to capture the interference caused by i's transmission on node j. The objective is to minimize D, the maximum in-degree of all the nodes in the interference subgraph. The following set of constraints defines the problem accurately.
– The degree constraint specifies that the maximum in-degree of the nodes in the optimal-solution interference subgraph should be no larger than D. That is, Σ_{(i,v)∈F} y_{i,v} ≤ D, ∀v ∈ V.
– The broadcast constraint captures the broadcast characteristic of the wireless medium: when a node transmits, its transmission affects all nodes in its transmission range. This is represented by x_{v,i} = x_{v,j}, ∀v ∈ V, (v, i) and (v, j) ∈ E, i ≠ j.
– The interference constraint models the interference characteristic of the network. For every node in the potential communication graph, if there is a directed communication edge going out, then there is a directed interfering edge going out from the corresponding node in the potential interference graph. This is represented by y_{v,i} ≥ x_{v,j}, ∀(v, i) ∈ F, ∀(v, j) ∈ E.
Connectivity from the source node to each receiver is ensured by the following flow conservation constraints, which are similar to the multi-commodity flow constraints [3].
– The total incoming flow into an intermediate node is equal to the total outgoing flow from that node. That is, Σ_{(i,v)∈E} f^p_{i,v} = Σ_{(v,j)∈E} f^p_{v,j}, ∀p ∈ R, ∀v ∈ V \ ({s} ∪ R).
– For each receiver and for each flow, the incoming flow is equal to the outgoing flow, except for the flow destined for that receiver: Σ_{(i,r)∈E} f^p_{i,r} = Σ_{(r,j)∈E} f^p_{r,j}, ∀p ∈ R \ {r}, ∀r ∈ R.
– There is zero incoming flow and unit outgoing flow at the source node. That is, Σ_{p∈R} Σ_{(i,s)∈E} f^p_{i,s} = 0 and Σ_{(s,i)∈E} f^p_{s,i} = 1, ∀p ∈ R.
– For each receiver, the outgoing flow from that receiver should be zero if the flow is destined for that same receiver: Σ_{(r,j)∈E} f^r_{r,j} = 0, ∀r ∈ R.
– The dependence between f^p_{i,j} and x_{i,j} for edge (i, j) ∈ E is represented by |R| × x_{i,j} ≥ Σ_{p∈R} f^p_{i,j}, ∀(i, j) ∈ E.
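Collecting the pieces, the complete ILP can be restated compactly (this consolidation of the constraints listed above is ours):

min D
s.t.  Σ_{(i,v)∈F} y_{i,v} ≤ D                          ∀v ∈ V
      x_{v,i} = x_{v,j}                                 ∀v ∈ V, (v,i), (v,j) ∈ E, i ≠ j
      y_{v,i} ≥ x_{v,j}                                 ∀(v,i) ∈ F, ∀(v,j) ∈ E
      Σ_{(i,v)∈E} f^p_{i,v} = Σ_{(v,j)∈E} f^p_{v,j}     ∀p ∈ R, ∀v ∈ V \ ({s} ∪ R)
      Σ_{(i,r)∈E} f^p_{i,r} = Σ_{(r,j)∈E} f^p_{r,j}     ∀r ∈ R, ∀p ∈ R \ {r}
      Σ_{p∈R} Σ_{(i,s)∈E} f^p_{i,s} = 0
      Σ_{(s,i)∈E} f^p_{s,i} = 1                         ∀p ∈ R
      Σ_{(r,j)∈E} f^r_{r,j} = 0                         ∀r ∈ R
      |R| · x_{i,j} ≥ Σ_{p∈R} f^p_{i,j}                 ∀(i,j) ∈ E
      x, y, f ∈ {0, 1},  D ≥ 0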
5 Proposed Algorithms
In this section, we first present a centralized heuristic for the MDWICS problem and then describe the design of a distributed version.
5.1 Centralized Heuristic
The greedy heuristic takes as input the potential communication graph G(V, E), the potential interference graph H(V, F ), multicast source vertex s and multicast
receiver set R. The output of the heuristic is the set of multicast forwarding group nodes. In the algorithm, S_G and S_H are the weakly induced subgraphs of G and H, respectively, on the vertex set S, as defined before. The algorithm maintains a feasible solution set S of vertices that contains paths from the source vertex to all the receivers, and a set W that contains the visited vertices. Initially, all vertices in the graph are included in set S and set W contains the source vertex (Step 1). In each iteration, the algorithm then selects a vertex in S \ s that has the maximum in-degree in the subgraph weakly induced by S in graph H. If there are multiple such vertices, any one of them is selected arbitrarily. This vertex is a potential candidate for removal, since it has the highest in-degree in S_H. Step 5 checks whether the removal of this vertex from the graph G disconnects any of the receivers from s. If not, this vertex and all its incident edges are removed from both G and H. The vertex is then added to the set W of vertices visited so far. The algorithm terminates when all vertices in the graph have been visited.

Algorithm 1. MDWICS Heuristic
Input: potential communication graph G(V, E), potential interference graph H(V, F), source s ∈ V, receiver set R ⊆ V \ s
Output: set S of multicast forwarding group vertices
1: S ← V, W ← {s}
2: while V \ W ≠ ∅ do
3:    v ← arg max{δ⁻_{S_H}(v) | v ∈ S \ s}
4:    S′ ← S \ v
5:    if ∃ directed paths from s to each ri ∈ R in the graph S′_G then
6:       S ← S′
7:       Remove vertex v and all its incident edges from both graphs G and H
8:    end if
9:    W ← W ∪ {v}
10: end while
11: return S \ s
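A direct transcription of Algorithm 1 in Python, reusing the weakly_induced sketch from Section 2 (the code is ours, not the authors'; we read Step 3 as selecting among the not-yet-visited vertices, which is what guarantees termination):

import networkx as nx

def mdwics_heuristic(G, H, s, R):
    # G and H are modified in place by Step 7
    S, W = set(G.nodes), {s}
    while S - W:                                          # unvisited vertices remain
        WH = weakly_induced(H, S)                         # interference subgraph S_H
        v = max(S - W, key=lambda u: WH.in_degree(u))     # Step 3
        S2 = S - {v}                                      # Step 4 (S')
        WG = weakly_induced(G, S2)
        if all(r in WG and nx.has_path(WG, s, r) for r in R):   # Step 5
            S = S2                                        # Step 6
            G.remove_node(v)                              # Step 7
            H.remove_node(v)
        W.add(v)                                          # Step 9
    return S - {s}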
5.2 Distributed Protocol
The design of the distributed interference-aware multicast protocol is based on the Optimized Link State Routing (OLSR) protocol [4]. OLSR can be used in mesh networks to maintain the state and quality of the links. The distributed protocol takes full advantage of the topology knowledge obtained by the OLSR protocol through its Topology Control (TC) messages. The TC messages are efficiently dispersed in the network through multipoint relays (MPRs). Setting the parameter TC_REDUNDANCY=2 at each node in the OLSR protocol ensures that each node gets information about every other node and link in the network (Section 15.1 of [4]). Each node in the network is uniquely identified by its IP address. The source initiating the multicast session generates a unique ID for the session based on its IP address and a sequence number for the multicast group.
The source, with its topology information, independently computes the set of multicast forwarding group nodes. It then disseminates through the MPRs an MC_FG message consisting of the IP addresses of the forwarding group nodes, and an MC_JOIN message to the multicast group members. Upon receiving an MC_FG message, a node checks whether its IP address is listed in the message; if so, it records the session ID in a forwarding group (FG) table. Only the first MC_FG message received by a node is rebroadcast, with subsequent MC_FG messages being discarded. The MC_JOIN message informs the multicast group members about the initiation of the session. The session ID is included in every packet forwarded by the multicast source. If a node receiving such a packet has the session ID listed in its FG table, it forwards the packet. Evaluation of this distributed version is the focus of current ongoing work.
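The per-node message handling described above could be sketched roughly as follows (our own illustration; the message and field names are hypothetical, since the paper does not fix a wire format):

def on_mc_fg(node, msg):
    # msg carries the session ID and the IP addresses of the forwarding group
    if msg["session"] in node.seen_sessions:
        return                              # only the first MC_FG copy is rebroadcast
    node.seen_sessions.add(msg["session"])
    if node.ip in msg["fg"]:
        node.fg_table.add(msg["session"])   # record the session in the FG table
    node.broadcast(msg)

def on_data(node, pkt):
    if pkt.session in node.fg_table:
        node.forward(pkt)                   # forwarding group members relay the packet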
6 Simulation Environment, Results and Discussion
We conducted extensive experiments to evaluate our centralized heuristic. We compared the performance of our heuristic with the Shortest Path Tree (SPT) algorithm, the Minimum Steiner Tree (MST) approximation algorithm [5], and the optimal solution obtained by solving the ILP formulation given in Section 4. The Shortest Path Tree algorithm finds the set of edges connecting s to each receiver such that the length of the shortest path (measured in number of hops) from s to each receiver is minimized. The MST algorithm presented in [5] is an O(log² k)-approximation algorithm (k is the number of receivers) for the Minimum Steiner Tree problem. We used CPLEX 10 to solve the ILPs for the optimal solution. In all our experiments, the number of mesh routers in the network was fixed at 70. The deployment area was a square whose side was computed based on the required node-density. For instance, for a required node-density of 100 nodes/km², the locations of the 70 nodes were randomly generated in a square of area 0.7 km². The transmission radius of the mesh routers was fixed at 100 m. The multicast source and the multicast receivers were selected randomly. The simulation scenarios were designed to measure the impact of three parameters, namely the number of receivers, the density of the network (nodes per unit area) and the ratio of the interference radius to the transmission radius (RI/RT), on the interference-degree of the multicast structure produced by the four approaches. The interference-degree is the maximum number of forwarding group mesh nodes that affect any node in the multicast structure. As discussed before, this determines the maximum achievable data rate for the multicast structure. In the first set of experiments (Figure 5), the number of receivers was fixed at 28 and the node-density at 100 nodes/km². RI was increased from 100 m to 250 m in steps of 50 m, resulting in an RI/RT ratio varying between 1 and 2.5. It can be noted from Figure 5 that as the RI/RT ratio, and in turn the interference range of the mesh routers, increases, the interference-degree increases rapidly. This is
intuitive, since an increase in the interference range introduces more interference on the mesh nodes. For all values of the RI/RT ratio, our heuristic performs better than the SPT and MST algorithms.
Fig. 5. Interference-degree vs RI/RT ratio with # receivers = 28 and node-density = 100 nodes/km²
Fig. 6. Interference-degree vs # receivers with RI/RT = 2 and node-density = 100 nodes/km²
In the second set of experiments (Figures 6 and 7), we fixed the RI/RT ratio at 2. For node-densities of 100 nodes/km² and 200 nodes/km², we plotted the variation of the interference-degree with the number of receivers. It can be observed that as the number of receivers increases, the interference-degree for each algorithm increases. However, the interference-degree of the heuristic stays closer to the optimal than those of the SPT and MST algorithms.
Fig. 7. Interference-degree vs # receivers with RI/RT = 2 and node-density = 200 nodes/km²
Fig. 8. Interference-degree vs node-density with RI/RT = 1 and # receivers = 28
In the next set of experiments (Figure 8), we studied the effect of varying the density on the interference-degree. The number of receivers was set at 28 and the RI/RT
ratio at 1.5. The results may seem counter-intuitive at first glance, since increased density should lead to an increased interference-degree, and the first two points in the graph do not confirm this observation. However, for each value of node-density the area in which the nodes are deployed changes, and thus the locations of the nodes are recomputed for each node-density, resulting in different communication and interference topologies. In all the experiments, the performance of our heuristic matches closely with that of the optimal. The performance of the SPT and MST algorithms is in most cases far from optimal. This is natural, since these algorithms prefer shorter paths from the source to the receivers and do not pay attention to the interference caused by these paths on the network nodes.
7 Related Work
Most of the related work in this area focuses on efficient network-layer multicast and broadcast in multihop wireless networks and MANETs. Multicast schemes can be classified as either source-based or mesh-based. Source-based protocols construct shortest paths from the source to the receivers; AMRIS, MAODV and MOLSR [6] are the well-known source-based protocols. The mesh-based protocols consider multiple paths from the source to the receivers; examples of mesh-based protocols are ODMRP, CAMP and FGMP [7]. In the presence of mobility, mesh-based protocols are advantageous as they maintain alternate paths for each receiver. Another scheme [8], based on on-demand routing and a genetic algorithm, executes faster than the conventional multicast schemes. Moreover, the 802.11 QoS issues studied in [9] reveal that the design framework of multicast protocols should share flow characteristics across multiple layers and cooperate to meet the application's requirements. The study in [10] states that in multicasting there is no one-size-fits-all protocol that can optimally serve the needs of all types of applications. Recently, a joint optimization approach in [11] emphasized network coding techniques for multicast routing and a game-theoretic approach to interference management. The closest work to ours is [12], in which the problem of computing multicast trees with minimal bandwidth consumption in mesh networks is considered. The authors show that this NP-complete problem is equivalent to minimizing the number of multicast transmissions, rather than the edge cost or the total number of edges of the multicast tree. However, a tree with the minimum number of transmissions may not provide minimum interference or maximum bandwidth.
8 Conclusion and Future Work
In this paper, we cast the interference-aware multicasting problem as a graph problem of finding a weakly induced subgraph of minimum degree. For this purpose, we introduced a new model of interference called the potential interference graph. We analyzed the intractable nature of this problem and presented an efficient greedy heuristic algorithm. We also presented a distributed
extension of our heuristic. The simulation results provide substantial evidence of the superior performance of our heuristic compared to the Shortest Path Tree and Minimum Steiner Tree approximation algorithms. Future work in this area lies in analyzing the performance of the distributed algorithm and providing approximation bounds for the centralized algorithm.
References
1. Online: City-wide Wi-Fi projects. (http://www.seattlewireless.net, http://www.wirelessphiladelphia.org, http://www.waztempe.com)
2. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman Press (1979)
3. Bazaraa, M., Jarvis, J., Sherali, H.: Linear Programming and Network Flows. John Wiley & Sons (2004)
4. Online: RFC 3626 – Optimized Link State Routing Protocol (OLSR). (http://www.ietf.org/rfc/rfc3626.txt)
5. Charikar, M., Chekuri, C., Cheung, T.-Y., Dai, Z., Goel, A., Guha, S.: Approximation algorithms for directed Steiner problems. In: ACM-SIAM Symposium on Discrete Algorithms (SODA). (1998)
6. Kunz, T., Cheng, E.: Multicasting in ad-hoc networks: Comparing MAODV and ODMRP. In: International Conference on Distributed Computing Systems (ICDCS). (2002)
7. Madruga, E., Garcia-Luna-Aceves, J.: Multicasting along meshes in ad-hoc networks. In: International Conference on Communications. (1999)
8. Banerjee, N., Das, S.: MODERN: Multicast on-demand QoS-based routing in wireless networks. In: Vehicular Technology Conference. (2001)
9. Zhu, H., Li, M., Chlamtac, I., Prabhakaran, B.: A survey of quality of service in IEEE 802.11 networks. IEEE Wireless Communications 11 (2004)
10. Gossain, H., Cordeiro, C., Agrawal, D.: Multicast: wired to wireless. IEEE Communications Magazine 40 (2002) 116–123
11. Yuan, J., Li, Z., Yu, W., Li, B.: Cross-layer optimization framework for multihop multicast in wireless mesh networks. IEEE Journal on Selected Areas in Communications 24 (2006)
12. Ruiz, P., Gomez-Skarmeta, A.: Heuristic algorithms for minimum bandwidth consumption multicast routing in wireless mesh networks. In: ADHOC-NOW. (2005)
Characterizing the Capacity Gain of Stream Control Scheduling in MIMO Wireless Mesh Networks

Yue Wang¹, Dah Ming Chiu², and John C.S. Lui¹

¹ Dept. of Computer Science & Engineering, The Chinese University of Hong Kong {ywang,cslui}@cse.cuhk.edu.hk
² Dept. of Information Engineering, The Chinese University of Hong Kong [email protected]
Abstract. Stream control has recently attracted attention in research on MIMO wireless networks as a potential way to improve network capacity. However, inappropriate use of stream control may significantly degrade network capacity as well. In this paper, we provide the first formal study of stream control scheduling in MIMO wireless mesh networks. We derive the theoretical upper bound on the network capacity gain of stream control scheduling. We also provide an efficient scheduling algorithm and show that its achieved network capacity gain is close to the theoretical upper bound. Moreover, we point out the poor performance of SCMA, a previously proposed stream control scheduling algorithm, under the general settings of wireless mesh networks. This formal characterization provides a deeper understanding of stream control scheduling in MIMO wireless mesh networks. Keywords: MIMO, stream control, network capacity, scheduling.
1 Introduction

Multiple-input multiple-output (MIMO) technology is regarded as one of the most significant breakthroughs in recent wireless communications [1]. By exploiting multipath propagation in indoor or outdoor environments, MIMO is able to provide very high data rates by simultaneously transmitting multiple independent data streams on the same wireless channel. This kind of simultaneous transmission, referred to as spatial multiplexing, is very desirable in wireless mesh networks (WMNs), where network capacity is the main concern.
Fig. 1. MIMO Wireless nodes with three antennas
Generally speaking, MIMO can be represented as a system with multiple transmitters and multiple receivers. As illustrated in Fig. 1, there are three antennas for each wireless node.
At the transmitting wireless node, a primary bit stream is split into three individual bit streams, which are then encoded and transmitted by the three antennas. The signals mix naturally in the wireless channel. At the receiver end, the three bit streams are separated and decoded, and then recombined into the original primary bit stream. Stream separation is analogous to solving for three unknowns in a system of three linear equations. In general, each MIMO node has K ≥ 1 antennas. Link capacity grows linearly with K since there are K independent streams; this link capacity gain is referred to as spatial multiplexing gain.¹ Roughly speaking, for the receiver to successfully separate and decode incoming streams, the following conditions must be satisfied: (a) the number of successfully decoded streams is not greater than K and (b) the strength of additional streams (treated as noise by the receiver) is far weaker than that of the successfully decoded streams. Otherwise, it is not possible to decode any stream [3]. When a wireless link simultaneously transmits 0 or K streams on a channel, we call it Non-Stream Control (NSC) scheduling. Alternatively, one can use Stream Control (SC) scheduling to improve the capacity of MIMO systems. Under SC scheduling, a wireless link can simultaneously transmit k streams along a channel, where 0 ≤ k ≤ K. SC scheduling provides more flexibility than NSC scheduling since it can choose an appropriate number of streams so as to maximize network capacity. Recent studies [4][5][6] show that SC can be applied to multiple interfering links to improve network capacity.
Fig. 2. An Illustration of Network Capacity for NSC and SC scheduling
An example illustrating the improvement in network capacity achieved by SC is shown in Fig. 2. There are four MIMO nodes with two flows of streams, one from node A to node B along link 1, and the other from node C to node D along link 2. The two links
¹ Another important feature of MIMO is diversity gain, which provides an increase in SNR at the receiver by making the K streams partially or fully redundant. However, it was shown in [3] that higher throughput is often achieved by exploiting spatial multiplexing instead of diversity. For the rest of this paper, we only focus on scheduling based on spatial multiplexing.
are placed close to each other and thus mutually interfere. Assume there are four antennas on each node (K = 4). Under NSC scheduling, each wireless node transmits 4 streams at an aggregate rate of 100 Kbps. The maximum network capacity is 100 Kbps, attained when the two links transmit in an alternating manner (Fig. 2(b)). Under SC scheduling, however, each link can choose to transmit k streams, where k ∈ {0, 1, 2, 3, 4}. In MIMO, these streams have different transmission rates [2]; for example, the ratio of transmission rates for the 4 streams is 1.0 : 0.8 : 0.7 : 0.5. Each link can then choose the two best streams to transmit (this is called stream selection) with an aggregate rate of (1.0 + 0.8)/(1.0 + 0.8 + 0.7 + 0.5) · 100 = 60 Kbps, achieving a network capacity of 120 Kbps (Fig. 2(c)).
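To make this arithmetic concrete, here is a minimal Python sketch of the stream-selection computation above; the rate ratios and the 100 Kbps NSC aggregate are the example values assumed in the text:

stream_ratios = [1.0, 0.8, 0.7, 0.5]   # relative rates of the K = 4 streams
nsc_rate_kbps = 100.0                   # aggregate link rate when all 4 streams are used

def sc_rate(num_streams):
    # Aggregate rate when only the best `num_streams` streams are selected.
    best = sorted(stream_ratios, reverse=True)[:num_streams]
    return sum(best) / sum(stream_ratios) * nsc_rate_kbps

per_link = sc_rate(2)
print(per_link, 2 * per_link)           # 60.0 Kbps per link, 120.0 Kbps in total

Selecting the two best of the four streams thus trades a lower per-link rate (60 Kbps instead of 100 Kbps) for concurrency, which is what raises the network capacity to 120 Kbps.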
Fig. 3. Illustration of Receiver Overloading Problem
Previous work [4][5][6] showed that MIMO with stream control can increase the network capacity by 20-65% for a set of mutually interfering links. However, it was noted in [3] that when stream control is not applied judiciously, one may encounter the problem of receiver overloading, which significantly degrades the overall network capacity. We illustrate this problem in Fig. 3. Here, each wireless node has K = 4 antennas. Link 1 interferes with links 2, 3 and 4, while the last three links do not interfere with each other. If link 1 transmits 4 streams in one time slot and the other three links each transmit 4 streams in the next time slot, the result is an average of 8 streams per time slot. If all four links use SC, then each can only transmit one stream (upper bounded by 4 streams for link 1). In this case, only 4 streams per time slot can be achieved, which is poorer than NSC scheduling. The reason is that the use of SC for link 1 suppresses full transmissions of the other three links. Sundaresan et al. proposed a heuristic algorithm called SCMA (Stream-Controlled Multiple Access) for SC scheduling in multihop wireless networks [3]. The algorithm states that SC is used for multiple interfering links only when they belong to a single maximal clique (more discussion in later sections); in other cases, NSC scheduling should be used. SCMA can work well for dense link interference graphs, where links are likely to belong to a single maximal clique. In this paper, we provide the first formal characterization of the capacity gain of SC from the angle of scheduling. In particular, we
address the fundamental question of how much network capacity gain can be expected from SC scheduling. To answer this question, in Section 2 we derive the theoretical upper bound on the network capacity gain of SC scheduling over NSC scheduling. In Section 3, we present a greedy scheduling algorithm for both NSC and SC. In Section 4, we discuss the simulation results and compare the network capacity of NSC and SC scheduling as well as that of SCMA. Finally, Section 5 discusses the limitations of our work and concludes the paper.
2 Upper Bound on Capacity Gain of Stream Control Scheduling

In this section, we first define the terminology used in wireless networks; we then propose a simplified model for MIMO WMNs and derive the theoretical upper bound on the network capacity gain of SC scheduling. We consider a multihop wireless network with n static nodes. Each node has multiple omnidirectional antennas that operate on one channel. We model the network as a directed graph G = (V, E), where V represents the set of wireless nodes and E the set of wireless links. Each wireless node has the same transmission power and receiving sensitivity, resulting in the same maximum transmission range under a given path loss model (e.g., the two-ray ground model). Therefore, two nodes can form a link when the distance between them is not greater than the maximum transmission range. We use the range-based interference model [7], which was shown to be realistic by [8]. Let L_{1,2} (L_{3,4}) be the link between nodes 1 and 2 (nodes 3 and 4), and let d_{i,j} denote the distance between nodes i and j. L_{1,2} is interfered by L_{3,4} when node 1 or node 2 is interfered by node 3 or node 4; precisely, when min(d_{1,3}, d_{1,4}, d_{2,3}, d_{2,4}) ≤ S · d_{1,2}. Here we call S the interference range factor; it is determined by the SNR threshold for successful reception and the underlying path loss model. Note that the above reception and interference models are the ones actually implemented in the ns-2 simulator.
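A small sketch of this range-based interference test, assuming plain Euclidean geometry (node names and coordinates below are illustrative, not taken from the paper):

import math

def distance(p, q):
    # Euclidean distance between two node positions (x, y).
    return math.hypot(p[0] - q[0], p[1] - q[1])

def interferes(link_a, link_b, S):
    # Range-based test from the text: link (1,2) is interfered by link (3,4)
    # iff min(d13, d14, d23, d24) <= S * d12. Links are pairs of positions.
    (n1, n2), (n3, n4) = link_a, link_b
    d_min = min(distance(n1, n3), distance(n1, n4),
                distance(n2, n3), distance(n2, n4))
    return d_min <= S * distance(n1, n2)

# Two parallel 200 m links placed 200 m apart, as in Fig. 2:
a, b, c, d = (0, 0), (200, 0), (0, 200), (200, 200)
print(interferes((a, b), (c, d), S=1.5))   # True: min distance 200 <= 1.5 * 200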
Fig. 4. Examples of Link Interference Graph
To represent the interference between links, we introduce the link interference graph G^I = (V^I, E^I), also known as the flow contention graph [9]; we rename it here to avoid confusion with end-to-end flows. V^I represents the set of wireless links and E^I the set of interfering link pairs. That is, if link i is interfered by link j or vice versa, we add the two edges (i, j) and (j, i) to E^I, specifying that i and j cannot transmit at the same
time. Fig. 4 shows the link interference graphs for the networks in Fig. 2 and Fig. 3, respectively. To derive the upper bound on network capacity, we assume that the system operates in a synchronous time-slotted mode. Links transmit at the beginning of each time slot in a TDMA fashion; they can transmit at the same time provided that they do not interfere with each other. Besides, we do not consider packet losses caused by fading; this is a reasonable assumption since MIMO can significantly mitigate fading losses. Next, we assume that there are K > 1 antennas on each wireless node, so a link can simultaneously transmit at most K independent data streams. We introduce the stream control gain g, a factor common to all wireless links: the total capacity of multiple mutually interfering links using SC is at most g times that using NSC. We have 1 < g < 2, because g ≥ 2 would violate the physical constraint (i.e., multiple interfering links cannot transmit simultaneously at their full rate). Finally, we define the network capacity gain of SC. Assume that the traffic pattern (i.e., the ratio of average link rates) is fixed. Let C_NSC and C_SC be the respective network capacities for NSC and SC scheduling. We have C_SC = G_SC · C_NSC, where G_SC is defined as the network capacity gain of SC scheduling. We now present the interference constraint in MIMO wireless networks. Without loss of generality, we normalize the capacity of a MIMO link to 1 and define the link transmission rate as a fraction of 1. Let y(t) be the link transmission rate vector at time t (i.e., y_e(t) is the transmission rate of link e), and N(e) the set of neighbors (i.e., interfering links) of e. If e transmits at time slot t, the sum of the transmission rates in N(e) ∪ {e} must be less than or equal to g; otherwise, receivers cannot decode the mixed signals correctly. Formally, we define the MIMO interference constraint as

1_{y_e(t)>0} · ( y_e(t) + Σ_{e'∈N(e)} y_{e'}(t) ) ≤ g,   ∀e ∈ E and t ≥ 0.   (1)
The function 1_{·} is an indicator, returning 1 when y_e(t) > 0 and 0 otherwise. For NSC scheduling, g = 1 and y_e(t) is 0 or 1. For SC scheduling, 1 < g < 2 and y_e(t) is a real number between 0 and 1, since a MIMO link can properly choose the number of streams and their transmission rates. Thus, NSC scheduling can be seen as a special case of SC scheduling. We now state and prove the fundamental theorem of SC scheduling.

Theorem 1. Assume that the traffic pattern (i.e., the ratio of average link rates) is fixed. Then the network capacity of SC scheduling is at most g times that of NSC scheduling.

Proof: We prove the theorem in two steps. First, we define SC(1) as SC scheduling with g = 1, and prove that the capacity of SC(1) is equal to that of NSC. Second, we prove that the capacity of SC scheduling is at most g times that of SC(1).

Step 1: We only need to prove that the capacity of NSC scheduling is greater than or equal to that of SC(1) scheduling, since the capacity of SC(1) scheduling is at least that of NSC scheduling. As we are interested in average link rates, it is sufficient to prove that for any feasible link transmission rate vector y^SC of SC(1) at some time slot, there is a sequence of NSC link transmission rate vectors {y^NSC(t) | t = 1, 2, ..., T}
for T time slots, whose average is greater than or equal to y^SC. In other words, we scale y^SC by a large number of time slots T to make it an integer workload vector u = y^SC · T (real numbers can be approximated by rationals), and prove that u can be scheduled in T time slots using NSC. We prove this by induction on the number of links m in the network. When m = 1 (u and y^SC are scalars), we can easily schedule u in T slots since y^SC ≤ 1. Assume that we can schedule u for m links in T slots, and consider adding the (m+1)-th active link e. The neighbors of e consume at most Σ_{e'∈N(e)} u_{e'} time slots, and we can use the remaining T − Σ_{e'∈N(e)} u_{e'} time slots (which is at least u_e by Eq. (1) with g = 1) to schedule u_e. So we can schedule u for m+1 links in T slots.

Step 2: We prove this by contradiction. Assume the capacity of SC is greater than g times that of SC(1); then by dividing all link transmission rates by g we would obtain a feasible SC(1) schedule whose capacity is greater than the capacity of SC(1) itself, a contradiction.

Remark: SC scheduling cannot provide more than g times the network capacity gain, although it provides a finer scheduling scheme. Therefore, if g is small, SC may not be worthwhile, as the computational cost of SC scheduling is higher than that of NSC scheduling. Besides, the upper bound is not always achievable due to the potential problem of receiver overloading.
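As an illustration of Eq. (1), the following hedged sketch checks whether a per-slot rate vector satisfies the MIMO interference constraint; the topology is that of Fig. 3, while the rate values are our own example:

def satisfies_constraint(y, neighbors, g):
    # Check the MIMO interference constraint of Eq. (1) for one time slot.
    # `y` maps each link to its transmission rate, `neighbors` maps each
    # link to its set of interfering links N(e), g is the stream control gain.
    for e, rate in y.items():
        if rate > 0:   # the indicator: the constraint only binds on active links
            if rate + sum(y[n] for n in neighbors[e]) > g:
                return False
    return True

# The topology of Fig. 3: link 1 interferes with links 2, 3 and 4.
nbrs = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
print(satisfies_constraint({1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}, nbrs, 1.5))  # True
print(satisfies_constraint({1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}, nbrs, 1.5))      # False: 1 + 3 > 1.5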
3 A Stream Control Scheduling Algorithm

In the previous section, we derived the theoretical upper bound on the network capacity gain of SC scheduling. However, this upper bound may not be achievable. In this section, we present a centralized SC scheduling algorithm and obtain the capacity gain in a practical way. It is important to point out that this algorithm provides a lower bound on the achievable network capacity. Given a wireless network G and a vector of link traffic workloads u (u_e is the workload on link e), the scheduling algorithm calculates the number of time slots needed to transmit all the workload in u. Let T^NSC and T^SC be the number of time slots expended for NSC and SC scheduling, respectively. The network capacities of NSC and SC scheduling are then

C_NSC = (Σ_{e∈E} u_e) / T^NSC,   (2)

C_SC = (Σ_{e∈E} u_e) / T^SC,   (3)

and the network capacity gain of SC scheduling is

G_SC = C_SC / C_NSC = T^NSC / T^SC.   (4)
To schedule u, we design a greedy scheduling algorithm, GreedySC. In each time slot, the algorithm greedily finds schedulable links of maximal workload so as to minimize the total number of time slots used. Fig. 5 depicts the algorithm. The inputs to the algorithm are the link interference graph G^I, the workload vector u and the stream control
GreedySC(G^I, u, g)
   T_s := 0; A := E;
   while A ≠ φ
      T_s := T_s + 1; y := 0;
      sort A in decreasing order of u;
      /* Step 1: using NSC */
      for each e ∈ A
         if y_{e'} = 0, ∀e' ∈ N(e)
            y_e := 1; u_e := u_e − 1;
            if u_e ≤ 0 then A := A − {e}; fi
         fi
      end for
      /* Step 2: using SC */
      for each e ∈ A with y_e = 0
         if y_{N(e)} ≤ 1 and (y_{N(e')} ≤ 1, ∀e' ∈ N(e) with y_{e'} > 0)
            y_e := g − 1; u_e := u_e − (g − 1);
            if u_e ≤ 0 then A := A − {e}; fi
         fi
      end for
   end while
Fig. 5. Specification of GreedySC Scheduling Algorithm
gain g. Here, T_s is the number of time slots expended, A is the set of links with unfinished workload, and y_{N(e)} is the sum of the transmission rates of the links in N(e). Note that GreedySC can also be applied to NSC scheduling by letting g = 1. The algorithm works as follows. In each time slot, we first sort the links in decreasing order of their remaining workloads. In Step 1, we use the NSC mode to schedule links: we iteratively pick a link e from the sorted list, and if no neighboring link of e has been scheduled, we schedule e for transmission (y_e := 1); otherwise we do not schedule it. We then move on to the next link on the sorted list until all links have been examined. In Step 2, we use the SC mode to schedule links: we iteratively pick a link e from the sorted list that was unschedulable in Step 1. If Eq. (1) can be satisfied for e and all its neighboring links, we schedule e for SC transmission (y_e := g − 1); otherwise we do not schedule it. We then move on to the next link on the sorted list until all links have been examined. At this point, the schedule for one time slot is decided. The above process is repeated in the next time slot until all the traffic workload in u has been completed.
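The following Python sketch is a direct transcription of the Fig. 5 pseudocode; the data structures and the tie-breaking in the workload sort are our own choices, not specified by the paper:

def greedy_sc(neighbors, workload, g):
    # GreedySC: `neighbors` maps every link to its set N(e) of interfering
    # links, `workload` maps links to their traffic u_e, and g is the
    # stream control gain (g = 1 reduces the algorithm to NSC scheduling).
    u = dict(workload)
    active = {e for e in u if u[e] > 0}
    slots = 0
    while active:
        slots += 1
        y = {e: 0.0 for e in neighbors}
        order = sorted(active, key=lambda e: u[e], reverse=True)
        for e in order:                      # Step 1: full (NSC) transmissions
            if all(y[n] == 0 for n in neighbors[e]):
                y[e] = 1.0
                u[e] -= 1.0
        for e in order:                      # Step 2: stream-controlled (rate g - 1)
            if y[e] == 0:
                ok = sum(y[n] for n in neighbors[e]) <= 1
                ok = ok and all(sum(y[m] for m in neighbors[n]) <= 1
                                for n in neighbors[e] if y[n] > 0)
                if ok:
                    y[e] = g - 1.0
                    u[e] -= g - 1.0
        active = {e for e in u if u[e] > 0}
    return slots

# Two mutually interfering links (the scenario of Fig. 2), workload 10 each:
nbrs = {1: {2}, 2: {1}}
u = {1: 10, 2: 10}
print(greedy_sc(nbrs, u, 1.0), greedy_sc(nbrs, u, 1.5))   # 20 and 14 slots

On this small instance the slot counts give G_SC = 20/14 ≈ 1.43, close to the theoretical upper bound g = 1.5.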
The algorithm naturally avoids the problem of receiver overloading, as it schedules links using NSC first and thus guarantees that the network capacity of SC scheduling is always at least that of NSC scheduling. Note that, according to the scheduling sequence of links in GreedySC, an SC link e (y_e = g − 1) must neighbor some NSC link e' (y_{e'} = 1); otherwise, e could have been scheduled using NSC. Moreover, no other neighboring link of e or e' can be scheduled if Eq. (1) is to be satisfied. Therefore, for stream selection, the two links can select streams of other rates to achieve y_e + y_{e'} = g (e.g., y_e = y_{e'} = g/2). GreedySC achieves higher network capacity than SCMA. To understand this, let us first describe SCMA. In short, multiple interfering links can be scheduled simultaneously using SC only when they all belong to a single maximal clique². SCMA cannot cause receiver overloading, because links using SC do not affect the transmissions of other maximal cliques. Besides, SCMA can be implemented in a distributed fashion [3]. For example, in Fig. 4(a), links 1 and 2 belong to the single maximal clique {1, 2}, so they can be scheduled simultaneously using SC. In Fig. 4(b), there are no two interfering links that both belong to a single maximal clique, as link 1 belongs to the 3 maximal cliques {1, 2}, {1, 3} and {1, 4}; so SCMA cannot use SC here. But in fact, links belonging to multiple maximal cliques can also use SC for transmission, which is exploited by GreedySC: in Fig. 4(b), GreedySC will schedule link 1 using NSC (y_1 = 1) and link 2 using SC (y_2 = g − 1). Formally, consider any two links a and b belonging to a single maximal clique, and suppose a is scheduled using NSC in Step 1 of GreedySC. It is easy to see that b can be scheduled using SC in Step 2 of GreedySC. In this way, GreedySC covers the case of SCMA and thus achieves higher network capacity.
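For comparison, SCMA's single-maximal-clique rule can be checked directly on the link interference graph. This sketch assumes the networkx library and encodes the rule as we read it from the text above:

import networkx as nx

def scma_allows_sc(GI, links):
    # Our reading of SCMA's rule: the interfering links in `links` may use
    # SC simultaneously only if they all belong to one single maximal clique
    # of the link interference graph, i.e. no other maximal clique touches
    # any of them.
    links = set(links)
    cliques = [set(c) for c in nx.find_cliques(GI)]
    return any(links <= c for c in cliques) and \
           sum(1 for c in cliques if c & links) == 1

# Fig. 4(a): links 1 and 2 form the single maximal clique {1, 2}.
print(scma_allows_sc(nx.Graph([(1, 2)]), [1, 2]))                 # True
# Fig. 4(b): link 1 belongs to the maximal cliques {1,2}, {1,3}, {1,4}.
print(scma_allows_sc(nx.Graph([(1, 2), (1, 3), (1, 4)]), [1, 2])) # False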
4 Performance Evaluation

This section presents simulation results in general WMNs. We test the capacity gain of SC scheduling for GreedySC and SCMA, and compare it with the theoretical upper bound g. By general WMNs, we mean static multihop wireless networks with the general settings of (1) network density and (2) interference level found in current practice. We measure network density by node degree, i.e., the number of links a wireless node has, and interference level by the interference range factor S. The node degree in a WMN is typically around 2 to 6; otherwise there would be too much interference. For example, our simulations show that a link is interfered by about 16 other links when the average node degree is 3. We vary the SNR threshold for successful reception from 6 to 10 dB [11], depending on the environment and coding scheme; the resulting interference range factor S varies from 1.4 to 1.8 under the two-ray ground path loss model [10]. Note that a higher SNR threshold implies more interfering links. We designed a TDMA simulator to generate WMNs and schedule link workloads. First, we randomly place 25 wireless nodes in an area of 1000m×1000m, with a maximum transmission range of 250m; the average node degree is 4.2. We set g = 1.5 for all wireless links. We generate 100 flows with randomly selected sources and destinations.
² Here a maximal clique is a maximal fully connected subgraph of G^I. All links of a maximal clique interfere (neighbor) with each other. A link may belong to multiple maximal cliques.
Fig. 6. Avg. Network Capacity Gain of SC for 1000m×1000m Random Topologies (g = 1.5)

Fig. 7. Avg. Network Capacity Gain of SC for 800m×800m Random Topologies (g = 1.5)
Each flow has the same traffic demand of 1000. We map the flow traffic demand x into the link traffic workload u by u = Rx, where R is the routing matrix (R_{ij} = 1 if flow j traverses link i, and R_{ij} = 0 otherwise; see the sketch after this paragraph). We use shortest-path routing for these flows and run the simulation tens of times to obtain averages. Fig. 6 shows the network capacity gain of SC scheduling, G_SC, when GreedySC and SCMA are used. SCMA provides little capacity gain because few links belong to a single maximal clique, while GreedySC provides a 30-40% improvement in network capacity compared with NSC scheduling. An interesting observation is that only a small number of links belong to single maximal cliques even when we increase the SNR threshold to make the link interference graph denser (introducing more edges in G^I). We found the reason in the simulation traces: the average size of maximal cliques grows as the link interference graph becomes denser, so most links lie in the overlap of multiple maximal cliques. Hence we cannot expect many links to belong to a single maximal clique until the link interference graph is sufficiently dense (the extreme case being one clique). Next, we consider the effect of network density on the network capacity gain of SC scheduling. We keep all parameters of the previous experiment but reduce the area size to 800m×800m to increase the network density; the average node degree is now 6.1. Fig. 7 shows the network capacity gain of SC scheduling. The performance of GreedySC and SCMA is similar to that of the 1000m×1000m case. The reason is that both the number of links and the number of interfering link pairs increase with the network density, so the link interference graph is scaled up proportionally. We also ran simulations on grid topologies. In this experiment, we place 25 wireless nodes in a 5×5 grid, with a distance of 200m (and later 160m) between two horizontally or vertically neighboring nodes; the other parameters are the same as in the random topologies. SCMA does not work at all here because no links belong to a single maximal clique, while GreedySC provides a network capacity gain of 1.2 to 1.3 (Fig. 8 and Fig. 9). In short, SCMA performs poorly in general WMNs. Finally, the simulations show that the network capacity gain of GreedySC is a linearly increasing function of g and is closer to the theoretical upper bound (g) than SCMA. Fig. 10 shows the result for 25-node random topologies in an area of 1000m×1000m when the SIR threshold is 10 dB.
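A small numeric sketch of the workload mapping u = Rx; the routing matrix entries below are illustrative, not taken from the simulated topologies:

import numpy as np

R = np.array([[1, 0, 1],     # link 0 carries flows 0 and 2 (example values)
              [1, 1, 0],     # link 1 carries flows 0 and 1
              [0, 1, 1]])    # link 2 carries flows 1 and 2
x = np.array([1000, 1000, 1000])   # each flow has the same demand of 1000
u = R @ x
print(u)                     # [2000 2000 2000]: per-link workload fed to GreedySC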
Fig. 8. Avg. Network Capacity Gain of SC for 1000m×1000m Grid Topologies (g = 1.5)

Fig. 9. Avg. Network Capacity Gain of SC for 800m×800m Grid Topologies (g = 1.5)

Fig. 10. Network capacity gain of SC as a function of g (SIR threshold = 10dB)
5 Conclusion and Future Work

In this paper, we characterized the network capacity of SC by formulating it purely as a scheduling problem. We proved that the network capacity of SC scheduling is at most g times that of NSC scheduling, where g is the stream control gain. We then proposed an efficient algorithm, GreedySC, to realize both NSC and SC scheduling. Extensive simulations showed that the capacity gain achieved by GreedySC is close to the theoretical upper bound; GreedySC can therefore serve as a benchmark for other SC algorithms. In particular, we pointed out the poor performance of SCMA under the general settings of WMNs, indicating that there is plenty of room for improvement. There are some limitations to our work. First, our model assumes perfect packet scheduling; one can, however, easily incorporate packet losses into our theorem and algorithm to obtain similar results. Second, we use a simplified physical layer model and do not account for time-varying fading channels; it remains an open problem to jointly consider packet scheduling and a more detailed physical layer model in MIMO WMNs. Third, GreedySC is currently a centralized algorithm. A distributed implementation requires special support from the physical layer: for example, transmitters need to accurately estimate the number of streams that receivers can accommodate so as to transmit extra
streams using SC. In sum, much future work remains for researchers at both the network layer and the physical layer before SC becomes a practical technique in MIMO WMNs.
References
1. D. Gesbert, M. Shafi, D. Shiu, P.J. Smith, and A. Naguib, "From Theory to Practice: An Overview of MIMO Space-Time Coded Wireless Systems," IEEE J. Selected Areas in Comm., vol. 21, pp. 281-301, 2003.
2. David Tse and Pramod Viswanath, "Fundamentals of Wireless Communication", Cambridge University Press, 2005.
3. K. Sundaresan, R. Sivakumar, M.A. Ingram, and T.-Y. Chang, "Medium access control in ad hoc networks with MIMO links: optimization considerations and algorithms," IEEE Trans. on Mobile Computing, vol. 3, pp. 350-365, 2004.
4. M.F. Demirkol and M.A. Ingram, "Control Using Capacity Constraints for Interfering MIMO Links," Procs., Intl Symp. Personal, Indoor, and Mobile Radio Comm., vol. 3, pp. 1032-1036, 2002.
5. M.F. Demirkol and M.A. Ingram, "Stream Control in Networks with Interfering MIMO Links," Procs., IEEE WCNC, vol. 1, pp. 343-348, 2003.
6. J.-S. Jiang, M.F. Demirkol, and M.A. Ingram, "Measured Capacities at 5.8 GHz of Indoor MIMO Systems with MIMO Interference," Procs., IEEE VTC, vol. 1, pp. 388-393, 2003.
7. P. Gupta and P.R. Kumar, "The capacity of wireless networks," IEEE Trans. on Info. Theory, vol. 46, pp. 388-404, 2000.
8. J. Padhye, S. Agarwal, V.N. Padmanabhan, L. Qiu, A. Rao and B. Zill, "Estimation of Link Interference in Static Multi-hop Wireless Networks," Procs., USENIX IMC, pp. 305-310, 2005.
9. H. Luo, S. Lu, V. Bharghavan, "A new model for packet scheduling in multihop wireless networks," Procs., ACM MOBICOM, pp. 76-86, 2000.
10. K. Xu, M. Gerla, and S. Bae, "How effective is the IEEE 802.11 RTS/CTS handshake in ad hoc networks," Procs., IEEE GLOBECOM, vol. 1, pp. 72-76, 2002.
11. A. Kochut, A. Vasan, A.U. Shankar, A. Agrawala, "Sniffing out the correct physical layer capture model in 802.11b," Procs., IEEE ICNP, pp. 252-261, 2004.
AP and MN-Centric Mobility Prediction: A Comparative Study Based on Wireless Traces

Jean-Marc François and Guy Leduc

Research Unit in Networking (RUN), Department of Electrical Engineering and Computer Science, Institut Montefiore, B28 — Sart-Tilman, University of Liège, 4000 Liège, Belgium
{francois,leduc}@run.montefiore.ulg.ac.be
Abstract. The mobility prediction problem is defined as guessing a mobile node's next access point as it moves through a wireless network. Such predictions help take proactive measures in order to guarantee a given quality of service. Prediction agents can be divided into two main categories: agents bound to a specific terminal (responsible for anticipating its own movements) and those bound to an access point (which predict the next access point of all the mobiles connected through it). This paper compares these two schemes using real traces of a large WiFi network. Several observations are made, such as the difficulties encountered in obtaining a reliable trace of the mobiles' motion, the unexpectedly small difference between the two methods in terms of accuracy, and the inadequacy of commonly admitted hypotheses (such as different motion behaviours during the weekend and the rest of the week).
1 Introduction
Wireless networks have experienced spectacular development over the last ten years. They are today facing two major changes:

– The number of wireless users grows quickly, and those users demand ever more bandwidth. The current trend is thus to reduce the transmitters' coverage, which in turn increases the rate at which mobile hosts (MHs) switch from one antenna to the next (the handover rate).
– Since voice, television, and data networks are now merging, it is desirable to be able to guarantee various quality of service (QoS) levels.

When a mobile terminal moves, one of the main causes of service degradation is switching between the network's access points (APs): changing one's current AP requires re-routing the received and sent data flows, a procedure likely to cause packet losses and delays. Predicting a MH's next handover(s)
This work has been supported by the Belgian Science Policy in the framework of the IAP program (Motion P5/11 project) and by the IST-FET ANA project.
allows taking proactive measures to reduce the impact of handovers, a problem known as mobility prediction. In the following, we study a WiFi wireless network and aim at predicting each mobile host's next AP. Mobility prediction methods can be classified into two main families:

– MH-centric (MHC): the agent performing predictions is bound to a MH; it builds a model of this particular MH's movements (e.g. [1,2,3]);
– AP-centric (APC): the prediction agent is bound to an AP; this AP builds a model using the motion of the MHs passing by (e.g. [4,5,6]).

These models can be built on-line, as the mobiles move, and can perform a prediction at any time. The pieces of information that allow deducing the likely motion of a mobile terminal are varied (e.g. GPS coordinates). In what follows, we assume that the piece of information used is the sequence of most recently encountered APs. This assumption is not strong, since it only requires that MHs record the last APs they have been associated with. The various prediction methods presented in the literature have rarely been validated using mobility traces extracted from a real network; [7] is a notable exception, which shows that simple markovian models perform nearly as well as other, more complex methods. We thus focus on those models in this paper. In the following, we present a comparison between MH- and AP-centric prediction methods using the mobility traces of a large-scale WiFi network. It is expected that the conclusions drawn here can be applied to other kinds of mobile networks. The rest of the paper is organised as follows. Section 2 defines the prediction models used in the article. Sections 3 and 4 describe the traces used, the way they have been processed, and basic performance results. Section 5 compares both types of prediction schemes (MHC and APC). Section 6 concludes.
2 Next-AP Predictors
A next-AP predictor, or prediction agent, is the entity responsible for building a model of MHs' movements; this model can be used for predictive purposes.

2.1 Centralized and Decentralized Methods
In this paper, we study the differences between the two most popular prediction schemes. In the first method, APC, the prediction agents are the APs: each AP builds a model of the movements of the mobiles passing by. The MHs' involvement is minimal, since during each handover they only send an identification of their previous APs. This architecture is particularly well suited to situations where predictions are mainly useful to the fixed network infrastructure (which could, for example, use them to reserve resources anticipatively).
The second method, MHC, is more distributed: every MH builds a model using its own movements. It is expected to be more reliable since it is more specific: the behaviour of a particular mobile cannot be reduced to the mean behaviour of all the MHs moving in the same area. However, this scheme does not fit well the standard wireless network paradigm, where terminals are supposed to be small, memory- and processing-limited devices (such as a low-end GSM) not suited to running a learning algorithm. Moreover, no prediction can be made when a MH visits APs for the first time.

2.2 Markovian Models
We model MHs' motion habits thanks to their location history (or trace), i.e. the sequence of APs crossed during their journey. Considering each AP as a symbol of a (finite) alphabet, a MH's trace is a sequence of symbols, and prediction aims at guessing symbol i + 1 given the first i. Observing MHs' motion allows a prediction agent to tune the model's parameters so that prediction improves over time. It has been shown ([7,8]) that in this context, simple Markov predictors perform as well as other, more complex methods¹ (such as [9,10,11,12]); we thus only consider this class of predictors here. Let 𝓛 = {L_1, L_2, L_3, ...} be the set of locations and L = L_1, L_2, L_3, ... a location history. The order-n markovian hypothesis is

P(L_i = l | L_1, ..., L_{i−1}) = P(L_i = l | L_{i−n}, ..., L_{i−1}),   ∀l ∈ 𝓛, i > n.   (1)
Less formally, this equation states that the stochastic variable describing the next-AP probability follows a distribution that only depends on the last n symbols. We assume a stationary distribution². The next-AP distribution can easily be learnt on-line. We assume that the agent responsible for building the markovian model is regularly notified of MH(s) movements. If we denote L^m the location history of mobile m, the order-n model estimation rule is

P(L_i = l | L_{i−n}, ..., L_{i−1}) = Σ_{m∈M} O(L^m_{i−n}, ..., L^m_{i−1}, l ; L^m) / Σ_{m∈M} O(L^m_{i−n}, ..., L^m_{i−1} ; L^m),   (2)

where the O(· ; ·) operator counts the number of occurrences of its first argument in its second, and M is the set of mobiles involved. In the case of MHC, each terminal only models its own motion, so M is a singleton. When a model is used to perform a prediction, the most probable next AP (given the current context, i.e. the MH's last n APs) is chosen. No prediction
¹ To be fair, some of those not only aim at location prediction, but also at other purposes such as mobile paging.
² This hypothesis is confirmed by the results presented in Section 5.3.
can be performed if the context has never been observed before. To limit the consequences of this situation, we build, together with each order-n model, n−1 other models of order n−1, n−2, ..., 1. If a prediction cannot be performed because the current context is seen for the first time, we fall back on a lower-order model. We do not aim at predicting when a mobile will enter or leave the network; only proper inter-AP movements are taken into account.
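To make the learning rule of Eq. (2) and the fallback mechanism concrete, here is a hedged Python sketch of an order-n predictor (not the authors' implementation; handling of the special OFF AP is omitted):

from collections import defaultdict

class MarkovPredictor:
    # Order-n markovian predictor with fallback on lower-order models.
    # Counts implement the estimation rule of Eq. (2) for a single agent.
    def __init__(self, n):
        self.n = n
        # One count table per model order k = 0..n: counts[k][context][next_ap].
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(n + 1)]

    def learn(self, history):
        # Feed a location history (sequence of AP identifiers).
        for i in range(1, len(history)):
            for k in range(self.n + 1):
                if i - k >= 0:
                    ctx = tuple(history[i - k:i])
                    self.counts[k][ctx][history[i]] += 1

    def predict(self, context):
        # Most probable next AP given the last APs; falls back to lower
        # orders when the context has never been observed.
        for k in range(min(self.n, len(context)), -1, -1):
            ctx = tuple(context[len(context) - k:])
            table = self.counts[k].get(ctx)
            if table:
                return max(table, key=table.get)
        return None

p = MarkovPredictor(n=2)
p.learn(["AP3", "AP4", "AP7", "AP8"])
print(p.predict(["AP3", "AP4"]))   # 'AP7': context seen in the learnt trace
print(p.predict(["AP9", "AP4"]))   # 'AP7': falls back to the order-1 context ('AP4',)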
3 Wireless Traces
The traces used were collected by Dartmouth University in the context of the CRAWDAD project ([13]). They consist of events generated by the WiFi network of the campus: syslog and SNMP data were recorded for 2 years and cover 6202 MHs and 575 APs ([14]). These data have been analysed ([8]) to extract the actual movement traces (i.e. for each MH, a sequence of APs). A special AP, denoted OFF, indicates that a mobile has been 'deauthenticated' or has not generated any activity for at least 30 minutes; we then consider that it has been disconnected. Each MH is identified by its MAC address; we assume that each MAC address matches one (and only one) user, since devices with more than one interface and devices shared by more than one person are considered rare. In the following, the same data are exploited to perform both MH- and AP-based predictions. More formally, each time MH m moves, its last n movements L^m_{i−n}, ..., L^m_{i−1} and the next AP L^m_i are used to learn the parameters of an order-n markovian model; these "context, next AP" couples are the elements
Fig. 1. Overview of the learning process for order-2 markovian models. The mobiles' motion generates location histories that allow agents to build a set of "context, next-AP" couples (here denoted between brackets). The special "OFF" AP is introduced when the MH leaves the network; since we do not aim at predicting this event, those APs do not appear in the learning sets.
(or learning samples) of the learning set. With MHC, the prediction agent is bound to m; with APC, it is bound to L^m_{i−1}. Notice that the contexts fed to the agent of L^m_{i−1} always end with L^m_{i−1}; an order-n model built with APC is thus as complex as an order-(n−1) model built using MHC. Figure 1 graphically depicts the learning process. Each time a terminal moves, its new AP is appended to its mobility trace; a special 'OFF' AP is added when the MH is disconnected from the network. This trace is then converted into a series of "context, next-AP" couples; the maximal context length depends on the order of the models built. The couples corresponding to MH m populate the learning set bound to m (left); the couples whose context ends with AP p are the learning samples that compose the learning set of p's agent (right).
Fig. 2. Distributions of learning sets cardinalities regarding MHC (left) and APC (right). The x-axis is logarithmic.
Figure 2 compares APC and MHC in terms of the distributions of the learning sets' cardinality (i.e. the number of "context, next AP" couples). A lot of MHs barely move: 12% perform fewer than 8 movements (left plot, sum of the percentages reported in the first 3 bars). On the contrary, few APs are unpopular: 15% are crossed by fewer than 256 MHs. The impact of under-learning should thus be more pronounced with the distributed MHC method.
3.1 Ping-Ponging
It is known (e.g. [15]) that such datasets exhibit the ping-ponging artefact (ping-ponging is defined as repeatedly changing one's current association back and forth between two or more access points). Mobility prediction is concerned with the physical movements of mobile terminals, not with these quick artefacts. This does not mean that predicting ping-ponging is not an interesting topic, but it is only marginally related to the question studied here. We thus try to remove this artefact. Considering the location history L_1, ..., L_n of a MH, the movement to L_n is classified as ping-ponging if L_{n−2} = L_n. This simple rule surely does trigger false positives: MHs physically moving back and forth between two APs are classified as ping-ponging. However, we notice that the proportion of movements classified as ping-ponging varies from one interface manufacturer to another. Our
criterion thus looks reasonable, since it filters handovers triggered by a technological cause. About 3 movements out of 10 are classified as ping-ponging.
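The classification rule is straightforward to apply to a raw trace; a minimal sketch:

def is_ping_pong(history, i):
    # The rule above: the movement to history[i] is classified as
    # ping-ponging when it returns to the AP visited two steps earlier.
    return i >= 2 and history[i] == history[i - 2]

trace = ["AP1", "AP2", "AP1", "AP2", "AP3"]
moves = [(i, trace[i]) for i in range(1, len(trace)) if not is_ping_pong(trace, i)]
print(moves)   # [(1, 'AP2'), (4, 'AP3')]: the back-and-forth movements are filtered out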
4 Next-AP Prediction Accuracy
Table 1 shows the next-AP prediction accuracy using both APC and MHC, for various model orders.

Table 1. Prediction accuracy. Each number corresponds to the ratio between the number of accurate predictions and the total number of predictions. The columns on the right show the performance one would obtain if ping-ponging were not removed.

Order | APC (w/o ping-pong) | MHC (w/o ping-pong) | APC (with ping-pong) | MHC (with ping-pong)
  0   | N/A³                | 24.0%               | N/A                  | 38.7%
  1   | 28.7%               | 39.4%               | 40.9%                | 68.4%
  2   | 47.6%               | 58.4%               | 64.5%                | 72.9%
  3   | 53.0%               | 59.5%               | 64.9%                | 73.0%
  4   | 55.2%               | 59.8%               | 65.1%                | 73.0%
  5   | 55.4%               | 59.6%               | 64.9%                | 72.8%
We first observe that a simple statistical approach⁴ gives unsatisfactory results. The models improve quickly: order-2 models are quasi-optimal. Beyond that, performance improves very slowly and peaks with order-4 models; performance decreases for models of higher order, showing slight over-learning. The columns on the right are those obtained when the ping-ponging movements are not removed; they show that ping-ponging is easy to predict, even with simple markovian models. The results related to MHC are in accordance with [7]. Since Table 1 shows that ping-ponging has a major impact on the prediction results, it would be interesting to repeat the experiments presented in [7] once ping-ponging has been filtered out. The good prediction ratio is, overall, quite low. We can suppose that it would be higher for other kinds of networks where terminals are usually not switched off during their displacements (e.g. GSM), even if some of the terminals composing this dataset are WiFi phones ([14]). In this study, absolute accuracy is not our primary concern; we emphasize the differences between the APC and MHC schemes. The decentralized MHC scheme works better than APC. This result was expected, as different persons have their own specific behaviour: averaging the movement patterns of the people crossing the same AP only gives a rough estimate of
³ An order-n model is based on contexts of length n. With APC, the prediction agent is the same as the last element of the context, which is undefined in the case of a context of length 0.
⁴ That is, an order-0 model for MHC, or an order-1 model for APC.
the way they move. Surprisingly, however, the accuracy difference between APC and MHC is small (from 55.4% to 59.8%); in practice, this means that getting decent prediction performance does not require embedding a prediction agent in each mobile: placing agents in the fixed infrastructure can suffice. Section 5.2 gives hints on why the results of the two methods are so close.
5 Stressing the Differences Between APC and MHC

5.1 Prediction Accuracy vs Learning Set Cardinality
The learning of the markovian models' parameters and the prediction process are interleaved: when a mobile is associated with an AP, this AP predicts where the terminal is going; as soon as the next AP is known, this piece of information is added to the learning set and allows the mobility model to improve. The ratio of accurate predictions is thus a function of the elapsed learning time, and the way it evolves depends on the method used — APC or MHC.
Fig. 3. Prediction accuracy (strong lines, left axis) and proportion of prediction agents (thin lines, right axis) vs learning sets’ cardinality using order 3 models
Figure 3 (bold lines, left axis) plots the instantaneous prediction accuracy as a function of the learning sets' cardinality. This plot shows that learning sets made of a few hundred elements lead to prediction ratios of more than 40%, regardless of whether prediction agents are bound to APs or MHs. For bigger learning sets, the two methods show different profiles. For APC, performance slowly improves and stabilizes at 55% when the learning set contains about 3000 samples. With MHC, the results increase much faster, reaching 60% when the learning set is composed of 1000 elements and settling at about 73% with 2000 elements. This last percentage is astonishingly good and is thus studied more carefully below.
On the same figure, thin lines (right axis) give the proportion of prediction agents whose learning set cardinality is higher than a given value. For example, for MHC, about 80% of the mobiles have a learning set composed of less than 500 elements. The curve associated with MHC decreases much faster than that of APC: nearly all MHC models have a learning set with a cardinality smaller than 3000, whereas for APC there are still learning sets with cardinalities greater than 8000 elements. Only a small number of MHs are thus responsible for prediction ratios higher than 70%. A manual inspection of motion traces shows that ping-ponging between 3 or more APs seems to explain this anomaly. Fortunately, this situation is rare and should only marginally impact the average performance. This figure allows finding the steady-state (i.e. long-run) good prediction ratio reached by each method. While APC clearly settles at about 55%, the case of MHC needs to be considered more carefully. As mentioned above, performance of 70% or more is not realistic. We thus remove the 10% best-performing MHs and measure the performance of the remaining mobiles once they have reached a learning set cardinality greater than 500 elements; the good prediction ratio obtained is then 60.8%.

5.2 Next-AP Distributions' Entropy
Knowing the last APs encountered by a mobile terminal (or context) does not allow a perfect prediction of its next AP. This uncertainty can be formalized as a random distribution of next APs, characterized by a given entropy. This entropy is commonly linked to the difficulty of predicting the motion of the mobile. Mean entropies of order 1, 2, and 3 models are given in Table 2 for APC and MHC.

Table 2. Next-cell distribution entropies (in bits). Standard deviations are given between parentheses.

Order | APC         | MHC
  1   | 1.86 (0.95) | 0.98 (0.50)
  2   | 1.72 (1.18) | 0.82 (0.66)
  3   | 1.58 (0.49) | 0.91 (0.37)
The two schemes exhibit strong differences. This runs counter to the results obtained in terms of prediction accuracy (see Table 1), which showed a difference indeed, but one as small as about 5%. From this experiment, we can conclude that MHC predicts much more precisely which APs might be encountered next by a MH, but this problem differs from the one generally studied, which is only concerned with finding the most probable next AP. Thus, even if entropy estimation gives a quantitative measure of the mobiles' motion uncertainty under a model, directly linking entropy to prediction accuracy gives a biased picture. A more complete study of this point can be found in [16].
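For reference, the entropy of a learnt next-AP distribution can be computed directly from the observation counts; a minimal sketch with illustrative counts:

import math

def entropy(counts):
    # Shannon entropy (in bits) of a next-AP distribution given as a
    # mapping from AP to observation count.
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

print(round(entropy({"AP1": 8, "AP2": 1, "AP3": 1}), 2))   # 0.92 bits: quite predictable
print(round(entropy({"AP1": 1, "AP2": 1}), 2))             # 1.0 bit: uniform over two APs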
5.3 Time Division
It is commonly supposed (e.g. [17]) that it is desirable to divide a learning set into homogeneous time slices: it seems, for example, sensible to expect different motion behaviours during the weekend and during the rest of the week, and it is thus reasonable to build different models for those periods of time.

Table 3. Prediction accuracy when different order-4 models are built for various time divisions. The results given are the (weighted) mean performance of the models built.

Granularity       | APC   | MHC
No time division  | 55.2% | 59.8%
Week/Week-end     | 54.2% | 58.4%
Morning/Afternoon | 54.1% | 58.2%
January 2003      | 53.8% | 57.0%
Two-hour periods  | 53.1% | 53.4%
Days of week      | 52.0% | 54.6%
Using such a method would however bring two drawbacks: (a) the start of each model's learning curve could cause bad performance, and (b) short time periods could yield learning sets that are too small. Table 3 gives the results obtained using various time divisions. The results are surprising: in no case do the time slices improve the results. Two hypotheses can explain this fact: (a) the MHs' behaviours are the same during all the time periods (movements are not cyclic and can be described as a stationary process), or (b) the motion context already captures those differences.
6 Conclusions and Future Work
Mobility prediction schemes can be divided into two main classes, here designated AP-centric (or centralized) and MH-centric (or decentralized). Quite surprisingly, they had never been directly quantitatively compared. This article partially fills this gap with a study based on markovian models. The parameters of those models are fitted by analysing a database containing the real motion traces of the mobile hosts of a campus WiFi network. This allows us to draw a number of conclusions:

– Contrary to what one could expect, the measured accuracy difference between APC and MHC is only a few percent (typically 55% vs 59%).
– In any case, the prediction accuracy is low (less than 60%); this is certainly a characteristic of WiFi users, and we expect other networks (e.g. GSM) to exhibit more predictable, regular motion patterns. This is not a real concern for this study, as we are more interested in comparing APC and MHC than in absolute results.
– Next-AP prediction uncertainty can be estimated by an entropy measurement, but this only partially reflects prediction accuracy and, in this case, does not provide an accurate comparison of APC and MHC. One should thus refrain from linking entropy to prediction accuracy, as this can introduce a bias.
– Quite surprisingly, we notice that building models specific to certain periods of time (e.g. week/weekend, morning/afternoon) does not bring any improvement.

This comparison could be continued along several lines. For example, the relevance of a hybrid method combining the APC and MHC schemes could be explored; the prediction of APC could, for example, be used when a MH reaches an AP it has never seen before. This situation barely arises in the dataset used here, hence such an experiment should be tried using traces extracted from a different network.
References
1. Misra, A., Roy, A., Das, S.: An information-theoretic framework for optimal location tracking in multi-system 4G networks. In: Proc. of INFOCOM'04 (2004)
2. Laasonen, K.: Clustering and prediction of mobile user routes from cellular data. PKDD 2005, LNAI 3721 (2005) 569–576
3. Samaan, N., Karmouch, A.: A mobility prediction architecture based on contextual knowledge and spatial conceptual maps. IEEE Transactions on Mobile Computing 4 (November/December 2005)
4. Hadjiefthymiades, S., Merakos, L.: Using path prediction to improve TCP performance in wireless/mobile communications. IEEE Communications 40(8) (August 2002)
5. Soh, W.S., Kim, H.: Dynamic bandwidth reservation in cellular networks using road topology based mobility predictions. In: Proc. of IEEE INFOCOM'04 (March 2004)
6. Pandey, V., Ghosal, D., Mukherjee, B.: Exploiting user profile to support differentiated services in next-generation wireless networks. IEEE Network (September/October 2004)
7. Song, L., Kotz, D., Jain, R., He, X.: Evaluating location predictors with extensive Wi-Fi mobility data. Technical Report TR2004-491, Dartmouth College (February 2004)
8. Song, L., Kotz, D., Jain, R., He, X.: Evaluating location predictors with extensive Wi-Fi mobility data. In: Proc. of the 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2 (March 2004) 1414–1424
9. Bhattacharya, A., Das, S.: LeZi-update: an information-theoretic approach to track mobile users in PCS networks. In: Proc. of MobiCom'99, Seattle (August 1999)
10. Yu, F., Leung, V.: Mobility-based predictive call admission control and bandwidth reservation in wireless cellular networks. Computer Networks 38(5) (2002)
11. Cleary, J.G., Witten, I.H.: Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications 32(4) (1984) 396–402
12. Jacquet, P., Szpankowski, W., Apostol, I.: A universal predictor based on pattern matching, preliminary results. (2000) 75–85
13. Kotz, D.: CRAWDAD, a Community Resource for Archiving Wireless Data At Dartmouth. http://crawdad.cs.dartmouth.edu/ (2005)
14. Kotz, D., Essien, K.: Analysis of a campus-wide wireless network. In: MobiCom (September 2002) 107–118. Revised and corrected as Technical Report TR2002-432.
15. Conan, V., Leguay, J., Friedman, T.: The heterogeneity of inter-contact time distributions [...]. arXiv.org, cs/0609068 (2006)
16. François, J.M.: Performing and Making use of Mobility Prediction. PhD thesis, University of Liège (2007)
17. Choi, S., Shin, K.: Predictive and adaptive bandwidth reservation for hand-offs in QoS-sensitive cellular networks. SIGCOMM CCR 28(4) (1998) 155–166
A Flexible and Distributed Home Agent Architecture for Mobile IPv6-Based Networks

Albert Cabellos-Aparicio and Jordi Domingo-Pascual

Technical University of Catalonia, Department of Computer Architecture, Advanced Broadband Communications Center, c/Jordi Girona, 1-3, 08034 Barcelona, Spain
{acabello,jordid}@ac.upc.edu.com
http://www.ccaba.upc.edu
Abstract. Home Agents (HAs) represent a single point of failure for Mobile IPv6-based networks. To overcome this problem, many solutions providing reliable HA architectures have been published. These solutions require deploying redundant HAs on each sub-network; although they effectively mitigate the problem, they do not take into account the requirements of large networks with dozens of sub-networks, where deploying several HAs on each sub-network may be too expensive to deploy and manage. In this paper, we present a novel HA architecture that only requires a set of HAs for the whole network. Our basic idea is that the Mobile Node's location can be announced to exit routers, so that packets can be redirected without involving the HA. Our solution provides reliability and load balancing like the existing solutions. Finally, we validate our proposal through an analytical model and compare it against other proposals through simulation. Keywords: Mobility, Mobile IPv6, Home Agent Architecture.
1 Introduction
Mobile IPv6 [1] is considered one of the key technologies for providing mobility in the Internet. With "mobility", a user can move and change his point of attachment to the Internet without losing his network connections. In Mobile IPv6, a Mobile Node (MN) has two IP addresses: the first one identifies the MN's identity (Home Address), while the second one identifies the MN's current location (Care-of Address). The MN is always reachable through its Home Address, while it changes its Care-of Address according to its movements. A special entity called a Home Agent (HA), placed at the MN's home network, maintains bindings between the MN's Home and Care-of Addresses.
This work was partially funded by IST under contract IST-2006-NoE-0384239 (IST-CONTENT), MEC (Spanish Ministry of Education and Science) under contract TSI 2005-07520-C03-02 and the CIRIT (Catalan Research Council) under contract 2005 SGR 00481.
In addition, the communications between the MN and its peers (Correspondent Nodes) are routed through the HA; thus, the MN relies on its HA for its connectivity. However, Mobile IPv6 incorporates a route optimization mechanism whereby the MN can communicate directly with its Correspondent Nodes (CNs). This mechanism avoids triangle routing through the HA, reducing the HA's load; however, it will not be used for short-term communications (e.g. a MN accessing a web page). A HA may be responsible for multiple MNs on a Home Link, so the failure of a single HA may result in the loss of connectivity of numerous MNs. Thus, HAs represent a single point of failure for a Mobile IPv6-based network. Moreover, routing the MNs' communications through the HA may also make the HA or the Home Link the bottleneck of the system. In addition, HA operations such as security checks, packet interception and tunneling might not be as optimized in the HA software as plain packet forwarding. The Mobile IPv6 standard allows the deployment of multiple HAs on the Home Link to provide reliability and load balancing: upon the failure of the serving HA, another HA can take over the functions of the failed one, providing continuous service to the MNs registered with it. However, the transfer of service is problematic [2]: the solution is MN-driven and forces the MN to detect the failure and select a new HA. This causes delayed failure detection, service interruption in the upper-layer applications, increased workload on the MN, message overhead over the air interface and IPsec Security Association re-establishment. Many research papers have been published that address these problems. The solutions presented in [4][5][6][7][8] increase HA reliability and load balancing by deploying several redundant HAs at the Home Link. In these solutions, all the HAs share the registration state and define efficient mechanisms for HA recovery. These solutions reduce the service disruption time compared with plain Mobile IPv6, and the MNs' traffic is balanced among the different HAs. The main difference among them is that some [4][5][6] are MN-driven solutions, while others [7][8] are transparent to the MN. Unfortunately, these proposals focus on providing HA reliability and load balancing on just a single Home Link; they do not take into account the global requirements of an Autonomous System (AS). An AS that hosts MNs may have dozens of sub-networks, and deploying reliable HAs requires several redundant HAs on each link. It is important to remark that the Mobile IPv6 protocol belongs to the IPv6 standard and, in principle, any IPv6 node has mobility capabilities. Thus, these approaches are too expensive to deploy and to manage. A different proposal, which does not require deploying redundant HAs on each Home Link, is the Virtual Mobility Control Domain protocol (VMCD) [9][10]. The VMCD protocol allows multiple HAs to be placed in different domains, and a MN may use multiple HAs simultaneously. The basic idea behind this proposal is that each HA advertises, through eBGP, the same home network prefix from multiple routing domains. Each MN then picks the best HA according to its
topological position. The main drawback of this proposal is that its impact on the scalability of the exterior BGP routing system is unpredictable.

In this paper, we present a novel flexible and distributed HA architecture that takes into account the mobility requirements of an AS and does not impact the BGP routing system. We consider the HA as an entity that performs several differentiated operations. We have analyzed each operation and assigned each of them to an entity of the network. Our basic idea for distributing the operations is that a registration from a MN into a HA can be viewed as an internal route from the network's point of view. That is, when a MN registers a new location with its HA, it is actually installing a new route (Home Address → Care-of Address). We believe that this route can be announced throughout the network and thus, it is not necessary to deploy a HA on each link.

As we will see, our solution only requires deploying one HA for the whole network. This HA should be reliable, and our architecture allows deploying more than one HA to distribute the load. Moreover, our solution can use the redundancy mechanisms presented in [4][5][6][7][8]. In addition, our solution considerably reduces the number of MN data packets transmitted into the network and is compatible with legacy MNs.
2 Introduction

2.1 Design Rationale
In this subsection we analyze the different operations of a Home Agent (HA) and how they can be distributed from a network's point of view. In the rest of the paper we use the following terminology: we define the Home Network as the set of Home Links managed by our HA, and we define Exit Routers (ER) as the routers that connect the Home Network with the rest of the Internet. These ERs may or may not be the AS's border routers, and an AS may have several Home Networks.

Home Agents are responsible for maintaining bindings between the MN's identity and its location. The HAs forward the MN's signaling and data packets as well. MNs send data packets through their HA when communicating with their Home Network or with CNs. Since MNs can communicate directly with their CNs, it is expected that communications through the HA are mainly used for short-term connections.

The Mobile IPv6 RFC states that packets sent through the HA may be secured through IPsec [3]. It should be taken into account that the MN can use IPsec with its peers regardless of the IPsec connection with its HA. We believe that it is not useful to secure MN to CN communications through the HA because the packets are only secured on half of the path (MN → HA) while the rest of the path (HA → CN) is not secured. Regarding the MN's communications with the Home Network, protecting the path is useful; in this case the HA acts like a Virtual Private Network (VPN) gateway.

Under these assumptions, and following the basic idea that a registration from a MN into a HA can be viewed as an internal route, we can distribute the HA's
operations throughout the network. In our architecture, a single HA is required for the whole network; we call it a flexible Home Agent (fHA). This fHA processes (using IPsec) the MN's signaling messages and maintains registration information. It also distributes this information throughout the network as internal routes. The network directly processes the MN's communications with the CNs, while the fHA processes the MN's communications with the Home Network (using IPsec) in the same way as a VPN gateway.
2.2 Overview
Fig. 1 presents an overview of our architecture. Our proposal has only one HA (we call it an fHA) that serves all of the MNs of the network. Note that our proposal allows more than one fHA to be deployed to distribute the load (Section 2.3). This fHA is identified by a unicast address, and the MNs address their registration messages to it. Upon reception of a registration message, the fHA validates it and sends a routing message announcing the new route towards the MN. This information is then sent to each ER. In addition, the fHA advertises the route to the Home Link's Access Router (AR). At this point, the network knows the location of the MN.

When communicating with a CN through the HA, MNs do not address packets to the fHA but to an anycast [13] address owned by the ERs. For instance, this anycast address can be configured on a loopback interface. In this way, a given ER receives the MN's data packets and de-capsulates, looks up and forwards them to the CN. Similarly, CNs send packets to the MN's Home Address. Upon reception, the ER looks up the packet's destination address (the MN's Home Address). Since the fHA has previously installed a route at the ERs, they know that the MN is not at home. Therefore, the ERs encapsulate and forward the packet to the MN's location. Our architecture manages MN to CN communications efficiently because some packets "bounce" at the ERs. This way the network's internal traffic is reduced considerably.

Regarding the communications from the MN to the Home Network, the MNs address their IPsec-protected packets to the fHA, which in turn de-capsulates and forwards them to the MN's peer. The MN's peers address their data packets to the MN's Home Address. Since the fHA has announced to the Home Link's AR a route for the MN, the AR knows that the MN is away and encapsulates the packet towards the fHA. The Home Link's AR also multicasts Neighbor Advertisement messages on behalf of the MN. This enables the AR to intercept communications from the Home Link to the MN and forward them through the tunnel with the fHA. In the following subsections the detailed operations of our architecture are presented.
2.3 Dynamic fHA Address Discovery
This subsection specifies how the fHAs announce their presence. In standard Mobile IPv6, HAs announce their presence through Router Advertisement messages.
Fig. 1. Overview of our proposal
In this way, MNs can automatically select a HA. Our architecture implements this functionality in exactly the same way that Mobility Anchor Points (MAP) announce their presence in the Hierarchical Mobile IPv6 (HMIPv6) [12] protocol. Our mechanism is also compatible with legacy MNs. Each fHA sends Router Advertisement messages announcing its presence to the routers operating in the network. These messages include a preference value. In turn, the routers propagate the fHA's announcements to ARs, which then forward them to the Home Link. Each router decrements the preference value. This way MNs can automatically discover their fHA's address and select the best one according to the preference value.

This mechanism has many benefits. On the one hand, it enables ARs to automatically discover the fHA, thus avoiding manual configuration. On the other hand, it allows us to deploy more than one fHA on the network and distribute the load among them. The fHA's Router Advertisement messages include the prefix(es) of the Home Network that it is serving and the anycast address owned by the ERs. Including the Home Network's prefix enables the MNs to know whether their peers are on the Home Network or not; depending on this, MNs address their data packets to the fHA or to the ERs. Finally, in order to provide compatibility with legacy Mobile IPv6 nodes, MNs may send their traffic to the fHA.
2.4 Signaling Processing
Each MN selects a given fHA through the above-mentioned mechanism. All the fHAs have pre-configured keys with the MNs as the Mobile IPv6 RFC states. Please note that ARs and ERs do not share any keys with the MNs. The fHAs receive registration messages from the MNs as stated by the Mobile IPv6 RFC. Upon reception of a successful registration message, the fHA has to announce this information (route) to the ERs, to the Home Link’s AR and to the rest of the
fHAs. To distribute this type of information we use a routing protocol. Instead of designing a new routing protocol, we use an already existing and deployed one. The routing protocol that best fulfills our requirements is the interior Border Gateway Protocol (IBGP) [11]. In our solution the fHAs, the ERs and the Home Link's ARs form an IBGP domain. It is very important to remark that this may be an already existing IBGP domain or a separate one. The routes announced through this IBGP domain always have the longest prefix (/128) and never affect regular BGP routes. It should be noted that the routes announced by the fHAs are never distributed outside the network. Finally, the entities participating in the IBGP domain have pre-configured keys to provide confidentiality, integrity and authentication for their communications.

For each successfully received registration message, the fHAs send an IBGP UPDATE message to the ERs and to the AR responsible for the MN's Home Link. The fHAs are able to determine the appropriate AR by inspecting the MN's Home Address. We introduce new options in the IBGP UPDATE message. The UPDATE message sent to ERs includes the following information: (Home Address, Care-of Address, Lifetime). Upon reception of this message, the ERs set up a tunnel endpoint with the MN. The tunnel source address is the anycast address, while the destination address is the Care-of Address. In addition, each ER adds the following route to its routing table: (HomeAddress/128 → Tunnel). The tunnel and the route are automatically deleted after "Lifetime" seconds.

The UPDATE message sent to the AR includes the following information: (Home Address, Lifetime). Upon reception of this message, the AR knows that the MN is away from home (note that the AR does not know the location of the MN). Next, the AR sets up a tunnel endpoint towards the fHA that announced the route and adds the following route to its routing table: (HomeAddress/128 → Tunnel). The AR also starts sending multicast Neighbor Advertisement messages on behalf of the MN at the Home Link. If a node of the Home Network (or Home Link) sends a packet to the MN, the AR intercepts it and encapsulates it towards the fHA. Once again, the tunnel and the route are automatically deleted after "Lifetime" seconds.

Once the MN returns home, it sends a registration message to the fHA. Upon reception, the fHA sends an IBGP WITHDRAWAL message to the ERs and to the corresponding AR to immediately remove all the routes and tunnels related to the MN's Home Address.
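The per-binding state an ER keeps after these messages can be summarized with a short sketch. The following Python fragment is illustrative only: the class and field names are ours, and a real router would manipulate kernel routing tables and tunnel interfaces rather than dictionaries.

```python
import time

class ExitRouterState:
    """Sketch of the per-MN state an ER installs upon an fHA's IBGP UPDATE."""

    def __init__(self, anycast_address):
        self.anycast_address = anycast_address  # tunnel source, owned by all ERs
        self.routes = {}                        # HomeAddress/128 -> (CoA, expiry)

    def on_update(self, home_address, care_of_address, lifetime):
        # Install the tunnel endpoint and the /128 host route announced by the fHA.
        self.routes[home_address] = (care_of_address, time.time() + lifetime)

    def on_withdrawal(self, home_address):
        # The MN returned home: remove the route and tunnel immediately.
        self.routes.pop(home_address, None)

    def lookup(self, destination):
        # Return the CoA to encapsulate towards, or None if no route exists
        # or the route expired after "Lifetime" seconds.
        entry = self.routes.get(destination)
        if entry is None or entry[1] < time.time():
            self.routes.pop(destination, None)
            return None
        return entry[0]
```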
2.5 Data Packets Processing
This subsection presents how packets are routed from/to the MNs. MNs communicating with CNs encapsulate their data packets towards the anycast address owned by the ERs (Fig. 2). The packets are received by the "nearest" ER, which de-capsulates and forwards them towards the packet's destination address (the CN's address). If the exit point for the CN's address is another ER, then the packet traverses the network as a transit packet. It is important to remark that our solution does not require anycast routing.
Fig. 2. MNs to CNs communications
Fig. 3. MNs to Home Network communications
Packets addressed to the anycast address are routed normally (like unicast) and delivered to a given ER. We use anycast addresses because this is the standard procedure for assigning the same address to different network interfaces.

MNs communicating with nodes located in their Home Network (Fig. 3) encapsulate their packets towards the fHA. However, packets sent by the MN's peers are addressed to the MN's Home Address. The MN's AR intercepts those packets. Since the AR knows that the MN is away from home, it encapsulates the packet towards the fHA. Since the Mobile IPv6 RFC states that packets are tunneled through the HA, encapsulating the packet from the AR to the fHA does not affect the path's MTU [1]. As already mentioned, the MN's communications with the Home Network are protected with IPsec.
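The forwarding decision an ER takes for the two directions described above can be sketched as follows; `encapsulate` and `forward_normally` are illustrative stand-ins for IPv6-in-IPv6 tunneling and regular longest-prefix-match forwarding, and `er` is an instance of the state class sketched in Section 2.4.

```python
def encapsulate(packet, src, dst):
    # IPv6-in-IPv6 encapsulation, reduced to a dict for illustration.
    return {"src": src, "dst": dst, "inner": packet}

def forward_normally(packet):
    return packet  # stand-in for regular longest-prefix-match forwarding

def forward_at_er(er, packet):
    """Illustrative ER data path (er is an ExitRouterState, Sect. 2.4)."""
    if packet["dst"] == er.anycast_address:
        # MN -> CN: de-capsulate; the inner packet "bounces" here or
        # transits the network towards another ER.
        return forward_normally(packet["inner"])
    coa = er.lookup(packet["dst"])
    if coa is not None:
        # CN -> MN: a /128 route installed by the fHA says the MN is away;
        # encapsulate towards its Care-of Address.
        return encapsulate(packet, src=er.anycast_address, dst=coa)
    return forward_normally(packet)
```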
2.6 Flexible Home Agent Location
This subsection discusses the possible locations of the fHAs. Each fHA can be placed anywhere in the network, as a separate server, co-located with an ER/border router or even with a BGP Route Reflector.
Fig. 4. fHA location example
One of the major benefits of our proposal is its flexibility. On the one hand, our architecture can serve all the MNs of a network with one or more fHAs. If more than one fHA is deployed, MNs select the nearest one based on the preference value; this way the load is distributed among them. Each fHA thus only processes signaling messages and communications from/to the Home Network (like a VPN gateway), while MN to CN communications are processed by ERs. On the other hand, our architecture is transparent to MNs running with legacy Home Agents, and both technologies may co-exist on the same network.

Fig. 4 shows an example of the flexibility of our architecture. This AS has three networks, and each one can independently select which approach it deploys. For instance, the "A" network can deploy both technologies: the fHA could serve MNs belonging to the "A.1" sub-network, while MNs belonging to the "A.2" sub-network could be served by a legacy HA. The "B" network can deploy only legacy Home Agents on each sub-network. Finally, the "C" network can deploy two fHAs, and all the MNs from "C.1" and "C.2" could be served by them. Only the routers labeled in black must belong to the IBGP domain with the fHAs of their network. There is a separate IBGP domain for each Home Network. MNs served by an fHA send their data packets to an anycast address owned by the ERs. Since the prefix of the anycast address belongs to the Home Network's prefix, the AS's border routers know how to forward the packets and do not need to be aware of our protocol.
3 Evaluation
This section presents an analytical evaluation of our proposed scheme and a comparison with a network running Mobile IPv6 enhanced with existing solutions
[4][5][6][7][8]. We do not consider solutions based on eBGP [9][10] because their impact on the exterior BGP routing system scalability is unpredictable.
3.1 Signaling Overhead
Let N be the number of ERs of a network running our proposal, let M be the number of deployed fHAs, and let H be the total number of received registration messages per second (including foreign and home network registrations). Then our proposal requires sending H(N + M) IBGP messages per second.
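As a quick sanity check, plugging in the values reported later in Section 4 reproduces the signaling rate measured in our simulation; the snippet below is merely a worked example of the formula.

```python
def ibgp_messages_per_second(N, M, H):
    # Per registration the fHA sends one UPDATE to each of the N ERs, one to
    # the Home Link's AR and one to each of the other M - 1 fHAs: N + M total.
    return H * (N + M)

# Sect. 4 values: N = 2 ERs, M = 2 fHAs, H = 4.68 + 0.80 registrations/s.
print(ibgp_messages_per_second(2, 2, 4.68 + 0.80))  # -> 21.92 messages/s
```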
3.2 Transit Traffic Reduction
As noted previously, in our proposal some data packets "bounce" at the network's ERs without being transmitted through the network. In existing solutions [4-8], however, each packet sent through the HA has to be transmitted twice: once from the ER to the HA and once in the opposite direction. In this subsection we compare the resulting amount of transit traffic. We only consider the traffic exchanged between MNs and CNs that is routed through the HA.
Fig. 5. Transit Traffic in our proposal
Let I be the amount of traffic, in Kbps, exchanged between all the MNs and their CNs through the HA. Existing solutions [4-8] then carry 2I Kbps of transit traffic. If we assume that each ER of the network has the same probability of being the exit point of a given packet, our proposal carries (1 − 1/N)I Kbps of transit traffic (Fig. 5). In addition, transit traffic in existing solutions [4-8] may follow a longer path than in our proposal: while in existing solutions transit traffic must be transmitted to the HA, in our proposal some transit traffic "bounces" at the ERs and the rest is transmitted from one ER to another. Home Links are usually deployed far away from the ERs, while ERs may be close to one another (in terms of number of hops).
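Evaluated with the traffic figures of Table 1 (Section 4), these formulas give the transit-traffic entries reported there; again, this is a worked example only.

```python
I = 947.86              # Mbps of MN-CN traffic through the HA/fHA (Table 1)
N = 2                   # ERs in the simulated Home Network (Sect. 4)
print(2 * I)            # existing solutions [4-8]: 1895.72 Mbps
print((1 - 1 / N) * I)  # our proposal: 473.93 Mbps, a 75% reduction
```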
3.3 Stored State at the Routers
In this subsection we analyze the size of the routing tables and the number of tunnels configured at the ERs and ARs of a network running our proposal. Each ER has one tunnel and one route for each MN of its Home Network that is away from home. Each route and tunnel requires the Home Address, the Care-of Address and a lifetime, 34 bytes in total. Likewise, each AR has just one tunnel with each fHA and one route for each of its nodes away from home.
4 Simulation
In order to validate our proposal we have run a simulation. The simulation is intended to provide realistic values for the equations presented in Section 3 and to compare our proposal with existing solutions [4-8]. To obtain realistic values, we have configured a highly mobile environment using a Random Trip mobility model [14]. Specifically, we have used the Random Waypoint on Generalized Domain model with a set of 8 domains. Each domain represents a layer-2 network where a MN can move without changing its point of attachment (i.e., default router). Only when the MN changes from one domain to another must it register its new location. Please refer to [14] for further information.

The first domain is considered to be the Home Network, while the rest of the domains are foreign networks. The Home Network has 1000 MNs, 2 ERs and 5 sub-networks. When running our proposal the Home Network has 2 fHAs, while when running existing solutions [4-8] it has a set of reliable HAs on each sub-network (5 sets in total). In addition, each MN sends 64 Kbps (VoIP) of unidirectional traffic towards its Home Network and 128 Kbps (Data) towards a CN. Note that when a MN is at home, traffic is sent directly and thus we do not consider it. Similarly, we do not take into account route-optimized traffic. Finally, we have simulated this environment for 10000 seconds (roughly 2.7 hours).

Our mobility model produces a mean of 4.68 foreign network registration messages per second (messages/s) and 0.80 Home Network registration messages/s. This means that our proposal requires sending 18.72 IBGP UPDATE messages/s and 3.2 IBGP WITHDRAWAL messages/s. Summarizing, our proposal introduces 21.92 signaling messages/s, where each ER must process 5.48 messages/s.

Regarding the transit traffic, Table 1 presents the results. In our proposal, fHAs have to process 465.04 Mbps. Our simulated network has two fHAs, so each one processes 232.52 Mbps of data traffic. In [4-8] HAs process 1412.9 Mbps of traffic; our simulated network has 5 sets of HAs, which means that each set processes 282.58 Mbps. In our proposal, the data traffic destined towards CNs is directly processed by ERs (947.86 Mbps). Regarding the transit traffic, our proposal reduces it by 75% compared to existing solutions [4-8]. It should be taken into account that existing solutions [4-8] must send each data packet twice, once from the ER to the HA and once in the opposite direction.
Table 1. Simulation Results (Values in Mbps)

                                 Existing Solutions [4-8]   Our Proposal
Traffic sent by MNs
  through the HA/fHA             1412.9 (465.04 to the HN, 947.86 to CNs)
Traffic processed by HAs/fHAs    1412.9                     465.04
Traffic processed by ERs         N/A                        947.86
Transit Traffic                  1895.72                    473.93
Our solution has to forward transit traffic into the network only if the receiving ER is not the exit point for the packet's destination address. Finally, during the simulation a maximum of 900 nodes were away from home at the same time (average 717, minimum 685). This means that the maximum stored state on each ER is 29.9 KB. This simulation shows that our proposal is viable and that, among other benefits, it can reduce the transit traffic considerably.
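The stored-state figure can be cross-checked against the 34-byte entry size of Section 3.3; the arithmetic below is ours, the values come from the text.

```python
max_away = 900       # maximum number of MNs simultaneously away from home
state_per_mn = 34    # bytes per route/tunnel entry (Sect. 3.3)
print(max_away * state_per_mn / 1024)  # -> ~29.9 KB of state per ER
```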
5 Conclusion
In this paper we have presented a flexible and distributed HA architecture. Existing proposals [4-8] provide a reliable HA architecture by deploying redundant HAs on each Home Link. Our proposal has the same benefits with just one set of fHAs for the whole network.

Our solution is reliable. A failure of the MN's AR does not disconnect the MN: in this case the MN is still able to communicate with the Home Network (except with the Home Link) and with the rest of the Internet. Since our solution allows deploying several fHAs for each network, a failure of an fHA does not disconnect the MN either; here our solution can benefit from the efficient failure recovery mechanisms presented in [7][8], with which it is fully compatible, minimizing the service interruption time. Finally, a failure of an ER does not disconnect the MN: the network announces the failure of the ER through the exterior routing protocol and the packets are re-routed.

Our solution also provides load balancing because the MN's data packets are processed by ERs or by a set of fHAs. Moreover, it considerably reduces the transit traffic through the network (by 75% according to our simulation). Distributing the HA's operations requires adding some extra load at the ERs and at the ARs: the ERs have to set up tunnels and configure new routes towards the MNs, while the ARs have to configure a tunnel with each fHA and intercept packets destined to their MNs. We believe that routers are hardware machines optimized to perform exactly this type of operation. It is important to remark that signaling and IPsec data packets are processed by fHAs, not by routers. In addition, our simulation of a highly mobile environment shows that each ER would need to process only an average of 5.48 signaling messages per second.
Finally, as future work, we plan to implement Traffic Engineering for the MNs' traffic in the case where the exit routers are the AS's border routers. We also plan to extend our solution to correspondent networks.
References

1. D. Johnson et al.: Mobility Support in IPv6. RFC 3775, 2004
2. J. Faizan et al.: Problem Statement: Home Agent Reliability. IETF Draft (Work in Progress), 2004
3. S. Kent et al.: Security Architecture for the Internet Protocol. RFC 2401, 1998
4. F. Heissenhuber, W. Fritsche, A. Riedl: Home Agent Redundancy and Load Balancing in Mobile IPv6. In: Proc. 5th Int. Conf. Broadband Communications, 1999
5. H. Deng et al.: Load Balance for Distributed Home Agents in Mobile IPv6. In: Proc. 14th IEEE PIMRC, 2003
6. H. Deng et al.: Load Balance for Distributed Home Agents in Mobile IPv6. IETF Draft (Work in Progress), 2003
7. J. Faizan et al.: Efficient Dynamic Load Balancing for Multiple Home Agents in Mobile IPv6 Based Networks. In: Proc. 5th Int. Conf. Pervasive Services, 2005
8. R. Wakikawa et al.: Home Agent Reliability Protocol. IETF Draft (Work in Progress), 2006
9. R. Wakikawa et al.: Inter Home Agents Protocol Specifications. IETF Draft (Work in Progress), 2006
10. R. Wakikawa et al.: Virtual Mobility Control Domain for Enhancements of Mobility Protocols. In: Proc. IEEE INFOCOM, 2005
11. Y. Rekhter et al.: A Border Gateway Protocol 4 (BGP-4). RFC 1771, 1995
12. H. Soliman et al.: Hierarchical Mobile IPv6 Mobility Management (HMIPv6). RFC 4140, 2005
13. S. Deering et al.: Internet Protocol, Version 6 (IPv6) Specification. RFC 2460, 1998
14. S. PalChaudhuri et al.: Perfect Simulations for Random Trip Mobility Models. In: Proc. 38th Annual Simulation Symposium, 2005
Using PANA for Mobile IPv6 Bootstrapping

Julien Bournelle1, Jean-Michel Combes2, Maryline Laurent-Maknavicius1, and Sondes Larafa1

1 GET/INT, 9 rue Charles Fourier, 91011 Evry, France
[email protected], [email protected], [email protected]
2 France Telecom R&D, 38/40 rue du General Leclerc, 92784 Issy-Les-Moulineaux, France
[email protected]
Abstract. One of the current challenges of the Mobile IPv6 Working Group at the IETF is to dynamically assign a Mobile Node its Home Agent and Home Address and to set up the necessary security associations. If the Mobile Node is authenticated for network access, the current IETF approach is to use DHCPv6 to deliver the Home Agent and then to use IKEv2. In this article, we assume that the PANA protocol is used for network access authentication. We propose to add some functionality to this protocol to support Mobile IPv6 bootstrapping. The Home Agent is directly delivered to the Mobile Node and DHCPv6 is no longer necessary. Moreover, it allows better management of Home Address allocation by the AAA infrastructure. This proposal has been submitted to the IETF and implemented on a testbed at the GET/INT research laboratory.
1 Introduction
In the Mobile IPv6 protocol, a Mobile Node needs a Home Agent, a Home Address and IPsec Security Associations with its Home Agent to secure signaling messages. To ease the deployment of this mobility protocol, Internet Service Providers need scalable mechanisms to dynamically assign these pieces of information to their customers. This is known as the Mobile IPv6 Bootstrapping Problem. In this paper, we propose a solution based on the PANA protocol to deliver the Home Agent information to the Mobile Node during the network access authentication phase.

In Section 2, the Mobile IPv6 protocol is presented. Section 3 describes the Mobile IPv6 bootstrapping problem and the IETF bootstrapping solution based on the DHCPv6 protocol. Our solution, which relies on the PANA protocol, is detailed in Section 4. Finally, this proposal has been implemented and a description of our testbed is given in Section 5.
2 Mobile IPv6 Overview
As specified in [1], an IPv6 Mobile Node (MN) is uniquely identified by its Home Address (HoA) and remains reachable, whatever its position in the IPv6 network, thanks to a registration mechanism with its Home Agent (HA). Each time the MN attaches to a new network after a move or a reboot, it is assigned a new local temporary IPv6 address, either from a DHCPv6 server or through address autoconfiguration. The MN then has to inform its Home Agent of this temporary address, also known as its Care-of Address (CoA). This binding of its HoA to its CoA is performed by the MN sending a Binding Update (BU) and the HA acknowledging the BU message with a Binding Acknowledgement (BA) (cf. Fig. 1). After the registration is completed, the HA intercepts all the data traffic directed to the HoA and tunnels it to the current position of the MN (i.e., the CoA). Thus any Correspondent Node (CN) only needs to know the HoA to transmit data to the MN via the HA. For Route Optimization (RO) purposes, the Mobile IPv6 protocol enables the CN and MN to communicate directly rather than going through the HA.
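The HoA-to-CoA binding maintained by the HA can be pictured as a small cache keyed by Home Address. The class below is a toy illustration of this bookkeeping under our own naming, not an implementation of the RFC 3775 binding cache.

```python
class BindingCache:
    """Toy HA binding cache: Home Address -> current Care-of Address."""

    def __init__(self):
        self.bindings = {}

    def on_binding_update(self, hoa, coa):
        # Register the MN's new location and acknowledge with a BA.
        self.bindings[hoa] = coa
        return ("BA", hoa)

    def tunnel_destination(self, dst):
        # Traffic a CN sends to the HoA is intercepted by the HA and
        # tunneled to the CoA returned here (None means the MN is at home).
        return self.bindings.get(dst)
```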
Fig. 1. Mobile IPv6 architecture
Binding operations are highly sensitive to malicious traffic redirection and require strong protection. For binding updates sent by the MN to the HA, protection is guaranteed by the IPsec sub-protocol ESP (Encapsulating Security Payload) in transport mode [2]. The HA is thus able to authenticate the BU's originator and to verify the integrity of the BU content, especially the CoA, which is located within the Alternate Care-of Address mobility option. However, this ESP mechanism assumes that a Security Association (SA) is pre-established between the HA and the MN (MIPv6 SA).

For the binding updates sent by the MN to a CN, the mechanism is less secure, as no strong assumption of a pre-shared Security Association can be imposed. The selected mechanism, known as Return Routability [1], proposes that the MN send two messages to the CN, one directly and another one through the HA, and that the two replies from the CN follow the reverse paths. The idea is that intercepting two messages instead of one to perform a possible traffic redirection is much
more difficult. Moreover, because of the ESP protection between the HA and the MN, the HA gives the CN a guarantee of the request's authenticity.
3 The Mobile IPv6 Bootstrapping Problem
3.1 Problem Description
As explained in Section 2, activating the Mobile IPv6 service requires the Mobile Node to be pre-configured with a HA, a HoA and MIPv6 IPsec SAs with the Home Agent to protect Mobile IPv6 signaling. Note that another mechanism exists, based on an authentication option, to secure this signaling [3], but it is not considered in this article. These parameters may be statically configured on each Mobile Node. However, for an operator willing to deploy the Mobile IPv6 protocol inside its IP networks, static configuration of millions of electronic devices is a burden and clearly not scalable. Moreover, as explained in [4], other reasons argue for a dynamic bootstrapping mechanism:

– Dynamic Home Agent assignment: to offer the Mobile IPv6 service, an operator will deploy multiple Home Agents. For load balancing between HAs, the least loaded HA should be allocated to MNs. Moreover, if a HA becomes unavailable due to maintenance, network renumbering or failure, with dynamic assignment the operator is still able to provide the service.
– Dynamic Home Address assignment: the operator may want to dynamically assign Home Addresses to its clients. This is preferred for better management of address allocation and to ease administrators' tasks.

The current trend at the Internet Engineering Task Force (IETF) is to first assign the Home Agent to the Mobile Node and then to proceed to IKEv2 [5] exchanges between the MN and the HA. IKEv2 enables the MN to query a Home Address and, at the same time, to dynamically set up the MIPv6 SAs with its HA, as required by Mobile IPv6. Moreover, IKEv2 supports the EAP protocol [6] so the HA can authenticate the MN. Use of EAP permits the MN to use the same credentials that have been used for network access authentication (if any).

Two generic scenarios are possible [4]: in the first one, the Mobile Node has free network access (e.g., hotspots in a coffee shop), while in the second one, the MN is authenticated before getting network access. For the first scenario, the IETF proposed delivering the HA information through the DNS infrastructure [7]. In the second scenario, the Home Agent information is delivered as part of the network access authentication procedure [8]. In both scenarios, the MN then uses IKEv2 with the delivered Home Agent. In this article, we assume that the Mobile Node is in the second scenario. As described in [8], the IETF approach is to use DHCPv6 [9]. Our proposed mechanism bypasses the DHCPv6 exchanges by delivering the HA information during network access authentication.
3.2 DHCPv6 Approach
One way to assign the Home Agent for MN bootstrapping relies on reusing the network access control architecture: at the same time the MN asks the operator's network for IP access, it obtains the Home Agent. The IETF is working on a solution described in [8] using the DHCPv6 protocol [9]. Figure 2 shows the entities involved in this solution.
Fig. 2. MIP6 Bootstrapping with DHCPv6: architecture
Fig. 3. MIP6 Bootstrapping with DHCPv6: exchanges
Figure 3 presents the different steps of this solution. At first, the MN authenticates itself to the Network Access Server (NAS) with a protocol like PANA or 802.11i/802.1X (Step A). The NAS asks the AAA server in the Home Network (AAAH) about the MN's rights (Step B). The AAAH checks at the same time whether the MN has subscribed to the IPv6 mobility service. If this is the case, the
AAAH replies to the NAS that the MN is allowed to access the network and indicates which HA is to be used (Step C). Then the NAS informs the MN that it can access the network and stores the information about the HA assigned by the AAAH (Step D). To obtain the HA, the MN then sends a DHCP request (Step E). The NAS, which must also act as a DHCPv6 relay, intercepts the request and forwards it to the DHCP server, adding the information about the HA assigned by the AAAH (Step F). If the MN requested a HA in the local network, the DHCP server assigns one; if the MN requested a HA in the Home Network, it is the one assigned by the AAAH. The DHCP server sends the information about the HA to the NAS (Step G), which forwards the reply to the MN (Step H).
4 PANA Approach

4.1 PANA Overview
PANA (Protocol for Carrying Authentication for Network Access) [10] is currently under standardization by the IETF and appears to be a good candidate for handling MN authentication at the network access, as it is layer-2 agnostic and supports any EAP method. Based on the client-server model, the classical PANA architecture (cf. Fig. 4) uses a PANA client (PaC) located in the MN and a PANA Authentication Agent (PAA) located at the network access, for instance in the access router (or NAS). To control MN access to the network, an Enforcement Point (EP) applies security and filtering policies at layers 2 and/or 3, so that only authenticated and authorized MNs are permitted to send data traffic to the access network; the EP is located in the access network and possibly on the same equipment as the PAA.
Fig. 4. PANA framework
Several phases are necessary for the MN to connect to the access network. First, the PaC needs to discover the local PAA and negotiate services; this first phase classically includes three messages. Second, for the PAA to authenticate the MN, PANA messages encapsulating EAP payloads are exchanged. This authentication/authorization phase may rely on a AAA architecture, with authentication handled by AAA servers and a AAA client located on the same equipment as the PAA. Third, the MN is granted access to some Internet resources through the EP, but PANA messages are still exchanged for session
keep-alive reasons. From time to time, reauthentication of the MN is required by the PAA or the AAA servers. All these PANA messages and AAA messages carry their payloads in the form of AVPs (Attribute Value Pairs).
4.2 Using PANA for MIP6 Bootstrapping
Without any modifications, in the case of a Mobile Node using PANA for network access authentication, the exchanges would be first PANA, then DHCPv6 to get the HA, and then IKEv2. Our idea is to modify PANA in order to negotiate and deliver Mobile IPv6 information to the Mobile Node. For this purpose, we introduce new AVPs in PANA messages. We define a new Attribute Value Pair called the Mobility-Capability AVP. This AVP is used by the PANA Authentication Agent (PAA) to indicate to the MN/PaC that it supports Mobile IPv6 bootstrapping; it may also be used to indicate other IP mobility capabilities such as Mobile IPv4 [11] or HIP [12]. This AVP is sent in the first PANA-Start-Request (PSR) message, as shown in Fig. 5. If the Mobile Node supports our proposal and wants to bootstrap the IPv6 mobility service, it includes the Mobility-Capability AVP indicating its willingness in the PANA-Start-Answer (PSA) message. This AVP may also be used to indicate whether the MN wants a local HA or a HA in its home domain. Compared to the DHCPv6 approach, this permits the AAAH server to decide whether the MN is allowed to use a local HA.
Fig. 5. MIP6 Bootstrapping with PANA: exchanges
After this negotiation phase, the authentication phase starts. The PAA relies on a AAA protocol to authenticate the MN. As described in [13] and [14], the AAA signaling may be used by the PAA to indicate that it supports Mobile IPv6 bootstrapping. Moreover, the visited domain may also indicate that it can locally allocate a HA. At the end of the authentication phase, the PAA obtains the HA information in the AAA message containing the result of the authentication. Note that if the authentication fails, it is not necessary to provide the HA information to the NAS (i.e., the PAA). To deliver the HA to the MN, the PAA creates a Home-Agent AVP carrying the assigned HA address and puts it in the PANA-Bind-Request (PBR) message. Thus, at the end of the authentication, the MN knows its Home Agent and can use IKEv2 to get its Home Address and to set up the necessary IPsec SAs.
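A schematic TLV encoding of such an AVP is sketched below. The AVP codes and flag values are hypothetical (the real ones are assigned in the Internet-Draft [15]), and the on-the-wire PANA AVP header carries additional fields (flags, padding) omitted here; the snippet only conveys the type-length-value idea.

```python
import struct

# Hypothetical values for illustration; not the codes assigned in [15].
MOBILITY_CAPABILITY = 0x00F1
MIP6_FLAG = 0x0001      # "I support / request Mobile IPv6 bootstrapping"
LOCAL_HA_FLAG = 0x0002  # "I want a HA in the visited (local) domain"

def encode_avp(code, value):
    """Schematic AVP: 16-bit code, 16-bit length, then the value bytes."""
    return struct.pack("!HH", code, len(value)) + value

# A PaC answering the PSR could place this in its PANA-Start-Answer:
avp = encode_avp(MOBILITY_CAPABILITY,
                 struct.pack("!H", MIP6_FLAG | LOCAL_HA_FLAG))
```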
4.3 Pros and Cons
The main advantage of our solution is that it keeps the "end-to-end" Internet philosophy. Indeed, in our solution the MN knows that it will receive a HA, and it can be guaranteed that this HA assignment is done by its AAAH (e.g., thanks to encryption mechanisms). In the DHCP-based solution, the information about the HA is stored in the DHCP server and the MN cannot be totally sure that the information about the assigned HA comes from the Home Network. Another advantage is that the AAAH server can decide whether the Mobile Node can use a local HA: in our approach this is managed at the beginning of the authentication phase, whereas in the DHCPv6 approach only the MN can decide whether a local HA is required.

Regarding cost, our solution does not require setting up a DHCP infrastructure (i.e., a client in the MN, a relay in the NAS and a server in the access network), which reduces the cost of Mobile IPv6 deployment. Moreover, our solution may support privacy, because the AAAH may cipher the information about the assigned HA all the way to the MN. Finally, our solution is naturally secured regarding authentication and data integrity because its security relies on PANA; this may not be the case for DHCPv6.

However, even though our solution saves one round-trip time in the HA allocation and provides the advantages above, the IETF position is against network access control protocols carrying configuration.
5 Implementation Report

5.1 Required Adaptation for the Implementation
Our proposal has been submitted to the IETF in [15] and can be integrated in the overall architecture of the proposed IETF solution. However, due to some external constraints, the implementation differs from the original proposal; the differences are explained in this section.

At the time of implementation, no IKEv2 implementation with EAP and Mobile IPv6 support was available. For this reason, we used IKEv1 [16]. However, IKEv1 does not permit remote configuration of the IPv6 address of its MN clients. For this reason, on our platform the AAA server handles the Home Address allocation and sends it to the PAA; the Home Address is then carried to the MN in PANA messages.

Normally, IKEv2 allows the use of EAP to authenticate the MN. In IKEv1, only certificates or pre-shared keys can authenticate the exchanges. For this reason, we decided to bootstrap and distribute keying material for IKEv1 authentication between the MN and the HA. Our solution is to derive the PSK from the Master Session Key (MSK). This MSK is derived as part of the EAP authentication method and is available at both the EAP client (MN) and the EAP server (AAAH). Due to the colocation of the HA and the AAAH, the resulting PSK for IKEv1 is easily configured by modifying the IKEv1 configuration file.
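The exact key derivation function used on the testbed is not specified here; the sketch below shows one plausible way to derive a PSK from the MSK, using HMAC-SHA1 over a fixed label purely as an example.

```python
import hmac
import hashlib

def derive_ike_psk(msk: bytes, label: bytes = b"IKEv1 PSK") -> bytes:
    """Sketch: derive an IKEv1 pre-shared key from the EAP MSK.
    The label and the choice of HMAC-SHA1 are assumptions of this example,
    not the derivation actually used on the testbed."""
    return hmac.new(msk, label, hashlib.sha1).digest()

# Both the MN (EAP client) and the AAAH/HA (EAP server) hold the MSK after
# a successful EAP-TLS run, so each side can compute the same PSK locally.
```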
5.2 Platform
Hosts A, B and C run FreeBSD 5.4 (www.FreeBSD.org) as their operating system. As shown in Fig. 6, the MN is installed on host A, host B acts as an Access Router and implements the PAA and AAA client functionalities, and finally the AAA and EAP servers, colocated with the HA, are installed on host C. Details are given in the next subsections.
Fig. 6. PANA - Mobile IPv6 testbed
5.3 Mobile Node
Host A was installed with SHISA (http://www.mobileip.jp) to support Mobile IPv6, an EAP client, a PANA client and IPsec components (IKE, AH and ESP). The EAP client was extracted from xsupplicant (http://open1x.sourceforge.net), while the PANA client was implemented from scratch. EAP-TLS was selected as the authentication method between the EAP client and the EAP server. The IPsec Security Associations that secure Mobile IPv6 signaling are established by the racoon application, which implements IKEv1. We used a version of racoon patched by Francis Dupont (Point6/CELAR), because the original version from the IPsec-tools project is unable to set up correct IPsec SAs for Mobile IPv6 when the MN is not in its home domain.

The PANA client receives from the PAA the MN's HoA and the HA address in the corresponding AVPs of the PANA-Bind-Request message. After receiving this message, the PaC launches a shell script that first configures racoon, the IPsec security policy database and Mobile IPv6 with those HoA and HA addresses, and then executes the racoon and Mobile IPv6 daemons.
5.4 Access Router/PAA/AAA Client
The PAA located on host B receives EAP messages from A and forwards them to the AAA client using a UNIX socket. The AAA client is a Diameter EAP client that relies on the WIDEDiameter library. The latter was developed during the Nautilus6 project and implements the Diameter Base Protocol [17].
5.5 AAA Server/EAP Server/Home Agent
Host C supports an EAP server, a Diameter EAP server, IPsec components (IKE, AH and ESP), and a HA. SHISA for Mobile IPv6 and HA support was installed, as well as the racoon version patched by Francis Dupont. The Diameter EAP server relies on the WIDEDiameter library. It communicates with the EAP server through a local UNIX socket in order to authenticate the EAP client within the MN. As soon as the authentication is successful, the Diameter EAP server sends the HA address and the HoA to the MN. The assignment is done according to the identity of the MN and a predefined user database. A shell script is then launched in order to configure racoon and the IPsec policy and to execute racoon.
5.6 Tests and Results
The implementation of this proposal was successful. It was carried out in several steps. First of all, we tested each of the protocols separately to make sure that they worked well. Then a modular approach was employed to test whether the MN correctly receives the HoA and the HA address: on the one hand, the AAA server must deliver them correctly to the AAA client and thus to the PAA; on the other hand, the PAA must deliver them correctly to the MN. Besides, we also made sure that the PSKs calculated by the AAA server and the MN were identical and correct. Once all these tests were positive, racoon and MIPv6 were launched automatically by the PaC and the AAAH server using shell scripts, and we checked that the MN and HA configured their SAs and started their MIPv6 exchanges correctly.

Due to some limitations of the racoon implementation, it was not possible to run tests with multiple Mobile Nodes. However, tests done with only one Mobile Node have shown that the average time necessary for the MN to start using Mobile IPv6 is about 8.4 s: 0.04 s for the PANA procedure, 3.8 s between the end of the PANA procedure and the beginning of the IKE procedure, 2.2 s for the IKE procedure, 1.3 s between the end of the IKE procedure and the beginning of the BU/BA procedure, and 0.8 s for the BU/BA procedure. These measurements were realized on a platform composed of three computers with processor speeds of about 1 GHz, 512 MB of memory, and 100 Mbit/s links. It would have been interesting to perform similar measurements on the same platform using the DHCPv6 approach; unfortunately, no implementation of that approach is currently available. In any case, it is important to notice that, since bootstrapping is done when the MN is switched on, the bootstrapping time is not as critical as the mobility mechanisms used for handovers.
6 Conclusion
Deploying a new protocol is a challenging task for Internet Service Providers. In the Mobile IPv6 case, ISPs must deploy Home Agents inside their networks and
customers need equipment implementing the protocol. Moreover, as explained in this article, it is necessary to have a scalable way to dynamically assign Mobile IPv6-specific parameters to customers' devices. Our proposal provides a way to achieve that goal by slightly modifying the PANA protocol, which is in the standardization process at the IETF. This approach avoids additional DHCPv6 exchanges and is well optimized thanks to its close integration with network access authentication.
Acknowledgements

We would like to thank the Nautilus6 project (http://www.nautilus6.org), which provided us with the Diameter Base Protocol implementation, and Francis Dupont for his patch to racoon. We would also like to thank Fabien Allard (France Telecom R&D) for having installed our implementation and performed some tests on a different platform.
References

1. D. Johnson, C. Perkins, and J. Arkko. Mobility Support in IPv6. RFC 3775, June 2004.
2. J. Arkko, V. Devarapalli, and F. Dupont. Using IPsec to Protect Mobile IPv6 Signaling between Mobile Nodes and Home Agents. RFC 3776, July 2003.
3. A. Patel, K. Leung, M. Khalil, H. Akhtar, and K. Chowdhury. Authentication Protocol for Mobile IPv6. RFC 4285, January 2006.
4. A. Patel and G. Giaretta. Problem Statement for Bootstrapping Mobile IPv6. RFC 4640, September 2006.
5. C. Kaufman. Internet Key Exchange (IKEv2) Protocol. RFC 4306, December 2005.
6. B. Aboba, L. Blunk, J. Vollbrecht, J. Carlson, and H. Levkowetz. Extensible Authentication Protocol (EAP). RFC 3748, June 2004.
7. G. Giaretta, J. Kempf, and V. Devarapalli. Mobile IPv6 Bootstrapping in Split Scenario. draft-ietf-mip6-bootstrapping-split-03, October 2006.
8. K. Chowdhury and A. Yegin. MIP6-Bootstrapping via DHCPv6 for the Integrated Scenario. draft-ietf-mip6-bootstrapping-integrated-dhc-01, June 2006.
9. R. Droms, J. Bound, B. Volz, T. Lemon, C. Perkins, and M. Carney. Dynamic Host Configuration Protocol for IPv6 (DHCPv6). RFC 3315, July 2003.
10. D. Forsberg, Y. Ohba, B. Patil, H. Tschofenig, and A. Yegin. Protocol for Carrying Authentication for Network Access. draft-ietf-pana-pana-12, August 2006. Work in progress.
11. C. Perkins. IP Mobility Support for IPv4. RFC 3344, August 2002.
12. R. Moskowitz and P. Nikander. Host Identity Protocol (HIP) Architecture. RFC 4423, May 2006.
13. J. Korhonen, J. Bournelle, H. Tschofenig, C. Perkins, and K. Chowdhury. The NAS - HAAA Interface for MIPv6 Bootstrapping. draft-ietf-dime-mip6-integrated-01, June 2006.
14. K. Chowdhury and A. Lior. RADIUS Attributes for Mobile IPv6 Bootstrapping. draft-chowdbury-mip6-bootstrapp-radius-01.txt, October 2004.
15. J. Bournelle, M. Laurent-Maknavicius, and J-M. Combes. Using PANA in the Mobile IPv6 Integrated Case. draft-bournelle-pana-mip6-01, June 2006.
16. D. Harkins and D. Carrel. The Internet Key Exchange (IKE). RFC 2409, November 1998.
17. P. Calhoun, J. Arkko, E. Guttman, G. Zorn, and J. Loughney. Diameter Base Protocol. RFC 3588, September 2003.
Detecting 802.11 Wireless Hosts from Remote Passive Observations

Valeria Baiamonte1, Konstantina Papagiannaki2, and Gianluca Iannaccone2

1 Politecnico di Torino
2 Intel Research Cambridge
Abstract. The wide deployment of 802.11 WLANs has led to the coexistence of wired and wireless clients in a network environment. This paper presents a robust technique to detect 802.11 wireless hosts through passive observation of client traffic streams at the edge of the network. It is based on the estimation of entropy of packet interarrival times and on the analysis of variation in the measured entropy values across individual end host connections. With the aim of generating a physical layer “signature” that can be easily extracted from packet traces, we first perform controlled experiments and analyse them through Spectral Analysis and Entropy evaluation. Based on the gained insight we design a methodology for the identification of 802.11 wireless clients and test it on two data sets of packet-level traces collected in different networks. Our results demonstrate that wireless identification is highly precise in the presence of a sufficient traffic sample.
1 Introduction
The proliferation of 802.11 WLANs has led to the emergence of hybrid local area networks where wireless and wired clients can seamlessly access the network's resources. However, network access through 802.11 imposes a different kind of constraint on the network design. Wireless clients are likely to feature increased mobility, whereby the client point of attachment to the network may differ from one point in time to the next, affecting different parts of the infrastructure. Wireless client requirements with respect to throughput and latency may be harder to meet given the unpredictable nature of the wireless medium. Security solutions implemented in the core of the network may need to incorporate knowledge of the type of client to enforce appropriate measures. Lastly, even the provisioning of services addressed to network clients may significantly benefit from knowledge regarding the capabilities of such clients.

Recent research has focused on generally characterizing and studying wireless traffic with the aim of highlighting essential differences and analogies with wireline traffic [1,2]. While particular network environments make the identification of wireless and wired clients trivial, say if clients obtain IP addresses from different address blocks, there exist network environments where such a task is much harder to perform [3,4,5]. For such environments, previous approaches have advocated the use of active measurements [3] or passive inspection of TCP
traffic [4,5]. The fundamental issue with the first approach is that it requires client cooperation for the collection of the appropriate information. The latter approaches, furthermore, preclude the identification of clients that may use protocols other than TCP.

In this work we look into the problem of passive identification of wireless clients as observed at a location inside a network where a diverse set of client traffic is aggregated. We put forth the idea that a primary difference between wired and wireless systems is uncertainty, which manifests itself at the physical and MAC layers. Uncertainty is naturally quantified using information-theoretic measures such as entropy [6]. Techniques based on uncertainty estimation have lately been widely applied to traffic analysis. In [7,8,9], the entropy of packet arrivals is used to detect traffic anomalies in wired networks. In this work, we exploit the concept of entropy to measure the "uncertainty" of wireless traffic. Variation of link quality and channel conditions, together with the channel access mechanism employed by the 802.11 protocol, "brands" client traffic with a unique signature that can distinguish wireless from wired traffic.

We look into the fundamental operations of the 802.11 protocol and the way it can perturb client traffic due to the random access mechanism employed. More practically, we first use active measurements in controlled experiments with deterministic traffic patterns. Our preliminary study draws on spectral analysis applied to packet interarrival times in such controlled environments. The inspection of the spectral density of the signal built from the packet arrival process lets us observe how the intrinsic periodicity of sourced traffic can be either preserved or distorted depending on the physical medium traversed. Furthermore, spectral analysis allows us to define the signal bandwidth that needs to be filtered to extract information from a traffic flow, i.e., to define the time scales of interest for our purposes. After correctly filtering the traces, we then evaluate the entropy of packet interarrival times, observing the differences in the information content of wired and wireless traffic flows. Based on these observations, we design an algorithm that can identify wireless clients through passive observations at an aggregation point. Using two data sets collected from an enterprise and a campus environment, we demonstrate that our algorithm achieves high accuracy in host detection when provided with a sufficient traffic sample.
2 Extracting a Wireless Signature
Our goal is to capture the physical layer "signature" in a traffic flow through the inspection of packet interarrival times. We exploit two main features that a priori differentiate wired from wireless access: (i) the unpredictability of the wireless medium, and (ii) the impact of the 802.11 medium access control (MAC) mechanism. Our conjecture is that both mechanisms are bound to introduce random delays in the transmission of packets from a wireless host. Due to the unpredictability of propagation conditions on the shared radio medium, packets sent
inside a wireless network may not be correctly decoded at the receiver. Packet loss causes wireless stations to retransmit, delaying packet delivery at the receiver. Furthermore, every time a wireless station has a packet to send, it needs to contend for channel access. When many wireless clients co-exist in a WLAN, collisions are highly likely, causing retransmissions and increased backoff delays. Consequently, delay and increased jitter in packet reception are a fundamental feature of wireless traffic. If a host were to transmit the same constant bit rate (CBR) flow over the wired and the wireless medium, we would expect increased variation in interarrival times in the wireless transmission compared to the purely periodic stream seen with wired access.

However, even in the case of a purely periodic data source, inference of the access technology may still be complicated by the behavior of the host and the location of the observation point. The transmission of a purely periodic traffic stream from a highly loaded host is unlikely to maintain its periodicity when observed at a remote location. Moreover, network congestion and multiplexing with other traffic may distort the original signature in the traffic. Such a task becomes even more challenging when the original traffic stream is not periodic. If the application used by the client does not generate periodic traffic (like VoIP, for instance), we need to focus our attention on those parts of the traffic stream that relate to back-to-back packets. If packets are transmitted back to back, we would expect the wireless medium to distort their times of arrival at the receiver. Particular transport layers are bound to generate such back-to-back packets, for instance when TCP transmits a window of packets. However, if the transport layer protocol itself does not allow the observation of interarrival times of packets that have been generated closely spaced to each other, then the wireless signature will be very hard to recover.

In what follows we use controlled experiments to study the behavior of periodic traffic when transmitted through the wireless medium. This section allows us to identify ways in which we can capture "clean" physical layer signatures that do not suffer from the above limitations. Different wireless scenarios are tested to investigate whether this signature is preserved and how it varies across different conditions. The results of these experiments clarify and motivate the design of the methodology presented in Section 3.

Experimental scenario. We generate traffic flows from wired and wireless clients inside a private LAN to destinations outside the LAN (several PlanetLab nodes). We use the tg traffic generator to send UDP and TCP traffic streams. The UDP streams are at a constant bit rate of 1.6 Mbps, with 1000-byte packets sent every 5 ms. The TCP connections consist of the bulk transfer of large files (23 Mbytes). All experiments are repeated both when clients attach to the network using an Ethernet interface and when they use an 802.11b wireless NIC. All experiments are accompanied by sniffers at all clients and the AP (running HostAP), as well as a collection point at the edge of the private LAN that represents a typical aggregation point where our algorithms would be deployed.
More specifically, for the wireless experiments we test the following scenarios, each of which may have a significant impact on the captured wireless signature:

1. Good: one client placed in a good location, no contention (1 wireless client close to the AP, signal level: -43 dBm);
2. G-Cong: 3 clients placed in good locations relative to the AP, contending for channel access (client-AP signal level: -43 dBm);
3. Bad: one client placed in a bad location, no channel contention (1 wireless client far from the AP, signal level: -65 dBm);
4. Bad-Cong: 3 clients placed in bad locations, contending for channel access (client-AP signal level: -65 dBm).

Spectral Analysis

The main purpose of this preliminary analysis is to investigate how the physical layer characteristics influence the explicit (e.g., CBR traffic) or implicit (e.g., a TCP window of packets) periodicity of a traffic flow. Spectral analysis is carried out through the Discrete Fourier Transform (DFT), applied specifically to the process of packet interarrival times. It aims at estimating the harmonic content of a traffic stream, so as to identify the frequency band whose power spectral density can be related to MAC and physical layer behavior. The power spectrum of packet interarrivals not only retains the statistical information of a traffic flow, as a Probability Mass Function (PMF) would, but also describes how the "energy content" is spread over the frequency domain and how the "information content" is distributed over the signal bandwidth. In more detail, the signal to be processed is a discrete sequence defined on the time domain that describes the process of packet arrivals:

    x[n] = 1 if nT0 ≤ t_arr < (n+1)T0, and 0 elsewhere    (1)

x[n] is built as a sequence of pulses of amplitude 0 or 1 spaced by T0. Pulses of amplitude 1 at nT0 correspond to packet arrivals occurring at times t_arr, with nT0 ≤ t_arr < (n+1)T0. The time granularity T0 has to be fine enough to keep track of all packet arrivals, with no overlapping of two or more packet arrivals in the discrete time interval [nT0, (n+1)T0]. To this end, since packet arrivals are never spaced by less than 20 μs (the time slot duration in IEEE 802.11), we have chosen T0 = 20 μs; thus the resolution on the frequency domain will be f0 = 1/T0, equal to 50 kHz. The observation window is N·T0 wide, i.e., the total number of observed samples equals N. The Fourier-transformed signal, defined on the discrete frequency domain, is then:

    X[kf0] = (1/N) Σ_{n=0}^{N-1} x[nT0] e^{-j2π nk/N}    (2)
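As a concrete illustration of Eqs. (1)-(2), the sketch below builds the binned arrival indicator x[n] from a list of packet arrival timestamps and takes its normalized DFT with numpy. This is our own minimal sketch; the variable names and the example CBR trace are illustrative, not taken from the paper.

```python
import numpy as np

def binned_arrivals(arrivals, T0=20e-6):
    """Indicator sequence of Eq. (1): x[n] = 1 if a packet arrived
    in [n*T0, (n+1)*T0), and 0 elsewhere."""
    arrivals = np.asarray(arrivals, dtype=float)
    N = int(arrivals.max() // T0) + 1
    x = np.zeros(N)
    x[(arrivals // T0).astype(int)] = 1.0
    return x

# Example: a CBR stream, one packet every 5 ms, observed for one second
arrivals = np.arange(0.0, 1.0, 5e-3)
x = binned_arrivals(arrivals)
X = np.fft.fft(x) / len(x)   # normalized DFT of Eq. (2)
```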
[Figure: panels (a) DFT CPS (UDP) and (b) DFT CPS (TCP) plot the CDF of the FFT of interarrival times versus time for the wired case and the four wireless scenarios; panels (c) DFT PS (wired TCP) and (d) DFT PS (wireless TCP) plot the FFT power spectrum versus time.]

Fig. 1. DFT Cumulative Power Spectrum Analysis
Further, before applying spectral analysis to x[n], the mean value is subtracted from the discrete sequence. The mean value results in a large DC component in the spectrum that provides no useful information for our classification study. The DFT power spectrum is then expressed by

    P[kf0] = |X[kf0]|²    (3)
To compare different power spectra, the Cumulative Power Spectrum (CPS) is used. It is normalized to the total power of the signal:

    C(n) = ( Σ_{k=0}^{n} P[kf0] ) / ( Σ_{k=0}^{N-1} |X[kf0]|² ),   0 ≤ n ≤ N-1    (4)
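Continuing the sketch above, Eqs. (3)-(4) translate directly into a power spectrum and a normalized cumulative sum, with the mean removal described above applied first. Again, this is an illustrative sketch, not the authors' code.

```python
import numpy as np

def cumulative_power_spectrum(x):
    """Eq. (3): P[k] = |X[k]|^2, then Eq. (4): cumulative power
    normalized by the total signal power."""
    X = np.fft.fft(x - x.mean()) / len(x)   # mean removed: drop the DC component
    P = np.abs(X) ** 2
    return np.cumsum(P) / P.sum()

# For a CBR stream the CPS rises in steps at the harmonics of the
# sending period; wireless MAC distortion smears these steps out (Fig. 1).
```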
The power spectrum results presented here are plotted as a function of time, instead of frequency, so as to relate the periodicity information to the time domain and give it an intuitive meaning. This is obtained simply by rescaling the x-axis. Figure 1(a) shows the CPS of wired and wireless UDP CBR flows. Interestingly, the distortion caused by the physical layer is clearly visible and deeply affects the periodicity of CBR flows. Packets have been generated one every 5 ms. If their transmission were not delayed, we would expect the ideal frequency spectrum of a pulse train, with a pulse at every N/T, N = 1, 2, ... and T = 5 ms. The CPS, hence, would appear as a step function with a step every T.
Focusing on the "udp wired" curve, we indeed notice a step behavior due to bumps of power corresponding to the main harmonic components at N/T. This confirms that periodicity is actually preserved on the wire, whereas it is significantly reduced for wireless streams. The "wireless good and not congested" case still presents a step behavior, even though the amplitude of the bumps decreases, while for congested and badly located flows the periodicity of CBR traffic disappears completely. In order to observe a similar behavior for TCP streams, we first need to examine how the signal power is distributed over the frequencies of a TCP signal. For this reason, Figures 1(c) and 1(d) compare the DFT power spectrum of a wired and a wireless TCP connection, respectively. In the first, clear high-powered spectrum components are concentrated at the low time scales. These harmonic components correspond to packets sent inside the TCP congestion window, whose transmission is not delayed by the reception of TCP acknowledgements. A large amount of power is also concentrated at the higher time scales and expresses the effect of Round Trip Times (RTTs). In the second power spectrum, shown in Figure 1(d), the transmission over the wireless medium has the effect of removing most of the periodicity, reducing the signal power and distributing it over the whole bandwidth: dominant frequencies are no longer discernible. When the original traffic stream is not periodic, in order to capture the periodicity distortion we need to focus our analysis on those parts of the traffic stream where packets are sent back to back, and thus where the access network effects are most visible. In the case of TCP traffic, this means taking into account the packets transmitted within a TCP window: we need to apply a filter and preserve only the high frequency band (i.e., the low time scales) where these phenomena can be observed. The filter can easily be applied over the sequence of packet interarrivals, dropping all interarrival times exceeding a certain threshold T_RTT. In this work T_RTT has been set to 10 ms, so as to exclude large RTTs but include all the delays introduced by consecutive retransmissions in highly congested wireless networks. Thanks to this observation, we are able to isolate the wired TCP flow from all the wireless ones through the CPS, as in the UDP/CBR case (Figure 1(b)).

Sample Entropy

The DFT methodology is computationally expensive and difficult to incorporate within an automated algorithm. A natural way to measure the "uncertainty" is to use the concept of entropy. We describe here the concept of empirical entropy, since we deal with empirical distributions. Let the random variable X denote the value of interarrival times, i.e., the time between back-to-back packets. X is randomly sampled or observed m times, inducing the empirical probability distribution on X, p(xi) = mi/m, xi ∈ X, where mi is the number of times X is observed to take the value xi. The empirical entropy of the interarrival time distribution is then H(X) := -Σ_{xi∈X} p(xi) log2 p(xi). Empirical entropy values computed on the same distribution can be different, depending on the
[Figure: panels (a) UDP, variance and (b) TCP, variance plot the variance of interarrival packet times versus time; panels (c) UDP, entropy and (d) TCP, entropy plot the entropy of interarrival packet times versus time, for the wired case and the four wireless scenarios.]

Fig. 2. Variance and Entropy of interarrival packet times
time scales considered to discretize the samples. For our computations, a bin size of 100 μs has been chosen to compute the PMF. We compute the entropy of the PMF of packet interarrivals across time. Interarrivals are filtered as in the case of spectral analysis, and each point is computed by acquiring the PMF of interarrival packet times every 20 seconds at the remote monitoring point. Figure 2 presents variance and entropy for UDP and TCP streams as a function of time. Entropy evaluation, unlike standard metrics such as the average, median, and variance, provides a faithful estimate of the information content of a set of outcomes. Variance, for instance, measures the average distance of the outcomes from the mean and quantifies how spread out a distribution is around its mean value. This is not enough, though, to quantify the amount of observational variety and randomness retained in a set of observations: distributions with low variance may assume high entropy values, and vice versa. In our experiments, the variance for the UDP experiments, shown in Figure 2(a), is already able to differentiate wired flows from wireless ones. It also consistently increases in the presence of congestion and low signal level in the wireless environments. Delays and retransmissions result in higher variability in packet arrivals, which affects the innate periodicity of CBR traffic. This does not hold for TCP flows, though, whose variance is reported in Figure 2(b): values range from 10^-4 to 10^-3 without a clear distinction among the different cases.
In Figure 2(c) and Figure 2(d), instead, entropy values are clearly separated for all the cases, no matter which transport protocol is used. The information content is very low (around 1 bit) for the wired interarrivals, while it grows significantly for wireless packet flows. The difference between wired and wireless is still evident even when just one wireless client is transmitting in the WLAN.
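To make the entropy metric concrete, here is a minimal sketch of the per-window computation: filter interarrivals at T_RTT = 10 ms, bin them at 100 μs, and evaluate the empirical entropy. Function and variable names are ours, not the authors'.

```python
import numpy as np

def interarrival_entropy(arrival_times, t_rtt=10e-3, bin_size=100e-6):
    """Empirical entropy H = -sum_i p(x_i) log2 p(x_i) of the PMF of
    interarrival times, keeping only gaps below t_rtt."""
    gaps = np.diff(np.sort(np.asarray(arrival_times, dtype=float)))
    gaps = gaps[gaps < t_rtt]           # drop RTT-scale interarrivals
    if gaps.size == 0:
        return 0.0
    counts = np.bincount((gaps // bin_size).astype(int))
    p = counts[counts > 0] / gaps.size  # empirical PMF over 100 us bins
    return float(-(p * np.log2(p)).sum())
```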
3 Detection Algorithm
From the lessons learnt during the controlled experiments, we can define a generic algorithm to classify end hosts based on their access media. The input of our scheme is a packet-level trace collected at a monitoring point. In the first stage, packets are aggregated on the basis of the source IP address. Within each IP-source set, 5-tuple flows are then isolated. Interarrival times between consecutive packets are computed and then filtered: only those falling below T_RTT, with T_RTT = 10 ms, are kept. The algorithm then computes two values: 1) the empirical entropy H_IP, evaluated on the whole IP-source aggregated trace; 2) the empirical entropy of the largest (in terms of number of interarrivals) 5-tuple flow of each IP-source trace, H_IP,5. We then define the Variation of Entropy as the difference ΔH = H_IP - H_IP,5. The two values H_IP and ΔH are used to classify hosts. The pseudo-code of the proposed methodology is reported below:

if H_IP ≤ H_lower then
    the host is wired
else if H_IP ≥ H_upper then
    the host is wireless
else if H_lower < H_IP < H_upper then
    if ΔH ≥ ΔH_THR then the host is wired
    else the host is wireless
The thresholds are trained using a small set of passive traces and chosen as H_lower = 3.5 bits, H_upper = 5 bits, and ΔH_THR = 0.5 bits. Figure 3 shows the PMF of the entropy computed over the training dataset and helps explain the selection of the thresholds. The mass of entropy for wireless flows is concentrated at H_IP ≥ H_upper (the vertical dotted line on the right), while wired entropy values have higher probability for H_IP ≤ H_lower (the vertical line on the left). The PMF exhibits an overlapping area in the range [H_lower, H_upper], where the two distributions are superimposed. For the flows that fall into that region we use the variation of entropy as a discriminator. With wireless hosts, the uncertainty measured by H_IP,5, i.e., on the largest 5-tuple flow, already accounts for the effects introduced by the wireless transmission. As a consequence, adding other smaller 5-tuple flows has a marginal impact on the value of the aggregate entropy, resulting in a low ΔH.
[Figure: PMF of entropy (x-axis: entropy in bits, 0 to 7; y-axis: PMF, 0 to 0.2) for wired and wireless hosts in the training dataset.]

Fig. 3. Probability Mass Function of Entropy
In the case of wired hosts, instead, the variation of entropy is driven by different factors. When H_IP is large, ΔH measures the impact of aggregating different flows: by adding more outcomes, a distribution defined over a limited interval becomes more informative, i.e., the aggregate entropy grows [6].
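The complete decision rule of this section can be transcribed directly; the function below is our own sketch using the trained thresholds (3.5 bits, 5 bits, 0.5 bits), with helper names that are ours.

```python
def classify_host(h_ip, h_ip5, h_lower=3.5, h_upper=5.0, dh_thr=0.5):
    """Section 3 decision rule: classify a source from the aggregate
    entropy H_IP and the entropy H_IP,5 of its largest 5-tuple flow."""
    if h_ip <= h_lower:
        return "wired"
    if h_ip >= h_upper:
        return "wireless"
    delta_h = h_ip - h_ip5              # variation of entropy
    return "wired" if delta_h >= dh_thr else "wireless"
```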
4 Evaluation
Performance evaluation is carried out on two different datasets, namely the Intel and Dartmouth traces. In order to evaluate the algorithm's accuracy, results are compared with the ground truth, i.e., we know which IP addresses have been assigned to Ethernet and WLAN hosts from predefined blocks. The Intel traces are collected on the access link of the Cambridge laboratory using a CoMo monitoring system [10]. The traffic contains a mix of connections sourced from 93 different hosts, out of which 27 use the wireless LAN. The Dartmouth traces are publicly available and refer to wide-area wireless measurements taken at different APs on the Dartmouth College campus. We analysed connections from 162 distinct wireless IP addresses. All results from both datasets are summarised in Figure 4(a). Hosts with at least two interarrival times are scattered on the basis of the aggregate entropy (x-axis) and the variation of entropy (y-axis). Vertical lines at 3.5 and 5 bits delimit the interval [H_lower, H_upper], while the horizontal line at 0.5 bits marks the threshold ΔH_THR used to select on the basis of the variation of entropy. Most of the IP addresses map unambiguously into the wireless or wired regions. A dense cloud of wireless hosts can be noticed where the aggregate entropy is above H_upper and the variation of entropy is below ΔH_THR. Most of the wired hosts, on the other hand, are either spread in the region below H_lower or in the overlapping zone where the variation of entropy is higher. However, several hosts fall in the region where the classification is not certain. This is mainly due to the very small number of observations (i.e., few packets belonging to few connections) that we see in the traces from those end systems. Indeed, if we consider just those IP addresses for which we can measure at least 200 interarrival
[Figure: panels (a) Full datasets and (b) Interarr ≥ 200 are scatter plots of the variation of entropy (bits) versus the entropy of aggregated flows (bits) for wired and wireless hosts; panels (c) Accuracy (Dartmouth) and (d) Accuracy (Intel) plot algorithm accuracy (%) versus number of interarrivals for the "Entr + Var Entr", "Minimum Interarr", and "Minimum Entr" methods.]

Fig. 4. Performance on Dartmouth and Intel datasets
times, the accuracy improves significantly. Figure 4(b) shows the scatter plot for those end systems. We have also compared the accuracy of our method with that of alternative (and simpler) approaches. Figures 4(c) and 4(d) compare the accuracy of three methods as a function of the number of observed packets per source. The curve labelled "Entr + Var Entr" refers to our proposal. "Minimum Interarr" refers to an approach that just looks at the minimum interarrival between all packets from the same source. This approach is based on the fact that on the wireless medium two back-to-back packets still need to contend for the medium; therefore, if the minimum interarrival is above 100 μs (i.e., the minimum interarrival of two 802.11a/g packets with no payload), the source is considered a wireless host. Interestingly, the minimum interarrival method performs very well on the Dartmouth dataset (Figure 4(c)) while it performs poorly on the Intel dataset (Figure 4(d)). The reason is that some packets are queued on the wired interface right before the monitoring point, leading to very small interarrival times; by always picking the minimum interarrival, this method therefore makes large errors. The curve "Minimum Entr" computes the minimum entropy among all 5-tuple flows of a given source IP address; an entropy below H_lower indicates that the source is wired. This approach always underperforms our proposed mechanism, even when a large number of samples is considered.
5 Related Works
Recent research has focused on characterizing and studying wireless traffic with the aim of highlighting essential differences from, and analogies with, wireline traffic [1] [2] [11]. The results from these studies informed the design of the proposed detection scheme. The studies closest to the one presented in this paper are [3,4,5]. In the first, the authors propose an access network type classification scheme, in which a host interested in determining the connection type of a remote user asks it to send a known sequence of packets. The interarrival times of the received packets are recorded, and the host then infers the user's connection type based on the median and entropy of the sequence of packet interarrival times. The scheme proves to be highly reliable but, unlike our solution, requires the cooperation of the remote user. The second is also strongly related to our solution: a Bayesian inference algorithm is developed to estimate and classify whether TCP flows have traversed an 802.11 WLAN. The methodology relies on the time intervals between TCP-ACK pairs and infers the connection type of a user based on these observations. Unlike the previous classification scheme, detection is performed through passive measurements, but identification is possible only for clients employing the TCP protocol. The output of this algorithm is the fraction of wireless flows with a certain degree of belief; the error, estimated as the difference between the inferred fraction of wireless flows and the actual one, is bounded within ±0.05. The output of our algorithm, instead, extends this information by providing the identification of each single host within a given traffic trace. The accuracy error, computed as the number of mistaken detections over all considered hosts, reaches 1% and 6% in the two data sets. Moreover, the methodology in [4] makes an inefficient use of traces, because the detection is carried out only using TCP ACK-pairs with an inter-ACK time of 400 μs (due to TCP self-clocking, wireless flows show a low number of such ACK-pairs; often 90% of wireless flows contain fewer than 10 ACK-pairs). In contrast, our algorithm converges faster because it considers inter-packet pairs belonging to both UDP and TCP connections whose time distance is below 10 ms. The TCP-ACK pairs technique is also employed in [5]. In this work, some of the aforementioned limitations, e.g., the efficiency of the trace analysis and the promptness and accuracy of the results, are overcome by using two effective sequential hypothesis tests, with and without training data, respectively. However, detection is still limited to hosts sourcing TCP connections only. Valid references to information-theoretic techniques applied to traffic analysis can be found in [7], [8], [9], where the entropy of packet arrivals is used to detect traffic anomalies in wired networks. The work in [12] discusses the use of information theory and uncertainty to characterize wireless networks. Finally, spectral analysis of traffic flows is applied in [13] for the purpose of detecting and classifying denial of service attacks.
6 Conclusion
We have presented a method to classify wired and wireless hosts based only on passive traffic observations at a remote location. Our method does not require any cooperation from the end systems and is protocol agnostic. Its accuracy is comparable with previously proposed approaches, and it outperforms naïve methods. Our future work focuses on testing the methodology over larger datasets in an attempt to isolate the specific sources of error in our classification.
References
1. Hernandez-Campos, F., et al.: Assessing the real impact of 802.11 WLANs: A large-scale comparison of wired and wireless traffic. In: LANMAN (September 2005)
2. Balachandran, A., et al.: Characterizing user behavior and network performance in a public wireless LAN. ACM PER 30(1) (2002) 195-205
3. Wei, W., et al.: Classification of access network types: LAN, wireless LAN, ADSL, cable or dialup? In: Proceedings of IEEE Infocom (March 2005)
4. Wei, W., et al.: Identifying 802.11 traffic from passive measurements using iterative Bayesian inference. In: Proceedings of IEEE Infocom (April 2006)
5. Wei, W., et al.: Passive online rogue access point detection using sequential hypothesis testing with TCP ACK-pairs. Technical report, University of Massachusetts Computer Science (November 2006)
6. Cover, T., Thomas, J.: Elements of Information Theory. John Wiley (1991)
7. Adya, A., et al.: Architecture and techniques for diagnosing faults in IEEE 802.11 infrastructure networks. In: Proceedings of ACM Mobicom (September 2004)
8. Lakhina, A., et al.: Mining anomalies using traffic feature distributions. In: Proceedings of ACM Sigcomm (August 2005)
9. Xu, K., et al.: Profiling Internet backbone traffic: Behavior models and applications. In: Proceedings of ACM Sigcomm (August 2005)
10. Iannaccone, G.: Fast prototyping of network data mining applications. In: Proc. of PAM (March 2006)
11. Ridoux, J., Nucci, A., Veitch, D.: Seeing the difference in IP traffic: Wireless versus wireline. In: Proceedings of IEEE Infocom (April 2006)
12. Das, S., Rose, C.: Coping with uncertainty in mobile wireless networks. In: PIMRC (September 2004)
13. Hussain, A., Heidemann, J., Papadopoulos, C.: A framework for classifying denial of service attacks. In: IEEE Globecom (December 2004)
A Scheme for Enhancing TCP Fairness and Throughput in IEEE 802.11 WLANs

Eun-Jong Lee¹, Hyung-Taig Lim¹, Seung-Joon Seok², and Chul-Hee Kang¹

¹ Department of Electronics Engineering, Korea University, 5-ga, Anam-dong, Sungbuk-gu, Seoul 136-701, Korea {lej,limht,chkang}@widecomm.korea.ac.kr
² Department of Computer Science and Engineering, Kyungnam University, 449, Wolyong-dong, Masan, 631-701, Korea [email protected]
Abstract. In this paper, we consider two fairness problems that occur in infrastructure networks: fairness between TCP uplink and downlink flows, and fairness between competing TCP uplink flows. A large number of existing works have studied TCP fairness issues and have largely solved the TCP unfairness problems. However, these solutions suffer from TCP throughput degradation even when the unfairness problem is solved. In order to solve these problems effectively, we propose a scheme that modifies the receiver window size on the basis of the maximum window size that maximizes link utilization. We have evaluated our scheme with the ns-2 simulator, and the results demonstrate that it greatly improves both TCP fairness and total throughput.
1 Introduction
Wireless local area networks (WLANs) based on the IEEE 802.11 standard are rapidly being deployed in hot spots such as hotels, libraries, and airports to provide nomadic users with open access to the Internet. WLANs are divided into two types: infrastructure and ad-hoc. The former is a network where all mobile stations communicate only through an access point (AP) in order to reach the wired network, and the latter is a network where mobile stations communicate directly with each other without access to the wired network. The IEEE 802.11 standard provides two channel access functions: the Distributed Coordination Function (DCF) and the Point Coordination Function (PCF). DCF is a contention-based channel access method using carrier sense multiple access with collision avoidance (CSMA/CA). CSMA/CA guarantees equal medium access opportunity to all stations. PCF is a contention-free channel access method employing a polling technique by the Point Coordinator (PC), which determines the station that is going to transmit data. In this paper, our target network is an infrastructure WLAN under DCF, and we assume that the bottleneck between TCP end nodes is only on the wireless link.
Currently, most WLANs support DCF with the CSMA/CA mechanism. However, in an infrastructure network with both mobile senders and receivers, this mechanism leads to significant unfairness between uplink and downlink flows. In this paper, two TCP unfairness problems that occur in infrastructure WLANs are considered. Many existing works have studied TCP fairness issues and have largely solved the TCP unfairness problems; however, these solutions suffer from TCP throughput degradation even when the unfairness problem is solved. In order to solve these problems, we propose a scheme that modifies the advertised receiver window size on the basis of the maximum window size that maximizes channel utilization. Our scheme is deployed only at the AP, and we make no changes to the mobile stations. The rest of the paper is organized as follows. Section 2 describes two TCP unfairness problems in infrastructure WLANs in detail. Section 3 presents related works. Section 4 investigates the source of the TCP throughput degradation in an existing solution. In Section 5, we describe the proposed scheme. In Section 6, we evaluate the proposed scheme based on simulation results. Finally, in Section 7, we present our conclusions.
2 TCP Unfairness in Infrastructure WLANs

In this section, we illustrate two unfairness problems that arise between TCP flows in the infrastructure network.

2.1 Unfairness Between TCP Uplink/Downlink Flows
The scenario we consider is shown in Figure 1, where mobile stations in the WLAN communicate with a fixed server in a high-speed wired network through an AP. The mobile stations consist of one TCP sender and n TCP receivers. The TCP sender forwards TCP data to the AP through the uplink, and the AP forwards TCP data to the mobile receivers through the downlink. All TCP data to be transmitted to the mobile receivers are queued in the downlink buffer of the AP, and all TCP data to be transmitted from the mobile sender are queued in the buffer of each mobile station.
Fig. 1. Network topology
Fig. 2. TCP throughput for multiple flows; (a) 4 downlink flows and 1 uplink flow, (b) 4 downlink flows and 4 uplink flows
Fig. 3. TCP received bytes for 14 uplink flows in IEEE 802.11b WLAN
Since the WLAN under DCF provides equal access opportunity to the medium for all mobile stations, the AP and the TCP sender compete on equal terms for channel access. Hence, the mobile sender gets half of the channel bandwidth, and the remaining half is shared equally by all the mobile receivers through the AP. Therefore, the data sent from the TCP sender has n times more transmission opportunities than the data sent to any one TCP receiver. The more the number of mobile senders increases, the more serious the unfairness between uplink and downlink flows becomes. Since the wired network is assumed to be a Fast Ethernet at 100 Mbps, the downlink access to the IEEE 802.11b channel at 11 Mbps is the bottleneck. Moreover, the AP buffer suffers from a heavy traffic load, since TCP data and ACKs share the AP downlink buffer. Consequently, overflow occurs at the AP, and the TCP senders of downlink flows interpret the packet drops as network congestion and reduce their data sending rates accordingly. However, the uplink flow still reaches the maximum window size while the downlink flows struggle with small window sizes [1]. In order to confirm this problem, a simulation is conducted using the ns-2 simulator. Without loss on the wireless channel, multiple mobile stations are set up separately as senders and receivers communicating with a fixed server. The AP queue limit is set to 100 packets [1]. The propagation delay of the wired network is set to 50 ms. The TCP version is Reno, and the size of TCP/IP packets is fixed at 1040 bytes.
The default receiver window size is set to 60 packets, and we run the simulation for 190 seconds. The simulation results for the received bytes of each receiver of the multiple uplink and downlink flows are presented in Figure 2. As shown in Figure 2(a), it is obvious that whenever there is an uplink flow passing through the AP, the throughput of all downlink TCP flows decreases greatly. Moreover, as shown in Figure 2(b), when the number of uplink flows increases, the unfairness between uplink/downlink flows becomes much more serious.

2.2 Unfairness Between TCP Uplink Flows
The greedy, closed-loop control nature of TCP leads to unfair sharing of the channel bandwidth among TCP flows, even when all mobile stations in the WLAN are senders. Consider the scenario in which all mobile stations operate as TCP senders. When TCP senders transmit data, the wireless stations queue the TCP data to be sent to the destination, and the returning TCP ACK packets are queued at the downlink buffer of the AP to access the wireless channel. For large numbers of stations, the AP downlink buffer is overwhelmed with TCP ACKs, and TCP ACKs may then be dropped by buffer overflow. The dropping of TCP ACKs can disturb congestion window growth and invoke repeated timeouts. TCP flows with a small number of packets in flight are liable to invoke timeouts, since the loss of a small number of data or ACK packets is sufficient to induce a timeout. Hence, TCP ACK losses at the AP buffer can easily occur in a newly started TCP flow [2], and such stations cannot escape from this adverse condition. We confirmed the unfairness between competing uplink flows using the ns-2 simulator. Figure 3 presents the received bytes for each TCP receiver. As shown in Figure 3, some stations experience starvation as a result of repeated timeouts.
3 Related Works
Many existing works have studied TCP fairness issues in cellular networks, infrastructure WLANs, and ad-hoc WLANs. In this section, we focus on the existing solutions related to infrastructure WLANs. These solutions are largely divided into two approaches: increasing the sending rate of the downlink flows, or decreasing the sending rate of the uplink flows. The former approach is illustrated in [2]. The authors ensure fairness between competing TCP uploads and downloads by prioritizing TCP downlink traffic using the 802.11e AIFS, TXOP, and CWmin parameters. The authors also identified the unfairness between competing uplink flows and, for this problem, proposed an ACK prioritization scheme. However, this solution has the weakness that it can be applied only to networks that support the IEEE 802.11e MAC. The latter approach is illustrated in [1], [3], and [4]. The authors in [3] and [4] proposed schemes that suitably drop incoming data at the AP uplink buffer. The solution in [3] is based on rate control mechanisms using the token bucket
filter, and the scheme proposed in [4] ensures uplink/downlink fairness using a virtual queue management scheme named VQ-RED. However, these solutions have the drawback that they waste wireless channel utilization. Finally, the solution proposed in [1] is a TCP receiver window size manipulation scheme. The authors showed that TCP uplink/downlink unfairness is influenced by the AP buffer size and then proposed a scheme that modifies the receiver window size in TCP ACKs to min(rwnd, buffer size / flow number), so that no packet loss occurs in the downlink buffer. This solution is the most efficient among the existing ones, because the TCP sender reduces its data sending rate by itself and no wireless channel utilization is wasted, since no data are lost at the AP. Moreover, this method can be supported irrespective of the IEEE 802.11 MAC type. This explains why our scheme is based on TCP receiver window size manipulation.
4 TCP Throughput Limits in an Existing Solution
Existing solutions suffer from TCP throughput degradation even when the TCP unfairness problem is solved. We investigate the cause of this throughput degradation in the TCP receiver window size manipulation scheme, which is the most efficient of the existing solutions.
Fig. 4. TCP throughput ratio (uplink/downlink) for RTT
Fig. 5. TCP received bytes for RTT
We confirmed the TCP throughput degradation by ns-2 simulation. Figure 4 shows the TCP uplink/downlink throughput ratio versus Round Trip Time (RTT) for IEEE 802.11b when applying the TCP receiver window size manipulation scheme (denoted in the following as "B/n", where B is the AP buffer size and n is the number of TCP flows) and without applying any scheme (denoted in the following as "default"). As shown in Figure 4, the B/n scheme ensures TCP uplink/downlink fairness irrespective of RTT, while the default case suffers from significant unfairness at small RTTs. However, Figure 5 shows that the B/n scheme greatly decreases TCP total throughput as RTT increases. The B/n scheme never reaches the maximum window size due to the limitation on TCP congestion window growth: the TCP sender chooses the minimum of the congestion window and the receiver window, and this solution restricts the receiver window size to B/n. In other words, the TCP throughput is

    TCP throughput = Window size / RTT    (1)
We thus see that the B/n scheme limits the window size to the AP buffer size regardless of any increase or decrease in RTT; thereby, TCP throughput with a constant window size is reduced as RTT increases. In order to solve this problem, we investigate the maximum window size that the link can accommodate without loss. Consider a single bottleneck link with a capacity of μ packets per second and a FIFO buffer of size B packets. For each connection, T denotes the propagation delay plus the service time in the bottleneck, namely the RTT. The maximum window size is then

    Wmax = B + μT    (2)
In this case, the bottleneck buffer is always full of packets and there are μT packets in flight [5]. The scheme that modifies the receiver window size to min(rwnd, B/n) limits the window size to the AP buffer size (B) regardless of the term T. Based on these observations, if the sending rate of all senders is limited on the basis of the maximum window size, not only will TCP fairness be guaranteed, but link utilization will also be maximized.
5 Proposed Solution

In this section, we illustrate the proposed solution, which improves both TCP fairness and total throughput through a simple modification of the TCP receiver window size manipulation scheme.

5.1 Wmax-Based TCP Receiver Window Size Manipulation Scheme
We define the proposed solution as the Wmax-based TCP receiver window manipulation scheme. The scheme uses the receiver window size field in the acknowledgment packet. This field is originally used to match the sending rate of the TCP sender to the processing rate of the TCP receiver. Since the TCP sender chooses the minimum value between the congestion window and the receiver window, we can naturally reduce the TCP sending rate using the receiver window field. Thus, by manipulating the receiver window size at the AP, we can ensure that the TCP sending rate is limited to the value we calculate. A similar approach was used in [1] for TCP uplink/downlink fairness; however, as shown in Section 4, that solution suffers from TCP throughput degradation. Our solution is to modify the receiver window size on the basis of the maximum window size that maximizes the link utilization, instead of the AP buffer size. We modify the receiver window size in all TCP acknowledgment packets passing through the AP not to min(rwnd, B/n), but to min(rwnd, Wmax/n). In our solution, the maximum window size is calculated by the AP when the TCP connection opens. This value is originally derived for a single connection, but we have to consider a network with multiple TCP connections. This is not a problem, since our solution calculates the value for a single connection at the AP and then divides the maximum window size among all TCP connections.

Table 1. IEEE 802.11b Parameter Values

Parameter    Definition                                   Value
T_slot       Slot time                                    20 μs
τ            Propagation delay                            1 μs
T_p          Transmission time of the physical preamble   144 μs
T_PHY        Transmission time of the PHY header          48 μs
T_DIFS       DIFS time                                    50 μs
T_SIFS       SIFS time                                    10 μs
CW_min       Minimum backoff window size                  31
CW_max       Maximum backoff window size                  1024
L_MAC_H      MAC overhead                                 68 bytes
L_MAC_ACK    ACK size                                     38 bytes
L_RTS        RTS size                                     44 bytes
L_CTS        CTS size                                     38 bytes
L_TCP_data   TCP data size                                1040 bytes
L_TCP_ACK    TCP ACK size                                 40 bytes
R_data       Data rate                                    11 Mbps
R_control    Control rate                                 2 Mbps

5.2 Calculating Maximum Window Size
We consider a particular WLAN scenario where a single mobile station communicates with a fixed server. In this case, we refer to the maximum window size given in expression (2), which can be accommodated in steady state on the link. The bottleneck link corresponds to the downlink of the AP in our scenario.
Therefore, B equals the AP buffer size and μ equals the AP service rate, so we can rewrite expression (2) as:

    Wmax = AP buffer size + AP service rate · RTT    (3)

The AP can easily determine the AP buffer size and the RTT (we assume that RTT can be measured using ICMP packets, e.g., ping). We then need to find the AP service rate (μ). Table 1 defines the IEEE 802.11b parameter values used in our simulation, from which we calculate the AP service rate. The AP service rate (μ), the TCP data transmission time (T_TCP_data), and the TCP ACK transmission time (T_TCP_ACK) are as follows [6]:

    μ = 1 / (T_TCP_data + T_TCP_ACK)   (packets/second)    (4)

    T_TCP_data = CW + T_DIFS + T_P + T_PHY + L_RTS/R_control
               + T_SIFS + T_P + T_PHY + L_CTS/R_control
               + T_SIFS + T_P + T_PHY + (L_MAC_H + L_TCP/IP_H + L_TCP_data)/R_data
               + T_SIFS + T_P + T_PHY + L_MAC_ACK/R_control    (5)

    T_TCP_ACK = CW + T_DIFS + T_P + T_PHY + L_RTS/R_control
              + T_SIFS + T_P + T_PHY + L_CTS/R_control
              + T_SIFS + T_P + T_PHY + (L_MAC_H + L_TCP/IP_H)/R_data
              + T_SIFS + T_P + T_PHY + L_MAC_ACK/R_control    (6)

In the above, CW is the average backoff time [7], given by

    CW = (CW_min · T_slot) / 2    (7)

From expressions (5) and (6), the sum of the TCP data and TCP ACK transmission times is 3868 μs. Using expression (4), the AP service rate is then approximately 2.15 Mbps. Hence, we can restate the maximum window size as:

    Wmax = AP buffer size + (2.15 Mbps × RTT)    (8)

We will use this expression for our scheme.
5.3 Measuring TCP Flow Number
In order to implement our solution, we need to measure the number of current TCP flows on the link. This measurement is similar to the method in [1]. In practice, it is very difficult to know the exact number of active flows, especially for flows in an open connection, and determining the direction of the data (uplink flow or downlink flow) is also hard. However, these are not problems for our solution, since we count only the number of active flows regardless of direction and consider a network where only TCP flows are present. In order to count the number of active TCP flows, we monitor the pair of IP addresses and port numbers in the headers of the TCP packets passing through the AP, as sketched below. If a new flow is observed, the variable storing the number of active flows is increased by 1, and if a particular flow is not observed during a regular period, the variable is decreased by 1.
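A minimal sketch of such a flow counter, assuming each observed packet is reduced to a 5-tuple key and a timestamp; the 60-second idle timeout is our placeholder for the paper's unspecified "regular period".

```python
import time

class FlowCounter:
    """Count active TCP flows by the last-seen time of each flow key,
    expiring flows not observed within `timeout` seconds."""
    def __init__(self, timeout=60.0):
        self.timeout = timeout
        self.last_seen = {}

    def observe(self, flow_key, now=None):
        self.last_seen[flow_key] = now if now is not None else time.time()

    def active_flows(self, now=None):
        now = now if now is not None else time.time()
        # drop flows that have been idle longer than the timeout
        self.last_seen = {k: t for k, t in self.last_seen.items()
                          if now - t <= self.timeout}
        return len(self.last_seen)
```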
6 Simulation Results
In this section, we evaluate the performance of the proposed scheme with the ns-2 simulator. The simulation scenario is depicted in Figure 1 and uses the same topology as the previous simulations, with the exception of the number of TCP senders: this scenario assumes a network where the number of TCP senders equals the number of TCP receivers. In order to evaluate the performance, we use two important performance metrics: TCP fairness and total throughput.
Fig. 6. TCP throughput ratio for RTT
Fig. 7. TCP received bytes for RTT
Fig. 8. TCP throughput ratio for number of stations
Fig. 9. TCP received bytes for number of stations
TCP fairness is the ratio of the TCP uplink throughput to the TCP downlink throughput, and the total throughput is the summation of the uplink and downlink throughput. The simulations are run with various parameters such as RTT, number of stations, and error rate. Further, we compare the performance metrics for three reference models: default, B/n, and our solution (denoted in the following as "Wmax/n"). Figures 6 and 7 show the throughput ratio and total throughput for varying RTT when the mobile stations are organized into four TCP senders and four TCP receivers, respectively. As shown in Figures 6 and 7, our solution ensures uplink/downlink fairness similar to that of the B/n scheme, while its total throughput reaches that of the default and amounts to approximately 180% of the throughput of the B/n scheme. Figures 8 and 9 present results for an increasing number of stations. When the number of stations exceeds 10, the ratio increases suddenly, and TCP receivers in the WLAN cannot receive packets because of dropping at the AP. However, our scheme ensures fairness as in the B/n scheme and, as shown in Figure 9, achieves twice the throughput of B/n. Figures 10 and 11 show fairness and total throughput versus the error rate of the wireless channel. When the error rate is 0.2, neither fairness nor throughput differs greatly among the three reference models, since the uplink TCP flows cannot reach the desired window size due to the high error rate; at that error rate, our scheme is thus no different from the other schemes. However, with a proper
Fig. 10. TCP throughput ratio for error rate
Fig. 11. TCP received bytes for error rate
Fig. 12. TCP received bytes for each uplink flow ID
error rate, not only is fairness ensured as in the B/n scheme, but the total throughput also always reaches that of the default. Therefore, the unfairness issues in infrastructure WLANs need to be considered only when the channel error rate is below 0.2. Finally, we confirm the fairness and throughput between competing uplink flows. As Figure 3 in Section 2.2 showed, the unfairness problem in an IEEE 802.11b WLAN without applying any scheme is very serious. However, the B/n and Wmax/n schemes ensure fairness among all mobile stations, as shown in Figure 12. Further, the received bytes of each TCP receiver under the Wmax/n scheme are nearly twice those of the B/n scheme.
7 Conclusion
In this paper, we proposed a new scheme that modifies the receiver window size on the basis of the maximum window size that maximizes link utilization. Simulation results compared the fairness and total throughput with standard IEEE 802.11b and with a previously proposed scheme. We confirmed that our scheme greatly improves the fairness both between uplink and downlink flows and between competing uplink flows, like the previous scheme tagged B/n. In addition, we showed that the total throughput of our proposed scheme reaches that of the default in all cases, and approximately twice that of the B/n scheme. However, our scheme has the disadvantage that it must modify TCP packets at the AP, and this packet modification violates TCP end-to-end semantics. This point must be considered in future research.

Acknowledgments. This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
References
1. Pilosof, S., Ramjee, R., Raz, D., Shavitt, Y., Sinha, P.: Understanding TCP fairness over wireless LAN. In: Proceedings of IEEE INFOCOM 2003, Volume 2 (April 2003) pp. 863-872
2. Leith, D.J., Clifford, P., Malone, D.W., Ng, A.: TCP fairness in 802.11e WLANs. IEEE Communications Letters, Volume 9, Issue 11 (November 2005) pp. 964-966
3. Blefari-Melazzi, N., Detti, A., Ordine, A., Salsano, S.: Controlling TCP fairness in WLAN access networks using a Rate Limiter approach. In: 2nd International Symposium on Wireless Communication Systems (September 2005) pp. 375-379
4. Lin, X., Chang, X., Muppala, J.K.: VQ-RED: An efficient virtual queue management approach to improve fairness in infrastructure WLAN. In: The IEEE Conference on Local Computer Networks (November 2005) pp. 632-638
5. Lakshman, T.V., Madhow, U.: The performance of TCP/IP for networks with high bandwidth-delay products and random loss. IEEE/ACM Transactions on Networking, Volume 5, Issue 3 (June 1997) pp. 336-350
6. Miorandi, D., Kherani, A.A., Altman, E.: A queueing model for HTTP traffic over IEEE 802.11 WLANs. Computer Networks, Volume 50, Issue 1 (January 2006) pp. 63-79
7. Xiao, Y., Rosdahl, J.: Throughput and delay limits of IEEE 802.11. IEEE Communications Letters, Volume 6, Number 8 (August 2002) pp. 355-357
8. Detti, A., Graziosi, E., Minichiello, V., Salsano, S., Sangregorio, V.: TCP fairness issues in IEEE 802.11 based access networks. Submitted paper
TCP NJ+: Packet Loss Differentiated Transmission Mechanism Robust to High BER Environments

Jungrae Kim¹, Jahwan Koo², and Hyunseung Choo¹,*

¹ School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea {witjung,choo}@ece.skku.ac.kr
² Intelligent HCI Convergence Research Center, Sungkyunkwan University, 440-746, Suwon, Korea [email protected]
Abstract. Transmission mechanisms that include an available bandwidth estimation algorithm and a packet loss differentiation scheme generally exhibit higher TCP performance in wireless networks. TCP New Jersey, known as the best existing scheme in terms of goodput, improves wireless TCP performance using available bandwidth estimation at the sender and congestion warning at intermediate routers. Although TCP New Jersey achieves 17% and 85% improvements in goodput over TCP Westwood and TCP Reno, respectively, we further improve TCP New Jersey by exploring improved available bandwidth estimation, retransmission timeout, and recovery mechanisms. Hence, we propose TCP New Jersey PLUS (TCP NJ+ for short), showing that under a 5% packet loss rate, characteristic of high bit-error-rate wireless networks, it outperforms other TCP variants by 19% to 104% in terms of goodput, even when the network is in bi-directional congestion.
1 Introduction
The transmission control protocol (TCP), designed for wired networks, is a connection-oriented transport protocol that provides reliable data communication [1], [2]. However, wireless infrastructures such as cellular networks, wireless LANs, and mobile computing have characteristics such as high bit-error-rate (BER), limited bandwidth, fading, and handoff, which cause severe performance degradation [3]. The main reason is that the congestion control mechanism in TCP cannot distinguish between packet loss caused by wireless link errors and packet loss caused by network congestion, and thus reacts to any loss by reducing its congestion window (cwnd). These inappropriate reductions of the cwnd lead to unnecessary throughput degradation for TCP applications [4].
* Corresponding author.
Over the last decade, a considerable number of studies have been conducted on improving wireless TCP performance alongside the advances in wireless infrastructure technologies. According to how the TCP connection is operated, wireless TCP algorithms can be divided into split and end-to-end approaches [5]. The split approach shields the wireless portion from the traditional network by separating the TCP connection at an intermediate router (or a base station). The intermediate router behaves as a terminal (or a proxy) in both the wired and wireless portions; the two end hosts communicate with the intermediate router independently, without knowledge of the other end. The drawbacks of the split approach are described in [5]. On the other hand, the end-to-end approach, such as TCP New Reno [6], Westwood [7], Jersey [8], and New Jersey [9], treats the route from the sender to the receiver as an end-to-end path, and the sender is acknowledged directly by the receiver. The receiver provides feedback reflecting the network condition, and the sender makes the congestion control decisions. TCP Westwood modifies the additive increase multiplicative decrease (AIMD) mechanism [2], the common congestion window control strategy of wired TCP, and aims to improve TCP performance by effectively adjusting its transmission rate on the basis of an available bandwidth estimation (ABE) algorithm at the sender. TCP Jersey and New Jersey, on the other hand, are based on the integration of a sender-side ABE algorithm and a receiver-side packet loss differentiation scheme, resulting in higher throughput and goodput than any other TCP variants. Even though TCP New Jersey achieves 17% and 85% improvements in goodput over TCP Westwood and TCP Reno, respectively, we further improve TCP New Jersey's performance by exploring improved available bandwidth estimation, retransmission timeout, and recovery mechanisms. Hence, we propose TCP New Jersey PLUS (TCP NJ+ for short), showing that under a 5% packet loss rate, characteristic of high BER wireless networks, it outperforms other TCP variants by 19% to 104% in terms of goodput, regardless of background traffic, when the network is in bi-directional congestion. The rest of the paper is organized as follows. Section 2 reviews related work on existing wired and wireless TCP schemes. Section 3 describes the improved mechanisms of TCP NJ+ in detail. Section 4 presents a performance evaluation via the NS-2 [11] network simulator under various network conditions. The final section offers some concluding remarks.
2 Related Works

2.1 TCP New Reno
TCP New Reno improves on standard TCP Reno's fast recovery [2]. Fast recovery is performed when packet loss is detected by the sender, which then enters the congestion avoidance phase after performing fast retransmit [2]. Multiple packet losses force TCP Reno to invoke a slow recovery of the dropped transmission rate. In TCP New Reno, fast recovery does not terminate until a full ACK is received. If the sender receives a 3-dupack, which is a
partial ACK, it simply retransmits the lost packet and does not terminate fast recovery until all dropped packets are recovered. Hence, the transmission rate is better maintained, because the sending rate is reduced only after all lost packets have been retransmitted. That is, TCP New Reno's fast recovery takes care of multiple packet drops from one cwnd. However, the limitation of TCP New Reno is that, because it cannot distinguish the cause of packet loss, a more effective fast recovery cannot be performed. In addition, reducing the sending rate through the AIMD mechanism degrades throughput in wireless networks, where multiple packet losses commonly occur.

2.2 TCP Westwood
TCP Westwood is a wireless TCP scheme using end-to-end proactive congestion control. TCP Westwood estimates the current network bandwidth at the sender side by exploiting the rate and pattern of the ACKs returning through the reverse link. However, TCP Westwood does not distinguish the cause of packet loss: it always adjusts its transmission rate upon experiencing packet loss. Therefore, its throughput decreases in high BER wireless networks. A further problem is that the accuracy of the estimated available bandwidth depends on the network condition, which changes with the traffic on the links.

2.3 TCP New Jersey
TCP New Jersey improves the available bandwidth estimation algorithm of TCP Jersey. It also adjusts the slow start threshold (ssthresh) based on the current estimate. TCP Jersey and New Jersey consist of two key components: the available bandwidth estimation algorithm, and the congestion warning mechanism at an intermediate router that helps the sender effectively differentiate the cause of packet loss. In New Jersey, the TCP sender estimates the current available bandwidth based on the packet interarrival times at the receiver. TCP New Jersey thus handles reverse links with background traffic; however, it may experience decreased throughput when background traffic on the forward link distorts the available bandwidth estimate. In addition, it cannot effectively increase the sending rate that was reduced due to packet loss according to the cause of the loss, which may include both loss by BER and loss by network congestion. Consequently, TCP New Jersey may degrade performance in wireless networks where packets are consistently lost due to high BER or network congestion.
3 TCP NJ+
In TCP New Jersey, throughput may be reduced depending on the background traffic pattern. In addition, when the sender detects a packet loss or the retransmission timeout (RTO) expires, it may not recover the dropped sending rate effectively according to the cause of the packet loss.
In this paper, we propose TCP NJ+, which improves the available bandwidth estimation algorithm and the recovery mechanism of TCP New Jersey. In the improved available bandwidth algorithm, the sender compares two factors, the interarrival times of ACK packets at the sender and the interarrival times of data segments at the receiver, and selects the maximum estimate, overcoming the dependence of the estimation on the background traffic pattern. TCP NJ+ also guarantees high throughput using an improved recovery mechanism, which increases the reduced cwnd more effectively when the sender detects a packet loss or an RTO expiration.

3.1 Improved Available Bandwidth Estimation
Both TCP Jersey and TCP New Jersey estimate the current available bandwidth based on Eq. (1):

    Rn = (Rn−1 × RTT + Ln) / ((tn − tn−1) + RTT)    (1)

In TCP Jersey, Rn is the estimated bandwidth when ACK packet n arrives at time tn at the sender, tn−1 is the arrival time of the previous ACK packet at the sender, Ln is the size of data packet n, and RTT is the round trip time at time tn. In TCP New Jersey, the data segment arrival time at the receiver is obtained using the timestamps option [10] in the TCP header instead of the ACK packet arrival time at the sender; hence tn and tn−1 in Eq. (1) are the arrival times at the receiver of the nth data packet and of the previous data packet, respectively. Because the available bandwidth estimation in TCP New Jersey is computed from the packet arrival times at the receiver, it remains accurate when the network condition is degraded by background traffic on the reverse link, which carries the ACK packets. On the other hand, it cannot produce an accurate estimate when the network condition is deteriorated by background traffic on the forward link, which carries the data packets. Accordingly, both TCP Jersey and TCP New Jersey suffer from the problem that the available bandwidth estimation depends on the background traffic pattern, which degrades performance. Hence, TCP NJ+ estimates the available bandwidth by comparing two estimates, as shown in Fig. 1. Rsn is the estimated bandwidth when ACK packet n arrives at time tsn at the sender, and tsn−1 is the arrival time of the previous ACK packet at the sender. Rrn is the estimated bandwidth, obtained using the timestamps option, when data segment n arrives at time trn at the receiver, and trn−1 is the arrival time of the previous data segment at the receiver. Ln is the size of data packet n, and RTT is the round trip time. As Fig. 1 shows, TCP NJ+ selects the maximum of Rsn and Rrn to guarantee an appropriate sending rate. In conclusion, TCP NJ+ resolves the dependence of TCP New Jersey's available bandwidth estimation on the background traffic pattern and achieves higher throughput regardless of the direction of the background traffic, as shown in Section 4.
Initialization:
    n ← 1
    R_s0, R_r0, t_s0, t_r0 ← 0
Procedure: ACK packet arrived at the sender
    if (timestamp)
        R_sn ← (RTT × R_s(n−1) + L_n) / ((t_sn − t_s(n−1)) + RTT)   /* ABE based on ACK packet interarrival time */
        R_rn ← (RTT × R_r(n−1) + L_n) / ((t_rn − t_r(n−1)) + RTT)   /* ABE based on data packet interarrival time */
        R_n ← max(R_sn, R_rn)                                       /* maximum of the two estimations */
        n ← n + 1
    end if
Fig. 1. Improved available bandwidth estimation
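For reference, the update rule of Fig. 1 translates into a few lines of Python; the per-ACK inputs (L_n, the current RTT, and the two arrival times) are assumed to be available from the sender's bookkeeping, and the class scaffolding is ours, not the authors' implementation.

```python
class DualABE:
    """TCP NJ+ bandwidth estimator: run Eq. (1) on both the ACK
    interarrival times (sender clock) and the data-segment interarrival
    times (receiver timestamps), and keep the larger estimate."""
    def __init__(self):
        self.r_s = self.r_r = 0.0      # R_s0, R_r0
        self.t_s = self.t_r = None     # previous arrival times

    def on_ack(self, L_n, rtt, t_sn, t_rn):
        if self.t_s is not None:
            self.r_s = (rtt * self.r_s + L_n) / ((t_sn - self.t_s) + rtt)
            self.r_r = (rtt * self.r_r + L_n) / ((t_rn - self.t_r) + rtt)
        self.t_s, self.t_r = t_sn, t_rn
        return max(self.r_s, self.r_r)  # R_n
```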
3.2 Improved RTO Mechanism

When the timeout expires, the TCP sender concludes that the network is congested, reduces the ssthresh to half of the current cwnd, and sets the cwnd to one. In TCP Reno and TCP New Reno, the RTO mechanism operates under the AIMD algorithm [2]. Hence, TCP Reno and TCP New Reno lose throughput in high BER wireless networks, because RTOs, which force the sending rate to drop, occur frequently due to the high probability of packet loss. The optimized window (ownd) in TCP New Jersey is computed by Eq. (2):

    ownd_n = (R_n × RTT) / segment size    (2)

Here R_n is the available bandwidth estimate. If the RTO expires for the nth packet, TCP New Jersey decides whether the packet loss was caused by BER or by network congestion. If the loss was caused by network congestion, the sender sets the cwnd to one and the ssthresh to ownd_n; otherwise, it sets both the cwnd and the ssthresh to ownd_n. When the RTO is caused by BER rather than by network congestion, the network condition is only incidentally poor, so setting the cwnd to one leaves the remaining network bandwidth unused. It is therefore inappropriate to set the cwnd to ownd_n, which is computed from the minimum estimate obtained while the link condition is temporarily degraded; in that case the cwnd may be set to an unexpectedly low value, decreasing throughput. For this reason, when TCP NJ+ experiences an RTO, it distinguishes between network congestion and BER as the cause. For an RTO due to network congestion, TCP NJ+ sets the cwnd to one and the ssthresh to ownd_n, as in TCP New Jersey. For an RTO due to BER, TCP NJ+ sets the ssthresh to ownd_n and the cwnd to the value given by the algorithm in Fig. 2, which employs ownd_{n−1} as well as ownd_n.
if (RTO expired)
    if (Congestion Warning)              /* if RTO due to congestion */
        cwnd = 1;
        ssthresh = owndn;
    else                                 /* if RTO due to BER */
        cwnd = (owndn + owndn-1) / 2;
        ssthresh = owndn;
    end if
end if

Fig. 2. Improved RTO mechanism
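The two pieces of this mechanism, Eq. (2) and the branch in Fig. 2, can be sketched together as follows (Python; the helper names and segment-based units are our own assumptions):

    def ownd(r_n, rtt, segment_size):
        # Eq. (2): optimized window, in segments
        return (r_n * rtt) / segment_size

    def on_rto_expired(congestion_warning, ownd_n, ownd_prev):
        # Improved RTO mechanism of Fig. 2 (sketch).
        # congestion_warning: True if CW attributes the loss to congestion.
        if congestion_warning:                   # RTO due to congestion
            cwnd = 1
        else:                                    # RTO due to BER
            cwnd = (ownd_n + ownd_prev) / 2.0    # average of the two windows
        ssthresh = ownd_n
        return cwnd, ssthresh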
In TCP NJ+, the ownd_{n-1} value is higher than ownd_n because it was computed when the network condition was in a good state. TCP NJ+ thus obtains a higher ownd, the average of ownd_{n-1} and ownd_n, and consequently a higher cwnd than TCP New Jersey, because the average of ownd_{n-1} and ownd_n is always at least ownd_n. Hence, TCP NJ+ shortens the recovery time of the dropped cwnd while still avoiding network congestion, and utilizes the link bandwidth effectively. In addition, because RTO occurs frequently in high BER wireless networks, TCP NJ+ shows higher performance than other wireless TCP schemes.

3.3 Improved Recovery Mechanism
TCP New Jersey differentiates packet loss caused by BER from that caused by network congestion using the CW mechanism [8]. When TCP New Jersey receives a 3-dupack, its error recovery mechanism executes as follows. First, if the packet loss is caused by network congestion, ssthresh is set to ownd_n; then, if cwnd is lower than ssthresh, cwnd keeps its current value, but if cwnd is higher than ssthresh, it is set to ownd_n. Second, if the cause of packet loss is BER, TCP New Jersey maintains the current ssthresh and cwnd. Therefore, there is no way to adjust cwnd when packet loss is caused by BER. The sender normally increases cwnd for every ACK it receives, but when it receives the third duplicate ACK it does not increase cwnd; this means the relative loss of cwnd grows as packet loss increases. TCP NJ+ handles this problem in TCP New Jersey. The improved error recovery mechanism of TCP NJ+ is shown in Fig. 3. When TCP NJ+ receives a 3-dupack, the improved error recovery mechanism proceeds as follows. First, if the packet loss is caused by network congestion, TCP NJ+ operates the same as TCP New Jersey. Second, when packet loss caused by BER occurs, ssthresh is set to ownd_n and cwnd is increased by 1 maximum segment size (MSS). The reason cwnd is increased by 1 MSS is to compensate for the cwnd growth lost at the 3rd duplicate ACK, since cwnd is not increased by the fast recovery algorithm.
if (3 dupack received by sender)
    if (Congestion Warning)              /* if packet loss due to congestion */
        ssthresh = owndn;
        if (ssthresh < cwnd)
            cwnd = owndn;
        end if
    else                                 /* if packet loss due to BER */
        ssthresh = owndn;
        cwnd = cwnd + 1;
    end if
end if

Fig. 3. Improved recovery mechanism
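The same branch structure can be written as a small sketch (Python; names and units are illustrative, with cwnd in segments):

    def on_three_dupack(congestion_warning, cwnd, ownd_n):
        # Improved recovery mechanism of Fig. 3 (sketch).
        ssthresh = ownd_n
        if congestion_warning:        # packet loss due to congestion
            if ssthresh < cwnd:
                cwnd = ownd_n
        else:                         # packet loss due to BER
            cwnd = cwnd + 1           # compensate the 3rd duplicate ACK
        return cwnd, ssthresh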
However, a connection suffering packet loss caused by BER can still utilize the remaining bandwidth, so effectively adjusting cwnd yields the performance improvement. In TCP NJ+, the new recovery mechanism, which achieves a higher cwnd, ensures a remarkable performance improvement compared to other wireless TCP schemes on high BER links, where packet loss occurs more frequently.
4 Simulation Results
We evaluate the goodput and fairness performance of TCP NJ+ using the NS-2 network simulator on two topologies: a simple network topology and a more realistic network topology with background traffic, shown in Fig. 4 and Fig. 6, respectively. The simulation parameters are presented in Table 1 [12].

Table 1. Simulation parameters
    Bandwidth           Wired: 100 Mbps / Wireless: 2 Mbps
    Packet Size         762 bytes
    Propagation Delay   Wired: 10-20 ms / Wireless: 1 ms
    Queue Size          20-200 packets

4.1 Goodput Performance on the Simple Topology
Goodput is the effective amount of data delivered through the network. It is a direct indicator of network performance. We evaluate the goodput of TCP NJ+, New Jersey, Westwood, and Reno on various wireless link error rates using the simple topology shown in Fig. 4.
Fig. 4. Simple network topology (S — 100 Mbps, 10 ms — BS — 2 Mbps, 1 ms — D)
The source (Node S) connects to Node BS via a 100 Mbps wired link with 10 ms propagation delay. Node BS is linked to the destination (Node D) via a 2 Mbps wireless link with 1 ms propagation delay. The queue sizes of the wired and wireless links are set to 150 and 20 packets, respectively. The goodput result is shown in Fig. 5(a).

Fig. 5. TCP NJ+ goodput performance. Each panel plots goodput (Kbyte/sec) against wireless link error rate (% packet loss, 0.1-10%) for NJ+, NJ, WW, and Reno: (a) goodput vs. wireless link error rate; (b) forward link background traffic; (c) reverse link background traffic; (d) bi-directional link background traffic.
TCP NJ+ shows higher goodput performance as the wireless link error rate increases. In particular, at a 5% wireless link error rate, TCP NJ+ outperforms TCP New Jersey by 19% and TCP Westwood by 54%.

4.2 Goodput Performance with Background Traffic
In TCP NJ+, the available bandwidth estimation algorithm guarantees considerable throughput regardless of the background traffic pattern. As illustrated in Fig. 6, we measure the goodput of TCP NJ+, TCP New Jersey, TCP Westwood, and TCP Reno at various wireless link error rates under forward background traffic, where data packets are transferred; reverse background traffic, where ACK packets traverse; and bi-directional background traffic.

Fig. 6. Simulation topology with background traffic pattern

The source (Node S) connects to Node R1 via a 100 Mbps wired link with 10 ms propagation delay. R1 is linked to Node BS via a 100 Mbps wired link with 20 ms propagation delay. The asymmetric wireless link from BS to the destination (Node D) has differing bandwidths on the downlink (2 Mbps) and uplink (1 Mbps), each with 1 ms propagation delay. The cross-traffic flows, from Node C1 to Node C2 (forward direction), from Node C4 to Node C3 (reverse direction), or in both directions, are FTP background traffic over 100 Mbps wired links with 10 ms propagation delay. The queue size of the wired links is set to 200 packets and that of the wireless link to 20 packets. The goodput result under FTP forward background traffic is illustrated in Fig. 5(b): at a 5% wireless link error rate, TCP NJ+ outperforms TCP New Jersey by 30% and TCP Westwood by 55%. The goodput result under FTP reverse background traffic is presented in Fig. 5(c): at a 5% wireless link error rate, TCP NJ+ outperforms New Jersey by 20% and Westwood by 45%. The goodput result under FTP bi-directional background traffic is presented in Fig. 5(d): TCP NJ+ outperforms New Jersey by 31% and Westwood by 56% at a 5% wireless link error rate. The simulation results show that TCP NJ+ achieves higher goodput than New Jersey regardless of the background traffic pattern as the wireless link error rate increases.

4.3 Fairness
Fairness is also an important metric of TCP performance evaluation: it measures how evenly bandwidth is allocated among multiple connections of the same TCP scheme. We use Jain's fairness index proposed in [13] to evaluate the fairness of TCP NJ+, New Jersey, Westwood, and Reno at various link error rates using the topology in Fig. 7. The fairness results are summarized in Table 2. In conclusion, TCP NJ+ maintains good fairness, like the other TCP variants.
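Jain's index for n per-connection throughputs x1, ..., xn is (Σ xi)² / (n · Σ xi²), and equals 1 for a perfectly fair allocation. A one-function sketch (Python; ours, for illustration):

    def jain_fairness(throughputs):
        # Jain's fairness index [13]: (sum x_i)^2 / (n * sum x_i^2)
        n = len(throughputs)
        total = sum(throughputs)
        return (total * total) / (n * sum(x * x for x in throughputs))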
Fig. 7. Simulation topology for fairness (10 sender-receiver pairs S1-S10 and D1-D10 through R1 — 100 Mbps, 45 ms — BS — 10 Mbps, 1 ms — R2)

Table 2. Fairness of TCP schemes vs. link error rate
    Error Rate (%)   NJ+      NJ       WW       Reno
    0.0              0.9999   0.9999   1.0000   1.0000
    0.1              0.9999   0.9999   0.9999   0.9998
    0.5              0.9999   0.9999   0.9999   0.9986
    1.0              0.9999   0.9999   0.9998   0.9989
    5.0              0.9994   0.9994   0.9964   0.9980
    10               0.9903   0.9904   0.9811   0.9875

5 Conclusion
We have proposed TCP NJ+ to improve the performance of TCP New Jersey. Three enhanced mechanisms are proposed in TCP NJ+. First, the improved ABE guarantees higher throughput regardless of the background traffic pattern because it selects the better of two available bandwidth estimates. Second, when an RTO due to BER occurs, the improved RTO mechanism restores the reduced cwnd more quickly. Third, when packet loss caused by BER occurs, the improved recovery mechanism increases the reduced cwnd quickly. Results from the simulations demonstrate that TCP NJ+ improves performance even as wireless link error rates increase. In particular, TCP NJ+ outperforms New Jersey by 19% and Westwood by 54% at a 5% wireless link error rate with no background traffic. Under a 5% wireless link error rate with background traffic, TCP NJ+ outperforms New Jersey by 27% and Westwood by 52%. In addition, fairness is also satisfied. In conclusion, TCP NJ+, with its improved ABE, RTO, and recovery mechanisms, is robust in high BER environments, showing significant performance improvements.

Acknowledgments. This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment), IITA-2006-(C1090-0603-0046).
References
1. Postel, J.: Transmission Control Protocol. RFC 793 (1981)
2. Allman, M., Paxson, V., Stevens, W.: TCP Congestion Control. RFC 2581 (1999)
3. Xylomenos, G., Polyzos, G.C., Mahonen, P., Saaranen, M.: TCP Performance Issues over Wireless Links. IEEE Communications Magazine, Vol. 39 (2001) 52-58
4. Lakshman, T.V., Madhow, U.: The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss. IEEE/ACM Transactions on Networking, Vol. 5 (1997) 336-350
5. Tian, Y., Xu, K., Ansari, N.: TCP in Wireless Environments: Problems and Solutions. IEEE Radio Communications, Vol. 43 (2005) 27-32
6. Floyd, S., Henderson, T.: The NewReno Modification to TCP's Fast Recovery Algorithm. RFC 2582 (1999)
7. Casetti, C., Gerla, M., Mascolo, S., Sanadidi, M.Y., Wang, R.: TCP Westwood: Bandwidth Estimation for Enhanced Transport over Wireless Links. ACM/IEEE MobiCom (2001) 287-297
8. Xu, K., Tian, Y., Ansari, N.: TCP-Jersey for Wireless IP Communications. IEEE Journal on Selected Areas in Communications, Vol. 22 (2004) 747-756
9. Xu, K., Tian, Y., Ansari, N.: Improving TCP Performance in Integrated Wireless Communications Networks. Computer Networks, Vol. 47 (2005) 219-237
10. Jacobson, V., Braden, R., Borman, D.: TCP Extensions for High Performance. RFC 1323 (1992)
11. UCB/LBNL/VINT Network Simulator ns-2. Available online: http://www.isi.edu/nsnam/ns
12. Song, C., Cosman, P.C., Voelker, G.M.: End-to-End Differentiation of Congestion and Wireless Losses. IEEE/ACM Transactions on Networking, Vol. 11 (2003) 703-717
13. Jain, R., Chiu, D., Hawe, W.: A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems. Research Report TR-301 (1984)
TCP WestwoodVT: A Novel Technique for Discriminating the Cause of Packet Loss in Wireless Networks

Jahwan Koo, Sung-Gon Mun, and Hyunseung Choo

School of Information and Communication Engineering, Sungkyunkwan University, Chunchun-dong 300, Jangan-gu, Suwon 440-746, South Korea
[email protected], {msgon,choo}@ece.skku.ac.kr
Abstract. Conventional TCP in wireless environments cannot differentiate packet losses caused by network congestion from those caused by wireless link errors, resulting in severe performance degradation. Accordingly, efficient operation of TCP in wireless networks hinges on differentiating between these causes of packet loss. Toward this end, we propose a novel technique, WestwoodVT (WestwoodNR based on TCP Vegas buffer Thresholds), a sender-based TCP congestion control mechanism that discriminates the cause of packet loss to enhance the performance of TCP in wireless environments. Simulation results show that, under various wireless link error rates, it achieves a maximum of 41% and 118% improvement in goodput over WestwoodNR and TCP Reno, respectively. WestwoodVT requires changes only at the sender side; unlike TCP New Jersey, it requires no changes to the intermediate routers. Therefore, it is cost effective to implement in already deployed networks. Moreover, its fairness and friendliness are also satisfactory. Keywords: Transmission control protocol, wireless networks, congestion control.
1 Introduction
Conventional transmission control protocol (TCP) is a reliable transport protocol that is designed to perform well in wired networks. However, it may suffer from severe performance degradation in wireless networks due to atmospheric conditions, high transmission errors, temporal disconnections, and signal fading [1]. The main reason for the TCP performance degradation over wireless networks is that the conventional TCP (such as TCP Reno [2]) cannot distinguish between packet losses caused by transmission errors and those caused by network congestion, thus, blindly reacting to these losses by reducing its congestion window. Consequently, these inappropriate reductions of the congestion window lead to unnecessary throughput degradation for TCP applications [3].
Corresponding author.
To overcome these limitations, several schemes have been proposed; they are classified in [4] and [5]. Generally, current research falls into two directions: (1) end-to-end TCP modifications, such as congestion control mechanisms, and (2) link layer approaches, which include intermediate router mechanisms. However, end-to-end TCP modifications do not distinguish whether packet loss is due to network congestion or wireless link error, while link layer approaches require more time and cost to deploy in real wired and wireless networks. Therefore, discriminating the cause of packet loss with a modified end-to-end TCP mechanism, without support from any intermediate router mechanism, is a critical issue in wireless environments. In this paper, we propose a novel technique, WestwoodVT (WestwoodNR based on TCP Vegas buffer Thresholds), a sender-based TCP congestion control mechanism that discriminates the cause of packet loss to enhance the performance of TCP in wireless environments. WestwoodVT uses the flow control concept of TCP Vegas [6] to discriminate the cause of packet loss. If the cause is network congestion, WestwoodVT retransmits packets using the available bandwidth estimator of WestwoodNR [7] [8], a very efficient TCP scheme in wired and wireless networks. Meanwhile, if the cause is wireless link error, WestwoodVT ignores the loss and retransmits the packet as if the loss had never occurred. The results from NS-2 simulations demonstrate that, under various wired and wireless link error rates, WestwoodVT achieves a maximum of 41% and 118% improvement in goodput over WestwoodNR and TCP Reno, respectively. Furthermore, WestwoodVT and TCP New Jersey [9] show very similar performance. Since TCP New Jersey relies on an intermediate router mechanism while WestwoodVT requires only a sender-side modification, WestwoodVT outperforms WestwoodNR, TCP Reno, and TCP New Jersey in terms of efficiency, adaptation, deployment, simplicity, and cost in wired and wireless networks.
2 Related Work

2.1 TCP Reno
TCP Reno [2] is a well-known standard TCP scheme. It has four transmission phases: slow start, congestion avoidance, fast recovery, and fast retransmit. At the beginning of a TCP connection, the sender enters the slow start phase, in which the congestion window size (cwnd ) is increased by one for every acknowledgment (ACK) received. When cwnd reaches the slow start threshold (ssthresh), the TCP sender enters the congestion avoidance phase, in which it increases its cwnd by the reciprocal of the current window size every time it receives an ACK. This increases the window by one in each round-trip time (RTT). When packet loss occurs at a congested link due to buffer overflow at the intermediate router, either the sender receives duplicate ACKs (DUPACKs), or the sender’s retransmission timeout (RTO) timer expires. These events activate TCP’s fast retransmit and recovery, by which the sender reduces the size of its cwnd to half and linearly increases cwnd, resulting in a lower transmission rate.
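As a rough sketch of these window rules (Python; ours, with cwnd in segments and the usual idealizations):

    def reno_on_ack(cwnd, ssthresh):
        # slow start: +1 segment per ACK; congestion avoidance: +1 per RTT
        if cwnd < ssthresh:
            return cwnd + 1
        return cwnd + 1.0 / cwnd

    def reno_on_loss(cwnd):
        # fast retransmit/recovery: halve the window
        return max(cwnd / 2.0, 1.0)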
2.2 TCP Vegas
Congestion detection in TCP Reno takes the loss of packets as the signal of congestion. Since TCP Reno has no mechanism to detect congestion before loss occurs, it is reactive rather than proactive. TCP Vegas [6], on the other hand, measures and controls the amount of extra data that a connection carries, where extra data means data that would not have been sent if the bandwidth used by the connection exactly matched the available bandwidth of the link. The goal of TCP Vegas is to maintain the proper amount of extra data in the network. Obviously, if a connection carries too much extra data, it will cause congestion; if it carries too little, it cannot respond fast enough to transient increases in the available bandwidth. TCP Vegas estimates the maximum expected transmission rate and measures the actual transmission rate every time an ACK is received. It sets a variable, diff, calculated as the difference between the expected and actual transmission rates. It also defines two thresholds, α and β, roughly corresponding to having too little and too much extra data in the network. When diff < α, TCP Vegas increases the congestion window linearly during the next RTT; when diff > β, it decreases the congestion window linearly during the next RTT. TCP Vegas leaves the congestion window unchanged when α < diff < β.
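The per-ACK bookkeeping and per-RTT adjustment can be sketched as follows (Python; the function shape and units are our illustration of the idea, not Vegas' exact implementation):

    def vegas_update(cwnd, base_rtt, rtt, alpha, beta):
        # diff estimates the extra data the connection keeps in the network
        expected = cwnd / base_rtt
        actual = cwnd / rtt
        diff = (expected - actual) * base_rtt
        if diff < alpha:
            return cwnd + 1   # too little extra data: increase linearly
        if diff > beta:
            return cwnd - 1   # too much extra data: decrease linearly
        return cwnd           # alpha < diff < beta: leave unchanged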
2.3 WestwoodNR
WestwoodNR [7] [8] is a sender-side modification of TCP New Reno intended to better handle large bandwidth-delay products (BDP) with potential packet loss due to network congestion or wireless link errors, and with dynamic load. It is a rate-based, end-to-end, proactive approach in which the sender estimates the available network bandwidth dynamically by measuring and averaging the rate of returning ACKs. It relies on the returning ACK stream, and this information helps it better set the congestion control parameters ssthresh and cwnd. In WestwoodNR, the transmission rate computed by the available bandwidth estimator is used by the sender to update ssthresh and cwnd upon a loss indication, yielding efficient throughput in wired and wireless networks. However, WestwoodNR does not discriminate the cause of packet loss: it adjusts its retransmission rate every time it experiences packet loss, and therefore its throughput decreases at high wireless link error rates. A further problem is that the accuracy of the estimated available bandwidth depends on network conditions, which change with the traffic on the links.
2.4 TCP New Jersey
TCP New Jersey [9], known as the best existing scheme in terms of throughput and goodput, aims to improve wireless TCP performance using the available bandwidth estimator (ABE) at the sender and the congestion warning (CW) at the intermediate router. ABE estimates the bandwidth available to the TCP connection and guides the sender to properly adjust its transmission rate. CW is
a congestion notification implemented at the intermediate routers that helps the sender effectively distinguish packet losses caused by network congestion from those caused by transmission errors. Based on the bandwidth estimation from ABE and the congestion indication implied by CW, TCP New Jersey calculates the size of the congestion window. The actual transmission window is then set to the minimum of the receiver-advertised window size and the sender-calculated congestion window size. Consequently, the joint approach combining ABE and CW improves TCP performance in high bit-error-rate (BER) wireless networks. However, TCP New Jersey may experience decreasing throughput depending on the background traffic pattern, and it cannot effectively restore the reduced sending rate according to the cause of packet loss. Consequently, if packet loss occurs consistently due to high wireless link error, TCP New Jersey reduces its throughput. In addition, CW is an intermediate router mechanism; deploying TCP New Jersey into real networks is not easy because it requires the addition and modification of both sender-side and intermediate router-side modules. Thus, it entails implementation, deployment, and management complexity.
3 WestwoodVT

3.1 Motivation
WestwoodNR has a more efficient ABE than other TCP schemes, but it has no mechanism to discriminate the cause of packet loss. On the other hand, even though TCP New Jersey is known as the best existing scheme in terms of throughput and goodput, it requires the addition and modification of both sender-side and intermediate router-side modules, and thus entails implementation, deployment, and management complexity. A novel TCP scheme that discriminates the cause of packet loss while introducing minimal additional complexity is needed. Specifically, our goal is to develop a solution that, from an implementation complexity perspective, requires little more than a sender-side modification. As we shall see, the only additions we consider are the discrimination of the cause of packet loss and an effective retransmission mechanism, at the sender side only. Concretely, the discrimination mechanism uses the buffer state of network nodes based on the flow control concept of TCP Vegas; according to the identified cause, the retransmission mechanism uses the ABE of WestwoodNR.
3.2 Discriminating the Cause of Packet Loss
WestwoodVT investigates the buffer state of network nodes between sender and receiver in order to discriminate the cause of packet loss. The discrimination inherits from the concept of TCP Vegas. WestwoodVT estimates the maximum Expected transmission rate and the Actual transmission rate every time an ACK is received. The Expected transmission rate is given by:

    Expected = \frac{WindowSize}{BaseRTT}    (1)

where WindowSize is the size of the current congestion window, equal to the number of bytes in transit, and BaseRTT is the minimum of all measured RTTs. The Actual transmission rate is given by:

    Actual = \frac{WindowSize}{RTT}    (2)

where WindowSize is the same as in Eq. (1) and RTT is the currently measured RTT. Additionally, WestwoodVT defines Δ, calculated as the difference between the Expected and Actual transmission rates. The value of Δ represents the amount of data currently held in the buffers of the network nodes; that is, Δ indicates the state of the current network. It is given by:

    \Delta = \Big(\frac{WindowSize}{BaseRTT} - \frac{WindowSize}{RTT}\Big) \times BaseRTT    (3)

In addition, WestwoodVT defines buffer thresholds α and β, indicating the lower and upper bounds of the network nodes' buffer occupancy. The discrimination of the cause of packet loss in WestwoodVT operates in the congestion avoidance phase: the sender estimates Δ using Eqs. (1), (2), and (3) when it receives a 3-DUPACK, and then compares Δ to the buffer thresholds α and β.
Fig. 1. Discriminating the cause of packet loss in WestwoodVT (buffer occupancy in bytes vs. time in seconds: Δ > β indicates network congestion, α < Δ < β postpones the decision, Δ < α indicates a wireless link error)
Figure 1 presents the main concept of the mechanism, which discriminates the cause of packet loss when the sender receives a 3-DUPACK. If Δ is smaller than threshold α, WestwoodVT assumes that the current buffer state of the network nodes is loose, and decides that the packet loss is due to wireless link error. If Δ is greater than threshold β, it assumes that the current buffer state of the network nodes is tight; at this point, WestwoodVT considers the packet loss as caused by network congestion. If Δ is greater than α and smaller than β, WestwoodVT does not decide the cause of the packet loss, maintains the current state, and retransmits the lost packets. That is, WestwoodVT postpones the decision on the cause until it receives another 3-DUPACK, for more accurate discrimination.
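A direct transcription of this decision rule (Python; a sketch under the paper's definitions, with names of our choosing):

    def classify_loss(window_size, base_rtt, rtt, alpha, beta):
        # Invoked on a 3-DUPACK during congestion avoidance.
        # Eqs. (1)-(3): delta estimates the data queued at network nodes.
        delta = (window_size / base_rtt - window_size / rtt) * base_rtt
        if delta > beta:
            return 'congestion'   # tight buffers: network congestion
        if delta < alpha:
            return 'wireless'     # loose buffers: wireless link error
        return 'postpone'         # in between: defer the decision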
3.3 The Operation of WestwoodVT
WestwoodVT classifies the cause of packet loss using the buffer state of network nodes. When the cause is network congestion, WestwoodVT operates the congestion control mechanism using the ABE of WestwoodNR, a very efficient TCP scheme in wired and wireless networks. When the cause is wireless link error, however, there is no need to invoke the congestion control mechanism because there is no network congestion; in this case, WestwoodVT increases cwnd by one and then retransmits the lost packets as if transmitting standard packets.
/* In Congestion Avoidance */
if (3-DUPACK received)
    if (Δ > β)
        Available bandwidth estimation of WestwoodNR;   /* due to congestion */
    else if (Δ < α)
        cwnd = cwnd + 1;                                /* due to wireless link errors */
    else
        cwnd = cwnd;                                    /* postpone the decision */
    end if
end if
Fig. 2. Pseudo code of WestwoodVT
The reason for increasing the congestion window is that the sender normally increases cwnd by one for every ACK received. When packet loss is caused by wireless link error and the sender receives the third duplicate ACK, it does not increase the congestion window because of the fast recovery mechanism; normally the sender increases the congestion window by one when it receives the first or second duplicate ACK. This results in a loss of congestion window growth. In this situation, WestwoodVT modifies the fast recovery mechanism by increasing the congestion window by one. Figure 2 shows the pseudo code of WestwoodVT.
4 Simulation Results
In this section, WestwoodVT is compared to WestwoodNR, TCP New Jersey, and TCP Reno using the NS-2 simulator in terms of goodput, fairness, and friendliness, the standard metrics of TCP performance evaluation in wired and wireless networks. Table 1 presents the simulation parameters.
Table 1. Simulation parameters
    Packet Size   726 B
    Queue Size    20 MSS
    Bandwidth     wired: 100 Mbps / wireless: 2 Mbps
    Delay         wired: 10-20 ms / wireless: 1 ms

4.1 Goodput Performance
Goodput is the effective amount of data delivered through the network. It is a direct indicator of network performance. The simulation environment is depicted in Fig. 3.
Fig. 3. Simulation topology for obtaining goodput performance (Sender — 100 Mbps, 10 ms — Base Station — 2 Mbps, 1 ms — Receiver)
A single TCP connection running a long-lived FTP application delivers data from sender to receiver under various wireless link error rates: 0.1%, 1%, 2%, 3%, 4%, and 5%, respectively. In Fig. 3, all network nodes maintain a queue size of 20 MSS. In TCP New Jersey, CW decides the cause of packet loss, either network congestion or wireless link error: when the amount of data in the router's buffer exceeds 3/4 of the buffer size, CW attributes any packet loss to network congestion; otherwise, CW decides that the cause is wireless link error. Similarly, the buffer thresholds of WestwoodVT are configured to roughly three quarters of the queue size of the network nodes. We set the lower bound of WestwoodVT, α, to 14: if Δ < 14, the cause is attributed to wireless link error. The upper bound β is fixed at 16: if Δ > 16, the cause is attributed to network congestion. If Δ equals 15, WestwoodVT defers the decision on the cause. Finally, we run the simulation for TCP WestwoodVT, -WestwoodNR, -New Jersey, and -Reno, respectively. Figure 4 demonstrates the simulation results over 200 seconds, averaged over 10 runs. As the results show, WestwoodVT is superior to WestwoodNR and TCP Reno at all wireless link error rates. With a wireless link error rate of 1%, WestwoodVT outperforms WestwoodNR by 3% and TCP Reno by 21%; with a rate of 2%, it outperforms WestwoodNR by 11% and TCP Reno by 54%. Furthermore, WestwoodVT is almost identical to TCP New Jersey until the wireless link error rate reaches 4%. WestwoodVT achieves a maximum of 41% and 118% improvement in goodput over WestwoodNR and TCP Reno, respectively.
Fig. 4. Comparison of goodput of TCP WestwoodVT, -WestwoodNR, -New Jersey, and -Reno
TCP New Jersey outperforms the other TCP schemes beyond a wireless link error rate of 4%. However, realistic wireless link error rates are 1%-2%, where the performance of WestwoodVT is similar to that of TCP New Jersey. Therefore, WestwoodVT, a sender-side scheme, is generally preferable to TCP New Jersey and the other TCP schemes in terms of efficiency, deployment, adaptation, and cost.
4.2 Goodput Performance with Background Traffic
In order to evaluate the performance of TCP WestwoodVT, -WestwoodNR, -New Jersey, and -Reno with background traffic of both forward and backward FTP flows, we consider the more complex topology of Fig. 5.

Fig. 5. Complex simulation topology with background traffic
The queue size of all nodes is set to 20 MSS. A single TCP connection running a long-lived FTP application delivers data from the sender to the receiver under various wireless link error rates: 0.1%, 1%, 2%, 3%, 4%, and 5%, respectively. At the same time, cross-traffic flows are generated: forward traffic flows from node1 to node2 via router1 and the base station, and backward traffic flows from node3 to node4 via the base station and router1, respectively.

Fig. 6. Comparison of goodput of TCP WestwoodVT, -WestwoodNR, -New Jersey, and -Reno with background traffic

Fig. 7. Simulation topology for obtaining fairness index (10 sender-receiver pairs through Router1 — 100 Mbps, 45 ms — Base Station — 10 Mbps, 1 ms — Router2)

Figure 6 depicts the simulation results over 200 seconds, averaged over 10 runs. The buffer thresholds α and β of WestwoodVT are configured to 14 and 16, respectively; these values are based on the queue size of the network nodes. The performance of TCP Reno is the worst. WestwoodVT achieves 6%-24% improvements in goodput over WestwoodNR and outperforms TCP New Jersey at all wireless error rates. In this simulation, TCP New Jersey experiences decreased throughput with background traffic, whereas WestwoodVT maintains its performance regardless of wireless link error rates and background traffic.
4.3 Fairness of WestwoodVT
Another important issue for TCP is fairness: multiple connections of the same TCP scheme must interoperate well and converge to their fair shares. We employ Jain's fairness index [10] to evaluate the fairness of the TCP schemes; a perfectly fair bandwidth allocation results in a fairness index of 1.
Table 2. Fairness comparison
    Wireless Link Error Rate   WestwoodVT   WestwoodNR   TCP New Jersey   TCP Reno
    0.0%                       0.999        0.999        0.999            0.999
    0.1%                       0.999        0.999        0.999            0.999
    0.5%                       0.999        0.999        0.999            0.999
    1.0%                       0.999        0.999        0.999            0.998
    5.0%                       0.992        0.991        0.997            0.994
We set up the simulation environment as shown in Fig. 7, run the simulation for the different TCP schemes, and compare their fairness indices. The results are summarized in Table 2. All TCP schemes, including WestwoodVT, achieve a satisfactory fairness index under various wireless link error rates.
4.4 Friendliness of WestwoodVT
Friendliness is another important property of TCP: a friendly TCP scheme should be able to coexist with other TCP schemes without causing starvation.

Fig. 8. Simulation topology for obtaining friendliness (m TCP Reno pairs and 10−m WestwoodVT pairs through Router1 — 100 Mbps, 45 ms — Base Station — 10 Mbps, 1 ms — Router2)

Fig. 9. Friendliness under 1% wireless link error rate
To verify the friendliness of WestwoodVT, we set up the simulation environment shown in Fig. 8, where WestwoodVT coexists with TCP Reno. There are 10 pairs of TCP connections, of which m are TCP Reno connections and 10−m are WestwoodVT connections. All 10 connections are expected to share the 10 Mbps link equally. We measure the throughput of each connection; the mean throughput of TCP Reno and of WestwoodVT is calculated by summing the throughput of the connections of each scheme and dividing by the number of connections, respectively. The results are presented in Fig. 9. WestwoodVT behaves more aggressively than a non-wireless TCP such as TCP Reno; however, this behavior is anticipated, since WestwoodVT is designed to perform better in lossy wireless environments.
5 Conclusion
In this paper, we have proposed WestwoodVT, which utilizes a sender-based transmission window control mechanism for discriminating the cause of packet loss. It checks the buffer state of the network nodes between sender and receiver using the operating mechanism of TCP Vegas and discriminates the cause of packet loss based on that buffer state. When the packet loss is due to network congestion, WestwoodVT uses the ABE of WestwoodNR for packet retransmissions; otherwise, it retransmits packets without the congestion control mechanism. For the performance evaluation, we set WestwoodVT's buffer thresholds α and β to 14 and 16, respectively. The simulation results demonstrate that, under a realistic 1% wireless link error rate, WestwoodVT achieves 3% and 21% improvements in goodput over WestwoodNR and TCP Reno, respectively. WestwoodVT and TCP New Jersey show very similar performance in wired and wireless networks. In our opinion, WestwoodVT is better than other TCP schemes such as WestwoodNR, TCP New Jersey, and TCP Reno in terms of efficiency, adaptation, deployment, simplicity, and cost in wired and wireless networks.
Acknowledgments This research was supported by the MIC, Korea, under the ITRC support program supervised by the IITA, IITA-2006-(C1090-0603-0046).
References
1. Xylomenos, G., Polyzos, G.C.: TCP Performance Issues over Wireless Links. IEEE Communications Magazine, Vol. 39 (2001) 52-58
2. Allman, M., Paxson, V., Stevens, W.: TCP Congestion Control. RFC 2581 (1999)
3. Lakshman, T.V., Madhow, U.: The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss. IEEE/ACM Transactions on Networking, Vol. 5, No. 3 (1997) 336-350
4. Balakrishnan, H., Padmanabhan, V.N., Seshan, S., Katz, R.H.: A Comparison of Mechanisms for Improving TCP Performance over Wireless Links. IEEE/ACM Transactions on Networking, Vol. 5, No. 6 (1997) 759-769
5. Tian, Y., Xu, K., Ansari, N.: TCP in Wireless Environments: Problems and Solutions. IEEE Radio Communications, Vol. 43 (2005) S27-S32
6. Brakmo, L.S., O'Malley, S.W., Peterson, L.L.: TCP Vegas: New Techniques for Congestion Detection and Avoidance. ACM SIGCOMM Computer Communication Review, Vol. 24 (1994) 24-35
7. Casetti, C., Gerla, M., Mascolo, S., Sanadidi, M.Y., Wang, R.: TCP Westwood: Bandwidth Estimation for Enhanced Transport over Wireless Links. ACM MobiCom (2001) 287-297
8. Casetti, C., Gerla, M., Mascolo, S., Sanadidi, M.Y., Wang, R.: TCP Westwood: End-to-End Congestion Control for Wired/Wireless Networks. Wireless Networks Journal, Vol. 8 (2002) 467-479
9. Xu, K., Tian, Y., Ansari, N.: Improving TCP Performance in Integrated Wireless Communications Networks. Computer Networks, Vol. 47 (2005) 219-237
10. Jain, R., Chiu, D., Hawe, W.: A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems. Research Report TR-301 (1984)
Modeling TCP in a Multi-rate Multi-user CDMA System

Majid Ghaderi¹, Ashwin Sridharan², Hui Zang², Don Towsley¹, and Rene Cruz³

¹ University of Massachusetts Amherst, {mghaderi,towsley}@cs.umass.edu
² Sprint Advanced Technology Labs, {ashwin.sridharan,hui.zang}@sprint.com
³ University of California San Diego, [email protected]
Abstract. Modern CDMA wireless channels support multiple transmission rates, which can be dynamically assigned to users based on traffic demand. However, in practice, assignment of high rate channels comes with the penalty of increased power as well as smaller orthogonal codes, which constrains their assignment to only a subset of active users. This motivates the need to carefully control high rate channel assignments so as to minimize power and achieve fairness among users. In this work, we propose a simple class of channel allocation policies to achieve this goal for TCP sessions. We develop an analytical model that explicitly captures both TCP dynamics and the impact of multiple users contending for a shared resource to evaluate the performance of the allocation policy. The model is shown to be accurate by comparing against ns-2 simulations and its utility demonstrated in computing the minimum number of required high rate channels to minimize contention.
1 Introduction
Modern CDMA networks incorporate powerful mobile processors and digital communication techniques that allow wireless channel schedulers to rapidly allocate (or de-allocate) high data rate channels in response to channel conditions or user traffic demands. Although an eminently desirable feature, in practice, high data rate channels are scarce resources in a CDMA wireless system that
This research is continuing through participation in the International Technology Alliance sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the U.S. Government, the UK Ministry of Defense, or the UK Government. The US and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Corresponding author.
cannot simply be assigned to all users, due to the increased power requirements as well as the reduced codeword set that make such an allocation infeasible. Consequently, the high rate channels must be carefully utilized so as to fairly service users as well as minimize power. While the existing mechanisms in commercial CDMA networks based on buffer occupancy are reasonable for inelastic applications, it is not clear how to select high data rate channels or share them among multiple TCP sessions. In particular, TCP, the dominant transport protocol in the Internet, incorporates adaptive rate mechanisms and is quite sensitive to the available channel rates and channel errors; hence, it can be expected to exhibit complex behavior under dynamic rate allocations by the wireless scheduler. While rate allocation in CDMA networks for elastic applications in general has been studied previously [1, 2], that work does not incorporate TCP dynamics. Our work takes the first step in addressing this issue by modeling a class of simple allocation policies in such a system while explicitly incorporating the TCP dynamics of each session. Recently, we demonstrated in [3] that, for a single TCP session, significant throughput gains are possible if the scheduler selects and allocates channel rates as a function of the TCP transmission rate. In this paper, we extend this framework to include the presence of multiple users along with practical constraints on the assignment of high rate wireless channels. Furthermore, we propose a class of allocation policies for sharing the high rate data channels among concurrent TCP sessions, whose performance is explored both analytically and through simulations. The utility of the model is demonstrated by evaluating the impact of various design features, such as the number of high-rate channels and their data rates, on TCP throughput, which is useful for planning purposes. The rest of this paper is organized as follows. Section 2 describes the high rate contention problem in CDMA wireless systems. Section 3 presents our system model. The TCP-aware allocation policy is presented in Section 4 and analyzed in Section 5. Section 6 validates the accuracy of the model against ns-2 simulations. Our conclusions as well as future work are discussed in Section 7.
2 CDMA Wireless System
In this section, we describe the wireless framework considered in the paper as well as motivate the problem. For ease of exposition, we use the CDMA2000 1xRTT [4] system as a reference example of a practical system although our analysis applies equally to other wireless systems that dynamically adapt their wireless channel rate in response to user traffic. The focus of this work is on mobile sessions in a single cell that are concurrently involved in TCP bulk transfers on the downlink, i.e., from the access network (or Base Station Controller (BSC)) to the mobile device. The throughput experienced by such sessions on the downlink is strongly influenced by the wireless channel scheduler residing at the BSC because it controls the channel rate allocated to each session. For example, CDMA2000 standards [4] specify up to six different channel rates that
can be dynamically assigned to a user by the BSC. The lowest channel rate, called the fundamental channel can be simultaneously assigned to all users and hence defines the capacity in terms of the number of users. The higher channel rates are called supplemental channels and dynamically assigned to only a subset of users at any given instant in order to boost data rates. Given that the scheduler can choose from up to six rates, the obvious question arises as to why the scheduler does not always assign the highest channel rate to all users? Indeed, this issue forms the crux of the resource allocation problem investigated in this paper. As was briefly alluded to in Section 1, allocation of a high rate channel incurs penalties in two ways. First, in order to maintain a reasonably low frame error rate (FER), a high rate channel requires the wireless antenna to transmit at higher power which can introduce excess interference in neighboring cells and shrink the available power budget at the cell itself for transmission to other users. Doing so for all users would further exacerbate the problem. Second, and perhaps more importantly, in practice, higher rate channels in a CDMA system are usually achieved by reducing the orthogonal code length (called Walsh codes). This in turn implies that not all active users can simultaneously transmit at a higher rate, simply because fewer orthogonal codewords are available at that rate, making it inherently infeasible. Furthermore, as a direct corollary of the above argument, the higher the supplemental channel rate, the shorter the code length, and hence smaller the subset of concurrently assignable users. For both these reasons, it is desirable that the wireless channel scheduler judiciously assigns high rate channels such that a) it is utilized only when required so as to minimize power and b) cycled among all users desiring the high rate channel so as to achieve fair resource allocation. Since TCP incorporates an adaptive feedback mechanism that is sensitive to channel errors and available channel rate, its behavior to dynamic rate adaptation by the wireless scheduler can be quite different from that of inelastic applications. This immediately raises the issue of how to allocate supplemental channels to TCP sessions in a CDMA system and its impact on achieved throughput. We address this issue in this paper.
3 System Model
We assume a single cell with N continuously backlogged TCP sessions transferring data over the downlink. Hence, there are N fundamental channels that can be simultaneously used by all users.
1. The cell is assumed to have K high rate supplemental channels, such that at any instant in time up to K ≤ N users can transmit at the higher rate.
2. We assume that the fundamental channel has rate C0 and the supplemental channel a rate of C1, with C0 ≤ C1. At each point in time, the scheduler decides which type of channel is to be assigned to each TCP session. While the fundamental channel is always available, the allocation of high rate channels is arbitrated by the wireless scheduler because K ≤ N.
3. The packet error probability is implicitly assumed to be a function of the assigned channel and is denoted by p0 (p1) when the assigned channel rate is C0 (C1). Note that typically p0 ≤ p1, reflecting the impact of the smaller codewords used to achieve higher channel rates.
4. We assume the presence of power control, primarily to combat fast fading and interference effects. This is true in current systems, where fast closed-loop power control tracks a specified target SINR (or, equivalently, a target FER).
5. We assume no (or a very small) buffer at the base station. Hence, TCP experiences congestion if its sending rate exceeds the maximum allocated channel rate.

We note that modern cellular systems can support up to four or more different supplemental channel rates. However, obtaining succinct analytical expressions for TCP throughput even for three rates is quite difficult. Moreover, our simulations show only a small improvement in system performance when more than two rates are considered [5]. Finally, we emphasize that no specific assumptions have been made regarding how the two channel rates are achieved or how they result in the specific channel error probabilities.
4 TCP-Aware Rate Allocation
We now present our TCP-aware channel allocation scheme. It essentially comprises a simple preemptive allocation discipline with parameter α that utilizes the TCP transmission rate to make preemption decisions. Specifically, we assume that the wireless scheduler can monitor the TCP rate of each session and, based upon it, decide which channel to allocate. The mechanism works in the following fashion:
1. Upon starting, a TCP session i is assigned a low-rate channel with rate C0.
2. Since the TCP rate increases additively (in the absence of loss), when the transmission rate of the session reaches C0, the capacity of the fundamental channel, the scheduler executes the following decision mechanism:
(a) If a supplemental channel C1 is free, i.e., fewer than K supplemental channels are occupied, then the session is always assigned the supplemental channel.
(b) Otherwise, it is allocated a supplemental channel with probability α. Since at most K users can utilize the high rate channel, this means that with probability α a session already utilizing the high-rate channel is preempted to accommodate the requesting session. The session to be preempted is chosen randomly with probability 1/K. We note that the preempted session faces a congestion loss with probability one, since its current transmission rate must have been greater than C0, and immediately halves its sending rate.
(c) If the supplemental channel request is denied, which happens with probability 1 − α, then the session experiences a congestion loss with probability one and immediately halves its sending rate, in this case to C0/2.
3. Finally, if a session occupying the supplemental channel drops its sending rate below the supplemental channel rate, either due to congestion or channel errors, its supplemental channel is de-allocated and it is assigned a fundamental channel.¹ This makes sense, since it frees up resources that would not be utilized by the TCP session.

The preemption probability α is a system parameter that needs to be optimized in order to maximize TCP throughput. The model we present in the next section explicitly captures the impact of α on TCP throughput and hence can be utilized to find the optimal value of α.
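The allocation decision of steps 2(a)-(c) can be sketched as follows (Python; the function shape and return convention are our own illustration):

    import random

    def request_supplemental(occupied, K, alpha):
        # Scheduler decision when a session's TCP rate reaches C0.
        # Returns (granted, index_of_preempted_session_or_None).
        if occupied < K:
            return True, None               # a free supplemental channel exists
        if random.random() < alpha:         # preempt with probability alpha
            victim = random.randrange(K)    # victim chosen uniformly, prob 1/K
            return True, victim             # victim halves its sending rate
        return False, None                  # denied: requester drops to C0/2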
5 TCP Throughput Model
In the previous section, we proposed a simple scheme for TCP-aware channel allocation. We now present an analytical model to compute the throughput in such a system. Rather than list all the details of the analysis, due to lack of space we highlight the salient features of the system as well as our approach to modeling it. The reader is referred to the technical report [6] for further information.

From the perspective of a TCP session, such a multi-rate system results in the session being in either of two distinct states, corresponding to whether it is allocated a fundamental or a supplemental channel. Each type of channel has a different channel capacity, round trip time, and packet error probability: (C0, R0, p0) and (C1, R1, p1), respectively. Given the operation of the proposed scheduler, the state of a session is decided by two factors: a) its own sending rate, or instantaneous throughput, and b) the supplemental channel occupancy, both of which the scheduler utilizes in channel allocation. The latter is critical since it reflects the multi-user resource sharing aspect of the system. Specifically, whether there are K or fewer supplemental channel allocations influences both the probability that the session receives a supplemental channel and the rate at which it itself is preempted from a supplemental channel. The supplemental channel occupancy in turn depends on the rate at which sessions in the fundamental channel state request the supplemental channel, which is a function of the time a session spends in each state (fundamental or supplemental). This essentially closes the loop, in the sense that the state of a TCP session is a function of the supplemental channel occupancy, which itself depends on the state of each session.

A natural approach to analyzing such a system with coupled variables is the fixed-point method, which in this case works as follows. We first derive the instantaneous throughput distribution equations for a TCP session given the supplemental channel occupancy statistics. Note that there is a one-to-one relation between the instantaneous TCP throughput and the session state. Next, we utilize the newly computed throughput equations to update the supplemental channel occupancy statistics. These iterations are repeated until a desired convergence criterion is met. Both components are described in the next two sub-sections.
¹ This implies that C1 ≤ 2C0, which is true in practical systems. In practice, most wireless rates follow the relation C1 = 2C0.
5.1 TCP Session Throughput
This sub-section computes the throughput of a single TCP session given the supplemental channel occupancy statistics. In order to capture the evolution of TCP dynamics in the dynamic two-rate environment, we utilize the model developed in [3] as a starting point and incorporate preemption from a supplemental channel as well as denial of the supplemental channel. The reader is referred to the technical report [6] for further information. We first state some assumptions regarding TCP dynamics as well as our notation:
1. Let i = 0 (1) denote that the session is allocated a fundamental (supplemental) channel.
2. All TCP sessions are assumed to be homogeneous in the sense that they have the same propagation delay a. Hence, their round trip time in state i is given by R_i = a + L/C_i, where L is the packet size. We note that the assumption of homogeneity is made in this work only for purposes of illustration of the model. As discussed in Section 5.2, our model can readily tackle heterogeneity as well.
3. In congestion avoidance mode, the TCP sending rate of a session, when in state i, grows at a linear rate of L/R_i^2 bits/sec^2 in the absence of loss.
4. We assume that the channel losses can be modeled by an inhomogeneous Poisson process with rate p_i X(t), i = 0, 1 at time t, where p_i is the packet loss probability.
5. Let f_i(x, t) denote the density function of the instantaneous rate X(t) in state i. Then f(x, t) = f_0(x, t) + f_1(x, t), where f(x, t) denotes the density function of X(t).
6. Let q denote the probability that the session is allowed to transition from i = 0 to i = 1, i.e., its request for a supplemental channel is honored. Let ν denote the rate at which a session is preempted from a supplemental channel, i.e., transitions from i = 1 to i = 0. We note that these quantities are computed based on the supplemental channel occupancy.

We further define the terms \phi_{ij} = \frac{p_i R_j^2}{L^2} and \delta_i = \frac{L}{R_i^2} for i = 0, 1, and g = \frac{R_0}{R_1}. Using techniques from fluid analysis we can write differential equations for the evolution of the steady-state distribution f(x) of the instantaneous throughput X(t). Then, by applying Mellin transforms [7, 8], defined as

    \hat{f}_i(u) = \int_0^\infty f_i(x)\, x^{u-1}\, dx,    (1)

we can solve the differential equations (see [6]) to obtain the steady-state probability \hat{f}_i(1) and throughput \hat{f}_i(2) in each state i. We may then solve for the mean TCP throughput, given by:

    \bar{X} = \hat{f}_0(2) + \hat{f}_1(2) = \frac{\sum_{k \ge 0} (\phi_{00})^k\, \Pi_k(2)\, \psi(2+2k) + \hat{\Delta}(2)}{\sum_{k \ge 0} (\phi_{00})^k\, \Pi_k(1)\, \psi(1+2k) + \hat{\Delta}(1)}    (2)
where

    \psi(u) = g^u - \frac{g^2}{q}(C_0)^u\, \Delta(gC_0) - \frac{g^2}{2^u}\,\frac{1-q}{q}\,(C_1)^u\, \Delta(C_1) - \nu\,\frac{(2g)^u}{\delta_0}\Big(\hat{\Delta}(u+1) - \frac{1}{u}\hat{\Delta}(u)\Big) - \phi_{10}\,\frac{(2g)^u}{\delta_1}\Big(\hat{\Delta}(u+2) - \frac{1}{u}\hat{\Delta}(u+1)\Big) - \phi_{11}\, u\, \hat{\Delta}(u),

and

    \Delta(x) = e^{-\frac{\phi_{11}}{2}x^2 - \frac{\nu}{\delta_1}x}, \qquad \hat{\Delta}(u) = \int_{gC_0}^{C_1} \Delta(x)\, x^{u-1}\, dx.
5.2 Supplemental Channel Occupancy
In the previous sub-section, we computed the TCP throughput of a single session given the supplemental channel assignment probability q and preemption rate ν. Our goal in this sub-section is to show how these values may be computed.

Fig. 1. Number of users in high-rate region (birth-death Markov chain on states 0, ..., K with birth rates (N − i)λ and death rates iμ)
We model the supplemental channel occupancy process as a Markov chain whose state i is the number of concurrently active supplemental channels; clearly, 0 ≤ i ≤ K. Now, let λ and μ denote, respectively, the rates at which a single user requests and releases a supplemental channel, conditioned upon one being available. We assume that both these processes are exponentially distributed, which is a reasonable assumption for large N and K. The transition rates for the Markov chain are then as shown in Fig. 1. Denoting the stationary distribution of the Markov chain by π, it is easily seen that

    \pi_i = \binom{N}{i}\Big(\frac{\lambda}{\mu}\Big)^i \pi_0,    (3)

where \sum_{i=0}^{K} \pi_i = 1. It now remains to compute λ and μ. Since λ is the rate at which a session requests a high-rate channel, it is essentially equal to the rate at which the TCP rate crosses the low channel rate boundary X(t) = C0, conditioned on a
supplemental channel being available. This is simply equal to λΔt, Δt → 0 and can be computed as follows:
λΔt = P (C0 − δ0 Δt ≤ X(t) ≤ C0 ) ∧ (no loss in [t, t + Δt]) X(t) ≤ C0 P {(C0 − δ0 Δt ≤ X(t) ≤ C0 ) ∧ (no loss in [t, t + Δt])} P {X(t) ≤ C0 } (1 − p0 C0 Δt)f0 (C0 − δ0 Δt, t)(δ0 Δt) = f0 (1) =
(4)
In the limit, as Δt → 0 and t → ∞, we obtain λ=
δ1 f1 ((gC0 )+ ) . q f0 (1)
(5)
Please refer to [6] for further details. To compute μ, we note that from the Mellin transform analysis in the previous sub-section we readily have expressions for \hat{f}_0(1) and \hat{f}_1(1), the steady-state probabilities of being in the low-rate and high-rate regions, respectively. It is then straightforward to see that

μ = λ \hat{f}_0(1) / \hat{f}_1(1).

The relevant supplemental channel occupancy statistics can now be computed in a straightforward fashion as follows:

1. The probability q that a session in state i = 0 requesting the supplemental channel will be assigned one is given by:

q = (1 − π_K^0) + α π_K^0 = 1 − (1 − α) π_K^0 ,   (6)

where α is the scheduler parameter and π_K^0 is the conditional probability that all K supplemental channels are occupied given that a user is about to request a supplemental channel. In [6], we have shown that

\pi_i^0 = \frac{(N − i) \pi_i}{\sum_{j=0}^{K} (N − j) \pi_j} .   (7)

2. The rate at which a user in state i = 1 is preempted from the supplemental channel is given by:

\nu = \frac{\alpha}{K} \lambda (N − K) \pi_K^1 ,   (8)

where π_K^1 is the conditional probability that all K supplemental channels are occupied given that a user has a supplemental channel, and is given by [6]

\pi_i^1 = \frac{i \pi_i}{\sum_{j=0}^{K} j \pi_j} .   (9)
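The occupancy statistics (3) and (6)-(9) are simple enough to compute directly. The sketch below does so in Python; the input values N, K, λ, μ and α in the example call are hypothetical.

```python
from math import comb

def stationary_dist(N, K, lam, mu):
    # pi_i proportional to C(N, i) (lam/mu)^i, eq. (3), normalized over 0..K
    w = [comb(N, i) * (lam / mu) ** i for i in range(K + 1)]
    z = sum(w)
    return [wi / z for wi in w]

def grant_and_preemption(N, K, lam, mu, alpha):
    pi = stationary_dist(N, K, lam, mu)
    # pi0_K: all K channels busy, conditioned on a user about to request, eq. (7)
    pi0_K = (N - K) * pi[K] / sum((N - j) * pi[j] for j in range(K + 1))
    # pi1_K: all K channels busy, conditioned on a user holding one, eq. (9)
    pi1_K = K * pi[K] / sum(j * pi[j] for j in range(K + 1))
    q = 1.0 - (1.0 - alpha) * pi0_K            # eq. (6)
    nu = (alpha / K) * lam * (N - K) * pi1_K   # eq. (8)
    return q, nu

q, nu = grant_and_preemption(N=10, K=5, lam=0.2, mu=0.4, alpha=0.5)
print(q, nu)
```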
Finally, using a fixed-point iteration we can compute the mean TCP throughput in multi-user systems as follows. We start with (q = 1, ν = 0) and compute (X, f_1(gC_0), \hat{f}_0(1), \hat{f}_1(1)). Using these values, we then compute (λ, π_K) from (5) and (3). By substituting into (6) and (8), we compute new values for (q, ν). The iteration continues until the computed TCP throughput X converges to within some desired precision. In passing, we emphasize that the model can easily accommodate heterogeneous TCP sessions. Specifically, sessions can be classified into (say) L classes of users based on similarity of propagation delay, error characteristics, etc., and the TCP throughput of each class can be computed following the approach in Section 5.1. The high-rate channel occupancy process is then readily seen to be an L-dimensional birth-death Markov chain with transition rates λ_l, μ_l. It is well known that such a chain has a product-form stationary distribution, which can be easily solved to obtain the occupancy probabilities.
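A minimal sketch of this fixed-point loop follows, reusing grant_and_preemption() from the previous sketch. tcp_throughput is a hypothetical placeholder standing in for the Mellin-transform solution of Section 5.1: given (q, ν) it must return the mean throughput X together with f_1((gC_0)^+), \hat{f}_0(1) and \hat{f}_1(1).

```python
def solve_system(N, K, alpha, delta1, tcp_throughput, tol=1e-6, max_iter=100):
    q, nu = 1.0, 0.0                        # start from an uncontended system
    X_prev = None
    for _ in range(max_iter):
        X, f1_gC0, f0_1, f1_1 = tcp_throughput(q, nu)
        lam = delta1 * f1_gC0 / (q * f0_1)  # request rate, eq. (5)
        mu = lam * f0_1 / f1_1              # release rate
        q, nu = grant_and_preemption(N, K, lam, mu, alpha)  # eqs. (6) and (8)
        if X_prev is not None and abs(X - X_prev) < tol:
            break                           # throughput has converged
        X_prev = X
    return X, q, nu
```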
6 Numerical Results
This section is devoted to evaluating the accuracy of our model by comparing against ns-2 simulations, as well as demonstrating its utility in design issues. We consider a CDMA downlink scenario with a fundamental channel rate of C_0 = 76.8 Kbps and a supplemental channel rate of C_1 = 102.4 Kbps, resulting in packet error probabilities of p_0 = 7.7 × 10^-4 and p_1 = 7.9 × 10^-4, respectively (the rate values were chosen to reflect those of a CDMA2000 1xRTT system; the packet error probabilities were computed based on a BER of 10^-5 and rate-1/5 error coding; the reader is referred to [3] for more details). This scenario reflects the fact that higher rate channels typically incur higher frame error rates.

We begin by determining the accuracy of the model. Fig. 2 plots the per-user TCP throughput as a function of α for N = 10 users with K = 2, 5 and 8 supplemental channels, for both the model and simulation. We have also plotted 95% confidence intervals, where each simulation point is averaged over 20 runs with different seeds. We note that the model closely tracks the simulation results for all values of K. More importantly, both the model and simulations indicate the same throughput behavior as a function of α. Specifically, for K = 2, there is an α* ∈ (0, 1) that maximizes the TCP throughput, while larger values of K result in the optimum α* = 1. However, as K increases, the throughput becomes less sensitive to α, e.g., for K = 8 in Fig. 2.

6.1 Impact of Number of Supplemental Channels
Next, we study the impact of the number of supplemental channels K on user throughput. As shown previously, the model gives reasonable estimates, and hence we use analytical results from the model only. Fig. 3 plots the TCP throughput as a function of K for N = 5, 10 and 20 users. For each K, the maximum throughput, achieved at the optimal α*, was chosen.
Fig. 2. Throughput as a function of α for N = 10. (Per-user throughput in Kbps vs. preemption probability α, for K = 2, 5 and 8; simulation and analysis curves.)

Fig. 3. Maximum throughput as a function of K. (Throughput in Kbps vs. number of supplemental channels K, for N = 5, 10 and 20.)
The curves in Fig. 3 indicate that the throughput gain by adding additional supplemental channels follows a law of diminishing returns, i.e., beyond a certain K, addition of extra supplemental channels does not yield significant gains. For example, for N = 20, allocation of only K = 10 supplemental channels achieves nearly the same throughput as that with a dedicated high rate channel per user, i.e., K = 20. This stands to reason, since TCP increases its rate only additively. Thus not all users require high rate channels simultaneously and the resulting
statistical multiplexing gain can be exploited to utilize fewer than N high rate channels while achieving almost the same effect as dedicating a supplemental channel to each user.

Fig. 4. Maximum preemption rate (ν) as a function of K. (Preemption rate vs. number of supplemental channels K, for N = 5, 10 and 20.)
Importantly, the preemption rate ν computed by the model from (8), which is plotted in Fig. 4 as a function of K at α = 1, captures this phenomenon. Specifically, beyond a certain K (as a function of N), the contention rate goes to zero, which indicates that additional supplemental channels have very little impact. This is an important feature, since it allows for computation of the minimum number of supplemental channels to allocate in a cell as a function of the number of active TCP sessions, preventing resource wastage.
7 Conclusion
In this work, we addressed the issue of allocating CDMA wireless channels to concurrently active TCP sessions. Due to inherent system constraints, high data rate channels cannot be assigned simultaneously to all sessions. This creates an interesting resource contention problem. We proposed a simple channel allocation scheduler for this problem in the context of TCP sessions and constructed an analytical model to study its performance. Through comparison with simulations, the model was shown to provide reasonable estimates of TCP throughput and to characterize interesting design parameters, such as the minimum required number of high data rate channels for a set of TCP sessions. There are many possible future directions for our current work. Some of the avenues we are exploring include the incorporation of more sophisticated schedulers, as well as utilizing the model to address resource planning
issues such as the selection of both the number and rates of supplemental channels, and the impact of heterogeneous users.
References
1. Fodor, G., Telek, M., Badia, L.: On the Tradeoff Between Blocking and Dropping Probabilities in CDMA Networks Supporting Elastic Services. In: Proc. IFIP NETWORKING, Coimbra, Portugal (May 2006)
2. Altman, E.: Capacity of a Multi-Service Cellular Network with Transmission Rate Control: A Queueing Analysis. In: Proc. ACM MOBICOM, Atlanta, GA (September 2002)
3. Ghaderi, M., Sridharan, A., Zang, H., Towsley, D., Cruz, R.: TCP-Aware Resource Allocation in CDMA Networks. In: Proc. ACM MOBICOM, Los Angeles (September 2006)
4. Telecommunications Industry Association: TIA/EIA IS-2000. Available at www.tiaonline.org/standards/sfg/imt2k/cdma2000/ (March 2000)
5. Ghaderi, M., Sridharan, A., Zang, H., Towsley, D., Cruz, R.: Modeling and Optimization of TCP-Aware Cross-Layer Resource Allocation in CDMA Networks. Technical Report RR06-ATL-030566, Sprint Advanced Technology Labs (March 2006)
6. Ghaderi, M., Sridharan, A., Zang, H., Towsley, D., Cruz, R.: Performance Modelling of TCP in a Multi-Rate Multi-User CDMA System. Technical Report RR06-ATL-120729, Sprint ATL (December 2006). Available at http://research.sprint.com/publications/uploads/RR06-ATL-120729.pdf
7. Baccelli, F., Cruz, R., Nucci, A.: CDMA Channel Parameters Maximizing TCP Throughput. In: Proc. Workshop on Information Theory and its Applications, La Jolla, CA, USA (February 2006)
8. Baccelli, F., McDonald, D.: Mellin Transforms for TCP Throughput with Applications to Cross Layer Optimization. In: Proc. CISS, Princeton (2006)
IEEE 802.11b Cooperative Protocols: A Performance Study Niraj Agarwal, Divya ChanneGowda, Lakshmi Narasimhan Kannan, Marco Tacca, and Andrea Fumagalli The OpNeAR Laboratory Erik Jonsson School of Engineering and Computer Science The University of Texas at Dallas {nxa041000,dhc042000,lnk051000,mtacca,andreaf}@utdallas.edu
Abstract. This paper investigates the use of cooperative communications in the context of IEEE 802.11b to combat radio signal degradation. The performance gain of both an existing cooperative protocol and the one proposed in the paper is discussed. It is quantitatively shown how much the two cooperative protocols increase throughput, lower delivery latency, and extend transmission span, when compared to the conventional IEEE 802.11b protocol. These features may help improve connectivity and network performance in ad hoc applications.
1 Introduction
WLANs (wireless local area networks) have experienced tremendous growth and have become the prevailing technology for providing wireless access to data users. The family of IEEE 802.11 protocols is perhaps the most widely adopted solution [10]. It must be noted that wireless links do not have well-defined coverage areas. Propagation and channel characteristics are dynamic and unpredictable. Small changes in a node's position or direction of mobility may result in significant differences in signal strength. Adaptation to such conditions is a key issue in today's and future wireless communications. One of the characteristics of the radio medium is its inherent broadcast nature. Besides the intended destination, a signal transmitted by a source may be received by other neighboring nodes that are within earshot. This broadcast nature of the radio medium can be used to improve system throughput by having a node other than the source and the destination actively help deliver the data frame correctly. The cooperating node is referred to as the relay. The essence of the idea is that the destination benefits from data frames arriving via two statistically independent paths, i.e., spatial diversity. The advantages of cooperative communications include the ability to increase the radio channel capacity [6,7,14] and to reduce the latency of automatic retransmission request protocols [8,9,15]. An IEEE 802.11b cooperative protocol was
This research is supported in part by NSF Grants No. ECS-0225528 and CNS-0435429.
introduced to improve both the throughput and latency of the medium access control (MAC) [3]. Data frames transmitted by the source are received by the relay, which in turn forwards them to the destination. The destination acknowledges the received data frame directly to the source. Other protocols which exploit the broadcast nature of the wireless medium to achieve potential gains have been proposed in [12,13]. In [13], the source attempts to transmit the data to the destination directly, and when the direct transmission fails, partner nodes help retransmit the same frame after a backoff process. In [12], the proposed protocol (ExOR) routes a packet from the source to the destination with the help of intermediate nodes in an opportunistic way, in contrast to traditional routing.

In this paper, cooperative communications in the context of IEEE 802.11b is further investigated. With the studied protocol, attempts to receive the data frame transmitted by the source are made simultaneously at both the relay and the destination. Only when the destination is not successful in the reception attempt does the relay re-send the data frame. The advantage of this approach is that it limits the relay's intervention to those cases when the source's transmission attempt does not reach the destination. As discussed in the paper, cooperative MAC protocols help cope with radio signal degradation. They provide higher throughput and lower latency when compared to the conventional IEEE 802.11b protocol. For a given throughput target, they achieve a maximum transmission span between the source and the destination that is up to 50% greater than that of the conventional IEEE 802.11b protocol. These features combined may help achieve improved connectivity and performance.
2 The Proposed Cooperative Protocol
This section describes the cooperative protocol proposed in the paper to enhance the performance of IEEE 802.11b. For simplicity, the protocol is described ignoring some control frames, e.g., the request to send (RTS) and clear to send (CTS). The extension of the protocol description to include these additional control frames is straightforward. Assume that three nodes have agreed to cooperate, i.e., source S, destination D, and relay R (the protocol required to reach a consensus among the three nodes willing to cooperate is beyond the scope of this paper; routing protocols available in the literature can be extended and adapted to perform relay selection [11]). The proposed cooperative MAC protocol is based on the distributed coordination function (DCF) defined for the ad hoc mode of the IEEE 802.11b standard. As shown in Fig. 1, when transmitting a data frame, S makes a direct attempt to reach D. While the transmission takes place, R receives and stores a copy of the data frame temporarily. Four cases are possible (in all four cases it is assumed that the acknowledgment is always received correctly by S; the extension to account for acknowledgment loss is straightforward). The time diagrams of the transmitted frames are shown in Figs. 2-5, respectively.
Fig. 1. Cooperation of three nodes

Fig. 2. Case 1: successful delivery of data and acknowledgement frames

Fig. 3. Case 2: cooperation by R in retransmitting the data frame

Fig. 4. Case 3: both S and R are unsuccessful
1. Fig. 2: the frame transmitted by S is successfully received at D. D responds with a positive acknowledgment (ACK).
2. Fig. 3: the frame transmitted by S is successfully received at R, but not at D. D does not acknowledge the data frame. Not receiving the ACK from D, R assumes that S's attempt to reach D has failed, and proceeds with the transmission of its copy of the data frame. The frame transmitted by R is successfully received at D, and D responds to S with a positive ACK.
3. Fig. 4: same as case 2, but D does not receive the frame transmitted by R.
4. Fig. 5: the frame transmitted by S is received successfully neither at R nor at D.

A toy classification of these four outcomes is sketched after Fig. 5 below.
Fig. 5. Case 4: both D and R do not receive the data frame
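As announced above, the toy classifier below maps per-attempt reception outcomes onto the four case numbers; the boolean inputs are our own abstraction for illustration, not protocol fields.

```python
def classify(dest_got_S, relay_got_S, dest_got_R=False):
    """Return the case number (1-4) for one cooperative exchange."""
    if dest_got_S:
        return 1   # Case 1: D ACKs the frame from S directly
    if relay_got_S and dest_got_R:
        return 2   # Case 2: R's retransmission after RIFS reaches D
    if relay_got_S:
        return 3   # Case 3: both S's and R's attempts fail at D
    return 4       # Case 4: neither D nor R received the frame; S backs off

assert classify(True, True) == 1
assert classify(False, True, True) == 2
assert classify(False, True, False) == 3
assert classify(False, False) == 4
```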
For the cooperation protocol to work as described, the time intervals between transmission attempts must be chosen carefully. Specifically, for the transmission of a data frame, S must sense the channel idle and wait for a time interval denoted as the distributed inter-frame space (DIFS); an exception to this rule is when multiple frames containing the fragments of the same packet are sequentially transmitted by the same sender. For ACK transmission, D does not need to wait. The ACK is then received at S and R no later than a time interval denoted as the short inter-frame space (SIFS). SIFS takes into account various latency factors, e.g., MAC software, transceiver hardware, and radio signal propagation. Both DIFS and SIFS are defined in IEEE 802.11b. For the transmission of the data frame copy, R must wait a time interval denoted as the relay inter-frame space (RIFS). RIFS is specifically introduced as a component of the cooperative protocol and is not defined in IEEE 802.11b. RIFS must be chosen both to allow the detection at R of the ACK transmitted by D (RIFS > SIFS), and to prevent frame transmissions by other nodes while the cooperation is taking place (RIFS < DIFS). A possible value for RIFS is the point (coordination function) inter-frame space (PIFS). PIFS is defined in IEEE 802.11b to allow the point coordination function to have collision-free access to the channel for coordinating data frame transmissions in the infrastructure mode. Choosing RIFS = PIFS is a possible option when operating the cooperative protocol in the ad hoc mode, as the point coordination function is not present. This choice is advantageous as the relay node does not need any special scheduling mechanism on its queues.

The backoff procedure at S is the same as in IEEE 802.11b. When the predetermined maximum number of transmission attempts is reached, the data frame is
discarded. Special attention is required to handle the transmission sequence of case 2 (Fig. 3). In this case, R senses the channel after SIFS. If the channel is idle, it indicates that no ACK frame is being transmitted by D. Then, R begins the transmission of the data frame it received from S at RIFS. Due to the backoff procedure, S cannot start a retransmission unless it senses an idle channel for at least DIFS > RIFS. As explained above, RIFS is chosen carefully so that S finds the channel busy after SIFS if R is trying to help the transmission between S and D. If D receives the frame transmitted by R, D sends an ACK to S. On receiving the ACK, S cancels its backoff procedure for retransmission and starts the transmission procedure for the next data frame. If S does not receive the ACK, it proceeds with the backoff procedure as defined in the IEEE 802.11 standard. When R fails in its attempt to transmit the packet to D, S continues its backoff process (which is frozen while R is transmitting) and, when the backoff ends, transmits the packet to D. Thus, when the transmission from R is not successful, the backoff procedure at S is not affected. As already mentioned, the proposed protocol does not change when RTS/CTS frames are considered. When R receives the RTS and/or CTS from S and/or D, it does not attempt transmission of its own data frames. However, it keeps listening and helps deliver the data frame from S to D whenever required.
Fig. 6. Source's flowchart
The flowcharts of the cooperative protocol for S and R are shown in Figs. 6 and 7, respectively. As the flowcharts indicate, some changes are required in the MAC protocol for data transmission when compared to the IEEE 802.11b standard. No changes are required at D for data reception. R must know the addresses of both S and D in order to relay data frames between the two nodes. Note that if traffic is bidirectional, R can help relay data frames in both directions. Conversely, S and D can function with or without R, and need not know the address of R. Thus, the protocol and the data flow between S and D can smoothly adapt to changing channel conditions and the relative locations of the three nodes.
Fig. 7. Relay's flowchart

As already mentioned, the main difference between the protocol proposed in this section and the one in [3] is the attempt made by S to reach both D and R with the same frame transmission.
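The following sketch captures the relay-side logic of Fig. 7 in executable form. It is an illustration of the protocol rules under stated assumptions, not the simulator used in this paper; the injected send primitive and the channel-sensing boolean are hypothetical stand-ins.

```python
SIFS, RIFS, DIFS = 10, 30, 50  # microseconds; chosen so SIFS < RIFS < DIFS

class Relay:
    """Logic of Fig. 7: store S's frame, forward it only if D stays silent."""
    def __init__(self, send):
        self.send = send                # injected transmit primitive
        self.stored = None

    def on_frame(self, kind, frame=None):
        if kind == "ACK":               # Case 1: D acknowledged S directly
            self.stored = None
        elif kind == "DATA":
            self.stored = frame         # keep a copy while S transmits

    def rifs_expired(self, channel_busy_after_sifs):
        # An idle channel SIFS after the data frame means no ACK from D,
        # so R forwards its stored copy at RIFS (< DIFS, pre-empting S).
        if self.stored is not None and not channel_busy_after_sifs:
            self.send(self.stored)
        self.stored = None

sent = []
r = Relay(send=sent.append)
r.on_frame("DATA", frame="pkt-1")
r.rifs_expired(channel_busy_after_sifs=False)   # Case 2: R retransmits
assert sent == ["pkt-1"]
```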
3 Results

3.1 Channel Model
The path loss model used in the simulator is as follows:

E_{sr} = E_{st} \cdot \frac{G_T \, G_R \, \lambda^2}{(4\pi)^2 \, d^{\beta}} ,   (1)

where
– E_sr, E_st: energy per symbol at the receiver and transmitter, respectively,
– G_T, G_R: transmitter and receiver antenna gains, respectively,
– d: transmitter-receiver distance,
– λ: wavelength at the channel center frequency in m,
– β: path loss exponent; β = 2 in free space, and typically 2 ≤ β ≤ 4 for environments with structures and obstacles [2,16].

Fading is assumed to be Rayleigh, slow and flat, i.e., the fading coefficients are considered constant over a single frame transmission. The fading experienced by any given frame transmission is statistically independent of the fading experienced by any other frame transmission. The instantaneous signal to noise ratio at receiver j, given a transmission from transmitter i, is

\gamma(i,j) = \frac{(E_{sr} \times PG / N_o) \times r_{i,j}^2}{10^{F/10}} ,   (2)
where
– E_sr: energy per symbol at the receiver,
– PG: processing gain due to spreading,
– N_o: noise spectral density of the additive white Gaussian noise (AWGN) channel,

N_o = K_B \times T ,   (3)

– K_B: Boltzmann constant,
– r_{i,j}: Rayleigh distributed random variable modeling the Rayleigh fading magnitude from node i to j,
– F: noise figure of the receiver (10 dB).
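The channel model (1)-(3) can be evaluated numerically as in the sketch below. The antenna gains, carrier frequency, distance, temperature, symbol rate and processing gain in the example call are hypothetical placeholders, not values taken from the paper.

```python
import math, random

K_B = 1.380649e-23                        # Boltzmann constant (J/K)

def symbol_energy_rx(E_st, d, beta=4.0, G_T=1.0, G_R=1.0, f_c=2.4e9):
    lam = 3.0e8 / f_c                     # wavelength at the channel center frequency
    return E_st * G_T * G_R * lam**2 / ((4 * math.pi)**2 * d**beta)  # eq. (1)

def rayleigh_sample():
    # |h| for flat Rayleigh fading with E[|h|^2] = 1 (two Gaussian quadratures)
    g1 = random.gauss(0.0, math.sqrt(0.5))
    g2 = random.gauss(0.0, math.sqrt(0.5))
    return math.hypot(g1, g2)

def instantaneous_snr(E_st, d, PG, T=290.0, F_dB=10.0):
    E_sr = symbol_energy_rx(E_st, d)
    N_o = K_B * T                         # AWGN noise spectral density, eq. (3)
    r = rayleigh_sample()                 # constant over one frame (slow fading)
    return (E_sr * PG / N_o) * r**2 / 10**(F_dB / 10)                # eq. (2)

# Example: 100 mW spread over 1 Msym/s, 100 m link, 11-chip spreading gain
print(instantaneous_snr(E_st=100e-3 / 1e6, d=100.0, PG=11.0))
```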
3.2 Simulation Results
In this section, simulation-generated results are discussed to assess the performance gain in IEEE 802.11b when using cooperative protocols. In the study, three protocols are considered, i.e., the conventional IEEE 802.11b [1], MAC II in [3] (Poly MAC II), and the MAC protocol proposed in Section 2 (UTD MAC). The assumptions made and the values chosen for the protocol parameters are shown in Table 1. Three nodes are used, i.e., S, R, and D. Data flow is either from S to D only (one-way traffic), or bidirectional between S and D (two-way traffic). R does not generate any traffic of its own. It is assumed that the three nodes have agreed to cooperate. They can freely use any of the four transmission rates provided by IEEE 802.11b, i.e., 1, 2, 5.5, and 11 Mbps. However, ACK frames are always transmitted at 1 Mbps to provide maximum reliability.

Table 1. Parameters used in simulation
Path Loss Exponent β: 4
Flat Rayleigh Fading: constant across frame
Average Transmitter Power: 100 mW
PHY Header: 192 bits
SIFS: 10 μs
RIFS: 30 μs
DIFS: 50 μs
Slot Time: 20 μs
Vulnerable Period: 20 μs
Max Retrans. Attempts: 6
Frame Size: 1023 bytes
Min Contention Window: 31 slots
Max Contention Window: 255 slots
Arrival Rate: 1200 frames/s (saturation)
MAC Header: 34 bytes
MAC ACK: 14 bytes
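As a quick sanity check on these parameters, the sketch below estimates the airtime of a single data/ACK exchange from the Table 1 values, assuming the 192-bit PHY header is sent at 1 Mbps (long preamble) and ignoring backoff, RTS/CTS and collisions. It is a back-of-the-envelope bound of our own, not the simulator used in the paper.

```python
PHY_HDR_US = 192.0          # 192 PHY header bits at 1 Mbps
MAC_HDR_B, ACK_B = 34, 14   # MAC header and MAC ACK sizes (bytes)
SIFS_US, DIFS_US = 10.0, 50.0

def exchange_airtime_us(payload_bytes, rate_mbps, ack_rate_mbps=1.0):
    # rate in Mbps == bits per microsecond, so bits/rate gives microseconds
    data = PHY_HDR_US + (MAC_HDR_B + payload_bytes) * 8 / rate_mbps
    ack = PHY_HDR_US + ACK_B * 8 / ack_rate_mbps
    return DIFS_US + data + SIFS_US + ack

t = exchange_airtime_us(1023, 11.0)
print("max MAC throughput at 11 Mbps: %.2f Mbps" % (1023 * 8 / t))
```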
Fading is assumed independent of the destination, e.g., when S transmits, the fading experienced at R is independent of that at D. Frame error rates are computed using [5]. Multiple concurrent transmission attempts always result in collision. Propagation delay is assumed negligible. The DCF mode of operation is used. Neither the virtual carrier sense (RTS/CTS) mechanism nor fragmentation is used. The maximum number of transmission attempts per data frame is 6. Simulation results are obtained using a C++ custom simulator and have a 5% confidence interval at the 95% confidence level. Simulation results are validated against the analytical model presented in [4]. The saturation load condition is obtained by choosing data frame arrival rates that exceed the network capacity. Data frames in excess are dropped and not counted. Throughput is defined as the number of MAC payload bits that are successfully delivered and acknowledged by D, normalized to time. The MAC and PHY header bits do not contribute to throughput. Access delay is the time taken for a data frame from the instant it reaches the head of the transmission queue at S until the first bit of its successful transmission attempt is aired by S.

When obtaining the curves for the Poly MAC II protocol, the relay node is chosen based on the transmission time gain that can be achieved if the packet goes through the relay [3]. The transmission rate for S (R) is chosen based on the distance of S (R) from R (D), as indicated in [3]. Once a relay is chosen, all the packets from S to D go through the relay R only, i.e., S never attempts to transmit directly to D. Upon correct reception, D directly transmits the ACK to S. The UTD MAC curves are obtained by selecting the transmission rates for S and R, respectively, that jointly yield the maximal throughput for each experiment. Cooperation in the UTD MAC is always invoked, regardless of the location of the three nodes.

Fig. 8(a) shows throughput under saturation load for the three protocols as a function of the distance between S and D. Traffic is one-way. Four curves are reported for IEEE 802.11b, one for each transmission rate. R is always placed half way between S and D to provide a good condition for cooperation. Under this condition, the two cooperative protocols offer increased throughput when compared to IEEE 802.11b for distances of 40 m and above. Poly MAC II's best contribution is reached at 70 m and above. Fig. 8(b) is similar to Fig. 8(a) except that fading is absent in the latter. The cooperative protocols perform better than IEEE 802.11b beyond a distance of 60 m, indicating that the performance gain is still there, irrespective of whether or not the channel is affected by fading. The sudden transitions in the throughput are due to changes in the transmission rates used. Fading smoothens the transition area, as clearly visible in Fig. 8(a).

Figs. 9(a) and 9(b) show throughput and expected access delay, respectively, under saturation load when the S-D distance is 100 m. R's position varies along the S-D axis. S and D coordinates are (0, 0) and (100, 0), respectively. R's coordinates are (X, 0), where X is the value on the horizontal axis in both figures. Traffic is one-way. The throughput of the cooperative protocols is significantly affected by the position of R. Poly MAC II does not invoke cooperation for X ≤ 20 m and X ≥ 80 m.
Fig. 8. Throughput vs. S-D distance, R is half way. (a) With Fading; (b) Without Fading. (Saturation throughput in Mbps vs. distance between source and destination in m.)
Fig. 9. R's position along the S-D axis, S-D distance is 100 m. (a) Throughput; (b) Expected Access Delay.

Table 2. Bit rate pairs for UTD MAC in Figs. 9(a) and 9(b)
S-R distance (m):  0-10  15-35  40-45  50-55  60   65-100
S Rate (Mbps):     1     11     11     5.5    5.5  2
R Rate (Mbps):     1     2      5.5    5.5    11   11
The UTD MAC curves consist of a sequence of segments, each segment being obtained with a specific pair of transmission rates for S and R, respectively. The rate pairs are reported in Table 2 and help explain the UTD MAC plots. Sudden changes in the plots occur when the optimal transmission rate of either S or R changes. In the 0 ≤ X ≤ 10 m region, the transmission rate of both S and R is 1 Mbps, as both nodes attempt to reach D from approximately the same distance. In the 15 ≤ X ≤ 35 m region, however, R increases its rate to 2 Mbps, thus providing a faster frame transmission time.
Fig. 10. R's position orthogonal to the S-D axis, S-D distance is 150 m. (a) Throughput; (b) Expected Access Delay.

Table 3. Bit rate pairs for UTD MAC in Figs. 10(a) and 10(b)
R's Y position from S-D axis (m):  0-20  25-30  35-75
S Rate (Mbps):                     2     2      1
R Rate (Mbps):                     2     1      1
In turn, S changes to 11 Mbps, as it provides the fastest way to deliver the frame to R. In the 65 ≤ X ≤ 100 m region, R increasingly approaches D. S's rate goes down to 2 Mbps, which is a suitable rate to reach both R and D. When only R is reached successfully by the frame, R's rate of 11 Mbps delivers the frame to D at full speed, taking advantage of the reduced distance to D.

Figs. 10(a) and 10(b) show throughput and expected access delay, respectively, under saturation load when the S-D distance is 150 m. R's position varies orthogonally to the S-D axis. S and D coordinates are (0, 0) and (150, 0), respectively. R's coordinates are (75, Y), where Y is the value on the horizontal axis in both figures. Traffic is two-way. In this scenario, Poly MAC II never invokes cooperation. Only IEEE 802.11b and UTD MAC are therefore shown. Even when R is 75 m away from the S-D axis, the cooperative protocol yields a noticeable throughput gain over IEEE 802.11b. The behavior of the access delay curve for UTD MAC as Y increases can be explained by inspecting the transmission rates used by S and R (Table 3). The step-like delay increase in the 20 ≤ Y ≤ 30 m region occurs due to the rate reduction from 2 to 1 Mbps performed by R first, then by S. It must be noted that R's rate is decreased before S's rate, as R must ensure reliable delivery to D, whereas S can be more aggressive given that R can provide a backup transmission attempt. In the 35 ≤ Y ≤ 75 m region the access delay increases slightly and exceeds the delay of IEEE 802.11. This is because all nodes use 1 Mbps and the transmission via R takes longer than the direct transmission from S to D. At Y = 0 m, UTD MAC performs three
times better than IEEE 802.11b, and when Y = 75 m it performs two times better than IEEE 802.11b. Overall, both cooperative protocols offer tangible performance gains when compared to IEEE 802.11b if R is conveniently located between S and D. UTD MAC appears to be somewhat more flexible in accommodating the various positions of R.
4 Conclusion
The paper investigated the use of cooperative communications techniques to enhance the ability of the IEEE 802.11b MAC protocol to cope with radio signal degradation, with and without channel fading. Two cooperative MAC protocols were compared, i.e., the one in [3] and the one presented in the paper. Both cooperative protocols have the potential to yield higher throughput and lower latency when compared to the conventional IEEE 802.11b protocol. Alternatively, the maximum transmission span between the source and destination for a desired throughput target can be increased by up to 50% when using the cooperative protocols. All these features may help achieve improved connectivity and network performance in ad hoc applications, where nodes' relative locations are difficult to control and predict. However, as indicated in this study, to fully harness cooperative communications in IEEE 802.11b, the cooperating nodes must be able to carefully select their transmission rates. This subject will be addressed in future work on this topic.
References
1. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specification: High speed physical layer extension in the 2.4 GHz band. IEEE (September 1999)
2. Rappaport, T. S.: Wireless Communications: Principles and Practice (2nd Edition). Prentice Hall PTR (2001)
3. Liu, P., Tao, Z., Panwar, S.: A Co-operative MAC Protocol for Wireless Local Area Networks. In: Proc. IEEE International Conference on Communications (ICC), Seoul, Korea (2005)
4. Bianchi, G.: Performance Analysis of the IEEE 802.11 Distributed Coordination Function. IEEE Journal on Selected Areas in Communications, pages 535-547 (March 2000)
5. Kim, B., Fang, Y., Wong, T.: Throughput Enhancement Through Dynamic Fragmentation in Wireless LANs. IEEE Transactions on Vehicular Technology (2005)
6. Sendonaris, A., Erkip, E., Aazhang, B.: User Cooperation Diversity, Part I: System Description. IEEE Trans. Commun., vol. 51, no. 11, pages 1927-1938 (2003)
7. Janani, M., Hedyat, A., Hunter, T., Nosratinia, A.: Coded Cooperation in Wireless Communications: Space-Time Transmission and Iterative Decoding. IEEE Trans. on Signal Processing, vol. 52, no. 2, pages 362-371 (2004)
8. Zimmermann, E., Herhold, P., Fettweis, G.: The Impact of Cooperation on Diversity-Exploiting Protocols. In: Proc. 59th IEEE Vehicular Technology Conference (VTC Spring) (2004)
9. Gupta, P., Cerutti, I., Fumagalli, A.: Three Transmission Scheduling Policies for a Cooperative ARQ Protocol in Radio Networks. In: Proc. WNCG Conference (2004)
10. Kim, N.: IEEE 802.11 MAC Performance with Variable Transmission Rates. IEICE Transactions on Communications, vol. E88-B, no. 9, pages 3524-3531 (2005)
11. Bletsas, A., Lippman, A., Reed, D. P.: A Simple Distributed Method for Relay Selection in Cooperative Diversity Wireless Networks, Based on Reciprocity and Channel Measurements. In: Proc. IEEE 61st Vehicular Technology Conference (VTC 2005-Spring), vol. 3, pages 1484-1488 (2005)
12. Biswas, S., Morris, R.: ExOR: Opportunistic Multi-Hop Routing for Wireless Networks. Pages 133-144 (2005)
13. Shankar N, S., Chou, C.-T., Ghosh, M.: Cooperative Communication MAC (CMAC): A New MAC Protocol for Next Generation Wireless LANs. In: International Conference on Wireless Networks, Communications and Mobile Computing, pages 133-144 (2005)
14. Laneman, J. N., Wornell, G. W., Tse, D. N. C.: An Efficient Protocol for Realizing Cooperative Diversity in Wireless Networks. In: Proc. IEEE ISIT, Washington, p. 294 (2001)
15. Zhao, B., Valenti, M. C.: Practical Relay Networks: A Generalization of Hybrid-ARQ. IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pages 7-18 (2005)
16. Wall, J., Khan, J. Y.: An Advanced ARQ Mechanism for the 802.11 MAC Protocol. In: Proceedings of the Australian Telecommunications, Networks and Applications Conference (ATNAC) (2003)
It Is Better to Give Than to Receive – Implications of Cooperation in a Real Environment

Thanasis Korakis1, Zhifeng Tao2, Salik Makda1, Boris Gitelman1, and Shivendra Panwar1

1 Department of Electrical and Computer Engineering, Polytechnic University, Brooklyn, NY 11201
2 Mitsubishi Electric Research Laboratories, Cambridge, MA 02139
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. Thanks to the immense potential cooperative communications displays, extensive investigations have been directed at examining its performance by means of both analysis and simulation. In this paper, an implementation approach has for the first time been pursued to demonstrate the viability of realizing cooperation at the MAC layer in a real environment. The paper further describes the technical challenges encountered, details the corresponding solutions proposed, and shares the experience gained. Experimental measurements in a medium-size (i.e., 10 stations) testbed are then reported, which not only help develop a deeper understanding of the protocol behavior, but also confirm that a cooperative MAC protocol delivers superior performance.
1 Introduction

Cooperative communications, which refer to the collaborative processing and retransmission of overheard information at stations surrounding a source, have recently gained momentum in the research community [1]. The notion of cooperation takes full advantage of the broadcast nature of the wireless channel and creates spatial diversity, thereby achieving improvements in system robustness, capacity, delay, coverage range, and interference reduction. The innovation of cooperative communications is not confined to the physical layer. It is available in various forms at higher protocol layers [2]. A MAC protocol called CoopMAC [3] illustrates how the legacy IEEE 802.11 distributed coordination function (DCF) can be enhanced with minimal modifications to maximize the benefit of cooperative diversity. As experimentation gradually becomes one of the de facto approaches for benchmarking [4] [5], a preliminary performance evaluation of a cooperative MAC protocol was attempted in [6] in a relatively rudimentary experimental setting, for proof of concept purposes. Only 3 stations with dedicated roles as source, destination and helper
This work is supported in part by the National Science Foundation (NSF) under award 0520054, and by the New York State Center for Advanced Technology in Telecommunications (CATT). The work is also supported by the Wireless Internet Center for Advanced Technology (WICAT), an NSF Industry/University Research Center at Polytechnic University.
were involved therein, and throughput for only one TCP session was collected. In this paper, the scope and scale of the effort have been significantly expanded: all major protocol functionalities are implemented in an open source driver for 802.11 devices, and a comprehensive set of experiments is conducted in a testbed consisting of up to 10 stations. As highlighted below, many different aspects of the protocol performance have been scrutinized, and the new results thereby obtained are reported.

– Delay performance (e.g., average end-to-end delay, jitter)
– Impact of cooperation on the helper station
– Impact of the "Hello Packet" interval
– Impact of buffer overflow on system performance
To familiarize the reader with CoopMAC, the basic idea of the protocol is first summarized in Section 2. The implementation effort is then elaborated in Section 3, and the primary configurations of the experiments are specified in Section 4. A rich set of measurement results, along with the insights they reveal, is reported in Section 5. Section 6 completes the paper with final conclusions and possible future work. For ease of explanation, the terms relay and helper will be used interchangeably in the following discussion.
2 Cooperation at MAC Layer

In order to deliver an acceptable frame error rate (FER), packets in IEEE 802.11 can be transmitted at different bit rates, which are adapted to the channel quality. For IEEE 802.11b, in particular, four different rates are supported over the corresponding typical ranges, as depicted in Figure 1.

Fig. 1. Illustration of the Cooperative MAC Protocol (source STAs, helper STAh and destination STAd, with link rates Rsh = 11 Mbps, Rhd = 5.5 Mbps and Rsd = 1 Mbps, and the nested coverage ranges R11, R5.5, R2 and R1)
One key observation conveyed by Figure 1 is that a source station STAs that is far away from the destination STAd may persistently experience a poor wireless channel, resulting in a rate as low as Rsd (e.g., 1 Mbps) for direct transmission over an extended period of time. If there exists some neighbor STAh who in the meantime can sustain higher transmission rates Rsh and Rhd (e.g., 11 Mbps and 5.5 Mbps in Figure 1) between STAh and STAs, and between STAh and STAd, respectively, station STAs can enlist the neighbor STAh to cooperate and forward the traffic on its behalf to the destination, yielding a much higher effective rate. More specifically, upon the transmission of a packet, station STAs should access all the rate information in a cooperation table (a.k.a. CoopTable), and compare an estimation of the equivalent two-hop rate (Rsh Rhd)/(Rsh + Rhd) with the direct rate Rsd to determine whether the two-hop communication via the relay yields better performance than a direct transmission. If cooperative forwarding is invoked, CoopMAC engages the selected relay station STAh to receive the traffic from the source STAs at rate Rsh and then forward it to the corresponding destination STAd at rate Rhd after a short interframe spacing (SIFS) time. In the end, destination STAd indicates its successful reception of the packet by issuing an acknowledgment packet (i.e., ACK) directly back to STAs. As an option, the RTS/CTS signaling defined in IEEE 802.11 can be extended to a 3-way handshake in CoopMAC to further facilitate the ensuing cooperative data exchange.

To identify the station that has been selected as a helper, the Address 4 field in the MAC header of a data packet from STAs to STAh in CoopMAC should hold the MAC address of the final destination STAd, while the Address 1 field contains the MAC address of the selected helper STAh. When the packet is further forwarded by STAh to STAd, the helper places the address of STAd in field Address 1, and leaves Address 4 unused. The key enhancement in the control plane at each station is the establishment and maintenance of the CoopTable, which contains essential information (e.g., Rhd, Rsh) related to all the potential helpers. For STAs to acquire the values of Rhd and Rsh, a passive eavesdropping approach is followed, so that the overhead of additional control message exchange can be kept at a minimum.

It is worthwhile to note that although CoopMAC seemingly bears some resemblance to ad hoc routing protocols that adopt either a minimal hop count or another innovative metric for path selection [7], they are in essence fundamentally different. First and foremost, the objective of CoopMAC is to exploit spatial diversity and rate adaptation, not to increase the geographical extent of the network as in ad hoc routing. Secondly, all the associated operations occur in the MAC layer, which enjoys a shorter response time and more convenient access to physical layer information, as compared to traditional network layer routing. Interested audiences are encouraged to refer to [3] for more detailed protocol specifications and technical discussions.
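The forwarding decision described above reduces to a comparison of rates, as in the sketch below. The CoopTable layout (a dict mapping a helper to its (Rsh, Rhd) pair, in Mbps) is our own illustration, not the driver's actual data structure.

```python
def pick_helper(coop_table, R_sd):
    """Return (helper, rate): the best relay, or (None, R_sd) for direct send."""
    best, best_rate = None, R_sd                 # default: direct transmission
    for helper, (R_sh, R_hd) in coop_table.items():
        two_hop = (R_sh * R_hd) / (R_sh + R_hd)  # equivalent two-hop rate
        if two_hop > best_rate:
            best, best_rate = helper, two_hop
    return best, best_rate

table = {"STAh": (11.0, 5.5)}                    # rates from Fig. 1
helper, rate = pick_helper(table, R_sd=1.0)
print(helper, round(rate, 2))                    # STAh 3.67 > 1.0: cooperate
```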
3 Implementation of Cooperation

Key challenges encountered in the driver implementation and the corresponding solutions are summarized in this section. Nevertheless, due to space constraints, certain implementation details cannot be covered. Interested readers can access the official project website [8] for more technical information and free downloading of the CoopMAC driver.

3.1 Inaccessibility to Firmware

When it comes to system design, all the features specified in the IEEE 802.11 MAC protocol are logically partitioned into two modules, according to the time-criticality of each task. The lower module, which usually operates on the wireless card as a part
of firmware, fulfills the time-critical functions such as the generation and exchange of RTS/CTS control messages, the transmission of acknowledgment (ACK) packets, the execution of random backoff, etc. The other module, which normally assumes the form of the system driver, is responsible for more delay-tolerant control plane functions such as the management of MAC layer queue(s), the formation of the MAC layer header, fragmentation, association, etc. As the cooperative MAC protocol requires changes to both the time-critical and the delay-tolerant logic, the inaccessibility of the firmware unfortunately causes additional implementation complexity. Indeed, compromises had to be made and alternative approaches had to be pursued due to this constraint. For illustrative purposes, three main circumventions that became necessary are outlined below.

– Suspension of 3-way Handshake. As mentioned in Section 2, a 3-way handshake option has been defined in the cooperative MAC protocol, which requires the selected helper to transmit a new control message called "Helper ready To Send" (HTS) between the RTS and CTS messages. Since the strict sequence of RTS and CTS packets has been hardwired in the firmware, an insertion of HTS is impossible at the driver level. It was therefore not possible to implement this option.
– Unnecessary Channel Contention for Relayed Packet. Once channel access has been allocated to the source station, the helper should relay the packet a SIFS time after its reception, without any additional channel contention. Since the SIFS time is set as 10 μs in IEEE 802.11b, any function demanding such a short delay must be implemented in the firmware. As a result, a compromise has been made in the implementation, where channel contention for the relayed packet on the second hop has to be attempted.
– Duplicate ACK. Each successful data exchange in the original cooperative MAC protocol involves only one acknowledgment message, which is sent from the destination to the source directly. Since the acknowledgment mechanism is an integral function of the firmware, it is impossible to suppress the unnecessary ACK message generated by the relay station for the packet it will forward on behalf of the source. Therefore, an unwanted ACK from the relay had to be tolerated.

As an implication of the circumventions described above, a faithful implementation of cooperative MAC is anticipated to outperform the one demonstrated in this paper.

3.2 Maintenance of the CoopTable

As described in Section 2, the CoopTable is critical to enabling the cooperative operation. The passive approach to rate learning defined in the original CoopMAC protocol [3], however, has not been realized in our implementation, for the following reasons:

– Unwanted Packet Filtering. All packets with a destination address different from the local MAC address are filtered out by the firmware. Hence, the driver is unable to retrieve any rate information from them.
– Controllability of the Experiment Environment. Even if the driver had access to such packets (e.g., by periodically switching the wireless card to the promiscuous mode), the additional random delay incurred by frequent mode switches, and the traffic load and pattern at each station, may complicate the collection of data in the experiments.

Therefore, for the sake of controllability of the experimental environment, an active information distribution approach has been followed instead. More specifically, a Hello packet is broadcast by each station in a periodic manner, to notify the neighbors of its existence as well as the sustainable transmission rate on the respective link. Upon the reception of a Hello packet, a station either inserts a new entry or updates an existing one in its CoopTable.
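A minimal sketch of this active CoopTable maintenance follows. The staleness policy (expiring entries after a few missed Hello intervals) is our own illustration and is not specified by the driver; the interval value reflects the sweet spot later found in Section 5.1.

```python
import time

HELLO_INTERVAL = 0.1                 # seconds (hypothetical; see Section 5.1)
STALE_AFTER = 3 * HELLO_INTERVAL     # drop a helper after 3 missed Hellos

coop_table = {}                      # mac_addr -> {"R_sh", "R_hd", "seen"}

def on_hello(mac_addr, R_sh, R_hd, now=None):
    # Insert a new entry or refresh an existing one on Hello reception.
    now = time.monotonic() if now is None else now
    coop_table[mac_addr] = {"R_sh": R_sh, "R_hd": R_hd, "seen": now}

def live_helpers(now=None):
    # Only helpers whose Hellos are still fresh are considered for relaying.
    now = time.monotonic() if now is None else now
    return {m: e for m, e in coop_table.items() if now - e["seen"] < STALE_AFTER}

on_hello("00:11:22:33:44:55", R_sh=11.0, R_hd=11.0)
print(live_helpers())                # a dormant relay drops out once Hellos stop
```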
4 Experiment Configuration and Measurement Methodology

4.1 Testbed Configuration

The testbed used in the experiments consists of 10 IBM T23 laptops, each of which contains an Intel Pentium III processor at 800 MHz and 384 MB of memory. Redhat Linux 9.0 with kernel version 2.4.20 is installed as the operating system. In the ensuing experimental study, three different network topologies will be used. In each topology, one station is a dedicated destination, which mimics the functionality of an access point. The rest of the stations are traffic sources, helpers, or both. To calibrate the testbed, the positions of the stations have been adjusted until the throughputs achieved by all stations are roughly equal.

4.2 Measurement Methodology

The majority of the statistics generated in the experiments, including throughput, packet loss and jitter, are measured using Iperf [9], which is a powerful tool for traffic generation and measurement. A typical experimental setup is to run an Iperf client at a handful of stations to generate UDP or TCP traffic streams, while an Iperf server residing on the dedicated destination receives the traffic and collects the statistics. To remove random effects and short-term fluctuations, we run each experiment 5 times, each run lasting 10 minutes, and average the results.

The measurement of average delay was non-trivial, since no mean end-to-end delay statistics are provided by Iperf or other off-the-shelf traffic measurement tools. As further explained in [5], tight synchronization between the transmitter and receiver is mandated if the delay is to be measured directly.
432
T. Korakis et al.
successfully decodes the packet, it immediately sends another broadcast packet back to the transmitter. Since the delay incurred in each direction can be considered to be identical, the one-way end-to-end delay experienced by a data packet is approximately equal to half of the round-trip delay observed at the transmitter. The delay statistics derived thereof essentially is the time from the moment that the wireless MAC driver pushes the packet into the MAC transmission queue, until the time the packet is passed from the physical layer to the MAC buffer at the receiver. A closer examination of this delay value reveals that it consists of several major components, namely the delay incurred at the transmitter (e.g., kernel interrupt delay in the driver, random backoff time, DIFS), transmission time, and delay experienced at the receiver (e.g., delay associated with kernel interrupt that signals to the MAC layer the arrival of a new packet, etc.). Note that no time will be spent on transmitting an ACK packet, because a broadcast transmission does not require any acknowledgment.
5 Performance Evaluation Based upon the testbed described in Section 4, numerous experiments have been conducted, and the results obtained are reported and analyzed in this section. A baseline scenario, which only consists of 1 transmitter, 1 helper and 1 receiver, is first used to develop a basic understanding of the implication of cooperation, and establish a benchmark for performance study of more sophisticated settings. Thanks to its simplicity, this scenario isolates interfering factors such as collisions, and creates an ideal environment that gives rise to several crucial insights related to the behavior of CoopMAC. Throughput Improvement. In this experiment, source station ST As generates traffic using an Iperf client, while the corresponding Iperf server running at the destination ST Ad collects the end-to-end throughput statistics. The Iperf client at ST Ah has been switched on, so that it not only relays traffic on behalf of ST As , but also transmit its own packets to ST Ad. As readily demonstrated in Figure 2, CoopMAC enables ST As to deliver substantially higher throughput. More importantly, Figure 2 confirms that CoopMAC protocol creates a win-win situation, instead of a zero-sum game. That is, ST Ah benefits by helping forward the packets for the slow source station. At first glance counterintuitive, this observation can be explained by the fact that if ST Ah participates in forwarding, ST As can finish its packet transmission much earlier, thereby enabling both ST As and ST Ah to transmit more bits in unit time. Interaction with Transport Protocol. In Figure 2, we can see the throughput comparison in a scenario of a source, an active helper and a destination. Direct transmission between source station ST As and destination ST Ad always occurs at 1 Mbps, and helper station ST Ah can sustain 11 Mbps for communication with both ST As and ST Ad . An important trend displayed in Figure 2(a) is that bandwidth in the IEEE 802.11 network is equally shared by the two UDP sources ST As and ST Ah , respectively, in spite of the fact that physical layer bit rate supported by ST Ah is over 10 times higher than that at ST As . Indeed, this notion of fairness that 802.11 strives to maintain has been known
Fig. 2. Throughput Comparison: Active Traffic from Helper. (a) UDP; (b) TCP. (End-to-end throughput in Mbps at source and helper, for 802.11b and CoopMAC.)
to be at the cost of serious network-wide throughput degradation [10]. The CoopMAC protocol preserves this fairness; no significant disparity between the throughputs of STAh and STAs can be seen in Figure 2(a). For TCP traffic in the 802.11 network, however, Figure 2(b) indicates that the slow source station STAs surprisingly grabs even more bandwidth than the fast helper station STAh, which seems to defy conventional wisdom. It has long been known that the cross-layer interaction between the random access wireless MAC protocol and the TCP congestion control mechanism is problematic [11]. We will conduct further investigation regarding the cause of this counter-intuitive phenomenon.

5.1 Hello Packet Interval

It is known that the frequency at which the Hello packet is broadcast exerts a crucial influence on the system performance. A new experimental scenario that contains 1 source, 2 helpers and 1 destination has been set up to investigate this impact. Packets are only generated at source station STAs in this experiment, and the rates supported on all related links are listed in Table 1. The second relay STAh2 remains available all the time, while the first one, STAh1, alternates between awake and dormant states every 15 seconds to mimic user mobility and dynamic channel conditions. Note that since relay STAh1 maintains fast links to both the source and destination, it will be chosen as the helper as long as the source believes that STAh1 is still located in close physical proximity. Of course, if the Hello packets from STAh1 disappear after it becomes dormant, STAs eventually realizes that STAh1 is unavailable, and therefore turns to STAh2 for help. The Hello packet interval is varied in the experiment, and the resultant UDP throughput is collected and plotted in Figure 3. A small value of this interval lets the source
Table 1. Settings for Study of Hello Packet Interval
Rsd: 1 Mbps | Rsh1: 11 Mbps | Rh1d: 11 Mbps | Rsh2: 11 Mbps | Rh2d: 5.5 Mbps
Fig. 3. Impact of Hello Packet Interval. (UDP throughput in Mbps vs. Hello packet interval in seconds, from 0.01 to 10 on a logarithmic scale.)
STAs be constantly updated on the current state of relay STAh1, but causes more overhead. On the other hand, the overhead can be reduced, but the information about the status of STAh1 may become stale at the source, as the interval grows excessively large. When the interval falls within the range of 0.1 to 0.2 seconds, a balance is struck and the maximum throughput is achieved, given that STAh1 goes off every 15 seconds. However, a general optimal operating region for the Hello interval value is far more complicated to predict, as the availability and suitability of a relay in reality depend on such highly random factors as channel fading, mobility and usage patterns.

5.2 End-to-End Delay

Another key dimension of performance for any MAC protocol is the delay, which in fact plays a more critical role than throughput in determining the network's capability of supporting QoS-sensitive applications. The scenario configured to measure the average end-to-end delay is summarized in Table 2. The delay measurement methodology described in Section 4.2 has been applied, and the average delay is obtained based upon the experimental results for over 10^6 broadcast packets. As portrayed in Figure 4, it is evident that cooperative forwarding significantly lowers the average delay for all the cases studied, provided the MSDU size is larger than about 200 bytes. But once the MSDU size drops below 200 bytes, IEEE 802.11b seems to perform better, since it avoids the overhead associated with CoopMAC. Nonetheless, note that this drawback can be avoided if CoopMAC adopts a dynamic relay selection algorithm, in which the source STAs would simply fall back to legacy 802.11 for small frames.
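Such a fallback could be as simple as comparing estimated airtimes, as in the sketch below; the per-hop overhead figure is a rough placeholder of our own, not a measured value.

```python
PER_HOP_OVERHEAD_US = 800.0     # PHY/MAC headers, IFS gaps, ACK at 1 Mbps, ...

def use_cooperation(msdu_bytes, R_sd_mbps, R_sh_mbps, R_hd_mbps):
    bits = msdu_bytes * 8
    # Mbps == bits per microsecond, so bits/rate yields microseconds
    direct = bits / R_sd_mbps + PER_HOP_OVERHEAD_US
    two_hop = bits / R_sh_mbps + bits / R_hd_mbps + 2 * PER_HOP_OVERHEAD_US
    return two_hop < direct      # False for small MSDUs: fall back to 802.11

print(use_cooperation(100, 1, 11, 11))    # -> False: direct is cheaper
print(use_cooperation(1000, 1, 11, 11))   # -> True: cooperation pays off
```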
Table 2. Settings for the Study on End-to-End Delay
Case 1: Rsd = 1 Mbps, Rsh = 11 Mbps, Rhd = 11 Mbps
Case 2: Rsd = 1 Mbps, Rsh = 11 Mbps, Rhd = 5.5 Mbps
Case 3: Rsd = 1 Mbps, Rsh = 5.5 Mbps, Rhd = 5.5 Mbps
Fig. 4. Mean End-to-End Delay. (Delay in ms vs. MSDU size in bytes, for CoopMAC with rate pairs 11/11, 11/5.5 and 5.5/5.5 Mbps, and for 802.11b at 1 Mbps.)
5.3 Protocol Dynamics

To study the dynamic behavior of the protocol, a medium-size testbed has been constructed, where 4 sources, 4 helpers and 1 dedicated destination are involved in the experiment. The UDP traffic originates from both the source and the helper stations, which implies that the channel access opportunities seized by each helper have to be shared between the locally generated traffic and the forwarded traffic. Table 3 lists the rate information related to the experiment. For both the 802.11 and the CoopMAC network, Figure 5 illustrates how the throughput achieved by each station changes with respect to the load applied.

1. Saturation Point. The 802.11 network passes the critical tipping point as early as 0.2 Mbps/station, while CoopMAC does not experience saturation until a load of 0.5 Mbps/station. Thus, the maximum throughput achieved by CoopMAC is approximately 2.5 times higher than that of 802.11.
2. Post-Saturation Regime. Once entering their respective saturation regions, all stations in 802.11 invariably start to witness significant packet drops and throughput deterioration. For helper stations in cooperative MAC, however, the decrease ends after an initial dip, and then stabilizes at a plateau of about 0.28 Mbps/station. The throughput of source stations in CoopMAC more or less follows the same trend of monotonic decline as observed in 802.11, but its absolute value is still notably higher.
Table 3. Settings for Study of Network Dynamics

R_{s_i,d}, ∀i ∈ [1,4] | R_{s_i,h_j}, ∀i,j ∈ [1,4] | R_{h_j,d}, ∀j ∈ [1,4]
1 Mbps | 11 Mbps | 11 Mbps
Fig. 5. Throughput Comparison: per-station throughput (Kbps) versus offered load per station (Mbps) for Helpers 1 to 4 and Sources 1 to 4, with the points where significant packet loss begins marked: (a) 802.11, (b) CoopMAC
Closer scrutiny further suggests that this performance disparity between the helper stations and source stations in a CoopMAC network is an artifact of our present implementation approach, and is expected to disappear once access to the firmware becomes available. More specifically, as explained in Section 3, the cooperative MAC protocol is currently realized at the driver level, which forces the helper stations to pass the received foreign packets into the driver space and queue them together with the native traffic in the same buffer. When the local load at the helpers grows high enough, the arrival rate of the indigenous packets at the buffer far surpasses that of the packets received from the source stations. Therefore, the rate at which packets can be received at the helpers places a bottleneck on the end-to-end throughput of the forwarded traffic, which essentially gives local helper traffic preferential treatment.

5.4 Jitter

To gain a more profound view of the delay performance, the jitter statistics for UDP traffic are collected and depicted in Figure 6. Direct transmissions between source stations STA_s and the destination STA_d always occur at 1 Mbps, and helper stations STA_h can sustain 11 Mbps for communication with both STA_s and STA_d. Figure 6 depicts the maximum jitter for source and helper stations, as collected by Iperf for each traffic stream. Both Figures 6(a) and 6(b) indicate that jitter is sensitive to network size. Moreover, although helper stations support higher transmission rates than source stations, they suffer a higher variance in end-to-end delay (jitter) in an 802.11
Fig. 6. Jitter Comparison: maximum jitter (seconds) under CoopMAC and 802.11 versus the number of sources/helpers (1/1/1 to 4/4/1): (a) source (slow) stations, (b) helper (fast) stations
network. A similar phenomenon was previously identified and an explanation offered in Section 5, where the interaction with the TCP layer was first investigated. Once cooperative MAC is adopted, the jitter performance for both source and helper stations can be improved. In addition, the fast helper stations now perceive lower jitter than the slow source stations, implying that the issue of unfairly high jitter for fast stations has been successfully resolved by CoopMAC.
6 Conclusions and Future Work

This paper represents one of the first attempts to rely on an experimental approach to develop an understanding of cooperation at the MAC layer. The measurement results confirm that cooperative MAC can substantially improve performance (e.g., throughput, mean end-to-end delay, jitter) not only for the stations being helped, but also for the ones that offer the cooperation. Furthermore, the paper sheds light on several critical issues particular to cooperation, such as the impact of MAC cooperation on the TCP protocol and the dynamics of protocol behavior, which to the best of our knowledge are presented here for the first time. As for future work, user mobility will be incorporated into the experiment to examine its impact on protocol performance. It is also worthwhile to develop a further understanding of the implications of cooperation for power consumption, as energy efficiency is always a major design constraint for mobile devices. Moreover, the possible interference reduction effect of cooperative MAC will be investigated in a larger testbed that can emulate a multicell environment. In addition, access to the firmware codebase will be actively pursued, so that the artifacts and constraints imposed by the current driver can be mitigated or completely eliminated, and the cooperative protocol can finally be implemented in its entirety.
Modeling Approximations for an IEEE 802.11 WLAN Under Poisson MAC-Level Arrivals

Ioannis Koukoutsidis¹ and Vasilios A. Siris¹,²

¹ FORTH-ICS, P.O. Box 1385, 71110 Heraklion, Crete, Greece
² Computer Science Department, University of Crete, P.O. Box 2208, 71409 Heraklion, Crete, Greece
{jkoukou,vsiris}@ics.forth.gr
Abstract. We examine two approximative models for the behavior of an IEEE 802.11 WLAN under Poisson MAC-level arrivals. Both models extend, in a simple manner, the analysis for saturated stations with the decoupling approximation. The first follows a well-known approach considering a constant busy station probability at each idle-sensed slot, equal to the load of the envisaged single-server queue. We extend the analysis of this model to calculate more accurately the throughput seen by a station in non-saturation conditions, by considering alternating ON/OFF periods. The second model uses this ON/OFF process structure for calculating attempt and collision probabilities, and subsequently performance measures, based on regenerative process theory. By comparison with simulation results, this model is shown to be more precise for low load conditions. The accuracy of the modeling approximations is also studied for a range of values of the minimum contention window, which is the most influential protocol parameter. Keywords: WLAN, 802.11, mathematical modeling, non-saturation.
1 Introduction
Tractable mathematical models of the IEEE 802.11 WLAN protocol operation with saturated stations (i.e., that always have queued packets to send) consider a decoupling approximation, in which the (re)-transmission processes of the different stations are mutually independent, and the probability of an attempt at each idle-sensed slot is the same throughout the time evolution of the system. The corresponding attempt and collision probabilities are then easily retrieved by the solution of a nonlinear system of equations, as shown in [1, 2, 3]. The consideration of saturated wireless stations is more appropriate for data traffic, as compared to voice traffic (where a codec transmits a fixed payload periodically), or more generally real-time traffic. Therein it is important to model
This work has been supported by the General Secretariat for Research and Technology of Greece, through the project 05-AKMON-80 within Action 4.2 of the Operational Programme “Competitiveness” — 3rd Community Support Programme.
the behavior of an unsaturated network in the whole range from low load to near-saturation conditions, under a certain arrival pattern. Not only would the performance of the network be different, but the protocol parameters for which performance is optimal might also change. In this scope, a fair amount of research has been devoted to the modeling of the 802.11 protocol in more realistic, non-saturation conditions, e.g., [4, 5, 6, 7]. The bulk of research papers consider Poisson MAC-level arrivals (both for mathematical simplicity and general applicability) and also use the decoupling approximation, assuming additionally a constant busy-station probability at each idle-sensed slot. This is taken equal to the load of the station regarded as a single-server queue. In [5, 6] the load is set as an additional parameter of the problem, which should be measured or estimated. However, a better model should avoid this need and include the characteristics of the traffic arrival distribution in its parameters. This is done in [7], where the authors, while studying voice traffic, extend the set of equations used in saturation conditions to also solve for the load. Performance measures such as the mean channel access delay and throughput of a station are then extracted. In this paper we revisit the analytical modeling of 802.11 in non-saturation conditions, under Poisson arrivals. We view the aforementioned load model as the predominant modeling approach, and further explore its accuracy. A critical remark is that previous research has considered throughput as seen by the system and not as seen by a station. We show that these two quantities may differ substantially in non-saturation. We then propose an approximative calculation of the throughput seen by a station, by considering alternating ON/OFF periods. Further, we construct a new model for calculating attempt and collision probabilities, and subsequently performance measures, based on a regenerative process with such periods. This model is shown to be more accurate for low load conditions, which are especially important when considering real-time traffic. We present and compare these modeling approaches, further examining their accuracy over a range of values of the minimum contention window, which is the most influential protocol parameter. We study a homogeneous network with the same parameters for all stations, but the models are readily extendable to multiple service classes, as suggested in the 802.11e standard.
2 Preliminaries

2.1 Protocol Description
The IEEE 802.11 protocol gives specifications for the interconnection of telecommunications equipment in a WLAN, using a CSMA/CA medium sharing mechanism and exponential backoff upon collisions [8]. The main elements of the mechanism are the method of random slot selection and the backoff algorithm. The number of idle-sensed slots that a station must wait prior to a packet transmission attempt is selected uniformly in the interval [0, 2^{min(k,m)} · CW_min − 1], termed the contention window; k is the current backoff stage, initialized to zero for the transmission of a new packet and incremented by one each time a collision
occurs, up to a maximum value m. There is no collision detection in the wireless system, and a successful transmission is indicated by the proper reception of an acknowledgement (ACK) by the sender. All stations in range of each other are assumed to be synchronized with a perfect carrier sense mechanism, so that they all perceive the same sequence of idle and busy slots.
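As an illustration of this slot-selection rule, the following minimal Python sketch draws the backoff duration for stage k (CW_min = 32 is assumed here only as a typical default; the actual value depends on the PHY):

```python
import random

def backoff_slots(k, m, cw_min=32):
    """Number of idle-sensed slots to wait at backoff stage k, drawn
    uniformly from the contention window [0, 2^min(k,m) * CW_min - 1]."""
    cw = (2 ** min(k, m)) * cw_min
    return random.randint(0, cw - 1)
```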
2.2 Analysis in Saturation Conditions
The analysis in saturation conditions is the basis of our investigation, and shall be extended for non-saturation. Consider a number of contending stations N. We assume there does not exist a limit on the number of attempts by a station, after which a packet would be discarded, or that this limit is sufficiently high so that its effect is negligible. Denote the attempt and collision probabilities of a station by p, c, respectively, which are the same for all stations in a homogeneous network. Since the collision probability depends on the attempt rate and vice versa, appropriate expressions can be combined into a system of equations to solve for these values. From a stochastic analysis (either a Markov chain analysis [1], or a renewal theory analysis [2]), the attempt rate can be expressed as a function of the collision probability and the backoff parameters CW_min, m, as:

p = \frac{2(1-2c)}{(CW_{min}-1)(1-2c) + CW_{min}\, c\, (1-(2c)^m)}   (1)

The collision probability at an attempt by a station is then given by

c = 1 - (1-p)^{N-1}.   (2)
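A minimal numerical sketch of solving (1) and (2) is shown below; it uses plain damped fixed-point iteration as one plausible method, and convergence, while typical for standard parameter values, is not guaranteed in general:

```python
def saturation_fixed_point(N, cw_min=32, m=5, iters=10000, tol=1e-12):
    """Solve eqs. (1)-(2) for the saturation attempt and collision
    probabilities (p, c) by damped fixed-point iteration."""
    c = 0.1  # initial guess for the collision probability
    for _ in range(iters):
        p = 2 * (1 - 2 * c) / ((cw_min - 1) * (1 - 2 * c)
                               + cw_min * c * (1 - (2 * c) ** m))
        c_new = 1 - (1 - p) ** (N - 1)
        if abs(c_new - c) < tol:
            break
        c = 0.5 * (c + c_new)  # damping improves stability
    return p, c
```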
We next proceed to derive the main performance measures of interest, which are the mean channel access delay and throughput. The channel access delay is defined as the delay a frame experiences from the time it arrives at the head of the transmission queue until it is successfully transmitted. The following derivation is simpler compared to other approaches, e.g., [7]. The probability of a successful transmission attempt by a station at an idle-sensed slot is P^{succ} = p(1-p)^{N-1}, while the probability of a collision is P^{coll} = p(1-(1-p)^{N-1}). The probability of no attempt is 1-p. When the number of elapsed idle-sensed slots until (and including) a success is ω, the mean number of experienced collisions is (ω-1)P^{coll}. Therefore the mean number of collisions experienced by a station until a successful transmission is

\sum_{\omega=1}^{\infty} (\omega-1)\, P^{coll} P^{succ} (1-P^{succ})^{\omega-1} = \frac{P^{coll}(1-P^{succ})}{P^{succ}}.

Similarly, the mean number of “no-attempt slots” until a success is (1-p)(1-P^{succ})/P^{succ}. Therefore the mean channel access delay is given by

E[D^{acc}] = T^{succ} + \frac{P^{coll}(1-P^{succ})}{P^{succ}}\, T^{coll} + \frac{(1-p)(1-P^{succ})}{P^{succ}}\, E[S^{(N-1)}],   (3)
where T^{succ}, T^{coll} are the durations of a successful transmission (including the reception of the acknowledgement) and of a transmission which resulted in a collision, respectively. S^{(N-1)} is the duration of a slot as perceived by a station which does not attempt in that slot; it is an i.i.d. r.v. standing for the duration of a generic slot (i.e., either an idle slot, a successful transmission or a collision) when N-1 stations are considered in the system. Hence

E[S^{(N-1)}] = (1-p)^{N-1} T^{slot} + (N-1)p(1-p)^{N-2} T^{succ} + [1-(1-p)^{N-1}-(N-1)p(1-p)^{N-2}]\, T^{coll}.   (4)
The throughput seen by a station equals the rate of successfully transmitted MAC-level information per unit of time. We calculate it by considering that each end of a successful transmission is a renewal epoch for the system. Assuming a fixed MAC packet size σ and denoting the throughput by γ, we have

γ = \frac{\sigma}{E[D^{acc}]}.   (5)
This is opposed to γ = σ P^{succ}/E[S^{(N)}], mostly derived as a measure in the literature (e.g., [1,5,6,7]), which can only be stated as the individual throughput seen by the system.
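The quantities in (3) through (5) translate directly into code. The sketch below is illustrative only; times may be in any consistent unit:

```python
def mean_access_delay(p, N, T_slot, T_succ, T_coll):
    """Mean channel access delay E[D_acc] per eqs. (3)-(4)."""
    P_succ = p * (1 - p) ** (N - 1)
    P_coll = p * (1 - (1 - p) ** (N - 1))
    # eq. (4): generic-slot duration seen by a non-attempting station
    p_idle = (1 - p) ** (N - 1)
    p_ok = (N - 1) * p * (1 - p) ** (N - 2)
    E_S = p_idle * T_slot + p_ok * T_succ + (1 - p_idle - p_ok) * T_coll
    return (T_succ
            + P_coll * (1 - P_succ) / P_succ * T_coll
            + (1 - p) * (1 - P_succ) / P_succ * E_S)

def throughput(sigma, p, N, T_slot, T_succ, T_coll):
    """Per-station saturation throughput, eq. (5)."""
    return sigma / mean_access_delay(p, N, T_slot, T_succ, T_coll)
```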
3 Modeling Approximations in Non-saturation
Our setting consists of contending stations transmitting to an access point, whose only purpose is to send back ACKs. We consider the same arrival pattern for all stations. No characterizations are made about the stability of the system, as the memory-full nature of the stochastic model makes this extremely difficult. Instead we assume that there is a range of arrival rates for which the system will be stable, in the sense that the queue size of every station remains finite. This assumption is in accordance with our simulation results, as well as numerous other experimental results in the literature which confirm stability for a range of arrival rates (e.g., [4, 5]).

3.1 Load Model
The first model assumes a constant instead of a time-dependent busy station probability at each idle-sensed slot. This is taken equal to the load of an envisaged single-server queue, which generally models the arrival and departure sequence at the station.¹ Denoting the load by ρ, the probability of a collision at an attempt by a station becomes

c = 1 - (1 - ρp)^{N-1},   (6)

where

ρ = λ\, E[D^{acc}]   (7)

for an arrival rate λ. The probability of an attempt p (given the station has a packet to transmit) depends only on the collisions, and hence is given again by (1). The mean channel access time can then be derived by following the analysis in Sect. 2.2, where in place of p one uses the unconditional probability of an attempt, ρp. Thus we have a system of equations to solve for p, c, ρ. To calculate the throughput experienced by a station, we would then have to consider a sequence of alternating ON/OFF periods, where the OFF period is geometrically distributed with parameter ρ (and may take the value 0). Treating this as a regenerative process, we would have γ = σ/(E[D^{acc}] + ((1-ρ)/ρ) · E[S^{(N-1)}]). This turns out to be extremely inaccurate, yielding throughput results about one order of magnitude greater. To tackle this, we shall make use of the throughput calculation in the more involved ON/OFF model of the next subsection.

¹ It is worth noting that this modeling would be correct if stations were fed by a stationary (not only possessing stationary increments) arrival process and their queues were stable.
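A hedged numerical sketch of the load model follows; it iterates eqs. (1), (6) and (7) to a joint fixed point, reusing the mean_access_delay helper from the earlier sketch evaluated at the unconditional attempt probability ρp. This is one plausible solution strategy, not the authors' exact procedure:

```python
def solve_load_model(lam, N, cw_min, m, T_slot, T_succ, T_coll,
                     iters=5000, tol=1e-10):
    """Fixed-point iteration over (p, c, rho) for the load model."""
    rho, c = 0.5, 0.1
    for _ in range(iters):
        p = 2 * (1 - 2 * c) / ((cw_min - 1) * (1 - 2 * c)
                               + cw_min * c * (1 - (2 * c) ** m))
        c = 1 - (1 - rho * p) ** (N - 1)                 # eq. (6)
        E_D = mean_access_delay(rho * p, N, T_slot, T_succ, T_coll)
        rho_new = min(lam * E_D, 1.0)                    # eq. (7), capped at 1
        if abs(rho_new - rho) < tol:
            break
        rho = 0.5 * (rho + rho_new)                      # damped update
    return p, c, rho
```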
3.2 Regenerative ON/OFF Model
Consider a probabilistic ON/OFF model, where the durations of the ON and OFF periods are geometrically distributed. After a frame transmission in the ON period, a station “rests” with probability r_{ON}, or has another frame to send and contends again with probability 1 - r_{ON}. The OFF period consists of a random number of generic slots, i.e., idle slots, successful transmission or collision times. At the end of each such slot, a station may keep resting with probability r_{OFF}, or have a frame to send and become busy with probability 1 - r_{OFF}.

Fig. 1. Sample path evolution of the system with ON and OFF periods. The end of an OFF period is a regeneration epoch for the system.
Fig. 1 illustrates a sample path of the system evolution. Clearly we have described a regenerative process, and the time between the ends of successive OFF periods is a regeneration cycle. Then, as in the renewal case [2], we can derive time averages. The attempt probability for a station will be

p = \frac{E[R]}{E[X]},

where E[R] is the mean number of attempts in the regeneration cycle and E[X] the mean number of attempt opportunities in this cycle. It is obvious that the
mean number of attempts until a frame is transmitted has the same expression as in the saturation case. For a collision probability c, we have

E[R] = \frac{1}{r_{ON}(1-c)}.

Now denote the mean OFF period by θ, measured in attempt opportunities. The mean number of attempt opportunities in a regeneration cycle is

E[X] = θ + \frac{E[\bar{X}]}{r_{ON}},

with \bar{X} being the number of elapsed backoff slots until a frame is transmitted, which also has the same expression as in saturation. Recovering the appropriate expression and substituting θ = 1/(1-r_{OFF}), we finally find

p = \frac{2(1-2c)}{2\,\frac{r_{ON}}{1-r_{OFF}}\,(1-c)(1-2c) + (CW_{min}-1)(1-2c) + CW_{min}\, c\, (1-(2c)^m)}   (8)
The inherent deficiency of the model is the assumption of a constant busy station probability after each successful transmission of a frame, while in reality it increases with each consecutive transmission. Here we take r_{ON} to be the probability that no packet arrives in the interval [0, D^{acc}). The error in this approximation is smaller at low loads, where consecutive frame transmissions are less frequent. The distribution of D^{acc} is difficult to derive and does not result in a simple expression. Instead we replace it with a constant value, equal to its mean. Considering Poisson arrivals with rate λ, we have

r_{ON} = e^{-\lambda E[D^{acc}]}.   (9)
The value of r_{OFF} is the probability that no packet arrives in [0, S^{(N-1)}). It can easily be calculated exactly, since S^{(N-1)} takes discrete values:

r_{OFF} = (1-p)^{N-1} e^{-\lambda T^{slot}} + (N-1)p(1-p)^{N-2} e^{-\lambda T^{succ}} + [1-(1-p)^{N-1}-(N-1)p(1-p)^{N-2}]\, e^{-\lambda T^{coll}}.   (10)
Equations (8), (9), (10), along with c = 1-(1-p)^{N-1}, can be used to solve for the unknowns p, c, r_{ON}, r_{OFF}. Based on the derived attempt and collision probabilities, the mean channel access delay is calculated following the same approach as in Sect. 2.2. The throughput is calculated as the mean MAC-level information transmitted in a regeneration cycle, over the mean duration of this cycle:

γ = \frac{\sigma/r_{ON}}{E[D^{acc}]/r_{ON} + E[S^{(N-1)}]/(1-r_{OFF})}.   (11)

We shall employ the same formula for calculating the throughput in the load model of Sect. 3.1, using the derived quantities E[D^{acc}], E[S^{(N-1)}] from that model, and replacing p with ρp.
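A sketch of the resulting system solve for the regenerative model is given below; like the earlier sketches it uses damped fixed-point iteration as one plausible method (not necessarily the authors' procedure) and reuses mean_access_delay:

```python
import math

def solve_onoff_model(lam, N, cw_min, m, T_slot, T_succ, T_coll,
                      iters=5000, tol=1e-10):
    """Iterate eqs. (8)-(10) plus c = 1-(1-p)^(N-1) for p, c, r_on, r_off."""
    p, c, r_on, r_off = 0.05, 0.1, 0.9, 0.9
    for _ in range(iters):
        denom = (2 * r_on / (1 - r_off) * (1 - c) * (1 - 2 * c)
                 + (cw_min - 1) * (1 - 2 * c)
                 + cw_min * c * (1 - (2 * c) ** m))
        p_new = 2 * (1 - 2 * c) / denom                  # eq. (8)
        c = 1 - (1 - p_new) ** (N - 1)
        E_D = mean_access_delay(p_new, N, T_slot, T_succ, T_coll)
        r_on = math.exp(-lam * E_D)                      # eq. (9)
        pi = (1 - p_new) ** (N - 1)
        ps = (N - 1) * p_new * (1 - p_new) ** (N - 2)
        r_off = (pi * math.exp(-lam * T_slot)            # eq. (10)
                 + ps * math.exp(-lam * T_succ)
                 + (1 - pi - ps) * math.exp(-lam * T_coll))
        if abs(p_new - p) < tol:
            break
        p = 0.5 * (p + p_new)                            # damped update
    return p, c, r_on, r_off
```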
Remark 1. It should be clear that the ON/OFF periods here are just a workaround and do not refer to the arrival pattern. We briefly explain how such a pattern can be modeled in the Appendix.
4 Results
We note that it is very difficult to show existence and uniqueness of the solution of the nonlinear system of equations in the above models, despite their relatively simple form. The mapping f in the constructed multidimensional equation x = f(x) does not possess attractive monotonicity and concavity properties. We solve numerically for the unknowns using MATLAB's fsolve function. Test cases are taken for basic channel access, noting that the method of RTS/CTS messages [8] that mitigates the hidden terminal problem becomes less efficient for unsaturated stations. According to protocol timings, the duration the medium is busy because of a successful transmission is

T^{succ} = DIFS + δ + T_{PLCP} + \frac{\sigma}{R} + SIFS + δ + T_{PLCP} + \frac{ack}{R_b},   (12)

while the duration of a collision is

T^{coll} = DIFS + δ + T_{PLCP} + \frac{\sigma}{R}.   (13)
In the above formulae, DIFS (Distributed Inter-Frame Space) is the idle-sensed time that must elapse prior to a transmission attempt, δ the propagation delay, T_{PLCP} the time to transmit the PLCP preamble and header (adjoined by the physical layer), and ack the size of a MAC-level acknowledgement. ACK packets are always transmitted at a (usually lower) basic service rate R_b, whereas T_{PLCP} is fixed for each physical layer configuration. In addition, a SIFS (Short Inter-Frame Space) interval is used between the transmission of a frame and the sending of an acknowledgement, to allow the MAC layer to receive the packet and subsequently the transceiver to turn around. We consider an 802.11a implementation, with R, R_b equal to 6 Mbps and a MAC packet size of 160 bytes. Default values for MAC and physical layer parameters are taken from [9] and shown in Table 1. We compare the numerical results with those of a discrete-event simulator written in C++, which closely follows the modeled setup. The load model is also referred to as Model 1, and the regenerative ON/OFF model as Model 2. We first present in Fig. 2 a comparison of different throughput approximations in a range of Poisson arrival rates until saturation, based on Model 1. We have calculated the corresponding individual throughput “seen by the system”, which is shown to be highly inaccurate for non-saturation conditions. On the other hand, the graph also shows that the approximation by λσ is extremely accurate for a very large range of loads, reflecting the simple fact that when collisions are few, the throughput is almost equal to the information arrival rate.
Table 1. 802.11a MAC and physical layer parameters

Parameter | Value
T_slot    | 9 μs
SIFS      | 16 μs
DIFS      | 34 μs
T_PLCP    | 20 μs
ack       | 14 bytes
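For concreteness, eqs. (12) and (13) with the Table 1 parameters can be evaluated as below; the propagation delay δ is not listed in Table 1, so the value used here is only a placeholder assumption:

```python
# Table 1 parameters (802.11a); times in microseconds, rates in Mbps
T_SLOT, SIFS, DIFS, T_PLCP = 9.0, 16.0, 34.0, 20.0
ACK_BITS = 14 * 8          # MAC-level ACK size
R = RB = 6.0               # data rate R and basic rate R_b (bits per us)
SIGMA = 160 * 8            # MAC packet size in bits
DELTA = 1.0                # propagation delay (assumed placeholder)

T_SUCC = (DIFS + DELTA + T_PLCP + SIGMA / R
          + SIFS + DELTA + T_PLCP + ACK_BITS / RB)   # eq. (12)
T_COLL = DIFS + DELTA + T_PLCP + SIGMA / R           # eq. (13)
```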
Fig. 2. Comparison of different throughput approximations based on Model 1 in a case with CW_min = 32, m = 5 and N = 10 stations, for different Poisson arrival rates until saturation (throughput in kbps versus λ in packets/sec; curves: Eq. (11), “seen by system”, simulation, and λ·σ)
Results on the comparison of the two methods are presented in Table 2. For cases where the load obtained by the numerical evaluation is smaller than 1, and thus the system appears stable, we calculate the associated queueing delay. The total packet transmission delay is, for an M/G/1 queue,

E[D^{total}] = E[D^{acc}] + \frac{\lambda E[(D^{acc})^2]}{2(1-\rho)}.   (14)
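Equation (14) is the standard M/G/1 mean-delay form and is trivial to evaluate once the first two moments of D^{acc} are available; a sketch:

```python
def total_delay(E_D, E_D2, lam):
    """E[D_total] of eq. (14) for an M/G/1 queue; requires rho < 1."""
    rho = lam * E_D  # load, eq. (7)
    assert rho < 1.0, "queue unstable: load >= 1"
    return E_D + lam * E_D2 / (2.0 * (1.0 - rho))
```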
For the calculation of the second moment of the channel access delay, see e.g. [7]. If the load turns out to be greater than 1, results are shown for saturation. Inaccuracies manifest themselves more explicitly in the estimation of mean delay, whose units are on a finer scale. We see that Model 1 overestimates delay for low or moderate arrival rates, and underestimates it for high loads. This reveals the nature of the average approximation. On the other hand, Model 2 is more accurate for lower loads but performs significantly worse for higher load values, and fails to predict instability cases well. The two models are generally more precise in the estimation of the throughput. It should be stressed that, in contrast to delay measurements, the throughput measurement scale of kbps is a coarse scale when protocol timings are in μs. Notice, however, that the throughput expression used in Model 1 yields a greater error for highly loaded but not saturated stations.
Table 2. Comparison of the two models

(a) CW_min = 32, m = 5, N = 5

λ (pkts/s) | Load | E[D_acc] (ms): Model 1 / Model 2 / Sim. | E[D_total] (ms): Model 1 / Model 2 / Sim. | Throughput (kbps): Model 1 / Model 2 / Sim.
10²   | 0.04 | 0.537 / 0.485 / 0.441 | 0.553 / 0.498 / 0.452 | 127.74 / 127.79 / 127.80
2·10² | 0.11 | 0.644 / 0.519 / 0.542 | 0.699 / 0.552 / 0.591 | 253.56 / 254.39 / 255.58
3·10² | 0.22 | 0.811 / 0.576 / 0.734 | 0.968 / 0.643 / 0.934 | 372.62 / 377.94 / 383.70
4·10² | 0.46 | 1.121 / 0.680 / 1.152 | 1.678 / 0.829 / 2.460 | 468.79 / 493.70 / 511.94
5·10² | 1.09 | 2.010 / 0.899 / 2.185 | ∞ / 1.342 / ∞ | 636.74 / 585.92 / 585.40
6·10² | 1.31 | 2.010 / 2.010 / 2.186 | ∞ / ∞ / ∞ | 636.74 / 636.74 / 585.40

(b) CW_min = 32, m = 5, N = 10

λ (pkts/s) | Load | E[D_acc] (ms): Model 1 / Model 2 / Sim. | E[D_total] (ms): Model 1 / Model 2 / Sim. | Throughput (kbps): Model 1 / Model 2 / Sim.
10²   | 0.06 | 0.678 / 0.527 / 0.576 | 0.706 / 0.543 / 0.604 | 127.59 / 127.75 / 127.88
2·10² | 0.33 | 1.382 / 0.728 / 1.674 | 1.705 / 0.801 / 3.163 | 246.48 / 252.94 / 255.80
3·10² | 1.34 | 4.119 / 2.305 / 4.479 | ∞ / 5.443 / ∞ | 310.78 / 320.59 / 285.74
4·10² | 1.79 | 4.119 / 4.119 / 4.479 | ∞ / ∞ / ∞ | 310.78 / 310.78 / 285.79
We further proceed to modify the minimum contention window CW_min, which is the most influential protocol parameter. The maximum backoff stage has a much smaller influence, especially in non-saturation. We take numerical results for both models over a wide range of CW_min values. In Fig. 3 we plot simulated and best approximative analytical values for the mean channel access delay, as a metric more sensitive to changes of CW_min than the mean total delay or throughput. Filled points are used to discern cases for which Model 2 was more accurate; as shown in the figure, this occurs for a range of low loads (note that, for fixed other parameters, the load increases or decreases with CW_min proportionally to E[D^{acc}]). Inaccuracies are more pronounced in both models as CW_min increases, but except for exorbitantly large CW_min values, the system's behavior is accurately followed. Qualitative aspects of behavior are well captured by both models. We saw in Table 2 that both models demonstrate the occurrence of instability. Instability may also occur for very large values of CW_min, because of the increased waiting time prior to an attempt. This is confirmed by the numerical and simulation results in Fig. 3, for λ = 3·10² and CW_min equal to 512 and 1024 (the corresponding values coincide with those of the saturation case). The most important qualitative aspect concerns the decrease of the optimal CW_min value as the arrival rate decreases. The optimal CW_min equals 32 in the saturation case, for N = 5 stations, while for λ = 3·10² packets/sec it is equal to 8, and for λ = 10² it falls to 4, as shown in Fig. 3.
Fig. 3. Mean channel access delay for different CW_min values and load conditions (N = 5 stations, m = 5): E[D_acc] (ms) versus CW_min from 4 to 1024, for saturation, λ = 3·10², and λ = 10² (best approximation and simulation). Filled points refer to cases for which Model 2 was more accurate.
5 Concluding Remarks
We have examined modeling approximations for the 802.11 protocol in non-saturation conditions, with respect to the accuracy of performance evaluation measures. A regenerative ON/OFF structure was used to calculate more accurately the actual throughput experienced by a station. This can be used as a simple extension to the load model, or as a stand-alone model in low load conditions. These models can also be extended to the case of multiple service classes suggested in 802.11e, based on modifications of the saturation analysis shown, e.g., in [3, 7].
References 1. Bianchi, G.: Performance analysis of the IEEE 802.11 distributed coordination function. IEEE J. Select. Areas Commun. 18(3) (March 2000) 535–547 2. Kumar, A., Altman, E., Miorandi, D., Goyal, M.: New insights from a fixed-point analysis of single cell IEEE 802.11 WLANs. In: Proc. IEEE Infocom 2005, Miami, FL, USA (March 2005) 1550–1561 3. Ramaiyan, V., Kumar, A., Altman, E.: Fixed point analysis of single cell IEEE 802.11e WLANs: Uniqueness, multistability, and throughput differentiation. In: Proc. ACM Sigmetrics 2005, Banff, Canada (June 2005) 109–120 4. Clifford, P., Duffy, K., Foy, J., Leith, D., Malone, D.: Modeling 802.11e for data traffic parameter design. In: Proc. IEEE WiOpt 2006, Boston, MA, USA (April 2006) 28–37 5. Engelstad, P., Østerbø, O.: Non-saturation and saturation analysis of IEEE 802.11e EDCA with starvation prediction. In: Proc. ACM/IEEE MSWiM 2005, Montreal, Quebec, Canada (October 2005) 224–233 6. Ergen, M., Varaiya, P.: Throughput analysis and admission control for IEEE 802.11a. Mobile Networks and Applications 10(5) (2005) 705–716
7. Hegde, N., Proutière, A., Roberts, J.: Evaluating the voice capacity of 802.11 WLAN under distributed control. In: Proc. IEEE LANMAN 2005, Chania, Greece (September 2005) 8. IEEE Computer Society: IEEE Std 802.11. (1999) 9. IEEE Computer Society: IEEE Std 802.11a-1999 (R2003). (2003) 10. Wolff, R.: Stochastic Modeling and the Theory of Queues. Prentice-Hall, Englewood Cliffs, NJ (1989)
Appendix: ON/OFF Arrivals

Let X_{ON}, X_{OFF} denote the durations of the ON and OFF periods, respectively. We will calculate the mean number of frames sent successively in an ON period, m_{ON}, and the mean number of attempt opportunities in an OFF period, m_{OFF}, relying only on the expectations of X_{ON}, X_{OFF}. If M packets are sent in an ON period, the total time to send these is X_{ON} = \sum_{j=1}^{M} D_j^{acc}. Employing Wald's theorem, we have

m_{ON} = \frac{E[X_{ON}]}{E[D^{acc}]}.   (15)

Now suppose L generic slots occur in an OFF period, the total duration of which is T_{OFF} = \sum_{j=1}^{L} S_j^{(N-1)}. Then T_{OFF} = X_{OFF} + R_L, where R_L is the residual time of the last generic slot which began before the end of the OFF period. If we assume that the end of the OFF period occurs randomly within the last generic slot, we approximately have E[R_L] = \frac{E[(S^{(N-1)})^2]}{2E[S^{(N-1)}]} (see e.g., [10]). Then from T_{OFF} = X_{OFF} + R_L, taking expectations on both sides, dividing by E[S^{(N-1)}], and employing Wald's theorem, we have

m_{OFF} = \frac{E[X_{OFF}]}{E[S^{(N-1)}]} + \frac{E[(S^{(N-1)})^2]}{2(E[S^{(N-1)}])^2}.   (16)

It is noted that in the case where X_{OFF} is considerably larger than the time of a generic slot, the residual term can be removed without significant error. Following the notation in Sect. 3.2, the attempt probability for a station writes

p = \frac{E[R]}{E[X]} = \frac{m_{ON} \cdot \frac{1}{1-c}}{m_{ON} \cdot E[\bar{X}] + m_{OFF}}.   (17)

As always, the collision probability is c = 1 - (1-p)^{N-1}, so that a system of equations is constructed to solve for p, c.
Performance and Equilibrium Analysis of Heterogeneous IEEE 802.11 Based WLANs

Hao Zhu

Telecommunication and Information Technology Institute, Florida International University, 10555 W. Flagler Street, Miami, FL 33174
[email protected]
Abstract. This paper presents a general model to study the medium access control (MAC) layer performance and equilibrium of WLANs consisting of nodes with different MAC parameters (e.g., backoff window size). Our model can be used in general 802.11-based WLANs since it captures the important factors such as non-uniform backoff behavior, channel errors and unsaturated traffic. We first formalize the performance of 802.11 based MAC protocols with simultaneous fixed point equations. We then derive the average per-flow service time of each flow, and apply it to calculate throughput. More importantly, based on our model, we use interval analysis to formally study the existence and uniqueness of the network’s equilibrium, and find the sufficient condition for the uniqueness of equilibrium. We validate our model through simulations and the simulation results show that the model is quite accurate.
1 Introduction
With their low cost and high data rates, IEEE 802.11 based wireless local area networks (WLANs) have been widely deployed in residences, hotels, hospitals and other public areas as a communication infrastructure for high-speed wireless Internet access. The core technology of 802.11 WLANs follows the medium access control (MAC) and physical layer (PHY) specifications [1] finalized by the IEEE 802.11 working groups. In these specifications, the building block of the medium access control mechanism is called the distributed coordination function (DCF). DCF is a random access scheme and relies on carrier sense multiple access with collision avoidance (CSMA/CA). A binary exponential backoff mechanism is used for data retransmission upon a collision or transmission failure. The IEEE 802.11 working group has proposed a family of 802.11 MAC protocols such as 802.11 b/g/e [1,2] to improve the system performance in different aspects (e.g., channel capacity, service differentiation). Therefore, it is expected that future WLANs may need to accommodate nodes running heterogeneous 802.11 based MAC protocols. In order to gain deep insights into the performance of heterogeneous 802.11 based WLANs, it is critical to develop a general model for 802.11 based WLANs considering the coexistence of heterogeneous MAC parameters, since
they play important roles in determining the bandwidth share of each node and the bandwidth utilization of the network. There have been a number of studies focusing on performance models of DCF. Bianchi [3] modeled the binary exponential backoff under saturated traffic conditions as a two-dimensional discrete Markov chain. A similar Markovian technique has subsequently been adopted by many other works [4,5] in modeling WLANs under unsaturated traffic. Carvalho and Garcia-Luna-Aceves [6] modeled the packet service time in single-hop WLANs, given that the state probabilities of a node's backoff operation are known; with the results from [3], the first two moments of the service time were studied. Kumar et al. [7] proposed a simpler model by viewing the backoff procedure as a renewal process. Ramaiyan et al. [8] refined this approach to study the stability of the network in 802.11e WLANs [2] under saturated traffic conditions and gave sufficient conditions for the existence and uniqueness of equilibrium. Kim and Hou [9] modeled the service time in large-scale single-hop WLANs for fast simulations. Medepalli and Tobagi [10] applied average cycle time analysis to model the average service time in wireless ad hoc networks; their model captures the impacts of unsaturated traffic and channel errors among contending nodes. However, the above-mentioned works are not sufficient to model the performance of heterogeneous 802.11 based WLANs under both saturated and unsaturated traffic. In this paper, we present a general model of heterogeneous 802.11 based WLANs. The model is fairly simple and quite accurate. Our work differs from previous works in at least one of the following three aspects (especially the third): first, our model captures the performance of general 802.11 based WLANs, since we take into account channel errors and realistic network operating scenarios (e.g., unsaturated traffic). Second, our model can be used to study the performance of heterogeneous 802.11 WLANs in which each node may have different system parameters (e.g., back-off window size). Third, we theoretically study the sufficient condition for the uniqueness of equilibrium in a general heterogeneous 802.11 WLAN; our sufficient condition is a superset of that in [8]. We first model the performance of the MAC protocol using fixed point analysis [7] based on renewal process theory [11]. The time-average performance of the network can be obtained by solving simultaneous non-linear equations. Then, we apply interval analysis [12] to study the sufficient condition for the uniqueness of the root, which indicates that the network will be in a unique equilibrium. A previous study [7] of homogeneous 802.11 WLANs has shown that the network has a unique equilibrium if the mean backoff window sizes at different backoff stages form a non-decreasing sequence. Through our analysis, we prove that this sufficient condition still holds for a general heterogeneous 802.11 WLAN. We verify our model through simulations, and the results show that our model is able to accurately capture the average behavior of the network. The rest of the paper is organized as follows. Section 2 states the system model. Section 3 describes the details of the proposed analytical model. We evaluate our model in Section 4. Section 5 concludes this paper.
2 System Model
We study a single-cell WLAN with a fixed number n of contending nodes that access the same wireless channel following 802.11 DCF [1]. For simplicity, these nodes are within the transmission range of each other, and there is no hidden terminal problem. We consider a general traffic situation for each node. In particular, node i can send a packet to any other node in the network, and its aggregate traffic rate is λ_i (in packets per slot). In order to capture the impact of unsaturated traffic sources, we assume that each node has a probability that its queue is non-empty, denoted by ρ_i. Supposing the average service time for a packet once it becomes the head of the queue is E(S_i) (in slots), according to Little's Theorem [11] we have

ρ_i = λ_i\, E(S_i)   (1)
Queueing theory [11] has shown that the average packet delay of node i increases steeply with ρ_i. According to delay requirements, we assume node i operates with a targeted ρ_i by controlling λ_i based on Eq. (1). The control is deterministic if E(S_i) can be uniquely determined by (ρ_1, ..., ρ_n); we will prove that this assumption actually holds. According to the traffic and channel condition statistics, we assume the mean bit error rate of node i is ber_i and that data packets sent by node i have a mean length of DATA_i. We assume the RTS/CTS handshaking mechanism is always switched on; as shown in [3,9], there are no significant differences in modeling the two access mechanisms. We also consider heterogeneous MAC parameters for each node. Even though the results in [8] can be used to model the impact of the arbitration inter-frame space (AIFS) [2], it has been indicated in [8] that differentiating AIFS or the exponential component p would starve low-priority traffic as the load of the system increases. On the contrary, using the initial back-off window can produce a bounded throughput ratio between different classes of traffic. Thus, we simply focus on the case in which the nodes may have heterogeneous backoff parameters. Specifically, the backoff parameters for node i are:

– K_i: at the (K_i+1)-th attempt, the impending packet of node i either succeeds or is discarded;
– b_{i,k}: the mean backoff (in slots) at the k-th attempt for an impending packet of node i, where 0 ≤ k ≤ K_i.

Other parameters not mentioned are the same for all nodes. Following the design principle of standard IEEE 802.11 DCF [1], we assume that b_{i,k}, k ≥ 0, is a non-decreasing sequence.
3 The Proposed Model

In this section, we first introduce the proposed model and then study the network's equilibrium. To keep the presentation clear, we omit the inter-frame spaces, since they are much shorter than the transmission times of control and data packets.
3.1 Modeling the Attempt Rate
Suppose node i sends a data packet DATA_i with bit error rate ber_i; the probability of a transmission failure, denoted by p_{e,i}, can be calculated as:

p_{e,i} = 1 - (1 - per_i(RTS))(1 - per_i(CTS))(1 - per_i(DATA_i))(1 - per_i(ACK))   (2)

where the packet error probability per_i(p) is defined by:

per_i(p) = 1 - (1 - ber_i)^{p.length}   (3)

When a transmission failure happens at node i, the amount of bandwidth wasted, denoted by α_i, can be calculated as:

α_i = tx(RTS) + tx(CTS) + (1 - c_i)(tx(DATA_i) + tx(ACK))   (4)

where tx(p) is the total transmission time of packet p and

c_i = \frac{per_i(RTS) + per_i(CTS)}{p_{e,i}}   (5)
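Eqs. (2) and (3) are straightforward to compute. In the sketch below, the control-frame lengths (in bits) are illustrative 802.11 values assumed for the example, not taken from the paper:

```python
def per(ber, length_bits):
    """Packet error probability, eq. (3): independent bit errors."""
    return 1.0 - (1.0 - ber) ** length_bits

def p_e(ber, data_bits, rts_bits=160, cts_bits=112, ack_bits=112):
    """Transmission-failure probability of eq. (2)."""
    ok = ((1 - per(ber, rts_bits)) * (1 - per(ber, cts_bits))
          * (1 - per(ber, data_bits)) * (1 - per(ber, ack_bits)))
    return 1.0 - ok
```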
Fig. 1. The aggregate attempt process of node i: renewal cycles of backoff periods B_i^0, B_i^1, ..., ending in collisions, transmission errors, or a success
Whenever the channel is occupied due to either data transmissions or collisions, each node freezes its backoff timer until the channel becomes available again. Therefore, similarly to [7], we can remove all busy periods of the channel and study the attempt rate of each node according to the aggregate attempt process in the idle periods of the channel. As shown in Figure 1, the aggregate attempt process of a node can be seen as a renewal process in which the reward is the number of attempts in each renewal cycle. Since each node enters the next backoff stage when it has a collision or transmission failure, supposing the conditional collision probability seen by node i when it attempts to access the channel is p_{c,i}, the conditional probability that an attempt of the node fails, denoted by γ_i, follows:

γ_i = (1 - p_{e,i})\, p_{c,i} + p_{e,i}   (6)

With the decoupling assumption introduced by Bianchi [3] and the renewal reward theorem, the attempt rate of node i, denoted by β_i, is given by:

β_i = \frac{\sum_{k=0}^{K_i} γ_i^{k}}{\sum_{k=0}^{K_i} b_{i,k}\, γ_i^{k}}   (7)

where b_{i,k} is the mean k-th backoff period of node i.
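Eq. (7) maps directly to code; a small illustrative helper:

```python
def attempt_rate(gamma, b, K):
    """Attempt rate beta_i of eq. (7).

    b is the sequence of mean backoffs b_{i,0..K} (in slots) and gamma
    the per-attempt failure probability gamma_i of eq. (6).
    """
    num = sum(gamma ** k for k in range(K + 1))
    den = sum(b[k] * gamma ** k for k in range(K + 1))
    return num / den
```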
3.2 Modeling the Service Time
Before studying the existence and uniqueness of equilibrium, we first model the service time of node i, denoted by E(S_i). Equation (7) only describes the saturation attempt probability. Given the non-empty queue probability of node i, the actual attempt probability of node i is equal to ρ_i β_i. Thus the conditional collision probability of node i can be calculated by:

p_{c,i} = 1 - \prod_{j=1, j\neq i}^{n} (1 - ρ_j β_j)   (8)
Fig. 2. The Illustration of Channel Activities: renewal cycles on the channel, each consisting of an idle period followed by a transmission or a collision
As shown in Figure 2, the aggregate channel activity can be viewed as a renewal process, each cycle of which contains an idle period followed by a collision or a transmission. For node i, the average service time E(S_i) is the average delay for a packet from the time it reaches the head of the queue at node i to the time the packet is transmitted or dropped. For the aggregate channel activity, the mean renewal time is the mean channel idle time plus the mean busy time for a transmission or a collision. We then study E(S_i) conditioned on node i having a packet in its queue. In this situation, the probability that a slot is idle, denoted by P_{idle,i}, is

P_{idle,i} = (1 - β_i) \prod_{j=1, j\neq i}^{n} (1 - ρ_j β_j)   (9)
The channel idle time is thus geometrically distributed with parameter 1 - P_{idle,i}, so the mean channel idle time seen by node i follows (note that we use a slot as the time unit):

E(t_{idle,i}) = \frac{1}{1 - P_{idle,i}}   (10)
When the idle period ends, an attempt happens and the channel becomes busy due to a transmission or a collision. The probability of node k performing an attempt, as seen by node i, follows:

P_a(k,i) = \begin{cases} \dfrac{β_i}{1 - (1-β_i)\prod_{j=1,j\neq i}^{n}(1-ρ_j β_j)}, & k = i \\ \dfrac{ρ_k β_k}{1 - (1-β_i)\prod_{j=1,j\neq i}^{n}(1-ρ_j β_j)}, & \text{else} \end{cases}   (11)
With Eq. (8), given the conditional collision probability p_{c,k}, the probability that node k's attempt is collision free, as seen by node i and denoted by P_{tr}(k,i), is equal to:

P_{tr}(k,i) = \begin{cases} P_a(i,i)(1 - p_{c,i}), & k = i \\ P_a(k,i)(1 - p_{c,k})\,\dfrac{1-β_i}{1-ρ_i β_i}, & \text{else} \end{cases}   (12)
When the channel is busy, the collision probability of the system seen by node i, denoted by P_{coll}(i), is equal to:

P_{coll}(i) = 1 - \sum_{k=1}^{n} P_{tr}(k,i)   (13)
For the attempt process of node i, node i's attempts can be separated into renewal cycles, each consisting of an idle period followed by a busy period. The mean channel busy time seen by node i before it obtains its transmission chance follows:

E(t_{busy,i}) = \sum_{k=1}^{n} P_{tr}(k,i)\, δ_k + P_{coll}(i)\, T_c   (14)

where

δ_k = (1 - p_{e,k})\, tx(DATA_k) + p_{e,k}\, α_k   (15)

α_k is defined by Eq. (4), and T_c is the mean time the channel is sensed busy during a collision, equal to tx(RTS) + EIFS. The mean number of renewal cycles between node i's consecutive attempts follows a geometric distribution with parameter P_a(i,i). Thus, the mean time period between node i's consecutive attempts is equal to (E(t_{idle,i}) + E(t_{busy,i}))/P_a(i,i). Considering collisions and channel errors, the number of attempts node i needs for a successful transmission follows a geometric distribution with parameter 1 - γ_i. Since each packet at node i will be dropped after K_i + 1 transmissions, the adjusted mean number of attempts for a packet follows \sum_{k=0}^{K_i} γ_i^{k} = \frac{1-γ_i^{K_i+1}}{1-γ_i}. With L'Hôpital's rule for the 0/0 case, \lim_{γ_i \to 1} \frac{1-γ_i^{K_i+1}}{1-γ_i} = K_i + 1. Therefore, the mean service time of node i can be obtained by:

E(S_i) = \left(\frac{1-γ_i^{K_i+1}}{1-γ_i}\right)\left(\frac{E(t_{idle,i}) + E(t_{busy,i})}{P_a(i,i)}\right)   (16)
We can use the service time to derive throughput. With queueing theory [13], the throughput of node i, denoted by T_i, follows:

T_i = \frac{ρ_i\, DATA_i\, (1 - p_{d,i})}{E(S_i)\, δ}   (17)

where δ is the length of a time slot, and p_{d,i} reflects the impact of packet dropping:

p_{d,i} = 1 - \sum_{k=0}^{K_i} (1 - γ_i)\, γ_i^{k} = γ_i^{K_i+1}   (18)
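Eqs. (16) through (18) combine into a direct per-node throughput computation; a small sketch:

```python
def node_throughput(rho_i, data_bits, gamma_i, K_i, E_S_slots, slot_seconds):
    """Throughput T_i of eq. (17), with p_d from eq. (18), in bits/s."""
    p_d = gamma_i ** (K_i + 1)   # eq. (18): packet-drop probability
    return rho_i * data_bits * (1 - p_d) / (E_S_slots * slot_seconds)
```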
3.3 The Existence of Equilibrium

Let \Gamma(p_{c,i}) = 1 - \prod_{j=1, j\neq i}^{n} (1 - ρ_j G_j(p_{c,j})), for 1 ≤ i ≤ n, where G_i(p_{c,i}) = β_i. We can write these n equations compactly in the following form of simultaneous fixed point equations:

(p_{c,1}, p_{c,2}, ..., p_{c,n}) = (\Gamma(p_{c,1}), \Gamma(p_{c,2}), ..., \Gamma(p_{c,n}))   (19)
Definition 1. Given ρ = (ρ_1, ..., ρ_n), the network is in equilibrium if the conditional collision probability vector p_c = (p_{c,1}, ..., p_{c,n}) satisfies the simultaneous equations (19).

Because Eqs. (19) are non-linear, the network will be in multiple equilibria if the equations have multiple roots. In contrast, the network stays in a steady-state equilibrium if the equations have a unique root. The steady-state equilibrium can be directly used to describe the network's time-average performance¹, which is of fundamental interest. Since ρ and p_c are continuous real vectors in [0,1]^n, Eqs. (19) define a continuous mapping from [0,1]^n to [0,1]^n. Hence, by Brouwer's fixed point theorem, there exist fixed points in [0,1]^n for Eqs. (19). Many existing algorithms can be used to calculate the roots of Eqs. (19); for example, quasi-Newton algorithms (e.g., the Broyden algorithm [14]) can solve the equations efficiently. During our study, the equations were solved quite fast and the root was found in fewer than 7 iterations.
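As an illustration of the numerical root-finding just mentioned, the residual of Eqs. (19) can be handed to any standard solver. The sketch below uses SciPy's fsolve rather than the Broyden variant cited in the text, and G is a caller-supplied list of functions implementing G_i; this interface is a hypothetical convenience, not from the paper:

```python
import numpy as np
from scipy.optimize import fsolve

def residual(pc, rho, G):
    """Residual of the fixed-point system (19): pc - Gamma(pc)."""
    n = len(pc)
    out = np.empty(n)
    for i in range(n):
        prod = 1.0
        for j in range(n):
            if j != i:
                prod *= 1.0 - rho[j] * G[j](pc[j])
        out[i] = pc[i] - (1.0 - prod)
    return out

# usage sketch: pc_root = fsolve(residual, 0.1 * np.ones(n), args=(rho, G))
```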
3.4 Uniqueness of Equilibrium
The network has a unique equilibrium if and only if Eqs. (19) have a unique root. At first glance, the uniqueness of the solution might be affected by the system parameters, the BERs, and the traffic arrival rates λ. Through detailed analysis, we can show that the uniqueness of equilibrium is related only to the settings of the backoff parameters (i.e., b_{i,k}). In the following sub-sections, we give the details of the analysis.

Interval Extensions of Rational Functions. We start our analysis by introducing the basics of interval analysis. A real interval A is the bounded, closed set of real numbers defined by

A = [\underline{a}, \overline{a}] = \{x \in \mathbb{R} \mid \underline{a} \le x \le \overline{a}\}   (20)

where \underline{a}, \overline{a} \in \mathbb{R} and \underline{a} \le \overline{a}. When \underline{a} = \overline{a}, A is called a singleton. An interval vector V in I(\mathbb{R}^n) has n components, each of which is an interval, V_i \in I(\mathbb{R}), i = 1, ..., n. For A, B \in I(\mathbb{R}), the operators +, -, *, / are defined as follows:

A + B = [\underline{a} + \underline{b},\ \overline{a} + \overline{b}]
A - B = [\underline{a} - \overline{b},\ \overline{a} - \underline{b}]
A * B = [\min(\underline{a}\underline{b}, \underline{a}\overline{b}, \overline{a}\underline{b}, \overline{a}\overline{b}),\ \max(\underline{a}\underline{b}, \underline{a}\overline{b}, \overline{a}\underline{b}, \overline{a}\overline{b})]
A / B = A * [1/\overline{b},\ 1/\underline{b}]

¹ The time average is the long-term time average, and does not necessarily prevent the short-term unfairness problem found in [8].
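These interval operations are mechanical to implement; the following minimal self-contained Python sketch (our illustration, not from the paper) mirrors the definitions above:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        ps = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(ps), max(ps))

    def __truediv__(self, o):
        assert o.lo > 0 or o.hi < 0, "divisor interval must not contain 0"
        return self * Interval(1.0 / o.hi, 1.0 / o.lo)
```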
For an interval matrix A with interval coefficients [\underline{a}_{ij}, \overline{a}_{ij}], its norm follows:

\|A\| = \max_i \sum_{j=1}^{n} \max(|\underline{a}_{ij}|, |\overline{a}_{ij}|)
As stated in [15], given a rational function f(x), x \in \mathbb{R}^n, its interval extension F(X), X \subseteq I(\mathbb{R}^n), can be obtained by simply replacing the real operations by interval operations and the variables by intervals. One simple example is f(x) = x(1-x); F(X) = X * (1-X).

Testing the Uniqueness of Solution. We apply the interval fixed point theorem to get the sufficient condition for a unique solution of Eqs. (19). In particular, we are given an initial x_0 \in X, where X is an interval vector. The Krawczyk operator [16] is defined as:

K(X, x_0) = x_0 - Y f(x_0) + (I - Y J(X))(X - x_0)   (21)
where Y is an arbitrary nonsingular matrix and J(X) is the interval extension of the Jacobian of F(x), denoted by J(x). Specifically, J_{ij}(X) is obtained by applying interval extension to J_{ij}(x). Krawczyk [16] has shown that if X has a solution to F(x) = 0, where F(.) is a function, then so does K(X, x_0). According to Moore's theorem [15], the sufficient condition for the uniqueness of the solution of F(x) = 0 is as follows:

\|I - Y J(X)\| < 1   (22)
where I is the identity matrix with the same dimension as Y J(X), and the norm of an interval matrix with coefficients [\underline{a}_{ij}, \overline{a}_{ij}] is the one defined above. The reason is that, when Eq. (22) holds, the Krawczyk operator performs a contractive mapping in X and guarantees a unique root. In order to guarantee the uniqueness of the solution, we need to find a nonsingular matrix Y which satisfies condition (22). Suppose we have such a nonsingular matrix; we relax condition (22) to reduce the computational complexity. Specifically,

\|I - Y J(X)\| = \|Y (Y^{-1} - J(X))\|.

Since all matrix norms satisfy the submultiplicative property \|AB\| \le \|A\|\,\|B\|, letting Y := Y^{-1} we have

\|Y^{-1}(Y - J(X))\| \le \|Y^{-1}\|\,\|Y - J(X)\|.

Since \|I\| = 1, to satisfy \|Y^{-1}\|\,\|Y - J(X)\| < 1, we obtain

\|Y^{-1}\|\,\|Y - J(X)\| < \|Y^{-1}\|\,\|Y\| \Rightarrow \|Y - J(X)\| < \|Y\|   (23)

Therefore, we need to study whether we can find a nonsingular matrix Y that satisfies Ineq. (23).
The Jacobian J(x) is calculated by:

J_{ij}(x) = \begin{cases} -1, & i = j \\ \dfrac{\partial \Gamma(x_i)}{\partial x_j}, & \text{else} \end{cases}   (24)

\partial \Gamma(x_i)/\partial x_j is calculated as follows:

\frac{\partial \Gamma(x_i)}{\partial x_j} = \prod_{k=1, k\neq i,j}^{n} (1 - ρ_k G_k(x_k))\; ρ_j\, G'_j(x_j)   (25)
Since J_{ij}(x) is a rational function, we can easily apply interval extension to J_{ij}(x) over the interval [0,1]. We have G([0,1]) = [G(1), G(0)], and hence

\prod_{k=1, k\neq i,j}^{n} (1 - ρ_k G([0,1])) = \left[\prod_{k=1, k\neq i,j}^{n} (1 - ρ_k G(0)),\ \prod_{k=1, k\neq i,j}^{n} (1 - ρ_k G(1))\right]   (26)
Since G_i(\cdot) resides in [0,1] and is a monotonically decreasing function [7], provided that b_{i,k} is a non-decreasing sequence, we have G'_i(\cdot) < 0. Thus,

G'_i([0,1]) = \frac{(1-p_{e,i})\, ρ_i \prod_{k=1, k\neq i,j}^{n}(1-ρ_k G_i(1))}{\left(\sum_{k=0}^{K_i} b_{i,k}\, p_{e,i}^{k}\right)^{2}} * \left[\frac{K_i(K_i+1)}{2}\sum_{k=1}^{K_i} k\, b_{i,k}\, p_{e,i}^{k-1} - \sum_{k=0}^{K_i} p_{e,i}^{k}\sum_{k=0}^{K_i} b_{i,k},\ 0\right]   (27)
Therefore, J([0,1]) follows:

J_{ij}([0,1]) = \begin{cases} [-1, -1], & i = j \\ [\underline{a}_{ij}, \overline{a}_{ij}], & \text{else} \end{cases}   (28)
where

[\underline{a}_{ij}, \overline{a}_{ij}] = G'_i([0,1]) * \left[\prod_{k=1, k\neq i,j}^{n} (1 - ρ_k G(0)),\ \prod_{k=1, k\neq i,j}^{n} (1 - ρ_k G(1))\right]   (29)
Theorem 1. For each node i, if b_{i,k} is non-decreasing, then the network has a unique equilibrium, and the corresponding λ is also unique.

Proof. The existence of equilibrium has been proven in Section 3.3. Because all parameters are finite and G(\cdot) is a non-increasing function, the intervals [\underline{a}_{ij}, \overline{a}_{ij}], 1 \le i, j \le n, are compact, with \underline{a}_{ij} < \overline{a}_{ij} = 0. Given J([0,1]), we can always construct a nonsingular matrix Y following the algorithm in Table 1.
Table 1. The algorithm for constructing the matrix Y

k = \arg\max_{1 \le i \le n} \sum_{j=1}^{n} |\underline{a}_{ij}|;   /* note that \underline{a}_{ij} < 0 */
Arbitrarily select a real positive number τ; the matrix Y is then defined by:

Y_{ij} = \begin{cases} -1, & i = j \\ \underline{a}_{kj} - τ, & i = k \wedge i \neq j \\ 0, & \text{else} \end{cases}   (30)
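The construction in Table 1 is easy to realize in code; a sketch under the assumption that the lower bounds \underline{a}_{ij} are supplied as a matrix:

```python
import numpy as np

def construct_Y(a_lo, tau=0.1):
    """Construct the nonsingular matrix Y of Table 1 / eq. (30).

    a_lo is the n x n matrix of lower interval bounds (negative
    off-diagonal entries). tau is an arbitrary positive number.
    """
    n = a_lo.shape[0]
    k = int(np.argmax(np.abs(a_lo).sum(axis=1)))  # row with largest |a| sum
    Y = -np.eye(n)
    for j in range(n):
        if j != k:
            Y[k, j] = a_lo[k, j] - tau
    return Y
```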
Applying the algorithm in Table 1, it is obvious that Y exists, since the intervals [\underline{a}_{ij}, \overline{a}_{ij}], 1 \le i, j \le n, are compact. In addition, Y is nonsingular, since \det(Y) = (-1)^n. Applying Y to the left-hand side of Ineq. (23), we have:

\|Y - J([0,1])\| = \sum_{j=1, j\neq k}^{n} |\underline{a}_{kj} - τ| = \|Y\| - 1 < \|Y\|

Therefore, Ineq. (23) is satisfied and, given ρ, Eqs. (19) have a unique root p_c. As a result, the network equilibrium is unique. In addition, according to the analysis in Section 3.2, the service time of node i, E(S_i), is uniquely decided by ρ and p_c. Recalling Eq. (1), the mapping from ρ to λ is one-to-one, which guarantees that the flow control vector λ is also unique.
4 Model Validations
We validate our model by comparing analytical results to simulation results. The simulations are performed using ns-2 [17]. Without loss of generality, the packet size is assumed to be 1000 bytes and the transmission rate of each node is assumed to be 2 Mbps with a certain BER. Due to space limitations, we study a homogeneous network where each node follows the standard 802.11b specifications.
Fig. 3. The average throughput: per-flow throughput (Kbps) versus number of nodes (10 to 50), our model versus simulation: (a) error-free channel, (b) error-prone channel

In order to study both saturated and unsaturated traffic scenarios, ρ_i is defined by min(0.05·n, 1.0), where n is the number of nodes. We evaluate the average per-flow throughput in two scenarios, error-free and error-prone. When the channel is error-prone, the BER of each flow is assumed to be 10^{-5}. We evaluate the average throughput as a function of the number of nodes. The results are shown in Figure 3 (a) and (b). As shown in the figure, our model is able to accurately capture the average throughput in the system. Note that when the number of nodes is greater than or equal to 20, every flow becomes saturated.
5 Conclusion
In this paper, we have presented a simple and accurate analytical model for the MAC performance of heterogeneous IEEE 802.11 based WLANs. We take into account many practical factors (such as channel errors and unsaturated traffic) which may have a significant impact on network performance. With fixed point analysis, we first model the performance of the network with a set of simultaneous non-linear equations. Then, the service time of each flow is modeled and further used to obtain the average throughput. We then theoretically examine the equilibrium of a general heterogeneous WLAN by studying the properties of the corresponding simultaneous non-linear equations, and we find a sufficient condition for the uniqueness of the equilibrium. Our model is validated through simulations, and the results show that it is quite accurate in modeling the time-average performance of the network. As future work, we will extend our model to 802.11-based multi-hop wireless ad hoc networks and study the existence and uniqueness of equilibrium there as well.
References

1. IEEE: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Spec. IEEE 802.11 standard (1999)
2. 802.11 Work Group: Draft Supplement to Standard for Telecommunications and Information Exchange between Systems - LAN/MAN Specific Requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Medium Access Control (MAC) Enhancements for Quality of Service (QoS). IEEE 802.11e Draft 3.1 (2002)
3. Bianchi, G.: Performance analysis of the IEEE 802.11 distributed coordination function. IEEE Journal on Selected Areas in Communications 18(3) (2000) 535-547
4. Ergen, M., Varaiya, P.: Throughput analysis and admission control for IEEE 802.11a. Mobile Networks and Applications 10(5) (2005) 705-716
5. Garetto, M., Chiasserini, C.F.: Performance analysis of 802.11 WLANs under sporadic traffic. In: 4th International IFIP-TC6 Networking Conference (NETWORKING 2005) (2005)
6. Carvalho, M., Garcia-Luna-Aceves, J.: Delay analysis of IEEE 802.11 in single-hop networks. In: IEEE ICNP'03 (2003)
7. Kumar, A., Altman, E., Miorandi, D., Goyal, M.: New insights from a fixed point analysis of single cell IEEE 802.11 WLANs. In: IEEE INFOCOM'05 (2005)
8. Ramaiyan, V., Kumar, A., Altman, E.: Fixed point analysis of single cell IEEE 802.11e WLANs: uniqueness, multistability and throughput differentiation. In: ACM SIGMETRICS'05 (2005)
9. Kim, H., Hou, J.C.: A fast simulation framework for IEEE 802.11-operated wireless LANs. In: ACM SIGMETRICS'04 (2004)
10. Medepalli, K., Tobagi, F.A.: Towards performance modeling of IEEE 802.11 based wireless networks: a unified framework and its applications. In: IEEE INFOCOM'06 (2006)
11. Ross, S.M.: Introduction to Probability Models (7th Ed.). Academic Press (2000)
12. Neumaier, A.: Interval Methods for Systems of Equations. Cambridge University Press (1990)
13. Gross, D., Harris, C.M.: Fundamentals of Queueing Theory (Wiley Series in Probability and Statistics). Wiley-Interscience, 3rd edition (1998)
14. Broyden, C.G.: A class of methods for solving nonlinear simultaneous equations. Mathematics of Computation 19 (1965) 577-593
15. Moore, R.E.: Interval Analysis. Prentice-Hall (1966)
16. Krawczyk, R.: Newton-Algorithmen zur Bestimmung von Nullstellen mit Fehlerschranken. Computing (Springer Wien) 4(3) (1969) 187-201
17. VINT Group: Network simulator version 2 (ns-2). http://www.isi.edu/nsnam/ns/ (1999)
Exploring a New Approach to Collision Avoidance in Wireless Ad Hoc Networks

Jun Peng¹ and Liang Cheng²

¹ Department of Electrical Engineering, University of Texas - Pan American, Edinburg, TX 78541 USA, [email protected]
² Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA 18015 USA, [email protected]
Abstract. We propose in this paper a new approach, based on bit-free control frames, to improving the performance of the IEEE 802.11 DCF. Basically, a bit-free control frame does not contain any meaningful bits; instead, the control information is encoded in its length (i.e., its airtime). This new approach has two advantages over traditional control frames. First, the airtime of a bit-free frame is easy to detect and robust against channel effects. Second, bit-free control frames can be very short because no headers or preambles are needed for them. Our investigation demonstrates that the new approach improves the performance of the IEEE 802.11 DCF significantly (network throughput gains from fifteen to more than one hundred percent).
1 Introduction
The hidden terminal phenomenon in wireless packet networks is interesting but problematic. Basically, even if two nodes in a wireless network cannot sense each other, they may still cause collisions at each other's receivers [1]. If the hidden terminal problem is not well addressed, a wireless network may suffer significantly degraded performance in every respect, since frequent packet collisions consume all types of network resources, such as energy, bandwidth, and computing power, while generating no useful output. There are basically two existing approaches to the hidden terminal problem. One is the use of an out-of-band control channel for signaling a busy data channel when a packet is in the air [2,4,3]. This approach is effective in dealing with hidden terminals but requires an additional control channel. The more popular approach to the hidden terminal problem is the use of in-band control frames for reserving the medium before a packet is transmitted [5,7,6]. The popular IEEE 802.11 standard [9] uses this approach in its DCF.
The research was partly supported by the Commonwealth of Pennsylvania, Department of Community and Economic Development, through the Pennsylvania Infrastructure Technology Alliance (PITA), and Lehigh University.
Basically, before an IEEE 802.11 node in the DCF mode transmits a packet to another node, it first sends out a Request to Send (RTS) frame after proper backoffs and deferrals. After receiving the RTS frame, the intended receiver responds with a Clear to Send (CTS) frame, which includes a Duration field informing its neighbors to back off during the specified period. In an ideal case, the hidden terminals of the initiating sender will successfully receive the CTS frame and thus not initiate new transmissions while the packet is being transmitted. However, control frames have limited effectiveness in dealing with hidden terminals because they may not be able to reach all the intended receivers due to signal attenuation, fading, or interference [8]. In addition, control frames have considerably long airtimes because they are recommended to be transmitted at the basic link rate in both narrow-band and broadband IEEE 802.11 systems for link rate compatibility among nodes. Moreover, they also usually carry long physical layer preambles and headers. Therefore, in-band control frames still introduce significant network overhead, even though they do not require an out-of-band control channel. This paper explores a new approach, bit-free control frames, to addressing the disadvantages of traditional control frames. Basically, with the new approach, control information is carried by the airtimes instead of the bits of control frames. The airtime of a frame is easy to detect and robust against interference and channel effects. In addition, a bit-free control frame carries no meaningful bits, so no preamble or headers are needed for it (in-band bursts of variable lengths were used for priority scheduling in [10]). To investigate the potential of the new approach, we have modified the IEEE 802.11 DCF by replacing the traditional control frames with bit-free control frames and have done extensive simulations with the modified protocol. Our investigation has shown that the modified protocol improves the average throughput of a wireless network by fifteen percent to more than one hundred percent. The rest of the paper is organized as follows. Section 2 presents our modifications to the IEEE 802.11 DCF. We show in Section 3 the comprehensive simulation results comparing the modified protocol to the original one. Finally, we give our conclusions in Section 4.
2 Applying the New Approach

2.1 Basics
The challenge in applying the new approach to the IEEE 802.11 DCF is the limited capability of the bit-free control frames in carrying control information. Particularly, only the airtime of a control frame can carry control information. To address this issue, we use two basic strategies. One is that the bit-free control frames only carry the indispensable information for medium access control, while the other is to use frame pairs for backoff duration control. For sending bit-free control frames, we assume that the IEEE 802.11 hardware has some modification so that it can be commanded to transmit the carrier for a specified amount of time. We also assume that the airtime of a control
frame can be recorded with a degree of accuracy depending on the hardware, bandwidth, and channel conditions. One protocol parameter, the minimum guard gap between the lengths of two control frames, may be adjusted based on the recording accuracy. In fact, with its carrier sense capability, the existing IEEE 802.11 hardware may record the airtime of an incoming frame. In addition, a bit-free control frame cannot be mistaken for a bit-based frame, since a bit-free frame does not include a physical layer preamble and thus synchronization on the frame cannot be achieved. A bit-based frame, however, may be mistaken for a bit-free frame if synchronization on the frame fails. This kind of interference is usually filtered out due to the typically long airtime of a bit-based frame and the short airtime of a bit-free control frame.
2.2 Bit-Free Control Frames
The frame type needs to be specified for each frame so that the receiver knows how to interpret the bits in the bit-based frame case or the frame airtime in the bit-free frame case. Bit-free frames carry no meaningful bits, so the frame type information can only be delivered by their airtimes. Particularly, if the airtime of a bit-free frame falls into a specified range or ranges, then the frame belongs to the type denoted by that range or ranges. Besides the frame type information, the other indispensable information in an RTS frame is the address of the receiver. The length of a bit-free RTS frame needs to fall into the designated range or ranges. We therefore may not be able to encode the address information of each single receiver into the airtime of a bit-free RTS frame. To address this problem, we apply a "Mod-n" calculation on each receiver address before it is encoded. Basically, we first divide the address by n and then encode the remainder into the frame airtime. Particularly, if

r = Mod(RA, n), then F_L = RTS(r)

where RA is the receiver address, n is an integer, r is the remainder, F_L is the airtime of the bit-free RTS frame to send, and RTS(r) is the r-indexed element in the set of RTS lengths in microseconds. The Duration field in a bit-based RTS frame is also important because it specifies the period during which a receiver of the frame should back off. A bit-free RTS frame does not have the capacity for the duration information. Instead, a receiver of a bit-free RTS frame starts to back off upon receiving the frame and ends the backoff only after the medium has been sensed idle for a specified amount of time (more details later). In our proposed design with bit-free frames, all CTS frames have the same fixed length, which distinguishes them from other bit-free frames. In addition, we use control frame pairs to communicate the backoff duration information of a traditional CTS frame, which will be introduced later. Similarly, all bit-free ACK frames in our design have the same fixed length, which distinguishes them from other types of bit-free frames (the address issue of these frames is discussed in Section 2.5).
In addition to the RTS, CTS, and ACK bit-free frames, we add another type of bit-free control frame, named the CTS-Fail frame, to our design. A CTS-Fail frame has a fixed length and is sent by a CTS frame sender in two cases to notify other nodes to end their backoff. The first case is that a CTS frame sender does not receive any packet within SIFS (Short Interframe Space) plus propagation delays after sending the CTS frame. The second case is that a CTS frame sender receives a packet after sending the CTS frame but finds that either the packet is not intended for it or the packet has errors. In this case, the CTS-Fail frame is sent only after the packet is fully received.
2.3 Frames Working Together
To explain how the four types of bit-free control frames work together in the modified IEEE 802.11 DCF, we describe how a node contends for the medium when it has a packet to transmit (a short code sketch of the address matching follows this description). The IEEE 802.11 DCF is basically a CSMA/CA protocol, and our modifications are only to the CA part. When a node has a packet to transmit, it starts to listen to the channel. If the channel has been found idle for a period of time longer than the DCF Interframe Space (DIFS), the node starts a random backoff timer whose value is uniformly drawn from the node's contention window (CW). If the node detects a carrier before its backoff timer expires, it backs off; otherwise, upon the expiration of its backoff timer, it starts to transmit a bit-free RTS frame. As explained earlier, the airtime of the bit-free frame is determined by the address of the intended receiver. After finishing the transmission, the node waits for a CTS frame, whose airtime is fixed and known. After a neighbor of the initiating sender receives the bit-free RTS frame, it performs the "Mod-n" calculation on its own address and compares the remainder to the length of the received frame in microseconds. If the remainder matches the length, the neighbor sends out a bit-free CTS frame and then waits for a packet. If the CTS frame sender does not receive any packet within a period of SIFS plus propagation delays, it sends out a CTS-Fail frame. On the other hand, if the remainder does not match the length of the received RTS frame, the neighbor enters backoff and remains there until the medium has been sensed idle for a period of SIFS plus either the CTS frame length or the ACK frame length, whichever is longer. After the initiating sender obtains the bit-free CTS frame, it waits for SIFS and then starts to transmit the packet. If for any reason the RTS frame sender fails to obtain the expected CTS frame, the sender starts over to contend for the medium and doubles its CW. On the other hand, if a node receives an unexpected bit-free CTS frame (i.e., the node is not an RTS frame sender), the node increases its CTS frame counter Num_cts by one, starts a backoff monitor timer, and then enters backoff. Such a node exits the backoff in two cases: either its CTS frame counter Num_cts reaches zero when the node decrements the counter by one after receiving an ACK or CTS-Fail frame, or its backoff monitor timer expires (more details later).
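As a quick illustration of the "Mod-n" matching just described, the sketch below pairs the sender-side length selection with the receiver-side check. The concrete length table and all names are illustrative assumptions of ours (the table mirrors the two RTS ranges configured later in Section 3.1).

N_MOD = 20                     # base of the Mod-n calculation (Section 3.1)
# One RTS airtime (microseconds) per remainder, 5 us apart, in two ranges
RTS_LEN_US = [40 + 5 * r for r in range(10)] + [120 + 5 * r for r in range(10)]

def rts_airtime_us(receiver_addr):
    """Sender side: encode the receiver address into an RTS airtime."""
    return RTS_LEN_US[receiver_addr % N_MOD]

def rts_matches_me(my_addr, measured_len_us, guard_us=5):
    """Receiver side: does the sensed RTS airtime match my address?"""
    expected = RTS_LEN_US[my_addr % N_MOD]
    return abs(measured_len_us - expected) < guard_us / 2.0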
After the initiating sender succeeds in contending for the medium, receives the expected CTS frame, and fully transmits the packet, it expects a bit-free ACK frame from the receiver. If the sender does not obtain the expected acknowledgment, it doubles its CW and starts to monitor the channel again for a retransmission. On the other hand, after a node receives the data packet, it checks whether the packet is intended for it and free of errors. If so, the node sends back a bit-free ACK frame. If the packet is not intended for it or has errors, the node checks whether it has sent a CTS frame for the packet; if so, the node sends out a CTS-Fail frame to notify its neighbors to exit backoff. The whole process repeats until the initiating sender obtains an acknowledgment for the packet or the retry limit is reached. The node discards the packet in the latter case and resets its CW to the minimum size in both cases.
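The retry discipline above amounts to standard binary exponential backoff around the bit-free exchange. A minimal sketch follows; the retry limit value is our assumption (the text only refers to "the retry limit"), and attempt_exchange is a hypothetical callback.

import random

CW_MIN, CW_MAX = 32, 1024      # timeslots, as configured in Section 3.1
RETRY_LIMIT = 7                # assumed value; the text leaves it unspecified

def contend_and_send(attempt_exchange):
    """attempt_exchange() abstracts one RTS/CTS/data/ACK round; it should
    return True only when the expected CTS and ACK were both obtained."""
    cw = CW_MIN
    for _ in range(RETRY_LIMIT):
        backoff_slots = random.randint(0, cw - 1)  # drawn from the CW
        # ... wait for DIFS plus backoff_slots of idle medium here ...
        if attempt_exchange():
            return True                            # success: CW resets to CW_MIN
        cw = min(2 * cw, CW_MAX)                   # missing CTS or ACK: double CW
    return False                                   # retry limit reached: drop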
2.4 Some Design Considerations
The first design consideration for the modified MAC protocol is the choice of receive power thresholds for its bit-free control frames. Unlike bit-based frames, bit-free control frames can be correctly received as long as they can be sensed. The receive power threshold for a bit-free control frame may thus be adjusted to control the transmission range of the frame. As introduced earlier, a bit-based CTS frame may not successfully reach all the hidden terminals of the initiating sender [8]. A node with the modified protocol, therefore, needs a lower receive power threshold for bit-free control frames. The lowest power threshold that a node may use for receiving a bit-free control frame is the carrier sense power threshold; in such a case, a node decodes a bit-free frame whenever the frame can be sensed. The implementation in our simulations uses this conservative choice to ensure the coverage of bit-free control frames. However, there is an exception: when a node receives a bit-free RTS frame matching its address, the node responds with a CTS frame only if the received power of the RTS frame is above the receive power threshold for data frames, since the node should not respond if it cannot correctly receive a packet from the other node. Another design consideration for bit-free control frames is the set of lengths, in terms of airtimes, that the frames should use. The basic rule is that control frames should be easy to detect and to distinguish from one another. The shortest control frame in our simulations is 20 μs long and the minimum guard gap between two lengths in the set is 5 μs, which corresponds to a 5-bit airtime at a transmission rate of 1 Mb/s (even in broadband systems such as 802.11g, control frames are recommended to be transmitted at a basic link rate). In reality, the minimum guard gap should be set based on the length detection accuracy of bit-free frames, which may be affected by the hardware, bandwidth, and channel conditions. When choosing the length for a specific control frame that has a fixed length, we need to consider another factor. In particular, when multiple bit-free frames arrive at the same node in the same time segment, they may form a "merged" bit-free frame whose length denotes another defined bit-free control frame.
Fig. 1. Merging of Control Frames (Case 1: bit-based frames merge into corrupted frames; Case 2: bit-free frames merge but still appear as frame B; Case 3: bit-free frames merge into a new frame)
This kind of false control frame may appear when the merged frame has a longer airtime than any individual merging frame, as demonstrated by Case 3 in Fig. 1. The possible adverse effects of the merged frame phenomenon are alleviated by the discrete lengths of the defined control frames and the strict timelines for receiving CTS and ACK frames. Particularly, only when a merged frame matches a defined bit-free control frame could it possibly cause some harm. Moreover, for control frames such as CTS and ACK, a false frame may be harmful only if it emerges in the right timeline and at the right node (if a false ACK appears accidentally at a sender, the lost packet will be recovered by upper layers). However, we may still further address the merged frame phenomenon by carefully choosing the lengths for the fixed-length control frames. We have three types of fixed-length control frames: CTS, ACK, and CTS-Fail. Among them, a false CTS frame would arguably generate the worst scenario, in which the nodes receiving the false frame enter backoff and wait for a non-existent ACK or CTS-Fail frame to exit the backoff. Therefore, to avoid false CTS frames generated by merging frames, we need to assign the CTS frame the shortest length in the chosen length set for control frames. What happens if a false CTS frame emerges anyway, due to a reason such as environmental noise? A backoff monitor timer is used to address this problem. When a node receives a CTS frame, it starts a backoff monitor timer before it enters backoff. The backoff monitor timer is set to a value Tm that is the transmission time of the largest allowable frame in the network. The node exits the backoff in any case when its backoff monitor timer expires. Additionally, the backoff monitor timer also solves the problem of ACK or CTS-Fail frames lost due to interference or failed nodes. In addition, some extra caution is needed when receiving a CTS frame. An RTS frame may be interpreted by two or more nodes as being intended for them, due to the "Mod-n" calculation, and thus two or more bit-free CTS frames may be generated for a single RTS frame. The consequence in such a case is that the received CTS frame may be slightly longer than usual because of the various propagation delays between the RTS frame sender and its receivers (besides, the medium may be reserved over a larger area than necessary in such a case). A degree of tolerance on length variation is therefore needed for decoding a CTS frame. Particularly, if we denote the transmission distance of a node by dtx and
the signal propagation speed by c, then the decoding tolerance δ on the length of a CTS frame should be the maximum possible difference of round-trip times for the receivers, i.e.,

$$\delta = 2 \times \frac{d_{tx}}{c} \qquad (1)$$
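As a quick numeric check with an assumed transmission distance (the value is ours, not the paper's): for dtx = 300 m and c = 3 × 10^8 m/s, Eq. (1) gives δ = 2 × 300 / (3 × 10^8) = 2 μs, comfortably below the 5 μs guard gap used in our simulations.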
More Design Issues
One disadvantage of bit-free control frames is that they carry no specific addresses, so they may be interpreted by any receiver as legitimate. One basic observation, however, is that when an initiating sender is expecting a CTS or ACK frame, it has already notified its neighbors, except the intended receiver, to back off. Therefore, an initiating sender may only receive a CTS or ACK frame from the intended receiver in the general case. Moreover, an initiating sender sets a strict timeline for receiving a CTS or ACK frame. Therefore, an initiating sender can hardly receive a false and harmful CTS or ACK frame, which makes the lack of address information in the CTS and ACK frames almost harmless. There is a special case to consider: two senders may start to transmit their RTS frames almost at the same time. If the two nodes can hear each other, there is usually no harm, since in such a case the sender with a shorter RTS frame will usually detect the other sender after it finishes its RTS frame transmission. If the two senders cannot hear each other, there may exist a harmful situation in which one sender overhears the CTS frame intended for the other and mistakenly starts to transmit its packet. This kind of harmful situation occurs, however, with low probability, because two senders with different RTS frames have different timelines for receiving their CTS frames. If the two RTS frames have the same length, a collision may occur and the subsequent random backoffs will resolve the issue.
3 Scheme Evaluations
We have done extensive simulations with ns-2 [11] to investigate the performance of the modified IEEE 802.11 DCF and compare it to the original protocol. As mentioned earlier, we only modified the collision avoidance (CA) part of the original protocol; all other parts were kept unchanged. For easy reference, we name the modified MAC protocol CSMA/FP, which denotes Carrier Sense Multiple Access with Frame Pulses.
Fig. 2. Network Throughput vs. Number of Nodes in the Network
Fig. 3. Average CW Size vs. Number of Nodes in the Network
Fig. 4. Average Medium Access Delay vs. Number of Nodes in the Network
(All three figures compare CSMA/FP with IEEE 802.11.)
3.1 Configuration Details
We first evaluated CSMA/FP in a wireless LAN with saturation traffic and compared it to the original protocol. We then used the more general scenario of a multihop ad hoc network to investigate its performance. Particularly, we evaluated the protocols from the perspective of an individual user in the ad hoc network. From an individual user's perspective, the network is better if the user can achieve statistically higher flow throughput. Although a contention-based MAC protocol may not always be fair to contending nodes in terms of one-hop throughput, the statistical rate of a random flow in the network truthfully reflects the throughput of the network, especially when the transport layer does not apply rate control over the flows in the network, as configured in our simulations. The ad hoc network has 100 nodes in an area of 1000 m by 1000 m. Each node uses a transmission power of 0.2 watt, which corresponds to a carrier sense range of about 500 meters with the default power threshold settings of ns-2. The link rate of each node is 1 Mb/s (a higher rate means that more bits can be transmitted in the time saved by CSMA/FP through its more effective and efficient control frames). In addition, there are a maximum of 25 Constant Bit Rate (CBR) background flows. The routing protocol used in the simulations is the Dynamic Source Routing protocol (DSR) [12]. In modifying the IEEE 802.11 DCF with the bit-free control frame approach, we used an n of 20 in the "Mod-n" calculation over the receiver's address for obtaining the length of an RTS frame. Twenty is the average number of nodes that fall into the transmission range of a node in the ad hoc network (however, we also investigated the impact of a halved n). The elements in the length set designated for RTS frames fall into two ranges, to balance the average length of an RTS frame with the average length of the other control frames. One of the ranges is from 40 to 90 μs, while the other is from 120 to 170 μs (with a guard gap of 5 μs). In addition, a CTS frame, a CTS-Fail frame, and an ACK frame have fixed lengths of 20, 100, and 110 μs, respectively. For the other parameters, the modified protocol shares the default ns-2 configuration with the original protocol. For example, the minimum and maximum sizes of the CW of a node are 32 and 1024 timeslots, respectively, while a timeslot is 20 μs.
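Putting these parameters together, a receiver can classify a sensed bit-free frame purely from its measured airtime. The sketch below uses the fixed lengths and RTS ranges just listed; the detection tolerance value and function name are illustrative assumptions.

TOL_US = 2.0   # assumed measurement tolerance, below half the 5 us guard gap

def classify_airtime(len_us):
    """Map a measured bit-free airtime to a frame type (sketch)."""
    for name, fixed in (("CTS", 20), ("CTS-Fail", 100), ("ACK", 110)):
        if abs(len_us - fixed) <= TOL_US:
            return name
    if 40 - TOL_US <= len_us <= 90 + TOL_US or \
       120 - TOL_US <= len_us <= 170 + TOL_US:
        return "RTS"
    return "unknown"   # e.g., noise or a harmless merged frame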
Fig. 5. Flow Throughput in Percentage, Max. Node Speed 0 m/s
Fig. 6. Flow Throughput in Percentage, Max. Node Speed 5 m/s
Fig. 7. Flow Throughput in Percentage, Max. Node Speed 10 m/s
(Each figure plots test-flow throughput vs. network load for CSMA/FP and IEEE 802.11.)
3.2 Wireless LANs
Fig. 2 shows the wireless LAN throughput versus the number of nodes in the LAN. In the simulations, every node always has packets to send (i.e., saturation traffic) and the destination of each packet is randomly selected (all nodes are in the transmission range of each other). In addition, each packet is 512 bytes long. As shown in Fig. 2, the modified protocol has a relative throughput gain of about 15% (an absolute gain of about 100 kb/s) when there are 5 nodes in the network. As the number of nodes in the network increases, the throughput gain of the modified protocol increases too. When the number of nodes in the network reaches 25, the relative gain increases to 25% (an absolute gain of 150 kb/s). The average maximum CW size and the average medium access delay for a packet in the network are shown in Fig. 3 and Fig. 4, respectively. As shown in the two figures, a packet experiences less delay when the modified MAC protocol replaces the original one in the network. These results conform to the throughput results shown earlier. For conciseness, we only show the throughput results for ad hoc networks in the following sections.
3.3 Ad Hoc Networks
We first tested the two protocols in the ad hoc network with stationary nodes. In particular, we ran a series of simulations in which the rate of the background flows varied from 0.5×512 bytes/second (B/s) to 8×512 B/s, doubling at each step. A test flow, meanwhile, kept its rate constant at 4×512 B/s to monitor the actual throughput that it could obtain under various network loads. Fig. 5 shows the percentage of packets in the test flow that are successfully received by the flow receiver as the network load varies. As shown in Fig. 5, when the network load increases beyond some point, more packets of the test flow are delivered by the network if the modified MAC protocol replaces the original IEEE 802.11 DCF. For example, when the rate of the background flows is 4×512 B/s, the throughput of the test flow increases from about 44% to 61% when the modified MAC protocol replaces the original one, which indicates a relative performance gain of about 39%. A similar relative
performance gain is observed for the modified protocol when the rate of the background flows is 8×512 B/s. Fig. 6 shows the throughput of the test flow when the nodes in the network follow random waypoint movement with a minimum and maximum speed of 1.0 and 5.0 m/s, respectively (the average pause time is 0.5 second). Fig. 7 shows the throughput of the test flow in a similar case, but with the maximum node speed increased to 10.0 m/s. As shown in these two figures, the modified protocol, on average, has a relative performance gain of more than 50% in both cases of node mobility. Note that the network is more dynamic when the maximum node speed increases, and a more dynamic network is more challenging for medium access control.
3.4 More Hidden Terminals
Using the case of a maximum node speed of 5.0 m/s, the rest of this section further investigates the performance of the modified protocol. This subsection shows how the modified protocol performs when there is a higher probability of hidden terminals for a transmitter in the network. To increase the probability of hidden terminals, we increased the carrier sense (CS) power threshold of a node from less than one twentieth to half of its packet receive power threshold. The increase of the CS power threshold shrinks the carrier sense range of a node in the network. Fig. 8 shows the throughput of the test flow when the CS power threshold has been increased in the network. By comparing this figure to Fig. 6, we find that the modified protocol has even higher performance gains as the probability of hidden terminals increases in the network. This is expected, because the modified protocol is better at handling hidden terminals than the original protocol.
3.5 Link Losses
We also investigated the impact of link frame losses on the performance of the modified protocol. The bit-free control frames of the modified protocol are robust against channel effects (note that a receiver is in its sender's packet transmission range, while bit-free frames can be received within the CS range of the sender), but a data frame may still experience link losses. We have considered both independent and burst link frame losses. Fig. 9 shows the results for the case of an independent link loss rate of 5%. As shown by Fig. 9 and Fig. 6, independent losses may actually increase the relative performance gains of the modified protocol over the original protocol. Similar results have been observed for other cases of link frame losses.
3.6 Environmental Noise
We also investigated the impact of environmental noise on the modified protocol. To test this impact, we placed a noise source at the center of the network and let it generate random-length noise signals at an average rate of 100 signals per second. Moreover, we restricted the noise signal lengths to the range from 1 μs to 200 μs, which covers the range designated for the bit-free control frames.
Fig. 8. Higher CS Power Threshold Case
Fig. 9. An Average Link Loss Rate of 5% Case
Fig. 10. Environmental Noise Case
(Each figure plots test-flow throughput vs. network load at a maximum node speed of 5.0 m/s.)
The simulation results for this scenario are shown in Fig. 10. As shown by the comparison of Fig. 10 to Fig. 6, the modified protocol is not more sensitive to noise than the original one. In fact, after introducing the noise source into the network, the modified protocol shows even higher relative performance gains over the original one. Note that a noise signal may not be able to change the length of a bit-free control frame even when it damages a bit-based frame. Meanwhile, as explained in Section 2, a noise signal must have the right length, arrive at the right node, and possibly arrive at the right time to be harmful.
3.7 Protocol Resilience
The above subsections are about how external factors may impact the performance of the modified protocol. This subsection shows how the protocol’s own parameters affect its performance. We have investigated the three most important parameters of the protocol, which are the receive power thresholds for control frames, the length set for control frames, and the base n of the Mod-n calculations for obtaining RTS frame lengths. Fig. 11 shows how the modified protocol performs when all its control frames use the same receive power threshold as data frames, which deprives the modified protocol of its advantage of better hidden terminal handling. As shown in the figure, the performance of the protocol does degrade but still maintains significant gains over the original protocol. Fig. 12 shows the performance of the modified protocol as the average length of its control frames becomes similar to the average length of the bit-based control frames of the original protocol. As shown in this figure, the performance of the modified protocol degrades gracefully in this case. Fig. 13 shows how the modified protocol performs as the base n of the Mod-n calculation is halved. Halving the n is similar to doubling the node density of the network in terms of investigating how the redundant CTS frames for an RTS frame may affect the protocol performance. As shown in Fig. 13, the performance of the modified protocol has a graceful degradation when the n is halved.
Fig. 11. Data Receive Power Threshold Case
Fig. 12. Long Bit-Free Control Frames Case
Fig. 13. Mod-n: n Changes from 20 to 10
(Each figure plots test-flow throughput vs. network load at a maximum node speed of 5.0 m/s.)
4 Conclusion
We have proposed in this paper a new bit-free control frame method for collision avoidance in wireless packet networks. Bit-free control frames do not need headers or preambles, so they can be short. In addition, bit-free control frames are robust against channel effects due to their simplicity. We have investigated the new approach by applying it to the IEEE 802.11 DCF and conducting extensive simulations. We have tested the new approach in both wireless LANs and ad hoc networks. We have also investigated how hidden terminals, link losses, and environmental noise may impact the new approach. Additionally, we have examined how protocol parameters such as the average length, the receive power thresholds, and the size of the length set of control frames may impact the performance of the new approach. Our conclusion is that the new bit-free control frame method is able to significantly improve the performance of the IEEE 802.11 DCF.
References

1. Tobagi, F.A., Kleinrock, L.: Packet Switching in Radio Channels: Part II - The Hidden Terminal Problem in Carrier Sense Multiple Access and the Busy Tone Solution. IEEE Transactions on Communications 23 (1975) 1417-1433
2. Kleinrock, L., Tobagi, F.A.: Packet Switching in Radio Channels: Part I - Carrier Sense Multiple-Access Modes and Their Throughput-Delay Characteristics. IEEE Transactions on Communications 23 (1975) 1400-1416
3. Peng, J., Cheng, L., Sikdar, B.: A new MAC protocol for wireless packet networks. IEEE GLOBECOM, San Francisco, CA (2006)
4. Haas, Z.J., Deng, J.: Dual Busy Tone Multiple Access (DBTMA) - A Multiple Access Control Scheme for Ad Hoc Networks. IEEE Transactions on Communications 50 (2002) 975-985
5. Karn, P.: MACA - A New Channel Access Method for Packet Radio. Proc. of the 9th ARRL Computer Networking Conference, Ontario, Canada (1990)
6. Bharghavan, V., Demers, A., Shenker, S., Zhang, L.: MACAW: a medium access protocol for wireless LANs. ACM SIGCOMM, London, United Kingdom (1994)
7. Fullmer, C.L., Garcia-Luna-Aceves, J.J.: Floor acquisition multiple access (FAMA) for packet-radio networks. ACM SIGCOMM, Cambridge, Massachusetts (1995)
8. Xu, K., Gerla, M., Bae, S.: How Effective is the IEEE 802.11 RTS/CTS Handshake in Ad Hoc Networks? IEEE GLOBECOM, Taipei, Taiwan (2002)
9. IEEE 802.11 Wireless Local Area Networks. http://grouper.ieee.org/groups/802/11/
10. Sobrinho, J.L., Krishnakumar, A.S.: Real-time traffic over the IEEE 802.11 medium access control layer. Bell Labs Technical Journal (1996) 172-187
11. The Network Simulator - ns-2. http://www.isi.edu/nsnam/ns/
12. Johnson, D.B., Maltz, D.A., Hu, Y.C.: The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks (DSR). IETF Internet draft, draft-ietf-manet-dsr-10.txt (2004)
Video Rate Adaptation and Scheduling in Multi-rate Wireless Networks

Sourav Pal, Sumantra R. Kundu, Amin R. Mazloom, and Sajal K. Das
Dept. of Computer Science and Eng., University of Texas at Arlington, Arlington, TX 76019, USA
{kundu,spal,mazloom,das}@cse.uta.edu

Abstract. Current scheduling techniques used for cellular networks do not suffice for emerging multi-rate systems like cdma2000 and High Data Rate (HDR). Real-time applications like video streaming must comprehend the channel conditions and, consequently, the data rates currently being supported; accordingly, the content and the amount of data to be transmitted need to be adapted to the available bandwidth. In this paper, we consider multimedia (MPEG-4) streaming as the application over HDR and propose a content aware scheduling scheme (CAS) that takes into consideration the different priorities of the MPEG-4 stream content. The proposed transmission scheme considers both the channel conditions as perceived by the user and the priority of the streams. In addition, CAS verifies the playout timestamp and discards stale packets, ensuring higher throughput in the process. We capture the lag of the proposed adaptation scheme using the Kullback-Leibler distance and show that the rate adaptation scheme has a reasonably small lag. Simulation results demonstrate that the proposed scheme yields higher overall peak signal to noise ratio (PSNR) values for the entire movie, fewer dropped frames, and better throughput utilization than existing schemes.
1 Introduction
The advent of multi-rate systems like cdma2000 [1], HDR [2], and EDGE [3] provides higher bandwidth but poses challenges in channel rate estimation and scheduling. Although advanced video standards like MPEG-4 and H.264 can take advantage of the increased transmission bandwidth and offer powerful error resilience mechanisms, they are unable to handle complexities such as attenuation due to multi-path fading, shadowing, transmission errors, bandwidth fluctuations, and spectrum scarcity associated with the wireless channel. In this paper, we focus on the scheduling aspect of such systems. In addition to round robin (RR) and first come first serve (FCFS) scheduling policies, video scheduling over HDR [6] has used the Modified-Largest-Weighted-Delay-First (M-LWDF) and Exponential (EXP) scheduling algorithms, and variations of the two, to reduce the percentage of video frames that do not meet the playback deadlines.
This work is supported by NSF ITR grant IIS-0326505.
However, even such sophisticated radio resource management schemes are unaware of the content, priority, and timing requirements demanded by video streams. In this paper, we provide rate adaptation techniques at the base station that adapt the transmission of packets to the available bandwidth and ensure that packet buffer overflow/underflow does not take place at the client end. We also propose a scheduling algorithm specifically for streaming multimedia (MPEG-4) that not only adapts to the available bandwidth but also schedules packets with respect to their priority, ensuring smooth video playout at the client end. Under favorable channel conditions, the scheme transmits more packets than necessary for current viewing but honors the buffer overflow condition determined by the rate adaptation algorithm. The success of the proposed scheme is due to the priority structure of the MPEG-4 stream, in which I frames have higher priority than B and P frames, since without the I frames video playout is not possible. Based on the channel conditions, the scheme not only adjusts the data but also drops the lower priority frames (in the case of video), if necessary. We employ the Kullback-Leibler [4] distance as a metric to measure the adaptation rate. In summary, the main contributions of this paper are as follows:

– A rate adaptation scheme which takes into consideration the buffer overflow and underflow problems at the client end.
– A content aware scheduling technique which ensures that higher priority MPEG-4 frames are transmitted prior to the lower priority ones and takes into consideration the playout timing requirements.
– Experimental evaluation of the scheme using a framework consisting of the Darwin Streaming Server (DSS), the Click modular router, and MPlayer as the client-end media player, which reveals that the proposed technique achieves a lower frame drop rate and higher PSNR values compared to the case with no content aware scheduling mechanism.

The rest of the paper is organized as follows: We discuss the necessary background on HDR systems and MPEG-4 in Section 2. A rate adaptation scheme based on the estimated channel throughput is proposed in Section 3. Section 4 presents a case study of transmission of MPEG-4 over HDR using the proposed techniques. In Section 5, simulation results involving rate adaptation, scheduling algorithms, and performance of MPEG-4 transmission over HDR are presented. Conclusions are drawn in the last section.
2 Background
A brief description of the HDR system model and the functioning of MPEG-4 is presented in this section.

2.1 HDR System Model
We consider a single cell of a multi-rate wireless system with the base station serving N mobile terminals. We assume that the system employs data rate
control mechanism on the forward link that adapts to the changing channel conditions by employing adaptive modulation and coding techniques, hybrid Automatic Repeat-reQuest (ARQ), and best serving sector selection. The mobile terminals measure the current channel conditions (i.e., $E_b/I_0$, the received energy per bit to interference ratio) and predict the achievable rate. Every mobile terminal updates the base station with the predicted rate via the pilot signal on the reverse link data rate control (DRC) channel. At any time slot t, the data rate that can be supported by the ith mobile terminal is $R_i(t)$, $1 \le i \le N$, where $R_i(t)$ is one of the many rates supported by the system. For example, HDR supports 11 data rates [2]. $R_i(t)$ is the mean rate actually provided to user i, measured over a sliding window of length $t_c$, and is given by

$$R_i(t+1) = \left(1 - \frac{1}{t_c}\right) \times R_i(t) + \frac{1}{t_c} \times R_i(t_c) \qquad (1)$$
This cumulative estimate is updated at each slot; however, the scheduling step is executed once per new transmission.
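A minimal sketch of the sliding-window update of Eq. (1); the function name and argument names are ours.

def update_mean_rate(R_prev, R_window, t_c):
    """One per-slot step of Eq. (1): R_i(t+1) from R_i(t) and R_i(t_c).

    R_prev   -- R_i(t), the mean rate provided to user i so far
    R_window -- R_i(t_c), the rate observed over the current window
    t_c      -- the sliding-window length in slots
    """
    return (1.0 - 1.0 / t_c) * R_prev + (1.0 / t_c) * R_window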
3 Rate Adaptation in Multi-rate Systems
We initially present the adaptation algorithm; we then present the analytical modeling of the adaptation scheme and propose adaptation metrics using the Kullback-Leibler distance. The process of adapting the application bit stream in response to the available transmission rate gives rise to the following three scenarios (see the code sketch below). We denote by $r_i(t)$, $X_b(t)$, and $X_e(t)$ the available transmission rate, the required base layer bit rate, and the enhancement layer bit rate (applicable only to FGS MPEG-4), respectively.

– Case I: $r_i(t) < X_b(t)$: The adaptation scheme selectively drops base layer frames based on the proposed prioritization scheme.
– Case II: $X_b(t) \le r_i(t) \le X_b(t) + X_e(t)$: The highest priority layer is encoded. For the enhancement layer, the adaptation scheme encapsulates whatever portion of the EL can be packed into the remaining available bandwidth.
– Case III: $r_i(t) > X_b(t) + X_e(t)$: This is the most favorable case, where the encoding rate is higher than the bit stream rate, and hence the entire data can be transmitted.

Through the proposed adaptation scheme, the lowest possible granularity is achieved. However, a finite lag remains between the channel state and the actual transmission of the data, which in our case is the best achievable, i.e., a lag of one slot. Note that during packetization the granularity is measured in terms of the decision period. Depending on the current buffer status at both the transmitter and the user and on the current channel state, the rate control system needs to determine the encoding rate. The currently encoded packets are only served after the existing MAC PDUs are served. Thus, the rate control system needs to determine the number of decision periods required to serve the existing MAC PDUs. Hence, prediction of the rates via transition probability computation is important, which we discuss next.
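Before turning to that, the three-case selection above can be summarized in code (a sketch; all names are ours):

def adaptation_case(r_i, X_b, X_e):
    """Pick the action for the three cases above, given the available
    rate r_i(t), base-layer rate X_b(t) and enhancement-layer rate X_e(t)."""
    if r_i < X_b:
        return "Case I: drop base-layer frames by priority"
    if r_i <= X_b + X_e:
        return "Case II: base layer plus partial enhancement layer"
    return "Case III: transmit the entire data"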
3.1 Transition Probability Calculation
We assume that the data rates are related to the distance of the user from the base station [13]. Though this assumption holds true only under ideal conditions (i.e., no interference, no topological effects), we use it for mathematical simplicity. If we assume that the system supports m rates, then a cell can be divided into m concentric rings; the innermost ring gets the maximum data rate and the outermost ring gets the minimum. In one decision period, a user can go one ring above or below the current ring because of the limitation in speed. Let $P_{uv}$ denote the probability of a user switching from data rate u to data rate v. Note that v can only be u − 1, u, or u + 1. Let the steady-state probability of a user being in the uth data rate be denoted by $\pi_u$. We define $\pi_u$ as the ratio of the ring area to the cell area, given by $\pi_u = \frac{r_u^2 - r_{u-1}^2}{r_{m-1}^2}$, where $r_u$ is the radius of the uth ring. Clearly, $\sum_{u=0}^{m-1} \pi_u = 1$. The transition probabilities $P_{uv}$ are computed geometrically and are given as follows:

$$P_{u\to u-1} = \frac{2\sqrt{s(s-r_1)(s-r_2)(s-d)} - (\theta_1 r_1^2 - \theta_2 r_2^2)}{\pi r_2^2}, \quad P_{u\to u} = \frac{A_i}{\pi r_2^2}, \quad P_{u\to u+1} = \frac{A_{bge}}{\pi r_2^2} \qquad (2)$$
Note that these probability calculations will differ for the innermost and outermost rings.

3.2 Analytical Modeling of Rate and Adaptation
The encoding rate $r_i(t)$ for user i depends on the channel state, the buffer status, and the bandwidth allocated. This necessitates the computation of the number of decision periods needed to serve the existing MAC PDUs. The number of decision periods is basically the lag by which the encoder follows the channel state. If ξ is the number of decision periods needed to serve the MAC buffer $B_i$, then

$$\sum_{l=0}^{\xi} S_i^l \times P_{uv}^l \times R_i^l(t) = f_b \times B_i \qquad (3)$$

where $0 \le f_b \le 1$ is the buffer fullness, $S_i^l$ is the number of slots allocated to user i, $P_{uv}^l$ is the transition probability from state u to v, the calculation of which can be found in [2], and $R_i^l(t)$ is the rate estimated using Equation (1), all in the lth decision cycle. We need to compute the amount of data $\chi_i(t)$ that the MAC scheduler would be able to support for the corresponding decision period. Depending on the scheduling paradigm, which determines the number of slots allocated to the user, the amount of data that the adaptation layer may encode is given by

$$\chi_i(t) = S_i^{\xi} \times P_{uv}^{\xi} \times R_i^{\xi}(t) \qquad (4)$$
Conversely, if the mobile terminal experiences favorable channel conditions, i.e., it can support a higher data rate, then the rate controller would be tempted to transmit more data, provided the receiver buffer does not overflow. Transmitting more data than required would compensate if channel conditions later deteriorate. The available buffer space $\upsilon_i(t)$ during that decision period is

$$\upsilon_i(t) = (1 - f_{rb}) \times RB_i - \sum_{l=0}^{\xi} S_i^l \times P_{uv}^l \times R_i^l(t) + \rho_i \times (\tau \xi) \qquad (5)$$

where $0 < f_{rb} < 1$ is the receiver buffer fullness, $RB_i$ is the receiver buffer size, $\rho_i$ is the playout curve rate, and τ is the duration of each slot (τ = 1.67 ms for HDR). The adaptation layer encoding rate, considering the buffer constraints at both the transmitter and receiver sides, is

$$r_i(t) = \min(\chi_i(t), \upsilon_i(t)) \qquad (6)$$
Note that $r_i(t)$ is a random variable which gives the value of the rate adaptation. Let p(·) denote the pdf of the random variable $r_i(t)$, and let $f_m(\cdot)$ denote the pdf of the Group of Pictures (GoP) size distribution of the mth video. We use the Kullback-Leibler (KL) distance to characterize the performance of our adaptation scheme.

Definition: The Kullback-Leibler distance, which determines the relative entropy between two distributions, is given as

$$D(p\|q) = \sum_{x \in X} p(x) \ln \frac{p(x)}{q(x)} \qquad (7)$$
$D(p\|q)$ is a measure of the inefficiency by which the distribution q(x) differs from the distribution p(x). Hence Equation (7) provides a metric to determine the lag, or closeness, of q(x) to p(x). Substituting p(x) = p(·) and q(x) = $f_m(\cdot)$, we define the adaptation A of the video to the channel rate as

$$A(p\|f_m) = \sum p(\cdot) \ln \frac{p(\cdot)}{f_m(\cdot)} \qquad (8)$$
The granularity of the rate adaptation is bounded by the decision period, as given by Equations (3)-(6). Hence we analyze the adaptation with respect to the decision period. The time granularity of the decision period is denoted by s.
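For discrete distributions, Eq. (7) is straightforward to evaluate; a small sketch with made-up distributions (the bin values are ours):

import math

def kl_distance(p, q):
    """D(p||q) of Eq. (7) for discrete pmfs given as dicts; q must be
    nonzero wherever p is, otherwise the distance is infinite."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# e.g., adaptation pdf p(.) vs. a GoP-size pdf f_m(.) over three bins
p = {0: 0.2, 1: 0.5, 2: 0.3}
f_m = {0: 0.3, 1: 0.4, 2: 0.3}
print(kl_distance(p, f_m))  # small value => adaptation closely tracks f_m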
4 MPEG over Multi-rate System
The success of video streaming depends on devising mechanisms to adapt to changing channel conditions. The popular techniques for TCP rate adaptation [7]
did not consider the fluctuation in channel conditions. Numerous link layer schemes [8,9] exist for enhancing TCP over wireless networks, but link-layer-aware techniques [10,11] use statistical measures of packet loss for adaptation. These techniques fail to capture the content-level information which needs to be exploited for video applications. It is known that better rate control algorithms are obtained if stochastic channel behavior is considered through a priori models [12]. We incorporate both the channel model and the video information into our rate adaptation algorithm at the MAC layer. The scheduler transmits more data than necessary when conditions are favorable, while avoiding buffer overflow, whereas it drops frames without compromising the integrity of the video when conditions are degraded. For example, successful transmission of I-frames is more crucial than that of other frames. This knowledge of frame priority at the MAC layer helps the rate adaptation, because the delay in retransmitting I-frames from the application layer is higher than retransmitting them from the MAC layer. In addition, the MAC layer can drop less important frames based on priority and closely follow the channel conditions by minimizing the Kullback-Leibler distance.
Fig. 1. Hierarchical Packetization of Video Packets (bit stream → data partitioning with resync markers, header/motion data, and DCT data → packetization)
4.1 Fragmentation, Packetization, Prioritization Layer (FPPL)
In general applications, the working of these layers is mostly static in nature and does not consider the underlying network protocols or the possible variation in channel conditions. We propose a hierarchical fragmentation, packetization and prioritization layer (FPPL), specifically designed for multi-rate wireless systems, that uses packet resynchronization and exploits the underlying MAC and channel conditions for packetization/fragmentation. This hierarchy is shown in Figure 1.
Since FPPL is MAC-aware, the FPPL packet size is matched to the MAC protocol data unit (PDU) to prevent further fragmentation at the MAC layer.
4.2 Radio Resource Scheduling
The MAC Scheduler (MAC-S) is responsible for allocating slots among users requiring different data rates. It can be any generic scheduler, such as a proportional fair scheduler. These standard MAC schedulers allocate slots based on the supportable channel rate and/or the QoS demanded by each type of user, but are unfortunately unaware of the content type being served from the MAC buffer. Hence, the MAC-S is incapable of exploiting application-specific features while scheduling. To overcome this shortcoming of the MAC-S, we propose a Content Aware Scheduler (CAS), shown in Figure 2, which works in conjunction with the MAC-S.
Fig. 2. MAC-S and CAS Schedulers (per-priority MAC buffers for n active users feed the CAS, which works with the MAC scheduler and a retransmit buffer; SARC is applied to the highest priority frames sent to the mobile terminal)
Depending on the application, the FPPL generates the appropriate number of priority levels and performs fragmentation and packetization accordingly. We assume that there can be k priority levels, where k depends on the application. The CAS models the MAC buffer as k queues, one for each priority level. For example, in the case of FGS MPEG-4 video data there would be four queues, one each for the I, B, and P frames, and the last one for the enhancement layer, with the highest priority given to the I-frames and the lowest to the enhancement layer. The generic MAC-S does not differentiate the priorities and simply serves the content in a first come first serve (FCFS) queuing discipline. The proposed CAS, however, being content aware, enhances the system goodput. In many real-time streaming applications, transmission of the higher priority content is essential for the success of the application. In addition, the proposed CAS is aware that packets need to be transmitted before a certain deadline at the receiver side, and it discards packets that are stale.
For the ith packet, we define a boolean variable SP(i) which indicates whether the packet is stale: SP(i) = 1 (the packet is stale) if $t_{curr} > TS_i + t_{prop}$, and SP(i) = 0 otherwise, where $t_{curr}$, $TS_i$, and $t_{prop}$ are the current system time, the timestamp of packet i prior to which it needs to be transmitted, and the propagation time, respectively. This simple yet content-aware scheduling mechanism prevents error propagation and increases the goodput. The MAC-S is unaware of the prioritization scheme and is unable to schedule efficiently. The CAS, however, utilizes the priority information and also employs a Selective Adaptive Retransmit Control (SARC) mechanism for the highest priority packets. The SARC mechanism can be applied down to any priority level, i.e., to each buffer, but for the sake of simplicity we restrict ourselves to the highest priority buffer. Additionally, the CAS employs a Fast Transmit Scheme (FTS) to take advantage of favorable channel conditions, while taking into account the upper transmission limit. Thus, in summary, the rationale for employing SARC is the following:

1. Provide unequal error protection (UEP) to high priority data and thereby increase the transmission quality and prevent error propagation.
2. The mobile terminal's buffer fullness ($f_{rb}$) is transmitted back to the base station through the SARC mechanism; $f_{rb}$ is essential for determining the boundary conditions.

The SARC mechanism of CAS keeps retransmitting the highest priority packets until the mobile terminal acknowledges the successful arrival of the frame. The number of retransmissions is bounded by the timing requirements, i.e., retransmission continues only until the packet becomes stale. Continuing with the FGS MPEG-4 example, the I-frame being the most important video data frame, multiple retransmissions might be required for it. A missing or corrupted I-frame results in the wastage of the corresponding B, P, and EL frames even if they arrive correctly. The FTS scheme enables the CAS to transmit more data than the playout curve rate of the mobile terminal when channel conditions are favorable, provided the CAS has been assigned sufficient slots by the MAC-S. Since the receiver buffer of the mobile terminal is finite, the CAS should refrain from transmitting data that would overflow the mobile terminal buffer. Similar to the ARQ mechanism in TCP, the mobile terminal transmits the available buffer space (β) in each acknowledgment packet. The CAS makes the rate control system (RCS) module aware of β. Thereafter, the RCS module takes into account the available rate and the buffer space at both the MAC and the mobile terminal, and computes the upper bound for data transmission. It also determines whether the fast transmission scheme outlined earlier is achievable or not. The working principle of CAS is explained below. Without loss of generality, consider the CAS scheduling of an application having k priority levels, whose queues are denoted by iHP, where i ∈ {1, 2, ..., k}, with the highest priority level enabled with SARC, in the Kth decision cycle of the MAC-S scheduler.
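Before the listing, the staleness test and the discard step can be written directly from the definition above (packet objects carrying a ts playout-timestamp attribute are an assumption of this sketch):

def is_stale(t_curr, ts_i, t_prop):
    """SP(i): packet i is stale iff t_curr > TS_i + t_prop."""
    return t_curr > ts_i + t_prop

def drop_stale(queue, t_curr, t_prop):
    """CAS discards stale packets before scheduling the rest."""
    return [pkt for pkt in queue if not is_stale(t_curr, pkt.ts, t_prop)]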
Video Rate Adaptation and Scheduling in Multi-rate Wireless Networks
483
Scheduling Algorithm CAS() CAS ← control state if (Overflow Constraint not violated) then if 1HP not empty then if T S1HP −packet < T S2nd,...,kth−packet && !SP(1HP) then CAS schedules 1HP-packets CAS employs SARC //wait for ζ slots for ACK 7: Return 8: end if 9: end if 10: if CAS ← SARC feedback then 11: Retrieve retransmit packet numbers 12: Compute Overflow Constraint 13: if !SP(1HP-packet) then 14: CAS retransmits 1HP-packets 15: CAS employs SARC 16: if slot available then 17: pack 2nd HP, . . .,kth HP-packets respectively //wait for ζ slots for ACK 18: end if 19: Return 20: end if 21: end if 22: i=2 23: if ith -Buffer not empty then 24: CAS transmits ith Buffer packets. 25: Increment i 26: end if 27: end if
1: 2: 3: 4: 5: 6:
CAS initiates transmission with the highest priority packets employing Selective Adaptive Retransmit Control and following up with the rest of the frames without employing any ARQ on the rest of the packets. For CAS, goodput would be more effective measure than throughput. We define goodput for real time data as the number of packets transmitted per decision cycle by the CAS scheduler that the mobile terminal successfully utilizes. The calculation of CAS goodput is done in section 4.3. 4.3
CAS Goodput
In this section, we perform the analytical modeling of the Content Aware Smart Scheduler and also derive the goodput of the system using CAS. First, we model the SARC mechanism. Let Pb (m) and Pe (m) denote the probability of the bit error and packet error respectively for sending m packets simultaneously. If Ps (m) denotes the probability of successful packet transmission then Ps (m) = 1 − Pe (m) = [1 − Pb (m)]
L
(9)
where L denotes the length of the MAC protocol data unit (PDU). To provide more protection to the highest priority packets, error correction coding (ECC) is employed. The modified probability for packet transmission with ECC at any point of time is Ps (m, y) =
y L L−o Pb (m)i [1 − Pb (m)] o i=0
(10)
484
S. Pal et al.
where o is the number of errors corrected and y is the total number of correctable bit errors. Thus Ps (m, y) denotes the probability for successful packet transmission for highest priority packets. However, the number of retransmissions for these packets is limited by , the maximum number of retransmits for the ith 1HP packets. We compute as =
T S1HP −packet if !SP(1HP-packet) tprop + tarq
(11)
where tprop , tarq and T S denote the propagation, the time after which the ACK transmitted by the mobile terminal is received, and the timestamp of the corresponding packet before being marked stale, respectively. The mean number of retransmissions for 1HP-packets is given by δ1HP =
(1 − Ps )(i−1) Ps × i
i=1
= Ps ×
1 − Ps (1 − Ps )2
(12)
Let nk be the total number of MAC-PDUs generated by FPPL for the k th data segment and let nk1HP and nkR be the number of 1HP-frame packets and 2, . . . , k- frame packets respectively such that nk = nk1HP +nkR . The timestamp of the frames is dependent on the data size distribution. We model δRj as the parameter which determines whether 2, . . . , k- frame are stale. Consider that δRj = SP(j), thus the effective number of packets transmitted per decision period by CAS is given by Δi = nkI × δI + nkR × δRj
(13)
Therefore, the average goodput (ρgp ) of CAS is given by K ρgp =
i=1
K
Δi
(14)
where K is the total number of decision periods for which video is transmitted. Note that throughput of such systems would fail to capture the actual system performance since it would not consider adaptive selective retransmission and selective frame dropping based on packet staleness.
5
Simulation Results
We have conducted simulation experiments where a single multi-rate cell with multiple users was simulated to illustrate the performance of the proposed adaptation technique with respect to the existing scheme for HDR. The video test sequences chosen comprises of both the simple profile (SP) and advanced simple profile (ASP). In order to ensure the performance of the proposed schemes, we use representative test sequences of foreman, paris and football which have
Video Rate Adaptation and Scheduling in Multi-rate Wireless Networks
485
Table 1. Specification of the files used in our simulation Resolution Clip name fps QCIF
paris
QCIF
bitrate
15fps 64kbps
foreman 25fps 64 kbps
CIF
football
15fps 1Mbps
26 CIF,15fps, 1Mbps QCIF,25fps 64kbps QCIF, 15fps ,64kbps
24
PSNR in db
22
20
18
16
14
12
0
200
400 600 Number of Errors in the BitStream
800
1000
Fig. 3. Observe the smooth variation of PSNR with BER using our CAS algorithm 0.12
Slow Fading Very Slow Fading
Percentage lag in Adaptation
0.11
0.1
0.09
0.08
0.07
0.06
0.05
0.04
1
2
3
4
5
6
7
8
9
10
Window Size
Fig. 4. The variation of CAS adaptation due to mobility
varying resolution, frame rate and bit rate. The specifications of the streams are listed in Table 1. Figure 3 highlights the graceful degradation of PSNR values using the CAS scheduler. As for the lag in adaptation, we analyzed equation 8 numerically. Figure 4 shows how the system is able to learn and adapt if a sufficiently long window of observation is allowed. As we do not deal with fast fading channels, we compared ‘slow’ and ‘very slow’ fading channels. Obviously, the adaptation is
486
S. Pal et al.
Percentage of Dropped I Frames
70 Non− CAS 60
CAS
50
40
30
20
10
0 4.5
5
5.5
6
6.5
7
7.5
Percentage of Error in BitStream (10−6)
Fig. 5. Improved throughput of I-frames due to CAS. The data transmission quality by CAS is also impressive and lesser bits are flipped due to adaptability nature of the CAS.
Percenatge of Frames Dropped
70 Non CAS 60
CAS
50
40
30
20
10
0 4.5
5
5.5
6
6.5
7
7.5
Percentage of BER (10−6)
Fig. 6. Error resiliency of CAS and throughput of I-frames. As the channel condtion degrades the CAS shows significantly better adaptability to erroneous medium and results fewer Ipacket drops.
better for slower fading channels. In Figures 5 and 6, we show how the throughput of I-frames in multimedia streaming could be improved in the presence of CAS.
6
Conclusion
This paper deals with rate adaptation at the MAC layer which is necessary for streaming multimedia over multi-rate wireless systems. We propose a rate adaptation technique where the application layer encoding rate is dynamically adjusted to the varying channel conditions perceived by the user. We have also enhanced the fragmentation, packetization and prioritization scheme for FGS MPEG-4 and tailored it for the multirate wireless systems. We have shown how content-aware
Video Rate Adaptation and Scheduling in Multi-rate Wireless Networks
487
scheduling at the MAC layer can enhance the performance for video data transmission by selectively deciding on the nature and importance of the content.
References 1. 3GPP2 C.S0024 Ver3.0, “cdma2000 High Rate Packet Data Air Interface Specification”, Dec. 5, 2001. 2. P. Bender, P. Black, M. Grob, R. Padovani, N. Sindhushayana, and A. Viterbi, “CDMA/HDR: A Bandwidth-Efficient High-Speed Wireless Data Service for Nomadic Users”, IEEE Comm. Magazine, July 2000. 3. A. Furuskar, S. Mazur, F. Muller and H. Olofsson, “EDGE: Enhanced Data Rates for GSM and TDMA/136 Evolution,” IEEE Personal Communication Magazine, pp 56-66, June 1999. 4. S. Kullback and R. A. Leibler, “On information and sufficiency”, Ann. Math. Stat., 22:7986, 1951. 5. R. Koenen, “Overview of the MPEG-4 Standard”, ISO/IEC JTCI.SC29/WG11 M4030, 2001. 6. K. Gribanova and R. Jntti, “On Scheduling Video Streaming Data in the HDR System”, Vehicular Technology Conference, 2004. 7. M. Chen and A. Zakhor, “Rate Control for Streaming Video over Wireless”, Infocom, 2004. 8. H. Balakrishnan and R. Katz, “Explicit Loss Notification and Wireless Web Performance”, IEEE Globecom Internet Mini-Conference, 1998. 9. S. Biaz, N. H. Vaidya, “Distinguishing congestion losses from wireless transmission losses: a negative result”, Computer Communications and Networks, 1998. 10. H. Balakrishnan, V. N. Padmanabhan, S. Seshan and R. H. Katz, “A Comparison of Mechanisms for Improving TCP Performance over Wireless Links”, Proc. ACM Sigcomm 1996, Stanford, CA, August 1996. 11. S. Cen, P. C. Cosman, and G.M. Voelker, “End-to-end differentiation of congestion and wireless losses”, Proc. Multimedia Computing and Networking (MMCN) conf. 2002, pp. 1-15, San Jose, CA, Jan 23-25, 2002. 12. S. Aramvith, I. Pao and M. Sun, “A Rate-Control Scheme for Video Transport over Wireless Channels”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, no. 5, May 2001. 13. T. Bonald and Alexandre Proutiere, “Wireless downlink data channels: user performance and cell dimensioning”, Proceedings of the 9th annual international conference on Mobile computing and networking, San Diego, CA, USA. 14. A. Jalali, R. Padovani, and R. Pankaj, “Data throughput of CDMA-HDR a high efficiency-high data rate personal communication wireless system,” in IEEE Proc. of Vehicular Technology Conf. 2000-Spring. 2000, vol. 3. 15. L. D. Soares and F. Pereira, “MPEG-4: a flexible coding standard for the emerging mobile multimedia applications”, IEEE Int. Symp. on Personal, Indoor and Mobile Radio Communications, Volume: 3, pp. 1335-1339, 1998. 16. W. Chung, H. W. Lee, and J. Moon, “Downlink Capacity of CDMA/HDR”’,in the Proc. IEEE 2001 Vehicular Technology Conference (VTC2001-Fall), Atlantic City, NJ, USA, October 2001. 17. C. Bormann, L. Cline, G. Deisher, T. Gardos, C. Maciocco, D. Newell, J. Ott, G. Sullivan, S. Wenger, and C. Zhu, “RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+),” Internet Engineering Task Force, RFC 2429, Oct. 1998. 18. T. V. Laksman, A. Ortega and A. R. Reibman, “VBR video: Trade-offs and potentials,” Proc. IEEE, vol 86. pp. 952-973, May 1998.
On Scheduling and Interference Coordination Policies for Multicell OFDMA Networks G´abor Fodor Ericsson Research, Kista, Sweden [email protected]
Abstract. In orthogonal frequency division multiple access systems there is an intimate relationship between the packet scheduler and the inter-cell interference coordination (ICIC) functionalities: they determine the set of frequency channels (sub-carriers) that are used to carry the packets of in-progress sessions. In this paper we build on previous work - in which we compared the so called random and coordinated ICIC policies - and analyze three packet scheduling methods. The performance measures of interest are the session blocking probabilities and the overall throughput. We find that the performance of the so-called Fifty-Fifty and What-It-Wants scheduling policies is improved by coordinated sub-carrier allocation, especially in poor signal-to-noise-and-interference situations. The performance of the All-Or-Nothing scheduler is practically insensitive to the choice of the sub-carrier allocation policy. Keywords: Orthogonal Frequency Division Multiple Access, Radio Resource Management, Interference Coordination, Scheduling.
1 Introduction The 3rd Generation Partnership Project (3GPP) has selected orthogonal frequency division multiple access (OFDMA) as the radio access scheme for the evolving universal terrestrial radio access (E-UTRA). Packet scheduling (PSC) and inter-cell interference coordination (ICIC) are important radio resource management (RRM) techniques that together determine the set of OFDMA resource blocks (essentially the sub-carriers) that are taken into use when a packet is scheduled for transmission over the radio interface [2], [3]. In broad terms, PSC is responsible for determining the session(s) that can send a packet during a scheduling interval and the number of sub-carriers that the session may use. The number of the assigned sub-carriers has a direct impact on the instantaneous bit-rate and thereby can be seen as part of the rate control mechanism. The ICIC function, in turn, is concerned with allocating the particular sub-carriers to the session taking into account the instantaneous channel conditions and the ICIC policy. Such ICIC policy may coordinate which sub-carriers should be taken into use by the schedulers in neighbor cells. The impact of these two RRM functions on the session-wise and overall throughput has been for long recognized by the standardization and research communities. Sections 11.2.4 and 11.2.5 of [2] and Chapter 6.12 of [3] describe the roles of the PSC and ICIC functions and discuss their relation. From a performance analysis perspective, I.F. Akyildiz et al. (Eds.): NETWORKING 2007, LNCS 4479, pp. 488–499, 2007. c IFIP International Federation for Information Processing 2007
On Scheduling and Interference Coordination Policies
489
Letaief et al. developed a model that jointly optimizes the bit and power allocation in OFDMA schedulers [5]. ICIC has been the topic of research for long (for a classical overview paper, see [4]). The paper by Liu and Li proposed a so called ”Radio Network Controller algorithm” that determines the set of allowed resources in each base station under its control, while the ”Base Station algorithm” schedules packets for transmission [9] (see also Chapter 8 of [10]). These works demonstrate that already with a single dominant interfering neighbor cell, the total throughput increases when an appropriate ICIC policy is employed by the packet scheduler. The contribution of the current paper is that we (1) explicitly take into account that traffic is elastic and (2) propose a flexible model to capture the behavior of a wide range of schedulers under two different ICIC policies. With regards to (1) we allow the bitrates of the sessions to fluctuate between the associated minimum and maximum rates. This model allows the maximum rate to be large so that the behavior of TCP-like greedy sources can be captured. Regarding (2), we introduce the notion of the scheduler policy vector that specifies the probability that a session is granted a certain amount of subcarriers when there are competing sessions in the system. We add this rather general scheduler model to the interference coordination model described in [8] and analyze the model in a sequence of steps (Steps 1-6) detailed in the paper. The performance analysis gives insight into the potential gains that inter-cell interference coordination can give when employing different packet scheduling policies. The paper is organized as follows. In the next section, we describe the scheduling and ICIC policies that we study and introduce the policy vector as a convenient tool to characterize these policies. Next, in Section 3 we state the performance analysis objective in terms of the input parameters and the performance measures of interest. The solution is summarized in a sequence of steps (as described above). Section 4 discusses numerical results. We highlight our findings in Section 5. We note that the proofs of the lemmas as well as further details, numerical results and the conclusions are available in the longer version of the paper [1].
2 Scheduling and Inter-cell Interference Coordination Policies We consider an OFDMA cell that comprises S orthogonal frequency channels (subcarriers). The number of in-progress sessions is denoted by i and represents the state of the system. When the system is in state i, the scheduler determines the number of sub-carriers that are assigned to each session. For a particular session under study, this implies that the session is assigned s number of sub-carriers with probability P (s); S s=0 P (s) = 1. We refer to the mechanism that (in each system state) establishes P (s) as the scheduling policy. The scheduling policy vector is a vector of dimension (S + 1) whose sth element specifies the probability that the session under study (and thereby any session) is allocated s channels, s = 0 . . . S. (We note that the indexing of the (S + 1) elements of the policy vector runs from 0 to S.) In the following subsections we describe three such scheduling polices. Throughout we assume that the sessions belong to the same service class that is ˆ and a maximum slowdown factor a characterized by a peak rate requirement R ˆ ≥ 1. ˆ a. Also, when The minimum accepted (guaranteed) bit rate for a session is Rmin = R/ˆ
490
G. Fodor
a session is granted s number of frequency channels, its ideal bit-rate (assuming a given and fixed modulation and coding scheme, MCS) and assuming zero packet error/loss ˆ is set to RS (that is the peak bit-rate rate (P ER = 0) is denoted by Rs . When R requirement is the bit-rate that is provided when all resources are assigned to a single session), we say that the session is greedy. We will also use the operator S(R) that returns the number of required channels in order for the session to experience R bitrate (again assuming P ER = 0). That is, when a session is admitted into the system, ˆ the number of allocated channels s (in the long term) must fulfil: Rmin ≤ Rs ≤ R. This implies that we assume that an admission control procedure operates in the system such that the maximum number of simultaneously admitted sessions remain under Iˆ S S(R/ˆ ˆ a) . We say that state i is an under-loaded, critically loaded or overloaded state if ˆ is less than, equal to or greater than S respectively. S(i · R) 2.1 The What-It-Wants Scheduling Policy ˆ channels to the sessions The What-It-Wants scheduling policy attempts to grant S(R) ˆ as long as i · S(R) ≤ S; i > 0. Otherwise, in overloaded states, it grants either Si or Si channels. Specifically, the What-It-Wants scheduling policy is defined by the ˆ ≤ S: following Policy Vector. If i · S(R) ⎧ ˆ 1 if s = S(R) ⎪ ⎪ ⎨ → − (granting peak rate with prob. 1), P W IW (s) = (1) ⎪ ⎪ ⎩ 0 otherwise. For overloaded states, we need to distinguish between two cases. If Si is an integer number, then: ⎧ S ⎪ ⎪ 1 if s = i ⎨ → − (granting an equal share with prob. 1), P W IW (s) = (2) ⎪ ⎪ ⎩ 0 otherwise. When Si is not an integer number, the following relations must hold. The scheduler grants Si channels with probability P1 and Si number of channels with probability 1 − P1 . Clearly:
S S S S S P1 · + (1 − P1 ) · = ; P1 = − . i i i i i (3) Thus, the policy vector in this case takes the form: ⎧ P1 if s = Si ⎪ ⎪ ⎪ ⎪ ⎨
→ − P W IW (s) = 1 − P1 if s = Si ⎪ ⎪ ⎪ ⎪ ⎩ 0 otherwise.
(4)
On Scheduling and Interference Coordination Policies
491
2.2 The All-Or-Nothing Scheduling Policy In the All-Or-Nothing scheduling policy all resources are assigned to the scheduled session. This type of scheduling is employed in High Speed Downlink Packet Access (HSDPA) systems when code multiplexing is not used. Thus, a session with peak rate ˆ would need to be scheduled with probability S(R)/S ˆ requirement R in order for it to receive its peak rate. However, when there are i ≥ 1 on-going sessions, any given session cannot get scheduled with higher probability than 1/i. That is, in the All-OrNothing scheduling policy, in system state i, a session gets scheduled with probability ˆ 1/i]. The scheduling policy takes the following form: P2 = Min[S(R)/S, ⎧ P2 if s = S ⎪ ⎪ ⎪ ⎪ ⎨ → − P AoN (s) = 1 − P2 if s = 0 (5) ⎪ ⎪ ⎪ ⎪ ⎩ 0 otherwise. 2.3 The Fifty-Fifty Scheduling Policy The Fifty-Fifty scheduling policy can be seen as a policy in between the What-It-Wants and All-Or-Nothing policies. When there are i sessions in the system, the scheduler divides the resources (almost) equally between the competing sessions (similarly to WhatIt-Wants). However, similarly to the All-Or-Nothing policy, in under-loaded states this ˆ Thus, would mean that the sessions receive more resources in the long term than S(R). S S in this policy, in underloaded state i, if i is not integer, a session receives i channels with probability P31 , Si number of channels with probability P32 and no channels ˆ < S and S is not with probability 1 − P31 − P32 . Clearly, in states for which i · S(R) i an integer number:
S S S S S S ˆ : − P31 · + P32 · = S(R), and P31 : P32 = − . (6) i i i i i i If Si is integer, the session is assigned Si number of channels with probability P33 and zero channels with probability 1 − P33 : P33 ·
S ˆ = S(R); i
and:
P0 = 1 − P33 .
ˆ ≥ S) the channels are fully utilized For critically and overloaded states (i · S(R) (P34 + P35 = 1):
S S P34 · i · + P35 · i · = S. i i In the critically loaded and overloaded states, if Si is integer, the number of allocated sessions for each session is Si with probability 1. Based on these observations, the scheduling policy vector for the Fifty-Fifty policy is straightforward to determine (although a bit tedious to formally specify it, see [1].
492
G. Fodor
Cell-0 Cell-1 Cell Under Study Dominant Interfering Cell
K0
Collision
K1
Cell-1
Cell-0 Random Allocation of frequency channels
Cell-0
Cell-1
Allocation of frequency channels
S Collision
No Collisions
Allocation of frequency channels
Fig. 1. Random and Coordinated ICIC policies. Coordinated ICIC can be realized by assigning a cell specific ordered list of the frequency channels to each cell such that the “collisions” of frequency channels are avoided as long as there are non-colliding pairs. Assuming a single (dominant) interfering cell (as in [9] and [10]), devising such ordered lists is straightforward. For many cells, coordinated ICIC implies careful frequency planning, as described in for example [4].
2.4 A Numerical Example Consider an OFDMA cell that supports S = 64 sub-carriers (channels). Sessions have ˆ = 4 channels. When there are 6 ina peak rate requirement that corresponds to S(R) progress sessions, the system is under-loaded (6 · 4 < 64), the three scheduling policy vectors are as follows: → − P W IW = [0, 0, 0, 0, 1, 0 . . . , 0] ; → − 60 4 , 0, . . . , 0, P AoN = ; 64 64 40 8 16 , 0, . . . , 0, , , 0, . . . , 0 , (7) PF F = 64 64 64 where the PF F vector has non-zero elements at positions 0, 10 and 11 (corresponding to 1, 11 and 12 scheduled channels). Since the system is underloaded, the What-It-Wants policy grants the peak rate with probability 1 (4 channels), the All-Or-Nothing policy allocates all the 64 channels with probability 4/64. The Fifty-Fifty policy (A = 0.0625, B = 0.03125 so P31 = 0.125 and P32 = 0.25) either allocates 10 or 11 channels to any given session (with probabilities 8/64 and 16/64 respectively) or it does not schedule the session (zero channels with probability 40/64). (All three policies allocate 4 channels in the long term average in this system state.) 2.5 ICIC Policies: Random and Coordinated Sub-Carrier (Channel) Allocation Basically, there are two approaches as to how the sub-carriers out of the available ones are selected when a session requires a certain number of sub-carriers (see Figure 1). The
On Scheduling and Interference Coordination Policies
493
simplest way is to pick sub-carriers out of the ones that are available (i.e. scheduled) randomly such that any available sub-carrier has the same probability to get allocated to an arriving session. Random allocation of sub-carriers is attractive, because it does not require any coordination between cells, but it may cause collisions even when there are free sub-carriers. In contrast, a low complexity coordination can avoid collisions as long as there are non-colliding sub-carrier pairs in the two-cell case and non-colliding tuples in the multiple-cell case. We refer to this method as coordinated sub-carrier allocation. (Further details about these ICIC policies in general can be found in [8].)
3 Performance Measures of Interest and Solution Approach 3.1 Input Parameters and Performance Measures We consider a single OFDMA cell with S channels at which sessions belonging to the same (elastic) service class arrive according to a Poisson process of intensity λ. Each session brings with itself a file whose size is an exponentially distributed random variable with parameter μ. The session requests a radio bearer that is characterized by its ˆ (for which: S(R) ˆ ≤ S) and minimum rate R/ˆ ˆ a, where a peak rate R ˆ ≥ 1 is the maximum slowdown factor associated with the session. If, at the time instant of the arrival of the new session, the admission of the new session brought the system into a state in which the minimum rate (governed by the particular scheduling policy) cannot be granted, the session is blocked and leaves the system. The single cell is disturbed (interfered) by a single dominant interferer cell, such as in [9]. In this paper we characterize the load in this dominant interfering cell by the number of used sub-carriers K1 ≤ S. When an allocated sub-carrier in the cell under study and one of the K1 disturbing channels use the same sub-carrier frequency, we say that the two sub-carriers collide and suffer from co-channel interference [4]. The performance measures of interest are the session-wise blocking probability and the mean file transfer time. These two quantities represent a trade-off since more admitted sessions imply lower per-session throughput and thereby longer file transfer times. This trade-off in a WCDMA environment has been investigated by Altman in [6] and subsequently by Fodor et al. in [7]. 3.2 Step 1: Determining the Distribution of the Allocated Sub-Carriers Recall that in each system state the scheduling policy vector determines the probability that a given session is allocated s channels. When a session is given s channels (which → − happens with probability P (s)), we need to calculate the conditional distribution of the number of the totally allocated number of channels (that is to all sessions) in the cell (denoted by K0 ), given that the session under study is given s channels. This is because K0 and the number of disturbing channels K1 determine the distribution of the colliding and collision-free channels in the cell, which in turn determine the performance measures of interest. We cannot give a closed form formula for the (conditional) distribution of K0 . However, in [1] we provide the pseudo code description of the algorithm that calculates it.
494
G. Fodor
3.3 Step 2: Determining the Distribution of the Colliding Sub-Carriers Under the Random and Coordinated Sub-Carrier Allocation Policies Lemma 1. Let S denote the total number of available sub-carriers in each cell and let K0 ≤ S and K1 ≤ S denote the number of allocated channels in Cell-0 and Cell-1 respectively. Let N1 (c) denote the number of possible channel allocations in Cell-0 and Cell-1 such that the number of collisions is c. Then, the distribution and the mean of the number of collisions under the random allocation policy (γ1 ) are as follows: cMIN = M ax[0, K0 + K1 − S],
cMAX = M in[K0 , K1 ],
S S−c S − K0 N1 (c) = · · ; c K0 − c K1 − c
E[γ1 |K0 , K1 ] =
cM AX c=cM IN
c · N1 (c) , T OT 1
P r{γ1 = c|K0 , K1 } =
where T OT 1 =
S K0
N1 (c) , T OT 1
S · . K1
Lemma 2. Using similar notation as in Lemma 1, the distribution and the mean number of collisions under the coordinated allocation policy (γ2 ) is given by: 1 if c = c0 N2 (c) = 0 otherwise, c0 =
0 K0 + K1 − S
P r{γ2 = c} = N2 (c),
if K0 + K1 < S, otherwise.
E[γ2 ] =
cM AX
c · N2 (c).
c=cM IN
3.4 Step 3: Determining the Packet-Wise Effective SINR The scheduling policy vector specifies the probability that s channels are used in Cell-0, whereas Lemmas 1-2 determine the probability that the number of colliding channels is c. We will use the following lemma to determine the probability that the number of colliding channels in a packet of size L is γ when the number of scheduled channels (for the session under study) is s and the total number of colliding channels is c ≤ s. Lemma 3 c s−c L γ L−γ P r {γ ≤ c channels out of L are colliding} = · s · s−γ γ γ L−γ
(8)
On Scheduling and Interference Coordination Policies
495
3.5 Step 4: Calculating the SINR Level in Case of Collisions for the Downlink Lemmas 1-3 determine the probability that the number of colliding channels is γ and the number of non-colliding channels is L − γ in a packet of a session under study. We now need to determine the impact of the collision on a channel’s signal-to-noise-andinterference (SINR) ratio. For this, we use the path loss model recommended by the 3GPP (described in [15]) and a result from [8]. Let θ be a predefined threshold and let X rr01 be a random variable representing the distance ratio between the mobile station distances from its serving and disturbing base station respectively. Also, let Q0 and Q1 denote the power that the serving and the neighbor base station uses on the colliding channels respectively. Furthermore, let G0 and G1 denote the path gains from the serving base station (that is in Cell-0) and the dominant neighbor base station (that is in Cell-1) respectively to the mobile station under study. Then, the probability that the SINR remains under this threshold can be approximated as follows [8]: G0 · Q0 <θ ≈ Pr G1 · Q1 + N0
Max[X]
fX (x)g(x) dx;
0
xμ θ 5 ln Q0 /Q1 1 · . g(x) erfc − 2 bς ln 10
(9)
where fX (x) is the probability density function of X; b, ς and μ are the parameters of the 3GPP path loss model as described in [15]. 3.6 Step 5: Calculating the Effective SINR and the Packet Loss Probability We are now in the position that the packet loss probability in each system state can be determined. When one or more of the channels that are used to carry a packet are hit by collisions, an efficient way to characterize the overall SINR quality of the packet is to use the notion of the effective SINR. This concept has been proposed in [12] and used in for instance [13], in which a method to calculate the packet error probability for a given value of the effective SINR was also proposed. A specific method to calculate the effective SINR (based on the SINR of the composing channels) that is applicable in cellular OFDM systems is also recommended by the 3GPP [11]. In this paper we employ the 3GPP method that can be summarized as follows. Suppose that there are L sub-carriers that carry a data packet and each has a SINR value of SIN Ri . Then, the effective SINR that is assigned to the packet is given by: L SIN R i −1 1 SIN Reff = α1 · I I , (10) L i=1 α2 where I(·) is a model specific function and I −1 (·) is its inverse. The parameters α1 and α2 allow to adapt the model to characteristics of the considered modulation and coding scheme. The exponential effective SINR metric proposed in [11] corresponds
496
G. Fodor
to I(x) = exp(−x). In [13] it is shown that for QPSK and 16-QAM modulation, the parameters α1 and α2 can be chosen as follows: α1 = 1 and α2 = 1. In [13] a method to determine the packet error rate (σ) as a function of the effective SINR is presented. Essentially, this method maps (in a 1-1 fashion) the effective SINR onto a (modulation and coding scheme dependent) packet error rate. 3.7 Step 6: Determining the Performance Measures of Interest We now make use of the assumption that the session arrivals form a Poisson process and that the session size is exponentially distributed. We choose the number of admitted sessions as the state variable and thus the number of states in the system is Iˆ + 1. The transitions between states are due to an arrival or a departure of a session. The arrival rates are given by the intensity of the Poisson arrival processes. Due to the memoryless property of the exponential distribution, the departure rate from each state depend on the nominal holding time of the in-progress sessions, and also on the slow down factor and the packet error rate in that state. Specifically, when the slow down factor is ai (n), and the packet error rate is σ(n) its departure rate is (1 − σi (n))μi /ai (n). The Markovian property for such systems was observed and formally proven by Altman et al. in [14]. Thus, the system under these assumptions is a continuous time Markov chain whose state is uniquely characterized by the state variable n.
4 Numerical Results In accordance with the 3GPP recommendation, we here (in a somewhat simplified fashion) assume that a downlink resource block (sometimes referred to as a chunk) occupies 300 kHz and 0.5 ms in the frequency and time domains respectively. A chunk carries 7 OFDM symbols on each sub-carrier; therefore the downlink symbol rate is Rsymbol =140 symbols/chunk/0.5ms. Assuming a 10 MHz spectrum band, and considering some overhead due to measurement reference symbols and other reasons, this corresponds to 30 chunks in the frequency domain (S = 30), that is 8400 ksymbol/s. The actual bit-rate depends on the applied modulation and coding scheme, in this paper we do not model adaptive modulation and coding (AMC), we simply assume a fixed binary phased shift keying (BPSK) so that each symbol carries nMCS = 2 bits. Sessions arrive according to a Poisson process of intensity λ = 1/8 [1/s]. A session is characterized by the amount of bits that it transmits during its residency time in the system (we may think of this quantity as the size of the file that is to be downloaded). We assume that this file size is an exponentially distributed random variable with mean value ν = 4 ∗ S ∗ Rsymbol ∗ nMCS . The blocking probability and file download meantime results are shown in Figures 2-3. On the x axis we let the number of disturbing channels (i.e. the occupied channels in the neighbor cell) increase (K1/5 = 1 . . . 6), while the y axis shows the blocking probabilities and the mean session residency times. The upper graphs in each figure correspond to the case when there is no channel allocation coordination between the cells, while the lower graphs assume coordination (channel segregation). This system is highly loaded so that when a ˆ is set to 1, the blocking probabilities increase from 5.5% up to 7%, while the file download time increases from 30.5 s to 33 s (not shown here). The figures correspond to the case when a ˆ = 2. We observe
On Scheduling and Interference Coordination Policies BLOCKING PROBABILITIES 0.0006
497
Ideal FifFif
0.0005
AllOrN
0.0004
WIW
0.0003 0.0002 1
2
3 4 0.2K1
5
6
BLOCKING PROBABILITIES 0.0006
Ideal FifFif
0.0005
AllOrN
0.0004
WIW
0.0003 0.0002 1
2
3 4 0.2K1
5
6
Fig. 2. Blocking probabilities as the function of the number of disturbing channels in the dominant interferer cell (K1 = 5, 10, 15, 20, 25, 30). When the sessions tolerate some slowdown ˆ the blocking probabilities are low, (in this example 2 or(here a ˆ = 2, that is Rmin = R/2), ders of magnitude lower than when a ˆ = 1 (not shown here)), and the coordinated allocation (lower diagram) performs somewhat better when the scheduling method is the All-Or-Nothing or What-It-Wants (“W-I-W”). TIMEINSYSTEM
Ideal
34
FifFif
33
AllOrN
32
WIW
31 1
2
3 4 0.2K1
5
6
TIMEINSYSTEM
Ideal
34
FifFif
33
AllOrN
32
WIW
31 1
2
3 4 0.2K1
5
6
Fig. 3. Average file download time as the function of the number of disturbing channels in the dominant interferer cell (K1 = 5, 10, 15, 20, 25, 30). When the sessions tolerate some slowdown ˆ the session holding time increases somewhat, (as compared to (here a ˆ = 2, that is Rmin = R/2, the case when a ˆ = 1 (not shown here)), and the coordinated allocation again performs somewhat better when the scheduling method is the All-Or-Nothing or What-It-Wants (“W-I-W”).
498
G. Fodor
that when the sessions tolerate some slowdown, the blocking probability dramatically decreases (down to 0.06% !) without much increasing the download time (from around 33s to around 34s). Secondly, we note that coordinated allocation is beneficial when the What-It-Wants or the Fifty-Fifty scheduling method is employed, and has no effect when the All-Or-Nothing scheduling is used. The curve denoted “ideal” corresponds to the case when the packet error rate σ is zero in all system states. We refer to [1] for further results and analysis.
5 Conclusion Inter-cell interference coordination is an important radio resource management function for (O)FDMA based cellular systems in general [4] and for the evolving Universal Terrestrial Radio Access Network (E-UTRA) in particular [2], [3]. We proposed the notion of the (scheduling) policy vector to model the behavior of the packet scheduler. Using the policy vector, we were able to derive the conditional distribution of the number of colliding and collision free channels in the cell under study for all three cases. This in turn allowed us to determine the distribution of the number of colliding and collision free (i.e. co-channel interference free) channels in each scheduled packet. We used this knowledge to calculate the effective SINR and from it the packet error rate and thereby the useful packet throughput of the system. This useful throughput determines the session wise blocking probabilities and the time it takes for elastic sessions to complete a file transfer. Our major finding is that the performance of the ICIC function (its impact on the system throughput) depends on the employed scheduler. Specifically, for peak rate limited ˆ < S), when frequency domain scheduling (“narrow band”) traffic (that is when S(R) is used in combination with time domain scheduling, it is useful to employ coordinated channel allocation in neighbor cells. Coordinated ICIC has little impact when the scheduler is pure time domain based. We also note that our further numerical results indicate that ICIC is only necessary for cell edge users, whose SINR is negatively impacted by frequency domain collisions [1].
References 1. G. Fodor and M. Telek, “Modeling and Performance Analysis of Scheduling Poicies for OFDMA Based Evolved UTRA”, Technical Report, 2006. http://webspn. hit.bme.hu/∼telek/techrep/multicell.pdf. 2. 3GPP Technical Report TR 25.912, Feasibility Study for Evolved Universal Terrestrial Radio Access (E-UTRA), Release 7, 2006. 3. 3GPP Technical Report TR R3.018, Evolved UTRA and UTRAN Radio Access Architecture and Interfaces, Release 7, 2006. 4. I. Katzela and M. Naghshineh, “Channel Assignment Schemes for Cellular Mobile Telecommunication Systems: A Comprehensive Survey”, IEEE Personal Communications, pp. 1031, June 1996. 5. Ying Jun Zhang and Khalid Ben Letaief, “Multiuser Adaptive Subcarrier-and-Bit Allocation With Adaptive Cell Selection for OFDM Systems”, IEEE Transactions on Wireless Communications, Vol. 3, No. 5, pp. 1566-1575, September 2004.
On Scheduling and Interference Coordination Policies
499
6. E. Altman, “Capacity of Multi-service Cellular Networks with Transmission-Rate Control: A Queueing Analysis”, ACM Mobicom ’02, Atlanta, GA, September 23-28, 2002. 7. G. Fodor and M. Telek, “Performance Anlysis of the Uplink of a CDMA Cell Supporting Elastic Services”, in the Proc. of IFIP Networking 2005, Waterloo, Canada, Springer LNCS 3462, pp. 205-216, 2005. 8. G. Fodor, “Performance Analysis of a Reuse Partitioning Technique for OFDM Based Evolved UTRA”, 14th IEEE International Workshop on QoS, New Haven, CT, USA, pp. 112-120, June 2006. 9. G. Li and H. Liu, “Downlink Dynamic Resource Allocation for Multi-cell OFDMA System”, 58th IEEE Vehicular Technology Conference, VTC 2003-Fall, Vol. 3, pp. 1698-1702, 6-9 October 2003. 10. H. Li and G. Liu, “OFDM-Based Broadband Wireless Networks: Design and Optimization”, WILEY, 2005, ISBN: 0471723460. 11. 3GPP Technical Report TR 25.892, Feasibility Study for OFDM for UTRAN Enhancements (Release 6), V6.0.0, 2004-06. 12. S. Nanda and K. M. Rege, “Frame Error Rates for Convolutional Codes on Fading Channels and the Concept of Effective Eb /N0 ”, IEEE Transactions on Vehicular Technologies, Vol. 47, No. 4, pp. 1245-1250, November 1998. 13. K. Brueninghaus, D. Ast´ely, T. S¨alzer, S. Visuri, A. Alexiou, S. Karger, G-A Seraji, “Link Performance Models for System Level Simulations of Broadband Radio Access Systems”, 16th IEEE International Symposium on Personal, Indoor and Mobile Communications, PIMRC, 11-14 Sepetmeber 2005. 14. E. Altman, D. Artiges and K. Traore, “On the Integration of Best-Effort and Guaranteed Performance Services”, INRIA Research Report No. 3222, July, 1997. 15. 3GPP TR 25.942, Radio Frequency System Scenarios, 2005.
Distributed Uplink Scheduling in CDMA Networks Ashwin Sridharan1 , Ramesh Subbaraman2 , and Roch Gu´erin2, Sprint Advanced Technology Labs [email protected] 2 University of Pennsylvania {rameshrs,guerin}@seas.upenn.edu 1
Abstract. Ever more powerful mobile devices are handling a broader range of applications, so that giving them greater control in scheduling transmissions as a function of application needs is becoming increasingly desirable. Several standards have, therefore, proposed mechanisms aimed at giving devices more autonomy in making transmission decisions on the wireless uplink. This paper explores the impact this can have on total throughput in CDMA systems, where this control has traditionally been centralized. The investigation relies on a simple distributed policy that helps provide insight into the impact of distributed decisions on overall system efficiency, and identify guidelines on how to best mitigate it.
1
Introduction
With the power and versatility of mobiles1 rivaling that of stationary platforms, the diversity and communication requirements of applications they run have also been expanding. This has resulted in a push to give mobiles more autonomy in making transmission decisions. This, however, often conflicts with the centralized operation of current wireless systems, e.g., the control exercized by base stations or the use of 802.11 RTS/CTS handshakes between devices and access points. In this paper, we explore the tension this creates in the uplink of CDMA systems. Traditional CDMA base stations tightly control transmission schedules and power to maintain acceptable signal to interference levels. Several standards for modern 3G/4G cellular networks, e.g., 1xEV-DO Rev. A [1], HSUPA [2], have, however, introduced mechanisms that give devices significant autonomy in deciding when to transmit and at what rate. As stated in [3], a major driver was to define a “wide-area-mobile wireless Ethernet,” where devices had greater independence in making transmission decisions best matched to their applications. The price for this flexibility is potentially higher interferences, and a corresponding degradation in performance. Investigating this issue is what motivated this paper.
1
The work of these authors was supported in part by a Gift from the Sprint Corporation and through NSF Grant CNS-0627004. We use mobiles, devices and users interchangeably.
I.F. Akyildiz et al. (Eds.): NETWORKING 2007, LNCS 4479, pp. 500–510, 2007. c IFIP International Federation for Information Processing 2007
Distributed Uplink Scheduling in CDMA Networks
501
One proposed mechanism for allowing distributed transmission decisions while maintaining some control on resource sharing, is a token bucket [1] similar to that used in wired networks [4]. Each token grants access to a certain amount of “resources,” with token generation rate and token bucket depth imposing limits on resource consumption. Mobile devices decide how to spend their tokens to achieve transmission rates (and latencies) best suited to their applications. Unlike wired networks where the token “currency” is in bytes, tokens are now in units of transmission power, the primary resource in a CDMA system. This leads to resource sharing models fundamentally different from the “queueing systems” that capture buffer and bandwidth consumption in wired networks. Instead, as discussed in Section 4, the sharing of resources among users is measured through the resulting signal to interference and noise ratio (SINR). We develop models that reflect this sharing, with users making independent but constrained transmission decisions, where constraints arise from token bucket mechanisms. We first investigate a simple distributed policy, with users randomly and independently alternating between idle and active periods. Token bucket constraints are introduced next that limit the frequency of active periods. We derive expressions in both settings for the achieved user rates as functions of the frequency of active periods and rate selection. This enables us to explore the impact of distributed transmission decisions on performance and assess the efficacy and effect of token buckets. Our evaluation is carried out in the context of uplink transmission in a single cell. Extensions to the multi-cell scenario will be considered in future work. The rest of the paper is structured as follows. Section 2 provides basic background on CDMA systems and reviews related work. Section 3 is a short tutorial on the operation of 3G/4G cellular uplinks. Our resource sharing model is covered in Section 4, while Section 5 describes our evaluation framework. Our analysis is covered in Section 6, with Section 7 comparing its results to simulations. Section 8 summarizes our findings and points to future work.
2
CDMA Reverse Link: Related Work
In CDMA systems [1,2] the reverse link or uplink, i.e., from mobiles to the access network, is well-known to be interference limited. Because users share the same spectrum and their signals are not perfectly orthogonal, the throughput they see is a function of both their own transmission power and that of other users whose transmissions are perceived as interference. This introduces a tradeoff as increasing either ones own signal strength or transmission frequency also increases interferences to others. A user must therefore first decide what transmission power to utilize and second, when and how often to transmit. There is a large literature addressing optimum transmission power selection. See, for example [5,6,7,8,9,10] that address the problem of joint allocation of transmission power and associated QoS functions (e.g., rate). These studies however do not address the second issue of uplink scheduling. Off late, this topic has received some attention, notably [11,12,13] where authors proposed
502
A. Sridharan, R. Subbaraman, and R. Gu´erin
joint scheduling and power allocation algorithms that take multi-user interferences into account. These works, however, assume a centralized control of users transmissions, which places a heavy burden on the system in terms of signaling overhead and scalability.
3
The 3G/4G Cellular CDMA Reverse Link
We now describe two key features of the operation of 3G/4G CDMA uplinks, which play important roles in enabling and affecting distributed scheduling policies. The first, pilot assisted transmission, governs the transmission power level of devices. Each device transmits a pilot signal on the uplink whose strength is set by the access network through a fast closed control loop to ensure that all pilot signals are received with equal power. When selecting an uplink transmission data rate, the device then sets its transmission power relative to the strength of the pilot signal. Specifically, if at time slot t the pilot strength of device i is i (R, t): PSi (t), transmission at a data rate R requires a transmission power PD i (R, t) = T xT 2P [R] · PSi (t) PD
(1)
where T xT 2P [R] is an a priori specified proportionality factor function of the target rate R. This mechanism does not yield optimal transmission power selections, but ensures a level playing field to all devices by equalizing the strength of all pilot signals at the receiver. The fast power control loop allows the pilot signal to track variations of the wireless channel, and as a result the transmission power for the data is de-coupled from the problem of coping with fading and attenuation on the wireless channel. As we shall see later, this is a critical aspect of the system. The second and more recent feature of CDMA systems is the use of a tokenbucket2 to control how devices access the uplink. Each MAC (layer 2) flow is assigned a token bucket which can hold σ “power-tokens” and is filled at a rate of ρ. To transmit at a data rate of R, a device must have T xT 2P [R] tokens that are then subtracted from its bucket3 . The higher R, the higher T xT 2P [R], and hence the faster the token bucket drains. Through the token bucket parameters (σi , ρi ), the network limits the maximum transmission power and frequency of devices, and therefore controls both the total uplink power and its allocation across devices. On the other hand, it relinquishes scheduling decisions to devices by letting them control the use of their power tokens.
4
System Model
The system consists of a single cell with n + 1 homogeneous and continuously backlogged users sharing a time-slotted uplink. Denote the data transmission 2 3
Also known as Grants in HSUPA [2]. In practice there are only a finite set of rates a device can transmit at(e.g., six in [1]).
Distributed Uplink Scheduling in CDMA Networks
503
i power of user i in slot t when transmitting at rate Ri as PD (Ri , t). Under the CDMA sharing model, the SINR of user i in slot t is then given by [14]:
Si (Ri , t) =
i G(Ri ) · Giloss (t) · PD (Ri , t) , j j 2 σ +θ Gloss (t) · PD (Rj , t)
(2)
j=i
where θ ∈ [0, 1] quantifies the orthogonality of the codes, σ 2 is the thermal noise, Giloss (t) the path loss4 of user i in time slot t, and G(Ri ) = W/Ri its processing gain, where W is the spread-spectrum bandwidth. Recall from Section 3 that the pilot signal of each device is controlled by a fast control loop so that received pilot strengths are all identical at the base station. This can be modeled as each pilot signal seeking a common target SINR 1/φ. Let PSi (t) be the pilot strength of user i in slot t. Assuming perfect power control and unconstrained transmission power, it is then easy to see that the pilot power control loop requires each device to set PSi (t) such that, Giloss (t) · PSi (t) = Δ =
σ2 ,. φ − θp n
(3)
where θp is the orthogonality factor for the pilot signal. Since, the data transmisi (R, t) is relative to the pilot strength, using Eqn. (1) and Eqn. (3), sion power PD Eqn. (2) can be written as Si (Ri , t) =
G(Ri ) · T xT 2P [Ri ] · Δ . σ2 + θ T xT 2P [Rj ] · Δ
(4)
j=i
Eqn. (4) states that with perfect power control and no power constraint, the SINR of a user is influenced only by other users rate choices and not the channel.
5
Evaluation Framework
Given our assumption of continuously backlogged and homogeneous users, a metric of interest is long-term throughput. For simplicity, we first approximate the effective rate achieved by a user in a time slot as linearly proportional to its SINR. Specifically, the effective rate achieved by a user i in time slot t, given that it transmitted at rate Ri , is given by Ci (t) =
Si (Ri , t) Ri So
(5)
where Si (Ri , t) is given by Eqn. (4) and So is the target data SINR. The linearity assumption is typically valid at small SINR values [14] as long as the modulation scheme remains the same. A limitation of Eqn. (5) is, however, that whenever Si (Ri , t) is greater than So , it yields an effective rate greater than the 4
Function of distance to base station and fast fading.
504
A. Sridharan, R. Subbaraman, and R. Gu´erin
transmission rate Ri . This is clearly not possible, and hence, the above relation is modified to Ci (t) = min(Ri ,
Si (Ri , t) Ri ) . So
(6)
We refer to Eqn. (5) as the Linear Model and Eqn. (6) as the Bounded Model. = E[Ci (t)]. In both cases, the metric of interest is average user achieved rate, C Apart from C, for token bucket constrained systems, the token efficiency, i.e., the achieved effective rate per token expended, is another metric of interest. If T is the expected number of tokens expended per time slot, the token-bucket . efficiency is defined as η = C/T 5.1
The Scheduling Policy
We consider a transmission scheme where in each slot when a user has enough tokens it either transmits at rate R with probability p, or doesn’t transmit at all with probability 1 − p. All users are assumed independent and identical in their transmission behaviour, i.e., p and R are the same for all users. Although simple, this policy is interesting for several reasons. First, it is inherently distributed, which when combined with its simplicity makes it eminently practical. Indeed, it has direct equivalents in wireline networks, e.g., Aloha, CSMA etc. Second, by virtue of easily controllable parameters, it lets us explore and understand key system properties, e.g., impact of cell load, transmission rate, etc., which we show can strongly influence performance. In addition, it captures a hybrid sharing model between pure CDMA (all users transmitting) and a slotted-system (one user transmitting at a time) that has the potential to enable distributed control while improving performance.
6
Analysis of Scheduling Behaviour
In this section, we analyze the performance of the on-off scheduler, and identify how the value of p that maximizes throughput depends on both the number of users in the cell (n + 1) and their selected transmission rate R. The analysis is based on Eqn. (4) and assumes perfect power control. We also assume that So in Eqn. (5) is large enough that the rate obeys the Linear Model. The impact of these assumptions is explored in Section 7. Due to space limitations, we refer the reader to the technical report [15] for all proofs. 6.1
Scheduler Behaviour: No Token Bucket
i (R, t) = From Section 3, the signal power to transmit at data rate R is PD i T xT 2P [R] · PS (t). For ease of exposition, assume that R is fixed and let T xT 2P [R] = K. Let Ki be the random variable denoting the transmission power used in a slot by user i. Under the on-off scheduler with no token constraints, Ki is a
Distributed Uplink Scheduling in CDMA Networks
505
Bernoulli random variable that takes values K and 0 with probability p and 1 − p, respectively. After minor algebraic manipulations of Eqn. (4) and using Eqn. (5) and G(R) = W/R, the expected achieved data rate C(p) is found to be n W 1 n j C(p) = ·p· (7) p (1 − p)n−j , θ · So j + δ j j=0 φ − θp n γ = . We now state two propositions that capture and θK θK elicit the impact of δ on the scheduling parameter p and the achieved rate C(p). where δ =
Proposition 1. If δ ≥ 1, then the expected achieved rate C(p) attains its max∗ imum value at p = 1. Proposition 2. If δ < 1, then the expected achieved rate C(p) has a unique ∗ maximum at p = p < 1. In either case, p∗ satisfies the following equation: n n j 1 1 = . (8) p (1 − p)n−j j j + δ (n + 1)p −1+δ j=0 φ − θp n Since δ = , assuming φ is fixed5 , Propositions 1 and 2 reflect the θK impact of the number of users and the selected rate on the optimal p. Specifically, with few users or low enough transmission rates so that δ ≥ 1, the optimal policy yields a pure CDMA system (p∗ = 1) where everybody transmits, exploiting the orthogonality of the CDMA codes. As the load increases (n or K ) so that δ ≤ 1, the increased interference triggers a transition to a hybrid slot-division/CDMA allocation with only some users active in any given slot. This reflects the trade-off between reducing interference (p ) and increasing transmission opportunities (p ). Propositions 1 and 2 characterize the transition point precisely through δ. Similar results were obtained in [12], albeit in a centralized setting. Next, we study the impact of selecting different transmission rates in the on state on the ∗ . optimal achieved rate C ∗ be the optimal achieved rates when the trans∗ and C Proposition 3. Let C 1 2 mission rates in the on state are R1 and R2 , respectively. If R1 > R2 , then ∗ > C ∗ . C 1 2 The proposition states that under an on-off scheduler, increasing the transmission rate R always improves throughput. Hence, one should select the (R, p∗ ) combination with the highest R. However, this is true only for an unconstrained system, and need not hold when token constraints are present. In such a setting token efficiency matters, and lower rates may fare better than higher ones that consume more tokens. 5
A typical target Pilot SINR 1/φ is between −26 dB and −17 dB and the same for homogeneous users.
506
6.2
A. Sridharan, R. Subbaraman, and R. Gu´erin
Incorporation of a Token Bucket
The previous section established that controlling transmission frequency and rate matters when devices make independent decisions. This can be realized by a token bucket with parameters (ρ, σ). The token rate ρ bounds transmission frequency and rate, while the bucket depth σ affords flexibility in scheduling decisions. Next, we use the results of Section 6.1 to explore how to spend tokens (transmissions at rate R cost K tokens) to maximize throughput. Consider the on-off scheduler, but now operating under token bucket constraints. Specifically, if p is the conditional transmission probability given enough tokens (≥ K), the token bucket evolution can be modeled as a Markov chain to obtain the stationary distribution π_l of having l tokens in the bucket [15]. The unconditional transmission probability p_tok is then given by Eqn. (10), which together with Eqn. (7) can be used to approximate Ĉ, so that the optimum pair (K*, p*) is the solution of the non-linear program N1:

$$\text{N1}: \ \max_{p,\, K \in \mathcal{K}} \ \hat{C}(p, K) \qquad (9)$$

where

$$\hat{C}(p, K) = \frac{W}{\theta S_o} \cdot p_{tok} \cdot \sum_{j=0}^{n} \binom{n}{j} p_{tok}^j (1 - p_{tok})^{n-j} \, \frac{1}{j+\delta},$$

$$p_{tok} = p \cdot \Big(1 - \sum_{k=1}^{K-1} \pi_k\Big), \qquad (10)$$

$$0 \le p_{tok} \le 1.$$

An algorithm that solves program N1 is described in [15] and evaluated in Section 7.2.
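The sketch below illustrates one way such a program can be searched numerically. The token-bucket Markov chain used here (an integer credit of rho tokens per slot, a bucket capped at sigma, and K tokens spent with probability p whenever the level is at least K) is our assumption for illustration only; the actual chain and solution algorithm are those of [15].

```python
import numpy as np
from scipy.stats import binom

def stationary_tokens(p, K, rho, sigma):
    """Stationary distribution pi of the token level for the assumed chain:
    rho tokens credited per slot, bucket capped at sigma, K tokens spent
    with probability p whenever the level is >= K."""
    P = np.zeros((sigma + 1, sigma + 1))
    for l in range(sigma + 1):
        tx = p if l >= K else 0.0
        P[l, min(l + rho, sigma)] += 1.0 - tx          # hold / accumulate tokens
        if l >= K:
            P[l, min(l - K + rho, sigma)] += tx        # transmit, spend K tokens
    A = np.vstack([P.T - np.eye(sigma + 1), np.ones(sigma + 1)])
    b = np.zeros(sigma + 2); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]        # solve pi P = pi, sum(pi) = 1

def C_hat(p, K, n, W, theta, S_o, delta, rho, sigma):
    """Approximate throughput: Eqn. (7) evaluated at p_tok of Eqn. (10)."""
    pi = stationary_tokens(p, K, rho, sigma)
    q = p * (1.0 - pi[1:K].sum())                      # p_tok of Eqn. (10)
    j = np.arange(n + 1)
    return (W / (theta * S_o)) * q * np.sum(binom.pmf(j, n, q) / (j + delta))

def solve_N1(K_set, n, W, theta, S_o, delta_of, rho, sigma, grid=200):
    """Brute-force search of N1 over the feasible token costs K and p in (0, 1];
    delta_of(K) supplies delta, which depends on the selected rate via K."""
    return max(((K, p, C_hat(p, K, n, W, theta, S_o, delta_of(K), rho, sigma))
                for K in K_set for p in np.linspace(1e-3, 1.0, grid)),
               key=lambda t: t[2])
```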
7 Simulation Results
We explore the validity of the analysis of Section 6 and the roles of rate selection and transmission probability in both unconstrained (Section 7.1) and token bucket constrained systems (Section 7.2). Results are obtained using a detailed simulator of the uplink that incorporates key characteristics of the channel model and transmission system. The target pilot strength is set to −17 dB, which allows up to 50 active users to share the uplink. Simulations for both perfect and imperfect power control yielded very similar results; hence, only results for the former are presented. Results for the latter can be found in [15], together with additional details on the simulator itself. All results are reported with 90% confidence intervals.
7.1 Unconstrained System Evaluation
Our first goal is the validation of δ = 1 as a transition point for the optimal policy, i.e., from p∗ = 1 to p∗ < 1. We focus on the linear model on which the
analysis of Section 6.1 is based, and plot in Fig. 1 the achieved throughput for two configurations: a lightly loaded system (R = 76.8 kbps and 24 users, i.e., a load ≤ 50% and δ > 1), and a highly loaded system (R = 76.8 kbps and 45 users, i.e., a load ≈ 90% and δ = 0.23 < 1). The figure highlights the different optimal policy of each configuration (p* = 1 for the former and p* < 1 for the latter), confirming the predictions made in Propositions 1 and 2. Similar results were also obtained for the bounded rate model.

Fig. 1. Impact of δ

Next, we explore the differences that exist between the linear and bounded rate models. We use a scenario with δ < 1 and plot in Fig. 2 the achieved throughput as a function of p for the two rate models. For the linear model, the optimal p* ≈ 0.03 agrees with the solution of Eqn. (8). For the bounded rate model, however, the figure highlights the impact of limiting the rate even as the SINR keeps increasing. The rate capping translates into a higher optimal p* ≈ 0.2, or in other words into allowing more simultaneously active users. The figure also illustrates, for both rate models, the benefits of the hybrid allocation of the on-off scheduler (p < 1) over both a pure CDMA system (p = 1) and a pure slot-based scheme (plotted on the right y-axis), where the latter was realized (for the bounded rate model) through a round-robin scheduler that allowed only one user to be active in any time slot.
7.2 Token Bucket Constrained System Evaluation
Fig. 2 (R = 153.6 kbps, 45 users) and Fig. 1 (R = 76.8 kbps, 45 users) also validate Proposition 3, as they show that for the linear rate model, an (R, p∗ ) combination with a higher R is indeed better. As discussed earlier, this however ignores token efficiency. Indeed, Fig. 3 shows that R = 76.8 kbps has higher token efficiency for a 24 user system. We explore next how this affects throughput under token bucket constraints.
Fig. 2. Linear vs bounded rate – 45 users, R = 153.6 kbps
Fig. 3. Token Efficiency (24 users)
Fig. 4 plots the achieved bounded rate throughput as a function of the conditional transmission probability p, i.e., the probability of transmission given enough tokens in the bucket, for a 24 user system and a token bucket with ρ = 7 dB and σ = 21.5 dB. Based on the recommendations in [1], transmissions at R = 153.6 kbps require 18.5 dB worth of tokens, and 13.5 dB at R = 76.8 kbps^6. When p is low, R = 153.6 kbps yields better throughput than 76.8 kbps because the bucket is rarely exhausted and hence token efficiency is not critical. However, this is no longer true at higher values of p, where the better token efficiency of 76.8 kbps yields a higher throughput. Overall, 76.8 kbps yields the highest achieved throughput because it provides a better compromise than R = 153.6 kbps between token efficiency and realized rate.
^6 See [15] for a full list of rate-to-token mappings.
Fig. 4. Token bucket constrained system (24 users, bounded rate model)
Last, we discuss the solution of program N1 of Section 6.2, which relies on the linear rate model, and explore its differences with the bounded rate model (Fig. 4). Table 1 presents the optimal transmission probabilities p*_A and achieved rates Ĉ*_A obtained by solving N1 for both R = 76.8 kbps and R = 153.6 kbps in a 24 user system. It also gives the optimal values obtained by simulation for the bounded rate model. As expected, because the bounded rate model caps rates, its achieved rates are significantly lower. When it comes to optimal transmission probabilities, however, the analytical results are in good agreement with simulations for R = 153.6 kbps. For R = 76.8 kbps the analytical p*_A = 1.0 is higher than that predicted by simulations, p*_sim = 0.35. However, comparing the last column C_sim(p*_A) in Table 1, which shows the throughput achieved in simulations using the analytically computed p*_A, with the optimal Ĉ*_sim, we see that they are very close for both R = 76.8 and 153.6 kbps. This indicates that the p*_A obtained from solving N1 provide very reasonable estimates for setting the transmission probabilities in practice.

Table 1. Token Bucket - Bounded Rate Model

Rate (kbps)   Analysis (N1): p*_A   Ĉ*_A   Simulation: p*_sim   Ĉ*_sim   C_sim(p*_A)
76.8          1.0                   26.4   0.35                 17.84    16.56
153.6         0.21                  42.9   0.25                 10.63    10.59
8 Conclusions and Future Work
In this paper, we investigated the performance of a CDMA uplink when transmission decisions are distributed to the mobiles. This was motivated by standard
proposals [1,2] that introduced support for such distribution. The investigation relied on a simple on-off scheduler to explore the impact of distributed transmission decisions, and identified, both analytically and via simulations, key factors that affect system performance and how to account for them in designing a scheduling policy. The paper also investigated the realization of such distributed scheduling decisions through a token bucket, and how the token bucket operation affects the scheduler. There are many possible extensions to this work; we mention two that we are currently exploring. The first is the identification of the optimal capacity region for distributed decisions and how to achieve it. The second, and possibly more important, direction involves using the token bucket to provide differentiated services to users. Preliminary results can be found in [15].
References

1. QualComm: 1xEV: 1x EVolution, IS-856 TIA/EIA Standard (2004)
2. 3GPP TS 21.101, UTRAN-based 3GPP System, Rel. 6: High Speed Uplink Packet Access (2005)
3. Bhushan, N., Lott, C., Attar, R., Black, P., Jou, Y.C., Fan, M., Ghosh, D., Au, J.: CDMA2000 1xEV-DO Revision A: A physical layer and MAC layer overview. IEEE Comm. Mag. 44(2) (2006)
4. Turner, J.: New directions in communications (or which way to the information age?). IEEE Comm. Mag. 24(10) (1986)
5. Foschini, G., Miljanic, Z.: A simple distributed autonomous power control algorithm and its convergence. IEEE Trans. Veh. Technol. 42(4) (1993)
6. Holliday, T., Bambos, N., Goldsmith, A.J., Glynn, P.: Distributed power control for time varying wireless networks: Optimality and convergence. In: Proc. 41st Allerton Conf. (2003)
7. Hande, P., Rangan, S., Chiang, M.: Distributed Uplink Power Control for Optimal SIR Assignment in Cellular Data Networks. In: Proc. INFOCOM (2006)
8. Price, J., Javidi, T.: Decentralized and Fair Rate Control in a Multi-Sector CDMA System. In: Proc. WCNC (2004)
9. Yim, R., Shin, O.S., Tarokh, V.: A CDMA Reverse Link Rate Control Algorithm with Fairness Guarantee. In: Proc. IWCT (2005)
10. Tinnakornsrisuphap, P., Lott, C.: On the fairness and stability of the reverse-link MAC layer in cdma2000 1xEV-DO. In: Proc. ICC (2004)
11. Elbatt, T., Ephremides, A.: Joint Scheduling and Power Control for Wireless Ad-hoc Networks. In: Proc. INFOCOM (2002)
12. Kumaran, K., Qian, L.: Uplink Scheduling in CDMA Packet-Data Systems. In: Proc. INFOCOM (2003)
13. Cruz, R.L., Santhanam, A.V.: Optimal Routing, Link Scheduling and Power Control in Multi-hop Wireless Networks. In: Proc. INFOCOM (2003)
14. Viterbi, A.J.: Principles of Spread Spectrum Communication. Addison-Wesley (1995)
15. Sridharan, A., Subbaraman, R., Guérin, R.: Distributed Uplink Scheduling in CDMA Systems. Research Report RR06-ATL12070139, Sprint ATL (2006)
Resource Allocation in DVB-RCS Satellite Systems

André-Luc Beylot¹, Riadh Dhaou¹, and Cédric Baudoin²

¹ IRIT/ENSEEIHT, 2 rue C. Camichel, BP7122, F-31071 Toulouse Cedex 7, France
[email protected], [email protected]
² Alcatel Alenia Space, 26 avenue JF Champollion, BP1187, F-31037 Toulouse Cedex 1, France
[email protected]
Abstract. This paper compares several approaches for dynamic allocation in geo-stationary networks based on the DVB-RCS system. Each Satellite Terminal (ST) regularly sends requests to the Network Control Center (NCC), which in turn allocates resources to the users. Unfortunately, the delay in this request-assignment loop makes dynamic bandwidth allocation very difficult. Simple mechanisms, such as a fixed allocation or requests based on the current size of the terminal's queue, are compared to previously proposed predictive methods based on control theory techniques. A lower bound is also derived by assuming that the actual size of the buffer is instantaneously known. It is shown that if the traffic is not really bursty, a fixed allocation, which implies lighter signalling mechanisms, leads to good results. Under bursty traffic conditions, simple mechanisms in which the requests correspond to the actual size of the buffer may lead to the best performance results.

Keywords: Resource allocation, Modeling and performance evaluation.
1 Introduction
DVB-RCS (Digital Video Broadcasting - Return Channel by Satellite) satcom systems provide a shared uplink among the users of various terminals. Dynamic allocation mechanisms, taking into account the needs of each terminal, have been investigated to enable an optimal use of the expensive and limited bandwidth. The capacity requests are calculated according to the traffic generated in the terminal. Several modes of calculation are proposed, based on buffer-size information and on the input and/or output rates, with the possible use of a calculation window taking the satellite delay into account. The allocation requests are then processed in the Network Control Center (NCC), which controls the radio resource according to the requests and their priority. In this paper, we define several methods for calculating requests and allocations. The allocation loop is characterized by an important delay, a shared access to the resource, and a granularity of allocation. The methods aim at decreasing the average latency of the traffic in the buffers while ensuring fairness
between satellite terminals. The suggested loops are modelled and evaluated to compare their performance, their complexity and their sensitivity to the traffic type.
2 Dynamic Resource Allocation

2.1 The System
The DVB-RCS standard considers various classes of service at the MAC level and associated allocation methods. Packets have a fixed size.

– CRA: Constant Rate Assignment. This class is not subject to allocation requests; the capacity is allocated (in slots/frame) at connection setup and remains constant. This mode is used for delay-sensitive traffic (voice).
– RBDC: Rate Based Dynamic Capacity. This category is used for traffic with a variable rate which can tolerate the response time of the MAC scheduler. The bandwidth is granted on request to match the instantaneous traffic rate.
– VBDC: Volume Based Dynamic Capacity. Used for traffic without any time constraints; VBDC requests are expressed in number of slots and satisfied if there are remaining slots.

The duration of a frame Tf is equal to 30 ms. An allocation cycle includes the propagation delay and various computing times (terminal and NCC) and is set to d = 23 frame durations. Thus, the request for frame n must be calculated and sent in frame n − 23. In such a context several allocation methods have been proposed.
2.2 Predictive Allocation Techniques
Several control techniques have recently been applied to communication networks. They lead to performance gains under adequate traffic conditions. The Smith Predictor (SP) [7] is a popular and very effective long dead-time compensator for stable processes. The main advantage of the SP method is that the time delay is effectively taken outside the control loop in the transfer function. Classical SPs are used to remove potentially destabilising delays from the feedback loop by employing a "loop cancellation" technique. The SP has been successfully used in designing congestion control algorithms in TCP [5] and ATM [6] networks. The authors of [4] proposed a DAMA mechanism based on an SP controller, making it possible to adapt the load on demand for satellite networks. Their results are limited to the study of simple parameters and to systems which can be described by first-order models with a long delay (the multiple combinations of parameters and the study of systems described by higher-order models are still open). The SP algorithm can be described as follows. Let yi(n) denote the allocation request
emitted by terminal i during frame n and ui(n) its buffer size at the beginning of the same frame (K is a gain factor, 0 < K < 1; only integer values for the requests are considered):

$$y_i(n) = K\Big(u_i(n) + \sum_{j=n-d}^{n-1} y_i(j)\Big) \qquad (1)$$
In the present paper, the allocation technique mainly concerns VBDC traffic. Consequently, if the total amount of the requests is lower than the actual number of slots C, all the requests are satisfied. Otherwise, the allocation is proportional to the requests:

$$u_i(n) = \begin{cases} y_i(n-d) & \text{if } \sum_k y_k(n-d) \le C \\[4pt] C \, \dfrac{y_i(n-d)}{\sum_k y_k(n-d)} + \varepsilon_i(n-d) & \text{otherwise} \end{cases} \qquad (2)$$
As the results may not be integer values, the remaining slots are randomly and uniformly distributed among the active terminals whose requests have not been fulfilled; εi(n − d) consequently stands for an additional allocated slot. A novel approach for dynamic bandwidth allocation (DBA) in satellite networks has been presented in [1,2,3]. Each ST uses a local adaptive predictor to forecast the future input traffic flow, along with a local predictive (receding-horizon) controller to generate a bandwidth request to the NCC. This approach gives good results when the traffic model is well known. The difficulties of adapting the predictor to actual traffic conditions and the complexity of optimizing the controller have been evaluated [8] but will not be discussed in this paper. Moreover, several papers [13,14,15] have shown that adaptive resource allocation policies may outperform fixed policies.
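As a concrete illustration of this request/allocation loop, the sketch below implements the SP request of Eqn. (1) and the proportional allocation of Eqn. (2). It is a minimal sketch under our own conventions: requests are rounded up to integers, proportional shares are floored, and the slots lost to rounding are handed out uniformly at random to unfulfilled terminals, matching the role of ε_i.

```python
import numpy as np

def sp_request(u_i, outstanding, K=0.8):
    """Smith-predictor request of Eqn. (1): the current buffer size plus the
    d outstanding requests y_i(n-d), ..., y_i(n-1), scaled by the gain K."""
    return int(np.ceil(K * (u_i + sum(outstanding))))

def allocate(requests, C, rng=np.random.default_rng()):
    """Allocation of Eqn. (2) for C slots per frame."""
    total = sum(requests)
    if total <= C:
        return list(requests)                      # every request satisfied
    alloc = [C * y // total for y in requests]     # floored proportional share
    leftover = C - sum(alloc)                      # slots lost to rounding
    unfulfilled = [i for i, y in enumerate(requests) if alloc[i] < y]
    for i in rng.choice(unfulfilled, size=leftover, replace=False):
        alloc[i] += 1                              # the epsilon_i slots
    return alloc
```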
2.3 Other Allocation Techniques
In the present paper, we aim at comparing simple algorithms to the SP method. The first one supposes that the controller knows the buffer state of the terminals instantaneously. Of course, this method cannot be implemented, because the delay needed to report packet arrivals is exactly the problem we face. Nevertheless, it leads to optimal performance results to which the other techniques may be compared. This method is named "Big Brother" in the present paper. Many algorithms may be chosen to implement fairness principles between STs; when considering infinite buffers, all work-conserving policies lead to the same mean delay, so we did not need to single out a particular one. The second method which has been considered is a Fixed Allocation. Terminals receive a constant number of slots per frame. This allocation may be done during the set-up period of the terminal. In the present paper, uniform traffic conditions are considered. The number of allocated slots may vary according to the number of active sessions using a simple procedure and with a very light signalling mechanism. For the evaluation, the duration of the allocation period is supposed to be large enough to consider that a terminal is allocated a constant number of slots during a frame period. If it has fewer packets to send, the
corresponding slots will be lost. The principle is that the round-trip delay is so long that it is not efficient to implement solutions at a packet level. The third method is exactly the opposite of the previous one: a signalling procedure is implemented. The terminals send requests reflecting their current buffer size. This mechanism is named “Candid Algorithm” because no intelligence is required to calculate the size of the requests. The controller collects all those requests, calculates the corresponding numbers of allocated slots and sends back these values.
3 Model and Analysis of the System

3.1 Model of the System and Traffic Assumptions
In the present paper, we do not investigate the actual scheduling of the slots or the exact position of the allocated slots in the frame. Bulk departures are considered at the beginning of the frame during which packets are actually sent. Different traffic models will be considered. For simplicity, we will first assume that packets arrive at the queues according to independent Poisson processes. Interrupted Poisson Processes (IPP) will then be considered in order to show the influence of the burstiness of the input traffic on the performance criteria.
3.2 Models “Big Brother” and “Candid” - Poisson Arrivals
In the case of Poisson arrivals, the analysis of the two methods can be done through an M/D^(H)/1 queue with bulk departures, where H is the maximal size of the burst. This size is constant and is equal to C in the “Big Brother” case and to C/N in the “Candid” method (N is the number of terminals). The evolution of such a queue at departure epochs can be described as follows:

$$X_{n+1} = \max\{X_n - H,\, 0\} + A_n \qquad (3)$$
where X_n is the number of packets in the queue (or in the aggregate queue in the “Big Brother” case) just before departure and A_n the number of arrivals during frame n. This queue has been extensively studied. Using a z-transform approach, classical solutions consist in finding the roots of the denominator of

$$X(z) = \frac{A(z)\,\big(S(z) - z^H\big)}{A(z) - z^H} \qquad (4)$$
where X(z), A(z), and S(z) are the pgfs of the stationary queue length, the number of arrivals, and the number of departures, respectively. The mean number of packets in the queue can be expressed as follows:

$$E(X) = \rho + \frac{A''(1) - S''(1)}{2(H - \rho)} \qquad (5)$$
where ρ is the mean number of arrivals during a frame: ρ = A'(1) = S'(1). We propose new bounds on this quantity using the following considerations. The first lower
bound follows from observing that the second moment of the number of departures is lower than the second moment of the number of arrivals; it is a lower bound because the number of departures is bounded by the maximal size of the bulk, and it is reached if the number of departures is exactly equal to the number of arrivals. The second lower bound considers that either 0 or H packets are served: as S'(1) = ρ, we get S''(1) ≤ (H − 1)ρ. An upper bound of the mean queue size is derived by considering that the variance of the number of departures is positive, thus S''(1) ≥ ρ² − ρ. Finally,

$$\max\left(\rho + \frac{\rho^2 - (H-1)\rho}{2(H-\rho)},\ \rho\right) \le E(X) \le \rho + \frac{\rho}{2(H-\rho)} \qquad (6)$$
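Recursion (3) is also straightforward to simulate, which provides an independent check of the bounds in (6). A minimal Monte-Carlo sketch of our own for Poisson arrivals:

```python
import numpy as np

def mean_queue_sim(rho, H, n_frames=200_000, seed=1):
    """Estimate E(X) by iterating X_{n+1} = max(X_n - H, 0) + A_n with
    A_n ~ Poisson(rho), averaging the queue seen just before departures."""
    rng = np.random.default_rng(seed)
    x, total = 0, 0
    for a in rng.poisson(rho, size=n_frames):
        x = max(x - H, 0) + a
        total += x
    return total / n_frames
```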
In the present paper, we also implemented the very smart solution proposed in [9]. It consists in inverting arrivals and services and observing that Eq. (3) corresponds to the evolution of the response time of a discrete D/G/1 queue with, in this case, a service time which is distributed according to a Poisson process. The mean number of packets in the queue is then equal to:

$$E(X) = \rho + \sum_{k=1}^{+\infty} \frac{1}{k} \sum_{i=kH}^{+\infty} (i - kH)\, c_{i,k} \qquad (7)$$
where c_{i,k} is the probability that i packets arrive during k frames. The main interest of this method is that these expressions are root-free. To implement it, we truncate the infinite sums at a maximal value. Classical bounds, such as Kingman's formula [10] for the G/G/1 queue, can also be applied:

$$E(X) \le r + \frac{r^2 \,(C_a^2 + C_s^2)}{2(1-r)\lambda} \qquad (8)$$
where r = ρ/H, λ is the arrival rate of packets, and C_a² and C_s² are respectively the squared coefficients of variation of the “arrivals” and “service times” of the D/G/1 queue. We can easily find that C_a² = 0 and that C_s² = 1/ρ. Consequently, E(X) ≤ ρ + ρ/(2(H − ρ)), which is exactly the upper bound derived in Eq. (6). In order to derive the mean delay, we can subtract from all the previous results the mean number of departures ρ, then apply Little's formula [11], and finally add the mean time between the arrival instant and the beginning of the following frame. As Poisson arrivals are considered, this term is equal to 1/2 frame duration (a more formal proof can be obtained by deriving the mean number of packets at the arrival instant and then applying Little's formula). Thus, the mean delay (expressed in number of frames) is equal to:

$$E(R) = \frac{E(X)}{\rho} - \frac{1}{2} \qquad (9)$$
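For Poisson arrivals, c_{i,k} in (7) is just the Poisson(kρ) probability of i arrivals, so the bounds (6), the truncated series (7), and the delay (9) can be evaluated directly. A minimal sketch (the truncation levels are our choice; i_max should exceed k_max·H for a strict truncation):

```python
import numpy as np
from scipy.stats import poisson

def mean_queue_bounds(rho, H):
    """Lower and upper bounds of Eqn. (6) on the mean queue length E(X)."""
    lower = max(rho + (rho**2 - (H - 1) * rho) / (2 * (H - rho)), rho)
    upper = rho + rho / (2 * (H - rho))
    return lower, upper

def mean_queue_rootfree(rho, H, k_max=200, i_max=5000):
    """Truncated root-free series of Eqn. (7); for Poisson arrivals
    c_{i,k} is the Poisson(k * rho) pmf evaluated at i."""
    ex = rho
    for k in range(1, k_max + 1):
        i = np.arange(k * H, i_max)
        ex += np.sum((i - k * H) * poisson.pmf(i, k * rho)) / k
    return ex

def mean_delay_frames(ex, rho):
    """Mean access delay of Eqn. (9), in frame durations."""
    return ex / rho - 0.5
```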
3.3 Models “Big Brother” and “Candid” - Bursty Arrivals
Many models can be considered to take into account the burstiness of the traffic. In the present paper, we consider Interrupted Poisson Processes in order to derive
some numerical results. Our aim was mainly to compare the different methods rather than to find exact performance results. As mentioned earlier, if the optimal allocation policy depends too much on the traffic characteristics, an erroneous traffic model may lead to bad performance results. IPP processes were introduced in order to model ON/OFF traffic with “ON” and “OFF” period durations exponentially distributed (with respective parameters β and α) and Poisson arrivals during the “ON” period (rate γ). The superposition of such processes is not an IPP [12]. The performance criteria will thus be derived through the analysis of an IPP/D^(H)/1 queue for the “Candid” method and of an n-IPP/D^(H)/1 queue in the “Big Brother” case. We consider the embedded Markov chain at departure epochs, described by the number of packets in the queue and the number of active sources. The number of arrivals A_n during a frame depends on the number of active sources. For the “Candid” mechanism, we thus analyzed the continuous-time process formed by the number of packets generated since the beginning of the period, N(t), and the state of the source, E(t). The transient analysis of this process has previously been performed [12] using a z-transform approach, Φ_i(z,t) = Σ_k Pr[E(t) = i, N(t) = k] z^k. In the present paper, we prefer a Laplace-transform approach. Let P_j(k,i,t) = Pr[N(t) = k, E(t) = i | E(0) = j] and P*_j(k,i,s) denote its Laplace transform. We easily get the initial values:

$$P_1^*(0,1,s) = \frac{s+\alpha}{\tau}, \quad P_0^*(0,1,s) = \frac{\alpha}{\tau}, \quad P_0^*(0,0,s) = \frac{s+\beta+\gamma}{\tau}, \quad P_1^*(0,0,s) = \frac{\beta}{\tau} \qquad (10)$$

where τ = s² + (α + β + γ)s + αγ. The following terms can be described recursively as

$$P_j^*(k,0,s) = \frac{\beta\gamma}{\tau}\, P_j^*(k-1,1,s) \quad \text{and} \quad P_j^*(k,1,s) = \frac{\gamma(s+\alpha)}{\tau}\, P_j^*(k-1,1,s).$$

Let s₁ and s₂ denote the poles of the denominator τ. After some calculations, those Laplace transforms can be inverted; they are of the following form:

$$P_j(k,i,t) = \sum_{m=1}^{k+1} \frac{t^{m-1}}{(m-1)!} \left( u_{k,i,m}\, e^{s_1 t} + v_{k,i,m}\, e^{s_2 t} \right) \qquad (11)$$
The coefficients u_{k,i,m} and v_{k,i,m} are recursively computed [8]. A numerical method has been implemented to evaluate those functions at t = T_f, and we only consider a finite number of terms. Let R^1_{j,i,k} = P_j(k, i, T_f). The numerical solution leads to the steady-state probabilities x_{k,j} (number of packets, state of the source), and thus the mean number of packets just before departure is equal to E(X) = Σ_k k (x_{k,0} + x_{k,1}). Using the same argument as in the Poisson arrival case, we can again apply Eq. (9). In the “Big Brother” case, we have to study an n-IPP/D^(H)/1 queue, and we applied nearly the same method. The distribution of the number of arrivals during a frame period, given the number of active sources, is recursively derived from the previous equations:
$$R^{m+1}_{j,i,k} = \sum_{p=0}^{1} \sum_{q=0}^{k} \left[ R^{m}_{j,i,q}\, R^{1}_{0,p,k-q} \left(1 - \frac{i}{m}\right) + R^{m}_{j-1,i,q}\, R^{1}_{1,p,k-q}\, \frac{i}{m} \right] \qquad (12)$$
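As a cross-check of the inversion (11), the single-source terms R¹_{j,i,k} = P_j(k, i, T_f) can also be computed directly from the generator of the joint process (N(t), E(t)) with a matrix exponential. This is an alternative numerical route that we sketch here under our own truncation convention (arrivals beyond k_max are censored); it is not the method used in the paper.

```python
import numpy as np
from scipy.linalg import expm

def ipp_frame_probs(alpha, beta, gamma, Tf, k_max=50):
    """R^1_{j,i,k} = P_j(k, i, Tf) for a single IPP source.

    State (k, i) -> index 2k + i, with i = 0 (OFF), 1 (ON); OFF -> ON at
    rate alpha, ON -> OFF at rate beta, arrivals at rate gamma while ON.
    """
    n = 2 * (k_max + 1)
    Q = np.zeros((n, n))
    for k in range(k_max + 1):
        off, on = 2 * k, 2 * k + 1
        Q[off, on] += alpha; Q[off, off] -= alpha       # OFF -> ON
        Q[on, off] += beta;  Q[on, on] -= beta          # ON -> OFF
        if k < k_max:                                   # arrival while ON
            Q[on, 2 * (k + 1) + 1] += gamma
            Q[on, on] -= gamma
    P = expm(Q * Tf)                                    # one-frame transition probabilities
    return {(j, k, i): P[j, 2 * k + i]                  # start at (N, E) = (0, j)
            for j in (0, 1) for k in range(k_max + 1) for i in (0, 1)}
```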
4 Results

4.1 Analytical Results - Poisson Arrivals
In the present work we consider a high load, ρ = 0.8 (for lighter traffic conditions, predictive methods are less useful). The frame duration Tf is equal to 30 ms. The maximum aggregate bit rate is set to 2 Mbps, which leads to a maximal number of slots per frame of approximately C = 140. In Figure 1 we plot the mean access delay as a function of the number of terminals N. The case N = 1 corresponds to the “Big Brother” mechanism. We compared the different analytical methods (upper bound, lower bound, numerical solution) to simulation results. The numerical method leads to excellent results: they are nearly equal to those obtained using a discrete event simulation, whose confidence intervals are extremely low. The upper bound is quite accurate; the load is high, and these are the traffic conditions under which Kingman's formula is known to be accurate. The difference between this upper bound and the simulation results is between 3% and 20%. The lower bounds are also accurate; the one that takes the variance of the number of departures equal to the variance of the number of arrivals leads to good results when the number of terminals is low. In this case, the number of slots allocated to a terminal is high, and consequently most of the packets arriving during frame n will be emitted in frame n + 1. The other lower bound is exact when each terminal is allocated one slot per frame and leads to inconsistent results when the number of terminals decreases.
4.2 Comparison of the Allocation Methods, Poisson Arrivals
We compare the fixed allocation method, the SP technique (for various values of K), and the lower bound under Poisson arrivals. The normalized load of the system is 0.8 and we vary the number of terminals. The SP leads to worse results than a fixed allocation, especially when the number of terminals is large. With regard to the setting of the SP's parameter, we note that larger values of K lead to better results. The value K = 1 was considered, even though it constitutes an extreme value (the series of requests diverges). We can note that for this value of K the difference with the fixed allocation method is smaller. We are inclined to say that if the system is heavily loaded, all the resources will be very intensely used, and devoting them to one terminal or another does not change the global result. As Poisson arrivals are considered, the lower bound is nearly equal to 0.5 (a packet has to wait at least for the beginning of the following frame) and does not depend on the number of terminals (we did not plot this curve; it corresponds to the fixed technique with N = 1 terminal). We may finally think that these good results are due to the Poisson arrivals, which are not sporadic enough compared to real traffic. Poisson sources result in a rather good regularity of the arrival flows, which could correspond to an unfavourable configuration for the SP. Under sporadic traffic, in particular with longer durations of activity, the SP may have enough time to stabilize itself and thus to outperform a fixed allocation.
Fig. 1. Analytical results: Mean Access Delay (in number of frames) as a function of the number of terminals, Poisson Arrivals
Fig. 2. Mean Access Delay as a function of the number of terminals, Poisson Arrivals
4.3 Influence of the Burstiness, IPP Arrivals
Let us now compare the results obtained with the fixed allocation, SP, Candid and “Big Brother” methods under bursty traffic conditions. We consider again a normalized load of 0.8. IPP processes are characterized by three parameters: the mean duration of “OFF” periods, 1/α, the mean duration of “ON” periods, 1/β, and the arrival rate during “ON” periods, γ. Thus, ρ = (N/C) · γ · α/(α + β); γ is expressed in packets per frame during active periods. We took several values of the duration rate of active periods β and deduced the duration rate of silence periods α as a function of the number of terminals N. Results are depicted in Figs. 3 and 4. In Fig. 3, the average duration of the “ON” periods is short (1 frame). The difference between the allocation mechanisms and the lower limit remains reasonable. The fixed allocation gives better results than the SP; the performance degrades when the number of terminals increases. The “Candid” method gives the worst results, but these results are less sensitive to the number of terminals. In this configuration, the best results obtained with the SP correspond to the value K = 1.0, for the same reasons invoked when considering Poisson arrivals. We also note that the durations of the “ON” periods remain short compared to the supposed reactivity of the SP.
Fig. 3. Mean Access Delay as a function of the number of terminals, IPP arrivals
In Fig. 4, the mean duration of the “ON” periods is set to 100 frames. The results derived from the different allocation methods are now very far away from the lower bound. The system becomes extremely weak; the periods of overload
are very long. Even in the “Big Brother” case, the performance deteriorates. With the fixed allocation, the performance is clearly degraded when the number of terminals increases, whereas the dynamic methods allow, over a limited horizon, a more efficient absorption of traffic bursts. The SP and Candid methods are less sensitive to the number of terminals than the fixed allocation. If we consider N = 10 terminals, all the methods lead to nearly the same results. Starting from N = 14, we observe for the first time the interest of the SP compared to the fixed allocation method, which no longer works properly: in this configuration, the fixed allocation scheme allocates 7 slots to each terminal in each frame, which no longer makes it possible to absorb traffic peaks instantaneously, and the benefit of the absence of signalling is reduced. The optimal value of K is K = 0.6. The SP is better than the fixed allocation (except for K = 1.0)... but in these cases, the “Candid” method leads to even better performance results.

Fig. 4. Mean Access Delay as a function of the number of terminals, IPP arrivals
5 Conclusion
In GEO satellite systems, dynamic resource allocation is a difficult problem because of the very large delay which separates requests from their response and the potentially strong variability of the injected traffic. The Smith predictor has been extensively studied in communication networks. If traffic variations or delays are low, this method can be interesting. Nevertheless, fixed allocation or multi-step allocation appears more effective in the satellite context. Taking a traffic predictor into account is undoubtedly important. However, our
experiments show that if the actual traffic does not have the same nature as the modelled traffic, the results are questionable. In our experiments, it appears that for traffic that is not very sporadic, the fixed allocation is the most efficient (because it does not generate significant signalling overhead or delay), and that for much more sporadic traffic, a “naive” method that simply reports the current size of the buffer absorbs traffic variations more efficiently, in particular over long durations. In conclusion, the methods based on the Smith predictor may possibly be better than simpler fixed allocation methods when the system is heavily loaded or when the durations of the burst periods are large. However, if these periods are really long, some signalling could make it possible to switch from one fixed allocation level to another. It would be advisable, in further analysis, to consider real traffic traces and to focus also on throughput and fairness.
References

1. Chisci, L., Fantacci, R., Pecorella, T.: Multi-terminal dynamic bandwidth allocation in GEO Satellite Networks. In: IEEE VTC04 Spring (2004) 2797-2801
2. Chisci, L., Fantacci, R., Pecorella, T.: Predictive bandwidth control for GEO satellite networks. In: IEEE ICC 2004 (2004) 3958-3962
3. Chisci, L., Fantacci, R., Francoli, F., Pecorella, T.: Dynamic Bandwidth Allocation via Distributed Predictive Control in Satellite Networks. In: IEEE ISCCSP04 (2004) 373-376
4. Delli Priscoli, F., Pietrabissa, A.: Load-adaptive bandwidth-on-demand protocol for satellite networks. In: IEEE ICDC02, Las Vegas (2002) 4066-4071
5. Mascolo, S.: Congestion control in high-speed communication networks using the Smith principle. Automatica, Vol. 35 (1999) 1921-1935
6. Mascolo, S., Di Sciascio, E., Grieco, A.: End-to-End Congestion Control and Bandwidth Estimation in High Speed ATM Networks. In: International Conference ITI 2001, Pula, Croatia (2001) 57-62
7. Smith, O.J.M.: Closer control of loops with dead time. Chemical Engineering Progress, Vol. 53 (1957) 217-219
8. Beylot, A.L., Dhaou, R.: Optimisation de boucles d'allocation. Alcatel Space Grant Final Report (2006)
9. Janssen, A., van Leeuwaarden, J.: Analytic computation schemes for the discrete-time bulk service queue. Queueing Systems, Vol. 50, Springer (2005) 141-163
10. Kingman, J.: On the Algebra of Queues. Methuen, London (1966)
11. Little, J.: A proof for the queuing formula L = λW. Op. Research, Vol. 9 (1961) 383-387
12. Fischer, W., Meier-Hellstern, K.: The MMPP cookbook. Perf. Eval., Vol. 18 (1992) 149-171
13. Alagoz, F., Vojcic, B.R., Walters, D., AlRustamani, A., Pickholtz, R.L.: Fixed versus adaptive admission control in direct broadcast Satellite networks with return channel. IEEE Jour. on Sel. Areas in Comm., Vol. 22 (2004) 238-249
14. Baglietto, M., Davoli, F., Marchese, M., Mongelli, M.: Neural approximation of open-loop feedback rate control in satellite networks. IEEE Trans. on Neural Net., Vol. 16 (2005) 1195-1211
15. Celandroni, N., Davoli, F., Ferro, E., Gotta, A.: Networking with multi-service GEO satellites: Cross-layer approaches for bandwidth allocation. Int. Jour. of Sat. Comm. and Net., Vol. 24 (2006) 387-403
Enhanced Downlink Capacity in UMTS Supported by Direct Mobile-to-Mobile Data Transfer

Larissa Popova, Thomas Herpel, and Wolfgang Koch

University Erlangen-Nuremberg, Germany
{popova,koch}@LNT.de, [email protected]
Abstract. The goal of this work is first to analyze the feasibility of a peer-to-peer file sharing technique in mobile cellular environments, taking into account key characteristics and peculiarities of the UMTS Radio Access Network (UTRAN). The concept is referred to here as mobile-to-mobile (m2m). Next, our research efforts explore the performance benefits of m2m file sharing applications in UMTS networks in terms of releasing overall downlink capacity, which can be used to provide better Quality of Service (QoS) for real-time services. To evaluate the performance of the proposed m2m concept we conducted extensive simulation studies with appropriately modified radio propagation models for low antenna heights at both transmitter and receiver, as is typical for m2m links. Two alternative scenarios of serving user requests (m2m network mode and conventional UMTS mode) have been constructed and analyzed. The results indicate a dramatic increase in service probability and an overall throughput gain of up to 85% in a UMTS network supported by the m2m data transmission mode. Furthermore, the results show that with a well-designed m2m routing policy and proper utilization of currently unused uplink resources (due to the asymmetric uplink/downlink traffic load), a substantial reduction of the expected file download time can be achieved.

Keywords: Direct mobile-to-mobile data transfer, WCDMA, released downlink capacity, asymmetric uplink/downlink traffic load, unified radio interface, data exchange policy.
1 Introduction
The rapid growth of new multimedia services such as real-time streaming and distributed video conferences, with their demand for high data rates, as well as the downloading of popular movies or music files, puts a considerable load onto the valuable and limited resources of radio networks. It is well known that in UMTS networks the traffic load is asymmetrically distributed between uplink and downlink. Typically, the downlink is the potential bottleneck, while free resources may be available in the uplink. Thus, proper management of the air interface resources is a challenging task in order to utilize the spectral resources more efficiently.
Nowadays, mobile networks are widely used for so-called background services such as digital camera images, mp3 file downloads, or new movie trailers. Although these services are interesting for many users, conventional solutions for capacity utilization such as multicast or broadcast are not applicable, since users request service at different times. The above considerations stimulate research into new paradigms in the architecture of cellular communication systems, as well as into new policies for handling user service requests. In this work we propose an approach to overcome the capacity limitations of cellular networks and to prevent exhaustion of the downlink capacity by enabling direct data transmission among the background users within a radio network. Currently, there are only a few proposals for a mobile peer-to-peer file sharing architecture. In [7] the authors are motivated by the potential UMTS capacity improvement of embedding WLAN (Wireless Local Area Network) systems in UMTS. The effort of this research is focused on the basic applicability and enhanced routing of the information in the proposed hybrid network by using a series of short mobile-to-mobile hops between already connected mobiles to extend coverage into areas not covered by the conventional UMTS cell. The propagation model they used is the free-space model, which is not realistic for terrestrial scenarios. The feasibility of the eDonkey Internet protocol in a GPRS environment was investigated in [4,6] and extended to UMTS radio networks in [5]. The main focus lies on resource mediation and control by using different strategies for data caching in the wired part of the network. Thus, signalling and data traffic is displaced from the air interface to the core network. No direct data transmission among the users is intended. All the above mentioned studies are based on entirely different approaches and have goals other than the concept proposed in this paper. In our concept, the users that are interested in downloading a popular file form mobile cooperative communities (groups) and, exploiting the fact that the traffic load of multimedia services is asymmetrically distributed between uplink and downlink, contribute their own currently unused uplink capacity to provide the packets of the content to other users in the group (within their coverage range) in multicast mode on the uplink carrier frequencies. Our concept is denoted m2m (mobile-to-mobile). Although we focus on the FDD (Frequency Division Duplex) mode of WCDMA (Wideband Code Division Multiple Access), the principle can be applied to other systems as well. Particularly in hotspot environments, users increasingly demand ubiquitous data availability. Thus, the main focus of our analysis lies in the optimization of data availability to users in hotspots (e.g., airports, railway stations) by using the m2m file sharing technique. We consider a background user population, dynamically joining and leaving the system. The users organize communities which are interested, for example, in a movie trailer or the latest computer games. Groups are dynamically reshaped, so that the members of each group represent a relatively loosely coupled formation. Instead of transmitting the complete data file over individual links from the Node B to each user, the original file, which is available somewhere in the network, is divided into
m logical packets and distributed packet by packet to active background users in order to generate one complete copy of this file in every radio cell. A user which has received a packet from the Node B acts as a server for that particular packet. The mobile terminals (MTs) are assumed to be able to receive on both uplink and downlink, and each new user brings further uplink resources into the system. Through this cooperative data transfer among the users, the Node B does not need to spend a lot of its valuable resources to accommodate the population of the above mentioned background user class; as a result, a major part of the traffic is shifted away from the downlink, making the released downlink capacity available for other (e.g., real-time) services. The main goal of this work is to show the feasibility of the proposed m2m technique for cellular radio networks like UMTS and to analyze the advantages of using m2m applications in some specific wireless scenarios which, we believe, are of practical importance. The performance results were evaluated taking into account the specifics of UTRAN, e.g., wireless interference and a realistic propagation model.
2 M2M Model Characteristics and Assumptions

2.1 M2M Concept
To enable the wireless peer-to-peer technique, we revised the idea of a mesh cooperative architecture for the fixed-line Internet [2] and appropriately extended it for UTRA-FDD (Frequency Division Duplex). Figure 1 illustrates the main concept of the m2m technique. MTs which are interested in downloading popular content participate in file sharing via direct m2m data transfer, with the purpose of reconstructing the original popular content, which is distributed in the network. However, in contrast to Internet peer-to-peer applications, in a cellular system the number of users who can cooperate with each other is limited by the transmit power of the MT and its coverage range, which will typically be less than a cell. Therefore, in the case of wireless cooperative community formation, mobile communities are location and radio propagation dependent. Thus, to cooperate with each other, m2m users must be organized into groups of nearby located users. Grouping is not restricted to users of one cell. The decision to which group a user should be assigned is based strictly on the current propagation conditions, described in Section 3. Upon arrival, each new user which is interested in downloading the popular content establishes contact with the Node B/RNC (Radio Network Controller) in order to get authorization to participate in the m2m file transfer and to get information about nearby located m2m users. All authorized m2m users must allow the use of their uplink capacity for providing the packets of the content they have requested to other m2m users which are interested in it. In order to reduce the transmission of identical packets on the network links, the m2m transfer is performed in multicast mode on the normally only partly used uplink carrier frequency, whereas the receivers in the group switch to listen on the uplink.
Fig. 1. M2M Concept
2.2 Radio Interface Restrictions
In the following, some restrictions of implementing a peer-to-peer technique in the UMTS Radio Access Network (UTRAN), and potential solutions, are listed.

– One of the main restrictions of UTRAN is wireless interference. In order to avoid interference from an MT transmitting in m2m mode on other signals at the Node B receiver, the transmit power is set to the minimum, which is -44 dBm according to 3GPP specifications [1].
– Compared to fixed-line networks, the air interface of wireless systems has a relatively limited transmission capacity. Furthermore, the effect of user blocking, as a consequence of congestion control, is expected to occur more often than in fixed-line environments, in both directions. Thus, in order to increase the efficiency of uplink bandwidth usage, it is necessary to reduce the signalling information as much as possible. This can be done by taking advantage of the already existing infrastructure of mobile communication systems; e.g., the network providers know the online status and service agreements of the mobile users.
– The limited battery capacity of the handsets results in a lower online time compared to fixed-line desktop PCs. Thus, a well designed file sharing organization between mobiles is essential.
– There is a need for appropriate propagation models for wireless peer-to-peer links.

The problems and challenges such as fair scheduling, battery life, billing, security, and rights management could be mitigated by strategies such as 1) offering a bonus system (e.g., upload credits) to network subscribers who choose to allow the use of their uplink capacity to provide contents to other users, and 2) implementing restrictions on the amount of traffic that a particular terminal is permitted to distribute, to save battery life. Contents liable to cost can also be distributed with the m2m strategy; they might, however, be protected, for example, via DRM (Digital Rights Management). However, all analyses in this work are mainly focused on the technological characteristics of mobile peer-to-peer applications, and a proper consideration of the above mentioned issues is not a topic of this paper.
2.3 Traffic Model
We study the performance of the proposed m2m concept in a UMTS network with a dynamic user arrival and departure pattern. User arrivals are modelled by a Poisson process, and the MTs are randomly distributed over the cell area. All users are assumed to be pedestrians. We assume that there are mobile-specific content types, like mp3 files, to be distributed with the m2m strategy. The sizes of the popular mp3 files used in our simulation have been taken from [4].
2.4 M2M Propagation Model
Successful data transmission in wireless systems depends on the radio propagation conditions. The characteristics of m2m channels are significantly different from those of conventional Node B - MT links because of the low antenna heights of both receiver and transmitter and the relatively short propagation distance.

Pathloss and Shadowing: In our work the shadowing process is characterized by a lognormal distribution with standard deviation σ = 6 dB, and the pathloss between two MTs is calculated using the extended Okumura-Hata model [3]. Although the application of this model is generally restricted to Node B antenna heights of 30 m and above, a direct comparison of this model with a model from [8], tailored for low antenna heights at both sides, shows a good suitability of the model of [3] for the present application. Figure 2 shows a scatter plot of the received power depending on the transmitter/receiver separation (a minimum transmit power of -44 dBm was assumed).

Fading: For the simulated traffic model, cf. Subsection 2.3, flat small-scale fading (constant within one radio frame (0.01 sec)) is assumed for the m2m links.
Fig. 2. Scatter plot of received signal power for m2m links as a function of pathloss and shadowing, constant transmit power of -44 dBm
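For reference, received-power samples like those in Fig. 2 can be generated from the pathloss-plus-shadowing model just described. The pathloss constants below are generic urban stand-ins of our own choosing, since the exact extended Okumura-Hata coefficients of [3] are not reproduced in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def received_power_dbm(d_m, p_tx_dbm=-44.0, sigma_db=6.0):
    """Received power = transmit power - pathloss - lognormal shadowing.
    The 38 + 30*log10(d) law is an assumed urban parameterization, not
    the exact extended Okumura-Hata fit used in the paper."""
    pl_db = 38.0 + 30.0 * np.log10(np.maximum(d_m, 1.0))
    return p_tx_dbm - pl_db + rng.normal(0.0, sigma_db, size=np.shape(d_m))

# scatter data akin to Fig. 2: random m2m separations up to 90 m
d = rng.uniform(1.0, 90.0, size=500)
p_rx = received_power_dbm(d)
```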
3 Algorithm
We simulate a population of users that are interested in downloading a popular file which is originally only available in the core network. Since the downlink capacity is limited, the Node B does not serve all users simultaneously; the popular file is divided into m logical packets, each with an individual ID, and distributed periodically (every 10 frames) packet by packet to active background users in order to generate one complete copy of this file in every radio cell. A user which has received a packet from the Node B behaves as a server for that particular packet and uses its uplink resources for providing the packet. The users are organized into multiple, dynamically reshaped groups to cooperate with each other with the purpose of reconstructing the original file.
3.1 Group Organization Policy
In the following, general assumptions for the organization of dynamic groups in m2m file sharing are presented.

– Each new user looks for a group to join when trying to receive popular content. In case no appropriate group for a new m2m user is found, he forms a stand-alone group and waits until the next group update.
– MTs form groups which satisfy the following condition: 10 lg P_TX − L_ij − Λ_ij ≥ −112 dBm with P_TX = P_TX,min, ∀ i, j ∈ group, where L_ij is the pathloss between MTs i and j and Λ_ij is the random variable describing the shadowing process (a pairwise feasibility check is sketched after this list).
– The simplest way to inform a new MT about all other MTs already requesting the same content in its coverage range is to have all MTs transmit “Hello” packets periodically. But this procedure puts considerable load on the signalling channels. Thus, it would be more efficient if, upon arrival, each new m2m user contacts a Node B that provides information about all other m2m users already in the system within a range of tens of meters, to determine the potential members of the group (triangulation or GPS). Only an MT assigned to a multiple-member group sends a “Hello” packet to get the appropriate information about the pathloss to the other MTs of its group. In this way we can reduce the signalling information between stand-alone MTs and prolong the limited battery life of the MTs.
– The size of the groups is restricted to a maximum number of members.
– Users in the group dynamically join and leave the group at any time (battery life, handover). Each MT can be a member of only one group at a time. Groups are periodically updated (every 100 radio frames) and reshaped in order to check the positions of the MTs and their radio propagation characteristics on the one hand, and to track and authorize new m2m users or handover users in the group, in case they fulfill the above mentioned “join-group” criteria, on the other hand.
– Information about the link quality can be obtained, e.g., from “Hello” packets periodically transmitted by the MTs.
– Grouping is not restricted to one cell.
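The join-group criterion above reduces to a pairwise received-power test at the minimum transmit power. A minimal sketch (the dict-based pathloss/shadowing inputs are illustrative data structures of ours, not from the paper):

```python
from itertools import combinations

def feasible_group(members, L, Lmb, p_tx_dbm=-44.0, sens_dbm=-112.0):
    """Join-group criterion of Section 3.1: every pair (i, j) must satisfy
    p_tx - L_ij - Lambda_ij >= -112 dBm at the minimum transmit power.
    L and Lmb map frozenset({i, j}) to pathloss / shadowing in dB."""
    return all(p_tx_dbm - L[frozenset((i, j))] - Lmb[frozenset((i, j))] >= sens_dbm
               for i, j in combinations(members, 2))
```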
3.2 Data Exchange Policy
Using the m2m service, the MT agrees to the use of its own free uplink resources (service level agreement). We assume that each user knows about the packets it has downloaded and the packet IDs that are available at its neighbors. The data exchange algorithm finds an appropriate “sender” candidate, based on a local “most-utile-packet” scheme, in order to maximize the number of users for which the packet can be useful. This procedure is performed framewise. After the best candidate is found, the admission control procedure verifies whether the system has enough uplink capacity to accept the connection. If there is more than one sender candidate, or if a sender candidate has more than one packet to send, the packet with the lowest ID will be distributed. In order to avoid packet collisions caused by wireless interference from other groups, to reduce identical packets on the network links, and in turn to utilize the uplink bandwidth more efficiently, our concept uses a so-called multicast mode among users within a group. Such a parallel packet downloading policy improves the performance of the system in terms of the number of simultaneously served m2m users. Users which have not found any useful packets within a specified time interval try to connect to the Node B for packet delivery.^1 We now summarize the main characteristics/assumptions of our data exchange policy:
– Initially, no packet is available among m2m users.
– The data exchange policy is based on a local “most-utile-packet” scheme.
– Packets can be distributed in an arbitrary order.
– The m2m data transfer is performed in multicast mode on the uplink carrier frequency; identification of the sender is done using a unique scrambling code.
– The MTs must be able to receive on both uplink and downlink.
– No physical data channel on the Node B is needed to control the intra-group data transfer.
– The Node B/RNC responsibilities are 1) to distribute at least one complete copy of the original file in every radio cell (the time interval between Node B “packet upload sessions” is 10 radio frames), 2) to support the data exchange process in the m2m groups with signalling information such as “listen on the uplink frequency”, etc., and 3) to serve timeout requests from MTs.

On the one hand, the last point of the download policy gives the MTs better chances of finding missing packets and finishing the download faster; on the other hand, it puts additional load on the downlink resources. To largely overcome this problem and to enable a more efficient data exchange by minimizing the probability of serving m2m requests via the Node B, it is necessary to estimate the optimal group size.
^1 The MT can receive the single packets of a data file in a random order.
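A minimal sketch of the local “most-utile-packet” selection is given below; `have` (our name) maps each group member to the set of packet IDs it holds, which the policy assumes is known from signalling. Ties are broken in favor of the lowest packet ID, as specified above.

```python
def select_transmission(have):
    """Pick the (sender, packet) pair whose multicast is useful to the
    largest number of group members; ties go to the lowest packet ID."""
    best = None                               # (utility, -packet_id, sender)
    for sender, pkts in have.items():
        for pkt in pkts:
            utility = sum(pkt not in h for u, h in have.items() if u != sender)
            cand = (utility, -pkt, sender)
            if best is None or cand[:2] > best[:2]:
                best = cand
    if best is None or best[0] == 0:
        return None                           # no useful packet in this group
    return best[2], -best[1]

# example with three group members
have = {"A": {0, 1}, "B": {1}, "C": set()}
print(select_transmission(have))              # ('A', 0): packet 0 helps B and C
```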
4 Performance Evaluation
In this section we evaluate the performance of the proposed m2m concept through simulations. We first list the assumptions and parameter settings employed in our simulations:

– The MTs are distributed randomly over the cell area. The radius of the cell is 50 m (hotspot scenario).
– The MTs generate a Poisson arrival process with rate λ of requests for some popular content.
– The maximum group size varies between 3 and 10.
– Users within the group dynamically join and leave the group at any time (handover, battery discharging).
– The MTs depart from the system immediately after finishing their download.
– The size of a logical packet is equal to one UMTS radio frame.^2
– The quality of the wireless channel in each group remains constant within each radio frame, but can vary from frame to frame (block-fading channel).
– If the packet is incorrect after detection, we declare a packet loss.
– The simulation time is 400 - 1200 sec and we collect data framewise.
4.1 Comparison of M2M with Conventional File Sharing
First, we focus on the download data volume characteristics when only one popular file is dispersed with the m2m technique. We compare the performance results with those for the conventional UMTS mode, where a continuous transmission of data is organized via individual links from the Node B to each user. We consider three performance measures:

– Overall downlink throughput gain: data volume reduction in the downlink.
– Service probability gain: number of served users.
– Download time gain: download time reduction.

We define the download time as the time window in which the user receives the complete file. The criterion for the download time gain is the 90% quantile of finished downloads in the system. The most important simulation parameters are summarized in Table 1. Figure 3 shows the system performance versus the offered traffic load. The upper graphs demonstrate the efficiency of the m2m file sharing mechanism for medium and heavy traffic load in terms of the download time reduction. The bottom graphs depict the m2m performance gain in terms of released downlink resources. As one would expect, the results show that the higher the traffic load, the more efficient the performance of m2m file sharing. Obviously, with increased traffic load m2m users have better opportunities to find the content of interest, which positively influences the file downloading time on the one hand, and increases the downlink throughput capacity on the other hand, making the released Node B capacity available to other services.
Depending on the coding scheme and spreading factor this leads to corresponding packet sizes.
530
L. Popova, T. Herpel, and W. Koch Table 1. Main simulation parameters
Traffic and environmental settings Traffic load (max. number of m2m users/cell) Maximum m2m group size Antenna type Cell radius Moving process for MT User profile Radio interface and algorithmic settings File size Required data rate Maximum user data rate with 1/2-rate coding Size of logical packets for m2m data Receiver sensitivity Transmission power in m2m mode Eb /N0 target Inner loop power control for m2m sender Simulation step Group update period
10 (low), 30 (medium), 50 (high) 3, 7, 10 omni-directional 50 m Gaussian Random Walk Pedestrian 500 KByte 60 kb/s 30 kb/s 225 bit (coded) -112 dBm -44 dBm 3 dB OFF 1 radio frame (0.01 sec) 100 radio frames (1 sec)
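The three gains can be computed directly from logged per-user records; the sketch below (our own illustration) mirrors the definitions above, using the 90% quantile of finished downloads as the download time criterion.

```python
import math

def download_time_gain(times_conv, times_m2m, q=0.90):
    """Relative reduction of the q-quantile of download times."""
    def quantile(xs, q):
        xs = sorted(xs)
        return xs[min(len(xs) - 1, math.ceil(q * len(xs)) - 1)]
    t_conv, t_m2m = quantile(times_conv, q), quantile(times_m2m, q)
    return (t_conv - t_m2m) / t_conv

def released_dl_capacity(mb_conv, mb_m2m):
    """Overall downlink throughput gain: data volume reduction in the DL."""
    return 1.0 - mb_m2m / mb_conv

# e.g. the high-load row of Table 2: 34.44 MB vs. 4.67 MB per cell
print(round(100 * released_dl_capacity(34.44, 4.67), 2))   # ~86.44 %
```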
Table 2 shows some numerical values for the overall downlink throughput gain and the relative download time change for a complete file download in m2m network mode compared with conventional UMTS data transmission (+: m2m slower, -: m2m faster). The rows "Data volume in DL in conventional mode [MB/cell]" and "Data volume in DL in m2m mode [MB/cell]" show how many MByte of data had to be sent via the downlink channels in order to distribute the 500 KByte data file to the users within one cell. We now examine some parameters and their influence on the system performance.

Table 2. Overall downlink throughput gain

Load                                              low     medium  high
Data volume in DL in conventional mode [MB/cell]  5.73    16.51   34.44
Data volume in DL in m2m mode [MB/cell]           1.33    2.93    4.67
Released DL capacity [%]                          76.70   82.23   86.44
Time m2m [sec]                                    +189.3  -20.1   -142.3

4.2 Impact of Group Size
[Figure: four plots — probability of completed downloads vs. time (Δt = 20.1 sec at medium load, Δt = 142.3 sec at high load) and reserved downlink channels vs. time, each comparing conventional UMTS with m2m.]
Fig. 3. Performance comparison of conventional and m2m mode for medium traffic load (left) and high traffic load (right), group size = 7
[Figure: probability of completed downloads vs. time for group sizes 3, 7 and 10.]
Fig. 4. Impact of the group size on the file download time for high traffic load

[Figure: total, erroneous and successful m2m data (packets per second) vs. time.]
Fig. 5. Impact of uplink interference on m2m data transmission for group size 3 in a highly loaded UMTS system
We consider again the same m2m scenario with maximum group sizes 3, 7 and 10 and investigate the effect of the group size on the performance of the proposed technique for low, medium and high traffic load. Figure 4 illustrates the influence of the group size on the file download time in a highly loaded UMTS system. Intuitively, the larger the group size, the higher the multicast efficiency: fewer senders are needed, hence, for the same number of members in
Table 3. Impact of uplink interference for different group sizes and traffic scenarios (erroneous data [%])

Scenario      group size 3  group size 7  group size 10
low load      3.71          1.86          2.07
medium load   13.70         5.56          6.25
high load     18.96         8.88          8.99
the system, large groups consume less bandwidth. Equivalently, with the same uplink resource consumption, more members can be served when the group size is larger. With an increased number of coexisting groups (group size 3) in the network, some performance degradation of the m2m technique was observed. This effect is influenced by admission control, wireless interference and user mobility. Obviously, as the number of groups grows, the number of sender candidates that can be admitted is bounded by the uplink capacity; this results in the rejection of some link admission requests from sender candidates. Besides, if the group size is too small and the mobility of users is low (as assumed in this work), the probability of finding a missing packet in each frame is quite low; the number of packet requests from m2m users to Node B increases and puts additional load on the downlink resources. Another important performance criterion is the average rate of successfully delivered packets. For systems with a small group size we observed severe m2m uplink interference, which leads to significant performance degradation and, in turn, to an increase of the download time. Figures 5 and 6 demonstrate the impact of the uplink interference on m2m data transmission for group sizes 3 and 7 in a highly loaded UMTS system (effective user data rate 30 kb/s). The relative losses of link quality (percentage of corrupted data) for different group sizes and traffic scenarios are shown in Table 3.
[Figure: total, erroneous and successful m2m data (packets per second) vs. time.]
Fig. 6. Impact of uplink interference on m2m data transmission for group size 7 in a highly loaded UMTS system

[Figure: number of uplink channels and multicast receivers vs. time for low, medium and high load.]
Fig. 7. Instantaneous number of transmitting MTs and corresponding number of receiving multicast MTs, respectively, in the cluster of 7 cells
Table 4. The average number of multicast receivers per group for different m2m traffic scenarios

Load          Users per one UL-channel  Service probability gain, %
m2m (low)     1.91                      46.92
m2m (medium)  2.43                      58.40
m2m (high)    3.11                      67.30
4.3 Impact of Multicast Technique
Since the required data rate for each m2m request is always the same, it is appropriate to evaluate the effectiveness of the proposed concept with respect to the relative service probability gain, using our knowledge of the number of successfully delivered packets in each multicast group in every frame. Figure 7 and Table 4 demonstrate the benefit of using multicast in terms of the relative gain in the number of MTs that can be supported by the m2m concept. Comparing our performance results with those of the conventional UMTS mode, one can observe a dramatic increase in service probability when the proposed m2m technique is used.
5 Conclusions and Perspectives
In this paper, a concept based on direct m2m data exchange on uplink channels is presented and its performance for UMTS is analyzed. Our concept targets the distribution of large files in a dynamic wireless environment where multiple user groups coexist in the UMTS network, taking the wireless interference among neighboring groups into account. The main focus of our analysis lies in the optimization of data availability to users in hotspots (e.g. airports, railway stations), since in hotspot environments users increasingly demand ubiquitous data availability. This technique does not require any centralized knowledge of the transfers in the rest of the network to perform the intra-group data transfer, but it requires cooperative MTs. The flexible nature of the cooperative community formation makes the proposed algorithm robust to the sudden departure of any member of a group. Furthermore, we demonstrated through extensive simulations the performance advantages of using the m2m mode in a UMTS network for the distribution of popular non-real-time content compared to the conventional UMTS mode. The following conclusions can be drawn from the numerical results:
– The higher the arrival intensity and mobility of m2m users, the more bandwidth becomes available and, as a consequence, the more efficient m2m file sharing is.
– The performance benefits provided by the m2m technique in terms of released downlink capacity are up to 85% compared to the conventional UMTS mode.
– Furthermore, we observe up to 21% download time reduction for a complete file download in a UMTS network supported by the m2m data transmission mode.
– Besides, the number of MTs that can be supported by such a system is more than three times higher than in conventional UMTS.
In this work, only a few examples of the numerical results are shown. In our studies we have conducted many more simulations than can be presented here, with different parameter settings, to investigate the feasibility of our concept. Our purpose in this paper was to demonstrate the advantage of using the m2m application in some specific wireless scenarios which, we believe, are of practical importance. According to the achieved results, the proposed algorithm may be a promising alternative for the distribution of popular content in cellular radio networks like UMTS, and it motivates further analysis of the m2m technique for a wide range of scenarios.
Impact of Technology Overlap in Next-Generation Wireless Heterogeneous Systems

Ahmed Zahran¹, Ben Liang¹, and Aladdin Saleh²

¹ Department of Electrical and Computer Engineering, University of Toronto
² Wireless Technology, Bell Canada
Abstract. The integration of different wireless access technologies is propelled by the need to support new services and better resource utilization in next-generation wireless networks. This integration complicates the system design due to the interaction of different factors including network-oriented, application-oriented, and user-oriented system parameters. In this work, we present an analytical framework to estimate different session-level performance metrics in two-tier systems, using the 3G-WLAN integrated network as an example. We investigate the impact of the amount of coverage overlap and the topology of the underlay technology on different session performance metrics as well as the total session cost. The obtained results show that clustering can significantly reduce the vertical-handoff signaling load and the forced termination probability of different applications in comparison with a random topology. Additionally, the proposed cost analysis provides design guidelines for developing economical WLAN management mechanisms to maintain reduced session cost with extended WLAN coverage.
1 Introduction
Next-generation wireless networks will converge the services of different pervasive access technologies while providing the user with a large set of novel revenue-generating multimedia-based applications. Hence, heterogeneity will naturally become the main system feature due to the huge diversity in the characteristics of access technologies and applications. The 3G-WLAN integrated system is an example of this wireless heterogeneous networking paradigm that has received great support from industrial and standardization bodies [1, 2, 3]. Currently, both technologies are among the most pervasive wireless access approaches, and they complement each other. On the one hand, 3G networks provide expensive universal coverage; on the other hand, WLANs provide ample networking resources at a cheaper cost wherever available. Hence, users will generally enjoy the best of each access technology, and service providers will enjoy better utilization of their resources.
This research was made possible thanks to Bell Canada’s support through its Bell University Laboratories R&D program.
The integrated system is very rich in its parameter set, including technology-oriented, application-oriented, and user-oriented parameters. All these parameters are interwoven in an inter-technology roaming enabled environment, creating a real challenge for system analysis and performance evaluation. For example, in the 3G-WLAN integrated model, the bandwidth provided to the mobile terminal (MT) in the two networks may vary by an order of magnitude after any change of network attachment point. Combining this fact with the bandwidth greediness of some applications due to their buffering or pre-fetching capabilities, one can directly conclude that inter-technology roaming, commonly known as vertical handoff (VHO) [4], will significantly influence next-generation session dynamics. This influence affects different system design aspects such as resource utilization, signaling, and quality-of-service (QoS). Consequently, investigating the impact of VHO on system performance under different scenarios is crucial for better design and performance evaluation of next-generation systems.

The performance evaluation of heterogeneous wireless networks is a challenging task due to the system complexity and the large set of parameters that should be considered. To the best of our knowledge, very few papers [5, 6] deal with mathematical modeling and performance analysis of heterogeneous wireless networks. In [5], the authors study the admission region of voice and data traffic within the integrated system. They derive expressions for the dropping and blocking probabilities of data and voice calls in a 3G cell and WLAN. Their main result is that voice traffic in the double coverage area should be restricted from occupying all the WLAN bandwidth. In [6], we propose a novel mobility model for a two-tier integrated system using Coxian phase-type structures to represent the cell residence time. Additionally, we develop a generic framework based on matrix-geometric theory and Markov reward models to estimate different cell-level metrics such as network utilization times and VHO rate.

In this paper, we present an analytical framework to estimate different session-level performance metrics. The obtained results are used to study the impact of WLAN coverage and topology variation on the session performance and total session cost. The analytical framework is validated through simulation. Additionally, the cost analysis shows that WLAN clustering in extended WLAN deployment can greatly reduce the session cost, provided that the increased network management cost of extended WLAN deployment is eliminated. In Section 2, we present our 3G-WLAN integrated network model. Section 3 presents an analytical framework for estimating different session parameters and the session cost. We present the numerical results in Section 4 and conclude in Section 5.
2 Network Model
In this work, we assume that a cellular network provides universal coverage, with WLANs overlapping in a portion P_ow of the total area. Consequently, the percentage of the unique cellular coverage area is P_oc = 1 − P_ow. Without loss of generality, we assume that active MTs prefer WLAN due to its larger bandwidth
and lower cost. Hence, any active MT will always hand off to a WLAN when one is encountered. Additionally, we adopt the extended-Coxian model [6] for user mobility in this integrated network. This model is based on phase-type (PH) distributions [7]. Generally, a PH random variable is defined as the absorption time of an evanescent finite-state Markov process into a single absorbing state. A PH random variable is conventionally denoted as PH(α, T), where α and T represent the initial transient state distribution and the transition rates among the transient states, respectively. The proposed model employs the Coxian structure to represent the cell residence time as a probabilistic sum of WLAN and cellular technology residence times (TRT), a TRT being defined as the duration spent by the MT in a specific technology.
[Figure: chain of stages A1, B2, A3, B4, ..., A/B k; from stage i the MT proceeds to the next stage with probability a_i or exits the cell with probability b_i.]
Fig. 1. Mobility Model
Figure 1 shows the extended-Coxian mobility model, where each stage is labeled with a letter corresponding to the visited technology and a number representing the stage sequence within the intended cell. The model consists of consecutive stages such that, upon exiting any stage i, the MT may proceed to another technology within the same cell with probability a_i or move to a neighboring cellular cell (i.e. get absorbed) with probability b_i. The TRT of stage i is PH distributed and denoted as PH(α_i, T_i). Hence, the cellular residence time can be expressed as PH(α_m, T_m) with

T_m = \begin{pmatrix} T_1 & a_1 t_1 \alpha_2 & 0 & \cdots & 0 \\ 0 & T_2 & a_2 t_2 \alpha_3 & \cdots & 0 \\ & & \ddots & & \\ 0 & \cdots & & 0 & T_k \end{pmatrix}    (1)

\alpha_m = [\alpha_1 \; 0]    (2)
where t_i = −T_i e [7]. Note that, in a two-tier system, there are two mobility sub-models; each sub-model has a different initial-phase technology. Hence, we use C-type and W-type cells to distinguish models with an initial cellular and an initial WLAN phase, respectively. Furthermore, we assume that the MT application session has exponentially distributed holding times, denoted t_ch and t_wh, with parameters λ_ch and λ_wh for the cellular network and WLAN, respectively. The exponential session
holding time assumption is common in wireless networks due to their time-based pricing strategies and the limited power resources of mobile devices.
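For intuition, the following Python sketch samples a cell residence time from a chain of alternating technology stages; it simplifies each stage's PH-distributed TRT to an exponential, and all parameter values are illustrative assumptions.

```python
import random

def sample_cell_residence(stage_rates, a, rng=random.Random(0)):
    """Walk the extended-Coxian chain: after stage i (mean TRT 1/stage_rates[i]),
    continue to stage i+1 with probability a[i], otherwise exit the cell
    (absorption). Stages alternate between the two technologies (W, C, W, ...)."""
    t = 0.0
    for rate, cont in zip(stage_rates, a):
        t += rng.expovariate(rate)     # exponential stand-in for a PH stage
        if rng.random() >= cont:       # exit with probability b_i = 1 - a_i
            break
    return t

# three stages with continuation probabilities a_i (b_i = 1 - a_i)
times = [sample_cell_residence([0.02, 0.01, 0.02], [0.6, 0.4, 0.0])
         for _ in range(10000)]
print(sum(times) / len(times))         # empirical mean cell residence time
```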
3 Cost Analysis
In this section, we estimate the total session cost as a function of session resource utilization, session QoS, and network administrative cost. We first present our session model in Section 3.1, then a generic framework to estimate the total session network utilization and total session VHO load in Section 3.2, and finally our proposed total session cost function in Section 3.3.

3.1 Session Modeling
In our session model, we combine the mobility model and the application characteristics to represent both application activity and user mobility. Consequently, the main conceptual difference between the session and mobility models is the interpretation of absorption. In contrast to the mobility model, in which absorption occurs only due to cell exit, the session may get absorbed for different reasons: normal session termination, denoted as the TERM state; successful handoff to a neighbor cell, denoted as the SHH state; session dropping during horizontal handoff, denoted as the HHFT state; or session dropping during vertical handoff, denoted as the VHFT state. Hence, the generator matrix of the Markovian session process has the following structure:

Q_S = \begin{pmatrix} Q_{TT} & Q_{Term} & Q_{SHH} & Q_{HHFT} & Q_{VHFT} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},    (3)

where Q_{TT}, Q_{Term}, Q_{SHH}, Q_{HHFT}, and Q_{VHFT} can be expressed as shown in [6]. Note that different cases are distinguished in the analysis according to the different combinations of mobility sub-models and session types. We use A to denote the mobility sub-model, which is determined by the initial technology of the visited cell, where A ∈ {c, w} for C-type and W-type cells, respectively. Additionally, we use B to denote the session type, where B ∈ {n, h} for new and handoff sessions, respectively. As a notational remark, we use superscripts to denote the session-type and mobility sub-model combination. Following this notation, we denote by P^{AB}_{TERM}, P^{AB}_{SHH}, P^{AB}_{VHFT}, and P^{AB}_{HHFT} the absorption probabilities into the TERM, SHH, VHFT, and HHFT states, respectively, for a B-type session starting in an A-type cell. These absorption probabilities can be estimated from the embedded Markov chain of the session model, as shown in [8].

3.2 Session Performance Analysis
In this subsection, we present a generic analytical framework to estimate different session-level performance metrics from their corresponding cell-level metrics for the different session types and mobility sub-models. The derivations of the cell-level performance metrics are omitted here for brevity; interested readers
are referred to [6] for the details. The following analytical framework is used to estimate the session cellular network utilization, session WLAN utilization and session VHO rate, denoted L_{sc}, L_{sw} and μ_{sv}, respectively. Due to the page limitation, only the derivation of μ_{sv} is shown for illustration purposes. For example, L_{sc} can be estimated by replacing μ^{AB}_{cv} and μ^h_{cv} with their corresponding metrics L^{AB}_{cc} and L^h_{cc}, respectively. The session VHO rate is defined as the expected total number of VHOs performed by the MT during an entire session. Given that the session starts in a W-type cell, the conditional session VHO rate μ^w_{sv} can be expressed as

μ^w_{sv} = μ^{wn}_{cv} + \sum_{k=1}^{\infty} P(n = k | W) \, k \, μ^h_{cv},    (4)
where μ^{wn}_{cv} is the VHO rate per cell of a new session starting in a W-type cell, P(n = k | W) is the probability that a session starting in a WLAN will visit exactly k handoff cells before termination, and μ^h_{cv} is the VHO rate per cell of a handoff session, expressed as μ^h_{cv} = P_{wh} μ^{wh}_{cv} + P_{ch} μ^{ch}_{cv}, in which μ^{wh}_{cv} and μ^{ch}_{cv} represent the VHO rates per cell of a handoff session starting in a W-type and a C-type cell, respectively, and P_{wh} and P_{ch} represent the probabilities that the session will hand off to a W-type and a C-type cell, respectively. Note that P_{wh} and P_{ch} are topology dependent and equal the percentage of the cell edge with dual and unique coverage, respectively. Clearly, μ^w_{sv} can be expressed as

μ^w_{sv} = μ^{wn}_{cv} + μ^w_{hs} μ^h_{cv},    (5)
where μ^w_{hs} is the expected number of visited handoff cells per session, given that the session starts in a W-type cell. In order to estimate μ^w_{hs}, we define the following probabilities:
– P_{hh}: the probability that a handoff session successfully performs a horizontal handoff to a neighbor cell; hence P_{hh} = P_{wh} P^{wh}_{SHH} + P_{ch} P^{ch}_{SHH}.
– \bar{P}_{hh}: the probability that a handoff session is terminated in the same cell, either due to normal termination or due to forced termination during VHO.
– P_s: the probability that a handoff session visits exactly one more cell; hence P_s = P_{wh} P^{wh}_{SHH} \bar{P}_{hh} + P_{ch} P^{ch}_{SHH} \bar{P}_{hh}.
Consequently, the marginal distribution of the number of successful HHOs, N^w_{hs}, given that the session starts in a W-type cell, can be expressed as

P(N^w_{hs} = k) = \begin{cases} P^{wn}_{Term} + P^{wn}_{VHFT} + P^{wn}_{HHFT}, & k = 0 \\ P^{wn}_{SHH} \bar{P}_{hh}, & k = 1 \\ P^{wn}_{SHH} P_{hh}^{k-2} P_s, & \forall k \geq 2 \end{cases}    (6)

Hence, μ^w_{hs} is calculated as

μ^w_{hs} = \sum_{k=0}^{\infty} k \, P(N^w_{hs} = k).    (7)
Using the mathematical identity \sum_{i=0}^{\infty} i c^i = \frac{c}{(1-c)^2}, |c| < 1, μ^w_{hs} can be expressed as

μ^w_{hs} = P^{wn}_{SHH} \left( \bar{P}_{hh} + P_s \frac{2 - P_{hh}}{(1 - P_{hh})^2} \right).    (8)

Similarly, the expected number of successful HHOs for a session starting in the cellular network, μ^c_{hs}, can be expressed as

μ^c_{hs} = P^{cn}_{SHH} \left( \bar{P}_{hh} + P_s \frac{2 - P_{hh}}{(1 - P_{hh})^2} \right).    (9)

Similar to (5), the expected total session VHO rate, assuming the session starts in a C-type cell, μ^c_{sv}, can be expressed as

μ^c_{sv} = μ^{cn}_{cv} + μ^c_{hs} μ^h_{cv}.    (10)
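As a quick sanity check of (6)-(8), the short sketch below (with made-up probability values) compares the truncated direct sum of (7) with the closed form of (8).

```python
def mu_hs_direct(p_shh_wn, p_hh, p_hh_bar, p_s, kmax=10000):
    """Truncated sum of (7) using the distribution in (6)."""
    total = 1 * p_shh_wn * p_hh_bar
    for k in range(2, kmax):
        total += k * p_shh_wn * (p_hh ** (k - 2)) * p_s
    return total

def mu_hs_closed(p_shh_wn, p_hh, p_hh_bar, p_s):
    """Closed form (8)."""
    return p_shh_wn * (p_hh_bar + p_s * (2 - p_hh) / (1 - p_hh) ** 2)

args = (0.7, 0.5, 0.3, 0.2)        # illustrative probabilities only
print(mu_hs_direct(*args), mu_hs_closed(*args))   # both ~1.05
```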
Hence, using the total probability theorem, the total session VHO rate μ_{sv} is

μ_{sv} = P_{wh} μ^w_{sv} + P_{ch} μ^c_{sv}.    (11)

3.3 Total Session Cost
The total session cost, χ, is calculated as a function of the session performance metrics and the system administrative cost. This cost function is expressed as

χ = C_c L_{sc} + C_w L_{sw} + C_{vh} μ_{vh} + C_{hh} μ_{hh} + C_q P_{ft} + C_m \, η^{P_{wo}},    (12)
where μ_{hh} represents the session horizontal handoff rate, P_{ft} the session forced termination probability, C_c and C_w the cellular network and WLAN utilization cost coefficients, C_{vh} and C_{hh} the session vertical and horizontal handoff signaling cost coefficients, C_q the session forced termination probability cost coefficient, C_m the management cost, and η a WLAN area cost base. The derivation of μ_{hh} and P_{ft} is presented in [6]. Clearly, this cost function considers different aspects, including the resource utilization cost, the signaling load cost, a QoS factor, and the administrative cost. The latter is considered due to its substantial increase in extended WLAN deployment [9]. The main sources of the increase in management cost under extended WLAN deployment are administrative tasks such as RF monitoring and configuration, load balancing support, security issues, and station mobility support.
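The cost function (12) is straightforward to evaluate once the session metrics are known; below is a small Python sketch using the cost coefficients of Table 1 (the metric values in the example are invented, and reading the administrative term as η raised to the power P_wo is our interpretation of the "area cost base").

```python
def session_cost(L_sc, L_sw, mu_vh, mu_hh, P_ft, P_wo,
                 Cc=20, Cw=2, Cvh=75, Chh=75, Cq=10000, Cm=5000, eta=10):
    """Total session cost chi of Eq. (12); eta**P_wo is our reading of the
    WLAN-area administrative term."""
    return (Cc * L_sc + Cw * L_sw + Cvh * mu_vh + Chh * mu_hh
            + Cq * P_ft + Cm * eta ** P_wo)

# e.g. a mostly-cellular session at 30% WLAN coverage (illustrative numbers)
print(session_cost(L_sc=900, L_sw=250, mu_vh=6, mu_hh=3, P_ft=0.05, P_wo=0.3))
```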
4 Numerical Results
In addition to the analysis, we have simulated an integrated heterogeneous system with square cells for simplicity of illustration. Each cell is sub-divided into N subdivisions, in which the WLANs are located. We consider two different WLAN topologies: random and clustered. In the random topology, WLANs are randomly
distributed in the cell, and the MT encounters a new random topology as it moves to each new cell. In the clustered topology, all WLANs are aggregated in one hotspot that is randomly located within the cell. In order to emulate practical MT operation, a handoff area [10] of d_H seconds is assumed between overlay 3G cells. This delay corresponds to the hysteresis introduced in handoff algorithms to decrease the ping-pong effect during horizontal handoff. Additionally, the MT 3G-WLAN handoff is delayed by d_s seconds as a typical delay required for WLAN discovery and handoff signaling. In the simulation, we adopt a two-dimensional Gauss-Markov movement model from [11] for MT mobility. In its discrete version, at time n, the MT velocity in each dimension, v_n, is given by

v_n = α_v v_{n-1} + (1 − α_v) μ_v + \sqrt{1 − α_v^2} \, x_{n-1},    (13)

where α_v represents a past velocity memory factor such that 0 ≤ α_v ≤ 1, μ_v is the asymptotic mean of v_n, and x_n is an independent and stationary Gaussian process with zero mean and standard deviation σ_v, where σ_v is the asymptotic standard deviation of v_n. The MT mobility parameters for the shown results are α_v = 0.9, μ_v = 0, and σ_v = 2.5. Table 1 lists the default values of the analysis and simulation parameters.

Table 1. Analysis and simulation parameters

Parameter   Value    Parameter   Value    Parameter   Value
d_H (sec)   5        d_s (sec)   3        N           100
P_vb        0.01     P_hb        0.01     C_q         10000
C_c         20       C_w         2        η           10
C_vh        75       C_hh        75
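For reference, (13) can be iterated directly; the short Python sketch below generates a velocity trace with the stated parameters (the seed and trace length are arbitrary).

```python
import math, random

def gauss_markov_velocity(n_steps, alpha=0.9, mu=0.0, sigma=2.5, seed=0):
    """Discrete Gauss-Markov process of Eq. (13), one dimension."""
    rng = random.Random(seed)
    v, trace = mu, []
    for _ in range(n_steps):
        x = rng.gauss(0.0, sigma)                     # stationary Gaussian input
        v = alpha * v + (1 - alpha) * mu + math.sqrt(1 - alpha**2) * x
        trace.append(v)
    return trace

trace = gauss_markov_velocity(10000)
print(sum(trace) / len(trace))   # near the asymptotic mean mu = 0
```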
In this study, we categorize the applications as symmetric and asymmetric. The former preserve the same level of resource utilization independent of the available network resources, while the latter have a greedy nature and can consume as much bandwidth as the network can provide. Conversational applications, such as voice over IP and video conferencing, are examples of the former, while streaming applications with buffering capabilities, such as video on demand and radio on demand, are examples of the latter. Generally, asymmetric applications benefit from resource-rich networks such as WLANs by increasing the download rate for smoother playback.

In the simulation and analysis, we consider different combinations of session duration (long and short) and application type (symmetric and asymmetric), whose parameters are shown in Table 2. The following figures plot the analysis and simulation results (including 95% confidence intervals) for the different session metrics. The obtained results show a good match between analysis and simulation, which demonstrates the accuracy of the derived analytical framework.
Table 2. Application parameters

Parameter       Symmetric Short  Symmetric Long  Asymmetric Short  Asymmetric Long
t_ch (minutes)  3                30              6                 60
t_wh (minutes)  3                30              1                 10
Session Performance
Figure 2(a) plots the session VHO rate versus WLAN coverage for different applications in random and clustered topologies. Clearly, the figure shows that WLAN aggregation significantly reduces VHO signaling load. This reduction reaches more than 70% at 50% coverage for all the simulated applications. Generally, the VHO rate is mainly driven by the circumference of WLAN hotspots. Hence, the VHO rate of symmetric applications always increases as WLAN coverage increases in the clustered topology, but it starts dropping as WLAN coverage increases beyond 50% in the random topology. Additionally, the VHO rate of asymmetric applications decreases as WLAN coverage increases due to the beneficial higher bandwidth of WLAN, which decreases the session duration of such applications. Figure 2(b) shows the session forced termination probability of different applications versus WLAN coverage in random and clustered topologies. Similar to VHO, clustering WLANs significantly reduces the forced termination probability of different applications compared to the random topology. For example, a 45% reduction is achieved at 35% coverage overlap for different applications. Figure 2(c) shows HHO rate variations for different applications versus WLAN coverage for both random and clustered topologies. Generally, the HHO rate of symmetric applications is independent of WLAN coverage, while the HHO rates of asymmetric applications significantly decrease as WLAN coverage increases. Additionally, this figure shows that the applications perform more HHOs in the clustered topology compared to the random one. However, this slight increase is a logical consequence for extended application lifetime due to the reduced session forced termination probability in the former topology. Figures 2(d) and 2(e) plot the total-session cellular network and WLAN utilization, respectively, versus WLAN coverage for different applications in random and clustered topologies. Generally, a specific network utilization for a symmetric application is linearly proportional to its corresponding coverage independent of the WLAN topology. On contrary, as WLAN coverage is extended, the cellular utilization of asymmetric applications exponentially decreases with an insignificant increase in WLAN utilization. Additionally, the figure shows that the clustered cellular network and WLAN utilization are slightly larger when compared to the random topology. However, this is also due to the longer session lifetime in the clustered topology.
0.16
[Figure: five plots versus WLAN coverage P_wo for all application/topology combinations — (a) Session VHO rate, (b) Forced termination probability, (c) HHO rate, (d) Session cellular utilization time, (e) Session WLAN utilization time.]
Fig. 2. Session performance metrics versus WLAN coverage. The solid lines represent analysis, and the dashed lines represent simulation.
4.2 Session Cost

Figures 3(a) and 3(b) plot the total session cost versus the percentage of WLAN-cellular technology overlap for C_m = 0 and C_m = 5000, respectively.
[Figure: total session cost versus WLAN coverage P_wo for all application/topology combinations — (a) C_m = 0, (b) C_m = 5000 — and (c) total session cost versus the cost ratio C_q/C_c (network topology and application QoS sensitivity).]
Fig. 3. Total session cost. The solid lines represent analysis, and the dashed lines represent simulation.
Figure 3(a) shows that increasing WLAN coverage reduces the total session cost in this scenario. This reduction may reach 50% and 22.5% for asymmetric and symmetric applications, respectively, as the WLAN coverage increases from 10% to 35%. However, the administrative cost of extended coverage approximately eliminates the integration cost benefits for the considered application set, except for asymmetric applications with a large session duration, as shown in Figure 3(b). Even for this application, there exists an optimal WLAN coverage, approximately 50% overlap for the asymmetric application under consideration, beyond which the overall session cost is dominated by its administrative component. Hence, maintaining the cost reduction in an extended WLAN scenario mandates the development of economical WLAN management schemes, such as remote management. Figure 3(c) plots the total session cost versus the ratio between the forced termination and cellular utilization costs for the different applications at 36% coverage
overlap. The figure suggests that WLAN clustering greatly reduces the total session cost as the application's sensitivity to forced termination increases. Hence, clustering WLANs is highly recommended for systems dominated by this type of application.
5 Conclusion
The inherent heterogeneity of next-generation systems complicates system analysis and performance evaluation, especially with the unavoidable interaction of different system parameters. A generic framework was presented to estimate different session-level performance metrics from their corresponding cell-level metrics. These metrics were used to study the impact of the amount of coverage overlap and the topology of the underlay technology on the total session cost. The obtained results show that WLAN clustering can reduce the VHO signaling load and the forced termination probability of different applications by up to 70% and 45%, respectively. Additionally, the cost analysis shows that the increased administrative cost due to extended WLAN coverage can significantly reduce the integrated system's cost benefits. Hence, developing economical WLAN management mechanisms is crucial to achieving real cost benefits from extended WLAN coverage.
References
1. Buddhikot, M.M., Chandranmenon, G., Han, S., Lee, Y.W., Miller, S., Salgarelli, L.: Integration of 802.11 and third-generation wireless data networks. In: Proc. of IEEE INFOCOM, San Francisco, USA (2003) 503-512
2. 3GPP: Feasibility study on 3GPP system to wireless local area network (WLAN) interworking. Technical report, 3GPP TR 22.934 (2003)
3. ETSI: Requirements and architectures for interworking between HIPERLAN/3 and 3rd generation cellular systems. Technical report, ETSI TR 101 957 (2001)
4. Stemm, M., Katz, R.H.: Vertical handoffs in wireless overlay networks. ACM Mobile Networks and Applications 3(4) (1998) 335-350
5. Song, W., Jiang, H., Zhuang, W.: Performance analysis of the WLAN-first scheme in cellular/WLAN interworking. IEEE Trans. on Wireless Commun. (to appear)
6. Zahran, A.H., Liang, B., Saleh, A.: Modeling and performance analysis for beyond 3G integrated wireless networks. In: Proc. of IEEE International Conference on Communications (ICC) (2006)
7. Latouche, G., Ramaswami, V.: Introduction to Matrix Analytic Methods in Stochastic Modeling. ASA-SIAM Series on Statistics and Applied Probability (1999)
8. Papoulis, A., Pillai, S.: Probability, Random Variables and Stochastic Processes. 4th edn. McGraw-Hill (2002)
9. O'Hara, B., Calhoun, P., Kempf, J.: Configuration and provisioning for wireless access points (CAPWAP) problem statement. RFC 3990 (2005)
10. Wang, J., Zeng, Q.A., Agrawal, D.P.: Performance analysis of a preemptive and priority reservation handoff scheme for integrated service-based wireless mobile networks. IEEE Trans. Mob. Comput. 2(1) (2003) 65-75
11. Liang, B., Haas, Z.J.: Predictive distance-based mobility management for multi-dimensional PCS networks. IEEE/ACM Trans. on Net. 11(5) (2003) 718-732
An On-Line Measurement-Based Admission Control for VBR Video Traffic in Wireless Multimedia Home Networks

Yi-Hsien Tseng¹, Eric Hsiao-Kuang Wu², and Gen-Huey Chen¹

¹ Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
[email protected]
² Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan
Abstract. Variable bit rate (VBR) video traffic with a high data rate is expected to form the largest proportion of traffic carried by future wireless home networks. In order to guarantee the quality-of-service (QoS) requirements of such VBR video traffic and improve channel utilization, this paper proposes an on-line measurement-based admission control scheme. The proposed scheme is designed based on the property that aggregate VBR video traffic is lognormally distributed, and it consists of two main components: a measurement process and an admission decision. The measurement process applies a linear Kalman filter to estimate the statistical parameters of the aggregate VBR video traffic. The estimated statistical parameters are then used to calculate the effective bandwidth for the admission decision. The proposed scheme is computationally efficient and accurate without requiring much prior information about the traffic. Simulation results confirm its accuracy and also show that the proposed scheme performs well for small as well as large numbers of connections.
1 Introduction
Because of their ease of installation and relocation, wireless home networks are particularly attractive for ensuring the required "wire-like" performance of many indoor multimedia applications. Consequently, variable bit rate (VBR) video traffic is expected to form the largest proportion of traffic carried by future wireless home networks. In order to guarantee the quality-of-service (QoS) requirements of VBR video traffic, an admission control scheme is needed by the network to decide whether a new connection is accepted or rejected. VBR encoding can offer higher picture quality and greater opportunity for statistical multiplexing gains [1] (with the same average bandwidth) than constant bit rate (CBR) encoding. However, an admission control scheme may have to make decisions under uncertain conditions, because
Corresponding author.
the required bandwidth of VBR video traffic varies with time. A good admission control scheme for VBR video traffic should improve channel utilization through statistical multiplexing and guarantee QoS at the same time.

Many admission control schemes have been proposed to date (e.g., [2]-[7]). An overview and a three-way taxonomy were provided in [8], which commented that the admission control schemes proposed in [2] and [3] are simple and can achieve high bandwidth efficiency. These two schemes are both based on effective bandwidth and use measurement.

In this paper, we propose an on-line measurement-based admission control (MBAC) scheme for VBR video traffic. The type of VBR video we deal with is full-motion video generated by a Moving Picture Experts Group (MPEG) encoder [9]. Several probability distribution functions, such as the Gamma, Weibull and lognormal distributions, can be used to fit the distribution of an MPEG-coded video trace (the sequence of bits per frame in the compressed video traffic). The proposed scheme consists of two main components: a measurement process and an admission decision. A linear Kalman filter [10] is applied in the measurement process to estimate the statistical parameters of the aggregate VBR video trace. The estimated statistical parameters are used to calculate the effective bandwidth for the admission decision. The effective bandwidth is derived from the assumption that the aggregate VBR video trace is lognormally distributed. The proposed scheme achieves good performance in terms of both channel utilization and QoS guarantees for VBR video traffic. Meanwhile, it is computationally efficient and can be adopted as an on-line MBAC scheme for real-time VBR video traffic. Moreover, the proposed scheme does not need to know much statistical information (only the peak rate) about the individual VBR video traffic in advance.

The rest of this paper is organized as follows. In Section 2, an on-line MBAC scheme for VBR video traffic is proposed. Simulation and comparison results are shown in Section 3. Finally, this paper concludes with some remarks in Section 4.
2 An MBAC with Aggregate Effective Bandwidth Estimation

In this section, we propose an MBAC scheme with aggregate effective bandwidth estimation (MBAC-AEBE) for VBR videos. The MBAC-AEBE is based on the assumption that the VBR video trace is lognormally distributed, and it consists of two main components: a measurement process and an admission decision.

2.1 Network Model and Aggregate VBR Video Traffic Model
The network model that we consider is the IEEE 802.15.3 [11] wireless home network. The IEEE 802.15.3 MAC protocol uses time division multiple access (TDMA) to allocate channel time among devices, in order to prevent conflicts, and it allocates new channel time for a connection only when enough bandwidth is available.
The elementary topology unit for the IEEE 802.15.3 MAC layer is a piconet. A piconet contains a number of independent data devices (DEVs) that are allowed to exchange frames directly with each other. A master/slave relationship is adopted for these DEVs: a particular DEV, named the piconet coordinator (PNC), acts as the master and the others are slaves. Timing for a piconet is realized by superframes, as shown in Fig. 1.
Fig. 1. Timing for a piconet in IEEE 802.15.3 wireless home networks
The MBAC-AEBE is performed in the PNC to determine whether a new VBR video connection is accepted or not. The VBR video traffic that we consider is full-motion video traffic generated by an MPEG encoder; the MPEG-coded video traces used in this paper can be found in [12]. MPEG introduced three frame types: intra-frames (I), inter-frames (P) and bidirectional frames (B). These frame types are organized into so-called groups of pictures (GoPs). The size of each GoP is measured by the MBAC-AEBE and is used for admission control. It was shown in [13] that the distribution of a VBR video trace can be fitted well by a lognormal distribution. A random variable X with a lognormal distribution has the PDF

f(x; μ, σ) = \frac{1}{x σ \sqrt{2π}} e^{-(\ln x - μ)^2 / 2σ^2},    (1)

where μ and σ² are the mean and variance of the random variable Y = ln(X), which is Gaussian (normally) distributed. If the aggregate GoP sizes are lognormally distributed, they can easily be transformed to be Gaussian distributed. The mean μ and variance σ² of the Gaussian distribution can then be estimated by their good unbiased estimators \bar{X} and S², respectively, as follows.
\bar{X} = \frac{X_1 + \cdots + X_n}{n}    (2)

S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}    (3)

However, it may not be easy to find good unbiased estimators for other distributions.
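Concretely, the log-transform reduces the lognormal parameter estimation to the Gaussian estimators (2) and (3); the following sketch is our own illustration with synthetic data.

```python
import math, random

def lognormal_params(gop_sizes):
    """Estimate (mu, sigma^2) of ln(X) via the unbiased estimators (2), (3)."""
    y = [math.log(x) for x in gop_sizes]                 # transform to Gaussian
    n = len(y)
    mean = sum(y) / n                                    # Eq. (2)
    var = sum((yi - mean) ** 2 for yi in y) / (n - 1)    # Eq. (3)
    return mean, var

rng = random.Random(0)
data = [rng.lognormvariate(13.4, 0.27) for _ in range(3333)]
print(lognormal_params(data))   # close to (13.4, 0.073); cf. the fit in Section 3
```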
2.2 Measurement Process Using Kalman Filter
From the above discussion, we denote an aggregate GoP trace as a random process x(n) with a lognormal distribution. Then y(n) = ln(x(n)) is a random process with a Gaussian distribution. The measurement process of the MBAC-AEBE adaptively estimates the mean and variance of y(n) in each GoP period. This is done by an optimization scheme based on a linear Kalman filter. The Kalman filter can provide an optimal least-squares estimate of the system state for a linear system. It is a set of mathematical equations that implement a predictor-corrector type estimator that is optimal in the sense that it minimizes the estimated error covariance. The Kalman filter addresses the problem of estimating the state x ∈ R^n of a discrete-time controlled process that is governed by the process equation

x_k = A x_{k-1} + w_k,    (4)

with a measurement z ∈ R^m whose measurement equation is

z_k = B x_k + v_k,    (5)
where the n × n matrix A relates the state at the previous time step k − 1 to the state at the current step k, and the m × n matrix B relates the state x_k to the measurement z_k. The random variables w_k and v_k are the process and measurement noise, respectively, and are assumed to be independent, white, and normally distributed:

p(w) ∼ N(0, Q),    (6)

p(v) ∼ N(0, R),    (7)
where Q is the process noise covariance and R is the measurement noise covariance. The Kalman filter estimates the process state at some time and then obtains corrections from measurements. It consists of five equations, categorized into two groups: the time update equations and the measurement update equations. The time update equations can be thought of as predictor equations and are listed as follows:
\hat{x}^-_k = A \hat{x}_{k-1}    (8)

P^-_k = A P_{k-1} A^T + Q    (9)

where \hat{x}^-_k is the a priori state estimate at step k given knowledge of the process prior to step k, and \hat{x}_{k-1} is the a posteriori estimate that was updated in the measurement update step. P^-_k is the a priori estimate error covariance, and P_{k-1} is the a posteriori estimate error covariance updated in the measurement update step. The measurement update equations can be thought of as corrector equations and are listed as follows:

K_k = P^-_k B^T (B P^-_k B^T + R)^{-1}    (10)

\hat{x}_k = \hat{x}^-_k + K_k (z_k - B \hat{x}^-_k)    (11)

P_k = (I - K_k B) P^-_k    (12)
where K_k is the Kalman gain. Moreover, z_k is known from observing the system, and A, B, Q and R are parameters known when designing the system. The initial values of \hat{x}_{k-1} and P_{k-1} must be determined before running the Kalman filter; guidelines for determining the initial values can be found in [14]. Since the measurement process of the MBAC-AEBE attempts to estimate the mean and variance of y(n), the state of our system is defined as x_k = [M_k, V_k], where M_k and V_k are the mean and variance of y(n) at the k-th step. If no traffic enters or leaves the system, the state does not change from step to step, so the A in the process equation (4) is set to the identity matrix I. The process equation of our system is

x_k = x_{k-1} + w_{k-1}.    (13)

Our noisy measurement is of the state directly, so the B in the measurement equation (5) is also set to I. The measurement equation of our system is

z_k = x_k + v_k,    (14)

where z_k is defined as z_k = [\bar{X}_k, S_k^2]. \bar{X}_k is the sample mean of y(n) at the k-th step and can be obtained from (2). S_k^2 is the sample variance of y(n) at the k-th step; one way to obtain it is to calculate (3). Since M_k will adaptively approximate the mean of the system, we find that a better way to calculate S_k^2 is

S_k^2 = \frac{\sum_{i=1}^{n} (y_i - M_{k-1})^2}{n - 1}.    (15)
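A compact implementation of the filter as specialized here (A = B = I, Q = 0) is sketched below; the windowing and initial values follow the text, but the variable names and the diagonal treatment of R are our own simplifications.

```python
def kalman_step(x_est, P, z, R):
    """One predict/correct cycle of Eqs. (8)-(12) with A = B = I and Q = 0.
    x_est and z are 2-vectors [M, V]; P and R are 2x2 (kept diagonal here,
    a stand-in for the paper's rank-one R)."""
    x_pred = x_est[:]                                   # (8) with A = I
    P_pred = [row[:] for row in P]                      # (9) with Q = 0
    x_new, P_new = x_pred[:], [row[:] for row in P_pred]
    for i in range(2):                                  # diagonal => scalar updates
        K = P_pred[i][i] / (P_pred[i][i] + R[i][i])     # (10)
        x_new[i] = x_pred[i] + K * (z[i] - x_pred[i])   # (11)
        P_new[i][i] = (1 - K) * P_pred[i][i]            # (12)
    return x_new, P_new

# track [M_k, V_k] from noisy sample means/variances z_k
x, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]             # x_0 and P_0 as in the text
R = [[0.01, 0.0], [0.0, 0.01]]                          # diagonal stand-in for R
for z in ([13.2, 0.08], [13.5, 0.07], [13.4, 0.075]):
    x, P = kalman_step(x, P, z, R)
print(x)   # converges toward the true [mean, variance]
```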
Then, the state x_k of our system can be estimated by calculating equations (8) to (12) at each step. The process noise covariance Q is set to the zero matrix because we fully "trust" that the state does not change from step to step. The measurement noise covariance R is usually measured prior to the operation of the filter; from our experience, we set R = [0.1 0.1]^T · [0.1 0.1]. However, the value of R is not critical while Q is set to the zero matrix. The initial value of \hat{x}_{k-1}, called \hat{x}_0, is set to [0 0]^T because we observe nothing from y(n) before starting. Furthermore, we choose the initial value of P_{k-1}, called P_0, to be [1 1]^T · [1 1]. In fact, we can choose almost any P_0 ≠ O and the filter will eventually converge. The values of M_k and V_k are used to calculate the aggregate effective bandwidth when new traffic requests a connection; the admission decision based on the aggregate effective bandwidth is discussed in the next section.

The measurement process continues to run without any change if a new request is rejected. On the contrary, the values of P_{k-1}, \bar{X}_k and S_k^2 should be reset when a new request is accepted or an existing traffic flow finishes, since the mean and variance of y(n) change in these cases. We must make x_k approximate the new state (i.e. the new mean and the new variance). Thus, P_{k-1} is reset to the value of P_0; this makes the filter trust x_k less and lets x_k approach the new state. The values of \bar{X}_k and S_k^2 should be calculated without considering the historical aggregate GoP sizes before the k-th step.

2.3 Admission Decision Using Aggregate Effective Bandwidth
For the admission decision of the MBAC-AEBE, we consider a model in which a PNC has accepted some connections and the capacity is C. We assume that the current aggregate bandwidth is E. The aggregate GoP process x(n) demands a loss probability no larger than ε, which is expressed as

Pr\{\bar{X}_p > E\} < ε,    (16)

where \bar{X}_p = \frac{1}{p}[x(1) + ... + x(p)]. The MBAC-AEBE tries to find the value of E given the value of ε, and adopts E for the admission decision when there is a new request with peak rate R: if E + R < C, the connection is accepted; otherwise it is rejected.

If x(n) is a Gaussian process, \bar{X}_p is also a Gaussian random variable. Furthermore, by the Central Limit Theorem, \bar{X}_p approaches a Gaussian random variable for p large enough (p ≥ 30) even if x(n) is not a Gaussian process. However, it is difficult to find E accurately by solving (16) if x(n) is non-Gaussian, especially for small p.

Since we know that x(n) is lognormally distributed, ln(\bar{X}_p) is exactly a Gaussian random variable. The assumption of a Gaussian distribution allows us to adopt standard approximations to estimate the tail probability of the distribution. Then, we can estimate E by solving

Pr\{\ln(\bar{X}_p) > \ln(E)\} < ε.    (17)
For solving (17), the mean \hat{μ}_{ln\bar{X}} and variance \hat{σ}^2_{ln\bar{X}} of ln(\bar{X}_p) must be known. From the measurement process of the MBAC-AEBE, the mean M_k and variance V_k of y(n) = ln(x(n)) are known. Then, the mean \hat{μ}_x and variance \hat{σ}^2_x of x(n) can be calculated by Fenton's approach [15] as follows:

\hat{μ}_x = \exp\left(M_k + \frac{V_k}{2}\right),    (18)

\hat{σ}^2_x = \exp(2M_k + V_k)\{\exp(V_k) − 1\}.    (19)
Moreover, it is known that the mean \hat{μ}_{\bar{X}} of \bar{X}_p is \hat{μ}_x and the variance \hat{σ}^2_{\bar{X}} of \bar{X}_p is \hat{σ}^2_x / p. Then \hat{μ}_{ln\bar{X}} and \hat{σ}^2_{ln\bar{X}} can be obtained as follows [16]:

\hat{σ}^2_{ln\bar{X}} = \ln\left(\frac{\hat{σ}^2_{\bar{X}}}{\hat{μ}^2_{\bar{X}}} + 1\right),    (20)

\hat{μ}_{ln\bar{X}} = \ln(\hat{μ}_{\bar{X}}) − \frac{\hat{σ}^2_{ln\bar{X}}}{2}.    (21)

Finally, the approximation [17] of E is given by

E ≈ \exp(\hat{μ}_{ln\bar{X}} + α' \hat{σ}_{ln\bar{X}}), \quad \text{with} \quad α' = \sqrt{−2\ln(ε) − \ln(2π)}.    (22)
In brief, the admission decision only needs to evaluate (18)-(22). Moreover, the value of p in (16) can be non-integer and can be adjusted to handle different levels of burstiness of the aggregate traffic; the value of p can be decreased to handle a higher level of burstiness. The admission decision is invoked only when a DEV requests a new connection, so its computational complexity is low. Recalling that the measurement process of the MBAC-AEBE, based on the linear Kalman filter, is also computationally efficient, the MBAC-AEBE is suitable for on-line usage.
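Putting (18)-(22) together, the admission decision reduces to a few closed-form evaluations; the sketch below is our own illustration of that chain, with illustrative numbers only.

```python
import math

def effective_bandwidth(M, V, p, eps):
    """Aggregate effective bandwidth E from Eqs. (18)-(22), given the Kalman
    estimates M, V of the mean/variance of ln(x(n))."""
    mu_x = math.exp(M + V / 2)                          # (18)
    var_x = math.exp(2 * M + V) * (math.exp(V) - 1)     # (19)
    var_bar = var_x / p                                 # variance of the p-frame mean
    var_ln = math.log(var_bar / mu_x**2 + 1)            # (20)
    mu_ln = math.log(mu_x) - var_ln / 2                 # (21)
    alpha = math.sqrt(-2 * math.log(eps) - math.log(2 * math.pi))
    return math.exp(mu_ln + alpha * math.sqrt(var_ln))  # (22)

def admit(E, R, C):
    """Accept a new request with peak rate R iff E + R < C."""
    return E + R < C

E = effective_bandwidth(M=13.4, V=0.074, p=1.44, eps=1e-4)
print(E, admit(E, R=2e5, C=2e6))
```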
3 Simulation Results
The scenario of our simulation is described as follows. Different numbers of DEVs are required to transmit VBR video traffic to other DEVs within the same piconet. We admit these different numbers of connections in order to evaluate the performance of the measurement process of the MBAC-AEBE and the accuracy of the effective bandwidth used for its admission decision. We assume that the channel is ideal, in order to focus our attention on the performance of the MBAC-AEBE.

The performance of the measurement process of the MBAC-AEBE is evaluated by checking whether the estimated M_k and V_k approach the mean μ and variance σ² of the logarithmic aggregate GoP sizes. The logarithmic aggregate GoP sizes, with a length of 3333 GoPs, are fitted by a Gaussian distribution, and the values of μ and σ² are found to be 13.395 and 0.074, respectively. Furthermore, the sample mean and sample variance are calculated by considering the previous 30
aggregate GoP sizes at each step (GoP index), and they are fed into the measurement equation (14) of our measurement process. Figure 2 shows that the value of M_k approaches the value of μ well, whereas the sample mean varies largely over time. Therefore, using the estimated M_k is much better and more stable than using the sample mean for the admission decision.
Fig. 2. Performance of the measurement process for estimating the mean (μ)
Since M_k is a good estimate of μ, the MBAC-AEBE adopts (15) to calculate the sample variance and feeds it into the measurement equation. Then, as shown in Fig. 3, the value of V_k also approaches the value of σ² well. However, the sample variance calculated by (15) still varies largely over time. Therefore, using the estimated V_k is much better and more stable than using the sample variance for the admission decision.

Moreover, we evaluate the performance of the admission decision of the MBAC-AEBE by evaluating the accuracy of the aggregate bandwidth E calculated by (22). The resulting loss ratio is closer to the desired QoS (i.e. the desired loss ratio) when the value of E is more accurate: overestimating the aggregate effective bandwidth degrades channel utilization, whereas underestimating it may not guarantee the desired QoS. The desired loss ratio is set to 1e-4 in our simulations. The accuracy of E is evaluated for different numbers of connections: only 1 connection (Fig. 4), 12 aggregated connections (Fig. 5) and 18 aggregated connections (Fig. 6). It is also compared with the method proposed in [17], which assumed that the aggregate traffic is Gaussian distributed. This method is labelled JSAC91 in our simulations.
Fig. 3. Performance of the measurement process for estimating the variance (σ²)
Fig. 4. Accuracy comparison of estimated effective bandwidth for only one connection
Observing the results in Figs. 4 to 6, the accuracy of JSAC91 increases as the number of connections increases. This conforms to the Central Limit Theorem, because JSAC91 assumes the aggregate traffic is Gaussian. In contrast, the MBAC-AEBE is accurate regardless of the number of connections. The resulting loss ratios for the MBAC-AEBE closely approach the desired QoS, e.g., with p = 2.58 in Fig. 4, p = 1.71 in Fig. 5, and p = 1.44 in Fig. 6. In practice, the best value of p can be found by using a steepest-descent algorithm [14].
Fig. 5. Accuracy comparison of estimated aggregate effective bandwidth for 12 connections
Fig. 6. Accuracy comparison of estimated aggregate effective bandwidth for 18 connections
4 Conclusion
This paper proposed an on-line measurement-based admission control scheme, called MBAC-AEBE, to guarantee the quality-of-service (QoS) requirements of real-time VBR video traffic. The MBAC-AEBE consists of two main components: a measurement process and an admission decision.
For on-line usage, the measurement process of the MBAC-AEBE adaptively estimates the mean and variance of logarithmic aggregate GoP sizes in each GoP period by an optimization scheme based on a linear Kalman filter, which provides an optimal least-squares estimate of the system state for a linear system. The estimated mean and variance are then used to calculate the effective bandwidth for the admission decision. Simulation results showed that the measurement process of the MBAC-AEBE provides accurate estimates of the mean and variance of logarithmic aggregate GoP sizes. We also showed that the aggregate effective bandwidth estimated by the MBAC-AEBE yields a loss ratio that closely approaches the desired QoS; the performance of the admission decision depends on the accuracy of this estimate. Moreover, the performance of the admission decision was evaluated for different numbers of connections, showing that the MBAC-AEBE remains accurate regardless of the number of connections.
References

1. I. Dalqic and F. A. Tobaqi, "Performance evaluation of ATM networks carrying constant and variable bit-rate video traffic," IEEE Journal on Selected Areas in Communications, vol. 15, pp. 1115-1131, 1997.
2. Z. Dziong, M. Juda, and L. G. Mason, "A framework for bandwidth management in ATM networks - aggregate equivalent bandwidth estimation approach," IEEE/ACM Transactions on Networking, vol. 5, no. 1, pp. 134-147, Feb. 1997.
3. K. Shiomoto, S. Chaki, and N. Yamanaka, "A simple bandwidth management strategy based on measurements of instantaneous virtual path utilization in ATM networks," IEEE/ACM Transactions on Networking, vol. 6, no. 5, Oct. 1998.
4. H. Saito and K. Shiomoto, "Dynamic call admission control in ATM networks," IEEE Journal on Selected Areas in Communications, vol. 9, no. 7, pp. 982-989, Sept. 1991.
5. T. E. Tedijanto and L. Gun, "Effectiveness of dynamic bandwidth management mechanisms in ATM networks," Proceedings of IEEE INFOCOM '93, pp. 358-367, 1993.
6. K. Shiomoto and S. Chaki, "Adaptive connection admission control using real-time traffic measurements in ATM networks," IEICE Transactions on Communications, vol. E78-B, no. 4, pp. 458-464, Apr. 1995.
7. R. J. Gibbens, F. P. Kelly, and P. B. Key, "A decision-theoretic approach to call admission control in ATM networks," IEEE Journal on Selected Areas in Communications, vol. 13, no. 6, pp. 1101-1114, Aug. 1995.
8. K. Shiomoto, N. Yamanaka, and T. Takahashi, "Overview of measurement-based connection admission control methods in ATM networks," IEEE Communications Surveys & Tutorials, vol. 2, no. 1, pp. 2-13, First Quarter 1999.
9. D. Le Gall, "MPEG: A video compression standard for multimedia applications," Communications of the ACM, vol. 34, pp. 47-58, Apr. 1991.
10. R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME - Journal of Basic Engineering, pp. 35-45, Mar. 1960.
11. IEEE Standard 802.15.3: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for High Rate Wireless Personal Area Networks (WPANs), Inst. Elec. Electron. Eng., New York, USA, 2003.
12. M. Reisslein et al., "Traffic and Quality Characterization of Scalable Encoded Video: A Large-Scale Trace-Based Study," Arizona State University, Dept. of Elect. Eng., Tech. Rep., Dec. 2003. Video traces available from http://trace.eas.asu.edu.
13. M. Krunz, R. Sass, and H. Hughes, "Statistical characteristics and multiplexing of MPEG streams," Proceedings of IEEE INFOCOM '95, vol. 2, pp. 455-462, 1995.
14. S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, 1991.
15. L. F. Fenton, "The sum of lognormal probability distributions in scatter transmission systems," IRE Transactions on Communications Systems, pp. 57-67, 1960.
16. K. Nagarajan and G. T. Zhou, "A new resource allocation scheme for VBR video traffic source," Proceedings of the 34th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, pp. 1245-1249, 2000.
17. R. Guerin, H. Ahmadi, and M. Naghshineh, "Equivalent capacity and its application to bandwidth allocation in high-speed networks," IEEE Journal on Selected Areas in Communications, vol. 9, pp. 968-981, 1991.
On Event Signal Reconstruction in Wireless Sensor Networks

Barış Atakan and Özgür B. Akan

Next Generation Wireless Communications Laboratory, Department of Electrical and Electronics Engineering, Middle East Technical University, 06531, Ankara, Turkey
{atakan,akan}@eee.metu.edu.tr
http://www.eee.metu.edu.tr/~nwc/

Abstract. In Wireless Sensor Networks (WSN), the effective detection and reconstruction of the event signal is mainly based on the regulation of the sampling and communication parameters used by the sensor nodes. The aim of this paper is to understand the effect of these parameters on the reconstruction performance of the event signal in WSN. Theoretical analysis and results show that with proper selection of sampling and communication parameters, the event signal can be satisfactorily reconstructed at the sink. Furthermore, this study also reveals that non-uniform and irregular sampling of the event signal outperform uniform sampling in terms of reconstruction performance while providing significant energy conservation. Moreover, it is also shown that node density is closely related to the reconstruction quality.

Keywords: Wireless sensor networks, event signal reconstruction, non-uniform sampling, irregular sampling.
1 Introduction
In Wireless Sensor Networks (WSN), energy-efficient and reliable communication is mainly based on the regulation of sampling and communication parameters such as the reporting frequency¹ (or sampling frequency), the number of source nodes, and the size of the source node selection area around the event. Therefore, to effectively reconstruct the observed physical phenomenon at the sink, it is imperative to understand the effect of these parameters on the reconstruction performance.

(This work was supported by the Turkish Scientific and Technical Research Council (TUBITAK) Career Award under grant KARIYER-104E043.)

¹ The sampling frequency of a sensor node is the number of samples taken from the sensed event signal per unit time. However, since in WSNs each data packet includes a number of samples, the reporting frequency, defined as the number of transmitted data packets per unit time, is directly proportional to the sampling frequency. Therefore, throughout this paper, we mainly use the sampling frequency.

There have been some research efforts on the reconstruction of the observed phenomenon in WSN. In [2], the effect of compression on the reconstruction of the event signal is investigated, and an upper bound is given for the compression rate that can achieve a given distortion bound. In [3], to effectively reconstruct the correlated observations of sensor nodes, both asymmetric and symmetric encoder settings are proposed. In [4], a source-channel matching approach is proposed to estimate the observed field, which provides an analysis of the effect of parameters such as the number of nodes, power, and field complexity on the estimation. In [5], to efficiently reconstruct the observed field, the tradeoff between coding rate and reliability, characterized by the loss probability, is investigated. In [6], the relation between system throughput and reconstruction distortion is investigated for the case in which the reconstruction is performed by a large-scale sensor network with mobile agents. However, none of these studies consider actual signal reconstruction: they use either estimated or compressed versions of sensor samples to reach an estimation of the event signal instead of an actual event signal reconstruction. Furthermore, they do not incorporate the sampling and communication parameters to investigate their effects on the actual reconstruction of the event signal.

In this paper, a theoretical analysis of the reconstruction of the event signal at the sink is performed. The aim of this analysis is to understand the effect of the sampling and communication parameters on the performance of event signal reconstruction in WSN. The main contributions and results of this work can be outlined as follows:

1. Exact event signal reconstruction is performed using the sensor readings instead of an estimation of the event features. It is shown that with proper selection of sampling and communication parameters, an event signal can be satisfactorily reconstructed at the sink.
2. For a given application-specific reconstruction error constraint, the uniform sampling scheme, in which source nodes use the same sampling frequency, is inefficient for the reconstruction of an event signal with high frequency components. However, the non-uniform and irregular sampling schemes, in which source nodes use heterogeneous sampling frequencies, enable the sensor network to effectively reconstruct an event signal with high frequency components while providing significant energy conservation.
3. The reconstruction performance is also affected by the size of the source node selection area around the event. When this area is decreased, a lower reconstruction error can be obtained, and significant energy conservation can be achieved with a smaller number of sources. This highlights the node density problem coupled with the reconstruction process: to achieve a certain level of reconstruction error, a certain node density is imperative.

The remainder of this paper is organized as follows. In Section 2, we introduce the theoretical model for event signal reconstruction. In Section 3, using this model, we first give results on the performance of event signal reconstruction with uniform, non-uniform, and irregular sampling schemes, respectively; we then present comparative results for these three sampling schemes. Concluding remarks are given in Section 4.
2 Reconstruction of Observed Event Signal
In WSN applications, the physical phenomena collectively observed by the sensor nodes can be modeled by a single point source or a field source. Here, we adopt the point source model [7] for the event signal, and then introduce the model for event signal reconstruction. Note that the event signal reconstruction analysis introduced in this paper can be directly extended to the field source case.

2.1 Modeling of Observed Event Signal
In WSN applications, the event signal modeled by a point source is assumed to generate a continuous random process $f_S(t)$ having a variance $\sigma_S^2$ and a mean $\mu_S$. Assuming that the point source is at the center of the coordinate axes [7], the sensor node situated at location $(x, y)$ receives the signal given by

$$f(x, y, t) = f_S\left(t - \frac{\sqrt{x^2 + y^2}}{v}\right) e^{-\frac{\sqrt{x^2 + y^2}}{\theta_s}} \qquad (1)$$

where $v$ denotes the diffusion velocity of $f_S(t)$ and $\theta_s$ denotes the attenuation constant of the event signal $f_S(t)$. Since $f(x, y, t)$ is the delayed and attenuated version of $f_S(t)$, the mean and variance of $f(x, y, t)$ can be expressed by

$$\mu_E(x, y) = \mu_s\, e^{-\frac{\sqrt{x^2 + y^2}}{\theta_s}} \qquad (2)$$

$$\sigma_E^2(x, y) = \left(\sigma_s\, e^{-\frac{\sqrt{x^2 + y^2}}{\theta_s}}\right)^2 \qquad (3)$$
Using these observations, which are attenuated and delayed versions of the event signal, each sensor node generates its data packets. The $k$th data packet generated by sampling the received signal $f(x_i, y_i, t)$ at sensor node $n_i$ at location $(x_i, y_i)$, i.e., $S_{i,k}$, is defined as

$$S_{i,k} = \left[\, f(x_i, y_i, t_{kp}) \;\; f(x_i, y_i, t_{kp+1}) \;\; \ldots \;\; f(x_i, y_i, t_{(k+1)p}) \,\right] \qquad (4)$$

where $S_{i,k}$ is a $p \times 1$ vector, $p$ and $k$ denote the packet length and packet number, respectively, and $f(x_i, y_i, t_{kp})$ is a sample of the event signal $f(x_i, y_i, t)$ taken at time $t_{kp}$. Next, we introduce the model for the reconstruction of the event signal at the sink.
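As a sketch of (1)-(4), the hypothetical helpers below generate a node's delayed, attenuated observations of the point source and packetize them; the function names and the choice of uniform sample times n/f are assumptions for illustration.

```python
import math

def received_signal(f_s, x, y, t, v, theta_s):
    # Delayed and attenuated observation per (1):
    # f(x, y, t) = f_S(t - d/v) * exp(-d / theta_s), with d = sqrt(x^2 + y^2).
    d = math.hypot(x, y)
    return f_s(t - d / v) * math.exp(-d / theta_s)

def make_packet(f_s, xi, yi, k, p, f, v, theta_s):
    # k-th data packet of node i at (xi, yi), per (4): p consecutive samples
    # of f(xi, yi, t), taken at the node's sampling frequency f.
    return [received_signal(f_s, xi, yi, n / f, v, theta_s)
            for n in range(k * p, (k + 1) * p)]
```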
2.2 Event Signal Reconstruction
In WSNs, after the detection of an event, sensor nodes sample the event signal and send the generated data packets ($S_{i,k}$) to the sink. During this process, sensor circuitries add noise to these packets as

$$S'_{i,k} = S_{i,k} + N_{i,k} \qquad (5)$$
where $S'_{i,k}$ is the noisy version of the sensor packet $S_{i,k}$ and $N_{i,k}$ is the observation noise, with $N_{i,k} \sim N(0, \sigma_N^2)$. In addition to the additive noise, due to the constraints on cost, power, and communication in WSNs, packet losses often arise during the communication between the sensor nodes and the sink. In this paper, to model the packet losses, we assume an overall packet loss probability $\lambda$ which includes losses due to channel errors, collision, and congestion. Hence, we define the lossy and noisy version of the sensor packet, $X_{i,k}$, as

$$X_{i,k} = \begin{cases} 0 & \text{with probability } \lambda \\ S_{i,k} + N_{i,k} & \text{with probability } (1 - \lambda) \end{cases}$$
In WSNs, packet losses depend heavily on contention in the forward path, which is mainly driven by excessive communication load resulting in collisions and congestion. Therefore, the overall packet loss probability in a sensor network can be modeled in terms of the sampling and communication parameters that determine the communication load on the sensor network. Using the ns-2 network simulator [11], we conducted a comprehensive set of simulation experiments to model the overall packet loss probability ($\lambda$) based on the sampling and communication parameters used by the sensor nodes. The simulation setting used in these experiments is given in Table 1. The experiment results given in Fig. 1 show that $\lambda$ mainly depends on the number of source nodes ($M$) and their sampling frequency ($f$). Using the OriginLab [10] data fitting toolbox, we analytically express $\lambda$ as a function of the number of source nodes and their sampling frequency as follows:

$$\lambda = 1 - e^{-0.01 f / (150 e^{-M/15} + 5)} \qquad (6)$$

where $M$ is the number of source nodes and $f$ is the sampling frequency of the source nodes (samples/sec).
Since uncoded transmission considerably outperforms any approach based on the separation paradigm when the measured physical phenomenon is reconstructed within some prescribed distortion level [8], we assume that the sensors perform uncoded transmission. When the sensor nodes transmit uncoded observations, minimum mean square error (MMSE) estimation is the optimal decoding technique. Therefore, the decoded version of the sensor packet $S_{i,k}$, i.e., $Z_{i,k}$, can be expressed as follows:

$$Z_{i,k} = \frac{\sigma_E^2(x_i, y_i) + \mu_E^2(x_i, y_i)}{\left(\sigma_E^2(x_i, y_i) + \mu_E^2(x_i, y_i) + \sigma_N^2\right)(1 - \lambda)}\, X_{i,k} \qquad (7)$$

Using the decoded sensor packets ($Z_{i,k}\ \forall i, k$), to generate the samples used for reconstruction of the event signal $f_S(t)$, the sink first averages the decoded sensor packets and then arranges the successive averaged packets as follows:

$$A_s = \frac{1}{M}\left[\, \sum_{i=1}^{M} Z_{i,1} \;\; \sum_{i=1}^{M} Z_{i,2} \;\; \ldots \;\; \sum_{i=1}^{M} Z_{i,(\tau f / p)} \,\right] \qquad (8)$$
Fig. 1. The overall packet loss probability (λ) for varying M and f
where $A_s$ (a $1 \times \tau f$ vector) consists of the samples used for reconstruction of the event signal $f_S(t)$, and $\tau$ is the reconstruction interval (in seconds). In order to obtain a reconstruction of the event signal $f_S(t)$, i.e., $\hat{S}(t)$, $A_s$ is provided as the input to an ideal reconstruction (low-pass) filter [9] with frequency response $H_r(j\Omega)$ and impulse response $h_r(t)$. The output of the filter is then expressed as

$$\hat{S}(t) = \sum_{n=1}^{\tau f} A_s[n]\, h_r\!\left(t - \frac{n}{f}\right) \qquad (9)$$
Here, assuming that the event signal $f_S(t)$ is band-limited with highest frequency component $\Omega_F$, the cutoff frequency $\Omega_c$ of the filter must satisfy $\Omega_F < \Omega_c < \pi f - \Omega_F$, as long as $\pi f > 2\Omega_F$ is satisfied.² Hence, for cutoff frequency $\pi f$, the corresponding impulse response $h_r(t)$ is given by

$$h_r(t) = \frac{\sin(\pi f t)}{\pi f t} \qquad (10)$$
and by substituting $h_r(t)$ into (9), we obtain $\hat{S}(t)$ as follows:

$$\hat{S}(t) = \sum_{n=1}^{\tau f} A_s[n]\, \frac{\sin[\pi(f t - n)]}{\pi(f t - n)} \qquad (11)$$
Using (7), (8), (9), (10), and (11), the overall reconstruction operation can be formulated as in (12).

² This is known as the Nyquist Sampling Theorem, and $\Omega_F$ and $2\Omega_F$ are called the Nyquist frequency and Nyquist rate, respectively.
$$\hat{S}(t, \tau, f, M, p) = \frac{1}{M} \sum_{k=1}^{\tau f / p} \sum_{j=1}^{p} \sum_{i=1}^{M} \frac{\sigma_E^2(x_i, y_i) + \mu_E^2(x_i, y_i)}{\left(\sigma_E^2(x_i, y_i) + \mu_E^2(x_i, y_i) + \sigma_N^2\right) e^{-0.01 f / (150 e^{-M/15} + 5)}}\, X_{i,k}[j] \times \frac{\sin\!\left[\pi\left(f t - ((k-1)p + j)\right)\right]}{\pi\left(f t - ((k-1)p + j)\right)} \qquad (12)$$

Since the reconstructed event signal $\hat{S}(t, \tau, f, M, p)$ is a continuous time signal, as is the event signal $f_S(t)$, to measure the reconstruction efficiency we resample $\hat{S}(t, \tau, f, M, p)$ and $f_S(t)$ with a sampling frequency higher than the Nyquist rate. Based on the resampled versions of $\hat{S}(t, \tau, f, M, p)$ and $f_S(t)$, we define the deterministic reconstruction error $E_r$, used for measuring the efficiency of the event signal reconstruction, as

$$E_r = \frac{1}{\tau f} \sum_{i=1}^{\tau f} \left( f_S[i] - \hat{S}[i, \tau, f, M, p] \right)^2 \qquad (13)$$

where $f_S[i]$ and $\hat{S}[i, \tau, f, M, p]$ are the $i$th samples of $f_S(t)$ and $\hat{S}(t, \tau, f, M, p)$, respectively. Next, using the theoretical reconstruction model given in Section 2.2, we give the performance results of the event signal reconstruction with uniform, non-uniform, and irregular sampling schemes.
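A minimal sketch of (11) and (13), assuming the averaged in-order samples As are available as an array; note that numpy's sinc is the normalized sinc, sin(πx)/(πx), which matches the interpolation kernel in (11).

```python
import numpy as np

def reconstruct(A_s, f, t):
    # Sinc interpolation per (11): S_hat(t) = sum_n A_s[n] * sinc(f*t - n),
    # with A_s indexed n = 1, ..., tau*f.
    A_s = np.asarray(A_s, dtype=float)
    n = np.arange(1, len(A_s) + 1)
    return float(np.sum(A_s * np.sinc(f * t - n)))

def reconstruction_error(f_s_samples, s_hat_samples):
    # Deterministic reconstruction error per (13): mean squared difference
    # between the resampled original and reconstructed signals.
    f_s = np.asarray(f_s_samples, dtype=float)
    s_hat = np.asarray(s_hat_samples, dtype=float)
    return float(np.mean((f_s - s_hat) ** 2))
```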
3 Reconstruction Performance Analysis and Results
Here, we consider a scenario in which 300 sensor nodes are randomly deployed in a 100 × 100 m field, with a point source characterizing the observed event signal located at the center. We first perform the event signal reconstruction with three different sampling schemes:

– Uniform Sampling: source nodes sample the event signal with the same sampling frequency (f).
– Non-uniform Sampling: source nodes select their sampling frequency according to their distance to the event location, such that source nodes closer to the event location use higher sampling frequencies.
– Irregular Sampling: source nodes take samples from the event signal with a certain probability while sampling it at a given sampling frequency.

Then, we present comparative results on the reconstruction performance of these sampling schemes. To make the results more reliable, we take 100 Monte Carlo runs for every result in each scenario. For the event signal reconstruction process, we use fS(t) = sin(2π50t) + sin(2π120t), an event signal consisting of 50 Hz and 120 Hz sinusoidal components. According to the Nyquist Sampling Theorem, since fS(t) includes 50 Hz and 120 Hz components, it can be reconstructed with a sampling frequency larger than the Nyquist rate of 240 samples/sec. We set the simulation parameters as in Table 1.
Table 1. Simulation Parameters

Network size                100 × 100 m
Attenuation constant (θs)   50
Number of sensor nodes      300
Number of source nodes      16 - 100
Reconstruction interval (τ) 10 s
Sampling frequency (f)      50 - 10000 samples/sec
Packet length (p)           100 samples
Packet length (byte)        50 byte
Sensor transmission range   20 m
Sensor channel capacity     2 Mb/sec
Routing                     DSR
MAC                         802.11
3.1 Event Signal Reconstruction with Uniform Sampling
In this section, we perform the reconstruction of the event signal with the uniform sampling scheme. In Fig. 2, Er is given with varying f for different values of M. As observed, for f = 50 and f = 100, Er takes its maximum value because these sampling frequencies are less than the Nyquist rate (240 samples/sec). However, although a sampling frequency of 200 is also less than the Nyquist rate, it can reduce Er; this is mainly because a sampling frequency of 200 is sufficient for the reconstruction of the 50 Hz sinusoidal component of fS(t), which decreases the overall reconstruction error. Fig. 3(a) shows the event signal fS(t), and Fig. 3(b) shows the event signal reconstructed using a sampling frequency of 500. Since 500 is larger than the Nyquist rate, fS(t) can be satisfactorily reconstructed at the sink. The amplitude difference between fS(t) and the reconstructed signal results from the attenuation of the event signal between the source nodes and the sink. A common technique for improving reconstruction accuracy is oversampling. However, as observed in Fig. 2, it is not always possible to decrease Er by increasing f: as f increases, λ increases (Fig. 1), so at higher f the number of samples required for the reconstruction cannot be successfully delivered to the sink, and Er increases. This highlights the reconstruction problem for event signals with high frequency components, which necessitate a high sampling frequency according to the Nyquist Sampling Theorem. Therefore, uniform sampling is not an effective scheme for reconstructing an event signal with higher frequency components. Reconstruction accuracy is also influenced by M: as M increases, λ increases due to the increasing contention, as shown in Fig. 1. Therefore, as M increases, it becomes more difficult to deliver a sufficient number of samples to the sink for the event signal reconstruction. As a result, as observed in Fig. 2, a smaller Er cannot be obtained at higher f by using a larger M. However, as M decreases, a smaller Er cannot be obtained at lower f.
Therefore, M must be selected according to the event signal requirements: if fS(t) has a larger bandwidth, M must be smaller, and if fS(t) has a smaller bandwidth, M must be larger, in order to obtain a smaller Er.
Fig. 2. Er with varying f for different M in uniform sampling
3.2 Event Signal Reconstruction with Non-uniform Sampling
Here, to realize non-uniform sampling, we divide the 100 × 100 m environment into 5 levels, called L1, L2, L3, L4, and L5, whose sizes from L1 to L5 are 20 × 20 m, 40 × 40 m, 60 × 60 m, 80 × 80 m, and 100 × 100 m, respectively. While the source nodes in L1 use the sampling frequency f, the source nodes between L1 and L2 use f/2, the source nodes between L2 and L3 use f/3, and so on. Consequently, there exist 5 different levels of sampling frequency, from f to f/5. In Fig. 4, Er is shown with varying f for different M. For illustration purposes, the f axis of Fig. 4 shows only the largest sampling frequency, used in L1: for example, f = 1000 means that the source nodes in L1 use f = 1000, the source nodes between L1 and L2 use f = 500, and so on. Moreover, to compute λ, we use the average sampling frequency fa, which represents all sampling frequencies in the 5 levels, i.e., fa = (f + f/2 + f/3 + f/4 + f/5)/5. As observed in Fig. 4, similar to uniform sampling, Er is not reduced at higher f under non-uniform sampling. However, since fa is less than the f used in uniform sampling, as observed in Fig. 1, a smaller λ can be obtained at higher sampling frequencies with respect to uniform sampling; this is mainly because the decrease in frequency results in less contention on the forward path. Therefore, at higher f, non-uniform sampling enables the WSN to deliver more information to the sink and to obtain a smaller Er. This allows the WSN to reconstruct event signals with higher frequency components, which necessitate higher sampling frequencies.
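A minimal sketch of this level-based frequency assignment, assuming the event is at the center of the nested squares; the names and the Chebyshev-distance membership test are illustrative.

```python
def nonuniform_frequency(x, y, f, level_sides=(20, 40, 60, 80, 100)):
    # A node whose position falls within level Lj (nested squares of side
    # 20 m, ..., 100 m centered on the event) samples at f / j.
    for j, side in enumerate(level_sides, start=1):
        if max(abs(x), abs(y)) <= side / 2:
            return f / j
    return f / len(level_sides)

def average_frequency(f, levels=5):
    # Average sampling frequency fa over the levels, used to compute lambda.
    return sum(f / j for j in range(1, levels + 1)) / levels
```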
3.3 Event Signal Reconstruction with Irregular Sampling
In irregular sampling, source nodes take samples from the event signal with probability α while sampling the event signal at the sampling frequency f.
0.3
1.5
0.2
1
0.1
Amplitude
Amplitude
0.5 0
0
−0.1
−0.5 −0.2
−1
−2
M=16 f=500 Samples/Sec.
−0.3
−1.5
M=16 f=500 Samples/Sec. 0
20
40
60
80
100
−0.4
Samples
0
20
40
60
80
100
Samples
(a)
(b)
Fig. 3. (a) Event signal fS (t). (b) Reconstructed event signal.
Fig. 4. Er with varying f for different M in non-uniform sampling
Therefore, in irregular sampling, to compute λ we use αf as a normalized sampling frequency fn. To observe the effect of α on the event signal reconstruction, Fig. 5 shows Er with varying f for different α. It is possible to choose α according to the reconstruction error requirements and the event signal bandwidth: as α decreases, fn decreases, so less contention and fewer overall packet losses occur on the forward path. Therefore, more information can be delivered to the sink at higher f, which enables the reconstruction of event signals with higher frequency components. However, for the reconstruction of an event signal with low frequency components, a decreasing α means fewer delivered samples, which results in an increase in Er. Therefore, to obtain a lower Er at lower f, α must be increased.

Fig. 5. Er with varying f for different α
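A minimal sketch of irregular sampling as described above: each candidate sample is kept with probability α, so the normalized frequency fn = αf is what enters the loss model (6); the function name is illustrative.

```python
import random

def irregular_samples(signal, f, alpha, duration):
    # Sample at frequency f but keep each sample only with probability alpha,
    # so the effective (normalized) sampling frequency is fn = alpha * f.
    samples = []
    for n in range(int(duration * f)):
        t = n / f
        if random.random() < alpha:
            samples.append((t, signal(t)))
    return samples
```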
3.4 Comparative Results on Three Sampling Schemes

To compare the uniform, non-uniform, and irregular sampling schemes, Fig. 6 shows Er with varying f for the three schemes. In terms of Er, the non-uniform sampling scheme outperforms the uniform and irregular sampling schemes. For the application-specific Er constraint (Er = 0.85) shown in Fig. 6, the non-uniform sampling scheme enables the sensor network to reconstruct an event signal having higher frequency components, while providing a smaller Er than the uniform and irregular sampling schemes. For smaller f, uniform and irregular sampling have almost the same Er performance.
Fig. 6. Er with varying f for three reconstruction schemes
However, for higher f, irregular sampling provides a smaller Er than the uniform sampling scheme. Therefore, with a given application-specific Er constraint, it is possible to reconstruct an event signal having higher frequency components using the irregular sampling scheme. To show the energy efficiency of the three sampling schemes, we evaluate their average energy consumption. We assume that each source node consumes one unit of average energy, denoted by Eav, to transmit one sample of the event signal to the sink. Using the same simulation parameters (M = 48, α = 0.2, τ = 10 s) as in Fig. 6, we can compute the average energy consumption as τfMEav. For all sampling schemes, τ, M, and Eav are the same; however, f changes according to the sampling scheme: in uniform sampling each source node transmits f samples per second, in irregular sampling fn = αf samples per second, and in non-uniform sampling, on average, fa samples per second. According to this energy model, Fig. 7(a) shows the average energy consumption of the three sampling schemes with varying sampling frequency. The irregular and non-uniform sampling schemes significantly outperform uniform sampling in terms of energy consumption, because they transmit fewer samples per unit time (fn and fa, respectively) to reconstruct the event signal. The reconstruction error is also affected by the attenuation of the received event signal, which increases with the distance between the point source and the source nodes. In Fig. 7(b), for different sizes of the source node selection area (A) around the event and for M = 48, the reconstruction error is shown with varying sampling frequency using the uniform sampling scheme. The reconstruction error (Er) decreases as A decreases, because the attenuation of the received signal decreases with A. Furthermore, Fig. 7(b) shows that the minimum Er is achieved for A = 20 × 20 m and M = 48. Therefore, to achieve this minimum reconstruction error, there should exist at least 12 sensor nodes in each 10 × 10 m area. This highlights the node density problem in the reconstruction process: for a certain level of reconstruction error, a certain node density is imperative.

Fig. 7. (a) Average energy consumption for three sampling schemes. (b) Er with varying f for different sizes of A.
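A minimal sketch of this energy model, substituting the per-scheme effective sampling frequency into τfMEav; the scheme labels are illustrative.

```python
def average_energy(scheme, tau, f, M, E_av, alpha=0.2, levels=5):
    # Average energy consumption tau * f_eff * M * E_av, where f_eff is
    # f (uniform), alpha * f (irregular), or the level average fa (non-uniform).
    if scheme == "uniform":
        f_eff = f
    elif scheme == "irregular":
        f_eff = alpha * f
    elif scheme == "nonuniform":
        f_eff = sum(f / j for j in range(1, levels + 1)) / levels
    else:
        raise ValueError(scheme)
    return tau * f_eff * M * E_av
```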
4 Conclusion
In this paper, theoretical analysis and results show that with proper selection of sampling and communication parameters, the observed physical phenomenon can be satisfactorily reconstructed at the sink node. It is also shown that uniform sampling is inefficient for the reconstruction of an event signal with high frequency components, and that non-uniform and irregular sampling significantly outperform uniform sampling both for the reconstruction of event signals with high frequency components and for energy conservation. Therefore, unlike the existing congestion control mechanisms, rate control based on non-uniform and irregular sampling stands as a promising approach. Furthermore, as a result of the attenuation of the sensed event signal, it is also shown that to achieve a certain level of reconstruction error, a certain node density is imperative.
References

1. I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci: Wireless Sensor Networks: A Survey. Computer Networks Journal (Elsevier), vol. 38, issue 4 (2002) 393-422
2. A. Kashyap, L. Alfonso, L. Monta, C. Xia, Z. Lui: Distributed Source Coding in Dense Sensor Networks. In Proc. IEEE DCC (2005)
3. S. S. Pradhan, K. Ramchandran: Distributed Source Coding: Symmetric Rates and Applications to Sensor Networks. In Proc. IEEE DCC (2000) 363-372
4. W. Bajwa, A. Sayeed, R. Nowak: Matched Source-Channel Communication for Field Estimation in Wireless Sensor Networks. In Proc. IPSN'05 (2005) 332-339
5. D. Marco, D. L. Neuhoff: Reliability vs. Efficiency in Distributed Source Coding. In Proc. IPSN'04 (2004)
6. M. Dong, L. Tong, B. M. Sadler: Source reconstruction via mobile agents in sensor networks: throughput-distortion characteristics. In Proc. IEEE MILCOM 2003, Boston, USA (2003)
7. M. C. Vuran, O. B. Akan: Spatio-temporal Characteristics of Point and Field Sources in Wireless Sensor Networks. In Proc. IEEE ICC 2006, Istanbul, June (2006)
8. M. Gastpar, M. Vetterli: Source-Channel Communication in Sensor Networks. In Proc. IPSN'03, Palo Alto, USA (2003)
9. A. V. Oppenheim, R. W. Schafer, J. R. Buck: Discrete-Time Signal Processing. Prentice Hall (1999)
10. http://www.originlab.com/
11. The Network Simulator, ns-2, http://www.isi.edu/nsnam/ns/
Peer-Assisted On-Demand Streaming of Stored Media Using BitTorrent-Like Protocols

Niklas Carlsson and Derek L. Eager

Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
{carlsson,eager}@cs.usask.ca
Abstract. With BitTorrent-like protocols a client may download a file from a large and changing set of peers, using connections of heterogeneous and time-varying bandwidths. This flexibility is achieved by breaking the file into many small pieces, each of which may be downloaded from different peers. This paper considers an approach to peer-assisted on-demand delivery of stored media that is based on the relatively simple and flexible BitTorrent-like approach, but which is able to achieve a form of "streaming" delivery, in the sense that playback can begin well before the entire media file is received. Achieving this goal requires: (1) a piece selection strategy that effectively mediates the conflict between the goals of high piece diversity and the in-order requirements of media file playback, and (2) an on-line rule for deciding when playback can safely commence. We present and evaluate, using simulation, candidate protocols including both of these components.

Keywords: BitTorrent-like systems, peer-assisted streaming, probabilistic piece selection.
1 Introduction

Scalable on-demand streaming of stored media can be achieved using scalable server protocols such as patching [1] and Hierarchical Stream Merging [2], server replication as with CDNs, and/or peer-to-peer techniques. This paper concerns peer-to-peer approaches. A number of prior P2P protocols for scalable on-demand streaming have used a cache-and-relay approach [3-6]. With these techniques, each peer receives content from one or more parents and stores it in a local cache, from which it can later be forwarded to clients that are at an earlier play point of the file. Some work of this type concerns the problem of determining the set of servers (or peers) that should serve each peer, and at what rate each server should operate [7, 8]. Related ideas, based on application-level multicast architectures, have been used in protocols for live streaming [9, 10]. The above approaches work best when peer connections are relatively stable. Motivated by highly dynamic environments where peer connections are heterogeneous with highly time-varying bandwidths and peers may join and/or leave the system frequently, recent work by Annapureddy et al. [11] has considered the use of
BitTorrent-like protocols [12] for scalable on-demand streaming. (Other recent work has considered use of such protocols for live streaming [13-15].) In BitTorrent-like protocols, a file is split into smaller pieces which can be downloaded (in parallel) from any other peer that has at least one piece that the peer does not have itself. In the approach proposed by Annapureddy et al. for on-demand streaming of stored media files, each file is split into sub-files, each encoded using distributed network coding [16]. Each sub-file is downloaded using a BitTorrent-like approach. By downloading sub-files sequentially, playback can begin after the first sub-file(s) have been retrieved, thus allowing a form of “streaming” delivery. Note that use of large sub-files results in large startup delays, while using very small sub-files results in close to sequential piece retrieval, which can lead to poor performance as will be shown in Section 3.3. The best choice of sub-file sizes would be workload (and possibly also client) dependent, although the method requires these sizes to be statically determined. The authors do not elaborate on how the sizes can be chosen, or how startup delays can be dynamically determined. Rather than statically splitting each file into sequentially retrieved sub-files or using a small window of pieces that may be exchanged (as in BitTorrent-like protocols that have been proposed for live streaming [13]), in this paper we propose an approach in which any of the pieces needed by a peer may be retrieved any time they are available. As in BitTorrent, selection of which piece to retrieve when a choice must be made is controlled by a piece selection policy. For the purpose of ensuring high piece diversity, which is an important objective in download systems (where the file is not considered usable until fully downloaded) [17], BitTorrent uses a rarest-first policy, giving strict preference to pieces that are the rarest among the set of pieces owned by all the peers from which it is downloading. On the other hand, in the context of streaming it is most natural to download pieces in-order. The piece selection policy proposed in this paper attempts to achieve a good compromise between the goals of high piece diversity, and in-order retrieval of pieces. We also address the problem of devising a simple on-line policy for deciding when playback can safely commence. The remainder of the paper is organized as follows. Section 2 provides a brief overview of BitTorrent. Section 3 defines and evaluates candidate piece selection policies. Section 4 addresses the problem of dynamically determining the startup delay. Finally, conclusions are presented in Section 5.
2 Overview of BitTorrent

With BitTorrent, files are split into pieces, which themselves are split into smaller sub-pieces. Multiple sub-pieces, potentially of the same piece, can be downloaded in parallel from different peers. A peer is said to have a piece whenever the entire piece is downloaded. A peer is considered interested in all peers that have at least one piece that it currently does not have itself. BitTorrent distinguishes between peers that have the entire file (called seeds) and peers currently downloading the file (called leechers).
In addition to the rarest-first piece selection policy, BitTorrent uses a number of additional policies that determine which peers to upload to. While each peer establishes persistent connections with a large set of peers (e.g., 80 [17]), at each time instance each peer only uploads to a limited number of peers. Only peers that are unchoked may be sent data. Generally, clients re-evaluate the set of unchoked peers relatively frequently (e.g., every 10 seconds, each time a peer becomes interested/uninterested, and/or each time a connection is established/broken). To discourage free-riding, BitTorrent uses a tit-for-tat policy in which leechers give upload preference to the leechers that provide the highest download rates to them. Without any measure of the upload rates from other peers, it has been found beneficial for seeds to give preference to recently unchoked peers [17]. Periodically (typically every third time the set of unchoked peers is re-evaluated), each client uses an optimistic unchoke policy to probe for better pairings (or, in the case of a seed, to allow a new peer to download pieces).
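A rough sketch of the unchoke selection just described, combining tit-for-tat ranking with a periodic optimistic slot; the data structures and names are assumptions for illustration, not BitTorrent's actual implementation.

```python
import random

def select_unchoked(interested, download_rate, n_slots, optimistic_round):
    # Tit-for-tat: unchoke the interested peers currently providing the
    # highest download rates to us; on an optimistic round, give one slot
    # to a randomly chosen peer outside the current best set instead.
    ranked = sorted(interested, key=lambda p: download_rate.get(p, 0.0),
                    reverse=True)
    unchoked = ranked[:n_slots]
    if optimistic_round and len(interested) > n_slots:
        others = [p for p in interested if p not in ranked[:n_slots - 1]]
        unchoked[-1] = random.choice(others)
    return unchoked
```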
3 Piece Selection

Section 3.1 defines candidate policies, Section 3.2 describes our simulation model, and Section 3.3 evaluates the performance of the piece selection policies defined in Section 3.1.

3.1 Candidate Policies

To allow playback to begin well before the entire media file is retrieved, pieces must be selected in a way that effectively mediates the conflict between the goals of high piece diversity and the in-order requirements of media file playback. Assuming that peer j is about to request a piece from peer i, we define two baseline policies:

• Rarest: Among the set of pieces that peer i has and j does not have, peer j requests the rarest piece among the set of all pieces held by peers that j is connected to. Ties are broken randomly.
• In-order: Among the set of pieces that peer i has, peer j requests the first piece that it does not have itself.

We propose using simple probabilistic policies. Perhaps the simplest such technique is to request an in-order piece with some probability and the rarest piece otherwise. Other techniques may use some probability distribution to bias towards earlier pieces. We have found that the Zipf distribution works well for this purpose. The specific probabilistic policies considered here are as follows:

• Portion (p): For each new piece request, client j uses the in-order policy with probability p and the rarest policy with probability (1-p).
• Zipf (θ): For each new piece request, client j probabilistically selects a piece from the set of pieces that i has but that j does not have. The probability of selecting each of these pieces is chosen to be proportional to 1/(k+1-k0)^θ, where k is the index of the piece and k0 is the index of j's first missing piece (a sketch of this policy is given below).
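The Zipf(θ) policy is simple to implement; the sketch below follows the definition above, with piece indices starting at 0 and helper names assumed for illustration.

```python
import random

def zipf_select(pieces_i_has, pieces_j_has, theta=1.25):
    # Candidate pieces: those peer i has but peer j lacks.
    candidates = sorted(k for k in pieces_i_has if k not in pieces_j_has)
    if not candidates:
        return None
    # k0: index of peer j's first missing piece overall.
    k0 = next(k for k in range(max(candidates) + 2)
              if k not in pieces_j_has)
    # Select piece k with probability proportional to 1/(k + 1 - k0)^theta.
    weights = [1.0 / (k + 1 - k0) ** theta for k in candidates]
    return random.choices(candidates, weights=weights)[0]
```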
Note that the parameters p and θ can be tuned so that the policies are more or less aggressive with respect to their preference for earlier pieces. For the results presented here the parameters are fixed at the following values: p = 50%, p = 90%, and θ = 1.25.

3.2 Simulation Model

A similar approach is used as in prior simulation studies of BitTorrent-like protocols [18, 16]; however, rather than restricting peers to a small number of connections, it is assumed that peers are connected to all other peers in the system. It is further assumed that pieces are split into sufficiently many sub-pieces that use of parallel download is always possible when multiple peers have a desired piece. It is assumed that a peer i can have at most ni concurrent upload connections and that no connections are choked in the middle of an upload. The set of peers that a peer i is uploading to may change when (i) it completes the upload of a piece, or (ii) some other peer becomes interested and peer i is not utilizing all its upload connections. The new set of upload connections consists of (i) any peer currently in the middle of an upload, and (ii) additional peers up to the maximum limit ni. Additional peers are selected from the set of interested peers. To simulate optimistic unchoking, with probability 1/ni a random peer is selected, and with probability (ni-1)/ni the peer which is uploading to peer i at the highest rate is selected. Random selection is used to break ties; this ensures that seeds use only random peer selection. For simulating the rate at which pieces are exchanged, it is assumed that connection bottlenecks are located at the end points (i.e., either at the upload bandwidth U of the sender or at the download rate D of the receiver) and that the network operates using max-min fair bandwidth sharing (as with TCP, for example). Under these assumptions each flow operates at the highest possible rate that ensures that (i) no bottleneck operates above its capacity, and (ii) the rate of no flow can be increased without decreasing the rate of some other flow operating at the same or a lower rate.

3.3 Performance Comparisons

Throughout this paper it is conservatively assumed that there is a single persistent seed and that all other peers leave the system as soon as they have received the entire file (i.e., they act only as leechers). In a real system peers are likely to continue serving other peers as long as they are still playing out the media file, and some peers may choose to serve as seeds beyond that time. With a larger aggregate available download bandwidth and higher availability of pieces, the benefits of more aggressive piece selection techniques are likely to be even greater than presented here. Without loss of generality, downloading of a single file is considered, with size and play rate both equal to one in normalized units. With these normalized units, the volume of data transferred is measured in units of the file size and time is measured in units of the time it takes to play the file data. Hence, all rates are expressed relative to the play rate, and all startup delays are expressed relative to the playback time. The file is split into 512 pieces, and unless stated otherwise, peers are assumed to have three times higher download capacity than upload capacity. Each peer is assumed to upload to at most four peers simultaneously. Throughout this section, policies are evaluated with regard to the lowest possible startup delay.
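Before turning to the results, the max-min fair bandwidth sharing assumed in the simulation model of Section 3.2 can be computed with the standard progressive-filling procedure; the sketch below is illustrative, with flows represented as (sender, receiver) pairs and capacities as dictionaries.

```python
from collections import Counter

def max_min_rates(flows, upload_cap, download_cap):
    # Progressive filling: grow all active flows at an equal rate until some
    # endpoint (sender upload or receiver download) saturates, then freeze
    # the flows through saturated endpoints and repeat.
    rate = {fl: 0.0 for fl in flows}
    up_rem, down_rem = dict(upload_cap), dict(download_cap)
    active = set(flows)
    while active:
        up_cnt = Counter(s for s, _ in active)
        down_cnt = Counter(r for _, r in active)
        # Largest equal increment before some endpoint saturates.
        inc = min(min(up_rem[s] / up_cnt[s] for s in up_cnt),
                  min(down_rem[r] / down_cnt[r] for r in down_cnt))
        for s, r in active:
            rate[(s, r)] += inc
            up_rem[s] -= inc
            down_rem[r] -= inc
        active = {(s, r) for s, r in active
                  if up_rem[s] > 1e-9 and down_rem[r] > 1e-9}
    return rate
```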
Fig. 1. Average achievable startup delay under constant rate Poisson arrival process (D/U = 3, λ = 64, and ϕ = 0)

Fig. 2. Cumulative distribution function of the best achievable startup delay (U = 2, D = 6, λ = 64, and ϕ = 0)
This section initially considers a simple scenario in which peers do not leave the system until having fully received the file, requesting peers arrive according to a Poisson process, and peers are homogeneous (i.e., have the same upload and download bandwidth). Alternative scenarios and workload assumptions are subsequently considered. To capture the steady state behavior of the system, the system is simulated for at least 4000 requests. Further, measurements are only taken for requests which do not occur near the beginning or the end of each simulation. Typically, statistics for the first 1000 and the last 200 requests are not included in the measurements; however, to better capture the steady state behavior of the in-order policy, a warmup period longer than 1000 requests is sometimes required (the in-order policy was typically simulated using at least 20,000 requests). Each data point represents the average of 10 simulations. Unless stated otherwise, this methodology is used throughout this paper. To illustrate the statistical accuracy of the results presented here, Fig. 1 includes confidence intervals capturing the true average with a confidence of 95%. Note that the confidence intervals are only visible for the in-order policy. Subsequent results have similar accuracy, and confidence intervals are therefore omitted. Fig. 1 shows the average startup delay as a function of the total client bandwidth (U + D). The peer arrival rate is λ = 64 and the seed has an upload bandwidth equal to that of the leechers. The most significant observation is that the Zipf(1.25) policy consistently outperforms the other candidate policies. In systems with an upload capacity at least twice the play rate (i.e., U ≥ 2), peers are able to achieve startup delays two orders of magnitude smaller than the file playback time and much shorter than with the rarest-first policy. Fig. 2 presents the cumulative distribution of achievable startup delays for this initial scenario. Note that Zipf(1.25) achieves low and relatively uniform startup delays. The high variability in startup delays using the in-order policy is due to groups of peers becoming synchronized, all requiring the same remaining pieces, which only the seed has. Being limited by the upload rate of the seed, these peers will, at this point, see poor download rates. With many peers completing their downloads at roughly the same time, the system will become close to empty, before a new group
of peers repeats this process. This service behavior causes the number of peers in the system using the in-order policy to follow a saw-tooth pattern. In contrast, the number of concurrent leechers using the other policies is relatively stable. Fig. 3(a) shows that, as expected, in-order and portion(90%) do well in systems with low arrival rates; however, Zipf(1.25) outperforms these policies at moderate and higher arrival rates. The performance of Zipf(1.25) is relatively insensitive to the arrival rate. Note also that the decrease in average delay observed for high arrival rates with the in-order policy may be somewhat misleading, as the achievable startup delay in this region is highly variable, as illustrated in Fig. 2. Fig. 3(b) shows that the results are relatively insensitive to the download/upload bandwidth ratio for ratios larger than 2. In this experiment the upload rate U is fixed at 2 and the download rate D is varied. Note that typical Internet connections generally have ratios between 2 and 8 [19]. The increasing startup delays using the in-order policy are caused by a larger share of the seed bandwidth being spent on serving recently arrived peers (which can be served by almost every other peer). Fig. 3(c) illustrates that higher seed bandwidth allows the more aggressive (with respect to fetching pieces in order) portion and in-order policies to achieve better performance. For these results it is assumed that the maximum number of upload connections of the seed is proportional to its capacity.

In the second scenario that we consider, peers arrive according to a Poisson process, but each peer may leave the system prematurely. The rate at which each peer departs, prior to its complete reception of the file, is denoted by ϕ. Fig. 4 illustrates that the results are insensitive to the rate at which peers depart the system. This insensitivity to early departures is a characteristic of peers not relying on retrieving pieces from any particular peer, and has been verified by reproducing very similar graphs to those presented in Figs. 1, 2, and 3.

In the third scenario that we consider, motivated by measurement studies of real file sharing torrents [20], peers are assumed to arrive at an exponentially decaying rate λ(t) = λ0e^(−γt), where λ0 is the initial arrival rate at time zero and γ is a decay factor. By varying γ between 0 and ∞, both a pure Poisson arrival process and a flash crowd in which all peers arrive instantaneously (to an empty system) can be captured. Fig. 5 shows the impact of γ on performance. λ0 is determined such that the expected number of arrivals within the first 2 time units is always 128. We note that with a decay factor γ = 1, 63.2% of all peer arrivals occur within the first time unit; with a decay factor γ = 6.9, the corresponding percentage is 99.9%. For these experiments no warmup period was used, and simulations were run until the system emptied. Note that the performance of in-order and portion(90%) quickly becomes very poor as the arrival pattern becomes burstier (i.e., for large γ and λ0 values).

The fourth and final scenario that we consider assumes Poisson arrivals, as in the first scenario, but with two classes of peers: low bandwidth peers (UL = 0.4, DL = 1.2) and high bandwidth peers (UH = 2, DH = 6). Fig. 6 shows that the average startup delay for the high bandwidth peers significantly increases as the fraction of low bandwidth peers increases.
The figure for low bandwidth peers looks very similar, with the exception that startup delays are higher (e.g., the minimum startup delay using Zipf(1.25) is roughly 0.08). Similar results have also been observed in a scenario where all peers are assumed to have a total bandwidth of 8 (U = 2 and D = 6), but a specified fraction of the peers make only 20% of their upload bandwidth available (i.e., UL = 0.4 and UH = 2).

Fig. 3. Impact of system parameters on the achievable startup delay: (a) impact of the peer arrival rate λ (U = 2, D = 6, γ = 0, ϕ = 0); (b) impact of the ratio between the client download and upload bandwidth D/U (U = 2, λ = 64, ϕ = 0); (c) impact of the seed bandwidth (U = 2, D = 6, λ = 64, ϕ = 0)

Fig. 4. Average achievable startup delay under a constant rate Poisson arrival process with early departures (U = 2, D = 6, λ = 64)

Fig. 5. Average achievable startup delay with an exponentially decaying arrival rate (U = 2, D = 6, λ(t) = λ0e^(−γt), λ0 = 128γ / (1 − e^(−2γ)))

Fig. 6. Average achievable startup delay under a constant rate Poisson arrival process with both low and high bandwidth clients (λ = 64, ϕ = 0, UL = 0.4, DL = 1.2, UH = 2, DH = 6)
4 Dynamically Determining Startup Delay

In highly unpredictable environments, with large and changing sets of peers, it is difficult to predict future system conditions. Therefore, one cannot expect any on-line strategy for selecting a startup delay to give close to minimal startup delays without the potential of frequent playback interruption owing to pieces that have not been received by their playout point. To deal with such missing pieces, existing error concealment techniques can be applied by the media player, but at some cost in media playback quality. In this section we present a number of simple policies for determining when to start playback and evaluate how they perform when used in conjunction with the Zipf(1.25) piece selection policy.

4.1 Simple Policies

Possibly the simplest policy is to start playback once some minimum number of pieces have been received.

• At-least (b): Start playback when b pieces have been received, and one of those pieces is the first piece of the file.

Somewhat more complex policies may attempt to measure the rate at which in-order pieces are retrieved. We define an "in-order buffer" that contains all pieces up to the first missing piece, and denote by dseq the rate at which the occupancy of this buffer increases. Note that dseq will initially be smaller than the download rate (as some pieces are retrieved out of order), but can exceed the download rate as holes are filled. The rate dseq can be expected to increase over time, as holes are filled more and more frequently. Therefore, it may be safe to start playback once the estimated current value of dseq allows the in-order buffer to be filled within the time it takes to play the entire file (if that rate were to be maintained). With k pieces in the in-order buffer, dseq must therefore be at least (K−k)/K times as large as the play rate r. Using this "rate condition", two rate-based policies can be defined.

• LTA (b): The current value of dseq is conservatively estimated by (Lk/K)/T, where T is the time since the peer arrived to the system. With the LTA(b) policy, a client starts playback when at least b pieces have been retrieved and (Lk/K)/T ≥ r(K−k)/K. (See Fig. 7; a code sketch of this condition is given after this list.)
• EWMA (b, α): The current value of dseq is estimated by (L/K)/τseq, where τseq denotes an exponentially weighted moving average of the time between additions of a piece to the in-order buffer. With the EWMA(b, α) policy, a client starts playback when at least b pieces have been retrieved and (L/K)/τseq ≥ r(K−k)/K. τseq is initialized, at the time the first piece of the file is retrieved, to the time since the peer's arrival to the system. When multiple pieces are inserted into the in-order buffer at once, they are considered to have been added at times equally spaced over the time period since the previous addition. For example, if the 10th, 11th and 12th pieces of the file are added to the in-order buffer together (implying that at the time the 10th piece was received, pieces 11 and 12 had previously been received), then τseq is updated three times, with each inter-arrival time being one third of the time since the 9th piece was added to the in-order buffer.
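A minimal sketch of the LTA(b) startup condition, using the notation above (K pieces in the file, file size L, play rate r, k pieces in the in-order buffer, and T the time since the peer's arrival); the function signature is an assumption.

```python
def lta_should_start(pieces_received, k, K, L, r, T, b=20):
    # LTA(b): start playback once at least b pieces have been received and
    # the conservative long-term in-order fill-rate estimate (L*k/K)/T can
    # keep up with the remaining playback requirement r*(K - k)/K.
    if pieces_received < b or T <= 0:
        return False
    d_seq = (L * k / K) / T
    return d_seq >= r * (K - k) / K
```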
Fig. 7. Startup condition of the LTA policy, using the amount of in-order data received by each time T

Fig. 8. Performance with constant rate Poisson arrival process (D/U = 3, λ = 64, and ϕ = 0): (a) average startup delay; (b) percentage of late pieces

Fig. 9. Impact of parameter b in the LTA policy (D/U = 3, λ = 64, and ϕ = 0): (a) average startup delay; (b) percentage of late pieces
4.2 Performance Comparisons

Fig. 10. Performance with an exponentially decaying arrival rate (U = 2, D = 6, λ(t) = λ0 e^{−γt}, λ0 = 128γ/(1 − e^{−2γ})): (a) average startup delay; (b) percentage of late pieces

Fig. 11. Performance with heterogeneous clients (λ = 64, ϕ = 0, UL = 0.4, DL = 1.2, UH = 2, DH = 6): (a) average startup delay and (b) percentage of late pieces for high bandwidth clients; (c) average startup delay and (d) percentage of late pieces for low bandwidth clients

Making the same workload assumptions as in Section 3.3, the above startup policies are evaluated together with the Zipf(1.25) piece selection policy. While policies may be tuned for the conditions under which they are expected to operate, for highly dynamic environments it is important for policies to adapt as the network conditions change. To evaluate the above policies over a wide range of network conditions, the four scenarios from Section 3.3 are used. Most comparisons are of: (i) at-least(20), (ii) at-least(60), (iii) at-least(160), (iv) LTA(20), and (v) EWMA(20, 0.1).
Figs. 8 through 11 present the average startup delay and the percentage of pieces that are not retrieved in time for playback. Again, note that such pieces could be handled by the media player using various existing techniques, although with some degradation in quality.

Figs. 8 and 9 present results for the first scenario. Fig. 8 shows that the dseq estimate, used by LTA(20) and EWMA(20, 0.1), allows these policies to adjust their startup delay based on the current network conditions. For reduced client bandwidths, these policies increase their startup delay enough to ensure a small percentage of late pieces. This is in contrast to the at-least policy, which always requires the same number of in-order pieces to be received before starting playback (independent of current conditions). Fig. 9 shows the impact of using different b-values with the LTA policy. As can be observed, most benefits can be achieved using b equal to 10 or 20. These values allow relatively small startup delays to be achieved without any significant increase in the percentage of late pieces. While omitted, results for the second scenario suggest that the results are relatively insensitive to departure rates.

Fig. 10 presents the results for the third scenario. Here, the exponential decay factor (measuring the burstiness with which peers arrive) is varied over four orders of magnitude. Fig. 11 presents the results for scenario four, in which arriving peers belong to one of two classes, high and low bandwidth clients. For this scenario, the portion of low bandwidth peers is varied such that the network conditions change from good (where most peers are high bandwidth clients) to poor (where the majority of peers are low bandwidth clients). As in previous scenarios, we note that both LTA(20) and EWMA(20, 0.1) adjust well to the changing network conditions, while the at-least policy is non-responsive and does not adjust its startup delays. This is best illustrated by the relatively straight lines and/or high loss rates observed with this policy.

For highly dynamic environments, we find the LTA(b) policy promising. It is relatively simple, uses a single parameter, and is somewhat more conservative than EWMA(b, α), which may give too much weight to temporary changes in the rate at which the in-order buffer is being filled.
5 Conclusion

This paper considers adaptations of the BitTorrent-like approach to peer-assisted download that provide a form of streaming delivery, allowing playback to begin well before the entire file is received. A simple probabilistic piece selection policy is shown to achieve an effective compromise between the goal of high piece diversity and in-order piece retrieval. Whereas one cannot expect any on-line strategy for selecting startup delays to give close to minimal startup delays, we find that a simple rule based on the average rate at which in-order pieces are retrieved gives promising results.
References
1. Hua, K.A., Cai, Y., Sheu, S.: Patching: A Multicast Technique for True Video-on-Demand Services. In: Proc. ACM MULTIMEDIA '98, Bristol, U.K., Sept. 1998, pp. 191--200.
2. Eager, D.L., Vernon, M.K., Zahorjan, J.: Optimal and Efficient Merging Schedules for Video-on-Demand Servers. In: Proc. ACM MULTIMEDIA '99, Orlando, FL, Nov. 1999, pp. 199--202.
3. Cui, Y., Li, B., Nahrstedt, K.: oStream: Asynchronous Streaming Multicast in Application-layer Overlay Networks. IEEE Journal on Selected Areas in Communications (Special Issue on Recent Advances in Service Overlays) 22 (1) (2004), pp. 91--106.
4. Bestavros, A., Jin, S.: OSMOSIS: Scalable Delivery of Real-time Streaming Media in Ad-hoc Overlay Networks. In: Proc. ICDCS Workshops '03, Providence, RI, May 2003, pp. 214--219.
5. Sharma, A., Bestavros, A., Matta, I.: dPAM: A Distributed Prefetching Protocol for Scalable Asynchronous Multicast in P2P Systems. In: Proc. IEEE INFOCOM '05, Miami, FL, Mar. 2005, pp. 1139--1150.
6. Luo, J-G., Tang, Y., Yang, S-Q.: Chasing: An Efficient Streaming Mechanism for Scalable and Resilient Video-on-Demand Service over Peer-to-Peer Networks. In: Proc. IFIP NETWORKING '06, Coimbra, Portugal, May 2006, pp. 642--653.
7. Rejaie, R., Ortega, A.: PALS: Peer-to-Peer Adaptive Layered Streaming. In: Proc. NOSSDAV '03, Monterey, CA, June 2003, pp. 153--161.
8. Hefeeda, M., Habib, A., Botev, B., Xu, D., Bhargava, B.: PROMISE: Peer-to-Peer Media Streaming using CollectCast. In: Proc. ACM MULTIMEDIA '03, Berkeley, CA, Nov. 2003, pp. 45--54.
9. Castro, M., Druschel, P., Rowstron, A., Kermarrec, A-M., Singh, A., Nandi, A.: SplitStream: High-Bandwidth Multicast in Cooperative Environments. In: Proc. ACM SOSP '03, Bolton Landing, NY, Oct. 2003, pp. 298--313.
10. Kostić, D., Rodriguez, A., Albrecht, J., Vahdat, A.: Bullet: High Bandwidth Data Dissemination using an Overlay Mesh. In: Proc. ACM SOSP '03, Bolton Landing, NY, Oct. 2003, pp. 282--297.
11. Annapureddy, S., Gkantsidis, C., Rodriguez, P.R.: Providing Video-on-Demand using Peer-to-Peer Networks. In: Proc. Workshop on Internet Protocol TV (IPTV) '06, Edinburgh, Scotland, May 2006.
12. Cohen, B.: Incentives Build Robustness in BitTorrent. In: Proc. Workshop on Economics of Peer-to-Peer Systems, Berkeley, CA, June 2003.
13. Zhang, X., Liu, J., Li, B., Yum, T-S.P.: CoolStreaming/DONet: A Data-driven Overlay Network for Peer-to-Peer Live Media Streaming. In: Proc. IEEE INFOCOM '05, Miami, FL, Mar. 2005, pp. 2102--2111.
14. Zhang, M., Zhao, L., Tang, Y., Luo, J-G., Yang, S-Q.: Large-scale Live Media Streaming over Peer-to-Peer Networks through Global Internet. In: Proc. Workshop on Advances in Peer-to-Peer Multimedia Streaming '05, Singapore, Nov. 2005, pp. 21--28.
15. Liao, X., Jin, H., Liu, Y., Ni, L.M., Deng, D.: AnySee: Peer-to-Peer Live Streaming. In: Proc. IEEE INFOCOM '06, Barcelona, Spain, Apr. 2006.
16. Gkantsidis, C., Rodriguez, P.R.: Network Coding for Large Scale Content Distribution. In: Proc. IEEE INFOCOM '05, Miami, FL, Mar. 2005, pp. 2235--2245.
17. Legout, A., Urvoy-Keller, G., Michiardi, P.: Rarest First and Choke Algorithms are Enough. In: Proc. ACM IMC '06, Rio de Janeiro, Brazil, Oct. 2006.
18. Bharambe, A.R., Herley, C., Padmanabhan, V.N.: Analyzing and Improving a BitTorrent Network's Performance Mechanisms. In: Proc. IEEE INFOCOM '06, Barcelona, Spain, Apr. 2006.
19. Saroiu, S., Gummadi, K.P., Gribble, S.D.: A Measurement Study of Peer-to-Peer File Sharing Systems. In: Proc. IS&T/SPIE MMCN '02, San Jose, CA, Jan. 2002, pp. 156--170.
20. Guo, L., Chen, S., Xiao, Z., Tan, E., Ding, X., Zhang, X.: Measurement, Analysis, and Modeling of BitTorrent-like Systems. In: Proc. ACM IMC '05, Berkeley, CA, Oct. 2005, pp. 35--48.
Multiple Identities in BitTorrent Networks
Jin Sun, Anirban Banerjee, and Michalis Faloutsos
Department of Computer Science and Engineering, University of California, Riverside
Riverside, CA 92521
{jsun,anirban,michalis}@cs.ucr.edu
Abstract. Peer-to-peer (P2P) file sharing systems have become ubiquitous, and at present the BitTorrent (BT) based P2P systems are very popular and successful. It has been argued that this is mostly due to the Tit-For-Tat (TFT) strategy used in BT [1] that discourages free-ride behavior. However, Hale and Patarin [2] identify the weakness of TFT and hypothesize that it is possible to use multiple identities to cheat. To test this hypothesis we modify the official BT source code to allow the creation of multiple processes by one BT client; these processes use different identities to download the same file cooperatively. We experiment with several piece selection and sharing algorithms and show that BT is fairly robust to the exploitation of multiple identities except for one case. In most cases, the use of multiple identities does not provide significant speedup consistently. Interestingly, clients with multiple identities are still punished if they do not maintain a comparable upload rate with other normal clients. We attribute this to the robust way that the Tit-For-Tat policy works. From our experiments we observe that the BT protocol is rather resilient to exploits using multiple identities and it encourages self-regulation among BT clients. Keywords: Peer-to-peer networks, BitTorrent, fairness, resource allocation.
1 Introduction
Peer-to-peer file sharing systems enable large-scale content distribution by allowing users to cooperate with each other and voluntarily share their resources, mostly files. There are numerous P2P clients available nowadays, such as KaZaA [3], GNUtella [4], eDonkey [5] and BitTorrent [1]. P2P applications have been known to contribute significantly to Internet traffic, as indicated by recent measurements of the Internet backbone [6]. Among all these file-sharing applications, BitTorrent seems to be the most popular one and has evolved to account for a large portion of the P2P traffic on the Internet [7]. BitTorrent achieves a higher level of robustness and resource utilization than most currently known cooperative techniques [1]. It works by grouping users, who have common interests in downloading a specific file, together into swarms to cooperate with one another to speed up the download process. BitTorrent
is distinctive for its "choking/unchoking" algorithms that promote high-level reciprocation among users. The "Tit-for-Tat" (TFT) strategy, an integral part of BT, encourages users to cooperate: a BT client uploads more generously to its peers which reciprocate and provide it with the data it needs. Those who do not share run the risk of being neglected by the swarm. Some people believe that the TFT strategy is the key strength behind BT's success. However, Hale and Patarin [2] highlight some weaknesses of the TFT policy and state that it is possible to fake identity in BitTorrent and get a free ride, because there is no mechanism to provide trust-based identity authentication. Although it has been pointed out in [2] that it is possible to use fake peer IDs in BitTorrent networks, to the best of our knowledge there is no detailed description or implementation of this approach in the related literature.

In some sense, it is natural to expect that selfish users would try to get free rides at the cost of other benevolent users in the swarm, given the fact that there is no monitoring mechanism such as reputation management in P2P networks [8, 9]. In addition to the tendency to free-riding, a user may also wish to use multiple identities to speed up the download process. For example, a peer may have two different IPs and want to create two BT processes that can work cooperatively to download the same file without any overlap. This is especially attractive when the file being downloaded is very large. However, this is not currently supported by the official BT client or other compatible implementations.

The questions we attempt to answer through our research effort are:
1. Does the use of multiple identities in BT help speed up downloads?
2. How is the swarm affected by such selfish behaviors?

Motivated by these interesting questions, we have developed and implemented a modified BT client based on the official BT source code. Using the modified BT client, a user can (1) create multiple processes with different IDs, and (2) have these processes cooperatively download the same file using different piece selection and sharing algorithms. Our implementation seems to be the first to explore this idea of multiple identities for BT users. To answer these questions we conducted extensive experiments with the modified BT client in the CS labs at UC Riverside. We list our contributions below:
1. Using multiple identities does not provide consistent benefits: From our experiments, we find that the modified client can only achieve limited speedup in very specific cases. We find that BT is fairly robust to the exploitation of multiple identities.
2. Modified clients are penalized like normal clients: We observe that modified clients need to maintain a comparable upload rate with other normal clients, otherwise they suffer from high file download latency. We attribute this robustness of BT to the Tit-For-Tat policy, which is an effective mechanism to implement fairness in the swarm.

The rest of the paper is organized as follows. In Section 2, we provide some more background information
on BitTorrent systems. In Section 3, we describe how to create multiple processes with different IDs to download the same file cooperatively and propose a few different piece selection and sharing algorithms. In Section 4, we describe our experiments with the modified client and analyze the results. In Section 5, we conclude this paper.
2 BitTorrent File Sharing System
Before describing our modifications to the BT client, it is necessary to give some more details on the BitTorrent protocol.

Data: BitTorrent achieves efficient content distribution by swarm download. The basic idea is to split a file into equal-size pieces and have clients download pieces from different peers simultaneously [1]. Each piece is further split into blocks, which are the basic transmission units in BitTorrent. Each node sends a request for each block it does not have and also advertises the blocks it has to its peers. To download a target file, a user first needs to find a torrent file which contains enough information about the file to download, especially the URL of a tracker.

Network: A tracker is a centralized machine that keeps track of all the peers participating in a swarm. Active peers report their status once in a while to the tracker and also get up-to-date information from the tracker about one another. Then these peers can download/upload blocks among themselves and share the burden of file distribution cooperatively. In the BT protocol, individual peers choose their own Peer IDs arbitrarily. Peer IDs are not used for authentication; trackers track these IDs only to facilitate peer-to-peer connections. So it is possible to use multiple peer IDs on behalf of one user without being detected by either the tracker or other peers. It should also be noted that peers cannot be differentiated by IP addresses alone in BT networks, for two reasons. One is that BT supports multiple connections from behind Network Address Translators (NATs), which means that several peers behind the same NAT can have the same global IP address. The second reason is that connections from behind a proxy are also accepted in BT. To summarize, BT does not prevent a user from launching multiple cooperative processes to download the same file. The question is whether users can take advantage of this seeming weakness.
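Since peer IDs are opaque and unauthenticated, generating N distinct identities is straightforward. A hypothetical Python fragment (the 20-byte length follows BT convention; the client prefix shown is a made-up placeholder, not from the paper):

    import os

    def make_peer_ids(n, prefix=b"-XX0001-"):
        # BT peer IDs are arbitrary 20-byte strings chosen by the client;
        # neither the tracker nor the peers authenticate them, so n distinct
        # random IDs suffice.
        return [prefix + os.urandom(20 - len(prefix)) for _ in range(n)]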
3 BT Client Implementation with Multiple Identities
In this section, we describe the changes we have made to the original BT client implementation. These changes enable the creation of N BT processes (with N different IDs) for a user to download one file cooperatively. If one process has already downloaded some pieces, then the other processes do not need to request and download the same pieces again. We can use different approaches to assign individual pieces to these processes.
When starting to download a file, a user can specify the total number of processes to be created, say N. Each process has a unique Peer ID, and can successfully register with the tracker and request a peer list. Once a process receives the peer list from the tracker, it acts as a normal BT client following the BT protocol. To other peers, these processes do not exhibit any abnormal behavior other than that they may have the same IP address. Our approach differs from the standard mechanism regarding piece selection: a process sends a new request to its peers for a piece, picking one only from its own "task list" instead of from the entire piece list. By splitting the target file into several parts and assigning different parts to different processes, we cut the actual download size for each individual process by a factor of N. Naturally, we expect that this approach may reduce the overall download time for one file, although we will soon see that some BT dynamics prevent this from happening easily.

There can also be some variations of the basic scheme. For example, rather than creating fixed "task lists" for individual processes, it is also possible for these processes to exchange information among themselves and decide which pieces to download next by skipping those pieces that have been downloaded or are being downloaded by other processes. Another variation is whether these processes can upload to their peers those pieces that have been downloaded or are being downloaded by other processes.

Limitations: It should be noted that we cannot create an arbitrarily large number of processes, for several reasons. First, the download capacity (bandwidth) is a limited resource, shared by all the processes concurrently if they run on the same machine. When the total number of processes increases, the download bandwidth for each process decreases, and thus the download time will be affected. This is more conspicuous when both upload and download traffic are multiplexed on the same physical link, where the download traffic is also affected by the upload traffic. Second, the BitTorrent protocol is built on TCP; usually all the processes have to share the same TCP buffer, and too many concurrent network connections would degrade TCP performance. Third, TCP flow control might also affect the performance when too many connections are established.

Next, we describe in detail the four different algorithms for piece assignment and sharing among concurrent processes.
3.1 Fixed-Range Piece Assignment with No Piece-Sharing Among Processes
This is the most straightforward approach to share the download task among concurrent processes. The file to be downloaded (the target file) is divided evenly into N parts. Each process is assigned a part, which is the "task list" that needs to be accomplished. When a process sends a new request to its peers, it only chooses a piece from its own "task list", instead of from the entire piece list. This approach is simple, incurs the least overhead, and the processes can run on different machines.
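The assignment itself amounts to splitting the piece index space into N contiguous ranges; a minimal sketch (the function name and interface are ours):

    def fixed_range_task_lists(num_pieces, n_procs):
        # Split pieces 0..num_pieces-1 into n_procs contiguous "task lists",
        # one per process; list sizes differ by at most one piece.
        base, extra = divmod(num_pieces, n_procs)
        lists, start = [], 0
        for i in range(n_procs):
            size = base + (1 if i < extra else 0)
            lists.append(list(range(start, start + size)))
            start += size
        return lists

Each process then requests pieces only from its own list, never from the entire piece list.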
However, in this simple approach, there is no information exchange between different processes. Each process behaves independently and there is no piece sharing among processes. One possible limitation of this approach is that each single process may have a higher probability of being choked by others, because it can only offer a subset of the file pieces that others might want.
3.2 Fixed-Range Piece Assignment with Piece-Sharing Among Processes
In the second approach, when generating a new request, the N concurrent processes still use the same piece assignment algorithm as discussed earlier. However, here the processes also try to coordinate with one another. They exchange information among themselves about what pieces have been downloaded as a whole. That is, when each process announces what pieces it has downloaded, it also includes those pieces that have been downloaded by the other processes. Thus each modified process is also able to upload pieces downloaded by other processes. This may help to reduce the chance of these modified processes being choked by others, because they can offer to upload more pieces.

There are two ways to share pieces among these different processes. One is to set up dedicated connections among the processes, over which they transfer the actual pieces they have downloaded to one another. However, this incurs more communication overhead and implementation complexity. The other way is to have all the processes run on the same machine, where each process copies the pieces it has downloaded as individual files to a common directory accessible to all. This allows each process to check this directory and also upload to its peers pieces downloaded by other processes. One caveat with this approach is that since these processes need to run on the same machine, they may be prevented from connecting to the same peer, and they also need to share the available bandwidth and the available TCP buffer. However, because of its simplicity we choose this approach in our implementation.
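A sketch of the shared-directory mechanism (the index-as-filename convention and the function names are our own assumptions, not the paper's code):

    import os, shutil

    def publish_piece(piece_idx, piece_path, shared_dir):
        # After a piece completes, copy it to the directory shared by all
        # sibling processes so that each of them can also serve it.
        shutil.copy(piece_path, os.path.join(shared_dir, str(piece_idx)))

    def pieces_to_advertise(own_pieces, shared_dir):
        # Advertise the union of locally downloaded pieces and the pieces
        # published by the sibling processes.
        shared = {int(name) for name in os.listdir(shared_dir) if name.isdigit()}
        return set(own_pieces) | shared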
3.3 Random Piece Assignment with No Piece-Sharing Among Processes
The third approach differs in its random piece selection algorithm: whenever a process is about to select a piece to request, it no longer chooses from a fixed subset of pieces as in the two aforementioned approaches. Instead, it randomly picks one among all the pieces that are not requested/downloaded by any other process. Each process needs to request the piece which the BitTorrent protocol returns (surely that piece needs to be among those that the other connecting peers can provide); otherwise it may lose the opportunity to get pieces from others, as happens if the processes stick to the fixed piece assignment discussed earlier: when they generate fewer requests to other peers, they may not make the best use of the available download bandwidth.
Obviously, it is necessary for each process to be aware of what pieces the other processes already have. Each process needs to check (via the file system) what pieces the other processes are downloading and avoid downloading the same ones. Therefore, we can use the same approach of copying every piece to a common directory, with each process checking this directory before generating a new request. Similar to the previous approach, this simplifies implementation, although it requires all processes to run on the same machine.
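Reusing the shared-directory convention sketched above, the check-then-pick step could look as follows (again our own illustration):

    import os, random

    def pick_random_piece(wanted, shared_dir):
        # Randomly pick a piece that no sibling process has downloaded or
        # claimed; claimed pieces appear as index-named files in shared_dir.
        claimed = {int(name) for name in os.listdir(shared_dir) if name.isdigit()}
        candidates = [p for p in wanted if p not in claimed]
        return random.choice(candidates) if candidates else None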
3.4 Random Piece Assignment with Piece-Sharing Among Processes
In this approach, the major difference from the previous scheme is that each process also advertises to its peers pieces downloaded by other processes. So each process can also upload to its peers pieces downloaded by other processes, which may reduce the possibility of being choked by its peers. Due to space constraints, we refrain from describing the changes made to the original BT client code. However, our source code and commentary are available on our web site [10] to benefit the research community.
4 Experiments and Discussions
To validate our methodology, we have conducted experiments in swarms created in the CS labs at UC Riverside. In all experiments, we run the modified client alongside the original BT clients to compare their performance.
4.1 Experiment Setup
Experiments in the UCR intranet are conducted in an easily controlled environment, so they can help us assess the impact of the different approaches precisely. We deploy 40 nodes to download a 30 MB file. In this swarm, there is one tracker and 1 to 6 seeders. Only one leecher runs our modified client (which actually creates multiple processes as described in Section 3); the other leechers and the seeders use the original BT client. We use BitTorrent's default parameter settings for all nodes except:
– Node degree: This parameter determines the neighborhood size, i.e., the number of concurrent connections a node will maintain. It is set to 10 in our experiments.
– Maximum upload rate: We only change it for the modified client. For normal clients and seeders, this parameter defaults to 20 Kbps.
– Number of processes created: This parameter applies to the modified BT client only. One modified BT client can create 2–5 processes to download the same file cooperatively.
4.2 Experiment Results and Analysis
Although we have experimented with four different approaches, we first show the results for the two approaches that do not do piece-sharing; later we discuss the results for the approaches that use piece-sharing.

Fixed-range piece assignment with no piece-sharing among processes. To evaluate this approach, we compare the performance of our modified client with the original BT client in terms of download completion time by varying:
1. the number of seeders in the swarm,
2. the maximum upload speed for the modified client, and
3. the number of concurrent processes in the modified client.
The results are shown in Figs. 1, 2 and 3 respectively.

Comparison of download time: Fig. 1 compares the download time between normal BT clients and the modified BT client. The latter uses two concurrent processes to download the file cooperatively. We vary the number of seeders to see how it affects the download speed. In this set of experiments, both normal BT clients and the modified BT client have the same default upload speed of 20 Kbps. The graph clearly shows that when there is only one seeder in the swarm, the modified client can hardly download faster than the original client. When the number of seeders increases to 2, there is a significant speedup for the modified client. However, when the number of seeders increases further, the speedup levels off for the modified client. The speedup for the original client is easy to explain, because more seeders translate to more upload capability offered by the swarm. For the modified client, the case is a bit more complicated. When there is only one seeder in the swarm, the effect of the rarest-first piece selection algorithm is the most conspicuous: the availability of any piece is mostly limited by the upload capability of the seeder, which is only 20 Kbps. Most of the time the two processes created by the modified client are blocked by the availability of the requested pieces, and they in fact spend much time waiting for the pieces to become available in the swarm.

Effect of varying number of seeders: When there is more than one seeder in the swarm, the modified client can achieve a much higher speedup than the original client. The reason is that the availability of the pieces is much less constrained by the upload capacity of the seeders, and each process only needs to download one half of the file, so the download time is much shorter. When more seeders are available, they do not help the modified client much, because the upload capacity of the seeders is shared by both the original clients and the modified client, and adding one seeder does not increase the capacity much. Because the modified client benefits from having more than one seeder, as discussed above, in later experiments we focus on the case where there are three seeders in the swarm.
Fig. 1. Average download time with different number of seeders in the swarm (upload rate = 20 Kbps)

Fig. 2. Average download time with different maximum upload speed in our modified BT client

Fig. 3. Average download time with different number of concurrent processes (varying from 2 to 5) in our modified BT client
Effect of varying upload speed: In Fig. 2, we show the effect of varying the upload speed of the modified client. It can be observed that the download time curve for the modified client follows three stages. When the modified client decreases its upload speed below the norm, its download time increases, even though each process needs to download only one half of the file. The reason is that these processes can offer only half of the pieces that other peers are interested in, and so they may be choked more often by other peers. When they are unwilling to upload more data to other peers, they are punished more severely; so they should not decrease their upload rate below the norm. When the modified client increases its upload rate, the benefit of needing to download only one half of the file becomes more conspicuous, and the download time decreases sharply. This also partially compensates for the deficiency of the limited availability of pieces from these processes: they can increase their upload rate to help reduce the possibility of being choked by others. However, further increasing the upload rate does not help much beyond a certain point, because by then these processes cannot necessarily achieve the maximum upload rate, as they have limited pieces to offer and other peers will not request pieces from them so often. It is also interesting to note that increasing the upload rate may negatively affect the download speed, as shown by the slight bump at 40 Kbps. The reason is that these processes have their upload and download traffic multiplexed, too much upload traffic can reduce their download speed, and seeders may further reduce their upload rate to these processes. This shows that the modified client should not cheat by decreasing its upload rate below the norm, but it should not be too generous either.

Effect of varying number of processes: Fig. 3 shows the results when we vary the number of processes created by the modified client. As usual, the number of seeders is still three. We can see that for normal clients, the download time decreases slightly as the number of processes created by the modified client increases. This is easy to explain: with more processes participating in the swarm, more upload capacity is offered, and normal clients benefit from this. However, for the modified client the benefit is not so clear-cut. Increasing the number of processes reduces the workload of individual processes; however, this also reduces the number of pieces that these processes can offer to others in exchange for higher download speed, and increases the possibility that these processes are choked by others. So it is not to the modified client's advantage to increase the number of processes arbitrarily, and creating two processes already helps a lot.

We also note that in a swarm full of free-riders who refuse to upload anything, our modified client performs much better than the other, normal clients. However, the overall download time for all the clients in the swarm increases a lot and everyone suffers. In reality, users have to behave to avoid the slowdown. So the BT protocol effectively curbs selfish behavior and encourages self-regulation among users.
Random piece assignment with no piece-sharing among processes. In this approach, each process does not stick to a fixed range of pieces to download; instead, it randomly picks a piece that is not requested by the other processes. Please note that in this approach the processes created by the modified client need to run on the same machine and have to share the available download and upload bandwidth. To evaluate this approach, we again vary the number of seeders, the maximum upload speed for the modified client, and the number of concurrent processes, and compare the download time between the modified client and the original BT client. The results are shown in Figs. 4, 5 and 6.
Fig. 4. Average download time with different number of seeders

Fig. 5. Average download time with different upload speed in our modified BT client
Comparison of download times: Evaluating Fig. 4 alongside Fig. 1, we can observe that the download time varies a lot more in this approach. The main reason is that the load for the two processes may be uneven: one process may download more pieces than the other, because once it starts downloading the rarest piece according to BT's piece selection algorithm, the other cannot request it and has to wait for the next available one.
Fig. 6. Average download time with different number of concurrent processes in our modified BT client
The other process may be choked more often by other peers. However, when there are more seeders in the swarm, the other process may still benefit a lot from the increased upload capability offered by these seeders, and the download time is reduced.

Effect of varying upload rate: Fig. 5 shows that this approach benefits more from increasing the upload rate, in comparison with Fig. 2. This can be explained as follows. Because the two processes may have uneven load, the process that has fewer pieces to offer may need to increase its upload rate, offering more data to its peers to reduce the possibility of being choked by others. Fig. 6 shows that with the default upload rate, the download time cannot be reduced by increasing the number of processes. This is easy to explain: when there are more processes, the load is distributed more unevenly among them, and usually one process shoulders most of the load while the other processes have little to download and hence cannot contribute much to the download task.

Approaches that do piece-sharing. Earlier we have shown the results for the approaches that do not do piece-sharing. We have also experimented with the approaches that do piece-sharing. However, these piece-sharing approaches yield satisfactory results only in a few cases:
1. when there are more seeders in the network, and
2. when the maximum upload rate is increased.
Otherwise the speedup is limited, for the following reason. Although these processes may upload more pieces to their peers, this does not increase the number of pieces that are available to themselves, even though their peers are willing to reciprocate with more pieces. In this case, their upload and download rates are unbalanced, and they end up uploading faster without downloading faster.

To summarize, there is no easy way to achieve a consistent speedup in all cases by using multiple identities with any of these piece selection and sharing algorithms. We do not mean to say that all possible strategies to achieve download
speedup have been exhausted. However, it should be noted that, given the highly dynamic nature of BT networks, it would be extremely challenging, if at all possible, to devise a strategy that works in most, if not all, scenarios.
5 Conclusion
In this paper, we have described a seeming weakness in BT systems, i.e., the possibility of using multiple identities to cheat, which was first identified by Hale and Patarin [2]; however, there was no implementation exploiting this weakness. We have described our modification to the official BT client implementation, which allows one BT client to create multiple processes that download the same file cooperatively. Our extensive experiments with different piece selection and sharing algorithms show that it is possible to achieve speedup in a few selected cases. However, no strategy can guarantee a speedup; in fact, increasing the number of processes may even hurt the performance of the modified client. The modified BT client helps only when the user is a free-rider, but then the overall download speed of that user will be low, and the overall network performance degrades with many such users. This shows that the BT protocol is rather resilient to exploits using multiple identities. We argue that if such exploits were easily achievable, then BT systems would have suffered a breakdown a long time ago.
References
1. Cohen, B.: Incentives Build Robustness in BitTorrent. http://www.bittorrent.com/bittorrentecon.pdf
2. Hale, D., Patarin, S.: How to cheat BitTorrent and why nobody does. Technical Report UBLCS 05/12/05 (2005)
3. Liang, J., Kumar, R., Xi, Y., Ross, K.W.: Pollution in P2P File Sharing Systems. In: IEEE INFOCOM (2005)
4. GNUtella. http://en.wikipedia.org/wiki/Gnutella
5. EDonkey. http://en.wikipedia.org/wiki/EDonkey2000
6. Karagiannis, T., Broido, A., Brownlee, N., Faloutsos, M.: Is P2P dying or just hiding? In: Globecom, Dallas, TX, USA (2004)
7. Mennecke, T.: BitTorrent Remains Powerhouse Network. In: Slyck News (2005)
8. Buragohain, C., Agrawal, D., Suri, S.: A Game-Theoretic Framework for Incentives in P2P Systems. In: International Conference on Peer-to-Peer Computing (2003)
9. Golle, P., Leyton-Brown, K., Mironov, I., Lillibridge, M.: Incentives For Sharing in Peer-to-Peer Networks. In: Proceedings of the 3rd ACM Conference on Electronic Commerce (2001)
10. Sun, J.: Multiple Identities in BitTorrent Networks. http://www.cs.ucr.edu/~jsun/BitTorrent.html
Graph Based Modeling of P2P Streaming Systems
Damiano Carra¹, Renato Lo Cigno¹, and Ernst W. Biersack²
¹ Dip. di Informatica e Telecomunicazioni, Università di Trento, Trento, Italy
{carra,locigno}@dit.unitn.it
² Institut EURECOM, Sophia Antipolis, France
[email protected]
Abstract. This paper addresses the study of fundamental properties of stream-based content distribution services. We assume the presence of an overlay network with limited connectivity degree, and we develop a mathematical model that captures the essential properties of overlay-based streaming protocols and systems. The methodology is based on graph theory and models the streaming system as a stochastic process, whose characteristics are related to the streaming protocol. The model can capture the transient behavior of the distribution graphs, i.e., the evolution of the structure over time. Results show that mesh-based architectures are able to provide bounds on the receiving delay and to keep rate fluctuations due to system dynamics very low.
1 Introduction
The recent success of streaming based on peer-to-peer (P2P) applications seems to achieve what traditional streaming and multicasting applications have never achieved: distributed video-on-demand and live broadcasting on the Internet. The first tree-based systems [1][2] coexist now with more advanced mesh-based systems [3][4][5] that are more resilient to dynamic node behavior and better suited to the intrinsic characteristics of the Internet. In spite of the success of P2P streaming, the fundamental properties of such systems have not been investigated in depth. Many proposals use heuristic methods to improve performance, but these heuristics are verified only a posteriori, and protocol parameters are tuned according to these results. Performance analysis of overlay streaming systems has received attention only recently. Most of the analytical works focus on tree-based structures (e.g., [6]), and meshes are studied only through simulation [5]. To the best of our knowledge, no study analyzes the behavior of the streaming distribution system as a function of the topological properties of the graph that is built by the P2P application. Many studies based on graph theory focus on the steady state properties of networks
This work was partially supported in Trento by the Italian MIUR PRIN project PROFILES (http://profiles.dit.unitn.it).
[7][8], but the way the graph structure is built is not constrained by protocol rules. In this work we develop a mathematical model based on graph theory that can be used to analyze fundamental performance issues of overlay streaming services. We model such systems with a high-level abstraction that allows the study of fundamental behavior under different conditions. Our model can analyze the dynamics of the graphs, i.e., the evolution of the structure over time. It can assess the impact of different protocol choices and of bandwidth heterogeneity on the streaming process, and it gives enough insight into the problem to formulate improved streaming strategies. The results obtained by the systematic study of different configurations and scenarios show that performance is mainly influenced by the policies related to the content format. Mesh-based architectures are very robust to failures, even in the presence of high churn, and the delay experienced by nodes is bounded.
2 Mesh-Based Overlay Streaming Systems
We do not consider a specific system, but identify common basic characteristics of recent proposals [4][5]. Consider an overlay network built by a P2P application. Once the overlay layer is built, paths between the source and the destinations are created according to the rules of the streaming protocol. At each hop, nodes both receive the stream and contribute by uploading it to other nodes, i.e., they work as content relays. Since nodes in such networks can appear or disappear frequently, the set of nodes from which a node i is downloading changes over time.
2.1 System Parameters
The content is distributed using R different stripes. Each stripe contains part of the stream (coded, for instance, using MDC techniques [9]). A node needs R′ < R out of the R stripes to achieve a target quality, while the remaining R − R′ stripes contain redundant information. The evolution of the network is subject to two main events: node arrivals and node departures. We assume that arrivals and departures are exponentially distributed according to rates λ(t) and μ(t) respectively. The dependence on time makes the model more flexible: for instance, we can describe different arrival patterns, such as flash crowds or smoother arrivals. Let Tstr be the duration of the stream and N the mean number of nodes receiving the stream at steady state. A fraction of the nodes joins the stream at time zero, and there is an arrival interval during which λ(t) > μ(t) until steady state is reached. Fig. 1 shows a sample arrival pattern. The departure rate μ(t) is the inverse of the mean time spent in the system (sojourn time), and λ(t) at steady state compensates for departures. For a given time interval T, the ratio between the cumulative number of departed nodes and the mean number of active nodes during T is defined as the churn of the system.
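As an aside, a non-homogeneous Poisson arrival process with rate λ(t) can be sampled by thinning; the sketch below is our own illustration and assumes a known upper bound lam_max on the rate:

    import math, random

    def sample_arrivals(lam, lam_max, t_end):
        # Lewis-Shedler thinning: propose candidate arrivals at the constant
        # rate lam_max, then accept each with probability lam(t)/lam_max.
        t, times = 0.0, []
        while True:
            t += -math.log(1.0 - random.random()) / lam_max
            if t > t_end:
                return times
            if random.random() < lam(t) / lam_max:
                times.append(t)

For instance, a decaying arrival burst could be modeled with lam = lambda t: 128 * math.exp(-t) and lam_max = 128.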
Fig. 1. Sample arrival pattern
A 100% churn means that during T the number of departed nodes is equal to the mean number of nodes in the system, i.e., there is a stochastic complete change of the nodes during T. Nodes are divided into different classes according to their bandwidth. Each class j has an upload bandwidth bu(j) and a download bandwidth bd(j), which can be either symmetric, asymmetric or correlated, e.g., bu(j) + bd(j) constant, as in a shared-medium based access. The bandwidths are random variables described by a probability density function (pdf) that is known (e.g., derived from measurement studies). The rate of the streaming is rstr. We suppose that all nodes have a download bandwidth at least equal to the streaming rate. Each stripe has a rate equal to rstr/R′, and we assume that the server is able to upload all the R stripes, i.e., it has a bandwidth greater than R·rstr/R′. Each node has a constraint on the maximum and minimum number of active uploads that limits the possible outdegree of the node: kmax is the maximum and kmin is the minimum outdegree. Each node has B neighbors. Among its neighbors the node selects its parent nodes, i.e., those from which it downloads. R′ parents are called active; the remaining ones are called standby, since they are used as a backup in case of active parent failure.
2.2 Join, Update and Leave Procedures
Nodes belonging to the initial set start building a diffusion tree for each stripe. The number of nodes in each diffusion tree depends on the characteristics of the nodes involved, i.e., their access bandwidths, which determine the possible children. Each node is involved in multiple diffusion trees. When a new node arrives, it randomly chooses an active node as its first contact, and then builds its neighbor list with the help of the contacted node. From the neighbor list, the node selects its parents and attaches to them. Nodes periodically search for new connections among their neighbors with rate λup in order to increase their indegree. When a node leaves, all of its inbound and outbound connections are canceled. Orphan nodes try to replace the disappeared parent. If the disappeared parent
was in the standby set, the node does not react (it simply loses a backup parent). If the disappeared parent was in the active set, the node tries to switch the state of a standby parent, i.e., it starts downloading from the standby parent if it has available bandwidth. If the node has no backup parents, there is a temporary loss of quality that depends on the time necessary to search for a new parent.
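The leave-handling rule just described can be summarized by the following sketch, assuming a hypothetical node object with active and standby parent sets (all names are ours):

    def on_parent_left(node, parent):
        # Standby parent gone: only a backup is lost, no reaction needed.
        if parent in node.standby_parents:
            node.standby_parents.remove(parent)
            return
        # Active parent gone: try to promote a standby parent that has
        # spare upload bandwidth; otherwise search for a new parent,
        # with a possible temporary loss of quality.
        node.active_parents.remove(parent)
        for cand in list(node.standby_parents):
            if cand.has_spare_upload_bandwidth():
                node.standby_parents.remove(cand)
                node.active_parents.append(cand)
                return
        node.search_new_parent()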
3 Mathematical Background
The network of contacts among the users of a P2P network can be modeled as a graph, where nodes represent the users and edges the neighborhood relationship. When users start exchanging data (in our case, they start receiving and distributing the stream) they use a subset of the available outgoing/incoming edges. The focus of our analysis is the characteristics of the distribution graph, i.e., the subgraph of the overlay graph whose edges are the connections effectively used by the nodes. In general, the distribution graph is time varying, i.e., nodes and edges can appear or disappear in time. The evolution of the graph can be seen as a stochastic process with Markovian properties, since the graph at time t + dt depends only on the graph at time t and the events occurring during dt. The distribution graphs can be described through their structural characteristics. We consider two main distributions: the degree distribution and the delay distribution [7]. The degree distribution p_s(k, t) is the probability that node s has k connections at time t. (It is possible to distinguish between indegree and outdegree distributions, p_s(k_i, t) and p_s(k_o, t) respectively, with k_i + k_o = k.) Knowing the degree distribution of each node in the graph, we can derive the total degree distribution

P(k, t) = \frac{1}{N(t)} \sum_{s=1}^{N(t)} p_s(k, t)    (1)

where N(t) is the number of nodes attached to the streaming at time t. The delay distribution represents the distance of the node from the source of the stream following the shortest path. We define p_s(ℓ, t) as the probability that node s is ℓ steps away from the source at time t. The total delay distribution can be derived in a similar way as done for Eq. (1). Hereinafter, for notational simplicity, we omit the specification of node s.
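Eq. (1) is simply an average of the per-node distributions. For instance, given a matrix whose row s holds p_s(k, t) for a fixed t, the total distribution is obtained as below (a sketch assuming numpy; the same computation applies to the delay distribution p_s(ℓ, t)):

    import numpy as np

    def total_distribution(p_s):
        # p_s: array of shape (N, K+1); row s is node s's distribution at
        # time t. Eq. (1) is the average of the rows.
        return p_s.mean(axis=0)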
3.1 Master Equations and Rate Equations
The analysis of the graph can be done through the study of the properties of the degree and delay distributions. For a Markov process, the temporal behavior can be described using the differential form of the Chapman-Kolmogorov equations, known as Master Equations (MEs) [7].
Considering a node s, the variation of the probability to find the value α (α is the degree or the delay) at time t can be expressed as

\frac{\partial}{\partial t} p(\alpha, t) = \sum_{\beta} w_{\beta,\alpha}(t) \, p(\beta, t)    (2)

where w_{β,α}(t) is the transition rate from the value β to the value α at time t. Transition rates are closely related to the streaming protocol policies and behavior. The general formulation of the MEs must be specialized for our problem, i.e., we have to define all the possible transitions. The MEs fully determine the evolution in time of the stochastic system for any node s. It is also useful to have the equations for the average value (degree k or delay ℓ). The corresponding equations are called Rate Equations (REs):

\frac{\partial}{\partial t} \langle \alpha \rangle = \frac{\partial}{\partial t} \sum_{\alpha} \alpha \, p(\alpha, t)    (3)
The REs describe the average quantities and express the behavior of the system deterministically: indeed, the REs are a set of differential equations that describe the evolution over time of the mean properties of the system. Figure 2 shows the relationship between the results of the MEs and the result of the REs for a given observed random variable (e.g., node degree or delay). The MEs clearly provide great insight into the system (at the cost of more resources needed to find the solution), since they fully characterize the properties over time. The REs give a mean value that is equivalent to the fluid approximation of the system.
Fig. 2. Results of the Master Equations and the Rate Equations
The methodology we propose is able to provide the solution of the MEs, and hence the complete system characterization. When the complexity of the system increases and the resources required to solve it become prohibitive, we can focus on the REs and obtain an analysis of the mean value. Thus, the proposed method offers great flexibility in deciding the desired level of detail in the system analysis. Space forbids the detailed description of the transition rates; the interested reader can refer to [12].
4 Monte Carlo Integration of the MEs
The set of MEs that describe the distribution process cannot in general be solved in closed form. However, the structure of the transition matrix that describes the stochastic process is extremely well suited for an efficient numerical solution based on Monte Carlo techniques [10][11], i.e., for a solution based on process simulation. In the physical and chemical sciences this technique is often called the Stochastic Simulation Algorithm or Gillespie Algorithm, but we prefer to stick to the term "Monte Carlo" normally used in computer science. Monte Carlo integration is basically a random walk in the state space of the process. The convenience of the methodology is given by the fact that it is very simple to build a random walk following the graph building rules given in Sect. 2, and the same rules define a transition matrix with good local properties: given a state, there are few states into which the process can evolve and, from the reward point of view, they are similar to one another, so that there are no "diverging paths" that may lead to instabilities in the solution. Samples obtained via Monte Carlo techniques are i.i.d. by construction, so that confidence intervals can be estimated on the whole probability distribution.

The key strength of the methodology, however, is not the efficiency of the numerical solution: indeed, this method provides great flexibility in the system description and specification. The realization of the stochastic process can be as close as desired to a real implementation of the protocol/system. On the one hand, this is like a generic simulation approach, but, being based on formal definitions, it avoids the risk of incomplete or bugged specifications; on the other hand, assumptions made in fluid models can be avoided, since we can describe the system behavior in full detail.
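The random walk can be organized as a standard Gillespie-style step: draw an exponential waiting time from the total transition rate, then select one transition with probability proportional to its rate. The following generic sketch is our own formulation, not the authors' code:

    import math, random

    def monte_carlo_step(state, transitions, t):
        # transitions(state, t) returns a list of (rate, apply_fn) pairs
        # describing all transitions enabled in the current state.
        pairs = transitions(state, t)
        total = sum(rate for rate, _ in pairs)
        if total == 0:
            return state, float("inf")                  # absorbing state
        dt = -math.log(1.0 - random.random()) / total   # exponential holding time
        r, acc = random.random() * total, 0.0
        for rate, apply_fn in pairs:
            acc += rate
            if r <= acc:
                return apply_fn(state), t + dt
        return state, t + dt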
4.1 Comparison with Other Methodologies
We consider a very simple case in order to show the differences with other modeling approaches. Consider the case where a node updates its indegree only during update events. We assume infinite upload and download bandwidths and no constraints on the maximum outdegree. If k_i(t) is the indegree at time t, at every update event the node will add R − k_i(t) parents. In fact, under these assumptions the probability to find all the necessary parents to obtain all the stripes is 1, since there is always a node that is able to provide a connection. The differential equation that describes the evolution can be written as

\frac{d}{dt} k_i(t) = \lambda_{up} \left( R - k_i(t) \right) - k_i(t) \, \mu    (4)

The second term accounts for the fact that each of the k_i(t) parents can leave with rate μ. Considering the initial condition k_i(0) = 1 (we suppose that all nodes are present at the beginning with exactly one parent each), the solution of (4) is

k_i(t) = \frac{\lambda_{up} R}{\lambda_{up} + \mu} \left( 1 - e^{-(\lambda_{up} + \mu) t} \right) + e^{-(\lambda_{up} + \mu) t}    (5)
Fig. 3. Solution of MEs and REs (mean number of parents over time, for λup = 5/Tstr, 3/Tstr, and 2/Tstr)
In Fig. 3 we compare the analytical solution of this very simple case with the solution of the Rate Equations (3) derived from our model. We set R = 10 stripes, μ = 1/Tstr and λup = 5/Tstr, 3/Tstr and 2/Tstr. We normalize the time with respect to Tstr. The numerical solution closely follows the analytical one, but the results obtained from our model give more insight: in fact, we can observe how the full indegree distribution changes over time.
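The agreement between (4) and (5) is easy to check numerically; a small sketch (forward-Euler integration, with time normalized to Tstr so that, e.g., λup = 5 and μ = 1):

    import math

    def ki_analytic(t, R=10, lam_up=5.0, mu=1.0):
        # Closed form (5) with k_i(0) = 1
        s = lam_up + mu
        return lam_up * R / s * (1.0 - math.exp(-s * t)) + math.exp(-s * t)

    def ki_euler(t_end, R=10, lam_up=5.0, mu=1.0, dt=1e-4):
        # Forward-Euler integration of the rate equation (4)
        k, t = 1.0, 0.0
        while t < t_end:
            k += dt * (lam_up * (R - k) - mu * k)
            t += dt
        return k

For example, ki_euler(1.0) and ki_analytic(1.0) agree to within the integration step error.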
Fig. 4. Indegree distribution at time Tstr/2 obtained from the solution of the MEs: (a) λup = 5/Tstr, (b) λup = 3/Tstr, (c) λup = 2/Tstr
Fig. 4 shows the distribution of the number of parents (indegree) at time Tstr/2 for different values of λup. Notice that there is a non-null probability that nodes remain without parents, thus being entirely disconnected from the distribution process: a phenomenon that a fluid approach analyzing only the means disregards, although in most cases it is the most interesting result.
5 Application of the Methodology
We use a configuration with N = 10^4 nodes (we have also checked configurations with 10^5 nodes, obtaining similar results). The input bandwidth distribution is composed of three classes: slow nodes, medium nodes and fast nodes, with symmetric bandwidth respectively equal to rstr, 2rstr and 5rstr and probabilities
equal to 0.2, 0.4 and 0.4. The streaming rate is divided into R′ stripes and the source generates R stripes; results are obtained for R = 10 and R′ = 2, 4, 6, 8. We consider an observation time equal to Tstr. We consider two arrival patterns, with the initial number of nodes equal to 0.1N and 0.5N respectively; the remaining nodes arrive within Tstr/5. The mean sojourn time is set to Tstr, 2Tstr, and 5Tstr. Each node can have up to 60 neighbors in the overlay graph (the actual number of neighbors depends on the dynamics of the nodes); among these relationships, while uploading, a node can have a maximum outdegree equal to 14. The stream is chunk based (e.g., a few video frames or a slice of a few tens of milliseconds of sound) and we normalize the dimension of the chunk, U, such that U/rstr = 1 unit. A node becomes eligible for uploading the content after a delay equal to the download time of a single chunk, so the delay can be considered as the "distance" (relative delay) of the node from the source of the stream. The length of the stream, Tstr, is set to 10000 U/rstr = 10000 units. Besides degree and delay properties, we consider also the quality of the mesh: when a node remains orphan of an active parent, it switches to one of its standby parents. If they have enough bandwidth to help the node, the node suffers no service disruption; if no standby parent is able to help the node, it must search for a new parent, with a possible service disruption. We measure the quality of the mesh as the percentage of nodes that successfully switch to a standby parent. Due to space constraints, we report only some sample results; a more comprehensive set can be found in [12].
5.1 Analysis of the Indegree
Analysing the indegree we can examine whether the subdivision into stripes helps the distribution process or not. On the one hand, more stripes means that each stripe has a lower rate, so the loss of a single stripe has less impact. On the other hand, each node must maintain more active connections, and the probability that one of these connections fails increases. Figure 5(a) shows the indegree distribution of the nodes at time t = Tstr, computed with Eq. (1). In this case the initial number of nodes is 0.1N and the mean sojourn time is Tstr. The distribution tends to peak around R as R′ tends to R. This means that all the nodes in the network are able to receive the full quality, since the degree is always greater than or equal to R′. Note that with R′ = 8 there is a fraction of the nodes with exactly 8 parents: this means that, in case one parent disappears, the quality received by the node may be temporarily affected. The average temporal behavior of the indegree can be analyzed looking at the results of the rate equations (Fig. 5(b)) computed with Eq. (3). A stable value is reached after a few time units: this means that the structure, even in the presence of high churn, is able to maintain a high quality of the stream.
Fig. 5. Solution of the MEs for the indegree ki(t): (a) probability distribution of the indegree at Tstr, (b) evolution of the mean number of parents in time (10^3 units), for R′ = 2, 4, 6, 8
5.2 Analysis of the Delay
The delay, expressed in time units, represents the number of hops from the source. We plot the Complementary Cumulative Distribution Function (CCDF, defined as 1−CDF) in order to study the tail of the distributions. We consider R′ = 4 and we set different sojourn times (1/μ). Fig. 6(a) shows the case of an initial number of nodes equal to 0.5N: the tail of the distribution is not affected by the different values of μ. In Fig. 6(b) we show the impact of R′ on the delay. Increasing the number of stripes has a side effect: since each node needs all the R′ stripes to correctly play the stream, the absolute delay is given by the maximum delay among the stripes. By increasing the number of stripes, the probability of experiencing higher delays increases, since we have to compute the maximum among an increased number of stripes.
5.3 Analysis of the Quality of the Mesh
Aggregate results for the indegree and the delay are not able to capture all the aspects related to the quality of the stream received by a generic node i. In Table 1 we summarize other results that can be obtained from the solution of the MEs. The value of the churn is computed according to the arrival pattern: arrivals and departures are Poisson processes with rate λ(t) and μ(t) respectively, so we can calculate the cumulative number of nodes that have disappeared by time Tstr and consequently the value of the churn. During a node's lifetime, there is a non-null probability that all its parents leave and node i is not able to find other parents. This situation causes an error and node i leaves the system.
Fig. 6. CCDF of the delay (initial number of nodes: 0.5N): (a) R′ = 4, with sojourn times Tstr, 2Tstr and 5Tstr; (b) different R′ = 2, 4, 6, 8

Table 1. Other statistics

R′   1/μ     % Churn   % Error   % Switch
4    Tstr    98.33%    0.11%     99.63%
4    2Tstr   49.09%    0.05%     99.63%
4    5Tstr   19.77%    0.03%     99.63%
8    Tstr    98.24%    0.10%     94.38%
8    2Tstr   49.15%    0.05%     96.76%
8    5Tstr   19.74%    0.02%     97.99%
From the probability p(0, t) of having indegree 0 at any instant t, integrating over time we can compute the probability that a node leaves with an error message. This result is reported in the % Error column of Table 1 for different values of R′. With high churn, one node per thousand is forced out of the system; in some contexts this value may be unacceptable. Another interesting result is the probability of switching to a standby parent if an active parent leaves. This is given by p(ki, t) with ki < R′; integrating over time we are able to compute the switch probability (see the last column of Table 1). With a small R′, the percentage of successful switches is very high, i.e., the received stream is stable. On the other hand, with R′ close to R and high churn, if the number of parents of a node drops below R′, the probability of switching to a standby parent is 94%. This means that the quality temporarily decreases, as expected looking at the degree distribution (Fig. 5(a)).
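To make the % Error computation concrete, a sketch of the integration step is given below; the sample grid and the p(0, t) values are hypothetical stand-ins for the ME solution.

def time_integral(ts, values):
    """Trapezoidal integral of a sampled function of time."""
    return sum((t2 - t1) * (v1 + v2) / 2.0
               for (t1, v1), (t2, v2) in zip(zip(ts, values),
                                             zip(ts[1:], values[1:])))

# Hypothetical samples of p(0, t) taken from the ME solution over [0, Tstr]:
T_str = 10000.0
ts = [0.0, 2500.0, 5000.0, 7500.0, 10000.0]
p0 = [0.0, 0.0012, 0.0011, 0.0010, 0.0011]

# Integrating p(0, t) over the observation window, normalized by Tstr,
# approximates the probability that a node leaves with an error message.
error_prob = time_integral(ts, p0) / T_str
print(f"% Error = {100 * error_prob:.2f}%")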
6 Discussion and Conclusions
The contribution of this paper is the introduction of a novel methodology for the high-level representation of overlay streaming systems. Based on the use of Master Equations, the solution of the model yields the entire probability
distribution of the results (not only the mean value), as well as the temporal (transient) dynamics. We have modeled recently proposed systems, obtaining novel insights into the dynamics of self-organizing systems for streaming distribution. In the following we summarize the main findings that can help in designing better P2P streaming systems. Redundant stripes play a fundamental role in obtaining good performance. Recent proposals [4] consider only a small fraction of redundant information, so, in case of node departures, the stability of the streaming is affected. Maintaining standby parents, a common solution used by such systems, may not alleviate the problem, since a node can have a standby set of parents with the same stripes as the (remaining) active parents. The delay is influenced by the stripe 'size': the greater R (smaller stripes), the higher the delay. The number of necessary stripes R′ should be kept low to keep the delay low. The delay remains low independently of the dynamics of the network, which is a counter-intuitive result. Under medium to high churn, nodes may become disconnected from the stream, interrupting the service (notice that churn instead does not affect the delay). Only stable nodes can prevent disconnections. This performance measure cannot be computed with any methodology that only yields averages, and may be difficult to observe with standard simulations, because even small disconnection rates are unacceptable, and observing them with sufficient reliability requires simulating thousands of nodes for hours.
References

1. Pendarakis, D., Shi, S., Verma, D., Waldvogel, M.: ALMI: An Application Level Multicast Infrastructure. In Proc. of the 3rd Usenix Symposium on Internet Technologies & Systems (USITS) (2001)
2. Banerjee, S., Bhattacharjee, B., Kommareddy, C.: Scalable Application Layer Multicast. In Proc. ACM SIGCOMM (2002)
3. Chu, Y.-H., Rao, S.G., Zhang, H.: A Case for End System Multicast. In Proc. of ACM SIGMETRICS (2000)
4. Zhang, X., Liu, J., Li, B., Yum, T.S.P.: DONet/CoolStreaming: A Data-driven Overlay Network for Live Media Streaming. In Proc. IEEE INFOCOM (2005)
5. Magharei, N., Rejaie, R.: Understanding Mesh-based Peer-to-Peer Streaming. In Proc. NOSSDAV (2006)
6. Baccelli, F., Chaintreau, A., Liu, Z., Riabov, A., Sahu, S.: Scalability of Reliable Group Communication Using Overlays. In Proc. IEEE INFOCOM (2004)
7. Dorogovtsev, S.N., Mendes, J.F.F.: Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford (2003)
8. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In Proc. 11th ACM SIGKDD (2005)
9. Goyal, V.K.: Multiple Description Coding: Compression Meets the Network. IEEE Signal Processing Magazine, Vol. 18, Issue 5, pp. 74-93 (2001)
10. Honerkamp, J.: Stochastic Dynamical Systems: Concepts, Numerical Methods, Data Analysis. VCH, New York (1994)
11. Gillespie, D.T.: Exact Stochastic Simulation of Coupled Chemical Reactions. Journal of Physical Chemistry, Vol. 81, Issue 25, pp. 2340-2361 (1977)
12. Carra, D., Lo Cigno, R., Biersack, E.W.: Graph Properties of Mesh-based Overlay Streaming Systems. TR DIT-06-043, Univ. of Trento (2006). Available: http://www.dit.unitn.it/locigno/preprints/DIT-06-043.pdf
Modeling Seed Scheduling Strategies in BitTorrent

Pietro Michiardi¹, Krishna Ramachandran², and Biplab Sikdar²

¹ Institut Eurecom, [email protected]
² Rensselaer Polytechnic Institute, {ramak,sikdab}@rpi.edu
Abstract. BitTorrent has gained momentum in recent years as an effective means of distributing digital content in the Internet. Despite the remarkable scalability and efficiency properties that characterize BitTorrent in the long haul, several studies identify the source of the content as the main culprit for the poor performance of the system in a transient regime, where user requests for popular content swamp the source, and in the case of high node churn. Our work models the scheduling decisions made at the source (called the seed) for selecting which pieces of the content to inject into the system through a stochastic optimization process, and provides an analytical framework to compare different strategies. We define a new piece selection algorithm (called proportional fair scheduling, PFS) that incorporates the seed's limited vision of the system dynamics in terms of user requests so as to ensure a better content distribution among the users. We prove convergence of PFS and compare its short and long term performance against the mainline BitTorrent implementation and the "smart seed" technique recently introduced in [9]. Our results show that PFS induces substantial improvements in both system performance, by decreasing the download time at the users, and system robustness against peer dynamics, by quickly reacting to sudden changes in the request patterns of the users.
1 Introduction

Peer-to-peer (P2P) networks provide a paradigm shift from the traditional client-server model of most networking applications by allowing all users to act as both clients and servers. The primary use of such networks so far has been to swap media files within a local network or over the Internet as a whole. Among current solutions deployed in the Internet, BitTorrent (BT) has received a lot of attention from the research community because of its scalability properties and its ability to handle the so-called flash crowd scenario, a transient phase characterized by a sudden burst of concurrent requests for a popular content. However, recent results [1, 2, 3, 9, 11] have revealed some inefficiencies of BT that translate into a prolonged transient phase, indicating the source of the content (called the seed) as the main cause of a disproportionate distribution of the content among the downloaders. In this paper, we motivate the need to incorporate intelligence into scheduling file pieces at the seed and develop an analytic framework wherein the impact of the chosen strategies can be studied for a BT-like P2P network. We propose a novel scheduling policy called Proportional Fair Scheduling (PFS) that improves the content distribution process based both on past scheduling decisions and
on the actual distribution of content requests as seen by the seed. Using the proposed analytical framework we compare our scheduling policy with the one used in the mainline BT implementation and with the best known scheduling improvement, called "smart seed" [9]. Through numerical evaluation we show that PFS outperforms previous policies in the short term. For the long term analysis we built a BT simulator and show that our scheduling algorithm achieves a fair content distribution and reduces the time needed for the seed to inject the content into the system. To summarize, our contributions in the current work can be stated as follows:

– Present an analytic framework wherein different scheduling policies can be modeled and their behavior analyzed.
– Propose a new algorithm, called Proportional Fair Scheduling (PFS), for piece distribution that performs better than the currently proposed scheduling modification for the seed.

1.1 BitTorrent Overview

Before proceeding further, we provide a brief system overview. BT is a P2P application that replicates the content by leveraging the upload bandwidth of the peers involved in the download process. Each unique content in the system is associated with a .torrent file, and is independent of the remaining torrents in the system. What this implies is that a peer's view of the BT system is confined to a subset, termed the peer set, of all the hosts associated with a specific torrent. Peers wishing to download a particular content obtain the corresponding .torrent file from a web server and use a centralized entity called the tracker to collect a random subset of hosts currently active in the torrent. Peers involved in a torrent cooperate to replicate the file among each other using swarming techniques. BitTorrent achieves scalable and efficient content replication by employing the choke and rarest first algorithms. The former is used for peer selection, i.e., which peer to upload to, while the latter is used for selecting the file part scheduled to be transferred. Finally, a peer in BitTorrent exists in one of two states: the seed state, wherein it has the entire content, or the leecher state, wherein it is in the process of downloading the file. Note that we have limited our description to details relevant to the current work and have glossed over several technicalities of the BT protocol, which may be found in [7].

The rest of the paper is organized as follows: in Section 2 we survey related literature, while in Section 3 we discuss the rationale and motivations of our work. In Section 4 we present our analytical model that emulates the various content scheduling strategies for a seed; Section 4.1 provides an analytical dissection and addresses issues such as stability and convergence of the scheduling strategies. We present our results in Section 5 and draw relevant inferences from them, and finally summarize the work in Section 6.
2 Related Work

In recent times BitTorrent has received substantial interest from the research community, with several modeling as well as simulation studies aiming at improving its performance. Mathematical models for BT are presented in [3, 4, 5]. In [4] a fluid model is used to characterize the performance of BitTorrent-like networks in terms of the average
number of downloads and download times. The authors in [5] propose to improve upon the aforementioned modeling work using a stochastic differential equation approach, by incorporating more realistic BT network behavior in their study. A Markovian model of a BT network was studied in [3], wherein the authors propose a novel peer selection strategy to improve download times. Along similar lines is another modeling work, [10], wherein a branching process based Markovian model was formulated to study BitTorrent-like networks. Simulation based studies are the focus of the works presented in [1, 2, 6, 8, 9]. In [1], the authors investigate the efficacy of the rarest first and the choke algorithms, while [2] documents the impact of various system parameters on the network's performance. Along similar lines, [8] presents a dissection of the performance of the mechanisms and algorithms used by BT over a five month period. In [6], the authors make the case for a network coding scheme to improve content replication, while in [9], the authors study the performance of BT by employing metrics such as file download time, link utilization and fairness. A common feature shared by the literature surveyed thus far is the attempt at modeling the BT system in its entirety. As a result, not all facets pertaining to efficient content distribution are explored. For instance, the first step in this direction is to ensure that the initial seed is able to inject the entire content among the leechers as early as possible, and this calls for specialized scheduling algorithms. Unfortunately, with a holistic approach, this is difficult to accomplish. In the current work we restrict our attention to the seeds, and study the impact of scheduling decisions at their end on the effectiveness of content distribution in the system. This is elaborated further in the following section.
3 Rationale and Motivation

Typically, when content first appears in a BT network, it is stored at a single host, i.e., there is a single seed. From here on, the lifetime of a torrent can be broadly classified into three stages: the initial flash crowd or transient phase, where the seed experiences a huge volume of concurrent requests for the content, followed by the steady state phase, where the system dynamics (especially the arrival of requests for content) are regular, and finally the "dying out" phase, which marks the point where a substantial portion of the leechers complete downloading the content and leave the system. Note that one stage need not necessarily succeed the other: for instance, a torrent could witness multiple iterations of the flash crowd and steady state phases before eventually dying out. The motivation for the current work stems from the findings of various simulation studies [2, 9, 11] revealing an inefficiency in the performance of the protocol during the flash crowd phase of a torrent, arising from a disproportionate distribution of content among the leechers. It was found that in the flash crowd scenario, the distribution from the seed often becomes a bottleneck in the replication process. In such a scenario, a lack of intelligence during the upload process at the seed could result in some of the pieces not being replicated at all. This phenomenon is termed starvation and can adversely impact the torrent's performance in the following manner: consider the scenario where after a certain time (say t), the seed decides to go offline. At such time, if there are certain parts of the file that have not yet been replicated among any of the leechers, then the
torrent would eventually die out, since none of the leechers would be able to complete the download. Even otherwise, a disproportionate distribution of the parts would result in a prolonged flash crowd scenario, since the leechers have nowhere else to request the parts from: in other words, the seed and the leechers hosting the rarer parts would be swamped with a huge volume of upload requests. This problem is further magnified if the seed is bandwidth constrained. Thus, an improved distribution of content at the seed's end would serve to improve the performance of the torrent by decreasing the download time of the leechers, since there is a bigger pool of leechers with the same piece. A relevant doubt at this stage would be to question the rationale behind distinguishing between scheduling decisions at a seed and those at a leecher. In other words, why would a common scheduling algorithm not work for both? The answer to this lies in the difference between the view of the torrent as seen by a leecher and by a seed. While the leecher has complete information on the part distribution among the peers in its peer set, this knowledge is hidden from the seeds. Thus, the lack of a global snapshot constrains a seed to base scheduling decisions on its own past history in order to improve content distribution, and hence the motivation behind the current work. The endeavor in the current work is to arrive at a mathematical framework generic enough to facilitate the performance quantification of various scheduling strategies that could be implemented at the seed. In this paper we try to address the following problem: how best can a seed incorporate the limited view of the BT system into its scheduling decisions so as to ensure better content distribution among the downloaders? To this end, as a part of their simulation study of BT, the authors in [9] propose the local rarest first (LRF) policy, termed the "smart seed" scheduling policy, as an improvement over the current scheduling scheme. However, the proposed scheme is not receptive to the system dynamics, i.e., leechers entering and leaving the torrent, and further, the optimality of such a strategy is not guaranteed. In this paper, we provide a theoretical grounding for the problem through a framework based on stochastic approximation algorithms. In particular, we compare the performance of our scheduling strategy, the proportional fairness scheme (PFS), with the currently proposed modification, local rarest first (LRF), and the existing policy, random scheduling (RS), currently used in the mainline BT client.
4 Analytical Framework

In this section, we present our analytic framework based on stochastic approximation to study the performance of piece scheduling decisions made at the seed. While the framework is generic in nature and applicable to study a large class of scheduling policies, for illustrative purposes we focus our discussion on characterizing the proportional fairness (PFS) and the LRF schemes. In the current section we present a detailed overview of incorporating the PFS scheme into the framework, while in Section 4.2 we outline the modeling of the LRF scheme. The gist of the two schemes is presented below:

– LRF: In this policy users are served on a first-come-first-served basis. Leechers request from the seed a set of parts (RB) and the seed uploads the least served piece amongst RB.
– Proportional Fairness Scheme (PFS): In this scheme, the seed takes into account the requests coming in for each part and the corresponding past throughput, and uploads the piece with the maximum ratio of the two.

Note that the existing scheduling algorithm (RS) is purely random in nature, hence we do not model it in the current work. Before proceeding with the description of the model, we outline our assumptions. The content to be replicated is divided into p equal parts and is stored at a single seed. The seed is modeled by a single server queue with no buffer space. Time is slotted in intervals, with the granularity of each round chosen to accommodate the transfer of a single file part. For the sake of simplicity, in the current work we allow peers to upload to one other randomly selected peer, as opposed to the fully fledged implementation wherein 4 peers are selected using the choke algorithm. In particular, the seed serves only one part in a round, with the decision on the piece to be uploaded in the next round made based on the requests that arrive during the current time slot. The peer satisfying the scheduling criterion is served in the next slot while the rest of the requests are dropped. The above assumptions are a reasonable mapping to a bandwidth constrained seed, where it makes sense to dedicate the entire bandwidth to serve a particular request instead of increasing the latency by dividing it.

Let the request vector at the end of slot n (start of slot n + 1) be represented as R(n + 1) = [r1,n+1, r2,n+1, ..., rp,n+1], where ri,n+1 denotes the number of times part i was requested in round n. In other words, each entry in R(n + 1) represents the number of leechers requesting that particular part during the previous round, i.e., round n. Let the throughput vector be denoted as T(n) = [t1,n, t2,n, ..., tp,n], where ti,n represents the number of times part i was served in n rounds. Similarly, let θ(n) = [θ1,n, θ2,n, ..., θp,n] denote the vector of the sum of requests for the different parts, counted each time the part was served and averaged over the past n rounds. The average throughput and request rate for part i after n rounds are defined as

Ti,n = (1/n) Σ_{k=1}^{n} Ii,k   and   θi,n = (1/n) Σ_{k=1}^{n} ri,k Ii,k

where Ii,k is an indicator variable equal to 1 if part i is scheduled in round k and 0 otherwise. Thus, at the end of each round, each entry in the vectors θ and T can be updated as follows:

θi,n+1 = θi,n + εn [Ii,n+1 ri,n+1 − θi,n]   and   Ti,n+1 = Ti,n + εn [Ii,n+1 − Ti,n]   (1)

with Ii,n+1 as explained above and εn = 1/(n + 1). Given the above system parameters, the seed scheduling algorithm we propose (PFS) can be summarized as follows:
– Among the non-zero request entries that arrive in a round, select the part maximizing the following ratio:

arg max_i ri,n+1 / (θi,n + d)   (2)
If there are multiple parts satisfying the above criterion, break ties arbitrarily. Here, d is a constant arbitrarily close to zero, chosen to avoid division by zero in the initial stages of the torrent when the throughputs for nearly all the parts are close to or equal to zero.
– Upload the chosen part from the previous step to the requesting peer. Again, break ties arbitrarily.

It is quite natural to question the soundness, be it theoretical or practical, of a formulation as in Equation (2). The proposed format can be justified if the content replication process were to be viewed, from a seed's perspective, as a variant of the utility maximization problem. Note that in a BT system, the onus is primarily on the seed to ensure the spread of content among the peers in the system. Thus, a seed seeks to maximize the replicas of each piece among the leechers, and it is therefore reasonable to assume that the utility function chosen is concave in nature. In this context, consider the utility function to be the sum of the logarithms of the average number of requests of the individual pieces, i.e.,

U(θ) = Σ_{i=1}^{p} log(θi + d)   (3)
Then it can be shown [13] that for this particular choice of utility maximization, the policy outlined in Equation (2) yields optimal results. We further note that the seed is not constrained to choose the policy of Equation (2): any reasonable representative concave function can be chosen as the utility function and the scheduling policy appropriately tailored to obtain optimal results.

4.1 Convergence Analysis

The formulation of Equation (1) is in the framework of stochastic approximation algorithms [12]. Notably, under certain assumptions, which can be shown to be valid in a BitTorrent scenario, the stochastic approximation algorithm in Equation (1) can be described by a deterministic mean-field ordinary differential equation (ODE) system. This enables us to characterize the behavior of the proposed algorithm and is also a useful tool to study asymptotic properties such as the long term throughput of the respective file pieces. An important consequence of the convergence proof concerns the stability of the system: a scheduling policy that converges asymptotically also characterizes a stable system. We now outline the assumptions required for the ODE convergence:

– Stationarity of the request distribution {R(n), n < ∞}. Note that in a BT system, the requests generated by leechers for the missing pieces depend only on the current distribution of the parts among each other. For instance, if a system snapshot at time t were to be translated to a different instant, say t1, the pattern of requests generated would be similar. Define the stationary expectation w.r.t. the request distribution for part i as

ĥi(θ) = E[ I{ ri/(θi + di) ≥ rj/(θj + dj), ∀j ≠ i } ]   (4)

– Lipschitz continuity of ĥi(·), 1 ≤ i ≤ p. We demonstrate this with the help of a simple case where the file consists of two parts and the joint probability density is given by p(r1, r2). Then, for part 1, Equation (4) can be simplified as

ĥ1(θ) = ∫∫ I{ r1/r2 ≥ w } p(r1, r2) dr1 dr2   (5)

where w = (θ1 + d)/(θ2 + d). Note that in the above equation we have used a continuous density function for the request generation process, which is in fact discrete. This is because it has been shown in [14] that the requests for the parts can be approximated by a Gaussian distribution, which is continuous. In the current work we employ the same approximation, and hence the formulation of Equation (5). Now, Eqn. (5) is Lipschitz continuous with respect to w, since the area of the region where the indicator function is non-zero is a differentiable function of w [13]. The same holds for ĥ2(θ). Further, the derivatives of ĥ1(θ) and ĥ2(θ) will be continuous if p(r1, r2) is bounded and continuous.

– Bounded density of R(n). This is trivially satisfied since the number of users in a BT system is finite, ensuring that the requests generated during each round remain bounded.

Under the above assumptions, the stochastic approximation algorithm of Equation (1) can be approximated by the ODE given by:

Ṫi^PFS = E[ I{ ri/(θi + di) ≥ rj/(θj + dj), ∀j ≠ i } ] − Ti^PFS   (6)
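Putting Eqs. (1) and (2) together, one round of PFS seed scheduling can be sketched as follows (in Python); the class name and the random request generator are illustrative, and ties are broken by taking the first maximizer, which is one admissible choice.

import random

class PFSSeed:
    """One-piece-per-round proportional fair scheduling, per Eqs. (1)-(2)."""

    def __init__(self, p, d=1e-6):
        self.p = p                  # number of pieces
        self.d = d                  # small constant of Eq. (2)
        self.theta = [0.0] * p      # averaged requests-when-served
        self.T = [0.0] * p          # average throughput per piece
        self.n = 0                  # round counter

    def schedule(self, requests):
        """Select the piece maximizing r_i/(theta_i + d), then apply Eq. (1)."""
        candidates = [i for i in range(self.p) if requests[i] > 0]
        best = max(candidates,
                   key=lambda i: requests[i] / (self.theta[i] + self.d),
                   default=None)
        eps = 1.0 / (self.n + 1)    # step size eps_n = 1/(n + 1)
        for i in range(self.p):
            ind = 1.0 if i == best else 0.0   # indicator I_{i,n+1}
            self.theta[i] += eps * (ind * requests[i] - self.theta[i])
            self.T[i] += eps * (ind - self.T[i])
        self.n += 1
        return best

seed = PFSSeed(p=30)
for _ in range(100):                # hypothetical random request pattern
    seed.schedule([random.randint(0, 5) for _ in range(30)])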
4.2 Modeling Other Policies

The analytic framework provides a generic setting wherein a wide class of scheduling policies can be modeled and quantified. We illustrate the robustness of the framework by modeling the LRF scheme in [9] as follows:

– For each piece i in the request block (RB), set ri,n+1 = 1
– Choose piece i such that: arg max_{i∈RB} 1/(θi,n + di); break ties arbitrarily
– Upload the piece from the previous step

The corresponding throughput formulation for part i, Ti^LRF, is then given by:

Ti,n+1^LRF = Ti,n^LRF + εn [ I{ 1/(θi + di) ≥ 1/(θj + dj), ∀j ≠ i } − Ti,n^LRF ]   (7)

and the equivalent ODE by:

Ṫi^LRF = E[ I{ 1/(θi + di) ≥ 1/(θj + dj), ∀j ≠ i } ] − Ti^LRF   (8)
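Under the same conventions as the PFS sketch above, the LRF selection rule reduces to a one-line criterion; the names below are illustrative.

def lrf_choice(request_block, theta, d=1e-6):
    """LRF: among the pieces in RB, pick the least served one,
    i.e. the piece with the largest 1/(theta_i + d)."""
    return max(request_block, key=lambda i: 1.0 / (theta[i] + d))

# Example: piece 7 has been served least often so far, so it wins.
theta = {3: 2.0, 7: 0.5, 12: 1.0}
assert lrf_choice([3, 7, 12], theta) == 7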
5 Results

In this section we present results comparing the efficiency of the PFS scheme against LRF. To prove the robustness of the proposed framework, we quantify the performance gains obtained in the short term as well as in the long run. For the short term analysis we perform a numerical evaluation of the PFS scheduling using the stochastic approximation algorithm as described in Section 4. For the long term evaluation, on the other hand, we use a custom simulator of the BT system. The rationale behind this choice lies in the lack of a realistic characterization of the piece request rate R(n) = [r1,n, r2,n, ..., rp,n] to be used in the analytical evaluation presented in Section 4.1. Our implementation, which is outlined in Section 5.2, also provides a global perspective of the system, as opposed to the seed's perspective offered by the analytical model.

5.1 Short Term Behavior

Since the primary objective in the initial stages of a torrent is to minimize starvation of pieces, a natural benchmark for comparing the policies is to measure the number of starved pieces at a certain point of time under each policy; a small counting sketch is given after the list below. Here, we choose to make the comparison after p rounds, where p denotes the number of pieces the content is divided into. The rationale behind this is as follows: since we assume that the seed schedules one piece per round, in the ideal case it would require p rounds to ensure that the file in its entirety is present among the leechers. Figure 1 graphs the performance of the various policies in the flash crowd stage. In Figure 1(a), the number of starved parts of a 30-part file is plotted for each policy over 100 runs of our algorithm, while Figure 1(b) quantifies the impact of the file size on the number of starved parts. Each point on the graph of Fig. 1(b) is an average of 100 runs. As seen from the plots, the proportional fair scheme offers significant gains over the other two policies. Even with increasing file sizes, the performance degradation is not very substantial. In fact, for a file consisting of 100 parts, the ratio of starved pieces in the "flash crowd" phase is about 1:3 for PFS and LRF, while it is around 1:18 when comparing PFS and the RS schemes. We believe the better performance of the algorithm can be attributed to the following factors:

– The seed makes a scheduling decision taking into account all the requests that are made in a particular round, unlike LRF and RS where users are served in a first-come-first-served manner. For instance, if a large number of leechers request a particular piece, there is a higher probability of it being a rare piece as compared to the rarity of a piece requested by a single user.
– In an open BT system the local rarest piece need not reflect reality, from the seed's perspective, due to leechers entering and leaving the system. Thus, when a seed bases its scheduling decisions only on its past history, as in the LRF case, peer dynamics may leave the seed with a stale vision of what is rare and what is not in the system. The PFS scheme accounts for this by using the number of requests for a piece as the system's indicator of rarity and makes the scheduling decision accordingly.
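The starvation count itself is straightforward; a minimal sketch with a random-scheduling baseline, where the request generator is a hypothetical simplification of the actual experiment:

import random

def starved_after_p_rounds(p, pick_piece, request_fn):
    """Serve one piece per round for p rounds; count never-uploaded pieces."""
    uploaded = set()
    for _ in range(p):
        choice = pick_piece(request_fn(p))
        if choice is not None:
            uploaded.add(choice)
    return p - len(uploaded)

def random_pick(requests):          # RS baseline: uniform among requested pieces
    requested = [i for i, r in enumerate(requests) if r > 0]
    return random.choice(requested) if requested else None

uniform_requests = lambda p: [random.randint(0, 3) for _ in range(p)]
print(starved_after_p_rounds(30, random_pick, uniform_requests))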
Fig. 1. Performance evaluation in the flash crowd phase: number of starved parts for LRF, PFS and RS, (a) per simulation run for a 30-part file, (b) vs. number of parts (file size)
5.2 Long Term Behavior

As a final validation of our theoretical formulation presented in Eqn. (2), we present a simulation comparison of the proposed PFS algorithm against the LRF scheme, especially the behavior over long time periods. Since we only modify the seed scheduling algorithm, it only makes sense to quantify the impact within the seed's peer set and not globally. The main objective in the long term is to prevent a high variance in the number of replicas of each part, i.e., to prevent a disproportionate piece replication in the peer set, since it is the root cause of all problems. In other words, the scheduling process should be "fair" to the individual pieces. The intuition behind this is that ensuring a balanced replication of the pieces can help improve download times, since there is a higher level of redundancy, and also distribute the load more evenly among the leechers. As a measure of the degree of fairness, we employ the Max-Min Fairness Index [15], given by min∀i(xi) / max∀i(xi), where xi denotes the number of replicas of part i at the end of a round in the seed's peer set.

Before discussing the long term results, we provide a brief description of the custom simulator we designed.

BitTorrent Simulator: We developed a synchronous simulator working in rounds, wherein we implemented both seed and leecher algorithms following the BT specification. We then implemented two scheduling policies at the seed side, the PFS and the LRF. The only limitation we imposed on the simulator follows the one of the analytical model: only one peer is unchoked in each round. The peer set size for a peer is set to the default value of the mainline BT client, that is, 80 peers. To quantify the impact of the scheduling decisions, we assume that leechers that finish downloading leave the torrent, i.e., there is a single seed in the system at all times.

Simulation Results: We compare the LRF and the PFS scheduling algorithms assuming the content to be split into p = 150 pieces. We simulate the presence of only one seed in the system and study two representative and realistic scenarios: the first where the torrent experiences a heavy flash crowd, and the second indicative of a torrent with a high churn rate.
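The fairness metric used in both scenarios is a one-line computation; a sketch, where replicas[i] stands for xi at the end of a round:

def max_min_fairness(replicas):
    """Max-Min Fairness Index: min_i x_i / max_i x_i, in [0, 1]."""
    return min(replicas) / max(replicas) if max(replicas) > 0 else 0.0

# 1 means perfectly balanced replication; a single starved piece pins it to 0.
print(max_min_fairness([4, 6, 5, 4]))   # 0.666...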
To simulate the flash crowd setting, 160 peers are injected into the system in the first round, after which no further joins are allowed. The objective here is to study the algorithm's sensitivity toward achieving a balanced replication in the wake of a huge volume of requests. Note that the Max-Min Fairness plots can also be used to infer and compare the download times experienced by the leechers. Since we assume that leechers with the entire content depart, the time T when the graph reaches one also denotes the instant when all the leechers in the system have finished downloading. Therefore, the faster the graph peaks to one, the better it is in terms of fairness as well as download times. In Figure 2(a) we plot the Max-Min Fairness Index versus time (in simulation rounds) for the flash crowd scenario described above. When using PFS scheduling, T = 159, while for the LRF case T = 219. A similar trend was observed over multiple repetitions of the experiment, showing an improvement in the total time to download the content in favor of PFS; this improvement was even more pronounced in the case of smaller files.
Fig. 2. Simulation results for the long term analysis: Max-Min Fairness Index vs. time (simulation rounds) for PFS and LRF, (a) flash crowd scenario (T = 159 for PFS, T = 219 for LRF), (b) high churn scenario
In the second simulation study, we focus on the responsiveness of scheduling decisions at the seed when substantial variations in the population of peers downloading the content arise, i.e. a system with high churn. In particular, we consider 80 peers joining the system at round 1, then 30 randomly chosen peers leaving the system at round 150, and finally 80 new peers joining the system at round 250. Although both PFS and LRF scheduling reach the highest fairness index, Figure 2(b) clearly shows that PFS reacts consistently faster to peer dynamics as compared to LRF. Similar results have been obtained for different runs of the same scenario.
6 Conclusion and Future Work

In this work, we motivated the need for improved scheduling algorithms at the seed in a BT system and quantified the performance gains obtained thus. A generic analytical framework to model such algorithms was presented and a novel seed scheduling strategy to achieve better content replication was proposed. Through numerical evaluation of the model as well as simulations, the improved performance of the proposed PFS
algorithm over existing strategies in the literature (LRF and the existing mainline random scheduling schemes) was demonstrated.
References

1. A. Legout, G. Urvoy-Keller and P. Michiardi, Rarest First and Choke Algorithms Are Enough, ACM SIGCOMM/USENIX IMC 2006, Rio de Janeiro, Brazil.
2. G. Urvoy-Keller and P. Michiardi, Impact of Inner Parameters and Overlay Structure on the Performance of BitTorrent, IEEE Global Internet Symposium 2006, Barcelona, Spain.
3. Y. Tian, D. Wu and K. W. Ng, Modeling, Analysis and Improvement for BitTorrent-Like File Sharing Networks, IEEE INFOCOM 2006, Barcelona, Spain.
4. D. Qiu and R. Srikant, Modeling and performance analysis of BitTorrent-like peer-to-peer networks, ACM SIGCOMM 2004, Portland, OR, USA.
5. B. Fan, D.-M. Chiu and J. C. S. Lui, Stochastic Differential Equation Approach to Model BitTorrent-like P2P Systems, IEEE ICC 2006, Istanbul, Turkey.
6. C. Gkantsidis and P. Rodriguez, Network Coding for Large Scale Content Distribution, IEEE INFOCOM 2005, Miami, USA.
7. B. Cohen, Incentives Build Robustness in BitTorrent, Workshop on Economics of Peer-to-Peer Systems 2003, Berkeley, USA.
8. M. Izal, G. Urvoy-Keller, E. W. Biersack, P. Felber, A. A. Hamra and L. Garcés-Erice, Dissecting BitTorrent: Five Months in a Torrent's Lifetime, PAM 2004, Antibes, France.
9. A. Bharambe, C. Herley and V. N. Padmanabhan, Analyzing and Improving a BitTorrent Network's Performance Mechanisms, IEEE INFOCOM 2006, Barcelona, Spain.
10. X. Yang and G. de Veciana, Service capacity in peer-to-peer networks, IEEE INFOCOM 2004, Hong Kong, China.
11. F. Mathieu and J. Reynier, Missing Piece Issue and Upload Strategies in Flashcrowds and P2P-assisted Filesharing, Technical Report, ENS, France.
12. H. J. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications, 2nd ed., Berlin, Germany: Springer-Verlag, 2003.
13. H. J. Kushner and P. A. Whiting, Convergence of Proportional-Fair Sharing Algorithms Under General Conditions, IEEE Transactions on Wireless Communications, Vol. 3, No. 4, July 2004.
14. D. Erman, D. Ilie and A. Popescu, BitTorrent Session and Message Models, ICCGI 2006, Bucharest, Romania.
15. B. Radunović and J.-Y. Le Boudec, A Unified Framework for Max-Min and Min-Max Fairness with Applications, Technical Report, EPFL, July 2002.
Streaming Performance in Multiple-Tree-Based Overlays

György Dán, Viktória Fodor, and Ilias Chatzidrossos

Laboratory for Communication Networks, School of Electrical Engineering
KTH, Royal Institute of Technology, Stockholm, Sweden
{gyuri,vfodor,iliasc}@ee.kth.se

Abstract. In this paper we evaluate the data transmission performance of a generalized multiple-tree-based overlay architecture for peer-to-peer live streaming that employs multipath transmission and forward error correction. We give mathematical models to describe the error recovery in the presence of packet losses. We evaluate the data distribution performance of the overlay, its asymptotic behavior, the stability regions for the data transmission, and analyze the system behavior around the stability threshold. We argue that the composed measure of the mean and the variance of the packet possession probability can support adaptive forward error correction.
1 Introduction
The success of peer-to-peer overlays for live multicast streaming depends on their ability to maintain acceptable perceived quality at all the peers, that is, to provide data transmission with low delay and information loss. Live streaming solutions usually apply multiple distribution trees and some form of error control to deal with packet losses due to congestion and peer departures. In these systems peers have to relay data to their child nodes with low delay, which limits the possibilities of error recovery. Consequently, the main problem to be dealt with is the spatial propagation and thus the accumulation of losses, which results in low perceived quality for peers far from the source. Several works deal with the management of the overlay, with giving incentives for collaboration, with peer selection and tree reconstruction considering peer heterogeneity and the underlying network topology ([1,2,3,4,5,6] and references therein). In [7] the authors propose time shifting and video patching to deal with losses and discuss related channel allocation and group management issues. In [8] robustness is achieved by distributing packets to randomly chosen neighbors outside the distribution tree. In [9] retransmission of the lost data is proposed to limit temporal error propagation. CoopNet [10] and SplitStream [11] propose the use of multiple distribution trees and a form of multiple description coding
This work was in part supported by the Swedish Foundation for Strategic Research through the projects Winternet and AWSI, and by Wireless@KTH.
(MDC) based on forward error correction (FEC). In the case of packet losses, peers can stop error propagation by reconstructing the video stream from the set of received substreams using error correcting codes. There are several designs proposed and also implemented; the evaluation of these solutions is, however, mostly based on simulations and measurements. Our goal is to define abstract models of peer-to-peer streaming overlays that help us to understand some basic characteristics of streaming in multiple transmission trees and thus can support future system design. Previously, we proposed mathematical models to describe the behavior of CoopNet-like architectures in [12,13,14]. Our results showed that the two architectures proposed in the literature, and used as a basis in recent works (e.g., [2,15]), are straightforward but not optimal. Minimum depth trees minimize the number of affected peers at peer departure, minimize the effect of error propagation from peer to peer, and introduce low transmission delay. Nevertheless, the overlay is unstable and may become disconnected when one of the trees runs out of available capacity after consecutive peer departures. Minimum breadth trees are stable and easy to manage, but result in long transmission paths. Thus many nodes are affected if a peer leaves, there may be large delays, and the effect of error propagation may be significant. In [16] we proposed a generalized multiple-tree-based architecture and showed that the stability of the overlay can be increased significantly by the flexible allocation of peer output bandwidth across several transmission trees. In this paper, we evaluate the performance of this generalized architecture. First we show how the allocation of peer output bandwidth affects the data distribution performance and discuss whether proper bandwidth allocation can lead to both increased overlay stability and good data distribution performance. We show that the packet possession probability at the peers decreases ungracefully if the redundancy level is not adequate, and evaluate how the variance of the packet possession probability can predict quality degradation. The rest of the paper is organized as follows. Section 2 describes the considered overlay structure and error correction scheme. We evaluate the stability of the data distribution in Section 3. Section 4 discusses the performance of the overlay based on the mathematical models and simulations, and we conclude our work in Section 5.
2 System Description
The peer-to-peer streaming system discussed in this paper is based on two key ideas: the overlay consists of multiple transmission trees to provide multiple transmission paths from the sender to all the peers, and data transmission is protected by block based FEC.
2.1 Overlay Structure
The overlay consists of the streaming server and N peer nodes. The peer nodes are organized in t distribution trees with the streaming server as the root of the trees. The peers are members of all t trees, and in each tree they have a different parent node from which they receive data. We denote the maximum number of
children of the root node in each tree by m, and we call it the multiplicity of the root node. We assume that nodes do not contribute more bandwidth towards their children than they use to download from their parents, which means that each node can have up to t children to which it forwards data. A node can have children in up to d of the t trees, called the fertile trees of the node. A node is sterile in all other trees, that is, it does not have any children. We discuss two different policies that can be followed to allocate output bandwidth in the fertile trees. With the unconstrained capacity allocation (UCA) policy a node can have up to t children in any of its fertile trees. With the balanced capacity allocation (BCA) policy a node can have up to t/d children in any of its fertile trees. By setting d = t one gets the minimum breadth tree described in [10], and by setting d = 1 one gets the minimum depth tree evaluated in [2,6,10,15]. For 1 < d < t the number of layers in the overlay is O(log N), as for d = 1. The construction and the maintenance of the trees can be done either by a distributed protocol (structured, as in [11], or unstructured, as in [9]) or by a central entity, as in [10]. The results presented in this paper do not depend on the particular solution used. Nevertheless, we defined a centralized overlay construction algorithm in [16]. The objective of the algorithm is to minimize the depth of the trees and the probability that trees run out of free capacity after node departures. This is achieved by pushing sterile nodes to the lower layers of the trees and by assigning arriving nodes to be fertile in trees with the least available capacity. If we denote the number of layers in the trees by L, then in a well maintained tree each node is 1 ≤ i ≤ L hops away from the root node in its fertile trees, and L − 1 ≤ i ≤ L hops away in its sterile trees. As shown in [16], increasing the number of fertile trees increases the overlay stability, with a significant gain already at d = 2, the UCA policy giving the highest increase.
R
1
4
7
6
2
Tree 2
R
1
2
3 5
8
6
8
7 5
a)
Tree 3
R
2
3
4 3
6
4
7
Tree 1
R
1
4
5 1
8
9 7
6
2
Tree 2
R
1
2
3 5
8
6
8
7 5
Tree 3
R
2
3
4 9 3
6
4
7
5 1
8
3
b)
Fig. 1. a) Overlay with N = 8, t = 3, m = 3 and d = 2, b) the same overlay with N = 9. Identification numbers imply the order of arrival, squares indicate that the node is fertile.
Fig. 1 shows an overlay constructed with the proposed algorithm for t = 3, m = 3 and d = 2, also showing how the overlay changes when a new node joins.
2.2 Data Distribution with Forward Error Correction
The root uses block based FEC, e.g., Reed-Solomon codes [17], so that nodes can recover from packet losses due to network congestion and node departures. To every k packets of information it adds c packets of redundant information, which results in a block length of n = k + c. We denote this FEC scheme by FEC(n,k). If the root would like to increase the ratio of redundancy while maintaining its
bitrate unchanged, then it has to decrease its source rate. Lost packets can be reconstructed as long as at most c packets are lost out of n packets. The root sends every tth packet to its children in a given tree in a round-robin manner. If n ≤ t then at most one packet of a block is distributed over the same distribution tree. Peer nodes relay the packets upon reception to their respective child nodes. Once a node receives at least k packets of a block of n packets it recovers the remaining c packets and forwards the ones belonging to its fertile trees.
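A sketch (in Python) of the per-block bookkeeping this implies at a peer, assuming n = t so that packet i of a block travels over tree i; the class and method names are illustrative, and the Reed-Solomon decoding itself is abstracted away.

class FECBlock:
    """Reception state of one FEC(n, k) block at a peer (n = t assumed)."""

    def __init__(self, n, k):
        self.n, self.k = n, k
        self.received = set()            # indices of packets received so far

    def add_packet(self, idx):
        self.received.add(idx)

    def complete(self):
        """k of the n packets suffice to reconstruct the whole block."""
        return len(self.received) >= self.k

    def forwardable(self, fertile_trees):
        """Packet i travels over tree i, so a node forwards the indices of
        its fertile trees: those it received, plus, once the block is
        complete, those it can reconstruct."""
        if self.complete():
            return sorted(set(fertile_trees))
        return sorted(self.received & set(fertile_trees))

block = FECBlock(n=8, k=6)
for i in (0, 1, 2, 4, 5, 7):
    block.add_packet(i)
print(block.complete(), block.forwardable({1, 3}))   # True [1, 3]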
3 Data Distribution Model
We use two metrics to measure the performance of the data distribution in the overlay. The first metric is the probability π that an arbitrary node receives or can reconstruct (i.e., possesses) an arbitrary packet. If we denote by ρr the number of packets possessed by node r in an arbitrary block of packets, then π can be expressed as the average ratio of packets possessed in a block over all nodes, i.e., π = E[(1/N) Σr ρr/n]. The second metric is σ, the average standard deviation of ρr/n over all nodes, i.e., σ² = E[(1/N) Σr (ρr/n)²] − π². The mathematical model we present describes the behavior of the overlay in the presence of independent packet losses and without node dynamics. We denote by p the probability that a packet is lost between two adjacent nodes. We assume that the probability that a node is in possession of a packet is independent of whether another node in the same layer is in possession of the packet. We also assume that nodes can wait for redundant copies to reconstruct a packet for an arbitrary amount of time. For the model we consider a tree with the maximum number of nodes in the last layer, hence nodes are fertile in layers 1..L − 1 and are sterile in layer L. For simplicity, we assume that n = t and that t/d is an integer. We will comment on the possible effects of our assumptions later. The model assumes the BCA policy; by modifying the weights in (1) the model can be used to consider other policies as well. For brevity we restrict the analytical evaluation to this case, and compare the results to simulations with other policies. Hence, our goal is to calculate π^(z) = E[(1/N) Σr (ρr/n)^z] = (1/N) Σr E[(ρr/n)^z], the average of the z-th moment (z ∈ {1, 2}) of the ratio of possessed packets. We introduce π(i)^(z), the z-th moment of the ratio of possessed packets for a node that is in layer i in its fertile trees. For simplicity we assume that nodes are in the same layer in all their fertile trees. We can express π^(z) by weighting the π(i)^(z) with the portion of nodes that are in layer i of their fertile trees:

π^(z) = Σ_{i=1}^{L−1} [ (t/d)^{i−1} / ( ((t/d)^{L−1} − 1)/(t/d − 1) ) ] π(i)^(z).   (1)
To calculate π(i)(z) we have to calculate the probabilities πf (i) that a node, which is in layer i in its fertile tree, receives or can reconstruct an arbitrary packet in its fertile tree. Since the root node possesses every packet, we have that πf (0) = 1. The probability that a node in layer i receives a packet in a tree is πa (i) = (1 − p)πf (i − 1). A node can possess a packet in its fertile tree either if it receives the packet or if it can reconstruct it using the packets received in
the other trees. Reconstruction can take place if the number of received packets is at least k out of the remaining n − 1, hence we can write for 1 ≤ i ≤ L − 1

πf(i) = πa(i) + (1 − πa(i)) Σ_{j=k}^{n−1} Σ_{u=max(0, j−n+d)}^{min(j, d−1)} C(d−1, u) πa(i)^u (1 − πa(i))^{d−1−u} C(n−d, j−u) πa(L)^{j−u} (1 − πa(L))^{n−d−j+u},   (2)

where C(a, b) denotes the binomial coefficient. Based on the probabilities πf(i) we can express π(i)^(z) (1 ≤ i ≤ L − 1). If a node receives at least k packets in a block of n packets then it can use FEC to reconstruct the lost packets, and hence possesses all n packets. Otherwise, FEC cannot be used to reconstruct the lost packets. Packets can be received in the d fertile trees and in the t − d sterile trees. Hence for π(i)^(z) we get the equation

π(i)^(z) = Σ_{u=0}^{d} Σ_{j=0}^{n−d} (τ(j+u)/n)^z C(d, u) πa(i)^u (1 − πa(i))^{d−u} C(n−d, j) πa(L)^j (1 − πa(L))^{n−d−j},   (3)

where τ(j) indicates the number of packets possessed after FEC reconstruction if j packets have been received: τ(j) = j for 0 ≤ j < k and τ(j) = n for k ≤ j ≤ n. The πf(i) can be calculated from (2) starting from πf(0) = 1, and the π(i)^(z) can then be calculated using (3). The calculation of π and σ is straightforward, since π = π^(1) and σ² = π^(2) − π².
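Numerically, (2) can be evaluated layer by layer; since πa(L) = (1 − p)πf(L − 1) feeds back into the recursion, the sketch below (in Python) resolves it by fixed-point iteration, which is an implementation choice rather than part of the model.

from math import comb

def pi_f_layers(n, k, d, L, p, sweeps=200):
    """Fixed-point evaluation of Eq. (2): returns [pi_f(0), ..., pi_f(L-1)]."""
    pi_f = [1.0] * L                       # pi_f(0) = 1: the root has everything
    for _ in range(sweeps):
        pa_L = (1.0 - p) * pi_f[L - 1]     # arrival probability in sterile trees
        new = [1.0] + [0.0] * (L - 1)
        for i in range(1, L):
            pa = (1.0 - p) * new[i - 1]    # arrival probability, fertile layer i
            rec = 0.0                      # reconstruction term of Eq. (2)
            for j in range(k, n):
                for u in range(max(0, j - n + d), min(j, d - 1) + 1):
                    rec += (comb(d - 1, u) * pa**u * (1 - pa)**(d - 1 - u)
                            * comb(n - d, j - u) * pa_L**(j - u)
                            * (1 - pa_L)**(n - d - j + u))
            new[i] = pa + (1.0 - pa) * rec
        pi_f = new
    return pi_f

print(pi_f_layers(n=8, k=6, d=2, L=6, p=0.05))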
3.1 Asymptotic Behavior for Large N
In the following we give an asymptotic bound on π to better understand its evolution. It is clear that πf(i) is a non-increasing function of i and πf(i) ≥ 0. Hence, we can give an upper estimate of πf(i) by assuming that the nodes that forward data in layer i are sterile in the same layer. Then, instead of (2) we get the following nonlinear recurrence equation

π̄f(i+1) = π̄a(i+1) + (1 − π̄a(i+1)) Σ_{j=n−c}^{n−1} C(n−1, j) π̄a(i+1)^j (1 − π̄a(i+1))^{n−1−j}.   (4)
This equation is the same as (2) in [13], and thus the analysis shown there can be applied to describe the evolution of π̄_f(i). For brevity, we only state the main results regarding π̄_f(i); for a detailed explanation see [13]. For every (n, k) there is a loss probability p_max below which the packet possession probability π̄_f(∞) > 0 and above which π̄_f(∞) = 0. Furthermore, for any 0 < δ < 1 there is an (n, k) such that π̄_f(∞) ≥ δ. Consequently, in the considered overlay, if p > p_max then lim_{N→∞} π = 0, because π̄_f(i+1) ≥ π_f(i+1) ≥ π(i+1)^{(1)} and lim_{N→∞} π̄_f(L−1) = 0. For p < p_max, stability depends on the number of layers in the overlay and on the FEC block length due to the initial condition π_f(L−1)^{(0)} = (1 − p)^{L−1}, but not directly on the number of nodes. This explains why placing nodes with large outgoing bandwidths close to the root improves the overlay's performance [2,6]. In the case of stability, π_f(i) ≥ π̄_f(∞) > 0 and π(i)^{(1)} ≥ π̄_f(∞) > 0.
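The threshold p_max can be located numerically from recurrence (4). The sketch below is our illustration under the assumption that c denotes the FEC redundancy (c = n − k); the layer count and the positivity cutoff are arbitrary choices. For the FEC(4,3) code (n = 4, c = 1) it should reproduce a threshold near the p_max = 0.129 quoted in Sect. 4.

from math import comb

def asymptotic_pi_f(n, c, p, layers=2000):
    """Iterate recurrence (4) to approximate pi_f_bar(infinity)."""
    pf = 1.0
    for _ in range(layers):
        pa = (1 - p) * pf
        pf = pa + (1 - pa) * sum(comb(n - 1, j) * pa**j * (1 - pa)**(n - 1 - j)
                                 for j in range(n - c, n))
    return pf

def p_max(n, c, tol=1e-4):
    """Bisect for the largest p with a strictly positive limit."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if asymptotic_pi_f(n, c, mid) > 1e-3:
            lo = mid          # limit still positive: threshold lies above mid
        else:
            hi = mid
    return lo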
3.2 Discussion of the Assumptions
In the following we discuss the validity of certain assumptions made in the model.

The model does not take into account the correlations between packet losses in the Internet. Nevertheless, it can be extended to heterogeneous and correlated losses by following the procedure presented in [13] for the minimum breadth trees. Losses occurring in bursts on the output links of the nodes influence the performance of the overlay if several packets of the same block are distributed over the same tree, that is, if n > t. Bursty losses in the backbone influence the performance if packets of different distribution trees traverse the same bottleneck. The analysis in [13] showed that loss correlations on the input links and heterogeneous losses slightly decrease the performance of the overlay.

The model can be extended to nodes with heterogeneous input and output bandwidths. The procedure is similar to that followed when modeling heterogeneous losses [13], but the effects of the heterogeneous bandwidths on the trees' structure have to be taken into account. We show equations for the homogeneous case here to ease understanding.

In the analysis we assume that the number of nodes in the last layer of the tree is maximal. If the number of nodes in the last layer is not maximal, then some nodes are in layer L−1 in their sterile trees, and the overlay's performance becomes slightly better. However, the results of the asymptotic analysis still hold.

Our results for block-based FEC apply to PET and the MDC scheme considered in [10], where different blocks (layers) of data are protected with different FEC codes. The packet possession probability for the different layers depends on the strength of the FEC codes protecting them, and can be calculated using the model.

We do not model the temporal evolution of the packet possession probability. We use simulations to show that the performance predicted by the model is achieved within a time suitable for streaming applications.

The model does not take into account node departures, an important source of disturbances for the considered overlay. Following the arguments presented in [13], node departures can be incorporated in the model as an increase of the loss probability by p_ω = (N_d/N)·θ, where N_d is the mean number of departing nodes per time unit and θ is the time nodes need to recover (discovery and reconnection) from the departure of a parent node. The simulation results presented in [13] show that for low values of p_ω this approximation is accurate.
The results of the model apply for n < t without modifications, and a similar model can be developed for n > t by considering the distribution of the number of packets possessed by the nodes in their fertile trees. However, for n > t node departures lead to correlated losses in the blocks of packets, which aggravates their effect on the performance.
4 Performance Evaluation
In the following we analyze the behavior of the overlay using the analytical model presented in the previous section and via simulations. For the simulations we developed a packet-level event-driven simulator. We used the GT-ITM [18] topology generator to generate a transit-stub model with 10000 nodes and average node degree 6.2. We placed each node of the overlay at random at one of the 10000 nodes and used the one-way delays given by the generator between the nodes. The mean delay between nodes of the topology is 44 ms. The delay between overlay nodes residing on the same node of the topology was set to 1 ms. Losses on the paths between the nodes of the overlay occur independently of each other according to a Bernoulli model with loss probability p.

We consider the streaming of a 112.8 kbps data stream to nodes with link capacity 128 kbps. The packet size is 1410 bytes. Nodes have a playout buffer capable of holding 140 packets, which corresponds to 14 s of playout delay. Each node has an output buffer of 80 packets to absorb the bursts of packets in its fertile trees. We assume that session holding times follow a log-normal distribution with mean 1/μ = 306 s and that nodes join the overlay according to a Poisson process with rate λ = Nμ, supported by studies [19,20]. To obtain the results for a given overlay size N, we start the simulation with N nodes in its steady state as described in [21] and let nodes join and leave the overlay for 5000 s. The measurements are made after this warm-up period for a static tree over 1000 s, and the presented results are the averages of 10 simulation runs. The results have less than a 5 percent margin of error at a 95 percent level of confidence.

[Fig. 2. π(i) vs. i for m = t = n = 4, c = 1]

Fig. 2 shows π(i) as a function of i for t = n = 4, c = 1, and different overlay sizes and values of d as obtained by the mathematical model (i.e., the BCA policy). The value of the threshold for the FEC(4,3) code is p_max = 0.129. The figure shows that when the overlay is stable (e.g., ∀d, N at p = 0.06), neither its size nor d has a significant effect on π(i). However, in the unstable state (d = 2, N = 50000 at p = 0.12; ∀d, N at p = 0.14) both increasing N and increasing d decrease π(i), as the number of layers in the overlay increases in both cases.
[Fig. 3. π vs. p for m = 4, n = t, k/n = 0.75]
[Fig. 4. σπ vs. p for m = 4, n = t, k/n = 0.75]
Fig. 3 shows π as a function of p obtained with the mathematical model for m = 4 and n = t. The vertical bars show the values of π(1) at the upper end and π(L−1) at the lower end. We included them for d = 1 only to ease readability, but they show the same properties for other values of d as well. The figure shows that π remains high and is unaffected by N and d as long as the overlay is stable. However, it drops once the overlay becomes unstable, and the drop of the packet possession probability gets worse as the number of nodes, and hence the number of layers in the overlay, increases. At the same time the difference between π(1) and π(L−1) (the packet possession probability of nodes that are fertile in the first and the penultimate layers, respectively) increases. Furthermore, increasing t (and hence n) increases π in a stable system, but the stability region gets smaller and the drop of the packet possession probability gets faster in the unstable state due to the longer FEC codes. The curves corresponding to π̄_f(∞) show the value of the asymptotic bound calculated using (4).

Due to the ungraceful degradation of π it is difficult to maintain the stability of the overlay in a dynamic environment by measuring the value of π alone. Hence, we look at the standard deviation of the packet possession probability. Fig. 4 shows the standard deviation σπ as a function of p obtained with the mathematical model for m = 4 and n = t. The standard deviation increases rapidly even for low values of p and reaches its maximum close to p_max. Increasing the value of t decreases the standard deviation of π, i.e., the number of packets received in a block varies less among the nodes of the overlay. Its quick response to the increase of p makes σπ more suitable for adaptive error control than π itself.

To validate our model we first present simulation results for the BCA policy. Figs. 5 and 6 show π and σπ, respectively, as a function of p for the same scenarios as Figs. 3 and 4. Both figures show a good match with the analytical model and confirm that the increase of the standard deviation is a good indicator of the increasing packet loss probability. For long FEC codes the simulation results show slightly worse performance close to the stability threshold compared to the analytical results. The difference is due to late-arriving packets, i.e., FEC reconstruction is not possible within the time limit set by the playout buffer's size.
[Fig. 5. π vs. p for m = 4, n = t, k/n = 0.75. BCA policy, simulation results.]
[Fig. 6. σπ vs. p for m = 4, n = t, k/n = 0.75. BCA policy, simulation results.]
[Fig. 7. π vs. p for m = 4, n = t, k/n = 0.75. UCA policy, simulation results.]
[Fig. 8. σπ vs. p for m = 4, n = t, k/n = 0.75. UCA policy, simulation results.]
To see the effects of the capacity allocation policy, we show π as a function of p in Fig. 7 for the same scenarios as in Fig. 5, but for the UCA policy. Comparing the figures we see that π is the same in the stable state, but is higher in the unstable state of the overlay. Furthermore, the region of stability is wider. Comparing the results for σπ shown in Fig. 8, we observe that for N = 50000 and d > 1 the standard deviation is somewhat higher than under the BCA policy and is similar in shape to the d = 1 case. This, together with the wider region of stability under the UCA policy, is due to the overlay having fewer layers than under the BCA policy, as shown in Fig. 9. The figure shows the cumulative distribution function (CDF) of the layer in which the nodes of the overlay are in their sterile trees. There is practically no difference between the distributions for d = 1 and the BCA policy with d > 1 for the same t/d value. With the UCA policy nodes tend to have more children in the fertile tree where they are closest to the root, due to the parent selection algorithm, so the trees' structure is similar to the d = 1 case and the number of layers is lower than under BCA (compare the results for t = 16, d = 1 vs. t = 16, d = 4, UCA vs. t = 16, d = 4, BCA).
[Fig. 9. CDF of the layer where nodes are sterile for N = 50000. Simulation results.]

Hence, the data distribution performance of an overlay with t trees and d = 1 can be closely resembled by an overlay with d > 1 and td trees by employing a (centralized or distributed) mechanism that promotes parents close to the root, such as the UCA policy. Doing so allows the use of longer FEC codes, and hence better performance in the stable region, with a similar stability region due to the lower number of layers. At the same time one can decrease the probability that a node fails to reconnect to the overlay after the departure of a parent node [16]. Longer FEC codes decrease the variance of the packet possession probability and allow a smoother adaptation of the redundancy as a function of the measured network state.
5 Conclusion
In this paper, we analyzed a peer-to-peer live streaming solution based on multiple transmission trees, FEC, and free allocation of the output bandwidth of the peers across several trees. The aim of this design is to avoid tree disconnections after node departures, which can happen with high probability in resource-scarce overlays if all the peers can forward data in one tree only. We presented a mathematical model to express the packet possession probability in the overlay for the case of independent losses and the BCA policy. We determined the stability regions as a function of the loss probability between the peers, of the number of layers, and of the FEC block length, and analyzed the asymptotic behavior of the overlay for a large number of nodes.

We calculated the variance of the packet possession probability to study the overlay around the stability threshold. We concluded that the variance increases significantly with the packet loss probability between the peers, and consequently it is a good performance measure for adaptive forward error correction. Designing a robust stabilizing controller that can maintain a target packet possession probability in a dynamic environment will be the subject of future work.

We concluded that as long as the overlay is stable, the performance of the data transmission is not influenced by the number of fertile trees or the allocation policy, while longer FEC codes improve it. However, increasing the number of fertile trees decreases the packet possession probability in the overlay in the unstable region due to the longer transmission paths. Nevertheless, with the UCA policy one can increase the number of trees, the number of fertile trees, and the FEC block length, while keeping the performance close to that of the minimum-depth trees, because the UCA policy leads to shallow tree structures. These results show that adjusting the number of fertile trees can be a means to improve the overlay's stability without deteriorating the performance of the data distribution.
References

1. Liao, X., Jin, H., Liu, Y., Ni, L., Deng, D.: Anysee: Scalable live streaming service based on inter-overlay optimization. In: Proc. of IEEE INFOCOM (April 2006)
2. Bishop, M., Rao, S., Sripanidkulchai, K.: Considering priority in overlay multicast protocols under heterogeneous environments. In: Proc. of IEEE INFOCOM (April 2006)
3. Cui, Y., Nahrstedt, K.: Layered peer-to-peer streaming. In: Proc. of NOSSDAV (2003) 162–171
4. Cui, Y., Nahrstedt, K.: High-bandwidth routing in dynamic peer-to-peer streaming. In: Proc. of ACM APPMS (2005) 79–88
5. Tan, G., Jarvis, S.A.: A payment-based incentive and service differentiation mechanism for peer-to-peer streaming broadcast. In: Proc. of IEEE IWQoS (2006) 41–50
6. Sung, Y., Bishop, M., Rao, S.: Enabling contribution awareness in an overlay broadcasting system. In: Proc. of ACM SIGCOMM (2006) 411–422
7. Guo, M., Ammar, M.: Scalable live video streaming to cooperative clients using time shifting and video patching. In: Proc. of IEEE INFOCOM (2004)
8. Banerjee, S., Lee, S., Braud, R., Bhattacharjee, B., Srinivasan, A.: Scalable resilient media streaming. In: Proc. of NOSSDAV (2004)
9. Setton, E., Noh, J., Girod, B.: Rate-distortion optimized video peer-to-peer multicast streaming. In: Proc. of ACM APPMS (2005) 39–48
10. Padmanabhan, V., Wang, H., Chou, P.: Resilient peer-to-peer streaming. In: Proc. of IEEE ICNP (2003) 16–27
11. Castro, M., Druschel, P., Kermarrec, A., Nandi, A., Rowstron, A., Singh, A.: SplitStream: High-bandwidth multicast in a cooperative environment. In: Proc. of ACM SOSP (2003)
12. Dán, G., Fodor, V., Karlsson, G.: On the asymptotic behavior of end-point-based multimedia streaming. In: Proc. of Internat. Zürich Seminar on Communication (2006)
13. Dán, G., Fodor, V., Karlsson, G.: On the stability of end-point-based multimedia streaming. In: Proc. of IFIP Networking (May 2006) 678–690
14. Dán, G., Chatzidrossos, I., Fodor, V., Karlsson, G.: On the performance of error-resilient end-point-based multicast streaming. In: Proc. of IWQoS (June 2006) 160–168
15. Sripanidkulchai, K., Ganjam, A., Maggs, B., Zhang, H.: The feasibility of supporting large-scale live streaming applications with dynamic application end-points. In: Proc. of ACM SIGCOMM (2004) 107–120
16. Dán, G., Chatzidrossos, I., Fodor, V.: On the performance of multiple-tree-based peer-to-peer live streaming. In: Proc. of IEEE INFOCOM (May 2007)
17. Reed, I., Solomon, G.: Polynomial codes over certain finite fields. SIAM J. Appl. Math. 8(2) (1960) 300–304
18. Zegura, E.W., Calvert, K., Bhattacharjee, S.: How to model an internetwork. In: Proc. of IEEE INFOCOM (March 1996) 594–602
19. Veloso, E., Almeida, V., Meira, W., Bestavros, A., Jin, S.: A hierarchical characterization of a live streaming media workload. In: Proc. of ACM IMC (2002) 117–130
20. Sripanidkulchai, K., Maggs, B., Zhang, H.: An analysis of live streaming workloads on the Internet. In: Proc. of ACM IMC (2004) 41–54
21. Le Boudec, J.Y., Vojnovic, M.: Perfect simulation and stationarity of a class of mobility models. In: Proc. of IEEE INFOCOM (March 2004)
Path Selection Using Available Bandwidth Estimation in Overlay-Based Video Streaming

Manish Jain and Constantine Dovrolis

College of Computing, Georgia Institute of Technology
Abstract. IP networks present a challenging environment for video streaming because they do not provide throughput, jitter, or loss rate guarantees. In this work, we focus on improving the perceived quality of video streaming through dynamic path selection. Selecting one of several Internet paths is possible using multihoming and/or an overlay routing infrastructure. We conduct an experimental comparison of various measurement-based path selection techniques for video streaming. The path selection is based on the measurement of network-layer metrics, such as loss rate, jitter or available bandwidth, while the video quality is evaluated based on the VQM tool. Our experiments show that the most effective technique for adaptive path selection relies on an estimate of the lower bound of the available bandwidth variation range. We show how to perform such measurements using the video packets, eliminating the measurement overhead in the selected path. Finally, we show that adaptive path selection is more effective than a simple, but commonly used, form of FEC.
1 Introduction
As the “last mile” access capacity continues to grow, IP video streaming becomes more popular among users and content providers. Many experts believe that IPTV is the next “killer-application” in the Internet [7]. However, supporting video streaming and IPTV presents significant challenges. IP networks often suffer from several network impairments, including packet losses, significant jitter and one-way delays, as well as outages of unpredictable duration. Additionally, most IP networks today do not offer deterministic or statistical QoS guarantees. Since the early nineties, several approaches for adaptive video streaming applications have been proposed. One approach is to adjust the encoding scheme and/or video frame rate in response to changes in the network state [11,23,4]. The main drawback of such schemes, however, is that the perceived video quality varies with time, causing user dissatisfaction. Another class of approaches is to use proactive error correction techniques, such as Reed-Solomon FEC codes, or to retransmit lost packets through standard ARQ schemes [18,13,12]. The major drawback of FEC schemes is that they introduce bandwidth overhead even when the network does not drop packets. The drawback of retransmissions is that they require a playback delay of a few round-trip times, additional state/buffering at the sender, and reverse-path traffic (e.g., negative ACKs). Another approach
is to mask the effect of lost or discarded (i.e., late) packets through the use of codec-specific error concealment techniques [19,14]. However, the effectiveness of such techniques is limited.

Even though IP networks typically use a single path from one host to another, the recent popularity of multihoming and overlay networks allows content providers to choose between several network paths towards a given receiver [26]. Such path diversity gives video streaming one more adaptation option: to dynamically switch from one path to another depending on the observed (or predicted) performance of the candidate paths. This technology uses network-level measurements, such as loss or jitter, and it has been shown that it can quickly react to congested paths or outages [22,2]. A variation of this approach is to combine path diversity with Multi-Description Coding (MDC) techniques [5,3], and use multiple paths simultaneously. The studies in this area rely on loss rate, delay or TCP throughput measurements, and they typically perform these measurements using “dummy” probing packets.

In this work, we consider an overlay-based video streaming architecture in which the objective is to maximize the perceived video quality through dynamic overlay path selection. A novel aspect of our study is that the network measurements that drive the path selection process rely on available bandwidth (avail-bw) estimation. The avail-bw of a network path is defined as the residual capacity at the path's bottleneck, and so it represents the maximum additional load that the path can carry before it becomes saturated [9]. The reason we focus on avail-bw is that this metric can determine whether a path has enough capacity to carry a video stream before we switch the stream to that path. Other network-layer metrics, such as jitter or packet loss rate, can only determine whether a path is already congested, causing degradation in the video quality at the receiver [20,21,6]. We also show how to modify an existing avail-bw estimation technique, described in [9], so that the measurements are performed using application packets rather than “dummy” probing packets, eliminating the measurement overhead in the currently selected path.

We evaluate the video quality based on the VQM technique described in the ITU-T recommendation J.144 [1]. It has been shown that VQM is superior to other video quality metrics, such as PSNR, because the VQM score is more representative of the user-perceived video quality [15]. With a series of repeatable experiments in a controlled environment we compare the VQM score of path selection schemes based on jitter, loss rate, and various percentiles of the avail-bw distribution. The main result of this experimental study is that performing path selection based on the estimated lower bound of the avail-bw variation range performs significantly better in terms of VQM score, path switching frequency, and probability of aborting an ongoing video stream.

The rest of the paper is organized as follows. §2 presents an overlay-based video streaming architecture. §3 describes the path selection techniques that we evaluate. The experimental methodology is described in §4, and the results are presented in §5. §6 gives a brief comparison between adaptive path selection and a simple, but commonly used, form of FEC. We conclude in §7.
2 VDN Architecture and In-Band Measurements
In this section, we present the high-level architecture of an overlay-based video streaming architecture, referred to as the Video Distribution Network (VDN). A content provider constructs a VDN by deploying several overlay nodes that will act as either overlay ingress/egress nodes or as intermediate nodes (see Figure 1). Each VDN node runs a Measurement Module (M-module) to measure the performance (jitter, loss rate, avail-bw) of the overlay links from that VDN node to its neighbors. The VDN also runs a link-state protocol so that each node is aware of the latest state in all VDN links. We only consider VDN paths with at most one intermediate overlay node, based on the results of [26]; additional intermediate nodes are rarely needed in practice. The path of a video stream is determined at the ingress VDN node, i.e., we use source routing. The egress VDN node removes any VDN headers and delivers the stream to the receiver.
[Fig. 1. VDN architecture]
For the purposes of this paper, the most important aspect of the VDN architecture is the M-module. In our implementation, the M-Module relies on active measurement to estimate the packet loss rate, jitter, and the avail-bw variation range at a given overlay link. The loss rate is measured as the fraction of lost packets in a 1-Mbps stream of 1500-byte packets. The jitter is estimated as the maximum absolute difference between the spacing of consecutive packets at the receiver relative to their spacing at the sender, using the same probing stream. The avail-bw variation range is measured as described in our earlier work [9,10]. One feature of the M-Module is that it uses the application’s video packets to perform in-band network measurement. The in-band approach eliminates the measurement overhead, at least in the path that is currently selected for video streaming (we still use out-band measurements with empty probing packets in paths that do not transfer any video streams). Using video packets for the estimation of loss rate or jitter is relatively straightforward. The estimation of avail-bw, on the other hand, requires shaping the video stream at a different rate than the transmission rate at the sender. In the following paragraph we
describe how the M-module shapes the video stream to a particular rate at the input of each overlay link.

Suppose that N video packets of size L arrive at a VDN node at a rate Rv. The arrival time of the i-th packet is t^a_i. The objective of the local M-module is to shape those N packets at a probing rate Rp, so that it can measure whether Rp is smaller or larger than the avail-bw in the outgoing overlay link. To do so, the N incoming packets are delayed so that their output spacing is L/Rp. If t^d_i is the departure time of the i-th packet, then t^d_i − t^d_{i−1} = L/Rp and t^d_i = t^a_i + δ_i, where δ_i is the delay introduced to the i-th packet. For instance, if Rp > Rv, the M-module shapes the packet stream by introducing a decreasing amount of delay in successive packets; otherwise, the M-module introduces an increasing amount of delay in successive packets (see Figure 2). The M-module adds the value δ_i into the VDN header of packet i.

In some cases, the receiving application may demand that the video stream arrive at the rate Rv at which it was sent (e.g., for clock synchronization). In that case, the egress VDN node can reshape the video stream back to the initial transmission rate at the sender by delaying each packet by D − δ_i, where D is the maximum cumulative delay that can be introduced to a packet by the M-modules at the ingress and intermediate nodes, and δ_i is the corresponding cumulative delay that the i-th packet experienced.

[Fig. 2. Shaping of video packets to a probing rate Rp for avail-bw estimation]
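A sketch of the shaping computation follows; it is our own illustration, not the VDN implementation. Given the arrival times t^a_i, it spaces the departures exactly L/Rp apart and shifts the whole schedule by the smallest offset that keeps every δ_i non-negative; the minimal offset is our choice, since the text only fixes the output spacing.

def shaping_delays(t_a, L_bits, Rp):
    """Per-packet delays delta_i so the output spacing is L/Rp."""
    gap = L_bits / Rp
    cand = [t_a[0] + i * gap for i in range(len(t_a))]   # tentative departures
    shift = max(a - c for a, c in zip(t_a, cand))        # keep every delay >= 0
    return [c + shift - a for c, a in zip(cand, t_a)]    # carried in the VDN header

With Rp > Rv the returned delays decrease packet by packet, and with Rp < Rv they increase from zero, matching the behavior described around Figure 2.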
3 Path Selection Schemes
In this section, we describe the four path selection schemes we evaluate. The schemes are distinguished based on the choice of the key measured network performance metric. Loss based path selection (LPS): In LPS, we monitor the average loss rate in all candidate paths during 3-second periods. The path with the minimum loss rate is selected. If the currently used path has zero loss rate, then we do not switch to another path even if there are other loss-free paths.
Jitter based path selection (JPS): The jitter of successive packet pairs is also measured over 3-second periods. The path with the minimum 90th percentile of jitter measurements is selected. If the minimum jitter is practically the same in more than one path, then JPS selects the path with the lowest loss rate. If the loss rate is also equal, then JPS stays at the current path if that is one of the best paths, or it randomly picks one of the best paths otherwise.

Avail-bw based path selection (APS): This scheme has two variations. In the first, we use the average avail-bw (A-APS). In the second, we use the lower bound of the avail-bw variation range (L-APS) (see [10] for more details). A new avail-bw estimate is produced roughly every 3 seconds, similar to LPS and JPS. If the avail-bw estimate (average or lower bound) in the currently selected path is greater than twice the video transmission rate, then we stay in that path. Otherwise, we choose the path with the highest avail-bw estimate; note that this may still be the currently selected path.
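The decision logic of these schemes can be summarized in a few lines. The sketch below is illustrative only; the measurement field names are our own, and the JPS tie-breaking chain is omitted for brevity.

def select_path(current, meas, video_rate, scheme):
    """meas[p]: latest 3-second measurements for candidate path p."""
    if scheme == "LPS":
        if meas[current]["loss"] == 0:
            return current                        # never leave a loss-free path
        return min(meas, key=lambda p: meas[p]["loss"])
    # A-APS uses the average avail-bw, L-APS the lower bound of its range
    key = "abw_avg" if scheme == "A-APS" else "abw_low"
    if meas[current][key] > 2 * video_rate:       # 2x safety margin: stay put
        return current
    return max(meas, key=lambda p: meas[p][key])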
4 Experimental Setup
We have evaluated the performance of the previous path selection schemes with controlled experiments in the testbed of Figure 3. The video stream is transmitted from node B to E through either the direct path, or through the path that traverses node D. The video stream is a 2-minute clip formed by combining SMPTE test video sequences [8] and encoded in MPEG-2 with average rate 6 Mbps. The VLC player [24] transmits the stream to the network. The stream is initially routed through the direct path. The M-modules run at nodes B, D, and E, and they perform both network measurement and the path selection process. Both paths carry cross traffic and they are occasionally congested. The cross traffic in each path is generated by replaying NLANR packet traces, collected from various university access links [16]. The average cross traffic rate is set to the desired value by scaling the packet interarrivals by the appropriate factor.

We compare the performance of the various path selection algorithms based on three criteria: video quality, user-abort probability, and path switching frequency. The video quality is measured using the VQM tool [25], which implements the ITU-T J.144 recommendation [1]. VQM compares the original video stream with the received video stream, and it reports a metric between 0 and 1. Note that lower VQM scores correspond to better video quality. It has been shown that the VQM score correlates very well with the user-perceived video quality (MOS score). The VQM software supports five models to evaluate video quality, described in detail in the NTIA Handbook [17]. In this work, we use the television model. In the following graphs we report the minimum, average, and maximum VQM score from the five runs of each experiment. The five runs differ in terms of the initial phase between the video clip and the cross traffic traces.

The user-abort probability focuses on the short-term variations of the VQM score. The idea is that if the VQM score is too high (poor quality) during a time window, then the user would either abort the video stream or she would be
Path Selection Using Available Bandwidth Estimation
633
Node_A
Node_D Node_B INGRESS
Cross Traffic Node_E EGRESS
100 Mbps 100Mbps 1 Gbps
1 Gbps 100 Mbps (Tight Link)
Node_C
Cross Traffic 1 Gbps
1 Gbps
Node_F
Fig. 3. Testbed
unsatisfied. We measure the VQM score of consecutive 10-second video segments. A video stream is considered aborted if one of the following two conditions is met: either the VQM score in a single segment is larger than 0.55, or two consecutive video segments have VQM scores larger than 0.35. We chose these values based on extensive subjective tests of several video streams under different conditions. To estimate the user-abort probability, we measured the fraction of aborted video stream in 30 experiments. The last evaluation metric is the total number of path switching events. Even though the path switching frequency does not affect the video quality, it is an important aspect of any dynamic routing mechanism from the network operator’s perspective. Frequent path switching of large traffic volumes can affect the network stability and traffic engineering. Consequently, even though our primary interest is to optimize video streaming quality, we would also like to avoid unnecessary path switching.
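The abort rule translates directly into code; a minimal sketch over the per-segment VQM scores:

def aborted(segment_vqm):
    """True if a single 10-s segment exceeds 0.55, or two consecutive
    segments both exceed 0.35 (the thresholds from the text)."""
    for prev, cur in zip([0.0] + segment_vqm, segment_vqm):
        if cur > 0.55 or (cur > 0.35 and prev > 0.35):
            return True
    return False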
5 Results
Figures 5, 6 and 7 show the performance of the considered path selection schemes under different load conditions (i.e., utilization of the bottleneck link in each path). The PSC trace shown in Figure 4 is replayed at the direct path, while the FRG trace is replayed at the indirect path. We adjust the average rate of each trace to achieve the desired utilization. Even though we set the long-term average rate of the two traces at the same value, there are time periods where one path is congested while the other is not. Note that there are time periods, mainly in higher load conditions, where both paths are congested. Obviously, path switching techniques cannot avoid congestion in that case, but they can still choose the least congested path. Figure 5 shows the overall VQM score for each path selection scheme. LPS clearly performs poorly since it only reacts after congestion has affected the currently selected path. The A-APS scheme does not perform much better than LPS. The reason is that the average avail-bw does not capture the variability of
the avail-bw distribution, and so the two paths appear almost equally good most of the time. The JPS and L-APS schemes have comparable performance and they are clearly better than A-APS and LPS. This is because both JPS and L-APS are able to detect the onset of queuing delays in the currently selected path, before that path becomes congested. Note that L-APS is slightly better than JPS, especially in the case of 90% utilization.

Figure 6 shows the number of path switching events in the same set of experiments. The main observation is that L-APS has the lowest path switching frequency. JPS causes significantly more path changes, and is comparable to A-APS. This is because JPS relies on a comparison of the maximum jitter in the two paths, and so a minor variation in the jitter, which can result from a short-lived cross-traffic burst, may trigger JPS to switch paths. Instead, L-APS does not switch paths if the currently selected path provides a large safety margin, in terms of avail-bw, for the given video stream rate.

Figure 7 shows the user-abort probability, i.e., the fraction of aborted video streams. The ranking of the four path selection schemes is as in the case of the long-term VQM score in Figure 5.

[Fig. 4. Examples of stationary and non-stationary NLANR cross traffic traces]
[Fig. 5. VQM scores for the four path selection schemes]
We next show some results for the traces shown in Figure 8. These traces include instances of traffic non-stationarity. In the PSC trace, the traffic rate varies abruptly between 20 Mbps and 45 Mbps during the second half (level shifts). On the other hand, the AMP trace exhibits periods of slowly increasing traffic load (notice the traffic “ramp”). We are interested in examining the effectiveness of the considered path selection schemes under such traffic conditions. In these experiments, we use the same trace in both paths. In one path, the trace playback starts from the beginning, while in the other path the trace playback starts from the middle.
LPS A-APS JPS L-APS
60
70
80
90
100
Utilization (%)
Fig. 6. Path switching frequency
Abort Fraction (%)
24 20 16
LPS A-APS JPS L-APS
12 8 4 0 50
60
70
80
90
100
Utilization (%)
Fig. 7. User-abort probability
Figures 9 and 10 show the VQM scores for the PSC and AMP traces, respectively. Note that, overall, the level shifts of the PSC trace cause higher VQM scores compared to the smoother AMP trace. L-APS performs clearly better in this case than the JPS scheme. Note that there is a difference in the relative performance of JPS and A-APS in the two traces. The reason is that, even though
636
M. Jain and C. Dovrolis 60 40
Rate (Mbps)
20 0 0 60
20
PSC-1122856626 60 80
40
AMP-1122897844
40 20 0 0
20
40
60
80
Time (sec)
Fig. 8. Cross traffic traces with instances of non-stationarity 1 0.9 0.8
VQM
0.7 0.6
LPS A-APS JPS L-APS
0.5 0.4 0.3 0.2 0.1 0 50
60
70
80
90
100
Utilization (%)
Fig. 9. VQM scores for the PSC trace 1 0.9 0.8
VQM
0.7 0.6
LPS JPS A-APS L-APS
0.5 0.4 0.3 0.2 0.1 0 50
60
70
80
90
100
Utilization (%)
Fig. 10. VQM scores for the AMP trace
JPS is more proactive than A-APS in switching paths, in the AMP trace the A-APS scheme performs slightly better because it can detect the slowly decreasing level of avail-bw during the traffic ramp.
6 Path Switching Versus FEC
In this section, we conduct a preliminary comparison between path switching techniques and FEC-based loss recovery. A commonly used FEC scheme is the Reed-Solomon (RS) code [18,13,12]. In an (n, k)-RS code, n − k out of n packets carry FEC data. An (n, k) RS-code can recover from all losses in a block of n packets if at least k of those n packets are received. The main drawback of FEC-based schemes is their transmission overhead (n − k)/k.

[Fig. 11. Path switching versus FEC]
Here, we evaluate the simplest form of FEC in which n = k + 1. This instance of RS-coding is equivalent to sending a single parity packet after every k data packets. We compare the VQM score of this technique with L-APS for two load conditions: 70% and 80% bottleneck utilization using the NLANR traffic trace (BWY-1063315231). Figure 11 shows the results for different values of k. At the far left, k = 19 corresponds to a transmission overhead of about 5%, while at the far right, k = 1 corresponds to 100% overhead. Note that path switching with L-APS performs consistently better than FEC, except in the case of k = 1. The main reason is that path switching can often avoid congestion altogether, while FEC attempts to recover from the effects of congestion, which is not always possible. Additionally, the FEC scheme we evaluate here is not effective in dealing with the bursty nature of congestion-induced packet losses.
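For concreteness, the (k+1, k) code amounts to XOR parity, which repairs exactly one erasure per block. A minimal sketch with equal-length byte-string packets (our illustration, not the paper's implementation):

def add_parity(block):
    """Append one XOR parity packet to k data packets."""
    parity = bytes(len(block[0]))
    for pkt in block:
        parity = bytes(a ^ b for a, b in zip(parity, pkt))
    return block + [parity]

def repair(block):
    """Rebuild a single lost packet (marked None) by XOR-ing the rest."""
    lost = [i for i, p in enumerate(block) if p is None]
    if len(lost) != 1:
        return block                  # nothing lost, or more than one: give up
    rest = [p for p in block if p is not None]
    fixed = bytes(len(rest[0]))
    for pkt in rest:
        fixed = bytes(a ^ b for a, b in zip(fixed, pkt))
    block[lost[0]] = fixed
    return block

This also illustrates why the scheme fares poorly under bursty congestion losses: two or more losses in the same block are unrecoverable.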
7 Summary
This work focused on the use of measurement-driven path selection techniques for video streaming. We showed that if the path selection is driven by a conservative estimate of the available bandwidth, then the resulting video streaming performance is significantly improved compared to other commonly used network metrics. An interesting open problem is to develop path switching mechanisms that are driven by direct video quality measurements at the receiver. Another open problem is to evaluate the performance of a hybrid approach using both FEC and path selection mechanisms.
Acknowledgement. This work was supported in part by a research grant from EGT Inc. We are thankful to Nikil Jayant from Georgia Tech., and Junfeng Bai, John Hartung and Santhana Krishnamachari from EGT Inc. for the valuable discussions.
References

1. Objective Perceptual Video Quality Measurement Techniques for Digital Cable Television in the Presence of Full Reference. ITU-T Recommendation J.144 rev. 1, 2003.
2. Y. Amir, C. Danilov, S. Goose, D. Hedqvist, and A. Terzis. 1-800-OVERLAYS: Using Overlay Networks to Improve VoIP Quality. In Proceedings of NOSSDAV, 2005.
3. J. Apostolopoulos, T. Wong, W. Tan, and S. Wee. On Multiple Description Streaming with Content Delivery Networks. In Proceedings of INFOCOM, 2002.
4. A. Balk, D. Maggiorini, M. Gerla, and M. Y. Sanadidi. Adaptive MPEG-4 Video Streaming with Bandwidth Estimation. In Proceedings of QoS-IP, 2003.
5. A. Begen, Y. Altunbasak, O. Ergun, and M. Ammar. Multi-path Selection for Multiple Description Video Streaming over Overlay Networks. Signal Processing: Image Communication, 20:39–60, 2005.
6. J. Boyce and R. Gaglianello. Packet Loss Effects on MPEG Video Sent Over the Public Internet. In Proceedings of Multimedia, 1998.
7. S. Cherry. The Battle for Broadband. IEEE Spectrum, 42(1):24–29, Jan. 2005.
8. C. Fenimore. Mastering and Archiving Uncompressed Digital Video Test Materials. In Proceedings of the 142nd SMPTE Technical Conference, 2000.
9. M. Jain and C. Dovrolis. End-to-End Available Bandwidth: Measurement Methodology, Dynamics, and Relation with TCP Throughput. IEEE/ACM Transactions on Networking, 11(4):537–549, Aug. 2003.
10. M. Jain and C. Dovrolis. End-to-End Estimation of Available Bandwidth Variation Range. In Proceedings of SIGMETRICS, June 2005.
11. K. Jeffay, D. L. Stone, T. Talley, and F. D. Smith. Adaptive, Best-Effort Delivery of Digital Audio and Video Across Packet-Switched Networks. In Network and Operating System Support for Digital Audio and Video, 1993.
12. W. Jiang and H. Schulzrinne. Comparison and Optimization of Packet Loss Repair Methods on VoIP Perceived Quality under Bursty Loss. In Proceedings of NOSSDAV, 2002.
13. L. Kontothanassis, R. Sitaraman, J. Wein, D. Hong, R. Kleinberg, B. Mancuso, D. Shaw, and D. Stodolsky. A Transport Layer for Live Streaming in a Content Delivery Network. Proceedings of the IEEE, 92, 2004.
14. W. M. Lam and A. Reibman. An Error Concealment Algorithm for Images Subject to Channel Errors. IEEE Transactions on Image Processing, 4(5):533–542, May 1995.
15. X. Lu, S. Tao, M. E. Zarki, and R. Guerin. Quality-Based Adaptive Video Over the Internet. In Proceedings of CNDS, 2003.
16. NLANR MOAT. Passive Measurement and Analysis. http://pma.nlanr.net/PMA/, 2006.
17. M. Pinson and S. Wolf. NTIA HB 06-434: In-Service Video Quality Metric Users Manual. http://www.its.bldrdoc.gov/pub/ntia-rpt/06-434/, 2005.
18. I. Reed and G. Solomon. Polynomial Codes over Certain Finite Fields. Journal of the Society for Industrial and Applied Mathematics, 8(2), 1960.
19. P. Salama, N. Shroff, E. Coyle, and E. Delp. Error Concealment Techniques for Encoded Video Streams. In Proceedings of the IEEE International Conference on Image Processing, pages 9–12, 1995.
20. W. Tan and A. Zakhor. Real-Time Internet Video Using Error Resilient Scalable Compression and TCP-Friendly Transport Protocol. IEEE Transactions on Multimedia, 1(2), 1999.
21. S. Tao, J. Apostolopoulos, and R. Guerin. Real-Time Monitoring of Video Quality in IP Networks. In Proceedings of NOSSDAV, 2005.
22. S. Tao and R. Guerin. Application-Specific Path Switching: A Case Study for Streaming Video. In Proceedings of the ACM International Conference on Multimedia, 2004.
23. B. Vandalore, W.-c. Feng, R. Jain, and S. Fahmy. A Survey of Application Layer Techniques for Adaptive Streaming of Multimedia. Real-Time Imaging, (3), 2001.
24. VLC Media Player. http://www.videolan.org/vlc, 2006.
25. S. Wolf. VQM Software. http://www.its.bldrdoc.gov/n3/video/vqmsoftware.htm, 2006.
26. Y. Zhu, C. Dovrolis, and M. Ammar. Dynamic Overlay Routing Based on Available Bandwidth Estimation: A Simulation Study. Computer Networks Journal, 2006.
Fundamental Tradeoffs in Distributed Algorithms for Rate Adaptive Multimedia Streams

Vilas Veeraraghavan and Steven Weber

Drexel University, Department of ECE, Philadelphia, PA 19104
[email protected], [email protected]
Abstract. Rate adaptive multimedia streams are streaming media connections where the encoding rate is adjusted dynamically (with corresponding changes in media content resolution) in response to changing levels of congestion along the connection. The field of optimization based congestion control has yielded sophisticated distributed algorithms for resource allocation among competing elastic streams. In this work we study the fundamental tradeoffs for a class of optimization based distributed algorithms for rate adaptive streams, building on our earlier work. We focus on three tradeoffs: i) the tradeoff between maximizing client average quality of service (QoS) and client fairness, ii) the tradeoff between granularity of control (both temporal and spatial) and QoS, and iii) the tradeoff between maximizing the received volume and minimizing the fluctuations in received rate. These tradeoffs are illustrated through extensive simulation results using ns-2.
1 Introduction

1.1 Optimal Congestion Control for Rate Adaptive Streams
Optimal congestion control for elastic traffic is a mathematically tractable optimization problem because of both the separable nature of the objective function (sum user utility) and the assumption that the individual user utility functions are strictly concave increasing. As pointed out by Shenker [1], the utility function for a rate adaptive stream will be concave increasing but will have a convex increasing neighborhood around zero, capturing the fact that even rate adaptive media has an associated minimum required rate for satisfactory service quality. This convex neighborhood around zero complicates the mathematical analysis substantially. Recent work by Lee, Mazumdar, and Shroff [2] and Chiang, Zhang, and Hande [3] has thoroughly discussed solutions to optimization problems of this type. The work by Lee et al. uses the theory of subdifferentials to obtain traction on the non-convex optimization problem, although the possible existence of a duality gap limits the value of this approach in the absence of further restrictions. They propose the “self-regulating” property, which essentially requires users make reasonable allocation requests that ensure non-negative net utility (benefit minus costs).
This work is supported by the NSF under grant 0435247.
The work by Chiang et al. takes a different focus by identifying necessary and sufficient conditions under which the distributed algorithms designed for concave utility functions converge to globally optimal resource allocations even when the actual utility functions are not concave. Furthermore, the authors identify several practical issues that complicate deployment of distributed algorithms for rate adaptive streams: timescale and causality. In particular, they establish that prices need to be generated on a faster time scale than required for elastic traffic, and that optimal prices at time t may depend upon optimal prices at a future time t′ > t. The focus of [2,3] is on the mathematical complexities associated with non-concave optimization problems and their corresponding distributed solutions. Our focus, instead, is on three fundamental tradeoffs not explicitly addressed in [2,3]: optimality versus fairness, granularity of control, and received volume versus rate fluctuations (discussed in detail below).
1.2 Three Fundamental Tradeoffs
Our model of a rate adaptive stream is based on the notion of stream volume (v), defined as the product of the maximum desired bit rate at full resolution (s^max) times the stream duration (d). For simplicity, we assume the maximum desired bit rate is a constant, even though in the common case of VBR encoding of video the maximum rate will be time-varying. Thus v^max = s^max d is the total number of bits associated with the stream. By employing dynamic rate adaptation, corresponding to a time-varying instantaneous received rate s(t) ≤ s^max, the client will receive a total number of bits v = \int_0^d s(t)\,dt.

Quality of service. We study two distinct quality of service measures, the time average utility and the rate of adaptation, which serve as a proxy for client-perceived media quality. The first metric, q, is the time average (over the stream duration) of the instantaneous utility, u(x), where instantaneous utility is measured as a function of the normalized received rate s(t)/s^max. Note that in the case where u(x) = x (linear utility), we have q = v/v^max, the fraction of bits received. The rate of adaptation, r, is simply the time average rate of change of the instantaneous received rate, i.e., the sum of the magnitudes of the changes in rate divided by the duration. We have selected these two QoS metrics because they each capture an important part of the overall rate adaptive streaming media client experience. In particular, q captures the fact that higher instantaneous rates yield higher instantaneous utility, while r captures the fact that fluctuations in the encoding level are both aurally and visually distracting.

Tradeoff #1: optimality and fairness. We assume the primary objective of the network is to maximize the client average time average utility. Since each stream is counted equally in computing the client average, it follows that the client average is maximized by giving preferential treatment to small volume users. Intuitively, allocating resources to small volume streams goes proportionally further in improving their QoS than does allocating resources to large volume streams. We term a resource allocation policy that attempts to maximize
client average QoS by giving preferential treatment to small clients a volume discriminatory policy. It is clear that, although small volume streams are satisfied under such an allocation, users requesting large volume streams (say, of high rate streaming video content) will bristle at the policy. Fundamentally the tension can be seen as that between optimality and fairness.

Tradeoff #2: granularity of control and QoS. Dynamic rate adaptation can be done in real time by on-the-fly re-encoding of the media content to yield the desired bit rate, or “stored” by selecting among one of a discrete set of available encodings. The tradeoff is roughly this: real-time encoding is computationally burdensome and therefore infeasible for large scale media servers, but has the singular benefit that the transmission rate can be tuned at an arbitrarily fine granularity of control. Stored encoding is more scalable computationally (although storage may become an issue), but the available transmission rates are limited to the stored encodings. We call the number of encodings the spatial granularity of control. A second natural tradeoff for any distributed algorithm is that between the optimality of the resource allocation obtained by the algorithm and the time between state updates. Clearly the performance of the algorithm improves with the timeliness of the feedback. This improvement may slow for very rapid state updates, as the time scale of the updates becomes much faster than the time scale of the changes in system state. We call the time between updates the temporal granularity of control.

Tradeoff #3: time average utility and rate of adaptation. There is an inherent tension between the objective of maximizing the time average utility (q) and the objective of minimizing the rate of adaptation (r). As mentioned above, both q and r are important in that q captures the notion that higher resolution encodings yield higher user satisfaction, while r captures the notion that changes in encoding level detract from the user experience. It is intuitively clear that the highest received rate is obtained by an algorithm that is capable of instantly adjusting the encoding level to match the available capacity on the link. Such an algorithm will clearly maximize q, but in an environment where the available capacity is changing rapidly, it will also incur a possibly unacceptably high rate of adaptation, r. Of course minimizing r is easy: simply avoid dynamic rate adaptation altogether, which has the cost of foregoing the significantly higher average received rate obtainable through dynamic adaptation.
2 Controllers for Rate Adaptive Streams
The controllers presented in this section are developed in our earlier work [4] (that work does not discuss the three tradeoffs which form the heart of this work). The controllers are similar to but distinct from the controllers developed by [2,3]. The first distinction is that we assume an admission control mechanism limits the traffic on the link such that each stream is assured of receiving its minimum granularity encoding, and that the utility function is concave for
encodings above the associated minimum rate; this removes the focus on the convex neighborhood around zero which is central to [2,3]. Second, our algorithm is designed to maximize our primary QoS metric, the time average utility q, whereas the algorithms in [2,3] are designed to maximize the instantaneous utility. Third, we emphasize the use of a near-optimal discrete controller, which selects the encoding level among the discrete set of available encodings, whereas [2,3] focus on continuous controllers.
2.1 Network Model, Stream Model, and QoS Metrics
Network model. We let $L$ denote the set of links in the network, and the vector $c = (c_l, l \in L)$ denote the capacities of those links. We assume that the streaming traffic is given priority over the best-effort traffic on the network, so that the entire link capacity is available to the streaming traffic. We recognize this is a major assumption, but our focus in this work is not on the co-existence of streaming and elastic traffic. Each client-server pair is identified with a unique and fixed route through the network. Let $R$ denote the set of routes, where a route $r$ is composed of a set of links $\{l : l \in r\}$. The vector $\lambda = (\lambda_r, r \in R)$ denotes the arrival rate of new stream requests on each route. We index the admitted streams on each route, so that $(i, r)$ denotes the $i$th admitted stream on route $r$.

Stream model. We model a rate adaptive stream by five quantities: i) stream duration ($d$), ii) minimum subscription level ($s^{\min}$), iii) maximum subscription level ($s^{\max}$), iv) the instantaneous normalized rate utility function $u : [0, 1] \to [0, 1]$, and v) the weight ($w$), reflecting the relative importance of the stream. All five quantities will in general be stream-dependent. Each stream $(i, r)$ has its individual minimum and maximum subscription levels denoted by $(s^{\min}_{i,r}, s^{\max}_{i,r})$. We assume the utility function for each client, $u_{i,r} : [0, 1] \to [0, 1]$, is a twice differentiable strictly concave increasing function with a convex neighborhood around zero. The argument of the utility function is the fractional rate received, i.e., if the client receives rate $s_{i,r}$ then the utility is $u_{i,r}(s_{i,r}/s^{\max}_{i,r})$. We define $s^{\min}_{i,r}$ as the rate where the utility function switches from convex to concave. We consider both continuous and discrete controllers. A continuous controller is capable of creating an encoding of any desired rate $s_{i,r} \in [s^{\min}_{i,r}, s^{\max}_{i,r}]$ "on the fly". A discrete controller is capable of using any of a set $S_{i,r} = \{s^{\min}_{i,r}, \ldots, s^{\max}_{i,r}\}$ of "stored" encodings. The admission control rule is this: admit a new stream $i$ on route $r$ as long as there is sufficient capacity to satisfy the minimum rate requirements of the previously admitted streams as well as that of the stream seeking admission:

$$s^{\min}_{i,r} + \sum_{r' : \, l \in r'} \; \sum_{j=1}^{n_{r'}} s^{\min}_{j,r'} \le c_l, \qquad l \in r, \qquad (1)$$

where $n_{r'}$ is the number of active streams on route $r'$ at the time of request. Note that the admission process is completely separate from the allocation process.
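As a concrete illustration of rule (1), the following sketch checks a new stream against the residual minimum-rate capacity on each link of its route. This is our own illustrative code, not taken from [4]; all names and the data layout are assumptions.

```python
# Sketch of the admission rule (1): a new stream is admitted only if every
# link on its route can carry the minimum rates of all previously admitted
# streams plus the newcomer's minimum rate. Illustrative data layout.

def admit(new_s_min, route, capacity, active_min_rates):
    """new_s_min: minimum rate of the arriving stream (kbps).
    route: iterable of link ids traversed by the stream.
    capacity: dict link id -> capacity c_l (kbps).
    active_min_rates: dict link id -> list of minimum rates of streams
    already admitted whose routes traverse that link."""
    for l in route:
        if new_s_min + sum(active_min_rates.get(l, [])) > capacity[l]:
            return False  # link l cannot guarantee all minimum rates
    return True

# Example: single bottleneck link of 3520 kbps, streams with 352 kbps minimum.
cap = {0: 3520}
print(admit(352, [0], cap, {0: [352] * 9}))   # True: exactly fills the link
print(admit(352, [0], cap, {0: [352] * 10}))  # False: would exceed c_l
```

Note that the check uses only minimum rates, consistent with the text: admission guarantees feasibility of the minimum encodings, while the allocation process divides the remaining capacity.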
Quality of service metrics. The first quality of service metric is the time average utility:

$$q_{i,r} = \frac{w_{i,r}}{d_{i,r}} \int_{a_{i,r}}^{a_{i,r}+d_{i,r}} u\!\left(s_{i,r}(t)/s^{\max}_{i,r}\right) dt, \qquad (2)$$

where $a_{i,r}$ is the admission time, $d_{i,r}$ is the duration, and $w_{i,r}$ is an assigned weight. The second metric is the rate of adaptation:

$$r_{i,r} = \frac{1}{d_{i,r}} \int_{a_{i,r}}^{a_{i,r}+d_{i,r}} \left| s_{i,r}(t) - s_{i,r}(t^+) \right| dt. \qquad (3)$$

Note that $r$ is used both to indicate a route and the rate of adaptation; the meaning will be clear from context. We let $q$ be the primary QoS metric, and $r$ be the secondary QoS metric. Thus when we speak of maximizing QoS we will always mean maximizing $q$.
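For concreteness, the sketch below evaluates (2) and (3) for a piecewise-constant subscription trajectory, interpreting (3) as the total variation of $s_{i,r}(t)$ per unit time; the discretization, the caller-supplied utility (identity by default for readability), and the names are our own assumptions.

```python
# Evaluating the QoS metrics (2) and (3) for a piecewise-constant
# subscription trajectory s(t). Our own discretization for illustration.

def qos_metrics(times, levels, s_max, d, w=1.0, u=lambda x: x):
    """times: change points [t_0=a, t_1, ..., t_k] with t_k = a + d.
    levels: subscription level held on each interval [t_j, t_{j+1}).
    Returns (q, r): time average utility and rate of adaptation."""
    q = 0.0
    for (t0, t1), s in zip(zip(times, times[1:]), levels):
        q += u(s / s_max) * (t1 - t0)
    q *= w / d
    # rate of adaptation: total variation of s(t) over the stream, per (3)
    r = sum(abs(b - a) for a, b in zip(levels, levels[1:])) / d
    return q, r

# A stream of duration 60 s that switches once between its two encodings.
print(qos_metrics([0, 30, 60], [352, 704], s_max=704, d=60))
```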
2.2 Continuous Rate Controller
In our earlier work [4] we show that the weighted client average QoS is maximized provided the resource allocation at each point in time $t$ is the solution of a weighted sum utility optimization problem:

$$\max_{s \in S} \left\{ \sum_{(i,r) \in N(t)} \frac{w_{i,r}}{d_{i,r}}\, u_{i,r}\!\left(\frac{s_{i,r}}{s^{\max}_{i,r}}\right) \;:\; \sum_{r \ni l}\,\sum_{i \in N_r} s_{i,r} \le c_l, \; l \in L \right\}, \qquad (4)$$
where $S$ is the set of all feasible allocations for active streams, $N(t)$ is the set of active streams at time $t$, and $N_r$ is the set of active streams on route $r$. This objective plays the same role as the SYSTEM problem originally formulated by Kelly in [5]. Recall our assumption that the optimization must ensure each stream receives its minimum encoding rate or higher. This assumption, and the definition of the minimum rate as the point at which the utility function switches from convex to concave, allow us to apply Kelly's distributed algorithm framework in [6] to the above problem. The resulting controller is:

$$\dot{s}_{i,r}(t) = \kappa\, s_{i,r}(t) \left( \frac{w_{i,r}}{v_{i,r}}\, u'_{i,r}\!\left(\frac{s_{i,r}(t)}{s^{\max}_{i,r}}\right) - p_r(t) \right), \quad (i,r) \in N(t). \qquad (5)$$

As mentioned in the introduction, this controller is of the same canonical form as that proposed by Kelly et al. in [6], where $\kappa$ is the gain constant and $p_r(t)$ is the route price, assumed to be additive over the instantaneous link costs comprising the route. The only difference from Kelly's formulation is that we require that the continuous controller maintain a rate in the interval $[s^{\min}_{i,r}, s^{\max}_{i,r}]$. Thus we set $\dot{s}_{i,r}(t) = 0$ if either the route price $p_r(t)$ is high and $s_{i,r}(t) = s^{\min}_{i,r}$, or the route price is low and $s_{i,r}(t) = s^{\max}_{i,r}$. The volume dependent continuous controller sets each $w_{i,r} = w$, while the volume independent controller sets each $w_{i,r} = v_{i,r}$. To encourage fair comparison between these two controllers we select the weight $w$ such that $\mathbb{E}[w/V] = 1$, where the expectation is taken with respect to the distribution of the volume of the admitted streams.
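A minimal sketch of one Euler update of controller (5), including the clipping at $s^{\min}$ and $s^{\max}$ described above; the sigmoid derivative anticipates the utility (9) used later, and the step size, price value, and names are our assumptions.

```python
# One Euler step of the continuous controller (5), clipped to
# [s_min, s_max] as described in the text. The sigmoid utility derivative
# and the step size are our illustrative choices.
import math

def u_prime(x, rho=3.0, sigma=10.0, gamma=0.5):
    """Derivative of u(x) = 1/(1 + rho*exp(-sigma*(x - gamma)))."""
    e = rho * math.exp(-sigma * (x - gamma))
    return sigma * e / (1.0 + e) ** 2

def controller_step(s, route_price, s_min, s_max, w, v, kappa=0.1, dt=0.1):
    """Advance s_{i,r} by dt according to (5); freeze at the boundary
    when the update would push the rate outside [s_min, s_max]."""
    ds = kappa * s * ((w / v) * u_prime(s / s_max) - route_price)
    return min(max(s + ds * dt, s_min), s_max)

s = 500.0
for _ in range(5):
    s = controller_step(s, route_price=0.001, s_min=352.0, s_max=704.0,
                        w=1.0, v=1.0)
print(round(s, 1))
```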
2.3 Discrete Rate Controller
Introduced in [4], our proposed discrete controller works as follows. Each stream runs a virtual controller that computes $\dot{s}^{vir}_{i,r}(t)$ from the continuous controller of the previous section, and from this computes $s^{vir}_{i,r}(t)$ in response to the updated route prices $p_r(t)$. Each stream employs a pair of thresholds $(z^{\min}_{i,r}, z^{\max}_{i,r})$ such that

$$s^{\min}_{i,r} \le z^{\min}_{i,r} \le \frac{s^{\min}_{i,r} + s^{\max}_{i,r}}{2} \le z^{\max}_{i,r} \le s^{\max}_{i,r}. \qquad (6)$$

The subscription level changes according to the following rule:

$$s^{\min}_{i,r} \Rightarrow s^{\max}_{i,r} \quad \text{if } s^{vir}_{i,r}(t) > z^{\max}_{i,r}, \qquad s^{\max}_{i,r} \Rightarrow s^{\min}_{i,r} \quad \text{if } s^{vir}_{i,r}(t) < z^{\min}_{i,r}. \qquad (7)$$

Note that when we set

$$z^{\min}_{i,r} = \frac{s^{\min}_{i,r} + s^{\max}_{i,r}}{2} = z^{\max}_{i,r}, \qquad (8)$$

the above algorithm simply selects the available subscription level nearest to the virtual subscription level. Setting the min and max thresholds strictly below and above the median subscription level serves to retard the frequency of subscription level changes. In particular, if a stream is low then the virtual subscription level must actually rise above $z^{\max}_{i,r}$ to induce an increase. Similarly, if a stream is high then the virtual subscription level must drop below $z^{\min}_{i,r}$ to induce a decrease. This serves as a hysteresis mechanism to retard the fluctuations in subscription level which have an adverse effect on the rate of adaptation metric.
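The hysteresis rule (7) is simple enough to state in a few lines of code; the following sketch is our own illustration, with threshold values chosen arbitrarily inside the interval prescribed by (6).

```python
# The hysteresis rule (7): the stored encoding flips between s_min and
# s_max only when the virtual (continuous) level crosses the outer
# thresholds. Names and threshold placement are illustrative.

def discrete_update(current, s_virtual, s_min, s_max, z_min, z_max):
    """Return the new stored encoding given the virtual controller level."""
    if current == s_min and s_virtual > z_max:
        return s_max  # low stream: virtual level must exceed z_max to rise
    if current == s_max and s_virtual < z_min:
        return s_min  # high stream: virtual level must fall below z_min
    return current    # otherwise hold, suppressing rapid oscillation

s_min, s_max = 352.0, 704.0
z_min, z_max = 470.0, 590.0      # strictly inside (s_min, s_max)
level = s_min
for virt in [520, 560, 600, 540, 500, 460]:
    level = discrete_update(level, virt, s_min, s_max, z_min, z_max)
    print(virt, "->", level)
```

The trace shows that a single excursion of the virtual level across a threshold triggers at most one switch, which is exactly the damping effect on the rate of adaptation described above.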
3 Three Fundamental Tradeoffs
We study the three fundamental tradeoffs in distributed algorithms for rate adaptive streams. We present simulation results obtained by implementing the continuous and discrete controllers from the previous section in ns-2. Due to space constraints we restrict our attention in this section to a single link model of capacity c (kbps).

User utility function. Similar to our example in [4], we presume that all streams employ a common (sigmoid) utility function:

$$u(x) = \frac{1}{1 + \rho e^{-\sigma(x - \gamma)}}, \qquad \rho > 0, \; \sigma > 0, \; \gamma \in (0, 1). \qquad (9)$$
Recall that the argument of the utility function is the fractional rate $x = s/s^{\max}$. Thus $u$ has a convex neighborhood around zero extending to $x = \gamma$, and is then concave for $x > \gamma$. The parameter $\sigma$ governs the shape. The parameter $\rho$ governs the height of the function for $x$ near $\gamma$. We have selected $\gamma = 1/2$, $\rho = 3$, and $\sigma = 10$. This means each stream has a minimum rate that is 50% of its
maximum rate, i.e., $s^{\min}/s^{\max} = 1/2$. Moreover, the minimum rate is presumed to account for approximately 50% of the possible quality or satisfaction, since $u(s^{\min}/s^{\max}) = u(\gamma) = 1/(1+\rho) = 1/2$, while $u(1) \approx 1$.

Maximum subscription level and stream duration. Streaming media content varies widely in both the maximum subscription level (small bit rates for audio content, high bit rates for HD video content) and the duration (short durations for songs, long durations for movies). To capture this diversity in the simplest manner possible, we employ an elephants-and-mice model where both the maximum subscription level, $S^{\max}_{i,r}$, and the stream duration, $D_{i,r}$, are Bernoulli random variables. The notation $X \sim \mathrm{Ber}(s, l, p)$ denotes that the random variable $X$ is Bernoulli with $p = P(X = s) = 1 - P(X = l)$, where we think of $s$ for small and $l$ for large. Define the constants

$$\hat{s}^{\max,\min} = 128, \quad \hat{s}^{\max,\max} = 1280, \quad p_s = 0.5, \quad \sigma = 704, \quad \hat{d}^{\min} = 60, \quad \hat{d}^{\max} = 600, \quad p_d = 0.5, \quad \delta = 330. \qquad (10)$$
All rates are in kbps and all durations are in seconds. Define the volume diversity parameter $a \in [0, 1]$ and the diversity spread functions

$$s^{\max,\min}(a) = (1-a)\sigma + a\hat{s}^{\max,\min}, \qquad s^{\max,\max}(a) = (1-a)\sigma + a\hat{s}^{\max,\max},$$
$$d^{\min}(a) = (1-a)\delta + a\hat{d}^{\min}, \qquad d^{\max}(a) = (1-a)\delta + a\hat{d}^{\max}.$$

For fixed $a$, we set

$$S^{\max}_{i,r}(a) \sim \mathrm{Ber}(s^{\max,\min}(a), s^{\max,\max}(a), p_s), \qquad D_{i,r}(a) \sim \mathrm{Ber}(d^{\min}(a), d^{\max}(a), p_d).$$

First observe that the volume diversity parameter does not affect the mean for either $S$ or $D$:

$$\mathbb{E}[S^{\max}_{i,r}(a)] = \sigma, \qquad \mathbb{E}[D(a)] = \delta, \qquad a \in [0, 1]. \qquad (11)$$

Note that for $a = 0$ the Bernoulli values coincide, i.e., $s^{\max,\min}(0) = s^{\max,\max}(0) = \sigma$ and $d^{\min}(0) = d^{\max}(0) = \delta$, while for $a = 1$ the Bernoulli values take on their extreme values: $s^{\max,\min}(1) = \hat{s}^{\max,\min}$, $s^{\max,\max}(1) = \hat{s}^{\max,\max}$, and $d^{\min}(1) = \hat{d}^{\min}$, $d^{\max}(1) = \hat{d}^{\max}$. Thus increasing $a$ from 0 to 1 increases the diversity of stream volumes found on the link while not affecting the mean volume $\mathbb{E}[V] = \sigma\delta$. Recall that for the volume dependent algorithm we select $w$ such that $\mathbb{E}[w/V] = 1$; for the current model this yields $w = (400/121)\sigma\delta$. Finally, for the discrete controller our default selection (aside from Figure 2) is to use $K = 2$ encodings: $S = \{s^{\min}, s^{\max}\}$.

Link capacity and loading. We will devote significant attention to studying the QoS under the controllers as the link capacity is varied (while the arrival rate is held constant). Note that when the capacity per stream is near or smaller than the average minimum required rate per stream, the typical stream will spend most of its tenure at its minimum rate. If the capacity per stream is at or exceeds the average maximum rate per stream, then the typical stream will spend most of its time at its maximum rate. With this in mind we parameterize the link capacity as $c = m\lambda\sigma\delta$, where $\lambda$ is the arrival rate (assumed Poisson). Note that, barring blocking, the average number of streams on the link is $\mathbb{E}[|N_t|] = \lambda\delta$ (by Little's Law), and as such the capacity per stream is $c/(\lambda\delta) = m\sigma$. At $m = 1$ the capacity per stream matches the typical maximum subscription level. Recalling that the user utility function sets $s^{\min}/s^{\max} = 1/2$, we see that at $m = 1/2$ the capacity per stream matches the typical minimum subscription level. Following [7], we term the regime $m \in [0, 1/2]$ the overloaded regime, $m \in [1/2, 1]$ the rate adaptive regime, and $m \in [1, \infty)$ the underloaded regime. We have selected $\lambda$ so that on average (barring blocking) there are 10 streams sharing the link, thus $\lambda = 10/\delta$. It follows that the link capacity at the scaling threshold $m = 1/2$ is $c = m(\lambda\delta)\sigma = 1/2 \cdot 10 \cdot 704 = 3520$, while at $m = 1$ it is 7040. In our capacity plots we will vary $m$ from 0.1 to 1.2. When the link capacity is fixed, we will use $m = 1/2$, which corresponds to provisioning the link in the rate adaptive regime such that the capacity per stream is half of the maximum rate requested by a typical stream.

Simulation setup. We have implemented our controllers in ns-2 [8], which provides ample support and a realistic simulation environment to test our model, giving results close to what we would expect from an actual real-world implementation. We set up 1000 nodes acting as transmitters connected by a single bottleneck link to 1000 receiver nodes. The bottleneck link has a packet queue of size 100 used to smooth the traffic, and implements a DropTail policy for the packets. This link is a duplex link, allowing acknowledgment packets from the receivers to reach the transmitter indicating whether each packet has been lost or received without incident. All the transmitting nodes are UDP sources and the receivers are Loss Monitoring agents. We use a CBR (Constant Bit-Rate) traffic pattern for each node. The parameters of each stream, such as duration and subscription levels, are assigned as discussed previously. Each simulation point is averaged over 1000 streams and over 100 repetitions of the experiment.
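The stream population and link provisioning just described can be sampled directly; the following sketch (our own code, not from the paper) draws from the elephants-and-mice model, confirms the mean-volume invariance (11) numerically, and evaluates the capacity scaling $c = m\lambda\sigma\delta$.

```python
# Sampling the elephants-and-mice model and the capacity scaling
# c = m*lambda*sigma*delta. Our own sketch; variable names are ours.
import random

S_HAT = (128.0, 1280.0)      # extreme max-subscription levels (kbps)
D_HAT = (60.0, 600.0)        # extreme durations (seconds)
SIGMA, DELTA = 704.0, 330.0  # common means E[S^max], E[D]

def spread(lo_hat, hi_hat, mean, a):
    """Diversity spread: collapse to the mean at a=0, extremes at a=1."""
    return ((1 - a) * mean + a * lo_hat, (1 - a) * mean + a * hi_hat)

def sample_stream(a, rng):
    s_lo, s_hi = spread(*S_HAT, SIGMA, a)
    d_lo, d_hi = spread(*D_HAT, DELTA, a)
    s_max = s_lo if rng.random() < 0.5 else s_hi   # p_s = 0.5
    dur = d_lo if rng.random() < 0.5 else d_hi     # p_d = 0.5
    return s_max, dur

rng = random.Random(1)
for a in (0.0, 0.5, 1.0):
    mean_vol = sum(s * d for s, d in
                   (sample_stream(a, rng) for _ in range(20000))) / 20000
    print(a, round(mean_vol))  # approx sigma*delta = 232320 for every a

lam = 10 / DELTA               # lambda chosen so E[#streams] = 10
for m in (0.5, 1.0):
    print(m, m * lam * SIGMA * DELTA)  # link capacity: 3520 and 7040 kbps
```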
3.1 Tradeoff #1: Optimality and Fairness
The first tradeoff we study is between optimality and fairness. In particular, the volume dependent controller optimizes the client average quality of service by giving preference to small volume streams, while the volume independent controller treats all streams fairly. The top plots in Figure 1 present QoS for both controllers as the link capacity is increased, while the bottom plots present QoS for both controllers as the volume diversity parameter is increased. For the top plots the volume diversity parameter is maximized at a = 1.0. For the bottom plots the link capacity is selected using a capacity scaling parameter of m = 0.5. The plots illustrate how the volume dependent controller is able to exploit volume
diversity to maximize the client average QoS, with pronounced improvements in the rate adaptive capacity scaling regime, and when the volume diversity parameter is large.

[Figure 1: four plots of customer average QoS, versus link capacity (Kbps) for the continuous and discrete controllers, and versus volume diversity a; each plot compares the volume dependent and volume independent controllers.]

Fig. 1. Tradeoff #1: comparing optimality and fairness. All four plots compare the QoS performance of the volume dependent controller with that of the volume independent controller. The top two plots vary the link capacity, c (holding a = 1.0), and illustrate the significant increase in QoS achieved by the volume dependent controller, especially in the overloaded and rate adaptive regimes. The bottom two plots vary the volume diversity parameter, a (holding m = 0.5), and show that the improvement in QoS achievable by the volume dependent controller increases in a. In both cases we present results for both continuous and discrete controllers (using ns-2).
3.2 Tradeoff #2: Granularity of Control and QoS
We next consider the impact of both temporal and spatial granularity of control on the QoS. Temporal granularity refers to the time between state updates, while spatial granularity refers to the number of available subscription levels. The top plot in Figure 2 shows the loss rate as the time between updates is increased; as expected the loss rate increases. More surprising is that the volume dependent controller achieves a significantly lower loss rate than the volume independent controller. This can be explained by the fact that the volume dependent controller gives preference to smaller rate streams, which make proportionally
smaller changes in rate, and as such they "feel out" the available capacity more gradually than the large volume streams. The middle plot in Figure 2 shows the impact on the QoS as the time between state updates is varied from τ = 5 to τ = 60 (we used τ = 1 for the plots in Figure 1). There is a significant impact on both controllers, but no "cliff", meaning the controller performance degrades gracefully as updates become less frequent. The bottom plot in the figure shows the QoS as the link capacity is scaled (again from m = 0.1 to m = 1.2) with a varying number of encodings, K, available. The K encodings are assumed to be uniformly spaced over the interval $[s^{\min}_{i,r}, s^{\max}_{i,r}]$. Note that the curve K = ∞ is the continuous controller. Clearly there is a law of diminishing returns as K increases, and the content provider would be able to assess the tradeoff between the cost of storing more encodings and the marginal benefit to client QoS.

[Figure 2: three plots — average loss over the bottleneck link and average QoS versus the time between updates (secs), and customer average QoS versus link capacity (kbps) for K = 2, K = 4, and K = ∞.]

Fig. 2. Tradeoff #2: the impact of the granularity of control on the QoS. The top plot shows the loss rate as a function of the time between updates for both the volume dependent and volume independent controllers. As expected the loss increases as the time between updates increases; more surprising is the fact that the volume independent controller suffers significantly higher loss than the volume dependent controller. The middle plot shows the impact of increasing the time between state updates on the QoS for both the volume dependent and volume independent controllers (both using the continuous controller with m = 0.5). The bottom plot shows the impact of varying the number of offered encodings as the link capacity is scaled (holding a = 1.0). All plots are from ns-2.
[Figure 3: three plots — customer average QoS and customer average rate of adaptation versus the weight w, and a scatterplot of average QoS versus rate of adaptation r, for the continuous and discrete controllers.]

Fig. 3. Tradeoff #3: the tradeoff between maximizing the time average utility and minimizing the rate of adaptation. The top plot shows the increase in QoS obtained by increasing the weight w for both the continuous and discrete controllers. The middle plot shows the increase in the rate of adaptation as the weight of the controller is increased (for m = 0.5, a = 1.0), and the bottom plot is a scatterplot of (r(w), q(w)) as the weight is increased from w = 1 to w = 500.
3.3 Tradeoff #3: Time Average Utility and Rate of Adaptation
The third tradeoff is that between the competing aims of maximizing the time average utility (q) and minimizing the rate of adaptation (r). The top two plots in Figure 3 show the change in q and r as the weight w is increased. There is a clear dependence of q and r on w for the continuous controller, and none for the discrete controller. This is because the discrete controller's update rule is relatively insensitive to the weight, depending only on the mapping between the virtual controller and the available set of discrete rates. The bottom plot shows the inherent tradeoff between maximizing q and minimizing r: the x-axis is r and the y-axis is q, and the points are the QoS pairs (r(w), q(w)) as the weight w is increased from w = 1 to w = 500. The continuous controller shows a significant increase in q, but at the cost of an increase in r; the discrete controller (with K = 2 encodings) is again less sensitive to the weight, but also unable to fully achieve the q levels obtainable by the continuous controller.
4 Conclusion
This paper has focused on three fundamental tradeoffs in designing distributed algorithms for rate adaptive multimedia streams: i) the tradeoff between optimality and fairness, ii) the tradeoff between granularity of control and QoS, and iii) the tradeoff between maximizing the time average utility and minimizing the rate of adaptation. Although each of these tradeoffs is qualitatively intuitive, the quantitative results are instructive, and offer structural insights into the sometimes complex dependencies among the system parameters.
References

1. Shenker, S.: Fundamental design issues for the future Internet. IEEE JSAC 13(7) (September 1995)
2. Lee, J.W., Mazumdar, R.R., Shroff, N.B.: Non-convex optimization and rate control for multi-class services in the Internet. IEEE/ACM Transactions on Networking 13(4) (August 2005) 827–840
3. Chiang, M., Zhang, S., Hande, P.: Distributed rate allocation for inelastic flows: optimization frameworks, optimality conditions, and optimal algorithms. In: Proceedings of IEEE INFOCOM, Miami, FL (March 2005)
4. Weber, S., Veeraraghavan, V.: Distributed algorithms for rate-adaptive media streams. Springer Networks and Spatial Economics (submitted) (May 2006)
5. Kelly, F.: Charging and rate control for elastic traffic. European Transactions on Telecommunications 8 (1997) 33–37
6. Kelly, F., Maulloo, A., Tan, D.: Rate control in communication networks: shadow prices, proportional fairness, and stability. Journal of the Operational Research Society 49 (1998) 237–252
7. Weber, S., de Veciana, G.: Rate adaptive multimedia streams: optimization and admission control. IEEE/ACM Transactions on Networking 13(6) (December 2005) 1275–1288
8. Network Simulator ns-2, http://www.isi.edu/nsnam/ns/
Optimal Policies for Playing Buffered Media Streams

Steven Weber

Drexel University, Dept. of ECE, Philadelphia, PA 19104
[email protected]
Abstract. This paper addresses a practical problem in our everyday use of streaming media on the Internet: as a user observes the buffering of a media stream with an uncertain transfer rate, when should that user initiate playback of the stream? The tension is that initiating playback prematurely will increase the likelihood of buffer starvation, while a delay in initiating playback is undesirable because it necessitates waiting. Three policies are studied: the optimal policy (exploiting full knowledge of the transfer process), the optimal static policy (the expected value of the optimal policy), and an online policy assuming only knowledge of the transfer rate observed thus far. Lower and upper bounds are derived on the optimal policy as well as the associated minimum cost; these bounds are expressed in terms of a (random) hitting time of the transfer process. Simulation results for a Markov modulated transfer rate process identify static and online policies as near-optimal depending on the time scale of the transfer rate process and the duration of the stream.
1 Introduction
This paper addresses a practical problem in our everyday use of streaming media on the Internet: as a user observes the buffering of a media stream with an uncertain transfer rate, when should that user initiate playback of the stream? The tension is that initiating playback prematurely will increase the likelihood of buffer starvation, upon which the client media player pauses playback and informs the user that the client is buffering the stream. On the other hand, a delay in initiating playback is undesirable because it necessitates waiting; a key motivation behind streaming is to avoid the delay in downloading media. We will not distinguish between the cases where the playback initiation decision is made by the human user or the client media player. In either case the objective is to simultaneously minimize the prefetch time and the stall time. The prefetch time is the time between initiating the transfer and initiating playback, and the stall time is the time spent re-buffering after a buffer starvation. The difficulty in making this decision lies in the fact that i) the instantaneous rate of the media content is time-varying and likely unknown a priori by the client, and ii) the instantaneous transfer rate available on the network is time-varying, and also a
This work is supported by the NSF under grant 0435247.
priori unknown. To simplify the problem we restrict our attention to the simpler case of constant bit rate media content, leaving the uncertain transfer rate as the unknown. Suppose the instantaneous transfer rate is given by the random process $\{x(t)\}$, and that the CBR playback rate is one. During playback, the client buffer is filling at rate $x(t) - 1$ when $x(t) > 1$, and draining at rate $1 - x(t)$ when $x(t) < 1$. Suppose the user incurs a cost per unit time $c_p$ (p for prefetch) for each second between initiating transfer (at time $t = 0$) and initiating playback, and incurs a cost per unit time $c_s$ (s for stall) for each second that playback is stalled for re-buffering. Let $\tau_p$ and $\tau_s$ be the total time spent in the prefetch and stall states respectively, so that the total cost incurred by the user is $c = c_p\tau_p + c_s\tau_s$. The client has control over $\tau_p$ by choosing when to initiate playback, but $\tau_s$ is a random variable that depends upon both $\tau_p$ and the transfer rate process. This paper investigates several policies that may be used to minimize the cost. Three policies are studied in particular:

– Optimal policy: using full a priori knowledge of the transfer process $\{x(t)\}$, the optimal policy identifies the playback initiation time that minimizes the cost $c$;
– Optimal static policy: the optimal static policy is the expected optimal playback initiation time, with the expectation taken with respect to the distribution of the transfer process;
– Online policy: the online policy makes a decision to initiate playback at time $t$ based on the instantaneous transfer rate observed over $[0, t)$, with no knowledge of the future evolution of the transfer process or its distribution.

Lower and upper bounds on both the optimal playback initiation time and the corresponding minimized cost are given in terms of a (random) hitting time of the transfer process. Simulation results are provided for the three policies for the case when the transfer process is a two state Markov modulated process. The primary takeaway is that static policies are near-optimal when the timescale at which the transfer rate process changes is much smaller than the duration of the stream, while online policies are near-optimal when the timescale at which the transfer rate process changes is similar to the duration of the stream.

The rest of this paper is organized as follows. Section 2 summarizes some of the related work on streaming media. Section 3 presents the mathematical model, and Section 4 offers the analytical bounds on the optimal policy and optimal cost. Simulation results are presented in Section 5 and a brief conclusion is offered in Section 6. All proofs are placed in the appendix.
2 Related Work
There are a large number of papers on the analysis, design, and performance optimization of streaming media; far too many to summarize here. Our own prior work on streaming media has focused on the network-wide optimal control of rate adaptive streaming media, with the objective of maximizing client-average quality of service (QoS) metrics [1,2]. Arguably the first algorithm for
media rate adaptation is the receiver-driven layered multicast (RLM) algorithm of McCanne, Jacobson, and Vetterli [3]. Selected related work on rate adaptation includes Rejaie, Handley, and Estrin [4]; Vickers, Albuquerque, and Suda [5]; Saparilla and Ross [6]; Gorinsky, Ramakrishnan, and Vin [7]; Kar, Sarkar, and Tassiulas [8]; Bain and Key [9]; Argiriou and Georgiadis [10]; and Chou and Shin [11]. Although rate adaptation introduces a valuable additional control for improving the performance of streaming media, we will not address this extension here. There is an extensive literature starting from the late 1990s on the problem of optimal smoothing of VBR video over variable channels. In a sense this problem is orthogonal to the one we are considering, primarily because we focus on client side control whereas optimal smoothing is focused on server side control. The objective in the smoothing literature is to transmit the VBR video such that the peak to mean ratio of the transferred stream is minimized. Significant publications include Lam, Chow, and Yau [12]; Zhang, Kurose, Salehi, and Towsley [13]; Duffield, Ramakrishnan, and Reibman [14]; Rexford and Towsley [15]; and Sen, Rexford, Dey, Kurose, and Towsley [16]. There is relatively little work on client control of prefetching and playback initiation. Reisslein and Ross [17] cover prefetching policies appropriate for a server multiplexing several simultaneous streams over a shared link; Fitzek and Reisslein [18] consider a similar setup but over wireless links.
3 Mathematical Model

3.1 Problem Statement
A user initiates a streaming media connection at time $t = 0$. After making a connection with the media content server, the client media player begins buffering media content from the server. The control decision faced by the client is when to initiate playing the media stream, i.e., when to start draining the client buffer. It is assumed that there is a cost incurred at rate $c_p > 0$ (p for prefetch) for each second between initiating the connection and initiating playback of the content. This cost measures the user's frustration at having to wait for the content to begin playback. The transfer rate available to the client-server connection is time-varying, and it is therefore possible that the client buffer will starve, forcing the client player to pause the media content. It is assumed that there is a cost incurred at rate $c_s > 0$ (s for stall) for each second the media player is forced to stall the media content after first initiating playback. Let $\tau_p$ be the time spent prefetching the stream prior to initiating playback, and let $\tau_s$ be the total time spent stalled after initiating playback. The objective of the control decision is to minimize the incurred cost $c = c_p\tau_p + c_s\tau_s$. We restrict our attention to the case where $c_p < c_s$, i.e., the unit cost of prefetching is smaller than the unit cost of stalling. This is because the optimal policy when $c_p \ge c_s$ is simply to not prefetch ($\tau_p = 0$).
3.2 Two Fundamental Processes
The media content is modeled as a constant bit rate (CBR) source requiring playback at unit rate, and having duration $d$. The instantaneous transfer rate of the client-server connection is a random process $\{x(t)\}$.¹ There are two fundamental processes that determine performance: the duration of media content that can be played by the client and the duration of media content that has been played by the client.

Definition 1. The duration of media content that can be played by the client at time $t$ is the cumulative amount of data transferred over $[0, t]$, up to the stream duration $d$:

$$y(t) = d \wedge \int_0^t x(s)\,ds. \qquad (1)$$
Definition 2. The duration of media content that has been played by the client is the solution of the following nonlinear differential equation:

$$\dot{z}(t) = \frac{d}{dt}z(t) = \begin{cases} 0, & t < \tau_p \text{ or } t \ge t_z \\ 1, & y(t) > z(t) \\ x(t) \wedge 1, & \text{else} \end{cases} \qquad (2)$$
where $\tau_p = \inf\{t : z(t) > 0\}$ is the prefetch duration and $t_z = \inf\{t : z(t) = d\}$ is the time playback is completed. The change in playback position, $\dot{z}(t)$, is zero for $t < \tau_p$ since the stream has not yet started playing, and is zero for $t \ge t_z$ since the stream has already finished. When there is unplayed media content stored at the receiver, $y(t) > z(t)$, the client is playing at the full unit rate, and $\dot{z}(t) = 1$. Otherwise $y(t) = z(t)$, which means the client buffer is starved, i.e., the client has played all currently available content. In this case the client plays the media content at rate $x(t) \wedge 1$ during starvation, hence when $x(t) < 1$ the client spends a fraction $x(t)$ of the time in play mode, and $1 - x(t)$ of the time in pause (stall) mode. Let $t_y = \inf\{t : y(t) = d\}$ be the time that the client obtains all the media content, and let $t_z = \inf\{t : z(t) = d\}$ be the time that the client actually completes play. It is straightforward to obtain the processes $\{y(t), z(t)\}$ from the realization of $\{x(t)\}$, as illustrated in the left side of Figure 1. The play process, $\{z(t)\}$, is obtained by drawing a unit slope line starting at the playback time $t = \tau_p$ until it intersects $\{y(t)\} \wedge d$. If it hits $\{y(t)\}$ then it tracks $\{y(t)\}$ at rate $x(t) < 1$, until $x(t) > 1$ at which time it continues at unit slope, and so on. The right side of Figure 1 gives the state transition diagram of the streaming process. Here prefetch is the time after data transfer begins but before play is initiated, and postfetch is the time after data transfer finishes but before play
¹ It is easily seen that the model may be generalized to arbitrary CBR play rate r by scaling {x(t)} by 1/r.
is finished. It follows that the stall time, $\tau_s$, is a functional of the play process, $\{z(t)\}$, namely:

$$\tau_s = \int_{\tau_p}^{t_z} (1 - \dot{z}(t))\,dt. \qquad (3)$$
Table 1 summarizes the mathematical notation used in the model.

Table 1. Notation

d        stream playback duration
{x(t)}   instantaneous transfer rate
{y(t)}   duration of media content that can be played
{z(t)}   duration of media content that has been played
τ_p      prefetch delay
τ_s      total time spent stalled
t_y      time the client obtains all the media content
t_z      time the client completes playback
c_p      rate cost is incurred when prefetching
c_s      rate cost is incurred when stalled
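The dynamics of Definitions 1 and 2 are straightforward to simulate on a time grid. The sketch below is our own Euler discretization (names and step size are assumptions); it returns the stall time (3) for a given prefetch time.

```python
# Discrete-time simulation of the transfer process y(t) and play process
# z(t) of Definitions 1 and 2, reporting the stall time (3).
# Our own Euler discretization; names are illustrative.

def simulate(x, d, tau_p, dt=0.01):
    """x: function t -> instantaneous transfer rate; d: stream duration;
    tau_p: prefetch time. Returns the total stall time."""
    t, y, z, stall = 0.0, 0.0, 0.0, 0.0
    while z < d:
        y = min(d, y + x(t) * dt)       # Definition 1: y(t) = d ∧ ∫ x
        if t >= tau_p:                  # playback has been initiated
            z_new = min(d, y, z + dt)   # play at unit rate, never past y
            stall += dt - (z_new - z)   # time not spent playing
            z = z_new
        t += dt
    return stall

# Constant rate x = 0.5 < 1, d = 10: zero stall for tau_p = d*(1-x)/x = 10.
for tau_p in (0.0, 5.0, 10.0):
    print(tau_p, round(simulate(lambda t: 0.5, 10.0, tau_p), 2))
```

For a constant rate x = 0.5 and d = 10 the printed stall times are approximately 10, 5, and 0, matching the linear dependence on τ_p derived in the proof of Lemma 1 below.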
[Figure 1: left, evolution of y(t) and z(t); right, state transition diagram with states start, prefetch, play/pause, postfetch, and finish.]

Fig. 1. Left: Evolution of the processes {y(t), z(t)} using a prefetch duration of τ_p. Transfer of data begins at time 0 and completes at time t_y; play begins at time τ_p and completes at time t_z. Right: State transitions in the streaming process, along with the times at which they occur and the values of (y(t), z(t)) that characterize each state.
3.3 Three Prefetching Policies
Optimal policy: under the assumption that $c_p < c_s$, it follows that the optimal policy to minimize $c = c_p\tau_p + c_s\tau_s$ is to set $\tau_p^{op} = \inf\{t : \tau_s = 0\}$ (op for optimal policy). The optimal policy clearly requires a priori knowledge of the transfer rate process $\{x(t)\}$.

Optimal static policy: the optimal static policy is the expected value of the optimal policy, $\tau_p^{sp} = \mathbb{E}[\tau_p^{op}]$ (sp for static policy).

Online policy: let $\{\mathcal{F}_t\}$ be the filtration of knowledge available to the client by time $t$. The online policy is to select a prefetch time $\tau_p^{ol} = t$ based on $\{\mathcal{F}_t\}$ (ol for online).
4 Analytical Results

4.1 Optimal Policy
Consider the case where the transfer rate process $\{x(t)\}$ is known a priori by the client, as assumed by the optimal policy. We begin with an easy result for the case when $x(t) = x$ for all $t$. All proofs are in the appendix.

Lemma 1. Suppose $x(t) = x$ for all $t$. The optimal policy and the corresponding optimal cost are

$$\tau_p^{op} = d\,\frac{(1-x)^+}{x}\,\mathbf{1}_{c_p < c_s}, \qquad c^{op} = (c_p \wedge c_s)\, d\,\frac{(1-x)^+}{x}. \qquad (4)$$
Next consider the case when either $x(t) < 1$ for all $t$ or $x(t) > 1$ for all $t$.

Lemma 2. Suppose $x(t) < 1$ for all $t$ or $x(t) \ge 1$ for all $t$. The optimal policy and the corresponding optimal cost are

$$\tau_p^{op} = (t_y - d)^+\,\mathbf{1}_{c_p < c_s}, \qquad c^{op} = (c_p \wedge c_s)\,(t_y - d)^+, \qquad (5)$$
where $t_y = \inf\{t : y(t) = d\}$ is the transfer completion time. Note that $x(t) \ge 1$ for all $t$ implies $t_y \le d$, while $x(t) < 1$ for all $t$ implies $t_y > d$. In the above regimes the optimal policy and corresponding cost are expressible in terms of the single random variable $t_y$. The point to emphasize is that $t_y$ is a hitting time of the transfer process $\{y(t)\}$, and is independent of the play process $\{z(t)\}$. Unfortunately the simple characterization of Lemma 2 does not hold for general $\{x(t)\}$, but we can obtain bounds on the cost in terms of $t_y$. In the following theorem we restrict our attention to instantaneous transfer rate processes with minimum and maximum rates, $x_{\min}$ and $x_{\max}$, respectively. Without loss of generality we assume $x_{\min} < 1 < x_{\max}$, since otherwise the process falls under the class of processes addressed in Lemma 2.

Theorem 1. Suppose $\{x(t)\}$ is bounded from below and above such that $x_{\min} < x(t) < x_{\max}$, where $x_{\min} < 1 < x_{\max}$. Define the maximum prefetch times under the lower and upper bounds as

$$\tau_p^{l\max} = (t_y - d)^+, \qquad \tau_p^{u\max} = \frac{1 - x_{\min}}{x_{\max} - x_{\min}}\,(x_{\max}\,t_y - d). \qquad (6)$$
The optimal policy and the corresponding optimal cost satisfy

$$\tau_p^{op} \in \left[\tau_p^{l\max}\,\mathbf{1}_{c_p < c_s},\; \tau_p^{u\max}\,\mathbf{1}_{c_p < c_s}\right], \qquad (7)$$
$$c^{op} \in \left[(c_p \wedge c_s)\,\tau_p^{l\max},\; (c_p \wedge c_s)\,\tau_p^{u\max}\right]. \qquad (8)$$
(7) (8)
4.2 Optimal Static Policy
This policy is appropriate for the regime when the transfer rate process $\{x(t)\}$ is unknown a priori by the client, but the statistics of $\{x(t)\}$ are known (or may be estimated through observation of multiple streaming sessions), permitting the computation of the optimal static prefetch policy: $\tau_p^{sp} = \mathbb{E}[\tau_p^{op}]$. The optimal static policy may be approximated by observing $n$ iid realizations of the optimal prefetch policy, $\tau_p^{op,1}, \ldots, \tau_p^{op,n}$, for $n$ realizations of the transfer process, $x_1, \ldots, x_n$ (where $x_i = \{x_i(t)\}$), and computing $\hat{\tau}_p^{sp} = \frac{1}{n}\sum_{i=1}^{n} \tau_p^{op,i}$.
4.3 Online Policy
We now assume the transfer rate process $\{x(t)\}$ is unknown by the client a priori. Lacking any information as to the distribution of the process $\{x(t)\}$, we seek to estimate the data transfer completion time, $t_y$, at each time $t$, using only knowledge of $\{x(u)\}_0^t$. An estimate of a quantity $w$ is denoted as $\hat{w}$, and the notation $\hat{w}(t)$ indicates that this is the estimate made using knowledge of $\{x(u)\}_0^t$. The empirical estimate of the average transfer rate is:

$$\hat{x}(t) = \frac{1}{t}\int_0^t x(u)\,du = \frac{y(t)}{t}. \qquad (9)$$
The estimate of the transfer completion time as of time $t$ is $\hat{t}_y(t) = t\,\frac{d}{y(t)}$. Define $\hat{\tau}_p^{op}(t)$ as the best estimate as of time $t$ of the time to start playback so as to avoid stalling. It is clear that

$$\hat{\tau}_p^{op}(t) = \hat{t}_y(t) - d = d\left(\frac{1}{\hat{x}(t)} - 1\right). \qquad (10)$$

The online policy is then:

$$\tau_p^{ol} = \inf\{t : \hat{\tau}_p^{op}(t) < t\}, \qquad (11)$$
i.e., to begin playback at the first time that the current time exceeds the estimated optimal prefetch time.
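The online rule (11) is easy to run over a sampled rate trace; the sketch below is our own discretization, with illustrative names.

```python
# The online policy (11) on a sampled transfer-rate trace: start playback
# at the first time t where the estimate (10) drops below t itself.
# Our own discretization; names are illustrative.

def online_prefetch_time(rates, d, dt=0.1):
    """rates: sampled x(t) values; d: stream duration.
    Returns the playback initiation time chosen by the online policy."""
    y = 0.0
    for k, x in enumerate(rates, start=1):
        y += x * dt
        t = k * dt
        x_hat = y / t                      # empirical mean rate (9)
        tau_hat = d * (1.0 / x_hat - 1.0)  # estimated optimal prefetch (10)
        if tau_hat < t:                    # rule (11)
            return t
    return len(rates) * dt                 # transfer ended first

# Constant rate 0.5: the estimate is exact, so the policy starts at
# tau_p = d*(1/x - 1) = 10 for d = 10.
print(round(online_prefetch_time([0.5] * 400, d=10.0), 2))
```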
5 Simulation Results
To investigate the performance of the three policies we consider the case when the transfer process {x(t)} is modulated by a homogeneous continuous time two state Markov process. In particular, let the states be 1 and 2, where x(t) = r when the process is in state 1 and x(t) = 2 − r when the process is in state 2 for some r ∈ [0, 1]. Let the rate matrix for the Markov chain be
$$Q = \begin{pmatrix} -q & q \\ q & -q \end{pmatrix}, \qquad (12)$$
[Figure 2: four plots of average cost c versus transfer rate r (for q = 0.1 and q = 10) and versus transition rate q (for r = 0 and r = 0.9), comparing the optimal, static, and online policies.]

Fig. 2. Average cost $\bar{c} = c_p\bar{\tau}_p + c_s\bar{\tau}_s$ under the three policies (optimal, static, online). The top two plots have the transfer rate r as the abscissa, for transition rate q = 0.1 (top) and q = 10 (second). The bottom two plots have the transition rate q as the abscissa, for transfer rate r = 0 (third) and r = 0.9 (bottom).
so that the stationary distribution is $\pi_1 = \pi_2 = \frac{1}{2}$, and $q$ governs the timescale of the evolution of the chain. Note that the average transfer rate is $\mathbb{E}[x(t)] = \pi_1 r + \pi_2(2 - r) = 1$. We explore the performance of the three policies in the regime when $q \ll d$ (the transfer process evolves on a slower timescale than the stream duration), as well as $q \gg d$ (the transfer process evolves on a faster timescale than the stream duration). Further we explore the performance of the three policies in the regime when $r \approx 0$ (the transfer process is bursty, with rates $\approx 0$ and $\approx 2$) and $r \approx 1$ (the transfer process is smooth, with rates $\approx 1$). Because of this we refer to $q$ as the temporal burstiness parameter and $r$ as the spatial burstiness parameter. Figure 2 presents the simulation results. All four plots show the average cost $\bar{c} = c_p\bar{\tau}_p + c_s\bar{\tau}_s$ under the three policies (optimal, static, online), for

$$d = 10, \qquad c_p = 1, \qquad c_s = 2. \qquad (13)$$
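For reference, a small sketch (our own code) that samples the modulated transfer rate of (12) with the parameters (13) and evaluates the cost of a fixed prefetch time; the buffer simulation follows Definitions 1 and 2, and all names are assumptions.

```python
# Sampling the two-state Markov-modulated rate of (12) and evaluating the
# cost c = c_p*tau_p + c_s*tau_s for a fixed prefetch time. Our own sketch
# with the parameters of (13); names are illustrative.
import random

D, CP, CS = 10.0, 1.0, 2.0

def mmpp_trace(r, q, horizon, dt, rng):
    """Two states with rates r and 2 - r, each switching at rate q."""
    state, out = rng.randint(0, 1), []
    for _ in range(int(horizon / dt)):
        out.append(r if state == 0 else 2.0 - r)
        if rng.random() < q * dt:   # transition probability per step
            state = 1 - state
    return out

def cost(trace, tau_p, dt):
    """c = c_p*tau_p + c_s*tau_s, simulating y(t) and z(t) as in Sec. 3."""
    y = z = stall = t = 0.0
    for x in trace:
        if z >= D:
            break
        y = min(D, y + x * dt)
        if t >= tau_p:
            z_new = min(D, y, z + dt)
            stall += dt - (z_new - z)
            z = z_new
        t += dt
    return CP * tau_p + CS * stall

rng = random.Random(0)
trace = mmpp_trace(r=0.5, q=1.0, horizon=100.0, dt=0.01, rng=rng)
print(round(cost(trace, tau_p=5.0, dt=0.01), 2))
```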
For r small the transfer rate process is spatially bursty, while for r near one the transfer rate process is almost (spatially) constant. For q small the transfer rate process is temporally almost static, while for q large the transfer rate process is temporally bursty. The top two plots have the transfer rate r as the abscissa, for transition rate q = 0.1 (top) and q = 10 (second). The top two plots demonstrate that the online policy is superior for temporally static transfer rates while the static policy is superior for temporally bursty processes. Intuitively, knowledge of the statistics of the process is only of value when there is a sufficient number of rate changes so that the fluctuations average out. The bottom two plots have the transition rate q as the abscissa, for transfer rate r = 0 (third) and r = 0.9 (bottom). The bottom two plots demonstrate that there is a critical transition rate qc such that for q < qc the online policy is superior, while for q > qc the static policy is superior. Furthermore, qc is larger for spatially bursty processes (r small) and smaller for spatially smooth processes (r large). All four plots demonstrate that the average cost is monotone decreasing as the process becomes spatially and temporally smoother, i.e., as q and r increase. Finally, Figure 3 presents the optimal (no stall) prefetch time along with the lower and upper bounds from Theorem 1 versus the transfer rate parameter r. The bounds are seen to be reasonably tight for all r, and become tight as r ↑ 1.

[Figure 3: average optimal prefetch time versus transfer rate r (for q = 0.1), with the lower and upper bounds.]

Fig. 3. The optimal (no stall) prefetch time along with the lower and upper bounds from Theorem 1 versus the transfer rate parameter r.
6 Conclusion
In this paper we have studied policies governing when a streaming media client should commence playback. The tradeoff that is optimized is between minimizing the stall time (due to buffer starvation) and minimizing the prefetch time. Three policies were considered: the optimal policy (assuming full knowledge), the optimal static policy, and an online policy. It was observed that the online policy is superior when the duration of the stream is on the order of the fluctuations of the transfer process, while the static policy is superior when the duration of the stream is much longer than the fluctuation timescale.
References

1. Weber, S., de Veciana, G.: Rate adaptive multimedia streams: optimization and admission control. IEEE/ACM Transactions on Networking 13(6) (December 2005) 1275–1288
2. Weber, S., de Veciana, G.: Flow-level QoS for a dynamic load of rate adaptive sessions sharing a bottleneck link. Accepted for publication in Computer Networks (October 2006)
3. McCanne, S., Jacobson, V., Vetterli, M.: Receiver-driven layered multicast. In: Proceedings of ACM SIGCOMM, Stanford, CA (August 1996)
4. Rejaie, R., Handley, M., Estrin, D.: Quality adaptation for congestion controlled video playback over the internet. In: Proceedings of ACM SIGCOMM (1999) 189–200
5. Vickers, B., Albuquerque, C., Suda, T.: Source-adaptive multi-layered multicast algorithms for real-time video distribution. Technical Report ICS-TR 99-45, University of California, Irvine (June 1999)
6. Saparilla, D., Ross, K.: Optimal streaming of layered video. In: Proceedings of IEEE INFOCOM (March 2000)
7. Gorinsky, S., Ramakrishnan, K., Vin, H.: Addressing heterogeneity and scalability in layered multicast congestion control. Technical Report TR2000–31, UT-Austin (November 2000)
8. Kar, K., Sarkar, S., Tassiulas, L.: Optimization based rate control for multirate multicast sessions. In: Proceedings of IEEE INFOCOM (2001) 123–132
9. Bain, A., Key, P.: Modeling the performance of in-call probing for multi-level adaptive applications. Technical Report MSR-TR-2002-06, Microsoft Research (October 2001)
10. Argiriou, N., Georgiadis, L.: Channel sharing by rate-adaptive streaming applications. In: Proceedings of IEEE INFOCOM (2002)
11. Chou, C.T., Shin, K.G.: Analysis of adaptive bandwidth allocation in wireless networks with multilevel degradable quality of service. IEEE Transactions on Mobile Computing 3(1) (January–February 2004) 5–17
12. Lam, S., Chow, S., Yau, D.: A lossless smoothing algorithm for compressed video. IEEE/ACM Transactions on Networking 4(5) (October 1996) 697–708
13. Zhang, Z.L., Kurose, J., Salehi, J.D., Towsley, D.: Smoothing, statistical multiplexing, and call admission control for stored video. IEEE Journal on Selected Areas in Communications 15(6) (August 1997) 1148–1166
14. Duffield, N., Ramakrishnan, K., Reibman, A.: SAVE: an algorithm for smoothed adaptive video over explicit rate networks. IEEE/ACM Transactions on Networking 6(6) (December 1998) 717–728
15. Rexford, J., Towsley, D.: Smoothing variable-bit-rate video in an internetwork. IEEE/ACM Transactions on Networking 7(2) (April 1999) 202–215
16. Sen, S., Rexford, J., Dey, J., Kurose, J., Towsley, D.: Online smoothing of variable-bit-rate streaming video. IEEE Transactions on Multimedia 2(1) (March 2000) 37–48
17. Reisslein, M., Ross, K.W.: High-performance prefetching protocols for VBR prerecorded video. IEEE Network (November/December 1998) 46–55
18. Fitzek, F., Reisslein, M.: A prefetching protocol for continuous media streaming in wireless environments. IEEE Journal on Selected Areas in Communications 19(10) (October 2001) 2015–2028
Proof of Lemma 1. It is obvious that not prefetching is optimal when $x \ge 1$. Suppose instead that $x < 1$. It is clearly sub-optimal to have any postfetch time, thus the maximum prefetch time is $\tau_p^{\max} = d\,\frac{1-x}{x}$. Under prefetch time $\tau_p \in [0, \tau_p^{\max}]$ the stream plays at rate 1 for $\frac{x\tau_p}{1-x}$ seconds, and plays at rate $x$ for $\frac{d}{x} - \frac{\tau_p}{1-x}$ seconds. The time spent playing at rate $x$ can be decomposed into $d - \frac{x\tau_p}{1-x}$ seconds spent playing and $d\,\frac{1-x}{x} - \tau_p$ seconds spent stalled. The cost incurred is then

$$c = c_p \tau_p + c_s\left(d\,\frac{1-x}{x} - \tau_p\right) = (c_p - c_s)\tau_p + c_s d\,\frac{1-x}{x}.$$

The cost is linear in $\tau_p$ with slope $c_p - c_s$ and intercept $c_s d\,\frac{1-x}{x}$. When $c_p < c_s$ the cost is minimized by setting $\tau_p^* = \tau_p^{\max}$, incurring a cost of $c^{\min} = c_p d\,\frac{1-x}{x}$.

Proof of Lemma 2. It is obvious that not prefetching is optimal when $x(t) \ge 1$ for all $t$. Suppose instead that $x(t) < 1$ for all $t$. Since $x(t) < 1$ for all $t$ it follows that there exists some $t^* \in [\tau_p, t_y]$ marking the first time the buffer starves. In particular, $y(t) > z(t)$ for all $t \in [0, t^*]$ and $y(t) = z(t)$ for all $t \in [t^*, t_y]$. The buffer stays starved due to the assumption that $x(t) < 1$ for all $t$. The client plays at full unit rate throughout $[\tau_p, t^*]$, and plays at rate $x(t)$ throughout $[t^*, t_y]$. As the total time spent playing must sum to $d$, it follows that $d - (t^* - \tau_p)$ seconds are spent playing in $[t^*, t_y]$, while the remaining $(t_y - t^*) - (d - (t^* - \tau_p)) = t_y - d - \tau_p$ seconds are spent stalled. It is clear that for a given $t_y$ and $d$ there will be an unnecessary postfetch period if $\tau_p > t_y - d$, thus we can without loss of generality restrict our attention to prefetch durations in the interval $\tau_p \in [0, t_y - d]$. For any such prefetch duration the incurred cost is

$$c = c_p \tau_p + c_s (t_y - d - \tau_p) = (c_p - c_s)\tau_p + c_s (t_y - d).$$

From here the optimal policy is easily identified using an argument identical to that in the proof of Lemma 1.

Proof of Theorem 1. The lower bound on the stall time occurs when the data arrives as early as possible. Similarly, the realization of $\{y(t)\}$ that maximizes the stall time is for the data to arrive as late as possible. A little thought shows that the maximum useful prefetch time for the lower bound arrival pattern is $\tau_p^{l\max} = t_y - d$. Assume $\tau_p \in [0, \tau_p^{l\max}]$. The best realization of $\{y(t)\}$ for a given file transfer completion time, $t_y$, is for the data to arrive at the maximum rate, $x_{\max}$, for as long as possible, then at the minimum rate, $x_{\min}$, for the remaining time. The playback process $\{z(t)\}$ under prefetch time $\tau_p$ can then be computed for this best case. It is clear that there will not be a postfetch period under this realization, and that $\{z(t)\}$ plays at full unit rate for the entire duration except for the last $\tau$ seconds, where the value of $\tau$ can be computed as the solution of $t_y - \tau_p - \tau = d - x_{\min}\tau$, yielding $\tau = \frac{t_y - \tau_p - d}{1 - x_{\min}}$. Thus the minimum time spent in the stall state is $\tau_s^{\min} = \tau(1 - x_{\min}) = t_y - \tau_p - d$.

Next consider the worst case realization of $\{y(t)\}$ shown in the left figure in Figure 4. The first step is to identify the maximum useful prefetch time. See the
right figure in Figure 4. The maximum useful prefetch time is the shortest time such that it is not possible to stall playback, given the data transfer completion time $t_y$. The worst case realization of $\{y(t)\}$ subject to the assumed bounds is shown in the figure. The maximum useful prefetch time is then the smallest $\tau_p$ such that $\{z(t)\}$ beginning at $\tau_p$ never touches $\{y(t)\}$. From the figure it can be seen that this maximum time is $\tau_p^{u\max} = \frac{1 - x_{\min}}{x_{\max} - x_{\min}}(x_{\max}\,t_y - d)$. The worst realization is for the data to arrive at the minimum rate, $x_{\min}$, for as long as possible, then at the maximum rate, $x_{\max}$, for the remaining duration. Tedious algebra yields that the time where $\{z(t)\}$ plays at rate $x_{\min}$ is $\frac{x_{\max}\,t_y - d}{x_{\max} - x_{\min}} - \frac{\tau_p}{1 - x_{\min}}$, and thus the maximum time spent in the stall state is $\tau_s^{\max} = \tau_p^{u\max} - \tau_p$. Combining the bounds on the stall time leads to bounds on the cost:

$$c_l^{\min} = c_p\tau_p + c_s\tau_s^{\min} = (c_p - c_s)\tau_p + c_s\tau_p^{l\max}, \quad \tau_p \in [0, \tau_p^{l\max}],$$
$$c_u^{\min} = c_p\tau_p + c_s\tau_s^{\max} = (c_p - c_s)\tau_p + c_s\tau_p^{u\max}, \quad \tau_p \in [0, \tau_p^{u\max}].$$

Under both bounds the optimal policy is again obvious, namely to not prefetch if $c_p > c_s$, and to prefetch for $\tau_p = \tau_p^{\max}$ if $c_p < c_s$.
[Figure 4: sketches of the bounding realizations of y(t) and z(t) used in the proof.]

Fig. 4. Sketch of proof of Theorem 1. Left: upper bound on the stall time. Right: maximum useful prefetch time for upper bound.
Non-parametric and Self-tuning Measurement-Based Admission Control

Thomas Michael Bohnert¹, Edmundo Monteiro¹, Yevgeni Koucheryavy², and Dmitri Moltchanov²

¹ University of Coimbra, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
{tbohnert,edmundo}@dei.uc.pt
² Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland
{yk,moltchan}@cs.tut.fi
Abstract. The Measurement-based Admission Control algorithm presented in this paper has been devised to overcome three shortcomings. Firstly, its configuration parameter strictly corresponds to standard QoS definitions in Service Level Agreements, namely the packet loss probability. While this is featured by alternative designs too, the second issue, that of considerable performance fluctuations under varying traffic conditions, is a rather general problem. Applying a non-parametric approach, the presented algorithm's estimation model is free from distributional assumptions, and simulations confirm consistent performance across a set of varied conditions. Finally, the third improvement is certainly the most appealing: the algorithm's independence from fine-tuning. In fact, the optimal value of the performance determining parameter is estimated from the measured sample. This makes the algorithm highly adaptive as well as autonomous, and simulations confirm near optimal performance and accuracy.
1 Introduction
By definition a preventive congestion avoidance mechanism, the Admission Control (AC) rationale is to limit the number of resource consumers in a network in advance. This is achieved by explicit access regulation for each and any request, i.e., a new flow, based on the expected availability of resources. In general, a new flow is admitted onto a network if its characteristics in terms of resource demand, superimposed with those of previously admitted flows, would be up to an extent such that committed QoS grants are sustained. In computer networks a common approach is Measurement-based Admission Control (MBAC). The principal idea is to sample the work arrival process in real time, compute statistics, and evaluate them in purpose-built queuing models. The strength of MBAC therefore seems to lie in its independence from a priori knowledge and its capability to cope with changing conditions. An immediate and intuitive conclusion is that MBAC emerges naturally as the only solution for a dynamic environment like the Internet, and indeed, MBAC has been shown to be effective in some sense and applicable in an extensive set of previous works, see for instance [1, 2, 3, 4, 5].
At this point, considering the plethora of MBAC algorithms devised in the past, a good justification for yet another one is necessary. In order to do so, we point the reader to an interesting fact. While further MBAC research has been questioned in [6], the same authors list in a follow-up [7] a set of still open issues. In summary, by a comprehensively conducted simulative comparison the authors came to the following conclusions:

– Algorithm performance varies considerably for different conditions, i.e., traffic characteristics.
– Performance strongly depends on fine-tuning of each algorithm's individual, model-specific performance parameter.
– Performance parameters are rooted in mathematics and lack intuitive meaning in a QoS or networking context.
– The mapping of performance parameters to target QoS objectives is inconsistent.
– In most cases, the algorithms under scrutiny missed the targeted QoS objectives.

In fact, these findings have been further confirmed in [8], with the only but significant difference that in this study the set of MBAC algorithms was implemented in a real system. To evaluate the significance of these conclusions, let us review some AC context. An acceptable customer-blocking rate is around three percent in the busiest time. Given this, daily service usage patterns imply that AC only plays an active role for around one hour a day. In this time frame a CSP (or ISP) wants a maximum of customers admitted onto the network to maximise return on investment. Real-time Traffic (RT), inherently rate limited, does not cause congestion at any other time. The provider's objective, however, is contrary to the customers', as they ideally want exclusive access to resources. In conclusion, optimal AC admits a maximum number of flows while tightly approaching committed QoS objectives. Eventually, there is the operator's point of view. Service Level Agreements (SLA) define parameters like Bandwidth, Jitter, Delay or Loss [9] [10], and thus an algorithm's configuration parameter ought to be of this type. Moreover, the setting of QoS targets must be reliable to ensure customer satisfaction.

In the following discourse we present an MBAC algorithm designed to incorporate the identified requirements. In Sec. 2 we present a general queuing model and derive a non-parametric estimation algorithm. Thereafter we present the evaluation setup in Sec. 3 and introduce a performance and accuracy metric in Sec. 4. Performance results for different configurations are presented in Sec. 5, followed by an accuracy evaluation. Main findings together with the resulting conclusions are discussed in Sec. 7.
2 A General Model for Measurement-Based Admission Control
Speaking in mathematical terms, the principal MBAC issue can be expressed as a real-time estimation problem. A sample $S = \{\zeta_1, \ldots, \zeta_n\}$ of Random Variables (RV) is captured by continuous measurement of the work arrival process and
model specific statistics are calculated. Feeding these statistics into a queuing model allows predicting resource availability. Thus, the very first step is to define a queuing model. As identified in the preceding section, an algorithm's configuration parameter shall have a meaning in a QoS context. To incorporate this, we use a simple buffer overflow model with the standard QoS criterion $P_{loss}$ as configuration parameter. Let $\{\zeta_t\}$ denote the work arrival process, and let $A[s, t]$ be the amount of work arriving in the interval $(s, t]$. Further let $\zeta_t = A[-t, 0]$, such that the queue length at time zero is

$$\Omega = \sup_{t \ge 0}\,(\zeta_t - Ct), \qquad (1)$$
with $C$ denoting the link capacity. The probability that the queue length exceeds $\omega$ is then

$$P\{\Omega > \omega\} = P\{\sup_{t \ge 0}(\zeta_t - Ct) > \omega\}. \qquad (2)$$
As (2) is difficult to compute, we use the lower bound approximation

$$P\{\sup_{t \ge 0}(\zeta_t - Ct) > \omega\} \ge \sup_{t \ge 0} P\{\zeta_t > \omega + Ct\}. \qquad (3)$$
With $\rho = \omega + Ct$, and letting $F_t(x) = P\{\zeta_t \le x\}$ be the Cumulative Distribution Function¹ (CDF) of $\{\zeta_t\}$, we get

$$P\{\Omega > \omega\} \ge \sup_{t \ge 0} P\{\zeta_t > \omega + Ct\} = \sup_{t \ge 0}\,(1 - F_t(\rho)). \qquad (4)$$
Now let us set $\omega$ in $\rho = \omega + Ct$ to be the buffer size, and let us further denote by $r_p$ the peak rate of a flow. The flow's worst-case resource demand occurs if it emits continuously at $r_p$, hence $\rho$ is set to $\rho = \omega + Ct - r_p t$. This yields the final admission criterion based on the buffer overflow probability $\hat{P}_{loss}(\rho)$:

$$\mathrm{Admit}_{bool} = \begin{cases} \text{true} & \text{if } \hat{P}_{loss}(\rho) < P_{loss} \\ \text{false} & \text{if } \hat{P}_{loss}(\rho) \ge P_{loss} \end{cases} \qquad (5)$$

As the r.h.s. of (4) reveals, this simple model's accuracy depends on the marginal distribution function. This is fundamental to probability modelling, and the generally adopted approach is that of Parametric Statistics. Parametric means that the nature of the CDF is assumed to be known. Motivated by the Central Limit Theorem (CLT), for instance, many MBAC models assume Gaussianity (or Normality), like [1, 3, 4, 5, 11], while, for instance, [4, 2] assume a Gumbel distribution motivated by Extreme Value Theory. Once the CDF is known, the problem is to estimate the moments from the captured sample S to get $\hat{P}_{loss}(\rho)$.
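To make criterion (5) concrete, the sketch below runs the admission test on top of an arbitrary estimate of $F_t$; the toy CDF, parameter values, and names are entirely our own placeholders and stand in for the non-parametric estimator developed next.

```python
# The admission test (5): admit while the estimated overflow probability
# at rho = omega + (C - r_p)*t stays below the SLA target P_loss, checked
# over a set of horizons t as in (4). Our own illustrative sketch.

def admit(cdf_hat, p_loss_target, omega, capacity, r_peak, horizons):
    """cdf_hat(rho, t): estimate of F_t(rho) = P{zeta_t <= rho}.
    Returns True iff sup_t (1 - F_t(rho)) < P_loss per (4)-(5)."""
    p_hat = max(1.0 - cdf_hat(omega + (capacity - r_peak) * t, t)
                for t in horizons)
    return p_hat < p_loss_target

# Toy CDF estimate: arrivals at mean rate 0.8*C with a fixed spread.
def toy_cdf(rho, t, mean_rate=0.8, spread=50.0):
    x = (rho - mean_rate * t) / spread
    return max(0.0, min(1.0, 0.5 + x / 2))  # crude ramp around the mean

print(admit(toy_cdf, 1e-2, omega=100.0, capacity=1.0, r_peak=0.05,
            horizons=[10, 50, 100, 500]))
```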
Strictly speaking the Marginal Distribution Function.
Non-parametric and Self-tuning Measurement-Based Admission Control
667
With rising aggregation levels in access networks and due to its conformance with Self-Similarity (SS) and Long-Range-Dependency (LRD) findings in network traffic [12, 13], much attention has been paid to Gaussian approximation recently. An analytical argument in favour of is documented in [14], while [15] further supports it empirically based on real traffic traces, though for backbone traffic and for large time scales. The latter results, however, are questionable as common Goodness-of-fit techniques were applied though known to be error prone for large samples and in the context of SS and LRD [16, Chap. 10] [17, 18] [19, Page 33]. In [17] a more profound mathematical approach has been applied to test the Gaussian approximation, but the result is rather inconclusive, stating that under certain conditions the approximation can be met but also grossly missed, particularly for small time scales. In contrast to Gaussian approximation, the authors of [20, 21] state traffic obeys a Poisson law and is non-stationary. However, these results have been recently called to be far unrealistic [18]. Finally, and only for the sake of further highlighting diversity of findings, we cite [22] which recently recommended the Gamma distribution as the “best choice on average” for a comprehensive set of recorded traffic traces. This short discourse indicates the uncertainty incurred by assuming in advance. The inflicted consequence is two fold. First, assuming for example a Gaussian nature implies a heavy-traffic approximation. Only then the CLT becomes valid. Furthermore, densities tend to be heavy-tailed, and in this case, moments beyond first order may not exist at all. There is ample evidence that exaclty these assumptions subject MBAC performance so strongly to traffic conditions. Based on this finding, the focus of this work was to seek and investigate an alternative. To free Ff (ρ) in (4) from assumptions, Non-Parametric Statistics are to be applied. The simplest way would be to compute a histogram as estimate for Ff (ρ). Indeed, several histogram based MBAC algorithms, e.g. [23, 24, 25], have been published in the past. However, in none of these works the choice for histograms has been justified by statistical uncertainty, but has been used rather implicitly. This is evidenced by the fact that histograms are solely used in an ad-hoc fashion and parameters, namely bin width and bin origin, are chosen intuitively. That this salvages the risk of great imprecision is widely unknown but elaborately presented in [19, Chap. 3] for instance. Furthermore, histograms are poor for heavy-tailed distributions as they only represent the statistical information of the actual sample. As sample size is inherently limited, rare (tail) events are only in a few of sequentially captured samples. A non-parametric method superior to histograms to estimate an unknown density f is Kernel Density Estimation (KDE). Following [19, Chap. 6, Equ. 6.1] its definition reads 1 y − Yi k fˆh (y) = nh i=1 h n
(6)
where Y is a sample of n RVs, Y = {Y_1, ..., Y_n}, h is the window width, also called the smoothing parameter or bandwidth, and k is the kernel, which has to satisfy the condition

∫_{−∞}^{+∞} k(u) du = 1    (7)
and is therefore often chosen to be a density function. In brief, the estimator can be considered as a sum of superimposed bumps centred at the observations Y_i. The shape of the bumps is defined by the kernel function, while the window width h defines their width [26]. Integrating (6) yields the distribution estimator

F̂_h(y) = ∫_{−∞}^{y} f̂_h(u) du = (1/n) Σ_{i=1}^{n} K((y − Y_i)/h)    (8)

where

K(x) = ∫_{−∞}^{x} k(u) du    (9)
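To make the distribution estimator concrete, the following minimal Python sketch (our illustration, not from the paper; it assumes NumPy/SciPy) implements (8) with a standard normal kernel, so that k = φ and K = Φ:

```python
import numpy as np
from scipy.stats import norm

def kde_cdf(y, sample, h):
    """Kernel estimate of the distribution function, Equ. (8):
    F_h(y) = (1/n) * sum_i Phi((y - Y_i) / h)."""
    sample = np.asarray(sample, dtype=float)
    # One integrated-kernel "bump" per observation Y_i, averaged.
    return norm.cdf((y - sample) / h).mean()
```

For example, kde_cdf(0.0, np.random.randn(1000), 0.3) should return a value close to 0.5.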
is the integrated kernel function, in short the kernel. The metric for accuracy evaluation for this type of estimator is the Mean Integrated Squared Error (MISE), defined as

MISE(h) = E ∫_{−∞}^{+∞} {F̂_h(y) − F(y)}² dy.    (10)
Letting h n^{1/2} → ∞ as n → ∞ yields the Asymptotic Mean Integrated Squared Error (AMISE). It has been shown in [27] that the AMISE is minimised by setting h equal to

h_o = (Υ(K)/R_1)^{1/3} n^{−1/3}    (11)

with Υ(K) being a function of the kernel only. For instance, using a standard normal density as kernel, k(x) = φ(x) and K(x) = Φ(x), yields Υ = 1/√π. Provided an estimate of R_1, the so-called roughness, defined as

R_m = ∫_{−∞}^{+∞} (f^{(m)}(y))² dy    (12)

where f^{(m)} denotes the mth derivative of the true, unknown density f, the convergence rate of this estimator, determined by (11), is of the order n^{−1/3}. This is slower than the rate of n^{−1} for parametric estimators under optimal conditions, i.e. a full match between reality and model. This condition is, however, practically never given. One can see that the performance of this estimator depends on the smoothing parameter h_o, which is a function of R_1, and on the kernel. In [19, Sec. 6.2.3.2] it has been shown that optimal kernels have compact support, but as they are "best on average" (recall the AMISE is a mean), these kernels are optimal for the body of the density. To estimate tail probabilities, a kernel with infinite support
is the better choice. However, the kernel choice is of minor significance in any case [19, Sec. 6.2.3]. Thus, the performance knob for this type of estimator is h_o, and hence also for any MBAC algorithm derived from it. Moreover, this performance parameter masks the configuration parameter, as its setting governs how precisely the QoS target is approached. Unfortunately, this parameter lacks any meaning in a QoS context, and we face one of the criticisms of MBAC algorithms as presented in Sec. 1. Having a closer look, however, we see that the smoothing parameter h_o = f(R_1), and (12) poses the same problem as (8), with the only difference that we have to differentiate (6) rather than integrate it. Thus, we can apply KDE to estimate the roughness R_1 from the sample the same way we estimate F̂(x). This method is known as plug-in smoothing. It yields a particular advantage: the MBAC algorithm becomes "self-tuning", because the estimator computes its only performance parameter from the captured traffic sample. No human intervention, i.e. fine-tuning, is necessary. By repeated integration by parts, (12) becomes

R_m = (−1)^m ∫_{−∞}^{+∞} f^{(2m)}(y) f(y) dy = (−1)^m E[f^{(2m)}(Y)].    (13)
Using a Gaussian kernel with smoothing a, one gets an estimate of f̂^{(2m)}(y):

f̂^{(2m)}(y) = (1/n) Σ_{i=1}^{n} φ^{(2m)}((y − Y_i)/a).    (14)
Based on the last two equations, the Jones and Sheather estimator reads [28]

R̂_m(a) = (−1)^m (1/(na)) Σ_{i,j=1}^{n} φ^{(2m)}((Y_j − Y_i)/a).    (15)
Its AMISE-optimal smoothing is

a_m(R_{m+1}) = ( 2^{m+1/2} Γ(m + 1/2) / (π R_{m+1} n) )^{1/(2m+3)}    (16)
and is an explicit function of R_{m+1}. Thus,

R̂_m(R_{m+1}) = R̂_m(a_m(R_{m+1})).    (17)
This sequential relationship motivates the sequential plug-in rule published in [29]. Briefly, "fix J ≥ 1 and take R_{J+1} as given", hence

R̂_J(R_{J+1}), R̂_{J−1}(R̂_J), R̂_{J−2}(R̂_{J−1}), ...    (18)

until one obtains R̂_1(R̂_2) for (11). The recommended value for J is 4 [29].
A so-called reference estimate for R_{J+1} in (18) has been published in [30]. With the sample's standard deviation σ, it becomes

R_{J+1} = Γ(J + 3/2) / (σ^{2J+3} √(2π))    (19)
and we can finally write R_1 as

R̂_{1,J} = f(R_{J+1}, J).    (20)

Given (20), one simply has to "plug in" this estimate in (11) to get

h_o = (Υ(K)/R̂_{1,J})^{1/3} n^{−1/3}.    (21)
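A compact sketch of the whole plug-in chain (19) → (18) → (16)/(15) → (21), written under the formulas as reconstructed above (so a sketch, not a definitive implementation), could read as follows in Python; it uses the identity φ^{(k)}(x) = (−1)^k He_k(x) φ(x) with the probabilists' Hermite polynomials, and all function names are ours:

```python
import numpy as np
from scipy.special import gamma, eval_hermitenorm
from scipy.stats import norm

def phi_deriv(k, x):
    # k-th derivative of the standard normal density:
    # phi^(k)(x) = (-1)^k * He_k(x) * phi(x).
    return (-1) ** k * eval_hermitenorm(k, x) * norm.pdf(x)

def roughness_hat(sample, m, a):
    # Jones-Sheather roughness estimator, Equ. (15), as printed above.
    n = len(sample)
    diffs = np.subtract.outer(sample, sample) / a
    return (-1) ** m / (n * a) * phi_deriv(2 * m, diffs).sum()

def plugin_bandwidth(sample, J=4):
    """Sequential plug-in rule (18) with reference estimate (19),
    returning the optimal smoothing h_o of Equ. (21)."""
    sample = np.asarray(sample, dtype=float)
    n, sigma = len(sample), sample.std(ddof=1)
    # Reference estimate (19) for R_{J+1}.
    R = gamma(J + 1.5) / (sigma ** (2 * J + 3) * np.sqrt(2 * np.pi))
    for m in range(J, 0, -1):
        # AMISE-optimal smoothing (16) for the estimate of R_m ...
        a = (2 ** (m + 0.5) * gamma(m + 0.5) / (np.pi * R * n)) ** (1 / (2 * m + 3))
        # ... plugged into (15), iterating down to R_1.
        R = roughness_hat(sample, m, a)
    # Equ. (21) with Upsilon(K) = 1/sqrt(pi) for the Gaussian kernel.
    return (1 / (np.sqrt(np.pi) * R)) ** (1 / 3) * n ** (-1 / 3)
```

No parameter other than the sample itself enters this computation, which is precisely the "self-tuning" property claimed above.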
Based on this framework, we can define a non-parametric, self-tuning MBAC algorithm. In order to do so, we insert (20) in (8) and the latter in (4). Thus, P̂_loss(ρ) in (5) finally reads

P̂_loss = P{Ω > ρ} = sup_{t≥0} ( 1 − (1/(n h_o)) Σ_{i=1}^{n} K((ρ − ζ_{t,i})/h_o) ).    (22)
In conclusion, the features of (22) account for three major issues discussed in Sec. 1. First, its configuration parameter P_loss is a common SLA QoS parameter. Further, its performance knob, the optimal smoothing h_o, is independent of human fine-tuning: the algorithm computes it autonomously. Finally, the algorithm's performance does not depend on traffic conditions through assumptions like, for example, the heavy-traffic approximation (CLT).
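Putting (5) and (22) together, an admission test could be sketched as below (our illustration; the per-time-scale sample layout is an assumption, and (22) is followed as printed above):

```python
import numpy as np
from scipy.stats import norm

def admit(samples, omega, C, r_p, P_loss, h_o):
    """Non-parametric MBAC test per Equ. (5) and (22).
    samples: dict mapping each time scale t (seconds) to the array of
    measured arrival samples zeta_{t,i} on that scale (assumed layout)."""
    p_hat = 0.0
    for t, zeta in samples.items():
        zeta = np.asarray(zeta, dtype=float)
        rho = omega + (C - r_p) * t          # rho = omega + Ct - r_p*t
        n = len(zeta)
        # Overflow probability estimate on scale t, Equ. (22).
        p_t = 1.0 - norm.cdf((rho - zeta) / h_o).sum() / (n * h_o)
        p_hat = max(p_hat, p_t)              # sup over the time scales
    return p_hat < P_loss                    # admission criterion (5)
```

The sup over t ≥ 0 is evaluated here over the finite set of measured time scales, as done in the evaluation below.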
3 Evaluation Setup
Common practice in this field of research is to implement and evaluate using the NS-2 Network Simulator. In order to gain optimal insight we defined a simple "n sources, one router, one destination" topology where MBAC is deployed at the bottleneck link between the router and the destination. As traffic model we opt for Voice over IP (VoIP) traffic, as it is the first QoS-requiring service with penetration on a global scale. As one motivation for the model was independence from assumptions, the question we ask is how well the algorithm performs when subjected to different load conditions. Thus, we varied the bottleneck link capacity C in the range of 1 to 5 Mbps, a range where the CLT assumption is far from reasonable. For each of these configurations, we evaluated the performance for a set of QoS targets P_loss in (5), namely {10^−2, 10^−3, 10^−4, 10^−5}. For the service request inter-arrival distribution we chose an Exponential distribution, and likewise for holding times. The latter has a fixed mean λ_h of 300s, while the former, λ_s, is adjusted such that for each capacity C the link is continuously overloaded and the MBAC works at its limit.
By choosing two different On/Off models with either Exponentially (EXPOO) or Pareto (POO) distributed On/Off times, another dimension of diversity is added. While the former represents a short-range dependent model, the latter produces work arrival processes with SS/LRD characteristics [31]. In the style of [32], mean On and Off times are respectively 0.3s and 0.6s, and in the On state sources emit 125-byte packets at a rate of 64 kbps. Thus, sources can be considered as G.711 encoded with Voice Activity Detection. Router buffer sizes vary with the link capacity too. For high quality VoIP communications, the maximum mouth-to-ear delay has to be less than 150 ms. Assuming a 10-hop path, as is common, and delay equally distributed over all hops, the buffer size in (4) is set to ω = C · 15 ms. Thus, the likelihood P_delay(D ≥ 15 ms) is the same as that of losing a packet. The time scales of interest in (4) are set to t = {0.02s, 0.04s, 0.06s, 0.08s, 0.1s} and the total simulation time was set to 4100s. By visual inspection, we found that for any configuration the time to reach steady state was well below 500s. Given this, the first 500s have been discarded for performance evaluation, resulting in an effective time of t_sim = 3600s, i.e. a busy hour.
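For concreteness, the parameters of a single configuration can be derived as follows (a sketch with the values quoted above; the variable names are ours):

```python
# 1 Mbps bottleneck configuration of the evaluation setup.
C = 1e6                    # bottleneck capacity in bit/s
omega = C * 0.015          # buffer: omega = C * 15 ms (150 ms budget / 10 hops)
r_p = 64e3                 # G.711 peak rate with VAD, 64 kbps
on, off = 0.3, 0.6         # mean On/Off durations in seconds
activity = on / (on + off)          # source is active 1/3 of the time
mean_rate = activity * r_p          # ~21.3 kbps average per flow
t_scales = [0.02, 0.04, 0.06, 0.08, 0.1]   # time scales of interest in (4)
```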
4 Performance Metrics
In [7], the authors stated that all algorithms under test perform equally given homogeneous sources. This is a sizeable statement. The conclusion has been drawn after evaluating the algorithms using the standard MBAC Load-Loss metric. The approach is to overload the network with flow arrivals and to measure

Loss = #pkts_lost / #pkts_sent,   Load = #pkts_fw / (C · t_sim),    (23)
where #pkts_fw denotes the packets forwarded on the bottleneck. But what does this metric tell us? Actually, only little about MBAC performance. Indeed, it simply reveals that for a given load in the network, the queue overload level is X. The point here is that to achieve a target network load, an algorithm's particular tuning parameter, typically some λ or δ depending on the model, is adjusted until it admits the required number of flows to produce the desired load. In the case of homogeneous sources, this number of flows is almost the same for each and every MBAC algorithm, and hence the loss is also the same; there is almost no difference except in the sequence of admissions. Thus, this metric does not really tell us much about the admission behaviour of a particular algorithm. What further confirms this finding is that in the case of heterogeneous sources the performance did actually vary, see Fig. 8 in [7]. Clearly, for this scenario each algorithm admits a particular sequence, and thus set, of sources to reach the desired load, resulting in individual aggregates with different characteristics. Consequently, the loss for each configuration is also different. Hence, this metric is only useful for heterogeneous sources. In [33], it has been shown that the worst-case traffic pattern is that of homogeneous On/Off sources, as used for this evaluation. This requires a modification
of the performance metric: we replace Loss in (23) with P_loss in (5). As a result we get a utilisation measure, implicitly expressing the number of flows admitted onto the network for a given QoS target. For example, a more aggressive algorithm results in higher resource usage; a more restrictive one leaves more spare resources. This metric therefore expresses a criterion for an ISP/CSP as discussed in Sec. 1, i.e. from a provider's perspective: given a QoS target, how much use does the algorithm make of available resources. The second metric we introduce reflects the customer's as well as the operator's point of view. More precisely, we are interested in how tightly the algorithm approaches the required QoS targets. For example, if a provider commits in an SLA to a P_loss of 10^−5 and the operator configures the algorithm accordingly, how accurately does the algorithm approach this limit.
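Restated as code (a trivial sketch; the names are ours), the standard metric (23) and its modification read:

```python
def load_loss(pkts_lost, pkts_sent, pkts_fw, C, t_sim):
    """Standard Load-Loss metric, Equ. (23)."""
    loss = pkts_lost / pkts_sent
    load = pkts_fw / (C * t_sim)   # fraction of capacity-time used
    return loss, load

# Modified metric: report load against the configured target P_loss of (5)
# instead of the measured loss, i.e. utilisation for a given QoS target.
```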
5 Load-Loss Performance
Standard Load-Loss performance results are shown in Fig. 1(a) and Fig. 1(b). The left graph depicts the results for the 5 Mbps configuration and both source models. One can see that the algorithm performs well, i.e. the network resources are used up to ∼0.94 times the link capacity C (0.94·C) for POO sources and a QoS target set to P_loss = 10^−2. For tighter QoS requirements the algorithm rejects more flows, and thus the lowest link utilisation, ∼0.82·C, occurs for EXPOO sources and a QoS target of 10^−5. In Fig. 1(b) results for the other extreme are illustrated, a link capacity C as low as 1 Mbps. Similar to the previous example, the algorithm makes good use of available resources for relaxed QoS targets, while the performance curve decays rapidly with tighter requirements. However, compared to the 5 Mbps case, the maximum link utilisation is much lower. The reason appears clear: a smaller capacity C means fewer flows admitted, and the resulting aggregate resembles much more that of a single source, i.e. an On/Off
(a) 5Mbps
(b) 1Mbps
Fig. 1. Target QoS objective versus Load performance for a 5Mbps (left) and 1Mbps (right) configuration. The X-Axis shows Ploss in (5), the Y-Axis Load as defined in (23) but in percentage.
Table 1. Loss–Load performance for each configuration and the POO source model

Ploss      1Mbps   2Mbps   3Mbps   4Mbps   5Mbps
0.010000   74.64   84.36   88.61   92.07   93.77
0.001000   69.33   79.50   84.83   88.30   90.68
0.000100   66.13   77.85   82.48   86.77   88.64
0.000010   64.87   75.65   81.52   85.39   87.67
pattern. It is therefore more impulsive, and as the algorithm accounts for this, it rejects more flows. The remaining results, for the 2, 3 and 4 Mbps configurations, exhibit similar patterns, and as space is scarce here we omit their graphical presentation. For each configuration and the POO model, exact numbers are presented in Tab. 1.
6 Control Accuracy Evaluation
In the style of the previous section, Fig. 2(a) depicts the precision of the algorithm for the 5 Mbps configuration. The measured loss as defined in (23) is plotted as a function of the configured QoS objective. The plotted performance curve is roughly divided into two parts, with its midpoint located at a target loss probability P_loss = 10^−3. Approaching this point from the right, QoS objectives are closely met, i.e. Loss ≈ P_loss for P_loss = 10^−2 and P_loss = 10^−3. The opposite is the case for P_loss = 10^−4 and P_loss = 10^−5: for this configuration the algorithm is slightly too daring and the QoS objective deviates from its optimum. A similar behaviour is depicted in Fig. 2(b), which illustrates the results for the 1 Mbps configuration. The optimal working point in this case lies exactly at P_loss = 10^−3. For the less stringent requirement P_loss = 10^−2 the algorithm guarantees the QoS objective, and it approaches the optimal point tightly for both traffic models. The same cannot be claimed for the tighter QoS objectives, namely P_loss = 10^−4 or 10^−5. As in the 5 Mbps configuration, the algorithm is too daring, admitting too many flows into the network with the result of greater packet loss than demanded. There are basically two reasons which contribute to this imprecision. As already mentioned, for such a small link capacity the aggregate is very impulsive and the true marginal distribution is therefore discrete rather than continuous, with consequences for the optimality of the estimator. Besides the former, the dominant factor lies in the queuing model itself. Observant readers might have noticed that (22) poses an overflow model rather than a loss model. Consequently, the algorithm does not differentiate between a one-packet and a 100-packet loss but considers both cases as an equal overflow. Further, the definition of ρ assumes that the buffer is always empty at any observation point. There is one important feature to notice. In both cases the configuration behaviour is practically equal for both traffic models, i.e. the curves are tightly
(a) 5Mbps
(b) 1Mbps
Fig. 2. Target QoS objective versus Loss performance for a 5Mbps (left) and 1Mbps (right) configuration. The X-Axis shows Ploss in (5), the Y-Axis Loss as defined in (23).
superimposed. This implies that there is a chance for a consistent mapping between the configuration and the experienced QoS level, independent of actual traffic characteristics. In other words, it allows one to acquire reliable knowledge about the deviation for a certain setting. Furthermore, though slightly imprecise, the deviations occur only for very stringent cases; for these settings the imprecision is relatively significant but, in absolute terms, of questionable importance. More detailed insight, namely precise numbers for the POO source model, is tabulated in Tab. 2, as before for all configurations.

Table 2. Accuracy evaluation for each configuration and the POO source model

Ploss      1Mbps      2Mbps      3Mbps      4Mbps      5Mbps
0.010000   0.004293   0.005778   0.005057   0.007468   0.011436
0.001000   0.001092   0.001582   0.001771   0.002409   0.004424
0.000100   0.000683   0.001056   0.000644   0.001801   0.001731
0.000010   0.000789   0.000341   0.000553   0.000938   0.001665

7 Conclusion
The MBAC algorithm design presented in this work has been devised to target three major criticisms of MBAC. First, configuration parameters are without meaning in an SLA or QoS context. Second, common MBAC design is based on parametric models, incurs dependencies on assumptions and therefore limits applicability; hence MBAC performance depends strongly on a match between model assumptions and traffic characteristics, causing reasonable performance for one type of traffic and poor performance for another. Finally, individual performance depends on human fine-tuning of model-specific tuning knobs.
Defining a simple queuing model, the algorithm's configuration parameter is P_loss. This solves the first point: configuration in conformance with common SLA QoS parameters. By using a purely non-parametric model, i.e. free from any assumption, the algorithm is free of any statistical dependency, which distinguishes this approach from common designs. The results are promising. The algorithm exhibits consistent performance if traffic characteristics are varied in either dimension, namely link capacity (aggregation) and traffic nature, i.e. source models. Furthermore, the proposed model tackles the third major criticism of MBAC algorithms, the requirement of human performance-parameter fine-tuning. Indeed, the algorithm can be considered self-tuning, as it estimates the optimal value of its only performance parameter, namely h_o, from the sample. No human intervention is needed. Simulations confirmed that the algorithm performs efficiently, i.e. makes near optimal use of resources, and approaches granted QoS commitments tightly. Most important, its behaviour is consistent, i.e. its performance and configuration are nearly independent of traffic conditions.
Acknowledgement

The authors sincerely acknowledge the invaluable discussions with Prof. Paulo Oliveira of the Department of Mathematics and Prof. Helder Araújo of the Department of Electrical Engineering, both of the University of Coimbra. This work has been jointly supported by the EU IST Integrated Project WEIRD (WiMAX Extensions for Remote and Isolated Data Networks) and the ESF COST 290 Action.
References

1. S. Georgoulas, P. Trimintzios, and G. Pavlou. Joint Measurement- and Traffic Descriptor-based Admission Control at Real-Time Traffic Aggregation. In ICC '04: IEEE International Conference on Communications, 2004.
2. E. Knightly and J. Qiu. Measurement-based admission control with aggregate traffic envelopes. In IEEE ITWDC '98, Ischia, Italy, Sep 1998.
3. M. Grossglauser and D. N. C. Tse. A framework for robust measurement-based admission control. IEEE/ACM Transactions on Networking, 7(3):293–309, 1999.
4. J. Qiu and E. W. Knightly. Measurement-based admission control with aggregate traffic envelopes. IEEE/ACM Transactions on Networking, 9(2):199–210, 2001.
5. T. K. Lee and M. Zukerman. Practical approaches for connection admission control in multiservice networks. In Proceedings of IEEE ICON '99, pages 172–177, Brisbane, Australia, May 1999.
6. L. Breslau, S. Jamin, and S. Shenker. Measurement-based admission control: what is the research agenda? In IWQoS, pages 3–5, London, UK, Mar 1999.
7. L. Breslau, S. Jamin, and S. Shenker. Comments on the Performance of Measurement-Based Admission Control Algorithms. In INFOCOM, pages 1233–1242, Tel Aviv, Israel, Mar 2000.
8. A. W. Moore. An implementation-based comparison of Measurement-Based Admission Control algorithms. J. High Speed Networks, 13(2):87–102, 2004.
9. J. Evans and C. Filsfils. Deploying DiffServ at the network edge for tight SLAs, part 1. IEEE Internet Computing, pages 61–65, Jan 2004.
10. J. Evans and C. Filsfils. Deploying DiffServ at the network edge for tight SLAs, part 2. IEEE Internet Computing, pages 61–69, Mar 2004.
11. S. Floyd. Comments on measurement-based admissions control for controlled-load services. Technical report, Aug 1996.
12. V. Paxson and S. Floyd. Wide-Area Traffic: The Failure of Poisson Modeling. In SIGCOMM, pages 257–268, 1994.
13. G. Mao. Finite Timescale Range of Interest for Self-Similar Traffic Measurements, Modelling and Performance Analysis. In IEEE International Conference on Networks, ICON '03, pages 7–12, Sydney, 2003.
14. R. G. Addie. On the applicability and utility of Gaussian models for broadband traffic. In Proceedings of the International Conference on ATM, Colmar, France, 1998.
15. C. Fraleigh, F. Tobagi, and C. Diot. Provisioning IP backbone networks to support latency sensitive traffic. In INFOCOM 2003, volume 1, pages 375–385. IEEE, Apr 2003.
16. J. Beran. Statistics for Long-Memory Processes. Chapman & Hall / CRC, 1st edition, 1994.
17. J. Kilpi and I. Norros. Testing the Gaussian approximation of aggregate traffic. In Internet Measurement Workshop, pages 49–61. ACM, 2002.
18. W. Willinger, D. Alderson, and L. Li. A pragmatic approach to dealing with high-variability in network measurements. In Alfio Lombardo and James F. Kurose, editors, Internet Measurement Conference, pages 88–100. ACM, 2004.
19. D. W. Scott. Multivariate Density Estimation. Probability and Mathematical Statistics. John Wiley & Sons, 1st edition, 1992.
20. T. Karagiannis, M. Molle, M. Faloutsos, and A. Broido. A Nonstationary Poisson View of Internet Traffic. In INFOCOM, 2004.
21. J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun. On the nonstationarity of Internet traffic. In ACM SIGMETRICS '01, pages 102–112, New York, NY, USA, Jul 2001. ACM Press.
22. A. Scherrer, N. Larrieu, P. Owezarski, and P. Abry. Non Gaussian and Long Memory Statistical Characterisation for Internet Traffic with Anomalies. Technical report, Ecole Normale Superieure de Lyon, 2005.
23. M. Zukerman and P. W. Tse. An Adaptive Connection Admission Control Scheme for ATM Networks. In ICC (3), pages 1153–1157, 1997.
24. K. Gopalan, T. C. Chiueh, and Y.-J. Lin. Probabilistic delay guarantees using delay distribution measurement. In ACM Multimedia, pages 900–907, 2004.
25. M. Menth, J. Milbrandt, and S. Oechsner. Experience Based Admission Control. In The Ninth IEEE Symposium on Computers and Communications, ISCC 2004, Alexandria, Egypt, 2004.
26. B. W. Silverman. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, 1st edition, 1986.
27. M. C. Jones. The performance of kernel density functions in kernel distribution function estimation. Statistics & Probability Letters, 9(2):129–132, Feb 1990.
28. M. C. Jones and S. J. Sheather. Using non-stochastic terms to advantage in kernel distribution function estimation. Statistics and Probability Letters, 11:511–541, Oct 1991.
29. B. E. Hansen. Bandwidth Selection for Nonparametric Distribution Estimation. Technical report, University of Wisconsin, May 2004.
30. J. S. Marron and M. P. Wand. Exact mean integrated squared error. Annals of Statistics, 20:712–736, 1992.
31. T. M. Bohnert and E. Monteiro. A comment on simulating LRD traffic with Pareto ON/OFF sources. In CoNEXT '05: ACM Conference on Emerging Network Experiment and Technology, pages 228–229, Toulouse, France, Oct 2005. ACM Press.
32. A. P. Markopoulou, F. A. Tobagi, and M. J. Karam. Assessing the quality of voice communications over Internet backbones. IEEE Transactions on Networking, 11(5):747–760, Oct 2003.
33. G. Mao and D. Habibi. A Cell Loss Upper Bound for Heterogeneous On-Off Sources with Application to Connection Admission Control. Computer Communications, 25(13):1172–1184, Aug 2002.
Optimal Rate Allocation in Overlay Content Distribution

Chuan Wu and Baochun Li
Department of Electrical and Computer Engineering, University of Toronto
{chuanwu,bli}@eecg.toronto.edu
Abstract. This paper addresses the optimal rate allocation problem in overlay content distribution for efficient utilization of limited bandwidths. We systematically present a series of optimal rate allocation strategies by dividing our discussions into four typical scenarios. Based on application-specific requirements, these scenarios reflect the contrast between elastic and streaming content distribution, with either per-link or per-node capacity constraints. In each scenario, we show that the optimal rate allocation problem can be formulated as a linear optimization problem, which can be solved efficiently in a fully distributed fashion. In simulations, we investigate the convergence of our distributed algorithms in both static and dynamic networks, and demonstrate their efficiency. Keywords: Overlay Network, Rate Allocation, Optimization.
1 Introduction
In recent years, content distribution with overlay networks has been proposed to offer more efficient bandwidth usage than that with multiple unicast sessions. To achieve better bandwidth utilization and failure resilience, overlay content distribution over mesh topologies has become typical in most recent proposals, which features parallel downloading from multiple overlay nodes. However, due to the limitation of bandwidth capacities in overlay networks, a critical question remains to be answered in any such content distribution scheme: What is the best way to select upstream peers and allocate flow rates in an overlay topology, such that content can be most efficiently distributed? To effectively address this question, it is necessary to consider that different content distribution applications may have different optimality goals and constraints for their rate allocation strategies. The content to be distributed can be classified into two broad categories: elastic content (e.g., bulk data files), and streaming content with specific bit rate requirements (e.g., media streaming). In the case of distributing elastic content, such as file downloading, it is important to optimally select upstream nodes and allocate flow rates so that throughput of content distribution can be maximized. In the case of distributing media streams, the required streaming rate needs to be sustained for all receivers in active sessions. Besides, in both cases, capacity constraints in overlay networks may lie in
the overlay links (link capacity constraints), due to limited available bandwidth along the link, or the overlay nodes (node download/upload capacity constraints), caused by limited node download/upload capacities. In this paper, we consider both types of content and both assumptions of capacity constraints, and systematically present a series of optimal rate allocation strategies in four content distribution scenarios. We show that in each scenario, the optimal rate allocation problem can be modeled into a linear optimization problem, for which efficient and fully decentralized solutions exist. The remainder of the paper is as follows. In Sec. 2 and Sec. 3, we motivate the optimization formulations in elastic and streaming content distribution scenarios, respectively, and present efficient distributed solution algorithms. We discuss practical concerns of applying the algorithms in Sec. 4. Simulation results are presented in Sec. 5. We discuss related work and conclude the paper in Sec. 6 and Sec. 7, respectively.
2 eBurst: Distribution of Elastic Content
In this paper, we consider content distribution sessions in mesh overlay topologies, consisting of one data source S and a set of receivers in T. Each receiver is served by one or more upstream nodes, and may serve one or more downstream nodes. Such a topology can be modeled as a directed graph G = (N, A), where N is the set of overlay nodes and A is the set of overlay links. We have N = S ∪ T. To distribute elastic content, it is always desirable to achieve maximum throughput in order to minimize the total time to completion. The problem is: How do we optimally allocate rates on each overlay link to maximize throughput? We show that such a problem, referred to as eBurst, can be formulated as linear programs. In order to better characterize the multicast flow of a content distribution session, we resort to the notion of conceptual unicast flows [1] in formulating the linear programs. A multicast content distribution flow can be conceptually viewed as consisting of multiple unicast flows from the source to each of the receivers. These conceptual unicast flows co-exist in the overlay without contending for capacities, and the actual delivery rate on an overlay link is the maximum of the rates of all the conceptual flows going through the link. In formulating the linear programs, the utilization of conceptual unicast flows is useful to capture the inherent property of a multicast flow, as the conceptual unicast flows follow the nice property of flow conservation at each intermediate node, while the multicast flows do not.

2.1 eBurst with Link Capacity Constraints
We first consider the assumption that capacity constraints lie in the overlay links, which is the case when the bottleneck is in the core of the overlay, such as transcontinental links. Let u_ij be the capacity of overlay link (i,j). R denotes the throughput of the content distribution session, i.e., the aggregate receiving rate at each participating peer. x_ij is the delivery rate on link (i,j). Let f^t denote the conceptual unicast flow from source S to a receiver t, |f^t| be its flow rate, and f^t_ij be the rate of f^t flowing through link (i,j). The eBurst problem with Link Capacity Constraints (LCC) can be formulated as the linear program in Table 1, referred to as the eBurst LCC LP.

Table 1. eBurst LCC LP

max R
subject to
Σ_{j:(i,j)∈A} f^t_ij − Σ_{j:(j,i)∈A} f^t_ji = b^t_i, ∀i ∈ N, ∀t ∈ T,    (1)
f^t_ij ≥ 0, ∀(i,j) ∈ A, ∀t ∈ T,    (2)
f^t_ij ≤ x_ij, ∀(i,j) ∈ A, ∀t ∈ T,    (3)
0 ≤ x_ij ≤ u_ij, ∀(i,j) ∈ A,    (4)
R ≥ 0,
where b^t_i = R if i = S; −R if i = t; 0 otherwise.

In this LP, (1) and (2) model each conceptual flow f^t as a valid network flow, following flow conservation. (3) represents the relation between conceptual flow rates and the actual delivery rate on each link, which is further constrained by link capacities in (4). There exists an efficient combinatorial algorithm to solve the eBurst LCC LP. By reformulating constraints (2), (3) and (4) as 0 ≤ f^t_ij ≤ u_ij, f^t_ij ≤ x_ij ≤ u_ij, ∀(i,j) ∈ A, ∀t ∈ T, we notice that this LP can be decomposed into |T| maximum flow problems, each corresponding to one conceptual unicast flow f^t, ∀t ∈ T. Therefore, this LP can be solved by computing maximum flows from the source to each of the receivers; the delivery rate x_ij is then decided as the maximum of the rates of all the maximum conceptual flows on (i,j). Since the maximum flow problem can be solved by distributed algorithms, such as the push-relabel algorithm [2], we are able to derive an efficient decentralized algorithm for the LP, as given in Table 2.

Table 2. Distributed algorithm for eBurst LCC LP

1. Compute maximum flow f^t from S to t, ∀t ∈ T, with the distributed push-relabel algorithm.
2. Compute the maximum throughput R = min_{t∈T} |f^t|.
3. Compute optimal rates x_ij = max_{t∈T} f^t_ij, ∀(i,j) ∈ A.
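As an illustration of Table 2, here is a centralised Python sketch that uses networkx's maximum-flow routine in place of the distributed push-relabel algorithm assumed by the paper (the graph layout and function names are our assumptions):

```python
import networkx as nx

def eburst_lcc(G, S, receivers):
    """Table 2 steps: G is an nx.DiGraph whose edges carry a
    'capacity' attribute u_ij; returns (R, {(i, j): x_ij})."""
    flows = {}
    for t in receivers:
        # Step 1: maximum conceptual flow f^t from S to receiver t.
        value, flow = nx.maximum_flow(G, S, t, capacity='capacity')
        flows[t] = (value, flow)
    # Step 2: maximum throughput R = min_t |f^t|.
    R = min(value for value, _ in flows.values())
    # Step 3: delivery rates x_ij = max_t f^t_ij.
    x = {(i, j): max(flow[i][j] for _, flow in flows.values())
         for i, j in G.edges}
    return R, x
```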
2.2 eBurst with Node Capacity Constraints
When bandwidth bottlenecks occur at the "last-mile" links to the overlay nodes, it is more appropriate to model capacity constraints at each node rather than each link, with maximum upload and download capacities. For node i, let O_i be its upload capacity and I_i its download capacity. The linear program with Node Capacity Constraints (NCC) is formulated in Table 3, referred to as the eBurst NCC LP.

Table 3. eBurst NCC LP

max R
subject to
Σ_{j:(i,j)∈A} f^t_ij − Σ_{j:(j,i)∈A} f^t_ji = b^t_i, ∀i ∈ N, ∀t ∈ T,    (5)
f^t_ij ≥ 0, ∀(i,j) ∈ A, ∀t ∈ T,    (6)
f^t_ij ≤ x_ij, ∀(i,j) ∈ A, ∀t ∈ T,    (7)
Σ_{j:(i,j)∈A} x_ij ≤ O_i, ∀i ∈ N,    (8)
Σ_{j:(j,i)∈A} x_ji ≤ I_i, ∀i ∈ N,    (9)
R ≥ 0, x_ij ≥ 0, ∀(i,j) ∈ A,
where b^t_i = R if i = S; −R if i = t; 0 otherwise.

This LP regulates delivery rates on the overlay links using node capacities in (8)(9), rather than link capacities in (4). It is not readily decomposable into known combinatorial optimization problems. To obtain a distributed algorithm, we apply Lagrangian relaxation and design the corresponding subgradient algorithm, which is an efficient LP solution technique and can be naturally implemented in a distributed manner. We have derived a fully decentralized algorithm by applying Lagrangian relaxation to the dual of the eBurst NCC LP. Due to space limits, we only provide the main idea behind the algorithm; for complete details, interested readers are referred to our technical report [3]. We first note that, if we can decide the set of optimal delivery rates x_ij, ∀(i,j) ∈ A, that satisfy (8)(9), the eBurst NCC LP boils down to an eBurst LCC problem. In order to obtain the optimal values for the primal variables x_ij, we investigate the variable-constraint correspondence between an LP and its dual, i.e., each primal variable corresponds to one dual constraint. When Lagrangian relaxation is applied to the dual LP, a primal variable is actually the same as the Lagrangian multiplier associated with its corresponding dual constraint. We further understand that with the Lagrangian relaxation technique, the optimal values for the Lagrangian multipliers can be obtained by the subgradient algorithm. Therefore, to acquire x_ij, we solve the dual LP of the eBurst NCC LP in Table 3 with Lagrangian relaxation and the subgradient algorithm, relaxing the set of dual constraints corresponding to the primal variables x_ij, ∀(i,j) ∈ A.
Table 4. Distributed algorithm for eBurst NCC LP

1. Initialize rates x_ij[0], ∀(i,j) ∈ A, to non-negative values.
2. Repeat the following iteration until the sequence {x[k]} converges to x*:
 (1) With x_ij[k] as the upper bound of the delivery rate on link (i,j), ∀(i,j) ∈ A, compute the maximum flow from S to t, ∀t ∈ T, with the distributed push-relabel algorithm;
 (2) Update x by
  — Compute x̄ = x[k] + θ[k] Σ_{t∈T} z^t[k], where θ[k] = a/(b + ck), a > 0, b ≥ 0, c > 0, and for all (i,j) ∈ A,
   z^t_ij[k] = 1 if edge (i,j) is in the min cut of the minimum of all maximum flows from S to t, ∀t ∈ T; 0 otherwise.
  — Project x̄ onto the feasible simplex
   P = {x | Σ_{j:(i,j)∈A} x_ij ≤ O_i, Σ_{j:(j,i)∈A} x_ji ≤ I_i, ∀i ∈ N; x_ij ≥ 0, ∀(i,j) ∈ A}
   by x_ij[k+1] = min( x̄_ij, (x̄_ij / Σ_{m:(i,m)∈A} x̄_im) O_i, (x̄_ij / Σ_{m:(m,j)∈A} x̄_mj) I_j ), ∀(i,j) ∈ A.
 → Optimal delivery rates obtained.
3. With x*_ij as the link capacity on link (i,j), ∀(i,j) ∈ A, compute the maximum flow f^t from S to t, ∀t ∈ T, with the distributed push-relabel algorithm.
4. Compute the maximum throughput R = min_{t∈T} |f^t|.
 → Maximum content distribution throughput obtained.
We also observe that the LP in Table 3 has the underlying structure of maximum flow problems. Therefore, due to the primal-dual relationship between max-flow and min-cut linear programs, its dual LP has the underlying structure of min-cut problems, which we can utilize when solving the dual LP with subgradient algorithm. The complete distributed algorithm is shown in Table 4. This distributed algorithm has nice combinatorial interpretations. Starting from some initial feasible delivery rates, the optimal rates are derived iteratively. In each iteration, we increase rates on links in the current minimum cut of the network, i.e., links that are saturated with currently allocated rates, and always guarantee node capacity constraints are satisfied by projecting increased rates onto the feasible simplex P . After this projection, the bandwidth share for non-saturated links, i.e., links that are not in the minimum cut, is reduced while that for saturated links is increased. This refinement repeats itself until the optimal rate allocation on all the links is achieved.
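For instance, the projection step of Table 4 might be sketched in Python as follows (the dict-based data layout and the guards for empty sums are our assumptions):

```python
def project_rates(x, O, I, out_nbrs, in_nbrs):
    """Project candidate rates x onto the simplex P of Table 4.
    x: {(i, j): rate}; O[i]/I[i]: upload/download capacities;
    out_nbrs[i]/in_nbrs[j]: successor/predecessor node lists."""
    projected = {}
    for (i, j), xij in x.items():
        out_sum = sum(x[(i, m)] for m in out_nbrs[i])
        in_sum = sum(x[(m, j)] for m in in_nbrs[j])
        # Scale x_ij so node capacities hold, keeping each link's
        # proportional share of its end nodes' totals.
        by_upload = xij / out_sum * O[i] if out_sum > 0 else xij
        by_download = xij / in_sum * I[j] if in_sum > 0 else xij
        projected[(i, j)] = min(xij, by_upload, by_download)
    return projected
```

Note how the proportional scaling realizes the behaviour described above: saturated links retain (or grow) their share, while non-saturated links give up bandwidth.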
3 sBurst: Distribution of Streaming Content
Real-time content streaming, such as live multimedia or stock quotes, usually demands a fixed streaming rate, r, to sustain the streaming session. For
these applications, instead of maximizing throughput, it is desirable to optimize rate allocations to minimize the total cost of streaming, while guaranteeing the streaming rate r. More rigorously, if we use c_ij to denote the streaming cost on an overlay link (i,j), our objective is to minimize Σ_{(i,j)∈A} c_ij x_ij. When c_ij represents per-link delay, the optimal rate allocation minimizes the total delay of the session. When all c_ij's are 1, the total bandwidth usage is minimized and thus the best bandwidth efficiency is achieved by the optimization. Henceforth, this optimization problem is referred to as sBurst.

3.1 sBurst with Link Capacity Constraints
The sBurst problem with link capacity constraints, referred to as sBurst LCC LP, is formulated in Table 5.

Table 5. sBurst LCC LP

min Σ_{(i,j)∈A} c_ij x_ij
subject to
Σ_{j:(i,j)∈A} f^t_ij − Σ_{j:(j,i)∈A} f^t_ji = b^t_i, ∀i ∈ N, ∀t ∈ T,    (10)
f^t_ij ≥ 0, ∀(i,j) ∈ A, ∀t ∈ T,    (11)
f^t_ij ≤ x_ij, ∀(i,j) ∈ A, ∀t ∈ T,    (12)
0 ≤ x_ij ≤ u_ij, ∀(i,j) ∈ A,    (13)
where b^t_i = r if i = S; −r if i = t; 0 otherwise.
This LP can be solved with Lagrangian relaxation and a subgradient algorithm, by relaxing constraint group (12). Associating Lagrangian multipliers μ^t_ij with (12), we obtain the Lagrangian dual of the sBurst LCC LP:

max_{μ≥0} L(μ)    (14)

where

L(μ) = min_P [ Σ_{(i,j)∈A} x_ij (c_ij − Σ_{t∈T} μ^t_ij) + Σ_{t∈T} Σ_{(i,j)∈A} μ^t_ij f^t_ij ],    (15)

and polytope P is defined by constraints (10)(11)(13). The Lagrangian subproblem in (15) can be decomposed into |T| shortest path problems:

min Σ_{(i,j)∈A} μ^t_ij f^t_ij    (16)
subject to
Σ_{j:(i,j)∈A} f^t_ij − Σ_{j:(j,i)∈A} f^t_ji = b^t_i, ∀i ∈ N,
f^t_ij ≥ 0, ∀(i,j) ∈ A,

for each t ∈ T, and a minimization problem

min Σ_{(i,j)∈A} x_ij (c_ij − Σ_{t∈T} μ^t_ij)    (17)

subject to 0 ≤ x_ij ≤ u_ij, ∀(i,j) ∈ A. The shortest path problems in (16) can be efficiently solved in a distributed manner by the distributed Bellman-Ford algorithm [4]. The optimal solution to the minimization problem in (17) can be computed as follows:

x_ij = 0 if Σ_{t∈T} μ^t_ij ≤ c_ij;  x_ij = u_ij if Σ_{t∈T} μ^t_ij > c_ij.    (18)

In each iteration of the subgradient algorithm, we solve the subproblems in (16) and (17) with the current Lagrangian multiplier values μ[k]. Then we update the Lagrangian multipliers by

μ^t_ij[k+1] = max(0, μ^t_ij[k] + θ[k](f^t_ij[k] − x_ij[k])), ∀(i,j) ∈ A, ∀t ∈ T,

where θ is a prescribed sequence of step sizes satisfying: θ[k] > 0, lim_{k→∞} θ[k] = 0, and Σ_{k=1}^{∞} θ[k] = ∞.
Since the primal values in the optimal solution of the Lagrangian dual are not necessarily optimal to the primal LP, we further apply the algorithm introduced by Sherali et al. [5] to recover the optimal primal values. At the kth iteration, we compose a primal iterate f̄^t_ij[k] via

f̄^t_ij[k] = Σ_{h=1}^{k} λ^k_h f^t_ij[h],    (19)

where Σ_{h=1}^{k} λ^k_h = 1 and λ^k_h ≥ 0, for h = 1, ..., k. In our algorithm, we choose the step length sequence θ[k] = a/(b+ck), ∀k, a > 0, b ≥ 0, c > 0, and convex combination weights λ^k_h = 1/k, ∀h = 1, ..., k, ∀k. These guarantee the convergence of our subgradient algorithm; they also guarantee that any accumulation point f̄* of the sequence {f̄[k]} generated via (19) is an optimal solution to the primal problem in Table 5 [5]. Now we can design our distributed algorithm to solve the sBurst LCC LP. We delegate the computation tasks on link (i,j) to be carried out by incident node j. The algorithm to be executed by each node is given in Table 6.
Table 6. Distributed algorithm on node j for sBurst LCC LP

1. Initialize Lagrangian multipliers μ^t_ij[0], ∀i : (i,j) ∈ A, ∀t ∈ T, to non-negative values.
2. Repeat the following iteration until sequence {μ[k]} converges to μ* and {f̄[k]} converges to f̄*: ∀i : (i,j) ∈ A, ∀t ∈ T
 (1) Compute f^t_ij[k] by the distributed Bellman-Ford algorithm;
 (2) Compute x_ij[k] by Eqn. (18);
 (3) Compute f̄^t_ij[k] = Σ_{h=1}^{k} (1/k) f^t_ij[h] = ((k−1)/k) f̄^t_ij[k−1] + (1/k) f^t_ij[k];
 (4) Update Lagrangian multiplier μ^t_ij[k+1] = max(0, μ^t_ij[k] + θ[k](f^t_ij[k] − x_ij[k]));
3. Compute optimal rate x*_ij = max_{t∈T} f̄*^t_ij, ∀i : (i,j) ∈ A.
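One synchronous round of the Table 6 scheme, written centrally for readability (the paper runs it per node j; networkx's Bellman-Ford stands in for the distributed variant, and the data layout is our assumption):

```python
import networkx as nx

def sburst_lcc_round(G, S, receivers, r, cost, u, mu, f_bar, k, theta):
    """cost[(i,j)] = c_ij, u[(i,j)] = u_ij, mu[t][(i,j)]: multipliers,
    f_bar[t][(i,j)]: averaged primal iterates of Equ. (19)."""
    f = {t: dict.fromkeys(G.edges, 0.0) for t in receivers}
    for t in receivers:
        # Subproblem (16): route demand r on the mu^t-shortest path.
        path = nx.shortest_path(G, S, t, method='bellman-ford',
                                weight=lambda i, j, d, t=t: mu[t][(i, j)])
        for e in zip(path, path[1:]):
            f[t][e] = r
    # Subproblem (17) solved by the threshold rule (18).
    x = {e: (u[e] if sum(mu[t][e] for t in receivers) > cost[e] else 0.0)
         for e in G.edges}
    for t in receivers:
        for e in G.edges:
            # Primal averaging (19) with lambda^k_h = 1/k.
            f_bar[t][e] = (k - 1) / k * f_bar[t][e] + f[t][e] / k
            # Multiplier update with step theta[k] = a / (b + c*k).
            mu[t][e] = max(0.0, mu[t][e] + theta * (f[t][e] - x[e]))
    return x
```

Iterating this round with a diminishing step size θ[k] = a/(b+ck) reproduces the convergence behaviour studied in Sec. 5.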
Table 7. sBurst NCC LP

min Σ_{(i,j)∈A} c_ij x_ij
subject to
Σ_{j:(i,j)∈A} f^t_ij − Σ_{j:(j,i)∈A} f^t_ji = b^t_i, ∀i ∈ N, ∀t ∈ T,
f^t_ij ≥ 0, ∀(i,j) ∈ A, ∀t ∈ T,
f^t_ij ≤ x_ij, ∀(i,j) ∈ A, ∀t ∈ T,    (20)
Σ_{j:(i,j)∈A} x_ij ≤ O_i, ∀i ∈ N,
Σ_{j:(j,i)∈A} x_ji ≤ I_i, ∀i ∈ N,
x_ij ≥ 0, ∀(i,j) ∈ A,
where b^t_i = r if i = S; −r if i = t; 0 otherwise.

3.2 sBurst with Node Capacity Constraints

The sBurst problem with node capacity constraints, referred to as sBurst NCC LP, is formulated in Table 7. This LP can be solved with similar Lagrangian relaxation techniques as the sBurst LCC LP, by relaxing (20). The only difference is that the resulting minimization subproblem is defined differently:

min Σ_{(i,j)∈A} x_ij (c_ij − Σ_{t∈T} μ^t_ij)    (21)
subject to
Σ_{j:(i,j)∈A} x_ij ≤ O_i, ∀i ∈ N,
Σ_{j:(j,i)∈A} x_ji ≤ I_i, ∀i ∈ N,
x_ij ≥ 0, ∀(i,j) ∈ A,
which is an inequality constrained transportation problem and can be solved by the distributed auction algorithm [6]. Thus, we can also design a distributed algorithm for the sBurst NCC LP, as summarized in Table 8.

Table 8. Distributed algorithm on node j for sBurst NCC LP

1. Initialize Lagrangian multipliers μ^t_ij[0], ∀i : (i,j) ∈ A, ∀t ∈ T, to non-negative values.
2. Repeat the following iteration until sequence {μ[k]} converges to μ* and {f̄[k]} converges to f̄*: ∀i : (i,j) ∈ A, ∀t ∈ T
 (1) Compute f^t_ij[k] by the distributed Bellman-Ford algorithm;
 (2) Compute x_ij[k] by the distributed auction algorithm;
 (3) Compute f̄^t_ij[k] = ((k−1)/k) f̄^t_ij[k−1] + (1/k) f^t_ij[k];
 (4) Update Lagrangian multiplier μ^t_ij[k+1] = max(0, μ^t_ij[k] + θ[k](f^t_ij[k] − x_ij[k]));
3. Compute optimal rate x*_ij = max_{t∈T} f̄*^t_ij, ∀i : (i,j) ∈ A.

4 Algorithm Execution in Dynamic Overlays
In an overlay session characterized by dynamics, the proposed distributed algorithms are also invoked in a dynamic manner. When a node joins a session, it is bootstrapped with a set of upstream nodes. It then starts downloading with the available upload capacities acquired from them. Meanwhile, it requests the source to recompute the optimal rate allocation. When a node departs from a session or fails, an affected downstream node attempts to acquire additional bandwidths from its remaining upstream nodes. Meanwhile, it requests the source to recompute the optimal rate allocation. At the source, when it receives more than a certain number of requests for recomputation, it broadcasts such a request to all the nodes, which activate a new round of execution of the distributed algorithm, while continuing to download at the original rates. Note that in such a dynamic environment, the execution of a distributed algorithm always starts from the previous optimal values (rather than from the very beginning when all values are initialized to any non-negative values, such as zeros), thus expediting its convergence. After the distributed algorithm converges, all the nodes adjust their download rates to the new optimal values.
5 Performance Evaluation
We next conduct an empirical study of the distributed optimization algorithms. All simulations are conducted over random network topologies generated with the BRITE [7] topology generator based on power-law degree distributions. The average number of neighbors per node in the topologies is six. For link-constrained problems, link capacities are generated with heavy-tailed distributions between 100 Kbps and 4 Mbps; for node-constrained problems, each node has 1.5−4.5 Mbps of download capacity and 0.6−0.9 Mbps of upload capacity. For sBurst problems, streaming of a 300 Kbps bitstream is simulated and cost coefficients are random numbers chosen from (0, 3).

5.1 Convergence in Static Networks
To investigate the scalability of our optimal rate allocation algorithms, we first evaluate their convergence speed in static networks of different sizes. For eBurst LCC LP, we have shown in Table 2 a purely combinatorial algorithm, which can derive the solution efficiently. Here, we are more concerned with the efficiency of the iterative subgradient algorithms to solve the other three problems, given in Table 4, 6, 8, respectively.
Fig. 1. Convergence speed in static networks for the distributed algorithms in Tables 4, 6, 8. Panels: (A) eBurst NCC LP, (B) sBurst LCC LP, (C) sBurst NCC LP; X-axis: number of nodes in the network (50–500); Y-axis: number of iterations to reach feasibility, 90% optimality, and optimality.
Fig. 1 shows that for all three problems, with the increase of network sizes, the number of iterations their algorithms take to achieve optimality increases only slowly, thus not affecting algorithm scalability. In all cases, the algorithms converge to feasibility within only a few rounds. Convergence to 90% optimality is also much faster, requiring about 20% fewer rounds than convergence to full optimality. Therefore, in realistic networks, we can obtain a feasible solution with a certain degree of optimality in a much shorter time, when it is not necessary to achieve absolute optimality.

5.2 Convergence in Dynamic Networks
We next investigate the algorithm convergence in practical dynamic environments. Due to space limit, we only show the results obtained by the eBurst NCC algorithm in Table 4, while other algorithms produce similar results.
Fig. 2. Convergence speed in a dynamic network for the eBurst NCC algorithm in Table 4. Panels: (A) node joining, (B) node leaving; X-axis: number of nodes joined/left in the session; Y-axis: number of additional iterations.

Fig. 3. Throughput achieved in a dynamic network with the eBurst NCC algorithm in Table 4. Panels: (A) node joining, (B) node leaving; X-axis: number of nodes joined/left in the session; Y-axis: throughput R (Mbps).
In this experiment, 200 nodes sequentially join an elastic content distribution session, and then start to depart when their downloads are completed. The distributed algorithm is invoked every 10 node joins or departures. As discussed in Sec. 4, the algorithm always restarts from the previously converged optimal rates when it is invoked. We show the number of additional iterations required to converge to the new optimal values from the previous ones, in the node joining and departure phases, in Fig. 2 (A) and (B), respectively. We find that, compared to running from scratch in static networks of the same sizes, our dynamic execution of the algorithm converges much faster. Independent of the current network size, it always takes fewer than 15 iterations to converge, in both the joining and departure cases. Since dynamic networks are closer to realistic scenarios, this suggests our optimal rate allocation algorithms can deliver good performance and provide excellent scalability in practice. In addition, we illustrate the maximum throughput of the dynamic session, R, in Fig. 3. In Fig. 3 (A), at the beginning of the node joining phase, the throughput drops because of the competition of more nodes for the available upload capacities in the network. Later, when more nodes have joined, more upload capacities are provided to the session, and thus the throughput gradually increases. During the node departure phase in Fig. 3 (B), for similar reasons, the throughput first shows a decreasing trend, and then rises when only a few nodes are left.
6 Related Work
On the topic of overlay content distribution, mesh-based approaches have become typical in most recent proposals [8,9,10]. By disseminating large-scale content on mesh topologies, their parallel transfers make it possible to deliver at fundamentally higher bandwidth and reliability, without the cost of constructing multicast trees. With respect to rate allocation in mesh overlay topologies, most existing work either relies on TCP or employs various heuristics without formulating the problem theoretically. Compared to our optimal rate allocation, their rate allocation falls short of achieving global optimality: there is no way to guarantee that maximum throughput is achieved at all nodes, or that a required streaming rate is provided to all at the lowest cost. There are a few exceptions that formulate the problem into optimization models and propose distributed solutions [1,11,12]. Our work is original in that we systematically model all the typical content distribution scenarios, and design efficient algorithms to solve the formulated optimization problems combinatorially or numerically, in a fully distributed manner. In addition, we discuss execution of the algorithms in practical dynamic environments. This has not been addressed in previous optimization-based approaches, most of which are largely theoretical in nature.
7 Conclusion
The problem of interest in this paper is to design efficient distributed algorithms for optimal rate allocation under all typical scenarios of overlay content distribution. For this purpose, we formulate rate allocation problems into linear programs, which optimize bandwidth utilization towards a variety of objectives, and develop fully decentralized algorithms to efficiently compute the optimal link rates. We believe such an optimal rate allocation algorithm is critical to any schemes of overlay content distribution. As ongoing work, we are investigating the combination of optimal rate allocation with efficient distribution schemes, and its application in realistic networks.
References

1. Li, Z., Li, B.: Efficient and Distributed Computation of Maximum Multicast Rates. In: Proc. of IEEE INFOCOM 2005 (March 2005)
2. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and Applications. Prentice Hall (1993)
3. Wu, C., Li, B.: Optimal Rate Allocation in Overlay Content Distribution. Technical report, http://iqua.ece.toronto.edu/papers/ratealloc.pdf (Oct 2006)
4. Bertsekas, D.P., Gallager, R.: Data Networks, 2nd Ed. Prentice Hall (1992)
5. Sherali, H.D., Choi, G.: Recovery of Primal Solutions when Using Subgradient Optimization Methods to Solve Lagrangian Duals of Linear Programs. Operations Research Letters 19 (1996) 105–113
6. Bertsekas, D.P., Castanon, D.A.: The Auction Algorithm for the Transportation Problem. Annals of Operations Research 20 (1989) 67–96
7. Medina, A., Lakhina, A., Matta, I., Byers, J.: BRITE: Boston University Representative Internet Topology Generator. Technical report, http://www.cs.bu.edu/brite (2000)
8. Kostic, D., Rodriguez, A., Albrecht, J., Vahdat, A.: Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh. In: Proc. of the 19th ACM Symposium on Operating Systems Principles (SOSP) 2003 (October 2003)
9. Sherwood, R., Braud, R., Bhattacharjee, B.: Slurpie: A Cooperative Bulk Data Transfer Protocol. In: Proc. of IEEE INFOCOM 2004 (March 2004)
10. Zhang, X., Liu, J., Li, B., Yum, T.P.: CoolStreaming/DONet: A Data-Driven Overlay Network for Live Media Streaming. In: Proc. of IEEE INFOCOM 2005 (March 2005)
11. Lun, D.S., Ratnakar, N., Koetter, R., Medard, M., Ahmed, E., Lee, H.: Achieving Minimum-Cost Multicast: A Decentralized Approach Based on Network Coding. In: Proc. of IEEE INFOCOM 2005 (March 2005)
12. Adler, M., Kumar, R., Ross, K.W., Rubenstein, D., Suel, T., Yao, D.D.: Optimal Peer Selection for P2P Downloading and Streaming. In: Proc. of IEEE INFOCOM 2005 (March 2005)
SLA Adaptation for Service Overlay Networks

Con Tran¹, Zbigniew Dziong¹, and Michal Pióro²

¹ Department of Electrical Engineering, École de Technologie Supérieure, University of Quebec, Montréal, Canada
[email protected], [email protected]
² Department of Communication Systems, Lund University, Sweden; Institute of Telecommunications, Warsaw University of Technology, Poland
Abstract. Virtual Network Operators lease bandwidth from different data carriers to offer well managed Virtual Private Networks. By using proprietary algorithms and leased resource diversity they can offer Quality of Service, at competitive prices, which is difficult to attain from traditional data network operators. Since maximizing the profit is a key objective for virtual network operators, in this paper we describe a novel resource management approach, based on an economic model that allows continuous optimization of the network profit, while keeping network blocking constraints. The approach integrates leased link capacity adaptations with connection admission control and routing policies by using concepts derived from Markov Decision Process theory. Our numerical analysis validates the approach and shows that the profit can be maximized under varying network conditions.

Keywords: Service Overlay Network, Economic Model, Resource Management, Quality of Service, Markov Decision Process.
1 Introduction

With the continuing increase in popularity of the Internet, more and more applications are offered. However, since the Internet is formed by numerous interconnected Autonomous Systems (AS) operating independently, real time applications such as Voice over Internet (VoIP), Streaming Multimedia and Interactive Games are often unable to get their required end-to-end Quality of Service (QoS) guarantee. One possible approach to overcome the problem is to use global Service Overlay Networks (SON) [1], [2], where a set of overlay routers (or overlay nodes, ON), owned and controlled by one Virtual Network Operator (VNO), is interconnected by virtual overlay links (OL) realized by leasing bandwidth from the AS's via Service Level Agreements (SLA) [3]. In general there are two approaches to provide QoS in Service Overlay Networks. In the first approach, the VNO that manages the SON leases the best effort access to the AS's. In this case, to ensure availability of required bandwidth and QoS, the SON has to monitor continuously its OL's established over best effort Internet and react fast in case of some OL quality deterioration, e.g., by rerouting the connections on the paths with acceptable QoS. In the second approach the VNO leases the overlay links
with appropriate bandwidth and QoS guarantees (e.g., based on the MPLS-TE technology). In this case user end-to-end QoS connections are realized through the SON admission control and routing policy. Obviously the VNO can use both types of leased bandwidth in a flexible and controlled way in order to reduce cost since best effort bandwidth is less expensive. The above arguments indicate that Virtual Network Operators can potentially offer well managed QoS Virtual Private Network (VPN) at competitive prices, by using economically-sound SONs that can be managed by sophisticated proprietary control mechanisms adaptable towards the needs of particular clients. At the same time the VNOs can lease bandwidth from several different operators that removes the geographical barriers and also offers much better architecture for reliability and failure protection. These advantages led to creation of several VNOs like Internap (internap.com), Mnet (mnet.co.uk), Sirocom (sirocom.com), Vanco (vanco.co.uk), Virtela [4] (virtela.net), Vanguard (vanguardms.com), Netscalibur (netscalibur.com). In this paper, we focus on resource management in SONs based on leased bandwidth with QoS guarantees (e.g., based on the MPLS-Traffic Engineering technology). In particular, we describe a novel framework, based on an economic model, which allows continuous network profit optimization. The approach integrates leased link capacity adaptations (to traffic and SLAs variations) with connection admission control and routing policies by using concepts derived from Markov decision process theory. The paper is organized as follows. The framework for the SON economic model is described in Section 2. In Section 3, we present the resource adaptation model that constitutes the main element of the economic model. Numerical analysis validating the proposed approach is given in Section 4.
2 SON Economic Framework In general, the cost of establishing and operating a SON can be divided into three parts: the initial cost of installing ON’s and the maintenance center, the ongoing costs of the leased bandwidth according to SLA, and the ongoing maintenance costs. These costs should be covered by the revenue achieved from connection charges. The monetary and provided goods flows between the users, SON and AS’s are illustrated in Fig. 1.
Fig. 1. SON economic framework (service users pay the SON revenue from connections and receive end-to-end QoS connections; the SON pays SLA costs to the AS's and receives overlay link bandwidth with QoS)
To simplify the presentation, assuming that the maintenance cost is proportional to the SLA cost, we can integrate the two into a total SON operation cost expressed as a sum of overlay link costs $C_s$ (expressed as cost per time unit). Further, we
assume that this total cost should be covered by the operating profit $\hat{P}$, and that the objective of the virtual operator of the SON is to maximize this profit, expressed as:

$$\hat{P} = \hat{R} - \sum_{s \in S} C_s = \sum_{j \in J} \lambda_j \hat{r}_j - \sum_{s \in S} C_s \qquad (1)$$
where $\hat{R}$ is the SON average revenue rate, $\lambda_j$ the average rate of admitted class $j$ connections, $\hat{r}_j$ the average revenue (price) for class $j$ connections, $S$ the set of overlay links, and $J$ the set of user connection classes. In this formulation, the maximization of the revenue profit can be achieved by leased bandwidth adaptation, which influences the admission rate $\lambda_j$ and the cost $C_s$, and by adjusting the service prices $\hat{r}_j$. Note that a change of service price can influence the service demand [5], expressed in our model as the average rate of connection arrivals $\bar{\lambda}_j$. The profit maximization should be done in such a way that the connection
blocking probabilities $B_j$ of all classes do not exceed the constraints $B_j^c$. In general, the demand for SON services is variable due to factors such as periodic variability on the scale of a day, week, month, or year, and variability caused by trends of increasing (or decreasing) demand for certain services. Bandwidth prices in SLAs can also fluctuate due to competition and market trends. Since the amount of bandwidth leased within an SLA can be adapted from time to time, or even dynamically (on line), the VNO naturally needs a mechanism that can indicate when and how to adapt the SLAs to maximize profit while respecting the required blocking constraints. In this paper we describe a framework for such an approach. The proposed framework is based on the integration of a model for SLA adaptation with a model for CAC (call admission control) and routing that is based on Markov decision processes. As will be explained in Section 3, this integration provides two key advantages. First, the CAC and routing algorithm takes the SLA costs into account, so resource utilization is maximized with respect to cost. Second, statistics of the CAC and routing parameters provide information indicating which SLAs should be adapted in order to maximize the profit. It is important to emphasize that, as described in [6], the proposed framework uses, besides the concept of revenue from connections $\hat{r}_j$, the very similar concept of reward from connections $r_j$. The reward from connections is a more general concept, since it does not have to have a monetary meaning and can be treated as a control parameter. In our framework we first use the reward concept to maximize the reward profit of the network, expressed as:

$$P = R - \sum_{s \in S} C_s = \sum_{j \in J} \lambda_j r_j - \sum_{s \in S} C_s \qquad (2)$$
Then we determine $\hat{r}_j$ as a function of $r_j$. The reason for this dual approach is that it allows decomposition of the original problem (1) into two sub-problems. The first is optimal link bandwidth allocation for given connection arrival rates $\bar{\lambda}_j$ and blocking constraints $B_j^c$, which is solved using the reward formulation (2). The outcome of solving this sub-problem is a set of link bandwidth allocations and a set of connection reward parameters. The second sub-problem is to optimize $\hat{r}_j$ as a function of $r_j$. In the simplest case, one can assume that $\hat{r}_j = r_j$. Nevertheless, the demand for services ($\bar{\lambda}_j$) can be influenced by the choice of $\hat{r}_j$. In particular, if the demand is significantly reduced for $\hat{r}_j = r_j$, one may need to reduce the revenue parameters in order to maximize the revenue profit (1). An approach to solving the second sub-problem is discussed in [6]. In the remainder of this paper we concentrate on a solution of the first sub-problem. In particular, we present a basic model for bandwidth adaptation assuming that the connection reward parameters are given.
3 Resource Adaptation

In this section we describe a model for resource adaptation for given connection arrival rates. In Section 3.1, the applied CAC and routing policy based on Markov decision theory is presented. In Section 3.2, the mechanism for bandwidth adaptation is described for given reward parameters. Finally, in Section 3.3, we discuss the adaptation of the reward parameters in order to meet the required blocking constraints. While the proposed framework is applicable to multi-class services with different bandwidth requirements, in this paper we consider a network with homogeneous connections, where all classes have the same bandwidth requirement and the same mean service time.

3.1 CAC and Routing Policy
For a network with given link dimensions and reward parameters, profit optimization is realized through efficient use of the available resources, by means of a CAC and routing policy that admits connections on the most profitable routes. The policy used is a state-dependent CAC and routing policy based on Markov Decision Process (MDP) theory [7]. In addition to the dynamic link costs, defined in [7] as shadow prices, we integrate the monetary costs of the SON overlay links, $C_s$, into the model. As shown in [7, 8], for a given reward profit rate, one can find an optimal CAC and routing policy that maximizes the average profit by using one of the standard MDP solution algorithms. In this paper, this exact approach is called the MDP CAC and routing model. In the MDP model, the network state space is usually too large in realistic network examples. To overcome this problem, a decomposition technique called MDPD, described in [7, 8], is applied. The technique uses a link independence assumption to decompose the network reward process into separable link reward processes. The decomposed process is described separately for each link $s$ by the link state $X = (x_j : j = 1, 2, \ldots)$ (where $x_j$ is the number of class $j$ connections admitted on the link), the link arrival and service rates $\lambda_j^s$, $\mu_j$, and the link connection reward parameters $r_j^s$.
In this paper, this approach is called the MDPD CAC and routing model. In this model, to integrate the monetary link costs into the CAC and routing policy, we propose to divide the connection reward into link connection rewards proportionally to the cost of the resources used:

$$r_j^s = r_j \cdot \frac{C_s / N_s}{\sum_{o \in S_k} C_o / N_o} \qquad (3)$$
where $N_s$ is the capacity of link $s$ (maximum number of connections) and $S_k$ is the set of links on path $k$. Then, analogously to (2), the link expected profit is:

$$P_s = R_s - C_s = \sum_{j \in J} \lambda_j^s r_j^s - C_s \qquad (4)$$
where $\lambda_j^s$ is the rate of admitted class $j$ connections on the link. In the MDPD model, a link net gain $g_j^s(X)$ is defined as the expected increase in link reward from accepting an additional connection of class $j$. A state-dependent link shadow price is then defined as the difference between the link reward and the net gain; it represents the cost of accepting a class $j$ call on link $s$ in state $X$:

$$p_j^s(X) = r_j^s - g_j^s(X) \qquad (5)$$
During network operation, the link shadow prices can be calculated based on measurements of the link connection arrival rates [7], [8]. The optimal CAC and routing policy then chooses the connection path with the maximum positive net gain:

$$g_{\max} = \max_{k \in W_j} \Big[ r_j - \sum_{s \in S_k} p_j^s(X) \Big] \qquad (6)$$

where $W_j$ is the set of candidate paths for class $j$.
If no path yields a positive net gain, the connection is rejected.
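The admission decision in (6) is easy to operationalize. The following minimal sketch (the data structures are illustrative, not from the paper; in practice the shadow prices would come from the MDPD link models) selects the candidate path with the largest positive net gain and rejects the connection otherwise:

```python
def select_path(r_j, candidate_paths, shadow_price):
    """Pick the most profitable path for a class-j connection, per eq. (6).

    candidate_paths: the set W_j, given as a list of paths (lists of link ids).
    shadow_price: dict mapping link id -> current state-dependent price p_j^s(X).
    Returns the chosen path, or None if the connection should be rejected.
    """
    best_path, best_gain = None, 0.0
    for path in candidate_paths:
        gain = r_j - sum(shadow_price[s] for s in path)
        if gain > best_gain:  # only strictly positive net gains are admissible
            best_path, best_gain = path, gain
    return best_path
```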
3.2 Link Bandwidth Adaptation for Given Reward Parameters

With changing network conditions, such as traffic levels or leased bandwidth costs, the SON parameters should be adapted to continuously pursue the reward profit maximization objective. As mentioned, the SON operator can control the overlay link capacities (by modifying the SLAs) and the reward parameters $r_j$. In this section, we concentrate on overlay link capacity adaptation with given reward parameters.

Profit Sensitivity to Link Capacity. In the MDPD model, the network profit sensitivity to a link capacity can be approximated by the link profit sensitivity to the link capacity. Following equation (4), we have:
$$\frac{\partial P_s}{\partial N_s} = \frac{\partial R_s}{\partial N_s} - \frac{\partial C_s}{\partial N_s} \qquad (7)$$
It has been shown in [8] that the average reward sensitivity to link capacity can be approximated by the average link shadow price of a connection class with unit bandwidth requirement, $\partial R_s / \partial N_s \cong p_s(N_s)$. The value of the average link shadow price can be calculated or measured during execution of the CAC and routing algorithm. For the second term of (7), assuming that $C_s$ is a linear function of $N_s$, we have $C_s = c N_s$, where $c$ is the bandwidth unit cost. Substituting into (7), the link profit is maximized when $\partial P_s / \partial N_s = 0$, i.e., at the solution of the following equation, which constitutes the basis of the capacity adaptation procedure:

$$p_s(N_s) - c = 0 \qquad (8)$$
Bandwidth Adaptation Model. An iterative procedure is used to converge to the solution of (8), giving the optimized link capacity. In our case, we use Newton's method of successive linear approximations, in which the new capacity value $N_{n+1}$ at iteration step $n$ is given by (the link index $s$ is omitted to simplify the notation):

$$N_{n+1} = N_n - \frac{p_n - c}{\partial (p(N) - c) / \partial N} \qquad (9)$$
Approximating the derivative in equation (9) at iteration $n$:

$$\frac{\partial (p(N) - c)}{\partial N} = \frac{\partial p(N)}{\partial N} \cong \frac{p_n - p_{n-1}}{N_n - N_{n-1}} \qquad (10)$$
and substituting (10) into (9), the new capacity is:

$$N_{n+1} = N_n - (N_n - N_{n-1}) \, \frac{p_n - c}{p_n - p_{n-1}} \qquad (11)$$
Link capacity adaptation is then realized by periodically executing the following steps for each link:

a. Estimate the new average link shadow price $p_n$ based on the link shadow prices of the CAC and routing algorithm.
b. Calculate the new value of the link bandwidth:

$$N'_{n+1} = N_n + \alpha (N_{n+1} - N_n) \qquad (12)$$
where $N_{n+1}$ is given by (11) and $\alpha$ is a damping factor used to improve convergence.
c. Round $N'_{n+1}$ to the nearest value available from the SLA and, if $|N'_{n+1} - N_n|$ exceeds a threshold of importance, use it as the new link capacity.

It is important to underline that the bandwidth adaptation procedure and the CAC and routing procedure (Section 3.1) are integrated by the fact that the link shadow prices obtained in the latter are used in the capacity adaptations. In turn, the CAC and routing policies adapt to changes in link capacities and bandwidth unit costs.
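A minimal sketch of one adaptation round, assuming unit SLA granularity and $p_n \ne p_{n-1}$ (the function and parameter names are illustrative, not from the paper):

```python
def adapt_capacity(N_prev, N_curr, p_prev, p_curr, c, alpha=1.0, threshold=1):
    """One secant-style update of a link capacity toward p(N) = c.

    Implements eq. (11) with the damping of eq. (12); p_prev/p_curr are the
    average link shadow prices measured at capacities N_prev/N_curr, and c is
    the bandwidth unit cost.
    """
    # Secant approximation of the Newton step, eq. (11)
    newton = N_curr - (N_curr - N_prev) * (p_curr - c) / (p_curr - p_prev)
    # Damped update, eq. (12)
    damped = N_curr + alpha * (newton - N_curr)
    # Round to the nearest capacity available from the SLA (unit steps assumed)
    N_new = round(damped)
    # Apply the change only if it exceeds the threshold of importance
    return N_new if abs(N_new - N_curr) >= threshold else N_curr
```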
3.3 Reward Parameter Adaptation to Meet the Blocking Constraints

The CAC algorithm rejects connection requests when bandwidth is not available or the connection is not profitable. The resulting blocking probability can be defined for each connection class $j$ as:

$$B_j = \frac{\bar{\lambda}_j - \lambda_j}{\bar{\lambda}_j} = 1 - \frac{\lambda_j}{\bar{\lambda}_j} \qquad (13)$$

where $\bar{\lambda}_j$ is the offered (arrival) rate and $\lambda_j$ the admitted rate of class $j$ connections.
We define the network average blocking probability as:

$$B_T = \frac{\sum_j (\bar{\lambda}_j - \lambda_j)}{\sum_j \bar{\lambda}_j} = \frac{\sum_j \bar{\lambda}_j B_j}{\sum_j \bar{\lambda}_j} = 1 - \frac{\sum_j \lambda_j}{\sum_j \bar{\lambda}_j} \qquad (14)$$
As mentioned in Section 2, the blocking probabilities should not exceed the network and/or class blocking constraints $B_T^c$ and $B_j^c$. To achieve this objective we propose an adaptation of the reward parameters, since in general an increase of $r_j$ will cause a decrease of $B_j$ and $B_T$, and vice versa. Note that a change of $r_j$ may influence the optimal solution for the link bandwidth allocation. Therefore, the adaptation of a reward parameter should be integrated with the adaptation of the link bandwidths. One relatively simple solution is to apply the two adaptation algorithms iteratively, as illustrated in Fig. 2. The network blocking constraint can be met by multiplying all class reward parameters $\{r_j\}$ by a common factor $\gamma$.
Fig. 2. SON adaptation algorithm
Once the network constraint has been met, the class blocking probabilities can be readjusted, as required to meet the class constraints, by varying the class reward parameters relative to each other (while preserving the average network reward parameter). In this paper, we only consider meeting the network blocking constraint. Let

$$r_T = \frac{\sum_j \lambda_j r_j}{\sum_j \lambda_j} \quad \text{and} \quad r_s = \frac{\sum_j \lambda_j^s r_j^s}{\sum_j \lambda_j^s} = \frac{\sum_j \lambda_j^s r_j^s}{\lambda^s}$$

be the current
average network and link reward parameters. To achieve the equality $B_T(r_T) = B_T^c$, one can apply Newton's iterations with an approximation of the derivative $\partial B_T / \partial r_T$. At each iteration $n$, the new network reward parameter is:

$$r_T^{n+1} = r_T^n + \frac{B_T^c - B_T^n}{\partial B_T^n / \partial r_T^n} \qquad (15)$$
Since the adaptation multiplies all class reward parameters by the same factor $\gamma$, all reward parameter relative differentials are equal:

$$\frac{\partial r_j}{r_j} = \frac{\partial r_T}{r_T} = \frac{\partial r_s}{r_s} \qquad (16)$$
Using (16) in the differentiation of (14) gives:

$$\frac{\partial B_T}{\partial r_T} = \frac{1}{\sum_j \bar{\lambda}_j} \sum_j \Big( \bar{\lambda}_j \, \frac{r_j}{r_T} \, \frac{\partial B_j}{\partial r_j} \Big) \qquad (17)$$
To simplify the calculation of $\partial B_j / \partial r_j$, we use the one-moment performance model based on the link independence assumption, where the link blocking probability $B_s$ applies to all connection classes on the link. Then, the blocking probabilities of connection path $k$ and class $j$ are:

$$B^k = 1 - \prod_{s \in S_k} (1 - B_s) \qquad (18)$$
$$B_j = \prod_{k \in W_j} B^k = \prod_{k \in W_j} \Big[ 1 - \prod_{s \in S_k} (1 - B_s) \Big] \qquad (19)$$
Differentiation of the class blocking probability (19) gives:

$$\frac{\partial B_j}{\partial r_j} = \sum_{k \in W_j} \Bigg\{ \Big[ \prod_{o \in W_j \setminus \{k\}} \Big( 1 - \prod_{s \in S_o} (1 - B_s) \Big) \Big] \cdot \sum_{s \in S_k} \Big[ \Big( \prod_{t \in S_k \setminus \{s\}} (1 - B_t) \Big) \cdot \frac{r_s}{r_j} \, \frac{\partial B_s}{\partial r_s} \Big] \Bigg\} \qquad (20)$$
To find the link $s$ blocking probability derivative $\partial B_s / \partial r_s$, we can use the formulation of the link average shadow price given in [8], which at the optimal network profit (8) equals the bandwidth unit cost $c$:

$$p_s = c = \frac{\bar{\lambda}_s}{\lambda_s} \, \big[ E(\bar{\lambda}_s, N_s - 1) - E(\bar{\lambda}_s, N_s) \big] \sum_{j \in J_s} \lambda_j^s r_j^s \qquad (21)$$
where $E(\lambda, N)$ is the Erlang B formula. Based on (21), for a given $c$, the capacity differential $\partial N_s$ required to maintain the optimal profit condition under a reward differential $\partial r_s$ is determined from:

$$E(\bar{\lambda}_s, N_s + \partial N_s - 1) - E(\bar{\lambda}_s, N_s + \partial N_s) = \big[ E(\bar{\lambda}_s, N_s - 1) - E(\bar{\lambda}_s, N_s) \big] \, \frac{r_s}{r_s + \partial r_s} \qquad (22)$$
Using $\partial N_s$, the link $s$ blocking probability derivative is then approximated as:

$$\frac{\partial B_s}{\partial r_s} = \frac{E(\bar{\lambda}_s, N_s + \partial N_s) - E(\bar{\lambda}_s, N_s)}{\partial r_s} \qquad (23)$$
The adapted network reward parameter (15), obtained using (17), (20), and (23), allows the determination of the multiplier $\gamma$ to be applied to the class reward parameters.
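The building block of (21)-(23) is the Erlang B formula. A small sketch follows (fractional capacities, which the paper handles by linear interpolation, are not treated here; the function names are assumptions):

```python
def erlang_b(load, N):
    """Erlang B blocking probability for offered load (in Erlangs) and N servers,
    computed with the standard numerically stable recursion."""
    B = 1.0
    for n in range(1, int(N) + 1):
        B = load * B / (n + load * B)
    return B

def dBs_drs(load, N, dN, dr):
    """Finite-difference approximation of the link blocking derivative, eq. (23):
    dN is the capacity shift that preserves the optimality condition (8) under
    a reward differential dr, obtained from eq. (22)."""
    return (erlang_b(load, N + dN) - erlang_b(load, N)) / dr
```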
4 Numerical Analysis

In this section, we present numerical results for our resource adaptation model. Adaptation results are obtained for each of the two CAC and routing models (Section 3.1): MDP and MDPD. The MDPD case allows adaptation in realistic-size networks, while the MDP case provides exact solutions that can be used to verify the approximate MDPD case. In both cases, the network performance evaluation (class blocking probabilities and flow distribution among the paths) is based on an exact model with the exact network state. The advantage of using this exact performance model is that we can concentrate on the key issues of profit optimization without being affected by the noise of an event-driven statistical simulation. The obvious limitation is that the network examples are quite small, due to the number of network states. In the future, we will present results based on event-driven simulations that can handle realistic-size examples.
In this analysis, we focus on two important issues of the profit optimization process:

a. Link bandwidth adaptation to maximize profit, both in the MDPD model and in the exact MDP model.
b. Reward parameter adaptation to meet the blocking constraints.

4.1 Analyzed Network Examples
To be able to process the exact analytical model, we limit the network state space by using two simple network examples, called 3L and 5L, where the link capacities are also limited. Each network has three connection classes, each connecting a pair of nodes over direct or indirect paths. The connection class arrival rates are λ1, λ2, λ3; all classes have the same bandwidth requirement (1 unit), service rate (μ = 1), and reward rate (rj = r = 1). The bandwidth unit costs are 0.2 in network 3L and 0.1 in network 5L. The networks are represented in Figs. 3 and 4, with connection class 1 shown. In network 3L, connection classes 2 and 3 similarly connect ON2-ON3 and ON3-ON1, respectively. In network 5L, connection classes 2 and 3 connect ON0-ON2 and ON0-ON3.
Fig. 3. Network 3L (nodes ON1-ON3, links OL1-OL3; class 1 direct and indirect connections shown)

Fig. 4. Network 5L (nodes ON0-ON3, links OL1-OL5; class 1 direct and indirect connections shown)
4.2 Link Bandwidth Adaptation for Given Reward Parameters

Link Bandwidth Adaptation Convergence. To illustrate the convergence, we use network 3L with the initial link bandwidths set to N1 = N2 = N3 = 3 and the connection arrival rates set to λ1 = λ2 = λ3 = 3. The link bandwidth adaptation procedure (with the damping factor set to one) is performed to obtain the maximized network profit. The results are shown in Fig. 5. For both models, MDPD and MDP, convergence to the optimal link capacities of 6 is reached very quickly, after two iterations. We then change the arrival rates to λ1 = 4, λ2 = 3, λ3 = 2 and perform the adaptation again. The results are shown in Figs. 6 and 7 for the MDPD and MDP models, respectively. Both models converge to the same optimal values (N1 = 7, N2 = 6, N3 = 4). Numerical results for the example network 5L, shown in Fig. 8, confirm the good convergence of the model to the optimal network profit.
Fig. 5. Network 3L bandwidth adaptation, λ1 = λ2 = λ3 = 3 (link bandwidth and dP/dN vs. iteration step, MDP and MDPD)

Fig. 6. Network 3L adaptation, MDPD, λ1 = 4, λ2 = 3, λ3 = 2 (network profit and per-link dP/dN vs. iteration step)

Fig. 7. Network 3L adaptation, MDP, λ1 = 4, λ2 = 3, λ3 = 2 (network profit and per-link dP/dN vs. iteration step)

Fig. 8. Network 5L adaptation, MDP, λ1 = λ2 = λ3 = 3 (network profit and per-link dP/dN vs. iteration step)
The presented results indicate good convergence of the proposed model. It is also important that the approximate MDPD model converges to the same solution as the exact MDP model, with comparable convergence speed.

Convergence Improvement. Based on the shape of the profit sensitivity curve $\partial P_s(N_s) / \partial N_s$, we noticed that, to improve the convergence speed, the damping factor should be greater than 1 when the derivative is positive and less than 1 when it is negative. A more thorough analysis of this issue will be given in a future publication, with experiments based on event-driven simulation of larger network examples.

4.3 Reward Parameter Adaptation to Meet the Blocking Constraints
In this section, we give analysis results for the use of reward parameter adaptation to control the network blocking probability. In the derivative computations, we used simple linear interpolation to estimate the blocking probability for fractional link capacities. The adaptation algorithm is applied to network 3L with class arrival rates λ = (4, 3, 2). For a blocking constraint of 2%, the results are presented in Fig. 9, while Fig. 10 shows the case where the blocking constraint is set to 1%.
Fig. 9. Network 3L reward parameter adaptation, blocking constraint = 2% (reward parameter, network blocking, and network profit vs. iteration step; λ = (4, 3, 2), SLA unit cost = 0.2)

Fig. 10. Network 3L reward parameter adaptation, blocking constraint = 1% (reward parameter, network blocking, and network profit vs. iteration step; λ = (4, 3, 2), SLA unit cost = 0.2)
These results show that the adaptation algorithm can fulfill the blocking constraints in a few iterations.
5 Conclusions

In this paper, we proposed an economic framework for operating Service Overlay Networks with the objective of network profit maximization subject to blocking constraints. The framework uses Markov decision theory to integrate the CAC and routing model with the link bandwidth and reward parameter adaptation models. The key element of this integration is the concept of the link shadow price, which provides a consistent economic framework for the whole problem. Preliminary numerical results validate the proposed approach by showing good convergence, although the network examples are of limited size due to the exact analytical model used for network performance evaluation. Therefore, in a forthcoming publication, we will present a study based on event-driven simulations using realistic-size networks. In that case, measurements and predictions will be used to feed the MDPD-based models. We will also study reward parameter adaptation to meet individual class blocking constraints.

Acknowledgments. This work is supported in part by grants from NSERC.
References

1. Li, B., et al.: Guest Editorial: Recent Advances in Service Overlay Networks. IEEE Journal on Selected Areas in Communications, vol. 22, no. 1 (2004), pp. 1-5
2. Sen, A., et al.: On Topological Design of Service Overlay Networks. In: Quality of Service - IWQoS 2005: 13th International Workshop, Proceedings. Lecture Notes in Computer Science, vol. 3552 (2005), pp. 54-68
3. Duan, Z., Zhang, Z.-L., Hou, Y.T.: Service Overlay Networks: SLAs, QoS, and Bandwidth Provisioning. IEEE/ACM Transactions on Networking, vol. 11, no. 6 (December 2003), pp. 870-883
4. Allen, D.: Virtela's IP VPN Overlay Networks. Network Magazine (January 2002)
5. Fulp, E.W., Reeves, D.S.: Bandwidth Provisioning and Pricing for Networks with Multiple Classes of Service. Computer Networks, vol. 46, no. 1 (September 2004), pp. 41-52
6. Dziong, Z.: Economic Framework for Bandwidth Management in Service Overlay Networks. Technical Report, École de Technologie Supérieure (July 2006)
7. Dziong, Z., Mason, L.G.: Call Admission and Routing in Multi-Service Loss Networks. IEEE Transactions on Communications, vol. 42, issue 234, part 3 (February/March/April 1994), pp. 2011-2022
8. Dziong, Z.: ATM Network Resource Management. McGraw-Hill, New York (1997)
Virtual Private Network to Spanning Tree Mapping

Yannick Brehon¹, Daniel Kofman¹, and Augusto Casaca²

¹ ENST, 46 rue Barrault, 75013 Paris, France
{brehon,kofman}@enst.fr
² INESC-ID/IST, Rua Alves Redol 9, 1000-029 Lisboa, Portugal
[email protected]
Abstract. Metro Ethernet has been widely adopted lately. One of its goals is to provide clients with VPN services, and in this context, load balancing and protection have become important issues. This paper presents an original formulation of these problems in Metro Ethernet, and introduces an algorithm for load balancing which provides good numerical results.
1 Introduction
Ethernet has been a very successful technology and has been widely accepted in local networks. Network operators have been inclined to push this technology further into their networks, following a cost-effective desire for unification. Therefore, Ethernet is now becoming the technology of choice when building Metropolitan Area Networks (MANs). Ethernet MANs will support VPNs (Virtual Private Networks), using the concept of VLANs (Virtual Local Area Networks) as defined by [1]. VPNs are distributed LANs which are connected using a provider network. Once connected, nodes in VPNs are able to communicate as if they were physically directly connected. VLANs allow a single network to interconnect several groups of Ethernet nodes - groups which, in a VPN context, will be the VPNs - and maintain a separation between those groups, isolating them from one another. There are many advantages to the Ethernet technology. It has been around long enough and deployed widely enough to benefit from substantial cost reductions. It is now a mature technology which is very cost-effective, easy to deploy, and easily interoperable. However, it lacks a decent control plane. For the network operator, this means there are no resource reservation schemes available and no load-sharing mechanisms. For the network user, this means there are no QoS (Quality of Service) guarantees and no protection mechanisms for failure resiliency. These are essential drawbacks which need to be addressed when deploying MAN-grade Ethernet, and they are therefore a focus of great interest. For the protection issues, the Spanning Tree Protocol [3] (STP) has evolved into the Rapid Spanning Tree Protocol [4] (RSTP). These protocols are in charge
of building the Spanning Tree (ST) which will support the Ethernet traffic. The RSTP version accelerates reconvergence of the tree when links or nodes of the current tree fail, bringing the convergence time from 30 seconds down to 3 seconds in most failure scenarios. While this is still not a protection mechanism, the recovery times are at least more acceptable. As far as load balancing is concerned, the introduction of the Multiple Spanning Tree Protocol [5] (MSTP) allows multiple trees to coexist on a single network. Each tree has an identifier, and traffic is mapped to the various trees so as to allow some form of traffic engineering. For instance, using multiple trees, it is possible to achieve load balancing in the network by spreading the load across trees which use different links. Many papers have studied how to do traffic engineering using MSTP [6,7,8]. [6] shows the tradeoff between the number of STs deployed, the alternate routing, and the load balancing. It also offers an algorithm to group various VPN clients onto fewer STs. This grouping strategy, however, does not show how to initially build the STs. [7] proposes a heuristic algorithm to build a spanning tree for each VPN. In the end, there are as many trees deployed as there are VPNs in the network, which raises both management and scaling issues. Finally, [8] proposes an admission control scheme to make the best use of deployed STs; this last paper is therefore suited to optimal dynamic use of STs. In this paper, we formulate the problem as an original Mixed Integer Non-Linear Program. However, this problem is not computationally tractable. Therefore, we propose a heuristic algorithm to help the operator first configure his Ethernet switches, that is, build the STs of MSTP, and then map client VPNs to STs in a way that is optimal for load-sharing considerations. The rest of the paper is organized as follows. We first introduce the problem of mapping VPNs to STs in Section 2 and show the various considerations a network operator must face. We then present the formulation of this optimization problem as a Mixed Integer Non-Linear Program (MINLP) in Section 3. In Section 4, we introduce our algorithm for heuristic optimal mapping of VPNs onto STs, and results are shown in Section 5. Finally, in Section 6, conclusions are drawn and future work is presented.
2 VPNs and Spanning Trees
The Metro Ethernet architecture (see Figure 1) consists of a core network which relies on Ethernet to transparently connect distant nodes. A customer may want to interconnect various sites and operate them as a single LAN: he wants to set up a Virtual Private Network (VPN). When this VPN uses a Metro Ethernet provider, it appears as a C-VLAN (Customer VLAN) to the provider. Each of the customer's sites is connected to the core network via a gateway: the access point (AP). The network provider is in charge of transparently interconnecting these APs according to some Service Level Agreement (SLA). The notion of VLAN identifier (VID) was introduced for this purpose. The use of VLAN IDs enables the logical separation between groups of Ethernet clients (otherwise impossible due to Ethernet's flat addressing). When a frame
Fig. 1. Metro Ethernet Architecture (C-VLANs 1-3 connected through access points at the edge of a Metro Ethernet core carrying STs 1 and 2)
belonging to a given client enters the provider's network, it is tagged with a VID. Each port of the Ethernet switches in the core is assigned VIDs. A switch will forward the tagged frame to and from a port only if the port is part of the corresponding C-VLAN. This mechanism allows the different C-VLANs to use the same provider network while remaining isolated from one another. Ethernet uses a spanning tree protocol which prevents loops in the forwarding. This protocol constructs a shortest-path tree given a root and link weights. The root is automatically selected based on an administrator-set identifier for each node. The link weights are also set by management. Frames are then sent on this tree, which guarantees a single path between any two nodes, and the switches just need to learn which port gives access to which destination Ethernet machine. However, using a tree structure means many links are left out of the structure and not used, so network resources are wasted. One way to overcome this issue is to use the Multiple Spanning Tree Protocol (MSTP). This protocol allows, among other things, several Spanning Trees (STs) to coexist on the same network. Each tree has a single identifier. The C-VLANs are mapped to STs, such that each C-VLAN uses only one of the available STs; however, one ST can carry multiple C-VLANs. The question is how to build the STs and have them carry the C-VLANs in a way which is optimal: in Figure 1, there are many ways to map C-VLANs 1, 2 and 3 to the two STs. Optimality can be defined in many ways. One may want to minimize total network resource usage, minimize the maximally loaded link (that is, improve load sharing), or have distinct trees (or at least trees which share a minimum number of links) for failure resiliency. These aspects are dealt with in the next sections.
3 Optimization Problem

In this section, we provide a mathematical formulation of the problem described above in the form of a MINLP (Mixed Integer Non-Linear Program). Given a
physical topology, which STs, and which assignment of VPNs to STs, make optimal use of the network resources? We first make preliminary observations which are used in the MINLP formulation.

3.1 Preliminary Observations
For simplicity of the formulation in Section 3.2, we consider here that the nodes in a given VPN $i$ are all given the same access bandwidth ($D_i$). We also assume a uniform traffic matrix. This implies that bandwidth is reserved in such a way that, along any tree supporting a given VPN, there must be enough bandwidth to satisfy traffic of volume $D_i / (Nb\_Clients(i) - 1)$ between all pairs of nodes in the VPN, where $Nb\_Clients(i)$ is the number of nodes belonging to VPN $i$. Simple modifications would allow nodes to have differentiated accesses, by adding weights to the nodes. The calculation of the bandwidth needed on a single link is very simple. Consider two nodes on a tree such as those depicted in Figure 2. One node gives access to $p$ nodes of the VPN, the other node to $N - p$. Then, for the above bandwidth reservation hypothesis to hold, we need to reserve $D_i \cdot p \cdot (N - p)/(N - 1)$ units of bandwidth on the link. The key element to understanding the formulation is therefore the calculation of how many nodes are accessed through a given node (a small code sketch of this reservation rule is given after Fig. 2).
Fig. 2. Link Bandwidth Calculation (a tree link separating $p$ nodes of the VPN from the remaining $N - p$ nodes)
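A minimal sketch of the reservation rule (the function name is illustrative):

```python
def link_reservation(D_i, p, N):
    """Bandwidth to reserve on a tree link that separates p VPN nodes from the
    other N - p, under a uniform traffic matrix with per-node access D_i:
    each of the p*(N-p) cross-link node pairs needs D_i/(N-1) units."""
    return D_i * p * (N - p) / (N - 1)

# Example: a 6-node VPN with 10 units of access, link separating 2 and 4 nodes
print(link_reservation(10.0, 2, 6))  # 10*2*4/5 = 16.0 units
```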
3.2 Optimization Problem Formulation

Let us assume that:
– $v = 1, 2, \ldots, n$ indexes the network nodes
– $e = 1, 2, \ldots, p$ indexes the links of the network
– $i = 1, 2, \ldots, Nb\_VPN$ indexes the VPNs we are trying to set up, where $Nb\_VPN$ is the total number of VPNs to be supported
– $j = 1, 2, \ldots, Nb\_ST$ indexes the spanning trees, where $Nb\_ST$ is the maximum number of Spanning Trees which can be set up in the network
Additionally, we assume that the following constants are known by the operator:
– $D_i$ is the bandwidth allocated to each node of VPN $i$
– $P_{vi}$ is the binary constant indicating that node $v$ is a part of VPN $i$. A node may only be a part of a single VPN: one node must be created for each VPN connected to a given AP. This is not a restriction; it is simply for clarity of the formulation. A simple way to weight some nodes relative to others is therefore to connect more nodes of the same VPN to a given AP.
– $Nb\_Clients(i) = \sum_v P_{vi}$ is the number of clients of VPN $i$
– $a_{ev}$ is the binary constant indicating that link $e$ has an extremity at $v$
– $C_e$ is the capacity of link $e$
– $M_e$ is the module of link $e$ (number of trees which the link can support)

Finally, the optimization problem manipulates these variables:
– $x_{eij}$ is the flow allocated to link $e$ for VPN $i$ on tree $j$
– $x_{ej}$ is the total flow allocated to link $e$ on tree $j$
– $x_e$ is the total flow allocated to link $e$
– $Map(i, j)$ is the binary variable set to 1 if VPN $i$ is mapped to tree $j$
– $if(v, e, i, j)$ is the integer variable indicating how many nodes of VPN $i$ are available behind node $v$ when coming from link $e$, using tree $j$
– $dir(e, j, v)$ is the binary variable indicating that link $e$ is being used in tree $j$ such that $v$ must go through $e$ to get to the root. If this variable is null, then either the link is not being used in the tree's active topology, or it is being used in the other direction.
– $u_j$ is the binary variable indicating whether ST $j$ has at least one VPN mapped to it
– $1/\Lambda$ is the factor by which the bandwidths assigned to each VPN can be multiplied; $1/\Lambda$ should be greater than or equal to 1 if the initial bandwidth is to be assignable
– $1/\Theta$ is the factor by which the number of trees on each link can be multiplied; $1/\Theta$ should be greater than or equal to 1 if the modules are significant
• The if () function is only different from zero for (e, v) couples such that link e is connected to node v. Also, no interface may present an access for a VPN for more nodes than there are clients (this prevents loops to appear, considered as being part of the ST (wrongly) by the program:
708
Y. Brehon, D. Kofman, and A. Casaca
the graphs generated by the algorithm are connex, and because of the n-1 active links (see constraint (8)), a spanning tree) if (v, e, i, j) ≤ N b Clients(i) ∗ aev
∀e, i, j, v
– Mapping constraint: • Only one ST can be used for a given VPN: M ap(i, j) = 1 ∀i
(2)
(3)
j
• ST use:
uj ≥
M ap(i, j) Nb V PN i
∀j
(4)
– Spanning Tree (ST) Definition: • In a ST, there are exactly (number of nodes - 1) links used (one link going to the root per node, the uplink ). We use this property to define our ST: dir(e, j, v) = n − 1 ∀j (5) e,v
• One uplink per non-Root node and per ST: dir(e, j, v) ≤ 1 ∀v, j
(6)
e
• A given link may only be oriented once at most: dir(e, j, v) ≤ 1 ∀j, e
(7)
v
– We must define, for each node, the number of VPN nodes it gives access to when coming from a given link. These are the interface definitions constraints: • For ST nodes which are part of the VPN (and are thus, by hypothesis, leaves of the network graph), the interface is set to 1. if (v, e, i, j) = M ap(i, j) ∀j, i, v/Pvi = 1 (8) e/aev =1
• For ST nodes which are not part of the VPN, the interface gives access to the sum of the values of the interfaces its other interfaces are linked to. if (v , e , i, j)− if (v, e, i, j) ≥ e =e/ v =v/ ae v =1 ae v =1
(1 −
v ∀j, i, v/Pvi
dir(e, j, v )) ∗ M axClients = 0, e/aev = 1
(9)
Virtual Private Network to Spanning Tree Mapping
709
where M axClients is the largest VPN count (= maxi N b Clients(i)). The second term insures that the inequality is only applied to links in the active topology (see constraint 10). • For all ST nodes, interface must only be set to a positive number if the link is part of the active topology: if (v, e, i, j) ≤
dir(e, j, v ) ∗ M axClients ∀e, i, j, v
(10)
v
– Based on the number of clients of a VPN accessed behind every node, it is easy to define a set of constraints dictating the flow assignation and related conditions. • Flow on links: xeij ≤ xej ∀e, j (11) i
xej ≤ xe
∀e
(12)
i
• Flow can only be set if the VPN is mapped to a tree, and only if the link is used in the tree’s active topology xeij ≤
dir(e, j, v) ∗ M axF low
∀j, e, i
(13)
v
xeij ≤ M ap(i, j) ∗ M axF low
∀j, e, i
(14)
where MaxFlow is a constant large enough not to be a constraint. • Flow assignation: a link must provide enough flow for every node on one side to be able to setup a connection with a node on the other side (this is based on Section 3.1)
xeij ≥ Di /(N b Clients(i) − 1) ∗ (N b Clients(i) −
if (v , e, i, j)∗
v =v/ aev =1
if (v , e, i, j))
(15)
v =v/ aev =1
∀e, i, j, v/aev = 1 • Capacity constraint: this constraint can be used in two ways. First, in a resource optimization problem for a given network. Second, by setting all link capacities equal, maximizing 1/Λ in this constraint will minimize the maximally loaded link, i.e. achieve load balancing. xe ≤ Λ.Ce
∀e
(16)
710
Y. Brehon, D. Kofman, and A. Casaca
• Number of trees per link constraint: this constraint can also be used in two ways. First, in a resource optimization problem for a given network, one may want to limit the number of trees on a link. Second, by setting all link modules equal, maximizing 1/Θ in this constraint will minimize the maximal number of trees on any single link, which means that any link failure will impact a minimal number of trees. dir(e, j, v) ≤ Θ.Me ∀e (17) j,v
The objective is now: Minimize α
uj + β.Λ + γ.Θ
(18)
j
Where α, β, γ are the weights assigned according to relative importance of respectively, the cost of the number of STs (management cost), load balancing and protection needs. Note that constraint (15) is quadratic, semi definite negative, which means that the solution space is concave. This means it is not possible to solve exactly without exploring the entire solution space. So as to solve this problem, we tried linearizing this constraint, but calculations were still untractable (using the commercial solver CPLEX [10]), even for instances of the problem implying very small networks; the following heuristic method was thus conceived.
4
Heuristic Mapping of VPNs on Trees
In this section we will present an original heuristic algorithm for mapping VPNs to STs: the Diversified Forest with Greedy Mapping (DFGM). This method is based on two stages. First, we build a set of STs which aim at having few links in common: the Spanning Forest. Second, we assign VPNs to STs in a greedy fashion. The rationale behind this process is that this way, the network operator only builds a Spanning Forest once and then maps his various VPNs to it. This method is therefore very convenient for operators, while providing good results (see Section 5). 4.1
Building the Spanning Forest
For this stage of the algorithm, the weights of each link will be changed so as to represent how often the link appears in spanning trees. After building a ST, the weights of its links are increased by a fixed value Δ. Initially, all links have a same fixed weight of 1. When building the STs, an important factor is the choice of the root. The root is typically the node of the tree which will have the most traffic running along it. Therefore, it is important that the roots be as varied as possible, which
Virtual Private Network to Spanning Tree Mapping
711
is why we introduce a variable associated to each node which counts how many times a node has been a root in any ST. The root is then chosen as the node which minimizes: Wlink all adjacent links ∗ (number of times root) (19) all adjacent links Clink Once the root is chosen, the ST is built by using a method such as Dijkstra’s shortest path [9]. The weights are then updated and the process is repeated a fixed number of times N b ST . Algorithm 1. Spanning Forest Generation Require: A graph G = (V, A) Require: Cl the capacity of link l, ∀l ∈ A Require: Wl the weight of link l, ∀l ∈ A Ensure: A diversified Spanning Forest 1: Set Wl = 1 ∀l ∈ A 2: for all i ∈ [1, ..., N b ST ] do 3: Select a root minimizing the quantity of (19) 4: Build a spanning tree STi rooted at the selected node 5: Wl ← Wl + Δ, ∀l ∈ STi 6: end for
4.2
Mapping the VPNs on the Spanning Forest
Once the spanning forest has been generated, the VPNs have to be mapped to it. In this part, a simple “greedy” method is applied. The VPNs are sorted by their volume (ie: by decreasing order of the quantity Di ∗ N b Clients(i) using the same notations as in section 3.2). Next, the largest VPN is mapped to the ST of the Forest which minimizes the load of the maximally loaded link. If more than one ST have the same minimum value for the maximally loaded link, the ST which satisfies the VPN while using the total minimum amount of bandwidth in the network is selected.
5
Results
In this section, we applied the heuristic method described in the previous section to three different networks. The first one is a 11 node- and 14 (bidirectional) linknetwork, based on the French research network VTHD [11]. The second network is a 21 node- and 36 link-network, based on an Italian network [12]. The third network is a highly connected 12 node- and 40 link- network, such as the one described in [7]. We compare our heuristic to the 3 algorithms described in [7]. The first one constructs one ST associated to each VPN. This leads to as many STs as there are VPNs. Links have weights which increase with the load carried, and trees are constructed based on these weights. This technique is referred to as “MST with
712
Y. Brehon, D. Kofman, and A. Casaca Table 1. Relative Algorithm Performances
30 VPNs
1678 1084 930 862 794 5915 3658 2807 2872 2252 Max.load Total nb of trees 1 30 30 30 10.4 1 30 30 30 13.8
10 VPNs
Max.load Total nb of trees
535 1
390 354 349 270 1826 1267 1055 1221 785 10 10 10 6.72 1 10 10 10 7.29
DFGM
Enhanced MST
MST without traffic update MST with traffic update
Single ST
DFGM
Dense Enhanced MST
MST with traffic update
Single ST
DFGM
Enhanced MST
MST with traffic update
MST without traffic update
Single ST
Algorithm →
MST without traffic update
Network Italian
VTHD
1581 650 403 382 286 1 30 30 30 11.5 522 280 202 219 139 1 10 10 10 6.71
traffic update". They further enhance their trees by mapping some links of some shortest paths onto the constructed trees: this is referred to as the "enhanced MST" technique. When the link weights are not updated after constructing a tree, that is, each ST is constructed for a VPN by simply selecting a root randomly among the VPN's AP nodes, the technique is referred to as "MST without traffic update". We compared the results of our heuristic with those provided by the above algorithms and the single-ST strategy, by implementing all the algorithms. For simplicity, all network links have equal capacity. We generate a fixed number of VPNs, each having a random set of APs in the network. Each VPN asks for a random amount of bandwidth, uniformly chosen between 0 and 10 MBps; this amount must be available between any two nodes of the VPN. Δ was set to 1 in this chart. The results presented here are average values over 100 simulations. The results are presented in Table 1, and give the statistics for the maximum load on a link and the number of trees used. As the results show, our algorithm (DFGM) outperforms the other four as far as load balancing is concerned. For the VTHD network, the maximum load on a single link is 7.9% lower on average than with the best competing algorithm (here, the "enhanced MST"). For the Italian network, the improvement is 19.8% over the "MST with traffic update". We can also notice that for this large network, which is not highly connected, the "enhanced MST" algorithm does not improve on the "MST with traffic update". Finally, for the dense network, our algorithm improves on the "enhanced MST" algorithm by 25%. When there are 30 VPNs (respectively 10), the "enhanced MST" and "MST with/without traffic update" all use 30 (resp. 10) trees. The single-ST strategy uses one tree to support all the VPNs. Our algorithm, in contrast, uses only 30% to 50% of that number of trees. This does not have any meaning regarding traffic distribution, but it does mean considerable label space savings (fewer VLAN IDs). It also means there will be less signaling-related traffic, since there are fewer trees to maintain.
6 Conclusion

In this paper we tackled the problem of mapping VPNs to Spanning Trees in Metro Ethernet networks. We formulated this problem as a novel MINLP
problem. However, due to tractability issues, we conceived a heuristic algorithm, quite simple for an operator to implement, to solve the problem of optimally balancing the load of the VPNs across the network. We compared the results of our heuristic to the other known methods of mapping VPNs to Spanning Trees, and very good results were obtained. Further research on the subject is targeting the resiliency of the generated forests, to minimize the impact of a single failure.
References

1. IEEE 802.1q, Virtual Bridged Local Area Networks, IEEE, 2003
2. Berger, L.: Generalized Multi-Protocol Label Switching (GMPLS) Signaling Functional Description. Internet RFC 3471, IETF, January 2003
3. IEEE 802.1d, Media Access Control Bridges, IEEE, 1998
4. IEEE 802.1w, Rapid Spanning Tree Configuration, IEEE, 2001
5. IEEE 802.1s, Multiple Spanning Trees, IEEE, 2002
6. Ali, M., Chiruvolu, G., Ge, A.: Traffic Engineering in Metro Ethernet. IEEE Network, pp. 10-17, March/April 2005
7. Padmaraj, M., Nair, S., Marchetti, M., Chiruvolu, G., Ali, M., Ge, A.: Metro Ethernet Traffic Engineering Based on Optimal Spanning Trees. WOCN 2005, pp. 568-572
8. He, X., Zhu, M., Chu, Q.: Traffic Engineering for Metro Ethernet Based on Multiple Spanning Trees. ICN/ICONS/MCL 2006, p. 97
9. Dijkstra, E.W.: A Note on Two Problems in Connexion with Graphs. Numerische Mathematik, vol. 1, pp. 83-89, 1959
10. CPLEX and AMPL optimization packages, ILOG, online: www.cplex.com
11. Ministère de l'Économie, des Finances et de l'Industrie de France, Réseau National de Recherche en Télécommunications, online: www.vthd.org
12. Sabella, R., Iannone, E., Listanti, M., Berdusco, M., Binetti, S.: Impact of Transmission Performance on Path Routing in All-Optical Transport Networks. IEEE JSAC, vol. 6, pp. 1617-1622, December 1988
Optimal Topology Design for Overlay Networks

Mina Kamel¹, Caterina Scoglio¹, and Todd Easton²

¹ Electrical and Computer Engineering Department
² Industrial and Manufacturing Systems Engineering Department
Kansas State University, Manhattan, KS 66506, USA
{mkamel,caterina,teaston}@ksu.edu
http://www.eece.ksu.edu/networking
Abstract. Overlay topology design has been one of the most challenging research areas over the past few years. In this paper, we consider the problem of finding the overlay topology that minimizes a cost function which takes into account the overlay link creation cost and the routing cost. First, we formulate the problem as an Integer Linear Program (ILP), given a traffic matrix, for the cases of cooperative and non-cooperative node behavior. Then, we propose some heuristics to find near-optimal overlay topologies with reduced complexity. The solutions of the ILP problem in average-size networks have been analyzed, showing that the traffic demands between the nodes affect the decision of creating new overlay links. The heuristics are also compared through extensive numerical evaluation, and guidelines for the selection of the best heuristic as a function of the cost parameters are provided.
1 Introduction

1.1 Motivation
Peer-to-peer and many multimedia applications have recently grown, along with their need for high Quality of Service (QoS) [1], [2], [3], [4], [5], [6]. Providing the required quality of service for these applications over a packet-switching network has long been a critical task. A recent approach to providing QoS without changing the network architecture is based on the use of overlay networks. An overlay network is an application-layer logical network created on top of the physical network. It is formed by all or a subset of the underlying physical nodes. The connections between each pair of overlay nodes are provided by overlay links, which consist of paths composed of underlying physical links. Overlay networks can be used to improve performance and provide quality of service on the IP network, by routing data on the overlay links based on performance measurements. Among the most interesting open problems in overlay network design is topology creation, including node location and link setup. These topics have recently been addressed in [7], [8], [9].
1.2 Related Work
In designing the overlay topology [8], node behavior can be considered selfish. With selfish behavior, nodes establish links in order to minimize their own costs. Consequently, the global overlay network obtained by selfish nodes can differ from the optimal global network that could be created if the nodes behaved cooperatively. This difference is called the price of anarchy. Selfish and non-selfish node behaviors have a great impact on the selection of the topology and its cost. The cost function used in [8] does not consider the demand volume between nodes as an important factor. Instead, we believe that when traffic demands are considered, it is possible to obtain topologies that have better characteristics with respect to some key graph-theoretic metrics introduced in [10], such as node degree and characteristic path length (CPL). In [7], the authors consider the static and dynamic overlay topology design problems. Static overlay topology design applies when there are no changes in the traffic requirements. For the case where the communication requirements change over time, the authors consider dynamic overlay topology design based on two cost components: an occupancy cost and a reconfiguration cost. However, this approach is suited to service overlay networks, where an overlay service provider designs the overlay network. In [11], the authors address many topics concerning selfish routing in Internet-like environments. They use a fully connected overlay topology to limit the parameter space and reduce the complexity of the problem, and study the performance of selfish overlay routing when all the network nodes are included in the overlay network. Routing constraints are shown to have little effect on the network-wide cost when varying the network load. The goal of this paper is to study the problem of optimal topology design taking traffic demands into account, and to analyze the characteristics of the obtained optimal topologies in order to provide simple guidelines for overlay topology design.
1.3 Contribution
In this paper, we consider the problem of finding the overlay topology that minimizes a cost function given by the weighted sum of the overlay link creation cost and the routing cost, where the routing cost is proportional to the traffic demand. First, we formulate the problem as an Integer Linear Program (ILP) for a given traffic matrix, in the cases of cooperative (C node) and non-cooperative (N-C node) behavior. We assume that the nodes act non-cooperatively if each node establishes overlay links to send only its own traffic demands. The N-C node behavior is assumed in order to avoid the phenomenon of free riding. Following [8], it has been noticed that in overlay topologies only a few nodes establish most of the links, and all the other nodes use those links to route their traffic. Consequently, the resulting topology has a few nodes with high degree, leading to a non-robust and unbalanced topology. The assumption of non-cooperative node behavior prevents transit traffic from being routed on newly created overlay links. On the other hand, if we consider that each node establishes overlay links to send its traffic demands and to allow other nodes to route
their traffic demands over them, the nodes act cooperatively. Both behaviors are considered when minimizing the overall network cost. The solutions of the ILP problem in average-size networks are analyzed, showing that the amount of traffic demand between the nodes affects the decision of creating new overlay links, and that the resulting optimal topologies differ from the regular topologies obtained when traffic demands are neglected. Furthermore, some heuristics are proposed to find near-optimal overlay topologies with reduced complexity. Each heuristic is based on the selection of the best destination toward which to build an overlay link. Some heuristics are based on traffic volume, number of hops, or a combination of both. Another heuristic is based on clustering the nodes and assigning a leader to each cluster. A final heuristic allows each node to create new overlay links, where the nodes are considered in a certain sequence. Extensive testing and simulations are performed to compare the topologies generated by the heuristics with the optimal ones. Guidelines for the selection of the best heuristic among the proposed set, as a function of the cost weight, are also provided. Summarizing, our contributions in this paper are:

1. Formulating the problem of establishing new overlay links in the network as an ILP.
2. Proposing heuristics to generate near-optimal overlay topologies.

In Section 2, we define the cost function and the ILP formulation of the optimal overlay network topology. In Section 3, we present the proposed heuristics. In Section 4, we show and explain the results of both the ILP problem formulation and the proposed heuristics. Finally, in Section 5, we conclude and discuss some directions for future work.
2 Overlay Topology Design

2.1 Problem Formulation
Overlay networks are created at the application layer, over a given physical network. Overlay network nodes select their neighbors and establish direct overlay links, creating an overlay topology. Let $G_u = (N, E)$ be the graph representing the underlay, or physical, network and $G = (N, L)$ be the graph representing the overlay network. We assume that the same set of nodes $N$ is present in both the overlay and physical networks, while the set of overlay links $L$ can differ from the set of physical links $E$. We define the default topology as the overlay topology having $L \equiv E$, where all underlay links are also overlay links. Any logical link in $L$ is set up on a path $l_{i,j}$ composed of physical links on the shortest path between node $i$ and node $j$. Assuming that each node $i \in N$ has a traffic demand toward a node subset $S_i \subset N$, let $d_{i,j}$ be the traffic demand between node $i$ and node $j$ in the subset $S_i$. The objective for the node is to create logical links so as to be connected to all nodes in $S_i$ such that the total cost is minimized. The cost function is composed of two components:
1. The cost to create an overlay link between a pair of nodes, proportional to the number of hops in the shortest path on the physical network.
2. The cost to transport the traffic demands, proportional to the length of the shortest path and the amount of traffic demand between a pair of nodes.

The cost for node $i$ to connect to each node $k \in S_i$ and carry the traffic demands $d_{i,j}$ is defined as:

$$C_i = \alpha \sum_{k \in B_i} h_{i,k} + \sum_{j \in S_i} t_{i,j} \, d_{i,j} \qquad (1)$$

where $B_i$ is the set of neighbor nodes $k$ toward which node $i$ has an overlay link, $h_{i,k}$ is the number of intermediate nodes in the physical path of $l_{i,k}$, and $t_{i,j}$ is the number of transit overlay links in the path to node $j$. $\alpha$ is a cost coefficient which represents the relative weight of the two cost components: link creation cost and traffic transport cost. The total cost of the overlay network is consequently defined as:

$$C(G) = \sum_{i \in N} C_i \qquad (2)$$
The cost model defined in [8] and [9] is modified to include the traffic demand. It is important to note that $C_i$ is a function of both the location of the overlay links $l_{i,j}$ and the demands $d_{i,j}$. Table 1 defines the parameters and variables used in the cost function.

Table 1. The most used parameters and variables in the paper

Parameters:
$h_{i,k}$ - number of intermediate nodes in the physical path $l_{i,k}$
$t_{i,j}$ - number of transit overlay links in the path between node $i$ and node $j$
$l_{i,k}$ - number of hops in the shortest path between source node $i$ and neighbor node $k$
$\alpha$ - overlay cost coefficient
$d_{i,j}$ - traffic demand between node $i$ and node $j$
$a_{i,j}$ - element of the adjacency matrix, equal to 1 if there is a physical link between node $i$ and node $j$

Variables:
$\delta_{i,j}$ - binary decision variable, equal to 1 if there is an overlay link between node $i$ and node $j$
$y_{i,j,k}$ - amount of flow leaving node $i$ going to node $j$ that started from node $k$

Figure 1 shows a simple example of an overlay network topology over a given physical network. For example, considering the default network in the figure, where no new overlay links are created, node 1 wants to send a traffic demand $d_{1,5}$ to node 5 and a traffic demand $d_{1,7}$ to node 7. If node 1 does not select any new neighbor node, it is only connected with node 2, and the cost for node 1 is given only by the routing cost.
[Figure 1 comprises four panels over the 8-node example network: Default Topology, Logical Network A, Logical Network B and Logical Network C.]

Fig. 1. Examples of default topology and logical networks
Since the number of links in the path from node 1 to node 5 equals 4, the number of transit links to reach node 5 equals 4 − 0 − 1 = 3 and the number to reach node 7 equals 5 − 0 − 1 = 4, so we have $C_1 = 3 d_{1,5} + 4 d_{1,7}$. In the case of overlay network A, node 1 selects nodes 5 and 7 as neighbors, so two overlay links are set up: one connecting node 1 with node 5 and the other connecting node 1 with node 7. The total cost is then given only by the cost of creating the logical links; the second cost component, related to the transport of the demands, is zero, since no transit links are used when there are direct overlay links between the source node and the destination nodes. In this case we have $C_1 = 3\alpha + 4\alpha$. Due to the different behaviors of the nodes in the network, we classify the problem formulation into two categories: the non-cooperative (N-C node) behavior and the cooperative (C node) behavior.

2.2 Integer Linear Programming
In this section, we present the ILP formulations for the following two cases:

1. C node: the new overlay link built between any two nodes can be used to route the traffic demands of other nodes.
2. N-C node: the new overlay link built by a given source can only be used by that source to route its traffic demand.

Consequently, the C node behavior leads to a formulation of the global optimum, while the N-C node behavior leads to a formulation of the local optimum for each source.

C node behavior. The decision variables used in this problem formulation are $\delta_{i,j}$ and $y_{i,j,k}$. $\delta_{i,j}$ is the binary decision variable for building an overlay link between node $i$ and node $j$; it is also used in the N-C node problem formulation. $y_{i,j,k}$ represents the amount of flow leaving node $i$ going to node $j$, originated at node $k$. Table 1 defines the decision variables and the parameters used in the formulation. The objective function is formulated as:

$$\min \; \sum_{i \in N} \sum_{j \in N} 0.5\,\alpha\, h_{i,j}\, \delta_{i,j} + \sum_{i \in N} \sum_{j \in N} \sum_{k \in N} y_{i,j,k} - \sum_{k \in N} \sum_{l \in N} d_{k,l} . \qquad (3)$$
subject to:

$$\sum_{j \in N} y_{k,j,k} = \sum_{l \in N} d_{k,l} \quad \forall\, k . \qquad (4)$$

$$\sum_{i \in N} \left( y_{i,j,k} - y_{j,i,k} \right) = d_{k,j} \quad \forall\, k, j,\; k \neq j . \qquad (5)$$

$$\delta_{i,j} \geq a_{i,j} \quad \forall\, i, j . \qquad (6)$$

$$\sum_{k \in N} y_{i,j,k} \leq M \left( \delta_{i,j} + a_{i,j} \right) \quad \forall\, i, j . \qquad (7)$$

$$y_{i,j,k} \leq M \left( \delta_{i,j} + a_{i,j} \right) \quad \forall\, i, j, k . \qquad (8)$$
Eqn. (3) expresses the cost of establishing overlay links and the cost of routing the traffic demands. Eqns. (4)-(8) are the main constraints of the optimization problem. Eqn. (4) gives the total amount of traffic demand sent by each node; Eqn. (5) balances the incoming and outgoing traffic demands through any node in the network; in Eqn. (6) we consider all physical links to be overlay links; Eqns. (7)-(8) state that a traffic demand can be routed on any new overlay link according to the shortest path between the source node and the destination node. These equations are called the link load equations [12] because the traffic demand on each link cannot exceed the link capacity. $M$ is a large number, which makes the problem uncapacitated.

N-C node behavior. The C node problem formulation is a global optimization, and the N-C node problem formulation can be derived from the C node formulation as a local optimization. Each source node creates overlay links for its own benefit, to satisfy the demand volume to all its destinations. By repeating this process for each node in the network, we obtain the optimal overlay topology of the N-C node behavior: the final topology is the union of each single-source, multi-destination optimal topology. When reducing the C node formulation to the N-C node formulation, we replace $\delta_{i,j}$ with $\delta_j$, and replace the source index $i$ in $a_{i,j}$ and the source index $k$ in $y_{i,j,k}$ and $d_{k,l}$ with the given source. The problem formulation becomes:

$$\min \; \sum_{j \in N} 0.5\,\alpha\, h_{source,j}\, \delta_j + \sum_{i \in N} \sum_{j \in N} y_{i,j,source} - \sum_{l \in N} d_{source,l} . \qquad (9)$$
subject to:

$$\sum_{j \in N} y_{source,j,source} = \sum_{l \in N} d_{source,l} . \qquad (10)$$

$$\sum_{i \in N} \left( y_{i,j,source} - y_{j,i,source} \right) = d_{source,j} \quad \forall\, j \neq source . \qquad (11)$$

$$\delta_j \geq a_{source,j} \quad \forall\, j . \qquad (12)$$

$$y_{i,j,source} \leq M \left( \delta_j + a_{i,j} \right) \quad \forall\, i, j . \qquad (13)$$
Algorithm 1. N-C node behavior
  AdjacencyMatrix = []
  for i = 1 to N do
    Run the C node formulation for source i
    AdjacencyMatrix[i, :] = δ_j
  end for
  Generate the optimal overlay topology from the AdjacencyMatrix
Algorithm 1 shows the generation of the optimal overlay topology for the N-C node behavior. The problem of creating overlay links in the network is NP-complete, since the Hamiltonian Path Completion problem, which is NP-complete [13], can be reduced to it.
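The following Python sketch (ours) mirrors Algorithm 1; `solve_single_source` is a hypothetical stand-in for an ILP solver of the single-source formulation (9)-(13), assumed to return the binary decisions $\delta_j$ for the given source.

```python
# Sketch of Algorithm 1 in Python. solve_single_source() stands in for an
# ILP solver of the single-source formulation (9)-(13); it is assumed to
# return {j: delta_j} for the given source. Both names are ours.

def nc_node_topology(nodes, solve_single_source):
    overlay_links = set()
    for src in nodes:                      # each source optimizes locally
        delta = solve_single_source(src)   # one row of the adjacency matrix
        overlay_links |= {(src, j) for j, built in delta.items() if built}
    # The N-C node topology is the union of the per-source solutions.
    return overlay_links
```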
3 Proposed Heuristics
In this section, we introduce heuristics based on a greedy approach, a node clustering approach, the maximum number of hops, and the maximum traffic volume. All the proposed heuristics can be applied to both the N-C node and C node behaviors to generate near-optimal overlay topologies.

3.1 Greedy Heuristic
In this heuristic, a sequence of nodes is selected. The first node selects the best neighbor to minimize its incremental cost and establishes a new overlay link. The next node in the sequence likewise selects its best neighbor node, taking into account the previously established overlay links when the nodes are cooperative (C nodes). A sketch of this selection loop is given below.
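```python
# A minimal sketch (ours) of the greedy heuristic. incremental_cost() is a
# hypothetical helper that evaluates the cost change of Eq. (1) for linking
# i to k on the topology built so far; candidates[i] is node i's destination
# set S_i.

def greedy_overlay(sequence, candidates, incremental_cost, cooperative=True):
    new_links = set()
    for i in sequence:
        # C nodes account for links already created earlier in the sequence;
        # N-C nodes evaluate candidates in isolation.
        context = new_links if cooperative else set()
        best = min(candidates[i],
                   key=lambda k: incremental_cost(i, k, context))
        new_links.add((i, best))
    return new_links
```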
3.2 Node Clustering Heuristic
The shortest path between a source-destination pair typically passes through nodes with high node degree. In this heuristic, the nodes in the network are grouped in a decentralized way. In each group there is a leader node, which has a high node degree. We define a relay node as a node physically connected to more than one leader node in the network. Ordinary nodes are the remaining nodes in the group. The leader nodes in the network establish direct overlay links between them. In order to create the groups and select the leaders, we propose the following decentralized procedure. Each node $i$ sends information
about its node degree to its physical neighbors, and receives their node degree information. If a given node has the highest node degree among its neighbors, it considers itself a leader node; otherwise, it may be either a relay node or an ordinary node. If node $i$ is a leader node, it informs all its physical neighbors that it has become the leader of the group. If an ordinary node receives at least two messages from different leader nodes, it considers itself a relay node, randomly selects one leader, and begins to inform its neighbors about the selected one. If an ordinary node does not receive information from any leader node, it selects the neighbor node with the maximum node degree and joins its group. Each leader node maintains a list of all the leader nodes in the network; when a leader node learns about a new leader, it saves it in this list. Using this list, each leader node runs the C node optimization program to decide on the new overlay neighbor nodes toward which it builds overlay links. The grouping step is sketched below.
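```python
import random

# Sketch (ours) of the decentralized grouping step, simulated centrally:
# local degree maxima become leaders, nodes hearing two or more leaders
# become relays, and the remaining nodes join a group.

def elect_leaders(neighbors, degree):
    """neighbors[i]: physical neighbors of i; degree[i]: node degree of i."""
    leaders = {i for i in neighbors
               if all(degree[i] >= degree[j] for j in neighbors[i])}
    relays, membership = set(), {}
    for i in set(neighbors) - leaders:
        adjacent_leaders = [j for j in neighbors[i] if j in leaders]
        if len(adjacent_leaders) >= 2:
            relays.add(i)                          # heard several leaders
        if adjacent_leaders:
            membership[i] = random.choice(adjacent_leaders)
        else:
            # no leader heard: follow the highest-degree neighbor's group
            membership[i] = max(neighbors[i], key=lambda j: degree[j])
    return leaders, relays, membership
```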
3.3 Max-Length, Max-Demand and Max-Length-Demand Heuristics
From the cost function of Eqn. (1), it is evident that establishing overlay links toward far destinations and/or toward destinations with high traffic volumes is economically advantageous. Based on this observation, we propose the following heuristics, in which each node establishes an overlay link with the destination having, respectively, the maximum distance (max-length, $\max l_{i,j}$), the maximum traffic demand (max-demand, $\max d_{i,j}$), or the maximum product of distance and traffic demand (max-length-demand, $\max l_{i,j} d_{i,j}$). If the source node finds more than one destination with the same maximum decision parameter, it randomly chooses one of them and builds an overlay link to it. Finally, each node informs its physical neighbors so that they can update the shortest paths to all their destinations, when the nodes are cooperative (C nodes). The three selection rules are sketched below.
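```python
import random

# A compact sketch (ours) of the three selection rules. l[i][j] is the
# shortest path length and d[i][j] the traffic demand; ties are broken at
# random, as described in the text.

def pick_destination(i, dests, l, d, rule="max-length-demand"):
    score = {"max-length":        lambda j: l[i][j],
             "max-demand":        lambda j: d[i][j],
             "max-length-demand": lambda j: l[i][j] * d[i][j]}[rule]
    best = max(score(j) for j in dests)
    return random.choice([j for j in dests if score(j) == best])
```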
4 Results and Discussion
The ILP formulations, which provide optimal overlay topologies, and the heuristics are applied to a 24-node network with an average node degree of 3.583, representing a US nation-wide IP backbone network topology [14]. Two traffic matrices are used: 1) a homogeneous traffic matrix; 2) a random traffic matrix. We compute the network costs and some graph metrics characterizing the generated topologies.

4.1 Integer Linear Programming
N-C node behavior. Figure 2 shows the overall network cost and some graph metrics characterizing the generated optimal overlay topologies. When the traffic demand matrix is homogeneous, only a few distinct optimal overlay topologies are found, each over an interval of α; for this reason, the graph metrics are constant within those intervals. For example, when 1 < α ≤ 4, the optimal topology (T1) is the fully connected network.
[Figure 2 comprises four panels, each plotting a metric against α for both traffic matrices: overall cost, average node degree, characteristic path length (CPL), and number of new overlay links.]

Fig. 2. Overall network cost, average node degree, characteristic path length and number of new overlay links for different values of α, in the case of the N-C node behavior, for both the random and the homogeneous traffic matrices
When 7 < α ≤ 10, the optimal topology (T2) is a less connected graph, and the average node degree is constant and equal to 16.5. When the traffic demand matrix is random, the overall cost increases smoothly. When α is very small (1 < α ≤ 2), the optimal overlay topology is very close to the fully connected network. As α increases, the topology becomes less dense, approaching the default topology.

C node behavior. Figure 3 shows the overall network cost and some graph metrics characterizing the generated optimal overlay topologies. When the traffic demand matrix is homogeneous, a few optimal overlay topologies are found over intervals of α, similar to the intervals found in the N-C node behavior results. The results show that the network cost of the N-C node behavior is higher than that of the C node behavior, and that the average node degree and the number of new overlay links of the N-C node behavior are higher than those of the C node behavior. When α is very small, the optimal overlay topologies of the N-C node and C node behaviors are similar for both the homogeneous and the random traffic matrices. As α increases, the optimal overlay topology of the N-C node behavior becomes denser than that of the C node behavior: in the N-C node behavior, the source nodes build many overlay links to minimize their own cost, while in the C node behavior they do not need to build as many, since they can use new overlay links built by other nodes.

Running time. The running time T (in hh:mm:ss) to solve the ILP problem is summarized as follows:

– N-C node behavior: for homogeneous traffic demand, T = 00:01:30 for α = 10 and T = 00:07:50 for α = 24; for random traffic demand, T = 00:01:28 for α = 10 and T = 00:10:17 for α = 24.
– C node behavior: for homogeneous traffic demand, T = 00:10:40 for α = 10 and T = 03:08:54 for α = 24; for random traffic demand, T = 00:03:40 for α = 10 and T = 01:01:55 for α = 24.
[Figure 3 comprises four panels, each plotting a metric against α for both traffic matrices: overall cost, average node degree, characteristic path length (CPL), and number of new overlay links.]

Fig. 3. Overall network cost, average node degree, characteristic path length and number of new overlay links for different values of α, in the case of the cooperative behavior of the nodes, for both the random and the homogeneous traffic matrices
[Figure 4 comprises two panels plotting overall cost against α, comparing the ILP solution with the greedy, node clustering, max-length, max-demand and max-length-demand heuristics.]

Fig. 4. Comparison between the different heuristics and the ILP results: a) homogeneous traffic demand = 10; b) random traffic demand with maximum value = 20
Obviously, the running time of the C node problem is much greater than that of the N-C node problem. Therefore, in the following section, we apply our heuristics to solve the optimization problem for the C node behavior. Clearly, when the size of the problem (the number of nodes N) increases, our heuristics will be needed to solve the N-C node optimization problem as well.

4.2 Heuristics
Our five heuristics are compared with the ILP results. For the C node behavior, the ILP C node cost curve represents the lower bound for any topology and
for any value of α, as shown in Figure 4. When the traffic demand matrix is homogeneous, the greedy heuristic and the ILP results coincide for small values of α. As α increases, the greedy heuristic remains the best heuristic, though it no longer matches the ILP results. When α is greater than twice the value of the homogeneous traffic demand, the max-length heuristic is the best. When the traffic matrix is random, the greedy heuristic is the best and approaches optimality up to α equal to the maximum traffic demand; as α increases further, the max-demand heuristic becomes the best one. The default topology is the solution of the greedy heuristic when α is greater than twice the value of the maximum traffic demand. In addition, we found that the overall cost does not change for different node sequences. Considering the cooperative behavior between leaders in the node clustering heuristic, the relationship between the overall network cost and α is linear.
5 Conclusion and Future Work
The objective of this paper is to find the optimal overlay network topology considering both the routing cost and the overlay link creation cost. We formulate the problem using Integer Linear Programming for both the non-cooperative and cooperative node behaviors. In addition, we propose heuristics to select a near-optimal topology when the problem size increases. We consider two different traffic scenarios: homogeneous and random traffic demands. Our results show that the selection of the best heuristic among the proposed set is a function of α. When the traffic demand is homogeneous, the greedy heuristic yields the minimum cost when α is less than or equal to twice the value of the traffic demand; for larger values of α, the max-length heuristic yields the minimum cost. This happens because creating overlay links toward far destinations reduces the number of hops in the shortest paths, and other nodes can use those new overlay links to route their traffic demands. When the traffic demand is random, the greedy heuristic is selected when α is less than the maximum value of the traffic demand; when α is greater, max-demand is the best heuristic. This means that nodes build direct overlay links with the destinations having large amounts of traffic demand, to avoid transiting large demands over intermediate nodes. Future work will focus on studying overlay topology creation and adaptation in the case of unknown and variable traffic demands, and for different realistic underlay topologies.

Acknowledgments. The authors would like to thank Dr. Tricha Anjali for her helpful comments and discussion.
References

1. X. Gu, K. Nahrstedt, R. Chang, and C. Ward, “QoS-assured service composition in managed service overlay networks,” in Proc. IEEE 23rd International Conference on Distributed Computing Systems, Providence, May 2003.
2. S. Baset and H. Schulzrinne, “An analysis of the Skype peer-to-peer Internet telephony protocol,” in Proc. IEEE INFOCOM’06, Barcelona, Spain, April 2006.
3. S. Vieira and J. Liebeherr, “Topology design for service overlay networks with bandwidth guarantees,” in Proc. IWQoS 2004, Montreal, Canada, June 2004.
4. Z. Li and P. Mohapatra, “QRON: QoS-aware routing in overlay networks,” IEEE Journal on Selected Areas in Communications, vol. 22, pp. 29–40, 2004.
5. B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz, “Tapestry: A resilient global-scale overlay for service deployment,” IEEE Journal on Selected Areas in Communications, Special Issue on Service Overlay Networks, vol. 22, no. 1, Jan. 2004.
6. J. Han, D. Watson, and F. Jahanian, “Topology aware overlay networks,” in Proc. IEEE INFOCOM’05, Miami, USA, March 2005.
7. J. Fan and M. Ammar, “Dynamic topology configuration in service overlay networks: A study of reconfiguration policies,” in Proc. IEEE INFOCOM’06, Barcelona, Spain, April 2006.
8. B. Chun, R. Fonseca, I. Stoica, and J. Kubiatowicz, “Characterizing selfishly constructed overlay networks,” in Proc. IEEE INFOCOM’04, Hong Kong, March 2004.
9. A. Fabrikant, A. Luthra, E. Maneva, C. Papadimitriou, and S. Shenker, “On a network creation game,” in Proc. ACM PODC, 2003.
10. H. Zhang, J. Kurose, and D. Towsley, “Can an overlay compensate for a careless underlay?,” in Proc. IEEE INFOCOM’06, Barcelona, Spain, April 2006.
11. L. Qiu, Y. R. Yang, Y. Zhang, and S. Shenker, “On selfish routing in Internet-like environments,” in Proc. ACM SIGCOMM, August 2003, pp. 151–162.
12. M. Pioro and D. Medhi, Routing, Flow, and Capacity Design in Communication and Computer Networks, Chapter 4, Morgan Kaufmann, San Francisco, CA, 2004.
13. F. T. Boesch, S. Chen, and J. McHugh, “On covering the points of a graph with point disjoint paths,” in Graphs and Combinatorics (Proc. Capitol Conf. on Graph Theory and Combinatorics), 1974.
14. Keyao Zhu, www.networks.cs.ucdavis.edu/.
Construction of a Proxy-Based Overlay Skeleton Tree for Large-Scale Real-Time Group Communications

Jun Guo and Sanjay Jha
School of Computer Science and Engineering
The University of New South Wales, Sydney, NSW 2052, Australia
{jguo,sjha}@cse.unsw.edu.au
Abstract. We consider the problem of constructing a proxy-based overlay skeleton tree (POST) in the backbone service domain of a two-tier overlay multicast infrastructure. Spanning all multicast proxies deployed in the overlay backbone, POST acts as an efficient resource sharing platform for supporting large numbers of concurrent multicast sessions, without the need for tree computation for each individual session. The problem is concerned with deciding an appropriate deployment of multicast proxies in the overlay backbone, upon which we wish to find an optimal POST solution so that the maximum end-to-end latency is minimized subject to degree balancing constraints. This problem is shown to be NP-hard. We present a simple heuristic method for deploying multicast proxies, and devise a low complexity greedy algorithm for optimizing the end-to-end latency and degree distribution of POST. Simulation experiments confirm that our proposed approach yields good quality approximate solutions that are close to the optimum.
1 Introduction
Due to the complexity of deploying Internet-wide IP multicast [1] and the inability of end-system multicast to support high-bandwidth delay-sensitive group communications [2,3], proxy-assisted two-tier overlay multicast has been upheld as a viable alternative for supporting manageable and scalable inter-domain multicast services for real-time applications on the Internet (see [4,5,6] and references therein). In this paper, we refer to the proxy-assisted overlay multicast architecture as overlay multicast for brevity. One salient feature of overlay multicast is the set of dedicated multicast proxies that a multicast service provider strategically deploys in the backbone service domain. Advances in hardware technologies have made it possible to endow proxies with large processing power and high fanout capability at ever decreasing cost. Multicast proxies are typically placed at co-location facilities with high-speed connections to the Internet. This way, they can readily utilize the over-provisioned links in the core networks [7] without experiencing significant variations in end-to-end delay [8]. In comparison with peer-to-peer trees
formed entirely by end hosts with limited last-mile bandwidth, overlay multicast trees based on multicast proxies are much flatter and wider. Moreover, they are fairly static and thus eliminate the large control overhead caused by dynamic tree maintenance in end-system multicast. Since multicast proxies are application-layer servers and can therefore communicate with each other using unicast, overlay multicast effectively obviates the need for router support as required by IP multicast.

At a high level, multicast proxies can be classified into edge proxies and transit proxies. As illustrated in Fig. 1, edge proxies are placed at the edge of the overlay backbone, while transit proxies are placed deep in the overlay backbone. Transit proxies and edge proxies constitute Tier 1, while edge proxies and end hosts constitute Tier 2. Each edge proxy acts as a member proxy for the cluster of end hosts within its service area, and sends multicast traffic into or receives multicast traffic from the overlay backbone by means of a transit proxy. Each transit proxy is responsible for replicating and forwarding multicast traffic to other transit proxies or edge proxies.

Fig. 1. Proxy-assisted two-tier overlay multicast architecture

Since overlay multicast is designed to support multiple concurrent multicast sessions in the backbone service domain, multicast proxies essentially constitute a resource sharing environment. This raises the question of how to manage the multicast state scalability issue, so that the overlay backbone can support as many groups as possible without incurring significant control overhead. In the literature, the idea of aggregated multicast with inter-group tree sharing has been proposed [9]: multicast sessions with approximately the same set of end hosts are forced to share a single overlay topology to reduce the global control overhead. Based on the aggregated multicast approach, the overlay multicast routing protocol proposed in [6] was shown to significantly reduce the control overhead of establishing and maintaining overlay multicast trees.

In this paper, we introduce the concept of the proxy-based overlay skeleton tree (POST) and advocate it as an efficient approach to further reduce the computation cost and control overhead of overlay multicast trees. POST is computed in such a way that it spans all multicast proxies provisioned in the overlay backbone. The computation of POST is feasible because, in practice, the multicast service provider has complete administrative control over all multicast proxies deployed in the overlay multicast network. POST can be used as a default tree to support any multicast session in the overlay backbone, so long as all branches along the skeleton tree connecting the participating edge proxies have enough bandwidth to support the requested session. For example, the skeleton tree depicted in Fig. 1 can be used to support a multicast session in which edge proxies B, C and D participate, so long as all four branches, namely (B,G), (G,H), (C,H) and (D,H), have enough bandwidth. POST can also be used to support low-overhead control message passing among multicast proxies in the overlay backbone.

Two measures of “goodness” for a POST are end-to-end latency and degree distribution. On the one hand, an overlay routing path joining two edge proxies in the POST is likely to traverse a series of transit proxies. Consequently, it is important to minimize the maximum end-to-end latency (i.e., the tree diameter) of the POST. This is essential to the provision of good quality delay-sensitive multicast services in the overlay multicast network. On the other hand, each transit proxy in the overlay backbone has in general one interface with limited access bandwidth to the Internet. Thus, it is important to distribute the degree evenly among transit proxies for load balancing and to reduce access link stress. Higher link stress indicates greater contention for an interface and is likely to result in higher congestion and packet loss.

Our contribution in this paper is to propose an optimization problem for constructing an efficient POST in the multicast overlay backbone. We also demonstrate that appropriate placement of multicast proxies can have a significant impact on the performance of POST. The optimization problem is concerned with deciding an appropriate deployment of multicast proxies in the overlay backbone, upon which we wish to find an optimal POST solution so that the tree diameter is minimized subject to degree balancing constraints. We frame the problem as a constrained spanning tree problem, which is shown to be NP-hard. We present a simple heuristic method for deploying multicast proxies, and devise a low complexity greedy algorithm for optimizing the end-to-end latency and degree distribution of the POST.

The rest of this paper is organized as follows. We discuss related work in Sect. 2. Section 3 deals with the formulation of the optimization problem. In Sect. 4, we describe our solution method. Simulation experiments are reported in Sect. 5. Finally, we provide concluding remarks in Sect. 6.
2 Related Work
Existing proposals in [4,5] for building an overlay multicast backbone tree were all based on the assumption that the proxy deployment is given. Moreover, these proposals considered tree computation for a single multicast session only. The routing protocols proposed in [4] addressed two constrained spanning tree problems for single-source streaming applications: 1) minimize the maximum end-to-end latency subject to the access bandwidth constraint at each participating multicast proxy; 2) minimize the average end-to-end latency subject to the access bandwidth constraint at each participating multicast proxy. The routing protocols proposed
in [5] addressed two constrained spanning tree problems for multi-source multicast applications: 1) minimize the tree diameter subject to the access bandwidth constraint at each participating multicast proxy; 2) balance the access bandwidth usage at each participating multicast proxy subject to a tree diameter bound. The CT algorithm proposed in [5], which can be modified to compute a POST in our context, is closely related to our work. In [10], Lao, Cui, and Gerla considered the deployment problem for edge proxies in the overlay backbone. They adopted an approach similar to the K-median problem commonly used for web server replica placement in content distribution networks [11]. Their approach is used in this paper for the deployment of edge proxies. In our context, we further study the deployment problem for transit proxies, and demonstrate that it has a significant impact on the end-to-end latency performance of POST.
3 Problem Formulation
Consider an overlay network in the form of a complete undirected graph $G = (W, P, E)$. $W$ is the set of $N$ edge proxies, which have been deployed using the approach of [10]. $P$ is the set of $K$ potential nodes in the physical topology for deploying $M$, $M < K$, transit proxies. $E$ is the set of undirected edges, where an edge $(i, j) \in E$ between two nodes $i, j \in W \cup P$ represents the bidirectional unicast path with long-term average latency $l_{i,j}$ between $i$ and $j$ in the physical topology. In practice, unicast latency quantities can be obtained from active or passive measurements of TCP connection round trip times [8,12]. From $P$, we wish to decide an appropriate set of nodes $P_T$ for deploying the $M$ transit proxies. We then wish to form a POST, i.e., $T = (W, P_T, E_T)$, spanning all nodes in $W \cup P_T$, where $E_T$ is the set of edges included in $T$. An edge $(i, j) \in E$ can be included in $T$ if and only if at least one of the two nodes $i$ and $j$ is in $P_T$. Consequently, the set of internal nodes of $T$ is composed of transit proxies only, and the set of leaf nodes of $T$ is exclusively composed of edge proxies. For each pair of edge proxies $i$ and $j$ in $T$, we define $R_{i,j}$ as the set of edges that form the overlay routing path between $i$ and $j$. Let $L_{i,j}$ denote the latency of the overlay routing path between $i$ and $j$. Given the unicast latency matrix $\{l_{i,j}\}$, we readily have

$$L_{i,j} = \sum_{(u,v) \in R_{i,j}} l_{u,v} . \qquad (1)$$
Let $L_{\max}$ denote the maximum end-to-end delay (i.e., the diameter of $T$), which is given by

$$L_{\max} = \max_{i < j \in W} L_{i,j} . \qquad (2)$$
Consider the case where the unicast latency matrix $\{l_{i,j}\}$ satisfies the triangle inequality. Then the latency $L_{i,j}$ of the overlay routing path between any two edge proxies $i$ and $j$ cannot be smaller than the latency $l_{i,j}$ of the unicast path between the two nodes. This allows us to establish the lower bound (LB) on $L_{\max}$ as

$$L_{\max}^{LB} = \max_{i < j \in W} l_{i,j} . \qquad (3)$$
Let $d(i)$ denote the degree of transit proxy $i$. The following proposition states a property of the sum $\sum_{i \in P_T} d(i)$ of the degrees of the transit proxies in $T$.

Proposition 1. The sum of the degrees of the transit proxies in $T$ is $N + 2M - 2$.

Proof. Consider the subtree of $T$ composed of the $M$ transit proxies only. It follows from Corollary 1.5.3 of [13] that a tree with $M$ nodes has exactly $M - 1$ edges. Each such edge is incident to two transit proxies, and thus contributes two counts towards the sum of degrees of transit proxies. Now, each of the $N$ edge proxies must be connected to one transit proxy, and thus contributes one count towards the degree of the transit proxy to which it is connected. Therefore, the sum of the degrees of the transit proxies in the entire $T$ is $N + 2M - 2$.

We define a degree balancing index $F$, given by

$$F = \max_{i \in P_T} d(i) - \min_{i \in P_T} d(i) . \qquad (4)$$
A smaller value of $F$ indicates a more balanced degree distribution among transit proxies. Let $F^{LB}$ denote the lower bound on $F$. Clearly, $F^{LB} = 0$ if

$$N + 2M - 2 = kM, \quad k = 1, 2, \ldots \qquad (5)$$

and we require each transit proxy to have a degree of exactly $k$ in $T$. In situations where (5) does not hold, $F^{LB} = 1$. In such cases, letting $k = \lfloor (N + 2M - 2)/M \rfloor$ and $n = N + 2M - 2 - kM$, we require $n$ transit proxies to have a degree of exactly $k + 1$, and the remaining $M - n$ transit proxies to have a degree of exactly $k$.

Before we present the formal statement of the optimization problem, we shall first clarify the motivation for such a problem by considering the 14-node example illustrated in Fig. 2. We assume that five edge proxies have been deployed at nodes 1, 2, 4, 7 and 10.
Fig. 2. Overlay topology and unicast latency matrix
A routine computation of (3) on the unicast latency matrix $\{l_{i,j}\}$ gives $L_{\max}^{LB} = 14$ for this particular instance. From the remaining nine nodes, we wish to choose three nodes for deploying the transit proxies. Since $M = 3$ and $N + 2M - 2 = 9$, we have $F^{LB} = 0$; to achieve this, we require each transit proxy to have a degree of exactly three in $T$. An optimal solution, displayed in Fig. 3, chooses nodes 5, 6 and 9 for placing the transit proxies, and computes a POST with a minimal tree diameter of 14, which hits $L_{\max}^{LB}$. Our proposed heuristic method chooses nodes 3, 6 and 9 for the transit proxies; the best POST based on this heuristic proxy placement solution achieves $L_{\max} = 18$. In contrast, a random placement of transit proxies at nodes 3, 8 and 12 incurs a large tree diameter of 28. This confirms that appropriate placement of transit proxies in the multicast overlay backbone can have a large impact on the latency performance of POST.

Fig. 3. Solutions for POST

Definition 1. Minimum diameter degree-balanced spanning tree problem. Given a complete undirected graph $G = (W, P, E)$, find a constrained spanning tree $T = (W, P_T, E_T)$ such that the tree diameter $L_{\max}$ is minimized and $T$ is subject to the constraints: 1) $P_T \subset P$, $|P_T| = M$; 2) all nodes in $P_T$ are internal nodes, while all nodes in $W$ are leaf nodes; 3) $F$ is restricted to $F^{LB}$.

For brevity, in the rest of this paper, we shall refer to this problem as the MDDB problem. Such a heavily constrained spanning tree problem is NP-hard, since the NP-complete weighted diameter problem ([14], page 205) can be reduced to its decision version. We thus resort to heuristic methods to find near-optimal solutions for this challenging problem.
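For illustration, the two measures of goodness can be computed for a candidate tree as follows; this sketch is ours and uses the networkx library, assuming the tree $T$ carries unicast latencies in an edge attribute named 'l'.

```python
import itertools
import networkx as nx

# Sketch (ours): scoring a candidate POST against Eqs. (2) and (4). T is a
# networkx Graph whose edges carry the unicast latency in attribute 'l';
# W and PT are the edge-proxy and transit-proxy node sets.

def evaluate_post(T, W, PT):
    # L_max: largest summed latency over the (unique) tree paths between
    # edge-proxy pairs (Eq. (2)).
    dist = dict(nx.all_pairs_dijkstra_path_length(T, weight="l"))
    L_max = max(dist[m][n] for m, n in itertools.combinations(sorted(W), 2))
    # F: spread between largest and smallest transit-proxy degree (Eq. (4)).
    degrees = [T.degree(i) for i in PT]
    return L_max, max(degrees) - min(degrees)
```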
4 Solution
Our approach to solving the MDDB problem consists of two parts. In the first part, we present a simple heuristic method for deploying the set of transit proxies. In the second part, we devise a low complexity algorithm for greedily optimizing the end-to-end latency of $T$ subject to degree balancing among transit proxies.
4.1 Proxy Deployment
For all $i \in P$, we obtain $s_i$, computed as

$$s_i = \sum_{j \in W} l_{i,j} . \qquad (6)$$
We choose the $M$ nodes in the set $P$ with the smallest values of $s_i$ to constitute the set of transit proxies $P_T$. We define the node with the smallest value of $s_i$ as the central proxy, since it has the smallest summed distance to all edge proxies in the set $W$. For brevity, in the rest of this paper, we refer to this method as HPTP, which stands for Heuristic Placement of Transit Proxies. A sketch is given below.
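```python
# A minimal sketch of HPTP (ours): score every candidate node by its summed
# latency to all edge proxies (Eq. (6)) and keep the M smallest; the overall
# minimum is the central proxy. l is the latency matrix as nested dicts.

def hptp(P, W, l, M):
    s = {i: sum(l[i][j] for j in W) for i in P}   # Eq. (6)
    PT = sorted(P, key=lambda i: s[i])[:M]        # M smallest s_i
    central = PT[0]                               # smallest s_i overall
    return PT, central
```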
4.2 Degree Balancing
An important objective of our greedy algorithm is to guarantee that the degree balancing index $F$ of the internal nodes of $T$ is restricted to $F^{LB}$. As discussed in Sect. 3, if $F^{LB} = 0$, we require each of the $M$ internal nodes to have a degree of exactly $k$, where $k = (N + 2M - 2)/M$. This case can be easily resolved: for each node $j$ that has just been added to the partial tree of $T$, we check the cumulated degree $d(i)$ of the internal node $i$ which connects node $j$; if $d(i) = k$, we mark node $i$ so that it will not be considered by any of the remaining unconnected nodes. On the other hand, if $F^{LB} = 1$, we require $n$ internal nodes to have a degree of exactly $k + 1$, and the remaining $M - n$ internal nodes to have a degree of exactly $k$, where in this case $k = \lfloor (N + 2M - 2)/M \rfloor$ and $n = N + 2M - 2 - kM$. To achieve this, we make sure that once the $n$-th internal node whose degree reaches $k + 1$ has been marked, we further check, for each unmarked internal node, whether its degree has reached $k$. If so, we mark such nodes accordingly, again to make sure that they will not be considered by any of the remaining unconnected nodes. The marking logic is sketched below.
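```python
# A sketch (ours) of the degree balancing module described above. 'deg' maps
# each internal node to its current degree in the partial tree, 'marked' is
# the set P-bar of saturated internal nodes, and state['full'] counts the
# internal nodes that have already reached degree k+1 (used when F_LB = 1).

def update_marks(i, deg, marked, k, n, state):
    deg[i] += 1                              # node i just gained a neighbor
    if n == 0 or state["full"] == n:         # every remaining budget is k
        if deg[i] >= k:
            marked.add(i)
    elif deg[i] == k + 1:                    # one of the n nodes allowed k+1
        marked.add(i)
        state["full"] += 1
        if state["full"] == n:               # quota reached: freeze all
            for v in deg:                    # nodes already sitting at k
                if v not in marked and deg[v] >= k:
                    marked.add(v)
```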
4.3 Tree Construction
Let $V$ denote the set of all transit proxies in the partial tree $\bar{T}$ of $T$ constructed so far, and let $\bar{V} = P_T - V$ denote the set of all transit proxies not yet in $\bar{T}$. Let $\bar{P}$ denote the set of all internal nodes that have been marked. Let $\sigma_i$ denote the latency of the longest overlay routing path from an unmarked internal node $i$ to any other node in $\bar{T}$. Starting from the initial $\bar{T}$ with $V$ including the central proxy only, for each unconnected transit proxy $j$ in the set $\bar{V}$, we find an unmarked internal node $i$ in $\bar{T}$ that minimizes $\delta_j = l_{j,i} + \sigma_i$. We then identify the unconnected transit proxy $j$ in the set $\bar{V}$ with the smallest value of such $\delta_j$, and add it to $\bar{T}$ by creating an edge joining node $i$ and node $j$. After node $j$ is added to the tree, we update $V$ and $\bar{V}$, and we apply the degree balancing module to check whether node $i$ needs to be marked into the set $\bar{P}$. This procedure is iterated until all $M$ transit proxies have been connected. In cases where ties exist either on $i$ or on $j$, we break the ties on $i$ by choosing node $i$ such that $\sum_{v \in \bar{V}} (l_{v,i} + \sigma_i)$ is the largest. Intuitively, if we do not go for
node $i$ at this iteration, later node $i$ would more likely lead to larger $\sigma_i$ values for some other transit proxies not yet in the tree. On the other hand, we break the ties on $j$ by choosing node $j$ such that $\sum_{v \in V - \bar{P}} (\sigma_v + l_{v,j})$ is the largest. Similarly, if we do not connect node $j$ at this iteration, later node $j$ would more likely be connected to an internal node that yields a larger $\sigma_j$ value for node $j$.

After all transit proxies have been connected, we now let $V$ denote the set of all edge proxies in the partial tree $\bar{T}$ of $T$ constructed so far, and let $\bar{V} = W - V$ denote the set of all edge proxies not yet in $\bar{T}$. At any iteration, for each unconnected node $j$ in the set $\bar{V}$, we find an unmarked internal node $i$ in $\bar{T}$ which minimizes $\delta_j = l_{j,i} + \sigma_i$. We then identify the unconnected node $j$ in $\bar{V}$ with the largest (not smallest) value of such $\delta_j$, and add it to $\bar{T}$ by creating an edge joining node $i$ and node $j$. Intuitively, if we do not connect node $j$ at this iteration, later it would more likely contribute towards a larger value of the tree diameter. Once node $j$ is added to the tree, we update $V$ and $\bar{V}$, and we again apply the degree balancing module to check whether node $i$ needs to be marked into the set $\bar{P}$. In cases where ties exist either on $i$ or on $j$, we break the ties on $i$ again by choosing node $i$ such that $\sum_{v \in \bar{V}} (l_{v,i} + \sigma_i)$ is the largest: if we do not go for node $i$ in this case, later node $i$ would certainly lead to larger overlay latency between all nodes it has connected and some unconnected leaf nodes it would connect. On the other hand, we break the ties on $j$ by choosing node $j$ such that $\sum_{v \in P_T - \bar{P}} (\sigma_v + l_{v,j})$ is the largest: if we do not connect node $j$ at this iteration, later node $j$ would more likely be connected to an internal node which contributes towards a larger value of the tree diameter. It can be shown that this greedy algorithm is $O(MN^2)$ in complexity. For brevity, we refer to it as GOLD (Greedily Optimize Latency and Degree) in the rest of this paper. One attachment step is sketched below.
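```python
# A sketch (ours) of one GOLD attachment step for a set 'pending' of
# unconnected nodes (transit proxies first, then edge proxies). sigma[v] is
# the latency of the longest overlay path from unmarked internal node v to
# any node already in the tree; l is the unicast latency matrix. Updating
# sigma, the marks, and the tie-breaking sums is left out for brevity.

def attach_next(pending, tree_internals, marked, l, sigma, pick_largest):
    best = None
    for j in pending:
        # cheapest unmarked attachment point for candidate node j
        i = min((v for v in tree_internals if v not in marked),
                key=lambda v: l[j][v] + sigma[v])
        delta = l[j][i] + sigma[i]
        # transit phase attaches the smallest delta_j, edge phase the largest
        if best is None or \
           (delta > best[0] if pick_largest else delta < best[0]):
            best = (delta, j, i)
    _, j, i = best
    return j, i       # create the edge (i, j) in the partial tree
```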
5 Simulation Experiments
We have studied the performance of our proposed algorithms for the MDDB problem through detailed simulation experiments. The various network topologies used in our experiments were obtained from the GT-ITM topology generator [15]. We used the flat random graph model for the small size topologies and the transit-stub graph model for the large size topologies. Unicast latency between different pairs of nodes ranges from 1 to 50 ms in the flat random graphs and from 1 to 500 ms in the transit-stub graphs. The number of nodes in the flat random graphs was varied between 14, 16 and 18; all transit-stub graphs had 1,200 nodes. In each graph, edge proxies were placed at a set of nodes chosen uniformly at random. The number of edge proxies was fixed at five in the flat random graphs and varied between 400, 500 and 600 in the transit-stub graphs. The number of transit proxies was fixed at three in the flat random graphs and varied from 20 to 40 in the transit-stub graphs. Based on these network topologies, we designed the following experiments to study the performance of our proposed algorithms for the MDDB problem. We have derived an integer linear programming (ILP) formulation for the MDDB
[Figure 4 comprises three panels (flat random graphs with N=5 and K=9, K=11, K=13, all M=3), each plotting tree diameter (ms) against case ID for HPTP and ILP.]

Fig. 4. Quality of HPTP
[Figure 5 comprises three panels (transit-stub graphs with N=400, N=500 and N=600), each plotting percentage deviation against M for ECT and GOLD.]

Fig. 5. Performance comparison between ECT and GOLD
problem (see the Appendix). The ILP formulation was used to find optimal solutions of the MDDB problem for small size topologies. Note that this ILP model was also used to compute the best POST solution based on a given heuristic proxy placement solution obtained by HPTP; this is done by fixing the indicator variables defined for proxy placement in the ILP model to the corresponding nodes found by HPTP, which allows us to examine the quality of HPTP.

Figure 4 presents the results from experiments with the flat random graphs. For each graph, we randomly generated 20 sets of different edge proxy placement solutions. We observe from these results that the performance of HPTP is reasonably good, considering that it is such a simple approach. In many cases, HPTP hits the optimal solution identified by ILP. The worst case performance among these results is 40% away from the optimum. Since we are not able to use ILP to solve large size problem instances, due to its known exponential computational complexity, we used GOLD to compute approximate POST solutions for HPTP on large size topologies.

The CT algorithm proposed in [5] was designed for a minimum diameter degree-limited spanning tree problem, to address efficient routing in the multicast overlay backbone. It is similar to Prim's algorithm [16] for minimum spanning trees in the sense that the tree construction process starts from an arbitrarily selected node and takes an arbitrary tie-breaking approach. In contrast, GOLD starts from the central proxy of the overlay backbone and takes an elaborate tie-breaking approach in the tree construction process. For the purpose of comparison, we have directly combined the CT algorithm with the degree balancing module of GOLD; the resulting tree construction method is referred to as enhanced CT (ECT) in the simulation experiments.

Given that we have established the lower bound on $L_{\max}$, we used $L_{\max}^{LB}$ as the benchmark to measure the quality of HPTP on large size topologies. The $L_{\max}$ results presented in Fig. 5 are plotted in the form of the percentage deviation from $L_{\max}^{LB}$, given by $(L_{\max} - L_{\max}^{LB}) / L_{\max}^{LB}$ (%). These results clearly demonstrate the enhanced performance of GOLD in comparison with ECT: in all cases, GOLD improves $L_{\max}$ by up to 4.4%. Such a performance gain comes only at the expense of a modicum of additional computation cost, as demonstrated in Table 1. Moreover, HPTP shows sufficiently good performance on these transit-stub graphs: even though we rely on GOLD to find approximate POST solutions, the percentage deviation is still less than 18% in all cases.

Table 1. CPU time on a 3.2 GHz Xeon machine for M = 40

       N = 400   N = 500   N = 600
ECT    0.84 sec  1.27 sec  1.80 sec
GOLD   0.90 sec  1.35 sec  1.95 sec
6 Conclusion and Future Work
In this paper, we have introduced the concept of POST and advocated it as an efficient approach to supporting large-scale real-time group communications in two-tier overlay multicast networks. We have demonstrated that appropriate placement of multicast proxies can have a significant impact on the latency performance of POST. We have proposed a constrained spanning tree problem to find optimal solutions for POST such that its maximum end-to-end latency is minimized subject to degree balancing constraints. A low complexity heuristic approach has been developed, which simulation experiments have shown to be scalable to large-size overlay multicast networks and to yield good quality POST solutions. The proposed algorithm can be utilized by a weight-coded genetic algorithm to further improve the solution quality [17]. In our future work, we plan to design efficient restoration algorithms to endow POST with resilience capability [18].

Acknowledgments. We thank Dr Vijay Sivaraman for his insightful comments. This project is supported by Australian Research Council (ARC) Discovery Grant DP0557519.
References

1. Diot, C., Levine, B.N., Lyles, B., Kassem, H., Balensiefen, D.: Deployment issues for the IP multicast service and architecture. IEEE Network 14(1) (Jan./Feb. 2000) 78–88
2. Banerjee, S., Bhattacharjee, B., Kommareddy, C.: Scalable application layer multicast. In: Proc. ACM SIGCOMM 02. (Aug. 2002) 205–217
3. Chu, Y.H., Rao, S.G., Zhang, H.: A case for end system multicast. In: Proc. ACM SIGMETRICS 00. (Jun. 2000) 1–12
4. Banerjee, S., Kommareddy, C., Kar, K., Bhattacharjee, B., Khuller, S.: Construction of an efficient overlay multicast infrastructure for real-time applications. In: Proc. IEEE INFOCOM 03. (Mar. 2003) 1521–1531
5. Shi, S.Y., Turner, J.S.: Routing in overlay multicast networks. In: Proc. IEEE INFOCOM 02. (Jun. 2002) 1200–1208
6. Lao, L., Cui, J.H., Gerla, M.: TOMA: a viable solution for large-scale multicast service support. In: Proc. IFIP NETWORKING 05. (May 2005) 906–917
7. Bhattacharyya, S., Diot, C., Jetcheva, J., Taft, N.: Pop-level and access-link-level traffic dynamics in a tier-1 POP. In: Proc. ACM IMW 01. (Nov. 2001) 39–53
8. Jaiswal, S., Iannaccone, G., Diot, C., Kurose, J., Towsley, D.: Inferring TCP connection characteristics through passive measurements. In: Proc. IEEE INFOCOM 04. (Mar. 2004) 1582–1592
9. Fei, A., Cui, J., Gerla, M., Faloutsos, M.: Aggregated multicast: an approach to reduce multicast state. In: Proc. IEEE GLOBECOM 01. (Nov. 2001) 1595–1599
10. Lao, L., Cui, J.H., Gerla, M.: Multicast service overlay design. In: Proc. SPECTS 05. (Jul. 2005)
11. Qiu, L., Padmanabhan, V.N., Voelker, G.M.: On the placement of web server replicas. In: Proc. IEEE INFOCOM 01. (Apr. 2001) 1587–1596
12. Paxson, V.: End-to-end internet packet dynamics. IEEE/ACM Trans. Networking 7(3) (Jun. 1999) 277–292
13. Diestel, R.: Graph Theory. 3rd edn. Springer-Verlag, New York (2005)
14. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco (1979)
15. Zegura, E.W., Calvert, K.L., Bhattacharjee, S.: How to model an internetwork. In: Proc. IEEE INFOCOM 96. (Mar. 1996) 594–602
16. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. 2nd edn. MIT Press, Cambridge, MA (2001)
17. Guo, J., Jha, S., Banerjee, S.: GOLD: An overlay multicast tree with globally optimized latency and out-degree. Technical Report UNSW-CSE-TR-0615, The University of New South Wales, Australia (2006)
18. Lau, W., Jha, S., Banerjee, S.: Efficient bandwidth guaranteed restoration algorithms for multicast connections. In: Proc. IFIP NETWORKING 05. (May 2005) 1005–1017
Appendix

Here we provide an ILP formulation for the MDDB problem. For the convenience of forming the overlay routing path between any two edge proxies in the ILP formulation, we define $E'$ as a set of directed edges that includes both directed edges $\langle i,j \rangle$ and $\langle j,i \rangle$ for each undirected edge $(i,j)$ in $E$. Let the 0-1 variables $x_i$, $i \in P$, indicate whether a transit proxy is deployed at node $i$. Let the 0-1 variables $p^{(m,n)}_{i,j}$, $m < n \in W$, $\langle i,j \rangle \in E'$, indicate whether the directed edge $\langle i,j \rangle$ is included in the overlay routing path between node $m$ and node $n$. Let the 0-1 variables $t_{i,j}$, $(i,j) \in E$, indicate whether the undirected edge $(i,j)$ is used by the POST. Let the
non-negative variables $d_{\max}$ and $d_{\min}$ respectively identify the maximum degree and the minimum degree among the transit proxies. For the latter, it is useful to define a non-negative variable $y_i$ for each $i \in P$. Let the non-negative variable $z$ identify the diameter of the POST. The MDDB problem can now be formulated as:

$$\text{Minimize } z \qquad (7)$$

subject to:

$$\sum_{i \in P} x_i = M \qquad (8)$$

$$\sum_{j: \langle i,j \rangle \in E'} p^{(m,n)}_{i,j} - \sum_{j: \langle j,i \rangle \in E'} p^{(m,n)}_{j,i} = \begin{cases} 1, & \text{if } i = m \\ 0, & \text{if } i \in W \cup P - \{m,n\} \\ -1, & \text{if } i = n \end{cases} \quad \forall\, m < n \in W \qquad (9)$$

$$\sum_{m < n \in W} \left( p^{(m,n)}_{i,j} + p^{(m,n)}_{j,i} \right) \leq N \cdot (N-1) \cdot t_{i,j}, \quad \forall\, (i,j) \in E \qquad (10)$$

$$\sum_{j:(i,j) \in E} t_{i,j} + \sum_{j:(j,i) \in E} t_{j,i} \leq 1, \quad \forall\, i \in W \qquad (11)$$

$$\sum_{j:(i,j) \in E} t_{i,j} + \sum_{j:(j,i) \in E} t_{j,i} \leq (N + M - 1) \cdot x_i, \quad \forall\, i \in P \qquad (12)$$

$$\sum_{(i,j) \in E} t_{i,j} = N + M - 1 \qquad (13)$$

$$z \geq \sum_{\langle i,j \rangle \in E'} p^{(m,n)}_{i,j}\, l_{i,j}, \quad \forall\, m < n \in W \qquad (14)$$

$$d_{\max} \geq \sum_{j:(i,j) \in E} t_{i,j} + \sum_{j:(j,i) \in E} t_{j,i}, \quad \forall\, i \in P \qquad (15)$$

$$y_i = (N + M - 1) \cdot (1 - x_i) + \sum_{j:(i,j) \in E} t_{i,j} + \sum_{j:(j,i) \in E} t_{j,i}, \quad \forall\, i \in P \qquad (16)$$

$$d_{\min} \leq y_i, \quad \forall\, i \in P \qquad (17)$$

$$d_{\max} - d_{\min} \leq F^{LB} \qquad (18)$$
Equation (8) restricts the selection to $M$ nodes of $P$ as transit proxies. Equations (9) and (10) ensure that the solution is a spanning tree that includes all edge proxies; more explicitly, they enforce a single overlay routing path between any two edge proxies. Equations (11) to (13) ensure that the set of leaf nodes of the spanning tree is exclusively composed of edge proxies, and that the set of internal nodes is composed of transit proxies only. Equation (14) identifies the diameter of the spanning tree. Equations (15) to (18) enforce the balance of degrees among the transit proxies. Together, these equations ensure that the solution is a POST satisfying all constraints of the MDDB problem.
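For illustration only, part of this formulation can be encoded with the PuLP modeling library as below; this is our sketch, it omits the path variables $p$ and constraints (9), (10) and (14), and is not a complete solver of the MDDB problem.

```python
import pulp

# Partial PuLP sketch (ours) of the MDDB ILP: objective (7) and constraints
# (8), (11)-(13) and (15)-(18). The path variables p of (9), (10), (14) are
# omitted, so this only illustrates the encoding of the tree and degree
# balancing constraints.

def mddb_skeleton(W, P, edges, M, F_LB):
    N = len(W)
    prob = pulp.LpProblem("MDDB", pulp.LpMinimize)
    z = pulp.LpVariable("z", lowBound=0)
    prob += z                                               # objective (7)
    x = pulp.LpVariable.dicts("x", P, cat="Binary")
    t = pulp.LpVariable.dicts("t", edges, cat="Binary")
    d_max = pulp.LpVariable("d_max", lowBound=0)
    d_min = pulp.LpVariable("d_min", lowBound=0)

    def deg(i):                   # degree of node i in the chosen tree
        return pulp.lpSum(t[e] for e in edges if i in e)

    prob += pulp.lpSum(x[i] for i in P) == M                # Eq. (8)
    for i in W:
        prob += deg(i) <= 1                                 # Eq. (11)
    for i in P:
        prob += deg(i) <= (N + M - 1) * x[i]                # Eq. (12)
        prob += d_max >= deg(i)                             # Eq. (15)
        y_i = (N + M - 1) * (1 - x[i]) + deg(i)             # Eq. (16)
        prob += d_min <= y_i                                # Eq. (17)
    prob += pulp.lpSum(t[e] for e in edges) == N + M - 1    # Eq. (13)
    prob += d_max - d_min <= F_LB                           # Eq. (18)
    return prob
```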
Increasing the Coverage of a Cooperative Internet Topology Discovery Algorithm

Benoit Donnet¹, Bradley Huffaker², Timur Friedman³, and K.C. Claffy²
¹ Université Catholique de Louvain – CSE Department, Belgium
² Caida – San Diego Supercomputer Center, USA
³ Université Pierre & Marie Curie – Laboratoire LIP6/CNRS, France
Abstract. Recently, Doubletree, a cooperative algorithm for large-scale topology discovery at the IP level, was introduced. Compared to classic probing systems, Doubletree discovers almost as many nodes and links while strongly reducing the quantity of probes sent. This paper examines the problem of the nodes and links missed by Doubletree. In particular, this paper's first contribution is to carefully describe the properties of the nodes and links that Doubletree fails to discover. We explain the incomplete coverage as a consequence of the way Doubletree models the network: as a tree-like structure of routes. But routes do not strictly form trees, due to load balancing and routing changes. This paper's second contribution is the Windowed Doubletree algorithm, which increases Doubletree's coverage by up to 16% without increasing its load. Unlike classic Doubletree, Windowed Doubletree does not start probing at a fixed hop distance from each monitor, but randomly picks a value from a range of possible values.
1 Introduction
Today's most extensive tracing system at the IP interface level, skitter [1], uses 24 monitors, each targeting on the order of one million destinations. In the fashion of skitter, scamper [2] makes use of several monitors to traceroute IPv6 networks. Other well known systems, such as Ripe NCC's TTM [3] and Nlanr's AMP [4], employ a larger set of monitors, on the order of one to two hundred, but they avoid probing outside their own networks. Recent work indicates, however, the need to increase the number of traceroute sources in order to obtain more accurate topology measurements. Indeed, it has been shown that reliance upon a relatively small number of monitors to generate a graph of the internet can introduce unwanted biases [5], [6]. One way of rapidly creating a large distributed monitoring infrastructure would be to deploy traceroute monitors in an easily downloadable and readily usable piece of software, such as a screensaver. This was first proposed by Jörg Nonnenmacher, as reported by Cheswick et al. [7]. The first publicly downloadable distributed route tracing tool is Dimes [8], released as a daemon in September 2004. However, building such a large infrastructure leads to potential scaling issues: the quantity of probes launched might consume large amounts of network
resources, and the probes sent from many vantage points might appear to end-hosts or firewalls as a distributed attack. These problems were quantified in our prior work [9]. The Doubletree algorithm [9] is designed to perform large-scale topology discovery in a network friendly manner. Doubletree avoids retracing the same routes in the internet by taking advantage of the tree-like structure of routes fanning out from a source or converging at a destination. The key to Doubletree is that the monitors share information regarding the paths that they have explored: if one monitor has already probed a given path to a destination, then another monitor should avoid that path. Probing in this manner can significantly reduce the load on routers and destinations while maintaining high node and link coverage. However, even though Doubletree's coverage of nodes and links is high (above 90%, compared to classic probing such as skitter), some nodes and links are not reachable by Doubletree.

This paper's first contribution is a careful study of the topological data missed by Doubletree. Based on a subset of skitter data, we simulate Doubletree and show that the majority of the missed nodes and links are located between 9 and 20 hops from the Doubletree monitors. We believe that these missed links and nodes are located between routes' divergence and convergence points in the network. Doubletree does not take such points into account, due to the way it models the network: considering the tree-like structure of routes implies a static view of the network, but convergence and divergence of routes arise from network dynamics, as such points are created by load balancing or routing changes. Load balancing refers to the fact that routers might spread their traffic across multiple paths [10]; three policies might be considered: per-flow, per-packet and per-destination. Note that the impacts of load balancing on traceroute-like probing are detailed in [11].

Based on this knowledge of the missed information, we propose an improvement to Doubletree called Windowed Doubletree, this paper's second contribution. Instead of starting to probe at a fixed point somewhere in the network, Windowed Doubletree builds a window based on the location of the missed nodes and links. For each destination to probe, a Windowed Doubletree monitor randomly picks a value in this window and then probes in the same way as classic Doubletree. We evaluate Windowed Doubletree and find that it is able to increase classic Doubletree's coverage by 16% while maintaining nearly the same impact on destinations and router interfaces as classic Doubletree.

The remainder of this paper is organized as follows: Sec. 2 presents Doubletree; Sec. 3 discusses Doubletree's limitations; Sec. 4 introduces and evaluates Windowed Doubletree, our improvement to Doubletree based on knowledge of the missed information; finally, Sec. 5 concludes this paper by summarizing its main contributions.
2 Doubletree
Doubletree [9] takes advantage of the tree-like structure of routes in the context of probing, as illustrated in Fig. 1. Routes leading out from a monitor towards multiple destinations form a tree-like structure rooted at the monitor (Fig. 1(a)).
Fig. 1. Tree-like structure of routes: (a) tree rooted at the monitor; (b) tree rooted at the destination
Similarly, routes converging towards a destination from multiple monitors form a tree-like structure, but rooted at the destination (Fig. 1(b)). A monitor probes hop by hop so long as it encounters previously unknown interfaces. However, once it encounters a known interface, it stops, assuming that it has touched a tree and that the rest of the path to the root is also known. Using these trees suggests two different probing schemes: backwards (monitor-rooted tree, decreasing TTLs) and forwards (destination-rooted tree, increasing TTLs). For both backwards and forwards probing, Doubletree uses stop sets. The one for backwards probing, called the local stop set, consists of all interfaces already seen by that monitor. Forwards probing uses the global stop set of (interface, destination) pairs accumulated from all monitors. A pair enters the global stop set if a monitor receives a packet from the interface in reply to a probe sent towards the destination address.

A Doubletree monitor starts probing for a destination at some number of hops h from itself. It probes forwards at h + 1, h + 2, etc., adding to the global stop set at each hop, until it encounters either the destination or a member of the global stop set. It then probes backwards at h − 1, h − 2, etc., adding to both the local and global stop sets at each hop, until it either reaches the distance of one hop or encounters a member of the local stop set. It then proceeds to probe for the next destination. When it has completed probing for all destinations, the global stop set is communicated to the next monitor. Note that in the special case where there is no response at distance h, the distance is halved, and halved again until there is a reply, and probing continues forwards and backwards from that point. A sketch of this probing loop appears at the end of this section.

Doubletree has a single tunable parameter, the initial hop distance h. While Doubletree largely limits redundancy on destinations once hop-by-hop probing is underway, its global stop set cannot prevent the initial probe from reaching a destination if h is set too high. Therefore, each monitor sets its own value for h in
Fig. 2. Cumulative mass plot of path lengths from skitter monitor champagne. Fig. 3. Doubletree coverage compared to classic probing
terms of the probability p that a probe sent h hops towards a randomly selected destination will actually hit that destination. Fig. 2 shows the cumulative mass function for this probability for skitter monitor champagne. If one considers as reasonable a 0.2 probability of hitting a responding destination on the first probe, the champagne monitor must choose h ≤ 14.
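To make the per-destination procedure concrete, the following Python sketch shows one monitor's probing loop. It assumes a hypothetical helper probe(dest, ttl) that sends a probe with the given TTL and returns the replying interface (or None on timeout); this helper and the data structures are our own illustration, not part of the original system.

def doubletree_monitor(destinations, h, global_stop_set, probe):
    """Sketch of one Doubletree monitor (see Sec. 2)."""
    local_stop_set = set()                 # interfaces already seen by this monitor
    for dest in destinations:
        start = h
        while start > 1 and probe(dest, start) is None:
            start //= 2                    # no response: halve the distance
        ttl = start
        while True:                        # forwards probing at start, start+1, ...
            iface = probe(dest, ttl)
            if iface is None or iface == dest or (iface, dest) in global_stop_set:
                break                      # touched a known tree or hit the destination
            global_stop_set.add((iface, dest))
            ttl += 1
        for ttl in range(start - 1, 0, -1):  # backwards probing at start-1, ..., 1
            iface = probe(dest, ttl)
            if iface in local_stop_set:
                break                      # rest of the path to the monitor is known
            if iface is not None:
                local_stop_set.add(iface)
                global_stop_set.add((iface, dest))
    return global_stop_set                 # handed to the next monitor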
3 Doubletree Limitations

3.1 Methodology
Our study of Doubletree's limitations is based on skitter data from August 1st through 3rd, 2004. At this time, skitter was deployed on 24 monitors scattered around the world: in the United States, Canada, the United Kingdom, France, Sweden, the Netherlands, Japan, and New Zealand. The monitors shared a common destination set of 971,080 IPv4 addresses. Each monitor cycled through the destination set at its own rate, typically taking three days to complete a cycle. For the purpose of our study, in order to reduce computing time to a manageable level, we worked with a limited set of 50,000 destinations, randomly chosen from the original set.

We conducted simulations based on the skitter data, assuming that each of the 24 skitter monitors applied Doubletree, as described in Sec. 2. We implemented the same communication scheme for Doubletree as the one described by Donnet et al. [9, Sec. IV.B.]: a random order was chosen for the monitors, and each one simulated the running of Doubletree in turn. Each monitor added to the global stop set the (interface, destination) pairs that it encountered, and passed the set on to the subsequent monitor. A single experiment used traceroutes from all 24 monitors to a common set of 50,000 destinations chosen at random. Each data point in the plots represents the mean value over fifteen runs of the experiment, each run using a different set of 50,000 destinations generated at random. No destination was used more than once over the fifteen runs. Since the sample size was relatively small, we determined 95% confidence intervals for the mean based on the Student t distribution. These intervals are typically, though not in all cases, too tight to appear on the plots.
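The monitor-to-monitor communication scheme of our simulations can be sketched as follows; run_doubletree stands in for the per-monitor procedure of Sec. 2, and the function and parameter names are illustrative assumptions.

import random

def simulate_round(monitors, destinations, run_doubletree):
    """One simulated round: monitors run in random order and pass on
    the accumulated global stop set (cf. [9, Sec. IV.B.])."""
    order = list(monitors)
    random.shuffle(order)                  # a random order is chosen for the monitors
    global_stop_set = set()                # shared (interface, destination) pairs
    for monitor in order:
        global_stop_set = run_doubletree(monitor, destinations, global_stop_set)
    return global_stop_set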
Fig. 4. Nodes and links at each hop. Fig. 5. Nodes and links missed (p = 0.05)

3.2 Results

Fig. 3 shows the mean Doubletree coverage (nodes and links) compared to classic probing for a range of p values (increments of 0.01 between 0 and 0.2, and increments of 0.1 between 0.2 and 1). A coverage value of 1 would indicate that Doubletree is able to discover the same proportion of nodes and links as classic probing. A p value of zero means that a Doubletree monitor performs forwards probing only. At the other extreme, a p value of 1 indicates that a Doubletree monitor, in all cases when a destination replies to the first probe, performs backwards probing only. We see that the coverage increases with p but never reaches 1. A coverage peak is reached when p = 0.8; after that point, the coverage decreases slightly. As we can see, though it can achieve over 90% coverage, Doubletree, in its basic form, has room for improvement.

Fig. 4 shows the average number, over the fifteen destination subsets, of nodes and links discovered at each hop by skitter. The vertical axis, in log scale, gives the quantity and the horizontal axis the hop count. As a given node might be located at a distance x from one monitor and at a distance y from another (the same problem might occur with links), we sort all nodes and links by distance and plot, for each node or link, the minimum distance. This methodology thus gives a monitor-independent representation. Further, note that for the links, the distance plotted is the distance of the start of the link. Finally, one can see that Fig. 4 shows quantities below 100 for some nodes and links. This is a consequence
of taking the mean over the fifteen destination subsets: only a few nodes (or links) might appear in a few subsets at some distances from the monitors. In our dataset, skitter discovers, on average, 131,780 nodes and 279,799 links. Looking at Fig. 4, we note that there is a rapid increase in the number of nodes and links at each hop until reaching a peak close to the traceroute sources: 35,620 different links at hop 8 and 19,662 nodes at hop 9. After this peak, the number of nodes and links per hop decreases slowly until reaching a minimum (or a value very close to the minimum) after the 35th hop.

Fig. 5 investigates the nodes and links missed by Doubletree when p = 0.05. This is in the range of p values recommended by our prior work [9, Sec. IV.B.], as it provides a good compromise between redundancy reduction and coverage. Fig. 5 plots the quantity of nodes and links missed at each hop. We see that the majority of the nodes and links missed are located between 9 and 20 hops from a monitor, where the majority of nodes and links are located, according to Fig. 4. A peak (5,953 links and 1,387 nodes are not elicited) is reached at 12 hops from the sources. After 20 hops, the amount of topological information missed becomes negligible.

We believe that this missed information is located between routes' divergence and convergence points. Doubletree encounters difficulties in discovering nodes and links between such points because of its stopping rules. For the sake of explanation, let us consider the destination-rooted tree (see Fig. 6). Suppose that Monitor 2 has h ≤ 4 and probes Destination 2. It will, as explained in Sec. 2, probe forwards from h and backwards from h − 1. It will also populate the global stop set with the (interface, destination) pairs it encounters. When Monitor 2 has finished probing, it sends its global stop set to Monitor 3. Suppose that Monitor 3 has h ≤ 2. When discovering the path to Destination 2, it will probe forwards until reaching the gray interface (A). As this interface towards Destination 2 was previously discovered by another monitor in the system, Monitor 3 stops probing, considering that the rest of the path to the root of the tree is already known. However, because of load balancing or a routing change, the path from the gray interface towards Destination 2 has changed. As a consequence, in this example, one node (E) and two links (dashed lines, A → E and E → C) will not be discovered. The same reasoning applies to the monitor-rooted tree, but with backwards probing and a stopping rule based on the local stop set.

This case occurs because of the way Doubletree models the network. As explained in Sec. 2, Doubletree assumes, in the context of probing, that the routes have a tree-like structure. This is largely true, as suggested by Doubletree's coverage results (see Fig. 3), but this hypothesis implies a static view of the network. When a Doubletree monitor stops probing towards the root of a tree, it assumes that the rest of the path to the root is both known and unchanged since earlier probing. The existence of routes' convergence and divergence points, however, implies a dynamic view of the network, as some parts of the network might change due to load balancing and routing.
Fig. 6. Route's convergence point in the case of a destination-rooted tree. Fig. 7. Windowed Doubletree - the champagne monitor

4 Windowed Doubletree
As explained in Sec. 3, Doubletree will miss node E (see Fig. 6) if it first probes path A → B → C and then A → E → C: E will be hidden behind the shared interfaces A and C, unless Doubletree starts its probing on the second hop of the paths between A and C, which would allow it to discover both B and E. However, as explained in Sec. 2, Doubletree, in its classic form, starts probing from a fixed value of h for all destinations. Consequently, unless the inconsistency between the paths occurs at its fixed h, it will not be discovered.

Our idea is to randomize the distance at which probing starts. Rather than launching the first probe at a constant value of h, each monitor randomly picks a value of h in the window of missing nodes and links. This is illustrated in Fig. 7. The horizontal axis gives the distance of missed nodes to monitors and the vertical axis the cumulative mass of missed nodes. The window, between 9 and 20 hops, is shown by the shaded area. Note that taking smaller values into account for the window would raise the risk of intra-monitor redundancy [9, Sec. III.A.], i.e., the duplication of a monitor's own work, which leads to inefficiency.

Introducing this element of randomness in probing dramatically increases the probability that at least one Doubletree trace will start probing inside a route's divergence and convergence segment, thus allowing Doubletree to discover the nodes and links hidden within such segments. The rest of Doubletree's behavior is left unchanged: probing continues forwards from h and backwards from h − 1 while using the global and local stop sets to decide when probing must stop, and all monitors continue to cooperate in order to exchange their global stop sets. We call this improved algorithm Windowed Doubletree.
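A minimal sketch of the only change Windowed Doubletree introduces, assuming the window bounds read off Fig. 7; everything else in the monitor code stays as in classic Doubletree.

import random

WINDOW_LOW, WINDOW_HIGH = 9, 20            # window of missed nodes and links (Fig. 7)

def initial_hop_distance():
    """Windowed Doubletree: draw a fresh h for every destination probed."""
    return random.randint(WINDOW_LOW, WINDOW_HIGH)

# Classic Doubletree would instead use one fixed h for all destinations
# (e.g., h <= 14 for monitor champagne with p = 0.2, cf. Fig. 2).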
Fig. 8. Coverage in comparison to classic Doubletree and classic probing. Fig. 9. Windowed Doubletree redundancy (95th percentile)
4.1 Evaluation
Our study was based on the same data set as the one described in Sec. 3.1. We assumed that Windowed Doubletree is running, as described in Sec. 4, on the skitter monitors, during the same period of time that the skitter data represents. In addition, we used the Mersenne Twister MT19937 pseudorandom number generator [12] to randomly pick a value for h within the window.

The main performance metric for a probing system is the extent to which it discovers what it should. Fig. 8 shows the Windowed Doubletree node and link coverage compared to classic probing and classic Doubletree. The horizontal axis gives the probability p (i.e., the probability of hitting a destination with the first probe sent) between 0 and 1. The vertical axis gives the coverage proportion in comparison to classic probing, i.e., skitter. A value of 1 would mean that Doubletree is able to discover the same proportion of nodes and links as classic probing. We compare Windowed Doubletree (the horizontal lines) with classic Doubletree (the curves). Looking first at the node coverage (upper curve), we see that Windowed Doubletree (plain line) is able to discover more nodes than classic Doubletree. The increase ranges from 9% (p = 0) down to 0.3% (p = 0.8). Classic Doubletree would therefore need a p value of 0.8 to reach nearly the same proportion of discovered nodes. Previous work [9] pointed out that such a value for the parameter p is not advisable due to the load on destinations. Indeed, a value of p = 0.8 means that in 80% of the cases, the first probe sent by a Doubletree monitor will hit a destination, which can be interpreted by end hosts and firewalls as an attack. With link coverage, the difference is much greater: Windowed Doubletree captures between 16% (p = 0) and 0.7% (p = 0.8) more links than classic Doubletree. Windowed Doubletree is thus able to discover more than 98% of the nodes and 93% of the links discovered by classic probing, and it increases the coverage of classic Doubletree.
Fig. 10. Nodes missed with Windowed Doubletree compared to classic Doubletree. Fig. 11. Links missed with Windowed Doubletree compared to classic Doubletree
The goal of applying Doubletree is to reduce the load on network interfaces in routers and, more importantly, at destinations. If Windowed Doubletree increased this load, it would be a concern. Fig. 9 shows the redundancy for router interfaces (left vertical axis) and for destinations (right vertical axis). With regard to router interface redundancy, we are concerned with the overall load. We therefore count the total number of visits to an interface; we call this metric gross redundancy. We evaluate the destination redundancy by counting the number of monitors that hit a given destination. The maximum value is thus the total number of monitors in the system, i.e., 24 in our case. For both destinations and router interfaces, we are concerned with the extreme values, so we plot the 95th percentile.

Looking first at the gross redundancy, we see that Windowed Doubletree produces the same amount of redundancy as classic Doubletree with p = 0.14. This corresponds to a value within the range of p values advised by previous work [9, Sec. IV.B.]. Note that the 95th percentile for the internal interface gross redundancy using a skitter-like approach is 1,340 (not shown in Fig. 9). Compared to the skitter-like approach, Windowed Doubletree thus allows a reduction in redundancy of 67.09%. The destination redundancy caused by Windowed Doubletree corresponds to that produced by classic Doubletree with p = 0.18 (i.e., 18 monitors), which belongs to the advised range of p values. The small increase in destination redundancy is due to the fact that the choice of the h value with Windowed Doubletree no longer tries to minimize the risk of hitting a destination with the first probe sent. However, Fig. 9 shows that randomly picking a value within a range of values does not lead to disastrous destination redundancy. Finally, note that classic probing generates a destination redundancy of 24 (not shown in Fig. 9), i.e., the number of monitors in our simulations.
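The two redundancy metrics can be computed from per-run counters as sketched below; we assume probe records of the form (monitor, interface, destination, hit), which is our own illustrative representation.

from collections import Counter
import math

def percentile_95(values):
    """95th percentile using the simple nearest-rank method."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]

def redundancy_metrics(records):
    gross = Counter()                      # total visits per router interface
    hits = {}                              # destination -> monitors that hit it
    for monitor, interface, destination, hit in records:
        gross[interface] += 1              # gross redundancy counts every visit
        if hit:
            hits.setdefault(destination, set()).add(monitor)
    return (percentile_95(list(gross.values())),
            percentile_95([len(m) for m in hits.values()]))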
Figs. 10 and 11 compare classic Doubletree with Windowed Doubletree in terms of nodes and links missed. The vertical axis gives the quantity of information (nodes or links) missed and the horizontal axis the distance (in number of hops) from the monitors. As in Fig. 4, nodes and links are sorted by distance and, for each node and link, we plot the minimum distance. Obviously, as suggested by Fig. 8, by introducing an element of randomness in probing, we are able to reduce the quantity of nodes and links missed by classic Doubletree. Looking first at the nodes (Fig. 10), we see that the maximum quantity of nodes missed is located further from the monitors: 16 hops (328 nodes) instead of 12 hops (1,387 nodes). The same phenomenon appears for links: the maximum quantity of links missed is located 14 hops (2,528 links) from monitors instead of 12 hops (5,953 links). We finally notice that, unlike classic Doubletree, Windowed Doubletree is able to discover nodes and links that are located beyond 30 hops from the monitors.
5 Conclusion
In this paper, we improved a cooperative topology discovery algorithm, Doubletree, in order to increase its node and link coverage. We first studied the properties of Doubletree's losses and found that these losses are due to the hypothesis Doubletree makes on how routes are modeled when probing. We also determined that the nodes and links missed are located between routes' divergence and convergence points. Based on this knowledge of the missed nodes and links, we proposed Windowed Doubletree. Instead of starting to probe at a fixed point in the network as classic Doubletree does, Windowed Doubletree builds a window based on the location of the nodes and links missed. For each destination to probe, Windowed Doubletree randomly picks a value in this window and starts probing from this point. We demonstrated that, by introducing an element of randomness in probing, Windowed Doubletree is able to discover more nodes and links than classic Doubletree while maintaining a low impact on routers and destinations.
Acknowledgements. Mr. Donnet's work was partially supported by the European Commission-funded 034819 OneLab project and by an internship at CAIDA.
References
1. Huffaker, B., Plummer, D., Moore, D., claffy, k.: Topology discovery by active probing. In: Proc. Symposium on Applications and the Internet (2002)
2. Luckie, M.: IPv6 scamper. WAND Network Research Group
3. Georgatos, F., Gruber, F., Karrenberg, D., Santcroos, M., Susanj, A., Uijterwaal, H., Wilhelm, R.: Providing active measurements as a regular service for ISPs. In: Proc. Passive and Active Measurement (PAM) Workshop (2001)
4. McGregor, A., Braun, H.W., Brown, J.: The NLANR network analysis infrastructure. IEEE Communications Magazine 38(5) (2000)
5. Lakhina, A., Byers, J., Crovella, M., Xie, P.: Sampling biases in IP topology measurements. In: Proc. IEEE INFOCOM (2003)
6. Clauset, A., Moore, C.: Traceroute sampling makes random graphs appear to have power law degree distributions. cond-mat 0312674, arXiv (2004)
7. Cheswick, B., Burch, H., Branigan, S.: Mapping and visualizing the internet. In: Proc. USENIX Annual Technical Conference (2000)
8. Shavitt, Y., Shir, E.: DIMES: Let the internet measure itself. ACM SIGCOMM Computer Communication Review 35(5) (2005)
9. Donnet, B., Raoult, P., Friedman, T., Crovella, M.: Deployment of an algorithm for large-scale topology discovery. IEEE Journal on Selected Areas in Communications, Sampling the Internet: Techniques and Applications 24(12) (2006) 2210–2220
10. Thaler, D., Hopps, C.: Multipath issues in unicast and multicast next-hop selection. RFC 2991, Internet Engineering Task Force (2000)
11. Augustin, B., Cuvellier, X., Orgogozo, B., Viger, F., Friedman, T., Latapy, M., Magnien, C., Teixeira, R.: Avoiding traceroute anomalies with Paris traceroute. In: Proc. Internet Measurement Conference (IMC) (2006)
12. Matsumoto, M., Nishimura, T.: Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Transactions on Modeling and Computer Simulation 8(1) (1998) 3–30
Robust IP Link Costs for Multilayer Resilience Michael Menth, Matthias Hartmann, and Rüdiger Martin University of Würzburg, Institute of Computer Science, Germany {menth,hartmann,martin}@informatik.uni-wuerzburg.de
Abstract. In this work we optimize administrative link costs of IP networks in such a way that the maximum utilization of all links is as low as possible for a set of considered failure scenarios (e.g., all single link failures). To that aim, we present the new "hill hopping" heuristic with three different variants and compare their computation times and the quality of their results. We adapt the objective function of the heuristic to make the link cost settings robust to single link failures, single node failures, and single link or node failures, and compare the results. In particular, we optimize the routing for multilayer networks where unused backup capacity of the link layer can be reused to redirect traffic on the network layer in case of an IP node failure.
1 Introduction

IP routing is very robust against network failures as it always finds possible paths between two endpoints as long as they are still physically connected. When a failure occurs, traffic is rerouted, which may lead to congestion on the backup paths. In fact, this is the most frequent cause of overload in IP backbones [1] and may violate the quality of service (QoS) in terms of packet loss and delay. In IP networks, traffic is forwarded along least-cost paths whose costs are based on the sum of the administrative costs of their links. The modification of the administrative link costs changes the routing and is thereby a means for traffic engineering. The link costs are usually set to one, which is the hop count metric, proportionally to the link delay, or reciprocally to the link bandwidth. However, for a network with a given topology, link bandwidths, and traffic matrix, the maximum link utilization can be minimized by choosing appropriate link costs; this problem is NP-hard [2]. Therefore, heuristic methods are applied to solve it [3]. In the presence of failures, the overload due to backup traffic may be reduced by influencing the routes through a modification of the link costs. Calculating new costs and uploading them on the routers takes some time, and this is cumbersome because most outages last less than 10 minutes [4]. In addition, when the failed link resumes operation, the new link costs may be suboptimal. A simpler solution is setting the link costs a priori in such a way that they lead to a low utilization of all links both under failure-free conditions and after rerouting for a set of protected failures S. So far, only a few papers [5, 6, 7, 8] addressed this kind of optimization, and they considered only the protection of single link failures.
This work was funded by Siemens AG, Munich, and by the Deutsche Forschungsgemeinschaft (DFG) under grant TR257/23-1. The authors alone are responsible for the content of the paper.
In multilayer networks, connections of lower layers provide links for upper layers. Figure 1 illustrates that a logical IP network link may be implemented by a label switched path (LSP) of an underlying MPLS layer. This LSP contains further intermediate label switching routers (LSRs) not visible on the IP layer. Likewise, links between these LSRs may be implemented by virtual or physical connections of an underlying SONET/SDH or optical layer. Multilayer networks provide rerouting or protection switching capabilities and backup capacity on different layers [9]. This seems to be a waste of resources, but protection on lower layers reacts faster than rerouting on the IP layer. To save bandwidth, it is desirable to share the backup capacity between layers, which is possible between the IP layer and the packet-switched MPLS layer.

Fig. 1. Connections of lower layers provide links for upper layers

The contribution of this paper is manifold. It suggests the new "hill hopping" heuristic for the optimization of resilient IP routing with three different intuitive variants. It presents a new methodology for an empirical comparison of the computation times of the algorithms and the quality of their results. And it goes beyond the protection of single link failures, as node failures are also considered and, in particular, backup capacity sharing between layers in multilayer networks.

Section 2 gives the problem formulation for resilient IP routing and summarizes related work. Section 3 proposes several new heuristics and compares their computation times and the quality of their results. Section 4 adapts the objective function of the heuristics to different protection variants including multilayer protection and illustrates their impact on the bandwidth efficiency. Finally, we summarize this work and draw our conclusions in Section 5.
2 Optimization of IP Routing With and Without Resilience Requirements

In this section, we review fundamentals of IP routing and summarize related work on routing optimization with and without resilience requirements.

2.1 Fundamentals of IP Routing

In IP networks, routers have routing tables that contain one or several next hops for many IP address prefixes. A router forwards an incoming packet by finding the longest prefix in the routing table that matches the packet's destination address and by sending it to one of the corresponding next hops. Thus, IP implements destination-based routing. Single-path routing forwards the traffic only to the interface or next hop with, e.g., the lowest ID, while multipath routing splits the traffic equally among all possible next hops (cf. Sec. 7.2.7 of [10]). The routing tables are usually constructed in a distributed manner by routing protocols like OSPF or IS-IS that exchange information about the current topological structure of the network. A router calculates the next hops to all other routers in the network by using the shortest (or least-cost) path principle to avoid forwarding loops.
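The shortest-path next-hop computation with the ECMP option can be sketched as follows; the adjacency-map graph representation is our own assumption.

import heapq

def ecmp_next_hops(graph, source):
    """graph: {node: {neighbor: link_cost}}. For every destination, return
    the set of next hops out of `source` on least-cost paths."""
    dist = {source: 0}
    hops = {source: set()}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                       # stale heap entry
        for nbr, cost in graph[node].items():
            cand = d + cost
            first = {nbr} if node == source else hops[node]
            if cand < dist.get(nbr, float("inf")):
                dist[nbr], hops[nbr] = cand, set(first)
                heapq.heappush(heap, (cand, nbr))
            elif cand == dist[nbr]:
                hops[nbr] |= first         # equal-cost path: keep all next hops
    return hops

Multipath routing then splits traffic equally among hops[destination], while single-path routing would pick, e.g., the next hop with the lowest ID.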
In particular, sink trees are computed to every destination in the network. In addition, a router knows which node in the network serves as egress router to prefixes outside the network. This combined information is compiled into the routing table. IP routing is very robust against network failures because in case of a failure, the new topological information is exchanged by the routing protocol and routers update their routing tables. This rerouting may take seconds, but new mechanisms for IP fast rerouting are currently being investigated [11]. Single-path routing is the default, but we apply the equal-cost multipath (ECMP) option, which allows multipath routing over all least-cost paths towards the same destination. It makes the routing independent of device numbers and enables fast and local traffic redirection if a next hop fails and several next hops exist [12].

2.2 Problem Formulation

We model a network by a graph G = (V, E) consisting of its set of nodes V and its set of directed links E. Calligraphic letters X denote sets and the operator |X| indicates the cardinality of a set. Each link l ∈ E has a capacity c(l) and is associated with cost k(l). The capacities and the costs of all links are represented in a compact way by the vectors c and k. Note that vectors and matrices are printed boldface and the indexed components of a vector v are denoted by v(i). We work with integer link costs between k_min = 1 and k_max; thus, they are taken from a vector space with (k_max)^|E| elements. A network is resilient to a certain failure scenario s if the rerouted traffic does not lead to congestion. Therefore, resilience always relates to a set of protected failure scenarios S. Each s ∈ S describes a set of non-working network elements. For the sake of simple notation, the working scenario ∅ is part of S. The function u(l, v, w) indicates the percentage of the aggregate from node v to w that is carried over link l. This description models both single- and multipath routing. We extend this routing function to u_s^k(l, v, w) to account for a specific set of link costs k and a certain failure scenario s ∈ S. The traffic matrix D contains the demand rate D(v, w) between any two nodes v, w ∈ V. The utilization ρ(k, l, s) of a link l in a failure scenario s, the maximum utilization ρ^max_S(k, l) of link l over all failure scenarios s ∈ S, and the maximum utilization ρ^max_{S,E}(k) over all links l ∈ E and all failure scenarios s ∈ S are calculated for any link cost vector k by

ρ(k, l, s) = ( Σ_{v,w ∈ V} u_s^k(l, v, w) · D(v, w) ) / c(l)    (1)

ρ^max_S(k, l) = max_{s ∈ S} ρ(k, l, s)    (2)

ρ^max_{S,E}(k) = max_{l ∈ E} ρ^max_S(k, l)    (3)
Note that the calculation of Equations (2) and (3) is quite costly since destination trees need to be calculated by Dijkstra's algorithm for each protected network failure s ∈ S. When our algorithms calculate ρ^max_{S,E}(k′) to test for ρ^max_{S,E}(k′) < ρ^max_{S,E}(k), the calculation of ρ^max_{S,E}(k′) stops as soon as ρ^max_{S,E}(k) has been exceeded, to save computation time. In addition, the failure scenarios can be sorted in such a way that this condition occurs early.
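A sketch of this early-stopping evaluation, assuming a routing oracle utilizations(k, s) that returns the per-link values ρ(k, l, s) of Equation (1); the oracle is our own placeholder.

def max_utilization(k, scenarios, utilizations, cutoff=float("inf")):
    """Evaluate rho^max_{S,E}(k), aborting once `cutoff` (the value of
    the incumbent solution) is exceeded."""
    worst = 0.0
    for s in scenarios:                    # sort hard scenarios first (see text)
        for rho in utilizations(k, s).values():
            worst = max(worst, rho)
            if worst > cutoff:
                return worst               # candidate cannot beat the incumbent
    return worst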
The objective of IP routing optimization is to find a link cost vector k such that the maximum link utilization ρ^max_{S,E}(k) is minimal. If resilience is not required, the set of protected failure scenarios contains only the working scenario, S = {∅}; otherwise it contains, e.g., all single (bidirectional) link failures.

2.3 Related Work

We briefly review existing work regarding the optimization of IP routing with and without resilience requirements.

Optimization of IP Routing without Resilience Requirements. The problem of IP routing optimization without resilience requirements is NP-hard [2]. Some papers try to solve the problem by integer linear programs and branch-and-bound methods. Since the search space is rather large, others prefer fast heuristics and use local search techniques [3], genetic algorithms, simulated annealing, or other heuristics. The papers also differ slightly in their objective functions. In case of traffic hot spots or link failures, link costs may be changed, but this possibly causes service interruptions, such that the number of changed link costs should be kept small [13].

Optimization of IP Routing with Resilience Requirements. Optimization of IP routing becomes even more difficult if different failure scenarios must be taken into account for the minimization of the objective function in Equation (3). It has been proposed independently, and almost at the same time, by [5, 6, 7] for single link failures. The presented algorithms use a local search technique combined with a tabu list or a hash function to mark already visited solutions. To escape from local minima, [5] sets some link weights to random values. To speed up the algorithm, [6] investigates only a random fraction of possible neighboring configurations while [7] applies an additional heuristic to generate a fraction of good neighboring configurations. Finally, [8] accelerates the evaluation of the objective function by considering only a set of critical links instead of the entire set of protected failure scenarios S.
3 New Heuristics for Resilient IP Routing

In this section, we propose new heuristics to find good link costs k for resilient IP routing and compare their computation times and the quality of their results.

3.1 Description of the Algorithms

We apply the well-known hill climbing heuristic and propose the new hill hopping heuristic for resilient IP routing optimization. In addition, we propose three different methods for the generation of random neighbors k_new from a large neighborhood of the current link costs k. These methods are required by hill hopping and can be reused by other heuristic control algorithms.

The Hill Climbing Algorithm. The hill climbing algorithm starts with an initial vector k of current link costs. It first evaluates the maximum link utilization ρ^max_{S,E}(k_new) of all link cost vectors k_new in the close neighborhood of the current
vector k, which consists of all vectors that differ from k by at most 2 in a single link. It chooses the k_new with the best improvement ρ^max_{S,E}(k) − ρ^max_{S,E}(k_new) as successor vector of k. If no such k_new can be found, the algorithm terminates; otherwise, the procedure restarts with the new current vector k.

The Hill Hopping Algorithm. The quality of the results of the hill climbing algorithm suffers from the fact that it terminates when the first local minimum is found. We avoid this drawback by Algorithm 1. Here, the current cost vector k is substituted by k_new if its maximum utilization ρ^max_{S,E}(k_new) is smaller than the one of the currently best link costs k_best multiplied by a factor T ≥ 1. Thus, the maximum utilization of the current link costs can be slightly larger than ρ^max_{S,E}(k_best). The method terminates if n^unsuc_moves new vectors k_new have been explored without finding a better one than k_best. In analogy to the hill climbing algorithm we call this method hill hopping. The current vector k has a high quality. We view this quality as a hill in the multi-dimensional state space. A randomly generated successor k_new can be fairly distant from k, and if it is accepted as the new current vector k, it also represents a quality hill. Thus, this method hops from hill to hill. The design of this algorithm was inspired by the threshold accepting algorithm [14], which is a simplification of the simulated annealing heuristic.

Input: start vector k_start, maximum number of unsuccessful moves n^unsuc_moves, threshold T for accepting new candidates
k ← k_start, k_best ← k_start, n ← 0
while n < n^unsuc_moves do
    k_new ← GenerateRandomNeighbor(k)
    n ← n + 1
    if ρ^max_{S,E}(k_new) ≤ T · ρ^max_{S,E}(k_best) then
        k ← k_new
        if ρ^max_{S,E}(k) < ρ^max_{S,E}(k_best) then
            k_best ← k, n ← 0
        end if
    end if
end while
Output: link costs k_best

Algorithm 1: HillHopping searches for link costs k_best that lead to a low maximum link utilization ρ^max_{S,E}(k_best) in all protected failure scenarios S.
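A direct Python transcription of Algorithm 1, sketched under the assumption that objective(k) evaluates ρ^max_{S,E}(k) (e.g., with the early-stopping routine above) and that generate_random_neighbor implements one of the variants described next.

def hill_hopping(k_start, n_unsuc_moves, T, objective, generate_random_neighbor):
    """Algorithm 1: search for link costs with a low maximum utilization
    over all protected failure scenarios."""
    k, k_best = list(k_start), list(k_start)
    best = objective(k_best)
    n = 0
    while n < n_unsuc_moves:
        k_new = generate_random_neighbor(k)
        n += 1
        value = objective(k_new)
        if value <= T * best:              # accept slightly worse solutions, T >= 1
            k = k_new
            if value < best:
                k_best, best, n = list(k_new), value, 0
    return k_best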
Neighborhood Generation for the Hill Hopping Algorithm. The hill hopping algorithm uses the method GenerateRandomNeighbor for the generation of a new vector k_new in the wide neighborhood of k. We propose three different implementations of that method.

Random Neighborhood Generation RNG(h, d). The random neighborhood generation (RNG) randomly chooses a number h* of links according to a uniform distribution between 1 and h. It changes their costs by adding or subtracting an integral value between 1 and d, but the minimum cost value k_min = 1 and the maximum cost value k_max must be respected as side conditions.
Link Ranking Methods r^k_rel(l) and r^k_abs(l). The following neighborhood generation methods take advantage of the link-specific maximum utilization ρ^max_S(k, l) in Equation (2). The relative rank r^k_rel(l) of a link l is the number of links l′ ∈ E that have a smaller utilization value ρ^max_S(k, l′) than l. Note that several links possibly have the same relative rank. The absolute rank r^k_abs(l) of a link is its relative rank r^k_rel(l) plus the number of links l′ with the same maximum link utilization ρ^max_S(k, l′) but with a lower link ID than l. Both rankings yield numbers between 0 and |E| − 1. In contrast to the relative rank, the absolute rank is a 1:1 mapping.
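Both rankings can be computed directly from the link-specific maximum utilizations, as in the following sketch where util[l] holds ρ^max_S(k, l) for link ID l.

def relative_ranks(util):
    """Number of links with a strictly smaller utilization value."""
    return [sum(u < util[l] for u in util) for l in range(len(util))]

def absolute_ranks(util):
    """Relative rank plus the number of equally utilized links with a
    lower link ID; a 1:1 mapping onto 0 .. |E|-1."""
    rel = relative_ranks(util)
    return [rel[l] + sum(util[j] == util[l] for j in range(l))
            for l in range(len(util))]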
Greedy Neighborhood Generation GNG(h, d). The greedy neighborhood generation chooses a number h* between 1 and h. It then chooses h* links based on a special heuristic, and increases or decreases their costs if they carry high or little load, respectively. The heuristic to select the h* links works as follows. The absolute rank r^k_abs(l) of a link l is associated with one of the |E| equidistant subintervals of (0; 1) in Figure 2. A link is randomly chosen based on the probability density function f(x) = (m + 1) · (2 · x − 1)^m in Figure 2. If the absolute rank r^k_abs(l) of that link l is smaller than (|E| − 1)/2, it has a relatively low maximum utilization value ρ^max_S(k, l). Therefore, its cost is decreased by an integral random variable which is uniformly distributed between 1 and d. Otherwise, it is increased by that value. This is repeated until the costs of h* different links are changed. The GNG rarely changes the cost of links l with a medium maximum link utilization ρ^max_S(k, l), and it only increases (decreases) the costs of links with high (low) ρ^max_S(k, l). This is similar to the heuristic used in [7].
Fig. 2. The greedy neighborhood generation (GNG) chooses h* links randomly according to the displayed probability density function using the absolute link rank r^k_abs and then modifies their costs by a negative or positive offset between 1 and d
Fig. 3. The intelligent neighborhood generation (ING) chooses h* links arbitrarily and then modifies their costs by an offset according to the displayed probability density function that depends on the relative rank r^k_rel(l) of the considered link
Intelligent Neighborhood Generation ING(h, d). Like the RNG, the intelligent neighborhood generation (ING) also chooses h* links arbitrarily to modify their costs by a randomly selected integral offset value between −d and d. This offset is derived from a link-specific triangle distribution whose vertex is determined by the relative rank r^k_rel(l) of the respective link l. This is visualized in Figure 3. In contrast to GNG, the cost of any link has the same chance to be changed and, if so, it can be increased or decreased. Like GNG, ING also favors the increase (decrease) of the costs of links with high (low) maximum link utilization ρ^max_S(k, l), but it has more possible neighboring configurations than GNG.
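A sketch of the ING offset draw using Python's random.triangular; placing the triangle's mode according to the relative rank is our reading of Figure 3, and redrawing zero offsets is our own choice so that a chosen link is actually changed.

import random

def ing_offset(rel_rank, num_links, d):
    """Integral cost offset in [-d, d]; the mode of the triangle moves from
    -d (lowest relative rank) to +d (highest relative rank)."""
    mode = -d + 2.0 * d * rel_rank / max(1, num_links - 1)
    offset = 0
    while offset == 0:                     # redraw until a non-zero change results
        offset = round(random.triangular(-d, d, mode))
    return offset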
3.2 Performance Comparison of the Heuristics

We study the computation time of the algorithms presented above and the quality of their results. The computation time is measured both by the actual computation time of the algorithms and by the number of evaluated link cost vectors k_new; note that there is no linear mapping between these quantities since the calculation of ρ^max_{S,E}(k) may be stopped early when a preliminary result is already too large. The optimization quality is captured by the scale-up factor

θ(k) = ρ^max_{S,E}(1) / ρ^max_{S,E}(k)

which has the following interpretation:
a routing based on link costs k can carry the same traffic matrix scaled by the factor θ(k) to reach the same maximum link utilization ρ^max_{S,E}(1) as the routing based on the standard hop metric k = 1.
Fig. 4. The Labnet03 network consists of 20 nodes and 53 bidirectional links
We use the Labnet03 network in our study with equal link bandwidths together with the population-based traffic matrix from [15] (cf. Figure 4). The ECMP option is used for traffic forwarding; the parameters for the heuristics are set to k_max = 10, n^unsuc_moves = 30000, h = 5, d = 1, and m = 1 (for GNG), and the set of protected failure scenarios S comprises all single link failures.
Computation Time. The heuristics improve the quality of their results incrementally and may take a very long time depending on the termination criterion n^unsuc_moves. Therefore, preliminary results are already available before the program ends, and we take advantage of that fact to compare the average convergence speed of 100 different optimization runs for all heuristics. The start vector k_start can be viewed as the seed for both the deterministic hill climbing algorithm and the stochastic hill hopping algorithm. For hill hopping, we initialize start vectors k_start with random numbers between 1 and k_max, while for hill climbing we use 1 and 2 with equal probability since hill climbing cannot escape from a local optimum. Figures 5(a) and 5(b) illustrate the evolution of the scale-up factor θ(k) depending on the number of evaluated link cost vectors for the hill climbing and the three hill hopping variants, averaged over 100 different runs. They show the scale-up factor for the first 3000 and 30000 evaluations, respectively. Due to the random initialization, the scale-up factor for the hill climbing algorithm is on average below 1 for the first 3000 evaluations, but it achieves good scale-up factors in the end. We observe the first improvement for hill climbing on average after 265 evaluations because of the chosen random initialization. As hill climbing terminates relatively early in a local optimum, the corresponding curve ends at 24000 evaluations; this has been the maximum number of evaluations in 100 runs.
Fig. 5. Evolution of the scale-up factor θ(k) for different heuristics depending on the number of evaluated link cost vectors k_new, averaged over 100 runs: (a) the first 3000 evaluations; (b) the first 30000 evaluations
Hill hopping with GNG leads very quickly to good results, but it is outperformed by all other algorithms in the long run. An analysis of the resulting link costs shows that most of them take either the minimum or the maximum value, i.e., this heuristic is not able to leave certain local optima. Hill hopping with ING also yields good results quite quickly, but RNG produces better link costs after sufficiently many evaluations. Thus, sharpening the search for good candidates in the neighborhood of the current link costs k accelerates the convergence of the scale-up factor, but it also impedes the random discovery of excellent configurations.
Quality of the Results. We run the heuristics repeatedly with different seeds over 24 hours. After each termination of hill hopping, we applied an additional hill climbing to the final result to make sure that the local optimum is found. Table 1 shows that the presented algorithms run with a different frequency because they evaluate a different number of link cost vectors and some of these evaluations are stopped early.

Fig. 6. Ordered scale-up factors from iterative optimization runs within 24 hours

We sort the runs according to ascending scale-up factors and present the time series of their cumulated computation times in Figure 6. It shows that the RNG variant of hill hopping leads to the best results, followed by ING, GNG, and normal hill climbing. Investigating different networks showed that the order of efficiency of the different algorithms remains the same, but the distance between the curves varies. We observed that RNG requires a low maximum link cost k_max to limit the search space whereas ING also works well for large k_max. We also tested other heuristics with similar computational requirements, e.g., the original threshold accepting (TA) algorithm [14] and simulated annealing (SA), but hill hopping leads to the best results. In addition, hill hopping has fewer parameters than TA or SA and is, therefore, simpler to apply.
Table 1. Number of optimization runs within 24 hours with the corresponding number of evaluated link cost vectors

method              #runs  #evals/run  #evals in 24h
hill climbing         358       13719        4911386
hill hopping (GNG)     95       63079        5992505
hill hopping (ING)     58      110752        6423628
hill hopping (RNG)     48       90032        4321527
4 IP Resilience for Multilayer Networks

We first comment on multilayer resilience. Then we discuss various protection variants with different implications on the resource management, which impact the objective function of the optimization problem. Finally, we compare the different protection variants.

4.1 Multilayer Resilience

As mentioned in Section 1, networks have a layered architecture as illustrated in Figure 1. Several layers can provide resilience mechanisms with backup capacity to repair broken paths. Link management or routing protocols trigger their activation, and the temporal coordination of the resilience mechanisms of different layers is an important issue that is solved, e.g., by timers. The reaction time on lower layers must be shorter than on upper layers to avoid unnecessary and repeated reroutes on upper layers. As a consequence, cable cuts are repaired by lower layer protection while the outage of IP routers still requires IP rerouting to reestablish connectivity. As any failure can be repaired on upper layers, multilayer resilience seems a waste of resources. However, lower layer protection mechanisms are faster than IP rerouting since they switch the traffic to preestablished backup paths in case of a failure. Therefore, multilayer resilience is used in practice, but it is desirable to save backup capacity by reusing the backup capacity of the MPLS layer on the IP layer whenever possible.

4.2 Optimization of IP Routing in Multilayer Networks

We now consider different options for multilayer resilience. They differ in reaction speed and the available capacity after rerouting. The failure of IP nodes must be protected by slow IP rerouting. In contrast, IP link failures, which are more likely, can be healed by slow IP rerouting if no link layer protection exists (NoLLP), by fast 1:1 link layer protection (1:1LLP), or by very fast 1+1 link layer protection (1+1LLP). We talk about low, medium, and high service availability (LSA, MSA, HSA) if there is no explicit backup capacity, if the capacity suffices to carry the backup traffic from single link failures, or from single link and router failures, respectively. In the following, we discuss different link protection alternatives with different requirements for service availability.

No Link Layer Protection with Low, Medium, and High Service Availability (NoLLP-LSA, NoLLP-MSA, NoLLP-HSA). As failures are protected only by IP rerouting, the full capacity is available for the IP layer and Equation (3) can be used as objective function for the routing optimization. The service availability impacts the set of
protected failure scenarios such that S contains the failure-free scenario only, all single link failures, or all single link and router failures for NoLLP-LSA, NoLLP-MSA, and NoLLP-HSA, respectively.

Link Layer Protection with Medium Service Availability (LLP-MSA). In the presence of 1+1 or 1:1 link layer protection, IP routing can be optimized for the failure-free case S = {∅} since link failures are completely covered by LLP and node failures do not need to be protected. We assume that backup capacity sharing is not possible and that LLP consumes 50% of the link layer capacity. Therefore, the utilization values of the IP layer capacity are twice as large as in a network with NoLLP if the same link layer capacity is available. To get meaningful comparative results, we change Equation (3) to

ρ^max_{S,E}(k) = 2 · max_{l ∈ E} ( ρ^max_S(k, l) ).    (4)

1:1 Link Layer Protection with High Service Availability (1:1LLP-HSA). With 1:1LLP, the link layer provides a primary link and a backup link to the IP layer. If the primary link fails, the traffic is automatically redirected to the backup link; otherwise, the backup link can carry extra traffic. Thus, only half the capacity can be used for premium traffic in failure-free scenarios. We account for this fact by calculating the utilization of the primary capacity, which is twice the utilization of the overall link capacity. As we protect all single failures in our study, all links work when a router fails, such that the capacity of the backup links can be reused in this case. Thus, the full link capacity is available for rerouting due to node failures. To capture these side conditions, we substitute Equation (3) for the optimization of resilient IP routing by

ρ^max_{S,E}(k) = max( 2 · max_{l ∈ E} ρ(k, l, ∅), max_{(l,s): l ∈ E, s ∈ S ∧ s ≠ ∅} ρ(k, l, s) )    (5)

where the set of protected failure scenarios S comprises all single router failures.
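The variant-specific objective functions of Equations (3)-(5), including the 1+1LLP-HSA case described below, can be selected as in the following sketch; it reuses the utilizations(k, s) oracle of Section 2.2 and represents the failure-free scenario by an empty frozenset, both of which are our own conventions.

def objective(k, variant, scenarios, utilizations):
    """Objective value for one multilayer resilience variant; `scenarios`
    is the protected set S and contains frozenset() for the failure-free case."""
    def rho(s):                            # max utilization over all links in s
        return max(utilizations(k, s).values())
    if variant == "LLP-MSA":
        return 2 * rho(frozenset())        # Eq. (4) with S = {failure-free}
    if variant == "1+1LLP-HSA":
        return 2 * max(rho(s) for s in scenarios)   # Eq. (4), S = router failures
    if variant == "1:1LLP-HSA":            # Eq. (5): utilization is doubled only
        failures = [s for s in scenarios if s]      # in the failure-free case
        return max(2 * rho(frozenset()), max(rho(s) for s in failures))
    return max(rho(s) for s in scenarios)  # Eq. (3) for the NoLLP variants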
1+1 Link Layer Protection with High Service Availability (1+1LLP-HSA). With 1+1LLP, traffic is simultaneously carried over primary and backup paths such that the reaction time is very short if a failure occurs. Hence, the backup capacity can never be reused on the IP layer, and only half of the link layer capacity is available for IP traffic. Therefore, Equation (4) applies instead of Equation (3) for the optimization of resilient IP routing, with S being the set of all single router failures.

Related Aspects. We briefly mention additional issues that have not been taken into account by the above scenarios and may be subjects for further study.

Shared Protection on Lower Layers. The above scenarios assumed that on the lower layer, the capacity of a backup path is fully dedicated to a single primary path. When shared protection is allowed, the same capacity carries backup traffic from different primary paths in different failure scenarios. As a consequence, significantly more than 50% of the link capacity can be used to carry protected IP traffic on the IP layer [15]. The authors of [16] have shown that single link failures can be protected more efficiently by plain WDM protection than by plain IP restoration if backup capacity sharing is allowed for both options.
Shared Risk Groups (SRGs). For simplicity reasons, we consider only single link or router failures. However, multi-failures may also occur due to simultaneous uncorrelated failures or due to correlated failures of so-called shared risk groups (SRGs). The simplest form of a shared risk link group (SRLG) is the failure of a router, which entails the simultaneous failure of all its adjacent links. More complex SRGs occur, e.g., due to the failure of unprotected lower layer equipment. General SRGs can be integrated in our optimization approach by simply including them into the set of protected failure scenarios S, but in practice, the difficulty is mostly the missing knowledge about them.

4.3 Performance Comparison

We compare the bandwidth efficiency of the multilayer resilience scenarios with and without routing optimization. We calculate the maximum link utilization ρ^max_{S,E}(X, k) for the hop count metric (k = 1) and for optimized link costs k_best for each multilayer resilience scenario X. As the maximum utilization value ρ^max_{S,E}(X, k) is the largest for unoptimized routing in the 1+1LLP-HSA scenario, we use ρ^max_{S,E}(1+1LLP-HSA, 1) as the base for the relative scale-up factor

η(X, k_best) = ρ^max_{S,E}(1+1LLP-HSA, 1) / ρ^max_{S,E}(X, k_best).

Table 2 presents results from the Labnet03 network (cf. Figure 4) both for the heterogeneous traffic matrix used above and for a homogeneous traffic matrix. The scale-up factors η(X, 1) and η(X, k_best) illustrate the impact of the multilayer scenario X. They quantify how much more traffic can be carried with X compared to 1+1LLP-HSA. Obviously, most traffic can be transported with NoLLP-LSA, followed by NoLLP-MSA, NoLLP-HSA, LLP-MSA, 1:1LLP-HSA, and 1+1LLP-HSA. In the presence of a heterogeneous traffic matrix, the protection of link and router failures requires more backup capacity than the protection of only link failures when no LLP is used. In contrast, in the presence of the homogeneous traffic matrix, NoLLP-HSA and NoLLP-MSA need about the same backup resources and 1:1LLP-HSA is as efficient as LLP-MSA, i.e., the protection of additional router failures does not cost extra resources. Backup capacity sharing between the link and the network layer makes 1:1LLP-HSA 27%-56% more efficient than 1+1LLP-HSA. Hence, if the reaction time of IP rerouting is not acceptable, 1:1LLP-HSA may be preferred as the more efficient alternative to 1+1LLP-HSA.

Table 2. Scale-up factors for optimized and unoptimized IP routing in different multilayer resilience scenarios

                        w/o IP opt   with IP opt
scenario X              η(X, 1)      η(X, k_best)   θ(X, k_best)
hetero TM:
  NoLLP-LSA             3.13         4.76           1.52
  NoLLP-MSA             2.25         3.41           1.52
  NoLLP-HSA             2.00         2.63           1.32
  LLP-MSA               1.56         2.38           1.52
  1:1LLP-HSA            1.56         1.97           1.26
  1+1LLP-HSA            1.00         1.32           1.32
homo TM:
  NoLLP-LSA             2.54         4.60           1.81
  NoLLP-MSA             1.93         3.27           1.69
  NoLLP-HSA             1.93         3.21           1.66
  LLP-MSA               1.27         2.30           1.81
  1:1LLP-HSA            1.27         2.29           1.80
  1+1LLP-HSA            1.00         1.68           1.68
However, 1+1LLP-HSA reacts faster than 1:1LLP-HSA if links fail and may be applied when very fast resilience is needed. The scale-up factor of Section 3 can alternatively be calculated by θ(X, k_best) = η(X, k_best) / η(X, 1) and shows the benefit of routing optimization for each scenario X; for example, for NoLLP-LSA with the heterogeneous traffic matrix, θ = 4.76 / 3.13 ≈ 1.52. Routing optimization improves the resource efficiency by 26%-52% in case of the heterogeneous traffic matrix and by 66%-81% in case of a homogeneous traffic matrix. It is so powerful that more traffic can be carried with optimized NoLLP-HSA than with unoptimized NoLLP-LSA in case of the heterogeneous traffic matrix. Thus, with routing optimization, resilience can be achieved without additional bandwidth.
5 Summary and Conclusion

As overload in networks is mostly caused by traffic redirected due to network failures [1], administrative IP link costs should be set in such a way that the maximum link utilization ρ^max_{S,E}(k) is low both under failure-free conditions and in likely failure scenarios. We presented the hill climbing and the hill hopping algorithms with different neighborhood generation strategies for the optimization of resilient IP routing. A comparison showed that some of them converge faster, but others lead to better optimization results. The presented methodology for the performance comparison is general and can be applied to other heuristic approaches. Different levels of service availability and multilayer resilience change the side conditions for the optimization because different failures need to be protected and, depending on the technology, some of the physical layer capacity is dedicated to lower layers or can be shared among layers. Our results showed that with routing optimization, 32% to 81% more traffic can be carried in our test network while keeping the same maximum utilization as without routing optimization. The exact values depend on the required level of service availability, the multilayer resilience option, and the traffic matrix. Furthermore, the network itself has a large impact, which has not been documented in this paper. Routing optimization turned out to be so powerful that protection of link and node failures leads in some settings to lower maximum link utilizations than unoptimized routing under failure-free conditions.
References
1. Iyer, S., Bhattacharyya, S., Taft, N., Diot, C.: An Approach to Alleviate Link Overload as Observed on an IP Backbone. In: IEEE Infocom, San Francisco, CA (April 2003)
2. Pióro, M., Szentesi, Á., Harmatos, J., Jüttner, A., Gajowniczek, P., Kozdrowski, S.: On Open Shortest Path First Related Network Optimisation Problems. Performance Evaluation 48 (2002) 201–223
3. Fortz, B., Thorup, M.: Internet Traffic Engineering by Optimizing OSPF Weights. In: IEEE Infocom, Tel-Aviv, Israel (2000) 519–528
4. Iannaccone, G., Chuah, C.N., Bhattacharyya, S., Diot, C.: Feasibility of IP Restoration in a Tier-1 Backbone. IEEE Network Magazine (Special Issue on Protection, Restoration and Disaster Recovery) (March 2004)
5. Fortz, B., Thorup, M.: Robust Optimization of OSPF/IS-IS Weights. In: International Network Optimization Conference (INOC), Paris, France (October 2003) 225–230
6. Yuan, D.: A Bi-Criteria Optimization Approach for Robust OSPF Routing. In: 3rd IEEE Workshop on IP Operations and Management (IPOM), Kansas City, MO (October 2003) 91–98
7. Nucci, A., Schroeder, B., Bhattacharyya, S., Taft, N., Diot, C.: IGP Link Weight Assignment for Transient Link Failures. In: 18th International Teletraffic Congress (ITC), Berlin (September 2003)
8. Sridharan, A., Guerin, R.: Making IGP Routing Robust to Link Failures. In: IFIP-TC6 Networking Conference (Networking), Ontario, Canada (May 2005)
9. Demeester, P., Gryseels, M., Autenrieth, A., Brianza, C., Castagna, L., Signorelli, G., Clemente, R., Ravera, M., Jajszczyk, A., Janukowicz, D., Doorselaere, K.V., Harada, Y.: Resilience in Multilayer Networks. IEEE Communications Magazine (August 1999) 70–76
10. Oran, D.: RFC1142: OSI IS-IS Intra-Domain Routing Protocol (February 1990)
11. Rai, S., Mukherjee, B., Deshpande, O.: IP Resilience within an Autonomous System: Current Approaches, Challenges, and Future Directions. IEEE Communications Magazine (October 2005) 142–149
12. Shand, M., Bryant, S.: IP Fast Reroute Framework. http://www.ietf.org/internet-drafts/draft-ietf-rtgwg-ipfrr-framework-06.txt (October 2006)
13. Fortz, B., Thorup, M.: Optimizing OSPF/IS-IS Weights in a Changing World. IEEE Journal on Selected Areas in Communications 20 (May 2002) 756–767
14. Dueck, G., Scheuer, T.: Threshold Accepting: a General Purpose Optimization Algorithm. Journal of Computational Physics 90 (1990) 161–175
15. Menth, M.: Efficient Admission Control and Routing in Resilient Communication Networks. PhD thesis, University of Würzburg, Faculty of Computer Science, Am Hubland (July 2004)
16. Sahasrabuddhe, L., Ramamurthy, S., Mukherjee, B.: Fault Tolerance in IP-Over-WDM Networking: WDM Protection vs. IP Restoration. IEEE Journal on Selected Areas in Communications (Special Issue on WDM-Based Network Architectures) 20(1) (January 2002) 21–33
Integer SPM: Intelligent Path Selection for Resilient Networks
Rüdiger Martin, Michael Menth, and Ulrich Spörlein
University of Würzburg, Institute of Computer Science, Germany {martin,menth,spoerlein}@informatik.uni-wuerzburg.de
Abstract. The self-protecting multipath (SPM) is a simple and efficient end-to-end protection switching mechanism. It distributes traffic according to a path-failure-specific load balancing function over several disjoint paths and redistributes it if one of these paths fails. SPMs with optimal load balancing functions (oSPMs) are unnecessarily complex because traffic aggregates potentially need to be split, which is an obstacle for the deployment of SPMs in practice. The contribution of this paper is the proposal of an integer SPM (iSPM), i.e., the load balancing functions take only 0/1 values and effectively become path selection functions. In addition, we propose a greedy heuristic to optimize the 0/1 distributions. Finally, we show that the iSPM is only slightly less efficient than the oSPM and that the computation time of the heuristic for the iSPM is clearly shorter than that of the linear program solver for the oSPM, such that the iSPM can be deployed in significantly larger networks.
1 Introduction and Related Work
Carrier grade networks typically require high availability on the order of 99.999%, such that restoration or protection switching is needed. Restoration mechanisms, e.g., shortest path rerouting (SPR) in IP networks, try to find new routes after a network element fails. Such methods are simple and robust [1,2] but also slow [3]. Protection switching pre-establishes backup paths for fast switch-over in failure cases [4]. The classical concept is end-to-end (e2e) protection with primary and backup paths. In case of a failure, the traffic is simply shifted at its path ingress router from the primary to the backup path. The switching is fast, but the signalling of the failure to the ingress router takes time, and traffic already on the way is lost. Therefore, fast reroute (FRR) mechanisms provide backup alternatives not only at the ingress router but at almost every node of the primary path. Fast reroute mechanisms are already in use for MPLS [5,6] and are currently also discussed for IP networks [7,8,9,10]. In this context, the self-protecting multipath (SPM) has been proposed in previous work [11,12] as an e2e protection switching mechanism.
This work was funded by Siemens AG, Munich, and by the Deutsche Forschungsgemeinschaft (DFG) under grant TR257/18-2. The authors alone are responsible for the content of the paper.
Its path layout consists of disjoint parallel paths, and the traffic is distributed over all of them according to a traffic distribution (or load balancing) function (see Figure 1). If a single path fails, the traffic is redistributed over the working paths according to another traffic distribution function. Thus, a specific traffic distribution function $l_d^f$ is required for each demand $d$ and for every pattern $f$ of working and non-working paths. In contrast to the conventional concept, the SPM does not distinguish between dedicated primary and backup paths. Both under failure-free conditions and in case of network failures, the traffic may be spread over several of the disjoint paths. And in contrast to optimum primary and backup paths [13], the SPM performs a traffic shift only if at least one of its disjoint paths is affected by a failure. Thus, the reaction is based on local information, and signalling of remote failures across the network is not required. This is important as the connectivity in such a situation is compromised.
Fig. 1. The SPM distributes the traffic of a demand $d$ over disjoint paths $P_d = (p_d^0, \ldots, p_d^{k_d-1})$ according to a traffic distribution function $l_d^f$ which depends on the pattern $f$ of working and non-working paths
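To make the per-demand path layout and the redistribution idea concrete, the following is a minimal, hypothetical Python sketch of the SPM state for one demand; the class name, the link labels, and the 60/40 split are illustrative assumptions, not taken from the paper.

```python
class SPM:
    def __init__(self, paths, dist):
        # paths: the k disjoint paths of one demand (each a list of links)
        # dist:  one traffic distribution vector per failure pattern f,
        #        where f is a tuple of 0/1 flags (1 = path failed)
        self.paths = paths
        self.dist = dist

    def split(self, f):
        """Return the (path, fraction) pairs used under failure pattern f."""
        return [(p, w) for p, w in zip(self.paths, self.dist[f]) if w > 0]

spm = SPM(
    paths=[["a-b", "b-z"], ["a-c", "c-z"], ["a-d", "d-z"]],
    dist={(0, 0, 0): [0.6, 0.4, 0.0],    # failure-free: 60/40 split
          (1, 0, 0): [0.0, 0.7, 0.3]},   # path 0 failed: its share shifts
)
print(spm.split((1, 0, 0)))
```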
When a network is given with link capacities, a traffic matrix, and the path layout for the disjoint paths of the SPMs, the traffic distribution functions $l_d^f$ can be optimized. Optimization means that the maximum utilization of any link in the network is minimized for a set of protected failure scenarios $S$. Optimum traffic distribution functions $l_d^f$ can be calculated by linear programs (LPs) [14] and may split the demands for transmission over different paths. A comparison with other resilience mechanisms showed that this optimal SPM (oSPM) is very efficient [15] in the sense that it can carry more primary traffic than optimized single shortest path (SSP) and equal-cost multipath (ECMP) IP (re)routing, variants of MPLS FRR, and various e2e protection mechanisms based on the primary and backup path principle, while achieving the same maximum utilization values. However, the oSPM has three major drawbacks. Firstly, optimal traffic distribution functions require that traffic aggregates are potentially split and carried over different paths. Thus, load balancing techniques are needed for the implementation of the SPM, which makes the SPM unnecessarily complex and which is a major obstacle for its deployment. Secondly, the LPs for the optimization of the oSPM become computationally infeasible for large networks. Thirdly, the load balancing techniques required for traffic distribution are problematic due to inaccuracies caused by stochastic effects [16].
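As an illustration of the optimization just described, the sketch below sets up a toy version of such an LP with the open-source PuLP modeler: per-symptom traffic shares are chosen so that the maximum link utilization over all protected scenarios is minimized. The topology, demand, and scenario set are invented, and this simplified formulation (as well as the choice of PuLP/CBC as tooling) is our assumption for illustration, not a reproduction of the LP referenced as [14].

```python
import pulp

cap = {"a-b": 10.0, "b-z": 10.0, "a-c": 10.0, "c-z": 10.0}
demand = {"a->z": 6.0}
paths = {"a->z": [["a-b", "b-z"], ["a-c", "c-z"]]}
scenarios = [frozenset(), frozenset({"a-b"})]   # failure-free + one failure

def symptom(d, s):
    # failure pattern f of demand d: which disjoint paths use a failed link
    return tuple(int(bool(s & set(p))) for p in paths[d])

prob = pulp.LpProblem("oSPM_toy", pulp.LpMinimize)
rho = pulp.LpVariable("rho_max", lowBound=0)
prob += rho                                      # minimize max utilization

x = {}               # x[d, f, i]: share of demand d on path i under symptom f
for di, d in enumerate(demand):
    for f in {symptom(d, s) for s in scenarios}:
        live = [i for i, failed in enumerate(f) if not failed]
        tag = "".join(map(str, f))
        for i in live:
            x[d, f, i] = pulp.LpVariable(f"x_{di}_{tag}_{i}", 0, 1)
        prob += pulp.lpSum(x[d, f, i] for i in live) == 1   # full demand

for s in scenarios:
    for link, c in cap.items():
        if link in s:
            continue
        load = pulp.lpSum(demand[d] * x[d, symptom(d, s), i]
                          for d in demand
                          for i, p in enumerate(paths[d])
                          if (d, symptom(d, s), i) in x and link in p)
        prob += load <= rho * c                  # utilization bound per link

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("rho_max =", pulp.value(rho))
```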
The contribution of this work is the definition of the integer SPM (iSPM) that allows only 0/1 values in the traffic distribution function $l_d^f$. This eliminates the problems induced by fractional load balancing, but thereby the traffic distribution function $l_d^f$ effectively becomes a path selection function. The 0/1 constraints make the optimization more difficult. Therefore, we develop a powerful heuristic for that problem. We show that the iSPM is only slightly less efficient than the oSPM and that the heuristics are much faster than the LPs, such that the iSPM can be applied in significantly larger networks than the oSPM. This paper is organized as follows. Section 2 reviews the superiority of the oSPM over SSP (re)routing in small and medium-size networks and analyzes the values of the optimal traffic distribution functions. Section 3 describes the heuristic for the optimization of the 0/1 traffic distribution functions $l_d^f$ for the iSPM. Section 4 compares the efficiency of oSPM and iSPM, studies the efficiency of the iSPM in large networks, and compares the time for the optimization of the traffic distribution functions for the oSPM and iSPM. Finally, the conclusion in Section 5 summarizes this work.
2 The Optimal Self-protecting Multipath (oSPM)
The configuration of the SPM in existing networks is a two-stage approach. First, the k-shortest paths algorithm from [17] finds a suitable node and link disjoint multipath $P_d$ for each demand $d$. Then, the traffic distribution functions $l_d^f$ must be assigned for all demands $d$ and their respective failure patterns $f$ of working and non-working paths. In this section we briefly review the optimal assignment of the distribution functions $l_d^f$ by linear programs (LPs) [14] and show the superiority of this optimal SPM (oSPM) over single shortest path (SSP) (re)routing in small and medium-size networks.
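For illustration of the first stage, a naive way to obtain link-disjoint paths is to run BFS repeatedly and remove the links of each path found. This greedy sketch is not the k-shortest disjoint paths algorithm of [17] (which may re-route previously found paths to reach optimality); the adjacency list is toy data.

```python
from collections import deque

def bfs_path(adj, src, dst):
    """One shortest path from src to dst as a list of (u, v) links."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            links, v = [], dst
            while prev[v] is not None:
                links.append((prev[v], v))
                v = prev[v]
            return links[::-1]
        for w in adj.get(u, []):
            if w not in prev:
                prev[w] = u
                q.append(w)
    return None

def greedy_disjoint_paths(adj, src, dst, k):
    adj = {u: list(vs) for u, vs in adj.items()}   # work on a copy
    found = []
    for _ in range(k):
        p = bfs_path(adj, src, dst)
        if p is None:
            break
        found.append(p)
        for u, v in p:                             # remove the used links
            adj[u].remove(v)
    return found

adj = {"a": ["b", "c"], "b": ["z"], "c": ["z"], "z": []}
print(greedy_disjoint_paths(adj, "a", "z", 2))
```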
2.1 Measuring and Comparing the Efficiency of Resilience Mechanisms
We perform a parametric study to measure and compare the efficiency of resilience mechanisms. The degree $deg(v)$ of a network node $v$ is the number of its outgoing links. We construct sample networks for which we control the number of nodes $n$ in the range from 10 to 200, the average node degree $deg_{avg} \in \{3, 4, 5, 6\}$, and the maximum deviation of the individual node degree from the average node degree $deg_{max,dev} \in \{1, 2, 3\}$. We use the algorithm of [12] for the construction of these networks since we cannot control these parameters rigidly with the commonly used topology generators [18,19,20,21,22]. We sampled 5 random networks for each combination of network characteristics and tested altogether 1140 different networks. This is a huge amount of data, and for the sake of clarity we restrict our presentation to a representative subset thereof. However, all statements made also hold for the larger data set. We consider the maximum link utilization of a network in all single link and router failure scenarios $s \in S$ and compare it for the optimized oSPM assignment ($\rho_{max}^{oSPM}$) and unoptimized SSP
(re)routing ($\rho_{max}^{SSP}$). We use the unoptimized SSP (re)routing as our comparison baseline since it is the most widely used in today's Internet. A comparison of the oSPM to optimized SSP (re)routing can be found in [15]. We use the protected capacity gain $\gamma_{SSP}^{oSPM} = (\rho_{max}^{SSP} - \rho_{max}^{oSPM}) / \rho_{max}^{oSPM}$ as performance measure to express how much more traffic can be transported by the oSPM than by SSP with the same maximum link utilization. All figures in this paper are based on the assumption of a homogeneous traffic matrix and homogeneous link bandwidths, i.e., the entries of the traffic matrix are all the same and all links of a network have the same bandwidth. This, however, is not a major restriction as the topologies are random.

2.2 Superiority of the oSPM over SSP (Re)Routing
Figure 2 shows the protected capacity gain $\gamma_{SSP}^{oSPM}$ of the oSPM for small to medium-size networks. Each point in the figure stands for the average result of the 5 sample networks with the same characteristics. The shape, the size, and the pattern of the points indicate the characteristics of these networks; the corresponding x-coordinates indicate the average number of disjoint paths $k^*$ that could be found in the networks for the SPM structures. The protected capacity gain increases significantly with an increasing number of disjoint parallel paths $k^*$. More parallel paths increase the traffic distribution over the network and, thus, the capacity sharing potential for different failure scenarios. Networks with the same average node degree $deg_{avg}$ are clustered since there is a strong correlation between $k^*$ and $deg_{avg}$. Finally, large networks lead to a significantly larger protected capacity gain $\gamma_{SSP}^{oSPM}$ than small networks. Ideally, link bandwidths are dimensioned for the expected traffic based on the traffic matrix and the routing. In our study, we have random networks with equal link bandwidths. Thus, there are mismatches between the bandwidth and the traffic rate on the links. As the possibility for strong mismatches increases with the network size, the potential to reduce the maximum link utilization by optimized resiliency methods also increases. Although random networks are not realistic, they help to illustrate how well routing algorithms can exploit the optimization potential.
Fig. 2. Protected capacity gain $\gamma_{SSP}^{oSPM}$ of the oSPM compared to SSP (re)routing for random networks depending on their average number of parallel paths $k^*$
Table 1. Number of traffic distribution functions $l_d^f$ that use a given number of active paths for the COST239 network, and the traffic share of demand $d$ carried over the up to five possible paths in this network, averaged over all traffic distribution functions and failure scenarios

# of active paths                              1     2     3     4    5
Traffic distribution functions $l_d^f$ (%)    60    33   6.5   0.5    0

Path number                                    1     2     3     4    5
Average traffic share of a demand (%)       88.5    10   1.0   0.5    0
2.3 Analysis of the oSPM Traffic Distribution Functions
The analysis of the oSPM traffic distribution functions leads to two observations. First, most traffic distribution functions use only one active path, and very few use more than two at the same time. Second, even if more than one path is active, almost all load is carried by a single active path. We exemplify these observations for the European research network COST239 in Table 1. Its left part shows the percentage of traffic distribution functions $l_d^f$ that effectively use a certain number of active paths. We sort the paths of an SPM in a specific failure scenario $s \in S$ according to the proportion of the traffic they carry and number them. The right part shows the average proportion of the traffic carried by each of the paths. The values in the table show that the optimal traffic distribution functions carry most of the traffic over a single path although more alternatives exist. These observations motivate the key idea to restrict the traffic distribution functions to 0/1 values without significantly losing the increased efficiency of the SPM.
3 The Integer SPM (iSPM)
The integer SPM (iSPM) allows only 0/1 values for the traffic distribution functions $l_d^f$, which makes the optimization even more difficult. This section first clarifies some notation and then presents a greedy heuristic to optimize iSPM configurations.

3.1 Concept and Basic Notation
To formalize the SPM concept, we explain our basic notation, introduce implications of failure scenarios, and describe the concept of path failure specific traffic distribution functions.

General Nomenclature. A network $N = (V, E)$ consists of $n = |V|$ nodes and $m = |E|$ unidirectional links. A single path $p$ between two nodes is a set of distinct contiguous links represented by a link vector $p = (p_0, \ldots, p_{m-1})^T \in \{0, 1\}^m$. If and
only if $p_i = 1$ holds, path $p$ contains link $i$. We denote traffic aggregates between routers $v_i \in V$ and $v_j \in V$ by $d = (i, j)$. The basic structure of an SPM for a traffic aggregate $d$ is a multipath $P_d$ that consists of $k_d$ paths $p_d^i$ for $0 \le i < k_d$ that are link and possibly also node disjoint except for their source and destination nodes. It is represented by a vector of single paths $P_d = (p_d^0, \ldots, p_d^{k_d-1})$.

Implications of Failure Scenarios. A failure scenario $s$ is given by a set of failed links and nodes. The set of protected failure scenarios $S$ contains all outage cases, including the normal working case, for which the SPM should protect the traffic from being lost. The failure indication function $\phi(p, s)$ yields 1 if a path $p$ is affected by a failure scenario $s$; otherwise, it yields 0. The failure symptom of a multipath $P_d$ is the vector $f_d(s) = (\phi(p_d^0, s), \ldots, \phi(p_d^{k_d-1}, s))$ and indicates its failed single paths in case of failure scenario $s$. Thus, with a failure symptom of $f_d = 0$, all paths are working, while for $f_d = 1$ connectivity cannot be maintained. The set of all different failure symptoms for the SPM $P_d$ between $v_i$ and $v_j$ is denoted by $F_d = \{f_d(s) : s \in S\}$.

Traffic Distribution Functions. There is one SPM for each traffic aggregate $d$. This specific SPM has a general traffic distribution function to distribute the traffic over its $k_d$ different paths. While the oSPM implements fractional traffic distribution and can use all working paths in parallel, the iSPM selects only a single path due to the restriction to 0/1 values. Thus, the iSPM uses the traffic distribution function as a path selection function. If certain paths fail, which is indicated by the symptom $f_d(s)$, the traffic distribution function shifts the traffic to one (iSPM) or several (oSPM) of the remaining working paths. Thus, the SPM needs a traffic distribution function $l_d^f$ for each symptom $f \in F_d$ that results from any protected failure scenario $s \in S$. In this work, we take the protection of all single link or node failures into account such that at most one single path of a disjoint SPM multipath fails. This implies $k_d + 1$ different traffic distribution functions $l_d^f$ for every traffic aggregate $d$. Since the general traffic distribution function $l_d^f \in (\mathbb{R}_0^+)^{k_d}$ describes a distribution, it must obey $\mathbf{1} \cdot l_d^f = 1$. Furthermore, failed paths must not be used.
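The failure indication function and the failure symptom translate directly into code; the following hedged sketch uses toy link names and, for brevity, models a scenario as a set of failed links only.

```python
def phi(path, failed):
    """Failure indication phi(p, s): 1 if path p uses a failed link."""
    return int(bool(set(path) & failed))

def failure_symptom(P_d, failed):
    """f_d(s): per-path failure flags of the disjoint multipath P_d."""
    return tuple(phi(p, failed) for p in P_d)

P_d = [["a-b", "b-z"], ["a-c", "c-z"], ["a-d", "d-z"]]
print(failure_symptom(P_d, {"c-z"}))   # -> (0, 1, 0)
```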
3.2 A Greedy Algorithm for Optimizing iSPM Configurations

An iSPM configuration can be described by the set $L = \{l_d^f = (n_0, \ldots, n_{k_d-1})^T : d \in D, f \in F_d, l_d^f \in \{0,1\}^{k_d}, \mathbf{1} \cdot l_d^f = 1\}$ and comprises all traffic distribution functions of the network. A neighboring iSPM configuration $L'$ differs from $L$ by exactly one traffic distribution vector $l_d^f$. In the following, $\rho_{max}^{S,E}(L)$ denotes the global maximum link utilization for an iSPM configuration $L$ over all scenarios $S$ and all links $E$. Opposed to that, the local maximum link utilization for an iSPM configuration $L$ in scenario $s \in S$ and the links of path $p_d^i$ is denoted by $\rho_{max}^{s,E(p_d^i)}(L)$. Since $\{s\} \subseteq S$ and $E(p_d^i) \subseteq E$, the inequality $\rho_{max}^{s,E(p_d^i)}(L) \le \rho_{max}^{S,E}(L)$ holds, i.e., the local value is only a lower bound for the global value.
Require: network N = (V, E), traffic demands D, multipath P_d for each aggregate d ∈ D, and initial traffic distribution functions L
 1: calculate ρ_max^new ← ρ_max^{S,E}(L)
 2: repeat
 3:   ρ_max ← ρ_max^new
 4:   identify scenario s_max ∈ S and link l_max ∈ E where ρ_max^{S,E}(L) is reached
 5:   for all traffic aggregates d carrying traffic over l_max in s_max do
 6:     identify single path p_d^i of multipath P_d with l_max ∈ p_d^i
 7:     for all single paths p_d^j (j ≠ i) of P_d do
 8:       set L(d, j): p_d^j carries demand d in s_max instead of p_d^i
 9:       calculate ρ(d, j) ← ρ_max^{s_max,E(p_d^j)}(L(d, j)) with E(p_d^j) = {l : l ∈ p_d^j}
10:       insert (d, j) into sorted list Q according to ascending ρ(d, j)
11:     end for
12:   end for
13:   repeat
14:     remove first tuple (d, j) from Q
15:     calculate ρ_max^new ← ρ_max^{S,E}(L(d, j))
16:     if ρ_max^new < ρ_max then
17:       L ← L(d, j)
18:     end if
19:   until ρ_max^new < ρ_max ∨ Q = ∅
20: until ρ_max^new ≥ ρ_max
Algorithm 1: Heuristic algorithm for the optimization of the load balancing functions of the iSPM

Algorithm 1 describes the heuristic for the optimization of the iSPM configuration. It follows a greedy approach to keep the computational complexity low. Initially, we choose an iSPM configuration $L$ where every traffic distribution function $l_d^f$ sends the traffic for demand $d \in D$ over a shortest working path for the respective failure pattern $f \in F$. Then, in each traversal of the outer loop (lines 2–20), the algorithm basically chooses a neighboring iSPM configuration $L'$ with a lower maximum link utilization $\rho_{max}^{S,E}(L')$. This is done in two steps. First, we identify the bottleneck link $l_{max}$ and the bottleneck scenario $s_{max}$ (line 4). Then we consider the following neighboring iSPM configurations $L(d, j)$ (lines 5–12). The demand $d$ must be carried by the current configuration $L$ over the bottleneck link $l_{max}$ (line 5), and configuration $L(d, j)$ differs from $L$ only in such a way that $d$ is relocated from the bottleneck path $p_d^i$ containing $l_{max}$ to another path $p_d^j$ within its multipath $P_d$ (line 8). These neighboring iSPM configurations $L(d, j)$ potentially improve the utilization of the bottleneck link in the bottleneck scenario. We assess their quality by the computationally less expensive local maximum utilization value $\rho(d, j) = \rho_{max}^{s_{max},E(p_d^j)}(L(d, j))$ (line 9) and rank them according to this value (line 10). Then, the neighboring iSPM configuration $L(d, j)$ with the best local maximum utilization value $\rho(d, j)$ that also improves the overall maximum utilization value $\rho_{max}^{S,E}(L(d, j))$ is chosen (lines 13–19).
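The following Python sketch mirrors Algorithm 1 under simplifying assumptions: link failures only, a configuration that maps each (demand, symptom) pair to one selected path index, and traffic counted as lost when the selected path fails. All helper names and data structures are illustrative, not the authors' code.

```python
def link_loads(config, demands, paths, s):
    """Link loads in scenario s (a set of failed links)."""
    load = {}
    for d, vol in demands.items():
        f = tuple(int(bool(s & set(p))) for p in paths[d])
        i = config[d, f]
        if f[i]:                    # selected path failed: unprotected
            continue
        for l in paths[d][i]:
            load[l] = load.get(l, 0.0) + vol
    return load

def max_util(config, demands, paths, scenarios, cap, links=None):
    """Maximum utilization and the (scenario, link) where it is reached."""
    best, where = 0.0, (None, None)
    for s in scenarios:
        for l, x in link_loads(config, demands, paths, s).items():
            if (links is None or l in links) and x / cap[l] > best:
                best, where = x / cap[l], (s, l)
    return best, where

def greedy_ispm(config, demands, paths, scenarios, cap):
    config = dict(config)
    rho, (s_max, l_max) = max_util(config, demands, paths, scenarios, cap)
    while l_max is not None:
        cand = []
        for d in demands:                      # lines 5-12 of Algorithm 1
            f = tuple(int(bool(s_max & set(p))) for p in paths[d])
            i = config[d, f]
            if f[i] or l_max not in paths[d][i]:
                continue                       # d does not cross l_max
            for j in range(len(paths[d])):
                if j == i or f[j]:
                    continue
                trial = dict(config)
                trial[d, f] = j
                local, _ = max_util(trial, demands, paths, [s_max], cap,
                                    links=set(paths[d][j]))     # line 9
                cand.append((local, d, f, j))
        for _, d, f, j in sorted(cand, key=lambda t: t[0]):  # lines 13-19
            trial = dict(config)
            trial[d, f] = j
            new_rho, where = max_util(trial, demands, paths, scenarios, cap)
            if new_rho < rho:
                config, rho, (s_max, l_max) = trial, new_rho, where
                break
        else:
            break                                            # line 20
    return config, rho
```

An initial configuration must provide a selected path for every (demand, symptom) pair, e.g., the shortest working path as in the paper's initialization.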
We chose this simple version of our algorithm for presentation because it nicely shows the key concept and because it produced very good results in all our experiments. However, in pathological cases with two independent bottleneck links $l_{max}$ and bottleneck scenarios $s_{max}$, the algorithm might have problems. Such cases require more sophisticated methods that we cannot present here due to lack of space.
4 Results
In this section, we first show that the path selection functions of the iSPM lead to almost the same efficiency as the load balancing functions of the oSPM. Then we compare the empirical computation time for the configuration of the iSPM and the oSPM depending on the network size. Finally, we show the benefit of the iSPM with respect to single shortest path (SSP) (re)routing in large networks.

4.1 Comparison of the Efficiency of iSPM and oSPM in Small and Medium-Size Networks
Figure 3 shows the relative deviation $\Delta_{oSPM}^{iSPM} = (\rho_{max}^{iSPM} - \rho_{max}^{oSPM}) / \rho_{max}^{oSPM}$ of the maximum link utilization of the iSPM ($\rho_{max}^{iSPM}$) from the one of the oSPM ($\rho_{max}^{oSPM}$). Again, each point in the figure stands for the average result of the 5 sample networks with the same characteristics. The figure reveals an obvious trend: the maximum link utilizations $\rho_{max}^{iSPM}$ of the iSPM are larger than those of the oSPM, and the difference increases with an increasing number of parallel paths $k^*$. The iSPM heuristic reaches deviation values of up to 50% for very small networks with $n = 10$ nodes, but for large networks the deviations are rather small. We explain this observation in the following.
Fig. 3. Relative deviation $\Delta_{oSPM}^{iSPM}$ of the maximum link utilization of the iSPM ($\rho_{max}^{iSPM}$) from the one of the oSPM ($\rho_{max}^{oSPM}$)
The number of demands in the network scales quadratically with the number of nodes. Since the iSPM heuristic is restricted to integer solutions, it can shift only entire traffic aggregates to alternate paths, while the oSPM is not restricted to any traffic granularity. In particular, for $n = 10$ nodes this granularity is too coarse for the iSPM to achieve similarly good maximum link utilizations as the oSPM. For networks with at least $n \ge 30$ nodes, the deviations fall below 15%. And for networks with at least $n \ge 15$ nodes and a moderate number of disjoint parallel paths ($2 \le k^* \le 4.5$), the deviation is smaller than 5% compared to the one of the oSPM. Considering the fact that large values of $k^* \approx 5$ are rather unrealistic in real networks, the approximation of the oSPM by the iSPM yields very good results for realistic networks. In addition, the oSPM requires additional bandwidth to compensate load balancing inaccuracies, which is not accounted for in this comparison. As the traffic distribution function of the oSPM effectively degenerates to a path selection function in case of the iSPM, the iSPM cannot distribute the traffic of a single aggregate over different paths. However, we observe that the iSPM is still almost as efficient as the oSPM, and its efficiency also increases with an increasing number of disjoint parallel paths $k^*$. We explain that phenomenon as follows. The $k^*$ disjoint paths serve as local sensors and indicate remote failures. Thus, more paths imply more accurate information about the network health, which leads to a more efficient path selection in failure cases. In addition, more paths also provide more alternatives to reduce the maximum link utilization in Algorithm 1.
4.2 Comparison of the Computation Time for iSPM and oSPM
Figure 4 shows the average computation time of the iSPM heuristic and the oSPM optimization depending on the network size in links and in nodes. For the iSPM, values for network sizes between 10 and 200 nodes are provided while for the oSPM, values are only available for networks of up to 60 nodes because the memory requirements of the LPs exceed the capabilities of our machines for larger networks.
Fig. 4. Average computation time for the optimization of the iSPM and the oSPM
The type of LP solver has a large impact on the computation time for the oSPM. The data presented in Figure 4 stem from our analysis in [14] with the COmputational INfrastructure for Operations Research (COIN-OR) solver [23], which turned out to be the fastest freely available solver for this problem formulation. While the optimization of the oSPM already takes on the order of a day for $n = 60$ nodes, the heuristic runs in clearly less than one hour even for very large networks with $n = 200$ nodes. The computation time of the iSPM heuristic is clearly sub-exponential and dominated neither by the number of nodes nor by the number of links. With an increasing number of nodes, more traffic demands are possible candidates for reallocation to alternative paths in Algorithm 1, while with an increasing number of links, the computation of the global $\rho_{max}^{S,E}$ value becomes more time-intensive.

4.3 Efficiency of the iSPM in Large Networks
While Figure 2 shows the protected capacity gain $\gamma_{SSP}^{oSPM}$ of the oSPM compared to single shortest path (SSP) (re)routing for random networks with 10–60 nodes, Figure 5 shows the gain $\gamma_{SSP}^{iSPM}$ of the iSPM compared to SSP routing for random networks with 10–200 nodes, because the heuristic for the configuration of the iSPM can cope with larger networks than the LP-based optimization for the oSPM. We observed in Figure 2 that the protected capacity gain of the oSPM increases with increasing network size, and this trend continues with the iSPM for larger networks in Figure 5. As a result, the iSPM can carry between 150% and 330% more protected traffic than SSP routing.
Fig. 5. Protected capacity gain $\gamma_{SSP}^{iSPM}$ of the iSPM compared to SSP routing
5 Conclusion
The SPM is a simple end-to-end protection switching mechanism that distributes the traffic of a single demand over several disjoint paths and redistributes it if
one of its disjoint paths fails. Thus, it is basically quite simple, but optimal path failure ($f$) specific traffic distribution functions $l_d^f$ require that traffic aggregates $d$ may be split. This makes the simple mechanism unnecessarily complex, and the accuracy of practical load balancing algorithms suffers from stochastic effects. In addition, the configuration of such optimal SPMs (oSPMs) is a time-consuming process that prevents their deployment in large networks. To eliminate these problems, we suggested in this work the integer SPM (iSPM) that uses only 0/1 traffic distribution functions, which effectively become path selection functions. As the restriction to 0/1 values makes the optimization problem more complex, we proposed a simple greedy heuristic to optimize the configuration of the iSPM such that the maximum link utilization over all protected failure scenarios $S$ is minimized. We showed that the iSPM is only slightly less efficient (< 5%) than the oSPM in medium-size and large networks. Furthermore, the optimization of the configuration takes about one hour for the iSPM in networks with 200 nodes, while it takes about one day for the oSPM in networks with 60 nodes. And finally, the iSPM can carry between 150% and 330% more protected traffic than hop-count-based single shortest path routing in large networks with 160–200 nodes. Altogether, this work brings the SPM a major step forward to deployment in practice.
Acknowledgment. The authors would like to thank Prof. Tran-Gia for the stimulating environment which was a prerequisite for this work.
References
1. Nucci, A., Schroeder, B., Bhattacharyya, S., Taft, N., Diot, C.: IGP Link Weight Assignment for Transient Link Failures. In: 18th International Teletraffic Congress (ITC), Berlin (September 2003)
2. Fortz, B., Thorup, M.: Robust Optimization of OSPF/IS-IS Weights. In: International Network Optimization Conference (INOC), Paris, France (October 2003) 225–230
3. Francois, P., Filsfils, C., Evans, J., Bonaventure, O.: Achieving Sub-Second IGP Convergence in Large IP Networks. ACM SIGCOMM Computer Communications Review 35(2) (July 2005) 35–44
4. Autenrieth, A., Kirstädter, A.: Engineering End-to-End IP Resilience Using Resilience-Differentiated QoS. IEEE Communications Magazine 40(1) (January 2002) 50–57
5. Pan, P., Swallow, G., Atlas, A.: RFC4090: Fast Reroute Extensions to RSVP-TE for LSP Tunnels (May 2005)
6. Saito, H., Yoshida, M.: An Optimal Recovery LSP Assignment Scheme for MPLS Fast Reroute. In: International Telecommunication Network Strategy and Planning Symposium (Networks) (June 2002) 229–234
7. Kvalbein, A., Hansen, A.F., Cicic, T., Gjessing, S., Lysne, O.: Fast IP Network Recovery Using Multiple Routing Configurations. In: IEEE Infocom, Barcelona, Spain (April 2006)
8. Shand, M., Bryant, S.: IP Fast Reroute Framework. http://www.ietf.org/internet-drafts/draft-ietf-rtgwg-ipfrr-framework-06.txt (October 2006)
9. Francois, P., Bonaventure, O.: An Evaluation of IP-Based Fast Reroute Techniques. In: CoNEXT (formerly QoFIS, NGC, MIPS), Toulouse, France (October 2005)
10. Bryant, S., Shand, M., Previdi, S.: IP Fast Reroute Using Not-via Addresses. http://www.ietf.org/internet-drafts/draft-ietf-rtgwg-ipfrr-framework06.txt (October 2006)
11. Menth, M., Reifert, A., Milbrandt, J.: Self-Protecting Multipaths - A Simple and Resource-Efficient Protection Switching Mechanism for MPLS Networks. In: 3rd IFIP-TC6 Networking Conference (Networking), Athens, Greece (May 2004) 526–537
12. Menth, M.: Efficient Admission Control and Routing in Resilient Communication Networks. PhD thesis, University of Würzburg, Faculty of Computer Science, Am Hubland (July 2004)
13. Murakami, K., Kim, H.S.: Optimal Capacity and Flow Assignment for Self-Healing ATM Networks Based on Line and End-to-End Restoration. IEEE/ACM Transactions on Networking 6(2) (April 1998) 207–221
14. Menth, M., Martin, R., Spoerlein, U.: Optimization of the Self-Protecting Multipath for Deployment in Legacy Networks. In: IEEE International Conference on Communications (ICC) (June 2007)
15. Menth, M., Martin, R., Hartmann, M., Spoerlein, U.: Efficiency of Routing and Resilience Mechanisms. Currently under submission (2007)
16. Martin, R., Menth, M., Hemmkeppler, M.: Accuracy and Dynamics of Hash-Based Load Balancing Algorithms for Multipath Internet Routing. In: IEEE International Conference on Broadband Communication, Networks, and Systems (BROADNETS), San Jose, CA, USA (October 2006)
17. Bhandari, R.: Survivable Networks: Algorithms for Diverse Routing. Kluwer Academic Publishers, Norwell, MA, USA (1999)
18. Waxman, B.M.: Routing of Multipoint Connections. IEEE Journal on Selected Areas in Communications 6(9) (1988) 1617–1622
19. Zegura, E.W., Calvert, K.L., Donahoo, M.J.: A Quantitative Comparison of Graph-Based Models for Internet Topology. IEEE/ACM Transactions on Networking 5(6) (1997) 770–783
20. Jin, C., Chen, Q., Jamin, S.: Inet: Internet Topology Generator. Technical Report CSE-TR-433-00, Department of EECS, University of Michigan, USA (2000)
21. Medina, A., Matta, I., Byers, J.: BRITE: An Approach to Universal Topology Generation. In: International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Cincinnati, Ohio, USA (August 2001)
22. Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., Willinger, W.: Network Topology Generators: Degree-Based vs. Structural. In: ACM SIGCOMM (August 2002)
23. Forrest, J.: The COIN-OR Linear Program Solver (CLP). http://www.coin-or.org (2005)
Beyond Centrality - Classifying Topological Significance Using Backup Efficiency and Alternative Paths

Yuval Shavitt and Yaron Singer

Tel Aviv University, Tel Aviv, Israel
[email protected]
Abstract. In networks characterized by broad degree distribution, such as the Internet AS graph, node significance is often associated with its degree or with centrality metrics which relate to its reachability and shortest paths passing through it. Such measures do not consider the availability of efficient backup of the node and thus often fail to capture its contribution to the functionality and resilience of the network operation. In this paper we suggest the Quality of Backup (QoB) and Alternative Path Centrality (APC) measures as complementary methods which enable analysis of node significance in a manner which considers backup. We examine the theoretical significance of these measures and use them to classify nodes in the Internet AS graph while applying the BGP valley-free routing restrictions. We show that neither node degree nor node centrality is necessarily evidence of significance. In particular, some medium degree nodes with medium centrality measure prove to be crucial for efficient routing in the Internet AS graph. Keywords: Internet topology, network analysis, Internet AS graph.
1 Introduction
The topological study of networks appears in a wide spectrum of research areas such as physics [3], biology [13], and computer science [10]. In research of the Internet, node significance classification has received attention in past studies [18,4,5] and was treated in two different contexts: study of the Internet's resilience against attacks and failures [3,12,14,7], and identification of the Internet core nodes as well as significance categorization [18,5]. Both study threads were conducted at the Internet AS level graph. Several attempts have been made in the past to characterize the core of the Internet AS graph. In [18] the most connected node was used as the natural starting point for defining the Internet's core. Other ASes were also classified into four shells and tendrils that hang from the shells, where ASes in shells with a small index are considered more important than ones in higher indices. Further work has dealt with classification of nodes into a few shells of decreasing importance [8,17]. In a recent study [5], k-shell graph decomposition was used to classify nodes by importance into roughly 40 layers of hierarchical significance. The
k-shell classification, based on the node's connectivity, identified over 80 ASes as the Internet core, some of which have medium degrees. Almost the only attempt to rank ASes by a metric other than node degree was made by CAIDA [1], where the 'cone' of the node, namely the number of direct and indirect customers of the AS, was used to determine its importance. As network functionality is often measured by connectivity and vertex distances in the graph used as its model, measures which credit vertices connected to a relatively large number of vertices at relatively short distances are often used as significance indicators [9,3]. However, inadequate consideration of backup by such measures often overshadows significance in the context of a node's contribution to the functionality and resilience of the network. The existence of backup raises questions regarding a node's significance, since failure of a node with backup affects neither connectivity nor path lengths in the network, and therefore the effect of failure in such instances is minimal. Furthermore, the existence of backup denies exclusivity of the information passing through the node in the network. Since nodes can have backups of various qualities, measures of backup efficiency and of topological significance which consider backup are crucial for the analysis of network functionality. In this paper, we suggest two complementary measures which capture a node's contribution to the network's functionality: the Quality of Backup (QoB) and the Alternative Path Centrality (APC). The QoB measures the backup quality of a vertex regardless of its centrality or effect on the functionality of the network, enables comparison of backup efficiency between vertices in the graph as well as between vertices from different graphs, and can thus serve as a universal measure for backup. The APC measures functionality which considers both backup quality and centrality of vertices in graphs and therefore enables analysis of nodes' significance in a wider context in comparison to other topological measures. Our starting point for the node significance classification problem is examination on levels of theoretical abstraction, followed by evaluation of our results on the Internet AS graph. Since failure of a node on the AS level is possible [7] though highly unlikely, applying APC and QoB on the Internet AS graph allows a unique insight into the Internet, as opposed to quantifying effects of failures. On the AS level, centrality which considers backup reveals significance in the context of potential information which exclusively passes through a node, and its backup quantifies the dependency of its customers on its transit services. In our study we use APC to identify the most significant nodes in the Internet AS graph, and show that these are not necessarily members of the Internet core. In accordance with the properties of APC, it is not surprising that the largest ASes in the core, such as UUNET and Sprint, also have very high APC values due to their large number of customer ASes. However, small networks with poor backup like the French research network RENATER, and the GEANT and Abilene academic backbones, which have degrees as low as 51 (RENATER) and low centrality values, have very high APC values as well. The rest of this paper is organized as follows. The next section discusses the concept of backup in networks and introduces the QoB as a measure of universal
backup efficiency. Section 3 details the APC construction and discusses its properties. In Section 4, we discuss modifications of our methods in order to maintain relevance in the Internet AS graph model. Section 5 holds our analysis of the Internet AS graph using the modified measures in comparison to previous works. Finally, we summarize and discuss our work in Section 6.
2 Quantifying Backup Efficiency
In our attempt to quantify backup in networks, we observed that the quality of backup of a given vertex in the graph is determined by the number of direct children covered by a set of backup vertices, and by the efficiency of reaching this set from the set of direct parents. For a vertex $v \in V$, we define the set of children $C_v$, the set of parents $P_v$, and the backup set $B_v$ as follows: $C_v = \{u \in V \mid (v, u) \in E\}$, $P_v = \{u \in V \mid (u, v) \in E\}$, $B_v = \{w \in V \mid \exists u \in C_v : (w, u) \in E\}$. Clearly, in instances of undirected graphs $C_v \equiv P_v$, and the discussion which follows remains relevant for these instances as well. For $u, w \in V$, we use $\delta(u, w)$ to denote the shortest path distance between $u$ and $w$ in $G$. The shortest distance $\delta(u, w)$ can be calculated by any set of rules, e.g., based on additional annotations on the graph edges, and is not limited to minimum hop. By convention, if $u$ cannot reach $w$ through any path in $G$, then $\delta(u, w) = \infty$. We also use $\delta_v(u, w)$ to represent the distance of the shortest path from $u$ to $w$ which bypasses $v$.

Let $G = (V, E)$ be a directed or undirected graph, where $V$ is the set of vertices and $E$ is the set of edges. For $v \in V$, let $P_v$ be the set of $v$'s direct parents and let $C_v$ be the set of $v$'s direct children. The Quality of Backup of $v$, denoted $\rho(v)$, is:

$$\rho(v) = \frac{\sum_{u \in P_v} \sum_{w \in C_v} (\max\{\delta_v(u, w) - 1,\ 1\})^{-1}}{|P_v| \cdot |C_v|}$$

The rationale behind this measure is the following. To measure the backup efficiency of a given vertex, it is enough to examine the cost of re-routing paths from its set of parents to its set of direct children. Note that $\max\{\delta_v(u, w) - 1, 1\} = \delta_v(u, w) - \delta(u, w) + 1$ for all pairs $(u, w)$, where $u \in P_v$ and $w \in C_v$. For $v \in V$, it is easy to see that $\rho(v) = 1 \iff \forall (x, u) \in P_v \times C_v\ \exists w \in B_v : (x, w) \in E \wedge (w, u) \in E$. Also, note that $\rho(v) = 0 \iff \delta_v(x, u) = \infty\ \forall (x, u) \in P_v \times C_v$. Thus, $\rho : V \longrightarrow [0, 1]$, and it returns 1 for vertices with perfect backups and 0 for vertices with no backup. A formal implementation of QoB on unweighted directed graphs is presented below. Here, $bfs_v$ denotes the bfs algorithm which bypasses a vertex $v$, and $\bar{\delta}_v(u)$ denotes the vector of shortest path distances from $u$ to all the vertices in the graph, which bypass $v$.
QoB(v, G)
  ρ ← 0
  for all u ∈ P_v do
    δ̄_v(u) ← bfs_v(u, G)
    for all w ∈ C_v do
      ρ ← ρ + (max{δ_v(u, w) − 1, 1})^(−1)
  ρ ← ρ / (|P_v| · |C_v|)
  return ρ
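A runnable Python counterpart of this pseudocode, for an unweighted directed graph given as an adjacency list, may look as follows; bfs_bypass plays the role of bfs_v, and the graph at the end is a toy example (our assumption) in which b is a perfect backup for v.

```python
from collections import deque

def bfs_bypass(adj, src, skip):
    """Shortest-path distances from src with vertex `skip` removed."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj.get(u, []):
            if w != skip and w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def qob(adj, v):
    parents = [u for u in adj if v in adj[u]]
    children = adj.get(v, [])
    if not parents or not children:
        return None                # rho(v) requires non-empty P_v and C_v
    rho = 0.0
    for u in parents:
        dist = bfs_bypass(adj, u, v)
        for w in children:
            if w in dist:          # unreachable pairs contribute 0
                rho += 1.0 / max(dist[w] - 1, 1)
    return rho / (len(parents) * len(children))

adj = {"u": ["v", "b"], "v": ["w"], "b": ["w"], "w": []}
print(qob(adj, "v"))               # 1.0: a perfect backup exists via b
```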
The following theorem shows that the QoB measure indeed enables local measurement of a vertex's backup in the graph.

Theorem 1. For $G = (V, E)$ and a vertex $v \in V$ with $P_v \neq \emptyset$ and $C_v \neq \emptyset$, $\rho(v)$ monotonically increases with respect to a rise in backup efficiency.

Proof. For $u \in P_v$ and $w \in C_v$, assume that $\delta_v(u, w) = c$ in $G$, where $1 < c \le \infty$. Construct $G'$ by adding some edge $e \notin E$ such that $\delta'_v(u, w) = c' < c$, where $\delta'_v(u, w)$ represents the distance between $u$ and $w$ bypassing $v$ in $G'$. Therefore, $\frac{1}{\delta'_v(u,w)} > \frac{1}{\delta_v(u,w)}$, and $(\max\{\delta'_v(u, w) - 1, 1\})^{-1} \ge (\max\{\delta_v(u, w) - 1, 1\})^{-1}$ (where equality holds only when $c = 2$). It thus easily follows that $\rho'(v) > \rho(v)$, where $\rho'(v)$ is the QoB measure of $v$ in $G'$.
3 Alternative Path Centrality
The above section discusses backup efficiency of a vertex regardless of centrality considerations. In an attempt to quantify significance, note that centrality of a node (its ability to reach a relatively large number of nodes efficiently) also plays a vital role in the analysis: a node which has relatively efficient backup can be crucial to the network's functionality due to its high centrality, while a node with poor backup and low centrality can have little effect on functionality in the network. The Alternative Path Centrality (APC) measure presented in this section enables quantifying the topological contribution of a node to the functionality of the network, as it considers both centrality and backup efficiency. Given a graph $G = (V, E)$ as above and $u \in V$, the topological centrality measure used here, denoted $\chi$, where $\chi : V \longrightarrow \mathbb{R}$, is:

$$\chi(u) = \sum_{w \in V \setminus \{u\}} \frac{1}{\delta(u, w)}$$

Clearly, $0 \le \chi(u) \le |V| - 1$ for all $u \in V$.
For a vertex $u \in V$, the value of $\chi(u)$ depends on the number of vertices connected to $u$ and their distances from it; $\chi$ monotonically increases with respect to both centrality and connectivity of the vertex. Thus, in relation to other vertices in the graph, high $\chi$ values are obtained for a vertex which is connected to a large
number of vertices at short distances. Symmetrically, a vertex connected to a small number of vertices at large distances yields low $\chi$ values. These properties make the $\chi$ function a favorite candidate for measuring vertices' centrality in the network. In [11], for a network $G$, the average of $\chi$ values was used to define the efficiency of the network. Similar topological measures have also been used in [3] and in [9] to study functionality in complex networks. For $G = (V, E)$, the APC value of $v \in V$, denoted $\varphi(v)$, is:

$$\varphi(v) = \sum_{u \in V \setminus \{v\}} \left( \chi(u) - \chi_v(u) \right)$$
where $\chi_v$ denotes centrality values calculated in the graph using alternative paths which bypass $v$. The rationale behind APC is simple. In instances where network functionality is determined by shortest paths and connectivity, the significance of a node $v$ to the network's functionality can be measured by its effect on these criteria. Computing the difference between the vertices' topological centrality using $v$ and their topological centrality bypassing $v$ exposes $v$'s exclusive contribution to the network's functionality. The algorithm presented below is a simple implementation of APC using the Breadth First Search (bfs) algorithm for unweighted directed graphs.

APC(v, G)
  ϕ ← 0
  for all u ∈ V\{v} do
    δ̄(u) ← bfs(u, G)
    δ̄_v(u) ← bfs_v(u, G)
    χ_Δ ← 0
    for all w ∈ V\{v, u} do
      χ_Δ ← χ_Δ + (1/δ(u, w) − 1/δ_v(u, w))
    ϕ ← ϕ + χ_Δ
  return ϕ
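A matching runnable sketch of this algorithm, again for a toy unweighted directed graph, is given below; distances() is the bfs of the pseudocode with an optional bypass vertex, and the code assumes every vertex appears as a key of the adjacency list.

```python
from collections import deque

def distances(adj, src, skip=None):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj.get(u, []):
            if w != skip and w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def apc(adj, v):
    phi = 0.0
    for u in adj:
        if u == v:
            continue
        d = distances(adj, u)              # delta(u, .)
        d_v = distances(adj, u, skip=v)    # delta_v(u, .)
        for w in adj:
            if w in (u, v):
                continue
            # unreachable vertices contribute 1/infinity = 0
            phi += (1.0 / d[w] if w in d else 0.0) \
                 - (1.0 / d_v[w] if w in d_v else 0.0)
    return phi

adj = {"u": ["v", "b"], "v": ["w"], "b": ["w"], "w": []}
print(apc(adj, "v"))   # 0.0: v adds nothing beyond its perfect backup b
```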
Using the bfs algorithm, the overall computational complexity of APC is $O(|V| \cdot |E|)$. For weighted graphs, one can substitute the bfs algorithm with a single-source shortest path algorithm for non-negatively weighted graphs, such as Dijkstra's algorithm [6], and achieve $O(|V| \cdot (|V| \log |V| + |E|))$ running time. We conclude our discussion of the APC properties with the following theorem, which shows that APC properly considers both centrality and backup of a vertex in the graph.

Theorem 2. For $G = (V, E)$ and $v \in V$ with $C_v \neq \emptyset$, $\varphi(v)$ monotonically increases with respect to a rise in topological centrality and a decrement in backup quality.

Proof. To prove that $\varphi(v)$ monotonically increases with respect to a rise in centrality, let $\chi(v) < |V| - 1$, and let $w \in V$ be a vertex for which $1 < \delta(v, w) \le \infty$. Let $e \notin E$ be some edge for which $\delta'(v, w) < \delta(v, w)$, where $\delta'(v, w)$ denotes the shortest
path distance in $G' = (V, E \cup \{e\})$, and $e$ does not create new alternative paths to $w$ in $G'$ (otherwise backup efficiency increases). We show that $\varphi(v) < \varphi'(v)$, where $\varphi'(v)$ denotes the APC value of $v$ in $G'$. For all $x \in V$ which reach $w$ through $v$, $\delta'(x, w) < \delta(x, w)$ and $\delta'_v(x, w) = \delta_v(x, w)$. For all such vertices $x$ we have:

$$\frac{1}{\delta'(x, w)} - \frac{1}{\delta_v(x, w)} > \frac{1}{\delta(x, w)} - \frac{1}{\delta_v(x, w)}$$

and it therefore follows that $\varphi(v) < \varphi'(v)$. To prove monotonic increase with respect to a decrement in backup quality, assume that some edge $e \notin E$ has been added to $G$ such that $\rho(v)$ increases. We again denote $G' = (V, E \cup \{e\})$ and use similar notation as above. We therefore assume $\rho'(v) > \rho(v)$. Specifically, there is some pair $(u, w) \in P_v \times C_v$ such that $\delta'_v(u, w) < \delta_v(u, w)$. For this pair we have:

$$\frac{1}{\delta'(u, w)} - \frac{1}{\delta'_v(u, w)} < \frac{1}{\delta(u, w)} - \frac{1}{\delta_v(u, w)}$$

It trivially follows that $\varphi'(v) < \varphi(v)$, which concludes the proof of the theorem.
4 Adaptation of APC and QoB for the Directed AS Graph
To apply QoB and APC on the Internet, we have adjusted these measures to conform to the model of the AS graph and specifically to the routing restrictions which it imposes. We begin with a brief description of the AS graph model.

The Internet AS Graph. The Internet today consists of tens of thousands of networks, each with its own administrative management, called Autonomous Systems (ASes). Each such AS uses an interior routing protocol (such as OSPF or RIP) inside its managed network, and communicates with neighboring ASes using an exterior routing protocol, called BGP. The graph which models the inter-connection between ASes in the Internet is referred to as the Internet AS graph. Since the ASes in the Internet are bound by commercial agreements, restrictions are imposed on the paths which may be explored. The commercial agreements between the ASes are characterized by customer-provider, provider-customer, and peer-to-peer relations. A customer pays its provider for transit services; thus, the provider transits all packets to and from its customers. The customer, however, will not transit packets for its provider. Specifically, a customer will not transit packets between two of its providers, or between its provider and its peers. Peers are two ASes that agree to provide transit information between their respective customers. In pioneering work, Lixin Gao [8] deduced that a legal AS path may either be an up hill path followed by a down hill path, or an up hill path followed by a peering link followed by a down hill path. An up hill path is a sequential set, possibly empty, of customer-provider links, and a down hill path is a sequential
set, possibly empty, of provider-customer links. Therefore, a legal route between ASes can be described as a valley-free path. A peering link can be traversed only once in each such path, and if it exists in the path it marks the turning point for a down hill path.

The ASQoB and ASAPC Measures. Since transitivity is not immediate in the AS graph, the QoB requires two cardinal adjustments to maintain relevance. Consider the AS graph $G = (V, E)$ and some $v \in V$ for which we wish to obtain $\rho(v)$ in $G$. Let $u \in P_v$ and $w \in C_v$. The first adjustment is to consider the pair $(u, w)$ if and only if $u$ can reach $w$ through $v$ using a legal AS path. Since the bfs algorithm does not consider the up, down, and peer labels, it does not exclusively discover valley-free paths, and it cannot be used to measure minimum-hop distances in the AS graph. For this, we use the asbfs algorithm [16], which discovers valley-free shortest paths from a source vertex in the unweighted AS graph in linear time. In order to motivate the second adjustment required, we present the following example. Consider the graph illustrated in Fig. 1. In quantifying the QoB of $v \in V$, suppose a vertex $u \in P_v$ has reached a vertex $w \in C_v$ through an up hill path through $v$, though by using the alternative path through the vertex $b \in B_v$, $u$ now reaches $w$ through a down hill path. All vertices in $C_w$ which are reached through an up hill path ($x$ in this example) are now unreachable to $u$, as this would create an illegal AS path. Therefore, to factor this into the QoB measure in the AS graph, we use the following strategy. For all vertices $w \in C_v$ we scan for vertices $x \in C_w$ which are reachable from $v$ through legal AS paths, and consider the pairs $(u, x) \in P_v \times C_w$ as well. The ASQoB algorithm is described below. We denote by $R_{uv}$ the set of reachable children of $v$ from $u$ in accordance with policy-based routing in the AS graph.
ASQoB(v, G)
  ρ ← 0
  for all u ∈ P_v do
    δ̄_v(u) ← asbfs_v(u, G)
    for all w ∈ R_uv do
      ρ ← ρ + (max{δ_v(u, w) − 1, 1})^(−1)
      for all x ∈ R_vw do
        ρ ← ρ + (max{δ_v(u, x) − 1, 1})^(−1)
  ρ ← ρ / Σ_{u ∈ P_v} (|R_uv| + Σ_{w ∈ R_uv} |R_vw|)
  return ρ
Drawing its strength from the properties of the QoB measure, ASQoB remains faithful to the principles of measuring backup efficiency in the AS graph. For $v \in V$, as reachable children are scanned in two levels, we are guaranteed that $\rho(v) = 1$ if and only if $v$ has a perfect backup which does not disqualify legal AS paths.
Fig. 1. Illustration of an instance in an AS graph where a direct child can be reached through a backup vertex, though its paths cannot be used. The direction of an edge implies it is an up edge, and for each up edge a down edge in the opposite direction exists (not portrayed). Here, b serves as a backup for v. In accordance with the valley-free restrictions, u can reach w, though it cannot reach x through b.
Substituting the bfs algorithm with its analogue for the AS graph, asbfs, makes applying APC on the AS graph immediate. The calculation of a shortest path, $\delta$, is done while considering the valley-free routing restrictions, and all the properties discussed in Section 3 hold.
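For illustration, the valley-free rule that asbfs must respect can be stated as a small predicate over the edge types along a candidate path; the function below is a hedged sketch that validates a given path, and it is not the asbfs algorithm of [16].

```python
# Edge types along a candidate AS path: 'up' = customer->provider,
# 'down' = provider->customer, 'peer' = peering link. A legal path is an
# up hill part, at most one peering link, then a down hill part.
def valley_free(edge_types):
    state = "up"                  # phases: up -> (peer) -> down
    peers = 0
    for t in edge_types:
        if t == "up":
            if state != "up":
                return False      # climbing again after the turning point
        elif t == "peer":
            peers += 1
            if state != "up" or peers > 1:
                return False
            state = "down"        # the peering link is the turning point
        elif t == "down":
            state = "down"
        else:
            raise ValueError(t)
    return True

print(valley_free(["up", "peer", "down"]))   # True
print(valley_free(["down", "up"]))           # False: a valley
```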
5 Analyzing the Directed AS Graph
We used the combined data from the DIMES [15] and RouteViews [2] projects for week 11 of 2006. The AS graph is comprised of 20,103 ASes and 57,272 AS links. We approximate the AS relationships by comparing the k-core indices [5] of two ASes and taking the one with the higher k-core index as the provider of the other. If the k-core indices of two ASes are equal, the ASes are treated as peers. While we are aware that our approximation involves some inaccuracies, there is no known error-free algorithm for this task. Since the majority of the interesting ASes are within the range of AS numbers 1–22,000, we present results for these 11,407 ASes along with results for the ASes with degree higher than 40 in the rest of the AS graph. We first show that while centrality is closely related to the node degree in the AS graph, our APC criterion captures significance which is not necessarily associated with high degree. Fig. 2 shows the centrality values of AS nodes averaged by their degree on a log-log scale. There is an almost monotonic increase in centrality for nodes of degree above 300, and the close relationship between centrality and degree is evident. On the other hand, Fig. 3 shows there is a clear monotonic (and fairly linear in the log-log scale) increase in the average APC value from degree 3 up to around 40; above this value the number of nodes with the same degree is below 10, therefore any one 'outlier', namely a node with extremely high or low APC values, can change the average significantly. To display the relationship between high centrality and high APC, we plot the degree and APC values of the nodes with the highest centrality (Fig. 5) and the degree and centrality of the nodes with the highest APC values (Fig. 4).
Fig. 2. Average centrality as a function of node degree

Fig. 3. Average APC as a function of node degree
The five ASes with the highest degree, 701 (UUNET), 7018 (AT&T), 1239 (Sprint), 3356 (Level3), and 174 (Cogent), are also the five ASes with the highest centrality. These are the largest tier-1 providers. In contrast, only UUNET is in the top ten APC list, mainly due to its high number of peer ASes; Sprint and Cogent also have high APC values. These three tier-1 providers support many stub ASes but have relatively low backup measures (0.7–0.75), which explains their high APC values. Level3, which has high centrality, has a low APC value because it has a rather high QoB of around 0.82. This means that although Level3 (3356) plays a central role in Internet routing, it may be replaced through alternative routes and thus is not as important as the previous three nodes. The next nodes with high centrality are 3549 (GBLX), 2914 (Verio), 7132 (SBC), 6461 (Abovenet), and 12956 (Telefonica). These are all tier-1 providers or major providers in Europe. For the nodes with the highest APC values the picture is different: while UUNET (701) has the fourth largest APC value, many of the high locations in the list are captured by medium-sized ASes with poor (and sometimes extremely poor) backup. The nodes ranked first, third, and eighth in the top APC list are educational networks: GEANT (20965) in Europe, ENA (11686) in the USA, and RENATER (2200) in France (Abilene, the US research network, was ranked eleventh). The other group of nodes consists of medium-size providers, France Telecom (3215), YIPES (6517), Ukraine Telecom (6849), and ServerCentral (23352), each of which appears to have a high APC value due to a different reason. France Telecom, YIPES, and Ukraine Telecom have extremely low QoB, while ServerCentral connects remote locations that may not have efficient alternative paths. Statistics of the nodes with the highest APC values are displayed in Table 1. Fig. 6 shows the distribution of the APC values in the AS graph (note the truncation of the first column). The APC distribution has a long but narrow tail with only a few nodes with very high APC values; these nodes are
Fig. 4. The degree and centrality of the nodes with the highest APC values

Fig. 5. The degree and APC of the nodes with the highest centrality values

Fig. 6. A histogram of the APC values for nodes of degree greater than 2. The first bin holds 4402 ASes and was truncated

Fig. 7. A histogram of the backup values for nodes of degree greater than 2
scattered almost over the entire degree range, starting with nodes with degrees of just above 50 (see Fig. 4 and Table 1). The QoB distribution shown in Fig. 7 has a large concentration around 1, which is evidence of perfect backup. The median value is 0.9799, and as the histogram shows, a large majority of the nodes have QoB values above 0.95. To discuss our results in comparison to other measures of node significance, we refer to Table 2, which shows the top ten nodes in the CAIDA ranking [1], based on the number of customers a node has. The list is dominated by high degree nodes; the two medium degree nodes in the list also have rather high APC values; in general, all the nodes have relatively high APC values, and eight of them are in the top 38 of the APC list. All the nodes in the list have poor QoB values, possibly due to relatively large stub ASes connecting to them. It is evident that the centrality of the nodes in the CAIDA list is much larger than in our APC list.
Table 1. Statistics of the AS nodes with the highest APC values

AS No.  degree  cent.  QoB   APC
20965       74   1190  0.79  26628
10910      205    385  0.59  16298
11686      187   3389  0.92  16042
  701     2616   7956  0.72  14276
 3215      115    422  0.80  13493
 6517      175    474  0.83  12851
 6849      186    472  0.56  12765
 2200       51    347  0.50  12549
12859       79   1017  0.94  12396
23352       71   2113  0.94  12065

Table 2. Statistics of the AS nodes with the highest CAIDA significance rankings

AS No.  degree  cent.  QoB   APC
 3356     1784   7690  0.82   7559
  209     1272   5381  0.72   6113
 7018     2354   7992  0.74  11448
 1239     2020   8022  0.74  10604
  701     2616   7956  0.72  14276
 3561      708   5762  0.79   2579
  174     1483   7144  0.76   8797
  703      216   1441  0.86  10539
19262      188    905  0.75  10763
  702      680   5672  0.77   2101
While all the nodes identified as important in the CAIDA list have high APC values, the opposite analogy does not apply: several of the nodes in our top ten list are ranked below 200 in the CAIDA list.
6
Conclusion
We have shed light on the contribution of backup efficiency to the node significance classification problem. Given our theoretical analysis, we believe this contribution has merit for the classification of network nodes in fields outside the data networking domain as well. We are aware that our results are not accurate for several reasons. First, as we stated in the main text, our AS relationship approximation is not accurate. Second, although we used the most detailed Internet map available through the DIMES project, the graph itself is still missing many links, which can affect the calculation of all the measures as well as the AS relationship deduction. In the future we intend to broaden this research to study the effect of node failure at the Point of Presence (PoP) level, as well as to study the relationships of sets of nodes in the AS graph in the context of backup and functionality. On the theoretical level, we intend to study the robustness of the APC and QoB measures to measurement errors, as well as to carry out further formal analysis of their properties.
References
1. CAIDA AS ranking. http://as-rank.caida.org/.
2. University of Oregon RouteViews project. http://www.antc.uoregon.edu/routeviews/.
3. Réka Albert, Hawoong Jeong, and Albert-László Barabási. Error and attack tolerance of complex networks. Nature, (406):378–382, 2000.
4. Sagy Bar, Mira Gonen, and Avishai Wool. An incremental super-linear preferential internet topology model. In PAM '04, Antibes Juan-les-Pins, France, April 2004.
5. Shai Carmi, Shlomo Havlin, Scott Kirkpatrick, Yuval Shavitt, and Eran Shir. Medusa: New model of Internet topology using k-shell decomposition. Technical report, arXiv, January 2006.
6. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. 2001.
7. Danny Dolev, Sugih Jamin, Osnat Mokryn, and Yuval Shavitt. Internet resiliency to attacks and failures under BGP policy routing. Computer Networks, 50(16):3183–3196, November 2006.
8. Lixin Gao. On inferring autonomous system relationships in the Internet. IEEE/ACM Transactions on Networking, 9(6):733–745, December 2001.
9. H. Jeong, S. P. Mason, A. L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411:41–42, 2001.
10. Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. In SODA, pages 668–677, 1998.
11. Vito Latora and Massimo Marchiori. Efficient behavior of small-world networks. Physical Review Letters, 87(19):198701, 2001.
12. Damien Magoni. Tearing down the internet. IEEE Journal on Selected Areas in Communications, 21(6):949–960, August 2003.
13. Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, D. Chklovski, and Uri Alon. Network motifs: simple building blocks of complex networks. Science, 298:824–827, 2002.
14. Seung-Taek Park, Alexy Khrabrov, David M. Pennock, Steve Lawrence, C. Lee Giles, and Lyle H. Ungar. Static and dynamic analysis of the internet's susceptibility to faults and attacks. In IEEE INFOCOM 2003, San Francisco, CA, USA, April 2003.
15. Yuval Shavitt and Eran Shir. DIMES: Let the internet measure itself. ACM SIGCOMM Computer Communication Review, 35(5), October 2005.
16. Yuval Shavitt and Yaron Singer. A linear time shortest paths algorithm for the Internet AS graph. Tel Aviv University Technical Report, (EE102), 2007.
17. Lakshminarayanan Subramanian, Sharad Agarwal, Jennifer Rexford, and Randy H. Katz. Characterizing the Internet hierarchy from multiple vantage points. In IEEE INFOCOM 2002, New York, NY, USA, April 2002.
18. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos. A simple conceptual model for the Internet topology. In Global Internet, November 2001.
Incorporating Protection Mechanisms in the Dynamic Multi-layer Routing Schemes
Anna Urra, Eusebi Calle Ortega, Jose L. Marzo, and Pere Vila
Institute of Informatics and Applications (IIiA), University of Girona, 17071 Girona, Spain
Abstract. In next generation backbone networks, IP/MPLS over optical networks, the ability to maintain an acceptable level of reliability has become crucial, since a failure can result in a loss of several terabits of data per second. Although routing schemes with protection exist, they generally relate to a single switching layer: either wavelength or packet switching oriented. This paper presents a new dynamic and multi-layer routing scheme with protection that considers cooperation between the IP/MPLS and optical switching domains. A complete set of experiments shows that the proposed scheme is more efficient when compared to routing algorithms with full optical protection or full IP/MPLS protection.
1
Introduction
The use of optical technology in core networks combined with the IP/Multi-Protocol Label Switching (MPLS) [1] solution has been presented as a suitable choice for the next generation Internet architecture. The integration of both layers is facilitated by the development of Generalized MPLS (GMPLS) [2]. In this network architecture, a single fiber failure can result in potentially huge data losses as the effects propagate up and through the network, causing disruptions in the service of many applications. Thus, survivability has become a key issue in improving and satisfying the increasing reliability and Quality of Service (QoS) requirements of these applications. Fault recovery schemes have been adopted in the network in order to provide such survivability. These schemes are based on switching the traffic affected by the failure to a backup path. The computation of the working and backup paths is a crucial step in offering the required QoS to the traffic services. Some relevant parameters, such as resource consumption and recovery time, could be affected negatively if suitable routing algorithms are not used. According to the timing of backup path computation, recovery mechanisms are classified into protection and restoration [3]. Although restoration is flexible in terms of resource consumption, it results in longer recovery times, and the recovery
This work was supported by the Spanish Research Council (TEC2006-03883/TCM) and the Generalitat de Catalunya through grant SGR-00296. The work of A. Urra was also supported by the Ministry of Universities, Research and Information Society (DURSI) and the European Social Funds.
action may not be successful because of insufficient network resources. Protection refers to recovery schemes in which both spare capacity and backup paths are pre-planned, achieving the shortest recovery time and providing high availability against network failures. The accuracy and performance of routing algorithms with protection, in terms of resource consumption, depend on the available network information. The availability of full or partial network information influences the management of the network capacity [4]. The reduction of the recovery time is another parameter to be considered for backup path selection, and it is achieved by applying segment or local backup path methods instead of path protection [5]. Nowadays, different QoS routing algorithms exist that consider protection mechanisms, full/partial network information and local/segment backups [4,6,7,8]. However, these routing schemes operate in a single switching layer: either optical and wavelength oriented, or IP/MPLS and Label Switched Path (LSP) oriented. Thus, the optical and IP/MPLS layers independently deploy their own fault recovery methods. This results in protection duplication, making fault management more difficult and resource utilization poor. Two network scenarios may be considered in order to improve network management and resources: 1) the static multi-layer network scenario or 2) the dynamic multi-layer network scenario. In the static multi-layer network scenario [9,10], the logical topology defined by the optical layer is given, fixed and partially protected. Some of the logical links are assumed to be already protected at the optical layer. Thereby, at the IP/MPLS layer, spare capacity is reserved to protect only those logical links that are unprotected. In the dynamic multi-layer network scenario, interoperability between the IP/MPLS and optical switching domains is considered. Although effort has been devoted to developing dynamic multi-layer routing schemes that consider both switching domains [11], protection is not considered in them. In this paper, a dynamic cooperation between the wavelength and LSP domains is taken into account in order to provide protected paths cost-effectively.
2
Multi-layer Architecture Overview
In the multi-layer architecture, Label Switched Paths (LSPs) are routed in the optical network through lightpaths. For better utilization of the network resources, LSPs should be efficiently multiplexed into lightpaths, and these lightpaths should then be demultiplexed into LSPs at some router. This procedure of multiplexing/demultiplexing and switching LSPs onto/from lightpaths is called traffic grooming. Traffic grooming is an important issue for next generation optical networks. Photonic multi-layer routers have the technology to implement traffic grooming [11]. Each consists of a number of Packet-Switching Capable (PSC) ports (p) and a number of wavelengths (w). The number of PSC ports indicates how many lightpaths can be demultiplexed at this router, whereas the number of wavelengths corresponds to the number of wavelengths connected to the same adjacent router. Three scenarios are associated with p, according to the following switch architectures [12]:
– Single-hop grooming: p = 0. Using this type of switching architecture, the network does not offer packet switching capability at intermediate nodes. Thus, traffic from a source node is multiplexed onto a direct lightpath to the destination node. In this case, either backup lightpaths at the optical domain or global backup LSPs (path protection) at the IP/MPLS domain are established to protect the connections.
– Multihop partial grooming: 0 < p < w. In this case, some wavelengths may be demultiplexed at the intermediate nodes for switching at finer granularity. Therefore, some LSPs will be able to perform segment/local protection.
– Multihop full grooming: p = w. Every wavelength on each fiber link forms a lightpath between adjacent node pairs. Thus, the logical topology is predetermined and exactly the same as the physical topology. All the IP/MPLS protection strategies, i.e. global, segment and local, are suitable for all LSPs.
Note that, although the PSC ports at intermediate nodes allow packet segment/local protection to be performed, the number of optical-electrical-optical (o-e-o) conversions increases. Thus, the cost of o-e-o conversions must be considered during the path computation, because they represent a bottleneck to network throughput and also influence the overall delay. The granularity of the recovery strategy is also an important parameter in terms of recovery time and fault management. Diverse switching granularity levels exist in the optical IP/MPLS network scenario. Going from coarser to finer, there is fiber, wavelength (lightpath) and LSP switching. The level of recovery at the optical layer is a bundle of lightpaths or individual lightpaths. Since recovering at the optical layer recovers the affected connections as a group, the recovery action is fast and easier to manage than recovering each affected LSP individually in the IP/MPLS layer. However, the coarser the granularity, the higher the resource consumption; the finer IP/MPLS granularity results in better resource consumption.
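The three regimes can be read directly off the pair (p, w). A minimal sketch of this classification (the function name and return labels are ours, not from the paper):

    def grooming_mode(p: int, w: int) -> str:
        """Classify a photonic multi-layer router by its number of
        PSC ports (p) relative to its wavelengths (w) [12]."""
        if p == 0:
            return "single-hop"          # no packet switching at intermediate nodes
        if p < w:
            return "multihop-partial"    # some lightpaths can be demultiplexed
        return "multihop-full"           # logical topology == physical topology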
3
Problem Statement
In this section we discuss the basis of our proposed routing scheme. A tradeoff exists between the resource consumption and the cost added to the network in terms of recovery time, failure management and node technology. Better use of network resources is achieved by recovering at the IP/MPLS layer, due to its finer switching granularity. However, the recovery actions at the optical domain are much faster and easier to manage, since the affected connections are recovered as a group. Therefore, a cooperation between both layers seems to be the solution in order to take advantage of each switching domain. The proposal presented in this paper is a first-order approach that takes into account the dynamic multi-layer network scenario. This proposal is based on the establishment of link-disjoint lightpath/LSP pairs: the lightpath/LSP and the backup lightpath/LSP. When a failure occurs at a lightpath, the traffic is switched to the respective backup lightpath. If no backup lightpath exists, the
traffic is switched to the respective backup LSPs. The main objective is to take advantage of both switching domains.
3.1
Network Definition
Let $G_P = (V, E_P)$ and $G_L = (V, E_L)$ represent the physical topology and the logical topology respectively, where $V$ is the set of photonic MPLS routers, and $E_P$ and $E_L$ are the sets of network physical links and lightpaths respectively. Each router has $p$ input and output PSC ports, of which the $PSCi(u)$ input ports and $PSCo(u)$ output ports of node $u$ are not yet assigned to any lightpath. Each physical link has $w$ wavelengths. When an LSP is requested, the proposed routing scheme considers both physical links and lightpaths, i.e. $E_P \cup E_L$. In order to uniquely identify the physical links and existing lightpaths that connect a node pair $(i, j)$, the 3-tuple $(i, j, k)$ is used. Thus, the link $(i, j, k)$ is a physical link if $k = 0$; otherwise ($k > 0$) it is a lightpath. Each $(i, j, k)$ lightpath has an associated residual capacity $R_{ijk}$; $S_{ijk}^{uv}$, the total capacity reserved to protect the physical link $(u, v, 0)$; and $T_{ijk}$, the total shared capacity allocated on link $(i, j, k)$. LSP requests are defined by $(s, d, b)$, where $(s, d)$ is the source and destination node pair and $b$ specifies the amount of capacity required for this request. For each request, a working LSP (WP) has to be set up. A backup LSP (BP) must also be set up whenever the WP has at least one unprotected lightpath. If there are not sufficient resources in the network for either the WP or the BP, the request is rejected.
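A minimal sketch of how this network model might be held in code (the names are ours; the paper prescribes no data structures):

    from dataclasses import dataclass, field

    @dataclass
    class Link:
        i: int            # tail node
        j: int            # head node
        k: int            # k == 0: physical link; k > 0: lightpath id
        residual: float   # R_ijk, residual capacity
        protect: dict = field(default_factory=dict)  # S_ijk^{uv}: (u, v) -> reserved capacity
        shared: float = 0.0                          # T_ijk, total shared capacity

    @dataclass
    class Request:
        s: int      # source node
        d: int      # destination node
        b: float    # capacity demanded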
3.2
Lightpath and LSP Computation
In the proposed scheme, a new procedure to compute the WP is presented. In this procedure the following cost parameters are taken into account:
1. The residual capacity of the link candidates, $R_{ijk}$.
2. The maximum number of hops, $H$, i.e. the maximum number of lightpaths that the WP may traverse.
3. The free packet-switching ports of each router, $PSCi$ and $PSCo$.
Note that the residual capacity of a physical link with free wavelengths is the capacity of the wavelength. The proposed procedure, called the Dynamic Multi-Layer Routing (DMR) algorithm (Algorithm 1), computes the min-hop WP based on a variation of the Dijkstra algorithm. In this case, the number of hops coincides with the number of lightpaths; thus, a consecutive sequence of physical links that constitutes a lightpath is counted as only one hop. The DMR procedure uses the network graph composed of lightpaths and physical links, i.e. $G = (V, E_P \cup E_L)$. The procedure ends when it reaches the destination node or when there is no feasible path between the source and destination nodes. If a feasible path exists, then the procedure may return:
1. A sequence of existing protected lightpaths.
2. A sequence of physical links. In this case, a new unprotected lightpath is set up between the source and destination nodes.
Fig. 1. Working p-LSP computation. Creation of a new unprotected lightpath using the physical links (5,4) and (4,1).
3. A sequence of physical links and of protected and unprotected lightpaths. In this case, a new unprotected lightpath is set up for each consecutive sequence of physical links, as shown in Fig. 1. In this example, a new unprotected lightpath is set up with the physical links (5,4) and (4,1).
Algorithm 1. Dynamic Multi-Layer Routing
    for all v ∈ V do
        Cost(v) = ∞
        Pred(v) = s
        WPlast(v) = s
    Cost(s) = 0
    Q ← s
    while (d ∉ Q and Q ≠ ∅) do
        u ← min_cost(Q)
        Q = Q − {u}
        for all v ∈ adjacency(u, G) do
            for all (u, v, k) ∈ E do
                if ($R_{uvk}$ ≥ b) and ((k = WPlast(u) = 0) or (Cost(u) + 1 < Cost(v) and Cost(u) + 1 ≤ H)) then
                    if (PSCi(v) > 0 and WPlast(u) > 0 and k = 0) or (PSCo(v) > 0 and k > 0 and WPlast(u) = 0) or (WPlast(u) > 0 and k > 0) or (k = WPlast(u) = 0) then
                        Pred(v) = u
                        WPlast(v) = k
                        Q ← v
                        if not (k = WPlast(u) = 0) then
                            Cost(v) = Cost(u) + 1
In the DMR algorithm (Alg. 1), Cost(v) is a vector containing the path cost from s to v; Pred(v) contains v's predecessor node; and WPlast(v) contains the identifier k of the link (u, v) used to reach v. Q represents the list of adjacent vertices that have not been visited yet. The function min_cost(Q) returns the element u ∈ Q with the lowest Cost(u), and adjacency(u, G) is the adjacency list of vertex u in graph G.
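A runnable sketch of Algorithm 1 follows. It is our transliteration, not the authors' code: we read the relaxation test as "Cost(u) + 1 < Cost(v) and Cost(u) + 1 ≤ H", we only relax a node when its cost strictly improves (which guarantees termination), and we propagate the cost unchanged along consecutive physical links so that a chain of physical links (one new lightpath) counts as a single hop.

    from math import inf

    def dmr(nodes, adj, psc_i, psc_o, s, d, b, H):
        """Min-hop working-path search in G = (V, E_P U E_L); one hop = one
        lightpath. adj[u]: list of (v, k, residual) edges, where k == 0 is a
        free physical link (segment of a candidate new lightpath) and k > 0
        an existing lightpath. Returns (pred, wp_last) to trace the WP."""
        cost = {v: inf for v in nodes}
        pred = {v: None for v in nodes}
        wp_last = {v: 0 for v in nodes}   # channel id of the edge used to reach v
        cost[s] = 0
        queue = {s}
        while queue and d not in queue:
            u = min(queue, key=lambda n: cost[n])   # min_cost(Q)
            queue.remove(u)
            for v, k, residual in adj.get(u, ()):
                if residual < b:
                    continue
                extends = (k == 0 and wp_last[u] == 0)  # same new lightpath: no extra hop
                new_cost = cost[u] if extends else cost[u] + 1
                if new_cost >= cost[v] or new_cost > H:
                    continue
                # PSC port feasibility at the packet/optical boundaries.
                if not (extends
                        or (wp_last[u] > 0 and k > 0)
                        or (psc_i[v] > 0 and wp_last[u] > 0 and k == 0)
                        or (psc_o[v] > 0 and k > 0 and wp_last[u] == 0)):
                    continue
                cost[v], pred[v], wp_last[v] = new_cost, u, k
                queue.add(v)
        return pred, wp_last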
3.3
Backup Lightpath and LSP Computation
Once the WP is known, the BP is computed. Three different procedures can be applied, depending on the WP characteristics:
Step 1. If the WP is a sequence of existing protected lightpaths, the computation of the BP is not required.
Step 2. If the WP is a new unprotected lightpath and an available, shareable backup lightpath exists, the latter is used to protect the lightpath. Otherwise, a new backup lightpath is set up by applying the DMR algorithm (Algorithm 1) with $G = (V, E_P)$. If the procedure fails to find a backup lightpath, go to Step 3.
Step 3. If the WP is a combination of protected and unprotected lightpaths, then a variation of the Partial Disjoint Path (PDP) algorithm [9] is used to compute the BP. The variations are the same ones applied to the Dijkstra algorithm in the DMR algorithm, in order to account for the packet-switching ports. The PDP may overlap with the protected lightpaths of the WP, since they are already protected, and with the nodes of the WP. Therefore, no extra resources are necessary in the IP/MPLS layer against failures of lightpaths protected in the optical layer. When the BP overlaps the WP, more than one segment backup path is established.
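The three steps amount to a small dispatch over the composition of the WP. A sketch of that control flow, in which every helper (is_sequence_of_protected_lightpaths, is_new_unprotected_lightpath, find_shareable_backup_lightpath, dmr_over_physical_links, pdp_variant) is a hypothetical stand-in for machinery described in the text:

    def compute_bp(wp, request):
        """wp: the working path returned by DMR; all helpers are illustrative."""
        if is_sequence_of_protected_lightpaths(wp):
            return None                              # Step 1: no BP required
        if is_new_unprotected_lightpath(wp):         # Step 2
            bp = find_shareable_backup_lightpath(wp)
            if bp is None:
                bp = dmr_over_physical_links(request)    # DMR on G = (V, E_P)
            if bp is not None:
                return bp
        return pdp_variant(wp, request)              # Step 3: partial disjoint path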
4
Performance Evaluation
4.1
Restorable Routing Algorithms
Our proposed Dynamic Multi-layer Routing scheme with Protection (DMP) is evaluated. DMP computes the WP using the DMR algorithm (Alg. 1) and the BP according to the criteria presented in Section 3.3. In order to compare the merits of the new routing scheme, the following algorithms, based on the Oki policies [11], are also considered:
– Policy 1 with Protection (P1P). Routing policy 1 first tries to allocate the LSPs to an existing lightpath. If a lightpath is not available, then a sequence of existing lightpaths with two or more hops that connects the source and destination nodes is selected. Otherwise, a new one-hop lightpath is established. When a new lightpath is created, a backup lightpath is also set up.
– Policy 2 with Protection (P2P). Routing policy 2 first tries to allocate the LSPs to an existing lightpath. If the lightpath is not available, then a new one-hop lightpath is established and selected as the new LSP. Otherwise, a sequence of existing lightpaths with two or more hops is selected. As in the case of the P1P algorithm, a backup lightpath is set up whenever a new lightpath is created.
If P1P and P2P fail to find a feasible LSP or backup lightpath, then the request is rejected. As shown in Table 1, the Full Routing Information (FIR) algorithm [4] is also considered, in order to evaluate the performance of the new routing scheme
Table 1. Routing schemes for multi-layer protection evaluation
Routing scheme  Working path   Backup path        Protection domain               Switching architecture
DMP             DMR (Alg. 1)   DMR (Sec. 3.3)     IP/MPLS and optical protection  Multihop partial grooming
P1P             Policy 1       Backup lightpaths  Optical protection              Multihop partial grooming
P2P             Policy 2       Backup lightpaths  Optical protection              Multihop partial grooming
FIR             WSP            FIR                IP/MPLS protection              Multihop full grooming
when only IP/MPLS protection is applied. In this case, the Widest Shortest Path (WSP) algorithm is used to compute the WP.
4.2
Simulation Results
For this set of simulations, the request rejection ratio and the network resource consumption are analyzed according to the following parameters:
– H: the maximum number of lightpaths that an LSP may traverse. The number of hops is an important parameter, since it cuts down the number of o-e-o conversions.
– p: the number of PSC ports per node.
– w: the number of wavelengths per fiber.
Note that the FIR scheme is simulated under multihop full grooming; its performance is therefore independent of p and w. The NSFNET topology described in [11] is used. It consists of 14 nodes and 21 physical links. Each physical link is bi-directional, with the same number of wavelengths in each direction. The transmission speed of each wavelength is set to 10 Gbps. The number of PSC ports p is the same in each node. Figure 2 shows the performance of the proposed scheme, DMP, compared to 1) optical oriented routing algorithms with protection, P1P and P2P, and 2) an IP/MPLS oriented routing algorithm with protection, FIR. Results show that the proposed DMP outperforms the P1P and P2P schemes because of its finer granularity. P2P is practically independent of the number of hops because of the first-create procedure used to compute the LSP; hence, most of the LSPs have a low number of hops. However, each lightpath may traverse several physical links, consuming a large number of wavelengths. On the other hand, FIR presents a sharp increase in the request rejection ratio below H = 6, because there are not many disjoint paths with six or fewer hops and, consequently, many requests are rejected for H < 6. The next two results show the influence of the number of PSC ports per node for all routing algorithms when H = 4 and H = 6 (see Fig. 3). FIR operates under multihop full grooming (p = w); its results are nevertheless shown in order to present the IP/MPLS bound of the solution in terms of capacity when H = 6.
Fig. 2. Number of hops analysis (p = 10)
Fig. 3. Number of PSC ports per node analysis when a) H = 4 and b) H = 6
Again, the DMP scheme results in better use of the network resources compared to P1P and P2P. When p is small, the rejections are due to the few available PSC ports, and optical protection is applied for all schemes. Figure 4 shows the influence of the number of wavelengths per fiber when p = 10 and H = 4 or H = 6. As shown, the number of rejected requests increases linearly for FIR when H = 6. Moreover, since P2P prioritizes lightpaths that directly connect the source and destination nodes, it outperforms P1P when w > 24. P2P also offers better performance than DMP when H = 4 for w > 24. Note that the behavior of DMP and FIR changes sharply according to the maximum number of hops (see Fig. 2), while that of P1P and P2P does not. From these results, it can be concluded that the DMP algorithm reduces the number of rejected requests thanks to the finer recovery granularity at the IP/MPLS domain. Additionally, DMP outperforms the FIR algorithm when the number of o-e-o conversions is limited (small H). Moreover, when H ≥ 6, DMP only outperforms FIR when the network nodes have a high number of PSC ports. In terms of resource consumption, the total number of lightpaths and backup lightpaths established is evaluated first, in Fig. 5. Only the case of H = 4 and w = 18 is plotted, for clarity, since the behavior of all the schemes is similar in all cases in terms of network resources. Figure 5a shows the total number of lightpaths created. Since FIR operates under multihop full grooming, each
Fig. 4. Number of wavelengths per fiber analysis when p = 10 and a) H = 4, b) H = 6
wavelength is seen as a lightpath. Knowing that 1) the number of links of the NSF network is 21, 2) there is a bi-directional fiber per link and 3) each fiber has 18 wavelengths, the total number of lightpaths in the network for FIR is 21 · 2 · 18 = 756. This number is an upper bound on the maximum number of lightpaths that may be established. In the DMP scheme, when the number of PSC ports increases, the number of new lightpaths increases slightly. On the other hand, the number of new lightpaths increases sharply from PSC = 3 to PSC = 10 for the P1P and P2P algorithms. The number of PSC ports has a higher impact on the P1P and P2P schemes because of the full optical protection they apply. This is shown in Fig. 5b, where the curve of new backup lightpaths behaves similarly to that of new lightpaths for P1P and P2P. However, although P1P creates a similar number of new lightpaths to P2P, it creates fewer new backup lightpaths, since the P1P scheme shares a higher number of backup lightpaths. In the case of the DMP scheme, few lightpaths are optically protected, because most of the failures are recovered at the IP/MPLS domain. Figure 6 analyzes the average number of hops of the LSPs. P2P results in a low average number of lightpaths per LSP, since it gives priority to creating new lightpaths for each request (see Fig. 6). On the other hand, the rest of the algorithms offer an average of two lightpaths per LSP. Taking into account that H = 4, LSPs may
Fig. 5. Total number of a) lightpaths and b) backup lightpaths for H = 4 and w = 18
Fig. 6. Average of a) physical links per λ-LSP b) λ-LSPs per p-LSP and c) physical links per p-LSP, for H = 4 and w = 18
traverse up to 4 lightpaths; thus, the theoretical average number is $(1+2+3+4)/4 = 2.5$. Thereby, the new LSPs usually have fewer than 4 lightpaths when the P1P, FIR and DMP algorithms are applied. Note that the best algorithm in terms of hops is P2P, as it requires a low number of packet-switching operations. However, it suffers from a high request rejection ratio.
5
Conclusion
In this paper a novel routing scheme has been proposed: the Dynamic Multi-layer routing with Protection (DMP) scheme. The DMP scheme considers a dynamic cooperation between the packet and wavelength switching domains in order to minimize the resource consumption. Results have shown that FIR and DMP are the best schemes in terms of network resources. The use of IP/MPLS recovery mechanisms with finer granularity results in better use of the capacity and fewer rejected requests compared to P1P and P2P, which apply protection at the optical domain. Moreover, when the number of o-e-o conversions is limited (small H), the proposed scheme outperforms the FIR scheme, which only considers IP/MPLS recovery. Thus, DMP should be chosen to compute new lightpaths/LSPs and their backups, reducing the number of o-e-o operations and making efficient use of the network resources.
References
1. E. Rosen, A. Viswanathan and R. Callon: Multiprotocol label switching architecture, IETF RFC 3031, (2001).
2. E. Mannie: Generalized Multi-Protocol Label Switching (GMPLS) architecture, IETF RFC 3945, (2004).
3. V. Sharma and F. Hellstrand: Framework for Multi-Protocol Label Switching (MPLS)-based recovery, IETF RFC 3469, (2003).
4. G. Li, D. Wang, C. Kalmanek and R. Doverspike: Efficient distributed path selection for shared restoration connections, IEEE Infocom, (2002) 140–149.
5. J. L. Marzo, E. Calle, C. Scoglio and T. Anjali: QoS on-line routing and MPLS multilevel protection: a survey, IEEE Commun. Mag. 41, (2003) 126–132.
6. P.-H. Ho, J. Tapolcai and H. T. Mouftah: On achieving optimal survivable routing for shared protection in survivable next-generation Internet, IEEE Trans. Reliab. 53, (2004) 216–225.
7. D. Xu, Y. Xiong and C. Qiao: Novel algorithms for shared segment protection, IEEE J. Sel. Areas Commun. 21, (2003) 1320–1331.
8. K. Kar, M. Kodialam and T. V. Lakshman: Routing restorable bandwidth guaranteed connections using maximum 2-route flows, IEEE Infocom, (2002) 772–781.
9. A. Urra, E. Calle and J. L. Marzo: Reliable services with fast protection in IP/MPLS over optical networks, Journal of Optical Networking 5, (2006) 870–880.
10. A. Urra, E. Calle and J. L. Marzo: Enhanced multi-layer protection in multi-service GMPLS networks, IEEE Globecom, (2005) 286–290.
11. E. Oki, K. Shiomoto, D. Shimazaki, N. Yamanaka, W. Imajuku and Y. Takigawa: Dynamic multilayer routing schemes in GMPLS-based IP+Optical networks, IEEE Commun. Mag. 43, (2005) 108–114.
12. K. Zhu, H. Zang and B. Mukherjee: A comprehensive study on next-generation optical grooming switches, IEEE J. Sel. Areas Commun. 21, (2003) 1173–1186.
Accelerated Packet Placement Architecture for Parallel Shared Memory Routers
Brad Matthews 1, Itamar Elhanany 1, and Vahid Tabatabaee 2
1 Department of Electrical and Computer Engineering, The University of Tennessee
2 Institute for Advanced Computer Studies, The University of Maryland
Abstract. Parallel shared memory (PSM) routers represent an architectural approach for addressing the high memory bandwidth requirements dictated by output-queued switches. A fundamental related challenge pertains to the design of the high-speed memory management algorithm that is responsible for placing arriving packets into non-conflicting memories. In previous work, we have extended PSM results by introducing the concept of Fabric on a Chip (FoC), which advocates the consolidation of core packet switching functions on a single chip. This paper further develops the underlying technology for high-capacity FoC designs by incorporating a speedup factor coupled with a multiple packet placement process. This yields a substantial reduction in the overall memory requirements, paving the way for the implementation of large-scale FoCs. We further provide analysis establishing an upper bound on the sufficient number of memories, along with a description of an 80 Gbps switch implementation on an Altera Stratix II FPGA.
1
Introduction
Recent years have witnessed unprecedented advances in the design, verification formalism, and deployment of high-capacity, high-performance packet switching fabrics. Such fabrics are commonly employed as fundamental building blocks in data networking platforms that span a wide variety of application spaces. Local and metro area network platforms, for example, host fabrics that typically support up to hundreds of gigabits/sec. However, switching fabrics are not limited to Internet transport equipment. Storage area networks (SANs) often necessitate large packet switching engines that enable vast amounts of data to traverse a fabric, whereby data segments flow from users to storage devices, and vice versa. The switching capacity of an Internet router is often dictated by the memory bandwidth required to buffer arriving packets. With the demand for greater capacity and improved service provisioning, inherent memory bandwidth limitations were encountered, rendering input-queued (IQ) [1] switches and combined input and output queued (CIOQ) architectures more practical. Output-queued (OQ) switches, on the other hand, offer several highly desirable performance characteristics, including minimal average packet delay, controllable Quality of Service (QoS) provisioning, and work-conservation under any admissible traffic conditions [2]. However, the memory bandwidth of such systems is $O(NR)$,
where N denotes the number of ports and R the data rate of each port. Clearly, for high port densities and data rates, this constraint dramatically limits the scalability of the switch. In relation to standard switching architectures, the Fabric on a Chip (FoC) approach seeks to exploit recent improvements in the fabrication of VLSI circuitry in order to consolidate many switching functions on a single silicon die. Advances in packaging technology now make it possible for large amounts of information to be forwarded simultaneously to a single chip, which was not possible several years ago. There are several key advantages attributed to the concept of FoC. First, it eliminates the need for virtual output queueing (VOQ) [3] as well as some output buffering associated with standard switch architectures. Second, by exploiting the ability to access the multiple on-chip Mbits of dual-port SRAM, packets can be internally stored and switched without the need for external memory devices. The crosspoint switches and scheduler, pivotal components in input-queued switches, are avoided, thereby substantially reducing chip count and power consumption. Third, much of the signaling and control information that typically spans multiple chips can be carried out on a single chip. Finally, the switch management and monitoring functions can be centralized, since all the information is available at a single location. In an effort to retain the desirable attributes of output-queued switches while significantly reducing the memory bandwidth requirements, shared memory architectures, such as the parallel shared memory (PSM) switch/router, have recently received much attention [4]. PSM utilizes a pool of slow-running memory units operating in parallel. At the core of the PSM architecture is a memory management algorithm that determines, for each arriving packet, the memory unit in which it will be placed. This paper extends previous work by the authors [5] on the design of large-scale PSM switches from a single-chip realization perspective. By introducing computation and memory speedup components, a more efficient high-speed memory management algorithm is attained, yielding higher system scalability. The rest of the paper is structured as follows. Section 2 provides an overview of parallel shared memory switch architectures from an FoC standpoint. Section 3 describes the proposed switch architecture and memory management algorithm, and offers a detailed analysis establishing an upper bound on the sufficient number of parallel memories required. Section 4 describes the hardware architecture and FPGA-based results, and Section 5 draws the conclusions.
2
Switch Fabric on a Chip
Initial work has indicated that, assuming each of the shared memory units can perform at most one packet read or write operation during each time slot, a sufficient number of memories for a PSM switch to emulate a FCFS OQ switch is K = 3N − 1 [4]. The latter can be proven using constraint-set analysis (also known as the "pigeonhole" principle), summarized as follows. An arriving packet must always be placed in a memory unit that is currently not
being read from by any output port. Since there are N output ports, this first condition dictates that at least N memory units be available. In addition, no arriving packet may be placed in a memory unit that contains a packet with the same departure time. This results in an additional N − 1 memory units, representing the N − 1 packets having the same departure time as the arriving packet that may have already been placed in the memory units. Should this condition not be satisfied, two packets would be required to depart simultaneously from a memory unit that can only produce one packet in each time slot. The third and last condition states that all N arriving packets must be placed in different memory units (since each memory can only perform one write operation). By aggregating these three conditions, it is shown that at least 3N − 1 memory units must exist in order to guarantee FCFS output queueing emulation. Although this number of memories is sufficient, it has not been shown to be necessary. In fact, a tighter bound was recently found, suggesting that at least 2.25N memories are necessary [6]. Regardless of the precise minimal number of memories used, a key challenge relates to the practical realization of the memory management mechanism, i.e. the process that determines the memories in which arriving packets are placed. Observably, the above memory-management algorithm requires O(N) iterations to complete. In [7],[8] Prakash, Aziz, and Ramachandra proposed the Switch-Memory-Switch (SMS) architecture, which is a variation on the PSM switch, as an abstraction of the M-series Internet core routers from Juniper. The approach consists of statistically matching input ports to memories, based on an iterative algorithm that statistically converges in $O(\log N)$ time. However, in this scheme each iteration comprises multiple operations of selecting a single element from a binary vector. Although the nodes operate concurrently from an implementation perspective, these algorithms are $O(\log^2 N)$ at best (assuming $O(\log N)$ operations are needed for each binary iteration, as stated above). Since timing is a critical issue, the computational complexity should directly reflect the intricacy of the digital circuitry involved, as opposed to the high-level algorithmic perspective. To address the placement complexity issue, in prior work we proposed a pipelined memory management algorithm that reduced the computational complexity of placing a packet in a buffer to $O(1)$. The subsequent cost associated with reducing the placement complexity is an increase in the number of required parallel memories to $O(N^{1.5})$ and a fixed processing latency. The justification resides in the newfound ability to store and switch packets on chip, as multiple megabits of dual-port SRAM are now available. Furthermore, it is now plausible to consider that all data packets can arrive at a FoC directly, thus eliminating the need for virtual output queueing [3] as well as some of the output buffering commonly employed by existing router designs. The elimination of the crosspoint switches and scheduler, as found in IQ switches, provides an important reduction in chip count and power consumption. In achieving a greater degree of integration from such consolidation, a substantial reduction in overall resource consumption is expected.
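The constraint-set argument above translates directly into a first-fit placement routine. A minimal sketch (our own illustration, not the paper's implementation); with K ≥ 3N − 1 the inner search can never fail, since at most N memories are being read, at most N − 1 hold a packet with the same departure time, and at most N − 1 were already written this slot:

    def place_arrivals(arrivals, reads, buffered, K):
        """arrivals: departure times of the (up to N) packets arriving this slot.
        reads: ids of memories being read by output ports this slot.
        buffered: dict mapping every memory id to its set of buffered departure times.
        Returns a dict mapping arrival index -> chosen memory id."""
        written, placement = set(), {}
        for i, dep in enumerate(arrivals):
            for mem in range(K):
                if mem in reads or mem in written or dep in buffered[mem]:
                    continue                     # violates one of the three conditions
                placement[i] = mem
                written.add(mem)
                buffered[mem].add(dep)
                break
            else:  # unreachable when K >= 3N - 1
                raise RuntimeError("insufficient memories")
        return placement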
In extending the architecture to allow for speedup and multiple packet placements, we enable more than one packet decision and placement to occur during a single packet time. This helps reduce the number of total required parallel memories, as discussed in the next section.
3
Packet Placement Algorithm
3.1
Switch Architecture
We begin with a detailed description of the proposed PSM switch structure, depicted in Figure 1. The most significant component in the architecture is the pipelined memory-management algorithm. A departure time is calculated for each packet prior to the insertion of packets into the memory management subsystem. This process is governed by the output scheduling algorithm employed, and is generally very fast. The most straightforward scheduler is first-come-first-served (FCFS), in which packets are assigned departure times in accordance with their arrival order. To provide delay and rate guarantees, more sophisticated schedulers [2] can be incorporated, which is reflected by the departure time assignments. The main contribution of this paper resides in the memory management algorithm that distributes the packet-placement process, at a cost of fixed latency. This is achieved by utilizing a multi-stage pipeline, as illustrated in Figure 2. The pipeline architecture consists of $L(L+1)/2$ cell buffering units arranged in a triangular structure, where L denotes the number of parallel memory units. Each row is therefore associated with one memory unit. The notion of speedup, s, is introduced with the requirement that the pipeline operate s times faster than the line rate. One desired benefit of operating the pipeline at a higher rate is reduced latency. Moreover, if the incoming packets from the set of N input ports are presented to the pipeline in groups of N/s, the number of conflicts from packets with the same arrival time is reduced from N to N/s. Incoming packets from input port i are initially inserted into row i mod (N/s). The underlying mechanism is that at every time slot, packets are horizontally
Fig. 1. General architecture of the proposed parallel shared memory (PSM) switch. Arriving packets are placed in a set of (k > N ) memory units.
Fig. 2. Illustration of the memory management pipeline structure
shifted one step to the right, with the exception of the diagonal cells. A packet residing in a diagonal cell is either shifted (moved) vertically to another row in the same column or placed in the memory associated with the row in which it resides. Vertical packet shifts occur if the memory associated with the row in which the packet resides contains another packet with the same departure time. If a vertical shift is to be performed, the diagonal cell must select a row in its column that satisfies the following three conditions: (1) the pipeline cell of the selected row does not already contain a packet; (2) the memory in the selected row must not contain a packet with the same departure time; (3) all of the pipeline cells located in the selected row, regardless of their column, must not contain a packet with the same departure time. Applying these constraints, vertical moves provide a mechanism for resolving memory placement contentions. The goal of the scheme is that once a packet reaches a diagonal cell in the pipeline, it has exclusive access to the memory located in its row. If the current row memory is occupied, an attempt is made to place the packet in a row for which there are no existing conflicts. Placement decisions along the diagonal are made concurrently and independently as a means of maximizing the processing speed of the system. As selections are made independently for each packet, it is possible for packets along the diagonal to simultaneously select the same row. In a single packet placement scheme, there exists only one memory location in a row for any given departure time. To reduce the number of conflicts associated with packets simultaneously selecting the same row, the number of memory locations in a row for a given departure time can be increased to m > 1. As multiple packet placements to a single memory are now allowed, we must guarantee that m packets can be read from memory during a single packet time. One might speculate that the pipeline speedup, s, and the number of placements allowed, m, which is effectively the memory read rate, must be equal. This is generally not required, since it might be necessary to operate the pipeline at a slower rate than that dictated by potentially faster on-chip SRAM resources. In this case, it is still prudent to offer additional placement locations in order to reduce conflicts. In provisioning m packet placement locations for each memory, it would appear that the reduction in row memories is merely an inconsequential outcome of
increasing the memory depth. Note, however, that as packets shift vertically from block b to b + 1, the block size, in terms of physical rows, decreases as s and m increase. This implies that a vertical movement bypasses fewer potentially acceptable rows with each subsequent placement. In subsequent sections, we provide analysis that derives optimal values for these parameters. In order to illustrate the underlying memory-management principle, we refer to the following example.
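A minimal sketch of the three-condition row-selection rule (our illustration, not the authors' example; the data structures are hypothetical: pipeline is a row-major grid of cells holding packet objects or None, and buffered[r] is a Counter-like map of departure times already placed in row r's memory):

    def eligible_rows(pipeline, buffered, col, dep, m):
        """Rows in column `col` to which a diagonal packet with departure
        time `dep` may vertically shift."""
        rows = []
        for r, row in enumerate(pipeline):
            if row[col] is not None:                  # (1) target cell must be empty
                continue
            if buffered[r][dep] >= m:                 # (2) fewer than m same-departure slots used
                continue
            if any(p is not None and p.departure == dep
                   for p in row):                     # (3) no in-flight same-departure packet
                continue
            rows.append(r)
        return rows

A diagonal cell then picks one of the returned rows, or writes directly to its own memory when no conflict exists there.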
3.2
Upper Bound on the Sufficient Number of Memories
In this section, we obtain an upper bound on the number of memories sufficient for the pipelined memory management architecture, given a speedup factor s. Let us view the pipeline rows as arranged in B sequential blocks. Speedup is introduced into the system through the partition of the N arriving packets into s distinct segments. Packets that arrive at time t to any of the N ports are presented to the first block, which consists of N/s rows. Arriving packets are then multiplexed and written to one of the N/s rows in this first block. Once placed in a row, a packet can only be written to one of m memory locations for a given departure time, or shift vertically to another row in block b + 1.

Lemma 1. There should be at least $\frac{s+m}{sm}N - b$ rows in block $b$, for $b \in \{2, 3, \ldots, B\}$.

Proof. Consider a packet moving from block b to b + 1. For a system with speedup s, it will find at most N/s − 1 packets having the same arrival time. Furthermore, there are at most N − bm − 1 packets with the same departure time, since at least bm packets with the same departure time are served in the first b blocks. Therefore, in block b + 1, there are at most N/s − 1 rows occupied by packets with the same arrival time. Since up to m packets with the same departure time can be served by one memory, we need $(N - bm)/m$ additional rows for packets with the same departure time. Hence, we need $\frac{N}{s} - 1 + \frac{N-bm}{m}$ rows for block b + 1, or $\frac{s+m}{sm}N - b$ rows for block $b \in \{2, 3, \ldots, B\}$.

For a switch with N ports and B blocks, the total number of rows (parallel memories) can be expressed as:

$$L(N) = \frac{N}{s} + \left(\frac{s+m}{sm}N - 2\right) + \ldots + \left(\frac{s+m}{sm}N - B\right) \qquad (1)$$

$$= \frac{s+m}{sm}N(B-1) + \frac{N}{s} - \frac{(B+2)(B-1)}{2} = N\,\frac{(s+m)(B-1) + m}{sm} - \frac{(B+2)(B-1)}{2} \qquad (2)$$
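As a sanity check on the reconstruction of (1)–(2), a short sketch (ours, not from the paper) compares the block-by-block sum with the closed form:

    def rows_in_block(b, N, s, m):
        # Lemma 1: block 1 holds N/s rows; block b >= 2 needs (s+m)/(s m) * N - b.
        return N / s if b == 1 else (s + m) / (s * m) * N - b

    def L_sum(N, s, m, B):
        # Equation (1): total rows summed over the B blocks.
        return sum(rows_in_block(b, N, s, m) for b in range(1, B + 1))

    def L_closed(N, s, m, B):
        # Equation (2): N((s+m)(B-1) + m)/(s m) - (B+2)(B-1)/2.
        return N * ((s + m) * (B - 1) + m) / (s * m) - (B + 2) * (B - 1) / 2

    assert abs(L_sum(16, 2, 2, 5) - L_closed(16, 2, 2, 5)) < 1e-9   # both equal 58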
To compute the total number of rows (or memory units), we must determine the maximum number B(N) of blocks, or vertical shifts, required to successfully assign all packets to memory.

Lemma 2. The maximum number of packets with the same departure time in the fourth block is $P_4 \le (\sqrt{N} - m)^2$.

Suppose there are $P_1$ packets with the same departure time in the first block. Recall that there can be no more than N/s packets with the same arrival time in the first block and no more than N packets with the same departure time in the system, so that $P_1 \le N$. Packets only move vertically from the first block if a given packet resides in a row that contains m other packets with the same departure time. Let us state that the $P_1$ packets reside in $R_1$ ($R_1 \le N/s$) rows of the first block; then the number of packets that propagate vertically to the second block must equal the number of conflicting packets, given by

$$P_2 = P_1 - mR_1 \qquad (3)$$

Decisions regarding the row destination of a given packet are made independently, so that packets with the same departure time can shift simultaneously to the same row. Note that a maximum of $R_1$ packets can shift simultaneously, so that the resulting number of rows with conflicts in the second block is given by

$$R_2 \ge \frac{P_1 - mR_1}{R_1} \qquad (4)$$

The value of $R_2$ represents the number of unique rows that received packets with the same departure time from the first block. Applying these same principles, we can further state that the maximum number of packets with the same departure time that can shift to the third block is given by

$$P_3 = P_2 - mR_2 \le P_1 - mR_1 - m\,\frac{P_1 - mR_1}{R_1} \qquad (5)$$

If $P_1 - mR_1$ is divisible by $R_1$, then

$$P_3 \le P_1\left(1 - \frac{m}{R_1}\right) - mR_1 + m^2 \qquad (6)$$

otherwise, since $P_4 \le P_3 - 1$, we have

$$P_3 \le P_1\left(1 - \frac{m}{R_1}\right) - mR_1 + m^2 + 1, \qquad P_4 \le P_1\left(1 - \frac{m}{R_1}\right) - mR_1 + m^2 \qquad (7)$$

The maximum value of (7) is obtained when $R_1 = \sqrt{P_1}$. Substituting $P_1 = N$ yields the following inequality:

$$P_4 \le \left(\sqrt{N} - m\right)^2 \qquad (8)$$
Note that if N is a complete square we have

$$P_3 \le \left(\sqrt{N} - m\right)^2. \qquad (9)$$
Corollary 1. A sufficient number of parallel memory blocks required for an $N \times N$ switch, employing the proposed architecture, is $O(\sqrt{N})$.
√ N .
Theorem 1. For an N = k 2 k ∈ {1, 2, ...} and s = m, the number of memories is 4k 3 6 4 3 5 2 L (N ) ≤ 2 + − − k − k m m m2 m m2 2 5 + 2 − 2− (10) m m with equality if N = k 2 Proof. We prove the equality for N = k 2 , suggesting that the general case trivially follows. We first show by strong induction that the number of required row blocks are 2 2k 2 + 2− (11) B k ≤ m m For k =1, the result is trivial. In order to prove it for k ≤ m, it is sufficient to show B m2 = 2 To that end, for k = m, notice the number of rows in the first block is, k2 m2 N = = =m (12) R1 = s s m Therefore, the maximum number of packets that can move simultaneously to the same row in the second block is m (one packet from each row in the first block). Since each memory can serve up to m packets with same departure time, all packets in the second block rows can be scheduled and there is no need to have third block rows. So far, we have proved the result for k = 1, . . . , m. Next
Accelerated Packet Placement Architecture for PSM Routers
805
we use the strong induction step to prove it for k > m. We assume it is true for $1, \ldots, k$ ($k \ge m$) and prove it for k + 1. For $N = (k+1)^2$, using Lemma 2 and (9) (given that N is a complete square), we have

$$B\left((k+1)^2\right) \le B\left((k+1-m)^2\right) + 2 \le \frac{2(k+1-m)}{m} + 2 - \frac{2}{m} + 2 = \frac{2(k+1)}{m} + 2 - \frac{2}{m} \qquad (13)$$
Now, letting s = m and substituting (13) into (1), we obtain

$$L(k^2) = k^2\,\frac{2(B-1)+1}{m} - \frac{(B+2)(B-1)}{2} \le k^2\,\frac{4k+3m-4}{m^2} - \frac{(2k+4m-2)(2k+m-2)}{2m^2}$$

$$= \frac{4k^3}{m^2} + \left(\frac{3}{m} - \frac{6}{m^2}\right)k^2 - \left(\frac{5}{m} - \frac{4}{m^2}\right)k + \frac{5}{m} - \frac{2}{m^2} - 2 \qquad (14)$$
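A quick numerical check (ours) that the polynomial in (10)/(14) agrees with the closed form (2) when s = m and B is taken from (11), reusing L_closed from the earlier sketch:

    def B_bound(k, m):
        # Equation (11): B(k^2) <= 2k/m + 2 - 2/m.
        return 2 * k / m + 2 - 2 / m

    def L_theorem(k, m):
        # Right-hand side of (10)/(14).
        return (4 * k**3 / m**2 + (3 / m - 6 / m**2) * k**2
                - (5 / m - 4 / m**2) * k + 5 / m - 2 / m**2 - 2)

    for k, m in [(4, 2), (6, 2), (8, 4), (9, 3)]:
        assert abs(L_theorem(k, m) - L_closed(k * k, m, m, B_bound(k, m))) < 1e-9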
4
Hardware Implementation
To establish the viability of the FoC architecture, the proposed memory management algorithm was implemented in hardware targeting an Altera Stratix II EP2S60 FPGA device. The implementation consisted of eight ports, each operating at 10 Gbps, representing a switch with an aggregate capacity of 80 Gbps. The maximum departure time, k, was set to 64. Further, the system was designed with a placement decision speedup (s) of four, requiring packet placement decisions to be performed in approximately 12.5 ns. Additionally, there were four unique locations for each departure time in each row memory, i.e. m = 4. The prototype system, with speedup and multiple packet placement, utilized eight physical memories consuming a total of 26.624 kb, including logic mapped to memory. This assumed that only packet headers are processed (as the payload is irrelevant to the decision-making process). However, if a 64-byte payload is assumed, the aggregate on-chip memory requirement increases to 1.05 Mbit. While eight physical memories were implemented, principally for symmetry and test purposes, no more than five memories were actually required (as stated in Table 1). The design required 17,383 adaptive look-up tables (ALUTs), or 35% of the ALUTs available on the target device. Proper evaluation of the switch was established by attaching a packet generator, implemented using an Altera Cyclone
Table 1. Number of memories in the proposed PSM switch

Switch Ports (N)  Speedup (s)  Memory Units
8                 2            19
8                 4            5
16                2            58
16                4            18
32                4            51
64                4            144
EP1C6Q-240C8 device, to apply both Bernoulli i.i.d. and bursty traffic to the PSM switch fabric. By varying the traffic load and patterns over a wide range of possible scenarios, the viability of the proposed algorithm in a real-time environment was established. The overall latency contributed by the architecture with respect to a pure output-queued switch was 100 ns (8 stages of 12.5 ns each).
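Under the assumption (ours, not stated in the paper) that the Table 1 figures are obtained by evaluating bound (2) with s = m and B from (11), taking $k = \sqrt{N}$ and rounding up, the following sketch reproduces all six rows:

    from math import ceil, sqrt

    def memories_needed(N, s):
        m = s                                   # assume m = s placements
        B = 2 * sqrt(N) / m + 2 - 2 / m         # bound (11) with k = sqrt(N)
        L = N * (2 * B - 1) / m - (B + 2) * (B - 1) / 2   # (2) with s = m
        return ceil(L)

    table1 = {(8, 2): 19, (8, 4): 5, (16, 2): 58,
              (16, 4): 18, (32, 4): 51, (64, 4): 144}
    for (N, s), expected in table1.items():
        assert memories_needed(N, s) == expected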
5
Conclusion
The notion of designing a packet switching fabric on a chip was introduced and discussed from theoretical as well as practical perspectives. In the context of emulating an output-queued switch, it has been argued that a fundamental challenge pertains to the memory-management algorithm employed. A packet-placement algorithm and a related high-speed parallel architecture were described in detail, emphasizing their feasibility attributes. Future work will focus on further reducing the memory requirements and on the incorporation of quality of service (QoS) provisioning. The switch model and framework presented here can be broadened to further investigate the concept of consolidating multiple switch fabric functions on silicon.
Acknowledgements
This work has been partially supported by the Department of Energy (DOE) under research grant DE-FG02-04ER25607, and by Altera, Inc.
References
1. McKeown, N.: The iSLIP scheduling algorithm for input-queued switches. IEEE/ACM Transactions on Networking 7(2) (1999) 188–201
2. Shreedhar, M., Varghese, G.: Efficient fair queueing using deficit round robin. Proc. of ACM SIGCOMM '95 (1995) 231–242
3. Tamir, Y., Frazier, G.: Higher performance multiqueue buffers for VLSI communication switches. In: 15th Annual Symposium on Computer Architecture (1988) 343–354
4. Iyer, S., Zhang, R., McKeown, N.: Routers with a single stage of buffering. In: Proc. ACM SIGCOMM (2002)
5. Matthews, B., Elhanany, I., Tabatabaee, V.: Fabric on a chip: Towards consolidating packet switching functions on silicon. In: Proc. IEEE International Conference on Communications (ICC) (2006)
6. Liu, H., Mosk-Aoyama, D.: Memory management algorithms for DSM switches. Stanford Technical Paper (2004)
7. Aziz, A., Prakash, A., Ramachandra, V.: A near optimal scheduler for switch-memory-switch routers. In: Proceedings of the Fifteenth Annual ACM Symposium on Parallel Algorithms and Architectures (2003) 343–352
8. Prakash, A., Aziz, A., Ramachandra, V.: Randomized parallel schedulers for switch-memory-switch routers: Analysis and numerical studies. IEEE INFOCOM 2004 (2004)
RSVP-TE Extensions to Provide Guarantee of Service to MPLS*
Francisco J. Rodríguez-Pérez, José Luis González-Sánchez, and Alfonso Gazo-Cervero
University of Extremadura - Escuela Politecnica, Avda. Universidad s/n, 10071 Caceres, Spain
{fjrodri,jlgs,agazo}@unex.es
Abstract. When IP and ATM are integrated, independent Quality of Service (QoS) models need to be set up, and they are difficult to coordinate. This gap is bridged when MultiProtocol Label Switching (MPLS) is used for this purpose. We propose Guarantee of Service (GoS) to improve the performance of privileged flows in congested MPLS networks. We first discuss the GoS requirements for use in conjunction with MPLS. Then we propose a minimum set of extensions to RSVP-TE that allow the signaling of GoS information across the MPLS domain. Keywords: Guarantee of Service, MPLS, RSVP-TE, local recoveries.
1 Introduction
Multiprotocol Label Switching (MPLS) is currently used mainly to provide Virtual Private Network (VPN) services or for IP-ATM integration with QoS purposes [1], combining ATM traffic engineering capabilities with the flexibility of IP and class-of-service differentiation. MPLS bridges the gap between IP and ATM, avoiding the need to set up independent QoS models for IP and for ATM, which are difficult to match. ATM switches can dynamically assign Virtual Path Identifier/Virtual Channel Identifier (VPI/VCI) values, which can be used as labels for cells. This solution solves the problem without the need for centralized ATM-IP integration servers. Like ATM Virtual Circuits (VCs), MPLS Label Switched Paths (LSPs) let the head-end Label Edge Routers (LERs) control the path that traffic uses towards a specific sink. LSP tunnels also allow a variety of policies related to network performance optimization [2]. The Resource ReSerVation Protocol (RSVP) is a signaling mechanism used to reserve resources for these LSP tunnels. MPLS can reserve bandwidth on the network when it uses RSVP to build LSPs. Unlike in ATM, there is no forwarding-plane enforcement of the reservation. A reservation is made in the control plane only, which means that if a Label Switch Router (LSR) makes an RSVP reservation and later needs more bandwidth, it will congest that LSP, damaging the performance of other flows that may have even higher priority, unless we attempt to
∗ This work is sponsored in part by the Regional Government of Extremadura (Education, Science and Technology Council) under grant no. PDT05A041.
Although RSVP with Traffic Engineering (RSVP-TE) is expected to play an important role in such scenarios [3], an extended RSVP-TE protocol can be used in a much wider context for performance improvement. MPLS-TE provides fast networks, but it assumes that devices will not fail and that no data will be lost. However, resource failures and unexpected congestion cause a large fraction of lost traffic. In these cases, upper-layer protocols can request retransmission of lost data at the end points, but the time needed to obtain the retransmitted data can be significant. For some types of services with stringent delay and reliability requirements, such as stock-exchange data or medical information, MPLS cannot ensure that performance will not degrade due to end-to-end (E2E) retransmissions of lost traffic. In this work we describe a set of extensions to MPLS RSVP-TE signaling to provide GoS over MPLS. They allow us to offer GoS to privileged data flows [4][5], so that packets discarded due to congestion are recovered locally, avoiding E2E retransmissions requested by upper layers as far as possible. The following section shows how GoS can be applied to privileged MPLS flows. In the third section we study the RSVP-TE extensions needed to transport GoS information through the domain. The fourth section presents an analysis of the proposal, and finally the article concludes by summarizing the contributions of the research.
2 GoS over MPLS

The GoS capability of an MPLS privileged data flow is the capacity of a specific node to locally recover discarded packets belonging to that flow. This work proposes up to four GoS levels (see Table 1), codified with two bits, so each packet can carry this information throughout its entire route. A greater GoS level implies a greater probability that a packet can be found in the GoS buffer of any node of its LSP. Thus the need for end-to-end retransmissions is avoided, and lost data are recovered in a much more local environment. GoS levels are implemented by means of the MPLS packet header, the network-level header, and upper-layer headers as well. The main levels involved in an MPLS communication are Network, Link and level 2+ (MPLS). However, we also have to bear in mind the possibility of marking GoS levels in the Transport layer for Application-level packets. Thus, following the TCP/IP model, data is marked with GoS at the Application level directly by the user; the process then marks the TCP segments to be encapsulated in IP packets, which finally receive a label to be switched across the MPLS domain. At the Application level, a GoS-capable session can be started by selecting a specific port when opening a TCP socket.

Table 1. GoS Levels Codification

GoS1  GoS0  Meaning
0     0     No GoS packet
0     1     Level 1 of GoS
1     0     Level 2 of GoS
1     1     Level 3 of GoS
For example, we access port 110 to use the email service, or port 22 for SSH services. Similarly, GoS uses three ports to open TCP sessions, mapped to each of the three available GoS levels. This causes the Transport and upper levels to be marked with GoS. Moreover, at the Network level the GoS mark is implemented in the IP Options field, which has a size of at most 40 bytes; only the first byte of this field is needed to codify the two GoS bits. Finally, to mark a packet with GoS at the MPLS level, the label field takes the value 1, which has been defined as a special value for MPLS labels. The EXP field (see Figure 1) can transport the two bits needed for GoS. This mark can be set by the ingress LER.

2.1 GoS Packets Identification

In GoS nodes, a temporal buffer called NonStop-Forwarding Memory for GoS PDUs (NMGP) is needed. Moreover, GoS packets buffered in these nodes must be identified so that a GoS packet satisfying a local retransmission request can be found. Privileged PDUs are therefore indexed in these buffers, so that all sent and received GoS packets are globally identified in the MPLS domain: nodes that request local retransmissions can recognize each packet whose retransmission is needed, and upstream nodes can find stored GoS-marked packets. The IP address from the Network layer identifies each node in a network topology, so it can identify data flows, but it cannot identify each packet sent by a specific node. An identifier (id) therefore accompanies each GoS packet and is assigned by the sender node that generates it. A four-octet identifier allows up to 2^32 = 4,294,967,296 packets sent by a node to be recognized. The Network-level address of the sender together with this four-byte id is thus a unique identifier for a GoS packet. This id field is also marked in the Options field, after the GoS level field (see Figure 2). In the case of IPv6, GoS information can be forwarded in the Hop-by-Hop optional header, to be processed in every node along the LSP. This header also allows a non-GoS node to ignore the GoS data and continue processing the IPv6 packet.
Fig. 1. MPLS packet header structure (Label: 20 bits; EXP: 3 bits; S: 1 bit; TTL: 8 bits)

Fig. 2. IP Options field format for characterization of GoS packets (GoS level: 1 octet; packet identifier: 4 octets; GoS Plane: addresses of GoS nodes GoSP1 to GoSPd, 4 octets each, up to 8 addresses)
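To make the layout of Figure 2 concrete, the following sketch packs and parses the GoS characterization information. It is our own illustration, not part of the proposal: the function names and the dotted-quad address handling are assumptions, and only the field sizes (1 octet of GoS level, 4 octets of packet identifier, up to 8 addresses of 4 octets) come from the text above.

import struct

MAX_GOSP_DIAMETER = 8  # max d = (40 - 5) / 4 addresses fit in the IP Options field

def encode_gos_options(level, packet_id, gos_plane):
    # Pack the GoS level (1 octet), the packet identifier (4 octets)
    # and the GoS Plane stack (up to 8 IPv4 addresses, 4 octets each).
    if not 0 <= level <= 3:
        raise ValueError("the GoS level is codified with two bits (0-3)")
    addrs = gos_plane[-MAX_GOSP_DIAMETER:]  # keep only the last d <= 8 GoS nodes
    data = struct.pack("!BI", level, packet_id)
    for a in addrs:
        data += bytes(int(x) for x in a.split("."))
    return data

def decode_gos_options(data):
    level, packet_id = struct.unpack("!BI", data[:5])
    plane = [".".join(str(b) for b in data[i:i + 4]) for i in range(5, len(data), 4)]
    return level, packet_id, plane

# A level-2 GoS packet that has traversed two GoS nodes:
opts = encode_gos_options(2, 1001, ["10.0.0.1", "10.0.0.7"])
assert len(opts) <= 40  # fits the 40-byte IP Options field
assert decode_gos_options(opts) == (2, 1001, ["10.0.0.1", "10.0.0.7"])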
2.2 GoS Path Marking and Local Recoveries

We consider a domain G(U), with a set of nodes U, and a data flow ϕ(G) = ϕ(x_i, x_n) in G(U) across a path LSP_{i,n}, with origin in node x_i and destination in node x_n, where {x_i, x_n} ⊂ U. Node x_n only knows the incoming port and incoming label of every arrived packet of ϕ(G), i.e., x_n only knows that x_{n−1} is the sender of ϕ(x_i, x_n). It could try to infer the sender of a packet from label information, but this is not a reliable strategy, because node x_{n−1} could use flow aggregation mechanisms to merge k flows coming from other nodes into a single flow, in the form:

ϕ(x_{n−1}, x_n) = \sum_{i=1}^{k} ϕ_i(x_{n−1}, x_n).   (1)
On the other hand, x_n may, due to congestion, fail to keep the Flow Conservation Law:

\sum_{j=1}^{k} p_{nj} < \sum_{i=1}^{k} p_{in},   (2)
where p_{ij} is the traffic volume sent from x_i to x_j; in that case the node is discarding one or more packets, and x_n cannot by itself find a node from which to request local retransmissions of the lost packets. It is therefore very important to know the set of nodes through which a specific GoS packet has passed; this is known as GoS Path Marking. With it, x_n knows that discarded traffic may have been stored in upstream GoS nodes of LSP_{i,n}. The first node from which a local retransmission is requested is the starting node of the GoS Plane, i.e., the previous GoS neighbor. To build this stack of GoS nodes we have to obtain the set of nodes X ⊆ LSP_{i,n} = {(x_i, x_{i+1}), (x_{i+1}, x_{i+2}), ..., (x_{n−1}, x_n)} ⊆ U of the G(U) domain, with maximum diameter d(x_i, x_n) = n−i, such that the nodes of X are GoS capable. In this way, upon a packet discard, a local retransmission can be requested from any node belonging to X, avoiding requests to the head end and causing a smaller increase of the global ϕ(G) in the domain. Path marking at the MPLS level would require using several bits of the label, so that non-GoS nodes (LSP_{i,n} − X) would not know how to handle GoS traffic. Working at the network level is therefore a better strategy: the GoS nodes of LSP_{i,n} mark their network-level address in the IP Options field of the GoS privileged packets. This stack of network addresses of the nodes that have switched the packet is known as the GoS Plane, and the number of elements of this stack is the diameter (d) of the GoS Plane. The maximum value of d is max d = (OS − BU)/BpA, where OS is the IP Options field size (40 bytes), BU is the number of bytes used in the GoS proposal for packet characterization (1 byte for the GoS level and 4 bytes for packet identification), and BpA (Bytes per Address) is the number of bytes needed to codify an IP address (4 bytes). Thus d = 8 is the maximum supported GoSP diameter. The objective of GoS is not to replace all the nodes of an MPLS domain, but to incorporate several GoS-capable MPLS nodes. In this way, should a local retransmission become necessary at a node, there is a GoS Plane of at most 8 nodes to go upstream, increasing the possibility of finding the lost packet. Moreover, the Internet Effective Diameter (IED), defined as the maximum number of indispensable hops needed to reach any other node in the Internet [6], shows that, rounding to 4, approximately 80% of the pairs of nodes in the Internet are reachable within this distance.
If we consider an effective diameter of 5, more than 95% of the pairs of nodes are covered, so a GoSP diameter of at most 8 nodes is a suitable size. The last d GoS nodes that have switched a specific GoS packet are thus always known. This stack is also marked in the Options field, after the GoS level field and the four-byte packet identifier. So, in order to support GoS, the IP Options field of a packet is formatted as shown in Figure 2.
3 GoS Signaling

The specification of RSVP-TE [7] defines extensions to the Resource reSerVation Protocol (RSVP) to make network resource reservations and to distribute labels, establishing LSPs with traffic engineering capabilities. Among these extensions are the ability to specify a strict path to be followed by an LSP and support for state recovery. RSVP messages are sent encapsulated in IP packets and are composed of a header and a set of objects. For example, the Hello object enables RSVP routers to detect when neighbor nodes are not reachable, providing very local and effective failure detection. The Hello extension is designed so that one side can use the mechanism while the other side does not, and it may be initiated at any configured failure detection interval and at any time. There are two types of Hello objects: Hello Request and Hello Ack. Nodes without Hello capabilities, or not configured for them, can ignore these messages; i.e., reception of Hello messages does not alter the common operation of any node. The mechanism is intended for use between immediate neighbors, so the IP Time To Live (TTL) field must be 1; however, with TTL > 1 it could be used as a keepalive between non-neighbor nodes.

3.1 RSVP-TE Hello Message Operation

A node may periodically (the default period is 5 ms) generate a Hello message containing a Hello Request object for each neighbor whose status is being tracked. For every Hello Request, the neighbor must send a Hello Ack (see Figure 3). If no messages are received within a configured number of Hello intervals (the default is 3.5 intervals), the node presumes that it cannot communicate with the neighbor. Nodes also compare newly received values of the Source and Destination Instance fields with the values most recently received from each neighbor and with the last values sent to it; a mismatch is likewise taken to mean that communication with the peer has been lost.

3.2 GoS Extended Hello Operation

In [7] an extension of the Hello message is proposed for handling nodal faults, i.e., the case where a node loses its control state (e.g., after a restart) but does not lose its data forwarding state, as well as control channel faults, i.e., the case where control communication is lost between two nodes.
Fig. 3. GoS extended Hello message format, with common Hello Request and GoS Request objects (common header: Version, Flags, Message Type, RSVP Checksum, TTL, Reserved, RSVP Message Length; Hello Request object: Source Instance and Destination Instance, 4 octets each; GoS Request object: Packet Identifier and Flow Identifier, 4 octets each, plus the GoSP1 to GoSP8 addresses, 4 octets each)
The format of this extended Hello message is:

<Hello Message> ::= <Common Header> [<INTEGRITY>] <HELLO> [<RESTART_CAP>]

In this work a GoS Hello message is proposed with the following format:

<GoS Hello Message> ::= <Common Header> [<INTEGRITY>] <HELLO> [<GoS>]

which, besides a Hello object, also includes an object carrying a GoS Request or a GoS Ack. GoS nodes use the information in the Source and Destination Instance fields of the common Hello objects to test connectivity with their GoSP neighbors, as explained above. The formats of the GoS Request and GoS Ack objects are shown in Figures 3 and 4. The usual state of a GoS MPLS node is the data forwarding state, in which it switches labels and forwards data packets to the next node (see Figure 5). Only two events change this state in a GoS node (see Figure 5). One of them is the detection of a discarded GoS packet. In this case the node captures the GoS characterization information of the discarded packet (see Figure 2) and changes its state to request of local retransmission, sending an extended Hello message with a GoS Request to the first node of the GoSP (GoSP1) (see Figure 6). When a GoS Ack object is received from GoSP1, it changes back to the forwarding state. The other event that triggers a state change is the reception, from any downstream GoS node, of an extended Hello message with a GoS Request for a local retransmission. In that case the node changes its state to NMGP search and accesses its temporal buffer, trying to find the requested packet according to the characterization information received in the GoS Request.
Fig. 4. GoS extended Hello message format, with common Hello Ack and GoS Ack objects (common header: Version, Flags, Message Type, RSVP Checksum, TTL, Reserved, RSVP Message Length; Hello Ack object: Source Instance and Destination Instance, 4 octets each; GoS Ack object: GoS Ack field, 4 octets)
If it finds the requested packet in the NMGP, it sends a GoS Hello message with a GoS Ack object indicating that the packet was found and will be locally retransmitted. It then changes to the local retransmission state, gets the GoS packet from the NMGP, retransmits it, and returns to the initial forwarding state. If the packet is not found in the NMGP buffer, the node sends a GoS Ack object indicating that the packet was not found, changes to the request of local retransmission state and sends a GoS Hello message with the GoS Request to the next GoSP node, if it is not the last one. This new GoS Request message to the next node in the GoSP is shorter than the previous one: a node that does not find the requested GoS packet in its NMGP first removes its own address from the GoS Request object before forwarding the request to the next node of the GoSP (see Figure 3). Thus, the larger the diameter already traversed in the GoS Plane, the shorter the GoS messages to be sent.
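The node behavior just described, from forwarding through NMGP search to local retransmission or forwarding of the request up the GoS Plane, can be summarized in a small state machine. The sketch below is our own reading of Figures 5 and 6; the method and message names are hypothetical, and send stands for an abstract transmission primitive.

from enum import Enum, auto

class State(Enum):
    FORWARDING = auto()
    LOCAL_RETRANS_REQUEST = auto()
    NMGP_SEARCH = auto()
    LOCAL_RETRANSMISSION = auto()

class GoSNode:
    def __init__(self, address, nmgp):
        self.address = address
        self.nmgp = nmgp              # temporal buffer: packet id -> stored packet
        self.state = State.FORWARDING

    def on_gos_discard(self, pkt_id, gos_plane, send):
        # A GoS packet was discarded here: ask the nearest upstream GoS node.
        self.state = State.LOCAL_RETRANS_REQUEST
        send(gos_plane[0], ("GoS Request", pkt_id, gos_plane))

    def on_gos_ack(self, found):
        # Answer from the queried GoSP node: either way, resume forwarding.
        self.state = State.FORWARDING

    def on_gos_request(self, pkt_id, gos_plane, requester, send):
        # A downstream node asks for a local retransmission.
        self.state = State.NMGP_SEARCH
        if pkt_id in self.nmgp:
            send(requester, ("GoS Ack", "found"))
            self.state = State.LOCAL_RETRANSMISSION
            send(requester, self.nmgp[pkt_id])   # local retransmission
            self.state = State.FORWARDING
        else:
            send(requester, ("GoS Ack", "not found"))
            rest = [a for a in gos_plane if a != self.address]  # remove own address
            if rest:  # forward the (now shorter) request up the GoS Plane
                self.state = State.LOCAL_RETRANS_REQUEST
                send(rest[0], ("GoS Request", pkt_id, rest))
            else:
                self.state = State.FORWARDING    # end of the GoS Plane reached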
4 Analysis and Evaluation of the Proposal

In this section we analyze the benefits of GoS for the delay of packets belonging to privileged flows. We consider an MPLS domain G(U) with a set X of n nodes and a set U of links. Let δ_{ij} be the delay of link (x_i, x_j) ∈ U and let δ(x_i, x_j) be the delay of a path between two nodes x_i and x_j, which may be non-neighbors. Our objective is to minimize the delay experienced by packets transmitted between any two nodes of the path LSP_{i,n} of U(G):

min δ(x_i, x_j) = \sum_{i=1}^{n} \sum_{j=1}^{n} δ_{ij} x_{ij},   (3)

subject to:

\sum_{l=2}^{n} x_{1l} = 1,   (4)

\sum_{i=1}^{n} x_{il} − \sum_{j=1}^{n} x_{lj} = 0,  l = 2, 3, ..., n−1,   (5)

\sum_{l=1}^{n−1} x_{ln} = 1,   (6)

where x_{i,j} = 1, ∀(x_i, x_j) ∈ LSP_{i,n} and x_{i,j} = 0, ∀(x_i, x_j) ∉ LSP_{i,n}, and δ_{i,i} = 0, ∀i ∈ N.
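The program (3)-(6) is a standard shortest-path formulation with flow conservation constraints. As a quick illustration (our own, with 1-based indexing and a plain 0/1 matrix as an assumption), the constraints can be checked directly:

def is_feasible_path(x, n):
    # Check constraints (4)-(6) for a 0/1 link assignment x[i][j], nodes 1..n.
    out_src = sum(x[1][l] for l in range(2, n + 1))            # Eq. (4)
    conserve = all(sum(x[i][l] for i in range(1, n + 1)) ==
                   sum(x[l][j] for j in range(1, n + 1))
                   for l in range(2, n))                       # Eq. (5)
    in_sink = sum(x[l][n] for l in range(1, n))                # Eq. (6)
    return out_src == 1 and conserve and in_sink == 1

def path_delay(x, delta, n):
    # Objective (3): total delay over the links selected by x.
    return sum(delta[i][j] * x[i][j]
               for i in range(1, n + 1) for j in range(1, n + 1))

# The chain x_1 -> x_2 -> x_3 -> x_4 satisfies (4)-(6):
n = 4
x = [[0] * (n + 1) for _ in range(n + 1)]
x[1][2] = x[2][3] = x[3][4] = 1
assert is_feasible_path(x, n)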
Fig. 5. State diagram of a GoS node (states: forwarding mode, local retransmission request, GoS buffer search, local retransmission; transitions: GoS packet discarded, Hello Request received, Hello Ack found / not found, next GoS node response)

Fig. 6. Operation after a packet discard in intermediate node X4, using the 3 available GoSP diameters to get a local retransmission, compared with the case of end-to-end recovery (δ: link delay; fw: packet forwarding; E2ER: end-to-end retransmission request; LRP: locally recovered packet; E2ERP: end-to-end recovered packet)
4.1 End-to-End Retransmissions

Let x_n be a non-GoS congested end node. In case of packet discarding by x_n, the Discarding Detection Time (DDT_{e2e}) between two nodes of LSP_{i,n} is:

DDT_{e2e}(x_i, x_n) = \sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1}   (7)

The minimal delay of the end-to-end (e2e) retransmission is:

δ_{e2e}(x_i, x_n) = 2 \sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1}   (8)

So the total delay Δ_{e2e}(x_i, x_n) to recover the discarded flow at x_n is obtained from (7) and (8):

Δ_{e2e}(x_i, x_n) = 3 \sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1}   (9)
4.2 If the Congested End Node x_n Is GoS Capable

Let x_n be a GoS congested end node. In case of packet discarding by x_n, the Discarding Detection Time (DDT_d) between the source and sink nodes of path LSP_{i,n} is:

DDT_d(x_i, x_n) = \sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1}   (10)

The minimal delay of a local retransmission using a GoSP with diameter d (δ_d) is:

δ_d(x_i, x_n) = 2 \sum_{l=n−d}^{n−1} δ_{l,l+1} x_{l,l+1},   (11)

subject to:

0 < d < n − i   (12)

If the diameter in Eq. (11) were n−i, then with l = n−d = n−(n−i) = n−n+i = i we would get:

2 \sum_{l=n−d}^{n−1} δ_{l,l+1} x_{l,l+1} = 2 \sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1},   (13)

i.e., it would be an e2e retransmission. Moreover, if the GoSP diameter in Eq. (11) were bigger than n−i, the request would be addressed to a node previous to x_i; but x_i is the source of the data flow, so this is unfeasible. Thus, the total delay Δ_d(x_i, x_n) to recover the discarded traffic from the start is obtained from (10) and (11):

Δ_d(x_i, x_n) = \sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1} + 2 \sum_{l=n−d}^{n−1} δ_{l,l+1} x_{l,l+1}   (14)
At this point we test whether (14) < (9):

\sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1} + 2 \sum_{l=n−d}^{n−1} δ_{l,l+1} x_{l,l+1} < 3 \sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1}   (15)

2 \sum_{l=n−d}^{n−1} δ_{l,l+1} x_{l,l+1} < 2 \sum_{l=i}^{n−1} δ_{l,l+1} x_{l,l+1}   (16)

So, according to Eqs. (8) and (11), we only need to verify in Eq. (16) that δ_d(x_i, x_n) < δ_{e2e}(x_i, x_n). The only difference between the two sides of (16) is the set of values taken by the variable l; we only need to show that l takes fewer values in δ_d(x_i, x_n) than in δ_{e2e}(x_i, x_n):

n − 1 − (n − d) < n − 1 − i;  n − 1 − n + d < n − 1 − i;  −1 + d < n − 1 − i;  −1 + 1 + d < n − i  ⇒  d < n − i   (17)

The problem thus remains in the feasibility zone, since Eq. (17) is one of the restrictions in (12). Hence it has been demonstrated that Δ_d(x_i, x_n) < Δ_{e2e}(x_i, x_n), i.e., Eq. (9) − Eq. (14) > 0, so local recovery offers a delay benefit, improving (3):

Δ_{e2e}(x_i, x_n) − Δ_d(x_i, x_n) = 2 \sum_{l=i}^{n−d−1} δ_{l,l+1} x_{l,l+1}   (18)
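A quick numeric check of Eqs. (9), (14) and (18), with our own illustrative values (five links of 10 ms each, so n − i = 5), confirms the gain of local recovery for every admissible diameter d:

def delta_e2e(delays):
    # Eq. (9): detection plus e2e request and retransmission, 3x the path delay.
    return 3 * sum(delays)

def delta_gos(delays, d):
    # Eq. (14): full detection pass plus 2x the last d links.
    assert 0 < d < len(delays)  # restriction (12): 0 < d < n - i
    return sum(delays) + 2 * sum(delays[-d:])

delays = [10, 10, 10, 10, 10]  # link delays in ms along LSP_{i,n}
for d in (1, 2, 3):
    gain = delta_e2e(delays) - delta_gos(delays, d)
    assert gain == 2 * sum(delays[:-d])  # Eq. (18)
    print(f"d={d}: e2e={delta_e2e(delays)} ms, GoS={delta_gos(delays, d)} ms, gain={gain} ms")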
4.3 If a Congested Intermediate Node x_DD Is GoS Capable

Let x_DD be a GoS congested intermediate node. In case of packet discarding by x_DD, the Discarding Detection Time (DDT_d) between the source and the congested node x_DD is:

DDT_d(x_i, x_DD) = \sum_{l=i}^{DD−1} δ_{l,l+1} x_{l,l+1}   (19)

The minimal delay of a local retransmission is:

δ_d(x_i, x_DD) = 2 \sum_{l=DD−d}^{DD−1} δ_{l,l+1} x_{l,l+1},   (20)

subject to:

0 < d ≤ DD − i   (21)

If the diameter in Eq. (20) were bigger than DD − i, the request would be addressed to a node previous to x_i; but x_i is the source of the data flow, and this is unfeasible. (In this case, even a retransmission from the source node x_i (d = DD − i) brings an improvement with respect to e2e, because x_DD is a node previous to x_n, i.e., if DD < n then DD − i < n − i, so it is still a local retransmission.) So the total delay Δ_d(x_i, x_n) to recover the discarded traffic from the initial instant of transmission is obtained from (19) and (20):

Δ_d(x_i, x_n) = DDT_d(x_i, x_DD) + δ_d(x_i, x_DD) + \sum_{l=DD}^{n−1} δ_{l,l+1} x_{l,l+1} = DDT_{e2e}(x_i, x_n) + δ_d(x_i, x_DD)   (22)
= DDTe 2e ( xi , xn ) + δ d ( xi , x DD ) At this point we test if (22) < (9): n −1
DD −1
n −1
l =i
l = DD − d
l =i
∑ δ l ,l +1 xl ,l +1 + 2
∑ δ l ,l +1 xl ,l +1 < 3 ∑ δ l ,l +1 xl ,l +1
(23)
Optimizing, we get: DD −1
2
∑ δ l ,l +1 xl ,l +1 < 2
l = DD − d
n −1
∑δ l =i
l ,l +1
xl ,l +1
(24)
So according to Eq. (8) and Eq. (11), again we only need to verify in Eq. (24) that
δd(xi, xn)<δe2e(xi, xn). As in Eq. (17) we get that the problem is kept in feasibility zone. So Eq. (22) offers delay benefits, i. e., Eq. (22) – Eq. (9) > 0, improving Eq.(3): n−1 ⎛ DD−d −1 ⎞ Δ e2e ( xi , xn ) − Δ d ( xi , xn ) = 2 ⎜ ∑δ l ,l +1 xl ,l +1 + ∑δ l ,l +1 xl ,l +1 ⎟ l = DD ⎝ l =i ⎠
(25)
Consider a congested MPLS domain with a path LSP_{i,n} in which the last n−i nodes are discarding packets. Figure 7 shows a comparison between non-congested traffic, the e2e case (using e2e retransmissions) and three cases of local retransmissions (with d=1, d=2 and d=3). Figure 8 shows a comparison at different time samples. For example, at 3,700 ms only 171 packets have been correctly received at the sink node in the e2e case; with GoSP diameter d=3 the sink node has already received 213 packets, with d=2, 313 packets, and with d=1, 584 packets.
Fig. 7. Throughput comparison between local retransmissions and E2E, with congestion in the last n−1 nodes (x-axis: time in ms; y-axis: packets received)
Fig. 8. Comparison between local retransmissions (d=1, d=2, d=3) and E2E at different time samples (x-axis: time in ms; y-axis: packets received)
5 Conclusion

This work proposes GoS as a local traffic recovery technique in an MPLS domain to improve the performance of privileged data flows. We first discussed the requirements for GoS over MPLS. We then showed that, by introducing a limited number of RSVP-TE protocol extensions, it is possible to provide GoS signaling for privileged data flows that require reliability. The proposed technique has been analysed, demonstrating the delay benefits of local retransmissions of discarded traffic with respect to end-to-end retransmissions.
References

1. Choi, T.: Design and implementation of an information model for integrated configuration and performance management of MPLS-TE/VPN/QoS. In: IFIP/IEEE 8th International Symposium on Integrated Network Management (2003) 143–146
2. Fowler, et al.: QoS path selection exploiting minimum link delays in MPLS-based networks. In: Proceedings of IEEE Systems Communications (2005) 27–32
3. Suryasaputra, R., Kist, A.A., Harris, R.J.: Verification of MPLS traffic engineering techniques. In: 13th IEEE International Conference on Networks (2005) 190–195
4. Dominguez-Dorado, M., et al.: Guarantee of Service (GoS) support over MPLS using Active Techniques. WSEAS Transactions on Communications (2004) 1959–1964
5. Fowler, S., Zeadally, S.: Priority-based congestion control in MPLS-based networks. In: Advanced Industrial Conference on Telecommunications, IEEE AICT/SAPIR/ELETE (2005) 332–337
6. Siganos, G.: Powerlaws and the AS-level Internet topology. IEEE/ACM Transactions on Networking, vol. 11 (2003) 514–524
7. RFC 3473: GMPLS Signaling Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Extensions (2003)
An Adaptive Management Approach to Resolving Policy Conflicts

Selma Yilmaz¹ and Ibrahim Matta²

¹ CISCO Systems Inc., 170 West Tasman Drive, San Jose, CA 95134
[email protected]
² Computer Science Department, Boston University, Boston, MA 02215
[email protected]
Abstract. The Border Gateway Protocol (BGP) is the current inter-domain routing protocol used to exchange reachability information among Autonomous Systems (ASes) in the Internet. BGP supports policy-based routing, which allows each AS to independently define a set of local policies regarding which routes to accept and advertise from/to other networks, as well as which route the AS prefers when more than one route becomes available. However, independently chosen local policies may cause global conflicts, which result in protocol divergence. We propose a new algorithm, called Adaptive Policy Management (APM), to resolve policy conflicts in a distributed manner. Akin to distributed feedback control systems, each AS independently classifies the state of the network as either conflict-free or potentially conflicting by observing its local history only (namely, route flaps). Based on the degree of measured conflicts, each AS dynamically adjusts its own path preferences, increasing its preference for observably stable paths over flapping paths. The convergence analysis of APM derives from the sub-stability property of chosen paths. APM and other competing solutions are simulated in SSFNet for different performance metrics. Keywords: Inter-domain Routing, Border Gateway Protocol (BGP), Feedback Control, Convergence Analysis, Simulation.
1 Introduction BGP plays a major role in the performance of the Internet, and is known to have properties that are far from ideal. BGP allows policy-based routing; each autonomous system (AS) independently defines a set of local policies regarding which routes to accept and advertise from/to other networks, as well as which route the AS prefers when more than one route becomes available. However, independently defined local policies may lead to policy conflicts. Policy conflicts occur when neighboring ASes have opposite interests over routes. Any policy conflict can be resolved by changing the preference of the ASes over their paths, i.e. local policies.
This work was supported in part by NSF CNS Cybertrust Award 0524477, CNS ITR Award 0205294, and EIA RI Award 0202067. This work was done while Selma Yilmaz was at the Computer Science Department of Boston University.
Although not all policy conflicts are harmful, a group of ASes may define conflicting policies that cannot be satisfied simultaneously, causing BGP to diverge. Assume ASes u, v, and z form such a group. The divergence scenario may take place as follows: when AS u improves its best path, it forces AS v to give up its best path for a less preferred path, which in turn gives AS z an opportunity to improve its best path, which forces AS u to give up its best path for a less preferred path, and so on. Each AS in such a conflict repeatedly selects the same sequence of routes, never converging on any one set of routes. Therefore, route oscillations due to policy conflicts are persistent, and require some kind of intervention to stop. Route instabilities taking place across ASes may negatively impact end-to-end network performance and the efficiency of the Internet [1]: packets may be dropped or delivered out of order due to repeated advertising and withdrawal of routes. BGP is crucial for healthy and efficient global routing, and thus it is a worthy goal to guarantee convergence of BGP independent of the locally selected policies. Contribution of This Paper: There have been a number of studies on guaranteeing safety, i.e. convergence, of BGP independent of the locally selected policies [2], [3], [4]. In our previous work [5], we introduced the idea of dynamically detecting and suppressing BGP oscillations through probabilistic change of path ranks (preferences). The algorithm is designed to detect policy conflicts by using local histories only. This paper extends and completes our preliminary idea [5] in many ways: (1) we augment the path rank change algorithm so that an AS may choose a less preferred but observably stable path over a more preferred but oscillating path; it thus becomes natural for an AS to implicitly assign a higher cost (and hence a lower preference value) to oscillating (flapping) paths; (2) with the new additions, the algorithm enables the nodes to dynamically adapt to any state of the network: after the system stabilizes, we let the nodes attempt to (conservatively) restore some of the original local preference values of the paths they have modified, so as to keep the overall path rank change minimal;¹ (3) a new mechanism is added to distinguish route flaps due to topology changes, so as not to confuse them with those due to policy conflicts; (4) BGP extensions of the proposed algorithm are specified; (5) a correctness and convergence analysis of the proposed algorithm is developed based on the sub-stability property of chosen paths; (6) the proposed algorithm is implemented in the SSFNet simulator [7], and simulation results for different performance metrics are presented. The metrics also capture the dynamic performance of our algorithm as well as other competing solutions, thus exposing often neglected aspects of performance. Although our exposition is BGP-specific, the problem of inconsistent policies at independent distributed entities is more general. The paper is organized as follows: Section 2 reviews background and related work. Section 3 describes our algorithm. Results and future work are presented in Sections 4 and 5, respectively.
¹ Akin to distributed recovery mechanisms, e.g. congestion avoidance of TCP [6]. As we indicate later, the adaptation of local preference values of paths may also be influenced by input from local AS administrators.
2 Background

2.1 Border Gateway Protocol Abstraction

We use the abstraction of BGP proposed by Griffin et al. [3], called the Safe Path Vector Protocol (SPVP). SPVP is a distributed algorithm for solving the so-called Stable Paths Problem (SPP). This model abstracts away low-level details of BGP and makes it easier to reason about convergence-related issues. In SPP, a network is represented as an undirected graph, G = (V, E), where V represents the autonomous systems and E represents BGP sessions. Node 0 is the destination to which all other nodes are trying to find paths. A path P is a sequence of nodes (v_k, v_{k−1}, ..., v_1, 0), such that (v_i, v_{i−1}) ∈ E for all i, 1 ≤ i ≤ k. Paths must be simple, i.e., have no repeated nodes. An empty path, ε, indicates that a router cannot reach the destination. Each node v in the graph has a set of permitted paths, P^v, to the destination, which are the routes learned from peers and allowed by the local policy of the node. Each node v also has a ranking function, λ^v, to impose an order of preference on the paths, such that more preferable paths have higher values assigned to them. Given a node u and W ⊆ P^u with distinct next-hops, max(u, W) is defined to be the highest ranked path in W. A path assignment is a function π that maps each node u to a permitted path; π defines the path chosen by each node to reach the destination. Given a path assignment π and a node u, the set of permitted paths that are one-hop extensions of paths through neighbors is defined as

choices(u, π) = {(u, v)π(v) | {u, v} ∈ E} ∩ P^u.
The path assignment π is called stable at node u if π(u) = max(u, choices(u, π)). The path assignment π is called stable if it is stable at every node u ∈ V. If a stable path assignment π exists, the SPP instance is solvable, and every such assignment is called a solution. An instance of SPP may have no solution. SPVP is an abstraction of BGP. With this abstraction, each node maintains two data structures: rib(u) is the current path that node u is using to reach the destination, and rib_in(u ⇐ w) denotes the path that has been most recently advertised by peer w and processed at node u. The set of paths available at node u is updated as

choices(u) = {(u, w) rib_in(u ⇐ w) | w ∈ peers(u)} ∩ P^u

Fig. 1. An example of divergence. Permitted paths are shown next to each node: (130) and (10) at node 1, (210) and (20) at node 2, (320) and (30) at node 3. Longer paths are more preferred than shorter paths. Current best paths are underlined. Due to the cyclic conflict, this group of nodes cannot reach a stable state and keeps oscillating between the shown states.
and the best path at u is best(u) = max(u, choices(u)), with rib(u) = best(u). As long as node u receives advertisements from its peers, best(u) is recomputed with the most recent choices(u) and stored in rib(u). Just as with BGP, when u changes its current path, it notifies its current peers about the change, which may cause the peers to send advertisements to their own peers. The network reaches a stable state when no node would change its current path to the destination. If such a state is reached, the resulting state is a solution of the Stable Paths Problem (SPP). If SPP has no solution, then SPVP diverges. Figure 1 shows an example of a policy conflict leading to divergence.

2.2 Prior Solutions to BGP Divergence

In this paper, due to space restrictions, we focus only on dynamic solutions, which attempt to detect and resolve policy conflicts at run time. A complete review of previous solutions can be found in [8].

Safe Path Vector Protocol (SPVP): Griffin et al. [3] suggest extending BGP to carry additional information, called history, with each routing update message. A possible trace of SPVP for the system shown in Figure 1 is shown in Figure 2 (a). History allows each router to describe the exact sequence of events that led to the selection of a path as the best path. An event (+P) indicates that the node has chosen path P as its best path and that P is more preferred than its previous best path. Similarly, an event (−P) indicates that the node has updated its best path and that the current best path is less preferred than its previous best path P. A history containing loops is an indication of potential protocol divergence. At step 4 of Figure 2 (a), all 3 nodes have a cycle in the histories of their current best paths. SPVP assumes that such paths are problematic and therefore eliminates them. For the assumed timing of events, with SPVP the system converges to an unreachable destination for all nodes. Since a cycle in the history is a necessary but not a sufficient condition for divergence, there may be false positives. Carrying history with each update creates communication overhead and may also reveal private information about the preferences of ASes over routes. APM uses the idea of keeping track of the history of path changes, but does so only locally. Keeping histories local and avoiding the exchange of such information helps overcome the related privacy concerns and communication overhead.

Cobb and Musunuri Algorithm: Cobb et al. [4] propose an algorithm that associates an integer cost with each node and exchanges the cost with update messages. The cost increases monotonically if the system diverges; therefore, discarding the advertisements from nodes whose cost is greater than a threshold is suggested. Assuming a threshold value of 2, Figure 2 (b) shows a possible trace of the Cobb and Musunuri algorithm for the system. Since the costs of the nodes involved in the same conflict grow in tandem, all of the nodes simultaneously give up their most preferred paths and stabilize on their lowest preferred paths. A weakness of this algorithm is that it keeps per-node cost, which causes aggregation of the paths through the same node: one flapping path may cause all the alternative paths (through the same node) to be eliminated. With APM, we extend the idea of using a count to keep per-path state at each node instead of per-node state, which prevents aggregation of the paths through the same node.
Empowered with this extra information, together with probabilistic updates of path ranks, APM can pinpoint the paths causing problems and leads to fewer path eliminations.
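The SPP definitions of Section 2.1 translate directly into a few lines of code. The sketch below is our own illustration (paths as tuples, permitted paths listed most preferred first); run on the instance of Figure 1, the stability check fails for the shown assignment, consistent with the caption that no stable state is reached.

def one_hop_extensions(u, assignment, edges):
    # choices(u, pi): one-hop extensions of the neighbors' current paths.
    return [(u,) + assignment[v] for v in edges[u] if assignment[v]]

def is_stable(assignment, permitted, edges):
    # pi is stable iff every node uses its best available permitted path.
    for u in permitted:
        avail = [p for p in one_hop_extensions(u, assignment, edges)
                 if p in permitted[u]]
        best = min(avail, key=permitted[u].index) if avail else ()
        if assignment[u] != best:
            return False
    return True

# Figure 1's instance: each node prefers the longer path through its neighbor.
edges = {0: [], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
permitted = {1: [(1, 3, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)], 3: [(3, 2, 0), (3, 0)]}
assignment = {0: (0,), 1: (1, 0), 2: (2, 1, 0), 3: (3, 0)}
print(is_stable(assignment, permitted, edges))  # False: node 1 would switch to (130)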
3 Adaptive Policy Management (APM)
We propose a new algorithm to dynamically detect and eliminate policy conflicts leading to BGP divergence. The idea is to locally detect the paths involved in a conflict and to eliminate the conflict by changing the relative preference of such paths. Note that such adaptation is limited to the node's set of permitted paths, any of which the AS is willing to use, albeit at a different preference level. Each node involved in a particular conflict observes route flaps: it constantly chooses a path as its best path and later gives it up for another path. For example, in Figure 1, node 1 constantly upgrades its current best path to (130), but is later forced to give up (130) for its less preferred path (10) as a result of its neighbors' response to this upgrade. The nodes observing constant route flaps can stop such behavior by sticking to their less preferred but more stable path, even when a better alternative is advertised. This can be achieved by changing the local preference of the paths. When the node stops advertising the paths alternately, the cyclic effect of the global conflict is broken. In Figure 1, for example, if node 1 changes its local preferences to prefer (10) over (130), the system stabilizes on the following path assignment: (10)(210)(30). To be able to locally detect route flaps and the paths whose preference is causing divergence, each node needs to keep some form of local history. We suggest keeping track of the paths that have been recently selected as best path, together with counts indicating how many times each path has been chosen as best path and later given up.
Fig. 2. Possible traces of the algorithms for the system shown in Figure 1: (a) SPVP; (b) the Cobb and Musunuri algorithm assuming a count threshold of 2; (c) APM with min threshold = 2
Figure 2 (c) shows how the counts keep increasing during divergence of the system shown in Figure 1. Nodes involved in the conflict can detect divergence by comparing the counts against a threshold called min threshold. Since the algorithm we propose is distributed and based on local information only, many nodes may synchronously detect the same conflict and lower the preferences of their more preferred paths. If we assume min threshold = 2 for each node in Figure 2 (c), at step 4 all 3 nodes simultaneously change their local preferences to prefer their shorter paths, which are more stable in the sense that they are always available. Note that the conflict can be broken even if only one of the nodes performs the path rank change. To prevent this kind of simultaneous and unnecessary path preference change, we suggest changing relative preferences with probability 1/2. Because of the probabilistic adjustment of path preferences, even though the effect of a particular conflict is observed several times, it is possible that the conflict remains unresolved. max threshold is introduced to handle such cases: when the count associated with a particular path exceeds max threshold, the path is removed from the set of permitted paths and added to the bad paths set, B. The paths in this set are excluded from further consideration in the best path selection process (until they are restored as the algorithm adapts to a conflict-free state), even if they are advertised by peers and permitted by the original local policies.

Fig. 3. Phases of APM (path count vs. time; min threshold and max threshold delimit the policy conflict-free, policy conflict-avoidance, and policy conflict-control phases)

Comparing counts against min threshold and max threshold helps each node independently classify the state of the network: (a) Policy conflict-free phase: when counts are smaller than min threshold, the node assumes that there is no persistent oscillation. (b) Policy conflict-avoidance phase: if any count value exceeds min threshold but stays lower than max threshold, the node assumes that there is a policy conflict leading to persistent oscillation, which can be avoided by changing the relative preference (rank) of the paths. (c) Policy conflict-control phase: if any count exceeds max threshold, the path associated with this count is added to the set of bad paths and excluded from further consideration in the best path selection process. Figure 3 shows these three phases of our algorithm. Subsections 3.1 and 3.2 describe APM more formally.

3.1 Update Handling

Figure 4 shows the pseudo-code of our Adaptive Policy Management (APM) scheme for handling routing updates. The process runs at each node u in response to a received update. When node u chooses a path p ∈ P^u as its best path, it informs its peers by sending an update message. rib(u) indicates the current best path to the destination selected at node u.
// Update Handling
process APM_Update_Handling[u]
  receive Update m from peer w ->
    keepaliveCount(w) = 0
    if rib_in(u ⇐ w) ≠ m then
      peerStability(w)++
      rib_in(u ⇐ w) = m
    if rib(u) ≠ best_B(u) then
      P_old = rib(u); P_new = best_B(u)
      if P_new ≠ ε then
        count(P_new)++
        if count(P_new) > max threshold then
          // Policy Conflict-Control Phase
          B(u) = B(u) ∪ {P_new}
          P_new = best_B(u)
          count(Q) = 0 for each path Q ∈ localHistory
          peerStability(v) = 0 for each v ∈ peers(u)
        else if count(P_new) > min threshold then
          do with probability 1/2:
            // Policy Conflict-Avoidance Phase
            find the most preferred safe path, P_safe
            rank(P_safe) = 1
            P_new = P_safe
            count(Q) = 0 for each path Q ∈ localHistory
            peerStability(v) = 0 for each v ∈ peers(u)
      if P_new ≠ P_old then
        rib(u) = P_new
        for each v ∈ peers(u) do send rib(u) to v

Note: the code to the right of the receive arrow is assumed to be executed in one atomic step.

Fig. 4. APM: Update Handling
rib_in(u ⇐ w) indicates the most recent path sent from w ∈ peers(u) and processed at node u. The set of path choices available at node u that are considered for best path selection, excluding the bad paths in B(u), is defined as

choices_B(u) = {(u, w) rib_in(u ⇐ w) − B(u) | w ∈ peers(u)} ∩ P^u
and the best path as best_B(u) = max(u, choices_B(u)). As long as node u receives advertisements from its peers, best_B(u) is recomputed with the most recent choices_B(u). When rib(u) changes, node u notifies its peers by sending an update message. To find the stable paths, a peerStability value is associated with each peer. When a peer advertises a path different from its previously advertised path, the peerStability of the peer increases. A peerStability of 1 indicates that the path advertised by the peer has not changed over time. The paths advertised by such peers are referred to as safe paths. A node observing a route flap can stop the flap by making the safe path its most preferred path, i.e., rank(safe path) = 1. Note that the count values associated with paths in the local history cannot be used to measure the stability of peers: a path p advertised by w may have a high count value associated with it even if w never changes this advertisement. The state of the system is defined by the local state as well as the local preferences at each node.
// Keepalive Handling
process APM_Keepalive_Handling[u]
  receive keepalive from w ->
    keepaliveCount(w)++
    if keepaliveCount(v) ≥ ka threshold for every v ∈ peers(u) then
      for each v ∈ peers(u):
        r = rib_in(u ⇐ v)
        if (localpref(r) ≠ original_localpref(r)) ∨ (r ∈ B(u)) then
          do with probability 1/4:
            if r ∈ B(u) then remove r from B(u)
            localpref(r) = original_localpref(r)
            count(Q) = 0 for each path Q ∈ localHistory
            peerStability(v) = 0 for each v ∈ peers(u)
      keepaliveCount(v) = 0 for each v ∈ peers(u)
      P_new = best_B(u)
      if P_new ≠ rib(u) then
        rib(u) = P_new
        for each v ∈ peers(u) do send rib(u) to v

Note: the code to the right of the receive arrow is assumed to be executed in one atomic step.

Fig. 5. APM: Restoring Local Preferences once Stability is Reached
The state changes whenever there is a path rank change or a path is placed in B. In either case, this new state corresponds to a different SPP, possibly a stable one; therefore, the counters are reset to allow a fresh start.

3.2 Restoring Local Preferences

When the system stabilizes, only KEEPALIVE messages are exchanged between peers. Each node keeps track of the number of KEEPALIVE messages received from its peers and compares this value against a threshold, denoted ka threshold, to test the stability of the system. Figure 5 shows how node u probabilistically restores some rank changes for its paths after the system has stabilized. Since policies are put in place for a purpose by each node, such as traffic engineering or security, it is important for ASes not to change them unless they conflict with the policies of other nodes and the change is absolutely necessary to eliminate route oscillations. Although it is safe to restore rank changes that do not compromise the current stability, there is no way for node u to know which changes are safe to restore. Therefore, node u uses a probabilistic (and more conservative) approach, and risks introducing instability back into the system. Contrary to update handling, node u increases the local preference of a path with a much smaller probability, 1/4; we allow bringing paths out of B with probability 1/4 as well. If node u performs a rank change and/or removes (restores) a path from B, the counters kept in the local history are reset, because this new state corresponds to a different SPP. Note that the much smaller reset probability of 1/4 provides a conservative way of probing the network state, thus oscillating in the vicinity of a stable state at a very slow rate. This is akin to the congestion avoidance mechanism of TCP, during which the current state of the network is probed at a slower rate. We refer the reader to [8] for the convergence analysis of APM.
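For concreteness, the update-handling logic of Figure 4 can be rendered in a few lines (the keepalive path of Figure 5 is analogous). This is a sketch under our own data-structure assumptions: rank maps each path to its current preference value (lower is more preferred) and is assumed to be pre-populated, and safe_paths holds the paths of peers with peerStability = 1.

import random

class ApmNode:
    def __init__(self, min_threshold=2, max_threshold=10):
        self.min_t, self.max_t = min_threshold, max_threshold
        self.count = {}   # local history: flap count per path
        self.bad = set()  # B(u): paths excluded from selection
        self.rank = {}    # current (possibly adjusted) path ranks
        self.rib = None

    def best(self, choices):
        ok = [p for p in choices if p not in self.bad]
        return min(ok, key=lambda p: self.rank[p]) if ok else None

    def on_update(self, choices, safe_paths):
        p_old, p_new = self.rib, self.best(choices)
        if p_new is not None and p_new != p_old:
            self.count[p_new] = self.count.get(p_new, 0) + 1
            if self.count[p_new] > self.max_t:
                self.bad.add(p_new)      # policy conflict-control phase
                p_new = self.best(choices)
                self.count.clear()
            elif self.count[p_new] > self.min_t and random.random() < 0.5:
                if safe_paths:           # policy conflict-avoidance phase
                    p_safe = min(safe_paths, key=lambda p: self.rank[p])
                    self.rank[p_safe] = 1  # promote the observably stable path
                    p_new = p_safe
                    self.count.clear()
        if p_new != p_old:
            self.rib = p_new
            return True                  # caller advertises rib(u) to its peers
        return False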
4 Simulation Results

We have simulated the algorithms in the SSFNet simulator [7]. We present only one set of results and refer the reader to [8] for additional results. We have compared APM against the SPVP algorithm [3], the Cobb and Musunuri algorithm [4], and BGP4 [9]; the details of these algorithms can be found in Section 2.2. The variations of APM use different values for max threshold, namely 3 and 10; min threshold is set to 2, and ka threshold is set to 6. We have two versions of the Cobb and Musunuri algorithm, where the threshold for the node cost is set to either 3 or 10, to be consistent with the max threshold values of APM. Griffin et al. [3] suggest suppressing routes only after seeing the same policy cycle multiple times, to handle transient oscillations. In our simulations, we suppressed routes only after seeing the same policy cycle twice; this is consistent with the min threshold value we have chosen for APM. To be able to observe throughput and delay, we have used the topology shown in Figure 7. There are 7 groups of nodes: {AS 1, AS 2, AS 3}, {AS 4, AS 5, AS 6}, {AS 7, AS 8, AS 9}, {AS 10, AS 11, AS 12}, {AS 13, AS 14, AS 15}, {AS 16, AS 17, AS 18}, {AS 19, AS 20, AS 21}. Each node in a group prefers the path through its clockwise neighbor, which creates a policy conflict within each group. Permitted paths and path preferences are shown in Figure 6. The simulation is run for 350 seconds, and data flows from servers to clients for the duration of the simulation. The buffer size is 50000 bytes, and routing packets are given priority over data packets when there is congestion at the buffers.
Node  Permitted Paths          Local Preference
1     (1 2 0)                  100
      (1 0)                    80
      paths learned from 8     1
7     (7 8 1 2 0)              100
      (7 8 1 0)                100
      (7 0)                    80
      paths learned from 19    1
8     (8 9 0)                  100
      (8 1 0)                  80
      (8 1 2 0)                80
      paths learned from 20    1
9     (9 7 0)                  100
      (9 0)                    80
      paths learned from 21    1
19    (19 20 8 9 0)            100
      (19 20 8 1 0)            100
      (19 20 8 1 2 0)          100
      (19 7 0)                 80
      (19 7 8 1 2 0)           80
      (19 7 8 1 0)             80
20    (20 21 9 0)              100
      (20 21 9 7 0)            100
      (20 8 9 0)               80
      (20 8 1 0)               80
      (20 8 1 2 0)             80
21    (21 19 7 0)              100
      (21 19 7 8 1 2 0)        100
      (21 19 7 8 1 0)          100
      (21 9 0)                 80
      (21 9 7 0)               80

Fig. 6. Path Rankings for Topology Used in Simulation (only the top portion is shown; the rest is symmetric)
Fig. 7. Topology (21 ASes in seven groups of three, arranged around the destination AS 0; each AS has a client host and a server host, and AS 0 connects 15 server hosts and 15 client hosts)
The performance plots presented next show 90% confidence intervals for the metrics. Figure 8 shows the percentage of the nodes that cannot reach the destination AS 0. SPVP and the Cobb and Musunuri algorithms eliminate a high number of paths while enforcing stability, and therefore leave a higher number of nodes with an unreachable destination. The different versions of APM perform much better than SPVP and the Cobb and Musunuri algorithms. With APM, a higher max threshold value helps resolve policy conflicts through changes of path preferences, and therefore minimizes the number of path eliminations. For max threshold = 10, the system stabilizes to a state in which every node has a way to reach the destination. A higher threshold value also helps the Cobb and Musunuri algorithm achieve better performance, but the improvement is limited because of the simultaneous elimination of the paths through the same high-cost node. Figure 9 shows the results for power. This metric measures the ratio of throughput (average total number of packets delivered over the last 50 seconds) to delay (average delay of delivered packets over the last 50 seconds). The power metric captures the desire to achieve as high a throughput as possible while keeping delay as small as possible.
Fig. 8. Percentage of nodes that cannot reach the destination (curves: APM (maxth=3), APM (maxth=10), SPVP, Cobb&Musunuri (count_th=3), Cobb&Musunuri (count_th=10); x-axis: time in seconds)

Fig. 9. Power (curves: APM (maxth=3), APM (maxth=10), SPVP, BGP4, Cobb&Musunuri (count_th=3), Cobb&Musunuri (count_th=10); x-axis: time in seconds)
The different versions of APM have higher power values than SPVP and the Cobb and Musunuri algorithms because APM maximizes throughput. The difference in throughput stems both from unreachable destinations and from competition for the limited buffer size. SPVP and the Cobb and Musunuri algorithms leave a higher number of nodes with an unreachable destination (Figure 8). SPVP has the longest update messages, which take longer to process and require more memory to be stored in the buffers. Although BGP4 causes a constant exchange of updates due to divergence, its performance is better than that of SPVP and the Cobb and Musunuri algorithms. This is because BGP4 does not cause permanent path elimination, even though some packets may temporarily fail to reach the destination due to instability.
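For reference, the power metric can be computed from per-packet delivery logs as in this short sketch (our own formulation of the description above; arrivals is a hypothetical list of (delivery time, delay) pairs in seconds):

def power(arrivals, now, window=50.0):
    # power = throughput / delay over the trailing 50-second window
    recent = [(t, d) for t, d in arrivals if now - window <= t <= now]
    if not recent:
        return 0.0
    throughput = len(recent)  # packets delivered in the window
    avg_delay = sum(d for _, d in recent) / len(recent)
    return throughput / avg_delay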
5 Summary and Future Work

Unlike static centralized solutions (e.g., the algorithm of Gao et al. [2]), which may eliminate many routes unnecessarily from the start in order to guarantee stability, APM is a dynamic distributed algorithm that allows ASes to adapt to the current state of the network, whether conflict-free or potentially conflicting.
In this paper, we demonstrated the superiority of APM over other dynamic algorithms [3], [4]. Even if only some of the nodes in the network were upgraded to deploy APM, APM could still catch and resolve policy conflicts, since the algorithm is based only on local information kept at each node. However, in such heterogeneous settings, the nodes deploying APM would be the only ones that might give up their preferred paths for the sake of network stability, without knowing whether the other nodes are working toward the same purpose. As future work, incentives should be proposed to improve cooperation among ASes in deploying APM. To increase the transparency of APM, we plan to investigate allowing local AS administrators to explicitly guide the backoff and recovery probabilities for lowering and restoring the ranks of paths. With such human input, it may be possible to resolve policy conflicts more efficiently, albeit at a longer timescale. We are currently working on a prototype implementation of APM.
References

1. Govindan, R., Reddy, A.: An Analysis of Interdomain Routing Topology and Route Stability. In: Proceedings of the Conference on Computer Communications (IEEE INFOCOM), Kobe, Japan (April 1997)
2. Gao, L., Rexford, J.: Stable Internet Routing without Global Coordination. In: Proceedings of ACM SIGMETRICS, Santa Clara, CA (June 2000)
3. Griffin, T., Wilfong, G.: A Safe Path Vector Protocol. In: Proceedings of IEEE INFOCOM, Tel Aviv, Israel (March 2000)
4. Cobb, J.A., Musunuri, R.: Enforcing Convergence in Inter-Domain Routing. In: Proceedings of the IEEE Global Communications (GLOBECOM) Conference, Dallas, TX (December 2004)
5. Yilmaz, S., Matta, I.: A Randomized Solution to BGP Divergence. In: Proceedings of the 2nd IASTED International Conference on Communication and Computer Networks (CCN'04), Cambridge, MA (November 2004)
6. Jacobson, V.: Congestion Avoidance and Control. In: ACM SIGCOMM '88, Stanford, CA (August 1988) 314–329
7. SSFNet: Scalable Simulation Framework. http://www.ssfnet.org
8. Yilmaz, S., Matta, I.: An Adaptive Management Approach to Resolving Policy Conflicts. Technical Report BUCS-TR-2006-008, CS Department, Boston University (May 2006)
9. Rekhter, Y., Li, T.: A Border Gateway Protocol 4 (BGP-4). RFC 1771 (1995)
Reinforcement Learning-Based Load Shared Sequential Routing

Fariba Heidari, Shie Mannor, and Lorne G. Mason

Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada
[email protected], [email protected], [email protected]
Abstract. We consider event dependent routing algorithms for on-line explicit source routing in MPLS networks. The proposed methods are based on load shared sequential routing, in which load sharing factors are updated using learning algorithms. The learning algorithms we employ are based either on learning automata or on online learning algorithms originally devised for solving the adversarial multi-armed bandit problem. While the proposed learning algorithms are simple to implement, their blocking-probability performance compares favorably with that of other event dependent routing methods proposed for MPLS routing, such as the Success-to-the-Top algorithm. We demonstrate the convergence of one of the learning algorithms to the user equilibrium through a set of discrete event simulations.
1 Introduction
In the early days of packet switching, much attention was given to the routing problem; see [1] for an early survey. With the emergence of the Internet, destination-based IP routing was widely adopted for reasons of scalability and stability, despite the fact that destination-based routing gives the user little control over how his/her traffic is routed. This in turn means that traffic may be routed over congested links (paths) while less congested alternative paths are available. The need for better control of traffic routing, also referred to as "traffic engineering", gave rise to the Multi Protocol Label Switching (MPLS) standard. MPLS is a connection-oriented framework proposed by the IETF to enable traffic engineering, congestion management and QoS provisioning in traditional IP networks [2, 3, 4]. In the MPLS framework, constraint-based routing and label swapping replace the hop-by-hop destination-based routing mechanism used in traditional IP networks. In MPLS, route selection can employ either hop-by-hop routing or explicit routing. In the explicit routing method, a single Label Switching Router (LSR) specifies all (or some of) the hops along the path. Explicit routing gives the designer the ability to control the traffic load distribution in the network. The purpose of this work is to introduce an adaptive method for explicit source routing in MPLS networks.
Most of the algorithms proposed for traffic engineering in MPLS networks are state dependent algorithms, where traffic routing is based on information about the current status of the network [5, 6, 7]. These methods impose an information flooding overhead on the network. Event dependent routing methods, on the other hand, update their knowledge about the status of the network from the observed events. In [8] and later in [9], it was shown that the MPLS protocol and the signaling extensions for crankback permit the use of source-based dynamic routing schemes in the Internet. Such routing schemes have been widely used in TDM networks and proposed for ATM networks. Results presented in the aforementioned references demonstrate the merits of event dependent routing schemes in MPLS networks in terms of performance and scalability. In particular, the reported blocking probability performance of the Success-To-the-Top (STT) algorithm makes it a viable routing method for a realistic network. Consequently, the STT algorithm was proposed by AT&T as an event dependent routing method in AT&T's IP network. In this paper, we present an event dependent routing scheme with application to explicit source routing in MPLS networks. The proposed method is based on Load Shared Sequential Routing (LSSR), where load sharing factors are updated using learning techniques [10]. Learning automata and techniques developed for the multi-armed bandit problem are the learning algorithms used in this study. These algorithms are suitable choices in applications where the system has incomplete information about the environment and learns via trial and error. The algorithms are extremely simple to implement and computationally efficient. The application of learning automata to dynamic routing has been widely studied in different settings, such as telephone routing [11], [12], wavelength routing in WDM networks [13] and MPLS routing [14]. An application of multi-armed bandit algorithms to adaptive routing was presented in [15]. The paper is organized as follows. We present an overview of user equilibrium in Section 2. The reinforcement learning algorithms used in this work are reviewed in Section 3. The proposed routing algorithm is explained in Section 4. Simulation results for two simple network topologies are presented in Section 5. A summary and conclusions are given in Section 6.
2 User Equilibrium
The concept of Nash equilibrium is commonly used in non-cooperative strategic games. In a Nash equilibrium, there is no incentive, in terms of cost decrease, for an individual user to unilaterally change his/her strategy. Before defining a Nash equilibrium, let us define the cost structure of the model we consider in this paper. Given a network that uses LSSR-type routing, we let $\lambda^{o,d}$ represent the total traffic from origin 'o' to destination 'd'. The load sharing model assumes that for every origin-destination pair $(o-d)$, there is a load sharing vector $\alpha^{o,d}$ such that

\[
\sum_{\ell=1}^{|T(od)|} \alpha_\ell^{o,d} = 1, \qquad \text{and} \qquad \alpha_\ell^{o,d} \ge 0, \quad \ell = 1, 2, \ldots, |T(od)|,
\]

where $T(od)$ is the set of route trees from 'o' to 'd' and $\alpha_\ell^{o,d}$ is the load sharing factor of the $\ell$-th route tree from 'o' to 'd'. We further assume that, given the load sharing factors, there is a cost associated with every route tree. We let this cost be denoted by $L_\ell^{o,d}$ and note that it depends, generally in a non-linear way, on the load sharing factors of other origin-destination pairs. This cost represents the blocking probability between 'o' and 'd' via the $\ell$-th route tree. With some abuse of notation, we denote the total cost incurred from 'o' to 'd' when the load sharing factors are $\alpha_1^{o,d}, \alpha_2^{o,d}, \ldots, \alpha_{|T(od)|}^{o,d}$ by

\[
L(\alpha^{o,d}) = \sum_{\ell=1}^{|T(od)|} \alpha_\ell^{o,d} L_\ell^{o,d},
\]

and we note that $L(\alpha^{o,d})$ depends in general on all the sharing factors of all the pairs in the network. The set of load sharing factors $\alpha$ is a Nash equilibrium if and only if, for each pair 'o' and 'd', given the load sharing factors of the other pairs, there is no incentive, in terms of decreasing the cost, to change the load sharing vector $\alpha^{o,d}$ to some $\hat{\alpha}^{o,d} \ne \alpha^{o,d}$. We note that the players in this game are the pairs of nodes and not the individual nodes; that is, every origin-destination pair is assumed to act selfishly with respect to other pairs, even if the other pairs include the same origin or destination. In other words, the set of load sharing factors $\alpha$ is a Nash equilibrium solution if and only if, for every pair 'o' and 'd', given the load sharing factors of the other pairs,

\[
L(\hat{\alpha}^{o,d}) \ge L(\alpha^{o,d}) \qquad \forall \hat{\alpha}^{o,d} \ne \alpha^{o,d}.
\]

User equilibrium can also be explained in terms of the Wardrop equilibrium [16]. The Wardrop equilibrium assumes that the contribution of each user's traffic to the cost is negligible. It is therefore not surprising that, as the number of users increases to infinity with constant total traffic, the Nash equilibrium converges to the Wardrop equilibrium [17]. For the routing problem, the Wardrop equilibrium is the solution to a non-linear minimization problem described in [18]. It is straightforward to show that at the Wardrop equilibrium the cost on all the route trees that carry traffic (i.e., $\alpha_k^{o,d} > 0$) is the same. The centralized solution of the load share optimization problem is studied in [18], where both user and system optimization perspectives are considered. We refer the reader to that paper and references therein.
3 Reinforcement Learning
The reinforcement learning framework [10] considers an agent that interacts with a dynamic unknown environment and attempts to learn an optimal, or at least
reasonable, policy via a sequence of trials. At each stage $t$, based on its policy, the agent selects one of the possible actions $a_i \in A$. The agent receives a reward $x(t) \in X$, which is a measure of the desirability of the selected action. The agent may use this signal to update its policy. In the following subsections, the policy updating schemes used in two standard simple reinforcement learning approaches are explained. We start with learning automata and then discuss algorithms for the multi-armed bandit problem.

3.1 Learning Automata
Learning automata constitute one of the earliest frameworks dealing with learning optimal behavior via trial and error [19, 20]. Formally, a learning automaton is described as a quadruple $\{A, P, X, T\}$ where $A$ is the set of actions with $K = |A|$, $P$ is the probability distribution over the set of actions, $X$ is the response from the environment and $T: P \times A \times X \to P$ is the updating scheme. Based on their Markovian behavior, updating schemes are categorized as either ergodic or absorbing algorithms. Absorbing schemes converge to specific states and are more suitable in stationary environments. Ergodic schemes converge in distribution independently of the initial state and are preferred in non-stationary environments. In this work, the set of possible values for the reward signal is $X = \{0, 1\}$. One of the best-known updating schemes for this case is the following linear mapping. Let $a(t) = a_i$; then

\[
x(t) = 1: \quad p_j(t+1) = \begin{cases} (1-G)\, p_j(t) & \forall j \ne i \\ p_j(t) + G\,(1 - p_j(t)) & j = i \end{cases}
\]
\[
x(t) = 0: \quad p_j(t+1) = \begin{cases} \dfrac{B}{K-1} + (1-B)\, p_j(t) & \forall j \ne i \\ (1-B)\, p_i(t) & j = i, \end{cases}
\]

where $G$ and $B$ are parameters. The parameter $G$ represents the gain in the case of a positive reward, while the parameter $B$ represents the gain if the reward is 0. Here, we use the Linear Reward-Inaction (LRI) method as the first choice for updating load sharing factors. In the LRI method, the probability distribution is updated only when the selected action is rewarded. This can be derived from the above formula by choosing $B = 0$. The LRI method is known to be $\epsilon$-optimal [19] if the reward stream is stationary. That is, if the reward process (for each action) is stationary and one of the actions leads to a higher reward in expectation, then by choosing $G$ arbitrarily small, $P\{\lim_{t\to\infty} p_i(t) = 1\}$ can be made as close to unity as desired (where $i$ is the action leading to the highest reward). However, if $G$ is not small enough, the algorithm converges to a wrong solution with non-zero probability. Another variation for updating load sharing factors is the ergodic Linear Reward-Penalty (LRP) method. The LRP method is derived from the above formula by choosing $B \ll G$. LRP is sub-optimal in comparison with the LRI algorithm in stationary environments. However, it avoids being locked in a state
and is a better choice for non-stationary environments. For the LRP algorithm, a large value of $G$ will result in a large variance of the steady-state distribution.
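To make the update rules concrete, the following Python sketch implements the linear updating scheme above; it is purely illustrative (Python is not used in the paper), and the function name and flat-list representation of $P$ are our own choices.

```python
def update_probs(p, i, reward, G=0.001, B=0.0):
    """One step of the linear learning-automaton update.

    p      -- current action probabilities (sums to 1)
    i      -- index of the action selected at this stage
    reward -- environment response x(t) in {0, 1}
    G, B   -- reward and penalty gains; B = 0 gives LRI,
              0 < B << G gives LRP.
    """
    K = len(p)
    q = list(p)
    if reward == 1:
        # Reward: move probability mass toward the selected action.
        for j in range(K):
            q[j] = (1.0 - G) * p[j]
        q[i] = p[i] + G * (1.0 - p[i])
    else:
        # Penalty (a no-op when B = 0, i.e., LRI): spread mass
        # away from the selected action.
        for j in range(K):
            q[j] = B / (K - 1) + (1.0 - B) * p[j]
        q[i] = (1.0 - B) * p[i]
    return q

# Example: three route trees, uniform start, a rewarded choice of tree 0.
probs = [1 / 3, 1 / 3, 1 / 3]
probs = update_probs(probs, i=0, reward=1)
assert abs(sum(probs) - 1.0) < 1e-12  # the update preserves a distribution
```

Both branches of the update preserve the total probability mass, so the vector remains a valid distribution after every step.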
3.2 Algorithms for the Multi-armed Bandit Problem
The multi-armed bandit problem is the problem of learning the optimal action-selection policy through a sequence of trials and errors. The variant of the multi-armed bandit problem we consider is the so-called adversarial multi-armed bandit problem. The reward process is non-stochastic and is assumed to be generated by Nature or even by an adversary. The reward associated with the action selected at the $t$-th trial may depend on the actions selected in previous trials. This formulation is reminiscent of online learning problems and leads to performance guarantees even when stationarity assumptions cannot be made. This is especially useful when considering multi-agent learning where each agent follows a regret minimizing algorithm; see [21] and references therein. Exponential weighting-type algorithms were proposed to solve the multi-armed bandit problem. We used an exponential weighting algorithm proposed to solve the problem in the non-stochastic case ([22]). In real networks, the blocking probability of the links depends on their traffic load and on the routing policies of the different $(o-d)$ pairs. This makes the EXP3.P algorithm a suitable candidate for updating load sharing factors. A pseudocode of the EXP3.P algorithm, taken from [22], is as follows:

Algorithm 1. Algorithm EXP3.P
1. Parameters: $\alpha > 0$ and $\gamma \in (0, 1]$.
2. Initialization: For $i = 1, \ldots, K$ set $w_i(1) = \exp\left(\frac{\alpha\gamma}{3}\sqrt{\frac{T}{K}}\right)$.
At each stage $t = 1, \ldots, T$:
(a) For $i = 1, \ldots, K$ set
\[
p_i(t) = (1-\gamma)\,\frac{w_i(t)}{\sum_{j=1}^{K} w_j(t)} + \frac{\gamma}{K}.
\]
(b) Choose $a(t) = a_i$ randomly according to the distribution $p_1(t), \ldots, p_K(t)$.
(c) Receive reward $x_i(t) \in [0, 1]$.
(d) For $j = 1, \ldots, K$ set
\[
\hat{x}_j(t) = \begin{cases} \dfrac{x_j(t)}{p_j(t)} & j = i \\ 0 & \text{otherwise,} \end{cases}
\qquad
w_j(t+1) = w_j(t)\,\exp\left(\frac{\gamma}{3K}\left(\hat{x}_j(t) + \frac{\alpha}{p_j(t)\sqrt{KT}}\right)\right).
\]
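A direct Python transcription of the pseudocode may make it easier to read; this is our own illustrative sketch, not code from the paper, and the horizon T and the reward source are assumed to be supplied by the caller.

```python
import math
import random

def exp3p(K, T, reward_of, alpha=1.0, gamma=0.1):
    """Run EXP3.P for T stages over K actions.

    reward_of(i, t) must return the reward x_i(t) in [0, 1] of the
    chosen action i at stage t (only the chosen arm is observed).
    Returns the final sampling distribution p.
    """
    w = [math.exp((alpha * gamma / 3.0) * math.sqrt(T / K))] * K
    p = [1.0 / K] * K
    for t in range(1, T + 1):
        total = sum(w)
        p = [(1.0 - gamma) * wi / total + gamma / K for wi in w]
        i = random.choices(range(K), weights=p)[0]
        x = reward_of(i, t)
        for j in range(K):
            x_hat = (x / p[j]) if j == i else 0.0
            # Weights grow monotonically; a production version would
            # rescale them periodically to avoid overflow.
            w[j] *= math.exp((gamma / (3.0 * K)) *
                             (x_hat + alpha / (p[j] * math.sqrt(K * T))))
    return p

# Toy example: arm 1 pays off most often, so its probability should grow.
dist = exp3p(K=3, T=1000,
             reward_of=lambda i, t: float(random.random() < [0.3, 0.7, 0.4][i]))
```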
The parameters $\alpha$ and $\gamma$ control the amount of exploration done by the algorithm initially ($\alpha$) and persistently ($\gamma$). The choice of these parameters governs the difference between the cumulative reward of the optimal action in hindsight (consistently choosing the best action as if it were known) and the cumulative reward obtained by the learning algorithm. By selecting the parameters appropriately, the per-round value of this difference can be made to go to zero as the number of stages goes to infinity.
4 A Reinforcement Learning Approach to Load Shared Sequential Routing
Load Shared Sequential Routing (LSSR) randomly partitions the traffic load $\lambda^{o,d}$ associated with an origin-destination pair $(o-d)$ into $|T(od)|$ sub-streams using a set of load sharing factors $\{\alpha_1^{o,d}, \ldots, \alpha_{|T(od)|}^{o,d}\}$. Each sub-stream is then offered to a route tree, which consists of one or more alternate paths. The number and the order of the alternate paths can differ from one route tree to another. The alternate paths of the selected route tree are tried sequentially. If there is not enough bandwidth available on at least one link of a path, an MPLS notification message is sent to the origin node and the origin node forwards the request to the next alternate path. Sending the notification messages can be done using the extensions of constraint-based routing using the label distribution protocol ([23]) and the resource reservation protocol ([24]). This process is repeated until the requested bandwidth is allocated on one alternate path or all alternate paths have been tried unsuccessfully. If all paths have been tried unsuccessfully, the request is lost and rejected from the network. A pictorial representation of the LSSR model is provided in Fig. 1. The LSSR model imposes no restriction on the load sharing factors other than non-negativity and $\sum_k \alpha_k^{o,d} = 1$.
Fig. 1. LSSR Model: Different paths between O and D may be tried
In Reinforcement Learning-based Load Shared Sequential Routing (abbreviated RL-based LSSR), the load sharing factors $\alpha_k^{o,d}$ are updated using learning techniques. Here, the set of actions is the set of available route trees, the response from the environment is the $\{0, 1\}$ rejected/accepted feedback from the network, and the probability distribution over the set of actions is the set of load sharing factors $\{\alpha_k^{o,d}\}$.
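The mapping from the learning automaton of Section 3 to LSSR is mechanical: sample a route tree from the load sharing factors, try its paths in order, and feed the accept/reject outcome back into the update rule. The sketch below is our own illustration; `has_capacity` and `reserve` are hypothetical stand-ins for the MPLS signaling machinery, and `update_probs` is the function from the earlier sketch.

```python
import random

def route_request(route_trees, alpha, bw, has_capacity, reserve):
    """Route one bandwidth request under LRI-based LSSR.

    route_trees -- list of route trees; each is an ordered list of paths
    alpha       -- load sharing factors (one per route tree)
    bw          -- requested bandwidth (trunks)
    Returns the updated load sharing factors.
    """
    k = random.choices(range(len(route_trees)), weights=alpha)[0]
    accepted = 0
    for path in route_trees[k]:     # paths are tried sequentially; a blocked
        if has_capacity(path, bw):  # path triggers crankback to the origin,
            reserve(path, bw)       # which then tries the next alternate
            accepted = 1
            break
    # Accept/reject is the {0,1} reward; LRI reacts only to acceptance.
    return update_probs(alpha, k, accepted, G=0.001, B=0.0)
```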
5 Simulation Results
In this section, the performance of the RL-based LSSR algorithm is compared with that of another event dependent routing algorithm, Success-to-the-Top (see the Introduction for references). STT is a decentralized on-line routing algorithm with a random updating scheme. In this algorithm, the bandwidth request between an $(o-d)$ pair is first sent through the primary path. In the case there is a direct link between 'o' and 'd', the primary path is the direct path. If the request is blocked on this path, it is sent through the last successful secondary path. If the bandwidth request is blocked on both the primary path and the last successful secondary path, another alternate path is selected at random and the request is forwarded through it. The algorithm allows a maximum of $N$ crankbacks. If the request is accepted on one of the alternate paths, that path is labelled as the last successful path and will be used in routing the next bandwidth request between this $(o-d)$ pair. As a measure of performance, we have used an estimate of the overall blocking probability. The blocking probability is estimated using exponential smoothing; the following calculation is done recursively whenever a bandwidth request is received:

\[
R = S R + (1 - S)(1 - X).
\]

Here, $R$ is the bandwidth request blocking probability estimate, $X$ takes the value 1 when the current bandwidth request is accepted and 0 otherwise, and $S$ is the smoothing constant. The larger $S$ is, the slower $R$ converges and the smoother the convergence curve appears. In our experiments $S$ is set to 0.9999. The confidence intervals used in this set of simulations are at the 90% level. In the first experiment a fully connected 4-node network is used. Between each $(o-d)$ pair, there are 5 route trees. The first route tree includes only the direct path; the second and third route trees each include the direct path and one of the two-hop alternate paths. The last two route trees include the direct path and both two-hop alternate paths, with the alternate paths in different orders. The capacity of each of the unidirectional links is 50 trunks and the traffic on each $(o-d)$ pair is 45 Erlangs. Sessions arrive according to a Poisson process and holding times are exponentially distributed. All bandwidth requests are equal to 1 trunk. For the learning automaton model, the parameter $G$ is set to 0.001 and the parameter $B$ to $0.1G$. Discrete event simulations are performed using OPNET Modeler 11.5.
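For comparison purposes, a compact sketch of the STT selection logic as described above may be useful. It is our own rendering under one plausible reading of the crankback budget, not AT&T's implementation; `admit` is a hypothetical helper that attempts the reservation, as in the previous sketch.

```python
import random

def stt_route(od, bw, primary, last_success, alternates, admit, N=3):
    """Success-to-the-Top: try the primary path, then the last successful
    secondary path, then random alternates, up to N crankbacks.

    last_success is a dict mapping (o, d) -> last successful path.
    Returns True if the request was accepted.
    """
    tried = 0
    for path in (primary, last_success.get(od)):
        if path is None:
            continue
        if admit(path, bw):
            last_success[od] = path
            return True
        tried += 1                  # each blocked attempt is one crankback
        if tried >= N:
            return False
    while tried < N:
        path = random.choice(alternates)
        tried += 1
        if admit(path, bw):
            last_success[od] = path
            return True
    return False
```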
A comparison of RL-based LSSR using different learning algorithms is illustrated in Fig. 2. The results of LRI-based LSSR simulation shows that load sharing factors converge to the user equilibrium solution of the problem as can be seen in Fig. 3. This is not the case for the LRP and EXP3.P algorithm and these algorithms do not necessarily converge to user equilibrium. In EXP3.P algorithm, the probability of selecting each action is lower bounded by a small value. Still, all the learning algorithms behave more or less the same in terms of performance. In the next experiment, a more realistic non-symmetrical network with 9 nodes is used. The capacity and offered traffic matrices are listed in Appendix A. Here again, capacities are expressed in number of trunks (bandwidth units) and the offered traffic is in Erlangs. In order to have a reasonable comparison between STT and RL-based LSSR, the same set of alternate paths is used for RL-based LSSR and STT. Each route-tree has one direct link and three alternate paths and the maximum number of crank backs for STT is equal to three. The total blocking probability of RL-based LSSR and STT are compared with each other in Fig. 4. As can be seen in the figure, there is not much difference in performance of the different learning based algorithms. However, RL-based LSSR algorithms have better performance in terms of blocking probability in comparison with STT. O−D:1−3
0.14
LReP
RT5
0.6
0.1
0.5 Load Sharing Factors
Total Blocking Probability
0.12
0.7
LRI LReP EXP3.P
LRI
0.08
0.06 EXP3.P 0.04
RT4 0.4 RT1 RT2 RT3 RT4 RT5
0.3 RT2 0.2 RT1
0.02
0.1 RT3
0
0
500
1000
1500 Time(sec)
2000
2500
3000
Fig. 2. Performance comparison in 4Node network
0
0
100
200
300 400 500 Number of Calls x 10
600
700
800
Fig. 3. Load Sharing Factors of 1-3 in 4-Node network using LRI learning
In the last set of simulations, the performance of the STT and RL-based LSSR algorithms is compared in the case where there is a link failure in the system. As can be seen in the results (Fig. 5), STT has the highest blocking probability both before and after the change has occurred. However, none of these algorithms reacts fast enough to be used for link failure recovery, and other restoration schemes, such as the method presented in [25], can be used to improve the performance of the system.
[Figures 4 and 5 plot the total blocking probability over time for LRI, LRP, EXP3.P and STT.]

Fig. 4. Performance comparison in the 9-Node network

Fig. 5. Performance comparison when a link fails at T = 1800 seconds
6 Discussion and Conclusion
In this paper, adaptive on-line routing algorithms for explicit source routing were presented. The proposed algorithmic framework (called RL-based LSSR) is based on load shared sequential routing and uses reinforcement learning techniques to update the load sharing factors. We considered three learning algorithms: LRI, LRP and EXP3.P. The LRI algorithm is a suitable choice for stationary environments, and with a suitable choice of learning parameter, the resulting load sharing factors converge to the user equilibrium solution. The LRP and EXP3.P algorithms are better choices for non-stationary environments. The EXP3.P algorithm has additional performance guarantees for worst cases where the reward generation may be adversarial. In real networks, the blocking probability is a function of the traffic load and the routing policy of the different $(o-d)$ pairs, which makes EXP3.P a suitable choice for routing in such environments. The performance of RL-based LSSR was compared with STT, another event dependent routing algorithm also proposed for routing in MPLS networks. The discrete event simulation results in some example networks show that RL-based LSSR compares favorably with STT in terms of network blocking probability. RL-based LSSR algorithms track smooth changes in the traffic pattern; however, they are not fast enough for link failure recovery. One future area of research is improving the response time to abrupt changes such as link failures. The learning algorithms that were presented are rather simple. While this simplicity has the advantage of low computational cost and low informational needs, one may consider more complex algorithms that take the state of the network into account. Of course, since the complete state of the network is not known to each node, some reasoning in terms of the uncertainty of the state estimate is needed. The advantage of such state dependent schemes may lie in their ability to synchronize the different nodes as well as to detect abnormal traffic patterns.
In this paper, the analytical formulation of the user equilibrium load sharing factors in the LSSR model was also reviewed, and it was shown that in the user equilibrium solution, for each $(o-d)$ pair, the traffic loss probabilities of all used route trees are equal and no greater than the traffic loss probabilities of the other route trees. The simulation results presented in Section 5 confirm that, with a suitable choice of learning parameter, LRI-based LSSR converges to the user equilibrium solution.
Acknowledgements. This work was supported in part by NSERC Strategic Project Grant STPGP 269449 03 and by the Canada Research Chairs Program. The authors thank Hanhui Zhang for supplying code and results that were useful in the research reported here.
References
[1] Mason, L.G.: Equilibrium Flows, Routing Patterns and Algorithms for Store-and-Forward Networks. Journal of Large Scale Systems 8 (1985) 187-209
[2] Rosen, E., Viswanathan, A., Callon, R.: Multiprotocol Label Switching Architecture. RFC 3031 (2001)
[3] Awduche, D.O.: MPLS and Traffic Engineering in IP Networks. IEEE Communications Magazine 37 (1999) 42-47
[4] LeFaucheur, F., Lai, W.: Requirements for Support of Differentiated Services-aware MPLS Traffic Engineering. RFC 3564 (2003)
[5] Kar, K., Kodialam, M., Lakshman, T.: Minimum Interference Routing of Bandwidth Guaranteed Tunnels with MPLS Traffic Engineering Applications. IEEE Journal on Selected Areas in Communications 18 (2000) 2566-2579
[6] Suri, S., Waldvogel, M., Bauer, D., Warkhede, P.R.: Profile-Based Routing and Traffic Engineering. Computer Communications 26 (2003) 351-365
[7] Szeto, W., Boutaba, R., Iraqi, Y.: Dynamic Online Routing Algorithm for MPLS Traffic Engineering. Lecture Notes in Computer Science (2002) 936-946
[8] Ash, G.R.: Performance Evaluation of QoS-Routing Methods for IP-based Multiservice Networks. Computer Communications 26 (2003) 817-833
[9] Ash, G.R.: Traffic Engineering and QoS Optimization of Integrated Voice & Data Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2006)
[10] Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998)
[11] Narendra, K.S., Wright, E.A., Mason, L.G.: Application of Learning Automata to Telephone Traffic Routing and Control. IEEE Trans. on Systems, Man and Cybernetics 7 (1977) 785-792
[12] Brunet, G.: Optimisation de l'acheminement séquentiel non hiérarchique par automates intelligents. Master's thesis, INRS-Télécommunications (1991)
[13] Alyatama, A.: Dynamic Routing and Wavelength Assignment Using Learning Automata Technique [All Optical Networks]. In: Proceedings of IEEE GLOBECOM (2004) 1912-1917
[14] Zhang, H.: Simulation of Learning Automata Load Shared Sequential Routing in MPLS Networks. Master's Project Report, Electrical and Computer Engineering, McGill University (2003)
[15] György, A., Ottucsák, G.: Adaptive Routing Using Expert Advice. The Computer Journal 49 (2006) 180-189
[16] Wardrop, J.G.: Some Theoretical Aspects of Road Traffic Research. In: Proceedings of the Institution of Civil Engineers, Part II (1952) 325-378
[17] Haurie, A.B., Marcotte, P.: On the Relationship between Nash-Cournot and Wardrop Equilibria. Networks 15 (1985) 295-308
[18] Brunet, G., Heidari, F., Mason, L.G.: Load Shared Sequential Routing in MPLS Networks: System and User Optimal Solutions. EuroFGI Net-COOP, to appear (2007)
[19] Narendra, K.S., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice Hall, Upper Saddle River, NJ, USA (1989)
[20] Lakshmivarahan, S.: Learning Algorithms: Theory and Applications. Springer, New York (1981)
[21] Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press (2006)
[22] Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing 32 (2002) 48-77
[23] Jamoussi, B., Andersson, L., Callon, R., Dantu, R., Wu, L., Doolan, P., Worster, T., Feldman, N., Fredette, A., Girish, M., Gray, E., Heinanen, J., Kilty, T., Malis, A.: Constraint-Based LSP Setup Using LDP. RFC 3212 (2002)
[24] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., Swallow, G.: RSVP-TE: Extensions to RSVP for LSP Tunnels. RFC 3209 (2001)
[25] Qin, Y., Mason, L.G., Jia, K.: Study on a Joint Multiple Layer Restoration Scheme for IP over WDM Networks. IEEE Network Magazine 17 (2003) 43-48
A Appendix

Table 1. Traffic Matrix of 9-Node Network

0     8672  12000 0     0     2976  13056 5728  4304
9552  0     2848  40480 4576  0     1696  752   704
12976 1376  0     17152 12832 1376  2000  1952  1472
0     5872  19504 0     0     3200  5504  2576  3152
0     8176  18000 0     0     5072  5472  1776  976
3680  0     1104  4848  1728  0     4848  4448  672
13856 3280  2224  5072  6624  6352  0     4304  3472
8224  1824  4928  2704  3776  4480  5024  0     5024
5104  0     0     2400  2304  1232  1056  3200  0
Table 2. Capacity Matrix of 9-Node Network

0     7000  15000 0     0     5000  10000 5000  8000
9000  0     3000  6000  7000  0     3000  2000  3000
13000 4000  0     24000 20000 3000  4000  4000  5000
0     6000  24000 0     0     4000  6000  3000  5000
0     6000  15000 0     0     5000  9000  6000  5000
6000  0     4000  4000  6000  0     9000  4000  3000
22000 2000  3000  7000  7000  7000  0     4000  3000
9000  4000  4000  6000  5000  7000  6000  0     10000
5000  3000  8000  4000  5000  2000  5000  8000  0
An Adaptive Neuron AQM for a Stable Internet

Jinsheng Sun and Moshe Zukerman

The ARC Special Research Centre for Ultra-Broadband Information Networks, Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria 3010, Australia
{j.sun,m.zukerman}@ee.unimelb.edu.au
Abstract. Recognizing that Internet congestion control is a complex nonlinear system, we propose to use an intelligent controller to improve its stability and performance. In particular, we propose a new, powerful, easy-to-configure and robust active queue management (AQM) scheme called adaptive neuron AQM (AN-AQM). We present extensive simulation results for AN-AQM, over a wide range of network conditions and scenarios, that demonstrate its attributes. We demonstrate its robustness in various realistic environments involving bursty HTTP connections and non-responsive UDP connections. Comparison with other AQM schemes demonstrates the superiority of AN-AQM over well-known AQM schemes in achieving faster convergence to the queue length target and smaller queue length jitter.

Keywords: Congestion control, Active queue management, Neuron, AQM.
1 Introduction
Internet congestion control aims to achieve efficient resource utilization, acceptable packet loss and stable operation. It is based on two parts: 1) the end-to-end Transmission Control Protocol (TCP) and 2) buffer management. The traditional algorithm for buffer management is Drop-Tail, which drops packets only when the buffer overflows. This passive behavior may create a synchronized process alternating between periods of excessive packet loss and under-utilization, which in turn results in unacceptable queueing delay, inefficient (low) link utilization and instability. To mitigate such problems, active queue management (AQM) has been introduced [2] to improve performance by actively triggering packet drops before the buffer overflows. AQM has been a very active research area, and many AQM schemes have been proposed (see for example [1,4,5,6,7,8,10,11,12,13,14,15,16,19,20] and references therein). However, the many published AQM proposals fail to achieve optimal congestion control because they use fixed parameters, which is inadequate given that the network state varies with time. Accordingly, an intelligent AQM controller is required for the Internet, which is complex, highly nonlinear and time varying.
In this paper, a novel neuron-based AQM scheme is proposed, called Adaptive Neuron AQM (AN-AQM). We apply the ideas of [3,22], where an adaptive neuron PID controller is designed for a multi-model plant. Extensive simulation results over a wide range of scenarios show that AN-AQM can control the queue length and achieves fast queue-length convergence to a desirable target. These performance attributes are maintained following significant changes to network conditions, even for long-delay networks. We also demonstrate by simulations that AN-AQM is more efficient and stable than other well-known AQM schemes. The remainder of the paper is organized as follows. In Section 2, we describe the AN-AQM scheme in detail. Then, in Section 3, we present simulation results to demonstrate that AN-AQM is effective, robust and outperforms other well-known AQM schemes. Finally, we conclude the paper in Section 4.
2 The AN-AQM Scheme
The AN-AQM scheme can be described by the following equation:

\[
p(k) = p(k-1) + \Delta p(k) \tag{1}
\]

where $p(k)$ is the packet dropping probability and $\Delta p(k)$ is the increment of the packet dropping probability given by an adaptive neuron:

\[
\Delta p(k) = K \sum_{i=1}^{6} w_i(k)\, x_i(k) \tag{2}
\]

where $K > 0$ is the neuron proportional coefficient, $x_i(k)$ $(i = 1, 2, \ldots, 6)$ denote the neuron inputs, and $w_i(k)$ is the connection weight of $x_i(k)$ determined by the learning rule. Let

\[
e(k) = q(k) - Q_T \tag{3}
\]

denote the queue length error, where $q(k)$ is the queue length and $Q_T$ is the target queue length. Let

\[
\gamma(k) = \frac{r(k)}{C} - 1 \tag{4}
\]

denote the normalized rate error, where $r(k)$ is the input rate of the buffer at the bottleneck link and $C$ is the capacity of the bottleneck link. The inputs of the AN-AQM scheme are $x_1(k) = e(k) - e(k-1)$, $x_2(k) = e(k)$, $x_3(k) = e(k) - 2e(k-1) + e(k-2)$, $x_4(k) = \gamma(k) - \gamma(k-1)$, $x_5(k) = \gamma(k)$, and $x_6(k) = \gamma(k) - 2\gamma(k-1) + \gamma(k-2)$. According to Hebb [3], the learning rule of a neuron is formulated by

\[
w_i(k+1) = w_i(k) + d_i\, y_i(k) \tag{5}
\]

where $d_i > 0$ is the learning rate and $y_i(k)$ is the learning strategy. The associative learning strategy given in [3] is as follows:

\[
y_i(k) = e(k)\, p(k)\, x_i(k). \tag{6}
\]
where $e(k)$ is used as the teacher's signal. This implies that an adaptive neuron, which integrates Hebbian learning and supervised learning, acts and reacts to the unknown environment through associative search: the neuron self-organizes the surrounding information under the supervision of the teacher's signal $e(k)$, which also acts as a critic on the neuron's actions. The AN-AQM scheme is based on the following nine parameters:

1. Sampling time interval $T$; an appropriate value is $T = 0.001$ s.
2. Target queue length $Q_T$; this parameter decides the steady-state queue length, which affects the utilization and the average queueing delay. A high target will improve link utilization, but will increase the queueing delay. The target queue length $Q_T$ should be selected according to the Quality of Service (QoS) requirements.
3. The neuron proportional coefficient $K$; a suggested value is $K = 0.01$.
4. The learning rate $d_1$; a suggested value is $d_1 = 0.00001$.
5. The learning rate $d_2$; a suggested value is $d_2 = 0.00001$.
6. The learning rate $d_3$; a suggested value is $d_3 = 0.00001$.
7. The learning rate $d_4$; a suggested value is $d_4 = 0.0001$.
8. The learning rate $d_5$; a suggested value is $d_5 = 0.0001$.
9. The learning rate $d_6$; a suggested value is $d_6 = 0.0001$.

The above values of $K$ and $d_i$ $(i = 1, 2, \ldots, 6)$ were chosen by trial and error over a few simulations. Nevertheless, Zhang et al. [22] showed that an adaptive neuron system is very robust and adaptable, so the choice of the values of $K$ and $d_i$ $(i = 1, 2, \ldots, 6)$ is not that critical. This is also confirmed by the simulation results presented in Section 3.9. The initial values of $w_i$ $(i = 1, 2, \ldots, 6)$ do not affect the performance significantly; we use $w_i = 0.00001$ $(i = 1, 2, \ldots, 6)$.
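Equations (1)-(6) translate directly into a per-sample update loop. The sketch below is our own illustrative Python rendering of that loop with the suggested parameter values; the queue and rate measurements are assumed to be supplied by the router at each sampling instant, and the clipping of the drop probability to [0, 1] is an implementation detail not spelled out in the paper.

```python
class ANAQM:
    """Adaptive Neuron AQM controller, one instance per bottleneck queue."""

    def __init__(self, q_target, capacity, K=0.01,
                 d=(1e-5, 1e-5, 1e-5, 1e-4, 1e-4, 1e-4)):
        self.QT, self.C, self.K, self.d = q_target, capacity, K, d
        self.w = [1e-5] * 6          # initial connection weights w_i
        self.p = 0.0                 # drop probability p(k)
        self.e = [0.0, 0.0]          # e(k-1), e(k-2)
        self.g = [0.0, 0.0]          # gamma(k-1), gamma(k-2)

    def sample(self, qlen, in_rate):
        """Called every T seconds with the queue length and input rate."""
        e = qlen - self.QT                      # (3) queue length error
        g = in_rate / self.C - 1.0              # (4) normalized rate error
        x = [e - self.e[0], e, e - 2 * self.e[0] + self.e[1],
             g - self.g[0], g, g - 2 * self.g[0] + self.g[1]]
        dp = self.K * sum(wi * xi for wi, xi in zip(self.w, x))   # (2)
        self.p = min(1.0, max(0.0, self.p + dp))   # (1), clipped (assumption)
        for i in range(6):                         # (5)-(6) Hebbian update
            self.w[i] += self.d[i] * e * self.p * x[i]
        self.e = [e, self.e[0]]
        self.g = [g, self.g[0]]
        return self.p    # drop each arriving packet with probability p
```

Note that, as the paper emphasizes later, the controller runs once per sampling interval rather than once per packet, so its per-packet cost is a single comparison against the stored drop probability.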
3 Performance Evaluation and Comparison
In this section, we conduct extensive simulations using ns2 [9] to demonstrate the performance attributes of AN-AQM and its superiority over other AQM schemes such as ARED [5], PI [7] and REM [1]. Our simulations and comparisons cover the following attributes:

1. the ability to control the queue length so that it quickly converges to a given queue length target;
2. robustness to traffic loading under fixed and dynamic scenarios, to bottleneck link capacity, and to the impact of traffic noise (UDP and HTTP);
3. robustness to Round Trip Propagation Time (RTPT) and effectiveness for long delay networks;
4. performance under a multiple bottleneck topology.

Many of the above attributes of AN-AQM are the result of its first attribute, namely, stabilizing the queue length at a target value $Q_T$. As mentioned above, if we can control the queue to stay close to a desirable target, we can achieve high throughput, predictable delay and low delay jitter. The low delay jitter also enables meeting QoS requirements for real time services, especially when the queue length target is achieved independently of traffic conditions [21].
3.1 Single Bottleneck Topology
The single bottleneck network topology used in the simulation is shown in Figure 1. The only bottleneck link is the Common Link between the two routers. The other links are assumed to have sufficient capacity to carry their traffic. The sources use TCP/Reno. In the following simulations, unless mentioned otherwise, the following parameters are used: the packet size is 1000 bytes, the common link capacity is 45 Mb/s, the round trip propagation delay is 80 ms, and the buffer size is 900 packets (twice the bandwidth-delay product of the network). The TCP connections always have data to send as long as their congestion windows permit. The receiver's advertised window size is set sufficiently large so that TCP connections are not constrained at the destination. The ack-every-packet strategy is used at the TCP receivers. The target queue length is set at 300 packets.

[Figure: sources A1, A2, ..., An connect through Router B, the Common Link and Router C to destinations D1, D2, ..., Dn.]

Fig. 1. The single bottleneck topology
3.2 Performance for Constant Number of TCP Connections
In this simulation experiment, we test whether AN-AQM can control and stabilize the queue length at an arbitrarily chosen target for different loads and link capacities. Figure 2 presents the instantaneous queue lengths for 300 and 1500 TCP connections. All sources start data transmission at time 0. We can see that AN-AQM is effective at stabilizing and keeping the queue length around the target $Q_T$. In order to further demonstrate this ability of AN-AQM, we set $Q_T$ at 50 and 500 for 800 TCP connections. The results are depicted in Figure 3. Again, we can see that AN-AQM is indeed successful in controlling the queue length at any arbitrarily chosen target. In order to test the performance of AN-AQM for different link capacities, we vary the capacity from 45 Mb/s to 15 Mb/s and to 115 Mb/s while the other parameters remain the same. The simulation results for 800 TCP connections are given in Figure 4. Again, we can see that the queue length is stable.

3.3 Performance for Different Round Trip Propagation Time
We now investigate the impact of round trip propagation time (RTPT) on the performance indices. Two simulations have been performed. In both there are
800 TCP connections with different RTPTs. In the first, the RTPTs are uniformly distributed between 20 and 140 ms, and in the second, they are uniformly distributed between 150 and 250 ms. Figure 5 presents the queue lengths for these two simulations. The results demonstrate that AN-AQM is still effective at stabilizing the queue length around the target for TCP connections having different RTPTs.

[Figures 2-5 plot the instantaneous queue length (packets) against time (sec).]

Fig. 2. Queue length variations for different numbers of greedy TCP connections: (a) 300 TCP connections (b) 1500 TCP connections

Fig. 3. Queue length variations for the following queue length targets: (a) QT = 50 (b) QT = 500

Fig. 4. Queue length variations for the following link capacities: (a) 15 Mb/s (b) 115 Mb/s

Fig. 5. Queue length variations in the following scenarios: (a) RTPTs uniformly distributed between 20 and 140 ms (b) RTPTs uniformly distributed between 150 and 250 ms

3.4 Performance for TCP Connections That Randomly Start and Stop
In this experiment, we investigate the impact of TCP connections that randomly start and stop on the performance indices. We first present results of two simulation runs where we dynamically vary the number of active TCP connections. The number of TCP connections is varied from 500 to 1500 in the first and from 1500 to 500 in the second. In each of the runs, a group of 100 connections starts (or stops) at the same time at each 10-second interval. The instantaneous queue lengths are plotted in Figure 6. We can clearly see that AN-AQM is able to stabilize the queue length around the control target when the number of connections dynamically varies over time.
[Figures 6 and 7 plot the instantaneous queue length (packets) against time (sec).]

Fig. 6. Queue length variations for two cases: (a) Number of TCP connections varies from 500 to 1500 (b) Number of TCP connections varies from 1500 to 500

Fig. 7. Queue length variations under varying number of TCP connections: (a) Number of TCP connections varies from 300 to 1500 (b) Number of TCP connections varies from 1500 to 300
Next, we perform two simulations involving random start and stop times, thus simulating staggered connection setup and termination. In the first, the initial number of connections is set to 300 and, in addition, 1200 connections have their start-time uniformly distributed over a period of 100 seconds. In the
second simulation, the initial number of connections is set to 1500, out of which 1200 connections have their stop-times uniformly distributed over a period of 100 seconds. The instantaneous queue lengths are plotted in Figure 7. We can clearly see that AN-AQM is able to stabilize the queue length around the control target.

3.5 Performance for Long Delay Networks
Simulations in [11] have reported that AQMs such as PI, RED and REM are unstable when the RTPT is 400 ms. In this experiment, we investigate the performance of AN-AQM for a long delay network. In the simulation, there are 800 TCP connections, and the RTPTs are 500 ms. Figure 8 presents the queue length for AN-AQM. The results demonstrate that AN-AQM is still effective in stabilizing the queue length around the target for TCP connections with long RTPTs.

Fig. 8. Queue length variations for 800 TCP connections with an RTPT of 500 ms
3.6 Performance for TCP Connections Mixed with Both HTTP and UDP Connections
In this simulation experiment, we investigate the performance impact of the disturbances caused by HTTP as well as UDP connections. We have considered 800 TCP connections with different RTPTs. These RTPTs are uniformly distributed between 50 and 500 ms. The bursty HTTP traffic involves 400 sessions (connections), and the number of pages per session is 250. The RTPTs of the
HTTP connections are uniformly distributed between 50 and 300 ms. There are 400 UDP flows with propagation delays uniformly distributed between 30 and 250 ms. Each of the UDP sources follows an exponential ON/OFF traffic model; both the idle and the burst times have a mean of 500 ms. The packet size is set at 500 bytes, and the sending rate during on-time is 64 kb/s. Figure 9 provides the queue lengths, which demonstrate that AN-AQM is robust to the disturbances caused by HTTP as well as UDP connections.

Fig. 9. Queue length variations for 800 TCP connections in the presence of additional UDP and HTTP flows

3.7 Multiple Bottlenecks
Here we extend the simple single bottleneck topology to a case of multiple bottlenecks. We consider the network topology presented in Figure 10. There are two bottlenecks in this topology: one between Routers B and C, and the other between Routers D and E. The link capacity of the two bottlenecks is 45 Mb/s and the capacity of the other links is 100 Mb/s. There are three traffic groups. The first group has N TCP connections traversing all bottleneck links, the second group has N1 TCP connections traversing the bottleneck link between Routers B and C, and the third group has N2 TCP connections traversing the bottleneck link between Routers D and E. The RTPTs of the first group are 80 ms, and for the second and third groups they are 100 ms and 150 ms, respectively. Two simulation tests have been performed. In the first, N = 500, N1 = 200, and N2 = 200; in the second, N = 100, N1 = 800, and N2 = 400. Figure 11 presents the queue lengths for these two cases. The results demonstrate that AN-AQM is effective in stabilizing the queue length around the target for TCP connections in a multiple bottleneck network.

[Figure: Routers A-F in tandem; TCP sender groups 1-3 feed 100 Mb/s access links, with 45 Mb/s bottleneck links between Routers B-C and D-E leading to the corresponding TCP sink groups.]

Fig. 10. The multiple bottleneck network topology
3.8 Comparison with Other AQMs
In this section, we perform simulations to compare the performance of AN-AQM with ARED [5], the PI controller [7], and REM [1]. The network topology used in the simulation is the same as in Figure 1. The target queue length is set at 300 packets for all AQM algorithms. For ARED, we set the parameters minth = 15, maxth = 585 and wq = 0.002; the other parameters are set as in [5]: α = 0.01, β = 0.9, interval time = 0.5 s. For the PI controller, we use the default
parameters in ns2: a = 0.00001822, b = 0.00001816 and the sampling frequency w = 170. For REM, the default parameters of [1] are used: φ = 1.001, γ = 0.001. The parameters of AN-AQM are set as specified in the previous section. In the simulation, the initial number of connections is set to 300, and 1200 additional connections have their start-times uniformly distributed over a period of 100 seconds. Figure 12 presents the queue lengths for all four AQMs. We can see that AN-AQM reacts and converges to the target queue occupancy of 300 faster than the other three AQMs. In order to evaluate the steady-state performance, we calculate the average and the standard deviation of the queue length over the last 150 seconds. The results are presented in Table 1. We observe that the AN-AQM queue length has a mean of 298.9, which is the closest to the target of 300, and achieves the lowest standard deviation of all AQMs.

[Figures 11 and 12 plot the instantaneous queue length (packets) against time (sec).]

Fig. 11. Queue length variations in the multiple bottleneck network: (a) Router B for N = 500, N1 = 200, and N2 = 200 (b) Router D for N = 500, N1 = 200, and N2 = 200 (c) Router B for N = 100, N1 = 800, and N2 = 400 (d) Router D for N = 100, N1 = 800, and N2 = 400

Fig. 12. Comparison of queue length variations: (a) AN-AQM (b) ARED (c) PI (d) REM

Table 1. Mean and standard deviation of the queue length for various AQMs

                     AN-AQM   ARED    REM     PI
Mean                  298.9   322.2   321.0   285.3
Standard deviation      2.3    15.6    71.2    50.5

3.9 Comments on Parameter Robustness and Complexity
We have examined the robustness and sensitivity of AN-AQM to parameter settings. To this end, we performed a set of simulations, again using the single bottleneck network topology shown in Figure 1. In each simulation experiment, we fixed all parameters except one, and measured the sensitivity and performance of AN-AQM resulting from changing that single parameter. The results, which are presented in the extended version of this paper [18], demonstrate that AN-AQM is robust to misconfiguration of the neuron proportional coefficient and of the learning rate gains, as it achieves stability in all cases in a similar way. A thorough computational analysis of AN-AQM and its comparison with other AQMs is planned for future study. However, since AN-AQM performs its computation every sampling interval and not on every packet (as, for example, RED does), we do not expect the computational complexity to be an impediment.
4 Conclusion
We have introduced a novel AQM scheme called AN-AQM. We have demonstrated by simulations that AN-AQM is able to maintain the queue length around a given target under different traffic loads and scenarios, different RTPTs, and different bottleneck link capacities. The numerous simulation results have also demonstrated that AN-AQM is powerful, easy to configure, and robust to bursty HTTP connections and non-responsive UDP connections. Comparison with other well-known AQM schemes has demonstrated the superiority of AN-AQM in achieving faster convergence to the target queue length, and then maintaining the queue length closest to the target.

Acknowledgments. This work was jointly supported by grants from the Australian Research Council (Grant DP0559131) and the Natural Science Foundation of Jiangsu Province, China (No. BK2004132).
References
1. Athuraliya, S., Low, S.H., Li, V.H., Yin, Q.: REM: Active Queue Management. IEEE Network Magazine 15 (2001) 48-53
2. Braden, B., et al.: Recommendations on Queue Management and Congestion Avoidance in the Internet. IETF RFC 2309 (1998)
3. Du, Y., Wang, N.: A PID Controller with Neuron Tuning Parameters for Multi-model Plants. Proceedings of the 2004 International Conference on Machine Learning and Cybernetics 6 (2004) 3408-3411
4. Feng, W., Kandlur, D., Saha, D., Shin, K.: The Blue Queue Management Algorithms. IEEE/ACM Transactions on Networking 10 (2002) 513-528
5. Floyd, S., Gummadi, R., Shenker, S.: Adaptive RED: An Algorithm for Increasing the Robustness of RED's Active Queue Management. http://www.icir.org/floyd/red.html (2001)
6. Floyd, S., Jacobson, V.: Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Trans. Networking 1 (1993) 397-413
7. Hollot, C.V., Misra, V., Towsley, D., Gong, W.: On Designing Improved Controllers for AQM Routers Supporting TCP Flows. Proceedings of IEEE INFOCOM 2001 3 (2001) 1726-1734
8. Kunniyur, S.S., Srikant, R.: An Adaptive Virtual Queue (AVQ) Algorithm for Active Queue Management. IEEE/ACM Transactions on Networking 12 (2004) 286-299
9. The NS Simulator and the Documentation, http://www.isi.edu/nsnam/ns/
10. Ranjan, P., Abed, E.H., La, R.J.: Nonlinear Instabilities in TCP-RED. IEEE/ACM Transactions on Networking 12 (2004) 1079-1092
11. Ren, F., Lin, C., Wei, B.: A Robust Active Queue Management Algorithm in Large Delay Networks. Computer Communications 28 (2005) 485-493
12. Sun, J., Chen, G., Ko, K.T., Chan, S., Zukerman, M.: PD-Controller: A New Active Queue Management Scheme. Proceedings of IEEE Globecom 6 (2003) 3103-3107
13. Sun, J., Ko, K.T., Chen, G., Chan, S., Zukerman, M.: PD-RED: To Improve the Performance of RED. IEEE Communications Letters 7 (2003) 406-408
14. Sun, J., Chan, S., Ko, K.T., Chen, G., Zukerman, M.: Neuron PID: A Robust AQM Scheme. Proceedings of ATNAC 2006 (2006) 259-262
15. Sun, J., Zukerman, M., Palaniswami, M.: Stabilizing RED Using a Fuzzy Controller. Proceedings of ICC 2007 (2007)
16. Sun, J., Zukerman, M.: Improving RED by a Neuron Controller. Proceedings of ITC20 (2007)
17. Sun, J., Zukerman, M.: RaQ: A Robust Active Queue Management Scheme Based on Rate and Queue Length. To appear in Computer Communications (2007)
18. Sun, J., Zukerman, M.: An Adaptive Neuron AQM for a Stable Internet (extended version). Unpublished report, available online: http://www.ee.unimelb.edu.au/staff/mzu/AN AQM extended version.pdf
19. Wydrowski, B., Zukerman, M.: GREEN: An Active Queue Management Algorithm for a Self Managed Internet. Proceedings of ICC 2002 4 (2002) 2368-2372
20. Wang, C., Li, B., Hou, Y.T., Sohraby, K., Long, K.: A Stable Rate-based Algorithm for Active Queue Management. Computer Communications 28 (2005) 1731-1740
21. Wydrowski, B., Zukerman, M.: QoS in Best-effort Networks. IEEE Communications Magazine 40 (2002) 44-49
22. Zhang, J., Wang, S., Wang, N.: A New Intelligent Coordination Control System for a Unit Power Plant. Proceedings of the 3rd World Congress on Intelligent Control and Automation (2000) 313-317
Light-Weight Control of Non-responsive Traffic with Low Buffer Requirements

Venkatesh Ramaswamy¹, Leticia Cuéllar¹, Stephan Eidenbenz¹, Nicolas Hengartner¹, Christoph Ambühl², and Birgitta Weber³,⋆

¹ CCS-3, Los Alamos National Laboratory, USA
{vramaswa,leticia,eidenben,nickh}@lanl.gov
² University of Liverpool, Liverpool, Great Britain
[email protected]
³ Unilever R&D, Port Sunlight, UK
[email protected]

⋆ This work was done while the author was working for the Swiss Federal Institute of Technology, Zurich.
Abstract. We propose ESREQM (Efficient Sending Rate Estimation Queue Management), a novel active queue management scheme that achieves almost perfect max-min fairness among flows with minimum (constant) per-flow state and a constant number of CPU operations to handle an incoming packet using a single queue with very low buffer requirements. ESREQM estimates sending rates of flows through a history discounting process that allows it to guarantee max-min fairness by automatically adapting parameters. It can also be used to punish non-responsive flows. The per-flow state is limited to a single value per flow, which allows the flow memory to be in SRAM, thereby making packet processing scalable with link speeds. ESREQM results in good link utilization with low buffer size requirements because it provably desynchronizes TCP flows as a by-product. We show our results through a mixture of analysis and simulation. Our scheme does not make assumptions on what transport protocols are used.
1 Introduction
Congestion in a network occurs when the sum of demand on any resource in the network is greater than its available capacity. Most common resources in a network are link bandwidth, network processors and buffer space. Congestion, if left uncontrolled, will lead to a situation known as congestion collapse, a deadly situation in which the throughput of the network approaches zero and the packet delays approach infinity. The key measures of performance of a congestion control scheme are (i) fairness in resource allocation, (ii) efficiency in resource utilization (in particular link utilization and buffer requirements1 ), and (iii) computational
1
This work was done while the author was working for Swiss Federal Institute of Technology, Zurich. Small buffers requirements are desirable as traditional rules of thumb result in huge memory requirements for high-speed routers.
I.F. Akyildiz et al. (Eds.): NETWORKING 2007, LNCS 4479, pp. 855–866, 2007. c IFIP International Federation for Information Processing 2007
856
V. Ramaswamy et al.
efficiency2 . We propose an Active Queue Management (AQM)-scheme called Efficient Sending Rate Estimation Queue Management (ESREQM) that optimizes all three measures while previous work addresses at most two of the three measures. In addition, ESREQM has the desirable features of allowing punishment of unresponsive flows and of straight-forward maximum buffer size provisioning. Congestion control schemes can be implemented at the source or at the forwarding router. When congestion control is implemented at the source, the source is expected to curtail its sending rate when it detects congestion in the network. In the TCP congestion control algorithm, a source will reduce its sending rate when it detects congestion in the form of packet loss, ECN or source quench [1]. Congestion control at the source is effective only if all the sources in the network implement a common congestion control algorithm. Unfortunately, for a heterogeneous network such as the Internet, the TCP assumption that all the sources will implement a common congestion control algorithm is naive: network nodes are owned and operated by a multitude of commercial (and governmental) entities that might choose to optimize the throughput of their own traffic in a selfish manner without regard for network-wide optimization. While TCP congestion control has worked well in practice for a while, recent advances in streaming applications have led to increased use of UDP as the transport protocol. The unresponsive nature of UDP makes it TCP-unfriendly: a UDP flow can starve a TCP flow by taking all available bandwidth [1]. Because the malicious UDP flow is sending at a very high rate, the TCP flow gets very low throughput. Reallife occurrences of TCP starvation have been reported with increasing intensity [1]. Source-based congestion control cannot guarantee fair resource allocation. Assuming it is used in combination with reasonably configured FIFO drop-tail routers, source-based congestion control scores high on the other two metrics: resource utilization and computational efficiency. Congestion control at the router can be classified into two main groups scheduling algorithms and queue management algorithm [2]. Scheduling algorithms usually require a separate queue for each flow at the output port of the router. Each queue is then serviced in some predetermined order, usually in a round-robin way. Well-known scheduling algorithms are Weighted Fair Queueing (WFQ) and its variants such as Worst-case Fair Weighted Fair Queueing (WF2 Q), Self-clocked Fair Queueing (SCFQ) and Stochastic Fair Queueing (SFQ) [1]. Scheduling algorithms can typically guarantee quality of service in terms of throughput and delay [1]. As the number of flows increases, maintaining per-flow queue becomes computationally prohibitive. In terms of congestion control measures, scheduling algorithms provide fairness and efficient resource utilization, but they are not computationally efficient. Queue management algorithms such as Random Early Drop, CHOKe, or Stochastic Fair Blue (SFB) decide upon packet arrival whether to drop the incoming packet and usually maintain a single FIFO queue that is shared by all flows [2]. These schemes are usually computationally efficient as they do not 2
² We can define computational efficiency in terms of the CPU cycles used and memory required by the scheme.
need to maintain multiple queues, and most can be configured to achieve good resource utilization for bandwidth and buffer size, but they do not guarantee fairness among flows [2].

The key idea of our scheme, ESREQM, is that dropping decisions are primarily based on the sending rates of traffic flows and only secondarily depend on the aggregate queue length, which represents a departure from the traditional AQM paradigm of dropping almost exclusively based on aggregate queue length. The implementation overhead of ESREQM is much lower than that of scheduling schemes such as WFQ, because it needs only a single queue in contrast to the per-flow queues of WFQ. ESREQM requires storing only the bare minimum state needed to guarantee fairness, which allows the flow memory to be Static Random Access Memory (SRAM) instead of Dynamic Random Access Memory (DRAM), thereby making packet processing scalable with link speeds [3]. These features make ESREQM very attractive compared to other similar schemes.

We describe the conceptual approach to estimating sending rates in Section 2. In Section 3 we develop ESREQM. After discussing control parameter settings in Section 4, we present simulation results in Section 5 that validate our fairness claims. Section 6 describes ESREQM's buffer requirements: it turns out that the buffer can be provisioned according to a simple function of the link bandwidth and the number of flows. ESREQM has very small buffer requirements, which can be explained by the de-synchronization of TCP flows that ESREQM achieves as a side effect. We conclude in Section 7.
2 Conceptual Design
Most queue management schemes at the router either do not allocate bandwidth fairly to the flows or do not scale. While current approaches base packet dropping decisions on the aggregate queue size, we believe that the key to fair bandwidth allocation is to base dropping decisions on the characteristics of each individual traffic flow as well as on the instantaneous aggregate queue length. In this work we propose a novel queue management scheme called Sending Rate Estimation based Queue Management (SREQM) that drops a packet of flow j arriving at time t with a probability Pj(t) that depends on an estimate of the relative sending rate of flow j at time t and on the aggregate queue size at time t. That is,

Pj(t) = f(Hj(t), Q+(t)),  (1)
where Hj(t) is an estimate of the sending rate of flow j at time t, Q+(t) is the instantaneous queue length, and f(Hj(t), Q+(t)) is a function of the estimated sending rate of flow j and the instantaneous queue length. This approach does not require per-flow queues; it only requires bookkeeping of the sending rate estimates. In order to enforce max-min fairness with bounded errors, a queueing scheme must maintain per-flow state [4]. We maintain the minimum per-flow state information, in the form of the relative sending rate of each flow. Estimating the relative sending rates of flows can be very complex in terms of computation and memory. In [5] we systematically develop a scalable estimator
easily amenable to high-speed implementations. Here we define the estimator without providing the details of its development. Without loss of generality, assume that time is divided into discrete time slots with a single packet from one of the flows in each time slot. A flow has a unique identifier given by the four-tuple (source address, source port, destination address, destination port). The relative sending rate of each flow can be estimated by an exponentially weighted filter Hj(·), given by

Hj(t) = (1 − 1/T) Hj(t−1) + 1/T   if the packet at time slot t is from flow j,
Hj(t) = (1 − 1/T) Hj(t−1)         otherwise,

where T is a parameter corresponding to the number of packets of history considered for estimation [5]. Hj(·) is an exponential smoother in which the most recent observation is weighted by 1 − 1/T, the second most recent by (1 − 1/T)², and so on. This makes Hj(·) adapt easily to changes in the sending rates of flows. Moreover, we only need to keep track of a single number for each flow, whereas most other estimators require us to keep track of some form of history for each flow. Since Hj(·) estimates the relative sending rate of flow j, the sum of the relative sending rates of all the flows should be one. Indeed, we can show that

Σ_j Hj(t) = 1.  (2)
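As a concrete illustration, the per-packet update can be written as the following minimal sketch. This is not code from the paper: the dictionary-based bookkeeping and the function name are our assumptions, and an actual high-speed implementation would keep per-flow counters in SRAM rather than update every entry on each arrival (the scalable estimator is developed in [5]).

```python
# Minimal sketch of the exponentially weighted sending-rate estimator H_j.

def update_estimates(H, flow_id, T):
    """One time slot: a packet of `flow_id` arrives; update all estimates."""
    decay = 1.0 - 1.0 / T
    for f in H:                                   # H_f(t) = (1 - 1/T) H_f(t-1) ...
        H[f] *= decay
    H[flow_id] = H.get(flow_id, 0.0) + 1.0 / T    # ... + 1/T for the arriving flow

H = {}
for pkt_flow in ["a", "a", "b", "a", "c", "a"]:
    update_estimates(H, pkt_flow, T=400)
print(H, sum(H.values()))   # the estimates converge so that their sum -> 1, as in (2)
```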
Our estimation procedure readily extends to the continuous case. Suppose that each flow is a realization of a Poisson process with parameter λj. For n flows arriving at the router, we can show that

E[Hj(t)] = (λj / Σ_{i=1}^n λi) (1 − e^{−(Σ_{i=1}^n λi) t / T}),  (3)

and at steady state,

E[Hj(t)] = λj / Σ_{i=1}^n λi,  (4)

which is the relative sending rate of flow j. In the following section, we describe an algorithm that guarantees fairness and follows directly from this conceptual design.
3 Efficient Sending Rate Estimate-Based Queue Management Scheme: An Algorithm for Efficiency and Fairness
Most queue management schemes either do not allocate bandwidth fairly or achieve fair bandwidth allocation only at high complexity. While most queueing schemes
drop packets based on the average queue size, SREQM estimates the relative sending rate of flows using the estimator Hj(·) and uses this estimate, along with the instantaneous aggregate queue length, to drop packets from each flow. The pseudo-code for the whole procedure is given in Algorithm 1.

Algorithm 1. ESREQM::onPacketArrival(packet P)
1:  x ⇐ flow id of packet P
2:  update the sending rate estimate Hj(·) of each flow
3:  if (Hx ≤ K) then
4:      add packet P to the queue
5:  else
6:      drop packet P
7:  end if
8:  if (count > 0) then
9:      count−−
10: else
11:     if (queue size < qmin) then
12:         K++
13:         count ⇐ F
14:     end if
15:     if (queue size > qmax) then
16:         K−−
17:         count ⇐ F
18:     end if
19: end if
The algorithm works as follows. When a packet arrives, extract the flow id of the packet and update the estimate Hj(·) for all flows. If the value of Hj(·) for the flow from which the packet arrived is greater than K, drop the packet; otherwise, add the packet to the queue. The parameter K is called the fair-share parameter; it represents the maximum share of the bandwidth a flow can get. Since flows need not be restricted when there is no congestion, we change the value of K dynamically to reflect changes in the characteristics of the incoming traffic as well as the level of congestion. This change is governed by the current queue size. If the queue size is larger than some maximum threshold qmax, which is an indication of congestion, the value of K is decreased by one, restricting the sending rates of flows. Likewise, when the current queue size is below some minimum threshold qmin, which is an indication of low link utilization, the value of K is increased by one, allowing flows to come in at a faster rate. To ensure a smooth variation of K, the update of K is performed at most once every F packet arrivals. The parameter F is called the congestion parameter and represents how fast the system responds to congestion.
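The following hypothetical sketch ties Algorithm 1 together with the estimator sketch above (it reuses update_estimates). The class layout, parameter defaults, and the scaling of the comparison are our assumptions: since the paper reports that K converges to T/n while Hx is a relative rate in [0, 1], we read the fair-share test as comparing the estimate scaled by T against K.

```python
# Hypothetical sketch of Algorithm 1 (ESREQM::onPacketArrival).

class ESREQM:
    def __init__(self, T=400, K=50, F=200, qmin=20, qmax=80):
        self.H = {}                     # per-flow relative sending-rate estimates
        self.T, self.K, self.F = T, K, F
        self.qmin, self.qmax = qmin, qmax
        self.count = 0
        self.queue = []

    def on_packet_arrival(self, pkt):
        x = pkt["flow_id"]
        update_estimates(self.H, x, self.T)   # line 2 of Algorithm 1
        if self.H[x] * self.T <= self.K:      # lines 3-7: fair-share test
            self.queue.append(pkt)
        # else: the packet is dropped

        if self.count > 0:                    # lines 8-19: adapt K at most
            self.count -= 1                   # once per F packet arrivals
        else:
            if len(self.queue) < self.qmin:   # low utilization: relax K
                self.K += 1
                self.count = self.F
            if len(self.queue) > self.qmax:   # congestion: tighten K
                self.K -= 1
                self.count = self.F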
4 Setting the Parameters of ESREQM
ESREQM has three main parameters: the history parameter T, the fair-share parameter K, and the congestion parameter F. Setting the parameters to the
right values is crucial for the proper functioning of the algorithm. In this section, we give some engineering guidelines for setting the parameters of the algorithm.

The effective estimation of the relative sending rate depends on appropriately selecting the value of the history parameter T. To get a feel for how the selection of T matters, consider two extreme cases, T = 1 and T = ∞. For T = 1, the estimation is not correct, as Hj(·) can take only two values, one or zero. For T = ∞, Hj(t) = Hj(t − 1) and, assuming Hj(0) = 0, Hj(·) will always be zero, making the estimate incorrect. The case T = ∞ is counter-intuitive: the higher the value of T, the larger the history, and we would expect the estimate to be better. It can be shown that as T goes up the variance decreases, but the bias increases, thereby making the estimate unacceptable. A value of 400 for the parameter T yields good performance [1].

The value of the parameter K varies dynamically based on the level of congestion. The initial value of K does not have a major impact on the performance of the algorithm; our simulations suggested that K finally converges to T/n, where n is the number of flows. The initial value of K used in our simulations was 50.

The congestion parameter F determines how fast we change K. It also determines the level of penalty that we impose on flows that send at a rate higher than their fair share: the higher the value of F, the larger the penalty. Various simulation results [1] show that a value of approximately 200 for the parameter F results in fair bandwidth allocation.
5 Experimental Results
In order to demonstrate the effectiveness of our scheme in achieving fairness, we perform several simulation studies using the network simulator ns-2. There are many definitions of fairness; here we use one of the most common, max-min fairness [1]. We conduct studies on a dumbbell topology, as shown in Figure 1. In the dumbbell topology there are n sources and n destinations, numbered S1 to Sn and D1 to Dn, respectively. Two routers, R1 and R2, connect the sources to their destinations. Each of the links is assigned a bandwidth of 10 Mbps, which means that the bottleneck link has n times less bandwidth than the combined bandwidth of the access links. We illustrate the effectiveness of our scheme using an example experiment with 32 flows, of which 31 are TCP flows and one is a UDP flow with a sending rate of 1 Mbps. The bottleneck link bandwidth is 10 Mbps, so the UDP flow is sending at a much higher rate than its fair share. Figure 2 shows the average throughput of all 32 flows over a period of 500 sec in the form of a box plot. We can see that all the flows get approximately the same bandwidth, which is close to their theoretical fair share; the UDP flow, which was sending at a rate much higher than its fair share, was brought down close to it. More experiments with other topologies, such as the parking lot topology, are given in [1].
We now discuss the conclusions that can be drawn from the simulation studies. ESREQM approximates max-min fairness in all the scenarios considered. The level of fairness is much better than that of many other schemes such as RED, CHOKe, SFB, or FRED. While some of the other schemes maintain no per-flow state information at all, we maintain only the minimum state information required to guarantee max-min fairness. ESREQM does not require precise parameter tuning, and the max-min bandwidth allocation degrades only very gradually when the parameters are far out of range [1].
Fig. 1. Dumbbell Topology

Fig. 2. Average Throughput of 32 flows (link share in Mbps vs. flow ID)

6 Buffer Requirements
Accurately sizing the buffer is one of the most challenging parts of designing a queueing scheme [6]. Small buffers can cause excessive packet losses, and large buffers can lead to unacceptable packet delays. The buffer requirements of a queueing scheme are heavily influenced by its packet dropping mechanism. Buffers in today's routers are provisioned based on a rule of thumb by Villamizar et al. [7], which says that the amount of buffer required is the end-to-end round trip time (RTT) times the link bandwidth (c). This rule, though developed in the early 1990s, is still widely accepted. It was developed to keep a congested link busy 100% of the time, assuming very few TCP flows traversing a router with a drop-tail queue. Based on this rule, today's routers, which operate at speeds of around 50 Gbps, would require 50 Gbps × 250 ms (a typical RTT) = 12.5 Gbits of memory. Router buffers are usually built using SRAMs or DRAMs; SRAMs are fast and expensive, whereas DRAMs are slow and cheap. Building 12.5 Gb of router buffer from SRAM would, by current standards, require approximately 300 SRAM chips on a board, making the board too big and too hot. If we use DRAM instead, the slow access speed makes a parallel implementation necessary, complicating the design [6]. Future routers will be required to operate at much higher speeds, making buffer design an even harder problem. There is a clear need for queue management schemes with low buffer requirements.
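A back-of-the-envelope check of these numbers follows (our sketch; the 50 Gbps rate and 250 ms RTT are the example values from the text, and the small-buffer rule is the one of Appenzeller et al. discussed below):

```python
import math

C = 50e9      # link bandwidth in bits per second (example from the text)
RTT = 250e-3  # typical end-to-end round trip time in seconds

B = C * RTT                      # rule of thumb of Villamizar et al.: B = RTT x c
print(B / 1e9)                   # -> 12.5 Gbits, as quoted above

n = 10_000                       # flows through a core router (our example value)
print(B / math.sqrt(n) / 1e9)    # Appenzeller et al.: B = c*RTT/sqrt(n) -> 0.125 Gbit
```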
Fig. 3. Average queue length over time with 16 TCP flows

Fig. 4. Average queue length over time with 32 TCP flows

Fig. 5. Average queue length over time with 64 TCP flows

(Each figure plots Q+(t)/n versus time in seconds for the indicated number of sources.)
Appenzeller et al. [6] showed that with a large number of flows and a drop-tail buffer, the buffer requirement can actually come down: it can be as low as c·RTT/√n, where n is the number of flows, RTT is the end-to-end round trip time, and c is the link bandwidth. Gorinsky et al. [8] showed by simulation that the above result does not hold when the number of flows is on the order of 100, which would be the case in slower access links serving fewer connections.

In order to get some understanding of the buffer requirements under our scheme, we perform simulation studies on a congested router that implements ESREQM. Two sets of experiments are performed, one with only TCP traffic and the other with both TCP and UDP traffic (UDP traffic less than 10% of the total traffic). In the first set of experiments, only TCP traffic is considered, and we vary the number of TCP flows while scaling the maximum queue capacity proportionally. Let Q+(t) be the queue size at time t and n the number of flows. Figures 3, 4 and 5 plot the evolution of the queue size divided by the number of flows, Q+(t)/n, when the number of flows is 16, 32 and 64, respectively. Surprisingly, the variations in Q+(t)/n decrease as we increase the number of flows; when the number of flows approaches infinity, Q+(t)/n approaches a constant. Our estimation of the required buffer size is based on this observation. Define Qi(t) to be the number of packets from flow i in the queue at time t, so that Σ_i Qi(t) = Q+(t). Assume the following conditions hold:
– (Q1(t), · · · , Qn(t)) are exchangeable³ [1]
– Cov(Q1(t), Q2(t)) = c·n^{−γ}, where c is a constant and γ ≥ 1.

With the above assumptions, the central limit theorem [1] can be applied and we can write

P( (Q+(t) − E[Q+(t)]) / (√n σ) ≤ x ) ≈ Φ(x),  (5)
³ Any size-k subset of the n random variables has the same joint distribution as (Q1(t), · · · , Qk(t)).
Table 1. Statistics of Qi(t) for different numbers of flows (n)

n     Average (pkts)   Std. deviation
8     4.250            1.1077
16    4.217            0.7605
32    4.209            0.5388
64    4.222            0.4545
128   4.2452           0.3423
where Φ(·) denotes the standard normal distribution function and σ is the standard deviation of Qi(t). To empirically verify that the central limit theorem holds, we want to ascertain whether the distribution of Qi(t) is Gaussian. A common technique to test whether a data set comes from a Gaussian distribution is to plot the observed quantiles versus the theoretical quantiles in a quantile-quantile (Q-Q) plot. The Q-Q plot of Qi(t) for n = 128 is given in Figure 8. The plot is approximately linear, indicating that Qi(t) is approximately Gaussian. Define Bα to be the buffer size such that the probability that the queue size Q+(t) is less than or equal to Bα is α. For the specific case of α = 0.999999, we can compute

B_0.999999 = n·μ̂ + √n·σ̂·√(12 log 10),  (6)

where μ̂ is the sample mean and σ̂ is the sample standard deviation of Qi(t). Table 1 summarizes the statistics of Qi(t) for different numbers of flows. Using Table 1, we can compute B_0.999999 for ESREQM with 32 flows as 135 KB. This means that for ESREQM the probability of exceeding a buffer size of 135 KB with 32 flows is less than 10⁻⁶. Note that with the same settings, a drop-tail queue would require 500 KB.

To investigate the effect of link bandwidth, we kept both the number of flows and the parameters of the algorithm constant while varying the link capacity. We observed an increase in the average sending rate of flows, with Qi(t) remaining the same. In other words, the change in the link bandwidth did not affect the average queue size, and our estimate of Q+(t) remains valid. Note that in all our experiments the link utilization was 100%. We still see the convergence when there are both UDP and TCP flows. The average queue size for the case of 64 TCP flows and one UDP flow, as well as the case of 128 TCP flows and one UDP flow, is shown in Figure 6; we can observe that the variation in the average queue size is lower in the case with 128 TCP flows. The convergence of the average queue length was previously observed in RED routers, but only under very special conditions on the dropping function and with TCP flows only [9]. In our scheme the convergence happens without any special conditions and also with mixed TCP and UDP traffic.
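The provisioning rule in equation (6) is easy to evaluate from Table 1. The sketch below is ours: we read "log" in (6) as the natural logarithm, since √(12 ln 10) ≈ 5.26 matches the 1 − 10⁻⁶ quantile of the standard normal, and the 960-byte packet size comes from the setup in Section 5.

```python
import math

def buffer_bytes(n, mu_hat, sigma_hat, pkt_bytes=960):
    """B_0.999999 = n*mu + sqrt(n)*sigma*sqrt(12 ln 10), converted to bytes."""
    z = math.sqrt(12 * math.log(10))    # ~5.26, approx. Phi^-1(1 - 1e-6)
    pkts = n * mu_hat + math.sqrt(n) * sigma_hat * z
    return pkts * pkt_bytes

# n = 32 flows, sample mean 4.209 pkts, sample std 0.5388 pkts (Table 1):
print(buffer_bytes(32, 4.209, 0.5388) / 1e3)   # ~145 KB, the same order of
                                               # magnitude as the ~135 KB above
```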
Fig. 6. Average queue length over time with 1 UDP flow (rate 300 Kbps) and 64 or 128 TCP flows

Fig. 7. Aggregate congestion window Σ_{i=1}^n W̃i(t) for synchronized TCP flows and for TCP flows after passing through our queueing scheme
The two main conclusions that we can draw from the simulation results presented in this section are as follows. First, ESREQM requires much lower memory than most other queueing schemes for the same performance under similar network conditions. Second, for a given number of flows, with ESREQM we can accurately estimate the amount of buffer needed, which is very important for the optimal performance of any queue management scheme.

6.1 An Explanation for the Low Buffer Requirement
An intuitive explanation for the low buffer requirements of our queueing scheme is presented in this section. The congestion window process of a TCP flow typically has the form of a saw-tooth waveform [10]. Define W̃i(t) to be the process describing the congestion window of flow i, and Q+(t) the aggregate queue length at time t. The queueing process [10] at the router can be described as

Q+(t + 1) = Q+(t) + Σ_{i=1}^n W̃i(t) − cτ,  (7)

where c is the rate at which the router serves packets and τ is the average end-to-end round trip time of each flow; τ = τp + τq, where τp is the average propagation delay and τq is the average queueing delay. Since τq = Q+(t)/c, the above equation can be rewritten as

Q+(t + 1) = Q+(t) + Σ_{i=1}^n W̃i(t) − c(τp + Q+(t)/c) = Σ_{i=1}^n W̃i(t) − cτp.  (8)

It is obvious from the above equation that the maximum queue size is reached when all the flows are synchronized, and that it depends on the maximum congestion window of each flow. When there are many synchronized TCP flows, each with a "saw-tooth" congestion window, the effective congestion window also has the form of a saw-tooth. This is a consequence of the fact that adding many synchronous saw-tooth waveforms yields a single large saw-tooth. For synchronous TCP flows, this phenomenon is illustrated in [10].
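The smoothing effect of de-synchronization can be seen with a toy computation (ours, not from the paper): summing n identical sawtooths in phase keeps a sawtooth with peak n times the per-flow peak, while random phases flatten the sum.

```python
import random

def sawtooth(t, period=100.0, peak=20.0):
    return peak * ((t % period) / period)

n, horizon = 64, 400
sync = [n * sawtooth(t) for t in range(horizon)]                  # all in phase
phases = [random.uniform(0, 100.0) for _ in range(n)]
desync = [sum(sawtooth(t + p) for p in phases) for t in range(horizon)]

print(max(sync), max(desync))   # the desynchronized peak is far below n * peak
```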
Fig. 8. Q-Q plot of Qi(t) when n = 128 (sample quantiles vs. theoretical quantiles)
Previous studies have shown that even though two TCP flows may start at different times, they quickly become synchronized in real networks when the routers employ drop-tail or RED [6]. Therefore, from (8), we expect no improvement in the buffer size requirement when there are many synchronized TCP flows. With our queueing scheme, the TCP flows get de-synchronized quickly, because the dropping criterion for packets of a flow does not depend solely on the other flows (the dropping is a function of both Hi(t) and Q+(t), not just Q+(t)). If we add together many de-synchronized saw-tooth waveforms, their sum is unlikely to look like a saw-tooth, as they smooth each other out, and we get a waveform with less variation. The same happens when we add many de-synchronized congestion window processes: the resulting window process has much less variation, and the peak of the effective window process comes down, which results in a much smaller buffer requirement. This is illustrated in Figure 7, where we plot the sum of the congestion windows, Σ_{i=1}^n W̃i(t), of all the flows, both when the flows are synchronized (as in the case of drop-tail) and when the flows traverse a congested router implementing ESREQM. Since the flows get de-synchronized with ESREQM, the maximum of Σ_{i=1}^n W̃i(t) is much lower, resulting in a low buffer requirement.
7 Conclusion

In this paper, the problem of allocating max-min fair bandwidth to flows at a congested router is addressed. We presented the architecture and algorithms, along with simulation results, of a simple scalable scheme that provides approximate max-min fair bandwidth allocation. The buffer requirements of the scheme are shown to be much lower than those of conventional routers, which is a great advantage from an implementation perspective. We showed by a combination of analysis and simulation that our scheme performs well simultaneously in terms of three key measures: (i) fairness in resource allocation, (ii) efficiency in resource utilization, and (iii) computational efficiency.
References

1. V. Ramaswamy, "Efficient Control of Non-Cooperative Traffic Using Sending Rate Estimate-Based Queue Management Schemes," Ph.D. dissertation, The University of Mississippi, 2006.
2. R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless Active Queue Management Scheme for Approximating Fair Bandwidth Allocation," in Proc. of IEEE INFOCOM'00, July 2000, pp. 942–951.
3. G. Varghese, Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices, 1st ed. Morgan Kaufmann, December 2004.
4. A. Das, D. Dutta, A. Goel, A. Helmy, and J. Heidemann, "Low State Fairness: Lower Bounds and Practical Enforcement," in Proc. of IEEE INFOCOM'05, 2005, pp. 2436–2446.
5. V. Ramaswamy, L. Cuellar, S. Eidenbenz, and N. Hengartner, "Preventing Bandwidth Abuse at the Router through Sending Rate Estimate-Based Queue Management Schemes," in Proc. of IEEE ICC'07, May 2007.
6. G. Appenzeller, I. Keslassy, and N. McKeown, "Sizing Router Buffers," in Proc. of ACM SIGCOMM'04, August 2004, pp. 281–292.
7. C. Villamizar and C. Song, "High Performance TCP in ANSNET," ACM Computer Comm. Review, vol. 24, no. 5, pp. 45–60, 1994.
8. S. Gorinsky, A. Kantawala, and J. Turner, "Link Buffer Sizing: A New Look at the Old Problem," in Proc. of ISCC'05, June 2005, pp. 507–514.
9. P. Tinnakornsrisuphap and A. M. Makowski, "Limit Behavior of ECN/RED Gateways Under a Large Number of TCP Flows," in Proc. of IEEE INFOCOM'03, April 2003, pp. 873–883.
10. J. Sun, M. Zukerman, K. Ko, G. Chen, and S. Chan, "Effect of Large Buffers on TCP Queueing Behavior," in Proc. of IEEE INFOCOM'04.
The Effects of Fairness in Buffer Sizing

Mei Wang and Yashar Ganjali
Department of Electrical Engineering, Stanford University
{wmei,yganjali}@stanford.edu
Abstract. Buffer sizing in Internet routers is a fundamental problem that has major consequences in the design, implementation, and economy of the routers, as well as on the performance observed by the end users. Recently, there have been some seemingly contradictory results on buffer sizing. On the one hand, Appenzeller et al. show that as a direct consequence of desynchronization of flows in the core of the Internet, buffer sizes in core routers can be significantly reduced without any major degradation in network performance. On the other hand, Raina and Wischik show that such reduction in buffer sizing comes at the cost of synchronization and thus instability in the network. This work unifies these results by studying the effects of fairness in buffer sizing. We show that the main difference arises from the implicit assumption of fairness in packet dropping in the latter result. We demonstrate that desynchronization among flows observed by Appenzeller et al. is caused by unfair packet dropping when a combination of TCP-Reno and the drop-tail queue management is used. We also show that bringing fairness in packet dropping will introduce synchronization among flows, and will make the system unstable as predicted by Raina and Wischik. Our analysis suggests that there is an intrinsic trade-off between fairness in packet drops and desynchronization among TCP-Reno flows when routers use the drop-tail queue management. Achieving fairness, desynchronization, small buffer size, and 100% link utilization at the same time is desirable and feasible yet challenging. The studies in this paper provide insights for further explorations in reaching this goal.
1 Motivation and Introduction
There has been an increasing amount of interest in buffer sizing, due to the important role that buffers play in routers and in the performance of the Internet. The goal of buffer sizing is to find out how small we can make Internet router buffers without any degradation in network performance. A plethora of recent work has emerged to reduce buffer sizes [1,2,3,4,5] and to understand the relationships between buffer sizing and other parameters of the network [8,9,10,11,12,13,14], such as throughput, delay, loss, stability [7,6], and the impacts of various traffic conditions [8,15]. Recently, there have been some seemingly contradictory results on buffer sizing in Internet core routers. Appenzeller et al. show that buffer sizes in core routers can be reduced significantly, without any major degradation in network
performance [1], whereas Raina and Wischik show that such a reduction in buffer sizing can cause instabilities in the network [7]. Instability, here, is defined as periodic variations in the aggregate congestion window size of the flows. Which result is correct?

To answer this question, we studied the dynamics of the system in terms of buffer sizing through mean-field theory [16,22] analysis and ns2 [17] simulations. We demonstrate that there is an intrinsic trade-off between fairness among TCP-Reno flows and desynchronization among them under the drop-tail queue management scheme: fairness in packet drops can create synchronization among flows; conversely, unfair packet drops can lead to reduced synchronization. Fairness has always been considered a desirable property in the network: the network is expected to treat individual flows in a fair manner when resources are limited. Synchronization among TCP flows, on the other hand, has always been considered an undesirable effect. Synchronized flows need much larger buffer sizes in core routers and cause local/global instabilities in the system, as well as degradation in the performance observed by individual flows.

To see the origins of the trade-off between fairness and desynchronization, consider a congested link in a network carrying a large number of flows. An Active Queue Management (AQM) scheme that fairly drops packets at times of congestion will impact a large percentage of flows, since it distributes the dropped packets among the flows. On the other hand, an unfair AQM scheme can drop many packets from a few flows, thus reducing the number of flows that see one or more packet drops. TCP-Reno flows react dramatically to packet drops by halving their congestion window sizes for each dropped packet [18]. Therefore, when a large number of flows see packet drops around the same time, they synchronously react by reducing their congestion window sizes, which may lead to a significant reduction in the instantaneous throughput of the system. In an unfair AQM scheme, however, only a few flows take the hit, and thus only a small fraction of flows react to packet drops; the aggregate congestion window changes less significantly.

Our results unify the seemingly contradictory conclusions mentioned above. Appenzeller et al. have shown that, due to the desynchronization of flows in the core of the Internet, one can reduce buffer sizes in core routers by a factor of √N from the original value of the bandwidth-delay product [1]. Here N is the total number of flows going through the core router. Our analysis shows that this desynchronization is a direct result of the unfair nature of packet drops in TCP-Reno combined with the drop-tail queue management scheme (Section 2.1). Also, in Section 2.2 we show that introducing fairness in packet dropping using the drop-tail scheme creates global synchronization. This is consistent with Raina and Wischik's results. Our study shows that the results from both groups are correct, and the main difference can be well explained by understanding the effects of fairness in buffer sizing.

Fairness also explains the recent observation by Dhamdhere et al. that the majority of flows have dropped packets in a congestion cycle [8]. The same results are obtained in our work when there is fair packet dropping among flows. We
demonstrate this through both the mean-field analysis [22] and ns2 simulations (Section 2). Achieving fairness, desynchronization, small buffer size, and 100% link utilization at the same time is desirable yet very challenging. The studies of the dynamics in this paper provide insights into, and show the feasibility of, further explorations toward reaching this goal.
2 Fairness-Desynchronization Trade-Off
In this section we study the relationship between synchronization and fairness in a network using TCP-Reno as the congestion control mechanism. We show that, when the congested routers use the drop-tail queue management scheme, there is an inherent trade-off between fairness in packet dropping and desynchronization amongst TCP-Reno flows. We investigate this trade-off through two different scenarios. First, we consider a scenario using the standard drop-tail queue management scheme in which TCP-Reno flows are desynchronized, and show that desynchronization comes at the cost of losing fairness in dropping packets. Second, we show that a slightly modified drop-tail queue management scheme in intermediate routers that is fair in dropping packets can lead to synchronization among TCP-Reno flows.
Fig. 1. Schematic of the model system considered. There are N long TCP flows connected to router R1. The congested uplink from R1 to R2 has capacity C. When the queue size Q(t) becomes equal to the buffer size B, newly arrived packets are dropped according to the drop-tail algorithm.
2.1 Desynchronization Is a Result of Unfairness
Recent results on buffer sizing in Internet core routers suggest that in a network carrying TCP-Reno flows, with the drop-tail queue management scheme in intermediate routers, buffer sizes can be reduced as the number of flows increases [1]. More precisely, the amount of buffering needed in core routers is inversely proportional to √N, where N is the number of flows going through the bottleneck link. The underlying assumption here is that as the number of flows N increases, the flows become more desynchronized. Variations in individual congestion window sizes cancel each other out when the flows are desynchronized, and as a result, variations in the aggregate congestion window size are
reduced, which means smaller buffers are able to accommodate the variations in the aggregate congestion window size.

To further analyze these results, we use the same model as in Appenzeller's ns2 simulations [1], shown in Figure 1. There are N long-lived TCP flows sharing a congested link with capacity C. Each TCP flow has a different round-trip time (RTT), and the average (harmonic average) RTT of the flows is RTT. As a result of the TCP congestion avoidance algorithm, the queue size Q(t) and the sum of the window sizes follow sawtooth patterns. Here, we use the same queue management scheme as in [1], the commonly deployed drop-tail algorithm at router R1: when Q(t) becomes equal to the buffer size B, newly arrived packets are dropped. Our goal is to analyze the collective behavior of the different TCP flows, especially how they scale for large N.

In order to reproduce the results in [1], this setup uses only long-lived TCP flows sharing a common congested link. Previous research has shown that other types of flows, or congested links elsewhere in the network, do not play a significant role in buffer sizing [1,8]. We extended the study by considering more general cases, including large round-trip-time variations, different percentages of short flows, different window size caps, and longer simulation times. In all cases considered, we found the same qualitative behavior for fairness versus synchronization. Thus, in this paper we focus on this simple yet critical setup to get a clear understanding of the underlying mechanisms; explorations of more complicated systems can build on this base study.

The results from [1] are confirmed by our ns2 simulations, as shown in Figure 2, where the variations in the queue length (Figure 2(a)) and the window size (Figure 2(b)) decrease with an increasing number of flows N. Figure 2 shows the results for three different values of N: 100, 300, and 1000. The capacity of the bottleneck link scales with the number of flows [3], N × 1 Mb/s, and the round trip time varies from 80 ms to 100 ms among the flows. The size of each packet is 960 bytes, and the duration of the whole experiment is 50 seconds. To further study packet dropping, Figure 2(c) shows that when the buffer becomes full and packets are dropped at the intermediate router, there are roughly N packets dropped during each congestion period. This result offers a key clue to understanding the interplay between fairness in packet dropping and synchronization among the flows, as will be discussed in Section 2.2. The origin of the N dropped packets per congestion period is explained in the Appendix.

To analyze fairness among the flows, we take, for each flow, the average of its congestion window size over time, and plot the histogram in Figure 3(a) for the case N = 1000, which is representative of the other simulations. The interesting observation here is the extremely long tail of the histogram, which indicates that there are flows whose average congestion window size grows well beyond that of the majority of the flows. In TCP-Reno, changes in congestion window size happen as a direct consequence of packet drops. Therefore, these observations suggest that packet drops might not be fairly distributed amongst flows. To further study fairness in packet drops, we make the following definition.
Fig. 2. Scaling with the number of flows from ns2 simulations using the drop-tail algorithm. a) Average queue length, b) average window size, and c) average number of packets dropped over each congestion cycle. The averaging is done over the number of flows N . Three different simulation results are shown with N =100, 300, and 1000. Other parameters are kept the same for the three simulations: (C ∗ RTT)/N = 12 packets/flow, B/N = 3 packets/flow, RTT = [80, 100]ms.
Definition 1. In the setting defined above, let us denote the number of packet drops seen by flow i between times t1 and t2 as D(i, t1, t2). We denote the vector of all drops between t1 and t2 as D(t1, t2). Unfairness in packet drops between t1 and t2, denoted by U(t1, t2), is defined as the variance of D(t1, t2). Fairness is defined as

F(t1, t2) = 1 / (1 + U(t1, t2)).  (1)

Based on this definition, fairness is a number greater than zero and less than or equal to one. Maximum fairness is achieved when all flows see the same number of drops: in this case unfairness, the variance in packet drops, is zero, which makes fairness equal to one. By default we set t2 = t1 + RTT. Figure 4 shows the cumulative distribution function (CDF) of fairness. With drop-tail queuing, for about 55% of the time, fairness is equal to 1. These are times when no packets are dropped, and thus the system is completely fair.
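Definition 1 translates directly into code; the following sketch (ours) computes F(t1, t2) from the per-flow drop counts:

```python
def fairness(drops):
    """drops[i] = D(i, t1, t2); returns F(t1, t2) = 1 / (1 + Var(D))."""
    n = len(drops)
    mean = sum(drops) / n
    unfairness = sum((d - mean) ** 2 for d in drops) / n   # variance of D
    return 1.0 / (1.0 + unfairness)

print(fairness([0, 0, 0, 0]))   # no drops: variance 0, fairness 1
print(fairness([1, 1, 1, 1]))   # equal drops: still perfectly fair
print(fairness([8, 0, 0, 0]))   # one flow absorbs a burst: fairness ~0.077
```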
Fig. 3. Histogram of time-averaged window size of individual flows from ns2 simulations based on (a) the drop-tail algorithm and (b) the randomized drop-tail algorithm. These results are obtained from the same configurations as in Figures 2 and 5 for N =1000.
However, when there are packet drops in the system (the remaining 45% of the time), the system is very unfair in dropping packets; fairness is less than 0.5 for more than 30% of the time. This graph suggests that the combination of TCP-Reno and the drop-tail queue management scheme is not fair in dropping packets. The other line in this figure will be discussed later in Section 2.2.

Unfairness in packet drops is not surprising. TCP-Reno injects packets into the system in a bursty manner [2]. Whenever such a burst of packets arrives at a full or nearly full queue in an intermediate router that uses drop-tail, a large number of those packets are dropped quite unfairly, since the other flows in the system see almost no drops. A fair packet dropping scheme, on the other hand, would drop packets from all flows in the queue rather than from the flow whose burst has reached the tail of the queue. Now, if by chance a flow's bursts arrive at times when the queue is not nearly full for a number of RTTs, that specific flow will not see any packet drops, and thus its congestion window size will grow significantly compared to the rest of the flows.

2.2 Fairness Leads to Synchronization
If desynchronization among TCP-Reno flows comes from unfairness in packet drops, should increasing fairness lead to synchronization? In this section, we try to find out whether this is the case. One way to increase fairness in packet drops is to add randomness. We start with a simple experiment, introducing some randomness while not deviating too much from the drop-tail algorithm. We call this the randomized drop-tail scheme: when the queue length is less than 90% of the buffer size, no packets are dropped; otherwise, each packet is dropped with a probability calculated as

pdrop = 10 × (Queue Length / Buffer Size − 0.9).  (2)

In other words, the drop probability increases linearly as a function of queue length from 90% to 100% of the buffer size.
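A direct transcription of the randomized drop-tail rule follows (our sketch; equation (2) specifies only the drop probability):

```python
import random

def randomized_drop_tail(queue_len, buffer_size):
    """Return True if the arriving packet should be dropped (eq. (2))."""
    occupancy = queue_len / buffer_size
    if occupancy < 0.9:
        return False                     # below 90% full: never drop
    p_drop = 10.0 * (occupancy - 0.9)    # linear from 0 at 90% to 1 at 100%
    return random.random() < p_drop
```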
Fig. 4. Comparison of cumulative distribution functions of fairness over time between the drop-tail and the randomized drop-tail schemes
Using this algorithm, packets are dropped only when the buffer is nearly full, so the scheme does not deviate much from the drop-tail algorithm. Note that this randomized drop-tail scheme is different from the well-known Random Early Detection (RED) queue management scheme [23]. The latter is a much more sophisticated scheme, in which the packet dropping probability depends not only on the queue length but also on the history of the queue; RED starts to drop packets even when the queue length is short, much earlier than the drop-tail scheme.

Figure 5 shows the results of this randomized drop-tail algorithm with otherwise the same configuration as in Figure 2. With randomness introduced, the queue size variation and the total window size variation become larger than in Figure 2. Furthermore, these variations no longer decrease with increasing N. Both the queue length (Figure 5(a)) and the window size (Figure 5(b)) enter a periodic sawtooth behavior, indicative of global synchronization. Interestingly, the number of packets dropped over each congestion cycle (Figure 5(c)) is still close to N, similar to the previous case of unfair packet dropping (Figure 2(c)).

The fairness in this scenario is shown in Figure 4. The CDF of fairness indicates that adding randomness in packet dropping increases fairness: for about 60% of the time fairness is equal to one (when there are no dropped packets), and for the rest of the time (when there are packet drops) it is above 0.5, higher than in the drop-tail scenario. The fact that randomized drop-tail recovers fairness among the flows is further illustrated in Figure 3(b), where the time-averaged window sizes of the different flows fall into a rather narrow distribution about the mean.

The results obtained from the randomized drop-tail simulations can be understood quantitatively by a mean-field method [22]. In the specific case where the N long flows have identical RTTs, there should be N packets dropped during
each congestion cycle. When these N dropped packets are spread randomly among the N flows, probability calculations show that about 63% of the flows see a packet drop [22]. Therefore, the mean-field calculation predicts global synchronization for large N when packets are dropped fairly within the drop-tail scheme. Fair packet dropping means that the packet dropping probability for each flow is proportional to the flow size, i.e., its current congestion window size.
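The 63% figure is consistent with the following elementary reading (ours) of the probability calculation in [22]: if N drops are spread independently and uniformly over N flows, a given flow escapes all of them with probability (1 − 1/N)^N ≈ e⁻¹.

```python
import math

for N in (100, 300, 1000):
    p_hit = 1 - (1 - 1 / N) ** N   # P(a flow sees at least one drop)
    print(N, round(p_hit, 4))      # all close to 1 - 1/e ~= 0.632
print(round(1 - math.exp(-1), 4))  # the large-N limit
```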
Fig. 5. Scaling with the number of flows from ns2 simulations using the randomized drop-tail algorithm. a) Average queue length, b) average window size, and c) average number of packets dropped over each congestion cycle. The configurations are identical to those in Figure 2 except for B/N = 6 packets/flow to ensure the full utilization of the link.
To study whether the conclusions drawn above are robust against parameter variations, we carried out ns2 simulations under many different conditions, varying the RTTs and their ranges, the uplink capacity, the buffer size, the number of flows, the presence of congestion window size limits, and the mix of short flows. In all cases considered, we observed the same behavior: the drop-tail algorithm gives unfair bandwidth partitioning, whereas the randomized drop-tail is much fairer. The results from a particular set of simulations with the randomized drop-tail, in which the RTTs and the number of flows are varied, are shown in Table 1. The overall behavior remains the same: both the total window size variation and the percentage of flows with dropped packets
Table 1. ns2 simulations with the randomized drop-tail algorithm, varying the range of round trip times and the number of long flows. The other parameters are the same as in Figure 5.

RTT (ms)   N      Wmax (pkts)  Wmin (pkts)  1 − Wmin/Wmax (%)  drop/flow (pkts)  Flows w/ dropped packets (%)
[89 91]    100    17.6         11.5         34.5               0.88              64
[89 91]    300    17.7         11.8         33.6               0.86              62
[89 91]    1000   17.7         11.8         33.4               0.86              62
[80 100]   100    17.4         12.5         28.5               0.87              52
[80 100]   300    17.5         12.5         28.7               0.85              53
[80 100]   1000   17.9         12.8         28.4               0.83              53
[60 120]   100    17.3         13.4         22.2               0.79              42
[60 120]   300    18.4         13.4         27.4               0.77              45
[60 120]   1000   17.4         13.2         24.4               0.76              47
Fig. 6. The percentage of flows with dropped packets per congestion cycle as a function of the number of long TCP flows: the comparison between the drop-tail and the randomized drop-tail algorithms. The configurations are identical to those in Figure 2 with RTT = [80, 100] ms.
Fig. 7. Sawtooth pattern of both the (a) queue size and (b) sum of windows of all flows. At t1, the buffer becomes full and packets start to be dropped. This continues for one round-trip time until t2, when some flows start to halve their windows and pause sending packets. At t3, all flows with dropped packets have reduced their windows, and the sum of windows reaches its minimum. Between t3 and t1 + T, all the flows resume sending packets and the buffer slowly fills up.
do not decrease as N increases. For the cases where the round trip times vary little (from 89 ms to 91 ms), the results match very well with the mean-field analysis results: there are about 63% of flows with dropped packets [22]. This is
consistent with, and explains, the earlier simulation report that over 60% of flows see packet drops during each congestion cycle [8].

Figure 6 summarizes the difference between the drop-tail and the randomized drop-tail algorithms. When the number of flows N increases, the percentage of flows that have dropped packets per congestion cycle decreases in the drop-tail scheme, in contrast to the randomized drop-tail scheme, where the percentage of flows with drops does not change appreciably with N. The results from the above ns2 simulations show that once fairness is recovered in the drop-tail queue management scheme, as in the randomized drop-tail scheme, the simulation results match the theoretical calculations based on the mean-field analysis [22] very well. However, this fairness in packet dropping causes synchronization among flows and thus requires larger buffer sizes.

The fact that fairness causes synchronization can explain the difference between the results observed by Appenzeller et al. [1] and those of Raina and Wischik [7]. Appenzeller et al. do not see any synchronization in the system, since they use TCP-Reno with drop-tail, which is unfair in packet drops. On the other hand, Raina and Wischik base their analysis on a fluid model, which assumes fairness in packet dropping. As a result, they predict synchronization among flows and an unstable system.
3 Conclusion
In this paper, we investigated the relationship between fairness and desynchronization among TCP flows. We found that: 1) the desynchronization observed with the drop-tail scheme in the presence of a large number of long flows is a result of unfairness in packet dropping; and 2) if fairness is imposed, the drop-tail scheme leads to global synchronization of the TCP flows, resulting in a periodic sawtooth behavior of the queue length and the total window size, during which the majority of the flows see dropped packets within each congestion cycle. The studies in this paper provide insights into the effects of fairness on buffer sizing. One of the key factors that leads to global synchronization is the fact that the packet drops occur within a short time period; one can avoid this by spreading the packet drops over a longer period of time. Thus, it is feasible to achieve both fairness and desynchronization with full link utilization. Further research on finding the right scheme would be interesting and very much needed.
Acknowledgments The authors would like to thank Professor Ashish Goel, Dr. Larry Dunn, and Damon Mosk-Aoyama for helpful discussions and feedback; and Dr. Guido Appenzeller for the scripts to reproduce results in [1].
References

1. G. Appenzeller, I. Keslassy, and N. McKeown, "Sizing Router Buffers", ACM SIGCOMM 2004.
2. M. Enachescu, Y. Ganjali, A. Goel, N. McKeown, T. Roughgarden, "Routers with very small buffers", Proceedings of INFOCOM, 2006.
3. D. Wischik, N. McKeown, "Part I: Buffer sizes for core routers", ACM SIGCOMM Computer Communication Review, July 2005.
4. N. Beheshti, Y. Ganjali, R. Rajaduray, D. Blumenthal, N. McKeown, "Buffer sizing in all-optical packet switches", Proceedings of OFC/NFOEC, March 2006.
5. R. Shorten, D. Leith, "On queue provisioning, network efficiency and the transmission control protocol", accepted to appear in IEEE/ACM Transactions on Networking, Dec. 2007.
6. G. Raina, D. Towsley, D. Wischik, "Part II: Control theory for buffer sizing", ACM SIGCOMM Computer Communication Review, July 2005.
7. G. Raina, D. Wischik, "Buffer sizes for large multiplexers: TCP queueing theory and instability analysis", EuroNGI, April 2005.
8. A. Dhamdhere, H. Jiang, C. Dovrolis, "Buffer sizing for congested Internet links", Proceedings of INFOCOM, 2005.
9. A. Dhamdhere, C. Dovrolis, "Open issues in router buffer sizing", ACM SIGCOMM Computer Communications Review, Jan. 2006.
10. D. Wischik, "Buffer requirements for high-speed routers", ECOC 2005.
11. D. Wischik, "Fairness, QoS, and buffer sizing", ACM SIGCOMM Computer Communications Review, Jan. 2006.
12. J. Sommers, P. Barford, A. Greenberg, W. Willinger, "A SLA perspective on the router buffer sizing problem", University of Wisconsin Technical Report, 2006.
13. G. Appenzeller, N. McKeown, J. Sommers, P. Barford, "Recent Results on Sizing Router Buffers", Proceedings of the Network Systems Design Conference, Oct. 2004.
14. S. Gorinsky, A. Kantawala, J. Turner, "Link buffer sizing: a new look at the old problem", Proceedings of the IEEE Symposium on Computers and Communications, June 2005.
15. A. Lakshmikantha, R. Srikant, C. Beck, "Are small buffers feasible in high speed routers?", in submission, 2006.
16. K. Huang, "Statistical Mechanics", 2nd edition, Wiley, 1987.
17. network simulator 2.
18. D. Comer, "Internetworking with TCP/IP", Vol. 1, 5th edition, Prentice Hall, 2005.
19. F. Baccelli, A. Chaintreau, D. Vleeschauwer, D. McDonald, "A mean-field analysis of short lived interacting TCP flows", SIGMETRICS, 2004.
20. M. Marsan, M. Garetto, P. Giaccone, E. Leonardi, E. Schiattarella, A. Tarello, "Using partial differential equations to model TCP mice and elephants in large IP networks", Proceedings of INFOCOM, 2004.
21. M. Wang, X. Zou, F. Bonomi, B. Prabhakar, "Elephant traps using a random sampling algorithm", in preparation.
22. M. Wang, Y. Ganjali, "Unifying buffer sizing results through fairness", Technical Report TR06-HPNG-060606, Stanford University, June 2006. http://yuba.stanford.edu/techreports/TR06-HPNG-060606.pdf
23. S. Floyd and V. Jacobson, "Random Early Detection gateways for congestion avoidance", IEEE Transactions on Networking, August 1993.
24. R. K. Pathria, "Statistical Mechanics", 2nd edition, Butterworth-Heinemann, 1996.
APPENDIX: Mean-field Analysis
In this appendix, we explain the origin of the claim that there are N packets dropped in each congestion period, where N is the total number of long-lived TCP flows sharing the congested link. The network configuration considered here is shown in Figure 1. As observed in typical ns2 simulations, both the queue size and the total window size follow a sawtooth pattern over time. We study the sawtooth pattern that has stabilized to a periodic function with period T, which serves as a good representative. We dissect the sawtooth pattern into different segments, as illustrated in Figure 7. There is a period of time when the buffer is full and packets are being dropped. During this period (between t1 and t2 in Figure 7), the window size keeps increasing until some flows detect the dropped packets. These flows in turn reduce their window sizes and temporarily stop sending packets. During this time, between t2 and t3, the total window size declines and the buffer starts to deplete. When the flows that have dropped packets receive enough acknowledgements, they resume sending packets. During this segment, between t3 and t1 + T, the total window size increases and the queue length slowly builds up.

When the buffer first becomes full, denoted by time t1, router R1 starts to drop packets. At t1, the sum of the windows of all the flows, W(t1), consists of the packets in transit, C × RTT, plus those queued in the buffer, B. Router R1 continues to drop packets while the buffer remains full, until some of the flows detect the dropped packets and reduce their window sizes. The feedback for flows to reduce their window sizes comes after the un-dropped packets complete a round trip, starting at t2 = t1 + RTT + B/C. The extra packets sent during the period between t1 and t2, i.e., after the buffer is full and before the senders receive the feedback to reduce their window sizes, are dropped. Each flow increases its window by one packet per round trip before it receives the dropped-packet feedback, and the duration from t1 to t2 is roughly one round-trip time. There are N flows in total. Thus, the total number of packets injected into the network at t2 is W(t2) = C × RTT + B + N. Therefore, the number of packets dropped during t1 to t2 is

W(t2) − W(t1) = C × RTT + B + N − (C × RTT + B) = N.  (3)
After t2, some senders start reducing their window sizes, the queue is no longer full, and no packets are dropped until the next cycle at t1 + T. Thus, the total number of packets dropped within one congestion cycle is N.
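A numeric sanity check of this bookkeeping (ours, with illustrative values):

```python
# Illustrative check of eq. (3): the packets injected between t1 and t2
# exceed the network's capacity C*RTT + B by exactly one window increment
# per flow, i.e., by N packets. All numbers are example values.
C_RTT = 1200    # packets in flight on the link (C x RTT)
B = 300         # buffer size in packets
N = 100         # long-lived TCP flows

W_t1 = C_RTT + B        # total window when the buffer first fills
W_t2 = C_RTT + B + N    # one RTT later: each flow grew its window by 1
print(W_t2 - W_t1)      # -> 100 = N dropped packets per congestion cycle
```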
Time to Buffer Overflow in an MMPP Queue

Andrzej Chydzinski
Silesian University of Technology, Institute of Computer Sciences
Akademicka 16, 44-100 Gliwice, Poland
[email protected]
Abstract. In this report we deal with the time to buffer overflow in a finite-buffer queue with MMPP (Markov-modulated Poisson process) arrivals. The results include a closed-form formula for the transform of the distribution of the time to buffer overflow. The main benefit of this formula is that, using properties of the transform, we can easily compute the average overflow time and all the moments (variance etc). Moreover, by means of an inversion algorithm, we can obtain the probability density function and cumulative distribution function of the overflow time. Analytical results are illustrated by a numerical example based on MMPP parameterization fitted to an IP traffic trace file. Keywords: Performance evaluation, buffer overflow, MMPP queue.
1 Introduction

The popularity of the MMPP in various areas of network traffic modeling, simulation, and performance evaluation stems from the fact that it is probably the simplest model that allows precise fitting of the statistical parameters of the traffic, including its autocorrelation structure. In other words, the MMPP, while remaining analytically tractable, can reflect the burstiness and self-similarity of network traffic, which results in reliable performance parameters (like loss ratio, queueing delay, buffer occupancy) obtained using MMPP-based models [1]. In this paper we deal with the time to buffer overflow, an informative performance characteristic for buffering processes in network elements (see [2] for a deeper discussion). As the distribution of the time to overflow depends heavily on the autocorrelation structure of the traffic, it is very important to use a model that is able to mimic that structure properly. MMPP is very well suited for this purpose, especially taking into account that the methodology for fitting traces to MMPP is well developed (see [1],[3]-[6]). The main result of this paper is a closed-form formula for the Laplace transform of the buffer overflow time distribution in a finite-buffer queue, presented in Theorem 1. Using this formula, we can easily compute the average overflow time, all moments, the probability density function, and the cumulative distribution function. A finite buffer of size b and a general type of service time are assumed, which means that in Kendall's notation we investigate herein the MMPP/G/1/b queueing system. To the best of the author's knowledge there have been no reported results of this type yet. Most of the papers are devoted to other queueing characteristics, mainly to queue
size distribution or queueing delay (workload). In particular, these characteristics for various MMPP queueing models in the stationary regime can be found by the reader in [7]-[9], and their time-dependent versions in [10]-[15]. There are also some papers in which approximation techniques for the analysis of MMPP queues are shown, for instance [16]-[18]. As regards the buffer overflow time, some asymptotic results devoted to related problems can be found in [19,20]. In [21], a solution for discrete-time Markovian queues of Geom(n)/Geom(n)/1/N type is given. Finally, in [22] systems with exponential service times are investigated. The approach presented herein is different: it does not restrict the analysis to Markovian or exponential service times, a restriction that would exclude some cases important from a practical point of view, like constant service time. It also has the advantage of giving solutions in a closed, easy-to-use form. The remaining part of the paper is organized in the following way. In Section 2, the arrival process, the queueing model and the notation used throughout the article are presented. In Section 3, the analysis of the buffer overflow time is performed. In particular, a formal definition of the overflow time is given and the main result of the paper, Theorem 1, is proven. In addition, some computational issues are discussed at the end of that section. In Section 4, a numerical illustration based on IP traffic is presented. Finally, remarks concluding the paper are gathered in Section 5.
2 The Model

For a very good survey on MMPP we refer the reader to [7]. Following the authors, a Markov-modulated Poisson process is obtained by varying the arrival rate of a Poisson process according to an m-state continuous-time Markov chain. In particular, when the Markov chain is in state i, arrivals occur according to a Poisson process of rate $\lambda_i$. Therefore an MMPP is parametrized by two m × m matrices: Q, the infinitesimal generator of the continuous-time Markov chain, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$, which has the arrival rates on its diagonal and zeros elsewhere.

In this article we deal with a single-server queue whose arrival process is given by an MMPP. The service time is distributed according to a distribution function F(·), which may assume any form, and the standard independence assumptions are made. The buffer size is finite and equal to b, including the service position. This means that if a packet (cell, job, customer) finds the buffer full at its arrival, it is blocked and lost.

The following nomenclature is used throughout the paper:

J(t) – the state of the modulating Markov chain at time t,

X(t) – the queue size at time t,

P(·) – the probability,

$P_{i,j}(n,t) = P(N(t) = n, J(t) = j \mid N(0) = 0, J(0) = i)$ – the counting function for the MMPP, where N(t) denotes the total number of arrivals in (0, t],

$Q_{ij}$ – element of the matrix Q (this notation is used for all matrices),

$z(s) = ((s + \lambda_1 - Q_{11})^{-1}, \ldots, (s + \lambda_m - Q_{mm})^{-1})^T$ – column vector of size m with elements $(s + \lambda_i - Q_{ii})^{-1}$,

$\tilde d_{n,i}(s) = \sum_{j=1}^{m} \sum_{k=0}^{n-1} \int_0^\infty e^{-st}\, P_{i,j}(k,t)(1 - F(t))\, dt$,
$\tilde d_n(s) = (\tilde d_{n,1}(s), \ldots, \tilde d_{n,m}(s))^T$,

$a_{k,i,j}(s) = \int_0^\infty e^{-st}\, P_{i,j}(k,t)\, dF(t)$,

$p_{ij} = 0$ if $i = j$, and $p_{ij} = Q_{ij}/(\lambda_i - Q_{ii})$ if $i \neq j$.

In addition, the following m × m matrices are used:

$0$ – the m × m matrix of zeroes, $I$ – the m × m identity matrix,

$A_k(s) = [a_{k,i,j}(s)]_{i,j}$,

$Z(s) = \left[\dfrac{(\lambda_i - Q_{ii})\, p_{ij}}{s + \lambda_i - Q_{ii}}\right]_{i,j}$, $\quad E(s) = \left[\dfrac{\Lambda_{ij}}{s + \lambda_i - Q_{ii}}\right]_{i,j}$,

$R_0(s) = 0$, $\quad R_1(s) = A_0^{-1}(s)$,

$$R_{k+1}(s) = A_0^{-1}(s)\Big(R_k(s) - \sum_{i=0}^{k} A_{i+1}(s)\, R_{k-i}(s)\Big), \quad k \ge 1.$$

In this notation $[b_{i,j}]_{i,j}$ denotes an m × m matrix with elements $b_{i,j}$.
3 Time to Buffer Overflow

The time to buffer overflow is denoted herein by $\tau_{n,i}$ and defined formally in the following way. Let $X(t)$, $t \ge 0$, be the queue size process. Let the initial queue size be n, $0 \le n < b$, and the initial state of the modulating Markov process be i, $1 \le i \le m$. Then

$$\tau_{n,i} = \inf\{t > 0 : X(t) = b \mid X(0) = n,\ J(0) = i\}.$$

Although usually we are interested only in $\tau_{0,i}$, that is, in the overflow time starting from an empty buffer, the analysis presented below covers all possible initial lengths of the queue. The distribution of the buffer overflow time will be presented using the transform of the tail of $\tau_{n,i}$, namely:

$$l_{n,i}(s) = \int_0^\infty e^{-st}\, P(\tau_{n,i} > t)\, dt,$$
and its column vector $l_n(s) = (l_{n,1}(s), l_{n,2}(s), \ldots, l_{n,m}(s))^T$. Using the transform of the tail is very convenient because, having calculated $l_{n,i}(s)$, we can easily obtain the expected value of $\tau_{n,i}$:

$$E\tau_{n,i} = l_{n,i}(0), \qquad (1)$$

the transform of its cumulative distribution function:

$$\int_0^\infty e^{-st}\, P(\tau_{n,i} < t)\, dt = \frac{1}{s} - l_{n,i}(s), \qquad (2)$$

or the transform of its probability density function:

$$\int_0^\infty e^{-st}\, \frac{\partial}{\partial t} P(\tau_{n,i} < t)\, dt = 1 - s\, l_{n,i}(s). \qquad (3)$$
Theorem 1. For the MMPP/G/1/b queue it holds true that

$$l_n(s) = \sum_{k=0}^{b-n} R_{b-n-k}(s)\, A_k(s)\, G_b^{-1}(s)\, h_b(s) \;-\; \sum_{k=1}^{b-n} R_{b-n-k}(s)\, \tilde d_k(s), \quad n < b, \qquad (4)$$

where

$$G_b(s) = (I - Z(s)) \sum_{k=0}^{b} R_{b-k}(s)\, A_k(s) \;-\; E(s) \sum_{k=0}^{b-1} R_{b-1-k}(s)\, A_k(s), \qquad (5)$$

$$h_b(s) = (I - Z(s)) \sum_{k=1}^{b} R_{b-k}(s)\, \tilde d_k(s) \;-\; E(s) \sum_{k=1}^{b-1} R_{b-1-k}(s)\, \tilde d_k(s) \;-\; z(s). \qquad (6)$$
Proof of Theorem 1. Conditioning on the first departure epoch we get, for $0 < n < b$, $1 \le i \le m$:

$$P(\tau_{n,i} > t) = \sum_{j=1}^{m} \sum_{k=0}^{b-n-1} \int_0^t P(\tau_{n+k-1,j} > t-u)\, P_{i,j}(k,u)\, dF(u) \;+\; (1 - F(t)) \sum_{j=1}^{m} \sum_{k=0}^{b-n-1} P_{i,j}(k,t), \qquad (7)$$

and for $1 \le i \le m$:

$$P(\tau_{0,i} > t) = \sum_{j=1}^{m} \int_0^t P(\tau_{0,j} > t-u)\, p_{ij}(\lambda_i - Q_{ii})\, e^{-(\lambda_i - Q_{ii})u}\, du \;+\; \sum_{j=1}^{m} \int_0^t P(\tau_{1,j} > t-u)\, \Lambda_{ij}\, e^{-(\lambda_i - Q_{ii})u}\, du \;+\; e^{-(\lambda_i - Q_{ii})t}. \qquad (8)$$
The first part of (7) covers the situation where the first departure time u occurs before t and the buffer does not get full by time u. The second part covers the situation where the first departure time u occurs after t and the buffer does not get full by time t. The first part of (8) corresponds to the situation where the modulating state changes by the
time t, while the second part corresponds to the case where the first arrival occurs by the time t. Finally, the last part of (8) covers the situation where nothing happens by the time t. Using transforms we obtain, for $0 < n < b$, $1 \le i \le m$:

$$l_{n,i}(s) = \sum_{j=1}^{m} \sum_{k=0}^{b-n-1} a_{k,i,j}(s)\, l_{n+k-1,j}(s) + \tilde d_{b-n,i}(s),$$

and for $1 \le i \le m$:

$$l_{0,i}(s) = \sum_{j=1}^{m} l_{0,j}(s)\, \frac{p_{ij}(\lambda_i - Q_{ii})}{s + \lambda_i - Q_{ii}} \;+\; \sum_{j=1}^{m} l_{1,j}(s)\, \frac{\Lambda_{ij}}{s + \lambda_i - Q_{ii}} \;+\; \frac{1}{s + \lambda_i - Q_{ii}}.$$

Using vector notation we have:

$$l_n(s) = \sum_{k=0}^{b-n-1} A_k(s)\, l_{n+k-1}(s) + \tilde d_{b-n}(s), \quad 0 < n < b,$$

$$l_0(s) = Z(s)\, l_0(s) + E(s)\, l_1(s) + z(s).$$

Replacing $l_n(s) = u_{b-n}(s)$ we obtain:

$$\sum_{k=-1}^{n-1} A_{k+1}(s)\, u_{n-k}(s) - u_n(s) = \psi_n(s), \quad 0 < n < b, \qquad (9)$$

$$u_b(s) = Z(s)\, u_b(s) + E(s)\, u_{b-1}(s) + z(s), \qquad (10)$$

with $\psi_n(s) = A_n(s)\, u_1(s) - \tilde d_n(s)$. All possible solutions of the system (9) have the following form:

$$u_n(s) = R_n(s)\, c(s) + \sum_{k=1}^{n} R_{n-k}(s)\, \psi_k(s), \quad n \ge 1, \qquad (11)$$

where $c(s)$ is a vector which does not depend on n (see Theorem 1 in [23]). Putting n = 1 into (11) we get $c(s) = A_0(s)\, u_1(s)$ and

$$u_n(s) = \sum_{k=0}^{n} R_{n-k}(s)\, A_k(s)\, u_1(s) - \sum_{k=1}^{n} R_{n-k}(s)\, \tilde d_k(s).$$

Finally, by means of the boundary condition (10), we get $u_1(s) = G_b^{-1}(s)\, h_b(s)$. This finishes the proof of Theorem 1.
In order to make (4) useful for practical purposes, we first have to compute the matrices $A_k(s)$, $R_k(s)$, $Z(s)$, $E(s)$ and the vectors $\tilde d_k(s)$, $z(s)$. Firstly, $Z(s)$, $E(s)$ and $z(s)$ can be computed in an obvious way. Secondly, computing $R_k(s)$ is also easy if we know the matrices $A_k(s)$. Therefore we are reduced to finding an efficient way of calculating $A_k(s)$ and $\tilde d_k(s)$. This can be done by means of the uniformization technique [8]. In particular, for $A_k(s)$ we have

$$A_n(s) = \sum_{j=0}^{\infty} \gamma_j(s)\, K_{n,j}, \qquad (12)$$

where (with $D_0, D_1, \ldots$ denoting the standard MAP representation of the arrival process; for an MMPP, $D_0 = Q - \Lambda$, $D_1 = \Lambda$ and $D_k = 0$ for $k \ge 2$):

$$K_{0,0} = I, \quad K_{n,0} = 0,\ n \ge 1, \quad K_{0,j+1} = K_{0,j}(I + \theta^{-1} D_0), \quad \theta = \max_i \{(-D_0)_{ii}\},$$

$$K_{n,j+1} = \theta^{-1} \sum_{i=0}^{n-1} K_{i,j}\, D_{n-i} + K_{n,j}(I + \theta^{-1} D_0),$$

$$\gamma_j(s) = \int_0^{\infty} e^{-(\theta+s)t}\, \frac{(\theta t)^j}{j!}\, dF(t).$$

The truncation rule for the sum in (12) can be found in [7]. The vectors $\tilde d_k(s)$ can be computed using the following m × m matrices:

$$D_k(s) = \left[\int_0^{\infty} e^{-st}\, P_{i,j}(k,t)(1 - F(t))\, dt\right]_{i,j}.$$

The uniformization technique now gives:

$$D_n(s) = \sum_{j=0}^{\infty} \delta_j(s)\, K_{n,j},$$

with

$$\delta_j(s) = \int_0^{\infty} e^{-(\theta+s)t}\, \frac{(\theta t)^j}{j!}\, (1 - F(t))\, dt.$$

Naturally, we have

$$\tilde d_{n,i}(s) = \sum_{j=1}^{m} \sum_{k=0}^{n-1} (D_k(s))_{i,j}.$$
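For illustration, here is a small numpy sketch (ours) of the $R_k(s)$ recursion from Section 2, assuming the matrices $A_k(s)$ at a fixed argument s are already available, e.g., from (12):

```python
import numpy as np

def r_sequence(A, count):
    """Return R_0(s), ..., R_{count-1}(s); A[k] holds A_k(s) at a fixed s."""
    m = A[0].shape[0]
    A0_inv = np.linalg.inv(A[0])
    R = [np.zeros((m, m)), A0_inv]               # R_0 = 0, R_1 = A_0^{-1}
    for k in range(1, count - 1):                # needs len(A) >= count
        acc = R[k] - sum(A[i + 1] @ R[k - i] for i in range(k + 1))
        R.append(A0_inv @ acc)                   # R_{k+1}
    return R
```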
Having computed all coefficient matrices and vectors we may proceed to compute the mean buffer overflow time and transforms of its pdf and cdf, using (1), (2) and (3), respectively. If we are also interested in the precise shape of the density function,
algorithms for numerical Laplace transform inversion have to be used. For example, in [24] the following method, based on the Euler summation formula, is proposed:

$$f(t) \approx \sum_{k=0}^{m} \binom{m}{k}\, 2^{-m} \sum_{j=0}^{n+k} (-1)^j\, a_j(t), \qquad (13)$$

where

$$a_k(t) = \frac{e^{A/2l}}{2lt}\, \tilde d_k(t), \quad k \ge 0,$$

$$\tilde d_0(t) = f^*\!\left(\frac{A}{2lt}\right) + 2 \sum_{j=1}^{l} \mathrm{Re}\!\left[ f^*\!\left(\frac{A}{2lt} + \frac{ij\pi}{lt}\right) e^{ij\pi/l} \right],$$

$$\tilde d_k(t) = 2 \sum_{j=1}^{l} \mathrm{Re}\!\left[ f^*\!\left(\frac{A}{2lt} + \frac{ij\pi}{lt} + \frac{ik\pi}{t}\right) e^{ij\pi/l} \right], \quad k \ge 1.$$

Here f(t) is the original function and $f^*(s)$ denotes the transform to be inverted, while the coefficients m, n, A, l are used to control the inversion error. Typical values are m = 11, n = 38, A = 19 and l = 1.
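A minimal Python sketch of this Euler-summation inversion (ours; the helper name and the test transform are chosen for illustration, with the typical parameter values above as defaults):

```python
import cmath
from math import comb, exp, pi

def euler_invert(f_star, t, m=11, n=38, A=19.0, l=1):
    """Approximate f(t) from its Laplace transform f_star via formula (13)."""
    base = A / (2 * l * t)
    def d_tilde(k):
        # inner sum over the shifted Fourier grid; the k = 0 term adds f*(A/2lt)
        s = 2 * sum((f_star(complex(base, j * pi / (l * t) + k * pi / t))
                     * cmath.exp(1j * j * pi / l)).real
                    for j in range(1, l + 1))
        return s + (f_star(complex(base, 0.0)).real if k == 0 else 0.0)
    a = [exp(A / (2 * l)) / (2 * l * t) * d_tilde(k) for k in range(n + m + 1)]
    partial = lambda N: sum((-1) ** j * a[j] for j in range(N + 1))
    return sum(comb(m, k) * 2.0 ** (-m) * partial(n + k) for k in range(m + 1))

# sanity check: f*(s) = 1/(s+1) is the transform of f(t) = exp(-t)
print(euler_invert(lambda s: 1 / (s + 1), 2.0), exp(-2.0))
```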
4 Numerical Illustration

For numerical purposes, we are going to use a parameterization of MMPP fitted to an IP traffic trace file. Using one million packet headers from the file FRG-11372081981.tsh, recorded at the FGR aggregation point run by PMA (Passive Measurement and Analysis Project [25]), the following MMPP parameterization was obtained in [15]:

$$Q = \begin{bmatrix} -172.53 & 38.80 & 30.85 & 0.88 & 102.00 \\ 16.76 & -883.26 & 97.52 & 398.90 & 370.08 \\ 281.48 & 445.97 & -1594.49 & 410.98 & 456.06 \\ 23.61 & 205.74 & 58.49 & -598.93 & 311.09 \\ 368.48 & 277.28 & 7.91 & 32.45 & -686.12 \end{bmatrix},$$

$$(\lambda_1, \ldots, \lambda_5) = (59620.6,\ 113826.1,\ 7892.6,\ 123563.2,\ 55428.2).$$

The mean packet size is 850 bytes. Other basic characteristics of the traffic sample and its MMPP model are shown in Table 1. It is important that the autocorrelation function fits the original traffic reasonably well on several time scales (see Fig. 5 in [15]). It is assumed that the service time, d, is constant, the initial queue size is 0, and that the initial state of the modulating process is distributed according to

$$\pi = (0.52174,\ 0.12808,\ 0.023151,\ 0.11352,\ 0.21351),$$

which is the steady-state vector of the underlying Markov chain, J(t). Now, by manipulating the service time we may obtain different traffic intensities $\rho = d\,\pi \Lambda \mathbf{1}$.
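To make the setup concrete, here is a small numpy sketch (ours, not from the paper) that recovers the steady-state vector π and the mean arrival rate of this parameterization, matching Table 1 below:

```python
import numpy as np

Q = np.array([[-172.53,   38.80,    30.85,    0.88,  102.00],
              [  16.76, -883.26,    97.52,  398.90,  370.08],
              [ 281.48,  445.97, -1594.49,  410.98,  456.06],
              [  23.61,  205.74,    58.49, -598.93,  311.09],
              [ 368.48,  277.28,     7.91,   32.45, -686.12]])
lam = np.array([59620.6, 113826.1, 7892.6, 123563.2, 55428.2])

# steady state of the modulating chain: pi Q = 0 with sum(pi) = 1
A = np.vstack([Q.T, np.ones(5)])
b = np.append(np.zeros(5), 1.0)
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi)                  # ~ (0.52174, 0.12808, 0.023151, 0.11352, 0.21351)

rate = pi @ lam
print(rate, 1e6 / rate)    # ~ 71729 pkts/s and ~ 13.941 us mean interarrival

d = 0.9 / rate             # constant service time giving rho = d * pi @ lam = 0.9
print(d * rate)
```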
Table 1. Parameters of the original and MMPP traffic

                  mean interarrival time [µs]   arrival rate [pkts/s]
original traffic  13.940                        71732
MMPP              13.941                        71729
Fig. 1. Mean time to overflow versus the buffer size for MMPP arrivals and five different traffic intensities, namely 0.5, 0.6, 0.7, 0.8 and 0.9
Sample results for five different traffic intensities are depicted in Figure 1. Each curve represents the average time to overflow as a function of the buffer size, on a logarithmic scale. Analyzing this set of results we can notice a few things. Firstly, for large buffers the average overflow time seems to grow linearly (on the log-scaled plot) with the buffer size. This effect can potentially be used to estimate the overflow time in large-buffer systems from numerical results obtained for much smaller buffers. Secondly, a characteristic bend can be observed in the low range (between 5 and 10 KB) of each curve. This indicates that for a very small queueing capacity the system's behaviour is significantly different. Thirdly, the time to overflow decreases as the traffic intensity grows, but this effect was to be expected. Now we demonstrate how the autocorrelation structure of the traffic influences the buffer overflow time. For this purpose we consider Poisson arrivals instead of MMPP. Naturally, the same arrival rate is assumed. Figure 2 reports the average overflow times in this case. As the Poisson process is the simplest case of MMPP, the shapes of the curves are similar, but there are great differences in values. For instance, Table 2 shows a comparison between the MMPP and Poisson-arrival models for 20 KB of buffering space. The overflow times in the autocorrelated model are always smaller, and the difference grows enormously as the traffic intensity decreases.
Fig. 2. Mean time to overflow versus the buffer size for Poisson arrivals and five different traffic intensities, namely 0.5, 0.6, 0.7, 0.8 and 0.9

Table 2. Mean overflow time [s] for MMPP and Poisson arrivals and the buffer size of 20KB

traffic intensity   Poisson arrivals   MMPP arrivals
0.5                 5.958×10^7         1.629×10^0
0.6                 8.780×10^4         3.543×10^-2
0.7                 3.367×10^2         1.441×10^-2
0.8                 3.004×10^0         1.128×10^-2
0.9                 7.256×10^-2        9.758×10^-3
5 Conclusion

In this paper a comprehensive solution for the buffer overflow time in a finite-buffer queue fed by the Markov-modulated Poisson process was shown. As the MMPP is able to mimic the complex autocorrelation structure of network traffic, this solution may be helpful in the proper sizing of buffers in network elements. The main result is presented in a closed, easy-to-use form. It allows one to obtain all characteristics of the overflow time distribution, including the mean value, the moments, the pdf, the cdf, etc. It also has the following advantage in numerical calculations: as the sequences $A_k(s)$, $R_k(s)$, $\tilde d_k(s)$ depend neither on the buffer size, b, nor on the initial queue length, n, after computing these sequences up to some index we can obtain a variety of results for different values of n and b with practically no additional effort.
Acknowledgment

This work was supported in part by MNiSW under grant N517 025 31/2997.
References

1. Salvador, P., Valadas, R., Pacheco, A.: Multiscale Fitting Procedure Using Markov Modulated Poisson Processes. Telecommunication Systems 23(1-2), 123-148 (2003)
2. Schwefel, H.-P., Lipsky, L., Jobmann, M.: On the necessity of transient performance analysis in telecommunication systems. In: de Souza, J.M., da Fonseca, N.L.S., Silva, E.A.S. (eds.) Teletraffic Engineering in the Internet Era. Elsevier, Amsterdam (2001)
3. Deng, L., Mark, J.: Parameter estimation for Markov modulated Poisson processes via the EM algorithm with time discretization. Telecommunication Systems 1, 321-338 (1993)
4. Ryden, T.: An EM algorithm for parameter estimation in Markov modulated Poisson processes. Comput. Stat. Data Anal. 21, 431-447 (1996)
5. Klemm, A., Lindemann, C., Lohmann, M.: Modeling IP traffic using the batch Markovian arrival process. Performance Evaluation 54(2) (2003)
6. Yoshihara, T., Kasahara, S., Takahashi, Y.: Practical time-scale fitting of self-similar traffic with Markov-modulated Poisson process. Telecommunication Systems 17(1/2), 185-211 (2001)
7. Fischer, W., Meier-Hellstern, K.: The Markov-modulated Poisson process (MMPP) cookbook. Performance Evaluation 18(2), 149-171 (1992)
8. Lucantoni, D.M.: New results on the single server queue with a batch Markovian arrival process. Commun. Stat., Stochastic Models 7(1), 1-46 (1991)
9. Baiocchi, A., Blefari-Melazzi, N.: Steady-state analysis of the MMPP/G/1/K queue. IEEE Trans. Commun. 41(4), 531-534 (1992)
10. Lucantoni, D.M., Choudhury, G.L., Whitt, W.: The transient BMAP/G/1 queue. Commun. Stat., Stochastic Models 10(1), 145-182 (1994)
11. Le Ny, L.-M., Sericola, B.: Transient Analysis of the BMAP/PH/1 Queue. International Journal of Simulation: Systems, Science & Technology, Special Issue on Analytical & Stochastic Modeling Techniques 3(3-4) (2002)
12. Van Houdt, B., Blondia, C.: QBDs with marked time epochs: a framework for transient performance measures. In: Proc. of QEST 2005, Torino, Italy, pp. 210-219 (2005)
13. Kulkarni, L., Li, S.-Q.: Transient behaviour of queueing systems with correlated traffic. Perform. Eval. 27&28, 117-145 (1996)
14. Lee, D.-S., Li, S.-Q.: Transient analysis of multi-server queues with Markov-modulated Poisson arrivals and overload control. Perform. Eval. 16(1-3), 49-66 (1992)
15. Chydzinski, A.: Transient analysis of the MMPP/G/1/K queue. Telecommunication Systems 32(4), 247-262 (2006)
16. Ross, S.: Approximating transition probabilities and mean occupation times in continuous-time Markov chains. Probability in the Engineering and Informational Sciences 1, 251-264 (1987)
17. Carmo, R.M.L.R., de Souza e Silva, E., Marie, R.: Efficient solutions for an approximation technique for the transient analysis of Markovian models. Technical report, IRISA Publication Interne No. 1067 (1996)
18. Van Houdt, B., Blondia, C.: Approximated transient queue length and waiting time distributions via steady state analysis. Stochastic Models 21(2/3), 725-744 (2005)
19. Glasserman, P., Kou, S.-G.: Limits of first passage times to rare sets in regenerative processes. Ann. Appl. Probab. 5, 424-445 (1995)
20. Gnedenko, B.V., Kovalenko, I.N.: Introduction to Queueing Theory, 2nd ed. Birkhäuser, Basel (1989)
21. Chaudhry, M.L., Zhao, Y.Q.: First-passage time and busy period distributions of discrete-time Markovian queues: Geom(n)/Geom(n)/1/N. Queueing Systems 18, 5-26 (1994)
22. Asmussen, S., Jobmann, M., Schwefel, H.-P.: Exact Buffer Overflow Calculations for Queues via Martingales. Queueing Systems 42(1), 63-90 (2002)
23. Chydzinski, A.: The oscillating queue with finite buffer. Performance Evaluation 57(3), 341-355 (2004)
24. Abate, J., Choudhury, G.L., Whitt, W.: An introduction to numerical transform inversion and its application to probability models. In: Grassman, W. (ed.) Computational Probability, pp. 257-323. Kluwer, Boston (1999)
25. http://pma.nlanr.net/
Fundamental Effects of Clustering on the Euclidean Embedding of Internet Hosts

Sanghwan Lee¹, Zhi-Li Zhang², Sambit Sahu³, Debanjan Saha³, and Mukund Srinivasan²

¹ Kookmin University, Seoul, Korea ([email protected])
² University of Minnesota, Minneapolis, MN, USA ({zhzhang,mukund}@cs.umn.edu)
³ IBM T.J. Watson Research Center, Hawthorne, NY, USA ({sambits,dsaha}@us.ibm.com)
Abstract. Network distance estimation schemes based on Euclidean embedding have been shown to provide reasonably good overall accuracy. While some recent studies have revealed that triangle inequality violations (TIVs) inherent in network distances among Internet hosts fundamentally limit their accuracy, these Euclidean embedding methods are nonetheless appealing and useful for many applications due to their simplicity and scalability. In this paper, we investigate why Euclidean embedding shows reasonable accuracy despite the prevalence of TIVs, focusing in particular on the effect of clustering among Internet hosts. Through mathematical analysis and experiments, we demonstrate that clustering of Internet hosts reduces the effective dimension of the distances, hence a low-dimensional Euclidean embedding suffices to produce reasonable accuracy. Our findings also provide good guidelines as to how to select landmarks to improve the accuracy, and explain why random selection of a large number of landmarks improves the accuracy.
1 Introduction
Network distance estimation schemes have been extensively studied during the past several years; these schemes include [1,2,3,4,5], just to name a few. Among the many proposed schemes, coordinate-based schemes are gaining interest because of their simplicity and reasonably good accuracy. In a coordinate-based system, each host is assigned a set of coordinates representing the position of the host in a virtual Euclidean space. The network distance between two hosts is estimated by their Euclidean distance in this virtual space. To assign coordinates to the hosts, many schemes rely on a set of special hosts called landmarks. Each host measures the distances to the landmarks and transforms the measured distances into a set of coordinates by using various optimization techniques.
This work was supported in part by the new faculty research program 2006 of Kookmin University in Korea and by the NSF grant CNS-0435444.
Although Euclidean embedding methods for network distance estimation in general provide reasonably accurate distance estimates for a majority of nodes, there are fundamental limitations on their accuracy. In particular, recent studies [6,7] have shown that the triangle inequality violations (TIVs) prevalent in the network distances among Internet hosts fundamentally limit the suitability of Euclidean embedding of network distances, and thus the accuracy of Euclidean embedding-based distance estimation methods. Despite these limitations, Euclidean embedding methods are nonetheless appealing and useful for some applications, due to their simplicity and scalability. For example, P2P applications can easily employ geographic forwarding such as GPSR for scalable object lookup by assigning coordinates to hosts and objects based on network distances and random hash functions ([8], [9]).

In this paper, we investigate why Euclidean embedding shows reasonable accuracy despite the prevalence of TIVs in network distances. In particular, we explore the effects of clustering among Internet hosts on network distance estimation, especially in terms of landmark selection, and how to exploit such clusters to judiciously select landmarks to obtain more accurate distance estimates. Clustering of Internet hosts is primarily due to the Internet routing hierarchy and the AS (Autonomous System) topology. In other words, there are inherent clusters of hosts, where distances between hosts within the same cluster are significantly smaller than those across clusters. In [7] we showed that distances (i.e., latencies) among hosts within the same cluster tend to have more TIVs than among hosts in different clusters. In this paper, based on mathematical analysis and simulation experiments, we demonstrate that reasonably good estimation of inter-cluster distances can hide the inaccuracy of small distances due to TIVs. Our findings provide good guidelines as to how to select landmarks to improve the accuracy of distance estimation. For instance, landmarks should be selected from each cluster, and the number of landmarks should be proportional to the size of the clusters.

Before we proceed to present our work in more detail, we would like to emphasize that the goal of this paper is not to try to improve the accuracy of Euclidean-embedding based distance estimation methods, which, as stated earlier, are fundamentally limited by the prevalence of TIVs. Instead, the goal is to understand the underlying factors that contribute to reasonably accurate distance estimation using the Euclidean embedding approach (within the confines of its fundamental limitations), and to provide good guidelines for landmark selection to produce the best possible results.

The remainder of the paper is organized as follows. Section 2 describes the GNP and Virtual Landmarks methods. In Section 3, we show that clusters help improve the accuracy of the Virtual Landmarks method. We discuss the effect of clusters on landmark selection in Section 4. We conclude the paper in Section 5.
2 Background

In this section, we describe two representative Euclidean embedding based distance estimation schemes: Global Network Positioning (GNP) [2] and Virtual Landmarks [3].
GNP uses a fixed set of landmarks as reference points. The landmarks measure the distances among themselves and compute coordinates using the simplex downhill optimization method. Basically, they assign coordinates such that the error between the actual distances and the estimated ones is minimized. Then, each host measures the distances from itself to the landmarks. Based on the already assigned coordinates of the landmarks and the measured distances, each host finds the coordinates that minimize the estimation errors, again by the iterative simplex downhill method. However, the iterative simplex downhill method has a very high computation time.

To reduce the computation time, the Virtual Landmarks method employs Principal Component Analysis (PCA). PCA is based on the singular value decomposition of the symmetric distance matrix among n nodes. The following description of singular value decomposition is mostly adopted from [3]. Let D be the n × n matrix whose entry $d_{ij}$ is the distance from node i to node j. The singular value decomposition of the matrix D has the form

$$D = U \cdot W \cdot V^T, \qquad (1)$$

where U is an n × n orthogonal matrix, V is an n × n orthogonal matrix, and W is an n × n diagonal matrix. The diagonal entries of W are called the singular values of the matrix D. The singular values of D are the nonnegative square roots of the eigenvalues of $D^T D$, and the columns of U and V are orthonormal eigenvectors of $DD^T$ and $D^T D$, respectively. The number of non-zero singular values is the rank of the matrix D. Let $x_1, x_2, \ldots, x_k$ be the k (< n) eigenvectors corresponding to the k largest eigenvalues. We stack these vectors as rows to form a transformation matrix $M \in R^{k \times n}$, i.e., $M = (x_1, x_2, \ldots, x_k)^T$. The dimension reduction is performed by simply multiplying M with a given high-dimensional distance vector $v \in R^n$, i.e., $v' = Mv$, where $v' \in R^k$.

In the Virtual Landmarks method, the distances among the landmarks are first measured to form the distance matrix D. Then, the transformation matrix M and the coordinates of the landmarks are computed as described above. Each host measures the distances from itself to the landmarks (call this distance vector h; such an h is also called a Lipschitz embedding). By computing Mh, the coordinates of the host are obtained. One thing to note is that the number of large singular values of D can represent the number of clusters, as described in the next section.
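The pipeline just described is easy to prototype. Below is a minimal numpy sketch (our illustration, not the authors' code); the explicit least-squares scale fit on landmark pairs is an added normalization step, since raw Lipschitz vectors inflate distances:

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.random((200, 2))          # "d-2"-style data: random points in a unit square
n, k = 18, 3                        # 18 landmarks, 3 dimensions (as used in Section 3)
land = pts[:n]

D = np.linalg.norm(land[:, None] - land[None, :], axis=2)  # landmark distance matrix
U, w, Vt = np.linalg.svd(D)         # D = U diag(w) V^T, singular values descending
M = Vt[:k]                          # transformation matrix M = (x1, ..., xk)^T

H = np.linalg.norm(pts[:, None] - land[None, :], axis=2)   # Lipschitz vectors (rows)
X = H @ M.T                         # coordinates of every host: v' = M v

# fit one global scale on landmark pairs so estimates match the real distances
est = np.linalg.norm(X[:n, None] - X[None, :n], axis=2)
X *= (est * D).sum() / (est * est).sum()

i, j = 50, 120                      # two arbitrary non-landmark hosts
d_hat = np.linalg.norm(X[i] - X[j])
d_true = np.linalg.norm(pts[i] - pts[j])
print(abs(d_hat - d_true) / min(d_hat, d_true))  # relative error, cf. eq. (2) below
```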
3 Impact of Clusters on the Accuracy of Euclidean Embedding
Euclidean embedding of network distances is basically an optimization problem: it tries to assign coordinates to hosts so that the difference between the estimated distances and the real ones is minimized. GNP strictly follows this idea by using the simplex downhill method. Even though GNP uses a two-phase coordinate computation (one for the landmarks and one for the hosts), which is different from global optimization, it mimics the global optimization in such a way that
the accuracy of GNP may approach the accuracy of the optimal embedding. Especially when the data set comes from a Euclidean space, GNP is able to find very accurate coordinates, up to the precision errors of the machine. The Virtual Landmarks method, however, does not have any strong justification for why the Lipschitz embedding can estimate the distances with reasonable errors. All [3] shows is that PCA can reduce the dimension of the Lipschitz embedding without much loss relative to the accuracy of the Lipschitz embedding itself. This observation motivates us to investigate why PCA-based Euclidean embedding shows reasonably good estimation accuracy. We conjecture that the existence of clusters has some impact on the accuracy. One intuition is that when the number of clusters is small, the Lipschitz embedding can achieve good accuracy for estimating the inter-cluster distances, which yields reasonably good accuracy overall.

To justify our conjecture, we first show the estimation accuracy of the Virtual Landmarks method over various synthetic and real measurement data sets. Then, we relate the accuracy to the number of clusters in the data set, which is accurately found by the number of large singular values in the PCA used by the Virtual Landmarks method. As the metric of accuracy, we use the relative error $r_{x,y}$, defined as

$$r_{x,y} = \frac{|d_{x,y} - \hat d_{x,y}|}{\min(d_{x,y}, \hat d_{x,y})}, \qquad (2)$$
where $d_{x,y}$ is the actual distance between hosts x and y and $\hat d_{x,y}$ is the estimated one.

We first generate two types of synthetic distance matrices: random points and clustered points from Euclidean spaces. For the random point data sets, we randomly generate 360 points from the unit hypercube of a 2- and an 8-dimensional Euclidean space; these are called "d-2" and "d-8", respectively. For the clustered points, we first select k points (the number of clusters, 6 in these experiments) as cluster centers in the unit hypercube. Then, we generate c nodes within a small hypercube (side length 0.1, i.e., 10% of the side of the unit hypercube; c is 60, 30, and 20 depending on the number of clusters) centered at each cluster center. The number of dimensions is 2 and 8. We construct the distance matrix among the nodes by using the Euclidean distance between each pair of nodes. The two distance matrices are called "d-2-cl" and "d-8-cl" according to their dimensions. Furthermore, we use two real measurement data sets: Planetlab and King. Planetlab is derived from the distances measured among the PlanetLab nodes on Sep 30th, 2005 [10]. We choose the minimum of the 96 measurement data points for each measurement between node pairs. After removing the hosts that have missing distance information and the hosts that have the same /24 prefixes, we obtain a 148 × 148 distance matrix among 148 nodes. The King data set is the one used in [4]. After removing the hosts that have missing distance information, we finally get a distance matrix among 462 hosts.
Fig. 1. Accuracy of Virtual Landmark Method: Virtual Landmarks method shows lower accuracy for high dimensional Euclidean data set. However, when the data set has clusters, the accuracy degradation is limited.
We run the Virtual Landmarks method to embed the above distance matrices into Euclidean space. In this experiment, we use 18 randomly selected landmarks. For "d-2" and "d-2-cl" we use 3 dimensions, and for "d-8" and "d-8-cl" we use 9 dimensions. For the King and Planetlab data sets, we use 7 dimensions, as suggested in [2,3]. The cumulative distributions of the relative errors are shown in Fig. 1. One surprising result is that the accuracy for "d-8" is very poor. However, when the data set has clusters, as in "d-8-cl", the accuracy is reasonably good. Planetlab also shows reasonably good accuracy, in that more than 70% of the estimations have a relative error of less than 0.25. These results strongly suggest that the accuracy of the Virtual Landmarks method is related to the existence of clusters.

To explain this relationship, we focus on the suggestion of the authors of the Virtual Landmarks method on choosing the number of dimensions: they suggest that the number of dimensions should be the number of dominant singular values in the PCA. Interestingly, in the following we show that the number of dominant singular values is the number of clusters. This implies that the coordinates computed by Virtual Landmarks are actually the approximate distances from each host to the clusters. [11] states a similar insight, namely that PCA dimension reduction automatically performs data clustering according to the K-means objective function.

To show that the number of dominant singular values is the number of clusters, we need to define the number of dominant singular values. For that purpose, we use the magnitude change r(i) of the i-th singular value. The number of dominant singular values is defined as the i for which r(i) is the largest, where r(i) is defined as follows, with the singular values $\lambda_i$ sorted in descending order:

$$r(i) = \begin{cases} 1 & \text{if } i = 0 \text{ or } (\lambda_i = 0,\ \lambda_{i-1} = 0), \\ \lambda_{i-1}/\lambda_i \ (\ge 1) & \text{otherwise.} \end{cases} \qquad (3)$$
We first prove that the number of dominant singular values is the number of clusters for a distance matrix with extremely tight clusters, i.e., where the points in each cluster are at the same position in the Euclidean space.

Theorem 1. Let $C = \{C_1, C_2, \ldots, C_k\}$ be k clusters of points in a d-dimensional hypercube. Each cluster $C_i$ contains $n_i$ points. Let $N = \sum_i n_i$ and let S be the set of N points. Let D be the N × N distance matrix among the points in S. Let $\lambda_i$ be the i-th singular value of the singular value decomposition of D, for $i = 0, \ldots, N-1$. If the points in $C_i$ are at the same position for $i = 1, \ldots, k$, then

$$r(k) = \max_{i > 1} r(i)$$

for k > 1, where r(i) is defined in (3).

Proof. Since we assume that the points in $C_i$ are at the same position for $i = 1, \ldots, k$, the distance matrix D is a k × k block matrix, whose (i, j) block is an $n_i \times n_j$ matrix. Since the points in each cluster have the same position and the distance between two nodes is the Euclidean distance, the first $n_1$ rows of the matrix D are the same, the second $n_2$ rows of the matrix D are the same, and so on. Since all the diagonal blocks are 0, there are k distinct rows in the matrix D. This means that the rank of D is k. The number of non-zero singular values of the singular value decomposition of D is exactly the rank of D, i.e., k. Since the (k+1)-th singular value is 0, $r(k) = \infty = \max_{i>1} r(i)$. So the number of dominant singular values of D is k, the number of clusters.

To show that the same is true for non-extreme data sets, we use three kinds of data sets: Euclidean distance matrices with clusters, topology-based synthetic distance matrices, and the real measurement data sets. For the topology-based synthetic distance matrices, we first use the BRITE tool from Boston University to generate synthetic 2-level topologies. Then, we move the nodes in each AS into smaller regions to make clear clusters, which look like the one in Fig. 2.
Fig. 2. Example: 18 ASes with 20 nodes in each AS. There are 360 nodes in total in the topology.
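Theorem 1 is easy to verify numerically. A short sketch (ours, not from the paper) for the extreme co-located case; adding small intra-cluster noise gives the tight, non-extreme behaviour examined next:

```python
import numpy as np

rng = np.random.default_rng(0)
k, dim = 6, 8
centers = rng.random((k, dim))            # 6 cluster centers in an 8-dim hypercube
pts = np.repeat(centers, 60, axis=0)      # extreme case: cluster members co-located
# pts = pts + rng.normal(0, 0.01, pts.shape)  # uncomment for tight, non-extreme clusters

D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
lam = np.linalg.svd(D, compute_uv=False)  # singular values, sorted descending

eps = 1e-9                                # floor to avoid 0/0 in the ratio r(i)
r = [1.0] + [max(lam[i - 1], eps) / max(lam[i], eps) for i in range(1, len(lam))]
print(int(np.argmax(r)))                  # prints 6: rank of D = number of clusters
```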
Fig. 3. Magnitude changes of Principal Component Analysis: (a) Euclidean distance matrices with clusters; (b) synthetic topology-based distance matrices (6 AS / 60 nodes, 12 AS / 30 nodes, 18 AS / 20 nodes); (c) real measurement data sets (Planetlab, Vivaldi, NLANR). In each panel, the magnitude change is plotted against the singular value number.
Table 1. Properties of Clusters (PlanetLab data)

Cluster   Num of hosts   intra cluster dist (ms)   Countries                 Locations
1         26 (23)        14.012                    USA, Canada               East coast cities
2         31 (28)        6.761                     USA, Canada               East coast cities
3         17 (14)        4.322                     USA                       California, Washington, Arizona
4         17 (14)        7.744                     USA, Canada               Washington, Oregon, Calgary, Vancouver
5         56 (53)        69.391                    Europe, Asia, Australia   –
6         21 (18)        1.328                     USA                       CA
7         34 (31)        14.444                    USA                       Central USA from Michigan to Texas
We create 6-AS, 12-AS, and 18-AS topologies, each with 360 nodes in total. By assigning the Euclidean distance between adjacent nodes as the weight of the link and running hierarchical routing, we compute the distance matrix among the hosts. Furthermore, we use one more real measurement data set, called the NLANR data set, collected from the Active Measurement Project (AMP) ([12]) on April 7, 2004. After removing some hosts that have missing distance information, we finally get a distance matrix among 83 hosts.

We apply PCA to the distance matrices. As can be seen in Fig. 3(a), there are high peaks at the singular value number that equals the number of clusters in the Euclidean distance matrices. Similarly, Fig. 3(b) shows that the high peaks occur at the right number of ASes (i.e., clusters). Such a clear peak does not appear for the real measurement data sets, as can be seen in Fig. 3(c). However, there are still several reasonable peaks around 5-7, which means that Virtual Landmarks can benefit from the existence of clusters. This is manifested in Fig. 1, where the accuracy of Planetlab is similar to that of "d-8-cl".

To corroborate that clusters exist in the real measurement data, we cluster the Planetlab data set (202 nodes, including the nodes that have the same /24 prefixes) with the spectral clustering algorithm.¹ Then, we compute the average intra-cluster distances of the clusters and find the locations of the hosts in each cluster² (refer to Table 1). The average distances are computed after excluding 3 outliers (the hosts that have the largest average distance to all the other hosts in the cluster) from each cluster. The second column shows the number of hosts in each cluster; the number in parentheses is the number of hosts after excluding the 3 outliers. Most of the clusters have very small average intra-cluster distances (1.328 ms to 14.444 ms), except cluster 5, which has high intra-cluster distances; its hosts are scattered around Europe, Asia, and Australia. However, the hosts in the other clusters are located in relatively small regional areas. In general, the Internet has 5-7 clusters with small intra-cluster distances.

¹ We think that any clustering method is fine for this purpose.
² We look up the location of each IP address at "http://www.ip2location.com".
4 Effect of Clusters on Landmark Selection
In this section, we investigate the effect of clusters on the landmark selection problem. [13] shows some experimental results suggesting that selecting one landmark from each cluster improves the accuracy. Furthermore, it shows that random landmark selection is reasonably good when the number of landmarks is around 20-30. However, that paper only shows experimental results rather than a rigorous analysis. Here, we provide a theorem stating that the number of landmarks selected in a cluster should be proportional to the number of hosts in that cluster, rather than just one landmark from each cluster.

Theorem 2. Under the assumption that the hosts in each cluster are at the same position, distance estimation that uses a number of landmarks proportional to the number of hosts in a cluster performs better than one that uses an equal number of landmarks in each cluster.

Proof. Let $C = \{C_1, C_2, \ldots, C_c\}$ be the set of c active clusters and let N be the set of all hosts. Let $n_i$ be the number of hosts in cluster $C_i$, and let $n = \min(n_1, n_2, \ldots, n_c)$. In the proportional-landmarks case, the number of landmarks in cluster $C_i$ is $n_i/n$ (for simplicity, we assume all these numbers are integers). Let P be the set of landmarks used for distance estimation in the proportional-landmarks case. Let $k = \sum_{i=1}^{c} n_i / (nc)$ be the number of landmarks per cluster used in the equal-landmarks case, and let E be the entire set of landmarks used in that case, so that $|P| = |E| = kc = \sum_{i=1}^{c} n_i / n$. Let L be the set of c landmarks, one from each cluster; L is a subset of both P and E. We assume that the objective function of the distance estimation system is

$$\min \sum_{x} \sum_{y} |d_{x,y} - \hat d_{x,y}|^2, \qquad (4)$$

where $x, y \in N$, $d_{x,y}$ is the actual distance between x and y, and $\hat d_{x,y}$ is the estimated one.

Let $K_a = \sum_x \sum_y |d_{x,y} - d^a_{x,y}|^2$, where $x, y \in N$ and $D^a = (d^a_{x,y})$ is the distance matrix obtained by the distance estimation method using the landmarks from P. That is, $D^a$ is such that $\sum_p \sum_q |d_{p,q} - d^a_{p,q}|^2$ is minimum, where $p, q \in P$. Thus, $D^a$ minimizes

$$\sum_{p} \sum_{q} l_p\, l_q\, |d_{p,q} - d^a_{p,q}|^2, \qquad (5)$$

where $p, q \in L$ and $l_i$ is the number of landmarks in the cluster to which node i belongs.
899
Let Kb = x y |dx,y − dbx,y |2 where x, y ∈ N and Db = (dbx,y ) is the distance matrix obtained by the distance estimation method using the landmarks from E. That is, Db is such that p q |dp,q − dbp,q |2 is minimum, where p, q ∈ E. Since all the in a cluster are assumed to be at the same location, we hosts a 2 have Ka = p q ηp ηq |dp,q − dp,q | where p, q ∈ L and ηi is the number of hosts lp , we in the cluster to which node i belongs. Since ηp = n × have Ka = n2 p q lp lq |dp,q − dap,q |2 where p, q ∈ L. Similarly, Kb = n2 p q lp lq |dp,q − dbp,q |2 where p, q ∈ L. Since Da minimizes (5), we have Ka ≤ Kb . This result applies to any embedding scheme that tries to optimize (4) including both GNP and Virtual Landmarks method. One obstacle of applying proportional landmark selection in the real situation is that nni may not be an integer. In this case, we can select one landmark from each cluster. compute landmark To coordinates, we can use the weighted objective function p q np nq |dp,q − dˆp,q |2 , where p, q ∈ L. Then, to assign coordinates of a host i, we can use the weighted objective function, q nq |di,q − dˆi,q |2 . In other words, we give a weight to the error between the host and the landmark, and the weight is proportional to the number of hosts in the cluster. A more serious obstacle is that we do not know the clusters in advance because we do not have the distance matrix among all the hosts. However, in the following, we show that the performance of the random landmark selection with increasing number of landmarks actually converges to that of the proportional (clustering based) landmark selection. The intuition is that when the number of landmarks is large, the number of landmarks selected from each cluster is proportional to the number of hosts in the cluster. The data set used in this experiment is the 6 ASes (clusters) with 60 nodes in each AS topology (total 360 nodes) used in the previous section. In the clustering based method, we randomly select one host from each cluster as a landmark, since we know the clusters that the hosts belong to. We select 6 such landmarks. In the sampling based method, we randomly select a subset of hosts from the set of entire hosts as landmarks. The numbers of landmarks in the sampling based method are 6, 12, 18, 24, and 30. We use 6 as the number of dimensions. After we select the landmarks, we run the Virtual Landmark method on the data set 20 times. Fig. 4 shows the relative errors of the 20 runs at 50th, 70th, and 90th percentiles over different landmark selection method. “CL” represents the clustering based method and “SA” represents the sampling based method. The number of landmarks in each method is appended to the key. The bars show the average relative errors with min and max values of the 20 runs. As can be seen in Fig. 4, the clustering based selection shows better performance in average. Furthermore, the clustering based selection has small min-max range at each percentile, which shows the stability of the clustering based method. However, the sampling based selection has large min-max ranges for small number of landmarks. When the number of landmarks increases, the accuracy converges to that of the clustering based method. It shows that the proportional landmark selection can be achieved by using a large number of landmarks. The data sets from 12 AS 30 node topology and 18 AS 20 node topology also show similar result.
900
S. Lee et al.
1.4
relative error
1.2 1 0.8
CL 6L SA 6L SA 12L SA 18L SA 24L SA 30L
0.6 0.4 0.2 0 50th
70th
90th
Percentiles
relative error
Fig. 4. Accuracy of different landmark selection methods with the Synthetic 6 AS 60 node topology
1
CL 10L SA 10L SA 15L SA 20L SA 25L SA 30L
0 50th
70th
90th
Percentiles Fig. 5. Accuracy of different landmark selection methods with King data set
Next, we run the same experiment with the King data set. In the clustering based method, we first apply the spectral clustering algorithm to construct 10 clusters. Then, we randomly select one host from each cluster as the landmark. In the sampling based method, we randomly select a set of hosts from the set of entire hosts as landmarks. The numbers of landmarks in the sampling based method are 10, 15, 20, 25, and 30. We use 10 as the number of dimensions. We run the experiment 20 times with different sets of landmarks. Fig. 5 shows the result of the King data set. Just like the result of the synthetic data sets shown in Fig. 4, the sampling based method with 10 landmarks shows high variance on the accuracy. As the number of landmarks increases, the variance decreases, which means that the random selection approaches to the proportional selection.
Fundamental Effects of Clustering on the Euclidean Embedding
5
901
Conclusion
In this paper, we investigated the factors that make the Euclidean embedding show reasonably good accuracy for distance estimation. We showed that the existence of clusters actually helps improve the accuracy of distance estimation in Virtual Landmarks method because of the way that the Virtual Landmarks method chooses the number of dimensions. We also showed that selecting landmarks proportional to the size of clusters increases the accuracy and in reality, the random selection of a large number of landmarks can achieve the performance of proportional landmark selection.
References 1. Francis, P., Jamin, S., Jin, C., Jin, Y., Raz, D., Shavitt, Y., Zhang, L.: Idmaps: A global Internet host distance estimation service. IEEE/ACM Transactions on Networking (2001) 2. Ng, T.E., Zhang, H.: Predicting Internet network distance with coordinates-based approaches. In: Proc. IEEE INFOCOM, New York, NY (June 2002) 3. Tang, L., Crovella, M.: Virtual landmarks for the Internet. In: Proceedings of the Internet Measurement Conference(IMC), Miami, Florida (October 2003) 4. Dabek, F., Cox, R., Kaashoek, F., Morris, R.: Vivaldi: A decentralized network coordinate system. In: Proceedings of ACM SIGCOMM 2004, Portland, OR (August 2004) 5. Madhyastha, H.V., Anderson, T., Krishnamurthy, A., Spring, N., Venkataramani, A.: A structural approach to latency prediction. In: Proceedings of the Internet Measurement Conference(IMC), Rio de Janeiro, Brazil (October 2006) 6. Zheng, H., Lua, E.K., Pias, M., Griffin, T.G.: Internet routing policies and roundtrip-times. In: The 6th anuual Passive and Active Measurement Workshop, Boston, MA (March 2005) 7. Lee, S., Zhang, Z.L., Sahu, S., Saha, D.: On suitability of euclidean embedding of internet hosts. In: Proc. ACM SIGMETRICS, Saint Malo, France (June 2006) 8. Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable contentaddressable network. In: Proceedings of ACM SIGCOMM 2001, San Diego, CA (August 2001) 9. Yu, Y., Lee, S., Zhang, Z.L.: Leopard: A locality-aware peer-to-peer system with no hot spot. In: the 4th IFIP Networking Conference (Networking’05), Waterloo, Canada (May 2005) 10. Stribling, J.: Round trip time among planetlab nodes. http://www.pdos.lcs. mit.edu/˜strib/pl app/ (2005) 11. Ding, C., He, X.: K-means clustering via principal component analysis. In: Proc. of Int’l Conf. Machine Learning (ICML 2004), Banff, Alberta, Canada (July 2004) 12. NLANR: Nlanr amp data set. http://amp.nlanr.net/Status/ (2005) 13. Tang, L., Crovella, M.: Geometric exploration of the landmark selection problem. In: The 5th anuual Passive and Active Measurement Workshop, Antibes Juan-lesPins, France (April 2004)
A Multihoming Based IPv4/IPv6 Transition Approach Lizhong Xie, Jun Bi, and Jianping Wu Network Research Center, Tsinghua University, China Education and Research Network (CERNET) Beijing 100084, China
Abstract. How to make IPv4 users utilize IPv6 applications is a typical scenario of the IPv4/IPv6 inter-operation. Nowadays, Tunnel Broker and 6to4 tunnel mechanisms are the popular solutions for this problem. This paper proposes a multihoming based algorithm MI46 to integrate Tunnel Broker and 6to4 tunnel mechanism. It overcomes the shortcomings of both Tunnel Broker and the 6to4 tunnel mechanism to form an optimized method to make the IPv4 users use the IPv6 applications. Keywords: IPv4/IPv6 Transition, Multihoming, SHIM6.
1 Introduction The Internet running IPv4 protocol [1] has gained huge success in the past 20 years. However, it has grown to a scale well beyond the designers envisioned over decades ago. In 1998, The IETF (Internet Engineer Task Force) introduced IPv6 protocol [2] which is designed to overcome the limitation of IP address and security problem. In recent years, a lot of countries (North America, Europe and East Asia) drive the development of the IPv6 protocol by constructing IPv6 operational network. Nowadays, more and more people have realized that it is inevitable to transit from IPv4 to IPv6. Transition from IPv4 to IPv6 is a very complex problem, which involves the compatibility of the equipments, techniques, applications and so on. The IETF established a working group called “IPng Transition” (ngtrans) [3] to study these problems, and proposed plenty of transition methods. But, recently the IETF uses the ‘IPv6 Operations’ (v6ops) working group [4] and the new term “inter-operation” instead of the ngtrans working group and the term “transition”. The IETF believes that the transition from IPv4 to IPv6 is a long-term process and the inter-operation of the IPv4/IPv6 network is extremely necessary. So, the current main point is to study how to make the IPv4 and IPv6 network inter-operate well enough, rather than how to replace the IPv4 network with the IPv6 network. Though the coexistence of IPv6 and IPv4 network will last a long time, the IPv6only applications will be more and more popular when network application providers and application users realize that the IPv6 network is a definite trend. In this situation, it is very significant to study how to make the users in the IPv4 network use the IPv6only applications, which is a typical scenario of the application inter-operation of the IPv4/IPv6 networks. I.F. Akyildiz et al. (Eds.): NETWORKING 2007, LNCS 4479, pp. 902–911, 2007. © IFIP International Federation for Information Processing 2007
A Multihoming Based IPv4/IPv6 Transition Approach
903
Tunnel broker [5] and 6to4 tunnel [6] mechanism is the typical solutions to make the IPv4 users utilize IPv6 applications. This paper points out the pitfalls of these two solutions, and proposes a simplified-SHIM6 based mechanism MI46 which integrates Tunnel Broker and 6to4 tunnel solutions. In the MI46 mechanism, the dual-stack host in the IPv4 network holds both global IPv6 address and the 6to4 address. It uses the global IPv6 address and Tunnel Broker mechanism to visit the pure IPv6-host in the native IPv6 network, whereas it employs the 6to4 address and 6to4 tunnel mechanism to visit another dual-stack host in the IPv4 network via IPv6 protocol. In this way, we form a new optimized algorithm to make the IPv4 users use the IPv6 applications by integrating the advantages of Tunnel Broker and the 6to4 tunnel mechanisms. The rest of the paper is organized as follows. Section 2 shows the problem statement; Section 3 presents the MI46 algorithm; In Section 4, the advantage of MI46 is introduced; and Section 5 concludes the paper.
2 Problem Statement In this section, we first introduce Tunnel Broker and 6to4 tunnel mechanisms briefly, and then give a clear problem statement this paper focuses on by pointing out the pitfalls of Tunnel Broker and 6to4 tunnel mechanisms. 2.1 The Background of Tunnel Broker and 6to4 Tunnel Tunnel broker is used to help users to manage the configured tunnels automatically. With the help of the tunnel broker, the dual-stack host in the IPv4 network can obtain the global permanent IPv6 address from the IPv6 ISP. Then, in order to form the IPv6 connectivity, an IPv6-in-IPv4 tunnel is set up between the dual-stack host and the IPv6-relay gateway, which is called tunnel server in Tunnel Broker mechanism. So, in Tunnel Broker mechanism, all traffic has to be forwarded by the IPv6-relay gateway. The 6to4 tunnel is another automatic way to connect isolated IPv6 sites/hosts attached to an IPv4 network which has no native IPv6 support. An IPv6-relay gateway, which is called 6to4 Relay in the 6to4 tunnel mechanism, is provided for such IPv6 sites/hosts to visit IPv6 native network before they can obtain native IPv6 connectivity. With 6to4, the current IPv4 network is treated as the link layer, and the existing IPv4 routing infrastructure is utilized to forward IPv6-in-IPv4 encapsulated packet. The 6to4-host uses a 6to4 IPv6 address (2002:IPv4 Address:: /80) as the communication identifier. When the IPv6 packet is sent, the IPv4 address of tunnel end point can be found within the 6to4 address, and then a tunnel is formed without explicit configuration. 2.2 Problem Statement How to make the IPv4 users use the IPv6 applications is a typical scenario of the application inter-operation of IPv4/IPv6 network. When a dual-stack host in the IPv4 network wants to use the IPv6 applications, the following two scenarios are possible: (1) Scenario 1: The dual-stack host in the IPv4 network visits the pure IPv6-host in the native IPv6 network. With the deployment of the IPv6, a lot of IPv6
applications will be located in the native IPv6 network. A dual-stack host in the IPv4 network must visit pure IPv6 hosts/servers in the native IPv6 network if it wants to use these IPv6 applications.

(2) Scenario 2: Two or more dual-stack hosts in the IPv4 network communicate with each other using the IPv6 protocol in order to use IPv6-only applications. In the future, many IPv6 applications will have no IPv4 support. Therefore, dual-stack hosts in the IPv4 network must communicate with each other using the IPv6 protocol if they want to use these IPv6-only applications.

As mentioned above, the Tunnel Broker and 6to4 tunnel mechanisms are the typical solutions for making IPv4 users use IPv6 applications. However, neither of these two solutions can be applied appropriately to both of the above scenarios. Tunnel Broker works well in Scenario 1, but when it works in Scenario 2 (Fig. 1), the traffic between the two dual-stack hosts must be forwarded by the IPv6-relay gateway, so the gateway may become a communication bottleneck. Besides, a packet from one dual-stack host to another must be encapsulated and decapsulated twice: the first encapsulation/decapsulation occurs between the sender and the IPv6-relay gateway, and the second between the IPv6-relay gateway and the receiver. This behavior may lead to a bad user experience.
Fig. 1. Scenario that IPv4 users use IPv6 applications by tunnel broker
In Figure 1, the dual-stack hosts A and B in the IPv4 network and the host C in the IPv6 network all run IPv6 applications. The dual-stack hosts A and B obtain global IPv6 addresses from the IPv6 ISP. When host A communicates with host C, there is no doubt that the traffic needs to be forwarded by the IPv6-relay gateway. However, when host A communicates with host B, the traffic still needs to be forwarded by the IPv6-relay gateway, which increases the burden of the
IPv6-relay gateway unnecessarily. Apparently, it would be better for host A to communicate with B via a direct tunnel. That is exactly the behavior of the 6to4 tunnel mechanism. The 6to4 tunnel mechanism works well in Scenario 2, since there is no need to employ a relay gateway to forward the traffic between the dual-stack hosts. However, because of the special format of 6to4 addresses, it is hard to perform routing aggregation for them. Hence, the 6to4 tunnel mechanism is not very suitable as a common approach for IPv6 communication, and it is not a good method in Scenario 1. In summary, Tunnel Broker works better in Scenario 1 than in Scenario 2, while the 6to4 tunnel mechanism is more suitable in Scenario 2 than in Scenario 1. So, one interesting problem is how to integrate these two mechanisms to form an optimized solution. That is just the topic we discuss next.
3 The MI46 Algorithm

In this paper, we present a simplified-SHIM6 based algorithm, MI46, to integrate the Tunnel Broker and 6to4 tunnel mechanisms. We believe that this integration forms an optimized IPv4/IPv6 transition approach.

3.1 The Background of SHIM6

Currently, the SHIM6 mechanism [7] is the most promising multihoming approach in the IETF's view. Multihoming refers to the phenomenon in which one network end node accesses the Internet through multiple network paths, mainly for fault resilience. For the purpose of accessing the Internet via multiple network paths, a multihomed network end node often possesses several addresses. Once the current network path fails, the multihomed end node can immediately switch to another address and use another network path to communicate. Therefore, as a technical solution for multihoming, SHIM6 has dealt with the problem of address switching, which is exactly the key problem of the MI46 algorithm.

In the SHIM6 approach, a new 'SHIM6' sub-layer is inserted into the IP stack of end hosts that wish to take advantage of multihoming (Figure 2). The SHIM6 sub-layer is located within the IP layer, between the IP endpoint sub-layer and the IP routing sub-layer. With SHIM6, hosts deploy multiple provider-assigned IP address prefixes from multiple ISPs. These IP addresses are used by applications, and if a session becomes inoperational, the SHIM6 sub-layer can switch to a different address pair. The switch is transparent to applications, as the SHIM6 layer rewrites and restores the addresses at the sending and receiving hosts.

For the purpose of transport-layer communication survivability, the SHIM6 approach separates the identity and location functions of IPv6 addresses. In SHIM6, the identifier is used to uniquely identify endpoints in the Internet, while the locator performs the role of routing. There is a one-to-many relationship between the identifier and the locators. The SHIM6 layer performs the mapping function between the identifier and the locator consistently at the sender and the receiver. The upper layers above the SHIM6 sub-layer just use the unique identifier to identify the
Hence, when the multihomed host switches to another locator, the current transport-layer communication does not break, since the identifier is unchanged.
Fig. 2. SHIM6 Architecture
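To make the identifier/locator split concrete, the following minimal Python sketch illustrates the rewriting behavior described above. All names (ShimState, on_send, on_receive) are our own illustrative inventions, not APIs from the SHIM6 draft [7]:

```python
# Illustrative sketch only: a dictionary-based stand-in for the SHIM6
# sub-layer's identifier/locator rewriting. Field names are hypothetical.
class ShimState:
    def __init__(self, local_id, peer_id):
        # ULIDs: stable identifiers seen by transport and applications
        self.local_id, self.peer_id = local_id, peer_id
        # current locators: initially identical to the identifiers
        self.local_loc, self.peer_loc = local_id, peer_id

    def switch_locators(self, new_local_loc, new_peer_loc):
        """Move the flow to another locator pair; the identifiers are
        untouched, so transport-layer sessions survive the switch."""
        self.local_loc, self.peer_loc = new_local_loc, new_peer_loc

    def on_send(self, packet):
        # rewrite identifiers to the current locators before routing
        packet["src"], packet["dst"] = self.local_loc, self.peer_loc
        return packet

    def on_receive(self, packet):
        # restore identifiers so upper layers see a stable peer address
        packet["src"], packet["dst"] = self.peer_id, self.local_id
        return packet
```

In this picture, a path failure triggers switch_locators(), which changes the addresses on the wire while the transport layer above keeps using the stable (local_id, peer_id) pair.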
3.2 The MI46 Algorithm

In the MI46 algorithm, in order to integrate the Tunnel Broker and 6to4 tunnel mechanisms, we construct a virtual IPv6 network for upper-layer IPv6 applications by inserting a SHIM6 sub-layer into the IPv6 stack of the dual-stack hosts within the IPv4 network. The SHIM6 sub-layer determines whether the peer is located in the IPv4 or the IPv6 network, and selects the 6to4 tunnel or the Tunnel Broker mechanism accordingly. The architecture of the MI46 mechanism is shown in Figure 3.

[Figure 3 stacks, from top to bottom: IPv6 applications; the transport protocols (TCP, UDP, AH, ESP, ...); the simplified SHIM6 sub-layer; and, below it, two locator paths: the global IPv6 address used with Tunnel Broker toward the IPv6 network, and the 6to4 address used with the 6to4 tunnel toward the IPv4 network.]
Fig. 3. MI46 Architecture
In the MI46 mechanism, the dual-stack host in the IPv4 network needs to hold both a global IPv6 address and a 6to4 address. We choose the global IPv6 address as the primary address, since 6to4 addresses are hard to aggregate. The global IPv6 address is assigned by the IPv6 ISP, while the 6to4 address is generated by the dual-stack host itself.
The applications and the transport layer, which sit above the SHIM6 sub-layer, use only the global IPv6 address as the identifier, while the IP layer below the SHIM6 sub-layer uses either the global IPv6 address or the 6to4 address as the locator. When a dual-stack host in the IPv4 network that has deployed the MI46 mechanism initially contacts another IPv6 host (either a dual-stack host in the IPv4 network or a host in the native IPv6 network), it uses the global IPv6 address and the Tunnel Broker mechanism. It then sends a probe message to determine whether the correspondent supports the MI46 mechanism. If the correspondent has deployed MI46, it returns a response message to the initiator, and the two hosts exchange their 6to4 addresses in two further handshakes. The whole four-handshake process is shown in Figure 4, and the corresponding algorithms at the initiator and the correspondent are shown in Figure 5.
Fig. 4. The four-handshake process
If the four handshakes complete, the initiator can deduce that the correspondent is also in the IPv4 network. Both hosts establish a mapping state between the peer's global IPv6 address (the identifier) and its 6to4 address (the locator). All subsequent traffic between the initiator and the correspondent then switches its locators from the global IPv6 addresses to the 6to4 addresses, while the identifiers used by the upper layers remain unchanged thanks to the mapping function of the MI46 sub-layer; this guarantees that transport-layer communications are not terminated. From this point, all traffic goes through the direct 6to4 tunnel (Figure 6). After the dual-stack host in the IPv4 network completes the current communication with the correspondent, both hosts keep the mapping state between the peer's global IPv6 address and 6to4 address for a while. The next time they communicate with each other, they can look up the mapping state directly and use the 6to4 address as both the identifier and the locator; in that case the mapping function of the SHIM6 sub-layer is turned off automatically to improve packet-handling performance. Expired mapping states are deleted by a garbage-collection mechanism.
Fig. 5. The four-handshake algorithms: (a) at the initiator, (b) at the correspondent
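The handshake logic of Figure 5 can be summarized by the following hedged Python sketch. The message types, the channel abstraction, and the timeout value are all assumptions made for illustration, not the paper's actual message formats:

```python
# Hypothetical sketch of the MI46 four-handshake exchange at the initiator.
# The exchange starts over the Tunnel Broker path using global IPv6 addresses.
def initiator_handshake(channel, my_6to4_addr):
    channel.send({"type": "PROBE"})                      # handshake 1: MI46 probe
    reply = channel.recv(timeout=2.0)
    if reply is None or reply.get("type") != "PROBE_ACK":
        return None        # handshake 2 missing: peer is a native IPv6 host,
                           # keep using the global address and Tunnel Broker
    channel.send({"type": "LOC", "6to4": my_6to4_addr})  # handshake 3
    peer = channel.recv(timeout=2.0)                     # handshake 4
    if peer is None or peer.get("type") != "LOC":
        return None
    # success: install the identifier -> locator mapping and switch all
    # subsequent traffic to the direct 6to4 tunnel
    return peer["6to4"]
```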
Fig. 6. The communication process between two dual-stack hosts with MI46
If the dual-stack host in the IPv4 network visits a pure IPv6 host in the native IPv6 network, the second of the four handshakes will not complete, since hosts in the native IPv6 network have no need to support the MI46 mechanism. In that case, the whole communication uses the global IPv6 address and the Tunnel Broker mechanism.

3.3 The Simplification of SHIM6

The section above explained in detail how SHIM6 is used to integrate the Tunnel Broker and 6to4 tunnel mechanisms into a new IPv4/IPv6 transition approach. This section explains why we simplify the SHIM6 mechanism. SHIM6 is a multihoming solution whose main goal is fault resilience: it can seamlessly switch to another network path when the current one fails. The goal of MI46, in contrast, is to integrate the Tunnel Broker and 6to4 tunnel mechanisms into an optimized method for letting IPv4 users run IPv6 applications. These different goals mean that MI46 does not need all the functions of SHIM6, so we use a simplified SHIM6. MI46 needs only the address-mapping function, the address-switching function, and the function of exchanging address lists between hosts. Other functions that are essential for SHIM6 are not needed for MI46. For example, SHIM6 must continually detect failures between two communicating hosts and, when a failure occurs, probe the reachability of alternative address pairs to find an operational one. This function is unnecessary for MI46, since MI46 does not deal with fault resilience.
4 The Advantages of the MI46 Algorithm

As mentioned above, Tunnel Broker and the 6to4 tunnel are the popular solutions for letting IPv4 users run IPv6 applications. The Tunnel Broker mechanism is well suited to the scenario in which IPv4 hosts visit hosts in the IPv6 network. But when two dual-stack hosts use IPv6 to communicate, it increases the burden of the IPv6-relay gateway, which may become a communication bottleneck; it also degrades the user experience, because packets are encapsulated and decapsulated twice. The 6to4 tunnel mechanism establishes a direct tunnel using 6to4 addresses when two dual-stack hosts in the IPv4 network communicate with each other, so it is a good approach in that scenario. But because the 6to4 address has a special format, it is hard to aggregate, which places an unnecessary burden on the routing system. Hence, 6to4 addresses are not preferred for common IPv6 communication, such as IPv4 hosts visiting the native IPv6 network.

MI46 integrates the Tunnel Broker and 6to4 tunnel mechanisms in a way that is transparent to applications and the upper layers. When a dual-stack host in the IPv4 network visits a pure IPv6 host in the native IPv6 network, the whole communication uses the global IPv6 address and the Tunnel Broker mechanism. When two dual-stack hosts communicate with each other, they seamlessly switch to 6to4 addresses and use a direct 6to4 tunnel. Figure 7 shows how MI46 handles both scenarios.
Fig. 7. Scenarios in which IPv4 users run IPv6 applications with MI46
In Figure 7, the dual-stack host A in the IPv4 network communicates with the host C in the IPv6 network using the global IPv6 address and the Tunnel Broker mechanism. When A communicates with the dual-stack host B in the IPv4 network, they seamlessly switch to their 6to4 addresses and communicate through a direct 6to4 tunnel. With the MI46 algorithm, when two or more dual-stack hosts communicate with each other, the burden on the IPv6-relay gateways is reduced effectively, and users can expect a better experience with the direct 6to4 tunnel than with forwarding through the IPv6-relay gateway. Since the global IPv6 addresses are used when a dual-stack host visits the native IPv6 network, the aggregation problem of 6to4 addresses is avoided. Table 1 compares Tunnel Broker, the 6to4 tunnel, and MI46.
Table 1. Comparison of Tunnel Broker, 6to4 tunnel, and MI46

Mechanism     | Dual-stack hosts in the IPv4 network visit a pure IPv6 host in the native IPv6 network | Dual-stack hosts in the IPv4 network communicate with each other using IPv6
Tunnel Broker | Very suitable. | Unsuitable. The burden on the IPv6-relay gateway is heavy and the user experience is bad.
6to4 Tunnel   | Unsuitable. 6to4 addresses are hard to aggregate. | Very suitable.
MI46          | Very suitable. | Very suitable.
5 Conclusions

In this paper, we propose MI46, a simplified-SHIM6 based algorithm that integrates the Tunnel Broker and 6to4 tunnel mechanisms into an optimized method for letting IPv4 users run IPv6 applications. With the MI46 algorithm, we overcome the shortcoming of the 6to4 tunnel mechanism, namely that 6to4 addresses are hard to aggregate when used as a common means of visiting the IPv6 network. At the same time, we improve on the Tunnel Broker mechanism: when two or more dual-stack hosts communicate with each other, the MI46 algorithm effectively reduces the burden on the IPv6-relay gateways and gives users a better experience. We therefore conclude that MI46 is a better solution for letting IPv4 users run IPv6 applications than either Tunnel Broker or the 6to4 tunnel mechanism alone.
References
1. Postel, J.: Internet Protocol, RFC 791, September 1981
2. Deering, S., Hinden, R.: Internet Protocol, Version 6 (IPv6) Specification, RFC 2460, December 1998
3. IETF Next Generation Transition (ngtrans) Working Group, http://www.ietf.org/html.charters/OLD/ngtrans-charter.html
4. IETF IPv6 Operations (v6ops) Working Group, http://www.ietf.org/html.charters/v6ops-charter.html
5. Durand, A., Fasano, P., Lento, D.: IPv6 Tunnel Broker, RFC 3053, January 2001
6. Carpenter, B., Moore, K.: Connection of IPv6 Domains via IPv4 Clouds, RFC 3056, February 2001
7. Nordmark, E., Bagnulo, M.: Level 3 Multihoming Shim Protocol, draft-ietf-shim6-proto-07.txt, November 2006
Offline and Online Network Traffic Characterization

Su Zhang and Mary K. Vernon

University of Wisconsin-Madison, Computer Science Department, 1210 W. Dayton St., Madison, WI, USA
{zs,vernon}@cs.wisc.edu
Abstract. This paper investigates a new technique called Bayesian Block Analysis (BBA) for analyzing the time-varying rate of events. The first goal is to evaluate the accuracy of BBA in identifying the rate changes in synthetic traces that have a given inter-event time distribution and known rate change points. We find that BBA is highly accurate on traces with exponential inter-event times and known rate changes, and reasonably accurate with heavier-tailed inter-event times. The second goal is to apply BBA to actual network event traces. For request arrival and loss rate traces, BBA identifies significant stationary-rate periods that are qualitatively consistent with previous results obtained with less efficient or less accurate techniques. For packet arrivals to gateways, BBA identifies stationary-rate periods that are corroborated by binning the data on a new timescale. Finally, we also show that BB online rate estimation is accurate for synthetic as well as actual system traces.

Keywords: Network Traffic Characterization, Bayesian Analysis, EWMA, Rate Estimation, Stationary Rate Period, Loss Rate, Packet Arrival Rate.
1 Introduction

An important problem in analyzing various types of network traffic events – such as request arrivals to a web server, packet losses in a packet flow, or packet arrivals to a gateway – is to determine how the average event rate varies with time. A related question is how the inter-event times are distributed during a period in which the average event rate is stationary. Accurate methods for obtaining such results can yield insight into how traffic varies at different points in the network, as well as how to generalize the measured behavior at a given point to create representative workloads for system design. How frequently the event rate varies, for example, may impact the stability of various traffic control algorithms, such as those in the proposed new RCP [4] and XCP [7] transport protocols.

On the other hand, obtaining accurate results for the average event rate versus time is challenging, since the inter-event times are highly bursty (even when the arrival process is Poisson), and changes in rate occur at unpredictable times. For packet arrivals, previous studies [10, 11] showed evidence of burstiness on different time scales. For such arrivals, self-similar and multifractal models [1, 6, 12] have been developed to match the statistical properties of the observed network traffic at both small and large timescales. But those models are not easy to apply, because there is very little intuition about how to modify the parameters to represent the range of workloads that the system should be designed to handle.
Recent papers [8] show that packet arrivals on high-bandwidth links tend towards Poisson because of the rapidly increasing multiplexing of traffic on network links. But a key question is how to identify the periods in which the event rate is stationary.

A widely used method for estimating the arrival rate as a function of time on-line – that is, as each new arrival event occurs – is the exponentially weighted moving average (EWMA). This is a weighted sum of the estimated rate at the previous arrival event and the inverse of the most recent inter-arrival time. Recent work by Kim and Noble [9] shows that EWMA estimates can be agile in detecting rate changes or stable during periods of fixed average rate, but not both simultaneously. They propose several alternative ad hoc filters which improve on EWMA, but still have the same general problem.

Recent studies have used several off-line methods to determine the average event rate as a function of time for various traffic traces. Zhang et al. [14] use two change point detection methods, known as "bootstrap" and "rank order", to study, for example, loss event rate versus time for packet streams. These methods are computationally expensive and require a relatively large sample size, and as noted in their paper, are known to have non-negligible errors in terms of missed change points and false positives, respectively. Other recent work has used ad hoc binning to characterize request arrival rates at Internet media servers [2] and job arrivals to a large Internet compute server [3]. Such binning methods [11, 2, 3] are based on an ad hoc bin size and have two substantial drawbacks. First, periods of fixed rate are identified only by visual inspection, and the endpoints are at the predefined bin boundaries. Second, each bin must contain a good statistical sample, and rate changes that occur within a bin cannot be identified. Finally, [8] applies the Canny edge detector algorithm [15] to the curve of cumulative event count versus time for packet arrivals to a high-bandwidth (OC-48) Internet link. However, the accuracy of this method is very sensitive to the time unit used for updating the cumulative event count, and the time unit that yields accurate results for a given trace is unknown.

This paper evaluates a possible new approach to determining the periods of stationary network event rate, namely a recently proposed, highly efficient Bayesian analysis technique called Bayesian Blocks (BB), which was developed for characterizing the periods of constant brightness in photon counting data [13]. Key advantages of the BB analysis are that (1) it is computationally efficient, (2) the time unit and other input parameter values that yield accurate results are relatively easy to determine, and (3) it can be applied to a small sample size. These properties are particularly useful for on-line use of the technique. The BB analysis assumes a non-stationary Poisson event process, but there are intuitive reasons that it might also be successfully applied to more general arrival processes. Specifically, we make the following contributions:

• We quantify the accuracy of the offline BB analysis by applying it to a large number of synthetic event traces with exponential or heavier-tailed distributions of inter-event times and known rate changes. To our knowledge, the accuracy of the BBA technique has not been assessed previously.
• The BB technique was developed for off-line data analysis, but we also apply it on-line, to estimate the rate at each new arrival event (without knowledge of future arrival events).
• The synthetic trace results show that 80-90% of the off-line BB rate estimates are within 30% of the actual rates, even if the inter-event times have a heavy tail. The on-line BB rate estimation is less accurate than the off-line analysis, but is still significantly more accurate than previous on-line rate estimation methods.
• We apply the BB analysis to a variety of measured network event traces, including packet arrivals to a gateway, request arrivals to Internet servers, and loss events in long Internet packet streams, and illustrate the insights that can be obtained.
• We find that the BB analysis can analyze a trace with over 10^6 events and over 5000 change points in just a couple of minutes on a modern desktop (Pentium M, 1.66 GHz).
Section 2 provides a brief description of the BB analysis technique. Section 3 presents the accuracy assessment of the BB analysis technique using synthetic traces. Section 4 applies both off-line and on-line BB analysis to Internet server request arrivals. Section 5 applies the BB analysis to loss events in long Internet flows and to network gateway traffic. Section 6 concludes the paper.
2 Background

The BB analysis algorithm is both conceptually simple and computationally efficient, and is described very well in [13]; the interested reader is referred to that paper for the details of how the algorithm works. A key feature of the algorithm is that, for each interval (beginning with the full trace of event times), it considers every possible partitioning of the interval into two sub-intervals with different event rates. Using statistical maximum-likelihood measures and an input parameter called the "odds threshold" (OT), it decides whether the interval should be partitioned at the point with the highest likelihood of a change point. In addition to the OT parameter, there is a Minimum Interval (MI) parameter that defines the minimum number of events per period of fixed average arrival rate.
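As a rough illustration of this recursive partitioning, the sketch below greedily splits an array of event times wherever the two-block Poisson log-likelihood exceeds the one-block log-likelihood by more than log(OT). This is a simplification of Scargle's formulation [13], which operates on discretized time units and uses Bayesian prior odds directly; the code is ours, for intuition only:

```python
import numpy as np

def loglike(n, duration):
    """Maximized Poisson log-likelihood of a block with n events over
    `duration` (up to terms that cancel in the comparison)."""
    if n == 0 or duration <= 0:
        return 0.0
    return n * np.log(n / duration) - n

def bayesian_blocks(times, ot=4.0, mi=10):
    """Recursively split `times` at the most likely change point while the
    likelihood gain exceeds log(ot); mi = minimum events per block.
    Returns a list of (start, end, rate) blocks."""
    times = np.sort(np.asarray(times, dtype=float))
    n = len(times)
    whole = (times[0], times[-1], n / max(times[-1] - times[0], 1e-12))
    if n < 2 * mi:
        return [whole]
    total = loglike(n, times[-1] - times[0])
    best_gain, best_k = -np.inf, None
    for k in range(mi, n - mi + 1):
        t_split = 0.5 * (times[k - 1] + times[k])  # boundary between events
        gain = (loglike(k, t_split - times[0])
                + loglike(n - k, times[-1] - t_split) - total)
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_gain > np.log(ot):
        return (bayesian_blocks(times[:best_k], ot, mi)
                + bayesian_blocks(times[best_k:], ot, mi))
    return [whole]
```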
3 Accuracy Assessment

3.1 Synthetic Traces

Each synthetic trace contains event times generated with one of the following distributions of time between events during each period of fixed average event rate: exponential, lognormal (logn), or 2-stage hyperexponential (h2). The absolute value of the average event rate is immaterial, because during the BBA we define the time unit for the event times so as to achieve an average of one event every 25 time units.
This time unit is small enough that the probability of multiple arrivals occurring in the same time unit is negligible (the calculations assume this never occurs), yet as large as possible so that the factorials of the number of time units in an interval do not dominate the calculated likelihood. Hence, the key parameter of the logn and h2 distributions is the coefficient of variation (CV) during each period of fixed average rate. We consider logn and h2 distributions with CV of 1.2, 2, and 3, motivated by the results for actual event traces summarized in Sections 4 and 5. We expect BBA accuracy to decrease as the CV of the inter-event times increases, and we note that h2 inter-event times with CV=3 have a heavy tail.

Each synthetic trace has the following further parameters: (1) the number of changes in average rate (n), (2) the number of events (m) between adjacent changes in rate, and (3) the rate change factor (rcf), which is the magnitude of each change in rate. We consider values of n ranging from 1 to 64, and find that BBA accuracy is independent of n, as might be expected. We vary m from 200 to 2000 when CV ≤ 1.2, and otherwise from 1000 to 4000, and find as expected that accuracy increases as m increases (the benefit of larger sample sizes). Similarly, we expect accuracy to increase as rcf increases, and we consider values of rcf ≥ 1.5. In each trace, each rate change is a rate increase with probability 0.5 and is otherwise a rate decrease. For each combination of inter-event time distribution and values of m and rcf, we generate 10,000 traces for evaluating BBA accuracy. Results for a representative trace with n = 2, m = 200, and Poisson inter-event times are shown in Figure 2; results for a trace with n = 7, m = 4000, and h2 inter-event times with CV=3 are given in Figure 3. Unless stated otherwise, the synthetic traces contain abrupt rate changes; Figure 2(d) provides results for a trace with a gradual rate change for comparison.

3.2 Accuracy Measures

The traditional accuracy measures for rate change point detection algorithms (e.g., [9]) are (1) the fraction of known rate changes that are detected within a pre-defined distance of the actual rate change, and (2) the rate of false positives. However, these measures do not account for predicted rate changes that occur just before or just after the pre-defined distance, nor do they measure the accuracy of the magnitude of a correctly or falsely predicted rate change. Hence, we choose a more comprehensive measure, f_e with e = 15 or e = 30, defined as the fraction of actual event times at which the predicted rate is within e% of the actual event rate immediately prior to that event. Note that an undetected rate change yields incorrectly predicted rate values for every event in the next period of average rate.
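As a concrete illustration of this experimental setup (our own minimal reconstruction; only the parameter meanings are taken from the text), the sketch below generates a trace with abrupt rate changes and exponential inter-event times, and computes the f_e measure from a vector of per-event rate predictions:

```python
import numpy as np
rng = np.random.default_rng(42)

def synthetic_trace(n_changes=2, m=200, rcf=2.0, base_rate=10.0):
    """Event times with n_changes abrupt rate changes and m exponential
    inter-event times per stationary period; each change is an increase
    or a decrease with probability 0.5."""
    rate, t, times, actual = base_rate, 0.0, [], []
    for _ in range(n_changes + 1):
        for gap in rng.exponential(1.0 / rate, size=m):
            t += gap
            times.append(t)
            actual.append(rate)
        rate = rate * rcf if rng.random() < 0.5 else rate / rcf
    return np.array(times), np.array(actual)

def f_e(predicted, actual, e=0.15):
    """Fraction of events whose predicted rate is within e of the actual
    rate immediately prior to the event."""
    return np.mean(np.abs(predicted - actual) <= e * actual)
```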
3.3 Parameter Value Selection

BBA has two parameters other than the inter-event time unit, which we define such that on average an event occurs every 25 time units. The two parameters, Odds Threshold and Minimum Interval (OT and MI), are such that increasing either value decreases the number of falsely predicted rate changes but also decreases the number of correctly predicted rate changes. We use BBA of the synthetic traces to determine which values of these parameters lead to the highest overall f_e accuracy.

Fig. 1. Accuracy (f_15) vs. Odds Threshold (OT), for m = 100, 200, 400, and 800
To conserve space, we omit the experimental results which show that, for inter-event time distributions with CV ≤ 1.2, the fraction of accurate rate estimates improves slightly as MI increases from 2 to 10 (close to the model assumptions, few false positives), but does not improve for larger values of MI. Similarly, for inter-event time distributions with CV = 3, the overall accuracy on the synthetic traces improves as MI increases to 80 (because most falsely detected periods contain fewer than 80 events), but does not improve for larger values of MI. Hence, for all further results in this paper, we use MI = 10 when CV ≤ 1.2 and MI = 80 when CV = 3.

Figure 1 shows the average f_15 for traces with rcf = 1.5 and CV ≤ 1.2. The four curves in the figure correspond to different values of m. These results show that a smaller OT produces a higher overall accuracy, as measured by the fraction of the BBA rate estimates that are within 15% of the actual rate. Omitted due to space constraints are results for traces with CV = 3, where for each m ≥ 1000 the average accuracy improves gradually as OT increases from 5 to 10,000 and then decreases for larger values of OT; for the higher CV, a larger OT is needed to avoid false positives. For the further BBA results in this paper, we use OT = 4 when CV < 2, and OT = 1000 when CV ≥ 2.

3.4 BBA Accuracy Results

Figures 2 and 3 provide results for particular traces that illustrate the typical accuracy of the BB analysis for different types of traces. The results in Figures 2(a, b, c) are for a trace with exponential inter-event times (CV = 1), n = 2, m = 200, and rcf = 2. The actual rate (equal to 10 for the first 200 events) is shown with a gray diamond symbol at selected points, while the predicted rate at each event is plotted with the solid black curve. Figure 2(a) shows that the BBA predictions closely match the actual event rates throughout the trace. Figure 2(b) shows results for a simulated "on-line" BBA, in which the rate at each event time is predicted using only the prior event times; most predicted rates still match the actual rate fairly closely. In particular, BBA-online is significantly better at predicting the location of rate changes, and produces more stable estimates overall, than the EWMA estimates illustrated in Figure 2(c).
[Figure 2: actual vs. predicted rate per event for (a) off-line BBA(4,10), (b) BBA-online, (c) EWMA(0.9), and (d) a gradual rate change]
Fig. 2. Representative accuracy (CV = 1, m = 200, rcf = 2 for a, b, c)
A key result of our experiments is that BBA-online is significantly more accurate than EWMA over all possible values of the EWMA weight parameter. Figure 2(d) provides results for a trace similar to the previous ones in Figure 2, but with one gradual rate change instead of two abrupt rate changes. Again BBA is quite accurate, and the BBA-online estimates (omitted to conserve space) are significantly more stable than EWMA. Figure 3 shows that BBA is also reasonably accurate most of the time when the inter-event time distribution has CV = 3 and m = 4000. However, the BBA estimates for such traces contain a fairly large number of falsely predicted rate "spikes" with width less than 100 events and rcf > 2. Since those rate spikes can easily be "erased", we consider measures of f_e both with the spikes included and with them erased and replaced by the higher of the two rates before and after the spike.
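For reference, the EWMA baseline is a one-liner per event. The sketch below is a minimal reconstruction of the estimator as described in Section 1 (the weight of 0.9 matches the value used in Figure 2(c)):

```python
def ewma_rates(times, weight=0.9):
    """EWMA rate estimate at each arrival: a weighted sum of the previous
    estimate and the inverse of the most recent inter-arrival time."""
    estimates, est = [], None
    for prev, cur in zip(times, times[1:]):
        inst = 1.0 / (cur - prev)       # instantaneous rate
        est = inst if est is None else weight * est + (1.0 - weight) * inst
        estimates.append(est)
    return estimates
```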
Fig. 3. Representative accuracy for abrupt rate changes (h2, CV = 3.1): BB Offline(1000,25) vs. actual rate over 32,000 events
[Figure 4: average f_15 (e < 15%) and f_30 (e ≤ 30%) for rcf ≥ 2, vs. number of samples per period m = 200, 400, 800, 1000, 4000]
Fig. 4. Quantitative accuracy
Figure 4 summarizes the overall average values of f_15 and f_30 as a function of CV, m, rcf, and e. The higher value in each pair of bars is for the case that e = 30 and the "spikes" are erased. Note that for traces with CV ≤ 1.2 and rcf ≥ 2, f_30 > 90% when m ≥ 400 and f_30 > 80% when m ≥ 200. For traces with CV = 3 and rcf ≥ 2, f_30 > 90% when m ≥ 4000 and f_30 > 80% when m ≥ 1000. Under these conditions, we can use BBA to obtain a fairly accurate analysis of the event rate in actual event traces, with the understanding that 10-20% of the BBA rate estimates have an error greater than 30%. In the next sections we analyze several actual event traces and interpret the results in light of these accuracy results for the synthetic traces.
4 Media Server Arrivals

4.1 Off-Line Characterization of Media Server Load

In this section we apply the off-line BB analysis to characterize client session arrivals to a media server (called BIBS), which were previously found to be Poisson [2], and client interactive requests to a different media server (called eTeach), which were found to have a Pareto inter-arrival time distribution during periods of approximately stable arrival rate. In the previous work, stationary periods were determined (approximately) by visual inspection of binned request arrival counts.

Figure 5(a) shows the result of BBA applied to the one-day BIBS trace with the highest load. We observe that the stationary periods identified by the BB analysis agree with the binned counts of arrivals per hour, yet are significantly more precise. In particular, the BB analysis (a) clearly identifies the (most likely) endpoints of the stationary periods, (b) reveals longer intervals of constant rate than is apparent in the binned data, and (c) more precisely identifies the peak rates (e.g., at 9pm in Figure 5(a)), which can be obscured by other analysis techniques and may be important for system design or configuration.
[Figure 5: (a) BB Offline(4,10) predicted rate and measured arrivals per hour vs. time; (b) interarrival time distribution for hours 14.3-18.7, with exponential (mean = 31), lognormal (m = 3, sigma = 0.9, CV = 1.1), and h2 (p1 = 0.26, lambda1 = 10, lambda2 = 0.026) fits; (c) BB Online(4,10) predicted rate vs. time]
Fig. 5. Actual and BBA-predicted media server request arrival rate
The precise stationary periods also delineate the samples that can be used to obtain more detailed measures of the client arrival process. Furthermore, the BB results provide a simple global characterization of the observed client session requests as a time-varying Poisson process with relatively infrequent, abrupt rate changes. The stationary intervals that have the highest rate, the lowest rate, and the largest duration are potentially of greatest significance for system design, configuration, and optimization.

Figure 5(b) shows typical results for the request interarrival time distribution in each of the twenty highest-rate periods found on the high-rate days for the BIBS server. Similar results were obtained for the ten highest-rate periods on the high-rate days for the eTeach server. In both cases, 97-98% of the measured interarrival times fit the exponential distribution, while the full distribution is more precisely modeled by a two-stage hyperexponential with a slightly heavier tail than the exponential. For both servers, this characterization is more precise than the previous ad hoc analysis [2], because the periods of stationary rate could not be precisely delineated there. The results for the eTeach server are also somewhat significant because several previous ad hoc characterizations of interactive client requests to Web servers (e.g., [2] and citations therein) found the interarrival distributions to deviate significantly from Poisson. Thus, the BB analysis provides a substantially new characterization of the interactive client requests to a Web media server, namely that these requests are nearly Poisson during the stationary periods, with a relatively small number of rate changes per day.
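When such a characterization is turned into a synthetic workload, the fitted two-stage hyperexponential is straightforward to sample. A minimal sketch (our own), using the Figure 5(b) fit parameters as an example:

```python
import numpy as np
rng = np.random.default_rng(0)

def h2_sample(n, p1, lambda1, lambda2):
    """Two-stage hyperexponential: draw from exponential(lambda1) with
    probability p1, otherwise from exponential(lambda2)."""
    branch = rng.random(n) < p1
    return np.where(branch,
                    rng.exponential(1.0 / lambda1, n),
                    rng.exponential(1.0 / lambda2, n))

# e.g., the BIBS fit of Figure 5(b): p1 = 0.26, lambda1 = 10, lambda2 = 0.026
gaps = h2_sample(10_000, 0.26, 10.0, 0.026)
```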
4.2 On-Line Estimation of Media Request Rate

In this section we evaluate the BB analysis as a technique for estimating the media server request arrival rate during system operation. Such arrival rate estimates might be needed by media caching algorithms, or by the Patching streaming protocol to compute the maximum duration of the patch streams. For the BIBS request trace that was characterized off-line in Figure 5(a), Figure 5(c) provides the on-line rate estimate at each client arrival using the BB analysis. The key result is that, generally, the BB online analysis provides significantly more stable and accurate online estimates than EWMA (the EWMA results are similar to those in Figure 2(c)). Very short transient spikes in the online rate estimate are due to local fluctuations in the inter-event times. In some cases, it may be useful to take action in response to a high rate estimate; for example, a spike could indicate that the file should be temporarily cached in memory, since disk bandwidth is limited and the file need not be cached for long. In other cases, it may be appropriate to ignore a temporary spike in the estimated arrival rate, pending further evidence that the new rate will be sustained. We note that distinguishing the temporary spikes from the stationary rate estimates may be easier with the BB estimates than with the EWMA estimates, since the BB rate estimates are significantly more stable before and after the spikes. Thus, the BB-online analysis shows promise for online rate estimation.
5 Internet Packet Traffic and Loss Rate

5.1 Loss Events in UDP Flows

We use the off-line BB analysis to characterize the loss events observed during two different 24-hour UDP packet flows. Figure 6 provides results for a flow transmitted between two sites in a metropolitan area network. We make similar observations on the loss event traces as in the media server analysis. In particular, we identified stationary loss rate periods of up to 1.5 hours that are well modeled by a Poisson process. BB-online is also generally significantly more stable and accurate than previous rate estimation methods (e.g., ALI(8)).
[Figure 6: (a) off-line BBA(4,10) estimated loss rate and measured loss rate per 1000 s vs. packet transmission time; (b) inter-loss-event distribution for hours 9.6-11 (measured: mean = 22, CV = 1.08) with exponential (mean = 22) and h2 (p1 = 0.45, lambda1 = 0.063, lambda2 = 0.03) fits; (c) BB Online(4,10) predicted loss rate vs. time]
Fig. 6. Actual and BBA-predicted loss rate
[Figure 7: (a) BB offline(1000,80) rate and measured rate per 250 ms bin for the 5-second stationary-rate interval with the highest rate; (b) binned packet arrival counts per 10 ms for the same period; (c) representative interarrival time distribution (measured: mean = 0.4, CV = 3.1) with lognormal MLE (m = 4.3, sigma = 1.7), Pareto (k = 0.384, alpha = 1.96), and h2 (p1 = 0.88, lambda1 = 9.2, lambda2 = 0.36) fits]
Fig. 7. BBA on the IPEX gateway trace
5.2 Packet Arrivals to the IPEX Gateway

In this section, we apply the off-line BB analysis to characterize packet arrivals in an IPEX gateway [5] trace (32 hours) and in a heavily multiplexed OC-48 trace studied in [8] (1 hour). Figure 7(a) provides the BBA results for a 5-second interval with the largest number of packet arrivals (8699). The BBA result is commensurate with the rate measured from binned packet arrival counts in 250-millisecond intervals. Notably, the BB analysis finds intervals of several seconds in duration during which the packet arrival rate is estimated to be stationary, punctuated by abrupt (and large) rate changes.

Figure 7(b) shows the packet arrival counts per 10 milliseconds for the same period as in 7(a). In contrast to (a), the stationary intervals are not evident, and the highly variable measures are similar to what previous work showed using the same binning method. This suggests that previous binning results could not reveal the stationary rate periods because (1) binning is highly dependent on the choice of bin size – too small a bin causes high variability due to statistical fluctuations in the small samples, while too coarse a bin makes the bin boundaries unlikely to align with the endpoints of the stationary periods – and (2) plotting many bins tightly together creates a visual impression of high variability. In contrast, we selected the 250 ms bin size in 7(a) based on the rate estimated by BBA, which reveals multi-second stationary rate periods. Figure 7(c) shows that the packet interarrival times during a stationary period are well modeled by a two-stage hyperexponential distribution, rather than by a Pareto or lognormal distribution.

5.3 Packet Arrivals in the OC-48 Trace

We further analyzed the OC-48 trace, whose packet arrivals are not as highly variable as in the IPEX trace, due to the heavy multiplexing. Figure 8(a) shows the BBA result for the one-minute interval with the largest number of arrivals in the hour.
[Figure 8: (a) BB Offline(4,10) rate and measured rate per 2-second bin for the 30 s with the most arrivals, starting at 9:20AM, 8/14/2002; (b) interarrival time distribution for the period 38.2-49 s, with a Poisson fit (lambda = 7.7, measured mean 7.7)]
Fig. 8. OC-48 trace, BBA
BBA reveals stationary rate periods as long as 8 seconds, with the binned data commensurate with its result. This contrasts with the sub-second intervals found in [8], probably due to the limitations of the Canny edge detection used in [8], which is sensitive to the choice of parameter. Also, the packet inter-arrival times during the stationary rate periods we found in the OC-48 trace can be modeled approximately as Poisson, as shown in Figure 8(b). Further research is needed, including analysis of packet traces from other sites, before definitive conclusions can be drawn, but it appears that overall the BB analysis results provide significant new insights into the local- and medium-timescale behavior of the packet arrival process that are relevant for creating network traffic workloads for system design.
6 Conclusion

This paper has investigated a recently proposed efficient technique, called Bayesian Blocks (BB), for characterizing the time-varying rate in a bursty event stream. Key properties of the BB analysis are that it is simple to apply, computationally efficient, and requires a relatively small sample size. The accuracy of the off-line BB analysis was assessed by applying the technique to a variety of synthetic traces with known rate changes. The off-line BB analysis was found to accurately identify each period of constant rate, as well as each period of stationary average rate during which the inter-event times have a heavy-tailed distribution. The BB analysis was then applied to a variety of measured event traces of interest, and all of the event traces we analyzed were found to have significant periods of stationary rate. Finally, we found BB-online to be significantly more accurate than previous on-line rate estimation methods.

Future work includes applying the BB method to further traces, such as TCP connection arrivals and FTP data connection arrivals. We are also interested in evaluating BB and other methods for detecting changes in average round trip times and in the available bandwidth of network flows, which are needed for
high-performance rate control. Finally, further development of the BB-online algorithm is a topic of our current research.
References
1. Abry, P., Veitch, D.: Wavelet Analysis of Long-Range Dependent Traffic. IEEE Trans. on Information Theory, Vol. 44, pp. 2-15 (1998)
2. Almeida, J.M., Krueger, J., Eager, D.L., Vernon, M.K.: Analysis of Educational Media Server Workloads. Proc. NOSSDAV'01 (2001)
3. Chiang, S., Vernon, M.K.: Characteristics of a Large Shared Memory Production Workload. Proc. 7th Workshop on Job Scheduling Strategies for Parallel Processing (2001)
4. Dukkipati, N., McKeown, N.: Processor Sharing Flows in the Internet. Stanford HPNG Technical Report (2004)
5. IPEX: Collaboration for Internet Traffic Measurement, http://www.xiwt.org/ipex/ipex.html
6. Feldmann, A., Gilbert, A.C., Willinger, W.: Data Networks as Cascades: Investigating the Multifractal Nature of Internet WAN Traffic. SIGCOMM (1998)
7. Katabi, D., Handley, M., Rohrs, C.: Congestion Control for High Bandwidth-Delay Product Networks. SIGCOMM (2002)
8. Karagiannis, T., Molle, M., Faloutsos, M., Broido, A.: A Nonstationary Poisson View of Internet Traffic. Proc. INFOCOM 2004, Hong Kong, March (2004)
9. Kim, M., Noble, B.: Mobile Network Estimation. MOBICOM (2001)
10. Leland, W.E., Taqqu, M.S., Willinger, W., Wilson, D.V.: On the Self-Similar Nature of Ethernet Traffic. IEEE/ACM Trans. on Networking, Vol. 2, pp. 1-15 (1994)
11. Paxson, V., Floyd, S.: Wide-Area Traffic: The Failure of Poisson Modeling. IEEE/ACM Trans. on Networking, Vol. 3, No. 3, pp. 226-244, June (1995)
12. Riedi, R., Crouse, M., Ribeiro, V., Baraniuk, R.: A Multifractal Wavelet Model with Application to Network Traffic. IEEE Trans. on Information Theory, Vol. 45, No. 3, pp. 992-1018 (1999)
13. Scargle, J.D.: Studies in Astronomical Time Series Analysis V. Bayesian Blocks, a New Method to Analyze Structure in Photon Counting Data. The Astrophysical Journal, Vol. 504, pp. 405-418 (1998)
14. Zhang, Y., Duffield, N., Paxson, V., Shenker, S.: On the Constancy of Internet Path Properties. Proc. Internet Measurement Workshop, San Francisco (2001)
15. Canny, J.: A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (1986)
Catching IP Traffic Burstiness with a Lightweight Generator

Chloé Rolland¹, Julien Ridoux², and Bruno Baynat¹

¹ Université Pierre et Marie Curie - Paris VI, LIP6/CNRS, UMR 7606, Paris, France
{rolland,baynat}@rp.lip6.fr
² ARC Special Research Center for Ultra-Broadband Information Networks (CUBIN), an affiliated program of National ICT Australia, The University of Melbourne, Australia
[email protected]
Abstract. This paper presents LiTGen, an easy-to-use and easy-to-tune open-loop traffic generator that statistically models IP traffic on a per-user and per-application basis. From a packet-level capture originating in an ISP wireless network¹, and taking the example of Web traffic, we show that the simple underlying structure of LiTGen is sufficient to reproduce the traffic burstiness accurately. In addition, the flexibility of LiTGen enables us to investigate the sensitivity of the traffic structure with respect to the distributions of the random variables involved in the model, and their possible dependencies.

Keywords: Traffic generator, scaling behaviors, second-order analysis, semi-experiments.
1 Introduction
Measuring, understanding, and reproducing network traffic characteristics are essential steps of traffic engineering. Traffic generators usually tackle the latter step. Among past proposals focusing primarily on Web traffic, [1] and [2] proposed hierarchical models but did not validate them against real traffic traces. The work presented in [3] is an effort to generate representative traffic for multiple and independent applications; however, the model underlying that generator was designed neither to specify the packet-level dynamics nor to capture the traffic scaling structure. More recently, [4] aims at reproducing the burstiness observed in captured traffic, but relies on an opaque third-party emulator of the link and network layers that requires substantial computing resources.

This paper presents LiTGen, a "Light Traffic Generator". LiTGen relies on a simple hierarchical description of traffic entities, most of them modeled by uncorrelated random variables and renewal processes. This design does not require considering network or protocol characteristics (e.g., RTT, link capacities, TCP dynamics...) and allows fast computation on a commonplace computer.
¹ This study would not have been conducted without the support of Sprint Labs. The authors would like to thank Sprint Labs for providing the wireless traces, and particularly A. Sridharan for his support.
Focusing on Web traffic, this paper confronts LiTGen with real traces captured on an operational wireless access network at Sprint Labs. Using a second-order analysis, we identify the dependencies across the random variables composing LiTGen's underlying model, and demonstrate LiTGen's ability to reproduce the captured traffic and its properties accurately over a wide range of timescales. To the best of our knowledge, we are the first to produce synthetic wireless traffic that matches the first two orders of the internal time series.

In the rest of this paper, Section 2 describes LiTGen and Section 3 develops the second-order analysis we used. In Section 4, we investigate the sensitivity of the traffic structure to the distributions of the random variables involved in the underlying model. Finally, we conclude with a summary of our findings and directions for future work.
2 Building a Lightweight Traffic Generator

2.1 Underlying Model
Earlier works identified several possible causes of correlation in IP traffic, namely the superimposition of traffic sources modeled by heavy-tailed distributions [5,6] and the inherent structure and interactions of protocol layers [7]. These assumptions call for a traffic generator based on a user-oriented approach and a semantically meaningful hierarchical model. The model is made of several levels, each characterized by a specific traffic entity. Each network entity is represented by one or several random variables, related either to a time metric (a duration or an inter-arrival time) or to a size metric.

Session level. We assume each user undergoes an infinite succession of session and inter-session periods. Taking the example of Web traffic, a user downloads a certain number of Web pages during a session. The random variable Nsession describes the user's session size, counting the number of pages downloaded, while TIS characterizes the inter-session durations.

Page level. The Web pages downloaded during a session are separated by reading times (OFF periods). We define two random variables for this level: the page size Npage describes the number of objects in a page, and Toff models the corresponding reading duration.

Object level. Each page is split up into a set of requests (sent by the user) and responses (from the server), where responses gather the page's objects (HTML skeletons or embedded objects such as pictures). IAobj and Nobj characterize, respectively, the inter-arrivals of objects in a page and the number of packets in an object.

Packet level. Finally, each object is made of a set of packets. IApkt characterizes the successive inter-arrival times between packets in an object.

LiTGen's model is thus kept simple, since it does not model client/server interactions. The model also does not rely on a complex emulator that would reproduce the link layer or TCP dynamics.
Note, however, that network characteristics and/or TCP dynamics can explicitly be taken into account by introducing simple queueing or Markovian models as input to our traffic generator; this work is currently under investigation. Finally, note that one could equivalently remove the session level from the hierarchy by including the inter-session durations in the distribution of the OFF-period durations. Nevertheless, this would make the characterization of Toff more complex and LiTGen less easy to use in practice.
Traffic Entities Identification
To calibrate LiTGen’s underlying model, we benefit from data traces captured on the Sprint PCS CDMA-1xRTT access network. The traffic has been captured on an OC-3 collecting link and corresponds to tens of wireless access cells (more details about the trace and its differences with wireline traffic, for both upload and download traffic, can be found in [8]). The trace consists of a collection of IP packets with accurate timestamps, entire TCP/IP headers, and provides a large diversity of users’ applications traffic. Because it has been extensively studied and is well known, we focus in this paper on the Web traffic downloaded by the mobile users on the wireless network to validate LiTGen. The application of LiTGen to other types of traffic, such as mail or P2P, can be found in [9]. The model calibration requires to identify the traffic entities from the captured trace. We aggregate packets to identify objects, pages and sessions, based on the 5-tuple associated to each packet ({IPD , IPS , portD , portS , proto}). A filter based on a source port number selection ({80, 8080, 443}) retains Web packets only2 . A user’s Web packets share the same destination IP address and are then grouped into sets of a given ({IPS , portD }) pair. These sets correspond to the Web flows the user requested. We identify objects within the packets sets by analyzing the TCP flags (SYN, ACK. . . ) of the TCP/IP headers. This method, comparable to [10], is particularly useful when HTTP persistent connections are used since a set of packets may carry several Web objects. Because we do not have access to the packets’ payloads, the identification of Web pages relies on heuristics to determine their boundaries. Once Web objects have been delimited, we define active periods during which one or several objects are being downloaded, opposed to inactive periods. The silence corresponding to a given inactive period may be due to the user (thus it is an OFF period) or the system (e.g. idle time due to the Web browser). We use a temporal clustering method (also used in [1,2,10]) to distinguish those two kinds of silences: an inactive period that lasts for more than a predefined threshold is labeled as an OFF period. Based on the inactive periods distribution, we empirically set the threshold to 1 second, a result consistent with [1,2,10]. Similarly, we aggregate Web pages into user sessions, setting empirically a threshold value to 300 seconds to separate OFF periods from inter-sessions3. 2 3
² See [9] for other kinds of traffic.
³ Note that the precise value of this threshold does not significantly impact the results.
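The temporal clustering just described reduces to grouping a user's sorted packet timestamps by gap thresholds. The sketch below is our own illustration of that pass (the packet-record layout is assumed); the two threshold values follow the text:

```python
WEB_PORTS = {80, 8080, 443}   # source-port filter retaining Web packets
OFF_THRESHOLD = 1.0           # seconds: longer inactive gaps are OFF periods
SESSION_THRESHOLD = 300.0     # seconds: longer gaps separate sessions

def split_on_gaps(timestamps, threshold):
    """Group sorted timestamps into clusters separated by gaps > threshold."""
    groups, current = [], [timestamps[0]]
    for prev, ts in zip(timestamps, timestamps[1:]):
        if ts - prev > threshold:
            groups.append(current)
            current = []
        current.append(ts)
    groups.append(current)
    return groups

def pages_per_session(user_packet_times):
    """Sessions for one user, each a list of page-level timestamp clusters."""
    times = sorted(user_packet_times)
    sessions = split_on_gaps(times, SESSION_THRESHOLD)
    return [split_on_gaps(s, OFF_THRESHOLD) for s in sessions]
```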
2.3 Traffic Generation
While in this paper we apply LiTGen to Web traffic, it can be adapted to any kind of traffic [9]. When numerous applications are multiplexed, we first set the number of users for each of them. For validation purposes, we extract each application's proportion and number of users from the captured trace; in an operational network, these statistics can be derived from the operator's knowledge of the services customers subscribe to. LiTGen then generates traffic for each user independently, from the upper-level entities (sessions) down to the lower ones (packets). For a given user, the process begins by generating a session starting at time t = 0; lower-level traffic entities are then created until all of the user's packets have been generated. A random circular shift is performed on each user's packet timestamps to accurately mix the traffic of different users, and the final synthetic trace is obtained by superimposing the synthetic traffic of all users and all applications.
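The top-down generation loop for one user can be sketched as follows. The hierarchy and variable names (Nsession, Npage, Nobj, IAobj, IApkt, Toff, TIS) follow the model of Section 2.1, but all the distributions below are placeholders standing in for the empirical ones extracted from the trace:

```python
import numpy as np
rng = np.random.default_rng(1)

def generate_user(n_sessions=3):
    """Packet timestamps for one user, generated top-down."""
    t, packets = 0.0, []
    for _ in range(n_sessions):
        n_pages = 1 + rng.poisson(4)                 # Nsession (placeholder)
        for _ in range(n_pages):
            n_objects = 1 + rng.poisson(3)           # Npage (placeholder)
            for _ in range(n_objects):
                t += rng.exponential(0.3)            # IAobj (placeholder)
                obj_t = t
                for _ in range(1 + rng.poisson(5)):  # Nobj (placeholder)
                    obj_t += rng.exponential(0.01)   # IApkt (placeholder)
                    packets.append(obj_t)
            t += rng.exponential(10.0)               # Toff: reading time
        t += rng.exponential(600.0)                  # TIS: inter-session
    return sorted(packets)

# The final trace superimposes all users (after a random circular shift
# of each user's timestamps) across all applications.
```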
3 Validation
We evaluate LiTGen on its ability to capture the complexity of the traffic correlation structure. To this aim, we use an energy spectrum comparison method to match the packet-arrival time series extracted from the captured trace and from the corresponding synthetic trace. Since the 24-hour trace is not stationary (see [8] for details), the analysis is performed on one-hour periods extracted from the entire captured trace. The results presented in the following correspond to a given one-hour period; similar results were obtained for the other one-hour extracts.

3.1 Wavelet Analysis
We use the Logscale Diagram Estimate, or LDE [11], to perform discrete wavelet transform analysis. For a given time series of packet arrivals, the LDE produces a logarithmic plot of the data's wavelet spectrum estimates. Although the LDE has the ability to identify correlation structures in the data trace [12], we mainly use it to assess the accuracy of the synthetic traces produced by LiTGen.

We first generate synthetic traffic using a simple version of our generator, called basic LiTGen. In this version, all traffic entities are generated from renewal processes, using the empirical distributions extracted from the captured trace, and no dependency of any kind is introduced between the random variables. Figure 1 shows that the spectrum of the synthetic trace produced by basic LiTGen (thin curve) differs markedly from the captured trace spectrum. As a first conclusion, basic LiTGen does not reproduce the captured traffic's scaling structure with good accuracy; we thus need to introduce additional dependencies between the random variables. In the following, we investigate the impact of introducing a very simple correlation structure into LiTGen. Previous studies [8,13] pointed out that a great part of the LDE energy is due to the organization of packets within flows.
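As a simplified stand-in for the LDE (which we do not reproduce here; see [11] for the actual estimator), one can compute the Haar-wavelet energy per octave of a packet-count series and compare traces coarsely. The following sketch is our own illustration:

```python
import numpy as np

def haar_energy_spectrum(counts, n_scales=12):
    """log2 of the mean squared Haar detail coefficients per octave j,
    computed on a series of packet counts per base time bin."""
    x = np.asarray(counts, dtype=float)
    spectrum = []
    for j in range(n_scales):
        if len(x) < 2:
            break
        pairs = x[: len(x) // 2 * 2].reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
        spectrum.append((j, np.log2(np.mean(detail ** 2) + 1e-12)))
        x = pairs.sum(axis=1) / np.sqrt(2.0)   # approximation coefficients
    return spectrum
```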
Fig. 1. Evaluating basic vs. extended LiTGen (wavelet energy spectra, log2 Variance(j) vs. scale j, of the captured trace and of the synthetic traces produced by basic and extended LiTGen)

Table 1. Fitting the average values of IA^s_pkt

Approximated f(s) = mean(IA^s_pkt): f(s) = a.s^b, with a = 0.8811 and b = −0.5897
Indices of goodness of fit:
- Sum of Squares due to Error: 0.1868
- Square of the multiple correlation (R-Square): 0.8712
- Degrees-of-Freedom-Adjusted R-Square: 0.8693
- Root Mean Squared Error: 0.0524
This leads us to refine LiTGen's model: in this extension, referred to as extended LiTGen, the in-object packet arrival times are still modeled by renewal processes, but the average in-object packet inter-arrival times now depend on the corresponding object size. To evaluate extended LiTGen, we extract size-dependent empirical distributions IA^s_pkt of in-object packet inter-arrivals from the captured trace (the maximum object size in packets, extracted from the captured trace, is 15547). When generating traffic, the packet inter-arrivals in a Web object of size s are then drawn from the corresponding IA^s_pkt distribution.

The resulting spectrum (circle curve in Fig. 1) is barely distinguishable from the captured one: introducing this simple dependency between object sizes and packet inter-arrival times succeeds in reproducing the traffic correlation structure accurately. This dependency may reflect the impact of TCP slow start on objects of different sizes: the bigger an object, the shorter its average packet inter-arrival times. Indeed, the average values of the IA^s_pkt distributions can be approximated with good accuracy by a decreasing power law, strengthening this conjecture. Table 1 gives the goodness of such a fit for objects of size s < 70 (a fit with cutoff; the population of larger objects is scarce, and 99% of the captured values correspond to objects of size s < 12).

Without considering network or protocol peculiarities, extended LiTGen reproduces the second-order traffic characteristics while remaining much simpler than [4]. However, in order to use extended LiTGen in operational conditions, we
Fig. 2. Investigating grouped IA^g_pkt distributions in LiTGen (wavelet spectra of the captured trace and of synthetic traces using the equivalent-population grouping and binary-tree levels n/4, n/16, n/32, and n/64)
need to be able to characterize how the packet inter-arrival time distribution depends on the object sizes. Ideally, this relation would be modeled analytically, either by finding a suitable distribution whose parameters are functions of the object sizes, or, as stated before, by involving simple (e.g., Markovian) TCP and/or network models as input to our traffic generator. This work is currently under investigation.

As a first attempt to understand the impact of this relation between in-object packet inter-arrivals and object sizes on the traffic characteristics, and to help us carry through the analytical modeling, we investigate the possibility of reducing the number of IA^s_pkt distributions to be considered, so that LiTGen remains simple while staying accurate. To this aim, we group objects of different sizes and compute "grouped" IA^g_pkt distributions. This operation requires determining how many groups are needed and their composition.

We first group objects while maintaining an equivalent population size p for each group. The first group contains all the objects of size s = 2 and defines the population p. The next group is made by gathering objects of increasing sizes until its population reaches p, and so on for the other groups. Because the population of the first group is large, we obtain only three groups, corresponding to the following object sizes: {[2], [3;6], [7;15547]}. Thus, the packet inter-arrival times in a Web object composed of two packets are drawn from the empirical distribution IA^{s=2}_pkt, the packet inter-arrival times in a Web object composed of three to six packets are drawn from the empirical distribution IA^{s∈[3;6]}_pkt, and so on. The resulting spectrum (triangle curve in Fig. 2) is very similar to that of basic LiTGen. The small objects, then, do not have a major impact on the spectrum, while the large ones, even if less represented, carry a great number of packets and have a significant impact on the spectra at medium and large time scales.
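The equal-population grouping just described is easy to state in code. The following sketch (our own, operating on a list of object sizes) reproduces the walk that yields the three groups {[2], [3;6], [7;15547]} on the captured trace:

```python
from collections import Counter

def equal_population_groups(object_sizes):
    """Group object sizes in increasing order, closing a group once its
    population reaches that of the first group (all objects of size 2)."""
    counts = Counter(object_sizes)
    sizes = sorted(counts)
    target = counts[sizes[0]]          # population p of the first group
    groups, current, pop = [], [], 0
    for s in sizes:
        current.append(s)
        pop += counts[s]
        if pop >= target:
            groups.append(current)
            current, pop = [], 0
    if current:                        # fold any remainder into the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups
```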
[Fig. 3 plots: log2 Variance(j) vs. scale j; (a) SEs applied to the captured trace, (b) SEs applied to the synthetic trace generated with extended LiTGen; curves for the reference trace and the semi-experiments A-Pois, S-Thin, P-Uni and T-Pkt.]
Fig. 3. LiTGen underlying model evaluation: semi-experiments methodology
level n in the tree) corresponds then to extended LiTGen and the use of the maximum number n of empirical distributions IA_pkt^s, one for each different object size s. Finally, the intermediate levels i in the tree correspond to the use of i empirical distributions IA_pkt^{g_i} during the generation process. In Figure 2, we observe the spectra of the groups formed by specific levels of the binary tree. As the observed level lies deeper in the tree (see the figure, from level n/64 to n/4) and the number of IA_pkt^g distributions used grows, the corresponding spectrum gradually gets closer to the reference spectrum (and to extended LiTGen), confirming the equivalent contribution of small and large objects. Other methods of building object groups led to similar spectra and interpretations. In conclusion, no key value defining the number of groups emerges; the ability to reduce the number of IA_pkt^s distributions to model depends directly on the desired accuracy.
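For concreteness, the equal-population grouping used above can be rendered as a short sketch (this is our own illustrative helper, not the authors' code; it assumes, as in the captured trace, that the smallest object size is two packets):

    from collections import Counter

    def equal_population_groups(object_sizes):
        """Gather object sizes into contiguous ranges whose populations each
        reach that of the first group (all objects of size 2)."""
        counts = Counter(object_sizes)
        sizes = sorted(counts)
        p = counts[2]                     # population of the first group
        groups, lo, pop = [], None, 0
        for s in sizes:
            lo = s if lo is None else lo
            pop += counts[s]
            if pop >= p:                  # close the current size range
                groups.append((lo, s))
                lo, pop = None, 0
        if lo is not None:                # leftover objects form a last group
            groups.append((lo, sizes[-1]))
        return groups

On the captured trace this procedure yields the three groups {[2], [3;6], [7;15547]} quoted above.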
3.2 Semi-experiments Analysis
To exhibit the internal properties of LiTGen's synthetic traffic, we now conduct an analysis based on semi-experiments (SEs). SEs were introduced in [14] and consist of arbitrary but insightful manipulations of internal parameters of the time series under study. Comparing the energy spectrum on the LDE before and after an SE leads to conclusions about the importance of the role played by the parameters the SE modifies. Using extended LiTGen, we apply the same set of SEs to the captured and synthetic traces and observe how they impact both spectra (Fig. 3(a) and 3(b)). T-Pkt is a Truncation manipulation that transforms the objects arrival process by keeping only the first packet of each object. Removing packets decreases the energy of the spectrum, which takes smaller values. As shown in Figures 3(a) and 3(b), T-Pkt has a similar impact on the captured and the synthetic traces.
S-Thin tests for the independence of objects. It randomly Selects objects with some probability, here equal to 0.9. When applying S-Thin, the spectra of the two traces keep the same shape but drop by a small amount close to log2(0.9) = -0.15 (barely visible on the plots). A-Pois targets the interactions between objects. This manipulation repositions the object Arrival times according to a Poisson process and randomly permutes the object order (preserving the internal structure of objects). While A-Pois is a drastic manipulation, it similarly has very little effect on the spectra of the two traces, indicating the negligible contribution of the objects arrival process. P-Uni reveals the impact of in-object packet burstiness. It uniformly distributes the arrival times of packets in each object while preserving the packet count and object duration. This manipulation flattens the spectrum from scales j = -11 to j = -5 in a comparable manner for the captured and synthetic traces. To sum up, the captured and synthetic trace spectra react similarly to each SE. As a consequence, LiTGen reproduces the key internal properties of the captured traffic highlighted by the SEs: the objects arrival process has little influence; the objects can be considered independent; and the packet arrival process within objects contributes most of the energy in the spectrum. The simple structure of LiTGen, which still relies on renewal processes, is thus sufficient to reproduce these traffic internal properties.
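To make the four manipulations concrete, the following Python stand-ins sketch them on a trace represented as a list of objects, each object being a sorted list of packet timestamps (our own rendering of the SEs of [14], not the original tooling):

    import random

    def t_pkt(objects):
        """T-Pkt: keep only the first packet of each object."""
        return [obj[:1] for obj in objects]

    def s_thin(objects, keep=0.9):
        """S-Thin: select each object independently with probability keep."""
        return [obj for obj in objects if random.random() < keep]

    def p_uni(obj):
        """P-Uni: spread packet arrivals uniformly over the object duration,
        preserving the packet count and the first/last arrival times."""
        if len(obj) <= 2:
            return obj[:]
        inner = sorted(random.uniform(obj[0], obj[-1])
                       for _ in range(len(obj) - 2))
        return [obj[0]] + inner + [obj[-1]]

    def a_pois(objects, rate):
        """A-Pois: shuffle object order and re-draw object arrival times from
        a Poisson process, preserving each object's internal structure."""
        shuffled = random.sample(objects, len(objects))
        t, out = 0.0, []
        for obj in shuffled:
            t += random.expovariate(rate)      # exponential inter-arrivals
            out.append([t + (p - obj[0]) for p in obj])
        return out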
4 Sensitivity of the Traffic with Regard to the Distributions
We now investigate whether "well-known" distributions can accurately approximate the empirical ones. Using statistical goodness-of-fit tests, we found that heavy-tailed distributions approximate well the majority of the random variables: while
[Fig. 4 plots: log2 Variance(j) vs. scale j for synthetic traces in which one random variable's distribution at a time is replaced by a memoryless one; (a) insensitive: IA_obj, N_page, T_off, N_session and T_IS; (b) sensitive: IA_pkt and N_obj.]
Fig. 4. Sensitivity of the traffic scaling behaviors with regard to the r.v. distributions
[Fig. 5 plot: log2 Variance(j) vs. scale j for the reference synthetic trace (empirical distributions), a trace with memoryless (ml) distributions for all variables except IA_pkt and N_obj, and a trace with all distributions memoryless.]
Fig. 5. Investigating the substitution of all empirical distributions
N_page, N_obj and T_off are close to power laws, N_session and IA_obj are close to sub-exponential distributions (respectively Lognormal and Weibull; see [15] for more details). Even if we did not manage to accurately model IA_pkt by a well-known distribution, it is closer to sub-exponential distributions than to exponential ones. Finally, T_IS can be well approximated by an exponential distribution. While several works studied the relationship between traffic burstiness and network or protocol characteristics (e.g., loss probabilities, RTT, link capacities, TCP dynamics) [4,13,16], the flexibility of extended LiTGen enables us to investigate the impact of the random variables' distributions on traffic burstiness. To this aim, we individually replace the empirical distribution of each random variable by a memoryless distribution (exponential or geometric) of the same mean. We thus create seven synthetic traces, each corresponding to a given random variable being replaced. We then compare these traces to the reference synthetic trace generated by extended LiTGen calibrated with the empirical distributions. Fig. 4(a) illustrates the investigation of IA_obj, N_page, T_off, N_session and T_IS. Even though heavy-tailed distributions fit these random variables accurately (except T_IS), modeling them by memoryless distributions has very little impact on the data spectra. Indeed, we can barely distinguish the synthetic traces from the reference one. In conclusion, the traffic scaling structure of the studied wireless trace is insensitive to the distributions of these random variables. In Fig. 4(b), we present the investigation of IA_pkt and N_obj. As an example, we observe that modeling IA_pkt by an exponential distribution flattens the spectrum at scales below j = -3. The traffic scaling structure is thus very sensitive to the distributions of IA_pkt and N_obj. This confirms the results obtained by the SE methodology, which highlighted the packet inter-arrival process within objects as the main source of energy in the spectrum. Finally, we create two new synthetic traces, which we compare to the reference one. We obtain the first trace by modeling the insensitive random variables (cf. Fig. 4(a)) with memoryless distributions. We thus create a synthetic trace in which only N_obj and IA_pkt are calibrated with empirical distributions. Fig. 5 shows that the corresponding spectrum (triangle line) matches the reference
one. This result has strong practical appeal: five of the seven random variables are modeled by memoryless distributions, while the traffic scaling structure is still reproduced. Moreover, we found that this spectrum reproduces the traffic internal properties highlighted by the semi-experiments in Section 3.2. In Fig. 5, the last synthetic trace is obtained by modeling all the random variables with memoryless distributions. We observe a great deviation between the corresponding spectrum (square line) and the reference one, which confirms the sensitivity of the traffic to the distributions of IA_pkt and N_obj.
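As an aside, the substitution used throughout this section is straightforward to reproduce; the sketch below is our own (with illustrative names) and replaces an empirical sample by a memoryless law of the same mean, exponential for durations and geometric for counts:

    import math, random, statistics

    def memoryless_like(samples, n, discrete=False):
        """Draw n values from a memoryless law matching the sample mean."""
        m = statistics.mean(samples)
        if discrete:                   # geometric with mean m (assumes m > 1)
            p = 1.0 / m
            return [max(1, math.ceil(math.log(1.0 - random.random())
                                     / math.log(1.0 - p)))
                    for _ in range(n)]
        return [random.expovariate(1.0 / m) for _ in range(n)]  # exponential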
5 Conclusion
This paper describes LiTGen, a lightweight traffic generator that accurately reproduces traffic scaling properties at small and large time scales. Illustrating it on Web traffic, we show LiTGen's ability to maintain second-order traffic characteristics without considering network or protocol peculiarities. We highlighted the dependency between object sizes and in-object packet inter-arrivals, and showed how it impacts the quality of the generator. Thanks to LiTGen, we investigated the impact of the random variables' distributions describing the IP traffic structure. This investigation helped us model these random variables simply and also identified the crucial ones. As an example, the object sizes (in number of packets) and the corresponding packet inter-arrivals have to be modeled carefully in order to accurately reproduce the original traffic spectrum. Nevertheless, our study demonstrated that the presence of heavy-tailed distributions in traffic does not necessarily imply correlation; some of them can be modeled by memoryless distributions without impacting the traffic scaling properties. In future work, we will investigate a new methodology to evaluate LiTGen's accuracy. We will compare the performance of a simple queue model fed, on the one hand, by the captured traffic and, on the other hand, by the synthetic traffic generated with LiTGen. The first results confirm LiTGen's ability to capture the captured traffic properties accurately: the performance of the queue under the synthetic traffic is very close to that obtained under the captured traffic, whereas simpler renewal processes (such as Poisson processes) as an input give performance parameters that are very far from reality. We will also investigate the dependency between object sizes and the packet arrival process as a possible signature for anomaly detection.
References
1. Mah, B.A.: An empirical model of HTTP network traffic. In: IEEE Infocom, Kobe, Japan (April 1997)
2. Barford, P., Crovella, M.: Generating representative web workloads for network and server performance evaluation. In: ACM SIGMETRICS, Madison, Wisconsin, USA (June 1998)
3. Sommers, J., Barford, P.: Self-configuring network traffic generation. In: ACM IMC, Taormina, Sicily, Italy (October 2004)
4. Vishwanath, K.V., Vahdat, A.: Realistic and responsive network traffic generation. In: ACM SIGCOMM, Pisa, Italy (September 2006)
5. Crovella, M., Bestavros, A.: Self-similarity in world wide web traffic: Evidence and possible causes. In: ACM SIGMETRICS, Philadelphia, PA, USA (May 1996)
6. Willinger, W., Taqqu, M.S., Sherman, R., Wilson, D.V.: Self-similarity through high-variability: Statistical analysis of ethernet LAN traffic at the source level. In: ACM SIGCOMM, Philadelphia, PA, USA (August 1995)
7. Misra, V., Gong, W.B.: A hierarchical model for teletraffic. In: IEEE CDC, Tampa, Florida, USA (December 1998)
8. Ridoux, J., Nucci, A., Veitch, D.: Seeing the difference in IP traffic: Wireless versus wireline. In: IEEE Infocom, Barcelona, Spain (April 2006)
9. Rolland, C., Ridoux, J., Baynat, B.: LiTGen, a lightweight traffic generator: application to P2P and mail wireless traffic. In: PAM, Louvain-la-Neuve, Belgium (April 2007)
10. Donelson-Smith, F., Hernandez-Campos, F., Jeffay, K., Ott, D.: What TCP/IP protocol headers can tell us about the web. In: ACM SIGMETRICS, Cambridge, Massachusetts, USA (June 2001)
11. Veitch, D., Abry, P.: Matlab code for the wavelet based analysis of scaling processes. http://www.cubinlab.ee.mu.oz.au/~darryl/
12. Abry, P., Taqqu, M.S., Flandrin, P., Veitch, D.: Wavelets for the analysis, estimation, and synthesis of scaling data. In: Self-Similar Network Traffic and Performance Evaluation. Wiley (2000)
13. Jiang, H., Dovrolis, C.: Why is the internet traffic bursty in short time scales? In: ACM SIGMETRICS, Banff, Alberta, Canada (June 2005)
14. Hohn, N., Veitch, D., Abry, P.: Does fractal scaling at the IP level depend on TCP flow arrival process? In: ACM IMC, Marseille, France (November 2002)
15. Rolland, C., Ridoux, J., Baynat, B.: Hierarchical models for different kinds of traffics on CDMA-1xRTT networks. Technical report, UPMC - Paris VI, LIP6/CNRS (2006). http://www-rp.lip6.fr/~rolland/techreport.pdf
16. Figueiredo, D., Liu, B., Misra, V., Towsley, D.: On the autocorrelation structure of TCP traffic. Computer Networks 40, 339-361 (October 2002)
Importance of the Maturity of Photonic Component Industry on the Business Prospects of Optical Access Networks: A Techno-Economic Analysis
Dimitris Varoutas1, Thomas Kamalakis1, Dimitris Katsianis1, Thomas Sphicopoulos1, and Thomas Monath2
1 Department of Informatics and Telecommunications, University of Athens, GR-15784, Greece
2 T-Systems Nova GmbH, Goslarer Ufer 35, D-10589 Berlin, Germany
Abstract. This paper discusses the influence of the maturity of the photonic component industry (PCI) on the business prospects of optical access network deployments. Using the TONIC techno-economic tool, the business prospects of such deployments (in terms of traditional investment indexes such as the Net Present Value (NPV) or the Internal Rate of Return (IRR)) are related to the several factors that characterize the maturity of the PCI, such as production volumes and component cost evolution. The analysis shows that even though the costs of Fiber-to-the-Home/Office (FTTH/O) and Fiber-to-the-Curb scenarios are mainly driven by fiber installation costs, the price reduction rate of photonic components may also affect the investment strategies. To speed up the cost reductions, telecom carriers should therefore invest in optical technology research and development.
1 Introduction
The photonics era is marked by an unprecedented bandwidth increase in core and metropolitan area networks. Optical technologies are presently migrating towards the customer premises, where various scenarios can be considered. In the Fiber-to-the-Home/Office scenario, the optical fiber is deployed up to the customer premises. The world leader in FTTH/O is Japan, where the number of subscribers has reached 6 million and has been growing steadily since 2003. Many countries in Europe are also considering FTTH/O in their major cities. However, financial reality often forces telecom carriers to turn to intermediate solutions such as Fiber-to-the-Curb (FTTC) [2]. The business prospects of such large telecom investment projects can be investigated using techno-economic methods and tools. Applying such methodologies, and taking into account the characteristics of European urban areas as well as service demand forecasts, it has been shown that FTTC solutions are indeed much more favorable than FTTH/O [1]. In [3], the historical, competitive and economic reasons for the slow deployment of FTTH in the United States market have been analyzed, and technical and regulatory proposals have been concluded. In addition to these works, studies on the economic perspectives of specific optical network architectures and technologies have recently appeared [4-7], but they focus mainly on
analyzing the original installation cost. However, in order to fully understand the business prospects of a telecom investment, many other financial aspects, such as Operation, Administration and Maintenance (OA&M) costs, must be taken into account in order to compute revenues and cash flows. Another point to consider is that the costs of the various network components tend to decrease in time as a result of increased production volumes as well as research and innovation. The PCI is a relatively immature industry compared to the electronics industry. Although integrated optics have made remarkable progress over the years, there are still many barriers preventing the industry from reaching electronics-like standards. This is mainly due to the small scale of integration and to the fact that there are still many competing integration platforms (InP, silicon and organic materials). This paper aims to highlight the way in which the maturity of the photonic component industry affects the prospects of hybrid fiber/copper Fiber-To-The-x (FTTx) access architectures from the operator's point of view. Towards this end, the TONIC techno-economic tool, developed within a series of EU-funded projects, is used. This methodology has been used to study various upgrade and deployment scenarios for both optical and wireless telecommunication networks [2,9,10]. The current version of the tool has been developed within the IST-TONIC (Techno-economics of IP Optimized Networks and Services) project [11] and the CELTIC/ECOSYS (techno-ECOnomics of integrated communication SYStems and services) project [12]. The scenarios studied in this paper are based on either Fiber-To-The-Cabinet (FTTC) or Fiber-To-The-Home/Office (FTTH/O) architectures. These can be implemented using two alternatives: an Asynchronous Transfer Mode (ATM) Passive Optical Network (PON) or an Ethernet-based access network with optical point-to-point connections (Figure 1). As shown in [2], the business prospects of ATM and Ethernet technologies are practically the same in both FTTC and FTTH/O. Using these network infrastructures and defining suitable service and traffic characteristics, the various scenarios are studied over an 8-year period (2003-2010) for different types of metropolitan areas, assuming an incumbent operator. The starting year was chosen to be 2003 rather than 2006 since, in many countries such as Japan, FTTH/O deployments date back to 2003, and the aim of the paper is to demonstrate how investments in optical technology may also affect FTTH/O and FTTC deployments that were made as early as that. The discount ratio (reflecting the cost of capital, the opportunity cost and the associated risk) is taken to be 10%, a mean value among telecom operators. Taxes have not been considered at all, since they vary depending on the country in question and the market conditions.
Table 1. Area Structure and Duct Availability
Area         Lmean (m)  Ds (km^-2)  Nb     Nsb  D1   D2
Dense Urban  1400       5641        1024   64   90%  50%
Urban        2200       2048        2048   32   60%  40%
Suburban     3400       410         16384  4    25%  25%
Fig. 1. Generic Comparison Architectures
2 Technoeconomic Methodology
The model's operation is based on its database, where the cost figures of the various network components are kept and constantly updated with data gathered from the biggest European telecommunication companies. The business case is also determined by the broadband services to be provided by the network. This includes the estimation of the market penetration of each service over the study period. The tariffs for the broadband services have to be defined, i.e., the part of the tariff that is attributed to the network under study. From the combination of yearly market penetration and yearly tariff information, the TONIC tool calculates the revenues for each year. The broadband access technologies and the architecture of the network providing the selected service set must be explicitly defined, and this requires some network planning expertise. The tool may be used for both wired and wireless scenario evaluation, and many network architectures can be accounted for, such as tree, mesh or ring architectures. To estimate the various infrastructure parameters, such as fiber cable and duct lengths, the TONIC tool includes a set of geometric models. These geometric models are optional parts, and the TONIC/ECOSYS tool can be used without them, e.g., for radio access technology evaluation, where no geometric models are necessary. The result of the architecture scenario definition is the so-called shopping list. This list is made for each year of the study period, and it shows the volumes of all network cost elements (equipment, cables, cabinets, ducting, installation, etc.) and the distribution of these network components over different flexibility points and link levels. The costs of the network components are calculated using an integrated cost database. Architecture scenarios are used together with the cost database to calculate the investments required for each year. The future market penetration of the services and the tariffs associated with them are calculated through market forecasts and are inserted into the tool. The tool also calculates the future prices of the various network elements, which are needed in the calculation of the Operation, Administration and Maintenance (OA&M) costs. The price P(t) of each network element is assumed to follow the extended learning curve [11],
$$P(t) = P(0)\left[\,n_r(0)^{-1}\left\{1 + e^{\ln\left[n_r(0)^{-1} - 1\right] - \frac{2\ln 9}{\Delta T}\,t}\right\}^{-1}\right]^{\log_2 K} \qquad (1)$$

where the constants K, nr(0) and ΔT are the learning curve coefficient, the initial relative production volume and the growth period, respectively. The growth period ΔT is the time taken for the total production volume to grow from 10% to 90% of its maximum value, while the learning coefficient K is the price reduction experienced when the production volume is doubled. In cases where historical data are available, these constants can be determined using ordinary least squares regression. The TONIC tool then calculates revenues, investments, cash flows and other financial results for each year of the study period. The following subsections describe the basic assumptions used in the application of the techno-economic methodology.
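For reference, Eq. (1) can be evaluated directly; the sketch below is our own Python rendering (parameter names follow the text, and the example values in the final comment are illustrative):

    import math

    def extended_learning_curve(t, P0, K, nr0, dT):
        """Price P(t) from Eq. (1): logistic production-volume growth, with
        the price dropping by the factor K at each doubling of the volume."""
        n_t = 1.0 / (1.0 + math.exp(math.log(1.0 / nr0 - 1.0)
                                    - (2.0 * math.log(9.0) / dT) * t))
        return P0 * (n_t / nr0) ** math.log2(K)

    # e.g., K = 0.8, nr0 = 0.01, dT = 10: the price falls to about 30% of
    # P(0) after ten years.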
[Fig. 2 diagram: one CEx serving LEx areas of 4 x 16,384 (dense urban), 4 x 16,384 (urban) or 8 x 8,192 (suburban) potential subscribers.]
Fig. 2. Example of Central Exchange (CEx) segmentation
[Fig. 3 diagram: dense urban distribution structure from customer to CEx, with flexibility points (customer, floor, building, cabinet, branching boxes, LEx, CEx; 65,536 down to 1) and distribution ratios (1:4, 1:16, 1:4, 1:8, 1:8, 1:4, 1:1); average copper cable length 1,400 m (FTTEx), 200 m to the cabinet (FTTCab), and 5,200 m of fiber cable.]
Fig. 3. Distribution Structure and Geographic Model of the Dense Urban Area
2.1 Area Model and Distribution Structure
To calculate the fiber duct and cable lengths, each area (dense urban, urban or suburban) is first described in terms of subscriber density, loop lengths, and geographical and market characteristics. The area model is based on the Metropolitan Access Network (MAN), starting from the Central Exchange (CEx) and comprising the whole access network all the way to the customers. It is assumed that one CEx serves either four dense urban, four urban, or eight suburban service access areas (Figure 2). The dense urban and urban areas under study correspond to 16,384 customer units each, while the suburban area corresponds to 8,192. For each area type, all customers are connected to the same Local Exchange (LEx). The total number of customer units connected via POTS lines to one CEx thus amounts to 65,536 for all scenarios. The parameters of the
three different areas are summarized in Table 1 in terms of the mean cable length Lmean connecting the subscriber to the LEx, the density of subscribers Ds, the number of buildings Nb, the number of subscribers per building Nsb, the duct availability D1 between the LEx and the cabinets, and the duct availability D2 between the cabinets and the buildings. The duct availability is one of the most important parameters: it has a strong influence on the economics of the various scenarios due to the high investment costs of ducting systems related to civil works. Figure 3 illustrates the access network distribution structure and the geometric model for the dense urban area. The model has been developed with a total of eight flexibility points (FP, i.e., network levels) and seven link levels (LL). The cost of digging up trenches and installing ducts and cables is crucial for the economics of any telecommunications network infrastructure. The geometric model forms the basis for the calculation of very important cost components, and it is a fundamental step towards the techno-economic modeling of a telecom investment project.
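As a rough illustration of how the parameters of Table 1 drive digging costs (the TONIC tool itself uses more detailed street-length models; the square-grid assumption below is ours, not the tool's):

    import math

    def new_trench_km(Nb, Nsb, Ds, duct_availability):
        """Toy estimate of new trench length: Nb buildings on a square grid
        with subscriber density Ds (km^-2) and Nsb subscribers per building;
        only routes without existing ducts need digging."""
        spacing = 1.0 / math.sqrt(Ds / Nsb)  # km between neighbouring buildings
        total = Nb * spacing                 # roughly one grid segment each
        return total * (1.0 - duct_availability)

With the dense urban values of Table 1 (Nb = 1024, Nsb = 64, Ds = 5641, D2 = 50%), this toy model suggests roughly 55 km of new trenches between cabinets and buildings.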
2.2 Service and Customer Definitions
The services to be considered depend on the customer profiles, which are classified into Residential and Business customers (the latter including Small and Medium Enterprises (SME) and Small Office/Home Office (SOHO)). Key network requirements for business customers are scalability, security, flexibility and differentiated QoS. The range of services required by business customers is wider than for residential customers: file transfer within an Intranet, which means bursty traffic and highly variable bit rates; high bit rate access to the Internet; and videoconferencing with strong real-time constraints. In most cases, these services require higher bit rates than typical residential services. A detailed description of the service definition can be found in [2].
2.3 Tariff Structure
The broadband tariff structure is rather complex. Important tariffs are: connection tariff, access tariff, service provider tariff, traffic tariff, transaction tariff, and charge for content (i.e., pay per view). The tariff model [14] is constructed in the following way: it sets a reference tariff (derived from a survey conducted among a number of large operators in Europe) of 720 €/year in 2001 for 512 kbit/s asymmetrical services dedicated to residential customers, and another reference tariff of 5,280 €/year in 2001 for 512 kbit/s symmetrical services dedicated to business customers. In addition, the model assumes an increase of 17% for each doubling of asymmetric downstream capacity and an increase of 30% for each doubling of symmetric downstream capacity in 2002. On top of these tariff increases for transfer rate doubling, the model applies a yearly price erosion of 10%. A more detailed analysis can be found in [16].
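The stated rules translate into a simple formula; the sketch below is our own encoding of them (2001 reference tariffs, +17%/+30% per capacity doubling for asymmetric residential and symmetric business services respectively, and 10% yearly erosion):

    import math

    def yearly_tariff(year, kbit_s, business=False):
        """Tariff (EUR/year) for a service of the given capacity and year."""
        ref = 5280.0 if business else 720.0  # 512 kbit/s reference, 2001
        step = 0.30 if business else 0.17    # increase per capacity doubling
        doublings = math.log2(kbit_s / 512.0)
        return ref * (1.0 + step) ** doublings * 0.90 ** (year - 2001)

For example, a 2 Mbit/s residential service in 2005 comes out at roughly 650 €/year under these assumptions.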
2.4 Network Switch Model
In order to calculate the OA&M costs for each network component, the different switching systems must be modeled. The network element cost model has been built
based on a switch model containing the parts found in switches of different vendors. These are, in general, basic equipment including the switching fabric, power supply, rack and line interface cards. The model takes into account list price information of several vendors and volume production effects. Considering the relative novelty of the switching systems, and the fact that at the moment these "carrier class" systems under study are at a very early stage of development worldwide, for both ATM and Ethernet systems, it has been assumed that the initial accumulated production volume nr(0) is only 1%. An Ethernet layer-3 switch is installed at the Central Exchange. A detailed description of the switch models can be found in [2].
3 Influence of the Cost of Photonic Components
In [2] it has been shown that FTTC solutions for dense urban and urban areas result in advantageous business cases due to the existing infrastructure. It has also been shown that the total investments for point-to-point Ethernet and point-to-multipoint ATM FTTC architectures targeting both the business and residential markets are practically the same. Point-to-multipoint solutions like GPON become more interesting when considering FTTH residential scenarios and have advantages over point-to-point solutions when capacity demand is limited and the distribution ratio is high. Altogether, the authors believe in a future heterogeneous solution where both point-to-point and point-to-multipoint solutions are able to sustain their specific advantages. However, the further evolution of FTTC architectures towards the FTTH/O architecture can be economically viable only if the market becomes mature enough for such high-speed data transfer rates. This is due mainly to the tariffs and customer penetration, but the equipment cost is also an important factor. The Net Present Value (NPV) and the payback period can be used as financial measures in the evaluation of telecom investment projects. To estimate the importance of the various parameters (such as equipment cost), one can carry out a sensitivity analysis of these measures. Sensitivity analysis is carried out by recalculating these measures while varying one single parameter and keeping all the other parameters constant. The sensitivity results can be used to identify and rank the most critical parameters in the investment project. Figures 4(a) and 4(b) show the NPV for the FTTC and the FTTH/O Ethernet deployment scenarios in dense urban areas, respectively. The parameters varied are identified at the left of the bar figures. As shown in Figure 4, the NPV for the FTTC basic scenario is 30 M€, much higher than that of FTTH/O (8 M€), which is to be expected mainly because of the increased cable installation cost in the latter scenario. Both business cases, however, have a positive NPV. From the FTTC NPV sensitivity analysis in Figure 4(a) we observe that for a 20% increase in the total penetration, the NPV rises as high as 37 M€, implying an increase of 23% in the total investment value. Figure 4 also shows that in both FTTC and FTTH/O architectures the price evolution of the components is important. Unfortunately, however, even if the prices fall within a ten-year period to 33% of their 2003 level, the deployment is mandated mainly by economic and not by technological factors. This is due to the fact that technology
[Fig. 4 bar charts: NPV sensitivity (in M€) to downside/upside variations of wholesale ARPU, total penetration, network operations, access equipment costs and sales and marketing for (a) FTTC, plus in-house cabling and buildings connected in 2003/2010 for (b) FTTH/O.]
Fig. 4. Access equipment cost evolution is more critical for FTTO/H deployment (b) than for FTTC architectures (a). ARPU = Average Revenue Per User.
(especially optoelectronics) contributes less to the overall investment cost, even in the FTTC scenario. This is better illustrated in Figure 5, where the contribution of each cost component to the total investment cost is analyzed. As can be seen, the labour/installation costs contribute as much as 62% of the overall investment, and hence expenditures are mainly dominated by this factor. In suburban areas, this cost contribution will be even higher, and hence even more unfavorable figures are to be expected in that case. Although the price of photonic components seems to be a secondary factor influencing the business case, one should keep in mind that part of the overall expenditures are
[Fig. 5 pie chart, investments by type: Labour/Installation 62%, Material/Electronics 29%, Material/FibreCable 7%, Material/PassOptComp 1%, Material/Enclosures 1%.]
Fig. 5. Investments per type. The main costs are due to installations and electronics in FTTC.
related to OA&M costs, which are in turn related to the price evolution of the telecom equipment. It is therefore more interesting to investigate the sensitivity of the various economic figures of merit with respect to the quantities governing the price evolution of photonic components, and hence the maturity of the PCI.
[Fig. 6 plot: NPV (millions of euros) vs. ΔΤ (5 to 15 years), with curves for FTTO/H dense urban, FTTO/H urban, FTTC dense urban and FTTC urban.]
Fig. 6. Relation between the NPV and the ΔΤ parameter of photonic components
This is demonstrated in figure 6, where the value of the NPV is plotted for both FTTC and FTTH/O in the case of urban and dense urban areas with respect to the parameter ΔT of the optical components. As ΔT is increased, the rate of increase of the production volumes of optical components is reduced and the NPV drops. Comparing the slope of the curves, it is deduced that the deterioration is more severe in the FTTH/O scenario. In the dense urban case, the NPV is reduced considerably but remains marginally positive. In the urban case, the NPV attains even smaller values, thereby seriously undermining the value of such a business alternative. The slope of the curves
[Fig. 7 plot: payback period (years) vs. ΔΤ (5 to 15 years), with curves for FTTO/H dense urban, FTTC dense urban and FTTC urban.]
Fig. 7. Relation between the payback period and the ΔΤ parameter of photonic components
indicates that major improvements in optoelectronics technology may benefit the deployment of FTTH/O in urban and suburban areas and help overcome the labour and installation costs. Figure 7 illustrates the Pay Back Period (PBP) as a function of ΔT for the three scenarios with positive NPV. In the case of FTTC, ΔΤ does not seriously change the PBP in the dense urban case, while its impact is more noticeable in the urban case. As expected, the PBP in the dense urban FTTO/H scenario is more intimately related to ΔΤ. A slowdown of the photonic component industry by 10 years (from ΔT = 5 yrs to ΔT = 15 yrs) can add 2 years to the expected PBP of the investment. Since the new optoelectronic components are not yet mature, their price evolution is likely to be uncertain. The influence of this factor in percentage terms is quite large, due to the fact that optoelectronics represent the majority of investments in the project apart from labour/installation costs.
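The mechanism behind Figures 6 and 7 can be mimicked with a toy cash-flow model that reuses extended_learning_curve() from the sketch in Section 2 (all monetary figures below are made up; the real study uses the TONIC tool):

    def npv(cash_flows, rate=0.10):
        """NPV at the study's 10% discount ratio."""
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

    def npv_vs_dT(dT, years=8, revenue=12.0, units=5.0, P0=1.0):
        """Yearly equipment purchases priced by the learning curve against
        fixed revenues; a larger dT keeps prices high and lowers the NPV."""
        flows = [revenue
                 - units * extended_learning_curve(t, P0, 0.8, 0.01, dT)
                 for t in range(years)]
        return npv(flows)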
4 Conclusion
In this paper a methodology for determining the impact of various key factors influencing the economic viability of metro/access optical network upgrades was presented. The prospects of both FTTC and FTTH/O deployments based on Ethernet technology were considered from the viewpoint of an incumbent network operator. The business case studies show that the success of FTTC and FTTH/O deployments is determined mainly by economic and market issues (tariffs, demand) rather than technological ones. Existing technology was shown to favor investments in dense urban areas, and particularly FTTC. FTTH/O results in longer payback periods and should therefore wait for either a more mature market or technological improvements; otherwise operators must accept longer payback periods. Investments in photonic component technology can lead to shorter payback periods and, consequently, additional cash flows. These cash flows could be used for further investments in order to provide advanced broadband services to end users, especially in urban and suburban areas as well as other areas in general. Since the price evolution of components is not linear but rather logarithmic with
respect to production volumes, mass production of optoelectronics can turn unfavorable cases into successful ones. In conclusion, investments in technology can develop new values besides the traditional ones adopted by the operators, which are based mainly on new service development and operational cost savings.
Acknowledgement. This work was co-funded by the European Social Fund & National Resources under an EPEAEK II - PYTHAGORAS grant. The authors would also like to thank their partners in the IST/TONIC and CELTIC/ECOSYS projects.
References
1. Organisation for Economic Co-operation and Development, http://www.oecd.org/document/9/0,2340,en_2649_37441_37529673_1_1_1_37441,00.html
2. T. Monath, N. K. Elnegaard, P. Cadro, D. Katsianis, and D. Varoutas, Economics of fixed broadband access network strategies, IEEE Commun. Mag., 41, 132-139 (2003).
3. N. J. Frigo, P. P. Iannone, and K. C. Reichmann, A view of fiber to the home economics, IEEE Commun. Mag., 42, S23 (2004).
4. R. Van Caenegem, J. A. Martinez, D. Colle, M. Pickavet, P. Demeester, F. Ramos, and J. Marti, From IP over WDM to all-optical packet switching: Economical view, J. Lightwave Technol., 24, 1638-1645 (2006).
5. E. Le Rouzic and S. Gosselin, 160-Gb/s optical networking: A prospective techno-economical analysis, IEEE J. Lightwave Technology, 23, 3024-3033 (2005).
6. J. Livas, Optical transmission evolution: From digital to analog to network trade-offs between optical transparency and reduced regeneration cost, IEEE J. Lightwave Technology, 23, 219-224 (2005).
7. S. K. Korotky, Network global expectation model: A statistical formalism for quickly quantifying network needs and costs, IEEE J. Lightwave Technology, 22, 703-722 (2004).
8. E. Chen and D. Lu, The economics of photonics manufacturing, J.P. Morgan H&Q Equity Research (2001).
9. D. Varoutas, D. Katsianis, T. Sphicopoulos, F. Loizillon, K. O. Kalhagen, K. Stordahl, I. Welling, and J. Harno, Business opportunities through UMTS-WLAN networks, Ann. Telecommun., 58, 553-575 (2003).
10. D. Katsianis, I. Welling, M. Ylonen, D. Varoutas, T. Sphicopoulos, N. K. Elnegaard, B. T. Olsen, and L. Budry, The financial perspective of the mobile networks in Europe, IEEE Pers. Commun., 8, 58-64 (2001).
11. EU TONIC, Techno-economics of IP Optimised Networks and Services, IST-2000-25172. Available at http://www-nrc.nokia.com/tonic/
12. ECOSYS, techno-ECOnomics of integrated communication SYStems and services. Available at http://www.celtic-ecosys.org/
13. T. P. Wright, "Factors affecting the cost of airplanes," J. Aeronautic Science, 3, pp. 122-128 (1936).
14. K. Stordahl, L. A. Ims, M. Moe, "Broadband market - the driver for network evolution," Proc. Networks 2000, Toronto, Canada, September 10-16 (2000).
15. K. Stordahl, L. A. Ims, N. K. Elnegaard, F. Azevedo, B. T. Olsen, "Broadband access network competition - analysis of technology and market risks," Proc. Globecom '98, Sydney (1998).
16. D. Varoutas, C. Deligiorgi, Ch. Michalakelis, and T. Sphicopoulos, "A hedonic approach to estimate price evolution of telecommunication services: Evidence from Greece," to appear in Applied Economics Letters, Taylor and Francis.
The Token Based Switch: Per-Packet Access Authorisation to Optical Shortcuts
Mihai-Lucian Cristea1, Leon Gommans1, Li Xu1, and Herbert Bos2
1 University of Amsterdam, The Netherlands, {cristea,lgommans,lixu}@science.uva.nl
2 Vrije Universiteit Amsterdam, The Netherlands, [email protected]
Abstract. Our Token Based Switch (TBS) implementation shows that a packet-based admission control system can be used to dynamically select a fast end-to-end connection over a hybrid network at gigabit speeds. TBS helps high-performance computing and grid applications that require high-bandwidth links between grid nodes to bypass the regular Internet for authorised packets, by establishing shortcuts on network links with policy constraints. TBS is fast and safe, and uses the latest network processor generation (Intel IXP2850) and the Fairly Fast Packet Filter software framework.
1 Introduction
Grid and other high-performance applications tend to require high-bandwidth end-to-end connections between grid nodes. Often the requirements are for several gigabits per second. When spanning multiple domains, fibre-optic network owners must cooperate in a coordinated manner in order to provide high-speed end-to-end optical connections. Currently, the administrators of such connections use paper-based long-term contracts. There is a demand for a mechanism that dynamically creates these fast end-to-end connections (termed lightpaths) on behalf of grid applications. The use of lightpaths is also envisaged for applications that are connected through hybrid networks [1]. A hybrid network contains routers and switches that accept and forward traffic at layers 1, 2, or 3. In other words, hybrid networks consist of traditional (layer-3) routed networks which allow for optical (layer-1 or -2) shortcuts for certain parts of the end-to-end path. Currently, at the peering points between the routed networks of the Internet Service Providers (ISPs), Border Gateway Protocol (BGP) policies statically determine what traffic bypasses the (slow) routed transit network and which links it will use to do so. However, when considering hybrid networks interconnected over long distances, we would like the peering points to play a more active and dynamic role in determining which traffic should travel over which links, especially since multiple links often exist in parallel. Therefore, an important role for the peering points is path selection and admission control for the links. Figure 1 shows a hybrid network composed of three ISPs interconnected through routed networks (Internet) and also through two optical links managed by different owners. The connections across the optical links are via the peering points (PPs). An example of such a connection may be a high-bandwidth transatlantic link.
Users X and Y on the left access servers on the right. We want them to bypass the routed Internet and use optical shortcuts instead. However, while not precluded by the model, we do not require each user to have an individual relationship with each ISP and shortcut along the path. Indeed, the link owners should not normally have a notion of which specific IP addresses are allowed to access the link. Instead, we expect ISPA (say, an organisation, department or research group) to have a contract with the link owners and to decide locally (possibly at short timescales) who should get access to the links in domains that are not under its control. For scalability, the model allows the system to be federated along more levels, where ISPA contacts the authorisation server at the next level, which in turn contacts a server at a higher level still, etc. In practice, the number of levels is expected to be small (often one).
[Fig. 1 diagram: ISPA (with users X and Y), ISPB (with a rogue user) and ISPC interconnected through routed networks (Internet) via BGP border routers, and through the optical links of link owners 1 and 2 via peering points (PP); server 1 (compute) and server 2 (datastore) sit on the right.]
Fig. 1. Peering in hybrid networks: end-to-end lightpaths
A flexible way of bypassing the routed Internet is to have ISPA tag traffic from X and Y with some sort of token that signals to remote domains that these packets should be pushed across the optical shortcuts. At the same time, we want to prevent rogue users (e.g., the ISPB client indicated in the figure) from tagging their packets with similar tokens to gain unauthorised access to the shortcuts. In principle, the signalling of peering requests to the peering points (PPs) can be done out-of-band, in-band, or both, as described in the Authentication, Authorisation and Accounting (AAA) framework (RFC 2904) [2]. In out-of-band signalling there is an explicit control channel to the PPs, separate from the data channel(s). In an in-band mechanism, the admission control is based on specific data inserted into the communication channel. We opt for in-band signalling for reasons of flexibility resulting from the per-packet granularity. Specifically, we insert tokens into each packet as proof of authorisation. Tokens are a simple way to authorise resource usage and may convey different semantics. For instance, we may specify that only packets with the appropriate token are allowed to use a pre-established network connection in a specific time-frame, and embed these tokens in the packets of an application distributed over many IP addresses. However, our tokens differ from common flow identifiers (e.g., ATM VPI/VCI pairs, MPLS labels, and IPv6 flow labels) in that they cannot be tampered with and that they
may be associated with arbitrary IP addresses. In essence, tokens are like IPsec authentication headers (AH [3]), except that we aim for authentication that is more efficient in computation and more flexible than the IPsec standard (we can authenticate various fields from the Ethernet or IP headers by using a customised mask). In addition, the application domain is different in the sense that our TBS system serves applications distributed over many IP addresses. Tokens bind packet attributes to an issuing attribute authority (e.g., an AAA server in our case). Tokens can be acquired and subsequently used anonymously. Moreover, a token can bind to different semantics (e.g., a user, a group of users, or an institute) and decouples time of authorisation from time of use. In the switch described in this paper, tokens are used to select shortcuts, and different tokens may cause packets to be switched onto different links; in principle they could also be used to enforce other QoS-like properties, such as loss priorities. This paper describes both the design of the Token Based Switch (TBS) and its implementation on high-speed network processors. TBS introduces a novel and rather extreme approach to packet switching for handling high-speed link admission control to optical shortcuts. The main goal of this project was to demonstrate that TBS is feasible at multi-gigabit link rates. In addition it has the following goals: (1) path selection with in-band admission control (specific tokens give access to shortcut links), (2) external control for negotiating access conditions (e.g., to determine which tokens give access to which links), and (3) secured access control. The third goal is also important because part of the end-to-end connection may consist of networks with policy constraints, such as those meant for the research application domain in the LambdaGrid. Moreover, a critical infrastructure needs protection from malicious use of identifiers (e.g., labels in (G)MPLS, header fields in IPv4, or the flowID in IPv6). For these reasons, our token recognition uses cryptographic functions, for example to implement the Hash function based Message Authentication Code (HMAC) (see RFC 2104 [11]). An external control interface is required to negotiate the conditions that give access to a specific link. Once an agreement has been reached, the control interface should accept an authorisation key and its service parameters that will allow packets to access the owner's link. To operate at link speeds, we push all complex processing to the hardware; in our case we use a dual Intel IXP2850 with on-board crypto and hashing units. Several projects address the issue of setting up lightpaths dynamically (e.g., UCLP, DWDM-RAM [4]), and others look at authorisation (e.g., IP Easy-pass [5]). However, to our knowledge, no solutions exist that support both admission control and path selection for high-speed network links in an in-band fashion for setting up safe, per-packet-authenticated optical shortcuts. In addition, our approach makes it possible to set up multiple shortcuts on behalf of applications that span multiple domains. As a result, a multi-domain end-to-end connection can be transparently improved in terms of speed and number of hops by introducing shortcuts. The remainder of this paper is organised as follows. Section 2 discusses the TBS architecture. Section 3 explains implementation details. The system is evaluated in Section 4. Related work is discussed throughout the text and summarised in Section 5.
Finally, conclusions are presented in Section 6.
2 Architecture
At present, many techniques can be used to build end-to-end network connections with some service guarantees. For instance, Differentiated Services (DiffServ) [6] manages the network bandwidth by creating per-hop behaviors inside layer-3 routers, while Multi-Protocol Label Switching (MPLS) [7] establishes a path using label-switched routers. However, a more radical approach, typically adopted for layer-1 switches, uses the concept of a lightpath [8]. In this paper, a lightpath is defined as an optical unidirectional point-to-point connection with effective guaranteed bandwidth. It is suitable for very high data transfer demands such as those found in scientific Grid applications [9]. In order to set up a lightpath, a control plane separate from the data forwarding plane is used on behalf of such applications. For multi-domain control, a mechanism is needed that respects local policies and manages the network domains to set up the end-to-end connection (usually composed of multiple lightpaths) on demand.
2.1 High-Level Overview
Figure 2 shows the architecture used for setting up a lightpath using token based networking. In a nutshell, the process is as follows. On behalf of a subset of its users, ISPA generates a Link Access Request (LAR) message to use a particular link (from the available links of its peering point) for a certain period of time (1). The details of LAR generation based on user requests are beyond the scope of this article; interested readers are referred to [10]. The LAR is sent to the AAA server of peering point A (2). The AAA server fetches a policy from the policy repository that validates the request (3). Next, a key (TokenKey) is generated and sent to ISPA and peering point A (4). ISPA may use the key to create a token for each packet that should use the link. The token will be injected into the packets, which are then forwarded to the peering point. The entity responsible for token injection is known as the token builder. On the other side, peering point A uses the same TokenKey as ISPA to check the incoming token-annotated packets. In other words, for each received packet, peering point A creates a local token and checks it against the embedded packet token. The packet is authenticated when both tokens match. An authenticated packet is then forwarded to its corresponding fibre-optic link (5). All other packets are transmitted on a 'default' link that is connected to the transit network (6).
[Fig. 2 diagram: user applications behind a token builder at ISPA; at peering point A, an AAA server with a link admission policy repository controls a token switch that forwards packets to fibre-optic links 1 and 2 (towards other PPs) or, via a BGP router, to the transit network. Numbered arrows mark the Link Access Request (1, 2), the link admission policy check (3), the TokenKey distribution (4), and the packet forwarding (5, 6).]
Fig. 2. Token based networking architecture
The entity responsible for token checking and packet switching is known as the token based switch. When a packet arrives at the token switch, we must find the appropriate key to use for generating the token locally. Which key to use is identified by fields in the packet itself. For instance, we may associate a key with an IP source and destination pair, so that all traffic between two machines is handled with the same key. However, other forms of aggregation are possible. For instance, we may handle all traffic between two networks with the same key, or all machines participating in a Grid experiment, etc. In general, we allow the key to be selected by means of an aggregation identifier embedded in the packet. The aggregation identifier is inserted in the packet together with the token by the ISP to signal the key to use.
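A minimal control-plane sketch of steps (1)-(4), with illustrative names (the actual AAA server and policy interface are not specified at this level of detail):

    import os

    class AAAServer:
        def __init__(self, policy_repository):
            self.policies = policy_repository
            self.keys = {}                    # aggregation id -> TokenKey

        def handle_lar(self, isp, link, period, agg_id):
            """Validate a Link Access Request against policy (step 3) and,
            on success, issue a TokenKey to the ISP and the peering point
            (step 4)."""
            if not self.policies.allows(isp, link, period):
                return None
            key = os.urandom(20)              # 20-byte key for HMAC-SHA1
            self.keys[agg_id] = key           # also installed in the switch
            return key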
2.2 Token Principles
Compared to other mechanisms (such as certificates), a token is a general type of trusted and cryptographically protected proof of authorisation with flexible usage policies. A token created for an IP packet is essentially the result of applying an HMAC algorithm over a number of packet fields, as illustrated in Figure 3 and explained below. An HMAC algorithm is a key-dependent way to create a one-way hash (see RFC 2104 [11]). In our implementation we opted for a strong proof of authorisation by means of HMAC-SHA1. It may be possible to use more lightweight algorithms such as RC5, which is also used by IP Easy-pass [5]. However, we wanted to evaluate the performance that could be achieved with strong authentication. We believe that using RC5 or similar algorithms would only make it scale better.
ETH header
1111111 0000000 00 11 0000000hdr_len 1111111 00 11
IP header
Data
ETH crc
11 00 00000 1010 00 ttl11111 11 00000 11111 hdr_chk,sip,dip
Legend:
tot_len
sip = source IP dip = destination IP hdr_len=header length tot_len=total IP length hdr_chk=IP checksum ttl = time to live
Data to encrypt (64bytes)
Token Key (20 bytes)
ETH header
HMAC−SHA1
IP header
Token (20 bytes)
11111111 00000000 Options 00000000 11111111 (24 bytes)
Data
ETH crc
Fig. 3. Token creation
To evaluate the TBS principles, we developed a prototype that stores the token in each packet's IP option field (as defined in RFC 791). An IP option can be of variable length, and the field is ignored by normal routers. Although some routers have a slower processing path for IP option packets than for plain IP packets (because higher-level headers will be at different positions in the packet), our TBS system is targeted at high-speed, important sites (e.g., ISPs, grid node interconnection points) where all systems, including routers, are updated to state-of-the-art
hardware. We stress, however, that the IP option implementation is for prototyping purposes; more elegant implementations may use a form of MPLS-like shim headers. Figure 3 shows the process of token creation and insertion in the IP option field. In our prototype, the HMAC-SHA1 algorithm generates the unique token (20 bytes) that will be injected into the packet's IP option field. As an IP option field has a header of two bytes, and network hardware commonly works most efficiently on chunks of four or eight bytes, we reserve 24 bytes for the IP option field. In order to ensure token uniqueness across packets, we need to include fields that differ in every packet. Therefore, a part of the packet data is included together with the packet header in the HMAC calculation. Note that some of the first 64 bytes of an Ethernet frame are masked in order to make the token independent of the IP header fields that change when a token is inserted (e.g., header length, total length, IP checksum) or when the packet crosses intermediate routers (e.g., TTL). The mask also provides flexibility for packet authentication, so that one could use (sub)network rather than end-node addresses, or limit the token coverage to the destination IP address only (e.g., by selectively masking the IP source address).
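The token computation itself maps naturally onto a few lines; the following is a Python model of Fig. 3 (the real implementation runs on the IXP2850 crypto units, and the byte offsets below, which assume untagged Ethernet plus IPv4, are our reading of the figure rather than the authors' exact mask):

    import hmac, hashlib

    MUTABLE = {14, 16, 17, 22, 24, 25}  # IHL byte, total length, TTL, checksum

    def masked(frame64: bytes) -> bytes:
        """Zero the header bytes that change in transit or on token insertion."""
        return bytes(0 if i in MUTABLE else b for i, b in enumerate(frame64))

    def make_token(frame: bytes, key: bytes) -> bytes:
        """20-byte token: HMAC-SHA1 over the masked first 64 frame bytes."""
        return hmac.new(key, masked(frame[:64]), hashlib.sha1).digest()

    def check_token(frame: bytes, key: bytes, token: bytes) -> bool:
        return hmac.compare_digest(make_token(frame, key), token)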
3 Implementation Details
The Token Based Switch (TBS) performs secured lightpath selection on-the-fly by means of packet-based authentication. Therefore, packet checking at high speeds (Gbps) is crucial in our context. The latest generation of Network Processor Units (NPUs) includes crypto units and provides a suitable solution for packet authentication in a TBS system. Although NPUs are powerful hardware specifically designed for packet processing at high speeds, they have a complex architecture (e.g., multiple RISC cores and shared memory controlled by a central GPU). Therefore, building software applications on NPUs is a challenging task. As explained below, we use Intel IXP2850 NPUs as our hardware [12] and extend the Fairly Fast Packet Filter (FFPF) software framework [13] to ease the implementation of our software.
3.1 Hardware Platform
Our prototype uses the IXDP2850 development platform (see Figure 4), consisting of dual IXP2850 NPUs (1 and 2), ten gigabit fibre interfaces (3), a loopback fabric interface (4), and fast data buses (SPI, CSIX). Each NPU has several external memories (SRAM, DRAM) and its own PCI bus for the control plane (in our setup it connects to a slow 100 Mbps NIC). In addition, each IXP2850 NPU contains on-chip 16 multi-threaded RISC processors (μEngines) running at 1.4 GHz, a fast local memory, lots of registers, and two hardware crypto units for encryption/decryption with commonly used algorithms (e.g., 3DES, AES, SHA-1, HMAC). As illustrated in Figure 4, incoming packets are received by the Ingress NPU (via the SPI bus). These packets can be processed in parallel with the help of the μEngines. The packets are subsequently forwarded to the second NPU via the CSIX bus. The second NPU can process these packets and then decide which will be forwarded out of the box (via the SPI bus) and which outgoing link will be used.
[Fig. 4. IXDP2850 development platform: dual IXP2850 NPUs (Ingress and Egress) with SRAM/DRAM banks, 10 × 1 Gb/s interfaces on the SPI bus, a Fabric Interface Chip on the CSIX bus, per-NPU PCI 64/66 buses, and 100 Mbps Ethernet control ports.]
3.2 Software Framework: FFPF on IXPs

The Fairly Fast Packet Filter (FFPF) is a flexible software framework designed for high-speed packet processing. FFPF supports both commodity PCs and IXP network processors natively and has a highly modular design. FFPF was designed to meet the following challenges. First, it exploits the parallelism offered by multiple cores (e.g., the host CPU and the IXP's μEngines). Second, it separates data from control, keeping the fast path as efficient as possible. Third, FFPF avoids packet copies by way of a 'smart' buffer management system. Fourth, the FFPF framework allows building and linking custom packet processing tasks inside the low-level hardware (e.g., the μEngines of each NPU). For example, a packet processing application may be built using the FFPF framework as follows. The application is written in a simple packet processing language known as FPL [14] and compiled by the FPL compiler. Then, the application's object code is linked with the FFPF framework and loaded into the hardware with the help of the FFPF management tools. Most of the complexity of programming low-level hardware (e.g., packet reception and transmission, memory access) is hidden behind a friendly programming language.

3.3 Token Based Switch

The TBS application consists of two distinct software modules: the token builder and the token switch (see also Figure 2). In our prototype, the modules are implemented on a single hardware development system (IXDP2850), although in reality they are likely to be situated in different locations. Therefore, our implementation constitutes a demo system as shown in Figure 5. The token builder module is implemented on two μEngines in the Ingress NPU, while the token switch module is implemented on two μEngines in the Egress NPU. Although the mapping can easily be scaled up to more μEngines, we use only two μEngines per application module because they already provide sufficient performance. As we will see in Section 4, the bottleneck is the limited number of crypto units. The token builder module implements the token principles described in Figure 3. FFPF automatically feeds it with packet handles.
[Fig. 5. Token Based Switch using a dual IXP2850: the TokenBuilder on the Ingress NPU computes token = HMAC(masked 64B), recomputes the IP checksum and transmits; the TokenSwitch on the Egress NPU sends a packet to port 8 if token ≠ HMAC(masked 64B) and to its authorised port Np otherwise; both consult a KeysTable (KT) in SRAM and a shared packet buffer (PBuf), with keys indexed by aggregates such as (Ku, SIP, DIP).]
The token builder retrieves the first 64 bytes of the current packet from a shared packet buffer memory (PBuf) into local registers and then applies a custom mask over these bytes in order to hide mutable fields such as the IP header length and IP total length. It also retrieves the proper token key (K) from a local KeysTable by looking up an aggregation identifier (e.g., a flow may be identified by the IP source and/or IP destination address pair, or by other aggregates). The aggregation identifier also determines which fields are covered by the authentication (the mask). Next, HMAC-SHA1 is computed over the first (masked) 64 bytes of packet data using K (20 bytes) as the key. The result (20 bytes) is then inserted into the current packet's IP option field. This operation involves shifting packet data to make space for the option field, and re-computing the IP checksum because the IP header has been modified. Once the packet has been modified it is scheduled for transmission; in this prototype, the Ingress NPU transmits packets to the Egress NPU via a fast bus. The token switch module implements the token switch machine of the system architecture (see Figure 2). FFPF automatically feeds the token switch module with packet handles. The token switch checks whether the current packet carries the expected IP option field, and extracts the first 64 bytes of the original packet data together with the embedded token from the option field into local registers. Next, it applies the custom mask over the 64 bytes of data and retrieves the proper token key from its local KeysTable. If no entry is found in the KeysTable, the packet cannot be authorised and is sent out on a default port (e.g., port 8) for transmission over a (slow) routed connection. Otherwise, HMAC-SHA1 is computed over the masked 64 bytes of data using the token key (20 bytes) as the key. The result is compared to the token embedded in the packet; when they match, the packet has been successfully authorised and is forwarded to its authorised port (Np).
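The switching decision can be sketched in the same style. Here key_table, agg_id, and the port numbers are illustrative stand-ins for the KeysTable lookup and port map described above, and make_token is the function from the earlier sketch.

```python
import hmac

def switch_packet(key_table, agg_id, frame64, mask64, token,
                  authorised_port, default_port=8):
    """Pick the output port for a packet, mirroring the token switch logic."""
    key = key_table.get(agg_id)          # KeysTable lookup by aggregation id
    if key is None:                      # no key: packet cannot be authorised
        return default_port              # falls back to the (slow) routed path
    expected = make_token(key, frame64, mask64)   # from the earlier sketch
    ok = hmac.compare_digest(expected, token)     # constant-time comparison
    return authorised_port if ok else default_port
```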
4 Evaluation

Figure 6 shows the system setup used to demonstrate the concepts of token based switching [15]. The two IXP2850 NPUs (Ingress and Egress) boot from a Linux boot server machine. At run-time we use FFPF for the control path (running on the Linux server and
both embedded Linux NPUs). Three other Linux machines (Rembrandt6, 7 and 8) serve as clients and are each connected via gigabit fibre to an NPU gigabit port. In order to find the maximum data rate the proposed system can handle, we evaluate the following scenario:
– A UDP traffic stream was generated (using the iperf tool) from Rembrandt6 to port 6 of the IXDP2850;
– A key was set for authorising traffic (Rembrandt6 → 7) to end up on port 7, and another key was set for (Rembrandt7 → 6) to end up on port 6;
– Unauthorised traffic goes to the default port (port 8);
– To verify that authorised traffic ends up on port 7 and unauthorised traffic on port 8, Rembrandt7 and 8 were connected to IXDP2850 ports 7 and 8, respectively. We used the tcpdump tool on Rembrandt7 and 8 to listen on their gigabit ports.
[Fig. 6. Token Based Switch demo: clients Rembrandt6, 7 and 8 connected to Ingress/Egress ports 6, 7 and 8 of the Intel IXDP2850, with a Linux boot server and gateway on a 10/100 Mbps control switch.]

The performance of the above test is shown in Figure 7.a, which has two charts: (1) 'data received', the rate received in the IXDP2850 box, and (2) 'successfully switched data', the rate that the TBS could handle properly using just a single thread for processing. The 'data received' chart is low for small packets because of the limitations of the gigabit PCI card used in the Rembrandt6 PC for traffic generation; for small packets it therefore reflects the limits of the traffic generator rather than those of the TBS. The second chart, 'successfully switched data', is lower than the first one at high speeds because it reflects a single-threaded implementation. The multithreaded version coincides exactly with the 'data received' chart and is therefore not visible. While we cannot predict the real performance for speeds above 1 Gbps without measurements using a high-speed traffic generator, we estimated the outcome by using Intel's cycle-accurate IXP simulator running in debug mode. Table 1 shows the cycle estimates for a 150-byte packet processed by each software component in the data path (Rx, Tx, Rx CSIX, Tx CSIX, TokenBuilder and TokenSwitch). Table 1.a lists the cycles spent on one packet in each software module of the FFPF implementation on the IXP2850. These modules are optimised for multithreaded packet processing (e.g., receiving, storing, transmitting). The first row in Table 1.b contains the cycles spent on one packet in a single-threaded version of the token builder and
[Fig. 7. Token Based Switch performance: (a) running in hardware; (b) running in the cycle-accurate simulator.]
token switch modules. These values are high because all subtasks (e.g., encryption, token insertion, checksum computation) run sequentially (no parallelism at all) and use only one crypto unit each. This single-threaded version yields the performance shown in Figure 7.a. The next rows illustrate various multithreaded configurations. Although we would expect better performance as parallelism increases (e.g., more threads, or more μEngines), having only two crypto units available per NPU limits the performance to roughly 2000 cycles (the token switch module spends its time mostly on authentication, while the token builder additionally performs token insertion into the packet). Note that our prototype implements each TBS module (token builder and token switch) on only one NPU of the IXDP2850 hardware system, because we had only one of these (expensive) IXDP2850 devices available in our lab. In a real setup, however, each TBS module may use the full dual-NPU IXDP2850 for building or checking tokens, and the system performance is then expected to roughly double compared to the presented figures, mainly because four crypto units would be available.

Table 1. Cycle budget

(a) FFPF on IXP2850 overhead:
  FFPF module   μEngine cycles
  Rx            408
  Tx CSIX       276
  Rx CSIX       504
  Tx            248

(b) TBS overhead:
  TBS                     TokenBuilder   TokenSwitch
  single threaded         5777           3522
  4 threads, 1 μEngine    3133           2150
  8 threads, 1 μEngine    3000           2100
  4 threads, 2 μEngines   2600           2100
  8 threads, 2 μEngines   2500           2000
Using the cycle estimates given in Table 1, we express the throughput as a function of the packet size, the number of threads and the number of μEngines: rate = f(packet size, threads, μEngines), without taking additional contention into account. The estimated throughput for our multithreaded version goes up to roughly 2 Gbps (Figure 7.b). We also measured the latency introduced by our Token Based Switch system. The token builder application (the whole Ingress NPU chain) takes 13,690 cycles, meaning
a 9.7 μs processing time (introduced latency), and the TokenSwitch application (the whole Egress NPU chain) takes 8,810 cycles, meaning a 6.2 μs latency. We note that a μEngine in the IXP2850 NPU runs at 1400 MHz.
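For orientation, these figures follow from a simple contention-free cycle model; the sketch below is our own back-of-the-envelope reading of the numbers, not the paper's simulation methodology.

```python
F_CLK = 1.4e9   # μEngine clock frequency (Hz)

def latency_us(cycles: int) -> float:
    """Processing latency of a chain with the given cycle count."""
    return cycles / F_CLK * 1e6

def throughput_gbps(pkt_bytes: int, cycles_per_pkt: int) -> float:
    """Idealised rate of one pipeline: packet bits over per-packet cycle time."""
    return pkt_bytes * 8 * F_CLK / cycles_per_pkt / 1e9

print(latency_us(13690))           # ≈ 9.8 μs (Ingress chain)
print(latency_us(8810))            # ≈ 6.3 μs (Egress chain)
print(throughput_gbps(150, 2000))  # ≈ 0.84 Gbps per crypto-bound pipeline
```

Overlapping two such crypto-bound pipelines per box is consistent with the roughly 2 Gbps estimate quoted above.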
5 Related Work

In addition to commercial solutions for single-domain, provider-controlled applications such as Nortel DRAC and Alcatel BonD, research is also underway exploring user-controlled optical network paths. One of the leading software packages is User Controlled Lightpath Provisioning (UCLP) [16]. UCLP currently works in a multi-domain fashion, where all parties and rules are pre-determined. Truong et al. [17] worked on policy-based admission control for UCLP and implemented fine-grained access control. Interesting work in the optical field is also done in DWDM-RAM [4,18], where a Grid-based dynamic optical bandwidth manager is created for a metropolitan area. Our approach is different in that we provide a mechanism to dynamically set up multiple shortcuts across a multi-domain end-to-end connection. Therefore, an end-to-end connection can easily be improved in terms of speed and hop count by introducing 'shortcuts' based on new user agreements. IP Easy-pass [5] proposed a network-edge resource access control mechanism to prevent unauthorised access to reserved network resources at edge devices (e.g., ISP edge routers). IP packets with special demands, such as real-time video streams, get an RC5-encrypted pass appended. Then, at the edge routers, a Linux kernel validates the legitimacy of incoming IP packets by simply checking their annotated pass. Unlike our work, that solution aims at fairly low link rates. While our solution shares the idea of per-packet authentication (tokens), we use a stronger algorithm (HMAC-SHA1) and a separate control path for key management (provided by AAA servers). In addition, we demonstrate that by using network processors we are able to cope with multi-gigabit rates. Most closely related to our TBS is the Dynamic Resource Allocation in GMPLS Optical Networks (DRAGON) framework [19]. This ongoing work defines a research and experimental framework for the high-performance networks required by Grid computing and e-science applications. The DRAGON framework allows dynamic provisioning of multi-domain network resources in order to establish deterministic paths in direct response to end-user requests. DRAGON's control-plane architecture uses GMPLS as its basic building block and AAA servers for authentication, authorisation and accounting. We have found a role for our TBS within the larger DRAGON framework and are currently working together to bring the latest TBS achievements into DRAGON.
6 Conclusions and Future Work

This paper presents our implementation of the Token Based Switch application on Intel IXP network processors, which allows one to select an optical path in hybrid networks. The admission control process is based on token principles. A token represents the
right to use a pre-established network connection in a specific time frame. Tokens allow separation of the (slow) authorisation process from the real-time usage of high-speed optical network links. The experimental results show that a TokenSwitch implementation using the latest generation of network processors can perform packet authorisation at multi-gigabit speeds.
Acknowledgements

This work was supported by the GigaPort NG and the EU IST NextGrid projects. The authors wish to thank Harry Wijshoff and Dejan Kostic for their feedback and Lennert Buytenhek for his advice and support.
References

1. Winkler, L.: The hybrid optical and packet infrastructure (HOPI) testbed. Internet2 whitepaper (April 2004)
2. Vollbrecht, J., Calhoun, P., Farrell, S., Gommans, L., Gross, G., de Bruijn, B., de Laat, C., Holdrege, M., Spence, D.: RFC 2904, AAA Authorization Framework. IETF (2000)
3. Kent, S., Seo, K.: Security Architecture for the Internet Protocol. RFC 4301 (Proposed Standard) (December 2005)
4. Figueira, S., Naiksatam, S., Cohen, H.: DWDM-RAM: Enabling Grid Services with Dynamic Optical Networks. In: Proc. of SuperComputing, Phoenix, Arizona (November 2003)
5. Wang, H., Bose, A., El-Gendy, M., Shin, K.G.: IP Easy-pass: a light-weight network-edge resource access control. IEEE/ACM Transactions on Networking 13(6) (2005)
6. Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., Weiss, W.: RFC 2475, An Architecture for Differentiated Services. IETF (1998)
7. Rosen, E., Viswanathan, A., Callon, R.: RFC 3031, Multiprotocol Label Switching Architecture. IETF (2001)
8. DeFanti, T., de Laat, C., Mambretti, J., Neggers, K., Arnaud, B.S.: TransLight: a global-scale LambdaGrid for e-science. Commun. ACM 46(11) (2003) 34–41
9. de Laat, C., Radius, E., Wallace, S.: The rationale of current optical networking initiatives. Future Generation Computer Systems 19(6) (August 2003) 999–1008
10. Gommans, L., Dijkstra, F., de Laat, C., Taal, A., Wan, A., Lavian, T., Monga, I., Travostino, F.: Applications drive secure lightpath creation across heterogeneous domains. IEEE Communications Magazine 44(3) (March 2006) 100–106
11. Bellare, M., Canetti, R., Krawczyk, H.: RFC 2104, HMAC: Keyed-Hashing for Message Authentication. IETF (1997)
12. Intel Corporation: Intel IXP2xxx Network Processor. IXP NP product brief (2005)
13. Bos, H., de Bruijn, W., Cristea, M., Nguyen, T., Portokalidis, G.: FFPF: Fairly Fast Packet Filters. In: Proceedings of OSDI '04, San Francisco, CA (December 2004)
14. Cristea, M.L., de Bruijn, W., Bos, H.: FPL-3: towards language support for distributed packet processing. In: Proceedings of IFIP Networking, Waterloo, Canada (May 2005)
15. Gommans, L., Travostino, F., Vollbrecht, J., de Laat, C., Meijer, R.: Token-based authorization of connection oriented network resources. In: Proc. GRIDNETS, San Jose, CA, USA (October 2004)
16. Wu, J., Campbell, S., Savoie, J., Zhang, H., Bochmann, G., Arnaud, B.: User-managed end-to-end lightpath provisioning over CA*net 4. In: Proceedings of the National Fiber Optic Engineers Conference, Orlando, FL, USA (September 2003)
17. Truong, D., Cherkaoui, O., ElBiaze, H., Aboulhamid, M.: A Policy-based approach for User Controlled Lightpath Provisioning. In: Proc. of NOMS, Seoul, Korea (April 2004)
18. Figueira, S., Naiksatam, S., Cohen, H.: OMNInet: a Metropolitan 10Gb/s DWDM Photonic Switched Network Trial. In: Proceedings of Optical Fiber Communication, Los Angeles, USA (February 2004)
19. Lehman, T., Sobieski, J., Jabbari, B.: DRAGON: A Framework for Service Provisioning in Heterogeneous Grid Networks. IEEE Communications Magazine 44(3) (March 2006)
Online Multicasting in WDM Networks with Shared Light Splitter Bank

Yuzhen Liu and Weifa Liang

Department of Computer Science, The Australian National University, Canberra, ACT 0200, Australia
{yliu,wliang}@cs.anu.edu.au
Abstract. We study online multicasting in WDM networks with a shared light splitter bank. Our objective is to maximize the network throughput. It is desirable that the cost of realizing each multicast request be minimized, so that the network throughput is ultimately maximized through the cost savings on individual requests. We first propose a cost model for realizing an online multicast request in such network environments with limited light splitters and wavelength converters, which captures the cost of utilizing network resources, particularly the light splitting and wavelength conversion ability at nodes. We then show that finding a cost-optimal multicast tree for a multicast request under the proposed cost model is NP-complete, and instead devise approximation and heuristic algorithms for it. We finally conduct experiments to evaluate the performance of the proposed algorithms. The results show that the proposed algorithms are efficient and effective in terms of network throughput.
1 Introduction
A WDM network consists of optical wavelength routing nodes interconnected by point-to-point fiber links. In WDM networks, the fiber bandwidth is partitioned into multiple frequency bands (wavelengths) that may be used to transmit messages simultaneously, as long as each message uses a different wavelength. It is becoming increasingly evident that WDM networks can provide data transmission rates several orders of magnitude higher than current electronic networks, and will soon become the core technology for the next-generation Internet by providing an unprecedentedly large number of available wavelengths [1]. The key to high speed in these networks is to maintain signals in optical rather than traditional electronic form, using devices such as optical cross-connects (OXCs) [2]. OXCs equipped with light splitters, referred to as multicast-capable OXCs (MC-OXCs), have the ability to split an incoming signal into more than one outgoing signal with the same wavelength but a lower power level. Splitters are the fundamental optical devices contributing to power loss [3]: even in the ideal case the power of each output of a splitter is only 1/n of that of the input signal, where n is the fanout of the splitter. Some devices such as erbium-doped fiber amplifiers can be used to keep the power level of an optical signal above
some threshold so that the signal can be detected. However, the use of amplifiers increases the cost of WDM networks. Another cost factor is the use of wavelength converters, which allow the wavelength of an outgoing signal to differ from that of its incoming counterpart. Therefore, the number of light splitters and the number of wavelength converters in a network should be taken into account [3,4], and in a power-efficient and cost-effective WDM network the splitters and converters installed at a node should be shared by all its incoming signals. Light splitters are the key components for implementing multicast. Multicast routing and wavelength assignment (MC-RWA) is a fundamental problem in WDM networks, which aims at finding a set of links and wavelengths on these links to establish the connection from the source to the destination nodes. MC-RWA includes the building of a routing tree (light-tree) and the assignment of wavelengths to the links in the tree. Since the combined multicast routing and wavelength assignment is a hard problem, the most commonly adopted strategy is to decouple the problem into two separate subproblems: the light-tree routing problem and the wavelength assignment problem [2,3,5]. The former aims to build a routing tree for a multicast request, while the latter aims to assign the available wavelengths to the links in the tree. There are several studies of constructing multicast trees under physical constraints on optical networks. Sahin and Azizoglu [6] considered the multicast problem under various fanout policies. Libeskind-Hadas and Melhem [7] investigated multicast communication in circuit-switched multi-hop networks, showing that although the general multicast problem is NP-hard, it is polynomially solvable when the optimization objective is the wavelength assignment only. For constructing constrained multicast trees in WDM networks, Bermond et al. [8] investigated routing and wavelength assignment with only unicast-capable switches. Libeskind-Hadas [9] extended the unicast (point-to-point) communication model by proposing a multi-path routing model, in which the multicast problem is to find a set of paths from the source to the destination nodes such that each path contains a subset of destination nodes, all the destination nodes are covered by these paths, and the cost sum of these paths is minimized. Zhang et al. [10] considered the multicast routing problem focusing on the limited light splitting of optical switches and provided several heuristics for the problem. Zhang and Yang [5] considered the multicast problem with the objective of minimizing the number of converters, providing an approximation algorithm with an approximation ratio of O(log n). Rouskas [2] provided an excellent survey of the problem under the light splitter switching model. In this paper we consider the online multicast routing and wavelength assignment problem in networks in which light splitters/wavelength converters are installed on only a fraction of the nodes and are shared by incoming signals. Since efficient algorithms for wavelength assignment in tree structures are available [5,11], we focus on the routing problem under this shared light splitter bank architecture. Specifically, we consider the following online multicasting problem.
Assume that there is a sequence of multicast requests that is unknown in advance and whose requests arrive one by one. Once a multicast request arrives, the system either realizes the request by building a multicast tree for it or rejects the request due to a lack of network resources. The objective is to maximize the network throughput, i.e., the number of realized multicast requests in the given sequence. Due to the unknown pattern of future requests, we focus on realizing each individual multicast request by building an economical multicast tree for it.
2 Preliminaries
In this section we first introduce a model of WDM networks with a shared light splitter bank. We then propose a node cost model that characterizes the utilization of these network resources. We finally define the optimal multicast tree problem and the online multicast request maximization problem.

2.1 Shared Light Splitter Bank Model
The WDM network with shared light splitter bank is modelled by an undirected graph G(V, E, Λ, w), where V is a set of nodes (vertices), E is a set of bidirectional optical fiber links (edges), Λ = {λ1, λ2, . . . , λk} is the set of wavelengths in G with Λ(e) the set of wavelengths available on edge e, and w is a function from V to the set of non-negative real numbers, |V| = n, |E| = m, and |Λ| = k. The node weight w(v) models the light splitting/wavelength conversion ability at v and is inversely proportional to this ability; that is, a node with higher ability has a smaller node weight, whereas a node with lower ability has a larger node weight. For example, w can be defined as follows: w(v) = 1 − f(v) if f(v) ≠ 0, and w(v) = ∞ otherwise, where f(v) is the ratio of the residual light splitting/wavelength conversion ability to the initial light splitting/wavelength conversion capacity at v, 0 ≤ f(v) ≤ 1. When f(v) = 0, all messages entering v will be trapped at v and there is no outgoing flow from v; thus we may set w(v) to a sufficiently large positive value, so that v is unlikely to be included as an internal node in multicast trees. When f(v) = 1, which means that there is no traffic load at v at all or the light splitting/wavelength conversion ability at v is full, we may simply set w(v) = 0. For a node v with 0 < f(v) < 1, its light splitting/wavelength conversion ability is limited. It can be seen that node v has some light splitting/wavelength conversion ability if 0 ≤ w(v) < 1. To ensure a multicast tree does not contain any unnecessary node v with w(v) = 0 as an internal node, each node v with w(v) = 0 can instead be assigned a small value ε, e.g. w(v) = ε = 1/(n + 1), where n is the number of nodes in the network.
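A small sketch of this weight function in Python; the handling of the boundary cases follows the discussion above, with the ε convention applied directly to fully able nodes.

```python
def node_weight(f_v: float, n: int) -> float:
    """Weight w(v) derived from the residual-ability ratio f(v) in [0, 1]."""
    if f_v == 0:                  # exhausted node: effectively unusable
        return float('inf')       # or any sufficiently large positive value
    if f_v == 1:                  # fully able node: small epsilon instead of 0
        return 1.0 / (n + 1)      # keeps it out of trees unless actually needed
    return 1.0 - f_v              # limited ability: weight inverse to ability
```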
2.2 Node Cost Model
For a given WDM network G and a multicast request, a multicast tree rooted at the source and spanning all the terminals is built if there are sufficient network
resources to realize the request. Since each leaf node in the multicast tree only receives messages from its parent, no light splitting/wavelength conversion is needed at such a node. Thus, we define the cost C(T) of a multicast tree T in G as the weighted sum of all the internal nodes in T. We refer to this model as the node cost model; it is used to minimize the utilization of the light splitting/wavelength conversion resources in the multicast tree per request.
2.3 Problem Definition
The multicast tree for a given multicast request (s; D) in G is a tree rooted at s and spanning all the nodes in D in which every leaf node is a terminal, where the source s ∈ V − D and the terminal set D ⊂ V. The optimal multicast tree for a given multicast request (s; D) is a multicast tree in which the weighted sum of the internal nodes is minimized. The optimal multicast tree problem is to find an optimal multicast tree for a given multicast request (s; D) in G; it is referred to as the optimal broadcast tree problem when D = V − {s}. The online multicast request maximization problem for a sequence of multicast requests is to maximize the number of realized requests in the sequence until the system is unable to accommodate any further requests. Due to the unforeseen nature of future requests, it is very difficult to provide an exact solution to the online multicast request maximization problem; instead, in this paper we focus on finding a 'cost-optimal' multicast tree for each request under the node cost model. We stress that since we deal with WDM networks with a shared light splitter bank, the availability of light splitters/wavelength converters at a node is the major concern, and link traffic load is not taken into account in the node cost model.
3 Algorithms Based on the Node Cost Model
In this section we first show that the optimal multicast tree problem under the proposed node cost model is NP-complete. We then provide approximation and heuristic algorithms for the problem of concern.

3.1 NP-Hardness of the Optimal Multicast Tree Problem
In the following we show that the optimal broadcast tree problem is NP-complete by a reduction from the maximum leaf spanning tree problem (MLST for short) in G, which is to find a spanning tree of G such that the number of leaves in the tree is maximized. MLST has been shown to be NP-complete [12]. In fact, in terms of computational hardness, the optimal broadcast tree problem and MLST are polynomial-time equivalent. In addition, the optimal broadcast tree problem is a special case of the optimal multicast tree problem; thus, the optimal multicast tree problem is also NP-complete.
Theorem 1. The optimal broadcast tree problem in a WDM network G(V, E, w) with shared splitter bank is not only NP-complete but also complete for MAX-SNP.

Proof. Given an instance G(V, E) of MLST and an integer k, the decision version of MLST is to determine whether there is a spanning tree of G with at least k leaf nodes. We construct an instance of the optimal broadcast tree problem, a WDM network G(V, E, w) in which every node v ∈ V has identical light splitting/wavelength conversion ability, w(v) = r > 0. Let T_opt be the optimal broadcast tree in G and n1 the number of leaf nodes in T_opt. Then the weighted sum of the internal nodes in the tree is r · (n − n1), which is minimized when n1 is maximized. Hence, there is a spanning tree with at least k leaf nodes in the instance G(V, E) of MLST if and only if there is a broadcast tree in G(V, E, w) such that the weighted sum of its internal nodes is no more than r · (n − k). Clearly, verifying whether a given tree is a solution to the optimal broadcast tree problem can be done in polynomial time. Thus, the optimal broadcast tree problem is NP-complete. It is easy to show that the optimal broadcast tree problem and the MLST problem are equivalent in terms of computational complexity under polynomial-time reductions. The MLST problem is not only NP-complete [12] but also complete for MAX-SNP [13], which means that it does not admit a fully polynomial-time approximation scheme unless P = NP [14]. Thus, the optimal broadcast tree problem is also unlikely to admit a fully polynomial-time approximation scheme unless P = NP.
3.2 A Simple Approximation Algorithm
Due to the NP-hardness of the optimal multicast tree problem, we provide a simple approximation algorithm for it, referred to as algorithm SA. The edge-weighted directed Steiner tree problem for a source s and a terminal set D is to find a tree in G rooted at s and spanning all the nodes in D such that the weighted sum of the edges in the tree is minimized. We approach the optimal multicast tree problem by reducing it to the edge-weighted directed Steiner tree problem for the source s′ and the terminal set D′ in an auxiliary directed graph G′(V′, E′, w′), where V′ = {v1, v2 | v ∈ V}, E′ = {⟨v1, v2⟩ | v ∈ V} ∪ {⟨v2, u1⟩, ⟨u2, v1⟩ | (u, v) ∈ E}, w′(⟨v1, v2⟩) = w(v) and w′(⟨v2, u1⟩) = w′(⟨u2, v1⟩) = 0, s′ = s1, and D′ = {v1 | v ∈ D}.

Theorem 2. Given a WDM network G(V, E, w), a source s and a terminal set D, s ∈ V − D, D ⊂ V, let G′(V′, E′, w′) be the corresponding auxiliary graph of G, and let T′ be a solution to the edge-weighted directed Steiner tree problem for s′ and D′ in G′. Then T is a solution to the optimal multicast tree problem for (s; D), where V(T) = {v | v1 ∈ V(T′)} and E(T) = {(v, u) | v1 is an internal node in T′ and ⟨v2, u1⟩ ∈ E(T′)}.
Proof. If v1 is an internal node in T′, then v2 ∈ V(T′), because ⟨v1, v2⟩ is the only edge leaving v1 in G′. Since v2 ∉ D′, v2 is not a leaf node in T′, so there exists a node u1 in T′ such that ⟨v2, u1⟩ ∈ E(T′). Thus, T is a tree. If ⟨v1, v2⟩ is an edge in T′, then v is an internal node in T. Hence C(T) = W(T′), where C(T) is the weighted sum of the internal nodes in T and W(T′) is the weighted sum of the edges in T′. We now show that C(T) is minimized. Suppose there were another tree T1 rooted at s and spanning all the nodes in D with C(T1) < C(T). Define T1′ as follows:

V(T1′) = {v1 | v ∈ V(T1)} ∪ {v2 | v is an internal node in T1}
E(T1′) = {⟨v1, v2⟩, ⟨v2, u1⟩ | (v, u) ∈ E(T1)}

Then T1′ is a tree in G′ rooted at s′ and spanning all the nodes in D′, and W(T1′) = C(T1). Thus, W(T1′) = C(T1) < C(T) = W(T′), which contradicts the assumption that T′ is a solution to the edge-weighted directed Steiner tree problem for s′ and D′.

Following Theorem 2, an approximation solution to the edge-weighted directed Steiner tree problem in G′ can be transformed into an approximation solution to the optimal multicast tree problem in G. The best approximation known so far for the directed Steiner tree problem is O(|D′|^δ) times the optimum [15], where δ is a constant with 0 < δ ≤ 1 and |D′| = |D|. We thus have the following theorem.

Theorem 3. Given a WDM network G(V, E, w) with shared light splitter bank and a multicast request (s; D), there is an approximation solution to the optimal multicast tree problem that is O(|D|^δ) times the optimum, where δ is a constant with 0 < δ ≤ 1.
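The node-splitting reduction of algorithm SA can be written down directly. A sketch in Python follows; the representation choices (tagging the two copies of each node with 1 and 2, a dict of weighted arcs) are ours.

```python
def build_auxiliary_graph(V, E, w):
    """Split each node v into (v,1), (v,2); node weights move onto the arcs."""
    V2 = [(v, 1) for v in V] + [(v, 2) for v in V]
    arcs = {}                                  # directed arc -> weight
    for v in V:
        arcs[((v, 1), (v, 2))] = w[v]          # node weight as arc weight
    for (u, v) in E:                           # each undirected edge -> 2 arcs
        arcs[((v, 2), (u, 1))] = 0.0
        arcs[((u, 2), (v, 1))] = 0.0
    return V2, arcs

# s' = (s, 1) and D' = {(v, 1) for v in D}; the zero-weight arcs make each
# undirected edge traversable in both directions while charging w(v)
# exactly when v is used as an internal node.
```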
3.3 A Heuristic Algorithm
In the following we propose a heuristic algorithm for the optimal multicast tree problem. The proposed heuristic is similar to the approximation algorithm for the node-weighted Steiner tree problem by Klein and Ravi [16], referred to as algorithm KR, but with some important modifications. The node-weighted Steiner tree problem is to find a tree in G spanning all the nodes in a terminal set D such that the weighted sum of the nodes in the tree is minimized. Algorithm KR maintains a forest F that consists of a node-disjoint set {T1, T2, . . . , Tk} of trees and contains all the terminals, 1 ≤ k ≤ |D|. Initially, each terminal by itself is a tree. The algorithm uses a greedy strategy to iteratively merge several current trees into a larger tree until there is only one tree left in the forest. In each iteration, the algorithm selects a node v and a subset S of the current trees, of size at least two, so as to minimize the ratio

    (w(v) + Σ_{Tj ∈ S} d(v, Tj)) / |S|        (1)
where S ⊆ F, |S| ≥ 2, and d(v, Tj) is the distance from v to Tj. The distance along a path in algorithm KR does not include the weights of the two endpoints of the path. Thus, the choice minimizes the average node-to-tree distance. The algorithm uses the shortest paths between the node and the selected trees to merge the trees into a larger one. To implement each iteration, for each node v, the quotient cost of v is defined as the minimum value of (1), taken over all subsets of the current trees of size at least two. To find the quotient cost of v, the algorithm computes the distance dj from v to each Tj, assuming without loss of generality that the trees are numbered so that d1 ≤ d2 ≤ . . . ≤ dk. In computing the quotient cost of v, it is sufficient to consider subsets of the form {T1, T2, . . . , Ti}, 2 ≤ i ≤ k. The quotient cost for a given node can be calculated in polynomial time, and the minimum quotient cost can then be determined; thus, each iteration can be carried out in polynomial time. The solution delivered by the algorithm for the node-weighted Steiner tree problem is 2 ln |D| times the optimum. Note that this approximation is within a constant factor of the best approximation achievable in polynomial time unless P̃ ⊇ NP [17]. Guha and Khuller [18] later provided an improved algorithm for the problem with a better approximation ratio at the expense of a longer running time; their improved algorithm delivers a solution within 1.35 ln |D| times the optimum. It should be emphasized that the problem we deal with is different from the one discussed by Klein and Ravi [16], despite some similarities between them. Their approximation analysis is based on the assumption that the weight of each terminal is zero, since all the terminals will be included in the Steiner tree. However, for the optimal multicast tree problem, we treat each terminal differently depending on whether or not it is an internal node of the multicast tree. If it is, its node weight must be taken into account; otherwise its node weight can be ignored, because a leaf node is only a receiver of messages and no light splitting/wavelength conversion is needed at it. Thus, the solution delivered by algorithm KR is not an approximation solution for the optimal multicast tree problem. We now propose a heuristic for the optimal multicast tree problem based on modifications to algorithm KR. The differences between our heuristic and algorithm KR lie in the following crucial steps: defining the length of a path between two nodes and calculating the quotient cost of a node. Assume that there are currently k trees T1, T2, . . . , Tk, k ≤ |D|. To compute the quotient cost of a given node v, we need to compute the distance from v to Tj, which in turn reduces to computing the lengths of the shortest paths between v and every node u in Tj. In our algorithm, the length of a path between v and a tree node u is the weighted sum of the nodes in the path except u and v if u is not a leaf node in Tj or Tj = {u}; otherwise, the length of the path is the weighted sum of all the nodes in the path except v. When computing the quotient cost of a node v in (1), w(v) is not taken into account if v is an internal node in one of the current trees; otherwise w(v) is included in the calculation. We refer to this heuristic as algorithm MKR.
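The core of each KR/MKR iteration, computing a node's quotient cost from its sorted tree distances, can be sketched as follows. We assume the caller supplies the (modified) distances d(v, Tj) and passes wv = 0 when v is already an internal node of some current tree, per the rule above.

```python
def best_quotient(wv: float, dist: list[float]) -> tuple[float, int]:
    """Minimum of (wv + sum of the i closest tree distances) / i over i >= 2.

    dist[j] is the node-to-tree distance d(v, T_j); by the observation in
    the text, only prefix subsets {T_1, ..., T_i} of the sorted distances
    need to be considered.
    """
    d = sorted(dist)                       # d1 <= d2 <= ... <= dk
    best, best_i = float('inf'), 0
    for i in range(2, len(d) + 1):
        q = (wv + sum(d[:i])) / i
        if q < best:
            best, best_i = q, i
    return best, best_i                    # quotient cost and subset size
```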
4 Performance Study
In this section we evaluate the performance of the proposed approximation algorithm SA and the heuristic algorithm MKR against that of two existing algorithms, KR and SPT, in terms of network throughput, by conducting experimental simulations; here SPT is the edge-weighted shortest path tree algorithm in which every edge has identical weight.

4.1 Simulation Environment
We assume that 100 nodes are deployed randomly in a region of 10 × 10 m² using the NS-2 simulator. For each pair of nodes u and v, a random number r_{u,v} is generated, 0 ≤ r_{u,v} < 1. Whether or not u and v are connected is determined by r_{u,v} and the edge probability [19,20]

    P(u, v) = β · e^{−d(u,v)/(Lα)}

where d(u, v) is the Euclidean distance between u and v, L is the maximum distance between nodes in the region, and α and β are the parameters governing the edge density in the network, 0 < α, β ≤ 1. There is an edge between u and v if and only if r_{u,v} < P(u, v). Different values of α and β result in different network topologies even with the same node distribution. We assign a weight to each node in the network to model its light splitting/wavelength conversion ability. Initially, the weight w(v) is a random number between zero and one.
[Fig. 1. Comparison of the network throughputs (MKR, SA, KR, SPT) versus the size of the terminal sets, with α = 0.3 and β = 0.3.]
[Fig. 2. Comparison of the network throughputs with various values of α and β: (a) α = 0.3, β = 0.4; (b) α = 0.4, β = 0.3; (c) α = 0.9, β = 0.1.]
w(v) increases by c if v is an internal node of the multicast tree built for a multicast request consuming light splitting/wavelength conversion resources of amount c. Node v has no light splitting/wavelength conversion ability left when its current weight is greater than or equal to one. We assume that the sequence of multicast requests consists of 200 requests. We randomly select the source and the terminal set for each multicast request, and the size of the terminal set ranges from 10 to 50 in increments of 10. We also assume that each multicast request lasts for a period of time and consumes a certain amount of light splitting/wavelength conversion resources; for simplicity, we further assume that this consumption is identical for all the internal nodes in a multicast tree. We simulated the various algorithms on 10 different randomly generated network topologies for each problem size. For each size of the network instance, the value shown in the graphs is the mean of 10 individual values obtained by running each algorithm on these 10 randomly generated network topologies.
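A sketch of the topology generation and resource accounting just described; the consumption amount c and the use of the region diagonal for L are illustrative assumptions, not values from the paper.

```python
import math
import random

def waxman_graph(n=100, alpha=0.3, beta=0.3, side=10.0):
    """Random topology with edge probability P(u,v) = beta * exp(-d/(L*alpha))."""
    pos = [(random.uniform(0, side), random.uniform(0, side)) for _ in range(n)]
    L = side * math.sqrt(2)                    # maximum possible distance
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])
            if random.random() < beta * math.exp(-d / (L * alpha)):
                edges.append((u, v))
    weights = [random.random() for _ in range(n)]   # initial w(v) in [0, 1)
    return edges, weights

def charge_tree(weights, internal_nodes, c=0.05):   # c: assumed amount
    """Consume splitting/conversion resources at a tree's internal nodes."""
    for v in internal_nodes:
        weights[v] = min(weights[v] + c, 1.0)       # w(v) >= 1 means exhausted
```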
4.2 Performance Evaluation of Various Algorithms
We first evaluate the network throughputs delivered by the different algorithms for various sizes of the terminal sets when α = 0.3 and β = 0.3. It can be seen from Fig. 1 that algorithms MKR and SA outperform algorithms KR and SPT significantly as the size of the terminal sets varies from 10 to 50 in increments of 10. When the terminal set consists of 10 nodes, more than 90% of the requests can be realized by algorithms MKR and SA, whereas the realization ratios of algorithms KR and SPT are only 80% and 60%, respectively. In addition, the network throughput delivered by algorithm KR drops faster than those of the other algorithms as the size of the terminal sets grows. When the size of the terminal sets reaches 50, the realization ratio of algorithm KR is only 50%, whereas algorithms MKR and SA can still realize around 70% of the requests. It can also be observed from Fig. 1 that the network throughput delivered by algorithm MKR is always greater than that delivered by algorithm SA for the various sizes of terminal sets. We then change the edge density of the network topologies by varying the values of α and β. Fig. 2 indicates that there is no significant difference in relative performance compared with the case where α = 0.3 and β = 0.3; i.e., the performance of algorithms MKR and SA is consistently better than that of algorithms KR and SPT.
5 Conclusion
In this paper we have studied online multicasting in WDM networks with a shared light splitter bank, aiming at maximizing the network throughput. We first proposed a node cost model for multicast trees that captures the cost of utilizing network resources such as light splitters/wavelength converters at nodes. We then showed that finding a cost-optimal multicast tree under the proposed cost model is NP-complete, and instead devised approximation and heuristic algorithms for finding such cost-optimal multicast trees. We finally conducted
experiments to evaluate the performance of the proposed algorithms. The experimental results show that the proposed algorithms are efficient and effective in terms of network throughput.

Acknowledgment. The work by the authors was fully funded by research grant DP0449431 from the Australian Research Council under its Discovery scheme.
References

1. T. F. Znati, T. Alrabiah and R. Melhem. Low-cost delay-bounded point-to-multipoint communication to support multicasting over WDM networks. Computer Networks, Vol.38, pp.423–445, 2002.
2. G. N. Rouskas. Optical layer multicast: rationale, building blocks, and challenges. IEEE Network, Vol.17, pp.60–65, 2003.
3. M. Ali and J. Deogun. Power-efficient design of multicast wavelength-routed networks. IEEE JSAC, Vol.18, pp.1852–1862, 2000.
4. Y. Zhou and G. S. Poo. Optical multicast over wavelength-routed WDM networks: A survey. Optical Switching and Networking, Vol.2, pp.176–197, 2005.
5. Z. Zhang and Y. Yang. Online optimal wavelength assignment in WDM networks with shared wavelength converter pool. Proc. of IEEE INFOCOM'05, pp.694–705, 2005.
6. G. Sahin and M. Azizoglu. Multicast routing and wavelength assignment in wide-area networks. Proc. of SPIE Conf. on All-Optical Networks, Vol.3531, pp.196–208, 1998.
7. R. Libeskind-Hadas and R. Melhem. Multicast routing and wavelength assignment in multi-hop optical networks. IEEE/ACM Trans. Networking, Vol.10, pp.621–629, 2002.
8. J.-C. Bermond, L. Gargano, S. Perennes, A. Rescigno, and U. Vaccaro. Efficient collective communications in optical networks. Theoretical Computer Science, Vol.233, pp.165–189, 2000.
9. R. Libeskind-Hadas. Efficient collective communication in WDM networks. Proc. of IC3N, pp.612–616, 2000.
10. X. Zhang, J. Wei and C. Qiao. Constrained multicast routing in WDM networks with sparse light splitting. Proc. of IEEE INFOCOM'00, pp.1781–1790, 2000.
11. B. Chen and J. Wang. Efficient routing and wavelength assignment for multicast in WDM networks. IEEE Journal on Selected Areas in Communications, Vol.20, pp.97–109, 2002.
12. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
13. G. Galbiati, F. Maffioli and A. Morzenti. A short note on the approximability of the maximum leaves spanning tree problem. Inform. Proc. Lett., Vol.52, pp.45–49, 1994.
14. C. H. Papadimitriou and M. Yannakakis. Optimization, approximation, and complexity classes. Proc. of STOC, ACM, pp.229–234, 1988.
15. M. Charikar, C. Chekuri, T.-Y. Cheung, Z. Dai, A. Goel, S. Guha and M. Li. Approximation algorithms for directed Steiner problems. J. Algorithms, Vol.33, pp.73–91, 1999.
16. P. N. Klein and R. Ravi. A nearly best-possible approximation algorithm for node-weighted Steiner trees. J. Algorithms, Vol.19, pp.104–114, 1995.
17. C. Lund and M. Yannakakis. On the hardness of approximating minimization problems. J. of the ACM, Vol.41, pp.960–981, 1994.
18. S. Guha and S. Khuller. Improved methods for approximating node weighted Steiner trees and connected dominating sets. Inform. and Comp., Vol.150, pp.57–74, 1999.
19. B. Waxman. Routing of multipoint connections. IEEE Journal on Selected Areas in Communications, Vol.6, pp.1617–1622, 1988.
20. F. Bauer and A. Varma. ARIES: a rearrangeable inexpensive edge-based online Steiner algorithm. IEEE Journal on Selected Areas in Communications, Vol.15, pp.382–397, 1997.
Evaluation of Optical Burst-Switching as a Multiservice Environment

Pablo Jesús Argibay-Losada, Andres Suárez-González, Manuel Fernández-Veiga, Raúl Rodríguez-Rubio, and Cándido López-García

Departamento de Enxeñería Telemática, Universidade de Vigo, Campus Universitario s/n, E-36310 Vigo, Spain

This work was supported by the Ministerio de Educación y Ciencia through the project TIC2003-09042-C03-03 of the Plan Nacional de I+D+I (partially financed with FEDER funds).
Abstract. We propose a scheme for providing differentiated services in an optical burst switching network environment. Our framework considers differentiation among several traffic classes in packet loss probability, assuming only that the network differentiates neatly between two classes of bursts. We formulate a reduced-load fixed-point model for evaluating the blocking probabilities of the different types of bursts. The model is the basis for comparing several flavors of OBS in terms of their ability to achieve quality of service. Motivated by technological constraints, we also investigate the effect of wavelength density on performance.
1 Introduction
In optical burst switching networks (OBS), the data transport plane and the control plane are fully decoupled, both in time and space [1]. Packets are assembled at access nodes according to destination, class or quality of service, and form bursts to be entered into the optical core. Within the signaling plane, a control packet is sent ahead of the burst along the same path so as to reserve the necessary resources inside the core nodes. After a delay, the data burst is sent along the all-optical path set up by the prior reservation. However, control packets are not acknowledged, and a burst may be discarded owing to contention with other bursts at the bufferless optical switches. Though this description captures the essential operation of OBS networks, there exist a number of possibilities regarding how the access nodes and the switches actually work. First, bursts can be assembled according to many different criteria, and their statistical and temporal behavior (length, time variability, correlation), in addition to the routing strategy, turn out to be paramount factors in determining the degree of contention at the optical switches [2]. Moreover, such contention can be managed in several different ways by means of proper burst scheduling [3,4]. Thus, the scheduling algorithm running in the nodes has a strong influence on the probability of bursts being successfully transmitted to their destination [5,6] and on the provision of quality of service (QoS) to
the bursts and to the data packets conveyed within them [7,8]. Included among the mechanisms conceived to enhance the performance of OBS networks are segmentation (i.e., splitting a burst into smaller independent units [9,10]), deflection (i.e., rerouting of bursts in conflict [11]), and the use of different time offsets [12]. If carefully engineered, these algorithms are all capable of providing some degree of QoS in terms of burst blocking probability, at least for underloaded networks. Given this variety of approaches to supporting QoS inside OBS, the question we ask in this paper is whether the differentiation offered to the optical bursts can be exploited outside the optical domain. In real networks, OBS is likely to be deployed incrementally, coexisting with electronic packet switching. We show that it is possible to differentiate proportionally in packet loss probability among several classes of traffic arriving at the access nodes, provided only that the OBS network offers two widely separated blocking priorities at the burst level. The particular scheduler used in the optical nodes plays a secondary role in this architecture. In order to tie burst blocking to packet loss probability, we propose a scheme based on a probabilistic classification of packets into high and low priority bursts. For the performance analysis of this system, a general reduced-load fixed-point model for the OBS network is developed. The model accounts for networks with any number of internal traffic classes and two control-plane OBS variants, and can easily incorporate different scheduling algorithms in the nodes. In fact, we take an abstract view of the algorithm, and analyze two broad classes of general schedulers which include many of the former as particular cases. By solving the system model for three simple topologies, the region of feasible differentiation is evaluated. Specifically, given a combination of scheduler, topology and transmission resources, the blocking probabilities for all the priority classes of end-to-end burst flows are computed, and the QoS capabilities of this architecture are elucidated. The paper is organized as follows. Section 2 describes the proportional differentiated services scheme. In Section 3 we formulate the general fixed-point approximation for a two-priority OBS network. Section 4 explains how to incorporate the scheduling algorithm into the model, and Section 5 states the cases of full mesh, star and ring topologies. In Section 6 the model is used to analyze the differentiation capabilities of the proposed scheme as a function of the scheduling algorithms, the network topology and the density of the WDM links in the OBS network. We summarize our conclusions in Section 7.
2 The Packet Differentiation Algorithm
Suppose that N classes of packets are offered to an OBS network. We aim to provide differentiated services in such a way that the packet loss probabilities (PLP) of any two classes i and j satisfy

    p_i / c_i = p_j / c_j

where c_x is an arbitrarily assigned coefficient that measures the class's quality of service. Assume also that the transport network differentiates between two
classes of bursts, say type 0 and type 1 bursts, in such a way that the loss probability of packets transported in type 0 (respectively, type 1) bursts is B0 (B1), where B0 ≪ B1. If packets of class x are assembled into type 1 bursts with probability h_{x1}, and into type 0 bursts with probability 1 − h_{x1}, then the ratio of PLP between classes i, j ∈ {1, . . . , N} is

    p_i / p_j = ((1 − h_{i1}) B0 + h_{i1} B1) / ((1 − h_{j1}) B0 + h_{j1} B1)        (1)

Under the assumption (1 − h_{i1}) · B0/B1 ≪ 1, this becomes p_i/p_j ≈ h_{i1}/h_{j1}; that is, the packet loss ratio is approximately the ratio of two arbitrarily chosen constants. Consequently, by varying the fraction of packets of each class transmitted through the underlying classes of bursts it is possible to provide proportional loss probabilities. The range of feasible differentiations with this probabilistic scheme is bounded by the case in which one packet class goes entirely through the low-priority (type 1) bursts and the other class is always assigned to the high-priority bursts, giving a maximum differentiation power of p_i/p_j |_max ≈ B1/B0. For any number of packet classes and its corresponding set of differentiation factors, set B1/B0 equal to the ratio between the largest and smallest coefficients. It is worth mentioning that the probabilistic mapping of several external traffic classes onto two internal transport levels is easy to incorporate into network equipment, and is independent of any technology, so it could be used at the edge of any core network, OBS or not.
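A few lines suffice to check the proportional-differentiation argument numerically; the B0, B1 and h values below are illustrative, not taken from the paper.

```python
def plp(h1: float, B0: float, B1: float) -> float:
    """PLP of a class sending a fraction h1 of its packets in type 1 bursts."""
    return (1 - h1) * B0 + h1 * B1

# Illustrative values with B0 << B1, as the scheme requires.
B0, B1 = 1e-6, 1e-2
h = {1: 0.1, 2: 0.2, 3: 0.4}          # chosen so that p1 : p2 : p3 ~ 1 : 2 : 4
p = {k: plp(v, B0, B1) for k, v in h.items()}
print(p[2] / p[1], p[3] / p[1])       # ≈ 2.0 and ≈ 4.0: (1 - h)B0 is negligible
```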
3 System Model for the OBS Network
We shall use a fixed-point model [13] to analyze and dimension an OBS network with service differentiation capabilities. Fixed-point models have been widely used in the circuit-switching world, in packet-switching environments, and in OBS to analyze the fundamental performance of those networks [14,15]. In this work, we assume that the rate at which sources generate packets is independent of the state of the network, so we have an open-loop system. The case in which sources are responsive to the network state can be treated by extending the system model with equations that capture the dependency between the data generation rate and the network state [16], and is left for further study. One feature explicitly included is the use of different burst schedulers. Consider an OBS network with L links and N nodes with full wavelength conversion and two internal types of bursts. Each link l is unidirectional with capacity Cl. Let R be the set of routes in the network, α ∈ R an origin-destination pair, and (x, α) a Poisson process modeling the priority-x burst arrivals on route α, with rate λ_{(x,α)}. Each burst will attempt to reserve, at each node along its path, a time S that depends on the scheduling discipline in use at the node. For example, in JET mode each burst uses the resource for L/C time units, where L is the burst length and C is the capacity of each wavelength in the WDM network, whereas in JIT [17] the reservation interval begins with the arrival of the control packet and its length equals the offset plus the burst transmission time, so S^JET = L/C
and S^JIT = Offset + L/C. Hence, each policy p, p ∈ {JET, JIT}, induces an offered traffic intensity A^p_{(x,α)} = λ_{(x,α)} · S^p_{(x,α)}. Denote by A_{xl} and B_{xl} the offered traffic intensity and blocking probability of class-x bursts at link l, respectively. Then the implicit solution of the following fixed-point system
    A_{xl} = Σ_{r∈R} A^p_{x,r} · I_{r,l} · Π_{i=1}^{o(l,r)−1} (1 − B_{x,i_r}),        B_{xl} = Λ(A_{0l}, A_{1l})        (2)
gives the vector Φ_x = (B_{x1}, B_{x2}, . . . , B_{xL}) of blocking probabilities for class x at the L links of an arbitrary network. I_{r,l} (1 if link l belongs to route r; 0 otherwise) is the |R| × L topology matrix of the network, o(l, r) is the ordinal of link l in route r, i_r is the i-th link of route r, and Λ(·, ·) is a mapping giving the losses for each class as a function of the offered load, the capacity of link l and the local scheduling algorithm. Note that the load offered to link l includes the contributions of all the routes traversing that link. In the system model, the traffic contributed by route α is approximated as a Poisson process with intensity λ_{(x,α)} thinned by the losses in the links preceding l along that route; this approximation becomes more accurate as the degree of connectivity of the network increases. Note also that the model has been formulated for only two classes of bursts, but the generalization to an arbitrary number of traffic classes is straightforward. After solving for the link blocking probabilities, the blocking probability of class-x bursts on route r is simply B_{xr} = 1 − Π_{i∈r} (1 − B_{xi}).
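System (2) can be solved numerically by repeated substitution. A sketch follows; the damping factor and iteration count are our choices rather than part of the model, and Lambda can be any of the scheduler mappings discussed in the next section.

```python
def solve_fixed_point(routes, offered, n_links, Lambda, iters=100, damp=0.5):
    """Repeated substitution for system (2).

    routes: list of link sequences; offered[x][r]: intensity of class x
    on route r; Lambda maps (A0, A1) at a link to (B0, B1).
    """
    B = [[0.0] * n_links, [0.0] * n_links]        # initial guess: no blocking
    for _ in range(iters):
        A = [[0.0] * n_links, [0.0] * n_links]
        for r, path in enumerate(routes):
            for x in (0, 1):
                surv = offered[x][r]
                for l in path:                     # thin by upstream losses
                    A[x][l] += surv
                    surv *= 1.0 - B[x][l]
        for l in range(n_links):
            b0, b1 = Lambda(A[0][l], A[1][l])
            B[0][l] = damp * B[0][l] + (1 - damp) * b0   # damped update
            B[1][l] = damp * B[1][l] + (1 - damp) * b1
    return B
```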
4 The Scheduling Algorithms
Each scheduling algorithm defines a functional form $\Lambda$ for the losses suffered by both classes of bursts. For instance, if $\Lambda$ were the Erlang-B formula and the network had only one traffic class, the model would be equivalent to the Erlang fixed point approximation for an OBS network. However, we are interested in analyzing more general architectures, and focus in this section on two scheduling policies, namely a system with fixed priorities and preemptive service, and a system with time-dependent priorities based on an occupancy threshold.

4.1 Preemptive Priorities
Assume class 0 bursts are given preemptive priority over class 1 bursts. Thus, a control packet for a burst that has been preempted continues to reserve resources for a burst that no longer exists, with the possibility of blocking other class 1 bursts that actually contain data. Following [18], this kind of scheduler can be modeled as if class 0 bursts were unaffected by the others, approximating the blocking probability of the combined traffic as that resulting from the aggregate offered load. Such a model does not take into account the effect of the preemptions,
so it is expected to produce good estimations in low load regimes. Therefore, $B \approx E_B(A_0 + A_1, m)$, $B_0 = E_B(A_0, m)$ and

$$B_1 \approx \frac{B - \beta_0 B_0}{1 - \beta_0} \equiv E_{LP}(A_0, A_1, m)$$

with $\beta_0 = \lambda_0/(\lambda_0 + \lambda_1)$ the fraction of class 0 burst arrivals.² The PLP will in general be different from the blocking probability for bursts if the latter depends on the bursts' length. This will not happen if the assembler produces fixed-length bursts and, depending on the specific scheduler in use at a node, if the offsets of the bursts arriving at that node are the same. But the assumption of constant offsets does not hold for the scheduler described in this section. Nevertheless, given that the difference between the two quickly diminishes with the number of output channels in this scheduler [19], we will approximate the loss probability of a packet by the blocking probability of its burst type.
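These quantities are easy to evaluate numerically; the routine below is an illustrative sketch (not the authors' code), assuming Poisson arrivals and equal mean burst lengths so that $\beta_0 = A_0/(A_0 + A_1)$:

    def erlang_b(A, m):
        # Stable recursive evaluation of the Erlang-B formula E_B(A, m), A > 0.
        inv = 1.0
        for k in range(1, m + 1):
            inv = 1.0 + inv * k / A
        return 1.0 / inv

    def e_lp(A0, A1, m):
        # Low-priority blocking under the preemptive approximation:
        # B ~ E_B(A0+A1, m), B0 = E_B(A0, m), B1 ~ (B - beta0*B0)/(1 - beta0).
        B = erlang_b(A0 + A1, m)
        B0 = erlang_b(A0, m)
        beta0 = A0 / (A0 + A1)   # equal burst lengths assumed
        return (B - beta0 * B0) / (1.0 - beta0)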
4.2 Threshold-Induced Priorities
An alternative method for establishing priorities between bursts could be the use of a threshold $u$ such that if the number of reserved resources when a burst of low priority is due to arrive is higher than the threshold, the burst is blocked. In this case, the blocking probabilities of both classes depend on the traffic intensities of the two classes [20], and are given by $B_0 = P(m)$, $B_1 = \sum_{i=u}^{m} P(i)$, with

$$P(i) = \begin{cases} \dfrac{(A_0 + A_1)^i}{i!}\, P(0) & i \in [1, u] \\[1ex] \dfrac{A_0^{\,i-u} (A_0 + A_1)^u}{i!}\, P(0) & i \in [u+1, m] \end{cases}$$

and $P(0)$ the normalizing constant such that $\sum_{i=0}^{m} P(i) = 1$. This is an appealing scheme for implementing priorities, since it offers the possibility of dynamically adjusting the threshold in response to changes in the network state.
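A direct numerical transcription of this birth-death distribution (an illustrative sketch; $u$ is the threshold and $m$ the number of wavelengths) could read:

    def threshold_blocking(A0, A1, u, m):
        # Unnormalized state probabilities: below threshold u both classes
        # are admitted; at or above it only class 0 is admitted.
        q = [1.0]                                # q[i] = P(i)/P(0)
        for i in range(1, m + 1):
            rate = (A0 + A1) if i <= u else A0   # admitted load entering state i
            q.append(q[i - 1] * rate / i)
        P0 = 1.0 / sum(q)
        P = [x * P0 for x in q]
        B0 = P[m]                                # class 0 blocked only when full
        B1 = sum(P[u:m + 1])                     # class 1 blocked at/above threshold
        return B0, B1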
5 Modelling the Topology
The system model (2) can be explicitly rewritten for simple topologies. Let us consider here three cases.

5.1 Full Mesh OBS Network
The network is composed of $N$ nodes fully connected in pairs, with $m$ wavelengths per link. The traffic intensities in each possible origin-destination pair are $A_0$ and $A_1$. In a full mesh, every link is a single route, and its blocking probability depends only on the offered traffic $A_{xl}$. If the OBS nodes use the preemptive scheduler, then $A_{xl} = (N-1)A_x$, $B_{0l} = E_B(A_{0l}, m)$, and $B_{1l} = E_{LP}(A_{0l}, A_{1l}, m)$. Since all routes have one-link paths, $B_0$ and $B_1$ are the blocking probabilities in the route under consideration.
² Throughout the paper, $E_B$ is the Erlang-B formula, $m$ is the number of wavelengths on link $l$, and $E_{LP}$ stands for the blocking probability of low-priority bursts.
5.2 Star OBS Network
Consider now a network composed of $N$ core nodes connected through one additional core node by means of bidirectional links with $m$ wavelengths per direction. There are $M$ edge nodes connected through high-capacity links (no losses in the link going from the edge to the core) to each of the $N$ external core nodes. Assuming that routes between two edge nodes connected to the same core node do not exist, and that there is the same traffic intensity between each pair of edge nodes for each priority, there are two points where a burst can be blocked: the link going from its nearest core node to the central switch, and the link from the central switch to the destination core. Denote by $A_{x1}$, $A_{x2}$, $B_{x1}$ and $B_{x2}$ the intensities and blocking probabilities of class $x$ bursts offered to those two links. Then, with the preemptive scheduler, we have $A_{x1} = (N-1)M^2 A_x$, $A_{x2} = A_{x1} \cdot (1 - B_{x1})$, and blocking probabilities

$$B_{01} = E_B(A_{01}, m), \quad B_{11} = E_{LP}(A_{01}, A_{11}, m), \quad B_{02} = E_B(A_{02}, m), \quad B_{12} = E_{LP}(A_{02}, A_{12}, m).$$
Finally, the end-to-end blocking probability for a burst of class $x$ will be $B_x = 1 - (1 - B_{x1})(1 - B_{x2})$.

5.3 Ring OBS Network
This network is a ring of $N$ core nodes connected by bidirectional links, where $M$ edge nodes are connected to each core node through high-capacity links, as before, and there is no traffic between two edges from the same core. If there is traffic between every pair of edges for each priority, and shortest-path routing is used, then for $N$ odd the offered load to each link in the ring is the same, by symmetry. Consequently, $A_{xl} = M^2 A_x \sum_{i=0}^{\lceil N/2 \rceil - 1} (\lceil N/2 \rceil - i)(1 - B_{xl})^i$, and $B_{0l} = E_B(A_{0l}, m)$, $B_{1l} = E_{LP}(A_{0l}, A_{1l}, m)$. The blocking probability of class-$x$ bursts that traverse a path of $L$ hops in the ring is given by $B_x = 1 - (1 - B_{xl})^L$ for $L = 1, \ldots, \lceil N/2 \rceil$.
6 Differentiation Performance
In this section we use the model to provide guidelines for the topology design and configuration of a service-differentiation enabled OBS network in several useful scenarios: full mesh, star, ring and mesh topologies.

6.1 Full Mesh
We begin with the case of a fully connected network of core nodes, each one also connected to several edge nodes. Fig. 1(a) depicts the loss probabilities of both burst classes as a function of the link load when the offered traffic from both priorities is the same. This configuration uses 10 wavelengths per link, an offset of 10 ms, an average burst size of 0.1 s and the preemptive scheduler.
[Fig. 1. Blocking probabilities and differentiations in a full mesh. (a) Preemptive scheduler; (b) threshold-based scheduler. Both panels plot blocking probability against the normalized load A/W, with curves for B0 and B1 under JET and JIT and for the corresponding ratios.]
The first observation is that JIT has higher blocking than JET for all bursts, as a result of having a higher demand for wavelength reservation than JET for a burst of a given size. The proportional differentiation is the same for both schemes in the low load region, until losses in the JIT case approach unity, some two orders of magnitude sooner than in JET, when the load is around 0.01. With $m$ wavelengths and a traffic intensity ratio $A_1/A_0 = K$, the differentiation $B_1/B_0$ in the low load region has an asymptotic value

$$D = \lim_{A_0 \to 0} \frac{E_{LP}(A_0, K A_0, m)}{E_B(A_0, m)} = \frac{(1+K)^{m+1} - 1}{K}.$$
In this case $K = 1$, so $D = 2047$, which is what we see in the figure. This result may be used to dimension the network under the constraint of a desired maximum differentiation for the packets of the external classes. Bearing in mind that the network load must be low, this approach could be coupled with some form of admission control to effectively ensure the desired differentiation. Regarding proportional differentiation, JET and JIT are similar until the load approaches the high loss region for JIT, with the result that at medium loads JET performs better for the packets of the external classes. Figure 1(b) shows the results using a threshold-based scheduler with a threshold of 5 wavelengths. Now, the behavior at low loads is dependent on the absolute value of the load: the largest differentiation achievable for the external traffic decreases with the load. Thus, JIT and JET show different proportional differentiation levels, with JET achieving much better differentiation between high and low priority bursts. Combining no wavelength conversion in the node with the preemptive scheduler, the behavior is the same as with wavelength conversion, but the blocking probabilities are much larger. In this case the asymptotic differentiation in the low load region is

$$\lim_{A_0, A_1 \to 0} \frac{E_{LP}(A_0, A_1, m)}{E_B(A_0, m)} = K + 2,$$
which is substantially lower (near 3) than with wavelength conversion (2047). Incidentally, though not shown, the results for a star topology are very similar to those of the full mesh, albeit with slightly larger blocking probabilities.
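As a quick numerical check of the asymptote (illustrative only):

    def max_differentiation(K, m):
        # Low-load limit of B1/B0 with full wavelength conversion.
        return ((1.0 + K) ** (m + 1) - 1.0) / K

    print(max_differentiation(K=1, m=10))   # 2047.0, matching the figure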
[Fig. 2. Ring topology with preemptive scheduler. (a) Blocking probabilities and differentiations; (b) proportional differentiation. Both panels are plotted against the normalized load A/W for JET and JIT.]
6.2 Ring
In this case we consider a ring network with 4 core nodes and 5 edge nodes connected to each core node. The results for this configuration are shown in Figs. 2(a) and 2(b). In these plots, $B_0$ and $B_1$ refer to the average blocking probabilities over all routes. The behavior is remarkably similar to the star topology, and Figures 3(a) and 3(b) display a comparison between both cases. In the former, the blocking probabilities in the ring for the high priority class are around 0.75 times the ones for the star, so it appears that connecting the nodes as a ring in this specific scenario offers advantages from the point of view of absolute blocking for the high priority class; the ring topology also has a higher blocking for the low priority class, so regarding the proportional differentiation (Fig. 3(b)) the ring performs better than the star. In a more general case, the comparison between the ring and star topologies can be made explicit using our model. In the low load region, the ratio between the blocking of high priority bursts in rings and stars as a function of the number of nodes in the network, denoted by $\rho$, is³

$$\rho = \lim_{A_0 \to 0} \frac{B_0^{ring}}{B_0^{star}} = \frac{1}{8} \cdot \frac{(\lceil N/2 \rceil + 1)^2\, \lceil N/2 \rceil}{N - 1}.$$
Thus, the ratio does not depend on the number of edge nodes, $M$. It depends only on the number of core nodes, $N$, and we plot this dependence in Figure 3(c). It increases rapidly with $N$: for low values of $N$ the ring is better, for high values of $N$ the star is better, and the point at which the two topologies perform the same is around $N = 5$. So we can conclude that if the number of core nodes to connect is lower than 6, the ring topology is preferable over the star topology in terms of blocking probabilities. Of course, in the star configuration with large numbers of nodes, the central node must have enough processing power to handle all the traffic in the network.
³ This comes from $B_0^{star} = E_B((N-1)M^2 A_0, m) + E_B((N-1)M^2 A_0 (1 - B_0), m)$ and $B_0^{ring} = E_B(M^2 A_0 \sum_{i=0}^{\lceil N/2 \rceil - 1} (\lceil N/2 \rceil - i)(1 - B_x)^i, m) \cdot \bar{L}$, where $\bar{L} = (\lceil N/2 \rceil + 1)/2$ is the mean number of hops of a path in the ring.
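For illustration, a tiny script (hypothetical; it just evaluates the reconstructed formula above) reproduces the crossover near N = 5:

    import math

    def rho(N):
        # Low-load ring-to-star blocking ratio for the high-priority class.
        h = math.ceil(N / 2)
        return (h + 1) ** 2 * h / (8.0 * (N - 1))

    for N in (4, 5, 6, 10):
        print(N, rho(N))   # rho(4) = 0.75 < 1: ring better; rho(5) = 1.5 > 1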
[Fig. 3. Comparison between star and ring topologies. (a) Blocking probability ratios (ring over star) for B0 and B1 under JET and JIT vs. A/W; (b) relative differentiation (ring over star) for JET and JIT vs. A/W; (c) B0_ring/B0_star vs. the number of core nodes; (d) coarse vs. dense WDM differentiation ratios in the ring vs. A/W.]
6.3 Coarse WDM and Dense WDM
Next, we study how the differentiation capability is affected by the presence of more wavelengths in the links. Assuming full wavelength conversion and focusing on the ring topology, Figure 3(d) compares two architectures, one with 10-wavelength links and another with 100-wavelength links. The DWDM system has superior differentiation capabilities to the CWDM system: the difference between 10 and 100 wavelengths accounts for a growth of some 27 orders of magnitude in the ratio of burst blocking probabilities for bursts of each type. It is also possible to use the model to approximate the differentiation gain between two scenarios with the preemptive scheduler, one with $m$ wavelengths per link and another with $L \cdot m$ wavelengths per link. Assuming low traffic load in the links, the blocking probabilities should also be small, so we can approximate the end-to-end blocking probability by the sum of the probabilities in the traversed links. Moreover, since blocking probabilities are small, we approximate the offered intensities in all links as the original ones, so the blocking probability for one class of bursts is equal to a constant number of times the blocking on one link. Thus, the network can be approximated by the full mesh case, and if we denote by $D(x)$ the proportional differentiation in blocking probability between the two classes of bursts when there are $x$ wavelengths in a link, we have

$$\lim_{A_0 \to 0} \frac{D(Lm)}{D(m)} = \lim_{A_0 \to 0} \frac{(1+K)^{Lm}}{(1+K)^m} = (1+K)^{(L-1)m}.$$
With a scenario of 10 and 100 wavelengths ($L = 10$) and the same traffic intensity for both priorities ($K = 1$), the ratio of possible maximum differentiations is in the order of $(1+1)^{9 \cdot 10} \approx 10^{27}$, and that is what we observe in Figure 3(d).
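The same arithmetic in two lines (illustrative):

    K, m, L = 1, 10, 10
    print((1.0 + K) ** ((L - 1) * m))   # 2^90, about 1.24e27: the ~27 orders of magnitude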
7 Discussion and Conclusions
The motivation for this work has been to analyze the region of feasible differentiations in burst blocking probability for OBS networks supporting two internal types of bursts. In the course of that characterization, we have developed an analytical model useful for dimensioning any OBS network so that it can offer services differentiated in blocking probability to several external classes of packets. The model also helps in analyzing the effect of the scheduling algorithm running at the nodes and in choosing a suitable topology and OBS policy. There are some remarkable conclusions to be drawn from the numerical studies. First, dense WDM offers a substantial advantage over coarse WDM with respect to loss probability differentiation for the same traffic mix. Also, wavelength conversion at the switches (either partial or total) plays a vital role in achieving low blocking probabilities for any amount of offered traffic, and is therefore a necessary function in the proposed QoS architecture. Finally, there is a clear difference in the capacity of the OBS network between the different control planes: for moderate values of the blocking probability, JET outperforms JIT by approximately two orders of magnitude in throughput.
References

1. C. Qiao and M. Yoo, "Optical burst switching (OBS) - a new paradigm for an optical internet," Journal of High Speed Networks, vol. 8, no. 1, pp. 69–84, 1999.
2. X. Yu, J. Li, X. Cao, Y. Chen, and C. Qiao, "Traffic statistics and performance evaluation in optical burst switched networks," Journal of Lightwave Technology, vol. 22, no. 12, pp. 2722–2738, 2004.
3. J. Xu, C. Qiao, J. Li, and G. Xu, "Efficient burst scheduling algorithms in optical burst-switched networks using geometric techniques," IEEE JSAC, vol. 22, no. 9, pp. 1796–1811, 2004.
4. C. Gauger, M. Kohn, and J. Scharf, "Comparison of contention resolution strategies in OBS network scenarios," in Proceedings of the 6th International Conference on Transparent Optical Networks, Wroclaw, Poland, July 2004, pp. 18–21.
5. N. Barakat and E. Sargent, "An accurate model for evaluating blocking probabilities in multi-class OBS systems," vol. 8, no. 2, pp. 119–121, 2004.
6. M. Yoo, C. Qiao, and S. Dixit, "QoS performance of OBS in IPoWDM networks," IEEE J. Select. Areas Commun., vol. 18, no. 10, pp. 2062–2071, 2000.
7. M. Yoo, C. Qiao, and S. Dixit, "Optical burst switching for service differentiation in the next-generation optical internet," IEEE Commun. Mag., vol. 39, no. 2, pp. 98–104, 2001.
8. Q. Zhang, V. Vokkarane, J. Jue, and B. Chen, "Absolute QoS differentiation in optical burst-switched networks," IEEE JSAC, vol. 22, no. 9, pp. 1781–1795, 2004.
9. V. Vokkarane and J. Jue, "Prioritized burst segmentation and composite burst-assembly techniques for QoS support in optical burst-switched networks," IEEE J. Select. Areas Commun., vol. 21, no. 7, pp. 1198–1209, 2003.
10. M. Neuts, Z. Rosberg, H. L. Vu, J. White, and M. Zukerman, "Performance analysis of optical composite burst switching," IEEE Comm. Lett., vol. 6, no. 8, pp. 346–348, 2002.
11. A. Zalesky, H. L. Vu, Z. Rosberg, E. Wong, and M. Zukerman, "Modelling and performance evaluation of optical burst switched networks with deflection routing and wavelength reservation," INFOCOM 2004, Hong Kong, 2004, pp. 1864–1871.
12. N. Barakat and E. Sargent, "Analytical modeling of offset-induced priority in multiclass OBS networks," IEEE Trans. Comm., vol. 53, no. 8, pp. 1343–1352, 2005.
13. F. P. Kelly, "Blocking probabilities in large circuit-switched networks," Advances in Applied Probability, vol. 18, no. 2, pp. 473–505, 1986.
14. Z. Rosberg, H. L. Vu, M. Zukerman, and J. White, "Performance analyses of optical burst-switching networks," IEEE JSAC, vol. 21, no. 7, pp. 1187–1197, 2003.
15. K. W. Ross, Multiservice Loss Models for Broadband Telecommunication Networks. London, UK: Springer-Verlag, 1995.
16. J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP Reno performance: a simple model and its empirical validation," IEEE/ACM Trans. Networking, vol. 8, no. 2, pp. 133–145, 2000.
17. I. Baldine, G. Rouskas, H. Perros, and D. Stevenson, "JumpStart: a just-in-time signaling architecture for WDM burst-switched networks," IEEE Commun. Mag., vol. 40, no. 2, pp. 82–89, 2002.
18. H. L. Vu and M. Zukerman, "Blocking probability for priority classes in optical burst switching networks," IEEE Commun. Lett., vol. 6, no. 5, pp. 214–216, 2002.
19. P. Argibay-Losada, A. Suárez-González, M. Fernández-Veiga, R. F. Rodríguez-Rubio, and C. López-García, "From relative to observable proportional differentiation on OBS networks," in Proc. ACM Conference on Emerging Network Experiment and Technology (CoNEXT'05), Toulouse, France, Oct. 2005, pp. 115–123.
20. M. Schwartz, Telecommunication Networks. Addison-Wesley, 1988.
The TCP Minimum RTO Revisited

Ioannis Psaras and Vassilis Tsaoussidis
Dept. of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
{ipsaras,vtsaousi}@ee.duth.gr
http://comnet.ee.duth.gr
Abstract. We re-examine the two reasons for the conservative 1-second Minimum TCP-RTO to protect against spurious timeouts: i) the OS clock granularity and ii) the Delayed ACKs. We find that reason (i) no longer holds in modern OSs; we carefully design a mechanism to deal with reason (ii). Simulation results show that in next generation's high-speed, wireless-access networks, the TCP-RTO should not be limited by a fixed, conservative lower bound. Keywords: TCP, Minimum RTO, High Speed Links, Last Mile Wireless.
1 Introduction
The Retransmission Timeout policy of standard TCP is governed by the rules defined in RFC 2988 [11]. The TCP-RTO is calculated upon each ACK arrival after smoothing out the measured samples, and weighting the recent RTT-variation history:

$$RTO = SRTT + 4 \times RTTVAR, \qquad (1)$$
where RTTVAR holds the RTT variation and SRTT the smoothed RTT. The same RFC also specifies that the TCP-RTO should not be smaller than 1 second [11]. This value is known as the Minimum RTO and constitutes the subject of interest in the present paper. Currently, no official instruction exists to address the setting of the Minimum RTO value for TCP. Allman and Paxson in [1] investigated the impact of the Minimum RTO and found that TCP results in lower Throughput performance for Minimum RTO values smaller than 1 second. There were two main limitations that required a (conservative) lower bound for the TCP-RTO to protect it against spurious expirations:

1. the Clock Granularity (500ms for most OSs at that time): if the RTT equals the clock granularity, then the timeout may falsely expire before the ACK's arrival at the server.
2. the Delayed Acknowledgments (usually set to 200 ms) [3]: in case an ACK is delayed for more than the current TCP-RTO value, the timer will spuriously expire.
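For reference, a minimal sketch of this computation (the gains 1/8 and 1/4 are the constants of RFC 2988; the function and its names are illustrative, not the paper's code):

    def update_rto(srtt, rttvar, rtt_sample, min_rto=1.0):
        # One RFC 2988 timer update; all times in seconds.
        if srtt is None:                      # first RTT measurement
            srtt, rttvar = rtt_sample, rtt_sample / 2.0
        else:
            rttvar = 0.75 * rttvar + 0.25 * abs(srtt - rtt_sample)
            srtt = 0.875 * srtt + 0.125 * rtt_sample
        rto = srtt + 4.0 * rttvar             # Equation (1)
        return srtt, rttvar, max(rto, min_rto)  # clamp to the Minimum RTO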
We study each of the above limitations in turn and show that, in fact, there is considerable room for improving TCP performance through the Minimum RTO setting. In Section 2, we provide details regarding the clock granularity of modern OSs and find that it is far below the 500ms threshold assumed in [11]. We define a Cost Function and show (experimentally) the impact of the Minimum RTO setting on TCP's performance. We conclude that the timer granularity no longer constitutes a limitation for setting the Minimum RTO. In Section 3, we investigate the limitation of the TCP Delayed ACK mechanism on the Minimum RTO. We propose a mechanism that makes the TCP server aware of whether the next ACK to be received will possibly be delayed or not. Based on that, we assign a Minimum RTO value to each outgoing packet: a longer Minimum RTO to packets whose ACKs may be delayed and no Minimum RTO otherwise. We present our performance evaluation plan in Section 4. In Section 5.1, we claim that due to limited research studies on the subject of the Minimum RTO, several OSs implement different values for the lower bound of the TCP-RTO, leading to communication inconsistencies. In Sections 5.2 and 5.3, we investigate the impact of a Minimum RTO value on short web flows and on long FTP flows, respectively; using simulations, we show that the proposed mechanism significantly improves TCP performance, especially in case of wireless losses. We conclude the paper in Section 6.
2 Clock Granularity
We define a Cost Function (Equation 2) to capture the extra time a sender has to wait before retransmitting, due to the conservative Minimum RTO value:

$$C(f) = \frac{RTO_{min}}{RTO_{current}} \qquad (2)$$
If $C(f) < 1$, the Minimum RTO value adds no extra waiting time in case of packet loss, since the TCP-RTO value is larger than the Minimum RTO. Otherwise, the Minimum RTO value will negatively impact TCP throughput, by forcing the TCP sender to wait for the Minimum RTO timer to expire prior to retransmitting. We set (both the client's and the server's) clock granularity to 500ms and simulate one flow over a 500ms round-trip propagation delay path, to observe: i) the rationale behind the conservative 1-second Minimum RTO setting [11], [1] and ii) the impact of the Minimum RTO value relative to the actual TCP-RTO value. We find (see Fig. 1(a)) that: i) the TCP-RTO algorithm adjusts to values higher than 1 second, hence $C(f) < 1$, and ii) the Minimum RTO value is only needed as a security setting against spurious retransmissions (i.e., in case the round-trip propagation delay or the client's clock granularity equals the server's clock granularity and, at the same time, the TCP-RTO adjusts to a smaller value, the sender will spuriously timeout). We reduce the round-trip propagation delay to 6ms and repeat the previous experiment (see Fig. 1(b)). Again, we observe that $C(f) < 1$. We conclude
[Fig. 1. 500ms clock granularity: sequence-number vs. time traces of DATA packets, ACKs, the RTO and the min RTO. (a) Granularity = 500ms, round-trip propagation delay = 500ms; (b) granularity = 500ms, round-trip propagation delay = 6ms.]
that in case of coarse-grained clocks the Minimum RTO does not have a negative impact on TCP Throughput, since the TCP-RTO adjusts to values higher than the Minimum RTO. The Minimum RTO, instead, is only needed as a security setting against spurious timeouts. In Table 1 we present details regarding some of the currently most popular OSs; we observe that the clock granularity is always set to a value below (or equal to) 25ms. We repeat the above experiment using, this time, a finer-grained clock of 10ms.

Table 1. Details on Modern OSs

OS        Clock Granularity   Delayed ACK
Windows   15-16ms             200ms
Solaris   10ms                50-100ms
Linux     ≤ 25ms              Dynamically Set
Figure 2 uncovers the significant difference between the TCP-RTO values and the Minimum RTO limitation. In contrast to the coarser-grained clocks simulated previously, we observe that $C(f)$ is now far above 1, obviously leading to severe performance degradation in case of packet losses. For the sake of simplicity, we assume the time interval between the ACK arrival and the RTO value, in Fig. 2, to be negligible and we modify the Cost Function (Equation 2) accordingly:

$$C(f) \approx \frac{RTO_{min}}{T(ACK\ Arr)} \leq \frac{RTO_{min}}{RCG + RTPD + QD}, \qquad (3)$$

where $T(ACK\ Arr)$ holds the ACK Arrival Time, $RCG$ the Receiver's Clock Granularity, $RTPD$ the Round-Trip Propagation Delay and $QD$ the Queuing Delay. Since we simulate only one flow, we also consider the Queuing Delay to be insignificant. Hence, from Equation 3 we derive that $C(f) \approx 62.5$. Of course, the cost of extra waiting time due to a high Minimum RTO value will decrease
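Plugging in the simulated values (illustrative):

    min_rto = 1.0                        # conservative 1-second lower bound (s)
    rcg, rtpd, qd = 0.010, 0.006, 0.0    # granularity, propagation, queuing (s)
    print(min_rto / (rcg + rtpd + qd))   # 62.5, the cost factor from Equation (3)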
[Fig. 2. Granularity = 10ms, round-trip propagation delay = 6ms: sequence-number vs. time trace showing the wide gap between the adapted RTO and the 1-second min RTO.]
as the Round-Trip Propagation and Queuing Delays increase. We conclude that: i) the clock granularity should not be a matter of concern for the setting of the Minimum RTO, and ii) the conservative 1-second Minimum RTO will have a major impact on TCP's performance in case of packet losses.
3 Dealing with Delayed ACKs
The Delayed ACK mechanism [3] is currently quite popular among the vast majority of OSs. According to that mechanism, the TCP client will delay sending an ACK for an incoming packet for as long as the Delayed ACK timer suggests (see Table 1), unless another packet needs to be sent on that connection (piggybacking). In other words, if a stream of packets arrives at the TCP client, the latter will generate one ACK for every other packet. Otherwise, if one packet arrives at the TCP client without being followed by any subsequent packet, an ACK will be generated only after the Delayed ACK timer expiration. The Minimum RTO will prevent spurious RTO expiration in the latter case. We propose a mechanism to identify the packets whose ACKs are (possibly) going to be delayed¹; the Minimum RTO is extended accordingly, for those packets only, to prevent spurious TCP-RTO expirations. Our mechanism is based on the following observations:

– TCP's Sending Window Management and ACK Processing [2] specifies that the TCP server should send D back-to-back packets upon each new-ACK arrival (ACK-clocking), according to Equation 4:

$$D = snd.una + \min(cwnd, rwnd) - snd.nxt, \qquad (4)$$
where snd.una holds the oldest unacknowledged sequence number, cwnd and rwnd the congestion and advertised window, respectively, and snd.nxt the next sequence number to be sent.

¹ We leave interactive applications as a subject of future work.
– At the time when D back-to-back packets are generated, TCP does not know whether the application has more data to send, and even if it does, we do not know after how long.
– Since the D packets "travel" back-to-back, only the ACK of the last packet of the "train" of packets may be delayed, iff the server's application stops generating new data.
– Every 2nd packet will always be ACKed.

Consider that at time t0 all previously transmitted packets are already ACKed and D = 4 (or, generally, D is even). The TCP client will send ACKs for the 2nd and 4th packets. In this case, the client will not delay ACKing any packets and, consequently, there is no need for an extended Minimum RTO. Hence, we apply no Minimum RTO and let the TCP-RTO deal with the outgoing packets' timeout value. Now, consider that at time t0, D = 3 (or, generally, D is odd). The TCP client will immediately ACK the 2nd packet and will trigger the Delayed ACK timer for the 3rd packet. If the server's application does not generate any other packet (within the Delayed ACK timer interval minus the one-way propagation delay), then the 3rd packet will experience a delayed ACK response. In this case, we need to extend the Minimum RTO, for the 3rd packet only, to prevent spurious timeout expiration. We extend the above considerations to cover all possible back-to-back sending patterns; we use one variable, which we call set_odd and which is initially set to false. The proposed mechanism operates in one of the following States:

– State 1: "noMINRTO". Do not apply an extended Minimum RTO to any outgoing packet (i.e., the receiver will always ACK the last packet of the back-to-back train of packets); set set_odd to false.
– State 2: "extended MINRTO". Apply an extended Minimum RTO to the last packet of the next train of back-to-back packets; set set_odd to true.

According to the following steps, the proposed mechanism applies an extended Minimum RTO value only if needed (State 2); otherwise, the TCP-RTO algorithm deals with the timeout value (State 1). The flow diagram of the proposed mechanism is presented in Fig. 3, and a compact sketch follows the list below.

– Step 1: Extend the Minimum RTO for the first packet sent in the Slow-Start phase and proceed to step 2 or 3, depending on the value of D.
– Step 2: If (and for as long as) D is even and set_odd is false, remain in State 1.
– Step 3: Once D becomes odd, go to State 2.
– Step 4: If (and for as long as) D is even and set_odd is true, remain in State 2.
– Step 5: When D becomes odd again, go to State 1 (i.e., the sum of two odd numbers is always even and hence the ACK for the last packet of the next train will not be delayed).
– Step 6: Proceed to step 2, if D is even, or to step 3, otherwise.
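The parity rule can be condensed into a few lines; the sketch below is a hypothetical restatement of the state diagram (class and method names are ours, not the paper's):

    class MinRtoMarker:
        # set_odd == False -> State 1 ("noMINRTO");
        # set_odd == True  -> State 2 (extend MINRTO for a train's last packet).
        # The very first slow-start packet is always extended (Step 1).
        def __init__(self):
            self.set_odd = False

        def extend_last_packet(self, D):
            # One call per train of D back-to-back packets. An odd train flips
            # the ACK parity (Steps 3 and 5); an even train preserves it
            # (Steps 2 and 4). Returns True when the last packet of this train
            # should carry the extended Minimum RTO.
            if D % 2 == 1:
                self.set_odd = not self.set_odd
            return self.set_odd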
[Fig. 3. State diagram of the proposed algorithm: starting from the first slow-start packet (extended MINRTO, set_odd:false), odd values of D toggle between the noMINRTO state (set_odd:false) and the extended-MINRTO-for-last-packet state (set_odd:true), while even values of D preserve the current state.]
Summarizing, the Minimum RTO is set according to the following equation:

$$RTO_{min} = \begin{cases} R\ \mathrm{ms}, & \text{for the last packet, if } set\_odd = 1, \\ RTO_{cur}, & \text{otherwise,} \end{cases}$$

where R is a fixed, extended value for the Minimum RTO. We discuss the setting of this value in Section 5.1. We present part of the above process in Fig. 4. Initially (i.e., until packet 1478) set_odd is false and D = 2, in which case there is no need for an extended Minimum RTO (State 1). Next, D = 3 and hence the proposed mechanism extends the Minimum RTO of the 3rd packet and sets set_odd to true (State 2). From that point onwards, since set_odd is true and D is not odd, the proposed mechanism will extend the Minimum RTO of the last (i.e., 2nd) packet of each back-to-back train of packets (State 2).
[Fig. 4. Modeling ACK arrivals: sequence-number vs. time trace annotated with the back-to-back (b2b) trains (2 pkts b2b, 3 pkts b2b, ...); the ACK of the last packet of an odd train may be delayed, so its Minimum RTO is extended.]
We note that the proposed mechanism does not apply to packets sent during Fast Retransmit (FR). During FR the Minimum RTO is set to R ms; the mechanism resumes from Step 6 after FR or timeout expiration.
According to the above, we rewrite Equation 2 for the proposed mechanism as follows:

$$C(f) = \begin{cases} \dfrac{R\ \mathrm{ms}}{RTO_{cur}}, & \text{for the last packet, if } set\_odd = 1, \\[1ex] 1, & \text{otherwise.} \end{cases}$$

Obviously, the cost of extra waiting time due to the conservative Minimum RTO setting is now significantly decreased; at the same time, the risk of running into spurious timeouts due to a delayed ACK response from the TCP client is effectively avoided.
4 Performance Evaluation Plan
We evaluate the performance of the proposed mechanism using ns-2 [10]. We use realistic protocol settings to reflect the behavior of Internet servers [9], [8], [14]. That is, most OSs use the SACK [7] version of TCP with the timestamps option enabled [6] and the response against spurious timeouts [5], [13] in place. We set the Delayed ACK timer to 200ms and the clock granularity to 10ms; we compare the proposed mechanism with three different Minimum RTO implementations: i) 200ms, as implemented in Linux TCP, ii) 400ms, as implemented in Solaris TCP, and iii) 1 second, as proposed by the IETF (and probably implemented in Windows TCP). We use the network topologies shown in Fig. 5, where all buffers use the RED [4] queuing policy. The buffer sizes are set according to the Delay-Bandwidth Product of the outgoing links.
[Fig. 5. Simulation topologies: (a) Topology 1; (b) Topology 2, with clients reaching a web server. The labeled links include 10Mbps/1ms, 30Mbps/10ms, 100Mbps/10ms, 10Mbps/5ms and 10Mbps/10ms.]
We use two traditional performance metrics: 1. the Average Task Completion Time (ATCT), in case of short web applications, and 2. the System Goodput, in case of FTP applications:

$$Goodput = \frac{Original\ Data}{Connection\ time}, \qquad (5)$$
where Original Data is the number of Bytes delivered to the high-level protocol at the receiver (i.e., excluding the retransmitted packets and the TCP header overhead) and Connection time is the amount of time required for the data delivery.
5 Results
We divide the Results Section into three main subsections. Initially (Section 5.1), we show that due to limited standardization efforts on the subject of the Minimum RTO setting, communication problems may arise when the communicating ends are supported by different OSs. Next, we present the impact of the Minimum RTO setting on: i) short, web-like flows (Section 5.2) and ii) long FTP flows (Section 5.3). We focus on next-generation, broadband wireless access networks, where flow contention is low and losses occur mainly due to wireless errors.

5.1 The Need for a Standard Mechanism
We have already shown that there exist different implementation settings for both the Delayed ACK timer and the Minimum RTO value among different OSs. We report, however, that if Equations 6 and 7 hold, the sender will run into spurious timeout expirations every time the receiver delays the ACK response:

$$Server's\ Minimum\ RTO < RTPD + QD + Client's\ DelACK\ Timer \qquad (6)$$
$$Minimum\ RTO > RTO_{cur} \qquad (7)$$

We verify the above statement experimentally. We simulate a Linux server (Minimum RTO = 200ms) and a Windows client (Delayed ACK Timer = 200ms) over a 42ms Round-Trip Propagation Delay path (see Fig. 5(b)).
[Fig. 6. The need for a standard mechanism: sequence-number vs. time traces. (a) Linux server with a 200ms Delayed ACK client (e.g., a Windows client) suffers a spurious retransmission; (b) the modified Linux server extends the Minimum RTO and avoids the spurious timeout.]
Indeed, we see in Fig. 6(a) that the Linux server spuriously times out and retransmits packet 601 (i.e., the ACK arrives 42ms later). On the contrary, the proposed mechanism extends the Minimum RTO long enough to avoid spurious retransmissions (see Fig. 6(b)). In the present work, whenever deemed necessary according to the proposed mechanism, we apply Minimum RTO = R = 500ms. That is, the proposed mechanism will effectively deal with situations where RTPD + QD ≤ 300ms (see Equation 6), since we have not found any implementation where the Delayed ACK interval is greater than 200ms.

5.2 Impact on Short Web Flows
We use the topology shown in Fig. 5(b), where 3 flows download a content-rich web page (i.e., 100KB) every 5 seconds; end-users are connected through wireless, lossy links to router R2 (PER = 3%). In Table 2, we present the Average Task Completion Time² (ATCT) for the Linux TCP implementation, compared with the proposed mechanism, after 20 successfully completed tasks.

Table 2. Average Task Completion Time (ATCT)

         Linux   noMINRTO   Difference
flow 1   2.4s    2.1s       ~ 13.2%
flow 2   2.5s    2.4s       ~ 4.22%
flow 3   2.7s    2.55s      ~ 6%
Since the propagation and transmission delays are the same in both cases, we subtract them in order to capture the delay difference due solely to the proposed algorithm; we find that the 200ms Minimum RTO value implemented in Linux TCP increases the ATCT by 8% on average. We note that the difference in the ATCT increases further in case of higher Minimum RTO values (e.g., Solaris, IETF), as well as in case of faster transmission links.

5.3 Impact on Long FTP Flows
We present three different evaluation scenarios considering three network parameters: i) the PER, ii) the Number of Participating Flows and iii) the Bandwidth Capacity of the Backbone link. In all cases, we use the topology shown in Fig. 5(a); the simulation setup for each experiment is shown in Table 3, while the corresponding results are presented in Figs. 7(a), 7(b) and 7(c). In all three cases, we observe that the proposed mechanism provides a significant performance increase over the Minimum RTO settings implemented in Linux TCP, Solaris TCP and the IETF proposal (Windows TCP). When 4% of the transmitted packets are corrupted due to wireless errors (Fig. 7(a)), for example, the proposed mechanism improves TCP Goodput performance by approximately 33% over the Linux/Solaris TCP implementation, while the increase becomes even larger (i.e., 50%) against the IETF setting.

² Each Task is defined as a complete transfer of a web page.
Table 3. Experiment Details

           PER        TCP Flows   bw_bb
Fig. 7(a)  see Fig.   3           6 Mbps
Fig. 7(b)  3%         see Fig.    100 Mbps
Fig. 7(c)  3%         500         see Fig.
[Fig. 7. Impact on long FTP flows: Goodput (B/s) for the IETF, Solaris, Linux and noMINRTO settings. (a) Increasing PER; (b) increasing TCP contention (number of participating flows); (c) increasing backbone bandwidth capacity (Gbps).]
Faster transmission links provide a further advantage to the proposed algorithm (Fig. 7(c)): both the RTT and the Queuing Delay decrease, making the cost of extra waiting time an even more dominant factor, performance-wise (see Equations 2 and 3). As TCP contention increases (Fig. 7(b)) [12], however, the performance difference decreases, since in that case the queuing delay, reflected in the TCP-RTO value, diminishes the impact of the extra waiting time due to the Minimum RTO setting (see Equations 2 and 3).
6 Conclusion
We have shown that the conservative 1-second Minimum RTO setting causes severe TCP performance degradation, especially in the case of last-mile wireless users connected to high-speed backbone links. We argue that such a (security) setting to protect against spurious TCP timeouts is not needed, since: i) modern OSs use fine-grained clocks and ii) the Delayed ACK response can be dealt with using the proposed mechanism. Simulation results show that under certain conditions (i.e., high-speed backbone links, wireless errors at the last-mile wireless link), TCP may achieve up to 50% higher Goodput performance when the proposed mechanism is used; at the same time, spurious timeout expirations due to a delayed ACK response from the TCP client are effectively avoided.
References

1. M. Allman and V. Paxson. On Estimating End-to-End Network Path Properties. In Proceedings of ACM SIGCOMM, pages 263–274, September 1999.
2. M. Allman, V. Paxson, and W. Stevens. TCP Congestion Control, RFC 2581, April 1999.
3. R. Braden. Requirements for Internet Hosts - Communication Layers, RFC 1122, October 1989.
4. S. Floyd and V. Jacobson. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking, 1(4):397–413, 1993.
5. A. Gurtov and R. Ludwig. Responding to Spurious Timeouts in TCP. In Proceedings of IEEE INFOCOM, 2003.
6. V. Jacobson, R. Braden, and D. Borman. TCP Extensions for High Performance, RFC 1323, May 1993.
7. M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP Selective Acknowledgement Options, RFC 2018, April 1996.
8. Alberto Medina, Mark Allman, and Sally Floyd. Measuring interactions between transport protocols and middleboxes. In Proceedings of IMC '04, pages 336–341, New York, NY, USA, 2004. ACM Press.
9. Alberto Medina, Mark Allman, and Sally Floyd. Measuring the evolution of transport protocols in the internet. SIGCOMM CCR, 35(2):37–52, 2005.
10. ns-2. The Network Simulator - ns-2, http://www.isi.edu/nsnam/ns/.
11. V. Paxson and M. Allman. Computing TCP's Retransmission Timer, RFC 2988, May 2000.
12. Ioannis Psaras and Vassilis Tsaoussidis. Why TCP Timers (still) Don't Work Well. Elsevier Computer Networks Journal, COMNET, to appear 2007.
13. P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: An Enhanced Recovery Algorithm for TCP Retransmission Timeouts. In Proceedings of ACM SIGCOMM, September 2003.
14. P. Sarolahti and A. Kuznetsov. Congestion Control in Linux TCP. In Proceedings of USENIX '02.
Improving XCP to Achieve Max-Min Fair Bandwidth Allocation

Lei Zan and Xiaowei Yang
University of California, Irvine
{lzan,xwy}@ics.uci.edu
Abstract. TCP is shown to be inefficient and unstable in high-speed and long-latency networks. The eXplicit Control Protocol (XCP) is a new and promising protocol that outperforms TCP in terms of efficiency, stability, queue size, and convergence speed. However, Low et al. recently discovered a weakness of XCP. In a multi-bottleneck environment, XCP may achieve as low as 80% utilization at a bottleneck link and, consequently, some flows may receive only a small fraction of their max-min fair rates. This paper proposes iXCP, an improved version of XCP. Extensive simulations show that iXCP overcomes the weakness of XCP and achieves efficient and fair bandwidth utilization in both single- and multi-bottleneck environments. In addition, we prove that iXCP is max-min fair in steady state. This result implies that iXCP is able to fully utilize bottleneck bandwidth. Simulations also show that iXCP preserves the good properties of XCP, including negligible queue lengths, near-zero packet loss rates, scalability, and fast convergence.
1 Introduction
It is well known that TCP's congestion control mechanism is inefficient and unstable in high bandwidth-delay-product environments [1, 2, 3, 4]. TCP treats a packet loss as an implicit congestion signal, and reacts to congestion by cutting its congestion window size by half, and then gradually increases its window size by one every round trip time (RTT). This saw-toothed behavior of TCP leads to throughput oscillation and link under-utilization, especially in a high bandwidth-delay-product environment. The eXplicit Control Protocol (XCP) [5] overcomes the limitations of TCP by sending explicit window adjustment information from routers to end hosts. It is a promising candidate to replace TCP in a future Internet architecture [6]. Unlike TCP, an XCP flow does not implicitly probe for available bandwidth. Instead, a router computes the spare bandwidth of an output link, and fairly allocates the bandwidth among all flows that share the same link. The router then writes the amount of window adjustment in the congestion header of an XCP flow. This explicit control mechanism allows a flow to quickly utilize the available bandwidth of a link. Early results have shown that XCP is highly efficient, stable, and scalable [5].
However, Low et al. [7] recently revealed a weakness of XCP. In a multi-bottleneck environment, XCP's congestion control mechanism may cause some bottleneck link to be under-utilized, and a flow may only receive a small fraction of its max-min fair allocation [8].¹ Low et al. demonstrated this result with both a theoretical model and simulations. With the chosen parameters, XCP may utilize only 80% of bottleneck bandwidth in the worst case. In this paper, we refer to this theoretical model as Low's model. Intuitively, this problem occurs because XCP-enabled routers independently compute bandwidth allocation. For instance, if a flow is bottlenecked at a downstream link, an upstream router would still attempt to allocate bandwidth to that flow to ensure local fairness. This leads to link under-utilization, which in turn causes some flow to receive only a fraction of its max-min fair allocation. In this paper, we propose a simple improvement to XCP that overcomes this limitation. We add an additional bottleneck identifier field into XCP's congestion header. If a flow is bottlenecked at an outgoing link of a router, the router writes the link identifier into this field. Other routers are then aware that the flow is not bottlenecked at their links. We further modify XCP's control algorithm not to waste bandwidth on flows that are bottlenecked at other routers. We use extensive simulations to show that our improved XCP (iXCP) is able to achieve nearly 100% link utilization and max-min fairness in steady state. We use a theoretical model to show that iXCP is max-min fair. This result also implies that iXCP is able to fully utilize bottleneck bandwidth. In addition, our simulation results show that iXCP preserves the desirable features of XCP. It converges fast to max-min bandwidth allocation with a negligible queue length; it is efficient and fair in high-speed and long-latency networks as well as conventional networks; it also remains a scalable, stateless solution. This paper has three key contributions. The first is our analysis of the root cause of XCP's under-utilization problem. Although this problem has been observed in [7,9], to the best of our knowledge, we are the first to pinpoint the particular protocol mechanism that causes the problem. The second is our improvement to XCP. This improvement makes XCP achieve its full potential: iXCP is highly efficient and fair in all types of topologies. The third is the theoretical analysis that shows iXCP is max-min fair. This analysis gives us confidence that iXCP will continue to operate efficiently and fairly in scenarios that we did not simulate. We note that our theoretical analysis builds on Low's model [7]. The rest of the paper is organized as follows. In Section 2, we briefly summarize how XCP works and its weakness. Section 3 describes iXCP. We use extensive simulations to evaluate the performance of iXCP in Section 4. We compare our work with related work in Section 5. Section 6 concludes the paper.
¹ A max-min fair allocation maximizes the bandwidth allocated to the flow with the minimum allocation. Max-min fairness satisfies the following conditions: 1) the allocation is feasible, i.e., the sum of the allocated rates does not exceed a link's capacity; 2) a flow's rate allocation cannot be increased without violating the feasibility constraint or without decreasing the rate of some flow that has an equal or smaller bandwidth share.
2 Understanding the Weakness of XCP
Understanding what causes a problem is the first step to solving it. In this section, we describe how XCP works and analyze what makes XCP under-utilize link bandwidth.

2.1 How XCP Works
XCP uses explicit window adjustment for a flow to increase or decrease its congestion window. Each packet carries a congestion header that contains three fields: the sender's current congestion window size, cwnd; the estimated round trip time, rtt; and the router feedback field, feedback. Each sender fills in its current cwnd and rtt values in the congestion header on packet departures, and initializes the feedback field to its desired window size. The core control mechanism of XCP is implemented at routers. Each router has two logical controllers: the efficiency controller and the fairness controller. In each control interval, the efficiency controller computes the spare bandwidth as follows:

$$\phi = \alpha d (c - y) - \beta b \qquad (1)$$
where $d$ is the average round-trip time, $c$ is the link capacity, $y$ is the input traffic rate, and $b$ is the persistent queue length. $\alpha$ and $\beta$ are two control parameters, with default values 0.4 and 0.226, respectively. The fairness controller is responsible for fairly allocating bandwidth to each flow. When a link is in the high-utilization region, XCP's fairness controller performs a "bandwidth shuffling" operation to ensure local fairness. This operation simultaneously allocates and deallocates the shuffled bandwidth among flows. The shuffled bandwidth computed by XCP is as follows:

$$h_{xcp} = \max\{0, \gamma y - |\phi|\} \qquad (2)$$
where $\gamma$ is a control parameter with default value 10%. In each control interval, the spare bandwidth computed from Equation (1), if it is positive, and the shuffled bandwidth computed from Equation (2) are allocated to each flow additively, i.e., each flow gets an equal amount of increment. Meanwhile, the spare bandwidth, if it is negative, and the shuffled bandwidth are deallocated multiplicatively, i.e., each flow gets a rate decrement proportional to its current rate. A router writes the net window adjustment (the increment minus the decrement) in the feedback field of a packet. A more congested downstream router may overwrite this field with a smaller increment or a larger decrement. The receiver echoes the feedback back to the sender, and the sender adjusts its window size accordingly.
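In code form, one control interval of the router computation reads roughly as follows (an illustrative sketch with the default constants; not the XCP reference implementation):

    ALPHA, BETA, GAMMA = 0.4, 0.226, 0.1

    def xcp_interval(d, c, y, b):
        # d: avg RTT, c: link capacity, y: input rate, b: persistent queue.
        phi = ALPHA * d * (c - y) - BETA * b    # spare bandwidth, Eq. (1)
        h = max(0.0, GAMMA * y - abs(phi))      # shuffled bandwidth, Eq. (2)
        return phi, h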
2.2 The Weakness Revealed
Low et al. rigorously modeled XCP's control algorithm in [7]. Both their analysis and simulations revealed that XCP may under-utilize a bottleneck link. In the worst case, XCP may only achieve 80% link utilization (with XCP's default parameters).
[Fig. 1. A two-bottleneck topology: n long flows cross both a 155Mbps and a 100Mbps link, while one short flow uses the 155Mbps link only. XCP under-utilizes the 155Mb/s link: the n long flows are bottlenecked at the 100Mb/s link, so the short flow should be able to send at 55Mb/s, but with XCP it may get a bandwidth allocation anywhere between 24Mb/s and 55Mb/s, depending on the number of long flows.]
We describe this problem using the network topology shown in Figure 1. This is an example used by Low et al. in [7]. In this example, there are n (n ≥ 2) long flows that cross both links in the figure and a short flow that only goes through the 155Mb/s link. Since each long flow is bottlenecked at the 100Mb/s link, in a max-min fair allocation each of them gets a bandwidth allocation of 100/n Mb/s. The short flow should fully utilize the residual bandwidth of the 155Mb/s link, obtaining a 55Mb/s bandwidth allocation. However, both simulations and analysis show that XCP allocates a rate r(n) < 55 Mb/s to the short flow in steady state. The function r(n) decreases as n increases. Figures 2(a) and 2(b) show the utilization of the 155Mb/s bottleneck link and the ratio between the short flow's allocated rate and its max-min fair share, respectively. As we can see, both the link utilization and the short flow's bandwidth allocation decrease as the number of long flows n increases. Intuitively, this problem occurs because XCP-enabled routers independently allocate bandwidth to each flow based on their local information. Although a long flow is bottlenecked at the downstream 100Mb/s link, the upstream router at the 155Mb/s link still attempts to increase its bandwidth share to ensure local fairness. As a result, the short flow, which is only bottlenecked at the upstream link, cannot fully utilize the link and obtain its max-min fair share. We explain this in more depth to shed light on the solution. The problem is primarily caused by XCP's fairness controller. The "bandwidth shuffling" operation essentially uses the Additive-Increase Multiplicative-Decrease (AIMD) algorithm to adjust rates among flows. Thus, a flow with a larger bandwidth share will be taxed more and get back less, and vice versa. In a single-bottleneck environment, the AIMD policy will equalize all flows' bandwidth shares, thereby achieving fairness. The problem arises in a multi-bottleneck environment. In such an environment, the deallocated bandwidth from a shuffling operation may not be fully utilized when some flows are bottlenecked at other links (no matter whether downstream or upstream) and cannot further increase their sending rates. This deallocated but not re-used bandwidth leads to link under-utilization. The flows that could have used this bandwidth have a net loss in a shuffling operation. Recall that the fairness controller is also in charge of allocating the spare bandwidth computed by the efficiency controller. When the net loss of a flow from the shuffling operation balances out its net gain in the spare bandwidth allocation, the
system has reached an equilibrium, in which a flow's sending rate cannot increase further although unused bandwidth remains. One might think that if the problem is caused by the bandwidth shuffling operation, we can simply disable bandwidth shuffling to fix it. However, bandwidth shuffling is essential to prevent starvation of new flows and to achieve some degree of fairness. Without bandwidth shuffling, XCP can achieve full link utilization, but the rate allocations to different flows can be arbitrarily unfair (see [10] for a concrete example).
3 Improved XCP (iXCP)
In this section, we describe our improvement to XCP. As we discussed in the previous section, the under-utilization problem is caused by a router's attempt to shuffle bandwidth from flows that can further increase their rates to flows that cannot, i.e., flows that are bottlenecked at other routers. Therefore, we modify XCP's control algorithm to shuffle bandwidth only among flows that are bottlenecked at the current router. This modification ensures that the bandwidth deallocated by the shuffling operation will be re-used; therefore, a link will be fully utilized. To implement this improvement, a router must be aware of whether a flow is bottlenecked at the router or not. For a router to obtain this information, we include two additional fields in XCP's congestion header: the bottleneckId and the nextBottleneckId fields. The control algorithm uses the bottleneckId field to compute per-packet feedback. A router sets the nextBottleneckId field on the fly based on the feedback value: if the feedback from this router is smaller than the one in the packet header, then the outgoing link of this router becomes the new bottleneck for the flow, and the router writes its outgoing link identifier into this field. To ensure uniqueness, a router may use a random 32-bit value as a link identifier. Initially, both the bottleneckId and the nextBottleneckId fields are set to the sender's access link identifier. An iXCP receiver acknowledges the feedback and the nextBottleneckId field back to the sender. The sender copies the nextBottleneckId field from the acknowledgement to the bottleneckId field of its outgoing packets. The control algorithm of XCP is modified as follows. On a packet arrival, a router estimates the spare bandwidth of an outgoing link as in the original XCP, shown in Equation (1), and estimates the shuffled bandwidth only from those packets whose bottleneckIds match the link identifier of the router. In iXCP, the shuffled bandwidth can be represented as follows:

$$h_{ixcp} = \max\{0, \gamma (y - y_0) - |\phi|\} \qquad (3)$$

As $y_0$ denotes the input traffic rate of flows whose bottleneckIds do not match the router's link identifier, $y - y_0$ denotes the input rate of flows bottlenecked at the current router, whose bottleneckIds match the router's link identifier. On a packet departure, if the packet's bottleneckId matches the current outgoing link identifier of the router, the router follows the original XCP algorithm and
computes the feedback from both the spare bandwidth computed from Equation (1) and the shuffled bandwidth computed from Equation (3); otherwise, the feedback is computed only from the spare bandwidth. Pseudo code of the algorithm is presented in [10], and an illustrative sketch is given below. The drawback of our modification is that iXCP increases the congestion header of XCP by eight bytes and the acknowledgement header by four bytes. If we assume the average packet size is 400 bytes [11], and each packet has both a congestion header and an acknowledgement header, we increase the packet header overhead by 3%. However, as we will show in the following sections, iXCP can increase link utilization by almost 20% in some multi-bottleneck environments. Besides, packet sizes may increase in the future due to advanced technologies, e.g., gigabit Ethernet with a jumbo frame size of 9000 bytes. Therefore, we think it is a worthy tradeoff to design iXCP to achieve efficient and fair bandwidth allocation in both single- and multi-bottleneck topologies at the cost of slightly increased header overhead. We build on Low's theoretical model [7] to prove the following theorem, and we refer interested readers to [10] for the detailed proof.

Theorem 1. iXCP achieves max-min fair bandwidth allocation in steady state. That is, with iXCP, all bottleneck links will be fully utilized, and a flow cannot increase its rate without decreasing the rate of a flow that has a smaller or equal share.
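A minimal sketch of this decision logic (hypothetical names; the spare and shuffle shares are assumed to be computed per flow as in XCP):

    def ixcp_shuffled(y, y0, phi, gamma=0.1):
        # Shuffling counts only traffic bottlenecked at this link, Eq. (3).
        return max(0.0, gamma * (y - y0) - abs(phi))

    def per_packet_feedback(pkt_bottleneck_id, link_id, spare_share, shuffle_share):
        # Flows bottlenecked elsewhere receive only the spare-bandwidth term;
        # flows bottlenecked at this link also take part in shuffling.
        if pkt_bottleneck_id == link_id:
            return spare_share + shuffle_share
        return spare_share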
4 Simulation Results
In this section, we use extensive ns2 [12] simulations to evaluate the performance of iXCP and compare it with XCP. Due to space limitations, we only present a subset of our results. More results can be found in [10]. The simulation scenarios include multiple bottleneck topologies, heterogeneous RTTs, and web-like traffic. There are two key questions we aim to answer:
Fig. 2. iXCP achieves nearly optimal link utilization and max-min fair rate for the short flow. The figures compare iXCP with XCP using both simulation and theoretical results. (a) Utilization for the 155Mb/s link in Figure 1; (b) ratio of the short flow's rate (R) over its max-min fair rate (Rmaxmin). Both panels plot against the number of long flows (4 to 1024).
Fig. 3. Small average queue size and zero packet drops are achieved for the 155Mb/s link in Figure 1, for both iXCP and XCP. (a) Average queue size; (b) packet drops, both plotted against the number of long flows (4 to 1024).
1) Does iXCP fix the problem of XCP and achieve max-min fair allocation? 2) Does iXCP make other properties of XCP worse? To answer the first question, we compute two metrics: 1) link utilization; 2) a flow's rate normalized by its theoretical max-min rate. If the link utilization is 100% and a flow's normalized rate is 1, then iXCP achieves max-min fair bandwidth allocation. To answer the second question, we examine three metrics: 1) the persistent queue sizes; 2) the packet drops; 3) the convergence speed when flows join and leave the network. In our simulations, we use the same parameter settings as those chosen by XCP [5] for the purpose of comparison. The coefficients for the efficiency controller, α and β, are set to 0.4 and 0.226 respectively, and the coefficient for the fairness controller, γ, is set to 0.1. The packet size is 1000 bytes. The propagation delay of each link in all topologies is 20ms unless otherwise noted, and the bandwidth of each link is specified in the figures.
4.1 A Simple Two-Bottleneck Environment
We simulated the scenario shown in Figure 1. The short flow is bottlenecked at the first, 155Mb/s link, and all the other long flows are bottlenecked at the second, 100Mb/s link. We vary the number of long flows from 4 to 1024. We only show the results for the short flow on the first link, because XCP cannot fully utilize the bandwidth of the first link and does not allocate the max-min rate to the short flow, as explained in Section 2.2. The second link is fully utilized in both XCP and iXCP, and each long flow obtains its max-min rate of 100/n Mb/s. Figures 2(a) and 2(b) show the link utilization for the first bottleneck link and the ratio between the short flow's rate and its theoretical max-min share. We show both the simulation and the theoretical results for XCP and iXCP. Theoretical results for XCP are obtained using Low's model [7]; theoretical results for iXCP are obtained using our analysis in [10]. As can be seen, as the number of long flows increases, the link utilization of XCP decreases and approaches its
Fig. 4. A multi-bottleneck topology (five links, labeled 1 to 5, carrying groups of 1, 5, 10, 20, and 30 flows; link capacities in Mbps)
theoretical lower bound of 80%. The short flow only obtains 40% of its max-min rate. In contrast, iXCP achieves more than 97% link utilization, and the short flow's rate is more than 90% of its max-min rate. Figures 3(a) and 3(b) show the average queue size and the packet drops at the 155Mb/s link. Both XCP and iXCP have very small queue sizes and negligible packet drops. (In the simulations, the packet drops are zero.) The simulation results for iXCP match well with the theoretical analysis until the number of long flows becomes large. We examined packet traces and concluded that the discrepancy is caused by rounding errors. XCP's control algorithm computes a window adjustment value in bytes, but a sender advances its sending window in packets. When the window increment is less than one packet, the window size in the congestion header of a packet is actually larger than the sender's true sending window. This rounding error affects the calculation of per-packet feedback, stopping a flow from further increasing its window. When the number of flows is large, the aggregated rounding errors are not negligible and therefore lead to slight under-utilization.
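The rounding effect can be illustrated with a toy calculation (our example; neither XCP nor iXCP ships this code): when a flow's per-RTT feedback share falls below one packet, a packet-granularity window simply stops growing.

```python
# Toy illustration of the byte-vs-packet rounding discussed above
# (hypothetical names; packet size as in the simulations).
PKT_SIZE = 1000  # bytes

def advance_window(cwnd_pkts, feedback_bytes):
    # The router grants feedback in bytes, but the sender can only
    # advance its window in whole packets, so sub-packet grants are lost.
    return cwnd_pkts + feedback_bytes // PKT_SIZE

print(advance_window(10, 600))  # -> 10: a 600-byte grant is rounded away
```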
4.2 A Multi-bottleneck Environment
We studied how iXCP performs in a multi-bottleneck environment as shown in Figure 4. In this topology, there are a total of five hops, and all flows are bottlenecked at the last hop they cross. Each link is labeled with an identifier ranging from 1 to 5. For example, the thirty longest flows are bottlenecked at the fifth link; the twenty second-longest flows are bottlenecked at the fourth link; and so on. The max-min rates for flows bottlenecked at links 1 to 5 are 10Mbps, 8Mbps, 4Mbps, 3Mbps, and 2Mbps, respectively. Figures 5(a) and 5(b) show the utilization and the normalized flow rate at each link achieved by both iXCP and XCP. We only show the bottlenecked flows for each link, and the normalized rates are averaged over all bottlenecked flows. The standard deviations among flows are too small to be visible; thus, they are not shown in the figures. As can be seen, iXCP achieves full link utilization and max-min rate allocations for all flows at all bottleneck links, while XCP cannot fully utilize the bandwidth of the first four links. Small average queue lengths and zero packet drops are achieved for both iXCP and XCP. The results are presented in [10].
Fig. 5. iXCP achieves nearly 100% link utilization and max-min fair flow rates for each link in the multi-bottleneck topology shown in Figure 4. (a) Link utilization; (b) ratios of flow rates (R) over their max-min fair rates (Rmaxmin), both per link ID (1 to 5).
4.3 Dynamic Flows
We simulated the case in which the bottlenecks of flows change over time as flows dynamically join and leave the network. This simulation shows how well iXCP converges when there are sudden changes in traffic demand. The simulation topology is shown in Figure 6. Each link has a propagation delay of 20ms. We start the long flow and four short flows s1 to s4 at time t = 0. Each short flow only crosses one link. We then start three short flows s5, s6, and s7 at t = 20. At time t = 40, the short flows s4 to s7 stop. Figures 7(a), 7(b), and 7(c) show how each flow's rate changes over time as its bottleneck shifts with traffic demand. All flows converge to their max-min rates shortly after a change in the network. In the first 20 seconds, the long flow is bottlenecked at the 120Mb/s link. The max-min rate for the long flow is 40Mb/s, and the rates for the short flows are: s1 = 110Mb/s, s2 = 60Mb/s, and s3 = s4 = 40Mb/s. After the simulation starts, all flows converge to their max-min rates within ten seconds. When flows s5 to s7 start at t = 20, the bottleneck link for the long flow shifts to the 150Mb/s link. The flow's rate quickly converges to its new max-min rate of 30Mb/s. Correspondingly, the other short flows quickly converge to their max-min rates (the short flows s1, s5, s6 and s7 converge to 30Mb/s, s2 increases to 70Mb/s, and s3 and s4 increase to 45Mb/s). At t = 40, the short flows s4 to s7 stop. The new bottleneck for the long flow is the 100Mb/s link, and it quickly converges to its max-min rate of 50Mb/s. Similarly, s1 converges to its new max-min rate of 100Mb/s, s2 converges to 50Mb/s, and s3 converges to 70Mb/s. Figure 7(d) shows the utilization of each link. As can be seen, iXCP is robust to sudden changes in traffic demand. After four flows exit the system, there is only a temporary dip, and each link is quickly fully utilized again. The convergence of iXCP to equilibrium is similar to that of XCP, as shown in the longer version of this paper [10]. Both converge to equilibrium within seconds, whereas TCP persistently oscillates and never converges [10]. However, in our simulations, it takes iXCP slightly longer (about 1.6 seconds)
Fig. 6. The simulation topology: a long flow traverses the 150Mbps, 100Mbps, and 120Mbps links. The long flow and four short flows s1 to s4 start at time t = 0. Three short flows s5, s6, and s7 start at t = 20. At time t = 40, the short flows s4 to s7 stop.
Fig. 7. These figures show how iXCP adapts to the flow dynamics and the link utilization of all three links during adaptation. The simulation topology is shown in Figure 6. Each flow quickly converges to its max-min fair rate when flows join or leave the network, and each link achieves nearly 100% utilization. (a) Flow rates at the 150Mb/s link; (b) flow rates at the 100Mb/s link; (c) flow rates at the 120Mb/s link; (d) link utilization for all three links shown in Figure 6.
to ramp up a flow's rate from zero to the equilibrium rate. This is because iXCP is fairer than XCP: it requires additional round-trip times to shuffle bandwidth to fully achieve max-min fairness. Due to the lack of space, simulation results for heterogeneous RTTs and web-like traffic are not included here. A complete set of simulation results is presented in [10].
5 Related Work
Low et al. simulated and analyzed XCP's under-utilization problem [7]. This work is inspired by their discovery, and our analysis is built on Low's model. Zhou et al. also foresaw this problem and proposed P-XCP [9]. However, P-XCP does not address the root cause of the under-utilization problem, and only alleviates the problem in some topologies. With P-XCP, a router estimates the number of flows bottlenecked at itself. The router allocates more bandwidth to those bottlenecked flows by scaling the spare bandwidth with a factor N/Nb, in which N is the total number of flows and Nb is the number of flows bottlenecked at the router. This scheme only increases link utilization when a bottleneck link is upstream of an under-utilized link. For instance, it does not increase link utilization in the example shown in Figure 1. Moreover, as P-XCP over-allocates spare bandwidth, it causes rate fluctuations and increases the persistent queue size. Comparisons between P-XCP and iXCP are shown in [10]. To the best of our knowledge, our work is the first to systematically address the under-utilization problem of XCP and to prove that the improved XCP is max-min fair in all types of topologies in steady state. JetMax [13] is a rate-based congestion control algorithm that aims to provide max-min fairness. Similar to our scheme, a JetMax packet header includes a bottleneck identifier field, but its control algorithm is rate-based, completely different from XCP, which is a window-based protocol. Charny et al. [14] also propose a rate-based algorithm to realize max-min flow rate allocation, using a different control algorithm and feedback scheme. However, their approach requires per-flow state at routers; in contrast, both XCP and iXCP are stateless. Other work has focused on the implementation of XCP [15] and improvements of XCP in other areas. Zhang et al. [16] extensively studied the implementation and deployment challenges of XCP. Hsiao et al. [17] extended XCP to support streaming layered video. Kapoor et al. [18] propose to combine XCP with a Performance Enhancement Proxy (PEP) to provide fast satellite access. Yang et al. [19] proposed an improvement to shorten XCP's response time for new flows to acquire their bandwidth. Zhang et al. [20] presented a control-theoretic model that considers capacity estimation errors. XCP-r [21] calculates the congestion window size at the receiver side to cope with ACK losses.
6 Conclusions and Future Work
XCP [5] is a new and promising protocol that outperforms TCP in terms of efficiency, stability, queue size, and convergence speed. However, Low et al. [7] discovered a weakness of XCP: in some multi-bottleneck environments, XCP may utilize as little as 80% of the bottleneck bandwidth. This paper proposes an improved XCP (iXCP) that solves the link under-utilization problem of XCP. We use extensive simulations as well as a theoretical analysis to show that iXCP is able to efficiently utilize bottleneck bandwidth and is max-min fair in steady state. iXCP also preserves other features of XCP, including small queue size, near-zero packet drops, and fast convergence.
The performance of both iXCP and XCP degrades in highly dynamic situations. We have analyzed the cause of this performance degradation in [10], but it is our future work to further investigate the interactions between flow dynamics and iXCP's control algorithm and to propose improvements that make the control algorithm more robust to flow dynamics.
Acknowledgement

The authors would like to thank Yong Xia for useful discussions, Dina Katabi for answering questions, and Xin Liu for help with simulations.
References

1. C. Jin, D. Wei, and S. Low, "Fast TCP: Motivation, Architecture, Algorithms, Performance," in Proc. of INFOCOM, March 2004.
2. S. Floyd, "HighSpeed TCP for Large Congestion Windows," IETF RFC 3649, Dec. 2003.
3. T. Kelly, "Scalable TCP: Improving Performance in HighSpeed Wide Area Networks," in 1st International Workshop on PFLDN, Feb. 2003.
4. L. Xu, K. Harfoush, and I. Rhee, "Binary Increase Congestion Control for Fast Long-Distance Networks," in Proc. of INFOCOM, Mar. 2004.
5. D. Katabi, M. Handley, and C. Rohrs, "Congestion Control for High Bandwidth-Delay Product Networks," in Proc. of SIGCOMM, 2002.
6. D. Clark, K. Sollins, J. Wroclawski, D. Katabi, J. Kulik, X. Yang, R. Braden, T. Faber, A. Falk, V. Pingali, M. Handley, and N. Chiappa, "New Arch: Future Generation Internet Architecture," Tech. Rep., USC ISI, 2003.
7. S. Low, L. Andrew, and B. Wydrowski, "Understanding XCP: Equilibrium and Fairness," in Proc. of INFOCOM, 2005, pp. 1025–1036.
8. D. Bertsekas and R. Gallager, Data Networks, 2nd ed., Prentice-Hall, 1992.
9. K. Zhou, K. Yeung, and V. Li, "P-XCP: A Transport Layer Protocol for Satellite IP Networks," in Proc. of GLOBECOM, 2004, pp. 2707–2711.
10. L. Zan and X. Yang, "Improving XCP to Achieve Max-Min Fair Bandwidth Allocation," Tech. Rep., UCI ICS, 2006, http://www.ics.uci.edu/~lzan/ixcp techreport.pdf.
11. "CAIDA report," http://www.caida.org/analysis/AIX/plen hist/, February 2000.
12. "The Network Simulator: ns-2," http://www.isi.edu/nsnam/ns.
13. Y. Zhang, D. Leonard, and D. Loguinov, "JetMax: Scalable Maxmin Congestion Control for High-Speed Heterogeneous Networks," in Proc. of INFOCOM, 2006.
14. A. Charny, D. Clark, and R. Jain, "Congestion Control With Explicit Rate Indication," in Proc. of ICC, 1995.
15. A. Falk and D. Katabi, "Specification for the Explicit Control Protocol (XCP)," 2005.
16. Y. Zhang and T. Henderson, "An Implementation and Experimental Study of the eXplicit Control Protocol (XCP)," in Proc. of INFOCOM, 2005, pp. 1037–1048.
17. H. Hsiao and J. Hwang, "A Max-Min Fairness Congestion Control for Streaming Layered Video," in Proc. of ICASSP, 2004, pp. V981–V984.
18. A. Kapoor, A. Falk, T. Faber, and Y. Pryadkin, "Achieving Faster Access to Satellite Link Bandwidth," in Proc. of INFOCOM, 2005, pp. 2870–2875.
19. Y. Yang, C. Chan, P. Wang, and Y. Chen, "Performance Enhancement of XCP for High Bandwidth-Delay Product Networks," in Proc. of ICACT, 2005, pp. 456–461.
20. Y. Zhang and M. Ahmed, "A Control Theoretic Analysis of XCP," in Proc. of INFOCOM, 2005, pp. 2831–2835.
21. D.M. Lopez-Pacheco and C. Pham, "Robust Transport Protocol for Dynamic High-Speed Networks: Enhancing the XCP Approach," in Proc. of IEEE MICC-ICON, Nov. 2005.
TCP Libra: Exploring RTT-Fairness for TCP

Gustavo Marfia (1), Claudio Palazzi (1,2), Giovanni Pau (1), Mario Gerla (1), M.Y. Sanadidi (1), and Marco Roccetti (2)

(1) Computer Science Department, University of California Los Angeles, CA 90095
{gmarfia,cpalazzi,gpau,gerla,medy}@cs.ucla.edu
(2) Dipartimento di Scienze dell'Informazione, University of Bologna, Italy 40126
[email protected]
Abstract. The majority of Internet users rely on the Transmission Control Protocol (TCP) to download large multimedia files from remote servers (e.g., P2P file sharing). TCP has been advertised as a fair-share protocol. However, when session round-trip-times (RTTs) radically differ from each other, the share (of the bottleneck link) may be anything but fair. This motivates us to explore a new TCP, TCP Libra, that guarantees fair sharing regardless of RTT. The key element of TCP Libra is a unique window adjustment algorithm that provably leads to RTT-independent throughput while converging to the fair share. We position TCP Libra in a non-linear optimization framework, proving that it provides fairness (in the sense of minimum potential delay fairness) among TCP flows that share the same bottleneck link. Equally important are the friendliness of Libra towards legacy TCP and its throughput efficiency. TCP Libra requires sender-side modifications only and is thus easy to deploy. Via analytic modeling and simulations we show that TCP Libra achieves fairness while maintaining efficiency and friendliness to TCP New Reno. A comparison with other TCP versions that have been reported as RTT-fair in the literature is also carried out.
1 Introduction

TCP was initially designed to provide a connection-oriented reliable service in the ARPANET environment, which later became the Internet. TCP addresses three major issues: reliability, flow control and congestion control [3]. To achieve the third goal,
Footnotes:
* This material is based upon work partially supported by the National Science Foundation under Grant No. 0520332; by the Italian Ministry for Research via the ICTP/E-Grid Initiative and the Interlink Initiative; by the Italian Ministero Affari Esteri under the Initiative Laboratorio Congiunto; and by the UC-Micro Grant MICRO 04-05 private sponsor STMicroelectronics. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or any of the other funding agencies. Reference Author: Giovanni Pau, Computer Science Department, University of California Los Angeles, CA 90095, e-mail: [email protected].
1 With TCP, unless differently specified, we refer to TCP New Reno.
2 Libra is the Latin word for "scale", thus indicating a balance between the flows.
Fig. 1. With TCP New Reno, RTT imbalance typically leads to an uneven share of bandwidth between two competing flows. In this experiment one server is at UCLA (with RTT < 1ms) and the other in Taiwan (with RTT = 200ms). Two clients repeatedly download a 26MB file from both servers; both clients are at UCLA. The clients share a 1Mbps wireless LAN access to the Internet. We show the respective transfer times averaged over more than 100 trials. If the clients download from Taiwan and UCLA separately, they complete the downloads in 540 and 305 sec respectively (with equivalent rates .38Mbps and .68Mbps). When the clients share the access link, the UCLA client completes the download in less than 400 sec (equivalent to a .52Mbps net rate). The client downloading from Taiwan completes the download in almost 900 sec. Thus, the download from Taiwan achieves the solo rate of .38Mbps only AFTER the UCLA client completes. While the two clients share (during the first 400 sec), the UCLA download achieves .52Mbps, while the Taiwan download achieves only a .04Mbps rate! This is a very severe example of the unfairness induced by TCP New Reno.
the sending rate is dynamically adjusted to avoid both service starvation and network overflow. The most widely deployed version, TCP New Reno, implements a congestion control algorithm known as AIMD (Additive Increase, Multiplicative Decrease). The very basic concept can be summarized as follows (for a detailed description, please refer to [2]):
– When a packet loss is detected, the TCP sender decreases its sending window by half.
– When a packet is successfully delivered, it increases its sending window by one.
The data-sending rate of TCP (we use the terms congestion window, sending window, and data-sending rate interchangeably) is determined by the rate of incoming acknowledgments (ACKs). At steady state, it equals the arrival rate of ACKs. This behavior has been referred to as TCP's "self-clocking behavior". This self-regulation procedure, however, may be unfair. Competing TCP senders with different end-to-end propagation delays will receive ACKs at different rates and likewise will increase their sending windows at different rates. This property is known as the RTT-bias (of TCP New Reno). It was analytically derived in [4]. As a rule of thumb, competing TCP throughputs are inversely proportional to their round-trip-times. An example of this phenomenon, from a real experiment, can be appreciated in Fig. 1.
Because of the RTT-bias, competing users with larger RTTs will experience higher file download latency. This problem affects big users (e.g., supercomputer researchers) as well as small users. The RTT-induced throughput imbalance is often compensated for by content providers by deploying a content delivery network with a fine-grain geographic distribution: users can then download content from nearby caches instead of the remote server. TCP congestion control has been extensively studied during the last 30 years, leading to several TCP variants, each providing some added features (along with possible drawbacks). In this context we introduce TCP Libra, a sender-side-only variation of legacy TCP. The goal of TCP Libra is to maintain fairness in the face of uneven RTT values. A similar line of work was followed in [6] and [7], where the authors were inspired by intuition and heuristics. The TCP Libra design was instead inspired by the analysis of a generalized TCP model: by properly optimizing the model parameters we were able to obtain an algorithm that can be rigorously shown to converge to the RTT-fair share. Moreover, TCP Libra is not only RTT-fair; it also preserves link efficiency and scalability to large link capacities while being friendly to legacy TCP. Another important property of TCP Libra is that it is a sender-side-only modification, not reliant on router active queue management, and thus easy to deploy. We have modeled, simulated and implemented TCP Libra (on Linux 2.6.15). Analytical and simulation results are presented in the paper. The remainder of the paper is organized as follows. The congestion control algorithm is introduced in Section 2, along with the key control parameters/function and with the utility model interpretation. Experimental results are reported in Section 3. Conclusions and future work are in Section 4.
2 The TCP Libra Algorithm

TCP Libra behaves exactly like TCP New Reno except for congestion window management. In fact, Libra differs from TCP New Reno in the following details:

– $window_{n+1} \leftarrow window_n + \frac{1}{window_n}\frac{\alpha_n T_n^2}{T_n + T_0}$ in case of a successful transmission.
– $window_{n+1} \leftarrow window_n - \frac{T_1\, window_n}{2(T_n + T_0)}$ in case of 3 DUPACKs (and the threshold is set accordingly).

where $window_n$ is the congestion window at step n, $\alpha_n$ a unique control function described in the next section, $T_0$ and $T_1$ are fixed parameters, and $T_n$ is the RTT at step n. $T_1$ is the parameter that sets the multiplicative decrease term. $T_0$ is the parameter that sets the sensitivity of the protocol to the RTT. We can see that the window increase is driven, for $T_n \ll T_0$ (the typical case), by the α factor and by the square of the round-trip-time. In this case, RTT-fairness is enforced (as we will show later) and the algorithm helps large bandwidth-delay-product flows, by letting their windows grow much faster than in TCP New Reno. If instead $T_n \gg T_0$ (a rather rare event), the window increase is driven by the α factor and the round-trip-time. RTT-fairness is not preserved in this case, but it is weighted as the inverse square root of the round-trip-time. This last property of TCP Libra ensures that flows with pathological problems on their paths get a lower sending rate.
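A minimal sketch of these two update rules, assuming the fixed parameters T0 and T1 from Section 2.1; this is our transcription of the formulas above, not the authors' Linux implementation.

```python
# Sketch of TCP Libra's window updates (T0, T1 in seconds as in the paper).
T0, T1 = 1.0, 1.0

def on_ack(window, alpha_n, rtt):
    # window_{n+1} = window_n + (1/window_n) * alpha_n * rtt^2 / (rtt + T0)
    return window + (alpha_n * rtt ** 2) / (window * (rtt + T0))

def on_three_dupacks(window, rtt):
    # window_{n+1} = window_n - T1 * window_n / (2 * (rtt + T0))
    return window - (T1 * window) / (2 * (rtt + T0))
```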
2.1 The α Control Function

In the previous section we introduced the new α control function. The design of α was accomplished with two main objectives:
1. Increase convergence speed and achieve scalability for the algorithm.
2. Keep the algorithm behavior stable (here we mean stability in terms of local asymptotic stability).

For the above reasons, the α factor is expressed as the product of two components, namely α = S ∗ P, where
1. S = Scalability factor.
2. P = Penalty or Damping factor.

We achieve scalability by adjusting the scalability factor to the capacity of the narrow link. To compute the latter, we use packet pair techniques that run embedded into the algorithm. We will further expand this point in the following subsection. In particular, we set:

$S = k_1 C_r$    (1)

where $k_1$ is a constant and $C_r$, expressed in Mbps, is the capacity of the narrow link seen by the r-th source. The penalty factor P has been designed to adapt the source sending rate increase to the network congestion. As usual, congestion is measured by connection backlog time (i.e., the difference between the RTT and the minimum RTT). One may use different expressions of penalty functions here. A possible option for P is

$P = e^{-k_2 \frac{T_r(t) - T_r^{min}}{T_r^{max} - T_r^{min}}}$

where $T_r(t)$ represents the instantaneous round-trip-time, $T_r^{max}$ the maximum round-trip-time and $T_r^{min}$ the minimum round-trip-time experienced by the r-th connection. This function tends to penalize the growth of the congestion window when the current round-trip-time approaches the historic maximum round-trip-time experienced during the connection lifetime. In a later section we will show the simulation results for TCP Libra. The simulations have been obtained by substituting $T_0 = 1$, $T_1 = 1$, $k_1 = 2$ and $k_2 = 2$. While a higher value of $k_2$ in the penalty function would have improved the link utilization by keeping the window at its maximum for a longer time, we have noticed that a higher $k_2$ generates an excessively timid behavior of TCP Libra toward TCP New Reno. We adjusted this value as a trade-off between utilization and friendliness. The parameter $k_1$ is adjusted according to $k_2$. $T_0$ is set to 1 (i.e., 1 second), since in the great majority of cases this will result in $T_r \ll T_0$ [10], which gives a diminished sensitivity to $T_r$ without excessively penalizing the algorithm's step size. $T_1$ is set to 1, which means that the window decrease will mainly be driven by legacy TCP's decrease rate. Before moving to comparisons and results, we will discuss in the remainder of this section two other important TCP Libra design components, namely, capacity estimation and the utility function that the Libra algorithm attempts to maximize.
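Putting the two factors together, a sketch of the α computation under the quoted parameter choices (k1 = k2 = 2) might look as follows; the variable names are ours, and the capacity argument would come from the CapProbe estimate described next.

```python
import math

K1, K2 = 2.0, 2.0  # constants chosen in the paper's simulations

def alpha(capacity_mbps, rtt, rtt_min, rtt_max):
    S = K1 * capacity_mbps                        # scalability factor, Eq. (1)
    backlog = (rtt - rtt_min) / max(rtt_max - rtt_min, 1e-9)
    P = math.exp(-K2 * backlog)                   # penalty (damping) factor
    return S * P                                  # alpha = S * P
```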
2.2 Capacity Estimation

TCP Libra relies on CapProbe [11] for capacity estimation. CapProbe is an accurate and fast-converging estimation algorithm that may be implemented passively by any TCP scheme. In brief, CapProbe relies on packet pairs and on the concept of the minimum delay sum of a packet pair. When a packet pair is sent, if either of the two packets experiences queuing, the sum of RTTs will increase. By monitoring packet pairs and selecting the pair that has the minimum delay sum it is possible, with high probability, to recognize the pair that did not experience any queuing (please refer to [12] for more information regarding this algorithm and for useful ideas on how to incorporate it into a TCP scheme). By isolating one pair that did not experience any queuing, and from the knowledge of the interval between the two ACKs for that packet pair, it is possible to compute the capacity of the narrow link of the path. The capacity information is used in eq. (1).

2.3 TCP Libra Fairness: A Utility Function Interpretation

We begin by deriving the TCP Libra fluid flow equation for any one of the several TCP flows sharing the same bottleneck. In the following, $w(t)$, $x(t)$, $T(t)$ and $\lambda(t)$ represent, respectively, the instant window size, instant rate, round-trip-time and probability of packet drop. Libra increments the window by $\frac{1}{w(t)}\frac{\alpha(t)T(t)^2}{T(t)+T_0}$ per each successful ACK, hence the window increase per unit time is $\frac{x(t)}{w(t)}\frac{\alpha(t)T(t)^2}{T(t)+T_0}(1-\lambda(t))$. When a packet is dropped, causing 3 DUPACKs, the window decreases by $w(t) - \frac{T_1 w(t)}{2(T(t)+T_0)}$. The rate of this event is $x(t)\lambda(t)$, so the window decrease per unit time is $x(t)\lambda(t)\left(w(t) - \frac{T_1 w(t)}{2(T(t)+T_0)}\right)$. We now have all the ingredients to write the fluid model of TCP Libra:

$\dot{w}(t) = \frac{x(t)}{w(t)}\frac{\alpha(t)T(t)^2}{T(t)+T_0}(1-\lambda(t)) - x(t)\lambda(t)\left(w(t) - \frac{T_1 w(t)}{2(T(t)+T_0)}\right)$    (2)
By setting $T(t) = \tilde{T}$, $w(t) = x(t)\tilde{T}$ and $\alpha(t) = \tilde{\alpha}$, we have:

$\dot{x}(t) = \frac{\tilde{\alpha}}{\tilde{T} + T_0}(1 - \lambda(t)) - x(t)\lambda(t)\left(x(t) - \frac{T_1 x(t)}{2(\tilde{T} + T_0)}\right)$    (3)

which (after setting $T_0 = T_1 = 1$ and assuming $T_0 \gg \tilde{T}$) may be rewritten as:

$\dot{x}(t) = \frac{x^2(t)/2 + \tilde{\alpha}}{\tilde{T} + 1}\left(\frac{1}{\frac{x^2(t)}{2\tilde{\alpha}} + 1} - \lambda(t)\right)$    (4)

Eq. (4) is expressed in the form of a gradient algorithm (please refer to [1], [9] for a rigorous treatment of this topic), where $\frac{1}{x^2(t)/(2\tilde{\alpha}) + 1}$ represents the marginal utility function (i.e., the first derivative of the utility function) the algorithm attempts to optimize, $\lambda(t)$ the aggregate price, and $\frac{x^2(t)/2 + \tilde{\alpha}}{\tilde{T} + 1}$ a non-negative, non-decreasing function that acts as a gradient step amplifier. By equating marginal utility and price, we nullify the gradient and find the equilibrium solution for rate $x(t)$. The equilibrium point is unique
because the derivative of the objective function (= marginal utility − price, see eq. (4)) decreases with $x$, and thus the objective function is concave:

$\tilde{x} = \sqrt{2\tilde{\alpha}\,\frac{1 - \tilde{\lambda}}{\tilde{\lambda}}}$    (5)

It will be noticed that the equilibrium TCP Libra throughput $\tilde{x}$ does not depend on the RTT under the assumption that $T_0 \gg \tilde{T}$. Thus, our design of the window increase and decrease algorithm, coupled with the careful setting of parameter $T_0$, will guarantee fairness among all flows sharing the same bottleneck. Indeed, this fairness property is not a coincidence: we reverse-engineered TCP Libra to make it happen!
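As a quick numerical illustration of Eq. (5) (our example, using the formula as reconstructed here), the equilibrium rate is a function of α̃ and the loss probability only, so two flows with very different RTTs obtain the same equilibrium rate:

```python
import math

def equilibrium_rate(alpha_eq, loss_prob):
    # Eq. (5): x = sqrt(2 * alpha * (1 - lambda) / lambda); note there is
    # no RTT term, which is the RTT-fairness property of TCP Libra.
    return math.sqrt(2 * alpha_eq * (1 - loss_prob) / loss_prob)

# Same alpha and loss rate => same rate, regardless of each flow's RTT.
print(equilibrium_rate(10.0, 0.01))  # ~44.5 (units follow alpha's units)
```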
3 Performance Evaluation

We evaluate TCP Libra using the NS-2 [13] simulation platform. We compare it to TCP Sack and to other TCP versions that claim RTT-fairness, namely BIC [8] and Fast TCP [5]. For the latter schemes, we have selected the parameters made available online in simulation scripts by the authors themselves. Each experiment was run for 1000 seconds in order to reach steady state. Two different topologies are used, each featuring different scenarios. The bottleneck link can take different values. The bottleneck buffer can take two different values: (a) the number of packets that fill the bottleneck link, or (b) the number of packets that fill the longest path. Routers along the path were configured to implement either drop tail or an adaptive RED queuing policy. The advertised window for each connection was set larger than the corresponding pipe size so that occasional packets may be dropped, even when that connection is the only active connection. Finally, the packet size was set equal to 1000 bytes. Space limitations allow us to present only a subset of the obtained simulation results. We report only 100Mbps bottleneck results, since other values did not show significant differences. The Fast TCP parameters alpha and beta were set to 100, as found in the literature. BIC TCP parameters were set as recommended in [8].

3.1 Parking Lot Topology

Fig. 2 shows the so-called parking lot topology with 8 end-to-end flows. Flows 1 and 2 have 180ms of minimum RTT and traverse 9 links; flows 3 and 4 have 90ms of minimum RTT and also traverse 9 links. The remaining flows, 5 through 8, are short flows: they utilize 3-link paths with 30ms minimum RTT. With these values, the bottleneck buffer choices are 375 pkts (bottleneck link pipe) and 2,250 pkts (longest path pipe). To overcome phase effects, flows were started at random times within the first 5 seconds of simulation. In Fig. 3, for each TCP variant we report the Jain's index values for flows 1-4. TCP Libra provides good fairness with both buffer sizes. The penalty factor in Libra adapts the window increase slope to the relative backlog time, thus reducing sensitivity to buffer size.
Fig. 2. Parking Lot Topology. Flows 1 and 2 (RTT 180ms) and flows 3 and 4 (RTT 90ms) traverse the chain of 100Mbps bottlenecks; flows 5, 6, 7, and 8 (RTT 30ms) are short flows.
Fig. 3. Jain's index values achieved during the same simulations by flows 1-4, for each different protocol. The buffer is set equal to the bottleneck link pipe size and the longest path pipe size, respectively.
Fig. 4. Efficiency in the Parking Lot Topology.
Fig. 5. Jain's index computed over all the 8 connections. TCP Sack, which had an acceptable Jain's index for connections 1-4 and an optimal one for connections 5-8, obtains a very poor value when considering all the 8 connections together.
Fig. 6. Friendliness: TCP Sack was utilized on flows 2 (180ms of RTT), 4 (90ms of RTT), 6 and 8 (30ms of RTT each), while the evaluated alternative protocol was utilized on flows 1 (180ms of RTT), 3 (90ms of RTT), 5 and 7 (30ms of RTT each) (Parking Lot Topology).
Throughput efficiency is considered in Fig. 4. TCP Libra's utilization is slightly lower than that of TCP Sack. As expected, utilization for all protocols increases with buffer size. The particular behavior of the short-RTT connections 5-8 and the longer-RTT connections 1-4 is revealed by Fig. 5, where a strong fluctuation of the fairness index is noted. Fast TCP performance visibly changes with buffer size. In the small buffer case, Fast TCP's alpha parameter is not optimally tuned, and hence a Fast TCP flow tries to keep too many packets in the bottleneck buffer (note: the alpha parameter in Fast TCP controls the number of packets a flow maintains in the bottleneck link).

3.2 Dumbbell Topology

The coexistence of legacy TCP (i.e., TCP Sack) and the new protocols is evaluated in Fig. 6. The bar chart in Fig. 6 presents the relative TCP Sack throughputs when TCP
Sack is competing with itself first, and then with each of the new protocols. We measure the TCP Sack goodput achieved in each of the RTT flow classes (long to short) as well as the aggregate goodput over all of its connections. More precisely, TCP Sack was used for flows 2 (180ms of RTT), 4 (90ms of RTT), 6 and 8 (30ms of RTT each), while the new protocol was used for flows 1 (180ms of RTT), 3 (90ms of RTT), 5 and 7 (30ms of RTT each). Fast TCP's unfriendliness towards TCP Sack is again due to an incorrect value of the alpha parameter. When coexisting with BIC TCP, TCP Sack achieves a slight increase in its achieved goodput. TCP Libra shows a friendly, balanced behavior toward TCP Sack: the aggregate throughput of the coexisting TCP Sack flows diminishes only by 11%. There is a desirable, even if limited, redistribution of the TCP Sack goodput from the 30ms RTT flows to the 90ms and 180ms RTT ones. In this respect, TCP Libra seems to help the coexisting TCP Sack flows by (slightly) improving their fairness degree. The dumbbell topology is the classic topology used for TCP fairness evaluation. We adapted the simulation scripts utilized by [8]. Each link has a different RTT. The one-way propagation delay on the links is 21ms for the short connections (from S2 to R2 and between Bi and Ci) and 119ms for the longest one (from S1 to R1). Background traffic flows from Bi to Ci and from Cj to Bj. It is composed of 4 forward regular long-lived TCP Sack flows, 4 backward regular long-lived TCP Sack flows, 25 small TCP flows with the advertised window limited to 64 segments, and an amount of web traffic in both directions able to occupy from 20% to 50% of the available bottleneck link capacity when no other flows are present. We here only examine the case in which the bottleneck buffer is small (i.e., equal to the bottleneck link pipe size), the most demanding case for all protocols. Two different queuing policies are tested: drop tail and RED (Random Early Detection). The results are reported in Fig. 8. RED can considerably improve fairness, as expected, since RED was designed to prevent capture by aggressive flows.
Fig. 7. Dumbbell Topology.
Fig. 8. Jain's index values achieved by the evaluated protocol while competing with a large amount of concurrent traffic. The concurrent connections were both short- and long-lived TCP sessions in both directions.
4 Conclusions and Future Work

The very interesting properties exhibited by TCP Libra strongly motivate us to continue refining the scheme using the feedback from simulation and early experiments. In particular, we will study TCP Libra performance in a larger number of heterogeneous network scenarios. Following the success of the early lab tests, and eager to explore the practical impact on real Internet applications, we are now deploying TCP Libra in controlled Internet testbeds such as PlanetLab. At the same time, we are preparing to run Internet experiments in increasingly complex and demanding scenarios with several collaborators at other institutions around the world.
References

1. R. Srikant, The Mathematics of Internet Congestion Control. Birkhauser, 2004.
2. J. Postel, "RFC 793: Transmission Control Protocol," Internet Engineering Task Force (IETF), Tech. Rep., 1981.
3. V. Jacobson, "Congestion Avoidance and Control," in Proceedings of ACM SIGCOMM '88, ACM Press, 1988.
4. J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP Throughput: A Simple Model and its Empirical Validation," in Proceedings of ACM SIGCOMM '98, 1998.
5. C. Jin, D. Wei, and S. Low, "Fast TCP: Motivation, Architecture, Algorithms, Performance," in Proceedings of IEEE INFOCOM, 2004.
6. S. Floyd and V. Jacobson, "Traffic Phase Effects in Packet-Switched Gateways," ACM SIGCOMM Computer Communication Review, 1991.
7. T. R. Henderson and R. H. Katz, "Transport Protocols for Internet-Compatible Satellite Networks," IEEE Journal on Selected Areas in Communications, 1999.
8. L. Xu, K. Harfoush, and I. Rhee, "BIC TCP," in Proceedings of IEEE INFOCOM, 2004.
9. D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
10. J. Aikat, J. Kaur, F. D. Smith, and K. Jeffay, "Variability in TCP Round-Trip Times," in IMC '03: Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, New York, NY, USA: ACM Press, 2003.
11. R. Kapoor, L.-J. Chen, L. Lao, M. Gerla, and M. Y. Sanadidi, "CapProbe: A Simple and Accurate Capacity Estimation Technique," in SIGCOMM '04, New York, NY, USA: ACM Press, 2004.
12. C. Marcondes, A. Persson, L. Chen, M. Y. Sanadidi, and M. Gerla, "TCP Probe: A TCP with Built-in Path Capacity Estimation," in the 8th IEEE Global Internet Symposium (in conjunction with IEEE Infocom '05), Miami, USA, 2005.
13. The VINT Project, The Network Simulator - ns-2.
Interactions of Intelligent Route Control with TCP Congestion Control

Ruomei Gao (1), Dana Blair (2), Constantine Dovrolis (1), Monique Morrow (2), and Ellen Zegura (1)

(1) College of Computing, Georgia Institute of Technology
(2) Cisco Systems Inc.
Abstract. Intelligent Route Control (IRC) technologies allow multihomed networks to dynamically select egress links based on performance measurements. TCP congestion control, on the other hand, dynamically adjusts the send-window of a connection based on the current path's available bandwidth. Little is known about the complex interactions between IRC and TCP congestion control. In this paper, we consider a simple dual-feedback model in which both controllers react to packet losses, either by switching to a better path (IRC) or by reducing the offered load (TCP congestion control). We first explain that the IRC-TCP interactions can be synergistic as long as IRC operates on larger timescales than TCP ("separation of timescales"). We then examine the impact of sudden RTT changes on TCP, the behavior of congestion control upon path changes, the effect of IRC measurement delays, and the conditions under which IRC is beneficial under two path impairment models: short-term outages and random packet losses.
1 Introduction
We consider a model in which the egress TCP traffic of a stub network S is controlled by an IRC-capable multihomed edge router. In particular, we focus on the aggregate TCP traffic T(S, D) from S towards a destination network D. The traffic T(S, D) is subject to two closed-loop controllers: TCP congestion control at the flow level and IRC path control for the entire aggregate (see Figure 1). TCP interprets packet losses as indications of congestion and adjusts the congestion window of each flow based on the well-known TCP congestion control algorithms. TCP uses ACKs and the Retransmission TimeOut (RTO) to close the feedback loop, and so the TCP reaction timescale, i.e., the amount of time it takes to detect a packet loss and decrease the congestion window, is roughly on the order of one Round-Trip Time (RTT). IRC, on the other hand, interprets packet losses as an indication of path impairment (not necessarily congestion). It controls the egress link, and thus the forwarding path, of the TCP flows from S to D. IRC can switch to another path at the end of each routing period (of duration Tr), after estimating the performance of all egress paths through active or passive monitoring [4]. A typical value for Tr in current IRC products is a few tens of seconds [3].
Fig. 1. The dual-feedback model
The control action of both TCP and IRC can affect the packet loss rate that T(S, D) experiences. For instance, if T(S, D) saturates the current path, causing some packet drops, then TCP reduces the windows of the affected connections, decreasing the offered load. Or, if the current path is subject to short-term outages, then IRC can switch T(S, D) to another path that does not suffer from such impairments. Note that the packet loss rate that T(S, D) experiences is not determined only by the two controllers, but also by external effects (e.g., congestion due to other traffic in the current path, faulty equipment, routing instabilities). In this paper, we first discuss the "separation of timescales" principle, which states that the two controllers, TCP and IRC, should operate with significantly different reaction timescales so that they do not compete with each other. We then focus on the impact of sudden RTT changes, as a result of path switching, on TCP's throughput. An RTT decrease can cause packet reordering and throughput reduction due to Fast-Retransmit. A large RTT increase, on the other hand, can also cause throughput loss due to the expiration of the Retransmission TimeOut. We also examine the impact of sudden available bandwidth changes, as a result of path switching, on TCP's throughput. The important point here is that TCP may not be able to quickly capture the additional throughput that is available in the path that IRC has switched to. This is something that the IRC path selection process needs to take into account to avoid unnecessary switching decisions. We next study the performance of IRC when it relies on a realistic active measurement process, under two path impairment models: short-term outages and random losses. We conclude that an IRC system that always switches to a path with a lower loss rate may fail to provide the maximum possible TCP throughput to its users. This indicates that IRC systems can benefit from using predicted throughput as the main path switching criterion. Last, we examine whether IRC is beneficial overall, depending on the underlying path impairment. Our analysis shows that IRC is synergistic to TCP for outages that last more than the measurement timescale and when the loss rate is significant. The structure of the paper is as follows. We first discuss the "separation of timescales" issue (§2). Then, we investigate the impact of sudden RTT changes on TCP throughput (§3). Next (§4), we examine how TCP congestion control reacts to path changes, depending on the available bandwidth difference between
the two paths. The impact of measurement latency on the IRC path selection process is the focus of §5. Finally, in §6, we put everything together and examine the conditions under which IRC is beneficial for TCP traffic.
2 Separation of Timescales
In the rest of this paper, we consider a simple instance of the dual-feedback model in which there are only two paths from S to D. One path is referred to as "primary", while the other is referred to as "backup" and is only used as long as IRC detects a path impairment at the primary. Further, we model each of the primary and backup paths as a bottleneck link of capacity Cp and Cb, respectively. By definition, Cp ≥ Cb. The RTTs of the two paths are RTTp and RTTb, respectively. The following ns-2 simulations use TCP NewReno with Selective and Delayed ACKs. In the dual-feedback model described earlier, the role of IRC can be either synergistic or antagonistic to TCP. Synergistic interaction means that the throughput of T(S, D) with IRC is larger than the throughput without IRC; otherwise, the interactions are antagonistic. Antagonistic interactions can happen when IRC reacts to packet losses that TCP is designed to handle. For example, IRC can be antagonistic to TCP when it reacts to packet losses that are caused when T(S, D) saturates its current path, and the new path chosen by IRC provides lower throughput than the original path. In this case, IRC should stay the course and let TCP react to these packet losses. On the other hand, synergistic interaction takes place when IRC reacts to random packet losses, congestion induced by traffic other than T(S, D), path outages, etc. Congestion control on the flows of T(S, D) would not be able to avoid such externally-imposed losses. To achieve synergistic interaction, it is important to follow the separation of timescales principle. This means that the two controllers should operate with significantly different reaction timescales so that they do not compete with each other. Given that TCP's reaction timescale can be anywhere from a few milliseconds up to almost a second (the RTT range for most TCP connections in the Internet), IRC's routing period Tr should be larger than that. On the other hand, it is desirable to keep Tr as low as possible so that IRC can provide fast recovery from short-term impairments [4]. Given this trade-off, we propose that Tr be more than one second and less than 3-5 seconds. With such a separation of timescales, TCP gets the opportunity to react to sporadic packet losses first. If TCP does not manage to eliminate the losses during a measurement period of Tm (< Tr) seconds, which takes place at the end of the routing period, IRC gets activated and switches T(S, D) to another path. IRC can be synergistic to TCP as long as the new path provides higher TCP throughput than the current (lossy) path. In Figure 2, we illustrate what can happen when Tr is either too short or too long. In Figure 2(a), we compare the aggregate throughput of four TCP flows in T(S, D) when Tr is set to 1 second (the recommended value) with Tr=200 msec. The latter is close to TCP's reaction time in this path, given that the RTT is
Fig. 2. The effect of the IRC routing period Tr. (a) Unnecessary path switching when Tr is too short (Tr=200ms); (b) slow IRC reaction when Tr is too long (Tr=10sec). Both panels show the throughput (Mbps) on the primary and backup paths over time.
about 60 msec. In both cases we set the measurement period to Tm=200msec. At t=10sec, we introduce a congestion event at the primary path caused by CBR cross traffic. Without IRC, or with Tr=1sec, the cross traffic causes some congestive losses at the primary path, TCP decreases its offered load, but in this case the throughput of T(S, D) in the primary path is still higher than in the backup path. When Tr is set to 200msec, IRC is quick to detect packet losses and it switches the traffic to the backup path before TCP can adjust its offered load. Note that TCP's throughput is penalized by this unnecessary path switching. On the other hand, a very long routing period is not desirable either. As shown in Figure 2(b), when the primary path suffers from random packet drops with a 10% loss rate, IRC should switch the traffic as soon as possible because TCP cannot avoid this path impairment by decreasing its offered load. For Tr=10sec, it takes much longer to detect the impairment and react to it, compared to Tr=1sec, causing a significant throughput penalty. Following the previous guidelines, in the rest of the paper we set the routing period to Tr=1sec. The IRC measurement period Tm covers the last 400msec of the routing period.
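The following pseudocode sketches the timescale separation just described (our illustration, with hypothetical names): IRC measures only during the last Tm seconds of each Tr-second routing period, leaving sub-second loss recovery to TCP.

```python
TR, TM = 1.0, 0.4  # routing and measurement periods, in seconds

def routing_period(paths, current, measure):
    # measure(path, duration) -> estimated performance of that path,
    # collected during the last TM seconds of the TR-second period.
    scores = {p: measure(p, TM) for p in paths}
    best = max(scores, key=scores.get)
    # Switch only when the alternative actually looks better, so losses
    # that TCP can resolve by itself do not trigger a path change.
    return best if scores[best] > scores[current] else current
```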
3 TCP and RTT Changes
In this section, we focus on the impact of sudden RTT changes, due to IRC path switching, on TCP performance. Intuitively, an abrupt RTT decrease can cause packet reordering and the activation of Fast-Retransmit, while a significant RTT increase can cause the expiration of the Retransmission TimeOut (RTO). In this and the next section, we consider an ideal IRC system that can detect impairments at the primary path without latency or error. The effect of IRC measurement delays and errors is studied in §5. Suppose, without loss of generality, that the RTT of the primary path is lower, i.e., RTTp < RTTb. Note that the reverse path, from D to S, is the same in both paths (IRC cannot control the incoming traffic), and so the RTT difference is
equal to the difference of the One-Way Delays (OWD) in the forward path from S to D. Let OWDp and OWDb be the OWDs in the primary and backup paths, respectively. In the following simulations, we limit the advertised window of the TCP flows that form T (S, D) so that the traffic aggregate cannot saturate either path. Consequently, any reduction in the TCP throughput is due to RTT changes rather than congestion control.
Fig. 3. Switching to a lower RTT path can cause Fast-Retransmit. The plot shows the ratio of affected experiments (%) against No (packets).
Switching to a lower RTT path: First, consider the case where IRC switches the traffic from the backup to the primary path, i.e., all TCP flows in T(S, D) experience a sudden RTT decrease. The first few packets sent through the primary path will arrive at the destination Out-Of-Order (OOO), because they reach the receiver before the last few packets sent through the backup path. For each OOO packet, the TCP receiver sends a Duplicate-ACK (DUPACK) to the sender. Typically, three consecutive DUPACKs trigger a Fast-Retransmit, and the congestion window is reduced by 50%. This reduction can cause a reduction in the TCP throughput as a result of the IRC path change. The number of OOO packets (No) depends on the OWD difference between the two paths. Specifically, a first-order estimate of the number of OOO packets in a TCP connection after the path change is No = K(OWDb − OWDp), where K is the throughput (in packets per second, pps) of the connection just before the path change. We expect that Fast-Retransmit will take place when No ≥ 3. Since K varies significantly, however, Fast-Retransmit may occur for even lower OWD differences. So, to avoid throughput loss due to Fast-Retransmit, IRC can attempt to avoid switching to a path of lower RTT when the OWD difference is more than approximately 3/K seconds. For example, if the target per-connection throughput is 1Mbps and the packet size is 1500 bytes, then K ≈ 83 pps, and Fast-Retransmit will probably happen if the OWD difference is more than about 35msec. To simulate such path changes, we fixed RTTp to 40msec and varied RTTb from 40msec to 200msec. For each value of RTTb, we conducted 1000 independent simulations of path switching from the backup to the primary path, where T(S, D) consists of four TCP connections. In each simulation, we examined whether the path switching was followed by a throughput decrease due to
Fast-Retransmit. The results are shown in Figure 3. The y-axis is the percentage of simulations in which we observed a throughput reduction, and the x-axis is an estimate of No. Note that Fast-Retransmit starts even before we reach the threshold of three packets. The threshold No = 3 is a reasonable rule of thumb, however, as it results in a throughput decrease in more than 30% of the simulations.
Switching to a higher RTT path: We next consider the case where IRC switches the traffic from the primary to the backup path, i.e., all TCP flows in T(S, D) experience a sudden RTT increase. Such events can cause the expiration of the RTO timer even though there was no packet loss. The RTO expiration is followed by a reduction of the congestion window to one segment and by Slow-Start. The RTO timer is determined by a running average of the observed RTTs, also taking into account the variability of the measured RTTs (see [9] for details). The RTO timer has a fixed OS-specific lower bound, which is often set to 200msec or 1sec [8]. When IRC switches to the backup path, the RTO of the ongoing connections is based either on the fixed minimum-RTO (say 200 msec) or on the RTO of the primary path. If RTTb is larger than the minimum of these two values, then the path switching event will be followed by the expiration of the RTO timer for all ongoing TCP flows. This event can cause major throughput loss, and IRC systems should avoid it if possible. A practical guideline is to avoid switching to a path in which the RTT is larger by 200 msec or more. We have conducted similar simulations as in the previous paragraphs, and the results (not presented here) confirm this analysis.
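The two guidelines from this section can be condensed into a simple check (our sketch, not part of any IRC product; the 3-packet and 200 msec thresholds are the rules of thumb derived above, and rate_pps is the flow's rate K in packets per second):

```python
MIN_RTO = 0.200  # seconds; a common OS lower bound for the RTO timer

def switch_looks_safe(rtt_old, rtt_new, rate_pps):
    if rtt_new < rtt_old:
        # Lower-RTT path: expected out-of-order packets No = K * OWD diff
        # (the OWD and RTT differences are equal here, since the reverse
        # path is shared); around No >= 3, Fast-Retransmit becomes likely.
        return rate_pps * (rtt_old - rtt_new) < 3
    # Higher-RTT path: an RTT increase of ~200 msec or more risks an RTO
    # expiration for all ongoing flows.
    return (rtt_new - rtt_old) < MIN_RTO
```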
4 IRC and TCP Congestion Control
TCP congestion control is designed to adjust the send-window based on the available bandwidth (avail-bw) in the connection's path. However, the adjustment is gradual; hence a TCP connection will not be able to just "jump" to the appropriate sending rate after IRC has switched to a path with lower or higher avail-bw. In this section, we focus on the behavior of congestion control upon IRC path changes. We consider two cases, depending on whether IRC switches to a path with higher avail-bw (backup to primary transition) or to a path with lower avail-bw (primary to backup transition).
Switching to a higher avail-bw path: When a TCP flow moves from the backup to the primary path, its throughput will change from Rb = Wb/RTTb to Rp = Wb/RTTp, where Wb is the send-window before the path change. Given that the primary path provides higher avail-bw than the backup path, the connection can now increase its window. This window increase process will typically be linear, assuming the connection was in congestion avoidance before the path change. We show such an event in the simulation of Figure 4(a). The top part of the plot shows the aggregate window of four TCP flows, while the bottom part shows their aggregate throughput. The shaded part of the graph covers the time period when IRC uses the backup path, while the rest covers the primary path period.
[Figure 4 consists of three panels: (a) backup to primary, (b) primary to backup (no congestion), and (c) primary to backup (with congestion). Each panel plots the aggregate congestion window (Aggr-wnd, in packets) and the aggregate throughput (Thruput, in Mbps) of the flows against time (sec).]
Fig. 4. Congestion window and throughput variations upon path changes
Note the rather slow increase of the aggregate window and throughput after the path change. The rate of this increase depends on the throughput difference Cp − Rp, the number of ongoing flows, and their RTT. With N connections in congestion avoidance, and with a packet size L, T(S, D) will utilize the additional capacity in RTTp (Cp − Rp)/(LN) seconds, if the window of each connection increases by one packet per RTT (i.e., ignoring the effect of delayed ACKs). Switching to lower avail-bw path: Suppose that IRC has detected an impairment at the primary path and it switches T(S, D) to the backup path. We need to consider two cases, depending on whether T(S, D) will cause congestion in the backup path or not. In the first case, T(S, D) does not cause congestion at the backup path. This will happen if the offered load in T(S, D) is lower than the avail-bw in the backup path, or the buffer space at the bottleneck link of the backup path is sufficiently large to hold the excess packets. This scenario is shown in Figure 4(b). Here, we have four connections that are limited by the advertised window when they use the primary path (Rp = 8 Mbps), and that saturate the backup path (Cb = 5 Mbps) without causing congestion. After the path change, the throughput of T(S, D) drops immediately to Cb without causing packet drops, because the bottleneck link has sufficiently large buffers. TCP congestion control is not activated after such path changes, simply because there is no congestion when the traffic is switched to the lower avail-bw path. In the second case, T(S, D) does cause congestion at the backup path. In that case, one or more TCP connections experience packet loss after the path switching event, and there is a time period in which the aggregate throughput is less than Cb. Figure 4(c) shows an example. Note that the transition to the backup path is followed by a reduction in the aggregate congestion window and in the throughput at the backup path. Again, the duration of this transient effect depends on the number of ongoing flows, their RTT, and Cb. The key point of this section is that IRC path changes can move a TCP aggregate to a path with different (higher or lower) avail-bw. It should be expected that the switched traffic will need some time to adapt to this avail-bw. Consequently, IRC will need to wait for the completion of such transient throughput variations before considering switching to another path.
5
IRC Measurement Latency
In the previous two sections, we considered an ideal-IRC model that can instantaneously detect the start and end of an impairment period in the primary path. In practice, measurements take some time and they are error-prone because they rely on sampling and inference. In this section, we compare the ideal-IRC model with a practical-IRC model that relies on ping-like active probing to detect packet losses in the primary and backup paths. Specifically, our practical-IRC model has a routing period Tr of 1 second, a measurement period Tm of 400 msec (covering the last 40% of the routing period), and it generates one probing packet (64 bytes) every 10 msec to detect packet losses in each path. We examine two types of impairment in the primary path: outages and random packet losses. In both types, the primary path alternates between "good" periods of average duration Tg and "bad" periods of average duration Tb. Both durations are exponentially distributed. During an outage, all packets sent to the primary path are lost. With random losses, packets are dropped with probability p. In the following simulations, Tg = 100 sec and Tb = 50 sec, unless noted otherwise. In the simulations presented in this section, the traffic in T(S, D) consists of non-persistent TCP flows. Specifically, 100 users generate transfers with Pareto-distributed sizes (shape parameter = 1.8), and so the aggregate traffic is Long-Range Dependent. After each transfer, a user stays idle for an exponentially distributed "thinking time"; the user then returns to the network with a new transfer. The average throughput of the traffic aggregate in the primary path during the good periods is Rpg ≈ 7 Mbps, while the average throughput in the backup path is Rb ≈ 5 Mbps. Outage impairments: Figure 5(a) compares ideal-IRC with practical-IRC in the case of outages for various values of the average outage duration Tb. As expected, ideal-IRC does better than practical-IRC for all values of Tb, but the absolute throughput difference is not significant. To better understand the difference between the two models, we need to consider the measurement latency that is introduced by practical-IRC to detect an impairment. Specifically, Tmb is the latency to detect the impairment of the primary path and switch to the backup path, while Tmg is the latency to detect the restoration of the primary path and switch back to that path. Rpg is the average throughput of T(S, D) on the primary path during "good" periods, Rpb is the average throughput on the primary path during "bad" periods (which is zero in the case of outages), and Rb is the average throughput in the backup path. Thus, the average throughput of T(S, D) with ideal-IRC is

RI = (Rb Tb + Rpg Tg) / (Tb + Tg)    (1)
and the average throughput with practical-IRC is

RP = [Rb (Tb − Tmb + Tmg) + Rpg (Tg − Tmg) + Rpb Tmb] / (Tb + Tg)    (2)
[Figure 5 consists of three panels, all with throughput (Mbps) on the y-axis: (a) outage impairment, comparing practical-IRC and ideal-IRC against the average outage duration Tb (sec); (b) random loss impairment, comparing the two models against the packet loss probability p (0-10%); and (c) Rpb and Rb plotted against the packet loss probability p (0-16%).]
Fig. 5. Comparison of ideal-IRC with practical-IRC
Ideal-IRC performs better than practical-IRC, i.e., RI > RP, if and only if

(Rpb − Rb) / (Rpg − Rb) < Tmg / Tmb    (3)
Given that Rpg − Rb and Tmg/Tmb are positive, ideal-IRC is better if Rpb < Rb, independent of the measurement latencies Tmb and Tmg. In other words, the ideal-IRC model is better if the backup path gives higher throughput than the primary path during bad periods. For outages, since Rpb is zero, condition (3) is always true. Random loss impairments: Figure 5(b) compares ideal-IRC with practical-IRC in the case of random packet losses with probability p. Note that, rather surprisingly, practical-IRC performs better than ideal-IRC when the loss rate is less than 10%. The reason becomes clear from the previous analysis. If Rpb < Rb, ideal-IRC performs better than practical-IRC independent of the measurement latencies; if Rpb > Rb, however, the measurement latency of practical-IRC keeps the traffic on the (still faster) lossy primary path for part of each bad period, and condition (3) can fail. Figure 5(c) compares Rpb and Rb for different values of p. Note that the throughput in the primary path during "bad" periods is still higher than the throughput in the backup path if the loss rate is less than about 12%. An important conclusion from this analysis is that a lossless path is not necessarily better than a lossy path. Consequently, an IRC system that always switches to a path with a lower loss rate may fail to provide the maximum possible TCP throughput to its users. An improved IRC system can use TCP throughput as the primary performance metric for path selection. That is, IRC should be able to measure the TCP throughput in the current path, estimate or predict the TCP throughput in other paths, and then switch based on throughput comparisons. The problem of predicting TCP throughput has been the focus of [6].
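As a quick numerical illustration of equations (1)-(3), the following C++ fragment (our sketch, with assumed parameter values rather than the actual simulation settings) computes RI and RP and evaluates condition (3):

#include <cstdio>

int main() {
    double Tg = 100.0, Tb = 50.0;   // good/bad period durations (sec), assumed
    double Rpg = 7.0, Rb = 5.0;     // Mbps on the primary (good) and backup paths
    double Rpb = 0.0;               // primary path throughput during an outage
    double Tmb = 0.6, Tmg = 0.6;    // detection latencies (sec), assumed

    double RI = (Rb * Tb + Rpg * Tg) / (Tb + Tg);                       // eq. (1)
    double RP = (Rb * (Tb - Tmb + Tmg) + Rpg * (Tg - Tmg) + Rpb * Tmb)
                / (Tb + Tg);                                            // eq. (2)
    // Condition (3): ideal-IRC wins iff (Rpb - Rb)/(Rpg - Rb) < Tmg/Tmb.
    bool ideal_wins = (Rpb - Rb) / (Rpg - Rb) < Tmg / Tmb;
    std::printf("RI = %.3f Mbps, RP = %.3f Mbps, ideal-IRC wins: %d\n",
                RI, RP, ideal_wins);  // for an outage (Rpb = 0), ideal-IRC wins
}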
6
To IRC or Not to IRC?
We can now focus on the following key question: Under which conditions is IRC beneficial to TCP traffic, in terms of the aggregate resulting throughput? Specifically, we compare the TCP throughput of the aggregate T (S, D) with IRC control (practical-IRC) and without IRC control (“no-IRC”). The latter means
[Figure 6 consists of two panels, both with throughput (Mbps) on the y-axis: (a) outage impairment, comparing no-IRC and practical-IRC against the average outage duration Tb (10-90 sec); and (b) random loss impairment, comparing the two against the packet loss probability p (0-10%).]
Fig. 6. Comparison of practical-IRC with static routing (no-IRC)
that T(S, D) stays at the primary path independent of whether that path is lossy or not. Outage impairments: Figure 6(a) compares practical-IRC and no-IRC in the case of outages. We observe that the throughput of practical-IRC is always higher, indicating that IRC is always beneficial and synergistic with TCP when dealing with outages. The following analysis, however, reveals that under a certain condition IRC may not be beneficial even for outages. First, it is easy to see that the average throughput with the no-IRC model is

RN = (Rpb Tb + Rpg Tg) / (Tb + Tg)    (4)
The no-IRC model performs better than the practical-IRC model, i.e., RN > RP, when

(Rb − Rpb) / (Rpg − Rb) < Tmg / (Tb − Tmb)    (5)

Note that Rpg > Rb and Tb > Tmb. For outages, since Rpb = 0, (5) can be written as

Rb (Tb − Tmb) < (Rpg − Rb) Tmg    (6)

This inequality indicates that if the throughput loss during the periods Tmg (right-hand side) is larger than the throughput gain during the periods Tb − Tmb (left-hand side), IRC is not beneficial. This can be the case when the outage duration Tb is comparable to the measurement latency Tmb, for instance. Therefore, when (6) is true, it is better to keep the traffic at the primary path, instead of switching to the backup path. Random loss impairments: Figure 6(b) shows the corresponding results for random losses. The capacity of the backup path is Cb = 6 Mbps in this case, to illustrate that practical-IRC can be better than no-IRC when the loss rate p is larger than a certain value. Note that for low values of p, less than 5% in these simulations, static routing appears slightly better. The analysis of the previous
paragraph still applies, and condition (5) determines whether practical-IRC is better than no-IRC. Note that Rpb >0 in this case. In summary, the simulation and analytical results of this section show that IRC should not switch paths simply based on the detection of some packet losses in the primary path. Major losses or outages should trigger a path change if they persist for more than the measurement latency. For sporadic losses, on the other hand, the key question is whether the measured throughput in the primary path in the presence of such losses is lower than the predicted throughput in the backup path. If that is not the case, then IRC should stay in the current path.
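The short-outage regime of condition (6) can be illustrated with a few lines of C++ (again our sketch; the latencies and rates are assumed values):

#include <cstdio>

int main() {
    double Rpg = 7.0, Rb = 5.0;    // Mbps, primary (good) and backup paths
    double Tmb = 0.6, Tmg = 0.6;   // measurement latencies (sec), assumed
    for (double Tb = 0.5; Tb <= 8.0; Tb *= 2.0) {
        // Condition (6): no-IRC wins when Rb*(Tb - Tmb) < (Rpg - Rb)*Tmg,
        // i.e., the gain on the backup path during the outage is smaller
        // than the loss from lingering there after the primary recovers.
        bool no_irc_wins = Rb * (Tb - Tmb) < (Rpg - Rb) * Tmg;
        std::printf("Tb = %.1f sec -> no-IRC better: %d\n", Tb, no_irc_wins);
    }
}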
7
Related Work and Summary
A number of papers (see [1] and references therein) have explored the performance and availability benefits of IRC systems. The work by Tao et al. [10] evaluated IRC experimentally, also comparing it with overlay routing. Recently, two papers investigated the risk of oscillations when different IRC systems become synchronized [4,7]. [7] focused on exploring the conditions under which synchronization may occur, while [4] proposed randomized IRC path-switching techniques to avoid synchronization. The interactions of TCP congestion control with adaptive routing have been studied from a more theoretical perspective in [2]. The paper most closely related to our work is [5], which focused on the stability of intradomain traffic engineering when the traffic is generated by persistent TCP connections. In conclusion, the main points of this paper are: IRC can be synergistic with TCP if the measurement timescale is larger than TCP's typical RTT or RTO; IRC needs to examine the RTT difference between two candidate paths to avoid retransmissions and throughput loss; IRC should expect a transient throughput period after most path changes; switching paths based on loss-rate comparisons can lead to throughput loss; and IRC is synergistic with TCP for outages that last longer than the measurement timescale and for significant loss rates.
References
1. A. Akella, B. Maggs, S. Seshan, A. Shaikh, and R. Sitaraman. A Measurement-Based Analysis of Multihoming. In Proceedings of ACM SIGCOMM, 2003.
2. E. J. Anderson and T. E. Anderson. On the Stability of Adaptive Routing in the Presence of Congestion Control. In Proceedings of IEEE INFOCOM, April 2003.
3. Cisco Systems. Optimized Edge Routing. http://www.cisco.com/en/US/netsol/ns471/networking_solutions_package.html.
4. R. Gao, C. Dovrolis, and E. Zegura. Avoiding Oscillations due to Intelligent Route Control Systems. In Proceedings of IEEE INFOCOM, 2006.
5. J. He, M. Bresler, M. Chiang, and J. Rexford. Towards Multi-Layer Traffic Engineering: Optimization of Congestion Control and Routing. IEEE Journal on Selected Areas in Communications, 2007.
6. Q. He, C. Dovrolis, and M. Ammar. On the Predictability of Large Transfer TCP Throughput. In Proceedings of ACM SIGCOMM, 2005.
7. R. Keralapura, C. Chuah, N. Taft, and G. Iannaccone. Race Conditions in Coexisting Overlay Networks. IEEE/ACM Transactions on Networking, 2006.
8. A. Medina, M. Allman, and S. Floyd. Measuring the Evolution of Transport Protocols in the Internet. ACM SIGCOMM Computer Communication Review, 35(2):37-52, 2005.
9. V. Paxson and M. Allman. RFC 2988: Computing TCP's Retransmission Timer, 2000.
10. S. Tao, K. Xu, Y. Xu, T. Fei, L. Gao, R. Guerin, J. Kurose, D. Towsley, and Z. Zhang. Exploring the Performance Benefits of End-to-End Path Switching. In Proceedings of IEEE ICNP, 2004.
Fast and Scalable Classification of Structured Data in the Network

Sumantra R. Kundu1, Sourav Pal1, Christoph L. Schuba2, and Sajal K. Das1

1 Dept. of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
{kundu,spal,das}@cse.uta.edu
2 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054, USA
[email protected]
Abstract. For many network services, such as firewalling, load balancing, or cryptographic acceleration, data packets need to be classified (or filtered) before network appliances can apply any action processing to them. Typical actions are header manipulations, discarding packets, or tagging packets with additional information required for later processing. Structured data, such as XML, is independent of any particular presentation format and is an ideal information exchange format for a variety of heterogeneous sources. In this paper, we propose a new algorithm for fast and efficient classification of structured data in the network. In our approach, packet processing and classification is performed on the structured payload data rather than only on packet header information. Using a combination of hash functions, a Bloom filter, and set intersection theory, our algorithm builds a hierarchical and layered data element tree over the input grammar that requires logarithmic time and tractable space complexity.
1
Introduction
Enterprise networks today usually follow a multi-tiered architecture, wherein at the networking tier different appliances perform different logical networking services, such as firewalling, load balancing, or cryptographic acceleration, on data packets before they reach the first-tier application servers. In order for network appliances to decide which actions are to be applied to individual data packets, datagrams need to be classified using a set of predefined rules or service classes. Typical classification actions include header manipulation, routing, filtering, and tagging packets with additional information for subsequent processing. State-of-the-art content-aware switches implementing deep (application layer) packet inspection technology are still limited to examining only the first few hundred bytes of application data, typically attempting to match preconfigured URLs or HTTP cookies. They cannot match generically
specified structured data elements beyond these predefined, basic types of HTTP protocol values. On the other hand, generic packet classification involving structured data represents significant progress beyond traditional and existing packet classification schemes. In this new approach, packets are classified not based on their destination address, but rather on the nature and type of data present in the payload. Structured data payloads represent strictly ordered sequences of events and can be viewed as formatted byte strings. Many applications today not only internally organize their data in a structured fashion, but also use similarly structured information for protocol exchanges between them. Such an approach is attractive since it combines data and metadata together and provides a common presentation protocol for a variety of heterogeneous data sources. More and more applications are no longer monolithic, but interact in a distributed fashion that involves extensive peer-to-peer or client-server interaction across the network. Thus, with application chatter occurring in a structured data format, inline network appliances can provide additional benefits if they are able to classify, identify, and operate on traffic flows based on the nature of their payload. Among numerous other possibilities, such content-aware packet classification facilitates:
– Intelligent Network Partitioning: content and structured-data-aware network appliances are able to direct traffic streams according to the characteristics and capabilities of the destination network resources. For example, in an enterprise business environment, all financial transactions involving structured data formats and matching certain well-defined criteria (e.g., credit card transactions exceeding a certain value) could be directed towards a pool of dedicated resources for better response time.
– Preferential Data Dissemination: networks deploying such content-aware appliances are able to provide preferential information dissemination. Thus, network traffic could be filtered and delivered according to the preferences of the end user, as embodied in publish/subscribe systems [6].
– Selective Content Encryption: instead of encrypting the entire packet payload during internal network transactions (e.g., email exchanges between different departments inside a university network containing social security numbers of employees), only sensitive portions of the data stream (social security numbers, in this case) could be encrypted and decrypted by such content-aware network appliances.
However, such an approach has its own challenges. The first and foremost is the overhead and time involved in classifying such structured streams. In this paper, we address this issue and propose an algorithm for fast and efficient classification of structured data. Our algorithm executes in the data plane of network appliances and offers the ability to filter, redirect, and mark network traffic for controlled and preferential transmission. As a case study, we use the eXtensible Markup Language (XML) [5] as an example of a structured data format for information exchange. We selected XML because it is predicted to become a large portion (as much as 35% [16]) of network traffic in the near future. As
shown in Figure 1, we use the hypothetical XML document as a running example throughout our paper. The remainder of the paper is organized as follows: In Section 2, we provide a brief introduction to structured data classification in the context of XML and introduce the terminology used in our algorithm. Then, in Section 3 we introduce our algorithm and study its various aspects. In Section 4, we evaluate the performance of one aspect of the classification algorithm and highlight the prototype development. Finally, in Section 5, we conclude with remarks and directions for future work.
<workorder value="1313" priority="6">
  <customer name="UTA" URL="www.cse.uta.edu">
    <priority value="2">123456789087</priority>
    <description>
      <shipping carrier="UPS">overnight</shipping>
      <price unit="USD">4567</price>
    </description>
  </customer>
</workorder>
Fig. 1. Sample XML Document used in our study
2
Background
Information exchange using XML consists of structured data sets representing strictly ordered sequences of events. For example, in web services, applications use XML messages in the form of Simple Object Access Protocol (SOAP) messages to exchange data objects among themselves. Among others, studies involving the use and application of XML in publish/subscribe peer-to-peer (P2P) networks have been documented in [12,14]. Most of these studies have focused on preferential data dissemination and appropriate message semantics in P2P and ad-hoc networks. In large databases, studies on structured data have focused on database access strategies and query subsystem development [11]. We differ from both of the above approaches by looking at structured data classification from the perspective of a network packet classifier. The central idea is to design an XML-aware packet classifier which is capable of understanding XML vocabulary independently of any specific XML grammar scheme. In Figure 2 we depict the building blocks of our classifier. It consists of three main functional blocks: a rule parser, an event generator, and an event manager. The function of the event generator is to generate tokens based on a specified stream-parsing schema. For XML, we use the schema described in the W3C standards [13] to generate the tokens. The function of the event manager is to decide upon the most relevant and appropriate action that matches a given rule for the XML
[Figure 2 shows the classifier's three functional blocks: a rule parser, fed by the rule file, which populates the rule database; an event generator, fed by the structured stream and guided by the stream schema; and an event manager, which consults the rule database and triggers the actions (drop, transform, redirect).]
Fig. 2. Architecture of the Network Classifier used for Structured Data Routing
stream under consideration. Typical actions include dropping and/or tagging the data stream for further processing, transforming the stream to a different document format (e.g., HTML), or redirecting the stream to a specific destination endpoint (IP address). The Problem: In the light of the above discussion, we define the classification problem in the following way: Given a structured data stream where data objects appear in an ordered sequence, how do we decide upon an action that best matches the input grammar? We observe that in order to decide upon the best action to be initiated on a particular data stream, data sets in the data stream need to be continuously and efficiently compared against an in-built structured tree of data sets built over the input grammar. Furthermore, because multiple actions might match a given data stream, the algorithm needs to select the most appropriate action to be applied. We now define some of the terminology used in this paper. Nodes: Nodes in a structured tree correspond to data defined over the input alphabet. For XML, new nodes are identified by data associated with the opening tag (<). For example, in Figure 1, customer is a valid element node in the structured tree. Rules and Actions: Rules and actions together define the structure of the input rule file that is fed to the rule parser (see Figure 2). Each rule is represented as a tuple: a grammar and an associated action, where the grammar is defined over the domain of an application, conforming to an established schema. Thus, Rule := <grammar; action>. For XML traffic, we have defined the grammar of the input rule file similar to XPath Expressions (XPE) [13] with relative path names, followed by an action. A sample input grammar (defined in the input rule file) for our XML document
Table 1. Example of grammar executed on XML stream
Rule1 /priority(value==1); IP:10.11.12.13:8234
Rule2 /price>20000; IP:10.11.12.14:8080
Rule3 action:transform:HTML; IP:10.11.12.15:80
of Figure 1 might have semantics as shown in Table 1. In the example, priority is an element node and (value == 1) is its attribute. We say an event has occurred when we encounter the data element priority with the attribute value equal to 1 in the XML data stream. Rule 1 directs the XML stream with the element node priority having the attribute (value == 1) to the tuple 10.11.12.13:8234. In Rule 2, we are only interested in events for which the element node price has values greater than 20000. All attributes of price, if present in the data stream, constitute don't-care conditions. In Rule 3, the XML stream is converted to an HTML document format and sent to the IP address tuple 10.11.12.15:80. Because structured data is a formatted byte string where the element nodes are arranged in a hierarchical fashion, an input rule file defines the element nodes and the associated events the user is interested in. Thus, events occur only if the element nodes defined in the rule file are also present in the data stream. The taxonomy of the element nodes in the rule file determines the event list, with attributes determining the transition function between the events.
3
Classification Algorithm
Since we only need to find a subset of events from the structured data stream, we model the element nodes and the relationships between them as a directed acyclic graph. This graph is referred to as the structured data tree (SDT) in our paper. Events occur and traverse along the edges of an SDT. The rule database contains the SDT defined by the input rule file. This graph is created at system startup or whenever the input rule file is updated by the input rule parser. It is important to note that normally we do not attempt to construct the whole XML data stream in memory (except for, e.g., document transformations). In order to determine if an element node exists in the rule database, the algorithm maintains a simplified Bloom filter matrix with a controllable degree of false positives. The event generator drives the finite state machine (FSM) until it reaches a node for which an action is defined. Such nodes are referred to as reachable states. However, multiple actions might match at each reachable state. A Jaccard coefficient vector is calculated to select the most relevant action based on the current event state. This approach is equivalent to a depth-first search of the structured data without the overhead of building the entire streaming data tree in computer memory. In the following subsections, we elaborate how we use these concepts in our classification algorithm.
3.1
Bloom Filters
Bloom filters [3] are commonly used for probabilistic membership query tests on a set S = {s1, s2, ..., sn} of elements. Each element of the set S is mapped to a constant-space representation spanning k hash functions, [h1, h2, ..., hk], to create a Bloom Filter Matrix (BFM). The individual hash functions are bit vectors of length v bits and together define the Bloom filter space. Initially all the bits of hash function i, hi, are set to 0. Insertion of a new element, s, into the Bloom filter space is accomplished by setting the corresponding bit of each of the hash vectors hi equal to 1. A false positive occurs when a different element, s′, is incorrectly predicted to be present in the filter space because all its corresponding bits are found equal to 1. This happens when the elements s and s′ collide in their hash values. In our paper, we use the BFM for identifying and filtering relevant element nodes from the event generator. Such an approach greatly reduces the amount of valid FSM state search required during state transitioning. Further, instead of using k different hash functions, we break up the SHA1 hash function value into k different partitions, thus mimicking k different hash functions. This partitioning is utilized for creating the BFM and for insertion and search operations of element nodes in the rule database. In this study, each partition is referred to as a block and we assume all the blocks to be of equal length.
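As a concrete illustration of the partitioning step, the following C++ sketch (ours, not the authors' code; OpenSSL's SHA1() is used as a stand-in for the paper's digest computation, and the function name is hypothetical) slices a 160-bit digest into n-bit blocks, one index per emulated hash function:

#include <openssl/sha.h>   // link with -lcrypto
#include <string>
#include <vector>

// Emulate k hash functions by slicing one SHA1 digest into fixed-width
// blocks; e.g., bits = 8 yields k = 160/8 = 20 blocks.
std::vector<unsigned> sha1_blocks(const std::string& element, unsigned bits) {
    unsigned char md[SHA_DIGEST_LENGTH];           // 160 bits = 20 bytes
    SHA1(reinterpret_cast<const unsigned char*>(element.data()),
         element.size(), md);
    std::vector<unsigned> blocks;
    for (unsigned bit = 0; bit + bits <= 160; bit += bits) {
        unsigned v = 0;
        for (unsigned b = 0; b < bits; ++b) {      // collect `bits` bits
            unsigned idx = bit + b;
            v = (v << 1) | ((md[idx / 8] >> (7 - idx % 8)) & 1u);
        }
        blocks.push_back(v);                       // value of one block
    }
    return blocks;                                 // one index per "hash"
}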
3.2
BFM: Bloom Filter Matrix
In our algorithm, the BFM is created by defining blocks of length n from the 160-bit SHA1 message digest. This division corresponds to 160/n hash functions (or blocks). For each of the hash functions, the bit indexed by the integer value of the corresponding n bits is set to 1. Thus, for blocks of n bits, the BFM requires system memory equal to 2^n × (160/n) × sizeof(int) bits. This approach is illustrated in Figure 3 for an arbitrary block size of n bits. The percentage of false positives depends on the number of hash functions (k) and on the ratio of the size of the filter (f) to the size of the data set (n). For hi = (0 ... 2^v − 1), to a good approximation, the false positive rate (Pf) is given by [3][9]:

Pf = (1 − p)^k, where p = e^(−kn/(2^v − 1))    (1)
For memory constrained systems, there are two available options for limiting memory usage:
– Decreasing the size of individual blocks. For example, a BFM with blocks of size 8 bits will require approximately 68% less memory than an equivalent BFM with blocks of size 10 bits.
– Using a truncated integer representation for each block. For example, a block length of 8 bits can represent 256 values. However, to conserve memory, the system might want to represent it with only 5 bits, allowing 32 unique values.
[Figure 3 illustrates how a single 160-bit SHA1 signature is divided into consecutive blocks of n bits, each block serving as the value of one of the hash functions that make up the BFM.]
Fig. 3. Bloom Filter Matrix (BFM) created out of a single SHA1 hash function
Decreasing the block length to 5 bits will decrease the BFM memory requirement by 87% (as compared to a BFM with blocks of size 8 bits), at the cost of an increased false positive rate. However, we amortize this cost by organizing the SDT in system memory so as to minimize the overhead of node search and retrieval operations.
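Putting the pieces together, a minimal BFM could look as follows (our sketch of the scheme, not the authors' implementation; sha1_blocks() is the helper from the previous listing):

#include <string>
#include <vector>

std::vector<unsigned> sha1_blocks(const std::string&, unsigned); // Sect. 3.1 sketch

// One bit vector of 2^n entries per block position of the digest.
class BloomFilterMatrix {
public:
    explicit BloomFilterMatrix(unsigned bits)
        : bits_(bits), rows_(160 / bits),
          bv_(rows_, std::vector<bool>(1u << bits, false)) {}

    void insert(const std::string& element) {
        std::vector<unsigned> blk = sha1_blocks(element, bits_);
        for (unsigned i = 0; i < rows_; ++i) bv_[i][blk[i]] = true;
    }
    // May return a false positive; a hit must still be confirmed in the SDT.
    bool maybe_contains(const std::string& element) const {
        std::vector<unsigned> blk = sha1_blocks(element, bits_);
        for (unsigned i = 0; i < rows_; ++i)
            if (!bv_[i][blk[i]]) return false;     // definitely absent
        return true;
    }
private:
    unsigned bits_, rows_;
    std::vector<std::vector<bool>> bv_;
};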
3.3
Storage of the Element Nodes
In order to eliminate false positives during element node membership queries, we have to make sure that the node actually exists in the rule database. Blocks of length n bits defined over the SHA1 hash function not only identify the hash vectors hi of the BFM, but also define levels in the rule database memory. In such a case, it is not difficult to see that the maximum depth of the SDT along any search path is 160/n. In the best case, when the signatures of all element nodes differ in at least one bit among their first n bits (i.e., in the first block), there are no collisions, and all elements fit in the first level of the SDT. Associated with each level of the SDT is a Signature Map Table (SMT). It contains entries which are individual element node signature digests corresponding to level i, mapped to the associated address in system memory where the nodes reside. The size of the SMT is equal to the number of entries times the space required to store the element signatures. Collisions occur when two elements have identical signature bit patterns of length n at level i. In this case, the Signature Map Table (SMT) contains a reference to the next level of the tree. The depth of the SDT traversal is directly related to the probability of a collision occurring between two nodes with a signature digest of length n. The probability of two element nodes having identical bit patterns for a message digest of length n bits is 2/2^n. This value is approximately 1.35e−48 for n = 160 (the total length of the SHA1 message digest). The probability of two
element nodes having at least one bit different is equal to (1 − 2/2^n), for a message digest of length n bits. We choose SHA1 for our algorithm because of its low collision probability and its collision resistance. The latter property implies that the probability of node collision decreases rapidly as our algorithm traverses the SDT along any search path. Figure 4 shows the organization of the system memory for any level i.
[Figure 4 shows the levels of the SDT in system memory: at each level i, entries pair a 160-bit SHA1 digest (the signature defined for that level) with the address of the corresponding XML node; on a collision, an entry points from level (i−1) through level i on to level (i+1).]
Fig. 4. Organization of Element Nodes in System Memory (RAM)
3.4
Checking the Existence of an Element Node
Determining the existence of an element node in the rule database is a two-step process. In the first step, the SHA1 message digest is calculated and is partitioned into blocks of n bits. Then, the corresponding entries of the BFM are checked in parallel. If any one of the entries corresponding to the integer representation of a block of n bits among the k (k = 1, 2, ..., 160/n) hash functions is equal to 0, then the element node is not present in the rule database and the algorithm terminates. Otherwise, the node exists, up to a certain probability of a false positive. To handle false positives, in the second step of the search operation the algorithm searches through the SDT until an entry in the SMT is found that contains the element node, or until a path of length 160/n in the SDT has been traversed. Building the Structured Data Tree (SDT): After all the element nodes have been inserted into the rule database, the individual nodes need to be connected to form the SDT. This step is performed based on the relationships enforced between the element nodes in the input rule file. The SDT thus defines the event list and the flow of events in our classifier architecture.
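The second, exact step can be outlined in C++ as follows (a high-level sketch under our own type names; the paper stores the actual SDT level by level in system memory):

#include <array>
#include <map>
#include <memory>
#include <vector>

// Each SDT level maps the n-bit block value for that level to either the
// full signature of a stored node or a pointer to the next level (collision).
struct SdtLevel {
    struct Entry {
        std::array<unsigned char, 20> signature;  // full SHA1 digest
        void* node = nullptr;                     // element node in memory
        std::unique_ptr<SdtLevel> next;           // set on collisions
    };
    std::map<unsigned, Entry> smt;                // Signature Map Table
};

// Walk at most 160/n levels, one block per level, after a BFM hit.
bool exists(const SdtLevel* level,
            const std::array<unsigned char, 20>& sig,
            const std::vector<unsigned>& blocks) {
    for (unsigned i = 0; level && i < blocks.size(); ++i) {
        auto it = level->smt.find(blocks[i]);
        if (it == level->smt.end()) return false; // BFM gave a false positive
        if (it->second.signature == sig) return true;
        level = it->second.next.get();            // descend on collision
    }
    return false;
}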
3.5
Selecting the Most Appropriate Action in the Context of an Element Node
In structured data, individual element nodes have predicates which are valid only within the name space of the node. For example, the element node priority has the predicate (value = 2) in our example of Figure 1. The predicates define the events in the context of the element node and are stored as members of a set. There might be Boolean relationships between the members themselves. Suppose we have the following grammar present in the input rule file:
/workorder(value>"1000") OR (priority>"2") OR (currency="USD");
/workorder(value<="2765") OR (customer="sun.com");
/workorder(department!="sales") OR (value>"345");
Now, in the context of the XML element workorder of Figure 1, we observe that all of the above rules have one or more members which match this data stream. Our aim is to find the action which best fits the given data stream. The Jaccard coefficient between two sets A and B is defined as the quotient of the cardinalities of the sets A ∩ B and A ∪ B, and is used to find the degree of similarity between the two sets: J = |A ∩ B| / |A ∪ B|. Thus, given sets A and B, we can calculate the degree of resemblance between the two. The rule corresponding to the set with the highest value of J is chosen for the given structured data stream. In case of identical values of J, one of the possible actions is chosen at random.
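In code, the selection step amounts to a straightforward set computation; the sketch below (ours, with predicates represented as strings for simplicity) returns the coefficient used to rank the candidate rules:

#include <cstddef>
#include <set>
#include <string>

// J = |A ∩ B| / |A ∪ B| between the predicates of a rule (A) and the
// predicates observed in the stream (B).
double jaccard(const std::set<std::string>& a, const std::set<std::string>& b) {
    std::size_t inter = 0;
    for (const auto& x : a) inter += b.count(x);
    std::size_t uni = a.size() + b.size() - inter;
    return uni ? static_cast<double>(inter) / uni : 0.0;
}
// The rule maximizing jaccard(rule, stream) is applied; ties are broken
// at random, as described above.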
4
Experimental Evaluation
We have implemented our classification algorithm in C++ and conducted experiments on a Pentium IV 2.4 GHz Intel CPU with 1 GB main memory running SuSE 9.2 Professional edition on Linux kernel 2.6.8. We used the Xerces C++ XML parser [15] in our preliminary evaluation phase. The following events are generated by the XML parser: startStream(), startElementNode(), elementConstraints(), endElementNode(), endStream(). Events startStream() and endStream() denote the start and end of the structured data stream. Event startElementNode() opens the context of the element node, while event endElementNode() closes the context. Event elementConstraints() defines all the events and transition functions within the name space of the element node. We have evaluated the redirection feature of the classification algorithm for XML documents with an input rule file containing 2500 rules as defined in Section 2. From Figure 5, we observe that the overhead of redirection increases with the number of rules, but remains manageable. A prototype for Content Based Routing (CBR) is currently being developed using the Click software router [7]. Such a router will have the capability
[Figure 5 plots the execution time (in seconds, from 0 to 0.1) against the number of queries in the rule file (from 50 to 4000).]
Fig. 5. Performance of the algorithm with number of queries in the rule file
to perform inline network filtering and forwarding of structured data and will directly interface with the network device driver as a loadable kernel module. The classification algorithm presented in this paper is currently being integrated inside our CBR testbed for structured data routing.
5
Conclusions and Future Work
In this paper, we have introduced a novel algorithm for the classification of structured data in the network. In the classifier architecture, a combination of hash functions, a Bloom filter, and a Jaccard coefficient vector is used to create a structured data tree and to select the most appropriate action corresponding to an input structured data stream. We have implemented the classification algorithm and evaluated its redirection feature. We observe that while the execution time increases with the number of queries over a set of 2500 tags, it stays well within tractable limits. While the results are encouraging, we intend to carry out further investigations with real-time network traffic. There exist several opportunities to extend the results of our work. Using Merkle trees [8], it is possible to improve the process of instantiating and searching the SDTs. Such an approach would guarantee both logarithmic time and space complexity for all combinations of SDT operations. We are also looking into improving the action matching algorithm and into defining an efficient structured data parser that can extract tokens when the structured data crosses network packet boundaries. Acknowledgment. The material of this work is supported by NSF ITR grant IIS-0326505. We would also like to thank the anonymous reviewers for their helpful comments and suggestions, which improved the quality of this study.
References
1. S. Abiteboul, A. Bonifati, G. Cobena, I. Manolescu, and T. Milo, "Dynamic XML Documents with Distribution and Replication", ACM SIGMOD, pp. 527-538, June 2003.
2. M. R. Anderberg, "Cluster Analysis for Applications", New York Academic Press, 1973.
3. B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors", Communications of the ACM, pp. 422-426, 1970.
4. J. W. Byers, J. Considine, M. Mitzenmacher, and S. Rost, "Informed Content Delivery Across Adaptive Overlay Networks", IEEE/ACM Transactions on Networking (TON), vol. 12, pp. 767-780, 2004.
5. eXtensible Markup Language (XML), information available online at: http://www.w3.org/XML/
6. P. W. Foltz and S. T. Dumais, "Personalized information delivery: an analysis of information filtering methods", Communications of the ACM, vol. 35, pp. 51-60, 1992.
7. R. Morris, E. Kohler, J. Jannotti, and M. Frans Kaashoek, "The Click Modular Router", ACM Symposium on Operating Systems Principles (SOSP), pp. 217-233, 1999.
8. R. Merkle, "Protocols for public key cryptography", IEEE Symposium on Security and Privacy, pp. 122-134, 1980.
9. M. Mitzenmacher, "Compressed Bloom filters", ACM Symposium on Principles of Distributed Computing, pp. 144-150, 2001.
10. National Institute of Standards and Technology (NIST), "Announcing the secure hash standards", Federal Information Processing Standards Publication, August 2002.
11. D. Olteanu, T. Furche, and F. Bry, "An Efficient Single-Pass Query Evaluator for XML Data Streams", ACM Symposium on Applied Computing, pp. 627-631, 2004.
12. S. Ratnasamy, M. Handley, R. M. Karp, and S. Shenker, "Application-Level Multicast using Content-Addressable Networks", Third International COST264 Workshop on Networked Group Communication, pp. 14-29, 2001.
13. W3C XML Schema, information available online at: http://www.w3.org/XML/Schema
14. A. Snoeren, K. Conley, and D. K. Gifford, "Mesh-based Content Routing using XML", ACM Symposium on Operating Systems Principles (SOSP), pp. 160-173, 2001.
15. Xerces C++ Parser, information available online at: http://xml.apache.org/xerces-c/
16. ZapThink, information available online at: http://www.zapthink.com/
17. N. Duffield, C. Lund, and M. Thorup, "Estimating Flow Distributions from Sampled Flow Statistics", IEEE/ACM Transactions on Networking (TON), vol. 13, pp. 933-946, October 2005.
18. A. Kumar, J. Xu, O. Spatschek, and L. Li, "Space-code Bloom Filter for Efficient Per-Flow Traffic Measurement", IEEE INFOCOM, vol. 3, pp. 1762-1773, August 2004.
19. D. Shah, S. Iyer, B. Prabhakar, and N. McKeown, "Maintaining Statistics Counters in Router Line Cards", IEEE Micro, vol. 22, pp. 76-81, 2002.
An Efficient and Secure Event Signature (EASES) Protocol for Peer-to-Peer Massively Multiplayer Online Games

Mo-Che Chan1, Shun-Yun Hu2, and Jehn-Ruey Jiang3

1,2,3 National Central University, Chung-Li, Taiwan 32054
[email protected], [email protected], [email protected]
Abstract. In recent years, massively multiplayer online games (MMOGs) have become very popular by providing more entertainment and sociability than single-player games. In order to prevent cheaters from gaining unfair advantages in peer-to-peer (P2P)-based MMOGs, several cheat-proof schemes have been proposed that use digital signatures. However, digital signatures generally require a large amount of computation and thus may not be practical for real-time playability. We propose an Efficient and Secure Event Signature (EASES) protocol to efficiently sign discrete event messages. Most messages need only two hash operations to achieve non-repudiation and event agreement. The computation, memory, and bandwidth consumption of EASES is low, which makes it applicable to P2P-based MMOGs.
1
Introduction
Multiplayer online games have been a rapidly growing segment of Internet applications in recent years. By providing more entertainment and sociability than single-player games, they are fast becoming a major form of digital entertainment. Earlier multiplayer games adopted client-server architectures where all players establish connections with the server to send and receive event updates. However, a single server cannot support a large number of concurrent players at the same time, so server-clusters were subsequently introduced to enable massively multiplayer online games (MMOGs), where concurrent users may reach into the range of hundreds of thousands [1]. Current MMOGs adopt server-cluster architectures to provide better scalability than earlier architectures. However, a server-cluster has only a limited amount of resources, such as CPU and bandwidth, at a given time and poses a single point of failure. A distributed approach to MMOGs thus may be more scalable if millions of concurrent users are to be supported. A
This research was supported in part by the National Science Council of the Republic of China under the grant NSC95-2221-E-008-048-MY3.
number of recent peer-to-peer (P2P) virtual environment (VE) [2,3,4,5,6,7,8,9] research efforts thus seek to further improve the scalability of existing MMOGs. The key is to correctly and efficiently maintain the topology of all participating peers by solving a neighbor discovery problem [9]. Unlike earlier fully-connected P2P systems, where the number of connections grows quadratically with the number of users, scalable P2P-based VEs need not establish contacts between every pair of player nodes, but only with those that are within a player's visibility range. Although P2P-based MMOGs may provide better scalability by distributing server-side workload to clients, new issues such as maintaining consistency and ensuring fairness are introduced [2,10,11]. In server-based MMOGs, game states (e.g., user status, experience points, equipment, world items) are maintained and updated via game logic executions on trusted servers. When a player performs an action such as running, shooting, or turning, an event message is sent to the server, which subsequently processes the events and updates the game states. The server then periodically sends updated states to relevant players to keep their local copies of the game states synchronized with the server's. Under such an event processing model, the server maintains all the information to ensure a global ordering of event executions and fair gameplay. However, when we turn to a P2P approach to MMOGs, game state maintenance and game logic execution may be distributed to the individual players, creating opportunities for players to cheat. A player may gain various advantages by cheating in multiplayer games. Cheating is especially attractive as gaining valuable items or winning over other players is central to the gameplay value in commercial MMOGs. Valuable virtual items can even be sold in exchange for real-world money, which increases the motivation for cheating. Therefore, cheat-prevention is very important to game designers. To prevent cheating, games may adopt cryptographic techniques. Several cheat types [11] occur when the actions of non-cheating players are known in advance by cheaters, who can then respond unfairly to their own advantage (similar to a real-life bridge game: if certain players dealt their cards only after knowing what cards the opponents will choose, they would have an unfair advantage). Cryptographic techniques have been proposed to prevent players' actions from being known by others before each player submits the final decisions [10]. A potential countermeasure is thus the adoption of a secured event updating protocol that includes 1) a commitment scheme to ensure that players do not change their actions after decisions are made; and 2) a digital signature scheme to ensure that players cannot deny the actions they have performed. If the commitments are digitally signed, we can also prevent impersonation and dodging [12,13]. Commitment and digital signatures together therefore provide a fair environment for games even when players do not trust each other [12,13], as signatures prevent identity forging and commitments prevent a player from changing action decisions that are already made. However, signing involves time-consuming exponentiation operations in public-key cryptography, so significant computational power is required to sign and verify all event updates.
In existing cryptographic schemes, when a large continuous document needs to be signed, a hash function can be used in combination with the digital signature to reduce the computation. The document is first hashed into a much smaller digest, which is then signed by a digital signature. The digest is just one short message, but it characterizes the original message. As it is very hard to find another document that produces the same digest, signing the digest can be seen as signing the original document. Unforgeability and verifiability are the two requirements of digital signatures currently achievable only by public-key cryptography. As digital signatures are necessary to achieve non-repudiation (i.e., the undeniableness of user actions) due to their unforgeability and verifiability, some event update protocols [11,14] have been proposed that use digital signatures to sign every event message to avoid cheating in P2P-based MMOGs. A player cannot repudiate his/her signature after signing a message under these two properties. However, we cannot treat the many discrete event messages in MMOGs as one large document. How to sign many discrete event updates so that unforgeability and verifiability can still be achieved without too much computation is thus our main concern. We propose an Efficient and Secure Event Signature (EASES) protocol to efficiently sign many event update messages. As we will show, EASES has low computation costs and consumes little memory or bandwidth. It is thus applicable to P2P-based MMOGs. In the following sections, we first review related work on cheat-proof mechanisms in Section 2 and present a message transformation model in Section 3. We propose the efficient one-time signature scheme in Section 4, and evaluate its security and performance in Section 5. The paper is concluded in Section 6.
2
Related Work
Cheating in multiplayer games has been described by several papers. Yan [15] examines several security requirements that impact the design of online games, using a simple client-server bridge game. Kirmse and Kirmse [16] present the security goals for online games in areas such as the protection of sensitive information and the provision of a fair playing field. Smed et al. [17] describe issues in multiplayer games such as network resources, communication architectures, scalability, and security. Besides the descriptions of cheats, several cheat-proof mechanisms have also been proposed. GauthierDickey et al. proposed the New Event Ordering (NEO) protocol [11] to improve on the work of Baughman and Levine [10] and Cronin et al. [18]. NEO claims to prevent common protocol-level cheats with low latency, such that adversaries cannot gain any advantages by modifying messages between players. Corman et al. [14] later show that NEO cannot prevent all cheats as claimed, and present an improvement called Secure Event Agreement (SEA). However, both protocols are not practical due to the excessive use of cryptographic signatures. As shown by the NESSIE project [19], digital signatures consume much more computation than hash functions and symmetric encryption. However, the use of digital signatures is unavoidable to prevent users from denying their behaviors. A key question thus is how to use digital signatures only minimally while achieving the same cheat-proof properties.
As we seek to present a more efficient protocol based on NEO and SEA, the two protocols are first examined below.
2.1
Description of NEO
GauthierDickey et al. propose NEO to avoid cheating by adding a voting mechanism to compensate for packet loss in the environment [11]. NEO divides time into fixed-length rounds, where every player sends an event update in each round. The event update of NEO is given in Formula (1), where {}_{K_A^r} represents encryption, S_A() is the signature function, U_A^r is the update from player A for round r, K_A^{r-1} is A's key for the update from round r − 1, and V_A^{r-1} is a bit vector for voting. The voting is used to form a consensus on whether a given player has sent an update within a round, in order to determine if an update by that player should be accepted. When each player acts, an event update is signed and encrypted before being sent to other players. The encrypted event update and its signature serve as a commitment, which is revealed in the next round when the player sends the key. So players cannot modify their own actions after the commitments are sent (i.e., after they have learned of others' actions).

M_A^r = {S_A(U_A^r)}_{K_A^r}, K_A^{r-1}, S_A(V_A^{r-1})    (1)
However, Corman et al. show there are some problems in NEO (e.g., an attacker can replay updates for another player, or send different updates to different opponents [14]) and present an improvement called SEA.
2.2
Description of SEA
Corman et al. [14] present an improvement of NEO as described in Formula (2), where H() is a hash function, U_A^r is the update from player A for round r, n^r is a random value for round r, SessID is the session ID to prevent replaying this message in a different session or with a different group of players, ID_A is the unique identity of player A, and Vh_A^{r-1} is the update from player A for round r − 1 that includes a hash of the update.

Commit_A^r = H(U_A^r, n^r, SessID, ID_A)
M_A^r = S_A(Commit_A^r, U_A^{r-1}, Vh_A^{r-1}, n^{r-1}, r)    (2)

In order to achieve better performance and remove potential issues with key tampering and selection, the SEA protocol replaces encryption with a cryptographic hash function as the commitment method. When commitment is done by encryption, the distribution, protection, and selection of keys must be carefully considered, or problems would arise from poorly designed key management schemes. If the commitment is made via hash functions, however, the above key-related issues can be avoided. Signing the entire message is used to authenticate the message creator, and the hash serves to commit the player to the message sent in the next round. Every player is thus forced to accept the actions already submitted. Player signatures can also be checked to validate the players.
3
Message Transformation Model
We seek to achieve both scalability and fairness for P2P-based MMOGs; the proposed event signature protocol thus should be usable independent of the underlying network topologies. Our method can also be treated simply as a more efficient signature scheme. As public-key cryptosystems require a large amount of computation and time, to improve efficiency we compute the message's digest with a hash function before signing the message. By using a digest, not only is existential forgery prevented, but computation time is also greatly saved. The original signature and its message are described in Formula (3):

M                 the full message,
H(M)              digest of M,
S_sk(H(M))        signature of the message digest.    (3)

Although the above method is efficient for one message, it cannot sign many discrete event updates across different time periods (i.e., rounds) as required in a game scenario. We therefore try to find an efficient signature protocol usable in such an environment. The proposed efficient signature for discrete messages adapts to different networks, such as fully-connected or scalable P2P topologies, and is described in Formula (4):

M_1, M_2, ..., M_i, ..., M_n    discrete messages,
S_sk(M_i)                       signature for message M_i.    (4)

To prevent other attacks, such as eavesdropping using a network sniffer, an authenticated key exchange protocol [20] can be used to agree on session keys to encrypt data between every pair of transmitting parties. Adding a counter number to the original message before encryption can also prevent eavesdropping, modification, and fabrication attacks. However, this topic is beyond the scope of our discussion.
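The hash-then-sign step of Formula (3) looks as follows in C++ (our sketch; OpenSSL's SHA1() stands in for H(), and the public-key signing of the digest is left abstract):

#include <openssl/sha.h>   // link with -lcrypto
#include <array>
#include <string>

// Compute the 20-byte digest H(M); S_sk is then applied to this digest
// rather than to the full message M, saving the expensive public-key step.
std::array<unsigned char, SHA_DIGEST_LENGTH> digest(const std::string& m) {
    std::array<unsigned char, SHA_DIGEST_LENGTH> md;
    SHA1(reinterpret_cast<const unsigned char*>(m.data()), m.size(), md.data());
    return md;
}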
4
The Proposed Scheme
One-time signatures were proposed in 1981 [21] to authenticate remote users on a computer network. In order to sign discrete event updates efficiently, we propose a one-time signature variant that achieves the same effects as a regular signature scheme. There are four phases in EASES: 1) every player generates the keys for signing event updates in the initialization phase; 2) every player signs his/her event update in the signing phase; 3) after the event updates are received from other players, each player can verify these event updates in the verification phase; 4) in the re-initialization phase, players re-generate new signature keys when the keys have been used up. The notations are listed in Table 1.
Table 1. Notations

Player_i : each player in the game, where i ∈ {1..m}
H(x) : hash operator with input message x
OSK_i^j : player_i's j-th one-time signature key
S_sk(x) : signing of message x by secret key sk
x|y : concatenation of the messages x and y
δ_i^j : signature signed by player_i's j-th OSK
Δ : signature signed by secret key sk

4.1
Initialization Phase
Before starting the game, each player must first generate his/her one-time signature keys. The generation of the list of one-time signature keys can be done before a user logs into the game, or before the first event update is sent.
Basis: For each player player_i, an unpredictable random one-time master key MK_i is first picked to compute the first of the player's one-time signature keys, OSK_i^n = H(MK_i), where n is a system parameter that specifies the maximum number of updates each player can sign. Choosing a larger n may save computation time during the later updating stage, but it also increases the time needed to compute the one-time signature keys.
Induction: Every player subsequently computes the one-time signature keys OSK_i^j = H(OSK_i^{j+1}), where j ∈ 1..n − 1. After these one-time signature keys (i.e., a hash chain) are generated, every player signs the first one-time signature key with the player's secret key: Δ_i = S_sk(OSK_i^1). The keys are then stored on the player's computer.

Master key = MK_i
OSK_i^n = H(MK_i)
OSK_i^{n−1} = H(OSK_i^n)
OSK_i^{n−2} = H(OSK_i^{n−1})
...
OSK_i^2 = H(OSK_i^3)
OSK_i^1 = H(OSK_i^2)
Δ_i = S_sk(OSK_i^1)
Fig. 1. Generating OSKs in the initialization phase
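The hash-chain construction of the initialization phase is straightforward to implement. The sketch below is ours, under the assumption that SHA-1 serves as H; the ordinary signature producing Δi is omitted.

import hashlib, os

def generate_osk_chain(n: int):
    # Pick an unpredictable master key MK_i, derive OSK_i^n = H(MK_i),
    # then OSK_i^j = H(OSK_i^(j+1)) for j = n-1 down to 1.
    mk = os.urandom(20)                      # one-time master key MK_i
    chain = [hashlib.sha1(mk).digest()]      # OSK_i^n
    for _ in range(n - 1):
        chain.append(hashlib.sha1(chain[-1]).digest())
    chain.reverse()                          # chain[j-1] now holds OSK_i^j
    return mk, chain

mk, osk = generate_osk_chain(1000)
# osk[0] is OSK_i^1, the chain head that Delta_i = S_sk(OSK_i^1) commits to.
assert hashlib.sha1(osk[1]).digest() == osk[0]   # chain property holds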
4.2 Signing Phase
The player computes the signature of the event update δij = H(OSKij|Mij), where Mij is playeri's jth event update. During each round, each player sends an event update and its signature Mij|δij to other players. Note that if the
situation requires it (e.g., to re-initialize another list of one-time signature keys), one should not use up all the keys, but should reserve at least the last two keys OSKin and OSKin−1. The signature of the event update playeri sends each round is shown in Formula (5).

Δi, δi1 = H(OSKi1|Mi1)    the first round,                    (5)
δij = H(OSKij|Mij)        the following rounds.

For instance, playeri sends the signature δi1 = H(OSKi1|Mi1) in the first round. In the second round, playeri sends δi2 = H(OSKi2|Mi2), Mi1, OSKi1, and Δi. In subsequent rounds, playeri sends δij = H(OSKij|Mij), Mij−1, and OSKij−1.
4.3 Verification Phase
When each player receives other players' event updates, it is necessary to verify those messages. Each player receives δi1 = H(OSKi1|Mi1) in the first round, and then receives and verifies the signature Δi in the second round. Subsequently, each player receives OSKij and Mij in the (j+1)th round and verifies whether H(OSKij|Mij) = δij. Additionally, the signature Δi also needs to be verified when the first update is received.
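The delayed-disclosure pattern of the signing and verification phases can be sketched as follows (our illustration in the paper's notation; the helper names and message framing are assumptions, not the authors' implementation). In round j the sender emits δij = H(OSKij|Mij); one round later it reveals Mij and OSKij so the receiver can open the commitment and check the hash chain.

import hashlib, os

H = lambda b: hashlib.sha1(b).digest()

# A small OSK chain, generated as in the initialization phase.
chain = [H(os.urandom(20))]
for _ in range(4):
    chain.append(H(chain[-1]))
chain.reverse()                    # chain[j-1] = OSK^j, j = 1..5

def sign_round(j: int, msg: bytes) -> bytes:
    # Signing phase: delta^j = H(OSK^j | M^j), '|' as byte concatenation
    return H(chain[j - 1] + msg)

def verify_round(delta_j, msg_j, osk_j, prev_osk) -> bool:
    # Verification phase: the revealed key must extend the hash chain
    # (H(OSK^j) = OSK^(j-1)) and must open the commitment delta^j.
    return H(osk_j) == prev_osk and H(osk_j + msg_j) == delta_j

delta2 = sign_round(2, b"fire at (3,4)")           # sent in round 2
assert verify_round(delta2, b"fire at (3,4)",      # opened in round 3
                    chain[1], chain[0])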
4.4 Re-initialization Phase
The signature keys have to be regenerated when they are used up. One basic method is for players to re-execute the initialization phase to generate new signature keys. However, a more efficient way exists if we assume that the last two keys are reserved: a player first generates the new one-time signature keys NewOSKi1..n and then signs the head of the new key chain with the reserved key, Δreinit = H(OSKin−1|NewOSKi1).
5 Evaluation

5.1 Security Analysis
Unforgeability and verifiability are the two fundamental requirements for digital signatures. Unforgeability means that no one except the signer can generate a legal signature for a specific message; the signer is the only one who holds the correct private key for generating a legal signature verifiable by the corresponding public key. Verifiability means that everyone can verify whether a digital signature is legal. We thus evaluate EASES according to these two requirements.

Unforgeability. One cannot claim to have signed Mij to obtain the signature δij unless one can present OSKij. No one can demonstrate knowledge of the signing key OSKij for the current message Mij before the original signer reveals it in the next round. The cryptographic hash function also has the following security
properties: for any given hash value, it is computationally infeasible to find its pre-image, due to the one-way property of hash functions; and for any given message x, it is computationally infeasible to find another message x' with the same hash value, such that H(x') = H(x). Moreover, some hash functions possess the strong collision resistance property, meaning that it is computationally infeasible to find any pair (x, y) such that H(x) = H(y). EASES is unforgeable since it adopts a secure hash function with the properties mentioned above. With unforgeability, non-repudiation can be achieved in EASES.

Verifiability. The cryptographic hash function is a public standard that can be installed and executed on every player's computer, so everyone can re-compute the hash value of a given signature key OSKij and message Mij to verify whether it equals the received signature, i.e., H(OSKij|Mij) = δij.

5.2 Performance Analysis
Computational cost. The major benefit of the proposed scheme is computational efficiency. Existing schemes (e.g., RSA, DSA) require at least one signature operation and one verification operation for each event update. In comparison, EASES uses only three hash operations per event update (two for signing and one for verifying), plus one traditional signature operation during the initialization phase. The approximate CPU cycles for some cryptographic functions are listed in Table 2.

Table 2. Approximate CPU cycles for some cryptographic functions [14, 19]

Primitive type         Example   Clock cycles
Hash function          SHA-1     15/byte + 1040
Symmetric encryption   AES       25/byte + 504
Digital signature      RSA-PSS   42,000,000
Memory consumption. EASES requires all players to set aside a block of memory to store the list of one-time signature keys. Its exact size depends on the hash code length L(H) and the chosen n. For example, if SHA-1 (160-bit hash codes) is used and n = 1000, each player needs 1000 × 160 = 160,000 bits = 20,000 bytes of memory to store the one-time signature keys. This storage is not needed in other existing digital signature schemes.

Bandwidth consumption. Except for the first update, EASES needs to transfer one one-time signature for every event update, with the size of a hash code. This is much shorter than a traditional digital signature. For example, SHA-1 uses 160 bits for each hash code, whereas 1024-bit RSA uses 1024 bits for each signature.
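The storage figure follows directly from n and the digest length; a quick sketch of the arithmetic (ours), usable for other hash choices as well:

def osk_storage_bytes(n: int, digest_bits: int) -> int:
    # Memory needed to hold n one-time signature keys of digest_bits each.
    return n * digest_bits // 8

print(osk_storage_bytes(1000, 160))   # SHA-1, n = 1000 -> 20000 bytes
# Per-update bandwidth is likewise one hash code (160 bits for SHA-1)
# versus a full 1024-bit signature for 1024-bit RSA.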
5.3 Comparisons
We briefly compare EASES against NEO [11] and SEA [14], both of which adopt traditional digital signatures. NEO uses two signature operations per event update and SEA uses one. Each event update in NEO is given in Formula (1), and Formula (2) describes the update in SEA. With EASES, the signature operations used in each event update can be replaced by just two hash operations.
5.4 Discussions
The proposed scheme is a signature protocol that can be used to sign many discrete messages efficiently; it is thus inherently topology independent. In fact, the event update protocol can be adapted to any topology easily [3], by requiring the protocol to be used between every pair of connected players. However, the network topology may change constantly in P2P-based MMOGs. Since existing signature schemes sign each event update independently, event updates can be sent directly to different targets even when the topology has changed. In contrast, EASES adopts a sequence of hash values, so signatures are not independent of one another. If a new target emerges, all one-time signatures (i.e., the sequence of hash values) and event updates up to the current (i.e., jth) event have to be re-sent to the new target. An alternative is for the player to regenerate a new hash chain and sign its head by executing the initialization phase, treating the new target as a newly joined player. EASES therefore cannot fully eliminate traditional digital signatures, but it can greatly reduce the number of signature operations used.
6 Conclusion
In this paper, an efficient and secure event signature (EASES) protocol for P2P-based MMOGs is proposed. This protocol inherits the non-repudiation property of traditional signature schemes. Furthermore, the proposed scheme requires much less computation, which makes it applicable to P2P-based MMOGs. EASES is shown to possess unforgeability and verifiability, like traditional digital signatures. By signing a hash chain, event commitment is implicitly achieved. The computation, memory, and bandwidth consumption of EASES are also shown to be low. We have also shown how EASES may be adapted to non-fully-connected and dynamically changing network topologies such as P2P-based MMOGs.
References

1. Woodcock, B.S.: An analysis of MMOG subscription growth (2006) http://www.mmogchart.com
2. Knutsson, B., Lu, H., Xu, W., Hopkins, B.: Peer-to-peer support for massively multiplayer games. In: Proc. IEEE Infocom (2004)
3. GauthierDickey, C., Lo, V., Zappala, D.: Using n-trees for scalable event ordering in peer-to-peer games. In: Proc. of the International Workshop on Network and Operating Systems Support for Digital Audio and Video, ACM Press, New York, NY, USA (2005) 87–92
4. Keller, J., Simon, G.: Solipsis: A massively multi-participant virtual world. In: Proc. of PDPTA (2003) 262–268
5. Morillo, P., Moncho, W., Orduña, J.M., Duato, J.: Providing full awareness to distributed virtual environments based on peer-to-peer architectures (June 2006)
6. Lee, J., Lee, H., Ihm, S., Gim, T., Song, J.: Apolo: Ad-hoc peer-to-peer overlay network for massively multi-player online games. Technical Report CS-TR-2005-248, KAIST (December 2005)
7. Douglas, S., Tanin, E., Harwood, A., Karunasekera, S.: Enabling massively multiplayer online gaming applications on a P2P architecture. In: Proc. IEEE Intl. Conf. on Information and Automation (December 2005) 7–12
8. Bharambe, A., Pang, J., Seshan, S.: Colyseus: A distributed architecture for multiplayer games. In: Proc. ACM/USENIX NSDI (2006)
9. Hu, S., Chen, J., Chen, T.: VON: A scalable peer-to-peer network for virtual environments. IEEE Network 20(4) (2006) 22–31
10. Baughman, N., Levine, B.: Cheat-proof playout for centralized and distributed online games. In: Proc. of IEEE Infocom (2001)
11. Dickey, C., Zappala, D., Lo, V., Marr, J.: Low latency and cheat-proof event ordering for peer-to-peer games. In: Proc. of ACM NOSSDAV, Kinsale, County Cork, Ireland (2004) 134–139
12. Kamada, M.: A fair dynamical game over networks. In: Proc. of the 2004 International Conference on Cyberworlds (CW'04) (2004) 141–146
13. Kamada, M., Kurosawa, K., Ohtaki, Y., Okamoto, S.: A network game based on fair random numbers. IEICE Transactions on Information and Systems 88(5) (2005) 859–864
14. Corman, A., Douglas, S., Schachte, P., Teague, V.: A secure event agreement (SEA) protocol for peer-to-peer games. In: First International Conference on Availability, Reliability and Security (2006)
15. Yan, J.: Security design in online games. In: Proc. of the 19th Annual Computer Security Applications Conference (2003) 286–295
16. Kirmse, A., Kirmse, C.: Security in online games. Game Developer 4(4) (1997) 20–8
17. Smed, J.: Aspects of networking in multiplayer computer games. The Electronic Library 20(2) (2002) 87–97
18. Cronin, E., Filstrup, B., Jamin, S.: Cheat-proofing dead reckoned multiplayer games. In: Proc. of Application and Development of Computer Games (2003)
19. Preneel, B., Van Rompay, B., Ors, S., Biryukov, A., Granboulan, L., Dottax, E., Dichtl, M., Schafheutle, M., Serf, P., Pyka, S.: Performance of optimized implementations of the NESSIE primitives (2003) http://www.cosic.esat.kuleuven.ac.be/nessie/deliverables/
20. The MIT Kerberos Team: The network authentication protocol. http://web.mit.edu/Kerberos/
21. Lamport, L.: Password authentication with insecure communication. Communications of the ACM 24(11) (1981) 770–772
Unified Defense Against DDoS Attacks

M. Muthuprasanna, G. Manimaran, and Z. Wang
Iowa State University, Ames, IA, USA 50011
{muthu,gmani,zhengdao}@iastate.edu
Abstract. With DoS/DDoS attacks emerging as one of the primary security threats in today's Internet, the search is on for an efficient DDoS defense mechanism that provides attack prevention, mitigation, and traceback features, in as few packets as possible and with no collateral damage. Although several techniques have been proposed to tackle this growing menace, no effective solution exists to date, owing to the growing sophistication of the attacks and the increasingly complex Internet architecture. In this paper, we propose a unified framework that integrates traceback and mitigation capabilities for an effective attack defense. Some significant aspects of our approach include: (1) a novel data cube model to represent the traceback information, and its slicing along the lines of path signatures rather than router signatures; (2) characterizing traceback as a transmission scheduling problem on the data cube representation, and achieving scheduling optimality using a novel metric called utility; and (3) an information delivery architecture employing both packet marking and data logging in a distributed manner to achieve faster response times. The proposed scheme can thus provide both per-packet mitigation and multi-packet traceback capabilities due to effective data slicing of the cube, and can attain higher detection speeds due to the novel utility rate analysis. We also contrast this unified scheme with other well-known schemes in the literature to understand the performance tradeoffs, while providing an experimental evaluation of the proposed scheme on real data sets.
1 Introduction
Denial-of-Service (DoS) attacks are a menace we have come to live with in today's Internet. Distributed DoS attacks on several sites, including Yahoo and eBay, and against the root DNS servers virtually paralyzed the Internet in the early 2000s. Recent attacks motivated by political and economic reasons on SCO, RIAA, 2Checkout, Blue Security, EveryDNS, etc. have established a disturbing trend, and unless we address this issue now, there might soon be an avalanche of DDoS attacks crippling the entire Internet infrastructure. The stateless nature and destination-oriented routing of the Internet make tracking attackers who employ source address spoofing a difficult problem to address [1] [2]. The need of the hour is a technique that not only traces attackers but also aids in effective mitigation of ongoing attacks [3].
To be realistically applicable on an Internet scale, the proposed schemes must be incrementally deployable, scalable, require minimal changes to existing hardware, maintain high accuracy during large-volume attacks, resist tampering by spoofed data injection, and require very few packets to complete traceback while simultaneously triggering mitigation, amongst other requirements. The traceback schemes in the literature address the problem of collecting information about individual packet forwarding agents and collating this data to obtain attack source/path statistics, while the mitigation schemes address the problem of dropping malicious packets using the concept of path identifiers and do not overtly concern themselves with identifying the attackers themselves. We analyze the traceback problem here as independent data representation and data transmission issues. We also propose a novel data model and metric to help us better evaluate traceback schemes. We further exploit the concept of path signatures so that the proposed traceback scheme can additionally support effective mitigation, thus realistically obtaining the best of both worlds. The rest of the paper is organized as follows. Section 2 reviews the different schemes known, while Section 3 presents the basic principle and motivation behind the proposed solution. Sections 4 and 5 discuss the data representation and data transmission phases, respectively. Section 6 analyzes the performance tradeoffs, while Section 7 concludes.
2 Related Work
The earliest work on DDoS defense led to the concept of network traceback [4] by Burch and Cheswick. Bellovin et al. proposed ICMP-based out-of-band messaging in iTrace [5], while Snoeren et al. proposed SPIE [2] employing packet logging, which was subsequently improved by Li et al. in [6]. Belenky and Ansari proposed a deterministic packet marking scheme in [7], while Savage et al. proposed a probabilistic packet marking (PPM) technique in [1], with subsequent enhancements made by others in [8] [9] [10] [11]. IP address fragmentation for efficient packet marking and its vulnerability to attacker-induced noise have been studied in [12] and [13], respectively. Recently, various encoding techniques have been used to progressively improve the performance of PPM schemes, as in Tabu marking [14], Local Topology marking [15], Space-Time encoding [16], Color Coding [17], and the use of Huffman Codes [18], Algebraic Geometric Codes [19], etc. Additionally, various architectures for traceback have been explored, such as inter-domain traceback [20] and hybrid traceback [21] [22], in addition to some other radical approaches as in [23]. Research on mitigating DDoS attacks has proceeded in parallel, focusing on network ingress filtering [24], routing table enhancements as in SAVE [25], CenterTrack [26], and intelligent filtering [3]. The concept of path fingerprints was exploited by Yaar et al. in [27], and subsequently improved in [28]. Various other techniques involving path filtering [29] [30], statistical filtering [31] [32], and rate limiting [33] [34] have also been explored in the literature. Of late, the focus has been on unifying these strategies for greater deployment incentives and better defense capabilities for the Internet.
3 Motivation
We discuss a few reasons here that motivate and form the basis of our proposed solution. Our aim is to provide a single mechanism that supports both traceback and mitigation simultaneously with no additional overhead.

Unified Operation: Traditional PPM traceback schemes capture per-hop behavior, i.e., they transmit information about each intermediate router to the victim so that a global view of the routing path is eventually obtained over multiple packets. Thus traceback data is gathered incrementally and does not function on a per-packet basis: each packet corresponds to some router identifier fragment, and state maintenance is needed for mitigation. Similarly, traditional mitigation schemes capture end-to-end behavior, i.e., at some level of abstraction they generate fingerprints that identify the routing paths as opposed to the routers themselves. Thus mitigation operates at a per-packet granularity, but the lossy abstraction makes it impossible to perform accurate traceback [27]. Our approach entails slicing the entire traceback data (later modeled as a cube) along the lines of path signatures instead of individual router identifiers. We thus obtain multiple signatures per routing path, useful for mitigation at lower thresholds on a per-packet basis, which, when viewed collectively, also yield the required traceback data. We thereby build a homogeneous technique that seamlessly provides both traceback and mitigation capabilities, operating in parallel.

Traditional Restrictions: As the number of bits in the IP headers that can be overloaded to support traceback in the Internet is limited [1], we not only need to reduce the total number of bits to be marked, but also the number of bits marked per packet. Additionally, to support faster traceback, we also need to mark these bits across fewer packets. An ideal scheme would thus need to optimize these three different metrics simultaneously.

Practical Deployment: We assume that the victim has an upstream router graph [11] that serves as a lookup table for suspect routing paths. Hence, the scope of the problem is limited to identifying a unique path (essentially a unique set of connected routers) in the router graph, not in the entire Internet. As this is not an exhaustive set, we need to obtain only a certain subset of all router identifier fragments that ensures uniqueness in the router graph. Note that there exist multiple such subsets of different sizes in the router graph. Using router identifier fragments as the base data unit might lead to transmitting more fragments than required to uniquely identify a certain router, thereby wasting entire packets at times. In contrast, using path signatures as the base data unit ensures that every router has some data to convey to the victim in every packet. Thus we ensure maximum data diversity for the victim by increasing the average information (utility) gained per bit transmitted, and achieve traceback in far fewer bits (packets) due to better bit (packet) space utilization. We address these issues in two main phases: the data representation phase, where we address the different theoretical and technical issues by defining a novel data cube model for traceback and a new metric called utility to analyze
the model, and the data transmission phase where we address the practical implementation issues by employing limited data logging at intermediate routers.
4 Data Representation Phase
We present our data cube model and other fundamental concepts here that form the crux of our solution. The question we answer here is which bits do we prefer to send, as opposed to how we send them as addressed by Adler in [9]. Data Cube: The total information (in bits) needed by the victim to completely reconstruct a suspect routing path is called the Traceback Data and we represent it using the Data Cube in Fig.1(a). The y-axis in the cube represents the different routers along the routing path, while the x-axis represents the different fragments of the corresponding router identifiers. The z-axis in the cube represents the different bits in a particular fragment of the router identifier. Thus the cube represents the complete traceback data about the suspect Internet routing path. The shaded region in Fig.1(a) shows the router identifier for a particular router, including the different bits in all the different fragments. This 3-D representation helps us analyze the optimality of the different traceback schemes known. When a DoS attack is launched, the search space for the suspect routing path is the entire Internet (or rather the entire upstream router graph). The victim
Fig. 1. Traceback Data Cube Model & Different Transmission Schedules (panels (a)–(d))
then prunes this domain to a unique path, based on the traceback data it receives in an incremental fashion. It is obvious that the speed of the traceback operation is thus determined by the rate of receiving useful incremental information. Thus the transmission schedule used to transmit the entire traceback data to the victim plays a critical role in determining its response time to attacks. We define a metric called Utility, which helps us choose an optimal transmission schedule for the traceback data cube. The basic idea is to associate each of the bits in the data cube with different weights (or utilities), such that an objective of maximizing speed (or total utility) yields a natural optimal transmission (sequence) schedule.

Utility: We define the Information Utility of a particular bit [35] as the ratio of the reduction in the search space due to the receipt of that particular bit to the total original search space (Eqn. 1). The search space for a particular router having a k-bit identifier is evidently 2^k. Receipt of a single bit reduces the search space (uncertainty) to 2^{k−1}. A second subsequent bit reduces it to 2^{k−2}, and so on.

Utility (u_i) = (Search Space Reduction_i) / (Total Search Space)    (1)
Thus the utilities of the first, second, and mth bits of a k-bit message are as shown in Eqns. 2, 3. The terms first, second, and so on here do not refer to the MSB or LSB digits of the binary number; rather, they refer to the bit (any one of the k bits) that was transmitted first, second, and so on.

u_1 = (2^k − 2^{k−1}) / 2^k = 1/2 ;  u_2 = (2^{k−1} − 2^{k−2}) / 2^k = 1/2^2    (2)

u_m = (2^{k−(m−1)} − 2^{k−m}) / 2^k = 1/2^m    (3)

The total utility of all bits of a message is shown in Eqn. 4. The 1/2^k term represents the self-information of a k-bit message; in other words, it represents the probability of finding that particular k-bit message in the entire search space, and is the implicit information it possesses. Thus the total utility of the data cube having r intermediate routers is ≈ r, where each router identifier has utility ≈ 1.

Σ_{i=1}^{k} u_i = Σ_{i=1}^{k} 1/2^i = 1 − 1/2^k    (4)
We thus see that not every bit of information in the traceback data cube has the same utility, as has been assumed by researchers to date. The utility of a bit is governed by how much information has already been transmitted to the victim about a particular slice (router identifier) in the cube. The higher the utility achieved in a certain fixed number of packets, the smaller the search space for the suspect routing path, and hence the faster the traceback process. We now use these novel concepts of data cube and utility to derive higher utility rate transmission schedules for the data cube, and hence faster traceback schemes.
Utility Analysis: We analyze the basic PPM-based traceback scheme [1] here, as it has been the most widely studied technique in the literature, and as its analysis holds true for all the different flavors of PPM schemes proposed to date. We quantify the utility rate of the basic PPM scheme, clearly demarcating its operating region in the utility space and thus illustrating its sub-optimal performance.
Fig. 2. Utility Rate Comparison
When the 3-D data cube is viewed as a 2-D surface (xy-plane), each unit cell corresponds to a vector (z-axis), as indicated in Figs. 1(b), 1(c). The PPM scheme now reduces to randomly choosing a unit cell (with replacement) from the 2-D surface and sending the corresponding bit-vector to the victim. The receipt of all bit-vectors according to some transmission schedule concludes the traceback process. If the r router identifiers are each split into k fragments, each m bits long, then the utilities of the first, second, and tth fragments for each router identifier are as shown in Eqns. 5, 6.

u_F1 = 1 − 1/2^m ;  u_F2 = (1/2^m) · (1 − 1/2^m)    (5)

u_Ft = Σ_{i=m(t−1)+1}^{mt} u_i = (1/2^{m(t−1)}) · (1 − 1/2^m)    (6)
Hence, the highest (lowest) utility rate is achieved when all high (low) utility fragments are scheduled first for transmission. Consider the transmission schedule in Fig. 1(b), where all routers send their first fragment, then their second fragment, and so on. Also consider another transmission schedule in Fig. 1(c), where a particular router sends all its fragments before the next router is allowed to transmit, as in an implicit token-passing system. These represent the highest and lowest utility rate transmission schedules, respectively. It is easily seen that any other transmission schedule attains a utility rate in the operating region bounded by the inner two curves in Fig. 2.
Proposed Transmission Schedule: Now consider a different transmission schedule, where we again view the data cube as a 2-D surface (xz-plane), where each unit cell corresponds to a vector (y-axis), as indicated in Fig. 1(d). We thus logically move away from the concept of individual router identifiers to that of path signatures. All the routers now send their first bit, then their second bit, and so on in each packet. Any packet now has only one bit from each router in order, and is hence classified as a path fingerprint (signature). The utilities for the first, second, and tth packets are as shown in Eqn. 7.

u_P1 = m/2 ;  u_P2 = m/2^2 ;  u_Pt = m · (1/2^t) = m/2^t    (7)

The operating region in utility space of the proposed transmission schedule is bounded by the outer two curves in Fig. 2. The lower bound here is caused by the Leader Selection Problem discussed later. In Fig. 2, we assume r = 16 (routers), k = 4 (fragments), m = 16 (bits), for illustrative purposes only.
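To make the utility-rate comparison concrete, the following sketch (ours; purely illustrative) tabulates the per-step utilities of the fragment-first PPM schedule (Eqn. 6) and the proposed bit-per-router schedule (Eqn. 7), using the Fig. 2 parameters r = 16, k = 4, m = 16.

r, k, m = 16, 4, 16   # routers, fragments per identifier, bits per fragment

def u_Ft(t: int) -> float:
    # Utility of a router's t-th fragment (Eqn. 6)
    return (1 - 2.0 ** -m) / 2.0 ** (m * (t - 1))

def u_Pt(t: int) -> float:
    # Utility of the t-th path-signature packet (Eqn. 7)
    return m / 2.0 ** t

# Fragment-first schedule: step t delivers fragment t of all r routers.
frag_first = [r * u_Ft(t) for t in range(1, k + 1)]
# Path-signature schedule: packet t delivers bit t of every identifier.
path_sig = [u_Pt(t) for t in range(1, k * m + 1)]

print(frag_first[:2])   # utility concentrated in the very first fragments
print(path_sig[:4])     # utility spread across many early packets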
5 Data Transmission Phase
Implementing the proposed scheme requires distributed coordination among all the intermediate routers to resolve two main issues. First, each router needs to know the exact bit position in the IP headers that it is supposed to mark, based on its distance from the victim. Second, the first router, closest to the attacker, needs to decide which one of the multiple path signatures is to be marked on a certain packet by all the subsequent routers along the suspect routing path.
Fig. 3. Traceback Buffer Implementation
We adopt a queue implementation to resolve the first problem [28]. Any new router mark is pushed at the buffer head, causing some bits at the tail to overflow (Fig. 3). Thus we ensure that, without any explicit messaging, we retain the marks of the last k routers in the traceback field of the IP packet header; a sketch of this buffer appears after the Leader Selection discussion below. To prevent the queue from being hijacked by attacker-injected spoofed data in the absence of overflow, we need the first router along the routing path to reset the traceback buffer. Additionally, it also needs to probabilistically choose a particular path signature to be marked on a certain packet. As there is no consensus in the research community on how to determine the first router, we dilute the definition of a first router to mean the router farthest from the victim in a trust domain (ISP PoP, gateway router, etc.) [12]. Although this
limits the depth of traceback that can be achieved, mitigation accuracy is enhanced because the attackers can perform virtually no data tampering.

Leader Selection Problem: As an alternative to the trust domain, we could also sub-optimally allow each router to probabilistically choose to be the leader (or first router). This probabilistic selection was first explored in [1]. The probabilistic worst case, where the router closest to the victim always chooses to be the leader, results in the lower bound for the proposed scheme in Fig. 2.
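The marking buffer of Fig. 3 behaves like a fixed-width shift queue; a minimal sketch (ours, with an assumed buffer width expressed in whole marks rather than the actual IP-header bit layout) of the push-at-head, overflow-at-tail behavior:

from collections import deque

class TracebackBuffer:
    # New router marks are pushed at the head; the oldest marks fall off
    # the tail, implicitly retaining the marks of the last k routers.
    def __init__(self, width_marks: int):
        self.q = deque(maxlen=width_marks)

    def reset(self):
        # Performed by the leader (first router) to evict spoofed data.
        self.q.clear()

    def push_mark(self, mark: int):
        self.q.appendleft(mark)

buf = TracebackBuffer(width_marks=4)
buf.reset()                        # leader resets the traceback field
for router_mark in (0b1011, 0b0010, 0b1110, 0b0001, 0b0111):
    buf.push_mark(router_mark)     # the first mark has overflowed by now
print(list(buf.q))                 # marks of the last 4 routers, newest first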
Fig. 4. Hybrid Traceback Architecture
Hybrid Architecture: To improve the average-case performance of Probabilistic Leader Selection (PLS), we now propose the hybrid traceback architecture (Fig. 4), employing both packet logging and marking techniques [21]. Here, we additionally require the leader to cache the current traceback buffer and the path signature index before overwriting them. Every other (non-leader) router, during marking, compares the current traceback buffer and index with its cache (if present), and on a prefix match caches and then marks the longer prefix. The example in Fig. 4 shows a packet where both routers 1 and 4 contend to be the leader; consequently, the marks from routers 1, 2, 3 are lost (but cached at router 4). In a subsequent packet, where router 2 is the leader, router 4 can augment the traceback buffer with the cached mark of router 1 as well. To keep the storage overhead small, routers can optionally choose to turn this caching off. Thus selective pipelined distributed network storage of traceback data yields a significant improvement in achieving higher transmission utility rates, and hence faster traceback using far fewer packets.
6 Analysis and Experimental Evaluation
We compare the basic PPM-based traceback scheme and the proposed utility-based scheme here, while also evaluating the effect of the hybrid architecture.

Number of Packets (w.r.t. PPM): Let there be r routers along the routing path, each having a marking probability of p, and whose identifiers each have k m-bit fragments. The probability of receiving a mark from a router i hops away is Pi(Mm) (Eqn. 8(a)). If we conservatively assume that marks from all routers appear with the same likelihood as those from the furthest router, then the probability that
a packet delivers a mark from some router is P(Mm) (Eqn. 8(b)). From the generalized Coupon Collector Problem [36] [37] in probability theory, and the detailed analyses in [1] [17], the number of packets Xm needed to reconstruct the routing path from all the different fragments for the PPM-based scheme is given by Eqn. 9.

P_i(M_m) = p(1 − p)^{i−1} ;  P(M_m) ≥ rp(1 − p)^{r−1}    (8)

E[X_m] < (k · log_e(kr)) / (p(1 − p)^{r−1})    (9)

For the proposed utility-based scheme (known leader), we have km (r-bit) fragments (vectors), and hence the number of packets Xr needed to reconstruct the routing path from all the different packets is given by Eqn. 10.

E[X_r] < (k · log_e(km)) / (p(1 − p)^{r−1})    (10)
If we choose r = m, then our proposed scheme performs no worse than the PPM-based schemes in reconstructing the suspect routing paths from the information obtained from all the different fragments (packets).

Utility Rate (w.r.t. PPM): Often we need to shortlist a few candidate suspect routing paths rather than identify all unique attack paths during high-volume DDoS attacks, as the immediate need is always to choose optimal (upstream) network locations where mitigation may be deployed. The constraint is usually to identify the most likely suspect paths in the top 2-5 percentile. Such a constraint not only reduces the effect of the inaccuracies of the traceback process, but also reduces the response time and operational complexity of the mitigation process. We performed an extensive evaluation of the basic PPM-based scheme and the proposed utility-based traceback scheme, with the first router defined at the trust boundary, over a candidate router graph obtained from Rocketfuel [38].
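A quick numerical reading of the bounds in Eqns. 9 and 10 (our sketch; the parameter values are illustrative assumptions, not the paper's experimental settings):

import math

def expected_packets_ppm(r, k, p):
    # Eqn. 9 upper bound: E[X_m] < k * ln(kr) / (p * (1-p)^(r-1))
    return k * math.log(k * r) / (p * (1 - p) ** (r - 1))

def expected_packets_utility(r, k, m, p):
    # Eqn. 10 upper bound: E[X_r] < k * ln(km) / (p * (1-p)^(r-1))
    return k * math.log(k * m) / (p * (1 - p) ** (r - 1))

r, k, m, p = 16, 4, 16, 1.0 / 16       # illustrative; note r = m here
print(round(expected_packets_ppm(r, k, p)))
print(round(expected_packets_utility(r, k, m, p)))   # equal when r = m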
Fig. 5. Top k percentile Filtering
Table 1. Number of Packets for Traceback (16-bit pkt)

Percentile   PPM (50% Legacy)   PPM (No Legacy)   Our Scheme (50% Legacy)   Our Scheme (No Legacy)
2            167                111               15                        9
5            119                80                11                        7
10           91                 60                9                         5
20           63                 41                6                         4
We simulated 10,000 independent instances of DDoS attacks to obtain a high-precision average-case performance for the two schemes. Along the lines of the probabilistic calculations above, Fig. 5 (Table 1) shows the actual number of packets required to filter down to the top 2, 5, 10, 20 percentile suspect routing paths for the two schemes. We evaluate two scenarios, namely No Legacy, where all intermediate routers are traceback capable, and 50% Legacy, where half of them are legacy routers incapable of traceback. It is to be noted that the performance degradation even in the face of limited deployment is very minimal. To trigger the constraints, the number of packets required by our proposed scheme is roughly an order of magnitude smaller than that required by the PPM-based scheme. Thus the proposed utility-based scheme achieves significantly faster response times to DDoS attacks, while still retaining the same worst-case performance (where we construct the unique attack path) as PPM-based traceback schemes.

Mitigation Strategy: We do not delve into this aspect in detail due to space constraints. The proposed scheme transmits some path signature in every packet to the victim. The victim can then choose to maintain independent thresholds for packet filtering for each of these signatures. Additionally, as it can correlate different path signatures as belonging to fixed suspect paths once traceback completes [11], these independent thresholds can then be appropriately scaled to reflect the detected correlation. Thus our unified framework not only provides both traceback and mitigation capabilities, but also speeds up the traceback process.

Hybrid Architecture Effect: In an ideal situation, every protected server would be associated with some network trust boundary or leader, and we would then obtain the best performance, as described above. In cases where this is not true, PLS (possibly with the hybrid architecture) provides a working alternative, although sub-optimal compared to the scheme analyzed above. The PPM schemes and the proposed scheme with a known leader assure us that every packet contains the entire corresponding bit-vector. However, with PLS, the leader always chooses to erase the traceback buffer, and hence the buffer contains only a subset of that bit-vector. For simplicity, we assume that PLS can equi-probably choose any router, and hence the average bit-vector length would be half the number of routers along the routing path (r/2). Hence, for the proposed scheme with a PLS-chosen leader, we have km (r-bit) fragments, and the number of packets Xr1 needed to reconstruct the routing path is given by Eqn. 11.
E[X_r1] < r · (k · log_e(km)) / (p(1 − p)^{r−1})    (11)
Using the router maps and parameters as above, we also evaluate the effect of the hybrid traceback architecture on the proposed utility-based traceback scheme employing PLS. Fig. 6 shows the actual number of packets needed to filter down to the top 2, 5, 10, 20 percentile suspect routing paths for this scheme. As expected, we notice that the number of packets needed goes down sharply as the required percentile accuracy relaxes. We also notice that the number of packets needed here is much higher than for the basic utility-based scheme with a known leader, due to the probabilistic nature of selection, as described in Eqns. 9, 10, 11.
Fig. 6. Hybrid Architecture Effect
We additionally capture a metric called Router Logging Depth, which indicates the number of upstream routers whose signatures (bits) are being cached. While depth 0 indicates no caching, depth n (n > 0) indicates maximum caching. It is easily seen that the additional router storage for traceback logging increases linearly with the logging depth. However, we see diminishing returns with increasing logging depth in Fig. 6, and thus individual routers can tune the cache size appropriately without affecting the speed of the traceback process.
7 Conclusion
As malicious entities unleash an increasing number of DDoS attacks on the Internet, it has become imperative not only to track them and hold them liable (traceback), but also to limit their capabilities and render them ineffective (mitigation). In this paper, we propose a unified framework that provides both traceback and mitigation capabilities simultaneously, operating in parallel. We view traceback as essentially a data transmission problem in which we send a data cube across, and analysis of this model using novel metrics helps us understand
and design faster traceback schemes. We also exploit the concept of path signatures and hence support attack mitigation in an implicit manner. Thus our proposed defense mechanism unifies both traceback and mitigation paradigms into a single protocol yielding better deployment incentives and greater defense capabilities in today’s Internet. It is easily seen that any traceback optimization in literature can easily be cast as an optimization along a certain dimension of the proposed data cube model. The orthogonality of the different dimensions indicates that these enhancements need not necessarily be mutually exclusive. As part of our future work, we plan to evaluate a concurrent deployment of the different orthogonal optimizations, to achieve even faster attack response times.
References

1. Savage et al., "Practical Network Support for Traceback," in SIGCOMM, 2000.
2. A. C. Snoeren et al., "Hash-Based IP Traceback," in SIGCOMM, 2001.
3. M. Sung, J. Xu, "Intelligent Packet Filtering: A Novel Technique for Defending against DDoS Attacks," in IEEE TPDS, 2003.
4. H. Burch, B. Cheswick, "Tracing Anonymous Packets to Their Approximate Source," in Proc. USENIX LISA, Dec. 2000.
5. S. M. Bellovin, "ICMP Traceback Messages," Internet Draft, Mar. 2000.
6. Li et al., "Large-Scale IP Traceback in High-Speed Internet: Practical Techniques & Theoretical Foundation," in IEEE Symp. on Security & Privacy, 2004.
7. A. Belenky, N. Ansari, "IP Traceback with Deterministic Packet Marking," in IEEE Communication Letters, Apr. 2003.
8. D. Song, A. Perrig, "Advanced and Authenticated Marking Schemes for IP Traceback," in Proc. IEEE INFOCOM, 2001.
9. M. Adler, "Tradeoffs in Probabilistic Packet Marking for IP Traceback," in Proc. STOC, pp. 407-418, 2002.
10. T. Peng, C. Leckie, K. Ramamohanarao, "Adjusted Probabilistic Packet Marking for IP Traceback," in Proc. Networking, 2002.
11. A. Yaar, A. Perrig, D. Song, "FIT: Fast Internet Traceback," in INFOCOM, 2005.
12. I. Hamadeh, G. Kesidis, "Performance of IP Address Fragmentation Strategies for DDoS Traceback," in Proc. IEEE IPOM, 2003.
13. M. Waldvogel, "GOSSIB vs Traceback Rumors," in ACSAC, 2002.
14. M. Ma, "Tabu Marking Scheme for Traceback," in IPDPS, 2005.
15. B. Al-Duwairi, T. Daniels, "Topology Based Packet Marking," in ICCCN, 2004.
16. M. Muthuprasanna, G. Manimaran, "Space-Time Encoding for DDoS Attack Traceback," in Proc. IEEE GLOBECOM, 2005.
17. M. Muthuprasanna, G. Manimaran, M. Alicherry, V. Kumar, "Coloring the Internet: IP Traceback," in Proc. ICPADS, 2006.
18. K. Choi, H. Dai, "A Marking Scheme Using Huffman Codes for IP Traceback," in Proc. ISPAN, 2004.
19. C. Bai et al., "Algebraic Geometric Code Based IP Traceback," in IPCCC, 2004.
20. Y. Sawai, M. Oe, K. Iida, Y. Kadobayashi, "Performance Evaluation of Inter-Domain IP Traceback," in Proc. IEEE ICT, 2003.
21. B. Al-Duwairi, G. Manimaran, "Novel Hybrid Schemes Employing Packet Marking & Logging for Traceback," in IEEE TPDS, 2005.
22. Gong et al., "IP Traceback Based on Packet Marking & Logging," in ICC, 2005.
23. M. Walfish, M. Vutukuru, H. Balakrishnan, D. Karger, S. Shenker, "DDoS Defense by Offense," in Proc. SIGCOMM, 2006.
24. P. Ferguson, D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks Which Employ IP Source Address Spoofing," RFC 2267, 1998.
25. J. Li, J. Mirkovic, M. Wang, M. Reiher, L. Zhang, "SAVE: Source Address Validity Enforcement Protocol," in Proc. IEEE INFOCOM, 2001.
26. R. Stone, "CenterTrack: An IP Overlay Network for Tracking DoS Floods," in Proc. USENIX Security Symposium, 2000.
27. A. Yaar, A. Perrig, D. Song, "Pi: A Path Identification Mechanism to Defend against DDoS Attacks," in Proc. IEEE Symposium on Security and Privacy, 2003.
28. A. Yaar, A. Perrig, D. Song, "StackPi: New Packet Marking and Filtering Mechanisms for DDoS and IP Spoofing Defense," in IEEE JSAC, pp. 1853-1863, Oct. 2006.
29. C. Jin, H. Wang, K. G. Shin, "Hop-Count Filtering: An Effective Defense against Spoofed DDoS Traffic," in ACM CCS, 2003.
30. A. Keromytis, V. Misra, D. Rubenstein, "SOS: An Architecture for Mitigating DDoS Attacks," in IEEE JSAC, pp. 176-188, Jan. 2004.
31. Y. Kim, W. Lau, M. Chuah, J. Chao, "PacketScore: A Statistics-Based Overload Control against DDoS Attacks," in Proc. IEEE INFOCOM, 2004.
32. T. Peng, C. Leckie, K. Ramamohanarao, "Protection from DDoS Attacks Using History-Based IP Filtering," in Proc. IEEE ICC, 2003.
33. J. Ioannidis, S. M. Bellovin, "Implementing Pushback: Router-Based Defense against DDoS Attacks," in Proc. NDSS, 2002.
34. D. Yau, J. Lui, F. Liang, "Defending against DDoS Attacks with Max-Min Fair Server-Centric Router Throttles," in Proc. IWQoS, 2002.
35. C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, vol. 27, pp. 379-423 & 623-656, 1948.
36. H. von Schelling, "Coupon Collecting for Unequal Probabilities," American Mathematical Monthly, 1954.
37. S. Lu, S. Skiena, "Filling a Penny Album," CHANCE, 2000.
38. N. Spring, R. Mahajan, D. Wetherall, "Measuring ISP Topologies with Rocketfuel," in Proc. SIGCOMM, 2002.
Integrity-Aware Bandwidth Guarding Approach in P2P Networks*

Wen-Hui Chiang1, Ling-Jyh Chen2, and Cheng-Fu Chou3
1 University of Southern California, Los Angeles, CA 90089, USA
2 Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan
3 National Taiwan University, Taipei, Taiwan
[email protected]
Abstract. Most Internet-based collaborative computing systems face a major problem: freeriding. The abundance of freeriders, and the load imbalance it creates, punishes those peers who do actively contribute to the network by forcing them to overuse their resources; hence, the overall system performance degrades quickly. The goal of this paper is to provide an efficient approach to distinguishing dishonest peers from honest peers. The key idea of our approach is to make use of the relationship between the perceived throughput and the available bandwidth. First, we conduct a comprehensive study of available bandwidth estimation tools. Next, we propose an integrity-aware bandwidth guarding algorithm, designed around the perceived throughput and the available bandwidth estimate. Finally, simulation results illustrate that our approach can correctly identify dishonest peers and be of great help in constructing a better overlay structure for many peer-to-peer and multicast applications.

Keywords: peer-to-peer, available bandwidth, freeriding.
1 Introduction

A peer-to-peer (or P2P) computer network is a network that relies primarily on the computing power and bandwidth of the participants in the network rather than concentrating them in a relatively small number of servers. An important characteristic of peer-to-peer networks is that all clients provide resources, including bandwidth, storage space, and computing power. Thus, as nodes arrive and demand on the system increases, the total capacity of the system also increases. Peer-to-peer computing has changed the way people interact in the areas of information sharing and collaboration; hence, in the past few years, peer-to-peer applications have become more and more popular on the Internet. Most Internet-based collaborative computing systems, including P2P file sharing systems such as Gnutella, Napster, and Kazaa, potentially face the problem of
* This work was partially supported by the National Science Council and the Ministry of Education of ROC under contract Nos. NSC95-2221-E-002-103-MY2 and NSC95-2622-E-002-018.
freeriding: that is, users or peers consume resources of the system without contributing anything in return. Many studies [15][17] have shown that up to 70% of Gnutella clients do not share any files, and that nearly 50% of all responses are returned by 1% of the peers. This abundance of freeriders, and the load imbalance it creates, punishes those peers who do actively contribute to the network by forcing them to overuse their resources (e.g., bandwidth). Hence, the system performance degrades quickly. Previous research [17] in this area has focused primarily on currency-based systems, wherein peers gain currency for uploading files and spend currency when downloading files. In [14], the EigenTrust algorithm is proposed, in which the global reputation of each peer i is given by the local trust values assigned to peer i by other peers, to construct reputation management. On the other hand, BitTorrent [16] employs a tit-for-tat incentive mechanism to reduce freeriding and increase user cooperation. However, it has also been found that in torrents with a large number of seeders, the BitTorrent tit-for-tat mechanism may not succeed in producing a disincentive for freeriding: in such torrents, freeriders may actually experience faster download times than cooperating peers. Hence, to cope with the above problem, the goal of this paper is to devise an efficient approach to distinguishing honest peers from dishonest peers (e.g., freeriders), which consume resources of the system but contribute little or nothing in return.

Since Internet traffic is quite dynamic over time, it is not easy to distinguish dishonest peers from honest peers based on the perceived throughput alone. That is, a throughput-based approach cannot determine whether a throughput degradation is due to network congestion or to misbehaving peers. In this work, we aim to propose an efficient method to distinguish dishonest peers from honest peers. The key idea of our approach is to observe the relationship between the perceived throughput and the available bandwidth. When a network connection is affected by congestion, the available bandwidth and the throughput should both decrease. On the other hand, when a dishonest peer reduces its sending rate, the perceived throughput decreases irregularly, but the available bandwidth should remain the same or increase. There are many peer-to-peer-based services and applications in which our integrity-aware bandwidth guarding scheme can be of great help, for example, P2P file sharing systems and P2P-based multicast applications, where fair sharing is expected and peers are encouraged to contribute as much as possible. By selecting a proper peer to cooperate with, not only can the throughput of each peer increase, but the system performance can also improve substantially. We illustrate these points in Section 3. Finally, we note that the proposed approach can be used in conjunction with a currency-based approach, an EigenTrust-based approach, or a reputation system to provide a more efficient way to identify dishonest peers.

The rest of the paper is organized as follows. In Section 2, we introduce the general framework of our approach and several potential candidates for available bandwidth estimation tools, and explain our algorithm in detail. In Section 3, we first conduct an accuracy study of the proposed approach and then illustrate how integrating our approach can substantially improve the performance of multicast or peer-to-peer applications. Finally, future work is given in Section 4.
2 Framework

In this work, we propose a fast and precise bandwidth guarding approach, which is an active detection method that takes peer integrity into account. The major idea of the integrity-aware bandwidth guarding approach is to explore the relationship between the available bandwidth and the throughput, and thereby differentiate honest peers from dishonest peers. As discussed before, most detection schemes are based on observing the connection throughput. However, such passive detection methods cannot identify whether a degradation in throughput stems from a peer's selfish behaviour or from network congestion. To address this problem, we let each host infer its corresponding peer's behaviour by investigating both the perceived throughput and the available bandwidth: a peer may be able to decrease the connection throughput, but it is difficult for it to manipulate the available bandwidth of the path. There are two major components in our approach: (a) selection of a proper available bandwidth probing scheme, and (b) an intelligent irregular throughput detection algorithm. In the following, we first compare several well-known available bandwidth estimation schemes and then present how to integrate the chosen probing scheme into our framework to discover dishonest peers.

2.1 Studies on Available Bandwidth Estimation Tools

Available bandwidth is useful information for route selection, quality-of-service verification, and traffic engineering in overlay networks and the Internet. Available bandwidth is defined as the unused capacity of a link. Figure 1 illustrates the relationship between the link capacity, the cross traffic, and the available bandwidth: C is the capacity of the link and A[0,T] is the traffic carried from time 0 to time T. The available bandwidth is the average unused bandwidth over some time interval T. Thus,

B[0,T] = max(C − A[0,T]/T, 0)

In general, bandwidth estimation techniques are classified into single-packet methods and packet-pair methods. Single-packet methods estimate the link capacity by measuring the difference between the round-trip time (RTT) to one end of an individual link and that to the other end of the same link; single-packet tools include pathchar, clink, and pchar. Packet-pair methods send groups of back-to-back packets, i.e., packet pairs, to a server which echoes them back to the sender; the spacing between the packets of a pair is determined by the bottleneck link. Example tools include NetDyn probes, bprobe, nettimer, and Spruce [1]. In this paper, we focus on comparing three available bandwidth estimation tools: Spruce [1], Iperf [7], and pathChirp [2].
Fig. 1. The definition of available bandwidth (y-axis: bits/s, showing the link capacity and the traffic A[0,T]; x-axis: time from 0 to T)
2.2 Comparison of Available Bandwidth Estimation Tools

Since the performance of our integrity-aware bandwidth guarding approach depends on the efficiency of the available bandwidth estimation tool, we need to choose a tool with lightweight probing traffic and accurate estimation on multi-bottleneck network paths. Thus, we compare three available bandwidth estimation tools, i.e., Spruce, pathChirp, and Iperf, in single-bottleneck, pre-bottleneck, and post-bottleneck environments.
Fig. 2. (a) The setup of the one-bottleneck experiment (sender → 10 Mbps bandwidth controller → receiver). (b) The comparison of estimation tools on a bottleneck path.
First, we set up a simple experiment as shown in Fig. 2(a) to evaluate the accuracy of the estimation tools. The bottleneck controller runs FreeBSD with Dummynet. The experimental result is depicted in Fig. 2(b); we can see that all tools perform well in the single-bottleneck case. In the next experiment, we create pre- and post-bottleneck scenarios for the targeted connection, preparing six computers to construct the emulation experiment. In Fig. 3, the sender and the receiver are hosts running Linux Fedora (kernel 2.6.11-1.1369_FC4smp).
Fig. 3. The pre-bottleneck and post-bottleneck experiment environments (sender → 40 Mbps and 10 Mbps bandwidth controllers → receiver, with a cross traffic generator and a cross traffic receiver)
The two bandwidth controllers are hosts running FreeBSD 5.4-RELEASE. The cross traffic generator and receiver are notebook computers running Linux Fedora (kernel 2.6.11-1.1369_FC4smp). On the two bandwidth controllers, we installed Dummynet [16] to limit the link capacity as desired: the first bandwidth controller's capacity is 40 Mbps, and the second is 10 Mbps. The cross traffic generator and cross traffic receiver use a Poisson traffic generator to produce cross traffic, which we control to range from 25 Mbps to 45 Mbps. When the cross traffic exceeds 30 Mbps, the bottleneck moves to the first controller; this corresponds to the post-bottleneck experiment.
Fig. 4. (a) The pre-bottleneck and (b) the post-bottleneck experiment
Fig. 4(a) illustrates the pre-bottleneck experimental result; the line with crosses represents the ideal available bandwidth, which equals the capacity of the path minus the cross traffic at the bottleneck. In this experiment, Iperf performs best, since its estimate is very close to the ideal result, whereas Spruce performs poorly. We then run the post-bottleneck experiment shown in Fig. 4(b) to compare the three available bandwidth estimation tools. Iperf again performs best and obtains the most accurate estimate. The bandwidth estimated by pathChirp is
a little higher than the actual value. We note that Spruce fails to estimate the available bandwidth in the post-bottleneck case.
Fig. 5. The overhead comparison of pathChirp, Spruce, and Iperf
Now we turn our attention to the probing overhead of the estimation tools; the probing overhead should be as low as possible, i.e., the probing data should not affect normal traffic. In Fig. 5, we see that Iperf usually incurs a probing overhead of more than 10 Mbps, while the overhead of pathChirp is around 0.1 Mbps and Spruce's overhead is less than 0.01 Mbps. In conclusion, based on the accuracy and overhead comparison, we choose pathChirp as our available bandwidth estimation tool: it performs well even in multi-bottleneck network environments, with lightweight probing overhead.

2.3 Irregular Throughput Detection Algorithms

To identify a dishonest peer, the main idea of our approach is to observe the relationship between the received throughput and the available bandwidth. When a network connection is affected by congestion, the available bandwidth and the throughput should both decrease. However, when a user cheats by cutting down its transmission rate, the observed throughput tends to decrease irregularly while the available bandwidth is likely to stay the same or increase. Based on this observation, we design our algorithm to detect such irregular patterns and expose the misbehavior of dishonest peers. We first introduce the notation, shown in Table 1, used to explain the details of our algorithm. We define the Expected Throughput, RE[t], as the expected throughput at time t in the absence of cheating. Based on RE[t], we compare the current throughput with the expected throughput to check whether the current throughput is reasonable. We apply an Exponentially Weighted Moving Average (EWMA) to compute RE[t]
in our algorithm. In Algorithm 1, depicted in Figure 6, we update RE[t] every time interval τ. During each interval τ, we calculate the average perceived throughput Ravg[t−τ] and compare it with RE[t]. If the throughput decreases by more than αRE[t] while the available bandwidth A[t] increases beyond (1+ρ)A[t−τ], the throughput in interval τ is classified as cheating traffic and is not counted in RE[t]. Here α is defined as 1.6 · Rstd[t−τ] / Ravg[t−τ], the coefficient of variation of the throughput in interval τ: we assume that the short-term variation of the traffic can be approximated by a normal distribution and use the 95% confidence level, i.e., 1.6 times the standard deviation.

Table 1. Definition of Notations

RE[t]       Expected throughput (achieved rate) at time t
Ravg[t−τ]   Average throughput in time interval τ
Rstd[t−τ]   Standard deviation of the throughput in time interval τ
A[t]        Available bandwidth at time t
ρ           Available bandwidth estimation error rate
ω           Weight parameter used in computing the expected throughput
α           Coefficient of variation of the throughput in time interval τ
Algorithm 1 Compute Expected Throughput
1: if |RE[t] − Ravg[t−τ,t]| ≥ αRE[t] and A[t] < (1+ρ)A[t−τ] then
2:   return RE[t] ← RE[t−τ]×(1−ω) + Ravg[t−τ,t]×ω
3: else
4:   return RE[t] ← RE[t−τ]
5: end if
Fig. 6. Algorithm 1 Compute Expected Throughput
In Algorithm 2, illustrated in Figure 7, we use the expected throughput RE[t] and the available bandwidth to identify whether a peer is cheating; α and ρ are defined as in Algorithm 1. We monitor the throughput and the available bandwidth, as shown in the second and third lines. If the throughput drops too fast while the available bandwidth increases beyond (1+ρ)A[t−τ], we consider the peer as possibly cheating and increase the counter possible_cheating. When possible_cheating exceeds the threshold μ, we flag the peer as a dishonest peer. The counter may rise temporarily and then fall again once network conditions return to normal; this mechanism avoids misjudging a peer that suffers a sudden abnormal network condition and recovers in the next time interval τ.

Algorithm 2 Irregular Throughput Detection
1: for all interval τ do
2:   if RE[t] − Ravg[t−τ,t] > αRE[t] then
3:     if A[t] > (1+ρ)A[t−τ] then
4:       possible_cheating ← possible_cheating + 1
5:       if possible_cheating > μ then
6:         return cheating_detected
7:       end if
8:     else
9:       possible_cheating ← possible_cheating − 1
10:      return nothing
11:    end if
12:  end if
13: end for
Fig. 7. Algorithm 2: Irregular Throughput Detection
We also take the available bandwidth A[t] into consideration: if A[t] increases beyond (1+ρ)A[t−τ], the traffic may be due to cheating. We define ρ as the threshold on the estimation error. From the experiments shown in Figure 8, we observe that the estimation error drops below 5% within 129 seconds, i.e., 60 rounds. Hence, we set ρ to 5% and τ to at least 129 seconds in Algorithm 1, because our experiments show an error rate of up to roughly ±5% within 129 seconds. Finally, ω is the weight parameter used in computing the expected throughput; in this work we set ω to 0.8.
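To make the update and detection logic concrete, the following Python sketch combines Algorithms 1 and 2 under the parameter choices discussed above (ω = 0.8, ρ = 0.05). The class name, the bootstrap of RE[t], the choice of μ = 3, and the decision to fold every non-suspicious interval into the EWMA are our own illustration of the prose description, not the authors' implementation.

```python
# Illustrative sketch of the EWMA-based irregular throughput detection
# (Algorithms 1 and 2). Parameter names follow Table 1; mu is an assumed
# threshold. Requires at least two throughput samples per interval tau.
import statistics

class IrregularThroughputDetector:
    def __init__(self, omega=0.8, rho=0.05, mu=3):
        self.omega = omega           # EWMA weight for the expected throughput
        self.rho = rho               # available-bandwidth estimation error rate
        self.mu = mu                 # threshold on possible_cheating (assumed)
        self.expected = None         # RE[t]
        self.prev_abw = None         # A[t - tau]
        self.possible_cheating = 0

    def update(self, throughputs, abw):
        """Process one interval tau: throughput samples plus current A[t]."""
        r_avg = statistics.mean(throughputs)
        alpha = 1.6 * statistics.stdev(throughputs) / r_avg   # 1.6 * CoV
        detected = False
        if self.expected is None:
            self.expected = r_avg    # bootstrap RE[t] on the first interval
        elif (self.prev_abw is not None
              and self.expected - r_avg > alpha * self.expected
              and abw > (1 + self.rho) * self.prev_abw):
            # Throughput dropped sharply while available bandwidth rose:
            # suspicious interval, not folded into RE[t] (Algorithm 1).
            self.possible_cheating += 1
            detected = self.possible_cheating > self.mu       # Algorithm 2
        else:
            # Normal interval: update the EWMA and relax the counter.
            self.expected = (1 - self.omega) * self.expected + self.omega * r_avg
            self.possible_cheating = max(0, self.possible_cheating - 1)
        self.prev_abw = abw
        return detected
```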
Fig. 8. The estimation error of pathChirp
3 Results

In this section, we evaluate the performance of the integrity-aware bandwidth guarding approach through simulations using the NS-2 [10] simulator. We consider two different systems, peer-to-peer applications and multicast applications, in which the integrity-aware bandwidth guarding method can be of great help. The TCP-Friendly Rate Control (TFRC) protocol is used in all simulations, and we adopt pathChirp to estimate the available bandwidth. The performance metrics are (a) the error rate: the fraction of peers incorrectly classified by the detection scheme, (b) the average waiting time: the mean time to retrieve the targeted file over all peers, and (c) the longest waiting time: the amount of time for the slowest peer to retrieve the targeted file.

3.1 Accuracy Study: Peer-to-Peer Applications

In this experiment, we illustrate the accuracy of our approach, i.e., the success ratio of our algorithm in telling cheating peers apart from honest ones. We use Brite [18] to generate the topology, which follows the Waxman model [19]. There are 100 nodes and 200 links in total in our simulation topology. The bandwidth of each link follows a heavy-tailed distribution and ranges from 1 Mbps to 20 Mbps. In addition, we use the traditional throughput-based scheme as the baseline for comparison.

In the first experiment, we randomly select 50 connections and vary the percentage of dishonest peers from 20% to 80%. After some period of time, the dishonest peers reduce their sending rate to half of its original value. From Figure 9(a), we observe that the performance of the throughput-based approach and our integrity-aware approach is comparable. This is because, when the load on the network is low, the throughput is mainly determined by the sending rate of the peer, i.e., the throughput reflects the behavior of the peer. That is, when a dishonest peer decreases its sending rate, observing the change in its throughput alone is enough to identify most dishonest peers.

To explore the benefit of our integrity-aware bandwidth guarding approach, in the following experiment we consider a scenario in which background traffic is introduced into the network after 300 seconds, so that all connections suffer serious congestion from that point on; moreover, there is no dishonest peer in this experiment. From Table 2, it is not surprising that the throughput-based detection approach works poorly, i.e., its error rate is 100%: it cannot tell whether the degradation of the throughput is due to the peer's behavior or to the actual network condition. On the contrary, since our approach considers both available bandwidth and throughput, its error rate remains quite small, i.e., 6% in this case. Therefore, it is essential to consider both available bandwidth and throughput to detect cheating peers correctly.
Fig. 9. The error rate of the irregular throughput detection algorithm

Table 2. Error rate for honest peers

               Throughput detection   Available bandwidth and throughput
Honest peers   100%                   6%
3.2 Performance Study: Multicast Applications

The motivation for this experiment is to illustrate the advantage of using our bandwidth guarding approach to select better peers and construct a proper overlay structure for many systems and applications. That is, we want to investigate the impact of proper peer selection on the construction of the multicast structure. We note that the performance of a multicast system primarily depends on the underlying multicast tree: since content is delivered from the root toward the leaf nodes, a host would like to be placed as close as possible to the root in order to receive the content first. For example, a dishonest peer may claim a very large transmission bandwidth when the multicast tree is initially built. Once the multicast tree is constructed based on this false information, all peers placed under that dishonest host are likely to suffer a long waiting time before they obtain the required content or file.

The simulation environment is as follows. There are 50 nodes that want to join the multicast group. A dishonest peer announces a transmission bandwidth of 2 to 5 times its actual transmission bandwidth. After the root is decided, the naive method constructs the minimum spanning tree according to the claimed bandwidth of each peer. Alternatively, the root can use our integrity-aware bandwidth guarding approach to build a suitable minimum spanning tree. The performance metrics are the average waiting time and the longest waiting time, where the waiting time is the period that a peer spends retrieving the required content.
Fig. 10. (a) The average waiting time for different ratios of cheating peers; (b) the longest waiting time for different ratios of cheating peers
In Figure 10, we can see that the multicast application aided by our approach always outperforms the one using the naive approach, in terms of both the average waiting time and the longest waiting time. Furthermore, even when only a small fraction of dishonest peers, e.g., 20% of the participants, is introduced into the system, the system performance degrades quickly: dishonest peers claiming large transmission bandwidths usually obtain positions close to the root, so all hosts located under such a dishonest peer experience a longer waiting time before obtaining the required content. Therefore, the integrity-aware bandwidth guarding algorithm can substantially improve the performance of multicast applications.
4 Future Work

In this work, we have proposed an integrity-aware bandwidth guarding algorithm. This approach is able to correctly identify dishonest peers while avoiding misjudging honest peers that are merely affected by poor network conditions. In addition, our algorithm can be used to construct a better underlying structure, which can substantially improve the performance of multicast applications. Our ongoing work focuses on integrating our approach with other approaches to develop a better reputation system, which could provide more incentives to attract honest peers.
References

1. J. Strauss, D. Katabi, F. Kaashoek, A measurement study of available bandwidth estimation tools, in Proceedings of ACM SIGCOMM, 2003
2. V. J. Ribeiro, R. H. Riedi, R. G. Baraniuk, J. Navratil, L. Cottrell, pathChirp: Efficient available bandwidth estimation for network paths, in Proceedings of the 3rd Passive and Active Measurements Workshop, 2003
3. A. B. Downey, Using pathchar to estimate Internet link characteristics, in Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, 1999
4. J.-C. Bolot, End-to-end packet delay and loss behavior in the Internet, in Proceedings of ACM SIGCOMM, San Francisco, CA, September 1993
5. K. Obraczka, G. Gheorghiu, The performance of a service for network-aware applications, in Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, 1998
6. K. Lai, M. Baker, Nettimer: A tool for measuring bottleneck link bandwidth, in Proceedings of the USENIX Symposium on Internet Technologies, 2001
7. A. Tirumala, F. Qin, J. Dugan, J. Ferguson, K. Gibbs, Iperf, http://dast.nlanr.net/Projects/Iperf
8. L. Rizzo, Dummynet: A simple approach to the evaluation of network protocols, ACM SIGCOMM Computer Communication Review, 1997
9. M. Handley, J. Padhye, S. Floyd, and J. Widmer, TCP Friendly Rate Control (TFRC): Protocol Specification, IETF RFC 3448, January 2003
10. S. McCanne and S. Floyd, NS: Network Simulator, http://www-mash.cs.berkeley.edu/ns/
11. E. W. Zegura, K. L. Calvert, S. Bhattacharjee, How to model an internetwork, in Proceedings of IEEE INFOCOM, 1996
12. A. Medina, A. Lakhina, I. Matta, and J. Byers, BRITE: An Approach to Universal Topology Generation, in Proceedings of MASCOTS '01, Cincinnati, Ohio, August 2001
13. E. Adar and B. Huberman, Free Riding on Gnutella, First Monday, 5(10), Oct. 2000
14. S. Kamvar, M. Schlosser, and H. Garcia-Molina, The EigenTrust Algorithm for Reputation Management in P2P Networks, in WWW 2003, 2003
15. D. Hughes, G. Coulson, and J. Walkerdine, Free Riding on Gnutella Revisited: The Bell Tolls? IEEE Distributed Systems Online, 6(6), 2005
16. N. Andrade, M. Mowbray, A. Lima, G. Wagner, and M. Ripeanu, Influences on Cooperation in BitTorrent, in SIGCOMM P2P-ECON Workshop, 2005
17. P. Golle, K. Leyton-Brown, I. Mironov, and M. Lillibridge, Incentives for Sharing in Peer-to-Peer Networks, in WELCOM, 2001
18. A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE: An Approach to Universal Topology Generation, in Proceedings of MASCOTS, 2001
19. B. M. Waxman, Routing of multipoint connections, IEEE Journal on Selected Areas in Communications, 6(9):1617-1622, Dec. 1988
Measuring Bandwidth Signatures of Network Paths Mradula Neginhal, Khaled Harfoush, and Harry Perros Department of Computer Science North Carolina State University Raleigh, NC 27695 {msneginh,harfoush,hp}@cs.ncsu.edu
Abstract. In this paper, we propose a practical and efficient technique, Forecaster, to estimate (1) the end-to-end available bandwidth, and (2) the speed of the most congested (tight) link along an Internet path. Forecaster is practical since it does not assume any a priori knowledge about the measured path, does not make any simplifying assumptions about the nature of cross-traffic, does not assume the ability to capture accurate packet dispersions or packet queueing delays, and does not try to preserve inter-packet spacing along path segments. It merely relies on a simple binary test to estimate whether each probe packet has queued in the network or not. Forecaster is efficient as it only requires two streams of probe packets that are sent end-to-end at rates that are much lower than the available bandwidth of the investigated path, thus avoiding path saturation. Theoretical analysis and experimental results validate the efficacy of the proposed technique.
1 Introduction
The ubiquity of computer networks, our increasing dependence on them, and the need to leverage their utilization, performance and economic value call for ways to measure their characteristic features and gain a deeper understanding of their behavior. Network bandwidth is a key characteristic as it quantifies the data rate (throughput) that a network link or a network path can transfer. Measuring network bandwidth is useful for many Internet applications and protocols, especially those involving high-volume data transfer [4,11,27]. The bandwidth available to these applications directly affects their performance. Research on bandwidth estimation has been quite popular over the last few years. Some researchers have targeted estimates of hop-by-hop capacity bandwidth (link speeds) [8,6,16]. Others have targeted the end-to-end capacity bandwidth of a network path, defined as the slowest link speed along that path [25,5,23,20,15]. The link with the slowest speed is referred to as the narrow link.
This work was partially supported by NSF grant CAREER ANIR-0347226.
The capacity bandwidth of the narrow link bounds the throughput that the path can transfer in the absence of any cross traffic. In the presence of cross-traffic, the situation is different: a link with a higher link speed may be congested, and the residual (available) bandwidth at this link may be even smaller than the residual bandwidth at the narrow link. In this case the available bandwidth at the most congested link (the one with the least residual bandwidth) bounds the path throughput. This link is referred to as the tight link. Tools to estimate the end-to-end available bandwidth of a network path, defined as the available bandwidth at the tight link, have been proposed in [7,22,9,18,19,26,14,10]. There has also been recent interest in the capacity of the tight link as an important metric to help in the efficient estimation of the end-to-end available bandwidth [26,7]. Assuming that the tight link of a path remains the same over some reasonably large time-scale, an estimate of the capacity of the tight link helps track the available bandwidth of the path much more efficiently. To the best of our knowledge, the only technique to estimate the speed of the tight link has been proposed in [12]. The end-to-end available bandwidth of a path and the speed of its tight link thus completely identify the throughput of a network path, and we refer to this tuple as the path's bandwidth signature. In this paper we propose a novel technique to estimate the bandwidth signature of a network path. Forecaster does not assume any a priori knowledge about the measured path, does not make any simplifying assumptions about the nature of cross-traffic, does not assume the ability to capture accurate packet dispersions or packet queueing delays, and does not try to preserve inter-packet spacing along path segments. Furthermore, it only requires two streams of probe packets, sent end-to-end at rates much lower than the available bandwidth of the investigated path, thus avoiding path saturation. As a comparison, Envelope [12] examines each link on a path separately to obtain the path tight-link capacity estimate and assumes that inter-packet spacing can be preserved along the path; both problems are resolved in Forecaster. Forecaster also does not require information about the path capacity as in [26], does not overwhelm the path with probes at a rate as high as the available bandwidth of the path as in [2,1], and does not require accurate measurements of packet dispersions as in [22]. The key idea is that the end-to-end path utilization can be estimated through a simple binary test that measures the fraction of probe packets that experienced queueing, rather than by measuring how much queueing they incurred. By sending two probing streams at different rates and measuring the corresponding path utilizations, the available bandwidth and the speed of the tight link can then be projected. The rest of this paper is organized as follows. In Section 2 we provide the theory behind Forecaster and show how to estimate the path utilization. In Section 3 we use this theory to devise the Forecaster algorithm for estimating bandwidth signatures. In Section 4 we validate the efficacy of the proposed technique through simulations. We conclude in Section 5.
2 Model
We model each link as a queue, and make use of concepts from basic queueing theory to estimate the utilization of a network path consisting of a sequence of links (queues). Note that an estimate of a link or path utilization does not by itself reveal its available bandwidth. For example, a 100 Mbps link with a utilization of 0.5 has more available bandwidth than a 10 Mbps link with the same utilization. Still, our utilization estimates help in estimating bandwidth signatures, as we elaborate in Section 3.

2.1 One-Hop Path
In a queueing system consisting of a single queue i, the utilization ρi of the system is expressed as ρi = 1 − πi0, where πi0 is the probability that there are no packets in the queue. This equation is generic and does not make any assumptions about the nature of the cross-traffic. If additional probe packets, transmitted at a rate of r bps, traverse this queue, then the effective utilization ρi(r) can be expressed as

ρi(r) = min(1, ρi + r/Ci)    (1)

where Ci is the processing speed (the capacity of the modeled link). Notice that ρi(r) is a linear function of r, bounded by ρi(r) = 1, assuming that the raw link utilization ρi is stable over the probing period.

2.2 Multi-hop Path
Consider a network path consisting of a sequence of H links modeled as H successive queues. Assuming that the utilizations of successive queues are uncorrelated, the end-to-end utilization of the system, ρ, can be expressed as

ρ = 1 − ∏_{1≤i≤H} (1 − ρi)    (2)
However, correlation between successive queues is expected in practice, as they may be traversed by packets of the same flows. As shown in [21], correlation only delays convergence and does not lead to divergence; in other words, Equation 2 holds when the system is observed over a larger time-scale. In the experiments in Section 4, we test the performance of Forecaster in the presence of such correlation. The end-to-end utilization of the system, ρ(r), when probing at a rate r, can be expressed as

ρ(r) = min( 1, 1 − ∏_{1≤i≤H} (1 − ρi − r/Ci) )    (3)
ρ(r) is a non-linear function of r, bounded by ρ(r) = 1, and can be expressed as a polynomial of degree H of the form

ρ(r) = min( 1, Σ_{i=0}^{H} ci r^i )    (4)

where ci is the i-th coefficient. Simple manipulation of Equation 3 reveals that

c0 = 1 − ∏_{1≤k≤H} (1 − ρk)

ci = (−1)^{i+1} Σ_{1≤k1<k2<...<ki≤H} [ ∏_{1≤j≤H, j∉{k1,...,ki}} (1 − ρj) ] / (C_{k1} C_{k2} ··· C_{ki}),  for 1 ≤ i ≤ H
Notice that the first coefficient, c0, is the end-to-end utilization of the path without any induced probe traffic, ρ. The above equations highlight the fact that |ci| ≫ |ci+1|, since the numerator of each constant term ci is relatively small and the denominator increases dramatically as i increases. To visualize the differences in the magnitude of the ci values, consider a path comprised of 5 links, H = 5, with per-link capacities Ci and utilizations ρi, i = 1...5. Let the link capacities be C1 = 100 Mbps, C2 = 1 Gbps, C3 = 100 Mbps, C4 = 10 Mbps, and C5 = 1 Mbps, and let the link utilizations be ρ1 = 0.4, ρ2 = 0.2, ρ3 = 0.3, ρ4 = 0.3 and ρ5 = 0.1. Substituting these values in Equation 3,

ρ(r) = 0.78 + 2.7×10^−7 r − 4.2×10^−14 r² + 1.15×10^−21 r³ − 9.3×10^−30 r⁴ + 10^−38 r⁵.

Thus c0 = 0.78, c1 = 2.7×10^−7, c2 = 4.2×10^−14, c3 = 1.15×10^−21, c4 = 9.3×10^−30 and c5 = 10^−38. As the values of ci become negligible for larger i, Equation 4 can be approximated by a function of degree lower than H. A first-order approximation of ρ(r) is of the form

ρ(r) ≈ min(1, c0 + c1 r)    (5)
Higher-order approximations are also possible, but the gain in accuracy does not justify the extra complexity and overhead, as evident from the experimental results in Section 4. Equation 5 generalizes our estimate of the utilization ρ(r) to a path with an arbitrary number of hops, where c0 = ρ as expressed in Equation 2, and

c1 = Σ_{1≤k≤H} [ ∏_{1≤j≤H, j≠k} (1 − ρj) ] / Ck
The similarity between Equations 1 and 5 suggests that the first-order approximation of the utilization ρ(r) for a multi-hop path can be interpreted as the utilization of a single link with raw utilization c0 and link speed 1/c1, assuming that the utilizations of the links in the path are stable over the probing period.
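As a numerical check on the example above, the following Python sketch expands Equation 3 for the five-link path and reads off the polynomial coefficients of Equation 4; the printed values of c0 and c1 match the ones quoted in the text.

```python
# Expand rho(r) = 1 - prod_i (1 - rho_i - r/C_i) for the 5-link example
# and recover the coefficients c_i of Equation 4 (capacities in bps).
import numpy as np

C = [100e6, 1e9, 100e6, 10e6, 1e6]
rho = [0.4, 0.2, 0.3, 0.3, 0.1]

# Each factor (1 - rho_i - r/C_i) is a degree-1 polynomial in r,
# written here in numpy's descending-power convention.
poly = np.array([1.0])
for rho_i, C_i in zip(rho, C):
    poly = np.polymul(poly, [-1.0 / C_i, 1.0 - rho_i])

c = -poly[::-1]        # ascending powers of r, negated since rho(r) = 1 - P(r)
c[0] += 1.0            # ... plus the leading 1

print(c[0])            # c0 ~ 0.788, the raw path utilization
print(c[1])            # c1 ~ 2.7e-7, the slope of the first-order model
```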
2.3 Estimating a Path Utilization (ρ(r))
Forecaster relies on estimates of ρ(r). As discussed in Sections 2.1 and 2.2, ρ(r) is the probability that a probe packet, from a stream of packets transmitted at rate r over the investigated path, queues in any of the path queues. To obtain an estimate of ρ(r), we send probes from one end of the path towards the other end with exponential inter-departure times at an average rate of r, so as to attain the well-known Poisson Arrivals See Time Averages (PASTA) property [24]. According to the PASTA property, the probe packets arriving in the queueing system sample the system queues, on average, as an outside observer would at an arbitrary point in time. ρ(r) is estimated as the fraction of probe packets that experienced queueing in the system. Probe packets are time-stamped at both ends, and the time difference is used to distinguish probe packets that experienced queueing from those that did not. Let T* be the minimum experienced time difference. Assuming that T* corresponds to packets that did not experience any queueing, the fraction of probe packets with time differences larger than T* corresponds to ρ(r). This assumption, however, may not hold, especially if the path is highly utilized or if the number of probe packets in the stream is small. Modeling the probing process as a geometric distribution with probability p = 1 − ρ of a probe packet not experiencing any queueing delay, the expected number of probes needed to identify the minimum time difference is (1 − p)/p = ρ/(1 − ρ). As ρ gets larger, the number of probes needed to identify the minimum time difference grows. Inspecting this expression, one probe is expected to be enough if ρ ≤ 0.5, while around 99 probes are needed when ρ = 0.99; obviously, if ρ = 1, all probes will experience queueing delay. This back-of-the-envelope analysis suggests that the number of probe packets needed per stream does not have to be very large.
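A minimal sketch of this estimator, assuming send and receive timestamps have already been collected for one probe stream; the small guard band added above T* to absorb timestamp granularity is our own choice, not part of the paper's specification.

```python
# Estimate rho(r) as the fraction of probes whose end-to-end time
# difference exceeds the minimum observed difference T* (binary test).
def estimate_utilization(send_times, recv_times, guard=1e-4):
    diffs = [r - s for s, r in zip(send_times, recv_times)]
    t_star = min(diffs)                      # assumed no-queueing baseline
    queued = sum(1 for d in diffs if d > t_star + guard)
    return queued / len(diffs)
```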
3 Estimating Bandwidth Signatures
We next show how to use a path utilization, ρ(r), to estimate the path available bandwidth, A, and the speed of the tight link, C. The key idea is that when the probing rate r becomes equal to the path available bandwidth A, ρ(r) reaches its bound, ρ(r) = 1. In other words, the following first-order approximation holds: 1 ≈ c0 + A c1, i.e.,

A ≈ (1 − c0) / c1    (6)
For a one-hop path (Section 2.1), Equation 6 maps to the popular Equation in [5]. In order to estimate A without having to actually probe the path at a rate r equal to A, thus avoiding the need to fill up the communication pipe,
we need to find the values of c0 and c1. This can be achieved by sending two probing streams at two different rates, r1 and r2, and measuring ρ(r1) and ρ(r2). From Equation 5, we have:

ρ(r1) ≈ c0 + r1 c1    (7)
ρ(r2) ≈ c0 + r2 c1    (8)

Solving Equations 7 and 8, we calculate c0 and c1. Thus

c1 = (ρ(r2) − ρ(r1)) / (r2 − r1)    (9)
c0 = ρ(r1) − r1 c1    (10)
Substituting the values of c0 and c1 in Equation 6, we estimate A. Figure 1 provides a schematic representation of the observed utilization ρ(r), using Equation 3 and neglecting the bound on ρ(r), as we vary the probing rate r, both for (A) the single-hop (H = 1) case and (B) the multi-hop (H = 3) case. Notice that there is only one r leading to ρ(r) = 1 in the H = 1 case, whereas there are three different r values leading to ρ(r) = 1 in the H = 3 case. The r values leading to ρ(r) = 1 are essentially the rates needed to fill each link. In practice, and as reflected in Equation 3, ρ(r) is bounded by 1, and the smallest r value leading to ρ(r) = 1 is the available bandwidth of the tight link, A. Figure 1 also shows how projecting the line connecting the points (r1, ρ(r1)) and (r2, ρ(r2)) to the point (A, 1) reveals the value of A. The speed of the tight link, C, results from projecting the same line in the other direction, towards ρ = 0, as shown in the figures. We note that 1/c1 is exactly C. Thus

C = 1/c1    (11)
ρ
Utilization
ρ
1
1
(r2)
ρ (r2)
ρ
ρ (r1)
(r1)
slope=c1
slope=c1
ρ
c0= ρ (0)
c0= ρ (0)
r1
r2 C=1/c1
(A)
A
Probing Rate (r)
r1
r2
˜
A1 A1
˜
C
A2
A3 Probing Rate (r)
C=1/c1
(B)
Fig. 1. The observed utilization as we vary the probing rate for (A) a one-link path and (B) a multi-hop path
Algorithm 1. Forecaster algorithm: Estimating the end-to-end available bandwidth and the capacity of the tight link of a network path
  Measure ρ(r1) and ρ(r2) resulting from two probe sequences, at rates of r1 and r2, respectively
  c1 = (ρ(r2) − ρ(r1)) / (r2 − r1)
  C = 1/c1
  c0 = ρ(r1) − r1 c1
  A = (1 − c0)/c1
  return A, C
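Operationally, the algorithm reduces to a few lines once ρ(r1) and ρ(r2) are available; a sketch (the function name is ours):

```python
# Forecaster core: derive the bandwidth signature (A, C) from two
# utilization measurements taken at probing rates r1 < r2 (Eqs. 6-11).
def forecaster(r1, rho1, r2, rho2):
    c1 = (rho2 - rho1) / (r2 - r1)   # slope of the first-order model
    c0 = rho1 - r1 * c1              # raw end-to-end utilization
    A = (1 - c0) / c1                # end-to-end available bandwidth
    C = 1 / c1                       # tight-link capacity
    return A, C
```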
4 Experimental Results

We next validate our proposed approach using ns-2 simulations [3].

4.1 Setup
The link speeds used in our simulations are picked from common standards, shown in Table 1. Cross-traffic packets are introduced following the measurement findings in [17] and [13], with 60% of the cross-traffic being TCP traffic and the remaining 40% UDP traffic. Cross-traffic packet sizes are distributed between 40 bytes, 576 bytes and 1500 bytes, following observed Internet trends. Some of the cross-traffic flows are hop-persistent and some are path-persistent; that is, some travel along only one hop while others travel along large segments of the investigated path. Path-persistent flows introduce larger correlation between the path queues, which we intentionally introduce to monitor its impact on our bandwidth estimates, as discussed in Section 2. Probe packets are sized at 1500 bytes. Recall that two probe sequences at two different rates, r1 and r2, are used. We use 200 probe packets per sequence and pick r1 and r2 to induce a pronounced difference in the observed path utilization, ρ(r2) − ρ(r1) ≥ 0.1, whenever possible. r1 is picked to be a small rate (50 kbps) and r2 as C1/10, where C1 is the speed of the link connected to the probing host, which is typically known.

4.2 Test Cases
We simulate four different path setups (scenarios), each consisting of either three or four links of different link capacities, as shown in Table 2.

Table 1. Common Internet links and their capacities

Link Type   Mbps       Link Type  Mbps
10BaseT     10.000     OC1        51.840
100BaseT    100.000    OC12       622.080
1000BaseT   1000.000   OC96       4976.000
                       OC192      9953.280
Table 2. Simulated path setups

              L1         L2     L3        L4
Scenario I    100BaseT   OC12   10BaseT   –
Scenario II   1000BaseT  OC192  100BaseT  –
Scenario III  OC1        OC96   10BaseT   –
Scenario IV   1000BaseT  OC96   OC12      100BaseT
Table 3. Accuracy of the A and C estimates for the considered test cases of Scenario I

Scenario I
Case  ρ1    ρ2    ρ3    ρ4   Narrow Link  Tight Link
1     0.0   0.0   0.0   –    L3           L3
2     0.3   0.4   0.2   –    L3           L3
3     0.95  0.2   0.01  –    L3           L1
4     0.2   0.99  0.01  –    L3           L2

Case  A (Mbps)  Â (Mbps)  εA      C (Mbps)  Ĉ (Mbps)  εC
1     10        10.14     −0.014  10        10.16     −0.016
2     8         8.14      −0.017  10        24.3      −1.43
3     5         5.09      −0.018  100       127.15    −0.271
4     6.22      6.12      +0.016  622.08    722.03    +0.16
Table 4. Accuracy of the A and C estimates for the considered test cases of Scenario II

Scenario II
Case  ρ1    ρ2    ρ3    ρ4   Narrow Link  Tight Link
1     0.0   0.0   0.0   –    L3           L3
2     0.2   0.1   0.5   –    L3           L3
3     0.92  0.1   0.01  –    L3           L1
4     0.1   0.99  0.0   –    L3           L2

Case  A (Mbps)  Â (Mbps)  εA      C (Mbps)  Ĉ (Mbps)  εC
1     100       100.37    −0.004  100       100.39    −0.004
2     50        52.72     −0.049  100       148.48    −0.484
3     80        72.1      +0.098  1000      994.13    +0.006
4     89.6      93.05     −0.038  9953      8481.9    +0.148
In all scenarios, we set up links of higher capacities in the middle of the simulated path and links of lower capacities at the edges, which is generally the case in the Internet. However, we also test the performance of our technique when the links in the middle are the tight links. For each scenario, we experiment with four different cases, which differ in the utilization induced by cross-traffic on each link. Consequently, different cases may differ in the available bandwidth and in the speed and location of the tight link. The test cases have been formulated as follows:
– Case 1: Base case – no competing cross-traffic on any of the links.
– Case 2: The narrow link is the tight link.
– Case 3: The narrow link is not the tight link.
– Case 4: The high-speed link in the middle of the network path is the tight link. This case is typically the hardest to estimate, since a tight high-speed link implies that its utilization is very large, at least 90% in all our cases. It has been shown in [10] that as the utilization of the tight link increases, so does the variation in the average available bandwidth, the metric that we are measuring.

Table 5. Accuracy of the A and C estimates for the considered test cases of Scenario III

Scenario III
Case  ρ1   ρ2    ρ3   ρ4   Narrow Link  Tight Link
1     0.0  0.0   0.0  –    L3           L3
2     0.5  0.1   0.2  –    L3           L3
3     0.1  0.2   0.7  –    L3           L1
4     0.0  0.99  0.1  –    L3           L2

Case  A (Mbps)  Â (Mbps)  εA      C (Mbps)  Ĉ (Mbps)  εC
1     51.84     52.37     −0.010  51.84     52.42     −0.011
2     25.92     25.54     +0.015  51.84     55.14     −0.064
3     30        29.14     −0.028  100       102.34    +0.0214
4     50.2      39.92     +0.204  4976      3360.14   +0.324
Table 6. Accuracy of the A and C estimates for the considered test cases of Scenario IV

Scenario IV
Case  ρ1    ρ2    ρ3   ρ4    Narrow Link  Tight Link
1     0.0   0.0   0.0  0.0   L4           L4
2     0.2   0.1   0.1  0.3   L4           L4
3     0.95  0.1   0.1  0.01  L4           L1
4     0.2   0.99  0.2  0.01  L4           L2

Case  A (Mbps)  Â (Mbps)  εA      C (Mbps)  Ĉ (Mbps)  εC
1     100       100.33    −0.003  100       100.33    −0.003
2     70        71.1      −0.015  100       127.22    −0.272
3     50        51.73     −0.035  1000      1276.65   −0.277
4     49.76     43.15     −0.133  4976      5350.6    −0.075
Cases 2, 3, and 4 are designed to stress-test our estimation methodology on heavily loaded network paths. Case 4 is not very common in the Internet but is designed to pinpoint the limitations of Forecaster. Tables 3, 4, 5, and 6 summarize the utilization of the links and highlight the narrow and tight links in all cases. An important observation, evident from our analysis and results, is that the order of the links does not change the results. For example, in all investigated
cases, the narrow link is the last hop on the path; making the narrow link the first hop does not change our results or conclusions.

4.3 Performance Metrics

We use two performance metrics, namely: (1) εA, the error in the estimation of the available bandwidth, and (2) εC, the error in the estimation of the capacity of the tight link. These metrics are defined as follows:

εA = (A − Â) / A    (12)

where A is the actual end-to-end available bandwidth of the network path and Â is the estimated value, and

εC = (C − Ĉ) / C    (13)

where C is the actual capacity bandwidth of the tight link on the network path and Ĉ is the estimated value.

4.4 Results
In Tables 3, 4, 5, and 6 we report the accuracy of our estimates of A and C for each simulated case. It is clear that the A estimates are accurate, especially in Cases 1, 2 and 3. In Case 4, the heavy cross-traffic rate on the high-speed tight link restrains Forecaster's accuracy and, arguably, would stress any bandwidth estimation technique; in this case the estimation error εA can be as high as 20% (Scenario III). The accuracy of the tight-link capacity estimates, Ĉ, is a different story: the error εC is much higher in some cases, reaching 143% in one case (Case 2, Scenario I). In general, the errors in Forecaster's estimates may be due to (1) the correlation between successive queues, (2) large utilization preventing the identification of the end-to-end no-queueing delay, and/or (3) the approximation in the ρ(r) equation for the multi-hop case (Equation 5). While errors due to (1) and (2) can be reduced by increasing the number of probe packets per stream, errors due to (3) cannot. However, errors in capacity estimates can in general be reduced by noting that link capacities typically take standard values and are not assigned arbitrarily. For example, matching the estimate Ĉ = 24.3 Mbps of Case 2, Scenario I to the closest standard link speed in Table 1 yields 10BaseT, the correct link speed. In fact, matching the Ĉ of all considered cases against Table 1 leads to the correct tight-link bandwidth estimates.
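The correction step described above amounts to a nearest-neighbor match against the standard speeds of Table 1; a sketch (speeds in Mbps):

```python
# Snap a raw tight-link capacity estimate to the closest standard link speed.
STANDARD_MBPS = [10.0, 51.84, 100.0, 622.08, 1000.0, 4976.0, 9953.28]

def snap_capacity(estimate_mbps):
    return min(STANDARD_MBPS, key=lambda s: abs(s - estimate_mbps))

# Example: snap_capacity(24.3) -> 10.0 (10BaseT), the correct tight link.
```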
5 Conclusions and Future Work
We introduced Forecaster, a promising bandwidth-estimation tool. Simulation results reveal that Forecaster estimates the available bandwidth and the speed
of the tight link with reasonable accuracy in most practical scenarios. We also introduced a technique to correct deviating tight-link capacity estimates. We are currently testing Forecaster in a controlled lab setup before Internet deployment. Another goal we intend to pursue is to integrate Forecaster into a rate-adaptive congestion control protocol.
References

1. Bbftp. http://doc.in2p3.fr/bbftp/
2. Iperf. http://dast.nlanr.net/Projects/Iperf/
3. The network simulator ns-2. http://www.isi.edu/nsnam/ns/
4. Andersen, D., Balakrishnan, H., Kaashoek, F., Morris, R.: Resilient Overlay Networks. Symposium on Operating Systems Principles, (2001) 131–145
5. Dovrolis, C., Ramanathan, P., Moore, D.: Packet-Dispersion Techniques and a Capacity Estimation Methodology. IEEE/ACM Transactions on Networking (TON), 12 (2004) 963–977
6. Downey, A.: Clink: A tool for estimating internet link characteristics. http://www.rocky.wellesley.edu/downey/clink/
7. Hu, N., Steenkiste, P.: Evaluation and characterization of available bandwidth and probing techniques. IEEE JSAC Special Issue in Internet and WWW Measurement, Mapping, and Modeling, 21 (2003) 879–894
8. Jacobson, V.: pathchar – A Tool to Infer Characteristics of Internet Paths. ftp://ee.lbl.gov/pathchar
9. Jain, M., Dovrolis, C.: End-to-End Available Bandwidth: Measurement Methodology, Dynamics, and Relation with TCP Throughput. IEEE/ACM Transactions on Networking, 11 (2003) 537–549
10. Jain, M., Dovrolis, C.: End-to-End Estimation of the Available Bandwidth Variation Range. ACM Sigmetrics, (2005)
11. Jannotti, J., Gifford, D., Johnson, K., Kaashoek, F., O'Toole, J.: Overcast: Reliable Multicasting with an Overlay Network. Symposium on Operating System Design and Implementation, (2000) 197–212
12. Kang, S., Liu, X., Bhati, A., Loguinov, D.: On Estimating Tight-Link Bandwidth Characteristics over Multi-Hop Paths. 26th IEEE International Conference on Distributed Computing Systems, (2006) 55–65
13. Fomenkov, M., Keys, K., Moore, D., Claffy, K.: Longitudinal study of internet traffic in 1998–2003. http://www.caida.org/publications/papers/2003/nlanr/nlanr_overview.pdf
14. Kiwior, D., Kingston, J., Spratt, A.: PATHMON, A Methodology for Determining Available Bandwidth over an Unknown Network. IEEE Sarnoff Symposium on Advances in Wired and Wireless Communications, (2004)
15. Lai, K., Baker, M.: Nettimer: A tool for measuring bottleneck link bandwidth. USENIX Symposium on Internet Technologies and Systems, (2001) 123–134
16. Mah, B.: pchar: A Tool for Measuring Internet Path Characteristics, (2001). http://www.kitchenlab.org/www/bmah/Software/pchar/
17. McCreary, S., Claffy, K.: Trends in Wide Area IP Traffic Patterns – A View from Ames Internet Exchange. 13th ITC Specialist Seminar on Internet Traffic Measurement and Modelling, (2000)
18. Melander, B., Bjorkman, M., Gunningberg, P.: A New End-to-End Probing and Analysis Method for Estimating Bandwidth Bottlenecks. IEEE Global Internet Symposium, (2000)
19. Melander, B., Bjorkman, M., Gunningberg, P.: Regression-Based Available Bandwidth Measurements. International Symposium on Performance Evaluation of Computer and Telecommunications Systems, (2002)
20. Paxson, V.: End-to-End Internet Packet Dynamics. IEEE/ACM Transactions on Networking (TON), 7 (1999) 277–292
21. Presti, F., Duffield, N., Horowitz, J., Towsley, D.: Multicast-based inference of network-internal delay distributions. IEEE/ACM Transactions on Networking, 10 (2002) 761–775
22. Ribeiro, V., Riedi, R., Baraniuk, R., Navratil, J., Cottrell, L.: PathChirp: Efficient Available Bandwidth Estimation for Network Paths. Proceedings of The Conference on Passive and Active Measurements, (2003)
23. Carter, R., Crovella, M.: Measuring Bottleneck Link Speed in Packet-Switched Networks. Performance Evaluation, 27 (1996) 297–318
24. Wolff, R.: Poisson Arrivals See Time Averages. Operations Research, 30 (1982) 223–231
25. Keshav, S.: A Control-Theoretic Approach to Flow Control. ACM Sigcomm, (1991)
26. Strauss, J., Katabi, D., Kaashoek, F.: A Measurement Study of Available Bandwidth Estimation Tools. ACM/USENIX Internet Measurement Conference (IMC), (2003)
27. Zhu, Y., Dovrolis, C., Ammar, M.: Dynamic Overlay Routing Based on Available Bandwidth Estimation: A Simulation Study. Computer Networks, 50 (2006) 742–762
A Non-cooperative Active Measurement Technique for Estimating the Average and Variance of the One-Way Delay

Antonio A. de A. Rocha, Rosa M.M. Leão, and Edmundo de Souza e Silva

COPPE/PESC and Computer Science Department, Federal University of Rio de Janeiro (UFRJ), Cx.P. 68511, Rio de Janeiro, RJ 21945-970, Brazil
{arocha,rosam,edmundo}@land.ufrj.br
Abstract. Active measurements are a useful tool for obtaining a variety of Internet metrics. One-way metrics, in general, require the execution of processes at the remote machine and/or machines with synchronized clocks. This work proposes a new algorithm to estimate the first two moments of the one-way delay random variable without the need to access the target machine or to have the machine clocks synchronized. The technique uses the IPID field information and can be easily implemented using ICMP Echo request and reply messages. Keywords: Active network measurement, One-way delay, IPID field.
1 Introduction
Active Internet network measurements are an important tool for aiding the modeling and analysis process, helping to understand the characteristics of a system as complex as the Internet, and ultimately improving the performance of applications. Many algorithms have been proposed in the literature to estimate quantities such as packet delay, jitter, loss rate, and bottleneck capacity, along with tools that implement them. Some network performance metrics, such as round-trip packet loss and delay, can be easily obtained: existing tools like PING use the ICMP protocol, and machines over the Internet are usually configured to send an Echo reply message in response to an Echo request. Although round-trip measures are relatively easy to compute, it is much harder to obtain one-way measures since, in most cases, we cannot simply assume that the forward and reverse paths are symmetric. Forward and reverse paths may have different bandwidths, and the sets of routers in the two paths may also be distinct. Even if packets traversing the forward and reverse paths go through the same routers and links, the one-way performance metrics may differ drastically because of traffic asymmetries in each direction. Current techniques to estimate the one-way delay (or loss rate) require the execution of processes at the remote machine to collect the arriving probes and to
This work is supported in part by grants from CAPES, CNPq and FAPERJ.
perform a set of operations; for instance, information about the received probes, such as sending/receiving times, must be collected and evaluated. Therefore, most active measurement tools for estimating these one-way metrics require that tool processes execute at the probe receiver machine. An issue is how to obtain the metrics when remote access is not possible. Another difficulty that arises in computing one-way delays is the lack of clock synchronization between the end-point machines of the source-destination path when synchronization devices such as GPS equipment are not used. Techniques exist to deal with this problem (see [1] and references therein). The problem of computing one-way metrics when remote access is impossible has been addressed recently by exploiting information contained in the identification field of the IP header (IPID). Among the metrics obtained by these non-cooperative techniques are the one-way loss rate [2,3]; out-of-order arrivals [2,4]; and the difference between the one-way delays from two machines that are sources of probes to a single target machine [5]. In this last case no access is granted to the measurement tool. In this work we present a new algorithm to estimate the first two moments of the one-way delay (OWD) random variable between two points, where one of them, the target machine, does not run any process of the associated measurement tool. That is, access is required only at the machines that generate probes, and no execution privileges are necessary at the target receiver. The technique uses IPID information and requires that at least two source machines generate probes to a target. It can be easily implemented using ICMP Echo request and Echo reply messages, and no special clock synchronization equipment or protocol is required at either the sources or the target machine. The remainder of this paper is organized as follows. Section 2 briefly surveys methods that employ the IPID field to compute measures of interest. Section 3 describes the problem we solve and presents the proposed technique for the case where the two probe source machines have their clocks synchronized. We extend the technique in Section 4 to relax the requirement that the source machine clocks be synchronized. In Section 5 we evaluate the efficacy of the technique both through simulation and through experimentation in the PlanetLab environment. Section 6 summarizes our main contributions.
2 Related Measurement Techniques Using IPID
The IPID is a 16-bit field of the IP datagram header [6], used by the IP layer to fragment and reassemble datagrams. The algorithm used to compute the IPID value depends on the operating system; several of them, such as Windows, FreeBSD, and Linux (up to kernel version 2.2), implement the IPID as a simple global counter. Recent works in the literature exploit the IPID field values for estimating network metrics. The authors of [5] present a survey of previous work and classify the existing techniques into three categories, used for: estimating traffic intensity [7];
identifying the clustering of sources [8,7]; and identifying packet losses, duplication, and arrival order [2,4]. The existing techniques that exploit the IPID field have to deal with the wrap-around problem: a packet sent by a host later in time may carry a smaller IPID value than a packet generated earlier by the same host, because the limited size of the IPID field (16 bits) forces the value to return to zero after reaching 2^16. This problem was addressed before in [2,4,5]; in [5], methods are described to deal with the wrap-around problem and to correct the IPID sequence. In [5], three new measurement techniques were proposed based on the IPID field: (i) the first is used to infer the amount of network-internal traffic generated by a server, from a single passive measurement point; (ii) another serves to identify the number of load-balancing servers behind a single IP address; (iii) the last is an active measurement technique to infer the difference between the one-way path delays from two distinct sources A and B to a target D. Our work builds upon technique (iii) of [5], which assumes that the clocks of the source machines are synchronized by GPS and also that the target machine implements a global counter for the IPID field. In that work, the source machines A and B generate probes to the target machine D at constant intervals of value δA and δB, respectively, and D sends ICMP Echo reply packets back to the sources. From the IPID values, one can identify when a probe generated by A arrives at D between two consecutive probes from B. Let nA (nB) be the number of probes generated from A (B) starting at instant τA (τB), such that the nA-th probe arrives at D between the nB-th and (nB+1)-th probes. Then (see [5]),

τB + dBD + nB δB ≤ τA + dAD + nA δA ≤ τB + dBD + (nB + 1) δB.

Since δB is assumed small, the one-way delay difference dAD − dBD can be estimated by:

dAD − dBD ≈ τB − τA + nB δB − nA δA    (1)
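Before interleaved probes from the two sources can be detected and Equation 1 applied, the 16-bit IPID sequence observed in the replies must be unwrapped into a monotone counter; a minimal sketch (the half-range heuristic used to distinguish a wrap from simple reordering is our own simplification of the corrections described in [5]):

```python
# Unwrap a sequence of 16-bit IPID values into a monotone counter.
def unwrap_ipids(ipids, modulus=2**16):
    unwrapped, offset = [ipids[0]], 0
    for prev, cur in zip(ipids, ipids[1:]):
        if prev - cur > modulus // 2:   # large backward jump: counter wrapped
            offset += modulus
        unwrapped.append(cur + offset)
    return unwrapped
```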
3 Technique to Estimate the First Two Moments of the OWD
Similarly to the technique of [5], we assume that probes are generated from two (or more) sources to a single target machine; the target implements a global counter for the IPID field, and we do not have any access to that machine. In order to facilitate the explanation of the algorithm and to validate the technique, we initially assume that the clocks of the source machines are synchronized; this assumption is relaxed later, in Section 4. Our goal is to estimate dAD and dBD, i.e., the one-way delays to a target D from sources A and B, respectively. Suppose that machines A and B generate probes to the target machine D and that, during some interval Δ, two of these probes, one from each machine, arrive at D in sequence. The corresponding ICMP packets generated by D and sent back to the source machines will then have IPID values that differ by a small amount (assuming δB is small). In this scenario, we can establish the following system of equations:
dAD + dDA = RTT_ADA
dBD + dDB = RTT_BDB
dAD − dBD = Ψ_AD−BD
dDA − dDB = Ψ_DA−DB    (2)
where Ψ_ID−JD denotes the estimated OWD difference dID − dJD, and RTT_ADA and RTT_BDB are the estimated round-trip times over ADA and BDB, respectively. These equations are linearly dependent, so we need extra information to obtain a unique solution. In what follows we address this issue.

The OWD of a probe from source A, dAD, is equal to the sum of four terms: the overall propagation delay from A to D, T_AD^prop; the sum of the queueing times at the routers along path AD, T_AD^queue; the sum of the transmission times at the links, T_AD^tx; and the overall processing time, T_AD^proc. Therefore, assuming that the processing times are negligible,

dAD = T_AD^prop + T_AD^queue + T_AD^tx.    (3)

We further assume that the propagation times in the forward and reverse paths (AD and DA) are identical, although the capacities and queueing times in the forward and reverse paths may differ. Note that the technique does not assume symmetric paths: although T_AD^prop = T_DA^prop, T_AD^queue and T_AD^tx may differ from T_DA^queue and T_DA^tx.

Our approach to estimate the transmission and propagation times is based on the generation of probes of two distinct sizes, following the three-step procedure described below. First, n probes of identical size l are sent from A to D; consequently, the ICMP protocol sends packets back to A with the same size l. Second, the same procedure is repeated using probe sizes one order of magnitude greater, that is, 10l. Finally, n probes of size 10l are sent from A to D, but this time we would like to receive replies of size l. Sending ICMP Echo request probes and receiving ICMP Echo replies of the same size is trivial, since request and reply always have identical sizes: according to the ICMP protocol specification [9], if a machine receives an ICMP Echo request message, it must send back an ICMP Echo reply in which only the header of the Echo request is changed, with the same payload. However, the specification does not allow the Echo request sender to control the size of the Echo reply message. To overcome this limitation, we generate packet pairs that emulate the effect of sending an Echo request of size X and receiving an Echo reply of size Y < X. The method consists of sending two back-to-back probes (a packet pair): the first is an ICMP Echo reply message of size 10l bytes and the second is an ICMP Echo request message of size l. (Note that here an ICMP Echo reply message is generated by the source machine spontaneously, without a corresponding Echo request.) Both probes cross the same forward path until they reach the destination; it is a common and reasonable assumption in packet-pair techniques that both packets of the pair follow the same route from the source to the destination host [10]. In this scenario, the second probe is delayed at each hop by the transmission time of the first probe, since the first packet is 10 times larger than the second. When the first probe (the ICMP Echo reply of size 10l) arrives at the target machine, it is discarded by the ICMP protocol; the second probe, being an ICMP Echo request of size l, makes the target machine immediately send back an ICMP Echo reply of size l. (A script sketch of this packet-pair emulation is given below, after the queueing-time system (6).)

From the three steps above, we can estimate the RTT of a packet with the same size in the forward and reverse directions, and the RTT of a packet with size 10l in the forward direction and l in the reverse direction. From these estimates we obtain the extra equations needed to solve our problem. In our work we choose l = 50 bytes. Let RTT_m,ADA^(X−Y) be the minimum round-trip delay obtained when probes of size X are sent to D and the replies have size Y ≤ X. For RTT_m,ADA^(X−Y), it is common to assume that the queueing time is negligible ([11,12,13,14]). Since, from our assumptions, T_AD^prop ≈ T_DA^prop, we have:

T_AD^tx + T_DA^tx + 2 T_AD^prop = RTT_m,ADA^(50−50)
10 T_AD^tx + 10 T_DA^tx + 2 T_AD^prop = RTT_m,ADA^(500−500)
10 T_AD^tx + T_DA^tx + 2 T_AD^prop = RTT_m,ADA^(500−50)    (4)
The equations above are linearly independent, so T_AD^tx, T_DA^tx, and T_AD^prop can be estimated from the RTTs. A similar system has to be solved to obtain the transmission and propagation times between B and D. To estimate the OWD, we still have to compute the overall queueing times. Let Ψ_AD−BD^queue = T_AD^queue − T_BD^queue and Ψ_DA−DB^queue = T_DA^queue − T_DB^queue. The values of Ψ_AD−BD^queue and Ψ_DA−DB^queue can be easily estimated from the transmission and propagation times in each path:

Ψ_AD−BD^queue = Ψ_AD−BD − (T_AD^prop + T_AD^tx − T_BD^prop − T_BD^tx)
Ψ_DA−DB^queue = Ψ_DA−DB − (T_DA^prop + T_DA^tx − T_DB^prop − T_DB^tx)    (5)

Since we are able to calculate the difference between the queueing times in the forward and reverse paths, for both sources A and B, we can rewrite (2) considering only the queueing time in each path:

T_AD^queue + T_DA^queue = T_ADA^queue
T_BD^queue + T_DB^queue = T_BDB^queue
T_AD^queue − T_BD^queue = Ψ_AD−BD^queue
T_DA^queue − T_DB^queue = Ψ_DA−DB^queue    (6)

where T_ADA^queue and T_BDB^queue are the probe queueing times along the round-trip paths ADA and BDB, respectively. When T_ADA^queue or T_BDB^queue is equal to zero, the equations above are linearly independent. If T_BDB^queue = 0 then T_BD^queue = T_DB^queue = 0, and from (6), T_AD^queue = Ψ_AD−BD^queue and T_DA^queue = Ψ_DA−DB^queue. We have then obtained all the necessary quantities to estimate dAD from (3) (and likewise dDA).
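The packet-pair emulation described earlier in this section can be scripted directly; the following Scapy sketch sends the spontaneous large Echo reply back-to-back with the small Echo request. The target address and the exact payload sizes are illustrative, and on-wire sizes additionally include the IP/ICMP headers.

```python
# Sketch: emulate an Echo "request" of size 10*l answered by a reply of
# size l, per the packet-pair method (requires raw-socket privileges).
from scapy.all import IP, ICMP, Raw, send

TARGET = "192.0.2.1"    # hypothetical target address
l = 50                  # small probe payload size in bytes

big_reply = IP(dst=TARGET) / ICMP(type=0) / Raw(b"\x00" * (10 * l))
small_request = IP(dst=TARGET) / ICMP(type=8) / Raw(b"\x00" * l)

# Send the pair back-to-back: the large spontaneous Echo reply is silently
# discarded at the target, while the small Echo request elicits a reply.
send([big_reply, small_request], verbose=False)
```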
The details of the procedure are summarized as follows.

Algorithm 1
Step 1: Generate nA and nB probes from machines A and B to D. Compute samples of RTT_ADA^(X−Y) and RTT_BDB^(X−Y).
Step 2: From the RTT samples, obtain the minimum RTT values for each source, RTT_m,ADA^(X−Y) and RTT_m,BDB^(X−Y), and then T_AD^tx, T_DA^tx, T_AD^prop, T_BD^tx, T_DB^tx, T_BD^prop using equations (4).
Step 3: Select a sub-set K of k probe pairs (pA, pB), where pA and pB are probes sent from A and B, respectively. A pair (pA, pB) is selected if the corresponding replies arriving from D have consecutive IPID values. Obtain samples of Ψ_AD−BD and Ψ_DA−DB from the sub-set K. Compute Ψ_AD−BD^queue and Ψ_DA−DB^queue using equations (5).
Step 4: Select a pair of the sub-set K if its RTT_BDB value is within the interval [RTT_m,BDB, 1.01 RTT_m,BDB]. Call this sub-set L and suppose that it has mA pairs. Select a pair of the sub-set K if its RTT_ADA value is within the interval [RTT_m,ADA, 1.01 RTT_m,ADA]. Call this sub-set M and suppose that it has mB pairs. (Recall that when the RTT has its minimum value, the queueing times in both directions are negligible.)
Step 5: From each pair of the sub-set L estimate one sample of T_AD^queue and T_DA^queue, and from each pair of the sub-set M estimate one sample of T_BD^queue and T_DB^queue, using equations (6).
Step 6: Use equation (3) to compute mA samples of dAD and dDA, and mB samples of dBD and dDB.
Step 7: The average and variance of the OWD are calculated as

d̄_path = (1/mj) Σ_{n=1}^{mj} d_path(n)
Var(d_path) = (1/(mj − 1)) Σ_{n=1}^{mj} (d_path(n) − d̄_path)²

where, for j = A (j = B), the path index is replaced by AD or DA (respectively, BD or DB).
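Step 2 of the algorithm reduces to a 3×3 linear solve per path; a sketch for path AD, assuming the three minimum RTTs have already been measured (values in seconds):

```python
# Solve system (4) for T_tx_AD, T_tx_DA and T_prop_AD from the minimum
# RTTs of the 50-50, 500-500 and 500-50 byte probe configurations.
import numpy as np

def tx_prop_from_min_rtts(rtt_50_50, rtt_500_500, rtt_500_50):
    M = np.array([[1.0, 1.0, 2.0],     # columns: T_tx_AD, T_tx_DA, T_prop_AD
                  [10.0, 10.0, 2.0],
                  [10.0, 1.0, 2.0]])
    b = np.array([rtt_50_50, rtt_500_500, rtt_500_50])
    t_tx_ad, t_tx_da, t_prop_ad = np.linalg.solve(M, b)
    return t_tx_ad, t_tx_da, t_prop_ad
```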
4 Extension for Non-synchronized Sources
In the previous section we assumed that the probe sources had their clocks synchronized; in what follows, we show that this assumption can be relaxed. The main difficulty in estimating the OWD when the probe sources do not have synchronized clocks is the clock Offset and Skew. Solutions for removing the Offset and Skew when calculating the one-way delay between two machines have been proposed in the literature [11,15,12,13,14,1]. In these techniques, if one wants to measure, for instance, the OWD dAD from machine A to D, probes are generated from A to D and both A and D must run processes of the measurement tool. However, in the technique described in Section 3, probes are generated from machines A and B to a target machine D; thus, we cannot immediately use the methods in the literature. We adapt the algorithm of [14] to remove the Skew. In [14], the Skew estimation requires the computation of the lower bound of the convex hull of a sequence
of points (i, j), where i is the sending time of a probe from the source machine and j is the OWD computed at the destination. In our extension, consider the probes of size l generated by machines A and B to estimate the delay differences $\Psi_{AD-BD}$ and $\Psi_{DA-DB}$. Let $\Omega = [(\tau_{BD}(i), d_{BD-DA}(i)) : i = 1, \ldots, k]$ be a sequence of k points, taken from the probe pairs of set K defined in the previous section. (Recall that probe pairs in this set are those from machines A and B that arrived approximately at the same time at D.) Take the i-th pair in K. $\tau_{BD}(i)$ is the sending time of the probe from B to D in this pair, and $d_{BD-DA}(i)$ is the arrival time of the Echo reply of the probe from A in this pair minus the sending time of the probe from B in this pair. Intuitively, since the arrival times at D of the probes in the pair are approximately the same, $d_{BD-DA}(i)$ is identical to the time difference one would obtain if B sent a probe directly to A along a path passing through D. From the $\Omega$ sequence, we can remove the skew in the same manner as in [14]. To estimate and remove the offset, the algorithm presented in [13] could be used if additional probes were generated from machine A directly to B, and vice versa. To avoid the generation of extra probes, we propose to estimate the offset based on the minimum values of $RTT_{BDB}$ and $d_{BD-DA}$. Let $d^s_{m,BD-DA}$ be the minimum value of $d^s_{BD-DA}$ in the sequence obtained from $\Omega$ after the skew is removed. Let $RTT_{m,BDB}$ be the minimum round-trip delay estimated from B to D. It is reasonable to assume that these values are obtained when the corresponding probes see no queueing delay at the routers. Then we have:

$$RTT_{m,BDB} = T^{tx}_{BD} + T^{prop}_{BD} + T^{tx}_{DB} + T^{prop}_{DB}$$
$$d^s_{m,BD-DA} = T^{tx}_{BD} + T^{prop}_{BD} + T^{tx}_{DA} + T^{prop}_{DA} - O_{AB}$$

where $O_{AB}$ is the offset between A and B. From the equations above, the offset can be immediately estimated.
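As an illustration, the sketch below shows one way to implement the skew and offset removal just described: the lower convex hull of the $\Omega$ points yields a skew estimate in the spirit of [14] (the choice of the hull segment spanning the measurement midpoint is our own simplification, not the authors' exact rule), and the offset follows from the second equation above, given the transmission and propagation times already estimated in Algorithm 1.

def lower_hull(points):
    # Lower convex hull (monotone chain) of (time, delay) points.
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:  # non-left turn: last point lies above the hull
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def estimate_skew(points):
    # Slope of the hull segment that spans the measurement midpoint.
    hull = lower_hull(points)
    mid = (hull[0][0] + hull[-1][0]) / 2.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= mid <= x2:
            return (y2 - y1) / (x2 - x1)
    return 0.0

def remove_skew(points):
    skew = estimate_skew(points)
    return [(t, d - skew * t) for (t, d) in points]

def estimate_offset(d_s_min, t_tx_bd, t_prop_bd, t_tx_da, t_prop_da):
    # O_AB from the second equation above, using the deterministic
    # path components estimated in Algorithm 1 (Step 2).
    return t_tx_bd + t_prop_bd + t_tx_da + t_prop_da - d_s_min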
5 Experiments and Validation
The proposed technique was evaluated through simulation and through experiments over the Internet. The main goal of the simulation model was to analyze the technique for different values of bandwidth utilization and when the machines' clocks are not synchronized. (All the results below are in microseconds.) The simulation model, developed using the TANGRAM-II modeling environment [1], is illustrated in Figure 1. Objects Host A, Host B and Host Target represent the two sources and the target machine, respectively. The routers represented in the model have different bandwidths. A global IPID counter is simulated at the target machine, as well as the Echo reply packets. Besides probe packets, we generate cross traffic using a set of On-Off sources, as suggested in [16]; the residence times in the On and Off states follow a Pareto distribution with parameter α < 2 (see the sketch below). We validate the technique by comparing a trace of the probe arrival times collected at the Host Target object with the values estimated by the proposed technique. We also compute the relative error of the average and variance of the OWD.
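The sketch below illustrates the kind of On-Off cross-traffic source used in the simulation; the shape parameter α < 2 makes the residence times heavy-tailed, which in aggregate produces self-similar traffic [16]. The rates and scale parameters are illustrative assumptions, not the values used in the paper.

import random

def pareto(alpha, scale):
    # Inverse-CDF sampling of a Pareto variate: scale / U^(1/alpha).
    return scale / (random.random() ** (1.0 / alpha))

def on_off_source(alpha=1.5, scale_on=0.01, scale_off=0.05,
                  rate_pps=500, horizon=10.0):
    # Yield packet emission times of a single On-Off source.
    t = 0.0
    while t < horizon:
        end_on = min(t + pareto(alpha, scale_on), horizon)
        while t < end_on:              # constant packet rate while On
            yield t
            t += 1.0 / rate_pps
        t += pareto(alpha, scale_off)  # silent Off period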
Fig. 1. Simulation model
We show the results for two scenarios. In the first scenario, link utilizations vary between 30% and 50% and the source machines' clocks are synchronized. Figure 2 shows the results for path DB as a function of the simulation time for this first scenario. When the simulation time is smaller than 20 seconds, the estimated values are inaccurate. This occurs because the number of samples is too small to obtain an accurate OWD estimation. However, after 40 s, the accuracy is very good.
Fig. 2. Average and variance of the OWD for path DB (link utilization between 30% and 50%). Panels (A) and (B) plot estimated vs. actual values against simulation time (sec.).
Figure 3 shows the results for path AD when the second scenario is considered. The utilization varies between 65% and 80%, and clocks are not synchronized. In this case longer simulation times are needed as compared to the first scenario. This is expected since, for a given time interval t, the higher the utilization, the smaller the number of samples that can be obtained to estimate the measures. However, even for high utilization values, the estimation procedure converges fast.
Fig. 3. Average and variance of the OWD for path AD (link utilization between 65% and 80%). Panels (A) and (B) plot estimated vs. actual values against simulation time (sec.).

Table 1. Relative error - simulation

Path   Scenario 1 (Average / Variance)   Scenario 2 (Average / Variance)
AD     0.020 / 0.058                     0.025 / 0.001
DA     0.013 / 0.011                     0.082 / 0.290
BD     0.013 / 0.132                     0.057 / 0.220
DB     0.002 / 0.033                     0.062 / 0.078
Table 1 presents the relative error of the average and variance of the OWD for both scenarios. The relative errors are less than 2% (average) and 13% (variance) when the utilizations are low to moderate. In the second scenario, with higher utilizations, the relative errors are less than 8% and 29%, respectively. However, one can obtain smaller relative errors by increasing the measurement time. In what follows we show the results of experiments over the Internet. In order to generate probes according to our algorithm, we adapted the TANGRAM-II Traffic Generator [1]. In all experiments the probe generation rates for each source are 1,000 packets/s and 100 packets/s. We used machines synchronized by GPS in order to be able to estimate the actual delay values. (Therefore, there is no need to remove the skew and offset.) Considering that most probe packets have size l = 50 bytes, the overhead introduced in the network is, respectively, 400 kbps and 40 kbps, which cannot be considered intrusive traffic at actual network rates. In the first set of experiments three machines were employed: one at UFRJ (Brazil), another at UNIFACS (Brazil) and the third at UMass (USA). Experiments of 30 minutes each were executed, varying the target machine for each. A sample of all results is shown in Table 2; the OWD relative error was less than 2%. The second set of experiments was performed using three PlanetLab machines and during different times of the day. Machines at Berkeley and in the U.K. generated probes to a target machine in Hong Kong during the first 5 minutes
Table 2. Relative error - experiments UFRJ, Unifacs and UMass

Path            Relative Error (Average / Variance)     Path            Relative Error (Average / Variance)
UFRJ-UMass      0.004 / 0.626                           UFRJ-Unifacs    0.009 / 0.152
UMass-UFRJ      0.005 / 0.022                           Unifacs-UFRJ    0.009 / 0.038
Unifacs-UMass   0.016 / 0.710                           UMass-Unifacs   0.001 / 0.015
UMass-Unifacs   0.015 / 0.087                           Unifacs-UMass   0.001 / 0.099

Fig. 4. Average and variance of the OWD for the path from Hong Kong to Berkeley during the 24-hour experiment. Both panels plot estimated vs. actual values against the experiment hour.
of each hour for 24 hours. Figure 4 illustrates the results obtained for the average and variance of the path from Hong Kong to Berkeley. The figure clearly shows that the technique was able to capture very accurately the behavior of the measures over several hours. In the third set of experiments (also using PlanetLab) the source machines were at Seattle and Texas and the target machine was in Korea. Probes were generated in the first minute of each hour, for 10 hours (between 5 am and 3 pm GMT). Each one-minute session was divided into six sub-sessions of 10 seconds each. For each sub-session we estimate one sample of the average OWD. Using these six samples, we compute the sample average and the confidence interval of the OWD for one session, at a 95% confidence level (see the sketch below). Figure 5 shows the confidence intervals for both the values estimated by our algorithm and the actual measures of the average OWD, for the Korea-Seattle path. The figure confirms the good accuracy of our approach. The last experiment using PlanetLab hosts involved several machines. Machines at Texas, Stanford, Berkeley, Unifacs, Kaist, France, Israel, the U.K. and Hong Kong generated probes simultaneously to a target machine at UMass. The main purpose of this experiment was to investigate the OWD of several paths from different source machines to a single target. Table 3 illustrates the results. This experiment shows that the technique could be employed, for instance, by an application to choose the "best" path (i.e., the one with the minimum value of the OWD and/or variance) to serve a request from a client machine (in this example UMass).
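The per-session confidence interval computation just described can be sketched as follows: six 10-second sub-session averages give a Student-t interval for the session mean OWD (95% confidence, 5 degrees of freedom). A minimal sketch, with assumed inputs:

from statistics import mean, stdev

T_95_DF5 = 2.571  # Student-t critical value, 95% confidence, 5 d.o.f.

def session_confidence_interval(subsession_means):
    # One OWD sample per 10 s sub-session; six sub-sessions per session.
    assert len(subsession_means) == 6
    m = mean(subsession_means)
    half_width = T_95_DF5 * stdev(subsession_means) / (6 ** 0.5)
    return m - half_width, m + half_width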
Fig. 5. Confidence interval for the average OWD - path Korea-Seattle

Table 3. Experiment from several sources to UMass

Path              Average (Estimated / Actual / Rel. Error)   Variance (Estimated / Actual / Rel. Error)
Texas-UMass       26091 / 25852 / 0.009                       150976 / 227899 / 0.509
Stanford-UMass    35097 / 35562 / 0.013                       256008 / 261035 / 0.019
U.K.-UMass        43777 / 43948 / 0.003                       140461 / 199699 / 0.421
Berkeley-UMass    40680 / 40602 / 0.001                       19321457 / 20427774 / 0.057
Hong Kong-UMass   19321 / 20427 / 0.057                       178582 / 263283 / 0.474
Israel-UMass      85975 / 85607 / 0.004                       570297 / 653080 / 0.145
Kaist-UMass       107122 / 106971 / 0.001                     219852 / 292970 / 0.332
Unifacs-UMass     86716 / 86425 / 0.003                       982904 / 227814 / 0.768
France-UMass      48513 / 48338 / 0.003                       207378 / 260358 / 0.255

6 Main Contributions
In this work we propose a novel technique to estimate the average and the variance of the one-way delay. Several experiments using the PlanetLab infrastructure were performed, and the results obtained show that the average and variance of the OWD can be accurately estimated. An important characteristic of the proposal is that it does not require running any process at the remote machines. Furthermore, it can be used even if the clocks of the source machines are not synchronized. Therefore, it is a valuable tool to estimate OWDs from machines on which one can run processes to machines where no access is granted, provided that the targets run an OS which implements a global IPID counter (such as Windows machines).
References
1. de Souza e Silva, E., Leao, R., Rocha, A., Duarte, F., Silva, A., Filho, F.S., Jaime, G., Muntz, R.: Modeling, Analysis, Measurement and Experimentation with the Tangram II Integrated Environment. In: Int. Conf. on Performance Evaluation Methodologies and Tools (2006)
2. Mahajan, R., Spring, N., Wetherall, D., Anderson, T.: User-level Internet Path Diagnosis. In: 19th ACM SOSP (2003) 106–119
3. Savage, S.: Sting: a TCP-based Network Measurement Tool. In: USENIX Symposium on Internet Technologies and Systems (1999) 71–79
4. Bellardo, J., Savage, S.: Measuring Packet Reordering. In: 2nd ACM SIGCOMM IMW (2002)
5. Chen, W., Huang, Y., Ribeiro, B., Suh, K., Zhang, H., de Souza e Silva, E., Kurose, J., Towsley, D.: Exploiting the IPID Field to Infer Network Path and End-System Characteristics. In: Lecture Notes in Computer Science, Volume 3431 (2005) 108–120
6. Postel, J.: Internet Protocol (1981) IETF RFC 791
7. Insecure.org: Idle Scanning and related IPID games (1997) http://www.insecure.org/nmap/idlescan.html
8. Bellovin, S.: A Technique for Counting NATed Hosts. In: 2nd ACM SIGCOMM IMW (2002) 267–272
9. Postel, J.: Internet Control Message Protocol (1981) IETF RFC 792
10. Dovrolis, C., Ramanathan, P., Moore, D.: What do Packet Dispersion Techniques Measure? In: IEEE Infocom, Volume 1 (2001) 905–914
11. Loung, D., Biro, J.: Needed Services for Network Performance Evaluation. In: IFIP Workshop on Performance Modeling and Evaluation of ATM Networks (2000)
12. Paxson, V.: On Calibrating Measurements of Packet Transit Times. In: ACM Sigmetrics (1998) 11–21
13. Tsuru, M., Takine, T., Oie, Y.: Estimation of Clock Offset from One-way Delay Measurement on Asymmetric Paths. In: SAINT International Symposium on Applications and the Internet (2002) 126–133
14. Zhang, L., Liu, Z., Xia, C.: Clock Synchronization Algorithms for Network Measurements. In: IEEE Infocom (2002) 160–169
15. Moon, S., Skelly, P., Towsley, D.: Estimation and Removal of Clock Skew for Network Delay Measurements. In: IEEE Infocom (1999) 227–234
16. Taqqu, M.S., Willinger, W., Sherman, R.: Proof of a Fundamental Result in Self-Similar Traffic Modeling. In: ACM Computer Communications Review (1997) 5–23
The P2P War: Someone Is Monitoring Your Activities!

Anirban Banerjee, Michalis Faloutsos, and Laxmi Bhuyan

Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA 92521
{anirban,michalis,bhuyan}@cs.ucr.edu
Abstract. In an effort to prosecute P2P users, the RIAA and MPAA have reportedly started to create decoy users: they participate in P2P networks in order to identify illegal sharing of content. This has reportedly scared some users who are afraid of being caught. The question we attempt to answer is how prevalent this phenomenon is: how likely is it that a user will run into such a "fake user" and thus run the risk of a lawsuit? The first challenge is identifying these "fake users". We collect this information from a number of free open-source software projects which try to identify such IP address ranges by forming the so-called blocklists. The second challenge is running a large-scale experiment in order to obtain reliable and diverse statistics. Using PlanetLab, we conduct active measurements, spanning a period of 90 days, from January to March 2006, spread over 3 continents. Analyzing over 100 GB of TCP header data, we quantify the probability of a P2P user being contacted by such entities. We observe that 100% of our nodes run into entities in these lists. In fact, 12 to 17% of all distinct IPs contacted by any node were listed on blocklists. Interestingly, a little caution can have a significant effect: the top five most prevalent blocklisted IP ranges contribute to nearly 94% of all blocklisted IPs, and avoiding these can reduce the probability of encountering blocklisted IPs to about 1%. In addition, we examine other factors that affect the probability of encountering blocklisted IPs, such as the geographical location of the users. Finally, we find another surprising result: less than 0.5% of all unique blocklisted IPs contacted are owned explicitly by media companies.
Keywords: Peer-to-peer networks, Gnutella, RIAA, User monitoring.
1 Introduction
Organizations like the RIAA and MPAA, representing content providers, have escalated their fight against illegal P2P content sharing [2], [13], [14], [15], [21], [22] with the use of fear: there have been a number of lawsuits against individual P2P users [3], [4], [5], [6]. For greater effect, these organizations and their collaborators have also started "trawling" in P2P networks: creating "fake users" which
participate in the network and thus identify users who contribute towards illegal content sharing. However, the extent of this deployment tactic has not been quantified up to now, and this forms the focus of our work. In response to this approach, the P2P community has spawned several projects which attempt to: (a) identify such "fake users", and (b) enable P2P users to avoid them. In more detail, there is a community-based effort to maintain lists of suspect IP address ranges, which are called blocklists. Blocklists are published by organizations which provide anti-RIAA software or by groups which focus on security [9]. Additionally, a number of free, open-source software projects enable P2P users to avoid these blocklisted IPs automatically and are integrated with the most popular P2P clients using the BitTorrent, eDonkey/eMule and Gnutella networks [1], [8], [9], [30], [17], [26]. Note that it is not our intention here to examine how accurate and comprehensive these lists are, though this would be interesting and challenging future work. What we claim is that the information we use in our research is readily available to P2P users and is used by them [1]. The question we attempt to answer is how prevalent the phenomenon of fake users is. Simply put, how likely is it that a user will run into such a "fake user" without using blocklists? The answer to this question can lead us to: (a) understand the effort that content providers are putting into trawling P2P networks, and (b) justify the effort of the P2P community to isolate "fake users". Henceforth, we refer to IP ranges of fake users listed on these blocklists as blocklisted IPs and to users exchanging data with them as being monitored. The intention of blocklists is to identify such "monitoring" entities; however, not all IP ranges listed on blocklists are actually monitoring users. We nevertheless assume the "worst-case" scenario. We say that a user hits a blocklisted IP every time the user receives or sends a piece of data (part of a file) to that IP range. Organizations employing these blocklisted IPs are referred to as blocklisted entities. To the best of our knowledge, such measurements have not been collected before. We conduct what seems to be the first study of the probability with which P2P users are being monitored. We employ PlanetLab [12] for a period of 90 days and customize a Gnutella client (mutella version 0.4.5) to automatically initiate meaningful queries and collect statistics from the Gnutella network. Each client initiates 100 queries for popular songs found in prominent music charts [36], [29], [28]. We collect and analyze nearly 100GB of TCP header data. We then examine the observed IP addresses using the most popular blocklists on the Internet [1], [9], [30]. Our results can be summarized as follows:
1. Consequence of ignoring blocklists: A user without any knowledge of blocklists will almost certainly be monitored by blocklisted IPs. We found that all our clients exchanged data with blocklisted IPs. In fact, of all distinct IPs contacted by any client, 12-17% were found to be listed on blocklists.
2. A little information goes a long way: We find that avoiding just the top 5 blocklisted IPs reduces the chance of being monitored to about 1%. This is a consequence of a skewed preference distribution: we find that the top 5
blocklist ranges encountered during our experiments contribute to nearly 94% of all blocklist hits.
3. Most blocklisted IPs belong to government or corporate organizations: We quantify the percentage of hits to blocklisted entries of each type, i.e., government and corporate, educational, spyware proliferators and Internet advertisement firms. We find that the number of hits belonging to government and corporate lists is approximately 71% of the total number of hits, nearly 2.5 times more than the educational, spyware and adware lists put together. Interestingly, some blocklists mention unallocated IP ranges called BOGONs, which we discuss later.
4. Very few blocklisted IPs belong directly to content providers: We find that only 0.5% of all blocklisted IP hits could actually be traced back to media companies, such as Time Warner Inc. However, it is an open question whether other blocklisted IPs are indirectly related to content providers.
5. Geographical bias: We find that there is a geographical bias in how users hit entities listed on blocklists. The way in which users located on the two opposite coasts (east and west) of mainland US, in Europe and in Asia hit blocklisted entities is quite different.
6. Equal opportunity trawling: We find that Ultra-peers (UPs)¹ and leaf nodes have equal probability of associating with a blocklisted IP, with less than 5% variation in the average number of distinct blocklisted IPs. This contrasts with the popular belief that UPs are monitored more aggressively by blocklisted entities [10], [11] than leaf users.

¹ Ultra-peers are high-bandwidth nodes that act as local centers, facilitate low-bandwidth leaf nodes, and enable the scalability of Gnutella-like networks.
The rest of the paper is organized as follows. Section 2 presents relevant literature, followed by Section 3, which discusses the experimental setup and the blocklisted entries. Section 4 quantifies the probability of being monitored, Section 5 investigates geographical bias, and Section 6 addresses the Ultra-peer versus leaf node debate.
2 Relevant Literature
A plethora of P2P networks, such as FastTrack, Gnutella [14], BitTorrent, eMule/eDonkey and many others, are prevalent in the Internet. Freely available P2P clients for nearly all operating systems generate significant amounts of traffic criss-crossing the Internet [13], [15]. These networks have recently been touted as the future of content distribution technologies [16], and of similarly exciting and promising applications. However, these overlay networks act as significant enablers in the movement of copyrighted material over the web. Organizations such as the RIAA and MPAA have been vociferous in their support for anti-P2P policies, since it is the companies represented by these organizations that supposedly lose out on revenue due to the exchange of copyrighted songs and movies [5], [7]. Recently, a slew of reports in the electronic and print media have led members of P2P communities to ponder the ramifications of such illegal resource
sharing [18]. To mitigate the threat of possible lawsuits, users have resorted to downloading and deploying anti-RIAA/MPAA software. These programs block computers owned by such organizations from accessing users on the P2P networks [8], [1], thereby effectively alienating them from quorums of P2P users. This prevents them from gaining the critical information needed to generate detailed user-behavior log files which may be used for legal action. A large number of such free programs are easily available from popular websites, with many variants for different clients, networks and operating systems. Previous work on the modeling and analysis of P2P systems [24], [25] has focused on developing a viewpoint based on performance metrics of such overlay systems. Our work differs greatly from these important earlier research efforts. We conduct research to specifically ascertain whether organizations like the RIAA are active on P2P networks or not. We quantify the probability of a P2P user being monitored by entities listed on the most popular blocklists. We also identify whether there is any geographical bias in how P2P users run up against blocklisted entities. To the best of our knowledge, our research is the first to specifically target an in-depth study of whether such a threat is a reality for a generic P2P user. Moreover, our work is significant for understanding whom we talk to while sharing copyrighted resources on these P2P networks. Additionally, we intend to verify reports suggesting that some so-called organizations enlisted by the RIAA target UPs in preference to leaf nodes [10], [11], in order to break the backbone of the entire overlay structure.
3 Who Is Watching?
In this section we discuss the experimental setup we employ, followed by a synopsis of our findings regarding which blocklisted entities are most prevalent on P2P networks.
Experimental set-up. We designed our experiments to emulate a typical user and yet be able to measure large-scale, network-wide inter-node interaction characteristics of P2P networks. We measure statistics based on trace logs compiled from connections initiated using PlanetLab. The duration of measurements spanned more than 90 days, beginning January 2006. We initiate connections using nodes spread not only across the continental US but also Europe and Asia, in order to determine any geographical nuances associated with which blocklisted entities seem to be more active than others in specific locations. We customized mutella 0.4.5 [27], a vanilla console-based Gnutella client, and initiated connections to the Gnutella network. Moreover, clients were made to switch interchangeably between UP and leaf modes in order to verify whether the network-wide inter-node behavior of UPs is significantly different from that of leaf nodes. Search strings used for probing the P2P network were compiled as a list of popular songs from Billboard's Hot 100 hits [28], top European 50 hits [29] and Asian hits [36]. Each node injected about 100 queries during every run. In the process, we analyzed more than 100GB of TCP header traces using custom scripts and filters to extract relevant information, which helps us develop a deeper
insight into whom we interact with while sharing resources on P2P networks. Note that all files stored on PlanetLab nodes as a result of our experiments were completely removed and never used. Similarly, no content was downloaded to local UCR machines for storage. Before we present results obtained from our measurements, we must discuss what BOGON IPs [34] are, as they hold special significance for the collected information. BOGON is the name used to describe IP blocks not allocated by IANA and the RIRs to ISPs and organizations, plus all other IP blocks that are reserved for private or special use by RFCs. As these IP blocks are not allocated or are specially reserved, they should not be routable or used on the Internet. However, some of these IP blocks do appear on the net, used primarily by individuals and organizations that are often specifically trying to avoid being identified and are often involved in activities such as DoS attacks, email abuse, hacking and other security problems. The majority of the most active blocklisted entities encountered are hosted by organizations which want to remain anonymous. Table 1 lists the top fifteen entities we encountered on the P2P network while exchanging resources, throughout the complete duration of our active trace collection. Surprisingly, we find that these entities operate from BOGON IP ranges. This observation is made on the basis of the various popular blocklist resources, and suggests that these sources deliberately wish to conceal their identities while serving files on P2P networks, by using IP ranges that cannot be tracked down with an IP-WHOIS lookup to locate the operator employing these anonymous blocks. Only three of the top fifteen entries in Table 1 do not use unallocated BOGON IP blocks and are listed on PG lists [1]. The rest of the BOGON entities are listed on both Trustyfiles [30] and Bluetack [9] lists. Most of the BOGON IP ranges point to either ARIN or RIPE IP ranges. We must mention, however, that these BOGON IP ranges were found to point back to these generic network address distribution entities at the time of our experiments. It is quite possible that these ranges have since been allocated to firms or individuals and may no longer remain anonymous. Content providers that are part of the RIAA do not participate in large-scale eavesdropping on P2P networks using their own IPs. We observe that a whopping 99.5% of blocklisted IPs contacted belong to BOGON ranges, commercial entities, educational institutions and others. Among all blocklisted IPs contacted, only about 0.5% could actually be traced back to record companies, such as Time Warner Inc. This is a clear indication of the minuscule presence of record companies trawling P2P networks in a proactive manner. According to popular perception in the P2P community, and discussions on blocklist hosting sites such as Phoenix Labs [35], the entry Fuzion Colo [31], [32] in Table 1 is viewed with distrust, and is understood to propagate self-installing malware and, in general, to be an anti-P2P entity. Xeex [33] is more of a mystery: it hosts an inconspicuous site which provides absolutely no information as to what the company is really involved in. Going by the discussion groups
Table 1. Listing of top 15 blocklist entities encountered on the P2P network

Rank   Top 15 Hit Ranges                 Type
1      72.48.128.0-72.235.255.255        Bogon
2      87.0.0.0-87.31.255.255            Bogon
3      88.0.0.0-88.191.255.255           Bogon
4      72.35.224.0-72.35.239.255         Fuzion Colo
5      71.138.0.0-71.207.255.255         Bogon
6      70.229.0.0-70.239.255.255         Bogon
7      70.159.0.0-70.167.255.255         Bogon
8      70.118.192.0-70.127.255.255       Bogon
9      216.152.240.0-216.152.255.255     xeex
10     216.151.128.0-216.151.159.255     xeex
11     70.130.0.0-70.143.255.255         Bogon
12     87.88.0.0-87.127.255.255          Bogon
13     71.66.0.0-71.79.255.255           Bogon
14     87.160.0.0-87.255.255.255         Bogon
15     70.82.0.0-70.83.255.255           Bogon
Table 2. Listing of top 5 educational and commercial entities encountered on P2P networks

Rank   Top 5 Educational Hit Ranges                        Top 5 Commercial Hit Ranges
1      152.2.0.0-152.2.255.255 (Univ. of N. Carolina)      72.35.224.0-72.35.239.255 (Fuzion Colo)
2      64.247.64.0-64.247.127.255 (Ohio University)        216.152.240.0-216.152.255.255 (XeeX)
3      129.93.0.0-129.93.255.255 (Univ. of Nebraska)       216.151.128.0-216.151.159.255 (XeeX)
4      128.61.0.0-128.62.255.255 (Georgia Tech)            38.113.0.0-38.113.255.255 (Perf. Systems Int. ed2k)
5      219.242.0.0-219.243.255.255 (CERNET)                66.172.60.0-66.172.60.255 (Netsentry ed2k server)
hosted on the PG website, xeex does turn up frequently in blocklist hits for a large number of users. Other individuals or organizations deliberately employing BOGON IPs to participate in the exchange of resources on P2P networks are certainly attempting to cloak themselves, possibly from the RIAA. Another vein of reasoning would suggest that they could be the ones keeping tabs on what users download. Table 2 displays the top five entities that registered hits on the educational and research institutions list and on the government and commercial organizations lists. We observe that Fuzion Colo and XeeX appear prominently in this categorization, along with two other commercial organizations which host servers on the ed2k and Gnutella networks. We find that hits to entities listed on commercial and government blocklists are much more frequent than hits on any other kind of blocklist, such as Internet ad companies, educational institutions and others. Even though the number of IPs belonging explicitly to content providers may be small, the fact that IPs listed on commercial and government blocklists are providing content to P2P users is of concern. The scenario wherein commercial organizations are hired by content providers to collect user profile data in these networks cannot be ruled out. Furthermore, the possibility that commercial organizations such as the ones listed in Table 2 are unaware of P2P traffic emanating from their servers and are simply lax about security does not seem very plausible, since some of these blocklisted entities kept monitoring our clients nearly every time files were exchanged. It is clear that these commercial IP ranges which serve files to P2P users have a very large cache of popular in-demand media and extremely low downtime, which seems improbable if the machines had in fact been turned into bots. In fact, the number of hits to
Fig. 1. Classification of blocklist hits according to their type (Edu, Gov, Spy, Ad; y-axis: number of hits). We observe that the number of hits on the commercial and government blocklist is significantly larger than on the other blocklists.
commercial and government blocklisted entities is nearly 2.5 times greater than hits to any other kind of blocklisted IP we were monitored by.
4 Probability
In this section we estimate the probability of a typical user being monitored by entities listed on these blocklists while surfing P2P networks. This gives an idea of how aggressive these lists are and what percentage of the entities we talk to while surfing P2P networks are not considered trustworthy. Throughout the complete duration of our measurements, 100% of our nodes were monitored by entities on blocklists, and on average 12-17% of all distinct IPs contacted by any of our clients were listed on blocklists. As illustrated in Fig. 2, the percentage of blocklisted IPs a node is monitored by is quite significant: about 12-17% of all distinct IPs contacted, per node. This trend held throughout the complete duration of our measurements, which suggests that the presence of blocklisted entities on P2P networks is not an ephemeral phenomenon. The popularity of blocklisted IPs monitoring P2P users follows a skewed distribution. We observe this behavior in Fig. 3a: a small number of entities register a large number of hits, while most blocklisted entities are only infrequently visible on P2P networks. This fact is of great consequence to users who wish to avoid contact with blocklisted entities and thus reduce their chances of running into anti-P2P entities. Simply filtering out the five most popular entities on these networks leads to a drastic reduction in the number of hits, to the tune of 94%. This statistic is displayed in Fig. 3b. In fact, avoiding just these top 5 popular IP ranges can reduce the chances of a user being monitored significantly, down to nearly 1%. Users may use this fact to tweak their IP filters (as sketched below) to increase their chances of safely surfing P2P networks and bypassing the most prevalent blocklisted entities. In contrast, a naive user without any knowledge of blocklists will almost certainly be monitored by blocklisted entities. Also, the fact that 100% of all nodes, regardless of geographical location, were monitored by blocklisted IPs indirectly points to the completeness of the blocklists we compiled from the most popular sources.
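As an illustration of such IP filtering, the sketch below drops peers falling inside blocklisted ranges; the five ranges are the top-ranked entries of Table 1, while the class and function names are our own, not taken from any blocklist tool.

import ipaddress

def parse_range(text):
    # Parse a 'low-high' dotted-quad range, as listed in Table 1.
    lo, hi = text.split('-')
    return int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi))

class Blocklist:
    def __init__(self, ranges):
        self.ranges = sorted(parse_range(r) for r in ranges)

    def contains(self, ip):
        addr = int(ipaddress.IPv4Address(ip))
        for lo, hi in self.ranges:   # sorted by lower bound
            if lo > addr:
                return False         # no later range can match
            if addr <= hi:
                return True
        return False

# Filtering only the five most prevalent ranges already removes ~94%
# of all blocklist hits according to the measurements above.
top5 = Blocklist([
    "72.48.128.0-72.235.255.255",
    "87.0.0.0-87.31.255.255",
    "88.0.0.0-88.191.255.255",
    "72.35.224.0-72.35.239.255",
    "71.138.0.0-71.207.255.255",
])
assert top5.contains("72.100.0.1") and not top5.contains("10.0.0.1")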
Fig. 2. Percentage of distinct blocklisted IPs contacted, per user, out of the total number of distinct IPs logged

Fig. 3. (a) Frequency of popularity of blocklisted IPs, following a skewed distribution. (b) Percentage contribution by blocklisted IPs: the 5 most popular blocklisted IPs contribute to nearly 94.2% of all blocklist hits.
5 Geographical Distribution
In this section we focus on whether any geographical bias is observed with respect to blocklisted IPs monitoring our clients from different locations. To achieve this, we needed a mechanism allowing us multiple geographical points of entry into a P2P network. We employed over 50 different nodes on PlanetLab, encompassing the continental US, Europe and Asia, to measure this metric. We monitor PlanetLab nodes located in the continental US individually, and classify nodes situated on the east coast as US-EC and on the west coast as US-WC. This was done to observe whether there is any variation in monitoring behavior within mainland US. Surprisingly, we find that measurements gathered from PlanetLab nodes located on US-EC and US-WC do not concur regarding the various metrics discussed in the following sections. Geographical location influences observed monitoring activity. To provide an idea of how blocklisted IPs monitor P2P users over a complete geographical spectrum, we present Fig. 4a. We observe that the percentage of blocklisted IP hits is highest in US-WC, followed by US-EC, Asia and Europe. The percentage of hits to blocklisted IPs per node, compared to the total hits to IPs contacted by each node, on the US-WC is nearly twice that of nodes located on US-EC. Quite obviously, this suggests that users accessing the P2P network from these two vantage points within mainland US encounter different levels
Fig. 4. (a) Distribution of blocklisted IPs contacted in different geographical zones. (b) Distribution of blocklisted IP hits to educational lists in different geographical zones. (c) Distribution of blocklisted IP hits to government and commercial lists in different geographical zones.
of monitoring activity. We believe this observed inequality springs from differences in user behavior and possible differences in the levels of monitoring activity by entities on the blocklists, which could directly be responsible for such a skewed trend. Fig. 4b depicts the distribution of blocklisted IP hits from the "educational" range, comprising academic and research institutions. Again, we observe a similar trend: nodes located on US-WC notch up a higher percentage of blocklist hits compared to nodes located on US-EC, Asia and Europe. In fact, the US-WC measurements exceed those gathered from US-EC by more than a factor of five. Fig. 4c depicts the distribution of blocklisted IP hits in the government and commercial domain. Once again, we observe that figures collected for nodes situated on US-WC are higher than for nodes on US-EC, Asia and Europe. Given that the period of observation, the UTC time when data was logged, the number of queries injected into the P2P network and the order in which queries were injected were identical, we surmise that the consistently skewed distribution between US-WC and US-EC throughout our experiments can be due to differences in user behavior and in the local prevalence and activity levels of blocklisted entities in these different geographical settings. Users on US-WC experience aggressive monitoring activity. Analyzing the information depicted in Fig. 2 and Fig. 4a-c, we observe that users located on US-WC run into a smaller number of distinct blocklisted IPs but at the same time register a larger number of hits to these ranges, a clear indication of heightened monitoring activity vis-a-vis other geographical locations. Nodes located in Europe consistently registered a lower number of blocklisted IP hits than nodes located in Asia. We attempted to maintain balance while conducting experiments: we deployed our code on nearly the same number of nodes in each geographical setting and logged data during synchronized time periods. The only difference while gathering measurements in these settings was that we used different lists of queries which were injected into the P2P
The P2P War: Someone Is Monitoring Your Activities!
1105
network for nodes located on separate continents. For nodes located in Europe we constructed query lists based on the European 50 hits [29], and for nodes in Asia we constructed query lists based on Asian hits [36]. The magnitude of the difference observed between nodes in Europe and Asia was found to be more or less consistent across the different types of blocklisted IPs. Both were, however, significantly different from measurements gathered across mainland US. We believe that this difference could again be due to dissimilarities in user behavior and monitoring activity across geographical boundaries.
6 Role Dependent Monitoring
This section delves into whether, according to popular perception in P2P communities [10], [11], the probability of being monitored by blocklisted entities varies with the "role" played by a P2P node. The question we answer is: are UPs monitored with higher probability by entities on blocklists than regular leaf nodes? This could show whether content providers consider monitoring UPs to be a more fruitful exercise. Through our measurements we find that there is no conclusive evidence to support any theory of role-based monitoring. We observe the connection dynamics of UP and leaf nodes in Fig. 5a. Surprisingly, for leaves located in the US the mean number of distinct IPs contacted is higher than for UPs. This is in contrast to nodes in Europe or Asia, where the mean number of distinct IPs contacted is higher for UPs. This observation suggests that UPs in the US are more conservative in terms of how many users they talk to in comparison with UPs in Europe or Asia. An obvious question that comes to mind is: should UPs interacting with a smaller number of distinct IPs translate into a lower probability of a UP being monitored? As we will see next, this is not always true. In Fig. 5b we observe the comparison between the percentage of blocklisted IP hits with respect to the total IPs contacted for UPs and leaf nodes. This metric depicts whether there is any correlation suggesting UPs are monitored preferentially over leaf nodes, irrespective of geographical location. We find that UPs in US-WC encounter higher percentages of blocklisted IPs than leaf nodes. This trend is
UP Vs Leaf:Percent of Blocklist IP hits
4,000
100% UP Leaf
UP Leaf
% of Blocklist IP hits
Mean No. of distinct IPs
3,500 3,000 2,500 2,000 1,500 1,000
80%
60%
40%
20%
500 0
US−EC
US−WC
Europe
Geographical Regions
(a)
Asia
0%
US−EC
US−WC
Europe
Asia
Geographical Regions
(b)
Fig. 5. UP Vs Leaf:The black bar signifies UP while the yellow bar signifies leaf users (a) Comparison of average number of distict IPs contacted by UPs and leaves. (b) Comparison of percentage of blocklisted IPs as encountered by UPs and leaf users.
consistent with Europe-based nodes. However, for US-EC and Asia-based nodes we observe that UPs encounter lower percentages of blocklisted IPs than leaf nodes. In fact, we find less than 5% variation in the average number of blocklisted IP hits registered by UPs versus leaf nodes. We therefore find no conclusive evidence for claims that UPs are preferentially monitored by blocklisted entities over leaf nodes. Also, to answer the question posed previously, consider the case of US-WC, where UPs talk to fewer distinct IPs but are still monitored by a larger number of blocklisted IPs. This is a clear indication that monitoring activity varies with geographical location and that talking to a smaller number of IPs does not translate into a lower probability of being monitored. We must mention that our measurements suggest a definite disparity in monitoring activity between US-WC and US-EC, which could possibly be associated with differences in user activity levels at these locations. The imbalance in observations for Europe and Asia can possibly be explained by the "interest" of content providers in monitoring P2P networks in those regions: the scanty number of lawsuits in Asia, in comparison to significant numbers in the US and Europe, lends credence to this explanation [22], [20].
7 Conclusion
To the best of our knowledge, this work is the first to quantify the probability that a user will be monitored, i.e., will interact with a suspicious IP address. Using PlanetLab, we conduct large-scale active measurements, spanning a period of 90 days, from January to March 2006, spread over 3 continents, yielding nearly 100 GB of TCP packet header data. A naive user is practically guaranteed to be monitored: we observe that 100% of our peers run into blocklisted users. In fact, 12% to 17% of all distinct IPs contacted by a peer are in blocklisted ranges. Interestingly, a little caution can have a significant effect: the top five most prevalent blocklisted IP ranges contribute to nearly 94% of all blocklisted entities we ran into. This information can help users reduce their chances of being monitored to just about 1%. At the same time, we examine various dimensions of the users, such as geographical location and the role of the node in the network. We find that geographical location, unlike the role, seems to affect the probability of encountering blocklisted users. Finally, we answer who owns blocklisted IP addresses. Interestingly, we find that just 0.5% of all blocklisted IP hits belong explicitly to media companies. The majority of blocklisted users seem to belong to commercial and government organizations, and a sizeable portion of the most popular belong to BOGON ranges. Our work is a first step in monitoring the new phase of the "war" between content providers and the P2P community. It will be very interesting to continue to monitor the evolution of this conflict. A logical next step is to analyze the accuracy and completeness of the blocklists, and the speed with which a new blocklisted entity is flagged.
References
1. http://peerguardian.sourceforge.net
2. E. K. Lua, J. Crowcroft, M. Pias, R. Sharma and S. Lim: A Survey and Comparison of Peer-to-Peer Overlay Network Schemes. IEEE Comm. Survey, March 2004.
3. http://news.dmusic.com/article/7509
4. http://www.betanews.com/article/MPAASuesUsenetTorrentSearchSites
5. http://importance.corante.com/archives/005003.html
6. http://www.mp3newswire.net/stories/napster.html
7. http://news.com.com/2100-1027-995429.html
8. http://sourceforge.net/projects/peerprotect
9. http://bluetack.co.uk/blc.php
10. http://www.boycott-riaa.com/article/9316
11. http://slashdot.org/articles/02/05/25/0324248.shtml
12. http://www.planet-lab.org
13. T. Karagiannis, A. Broido, M. Faloutsos, and kc claffy: Transport Layer Identification of P2P Traffic. In: ACM SIGCOMM IMC'04, 2004.
14. E. Markatos: Tracing a Large-Scale Peer-to-Peer System: an Hour in the Life of Gnutella. In: 2nd IEEE/ACM Intl. Symp. on Cluster Computing & the Grid, 2002.
15. S. Sen and J. Wang: Analyzing Peer-to-Peer Traffic Across Large Networks. In: ACM SIGCOMM IMW, 2002.
16. T. Karagiannis, P. Rodriguez and D. Papagiannaki: Should Internet Service Providers Fear Peer-Assisted Content Distribution? In: IMC'05, Berkeley.
17. K. Tutschku: A Measurement-Based Traffic Profile of the eDonkey Filesharing Service. In: PAM'04, Antibes Juan-les-Pins, France, 2004.
18. http://www.techspot.com/news/16394-record-labels-launch-action-kazaa.html
19. http://www.mpaa.org/CurrentReleases/2004 12 14 WwdeP2PActions.pdf
20. V. Alter: Building Rome in a Day: What Should We Expect from the RIAA? 56 Hastings Comm. & Ent. L.J. 155.
21. J. Black: The Keys to Ending Music Piracy. Bus. Wk., Jan. 27, 2003, http://www.businessweek.com/bwdaily/dnflash/jan2003/
22. RIAA Gives Advance Warning to Song-Swappers Before Lawsuits are Filed. http://www.antimusic.com/news/03/oct/item77.shtml, 2003.
23. T. Karagiannis, A. Broido, N. Brownlee, KC Claffy, M. Faloutsos: Is P2P Dying or Just Hiding? IEEE Globecom 2004.
24. J. Chu, K. Labonte, and B. N. Levine: Availability and Locality Measurements of Peer-to-Peer File Systems. In: Proc. of ITCom '02.
25. F. Clvenot-Perronnin and P. Nain: Stochastic Fluid Model for P2P Caching Evaluation. In: Proc. of IEEE WCW 2005.
26. http://azureus.sourceforge.net/plugin details.php plugin safepeer
27. http://mutella.sourceforge.net/
28. http://www.billboard.com/bbcom/charts/chart display.jsp?fThe Billboard Hot 100
29. http://www.mp3hits.com/charts/euro
30. http://www.trustyfiles.com
31. http://isc.sans.org/diary.php?date=2005-04-11
32. http://www.winmxworld.com/tutorials/block the RIAA.html
33. http://xeex.com
34. http://www.completewhois.com/bogons/index.htm
35. http://phoenixlabs.org
36. http://www.mtvasia/Onair
On-Line Predictive Load Shedding for Network Monitoring

Pere Barlet-Ros¹, Diego Amores-López¹, Gianluca Iannaccone², Josep Sanjuàs-Cuxart¹, and Josep Solé-Pareta¹

¹ Technical University of Catalonia (UPC), Computer Architecture Dept., Jordi Girona, 1-3 (Campus Nord D6), Barcelona 08034, Spain
{pbarlet,damores,jsanjuas,pareta}@ac.upc.edu
² Intel Research, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
[email protected]
Abstract. Building robust network monitoring applications is hard given the unpredictable nature of network traffic. Complex analysis on streaming network data usually leads to overload situations when presented with anomalous traffic, extreme traffic mixes or highly variable rates. We present an on-line predictive load shedding scheme for monitoring systems that quickly reacts to overload situations by gracefully degrading the accuracy of analysis methods. The main novelty of our approach is that it does not require any knowledge of the monitoring applications. This way we preserve a high degree of flexibility, increasing the potential uses of these systems. We implemented our scheme in an existing network monitoring system and deployed it in a research ISP network. Our experiments show a 10-fold improvement in the accuracy of the results during long-lived executions with several concurrent monitoring applications. The system efficiently handles extreme load situations, while being always responsive and without undesired packet losses. Keywords: Network monitoring, load shedding, resource management, traffic sampling, resource usage monitoring, resource usage prediction.
1 Introduction
The processing requirements imposed on network monitoring systems have greatly increased in recent years. Continuous and fine-grained analysis of network traffic is now a basic requirement for this class of systems. For example, there is a growing demand for monitoring applications that require tracking and inspection of a large number of concurrent network connections for intrusion and anomaly detection purposes. These systems must also handle ever-increasing link speeds and highly variable data rates, and be robust to anomalous or extreme traffic mixes. Within the networking research community, several initiatives have been advanced to provision monitoring infrastructures that allow a large number of
users to submit arbitrary traffic queries on live network streams [1,2]. Recent research proposals have also introduced system designs that provide developers with sufficient flexibility in the definition of the monitoring applications and with the ability to distribute their computations efficiently across the measurement infrastructure [3,4]. However, proposed designs do not directly address the increasingly serious problem of efficiently handling overload situations, when resource demands clearly exceed the system capacity. The alternative of over-provisioning the system to handle peak rates or any possible traffic mix has two major drawbacks. First, it would be prohibitively expensive and result in a highly underutilized system based on an extremely pessimistic estimation of workload [5]. Second, it would necessarily reduce the system's flexibility and possible applications [6]. We have designed a load shedding scheme that allows current network monitoring systems to sustain rapidly increasing data rates, numbers of users and complexity of analysis methods, with minimum impact on the accuracy of the results. The main novelty of our approach is that it does not require any explicit knowledge of the queries or the type of computations they perform (e.g., flow classification, maintaining aggregate counters, pattern search). In a previous work [7], we proposed a method to predict the resource usage of arbitrary and continuous network traffic queries. Our method (briefly reviewed in Section 3) automatically identifies, from small sequences of the incoming packet streams, the traffic feature(s) that best model the cost of each query (e.g., the number of packets, bytes, unique source IP addresses, etc.) and uses them to accurately predict the CPU usage. In this paper, we extend that work by defining how this short-term prediction can be used to guide the system in deciding when, where and how much load to shed in the presence of overload (Section 4). We present long-lived experiments on a research ISP network, where the traffic load and query requirements exceed by far the capacity of the monitoring system (Section 5). Our results indicate that, with the load shedding mechanism in place, (i) the system efficiently handles extreme overload situations, while being always responsive and without introducing undesired packet losses, and (ii) the queries can always complete and return results within acceptable error bounds.
2 Related Work
Most of the existing proposals to handle overload situations in network monitoring are based on data reduction techniques, such as packet filtering, aggregation and sampling. The most representative example is arguably Cisco’s NetFlow [8], which aggregates incoming packets into flow records. Sampled NetFlow also resorts to packet sampling to deal with overload situations, while Adaptive NetFlow [9] dynamically adapts the sampling rate to the memory consumption. Keys et al. [6] developed a monitoring system robust to extreme traffic mixes that combines aggregation, adaptive sampling and the use of memory-efficient counting algorithms to extract a set of 12 pre-defined traffic summaries.
Several works have also addressed similar problems in the intrusion detection space. For example, Dreger et al. discuss in [10] several modifications to the Bro NIDS [11], such as dynamically selecting the restrictiveness of the packet filters, to allow Bro to operate in high-speed environments. Gonzalez et al. [12] also propose the inclusion of a secondary path into Bro that implements sampling and filtering to reduce the cost of those analysis tasks that do not require stream reassembly and stateful inspection. The design of mechanisms to handle overload situations is a classical problem in any real-time system and several works have proposed solutions in other environments. For example, in the database community, the Aurora system [13] sheds excess load by inserting additional drop operators in the query data flow, while TelegraphCQ [14] uses approximate query processing techniques to provide delay-bounded answers in the presence of overload. Unfortunately, proposed solutions require the use of declarative query languages with a restricted set of operators, of which cost and selectivity are assumed to be known, hindering the use of those techniques in our context. In the Internet services space, SEDA [15] proposes an architecture to develop highly concurrent server applications, built as networks of stages interconnected by queues. In SEDA, load shedding is achieved by applying admission control on the event queues when an overload situation is detected.
3 Architecture

3.1 Monitoring Platform
We chose the CoMo platform [4] to develop and evaluate our load shedding scheme. The platform allows users to define traffic queries as plug-in modules written in C that contain stateful computations. The user is also required to specify a simple stateless filter to be applied to the incoming packet stream, as well as the granularity of the measurements, hereafter called the measurement interval (i.e., the time interval that will be used to report continuous query results). In order to provide the user with maximum flexibility when writing queries, CoMo does not restrict the type of computations that a plug-in module can perform. As a consequence, the platform does not have any explicit knowledge of the data structures used by the plug-in modules or the cost of maintaining them.

3.2 Prediction and Load Shedding Overview
Figure 1 shows the components and data flow in the system. The prediction and load shedding subsystem (in gray) intercepts the packets from the filter before they are sent to the plug-in module implementing the traffic query. The system operates in four phases. First, it groups each 100 ms of traffic in a "batch" of packets.¹

¹ The choice of 100 ms is somewhat arbitrary, but our experimental results indicate that it represents a good trade-off between prediction accuracy and overhead, as we will show in Section 5.2.
On-Line Predictive Load Shedding for Network Monitoring
1111
Fig. 1. System overview
pre-defined traffic features. A feature is a counter that describes a specific property of the batch. For example, the number of packets, bytes, unique destination IP addresses, 5-tuple flows, etc. The features we compute have the advantage of being lightweight with a deterministic worst case computational cost. An exhaustive description of the 42 traffic features currently supported by our system can be found in [7]. The feature selection subsystem is in charge of selecting the most relevant features according to the recent history of the query’s CPU usage. This phase is important to reduce the overhead of the prediction algorithm, because it allows the system to discard beforehand the features regarded as useless for prediction purposes. This subset of relevant features is then given as input to the multiple linear regression (MLR) subsystem to predict the CPU cycles required by the query to process the entire batch. When the prediction exceeds the available cycles, the load shedding subsystem pre-processes the batch to discard a portion of the packets. Finally, the actual CPU usage is computed and fed back to the prediction subsystem to close the loop. The feature extraction, feature selection and multiple linear regression phases were already described and evaluated in [7]. In the following sections we focus on the load shedding component of the system.
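As an illustration of the MLR step, the following minimal Python sketch fits a regression over a short history of per-batch features and measured cycles and predicts the cost of the next batch. The feature values, history length and function names are illustrative assumptions, not CoMo's actual interface.

import numpy as np

def mlr_predict(history_features, history_cycles, current_features):
    # Fit CPU cycles as a linear function of the selected traffic
    # features over the recent history, then evaluate the fit on the
    # features of the batch about to be processed.
    X = np.asarray(history_features, dtype=float)   # one row per past batch
    y = np.asarray(history_cycles, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares MLR fit
    return float(np.asarray(current_features, dtype=float) @ coef)

# Hypothetical features: [packets, bytes, unique source IPs] per batch
hist_X = [[1200, 9.1e5, 300], [1500, 1.2e6, 340], [900, 7.0e5, 280]]
hist_y = [2.1e6, 2.6e6, 1.7e6]                      # measured CPU cycles
pred_cycles = mlr_predict(hist_X, hist_y, [1400, 1.1e6, 320])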
4 Load Shedding
In this section, we provide the answers to the three fundamental questions any load shedding scheme needs to address: (i) when to shed load (i.e., on which batch), (ii) where to shed load (i.e., on which query) and (iii) how much load to shed (e.g., which sampling rate to apply). Algorithm 1 presents our load shedding scheme in detail, which uses the output of the prediction subsystem described in Section 3.

4.1 When to Shed Load
To decide when to shed load, the system maintains a threshold (avail_cycles) that accounts for the number of cycles available in a time bin to process queries. Since batch arrivals are periodic (e.g., every 0.1 s in our implementation), this threshold can be dynamically computed as (time_bin × CPU frequency) − overhead, where overhead stands for the cycles needed by our prediction subsystem (ps_cycles),
Algorithm 1. Load shedding algorithm

Input: Q: set of queries q_i
       b_i: batch to be processed by q_i after filtering
       como_cycles: CoMo overhead cycles
       rtthresh, delay: buffer discovery parameters

1   srate = 1;
2   pred_cycles = 0;
3   foreach q_i in Q do
4       f_i = feature_extraction(b_i);
5       s_i = feature_selection(f_i, h_i);
6       pred_cycles += mlr(f_i, s_i, h_i);
7   avail_cycles = (time_bin × CPU frequency) − (como_cycles + ps_cycles) + (rtthresh − delay);
8   if avail_cycles < pred_cycles × (1 + error) then
9       srate = (avail_cycles − ls_cycles) / (pred_cycles × (1 + error));
10      foreach q_i in Q do
11          b_i = sampling(b_i, q_i, srate);
12          f_i = feature_extraction(b_i);
13      ls_cycles = α × Σ_i ls_cycles_i + (1 − α) × ls_cycles;
14  foreach q_i in Q do
15      query_cycles_i = run_query(b_i, q_i, srate);
16      h_i = update_mlr_history(h_i, f_i, query_cycles_i);
17      error = α × |1 − pred_cycles_i / query_cycles_i| + (1 − α) × error;
plus those spent by other CoMo tasks (como_cycles) not directly related to query processing (e.g., packet collection, disk and memory management). The CPU usage is measured using the time-stamp counter, as described in [7]. When the predicted cycles for all queries (pred_cycles) exceed the avail_cycles threshold, excess load needs to be shed. We observed that, for certain time bins, como_cycles is greater than avail_cycles, due to CoMo implementation issues (i.e., other CoMo tasks can occasionally consume all available cycles). This would force the system to discard entire batches, with a negative impact on the accuracy of the prediction and query results. However, this situation can be mitigated by considering the presence of buffers (e.g., in the capture devices) that allow the system to use more cycles than those available in a single time bin. That is, the system can fall behind real-time operation as long as it is stable in the steady state. We use an algorithm, inspired by the way TCP determines the size of the congestion window [16], to dynamically discover by how much the system can safely (i.e., without loss) exceed the avail_cycles threshold. The algorithm continuously monitors the system delay (delay), defined as the difference between the cycles actually used and those available in a time bin, and maintains a threshold (rtthresh) that controls the number of cycles by which the system can be delayed without
loss. rtthresh is initially set to zero and is increased whenever queries use fewer cycles than available. If at some point the occupation of the buffers exceeds a predefined value (i.e., the system is becoming unstable), rtthresh is reset to zero, and a second threshold (initialized to ∞) is set to rtthresh/2. rtthresh grows exponentially while below this second threshold, and linearly once it is exceeded. This technique has two main advantages. First, it is able to operate without explicit knowledge of the maximum rate of the input streams. Second, it allows the system to quickly react to changes in the traffic. Algorithm 1 (line 7) shows how the avail_cycles threshold is modified to account for the presence of buffers. Note that, at this point, delay is never less than zero, because if the system used fewer cycles than those available in a previous time bin, the spare cycles would be lost waiting for the next batch to become available. Finally, as we further discuss in Section 4.3, we multiply pred_cycles by (1 + error) in line 8 as a safeguard against prediction errors, where error is an Exponentially Weighted Moving Average (EWMA) of the actual prediction error measured in previous time bins (computed as shown in line 17).
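A minimal Python sketch of this buffer discovery rule follows; the unit growth step, the buffer-occupation flag and the function signature are our own assumptions for illustration, with the second threshold (ssthresh) initialized to infinity as in the text.

def update_rtthresh(rtthresh, ssthresh, spare_cycles, buffer_high):
    # TCP-slow-start-style discovery of how far behind real time the
    # system may safely run. spare_cycles > 0 means queries used fewer
    # cycles than available; buffer_high signals excessive buffer
    # occupation (the system is becoming unstable).
    if buffer_high:
        ssthresh = rtthresh / 2               # remember half the old threshold
        rtthresh = 0                          # back off completely
    elif spare_cycles > 0:
        if rtthresh < ssthresh:
            rtthresh = max(1, 2 * rtthresh)   # exponential growth phase
        else:
            rtthresh += 1                     # linear growth phase
    return rtthresh, ssthresh

rtthresh, ssthresh = 0, float('inf')          # initial state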
4.2 Where and How to Shed Load
Our approach to shedding excess load consists of adaptively reducing the volume of data to be processed by the queries (i.e., the size of the batch). We already discussed in Section 2 several data reduction techniques that can be used for this purpose (e.g., filtering, aggregation and sampling). In our current implementation, we support uniform packet and flow sampling, and let each query select at configuration time the option that yields the best results. When an overload situation is detected, the same sampling rate is applied to all queries (line 11). (Note that using the same sampling rate for all queries does not differentiate among them; see Section 6 for further discussion.) In order to efficiently implement flow sampling, we use a hash-based technique called Flowwise sampling [17]. This technique randomly samples entire flows without caching the flow keys, which significantly reduces the processing and memory requirements during the sampling process. To avoid bias in the selection and deliberate sampling evasion, we randomly generate a new H3 hash function [18] per query every measurement interval, which distributes the flows uniformly and unpredictably.
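The sketch below illustrates stateless hash-based flow sampling in this spirit. A keyed BLAKE2 hash re-seeded every measurement interval stands in for drawing a fresh H3 function; that substitution is an assumption made to keep the sketch self-contained, not the exact construction used in the system.

import hashlib, os

class FlowSampler:
    # A flow is kept iff the hash of its 5-tuple, mapped to [0, 1),
    # falls below the sampling rate; no per-flow state is cached.
    def __init__(self):
        self.rekey()
    def rekey(self):
        # Call once per measurement interval to defeat sampling evasion.
        self.key = os.urandom(16)
    def keep(self, five_tuple, srate):
        h = hashlib.blake2b(repr(five_tuple).encode(), key=self.key).digest()
        u = int.from_bytes(h[:8], 'big') / 2**64
        return u < srate

sampler = FlowSampler()
print(sampler.keep(('10.0.0.1', '10.0.0.2', 1234, 80, 'tcp'), 0.25))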
4.3 How Much Load to Shed
The magnitude of load shedding is determined by the maximum sampling rate that keeps the CPU usage below the avail_cycles threshold. Since the system does not differentiate among queries, the sampling rate could simply be set to the ratio avail_cycles/pred_cycles for all queries. This assumes that their CPU usage is proportional to the size of the batch (in packets or flows, depending on whether packet or flow sampling is used). However, the cost of a query can actually depend on
several traffic features, or even on a feature different from the number of packets or flows. In addition, there is no guarantee of keeping the CPU usage below the avail_cycles threshold, due to the error introduced by the prediction subsystem. We deal with these limitations by maintaining an EWMA of the prediction error (line 17) and correcting the sampling rate accordingly (line 9). Moreover, we have to take into account the extra cycles that will be needed by the load shedding subsystem (ls_cycles), namely the sampling procedure (line 11) and the feature extraction (line 12), which must be repeated after sampling in order to correctly update the MLR history. Thus, we also maintain an EWMA of the cycles spent in previous time bins by the load shedding subsystem (line 13) and subtract this value from avail_cycles. After applying the mentioned changes, the sampling rate is computed as shown in Algorithm 1 (line 9). The EWMA weight α is set to 0.9 in order to quickly react to changes. It is also important to note that if the prediction error were zero on average, we could remove it from lines 8 and 9, because the buffers should be able to absorb such error. However, there is no guarantee of a zero mean in the short term.
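In code, the corrected rate computation of lines 9, 13 and 17 reduces to a few EWMA updates; this Python fragment is a paraphrase of Algorithm 1, with variable names matching the text.

ALPHA = 0.9                          # EWMA weight, as in the text

def ewma(old, sample, alpha=ALPHA):
    return alpha * sample + (1 - alpha) * old

def sampling_rate(avail_cycles, ls_cycles, pred_cycles, error):
    # Maximum rate keeping the error-corrected prediction within the
    # cycle budget (Algorithm 1, line 9), clamped to [0, 1].
    srate = (avail_cycles - ls_cycles) / (pred_cycles * (1 + error))
    return min(1.0, max(0.0, srate))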
5 Evaluation and Operational Results

In this section we evaluate our load shedding system in a research ISP network. We also assess the impact of sampling on the accuracy of the queries, and compare the results of our predictive scheme to a system that instead uses a reactive approach, discarding packets when the buffers become full. We do not present here the accuracy of the prediction subsystem, which was already evaluated in [7].

5.1 Testbed Scenario
Our testbed equipment consists of two single-processor 3 GHz Pentium IV machines, both equipped with an Endace DAG 4.3GE card [19]. Through a pair of optical splitters, both computers receive an exact copy of one direction of a full-duplex Gigabit Ethernet link that connects the Catalan RREN (Scientific Ring) to the Spanish NREN (RedIRIS). The first PC is used to run the CoMo monitoring system on-line, while the second one only collects a packet-level trace, which is used as our reference to verify the accuracy of the results. Throughout the evaluation, we present the results of two 8-hour-long executions (see Table 1 and Figure 3(a) for details). In the first one (load shedding), we ran a modified version of CoMo that implements our load shedding scheme (the source code of our load shedding system is publicly available at http://loadshedding.ccaba.upc.edu), while in the second execution (original como), we repeated the same experiment using the original version of CoMo. The duration of the executions was determined by the amount of storage space available to collect the packet-level traces (400 GB).
Table 1. Executions done in our experiments

Execution      Date/Time              Link load (Mbps), mean/max/min
load shedding  24/Oct/06 9:00-17:00   750.4/973.6/129.0
original como  25/Oct/06 9:00-17:00   719.9/967.5/218.0
Table 2. Queries used in the experimental evaluation

Name              Description
application       Port-based application classification
counter           Traffic load in packets and bytes
flows             Per-flow counters
high-watermark    High watermark of link utilization
pattern search    Finds sequences of bytes in the payload
top destinations  List of the top-10 destination IPs
trace             Full-payload collection
We have selected a set of seven queries that are part of the standard distribution of CoMo (see Table 2); the source code of the queries used in the evaluation is publicly available at http://como.sourceforge.net. They present different resource usage profiles (CPU, memory and disk bandwidth) for the same input traffic and use different data structures to maintain their state (e.g., aggregated counters, hash tables, sorted lists). Note that our method considers all queries as black boxes.

5.2 Performance Results
Figure 2 presents the CPU usage during the load shedding execution, broken down by the three main tasks presented in Section 4 (i.e., como_cycles, query_cycles and ps_cycles + ls_cycles). We also plot the cycles the system estimates as needed to process all incoming traffic (i.e., pred_cycles). From the figure, it is clear that the system is under severe stress: during almost the entire execution, it needs more than twice the available cycles to run our seven queries without loss. However, we can observe that our load shedding system is able to keep the CPU usage consistently below the 3 GHz mark. Figure 3(a) confirms that, during the 8 hours, not a single packet was lost. This indicates that the predictions are accurate and the system is robust to overload. In Figure 3(b), we plot the Cumulative Distribution Function (CDF) of the CPU usage per batch (i.e., the service time per batch). Recall that batches represent 100 ms of traffic, resulting in 3 × 10^8 cycles available per batch. The figure shows that the system is stable. As expected, the limit of available cycles is sometimes slightly exceeded, owing to the buffer discovery algorithm presented in Section 4.1. The CDF also indicates good CPU usage, between 2.5 and 3 × 10^8 cycles with a probability of around 90%.
Fig. 2. CPU usage (load shedding execution)

Fig. 3. Performance of our load shedding system compared to the original CoMo: (a) link load and packet drops; (b) CDF of the CPU usage per batch
In contrast, Figure 3(b) shows that, for the original como execution, the service time per batch is significantly larger than the inter-arrival time of batches, with a probability of exceeding the limit of available cycles greater than 30%. Thus, this system is unstable and leads not only to uncontrolled packet drops, but even to losses of entire batches. Figure 3(a) shows the packets dropped by the DAG card (the values are a lower bound of the actual drops, because the loss counter present in the DAG records is only 16 bits long), while Figure 3(b) shows that the probability of losing an entire batch (i.e., a service time of zero) is larger than 20%.

5.3 Accuracy Results

We modified the source code of the counter, flows and top destinations queries in order to allow them to estimate their unsampled output when load shedding
is performed. This modification was done simply by multiplying the metrics they compute by the inverse of the sampling rate applied to each batch. We chose the counter and flows queries mainly to verify our implementation of packet and flow sampling, respectively. In particular, we measured the relative value error in the number of packets, bytes and flows, defined as |1 − estimated value/actual value|, where the actual value is obtained from the complete packet trace. Conversely, the top destinations query was chosen to evaluate the impact of our current load shedding mechanisms on a query that computes a metric known to be statistically more complex and problematic [17,20]. In this case, we selected packet sampling as the load shedding mechanism [20]. In order to objectively measure the error, we used the detection performance metric proposed in [20], which is defined as the number of misranked flow pairs, where the first element of a pair is in the top-10 list returned by the query and the second one is outside the actual top-10. Table 3 presents the error in the results of these three queries, averaged across all the measurement intervals. We can observe that although our load shedding system introduces a certain overhead, the error is kept significantly low compared to the original version of CoMo. Large standard deviation values are due to long periods of consecutive packet drops during the original como execution. It is also worth noting that the error of the top destinations query obtained in the load shedding execution is consistent with that of [20].

Table 3. Errors in the query results (mean ± stdev)

Query              original como     load shedding
counter (packets)  55.03% ± 11.45    0.54% ± 0.50
counter (bytes)    55.06% ± 11.45    0.66% ± 0.60
flows              38.48% ± 902.13   2.88% ± 3.34
top destinations   21.63 ± 31.94     1.41 ± 3.32
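The estimation and error metric above amount to a one-line rescaling; the numbers in this Python illustration are invented.

def estimate_unsampled(sampled_value, srate):
    # Scale an additive per-batch metric by the inverse sampling rate.
    return sampled_value / srate

def relative_error(estimated, actual):
    # Relative value error |1 - estimated/actual| used in Table 3.
    return abs(1 - estimated / actual)

est = estimate_unsampled(5000, 0.10)      # 5,000 packets seen at 10% sampling
err = relative_error(est, actual=50800)   # about 1.6%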
Figure 2 shows the overhead introduced by our load shedding system (ps_cycles + ls_cycles) into the normal operation of the entire CoMo system. We believe this overhead is reasonably low compared to the advantages of keeping the CPU usage and the accuracy of the results well under control. The bulk of the overhead, as discussed in [7], corresponds to the feature extraction phase, which is entirely implemented using a family of memory-efficient algorithms that could be built directly in hardware [21]. Alternatively, this overhead could be significantly reduced by applying sampling in this phase.
6 Conclusions and Future Work
In this paper, we presented a predictive load shedding scheme that operates without explicit knowledge of the traffic queries and quickly reacts to overload situations by gracefully degrading their accuracy via packet and flow sampling.
We implemented our scheme in an existing monitoring system and evaluated its performance and correctness in a research ISP network. We demonstrated the robustness of our method through an 8-hour-long continuous execution, during which the system exhibited good CPU utilization without packet loss, even when it was under severe stress. We also pointed out a significant gain in the accuracy of the results compared to the original version of the same monitoring system. We also identified several limitations of our current implementation that constitute an important part of our immediate future work. First, our method does not differentiate among queries. Hence, we are currently investigating the use of different sampling rates for different queries according to per-query utility functions, as proposed in [13]. Second, there is a large set of conceivable queries that are not able to correctly estimate their unsampled output from sampled streams. For those queries, we plan to support several different load shedding mechanisms, such as computing lightweight summaries of the input data streams [14] and more robust flow sampling techniques [22]. Declarative load shedding is also part of our future work, which will allow the queries to specify their own load shedding mechanisms. Finally, we are interested in applying similar techniques to other system resources, such as memory, storage space and disk bandwidth.

Acknowledgments. This work was funded by a University Research Grant awarded by the Intel Research Council, and by the Spanish Ministry of Education (MEC) under contract TEC2005-08051-C03-01 (CATARO project). The authors would also like to thank the Supercomputing Center of Catalonia (CESCA) for allowing them to collect the packet traces used in this work.
References

1. The OneLab project: http://www.fp6-ist-onelab.eu
2. kc claffy, Crovella, M., Friedman, T., Shannon, C., Spring, N.: Community-oriented network measurement infrastructure (CONMI) workshop report. SIGCOMM Comput. Commun. Rev. 36(2) (2006) 41–48
3. Cranor, C., Johnson, T., Spataschek, O., Shkapenyuk, V.: Gigascope: A stream database for network applications. In: Proceedings of ACM Sigmod, New York, NY, USA, ACM Press (June 2003) 647–651
4. Iannaccone, G.: Fast prototyping of network data mining applications. In: Proceedings of Passive and Active Measurement Conference (March 2006)
5. Stankovic, J., Lu, C., Son, S., Tao, G.: The case for feedback control real-time scheduling. In: Proceedings of the 11th Euromicro Conference on Real-Time Systems (June 1999) 11–20
6. Keys, K., Moore, D., Estan, C.: A robust system for accurate real-time summaries of internet traffic. In: Proceedings of ACM Sigmetrics, New York, NY, USA, ACM Press (2005) 85–96
7. Barlet-Ros, P., Iannaccone, G., Sanjuàs-Cuxart, J., Amores-López, D., Solé-Pareta, J.: Predicting resource usage of arbitrary network traffic queries. Technical report, Technical University of Catalonia (December 2006) http://loadshedding.ccaba.upc.edu/prediction.pdf
8. Cisco Systems: NetFlow services and applications. White Paper (2000)
9. Estan, C., Keys, K., Moore, D., Varghese, G.: Building a better NetFlow. In: Proceedings of ACM Sigcomm, New York, NY, USA, ACM Press (August 2004) 245–256
10. Dreger, H., Feldmann, A., Paxson, V., Sommer, R.: Operational experiences with high-volume network intrusion detection. In: Proceedings of ACM Conference on Computer and Communications Security, New York, NY, USA, ACM Press (2004) 2–11
11. Paxson, V.: Bro: A system for detecting network intruders in real-time. Computer Networks 31 (1999) 2435–2463
12. Gonzalez, J., Paxson, V.: Enhancing network intrusion detection with integrated sampling and filtering. In: Proceedings of International Symposium on Recent Advances in Intrusion Detection (2006) 272–289
13. Tatbul, N., Çetintemel, U., Zdonik, S.B., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: Proceedings of International Conference on Very Large Data Bases (2003) 309–320
14. Reiss, F., Hellerstein, J.M.: Declarative network monitoring with an underprovisioned query processor. In: Proceedings of International Conference on Data Engineering, Los Alamitos, CA, USA, IEEE Computer Society (2006) 56–67
15. Welsh, M., Culler, D.E., Brewer, E.A.: SEDA: An architecture for well-conditioned, scalable internet services. In: Proceedings of ACM Symposium on Operating System Principles, New York, NY, USA, ACM Press (2001) 230–243
16. Stevens, W.R.: TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery algorithms. RFC 2001 (January 1997)
17. Duffield, N.: Sampling for passive internet measurement: A review. Statistical Science 19(3) (2004) 472–498
18. Carter, J.L., Wegman, M.N.: Universal classes of hash functions. Journal of Computer and System Sciences 18(2) (1979) 143–154
19. Endace: http://www.endace.com
20. Barakat, C., Iannaccone, G., Diot, C.: Ranking flows from sampled traffic. In: Proceedings of ACM Conference on Emerging Network Experiment and Technology, New York, NY, USA, ACM Press (2005) 188–199
21. Estan, C., Varghese, G., Fisk, M.: Bitmap algorithms for counting active flows on high speed links. In: Proceedings of ACM SIGCOMM Conference on Internet Measurement (2003) 153–166
22. Duffield, N., Lund, C., Thorup, M.: Flow sampling under hard resource constraints. In: Proceedings of ACM Sigmetrics (2004) 85–96
On the Schedulability of Measurement Conflict in Overlay Networks

Mohammad Fraiwan and G. Manimaran
Real Time Computing & Networking Laboratory
Dept. of Electrical and Computer Engineering
Iowa State University, Ames, IA 50011
{mfraiwan,gmani}@iastate.edu
Abstract. Network monitoring is essential to the correct and efficient operation of overlay networks, and active measurement is a key design problem in network monitoring. Unfortunately, almost all active probing algorithms ignore the measurement conflict problem: active measurements conflict with each other, due to the nature of these measurements, the associated overhead, and the network topology, which results in their reporting incorrect results. In this paper, we consider the problem of scheduling periodic QoS measurement tasks in overlay networks. We first show that this problem is NP-complete, and then propose a conflict-aware scheduling algorithm, based on a well-known approximation algorithm, whose goal is to maximize the number of measurement tasks that can run concurrently. Simulation results show that our algorithm achieves 25% better schedulability than the existing algorithm. Finally, we discuss various practical considerations and identify several interesting research problems in this context.
1 Introduction
Overlay networks have come to play an increasingly diverse role in today's network applications. Such applications include end-system multicast, routing, storage and lookup systems (e.g., Akamai [1]), and security. Monitoring overlays are also becoming more common: Internet service providers are deploying monitoring tools at specific nodes in their networks as part of a Network Measurement Infrastructure (NMI) [2]. Overlays allow designers to implement and deploy their algorithms, applications, and services with great flexibility and versatility. However, maintaining the efficient and correct operation of these overlays requires regular probing of overlay links to measure available bandwidth, delay, and loss rate. Network monitoring is an important infrastructure service that helps in enforcing network security policies and tracking network performance. Network monitoring, and active measurement in particular, does not come at a cheap cost. Measuring bandwidth, loss, and delay involves injecting a non-negligible amount of probe packets [3]. An inherent property of overlay networks is the overlap or correlation between seemingly independent overlay links. Even if the measurement is done at the physical IP-level network, paths between different
source-destination pairs typically overlap [4]. This overlap, in association with the injected overhead, causes concurrent measurements to cross-talk or conflict with each other, which results in reporting incorrect results, and hence in incorrect actions being taken by the network administration or the beneficiary applications [2]. In this paper we formulate the measurement conflict problem as a scheduling problem of periodic QoS real-time tasks. We show the NP-completeness of this problem, and we propose a heuristic algorithm based on graph partitioning concepts. The remainder of this paper proceeds as follows. Section 2 gives background on the measurement conflict problem. Section 3 presents our network monitoring model, its assumptions, and its computational complexity. Section 4 discusses the related work and motivation. Section 5 goes into the details of the algorithms. Section 6 presents simulation results that demonstrate the performance of the proposed technique. We conclude in Section 7.
2 Background
The measured parameters of a network path are bandwidth, loss rate, latency, jitter, etc. We classify the ways these parameters can be obtained into two categories, based on the introduction of traffic into the network. Passive measurement techniques capture and analyze network traffic traces without injecting any major extra overhead. They are non-intrusive, but may fail to report accurate measurements in some scenarios [5]. On the other hand, active measurement techniques are inherently intrusive, since they inject a non-negligible amount of packets into the network [3]. The particular measurement algorithms are not our concern in this paper. However, understanding the way active measurements work, and how they are affected by each other, is essential. Active measurement tools start by injecting packets into the network, then measuring the transmission rate of the packet sequence [6], or the changes in the probing packet gaps [7]. Some tools use uniformly spaced packet sequences, while others use statistically calculated intervals between packet trains. Thus, the extra traffic injected by other conflicting tools may change the available bandwidth of the measured path, or cause congestion on shared links that leads to instantaneous spikes in loss rate. Another important issue is the measurement interval: the averaging interval may be short-term, or cover a relatively longer period [7]. Tools with a longer averaging period tend to have more tolerance toward small amounts of conflict. Several factors contribute to the measurement conflict problem, as follows:

– Overlay and IP-level topologies: Several studies [3] have shown that there is a great deal of overlap at the IP-level between seemingly independent overlay paths. This overlap may cause measurements to interfere with each other depending on the properties of the measurement tools.
– Task properties: Tasks can generally be divided into either communication-intensive or computation-intensive. Communication-intensive tasks inject a non-negligible amount of traffic into the network, so multiple concurrent
tasks crossing the same path, or part of the path, may cause a drop in the reported available bandwidth. On the other hand, computation-intensive tasks conflict with other tasks if they cross a common node.
– Administrative constraints usually limit the amount of monitoring traffic that can be injected into the network at any time.
3 Network Monitoring Model and Complexity

3.1 Network Monitoring Model
We generalize the network model to include overlay networks as well as IP-level networks. As we said earlier, network monitoring is considered part of an NMI (Network Measurement Infrastructure) employed by ISPs (Internet Service Providers) at the IP-level network, while the rise of overlay networks and their varied applications requires regular monitoring to achieve the efficient and correct operation of these overlays. We are given an overlay network undirected graph G_o = (V_o, E_o), where V_o is the set of overlay nodes and E_o is the set of overlay edges. The corresponding IP-level graph G_IP = (V_IP, E_IP) is also available, where V_IP is the set of physical nodes and E_IP is the set of physical edges. Note that V_o ⊆ V_IP, but E_o ⊈ E_IP in general. Several previous studies have assumed and justified the knowledge of the underlying IP topology by the overlay network operator [3]. Let T = {T_1, ..., T_n} be the set of tasks to be scheduled, with T_i = (s_i, d_i, c_i, p_i, tool_i), where s_i is the source of the measurement task, d_i is the destination of the same task, c_i is the running time of the task, p_i is the period of task T_i, and tool_i is the tool used by the task. The deadline of each task is the same as its period. We also define M to be an n × n 0-1 matrix representing the possible conflict between tasks if run at the same time, where m_ij = 1 if task i conflicts with task j, and 0 otherwise. The conflict matrix M captures the conflict among tasks based on the set of tasks and the previously mentioned conflict factors.
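A direct transcription of this model into Python might look as follows; the conflict predicate is left abstract, since it encodes the topology- and tool-dependent factors listed in Section 2.

from dataclasses import dataclass

@dataclass
class Task:
    src: str     # s_i: source of the measurement task
    dst: str     # d_i: destination of the task
    c: float     # c_i: running time
    p: float     # p_i: period (deadline = period)
    tool: str    # tool_i: measurement tool used

def conflict_matrix(tasks, conflicts):
    # Build the n x n 0-1 matrix M with m_ij = 1 iff tasks i and j
    # conflict; `conflicts` is any predicate capturing path overlap,
    # tool properties and administrative constraints.
    n = len(tasks)
    return [[1 if i != j and conflicts(tasks[i], tasks[j]) else 0
             for j in range(n)] for i in range(n)]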
3.2 Problem Definition
We start by defining three important terms related to this problem:

– Feasibility: A feasible schedule is a schedule in which tasks meet their deadlines and no two tasks are scheduled in a conflicting manner.
– Optimality: A scheduling algorithm is said to be optimal if no other algorithm can find a feasible schedule for a task set for which this algorithm fails to find one.
– Schedulability: This defines the efficiency of the scheduling algorithm in terms of its ability to find a feasible schedule.

The measurement conflict scheduling problem can be defined as follows: given the set of tasks T and the conflict matrix M, find a feasible schedule of tasks that maximizes schedulability. We prove that this problem is NP-complete by reduction from Maximum Cardinality Independent Set [8], as follows.
Maximum Cardinality Independent Set is a known NP-complete problem.
INSTANCE: A graph H = (W, F), where W is the set of vertices and F is the set of edges.
QUESTION: Is there a subset W′ ⊆ W such that, for all u, v ∈ W′, (u, v) ∉ F, and W′ is of maximum cardinality?
An independent set is said to be of maximum cardinality if it contains the largest possible number of vertices without destroying the independence property.

Theorem 1. The optimal scheduling of measurement tasks is NP-complete.

Proof: We consider an instance of the problem where all the tasks have the same period and the same execution time. An optimal scheduling policy will allow for the maximum concurrency of tasks without violating the conflict constraint. Construct a task conflict graph G = (V, E) as follows. Assign a node to each task in the set of tasks T, so that |T| = |V|. An edge is added between nodes i, j ∈ V if the corresponding entry of the conflict matrix M, m_ij, equals 1. Notice that finding a maximum independent set in the conflict graph corresponds to finding a maximum set of non-conflicting tasks that can execute at the same time. The rest of the schedule can be found by repeatedly deleting the nodes in the previous maximum set and their incident edges, and finding the maximum independent set of the new graph. A polynomial-time verification algorithm can easily be found; thus the problem is NP-complete.
Although this problem sounds similar to the well-known problem of offline single-processor scheduling of real-time tasks, a major difference exists. The resource under consideration is the network on which probe conflicts occur. If the network is treated as a single-processor system, then there is no parallelism in executing the tasks; this leads to no conflicts, but poor schedulability. The problem is also different from multiprocessor scheduling, because if the network is treated as a multiprocessor system, then the problem becomes identifying the processors that are active (i.e., the tasks that can execute concurrently) at any given time.
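The scheduling idea in the proof can be sketched in a few lines of Python. Since computing an exact maximum independent set is NP-hard, a greedy lowest-degree-first choice stands in for it here; this substitution is ours, made only to keep the sketch runnable.

def greedy_independent_set(adj, nodes):
    # Pick an independent set greedily, lowest degree first.
    chosen, excluded = [], set()
    for v in sorted(nodes, key=lambda v: len(adj[v] & nodes)):
        if v not in excluded:
            chosen.append(v)
            excluded.add(v)
            excluded |= adj[v]       # exclude all neighbors of v
    return chosen

def schedule_uniform(adj):
    # Repeatedly peel off independent sets: each set is one group of
    # non-conflicting tasks that may execute concurrently.
    remaining, rounds = set(adj), []
    while remaining:
        s = greedy_independent_set(adj, remaining)
        rounds.append(s)
        remaining -= set(s)
    return rounds

adj = {0: {1}, 1: {0, 2}, 2: {1}}    # task id -> conflicting task ids
print(schedule_uniform(adj))          # e.g., [[0, 2], [1]]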
4 Related Work
Most of the research on network monitoring has focused on reducing the probing overhead [3]. Nonetheless, this reduction is a function of the number of nodes in the probing node set, with the best known algorithms having complexity O(n log n). However, the constant factor in this function is high and depends on the type of measurements (e.g., bandwidth, loss, etc.) and other factors. The measurement conflict problem was first introduced by Calyam et al. in [2]. They observed this problem while designing ActiveMon for the Third Frontier Network (TFN) project. ActiveMon is an NMI software framework to collect and analyze network-wide active measurements. In another study [9], they developed a simple scripting language interface to specify various measurement requirements used in generating measurement timetables. Our work is
different from theirs in that we show the NP-completeness of the problem and provide a far superior alternative algorithm at no extra cost, while providing a performance bound. A token passing protocol has been used by related studies to minimize collisions between probes [10]; this protocol is used to generate time series of measurement data, which are then used in numerical forecasting models as part of a Network Weather Service. However, their approach does not allow for concurrent execution of multiple measurement tasks. Periodic task scheduling is a well-studied problem in the real-time scheduling literature. For example, EDF (Earliest Deadline First) scheduling is an optimal single-processor scheduling algorithm. As the name suggests, EDF scheduling gives higher priority to the task with the earliest deadline. In this study, we leverage some of the concepts used in real-time scheduling, without affecting the novelty of our approach.

Motivation

In this section we give a motivational example, where we show the shortcomings of existing solutions and the existence of significant room for improvement. The algorithms to be considered are:

– Unsynchronized scheduling (US): Tasks are run without regard to conflict. This algorithm achieves maximum schedulability, but with a lot of conflicts.
– Non-preemptive EDF: This algorithm schedules tasks based on deadline, with higher priority given to earlier deadlines. It permits no conflicts, but achieves the worst schedulability.
– EDF-CE: EDF with Concurrent Execution, the algorithm proposed in [2]. Tasks are executed in EDF order, and ready tasks are added randomly as long as they do not conflict with the currently executing tasks.

Example 1. Consider the following three tasks, where T_i = (c_i, p_i): T1 = (15, 50), T2 = (35, 75), T3 = (50, 100). The conflict graph is given in Fig. 1a. This example shows an interesting fact: when faced with a non-uniform task set, blindly trying to increase task parallelism may lead to a scheduling anomaly, a situation in which a higher-priority task misses its deadline due to a lower-priority task's execution. The schedule produced by EDF-CE exhibits this problem; see Fig. 1b. Task T1 has the highest priority (i.e., the earliest deadline). To achieve maximum overlap, task T3 was allowed to run concurrently with task T1, but since T3 has a longer execution time, it continues running past T1, causing T2 to miss its deadline. The schedule produced by Unsynchronized Scheduling (US) allows all the tasks to run at the same time. US and EDF are useless for this problem due to large conflicts and poor schedulability, respectively. On the other hand, EDF-CE provides higher schedulability with fewer conflicts, but it does not provide any guarantees on the amount of parallelism, misses many chances for improvement by randomly choosing tasks for parallel execution, and may cause scheduling anomalies.
On the Schedulability of Measurement Conflict in Overlay Networks
T1
US 0
T1
T2 0
T3
0
35
15
50
75
85
100
75
85
100
75
85
100
T3
T2
15
T1
EDF-CE
T3
T2
15
T1
EDF
1125
50
T3
T2 50
(a) Task Conflict Graph (b) The schedules produced by US, EDF, and EDF-CE. Fig. 1. Example 1. Under EDF the second instance of T1 will miss its deadline, while under EDF-CE, T2 will miss its deadline due to a scheduling anomaly, and the algorithms abort
5 The Scheduling Algorithm
Since the problem is NP-complete, we develop a heuristic algorithm based on graph partitioning. The algorithm generally has the following three steps:

1. Construct the task conflict graph.
2. Partition the task conflict graph into the least number of partitions.
3. Schedule each partition concurrently as long as there are enough slots in the time frame, where the time frame is the period of the uniform task set (i.e., a task set with the same period and deadline), or the least common multiple (LCM) of the periods for a non-uniform task set.

5.1 The Conflict-Aware Scheduling Algorithm
In case the task set is uniform, the problem becomes a pure partitioning problem. For the general case of a non-uniform task set, we use period transformation and execution time transformation [11]. Tasks within each partition are transformed into a certain number of copies of the same uniform task τ = (C, P). This uniform task can be the same across partitions or different. Transforming the tasks into a common task ensures that all the tasks are aligned with each other, thus achieving higher parallelism while maintaining the properties of the original tasks (i.e., deadline and utilization). Algorithm 1 shows the pseudo-code for the conflict-aware scheduling algorithm, which proceeds in the following steps:

1. The current task set is grouped into a collection of non-uniform task partitions (steps 1.3-1.7), using a least conflict first partitioning algorithm. The task with the least number of conflicts is chosen first. Then, all of its neighboring (i.e., conflicting) tasks in the graph are removed. We proceed until no more tasks can be added to the current set (step 1.4). The tasks from the resulting partition are removed from the graph, along with their incident edges (step 1.5). The process is repeated until all tasks are grouped.
Algorithm 1. The conflict-aware scheduling algorithm

Input: the task conflict graph G = (V, E); τ = (C, P)
Output: a schedule of tasks

1.1   S ← ∅
1.2   I ← ∅
1.3   while G ≠ ∅ do
1.4       I ← Partition(G)
1.5       G ← G − I
1.6       S ← S ∪ I
1.7   end
1.8   foreach subset s ∈ S do
1.9       Transform the task subset using τ
1.10      Construct a subset conflict graph H, where subtasks from the same parent task share an edge
1.11      Construct sub-partitions from H
1.12  end
1.13  The earliest deadline of a parent task in a sub-partition is the deadline of that sub-partition.
1.14  Sort all the resulting sub-partitions in increasing order of deadline; output the schedule.
2. Within each partition, tasks are divided into multiple uniform tasks τ = (C, P), where P is the common period and is smaller than the period of any task in the task set, C is the common execution time and is smaller than the execution time of any task in the task set, and the utilization C/P is smaller than the utilization of any other task. Subtasks from the same parent task form a complete subgraph in the task conflict graph (i.e., they pairwise conflict), and each subtask inherits the conflicts of its parent task. The number of subtasks generated from a given task is ⌈(c_i/p_i)/(C/P)⌉ (a small sketch of this transformation follows Fig. 2 below). There may be some performance loss due to the rounding up. We argue, however, that achieving higher parallelism and avoiding scheduling anomalies override this slight loss of performance. Note that τ can be the same across all partitions, or unique to each partition. Each subtask inherits the deadline of its parent task, and hence its priority; the deadline of a subtask is therefore not P, the period of τ.
3. Each partition is further divided into sub-partitions, using the same partitioning algorithm (step 1.11).
4. The sub-partitions from all partitions are sorted in increasing order of the earliest deadline of a parent task in the sub-partition. This is the schedule.

Going back to Example 1, we use τ = (5, 50) to normalize the tasks. Task T1 is split into 3 sub-tasks, while T3 is split into 5 subtasks, and T2 stays the same. This division is useful for the purpose of finding maximum overlap, while
preserving task priorities. However, the tasks' actual deadlines are still the same. For example, in Figure 2, at time 50, three sub-tasks of T3 will miss the deadline imposed by τ, but their actual deadline is 100.

Fig. 2. Algorithm 1 solution to Example 1: (a) the working of Algorithm 1 (1. partition the graph; 2. period and execution time transformation; 3. construct sub-partitions); (b) the schedule produced by Algorithm 1
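The subtask count formula from step 2 reproduces the example's numbers, as the following Python check shows (a plain floating-point evaluation of the ceiling formula).

import math

def num_subtasks(c_i, p_i, C, P):
    # Number of uniform subtasks tau = (C, P) replacing task (c_i, p_i):
    # the ceiling of the utilization ratio.
    return math.ceil((c_i / p_i) / (C / P))

# Partition {T1, T3} normalized with tau = (5, 50); T2 sits alone in
# its own partition and is left untransformed, as in the text.
print(num_subtasks(15, 50, 5, 50))    # T1 -> 3 subtasks
print(num_subtasks(50, 100, 5, 50))   # T3 -> 5 subtasks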
5.2 Implementation Issues
We envision this scheduling algorithm as a component of a monitoring infrastructure, which is part of a larger network management system or a network application. A node is selected as a controller, which is responsible for collecting and scheduling measurement requests. To make the controller fault-tolerant, well-known backup and leader election strategies can be used. As for the performance bottleneck concern, we argue that the communication and computation costs at the central node are small for a reasonably sized monitoring task set. Network measurement is a sophisticated process that involves many issues that need to be considered in conjunction with scheduling. First, the measurement tool may not be amenable to subdivision. For example, bandwidth
measurement tools use the dispersion of packets in packet trains to estimate the bandwidth; the designer needs to be careful not to subdivide the packets in the same train, as this will affect the measurement process. Second, the execution time of measurement tools may vary greatly based on spatial and temporal factors such as hop count and path bandwidth, and for some measurement tools (e.g., PathChirp [12]) the difference may be on the order of minutes. Predetermining each task's run-time and period may not be a trivial issue. Period and execution time transformation may lead to increased execution time of the scheduling algorithm. Since the scheduling algorithm is run either offline or only when changes in the task set occur, this kind of increase is moderate, considering that the least conflict first partitioning algorithm runs in linear time [13] and given the processing capabilities of the centralized controller. Another implementation issue is that of the global synchronization of measurement schedule starts and stops. Achieving 100% synchronization of a distributed system clock is a difficult task. However, our studies, omitted here for lack of space, show that complete synchronization is not necessary, as measurement accuracy degrades gracefully with the amount of overlap. Thus, a certain amount of synchronization imperfection can be tolerated.
6 Simulation Results
Scheduling uniform task sets depends solely on how well the partitioning algorithm performs. To study this effect, we experiment with a task conflict graph consisting of 100 tasks, where each pair of tasks shares an edge with a certain conflict probability.
Fig. 3. The number of partitions produced by the EDF-CE algorithm and the least conflicts first partitioning algorithm

Fig. 4. Success ratio as a function of conflict probability: (a) task utilization = 0.2; (b) task utilization = 0.4; (c) task utilization = 0.6
We vary the conflict probability from 0.1 to 0.9, and we report the average number of partitions generated over 20 runs. Figure 3 shows that the proposed least conflicts first approach produces about 10% fewer partitions than EDF-CE. We now examine the schedulability of our conflict-aware algorithm. In particular, we study the success ratio of our approach in comparison to existing solutions. The success ratio is defined as:

success ratio = (# of tasks successfully scheduled) / (total number of tasks)
We used a set of 20 tasks and took the average of 10 runs. The period of each task is selected uniformly at random from [100, 1000], and we report the success rate as a function of the conflict probability among tasks for task utilization values of 0.2, 0.4, and 0.6. The execution time of each task is the product of its period and the utilization value used in the corresponding figure. Figure 4 shows that the conflict-aware scheduling algorithm achieves up to 25% better success ratio. The main reasons for this improvement are the elimination of scheduling anomalies, the higher parallelism achieved by our algorithm, and a higher density of tasks in each partition (i.e., a small number of partitions hold a large number of tasks), which results in dropping low-density partitions. In addition, the figure shows that our conflict-aware approach and the EDF-CE algorithm follow the same trend as the conflict probability increases. At low conflict probabilities (i.e., below 0.2), there is a great deal of possible task concurrency, and both algorithms achieve high schedulability. In the region between 0.2 and 0.8, the conflict-aware algorithm achieves superior schedulability compared to EDF-CE. This superiority decreases with increasing task utilization due to the smaller room for improvement. Both algorithms converge again at high conflict probabilities (i.e., above 0.9), where most of the tasks conflict with each other and must be executed sequentially to avoid conflicts.
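For concreteness, the experimental setup just described can be generated as follows; the return formats and helper names are our own illustrative choices.

import random

def random_task_set(n=20, util=0.2):
    # Periods uniform in [100, 1000]; execution time = period x utilization.
    return [(p * util, p) for p in (random.uniform(100, 1000) for _ in range(n))]

def random_conflict_matrix(n, prob):
    # Symmetric 0-1 matrix: each pair conflicts with probability prob.
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = int(random.random() < prob)
    return m

def success_ratio(scheduled, total):
    return 100.0 * scheduled / total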
7 Conclusion and Future Work
In this paper, we have shown that the problem of optimally scheduling measurement tasks is NP-complete. We proposed a polynomial-time heuristic scheduling algorithm based on a well-known graph-theoretic approximation algorithm, which produces about 10% fewer partitions. Simulation studies have shown that our algorithm improves schedulability by 25% compared to existing solutions. Future work includes conducting intensive performance evaluation of the proposed algorithm (e.g., bounding the performance deviation from the optimal) and performing experiments on PlanetLab [14]. In addition, we plan to formulate imprecise measurement scheduling problems, where tasks are allowed to partially overlap, and to methodically develop solutions for them.
References

1. Akamai Technologies (February 14, 2007); http://www.akamai.com
2. Calyam, P., Lee, C., Arava, P., Krymskiy, D.: Enhanced EDF scheduling algorithms for orchestrating network-wide active measurements. Proc. of IEEE RTSS 2005.
3. Tang, C., Mckinley, P.: On the cost-quality tradeoff in topology-aware overlay path probing. Proc. of IEEE ICNP 2003.
4. Cui, W., Stoica, I., Katz, R.: Backup path allocation based on a correlated link failure probability model in overlay networks. Proc. of IEEE ICNP 2002.
5. Zseby, T., Zander, S., Carle, G.: Evaluation of building blocks for passive one-way-delay measurements. Passive and Active Measurements Workshop 2001.
6. Jain, M., Dovrolis, C.: End-to-end available bandwidth: Measurement methodology, dynamics, and relation with TCP throughput. Proc. of SIGCOMM 2002.
7. Strauss, J., Katabi, D., Kaashoek, F.: A measurement study of available bandwidth estimation tools. Proc. of ACM IMC, Oct. 2003.
8. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
9. Calyam, P., Lee, C., Arava, P., Krymskiy, D., Lee, D.: OnTimeMeasure: A scalable framework for scheduling active measurements. Proc. of IEEE E2EMON 2005.
10. Gaildioz, B., Wolski, R., Tourancheau, B.: Synchronizing network probes to avoid measurement intrusiveness with the Network Weather Service. IEEE High-Performance Distributed Computing Conference 2000.
11. Sha, L., Lehoczky, J., Rajkumar, R.: Solutions for some practical problems in prioritized preemptive scheduling. Proc. 7th IEEE RTSS 1986.
12. Ribeiro, V., Riedi, R., Baraniuk, R., Navratil, J., Cotrell, L.: pathChirp: Efficient available bandwidth estimation for network paths. Passive and Active Measurement Workshop 2003.
13. Halldorsson, M., Radhakrishnan, J.: Greed is good: Approximating independent sets in sparse and bounded degree graphs. Proc. of the 26th Annual ACM Symposium on Theory of Computing, 1994.
14. PlanetLab (February 14, 2007); http://www.planet-lab.org/
SEA-LABS: A Wireless Sensor Network for Sustained Monitoring of Coral Reefs

Matt Bromage (1), Katia Obraczka (1), and Donald Potts (2)
(1) Department of Computer Engineering
(2) Department of Ecology and Evolutionary Biology
University of California, Santa Cruz
www.ucsc.edu
Abstract. This paper describes SEA-LABS (Sensor Exploration Apparatus utilizing Low-power Aquatic Broadcasting System), a low-cost, power-efficient Wireless Sensor Network (WSN) for sustained, real-time monitoring of shallow-water coral reefs. The system is designed to operate in remote, hard-to-access areas of the world, which limits the ability to perform on-site data retrieval and periodic system maintenance (e.g., battery replacement/recharging). SEA-LABS thus provides a customized solution to shallow-water environmental monitoring, addressing the trade-offs between power conservation and the system's functional requirements, namely data sensing and processing as well as real-time, wireless communication. We present SEA-LABS' architecture and its current implementation. Finally, we share our experience deploying SEA-LABS in the Monterey Bay.

Keywords: Wireless Sensor Network (WSN), Remote Environmental Monitoring.
1 Introduction
Shallow tropical and warm temperate oceans are the major global sites for formation of calcium carbonates and new limestones. These shallow limestones have an important impact on atmospheric carbon dioxide levels. Projected climate changes appear to threaten the calcification potential of many organisms and may reduce the stability of existing limestones. Over the past decade, accumulating evidence indicates that, although the ocean absorbs carbon dioxide (partially ameliorating atmospheric carbon dioxide rise), ocean buffering has been impacted so severely that sea-water chemistry is changing (higher carbonic acid concentration; lower pH) at accelerating rates. Because coral reefs live mainly at land-sea-air "interfaces", they are especially vulnerable to both atmospheric and oceanic changes. The phenomenon of coral "bleaching" is likely to be one visible sign of such stress. Many corals lay down annual skeletal bands with seasonal differences in density and geochemical composition. It is therefore urgent to assess the impact of physical and chemical changes on coral reefs.
Currently, scientists rely on remote sensing, often through satellite imaging, to monitor coral reef habitats. However, data acquisition, besides being considerably expensive, is far from having adequate temporal and spatial resolution. For instance, typical satellite surveys provide the average temperature of the ocean surface over areas of up to 60 square miles. On the other hand, corals living on a reef may experience temperatures that are 5 to 10 degrees Celsius different from that average. Furthermore, scientists do not have access to the data in real time. In fact, it can take weeks for data to be available, and then considerable processing has to happen before scientists have access to useful information. SEA-LABS aims at building a low-cost, power-efficient wireless sensor network to perform in-situ monitoring of coral reef habitats in order to provide scientists with continuous, real-time data at adequate temporal and spatial resolution.
2 System Overview
SEA-LABS consists of two main functional components: sensing nodes affectionately called Programmable Oceanic Devices (PODs) and data sinks (a.k.a. base stations). Each POD is divided into three subsystems: control and processing, wireless communication, and power management (see Figure 1).
Fig. 1. The POD’s hardware, including: (1) the wireless subsystem (2) control and processing subsystem including power management (3) lithium battery pack (4) antenna connector (5) watertight housing
The control and processing subsystem is built around the Texas Instruments MSP430 processor [1], and includes standard sensors (temperature, light, pressure, conductivity, and pH). The wireless subsystem uses the MaxStream 9XTEND 1-Watt transceiver [2] operating in the 900 MHz frequency spectrum. The POD is placed at ocean surface level (e.g., strapped to a buoy) to allow for RF communication. The power management subsystem implements a custom scheduling algorithm (as opposed to using TinyOS) so that overhead is minimized. TPS1120
dual p-channel enhancement-mode MOSFETs [3] are used to power down components not in use, thus preserving battery power and increasing the system lifespan. The base station consists of a laptop connected to the Internet, where the sensor data is ultimately received.
3 Scheduling and Networking
The scheduler switches the POD through a duty cycle that has four modes of operation, namely: (1) Deep Sleep, (2) Data Collection, (3) Data Analysis, and (4) Wireless Transmission. During deep sleep, the processor is put into low-power mode, which turns off all unnecessary system clocks and peripherals. To take sensor readings, the POD enters the data collection mode. Afterward, the processor analyzes the results and formats the information into network packets. After successfully sending the data packets, the POD transitions to deep sleep until the next round of sampling. SEA-LABS' current network implementation follows a star topology with the information sink (or base station) as the middle node and sensing nodes on the edges. The deployment assumes that all sensing nodes are one hop away from the sink. As the basic medium access mechanism, we use a half-duplex slotted-Aloha scheme [4]. This scheme works well given that the network load requirements are typically low (less than 1 percent). One item of future work is to enable multi-hop deployments, which will ease current radio coverage requirements. Once the data packets are received at the base station, they are posted to an online MySQL database. The database can be queried using a PHP graphical user interface (GUI). Users can access information by selecting combinations of sensor readings, POD devices, date and/or time. The user has the option to download the selected data into an Excel spreadsheet for analysis.
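The duty cycle described above can be summarized behaviorally as follows. The actual firmware is C on the MSP430 and uses hardware low-power modes; this Python rendering, with time.sleep() standing in for Deep Sleep and the callback names invented, is only an illustrative sketch.

import time

def duty_cycle(sample_period_s, read_sensors, build_packets, transmit):
    # One POD loop through the four modes of operation.
    while True:
        time.sleep(sample_period_s)          # (1) Deep Sleep
        readings = read_sensors()            # (2) Data Collection
        packets = build_packets(readings)    # (3) Data Analysis / formatting
        transmit(packets)                    # (4) Wireless Transmission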
4
Monterey Bay Deployment
SEA-LABS was deployed in Monterey Bay as an initial assessment of the system's functionality and performance in a scenario similar to the target environment. The bay is subject to adverse weather conditions and a non-ideal transmission medium. The deployment's main goals were to: (1) validate battery life estimates; (2) test communication capabilities and network functionality; and (3) evaluate overall system robustness. A single node was deployed in Monterey Bay, attached to a pre-existing buoy. Initial deployment results are quite promising in that no packets were dropped over the network. Future deployments will employ batteries with increased capacity and new networking protocols to allow multi-hop communication.
Fig. 2. POD attached to buoy during Monterey Bay deployment
5
Related Research
There are several efforts related to SEA-LABS, including the Berkeley Mote [5], efforts led by UCLA's Center for Embedded Networked Sensing [6], UC Berkeley's Wireless and Embedded Systems [7] group, and the ZebraNet [8] project. The Aquaflecks and Amour AUV projects [9] highlight efforts in the area of autonomous underwater vehicles (AUVs).
Acknowledgments. This work has been supported in part by grants from NSF ANI-0322441, CDELSI, the Ferdinand S. Ruth Fund and Myers Trust, the Friends of Long Marine Lab, and Mitsubishi Corporation's Global Coral Reef Conservation Project.
References
1. MSP430x15x and MSP430x16x Data Sheet (SLAS368A), Texas Instruments, March 2003.
2. Engineering Datasheet: MaxStream 9XTEND OEM Module, MaxStream. [Online]. Available: http://www.maxstream.net/products/xtend/oem-rf-module.php
3. Engineering Datasheet: Enhancement Mode MOSFET TPS1120, Texas Instruments. [Online]. Available: http://focus.ti.com/lit/ds/symlink/tps1120.pdf
4. N. Abramson, "The ALOHA system - another alternative for computer communications," Proceedings of the Fall Joint Computer Conference, AFIPS Conference, 1970.
5. A. Woo, "Mote documentation and development information," http://www.eecs.berkeley.edu/~awoo/smartdust/, 2000.
6. UCLA Center for Embedded Networked Sensing, CENS. [Online]. Available: http://www.cens.ucla.edu/
7. UCB Wireless and Embedded Systems, WEBS. [Online]. Available: http://webs.cs.berkeley.edu/
8. ZebraNet. [Online]. Available: http://www.princeton.edu/~mrm/zebranet.html
9. I. Vasilescu, K. Kotay, D. Rus, M. Dunbabin, and P. Corke, "Data collection, storage, and retrieval with an underwater sensor network," SenSys, 2005.
Capacity-Fairness Performance of an Ad Hoc IEEE 802.11 WLAN with Noncooperative Stations Jerzy Konorski Gdansk University of Technology ul. Narutowicza 11/12, 80-952 Gdansk, Poland [email protected]
Abstract. For an ad hoc IEEE 802.11 WLAN we investigate how the stations' incentives to launch a backoff attack i.e., to configure small minimum and maximum CSMA/CA contention windows in pursuit of a larger-than-fair bandwidth share, affect a proposed capacity-fairness index (CFI). We link CFI to the network size, "power awareness," a station's perception of the other stations' susceptibility to incentives, and the way of learning how the other stations perceive the other stations' susceptibility to incentives.
1 Introduction
Estimated limits of the performance of an IEEE 802.11 WLAN [5] become more realistic as the assumed network model becomes richer; most existing estimates account for PHY-layer bandwidth, MAC and TCP overhead, DATA frame size, number of network stations, station mobility, and channel characteristics. We bring into the picture noncooperative behavior in the form of a backoff attack: each station n is free to configure an arbitrary w_n = <w_n,min, w_n,max> (the minimum and maximum CSMA/CA contention windows) in pursuit of a larger-than-fair long-term bandwidth share [1], [3], thus engaging in a noncooperative CSMA/CA game [3], [7]. It can be shown that if the greedy <1, 1> configuration is ruled out (the backoff mechanism is mandatory) then the game has a unique Nash equilibrium (NE) [4]. Otherwise, one disincentive to configure <1, 1> is a certain "power awareness", i.e., fear of another station also configuring <1, 1>, for all the transmission power is then spent on frame collisions. This we assume tantamount to a "penalty" bandwidth share, which leads to multiple Nash equilibria. In the absence of a compelling unique NE, we introduce a simple calculus of backoff attack incentives, a form of seeking a best reply to beliefs as to the other stations' imminent play. We propose a capacity-fairness index (CFI), a synthetic performance measure equal to the product of the total goodput (bandwidth utilization) and the Jain index of the stations' bandwidth shares. We link CFI to the network size, the stations' "power awareness," a station's perception of the other stations' susceptibility to incentives, and a station's way of learning how the other stations perceive the other stations' susceptibility to incentives. Based on the stations' bandwidth shares b_n obtained from existing models [2], [7], we demonstrate that for small enough networks and "power aware" enough stations, cooperative behavior may ultimately emerge.
2 CSMA/CA Game and Backoff Attack Incentives
To reflect both the total bandwidth utilization Σ_n b_n and Jain fairness [6] we use their product, i.e., (Σ_n b_n)³/(N · Σ_n b_n²), which we name the capacity-fairness index (CFI). For an N-player CSMA/CA game with payoffs b_n, suppose that x selfish stations configure w_s = <2, 2>, y greedy stations configure w_g = <1, 1> (i.e., disengage the backoff scheme), and N − x − y honest ones stick to a standard-prescribed w_h, e.g., <16, 1024>; let the respective payoffs be denoted by b_{s[g,h]}(N, x, y). In existing IEEE 802.11 settings, b_h(N, 0, 0) > 0 for not too large N and b_h(N, x > 0, y) ≈ 0. Thus if x = y = 0 then CFI = N · b_h(N, 0, 0) (denote this cooperative value by c-CFI). Note that b_G = b_g(N, x, 1) is the highest possible payoff, while b_{s[h]}(N, x, y > 0) = 0. Let b_g(N, x, y > 1) = b_C ≤ 0, where b_C is a "penalty" payoff, reflecting the fact that a greedy (yet "power aware") station in this case spends all its transmission power to no effect. If b_C < 0 [b_C = 0] then any configuration profile with y = 1 [y > 0] is a NE. Thus the game has multiple Nash equilibria; to predict its outcome we calculate backoff attack incentives.
Definition 1. A selfish [greedy] backoff attack incentive is the ratio of the likely payoff upon switching from w_h to w_s [w_g], and the cooperative payoff b_h(N, 0, 0). A 0-order sophisticated approach to the "likely" part neglects similar conduct at the other stations: I_{g,0} = \hat{b}_g(N, 0, 1) and I_{s,0} = \hat{b}_s(N, 1, 0) (the hats normalize w.r.t. b_h(N, 0, 0)). Alternatively, a station forms a model of how the other stations' play is susceptible to the calculated incentives. A susceptibility map Φ returns for an (I_s, I_g) pair the probabilities p_s [p_g] of configuring w_s [w_g] at any other station (p_h = 1 − p_s − p_g is the probability of staying at w_h). Intuitively, Φ should be continuous and ensure that p_g increases in I_g, p_s increases in I_s, and p_h decreases in both I_s and I_g. Taking (p_s, p_g) = Φ(I_{s,0}, I_{g,0}), one can calculate the expected normalized payoffs:

I_{s[g],1} = \sum_{x,y,z \ge 0,\; x+y+z \le N-1} \binom{N-1}{x\;y\;z}\, p_s^x\, p_g^y\, p_h^z \cdot \hat{b}_{s[g]}(N,\, x+1[x],\, y[y+1])    (1)
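As a quick illustration of the definition, the sketch below (ours, not part of the original study) computes CFI directly from a vector of bandwidth shares.

```python
# CFI = (sum of shares)^3 / (N * sum of squared shares),
# i.e. total goodput multiplied by the Jain fairness index.
def cfi(shares):
    n = len(shares)
    total = sum(shares)
    return total ** 3 / (n * sum(b * b for b in shares))

# Equal shares: cfi([0.2] * 5) == 1.0 (Jain index 1, goodput 1.0);
# starving three of five stations: cfi([0.5, 0.5, 0, 0, 0]) == 0.4.
```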
This approach can be termed 1-order sophisticated, as it does account for the other stations also calculating incentive measures, though it neglects their use of Φ. Higher-order sophistication consists in re-applying (1) to account for the other stations using Φ, their accounting for the other stations using Φ, etc. In the limit, Φ is deemed common knowledge [4]. Hence, ∞-order sophisticated incentive measures solve the fixpoint-type equation (where F is defined by (1) with (p_s, p_g) = Φ(I_{s,∞}, I_{g,∞})):

(I_{s,\infty}, I_{g,\infty}) = F(I_{s,\infty}, I_{g,\infty})    (2)
A unique solution of (2) obtains, e.g., if Φ is defined as follows:

p_s = \frac{\varphi^2(I_{s,\infty})}{\varphi(I_{s,\infty}) + \varphi(I_{g,\infty})}, \qquad p_g = \frac{\varphi^2(I_{g,\infty})}{\varphi(I_{s,\infty}) + \varphi(I_{g,\infty})}    (3)
Here, the function ϕ measures a station's willingness to switch from w_h to w_s [w_g], given (I_s, I_g); it should be continuous and nondecreasing, with ϕ(0) = 0 and ϕ(∞) = 1. If the CSMA/CA game is played, we use the expected value of CFI w.r.t. the probabilities of configuring w_g, w_s, and w_h, determined by ∞-order incentives.
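One straightforward way to approximate the solution of (2) is damped fixed-point iteration. The sketch below assumes the map F is supplied externally, since evaluating it requires a payoff model such as [2] or [7]; the damping factor and tolerance are illustrative choices, not values from the paper.

```python
# Damped fixed-point iteration for (I_s_inf, I_g_inf) = F(I_s_inf, I_g_inf).
# F must return the pair of expected normalized payoffs given the current
# incentive pair, i.e. the right-hand side of equation (2).
def solve_infinite_order(F, I0=(0.0, 0.0), damping=0.5, tol=1e-9, max_iter=10000):
    I_s, I_g = I0
    for _ in range(max_iter):
        F_s, F_g = F(I_s, I_g)
        n_s = (1 - damping) * I_s + damping * F_s
        n_g = (1 - damping) * I_g + damping * F_g
        if abs(n_s - I_s) < tol and abs(n_g - I_g) < tol:
            return n_s, n_g
        I_s, I_g = n_s, n_g
    return I_s, I_g  # best effort if convergence is slow
```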
Definition 2. The noncooperative CFI, denoted n-CFI, is defined as

\text{n-CFI} = \text{c-CFI} \cdot p_h^N + p_g (1-p_g)^{N-1}\, b_G + p_s \sum_{x=0}^{N-1} \binom{N-1}{x}\, p_s^x\, p_h^{N-1-x}\, (x+1)\, b_s(N,\, x+1,\, 0)    (4)
(recall that if all k nonzero payoffs out of N are equal then the Jain index is k/N). In reality, the other stations' susceptibility to incentives may be learned by playing the CSMA/CA game repeatedly and observing successive configuration profiles. We model this intuition by taking a sigmoid ϕ(I) = (1 − e^{−4I})/(1 − e^{−4(I−a)}) and manipulating its center a. In the i-th instance of the game, station n's perception of the other stations' susceptibility is reflected by a_n^i, with the dynamics a_n^{i+1} = max{0, a_n^i + δ_n(X^i, Y^i)}. Here, the function δ_n describes the learning process at station n, and X^i [Y^i] is the number of selfish [greedy] stations in the i-th instance, distributed according to p_s, p_g, and p_h = 1 − p_s − p_g as calculated from (3), with the sigmoid ϕ centered at a_n^i. Let δ_n(x > 0, 0) = −Δ_n, δ_n(0, 0) = −2Δ_n, and δ_n(x, y > 0) = Δ_n, where Δ_n is proportional to station n's initial a_n^0 through a constant Δ (thus relative changes of a are the same at each station). ((a_1^i, ..., a_N^i), i = 1, 2, ...) is an N-dimensional random walk with an absorbing state (0, ..., 0) corresponding to cooperative behavior (the solution of (2) then yields p_s = p_g = 0) and another absorbing state (∞, ..., ∞), with p_s = p_g = 1/2 and p_h = 0. The capacity-fairness indices at these absorbing states are c-CFI and n-CFI_∞ = (N · b_s(N, N, 0) + b_G)/2^N, respectively. Define a_max so that the corresponding solution of (2) yields p_s and p_g close to 1/2.
Definition 3. The noncooperative learning CFI, denoted nl-CFI, is defined as π_N · c-CFI + (1 − π_N) · n-CFI_∞, π_N being the probability of reaching the absorbing state (0, ..., 0) given that each station n selects a_n^0 ∈ [0, a_max] at random.
Assuming Δ = 0.2, Fig. 1 depicts c-CFI and nl-CFI. The latter turns out to be distinctly closer to c-CFI than n-CFI for "power aware" enough stations, as confirmed by numerical experiments whose details are omitted.
Fig. 1. c-CFI and nl-CFI for various "power awareness" levels (CFI in % versus N, for b_C = −b_G, −60%, −40%, and 0; the plot annotations mark the impact of contention overhead and the impact of noncooperative behavior, with n-CFI_∞ as the lower curve)
3 Conclusion
The introduction of w_g and "power awareness" changes the CSMA/CA game into one with multiple Nash equilibria, i.e., without a compelling outcome. We envisage that each station then calculates common-knowledge incentives to configure w_s and w_g, and the corresponding probability distribution of imminent configuration profiles. Our study quantitatively illustrates a few intuitions:
• the network's ability to provide high and fair bandwidth shares to all stations diminishes as N increases, partly on account of growing contention overhead, but mostly because of the stations' limited willingness to behave cooperatively; these two factors are illustrated for the b_C = −40% curve at N = 50,
• incentive calculus dictates that the willingness to behave cooperatively grow with "power awareness" for fear of spending all the transmission power without getting any bandwidth share; accordingly, CFI improves as b_C goes more negative,
• the predictions depend on a station's perception of the other stations' susceptibility to incentives, reflected by Φ, and the learning process, reflected by δ,
• each of the nl-CFI curves lies between the n-CFI_∞ and c-CFI ones; its bias towards the latter measures the chance π_N of emergence of cooperative behavior; this is almost certain for small enough N assuming enough "power awareness."
Although the "penalty" bandwidth share b_C was assumed constant across the stations, it is relatively easy to generalize to nonuniform "power awareness" in order to study the coexistence of devices with diverse battery lifetimes.
Acknowledgment
This work was supported by the Ministry of Education and Science, Poland, under Grant 1599/T11/2005/29.
References
1. Bellardo, J., Savage, S.: 802.11 Denial-of-Service Attacks: Real Vulnerabilities and Practical Solutions. Proc. USENIX Security Symp., Washington DC (2003)
2. Bianchi, G.: Performance Analysis of the IEEE 802.11 Distributed Coordination Function. IEEE J. on Selected Areas in Comm. 18 (2000) 535-547
3. Cagalj, M., Ganeriwal, S., Aad, I., Hubaux, J.-P.: On Cheating in CSMA/CA Ad Hoc Networks. Proc. IEEE INFOCOM 2005, Miami FL (2005)
4. Fudenberg, D., Tirole, J.: Game Theory. MIT Press (1991)
5. IEEE Standard for Information Technology: LAN/MAN - Specific Requirements, ISO/IEC 8802-11 (1999)
6. Jain, R.: Fairness: How to Measure Quantitatively? ATM Forum/94-0881 (1994)
7. Konorski, J.: A Game-Theoretic Study of CSMA/CA Under a Backoff Attack. IEEE/ACM Trans. on Networking 14 (2006) 1167-1178
Multi-rate Support for Network-Wide Broadcasting in MANETs
Tolga Numanoglu¹, Wendi Heinzelman¹, and Bulent Tavli²
¹ Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627 USA {numanogl,wheinzel}@ece.rochester.edu
² Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey [email protected]
Abstract. Mobile ad-hoc networks (MANETs) utilize broadcast channels, where wireless transmissions occur from one user to many others. In a broadcast channel the same transmission can lead to different information rates to different users depending on the channel capacity between the transmitter and receiver pair. According to coding theory, there is a certain channel capacity that limits the rate of information that can be sent through the channel. Thus, different channel capacities result in different acceptable rates for the users. In this paper, we utilize a superposed coding scheme in a MANET scenario to provide different rates for users with different channel capacities using a single broadcast transmission. We have created techniques to extend this multi-rate concept to network-wide broadcasting scenarios, providing the ability for nodes to appropriately trade-off delay vs. quality. We describe our approach and provide simulation results showing the benefits and limitations of superposed coding in network-wide broadcasting.
1
Introduction
For broadcast transmissions, the capacities of the communication links from the source to the intended recipients vary greatly due to differences in communication range, fading, and interference on these links. Therefore, it is crucial to explore the characteristics of the broadcast channel that can be utilized to improve the throughput of the network [1]. Due to differences in qualities of the links between a broadcasting source node and the intended recipients, it is advantageous to adjust the transmission scheme for broadcasting data so that a single transmission can be best received at all receivers. In other words, with a single transmission, the nodes with “high quality” links receive “high rate” information whereas the nodes with “low quality” links receive “low rate” information. This multi-rate broadcasting can be achieved by using Cover’s theory of superposed information [1,2].
This work was supported in part by the University of Rochester Center for Electronic Imaging Systems and in part by Harris Corp., RF Communications Division.
We investigate the performance gains achievable using multi-rate broadcasting for network-wide broadcasting in MANETs. For example, if we combine multi-rate transmission with scalable voice or video coding [3], nodes with different link qualities will have different rates (qualities) of voice or video available through a single transmission. To demonstrate all the benefits of multi-rate network-wide broadcasting, we utilized the NB-TRACE architecture [4], which is an energy-efficient cross-layer network-wide voice broadcasting architecture for mobile ad hoc networks. Furthermore, we also present results using Flooding with IEEE 802.11 with multi-rate broadcasting for comparison.
2
Superposed Coding
The idea of superposed coding is to add additional coding on top of the initial coding in such a way that it is unlikely that the additional information can be decoded by any receiver with a poor channel. However, any receiver with a good channel will be able to decode both the first and the second displacements of the codewords. Cover proved that the degradation in the rate for the poor channel caused by this additional displacement will allow a more rewarding increase in the rate for the good channel [1]. In Figure 1(a), the idea of superposed coding is achieved by transforming a BPSK constellation into nonuniform quadrature amplitude modulation (QAM). For a given constellation point, Figure 1(a) illustrates the noise margins, d1 and d2 , for low rate and additional information, respectively. The nonuniform spacing makes it much easier for a receiver to correctly recover the low rate information (first bit) than the additional information (second bit). In this way, we simultaneously send two bits using a single transmission and offer two different rates of broadcasting, which potentially may lead to doubled link throughput for nodes with good channels. Note that this example can be easily extended to provide more than two levels of superposed information.
Fig. 1. (a) Constellation diagram for non-uniform quadrature amplitude modulation (QAM). The noise margins and error regions for the selected symbol are also illustrated. (b) Increase in the transmission power with increasing additional rate coverage while keeping the low-rate coverage constant.
1142
T. Numanoglu, W. Heinzelman, and B. Tavli
Achieving multi-rate broadcasting without degrading the performance for the receiver with a poor channel requires an increase in the transmission power. For non-uniform QAM, the amount of extra power needed with increasing additional rate coverage is plotted in Figure 1(b). According to the plot, to introduce an additional rate coverage that is 60% of the low-rate coverage, we need to increase the initial transmission power by approximately 13.78%. When we reach equal noise margins for both rates, the constellation becomes that of uniform QAM, and this is the theoretical limit for the superposed information concept.
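To make the decoding asymmetry concrete, the following sketch implements a one-dimensional analogue of the non-uniform constellation of Fig. 1(a) (the paper's construction is two-dimensional QAM); the amplitudes and noise level are our illustrative assumptions.

```python
# One-dimensional analogue of superposed coding: the low-rate bit picks the
# sign with margin D1, the additional bit applies a smaller displacement D2,
# so the additional bit is the first to be lost as channel noise grows.
import random

D1, D2 = 1.0, 0.35  # illustrative noise margins, D2 < D1

def modulate(low_bit, add_bit):
    s = 1.0 if low_bit else -1.0
    return s * ((D1 + D2) + (D2 if add_bit else -D2))

def demodulate(y):
    return y > 0, abs(y) > D1 + D2   # (low-rate bit, additional bit)

def error_rates(noise_sigma, trials=100_000):
    err_low = err_add = 0
    for _ in range(trials):
        b1, b2 = random.random() < 0.5, random.random() < 0.5
        r1, r2 = demodulate(modulate(b1, b2) + random.gauss(0.0, noise_sigma))
        err_low += r1 != b1
        err_add += r2 != b2
    return err_low / trials, err_add / trials

# error_rates(0.3) yields a near-zero low-rate error rate but a clearly
# higher additional-rate error rate: the good/poor channel split in miniature.
```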
3
Multi-rate Network-Wide Broadcasting
Figure 2(a) shows an example where node B can receive low rate information with low delay directly from the source node S (flow I) or high rate information with high delay rebroadcast from node A (flow II).
Fig. 2. (a) Two different rates of information available at node B: flow I, low rate information and flow II, high rate information. (b) Packet transmission and reception schedule.
This idea is illustrated in Figure 2(b) as well. At time T1 the source node transmits the first superposed high rate packet PL1|PA1, consisting of the low rate information packet PL1 superposed with the additional rate information packet PA1. The row starting with R1 shows the packets received (i.e., decoded) by nodes A and B. Node A decodes both the low rate and the additional information, while node B, having a bad link with S, decodes only the low rate part of the superposed packet. The next set of transmissions takes place at time T2. At this time S broadcasts the next superposed high rate packet PL2|PA2. At the same time, nodes A and B rebroadcast their previously received packets PL1|PA1 and PL1, respectively. As can be seen from the next row of Figure 2(b), node B has both rates of information available and can choose either of the flows (I or II in Figure 2(a)) according to its delay-throughput requirements.
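The schedule can be mimicked in a few lines; the sketch below (an illustration of ours, with invented names) replays the pipeline of Fig. 2(b) and records, per packet, the slot at which node B has both the direct low-rate copy and the full copy relayed via A.

```python
# Replay of the Fig. 2(b) pipeline: each slot the source sends PL_t|PA_t;
# relay A decodes both layers, B decodes only the low-rate layer directly,
# and receives A's full rebroadcast one slot later (flow II vs. flow I).
def broadcast_rounds(num_slots):
    a_buffer = b_direct = None
    available_at_b = []
    for t in range(1, num_slots + 1):
        if a_buffer is not None:
            # B now holds both versions of packet t-1 and may pick either flow.
            available_at_b.append((t, b_direct, a_buffer))
        a_buffer = ("PL%d" % t, "PA%d" % t)  # A decoded the full superposed packet
        b_direct = "PL%d" % t                # B decoded the low-rate part only
    return available_at_b

# broadcast_rounds(3)[0] == (2, 'PL1', ('PL1', 'PA1')): at T2, node B has the
# low-rate PL1 (one slot old) and the complete PL1|PA1 relayed by A.
```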
4
Simulations and Conclusions
In this paper we extended the functionality of NB-TRACE to multi-rate data broadcasting by utilizing the cross-layer properties of the TRACE family of protocols [4]. We also modified IEEE 802.11 to include support for multi-rate broadcasting.
Table 1. Simulation Parameters and Setup

(a) Simulation Setup
  Parameter               Value
  Number of Nodes         256
  Simulation Area         1000m x 1000m
  Simulation Time         100s
  Coverage Ratio          60%
  Number of Repetitions   5
  Node Mobility           Way-Point

(b) Simulation Parameters
  Acronym & Description                   Value
  TSF - Superframe duration               61.5ms
  NF  - Number of frames                  7
  NDS - Number of data slots per frame    14
  NC  - Number of cont. slots per frame   15
  N/A - Data packet size                  110B
  TVF - Voice packet generation period    61.5ms
Table 2. Packet Delivery Ratios (PDRs), Packet Delay and Delay Jitter, and Energy Consumption values sorted in pairs. I: Priority Delay. II: Priority Throughput (results in doubled throughput since all nodes forward high-rate packets that have twice as much information as the low rate packets).

  Architecture             PDR I         PDR II        Packet Delay & Jitter I   Packet Delay & Jitter II   Energy Use I   Energy Use II
  NB-TRACE                 99.6%         99.4%         61.0 ms                   192.9 ms                   35.2 mJ/s      48.5 mJ/s
                           99.2% (min)   97.7% (min)   10.6 ms (jitter)          10.9 ms (jitter)
  Flooding (IEEE 802.11)   99.5%         88.1%         12.3 ms                   41.1 ms                    237.3 mJ/s     240.1 mJ/s
                           99.4% (min)   84.3% (min)   63.7 ms (jitter)          69.6 ms (jitter)
Our goal is to investigate both ends of the delay vs. throughput (quality) trade-off by using NB-TRACE and Flooding with IEEE 802.11. Table 1(a) summarizes the simulation setup we used to investigate these architectures. Acronyms, descriptions and values of the parameters used in the simulations are presented in Table 1(b). We performed two sets of simulations, where each set has a different priority. First, we prioritize the reception of packets with the lowest delay (set I). This leads to faster network-wide broadcasting, and mainly the low rate traffic is forwarded by the nodes. In the second set of simulations, throughput is the priority for the nodes (set II), which thus have to forward the high rate traffic. We simulated conversational voice obtained through scalable source coding. The base layer data, which is sent as the low-rate information, is coded at 13Kbps, and the additional layer, which is sent as the additional-rate information, is also coded at 13Kbps. This results in a high-rate packet transmission of 26Kbps. In Table 2, the results of these two sets of simulations are provided. These results show the two extreme ends of the delay-throughput trade-off using multi-rate coding in network-wide broadcasting in MANETs. The low delay constraint results in less traffic and low average throughput for the network, while the average throughput is nearly doubled at the cost of increased traffic and delay when we have throughput as the constraint.
Superposed coding makes different information rates simultaneously available, and this lets nodes decide which set of upstream nodes they need to listen to in order to maximize the overall ratio of data rate/delay and minimize the energy dissipation. We conclude that this multi-rate broadcasting scheme (along with a highly coordinated routing protocol) should prove even more efficient in a multicasting scenario where nodes with different throughput requirements and delay constraints can freely choose one of the available rates.
References
1. T. Cover. Broadcast channels. IEEE Trans. on Info. Theory, 18(1):2-14, Jan 1972.
2. P. Bergmans. Random coding theorem for broadcast channels with degraded components. IEEE Trans. on Information Theory, 19(2):197-207, Mar 1973.
3. H. Dong and J. D. Gibson. Structures for SNR scalable speech coding. IEEE Trans. on Audio, Speech and Language Processing, 14(2):545-557, March 2006.
4. B. Tavli and W. B. Heinzelman. NB-TRACE. In Proceedings of the IEEE WCNC, pages 2076-2081, 2005.
BRD: Bilateral Route Discovery in Mobile Ad Hoc Networks
Rendong Bai and Mukesh Singhal
Department of Computer Science, University of Kentucky, Lexington, KY 40506 {rdbai,singhal}@cs.uky.edu
Abstract. Traditionally, route discovery in MANETs operates in a unilateral (source-initiated) manner. We propose a new scheme called bilateral route discovery (BRD), where both source and destination actively participate in a route discovery process. BRD has the potential to reduce the control overhead by one half. As an underlying protocol for BRD, we propose gratuitous route error reporting (GRER) to notify the destination of a broken route. The destination can thus play an active role in the upcoming route re-discovery.
Keywords: Mobile ad hoc networks, routing, on-demand, AODV, route discovery, unilateral, bilateral.
1
Introduction
On-demand routing is preferred in mobile ad hoc networks (MANETs). Related work [1] shows that the average life of a path in MANETs is fairly short (e.g., less than 7 seconds). Therefore, the control overhead of on-demand routing mainly comes from route discoveries, and the routing performance is directly determined by the efficiency of the route discovery scheme. Traditionally, most of the responsibility for discovering a route is assumed by the source node, while the destination node simply responds to a route request (RREQ) with a route reply (RREP). We call this traditional manner of route discovery unilateral route discovery (URD). URD is not balanced, because one party bears more of the burden than the other; it is less efficient and incurs longer delays. This work proposes a new scheme called bilateral route discovery (BRD). BRD has the potential to improve the routing performance by reducing control overhead and route discovery latency. The main contributions are as follows: (i) We address the disadvantage of traditional route discovery that operates in a unilateral manner, and propose BRD, where both source and destination actively participate in a route discovery process. (ii) As an underlying protocol for BRD, we propose gratuitous route error reporting (GRER). GRER uses a relaying node to bypass the failed link and notifies the destination of a broken route. The destination can thus actively participate in the upcoming route re-discovery process.
This research was partially supported by NSF grants IIS-0324836 and IIS-0242384.
(iii) We simulated BRD in conjunction with AODV [2]. The results show that BRD significantly improves the routing performance. The rest of the paper is organized as follows. In the next section, we discuss the motivation for BRD. The GRER and the BRD protocols are presented in Section 3 and Section 4, respectively. Section 5 presents simulation results. We draw conclusions in Section 6.
2
Motivation
In this section, we investigate the request zone of route discoveries. The request zone of URD can be represented by a circle centered at the source, with a radius not less than the distance from the source to the destination (denoted r), as shown in Figure 1 (the dashed-line circle). If the destination participates in the route discovery, the search space can be depicted by two smaller circles (solid lines in Figure 1): one centered at the source (source circle, or C_s) and the other centered at the destination (destination circle, or C_d). When C_s and C_d intersect and some intermediate nodes are located in the intersection, a route is likely to be established. We call these nodes intersection nodes. BRD consists of two halves: a source route discovery (srd) and a destination route discovery (drd); srd and drd search for each other. We denote the radii of C_s and C_d as R_s and R_d, respectively. The optimal values of R_s and R_d are one half of the distance between the source and the destination, and the area of the request zone A_brd is π(r/2)² · 2 = πr²/2. On the other hand, when using URD, the area of the request zone A_urd is πr². Therefore, BRD may incur as little as half of the overhead of URD.
Fig. 1. Request zones of URD and BRD
Fig. 2. Gratuitous route error reporting

3
Gratuitous Route Error Reporting
One challenge in designing BRD is how to notify the destination when a route breaks at a link, so that the destination can actively participate in the upcoming route re-discovery. We could implement the notification in a number of ways. In this paper, we propose gratuitous route error reporting (GRER). We call the upstream and the downstream nodes of the failed link the start and the end nodes, respectively. The basic idea is that the start node broadcasts
a gratuitous route error message (RERR_G) with a TTL of 2 to bypass the failed link and reach the end node or other downstream nodes on the route. Figure 2 shows an example of how GRER works. When link UV fails, start node U broadcasts a RERR_G message. Node X relays (broadcasts) the message and end node V receives it. When the end node or other downstream nodes receive the message, they send a regular RERR message to the destination informing it of the route error. A RERR_G message is relayed at most once. A node relays the message if the end node is in its neighbor table and no other neighbor has relayed the message. The first condition avoids unnecessary relays: a RERR_G message is more likely to reach the end node when the end node is in the neighbor table of the relaying node. The second condition suppresses duplicate relays. We do not use a hello protocol to maintain neighbor tables at nodes. Instead, we utilize RREQ messages to build neighbor tables. This approach works well because a topology change that breaks a route typically triggers a route discovery process, which generates sufficient RREQ traffic.
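In code, the relay decision reduces to three checks. The sketch below is our illustration (the data-structure names are assumptions), not the authors' implementation.

```python
# RERR_G relay rule: relay at most once, only if the end node of the failed
# link is a known neighbor and no other neighbor has been heard relaying it.
def should_relay_rerr_g(msg_id, end_node, neighbor_table, heard_relays, own_relays):
    if msg_id in own_relays:            # a RERR_G message is relayed at most once
        return False
    if end_node not in neighbor_table:  # relay only when it is likely to reach V
        return False
    if msg_id in heard_relays:          # another neighbor already relayed: suppress
        return False
    own_relays.add(msg_id)
    return True
```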
4
Bilateral Route Discovery (BRD)
After the source and the destination are notified of a route breakage, they conduct a BRD, which consists of a srd and a drd. Intersection nodes learn routes to both the source and the destination, and thus they can send cached route replies to the source. Figure 3 shows an example of BRD, where S is discovering a route to D. D initiates a drd and node V learns a route to D. Similarly, S initiates a srd and node V learns a route to S. V sends a cached route reply to S. When S receives the reply, a route is established from S to D.
Fig. 3. (a) D initiates a drd, and V learns a route to D. (b) S initiates a srd. V learns a route to S. (c) V sends a cached RREP to S.
We denote the RREQ for a srd/drd as RREQ_S/RREQ_D. The TTL of a RREQ_S message (ttl_s) and the TTL of a RREQ_D message (ttl_d) are set as follows: ttl_s = ceil(HC_known/2) and ttl_d = floor(HC_known/2), where HC_known is the hop count of a previously known route. We have designed the BRD scheme such that intersection nodes are able to send cached route replies regardless of the order in which the RREQ_S and RREQ_D messages are received.
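The TTL split is a one-liner; the sketch below just spells it out (the function name is ours).

```python
# ttl_s = ceil(HC_known / 2), ttl_d = floor(HC_known / 2): the two half
# searches together cover the previously known hop count, the source half
# taking the extra hop when the count is odd.
import math

def brd_ttls(hc_known):
    return math.ceil(hc_known / 2), hc_known // 2

# brd_ttls(7) == (4, 3): C_s and C_d meet near the midpoint of the old route.
```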
5
Performance Evaluation
We have implemented BRD in AODV, which we call AODV-BRD, and have conducted simulations to evaluate the performance of AODV-BRD and compared it with AODV.
Fig. 4. Performance when number of flows changes. Minimum node speed 0.1m/s, maximum node speed 20m/s and pause time 30s.
Simulations were conducted using GloMoSim 2.03 [3]. The radio bandwidth was 2 Mb/s and the radio range was 250m. The traffic was 4 packets/s CBR and the mobility model was random waypoint. Each simulation run lasted for 1200s. The results were averaged over 20 runs. Figures 4(a), 4(b) and 4(c) show the results for control overhead per flow, packet delivery ratio (PDR), and end-to-end delay, respectively, when the number of flows is varied from 10 to 30. We observe that BRD improves the performance over AODV significantly. For example, when there are 30 flows, the control overhead is reduced by 80%, the PDR is improved by 118%, and the end-to-end delay is reduced by 65%.
6
Conclusion
We proposed bilateral route discovery (BRD), where both source and destination actively participate in a route discovery. BRD may incur as little as half of the overhead of traditional unilateral route discovery (URD). We also proposed gratuitous route error reporting (GRER) to notify the destination of a broken route, so that the destination can participate in the BRD. Simulation results showed that BRD improves the routing performance significantly. In the future, we plan to incorporate BRD into our Way Point Routing (WPR) framework [4] and integrate BRD with the Salvaging Route Reply (SRR) approach [5].
References
1. Sadagopan, N., Bai, F., Krishnamachari, B., Helmy, A.: PATHS: analysis of path duration statistics and their impact on reactive MANET routing protocols. In: MobiHoc '03 (2003)
2. Perkins, C.E., Royer, E.M.: Ad-hoc on-demand distance vector routing. In: WMCSA '99 (Feb 1999)
3. Bajaj, L., Takai, M., Ahuja, R., Bagrodia, R., Gerla, M.: GloMoSim: A scalable network simulation environment. Technical Report 990027 (1999)
4. Bai, R., Singhal, M.: DOA: DSR over AODV routing for mobile ad hoc networks. IEEE Transactions on Mobile Computing 5(10) (2006) 1403-1416
5. Bai, R., Singhal, M.: Salvaging route reply for on-demand routing protocols in mobile ad-hoc networks. In: ACM/IEEE MSWiM '05 (Oct 2005)
Correction, Generalisation and Validation of the “Max-Min d-Cluster Formation Heuristic” Alexandre Delye de Clauzade de Mazieux, Michel Marot, and Monique Becker GET/INT; UMR–5157 SAMOVAR; 91011 CEDEX EVRY, France {alexandre.delye,michel.marot,monique.becker}@int-evry.fr
Introduction and Abstract
The justification for using multihop clusters may be found in [1]. In the well-known heuristic proposed in [2], the d-dominating set of clusterheads is first selected by using node identifiers, and then clusters are formed. In this paper we generalise this algorithm in order to select nodes depending on a given criterion (such as the degree, density or energy of nodes). The first section of this paper simplifies and proves the correctness of our generalised algorithm to select clusterheads. The cluster formation process proposed in [2] is extensively studied in the second section and is proved to be false.
1
Formation of d-Dominating Sets Based on a Given Criterion
Due to a lack of room, the proofs of this section were published in [3]. Let G = {V, E} be a graph with sets of vertices V and edges E. Clusterheads form a subset S of V which is a d-dominating set over G. Let us consider x ∈ V; N_i(x) is the set of neighbours which are at most i hops from x. Let Y be a set on which a total order relation is defined. Let v be an injective function of V in Y and X = v(V). Our generalised algorithm iterates over 2d runs. Each node updates two lists: Winner, which is a list of elements of X, and Sender, which is a list of elements of V. Let us note W_k(x) and S_k(x) the images in x of the functions W_k and S_k, defined by induction.
Initial Phase (k = 0). ∀ x ∈ V: W_0(x) = v(x), S_0(x) = x.
Max Phase (k ∈ [1, d]). For x ∈ V, let y_k(x) be the only node of N_1(x) such that ∀ y ∈ N_1(x) \ {y_k(x)}: W_{k−1}(y_k(x)) > W_{k−1}(y). W_k and S_k are derived from: ∀ x ∈ V: W_k(x) = W_{k−1}(y_k(x)), S_k(x) = y_k(x).
Min Phase (k ∈ [d + 1, 2d]). For x ∈ V, let y_k(x) be the only node of N_1(x) such that ∀ y ∈ N_1(x) \ {y_k(x)}: W_{k−1}(y_k(x)) < W_{k−1}(y). W_k and S_k are derived from: ∀ x ∈ V: W_k(x) = W_{k−1}(y_k(x)), S_k(x) = y_k(x).
This work is funded by the Programme Initiative Réseaux Spontanés of the Groupe des Écoles des Télécommunications.
Definition 1. Let S be the set defined by S = {x ∈ V, W_2d(x) = v(x)}.¹
Theorem 1. Each node x ∈ V \ S may determine at least one node of S which is in N_d(x). It needs only to derive it from its Winner list. If W_2d(x) = v(x), then x defines itself as a dominating node (Rule 1). If node x finds a v(y) value which appears at least once in each of the two phases, then y ∈ S ∩ N_d(x); if node x finds several such pairs, the node y with the smallest value v(y) is chosen (Rule 2). If not, let y be the node such that v(y) = W_d(x). Then y ∈ S ∩ N_d(x) (Rule 3).
Corollary 1. S is a d-dominating set for the graph G.
This definition of S (see Def. 1) is different from the definition given in [2], where S is defined as: S′ = {x ∈ V, ∃k ∈ [d + 1, 2d]: W_k(x) = v(x)}.
Theorem 2. S = S′.
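The 2d-run selection and the three rules of Theorem 1 can be condensed into a short centralized simulation. The sketch below is ours, not the distributed protocol: it uses node identifiers as the criterion v, and it takes N_1(x) to include x itself (our reading of the closed one-hop neighbourhood).

```python
# Generalised Max-Min selection: d Max rounds, then d Min rounds over the
# Winner values, followed by Rules 1-3 applied to each node's Winner list.
def max_min_d_cluster(adj, d):
    v = {x: x for x in adj}                    # criterion: node identifier
    node_of = {val: x for x, val in v.items()} # v is injective, so invertible
    w = dict(v)
    winners = {x: [w[x]] for x in adj}         # W_0(x) = v(x)
    for k in range(1, 2 * d + 1):
        pick = max if k <= d else min          # Max phase, then Min phase
        w = {x: pick([w[x]] + [w[y] for y in adj[x]]) for x in adj}
        for x in adj:
            winners[x].append(w[x])
    clusterhead = {}
    for x in adj:
        wl = winners[x]
        if wl[2 * d] == v[x]:                  # Rule 1: x dominates itself
            clusterhead[x] = x
            continue
        both = set(wl[1:d + 1]) & set(wl[d + 1:])   # values seen in both phases
        if both:
            clusterhead[x] = node_of[min(both)]     # Rule 2: smallest such value
        else:
            clusterhead[x] = node_of[wl[d]]         # Rule 3: v(y) = W_d(x)
    return clusterhead

# On the 3-node chain {1: {2}, 2: {1, 3}, 3: {2}} with d = 2,
# every node elects 3 as its clusterhead.
```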
2
Cluster Formation
To join a clusterhead c(x), nodes must establish a path to reach it, provided all nodes in the path belong to the same cluster. Therefore, it is necessary to find an algorithm to partition the topology into connected components, called clusters. In this section we shall study the formation of these clusters.
2.1 The Solution Proposed in 'Max-Min d-Cluster' Formation is False
The authors of [2] proposed a way to form the above path. We now prove that there exist some cases for which the formation of the path is not valid.
Max-Min d-cluster formation proposal. Let x be a node, let y be the corresponding dominating node as defined in Theorem 1 (y = c(x)), and let k ∈ [1, d] be such that W_k(x) = v(y). x then chooses S_k(x) as its father.² It may happen that c(p(x)) ≠ c(x). Therefore, in some cases it is necessary to use an additional rule to make sure that c(p(x)) = c(x). This rule is named convergecast in paper [2] and introduces a new necessary condition, namely: ∀x ∈ V, p(p(x)) ≠ x. Otherwise, the rule would lead to an infinite loop. However, this condition cannot always be satisfied, as shown in the following example.
An example where the algorithm leads to a bug. The network is shown in Fig. 1 and the results of the father and clusterhead selection algorithm (with d = 5) are given in Table 1. The cluster formation proposed in [2] leads to an infinite loop, as c(p(3)) = c(5) = 11, c(3) = 10, and p(p(3)) = 3. Hence, the use of the convergecast rule is not possible. The next paragraph proves that this phenomenon is due to the use of Rule 2.
¹ This definition is not the same as the one provided in [2] but both definitions are equivalent (see Theorem 2).
² By definition, S_k(x) ∈ N_1(x).
Fig. 1. Topology
Table 1. 5-Max-Min results

Node         1   2   3   4   5   6   7   8   9   10  11
Max1         11  6   5   10  10  7   8   9   9   10  11
Max2         11  11  10  10  10  10  9   9   9   10  11
Max3         11  11  11  10  10  11  10  9   9   10  11
Max4         11  11  11  11  11  11  11  10  9   10  11
Max5         11  11  11  11  11  11  11  11  10  11  11
Min1         11  11  11  11  11  11  11  10  10  11  11
Min2         11  11  11  11  11  11  10  10  10  11  11
Min3         11  11  11  11  11  10  10  10  10  11  11
Min4         11  10  11  10  11  10  10  10  10  11  11
Min5         10  10  10  10  11  10  10  10  10  10  11
Clusterhead  11  11  10  10  11  10  10  10  10  10  11
Father       11  1   5   10  3   4   6   7   8   10  11
Notice that if a node i is such that v(c(i)) < W_d(i), then Rule 2 was used.
Necessary condition: Rule 2 was used. For two nodes i and j, let us note d(i, j) the distance in hops. Now, for any node such that c(i) ≠ p(i), d(i, c(i)) = d(p(i), c(i)) + 1, since p(i) is the node allowing i to know c(i). Let i and j be such that p(i) = j and p(j) = i; i and j are thus not clusterheads, since each one has a different father. The preceding equality applied to i and j implies that d(i, c(i)) = d(j, c(i)) + 1 and d(j, c(j)) = d(i, c(j)) + 1. Assume that c(i) = c(j) = l; then d(i, l) = d(j, l) + 1 and d(j, l) = d(i, l) + 1, which is absurd, so c(i) ≠ c(j). Suppose, without any loss of generality, that v(c(i)) > v(c(j)). Node i obviously belongs to the d-hop neighbourhood of c(i). Therefore, p(i) is also in the d-hop neighbourhood of c(i), and therefore j ∈ V_d(c(i)). Thus c(i) ∈ V_d(j). Then W_d(j) ≥ v(c(i)), and then W_d(j) > v(c(j)). Hence, Rule 2 was used according to what precedes.
Sufficient condition: Rule 2 was used. If a node i is not a clusterhead, then v(c(i)) = W_d(i) (Rule 3). Let i be a node which belongs to a loop. Without any loss of generality, let us show that a loop of length 5 cannot occur. Let j, k, l, m and i be the fathers of i, j, k, l and m, respectively. Since j is the father of i, j belongs to the d-hop neighbourhood of c(i). So W_d(j) ≥ v(c(i)). But v(c(i)) = W_d(i), thus W_d(j) ≥ W_d(i). It may be deduced that W_d(i) = W_d(j) = W_d(k) = W_d(l) = W_d(m), and then c(i) = c(j) = c(k) = c(l) = c(m) = c. Therefore, by applying to each node the general equality d(i, c(i)) = d(p(i), c(i)) + 1, since no node among i, j, k, l, m is a clusterhead: d(i,c) = d(j,c)+1, d(j,c) = d(k,c)+1, d(k,c) = d(l,c)+1, d(l,c) = d(m,c)+1, d(m,c) = d(i,c)+1, which is absurd. The same kind of demonstration can be applied to any other loop of any given length. Hence, if Rule 2 is removed, there is no loop.
The following example shows that the suppression of Rule 2 leads to new problems. The network is shown in Fig. 2 and the results of the father and clusterhead selection algorithm (with d = 2) are given in Table 2. It can be noticed that node 2 is not a clusterhead and c(p(1)) = c(2) = 5 whereas c(1) = 4. Therefore, there is another problem which is not solved by the convergecast rule, as it is not possible to go from sons to fathers and be sure to go through the son's clusterhead before the father is attached to another clusterhead.
Fig. 2. Topology

Table 2. 2-Max-Min results

Node         1  2  3  4  5
Max1         2  4  5  4  5
Max2         4  5  5  4  5
Min1         4  4  5  4  5
Min2         4  4  4  4  5
Clusterhead  4  5  5  4  5
Father       2  3  5  4  5

2.2 Another Proposal for the Formation of the Cluster
If a node i is a clusterhead after application of Rule 1, then node i informs its neighbours that it is a clusterhead. The unclustered neighbours choose i as clusterhead and transmit a message to their neighbours to inform them that they are at one hop from the clusterhead i. The unclustered neighbours of these nodes choose i as clusterhead by attaching themselves to one of node i's neighbours, and inform their neighbours that they are at 2 hops from i. This process is repeated d times so as not to exceed d hops. It guarantees that there is no loop and that all the connected components are tree clusters with a clusterhead root.
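This corrected process is essentially a breadth-first expansion from the clusterheads, which is what guarantees loop-free tree clusters; the sketch below is our rendering of it.

```python
# d-bounded breadth-first cluster formation: membership spreads one hop per
# round from each clusterhead, and every node attaches to the neighbour that
# reached it, so each cluster is a tree rooted at its clusterhead.
from collections import deque

def form_clusters(adj, clusterheads, d):
    cluster, father, depth = {}, {}, {}
    queue = deque()
    for h in clusterheads:
        cluster[h], father[h], depth[h] = h, h, 0
        queue.append(h)
    while queue:
        x = queue.popleft()
        if depth[x] == d:          # stop expanding at d hops from the head
            continue
        for y in adj[x]:
            if y not in cluster:   # unclustered neighbours join through x
                cluster[y], father[y], depth[y] = cluster[x], x, depth[x] + 1
                queue.append(y)
    return cluster, father
```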
3
Conclusion
In this paper, we simplified (cf. Theorem 2) the heuristic presented in [2]. We generalised this heuristic to any given criterion, and not only to the identifiers of the nodes. This makes it possible to take into consideration other factors influencing the performance of the network. For example, the energy budget of a wireless sensor network benefits from the hierarchical routing introduced by the determination of clusters with a maximum depth d (cf. [1]). In the second part, we gave an example which shows that the cluster formation process proposed in [2] is not always valid. This is an important result, since the algorithm of Amis et al. is well known. We then suggested a correct cluster formation process.
References
1. Mhatre, V., Rosenberg, C.: Design guidelines for wireless sensor networks: Communication, clustering and aggregation. Ad Hoc Networks Journal, Elsevier Science 2 (2004) 45-63
2. Amis, A., Prakash, R., Vuong, T., Huynh, D.: Max-min d-cluster formation in wireless ad hoc networks. In: Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (2000)
3. Delye de Clauzade de Mazieux, A., Marot, M., Becker, M.: Proof of the generalised max-min. Rapport Technique no 02-007RST, Institut National des Télécommunications (February 2007)
Analytical Performance Evaluation of Distributed Multicast Algorithms for Directional Communications in WANETs
Song Guo¹, Oliver Yang², and Victor Leung³
¹ Computer Science, University of Northern British Columbia, Canada [email protected]
² School of Information Technology and Engineering, University of Ottawa, Canada [email protected]
³ Electrical and Computer Engineering, University of British Columbia, Canada [email protected]
Abstract. Two distributed algorithms, DMMT-OA and DMMT-DA, have recently been proposed to maximize the multicast lifetime for directional communications in wireless ad-hoc networks. Experimental results have shown their superior performance over other centralized algorithms; however, their theoretical performance in terms of approximation ratio has remained unknown. In this paper, we use a graph-theoretic approach to derive the approximation ratio for both algorithms. Furthermore, we show for the first time that both ratios are bounded by a constant number.
Keywords: Wireless Ad Hoc Networks, Approximation Algorithm, Multicast, Distributed Algorithm, Directional Communications.
1
Introduction
Over the last few years, energy-efficient communication in Wireless Ad Hoc Networks (WANETs) with directional antennas has received more and more attention. This is because directional communications can save transmission power by concentrating RF energy where it is needed [1]. On the other hand, broadcast/multicast communication is also an important issue, as many routing protocols for WANETs need this mechanism to maintain the routes between nodes. Therefore, one would be interested in finding an algorithm that provides the maximum lifetime to the multicast session. The optimization metric is typically defined as the duration of the network operation time until the battery depletion of the first node in the network [2][3]. Some work has considered maximizing the network lifetime in a WANET with omni-directional antennas for a broadcast session, e.g., [2][3][4][5], or a multicast session, e.g., [5][6][7][8]. The same problem with directional antennas has been studied in [1][9][10][11] and shown to be an NP-hard problem [11]. The only exact solution for such a difficult problem is the MILP formulation presented in [10].
We note that all the solutions in [1][10][11] are centralized, meaning that at least one node needs global network information in order to construct an energy efficient multicast tree. Sometimes, this centralized approach is impractical for resource-constrained WANETs. The most desirable work has been presented in [9], in which two distributed maximum-lifetime algorithms have been proposed for directional communications. Simulation results have also shown that they outperform other centralized multicast algorithms, e.g., [1], significantly. However, their theoretical performance in terms of approximation ratio is still unknown so far. In this paper, we would like to explore the approximation ratio for these two heuristic algorithms from a graph theoretic approach.
2
Network Model
We model our wireless ad hoc network as a simple directed graph G with a finite node set N and an arc set A corresponding to the unidirectional wireless communication links. We assume a free-space propagation model and an adaptive antenna model, in which the antenna at each node v can switch its orientation to any desired direction, with transmission power uniformly distributed across its adjustable beamwidth θ_v between constant numbers θ_min and 2π. The transmission power p_vu to support a link (v, u) separated by a distance r_vu (r_vu > 1) is therefore

p_{vu} = \frac{\theta_v}{2\pi} \cdot r_{vu}^{\alpha}    (1)

We consider a source-initiated multicast with multicast members M = {s} ∪ D, where s is the source node and D is the set of destination nodes. All the nodes involved in the multicast form a multicast tree rooted at the node s, i.e., a rooted tree T_s, with a tree node set N(T_s) and a tree arc set A(T_s). Let the battery supply ε_v be the energy level associated with each node v. We assign the tree arc weight function w_vu as the reciprocal of the maximum lifetime of the arc (v, u), defined as follows:

w_{vu}(\theta_v) = \frac{\theta_v \cdot r_{vu}^{\alpha}}{2\pi \cdot \varepsilon_v}    (2)

Note that the beamwidth θ_v applied by node v in the multicast tree T_s is a function of node v's child node set c_v, i.e., θ_v = ϕ(c_v) and c_v = {u | (v, u) ∈ A(T_s)}. Such a function ϕ(c_v) is defined as the smallest possible beamwidth at node v in the range between θ_min and 2π that provides beam-coverage of c_v in the tree T_s. Let Ω_M be the family of all rooted multicast trees spanning the nodes in M. It has been shown in [9] that the maximum-lifetime multicast problem is equivalent to the min-max tree problem, which is to determine a directed tree T_s spanning all the multicast members (i.e., M ⊆ N(T_s)) such that the maximum arc weight is minimized, i.e.,

\min_{T_s \in \Omega_M} \; \max_{(v,u) \in A(T_s)} w_{vu}(\theta_v)    (3)
3
Two Approximation Algorithms
Two distributed maximum-lifetime algorithms, DMMT-OA (Distributed Min-Max Tree algorithm for Omnidirectional Antennas) and DMMT-DA (Distributed Min-Max Tree algorithm for Directional Antennas) [9], have been proposed for directional communications. It has been proved that the degenerate versions of both distributed algorithms for omni-directional antennas are globally optimal. For both algorithms, the multicast tree is constructed in a distributed and incremental manner. Initially, the multicast tree T_s only contains the source node. It then iteratively performs the following Search-and-Grow procedure until the tree contains all the nodes in M.
Search-and-Grow: Find the link (v, u) connecting the tree node set and the non-tree node set with minimum weight w_vu, and include it in the multicast tree. Consequently, the tree T_s grows by including as many non-tree links (x, y) as possible into the multicast tree, as long as w_xy ≤ w_vu, until no more such links can be found.
The DMMT-OA algorithm disregards the beamwidth in the tree construction process, assuming omnidirectional antennas, i.e., w_vu(θ_v) = w_vu(2π) for each arc (v, u) in the graph. After the tree T_s is constructed, each internal node v sets its antenna beamwidth to ϕ(c_v). Unlike the arc weights w_vu(2π) in DMMT-OA, which remain unchanged throughout the execution of the algorithm, the DMMT-DA algorithm dynamically updates the weights w_vu(θ_v) at each step to reflect the changes of the smallest beamwidth θ_v. In the following theorems, we shall show that both algorithms have bounded approximation ratios ρ_1 and ρ_2, respectively. The technical details of the proofs can be found in [12].
Theorem 1. The DMMT-OA algorithm has a bounded approximation ratio ρ_1,

\rho_1 \le \frac{\varphi_1}{\theta_{min}}    (4)

where ϕ_1 is the minimum beamwidth applied by the transmitting node v of the arc (v, u) in the final multicast tree T_s obtained by DMMT-OA, i.e., ϕ_1 ≡ ϕ(c_v), in which (v, u) satisfies

w_{vu}(\varphi(c_v)) = \max_{(x,y) \in A(T_s)} w_{xy}(\varphi(c_x))    (5)
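For concreteness, the sketch below renders the Search-and-Grow construction in its DMMT-OA form (fixed weights w_vu(2π), per equation (2)) as a centralized routine; the graph encoding and names are our assumptions, whereas the actual algorithms operate distributedly.

```python
# Centralized sketch of Search-and-Grow with fixed omni-directional weights.
# arcs: dict mapping node -> {neighbor: weight}; members: the multicast set M.
import math

def arc_weight(theta_v, r_vu, eps_v, alpha):
    """Equation (2): w_vu(theta) = theta * r^alpha / (2*pi*eps)."""
    return theta_v * r_vu ** alpha / (2 * math.pi * eps_v)

def dmmt_oa_tree(arcs, source, members):
    parent = {source: None}
    while not members <= parent.keys():
        # Search: minimum-weight arc crossing from tree nodes to non-tree nodes.
        v, u = min(((x, y) for x in parent for y in arcs[x] if y not in parent),
                   key=lambda e: arcs[e[0]][e[1]])
        bound = arcs[v][u]
        parent[u] = v
        # Grow: absorb any non-tree arc no heavier than the bound just paid.
        grew = True
        while grew:
            grew = False
            for x in list(parent):
                for y, wxy in arcs[x].items():
                    if y not in parent and wxy <= bound:
                        parent[y] = x
                        grew = True
    return parent  # node -> father map of the multicast tree
```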
Let T_s be the final multicast tree obtained from the DMMT-DA algorithm. We use T_s^k and c_v^k to denote the partially constructed tree rooted at s and the child-node set of node v, respectively, after the k-th node is added into the tree, where k = 0, 1, ..., n−1. We assume that the ending node u of the bottleneck arc (v, u) is the i-th node added into the tree T_s, and that the node chosen at the beginning of the same search-and-grow cycle as arc (v, u) is the j-th (j ≤ i) one added into the tree. Note that c_v excludes all pruned nodes from c_v^{n−1}.
Theorem 2. The DMMT-DA algorithm has a bounded approximation ratio ρ_2,

\rho_2 \le \frac{\varphi_2 \cdot \varphi(c_v)}{\theta_{min} \cdot \varphi(c_v^i)}    (6)
where ϕ_2 is the smallest beamwidth applied by the transmitting node v′ before the j-th node is added into the tree such that there exists an arc (v′, u′) with minimum weight w_{v′u′}(2π) connecting the node sets X and N − X, where X = N(T_s^{j−1}), i.e., ϕ_2 ≡ ϕ(c_{v′}^{j−1} ∪ {u′}), in which (v′, u′) satisfies

w_{v'u'}(2\pi) = \min_{x \in X,\; y \in N-X} w_{xy}(2\pi)    (7)

4
Conclusion
The main contribution of this paper is to establish that both the DMMT-OA and DMMT-DA algorithms have bounded approximation ratios. These results help us understand why they show superior performance over other proposals in simulation experiments.
References
1. J. E. Wieselthier, G. D. Nguyen, et al: Energy-Limited Wireless Networking with Directional Antennas: The Case of Session-Based Multicasting. IEEE INFOCOM, New York, 2002, pp. 190-199.
2. I. Kang and R. Poovendran: Maximizing Static Network Lifetime of Wireless Broadcast Adhoc Networks. IEEE ICC, Alaska, 2003, pp. 2256-2261.
3. A. K. Das, R. J. Marks II, et al: MDLT: a polynomial time optimal algorithm for maximization of time-to-first-failure in energy-constrained broadcast wireless networks. IEEE Globecom, San Francisco, 2003, pp. 362-366.
4. I. Kang and R. Poovendran: On the Lifetime Extension of Energy-Efficient Multihop Broadcast Networks. World Congress on Computational Intelligence, Honolulu, 2002.
5. M. X. Cheng, J. Sun, et al: Energy-efficient Broadcast and Multicast Routing in Ad Hoc Wireless Networks. IEEE IPCCC, Phoenix, 2003, pp. 87-94.
6. B. Floren, P. Kaski, et al: Multicast time maximization in energy constrained wireless networks. Workshop on Foundations of Mobile Computing, San Diego, 2003, pp. 50-58.
7. L. Georgiadis: Bottleneck multicast trees in linear time. IEEE Communications Letters, 7(11), Nov. 2003, pp. 564-566.
8. S. Guo, V. Leung and O. Yang: A Scalable Distributed Multicast Algorithm for Lifetime Maximization in Large-scale Resource-limited Multihop Wireless Networks. ACM IWCMC, Vancouver, 2006, pp. 419-424.
9. S. Guo, V. Leung and O. Yang: Distributed Multicast Algorithms for Lifetime Maximization in Wireless Ad Hoc Networks with Omni-directional and Directional Antennas. IEEE Globecom, San Francisco, 2006.
10. S. Guo and O. Yang: Optimal Tree Construction for Maximum Lifetime Multicasting in Wireless Ad-hoc Networks with Adaptive Antennas. IEEE ICC, Seoul, 2005, pp. 3370-3374.
11. Y. Hou, Y. Shi, H. D. Sherali, and J. E. Wieselthier: Online lifetime-centric multicast routing for ad hoc networks with directional antennas. IEEE INFOCOM, Miami, 2005, pp. 761-772.
12. S. Guo: Proofs of the Approximation Ratio Bounded Algorithms for the Maximum Lifetime Multicast Problems in WANETs. Technical Reports, http://web.unbc.ca/~sguo/publications.
Beyond Proportional Fair: Designing Robust Wireless Schedulers
Soshant Bali¹, Sridhar Machiraju², and Hui Zang²
¹ University of Kansas [email protected]
² Sprint ATL {Machiraju,Hui.Zang}@sprint.com
Abstract. Proportional Fair (PF), a frequently used scheduling algorithm in 3G wireless networks, can unnecessarily starve “well-behaved” users in practice. One of the main causes behind PF-induced starvation is its inability to distinguish between users who are backlogged and users who are not. In this paper, we describe how a simple parallel PF instance can mitigate such starvation.
1
Introduction
Scheduling algorithms play a key role in deciding user performance in wireless networks. In the past, a number of channel-aware scheduling algorithms have been proposed [1,2,3,4] to exploit the time-varying nature of user channel conditions without sacrificing fairness. The Proportional Fair (PF) algorithm [5] is one such channel-aware algorithm that has been widely deployed in cellular data networks, especially in 3G networks such as CDMA-based EV-DO [5] and GSM-based HSDPA networks. Our work is motivated by the observation that PF, though widely used, displays a surprising lack of robustness that causes it to unnecessarily starve well-behaved users for significant periods of time. One of the main reasons behind such starvation is the inability of PF to distinguish between users who are backlogged and users who are not. In particular, as we showed in prior work [6], a user receiving maliciously crafted "on-off" flows can starve other users. We analyze this problem and design Parallel PF (PPF), i.e., PF with a parallel PF instance, to eliminate such starvation. Using simulations, we show that PPF is superior to PF - it is robust and achieves comparable or better throughput and fairness than PF. The PF-centric nature of our work is justified for two reasons. First, a robust alternative to PF is of practical interest given the widespread use of PF. Second, the use of a parallel mechanism is fairly general and can be used with non-PF schedulers (see [7]). Previously proposed algorithms are not suitable for one or more reasons. For instance, solutions using a strict time-based threshold [8] to limit starvation do not distinguish between the starvation of a well-behaved user experiencing "good" channel conditions - an undesirable outcome - and the starvation of a user experiencing fading - a desirable outcome. Approaches using some form of delay-throughput tradeoff [3] do not prevent
vulnerability (of TCP and UDP flows) to malicious "on-off" flows and also compromise channel-awareness (for instance, see [9]).
2 The PF Algorithm and Starvation
The PF algorithm typically transmits data to end-users (Access Terminals or ATs) in time slots of fixed size, for example, 1.67 ms in EV-DO networks. Each AT has its own queue at the base station. Being channel-aware, PF uses the current achievable rate (of downlink transmission) to each AT. This rate is reported on a per-slot basis by all active ATs. In the EV-DO system, there are 10 unique achievable data rates [10,5]. Assume that there are n ATs in the system. Denote the achievable data rate reported by user i in time slot t by R_i[t] (i = 1...n). PF maintains A_i[t], the exponentially-weighted mean achieved rate:

    A_i[t+1] = A_i[t](1 - α) + α R_i[t]    if slot t is allocated to AT i
    A_i[t+1] = A_i[t](1 - α)               otherwise

PF allocates the slot of time instant t to the AT with the highest R_i[t]/A_i[t] ratio. Typically, α = 1/T is chosen to be 0.001 [5,6]. This ensures that A_i[t] remains roughly constant over time under stationary channel conditions. Under reasonably general conditions, PF maximizes the sum of the logarithms of per-AT throughput [11] when the channel conditions of ATs are independent. Moreover, if their channel conditions are identically distributed, PF ensures that all ATs are allocated an equal number of slots in the long term. In practice, PF can starve an AT experiencing good conditions for a large period of time, thereby causing serious performance degradation in the form of high delay jitter, spurious TCP timeouts, etc. [6]. Specifically, PF would allocate consecutive slots to an AT that has a much larger R[t]/A[t] ratio than the other ATs, until the ratios cross over. It turns out that an AT can have its R[t]/A[t] ratio grow arbitrarily large simply by not receiving any traffic. To see why, note that R[t]/A[t] is inversely proportional to A[t] and recall that A_i[·] reduces (via multiplication with 1 - α) when AT i is not allocated slot t, irrespective of whether AT i is backlogged or not. Hence, a malicious AT receiving traffic in an "on-off" pattern could potentially starve other ATs during the beginning of each "on" period. Recently [6], we used rigorous experiments on commercial and laboratory networks as well as analysis to show that the resulting starvation can increase jitter and reduce TCP goodput (due to spurious timeouts) of a competing AT by up to 1 second and 30%, respectively. We plot a representative result showing the reduction in TCP goodput for various "off" durations in Figure 1.
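To make the bookkeeping concrete, here is a minimal Python sketch of the PF update and allocation rule described above (our own illustration, not code from the paper; the rates in the toy run are placeholder values):

```python
import random

ALPHA = 0.001  # alpha = 1/T, the EWMA weight quoted above

def pf_step(R, A, alpha=ALPHA):
    """One PF slot: allocate to the AT with the highest R[i]/A[i] ratio,
    then update the exponentially-weighted mean achieved rates A[i]."""
    winner = max(range(len(R)), key=lambda i: R[i] / A[i])
    for i in range(len(A)):
        if i == winner:
            A[i] = A[i] * (1 - alpha) + alpha * R[i]
        else:
            # A[i] decays whether or not AT i is backlogged -- the root
            # cause of the "on-off" starvation described in the text.
            A[i] = A[i] * (1 - alpha)
    return winner

# Toy run with two ATs reporting random achievable rates (placeholders).
A = [1000.0, 1000.0]
for t in range(5000):
    R = [random.choice([600.0, 1200.0, 2400.0]) for _ in A]
    pf_step(R, A)
print([round(a, 1) for a in A])
```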
3 Proposed Solution
PF is vulnerable to "on-off" traffic primarily because it reduces A[t] (by multiplying it with 1 - α) even when an AT is not backlogged. A naive solution is to freeze the value of A[t] for such ATs.
Fig. 1. (Left) Comparison of the (experimental) TCP goodput to an AT when another AT receives (1) a periodic (UDP) packet stream and (2) an "on-off" UDP flow with various inter-burst times; TCP goodput can decrease by up to 30% due to "on-off" flows. The x-axis gives the AT2 data rate (Kbps) / inter-burst gap (sec). (Right) Similar simulation experiments with PF and PPF, with inter-burst times decreasing from 9 s to 2.57 s. The goodput decrease due to PF is similar to that seen experimentally but higher due to differences in the TCP timeout algorithms of ns-2 and practical implementations; the goodput reduction is eliminated with PPF.
However, a frozen A[t] value does not adapt to changes in the number of backlogged ATs or in channel conditions. Hence, an AT with a recently unfrozen A[t] can have a ratio that is much lower or higher than other ATs, thereby causing starvation. A backlog-unaware algorithm, which always considers ATs to be backlogged, is also not desirable since it would allocate slots to ATs with no data to receive and, hence, would not be work-conserving. We propose the following Parallel PF (PPF) algorithm that uses a backlog-unaware scheduler instance only to remove the undue advantage an "on-off" user receives at the beginning of "on" periods. A normal instance of PF drives slot allocation. The parallel instance of PF assumes all ATs are backlogged and executes simultaneously. We use Ap[t] to refer to the A[t] values maintained by the parallel instance. When a previously idle AT becomes backlogged, all A[t] values are reset to the corresponding Ap[t] values. Thus, when a previously idle AT becomes backlogged, differences in the achieved throughput of backlogged and idle ATs are forgotten. Also, notice that as long as no idle AT becomes backlogged, PPF is equivalent to PF. To test whether PPF is vulnerable to "on-off" traffic patterns, we recreated our laboratory-based setup (see [6] and Figure 1 (Left)) using ns-2 simulations with two ATs, AT1 and AT2. AT1 received a long-lived TCP flow and AT2 received a (malicious) "on-off" UDP flow consisting of 225 KB bursts sent at various inter-burst time periods. The simulations used a wireless link, governed by PF or PPF, that connected to a wired 100 Mbps link with a mean round trip time of 250 ms. We assigned achievable data rates based on measurements in a commercial EV-DO network. To collect these 30-minute long traces, we used the Qualcomm CDMA Air Interface Tester [12] software on a stationary AT and a mobile AT
moving at an average speed of 40 mph. We plot the TCP goodput obtained with PF and PPF in Figure 1 (Right). The results clearly show that, with PPF, the TCP goodput is not affected by the "on-off" flow. In fact, by not causing the UDP flow to have low A[t] values, PPF better implements channel-awareness. This is why the TCP goodput with PPF is slightly higher than the goodput with PF and a CBR flow consisting of periodically-sent UDP packets.
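As an illustration only, the following sketch captures the PPF bookkeeping described in this section, assuming invented variable names and a per-slot interface; it is not the authors' implementation:

```python
ALPHA = 0.001

class PPFScheduler:
    """Minimal sketch of Parallel PF: a normal PF instance drives slot
    allocation; a parallel, backlog-unaware instance tracks Ap[i]."""

    def __init__(self, n, init=1000.0):
        self.A = [init] * n    # normal PF averages (drive allocation)
        self.Ap = [init] * n   # parallel instance: assumes all backlogged
        self.was_backlogged = [False] * n

    def step(self, R, backlogged):
        # Reset A to Ap whenever a previously idle AT becomes backlogged,
        # forgetting throughput differences accumulated while it was idle.
        for i, b in enumerate(backlogged):
            if b and not self.was_backlogged[i]:
                self.A = list(self.Ap)
                break
        self.was_backlogged = list(backlogged)

        # Normal instance allocates among backlogged ATs only.
        candidates = [i for i, b in enumerate(backlogged) if b]
        winner = (max(candidates, key=lambda i: R[i] / self.A[i])
                  if candidates else None)

        # Parallel instance pretends every AT is backlogged.
        p_winner = max(range(len(R)), key=lambda i: R[i] / self.Ap[i])
        for i in range(len(R)):
            self.A[i] = self.A[i] * (1 - ALPHA) + (ALPHA * R[i] if i == winner else 0.0)
            self.Ap[i] = self.Ap[i] * (1 - ALPHA) + (ALPHA * R[i] if i == p_winner else 0.0)
        return winner
```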
4 Conclusions and Future Work
In this paper, we focused on the starvation induced by PF due to "on-off" traffic and showed that a parallel PF instance can eliminate such starvation. The parallel instance is a fairly general mechanism that can be used with non-PF schedulers. Indeed, in the longer version of this paper [7], we designed a quantile-based scheduling algorithm that is more robust than PF under other starvation-inducing scenarios and showed that a parallel instance endows it with robustness to "on-off" traffic too.
References 1. Bonald, T.: A Score-based Opportunistic Scheduler for Fading Radio Channels. In: Proc. of European Wireless (2004) 2. Viswanath, P., Tse, D., Laroia, R.: Opportunistic Beamforming using Dumb Antennas. IEEE Transactions on Information Theory 48 (June 2002) 1277-1294 3. Shakkottai, S., Stolyar, A.: Scheduling Algorithms for a Mixture of Real-time and Non-real-time Data in HDR. In: Proc. of ITC-17 (September 2001) 4. Borst, S., Whiting, P.: Dynamic Rate Control Algorithms for HDR Throughput Optimization. In: Proc. of INFOCOM (2001) 5. Jalali, A., Padovani, R., Pankaj, R.: Data Throughput of CDMA-HDR: A High Efficiency-high Data Rate Personal Communication Wireless System. Proc. of IEEE Vehicular Technology Conference 3 (May 2000) 1854-1858 6. Bali, S., Machiraju, S., Zang, H., Frost, V.: A Measurement Study of Scheduler-based Attacks in 3G Wireless Networks. In: Proc. of Passive and Active Measurement (PAM) Conference (2007) 7. Bali, S., Machiraju, S., Zang, H., Frost, V.: On the Performance Implications of Proportional Fairness (PF) in 3G Wireless Networks. Technical Report RR06-ATL-040624, Sprint ATL (2006) 8. Park, D., Seo, H., Kwon, H., Lee, B.G.: A New Wireless Packet Scheduling Algorithm based on the CDF of User Transmission Rates. In: Proc. of GLOBECOM (2003) 9. Patil, S., de Veciana, G.: Measurement-based Opportunistic Feedback and Scheduling for Wireless Systems. In: Proc. of Annual Allerton Conference on Communication, Control and Computing (2005) 10. Bender, P., Black, P., Grob, M., Padovani, R., Sindhushayana, N., Viterbi, A.: CDMA/HDR: A Bandwidth-efficient High-speed Wireless Data Service for Nomadic Users. IEEE Communications Magazine 38 (July 2000) 70-77 11. Kushner, H., Whiting, P.: Convergence of Proportional-Fair Sharing Algorithms Under General Conditions. IEEE Trans. on Wireless Communications (July 2004) 12. CDMA Air Interface Tester: http://www.cdmatech.com/download library/pdf/CAIT.pdf (2006)
A Voluntary Relaying MAC Protocol for Multi-rate Wireless Local Area Networks
Jaeeun Na, Yeonkwon Jeong, and Joongsoo Ma
School of Engineering, Information and Communications University, Daejeon, 305-732, Korea
{davini02,ykwjeong,jsma}@icu.ac.kr
Abstract. To exploit the multi-rate capability as well as improve performance in wireless local area networks (WLANs), many mechanisms have been proposed at the IEEE 802.11 medium access control (MAC) layer. However, no effort has been invested in exploiting the multi-rate capability for the power saving mechanism at the MAC layer. In this paper, we propose a Voluntary Relaying MAC Protocol, called VRMP, to achieve both performance improvement and power saving by leveraging the multi-rate capability. In the voluntary relaying scheme, if a node can support a low-rate node's data packet at a higher rate and has sufficient power, then, after cooperatively sending the data packet at the higher rate, all nodes go into sleep mode as quickly as possible to reduce power consumption. Simulation results show that VRMP improves throughput by 30-60% and reduces power consumption by 10-55% compared with the legacy mechanism.
1 Introduction
The IEEE 802.11 standard for wireless LANs [1] provides a multi-rate capability at the physical layer (PHY) to support higher bandwidth by using different modulation schemes. For example, IEEE 802.11b supports data rates of 1, 2, 5.5, and 11 Mbps, which are inversely proportional to the transmission distance between sender and receiver. To improve performance by exploiting the multi-rate capability at the medium access control (MAC) layer [2], many relaying MAC protocols [3, 4, 5] have been proposed recently. This approach replaces a low-rate transmission with two higher-rate transmissions by using an intermediate node as a relay node. However, these solutions only consider how to transmit data packets at a higher rate through a relay node between sender and receiver. In addition, a low-rate node selects a relay node without regard to the mobility and power status of the relay node. In some cases, a relay node may consume more battery power due to the packet transmissions of other nodes. Moreover, as the number of nodes increases, the overheads of overhearing other nodes' packets and maintaining the table increase proportionally. Hence, this paper proposes a Voluntary Relaying MAC Protocol (VRMP).
This research was supported by MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment). (IITA2006-C1090-0603-0015).
Fig. 1. The operation of VRMP (message exchange among nodes A-D across the ATIM and data windows at 2, 5.5, and 11 Mbps)
VRMP leverages the multi-rate capability together with the power saving mode (Section 2). In VRMP, the relay node voluntarily notifies other low-rate nodes that it allows indirect data transmission via itself. After the relayed data transmission, nodes enter doze mode to save unnecessary power. Thus, VRMP not only increases network throughput but also reduces average power consumption (Section 3).
2 The Voluntary Relaying MAC Protocol
In this section, we propose the Voluntary Relaying MAC protocol (VRMP), which is based on the power saving mechanism (PSM) of 802.11b. The basic concept of VRMP is that every awake node quickly enters doze mode by cooperatively helping to transmit data packets at a higher rate. To save the unnecessary idle power of the legacy PSM, a sender or receiver can go into doze mode when it has no more data packets to send or receive, including other nodes' relayed data packets. Each node also maintains a candidate table, valid only for the current beacon interval, to assist the low-rate transmissions of other nodes. In addition, we restrict each node to assisting only one neighbor node, to avoid excessive power consumption from relaying many other data packets. The VRMP operation is illustrated in Figure 1 and is explained in detail by the following steps: 1) In the ATIM (Announcement Traffic Indication Message) window, the sender transmits an ATIM frame and the receiver decides the direct rate (Rd) by measuring the received signal strength. The receiver then notifies the sender and neighbor nodes of Rd via an ACK (Acknowledgement) frame. Every node overhears all ongoing ATIM/ACK frames and determines the two indirect rates between the sender (or receiver) and itself,
respectively. If Rd is lower than 2 Mbps and the node supports a higher indirect rate than Rd, it adds the new candidate node (CNs) information to its candidate table. 2) When the ATIM window is over, a node checks whether its own data packet will be transmitted at a high rate, whether it has sufficient power, and whether it has at least one CNs in its table. If these conditions are satisfied, the node is able to relay other packets and is called a voluntary node (VN). Voluntary nodes have high priority for data transmission, with a smaller contention window size, since they have to notify their CNs that they will relay the low-rate data packets at a higher rate before the CNs sends out its data packet at the low rate. 3) If the voluntary node is the sender (VNs), it selects the optimal CNs with the highest indirect rates and transmits a Voluntary RTS (VRTS) frame, which piggybacks the helped CNs's address and the supported indirect rates onto an RTS (Request To Send) frame. If it is the receiver (VNr), it transmits a VCTS (Voluntary Clear To Send) frame only after it receives an RTS frame from its sender. At this time, if the selected CNs overhears the VRTS or VCTS frame, it prepares for data transmission regardless of its remaining backoff time. Next, the voluntary node begins its own data transmission with its counterpart node. After that, the CNs immediately starts the relayed transmission of its own data packet via the voluntary node, without contention. Finally, the voluntary node and candidate node go into doze mode only if they have no remaining data to transmit or receive.
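As an illustration of the step-2 check and step-3 candidate selection, consider the following sketch; the 2 Mbps threshold and the table contents follow the description above, while `min_power` and the example rates are our own assumptions:

```python
LOW_RATE_THRESHOLD = 2.0  # Mbps; a direct rate at or below this marks a low-rate node

def can_volunteer(own_rate, power_level, candidate_table, min_power=0.5):
    """Step 2 of VRMP: a node becomes a voluntary node (VN) only if its own
    data will be sent at a high rate, it has sufficient remaining power
    (min_power is an illustrative threshold), and it knows at least one
    candidate node (CNs) it can help."""
    return (own_rate > LOW_RATE_THRESHOLD
            and power_level >= min_power
            and len(candidate_table) > 0)

def pick_candidate(candidate_table):
    """Step 3: select the CNs with the highest pair of indirect rates,
    to be announced in the VRTS/VCTS frame."""
    return max(candidate_table, key=lambda c: (c["rate_to_vn"], c["rate_from_vn"]))

# Example: two overheard low-rate neighbors recorded during the ATIM window.
table = [
    {"addr": "CN-1", "rate_to_vn": 11.0, "rate_from_vn": 5.5},
    {"addr": "CN-2", "rate_to_vn": 5.5, "rate_from_vn": 5.5},
]
if can_volunteer(own_rate=11.0, power_level=0.8, candidate_table=table):
    print("relay via", pick_candidate(table)["addr"])
```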
3 Simulation Results
In this section, we evaluate the performance of VRMP through simulation. Similar to [5], the distance thresholds for 11, 5.5, and 2 Mbps are 100 m, 200 m, and 250 m, respectively. The data packet length is fixed at 512 bytes. The nodes are randomly distributed in a 250 m × 250 m area. The ATIM window size is 20 or 30 ms and the beacon interval is 100 ms. For calculating power consumption, we use 1.65 W, 1.4 W, 1.15 W, and 0.045 W as the power consumed by the MAC layer in the transmit, receive, and idle modes and the doze state, respectively. Figure 2 shows the aggregate throughput during one beacon interval when using the legacy PSM and VRMP. Comparing VRMP with the legacy PSM, the two achieve the same throughput when the number of nodes is small, because the data window is long enough to transmit all the senders' data packets. Otherwise, VRMP outperforms the legacy PSM (by 30-60%) when the number of nodes is more than 30, due to higher-rate data transmission and the voluntary relaying procedure. Figure 3 compares the average power consumption of a node. VRMP saves almost 10-55% power compared with PSM because nodes enter doze mode rapidly once they have transmitted or received all their own and relayed packets. In contrast, in PSM, all nodes must stay awake to transmit data packets during the entire beacon interval. For VRMP with a 20 ms ATIM window, power consumption is reduced when there are more than 50 nodes: many nodes cannot send an ATIM frame during the short ATIM window, so most of them go to sleep until the next beacon interval.
Fig. 2. Aggregate throughput (Kbps) vs. the number of stations, for VRMP and the legacy PSM with 20 ms and 30 ms ATIM windows

Fig. 3. Power consumption (mW) vs. the number of stations, for VRMP and the legacy PSM with 20 ms and 30 ms ATIM windows
However, this may lead to longer transmission delay. VRMP with a 30 ms ATIM window results in higher power consumption than with 20 ms because all nodes must remain awake for the longer duration of the ATIM window.
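To make the energy accounting concrete, a node's per-beacon-interval energy is a weighted sum of the time spent in each state; the sketch below is ours, and the example state-time splits are illustrative rather than simulation outputs:

```python
# Power values from the simulation setup (W): transmit, receive, idle, doze.
POWER = {"tx": 1.65, "rx": 1.4, "idle": 1.15, "doze": 0.045}

def interval_energy(durations):
    """Energy (joules) consumed over one beacon interval, given the time
    (seconds) spent in each state, e.g. {'tx': 0.004, 'doze': 0.08, ...}."""
    return sum(POWER[state] * t for state, t in durations.items())

# A node that finishes its exchange early and dozes for most of the 100 ms
# beacon interval (illustrative split) versus one that idles throughout.
print(interval_energy({"tx": 0.004, "rx": 0.004, "idle": 0.012, "doze": 0.08}))
print(interval_energy({"idle": 0.1}))
```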
4 Conclusion
In this paper, we propose the voluntary relaying MAC protocol, which exploits the multi-rate capability together with the power saving mechanism for WLANs. Neighbor nodes voluntarily help a low-rate node deliver its data packets faster through indirect transmission. This also allows all nodes to enter doze mode quickly by voluntarily helping to transmit data packets. Simulation results show that the proposed scheme outperforms the legacy PSM in terms of both throughput and power consumption. The proposed mechanism does not need a complex procedure for relaying data transmission and can be applied to mobile environments and to 802.11a/b/g.
References 1. “IEEE Std. 802.11b-1999, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: High-Speed Physical Layer Extension in the 2.4GHz Band,” 1999. 2. M. Heusse, F. Rousseau, G. Berger-Sabbatel, and A. Duda, “Performance anomaly of 802.11b,” in Proc. of IEEE INFOCOM, San Francisco, USA, March 2003. 3. P. Liu, Z. Tao, and S. S. Panwar, “A Cooperative MAC Protocol for Wireless Local Area Networks,” in Proc. of the 2005 IEEE International Conference on Communications (ICC), Seoul, Korea, May 2005. 4. L.M. Feeney, D. Hollos, H. Karl, M. Kubisch, and S. Mengesha, “A geometric derivation of the probability of finding a relay in multi-rate networks,” in Proc. of 3rd IFIP-TC6 Networking Conference (Networking 2004), Athens, Greece, May 2004. 5. H. Zhu and G. Cao, “rDCF: A relay-enabled medium access control protocol for wireless ad hoc networks,” in Proc. of IEEE INFOCOM, Miami, FL, Mar. 2005.
Throughput Analysis Considering Capture Effect in IEEE 802.11 Networks*
Ge Xiaohu, Yan Dong, and Zhu Yaoting
Dept. of Electronics & Information Engineering, Huazhong University of Science & Technology, Wuhan, P.R. China
{xhge,dongyan,ytzhu}@mail.hust.edu.cn
Abstract. The impact of the capture effect on IEEE 802.11 networks with different transmission speeds is investigated in this paper. A new Markov chain model considering the capture effect for the binary exponential back-off scheme in the MAC layer is proposed for the first time. Based on the new Markov chain model, a new throughput model is proposed, and the impact of the capture effect on throughput is analyzed under different transmission speeds. The performance analysis shows that, in the RTS/CTS scheme, the throughput improvement caused by the capture effect is larger in high-speed networks than in low-speed networks. Keywords: 802.11 protocols, MAC layer, Markov chain, capture effect.
1 Introduction

IEEE 802.11 Wireless Local Area Networks (WLANs) have been standardized and rapidly deployed in the new century [1]. The medium access control (MAC) protocol used in IEEE 802.11 WLANs is called the distributed coordination function (DCF). Unsuccessful packets are retransmitted according to a binary exponential back-off (BEB) policy. In wireless communications, when multiple mobile stations send their data packets simultaneously to the same access point, a packet collision occurs. However, one of the collided packets can still be correctly received if its SINR seen by the access point is high enough. This phenomenon is called the capture effect [2]. The throughput performance of IEEE 802.11 WLANs has been extensively studied in the literature [3,4]. This previous related work does not consider the impact of the capture effect on the IEEE 802.11 BEB back-off policy in the modeling and analysis of throughput performance. In contrast, we consider this impact in this paper. Incorporating the capture effect and physical layer parameters, a new Markov chain based access model is used to analyze the throughput performance of IEEE 802.11 WLANs at different transmission speeds. The rest of the paper is organized as follows. In Section 2, a new throughput model based on the new Markov chain model is proposed, and the improvement percentage
Supported by National Science Foundation of China under Grant No. 60610106111.
of throughput caused by the capture effect is reported for different transmission rates. Finally, in Section 3, we conclude the paper and introduce some subjects for future research.
2 Performance Analysis in Different Transmission Speeds

2.1 Throughput Model

To analyze the MAC-layer performance of a WLAN BSS, we can build a discrete Markov chain model to characterize the temporal evolution of each node's status at the instants of state changes. In contrast to the previous related work, which does not consider the relationship between the capture and transmission probabilities in the mathematical modelling, we design a new Markov chain model with the capture effect and derive closed-form analytical solutions [5]. Based on the Markov chain model derived from [5], the normalized throughput of the AP can be calculated as the ratio of the time occupied by the transmitted information to the interval between two consecutive transmissions:

    S = P_s P_tr E[P] / [ (1 - P_tr) σ + P_tr P_s T_s + P_tr (1 - P_s) T_f ]    (1)

where E[P] is the average packet payload size; P_tr is the probability that there is at least one transmission in the interval; P_s is the conditional successful transmission probability; P_tr P_s is the probability of a successful transmission occurring in the interval, and the average amount of payload information successfully transmitted in the interval is P_tr P_s E[P]. The average length of the interval is readily obtained by considering that, with probability 1 - P_tr, the interval is empty, and with probability P_tr (1 - P_s) it contains a failed transmission. T_s is the average time the channel is
sensed busy because of a successful transmission, T_f is the average time the channel is sensed busy by a failed transmission, and σ is the duration of an empty slot time. For simplicity, we assume that all nodes in this network have a fixed frame size and transmission rate, in order to keep E[P], T_s, T_f, and σ constant. We can then obtain a new expression for the throughput S considering the capture effect from (1). In order to investigate the capture effect in IEEE 802.11 networks with different transmission speeds (1 Mbit/s and 11 Mbit/s), based on the RTS/CTS access scheme, the standard parameters of IEEE 802.11b networks have been used in the following simulations.
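Equation (1) translates directly into code. The sketch below is our own, and the example arguments are placeholders rather than the 802.11b parameters used in the simulations:

```python
def normalized_throughput(p_tr, p_s, E_P, T_s, T_f, sigma):
    """Eq. (1): ratio of the payload time successfully transmitted to the
    average length of a (possibly empty) interval between transmissions."""
    numerator = p_s * p_tr * E_P
    denominator = (1 - p_tr) * sigma + p_tr * p_s * T_s + p_tr * (1 - p_s) * T_f
    return numerator / denominator

# Placeholder values for illustration only (payload and times in
# consistent units, e.g. microseconds of channel occupancy).
print(normalized_throughput(p_tr=0.3, p_s=0.8, E_P=8184.0,
                            T_s=1300.0, T_f=1100.0, sigma=20.0))
```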
2.2 Performance Analysis of the Capture Effect in Different Transmission Speeds

In low-speed networks (such as a 1 Mbit/s transmission speed), Fig. 1(a) shows the improvement percentage of throughput when considering the capture effect, compared with the throughput without the capture effect. This improvement percentage of
Fig. 1. Throughput analysis of 802.11 networks at different speeds: (a) improvement percentage of throughput (%) vs. the initial size of the back-off window in low-speed networks; (b) the same in high-speed networks; (c) the improvement percentage difference of throughput (%) between them. Each panel shows curves for n = 5, 10, 20, and 50 stations.
throughput is plotted as a function of the initial size of the back-off window. It can be seen that the improvement percentage of throughput caused by the capture effect decreases as the initial size of the back-off window increases. At different network scales the improvement percentage differs, increasing with the size of the network. So the impact of the capture effect in low-speed
networks increases with the size of the network, but decreases with the initial size of the back-off window. Similar results are obtained in high-speed networks (such as an 11 Mbit/s transmission speed), as shown in Fig. 1(b). In order to investigate the impact of the capture effect at different transmission speeds, we compare the improvement percentage difference of throughput between the high-speed networks and the low-speed networks. The improvement percentage difference of throughput, i.e., the values of the improvement percentage in Fig. 1(a) minus the values of the improvement percentage in Fig. 1(b), is obtained and plotted in Fig. 1(c). Fig. 1(c) shows that the capture effect has more influence on high-speed networks than on low-speed networks.
3 Conclusion

In this paper, a new IEEE 802.11b throughput model has been proposed based on the new Markov chain model [5]. By analyzing the throughput improvement in the RTS/CTS access scheme, we find that the improvement percentage of throughput in high-speed networks is larger than that in low-speed networks, which means that the capture effect has more influence on high-speed networks than on low-speed networks in terms of throughput.
References 1. B. P. Crow, I. Widjaja, J. G. Kim, and P. T. Sakai, "IEEE 802.11 wireless local area networks," IEEE Communications Magazine, vol. 35, Sept. 1997, pp. 116-126. 2. Z. Hadzi-Velkov and B. Spasenovski, "Capture Effect with Diversity in IEEE 802.11b DCF," IEEE International Symposium on Computers and Communications (ISCC 2003), vol. 2, 30 June-3 July 2003, pp. 699-704. 3. G. Bianchi, "Performance analysis of the IEEE 802.11 distributed coordination function," IEEE J. Selected Areas in Communications, vol. 18, Mar. 2000, pp. 535-547. 4. S. T. Sheu and T. F. Sheu, "A bandwidth allocation/sharing/extension protocol for multimedia over IEEE 802.11 ad hoc wireless LANs," IEEE J. Selected Areas in Communications, vol. 19, Oct. 2001, pp. 2065-2080. 5. X. Ge, Y. Dong, Y. Zhu, "Throughput Model of IEEE 802.11 Networks with Capture Effect," IEEE WiCom 2006, Wuhan, China, 22-24 Sept. 2006.
Performance Improvement of IEEE 802.15.4 Beacon-Enabled WPAN with Superframe Adaptation Via Traffic Indication
Zeeshan Hameed Mir¹, Changsu Suh², and Young-Bae Ko¹
¹ College of Information & Communication, Ajou University, Republic of Korea
² R & D Department, Hanback Electronic, Republic of Korea
[email protected], [email protected], [email protected]
Abstract. The IEEE 802.15.4 standard provides a widely accepted solution for low-cost and low-power wireless communications. Despite its design support for low duty-cycle operation, the fixed superframe size in the Beacon-enabled mode limits its capabilities due to two contrasting goals: energy efficiency and higher data throughput. In this paper, we propose an enhancement of the IEEE 802.15.4 Beacon-enabled mode which adaptively adjusts the active period based on traffic information. In order to detect data traffic in the network, the proposed scheme utilizes the IEEE 802.15.4 CCA function. Evaluation results show that our scheme can improve energy efficiency as well as data throughput.
1 Introduction
The IEEE 802.15.4 WPAN (Wireless Personal Area Network) standard [1] supports low-cost and low-power wireless connectivity among resource-limited devices. In particular, the IEEE 802.15.4 MAC achieves low-duty-cycle operation by means of its Beacon-enabled mode. In this mode, a PAN coordinator periodically disseminates a superframe structure bounded by a beacon frame into the network and manages its active/inactive period. Associated devices are allowed to communicate in the active period and conserve energy by turning off their transceivers during the inactive period. However, a fixed duration of the active period limits overall performance in two ways: idle listening [2] and lower data throughput. In this paper, we propose a novel scheme for mitigating the idle listening problem and improving data throughput in the current IEEE 802.15.4 Beacon-enabled mode. In our proposed scheme, a coordinator can adaptively adjust the active period based on the data traffic information of associated devices. When the data traffic load is low, the active period is reduced to conserve energy, regardless of the superframe duration. However, with a higher traffic
This work was in part supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC support program supervised by IITA, (IITA2006-C1090-0602-0011) and (IITA-2006-C1090-0603-0015).
load, the active period is lengthened up to the total beacon interval to improve data throughput. For the performance evaluation, we have implemented our proposed scheme as well as the full IEEE 802.15.4 standard in the TOSSIM [3] simulator. Evaluation results show that the proposed scheme outperforms the original in terms of both energy efficiency and data throughput.
2 The Proposed Scheme
To reduce energy consumption while still improving data throughput, the proposed scheme adaptively adjusts the active period based on the data traffic information. In our description, we use the term sentinel duration to refer to a special epoch for detecting the traffic information.

2.1 Adaptive Active Duration
In our scheme, devices having no data traffic are not required to continuously maintain an active state even during their superframe duration. The sentinel duration is performed periodically, with a period equivalent to a superframe duration, within a single beacon interval, as shown in Fig. 1. In the proposed scheme, the value of the Superframe Order (SO) is set to be smaller than the Beacon Order (BO); therefore, several sentinel durations can occur in one beacon interval. At the start of a sentinel duration, if a node finds pending data traffic in its queue buffer, it tries to convey the traffic information to its coordinator. On detecting traffic from devices in some sentinel duration, the coordinator maintains an active RF state to receive data frames until the next sentinel duration. If there is no data traffic in a sentinel duration, the coordinator and devices can enter or continue to maintain the sleep state accordingly. Devices can continuously carry out the sentinel durations so that pending data traffic can be transmitted to the coordinator even after the superframe duration is over. In this way, data throughput can be increased, and latency decreased, when the data traffic load is high.
Fig. 1. Performing sentinel durations in a beacon interval
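Read this way, the coordinator makes a simple per-sentinel decision; the sketch below is our reading of the scheme, assuming 2^(BO-SO) = 8 sentinel durations per beacon interval for SO = 3 and BO = 6:

```python
def coordinator_schedule(sentinel_traffic):
    """Sketch of the coordinator's per-sentinel-duration decision: given
    booleans saying whether traffic was indicated in each sentinel duration
    of one beacon interval, return the radio state kept until the next one."""
    return ["AWAKE" if busy else "SLEEP" for busy in sentinel_traffic]

# With SO = 3 and BO = 6 there are 2**(6 - 3) = 8 sentinel durations per
# beacon interval (our reading of the scheme); the coordinator stays awake
# only after the sentinel durations in which traffic was detected.
print(coordinator_schedule([True, False, False, True, False, False, False, False]))
```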
2.2 Traffic Indication Technique
Our traffic indication technique utilizes the IEEE 802.15.4 CCA function. At the start of a sentinel duration, nodes having data traffic start to transmit it based on the slotted CSMA/CA, as in the original 802.15.4. To check for the existence of data traffic, the traffic indication technique simply waits for a packet frame's signal during the maximum contention period. The sentinel duration (T_SD) of the traffic indication technique can be calculated as

    T_SD = (2^BE - 1) × aUnitBackoffPeriod    (1)

where BE is the back-off exponent, whose maximum value is 5 in the IEEE 802.15.4 standard; T_SD is therefore 620 symbols. Since a transmitted packet must appear within at most 620 symbols, the traffic indication technique can detect the existence of traffic within this duration. If one of the CCAs becomes busy during T_SD, the traffic indication technique decides that there is data traffic. However, if no data is generated during T_SD, nodes decide that there is no traffic in this active period and enter the sleep mode. Since our traffic indication technique transmits no additional traffic indicator frames, it is consistent with the original IEEE 802.15.4 MAC, with no conflicts and no control packet transmission overheads.
Performance Evaluations
We have evaluated the performance of proposed scheme and compared it with the original 802.15.4 by using TOSSIM [3]. In our simulation model, we evaluated the performance of these two schemes in a star topology. One device is chosen as the coordinator while the other devices act as the associated devices (i.e., general devices). The data frame size is set to 30 bytes. The total simulation time is 200 seconds, and the energy consumption model is consistent to the one presented in [2]. We assume that all nodes are already associated with a coordinator, and only 20% of the nodes generate data frame every 1 second. We have simulated with the various number of devices which can influence of data traffic load. In these simulations, the values for SO and BO are 3 and 6, respectively. Fig. 2 plots the values for different performance matrices as a function of number of nodes. In terms of the aggregate throughput, our proposed scheme always performs better than the original 802.15.4, as shown in Fig. 2(a). Since the proposed scheme has an adaptive active duration based on the data traffic information, it is not limited by superframe duration, unlike the original 802.15.4. During the high data traffic load, the active duration lengthens as long as the Beacon Interval (BI), whereas in low data traffic scenarios it becomes as short as several sentinel durations. This observation can lead to the better performance of our schemes in terms of a delivery ratio as well. Fig. 2(b) shows the total energy consumption of the two protocols. Our proposed scheme can reduce the idle listening problem by adjusting the active duration. Therefore, 80% nodes having no data traffic can save energy in our scheme while nodes in the original 802.15.4 have to maintain active state for a full superframe duration.
Fig. 2. Simulation results according to the number of nodes with SO = 3 and BO = 6: (a) aggregate throughput (bytes); (b) total energy consumption (W), for the original 802.15.4 and the proposed scheme
4 Conclusion
In this paper, we propose a new IEEE 802.15.4 Beacon-enabled mode. We show that utilizing the traffic information made available by our traffic indication technique helps to adjust the active duration, in order to conserve extra energy consumption and to achieve higher throughput. The evaluation results, compared with the original IEEE 802.15.4, show that our proposed scheme performs better in terms of both energy consumption and throughput.
References 1. IEEE Std 802.15.4: Wireless Medium Access Control and Physical Layer Specification for Low Rate Wireless Personal Area Networks, Dec. 2003. 2. W. Ye, J. Heidemann, and D. Estrin, "Medium Access Control with Coordinated Adaptive Sleeping for Wireless Sensor Networks," IEEE/ACM Transactions on Networking, June 2004. 3. P. Levis, N. Lee, M. Welsh, and D. Culler, "TOSSIM: Accurate and Scalable Simulation of Entire TinyOS Applications," in ACM SenSys, Nov. 2003.
Analysis of WLAN Traffic in the Wild
Caleb Phillips and Suresh Singh
Portland State University, 1900 SW 4th Avenue, Portland, Oregon, 97201, USA
{calebp,singh}@cs.pdx.edu
Abstract. In this paper, we analyze traffic seen at public WLANs “in the wild” where we do not have access to any of the backend infrastructure. We study six such traces collected around Portland, Oregon and conduct an analysis of fine time scale (second or fraction of a second) packet, flow, and error characteristics of these networks. Keywords: Measurement, WLAN, passive monitoring, traffic modeling.
1 Introduction
Analysis of the MAC-level behavior of WLANs is required in order to better deploy and design future systems. To this end, the collection and analysis of traffic traces is an important task. The main research reported in this paper analyzes traffic traces collected using a commercial sniffer, VWave [1], which has nanosecond time resolution. We characterize the packet-level and flow-level behavior of these traces and note significant similarities. This result is good news, in that the statistical models we derive can be widely applied in simulations. Our work differs from prior work, which has considered congested WLANs at conferences and long-term, coarse-resolution datasets [2], in favor of studying lightly loaded public hotspots at high resolution, which we conjecture are the norm and not an exception.¹
2 Data Collection Methodology
The Veriwave WT20 hardware [1] consists of two 802.11 reference radios, real-time Linux, and two processors. The WT20 provides nanosecond resolution, and it logs the time when it began seeing a frame and the time when the frame finished arriving. We face two challenges in data collection. The first is the placement of the VWave sniffer: because it has a lower effective receiver sensitivity than most access points today (-75 dBm versus -90 dBm), we must prevent a large possible packet loss with careful antenna choice and placement. The second problem is practical - we had to obtain permission from the three merchants and further needed to
This work was funded by the NSF under grant no. 0325014.
ensure that our equipment was as unobtrusive as possible so as not to affect the "normal" behavior of the users. We collected data at six different locations, of which three were located on-campus and three off-campus. Table 1 lists the traces with some gross statistics.

Table 1. Gross statistics of the captures

Capture Name               Length (hours)  Total Pkts        Range                   802.11 Mgt.  IP       TCP     UDP     Users (Mean/Max)
psu-cs (at PSU)            1               127901 (35 pps)   6-771 pps, 7k-7Mbps     41543        73473    8803    5965    2.6 / 5
library (at PSU)           4               696811 (48 pps)   5-672 pps, 4k-7Mbps     159699       297481   190962  105405  2.1 / 3
cafeteria (at PSU)         4               1431897 (99 pps)  7-1318 pps, 8k-10Mbps   169541       1026304  911549  108474  10.2 / 19
pioneer-sq (outdoor)       4               307880 (21 pps)   1-265 pps, 1k-3Mbps     206526       99011    94066   4734    2.5 / 4
urban-grind (coffee shop)  2               490528 (58 pps)   10-355 pps, 6k-3Mbps    87423        390514   350034  38696   6.9 / 9
powell (book store)        4               762574 (53 pps)   8-296 pps, 6k-2Mbps     150622       565689   529228  20345   3.4 / 7
3 Detailed Data Analysis
Our analysis is organized into four categories: network load in terms of users and their residing times, analysis of MAC-layer errors, the packet arrival process, and finally flow arrival processes and duration times.

3.1 User Load
We consider the number of users over time and the average time spent by a user in the WLAN. We identify the presence of users by a successful DHCP ACK. User departures are indicated by the last message seen with a given MAC address. Table 1 gives the mean and maximum number of users for each capture. The second statistic we consider is the length of time users stay active in the WLAN (see Table 2). Residing time in four cases fits an exponential distribution ("exp") and in two cases fits the weibull distribution ("wbl"). The quality of the fits is very good as measured by the deviation metric Λ [3].² Indeed, for all the fits reported, Λ < 0.25.

Table 2. Residing time of users

Capture     mean   max    std   fit
psu-cs      1363s  3486s  1203  exp
library     5057   13001  4900  exp
cafeteria   3675   11878  3332  exp
pioneer-sq  4124   12911  4744  wbl, [a, b] = [3276, 0.68]
urban       4896   8342   2969  wbl, [a, b] = [5283, 1.44]
powell      2471   8081   2181  exp
This metric for determining the quality of a fit for traffic analysis was first used in [3] where the author explains the rationale behind using this metric rather than a chi-square metric or other metrics.
3.2 Error Analysis
The lower receiver sensitivity of the VWave makes the FCS a poor choice for error analysis. Instead, for the remainder of this section we use MAC retransmissions as an indicator of error rather than the FCS value (this is consistent with [4]). We observe a moderate linear correlation between MAC retransmissions and load, with correlation coefficients of 0.53 and 0.54 for packets/sec and bytes/sec, respectively. We do see a general reduction in MAC retransmissions with improving RSSI, but the relationship does not have a clear fit. Hence, neither load nor signal strength can be conclusively named as a cause of error.
Fig. 1. PDF of the censored powell error data with a lognormal fit (density vs. retransmissions per second)
To fit a distribution to the error process, we compute the probability of error per second (i.e., for each second, the probability that a packet will be in error) and then use this set of data to fit a distribution. We note that the probability of no error is quite high and thus any direct distribution fitting will fail. We therefore resort to a simple technique where we censor the data. To explain this, consider Figure 1, which shows the PDF and a lognormal fit of the censored error data. The error data is censored as follows: we have 14400 seconds of data, of which 7700 seconds saw no MAC layer retransmissions (50%). The PDF shown corresponds only to the times when there were MAC layer retransmissions.

Table 3. Fit for censored error data. The second column represents the probability of zero retransmissions/second. The third column is the fit for the data when there is a non-zero probability of retransmission. For the fit parameters we use standard notation.

Capture    P(zero)  Fit for censored data
psu-cs     0.63     lognormal [μ, σ] = [1.8, 0.75], Λ = 0.09
library    0.66     lognormal [μ, σ] = [1.87, 0.75], Λ = 0.15
cafeteria  0.31     lognormal [μ, σ] = [2.0, 0.9], Λ = 0.07
pioneer    0.32     gamma [a, b] = [3.5, 5.3], Λ = 0.45
urban      0.65     lognormal [μ, σ] = [1.25, 0.85], Λ = 0.21
powell     0.5      lognormal [μ, σ] = [1.71, 0.93], Λ = 0.09

Table 3 summarizes the fit observed for all six traces after censoring. It is interesting to note that in all cases except one, the best fit for the censored data is a lognormal fit with parameters [μ, σ] that are relatively close. Indeed, the fits are very good as indicated by the deviation metric Λ. The one exceptional trace, pioneer, is our only capture of an outdoor node - this may serve to explain the different error process observed.
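The censor-and-fit procedure is easy to reproduce; the sketch below is our illustration using SciPy, where `sample` is synthetic data mimicking the powell trace and SciPy's lognorm parameters map to the paper's [μ, σ] as μ = log(scale) and σ = s:

```python
import numpy as np
from scipy import stats

def fit_censored_lognormal(retx_per_sec):
    """Censor out the zero-retransmission seconds, then fit a lognormal
    to the remaining values; returns (p_zero, mu, sigma)."""
    x = np.asarray(retx_per_sec, dtype=float)
    p_zero = np.mean(x == 0)             # e.g. 0.5 for the powell trace
    nonzero = x[x > 0]
    s, loc, scale = stats.lognorm.fit(nonzero, floc=0)
    return p_zero, np.log(scale), s      # mu = log(scale), sigma = s

# Synthetic 14400-second series: half the seconds see no retransmissions,
# the rest follow the lognormal parameters reported for the powell trace.
rng = np.random.default_rng(0)
sample = np.where(rng.random(14400) < 0.5, 0.0,
                  rng.lognormal(mean=1.71, sigma=0.93, size=14400))
print(fit_censored_lognormal(sample))
```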
3.3 Packet Arrival Analysis
The metric we consider here is the number of packets/second seen in each trace (the bytes/sec metric follows the same distribution in all six cases). Table 4 summarizes the best distributional fit for each of the six traces. We see that for half the traces t-location scale gives a good fit and for the other half inverse gaussian provides a good fit. Interestingly, the three traces following the inverse gaussian fit correspond to a cafeteria - one at the university, one at a bookstore, and a third which is a coffee shop. The three traces that follow t-location scale were generally characterized by few average users (2.1 - 2.6) and lower packet rates, which caused non-stationarity.

Table 4. Distribution fits for pkts/sec

Capture      Mean pkts/sec  Fit           Fit parameters                Deviation (quality)
psu-cs       35.3           t-loc scale   [μ, σ, ν] = [20, 5.16, 1.09]  Λ = 2.2
library      48.3           t-loc scale   [μ, σ, ν] = [32.6, 6.1, 1.1]  Λ = 0.62
cafeteria    99.3           inv gaussian  [μ, λ] = [99.3, 75.9]         Λ = 0.28
pioneer-sq   21.3           t-loc scale   [μ, σ, ν] = [14.1, 3.3, 1.1]  Λ = 0.79
urban-grind  58.1           inv gaussian  [μ, λ] = [58.1, 97.3]         Λ = 0.33
powell       52.9           inv gaussian  [μ, λ] = [52.9, 36.7]         Λ = 1.4

3.4 Flow Analysis
Flows are more representative of user behavior than are packet traces, and thus it is important to consider various flow metrics as well when comparing different traces. We use two flow metrics in this study - flow arrival rate (number of flows/sec) and flow duration (seconds). We do not consider the flow interarrival time distribution because the flow arrival rate metric is a cumulative metric based on the flow interarrival times. To determine flows, we proceeded as follows: we consider pairs of communicating IP address/port tuples and then identify as flows sequential packet exchanges where there were no gaps greater than t = 64 s. Flow duration is computed based on the time difference between the first and last packet seen. Table 5 summarizes the distribution fits for flows/sec and flow duration. Four traces exhibit the same negative binomial distribution for flow duration; the exceptions are the library and cafeteria traces. The flow arrival rate distributions, on the other hand, show much more variation.
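This flow definition is straightforward to operationalize. The sketch below is our illustration over hypothetical timestamps for a single pair of communicating IP address/port tuples:

```python
GAP = 64.0  # seconds; lulls longer than this start a new flow

def split_flows(timestamps, gap=GAP):
    """Given sorted packet timestamps for one IP/port tuple pair, return a
    list of (start, end, duration) flows split at gaps larger than `gap`."""
    if not timestamps:
        return []
    flows = []
    start = prev = timestamps[0]
    for t in timestamps[1:]:
        if t - prev > gap:
            flows.append((start, prev, prev - start))
            start = t
        prev = t
    flows.append((start, prev, prev - start))
    return flows

# Hypothetical timestamps: a TCP connection with a 70 s "thinking" lull
# becomes two flows under this definition.
print(split_flows([0.0, 1.2, 3.5, 73.9, 75.0]))
```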
Table 5. Flow distribution fit

Capture      Flow arrival rate                                            Flow duration
psu-cs       exponential, μ = 11.18, Λ = 2.9                              neg binomial, [r, p] = [19.1, 0.69], Λ = 0.26
library      t-loc scale, [μ, σ, ν] = [6, 4.6, 0.7], Λ = 1.1              inv gaussian, [μ, λ] = [9.5, 1.9], Λ = 2.3
cafeteria    exponential, μ = 9.6 (flows/100s), Λ = 0.8                   weibull, [a, b] = [6.7, 0.78], Λ = 6.5
pioneer-sq   generalized extreme value, [k, σ, μ] = [2, 4.3, 1.9], Λ = 2  neg binomial, [r, p] = [0.56, 0.06], Λ = 0.9
urban-grind  neg binomial, [r, p] = [0.1, 0.005], Λ = 0.18                neg binomial, [r, p] = [0.58, 0.35], Λ = 2.5
powell       neg binomial, [r, p] = [0.018, 0.007], Λ = 2.9               neg binomial, [r, p] = [0.7, 0.03], Λ = 11
Our results for the flow arrival process contrast sharply with the results of [5] where the authors find that a weibull distribution fits the observed data. However, their result was based on an hourly scale (i.e., number of flow arrivals/hour) whereas our results model flow arrivals/sec. Our results can thus be used for fine time-grained modeling while their results can be used at larger time scales (hours, days). A second result from [5] shows that flow duration as measured by number of packets in the flow is lognormal. We measure flow duration by time and in our case we generally see a negative binomial distribution with two exceptions. One possible reason for the difference in results is the definition of flows. Unlike [5], we split a TCP flow into more flows if there is a lull in packets exceeding 64s. In other words, idle times (“thinking”) may result in separate flows for the same TCP connection. This model of defining flows has previously been used in Internet traffic modeling [6].
4 Conclusion
The broad results of our analysis are as follows: user residing times can be well modeled by an exponential distribution, packet errors generally follow a lognormal distribution (censored data), load in packets/sec can be modeled using an inverse gaussian distribution (though for very lightly loaded networks t-location scale provides a better fit), flow durations are mostly negative binomial, while flow rates do not follow a common distribution. We can conclude that despite the diversity of the WLANs monitored, the users generally behave similarly, which is a very useful result from the point of view of future analysis.
References 1. VeriWave: http://www.veriwave.com (February 13, 2007) 2. CRAWDAD: http://crawdad.cs.dartmouth.edu (February 13, 2007) 3. Paxson, V.: Empirically derived analytic models of wide-area TCP connections. IEEE/ACM Transactions on Networking 2(4) (August 1994) 316-336
4. Rodrig, M., Reis, C., Mahajan, R., Wetherall, D., Zahorjan, J.: Measurement-based characterization of 802.11 in a hotspot setting. In: Proceedings of the ACM SIGCOMM 2005 Workshop on Experimental Approaches to Wireless Network Design and Analysis (E-WIND-05) (2005) 5. Meng, X., Wong, S., Yuan, Y., Lu, S.: Characterizing flows in large wireless data networks. In: ACM MOBICOM (Sept 26 - Oct 1, 2004) 6. Zhang, Z.L., Ribeiro, V., Moon, S., Diot, C.: Small-time scaling behaviors of internet backbone traffic: an empirical study. In: IEEE INFOCOM (2003) 1826-1836
Enhanced Rate Adaptation Schemes with Collision Awareness
Seongkwan Kim¹, Sunghyun Choi¹, Daji Qiao², and Jongseok Kim³
¹ School of Electrical Engineering and INMC, Seoul National University, Korea
² Department of Electrical and Computer Engineering, Iowa State University, USA
³ National Defense and Communications Institution, Korea
[email protected], [email protected], [email protected], [email protected]
Abstract. Many existing rate adaptation schemes in IEEE 802.11 Wireless LANs cause severe throughput degradation because they do not consider the collision effect when selecting the transmission rate; in contrast, CARA (Collision-Aware Rate Adaptation) [1] shows improved system performance thanks to its collision-awareness capability. In this paper, we propose two enhancements to the original CARA scheme to further improve system performance. The first is called CARA-RI, which extends CARA's collision-awareness capability to rate increase decisions, while the second, called CARA-HD, incorporates a hidden station detection mechanism. Simulation results show that the proposed schemes outperform the original CARA significantly under various randomly-generated network topologies.
1 Introduction
While the 802.11 PHYs (Physical layers) provide multiple transmission rates, the IEEE 802.11 standard [2] does not specify any algorithm to utilize these multiple transmission rates efficiently. Over the years, many rate adaptation schemes have been proposed in the literature and in practical devices. In a typical 802.11 system, multiple stations contend for the shared wireless medium; frame collisions are inevitable due to the contention-based nature of the 802.11 DCF (Distributed Coordination Function). Therefore, the effectiveness of a rate adaptation scheme depends not only on how fast it can respond to the variation of the wireless channel, but, more importantly, on how frame collisions are detected and handled properly. There are two major categories of rate adaptation schemes: closed-loop and open-loop schemes. Closed-loop schemes rely on the interaction between transmitter and receiver, and the rate adaptation is dictated by the receiver. In comparison, open-loop schemes, with which a station makes the rate adaptation decision solely based on its local Ack (Acknowledgment) information, are
This research was in part supported by Information Technology Research Center (ITRC) and Samsung Advanced Institute of Technology (SAIT).
generally standard-compliant and very simple to implement, and hence more popular. The most widely-adopted ARF (Automatic Rate Fallback) scheme [3] is an open-loop scheme. However, most existing open-loop schemes malfunction severely when there are many contending stations in the network, because they cannot differentiate frame collisions from frame transmission failures caused by channel errors and, hence, may decrease the transmission rate over-aggressively. To address this problem, we proposed a novel rate adaptation scheme with collision-awareness capability, called CARA (Collision-Aware Rate Adaptation), in [1]. CARA combines adaptive RTS/CTS (Request-to-Send/Clear-to-Send) exchange with the CCA (Clear Channel Assessment) functionality to detect frame collisions. As a result, compared with other open-loop rate adaptation schemes, CARA is more likely to make correct rate adaptation decisions and, hence, performs better. In this paper, we propose two enhancements to the original CARA to further improve system performance. The first is called CARA-RI (Rate Increase), which extends CARA's collision-awareness capability to rate increase decisions, while the second, called CARA-HD (Hidden Detection), incorporates a hidden station detection mechanism.
2 Proposed Schemes
In this section, we describe the details of CARA-RI and CARA-HD. We use CARA-BASIC to refer to the original CARA scheme,¹ in order to differentiate it from the proposed extensions.

2.1 CARA-RI (CARA-Rate Increase)
CARA-BASIC yields higher throughput than ARF by making collision-aware rate decrease decisions. However, when attempting to increase the transmission rate, it simply employs ARF's counting algorithm and resets the consecutive success count (m) upon any frame transmission failure. The key idea of CARA-RI is to introduce collision-aware rate increase decisions into CARA-BASIC. Specifically, unlike in ARF and CARA-BASIC, where m is reset by any frame loss, CARA-RI only resets m when the frame loss occurs with RTS/CTS preceding the transmission attempt. In other words, m is reset only when the frame loss can be clearly identified as a channel-error-caused failure. As a result, CARA-RI adjusts the transmission rate more quickly to improving wireless channel conditions than CARA-BASIC does.
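A minimal sketch of the counter rule makes the contrast explicit (our illustration; the variable names are ours):

```python
def update_success_count(m, tx_ok, rts_protected):
    """Consecutive-success counter update.

    ARF / CARA-BASIC: any frame loss resets m.
    CARA-RI: reset only if the loss followed an RTS/CTS exchange, i.e. the
    loss is clearly channel-error-caused rather than a possible collision.
    """
    if tx_ok:
        return m + 1
    if rts_protected:   # loss despite RTS/CTS -> channel error
        return 0        # CARA-RI resets here ...
    return m            # ... but NOT here (possible collision)
```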
2.2 CARA-HD (CARA-Hidden Detection)
In the presence of hidden stations in a WLAN, the performance of the basic DCF can be severely degraded. This is because of the drastically increasing collision
The detailed description of CARA and discussions on its effectiveness are omitted due to space limitation. Refer to [1] for more information.
probability caused by hidden stations. The unprotected time interval, however, can be shortened to the RTS transmission time by preceding the data frame transmission with the exchange of two short control frames, i.e., the RTS and CTS frames, and hence the hidden station problem can be ameliorated. Thanks to its collision-awareness capability, CARA-BASIC achieves better performance than ARF in a hidden station environment. However, as pointed out in [4], CARA-BASIC might suffer from RTS oscillation, which alternates between turning the RTS frame transmission on and off. To better deal with the hidden station problem, we propose the second extension, called CARA-HD, which incorporates a hidden station detection mechanism that we proposed recently in [5]. The hidden station detection mechanism works as follows. We assume an infrastructure-based WLAN system, where all stations can hear the AP, but stations may be hidden from each other. Let A and B denote a pair of hidden stations in the network. Although A cannot hear the data transmission from B to the AP, it can hear at least the corresponding Ack frame transmission from the AP to B. By looking into either the frame content or the frame length information carried in the physical layer header, A should be able to figure out that this is an Ack frame transmission and, consequently, if A did not hear any data frame transmission associated with this Ack, it can claim detection of a hidden station (in this example, station B).
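The detection rule lends itself to a compact sketch; the frame-record format below is a hypothetical simplification of what an implementation would parse from overheard frames:

```python
def detect_hidden(overheard_frames, ap="AP"):
    """Scan overheard frames in order; report stations whose Ack from the
    AP was heard with no matching data frame heard just before it."""
    hidden = set()
    prev = None
    for frame in overheard_frames:   # frame: (kind, src, dst)
        kind, src, dst = frame
        if kind == "ACK" and src == ap:
            # Did we hear the data frame this Ack acknowledges?
            if not (prev and prev[0] == "DATA" and prev[1] == dst):
                hidden.add(dst)      # the Ack's destination is hidden to us
        prev = frame
    return hidden

# Station B is hidden: we hear the AP's Ack to B but never B's data frame.
trace = [("DATA", "A", "AP"), ("ACK", "AP", "A"), ("ACK", "AP", "B")]
print(detect_hidden(trace))  # {'B'}
```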
3 Performance Evaluation
In this section, we evaluate the following schemes in randomly-generated network topologies by using the ns-2 simulator: (1) ARF; (2) CARA-BASIC; (3) CARA-RI; (4) CARA-HD; and (5) CARA-HYBRID, which combines CARA-RI and CARA-HD. We simulate an infrastructure-based 802.11b system with the same simulation setup as that used in [1]. We set the operational parameters, such as the consecutive success threshold (Mth), the consecutive failure threshold (Nth), and the probe activation threshold (Pth), to those used in [1]. In a random-topology network, all the stations are randomly placed within a circle around the AP with a radius of 80 meters. The transmission range of a station is set to 80 meters, and hence hidden stations are likely to exist in the network. We assume a time-varying Ricean fading channel model with Ricean factor K = 3 dB to describe the indoor fading channel. We vary the number of contending stations from 5 to 20. Simulation results are plotted in Fig. 1, where each point is averaged over 50 simulation runs. In general, the CARA schemes are significantly better than ARF in terms of aggregate system throughput, thanks to their collision-awareness capabilities. CARA-HD always performs better than CARA-BASIC, and as the number of contending stations increases, the performance gap between them becomes more significant. This is because, in a randomly-generated network with more contending stations, there is a higher probability that hidden stations exist in the network, and under such circumstances CARA-HD becomes more effective.
Fig. 1. Throughput comparison in random-topology networks (average aggregate throughput in Mbps vs. the number of contending stations, for ARF, CARA-BASIC, CARA-RI, CARA-HD, and CARA-HYBRID)
It is interesting to see that CARA-RI yields higher aggregate throughput than all other tested schemes except CARA-HYBRID. This is because CARA-RI increases the transmission rate more proactively; as a result, data frames are generally transmitted at higher rates with CARA-RI, and the time wasted on collisions and retransmissions is shorter. However, as the number of contending stations increases and, in turn, the probability that hidden stations exist increases, the effect of the proactive rate increase becomes comparable to that of the hidden station detection mechanism. This is evidenced by the similar performance of CARA-RI and CARA-HD shown in Fig. 1 when there are 20 contending stations in the network. CARA-HYBRID performs both collision-aware rate increases and hidden station detection. From the figure, we observe that, under certain scenarios, CARA-HYBRID performs worse than CARA-RI. This is surprising at first sight but reasonable for the following reason. Recall that CARA-HD deals with the hidden station problem and reduces the collision duration by enforcing an RTS/CTS exchange before each data transmission attempt. When the extra RTS/CTS exchange overhead is larger than the overhead caused by collisions and retransmissions, combining CARA-HD with CARA-RI indeed degrades the throughput performance.
References
1. Kim, J., et al.: CARA: Collision-Aware Rate Adaptation for IEEE 802.11 WLANs. In: Proc. IEEE INFOCOM (2006)
2. IEEE 802.11-1999, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications (August 1999)
3. Kamerman, A., et al.: WaveLAN-II: A High-Performance Wireless LAN for the Unlicensed Band. Bell Labs Technical Journal 2(3) (1997)
4. Wong, S., et al.: Robust Rate Adaptation for 802.11 Wireless Networks. In: Proc. ACM MobiCom (2006)
5. Kim, Y., et al.: A Novel Hidden Station Detection Mechanism in IEEE 802.11 WLAN. IEEE Commun. Lett. 10(8) (2006)
A Study of Performance Improvement in EAP*
Eun-Chul Cha and Hyoung-Kee Choi
School of Information and Communication, Sungkyunkwan University, Suwon, South Korea
{iris1212,hkchoi}@ece.skku.ac.kr
Abstract. With the popularity of the Internet, a number of technologies for accessing it have been developed. EAP is an authentication framework designed to provide authentication functionality in the access network. Because of its flexibility and extensibility, EAP offers a global solution for authentication supported by many access networks. However, EAP has critical weaknesses that may, in turn, decrease its performance. Some of these weaknesses are caused by the "lock-step" flow control, which supports only a single packet in flight. Considering these weaknesses, we propose a solution for the flow control. Using simulation, we show that our solution improves EAP performance. Keywords: Extensible Authentication Protocol (EAP), Network Access Authentication, Network Security.
1 Introduction
The popularity of the Internet makes it possible for people to access the Internet anywhere and anytime, and a number of access technologies have been developed accordingly. Users who want to access the Internet via access networks such as IEEE 802.11 and 802.16 must get permission from a service provider. This transaction happens between a user's terminal and a base station (or access point) when the terminal first enters the network: the base station authenticates the terminal to verify that the user is a valid identity, and vice versa if needed. A variety of mechanisms are available for authentication, such as Authentication and Key Agreement (AKA), Transport Layer Security (TLS), and Tunneled Transport Layer Security (TTLS). These mechanisms only specify methodologies for authentication; they cannot be used in an access network without being adapted to collaborate with the network protocols. The Extensible Authentication Protocol (EAP) is designed for exactly this situation: it provides a framework to overlay diverse authentication mechanisms on the access network. EAP defines only the message
* This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2006-C1090-0603-0028).
format and the message exchange; the actual authentication is done by the authentication mechanism encapsulated in the EAP message. EAP has critical weaknesses that may, in turn, decrease its performance. These weaknesses are caused by the "lock-step" flow control, which allows only a single packet in flight. If an EAP exchange comprises a number of messages, they must be sent back-to-back instead of pipelined, and the delay to complete the delivery becomes relatively long. We propose to adopt the sliding window protocol in place of the lock-step protocol: it allows a sender to transmit multiple messages in a pipeline, which lessens the overall delay to complete the authentication. The change of flow control prompts us to revisit the incumbent error recovery scheme, Stop-and-Wait ARQ. We suggest replacing Stop-and-Wait ARQ with Selective Repeat ARQ, which, in conjunction with the sliding window protocol, retransmits only the missing message. This paper is organized as follows. In Section 2, we present the performance issues in EAP and propose improvements. We conclude in Section 3.
2 Proposed Improvement in EAP
We present performance issues in EAP and propose possible solutions in this section. We use a metric called authentication delay to compare performance: the elapsed time to complete an authentication using EAP. EAP uses a lock-step protocol for flow control [1]: at any given time, EAP is allowed to have a single request outstanding. This suits EAP, as well as some authentication mechanisms running over EAP, because the next message is always generated after the previous message returns. However, a few authentication mechanisms, such as EAP-TLS, can have a message larger than one EAP MTU due to a large certificate. In that case the message must be fragmented at the EAP layer and sent out one fragment per RTT.
Fig. 1. Authentication delays of EAP-TLS and EAP-AKA with respect to the message size. EAP-AKA has a shorter delay because every message in EAP-AKA fits into a single EAP message.
Fig. 2. Comparison of authentication delay between the lock-step and sliding window protocols.
Fig. 3. Comparison of authentication delay between error recovery mechanisms with an EAP message of 8160 bytes.
The Peer cannot send its response until the complete EAP message is delivered; in the meantime, the authenticator (AUTH), after sending the first fragment, waits for a response. To avoid a possible deadlock in this situation, EAP is designed so that the Peer sends a null EAP response message to AUTH when the message is fragmented. The delay to complete delivery of the message therefore equals the number of fragments times the RTT. In the worst case, TLS can have a certificate as large as 16 Mbytes; with the 1020-byte MTU, delivery would take about 16,000 RTTs [2].
To validate the effect of the lock-step flow control, we simulate EAP-TLS with the ns-2 simulator, varying the size of one EAP message. The result is shown in Fig. 1. The authentication delay of EAP-AKA does not change with the message size, because EAP-AKA recommends not having an EAP message larger than one EAP MTU [3]. The authentication delay of EAP-TLS, in contrast, increases linearly with the message size: the lock-step flow control is quite inefficient for EAP-TLS. Instead, we propose to adopt sliding window flow control, in which the sender may send multiple messages without having to wait for the response to the previous message.
Applying the sliding window flow control requires modifying the EAP protocol. Two buffers, on the AUTH and Peer sides, are essential. In addition, the Peer must be able to inform AUTH of its available buffer size, which changes the EAP message format: we introduce an option field that allows the Peer to advertise its window size. The advertisement is included in the acknowledgement, which is carried in an EAP null message whose identifier field indicates the message ID being acknowledged. For backward compatibility, implementations that do not understand the new message format generate a NAK response, and the lock-step protocol is then used. For an EAP message smaller than or equal to the EAP MTU (1020 bytes), there is no delay difference between the lock-step and sliding window schemes, as shown in Fig. 2. As the EAP message grows beyond the EAP MTU, however, the gap between the two schemes becomes apparent: for instance, the authentication delay with the sliding window is decreased by 53 percent for an 8160-byte message.
Like many other protocols, error recovery in EAP works in conjunction with flow control. Since we propose the sliding window protocol for flow control, we need an ARQ mechanism that works closely with it. We examine two mechanisms, Go-Back-N ARQ and Selective Repeat ARQ. In Go-Back-N, if the Nth message is detected to be in error, the sender must go back to the Nth message and retransmit all messages from the Nth onward; even if the (N+1)th message was delivered without error on the first attempt, Go-Back-N ends up retransmitting it. In the same situation, Selective Repeat ARQ retransmits only the Nth message, without retransmitting the (N+1)th. To evaluate the effect of the two ARQ mechanisms, we measure the authentication delay of EAP while varying the frame error rate (FER) on an IEEE 802.16 link. EAP messages are delivered over the PKM protocol in 802.16. PKM provides its own ARQ scheme, but frame drops are still possible: after a collision, the mobile station attempts to retransmit the frame a limited number of times, and if collisions continue beyond that limit, the MS gives up transmitting the frame.
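The fragmentation-delay comparison above can be captured in a toy model; a sketch in which the RTT and window size are illustrative values and the fragment transmission time is neglected:

```python
import math

def auth_delay(msg_size, mtu=1020, rtt=0.1, window=8):
    """Rough delivery delay for one fragmented EAP message.

    Lock-step: one fragment (and its null response) per RTT.
    Sliding window: up to `window` fragments in flight per RTT."""
    n_frag = math.ceil(msg_size / mtu)
    lock_step = n_frag * rtt
    sliding = math.ceil(n_frag / window) * rtt
    return lock_step, sliding

# An 8160-byte message yields 8 fragments: 8 RTTs vs. 1 RTT.
print(auth_delay(8160))  # (0.8, 0.1) with rtt = 0.1 s
```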
Fig. 3 shows the authentication delay against the FER. The two proposed schemes decrease the authentication delay significantly: at a three percent FER, Selective Repeat ARQ and Go-Back-N lessen the delay by 35.1 and 29.6 percent, respectively. The EAP protocol needs to be modified to adopt the two proposed schemes. Because most of the modifications for flow control are shared with error recovery, we explain only the additional changes unique to error recovery. One major concern in deploying ARQ is that messages can arrive out of sequence due to selective retransmission, while EAP is designed to drop out-of-sequence messages implicitly [1]; great care must therefore be taken when implementing ARQ on EAP. The modification for Go-Back-N is minimal because, in this scheme, messages cannot be out of sequence at the receiver side; the problem arises only for Selective Repeat ARQ. We propose to assign a sequence number to all messages so that the receiver can restore their order, taking advantage of the identifier field in EAP as the sequence number. As temporary storage for out-of-sequence messages, we reuse the receive buffer created for flow control.
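A sketch of the receiver-side reordering this implies, using the one-octet EAP identifier field as the sequence number as proposed above; the buffer layout is an assumption:

```python
class ReorderBuffer:
    """Holds out-of-sequence messages in the flow-control receive buffer
    instead of dropping them, and releases them in identifier order."""

    def __init__(self, first_ident=0):
        self.expected = first_ident
        self.held = {}  # identifier -> message

    def receive(self, ident, msg):
        """Store one message; return the messages deliverable in order."""
        self.held[ident] = msg
        delivered = []
        while self.expected in self.held:
            delivered.append(self.held.pop(self.expected))
            self.expected = (self.expected + 1) % 256  # 8-bit identifier
        return delivered
```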
3 Conclusion
EAP is a lock-step protocol that supports only a single message in flight. If a message is large, fragmentation in EAP is inevitable, and the delay to complete delivery of the message equals the number of fragments times the RTT. We proposed the sliding window protocol for flow control in EAP. The simulation results indicate that the sliding window protocol decreases the authentication delay of a fragmented message by 53 percent.
References
1. B. Aboba et al., "Extensible Authentication Protocol (EAP)," RFC 3748, Jun. 2004.
2. B. Aboba and D. Simon, "PPP EAP TLS Authentication Protocol," RFC 2716, Oct. 1999.
3. J. Arkko and H. Haverinen, "Extensible Authentication Protocol Method for 3rd Generation Authentication and Key Agreement (EAP-AKA)," RFC 4187, Jan. 2006.
Characterization of Ultra Wideband Channel in Data Centers
N. Udar¹, K. Kant², R. Viswanathan¹, and D. Cheung²
¹ Southern Illinois University, Carbondale, IL
² Intel Corporation, Hillsboro, OR
Abstract. In this paper, we present a measurement-based characterization of the Ultra Wideband (UWB) channel in a data center environment. We find that although a modified Saleh-Valenzuela model characterizes the UWB channel, some of the model parameters, such as delay spread and log-normal shadowing, are unique to the data center environment. Keywords: Ultra Wideband (UWB), data centers, S-V model, path loss.
1 Introduction
In recent years, Ultra Wideband (UWB) communication has received great interest from both the research community and industry. UWB transmissions are subject to strict power regulations and are thus best suited for short-range communications. The IEEE standards groups on personal area networks (PANs) are actively working on UWB-based communications under the Wi-Media Alliance and the 802.15.4 task group. UWB has been adopted as the underlying technology for the Wireless USB (WUSB) standard – a wireless replacement for the popular wired USB interface, also being developed by Wi-Media. Although WUSB is designed for the client space, its ubiquity will allow it to be exploited in servers for creating an out-of-band fabric that can be used for a variety of applications in a data center. The objective of this paper is to lay a foundation for new application scenarios for UWB in data center management, e.g., asset location. The study reported here characterizes UWB propagation in data centers via direct measurements in a real data center over the UWB frequency band (3–8 GHz).
2 UWB Propagation Models
2.1 Indoor Channel Characteristics
Indoor wireless propagation channels have been investigated in the context of residential, office, and industrial environments in [1,2,3]. The signal that arrives consists of multiple replicas of the originally transmitted signal; this phenomenon is known as multipath propagation. The different multipath components (MPCs) are characterized by different delays and attenuations. In cellular systems, where
signal bandwidth is relatively narrow, the multipath components that arrive within short time intervals are not resolvable and therefore combine to produce a Rayleigh or Rician distribution of the overall amplitude. Based on a detailed set of studies, the IEEE 802.15.3 committee (now Wi-Media) settled on a modified S-V model [1,4] to enable comparison of various technologies in PANs. The actual measurements indicate that the MPCs arrive in clusters rather than in a continuum.
2.2 Data Center Environment
A data center can be compared to a library room in which several metallic racks contain servers. The racks are 78" high, 23–25" wide, and 26–30" deep, and are generally placed side by side in a row without any spacing (other than a supporting beam). A rack can be filled with either rack-mount or blade servers. Rack-mounted servers go horizontally in the rack and have typical heights of 1U or 2U, where a "U" is approximately 1.8". High-density blade servers go vertically in a 19" high chassis, with 14 blades per chassis. If all racks in the data center could be treated as essentially continuous metal blocks, the characterization would be relatively straightforward. However, the racks are not always filled with servers, which creates many holes through which radiation can leak: because of the increasing stress placed by high-density servers on the cooling and power distribution infrastructure, the racks in older data centers simply cannot be filled to capacity. The net result is a unique environment with "organized clutter".
2.3 S-V Channel Models
The Saleh-Valenzuela model [4] characterizes the channel behavior via a superposition of clustered arrivals of various delay components. Suppose that the received signal for a transmitted impulse consists of C clusters, with $R_c$ MPCs (or "rays") within the c-th cluster. Let $T_c$ denote the arrival time of the c-th cluster (i.e., that of the first ray within this cluster) and let $\tau_{cr}$ denote the arrival time of the r-th ray within the cluster (relative to the arrival time of the first ray). Then the impulse response h(t) of the channel is given by:

$$ h(t) = \sum_{c=1}^{C} \sum_{r=1}^{R_c} a_{cr}\, \delta(t - T_c - \tau_{cr}) \qquad (1) $$
where δ(·) is the Dirac delta function and $a_{cr}$ is the relative weight (or multipath gain coefficient) of ray (c, r). The essence of the S-V model is to make specific assumptions about the cluster and ray arrival processes and the multipath gains in the above equation. In particular, the basic S-V model assumes that both inter-cluster and inter-ray times are exponentially distributed, thereby making the corresponding counting processes Poisson. That is, $P(T_c - T_{c-1} > x) = e^{-\lambda_c x}$ and $P(\tau_{c,r} - \tau_{c,r-1} > y) = e^{-\lambda_r y}$, where $\lambda_c$ and $\lambda_r$ are, respectively, the mean cluster and ray arrival rates. As for the coefficients $a_{cr}$, the S-V model assumes an exponential decay for both the cluster power and the ray power within a cluster as a
function of the delay. That is, $a_{cr}^2 = a_{00}^2\, e^{-T_c/\Gamma}\, e^{-\tau_{cr}/\gamma}$, where $a_{00}^2$ is the power of the very first ray, and Γ and γ are the cluster and ray decay constants. Several indoor measurements have shown that the assumption of a Poisson process for ray arrivals does not yield a good fit. Reference [1] discusses a modified S-V model in which the ray arrival process is modeled as a mixture of two Poisson processes; we shall see later that our data center measurements agree with this model. In addition to the MPC arrival characterization, several other aspects must be considered to fully describe the channel. The path loss model indicates how the power decays as a function of distance. For free-space propagation, the path loss at distance d is given by $(4\pi d/\lambda)^2$, where λ is the wavelength; in a cluttered environment, the loss exponent can differ significantly from 2 because of reflection and diffraction. Path loss, cluster power decay, and the ray decay phenomena discussed above are all parameters of the model. An appropriate way to characterize them is to consider cluster and ray powers as random variables with associated means and standard deviations; the standard deviations $\sigma_c$ and $\sigma_r$ then become essential parameters of the S-V model and need to be estimated. Another aspect of interest is the time variance of the channel: wireless channel characteristics may be influenced by environmental factors such as temperature, humidity, air flow, and movement. Fortunately, in data center environments such variations are expected to be small and infrequent, and time variance characterization may be unnecessary; our measurements, though not shown here, validate this conclusion.
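For illustration, one realization of the basic S-V impulse response of Eq. (1) can be generated directly from these assumptions; the parameter values would come from the measured fit, and the truncation rule for negligible rays is our own choice:

```python
import numpy as np

def sv_realization(lam_c, lam_r, Gamma, gamma, p00=1.0, t_max=100.0, seed=0):
    """One realization of the basic S-V model.

    lam_c, lam_r: mean cluster / ray arrival rates (1/ns);
    Gamma, gamma: cluster / ray power decay constants (ns);
    p00: power of the very first ray (a_00 squared).
    Returns ray arrival times and the mean ray powers a_cr squared."""
    rng = np.random.default_rng(seed)
    times, powers = [], []
    Tc = 0.0
    while Tc < t_max:                  # cluster arrivals: Poisson(lam_c)
        tau = 0.0
        while Tc + tau < t_max:        # ray arrivals in cluster: Poisson(lam_r)
            p = p00 * np.exp(-Tc / Gamma - tau / gamma)
            if p < 1e-6 * p00:         # truncate negligible rays (our choice)
                break
            times.append(Tc + tau)
            powers.append(p)
            tau += rng.exponential(1.0 / lam_r)
        Tc += rng.exponential(1.0 / lam_c)
    return np.array(times), np.array(powers)
```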
3 Channel Characterization
3.1 Measurement Setup
The measurements were conducted in a medium-sized Intel data center using an Agilent 8719ES vector network analyzer, set to transmit 1601 continuous waves distributed uniformly over 3–8 GHz; the 5 GHz bandwidth gives a temporal resolution of 0.2 ns. Fig. 1 (TX/RX locations) shows the location of the transmitter and the receivers, where the two lines indicate opposing racks. To measure the small-scale statistics of the channel, the Rx was moved 25 times around each local point over a 5-by-5 square grid with 5 cm spacing. Based on the measurements, Figs. 2 and 3 show the complementary cumulative distribution functions of the ray inter-arrival times and the cluster inter-arrival times, respectively. Fig. 2 also shows the S-V model fit, labeled Single Poisson Process, and a mixture of two Poisson processes, which was proposed as a modified S-V model for indoor data in [1]. Fig. 2 shows clearly that the modified S-V model provides a better fit to the data than the single Poisson process. For cluster inter-arrival
times in Fig. 3, a single Poisson process provides a reasonable fit only if a few of the clusters arriving at times greater than 120 ns are ignored. These later cluster arrivals correspond to multipath due to reflections from a wall of the data center; hence, a cluster of clusters may be a better description of this behavior than a pure Poisson model.
Fig. 2. Ray Inter-arrival Times for S-V Model
Fig. 3. Cluster Inter-arrival Times for S-V Model
Fig. 4. Path Loss vs. Distance from Transmitter
Fig. 4 shows the path loss (PL) in dB versus the distance between the different receivers (Rx) and the transmitter (TX). If we express the path loss as PL(dB) = 10 n log10(d) + const, where n is the path loss exponent and d is the distance between the TX and the RX, Fig. 4 shows that the exponent is about 1.6, as expected in an indoor environment [5]. That is, the path loss in a data center decreases much more slowly with distance than in free space. This is because the large number of diffractions and reflections taking place in the metallic racks and other components in the vicinity of the Tx and Rx contributes a much higher received power than is possible in free space. Table 1 compares the available data center channel parameters against those for other indoor environments [5] in terms of the path loss exponent, the standard deviation of the assumed log-normal fading, and the mean delay spread.

Table 1. Comparison of Indoor Environments (LoS / NLoS)

Environment | Path loss exponent | Fading std. dev. | Mean delay spread (ns)
Residential | 1.79 / 4.58 | 1.79 / 4.58 | 5.44 / 30.1
Office | 1.63 / 3.07 | 1.9 / 3.9 | n/a / n/a
Data center | 1.6 / n/a | 2.3 / n/a | 18 / n/a

It is seen that the last two parameters are higher for data centers, perhaps as a result of the abundant metallic clutter in this environment. In summary, this paper characterizes UWB propagation within a data center environment and shows that the data center environment is similar, but not identical, to other indoor environments that have been studied in the past. To the best of our knowledge, this is the first study of its kind, and it lays the groundwork for further exploration of UWB communications in a data center.
References
1. C.-C. Chong, Y. Kim, S.-S. Lee, "A modified S-V clustering channel model for the UWB indoor residential environment," Proc. of the 1st IEEE Veh. Technol. Conf., pp. 58–62, Stockholm, Sweden, May 2005.
2. C.-C. Chong, Y. Kim, S.-S. Lee, "Statistical characterization of the UWB propagation channel in various types of high-rise apartments," 2nd IEEE Wireless Communications and Networking Conf., Vol. 2, pp. 944–949, March 2005.
3. J. Karedal, S. Wyne, et al., "Statistical analysis of the UWB channel in an industrial environment," IEEE Vehicular Technology Conf., Vol. 1, pp. 81–85, Sept. 2004.
4. A. Saleh and R. Valenzuela, "A Statistical Model for Indoor Multipath Propagation," IEEE JSAC, Vol. SAC-5, No. 2, pp. 128–137, Feb. 1987.
5. A.F. Molisch et al., "IEEE 802.15.4a Channel Model – Final Report," Nov. 2004. Available online: http://www.ieee802.org/15/pub/TG4a.html
Evaluating Internal BGP Networks from the Data Plane
Feng Zhao, Xicheng Lu, Baosheng Wang, and Peidong Zhu
School of Computer, National University of Defense Technology, Changsha 410073, Hunan, China
[email protected]
Abstract. This paper focuses on the design of IBGP networks, which are very important to the reliability and stability of the Internet. Although several metrics have been proposed to measure the robustness of IBGP networks, they consider only the impact of route reflection networks on the control plane. A robust network should have low sensitivity to traffic load variations, so we propose a new metric, called TDR (Traffic Diversion Rate), to characterize the impact of route reflection networks on the data plane. Simulation results show that adopting the optimal route reflection topology that minimizes TDR makes the network lose or shift much less traffic than adopting the optimal route reflection topologies found according to other metrics.
1 Introduction
Full-mesh IBGP does not scale well, so BGP route reflection is often used as an alternative in IBGP topology design for large ASes. If no IBGP session ever failed, the choice of IBGP topology would not matter. However, link and router failures occur as part of everyday operation in the backbone networks of a large AS, and they can cause IBGP session failures. When an IBGP session is lost, all related routes in the BGP routing tables have to be withdrawn, and IP networks may become unreachable. Also, when routes change, some traffic flows may be forwarded along different intradomain paths; such traffic shifts modify the distribution of traffic inside the network and change the load on some links. As a consequence, some links can even become congested. Robust IBGP networks are therefore very important to the stability and reliability of the Internet. To measure the robustness of route reflection networks, references [1-4] propose several metrics: Hop Count, Reliability Product, IBGP Expected Lifetime, Expected Session Loss, Resilience, IBGP Failure Probability, and Expected Connectivity Loss. However, they consider only the impact of route reflection networks on the control plane. Even small routing shifts of popular routes can impact the data plane by causing large swings in traffic, perhaps leading to congestion, loss, delay, and jitter. A robust network should have low sensitivity to traffic load variations [5]. We therefore set out to evaluate control plane quality by evaluating the data plane, and propose a new metric, called TDR (Traffic Diversion Rate), to characterize the impact of route reflection networks on the data plane.
2 Traffic Diversion Rate
A typical network in an AS is represented as a graph G(V, E): the node set V represents the routers and E is the set of physical links. We use S to denote the set of all network states, which includes all failure states and the state without any failure. $F_s$ is the set of physical components that fail in state $s \in S$, with $F_s \subseteq V \cup E$; the other components, which are not in $F_s$, work properly. The probability that state s occurs is $r_s$. If router i cannot exchange BGP routing information with router j, directly or indirectly, due to IBGP session failures, then i and j are logically separated from each other; we denote this relation by $[i \nleftrightarrow j]$. The probability that i and j are logically separated from each other in failure state s is denoted by $\Pr_s[i \nleftrightarrow j]$. We denote the ingress-to-egress traffic matrix by M and the traffic that flows from router i to router j by $M_{ij}$. We define the Traffic Diversion Rate in state s as $T_s$:
$$ T_s = \frac{\sum_{i,j \in V_r} \Pr_s[i \nleftrightarrow j]\, M_{ij}}{\sum_{i,j \in V_r} M_{ij}} $$

Therefore, the Traffic Diversion Rate over the entire state space is

$$ T = \sum_{s \in S} r_s\, T_s $$
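Given the separation probabilities and the traffic matrix, T follows directly from these definitions; a sketch (computing the separation probabilities themselves requires analyzing the reflection topology and is omitted here):

```python
import numpy as np

def tdr_state(sep_prob, M):
    """T_s for one failure state s.

    sep_prob[i, j]: probability that routers i and j are logically
    separated in state s; M[i, j]: traffic from ingress i to egress j."""
    return float((sep_prob * M).sum() / M.sum())

def tdr_total(states, M):
    """T = sum over states of r_s * T_s; states = [(r_s, sep_prob_s), ...]."""
    return sum(r_s * tdr_state(p_s, M) for r_s, p_s in states)
```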
Having defined the reliability metric for IBGP networks, we describe the optimization problem based on this metric as follows.
Problem 1 (Robust Reflection – TDR (RR-TDR)). Given: a network G(V, E); the upper bounds on the node degrees $\{c_i\}$, $i \in V$; the probabilities of the failure scenarios $\{r_s \mid s \in S\}$, where $S = V \cup E$; the IBGP session failure probabilities $\{q_s \mid s \in S\}$; the ingress-to-egress traffic matrix $\{M_{ij}\}$, $i, j \in V$; and the IGP weights. We look for a reflection topology $G_r^*(V, E_r)$ such that (1) $h_i \le c_i$ for all $i \in G_r^*.V$, and (2) $T_S(G_r^*) \le T_S(G_r)$ for any reflection topology $G_r(V, E_r)$ that satisfies $h_i \le c_i$ for all $i \in G_r.V$. The goal of the IBGP topology design problem is thus to find a route reflector topology with minimum Traffic Diversion Rate.
3 Experiments
The optimal IBGP topologies based on different metrics may differ. We consider one IBGP topology better than another if the network loses or shifts less traffic under it. The source data of our experiments comes from a real network, the US research network (Abilene). Using TOTEM, we obtained a realistic traffic matrix M of Abilene on Jan 1, 2005; also using TOTEM, we calculated the number of external routes
that are obtained by a router from its EBGP peers and are further injected into the IBGP network. Theoretically, there could be multiple levels of reflection; in practice, however, two-level reflection is used most often, so we focus on it in the experiments. We further assume that the reflection graph has no redundancy. In the experiments, we assume that routers do not fail and that the failure probabilities of all links are uniform. The conditional failure probabilities of the IBGP sessions affected by link failures are also uniform and nonzero, and we assume the limit on the number of IBGP sessions is 6. With over 1000 lines of Matlab code, we then find the optimal IBGP topologies for the different metrics. The optimal IBGP topologies for some metrics turn out to be the same; we obtain 5 IBGP topologies for the 8 metrics: topology 1 for Hop Count, Reliability Product, and IBGP Failure Probability; topology 2 for Expected Lifetime; topology 3 for Expected Session Loss; topology 4 for Resilience and Expected Connectivity Loss; and topology 5 for Traffic Diversion Rate. We use SSFNet to study the impact of these different route reflection networks on the data plane. In our simulations, there is a network of 11 stub ASes and a transit AS; the topology of the transit AS is the core Abilene network topology. Each stub AS has one router (connected to a router in the transit AS) running BGP and one host. In the transit AS, BGP runs on all routers, all internal BGP peering sessions use loopback addresses as peer destination addresses, and each router runs OSPFv2. To ensure that IBGP session failures occur, we set the experiment parameters as follows: BGP hold time, 60 seconds; OSPF router dead interval, 80 seconds; OSPF hello interval, 10 seconds; link failure duration, 90 seconds. With these settings, the OSPF routing recovery time exceeds 70 seconds, so a link failure causes some IBGP sessions to fail. It may take 165 seconds for failed IBGP sessions to be reestablished after OSPF routing recovers, so we set the simulation time to 500 seconds to allow failed IBGP sessions to be reestablished before a simulation ends. At the 200th second, the application session on each host starts to send packets to the other hosts: the host in AS i sends packets to the host in AS j at rate $M(i,j)/100$. At the 210th second, we inject a network failure by bringing down a link in the transit AS; the link is recovered at the 300th second. Because there are 5 optimal IBGP topologies for the metrics and 14 links in the core, we perform 70 simulations and record the number of packets lost in transmission. The average number of lost packets for each link failure is shown in Fig. 1. From this figure we can see that the number of lost packets is the smallest under the optimal IBGP topology based on the TDR metric. Although we ran the experiment with only one traffic pattern and one failure model, we believe that under other traffic patterns and failure models the optimal topology based on the TDR metric will still make the network lose or shift much less traffic, because the TDR metric is derived from the data plane directly.
Fig. 1. The average number of lost packets for each link failure, for topologies 1–5
4 Conclusion
This paper proposes TDR, a new metric that characterizes the impact of route reflection networks on the data plane. Our experiments show that, under the same traffic pattern and failure model, different IBGP topologies make the network exhibit different behaviors, and that the optimal topology based on the TDR metric achieves better performance than the optimal IBGP topologies based on previously proposed metrics.
References
1. L. Xiao and K. Nahrstedt, "Reliability models and evaluation of internal BGP networks," in Proceedings of IEEE INFOCOM, 2004.
2. L. Xiao, J. Wang, and K. Nahrstedt, "Optimizing IBGP route reflection network," in Proceedings of IEEE ICC, 2003.
3. L. Xiao, J. Wang, and K. Nahrstedt, "Reliability-aware IBGP route reflection topology design," in Proceedings of IEEE ICNP, 2003.
4. L. Xiao and K. Nahrstedt, "Reliability of Internal BGP Networks: Models and Optimizations," Technical Report UIUCDCS-R-2005-2608/UILU-ENG-2005-1800, 2005.
5. R. Teixeira, N. Duffield, J. Rexford, and M. Roughan, "Traffic matrix reloaded: impact of routing changes," in Proc. of PAM 2005, 2005.
Performance of a Partially Shared Buffer with Correlated Arrivals
Dieter Fiems, Bart Steyaert, and Herwig Bruneel
SMACS Research Group, Department TELIN, Ghent University, St-Pietersnieuwstraat 41, 9000 Gent, Belgium
Abstract. We assess the performance of a partially shared bottleneck buffer for scalable video. The arrival process of the video packets is modelled by means of a two-class discrete batch Markovian arrival process. Using a matrix-analytic approach, we retrieve various performance measures. We illustrate our approach by means of a numerical example.
1 Introduction
Scalable video coding is able to cope with bandwidth fluctuations in packet networks [1]. A video stream is encoded into a base layer and one or more enhancement layer streams. Only the base layer is needed to decode and play back the video, although at a poor quality; combined with the enhancement layers, the video can be played back at full quality. Intermediate network nodes should therefore drop enhancement layer packets to ensure delivery of base layer packets; in this way, the video quality can be reduced gracefully during congestion periods. To ensure delivery of base layer packets when the network is congested, network nodes are required to support some form of Quality of Service differentiation. Partial Buffer Sharing (PBS) implements service differentiation by means of a threshold-based packet acceptance policy. As long as the number of packets in the buffer does not exceed a fixed threshold, both base layer (class 1) and enhancement layer (class 2) packets are accommodated by the buffer; once the number of packets exceeds the threshold, only base layer packets are accepted. A PBS acceptance policy offers space priority at the cost of some overall throughput loss, but it is easily implementable in practice, as opposed to, e.g., a push-out buffer [2].
2 Performance Analysis
The bottleneck buffer under consideration operates synchronously, i.e., time is slotted. There are two traffic classes (class 1 and class 2), and packets of these classes arrive in accordance with a two-class discrete-time batch Markovian arrival process (2-DBMAP). Such a process is completely characterised by a doubly indexed sequence of substochastic matrices $A_{n,m}$. The matrix $A_{n,m}$ governs the transitions of the Markovian environment of the 2-DBMAP when there are n class 1
and m class 2 arrivals. The transmission time of packets is fixed and equal to the slot length. Up to N packets can be transmitted at a slot boundary, and the buffer can accommodate up to M packets, including the packets being transmitted. However, a class 2 packet is only admitted when there are no more than a threshold $T \le M$ packets present upon its arrival, in accordance with the PBS acceptance policy described in the introduction. Since there may be arrivals of both classes as well as departures at a slot boundary, one needs to specify the order in which these take place; we here assume the following order: (1) departures; (2) arrivals of class 1; (3) arrivals of class 2. We consider the queue content at random slot boundaries and let $\pi_i$ denote the vector whose n-th element equals the probability that the 2-DBMAP is in state n at a random slot boundary while there are i packets in the queue at that boundary. By means of matrix-analytic techniques, one can show that the vectors $\pi_i$ satisfy the following set of equations,

$$ \pi_0 = \sum_n \pi_n D_{n,N-n}\,, \qquad \pi_i = \sum_n \pi_n C_{n,N+i-n}\,, \qquad \sum_n \pi_n e = 1\,, \qquad (1) $$
for $i = 1, 2, \ldots, M - N$. Here e is a column vector of ones, and the matrices $C_{n,m}$ and $D_{n,m}$ are defined as follows,

$$ C_{n,m} = \sum_{g,h=0}^{\infty} A_{g,h}\, \Theta(g, h; n, m)\,, \qquad D_{n,m} = \sum_{k=0}^{m} C_{n,k}\,, \qquad (2) $$
with $\Theta(g, h; n, m) = 1(\min(g, M - n) + \min(h, (T - n - g)^+) = m)$ and with 1(·) the standard indicator function. One easily shows that this set of equations is of M/G/1/N type, and therefore (1) can be solved efficiently by the reduction algorithm [3]. Given the vectors $\pi_i$, we may obtain various performance measures. E.g., the accommodated class 1 and class 2 arrival loads are given by,

$$ \tilde\rho_1 = \sum_{n,m} m\, \pi_n \sum_{k,l} A_{k,l}\, e\; 1(\min(k, M-n) = m)\,, \qquad (3) $$

$$ \tilde\rho_2 = \sum_{n,m} m\, \pi_n \sum_{k,l} A_{k,l}\, e\; 1(\min(l, (T-n-k)^+) = m)\,, \qquad (4) $$

respectively. The class i packet loss ratio (plr) is then given by $\mathrm{plr}_i = 1 - \tilde\rho_i/\rho_i$. Here $\rho_i$ denotes the class i arrival load,

$$ \rho_1 = \tau \sum_{n,m} n\, A_{n,m}\, e\,, \qquad \rho_2 = \tau \sum_{n,m} m\, A_{n,m}\, e\,, \qquad (5) $$

with τ the normalised Perron-Frobenius eigenvector of the matrix $\sum_{k,l} A_{k,l}$. The analysis of the probability mass function of the packet delay is more involved. Consider a random slot boundary and let c(l, n, m) denote the probability that there are l packets in the buffer and that there are n class 1 and m class 2 packet arrivals that the buffer can accommodate. We have,

$$ c(l, n, m) = \sum_{g,h} \pi_l\, A_{g,h}\, e\; 1(\min(g, M-l) = n \,\wedge\, \min(h, (T-l-g)^+) = m)\,. \qquad (6) $$
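The PBS acceptance rule inside Θ is simple to state in code; a sketch of the indicator used in Eq. (2):

```python
def theta(g, h, n, m, M, T):
    """Theta(g, h; n, m): 1 if, with n packets already present, g class 1
    and h class 2 arrivals leave exactly m newly accepted packets."""
    accepted1 = min(g, M - n)              # class 1 may fill the buffer to M
    accepted2 = min(h, max(T - n - g, 0))  # class 2 only up to threshold T
    return 1 if accepted1 + accepted2 == m else 0
```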
Given these probabilities, we find that the fraction of slots $\nu_i(k)$ in which there is a class i arrival that finds k packets in the buffer upon arrival equals,

$$ \nu_1(k) = \sum_{l=0}^{\min(k,N)} \sum_{n=k-l+1}^{M-l} \sum_{m} c(l, n, m)\,, \qquad (7) $$

$$ \nu_2(k) = \sum_{l=0}^{k} \sum_{n=k-l+1}^{(T-l)^+} \sum_{m} c(m, l-m, n)\,, \qquad (8) $$

for $k = 0, 1, \ldots, M-1$ and for $k = 0, 1, \ldots, T-1$, respectively. As there is at most one such packet arrival at a slot boundary, the probability $u_i(k)$ that a random class i packet finds k packets in the buffer upon arrival equals $\nu_i(k)/\tilde\rho_i$. Finally, let the packet delay be defined as the number of slots between a packet's arrival and departure slot boundaries. Since up to N packets leave the buffer system at a slot boundary, the probability $d_i(n)$ that the delay of a class i packet equals n slots is given by,

$$ d_i(n) = \sum_{k=(n-1)N}^{nN-1} u_i(k)\,, \qquad (9) $$
for n = 1, 2, . . . , M/N and n = 1, 2, . . . , T/N for the class 1 and class 2 delay, respectively.
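In code, Eq. (9) is a block sum over the $u_i(k)$; a minimal sketch with zero-based list indexing:

```python
def delay_pmf(u, N):
    """d(n) = sum of u[k] for k in [(n-1)N, nN-1] (Eq. 9), n = 1, 2, ...
    u[k]: probability that an arriving packet finds k packets in the buffer."""
    n_max = (len(u) + N - 1) // N
    return [sum(u[(n - 1) * N : n * N]) for n in range(1, n_max + 1)]
```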
3 Numerical Example
To illustrate our approach, consider the case where 8 video sources are routed through a bottleneck buffer (M = 100, N = 4). Playout buffers are used at the destination nodes to cope with delay fluctuations in the network. Each video source is modelled as an on/off source generating one packet at each slot boundary in the on state and no packets in the off state; a fraction θ of the packets belong to class 1. The on/off processes are completely characterised by the pair (σ, K), where σ denotes the fraction of time that the source is on and K is a measure of the absolute lengths of the on- and off-periods [4]. In the remainder we set σ = 0.4, corresponding to an 80% load on the bottleneck buffer. In Fig. 1(a), plr1 and plr2 are depicted vs. T for various values of K and for θ = 0.5. For each K, the upper and lower curves depict plr1 and plr2, respectively; for K = 10, the middle curve depicts the plr of a random packet. Increasing T yields an exponential decrease of plr2 at the cost of an exponential increase of plr1. Also, the plr of a random packet increases for decreasing values of T; i.e., PBS offers service differentiation at the cost of additional packet loss. Further, the performance of the buffer system deteriorates when the arrival process is more bursty (larger K). Fig. 1(b) depicts the experienced packet loss ratio of the class 1 (eplr1) and class 2 (eplr2) video flows versus the playout delay δ for various values of θ. The eplri includes packet loss in both the bottleneck and the playout buffer (due to underflow); i.e., a packet is not lost if it is not dropped by the bottleneck buffer and if its delay in this buffer does not exceed δ.
Fig. 1. Class 1 and 2 packet loss ratio vs. threshold T (a), and experienced packet loss ratio of class 1 and 2 vs. playout delay (b)
The parameters T and K are set to 80 and 100, respectively. For small δ, loss is mostly caused by buffer underflow in the playout buffer. Once δ reaches T/N, class 2 underflow is no longer possible in the playout buffer, since the delay in the bottleneck buffer is bounded by T/N; we then have eplr2 = plr2. Also, eplr1 drops fast, which is explained by the fast decay of the class 1 delay probability mass function.
Acknowledgement This work has been partly carried out in the framework of the CHAMP project sponsored by the Flemish Institute for the Promotion of Scientific and Technological Research in the Industry (IWT).
References
1. Radha, H., Chen, Y., Parthasarathy, K., Cohen, R.: Scalable internet video using MPEG-4. Signal Processing: Image Communication 15(1-2) (1999) 95–126
2. Kröner, H., Hébuterne, G., Boyer, P., Gravey, A.: Priority management in ATM switching nodes. IEEE Journal on Selected Areas in Communications 9(3) (1991) 418–427
3. Blondia, C., Casals, O.: Statistical multiplexing of VBR sources: A matrix-analytic approach. Performance Evaluation 16 (1992) 5–20
4. Fiems, D., Steyaert, B., Bruneel, H.: Analysis of a discrete-time GI-G-1 queueing model subjected to bursty interruptions. Computers & Operations Research 30(1) (2003) 139–153
Filter-Based RFD: Can We Stabilize Network Without Sacrificing Reachability Too Much?
Ke Zhang and S. Felix Wu
Computer Science Department, The University of California, Davis
{kezhang,sfwu}@ucdavis.edu
1 Introduction
Internet instability, also referred to as route flapping, can propagate to the whole Internet and consume considerable computational resources on routers. Route Flap Damping (RFD) [1] is designed to stabilize the Internet by suppressing persistent route flaps. RFD is a penalty-based mechanism: the magnitude of the penalty value indicates the degree of instability of an inter-domain route, and once the penalty reaches a certain threshold, the route is suppressed. This simple mechanism does not work perfectly. First, the way it identifies route flaps and accumulates penalty is too aggressive and may suppress a fairly stable route with a few occasional flaps. Second, a route may remain suppressed even after it has converged. Extensive research on the side-effects of Route Flap Damping has been done recently [2,3,4]. Mao et al. [2] proposed an intriguing approach, selective route flap damping, which tries to solve the first problem by distinguishing persistent route flaps from occasional ones. However, the ISP industry seems to lack interest in the new RFD implementations [5]. A major reason is that, as a penalty system, RFD cannot stabilize the network without sacrificing reachability. Although stabilization and high reachability are both highly desired, RFD cannot optimize the two aspects at the same time. To persuade the ISP industry to adopt the new RFD ideas, we need to demonstrate that the new mechanisms can really achieve an optimal trade-off between the two aspects of RFD. This paper tries to fill this gap. We propose an empirical RFD tune-up to improve route flap damping based on two heuristics. First, an occasional flap can trigger excessive route updates due to path exploration; these updates are usually observed as a burst. If we accumulate RFD penalty only on sampled updates, the occasional flap will not be punished, while long-term persistent route flaps are still captured. Second, by examining Internet BGP updates, one observes that when a failure is recovered, the route most often converges to the previous primary AS path [6]. The primary path can thus be viewed as a signal that the route has converged to a stable state, and a suppressed route can be reused when the primary path appears, which significantly reduces the suppression time. We conduct extensive experiments to evaluate our optimized RFD and SRFD mechanisms and try to answer the question: can we stabilize the network without sacrificing reachability too much? We examined the following factors in the
experiments: three typical BGP update burst sequences, two typical MRAI timers, and single-homing vs. multi-homing networks. We evaluate these RFD mechanisms from the following perspectives:
– How many BGP update messages triggered by route flaps are suppressed?
– How much does RFD impact routing convergence?
– How much is network stability improved?
– How well is network reachability maintained?
The results show that both SRFD and FRFD can significantly reduce the side-effects of RFD and stabilize the network as expected.
2 Filter-Based RFD
A burst of BGP updates caused by path exploration usually lasts for a shorter time period than persistent route flaps, and current RFD is oversensitive to such short bursts. To solve this problem, we design a dynamic slide-window-based filter. The design principle is simple and lightweight, because RFD is performed for every route received from a single peer; a sophisticated algorithm would require large memory and intensive CPU computation, which cannot be a scalable solution. We apply a window-based filter to sample incoming updates: within the time window, only the first update among all incoming updates is penalized. If an incoming update falls beyond the window, the window slides and its size is dynamically adjusted. Initially, the size is set to a predefined maximum value. Each time there is an incoming update beyond the window, we reduce the window size by half, until the size reaches the minimum. When the route is stable for a long period of time, or the penalty drops below the reuse threshold, the sampling window is set to the maximum again. The maximum size should be defined carefully to cover the path exploration period, which depends on the topology and the peers; the minimum size should be at least equal to the Minimum Route Advertisement Interval (MRAI). In the experiments, we choose 16 × MRAI as the maximum size. This design is based on two observations. First, a burst of BGP updates caused by a single route flap should only be penalized once. Second, the purpose of RFD is to prevent persistent route oscillations caused by link/router failures or misconfiguration. Although the sampling may miss some route flaps, it is capable of detecting and quickly penalizing long-term route flaps through the shrinking sampling window. Current RFD suppresses a route until the penalty drops below the reuse threshold: even when the route is no longer flapping, it cannot be selected as the best route, which can significantly delay fail-over. In earlier work, we observed that many prefixes have a primary AS path, the route that has been used most of the time. When route flaps happen, the primary route is very likely to be selected as the best path again after convergence; the primary path can thus indicate that route convergence has completed and a stable route has been selected. Based on this heuristic, we propose early reuse – reducing the penalty value by half whenever the primary path is received. A sketch of the resulting penalty update appears below.
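The following Python sketch combines the sampling filter and early-reuse heuristics; the penalty increment and the MRAI value are illustrative, and penalty decay and suppression logic are omitted:

```python
class FRFDPenalty:
    """Penalty bookkeeping for filter-based RFD. Exponential penalty
    decay and route suppression are omitted; with decay in place, the
    sampling window is reset to W_MAX once the penalty falls below the
    reuse threshold or the route has been stable for a long time."""

    MRAI = 30.0
    W_MAX, W_MIN = 16 * MRAI, MRAI     # sampling window bounds (seconds)

    def __init__(self):
        self.window = self.W_MAX
        self.window_start = None
        self.penalty = 0.0

    def on_update(self, now, flap_penalty=1000.0):
        """Penalize only the first update within each sampling window."""
        if self.window_start is not None and now < self.window_start + self.window:
            return                      # sampled out: not penalized
        if self.window_start is not None:
            # flaps persist beyond the window: halve its size
            self.window = max(self.window / 2, self.W_MIN)
        self.window_start = now
        self.penalty += flap_penalty

    def on_primary_path(self):
        """Early reuse: halve the penalty when the primary AS path returns."""
        self.penalty /= 2
```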
3 Evaluation and Comparison
We performed a set of experiments on SSFNet. Our goal is to reveal, given the same network topology and route flap events, how the three RFD mechanisms impact network convergence, reachability, and stability. We use a two-dimensional 10×10 grid topology to simulate a flat transit network. Each node represents a single EBGP router, and route selection is based solely on the length of the AS path. Node 0 announces a prefix p and generates different update sequences for p. To simulate a multi-homing environment, we attach another node (node 101) that announces the same prefix p. We simulate three types of BGP update sequences as input to the network:
1. Route Flapping (RF): a sequence of route UP and DOWN events.
2. Route Oscillation (RO): a sequence of oscillating routes, simulating persistent route attribute changes.
3. Slow Convergence (SC): a sequence of path explorations followed by a route withdrawal and an announcement, representing the BGP path exploration process corresponding to a failure and fail-over event.
We apply four metrics to measure the behavior of RFD from different perspectives:
1. The total number of BGP update messages.
2. The delayed convergence time, defined as the interval between the time when node 0 re-advertises the initial updates and the time when the network stops generating updates.
3. The total number of nodes that lose routes to the prefix advertised by node 0.
4. α-instability, where α is the time limit for a router to use a nexthop for switching. Any fast change of nexthop in the forwarding table (in our experiment, a nexthop change is equal to a FIB interface change) is counted as an unstable change of the forwarding plane if this nexthop is used for less than α seconds. Thus, an α-unstable nexthop is a nexthop that is installed in the FIB for less than α seconds, and α-instability is defined as the total time that α-unstable nexthops are used for forwarding; it is a score measuring the forwarding instability of the whole network (a computational sketch follows the findings below).
We present the experiment results and compare the three RFD mechanisms. These results are based on experiments where the MRAI is set to 5 seconds; we also analyzed results with the MRAI set to 30 seconds, and there is no significant difference between the two settings. Our findings include the following:
– All three RFD mechanisms reduce the number of updates as expected. For the first event, persistent route flaps, RFD reduces the number of updates by 87% in the single-homing network and 90% in the multi-homing network; SRFD cuts off 70% and 83% of the updates, and FRFD suppresses 65% and 80%. However, in the second and third events the reduction is not as drastic; in the multi-homing network, the RFD mechanism even increases the number of BGP updates.
– Although regular RFD performs better in terms of BGP update reduction, it sacrifices convergence time and route availability. Convergence is delayed by approximately 4000–7000 seconds. SRFD improves convergence by avoiding reuse-triggered suppression; compared to regular RFD, FRFD reduces the delayed convergence by half.
– RFD hurts reachability. Due to reuse-triggered suppression in RFD, some nodes lose routes for more than 4000 seconds. In the event of persistent route flaps, although reuse-triggered suppression is unavoidable, SRFD achieves a shorter suppression, while FRFD keeps the route reachable on approximately half of the nodes. In the second event, path oscillation, FRFD also keeps half of the nodes reachable thanks to early reuse. In the third event, slow convergence, both SRFD and FRFD do not suppress the route and the number of updates is not reduced; on the contrary, although RFD reduces 19% of the updates, reachability is cut off for 2000–4000 seconds.
– In terms of forwarding instability, RFD achieves the most stable forwarding in all three events. FRFD is preferable to SRFD in the first two events and has the same score as SRFD in the third. In addition, if we only consider nexthop changes within 4 seconds to be unstable, the stability achieved by FRFD is about the same as that of RFD.
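For reference, the α-instability score defined above reduces to a sum over the logged nexthop residence times; a sketch in which the log format is an assumption:

```python
def alpha_instability(nexthop_durations, alpha):
    """nexthop_durations: seconds each nexthop stayed installed in the FIB.
    A nexthop used for less than alpha seconds is alpha-unstable; the score
    is the total time such nexthops were used for forwarding."""
    return sum(d for d in nexthop_durations if d < alpha)

# e.g. alpha_instability([120.0, 2.5, 0.8, 300.0], alpha=4.0) == 3.3
```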
4 Conclusion
In this paper, we proposed a filter-based RFD, applying two simple heuristics to the current RFD. We performed extensive experiments to evaluate the filter-based RFD, measuring the number of BGP updates suppressed, network convergence, network reachability, and stability. We took into account various factors that may influence the performance of RFD, including the MRAI timer, different BGP events, single-homing, and multi-homing. We demonstrated that, by optimizing the current RFD, we can stabilize the network without sacrificing reachability too much.
References
1. C. Villamizar, R. Chandra, and R. Govindan. BGP Route Flap Damping. RFC 2439, May 1998.
2. Z. Mao, R. Govindan, G. Varghese, and R. Katz. Route Flap Damping Exacerbates Internet Routing Convergence. In Proceedings of ACM SIGCOMM, August 2002.
3. Zhenhai Duan, Jaideep Chandrashekar, Jeffrey Krasky, Kuai Xu, and Zhi-Li Zhang. Damping BGP Route Flaps. In 23rd IEEE International Performance Computing and Communications Conference, 2004.
4. Beichuan Zhang, Dan Pei, Daniel Massey, and Lixia Zhang. Timer Interaction in Route Flap Damping. In The 25th International Conference on Distributed Computing Systems (ICDCS), 2005.
5. P. Smith and C. Panigl. RIPE Routing-WG Recommendations on Route-flap Damping. Technical Report 378, RIPE, May 2006.
6. Olaf Maennel and Anja Feldmann. Realistic BGP Traffic for Test Labs. In Proceedings of ACM SIGCOMM '02, August 2002.
Network Access in a Diversified Internet
Michael Wilson, Fred Kuhns, and Jonathan Turner
Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130
{mlw2,fredk,jst}@arl.wustl.edu
Abstract. There is a growing interest in virtualized network infrastructures as a means to enable experimental evaluation of new network architectures on a realistic scale. The National Science Foundation’s GENI initiative seeks to develop a national experimental facility that would include virtualized network platforms that can support many concurrent experimental networks, with the goal of reducing barriers to new network architectures. This paper focuses on how to extend the concept of virtualized networking through LAN-based access networks to the end systems. We demonstrate that our approach can improve performance by an order of magnitude over other approaches and can enable virtual networks that provide end-to-end quality of service.
1 Introduction Today’s Internet has grown far beyond the original design. New requirements have grown almost as rapidly as the scale of the Internet. Unfortunately, the Internet is owned by no single stakeholder, making it difficult or impossible to upgrade the underlying architecture. [1] As recognized in [2], the inability of the current Internet architecture to meet new needs has led to the development of numerous ad hoc solutions to legitimate problems. The Internet needs a means of deploying potentially disruptive technologies alongside existing technologies. Virtualized networks and protocols could be deployed side-by-side but would be isolated by the virtualization mechanisms. The GENI [3] initiative seeks to use virtualization to create a national experimental facility for experimentation based on these very ideas. Overlay networks have been proposed as one method of virtualizing the network. However, overlay networks exist on top of existing networks and protocols. We believe that overlay networks should be regarded as a temporary migration solution to allow legacy networks to participate in new services. We propose to make network virtualization a core capability of a next generation diversified internet (in the remainder of this paper, we use the term diversification in place of virtualization, because the “V-word” has been so overloaded that it is often misinterpreted). In our diversified internet model, the underlying network provides a minimal set of services and a thin provisioning layer upon which new protocols may be developed. More details can be found in [4]. The fundamental abstractions for a diversified network are substrate routers, which are connected to each other by point-to-point substrate links; and metarouters, I.F. Akyildiz et al. (Eds.): NETWORKING 2007, LNCS 4479, pp. 1204–1207, 2007. © IFIP International Federation for Information Processing 2007
which are hosted on substrate routers and are connected to each other by point-to-point metalinks carried over substrate links. Collectively, a set of connected metarouters forms a metanet, exchanging metaframes that adhere to a metaprotocol. We refer to the software components that support these abstractions as the Network Diversification Architecture. In this paper, we focus on the impact of internet diversification on the access network and end systems.
2 Diversification of the Access Network

The access network provides the connection between a network endpoint and the first substrate router. We expect that Ethernet will continue to be one of the most common underlying technologies for access networks, and we focus our attention on the Ethernet context in this paper.

2.1 Objectives

The overarching objective for the access network in a diversified network infrastructure is to make it possible for end systems to take advantage of any network services that may be provided by metanetworks. This objective leads us to the following specific goals.

• Enable provisioned access. To support applications which need QoS guarantees, and to enable isolation between metanets, access links must be provisioned.
• Enable dynamic reallocation of access capacity. Access network traffic is inherently more dynamic than backbone traffic, and the model should support changes.
• Support existing Internet protocols. The existing Internet protocols should be able to operate within a diversified network environment with no loss of functionality and no significant performance degradation.
3 Diversification of the Hosts

Host diversification mechanisms allow the introduction of new Metanet Protocol Stacks (MPSs) that provide metanet-specific services to applications. These mechanisms include a common substrate which is independent of metanets, but can be configured on behalf of individual metanets.

3.1 Objectives

There are several key objectives that drive the design of the host diversification architecture.

• Security. An MPS should have no more privileges than any ordinary application, and administrative access should not be necessary for MPS operation. Applications using an MPS need no administrative access and should not be trusted by the MPS.
• Traffic Isolation. Provisioned metalinks must be isolated from one another and from other network traffic. Hosts must ensure that MPSs do not exceed their assigned rates and that other traffic does not impact MPS provisions.
• Efficiency. The performance of a metanet protocol stack should be comparable to that of a stack integrated into the OS kernel.
• Support Commodity Operating Systems. We cannot expect users to adopt nonstandard operating systems in order to use metanetworks. The software must run on standard OS platforms, including Linux and Windows.
• Developer Ease. Applications using a new MPS should use familiar APIs.
• Ease of Adding New Metanet Stacks. Installing a new MPS should be no more difficult (or dangerous!) than installing an application program.

3.2 Software Design

In most systems today, network protocol stacks are integrated into the OS kernel and are accessed through the socket interface. This gives the network code unprotected access to kernel data structures. We expect many organizations to develop MPSs, and requiring that new MPSs be added to the OS kernel would bring unacceptable security risks. We solve this problem with a hybrid approach: a user-space implementation of metanet control together with trusted, metanet-independent OS kernel extensions for the data plane.

The Substrate Kernel Module (SKM) is a loadable kernel module that coordinates control plane transactions with a metanet control daemon, but handles all metanet data plane operations within the kernel. The metanet control daemon runs in user space in an unprivileged context; it handles control functionality for the MPS, but is divorced from the data path. User applications interact with an MPS using the standard socket interface. Control requests are forwarded from the SKM to the control daemon; send and receive operations pass through the SKM.
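To make the split concrete, the following is a purely illustrative Python model of the control/data dispatch just described; the class and operation names are our own invention and not the actual SKM or daemon interfaces.

```python
# Purely illustrative model of the control/data split described above;
# the class and operation names are assumptions, not the real SKM code.

CONTROL_OPS = {"bind", "connect", "setsockopt"}     # forwarded to the daemon
DATA_OPS = {"send", "recv"}                         # handled inside the kernel

class MetanetControlDaemon:
    """Unprivileged user-space daemon: control logic for one MPS."""
    def handle(self, op, *args):
        print(f"daemon: control request {op}{args}")
        return 0                                    # e.g., configure a metalink

class SubstrateKernelModule:
    """Metanet-independent kernel extension that owns the data path."""
    def __init__(self, daemon):
        self.daemon = daemon

    def socket_call(self, op, *args):
        if op in CONTROL_OPS:
            return self.daemon.handle(op, *args)    # control leaves the kernel
        if op in DATA_OPS:
            return self._fast_path(op, *args)       # data never does
        raise ValueError(f"unsupported operation: {op}")

    def _fast_path(self, op, *args):
        print(f"SKM: in-kernel data-plane {op}")
        return len(args[0]) if op == "send" else b""

skm = SubstrateKernelModule(MetanetControlDaemon())
skm.socket_call("connect", ("metarouter-addr", 7))
skm.socket_call("send", b"metaframe payload")
```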
4 Prototype Performance

Our initial prototype was developed on Linux 2.6.16. We currently support a subset of the socket operations, as some operations are nonsensical in our model; our choices and reasoning are discussed in more detail in the expanded technical report [5].

To test the performance of the system, we created a metanet protocol resembling a combined UDP/IP. We created a test network with two 2.4 GHz machines connected via a 1000 Mb/s switch. Using our new metanet protocol, we measured CPU utilization vs. sending rate limit for rates from 1 Mb/s to 1000 Mb/s, using maximum-size packets (1500 octets). As shown in Fig. 1, our CPU utilization is largely linear with respect to bandwidth.

The spike near 600 Mb/s is due to the implementation of the Linux token bucket. To see if there are sufficient tokens to allow sending traffic, the token bucket first dequeues a packet and checks its length. If there are insufficient tokens available, it requeues the packet. This process is repeated every time a packet is queued and at every clock tick.
Fig. 1. CPU utilization vs. sending rate as limited by egress queues for metanet and native UDP. Senders were limited by token buckets until 780 Mb/s, where system I/O limits governed.
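To make the behavior concrete, here is a simplified Python sketch of the dequeue-check-requeue cycle described above; the structure and constants are illustrative assumptions, not the actual Linux token bucket (tbf) implementation.

```python
from collections import deque

# Simplified model of the dequeue-check-requeue cycle described above.
# Structure and constants are illustrative; this is not the Linux tbf code.

class TokenBucket:
    def __init__(self, rate_bps, bucket_bits):
        self.rate = rate_bps              # token refill rate, in bits per second
        self.tokens = bucket_bits         # current token count, in bits
        self.cap = bucket_bits            # bucket capacity, in bits
        self.queue = deque()              # queued packet lengths, in bits
        self.requeues = 0                 # wasted dequeue/requeue operations

    def refill(self, dt):
        self.tokens = min(self.cap, self.tokens + self.rate * dt)

    def try_send(self):
        """Run when a packet is queued and again at every clock tick."""
        sent = []
        while self.queue:
            pkt = self.queue.popleft()    # must dequeue to learn the length
            if pkt <= self.tokens:
                self.tokens -= pkt        # enough tokens: transmit the packet
                sent.append(pkt)
            else:
                self.queue.appendleft(pkt)  # too few tokens: requeue and wait
                self.requeues += 1          # this is the work that spikes CPU
                break
        return sent

tb = TokenBucket(rate_bps=600e6, bucket_bits=24000)
tb.queue.extend([12000] * 3)              # three maximum-size (1500-octet) packets
tb.try_send()                             # sends two, requeues the third
tb.refill(dt=0.001)                       # tokens accumulate at the next tick
tb.try_send()
print(f"requeues observed: {tb.requeues}")
```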
At speeds of 600 Mb/s, we saw upwards of 50,000 requeues per second. At higher rate limits, the queue never has a chance to run out of tokens, so packets are never requeued. Because of additional outbound validation overhead, our CPU utilization is always higher than that of native UDP. However, comparable systems such as Oasis [6] and PL-VINI [7] become CPU-bound at 3 Mb/s and 200 Mb/s, respectively; we regard our system as a worthwhile gain in efficiency. Further evaluation of our system may be found in the technical report [5].

Acknowledgements. This work is supported by the National Science Foundation (CNS 0325298, 0520778 and 0626661).
References

[1] T. Anderson, L. Peterson, S. Shenker, J. Turner, “Overcoming the Internet Impasse through Virtualization,” IEEE Computer Magazine, Apr. 2005.
[2] Report of NSF Workshop on Overcoming Barriers to Disruptive Innovation in Networking, January 2005. http://www.arl.wustl.edu/netv/noBarriers_final_report.pdf
[3] GENI web site. http://www.geni.net
[4] J. Turner, D. Taylor, “Diversifying the Internet,” Proceedings of Globecom, Nov. 2005.
[5] M. Wilson, F. Kuhns, J. Turner, “Network Access in a Diversified Internet,” Washington University Technical Report WUCSE-2007-14, Feb. 2007.
[6] H. V. Madhyastha, A. Venkataramani, A. Krishnamurthy, T. Anderson, “Oasis: An Overlay-Aware Network Stack,” SIGOPS Oper. Syst. Rev. 40(1), Jan. 2006, pp. 41–48.
[7] A. Bavier, N. Feamster, M. Huang, L. Peterson, J. Rexford, “In VINI Veritas: Realistic and Controlled Network Experimentation,” SIGCOMM 2006.
Outburst: Efficient Overlay Content Distribution with Rateless Codes

Chuan Wu and Baochun Li

Department of Electrical and Computer Engineering, University of Toronto
{chuanwu,bli}@eecg.toronto.edu
Abstract. The challenges of significant network dynamics and limited bandwidth capacities have to be considered when designing efficient algorithms for distributing large volumes of content in overlay networks. This paper presents Outburst, a novel approach for overlay content distribution based on rateless codes. In Outburst, we code content bitstreams with rateless codes at the source, and take advantage of the superior properties of rateless codes to provide resilience against network dynamics and node failures. We recode the bitstreams at each receiver node, so that the need for content reconciliation in parallel downloading is eliminated, and the delivery of redundant content is minimized. The effectiveness and efficiency of Outburst are demonstrated with simulations. Keywords: Overlay Network, Rateless Codes, Content Reconciliation.
1 Introduction

Compared to traditional solutions using multiple unicasts, content distribution over overlay networks offers more efficient bandwidth usage and server load distribution. There are, however, two key challenges in overlay distribution of large volumes of data. First, to achieve higher throughput and failure resilience, parallel downloading from multiple overlay nodes has become typical in recent proposals. Nevertheless, a risk arises that the same content may be unnecessarily supplied by multiple upstream nodes. To maximize bandwidth efficiency, a receiver needs to reconcile the differences among a set of upstream nodes before the actual downloading, a problem referred to as content reconciliation. In large-scale overlay networks, such a reconciliation process constitutes a complicated and bandwidth-intensive task [1]. Second, overlay content distribution sessions may be routinely disturbed by dynamics in overlay networks, such as node departures and failures. Throughput for bulk data downloading may be significantly affected by such dynamics.

This paper proposes Outburst, a novel approach that utilizes rateless codes to address both challenges. Rateless codes, such as LT codes [2], Raptor codes [3] and online codes [4], possess the important characteristic of being extremely loss resilient. In Outburst, we take advantage of this property to achieve the desired resilience against losses and node dynamics. In addition, we discuss possible solutions to the content reconciliation problem, and propose an approach based on rateless
recoding at each participating overlay node. Our rateless recoding proposal can completely eliminate the need for content reconciliation in parallel downloading, based on other salient properties of rateless codes. The remainder of this paper is organized as follows. In Sec. 2, we present our network model for using rateless codes, and discuss the recoding approach. The baseline protocol and dynamics handling protocol are presented in Sec. 3. We present simulation results, discuss related work and conclude the paper in Sec. 4, Sec. 5 and Sec. 6.
2 Outburst: Efficient Content Distribution with Rateless Codes

In this paper, we consider content distribution in mesh overlay topologies, consisting of one data source S and multiple receivers in T. Each receiver is served by one or more upstream nodes, and may serve one or more downstream nodes. We divide the bulk data file to be distributed into segments s1, s2, . . .. Each segment contains k blocks, and each block has a fixed length of L bits. In Outburst, we code each segment with a rateless code and deliver coded blocks for each segment in the network.

2.1 Source Coding with Rateless Codes

We now motivate the use of rateless codes in Outburst. In contrast with traditional erasure codes, the benefits of rateless codes are related to the fundamental challenges in overlay content distribution: volatile network dynamics and content reconciliation.
Fig. 1. Content reconciliation problem with different coding schemes, a comparison: (a) content distribution with fixed-rate erasure codes; (b) content distribution with rateless codes only at the source; (c) content distribution with rateless recoding
Erasure source coding has been used in recent years to cope with network dynamics in content distribution [1,5]. A traditional (n, k) erasure code, such as a Reed-Solomon code or Tornado code [5,6], is a forward error correction code with parameters k, the number of original symbols, and n, the number of coded symbols. The ratio k/n is referred to as the rate of the code. An erasure code is loss-resilient: if any k (or slightly more than k) of the n coded symbols are received, the k original symbols can be recovered. This makes erasure codes an ideal solution for reliable transmission over an unreliable transport protocol, such as UDP. Also, since any distinct symbol from any upstream node can be used for decoding, a receiver does not rely on a specific upstream node for the supply of certain original symbols. This makes erasure codes resilient to node failures.
In addition, the use of erasure source coding mitigates the need for content reconciliation in parallel downloading. By expanding the k original symbols to a larger symbol space of size n, the probability of different nodes holding the same symbols decreases. However, since the total number of coded symbols is fixed, the problem is not completely solved. We show an example in Fig. 1(a), where S generates 8 coded blocks from 5 original blocks with an (8, 5) erasure code and transmits them to four receivers. It is apparent that t2 and t4 still need to reconcile their parallel downloading from upstream nodes t1, t3 and t2, t3, respectively.

To further address the content reconciliation problem, as well as to provide better resilience to network dynamics, we propose to use rateless codes as the foundation of Outburst. Rateless codes constitute a category of recently proposed erasure codes, including LT codes [2], Raptor codes [3] and online codes [4]. They are called “rateless” because the number of coded symbols that can be generated from k original symbols is potentially unlimited. Rateless codes are failure-tolerant, as they retain the desirable property that the k original symbols are decodable from slightly more than k coded symbols with high probability. Furthermore, rateless codes possess two key advantages, which make them a more suitable solution for overlay content distribution.

1) Efficient encoding and decoding. We briefly illustrate the basic idea behind the encoding and decoding process of a rateless code. Given k input symbols, the basic operation performed by a rateless-code encoder is to exclusive-or a subset of the input symbols, which is randomly chosen based on a special degree distribution, such as the Robust Soliton distribution for LT codes. This simple encoding process makes it possible to produce coded blocks on the fly when required. The encoding process defines a decoding graph that connects coded symbols to input symbols, and the encoding information for each coded symbol, i.e., its degree and its set of neighbors in the decoding graph, is communicated to the receiver for decoding. The Belief-Propagation (BP) decoder constructs the decoding graph once it has received slightly more than k coded symbols and their encoding information. In each round of the decoding process, the decoder identifies a coded symbol with degree one and recovers the value of its unique neighbor among the input symbols. The value of the recovered input symbol is then exclusive-or’ed into the values of all its neighboring coded symbols, and all the incident edges are removed. This process is repeated until all the input symbols are recovered. As both encoding and decoding involve only exclusive-or operations, rateless codes are very computationally efficient.
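As an illustration of these ideas, the following is a minimal, self-contained Python sketch of rateless encoding and BP (peeling) decoding. For brevity it draws degrees uniformly rather than from the Robust Soliton distribution that real LT codes use, so its reception overhead is worse, but the peeling logic is the same.

```python
import random

# Minimal sketch of rateless encoding and BP (peeling) decoding.
# Assumption: degrees are drawn uniformly instead of from the Robust Soliton
# distribution of real LT codes, so the reception overhead here is worse.

def encode_symbol(blocks, rng):
    """XOR a randomly chosen subset of input blocks into one coded symbol."""
    d = rng.randint(1, len(blocks))                    # degree of the symbol
    neighbors = rng.sample(range(len(blocks)), d)      # its encoding information
    value = 0
    for i in neighbors:
        value ^= blocks[i]
    return set(neighbors), value

def bp_decode(k, symbols):
    """Peel degree-one symbols until all k blocks are recovered (or stuck)."""
    recovered = [None] * k
    work = [[set(n), v] for n, v in symbols]           # private, mutable copies
    progress = True
    while progress:
        progress = False
        for n, v in work:
            if len(n) == 1:                            # degree-one symbol found
                i = next(iter(n))
                if recovered[i] is None:
                    recovered[i] = v
                    progress = True
                    for other in work:                 # peel block i everywhere
                        if i in other[0]:
                            other[0].discard(i)
                            other[1] ^= v
    return recovered if all(b is not None for b in recovered) else None

rng = random.Random(42)
k = 8
blocks = [rng.getrandbits(32) for _ in range(k)]       # blocks modeled as ints
coded = []
while True:                                            # collect until decodable
    coded.append(encode_symbol(blocks, rng))
    out = bp_decode(k, coded)
    if out is not None:
        break
assert out == blocks
print(f"recovered all {k} blocks from {len(coded)} coded symbols")
```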
downloads from t3 . To completely solve the content reconciliation problem throughout the topology, we propose to generate new coded blocks for a segment on each receiver node, instead of purely relaying the received blocks. 2.2 Recoding with Rateless Codes The basic idea in Outburst is to generate freshly coded blocks at each receiver, so that all the received blocks from any upstream nodes are unique, and useful for decoding at receivers. To this end, we seek to find an efficient recoding scheme at each receiver. At the first thought, a question to ask is: Is it possible to directly recode incoming coded blocks of a segment at each receiver, such that the new generated blocks are also useful for decoding at other receivers? If so, we can employ such direct recoding at each receiver. Unfortunately, with the example of LT codes, we show that the favorable property of efficient decoding is not maintained and the decodability is not guaranteed, if we directly recode received blocks with the same Robust Soliton distribution. Direct Recoding with LT Codes is not Feasible. With an LT code, a segment is encoded with the Robust Soliton distribution. This degree distribution plays a significant role in the success of BP decoding. With it, the probability for a coded block to have a small degree in the decoding graph is high, but the probability quickly decreases as the degree becomes larger. For example, if 10 input blocks are encoded, a coded block has a probability of 0.5 to have degree 2, or a probability of 0.2 to have degree 3. We show that, if we directly recode the received coded blocks on the same Robust Soliton distribution at a receiver, such a degree distribution is not retained in the decoding graph connecting recoded blocks to original blocks. The expected degree of a recoded block in this decoding graph tends to increase. For an example in Fig. 2, from 8 original blocks, S generates 6 coded blocks to transmit to n1 , and 5 additional blocks to n2 . n1 directly recodes the 6 received blocks into 5 new blocks to serve t, while n2 recodes its 5 received blocks to produce 3 new blocks for t. At t, the decoding graph connecting the 8 received blocks to the 8 original blocks is depicted. This graph, with average degree of 3.6, is much denser than that at source S with average degree of 2.5. Since the desirable Robust Soliton distribution is not retained with direct recoding, it is unlikely that the same superior decoding efficiency of BP decoder in LT codes can be achieved. Further, the decodability with such recoded blocks is not guaranteed with the same high probability as the original LT codes. Outburst’s Recoding Scheme. To design a recoding scheme which retains a degree distribution, we investigate another favorable property of rateless codes — the receiver may decode from coded symbols generated by different devices operating a same ratelesscode encoder, as long as they are generated from the same set of input symbols [3]. In Outburst, the data source encodes blocks of each segment with a rateless code based on a certain special degree distribution, such as the LT code with Robust Soliton Distribution, and transmits the coded blocks. After a receiver receives slightly more than k coded blocks for segment si , it decodes and obtains the k original blocks. Upon requests for segment si from its downstream nodes, it generates freshly coded blocks from the recovered original blocks, using a rateless-code encoder based on the same
Fig. 2. Direct LT recoding with the Robust Soliton distribution: an example
In what follows, we show that such a recoding process is correct and efficient.

Correctness. In Outburst, the coded blocks a node receives for segment si are either encoded by the source or recoded by a receiver, in both cases from the same set of k original blocks of si. Since all the encoders follow the same encoding steps and generate each block independently of any other, based on the same degree distribution, the coded blocks are all potentially unique, as if they were produced by a single encoder. Thus, after collecting slightly more than k coded blocks from any upstream nodes, a receiver can recover the k original blocks with the same high probability as with the original code. By guaranteeing the potential uniqueness of all the coded blocks in the network, our recoding scheme eliminates the need for content reconciliation. In Fig. 1(c), for example, all receivers decode their received blocks and recode them into a potentially unlimited number of coded blocks. Since the blocks in transit are all freshly encoded by their senders, no reconciliation is needed for parallel downloading at any receiver. An analogy is to describe this situation as having many “mini-sources” in the overlay, each serving at least one segment with an unlimited number of coded blocks.

Efficiency. As previously mentioned, rateless codes are highly efficient with respect to encoding and decoding, which makes it feasible to recode on the fly at the receivers. For LT codes, it takes on average O(ln(k/δ)) block exclusive-or operations to generate a coded block from k input blocks, and O(k ln(k/δ)) block exclusive-or operations to recover the k original blocks from any k + O(√k ln²(k/δ)) coded blocks with probability 1 − δ. Each block exclusive-or operation consists of L bitwise exclusive-or operations. Even better, linear-time, encoding and decoding are provided by Raptor codes. For decoding, the decoding graph can actually be constructed on the fly while receiving coded blocks; based on belief propagation, an original block can be recovered whenever there is enough information to recover it.
Thus, our recoding scheme does not introduce much delay or computation overhead, yet it eliminates the content reconciliation otherwise required for every parallel retrieval.
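A sketch of the recode step itself, under the same modeling assumptions as the encoding sketch above: once the k original blocks of a segment have been recovered, the receiver acts as a "mini-source" and simply runs the source's encoder logic over them to mint fresh blocks.

```python
import random

# Sketch of Outburst's recoding step (same assumptions as the earlier sketch):
# a receiver that has decoded the k original blocks of a segment mints fresh
# coded blocks on demand for its downstream nodes.

def recode(original_blocks, n_fresh, rng):
    fresh = []
    for _ in range(n_fresh):
        d = rng.randint(1, len(original_blocks))   # same degree logic as the source
        neighbors = rng.sample(range(len(original_blocks)), d)
        value = 0
        for i in neighbors:
            value ^= original_blocks[i]
        fresh.append((set(neighbors), value))
    return fresh   # freshly generated, so downstream nodes need no reconciliation

rng = random.Random(7)
segment = [rng.getrandbits(32) for _ in range(8)]  # blocks recovered by BP decoding
for_downstream = recode(segment, n_fresh=5, rng=rng)
print(f"generated {len(for_downstream)} fresh coded blocks")
```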
3 Outburst: Protocols

We now present the practical protocols that the source and receivers employ in Outburst.

3.1 Baseline Protocols

In Outburst, a receiver can essentially choose any available segment to download from any upstream node. Even when it is concurrently downloading coded blocks for the same segment from multiple upstream nodes, the received blocks can all be used for decoding the segment with high probability. In the practical retrieval protocol design, we consider two problems. First, as a receiver needs to decode a segment before the segment becomes available to be recoded and served to other nodes, a potential problem may arise in which an upstream node holds partial sets of coded blocks for many segments at a given time, but not sufficient coded blocks to recover any single segment. Second, as the data source and overlay nodes may fail unexpectedly, segment diversity needs to be maintained throughout the network for better failure tolerance.

Our strategies to address these problems are as follows. When a receiver v decides which segment to retrieve from upstream node u, it first checks whether any of the segments it partially holds (currently being downloaded from other nodes, or previously downloaded from a node that has since failed) is available at u. If so, it randomly chooses one such segment and requests it from u; otherwise, it randomly selects an available segment at u and requests its coded blocks. At the upstream side, when node u receives a request for a specific segment from v, it starts generating coded blocks for the segment and keeps pushing them to v. When the segment is successfully decoded at v, v sends a “stop” message to u to terminate the generation and delivery of coded blocks for this segment, and requests a new available segment if one exists. A sketch of this selection rule is given at the end of this section.

3.2 Handling Node Departures and Failures

In Outburst, upon detecting the departure or failure of an upstream node, a downstream node tries to increase its download bandwidth from the remaining upstream nodes. While this appears intuitive, we note that such simple handling of node dynamics, which practically compensates the throughput loss from other upstream nodes, is efficient only because of rateless recoding in Outburst. As rateless recoding guarantees that all coded blocks in the entire overlay are unique, we can rest assured that the compensating download bandwidth is indeed fully utilized to deliver useful blocks for decoding, without any need for reconciliation. Also, our segment retrieval strategy maximizes the diversity of segments in the network and minimizes the chance of holding only partial blocks of a segment in case of node departures or failures. Working together, these measures provide excellent failure tolerance for the content distribution.
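As referenced in Sec. 3.1, the following is a compact sketch of the segment selection rule; the function and variable names are illustrative.

```python
import random

# Compact sketch of the retrieval rule of Sec. 3.1; names are illustrative.

def choose_segment(partial_at_v, available_at_u, rng=random):
    """Pick the segment that receiver v should request from upstream node u."""
    finishable = set(partial_at_v) & set(available_at_u)
    if finishable:                        # prefer finishing partially held segments
        return rng.choice(sorted(finishable))
    if available_at_u:                    # otherwise pick any available segment
        return rng.choice(sorted(available_at_u))
    return None                           # u has nothing useful for v right now

print(choose_segment({"s2", "s5"}, {"s1", "s5", "s7"}))   # prints: s5
```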
4 Performance Evaluation

Our simulations are conducted over random network topologies generated with BRITE [7], based on power-law node degree distributions. The average number of neighbors per node is six. Each node, including the data source, has 1.5 to 4.5 Mbps of download bandwidth and 0.6 to 0.9 Mbps of upload bandwidth. Unless otherwise stated, each segment of the data file to be distributed consists of 100 blocks.

4.1 Maximization of Bandwidth Utilization

We first compare bandwidth efficiency among four different schemes: source coding (SC) and recoding (RC) with rateless codes, source coding only with rateless codes, source coding with erasure codes (n/k = 8), and no coding. Under each scheme, the content blocks are delivered without reconciliation among upstream nodes. For the scheme without coding, we eliminate duplicated blocks obtained from different upstream nodes and calculate the throughput of distinct content blocks at each receiver; for the schemes with coding, we eliminate duplicated and non-useful coded blocks, decode the content, and compute the throughput of decoded original blocks. The bandwidth efficiency of each system is computed as the aggregate throughput at all receivers divided by the total bandwidth consumption.

In all the comparison scenarios shown in Fig. 3, Outburst’s rateless source coding and recoding scheme always achieves the highest bandwidth efficiency, as it best eliminates delivery redundancy. Fig. 3(B) further shows that increasing the number of blocks in each segment helps improve bandwidth efficiency. Nevertheless, the other schemes never outperform Outburst, and their bandwidth efficiency becomes stable when k exceeds 100.

4.2 Tolerance Against Node Failures

In our next experiments over a 300-node network, we randomly choose different percentages of nodes to fail concurrently, and calculate the remaining throughput of receiving original content at the remaining receivers. Under all schemes, the receivers perform the same failure handling protocol discussed in Sec. 3.2. Fig. 4 reveals that the average throughput in Outburst remains almost unaffected for failure percentages up to 40%.
Fig. 3. A comparison of bandwidth efficiency: (A) comparison over different network sizes (k = 100); (B) comparison over different block numbers in a 300-node network
Fig. 4. A comparison of failure tolerance
For the other schemes, throughput starts to drop as soon as failures occur, and it drops faster than Outburst’s when the failure percentage is high. These results demonstrate the excellent failure tolerance of Outburst.
5 Related Work

To enhance delivery bandwidth and reliability, mesh-based proposals have become typical in recent overlay content distribution systems [8,9]. In a mesh overlay, each receiver decides which upstream node to retrieve a specific block from. In Outburst, we make every coded block from any upstream node equally useful, sparing the receivers the burden of reconciliation. In a well-known work on content reconciliation, Byers et al. [1] provide algorithms for reconciling symbols between node pairs; the algorithms are quite resource-intensive with respect to computation and messaging.

Some existing overlay content distribution proposals advocate erasure codes, e.g., Reed-Solomon codes and Tornado codes, to provide reliability and flexibility [1,8]. A more recent work by Maymounkov et al. [4] uses online codes, a type of rateless code. These proposals encode only at the source and do not recode at the receivers; they thus mitigate the need for content reconciliation but do not eliminate it. For recoding with erasure codes, Byers et al. [1] discuss direct recoding of Tornado-code-encoded symbols to mitigate delivery redundancy. They advocate heuristics to construct recoding degrees, but do not prove the decodability of the recoded symbols.

Network coding has been studied to allow encoding at intermediate nodes in a network [10]. Avalanche [11] is a well-known content distribution scheme with network coding. Similar to Outburst, it is robust to node dynamics and reduces delivery redundancy. However, decoding of network coding involves matrix inversions over a finite field up to GF(2^16), which is known to be more complex than decoding with only XORs in rateless codes.
6 Conclusion

This paper presents Outburst, a solution for efficient content distribution over overlay mesh topologies. Using rateless codes in a novel way, encoding at both the source and the receivers, it effectively addresses the fundamental challenges of dynamics, reconciliation, and limited bandwidth in overlay content distribution. With examples, analysis and simulation results, we demonstrate that Outburst achieves high bandwidth efficiency and excellent failure tolerance compared with traditional schemes with or without erasure codes. These benefits motivate further work towards its implementation in realistic large-scale content distribution applications.
References

1. Byers, J., Considine, J., Mitzenmacher, M., Rost, S.: Informed Content Delivery Across Adaptive Overlay Networks. In: Proc. of ACM SIGCOMM 2002 (August 2002)
2. Luby, M.: LT Codes. In: Proc. of the 43rd Symposium on Foundations of Computer Science (November 2002)
3. Shokrollahi, A.: Raptor Codes. In: Proc. of the IEEE International Symposium on Information Theory (June 2004)
4. Maymounkov, P., Mazieres, D.: Rateless Codes and Big Downloads. In: Proc. of the 2nd Int. Workshop on Peer-to-Peer Systems (IPTPS) (February 2003)
5. Byers, J., Luby, M., Mitzenmacher, M., Rege, A.: A Digital Fountain Approach to Reliable Distribution of Bulk Data. In: Proc. of ACM SIGCOMM 1998 (September 1998)
6. Luby, M., Mitzenmacher, M., Shokrollahi, M., Spielman, D., Stemann, V.: Practical Loss-Resilient Codes. In: Proc. of the 29th ACM Symp. on Theory of Computing (1997)
7. Medina, A., Lakhina, A., Matta, I., Byers, J.: BRITE: Boston University Representative Internet Topology Generator. Technical report, http://www.cs.bu.edu/brite (2000)
8. Kostic, D., Rodriguez, A., Albrecht, J., Vahdat, A.: Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh. In: Proc. of the 19th ACM Symposium on Operating Systems Principles (SOSP) (October 2003)
9. Sherwood, R., Braud, R., Bhattacharjee, B.: Slurpie: A Cooperative Bulk Data Transfer Protocol. In: Proc. of IEEE INFOCOM 2004 (March 2004)
10. Ahlswede, R., Cai, N., Li, S.Y.R., Yeung, R.W.: Network Information Flow. IEEE Transactions on Information Theory 46(4) (July 2000) 1204–1216
11. Gkantsidis, C., Rodriguez, P.: Network Coding for Large Scale Content Distribution. In: Proc. of IEEE INFOCOM 2005 (March 2005)
Adaptive Window-Tuning Algorithm for Efficient Bandwidth Allocation on EPON

Sangho Lee, Tae-Jin Lee, Min Young Chung, and Hyunseung Choo

School of Information and Communication Engineering, Sungkyunkwan University, Korea
{ianlee,tjlee,mychung,choo}@ece.skku.ac.kr
Abstract. Ethernet passive optical network (EPON) has been considered a solution to the last-mile bottleneck problem. To accommodate the explosive bandwidth demands of subscribers, the optical line terminal (OLT) in an EPON divides and allocates time slots for upstream data among all optical network units (ONUs). This technology is expected to be at the core of future fiber-to-the-home/-office/-curb (FTTH/O/C) deployments. We study previous algorithms for dynamic bandwidth allocation (DBA) in interleaved polling with adaptive cycle time (IPACT). For effective bandwidth allocation on the uplink channel, we propose an adaptive window-tuning algorithm (AdWin) based on the excess bandwidth. This algorithm not only satisfies the bandwidth demands of ONUs within the possible scope, but also seeks fairness among ONUs. Comprehensive computer simulation results indicate that the proposed scheme achieves up to 94% lower average packet delay and up to 94% smaller average queue size than previous schemes. It also demonstrates up to 86% improved performance in terms of packet loss ratio.
1 Introduction
Ethernet passive optical network (EPON) is an emerging solution to mitigate the last mile [1] bottleneck problem between the backbone and the access networks connecting business and residential subscribers [2]. In order to accommodate the huge demands of subscribers resulting from the explosive growth of the Internet and numerous high-bandwidth applications, many studies on EPON have been conducted [1,2,4]. In general, an EPON architecture based on a tree topology is a point-to-multipoint fiber optical network, which consists of an optical line terminal (OLT), a 1:N passive star coupler (or splitter/combiner) [2], and multiple optical network units (ONUs) that share an optical fiber between the passive star coupler and the OLT for upstream data [2,3].

To efficiently manage upstream transmission, various DBA schemes have been studied. Interleaved polling with adaptive cycle time (IPACT) [3] is a typical scheme, in which several disciplines such as limited, constant credit,
linear credit, and elastic services are proposed to prevent any ONU from monopolizing the entire bandwidth. Limited service grants the requested bandwidth, but imposes a limit on the transmission window. Although it is the most conservative, it performs best among the DBA disciplines in IPACT. Based on limited service, constant credit and linear credit services have been proposed [3]. Basically, these services take into account packets that newly arrive from subscribers during the interval between the current upstream end time and the next upstream start time of a particular ONU [3,4]. The constant credit service allocates the requested bandwidth plus a fixed credit as additional bandwidth. The linear credit service is equivalent to the constant credit service except for how credits are added: it decides the additional bandwidth proportionally, based on the requested bandwidth. However, both services waste bandwidth if the credit is not used. Meanwhile, elastic service allocates bandwidth to the current ONU based on the past N grants, where N is the number of ONUs. It is an attempt to remove the transmission window limit; only the maximum poll cycle time is imposed [3].

However, these services do not necessarily serve bandwidth to every ONU fairly and efficiently. In the case of limited service, under heavy traffic load the OLT grants every ONU as much as the maximum window limit regardless of the requested bandwidth, which can lead to relative unfairness among ONUs. Meanwhile, elastic service can grant the entire bandwidth within a cycle time to a single ONU; after granting considerable bandwidth, the OLT cannot allocate enough bandwidth to the next ONUs on account of the maximum polling cycle time, which brings about another fairness problem.

In this paper, an adaptive window-tuning algorithm (AdWin) is proposed to resolve these problems of the IPACT disciplines, which suffer from unfairness and, in some cases, inefficiency. It allows the OLT to efficiently allocate bandwidth to all ONUs by changing the limits of the transmission window (time slot) [2,3]. The variable transmission window limit serves to assign the excess bandwidth to the ONUs that have greater demands. This algorithm reduces not only the average packet delay, but also the average queue size, and supports better services than previous schemes. According to comprehensive computer simulation results, the proposed scheme has up to 94% lower average packet delay and up to 94% smaller average queue size. The scheme also achieves up to 86% better performance in terms of packet loss ratio.

The remainder of the paper is organized as follows. In Section 2, AdWin is proposed as a promising solution. In Section 3 we evaluate the performance of the proposed scheme in terms of average packet delay, average queue size, and packet loss ratio. Section 4 concludes this paper.
2 An Adaptive Window-Tuning Algorithm
The proposed algorithm, AdWin, is a generous bandwidth allocation scheme that utilizes bandwidth left over in the past. This remainder bandwidth is used to increase the transmission window limits for the subsequent N ONUs.
To calculate the remainder bandwidth, a limit threshold is needed first; it is obtained from the maximum window size in IPACT [2,3]. The remainder bandwidth is the difference between the limit threshold and the bandwidth granted to ONU i−1, and it is divided equally by the number of ONUs. The divided remainder bandwidth is then added to the transmission window limit for ONU i. In doing so, the transmission window limits for the subsequent N ONUs (ONU i through ONU i−1, cyclically) are determined. However, the remainder bandwidth may be negative when the bandwidth granted to ONU i−1 is greater than the limit threshold. In this case, the over-granted bandwidth is likewise divided equally and applied to the previously determined transmission window limits, just as when a positive remainder occurs. Thus, the limit for the next ONUs changes with every bandwidth grant. Owing to the increased limits, ONUs have a greater chance of receiving bandwidth grants than under the fixed limit threshold, and can therefore accommodate their bandwidth demands accordingly.

The proposed AdWin algorithm is based on the limited service in IPACT. When granting bandwidth, the OLT compares the bandwidth request to the decided transmission window limit, similar to the limited service:

G[i] = min(R[i], W_MAX[i]),

where G[i] is the granted bandwidth, R[i] the requested bandwidth, and W_MAX[i] the maximum window size for ONU i.
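The following Python sketch shows one possible reading of this update rule; the exact bookkeeping of how shares from past grants accumulate is our assumption, and the parameter values are illustrative.

```python
from collections import deque

# One possible reading of the AdWin update rule described above; the exact
# bookkeeping of how shares accumulate across grants is an assumption, and
# the parameter values are illustrative.

N = 16                              # number of ONUs
W_TH = 15500                        # limit threshold = IPACT maximum window (bytes)
recent_shares = deque(maxlen=N)     # remainder shares left by the last N grants

def current_limit():
    """Window limit = threshold plus shares contributed by the last N grants."""
    return W_TH + sum(recent_shares)

def grant(requested):
    """Grant bandwidth for one ONU's request and spread the remainder."""
    g = min(requested, current_limit())      # G[i] = min(R[i], W_MAX[i])
    share = (W_TH - g) / N                   # negative when g exceeds the threshold
    recent_shares.append(share)              # tunes the windows of the next N ONUs
    return g

for req in (20000, 4000, 30000, 16000):
    print(f"requested {req:5d}  limit {current_limit():8.1f}  granted {grant(req):8.1f}")
```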
3 Performance Evaluations
To evaluate the performance of the DBA schemes, we consider an EPON system consisting of an OLT and 16 ONUs with interleaved polling operation. The simulation parameters are determined as in [3]. The traffic pattern on access networks is characterized by self-similarity and long-range dependence (LRD) [3,5]. Fig. 1 presents the average packet delay, average queue size, and packet loss ratio of the limited and elastic services and the AdWin algorithm, under traffic loads ρ from 0.05 to 0.95. In the region 0.45 ≤ ρ ≤ 0.60, the packet delay increases drastically as ρ increases, but its value differs among the considered schemes, because the average bandwidth granted to the ONUs differs. In addition, the average queue size shows a trend similar to that of the average packet delay. From the results, the proposed scheme outperforms the limited and elastic services for traffic loads ρ in [0.45, 0.60]: AdWin has up to 94% (94%) and 91% (90%) lower average packet delay (average queue size) than the limited and elastic services, respectively. The packet loss ratio of the considered algorithms is shown in Fig. 1(c). Because the average buffer occupancy of the ONUs increases drastically, AdWin has a lower packet loss ratio than the limited and elastic services for 0.55 ≤ ρ < 0.60. In particular, for ρ = 0.55 AdWin has no packet loss, and for ρ = 0.575 its loss ratio is 86% and 69% lower than that of the limited and elastic services, respectively.
Fig. 1. Performance comparison among considered schemes: (a) average packet delay; (b) average queue size; (c) packet loss ratio
4 Conclusion
To enhance the performance of the EPON system, we propose AdWin, which alters transmission window limits based on past bandwidth allocation information. It allocates bandwidth to all ONUs efficiently and fairly. Consequently, an EPON system with the AdWin algorithm can provide high-quality services for end users on the last mile.

Acknowledgments. This research was supported by MIC, Korea under ITRC IITA-2006-(C1090-0603-0046) and the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2005-042-D00248).
References

1. Kramer, G., Pesavento, G.: Ethernet Passive Optical Network (EPON): Building a Next-Generation Optical Access Network. IEEE Communications Magazine, Vol. 40 (2002) 66–73
2. Zheng, J., Mouftah, H.T.: Media Access Control for Ethernet Passive Optical Networks: An Overview. IEEE Communications Magazine, Vol. 43 (2005) 145–150
3. Kramer, G., Mukherjee, B., Pesavento, G.: Interleaved Polling with Adaptive Cycle Time (IPACT): A Dynamic Bandwidth Distribution Scheme in an Optical Access Network. Photonic Network Communications, Vol. 4 (2002) 89–107
4. Byun, H., Nho, J., Lim, J.: Dynamic Bandwidth Allocation Algorithm in Ethernet Passive Optical Networks. Electronics Letters, Vol. 39 (2003) 1001–1002
5. Leland, W.E., Taqqu, M.S., Willinger, W., Wilson, D.V.: On the Self-Similar Nature of Ethernet Traffic (Extended Version). IEEE/ACM Transactions on Networking, Vol. 2 (1994) 1–15
Optical Burst Control Algorithm for Reducing the Effect of Congestion Reaction Delay

Myungsik Yoo and Junho Hwang

School of Electronic Engineering, Soongsil University, Seoul, Korea
{myoo,jhwang}@ssu.ac.kr
Abstract. To address the burst loss problem in OBS networks, we propose a new optical burst control algorithm that estimates future burst traffic conditions in order to eliminate the effect of the congestion reaction delay. Through simulations, we verify that the proposed algorithm outperforms existing burst control algorithms. Keywords: Optical Burst Switching, Congestion Control, Congestion Reaction Delay, Estimation.
1 Problem Statement
In an optical burst switched (OBS) [1] network, a data burst is assembled at an ingress edge router and then forwarded to a core router for delivery to its destination. When the core router receives more data bursts than it can handle, the congestion control algorithm detects the congestion and informs the corresponding ingress edge routers. Upon receiving the congestion feedback information, the ingress edge router reacts against the congestion by reducing its burst transmission rate. Thus, as shown in Fig. 1, it takes a round-trip time (RTT) from when a core router detects the congestion until it receives the reduced burst flow from the ingress edge router; this interval is called the congestion reaction delay. A huge amount of burst data may be lost during the congestion reaction delay if long-lasting burst congestion is present.
Decreased Burst Flow
RTT
Decrease Burst Transmission rate
time
BCP
Congestion detection
Burst
Core Router
Ingress Edge Router Congestion Feedback Information
Fig. 1. Effect of congestion reaction delay
One can handle the burst loss problem using a burst congestion control algorithm based on the current traffic information [2]. Let τ, D(t) and Lth denote the sampling period, the amount of burst data received in [t, t + τ] and the congestion detection threshold, respectively.
Then, this congestion control algorithm makes the decision: if D(t) ≥ Lth, congestion condition; otherwise, normal condition. Due to the congestion reaction delay, this approach suffers from high burst loss when long-lasting burst congestion is present. The burst loss problem can also be handled by a burst congestion estimation algorithm based on a single statistic [3]. Let Davg(t) denote the weighted moving average of the samples at time t, where each sample D(t) indicates the amount of burst data received in the sampling period [t, t + τ]. Then, Davg(t) is obtained by

Davg(t) = (1 − α) × Davg(t − 1) + α × D(t), 0 < α < 1.    (1)
The congestion estimation algorithm based on a single statistic makes the decision: if Davg(t) ≥ Lth, congestion condition; otherwise, normal condition. If τ is shorter than the RTT, this algorithm still suffers the congestion reaction delay penalty. τ can be set to the RTT to eliminate the congestion reaction delay, but this averages out the traffic characteristics that appear over intervals shorter than τ.
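For concreteness, here is a minimal Python sketch of this single-statistic detector based on Eqn. (1); the parameter values are illustrative.

```python
# Minimal sketch of the single-statistic detector of Eqn. (1); the parameter
# values are illustrative.

ALPHA = 0.3      # EWMA weight
L_TH = 100.0     # congestion detection threshold Lth

d_avg = 0.0

def observe(d_t):
    """Feed the amount of burst data D(t) received in one sampling period tau."""
    global d_avg
    d_avg = (1 - ALPHA) * d_avg + ALPHA * d_t          # Eqn. (1)
    return "congestion" if d_avg >= L_TH else "normal"

for sample in (60, 90, 140, 150, 150):
    print(observe(sample))                             # flips to "congestion" last
```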
2 Burst Control Algorithm with Long-Term Estimation: BCA-LTE
The proposed congestion control algorithm utilizes multiple statistics on burst traffic, measured over various sampling periods. Let k denote the segment constant, where k is an arbitrary positive integer, and divide the RTT into 2^k small segments, each spanning RTT/2^k (= RTT × 2^−k). We define the smallest segment as τ_2^−k, which is the smallest burst traffic sampling period for estimation. For every τ_2^−k, obtain D_2^−k(t) and compute Davg,2^−k(t) by applying Eqn. (1). Now, define another sampling period in which samples are taken every two τ_2^−k; this sampling period spans RTT/2^(k−1) (= RTT × 2^−(k−1)) and is denoted τ_2^−(k−1). For every τ_2^−(k−1), obtain D_2^−(k−1)(t) and compute Davg,2^−(k−1)(t) by applying Eqn. (1).

By increasing the sampling period by a factor of two until it reaches the RTT, one can define (k + 1) different sampling periods (τ_2^−i, i = k, k − 1, ..., 0) and obtain (k + 1) statistics (Davg,2^−i(t), i = k, k − 1, ..., 0), one for each sampling period. Davg,2^−i(t) reflects how the burst traffic changes in the short and long term. All statistics can be summarized into a single estimation point using Eqn. (2), where EST stands for the estimated burst traffic at future time t + RTT. In Eqn. (2), ESTk(t + RTT) is the predicted burst traffic condition at t + RTT, in which the statistics over longer sampling periods are weighted more:

EST0(t + RTT) = Davg,2^−k(t),
EST1(t + RTT) = α × EST0(t + RTT) + (1 − α) × Davg,2^−(k−1)(t),
EST2(t + RTT) = α × EST1(t + RTT) + (1 − α) × Davg,2^−(k−2)(t),
...
ESTk(t + RTT) = α × EST(k−1)(t + RTT) + (1 − α) × Davg,2^0(t).    (2)
In order to have more estimation points within an RTT period, we introduce tiers. Let N and n denote the number of tiers and the tier index, respectively, where n = 0, 1, ..., N − 1. Tier[0] serves as the timing reference for the other N − 1 tiers, and the timing offset between two neighboring tiers equals RTT/N; thus, tier n starts its RTT at t + RTT × n/N. Each tier runs the estimation algorithm of Eqn. (2) and generates its own estimation point, so one obtains N estimation points over an RTT period. At every estimation point that a tier generates, the congestion control algorithm makes the decision: if ESTk(t + RTT) ≥ Lth, congestion condition; otherwise, normal condition.
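The following sketch implements the estimator for a single tier, with deliberately simplified sampling bookkeeping; K, ALPHA and the threshold are illustrative.

```python
import random

# Sketch of the BCA-LTE estimator for a single tier; K, ALPHA and the threshold
# are illustrative, and the sampling bookkeeping is deliberately simplified.

ALPHA = 0.3
K = 3                          # segment constant: the RTT is split into 2^K parts
L_TH = 100.0

d_avg = [0.0] * (K + 1)        # d_avg[i] tracks sampling period RTT / 2^(K - i)
samples = []                   # raw samples taken every tau = RTT / 2^K

def on_sample(d_t):
    """Called once per smallest sampling period tau."""
    samples.append(d_t)
    for i in range(K + 1):
        period = 2 ** i                     # period of scale i, in units of tau
        if len(samples) % period == 0:      # a full period at scale i just ended
            d_i = sum(samples[-period:])    # data received over that period
            d_avg[i] = (1 - ALPHA) * d_avg[i] + ALPHA * d_i     # Eqn. (1)

def estimate():
    """Cascade of Eqn. (2): statistics over longer periods are weighted more."""
    est = d_avg[0]                          # EST_0 starts from the finest scale
    for i in range(1, K + 1):
        est = ALPHA * est + (1 - ALPHA) * d_avg[i]
    return est

rng = random.Random(0)
for _ in range(4 * 2 ** K):                 # four RTTs' worth of samples
    on_sample(rng.uniform(5, 15))
state = "congestion" if estimate() >= L_TH else "normal"
print(f"EST(t + RTT) = {estimate():.1f} -> {state}")
```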
3 Simulation Results and Discussion
For performance evaluation, we consider the simple network topology shown in Fig. 2. We assume a simple burst assembly mechanism, in which the burst rate is controlled by adjusting the size of the bursts to be assembled. For comparison purposes, we evaluate the performance of three algorithms: the proposed algorithm (indicated as BCA-LTE), the algorithm based on a single statistic (indicated as SINGLE), and the case without any burst congestion control (indicated as w/o CC).
Fig. 2. Simulation topology
Table 1 shows that the BCA-LTE algorithm outperforms the SINGLE algorithm in terms of estimation accuracy: BCA-LTE achieves about 95% accuracy. These results demonstrate that estimation based on multiple statistics helps to improve the accuracy of congestion detection.

Table 1. Estimation Accuracy

                          SINGLE   BCA-LTE
Accuracy                  83.32%   94.92%
Inaccuracy (Miss)          8.36%    2.04%
Inaccuracy (False alarm)   8.32%    3.04%
Fig. 3. (a) Burst loss rate (b) Throughput
The performance of the three algorithms is compared in Fig. 3 in terms of burst loss rate and throughput. For the algorithm without burst congestion control, the burst loss rate keeps increasing linearly with the traffic load; it achieves a high throughput of around 36 Gbps at the cost of a high burst loss rate. The two congestion control algorithms (SINGLE and BCA-LTE) control the congestion well. In particular, BCA-LTE keeps the burst loss rate lower than SINGLE (about a 28% gain) while achieving higher throughput, which again supports the advantage of using multiple statistics for estimation. We expect that the BCA-LTE algorithm can enhance the stability of OBS networks by controlling the burst rate proactively and accurately.

Acknowledgement. This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (No. R11-2000-07401001-0).
Acknowledgement This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government(MOST) (No. R11-2000-07401001-0).
References 1. C. Qiao and M. Yoo, “Optical Burst Switching (OBS) - A New Paradigm for an Optical Internet,” Journal of High Speed Networks, Vol. 8, No. 1, pp. 69-84, Jan. 1999. 2. G. Thodime, V. M. Vokkarane and J. Jue, “Dynamic Congestion-Based Load Balanced Routing in Optical Burst-Switched Networks,” In Proc. of Global Telecommunications Conference (GLOBECOM), Vol. 5, pp. 2628-2635, Dec. 2003. 3. Y. Gu, H. O. Wang, Y. Hong and L. G. Bushnell, “A Predictive Congestion Control Algorithm for High Speed Communication Networks,” In Proc. of American Control Conference, 2001.
Incremental Provision of QoS Discarding Non-feasible End-to-End Paths Alfonso Gazo-Cervero, José Luis González-Sánchez, and Francisco J. Rodríguez-Pérez Computer Science Department, University of Extremadura, Escuela Politécnica de Cáceres, Av/ Universidad s/n, 10071 - Cáceres, Spain {agazo,jlgs,fjrodri}@unex.es
Abstract. Finding a practical solution for QoS provision over IP networks has been the subject of great research effort in recent years. The main aim of the proposal presented in this paper is to allow QoS provision over IP networks without requiring every router in the Internet to be upgraded. Upgrading some routers within the network is still needed, but no constraints exist on which routers must be upgraded.
1 Introduction and Related Work

Enhancing the current Internet to include QoS provision capabilities is still an open issue that has been researched since the Integrated Services (IntServ) specification. One approach is to enhance every router in the Internet so that QoS flows are prioritized at each hop in case of congestion. Another approach is to apply strict admission control mechanisms to every incoming traffic flow, whether a QoS flow or not, so that congestion is avoided. Finally, the overprovisioning approach tries to ensure that there are enough resources in the network to satisfy traffic demand.

Bandwidth is an important issue for routers that are not in the backbone, so the overprovisioning approach does not seem appropriate for them. For the other approaches to be applied, current equipment must be modified; however, the current size of the Internet makes such modification a challenging task. Because of this, our work describes a generalized proposal that allows QoS provision to be deployed over current protocols in the Internet. Modifications to current equipment are still needed, but it is not mandatory to deploy them at every node within the Internet. We call the proposal incremental because it allows providers to migrate only parts of their network instead of the whole of it.

Our proposal is designed to work over a variety of different network technologies to achieve QoS provision, such as MultiProtocol Label Switching (MPLS) and Open Shortest Path First (OSPF). It uses an automated manager inspired by the Bandwidth Broker (BB) described in the two-bit Differentiated Services (DiffServ) architecture. Automated management is still an open issue in QoS provision; it has been researched in many works, some of them suggesting a centralized manager [1,2] and others distributed management [3,4].
This work is sponsored in part by the Regional Government of Extremadura, Spain (Education, Science and Technology Council) under GRANT No PRIA06145.
2 Overlay QoS Framework

Our proposal is based on the creation and management of an overlay network. The overlay network is managed by an agent within the Autonomous System (AS), called the Network Broker (NB). A number of edge nodes within the network include signaling capabilities to communicate with the NB. Users who wish to request guarantees for their traffic also include signaling capabilities. The proposal described in this paper extends the functions suggested for the BB of the DiffServ architecture; the new functions incorporated include overlay network management and Traffic Engineering (TE) functions, if available. As a result, the element that carries out all these tasks is called the NB. More details on this can be found in [5].

2.1 Overlay Network

Several Internet overlays have been designed in the past for various purposes, but what makes an overlay network for QoS provision different is the need for end-to-end guarantees. Our overlay network (figure 1) is defined as follows: let G(V, E) describe an AS network, where V is the set of nodes and E the set of links. The overlay network is described by G'(V' ⊆ V, E' ⊆ E), where ∀v ∈ V : v ∈ V' ⇔ is_qnode(v). The is_qnode function is defined as follows: if the node is an edge router, is_qnode returns true if and only if the node has classifying, marking, shaping, policing, scheduling and NB signaling capabilities; if the node is a core router, is_qnode returns true if and only if the node has scheduling and policing capabilities.
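The following is a direct, illustrative encoding of the is_qnode predicate in Python; the capability names follow the definition above but the exact strings are our own.

```python
# Illustrative encoding of the is_qnode predicate defined above; the
# capability names are our own labels for the capabilities the paper lists.

EDGE_REQUIRED = {"classify", "mark", "shape", "police", "schedule", "nb_signal"}
CORE_REQUIRED = {"schedule", "police"}

def is_qnode(role, capabilities):
    """True iff the node has every capability required for its role."""
    required = EDGE_REQUIRED if role == "edge" else CORE_REQUIRED
    return required <= set(capabilities)

print(is_qnode("core", ["schedule", "police"]))   # True
print(is_qnode("edge", ["schedule", "police"]))   # False: lacks edge functions
```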
Fig. 1. Original and overlay network representation
2.2 Resource Management

The NB manages a database that contains the bandwidth assigned to every traffic class at every link in the AS. Once the admission control phase is passed, this database is updated to reflect the new bandwidth assignments at all links along the calculated path. Let T = {t0, ..., tn} be the set of different traffic classes excluding best effort, let C(e) be the link capacity for link e ∈ E', and let Rt(e) be the residual bandwidth remaining for traffic class t at link e ∈ E'. The maximum bandwidth assignment Bt(e) for traffic class t at link e ∈ E' is limited by C(e) · λt. The λt factor is bounded by 0 < λt < 1 and represents
Incremental Provision of QoS Discarding Non-feasible End-to-End Paths
1227
the proportional amount of bandwidth that would be available for traffic class t at every link. In addition, each λt must be chosen so that the following expression is fulfilled: 0<
∑ λt < 1
(1)
∀t∈T
Residual bandwidth RBE (e) for best effort traffic class at link e ∈ E is bounded by the following expression: C(e) · (1 − ∑∀t∈T λt ) ≤ RBE (e) ≤ C(e) ∀e ∈ E : RBE (e) = C(e)
if e ∈ E if e ∈ /E
(2)
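The bounds above translate directly into the admission check the NB database has to perform. The following Python sketch is a hedged illustration under assumed data structures (a dictionary keyed by link and class); the paper specifies only the bounds, not an implementation.

```python
# Sketch of the NB's per-link bandwidth database (Section 2.2).
# Function and variable names are illustrative; the paper only fixes the
# bounds B_t(e) <= C(e)*lambda_t and 0 < sum(lambda_t) < 1.

def check_lambdas(lambdas):
    # Each lambda_t in (0,1) and their sum strictly below 1 (Eq. 1),
    # so best-effort traffic always keeps a share of every link.
    assert all(0 < l < 1 for l in lambdas.values())
    assert 0 < sum(lambdas.values()) < 1

def admit(db, path, t, bw, capacity, lambdas):
    """Reserve bw for class t on every link of the computed path, if feasible."""
    for e in path:
        if db.get((e, t), 0.0) + bw > capacity[e] * lambdas[t]:
            return False                       # would exceed B_t(e)
    for e in path:
        db[(e, t)] = db.get((e, t), 0.0) + bw  # update assignments along the path
    return True
```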
2.3 Routing
The NB has two modes of operation, selected depending on the characteristics of the Interior Gateway Protocol (IGP) used in the AS. If explicit routing is allowed, the NB builds explicit routes for Quality of Service (QoS) flows based on its own routing information; this is what we call Routing Mode, and the explicit routes are signaled to q-routers. If explicit routing is not available, the NB instead pre-calculates the paths that the routing protocol is expected to establish. This calculation is done at the admission control phase: unlike in Routing Mode, the NB calculates paths only to accomplish admission control and resource management at the NB itself. Neither routers nor q-routers are signaled, because signaling them would imply a modification of the routing protocol, which we try to avoid.
3 Simulation Results
To test the behavior of our proposal, the Network Simulator (ns-2) has been extended to incorporate the described architecture. The following simulation results were obtained using a publicly available representation of the AT&T topology. The distribution of q-nodes within the topology is driven by a q-prob parameter, which represents the probability that a node behaves as a q-node and hence takes part in the QoS overlay topology. Both Routing and Non-Routing modes have been implemented in the simulator. Figure 2a shows the behavior of the NB in Routing Mode for a number of scenarios with different values of the q-prob parameter and a variable number of flows; Figure 2b shows the behavior of the NB in Non-Routing Mode under the same scenarios and traffic load. As expected, the percentage of blocked flow requests increases as the traffic load increases. When Non-Routing Mode is used, TE techniques cannot be applied, which causes a faster increase in the number of blocked flows compared to Routing Mode. Regarding the q-prob parameter, the figures show that, for the same scenario and traffic load, the more q-nodes are present in the network, the lower the percentage of blocked flows.
Fig. 2. Percentage of blocked flows over the AT&T topology: (a) using the WSGP routing algorithm; (b) using the SGP routing algorithm. Both panels plot the percentage of blocked flows against traffic load (flows x 1000, from 0 to 2500) for q-prob values between 0.5 and 1.
4 Conclusions and Future Work
The main novelty of this proposal is that not every node within an AS needs to support QoS provision mechanisms, but only some of them. Other proposals with this approach exist, but they often require complex protocols that interfere with best-effort service provision, or impose constraints on which nodes must support QoS provision mechanisms. The results section of this paper shows how decreasing the number of nodes that include QoS provision mechanisms affects the global QoS provision capacity of the AS, but also that QoS provision remains possible under some circumstances. We have focused on the MPLS and OSPF network technologies, yet the generalized approach lets this proposal be deployed over a wider range of technologies.
References
1. Scoglio, C., Anjali, T., Oliveira, J.C., Akyildiz, I.F., Uhl, G.: TEAM: A Traffic Engineering Automated Manager for DiffServ-based MPLS Networks. IEEE Communications Magazine 42 (2004) 134–145
2. Zhang, Z., Duan, Z., Hou, Y.T.: On Scalable Network Resource Management Using Bandwidth Brokers. In: IEEE Network Operations and Management Symposium (2002)
3. Bhatnagar, S., Nath, B.: Distributed Admission Control to Support Guaranteed Services in Core-Stateless Networks. In: IEEE INFOCOM 2003, Volume 3 (2003) 1659–1669
4. Lima, S., Carvalho, P., Freitas, V.: Distributed Admission Control for QoS and SLS Management. Journal of Network and Systems Management, Special Issue "Distributed Management of Networks and Services" 12 (2004) 397–426
5. Gazo-Cervero, A., González-Sánchez, J.L.: Incremental QoS Deployment Based on Network Brokers. In: HET-NETs'04, 2nd International Working Conference on Performance Modelling and Evaluation of Heterogeneous Networks (2004)
Enhancing Guaranteed Delays with Network Coding
Ali Mahmino¹, Jérôme Lacan², and Christian Fraboul³
¹ ENSICA/IRIT/INPT/ENSEEIHT, University of Toulouse, 1 place E. Blouin, 31056 Toulouse Cedex, France [email protected]
² LAAS-CNRS/ENSICA, University of Toulouse [email protected]
³ IRIT/INPT/ENSEEIHT, University of Toulouse [email protected]
Abstract. For networks providing QoS guarantees, this paper determines and evaluates the worst-case end-to-end delays of strategies based on network coding and on multiplexing. It is shown that the end-to-end delay does not depend on the same parameters under the two strategies. This result can be explained by the fact that network coding copes with congestion better than classical routing, because it processes packets from different flows simultaneously. On the other hand, additional delays are introduced, such as the time to compute algebraic combinations of packets.
Keywords: Network coding, network calculus, worst-case delays, buffering.
1 Introduction
In networks providing quality of service (QoS) guarantees (ATM, IntServ, DiffServ, ...), input data flows satisfy constraints on burstiness and maximal throughput. In return, the network ensures a level of QoS guarantee expressed in terms of end-to-end delays or minimal throughput. The different guarantees and constraints characterizing the network and the flows can be represented using network calculus, a framework that provides deterministic bounds on QoS parameters such as end-to-end delays or backlogs by means of Min-Plus algebra. This theory was introduced and developed in [1], generalizing previous works such as [2]. In this paper, we focus on decreasing worst-case end-to-end delays by using network coding techniques. Network coding is a concept introduced by Ahlswede et al. in [3]; it consists of combining packets in routing nodes instead of simply forwarding them as in a classical routing strategy. In addition to optimizing the utilization of network capacity, this technique also allows average end-to-end delays to be decreased. The delay performance gains of several network coding strategies for transferring a complete file were studied in [5] in the context of a downlink transmission of files from a single base station to
multiple receivers. In [4], a queuing model introduced for network coding allows the average end-to-end packet delays for independent Poisson flows to be estimated in whole networks. The problem considered here also concerns end-to-end delays, as in [4], but the flows are constrained and the network provides QoS guarantees. In this context, our main objective is to reduce the worst-case delays. The main idea is based on the fact that, with network coding, a node processes several packets simultaneously. Compared to a classical routing approach that handles packets in sequence, this reduces the maximum time spent by packets in the buffers. On the other hand, other delays are added, such as the time needed to compute the linear combination or the time a packet spends waiting for the corresponding packets arriving from other links. The network hypotheses and the network coding node strategy are detailed in Section 2. The QoS guarantees provided by the nodes/network are evaluated and compared with a classical routing strategy in Sections 3 and 4. Finally, Section 5 presents the main conclusions.
2 Network Hypotheses
Consider a communication network represented by an acyclic directed graph $G = (V, E)$ with a vertex set $V$ and an edge set $E$. The set of nodes is divided into three categories: source nodes, intermediate nodes and receiver nodes. Each source node $S_i$ generates a flow $F_i$ of packets of fixed length $L$ constrained by an affine arrival curve $\gamma_{\rho_i,\sigma_i}$, where $\rho_i$ is the leak rate and $\sigma_i$ the burstiness size (see [1]). Intermediate nodes are the nodes situated between the sources and the receiver nodes. The nodes, e.g. $v_i$ and $v_j$, are connected by directed edges, denoted $e_{i,j}$, each having a given maximum capacity $C_{i,j}$ and a maximum transmission delay $T_{i,j}$. The service provided by the link is represented by a rate-latency service curve $\beta_{C_{i,j},T_{i,j}}(t)$ (see [1]). We assume that all the nodes are synchronized and that each node knows the maximum transmission delays between itself and the neighbors generating its input flows. We suppose that the capacity of every output edge of a node is greater than or equal to the sum of the capacities of all its input edges. We assume that the time necessary to perform a linear combination of several packets does not depend on the number of packets and is upper-bounded by $T_{lc}$. Each intermediate node $N_{n+1}$ offers a service curve $\beta_{C_{n+1},\,T_{B_i}^{n+1}+T_{lc}+\tau_{n+1}}(t)$, where $T_{B_i}^{n+1}$ is the maximum time spent by a packet of flow $i$ in the buffers while waiting for the corresponding packets of other flows, and $\tau_{n+1}$ is the service delay to transmit a packet. Each multiplexer node $N_{n+1}$ offers a service curve $\beta_{C_{n+1},\tau_{n+1}}(t)$, where $\tau_{n+1}$ is the service delay to transmit a packet of the aggregate flow and $C_{n+1}$ is the output edge capacity (see Figure 1(a)). The network coding strategy described here is based on a fixed network code determined a priori. To ensure that receivers can decode all the received packets, the concept of packet generation is used: each generation corresponds to a fixed-length time interval $[t_i, t_i + \Delta[$ in which a source generates at most one packet per flow. The linear network code only combines packets belonging to the
Fig. 1. Node Level and Network Level examples: (a) N-flows multiplexer; (b) transmission of a flow through N nodes
same generation. To perform a linear combination of several packets, the node must wait for all the packets of the same generation. Since the network provides delay guarantees, a node is able to compute a time limit for the reception of a given generation. If a packet of a given generation is not received before the time limit, the node assumes that the source did not generate a packet in this time interval, and it performs the combination with the available packets. To evaluate the interest of network coding compared with a classical routing/multiplexing approach, we proceed in two steps: node level and network level.
3 Multiplexing vs. Coding: Delay Analysis at Node Level
3.1 Multiplexing Delay
We suppose that the FIFO multiplexer node $N_{n+1}$ presented in Figure 1(a) offers a service curve $\beta_{C_{n+1},\tau_{n+1}}$ to the aggregated flow. From [1] and the hypotheses (and notations) introduced in Section 2, it can be shown that the maximum delay experienced by a packet of flow $F_i$ in this multiplexer node is

$$T_i^{mn} = \tau_{n+1} + \Bigl(\sum_{k=1,k\neq i}^{n} (\sigma_k + \rho_k T_{k,n+1})\Bigr)\Big/C_{n+1} + (\sigma_i + \rho_i T_{i,n+1})\Big/\Bigl(C_{n+1} - \sum_{k=1,k\neq i}^{n} \rho_k\Bigr).$$

Then the arrival curve (see [1]) of the flow $F_i$ at the output of the multiplexer is equal to $\gamma_{\rho_i,\sigma_i^*}$ where $\sigma_i^* = \sigma_i + \rho_i\bigl(T_{i,n+1} + \tau_{n+1} + (\sum_{k=1,k\neq i}^{n} (\sigma_k + \rho_k T_{k,n+1}))/C_{n+1}\bigr)$.
3.2 Coding Delay
Let us now consider the node presented in Figure 1(a) as an intermediate node combining the flows $F_i$, $i = 1, \ldots, n$, respectively constrained by $\gamma_{\rho_i,\sigma_i}$. Following the hypotheses and notations introduced in Section 2, it can be shown that the maximum delay experienced by a packet of $F_i$ in the node is $T_i^{nc} = T_{B_i}^{n+1} + T_{lc} + \tau_{n+1} + (\sigma_i + \rho_i T_{i,n+1})/C_{n+1}$. Note that this delay corresponds to the delay between an input packet and the corresponding encoded packet. An arrival curve of the subflow corresponding to $F_i$ at the output of this intermediate node is $\gamma_{\rho_i,\sigma_i^*}$ where $\sigma_i^* = \sigma_i + \rho_i T_{i,n+1} + \rho_i (T_{B_i}^{n+1} + T_{lc} + \tau_{n+1})$.
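For illustration, the two node-level bounds can be evaluated numerically as in the Python sketch below. The argument layout mirrors the paper's symbols; the list-based interface and any concrete units are our assumptions.

```python
# Numeric sketch of the node-level worst-case delay bounds of Sections 3.1-3.2.
# sigma, rho, T are per-flow lists; i selects the flow of interest.

def mux_delay(i, sigma, rho, T, tau, C):
    """T_i^mn for a FIFO multiplexer aggregating flows 0..n-1 (Section 3.1)."""
    others = [k for k in range(len(sigma)) if k != i]
    cross = sum(sigma[k] + rho[k] * T[k] for k in others) / C
    own = (sigma[i] + rho[i] * T[i]) / (C - sum(rho[k] for k in others))
    return tau + cross + own

def coding_delay(i, sigma, rho, T, tau, C, T_buf, T_lc):
    """T_i^nc for a network-coding node (Section 3.2)."""
    return T_buf + T_lc + tau + (sigma[i] + rho[i] * T[i]) / C
```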
4 Multiplexing vs. Coding: End-to-End Delay Analysis
The properties of network calculus facilitate the generalization of the node-level results to the network level. From [1] and Section 3.1, it can be shown that a FIFO multiplexer node $N_k$ (see Figure 1(b), node $N_2$) offers to the $i$th input flow a service curve $\beta_{C_{k,r}-X_i^k,\,\tau_k+Y_i^k}(t)$, where $X_i^k = \sum_{s=1,s\neq i}^{m_k} C_{s,k}$, $Y_i^k = \sum_{s=1,s\neq i}^{m_k} \sigma_s / C_{k,r}$, $m_k$ is the number of input flows of node $k$, and $C_{k,r}$ is the output edge capacity. The maximum delay of the flow $F_i$ from the source to the output of this network is $T_i^{mn} = T_{tot}^i + \sigma_i/C_{tot}^i$, where $T_{tot}^i = T_{1,1} + \sum_{k=1}^{n-1} T_{k,k+1} + \sum_{k=1}^{n} (\tau_k + Y_i^k)$ and $C_{tot}^i = \min(C_{1,1}, C_{1,2}-X_1, C_{2,3}-X_2, \cdots, C_{n,n+1}-X_n)$. For the coding case, from the results of Section 3.2, the maximum delay of the flow $F_i$ from the source to the output of this network is $T_i^{cn} = T_{tot}^i + \sigma_i/C_{tot}^i$, where $C_{tot}^i = \min(C_{1,1}, C_{1,2}, \cdots, C_{n-1,n})$ and $T_{tot}^i = T_{1,1} + \sum_{k=1}^{n-1} T_{k,k+1} + \sum_{k=1}^{n} (T_{B_i}^k + T_{lc} + \tau_k)$. Even if the worst-case delays of the multiplexing and coding strategies are difficult to compare in the general case, the main interest of these formulas is to emphasize the parameters that must be taken into account to decrease the worst-case end-to-end delays.
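As a worked illustration of these formulas, the following Python sketch evaluates both end-to-end bounds for a path of $n$ nodes. The per-node quantities $Y_i^k$, $X_i^k$ and $T_{B_i}^k$ are taken as plain inputs here, whereas in the paper they come from the node-level analysis; that interface is our assumption.

```python
# Sketch of the end-to-end bounds of Section 4 for a flow crossing n nodes.
# T_prop holds T_{1,1} and the T_{k,k+1}; tau, Y, X, T_buf are per-node lists.

def e2e_mux(sigma_i, T_prop, tau, Y, C_path, X):
    """Multiplexing: T_tot^i + sigma_i / C_tot^i, C_tot reduced by cross traffic X_k."""
    T_tot = sum(T_prop) + sum(t + y for t, y in zip(tau, Y))
    C_tot = min([C_path[0]] + [c - x for c, x in zip(C_path[1:], X)])
    return T_tot + sigma_i / C_tot

def e2e_coding(sigma_i, T_prop, tau, T_buf, T_lc, C_path):
    """Coding: full link capacities kept, but T_buf + T_lc paid at every node."""
    T_tot = sum(T_prop) + sum(t + b + T_lc for t, b in zip(tau, T_buf))
    return T_tot + sigma_i / min(C_path)
```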
5 Conclusion
This paper has presented a new application of network coding: the worst-case end-to-end delays were determined for classical routing/multiplexing and for network coding approaches in networks providing QoS guarantees. The results show that the two formulas do not depend on the same parameters. They can be viewed as a first step toward the construction of networks that use network coding to provide an optimal level of guaranteed end-to-end delays.
References
1. Le Boudec, J.Y., Thiran, P.: Network Calculus: A Theory of Deterministic Queuing Systems for the Internet. Lecture Notes in Computer Science, Vol. 2050. Springer-Verlag (2001)
2. Cruz, R.L.: A Calculus for Network Delay, Part I: Network Elements in Isolation. IEEE Transactions on Information Theory, Vol. 37, No. 1 (January 1991) 114–131
3. Ahlswede, R., Cai, N., Li, S.-Y.R., Yeung, R.W.: Network Information Flow. IEEE Transactions on Information Theory, Vol. 46 (July 2000) 1204–1216
4. Ma, Y., Li, W., Fan, P., Liu, X.: Queuing Model and Delay Analysis on Network Coding. In: ISCIT (October 2005)
5. Eryilmaz, A., Ozdaglar, A., Médard, M.: On Delay Performance Gains from Network Coding. Invited paper, Proceedings of the Conference on Information Sciences and Systems (CISS) (2006)
LPD Based Route Optimization in Nested Mobile Network
Jungwook Song¹, Heemin Kim¹, Sunyoung Han¹, and Bokgyu Joo²
¹ Department of Computer Science and Engineering, Konkuk University, 1 Hwayang, Gwangjin, Seoul 143-701, Korea {swoogi,procan,syhan}@konkuk.ac.kr
² Department of Computer and Information Communications, Hongik University, 300 Shinan, Jochiwon, Chungnam 339-701, Korea [email protected]
Abstract. IETF working groups developed the Mobile IP protocols to support host mobility; the NEMO (Network Mobility) Working Group, specifically, developed the NEMO Basic Support Protocol, which enables mobile networks to change their point of attachment to the Internet. The protocol, however, leads to suboptimal routes and other problems, which become serious when mobile networks are nested. In this paper, we present a route optimization mechanism for nested mobile networks based on the 'Limited Prefix Delegation' technique, and we present simulation-based performance evaluation results to confirm its effectiveness.
1 Introduction
As mobile access networks become a utility for everyday life, thousands of mobile nodes will change their locations simultaneously and, in a city, mobile networks will be commonplace. The IETF standardized the NEMO (Network Mobility) Basic Support Protocol (NBSP) [1] to support network mobility by extending MIPv6 [2]. With the NBSP, networks as well as hosts can freely change their points of attachment to the Internet, while nodes in the network preserve their ongoing communication sessions. The NBSP, however, does not provide an optimal routing path for data packets, which results in various problems including packet delay and loss [3], and these problems become serious when mobile networks are nested. The nesting of mobile networks will become common in the future, since many networks as well as hosts will change their locations freely: for example, people with personal area networks (PANs) ride a bus with its own mobile network, and buses move onto a car ferry with its own mobile network. Route optimization for mobile networks is therefore a major issue to be solved in mobile IP research. In this paper, we propose a route optimization solution based on the 'Limited Prefix Delegation' (LPD) technique. Our solution is an extension of the NBSP
This research was supported by the ‘Seoul R&D Program’. Corresponding author.
by making some modifications to the mobile router (MR) function. The concept of our solution was presented elsewhere in a short article [4]; in this paper, we complete it by performing a comprehensive analysis of the mechanism and presenting simulation results. By simulation we show the effectiveness of our solution compared to the NBSP, and we also analyze the mechanism for real situations that may arise during the deployment stage.
2 LPD-Based Route Optimization
The problems of the NBSP become serious if mobile networks are nested. As the nesting level increases, so does the number of bi-directional tunnels between MRs and their HAs; the route of data packets becomes longer and more complicated (the 'pinball routing' problem). Moreover, the need for multiple encapsulations (deep tunnels) increases the processing overhead at the HA and MR, as well as the header size. The effects of the sub-optimal routes of the NBSP are fully described in [3].

2.1 Limited Prefix Delegation Technique
Our LPD mechanism adds and modifies the following features of the NBSP:
1. a new RA (Router Advertisement) option;
2. an extended binding update procedure at the mobile router;
3. a modified tunneling process at the mobile router.
The key concept of our mechanism is the delegation of the access router's prefix to the MRs underneath. The effect is that only one MR in the nested mobile network opens a direct tunnel to the CN, and route optimization is achieved. For an MR attached to the access router directly, the delegation is simple. To achieve this, we add a new RA option called the 'delegated prefix option' to the RA message; MNNs other than MRs silently ignore this option. This way, all MRs in the nested mobile network obtain care-of addresses within the subnet of the access router of the visited network, as sketched below. A more detailed description of the mechanism is given in [4].
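The following Python sketch illustrates how an LPD-extended MR might process the new option. The field and function names (delegated_prefix, form_coa) are hypothetical, since the paper does not define a packet format.

```python
# Hypothetical sketch of an LPD-extended MR handling a Router Advertisement
# (Section 2.1). Field names are assumptions, not defined by the NBSP or [4].

def handle_ra(mr, ra):
    """Prefer the delegated prefix of the access router, if the option is present."""
    prefix = ra.get("delegated_prefix", ra["prefix"])
    mr["coa"] = form_coa(prefix, mr["interface_id"])
    # Re-advertise downstream so nested MRs also configure within this subnet.
    mr["ra_out"] = {"prefix": mr["own_prefix"], "delegated_prefix": prefix}
    return {"type": "binding_update", "coa": mr["coa"]}  # sent to the MR's HA

def form_coa(prefix, iid):
    # Placeholder for stateless address autoconfiguration on the delegated prefix.
    return f"{prefix}:{iid}"
```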
2.2 Analysis of Routes Taken by LPD Mechanism
Please refer to Appendix B of [3] in parallel with this section. We also consider a special situation arising during the deployment stage of this mechanism and show that our solution still achieves the best route practically possible. The analysis of all cases (A to L) is summarized in Table 1. During the deployment stage of the LPD mechanism, however, we cannot expect all MRs in nested mobile networks to be LPD-extended; some MRs are just 'plain MRs' supporting only the NBSP while others are 'LPD-extended MRs'. Our solution still achieves near-optimal routes even in that circumstance. In Case B of [3], if all MRs are plain MRs and the CN is Mobile IPv6-enabled, the routing path of packets between the VMN and CN under the NBSP would take
Table 1. Routing path of each case with LPD. For example, in Case A both the MNN and CN are fixed nodes with no mobility functions. In our solution, MR3 opens a tunnel to the CN directly, so packets from the LFN to the CN take an optimized route; since the CN has no mobility function, it sends packets to MR3's HA, which then tunnels them to MR3 directly.

CASE   Routing Path of LPD Mechanism
A      (to CN) LFN → MR3 → MR2 → MR1 → CN; (from CN) LFN ← MR3 ← MR2 ← MR1 ← MR3 HA ← CN
B      VMN ↔ MR3 ↔ MR2 ↔ MR1 ↔ CN
C      (to CN) VMN → MR3 → MR2 → MR1 → CN; (from CN) VMN ← MR3 ← MR2 ← MR1 ← VMN HA ← CN
D, E   LFN(VMN) ↔ MR3 ↔ MR2 ↔ MR1 ↔ MR4 ↔ MR5 ↔ CN
F      VMN ↔ MR3 ↔ MR2 ↔ MR1 ↔ MR4 ↔ MR5 ↔ CN
G, H   LFN(VMN) ↔ MR3 ↔ MR2 ↔ MR1 ↔ MR4 ↔ MR5 ↔ CN
I      VMN ↔ MR3 ↔ MR2 ↔ MR1 ↔ MR4 ↔ MR5 ↔ CN
J, K   LFN(VMN) ↔ CN
L      VMN ↔ MR3 ↔ CN
pinball routing (see Case B of Appendix B in [3]). When both types of MRs co-exist in this case, there are two sub-cases: two of the three MRs are LPD-extended (Special Case I), or only one of the three MRs is LPD-extended (Special Case II). The routing paths of data packets are summarized in Table 2.

Table 2. Routing path of the special cases. These results show that our solution still achieves the near-optimal route possible in any situation.

CASE              Routing Path of LPD Mechanism
Special Case I    VMN ↔ MR3 ↔ MR2 ↔ MR1 ↔ MR1 HA ↔ CN
Special Case II   VMN ↔ MR3 ↔ MR2 ↔ MR1 ↔ MR1 HA ↔ MR3 HA ↔ CN
3 Performance Evaluation
To evaluate the performance of our solution, we performed simulations with OMNeT++, an open-architecture simulation environment for communication networks [5]. We considered all twelve network configuration cases (A to L) specified in Appendix B of [3], and also simulated the two special cases described above. The results are summarized in Table 3 and Table 4.
Table 3. Simulation results (average RTT). This table shows the statistics of the RTT values for ten configuration model cases. The average RTT values of the LPD mechanism are significantly smaller than those of the NBSP in all cases.

Case   NBSP    LPD
A      429ms   218ms
B      429ms   114ms
C      533ms   114ms
D, E   645ms   120ms
F      750ms   120ms
G, H   753ms   18ms
I      858ms   18ms
L      858ms   6ms
Table 4. Simulation results of the special cases (average RTT). We also simulated the special cases that may arise during deployment; the results show that even one LPD-enabled MR can decrease the network cost.

Case              Mean of RTTs
NBSP (Case A)     429ms
Special Case I    219ms
Special Case II   324ms

4 Concluding Remarks
In this paper, we proposed the LPD mechanism, which solves the route optimization problem of mobile networks by delegating the network prefix of the access router to all MRs attached behind it. To achieve route optimization, we made extensions to the MR only; no change is necessary to other network elements such as the MNN, CN, and HA. We need further investigation of security issues to complete the protocol, and we are planning to implement our solution on NEPL [6].
References
1. Devarapalli, V., Wakikawa, R., Petrescu, A., Thubert, P.: Network Mobility (NEMO) Basic Support Protocol. RFC 3963, IETF (2005)
2. Johnson, D., Perkins, C., Arkko, J.: Mobility Support in IPv6. RFC 3775 (2004)
3. Ng, C., Thubert, P., Watari, M., Zhao, F.: Network Mobility Route Optimization Problem Statement. IETF NEMO WG Draft, draft-ietf-nemo-ro-problem-statement-03 (2006)
4. Song, J., Han, S., Park, K.: Route Optimization in NEMO Environment with Limited Prefix Delegation Mechanism. ICCS 2006, Part I, LNCS 3991, pp. 936–939 (2006)
5. Varga, A.: OMNeT++, http://www.omnetpp.org
6. Mobile IPv6 for Linux (MIPL): http://www.mobile-ipv6.org/
PIBUS: A Network Memory-Based Peer-to-Peer IO Buffering Service*
Yiming Zhang, Dongsheng Li, Rui Chu, Nong Xiao, and Xicheng Lu
National Laboratory for Parallel and Distributed Processing, NUDT, Changsha 410073, Hunan, China [email protected]
Abstract. This paper proposes a network memory-based P2P IO BUffering Service (PIBUS), which buffers blocks for IO-intensive applications in P2P network memory, acting like a second-level disk cache. PIBUS reduces the IO overhead on local cache misses, thanks to the speed advantage of network memory over disks, and improves the hit ratio based on an accurate classification of IO behaviors.
1 Introduction
Given the limited size of the disk cache, the performance of IO-intensive applications is determined by the hit ratio of the disk cache and by the IO overhead on cache misses [1]; under the limits of a purely local cache, neither can be improved significantly. Recently we proposed RAM-Grid [2], which demonstrated the feasibility of using Internet memory for performance purposes. This motivates the idea of using Internet memory for IO-intensive applications. However, the assumption of unlimited network memory made by RAM-Grid does not hold here [3], so indiscriminately buffering all blocks in network memory (as RAM-Grid does) is impractical and prohibitive. In this paper we propose a network memory-based P2P IO BUffering Service (PIBUS), in which each node views the free memory of other nodes in P2P overlays as a second-level disk cache. PIBUS is built on top of the Armada DHT [4] and includes two basic services: a caching service to reduce the latency of IO operations, and a policy service to improve the hit ratio of the local cache. Based on the speed advantage of network memory over local disks, PIBUS reduces the IO latency when the local cache is missed. Furthermore, PIBUS helps manage the local cache through a fine-grained classification of IO behaviors, and thus improves the hit ratio effectively.
* The work was partially supported by the National Natural Science Foundation of China under Grant No. 60673167 and No. 90412011, and by the National Basic Research Program of China (973) under Grant No. 2005CB321801.

2 Block Caching Service
Due to lack of space, all detailed analysis is omitted here and can be found at [5]. Blocks are sent to the caching service when they are replaced out of the local cache. Network IO is usually faster than local disk IO due to the speed advantage of network memory
over local disks, and these blocks are fetched back through the network simultaneously with local disk reads when re-accessed. There are four node roles in PIBUS: Provider, Consumer, Potential Provider, and Potential Consumer, with the following transitions (sketched in code after this list):
1) When a Potential Provider finds that its local memory utilization exceeds some threshold, it turns into a Potential Consumer, and a Cancel-Publication and a Make-Subscription occur.
2) When a Potential Consumer finds that its own memory utilization is below a certain threshold, it becomes a Potential Provider, and a Cancel-Subscription and a Make-Publication occur.
3) When a Potential Consumer starts IO-intensive applications and begins buffering its obsolete blocks in subscribed caching services, it becomes a Consumer.
4) When a Potential Provider begins its caching service, it becomes a Provider and a Cancel-Publication occurs.
5) When a Consumer ends its IO-intensive applications, it becomes a Potential Consumer.
6) When a Provider's caching service is no longer used, it becomes a Potential Provider and a Make-Publication occurs.
7) When a Provider finds that its utilization exceeds some threshold, it hands over its buffered blocks to others and turns into a Potential Consumer.
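A minimal sketch of these transitions follows, assuming concrete threshold values; the paper leaves the thresholds unspecified.

```python
# Sketch of the PIBUS role transitions of Section 2. The thresholds and the
# boolean inputs are assumptions; only the seven rules come from the paper.

HIGH, LOW = 0.9, 0.5   # assumed memory-utilization thresholds

def next_role(role, mem_util, io_intensive, service_in_use):
    if role == "potential_provider":
        if mem_util > HIGH:
            return "potential_consumer"   # rule 1: cancel publication, subscribe
        if service_in_use:
            return "provider"             # rule 4: caching service started
    elif role == "potential_consumer":
        if mem_util < LOW:
            return "potential_provider"   # rule 2: cancel subscription, publish
        if io_intensive:
            return "consumer"             # rule 3: starts buffering blocks
    elif role == "consumer" and not io_intensive:
        return "potential_consumer"       # rule 5
    elif role == "provider":
        if not service_in_use:
            return "potential_provider"   # rule 6
        if mem_util > HIGH:
            return "potential_consumer"   # rule 7: hand over buffered blocks
    return role
```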
3 Replacement Policy Service
Similarly to other policies, in PIBUS the Consumer records accesses that hit its local cache and makes a coarse-grained preliminary classification into Unknown, One-off, and Repeating; these three patterns have counterparts in recent proposals [1]. However, for lack of a record of each block's access history, in practice a large fraction of the so-called Repeating blocks is not really accessed periodically. To account for this imprecision, the policy service keeps an accurate record of the accesses to the blocks buffered in its memory and further classifies the Repeating pattern into Regular-Looping, Single-Clustered, and Multi-Clustered, as shown in Figure 1.
Fig. 1. Examples of IO patterns: One-off, Regular-Looping, Single-Clustered, and Multi-Clustered
As shown in Figure 2, each Consumer maintains a File table, in which each entry represents the IO record of a file, including the file ID, the count of One-off accesses, the count of Repeating accesses, the current IO pattern, and the period of Repeating accesses. For each file at the Consumer, the Provider maintains a Block table, in which each entry records the block number, the current access pattern, Period1, Period2 and the last access time. Period1 records the period of Regular-Looping accesses, or the intra-cluster period of Single-Clustered and Multi-Clustered accesses; Period2 records the inter-cluster period of Multi-Clustered accesses.
Fig. 2. Data structures of policy service
When a block is accessed, we first look up the block table. If the block is not in the table, the One count is increased; otherwise the Rep count is increased and the One count is decreased. If the block has been accessed before, it is further identified by the Provider; otherwise the block takes the current pattern of its file as its own pattern. Files are classified based on Rep, One and Threshold: if the Rep count is greater than the One count, the file is classified as Repeating; otherwise the file is classified as One-off if One is greater than Threshold, or Unknown if One is less than Threshold.
Fig. 3. Further identification algorithm
As shown in Figure 3, the policy service further identifies Repeating blocks. If the access intervals of a Repeating block differ only within a small range, it is classified as Regular-Looping, and Period1 records the period. If a block had been classified as Regular-Looping but is not accessed any more, it is reclassified as Single-Clustered, and Period1 records the intra-cluster period. Otherwise it is classified as Multi-Clustered, and Period1 records the intra-cluster period while Period2 records the inter-cluster one.
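The two classification stages can be summarized in the following Python sketch. THRESHOLD and the interval tolerance TOL are assumed parameters, while the counters and pattern names come from the paper.

```python
# Sketch of the two-stage classification of Section 3. THRESHOLD and TOL are
# assumptions; the paper fixes only the pattern names and the Rep/One logic.

THRESHOLD, TOL = 8, 0.1

def classify_file(rep, one):
    if rep > one:
        return "Repeating"
    return "One-off" if one > THRESHOLD else "Unknown"

def classify_repeating(intervals, still_accessed):
    """Fine-grained identification performed by the Provider (Figure 3)."""
    mean = sum(intervals) / len(intervals)   # assumes at least one interval
    regular = all(abs(i - mean) <= TOL * mean for i in intervals)
    if regular:
        # Period1 records the (intra-cluster) period in both of these cases.
        return ("Regular-Looping", mean) if still_accessed else ("Single-Clustered", mean)
    return ("Multi-Clustered", mean)   # Period2 would hold the inter-cluster period
```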
Blocks are placed in the corresponding subcache. 1) The Unknown blocks in the Unknown subcache are managed by LRU. 2) The One-off blocks are discarded, because they will not be accessed again. 3) All the Repeating blocks are buffered in the Repeating subcache; they are ranked according to the estimated next access time (ENAT), and the last one is replaced out. ENAT is estimated as follows: Regular-Looping blocks use (Block->Period1 - LastAccessTime); Single-Clustered blocks use (Block->Period1 - LastAccessTime); Multi-Clustered blocks use (Block->Period1 - LastAccessTime) if the interval between the current time and LastAccessTime is less than n × Block->Period1, and (Block->Period2 - LastAccessTime) otherwise.
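A hedged sketch of the resulting victim selection, keeping the paper's ENAT expressions as written; now and n are assumed inputs.

```python
# Sketch of the Repeating-subcache replacement of Section 3; it transcribes the
# paper's ENAT expressions verbatim. `now` and `n` are assumed inputs.

def enat(block, now, n=2):
    if block["pattern"] in ("Regular-Looping", "Single-Clustered"):
        return block["period1"] - block["last_access"]
    # Multi-Clustered: switch to the inter-cluster period once the
    # intra-cluster window has passed.
    if now - block["last_access"] < n * block["period1"]:
        return block["period1"] - block["last_access"]
    return block["period2"] - block["last_access"]

def pick_victim(repeating_blocks, now):
    """Replace the block whose estimated next access is farthest away."""
    return max(repeating_blocks, key=lambda b: enat(b, now))
```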
4 Evaluation
We traced an actual meteorological application by modifying the Linux kernel to record all IO operations, including the access time, file ID and offset in the file. We had built a simulation environment for RAM-Grid in our previous work [2]; based on this, we simulate PIBUS using Armada for resource discovery. The underlying network is composed of 1,000 nodes with different memory capacities, CPU frequencies, bandwidths, etc. The influence of PIBUS on system IO performance is shown in Table 1.

Table 1. PIBUS vs. LRU under different local cache sizes: completion time / hit ratio (%)

Cache size   LRU          PIBUS
32 MB        56.5s / 5    29.2s / 24
64 MB        53.2s / 10   25.5s / 28
128 MB       51.4s / 13   22.3s / 35
256 MB       50.8s / 15   14.5s / 68
512 MB       41.2s / 33   11.6s / 79
1024 MB      10.1s / 83   10.1s / 83
5 Conclusion
PIBUS uses the Armada protocol as its underlying resource discovery infrastructure and is composed of a caching service and a policy service. Trace-driven simulation shows that PIBUS improves the performance of IO-intensive applications efficiently.
References
1. Gniady, C., Butt, A.R., Hu, Y.C.: Program-Counter-Based Pattern Classification in Buffer Caching. In: OSDI 2004
2. Chu, R., Xiao, N., Zhuang, Y., Liu, Y., Lu, X.: A Distributed Paging RAM Grid System for Wide-area Memory Sharing. In: IPDPS 2006
3. Ghemawat, S., Gobioff, H., Leung, S.T.: The Google File System. In: SOSP 2003
4. Li, D., Cao, J., Lu, X., Chan, K.C.C., Wang, B., Su, J., Leong, H., Chan, A.T.S.: Delay-Bounded Range Queries in DHT-based Peer-to-Peer Systems. In: ICDCS 2006
5. http://www.kylinx.com/Papers/PIBUS_Networking_1208.pdf
A Subgradient Optimization Approach to Inter-domain Routing in IP/MPLS Networks
Artur Tomaszewski¹, Michał Pióro¹,², Mateusz Dzida¹, Mariusz Mycek¹, and Michał Zagożdżon¹
¹ Institute of Telecommunications, Warsaw University of Technology, Poland
² Department of Communication Systems, Lund University, Sweden

Abstract. We present a mathematical model for a distributed process of routing optimization that could be run in the control plane of the Internet using existing EGP routing protocols. A more detailed description of the results presented in this paper is given in [1].
1 Introduction
Each domain of the Internet is usually operated as an autonomous system (AS), which, for security, competitiveness, administrative and technical reasons, does not reveal its sensitive internal information concerning topology, link capacities, traffic volumes, and routing parameters. Only a very limited amount of data is made available to other AS's by means of exterior gateway protocols (EGP) such as BGP (see [2] and the discussion there): border routers of neighboring AS's exchange information concerning the capability of reaching a destination, or a group of destinations, identified by a network address. It is virtually impossible for a single domain to decide on its own on the optimal flow of inter-domain traffic through its routers/gateways using only the information about the accessibility of given destinations via particular gateways, and having no knowledge of the overall network topology, routing resources, and end-to-end traffic demand. Even if such information were made available to an AS, its local decisions would still have to be consistent with those in other domains. It seems that to reach globally "optimal" traffic routing, a distributed, network-wide routing optimization process would have to be implemented and run in the control plane of the network using the existing EGP protocols. For this, an optimization model is needed that decomposes into subproblems that can be solved locally in individual domains. Moreover, the results of solving the subproblems exchanged by the domains must be limited to only those pieces of information that directly concern the inter-domain traffic and inter-domain links, and can be distributed using EGP protocols. Below we present a decomposable mathematical model for the problem of optimizing bandwidth reservation levels on inter-domain links that satisfies the above-mentioned requirements. The decomposition method of Lagrangean relaxation leads to a master problem that uses limited information delivered by individual domains: it concerns only inter-domain traffic and inter-domain links, and is based on detailed intra-domain routing solutions computed locally at the AS's. We show how to solve the master problem using a subgradient minimization method and how to recover optimal bandwidth reservation levels.
2 Primal and Dual Formulation
A network is represented by a directed graph $G = (V, E)$ with the set of nodes $V$ and the set of directed links $E$ ($E \subseteq V \times V$). The capacity of link $e \in E$ is denoted by $c_e$. For $U \subseteq V$ we define the set of links $\delta^+(U)$ outgoing from set $U$, and the set of links $\delta^-(U)$ incoming to set $U$ (we write $\delta^\pm(v)$ instead of $\delta^\pm(\{v\})$ when $U = \{v\}$ is a singleton). $M$ is the set of network domains. For each domain $m \in M$, $E^m$ is the set of links between the nodes in the same domain $m$ (intra-domain links). Further, $E_O$ is the set of all inter-domain links, and $E_I = E \setminus E_O$ is the set of all intra-domain links. Set $D$ represents the traffic demands between pairs of nodes. The starting and terminating nodes of demand $d \in D$ are denoted by $s(d)$ and $t(d)$, respectively, and $h_d$ is the traffic volume of demand $d \in D$. Also, $D(s,t)$ denotes the set of all demands from node $s$ to node $t$. In the sequel, $z_d$ denotes the variable specifying the percentage of volume $h_d$ actually handled in the network, i.e., $z_d h_d$ is the carried traffic of demand $d$. The set of all demands originating in domain $m \in M$ is denoted by $D^m$. The basic quantity in the considerations of this paper is $x_{et}$, a variable specifying the amount of aggregated bandwidth (called flow in the sequel) reserved on link $e \in E$ for the traffic destined for (a remote) node $t \in V$. For each inter-domain link $e \in E_O$ we also introduce variables $x^+_{et}$ and $x^-_{et}$ expressing, respectively, the amount of traffic carried on $e$ and destined for $t$ assumed by the domain in which link $e$ originates, and by the domain in which link $e$ terminates. We write down one obvious constraint stating that $x^+_{et} = x^-_{et} = x_{et}$, $e \in E_O$, $t \in V$, the flow conservation constraint, and the capacity constraint:

$$\sum_{e \in \delta^+(v) \cap E_I} x_{et} - \sum_{e \in \delta^-(v) \cap E_I} x_{et} + \sum_{e \in \delta^+(v) \cap E_O} x^+_{et} - \sum_{e \in \delta^-(v) \cap E_O} x^-_{et} = \sum_{d \in D(v,t)} z_d h_d, \qquad t, v \in V,\ v \neq t \qquad (1a)$$

$$\sum_{t \in V} x_{et} \le c_e, \qquad e \in E. \qquad (1b)$$

For each domain $m \in M$ we introduce flow vectors $z^m = (z_d : d \in D^m)$, $x^m = (x_{et} : e \in E^m, t \in V)$, $x^{m+} = (x^+_{et} : e \in \delta^+(V^m), t \in V)$, $x^{m-} = (x^-_{et} : e \in \delta^-(V^m), t \in V)$, and $X^m = (x^m, x^{m+}, x^{m-})$. The set of all pairs of non-negative vectors $(z^m, X^m)$ satisfying constraints (1) will be denoted by $\mathcal{Y}^m$. The routing optimization problem can now be stated as follows:

$$\max \quad F(z) = \sum_{m \in M} \sum_{d \in D^m} z_d h_d \qquad (2a)$$
$$\text{s.t.} \quad Y^m = (z^m, X^m) \in \mathcal{Y}^m, \qquad m \in M \qquad (2b)$$
$$\qquad\;\; x^-_{et} = x^+_{et}, \qquad e \in E_O,\ t \in V. \qquad (2c)$$
Now, let us consider the problem dual to (2), obtained by Lagrangean relaxation of constraints (2c). Let $\lambda = (\lambda_{et} : e \in E_O, t \in V)$ be a vector of multipliers associated with constraints (2c). Assume that $M = \{1, 2, \ldots, M\}$ and define $Y = (Y^1, Y^2, \ldots, Y^M)$ and $\mathcal{Y} = \mathcal{Y}^1 \times \mathcal{Y}^2 \times \ldots \times \mathcal{Y}^M$. As shown in [1], the dual can be written as follows:

$$\min_\lambda w(\lambda). \qquad (3)$$

where $w(\lambda) = \sum_{m \in M} w^m(\lambda)$ is the dual function defined through the subproblems $w^m(\lambda) = \max\{L^m(\lambda; Y^m) : Y^m \in \mathcal{Y}^m\}$. Each $L^m(\lambda; Y^m) = \sum_{d \in D^m} z_d h_d + \sum_{t \in V}\bigl(\sum_{e \in \delta^-(V^m)} \lambda_{et} x^-_{et} - \sum_{e \in \delta^+(V^m)} \lambda_{et} x^+_{et}\bigr)$ is a component of the decomposed Lagrangean function $L(\lambda; Y) = \sum_{m \in M} L^m(\lambda; Y^m)$. Thus, for any given $\lambda$, the computation of $w(\lambda)$ reduces to solving a set of $M$ mutually independent subproblems, where for each domain $m \in M$ the objective is to find a maximizer $\hat{Y}^m$ of $L^m(\lambda; Y^m)$.
3 Solving the Dual and Recovering a Primal Solution
An efficient decomposed approach to the primal problem (2) is to apply a subgradient-based procedure that resolves the dual problem (3) in the variables $\lambda$ and recovers an optimal primal solution in the process (see [4], [5]). An example of such an algorithm (referred to as SPR in the sequel) is as follows.

Step 0: Set $\lambda^0$, $\gamma$, and $\eta$. Find a maximizer $\hat{Y}^0$ of $L(\lambda^0; Y)$; $Y^* := \hat{Y}^0$; $j := 1$.
Step 1: $\lambda^j := \lambda^{j-1} - \gamma \nabla w(\lambda^{j-1})$.
Step 2: Find a maximizer $\hat{Y}^j$ of $L(\lambda^j; Y)$ and modify the current estimate of the optimal primal solution: $Y^* := \eta \hat{Y}^j + (1 - \eta) Y^*$.
Step 3: If the stopping criterion is satisfied, stop. Else, modify $\gamma$ and $\eta$, put $j := j + 1$ and go to Step 1.

Above, $\nabla w(\hat{\lambda})$ is a subgradient of the dual function at point $\hat{\lambda}$, $\gamma > 0$ is a step-size, and $0 < \eta < 1$ is a weight parameter. We note that subgradients are obtained as a by-product of the distributed computation of the value of $w(\hat{\lambda})$: if $\hat{Y}$ is a maximizer of the Lagrange function $L(\hat{\lambda}; Y)$ for fixed $\hat{\lambda}$, then the corresponding subgradient is $\nabla w(\hat{\lambda}) = (\hat{x}^-_{et} - \hat{x}^+_{et} : e \in E_O, t \in V)$ [3]. A stopping criterion is, for example, no improvement in the value of the dual function $w(\lambda^j)$. In general, the primal solution $Y^*$ produced by SPR is approximate. This is, however, not a big issue, since the input data available at the domains for solving the subproblems is intrinsically inaccurate; consequently, a primal solution computed with an accuracy of, say, 5% will be sufficient for the purpose of multi-domain routing optimization. SPR can be distributed among the domains so that each domain $m \in M$ is responsible for adjusting the values of $\lambda^{m+} = (\lambda_{et} : e \in \delta^+(V^m), t \in V)$ for all its outgoing inter-domain links using the partial subgradient $\nabla^m w(\lambda^{m+}) = (x^-_{et} - x^+_{et} : e \in \delta^+(V^m), t \in V)$. Observe that to compute $\nabla^m w(\lambda^{m+})$, the domain has to collect the values $x^-_{et}$, $e \in \delta^+(V^m)$, $t \in V$, from its neighboring domains. To make the process consistent, it must be ensured that the same parameters $\gamma$ and $\eta$ for adjusting the individual components of $\lambda$ and $Y^*$, for example $\gamma = a/(b + cj)$ and $\eta = 1/j$ (see [5]), are used simultaneously in all the domains. The information required from each domain $m \in M$ during the process of resolving the dual problem (3) (called the master problem) is limited to the vectors
$x^{m-}$ and $x^{m+}$, and the scalar $\sum_{d \in D^m} z_d h_d$. These concern basically only the inter-domain traffic and do not reveal any confidential data related to a domain. In fact, with a simple SPR (such as the one described above), a good approximation of the optimal $Y^*$ is hard to reach in a reasonable number of steps. This can, however, be achieved with a more sophisticated approach called the conic bundle method, available as freeware in [6].
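For illustration, a minimal Python sketch of the SPR iteration is given below. The per-domain maximization is a stub (solve_subproblem), since in reality each domain runs its own local optimization; the step-size and averaging rules follow gamma = a/(b + cj) and eta = 1/j as quoted above. Everything about the data layout is our assumption.

```python
# Sketch of the distributed SPR iteration of Section 3. Keys of `lam` are
# (inter-domain link, destination) pairs; solve_subproblem() is a stub.

def spr(domains, inter_links, a=1.0, b=1.0, c=0.1, iters=100):
    lam = {e: 0.0 for e in inter_links}   # one multiplier per (link, dest)
    y_star = None
    for j in range(1, iters + 1):
        gamma, eta = a / (b + c * j), 1.0 / j
        # Each domain solves its subproblem for the current multipliers.
        sols = [solve_subproblem(d, lam) for d in domains]
        # Subgradient component: x^-_et - x^+_et on every inter-domain link.
        grad = {e: sum(s["x_in"].get(e, 0) - s["x_out"].get(e, 0) for s in sols)
                for e in inter_links}
        lam = {e: lam[e] - gamma * grad[e] for e in inter_links}
        # Running primal estimate Y* := eta * Y^j + (1 - eta) * Y*.
        y_j = {e: sum(s["x_out"].get(e, 0) for s in sols) for e in inter_links}
        if y_star is None:
            y_star = y_j
        else:
            y_star = {e: eta * y_j[e] + (1 - eta) * y_star[e] for e in inter_links}
    return y_star

def solve_subproblem(domain, lam):
    """Stub: a real domain would maximize L^m(lambda; Y^m) locally here."""
    return {"x_in": {}, "x_out": {}}
```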
4 Final Remarks
We tested the convergence of SPR on several randomly generated example multi-domain networks with up to 40 nodes, 184 directed links and 1560 directed demands. We used two different SPR algorithms: an advanced deflection algorithm, CB, taken from [6], and a plain algorithm, SGR, of the type described in Section 3. Using SGR it was not possible to obtain solutions with 1% accuracy for most of the example networks, nor a 5%-quality solution for the biggest network. Using CB, all problems were solved, even at 1% solution accuracy. Still, even though the CB method is more effective than SGR, its application in the distributed environment of the Internet may be difficult. Contrary to CB, the implementation of SGR is simple; we hope that a tuned variant of SGR can assure a good tradeoff between implementation feasibility and convergence in a multi-domain environment. A practical implementation of the presented approach would result in a continuous network-wide process, distributed over the domains, responsible for establishing consistent, near-optimal bandwidth reservation levels. A major problem is to provide scalable synchronization of the sub-processes running in the individual domains, so that the overall process converges even in large networks. Specifying such a network-wide distributed optimization process will be a subject of future work.
Acknowledgment. The research presented in this paper has been sponsored by the Polish Ministry of Science and Higher Education (grant 3 T11D 001 27), by the Swedish Research Council (grant 621-2006-5509), and by FP6 NoE Euro-NGI.
References
1. Tomaszewski, A., Pióro, M., et al.: Towards Distributed Inter-Domain Routing Optimization for IP/MPLS Networks. Technical Report, Warsaw University of Technology (2006), http://ztit.tele.pw.edu.pl/TR/NDG/rmd07.pdf
2. Feamster, N., Borkenhagen, J., Rexford, J.: Guidelines for Interdomain Traffic Engineering. ACM SIGCOMM Computer Communications Review, Vol. 33, No. 5, pp. 19–30 (October 2003)
3. Minoux, M.: Mathematical Programming: Theory and Algorithms. Wiley (1986)
4. Shor, N.Z.: Minimization Methods for Non-differentiable Functions. Springer-Verlag (1985)
5. Sherali, H.D., Choi, G.: Recovery of Primal Solutions when Using Subgradient Optimization Methods to Solve Lagrangean Duals of Linear Programs. Operations Research Letters 19 (1996) 105–113
6. Helmberg, C.: ConicBundle 0.1. Fakultät für Mathematik, Technische Universität Chemnitz (2005), http://www-user.tu-chemnitz.de/~helmberg/ConicBundle/
Cost-Based Approach to Access Selection and Vertical Handover Decision in Multi-access Networks
Fanchun Jin¹, Hyeong-Ah Choi¹, Jae-Hoon Kim², Se-Hyun Oh², Jong-Tae Ihm², JungKyo Sohn³, and Hyeong In Choi³
¹ Department of Computer Science, George Washington University, Washington, DC
² Technology Strategy Group, SK Telecom, Seoul, Korea
³ Department of Mathematics, Seoul National University, Seoul, Korea
{jinfc,hchoi}@gwu.edu, [email protected], {shoh,jtihm}@sktelecom.com, {jgsohn,hichoi}@snu.ac.kr
Abstract. In multi-access network environments, mobile stations may face multiple choices when selecting an access network. Carefully designed access selection schemes can provide not only mobile users with better services but also network operators with better resource utilization. It is also envisioned that further improvements can be achieved by redistributing mobile stations from one access network to another (i.e., vertical handovers). Such decisions should follow carefully designed, yet simple to implement, protocols. In this paper, we present a cost-based scheme for access selection and vertical handover decision algorithms. The proposed algorithm was implemented in MANSim (Multiple Access Network Simulator), a Java-based simulator that we developed. Our simulation results show that during congested periods the network throughput is significantly improved, with a greatly reduced call drop rate.
Keywords: Multi-access networks, access selection, vertical handover, cost value, marginal cost function, network throughput, call drop.
1 Introduction Growing demands for ubiquitous coverage and emerging mobile wireless applications are leading to fundamental changes in the wireless networking paradigms. The wireless landscape consists of a large number of protocols and service providers, providing their services to a variety of users with different traffic characteristics and hardware capabilities. It is expected that new radio access technologies (RATs) will be deployed in the future, but most likely the existing RATs will not be completely replaced by new RATs [1]. Each RAT operates on a different spectrum band and occupies different bandwidth capacities. With different traffic types and arrival processes in different cells, the traffic load of each cell significantly varies, and the resource management should not be uniformly applied across the cells; rather, it should be done by considering various parameters specific to each cell and each RAT. The current research trends
This work was in part supported by the Access Network & Mobile Terminal R&D Center, SK Telecom, Seoul, Korea.
in wireless communications aim at developing technologies that allow network operators to provide mobile users with access to any network and Always Best Connected (ABC) services [2]. The next generation of wireless networks, commonly referred to as Beyond 3G (B3G) networks, is envisioned to support higher bandwidth requirements on fully digital, all-IP based networks that use a common frequency band across all providers and regions. Most of the work in this effort [3,4] is at a rather conceptual stage, while some of it concentrates on performance gains using simulations or experiments in ad-hoc manners.
2 Access Selection and Vertical Handover Decision
Our proposed approach will be demonstrated using the operational policy discussed in this section. We begin with a description of the system model we consider.
System Model: The network consists of multiple radio access technologies (RATs) and multi-mode mobile stations that can access any RAT in the network. The following aspects of the network are considered in our work.
• The network consists of four types of RATs: CDMA1x, EvDO, WCDMA, and HSDPA.
• The CDMA1x and EvDO systems operate on different spectrum bands, while WCDMA and HSDPA share the same spectrum band.
• Two types of WCDMA system exist: one operating alone, called the WCDMA-alone system, and one sharing its spectrum with HSDPA, called the WCDMA/HSDPA system, in which a fixed amount of the total power is allocated for WCDMA and HSDPA to share.
• Voice traffic is served only by the CDMA1x system, the WCDMA-alone system, or the non-HSDPA part of the WCDMA/HSDPA system.
• Data traffic is served only by the EvDO system or the HSDPA part of the WCDMA/HSDPA system.
• Voice traffic has higher priority than data traffic. Hence, new voice traffic may be admitted to the non-HSDPA part of a WCDMA/HSDPA cell by dropping or redirecting existing data traffic in its HSDPA part.
Operational Policy: The following operational policy rules are assumed when admitting new mobile stations generating voice or data traffic into the network or when making existing stations hand over to other cells of the same or a different RAT.
1) CDMA1x or WCDMA-alone cells have priority over WCDMA/HSDPA cells where voice traffic is concerned.
2) EvDO cells have priority over WCDMA/HSDPA cells where data traffic is concerned.
3) Voice traffic has priority over data traffic in WCDMA/HSDPA cells, i.e., new voice traffic may be admitted to a WCDMA/HSDPA cell by dropping its existing data traffic.
Note that rule 3) is a commonly practiced operational policy for handling voice and data traffic in WCDMA/HSDPA systems; rules 1) and 2) are then naturally
established. From the network operator's perspective, a clear goal is to maximize the network's overall throughput while minimizing call drops. In order to accommodate bursty traffic arriving randomly in arbitrary cells at arbitrary times, the radio resources should be managed to balance traffic loads across the cells.
Marginal Cost Functions: We define a marginal cost function for each cell that computes its cost value, i.e., a virtual price, taking the current cell load and the cell capacity as input parameters. Using these cost values, a new mobile station is admitted to the cell with the lowest cost value among all accessible cells (i.e., candidate cells to which the mobile station has a strong enough signal). The cost value of each cell is periodically updated, and existing mobile stations may be forced to perform handovers to other cells of the same or a different RAT; a handover to a cell with a different RAT is called a vertical handover. For a cell $v$ serving voice calls, its load $x_v$ (or capacity $c_v$) is defined as the number of voice calls currently served (or the maximum number of voice calls that can be served) by $v$. For a cell $d$ serving data traffic, its load $x_d$ (or capacity $c_d$) is defined as the total amount of bandwidth provided by $d$ to the mobile stations currently in the cell (or the maximum bandwidth that $d$ can provide). For a WCDMA/HSDPA cell serving both voice and data traffic, the voice capacity is defined as the maximum number of voice calls that can be served using all the power allocated to the system, and the data capacity as the maximum bandwidth the system can provide using all the power allocated to it. Let $\alpha$ be a real number with $0 < \alpha < 1$. The marginal cost function for each cell (voice or data) of a given RAT is then defined as follows (a code sketch is given at the end of this section).
• For CDMA1x, $f_{voice}(x_v) = (x_v/c_v)^2$ if $x_v/c_v < \alpha$, and $(x_v/c_v)^2 + 1$ otherwise.
• For WCDMA-alone, $f_{voice}(x_v) = (x_v/c_v)^2$ if $x_v/c_v < \alpha$, and $(x_v/c_v)^2 + 1$ otherwise.
• For EvDO, $f_{data}(x_d) = (x_d/c_d)^2 + 1$.
• For WCDMA/HSDPA, $f_{data}(x_d) = (x_d/c_d)^2/(1 - x_v/c_v)^2 + 1$ and $f_{voice}(x_v) = (x_v/c_v)^2 + 1$.
The actual value of $\alpha$ should be determined by considering many network parameters, such as the traffic arrival rate, burstiness, the time interval at which the cost values are updated and vertical handovers are performed, and the acceptable voice call drop rate set by the operator. Computing the optimal value of $\alpha$ requires precise network modeling and in-depth analysis; we simply assume $\alpha = 0.9$ in our simulations. Each base station periodically updates its cost values and reports them to the central resource manager (CRRM). When a connection request arrives from a mobile station, the CRRM selects the base station with the minimum cost value and directs the mobile station to connect to it. For vertical handover, either the network makes the decision for each mobile station, or each mobile station makes the decision according to the cost values broadcast by the surrounding base stations.
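A minimal sketch of these cost functions and of the CRRM's least-cost selection, with alpha = 0.9 as assumed in the text; the RAT labels used as dictionary keys are our own convention.

```python
# Sketch of the marginal cost functions of Section 2 (alpha = 0.9 per the text).

ALPHA = 0.9

def cost_voice(x_v, c_v, rat):
    u = x_v / c_v
    if rat in ("CDMA1x", "WCDMA-alone"):
        return u * u if u < ALPHA else u * u + 1       # congestion penalty near alpha
    return u * u + 1                                   # WCDMA/HSDPA voice

def cost_data(x_d, c_d, rat, x_v=0.0, c_v=1.0):
    if rat == "EvDO":
        return (x_d / c_d) ** 2 + 1
    # WCDMA/HSDPA: data cost grows as voice consumes the shared power budget.
    return (x_d / c_d) ** 2 / (1 - x_v / c_v) ** 2 + 1

def select_cell(candidates):
    """CRRM admits a new station to the accessible cell of least cost."""
    return min(candidates, key=lambda cell: cell["cost"])
```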
3 Performance Analysis
The performance of our CRRM is analyzed using the MANSim simulator that we developed. The network consists of the following cells: four CDMA1x cells, four EvDO cells, and two WCDMA/HSDPA cells (no WCDMA-alone cell is included). The CDMA1x and EvDO cells cover the entire simulation area, while the two WCDMA/HSDPA cells, each overlapping with at least one other WCDMA/HSDPA cell, are deployed in part of the simulation area. The cell capacities are defined as follows: 80 voice calls (equivalently, about 1.15 Mbps) can be accommodated in a single CDMA1x cell; 1.2 Mbps in an EvDO cell; 240 voice calls (equivalently, about 3.45 Mbps) in a WCDMA cell when no HSDPA data traffic is served; and 3.6 Mbps in an HSDPA cell when no voice traffic is served. Each mobile station generates one of five service types of traffic: WAP, VOD streaming, VOD downloading, video conferencing, and voice. Because of the space limit, only the simulation results for network-based handover decision are shown.
Fig. 1. (a) Cell throughput (kbps) over time (seconds) for each RAT (CDMA1x, EvDO, WCDMA, HSDPA), shown against the voice/data inputs and capacities; (b) weighted call drop (kbps) over time for voice and data traffic, with and without CRRM.
Figure 1(a) shows the cell throughput in each RAT, and Figure 1(b) shows the weighted call drop with and without our CRRM applied, where the weighted call drop denotes the total amount of bandwidth requested by the mobile stations dropped from the network.
References
1. Prytz, M., Karlsson, P., Cedervall, C., Bria, A., Karla, I.: Infrastructure cost benefits of ambient networks multi-radio access. In: Proc. IEEE Vehicular Technology Conference (March 2006)
2. Gustafsson, E., Jonsson, A.: Always best connected. IEEE Wireless Communications 10 (February 2003) 49–55
3. Niebert, N., et al.: Infrastructure cost benefits of ambient networks multi-radio access internetworking. In: Proc. IEEE Vehicular Technology Conference - Spring (May 2005)
4. Berggren, F., et al.: Multi-radio resource management for ambient networks. In: Proc. of the IEEE 16th International Symposium on Personal, Indoor, and Mobile Radio Communications (Sep 2005)
Author Index
Agarwal, Niraj 415 Akan, Özgür B. 558 Al Hanbali, Ahmad 191 Amaldi, Edoardo 287 Ambühl, Christoph 855 Amores-López, Diego 1108 Argibay-Losada, Pablo Jesús 970 Armagan, Önsel 239 Armenia, Sergio 215 Atakan, Barış 558
Bai, Rendong 1145 Baiamonte, Valeria 356 Bali, Soshant 1157 Bandyopadhyay, Subir 143 Banerjee, Anirban 582, 1096 Bari, Ataul 143 Barlet-Ros, Pere 1108 Baudoin, Cédric 511 Baynat, Bruno 924 Becker, Monique 1149 Beylot, André-Luc 511 Bhuyan, Laxmi 1096 Bi, Jun 902 Biersack, Ernst W. 594 Blair, Dana 1014 Bohnert, Thomas Michael 664 Boice, Jay 155 Bos, Herbert 945 Bournelle, Julien 345 Brehon, Yannick 703 Bromage, Matt 1132 Bruneel, Herwig 1196 Cabellos-Aparicio, Albert 333 Calle Ortega, Eusebi 786 Capone, Antonio 287 Carlsson, Niklas 570 Carra, Damiano 594 Casaca, Augusto 703 Cesana, Matteo 287 Cha, Eun-Chul 1183 Chan, Mo-Che 1037 ChanneGowda, Divya 415
Chatzidrossos, Ilias 617 Chen, Gen-Huey 546 Chen, Ling-Jyh 1060 Cheng, Liang 462 Cheung, D. 1187 Chiang, Wen-Hui 1060 Chiu, Dah Ming 311 Choi, Hyeong-Ah 1245 Choi, Hyeong In 1245 Choi, Hyoung-Kee 1183 Choi, Sunghyun 37, 1179 Choi, Youngkyu 37 Choo, Hyunseung 380, 391, 1217 Chou, Cheng-Fu 1060 Chu, Rui 1237 Chung, Min Young 1217 Chydzinski, Andrzej 879 Claffy, K.C. 738 Combes, Jean-Michel 345 Cristea, Mihai-Lucian 945 Cruz, Rene 403 Cuéllar, Leticia 855 Cui, Jun-Hong 108, 227 Dán, György 617 Das, Sajal K. 475, 1026 de Souza e Silva, Edmundo 1084 Delye de Clauzade de Mazieux, Alexandre 1149 Dhaou, Riadh 511 Domingo-Pascual, Jordi 333 Dong, Qi 251 Dong, Yan 1165 Donnet, Benoit 738 Dovrolis, Constantine 628, 1014 Dvir, Amit 13 Dzida, Mateusz 1241 Dziong, Zbigniew 691 Eager, Derek L. 73, 570 Easton, Todd 714 Eidenbenz, Stephan 855 Elhanany, Itamar 797 Erçetin, Özgür 239, 263
Faloutsos, Michalis 582, 1096
Feng, Jie 73
Fernández-Veiga, Manuel 970
Fiems, Dieter 1196
Firoiu, Victor 179
Fodor, Gábor 488
Fodor, Viktória 617
Fraboul, Christian 1229
Fraiwan, Mohammad 1120
François, Jean-Marc 167, 322
Friedman, Timur 738
Fumagalli, Andrea 415
Ganjali, Yashar 867
Gao, Ruomei 1014
Garcia-Luna-Aceves, J.J. 61, 155
Gazo-Cervero, Alfonso 808, 1225
Gerla, Mario 1005
Ghaderi, Majid 403
Gitelman, Boris 427
Gommans, Leon 945
González-Sánchez, José Luis 808, 1225
Goswami, Abhishek 299
Guérin, Roch 500
Güneş, Mesut 97
Guo, Jun 726
Guo, Song 1153
Guo, Zheng 227
Han, Kwanghun 37
Han, Sunyoung 1233
Harfoush, Khaled 1072
Hartmann, Matthias 749
Heidari, Fariba 832
Heinzelman, Wendi 1140
Hengartner, Nicolas 855
Herpel, Thomas 522
Hu, Shun-Yun 1037
Huffaker, Bradley 738
Hwang, Junho 1221
Iannaccone, Gianluca 356, 1108
Ihm, Jong-Tae 1245
Jaekel, Arunita 143
Jain, Manish 628
Jeong, Yeonkwon 1161
Jha, Sanjay 726
Jiang, Jehn-Ruey 1037
Jin, Fanchun 1245
Joo, Bokgyu 1233
Kamalakis, Thomas 935
Kamel, Mina 714
Kang, Chul-Hee 368
Kannan, Lakshmi Narasimhan 415
Kant, K. 1187
Katsianis, Dimitris 935
Kaya, Selim 263
Kherani, Arzad A. 191
Kim, Heemin 1233
Kim, Jae-Hoon 1245
Kim, Jongseok 1179
Kim, Jungrae 380
Kim, Sang-Sik 85
Kim, Seongkwan 1179
Ko, Young-Bae 1169
Koch, Wolfgang 522
Kofman, Daniel 703
Konorski, Jerzy 1136
Koo, Jahwan 380, 391
Korakis, Thanasis 427
Koucheryavy, Yevgeni 664
Koukoutsidis, Ioannis 439
Krunz, Marwan 120
Kuhns, Fred 1204
Kumara, Soundar 203
Kundu, Sumantra R. 475, 1026
Kwon, Youngwoo 37
Lacan, Jérôme 1229
Larafa, Sondes 345
Laurent-Maknavicius, Maryline 345
Leão, Rosa M.M. 1084
Leduc, Guy 167, 322
Lee, Eun-Jong 368
Lee, Sangho 1217
Lee, Sanghwan 890
Lee, Seokcheon 203
Lee, Tae-Jin 1217
Leung, Victor 1153
Levi, Albert 239, 263
Li, Baochun 678, 1208
Li, Dongsheng 1237
Liang, Ben 275, 535
Liang, Weifa 958
Lim, Hyung-Taig 368
Liu, Donggang 251
Liu, Haiyang 179
Liu, Xin 49
Liu, Yuzhen 958
Lo Cigno, Renato 594
López-García, Cándido 970
Lu, Xicheng 1192, 1237
Lui, John C.S. 311
Ma, Joongsoo 1161
Machiraju, Sridhar 1157
Mahmino, Ali 1229
Mähönen, Petri 25
Makaroff, Dwight 73
Makda, Salik 427
Malucelli, Federico 287
Manimaran, G. 1047, 1120
Mannor, Shie 832
Marfia, Gustavo 1005
Marot, Michel 1149
Martin, Rüdiger 749, 762
Marzo, Jose L. 786
Mason, Lorne G. 832
Matta, Ibrahim 820
Matthews, Brad 797
Mazloom, Amin R. 475
Menth, Michael 749, 762
Michiardi, Pietro 606
Mir, Zeeshan Hameed 1169
Mitra, Prasenjit 203
Mizorov, Vasil 25
Moltchanov, Dmitri 664
Monath, Thomas 935
Monteiro, Edmundo 664
Morabito, Giacomo 215
Morrow, Monique 1014
Mun, Sung-Gon 391
Murthy, Sudheendra 299
Muthuprasanna, M. 1047
Mycek, Mariusz 1241
Na, Jaeeun 1161
Nain, Philippe 191
Neginhal, Mradula 1072
Numanoglu, Tolga 1140
Obraczka, Katia 155, 1132
Oh, Se-Hyun 1245
Ok, Changsoo 203
Pal, Sourav 475, 1026
Palazzi, Claudio 1005
Palazzo, Sergio 215
Panwar, Shivendra 427
Papagiannaki, Konstantina 356
Park, Ae-Soon 85
Pau, Giovanni 1005
Peng, Jun 462
Perros, Harry 1072
Phillips, Caleb 1173
Pióro, Michal 691, 1241
Popova, Larissa 522
Potts, Donald 1132
Psaras, Ioannis 981
Qiao, Daji 1179
Ramachandran, Krishna 606
Ramasubramanian, Srinivasan 120
Ramaswamy, Venkatesh 855
Ren, Yan 132
Ridoux, Julien 924
Roccetti, Marco 1005
Rocha, Antonio A. de A. 1084
Rodríguez-Rubio, Raúl 970
Rodríguez-Pérez, Francisco J. 808, 1225
Rolland, Chloé 924
Saha, Debanjan 890
Sahu, Sambit 890
Saleh, Aladdin 535
Sanadidi, M.Y. 1005
Sanjuàs-Cuxart, Josep 1108
Savas, Erkay 239, 263
Schuba, Christoph L. 1026
Scoglio, Caterina 714
Segal, Michael 13
Sen, Arunabha 299
Seok, Seung-Joon 368
Shavitt, Yuval 774
Sikdar, Biplab 606
Singer, Yaron 774
Singh, Suresh 1173
Singhal, Mukesh 1145
Siris, Vasilios A. 439
So, Aaron 275
Sohn, JungKyo 1245
Solé-Pareta, Josep 1108
Song, Jungwook 1233
Sphicopoulos, Thomas 935
Spörlein, Ulrich 762
Sridharan, Ashwin 403, 500
Srinivasan, Mukund 890
Srivastava, Jaideep 179
Steyaert, Bart 1196
Suárez-González, Andres 970
Subbaraman, Ramesh 500
Suh, Changsu 1169
Sun, Jin 582
Sun, Jinsheng 844
Tabatabaee, Vahid 797
Tacca, Marco 415
Tao, Zhifeng 427
Tavli, Bulent 1140
Tomaszewski, Artur 1241
Towsley, Don 403
Tran, Con 691
Tsaoussidis, Vassilis 981
Tseng, Yi-Hsien 546
Turner, Jonathan 1204
Udar, N. 1187
Ünlü, Abdülhakim 239
Urra, Anna 786
Varoutas, Dimitris 935
Veeraraghavan, Vilas 640
Vernon, Mary K. 912
Vila, Pere 786
Vilzmann, Robert 25
Viswanathan, R. 1187
Wang, Baosheng 1192
Wang, Bing 227
Wang, Bo 132
Wang, Mei 867
Wang, Wenye 1
Wang, Xin 61
Wang, Yue 311
Wang, Z. 1047
Weber, Birgitta 855
Weber, Steven 640, 652
Wenig, Martin 97
Widmer, Jörg 25
Wilson, Michael 1204
Wu, Chuan 678, 1208
Wu, Eric Hsiao-Kuang 546
Wu, Jianping 902
Wu, S. Felix 1200
Xiao, Nong 1237
Xiaohu, Ge 1165
Xie, Lizhong 902
Xing, Fei 1
Xu, Dan 49
Xu, Li 945
Yang, Oliver 1153
Yang, Xiaowei 992
Yaoting, Zhu 1165
Yilmaz, Selma 820
Yoo, Myungsik 1221
Younis, Ossama 120
Zagożdżon, Michal 1241
Zahran, Ahmed 535
Zan, Lei 992
Zang, Hui 403, 1157
Zegura, Ellen 1014
Zhang, Hongke 132
Zhang, Ke 1200
Zhang, Sidong 132
Zhang, Su 912
Zhang, Yiming 1237
Zhang, Zhi-Li 179, 890
Zhao, Feng 1192
Zhou, Shengli 108
Zhou, Zhong 108
Zhu, Hao 450
Zhu, Peidong 1192
Zimmermann, Alexander 97
Zukerman, Moshe 844
Widmer, J¨ org 25 Wilson, Michael 1204 Wu, Chuan 678, 1208 Wu, Eric Hsiao-Kuang 546 Wu, Jianping 902 Wu, S. Felix 1200 Xiao, Nong 1237 Xiaohu, Ge 1165 Xie, Lizhong 902 Xing, Fei 1 Xu, Dan 49 Xu, Li 945 Yang, Oliver 1153 Yang, Xiaowei 992 Yaoting, Zhu 1165 Yilmaz, Selma 820 Yoo, Myungsik 1221 Younis, Ossama 120 Zago˙zd˙zon, Michal 1241 Zahran, Ahmed 535 Zan, Lei 992 Zang, Hui 403, 1157 Zegura, Ellen 1014 Zhang, Hongke 132 Zhang, Ke 1200 Zhang, Sidong 132 Zhang, Su 912 Zhang, Yiming 1237 Zhang, Zhi-Li 179, 890 Zhao, Feng 1192 Zhou, Shengli 108 Zhou, Zhong 108 Zhu, Hao 450 Zhu, Peidong 1192 Zimmermann, Alexander 97 Zukerman, Moshe 844