Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering

Editorial Board
Ozgur Akan, Middle East Technical University, Ankara, Turkey
Paolo Bellavista, University of Bologna, Italy
Jiannong Cao, Hong Kong Polytechnic University, Hong Kong
Falko Dressler, University of Erlangen, Germany
Domenico Ferrari, Università Cattolica Piacenza, Italy
Mario Gerla, UCLA, USA
Hisashi Kobayashi, Princeton University, USA
Sergio Palazzo, University of Catania, Italy
Sartaj Sahni, University of Florida, USA
Xuemin (Sherman) Shen, University of Waterloo, Canada
Mircea Stan, University of Virginia, USA
Jia Xiaohua, City University of Hong Kong, Hong Kong
Albert Zomaya, University of Sydney, Australia
Geoffrey Coulson, Lancaster University, UK
63
Róbert Szabó Hua Zhu Sándor Imre Ranganai Chaparadza (Eds.)
Access Networks 5th International ICST Conference on Access Networks, AccessNets 2010 and First ICST International Workshop on Autonomic Networking and Self-Management in Access Networks, SELFMAGICNETS 2010 Budapest, Hungary, November 3-5, 2010 Revised Selected Papers
Volume Editors

Róbert Szabó
Department of Telecommunications and Media Informatics
Budapest University of Technology and Economics
1117 Budapest, Hungary
E-mail: [email protected]

Hua Zhu
San Diego, CA 92121, USA
E-mail: [email protected]

Sándor Imre
Department of Telecommunications
Budapest University of Technology and Economics
1117 Budapest, Hungary
E-mail: [email protected]

Ranganai Chaparadza
MOTION Department, Fraunhofer FOKUS
10589 Berlin, Germany
E-mail: [email protected]
ISSN 1867-8211, e-ISSN 1867-822X
ISBN 978-3-642-20930-7, e-ISBN 978-3-642-20931-4
DOI 10.1007/978-3-642-20931-4
Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011926505 CR Subject Classification (1998): C.2, D.4.4, D.4.6, K.4.4, K.6.5
© ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The fifth edition of the annual International ICST Conference on Access Networks (AccessNets 2010) was organized to provide a forum that brings together scientists and researchers from academia as well as managers and engineers from industry and government organizations to meet and exchange ideas and recent work on all aspects of access networks and how they integrate with their in-house counterparts. This year’s conference was held in Budapest, Hungary, during November 3–5, 2010. It was sponsored by ICST and co-organized by the Budapest University of Technology and Economics, Hungary, and the Scientific Association for Infocommunications, Hungary. This year’s main conference focused on next-generation wireless and wired broadband networks, sensor networks, and emerging applications related to access networks. The main conference received 23 submissions from 15 different countries. After a thorough review process, 9 papers were accepted from the open call, one distinguished researcher was invited to contribute an invited paper, and one author was invited to make a post-deadline submission, yielding 11 technical papers altogether. The 11 technical papers were organized into 4 technical sessions. In addition, four posters were allocated to a poster session during the conference. Within the main program of the conference, two keynote speeches addressed hot topics on emerging trends and focus areas for access networks. The first keynote, by Jens Malmodin from Ericsson, addressed the energy and carbon footprint of ICT and media services, and the second keynote, by Peter Szilagyi of Nokia Siemens Networks, addressed self-organizing networks. Co-located with the main conference of AccessNets 2010 was the First International ICST Workshop on Autonomic Networking and Self-Management in Access Networks (SELFMAGICNETS 2010), which complemented the main conference program with focused coverage of theories and technologies of autonomic networking and self-management.
The organizer of the SELFMAGICNETS 2010 workshop was the EC-funded FP7 EFIPSANS IP Project (INFSO-ICT215549). Altogether ten peer-reviewed technical papers and a keynote address were presented at the workshop, out of which five were outside the EFIPSANS project. We would like to take this opportunity to express our thanks to the technical and financial sponsors of AccessNets 2010, to the Chairs and members of the Technical Program Committee and to all members of the Organizing Committee. November 2010
Róbert Szabó
Organization
Steering Committee Imrich Chlamtac (Chair) Jun Zheng Nirwan Ansari
Create-Net Research, Italy Southeast University, China New Jersey Institute of Technology, USA
Conference General Chair Gyula Sallai
Scientific Association for Infocommunications, Hungary
Conference Vice Chair R´obert Szab´o
Budapest University of Technology and Economics, Hungary
Technical Program Co-chairs Hua Zhu Sándor Imre
ArgonST, Network Systems, USA Budapest University of Technology and Economics, Hungary
Local Arrangements Chair Péter Nagy
Scientific Association for Infocommunications, Hungary
Publication Chair Andrei Gurtov
Aalto University and University of Oulu, Finland
Publicity Chair Rolland Vida
Scientific Association for Infocommunications, Hungary
Web Chair Attila Vidács
Budapest University of Technology and Economics, Hungary
Conference Coordinator Edit Marosi
ICST
Technical Program Committee Gee-Kung Chang Tamer ElBatt Maurice Gagnaire Erol Gelenbe Paolo Giacomazzi Victor Govindaswamy Kaibin Huang Raj Jain David K Hunter Ken Kerpez Dusan Kocur Polychronis Koutsakis Sunil Kumar Chang-Hee Lee Ming Li Kejie Lu Xun Luo Victor C.M. Leung Maode Ma Martin Maier John Mitchell Sagar Naik Timo Ojala Garret Okamoto Nikos Passas Przemyslaw Pawelczak Martin Reisslein Djamel Sadok Mehmet Safak Gangxiang Shen Driton Statovci Gaoxi Xiao SiQing Zheng
Georgia Institute of Technology, USA Nile University in Cairo, Egypt ENST (TELECOM ParisTech), France Imperial College London, UK Politecnico di Milano, Italy Texas A&M University - Texarkana, USA Yonsei University, Korea Washington University in St. Louis, USA University of Essex, UK Telcordia Technologies, USA Technical University of Kosice, Slovak Republic Technical University of Crete, Greece San Diego State University, USA KAIST, Korea California State University, Fresno, USA University of Puerto Rico at Mayaguez, Puerto Rico Qualcomm Inc., USA The University of British Columbia, Canada Nanyang Technological University, Singapore Institut National de la Recherche Scientifique (INRS), Canada University College London, UK University of Waterloo, Canada University of Oulu, Finland Adaptive Communications Research Inc., USA University of Athens, Greece University of California, Los Angeles, USA Arizona State University, USA Federal University of Pernambuco (UFPE), Brazil Hacettepe University, Turkey Ciena Corporation, USA Telecommunications Research Center Vienna, Austria Nanyang Technological University, Singapore University of Texas at Dallas, USA
SELFMAGICNETS 2010 Committee General Chair Ranganai Chaparadza
Fraunhofer FOKUS, Germany
Technical Program Committee Domonkos Asztalos Péter Benkö Ranganai Chaparadza Nikolaos Chatzis Moiso Corrado Juan Manuel Gonzales Muñoz Timotheos Kastrinogiannis Slawomir Kuklinski Yuhong Li Jose Antonio Lozano Lopez Symeon Papavassiliou Said Soulhi
Ericsson Hungary, Hungary Ericsson Hungary, Hungary Fraunhofer FOKUS, Germany Fraunhofer FOKUS, Germany Telecom Italia, Italy Telefonica TID, Spain ICCS, Greece Warsaw University of Technology, Poland BUPT, China Telefonica TID, Spain ICCS, Greece Ericsson Canada, Canada
Table of Contents
ACCESSNETS 2010 – Technical Session 1: Next Generation Wireless Networks

Improving TCP-Friendliness for mHIP . . . . . . . . . . 3
Tatiana Polishchuk and Andrei Gurtov

Automatic Base Station Deployment Algorithm in Next Generation Cellular Networks . . . . . . . . . . 18
István Törős and Péter Fazekas

A Fast and Simple Scheme for Mobile Station-Controlled Handover in Mobile WiMAX . . . . . . . . . . 32
Sayan Kumar Ray, Swapan Kumar Ray, Krzysztof Pawlikowski, Allan McInnes, and Harsha Sirisena
ACCESSNETS 2010 – Technical Session 2: Emerging Applications

Modeling the Content Popularity Evolution in Video-on-Demand Systems . . . . . . . . . . 47
Attila Kőrösi, Balázs Székely, and Miklós Máté

Sizing of xDR Processing Systems . . . . . . . . . . 62
Bálint Ary and Sándor Imre

Modeling Self-organized Application Spreading . . . . . . . . . . 71
Ádám Horváth and Károly Farkas
ACCESSNETS 2010 – Technical Session 3: Next Generation Wired Broadband Networks

Passive Access Capacity Estimation through the Analysis of Packet Bursts . . . . . . . . . . 83
Martino Fornasa and Massimo Maresca

A Minimum BER Loading Algorithm for OFDM in Access Power Line Communications . . . . . . . . . . 100
Linyu Wang, Geert Deconinck, and Emmanuel Van Lil
ACCESSNETS 2010 – Technical Session 4: Sensor Networks

Self-repairing Clusters for Time-Efficient and Scalable Actor-Fault-Tolerance in Wireless Sensor and Actor Networks . . . . . . . . . . 113
Loucif Amirouche, Djamel Djenouri, and Nadjib Badache

ACCESSNETS 2010 – Invited Talk

Bit-Error Analysis in WiFi Networks Based on Real Measurements . . . . . . . . . . 127
Gábor Fehér
ACCESSNETS 2010 – Poster Session

Data-Rate and Queuing Method Optimization for Internetworking Medical Applications . . . . . . . . . . 141
Radek Dolezel, Otto Dostal, Jiri Hosek, Karol Molnar, and Lukas Rucka

Shared Wavelength Assignment Algorithm in Multi-profile WDM-EPONs to Support Upstream Bandwidth Guarantees . . . . . . . . . . 153
Noemí Merayo, Patricia Fernández, Ramón J. Durán, Tamara Jiménez, Ignacio de Miguel, Juan C. Aguado, Rubén M. Lorenzo, and Evaristo J. Abril

Towards Sustainable Broadband Communication in Rural Areas . . . . . . . . . . 168
Amos Nungu and Björn Pehrson

Characterization of BitTorrent Traffic in a Broadband Access Network . . . . . . . . . . 176
Zoltán Móczár and Sándor Molnár
SELFMAGICNETS 2010

Remediating Anomalous Traffic Behaviour in Future Networked Environments . . . . . . . . . . 187
Angelos K. Marnerides, Matthew Jakeman, David Hutchison, and Dimitrios P. Pezaros

IPv6 and Extended IPv6 (IPv6++) Features That Enable Autonomic Network Setup and Operation . . . . . . . . . . 198
Ranganai Chaparadza, Razvan Petre, Arun Prakash, Felicián Németh, Slawomir Kukliński, and Alexej Starschenko

A Min-Max Hop-Count Based Self-discovering Method of a Bootstrap Router for the Bootstrap Mechanism in Multicast Routing . . . . . . . . . . 214
Toshinori Takabatake
ALPHA: Proposal of Mapping QoS Parameters between UPnP Home Network and GMPLS Access . . . . . . . . . . 226
Lukasz Brewka, Pontus Sköldström, Anders Gavler, Viktor Nordell, Henrik Wessing, and Lars Dittmann
Methodology towards Integrating Scenarios and Testbeds for Demonstrating Autonomic/Self-managing Networks and Behaviors Required in Future Networks . . . . . . . . . . 240
Vassilios Kaldanis, Peter Benko, Domonkos Asztalos, Csaba Simon, Ranganai Chaparadza, and Giannis Katsaros

How Autonomic Fault-Management Can Address Current Challenges in Fault-Management Faced in IT and Telecommunication Networks . . . . . . . . . . 253
Ranganai Chaparadza, Nikolay Tcholtchev, and Vassilios Kaldanis

Efficient Data Aggregation and Management in Integrated Network Control Environments . . . . . . . . . . 269
Patrick-Benjamin Bök, Michael Patalas, Dennis Pielken, and York Tüchelmann

On Self-healing Based on Collaborating End-Systems, Access, Edge and Core Network Components . . . . . . . . . . 283
Nikolay Tcholtchev and Ranganai Chaparadza

Priority Based Delivery of PR-SCTP Messages in a Syslog Context . . . . . . . . . . 299
Mohammad Rajiullah, Anna Brunstrom, and Stefan Lindskog
Auto-discovery and Auto-configuration of Routers in an Autonomic Network . . . . . . . . . . 311
Arun Prakash, Alexej Starschenko, and Ranganai Chaparadza

Author Index . . . . . . . . . . 325
ACCESSNETS 2010
Technical Session 1: Next Generation Wireless Networks
Improving TCP-Friendliness for mHIP

Tatiana Polishchuk¹ and Andrei Gurtov¹,²

¹ Helsinki Institute for Information Technology HIIT, P.O. Box 19800, 00076 Aalto, Finland, http://www.hiit.fi
² Centre for Wireless Communications, Oulu, Finland
Abstract. Multihomed environments are becoming increasingly common, especially for mobile users. mHIP was designed to provide secure multipath data transmission for multihomed hosts and to boost the throughput of a single TCP connection by effectively distributing data over multiple available paths. In this paper we develop a TCP-friendly congestion control scheme for the mHIP secure multipath scheduling solution. We enable two-level control over the aggressiveness of the multipath flows to prevent them from stealing bandwidth from traditional transport connections in a shared bottleneck. We demonstrate how to achieve a desired level of friendliness at the expense of an inessential performance degradation. A series of simulations verifies that the proposed congestion control for mHIP meets the criteria of TCP-compatibility, TCP-equivalence and TCP-equal share, preserving friendliness to UDP and other mHIP traffic. Keywords: Internet, HIP, multipath routing, TCP-friendliness, goodput.
1 Introduction
Multipath data transfer is a promising technique for enhancing the reliability of Internet connections. New mobile devices and laptops are equipped with several network interfaces (e.g., WLAN, GPRS, 3G) and have multiple links to the Internet, which results in the availability of multiple paths between source and destination end hosts. TCP [20] comprises a major share of the total Internet traffic. Among its other management tasks, TCP controls segment size, the rate at which data is exchanged, and network traffic congestion [21]. However, a traditional TCP flow is constrained to use only one path per connection between two communicating hosts. There are efforts within the networking community to overcome this limitation. Most of these efforts rely on mechanisms that aggressively compete for network resources. Naive designs and implementations risk substantial unfairness to well-behaved TCP flows. Proper per-flow congestion control is required to limit the aggressiveness of the proposed multipath solutions. Other multipath communication methods, proposed to efficiently utilize multiple access links, are unable to take full advantage of the available multipath bandwidth because they do not properly consider the end-to-end delay of packet transmission.
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 3–17, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
Out-of-order data arrivals at a receiver cause unpredictable underutilization of spare network capacity. Packet reordering and non-congestion packet loss can significantly degrade TCP performance. TCP-friendliness has emerged as a measure of correctness in Internet congestion control. The notion of TCP-friendliness was introduced to restrict non-TCP flows from exceeding the bandwidth of a conforming TCP running under comparable conditions. Protocols commonly meet this requirement by using some form of AIMD (Additive Increase Multiplicative Decrease) congestion window management, or by computing a transmission rate based on equations derived from the AIMD model. In prior work [8] we proposed a multipath HIP solution, which combines the advantages of HIP's advanced security with the benefits of multipath routing. mHIP uses multiple parallel flows simultaneously in order to boost the throughput of the TCP connection. The multipath scheduler takes into account rapidly changing parameters of the available paths, including the TCP queuing delay at the sender and the network delay, and sends each data packet through the path with the earliest estimated time of arrival. Simple congestion control measures were suggested to provide reliable multipath data delivery. In this paper we study the TCP-friendliness of the HIP multipath design with respect to coexisting connections. The contributions of this work include the development of a two-level congestion control concept for reliable multipath data transmission, and methods of tuning the aggressiveness of individual flows from the multipath bundle in order to provide a desirable level of TCP-friendliness while avoiding significant performance degradation. The rest of the paper is organized as follows. Section 2 summarizes the related work. Preliminaries are presented in Section 3 and contain a review of multipath HIP simple congestion control and definitions of TCP-friendliness.
Section 4 presents the step-by-step work done to enable TCP-friendly congestion control for mHIP. Conclusions and future work are given in Section 5.
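The AIMD window management mentioned above can be made concrete with a minimal sketch. This is an illustration of the canonical TCP-style update rules, not the mHIP implementation; the function name and parameters are hypothetical:

```python
def aimd_update(cwnd, ssthresh, event, mss=1.0):
    """One step of TCP-style AIMD congestion window management.

    Additive increase: below ssthresh the window grows by one MSS per
    ACK (slow start); above it, by roughly one MSS per RTT (congestion
    avoidance). Multiplicative decrease: the window is halved on loss.
    """
    if event == "ack":
        if cwnd < ssthresh:
            cwnd += mss                 # slow start: +1 MSS per ACK
        else:
            cwnd += mss * mss / cwnd    # congestion avoidance: ~+1 MSS per RTT
    elif event == "loss":
        ssthresh = max(cwnd / 2.0, 2 * mss)
        cwnd = ssthresh                 # multiplicative decrease
    return cwnd, ssthresh
```

It is this halving-on-loss behavior that a TCP-friendly multipath scheme must not undercut: a bundle that keeps sending at full rate while competing TCP flows back off will steal bandwidth from them.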
2 Related Work
Although multiple multipath solutions for multihomed hosts have recently emerged, multipath routing is not yet widely deployed in practice. Researchers study the advantages of implementing it on different layers of the TCP/IP stack. Transport layer solutions, such as SCTP [13], MPTCP [6] and TCP-MH [15], can naturally obtain the most recent information on the quality of different paths and detect congestion situations in a timely manner. For example, SCTP can perform measurements across several paths simultaneously, and then map flows onto one path or another. Network layer approaches ([2], [5]) are easy to deploy and totally transparent to applications, and involve only minimal changes, in contrast to application and transport layer solutions, which require many changes to the infrastructure. Wedge-layer approaches, implemented in HIP [7], LIN6 [11] and MIP6 [14], provide multiaddressing support in a functional layer between IP and transport.
They have the advantage of being able to maintain multiaddressing information across transport associations. The transport activity between two endpoints may well be able to use multiaddressing immediately and with no further administrative overhead. Moreover, edge-based locator exchange protocols can be incorporated without necessitating modifications to any host's IP or transport modules, which makes them the best choice for providing multihoming and multipath functionality to legacy Internet applications and transport protocols. There is an effort in the community to create new methods which utilize spare network capacity effectively and in a TCP-friendly manner. In [9] the authors created a parallel multipath TCP solution, which controls data transmission over coordinated multiple TCP connections. They stressed the importance of TCP-friendliness for multipath schemes and suggested a way to find a balance between effectiveness and fairness. Their work provided the motivation to design TCP-friendly congestion control over multipath flows inside one TCP connection. When data packets are sent over several paths inside one connection they can experience different end-to-end delays and arrive out of order. In the case of TCP traffic, packet reordering causes significant performance degradation. The authors of [16] surveyed and analyzed relevant techniques for coping with multipath TCP packet reordering. They conclude that there exists no one-fits-all solution to the problem of packet reordering for multipath TCP. Building on the methods of [3], [4], [17] and [24], we suggest an improvement to multipath HIP that reduces the level of reordering at the receiver and significantly improves the TCP-friendliness of our scheme. According to the resource pooling principle [23], when several subflows of one connection share a bottleneck, their resource consumption adds up.
Multipath connections with a large number of TCP-friendly subflows can compete unfairly against a smaller number of regular TCP connections. Each subflow is as aggressive as a single TCP, and a bundle of n TCP-friendly subflows will hence use an approximately n times greater share of the bottleneck resource than it should. A TCP-fair multipath connection should displace no more TCP traffic than a traditional TCP stream would displace. A number of methods [9], [18], [10] were proposed to study and solve the TCP-fairness problem. Although the current implementation of mHIP was not intended to achieve the TCP-fairness criterion, the two-level congestion control scheme proposed further in this paper provides TCP-fairness of mHIP by default, and preliminary experiments with competing mHIP flows inside one connection confirmed this assumption. mHIP multipath scheduling assumes the paths are bottleneck-disjoint. This automatically liberates us from the necessity of proving TCP-fairness of our solution, since multiple flows of a single multipath HIP connection never share the same bottleneck link. When bottleneck independence of the paths cannot be guaranteed, a coupled congestion control for congestion management [18] was recently suggested by the MPTCP working group. The complexity and effects of applying such measures are out of the scope of this paper.
3 Preliminaries

3.1 TCP-Friendliness Definitions
TCP-friendliness is a generic term describing a scheme that aims to use no more bandwidth than TCP uses. In this paper we study mHIP congestion control in view of the criteria proposed in [22]:

A TCP-compatible flow, in the steady state, should use no more bandwidth than a TCP flow under comparable conditions, such as packet-loss rate and round-trip time (RTT). However, a TCP-compatible congestion control scheme is not preferred if it always offers far lower throughput than a TCP flow.

A TCP-equivalent scheme merely ensures the same throughput as TCP when they experience identical network conditions. Although a TCP-equivalent scheme consumes TCP-equivalent bandwidth when working by itself, it may not coexist well with TCP in the Internet.

TCP-equal share is a more realistic but more challenging criterion than TCP-equivalence and states that a flow should have the same throughput as TCP if competing with TCP for the same bottleneck. A TCP-equivalent flow may not be TCP-equal share, but the opposite is always true.

To be able to meet all three criteria, a TCP-friendly scheme should use the same bandwidth as TCP in the steady-state region, while being aggressive enough to capture the available bandwidth and responsive enough to protect itself from congestion as the packet-loss conditions in the paths change in the transient state. Aggressiveness of a scheme describes how the scheme increases the throughput of a flow before encountering the next packet loss, while responsiveness describes how the scheme decreases the throughput of a flow when the packet-loss condition becomes severe.

In what follows we will examine the ability of our multipath solution to adhere to the proposed definitions of TCP-friendliness. We evaluate its performance using the factor of friendliness FF(flow) = T(flow)/T(TCP) as the measure. Here T(·) denotes the average flow throughput in Mbps. FF = 1 indicates that the solution satisfies the strongest TCP-equal share criterion, while a solution with FF > 1 is more aggressive than a typical TCP and one with FF < 1 may not be TCP-compatible.

3.2 Review of Multipath HIP with Simple Congestion Control
In prior research [8], HIP multipath scheduling showed the potential to aggregate about 99% of the sum of the individual paths' bandwidth. Simple congestion detection and avoidance are able to keep the sending rate of the multipath traffic from degrading significantly under congestion in the paths. Before we start evaluating the mHIP congestion control scheme against the TCP-friendliness criteria, we recall how it operates.
1. Connection establishment
During the base exchange, HIP obtains information about the number of available interfaces on both communicating hosts and the number of available paths, with initial parameters such as available bandwidth and propagation delay.

2. Updating parameters of the paths
mHIP uses HIP signaling packets for path probing. The frequency of heartbeats can vary depending on the particular setup.

3. Sending data
The HIP multipath scheduler optimally splits data among the paths according to their capacities. The details of the scheduling algorithm are provided in [8]. mHIP stores packet-to-path assignments at the sender and also in the ESP packet headers, which are used according to the HIP standard [12]. The SPI number specified in the packet header corresponds to the path assigned to deliver that particular packet.

4. Congestion control
Marking and multipath congestion avoidance techniques provide simple congestion control for mHIP. One packet per round-trip time is marked on departure to each path. The expected delivery time of the marked packet is stored at the sender and then compared to its actual arrival time upon receipt of the corresponding ACK. If the estimated delivery time and the actual arrival time of the marked packet are noticeably different, the scheduler considers the path congested. The multipath congestion avoidance technique specifies two indicators of path congestion:
– Case 1: the standard TCP dupack action, when the sender retransmits a packet after the receipt of three duplicate acknowledgments from the receiver;
– Case 2: the observed delivery time of the marked packet exceeds its corresponding expected delivery time by more than some preset value.
If either indicator suggests congestion, the path is temporarily closed and the packets are redirected to the other available paths. mHIP sends regular probes to the congested path to detect when the path again becomes free for reliable data transmission.

5. Assumptions and limitations
Our approach corresponds to the class of disjoint multipath routing [19]. The paths are restricted to have independent bottlenecks. The scheduler resides at the sender side; no information from the receiver is available other than the TCP acknowledgments (ACKs) received by the sender. At least one available path should be uncongested at any given point of time.
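The two congestion indicators of step 4 can be sketched as a simple predicate. This is an illustrative simplification with hypothetical names; the dupack limit and the delay slack threshold are assumptions, not values taken from the mHIP implementation:

```python
def path_congested(dupacks, marked_expected, marked_actual,
                   dupack_limit=3, delay_slack=0.2):
    """Return True if either mHIP congestion indicator fires.

    Case 1: a preset number of duplicate ACKs (3 for standard TCP)
            was observed for a packet sent on this path.
    Case 2: the marked packet's observed delivery time exceeds its
            expected delivery time by more than a preset slack
            (here delay_slack, in seconds).
    """
    if dupacks >= dupack_limit:
        return True                      # Case 1: dupack action
    if marked_actual - marked_expected > delay_slack:
        return True                      # Case 2: marked-packet delay
    return False
```

When the predicate fires, the scheduler would temporarily close the path, redirect its packets to the remaining paths, and keep probing the closed path until it can be reopened.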
4 Improving mHIP Step by Step
Next we examine mHIP congestion control in view of the TCP-friendliness criteria. We analyze the reasons why multipath flows do not always share the available bandwidth fairly with TCP, and propose methods to improve the TCP-friendliness of our multipath solution.

4.1 Experimental Evaluation of mHIP with Simple Congestion Control
All simulations presented in this work were run using the ns-2 network simulator [1]. A new protocol agent was implemented on the basis of TCP New Reno to handle the multipath flow controlled by HIP. The existing TCP and UDP modules were also used to simulate external cross-traffic competing with the HIP multipath flows for bottleneck bandwidth. Consider the simulation model shown in Figure 1. A TCP traffic flow, controlled by multipath HIP, is sent from n0 to n1 over two available paths: Path1 = n0–n2–n1 and Path2 = n0–n3–n1, with bandwidths of 8 Mbps and 4 Mbps respectively. Since the multipath scheduler distributes the traffic according to the bandwidth-delay product of the paths, for simplicity the propagation delay is fixed to be the same for all links and equals 30 ms. mHIP calculates the end-to-end propagation delays in the paths, which can consist of any number of connected links and intermediate nodes. Node n4 is used to construct the path n2–n1–n4, which accommodates a standard TCP New Reno flow competing against one flow from the mHIP bundle for the bottleneck link n2–n1. A DropTail queue was used to manage the bottleneck link; its size is 1.5 times the bandwidth-delay product of the link. The packet size in each flow is 1250 bytes. The simulation runs for 20 seconds, which we believe is sufficient to reflect the difference between the proposed congestion control solutions. Appropriate Rwin values were used at the receivers to allow maximum throughput of the flows.

Fig. 1. 2-path simulation model

We begin our first experiment with an empty network and then allocate the multipath HIP subflows to the two end-to-end paths. At the same time we start sending TCP traffic from n2 to n4, which will compete with the mHIP flow in the bottleneck link n2–n1. To simulate variable network conditions we also introduce cross-traffic to Path2: a 4 Mbps UDP flow was scheduled between 5 and 11 seconds of the simulation run, triggering a congestion situation in Path2. Figure 2 shows the mHIP and TCP New Reno flow throughputs, averaged over 0.1 s. As one can clearly conclude from the chart, the flows do not share the bottleneck bandwidth fairly. mHIP (dotted curve) occupies more bandwidth, with an average of T(mHIP1) = 3.98 Mbps, while TCP takes just T(TCP) = 3.56 Mbps, resulting in a friendliness factor FF = T(mHIP1)/T(TCP) = 1.11.
Fig. 2. mHIP flow competes with TCP New Reno flow in Path1
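The friendliness factor reported above is just a ratio of the two average throughputs, which the following trivial sketch (hypothetical function name) makes explicit:

```python
def friendliness_factor(t_flow_mbps, t_tcp_mbps):
    """FF = T(flow) / T(TCP): FF = 1 meets TCP-equal share,
    FF > 1 means the flow is more aggressive than TCP."""
    return t_flow_mbps / t_tcp_mbps

# Throughputs measured in the experiment (Mbps). With these rounded
# values the ratio is about 1.12; the paper reports 1.11, presumably
# computed from the unrounded measurements.
ff = friendliness_factor(3.98, 3.56)
```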
Let us try to understand why mHIP starts starving the TCP flow during this particular time period. In the beginning of the simulation run the mHIP and TCP flows share the bandwidth mostly fairly. At some point after 5 seconds the marking technique reports a congestion situation, resulting from the competition with the UDP cross-traffic in Path2. Let w be the number of packets at the sender, which corresponds to the cwnd value of the global TCP flow controlled by mHIP. The multipath scheduler sends w1 packets to Path1 and w2 packets to Path2, in shares corresponding to the path characteristics, with the total w1 + w2 = w. According to the congestion avoidance scheme, Path2 is closed and all the traffic from the congested Path2 is rerouted to Path1, meaning that at this same time Path1 receives not only its own share w1 but also the extra w2 packets. In this region mHIP dominates and steals bandwidth from the competing TCP transport transmission in the bottleneck link n2–n1. The proposed congestion control method is clearly more aggressive than the AIMD policy of a typical TCP.
T. Polishchuk and A. Gurtov

4.2 Designing TCP-Friendly Congestion Control for mHIP
We want our mHIP connections to coexist with other traffic, providing opportunities for all to progress satisfactorily. To limit the aggressiveness of the flow growth we propose the following two-level congestion control scheme: per-path AIMD plus global TCP stream congestion control on top of it. We also introduce a sender-side buffer to provide better control over the packet sequence in congestion situations.
Fig. 3. Two-level multipath congestion control
The proposed twofold congestion control scheme is illustrated in Figure 3. The global congestion controller coordinates the work of the individual per-path controllers and balances the traffic load between the paths according to their available capacity. If the cwnd capacity of the quickest path is exceeded, the path with the next minimum estimated arrival time is chosen. An important property of the proposed scheme is that the per-path controllers are connected so that the aggregated congestion window is a simple sum of the per-flow congestion windows. The same rule applies to the threshold values. By connecting the per-path congestion control parameters in this way we guarantee that the resulting multipath bundle behaves as a single TCP flow if all subflows are sent to the same path. Below we summarize the proposed updates to the mHIP multipath scheduling design presented in subsection 3.2. Parts 1, 2 and 5 (connection establishment, path parameter updates and assumptions) remain unchanged, while there are some additions to the rest:
3. Sending data. After the per-path congestion control limitations were introduced, the scheduler takes into consideration the current sizes of the per-path congestion windows. If the cwnd capacity of the best path is exceeded, the path with the next minimum estimated arrival time is chosen. If there is no available capacity in any of the paths, the packet is placed into the sender-side buffer until a new ACK arrives.
4. Congestion control. Marking is now removed from the congestion control scheme. Multipath congestion avoidance retains only one congestion indication, the standard TCP dupack event. Upon receipt of a preset number of dupacks (3 for standard TCP), the scheduler determines from which path the packet is missing and halves the cwnd and ssthresh values of the corresponding path. This action reduces the data intake of the congested path and automatically redirects traffic to the other paths that have available capacity. If there is no capacity in any path, the extra data goes to the sender-side buffer. The maximum capacity of the buffer is set to the TCP receiver window size Rwin, making it capable of holding the maximum flight-size number of packets in case of severe congestion situations.
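Steps 3 and 4 above can be sketched as a simplified model (not the actual ns-2 agent; the "estimated arrival time" path choice is reduced here to picking a path with free cwnd and the smallest backlog, an assumption of ours):

```python
class Path:
    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd            # per-path congestion window (packets)
        self.ssthresh = ssthresh
        self.in_flight = 0

class MultipathSender:
    def __init__(self, paths, rwin):
        self.paths = paths
        self.buffer = []            # sender-side buffer, capped at Rwin
        self.rwin = rwin

    def aggregate_cwnd(self):
        # the aggregated window is a simple sum of per-path windows
        return sum(p.cwnd for p in self.paths)

    def send(self, pkt):
        # pick a path with spare cwnd capacity, preferring the least loaded
        free = [p for p in self.paths if p.in_flight < p.cwnd]
        if free:
            min(free, key=lambda p: p.in_flight).in_flight += 1
        elif len(self.buffer) < self.rwin:
            self.buffer.append(pkt)  # no capacity anywhere: buffer it

    def on_dupacks(self, path):
        # 3 dupacks: halve cwnd and ssthresh of the congested path only
        path.cwnd = max(1, path.cwnd // 2)
        path.ssthresh = max(1, path.ssthresh // 2)
```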
4.3 Experimental Evaluation of mHIP with the Updated Congestion Control
To validate the correctness of the proposed congestion control scheme we repeat the experiments with the simulation scenario described in Section 4.1. Again, one of the multipath HIP flows sent to Path1 meets the external TCP flow in the bottleneck link n2−n1, while the other flow sent to Path2 is interrupted by UDP cross-traffic in the link n0−n3. The resulting throughputs of the two flows competing in Path1 are shown in Figure 4. The mHIP average flow throughput is T(mHIP) = 3.56 Mbps and TCP takes about T(TCP) = 3.98 Mbps, resulting in the fairness factor FF = T(mHIP)/T(TCP) = 0.89. Now we observe the opposite extreme: the mHIP flow behaves too leniently and is not able to occupy the available bandwidth effectively. In the following section we analyze the problem and propose a method to solve it.
4.4 Balancing between Aggressiveness and Responsiveness
Competition with the external traffic naturally influences the effectiveness of multipath scheduling. Mistakes in the expected delivery time estimations result in output sequence reordering at the receiver. The TCP sender receives multiple dupacks in response to the reordering, which the mHIP scheduler treats as an indication of congestion. In response to the congestion the mHIP scheduler halves the congestion window of the corresponding path, reducing the aggressiveness of the traffic flow. This precaution can be too strict when the missing sequence numbers are not lost but just slightly delayed in competition with the external flows.
Fig. 4. mHIP flow controlled by the proposed twofold multipath congestion control is suppressed by TCP
Fig. 5. mHIP flow 1 friendly coexists with TCP New Reno flow
To differentiate between reordering signals and actual losses we propose the following modifications to the mHIP congestion control scheme. First, we increase the dupthresh value defining the number of dupacks which serve as an indication of congestion. This method is proposed in the related work [4], [24] as a cure for mild packet reordering. Compared with the default dupthresh of three, the proposed technique improves connection throughput by reducing the number of unnecessary retransmissions. But one should adjust the dupthresh value carefully, since making it too large slows down the reaction of the system to actual losses and can significantly degrade the overall performance in networks with high loss rates. Additionally we introduce a new time variable ADDR (allowable delay due to reordering), which counts how much time has elapsed since the congestion situation in some path was reported. If the missing sequence number arrives successfully during this allowable time period and the corresponding ACK arrives at the sender, the cwnd and ssthresh of the path should be returned to their values prior to the congestion notification. ADDR is chosen to be less than the shortest RTT among the paths used to deliver the multipath flow. This assures accurate differentiation between packets delayed due to reordering and their duplicates retransmitted after the loss was reported. If the original packet arrives, the retransmitted one is naturally disregarded by the receiver.
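The reordering tolerance described above can be sketched as follows. The raised dupthresh value of 5, the event interface and all names are our assumptions for illustration, not the paper's implementation:

```python
class ReorderTolerantPath:
    DUPTHRESH = 5   # raised above the standard 3 (assumed value)

    def __init__(self, cwnd, ssthresh, addr):
        self.cwnd, self.ssthresh = cwnd, ssthresh
        self.addr = addr        # allowable delay due to reordering (s),
                                # chosen below the shortest path RTT
        self.saved = None       # (cwnd, ssthresh, time) before the cutback
        self.dupacks = 0

    def on_dupack(self, now):
        self.dupacks += 1
        if self.dupacks == self.DUPTHRESH:
            self.saved = (self.cwnd, self.ssthresh, now)
            self.cwnd //= 2     # react as if the packet were lost
            self.ssthresh //= 2

    def on_ack_of_missing(self, now):
        # the "lost" packet was only reordered: undo the cutback if its
        # ACK arrived within the ADDR window
        if self.saved and now - self.saved[2] <= self.addr:
            self.cwnd, self.ssthresh, _ = self.saved
        self.saved, self.dupacks = None, 0
```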
4.5 Final Validation
Below we provide the final validation of the effectiveness of the proposed modifications to mHIP congestion control. Again, we repeat the experiment described in Section 4.1 with the latest version of mHIP, with the two-level congestion control scheme and all the proposed modifications applied. Figure 5 illustrates the significant improvement in the TCP-friendliness of the mHIP flow when it competes against TCP for the bottleneck link bandwidth. Finally, both the mHIP and TCP flows are able to achieve comparable average throughputs of T(mHIP1) = 3.80 Mbps and T(TCP) = 3.71 Mbps, with the friendliness factor FF = T(mHIP1)/T(TCP) = 1.02. The competition demonstrated high variation about the average during a short stabilization phase. This unfairness is rather moderate and can be tolerated, as the flows quickly achieve stability and later coexist in a friendly manner.
4.6 UDP-Friendliness
An interesting observation is that the second mHIP flow in Path2 also behaves roughly friendly when competing against the UDP cross-traffic which we used to simulate variable network conditions between 5 and 11 seconds. On this interval mHIP achieves a throughput of T(mHIP2) = 4.20 Mbps. The solid curve in Figure 6 corresponds to the UDP cross-traffic flow, with an average flow throughput of T(UDP) = 3.98 Mbps. The flows fight during a negligible time period and then stabilize to share the bottleneck roughly fairly, with a moderate unfairness of FF = T(mHIP2)/T(UDP) = 1.05.
Fig. 6. mHIP flow 2 competes almost friendly with UDP cross-traffic
4.7 TCP-Compatibility and TCP-Equivalence
According to the definitions, a TCP-compatible flow in the steady state should use no more bandwidth than a TCP flow under comparable conditions, while a TCP-equivalent scheme ensures the same throughput as TCP when they experience identical network conditions. We send mHIP into the empty 2-path network with no cross-traffic to determine how effectively the protocol is able to use spare network capacity in the steady state. Figure 7 shows that the mHIP flow occupies no more available bandwidth than a TCP flow sent to the same path, making it TCP-compatible. Moreover, mHIP achieves the same average flow throughput of 7.8 Mbps as TCP in the steady state and thus meets the criterion of TCP-equivalence.
4.8 Friendliness to the Other mHIP
Another interesting question is whether mHIP competes friendly against other mHIP connections. We ran six multipath HIP connections in a simulated network scenario similar to the one used in the experiments presented above, but now with three parallel paths connecting the common source and destination, with path bandwidths of 8 Mbps, 4 Mbps and 4 Mbps and corresponding propagation delays of 60 ms, 60 ms and 20 ms, which provide some diversity in the network parameters. Figure 8 demonstrates how the total network bandwidth is divided between the six multipath HIP bundles. The comparison shows a tolerable unfairness, with the friendliness factor ranging from 0.92 to 1.08. We conclude that multiple mHIP connections can coexist in the shared multipath network quite friendly to each other.
Fig. 7. Testing TCP-compatibility and equivalence of mHIP
Fig. 8. Six mHIP connections share 3-path network about fair
4.9 The Cost of Friendliness
We achieved the desired level of TCP-friendliness for our multipath HIP solution and would like to evaluate the cost, in terms of performance degradation, paid for this improvement. We calculate the total throughput TT of the traffic flow controlled by multipath HIP. In the experiment where mHIP with the simple congestion control policy demonstrated excessive unfriendliness competing against TCP New Reno, TT(mHIP) = 6.45 Mbps. After we applied the series of modifications to mHIP congestion control, a similar experiment with the TCP-friendly mHIP resulted in TT(mHIP) = 5.30 Mbps, which corresponds to an ~18% performance reduction. A number of experiments with different network conditions confirmed that the desired TCP-friendliness can be achieved at the cost of about 15-20% performance degradation.
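The quoted reduction follows directly from the two total throughputs; a one-line check of the arithmetic:

```python
tt_aggressive = 6.45   # Mbps, mHIP with the simple congestion control
tt_friendly = 5.30     # Mbps, after the TCP-friendly modifications

# relative performance reduction in percent: about 17.8%, i.e. the ~18%
# quoted above
reduction = (tt_aggressive - tt_friendly) / tt_aggressive * 100
```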
5 Conclusions and Future Work
We showed how to tune the aggressiveness of the multipath data transmission controlled by mHIP without losing its responsiveness in competition with cross-traffic. We designed a twofold congestion control scheme and adjusted it to meet the TCP-friendliness definitions. Simulation results verify that the improved congestion control algorithm meets the TCP-compatibility, TCP-equivalence and TCP-equal share criteria under the proposed testing scenarios, and allows mHIP to coexist in a friendly manner with other TCP, UDP and mHIP connections. The work could be extended with a method to dynamically adjust the mHIP congestion control variables and enable adaptivity to random congestion scenarios, including extreme cases. We will continue examining mHIP friendliness in competition with transport protocols other than TCP and compare the results against the alternative multipath proposals.
Acknowledgments. This work was supported in part by TEKES as part of the Future Internet program of the ICT cluster of the Finnish Strategic Centers for Science, Technology and Innovation.
References
1. Network simulator ns-2, http://www.isi.edu/nsnam/ns/ (last checked 15/02/2010)
2. Barre, S., Bonaventure, O.: Shim6 implementation report: LinShim6. Internet draft, draft-barre-shim6-impl-03.txt (September 2009)
3. Bhandarkar, S., Reddy, A.L.N.: TCP-DCR: Making TCP robust to non-congestion events. In: Mitrou, N., Kontovasilis, K.P., Rouskas, G.N., Iliadis, I., Merakos, L.F. (eds.) NETWORKING 2004. LNCS, vol. 3042, pp. 712–724. Springer, Heidelberg (2004)
4. Blanton, E., Allman, M.: On making TCP more robust to packet reordering. ACM Computer Communication Review 32 (2002)
5. Chebrolu, K., Raman, B., Rao, R.R.: A network layer approach to enable TCP over multiple interfaces. Wirel. Netw. 11(5), 637–650 (2005)
6. Ford, A., Raiciu, C., Barre, S., Iyengar, J.: Architectural guidelines for multipath TCP development. Technical report, Internet draft, draft-ietf-mptcp-architecture-01 (June 2010) (work in progress)
7. Gurtov, A.: Host Identity Protocol (HIP): Towards the Secure Mobile Internet. Wiley and Sons, Chichester (2008)
8. Gurtov, A., Polishchuk, T.: Secure multipath transport for legacy Internet applications. In: Proc. of BROADNETS 2009, Madrid, Spain (September 2009)
9. Hacker, T.J., Noble, B.D., Athey, B.D.: Improving throughput and maintaining fairness using parallel TCP. In: IEEE InfoCom (2004)
10. Ishida, T., Ueda, K., Yakoh, T.: Fairness and utilization in multipath network flow optimization. In: Proc. of 2006 IEEE International Conference on Industrial Informatics, pp. 1096–1101 (2006)
11. Ishiyama, M., Kunishi, M., Teraoka, F.: An analysis of mobility handling in LIN6. In: Proc. of International Symposium on Wireless Personal Multimedia Communications (WPMC 2001) (August 2001)
12. Jokela, P., Moskowitz, R., Nikander, P.: Using the Encapsulating Security Payload (ESP) transport format with the Host Identity Protocol (HIP). IETF RFC 5202 (March 2008)
13. Jungmaier, A., Rescorla, E., Tuexen, M.: Transport layer security over Stream Control Transmission Protocol. RFC 3436, IETF (December 2002)
14. Kempf, J., Arkko, J., Nikander, P.: Mobile IPv6 security. Wirel. Pers. Commun. 29(3-4), 389–414 (2004)
15. Kim, K.-H., Shin, K.G.: Improving TCP performance over wireless networks with collaborative multi-homed mobile hosts. In: Proc. of the 3rd Int. Conf. on Mobile Systems, Applications, and Services (MobiSys 2005), pp. 107–120 (June 2005)
16. Leung, K.-C., Li, V.O., Yang, D.: An overview of packet reordering in Transmission Control Protocol (TCP): Problems, solutions, and challenges. IEEE Transactions on Parallel and Distributed Systems 18, 522–535 (2007)
17. Ludwig, R., Katz, R.H.: The Eifel algorithm: making TCP robust against spurious retransmissions. SIGCOMM Comput. Commun. Rev. 30(1), 30–36 (2000)
18. Raiciu, C.: Coupled multipath-aware congestion control (March 2010) (work in progress)
19. Ramasubramanian, S., Krishnamoorthy, H., Krunz, M.: Disjoint multipath routing using colored trees. Comput. Netw. 51(8), 2163–2180 (2007)
20. Stevens, W.R.: TCP/IP illustrated: TCP for transactions, HTTP, NNTP, and the Unix domain protocols, vol. 3. Addison Wesley Longman Publishing Co., Inc., Redwood City (1996)
21. Stevens, W.R.: TCP slow start, congestion avoidance, fast retransmit, and fast recovery algorithms. RFC 2001, IETF (January 1997)
22. Tsao, S.-C., Chiao, N.: Taxonomy and evaluation of TCP-friendly congestion-control schemes on fairness, aggressiveness, and responsiveness. IEEE Network 21(6), 6–15 (2007)
23. Wischik, D., Handley, M., Braun, M.B.: The resource pooling principle. SIGCOMM Comput. Commun. Rev. 38(5), 47–52 (2008)
24. Zhang, M., Karp, B., Floyd, S., Peterson, L.: RR-TCP: A reordering-robust TCP with DSACK. In: Proc. of IEEE ICNP, pp. 95–106 (2003)
Automatic Base Station Deployment Algorithm in Next Generation Cellular Networks

István Törős and Péter Fazekas

Dept. of Telecommunications, Budapest University of Technology and Economics, Magyar tudósok körútja 2., 1117 Budapest, Hungary
{toros,fazekasp}@hit.bme.hu
Abstract. The optimal placement of base stations and effective radio resource management are tasks of paramount importance in cellular wireless networks. This paper deals with the automatic planning of base station sites in a studied scenario, maintaining the coverage requirement and enabling the transmission of traffic demands distributed over the area. A city scenario with different demands is examined and the advantages/disadvantages of this method are discussed. The planner and optimizing tasks are based on an iterative K-means clustering method. The planning method involves base station positioning and the selection of antenna main lobe directions. Results of the network deployments output by this algorithm are shown, with various traffic loads over the studied area. Keywords: cellular network planning, coverage, capacity.
1 Introduction
The radio planning of cellular wireless networks is a highly investigated topic, because operators can save budget using a cost-efficient planning method. The planning of the network must satisfy the interests of operators, such as high spectral efficiency and low infrastructure cost. Developing and using an algorithm that automatically plans the positions of base stations and provides the necessary coverage and capacity over the area with a small number of stations is thus of utmost importance. However, planning of the forthcoming 3GPP Long Term Evolution (LTE) networks, with their specific radio interface features, is less covered in the literature yet. Effective placement is a complex problem: the designer has to choose the optimal positions of base stations and directions of antennas. Frequency planning is not an issue in 3G networks, as the basic spread spectrum radio interface allows the deployment of a reuse-1 scheme, that is, each cell may use the same frequency band. However, the forthcoming 3GPP LTE network is based on OFDMA (Orthogonal Frequency Division Multiple Access) technology, where both the frequency band and timeslots are radio resources, so effective radio resource management algorithms (distributing radio resources in frequency and time) should operate. Hence the frequency band may be dynamically used at the cell where it is needed, effectively resulting in a dynamic frequency distribution among cells.

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 18–31, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
This basic property and operation should be taken into account right from the network planning phase, otherwise the resulting deployment may be inefficient and waste spectrum resources. Our research is focused on the development and investigation of such an automatic planning algorithm. The optimization task is based on a clustering method that is a very popular suggestion for planning, the K-means algorithm. However, most of the methods that use K-means concentrate either on dimensioning or on optimization of the network, and they require a prediction of the initial number of clusters, which is not straightforward, see e.g. [1][2]. In [3][7-10] cellular network planning solutions were targeted, however the authors have not considered the full complexity of the problem: their main concern is the base station positioning criterion, and they use simplified models. In [11] the author has proposed a complex algorithm with base station selection and configuration. However, in that paper the basic assumption of having a given maximum traffic load expressed in Erlangs per cell is not applicable if we consider next generation networks, which mainly carry high speed data services and cannot be characterised by a simple capacity measure, due to the varying nature of the radio channel (both over time and over positions in the area) and the applied adaptive modulation and coding. In contrast, our method performs both the dimensioning and optimization steps of cellular planning and does not require initial estimations; rather, it can start with an empty (in terms of number of base stations) area, with an arbitrarily placed single base station, and places the necessary stations over this. However, if required, the algorithm can be started with an initial arbitrary network topology (locations of an arbitrary number of base stations) and places new base stations to fulfil the coverage and capacity requirements.
This is useful in the case when a network deployment strategy has to be planned in order to serve increasing capacity demands in an already running network. It is important to note that the location algorithm creates the clusters based on the properties of the base stations which were initialized at the beginning. The frequency adaptation relies on the structure of the base stations. The rest of the paper is composed as follows. In Section 2 the basic modelling environment of the cellular network is characterised. In Section 3 the aims of the planning algorithm are described. It is followed by Section 4, which details the planning algorithm. In Section 5 the results are shown by different graphs. The last Section 6 includes our conclusion.
2 Modelling Environment
The terrain where the planning method can be used is simply described by the set of applicable coordinates over the area and the given traffic amount over the area, assigned to any subset of the coordinates on the terrain. For the sake of easier understanding, the method is described using the concrete example area on which our examinations were conducted. The environment of our evaluations is a 9 km² square city area, which is modelled as having three different layers. This layering approach does not have
significance in terms of the planning algorithm, but only for the numerical calculations we conducted. The first is the flat geographical layer, which is followed by the layer of roads and buildings. Any point of the area is defined by Cartesian coordinates. The resolution of the coordinates is 100 m² in our example, so x and y are supported on the interval [0..300]. We denote the points where the traffic demands of users are supposed to be known by DP (Demand Point): DP = {DP1, DP2, ..., DPm}, where m is the number of DPs in our environment. These points are represented by (xi, yi, di), where xi, yi are the coordinates and di is the demand of DPi (1 ≤ i ≤ m), expressed in kbps. The di parameter in our model depends on the location of the DP. According to our assumptions, DPs placed within buildings have higher demands. Moreover, our model assumes that more users are found along the roads, hence DPs are placed more frequently along roads and their traffic requirement is higher as well. The aim of the planning algorithm is to serve all DPs, so we have to compute the required resource that is provided by the base stations.
2.1 Base Station Model
A Base Station (BS) is the equipment that provides radio resources in our wireless network. We suppose that a BS operates three cells through three sectorised antennas. The BSs are represented by BS = {BS1, BS2, ..., BSn}, where n is the number of BSs in our environment.
2.2 Antenna Model
To keep the model realistic, sectorized antennas are assumed. The antenna horizontal characteristic is described by equations (1) and (2):

IF α ≤ 90, then Power = cos²(α) · pw   (1)
IF α > 90, then Power = 0   (2)

where α is the angle between the main direction of the sector antenna and a changing vector pointing towards the actual location under examination, and pw is the transmitter gain extended by the antenna gain. Hence, during the calculations, the signal strength is determined (along with the path loss model) according to the direction of a given point of the map. The vertical characteristic is described by (3):

Power = cos²(α − x) · pw   (3)
Fig. 1. Horizontal characteristic of Sector Antenna
We can employ (3) in all directions, where α is the vertical angle of the main direction of the sector antenna and x is the vertical angle of the changing vector. The BSs are planned with the traditional layout, namely three sector antennas with 120 degrees of separation between their main directions.
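Equations (1)-(3) can be written as small helpers. The paper does not state units, so treating pw as a linear power value and α in degrees is our assumption:

```python
import math

def horizontal_power(alpha_deg, pw):
    """Horizontal sector antenna characteristic, equations (1)-(2):
    cos^2-shaped main lobe, zero behind the antenna."""
    if abs(alpha_deg) > 90:
        return 0.0
    return math.cos(math.radians(alpha_deg)) ** 2 * pw

def vertical_power(alpha_deg, x_deg, pw):
    """Vertical characteristic, equation (3)."""
    return math.cos(math.radians(alpha_deg - x_deg)) ** 2 * pw
```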
2.3 Propagation Models
We use the COST 231 path loss model for a big city environment in our simulations. This has the advantage that it can be implemented easily without an expensive geographical database, yet it is accurate enough, captures the major properties of propagation and is used widely in cellular network planning. The model is a function of the carrier frequency f, which ranges from 50 MHz to 2 GHz, of the effective height of the base station hb and of the effective height of the mobile hm [5]. The attenuation is given by equation (4):

Ap = 46.3 + 33.9·log10(f) − 13.82·log10(hb) + (44.9 − 6.55·log10(hb))·log10(d) − a(hm) + Cm   (4)

where, for large cities and f ≥ 400 MHz,

a(hm) = 3.2·(log10(11.75·hm))² − 4.97   (5)
and Cm = 3 dB in large cities. Along with this model, slow fading is also taken into account by means of a single fading margin expressed in dB. We extended this model with outdoor-to-indoor propagation: if a DP is located in a building, the signal strength is decreased by 10 dB.
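Equations (4)-(5), with the 10 dB indoor penalty, transcribe directly into code. The parameter units (f in MHz, heights in m, d in km, as conventional for COST 231) are our assumption, since the text does not state them:

```python
import math

def cost231_loss(f_mhz, hb_m, hm_m, d_km, indoor=False):
    """COST 231 path loss for large cities, equations (4)-(5), plus the
    10 dB outdoor-to-indoor penalty used in the model. Returns dB."""
    a_hm = 3.2 * math.log10(11.75 * hm_m) ** 2 - 4.97      # eq. (5)
    cm = 3.0                                               # large cities
    ap = (46.3 + 33.9 * math.log10(f_mhz)
          - 13.82 * math.log10(hb_m)
          + (44.9 - 6.55 * math.log10(hb_m)) * math.log10(d_km)
          - a_hm + cm)                                     # eq. (4)
    return ap + (10.0 if indoor else 0.0)
```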
2.4 Signal Propagation
We can describe the signal propagation by the following equation:

RSi,j(DPn) = TSi,j + TAgaini,j − PL + RAgain − C   (6)
where RSi,j(DPn) is the received signal of the n-th DP from the j-th transmitter of the i-th BS in dBm, TSi,j is the transmitter power of the j-th transmitter of the i-th BS in dB, TAgaini,j is the gain of the j-th transmitter of the i-th BS in dB (taking into account the antenna characteristics), PL is the path loss in dB (COST-231), RAgain is the gain of the receiver antenna in dB, and C is the fading margin in dB. This calculation is executed for every DP of the map from all transmitters.
2.5 Sector
A sector is defined as the set of DPs that are covered by the same transmitter. The "best server" policy is followed within the network:

Si,j = {DPh : RSi,j(DPh) ≥ min and RSi,j(DPh) ≥ RSl,k(DPh), 1 ≤ i, l ≤ n, 1 ≤ j, k ≤ 3, (i, j) ≠ (l, k)}

where min is the minimal signal strength that the mobile phone can still receive.
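The best-server sector formation can be sketched as follows; the received-signal table and the threshold value are illustrative inputs, not values from the paper:

```python
def best_server(rs, min_dbm):
    """Assign each DP to the transmitter giving the strongest received
    signal, provided it is above the service threshold (best-server policy).
    rs: {dp: {(bs, sector): received signal in dBm}}.
    Returns {(bs, sector): set of covered DPs}."""
    sectors = {}
    for dp, signals in rs.items():
        tx, level = max(signals.items(), key=lambda kv: kv[1])
        if level >= min_dbm:
            sectors.setdefault(tx, set()).add(dp)
    return sectors

rs = {"DP1": {(1, 1): -70.0, (2, 1): -85.0},
      "DP2": {(1, 1): -110.0, (2, 1): -112.0}}   # DP2 below threshold
cover = best_server(rs, min_dbm=-105.0)
```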
3 Aim Description
The efficiency of mobile wireless networks is described by the serving bit rate per cell. This metric depends on the distribution of the SINR (Signal to Noise plus Interference Ratio) values in the given cell. The main task is that the mobile phones can receive the serving signal stronger than the overall interference at any point. Consequently, the SINR values of the DPs have to be increased by an efficient planning algorithm. The cost of infrastructure is the other key factor: if we could use any number of BSs within our network, then we could assuredly serve our demands, but with lavish spending. Possibilities for increasing efficiency:
– Observing signal propagation. This is a very important factor, because we can save resources if the high-demand DPs are served by a small number of frequencies. This effect can be achieved if the BSs are placed near these DPs: the received serving signal will be stronger while the interference is constant at these positions, so the SINR will be higher. The spectral efficiency of the DPs can also be increased if they are placed on the beam; if the directions of the DPs subtend a smaller angle with the main direction of the serving antenna, then the received serving signal will be stronger.
– Efficient frequency adaptation. This factor is also very important, because we can increase the SINR value at the position of the user if the neighbouring interfering signals are controlled.
3.1 Coverage
An important task is to guarantee the coverage criterion of the DPs. This requirement is fulfilled if every DP is covered by some sector: ∀DPi ∃ j, k : DPi ∈ Sj,k, i ∈ (0..m), j ∈ (0..n), k ∈ (1..3).
3.2 Computing of Signal to Noise Plus Interference Ratio
Another important task is that the demands of the DPs are served. The resources of the network can be managed by frequency adaptation and power management. Our planning procedure uses the properties of LTE radio resource management (RRM). This type of RRM uses the OFDMA multiplexing scheme in the LTE downlink. The whole spectrum is divided into subcarriers whose bandwidth is 15 kHz. Furthermore, time is also divided into slots. The users are allocated a specific number of subcarriers for a predetermined amount of time. A PRB (Physical Resource Block) is defined as consisting of 12 consecutive subcarriers for one subframe (1 msec) in duration. The PRB is the smallest element of resource allocation assigned by the base station scheduler. First of all we have to calculate the amount of interfering signals. An interfering transmitter can be defined as equipment that provides the DP with a signal strength that is stronger than the service threshold but is not the best server. This effect can be observed if the best server and the interfering transmitter send their signals on the same PRB. We can describe the SINR by the following equation:

SINRh = RSi,j(DPh) / ( Σk=1..n Σl=1..3 RSk,l(DPh) + Noise ),  NOT(k = i and l = j)   (7)

The power of the thermal noise (Noise) is taken to be −101 dBm in the evaluations.
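Equation (7) operates on powers, so the dBm values produced by (6) must be converted to linear units before summing; the conversion step is implicit in the paper, and this sketch makes it explicit:

```python
import math

def dbm_to_mw(dbm):
    """Convert a dBm level to linear milliwatts."""
    return 10 ** (dbm / 10.0)

def sinr_db(serving_dbm, interferers_dbm, noise_dbm=-101.0):
    """SINR at a DP, equation (7): serving power over the sum of all
    interfering powers plus thermal noise, computed in milliwatts."""
    interference = sum(dbm_to_mw(p) for p in interferers_dbm)
    ratio = dbm_to_mw(serving_dbm) / (interference + dbm_to_mw(noise_dbm))
    return 10.0 * math.log10(ratio)

# Illustrative numbers: a -70 dBm server against two weak interferers.
sinr = sinr_db(-70.0, [-90.0, -95.0])
```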
3.3 Spectral Efficiency
The relationship between SINR and spectral efficiency is given by the so-called Alpha-Shannon formula, suggested to be used for LTE networks [4]:

SpectralEfficiencyh = α · log2(1 + 10^(SINRh / (10 · impfactor)))   (8)

where α = 0.75, impfactor = 1.25, and SINRh is the signal to noise plus interference ratio at DPh in dB. The unit of spectral efficiency is bit/sec/Hz, so one PRB can carry 180·1024·0.001·SpectralEfficiencyh bits per second to DPh. Furthermore, the number of required PRBs of DPh can be defined by

NumberofPRBh = dh / (180 · 1024 · 0.001 · SpectralEfficiencyh)   (9)
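Equations (8) and (9) in code; rounding the PRB count up to an integer is our addition, since the paper leaves NumberofPRB fractional:

```python
import math

ALPHA = 0.75
IMPFACTOR = 1.25
PRB_HZ = 180 * 1024        # PRB bandwidth factor as used in eq. (9)

def spectral_efficiency(sinr_db):
    """Alpha-Shannon formula, equation (8), in bit/s/Hz."""
    return ALPHA * math.log2(1 + 10 ** (sinr_db / (10 * IMPFACTOR)))

def prbs_needed(demand_kbps, sinr_db):
    """Equation (9): PRBs required to carry d_h kbps at the DP's SINR
    (rounded up to a whole PRB, our addition)."""
    per_prb_kbps = PRB_HZ * 0.001 * spectral_efficiency(sinr_db)
    return math.ceil(demand_kbps / per_prb_kbps)

se = spectral_efficiency(10.0)   # about 2.15 bit/s/Hz at 10 dB SINR
n = prbs_needed(1000.0, 10.0)    # PRBs for a 1 Mbps demand
```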
If we calculate the required PRBs in all sectors, then we can decide whether each sector can serve its covered DPs.
3.4 Objective Function
We have to define an objective function that expresses our aim. The overall number of used physical resource blocks is

UsedPRBs = Σ∀DPi NumberofPRBi   (10)
Our main tasks are to serve all demands and cover the entire map, as well as to minimize UsedPRBs. We can further decrease the number of used PRBs if the spectral efficiency is increased at the positions of high-demand points. The aim of the proposed algorithm is to guarantee higher SINR values at these points.
3.5 Initialization
Before the planning is started we have to set some key parameters of the model. These are the placement of the DPs and the power of the transmitters. The map of roads and buildings is constant in all simulations.
4 The Planning Algorithm
The core method of our algorithm is K-means clustering. This mechanism shifts the BSs and rotates the antennas.
4.1 K-Means Clustering
This produces a separation of the objects into groups, from which the metric to be minimized can be calculated. We use this algorithm to cluster the DPs and form sets of them (sectors). The criterion function ρ(xi, mj), which has to be minimized, is the distance measure between an object xi and the cluster centre mj [6]. The first is the assignment step: join each demand to the closest cluster,

Cit = {xj : ρ(xj, mi) ≤ ρ(xj, mi*) for all i*}   (11)

where Cit is the cluster closest to demand xj at the t-th step. The other is the update step,

mi(t+1) = (1/#Cit) · Σx∈Cit x   (12)

where #Cit is the number of DPs and x is the location of a DP within the i-th cluster (Ci). Equation (12) moves each mean to the centre point of its cluster. The algorithm is composed of the following steps:
1. Place K points into the space represented by the DPs that are being clustered. These points represent the initial group centroids (BSs).
2. Assign each DP to the group that has the closest centroid (assignment step).
3. When all DPs have been assigned, recalculate the properties of the K centroids (update step).
4. Repeat steps 2 and 3 until the centroids no longer move or the iteration counter expires.
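The demand-weighted variant used by the BS positioning step (Section 4.2) can be sketched in a few lines; plain Euclidean distance stands in here for the received-signal metric ρ used in the actual algorithm:

```python
def kmeans_step(dps, centroids):
    """One assignment + update step of demand-weighted K-means.
    dps: list of (x, y, demand); centroids: list of (x, y).
    Positions are weighted by DP demands, so heavy DPs pull the BS closer."""
    clusters = [[] for _ in centroids]
    for x, y, d in dps:                          # assignment step, eq. (11)
        i = min(range(len(centroids)),
                key=lambda i: (x - centroids[i][0]) ** 2
                              + (y - centroids[i][1]) ** 2)
        clusters[i].append((x, y, d))
    new = []
    for i, members in enumerate(clusters):       # update step, eq. (12)
        w = sum(d for _, _, d in members)
        if w == 0:
            new.append(centroids[i])             # empty cluster: keep position
        else:
            new.append((sum(x * d for x, _, d in members) / w,
                        sum(y * d for _, y, d in members) / w))
    return new

dps = [(0, 0, 1), (1, 0, 1), (10, 0, 1), (11, 0, 3)]
c = kmeans_step(dps, [(0, 0), (10, 0)])   # heavy DP at x=11 pulls centroid 2
```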
Fig. 2. Main flowchart diagram of RF planning algorithm
4.2 Main Algorithm
Our Planning Algorithm (PA) is made up of four interdependent blocks (Figure 2). This procedure will run until all DPs will be served. At the beginning PA places one BS to the center of map. The next step is a conjunct procedure (CP). In CP we will create the sectors and calculate SINR for all covered DPs (6)(7). The BS placement and the antenna rotation will run alternately six times. After some cycles the moving of BSs and the rotation of antennas will decrease, hence we intuitively chose the K-means clustering to run for six cycles. CP will run after every BS positioning and antenna rotation mechanism. The BS positioning algorithm is based on K-means. The centroids of clusters are the BSs. The assignment step is the procedure of sector creation, but one cluster will be made up of three sectors of one BS. The necessary ρ(xi ,mj ) metric is the strength of received signal in xi position (position of DPi ) from mj position of transmitter. In the update step the position of covered DP (xi ) will be weighted by the demands of DP (di ). #C is the amount of demands of covered DPs within the cluster. The aim of this procedure, that the DPs with higher demand are positioned near the serving transmitter, so we can save resource by higher SINR values. The antenna rotation algorithm is also based on K-means clustering. In the previous procedure we achieved that the higher demands will be placed close to serving transmitters. Our frequency adaptation will run with a frequency reuse factor of 1, so every adjacent sector will be interfering. Our aim that the directions of covered DPs with higher demand are subtended smaller angle with the main direction of serving antenna. The assignment step is also the procedure of sector creation. In the update step xi is the subtended angle between the direction of covered DPi within the sector and the main direction of serving transmitter weighted by the demands of DP di . #C is the amount of demands of covered DPs within the cluster. 
The clusters of this K-means procedure are again the three covered sectors of each BS, so the three antennas of a BS are rotated together.
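The demand-weighted update step described above can be sketched in Python. This is our own illustration, not code from the paper: the metric ρ(xi, mj) is modelled by a simple placeholder pathloss (the exponent and transmit power are assumptions), DPs are assigned to the BS with the strongest received signal, and each BS then moves to the demand-weighted centroid of its cluster.

```python
import numpy as np

def received_power(dp_pos, bs_pos, p_tx=30.0, alpha=3.5):
    """Placeholder for the metric rho(x_i, m_j): received power falling off
    as distance^-alpha (the exponent is an assumption, not the paper's model)."""
    d = np.linalg.norm(dp_pos - bs_pos, axis=-1)
    return p_tx / np.maximum(d, 1.0) ** alpha

def kmeans_bs_positioning(dp_pos, demands, bs_pos, cycles=6):
    """Demand-weighted K-means: the assignment step attaches every DP to the
    BS with the strongest received signal; the update step moves each BS to
    the demand-weighted centroid of its cluster (#C = total demand there)."""
    bs_pos = bs_pos.astype(float).copy()
    for _ in range(cycles):
        power = np.stack([received_power(dp_pos, b) for b in bs_pos])
        cluster = np.argmax(power, axis=0)          # assignment step
        for j in range(len(bs_pos)):                # update step
            mask = cluster == j
            if mask.any():
                w = demands[mask]
                bs_pos[j] = (dp_pos[mask] * w[:, None]).sum(axis=0) / w.sum()
    return bs_pos, cluster
```

The antenna rotation step follows the same pattern, with the angle to the serving antenna's main direction taking the place of the position coordinates.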
I. Törős and P. Fazekas
After the six cycles, Radio Resource Management (RRM) runs. It consists of the PRB adaptation mechanism (Figure 3) and power allocation. The power allocation is very simple: every PRB is transmitted either at maximal or at zero power. The PRB adaptation is a cyclic procedure that assigns the required number of PRBs in every sector. We select the first PRB (subcarrier 0, subframe 0) in all sectors and assign it to the unserved DP with the highest SINR. If the adaptation of the current PRB is successful in every sector, we move on to the next PRB. If a sector becomes fully served, its transmitter stops transmitting on the remaining PRBs, and the CP must be re-run without this transmitter. This procedure runs until all DPs are served. Afterwards we look for the most unserved sector (MUS). If the greatest number of required PRBs is less than the number of PRBs available in the current sector, there is no MUS and the algorithm runs the coverage filling procedure. Otherwise, the algorithm places a new base station near the serving antenna of the MUS, in its main direction, and the CP, positioning and rotation procedures start again. The DPs of the MUS connect to the newly placed BS, because the new signals arrive from a nearer position, and the update step of BS positioning then shifts the base stations accordingly. This mechanism is shown in Figure 4.
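The PRB adaptation loop can be illustrated with the following simplified sketch. This is our own Python illustration, not the paper's exact procedure: the DP/Sector classes, the per-PRB capacity scaling, and the MUS criterion (here simply the sector with the most unserved DPs) are assumptions. PRB indices advance in lockstep across sectors, each still-unserved sector gives the current PRB to its unserved DP with the highest SINR, and a fully served sector stops transmitting.

```python
from dataclasses import dataclass
import math

@dataclass
class DP:
    sinr: float        # linear SINR of the DP toward its serving sector
    remaining: float   # bits still to be delivered to this DP

@dataclass
class Sector:
    name: str
    dps: list

    def unserved_dps(self):
        return [d for d in self.dps if d.remaining > 0]

    @staticmethod
    def bits_per_prb(sinr):
        # Shannon-style capacity per PRB; the scaling is an assumption
        return 100.0 * math.log2(1.0 + sinr)

def prb_adaptation(sectors, total_prbs):
    """Walk PRB indices in lockstep across sectors; each still-unserved sector
    assigns the current PRB to its unserved DP with the highest SINR. A fully
    served sector stops transmitting (its remaining PRBs stay free). If PRBs
    run out, the most unserved sector (MUS) is reported so a new BS can be
    placed near its serving antenna."""
    alloc = {s.name: [] for s in sectors}
    for prb in range(total_prbs):
        active = [s for s in sectors if s.unserved_dps()]
        if not active:
            return alloc, None           # every DP is served
        for s in active:
            dp = max(s.unserved_dps(), key=lambda d: d.sinr)
            alloc[s.name].append(prb)
            dp.remaining -= s.bits_per_prb(dp.sinr)
    return alloc, max(sectors, key=lambda s: len(s.unserved_dps()))
```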
Fig. 3. PRB adaptation
Fig. 4. Mechanism of BS positioning
Automatic Base Station Deployment Algorithm
The coverage filling procedure finds the uncovered DPs. If it finds none, we are done; otherwise each such DP must be covered by a new BS and the above procedure is run again. The detailed flowchart of the algorithm is presented in Figure 5.
Fig. 5. Extended planning algorithm
5 Results
We ran the planning algorithm with different overall demands (50 Mbit/s – 2500 Mbit/s) and different transmitter powers (1 W and 30 W). The properties of the DPs were changed in every simulation. We use a 20 MHz spectrum allocation and a frequency reuse factor of 1. Every transmitter can use the whole spectrum (100 × 1000 PRBs per second). The required service threshold value is −115 dBm. Figure 6 shows the required number of BSs when transmitters with 1 W or 30 W power are used. We plotted the results for serving 98 and 100 percent of all DPs.
Fig. 6. Number of Base Stations in different loaded environments
Figure 6 shows that the simulations with 1 W power and small overall demands require a great number of BSs. The reason is that the PA focuses on the coverage criterion, and a 1 W transmitter is not able to cover distant DPs. Another problem is the indoor users, for whom the weak signal cannot exceed the serving threshold owing to the shadowing of walls. The required number of BSs stagnates between 226 Mbit/s and 676 Mbit/s, because the loading of the sectors is low in the 226 Mbit/s case and only increases later. The simulations with 30 W power place BSs with large coverage, so at low demands the DPs are served by few BSs. When the overall demands are higher, the simulations with 1 W and 30 W power use a similar number of BSs: in this regime the algorithm focuses on satisfying the demands, and new BSs have to be placed in both cases. The BSs are placed closer and closer together, so the SINR values of the DPs become similar in both cases. We can also see that full serving requires more BSs than 98 percent serving for both transmitter types.
Fig. 7. Overall bit rate per sector in different loaded environments
Figure 7 shows the average bit rate per sector in the different simulations. The 30 W transmitter yields a similar bit rate per sector (≈ 20 Mbit/s for 100% serving and ≈ 22 Mbit/s for 98% serving) in all simulations. The 1 W transmitter yields different bit rates per sector across the simulations: at the beginning, the overall demand placed on the map is small but all DPs must be covered, so the PA places many small sectors and the average bit rate decreases. In the following simulations the overall demand increases while the number of BSs stays roughly the same, so the average bit rate per cell increases. At the end of the simulations both transmitter types yield similar bit-rate-per-cell values.
Fig. 8. Number of used and free PRBs during the running of the planning algorithm in the simulation with 2508.65 Mbit/s overall demand and 1 W transmitter power
Figure 8 shows that the number of free PRBs increases during the simulation, because the number of required PRBs decreases in some sectors owing to the placement of new BSs. By analysing the end state of a simulation we can propose the required spectrum allocation per sector, since it reveals the number of PRBs required by every sector.
Fig. 9. Spectral efficiency during the running of the planning algorithm in the simulation with 2508.65 Mbit/s overall demand and 1 W transmitter power
In Figure 9, the average spectral efficiency of the network is presented as the algorithm places new base stations over the area. The average spectral efficiency decreases continuously and then stagnates. This stagnation indicates that the planning algorithm is efficient, because the average SINR of the network does not decrease as more BSs are used.
Fig. 10. End states of the simulations with 676.73, 932.05 and 2059.19 Mbit/s overall demands
Figure 10 shows the end states of three different simulations. With small overall demands the average size of the sectors is larger; the sectors shrink as the environment becomes more loaded.
6 Conclusions
In this paper a novel algorithm was presented that automatically determines the number and placement of the base stations required to serve a cellular network area with given traffic conditions. The algorithm is based on realistic assumptions and can be used for any legacy system, with an arbitrary Radio Resource Control method applied in the network. Numerical results were presented, showing that the algorithm achieves total coverage and allows all traffic demands to be served. Future work will investigate an effective power management scheme to be included in the RRM.
References

1. Karam, O.H., Fattouh, L., Youssef, N., Abdelazim, A.E.: Employing Clustering Techniques in Planning Wireless Local Loop Communication Systems: PlanAir. In: 11th International Conference on Artificial Intelligence Applications, Cairo, Egypt, February 23-26 (2005)
2. Mishra, A.R.: Advanced Cellular Network Planning and Optimization, pp. 15–197. John Wiley & Sons Ltd., Chichester (2007)
3. Calegarie, P., Guidec, F., Kuonen, P., Chamaret, B., Udeba, S., Josselin, S., Wagner, D.: Radio Network Planning with Combinatorial Algorithms. ACTS Mobile Commun., 707–713 (1996)
4. Basit, A.: Dimensioning of LTE Network, Description of Models and Tool, Coverage and Capacity Estimation of 3GPP Long Term Evolution Radio Interface (2009), http://lib.tkk.fi/Dipl/2009/urn100056.pdf
5. Barclay, L.: Propagation of Radiowaves, p. 194. The Institution of Electrical Engineers, London (2003)
6. MacQueen, J.B.: Some Methods for Classification and Analysis of Multivariate Observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)
7. Ramamurthy, H., Karandikar, A.: B-Hive: A Cell Planning Tool for Urban Wireless Networks. In: 9th National Conference on Communications (2003)
8. Tutschku, K.: Demand-based Radio Network Planning of Cellular Mobile Communication Systems. In: INFOCOM 1998, pp. 1054–1061 (1998)
9. McGeehan, J., Anderson, H.: Optimizing Microcell Base Station Locations Using Simulated Annealing Techniques. In: Proc. 44th IEEE Vehicular Technology Conf., pp. 858–862 (1994)
10. Molina, A., Athanasiadou, G., Nix, A.: The Automatic Location of Base-Stations for Optimized Cellular Coverage: A New Combinatorial Approach. Presented at the IEEE Vehicular Technology Conference (1999)
11. Hurley, S.: Planning Effective Cellular Mobile Radio Networks. IEEE Trans. Vehicular Technol. 51(2), 243–253
A Fast and Simple Scheme for Mobile Station-Controlled Handover in Mobile WiMAX

Sayan Kumar Ray1, Swapan Kumar Ray2, Krzysztof Pawlikowski1, Allan McInnes3, and Harsha Sirisena3

1 Department of Computer Science and Software Engineering, University of Canterbury, Christchurch, New Zealand
2 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
3 Department of Electrical and Computer Engineering, University of Canterbury, Christchurch, New Zealand
{sayan.ray@pg., krys.pawlikowski@, allan.mcinnes@, harsha.sirisena@}canterbury.ac.nz, [email protected]
Abstract. A fast and simple Mobile Station (MS)-controlled handover (HO) scheme for Mobile WiMAX networks is described. An MS can roughly estimate its present distance from any neighbouring Base Station (BS) using the Received Signal Strength (RSS) and an appropriate pathloss formula. From the Mobile Neighbor Advertisement (MOB_NBR-ADV) broadcasts, the MS periodically monitors the RSS of its Serving BS (SBS), chooses appropriate times to perform a few scans of selected Neighbouring BSs (NBS), and estimates their changing distances to compute their respective angles of divergence from its own line of motion. The MS selects the NBS having the minimum angle of divergence (AOD), coupled with satisfactory quality-of-service and bandwidth capability, as its Target BS (TBS) and requests the SBS to execute this HO. Simulation studies show considerably reduced HO latency. MS-controlled HO also promises greatly increased scalability for Mobile WiMAX networks.

Keywords: Handover in Mobile WiMAX; MS-controlled fast handover; distance estimation and lookahead handover; MS self-tracking; scalability improvement in Mobile WiMAX; Angle of Divergence; RSS-based Distance Estimation.
1 Introduction

Attractive features like high data rate, spectral efficiency, extended area coverage, and low cost are steadily increasing the deployment of Mobile WiMAX (IEEE 802.16e) networks. However, designing improved HO processes remains an important area of research. HO is the process of transferring an ongoing connection of an MS from its current BS (the SBS) to its next SBS. It must be carried out fast, without causing any call break, and also efficiently, without consuming much of the network resources. After the various recommendations made in the Mobile WiMAX standard [1] and WiMAX Forum documents [2] regarding the parameters to be used,
types of mobility to be considered, etc., in designing HO algorithms, a large number of different approaches to performing HO have been suggested in the literature. A brief review of these is provided in the next section. In the present paper, we propose a fully MS-controlled, fast and efficient Hard HO (HHO) scheme for Mobile WiMAX, following the broad approach adopted by its precursor paper [3]. It is based on the concept of an MS estimating its present distances from its NBSs by utilizing their RSS and then running an appropriate lookahead algorithm over these distance estimates to select its TBS. With knowledge of the RSS of its SBS and NBSs, and of its own absolute velocity, the MS can itself ascertain its need for a HO, determine its relative velocity with respect to its NBSs, select the TBS and, finally, simply request its SBS to hand it over to the selected TBS. This approach of RSS-based distance estimation followed by an appropriate lookahead technique was originally developed in connection with a Modified Distance Vector Routing (MDVR) algorithm [4] for use in Mobile Ad-Hoc Networks (MANET), which have no infrastructure such as BSs or Access Points. Recently, this idea was used in an MS-controlled fast MAC-layer HO scheme [3] in Mobile WiMAX. The chief attraction of such simple MS-controlled HO techniques lies in the possibility of enhancing the scalability of Mobile WiMAX networks by distributing much of the HO-related work of each BS to the large number of MSs it serves, without much burdening the MSs either. The MS selects its TBS by virtue of two criteria: (i) meeting the bandwidth (BW) and quality-of-service (QoS) requirements of its ongoing call, and (ii) showing the highest relative velocity with respect to the MS.
Hence those NBSs that either do not meet the QoS-BW requirement or do not show at least a progressive (approaching) relative velocity with respect to the MS are not considered for (further) scanning. Because scanning and ranging activities are greatly reduced in this scheme, compared to the Mobile WiMAX standard or many of the proposed HO schemes, the overall HO delay is considerably reduced, thus improving the expected call-drop performance. The rest of this paper is organised as follows. The IEEE 802.16e HHO procedure and related research are briefly reviewed in Section 2. Section 3 discusses the principle and implementation methodology of our new scheme. Section 4 presents the simulation details and the numerical results obtained. Finally, conclusions are drawn in Section 5.
2 Mobile WiMAX HHO and Related Research Work

Although the Mobile WiMAX standard supports three types of HO procedures, namely the HHO, the Macro Diversity HO (MDHO) and the Fast Base Station Switching (FBSS), the HHO is the default and most commonly used procedure. The two main phases of the Mobile WiMAX HHO procedure [1] are the Network Topology Acquisition Phase (NTAP) and the Actual Handover Phase (AHOP). In Mobile WiMAX, the HO process is triggered when the strength of the signal received by the MS from its SBS drops below a certain threshold level. During the NTAP, the MS and the SBS, with the help of the backhaul network, jointly gather information about the underlying network topology before the actual HO decision is made. The
SBS, using MOB_NBR-ADV messages, periodically broadcasts information about the state of its neighbouring BSs. Based on this information, the MS performs repeated scanning and ranging activities with the different available NBSs (irrespective of the MS's direction of movement and of the QoS and BW availability of the NBSs) before a suitable TBS is finally selected, with the active help of the SBS, for a potential HO. Several MAC management messages are exchanged between the MS and the SBS in the whole process. In the AHOP, after the TBS has been finalized, the MS terminates its connection with the SBS by informing it with a MOB_HO-IND (Mobile Handover Indication) message. Next, following a series of MAC management procedures between the MS and the TBS, involving synchronisation, ranging, authorization and registration, the MS becomes fully functional with the new SBS. A detailed description of the HO procedure can be found in [1]. The conventional Mobile WiMAX HO procedure has some important limitations. Prolonged scanning- and ranging-related activities during the NTAP cause much delay and are the primary hindrance for delay-sensitive real-time applications. On the other hand, the AHOP suffers from a lengthy inter-HO gap because of the extensive network re-entry activities of the MS [5]. Recent 802.16e HHO-related research has focused mostly on attempts to reduce the disruptive effects of these constraints. The schemes proposed in [6-7] suggest predicting TBSs before the scanning and ranging activities on the basis of factors such as BS coverage, MS mobility direction, and the bandwidth and QoS required for the HO; in all cases, scanning- and ranging-related activities are reduced. The schemes proposed in [8] and [9] focus on minimizing the disruptive effects of Mobile WiMAX channel scanning activities during HO for different types of traffic and noise levels.
Works aimed at reducing the handover latency by shortening the inter-HO gap during the AHOP were proposed in [10-12]. Recently, a cross-layer HO scheme based on mobility prediction of the MS using the signal strengths of the BSs was proposed in [13]; the total HO latency was reduced by initiating the layer-3 HO activities prior to the layer-2 HO activities. However, this movement prediction scheme did not reduce the MAC-layer HO time. It may be pointed out that most of these proposed HO schemes in Mobile WiMAX are largely controlled by the SBS with possible assistance from the MS, the only exception probably being the cross-layer HO scheme [13], which is "MS-initiated". It must be recognized that an SBS controlling the HO of all the MSs it serves creates an important scalability problem owing to the excessive load on the SBS. An MS-controlled HO arrangement, in which the MSs themselves select, with acceptable power consumption, their respective TBSs (next SBSs) and then request the present SBS to effect the actual HO process via the backbone network, may provide a better alternative. In the MS-controlled HO scheme of [3], the MS can, at any time, obtain a rough estimate of its present distance from any NBS by using the measured value of the relevant RSS in an appropriate pathloss formula. Through periodic monitoring of the RSS of the SBS, the MS ascertains the need for a HO and then scans only those NBSs that have been chosen as "potential" TBSs. With a few scanning cycles, each yielding the latest distance estimates of the NBSs, the MS selects the NBS with respect to which it has the highest relative velocity and requests the SBS to hand it over to this selected TBS.
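The selection rule of [3] described above — scan twice, estimate distances, pick the NBS being approached fastest — reduces to a few lines. The sketch below is our own illustration, with assumed function and parameter names:

```python
def select_tbs(estimates, t_interval):
    """estimates maps each candidate NBS id to (d_prev, d_curr), the RSS-based
    distance estimates from two scanning cycles T seconds apart. The relative
    (approaching) velocity is (d_prev - d_curr) / T; the NBS with the highest
    value is chosen as the Target BS."""
    return max(estimates,
               key=lambda n: (estimates[n][0] - estimates[n][1]) / t_interval)
```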
3 Proposed MS-Controlled Fast HHO Scheme

Since the present paper shares the broad approach towards achieving a fast HO in Mobile WiMAX that guided its precursor paper [3], we first recapitulate the salient points of this approach before describing the new lookahead technique for TBS selection proposed in this paper. The key idea [4] is that any station in a wireless network, fixed or mobile, can at any time roughly estimate its present distance from any neighbouring station, fixed or mobile, by measuring (after suitable and adequate signal processing such as filtering) the strength of the signal received from the latter and using this RSS value in an appropriate pathloss formula [14]. It may be pointed out that the Received Signal Strength Indicator (RSSI) parameter used in the conventional Mobile WiMAX handover framework is actually obtained after some filtering of the received carrier signal (to reduce the effect of random noise and fading) followed by computing its logarithm. In this paper, however, we propose to use as the RSS the received carrier signal after the appropriate signal processing (to take care of random noise, fading, shadowing, etc.) but before computation of the logarithm. The idea of distance estimation using the RSS has also recently been investigated [15] for localization in WiMAX networks [16]; along with a new empirical pathloss formula, this study yielded encouraging results establishing RSS-based distance estimation as a viable alternative to the two existing methods, namely (i) GPS-enabled receivers (expensive and power-hungry) and (ii) round-trip delay (RTD)/relative delay (RD) measurement (which needs synchronization between BSs) [16]. Though relatively inaccurate, RSS-based distance estimation is simple and entails no cost.
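As a concrete illustration of such distance estimation (our own sketch: the log-distance pathloss model and all parameter values are assumptions, not the formula of [14]), the pathloss relation can be inverted to map a measured RSS to a distance, with the results pre-computed over a grid of RSS values:

```python
import bisect

def build_rssvdt(p_tx=43.0, pl0=34.5, n=3.5, d0=1.0,
                 rss_min=-120.0, rss_max=-40.0, step=0.5):
    """Pre-compute an RSS-vs-distance table by inverting a log-distance
    pathloss model P_rx = P_tx - PL0 - 10*n*log10(d/d0). All parameter
    values here are illustrative assumptions."""
    num = int((rss_max - rss_min) / step) + 1
    rss_grid = [rss_min + i * step for i in range(num)]
    dists = [d0 * 10 ** ((p_tx - pl0 - r) / (10 * n)) for r in rss_grid]
    return rss_grid, dists

def lookup_distance(rss_grid, dists, rss_dbm):
    """Nearest-grid-entry lookup: no pathloss arithmetic at handover time."""
    i = bisect.bisect_left(rss_grid, rss_dbm)
    i = max(0, min(i, len(rss_grid) - 1))
    return dists[i]
```

Pre-storing the table trades a little memory for the on-line computation, which is exactly the motivation behind the tables discussed next.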
In this context, it was suggested in [3] that pre-computed values of the estimated distance d, for all possible values of the RSS and for several different pathloss formulae, can be pre-stored as RSS-vs-d Tables (RSSVDT) in the memory of the computer inside the MS. This allows the stored values of the estimated distance d to be retrieved immediately, without wasting any computing time or battery power. In order to efficiently manage its own HO process, the MS creates four conceptual zones by partitioning the dynamic range [0, Pm] of the RSS through a suitable choice of three different RSS power levels P1, P2 and P3, as shown in Figure 1. These zones are called the Zone of Normalcy (ZN), the Zone of Concern (ZC), the Zone of Emergency (ZE) and the Zone of Doom (ZD). The MS periodically monitors the RSS of its SBS via the MOB_NBR-ADV broadcasts [1] to identify the zone it is presently in. Very little HO-related activity is needed in the ZN, and all HO-related activities (including those carried out by the BS after the MS has selected the TBS) should preferably be completed before the ZD is entered, so that a call drop owing to poor RSS in the ZD and/or excessive HO delay becomes highly improbable. We are now in a position to describe the proposed RSS-based distance-estimation-cum-AOD-based lookahead technique that the MS performs to control its own HO. For the purpose of explanation, we assume that the MS has six NBSs, A, B, C, D, E and F, clustered around its SBS S, and that the MS is moving along the straight line XY (Figure 2) at any speed up to 120 km/hr. How the MS selects its TBS may now be explained as follows:
Fig. 1. Zones based on RSS levels
Step 1: During its stay in the ZN (Pm ≥ P > P3), where the MS receives a high RSS P from its SBS, the MS creates, from the MOB_NBR-ADV broadcasts made by the SBS S, its set {A, C, D, E} of Potential TBSs (PTBS) by excluding those NBSs (B and F in our example) which do not have adequate QoS-BW capability to become a TBS. This screening not only reduces the number of PTBSs to be scanned but also removes any unfortunate possibility of the MS receiving poor-quality service after the HO.

Step 2: When the MS leaves the ZN and enters the ZC (P3 ≥ P > P2), it starts receiving a power P from the SBS which is "less than normal but still much higher than the Minimum Acceptable Signal Level (MASL)". So, in anticipation of the possible need for a HO, the MS, when it is at the point x on its journey (see Figure 2), starts a scanning iteration over the four short-listed PTBSs in order to obtain their RSSs and estimate their respective current distances dA, dC, dD and dE from it (i.e. from point x). Next, after an appropriately chosen period of T seconds (T should be chosen depending on the current velocity of the MS), when the MS is at the point y on its line of motion, it starts a second scanning iteration over the four PTBSs (or fewer, if the RSS from any of them was below the MASL) to estimate their respective changed distances dA', dC', dD' and dE'. At this point, we make the assumption that the motion of the MS is linear, at least from the beginning of the first scanning iteration until the completion of the entire HO process. This is probably not an unreasonable assumption for drives on highways or important urban roads, which are relatively straight rather than curved or zigzag over short stretches.
Fig. 2. Distance estimation-based lookahead scheme
Now it may be observed from Figure 2 that the two scanning iterations, which yielded a pair of distance samples for each PTBS, e.g. (Cx, Cy) for C, have formed a triangle for each PTBS (e.g. ∆xCy for C), with all four triangles standing on the same common side (base) xy, which lies on the line of motion of the MS. More importantly, it should also be observed that the line of motion XY of the MS creates, at the point x, an AOD θ (e.g. angle Cxy) with each PTBS in each triangle. The AOD θ (0º ≤ θ ≤ 180º) characterizes the motion of the MS relative to the four (static) PTBSs, as detailed in Table 1. With respect to the table it should be mentioned that for θ = 0º and 180º the concept of a triangle vanishes, as the triangle degenerates into a straight line. From the above, looking ahead makes it obvious that the PTBS with the lowest value of θ promises to offer the strongest RSS to the MS in the near future and hence should be selected as the TBS. To do this, however, some means of identifying the PTBS having the minimum value of θ must be found. This problem is solved with the following three observations.

1. In each triangle, the lengths of all three sides are known. While the lengths of two of the sides have been estimated through scanning and RSS measurement, the length of the third (common) side can be computed as

Length(xy) = vT     (1)

where v, the average velocity of the MS during T, can easily be measured with simple instrumentation.
Table 1. θ vs. characterization of the MS's motion

Value of θ        Characterization of the motion of the MS w.r.t. the PTBS
0º                The MS is moving absolutely towards the PTBS, i.e. it has the highest possible progressive (forward) movement towards the PTBS.
0º < θ < 90º      The MS is moving towards the PTBS.
90º               The movement of the MS is tangential and cannot be characterized as either progressive or regressive w.r.t. the PTBS.
90º < θ < 180º    The MS is moving away from the PTBS.
180º              The MS is moving absolutely away from the PTBS, i.e. it has the highest regressive (backward) movement away from the PTBS.
2. If, inside each triangle, we construct a right-angled triangle by making the common side xy its hypotenuse and dropping a perpendicular from y upon the side joining x with the PTBS, then, obviously, the PTBS with the lowest value of the AOD θ will have the highest value of cos θ.

3. By the well-known Law of Cosines, the cosine of any angle of a triangle whose three sides are all known can be determined. For example, considering ∆Dxy in Figure 2 (inspection tells us that D has the smallest θ = angle Dxy among the four PTBSs) and applying the Law of Cosines, we have

cos θ = {(Dx)² + (xy)² − (Dy)²} / {2(Dx)(xy)}     (2)
Thus, with four computations of Equation (2) and three comparisons between the four values of cos θ, the MS can select the TBS out of the four PTBSs. However, the MS does not make the final selection of the TBS at this time, in keeping with the well-known "look before you leap" dictum, which requires a last-minute check. In the present case, the check is necessitated by the possibility that the MS may change its direction of motion even at the last moment, so that a good standby PTBS would be welcome. Accordingly, two PTBSs, called Candidate TBSs (CTBS), are selected. The two must have the largest values of cos θ, show a progressive movement (0º ≤ θ < 90º) and have a signal level greater than the MASL.

Step 3: After reaching the ZE (P2 ≥ P > P1), the MS finalizes its selection of the TBS from among the two CTBSs and then requests the SBS, through a MOB_HO-IND message [1], to execute an urgent HO, passing the ID of the selected TBS. As stated earlier, the HO process should preferably be completed before the MS enters the ZD. Now, in order to carry out the final selection of the TBS, the MS carries out a
final scanning iteration for CTBS1 and CTBS2. CTBS1 is selected if it shows both a progressive movement (compared to its previous distance) and a signal level greater than the MASL; otherwise, CTBS2 is selected. The implicit assumption is that at least one of the two will maintain the trend that both had shown in the previous scanning. Figure 3 shows the flowchart implementing the scheme.
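The TBS-selection geometry of Steps 2 and 3 can be sketched as follows (our own Python illustration; the function and variable names are assumptions). Equation (2) yields cos θ for each PTBS from its two distance estimates and the base xy = vT of Equation (1), and the two PTBSs with the largest cos θ among those actually approaching (0º ≤ θ < 90º) are kept as CTBSs.

```python
def cos_aod(d_x, d_y, v, t):
    """Law of Cosines (Eq. 2): cos(theta) at point x, where d_x and d_y are
    the RSS-based distance estimates of a PTBS from points x and y, and the
    base of the triangle is xy = v*T (Eq. 1)."""
    xy = v * t
    return (d_x ** 2 + xy ** 2 - d_y ** 2) / (2.0 * d_x * xy)

def pick_ctbs(scans, v, t, k=2):
    """scans maps each PTBS id to (d_at_x, d_at_y). Keep the k PTBSs with the
    largest cos(theta) among those approaching the MS (cos(theta) > 0);
    the MASL screening of the paper is omitted from this sketch."""
    scored = {p: cos_aod(dx, dy, v, t) for p, (dx, dy) in scans.items()}
    approaching = [p for p, c in scored.items() if c > 0.0]
    return sorted(approaching, key=lambda p: scored[p], reverse=True)[:k]
```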
4 Performance Evaluation

4.1 Simulation Scenario

The performance evaluation of the proposed HO scheme was done using the IEEE 802.16e OFDMA [14] model implemented in the QualNet 4.5 simulator [17]. The
Fig. 3. Flowchart of proposed MAC-layer HHO scheme
simulation topology consists of 25 nodes spread over a 1500 m × 1500 m terrain. Six of the nodes are BSs (1 SBS and 5 NBSs), deployed in a multi-cell environment operating in the 2.4 GHz – 2.45 GHz band with different radio frequencies. One node is the Access Service Network Gateway (ASN-GW), while the remaining 18 nodes are MSs, with 3 MSs per cell under each BS. The traffic used in the simulation is CBR. In our simulation model, a single MS, initially controlled by its SBS, moves randomly between the BSs and performs a HO whenever needed during the course of the simulation. A Random Waypoint Mobility model [18] was used to model the movements of the MS for different speeds, varying from 20 km/hr to 120 km/hr [13]. The two-ray path loss model [14] is used to incorporate path loss effects in the simulation. Table 2 lists the important simulation parameters, which have been chosen according to the WiMAX Forum specifications [19]. For our scheme, the BS signal values (in dBm) were converted to milliwatts. All the graphs shown in this section depict results based on the method of multiple independent replications, each of which lasted, on average, approximately 20 minutes (real time). The maximum relative statistical error is 7% at the 0.95 confidence level.

Table 2. Key simulation parameters
Parameter                        Value
Number of BSs                    6
Number of MSs                    18
Number of cells                  6
Bandwidth                        10 MHz
FFT Size                         1024
No. of Subchannels               30
MAC Propagation Delay            1 µs
VoIP Application Exists?         Yes
Environment Temperature (K)      290
Noise Factor (K)                 10
Default Frame Length             20 ms
Signal Values (in dBm)           -76, -78, -80
BS Antenna Height                15 m
MS Antenna Height                1.5 m
QPSK Encoding Rate               0.5
BS Link Propagation Delay        1 ms
Scan Interleaving Interval       6 frames
Scan Iterations                  3
MS's movement speed              20 kmph – 120 kmph
4.2 HO Latency Analysis

As stated in Section 2, the overall HO time comprises the sum of the NTAP time and the AHOP time. In contrast with the conventional Mobile WiMAX scheme,
where the MS carries out scanning and synchronization activities with all the advertised NBSs before short-listing a few, the overall NTAP latency in our scheme is much reduced owing to the much smaller amount of scanning performed by the MS (see Section 3). As far as the AHOP time in the conventional HO scheme is concerned, prior to the synchronization, ranging, capability negotiation and authorization-registration exchanges between the MS and the selected TBS, a major amount of time is consumed by HO preparation. The latter is concerned with finalizing the ultimate TBS before the MS and the SBS jointly go for the HO. During this HO preparation time, the SBS exchanges a significant number of MAC management messages with the MS as well as with all the PTBSs, to ensure that the MS will receive adequate QoS, BW and other relevant resources from its next SBS after the HO. In our scheme, however, because the PTBSs are selected through inter-BS communication over the backbone network, and prior to the scanning, the actual HO preparation time is eliminated. This leads to a large reduction in the AHOP time and thus a significant reduction in the overall HO time. Preliminary simulation results show that, in comparison with the conventional method, our scheme can reduce the NTAP delay by as much as 53%, as shown in Figure 4. Also, as shown in Figure 5, the reduction in the overall MAC-layer HHO latency of our scheme is as much as 49% compared to the conventional Mobile WiMAX scheme. Both for the pre-HO (i.e. NTAP) latency analysis and for the total HO latency analysis, simulations have been carried out for six different speeds of the MS in the range 20 – 120 km/hr.
Fig. 4. Comparison of NTAP time
42
S.K. Ray et al.
Fig. 5. Comparison of overall handover latency
Fig. 6. Comparison of the number of scans per replication
4.3 Analysis of the Number of Scans Performed
Figure 6 compares the average number of scans performed per replication at different MS speeds. Clearly, far fewer scans are performed in our
scheme than in the conventional Mobile WiMAX HO scheme. Again, this is because our scheme avoids unnecessary scanning.
5 Conclusion
An MS-controlled MAC-layer scheme for achieving a fast HHO in Mobile WiMAX networks, based on RSS-based distance estimation and relative-velocity-based lookahead, has been described. The overall HO latency is markedly reduced because the MS intelligently manages both NTAP and AHOP, the two main phases of the Mobile WiMAX HHO procedure. Aided by the concept of four zones, a good part of the HO-related tasks are completed even before the RSS from the SBS drops to the HO-threshold level. Intelligent short-listing of NBSs as PTBSs considerably reduces the scanning overhead. Finally, the RSS-based estimation of the relative distances of the PTBSs from the MS, together with the lookahead based on the angle of divergence from the MS's line of motion, enables highly reliable selection of the TBS. Besides a fast and efficient HO, an important contribution of the proposed MS-controlled HO scheme is its promise of enhancing the scalability of Mobile WiMAX networks by allowing each BS to serve a much larger number of MSs.
ACCESSNETS 2010
Technical Session 2: Emerging Applications
Modeling the Content Popularity Evolution in Video-on-Demand Systems

Attila Kőrösi1, Balázs Székely2, and Miklós Máté1

1 Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, H-1529 P.O. Box 91, Hungary
{korosi,mate}@tmit.bme.hu
2 Institute of Mathematics, Budapest University of Technology and Economics
[email protected]
Abstract. The simulation and testing of Video-on-Demand (VoD) services require the generation of realistic content request patterns to emulate a virtual user base. The efficiency of these services depends on the popularity distribution of the video library, thus the traffic generators have to mimic the statistical properties of real-life video requests. In this paper the connections among the content popularity descriptors of a generic VoD service are investigated. We provide an analytical model for the relationships among the most important popularity descriptors, such as the ordered long term popularity of the whole video library, the popularity evolutions and the initial popularity of the individual contents. Beyond the theoretical interest, our method provides a simple way of generating realistic request patterns for simulating or testing media servers.

Keywords: Video popularity, analytical model.
1 Introduction and Related Works
Building true Video-on-Demand (VoD) services with strong quality and availability guarantees becomes feasible with the widespread adoption of broadband Internet access. The demand for VoD systems is high, as customers are gradually turning away from scheduled broadcasts toward personalized multimedia content. VoD systems have high bandwidth requirements; therefore the effect of introducing a VoD service on the existing network must be examined through simulations before deployment, so there is a strong need for accurate modeling of all components of a VoD system. Perhaps the most important component of a VoD system is its clients, because the characteristics of the network traffic of the VoD largely depend on their content selections. The long-term popularity distribution is the most important characteristic of a content library. The relative popularity of a content is defined as the number of requests for that content divided by the total number of requests in a (usually long) time interval. Content popularities are usually displayed in
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 47-61, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
Fig. 1. Typical ordered long-term popularity distributions (source: [6])
decreasing order of popularity, and on a log-log scale, as in Figure 1. This curve is called the ordered long-term popularity; it can be considered a probability distribution, and it is usually modeled with a Zipf-like distribution based on empirical studies [1]. The standard Zipf distribution is linear on a log-log scale, but real-world popularity distributions are not, thus several modifications of the Zipf distribution have been proposed to fit the empirical data; such Zipf-like distributions include the Zipf-Mandelbrot law [4] and the k-transformation [6]. Recently, the use of the stretched exponential distribution has been suggested instead of a Zipf-like distribution [3]. We remark that often, as in this paper, the absolute popularity is studied, without dividing by the total number of requests.

The daily popularity or short term popularity of a content library is also an important descriptor, because replicating frequently accessed items at a location closer to the clients is often required in order to decrease network bandwidth requirements. There are several such bandwidth-optimization schemes, ranging from simple caching to complex content delivery networks [5]. The efficiency of these solutions depends on the steepness of the popularity curve; if the majority of the requests are for a small number of contents, then a caching scheme can be very efficient.

The popularity evolution, or lifespan, is the change over time of the relative popularity of an individual content. It is also of interest because several caching optimizations depend on the prediction of popularity changes. The most common one is precaching: a content that is expected to become popular in the near future is inserted into the caches in advance. Therefore, it is important to analyse the properties and causes of the short-term popularity changes, and their connection to the long-term popularity distribution.
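As an aside, an ordered long-term popularity curve of the Zipf-Mandelbrot type mentioned above can be generated in a few lines of Python; the exponent s and shift q below are illustrative values, not parameters fitted to any trace:

```python
# Sketch: ordered long-term popularity following a Zipf-Mandelbrot law,
# f(i) proportional to 1 / (i + q)^s, normalized over a library of N items.
# The shift q flattens the head of the curve, which is the typical deviation
# of real-world popularity data from the straight line of a pure Zipf law.

def zipf_mandelbrot(n_items, s=0.8, q=20.0):
    """Relative popularity of ranks 1..n_items, in decreasing order."""
    weights = [1.0 / (i + q) ** s for i in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

pop = zipf_mandelbrot(1000)   # a 1000-item library
```

Plotting pop against rank on a log-log scale reproduces the flattened-head shape visible in Figure 1.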
The most commonly observed popularity evolution curve shows an increase immediately after the introduction, a short apex, and a long decrease [6], but other shapes have also been observed [7]. Contents can be classified into categories, based on the type of their popularity evolution. A quite common classification is the distinction between “news”
and “movie” types. News items are typically very popular for a short time after their introduction, but become obsolete very quickly. On the other hand, movies have a smaller initial popularity, but remain relevant significantly longer. In this paper we resolve the connections between the long-term popularity and the other popularity descriptors. These descriptors are the distribution of the video types, and the properties that depend on the type: the release day distribution, the popularity evolution, and the distribution of the initial popularity. As far as we know, no such model is available in the literature. The main contribution of our paper is the following: if one of the above parameters is unknown, an approximation of the missing parameter can be constructed. Our model can handle arbitrary long-term popularity distributions, and the other parameters can be chosen arbitrarily, as long as they do not contradict each other. Beyond the theoretical interest, our method provides a simple way of generating realistic request patterns for simulating or testing media servers. The rest of the paper is organized as follows. In Section 2 we introduce the model and the notions appearing in the paper, along with the connections among the popularity descriptors. In Section 3 we show how a missing parameter can be approximated. In Section 4 we describe how our model can be used for generating user events, and compare our method to the ones found in the literature. Finally, in Section 5 we summarise our results and draw the conclusions.
2 Notations and the Popularity Model
In this section we introduce the main notations and describe the popularity model. Afterwards, we present the main relations that we use to derive the results in later sections.

2.1 Description of the Model
The observed period consists of D observation days, indexed with the set {1, 2, . . . , D}, during which a total of N videos have been released. We assign four parameters to each video, namely:

type θ ∈ Θ, drawn according to a given type distribution G on the set Θ. The following three parameters depend on the type:

initial popularity Iθ, a positive real-valued random variable, which determines the number of claims for a video of type θ on the day it is released. The distribution of Iθ is denoted by Fθ.

popularity evolution function hθ : {1, 2, . . . } → [0, ∞), a deterministic function, which describes how the popularity changes for one video of type θ during its lifetime in the observed period. hθ is an intrinsic parameter of the video, as can be seen from the following definition. For n ≥ 1 we define

hθ(n) := (# of claims for a video of type θ on day n after its release) / Iθ.   (1)
Consequently, the number of claims for a video of type θ on day n after its release is Iθ hθ(n).

release day dθ from {1, 2, . . . , D}, drawn according to a release day distribution {pθ,d, 1 ≤ d ≤ D} depending on θ. Note that the observation days and the release days are indexed with the same set.

Remark 1. Instead of observation days we can take observation weeks. In this case the other parameters are changed appropriately; for example, the initial popularity then counts the requests for the video during its first week in the system.

Based on the above definitions, video k (1 ≤ k ≤ N) can be represented by its type θk and the starting day dk := dθk; thus the popularity evolution function of video k is hk := hθk, and its initial popularity is Ik := Iθk.

Definition 1. Let Xk denote the long term popularity of video k (1 ≤ k ≤ N), that is, the number of claims for video k, introduced on day dk, during the observed period of D days.

It is easy to see that the following equation holds:

Xk = Ik Σ_{m=1}^{D−dk+1} hk(m).   (2)
Since Xk depends on the random variables (Ik, dk, hk), and (Ik, dk, hk), k = 1, . . . , N, is a sequence of independent and identically distributed (i.i.d.) random variables (they were generated independently), the long term popularities X1, . . . , XN are also i.i.d. random variables. (hk is also a random variable, since hk = hθk and θk is a random variable.) For further reference, we define the overall intrinsic popularity of video k over the whole observed period. First, we introduce the aggregated intrinsic popularity of video k during its first n days:

Hk(n) := Σ_{m=1}^{n} hk(m).   (3)
Observe that Hk(n) is an increasing function of n and Hk(1) = hk(1) = 1. Using this notation we can write Xk = Ik Hk(D − dk + 1), since video k is added on day dk and is therefore in the system for D − dk + 1 days.
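The definitions above translate almost verbatim into code. The following sketch uses two placeholder evolution functions (any functions with h(1) = 1 would do) and computes the long term popularity of equation (2):

```python
# Per-video model of Section 2.1: a video released on day d generates
# I * h(n) claims on the n-th day after its release, so its long term
# popularity over D observed days is X = I * H(D - d + 1), Eq. (2).
# The two evolution functions are placeholders satisfying h(1) = 1.

def h_movie(n):                     # slow power-law decay ("movie"-like)
    return (50 + n) ** -3 / 51 ** -3

def h_news(n):                      # fast exponential decay ("news"-like)
    return 2.0 ** (1 - n)

def H(h, n):                        # aggregated intrinsic popularity, Eq. (3)
    return sum(h(m) for m in range(1, n + 1))

def long_term_popularity(I, h, d, D):   # Eq. (2)
    return I * H(h, D - d + 1)
```

For example, a news-like video with initial popularity I = 10, released on the first of D = 3 observed days, collects 10·(1 + 1/2 + 1/4) = 17.5 claims.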
2.2 Long Term Popularity Parameters
Now we introduce two parameters that describe the global behavior of the system:
long term popularity curve Π : {1, 2, . . .} → [0, ∞), a decreasing deterministic function. For an appropriately long period we count the number of claims for each video and put these numbers in decreasing order; thus Π(i) is the number of claims for the ith most popular video in that period.

long term popularity cdf Φ, defined as

Φ(x) := P(Xk ≤ x),   x > 0,   (4)
for every video. It is the cumulative distribution function (cdf) of the number of claims that arrive for a randomly chosen video during a long period. Since X1, X2, . . . , XN are independent and identically distributed random variables by definition, we can omit the index k from Φk(x). There exists a one-to-one correspondence between the long term parameters in the following sense. Before presenting the precise statement, let us recall that the empirical distribution function of the sample X1, . . . , XN, generated independently from the distribution Φ, is defined by

ΦN(x) := (1/N) Σ_{k=1}^{N} I{Xk ≤ x}.

Moreover, by definition, Π is the ordered sample of X1, X2, . . . , XN.

Proposition 1. Let the long term popularities X1, X2, . . . , XN be a sequence of independent random variables with common distribution Φ. Then

ΦN(x) = (N − Π⁻¹(x)) / N,

where Π⁻¹(x) denotes the generalized inverse, Π⁻¹(x) = sup{i : Π(i) ≥ x}.

If N is large then ΦN is close to Φ, since the empirical distribution function converges to the original distribution function as the sample size (N) increases. Thus

Φ(x) ≈ (N − Π⁻¹(x)) / N.   (5)

In the rest of the paper we will use Φ to describe the long term popularity, since Eq. (5) gives a simple relation between Φ and Π. Further, using Φ instead of Π is better for modeling purposes, since Eq. (2) describes the connection between the long term popularity Xk and the other parameters, and Φ is the distribution of Xk. This connection among the distribution functions will be discussed in Sec. 2.3 in detail.

Proof (for Proposition 1). The popularity curve shows that for the ith most popular video there have been Π(i) claims in the long run. The inverse of Π shows that Π⁻¹(x) videos have x or more claims. This means that N − Π⁻¹(x) videos had fewer than x claims. Since the number of videos is N, the portion of videos that have fewer than x claims is (N − Π⁻¹(x))/N. If the popularities of the N videos are equal to the N-element (unordered) sample X1, X2, . . . , XN of Φ, then by the definition of ΦN one obtains ΦN(x) = (N − Π⁻¹(x))/N.
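Proposition 1 is easy to check numerically: sorting a sample in decreasing order gives Π, and counting the sample elements that are at least x gives Π⁻¹(x). A small sketch with synthetic popularities (the exponential sample is purely illustrative):

```python
import random

# Numerical check of Proposition 1: Phi_N(x) = (N - Pi_inv(x)) / N,
# where Pi is the decreasingly ordered sample and
# Pi_inv(x) = sup{i : Pi(i) >= x} is its generalized inverse.
random.seed(1)
N = 1000
sample = [random.expovariate(0.01) for _ in range(N)]   # synthetic popularities

Pi = sorted(sample, reverse=True)        # ordered long term popularity curve

def Pi_inv(x):                           # number of videos with >= x claims
    return sum(1 for v in Pi if v >= x)

def Phi_N(x):                            # empirical distribution function
    return sum(1 for v in sample if v <= x) / N

# With continuous popularity values, ties at the query point x have
# probability zero, so the identity of Proposition 1 holds exactly.
```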
2.3 Long Term Popularity as a Function of the Parameters
In this subsection we express Φ using the type distribution (G), the release day distribution ({pθ,d, d = 1, 2, . . . , D}), the initial popularity distributions (Fθ) and the popularity changes (hθ). Recalling the definition of Φ in equation (4), we have

Φ(x) = P(X ≤ x) = ∫_Θ P(X ≤ x | θ) G(dθ) = ∫_Θ Σ_{d=1}^{D} P(X ≤ x | θ, d) pθ,d G(dθ).

Using the interpretation (2), the definition of H (3) and that the distribution of Iθ is denoted by Fθ, the last term in the previous formula equals

∫_Θ Σ_{d=1}^{D} P(Iθ Hθ(D − d + 1) ≤ x) pθ,d G(dθ)
= ∫_Θ Σ_{d=1}^{D} P(Iθ ≤ x / Hθ(D − d + 1)) pθ,d G(dθ)
= ∫_Θ Σ_{d=1}^{D} Fθ(x / Hθ(D − d + 1)) pθ,d G(dθ).

Thus we have

Φ(x) = ∫_Θ Σ_{d=1}^{D} Fθ(x / Hθ(D − d + 1)) pθ,d G(dθ).   (6)
Definition 2. We say that the model is well defined, if the given parameters (Φ, G, {Fθ }, {hθ } and {pθ,d}) satisfy equation (6). The functional equation (6) will be used in Section 3, for computing the missing parameter functions.
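Equation (6) can be validated by simulation. The sketch below uses two illustrative types with a Lomax-type initial popularity cdf Fθ(x) = 1 − (1 + x)^(−αθ), uniform release days, and power-law and exponential evolution functions; all of these are made-up choices for the check, not parameters of the paper:

```python
import random

# Monte-Carlo sanity check of Eq. (6): draw (theta, d, I) per video, form
# X = I * H_theta(D - d + 1), and compare the empirical cdf of X with the
# analytical right-hand side.  All parameter choices are illustrative.
random.seed(7)
D = 50
g = [0.6, 0.4]                      # type distribution g_theta
alpha = [8.0, 2.0]                  # F_theta(x) = 1 - (1 + x)^-alpha_theta

def F(theta, x):
    return 1.0 - (1.0 + x) ** -alpha[theta] if x > 0 else 0.0

def sample_I(theta):                # inverse-cdf sampling from F_theta
    u = random.random()
    return (1.0 - u) ** (-1.0 / alpha[theta]) - 1.0

def h(theta, n):                    # illustrative evolution functions
    return (50 + n) ** -3 / 51 ** -3 if theta == 0 else 2.0 ** (1 - n)

Htab = [[0.0], [0.0]]               # Htab[t][n] = H_theta(n), Eq. (3)
for t in (0, 1):
    for n in range(1, D + 1):
        Htab[t].append(Htab[t][n - 1] + h(t, n))

def phi_formula(x):                 # right-hand side, uniform release days
    return sum(g[t] * sum(F(t, x / Htab[t][D - d + 1])
                          for d in range(1, D + 1)) / D
               for t in (0, 1))

xs = []
for _ in range(20000):              # simulate the long term popularities
    t = 0 if random.random() < g[0] else 1
    d = random.randint(1, D)
    xs.append(sample_I(t) * Htab[t][D - d + 1])

def empirical(x):
    return sum(1 for v in xs if v <= x) / len(xs)
```

The empirical cdf of the simulated X's tracks the analytical Φ up to Monte-Carlo noise.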
3 Connections among the Popularity Descriptors
In this section we assume that the number of types, T, is finite. We will show how the missing parameter can be approximated if the other parameters are known. First, observe that if T is finite then equation (6) can be written in the form

Φ(x) = Σ_{θ=1}^{T} gθ Σ_{d=1}^{D} pθ,d Fθ(x / Hθ(D − d + 1)),   (7)
where gθ denotes the probability that G assigns to type θ, θ = 1, 2, . . . , T. We will solve the four implicit problems, for the missing Fθ, hθ, pθ,d and gθ cases, each
of them will be presented in a separate subsection. That the problems are implicit means that we always suppose that Φ is known.

3.1 Approximation of the Initial Popularities
In this section we determine suitable initial popularity distributions (Fθ) in case the popularity change functions (hθ), the type distribution (gθ), the release day distribution (pd) and the long term popularity, Π or Φ, are given. We use equation (7) to obtain numerical approximations for Fθ, θ ∈ Θ. We determine the cdfs of the initial distributions at several fixed points by solving a Linear Programming (LP) problem [2]. Let the set of base points {x1, x2, . . . , xL} be given in increasing order. These are the points at which the quality of the approximation of Φ will be checked. The variables of the LP problem are

fθ,i,d = Fθ(xi / Hθ(d))   (1 ≤ θ ≤ T, 1 ≤ i ≤ L, 1 ≤ d ≤ D).

For fixed fθ,i,d the approximation Φ̂ of Φ at the points xi is given by the following L equations:

(∀i)   Φ̂(xi) = Σ_{θ=1}^{T} gθ Σ_{d=1}^{D} pθ,d fθ,i,d.

We have to ensure that the variables fθ,i,d determine distribution functions for every θ. Therefore, we require that if xi/Hθ(d1) ≤ xj/Hθ(d2) then fθ,i,d1 ≤ fθ,j,d2 for every type θ. Thus, we can define the following LP problem:

min ε
(∀i) : −ε ≤ Φ(xi) − Φ̂(xi) ≤ ε
∀θ, i, j, d1, d2 such that xi/Hθ(d1) ≤ xj/Hθ(d2) : 0 ≤ fθ,i,d1 ≤ fθ,j,d2 ≤ 1

Solving this problem yields the best approximating initial popularity functions. To illustrate how the method works we present the following example.

Example 1. (The figures related to this example are shown in Figures 2 and 3.) Let T = 2, gθ ∈ {0.6, 0.4}, h1(i) = (50 + i)^(−3) / 51^(−3) and h2(i) = 2^(−i) / 2^(−1). The evolution functions are fundamentally different: h1 has power-law decay (it represents regular movies), while h2 has exponential decay (it represents news-like videos). Further, let Fθ(x) = 1 − (1 + x)^(−αθ) with αθ ∈ {8, 2}. The release day distribution is uniform over the observed period for every type, and the period consists of D = 50 weeks according to Remark 1. There were N = 5000 different movies. We generate the long-term popularity curve Φ using these parameters and equation (7). Then we solve the problem of finding Fθ, θ = 1, 2, with the obtained Φ. Finally, we compare the approximated initial popularity distributions with the given ones, and we also compare the Φ generated from the original functions with the Φ generated from the h's and the approximated F's.
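The LP of this subsection can be assembled mechanically. The sketch below only builds the variable index set and the monotonicity constraint pairs and does not invoke a solver; the sizes, base points and aggregated evolution functions Hθ are toy values chosen for illustration:

```python
# Sketch: assembling the data of the LP in Section 3.1 (no solver invoked).
# Variables f[theta, i, d] stand for F_theta(x_i / H_theta(d)); monotonicity
# constraints pair variables of the same type whose arguments are ordered.
# Sizes, base points and the H functions are illustrative toy values.

T, L, D = 2, 5, 4
xs = [0.5 * 2 ** i for i in range(L)]             # base points x_1..x_L
H = [lambda d: float(d),                          # toy aggregated evolutions,
     lambda d: 2.0 - 2.0 ** (1 - d)]              # both with H(1) = 1

variables = [(t, i, d) for t in range(T)
             for i in range(L) for d in range(1, D + 1)]

def arg(t, i, d):                                 # the argument x_i / H_theta(d)
    return xs[i] / H[t](d)

# If x_i / H_theta(d1) < x_j / H_theta(d2), require f[t,i,d1] <= f[t,j,d2]
# (ties would give an equality constraint and are skipped in this sketch).
mono = [((t, i, d1), (t, j, d2))
        for t in range(T)
        for i in range(L) for d1 in range(1, D + 1)
        for j in range(L) for d2 in range(1, D + 1)
        if arg(t, i, d1) < arg(t, j, d2)]
```

Together with the L error constraints −ε ≤ Φ(xi) − Φ̂(xi) ≤ ε, these T·L·D variables and ordering constraints form the LP stated above, ready to be handed to any LP solver.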
Fig. 2. The distribution function Φ and approximated Φ if (Fθ , θ ∈ Θ) is the missing parameter
The quality of the approximation was investigated for three different base point sets. For L = 10, 20, 100 the base points are xi = xmin (xmax/xmin)^((i−1)/(L−1)), where xmin = 0.1 and xmax = 2000. The base points are chosen so that the increment of Φ between two neighboring xi's is constant. Of course, one may use another base point set, but in our experience this kind of set provides a fairly good approximation not only for Φ but also for the initial popularity distributions Fθ, θ ∈ Θ. In Figures 2 and 3 the original parameters and the three approximated parameters are depicted for Φ, F1 and F2. The difference between the original and the approximated Φ at the base points x1, . . . , xL is at most 10^(−9). The difference at the other points depends on the approximation of the Fθ's. We approximated the Fθ's by jump functions (and not by linear approximations), which causes fairly large errors. However, the difference between the original and the approximated Φ never exceeds 0.05, 0.02 and 0.004 in the cases L = 10, 20, 100, respectively.

3.2 Approximation of the Popularity Changes
In this section we approximate the popularity evolution functions, assuming that the long-term popularity Φ and the initial popularity distributions Fθ are given for T types. We use equation (6) and, as in the previous subsection, the solution hθ, θ ∈ Θ, will be obtained from an appropriate LP problem. We minimize the error ε:

ε = sup_{x∈x} | Σ_{θ=1}^{T} gθ Σ_{d=1}^{D} pθ,d Fθ(x / Hθ(D − d + 1)) − Φ(x) |,

where x = {x1, . . . , xL} is a set of given points. We first describe the idea of the approximation for T = 1, then we present the general case T > 1. Assume T = 1. Using the notation pd = pθ,d, we have

Σ_{d=1}^{D} pd F(x / H(D − d + 1)) = Φ(x).
(a) Theoretical and approximated F1. (b) Theoretical and approximated F2. (x-axis: number of claims in the release week)
Fig. 3. Approximation of the initial popularity distributions Fθ, θ ∈ Θ: two types with different initial popularity distributions. The details are given in Example 1. The accuracy of the approximation increases as more base points are added to the LP problem.
The sum can be interpreted as a mean over the discrete distribution Q which is concentrated on the points 1/H(D − d + 1), d = 1, . . . , D, where the point 1/H(D − d + 1) has probability pd. Thus, for a random variable X with distribution Q, we can write

Σ_{d=1}^{D} pd F(x / H(D − d + 1)) = E_Q[F(Xx)],   (8)

where E_Q denotes the expectation with respect to Q. We will approximate Q by a recursive sequence of distributions Q̃ such that the distance between the distributions of Q and its approximations,

sup_{x∈x} |F_Q(x) − F_Q̃(x)|,   (9)

is small. The first element of the sequence is denoted by Q̃1, which is a jump function with jumps at the arbitrarily chosen points r^1 = (r1, r2, . . . , rK). We want to assign weights s^1 = (s1, s2, . . . , sK) (as Q̃1({ri}) = si) so that the distance

sup_{x∈x} | Φ(x) − Σ_{j=1}^{K} sj F(rj x) |   (10)

is minimal. This will be accomplished via the following LP problem:

min ε
(∀i) : −ε ≤ Σ_{j=1}^{K} sj F(rj xi) − Φ(xi) ≤ ε
Σ_j sj = 1
0 ≤ s1, . . . , 0 ≤ sK
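For K = 2 the weight-fitting problem (10) collapses to a one-dimensional search, since s2 = 1 − s1. The following brute-force sketch stands in for the LP solver; F, the jump points and the target mixture weights are made-up values:

```python
# Brute-force stand-in for the LP of problem (10) with K = 2 jump points:
# since s2 = 1 - s1, minimise the sup-norm error over a grid of s1 values.
# F, the jump points r and the target mixture weights are illustrative.

def F(x):                                # an illustrative cdf
    return 1.0 - (1.0 + x) ** -2 if x > 0 else 0.0

r = (0.5, 2.0)                           # fixed jump points r_1, r_2
true_s = (0.3, 0.7)                      # weights used to manufacture Phi
xs = [0.1 * i for i in range(1, 101)]    # base points x

def Phi(x):                              # target: the exact mixture
    return true_s[0] * F(r[0] * x) + true_s[1] * F(r[1] * x)

def sup_error(s1):                       # objective of problem (10)
    return max(abs(Phi(x) - (s1 * F(r[0] * x) + (1.0 - s1) * F(r[1] * x)))
               for x in xs)

best_s1 = min((i / 1000 for i in range(1001)), key=sup_error)
```

The grid search recovers s1 ≈ 0.3; an LP solver does the same in one shot and scales to larger K.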
The solution s^1* for the given r^1 minimizes (10). Now we perform the following heuristically reasonable refinement step: to get a better approximation in the sense of (9), we add new jump points to r^1. First, take an intermediate distribution B concentrated on 1 = z1 > z2 > · · · > zD, where each zn carries weight pn. Then we solve the LP problem that minimizes sup_{x∈x} |F_B(x) − F_Q̃1(x)|. The solution 1 = z1* > z2* > · · · > zD* corresponds to the best H* in the sense that, with H*(n) = 1/zn*, the function Σ_{d=1}^{D} pd F(x / H*(D − d + 1)) is the closest function of this form to Σ_{j=1}^{K} sj F(rj x) on the set {x1, . . . , xL}. Second, we add the set z* = {z1*, z2*, . . . , z*_{D−1}} to r^1 and restart the approximating procedure described above with r^2 = r^1 ∪ z*. Similarly, we can construct Q̃2. We continue this refining procedure until we obtain a satisfactory collection of hθ such that Φ and the approximated Φ are close enough.

Next, suppose that T > 1. For given r_θ = (rθ,1, . . . , rθ,K), θ = 1, . . . , T, and a given type distribution gθ, 1 ≤ θ ≤ T, we want to find the best s_θ = (sθ,1, . . . , sθ,K), θ = 1, . . . , T, in the sense that the error of the approximation is minimal, that is:

min ε
(∀i) : −ε ≤ Σ_{θ=1}^{T} gθ Σ_{j=1}^{K} sθ,j Fθ(rθ,j xi) − Φ(xi) ≤ ε
Σ_j sθ,j = 1 (∀θ)
0 ≤ sθ,1, . . . , 0 ≤ sθ,K (∀θ)

As in the case T = 1, for each θ separately we add new points z*_θ to r_θ by constructing the best approximating intermediate distribution Bθ. We iterate this refinement technique until we obtain a satisfactory collection of hθ.
(a) Theoretical and approximated H1. (b) Theoretical and approximated H2. (accumulated rates vs. weeks)
Fig. 4. Hθ and the approximated Hθ, θ = 1, 2. The details are given in Examples 1 and 2.
Since the algorithm presented above employs heuristic considerations it is worth verifying it through an example. Example 2. (The figures related to this example are shown in Figure 4.) The theoretical parameters are the same as in Example 1, and the solution follows the
same pattern. We generate the long-term popularity curve Φ using the given parameters and equation (7), then we solve the problem of finding Hθ, θ = 1, 2, with the obtained Φ. Then we compare the approximated popularity change functions with the given ones, and we also compare the Φ generated from the original functions with the Φ generated from the original F's and the approximated H's. Although convergence to the theoretical popularity changes is not guaranteed, because of the heuristic approximation method, the approximated popularity distribution Φ converges to the theoretical one.

3.3 Approximation of the Release Day Distribution
In this subsection we show how the release day distribution pθ,d can be computed from known Φ, Fθ, hθ and G. Similarly to the previous subsections, we use equation (6):

Φ(x) = Σ_θ Σ_d gθ pθ,d Fθ(x / Hθ(D − d + 1)).   (11)

We solve an LP problem at the points x = xi, 1 ≤ i ≤ K, for pθ,d with the bounds 0 ≤ pθ,d ≤ 1 for every θ and Σ_d pθ,d = 1.

3.4 Approximation of the Type Distribution
In this subsection we investigate how the type distribution gθ can be computed from known Φ, Fθ and hθ. The solution is quite simple. Generate the functions Φθ from the functions Fθ and hθ by using equation (6) as though there were only one type for each θ. Using equation (6) again, we have the following equation:

Φ(x) = Σ_θ gθ Σ_d pd Fθ(x / Hθ(D − d + 1)) = Σ_θ gθ Φθ(x).   (12)

Now, for finding gθ we have to solve an LP problem at some points {x1, . . . , xL} in a similar way as in the previous sections.

Remark 2. On the accuracy of the approximations. In the cases of finding Fθ, θ ∈ Θ, and finding hθ, θ ∈ Θ, there is no guarantee of convergence. However, for certain parameter combinations the approximations get very close to the theoretical values. The accuracy of the approximation highly depends on the number of base points x1, x2, . . . , xL. In Figures 3 and 4 it can be seen that the approximations of F get closer to the theoretical ones while the approximations of H do not. In the cases of finding the type distribution and the release day distribution the approximations typically converge. The explanation is that if the functions Φθ, θ = 1, . . . , T, are not pairwise equal, then typically we can find L = T points {x1, . . . , xT} such that the vectors [Φθ(x1), . . . , Φθ(xT)]^t, θ = 1, . . . , T, are linearly independent. Consequently, the system of equations in (12) has one
unique solution. The same argument can be repeated for the convergence of the release day distribution. If one can find L = TD points such that the vectors

[Fθ(xi / Hθ(D − d + 1)), i = 1, . . . , TD]^t,   θ = 1, . . . , T, d = 1, . . . , D,

are linearly independent, then the system of equations in (11) has one unique solution. The condition typically holds if the functions Fθ, θ = 1, . . . , T, and the functions Hθ, θ = 1, . . . , T, are pairwise different.

Remark 3. If there is exactly one more unknown parameter beyond the type distribution gθ, a Non-Linear Programming (NLP) problem can be written with linear constraints and a fourth-degree objective function.

Proof. Let εθ,i be the difference of the approximation of type θ from Φ at xi:

εθ,i = Φ(xi) − Σ_{d=1}^{D} pθ,d Fθ(xi / Hθ(D − d + 1)).

Then the error of the approximation in l2 norm at x1, x2, . . . , xL is

( Σ_{i=1}^{L} ( Σ_{θ∈Θ} gθ εθ,i )² )^(1/2).

This error is minimal if its square is minimal; thus the NLP problem for finding the two unknown parameters is

min Σ_{i=1}^{L} ( Σ_{θ∈Θ} gθ εθ,i )²
εθ,i = Φ(xi) − Σ_{d=1}^{D} pθ,d Fθ(xi / Hθ(D − d + 1))   (∀θ ∀i)
Σ_θ gθ = 1
0 ≤ gθ (∀θ).
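The linear-independence argument of Remark 2 can be made concrete for T = 2: with two base points where the vectors [Φθ(x1), Φθ(x2)] are independent, gθ is recovered from equation (12) by solving a 2×2 system. The Φθ below are illustrative, not derived from any measured library:

```python
# Sketch of Remark 2 for T = 2: if Phi = g1*Phi_1 + g2*Phi_2 and the vectors
# [Phi_t(x1), Phi_t(x2)] are linearly independent, the type distribution
# solves a 2x2 linear system (here by Cramer's rule).  Phi_1 and Phi_2 are
# illustrative per-type long term popularity cdfs.

def Phi1(x):
    return 1.0 - (1.0 + x) ** -8

def Phi2(x):
    return 1.0 - (1.0 + x) ** -2

g_true = (0.6, 0.4)                      # the type distribution to recover

def Phi(x):                              # the observed mixture, Eq. (12)
    return g_true[0] * Phi1(x) + g_true[1] * Phi2(x)

x1, x2 = 0.5, 5.0                        # two base points
a, b = Phi1(x1), Phi2(x1)
c, d = Phi1(x2), Phi2(x2)
det = a * d - b * c                      # nonzero: the columns are independent
g1 = (Phi(x1) * d - b * Phi(x2)) / det
g2 = (a * Phi(x2) - Phi(x1) * c) / det
```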
4 Client Requests Generation
In this section we demonstrate how our model can be used to generate a series of client requests for testing or simulating a VoD system. The number of requests and their distribution over time follow the given distributions, because these are optimized independently. Assume that the type distribution {gθ}, the release day distribution {pd}, the initial popularities {Fθ} and the popularity changes {hθ} are given. The generating method is simple, since the construction is designed to solve exactly this problem: for each video k we generate a release day dk, an evolution function hk and an initial popularity Ik such that the number of claims on day d for video k is I{d ≥ dk} Ik hk(d − dk + 1). The timing of the requests within the observation period is fairly easy to handle, as, according to other studies [6,7], its distribution is independent of the other
popularity descriptors. After the number of requests has been calculated for a given period, their exact timing can be determined using the given intensity distribution. If the observation period is one day, then this distribution is usually called the diurnal access pattern, which typically has its maximum in the evening and its minimum during the night. Similar recurring request intensity changes can be observed over weeks as well. To overcome the problem that the number of claims for a day can be fractional, because we do not require that hθ be integer valued, we take either ⌊Ik hk(d − dk + 1)⌋ or ⌊Ik hk(d − dk + 1)⌋ + 1 according to some probability distribution, while ensuring that the sum of these integers is exactly Xk = Ik Hk(Dk). This can be done very easily. The long-term distribution Φ of the simulated system (the empirical distribution) converges to the theoretical Φ because of Proposition 1. Figure 5 shows that the empirical distribution and the theoretical distribution are close to each other, and the simulated long-term popularity curves also approximate the theoretical one. The continuous line is the analytical result; the dashed curves show the cases when the number of videos in the system is 50, 500 and 5000 in the scenario described in Example 1. The difference between Φ(x) and the approximated Φ at any x is not larger than 10^(−2) (50 videos), 10^(−3) (500 videos) and 10^(−4) (5000 videos).
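A deterministic alternative to the randomised rounding described above is cumulative rounding, which also keeps every daily count within one claim of Ik hk(d); note that it makes the integers sum to the floor of the total rather than to Xk itself, so it is a sketch of the idea, not the exact scheme of the paper:

```python
import math

# Cumulative rounding: turn fractional daily claims v_d = I_k * h_k(d) into
# integers c_d with c_d in {floor(v_d), floor(v_d) + 1} whose sum equals
# floor(sum(v_d)).  Each day's count stays within one claim of its target.

def cumulative_round(values):
    counts, prev = [], 0.0
    for v in values:
        cur = prev + v
        counts.append(math.floor(cur) - math.floor(prev))
        prev = cur
    return counts

daily = [3.7, 2.2, 1.4, 0.9, 0.55]       # illustrative I_k * h_k(d) values
claims = cumulative_round(daily)          # integer claims per day
```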
Fig. 5. Simulation results. (a) Ordered long-run popularity Π(N·) scaled to 1 on a log-log scale; (b) long-term popularity distribution Φ = N⁻¹Π(N·) on a log-log scale. The differences between the long-run parameters (Π, Φ) of the simulated system and the theoretical parameters decrease as the number of videos increases. The relative popularity in Figure (a) means that on the x-axis the numbers x/N are depicted for x = 1, 2, ..., N (N = 50, 500, 5000), where x denotes the popularity rank of the video in decreasing order.
Our method is comparable to the method of Medisyn [6]. Medisyn starts the request generation with a given long-term popularity curve Π, then, for each video in the library, it generates a random type, which can be “news” or “movie”. The probability of a video being “news” depends on the popularity of the video. In their measurements the authors found that the “news” type videos
A. Kőrösi, B. Székely, and M. Máté
tend to be more popular than the "movie" type ones; therefore they included this bias in their generator. Once the type is known for a video, Medisyn generates a release day according to the given release day distribution (the authors of Medisyn consider both the intensity of releases and the interval between them). Then it selects a life-span function (popularity evolution) for the video with a randomly chosen parameter. This function is from an exponential family (its parameter is Pareto distributed) for the "news" type videos and from a lognormal family (its parameter is normally distributed) for the "movie" type ones. Finally, the total number of requests is distributed along the timeline according to the release day of the video and its life-span function. In this way the initial popularity defined in our model is also obtained implicitly. Therefore, irrespective of the randomly selected life span, Medisyn solves the problem presented in Section 2.3.
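For illustration, a Medisyn-style per-video generation step might look as follows. Every numeric constant here is an invented placeholder, not a value fitted in [6], and the curves only mirror the distribution families named in the text:

```python
import math
import random

def medisyn_style_video(rng):
    """One video in the spirit of Medisyn [6]: draw a type, a release day
    and a life-span (popularity evolution) function.

    All parameters are illustrative placeholders; in [6] the type
    probability depends on the video's popularity.
    """
    kind = "news" if rng.random() < 0.3 else "movie"
    release_day = rng.randrange(365)                   # uniform stand-in
    if kind == "news":
        lam = 0.5 * rng.paretovariate(2.0)             # Pareto-distributed parameter
        life_span = lambda d: math.exp(-lam * d)       # exponential family
    else:
        mu = rng.normalvariate(2.0, 0.5)               # normally distributed parameter
        life_span = lambda d: math.exp(-0.5 * (math.log(d + 1) - mu) ** 2) / (d + 1)
    return kind, release_day, life_span

rng = random.Random(7)
kind, day, span = medisyn_style_video(rng)
```

The total number of requests for the video would then be spread along the timeline proportionally to `span`, starting at `day`.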
5 Conclusions
We provided a stochastic model for finding the relationships among the following popularity descriptors: (1) the ordered long-term popularity, (2) the video type distribution, (3) the release day distribution, (4) the distribution of the initial popularity of each individual video and (5) the popularity change over time for each individual video. An important feature of our model is the possibility of constructing an approximation of any missing popularity descriptor, unless the conditions contradict each other. The missing parameter is the solution of an appropriate LP problem in all four cases (the ordered long-term popularity does not need to be approximated), thus we have four similar, but not identical, approximation schemes. The two most important of the four problems, from a practical point of view, are finding the initial popularity distribution and the popularity evolution for the content types. As the examples have shown, the approximation works well for finding the initial popularities; the results were very close to the original distribution. Finding suitable popularity evolution functions is much harder, and our procedure does not necessarily converge to the original functions. This is natural, since the popularity evolution has a great degree of freedom. Our model is designed so that one can easily generate realistic request patterns for simulating or testing media servers. We have shown that the more videos there are in the VoD system, the closer the parameters of the simulated system get to the theoretical ones. In the future we want to increase the accuracy of our approximations and try to find exact solutions for the missing parameters in special cases. We are also interested in finding a way to modify the model to take randomly occurring jumps in the popularity evolution into account. Acknowledgments. The work has been supported by HSNLab, Budapest University of Technology and Economics, http://www.hsnlab.hu
References
1. Breslau, L., Cao, P., Fan, L., Phillips, G., Shenker, S.: Web caching and Zipf-like distributions: evidence and implications. In: Proceedings of the Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 1, pp. 126–134. IEEE, Los Alamitos (1999)
2. Dantzig, G.B., Thapa, M.N.: Linear Programming 1: Introduction. Springer, Heidelberg (1997)
3. Guo, L., Tan, E., Chen, S., Xiao, Z., Zhang, X.: The stretched exponential distribution of internet media access patterns. In: Proc. of PODC 2008, Toronto, Canada (August 2008)
4. Mandelbrot, B.: Information Theory and Psycholinguistics. Penguin Books (1968)
5. Pallis, G., Vakali, A.: Insight and perspectives for content delivery networks. Communications of the ACM, 101–106 (January 2006)
6. Tang, W., Fu, Y., Cherkasova, L., Vahdat, A.: Modeling and generating realistic streaming media server workloads. Comput. Netw. 51(1), 336–356 (2007)
7. Yu, H., Zheng, D., Zhao, B.Y., Zheng, W.: Understanding user behavior in large-scale video-on-demand systems. In: Proc. of Eurosys 2006, Leuven, Belgium, pp. 333–344 (2006)
Sizing of xDR Processing Systems
Bálint Ary and Sándor Imre
Budapest University of Technology and Economics, Department of Telecommunications, Magyar Tudósok körútja 2., 1117 Budapest, Hungary
[email protected],
[email protected]
Abstract. Postpaid billing systems in most cases use offline charging methods to rate the calls. Since some latency is accepted, the throughput can be lower than the capacity required to process peak-hour traffic in real time. In this paper we give an efficient mathematical model to calculate the required processing power while taking the maximum queue size and maximum record age constraints into consideration.
1 Background
Billing systems in the telecommunication industry have different modules to fulfill the business processes from call pricing to the settlement of the bill. The module responsible for rating the calls is often called the rater. Mobile telecommunication companies generally offer two different payment methods (prepaid and postpaid) and use two different rating approaches (offline and online). Usually the payment method determines the rating approach and the IT system underneath: online charging requires real-time computation and thus more processing power, while offline rating has softer constraints on the processing time and on the capacity of the supporting IT infrastructure. Online charging is done while the call is in progress, via socket-based interfaces, while offline rating is based on files. Sizing of the real-time (online) system can be done with the help of queuing theory, and since the system must be capable of processing all calls in real time (even in peak periods), the sizing must take these busy periods into consideration. The records in offline rating are called call detail records, charging detail records or event detail records, and are often referred to as CDRs, EDRs or, more generally, xDRs. The price of a call made in the offline system is derived from the corresponding CDR, and since these records are sent to the billing system using some non-real-time protocol (FTP, for instance), there is no real-time requirement on these modules. This allows some latency during processing, so we can size the system below the capacity needed for peak periods. Even though queuing theory could be applied here with a time-varying arrival probability, in most cases the business is not interested in a few minutes' difference between processing times. This simplification allows us to observe and calculate the required processing power on a larger scale, using functions that represent the number of incoming CDRs and the processing power over time. R. Szabó et al.
(Eds.): AccessNets 2010, LNICST 63, pp. 62–70, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
Although we made some simplifications, offline rating must comply with several requirements. Usually the business is interested in the maximum age of the unprocessed CDRs in order to assess the fraud risk, and the operational team is interested in the maximum queue size in order to calculate the required disk space. In the next sections we present the mathematical formulas needed to compute the minimum processing power when the maximum queue size and the maximum latency are given. Finding the proper, not oversized processing power is beneficial, as it lowers the cost of the IT infrastructure as well as the software licensing fees. In this paper we discuss the incoming CDRs versus the computing capacity; however, the same equations and results can be used to size call centers against the incoming calls, as in most cases the same business requirements (with different values and functions) apply. Call centers, however, are far more sensitive to processing time differences, and the maximum age of a request in the queue must be kept low; since the processing time differences resulting from the call arrival distribution are significant there, our model must be used circumspectly. The available literature mainly deals with queuing theory when calculating the appropriate sizing for telecommunication and other queue-based processing systems [2][6]. In many cases [7] the estimated waiting time is calculated for the peak hour, or a constraint is given for the maximum waiting time [3][4], but the job or record vanishes from the queue after a certain amount of time, thus these models cannot be applied to telecommunication networks and call detail records. Some literature deals with call center sizing [1] and scheduling [5], which, as mentioned above, is more sensitive to processing time jitter, and thus those models should be used in such cases instead. In Section 2 we clarify the model and simplifications used, as well as the possible business requirements.
In Sections 3 and 4 we detail the queue size and record age constraints, respectively, while in Section 5 we present a simple case with simplified functions as an example for the calculations. Section 6 summarizes the results of this paper and outlines possible future work.
2 Assumptions and Requirements
The queue size and the maximum item age in a processing queue cannot be given or calculated in a closed mathematical form in general. Since we are calculating the aforementioned values for a specific system, we can make some assumptions that simplify the required formulas. We use two different functions to represent the main characteristics of the system: we denote the number of incoming CDRs over time by c(t), and the processing capacity of the system by p(t). The latter is measured in the number of processable CDRs. Thus, if p(t) ≥ c(t) for every t, then the system processes every CDR immediately, which (taking real-life examples into consideration) is a gross waste of resources and a textbook example of system oversizing. The terminology of charging and billing systems defines the processing window as a daily time period in which the rating system is up and running and capable
to process and rate incoming CDRs. This processing window normally starts when the nightly activities (often referred to as end of day, or EOD) are finished, and ends around midnight (when the EOD activities start again). The rating module must be turned off during the EOD period to ensure consistent rating and billing, since the various reference data modifications and billing activities take place at this time.
Fig. 1. General incoming CDR and processing power functions
The incoming number of CDRs can be represented with a general bell curve: the number of phone calls, sent SMSs and GPRS activities is low during the night and peaks at mid-day. The price of a call (or other service) is often different in these two (peak and off-peak) periods. Generally, the processing window starts when the number of incoming CDRs is low but rising, and ends when the number of records is decreasing. The maximum processing power generally does not exceed the number of incoming CDRs during the peak hour, thus the two functions intersect four times, as shown in Figure 1. The number of unprocessed CDRs in the processing queue increases as long as the processing power is less than the number of incoming CDRs, and decreases in every other case. In this paper, we assume the following:

AS1. The function representing the incoming CDRs (c(t)) and the function representing the processing power of the rating system (p(t)) resemble the functions shown in Figure 1; at least, the intersections and relative positions of the functions can be related to the displayed ones.
AS2. Both functions are day-independent. We do not distinguish between weekdays, holidays and working days, and we do not account for differences between consecutive days.
AS3. The scheduling of the CDRs in the processing queue is FIFO (first in, first out), which complies with the implementation of the available commercial telecommunication billing systems.
Generally, these rating systems must comply with various business requirements, as mentioned in Section 1. Some of them are mandatory from an engineering point of view; some are purely business, financial or security requirements. In this paper we present a sizing model in which the following three requirements are taken into consideration.

R1. The system shall be capable of processing the daily CDRs in one day. Moreover, the system shall have some spare capacity to process additional CDRs (taking Christmas or New Year's Eve into consideration, for example).
R2. The maximum number of unprocessed CDRs shall not exceed Q (a given IT parameter).
R3. The oldest unprocessed CDR shall not be older than K (a given business requirement) during the normal period. The system shall catch up (lower the oldest record age below this level) shortly after it is started.

To ease the further computations, let us distinguish five different areas (A, B, C, D and E) and five different moments (m1, m2, m3, m4 and m5), as displayed in Figure 1, as follows:

A Early morning area. The processing has not yet started, or the processing capacity is less than the number of incoming CDRs. The size of this area equals the number of CDRs added to the processing queue during this period.
B Morning area. The rater is up and running and the processing capacity exceeds the number of incoming CDRs. The size of this area equals the number of CDRs removed from the queue during this period.
C Peak area. The processing has started, but the number of incoming CDRs exceeds the processing capacity again. The processing queue is growing, and the increment equals the size of this area.
D Afternoon area. The number of incoming CDRs is below the processing capacity. The processing queue is shrinking.
E Night area. The system is shut down, but CDRs are still coming in. The processing queue grows by the size of this area.
m1 Start time.
The moment when the processing power exceeds the number of incoming CDRs in the morning. This is the end of area A and the start of area B.
m2 Peak start time. The moment when the number of incoming CDRs exceeds the processing power. This moment is around the start of the peak hour before noon. This is the end of area B and the start of area C.
m3 Off-peak start time. The moment when the processing power exceeds the number of incoming CDRs in the afternoon. This is the end of area C and the start of area D.
m4 Shutdown time. The number of incoming CDRs exceeds the processing power again. EOD will start shortly. This is the end of area D and the start of area E.
m5 Midnight. This is the end of the day, and the end of area E.
We can calculate the areas defined above with the help of c(t), p(t) and the moments as follows:
A = ∫_{0}^{m1} c(t)dt − ∫_{0}^{m1} p(t)dt    (1)
B = ∫_{m1}^{m2} p(t)dt − ∫_{m1}^{m2} c(t)dt    (2)
C = ∫_{m2}^{m3} c(t)dt − ∫_{m2}^{m3} p(t)dt    (3)
D = ∫_{m3}^{m4} p(t)dt − ∫_{m3}^{m4} c(t)dt    (4)
E = ∫_{m4}^{m5} c(t)dt − ∫_{m4}^{m5} p(t)dt    (5)
3 Queue Size
In this section we give mathematical formulas for the first two requirements mentioned in Section 2. In order to process the proper amount of CDRs in one day, we have to determine the processing capability satisfying the following inequality:

∫_{0}^{m5} p(t)dt > ∫_{0}^{m5} c(t)dt.    (6)
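The areas (1)-(5) and inequality (6) can be checked numerically. The sketch below is an assumed illustration: it uses a midpoint Riemann sum and step functions shaped like those of the example in Section 5, with a made-up capacity P = 9:

```python
def integrate(f, a, b, n=10000):
    """Midpoint Riemann-sum approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# assumed step functions (CDRs per hour)
c = lambda t: 2 if (t < 6 or t > 18) else 10     # incoming CDRs
p = lambda t: 0 if (t < 4.5 or t > 23) else 9.0  # processing capacity

# intersection moments for these particular functions
m1, m2, m3, m4, m5 = 4.5, 6.0, 18.0, 23.0, 24.0

A = integrate(c, 0, m1) - integrate(p, 0, m1)    # early morning backlog
B = integrate(p, m1, m2) - integrate(c, m1, m2)  # morning drain
C = integrate(c, m2, m3) - integrate(p, m2, m3)  # peak build-up
D = integrate(p, m3, m4) - integrate(c, m3, m4)  # afternoon drain
E = integrate(c, m4, m5) - integrate(p, m4, m5)  # night backlog

daily_ok = integrate(p, 0, m5) > integrate(c, 0, m5)  # inequality (6)
```

For these particular functions A = 9, B = 10.5, C = 12, D = 35 and E = 2, so the system has spare daily capacity.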
Using the areas defined in the requirements section, the following must hold:

Δ = −A + B − C + D − E > 0,    (7)

where Δ denotes the additional CDR processing capacity in one day if it is greater than 0; otherwise the first requirement (R1) is not met. We will prove that if Δ > 0, then there is no unprocessed CDR at m2 or m4. To do so, let us denote the number of unprocessed CDRs at the end of the day by R. The queue size increases before m1, between m2 and m3, and after m4; the queue size cannot be negative; and, due to assumption AS2, the value of R on the previous day equals the current value. Hence we can calculate R as:

R = max(0, max(0, R + A − B) + C − D) + E.    (8)
If R + A > B, then the queue is not empty at m2, since the unprocessed CDRs from the previous day plus the morning CDRs have not been processed by this moment; thus

R = max(0, R + A − B + C − D) + E.    (9)
If R + A − B + C − D > 0, then R = R + A − B + C − D + E, but the condition A − B + C − D + E < 0 (see equation (7)) rules out this possibility, leaving us only with the option R + A − B + C − D ≤ 0. In that case R = E, thus the processing queue is empty at m4. If R + A < B, then the queue is empty at m2, and we have the following equation for the queue size at the end of the day:

R = max(0, C − D) + E.    (10)
Thus, the queue size is either R = E if C ≤ D (which makes the processing queue empty at m4 as well), or R = C − D + E otherwise. It can easily be seen that the maximum queue size can be calculated as follows:

Qmax = max(E + A, C, E + A − B + C, C − D + E + A).    (11)
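Equations (8) and (11) translate directly into code; iterating (8) reaches the day-to-day fixed point implied by assumption AS2. The area values below are an assumed illustration (A = 9, B = 10.5, C = 12, D = 35, E = 2):

```python
def leftover(A, B, C, D, E, iters=100):
    """Steady-state leftover R at midnight, eq. (8): today's queue starts
    from yesterday's leftover, so we iterate until R stops changing."""
    R = 0.0
    for _ in range(iters):
        R = max(0.0, max(0.0, R + A - B) + C - D) + E
    return R

def q_max(A, B, C, D, E):
    """Maximum queue size during the day, eq. (11)."""
    return max(E + A, C, E + A - B + C, C - D + E + A)

A, B, C, D, E = 9.0, 10.5, 12.0, 35.0, 2.0  # hypothetical area values
R = leftover(A, B, C, D, E)  # 2.0: only the night CDRs are left over
Q = q_max(A, B, C, D, E)     # 12.5: the peak of the queue during the day
```

Here R = E, confirming that with spare daily capacity the queue is empty at m4.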
4 Constraint on Record Ages
According to the requirements, the system shall catch up with the CDRs early in the morning. More precisely, the system shall process all CDRs older than K between m1 and m2. If the queue is empty at m2, this requirement is straightforward. Otherwise (if the processing queue is empty only at m4), the requirement can be modeled with the following integral inequality:

E + ∫_{0}^{max(0, x−K)} c(t)dt ≤ ∫_{0}^{x} p(t)dt.    (12)
The processing function p(t) shall be capable of fulfilling this inequality under the condition m1 ≤ x ≤ m2. Let us denote the solution of this inequality (the minimal x that fulfills it) by G (the grace period). Taking the mid-day ageing requirement into consideration, we have to distinguish two cases. If the queue does not clear out until m2, then the above inequality can be used with the condition G ≤ x ≤ m4; otherwise the requirement is fulfilled trivially while x ≤ m2 + K, and for the rest of the time the following inequality can be used, where m2 + K ≤ x ≤ m4:

∫_{m2}^{x−K} c(t)dt ≤ ∫_{m2}^{x} p(t)dt.    (13)
If the queue is empty at m4, then the requirement is trivially fulfilled after m4 until m4 + K; moreover, it is fulfilled until N (the extension period), where N is the solution of the following integral equation for x ≥ m4 + K:

∫_{m4}^{x−K} c(t)dt = ∫_{m4}^{x} p(t)dt.    (14)
5 Example
Let us give an example for these calculations. We simplify both the incoming CDR function and the processing function, as shown in Figure 2; the processing power is denoted by P:

c(t) = 2 if t < 6 or t > 18, and c(t) = 10 if 6 ≤ t ≤ 18,    (15)
p(t) = 0 if t < 4.5 or t > 23, and p(t) = P if 4.5 ≤ t ≤ 23.    (16)
Fig. 2. Incoming CDRs and processing power
Our task is to calculate the value of P so that it fulfills the different requirements. The minimum processing power Pmin can be calculated from the main queuing-theory requirement: solving the inequality ∫ p(t)dt > ∫ c(t)dt gives

Pmin > 144/18.5 ≈ 7.78.    (17)

The maximum queue size can be calculated with (1)-(5) and (11), and is shown in Figure 3 as a function of P. We have drawn all four arguments of the max function, but only the one with the highest value at a given P is in effect. It can be seen that we cannot decrease the queue size below 11, and for smaller P values the branch Q = A + E − B + C is dominant. With the minimum required power the system operates with a queue size around 28.9.
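The arithmetic behind (17), and the claimed queue size of about 28.9 at the minimum power, can be checked directly:

```python
# total daily CDRs for c(t) of (15): 2/h off-peak (12 h) + 10/h peak (12 h)
total_cdrs = 2 * 6 + 10 * 12 + 2 * 6          # = 144
window = 23 - 4.5                              # processing window length, 18.5 h
p_min = total_cdrs / window                    # eq. (17): about 7.78

# queue size at P = p_min, dominant branch Q = A + E - B + C of eq. (11)
P = p_min
A = 2 * 4.5          # CDRs arriving before the rater starts at 4.5
E = 2 * 1            # CDRs arriving after the 23:00 shutdown
B = (P - 2) * 1.5    # queue drained between 4.5 and 6 o'clock
C = (10 - P) * 12    # queue built up during the 6-18 peak
Q = A + E - B + C    # about 28.9
```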
Fig. 3. Queue size
Fig. 4. Latency
Let us do some calculations with the following requirements: the maximum queue size shall not exceed 15, and every CDR shall be processed within 1.5 hours. Thus, we have the following equations for the queue size:

Q = A + E − B + C    (18)
Q = 9 + 2 − 1.5(PQ − 2) + 12(10 − PQ)    (19)
PQ = (134 − Q)/13.5 = (134 − 15)/13.5 ≈ 8.814,    (20)

and for the age requirement we get the strongest constraint for 6 + K < x ≤ 18 + K when using (12), since the queue is not empty at m2:
PK(x − 4.5) = 2 + 12 + 10(x − K − 6)    (21)
PK = 10 + (−10K − 1)/(x − 4.5),    (22)
and we need the highest P at x = 18 + K, thus:

PK = 10 + (−10K − 1)/(13.5 + K) = 134/(13.5 + K)    (23)
   = 134/15 ≈ 8.933.    (24)
The required power is the maximum of the powers calculated above, thus

Ptotal = max(Pmin, PQ, PK) = PK ≈ 8.933.    (25)
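The closed forms (20), (23)-(24) and (25) are easy to verify numerically:

```python
Q, K = 15.0, 1.5          # business requirements: max queue size and max age

p_min = 144 / 18.5                 # eq. (17)
p_q = (134 - Q) / 13.5             # eq. (20), queue-size requirement
p_k = 134 / (13.5 + K)             # eqs. (23)-(24), record-age requirement

p_total = max(p_min, p_q, p_k)     # eq. (25): the age requirement dominates
```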
6 Summary
In this paper we have summarized the nature of offline billing systems from a sizing point of view. We gave mathematical formulas for the possible business requirements and guidelines to calculate the required processing capacity. If the number of incoming CDRs over time is known, and we have constraints on the start and end time of the processing window, we can calculate the required processing capacity that fulfills the business requirements. This model can also be used to size the call centers of a telecommunication system if the number of incoming calls over time is known. The model can be further refined if the required background capacity (processing capacity that is not used to process CDRs but for other required activities) is known and differs from 0. Also, in most cases the number of CDRs on weekends and national holidays differs from that on regular working days. This fact, and additional business requirements, may affect the required processing power, and the calculation can be refined accordingly in future research.
References
1. Anderson, G.L., Flockhart, A.D., Foster, R.H., Mathews, E.P.: Queue waiting time estimation (EP0899673) (August 2003)
2. Daigle, J.N.: Queueing Theory with Applications to Packet Telecommunication. Springer Science, University of Mississippi (2005)
3. Graves, S.C.: The application of queueing theory to continuous perishable inventory systems. Management Science 28(4) (April 1982)
4. Rajabi, A., Hormozdiari, F.: Time constraint M/M/1 queue (2006)
5. Rottembourg, B.: Call center scheduling (2002)
6. Schoenmeyr, T., Graves, S.C.: Strategic safety stocks in supply chains with capacity constraints (2009)
7. Shtivelman, Y.: Method for estimating telephony system-queue waiting time in an agent level routing environment (6898190) (May 2005)
Modeling Self-organized Application Spreading
Ádám Horváth¹ and Károly Farkas¹,²
¹ University of West Hungary, Sopron 9400, Bajcsy-Zsilinszky u. 9., Hungary
{horvath,farkas}@inf.nyme.hu
https://inf.nyme.hu/~{horvath,farkas}
² Budapest University of Technology and Economics, Budapest 1117, Magyar Tudósok krt. 2., Hungary
Abstract. Information spreading in self-organized networks is a frequently investigated research topic today; however, the characteristics of application spreading that exploits the direct connections between user devices have not been widely studied yet. In this paper, we present our spreading model, in which we use Closed Queuing Networks to model the application spreading process; the model captures the users' behavior as well. We also give some simulation results to demonstrate the usage of our model. Keywords: application spreading, self-organized networks, mathematical modeling.
1 Introduction
Knowing the characteristics of application spreading is important for the application provider, first of all from an economic point of view. The provider has to know, or at least estimate, how much money he can realize from purchases of a given application in a given time. He should also know which factors influence the spreading process, and how. Traditionally, applications are distributed via a central entity, like an internet webshop: users can browse the internet and select, purchase and download the application software they like. However, the proliferation of modern communication paradigms, such as self-organized networks, can change the characteristics of application spreading. In such networks, users can communicate directly with each other, and direct application downloading is available. So, participants of the spontaneous communication can try out applications and get incentives to purchase the ones they liked. The purchase itself is available only via the traditional way, since secure payment in self-organized networks is a challenging issue today. Application spreading aided by self-organized networks has not got much attention yet. There are many factors in this type of communication which are not exploited from an economic point of view, such as community experience, e.g. with a multi-player game. These factors can have an effect on application
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 71–80, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
spreading as well, and give users more motivation to purchase a given application than seeing only some advertisements would. Therefore, spontaneous communication can become a new area of the software business. In this paper, we propose the use of Closed Queuing Networks (CQNs) [1] to model application spreading aided by self-organized networks. A CQN is a stochastic model, which is appropriate for describing the rapid change of the network topology. We can investigate the application spreading in a given population which is interested in the spontaneous communication. Moreover, CQNs allow us to assign transition intensities to state transitions, by which we can describe the time behavior of the spreading process. The technical details of application spreading, such as discovering nodes and deploying, managing and terminating the application software, are beyond the scope of this paper; these issues are detailed in other contributions, e.g. in [2]. Similarly, we do not focus on security issues, which can be found in other works, e.g. in [3], [4], [5] and [6]. The rest of the paper is organized as follows. In Section 2, we present our communication model. We describe our spreading model based on CQNs in Section 3. We present some simulation results in Section 4. We give an overview of the related work in Section 5. Finally, we give a short summary in Section 6.
2 Communication Model
To model the spreading process, we use the following communication model. We examine the spreading of a given multi-user application having two versions, a trial and a full version. We examine the application spreading in a given population, composed of the users who are interested in using the application. Uninterested individuals do not influence the spreading process, so we do not consider them as users; henceforth, we refer to the investigated population simply as users. We can categorize the users into different classes depending on whether they possess any version of the application or not. Since our model shows similarities with epidemic spreading models, we named the classes after the terminology of epidemics. We call a user infected if he possesses the full version of the application, susceptible if he possesses the trial version of it, and resistant if he possesses none of them or has already lost his interest in using it. Users communicate with each other, forming self-organized networks from time to time, and direct communication takes place between them. Users can download the trial version of the application and try it out only if there is at least one infected user in the same network. If a user likes the application, he can purchase it using a traditional purchasing channel, e.g., a webshop on the Internet (this phase is necessary, since secure payment and licensing in a self-organized environment is a challenging issue). Later, once a user has purchased the application, he can use it or even spread its trial version further. Susceptible users will be motivated to purchase the full version of the application only if there are limitations in using the trial version. Hence, we apply a
limit (leech limit) that restricts how many susceptible nodes (leeches¹) can connect to one infected node (seed¹). In this sense, we can consider the seeds as servers which can serve a limited number of clients. A seed must be an infected user, while a leech may be either infected or susceptible. Fig. 1 shows the case when a susceptible user purchases the application. The light and dark laptops depict susceptible users, while the PDA depicts an infected user. In this example, the leech limit is two, so only two susceptible users (the light laptops) can connect to the only infected user, and two users (the dark laptops) have to wait. After one of them purchases the application, the waiting users can use it, too.
Fig. 1. Change of application usage when a susceptible user purchases the application
Furthermore, we distinguish three different user types based on their behavior. TypeA users are interested in using the application, so if they like it, they will purchase it, possibly without trying it out. TypeB users are also interested in using the application, but they purchase it with a given intensity only if, from time to time, they cannot find a seed to connect to. TypeC users never buy the application, but their presence influences the spreading process.
3 Modeling Application Spreading
In this section, we present our spreading model and describe how we can use it.

3.1 Spreading Model
We use CQNs for modeling the spreading mechanism because they are appropriate for describing stochastic processes. Moreover, we can define the transition intensities of the state transitions, which allows us to investigate the time behavior of the spreading process.

¹ After the terminology of BitTorrent [7].
In our CQN model, the states represent the whole user population (Fig. 2). Each user is in one state, and a user's state depends on his current user class. The resistant users who possess neither the trial version of the application nor the full version of it are in state 0. We also call resistant those users who possess either the trial version (state 5) or the full version (state 6) of the application but have already lost interest in using it. The susceptible users who are currently not using the application (passive susceptibles) are in state 1, while the susceptible users who are currently using the application (active susceptibles) are in state 2. Similarly, the passive and active infected users are in states 3 and 4, respectively.
Fig. 2. The proposed CQN model for application spreading
The Greek letters in Fig. 2 represent the transition intensities of a single user; n_x represents the number of users in state x, while n_xA, n_xB and n_xC represent the number of TypeA, TypeB and TypeC users in state x, respectively (n_x = n_xA + n_xB + n_xC). The transition intensities are real numbers assigned to the state transitions and denote how many times a state transition takes place during a given time interval on average. The transitions of our model are described in Table 1. In our model, we do not consider the network topology as a key element of the application spreading; we assume that users can connect to each other, forming self-organized networks from time to time. They change their state depending on whether they have got any version of the application, and on whether they start to run the application or stop using it.
Modeling Self-organized Application Spreading
Table 1. Description of the State Transitions in our CQN

1→2: A susceptible user starts to run his application and tries to connect to an available seed in the network. If he cannot find one, he has to wait.
2→1: A susceptible user stops running the application.
3→4: An infected user starts to run the application. He can either try to connect to an available seed, or be a seed himself to which leeches can connect.
4→3: An infected user stops running the application.
0→1: A resistant user downloads the trial version of the application.
3→6: An infected user loses interest in using the application.
1→5: A susceptible user loses interest in using the application. The additional intensity γ represents that susceptible users lose interest faster, because they possibly have to wait.
0→3: A resistant user purchases the application. It is possible that someone purchases it without trying it out. In our model, we allow this state transition only for TypeA users; thus, n_0A denotes the number of TypeA users in state 0.
1→3: A susceptible user purchases the application. n_1A and n_1B denote the number of TypeA and TypeB users in state 1, respectively. This transition is enabled for TypeB users only when they cannot find a free seed in the network to connect to. Therefore, the indicator variable i is one if at least one free seed is available, and zero otherwise. Since TypeC users never purchase the application, this transition is not allowed to take place for them. However, TypeC users can also connect to seeds, so their presence decreases the probability that TypeB users can find a free seed.
5→1: A resistant user who lost interest in the trial version wants to use the application again after a while; his state changes to susceptible.
6→3: Similarly, a resistant user who possesses the full version of the application wants to use it again after a while; his state changes to infected.

3.2
Usage of the Spreading Model
We can unambiguously describe the system by the user distribution (n_0, n_1, n_2, n_3, n_4, n_5, n_6). The transition intensity values with regard to a single user (α, β, γ, δ, ε, φ, λ, μ, ν, ρ and ξ) are the parameters of our model, which we can set experimentally. The holding time h can be computed in each system state in the following way:

h = 1 / Σ_{∀ state x} out_x    (1)
where out_x denotes the sum of the intensities of the transitions leaving state x. The holding time h is the time that the system spends in a given system state. We compute it in each system state, since the transition intensities from which we derive it change after each transition due to the user distribution changes. The next system state can be generated based on the ratio of the transition intensity values. For example, the probability that the next transition in the system will be transition 1→2 is n_1·ρ·h. After we have selected the transition that takes place, we move one user from the source state to the destination state of that transition and compute the holding time of the new system state, and so on. If a user reaches state 5 or 6, it means that he has lost interest in using the application. We allow users to return also from these states, since it is possible that they will be interested again later. However, we allow these transitions only with low intensity values, to ensure that the system will sooner or later reach the state (final system state) in which all users are in either state 5 or state 6. The final system state is not the steady state of the system, but our investigations finish when we reach it, because all users have lost interest in using the application. In this state, the holding time is very large, since we set ν to a low value. Therefore, the further transitions need a very long time to take place, meaning that the interest in using and purchasing the application is very low. The number of occurrences of transitions 1→3 and 0→3 determines how many copies of the application software were sold until we reached the final system state, while the duration of the spreading process can be determined by summing the holding times. By using simulator software, we can evaluate the simulation results and learn the dynamics of the spreading process and the attitude of the different user types.
4 Simulation Results
In this section, we present some simulation results to demonstrate how the spreading mechanism takes place. To obtain the simulation results, we have developed Java-based simulator software. The simulator works as follows. In the initial system state, when every node is in state 0, it computes the holding time and stores it. The next system state is generated by selecting the transition that takes place. The selection works by using random numbers, which are weighted with the intensity values of the transitions. The simulator moves one user from the source state to the destination state of the selected transition. We compute and store the holding time in the next system state as well, and so on. The whole process terminates when we first reach the final system state. In every system state, we can store statistics, and by combining them with the holding times we can investigate the time behavior of the spreading process. In the following, we describe two scenarios, in which we ran the simulator with different parameters. The parameters are collected in Table 2. The unit of the parameters denoted by Greek letters in Table 2 is 1/hour.
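The simulator's main loop can be sketched as follows. This is a minimal Python illustration rather than the authors' Java tool; the toy one-transition model at the bottom (users leaving state 0 with per-user intensity alpha) is our own example:

```python
import random

def simulate(rates_fn, state, is_final, rng, max_steps=100_000):
    """Loop of the simulator: in each system state compute the holding
    time h = 1 / (sum of aggregate transition intensities), pick the
    next transition with probability proportional to its intensity,
    move one user accordingly, and stop at the final system state.
    Returns (final state, total elapsed time)."""
    elapsed = 0.0
    for _ in range(max_steps):
        if is_final(state):
            break
        rates = rates_fn(state)              # list of (apply_fn, intensity)
        total = sum(rate for _, rate in rates)
        elapsed += 1.0 / total               # holding time of this state
        u, acc = rng.uniform(0.0, total), 0.0
        for apply_fn, rate in rates:
            acc += rate
            if u <= acc:
                state = apply_fn(state)
                break
    return state, elapsed

# Toy model: n0 users leave state 0 with per-user intensity alpha, so the
# aggregate intensity is n0 * alpha (cf. the n1*rho*h example in the text).
alpha = 1e-3
final_state, hours = simulate(
    rates_fn=lambda n0: [(lambda n: n - 1, n0 * alpha)],
    state=10, is_final=lambda n0: n0 == 0, rng=random.Random(1))
```

Summing the holding times (`elapsed`) gives the duration of the process, as described above.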
Table 2. Simulation Parameters

Parameter     Simulation 1   Simulation 2
α             10^-3          3·10^-3
β             10^-3          10^-3
γ             10^-3          10^-3
δ             5·10^-3        5·10^-3
ε             10^-3          10^-3
φ             10^-5          10^-5
λ             2.5·10^-2      2.5·10^-2
μ             1              1
ρ             2.5·10^-2      2.5·10^-2
ξ             1              1
ν             10^-9          10^-9
nA            300            300
nB            400            400
nC            300            0
leech limit   2              2
In this paper, we do not aim to find the correct set of parameters, which is a complex and hard challenge that we plan to tackle in the future. The parameters depend on many things, such as the popularity or the price of the given application; thus, they can be set experimentally. In these simulations, we tried to set the parameters as realistically as possible based on common sense. For example, λ = 2.5·10^-2/hour means that an infected user starts the application once every 40 hours on average, while μ = 1/hour means that he uses it for 1 hour on average. We repeated both simulations 10 times and got similar results; therefore, we randomly picked one run in each case for investigation. In Simulation 1, 3 users purchased the application without trying it out, while 215 TypeA users and 74 TypeB users purchased it after trying it out. Thus, the total number of purchases was 292. The duration of the process was about 8000 hours, which is almost one year; however, the interest in purchasing the application was very low after 5300 hours (Fig. 3). In Simulation 2, we set the number of TypeC users to zero to demonstrate how they influence the number of purchases by TypeB users. Moreover, we tripled the value of the transition intensity α, which accelerates the spreading of the trial version and the whole spreading process. Fig. 4 shows that the number of purchases by TypeB users decreased to 63; therefore, the total number of purchases decreased to 280. This can be explained by the absence of TypeC users: TypeB users found an available seed to connect to more frequently, and their motivation to purchase the application decreased. Since we increased the value of the transition intensity α, the simulation reached the final system state after 6270 hours.
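The stated interpretation of λ and μ is just the reciprocal relationship between an intensity and its mean waiting time, which can be checked directly:

```python
# Values taken from Table 2; an intensity of x events per hour implies
# a mean waiting time of 1/x hours between events.
lam = 2.5e-2   # lambda, 1/hour: intensity of starting the application
mu = 1.0       # mu, 1/hour: intensity of stopping the application

mean_hours_between_starts = 1 / lam   # about 40 hours between app starts
mean_usage_hours = 1 / mu             # about 1 hour of usage per session
```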
Fig. 3. The size of user classes and the number of purchases – Simulation 1
Fig. 4. The size of user classes and the number of purchases – Simulation 2
5 Related Work
Investigating application spreading has not received much attention so far; however, it is becoming more and more important with the proliferation of modern communication paradigms. On the other hand, epidemic spreading is a popular research topic today and deals with issues similar to ours. In [8], the authors investigate the propagation of a virus in a real network. They present a model to determine an epidemic threshold in the network, below which the number of infected nodes decreases exponentially. The threshold is derived from the adjacency matrix of the network. In [9], the authors use scale-free networks to model the spreading of computer viruses and also give
an epidemic threshold, which is an infection rate. Information spreading is also modeled by epidemic spreading models, such as the susceptible-infected-resistant (SIR) model [10], or by other models based on the network topology [11], [12]. In [13], the spreading of malicious software over mobile ad hoc networks is investigated. The authors propose the use of the susceptible-infected-susceptible (SIS) model on the basis of the theory of closed queuing networks. In [14], the authors propose the commercial use of ad hoc networks and present a radio dispatch system using mobile ad hoc communication. In the proposed system, the network topology is the key element of the information dissemination. In our model, we do not consider the network topology as a key element of the application spreading, since no real-time information dissemination is needed between the users. The above-mentioned papers do not deal with application spreading and do not capture the users' behavior; moreover, except for [14], they do not address the commercial use of self-organized networks, and even the authors of [14] consider information dissemination as a tool, not as a goal.
6 Summary
In this paper, we investigated application spreading aided by spontaneous communication. We proposed a CQN model to describe the application spreading process, in which we assumed that users support the spreading process by distributing the trial version of an application. We categorized the users into different classes based on their behavior. Finally, we gave some simulation results to demonstrate the usage of our model. In the future, we plan to elaborate on setting the model parameters as realistically as possible and to investigate other tools, such as Stochastic Petri Nets, to refine our spreading model.
Acknowledgements

This work has been partially supported by the Hungarian Scientific Research Fund (OTKA, PD 72984).
References

1. Robertazzi, T.G.: Computer Networks and Systems: Queuing Theory and Performance Evaluation. Springer, New York (1994)
2. Plattner, B., Farkas, K.: Supporting Real-Time Applications in Mobile Mesh Networks. In: MeshNets 2005 Workshop, Budapest, Hungary (2005)
3. Čapkun, S., Buttyán, L., Hubaux, J.-P.: Self-Organized Public-Key Management for Mobile Ad Hoc Networks. IEEE Transactions on Mobile Computing 2(1) (2006)
4. Hu, Y.-C., Johnson, D.B., Perrig, A.: Secure Efficient Distance Vector Routing in Mobile Wireless Ad Hoc Networks. In: 4th IEEE Workshop on Mobile Computing Systems and Applications (WMCSA), Callicoon, New York, USA (2002)
5. Hu, Y.-C., Perrig, A., Johnson, D.B.: Ariadne: A Secure On-Demand Routing Protocol for Ad Hoc Networks. In: 8th ACM International Conference on Mobile Computing and Networking (MobiCom), Atlanta, Georgia, USA (2002)
6. Sanzgiri, K., Dahill, B., Levine, B.N., Shields, C., Belding-Royer, E.M.: A Secure Routing Protocol for Ad Hoc Networks. In: 10th IEEE International Conference on Network Protocols (ICNP), Paris, France (2002)
7. Cohen, B.: Incentives Build Robustness in BitTorrent. In: 1st Workshop on Economics of Peer-to-Peer Systems, UC Berkeley, California, USA (2003)
8. Wang, Y., Chakrabarti, D., Wang, C., Faloutsos, C.: Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint. In: 22nd International Symposium on Reliable Distributed Systems (SRDS 2003), Florence, Italy, pp. 25–34 (2003)
9. Pastor-Satorras, R., Vespignani, A.: Epidemic Spreading in Scale-Free Networks. Phys. Rev. Lett. 86, 3200 (2001)
10. Fu, F., Liu, L., Wang, L.: Information Propagation in a Novel Hierarchical Network. In: 46th IEEE Conference on Decision and Control, New Orleans, USA (2007)
11. Khelil, A., Becker, C., Tian, J., Rothermel, K.: An Epidemic Model for Information Diffusion in MANETs. In: 5th ACM International Workshop on Modeling Analysis and Simulation of Wireless and Mobile Systems, Atlanta, Georgia, USA (2002)
12. Sekkas, O., Piguet, D., Anagnostopoulos, C., Kotsakos, D., Alyfantis, D., Kassapoglou-Faist, C., Hadjiethymiades, S.: Probabilistic Information Dissemination for MANETs: the IPAC Approach. In: 20th Tyrrhenian Workshop on Digital Communications, Pula, Italy (2009)
13. Karyotis, V., Kakalis, A., Papavassiliou, S.: Malware-Propagative Mobile Ad Hoc Networks: Asymptotic Behavior Analysis. Journal of Computer Science and Technology 23(3), 389–399 (2008)
14. Huang, E., Hu, W., Crowcroft, J., Wassell, I.: Towards Commercial Mobile Ad Hoc Network Application: A Radio Dispatch System. In: 9th Annual International Conference on Mobile Computing and Networking, San Diego, California, USA (2003)
ACCESSNETS 2010
Technical Session 3: Next Generation Wired Broadband Networks
Passive Access Capacity Estimation through the Analysis of Packet Bursts Martino Fornasa and Massimo Maresca Centro di ricerca sull’Ingegneria delle Piattaforme Informatiche University of Genova, University of Padova - Italy {martino.fornasa,massimo.maresca}@dei.unipd.it
Abstract. Downlink capacity is the most advertised quality parameter of broadband Internet access services, as it significantly influences the user perception of performance. This paper presents an automatic method for computing this capacity from a measurement point located inside the network. The method is fully passive, as it takes advantage of existing TCP connections: it does not inject additional traffic into the network and does not require end-host collaboration. The method takes advantage of the bursty nature of TCP to apply the packet-dispersion approach to TCP segment sequences (packet trains) rather than to segment pairs. This results in a significant reduction of the impact of noise on rate estimation. We present an analysis of the effects of interfering traffic in the access link on rate estimation. We show that it is possible to detect and drop the TCP packet trains affected by interfering traffic and to identify and process the packet trains that are not affected by it. The proposed method has been validated by means of a set of experiments on ADSL and fibre Internet access services, which are described in the paper. Applications of the proposed method are i) to provide Internet Service Providers with a passive SLA verification method toward Access Service Providers, ii) to support widespread Internet access capacity measurement campaigns, and iii) to perform constant monitoring of access links for fault detection.

Keywords: Broadband Access Service, Capacity, Passive Estimation.
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 83–99, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011

1 Introduction

Downlink capacity, i.e., the maximum achievable downlink network-layer rate, is the most advertised quality parameter of broadband Internet access services, as it significantly influences the user perception of application service performance. This paper proposes a passive method to estimate the downlink capacity of an access link to a TCP/IP network from a measurement point located inside the network, by taking advantage of the existing TCP connections. The method suits a variety of scenarios, the most relevant of which is the one in which a service provider wants to estimate the quality of the access service provided by another provider, called the Access Service Provider. This is what very often happens in Internet service provisioning, in which the access service is operated by an Access Service Provider (ASP), typically the incumbent operator or a local telephone
company. The method proposed in this paper enables Internet Service Providers (ISPs) to estimate the downlink access capacity provided by the ASP passively and autonomously, without the cooperation of the ASP and without the cooperation of the final customer. Additionally, the method supports large-scale measurement campaigns aimed at characterizing broadband access link capacity, and it supports access link fault detection. The method takes advantage of the bursty nature of TCP and applies the packet-dispersion technique to the acknowledgement segments (ACKs) generated by TCP data segment sequences (packet trains) rather than by TCP packet pairs. To our knowledge, the method is the first effective narrow-link capacity estimation method that is both passive (it does not inject traffic on the network) and remote-based (it relies on the ACK packet passing times measured in a different location with respect to the narrow link)¹. In order to obtain a method which is both passive and remote, we process packet trains rather than packet pairs, as longer packet sequences allow us to reduce the impact of the noise corresponding to the delay jitter of the ACK upstream path. However, packet trains last longer than packet pairs and are therefore more subject to cross-traffic. We propose a method to detect and drop the packet trains affected by interfering traffic, both in the uplink access queue and in the downlink access queue. The proposed approach was validated through a set of experiments performed over ADSL and fibre access lines under different traffic conditions. This work is a continuation and extension of the authors' earlier work presented in [11].
2 Background

Among the many capacity estimation methods proposed in the past, we focus on the packet dispersion method [1, 2, 5, 6]. Such a method is based on the observation that the dispersion (i.e., the time difference between the last bit of the first packet and the last bit of the second packet) of a pair of equally-sized back-to-back packets traversing a link can be modified along the source-destination path. In general, the dispersion (d) of a back-to-back pair after a link of capacity r is d = w / r, where w is the size of the two back-to-back packets. Using such a formula, it is possible to calculate the link capacity as r = w / d. The formula is valid assuming that no interfering traffic is transported over the link. On the contrary, interfering traffic on the link changes the packet dispersion and leads to a rate estimation error. In the absence of interfering traffic, the dispersion of two back-to-back packets that traverse a path is the one induced by the path's "narrow link" (i.e., the link having the smallest capacity on the path). The packet dispersion method has also been used in TCP Westwood [10] in order to estimate the fair share bandwidth for a TCP connection. Capacity estimation techniques can be active or passive. Active techniques rely on active probing and therefore require the injection of traffic on the network, whereas passive techniques only rely on traffic observation (traffic traces). As a consequence, passive techniques enable non-invasive capacity estimation of large numbers of access links as well as long-lasting measurement campaigns.
¹ Active-remote techniques have been proposed in the past ([8, 9]).
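The packet-dispersion relation r = w / d is a one-line computation; the following toy calculation (our own illustration, not the paper's tool) shows the unit handling:

```python
def capacity_from_dispersion(packet_size_bytes, dispersion_s):
    """Packet-dispersion estimate of the narrow-link capacity, r = w / d,
    in bits per second. Valid only when no interfering traffic altered
    the dispersion d of the two equally-sized back-to-back packets."""
    return packet_size_bytes * 8 / dispersion_s

# A pair of 1500-byte packets observed with 1.5 ms dispersion
# corresponds to an 8 Mbit/s narrow link.
r = capacity_from_dispersion(1500, 0.0015)
```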
Passive packet dispersion methods can be classified depending on the point where the measurement equipment is placed. Receiver-side techniques are based on measurements taken at the receiver, sender-side techniques are based on measurements taken at the sender, and network-side techniques are based on measurements taken at a network node located in the path between the sender and the receiver. If the measurement point is on the sender side or on the network side, the packet dispersion must be estimated by observing the existing TCP connections, and more specifically by using the timestamps of the TCP acknowledgement segments taken at the measurement point to estimate the interarrival times of the corresponding forward data segments at the receiver. In general, the capacity estimation methods based on the ACK interarrival times are more complex than the receiver-based methods, for the following reasons:

1. The traffic that passes through a path that does not include the measurement point (interfering traffic) can disturb the measurement.
2. The jitter on the upstream path delay may modify the ACK dispersion.
3. The TCP ACKs are sent according to the delayed ACK scheme, i.e., the TCP protocol acknowledges more than one data packet at a time.
4. Congestion on the uplink queue leads to a decreased ACK pair dispersion, namely the ACK compression phenomenon.
3 Proposed Approach

A general issue impacting the accuracy of the packet-pair dispersion approaches is that the w/r ratio is small compared to the network delay jitter. This is caused by the fact that the Maximum Segment Size (MSS) of TCP is about 1500 bytes for legacy reasons, irrespective of the ever increasing capacity of networks. This issue can be mitigated by the adoption of a more general approach that consists of considering longer packet sequences, usually called packet trains. The dispersion of a packet train composed of n packets [0, 1, …, n-1] of size w_i will be:

d_{0,n-1} = (1/r) · Σ_{i=1}^{n-1} w_i

The above formula is valid assuming no interfering traffic on the link. Considering packet trains allows obtaining capacity estimates that are less influenced by measurement noise. However, some issues arise when considering packet trains, as stated in [2]. The authors correctly argue that the longer a packet sequence, the larger the probability that interfering traffic causes increased dispersion. We solve this issue by means of a method aimed at discarding the packet trains affected by interfering traffic, both in the downstream access link and in the upstream access link. We show that it is possible to detect and drop the packet trains affected by interfering traffic. The proposed method is based on the following assumptions:

1. There are no post-narrow links, since we are considering the access link, i.e., the last downstream link (and usually the narrow link of the downstream path).
2. The capacity of the access link is far below the capacities of the backbone links.
3. The majority of TCP data segments are approximately 1500 bytes long. In fact, as stated in [4], the packet size distribution on the Internet is mostly bimodal at 40 bytes (TCP ACK segments) and 1500 bytes (TCP MSS segments).
Assumption 1 excludes the increased dispersion of packet trains caused by post-narrow links. Assumption 2 allows us to isolate the effect of the upstream path delay jitter as a symmetric noise. Assumption 3 allows us to come up with heuristics aimed at filtering out the packet trains influenced by interfering traffic on the downstream path.
4 Reference Model

We consider a reference model (shown in Fig. 1) in which a passive Traffic Monitoring System (TMS) is placed on a specific interface of a Measurement Node (MN). The MN is connected to a Customer Premises Gateway (CPG) by means of a chain of links and nodes. The MN can be placed somewhere inside an Internet Service Provider network, at the border of such a network (for example, in a Neutral Access Point facility), or at a network endpoint, for example in a content provider's premises. We consider the access service in place between the Service Provider Remote Access Server (SP-RAS) and the CPG. The access link downstream and upstream capacities can be equal (symmetric access, such as a fibre or HDSL access) or different (asymmetric access, such as an ADSL). We are interested in measuring the access downstream capacity taking advantage of any existing TCP connection, so we consider the TCP half connection in which the end-user host (the one attached downstream of the CPG) acts as a receiver, and a host placed upstream with respect to the MN acts as a sender; i.e., we consider TCP data segments flowing toward the CPG and TCP ACK segments coming from the CPG. This corresponds to the usual case in which the end-user host acts as a client of a server on the Internet. The TMS captures the TCP segments passing through the MN interface, matches each TCP data segment with the corresponding ACK segment based on the TCP sequence number, and fills out an array of (packet size, TCP ACK passing timestamp) pairs:

(w_0, t_0^a) (w_1, t_1^a) (w_2, t_2^a) … (w_{N-1}, t_{N-1}^a)

where w_i is the data segment IP total size, t_i^a is the ACK segment passing timestamp and N is the total number of TCP ACK segments during the observation period. Such an array is the input of the capacity estimation algorithm presented in the next section. In some cases TCP does not send an ACK segment for each data segment received, due to the delayed acknowledgment technique.
This issue will be discussed in Section 7.
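A minimal sketch of how such an array could be assembled from captured segments follows; the tuple layout and the linear search are our own simplifications (a real TMS would work on live captures and handle retransmissions, wraparound, etc.):

```python
def build_ack_array(data_segments, ack_segments):
    """Pair each data segment with the ACK that acknowledges it.

    data_segments: list of (seq, size), where seq is the sequence number
      of the last byte carried by the captured TCP data segment.
    ack_segments: list of (ack_no, timestamp) for the captured ACKs,
      in capture order. An ACK carrying ack_no acknowledges all bytes
      with sequence numbers below ack_no.
    Returns the (size, ACK timestamp) array used by the estimator.
    """
    pairs = []
    for seq, size in data_segments:
        for ack_no, ts in ack_segments:
            if ack_no > seq:          # first ACK covering this segment
                pairs.append((size, ts))
                break
    return pairs

data = [(1500, 1500), (3000, 1500)]        # (last byte seq, IP size)
acks = [(1501, 0.010), (3001, 0.012)]      # (ack number, timestamp)
pairs = build_ack_array(data, acks)
# -> [(1500, 0.010), (1500, 0.012)]
```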
Fig. 1. Description of the reference model (MN: Measurement Node; SP-RAS: Service Provider Remote Access Server; TMS: Traffic Monitoring System; CPG: Customer Premises Gateway)
5 Analysis: No Interfering Traffic

In this section we analyze the behaviour of the reference model described in Section 4 in order to devise a downlink access capacity estimation method. In order to do so, we make two simplifying assumptions, which will be removed in the following sections:

1. No interfering traffic on the access downlink queue, i.e., all the traffic that passes through the access link also passes through the TMS.
2. No congestion on the access uplink queue, i.e., acknowledgment segments never queue on the uplink access queue; thus the uplink access link gives a fixed contribution to the data/acknowledgment pair round-trip time².
We consider a data segment sequence [i, i+1, …, j-2, j-1] where all but the first segment arrive at a non-empty queue. We call such a sequence a "Packet Burst" (PB). More precisely, we suppose now that:

• Before the arrival of data segment i, the access downlink queue is empty (Fig. 2a).
• The queue does not empty up to the arrival of segment j-1 (Fig. 2b).
• Before the arrival of segment j, the queue is empty (Fig. 2c).
Fig. 2. Packet burst
² This hypothesis is justified by the fact that the size of a data packet can be up to 1500 bytes, whereas the size of an ACK packet is around 40 bytes. So, in the absence of data traffic originating downstream of the access link, symmetric access links never show queueing on the uplink, and asymmetric access links are always dimensioned so as to avoid this phenomenon.
We split the analysis in two phases. First, we write an expression for the arrival time at the MN of the ACK segments triggered by the data segments forming the PB (t^a). Second, we write an expression for the arrival time of the j-th ACK segment, the first that does not belong to the PB.

During the PB. During a PB we can write an expression for the interarrival time at the receiver (dispersion) between data segment k-1 and data segment k as:

t_k^r − t_{k−1}^r = w_k / r,  ∀k: i < k < j

where t_k^r is the arrival time of the k-th data segment at the receiver, w_k is its total IP size and r is the downlink access capacity. Now, we can write an expression for the arrival time of the generic k-th segment at the receiver as:

t_k^r = t_i^r + (1/r) · Σ_{l=i+1}^{k} w_l,  ∀k: i < k < j

It is worth noting that during a PB the interarrival times at the receiver are not influenced by the downstream delay jitter (i.e., the jitter on the delay needed by a data segment to travel from the MN to the access downlink queue). The arrival of a data segment at the TCP receiver causes the generation of an ACK segment that flows back to the sender. The arrival time of the k-th ACK segment at the MN (t_k^a) can be written as the sum of the arrival time of the data segment at the receiver (t_k^r), plus the network upstream delay (T), plus a noise component due to the upstream delay jitter (ξ_k):

t_k^a = t_k^r + T + ξ_k = t_i^r + (1/r) · Σ_{l=i+1}^{k} w_l + T + ξ_k,  ∀k: i < k < j

The above formula can be simplified by subtracting the arrival time of the first ACK of the PB, t_i^a = t_i^r + T + ξ_i:

t_k^a = (t_i^a − ξ_i) + (1/r) · Σ_{l=i+1}^{k} w_l + ξ_k,  ∀k: i < k < j    (1)

Now we define x_k ≡ Σ_{l=i+1}^{k} w_l and y_k ≡ t_k^a. Thus, as long as the queue does not empty, the (x_k, y_k) points are approximately arranged on a line with slope 1/r and y-intercept (t_i^a − ξ_i). The reciprocal of the slope of such a line represents the capacity of the downlink access queue (r). In order to obtain the fitting line parameters, it is possible to apply the linear regression method to these points. The fitting line is represented in Fig. 3 as a dotted line. So, during a PB, the capacity of the downlink access queue can be obtained as the inverse of the slope obtained by applying linear regression over the PB points:

r = 1 / lin_regr_slope({x_i, x_{i+1}, …, x_{j−1}}, {y_i, y_{i+1}, …, y_{j−1}})
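The regression step can be sketched without external dependencies; the following plain least-squares helper is our own illustration of Equation (1), not the paper's implementation:

```python
def capacity_from_burst(sizes_bits, ack_times):
    """Estimate r as the reciprocal of the least-squares slope of the
    (x_k, y_k) points, where x_k is the cumulative size of segments
    i+1..k (x_i = 0) and y_k is the ACK arrival timestamp."""
    xs, acc = [0.0], 0.0
    for w in sizes_bits[1:]:
        acc += w
        xs.append(acc)
    ys = ack_times
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 / slope

# Four 12000-bit (1500-byte) segments whose ACKs arrive 1.5 ms apart:
# the estimate is about 8 Mbit/s.
r = capacity_from_burst([12000] * 4, [0.0, 0.0015, 0.0030, 0.0045])
```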
After the Packet Burst. The data segment j arrives at an empty queue, so the point (x_j, y_j) is not aligned with the previous points; it is shifted upward by a quantity Δ (see Fig. 3). In general, given a sequence of pairs (w_k, t_k^a), it is possible to identify a set of PBs where a linear relationship exists. As shown in Fig. 4, such PBs form a set of fitting lines with different y-intercepts but the same slope (1/r).
Fig. 3. Linear relationship

Fig. 4. Linear relationship: multiple PBs
5.1 Packet Burst Identification Algorithm

We propose an algorithm to identify the PBs during a given observation period, taking advantage of the linear relationship devised in Equation (1). The algorithm input is the (t_i^a, w_i) array obtained at the TMS, while the output is the set of capacities (r_0, r_1, …, r_{P−1}) associated with the maximum-sized PBs. The capacity associated with each PB is obtained by linear regression over the PB (i.e., the reciprocal of the slope of the fitting line). The algorithm (see the pseudocode in Fig. 5) consists of successive tests over increasing sequences of pairs, to find the maximum-sized sequence of pairs showing a "good" fit to the linear model described by Equation (1). It starts by considering the subsequence composed of the first three pairs ([m, n], m = 0, n = 2). At each iteration, the algorithm:
• Calculates the {x_k, y_k} points over the considered interval, i.e., {x_k, y_k} for k = m to n, according to the x_k and y_k definitions provided in the previous section.
• Performs a fit test on such points. Then:
  o If the fit is bad, the interval is shifted up by one (m←m+1; n←n+1) and the next iteration is started.
  o If the fit is good, the capacity value associated with such an interval is saved, the interval is enlarged by one (n←n+1), and a new iteration is started; if the fit on the larger interval is good, the interval is enlarged another time, and so on. However, if the fit on the larger interval is bad, the last valid capacity value (the one found in a previous iteration) is retained, and the next three-element interval is selected.
At the end of the iterations, the algorithm has identified several PBs, each characterized by a capacity value.

Goodness of fit. In principle, the linear regression coefficient of determination (R²), calculated over the considered interval, could be used to discriminate between a good and a bad fit to the linear model³. However, we noticed that R² has drawbacks on long (n > 4) TCP segment sequences: it tends to join successive PBs that are split by a non-PB point, because a single nonlinearity can be hidden when a large number of squared residuals are summed. To overcome this problem, we apply a different fit evaluation method, based on the instantaneous rate (ρ), defined as the ratio between the increase in x and the increase in y of the PB points:
ρk = (xk − xk−1) / (yk − yk−1),   ∀k : m < k ≤ n
The linearity condition can be detected by checking that all the instantaneous rates over the PB are equal. So, for every [m, n] segment interval, we calculate the above ratios and consider a fit good when the difference between each ρk value and the mean of the ρk over the interval is below a given threshold; in particular, the condition is:

|ρk − mean(ρk)| ≤ C · mean(ρk),   ∀k : m < k ≤ n
In this way we define a range around the mean value of the instantaneous rates, and all the rate values on the interval have to fall within that range. The C value must be tuned; in our tests we found that a value of 0.2 (giving a ±20% range) is appropriate. In summary, the algorithm exploits the linearity of the ACK generation times during a PB to identify all the PBs during the measurement period and to estimate a downlink capacity for each of them.

³ The coefficient of determination is defined for a [m, n] interval as follows:

R² ≡ 1 − Σk=m..n (yk − ŷk)² / Σk=m..n (yk − ȳ)²

where ŷk is the value predicted by the linear model (i.e., ŷk ≡ tma + (1/r)·xk) and ȳ is the mean of the yk (ȳ ≡ Σk=m..n yk / (n − m + 1)). The coefficient of determination lies between 0 and 1, where 1 means that the fit line passes exactly through the measured points.
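The ρ-based fit test above can be sketched in a few lines of Python; `is_good_fit` is a hypothetical helper name, and x and y are the cumulative-size and ACK passing-time coordinates defined in the previous section.

```python
from statistics import mean

def is_good_fit(x, y, C=0.2):
    """Accept the interval as linear when every instantaneous rate
    rho_k = (x_k - x_{k-1}) / (y_k - y_{k-1}) lies within a +/-C band
    around the mean of the rates (C = 0.2 gives the +/-20% range)."""
    rhos = [(x[k] - x[k - 1]) / (y[k] - y[k - 1]) for k in range(1, len(x))]
    m = mean(rhos)
    return all(abs(r - m) <= C * m for r in rhos)
```

A sequence whose points fall on one line of slope 1/r passes the test; a single off-line point breaks it, which is exactly the behaviour the R²-based test lacks.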
Fig. 5. Pseudocode of the Packet Burst identification algorithm
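Since the pseudocode figure is not reproduced here, the following Python sketch (assumed structure, hypothetical names) illustrates the sliding/enlarging-interval logic described above: slide a three-element window until a good fit appears, enlarge it while the fit holds, and retain the last valid capacity when it breaks.

```python
from statistics import mean

def identify_packet_bursts(pairs, C=0.2):
    """Sketch of the PB identification algorithm of Section 5.1.
    `pairs` is the (t_a, w) array observed at the TMS: ACK passing time
    and acknowledged segment size.  Returns one capacity estimate per
    maximum-sized Packet Burst."""

    def rates(m, n):
        # x_k: cumulative acknowledged bits; y_k: ACK passing time
        xs, ys, x = [], [], 0.0
        for t, w in pairs[m:n + 1]:
            x += w
            xs.append(x)
            ys.append(t)
        return [(xs[k] - xs[k - 1]) / (ys[k] - ys[k - 1])
                for k in range(1, len(xs))]

    def good(m, n):
        r = rates(m, n)
        mu = mean(r)
        return mu > 0 and all(abs(v - mu) <= C * mu for v in r)

    capacities, m, n, last = [], 0, 2, None
    while n < len(pairs):
        if good(m, n):
            last = mean(rates(m, n))   # instantaneous rates estimate r
            n += 1                     # try to enlarge the burst
        elif last is not None:
            capacities.append(last)    # close the burst just ended
            last = None
            m, n = n, n + 2            # next three-element interval
        else:
            m, n = m + 1, n + 1        # slide the window by one
    if last is not None:
        capacities.append(last)
    return capacities
```

The mean of the instantaneous rates stands in for the reciprocal slope of the regression line; a full implementation would use least-squares over the burst, as the paper does.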
6 Analysis: Interfering Traffic

In this section we remove the simplifying assumption of no interfering traffic on the access queue made at the beginning of Section 5. In particular, in Section 6.1 we examine the effects of interfering traffic on the downlink access queue, while in Section 6.2 we examine the effects of congestion on the uplink access queue.

6.1 Downlink Access Queue Interfering Traffic

In the case of interfering traffic on the access downlink queue, i.e., TCP data segments that arrive at the access downlink queue through a path that does not include the MN, the queue might contain interleaved traffic coming from different paths, possibly
invalidating the Packet Burst identification algorithm described in Section 5.1. Consider, for example, the case in which at a given moment the queue contains the traffic pattern

M M I M I I I M M …

where M denotes a segment passing through the MN, and I denotes an interfering traffic segment. Such a pattern destroys the linear relationship described by Equation (1), thus invalidating the PB identification method. The interfering traffic on the downlink has two possible outcomes:

1. A given segment sequence is identified as a PB. This can be due to two causes:
   o There is no interfering traffic and the measured traffic produces a PB on the downlink queue, so the capacity obtained by the algorithm is correct.
   o The interfering traffic and the measured traffic are shaped so as to cause a false positive, i.e., a pattern of measured and interfering traffic leading to a PB condition associated with an incorrect capacity. We discuss how to detect false positives later.
2. A given segment sequence is not identified as a PB. This can be due to:
   o The fact that there is no interfering traffic, but the measured traffic does not produce a PB on the downlink queue.
   o The fact that the interfering traffic destroys the linearity on the downlink queue.
False positive detection. Equation (1), which provides an expression for the passing time of the k-th ACK segment belonging to a PB, can be modified to take into account the interfering traffic on the access downlink queue. Let vl be the sum of the sizes of the interfering segments that arrive at the queue between the arrival of the (l−1)-th measured segment and the l-th measured segment; we obtain:

tka = tia − ξi + (1/r) Σl=i+1..k (wl + vl) + ξk,   ∀k : i < k < j
However, as stated before, the capacity estimation algorithm can only monitor the traffic that passes through the MN. So, the system equation seen by the algorithm is the following:

tka = tia − ξi + (1/r*) Σl=i+1..k wl + ξk,   ∀k : i < k < j
with a different capacity (r*) with respect to the actual capacity. It is easy to derive the necessary condition for a false positive:

vi+1/wi+1 = vi+2/wi+2 = vi+3/wi+3 = …
Under such conditions, the under-estimated (wrong) capacity is:

r* = [wi / (vi + wi)] · r < r
Moreover, as the length of most TCP data segments is about 1500 bytes [4], the denominator of the aforementioned expression can only assume values that are integer multiples of 1500 bytes. The segment patterns that can cause a false positive and their corresponding wrong rates are summarized in Table 1.

Table 1. False-positive segment patterns

Segment pattern                    r*
M I M I M I                        1/2 r
M I I M I I M I I                  1/3 r
M I I I M I I I M I I I            1/4 r
…                                  …
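As an illustration, the wrong rates of Table 1 follow directly from r* = wi/(vi + wi)·r; the helper below is hypothetical and assumes equal-size 1500-byte segments.

```python
def false_positive_capacity(pattern, r, w=12000):
    """Wrong capacity seen by the algorithm for a repeating pattern of
    measured (M) and interfering (I) segments of equal size w bits:
    each M is followed by v = (#I per M) * w interfering bits, so
    r* = w / (v + w) * r."""
    i_per_m = pattern.count("I") // pattern.count("M")
    v = i_per_m * w
    return w / (v + w) * r

# On an example 6-Mbps downlink:
# "MIMIMI" -> r/2, "MIIMIIMII" -> r/3, "MIIIMIIIMIII" -> r/4
```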
In summary, the probability of false positives depends on the segment pattern of the measured and interfering traffic on the access downlink queue. A false positive always causes a capacity under-estimation, with a capacity less than or equal to half of the real one. An exception, as we show in Section 7, is that the delayed acknowledgement mechanism implemented by TCP can raise the maximum wrong capacity to (2/3)r. So, the interfering traffic on the downlink access queue can cause a capacity under-estimation (false positive) with a value less than or equal to 2/3 of the actual capacity value.

6.2 Uplink Access Queue Interfering Traffic

In Section 5 we assumed that the ACK segments never meet congestion on the access uplink queue, i.e., that the ACK segments take a constant amount of time to traverse such a queue. This means that, if we ignore the effect of the upstream path delay jitter, the ACK segments arrive at the MN at the same rate at which they are sent by the receiver. Due to the size difference between TCP data segments and TCP ACK segments, congestion on the uplink access queue is likely to be triggered by data traffic on the uplink (e.g., during a file upload or when a file-sharing application is active). If we consider the TCP data segments flowing through the uplink, there are two possibilities:

1. Data traffic on the uplink modifies the ACK spacing of a PB and destroys its linearity. In this case, the segment sequence is discarded by the PB identification algorithm, without causing an incorrect capacity estimation.
2. Data traffic on the uplink causes the ACK-compression phenomenon [3], i.e., a reduction of the time spacing between successive ACKs due to a congested uplink access queue.
The following example shows the impact of ACK compression on the capacity estimation algorithm. Let us consider a PB composed of segments [i, i+1, …, j−1] with the following properties:

• The access uplink queue contains wint bits at the moment of the arrival of the first ACK at the uplink queue.
• wint is large enough to cause the buffering of all the burst ACK segments in the uplink queue.
For simplicity, let us suppose that all the PB data segment sizes are equal to w and ignore the effect of the uplink delay jitter. Let ru be the uplink capacity and let wack be the ACK segment size. Under such conditions, the ACK interarrival times do not depend on the downlink capacity; they depend only on the uplink capacity:

tka − tk−1a = wack / ru,   ∀k : i < k < j
The capacity value (r*) obtained is:

r* = w / (tka − tk−1a) = (w / wack) · ru

Such a value is independent of the actual downstream access capacity, and as a consequence it cannot be correct. In the typical case in which w = 12000 bits (1500 bytes) and wack = 320 bits (40 bytes), the estimated capacity is:

r* = 37.5 ru
On a symmetric access link (r = ru), the estimated capacity is almost 40 times the actual capacity. On an asymmetric link (e.g., ADSL), the downlink/uplink capacity ratio is usually less than 20, which makes the estimated capacity at least nearly twice the actual capacity. In summary, in the presence of interfering traffic on the uplink, a PB can be subject to the ACK compression phenomenon due to uplink access queue congestion. In this case the algorithm can estimate a 'wrong' rate at a value which is at least twice the actual downlink access capacity.
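The numbers above are simple arithmetic on the segment sizes assumed in the text (1500-byte data segments, 40-byte ACKs); the helper name is hypothetical.

```python
# Under full ACK compression the ACK spacing is w_ack / r_u, so the
# dispersion-based estimate becomes r* = (w / w_ack) * r_u.
w = 12000       # data segment size in bits (1500 bytes)
w_ack = 320     # ACK segment size in bits (40 bytes)

def ack_compressed_estimate(r_u):
    """Capacity the algorithm would report when every burst ACK is
    buffered behind uplink data traffic (Section 6.2)."""
    return (w / w_ack) * r_u
```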
7 Delayed Acknowledgements

TCP receiver implementations may employ the delayed acknowledgement technique, which consists of sending fewer than one ACK segment per received data segment. The specifications allow a host to send an ACK every two incoming data segments [7]. In the presence of delayed ACKs, the PB identification algorithm cannot measure the passing time of the ACK of every TCP data segment; it can only measure an ACK passing time every two data segments. In this case it is possible to consider a (w, ta) pair for every ACK, with the timestamp taken from the second data segment (the one that receives the ACK) and a size that is the sum of the sizes of the two data segments:

(w0 + w1, t1a), (w2 + w3, t3a), (w4 + w5, t5a), …
So, in this case a minimum of six successive TCP segments forming a PB is necessary to perform capacity estimation using the proposed method. The delayed ACK mechanism also influences the effect of the downlink interfering traffic on the capacity estimation described in Section 6.1. In fact, in the presence of delayed ACKs, there are more combinations of interfering traffic and measured traffic that can lead to false positives. In particular, the maximum capacity estimate associated with a false positive in the presence of delayed ACKs is caused by the following traffic pattern on the downlink queue:

M I M* M I M* M I M* M …

where M* represents a measured packet that does not receive an ACK, M represents a measured packet that does receive an ACK, and I is an interfering traffic packet. It can be shown that in this case the estimated capacity is 2/3 of the actual capacity.
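The pairing rule for delayed ACKs can be sketched as follows; `delayed_ack_pairs` is a hypothetical helper, and its input is one (size, ACK passing time) tuple per data segment.

```python
def delayed_ack_pairs(segments):
    """With delayed ACKs, one ACK covers two data segments: each (w, t_a)
    pair sums the sizes of the two segments and takes the ACK passing
    time of the second one, as in (w0+w1, t1a), (w2+w3, t3a), ...
    `segments` is a list of (w_i, t_i_a) tuples, one per data segment."""
    pairs = []
    for i in range(1, len(segments), 2):
        w = segments[i - 1][0] + segments[i][0]
        pairs.append((w, segments[i][1]))
    return pairs
```

The halved number of pairs is why at least six data segments are needed to form the three points the fit test requires.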
8 Experiments

We performed a set of experiments on ADSL and fibre access services to the Internet aimed at validating the proposed capacity estimation technique. We placed the TMS and a Web server on a host attached to a well-provisioned link, with the path from the TMS to the access link composed of about 20 hops traversing the backbone networks of two service providers. The experiments consisted of 100 HTTP downloads of a small file (50 KB) from the Web server, at 2-second intervals. The file size was chosen to obtain short TCP connections that simulate Web-surfing activity. Due to the TCP slow-start mechanism, such downloads did not exploit the full access capacity, but only a small fraction of it. We considered three possible traffic conditions:

A. No interfering traffic, obtained by running only the 50-KB file downloads on the access host.
B. Interfering traffic on the downlink, obtained by means of a large persistent file download from a third-party Web server to the access host. The large file download was started before the experiments and kept active during their execution, so that the short downloads of the experiment had to compete with it.
C. Interfering traffic on the uplink, obtained by means of a large file upload to a third-party server from the access host. The large file upload was started before the experiments and kept active during their execution, so that the ACKs generated by the short downloads of the experiment had to compete with it.

Figs. 6 and 7 show the histograms representing the distribution of the capacity values obtained by the PB identification algorithm.
[Fig. 6: three capacity histograms for the 7-Mbps ADSL line. Panel A (no interfering traffic): a single mode at 7.02 Mbps. Panel B (downlink interference): modes at 6.88, 4.73, 3.52, and 2.34 Mbps; the lower modes are false positives. Panel C (uplink interference): a main mode at 6.89 Mbps plus overestimated samples due to ACK compression.]

Fig. 6. Capacity distributions (7-Mbps ADSL)
[Fig. 7: three capacity histograms for the 10-Mbps fibre line. Panel A (no interfering traffic): a single mode at 10.10 Mbps. Panel B (downlink interference): a main mode at 9.99 Mbps plus false positives around 5 Mbps. Panel C (uplink interference): a main mode at 9.66 Mbps and a second mode at 6.60 Mbps.]

Fig. 7. Capacity distributions (10-Mbps fibre)
8.1 ADSL

Fig. 6 shows the histograms of the experiment performed on a 7-Mbps ADSL access line under the three traffic conditions previously described. Experiment A (no interfering traffic) shows a unimodal distribution with a mean value of 7.02 Mbps, estimating the downlink capacity with a 0.3% error. Experiment B (interfering traffic on the downlink) results in a quadrimodal distribution, with one mode placed at the actual capacity value (mode mean 6.88 Mbps, 1.7% error) and three other modes placed at about 2/3 (4.73 Mbps), 1/2 (3.52 Mbps), and 1/3 (2.34 Mbps) of the actual capacity value, consistent with the results presented in Section 6.1. Experiment C (interfering traffic on the uplink) shows a strong mode with a mean value of 6.89 Mbps (1.57% error), and a number of samples that largely overestimate the correct capacity value, consistent with the results presented in Section 6.2.

8.2 Fibre

Fig. 7 shows the results of the experiments performed on a 10-Mbps symmetric fibre access line. Experiment A (no interfering traffic) shows a unimodal distribution with a mean value of 10.10 Mbps (1% error). Experiment B (interfering traffic on the downlink) exhibits a main mode with a mean value of 9.99 Mbps (0.1% error) and a noticeable peak around 5 Mbps (half of the actual capacity), which can be attributed to false positives. Experiment C (interfering traffic on the uplink) shows a main mode with a mean value of 9.66 Mbps (3.4% error) and a second mode at 6.60 Mbps, at about 2/3 of the actual capacity value. Further investigation explained this strong 2/3 mode by interfering traffic on the downlink composed of UDP packets generated by an IPTV service active on the access line.
9 Conclusion

We have presented a new method to measure the downlink access capacity passively from a measurement point inside the network. The proposed method is an extension of the packet dispersion method; it differs from the traditional packet dispersion method in that it is applied to longer TCP segment sequences. Applying the packet dispersion method to TCP segment sequences longer than two leads to a significant reduction of measurement noise. We started our analysis by introducing the Packet Burst (PB) concept, i.e., a sequence of TCP segments that are so close in time to each other that they need to be buffered in the access downlink queue. It is during these PBs that the model computes the access capacity. We presented an algorithm that exploits the linearity of the ACK passing times during a PB to identify all the PBs during the measurement period. We have shown that it is possible to identify and drop the packet sequences affected by interfering traffic. In particular: 1) the interfering traffic on the downlink access queue is detected by taking advantage of the fact that the packets related to the measured traffic and the packets related to the interfering traffic can be organized
only around a limited number of patterns; 2) the congestion on the uplink access queue can be detected because it induces the ACK compression phenomenon. We have presented a set of experiments carried out on ADSL and fibre broadband access services to validate the proposed approach. Future work will include a large-scale characterization of the Internet access capacity of a large number of access lines from a measurement point located inside the network. Such a characterization will be possible thanks to the fact that the proposed method is highly scalable, as it is passive and does not inject traffic into the network.
References

1. Prasad, R., Dovrolis, C., Murray, M., Claffy, K.: Bandwidth estimation: metrics, measurement techniques, and tools. IEEE Network 17(6), 27–35 (2003)
2. Dovrolis, C., Ramanathan, P., Moore, D.: Packet dispersion techniques and a capacity estimation methodology. IEEE/ACM Trans. Netw. 12(6), 963–977 (2004)
3. Zhang, L., Shenker, S., Clark, D.D.: Observations on the dynamics of a congestion control algorithm: the effects of two-way traffic. SIGCOMM Comput. Commun. Rev. 21(4), 133–147 (1991)
4. Sinha, R., Papadopoulos, C., Heidemann, J.: Internet packet size distributions: Some observations. Tech. Report ISI-TR-2007-643 (May 2007)
5. Jacobson, V.: Congestion avoidance and control. In: Proc. of SIGCOMM 1988, New York, pp. 314–329 (1988)
6. Bolot, J.C.: Characterizing end-to-end packet delay and loss in the Internet. Journal of High Speed Networks 2(3), 289–298 (1993)
7. Braden, R.: RFC 1122. Requirements for Internet Hosts – Communication Layers. Updated by RFCs 1349, 4379 (October 1989)
8. Croce, D., En-Najjary, T., Urvoy-Keller, G., Biersack, E.W.: Capacity estimation of ADSL links. In: Proc. of CoNEXT 2008, Madrid, Spain (2008)
9. Dischinger, M., Haeberlen, A., Gummadi, K.P., Saroiu, S.: Characterizing residential broadband networks. In: Proc. of IMC 2007, San Diego, CA, USA (2007)
10. Casetti, C., Gerla, M., Mascolo, S., Sanadidi, M.Y., Wang, R.: TCP Westwood: Bandwidth estimation for enhanced transport over wireless links. In: Proc. of ACM Mobicom 2001, Rome, Italy (2001)
11. Fornasa, M., Maresca, M., Baglietto, P., Zingirian, N.: Passive access capacity estimation for QoS measurement. In: Proc. of IWQoS 2009, Charleston, SC, USA (2009)
A Minimum BER Loading Algorithm for OFDM in Access Power Line Communications

Linyu Wang¹, Geert Deconinck², and Emmanuel Van Lil¹

¹ Div. ESAT-Telemic, Electrical Engineering Department, K.U. Leuven, Kasteel Arenbergpark 10, Bus 2444, B-3001 Heverlee, Belgium
² Div. ESAT-ELECTA, Electrical Engineering Department, K.U. Leuven, Kasteel Arenbergpark 10, Bus 2445, B-3001 Heverlee, Belgium
[email protected]
Abstract. In this paper, we investigate the resource allocation problem for access broadband power line communications (PLC) according to the OPERA specification, for power grid monitoring and control. The proposed loading algorithm attempts to minimize the BER while guaranteeing a certain throughput for Amplitude Differential Phase Shift Keying (ADPSK) modulation, when the attenuation varies significantly as a function of frequency. A performance comparison between the proposed algorithm and the Fischer-Huber algorithm is also discussed. Numerical simulations show that the proposed method can improve the average BER performance.

Keywords: Broadband power line communications, high attenuation variance, loading algorithm.
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 100–109, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011

1 Introduction

The concept of data transmission through power lines has attracted a lot of interest during the last decades. One of the main advantages is that a significant cost is saved by operating over the already existing power network infrastructure [1-4]. However, because the power supply grids were originally designed for energy delivery rather than for high-speed communication, the power line network turns out to be a rather hostile medium: the line impedance, (high) attenuation, and phase shift may vary with frequency, time, location, and distance [1-4]. OFDM has proved to be an excellent solution to these problems. In an OFDM system, additional significant gains can be achieved by allocating more bits to subcarriers with larger margins and fewer or even no bits to seriously faded carriers, i.e., by using a bit loading algorithm. Over the past years, different power and bit allocation schemes with diverse optimization objectives have been studied [5-8]. Nevertheless, in the earlier period power line communications provided much lower speeds than the alternatives, so the purpose of bit loading research in past decades was to maximize the overall throughput while guaranteeing a target bit error rate. In 2006, the latest standards of HomePlug and OPERA announced data rates up to 200 Mbps in the physical layer [9], which makes PLC competitive with other technologies, such as Ethernet in its 100-Mbps version, wireless at 100 Mbps at most,
ADSL, and so on. The high speed also frees the bit loading algorithm in PLC from the goal of maximizing throughput. On the other hand, in practice there is a class of reliability-demanding applications in PLC that need to transmit a fixed amount of data with a fixed power at the lowest possible bit error rate. For power grid monitoring and control, our use case, the idea is that all the end-users in one network periodically transfer status information to the master at a fixed speed with the best possible reliability; the master collects the status reports and responds to an emergency if an abnormal status is reported. This requires the algorithm to guarantee reliability rather than to improve throughput. The minimum BER bit loading algorithm appears as the solution to these problems. The OPERA specifications announced in June 2007 are considered in this paper [10]. The OPERA system, supporting raw data rates up to 200 Mbps, employs OFDM over a bandwidth from 2 to 30 MHz, using 1536 sub-carriers. The sub-carriers adopt Amplitude Differential Phase Shift Keying (ADPSK) modulation, which is a good solution for power line channels [11-12]. The information is assigned to the phase change and the actual amplitude because (1) the carrier phase is assumed to be unknown and uniformly distributed, and (2) the amplitude of the received signal still provides information on the transmitted amplitude even if no channel state information is available at the receiver. The possible modulations per sub-carrier are: DPSK, 4DPSK, 2A4DPSK, 2A8DPSK, 4A8DPSK, 4A16DPSK, 8A16DPSK, 8A32DPSK, 16A32DPSK, 16A64DPSK. The remainder of this paper is organized as follows: in Section 2 the considered system is described, and the bit loading algorithm is presented in Section 3. Numerical results are reported in Section 4. Finally, conclusions and further research are presented in Section 5.
2 Optimal Bit Allocation Problem

2.1 System Model

Fig. 1 shows the considered OFDM system for power line communications. The entire bandwidth is divided into 1536 equal parts, and the transfer function of each sub-carrier is supposed to be constant during a control interval. With a symbol interval T and data rate Rb, the number of bits per symbol is b = Rb·T. The encoder translates raw input data into coded data with a certain gain; here we suppose that no additional bits are generated by the encoding. According to the sub-carrier attenuations H estimated from feedback, the b data bits are allocated to the sub-carriers; subsequently, the allocated bits of each sub-carrier are modulated in a predefined constellation. The time-domain symbols are obtained by the inverse discrete Fourier transform and transmitted via a power line channel; the sub-carriers' transmission coefficients H are obtained from measurements [13]. After the Discrete Fourier Transform (DFT), the received samples are demodulated into binary bit streams and then multiplexed into one flow. The system contains a feedback channel to transfer the absolute value of the transmission coefficient H; as in [5-8], the feedback is assumed to be noiseless and to have no delay, i.e., perfect feedback. In this paper, the effect of notches or frequency-selective fades is not taken into consideration, since the sub-carrier channels are assumed to be flat. In practice, the notched part could be excluded from the bandwidth, and this does not affect the bit loading algorithm.
Fig. 1. Simplified block diagram of the OPERA OFDM system
2.2 Power Line Channel

As described before, the power line channel is rather hostile for high-speed data transmission; earlier studies on the characteristics of power lines [1-4] have revealed the situations met in power line communications. First, the frequency-dependent attenuation is severe: the attenuation (in dB) theoretically increases linearly with frequency and distance. A model for the attenuation as a function of frequency and distance is proposed in [13]. By selecting the parameters c, d, e, g, the attenuation, which represents the amplitude of the transfer function, can be defined by the formula:

A(f, l) = c·f·l + d·f + e·l + g   (dB)   (1)
Fig. 2. Sample power line channel attenuation for several cable lengths, neglecting the impact of notches
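Equation (1) is straightforward to evaluate; in the sketch below the parameter values are hypothetical placeholders, since the paper takes c, d, e, g from the OPERA measurements in [13].

```python
def attenuation_db(f, l, c=8e-10, d=4e-8, e=0.03, g=10.0):
    """Eq. (1): A(f, l) = c*f*l + d*f + e*l + g, in dB, with frequency f
    in Hz and cable length l in metres.  Parameter values are
    illustrative only, not the measured OPERA parameters."""
    return c * f * l + d * f + e * l + g
```

With any positive parameters the model reproduces the qualitative behaviour of Fig. 2: attenuation grows with both frequency and cable length.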
The dynamic range D of the sub-carrier transmission coefficients H, which shows how much the attenuation varies with frequency, is expressed as

D = 20 log(Hmax / Hmin)   (2)
where Hmax = max(Hi) and Hmin = min(Hi) for i = 1…M. From Fig. 2, D grows with increasing length: for 50 meters, D is around 40 dB; for 300 meters, D can even reach 60 dB. Second, the noise in the power line channel is not simply AWGN [1-4]. It can be classified into three categories: coloured noise, which has a relatively low power spectral density that decreases with increasing frequency; narrowband background noise; and impulsive noise (synchronous or asynchronous with the mains frequency). Impulsive noise asynchronous with the mains frequency is the most detrimental type of noise for data transmission. Its duration varies from a few microseconds to milliseconds, and it has a random inter-arrival time.

2.3 State of the Art

Fischer and Huber proposed in [5] the first minimum BER bit loading algorithm. In that paper, the BER minimization problem is solved for Quadrature Amplitude Modulation (QAM) constellations under the restrictions that the total energy is constant and the total bit rate is constant. The optimal solution is obtained from the equation:
bi = b/M + (1/M) · log2( ∏k=1..M Nk / NiM )   (3)
where bi is the number of bits allocated to the i-th subcarrier; for negative bi the corresponding subcarrier is turned off. M is the total number of subcarriers, b is the target bit rate, and Ni is the noise variance at the i-th subcarrier. The equation is applied iteratively until all bi of the remaining subcarriers are positive. Finally, energy is distributed flatly over the remaining subcarriers. This algorithm has the advantage that the BER of M-QAM can be represented by an analytical formula, which makes it simpler than Chow's algorithm [6]. However, the algorithm faces challenges in a high-D channel. When D is high, the bits allocated to each subcarrier can vary within a large range. As is shown in Fig. 3, one possible situation is that some of the subcarriers have reached the maximum number of bits they allow while there are still bits left to be allocated. These fully occupied subcarriers should be turned off and excluded from the set of the next allocations; the over-allocated bits should be withdrawn and reallocated. The reallocation is performed under two restrictions: 1) turn off the negative subcarriers; 2) keep bi lower than bmax, and withdraw bi − bmax bits when bi is larger than bmax. Since some of the subcarriers have been turned off at the beginning without considering the second restriction, some of the removed subcarriers have to be turned on again to obtain the optimal result. That is to say, the two restrictions are not independent in this algorithm, which makes the iterative operation more complicated than the original one with a single restriction. Furthermore, the BER of ADPSK modulation cannot be expressed with an analytical formula, and the algorithm needs to perform bit rounding as well, because it does not allocate an integer number of bits to each carrier.
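A minimal sketch of the basic Fischer-Huber iteration of Eq. (3), without the bmax cap discussed above; `fischer_huber` is a hypothetical name, and log2 is used since bi counts bits.

```python
from math import log2

def fischer_huber(b, noise):
    """Rate allocation per Eq. (3):
    b_i = b/M + (1/M) * log2( prod_k N_k / N_i^M ),
    re-applied with negative-rate subcarriers switched off until all
    remaining b_i are positive (energy is then spread flat over the
    survivors).  `noise` holds the per-subcarrier noise variances N_i."""
    active = list(range(len(noise)))
    while active:
        M = len(active)
        log_prod = sum(log2(noise[k]) for k in active)
        bits = {i: b / M + (log_prod - M * log2(noise[i])) / M
                for i in active}
        if all(v > 0 for v in bits.values()):
            return bits
        active = [i for i in active if bits[i] > 0]
    return {}
```

Note that the allocated rates always sum to b, and quieter subcarriers receive more bits; the fractional results still have to be rounded for a real constellation, as the text points out.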
[Fig. 3: a) allocation result; b) attenuation of the subcarriers.]

Fig. 3. A Fischer bit allocation result in a high-D channel, without considering the modulation limitation
Goldfeld introduced a minimum BER power loading algorithm for OFDM in a fading channel in [7]; the optimal power allocation is obtained assuming the same signal constellation in each subcarrier. The aggregate BER is calculated as a function of the average SNR, defined as

SNRav = (1/M) Σi=1..M SNRi   (4)
This algorithm takes the dynamic range D of the subcarrier transmission coefficients Hi into account. It offers the best power allocation in some situations, but not in all of them.
3 Bit Loading Algorithm

The minimum BER loading algorithms try to solve the following problem:

min  pe = 1 − ∏i=1..M (1 − pe(SNRi))
s.t. Σi=1..M bi = b
     Σi=1..M Pi = Pb   (5)

where pe is the aggregate BER over all the subcarriers; here we suppose that the BER of each subcarrier is independent.

SNRi = |Hi|² Ei / Ni   (6)
is the SNR in the ith subcarrier, Ei=Pi*T is the allocated energy for the subcarrier, Ni is the spectral density of the noise, Pi is the power allocated for the ith subcarrier and Pb is the total power could be used. By considering the upper boundary of bits allocated to each subcarrier, the bit and power allocation formulated above by Eq.3 can be solved by an improved bit-add way. We start bit allocation under constant transmission power per bit. With certain units of power, the subcarriers which have smaller BER than the threshold, which could be the per_median or per_quarter, get one more bit and stop when the subcarrier reaches the modulation’s upper boundary. When bit allocation is done, the allocated power is adjusted for sets of subcarriers with the same number of allocated bits to equalize the subcarriers SNR. The detail of this algorithm is as follows: 1) Sort all the subcarriers according to the value of attenuation over noise power: |Hi|2/ σ i . 2
2) For all the i ∈ S={1,2,....M}, set bi = 0, Ei = 0,peri =0 and E_allo = Pb*T/b. 3) For all subcarriers in set S, set Ei = E_allo. Find the median BER for all these subcarriers, per_median = median (peri | i ∈ (1,2,…M)), for all the subcarriers that peri < per_median, add one more bit to bi and keep the changed Ei= Ei + E_allo. For the first allocation, the algorithm could be simplified by finding the
median value of \(|H_i|^2/\sigma_i^2\) instead of calculating and finding the median BER for each subcarrier, since the two are in one-to-one correspondence.
4) If \(b_i\) (\(i \in S\)) reaches the upper bound of the modulation, remove the corresponding subcarrier from \(S\).
5) Repeat steps 3 and 4 while \(\sum_{i=1}^{M} b_i < b\). When the total number of allocated bits equals the number of bits to be transmitted, i.e. \(\sum_{i=1}^{M} b_i = b\), terminate the bit allocation and go to step 7; if \(\sum_{i=1}^{M} b_i > b\), go to step 6.
6) Set \(b\_remove = \sum_{i=1}^{M} b_i - b\) and sort the subcarriers according to the last computed BER; remove one bit and one unit of \(E\_allo\) per subcarrier from the \(b\_remove\) subcarriers with the highest BER.
7) Group the \(b_i\) according to the following rule: \(S(n,i) = \{(n,i) \mid b_i = k\}\), where \(n\) keeps the number of subcarriers with the same number of allocated bits, \(i\) keeps the corresponding subcarrier index, and \(k \in \{0, 1, 2, \ldots, 10\}\).
8) For each set of subcarriers with the same number of allocated bits, reallocate the power:
\[
E_i^{\mathrm{new}} = E_i\,|H_i|^{-2} \Big/ \sum_{j=1}^{n} |H_j|^{-2}.
\]
9) End.
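The procedure above can be sketched in code. The following Python sketch is our illustration, not the authors' implementation: the generic M-QAM BER approximation, the tie handling (granting a bit when the trial BER is at or below the median), and the within-group power redistribution proportional to \(|H_i|^{-2}\) are all assumptions noted in the comments.

```python
import numpy as np

def min_ber_bit_loading(h, sigma2, b_total, Pb, T=1.0, b_max=10):
    """Illustrative sketch of the improved bit-add loading (steps 1-9).

    h: channel gains per subcarrier; sigma2: noise power; b_total: bits
    to allocate; Pb: total power budget. The M-QAM BER approximation and
    the grant rule (per <= median) are assumptions, not the paper's exact
    expressions.
    """
    M = len(h)
    gain = np.abs(h) ** 2 / sigma2        # step 1 metric |H_i|^2 / sigma_i^2
    b = np.zeros(M, dtype=int)            # step 2: empty allocation
    E = np.zeros(M)
    E_allo = Pb * T / b_total             # one energy unit per bit
    active = np.ones(M, dtype=bool)

    def ber(idx, bits, energy):
        # generic M-QAM BER approximation (stand-in for the paper's formula)
        m = 2.0 ** np.maximum(bits, 1)
        snr = gain[idx] * energy
        return 0.2 * np.exp(-1.5 * snr / (m - 1.0))

    while b.sum() < b_total and active.any():          # step 5 loop
        act = np.where(active)[0]
        per = ber(act, b[act] + 1, E[act] + E_allo)    # step 3: trial BER
        grant = act[per <= np.median(per)]             # per_median threshold
        b[grant] += 1
        E[grant] += E_allo
        active &= b < b_max                            # step 4: drop saturated
    excess = int(b.sum()) - b_total
    if excess > 0:                                     # step 6: undo overshoot
        cand = np.where(b > 0)[0]
        worst = cand[np.argsort(-ber(cand, b[cand], E[cand]))]
        for i in worst[:excess]:
            b[i] -= 1
            E[i] -= E_allo
    for k in np.unique(b[b > 0]):                      # steps 7-8: per group,
        g = np.where(b == k)[0]                        # redistribute the group
        w = 1.0 / gain[g]                              # energy proportional to
        E[g] = E[g].sum() * w / w.sum()                # |H|^-2 -> equal SNR
    return b, E
```

With this grant rule, roughly half of the active subcarriers gain a bit on each pass, which is the source of the iteration reduction over the plain one-bit-per-iteration bit-add algorithm.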
4 Performance Evaluation

In this section, we evaluate the performance of the loading algorithm by computer simulations. In order to compare the proposed algorithm with that of Fischer et al., we add an extra limitation on the modulation level in Fischer's algorithm, and the performance under QAM modulation is compared. The channel model is obtained from the OPERA measurements as in [13]. The noise modeled in the simulation is additive white Gaussian noise (AWGN) plus impulsive noise, for two reasons: 1) colored noise can be converted into white noise by a pre-whitening filter; 2) impulsive noise is the most detrimental noise for data transmission. The noise PSD is

\[
N_i = N_0 + P_{im} N_{im} \tag{7}
\]

where \(N_i\) is the PSD of the overall noise, which includes the AWGN \(N_0\) and the impulsive noise \(N_{im}\). \(P_{im}\) is the total average occurrence of the impulsive noise duration in time \(T\), and the impulsive noise is given by a Bernoulli-Gaussian process, i.e., a product of a real Bernoulli process with expected value \(p\) and a complex Gaussian process with mean zero and variance \(\sigma_{im}^2 \gg \sigma_o^2\). Hence, when considering the effect of impulsive noise on the BER performance of an OFDM system, the signal-to-noise ratio should be

\[
\mathrm{SNR}_i = |H_i|^2 \frac{E_i}{N_i} = |H_i|^2 \frac{E_i}{N_0 + P_{im} N_{im}} \tag{8}
\]
A Minimum BER Loading Algorithm for OFDM in Access PLC
The power line length is assumed to be 200 meters; the number of modulated subcarriers is 1536; \(p\) is set to 0.01; and the variance of the impulsive noise, \(\sigma_{im}^2\), is one hundred times larger than \(\sigma_o^2\). Fig. 4 shows the comparison of the BER performance for three different data rates under AWGN plus impulsive noise; D is chosen to be 40 dB, which is typical in PLC. The median value is chosen in the cases of 6000 and 8000 bits, while the quarter value is chosen for 4000 bits. According to Fig. 4, for the case of 6000 bits per symbol, the proposed power allocation scheme has a 2 dB better SNR than the Fischer-Huber algorithm at almost every BER value, as shown in b); in the case of 8000 bits in c), the improvement is about 1 dB, smaller than in the case of 6000 bits but still notable. In the case of 4000 bits, the proposed algorithm adopts per_quarter as the threshold for adding one more bit; fortunately, when the number of bits is smaller, the change from per_median to per_quarter induces only a slight increase in complexity. For D equal to 40 dB, the best break point is around 4800 bits: when the number of bits to be allocated is smaller than 4800, it is better to adopt per_quarter rather than per_median, a tradeoff between BER performance and complexity. A comparison of the proposed algorithm with and without power allocation is also shown in Fig. 4; it clearly shows that power allocation can further improve the performance in our case. In addition, by introducing the median or quarter parameter, the number of iterations can be reduced sharply compared to the bit-add algorithm [8], which makes the computation time O(M) in the worst case, a big improvement over the Hughes-Hartogs (H-H) algorithm.
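The noise model used in these simulations can be reproduced with a short sketch. The numeric parameters mirror the setup above (p = 0.01, impulse variance 100 times the background variance); the sample count, seed, and complex-baseband representation are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_gaussian_noise(n, p=0.01, sigma_o2=1.0, ratio=100.0):
    """Background complex AWGN (variance sigma_o2) plus impulses that occur
    with probability p and carry variance ratio * sigma_o2 (100x here)."""
    def cgauss(var):
        s = np.sqrt(var / 2.0)
        return rng.normal(0, s, n) + 1j * rng.normal(0, s, n)

    impulses = rng.random(n) < p          # real Bernoulli process
    return cgauss(sigma_o2) + np.where(impulses, cgauss(ratio * sigma_o2), 0)

# Eq. (7): effective noise power N_0 + P_im * N_im = 1 + 0.01 * 100 = 2
noise = bernoulli_gaussian_noise(200_000)
measured = np.mean(np.abs(noise) ** 2)
expected = 1.0 + 0.01 * 100.0
```

The measured average power converges to the Eq. (7) value, illustrating why the rare but strong impulses double the effective noise floor seen by the SNR in Eq. (8).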
Fig. 4. Performance comparison (BER versus Average SNR) between the Algorithms. a) D=40dB, b=4000 bits, M=1536.
L. Wang, G. Deconinck, and E. Van Lil
b) D=40dB, b=6000 bits, M=1536.
c) D=40dB, b=8000 bits, M=1536. Fig. 4. (continued)
5 Conclusion

In this paper, we proposed a fast loading algorithm for power-line carriers with high variation in attenuation. For such carriers, loading algorithms that do not consider the modulation limitation are no longer good choices. The algorithm proposed in this paper is an improved bit-add algorithm: it minimizes the BER based on the
criterion that the subcarrier with the smallest BER per unit of power gets one more bit; by introducing the median value, the algorithm reduces the computational effort of the bit-add algorithm. The OPERA specification also offers a HURTO mode for control information, or for data that needs high reliability, without losing efficiency [10]. As future work, we will investigate a mixed system for data transmission with different priorities.
References
1. Pavlidou, N., Vinck, A., Yazdani, J.: Power Line Communications: State of the Art and Future Trends. IEEE Commun. Mag., 34–40 (April 2003)
2. Yousuf, M.S., El-Shafei, M.: Power Line Communications: An Overview - Part I. In: 4th International Conference on Innovations in Information Technology 2007, November 18-20, pp. 218–222 (2007)
3. Yousuf, M.S., El-Shafei, M.: Power Line Communications: An Overview - Part II. In: 3rd International Conference on Information and Communication Technologies: From Theory to Applications, ICTTA 2008, April 7-11, pp. 1–6 (2008)
4. Lu, L., Brito, R., Song, Y.: QoS and performance of REMPLI PLC network. In: 1st Workshop on Networked Control System and Fault Tolerant Control, NeCST Workshop 2005, Ajaccio, France (2005)
5. Fischer, R.F.H., Huber, J.B.: A new loading algorithm for Discrete Multitone Transmission. In: Proc. IEEE Globecom 1996, London, pp. 724–728 (November 1996)
6. Chow, P.S., Cioffi, J.M., Bingham, J.A.C.: A practical Discrete Multitone Transceiver Loading Algorithm for Data Transmission over Spectrally Shaped Channels. IEEE Transactions on Communications 43(2/3/4), 773–775 (1995)
7. Goldfeld, L., Lyandres, V., Wulich, D.: Minimum BER Power Loading for OFDM in Fading Channel. IEEE Transactions on Communications 50(11) (November 2002)
8. Hughes-Hartogs, D.: Ensemble Modem Structure for Imperfect Transmission Media. U.S. Patent No. 4883706
9. Morosi, S., Marabissi, D., Del Re, E., Fantacci, R., Del Santo, N.: A rate adaptive bit-loading algorithm for a DMT Modulation system for in-building power line communications. In: IEEE GLOBECOM, November 28 - December 2 (2005)
10. First draft of OPERA specification version 2, http://www.ist-opera.org/drupal2/?q=node/56
11. Fischer, R.F.H., Lampe, L.H.-J., Calabro, S.: Differential Encoding Strategies for Transmission over Fading Channels, http://www.lnt.de/LITdoc/papers/aeu_00.pdf
12. Lampe, L.H.-J., Fischer, R.F.H.: Comparison and Optimization of Differentially Encoded Transmission on Fading Channels, http://www.lnt.de/LITdoc/papers/plc99.pdf
13. Path loss as a function of frequency, distance and network topology for various low voltage / medium voltage European power-line networks, http://www.ist-opera.org/opera1/project_outputs_available.html.htm
ACCESSNETS 2010
Technical Session 4: Sensor Networks
Self-repairing Clusters for Time-Efficient and Scalable Actor-Fault-Tolerance in Wireless Sensor and Actor Networks

Loucif Amirouche¹, Djamel Djenouri², and Nadjib Badache²

¹ El-Djazair Information Technology, Algiers, Algeria
[email protected]
² CERIST Research Center, Algiers, Algeria
[email protected], [email protected]
Abstract. A new solution for fault-tolerance in wireless sensor and actor networks (WSAN) is proposed. The solution deals with the fault-tolerance of actors, contrary to most of the literature, which only considers sensors. It considers real-time communication and ensures the execution of tasks with low latency despite fault occurrence. A simplified MAMS (multiple-actor multiple-sensor) model is used, where sensed events are duplicated only to a limited number of actors. This is different from the basic MAMS model and semi-passive coordination (SPC), which disseminate every event to all actors. Although it provides a high level of fault-tolerance, this large dissemination is costly in terms of power consumption and communication overhead. The proposed solution relies on the construction of self-repairing clusters amongst actors, on which the simplified MAMS is applied. This clustering enables actors to rapidly replace one another whenever some actor breaks down, and eliminates the need for the consensus protocol execution that current approaches require upon fault detection to decide which actor should replace the faulty node. The extensive simulation study carried out with TOSSIM in different scenarios shows that the proposed protocol reduces the latency of replacing faulty actors compared to current protocols like SPC. The reduction of the overall delay for executing actions reaches 59%, with very close fault-tolerance (action execution success rate); the difference for this metric does not exceed 8% in the worst case. Scenarios of different network sizes confirm the results and demonstrate the protocol's scalability.
1 Introduction
A wireless sensor and actor network (WSAN) is a heterogeneous network where nodes communicate through wireless links to cooperatively monitor the environment and accordingly react on it. Sensors are small and usually static devices with limited resources, while actors (or actuators) are more powerful devices, equipped with more powerful resources. Actors are able either to move and perform appropriate actions, or to launch an action on several actuation devices (action mobility). Sensors are responsible for sensing the physical environment, while actors use the data collected by sensors to make appropriate decisions and accordingly

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 113–123, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
react on the environment. There is a variety of WSAN applications, such as forest monitoring and fire extinguishing, battlefield surveillance, intrusion detection, automatic irrigation of cultivated fields, and, last but not least, biomedical applications. In many applications, tolerating the breakdown of sensors, and particularly of actors, is mandatory for real deployment. Many solutions offering fault-tolerance to sensors have been proposed thus far, but they completely ignore actor faults. One of the common techniques used to increase availability, and recently used to enable fault-tolerance, is the MAMS (multiple-actor multiple-sensor) model. In this model, every single event is distributed to all actors in the network. The few solutions dealing with actor fault-tolerance use this model, which, in addition to its high complexity, requires a consensus arrangement between actors for every single event involving an action. SPC (semi-passive coordination) reduces the need for consensus protocol execution by fixing a single primary actor. Still, a consensus is needed to decide which actor should replace the primary actor whenever it breaks down. The proposed solution uses a simplified version of MAMS, where the number of duplications is largely reduced and the consensus step is eliminated. First, a clustering protocol is proposed, which is executed once at network initialization. It divides sensors into clusters with one actor as cluster-head, then groups each pair of actors able to replace each other into a high-level cluster (including the two actors and their members). The large cluster is called a self-repairing cluster, or SR-cluster, as it is able to automatically replace one of its actors with the other as soon as one breaks down. To ensure this, MAMS is applied, but only within the SR-cluster domain.
That is, a sensor reports events to its cluster-head (primary actor) as well as to the other actuator of the SR-cluster (the secondary actor with respect to this sensor). A comparative simulation study with TOSSIM shows that the proposed method considerably reduces the execution latency compared to the SPC approach, while keeping fault-tolerance high enough compared to fault-intolerant solutions. The remainder of the paper is organized as follows: the related work is presented in the next section, followed by the new solution in Section 3. Simulation results are presented in Section 4, and finally Section 5 concludes the paper and summarizes the perspectives.
2 Related Work
Fault-tolerance in wireless sensor networks (WSN) has been largely considered by the research community, and several solutions have been proposed. Different approaches have been used, such as information sharing [1], information filtering [2], clustering [3], and data checkpointing and recovery methods [4]. Nonetheless, these solutions do not apply directly to WSAN, notably to actors' failures, due to the heterogeneity of WSAN and the special features of actors in terms of energy, computation, storage capacity, etc. More importantly, actors tend to be deployed in limited numbers, and tolerating their faults is critical for designing reliable applications. The first survey dealing with WSAN and its research challenges is [5]. In
[6], the authors propose the use of the multi-actor multi-sensor (MAMS) model to ensure fault-tolerance in WSAN. In this model, every sensor sends data to several actors, and every actor receives data from several sensors in the event area. This model is obviously more fault-tolerant than the single-actor multi-sensor (SAMS) model. However, for each event a consensus among actors is needed to elect a primary actor that will react upon the event. This requires a costly negotiation step (consensus) to be executed for each actuation event. Semi-passive coordination (SPC) [7] is an improvement of the basic MAMS model, where only one actor is used as primary and the others are considered as backups. Sensor-actor communication is done in three phases: broadcast, decision, and update. A sensor s_i capturing an event e_i submits the collected data towards all the actors. Backup actors forward the data to the primary actor, which is mainly responsible for the execution of actions related to the event e_i. Once a decision is made by the primary actor, an update message is sent to all backup actors using some group communication protocol [8] [9]. Accordingly, all the backup actors acknowledge the update message. When the primary actor breaks down, a backup actor is elected as the new primary actor using an election algorithm [10]. The new primary actor sends an update message to all backups and waits to receive all acknowledgements. This technique raises two major problems. The first one is lack of scalability, as a unique actor cannot respond to all events in a large network. The second is the action execution latency when the primary actor breaks down. The proposed solution tackles these issues and proposes a scalable approach that ensures fast substitution of faulty actors.
3 New Solution

3.1 Network Model
We suppose nodes are densely deployed in the event area, enabling the availability of multiple routes between any two communicating nodes. Each actor is able to cover a limited area of the sensed region. The number of actors is supposed to be high enough to cover the whole sensed region, with enough coverage redundancy such that every actor can be replaced by at least one other in case of a fault. All sensors are assumed to be aware of their direct (one-hop) neighbors; to ensure this, a simple neighbor discovery protocol can be run at network setup. All nodes are supposed to be synchronized; a synchronization algorithm, like [11] [12], can be used to this end. The proposed solution applies to both the sensor/actor (SA) model and the sensor/actor/actuation-device (SAA) model. In the first case, actor mobility is needed to replace faulty nodes. The second model is more efficient, as it separates the actuation device from the action decision and eliminates the need for mechanical movement as long as the actuation devices are operating correctly.

3.2 Solution Description
The proposed protocol divides the network into several equal-size self-repairing clusters: every sensor is first associated to a single primary actor, then every two
actors are gathered in a higher-level cluster called a self-repairing cluster (SR-cluster). The SR-cluster may be considered as the fusion of the two clusters, and each actor is considered a secondary cluster-head by the members of the other actor's small cluster. A simple MAMS model is used within the SR-cluster, where data are sent to both cluster-heads. As soon as one of the two actors breaks down, the other one replaces it and executes the action. The use of only two clusters instead of more eliminates the need for any consensus protocol execution to replace the faulty actor, which accelerates the execution of waiting actions. The proposed protocol runs in the following four steps.

3.3 Phase 1: Hello Propagation
This phase enables the creation of the primary clusters, along with the construction of routes towards the primary cluster-head. An actor, CH-source, in an event region W-source initially broadcasts a HELLO message with a fixed TTL, i.e., the packet will be propagated up to TTL hops. The TTL value may depend upon the residual energy of the actor, the number of its neighboring nodes, etc. The HELLO packet carries information about the originating actor along with routing information, which is updated on each hop. Two classes of routes are defined: real-time paths (RTP) and low-energy paths (LEP). Routing information carried in the HELLO packet, reflecting the energy level and the latency of the nodes on the route traversed thus far by the packet, is used to update the LEP and RTP tables, respectively. The metric of the route constructed by a HELLO packet is simply the cumulative cost (energy for LEP routes and delay for RTP routes). Each free sensor (FS) receiving the HELLO packet becomes a member of the CH-source, or M-source (member of the source cluster). When the HELLO packet reaches a node belonging to another cluster, say that of CH-destination, the node becomes a sensor border (SB) and launches the second step of the protocol to attempt to gather CH-source and CH-destination into an SR-cluster. This phase is launched asynchronously by every actor, once, at the initialization of the network.

3.4 Phase 2: SR REQ
After receiving a HELLO packet from all its neighboring nodes, or after a timeout from receiving the first HELLO packet, the SB sends an SR REQ packet (self-repairing cluster construction request) towards its primary actor, CH-destination, through the RTP path. It includes information about the actor that originated the HELLO packet, CH-source. This information is used to check whether CH-destination can cover CH-source's area, which is a vital condition for constructing an SR-cluster. Then, after collecting SR REQ packets from the different SBs, CH-destination responds with a positive or negative SR REP towards two SBs; if the response is positive, it chooses them as sensor gateways (SG). The choice of these gateways depends on the current residual energy of the available candidates [13], to provide long-lasting reliable communication between the two clusters. The response message takes the reverse RTP path towards the two selected SGs, which are in charge of launching the third step.
Algorithm 1. Script describing the protocol

Initialization:
  if (Node is Actor) then
    Broadcast HELLO
  end if

When receiving HELLO:
  if (Node is FS) then
    Calculate RTP and LEP to CH-source and update routing table
    if (HELLO.Hop < HELLO.TTL) then
      HELLO.Hop++
      Update and broadcast HELLO
    end if
  else
    Set node state to SB
    Wait for HELLO from all neighbors, or timeout
    Initialize and send SR REQ to CH-dest
  end if

When receiving SR REQ:
  if (Node is Actor) then
    Select best two SG
    if (SR-cluster construction condition is TRUE) then
      Initialize and send positive SR REP to SG
    else
      Initialize and send negative SR REP to SG
    end if
  else
    Update and forward SR REQ to CH-dest
  end if

When receiving CA REP:
  if (Node is Sensor) then
    Update and forward HELLO REP to SG
    if (Node is SG) then
      if (CA REP is positive) then
        Node state = SG
        Update and broadcast positive HELLO REP
      else
        Update and forward negative HELLO REP to CH-source
      end if
    end if
  end if

When receiving HELLO REP:
  if (CA REP is positive) then
    Calculate RTP and LEP to SG and update routing table
    if (HELLO REP.Hop < HELLO REP.TTL) then
      HELLO REP.Hop++
      Update and broadcast HELLO REP
    end if
  else
    Update and forward negative HELLO REP to CH-source
  end if

When receiving data to forward:
  switch (data.QoS)
    case MA-RTP: use RTP to send data to CH-source and CH-dest
    case SA-RTP: use RTP to send data to CH-source only
    case MA-LEP: use LEP to send data to CH-source and CH-dest
    case SA-LEP: use LEP to send data to CH-source only
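The per-node HELLO handling of Phase 1 (cumulative RTP/LEP cost update plus TTL-bounded rebroadcast) can be sketched as follows. The class and field names are illustrative assumptions, not taken from the paper's TOSSIM code.

```python
from dataclasses import dataclass, field

@dataclass
class Hello:
    ch_source: str        # originating actor (cluster-head)
    hop: int
    ttl: int
    delay_cost: float     # cumulative latency along the path so far (RTP metric)
    energy_cost: float    # cumulative energy metric along the path so far (LEP)

@dataclass
class Node:
    name: str
    link_delay: float             # this node's contribution to the RTP metric
    residual_energy_cost: float   # this node's contribution to the LEP metric
    rtp: dict = field(default_factory=dict)   # ch_source -> best delay cost
    lep: dict = field(default_factory=dict)   # ch_source -> best energy cost

    def on_hello(self, pkt: Hello):
        """Update RTP/LEP tables; return the packet to rebroadcast, or None."""
        d = pkt.delay_cost + self.link_delay
        e = pkt.energy_cost + self.residual_energy_cost
        if d < self.rtp.get(pkt.ch_source, float("inf")):
            self.rtp[pkt.ch_source] = d
        if e < self.lep.get(pkt.ch_source, float("inf")):
            self.lep[pkt.ch_source] = e
        if pkt.hop < pkt.ttl:                 # TTL-bounded flooding
            return Hello(pkt.ch_source, pkt.hop + 1, pkt.ttl, d, e)
        return None
```

Keeping the two tables separate is what lets Phase 4 later pick between a delay-optimal (RTP) and an energy-optimal (LEP) route for the same cluster-head without recomputing anything.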
3.5 Phase 3: Route Update
During this phase, if the CA REP is negative, the two SGs just transmit in unicast a negative HELLO REP to CH-source. This means that CH-destination cannot cover CH-source's event zone, which prevents the construction of the SR-cluster. The actor CH-source may then decide to increase the TTL and rebroadcast the HELLO packet to search for another possible backup. This can also be done if the actor does not receive any HELLO REP, i.e., no SB has been reached. On the other hand, if the CA REP is positive, the two SGs broadcast a positive HELLO REP with a doubled TTL, such that it reaches the sensors of the two clusters and updates the entries towards the SGs in the sensors' RTP and LEP routing tables.

3.6 Phase 4: Data Transmission
As soon as the routing tables of all M-source sensors in W-source are updated, each one is able to reach CH-destination through the two SBs¹. Four modes are used for data transmission in an SR-cluster, according to the required QoS of the data packet. The first mode is multi-actor real-time path (MA-RTP), where sensors send data to both actors using RTP routing. This mode is the most reliable and delay-efficient, and it may be used for critical data where the reaction time is required to be minimal. In this case, the backup actor may react to the event if no ACK of action execution is received from the responsible actor. Substitution is then performed rapidly. The second mode is multi-actor low-energy path (MA-LEP), where data are sent to both actors but using LEP routing. This mode also ensures fault-tolerance, but with a possible small extra delay for the sake of saving energy. It can be used to send data related to events where the reaction is critical but not necessarily real-time. The remaining modes are single-actor real-time path (SA-RTP) and single-actor low-energy path (SA-LEP). They use only one actor and may be used for real-time traffic and regular traffic (respectively) that can tolerate the non-execution of an action. The protocol is illustrated in Algorithm 1.
4 Simulation Study
The proposed protocol has been compared by simulation, using TOSSIM [14], with the SPC approach (an SPC-like protocol) and with a basic protocol using a single actuator for each region (SA), which does not provide any fault-tolerance. Two metrics have been considered in scenarios with faulty actors: i) efficiency in executing actions (success rate), i.e., the ratio between the number of executed actions and the total number of launched tasks that raise actions; ii) the execution delay (of successful actions), i.e., the time separating the detection of the event that raises an action from the execution of the action. The protocols have been evaluated in configurations with different error rates (the rate of faulty actors) and different

¹ This is identical for sensors of the other cluster.
levels of network size (scalability). Each point of the plots presented hereafter is the average of 10 measurements, with a 95% confidence interval. Figures 1 and 2 show the performance metrics vs. error rate. In each execution, every actor's state is randomly set to faulty with a probability equal to the appropriate error rate. A grid topology of 150 uniformly distributed nodes (10 × 15) has been used, among which 10 equally distant nodes have been configured as actors. The TTL value has been set to 5. The success rate of the proposed protocol (SR-cluster), presented in Figure 1, is not much affected by the increase of the error rate, and stays above 88%. The difference between SR-cluster and SPC-like is minor compared to the difference with SA: it does not exceed 8%, whereas the difference between SR-cluster and SA varies between 10% and 38%. SPC-like uses all actors as potential substitutes for faulty actors, while in SR-cluster each actor may be replaced by only one actor (the secondary cluster-head of the SR-cluster). Trivially, the probability that all actors are faulty is less than the probability that two are, which justifies the superiority of SPC-like and the small difference vs. SR-cluster. However, the cost of the higher fault-tolerance provided by SPC-like is a very high latency (Figure 2). SA ensures a stable and the lowest delay. The delay of SR-cluster is inevitably higher than that of SA, and smoothly increases with the error rate. The difference between the two protocols is due to the delay of executing actions that require actor substitution (in case of failure of primary actors), which does not occur for SA, since SA does not ensure any tolerance: in case of failure, SA just ignores the action, and thus no delay is accounted. The substitution delay of SR-cluster is limited to a timeout for ACK reception at the secondary actor, upon which the replacement procedure is immediately launched.
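The fault-tolerance ordering observed in these experiments can be sanity-checked with a back-of-the-envelope model (our simplification, not the paper's analysis): assuming independent actor failures with probability e, SA loses an event when its single actor fails, SR-cluster loses it only when both actors of the pair fail, and SPC-like only when all n actors fail.

```python
e, n = 0.4, 10           # 40% error rate and 10 actors, as in the scenarios above

p_lost_sa = e            # single actor, no backup
p_lost_sr = e ** 2       # both actors of the SR-cluster must fail
p_lost_spc = e ** n      # every potential substitute must fail

assert p_lost_spc < p_lost_sr < p_lost_sa   # SPC-like > SR-cluster > SA
```

The model matches the qualitative result: the gap between e² and eⁿ is small in absolute terms for moderate e, which is why limiting the substitutes to one costs only a few percent of success rate while removing the consensus delay entirely.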
Nonetheless, for SPC-like this delay involves the delay of a consensus protocol execution among all actors to elect a substitute. The latter is considerably affected by the error rate (which increases the number of substitutions). This justifies the higher delay of SPC-like and its dramatic increase. The difference between SR-cluster and SA is around 1 s, while the difference between SR-cluster and SPC-like reaches almost 2.7 s, i.e., a 60% reduction for SR-cluster over SPC-like. Figures 3 and 4 show the performance metrics in scenarios of different sizes, where the number of nodes has been varied from 25 (grid of 5 × 5) to 300 (grid of 15 × 20). The number of actors has been varied between 2 and 15 (2, 4, 6, 10, 12, 15 for each grid, respectively), and the actors have been uniformly distributed within the grid. The error rate has been set to 40%. We remark that the plots of Figure 3 have the same shape as those of Figure 1, except for a stable but still low success rate for SA. The same can be observed for the delay metric (Figure 4), with the exception of a linear increase for SA, which still has the lowest delay. The increase of the delay and the decrease of the success rate for all protocols are due to the increase of the network size: it inevitably raises the number of hops in routes, which raises the delay, and it also raises collisions, which reduces the success rate. The two figures illustrate that the proposed protocol scales well in balancing the success rate and the latency.
Fig. 1. Success rate vs Error rate
Fig. 2. Delay vs Error rate
Fig. 3. Success Rate vs Number of nodes
Fig. 4. Delay vs Number of nodes
5 Conclusion
A new delay-efficient fault-tolerant solution has been proposed, which considers actor faults. The solution relies on two-level hierarchical clustering and the use of a simplified MAMS (multiple-actor multiple-sensor) communication model. It includes a simple clustering protocol that runs once at network initialization. It first divides nodes into equal-size clusters with one actor as cluster-head. After that, every two clusters are gathered in a higher-level cluster, called an SR-cluster (self-repairing cluster). This cluster ensures self-repair, as it allows each actor to automatically replace the other as soon as the latter breaks down. To provide this, events of the two clusters are duplicated, but only towards the two actors (simplified MAMS). Limiting the number of actors in the SR-cluster to two eliminates the need for the consensus protocol step required by the current actor-fault-tolerant solutions, namely SPC (semi-passive coordination) and the basic MAMS. Simulation results obtained with TOSSIM show that the proposed protocol (SR-cluster) ensures a fault-tolerance very close to that of SPC, while considerably decreasing the delay in executing actions (by up to 59%). The cost of this delay reduction is inevitably a minor decrease in fault-tolerance, but the difference does not exceed 8%. Compared to a fault-intolerant protocol with a single actor (SA), both protocols (SR-cluster and SPC) provide much higher fault-tolerance: SR-cluster provides from 10% to 38% more than SA, which is by far higher than the difference between SR-cluster and SPC. The proposed protocol is thus very appropriate for real-time applications. Furthermore, eliminating the large duplication towards every actor, as well as the consensus protocol execution upon each actor failure, should be power-efficient (compared to SPC and the basic MAMS). Investigating this issue by measuring energy metrics represents a perspective of this work.
Mathematical analysis of the solution is also among the perspectives.
References
1. Clouqueur, T., Saluja, K.K., Ramanathan, P.: Fault tolerance in collaborative sensor networks for target detection. IEEE Trans. Comput. 53(3), 320–333 (2004)
2. Ding, M., Liu, F., Thaeler, A., Chen, D., Cheng, X.: Fault-tolerant target localization in sensor networks. EURASIP J. Wirel. Commun. Netw. 2007(1), 19–19 (2007)
3. Gupta, G., Younis, M.: Fault-tolerant clustering of wireless sensor networks. In: IEEE Wireless Communications and Networking Conference, WCNC 2003, pp. 1579–1584 (2003)
4. Saleh, I., Eltoweissy, M., Agbaria, A., El-Sayed, H.: A fault tolerance management framework for wireless sensor networks. Journal of Communications 2(4) (2007)
5. Akyildiz, I.F., Kasimoglu, I.H.: Wireless sensor and actor networks: research challenges. Ad Hoc Networks 2(4), 351–367 (2004)
6. Ozaki, K., Kenichi, W., Satoshi, I., Naohiro, H., Tomoya, E.: A fault-tolerant model for wireless sensor-actor system. In: 20th IEEE International Conference on Advanced Information Networking and Applications (AINA 2006), IEEE Digital Library (2006)
7. Ozaki, K., Watanabe, K., Enokido, T., Takizawa, M.: A fault-tolerant model of wireless sensor-actuator network. Int. Journal of Distributed Sensor Networks 4(2), 110–128 (2008)
8. Schiper, A., Birman, K., Stephenson, P.: Lightweight causal and atomic group multicast. ACM Trans. Comput. Syst. 9(3), 272–314 (1991)
9. Nakamura, A., Takizawa, M.: Causally ordering broadcast protocol. In: 14th IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 48–55 (1994)
10. Nakano, K., Olariu, S.: Uniform leader election protocols for radio networks. IEEE Trans. Parallel Distrib. Syst. 13(5), 516–526 (2002)
11. Boukerche, A., Martirosyan, A.: An efficient algorithm for preserving events' temporal relationships in wireless sensor actor networks. In: Proceedings of the 32nd IEEE Conference on Local Computer Networks, LCN 2007, pp. 771–780. IEEE Computer Society, Washington (2007)
12. Ganeriwal, S., Tsigkogiannis, I., Shim, H., Tsiatsis, V., Srivastava, M.B., Ganesan, D.: Estimating clock uncertainty for efficient duty-cycling in sensor networks. IEEE/ACM Trans. Netw. 17(3), 843–856 (2009)
13. Djenouri, D., Badache, N.: An energy efficient routing protocol for mobile ad hoc networks. In: 2nd IFIP Mediterranean Workshop on Ad-Hoc Networks, Med-Hoc-Nets 2003, Mahdia, Tunisia, pp. 113–122 (June 2003)
14. Levis, P., Lee, N., Welsh, M., Culler, D.: TOSSIM: accurate and scalable simulation of entire TinyOS applications. In: Proceedings of the 1st International Conference on Embedded Networked Sensor Systems, SenSys 2003, pp. 126–137. ACM, New York (2003)
ACCESSNETS 2010
Invited Talk
Bit-Error Analysis in WiFi Networks Based on Real Measurements Gábor Fehér Budapest University of Technology and Economics, Department of Telecommunications and Informatics, Magyar tudósok krt. 2, 1117 Budapest, Hungary
[email protected]
Abstract. The IEEE 802.11 standard dates from 1999. Since that time, many research papers have analyzed WiFi networks. However, until recent years, WiFi devices and drivers were closed source, so measurements could rely only on those features that the vendors offered. For this reason there has been hardly any research focusing on the bit-level internals of WiFi transmissions. Today we have better tools to access WiFi devices. This paper presents measurements in real WiFi scenarios and shows what happens to the message bits in flight. The paper also highlights that the implementations of WiFi devices are very different, and that using a single parameter set to model them is inappropriate and might be misleading. Keywords: wireless, WiFi, bit-errors, measurement.
1 Introduction
The IEEE 802.11 (WiFi) transmission is a frequent research topic. WiFi networks are everywhere, and researchers want to tune them to their best performance. The performance of this wireless network has been investigated and published in many papers [1,5,4]. However, most of the investigations and measurements treat the frame transmission as an atomic event: they record drops and successful transmissions, but never errors within the transmitted frame. The 802.11 standard mandates a CRC checksum to protect the integrity of the frames. Whenever a frame gets corrupted during transmission, the WiFi device at the receiver checks the checksum and drops the frame if it contains errors. People who are not hacking WiFi drivers are unable to control this check, so they are forced to work with correct frames only. A drop is observed when the sender side sent the frame but the receiver side did not get it. There are only a small number of publications that really focus on the internals of the WiFi transmission. Giuseppe Bianchi and his group have a modified Atheros driver, which is able to show more than the everyday user or researcher can see from a wireless frame transfer. In their publications [2,6,10] they carried out measurements using the Atheros card and a modified open source driver (MadWifi [8]). In their recent publication they concluded that taking measurements without
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 127–138, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
understanding the implementation details may lead to biased experimental trials and/or to erroneous interpretation of experimental results. Hacking Atheros chipset based 802.11b/g cards and their drivers was very popular among researchers, and it is still popular today. The reason is that there exists an open source driver, called MadWifi, which allows modifications in the MAC (Media Access Control) layer. The Atheros card still has a closed source firmware, but it is thin, and most of the MAC level frame processing is done in the driver. Using the MadWifi driver, it is possible to capture all the frames that the receiver card gets, even those frames that are corrupted during the transmission.
1.1 Novel Measurements
This publication goes beyond the Atheros cards and the MadWifi driver. In the following sections we describe how Linux systems have improved with respect to wireless drivers and their capabilities. We present the current technologies that are available to capture, and also to transmit, wireless data as the user, developer or researcher wants. Using the capture and transmit functions we demonstrate that the implementations of wireless devices are so different that it is impossible to describe them with one model and a single parameter set. After the presentation of our initial measurements we show further measurement results analyzing bit errors during wireless transmissions.
2 Linux Support for WiFi Capture
WiFi cards are very different in general, but a few things in their implementation are common. Excluding System-on-Chip designs, they are all built around a chipset coming from a specific vendor. They run code, called firmware, to drive the chip inside, and they have driver software running on the host machine to communicate with the card. Regarding the firmware and the driver, at the beginning of WiFi device production, somewhere in the late 90s, vendors put all the card control software into the firmware. This is called FullMAC, where almost everything related to the WiFi transmission is managed on the card itself. The driver was thin; its function was to feed the card with outgoing packets and receive the incoming ones. Later, the implementation design changed completely. Nowadays vendors produce so-called SoftMAC cards and drivers, where the firmware is thin and the driver is responsible for the MAC functions. Indeed, only the physical layer related code (e.g. modulation) and some regulatory code remained on the card; all the MAC functions moved to the host machine. This transition opened a path to making modifications in the MAC functions.
2.1 mac80211 in Linux
WiFi device vendors usually have trade secrets, so at the beginning their drivers were not open for modifications. In the middle of 2003 an open source driver for
Atheros based cards, the Multiband Atheros Driver for WiFi (MadWifi), came out. This driver still exists and is maintained. Because it is open source, arbitrary modifications can be made to the driver. Later, other open source drivers were developed for various chipsets. Some vendors, such as Ralink, also helped the open source Linux ecosystem by offering official open source drivers. In 2007 a new wireless development paradigm appeared in Linux. Developers started to build a common platform for the various wireless drivers. This is the mac80211 development, where conceptually the MAC layer is part of the Linux kernel. Firmware and drivers are thin now; the MAC functionality is placed in the common kernel. This approach has the great advantage that developers can place their code into the common MAC code and it will run on all cards that fully support the new architecture. Various models of Atheros, Broadcom, Orinoco, Ralink, Realtek and ZyDAS cards are already supported, and as time goes on more and more cards become available with mac80211 support. All devices that run a recent Linux kernel already support mac80211 by default. An exciting feature of the mac80211 code is that virtual devices can be created easily. Moreover, there is an operation mode, called MONITOR mode, where the card is set to capture all the frames that it can get. The MONITOR mode can be instructed to capture not just the correct frames, but also the damaged ones. There are two kinds of damage a frame might have suffered. First, when the card is able to detect the PLCP (Physical Layer Convergence Procedure) preamble and is able to synchronize to it, but there are errors in the payload; this is a CRC error, as the 32 bit Cyclic Redundancy Check (CRC) value will signal the problem. The second type of error is the PLCP error, where the card is unable to synchronize to the preamble.
2.2 The RADIOTAP Header
Another exciting feature of the Linux wireless code is that it supports various extra information regarding the receiving procedure of the actual frame. When the wireless device is in monitor mode, this information is collected into a field, called the radiotap [9] header, which is inserted at the beginning of the frame. Through the radiotap header the following most important characteristics can be obtained for a received frame: the antenna number on which the frame was received, antenna noise power, antenna signal power, channel, rate, CRC error and PLCP error. As a bonus feature, the radiotap header is not only meant for received frames; frames can be transmitted with it as well. The radiotap header should be inserted before the frame, and then the frame should be sent to the interface, which is in monitor mode. By adding the radiotap header, it is possible to set up, for example, the transmission rate and the transmission power of the frame.
2.3 Linux on the Access Point
The previously mentioned mac80211 code, monitor mode and radiotap header are available for all machines that run Linux. Linux is not limited to desktops only; there are even Access Points that run the Linux operating system.
In fact, it is a cheap choice for Access Point vendors, since they get a powerful operating system without costly licenses. (Actually, Linux has the GNU General Public License, but vendors tend to forget it.) There is a developer community that creates OpenWRT [7], a Linux based firmware for various Access Points. Currently the development supports 80 different Access Points coming from 37 different vendors, and 80 more Access Points are marked as work in progress. Since OpenWRT is Linux based, roughly any Access Point with at least 4 MB of ROM can run it. The kernel and the drivers are the same as for desktop Linux, except that the CPU architecture is usually different. Naturally, OpenWRT is based on a recent Linux kernel, so the radiotap functions are available during frame captures and transmissions.
2.4 Wireless Card Drivers in Windows 7
In the Windows operating system it was already planned in 2002 to introduce virtual WiFi adapters and share the resources of a single WiFi device [3]. Unfortunately, at that time there was no driver support from the vendors. Starting with Windows 7, Windows implements a virtual WiFi interface, although its capabilities are limited. Hopefully in the future we will see more advancement on the Windows line as well.
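To illustrate the radiotap mechanism described in Sect. 2.2, the following sketch parses the fixed 8-byte radiotap prologue (version, pad, total header length and the little-endian "present" bitmap). It is a minimal illustration, not a full parser: the per-field data and the alignment rules that follow the prologue are omitted. The bit positions are taken from the radiotap specification; the function and variable names are our own.

```python
import struct

# Radiotap "present" bit positions for a few common fields
# (bit numbers as defined by the radiotap specification).
TSFT, FLAGS, RATE, CHANNEL = 0, 1, 2, 3
DBM_ANTSIGNAL, DBM_ANTNOISE = 5, 6

def parse_radiotap_prologue(buf):
    """Parse the fixed 8-byte radiotap prologue preceding a captured frame:
    u8 version, u8 pad, le16 total header length, le32 present bitmap."""
    version, _pad, length, present = struct.unpack_from("<BBHI", buf, 0)
    fields = {name: bool(present >> bit & 1)
              for name, bit in [("tsft", TSFT), ("flags", FLAGS),
                                ("rate", RATE), ("channel", CHANNEL),
                                ("dbm_antsignal", DBM_ANTSIGNAL),
                                ("dbm_antnoise", DBM_ANTNOISE)]}
    return {"version": version, "length": length, "present": fields}

# Synthetic header: version 0, total length 12, Rate and dBm antenna
# signal marked as present.
hdr = struct.pack("<BBHI", 0, 0, 12, (1 << RATE) | (1 << DBM_ANTSIGNAL))
info = parse_radiotap_prologue(hdr)
```

In a real capture the prologue is followed by the per-field values in bit order, each aligned to its natural size, so a complete parser must honor the specification's alignment rules.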
3 WiFi Measurements
We made various measurements using modified software on the Access Point and the WiFi clients. For all the measurements we used a relatively cheap ASUS WL-500gP Linux based Access Point. Due to some implementation problems, we replaced the original Broadcom WiFi card with an Atheros card. This modification was necessary since, at the time of the measurements, Broadcom had not yet released a mac80211 architecture based Linux driver, while Atheros had. Today mac80211 support is available on Broadcom devices as well.
3.1 Measuring Various Clients
First of all, we measured three different client side WiFi devices in order to get an initial picture of the radio chipset capabilities. All the tested devices were off-the-shelf USB sticks. This test was a simple one. We had the Access Point broadcast test frames at a 48 Mbps rate to a certain multicast address. The three different WiFi clients were switched to monitor mode and recorded all the transmissions that their radios were capable of receiving. The three WiFi devices were placed about 7 m away from the Access Point. The antenna of the Access Point was detached to get worse signals. There were no ACK frames, since the measurement frames were multicast frames, which are not acknowledged by the receivers. We repeated the test three times and measured how many valid frames were captured by each device. The repeated tests happened at roughly the same time, so the background traffic and noise of the radio channel can be
[Figure: captured frames (0-10000) per test set (1, 2, 3) for the SMC WUSB-G, SMC WUSB-G2 and D-Link DWL-G122 c1 devices.]
Fig. 1. Receiving WiFi transmission on different WiFi cards
considered static during the tests. The measurement results are displayed in Fig. 1. Based on the measurement results we can observe that there is a huge difference among the capabilities of the tested devices. The best device in our tests was the old version of the SMC WUSB-G stick, while the worst performer was the D-Link DWL-G122 device, capturing the fewest frames. The SMC devices had a ZyDAS chipset, while the D-Link had a Ralink. All of the devices gave a steady performance, as they reacted the same way in the same situation. As a first conclusion we can state that, due to hardware or software reasons, WiFi clients perform differently under the same circumstances. Thus we cannot build a simple model of a generalized WiFi client in which only the radio channel parameters appear. In contrast, we can measure the actual performance of a given WiFi client and assume that this performance does not fluctuate while the conditions of the radio channel stay the same. In the further measurements we used the SMC WUSB-G WiFi client, as this device had the best performance in the previous tests.
3.2 Measuring Different Channel Conditions
In this measurement configuration we had an indoor scenario, presented in Fig. 2. There was one Access Point in one of the rooms, marked AP on the figure. We had 6 indoor positions for the wireless client. First the 4m scenario, where the client was in the same room as the Access Point, placed 4 meters away from it. In the 6m scenario there was already a thin wall between the Access Point and the client; their distance is estimated at 6 meters. The 1 room scenario has one room (i.e. 2 walls) between the Access Point and the client. The 2 rooms, 3 rooms and 4 rooms scenarios have 2, 3 and 4 rooms, respectively, between the Access Point and the client. The Access Point was
the modified ASUS WL-500gP device, and the client was a laptop running Linux, equipped with the SMC WUSB-G card. The measurements were performed in the university building during the night. We tried to choose a quiet period when other radio signals would not disturb the measurements. Also, we selected a WiFi channel where the channel and its first and second neighbors were not allocated by other Access Points.
Fig. 2. Indoor scenarios
In each measurement scenario we took a 7 hour long measurement. The software modified Access Point sent out 1000 byte long, specially patterned measurement frames periodically, every 1/10th of a second. The transmission speed was changed after each transmission, and the values cycled through 1, 2, 5.5, 11, 6, 9, 12, 18, 24, 36, 48 and 54 Mbps. The first 4 transmission rates are IEEE 802.11b rates, while the latter 8 are 802.11g rates; in the latter cases, the modulation was OFDM instead of DSSS. The destination of the measurement flow was a multicast address, so the Access Point waited for no acknowledgments. The client was switched to monitor mode and recorded all the correct and damaged frames it could. The damaged frames suffered bit modifications during the transmissions. As we constructed the measurement flow in a special way, using a recognizable bit pattern, at the receiver side we were able to identify the positions of the erroneous bits. In the case of the first indoor scenario, where the distance from the Access Point to the WiFi client was only 4 meters, almost every frame was correctly transmitted even at the highest transmission rate. In the case of the 4 rooms scenario, there were hardly any frames received even at the lowest rate.
3.3 The Good, the Bad and the Dropped
We present some measurement results from the middle ranges that demonstrate the receiver's performance when receiving the same transmission at different transmission rates. The results are presented along the transmission speed axis, and we put them into 5 groups. The first group is for the good frames, which were received correctly. The second group is for lightly damaged frames, with byte changes in up to 1 percent of the whole frame (1-10 bytes). The third group is moderate damage, between 1 and 10 percent change in the frame (10-100 bytes). The fourth group indicates severe damage, where more than 10 percent of the frame (more than 100 bytes) was changed during the transmission.
The last group represents frame drops, where the receiver was unable to catch the frame in the air. We know about the drops since the measurement frames were sent out periodically. During the 7 hour measurements there were certain periods where the channel seemed better and others where it was worse. We selected a 2 hour interval where we had nearly steady performance and calculated the average values for the different transmission speeds. Since the measurement frames were sent out alternating the transmission speed one by one, the same 2 hour period is used for all the different speeds.
[Figure: delivery rate (0-100%) vs. transmission speed (1, 2, 11, 36, 48 and 54 Mbps), split into the groups Good, Bad 1-10, Bad 10-100, Bad 100- and Drop.]
Fig. 3. Transmission results in the case of the 1 room scenario
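The five damage groups used in Fig. 3 can be reproduced from capture logs with a short routine. The sketch below is our own illustrative reconstruction (the function and label names are ours, not taken from the measurement software), assuming the 1000-byte patterned frames described earlier, so 1 percent corresponds to 10 bytes and 10 percent to 100 bytes.

```python
def classify_frame(expected, received):
    """Assign a frame to one of the five damage groups used in the text."""
    if received is None:          # the frame was never captured: a drop
        return "drop"
    # Count byte positions where the received frame differs from the
    # known transmitted pattern.
    errors = sum(1 for a, b in zip(expected, received) if a != b)
    if errors == 0:
        return "good"
    if errors <= 10:              # up to 1% of a 1000-byte frame
        return "bad_1_10"
    if errors <= 100:             # between 1% and 10%
        return "bad_10_100"
    return "bad_100_plus"         # more than 10% damaged

pattern = bytes(range(250)) * 4   # a recognizable 1000-byte test pattern
lightly_damaged = bytearray(pattern)
lightly_damaged[7] ^= 0xFF        # corrupt a single byte
```

Tallying `classify_frame` results per transmission speed and dividing by the number of frames sent yields the percentage bars of Fig. 3 and Fig. 4.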
Fig. 3 displays the 1 room scenario, where there was one room between the Access Point and the WiFi client. 97.59 percent of the 1 Mbps measurement flow was received correctly by the client, 2.1 percent of the flow contained erroneous frames and 2.2 percent of the flow was lost. At a higher transmission speed, 36 Mbps, the receiver was able to correctly capture only 50.07 percent of the measurement flow. There is a significant amount, 21.55 percent of the flow, where frames contain a small number of errors, up to 1 percent of the total length. 13.17 percent of the measurement flow suffered more than light damage, while 15.2 percent of the flow did not reach the client at all. At the highest transmission rate, 54 Mbps, there are hardly any valid frames. Just a small fraction, 0.81 percent of the flow, was received with less than 10 percent errors; 43.39 percent of the frames were received with more than 10 percent errors, and 55.79 percent of the measurement flow was lost during the transmission. This measurement result shows that there exist situations where we can have a close to perfect transmission even at a moderate transmission speed, i.e. an 89.81 percent successful delivery rate at 11 Mbps, while at higher speeds we already have a significant amount of error.
[Figure: delivery rate (0-100%) vs. transmission speed (1, 2, 11, 36, 48 and 54 Mbps), split into the groups Good, Bad 1-10, Bad 10-100, Bad 100- and Drop.]
Fig. 4. Transmission results in the case of the 3 rooms scenario
In Fig. 4 we present the measurement results of the 3 rooms scenario, where there were 3 rooms between the Access Point and the WiFi client. As the figure shows, the 4 walls and the distance had a serious impact on the transmission. At the 36, 48 and 54 Mbps transmission rates no frames at all were received by the WiFi client. In contrast, 20.12 percent of the measurement traffic sent out at the 1 Mbps transmission speed was correctly received by the client. In this case, 26.47 percent of the measurement frames were received with less than 1 percent errors in the frame, which is also a significant amount, and a further 12.4 percent of the measurement flow was received with more than 1 percent errors. Here 41 percent of the measurement flow was not captured. These statistics become a lot worse in the case of the 2 Mbps measurement frames. The loss is already 87.79 percent, and only 0.03 percent of the measurement flow was received correctly. The shares of the damaged frames are 0.64, 9.77 and 1.51 percent, respectively. This measurement also highlights the differences among the performances at various transmission speeds. Moreover, we can observe that, using the base rate, it is still possible to send frames to places where the radio channel is already heavily distorted.
3.4 The Number of Errors and the Signal Strength
In the following measurements we analyzed the relation between the number of errors within the frame and the signal strength measured by the capturing WiFi device. The number of errors is expressed in bytes, while the official measurement unit of the signal strength is dB. This latter metric can be a little misleading, since it is measured against a fixed reference that can vary from driver to driver. It is impossible to compare signal strength values among different cards; however, it is a good indication when there is just a single card in use.
Fig. 5. Errors and Signal strength in the 6 meter scenario
Fig. 5 presents the measurement results in the 6 meter scenario. In this case there were 6 meters between the Access Point and the WiFi client, and there was also a wall between them. The figure shows the full length of the measurement, 25000 seconds, which is around 7 hours. The different curves show different transmission speeds. Although the per frame numbers of errors in the case of the 1 to 11 Mbps transmission speeds are very low and therefore indistinguishable on the figure, we can observe that at the higher rates the number of damaged bytes in the transmitted frames is already significant. Moreover, we show that, despite our efforts to create an environment where the channel condition is stable, there are sections in the measurement where the receiving behavior differs a lot. During the first 5500 seconds the number of errors is really high for the 54 Mbps transmission. In this section the signal strength is around 35 dB. In the second section, between 5500 and 12000 seconds, the signal strength is better; it goes up to 45 dB. The transmission has fewer errors; it is always under 70 bytes for all the measurement flows. In the third section, after the first 12000 seconds, both the signal strength and the number of errors in the frames are fluctuating. Interestingly, the signal strength is lower than in the first section, yet the performance in terms of the number of errors is better. This measurement underlines that we cannot derive a direct relationship between the signal strength metric and the amount of damage within the frames. On the figure we can see that it is only the 54 Mbps measurement signal that has three different sections; the remaining 5 measurement flows show balanced performance during the whole measurement. The 54 Mbps flow was
Fig. 6. Errors and Signal strength in the 1 room scenario
not a distinguished one, since the measurement signal was cycling through the transmission speed settings frame by frame, yet we still got these results. Our assumption is that there was background wireless traffic that created the three different sections. In Fig. 5 the signal strength curves run together. This means that the signal strength is independent of the transmission speed. Moreover, as the measurement frames followed each other at a 0.1 second distance, we can show that the signal strength changes slowly in time, assuming steady channel conditions. Fig. 6 presents the same error and signal strength metrics as the previous figure. Here we display the results of the 1 room scenario, where there are 2 walls between the Access Point and the WiFi client. In the figure the signal strength curves stay together, showing the independence of the signal strength from the transmission speed. The figure clearly displays that there is a connection between the number of errors and the transmission speed. The number-of-errors curves in the case of 48 and 54 Mbps fluctuate similarly. Moreover, when the number of errors within a transmitted 36 Mbps measurement frame is high, that curve also follows the shape of the higher rate curves. Finally, in Fig. 7 the results of the 3 rooms scenario measurements are displayed. There are only two measurement flows in the figure, since at the higher rates we had hardly any captured frames. The per frame number of errors is high for both flows, and we can again identify the connection between the number of errors and the transmission speed. The received signal strength is around 16 dB during the measurement, which is considered very low.
Fig. 7. Errors and Signal strength in the 3 rooms scenario
4 Conclusion
In this paper we presented WiFi measurement results. We utilized the Linux mac80211 wireless driver architecture and set the card to monitor mode. Thus we were able to capture all frames in the air, regardless of whether they had a correct CRC or not. With the help of the radiotap headers we knew the signal strength of the received frames. On our Access Points we ran Linux as well, namely the OpenWRT distribution. We sent the frames using the radiotap headers and set the transmission speed. This measurement system is available to everyone, since all the required components are in the common Linux kernel. We made very long measurements, sending 250000 specially constructed frames in each scenario at various transmission rates. During the measurements we analyzed the bit errors that damaged the transmitted frames. Based on the measurement results we can state that there is a connection between the per frame number of errors and the transmission speed. Despite similar signal strength values, flows with different transmission speeds have different numbers of errors in their frames. We also pointed out that wireless devices are so different that drawing conclusions based on the observation of a specific card and driver pair is inappropriate and might be misleading.
Acknowledgments. The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. INFSO-ICT-214625.
References 1. Anastasi, G.: IEEE 802.11 ad hoc networks: performance measurements. In: Proceedings of the Workshop on Mobile and Wireless Networks (MWN 2003), in conjunction with ICDCS 2003, pp. 758–763 (2003) 2. Bianchi, G., Formisano, F., Giustiniano, D.: 802.11b/g link level measurements for an outdoor wireless campus network. In: WOWMOM, pp. 525–530. IEEE Computer Society, Los Alamitos (2006) 3. Chandra, R., Bahl, P., Bahl, P.: MultiNet: Connecting to multiple IEEE 802.11 networks using a single wireless card. In: Li, B., Krunz, M., Mohapatra, P. (eds.) 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, INFOCOM 2004, Piscataway, NJ, USA, vol. 2, pp. 882–893. IEEE Computer Society, Los Alamitos (2004) 4. Cheng, Y.C., Bellardo, J., Benk, P., Snoeren, A.C., Voelker, G.M., Savage, S.: Jigsaw: Solving the puzzle of enterprise 802.11 analysis. In: Proceedings of the ACM SIGCOMM Conference (2006) 5. Franceschinis, M., Mellia, M., Meo, M., Munaf, M., Superiore, I., Boella, M., Italy, T.: Measuring TCP over WiFi: A real case. In: 1st Workshop on Wireless Network Measurements, WiNMee, Riva Del Garda (2005) 6. Giustiniano, D., Bianchi, G., Scalia, L., Tinnirello, I.: An explanation for unexpected 802.11 outdoor link-level measurement results. In: INFOCOM, pp. 2432–2440. IEEE, Los Alamitos (2008) 7. Heldenbrand, D., Carey, C.: The Linux router: an inexpensive alternative to commercial routers in the lab. J. Comput. Small Coll. 23(1), 127–133 (2007) 8. MadWifi homepage, http://www.madwifi.org 9. Radiotap homepage, http://www.radiotap.org 10. Tinnirello, I., Giustiniano, D., Scalia, L., Bianchi, G.: On the side-effects of proprietary solutions for fading and interference mitigation in IEEE 802.11b/g outdoor links. Computer Networks 53(2), 141–152 (2009)
ACCESSNETS 2010
Poster Session
Data-Rate and Queuing Method Optimization for Internetworking Medical Applications Radek Dolezel, Otto Dostal, Jiri Hosek, Karol Molnar, and Lukas Rucka Faculty of Electrical Engineering and Communication, Brno University of Technology, Purkynova 118, 61200 Brno, Czech Republic
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. In the medical environment there is a fundamental demand to transfer and store large volumes of image data generated by modern medical devices. Currently the majority of the medical facilities spread around the country have quite limited Internet access. The aim of our work, presented in this article, was to find an optimal solution for transferring large volumes of image data over low-capacity links with regard to minimum response times. First we statistically described the traffic generated by the corresponding medical equipment, and then evaluated the behaviour of these mathematical models in the OPNET Modeler discrete event simulation environment. The simulation results and their interpretation represent the main contribution of the following text. Keywords: Computer Rentgen, Computer Tomograph, medical image processing, OPNET Modeler, transmission capacity, WFQ.
1 Introduction Usually, in regional medical image data processing systems, large volumes of data received from all cooperating medical facilities are stored in one central node. The sources of data, called modalities, are typically MRI (Magnetic Resonance Imaging), CT (Computer Tomograph), US (Ultra-Sound) or CR (Computer Rentgen / X-ray) devices. The usage of optical networks can provide sufficient bandwidth capacity for medical facilities [1], [2]. Difficulties occur in medical facilities which are connected by alternative technologies with limited data-rates. The aim of this article is to find a solution which can reconcile the demands of hospital workers regarding the volume of required data and the maximum acceptable delay. The main goal is to find an optimal relation between channel capacity and the delay of images transmitted by various types of acquisition devices (modalities). Preferential treatment of some selected traffic-flows can also significantly affect the response-time of the evaluated services. Preferential treatment is justified because not all modalities are used for acute cases, so these modalities can be given fewer resources, e.g. a lower data-rate, compared with those used for immediate care. The simplified scheme of the medical data transfer architecture is shown in Fig. 1. R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 141–152, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
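As a rough illustration of why preferential treatment matters, the transfer time of an image over a shared link can be estimated from the WFQ weights: when all classes are backlogged, each class receives a share of the link proportional to its weight. The numbers below (link capacity, weights, image size) are invented for illustration and are not the values used in the simulations of this paper.

```python
def wfq_shares(capacity_bps, weights):
    """Bandwidth share of each backlogged class under Weighted Fair
    Queuing: proportional to the class weight."""
    total = sum(weights.values())
    return {name: capacity_bps * w / total for name, w in weights.items()}

# Hypothetical example: a 2 Mbit/s link shared by an acute CT flow and
# a non-acute CR flow, with CT given three times the weight of CR.
shares = wfq_shares(2_000_000, {"CT": 3, "CR": 1})

# Estimated transfer time of a 10 MB (80 Mbit) CT image at its share.
ct_seconds = 10 * 8_000_000 / shares["CT"]
```

With equal weights the same image would take twice as long, which is the kind of trade-off between channel capacity and delay that the simulations quantify.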
Fig. 1. Medical data transfer architecture – simplified scheme
2 Initial Premises and Statistical Analysis The system in the scope of our evaluation uses TCP (Transmission Control Protocol) as its transport protocol, so the transfer time of image data can be affected by the channel capacity, by the performance of the TCP transmitter and receiver subsystems, and by application functionality. We have experimentally verified that the channel throughput is not limited by the size of the socket-buffer, neither in the transmitter nor in the receiver. So, there is no TCP window reduction caused by a lack of buffer capacity on the receiver or the transmitter side. We also assumed that the channel throughput was not influenced by application behaviour such as the data storage and organization method. The parameters of the statistical model were specified based on the measurement and analysis of real traffic. The measurement was performed during peak traffic hours, from 7 AM to 4 PM. As the source of investigated data, the traffic from CT and CR modalities was chosen. Three key traffic parameters were identified as required to model the traffic: inter-request time, size of transmitted data, and number of repetitions. Since all of these parameters are random variables, each of them was described by a corresponding probability distribution. The probability distributions of the corresponding traffic parameters were tested by Pearson's chi-square test with a significance level of 5%. The independence of the volume of transmitted data and the intervals between transmissions was verified by a contingency table. To obtain a precise traffic-profile of the corresponding acquisition modality, a long-term measurement was carried out. We collected the required data for one week, every day from 7 AM to 4 PM. For this purpose, modalities connected at speeds of at least 100 Mbps were selected. The whole traffic from these modalities was captured using the tcpdump utility and subsequently analysed.
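Pearson's chi-square statistic used for these goodness-of-fit tests is the sum of (O - E)^2 / E over the histogram bins. The sketch below computes it for invented bin counts (the real bins and counts of the measurement are not reproduced here); the statistic is then compared with the chi-square critical value for the appropriate degrees of freedom at the 5% significance level.

```python
def chi_square_stat(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: observed counts in 4 bins vs. the counts
# expected under the fitted distribution.  With 4 bins and no parameters
# estimated from the binned data, the 5% critical value for 3 degrees of
# freedom is 7.815; the fit is rejected if the statistic exceeds it.
observed = [48, 26, 15, 11]
expected = [50.0, 25.0, 12.5, 12.5]
stat = chi_square_stat(observed, expected)
fit_rejected = stat > 7.815
```

In practice, degrees of freedom must also be reduced by the number of distribution parameters estimated from the data, which lowers the critical value.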
The results of the analysis of the selected modalities are presented in the following section. The CR modality was processed first. First of all, we analysed the inter-request time, i.e. whether during a given period of time the acquisition modality is transmitting data or not. From a practical point of view, we found it more useful to work with the periods between the establishments of subsequent connections instead of
Data-Rate and Queuing Method Optimization for Internetworking Medical Applications
the time between the end of the previous connection and the setup of the following one. The reason is that the end of a connection depends on the capacity of the transmission links, which is the parameter we want to optimize in our simulations. Based on the analysis, the probability distribution of the inter-request time DT between two subsequent TCP connections can be described by an exponential distribution with parameter λ = 1387.40 s⁻¹. The amount of transmitted data V is a combination of two intervals with uniform distributions in the ranges <11; 14> and <20; 95> MB. The values DT and V are independent. An example slice of the CT modality, which was processed second, is shown in Fig. 2. During the analysis of the captured data, a random number of TCP bursts was identified in each relation. These bursts were usually represented by one to seven separate connections. Therefore, the time between bursts, the number of TCP connections in a burst and the time between TCP connections within a burst were analysed separately. A time interval of 150 seconds was set as the limit beyond which a TCP connection is no longer considered part of the investigated burst.
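The two-interval CR volume model can be sampled as below. The text gives the two uniform ranges but not the weight of each mixture component, so an equal split is a loud assumption here.

```python
import random

def cr_transfer_volume_mb(p_small=0.5):
    """Sample the transmitted data volume V (MB) for the CR modality:
    a mixture of two uniform components on <11; 14> and <20; 95> MB.
    The mixture weight p_small is NOT reported in the text; 0.5 is an
    illustrative assumption."""
    if random.random() < p_small:
        return random.uniform(11.0, 14.0)
    return random.uniform(20.0, 95.0)
```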
Fig. 2. One of the slices of the CT modality
The inter-request time DT between subsequent TCP connections has an exponential probability distribution with parameter λ = 837.63 s⁻¹. The interval DB between TCP connections within one burst lies in the range from 8 to 150 seconds and has a normal probability distribution with parameters μ = 57.87 and σ = 27.88. A burst contains from one to seven TCP connections; the number NB of TCP connections in a burst has a Poisson probability distribution with parameter λ = 1.45. The amount of data V transmitted in every TCP connection has a two-point (alternative) probability distribution: the probability of an 8.5 MB transfer is 0.25 and that of a 10.25 MB transfer is 0.75.
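A generator for the fitted CT burst model might look like the sketch below; the distribution parameters are the ones reported above, while the clamping of the connection count to 1..7 and of the gaps to 8..150 s (to match the observed ranges) is our modelling choice.

```python
import math
import random

def poisson_knuth(lam):
    """Knuth's method: a Poisson-distributed integer with mean `lam`."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def ct_burst():
    """One CT burst per the fitted model: the connection count follows
    Poisson(1.45), the gaps between connections N(57.87, 27.88) seconds,
    and each connection carries 8.5 MB (p = 0.25) or 10.25 MB (p = 0.75).
    Clamping to the observed ranges (1..7 connections, 8..150 s gaps)
    is our assumption."""
    n = min(max(poisson_knuth(1.45), 1), 7)
    gaps_s = [min(max(random.gauss(57.87, 27.88), 8.0), 150.0)
              for _ in range(n - 1)]
    sizes_mb = [8.5 if random.random() < 0.25 else 10.25 for _ in range(n)]
    return gaps_s, sizes_mb
```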
R. Dolezel et al.
3 Simulation Results

Due to timing constraints in practical implementations, we examined the impact of the total link capacity and of differentiated queue management on the response time of the modalities. For this purpose, a simulation model was built in the OPNET Modeler simulation environment [3]. The model consisted of four traffic sources modelling the CT modalities and four further traffic sources modelling the CR modalities. The topology of the simulation scenarios is shown in Fig. 3.
Fig. 3. Topology of the simulation scenario
During the simulations the application-level response time was evaluated. Because of a very close behavioural analogy, FTP (File Transfer Protocol) was used to model both modalities. To simulate limited link capacities, rate limiting was applied on the common communication link; all other communication links operated at the full speed of 1 Gbps. The inter-request time, file size and number of repetitions were configured according to the results of the statistical analysis of the captured traffic. In later simulation scenarios we also examined the influence of controlled queue management, namely the WFQ (Weighted Fair Queuing) mechanism [4], [5]. The following figures show the most important simulation results. Fig. 4 and Fig. 5 show the dependency of the response time (averaged over all four sources of the same modality) on the capacity of the rate-limited link. In the simulation the traffic was generated by all eight devices at the same time and the response times were averaged separately for each modality. For both modalities there is a significant increase in response times when the link capacity is reduced from 10 Mbps to 5 Mbps. Based on the simulation results, the capacity should in practice not drop below 10 Mbps; otherwise the quality of the examined services is markedly reduced.
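As a rough sanity check on these capacity figures, even a transfer-time lower bound that ignores TCP dynamics, queuing and protocol overhead entirely (a simplification of ours, not the paper's simulation model) shows how quickly large transfers degrade below 10 Mbps:

```python
def naive_response_time(volume_mb, link_mbps):
    """Back-of-envelope lower bound on the transfer time in seconds,
    ignoring TCP slow start, queuing and protocol overhead (our
    simplification, not the paper's simulation model)."""
    return volume_mb * 8 / link_mbps

# The largest CR transfer (95 MB) over a 5 Mbps vs a 10 Mbps bottleneck:
t5 = naive_response_time(95, 5)    # 152.0 s
t10 = naive_response_time(95, 10)  # 76.0 s
```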
Fig. 4. Dependency of the average response-time of modality CT on the maximum link capacity
Figures 6 to 11 evaluate the impact of WFQ on response times. Two queues were used in the simulation: the first one for one of the eight sources and the second for the remaining seven. It was necessary to distinguish between scenarios where the selected source is of modality CT (Fig. 6, Fig. 7 and Fig. 8) or CR (Fig. 9, Fig. 10 and Fig. 11). For both types of preferentially treated traffic flows, simulation scenarios with various maximum link capacities were created. In addition, for each link capacity four different bandwidth distribution models were configured, differing in the ratio of the bandwidth allocated to the first WFQ queue to the total bandwidth; ratios of 20%, 30%, 50% and 80% were used. For easier comparison, the corresponding graphs also contain the average response time from the scenarios without WFQ queues.
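The WFQ behaviour examined here can be sketched with the classic virtual-finish-time bookkeeping; this is a simplified illustration (the virtual clock below is a plain serve-time stand-in for the GPS reference clock), not the OPNET implementation used in the simulations.

```python
import heapq

class WFQScheduler:
    """Minimal weighted fair queuing sketch: each packet gets a virtual
    finish time F = max(F_prev_of_queue, V) + size / weight, and packets
    are served in ascending order of F."""
    def __init__(self, weights):
        self.weights = weights                       # queue id -> weight
        self.last_finish = {q: 0.0 for q in weights}
        self.heap = []                               # (finish, seq, queue, size)
        self.seq = 0
        self.vtime = 0.0                             # simplified virtual clock

    def enqueue(self, queue, size):
        start = max(self.last_finish[queue], self.vtime)
        finish = start + size / self.weights[queue]
        self.last_finish[queue] = finish
        heapq.heappush(self.heap, (finish, self.seq, queue, size))
        self.seq += 1

    def dequeue(self):
        finish, _, queue, size = heapq.heappop(self.heap)
        self.vtime = finish
        return queue, size

# Two queues: 30% of the bandwidth for queue 0, 70% for queue 1
sched = WFQScheduler({0: 0.3, 1: 0.7})
for _ in range(3):
    sched.enqueue(0, 1.0)
    sched.enqueue(1, 1.0)
service_order = [sched.dequeue()[0] for _ in range(6)]
```

With weights 0.3/0.7 and equal packet sizes, the heavier queue's packets are served earlier between back-to-back arrivals, which is exactly the bandwidth-ratio effect the figures measure.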
Fig. 5. Dependency of the average response-time of modality CR on the maximum link capacity
Fig. 6. Impact of the relative bandwidth distribution on the response-time of the preferentially treated CT modality in the case of 5Mbps total link-capacity
Fig. 7. Impact of the relative bandwidth distribution on the response-time of the preferentially treated CT modality in the case of 10Mbps total link-capacity
Figures 6, 7 and 8 clearly show that WFQ improves the response times for the CT modality. The influence of preferential treatment is more significant at lower link capacities, e.g. 5 Mbps, see Fig. 6. In contrast, the impact of WFQ at a speed of 20 Mbps with the given number of sources is practically negligible. Furthermore, the figures also show that the reduction of the response times is significant only up to a 30% share of the total bandwidth; allocating more bandwidth to one source brings no further improvement.
Fig. 8. Impact of the relative bandwidth distribution on the response-time of the preferentially treated CT modality in the case of 20Mbps total link-capacity
Fig. 9. Impact of the relative bandwidth distribution on the response-time of the preferentially treated CR modality in the case of 5Mbps total link-capacity
Figures 9, 10 and 11 show the impact of WFQ on the response times for a CR modality source. It is evident that for this modality WFQ does not bring any significant improvement, not even at lower speeds. The reason is the bursty character of the CR modality. We can conclude that the efficiency of WFQ is highly dependent on the modality type and that WFQ is not able to reduce the response time under all circumstances.
Fig. 10. Impact of the relative bandwidth distribution on the response-time of the preferentially treated CR modality in the case of 10Mbps total link-capacity
Fig. 11. Impact of the relative bandwidth distribution on the response-time of the preferentially treated CR modality in the case of 20Mbps total link-capacity
The following figures show a more detailed analysis of the impact of WFQ. Based on the earlier conclusions a bandwidth distribution model with 30% of resources allocated to the first queue (to the preferentially treated source) was used in the analysis. The results were divided based on the modality type of the preferentially treated source and the total link-capacity. Fig. 12, 13 and 14 show the simulation results with preferentially treated CT modality in the case of 5Mbps, 10Mbps and 20Mbps total link-capacities respectively. There are five response-times included in each figure: 1) response-time of the preferentially treated source, 2) average response-time of the remaining three sources of the
same modality, 3) average response-time of the four sources of the other modality, 4) average response time of the first modality without WFQ and 5) average response time of the second modality without WFQ. From the results it is clear that the preferred traffic has shorter response times than the other sources of the same modality, but this difference decreases with increasing maximum link capacity. Furthermore, at a speed of 30 Mbps the preferential treatment of one source has a substantial negative impact on the average response time of the CR modality. This is due to the rarely generated but very large bursts of the CR modality, which, with the bandwidth artificially limited to 70% of its original size, cannot be transmitted as fast as under standard best-effort treatment.
Fig. 12. Response-times when 30% of the 5Mbps link-capacity is reserved for one traffic source of CT modality
Fig. 13. Response-times when 30% of the 10Mbps link-capacity is reserved for one traffic source of CT modality
Fig. 14. Response-times when 30% of the 20Mbps link-capacity is reserved for one traffic source of CT modality
Fig. 15. Response-times when 30% of the 5Mbps link-capacity is reserved for one traffic source of CR modality
Figures 15, 16 and 17 show the simulation results with a preferentially treated CR modality source in the case of 5 Mbps, 10 Mbps and 20 Mbps total link-capacity respectively. As in the previous case, there are five graphs in each figure: 1) response-time of the preferentially treated CR source, 2) average response-time of the remaining three CR sources, 3) average response-time of the four CT sources, 4) average response time of the CT modality without WFQ and 5) of the CR modality without WFQ. The results suggest that the impact of preferential treatment is evident only in the case of the slow 5 Mbps connection, see Fig. 15. In other situations, sorting a very large data burst
into a queue with limited capacity seems rather counterproductive. The results also show that the preferential treatment of a CR modality practically has no effect on the response-time of the CT modality.
Fig. 16. Response-times when 30% of the 10Mbps link-capacity is reserved for one traffic source of CR modality
Fig. 17. Response-times when 30% of the 20Mbps link-capacity is reserved for one traffic source of CR modality
4 Conclusion

The aim of our work was to define a method to estimate the link capacity required by modern medical equipment communicating via data networks. Since
hospital facilities are spread around the country, they are usually interconnected through commercial Internet service providers, and there is natural pressure to minimize the cost of these connections. On the other hand, long response times can limit the practical usability of this equipment. Taking these constraints into account, we suggested a two-step method which first statistically describes the traffic generated by the corresponding equipment and then evaluates the behaviour of these models in a discrete event simulation environment. To verify the reliability of the suggested method, we selected a mid-size hospital facility with four CTs and four CRs. Next we derived their statistical model based on data from long-term traffic capturing, compared the response times calculated by OPNET Modeler with real values, and found that the simulation results correspond well to the practical ones. To extend our analysis we also evaluated, in the simulation environment, the effect of quality of service support on response times. During the analysis we confirmed that preferential treatment is significant only at lower link capacities, more precisely at the 5 Mbps maximum link capacity for the selected combination of equipment. We also discovered that the efficiency of QoS support is highly dependent on the modality type and that it is not able to reduce the response time under all circumstances; this is caused by the bursty character of the modality. In addition, in some situations the bandwidth reservation appeared to be counterproductive compared to standard best-effort treatment.

Acknowledgments. This paper has been supported by the Grant Agency of the Czech Republic (Grant No. GA102/09/1130) and the Ministry of Education of the Czech Republic (Project No. MSM0021630513).
References
1. Slavicek, K., Javornik, M., Dostal, O.: Technology background of international collaboration on medicine multimedia knowledge base establishment. In: 2nd WSEAS International Conference on Computer Engineering and Applications (CEA 2008), pp. 137–142. WSEAS Press, Acapulco (2008)
2. Slavicek, K., Novak, V.: Introduction of Alien Wavelength into Cesnet DWDM Backbone. In: 6th International Conference on Information, Communications and Signal Processing, pp. 61–66. IEEE Press, Singapore (2007)
3. Opnet Technologies: OPNET Modeler Product Documentation Release 15.0 (2009)
4. Park, K.I.: QoS in Packet Networks. Springer, New York (2004)
5. Růčka, L., Hosek, J., Molnar, K.: Advanced Modelling of DiffServ Technology. In: 32nd International Conference on Telecommunications and Signal Processing (TSP 2009), pp. 1–6. Asszisztencia Szervezo Kft., Budapest (2009)
Shared Wavelength Assignment Algorithm in Multi-profile WDM-EPONs to Support Upstream Bandwidth Guarantees Noemí Merayo, Patricia Fernández, Ramón J. Durán, Tamara Jiménez, Ignacio de Miguel, Juan C. Aguado, Rubén M. Lorenzo, and Evaristo J. Abril Optical Communications Group Department of Signal Theory, Communications and Telematic Engineering E.T.S.I. Telecomunicación, University of Valladolid (Spain) Campus Miguel Delibes, Camino del Cementerio s/n, 47011 Valladolid, Spain Tel.: +34 983 423000 ext. 5549; Fax: +34 983 423667
[email protected]
Abstract. A novel wavelength and bandwidth allocation algorithm in WDM-EPONs, called ShaWaG, is proposed to provide subscriber differentiation by ensuring guaranteed bandwidth levels in the upstream direction. Contrary to previous schemes, the new algorithm is designed to save cost at both ends of the network, especially at the users' side, as it restricts the number of upstream wavelengths they can use. Simulation results show that ShaWaG achieves better performance than other bandwidth allocation algorithms in WDM-EPONs while requiring a lower number of upstream wavelengths. The novel algorithm also distributes bandwidth more fairly than those methods, as it efficiently ensures a minimum guaranteed bandwidth to every subscriber for a larger number of ONUs.

Keywords: Wavelength Division Multiplexing (WDM), Dynamic Bandwidth Allocation (DBA), Ethernet Passive Optical Network (EPON), Service Level Agreement (SLA), Wavelength Dynamic Assignment.
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 153–167, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011

1 Introduction

Passive Optical Networks (PONs) are an excellent technology for access networks, as they provide both high bandwidth and class-of-service differentiation [1-2]. The PON technology uses a single wavelength in each of the two directions, and these wavelengths are multiplexed on the same fiber by means of Wavelength Division Multiplexing (WDM). Since all users share the same wavelength in the upstream direction, a Medium Access Control (MAC) protocol is necessary to avoid collisions among packets from different Optical Network Units (ONUs). Dynamic Bandwidth Allocation (DBA) algorithms, based on the Time Division Multiple Access (TDMA) protocol, are the best choice, as they dynamically distribute the available bandwidth depending on the current demand of the ONUs [3-8]. Although PON infrastructures can provide enough bandwidth for current applications, the gradual increase in the number of users and the bandwidth requirements of
the new emerging services demand an upgrade of such access networks. The addition of new wavelengths to be shared in the upstream and downstream directions of PON infrastructures leads to the so-called Wavelength Division Multiplexed PONs (WDM-PONs). The pure WDM-PON architecture assigns one dedicated wavelength per ONU, which implies more dedicated bandwidth and security in the system. However, the cost associated with such a deployment relegates pure WDM-PONs to next-generation architectures. Hence, the combination of the WDM technology with Time Division Multiplexing (TDM) techniques is the best near-future approach. These hybrid architectures exploit the advantages of the wavelength assignment of WDM techniques and of the power splitting of TDM techniques. Consequently, the most important challenge of WDM-PON networks is the cost associated with the deployment of such architectures. As said before, pure WDM-PON architectures do not allow bandwidth redistribution and present a high deployment cost. Besides, if the number of ONUs increases strongly, they can exhaust the available wavelengths of the transmission band (1530 nm-1560 nm). To deal with this, novel WDM-PON prototypes assume that ONUs can simultaneously transmit on several wavelengths in the upstream direction instead of having one dedicated wavelength. To do so, each ONU is equipped with several fixed transceivers or with a tunable transceiver. However, the use of tunable transceivers provides less bandwidth due to the dead tuning time necessary to switch wavelengths; transceivers with high tuning speeds are therefore required, especially if the number of supported upstream wavelengths is high. As a consequence, intermediate architectures, which simultaneously provide flexibility and future scalability in WDM-PONs, are preferable. On the other hand, end users contract a Service Level Agreement (SLA) with a provider, normally related to a minimum guaranteed bandwidth.
This forces DBA algorithms to support various service levels with different guarantees. The Bandwidth Guaranteed Polling (BGP) method proposed in [5] divides ONUs into two disjoint sets of bandwidth-guaranteed ONUs and best-effort ONUs. However, this scheme only differentiates between guaranteed and best-effort ONUs; it does not distinguish other profiles with specific restrictions. A typical way to offer customer differentiation is to use a fixed weighting factor assigned to each ONU, associated with a specific SLA, and to allocate bandwidth according to these weights. In the methods presented in [6-7], the OLT distributes the available bandwidth by assigning different weights to each client depending on their SLA; ONUs associated with a higher weight are therefore assigned more bandwidth. In contrast, the algorithm proposed in [8] distributes the bandwidth to each subscriber by changing the values of the initial weights, adapting them to the service conditions of every profile according to the mean packet delay of the most delay-sensitive traffic. In this paper, we present a novel DBA algorithm applied to a hybrid WDM-TDM EPON architecture for a gradual upgrade of existing TDM EPON infrastructures. Unlike other DBA algorithms proposed for WDM-EPONs, it addresses the cost of these architectures by only allowing each ONU to transmit on a limited set of wavelengths, which depends on the requirements of the users. Besides, the new algorithm can differentiate between service level profiles with the aim of ensuring minimum guaranteed bandwidth levels to each of them. The Ethernet protocol has been considered as it is a well-known, inexpensive technology that is interoperable with legacy equipment [1-2].
2 Dynamic Bandwidth Allocation in Hybrid WDM-TDM PONs

Several WDM-TDM architectures have been proposed recently, although the deployment of the WDM technology in the access network is still in its first stages. One extended WDM-PON approach employs one separate wavelength for the transmission between the OLT and each ONU. In general, this architecture does not allow bandwidth redistribution and presents a high deployment cost. Other types of architectures, such as those proposed in [9-11], consider a smooth upgrade of TDM-PONs, allowing several wavelengths to be used in the upstream transmission. The authors in [9-10] propose that the OLT consist of an array of fixed lasers/receivers and the ONUs of either an array of fixed lasers/receivers or one or more tunable lasers/receivers. From the providers' point of view, the utilization of either tunable lasers/receivers or fixed laser/receiver arrays, but not both simultaneously, is more likely. In the prototype proposed in [11], every ONU employs one or more fixed transceivers, permitting a gradual upgrade depending on the traffic demand of the ONUs; the OLT then assigns the bandwidth to each ONU on those wavelengths they support. In addition, the fixed transceivers at the ONU can be replaced by a fast tunable laser. In that case, the ONU can only transmit on one single wavelength at any given time, which may lead to poor bandwidth utilization due to the dead tuning time every time there is a wavelength switch. Most of the existing bandwidth allocation algorithms in WDM-PONs assume this kind of architecture, in which several wavelengths are shared by the ONUs. The algorithm proposed for the prototype shown in [11] presents three variants to assign the excess bandwidth among ONUs with great traffic demand (highly loaded ONUs). In the controlled variant, the one which achieves the best performance, the OLT waits until all report messages from one cycle are received before applying the allocation algorithm for the next cycle.
However, in the other two approaches the OLT permits ONUs with low traffic demand to transmit before the reception of every report. Since several wavelengths are available in the upstream channel, the channel allocation is based on the first-fit technique (i.e. the first available free wavelength). In contrast, the algorithm proposed in [13] is an extension of Interleaved Polling with Adaptive Cycle Time (IPACT) and permits every ONU to transmit just after receiving each single report message. It also applies the first-fit technique to dynamically select each channel wavelength, and additionally provides Class of Service (CoS) differentiation by means of an extended strict priority queue scheme. In order to compare both policies, the authors in [10] developed an extension of the Multi-Point Control Protocol (MPCP) for WDM-PONs to support dynamic bandwidth allocation. They implemented two scheduling paradigms, namely online and offline. In the former, the OLT applies bandwidth and wavelength allocation based on the individual request of each ONU. In the offline policy, on the contrary, the OLT makes scheduling decisions taking into account the bandwidth requirements of all ONUs. The simulations demonstrated that the online scheduling method obtained lower delays than the offline scheduling, especially at high ONU loads. The method proposed in [15], which follows the same online philosophy, is designed to ensure minimum guaranteed bandwidth levels to different profiles. This scheme assumes that every ONU simultaneously transmits on several wavelengths in the upstream and that all ONUs support the same set of wavelengths. Other proposals support Quality of
Service (QoS) in a differentiated-services framework. The algorithm proposed in [14] allows each ONU to transmit simultaneously on two channels, each channel dedicated to a different type of traffic.
3 Description of the WDM-PON and the WDM-DBA Algorithm

3.1 Proposed WDM-PON Architecture

Although no predominant WDM-PON architecture exists, the gradual WDM upgrade will be limited by technological costs and driven by the needs of service providers. Consequently, flexible WDM-PON architectures which can be upgraded in a cost-effective way are preferable. Legacy WDM-PON architectures, however, employ one separate wavelength for the transmission between each ONU and the OLT; these infrastructures do not allow bandwidth redistribution and present high deployment costs. In contrast, recent WDM-PON prototypes assume that ONUs can simultaneously transmit on the same set of upstream wavelengths. Typically, each ONU is equipped with a tunable transceiver, which is attractive because it provides access to several wavelengths with only one device. However, it may provide low throughput due to the dead tuning time necessary to switch among wavelengths, so tunable transceivers with tuning speeds on the order of microseconds are necessary. Furthermore, the more wavelengths each ONU is allowed to transmit on, the more expensive the ONU is. As a consequence, we favour intermediate architectures which allow future flexibility, and we propose a hybrid WDM-TDM architecture which minimizes the related costs, especially at the ONUs. A novel DBA algorithm has been proposed for this architecture, so that the WDM-EPON effectively supports QoS by means of subscriber differentiation. The DBA algorithm is designed to ensure a minimum guaranteed bandwidth to each connected user in the presence of several Service Level Agreements (SLAs) contracted by them. In this way, each ONU is allowed a number of wavelengths limited by the requirements of the connected subscribers. The proposed architecture follows the principles of the architectures in [11-12]. The proposed upstream direction in the presence of several SLAs is shown in Fig. 1 (with three SLAs). In this scenario, all ONUs which belong to one specific SLA share the same dedicated wavelength. The OLT schedules the transmission of the different ONUs over this wavelength using a dynamic time-division allocation scheme. Moreover, there is one more wavelength (λbackup) simultaneously shared by every ONU, used only to accommodate the extra bandwidth needed by ONUs to fulfil their minimum guaranteed bandwidth. To supply the upstream wavelengths, each ONU is equipped with a cost-effective laser to transmit on the dedicated wavelength, and the deployment of a second laser for the backup wavelength is considered. Coarse WDM (CWDM) techniques then permit a smooth upgrade to a WDM scenario. This architecture avoids the poor bandwidth utilization caused by the dead time imposed by laser tuning every time there is a wavelength switch. However, when the technology is mature enough and fast tunable lasers with low tuning times become available, their deployment will allow more flexibility and scalability, in case more backup wavelengths are
needed to accommodate traffic or a higher number of ONUs is connected. The wavelength channels are routed from the ONUs to the OLT by a passive arrayed waveguide grating (AWG) router. Regarding the OLT, for the upstream direction it employs a WDM demultiplexer together with an array of receivers to detect the information on every upstream wavelength. This infrastructure can be easily scaled, as further ports can be added to the AWG in order to support more types of profiles with different bandwidth requirements. On the other hand, this equipment permits a gradual upgrade of the WDM-EPON architecture: if the ONUs increase their bandwidth requirements, the developed DBA algorithm assigns the backup wavelength to them more frequently. In case more backup wavelengths are needed in the network, it is possible to upgrade the infrastructure of the ONUs with higher bandwidth requirements, and the DBA algorithm can be easily adapted to the new set of supported wavelengths. Again, when the technology provides very fast tunable lasers, their deployment inside the ONU will permit more future scalability.
Fig. 1. Basic proposed upstream architecture for users belonging to different SLAs
In the downstream direction, the wavelength channels are routed from the OLT to the ONUs by means of the same AWG router. As the upstream and downstream wavelengths are located in different wavelength windows, the two windows are separated using CWDM at the OLT (as shown in Fig. 1). Moreover, the OLT is equipped with a multi-wavelength laser in order to transmit the corresponding wavelengths to each ONU; this can be a bank of fixed lasers or a tunable laser if the delay constraints permit its deployment.

3.2 Wavelength Allocation Scheme in the WDM-DBA Algorithm

To distribute the available bandwidth among users in WDM-EPONs, our algorithm follows a joint time and wavelength assignment, the policy most studies adopt since it permits multidimensional scheduling. The algorithm, called Shared Wavelength allocation algorithm with bandwidth Guarantees (ShaWaG), distinguishes between profiles with different bandwidth requirements. It has been designed to offer a minimum guaranteed bandwidth to each profile when their
demand exceeds the available bandwidth in the upstream channel. In contrast to other existing DBA algorithms in WDM-EPONs, ShaWaG focuses on saving costs at the ONUs by restricting the number of wavelengths that ONUs are allowed to use. Since the novel algorithm obliges ONUs of the same SLA to transmit over the same wavelength, a fixed scheme is used for the ONUs of one SLA, which makes the wavelength allocation very simple to implement. However, when the number of ONUs or the demanded bandwidth increases, the backup wavelength is dynamically activated by certain ONUs so that their guaranteed bandwidth levels are satisfied. In this situation, a different wavelength allocation policy is needed to arbitrate the dynamic allocation of the backup wavelength among ONUs. The study carried out in [10] demonstrated that the random, least assigned and least loaded methods excessively overload certain wavelengths. In contrast, the first-fit method, in which ONUs transmit on the first free wavelength, leads to an efficient solution [13]. Consequently, we adopted the first-fit scheme to dynamically assign the two supported wavelengths, the dedicated one $(\lambda_{sla \in onu_i})$ and the backup one $(\lambda_{backup})$, when the latter is activated. If ONUs of several profiles require the backup wavelength, ShaWaG gives preference to the highest priority profile; once the ONUs of this profile are ensured their guaranteed bandwidth, ShaWaG assigns this wavelength to the next profile. In order to activate the backup wavelength, the OLT keeps track of the mean bandwidth allocated to each ONU $(B_{alloc}^{onu_i})$. When this value is lower than the ONU's minimum guaranteed bandwidth and its demanded bandwidth is higher than this guaranteed level, the OLT activates the backup wavelength and decides on which wavelength the ONU transmits in the next cycle $(\lambda_{alloc}^{onu_i})$. Otherwise, if every ONU complies with its guaranteed bandwidth, the backup wavelength remains switched off. Fig. 2 shows a flow diagram of the operation of the developed WDM-DBA algorithm ShaWaG.
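The first-fit rule used for the dedicated and backup wavelengths can be sketched as follows; the fall-back when every wavelength is busy (queue on the earliest-freed one) is our assumption, as the text only defines first-fit for free wavelengths.

```python
def first_fit_wavelength(free_at, now, duration):
    """First-fit channel selection: grant the transmission on the first
    wavelength that is free at time `now`; free_at[w] holds the time at
    which wavelength w becomes free again.  The fall-back when every
    wavelength is busy (book the earliest-freed one) is our assumption,
    not spelled out in the text."""
    for w, t in enumerate(free_at):
        if t <= now:
            free_at[w] = now + duration
            return w
    w = min(range(len(free_at)), key=free_at.__getitem__)
    free_at[w] += duration
    return w
```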
3.3 Dynamic Bandwidth Allocation in Each Wavelength
The designed algorithm achieves efficient upstream channel utilization because, following a polling policy, an ONU can transmit as soon as the previous ONU ends its transmission on each channel. The EPON standard and its extension to WDM-EPON architectures use the Multi-Point Control Protocol (MPCP) to properly schedule the communication between the OLT and the ONUs. Two MPCP control messages are used to assign bandwidth in each upstream channel: the Report and the Gate messages. In the Report, the ONU sends the demanded bandwidth (in bytes) for the next cycle, and the OLT answers with a Gate message carrying the allocated bandwidth for that cycle. Therefore, the OLT allocates bandwidth to each ONU just after receiving its updated demand (i.e. its Report). Hence, the OLT assigns bandwidth to each ONU independently of the status of the remaining ONUs and does not have to wait for the queue information of every ONU. This leads to efficient bandwidth utilization and avoids long packet delays. To prevent some ONUs from overusing the upstream channel and the cycle time from becoming too long, we limit the window length of every ONU in every cycle time
Shared Wavelength Assignment Algorithm in Multi-profile WDM-EPONs
[16]. In this scheme, the OLT gives the required bandwidth to each ONU as long as the demand is lower than an imposed maximum bandwidth. When the demand is higher, the OLT grants this maximum instead. This behaviour makes the cycle adaptive, depending on the updated demand of each ONU. The cycle is the total time in which all ONUs transmit in a round-robin discipline. As the network allows different service level agreements (SLAs), the new algorithm ShaWaG has been designed to distinguish between profiles with different requirements. In fact, it ensures a guaranteed bandwidth to each profile when the aggregate demand exceeds the available bandwidth in the shared upstream channel. This is implemented by assigning a minimum guaranteed bandwidth factor to each SLA, which ensures each of them a different bandwidth level. The OLT uses these factors to allocate the available bandwidth in each channel. Thus, ShaWaG sets different
maximum bandwidths (B_max^{sla_k}), one for each SLA. The allocated bandwidth in one cycle time for each ONU (B_alloc^{onu_i}) can be defined by Eq. 1:

B_alloc^{onu_i} = minimum{ B_demand^{onu_i}, B_max^{sla_j} }    (1)
where B_demand^{onu_i} is the aggregated bandwidth demand in bits of ONU i. The maximum allocated bandwidth permitted to each ONU depending on its SLA j in each cycle time (B_max^{sla_j}) is calculated using Eq. 2. In Eq. 2, R^{sla_j} is a factor which represents the minimum guaranteed bandwidth (bit/s) associated with SLA j, and B_cycle_available is the available bandwidth in the maximum cycle considered (i.e., 2 ms, as set by EPON). The term N_onus^{sla_m} is the number of ONUs associated with SLA m in the presence of n profiles. The term N_λ is the number of supported wavelengths in the upstream.

B_max^{sla_j} = (B_cycle_available · R^{sla_j} · N_λ) / (Σ_{m=0}^{n-1} R^{sla_m} · N_onus^{sla_m})    (2)
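Eqs. 1 and 2 translate directly into code. The sketch below uses our own function and variable names and assumes all quantities are expressed in consistent units:

```python
def sla_max_bandwidth(r, counts, b_cycle_available, n_lambda, j):
    """Eq. 2: per-cycle maximum bandwidth B_max^{sla_j} for SLA j.

    r[m]      -- guaranteed-bandwidth factor R^{sla_m}
    counts[m] -- number of ONUs in SLA m (N_onus^{sla_m})
    """
    denom = sum(r[m] * counts[m] for m in range(len(r)))
    return b_cycle_available * r[j] * n_lambda / denom

def onu_allocation(demand, b_max_sla):
    """Eq. 1: the ONU gets its demand, capped by its SLA maximum."""
    return min(demand, b_max_sla)
```

With equal factors and equal ONU counts, the cycle bandwidth splits evenly among the ONUs; larger R^{sla_j} values bias the split towards the higher-priority profiles.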
4 Simulation Results

4.1 Simulation Scenario
Simulations were initially carried out, using OPNET Modeler 14 [17], for a WDM-EPON in two scenarios of 48 and 52 ONUs with one user connected to each ONU. The study was then extended to numbers of ONUs ranging from 32 to 64. The transmission rate of the upstream link between the ONUs and the OLT is set to 1 Gbit/s, and the access link from the user to each ONU to 100 Mbit/s [6,16-18]. The distance between the ONUs and the OLT is set to 20 km, which is near the maximum permitted distance for a typical EPON [18]. To avoid collisions between adjacent ONUs, a guard time of 1 μs is chosen, a
N. Merayo et al.
value within the limits specified by the standard IEEE 802.3ah D1.414 [19]. Packet generation follows a Pareto distribution with a Hurst parameter, H, equal to 0.8, and packets are of variable length (from 64 to 1518 bytes). Moreover, each ONU has one buffer of 10 Mbit where packets are queued in order of arrival [16].

[Fig. 2 flow diagram: for each ONU i, the OLT checks the demand B_demand^{onu_i} and allocates B_alloc^{onu_i} = minimum{B_demand^{onu_i}, B_max^{sla_j}}. It then assigns the wavelength: the dedicated wavelength λ_{sla∈onu_i} by default, or the backup wavelength λ_backup when B_demand^{onu_i} > B_guarantee^{sla∈onu_i} and B_alloc^{onu_i} < B_guarantee^{sla∈onu_i} hold and λ_backup is not occupied by a higher-priority SLA. Finally, it computes the next transmission time of ONU i (T_Tx_next^{onu_i} = T_free^{λ_alloc} + T_{B_alloc^{onu_i}}), sends the GATE message to ONU i, and waits for its next REPORT.]

Fig. 2. Main diagram of the WDM-DBA algorithm
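The per-ONU decision in Fig. 2 can be condensed into one function. The backup_holder_sla bookkeeping and the treatment of equal priorities below are our assumptions, filled in where the diagram is not explicit:

```python
def assign_wavelength(demand, b_max, guarantee, sla, backup_holder_sla=None):
    """Return (allocated bandwidth, wavelength) for one ONU.

    sla               -- profile index of this ONU (0 = highest priority)
    backup_holder_sla -- profile currently holding the backup wavelength,
                         or None when the backup is free
    """
    alloc = min(demand, b_max)  # Eq. 1
    wants_backup = demand > guarantee and alloc < guarantee
    if wants_backup and (backup_holder_sla is None or sla <= backup_holder_sla):
        return alloc, "backup"      # transmit on the backup wavelength
    return alloc, "dedicated"       # transmit on the SLA's own wavelength
```

An ONU falls back to its dedicated wavelength both when its guarantee is already met and when a higher-priority profile occupies the backup wavelength.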
As the WDM-EPON copes with the presence of several SLAs, a scenario with three SLAs is presented: SLA0 as the highest priority service level, SLA1 as the medium priority service level and SLA2 as the lowest priority service level. In general, very few conventional users contract high-level agreement conditions, whereas users tend to contract medium or low priority profiles. In this study it has been considered that 12% of users contract high-level conditions, 31% contract medium-level conditions and 56% contract the lowest. Regarding the minimum guaranteed bandwidth factors of each SLA, the values R^{sla0} = 100, R^{sla1} = 70 and R^{sla2} = 50 have been set, as in other studies [7, 15]. These factors are chosen to comply with the NTT DSL service plans (100/70/50 Mbit/s) [20]. Hence, each SLA should be given this guaranteed bandwidth when the bandwidth demand of every SLA exceeds the available bandwidth of the upstream channel.
ShaWaG is compared with DyWaS-SLA [15], as it also applies weighted factors to guarantee bandwidth levels to different SLAs. Like other schemes in WDM-EPONs [13,15], DyWaS-SLA assumes that ONUs transmit on several wavelengths in the upstream direction, initially set to three. As this number can be upgraded depending on the service provider requirements and the number of ONUs, we show a complete simulation study in which different numbers of wavelengths, from two to four, are assumed, as in other published works [11, 12]. This kind of architecture is compared with the one proposed by ShaWaG, which limits the number of wavelengths in the upstream channel to minimize cost and complexity.

4.2 Simulation Results
One of the most important characteristics of ShaWaG is the bandwidth offered to each subscriber depending on the guaranteed bandwidth contracted with the service provider. Fig. 3 shows the bandwidth offered to one ONU of each SLA versus the ONU load when ShaWaG and DyWaS-SLA are compared for an initial set of 52 ONUs. As all ONUs have the same traffic distribution, all of them demand the same bandwidth (B_demand), as represented in the figure. The demanded bandwidth follows a linear function from 0 Mbit/s up to the maximum user transmission rate, set to 100 Mbit/s. In the same way, each algorithm offers the same quantity of bandwidth in Mbit/s to every ONU of the same SLA (B_offered), since all ONUs have the same traffic distribution. Consequently, in Fig. 3 the bandwidth offered by each algorithm to each SLA is represented with only one line. This figure shows how ShaWaG is able to efficiently guarantee the established bandwidth levels to every profile. In contrast, DyWaS-SLA cannot deliver such bandwidth guarantees for the two lowest priority subscribers (SLA1 and SLA2). Furthermore, it is noticeable that ShaWaG always gives the highest priority profile (SLA0) its total demanded bandwidth. In conclusion, the new algorithm ShaWaG, which only allows a maximum of two wavelengths to each ONU, achieves higher throughput than DyWaS-SLA, which uses three operating wavelengths in the upstream direction. Another important strength of ShaWaG is the limited number of wavelengths that it permits each ONU to use in order to minimize cost and complexity. In this study, we analyze the wavelength utilization of each SLA versus the ONU load for ShaWaG and DyWaS-SLA. The next graphs represent the percentage of wavelength utilization of each SLA (SLA0, SLA1 and SLA2). It should be mentioned that in DyWaS-SLA each ONU initially supports three wavelengths in the upstream (λ0, λ1, λ2).
Meanwhile, in ShaWaG each ONU supports a maximum of two wavelengths: one dedicated wavelength depending on the SLA (λ0, λ1 and λ2 for SLA0, SLA1 and SLA2, respectively), and another for the backup transmission (λ3). Regarding the highest priority profile SLA0, Fig. 4 shows the percentage of wavelength utilization when 52 ONUs share the upstream. As can be seen, DyWaS-SLA simultaneously uses the three wavelengths, which means that the OLT has to constantly switch the laser at the ONUs. Moreover, ONUs are allocated the three supported wavelengths in the same proportion over time. On the contrary, the novel algorithm ShaWaG only needs to use the dedicated wavelength for the SLA0 profile (λ0) for every ONU load. This means that the OLT does not have to switch among several wavelengths
and this simplifies the upstream transmission. For the SLA1 profile, Fig. 5 represents the percentage of wavelength utilization versus the ONU load. Once more, DyWaS-SLA simultaneously uses the three wavelengths and they are assigned in the same proportion over time. In contrast, ShaWaG only needs to employ the dedicated wavelength (λ1) up to relatively high loads. The backup wavelength, λ3, is only used for loads higher than 0.6 (i.e., ONUs transmitting at 60 Mbit/s) to ensure the level of 70 Mbit/s. The results confirm that in DyWaS-SLA the OLT is constantly changing the wavelength assigned to each ONU, whereas with ShaWaG the OLT employs the dedicated wavelength and only uses the backup wavelength to fulfil the bandwidth requirements at certain loads. Finally, Fig. 6 shows the results for the SLA2 profile. In that case, ShaWaG uses the dedicated wavelength (λ2) for loads up to 0.4 (ONUs transmitting at 40 Mbit/s). However, it starts to assign the backup wavelength (λ3) for loads higher than this value to ensure the guaranteed bandwidth of 50 Mbit/s. Whereas ShaWaG only activates the backup wavelength when it needs to fulfil the bandwidth requirements,
Fig. 3. Demanded and offered bandwidth to one ONU of each SLA versus the ONU load for ShaWaG and DyWaS-SLA for 52 ONUs

Fig. 4. Percentage of the wavelength utilization versus the ONU load for the SLA0 for ShaWaG and DyWaS-SLA and 52 ONUs
Fig. 5. Percentage of the wavelength utilization versus the ONU load for the SLA1 for ShaWaG and DyWaS-SLA and 52 ONUs

Fig. 6. Percentage of the wavelength utilization versus the ONU load for the SLA2 for ShaWaG and DyWaS-SLA and 52 ONUs
DyWaS-SLA uses the three wavelengths in both scenarios and the OLT keeps switching the wavelength assigned to each ONU in every cycle. Finally, it is noticeable that ShaWaG keeps the dedicated wavelength highly loaded even for low loads, as the number of ONUs associated with this profile is higher than that of the SLA0 and SLA1 profiles. The novel algorithm ShaWaG assumes the utilization of one wavelength (λ3) shared among all profiles (SLA0, SLA1 and SLA2) only when the ONUs of one SLA need to ensure their stipulated guaranteed bandwidth. In case several SLAs require this backup wavelength, ShaWaG gives preference to the highest priority profile. Once the highest priority profile is ensured its guaranteed bandwidth, ShaWaG assigns λ3 to the next profile which needs it. Otherwise, this backup wavelength is not activated. Fig. 7 represents the utilization of the backup wavelength λ3 for the three profiles SLA0, SLA1 and SLA2 when different numbers of ONUs (from 32 to 64) are considered in the upstream. It can be noticed that the highest priority profile SLA0 does not need this wavelength even when the number of ONUs is set to 64. Thus, only the SLA1 and SLA2 profiles demand its utilization, for numbers of ONUs higher than 36. It can be observed that the lowest priority profile (SLA2) requires it more because the number of ONUs related to this profile is higher than that related to the medium priority profile SLA1. However, when the number of ONUs is higher than or equal to 56, ShaWaG assigns the backup wavelength more frequently to SLA1 to the detriment of SLA2. This happens because ShaWaG is designed so that, if several SLAs demand the backup wavelength, it gives priority to the highest profile which needs it to satisfy its bandwidth requirements.
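The arbitration rule just described — among the profiles requesting the backup wavelength, serve the highest-priority one first — can be sketched as follows; the function and parameter names are illustrative, not from the paper:

```python
def backup_priority_order(requests):
    """requests maps an SLA index (0 = highest priority) to True when that
    profile needs the backup wavelength; returns the serving order."""
    return sorted(sla for sla, needed in requests.items() if needed)

def grant_backup(requests):
    """Profile granted the backup wavelength this cycle, or None if no
    profile requests it."""
    order = backup_priority_order(requests)
    return order[0] if order else None
```

Once the granted profile's guarantee is met, its request drops out and the next profile in the order receives λ3, reproducing the hand-over behaviour described above.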
Fig. 7. Percentage of utilization of the backup wavelength (λ3) versus the number of ONUs in ShaWaG for each profile
On the other hand, in the initial simulation scenario DyWaS-SLA was assumed to support three upstream shared wavelengths. However, Fig. 3 demonstrated that DyWaS-SLA cannot efficiently comply with the guaranteed bandwidth levels for the SLA1 and SLA2 profiles. Besides, this number of wavelengths can be changed depending on the service provider requirements and the number of ONUs connected to the WDM-EPON. Accordingly, we have analyzed the performance of both algorithms when DyWaS-SLA supports different numbers of wavelengths in the upstream. In particular, we consider scenarios with a range of wavelengths from two to four. As the guaranteed bandwidth is the most important aim of both algorithms, it is going to be
studied the maximum offered bandwidth to each profile for different numbers of ONUs. This study makes it possible to determine the limit of both algorithms for each set of upstream wavelengths when different numbers of ONUs share the upstream. Regarding the highest priority profile SLA0, Fig. 8(a) represents the maximum bandwidth offered to one ONU of this SLA (in Mbit/s) when different numbers of ONUs (from 32 to 64) share the upstream and different numbers of wavelengths (from two to four) are supported. It should be noted that the bandwidth offered to every ONU of the same SLA is the same, as all of them have the same traffic distribution. In this figure, it can be observed that DyWaS-SLA offers more bandwidth to each ONU as the number of wavelengths increases. When the number of upstream wavelengths is set to four, DyWaS-SLA provides the stipulated guaranteed bandwidth for the highest number of represented ONUs (i.e., 64 ONUs). However, when the upstream supports three wavelengths, DyWaS-SLA only ensures the guaranteed bandwidth up to 48 ONUs. Furthermore, if the upstream only allows two wavelengths, DyWaS-SLA is not able to guarantee the stipulated bandwidth to SLA0 even when the number of ONUs is set to 32. This happens because the two wavelengths have to be shared by every ONU and the total demanded bandwidth of the ONUs is higher than the bandwidth contained in these two wavelengths. On the contrary, ShaWaG provides the guaranteed bandwidth independently of the number of ONUs sharing the upstream.
Fig. 8. Maximum offered bandwidth of DyWaS-SLA and ShaWaG to each ONU of each profile: (a) SLA0, (b) SLA1, (c) SLA2
Hence, for the maximum number of ONUs (i.e., 64), ShaWaG fulfils every guaranteed bandwidth level with only two wavelengths allowed to each ONU. In contrast, DyWaS-SLA needs four wavelengths to satisfy the same bandwidth requirements when the number of ONUs is set to 64. For the SLA1 profile, Fig. 8(b) shows that ShaWaG ensures the guaranteed bandwidth for every set of represented ONUs with only two wavelengths, whereas DyWaS-SLA needs four wavelengths to ensure the stipulated guaranteed bandwidth for every range of ONUs. Consequently, the novel algorithm deals with the bandwidth requirements better than DyWaS-SLA while using a lower number of wavelengths. Finally, for the SLA2 profile, Fig. 8(c) shows that although ShaWaG only supports two upstream wavelengths, it ensures the guaranteed bandwidth for up to 60 ONUs. In contrast, with the same number of wavelengths, DyWaS-SLA cannot provide the guaranteed bandwidth even when the number of ONUs is only 32. Even when the number of wavelengths is increased to three, DyWaS-SLA only guarantees the minimum bandwidth up to 52 ONUs, so it needs four upstream wavelengths to achieve the same performance as ShaWaG.
5 Conclusions

In this paper, an algorithm called ShaWaG has been proposed to support subscriber differentiation in a WDM-EPON. The new algorithm distributes the bandwidth according to a set of weights to efficiently ensure a guaranteed bandwidth to each profile when the available bandwidth is not enough to cover the demand of all of them. In contrast to other algorithms proposed for WDM-EPONs, this algorithm deals with the related cost of these architectures by only permitting each ONU to transmit in a limited range of wavelengths according to the bandwidth requirements of its contracted SLA. ShaWaG has been compared with DyWaS-SLA, as it is a very efficient method which also takes bandwidth guarantees into consideration in a multi-profile scenario. However, that scheme allows ONUs to transmit through the same set of upstream wavelengths; as a consequence, and contrary to ShaWaG, it does not minimise the number of upstream wavelengths dedicated to each ONU to save cost. Simulation results show that ShaWaG efficiently ensures guaranteed bandwidth levels for every profile for a larger number of ONUs when compared to DyWaS-SLA. Not only does ShaWaG make a more considered bandwidth distribution than DyWaS-SLA, it also does so with a lower number of upstream wavelengths; to achieve similar performance, DyWaS-SLA requires a higher number of upstream operating wavelengths. In fact, with only two wavelengths, ShaWaG can deal with the guaranteed bandwidth levels of up to 60 ONUs, whereas DyWaS-SLA needs four wavelengths to fulfill the bandwidth requirements of the same number of ONUs. In conclusion, ShaWaG saves costs in each ONU while achieving better performance than more expensive architectures which consider a higher number of upstream channels. Regarding the percentage utilization of every upstream wavelength by both algorithms, it has been demonstrated that DyWaS-SLA simultaneously uses every upstream wavelength in the same proportion.
This means that the OLT is constantly switching the laser to assign different wavelengths in each cycle. Consequently, it
may lead to poor bandwidth utilization due to the dead tuning time if high-speed tunable lasers are not deployed. In contrast, in ShaWaG the OLT assigns the dedicated wavelength to each profile and only employs the backup wavelength to fulfil the bandwidth requirements of certain profiles.

Acknowledgments. This work has been supported by the GR72 Excellence Group, funded by the Regional Ministry of Castilla y León (Junta de Castilla y León).
References

1. Kramer, G., Mukherjee, B., Maislos, A.: Ethernet Passive Optical Networks. In: Dixit, S. (ed.) Multiprotocol over DWDM: Building the Next Generation Optical Internet, pp. 229–275. John Wiley & Sons, Chichester (2003)
2. Pesavento, M., Kelsey, A.: PONs for the Broadband Local Loop. Lightwave 16, 68–74 (1999)
3. Luo, Y., Ansari, N.: Bandwidth allocation for multiservice access on EPONs. IEEE Communications Magazine 43, 16–21 (2005)
4. Byun, H.-J., Nho, J.-M., Lim, J.-T.: Dynamic bandwidth allocation algorithm in Ethernet passive optical networks. Electronics Letters 39, 1001–1002 (2003)
5. Ma, M., Zhu, Y., Cheng, T.-H.: A bandwidth guaranteed polling MAC protocol for Ethernet passive optical networks. In: 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2003), San Francisco, pp. 22–31 (2003)
6. Assi, C., Ye, Y., Dixit, S., Ali, M.A.: Dynamic Bandwidth Allocation for Quality-of-Service over Ethernet PONs. IEEE Journal on Selected Areas in Communications 21, 1467–1477 (2003)
7. Chang, C.-H., Kourtessis, P., Senior, J.M.: GPON service level agreement based dynamic bandwidth assignment protocol. Electronics Letters 42, 1173–1174 (2006)
8. Merayo, M., Durán, R.J., Fernández, P., de Miguel, I., Lorenzo, R.M., Abril, E.J.: EPON bandwidth allocation algorithm based on automatic weight adaptation to provide client and service differentiation. Photonic Network Communications 17, 119–128 (2009)
9. McGarry, M.P., Maier, M., Reisslein, M.: WDM Ethernet Passive Optical Networks (EPONs). IEEE Communications Magazine 23, 187–195 (2005)
10. McGarry, M.P., Reisslein, M.: Bandwidth Management for WDM EPONs. Journal of Optical Networking 5, 627–654 (2006)
11. Dhaini, A.R., Assi, C.M., Maier, M., Shami, A.: Dynamic Bandwidth Allocation Schemes in Hybrid TDM/WDM Passive Optical Networks. Journal of Lightwave Technology 5, 277–286 (2007)
12. Hsueh, Y.-L., Rogge, M.S., Yamamoto, S., Kazovsky, L.G.: A highly flexible and efficient passive optical network employing dynamic wavelength allocation. Journal of Lightwave Technology 23, 277–286 (2005)
13. Kwong, K.H., Harle, D., Andonovic, I.: Dynamic Bandwidth Allocation Algorithm for Differentiated Services over WDM EPONs. In: IEEE International Conference on Communications Systems (ICCS), Singapore, pp. 116–120 (2004)
14. Dhaini, A.R., Assi, C.M., Shami, A.: Quality of service in TDM/WDM Ethernet Passive Optical Networks (EPONs). In: 11th IEEE Symposium on Computers and Communications (ISCC 2006), Sardinia, Italy, pp. 621–626 (2006)
15. Merayo, N., González, R., de Miguel, I., Jiménez, T., Durán, R.J., Fernández, P., Lorenzo, R.M., Aguado, J.C., Abril, E.J.: Hybrid dynamic bandwidth and wavelength allocation algorithm to support multi-service level profiles in a WDM-EPON. In: Hei, X.J., Cheung, L. (eds.) AccessNets 2009. LNICST, vol. 37, pp. 1–13. Springer, Heidelberg (2010)
16. Kramer, G., Mukherjee, B., Ye, Y., Dixit, S., Hirth, R.: Supporting differentiated classes of service in Ethernet passive optical networks. Journal of Optical Networking 1, 280–298 (2002)
17. OPNET Modeler Technologies, http://www.opnet.com
18. Sherif, S.R., Hadjiantonis, A., Ellinas, G., Assi, C.M., Ali, M.: A novel decentralized Ethernet-based PON access architecture for provisioning differentiated QoS. Journal of Lightwave Technology 22, 2483–2497 (2004)
19. IEEE 802.3ah Ethernet in the First Mile Task Force home page, http://www.ieee802.org/3/efm/public/
20. NTT, NTT VDSL service plan, http://www.asist.co.jp/jensspinnet/bflets.html
Towards Sustainable Broadband Communication in Rural Areas

Amos Nungu and Björn Pehrson

Telecommunication Systems Lab, Forum 120, 164 40 Kista, Sweden
{amnungu,bpehrson}@kth.se
Abstract. As part of the development of a general strategy, we present a framework for the establishment of sustainable broadband communication in under-served areas of developing regions often described in terms of low population density, low purchasing power, intermittent power supply, and lack of competent human resources. Due to an increasing political awareness of the importance of ICT for development, not least due to the explosive expansion of the mobile phone networks, such regions are getting more attention also regarding broadband infrastructure. Our research includes experimental validation of a community networking approach based on affordable high-performance, low-effect technologies focusing on pilot projects in Tanzania. Keywords: Community networks, Broadband Networks, Developing regions, Rural Areas, economics, sustainability.
1 Introduction
There is an increasing awareness of the importance of Information and Communication Technologies (ICT) for development, including both mobile access and broadband communication infrastructures. There are clear indications that this awareness is now affecting the mainstream planning of development activities of key institutions. The ITU has formulated a broadband vision: "Build broadband networks and everything else will follow" [1]. The 2009 World Bank report on Information and Communication for Development observed a correlation between broadband connections and the economy of a region [4]. Although there is no reason to believe that the societies and citizens in developing regions have different communication needs than those in developed regions, broadband penetration in developing regions, especially Africa, is very poor compared to the developed world [2]. There are many reasons for this: first, the often under-developed policies and regulatory frameworks create political risks adding to the commercial risks. Second, there is a lack of supporting infrastructures, such as optical fibre networks, electrical power and developed supply chains. Third, the traditional business models used by most network operators and service providers lead to high perceived commercial risks. There is a misconception that communication networks and services will be provided by commercial markets, if only there is a demand. This may be true

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 168–175, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
Sustainable Community Networks
in densely populated areas of developed regions. It is, however, definitely not true in developing regions, nor in sparsely populated areas of developed regions. We argue that for developing regions to catch up, public investments have to be used as drivers. Although most national budgets are strained, there is currently the opportunity to take advantage of the commitment of the developed world to support efforts leading to the Millennium Development Goals (MDG) [3]. To achieve this, we propose the creation of self-sustained local-area broadband islands serving local communication needs, even if there are currently no, or only narrowband, external connections due to the unavailability or too high price of uplinks. Such networks are easy to build and are increasingly found in many under-served areas. We use the terms "local" and "community" networks interchangeably to refer to district or municipal networks. Also, broadband in this paper refers to high-speed connectivity within the local network, not necessarily the uplink. We demonstrate, in a case study, that financial and operational sustainability can be achieved for communication networks in rural and remote areas, given that a proper environment exists. Our methodology is based on both our own and related academic and professional work, as well as the evaluation of implementations specific to a project in Tanzania. The organization of the rest of the paper is as follows. Section 2 highlights related work in this area. In Section 3, we discuss the framework required to create sustainable broadband networks in under-served areas. Section 4 presents how we applied the framework to a running project in Tanzania. A summary and conclusions are provided in Section 5.
2 Related Work
We have found no previously published work taking a holistic approach similar to ours. Previous studies address specific issues relating to either technology [11,15], applications [13] or environmental challenges [12] in under-served areas. We will discuss two references that closely relate to our work. Gillett et al. [14] observed that certain geographical areas and populations lag behind others in terms of Internet access. The authors noted that municipalities can contribute in different roles to accelerate broadband in such areas: as broadband users creating demand, as policy-makers defining the rules, as financial supporters, and as infrastructure developers. In our experience, because most municipalities in developing countries rely completely on the central government for their budget and policies, we believe that municipalities in these areas can facilitate broadband mainly by being consumers of broadband services in their own work procedures and as fall-back producers of broadband services in a utility branch, as long as there is no or little commercial interest to provide such services. Munir et al. [5] proposed three steps to follow when deploying a municipal wireless network: step 1: identifying goals, stakeholders and governing policy; step 2: designing the infrastructure and securing funding; and step 3: actual
A. Nungu and B. Pehrson
implementation. The authors assume that the municipalities are "ICT aware" and that they have access to technical competence and funding opportunities. These assumptions do not always apply in the under-served areas we are focusing on.
3 Framework for Establishment of Sustainable Broadband Markets
The balance between demand and supply required to sustain a broadband market in under-served areas can be improved beyond the limitations of traditional approaches through community interventions such as: raising ICT awareness and capacity building; understanding and developing the market; providing services that optimize existing solutions; innovative use of existing technologies; and the use of flexible business models. We discuss these items in detail below.

3.1 Awareness Raising and Capacity Building
There is a need to raise the awareness and competence of all stakeholders in the communication market: among consumers, so that they can demand the services they need to improve their performance; and among producers, so that they become competitive by providing innovative solutions in cost-effective ways and can manage and maintain the existing broadband networks and services efficiently. In our approach, awareness raising and capacity building are provided by involving students and faculty members at universities, local government and the private sector in a cooperative multi-stakeholder framework for the integration of development projects and problem-oriented, project-driven learning, including individual learning towards master and doctoral degrees, organisational learning towards certification, and consortia learning towards a deeper understanding among all stakeholders of a modern communication market [17].

3.2 The Broadband Market
A communications market consists of consumers, producers, policy-makers and regulators. This paper discusses the market at the municipal level.

Consumer: The consumers own the applications and request services. Municipalities are the main buyers of broadband services, for internal operations or for services to the citizens. Other consumers include the private sector, community organisations and households/citizens.

Producer: On the producer side there is a whole supply chain including the network operator(s), application developers, and service and content providers. Municipalities are content producers for public consumption and for business entities. The private sector should develop content and other services, such as training, for its own consumption as well as for selling to the government.
Policy-Maker and Regulator: Regulatory authorities are set up to translate legislation passed by the policy-makers into operational regulatory frameworks, to arbitrate between the interests within and between the consumer and producer groups, and to give government a fair share of the revenues of the communication business in terms of license fees. Our model includes a component focusing on advocacy to include a special category in the regulatory framework for community networks in under-served areas of no or little commercial interest, turning license fees into support, e.g. from universal access funds. Even though most of the policies are spelled out by the central government, municipalities should facilitate whatever is in their power and provide feedback to draw attention to their special needs.

3.3 Application and Services
When discussing broadband communication, there is often a fixation on Internet access to connect the under-served areas to the global village. However, in our approach this is secondary, as connection to the Internet (uplink) is very expensive. Our primary focus is on applications within the local network. The purpose is to support the development of the basic public sector services required to progress towards the MDGs, primarily healthcare, education, local administration and support to local entrepreneurs. Applications include telemedicine for consultation between rural health centres and district or referral hospitals, tele-teaching and sharing of learning material between schools, public administrative services, and portals for market information and for marketing of local entrepreneurs.

3.4 Technology
Technology selection at a specific location is affected by the demand for services, the regulatory environment, the physical environment, and what can be implemented and maintained sustainably. The availability of affordable high-performance, low-power network elements based on open source software and selected off-the-shelf hardware components facilitates the deployment of advanced networks. How advanced depends mainly on the availability of communication links. Wireless Fidelity (WiFi) technology has gained acceptance in community networks because mass production has made it cheap, and it is easy to install and maintain. Optical fibre links have high capacity and durability but require specialized training and tools for installation and maintenance. Any civil works involved in fibre deployment are expensive, although there are sometimes innovative cooperative approaches to this in rural areas. There are examples from rural areas of developed regions where local inhabitants, who control rights of way and have appropriate machinery, contribute the civil works while the telecom or power utility company provides the fibre cables and active network elements [10]. Once the fibre link is available, a 1 Gbps link over up to some 160 km, or a 10 Gbps link over up to some 80 km, is very cheap. Longer distances require signal amplifiers.
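The observation that longer fibre spans need amplification follows from a simple optical link-budget calculation. The sketch below uses generic planning figures (power budgets, fibre attenuation, margin) that are our own illustrative assumptions, not values from the paper:

```python
# Rough optical link-budget check. All numbers here are generic,
# illustrative planning figures, NOT values from the paper.

def max_span_km(power_budget_db, atten_db_per_km, margin_db=3.0):
    """Longest unamplified span whose fibre loss fits in the optical power
    budget, after reserving a margin for splices, connectors and aging."""
    return (power_budget_db - margin_db) / atten_db_per_km

# Hypothetical long-reach transceivers: ~30 dB budget at 1 Gbps,
# ~24 dB at 10 Gbps; standard single-mode fibre attenuates roughly
# 0.22 dB/km at 1550 nm.
reach_1g = max_span_km(30.0, 0.22)    # on the order of 120 km
reach_10g = max_span_km(24.0, 0.22)   # on the order of 95 km
```

Spans beyond what the budget allows need in-line amplification, which is why longer distances require signal amplifiers.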
A. Nungu and B. Pehrson

3.5 Business Model
Our approach is to attract the capital expenses (CAPEX) required to establish the network from the government budget or development agencies, to support basic public services, while the operational expenses (OPEX) have to come from consumers as fees paid for the services consumed. The management model will be affected by the source of funding as well as the guiding policies. However, a public-private partnership (PPP) is the most suitable, as it combines the benefits of both sides: the entrepreneurship and technical know-how of the private sector and access to funding on the government side.
4 The Serengeti Pilot
The ICT for Rural Development (ICT4RD) programme in Tanzania [7] is a multi-stakeholder research and development project with the objective of developing a scalable model for the establishment of sustainable broadband networks in under-served areas. To explore possible technical solutions, economic parameters, business models and models for support to local entrepreneurs, the programme established two pilot sites. This paper references the Serengeti pilot, deployed in the north of Tanzania and connecting two district capitals, to demonstrate the proposed framework.

4.1 Awareness Raising and Capacity Building
The project conducted several ICT awareness-raising workshops and training seminars. Eight Tanzanians have been trained at master of science level. During their thesis work, all of them contributed directly to the technical design and implementation. Also, most of the applications were developed or customized by students from the academic partners in collaboration with locals.

4.2 The Broadband Market
The findings of the baseline study carried out in 2006 [8] revealed that the main activities in both districts are agriculture and livestock, carried out by about 90% of the population. Also, both districts rely on the central government for about 95% of their budgets. When assessing the market in 2008, the author of [9] observed that both municipalities had several computers and also produced a lot of information for public consumption. Furthermore, the author noted a vibrant private sector and community organisations that could both consume and produce broadband services.

Policy: Tanzania passed its Universal Communications Service Access Act in 2006 and the Universal Communications Service Access Regulations in 2009. The establishment of the Universal Communications Service Access Fund (UCAF) was completed in 2009. The government has promised to use this fund to expand connectivity in rural areas.

The Serengeti network diagram is provided in Fig. 1; the network implementation is detailed in [6].
Fig. 1. Serengeti Network
4.3 Services and Applications
The Serengeti pilot is a self-sustained broadband island with its own network services hosted locally, running domain name, mail, web and VoIP servers. The idea is to facilitate communication within each sector and between sectors at district level. Current services include:

– e-Governance: The district website provides various information to the public. It also provides blogs, forums and chat for exchanging ideas and providing feedback.
– e-Health: District hospitals provide consultation to connected primary health centres via video-conferencing and VoIP.
– e-Learning: Various content is captured and stored on the server to be accessed by students and teachers.

Apart from the local broadband connectivity, there is also a narrowband VSAT connection to the Internet, which is used in various ways as discussed in [16]. Fig. 2 provides a summary of Internet utilization. As in any other region, social networks seem popular. Access to local newspapers also seems to increase over time.

4.4 Technology
Our model is about sharing the passive infrastructure at a cost-related rate. To connect the two districts, we use a fibre cable provided by the power utility company; in return, we offer them an Internet connection. WiFi, which is easily accessible and affordable and requires minimal knowledge and equipment to build and maintain, is used in the last mile, connecting the end users as shown in Fig. 1.
Fig. 2. Internet Traffic Growth
4.5 Business Model
An ICT Board, registered as a not-for-profit company with members from the government, the private sector and community representatives, was formed. It is a pure PPP model, tasked to oversee ICT matters in the district, including managing the broadband network. By providing services to government institutions (health, education and governance), it was able to raise CAPEX from a development partner, the Swedish International Development Agency (SIDA), to build the network. The OPEX, including the technician's salary and Internet fees, is covered by contributions from users. Current income, mainly from Internet charges, is 2.5M TZS, while expenses are 2M TZS. From their own budgets, the municipalities provide funds to extend the network to cover new schools and hospitals. Also, the PPP model attracts entrepreneurs who set up ICT centres, which extend the network and provide basic ICT training (capacity building) as well as Internet access points.
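The sustainability condition implied above — user fees covering operational costs — reduces to a trivial coverage calculation. The income and expense figures are the ones quoted in the text; the billing period and variable names are our own assumptions:

```python
# OPEX coverage check using the figures quoted in the text
# (assumed to be per accounting period; TZS = Tanzanian shillings).
income_tzs = 2_500_000    # mainly Internet charges
expenses_tzs = 2_000_000  # technician salary and Internet fees

coverage = income_tzs / expenses_tzs  # > 1.0 means fees cover operations
surplus_tzs = income_tzs - expenses_tzs
print(f"OPEX coverage: {coverage:.2f}x, surplus: {surplus_tzs:,} TZS")
```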
5 Conclusion
Our approach to stimulating the establishment of sustainable broadband markets in areas of little or no commercial interest has been validated in real pilot networks; the Serengeti pilot discussed in this paper is one of them. Our main contributions are: 1) a holistic, market-oriented approach involving all interested stakeholders in a localizable capacity-building framework integrating development activities and learning at the individual, organisational and consortia levels in parallel, and 2) the innovative use of leading-edge technology to establish a high-performance, low-power broadband network at very low cost.
While our approach has developed to a point where we are confident to disseminate it further, we will continue to stimulate and monitor the development of the pilots and to intensify our research on green networks and on risk-management methods that can attract commercial actors into under-served areas at an early stage.
References

1. The International Telecommunication Union (ITU), http://www.itu.int/en/broadband/Pages/default.aspx
2. ITU report: Measuring the Information Society – The ICT Development Index (2009)
3. United Nations Millennium Development Goals (MDGs), http://www.un.org/millenniumgoals/
4. Qiang, C.Z., Carlo, M.R.: Information and Communications for Development 2009: Extending Reach and Increasing Impact. 3550, World Bank, Washington, DC
5. Mandviwalla, M., Jain, A., Fesenmaier, J., Smith, J., Weinberg, P., Meyers, G.: Municipal Broadband Wireless Networks. J. Communications of the ACM (2008)
6. Nungu, A., Genesis, N., Pehrson, B.: Serengeti Broadband. In: The 2nd ACM SIGCOMM Workshop on Networked Systems for Developing Regions, Seattle (2008)
7. Information and Communication Technology for Rural Development (ICT4RD) Tanzania, http://www.ict4rd.ne.tz/
8. Mascarenhas, O.: Serengeti Baseline Study (2006)
9. Mlongetcha, M.: Towards Community Ownership and Sustainability of ICT4RD Projects in Tanzania (2008)
10. Skelleftea Community Broadband and IT, http://www.skelleftea.se/default.aspx?id=17245
11. Brewer, E., Demmer, M., Du, B., Fall, K., Ho, M., Kam, M., Nedevschi, S., Pal, J., Patra, R., Surana, S.: The Case for Technology for Developing Regions. J. IEEE Computer 38, 25–38 (2005)
12. Brewer, E., Patra, R., Surana, S.: Simplifying Fault Diagnosis in Locally Managed Rural WiFi Networks. In: NSDR 2007 (2007)
13. Surana, S., Patra, R., Nedevschi, S., Brewer, E.: Deploying a Rural Wireless Telemedicine System: Experiences in Sustainability. J. IEEE Computer 41(6), 48–56 (2008)
14. Gillett, S.E., Lehr, W.H., Osorio, C.: Local government broadband initiatives. J. Telecommunications Policy 28, 537–558 (2004)
15. Raman, B., Chebrolu, K.: Experiences in using WiFi for Rural Internet in India. J. IEEE Communications Magazine 45, 104–110 (2007)
16. Nungu, A., Pehrson, B.: Impact of ICT in Rural Areas: The Case Study of Chalinze ICT Center in Tanzania. In: 7th Open Access Conference, Accra, Ghana (2009)
17. Communication Systems Design, a problem oriented project-driven environment integrating development and learning, www.tslab.ssvl.kth.se/csd/
Characterization of BitTorrent Traffic in a Broadband Access Network

Zoltán Móczár and Sándor Molnár

High Speed Networks Laboratory
Dept. of Telecommunications and Media Informatics
Budapest Univ. of Technology and Economics
H-1117, Magyar tudósok krt. 2., Budapest, Hungary
[email protected],
[email protected]
Abstract. BitTorrent, one of the leading P2P file-sharing applications, generates dominant traffic in broadband access networks. In this paper we present the main characteristics of BitTorrent traffic based on actual measurements taken from a commercial network. Analysis results at both the application and flow levels are presented and discussed.

Keywords: traffic measurements, traffic analysis, BitTorrent.
1 Introduction
BitTorrent [1] is one of the most popular peer-to-peer (P2P) file-sharing applications on the Internet. It was designed by Bram Cohen in April 2001, and the first implementation was released on July 2, 2001. After a few years, BitTorrent reached extremely high popularity, with 10 million users in 2005, and its popularity has continued to increase over the past five years. The mechanism of BitTorrent is based on organizing the peers sharing the same file into a P2P network, and this method ensures efficient replication for distributing the shared file. The file is divided into small chunks, and a peer can download several chunks simultaneously. The exchange of file chunks is driven by an incentive mechanism, which enables peers with high uploading bandwidth to reach high downloading bandwidth. Such a mechanism can effectively prevent free riding, which is very common in P2P systems.

A majority of studies focus on the identification of P2P traffic, including popular P2P applications like BitTorrent, Gnutella, and eDonkey [2,3,4]. There are several other studies addressing the characteristics of BitTorrent traffic. Choffnes and Bustamante pointed out that testbed-based views of Internet paths are surprisingly incomplete concerning BitTorrent and many other applications. This message gives us a warning and emphasizes the need for using actual measurements for analysis [5]. In recent years, several articles have been published that analyze the behaviour of BitTorrent. For example, Erman et al. presented a study on modeling and evaluation of the session characteristics of BitTorrent traffic [6]. They found that session inter-arrivals can be accurately

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 176–183, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
modeled by the hyper-exponential distribution, while session durations and sizes can be reasonably well modeled by the log-normal distribution. Andrade et al. studied three BitTorrent content-sharing communities regarding resource demand and supply. The study introduced an accurate model for the rate of peer arrivals over time. They also found that a small set of users contributes most of the resources in the communities, but the set of heavy contributors changes frequently and is typically responsible for only a few of the resources used in the distribution of an individual file [7]. The authors of [8] used full packet traces collected at a large edge network. They focused on three flow-level metrics, namely flow size, flow inter-arrival time and flow duration. In addition, they also studied three host-level metrics: flow concurrency, transfer volume and geographic distance. Based on the results, they presented flow-level distribution models.

The main goal of this paper is a better understanding of the characteristics of BitTorrent traffic found in broadband access networks at both the application and flow levels. We have carried out a comprehensive analysis study based on measurements taken from a commercial network. Our results show the dominance of BitTorrent traffic, and we also found that the BitTorrent mechanism, combined with optical high-bandwidth symmetrical access, produced twice as much BitTorrent upload traffic volume as download traffic volume. The flow-level results give a detailed characterization of BitTorrent flow size, flow duration and number of packets, covering both the TCP and UDP flow components of BitTorrent traffic.

The paper is organized as follows. In Section 2 we discuss the details of the measurements, including the network architecture and analysis tools. Section 3 and Section 4 present our analysis results at the application and flow levels, respectively. Finally, Section 5 concludes the paper with our main results.
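The session models reported in [6] can be illustrated generatively: hyper-exponential inter-arrivals are a probabilistic mixture of exponentials, and durations follow a log-normal distribution. The parameters below are arbitrary placeholders for illustration, not the fitted values from the cited study:

```python
import random

def hyperexp(weights, rates):
    """Sample a k-phase hyper-exponential: choose a phase by weight,
    then draw an exponential variate with that phase's rate."""
    phase = random.choices(range(len(rates)), weights=weights)[0]
    return random.expovariate(rates[phase])

random.seed(42)

# Placeholder parameters, NOT the values fitted in the cited study.
inter_arrivals = [hyperexp([0.7, 0.3], [1 / 5.0, 1 / 60.0]) for _ in range(10_000)]
durations = [random.lognormvariate(6.0, 1.5) for _ in range(10_000)]  # seconds

# Mixture mean: 0.7 * 5 s + 0.3 * 60 s = 21.5 s
mean_ia = sum(inter_arrivals) / len(inter_arrivals)
```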
2 Traffic Measurements
Our measurements were taken from one of the commercial networks in Stockholm, Sweden. This company maintains network infrastructure for several service providers. These ISPs provide many different services, such as Internet access, IP telephony and IPTV, for both residential and business users. The traffic in the measurements was generated by more than 1800 customers. The network infrastructure of the Swedish backbone network and the related residential network are shown in Fig. 1. The network company did not give any further information regarding subscribers due to privacy concerns.

The backbone network consists of three core routers linked to each other with 3 Gbps optical fibres. The subscribers are connected to the area switches, and their traffic is aggregated in a migration switch through 100 Mbps links. The migration switch is linked to one of the core routers with a 1 Gbps capacity link. The workstation responsible for data capturing was connected to one of the core routers with a 1 Gbps capacity fibre. The router mirrored its traffic to the workstation, which let the capturing device store the data on its hard drives. Only the packet headers were captured, providing information for the analysis such as protocol, size and direction.
178
Z. M´ ocz´ ar and S. Moln´ ar
Fig. 1. The architecture of the measured network
Traffic identification was done with a tool developed by Ericsson Hungary. This software uses various techniques to identify the traffic, such as port-based, signature-based and heuristic-based approaches; however, the algorithm is not public. Table 1 describes the main parameters of the investigated traces, where FL denotes the flow-level data sources. After the preprocessing phase, the cleaned data were loaded into database tables using Microsoft SQL Server 2005. Data retrieval was performed by SQL queries, and the results were processed by Matlab routines. Matlab was also used for visualization and creating charts.

Table 1. Basic description of measurements
Trace  Measurement period, October 2008 [(day) hour:min]  Duration [hour:min]  Flows [million]  Packets [million]
FL-1   (7) 11:18 – (8) 22:16                               34:59                59.1             3892
FL-2   (7) 15:00 – (7) 15:59                               01:00                1.53             69.96

3 Application-Level Analysis of BitTorrent
In this section we give a basic description of the measured traffic, including the daily profile and a user ranking based on the number of incoming and outgoing packets, followed by a detailed discussion of the main characteristics of BitTorrent.
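The database-driven workflow of Section 2 — flow records loaded into SQL tables and aggregated by queries — can be sketched with a self-contained example. The schema and column names are invented for illustration; the actual tables and queries are not published:

```python
import sqlite3

# Toy stand-in for a flow table (schema and column names are invented).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flows (start_hour INTEGER, bytes INTEGER, proto TEXT)")
con.executemany(
    "INSERT INTO flows VALUES (?, ?, ?)",
    [(5, 1_000, "TCP"), (17, 9_000, "TCP"), (18, 4_000, "UDP"), (18, 2_000, "TCP")],
)

# Hourly traffic volume: the aggregate behind a daily-profile chart.
profile = con.execute(
    "SELECT start_hour, SUM(bytes) FROM flows "
    "GROUP BY start_hour ORDER BY start_hour"
).fetchall()
print(profile)  # [(5, 1000), (17, 9000), (18, 6000)]
```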
Fig. 2. Daily profile (FL-1)
The daily profile of the traffic is given in Fig. 2. We can see that the peak hours are between 17:00 and 20:00. In this time frame the traffic volume is about six times higher than in the non-busy morning hours around 5:00. Fig. 3 shows the measured incoming (downlink) packets as a function of the measured outgoing (uplink) packets. Every point in the figure corresponds to one user. The linear fit indicates that an average user generates approximately 25% more downlink than uplink packets, though the variance is quite large.
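A fit of the form y = 1.25x is a least-squares line constrained through the origin, for which the slope has the closed form Σxy / Σx². A small sketch on made-up per-user packet counts (not the measured data):

```python
def origin_slope(x, y):
    """Least-squares slope of a line forced through the origin: sum(xy)/sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Made-up per-user (outgoing, incoming) packet counts, for illustration only.
outgoing = [100, 400, 800, 1600]
incoming = [130, 490, 1010, 1990]

slope = origin_slope(outgoing, incoming)  # roughly 1.25 for this toy data
```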
Fig. 3. User ranking based on the number of incoming and outgoing packets (FL-1)
Table 2 shows the incoming, outgoing and total traffic volumes generated by BitTorrent. It clearly indicates that the amount of data uploaded by BitTorrent users exceeds the downlink traffic volume. More precisely, 68% of the total BitTorrent traffic is uplink traffic, twice the amount of the BitTorrent downlink traffic. This is due to the way BitTorrent is used: finished downloads stay in the queue and are automatically shared with other users. Users can remove these shared
files from the queue, but fortunately most of them do not, which is crucial for the efficient operation of the BitTorrent network. Consequently, their uplink traffic is increased by the finished and shared downloads. The volume of uplink traffic is not limited, since most of the users have symmetric optical high-bandwidth access.

Table 2. The traffic volume generated by BitTorrent (FL-2)
Incoming       Outgoing        Total
7.5 GB (32%)   15.8 GB (68%)   23.3 GB

4 Flow-Level Analysis of BitTorrent
In the following analysis both incoming and outgoing traffic have been considered, and the results relate to the total traffic. In the investigated one-and-a-half-day period, 1217 GB was downloaded (incoming traffic) and 1568 GB was uploaded (outgoing traffic). We observed that the dominance of BitTorrent uploading determines the general picture and makes the uplink traffic volume 30% higher than the downlink traffic volume. It also explains why the ratio is different for the number of incoming and outgoing packets (see Fig. 3), which is in contrast to the previous results.

Fig. 4. Relationship between flow size and duration (FL-2)
Fig. 4 illustrates the relationship between flow size and duration for BitTorrent in an enlarged view. Every point in Fig. 4 and Fig. 5 represents exactly one flow. There are two different clusters in Fig. 4: one concentrated around the horizontal and vertical axes, and another bounded by the vertical lines. Our analysis showed that the bytes carried by a BitTorrent flow are almost independent of the flow duration. We found that in the first cluster 98.3% of BitTorrent flows have a duration of less than 200 s and a size smaller than 4 kB.
Furthermore, 85.2% of these flows are transferred over UDP and only 14.8% of them are sent over TCP. Concerning duration, the flows of the second cluster fall in the interval between 300 s and 365 s. In this region about 97.2% of the flows are related to TCP, and only a negligible amount of data is transmitted over UDP.

Fig. 5. Relationship between flow size and number of packets (FL-2)
Fig. 5 depicts the relationship between flow size and number of packets. The largest packet size is close to the maximum Ethernet frame size of about 1500 bytes, and it can be observed that BitTorrent flows are highly scattered in the examined dimensions. Fig. 6 shows the histogram of the number of packets. We can observe that the number of BitTorrent flows exhibits a heavy-tailed decrease beyond roughly 100 packets. Deeper analysis revealed the interesting property that over the whole measurement period only 0.1% of flows contain a unique number of packets.

Fig. 7 shows the relationship between flow size and duration for TCP and UDP flows, respectively. BitTorrent is the most dominant P2P application in FL-2, contributing 46.8% of the total traffic. We can observe a marked difference between TCP and UDP flows in this figure. In the case of TCP flows, the graph is composed of two clusters: one mainly consisting of horizontal lines and filling almost the entire region between sizes of 90 B and 6 kB and durations of 100 µs and 320 s, and another covering a large range in size, between 1.5 kB and 580 MB, but a small range in duration, between 5 s and 50 min. In contrast to TCP, we find only a single cluster for UDP flows, similar to the cluster concentrated around the horizontal lines in Fig. 7a. For the whole measurement period, although only 16% of BitTorrent flows are transferred over TCP and 84% over UDP, TCP flows carry almost 99% of the total bytes. The reason is that BitTorrent basically uses TCP for file transfer, while some newer clients implement a UDP-based method to communicate with the tracker servers. Tracker communication does not need to transfer large volumes of data, but it generates numerous flows.
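The flow-count versus byte-volume asymmetry described above (84% of flows over UDP, yet almost 99% of bytes over TCP) is a per-protocol aggregation. On toy records invented to mimic that skew:

```python
from collections import defaultdict

# Invented flow records (protocol, bytes): many tiny UDP tracker flows,
# a few large TCP transfer flows -- mimicking the reported skew.
flows = [("UDP", 300)] * 84 + [("TCP", 2_000_000)] * 16

counts, volumes = defaultdict(int), defaultdict(int)
for proto, nbytes in flows:
    counts[proto] += 1
    volumes[proto] += nbytes

udp_flow_share = counts["UDP"] / len(flows)
tcp_byte_share = volumes["TCP"] / sum(volumes.values())
print(f"UDP flow share: {udp_flow_share:.0%}, TCP byte share: {tcp_byte_share:.2%}")
```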
Fig. 6. Histogram of the number of packets (FL-1)

Fig. 7. Relationship between flow size and duration (FL-2): (a) TCP flows, (b) UDP flows

5 Conclusion
In this paper we presented a comprehensive traffic characterization study of BitTorrent based on actual measurements taken from a commercial network. We found that the total number of incoming (download) packets is 25% higher than the total number of outgoing (upload) packets. However, the traffic volume shows a different picture: the outgoing traffic volume is about 30% higher than the incoming traffic volume. This observation is explained by the dominance of BitTorrent usage, where the amount of data uploaded by an average user is in many cases much higher than the amount downloaded, supported by the symmetric optical high-bandwidth access of most of the users. It resulted in twice as much BitTorrent upload traffic volume as download traffic volume.

The flow-level analysis revealed that the bytes carried by a BitTorrent flow are almost independent of the flow duration, and almost all of the flows have a duration of less than 200 s and a size smaller than 4 kB. We also found that almost all of the flows with a duration between 300 s and 365 s are TCP flows. The size of the largest packet is near the maximum size of the Ethernet frame,
close to 1500 bytes, and BitTorrent flows are highly scattered in the dimensions of size and number of packets. Furthermore, the number of BitTorrent flows exhibits a heavy-tailed decrease once the number of packets exceeds a certain value, and only 0.1% of flows contain a unique number of packets. Although only 16% of BitTorrent flows are transferred over TCP and 84% of them are sent over UDP, TCP flows carry almost 99% of the total bytes.
Acknowledgement

We thank Sollentuna Energi AB for the measurements, and Ericsson Sweden and Ericsson Hungary for the cooperation and for helping us access the data. The research was supported by NKTH-OTKA grant CNK77802.
References

1. BitTorrent homepage, http://www.bittorrent.org/index.html
2. Sen, S., Spatscheck, O., Wang, D.: Accurate, Scalable In-Network Identification of P2P Traffic. In: WWW (2004)
3. Karagiannis, T., Papagiannaki, K., Faloutsos, M.: BLINC: Multilevel Traffic Classification in the Dark. In: SIGCOMM (2005)
4. Dang, T.D., Perényi, M., Gefferth, A., Molnár, S.: On the Identification and Analysis of P2P Traffic Aggregation. In: Boavida, F., Plagemann, T., Stiller, B., Westphal, C., Monteiro, E. (eds.) NETWORKING 2006. LNCS, vol. 3976, pp. 606–617. Springer, Heidelberg (2006)
5. Choffnes, D., Bustamante, F.: Pitfalls for Testbed Evaluations of Internet Systems. ACM SIGCOMM Computer Communication Review 40, 43–50 (2010)
6. Erman, D., Ilie, D., Popescu, A.: BitTorrent Session Characteristics and Models. In: Proceedings of the 3rd International Conference on Performance Modelling and Evaluation of Heterogeneous Networks, Ilkley, West Yorkshire, U.K., pp. 1–10 (2005)
7. Andrade, N., Neto, E.S., Brasileiro, F., Ripeanu, M.: Resource Demand and Supply in BitTorrent Content-Sharing Communities. Computer Networks 53, 515–527 (2009)
8. Basher, N., Mahanti, A., Mahanti, A., Williamson, C., Arlitt, M.: A Comparative Analysis of Web and Peer-to-Peer Traffic. In: Proceedings of the 17th International Conference on World Wide Web, Beijing, China, pp. 287–296 (2008)
SELFMAGICNETS 2010
Remediating Anomalous Traffic Behaviour in Future Networked Environments Angelos K. Marnerides1, Matthew Jakeman1, David Hutchison1, and Dimitrios P. Pezaros2 1 Infolab21, Computing Department Lancaster University, Lancaster, UK {a.marnerides,m.jakeman,dh}@comp.lancs.ac.uk 2 Department of Computing Science University of Glasgow, Glasgow, UK
[email protected]
Abstract. The diverse characteristics of network anomalies, and the specific recovery approaches that can subsequently be employed to remediate their effects, have generally led to defence mechanisms tuned to respond to specific abnormalities; and they are often suboptimal for providing an overall resilience framework. Emerging future network environments are likely to require always-on, adaptive, and generic mechanisms that can integrate with the core networking infrastructure and provide for a range of self-* capabilities, ranging from self-protection to self-tuning. In this paper we present the design and implementation of an adaptive remediation component built on top of an autonomic network node architecture. A set of pluggable modules that employ diverse algorithms, together with explicit cross-layer interaction, has been engineered to mitigate different classes of anomalous traffic behaviour in response to both legitimate and malicious external stimuli. In collaboration with an always-on measurement-based anomaly detection component, our prototype facilitates the properties of self-optimisation and self-healing. Keywords: Future and autonomic networks, resilience, remediation.
1 Introduction

The design of future autonomic architectures requires the incorporation of self-* properties that will ensure optimal network operation in the face of the dynamically changing behaviour of next-generation, converged networked environments. The merging of heterogeneous systems and networks within such environments requires that effective measurement and control mechanisms be composed in a multidimensional fashion across all three planes: data, control and management.

In this paper, we present the design of resilient systems that will provide and maintain an acceptable level of service in the face of various challenges to normal operation [8]. We particularly focus on network traffic anomalies, since they pose great challenges to e-infrastructures and since their remediation, once detected, is a non-trivial problem. This is mainly due to the many different types

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 187–197, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
of anomaly that can be triggered by either legitimate or malicious intent, and whose mitigation cannot easily be provided by existing models or mechanisms. Within the context of autonomic communications, anomalies span dynamically across different systems and networks, and hence deploying recovery mechanisms to remedy them involves hard design and implementation decisions. Such mechanisms need to be supported by infrastructures that provide flexible design frameworks. The EU FP6 Autonomic Network Architecture (ANA) project has successfully deployed an autonomic node (ANA node) which, via its specially developed framework and API, facilitates the design of such complex mechanisms.

Based upon the resilience requirements we have identified within ANA, we have designed an architecture composed of an anomaly detection unit and an anomaly remediation engine [9]. Synergistically, and by employing cross-layer information exchange, both components empower the properties of self-protection, self-learning, self-optimisation and self-healing at the onset of anomalous traffic behaviour. The first two are provided by the detection unit, whereas our remediation engine accommodates the latter two properties and is embedded within the resilience architecture built on top of the ANA node. Through our algorithmic design, we provide explicit interaction between the application and network layers, which can deliver performance benefits for each layer with respect to recovery from an ongoing anomaly (e.g. a Flash Event (FE) or a DoS attack). The design complements our previous work [13][11], and our implementation has enabled the instrumentation of diverse remediation strategies within a flexible API such as the one provided by the ANA architecture.

The remainder of this paper is structured as follows: Section 2 briefly describes our resilience architecture that hosts the remediation mechanism, and introduces the ANA node infrastructure.
Section 3 shows the internal architecture of our remediation framework as well as the algorithmic design and evaluation. Section 4 presents the prototype implementation, whereas section 5 outlines future work and concludes the paper.
2 Resilience Architecture

The resilience architecture is encapsulated as an integral part of an ANA node and is composed of two Functional Blocks (FBs): the Detection Engine (DE) and the Remediation Engine (RE). FBs constitute one of the most primitive abstractions within ANA and their operations may reside either on the control or the management plane. They can be deployed locally on a single ANA node or be distributed within a compartment. A compartment within ANA is the operational service context where FBs cooperate to provide a service of any type [14]. In its implemented form, an ANA node is a microkernel that provides a message switching broker service to Functional Blocks (FBs). It is composed of three core segments: the Minimal Infrastructure for Maximal Extensibility (MINMEX), the ANA Playground and the ANA hardware abstraction layer. The MINMEX is the means for allowing and providing the basic low-level functionalities which are required for
Remediating Anomalous Traffic Behaviour in Future Networked Environments
bootstrapping and running ANA. At the same time it provides the generic set of methods (API) that are used by its “clients” (e.g. applications, protocols). The most complex and advanced networking functionalities within an ANA node reside in the ANA ‘Playground’ (i.e. an area for experimentation and deployment). This segment of the node hosts both commodity and bespoke components (e.g. cryptographic primitives, compression schemes, error recovery codes), and it couples a development environment with an execution environment, allowing the implementation of FBs.
Fig. 1. Resilience Architecture within an ANA node
Fig. 1 shows our resilience framework, which is engineered within the ANA Playground and follows the API principles provided by it. Since it is a measurement-based framework, the architecture purely depends on the functionality offered by a dedicated Adaptive Measurement/Monitoring Functional Block (AM FB) that consists of 5 FBs available via the Playground. Data passing and message interaction between the measurement framework and the resilience architecture are achieved through certain data interfaces which in ANA are referred to as Information Dispatch Points (IDPs). IDPs act as generic communication pivots between the various FBs and they offer the advantage of re-organizing communication paths between the FBs. IDPs also allow the implementation of forwarding tables which are fully decoupled from addresses and names. For example, in ANA, the next-hop “entity” may reside either locally on the system or within a compartment and is always identified by an IDP. Consequently, this allows for the easy addition and usage of new networking protocols and technology, as long as their communication services are exported as IDPs. There are several IDPs published (e.g. the flow reception IDP from the DE; the notification reception IDP from the RE; the sampling and capturing configuration IDP from the AM FB) as services from both the Resilience FBs and the AM’s FBs, albeit not included in Fig. 1 for the sake of simplicity.
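The indirection that IDPs provide can be illustrated with a small sketch. This is hypothetical code, not the actual ANA API: senders address an opaque IDP label, and the node can rebind that label to a different FB without the senders noticing.

```python
# Hypothetical sketch of the IDP (Information Dispatch Point) indirection
# described above -- not the actual ANA API. An IDP decouples the sender
# from the receiving functional block (FB): senders address an IDP label,
# and the node can rebind that label to a different FB at any time.

class Node:
    def __init__(self):
        self._idps = {}          # IDP label -> handler callable
        self._next_label = 0

    def publish(self, handler):
        """An FB exports a service; the node returns an opaque IDP label."""
        label = self._next_label
        self._next_label += 1
        self._idps[label] = handler
        return label

    def rebind(self, label, new_handler):
        """Re-organize the communication path: senders keep using the same
        IDP label, unaware that a different FB now serves it."""
        self._idps[label] = new_handler

    def send(self, label, message):
        return self._idps[label](message)


node = Node()
idp = node.publish(lambda msg: "detection-engine:" + msg)
print(node.send(idp, "flow-record"))    # handled by the original FB
node.rebind(idp, lambda msg: "upgraded-DE:" + msg)
print(node.send(idp, "flow-record"))    # same IDP label, new FB
```

This is also why forwarding tables built on IDPs can stay decoupled from addresses and names: only the label is stable, not the entity behind it.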
A.K. Marnerides et al.
However, the main purpose of this section is to explain the main interaction between the resilience and measurement frameworks and to show how resilience can be accommodated. Initially, the DE receives flow information from the AM FB on a dedicated IDP and internally performs entropy estimation on selected flow features (e.g. packet inter-arrival time, byte count), in order to provide a prediction about the evolutionary behaviour of the traffic and to detect possible anomalies. Subsequently, the entropy results are passed to a Supervised Naïve Bayesian classifier which, based on past classified traffic, can compare and further categorize a possible anomaly to its correct label (e.g. DDoS, alpha flows, Flash Events (FEs)), and decide whether an anomaly occurs in a local or a compartment-wide scenario. According to the scope of the anomaly, the DE is then responsible for notifying the appropriate RE, which may reside locally (on the same ANA node) or remotely within the same compartment. The operations undertaken by the DE FB are carried out by two main sub-units which, following the ANA terminology, are referred to as ‘bricks’. Their complex functionalities pose great overall design and implementation challenges which we do not intend to describe in this document, since we are focusing on the explanation of the RE. A detailed description of the internals of the DE can be found in [9], [10] and [12].
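As an illustration of the two DE steps just described, and not the authors' implementation, entropy estimation over a flow feature followed by a simple Bayesian-style comparison against learned per-class profiles might look like the sketch below. The class labels, profile values and sample packet sizes are invented for the example.

```python
# Illustrative sketch of the DE pipeline: Shannon entropy over a sampled
# flow feature, then a naive-Bayes-style match against per-class entropy
# profiles (mean, std) learned from past classified traffic. The profiles
# and sample values are made up for illustration.
import math
from collections import Counter

def shannon_entropy(values):
    """Empirical entropy (bits) of a sampled flow feature, e.g. packet sizes."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def classify(entropy, profiles):
    """Pick the label whose learned entropy profile (mean, std) gives the
    observed entropy the highest Gaussian likelihood."""
    def likelihood(mean, std):
        return math.exp(-((entropy - mean) ** 2) / (2 * std ** 2)) / std
    return max(profiles, key=lambda label: likelihood(*profiles[label]))

# Invented per-class entropy profiles (mean, std):
profiles = {"normal": (3.0, 0.5), "DDoS": (0.8, 0.3), "FlashEvent": (1.8, 0.4)}

h = shannon_entropy([64, 64, 64, 64, 1500, 64, 64, 64])  # near-constant sizes
print(classify(h, profiles))   # low entropy matches the "DDoS" profile best
```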
3 Remediation Engine Design

3.1 Remediation Engine Internal Architecture

The composition of the architecture described below has been a challenging task, since we initially had to identify and further evaluate algorithms that would be appropriate as robust remediation strategies. In parallel, we had to consider whether our design was practically feasible to instrument using the ANA API. This subsection mainly presents the engineering aspect of our design, followed by the algorithmic evaluation. Fig. 2 presents the RE internal architecture, which is in charge of mitigating the effects of a local or compartment-wide traffic anomaly. The RE is composed of two main functional modules: the Defender and the Messenger. The former executes node-local remediation algorithms, and the latter distributes the instance of an event to remote REs within a network compartment, as required. Apart from the two core bricks, there is also the Configuration Manager (CM), which is an infrastructural unit and provides for the dynamic binding and configuration of the overall RE with a local or remote DE. Due to the diversity of network anomalies and the different effective remediation strategies that can be deployed according to their nature, we have focused on two broad categories: Denial of Service (DoS), including distributed DoS (DDoS), and Flash Events (FEs). We have therefore considered two families of remediation algorithms for our Defender module, namely traffic shaping and dropping (in response to e.g. a DDoS attack), and geographic region-aware clustering and load-balancing (in response to a FE), respectively.
Fig. 2. Remediation Engine Internal Architecture
This latter remediation strategy enforces traffic prioritisation to ensure path diversity, and maximises throughput by propagating popular content to geographically diverse areas of the network. It then uses cross-layer information to redirect clients to alternative sources of content, and therefore reduces link stress, effectively providing for network self-optimization [11]. In addition, due to path diversity increasing in-network, this algorithm is inherently distributed, since an RE instructs a peer to take a clustering action on its behalf. To a lesser extent, the same holds for the (D)DoS remediation based on packet dropping, since distribution among multiple REs can be exploited as an efficient pushback mechanism that enhances system and network self-healing, and alleviates the detrimental effects of an attack at the “last mile”.

3.2 Remediation Strategies

A core process within our engine’s development lifecycle was the algorithm selection and evaluation phase undertaken before the actual prototype implementation. As already mentioned, our intention was to accommodate diverse remediation strategies that would enable self-optimization and self-healing at the onset of two particular types of anomalies: FEs and (D)DoS. In parallel with region-clustering and load balancing, which are considered beneficial for DDoS defence [1], we strongly believe that the remediation of such events is required to include a dropping mechanism. In either a distributed or a local attack scenario, our dropping methodology is collaboratively accommodated with the actions taken by the Functional Composition (FC) framework. The FC, as presented in [14], is a core infrastructural unit present in any ANA node that leverages self-management capabilities on the data, control and management planes. One of the features of the FC framework is congestion management, for which a packet dropping
utility is provided. The FC contains the Packet Sink brick, which employs the traditional Random Early Detection (RED) algorithm [7], and its services are visible within the same network compartment. Therefore, at the onset of a (D)DoS attack, our Remediation Engine (RE) sends a notification to the FC containing a list of the malicious source addresses. Subsequently, the Packet Sink brick is notified by the FC and marks the reported packets as “optimal”, dropping them immediately in order to block the attack traffic. Since the Packet Sink brick supports RED, we inherit some of its terminology in our scheme, designating as “optimal” those packets holding a high dropping preference in the system’s queue. In addition to the dropping mechanism offered by the FC, we have evaluated the gains of diverse remediation strategies for fast content propagation, applicable to events of legitimate, yet adverse, requests for content hosted over a single network topology. We have simulated P2P file sharing overlays to demonstrate how application-network cross-layer interaction can alleviate the detrimental effects of flooding phenomena such as Flash Events. We have developed two cross-layer services that can be made accessible to an ANA network compartment (as well as to conventional ISP networks), namely a “distance” and a “region-awareness” service. The “distance” service is a facility that simply takes IP addresses as input and returns a distance measurement between the source and the requester. Since distance may be specified in several ways in ANA, in our case we have used the Autonomous System (AS) path length, which returns an AS Proximity (ASP) metric, and then selects the least-AS-distant content provider. “Region-awareness” is a service that also takes a set of addresses as input, and returns these addresses clustered into several subsets according to the different paths traversed between two nodes [13].
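A minimal sketch of the “distance” service under these assumptions follows. The AS paths and provider addresses below are invented for illustration; a real deployment would derive them from routing information.

```python
# Sketch of the "distance" service described above: given the AS path from
# the requester to each candidate content provider, compute an AS Proximity
# (ASP) metric and pick the least-AS-distant provider. Paths and addresses
# are invented for the example.

def asp(as_path):
    """ASP metric: here simply the AS-path length (number of inter-AS hops)."""
    return len(as_path) - 1

def least_distant_provider(paths_to_providers):
    """paths_to_providers: provider address -> AS path from the requester."""
    return min(paths_to_providers, key=lambda p: asp(paths_to_providers[p]))

paths = {
    "10.0.0.1": ["AS100", "AS200", "AS300", "AS400"],  # 3 AS hops
    "10.0.0.2": ["AS100", "AS500"],                    # 1 AS hop
    "10.0.0.3": ["AS100", "AS200", "AS600"],           # 2 AS hops
}
print(least_distant_provider(paths))   # the 1-hop provider wins
```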
We have developed a number of augmented and region-aware overlay algorithm variants in order to demonstrate how operation can be optimised using explicit cross-layer interaction. The augmented overlay algorithm uses the “region-awareness” service so that providing peers can load-balance their response traffic. When a provider reaches its simultaneous serving threshold, it explicitly redirects further clients to the subset of peers (providers) it has already served through the same “regional” cluster that the incoming request came from. Using this strategy, providers tend to spread the overlay traffic load to diverse segments of the underlay infrastructure. In addition, we have developed variants of a Region-aware Overlay (RegO) algorithm which, in contrast to the augmented algorithm, uses a central tracker facility to provide requesters with all currently serving overlay peers [13]. The RegO implements Random (Ran) provider selection and region-aware load balancing of requests. Within our simulations both algorithms have employed two variants of clustering, namely Simple Clustering (SC) and Hierarchical Clustering (HC). In SC, the server clusters requests into segments that are based on the first-hop egress link traversed by the response traffic. Simply enough, the total simultaneous request threshold defined
as T is divided by the server’s ν egress links, and τ_i = T/ν simultaneous requests are served per cluster. The HC variant accommodates a hierarchical clustering of the requests based on the response traffic traversed over both the first and second hops. Under the HC scenario the initial threshold T is divided by the number of first-hop egress links ν to produce i first-hop thresholds τ_i = T/ν, each of which is further divided by the number of egress links attached to first hop i.
We have used NS2 [15] and BRITE [4] to construct numerous power-law AS-level topologies comprising 100, 500, and 1000 AS nodes, each having a minimum degree of 2, 3 and 4 links per AS leaf [3], [5]. The gains of the explicit underlay/overlay interaction were assessed under the scenario of spreading the so-called first chunk (in our case 1 MB) of content among participating peers, which serve at most 10 simultaneous transfers each. The performance metrics used were the individual transfer throughput in KB/s and the maximum link stress over the complete Internet-wide topology. Fig. 3 shows the effect of the cross-layer algorithms in increasing transfer throughput over different AS-size topologies and their respective overlays.
Fig. 3. Percentage increase in mean individual transfer throughput for cross-layer algorithms
Fig. 4. Percentage decrease in topology-wide maximum link stress for the cross-layer algorithms
Variations in throughput increase with respect to the minimum access edge degree of the topologies are also shown. The solid lines represent the performance gains of the cross-layer algorithms with first-hop SC, and the dashed lines show their second-hop HC counterparts. The decrease of maximum link stress for the same AS-level topologies as those in Fig. 3 is presented in Fig. 4. Even though there is no clear correlation between throughput and link stress, it is evident that on average the cross-layer algorithms outperform their simple overlay counterpart. It is also evident that every algorithm employing hierarchical clustering on the requesting peers consistently outperforms simple clustering based on the network access link. In addition, the augmented algorithm with ASP provider selection presents significant gains over the rest of the algorithms, with a 50% optimisation. A quite appealing general observation is that the length of the content providers list (i.e. the number of alternative sources) is of major importance neither for reduced link stress nor for increased transfer throughput.
4 Remediation Engine Implementation

The Remediation Engine (RE) is composed of two main entities, the Defender and the Messenger. These are both implemented in the form of ANA bricks, the most atomic elements of functionality within an FB. Fig. 5 displays a conceptual overview of the data communication that takes place once an anomaly has been detected by the DE. The Defender publishes the rmInfoIDP, which is responsible for receiving anomaly information from the Detection Engine (DE).
Fig. 5. RE data flow & functionality
The information the DE sends is a structure containing the result of the Bayesian classification along with an indication of whether the abnormality has a local or compartment-wide impact. Subsequently, this information is passed on to the Decision Unit (DU) that resides within the Defender.
The DU is in charge of processing the information received from the DE and deciding whether the anomaly has been classified as a (D)DoS attack or a FE. Subsequently, it forwards the information to the appropriate processing unit. Acting as an integral part of the Defender, the Local Action Unit (LAU) is responsible for informing the Functional Composition (FC) framework what actions to take based on the nature of the attack. This is achieved by constructing a LUA [6] message that the FC can use to insert appropriate traffic filters. If the anomaly is a DoS attack, the DE information is passed to the Drop Unit, whose responsibility is to inform the Local Action Unit (LAU) that packets need to be dropped. In cases where the attack has been classified as local, the LAU still needs to interact with the local FC instance in order to inform it that local dropping is required. In addition, the LAU provides a list of the malicious source addresses to the FC, which via its Packet Sink brick marks them as “optimal” so that the employed Random Early Detection (RED) mechanism drops them immediately. In parallel with dropping, the FC constructs a filter based on the listed source addresses and triggers its congestion control utilities with an initial packet inspection. Similarly, in the case of a compartment-wide attack (i.e. DDoS), it is necessary for the LAU to inform the Messenger using its defenderIDP so the Messenger can distribute information about the anomaly to all REs within the same network compartment. In the scenario of a FE, the anomaly-related information is passed on to the Algorithm Selection Unit (ASU). The ASU is the unit in charge of deciding which of the algorithms discussed previously in this paper is to be employed by the Remediation Engine (RE).
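The DU dispatch described above can be sketched as follows. The function and message field names are our own shorthand for illustration, not the actual brick API.

```python
# Illustrative sketch of the Decision Unit dispatch: a classified anomaly
# arriving on rmInfoIDP is routed either to the Drop Unit path (DoS/DDoS ->
# LAU -> FC) or to the Algorithm Selection Unit path (Flash Event). The
# action tuples and field names are invented for the example.

def decision_unit(anomaly):
    """anomaly: dict with 'label' (classifier result), 'scope' ('local' or
    'compartment') and the offending 'sources' addresses."""
    actions = []
    if anomaly["label"] in ("DoS", "DDoS"):
        # Drop Unit -> LAU: instruct the local FC to mark the listed sources
        # as "optimal" (highest dropping preference) for RED.
        actions.append(("LAU->FC", "drop", anomaly["sources"]))
        if anomaly["scope"] == "compartment":
            # LAU -> Messenger (via defenderIDP): propagate to remote REs.
            actions.append(("LAU->Messenger", "broadcast", anomaly["sources"]))
    elif anomaly["label"] == "FlashEvent":
        # ASU selects a remediation algorithm; FEs are compartment-wide,
        # so the choice is distributed via the Messenger.
        algorithm = "RegO-SC"   # example selection
        actions.append(("ASU->LAU->Messenger", algorithm, anomaly["sources"]))
    return actions

print(decision_unit({"label": "DDoS", "scope": "compartment",
                     "sources": ["192.0.2.7", "192.0.2.9"]}))
```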
Once a decision is made, and since the FE is a compartment-wide phenomenon, the ASU informs the LAU in order to construct an appropriate LUA message and send it over to the local Messenger instance. The primary purpose of the Messenger is to disseminate information about an anomaly to other nodes in a compartment if the anomaly was classified as not just affecting the local host. When it receives a message over the defenderIDP, it checks what type of anomaly has been detected in order to construct an appropriate message to send to other Messengers in the same network compartment. Messengers also transmit Autonomous System Proximity (ASP) metrics as well as routing information (i.e. next hop) between themselves, since this information is required by the region-aware and load balancing algorithms as deployed by the ASU. When next-hop load balancing is being performed, the Defender can inform the Functional Composition Functional Block (FC FB) where to forward traffic based on the next hop, and the FC FB can place appropriate filters to route the traffic. In a DDoS attack event, the Messenger also constructs a new message containing a list of source addresses that have been identified as originating nodes in the attack. A parallel process performed by the Messenger is to resolve all Messenger-compatible reception data IDPs (i.e. recvDataIDP) that are public and visible within the same compartment so it can broadcast this message to other Messenger bricks. When another brick of the same type receives this message, it is then eligible to inform its local Defender that it is required to remediate the attack. This action is achieved via an internally visible IDP, namely the messRecvIDP. As soon as this information is passed to the Defender, the appropriate local action is triggered (i.e. dropping via the FC FB) in the same way it is when the Defender is informed of a local DoS by the DE.
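The Messenger's resolve-and-broadcast behaviour can be sketched as below. The class and method names mirror the IDP names in the text but are otherwise invented, and a plain list stands in for each node's local Defender.

```python
# Sketch of the Messenger's compartment-wide dissemination: resolve every
# public recvDataIDP in the compartment, broadcast the anomaly message, and
# let each receiving Messenger hand it to its local Defender via its
# messRecvIDP. Names are invented; a list stands in for the Defender.

class Compartment:
    def __init__(self):
        self.recv_data_idps = []    # publicly visible Messenger endpoints

    def resolve_recv_data_idps(self):
        return list(self.recv_data_idps)

class Messenger:
    def __init__(self, compartment, defender_log):
        self.compartment = compartment
        self.defender_log = defender_log       # stand-in for the local Defender
        compartment.recv_data_idps.append(self.mess_recv_idp)

    def broadcast(self, message):
        for idp in self.compartment.resolve_recv_data_idps():
            if idp != self.mess_recv_idp:      # deliver to other nodes only
                idp(message)

    def mess_recv_idp(self, message):
        # inform the local Defender so it triggers the remediation locally
        self.defender_log.append(message)

comp = Compartment()
logs = [[], [], []]
messengers = [Messenger(comp, log) for log in logs]
messengers[0].broadcast({"anomaly": "DDoS", "sources": ["192.0.2.7"]})
print([len(log) for log in logs])   # node 0 does not deliver to itself
```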
Similarly, at the onset of a FE, the Messenger interacts with other REs via their Messenger bricks and sends a message in the compartment informing it about the FE, along with a list containing the addresses that are directly related to the phenomenon. The Messenger additionally transmits a notification stating the algorithm that the ASU decided is appropriate for compartment-wide deployment. For instance, under the circumstance where a FE has an impact on next-hop nodes within the same compartment, the ASU decides and informs the Messenger to notify the next-hop Messenger instances that the most suitable clustering algorithm for all of them to collaboratively perform for confronting the phenomenon is the Region-aware Overlay - Simple Clustering variant (RegO SC). Subsequently, any Messenger that receives this information passes the algorithm selection request on to its local Defender, which triggers the requested scheme.
5 Conclusions and Future Work

In this paper, we have presented the design and implementation of a traffic anomaly remediation component that can be an integral part of next generation autonomic network infrastructures. Through our design and implementation, we have demonstrated that the correct exploitation of carefully designed infrastructures enables complex issues, such as the remediation of network anomalies, to be effectively resolved. Our remediation framework prototype contributes towards the instrumentation of diverse remediation methodologies and empowers core autonomic properties such as self-optimization and self-healing. Following the promising results obtained through large-scale simulation, ongoing work focuses on testing our framework under actual operational conditions and traffic scenarios. Our intention is to evaluate our remediation mechanisms within the overall resilience architecture that we have also presented in this paper. The evaluation will be conducted in the autonomic communication testbed (ANA-Lab) provided within the ANA project. The ANA-Lab offers the capability of virtual topology instrumentation of ANA nodes through distributed monitoring and control facilities. Our main objective is to examine the practical system performance of our prototype through experiments using live as well as pre-captured operational traffic traces.

Acknowledgements. The authors wish to thank the EU/FP6 IST Autonomic Network Architecture (ANA) project for its partial support of the research presented here.
References

[1] Asosheh, A., Ramezani, N.: A Comprehensive Taxonomy of DDoS Attacks and Defence Mechanism Applying in a Smart Classification. WSEAS Transactions on Computers 7(7), 281–290 (2008)
[2] Autonomic Network Architecture (ANA) Project details, http://www.ana-project.org
[3] Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (October 1999)
[4] Boston University Representative Internet Topology Generator (BRITE), http://www.cs.bu.edu/brite
[5] Bu, T., Towsley, D.: On distinguishing between Internet power law topology generators. In: IEEE INFOCOM 2002, New York, USA, June 23-27 (2002)
[6] De Figueiredo, L.H., Ierusalimschy, R., Celes, W.: LUA: An Extensible Embedded Language. Journal of Software Tools 21(12) (1996)
[7] Floyd, S., Jacobson, V.: Random Early Detection gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking 1, 397–413 (1993)
[8] Hutchison, D., Sterbenz, J.P.G., Jabbar, A., Sholler, M.: D3.2: Resilience/Security Framework, Deliverable D3.2 ANA (December 2006)
[9] Marnerides, A.K., Pezaros, D.P., Hutchison, D.: Detection and Mitigation of Abnormal Traffic Behaviour in Autonomic Networked Environments. In: 4th ACM SIGCOMM CoNEXT Student Workshop, Madrid, Spain, December 9-12 (2008)
[10] Marnerides, A.K., Pezaros, D.P., Hutchison, D.: Autonomic Diagnosis of Anomalous Network Traffic. In: 4th IEEE WoWMoM Workshop on Autonomic and Opportunistic Communications (AOC 2010), Montreal, Canada, June 14-17 (2010)
[11] Pezaros, D.P.: Cross-Layer Optimisation of Network Response at the Onset of Bursty Requests. In: Proceedings of Multi-Service Networks (MSN 2006), Cosener’s House, Abingdon, UK, July 13-14 (2006)
[12] Pezaros, D.P., Marnerides, A.K., Hutchison, D.: D3.10: Measurement-based Resilience Mechanisms, Deliverable D3.10 ANA (December 2008)
[13] Pezaros, D.P., Mathy, L.: Explicit Application-Network Cross-layer Optimisation. In: 4th International Telecommunication NEtworking WorkShop (IT-NEWS) on QoS in Multiservice IP Networks (QoS-IP 2008), Venice, Italy, February 13-15 (2008)
[14] Sifalakis, M., Louca, A., Peluso, L., Mauthe, A., Zseby, T.: A Functional Composition Framework for Autonomic Network Architectures.
In: Proceedings of 2nd IEEE International Workshop on Autonomic Communications and Network Management (IEEE NOMS/ACNM 2008), Salvador, Bahia, Brazil, April 7-11 (2008) [15] The Network Simulator 2 (NS2), http://www.isi.edu/nsnam/ns/
IPv6 and Extended IPv6 (IPv6++) Features That Enable Autonomic Network Setup and Operation

Ranganai Chaparadza¹, Razvan Petre¹, Arun Prakash¹, Felicián Németh², Sławomir Kukliński³, and Alexej Starschenko¹

¹ Fraunhofer FOKUS, Berlin, Germany
{ranganai.chaparadza,razvan.petre,arun.prakash,alexej.starschenko}@fokus.fraunhofer.de
² Budapest University of Technology and Economics, Budapest, Hungary
[email protected]
³ Warsaw University of Technology, Warsaw, Poland
[email protected]
Abstract. In this paper we present an insight into IPv6 features, together with a few examples of proposed Extensions to IPv6 protocols, which enable autonomic network set-up and operation. The concept of autonomicity, realized through control-loop structures embedded within node/device architectures and the overall network architecture as a whole, is an enabler for advanced self-manageability of network devices and the network as a whole. The GANA Model for Autonomic Networking introduces autonomic manager components at various levels of abstraction of functionality within device architectures and the overall network architecture, which are capable of performing autonomic management and control of their associated Managed-Entities (MEs), e.g. protocols, as well as cooperating with each other in driving the self-managing features of the network(s). MEs are started, configured, constantly monitored and dynamically regulated by the autonomic managers towards optimal and reliable network services. This amounts to what we call autonomic setup and operation of the network. We present how to achieve this, and also present the features that IPv6 protocols exhibit that are fundamental to designing and building self-configuring, self-optimizing and self-healing networks, i.e. IPv6-based autonomic networks. Keywords: IPv6, Evolution of the current Internet towards Self-Managing Future Internet.
1 Introduction
The main benefits of self-management technology in systems and networks, from the operator’s perspective, are: to minimize operator involvement and OPEX in the deployment, provisioning and maintenance of the network, and to increase network reliability (self-adaptation and reconfiguration on the fly).

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 198–213, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
The FP7 EFIPSANS project [1] introduced a standardizable, evolvable Architectural Reference Model for Autonomic Networking and Self-Management dubbed the Generic Autonomic Network Architecture (GANA) [2,3]. The GANA Model defines fundamental building blocks that should be considered when designing devices of a network that is autonomic/self-managing. GANA, presented in brief in the next section, is a holistic Architectural Reference Model for Autonomic Network Engineering and Self-Management that serves the following purposes: (1) to answer the question of how Self-Management/Autonomicity can be introduced into the fundamental architecture of Future Internet devices (GANA-conformant devices); (2) to then instantiate GANA with autonomic management and control of protocols and mechanisms (e.g. IPv6 protocols, because core GANA concepts are protocol-agnostic); (3) to use GANA as a guide to examining and exploiting the strengths and features of the IPv6 protocol, e.g. in order to have the big picture of where Extensions to IPv6 protocols (IPv6++) can be introduced and for what purposes. This approach to designing the self-managing Future Internet has led to a number of Extensions to IPv6 Protocols (IPv6++) being proposed by EFIPSANS. For more information on the kind of extensions to IPv6 that are necessitated by a GANA-compliant network, we refer the reader to [2,3,4]. In this paper, we present a few selected IPv6 Extensions required for autonomic networking.
2 Fundamentals of Autonomic Networking and Self-management
The concept of autonomicity, realized through control-loop structures embedded within node/device architectures and the overall network architecture as a whole, is an enabler for advanced self-manageability of network devices and the network as a whole. The GANA model introduces autonomic manager components at four levels of abstraction of functionality within device architectures and the overall network architecture, which are capable of performing autonomic management and control of their associated Managed-Entities (MEs), e.g. protocols, as well as co-operating with each other in driving the self-managing features of the network(s). MEs are started, configured, constantly monitored and dynamically regulated by the autonomic managers towards optimal and reliable network services. The four levels are protocol-level, abstracted-functions-level, node-level and network-level [2]. A central concept of GANA is that of an autonomic Decision-Making-Element (DME, or simply DE for Decision Element). A Decision Element (DE) implements the logic that drives a control-loop over the management interfaces of its assigned Managed Entities (MEs). Therefore, in GANA, self-* functionalities such as self-configuration, self-healing, self-optimization, etc., are functionalities implemented by Decision Elements. The fundamental principles of the setup and operation of an autonomic network can be described as three cascaded phases of some automated behaviors of nodes/devices being connected together to form an autonomic network, namely:
[Phase-1] - Boot-up and Bootstrap Phase for each initializing node/device; [Phase-2] - Auto-Configuration Phase for each node/device and the network as a whole; [Phase-3] - Operation and Self-Adaptation Phase for each node/device and the network as a whole, i.e., adaptation to challenges, adverse conditions and policy changes by the human. The following automated behaviors of nodes/devices and the network (realized as autonomic behaviors orchestrated or triggered by autonomic managers) apply to some of the specific phases (from the three described above). Auto-Discovery (Network-Layer-Services Discovery, Service/Application-Layer-Services Discovery) applies to [Phase-1], and some behaviors related to Auto-Discovery for more advanced service provisioning requirements, beyond the minimum required at boot-up/bootstrap time, may still be attempted during the operation and self-adaptation time of a node/device or network; Auto-Configuration/Self-Configuration (in the Service-Layer and Network-Layer) applies to [Phase-2]; Self-Diagnosing and Self-Healing, Self-Optimization and other Self-* functions apply to [Phase-3].
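A minimal sketch of a DE driving a control loop over its MEs, as described above, might look like the following. The load metric, threshold and rate-limit action are invented for illustration; GANA itself prescribes the structure, not these specifics.

```python
# Sketch of the GANA control loop: a Decision Element (DE) monitors its
# assigned Managed Entities (MEs), decides, and reconfigures them through
# their management interfaces. Metric and policy are invented examples.

class ManagedEntity:
    """An ME, e.g. a protocol instance exposing a management interface."""
    def __init__(self, name):
        self.name = name
        self.config = {"rate_limit": None}
        self.load = 0.0            # monitored metric (invented)

class DecisionElement:
    def __init__(self, managed_entities, threshold=0.8):
        self.mes = managed_entities
        self.threshold = threshold

    def control_loop_iteration(self):
        for me in self.mes:                    # monitor
            if me.load > self.threshold:       # decide
                me.config["rate_limit"] = 0.5  # act: dynamically regulate
            else:
                me.config["rate_limit"] = None

mes = [ManagedEntity("ospfv3"), ManagedEntity("dhcpv6")]
mes[0].load = 0.95
DecisionElement(mes).control_loop_iteration()
print(mes[0].config, mes[1].config)   # only the overloaded ME is regulated
```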
3 Need for an Information and Knowledge Sharing System
In an autonomic network, different network entities need to collaborate in order to fulfill the global goals of the network. For efficient collaboration, data (e.g. resource and capability descriptions, configuration data, events or alarms), information and knowledge must be gathered, elaborated and shared between the different network entities. Hence, a powerful system for exchanging information and knowledge must be in place. The information and knowledge exchange system enables more advanced autonomic and cognitive functions like self-configuration, auto-discovery, self-adaptation, self-optimization or other self-* functionalities. Since the information and knowledge exchange system is a fundamental requirement for achieving autonomic and self-management functionalities, it must scale, it must be fault-tolerant, it must provide good accessibility, and it must be secure. Figure 1 highlights the role of the information and knowledge exchange system inside a GANA network. For special types of information and knowledge, like time-critical information, very large pieces of information or fast-changing information, other protocols and/or mechanisms must be in place. All the other types of information and knowledge are shared through the exchange system. At the same time, any node may have a local repository for storing information and knowledge. The key requirements and goals that must be considered when designing such a system are:
• It must support a large number of resources in large-scale autonomic networks. Thus, the system should scale to a large number of resources spread throughout a wide area network across different administrative domains. To achieve this goal, the system must be distributed, ideally as a managed overlay network of information/knowledge servers (e.g. one possibility would be to rely on distributed hash tables - DHT). A resource is understood as a
IPv6 Features and Extended IPv6 (IPv6++) Features Information may be shared directly between nodes. e.g. 1. Time critical information (incidents, alarms, etc) 2. Other kind of information
Ask & receive Monitoring Information, Incidents Descriptions, etc.
201
Other Network-Level-DEs Network-LevelQoS_Management_DE Network-LevelRouting_Management_DE
`
Subscribe for receiving Monitoring Information, Incidents Description, etc.
Store the description of Incidents
` Information and Knowledge Exchange System
Other FUNC_LEVEL _DEs
Ask about monitoring capabilities
Publish Monitoring Information
NODE_MAIN_DE
Information / Knowledge Repository
FUNC_LEVEL_ QoS_DE
FUNC_LEVEL_ Monitoring_DE
Fast changing monitoring information
Monitoring DE/ Monitoring Component or Tool
Publish monitoring capabilities
`
Internal Protocol(s) of the Information and Knowledge Exchange System
External protocol(s) of the Information and Knowledge Exchange System
Other protocols & mechanisms for sharing information/knowledge that is time critical (e.g. alarms, incident), very large (e.g. monitoring traces) or fast changing.
Fig. 1. The Role of Information and Knowledge Sharing exchange system in a GANA Conformant Network
•
• •
• •
•
device or services offered by devices, but also as data in the form of configuration data, network policies, incidents descriptions, monitoring data, network knowledge, etc. The system must also provide scalability to support a large number of updates and data-retrieval requests. It must be highly available, as it plays an important role in the processes of Auto-Discovery and Auto-Configuration and other Self-* functionalities. To achieve this goal the information and knowledge stored inside the system must be replicated and must be always available, even in the case of failures of several servers forming the distributed system. The information and knowledge exchange system must offer the possibility to update information that was previously published. It must provide very good update times for replications for all data types. It must support complex queries. Partial queries should be supported (queries that contain only a subset of the attributes originally advertised by resources, considering the other attributes as wildcards). Each query might also have a scope (global, local, n-hops away, etc). It must provide a powerful subscription mechanism for receiving information of interest. It must provide well-designed interfaces to allow efficient data input and retrieval. These interfaces may be summarized as the system’s external protocols. The messages to and from the Information Exchange System should be optimized so as not to contribute to congestion in the network. It must provide well-designed communication protocols and primitives to be used by servers that are part of the overlay network, to build and maintain the overlay network. These protocols and primitives must facilitate the efficient distribution and propagation of information/knowledge across the
R. Chaparadza et al.
overlay network, and must also support advanced services like replication, clustering or security. They may be summarized as the internal protocols of the system.
• It must provide well-designed communication protocols and primitives to be used by any network entity willing to access the services offered by the system. These communication protocols and primitives may be summarized as the external protocols of the system.
• It must be easily extendable, to allow the inclusion of future functionality.
• Since the goal of GANA is to reduce the complexity of network management, the Information and Knowledge Exchange System must be optimized for simplicity of installation and maintenance.
An instantiation of such a system is the ONIX system. ONIX stands for Overlay Network for Information eXchange, and it is the EFIPSANS-proposed solution for a scalable, fault-tolerant and secure information and knowledge exchange system in IPv6 networks (but not limited to them). The architecture of the ONIX system is presented in Figure 2.
Fig. 2. ONIX System Architecture (an External Protocol Layer with a DHCPv6++ server; an ONIX Layer with an XML parser & key-generation module, a query interpreter & resolver, a subscription handler, security, replication and Bloom-filter modules, and the ONIX communication module; and a DHT Layer with a DHT factory (Chord protocol, other DHT protocols), a hash-function factory (SHA-1, Rabin fingerprints, new hash functions) and the Chord communication module)
The ONIX system is built on top of a DHT. The current implementation uses the Chord protocol [5], but any DHT protocol that exposes the functions put(key, value) and get(key) may be used as the underlying DHT protocol. Examples of other possible DHTs include [6,7,8]. Moreover, if the DHT protocol permits, the ONIX system provides a mechanism to easily switch between different hash functions, like SHA-1 or Rabin fingerprints [9]. Information and knowledge published
IPv6 Features and Extended IPv6 (IPv6++) Features
to the system is described using XML, and the query language is based on XPath. Keys and data, information and knowledge are replicated inside the system; the component responsible for handling the replication process, as well as for keeping the replicas synchronized, is the Replication Module. Bloom filters [10] are used to reduce the traffic generated in the system when resolving complex queries. The ONIX internal protocol is an extended version of the Chord protocol. As the external protocol, the system uses an extended version of DHCPv6 (DHCPv6++) designed in the EFIPSANS project. A short summary of the services offered by the ONIX system includes:
• Information and knowledge storage and retrieval: (a) push & pull models supported, (b) different classes of information, (c) add, remove and replace operations supported, (d) information is described using XML.
• Information query: a query language capable of expressing complex queries (partial queries, scoped queries, etc.), based on XPath.
• Information dissemination: upon request, periodically or event-triggered: (a) normal subscription, (b) on-behalf subscription, (c) publish & disseminate.
• Security: authentication, authorization, trust, confidentiality, integrity, non-repudiation, privacy, and the tracking of activities and of the originators of each input to the system, for accountability and auditing.
• Reliability, fault-tolerance and accessibility.
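To illustrate how a DHT with a plain put/get interface can support ONIX-style publication and partial queries, here is a minimal Python sketch. The in-memory DHT stand-in, the per-attribute key scheme and the XML layout are our own assumptions for illustration, not the ONIX implementation.

```python
import hashlib
import xml.etree.ElementTree as ET

class InMemoryDHT:
    """Stand-in for a Chord-like DHT: any backend exposing put/get works."""
    def __init__(self):
        self._store = {}
    def put(self, key, value):
        self._store.setdefault(key, set()).add(value)
    def get(self, key):
        return self._store.get(key, set())

def attr_key(name, value, hash_fn=hashlib.sha1):
    """Hash one attribute=value pair to a DHT key (hash function pluggable,
    mirroring ONIX's switchable hash-function factory)."""
    return hash_fn(f"{name}={value}".encode()).hexdigest()

def publish(dht, resource_id, xml_description):
    """Index an XML-described resource under each of its attributes."""
    root = ET.fromstring(xml_description)
    for attr in root:
        dht.put(attr_key(attr.tag, attr.text), resource_id)

def partial_query(dht, **attrs):
    """Resolve a partial query: attributes not given act as wildcards."""
    result_sets = [dht.get(attr_key(k, v)) for k, v in attrs.items()]
    return set.intersection(*result_sets) if result_sets else set()

dht = InMemoryDHT()
publish(dht, "router-1", "<resource><role>ABR</role><vendor>acme</vendor></resource>")
publish(dht, "router-2", "<resource><role>CR</role><vendor>acme</vendor></resource>")
matches = partial_query(dht, vendor="acme", role="ABR")  # -> {"router-1"}
```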
4
Auto-discovery as an Enabler for Self-configuration and Self-adaptation
A self-managing network needs a way to know the entities composing the network in order to employ self-configuration and self-adaptation mechanisms to configure and operate the nodes in the network. In the context of GANA, the Auto-Discovery functionality consists of Self-Description, Self-Advertisement, support for Solicitation of Capabilities at the node level, and Topology-Discovery. We define these as follows: Self-Description is the ability of a functional entity to describe itself. This includes the description of its Capabilities, such as software and hardware specifications, available services and tools, supported protocols, and node interface information, including its current role and a list of potential roles it can play in the network. The self-description mechanism of a node results in the generation of the GANA Capability Description Model [11] of the node. Self-Advertisement of Capabilities is the process by which a functional entity spontaneously disseminates the generated GANA Capability Description Model to other functional entities, either inside a node or in the network, subject to security policies. The dissemination may be carried out through an information and knowledge sharing repository such as ONIX. Support for Solicitation of Capabilities is the ability of a functional entity to respond to requests for its Capability Description by initiating its self-description and self-advertisement functions. This is vital for the self-organization functionality of a network.
Topology-Discovery is the ability of a functional entity to automatically discover the topology of its network without any manual assistance. Topology-discovery is essential to detect the presence of new nodes and the absence of nodes due to failures and changing network conditions. In the context of GANA, the topology-discovery function can be performed by the NET LEVEL RM DE. Each node publishes its neighbor information, obtained by using IPv6's Neighbor Discovery (ND) protocol, to the NET LEVEL RM DE in order to facilitate the computation of the network topology.
In GANA, the auto-discovery mechanism is initiated by the NODE MAIN DE of a node. The NODE MAIN DE generates the Capability Description for the node by triggering the iterative self-description process, as shown in Figure 3. The NODE MAIN DE requests the Capabilities of its underlying Function-Level DEs. These DEs in turn request the Capabilities of the Protocol-Level DEs and MEs. Thus the Capabilities of individual DEs and MEs are obtained in a recursive manner. This completes the self-description process of the node. The aggregated Capabilities of the node are then published to on-link neighbors and to ONIX for network-wide dissemination. This completes the self-advertisement process of the node. The self-description and self-advertisement functions are repeated every time the Capabilities of the node change due to failures, software updates or changing networking conditions.

Fig. 3. Self-Description and Self-Advertisement of the GANA Capability Description (each Protocol-Level DE aggregates the capabilities of its MEs plus its own and presents them to the upper Function-Level DE; the NODE_MAIN_DE aggregates the capabilities conveyed by the Function-Level DEs plus its own and self-advertises the node/device's Capabilities, employing security mechanisms)
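The recursive self-description flow of Figure 3 can be sketched as follows. This is an illustrative Python model only; the class and method names are our own, not GANA API names.

```python
class ManagedEntity:
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = capabilities  # e.g. supported protocols, tools

    def describe(self):
        """Leaf case: an ME simply reports its own capabilities."""
        return {self.name: self.capabilities}

class DecisionElement(ManagedEntity):
    def __init__(self, name, capabilities, children=()):
        super().__init__(name, capabilities)
        self.children = list(children)  # lower-level DEs and MEs

    def describe(self):
        """A DE aggregates its children's capabilities plus its own and
        presents the result to the DE above it (cf. Figure 3)."""
        aggregated = {self.name: self.capabilities}
        for child in self.children:
            aggregated.update(child.describe())
        return aggregated

# NODE_MAIN_DE -> Function-Level DE -> Protocol-Level DE -> MEs
me1 = ManagedEntity("ME_1", ["ipv6-forwarding"])
me2 = ManagedEntity("ME_2", ["packet-filtering"])
proto_de = DecisionElement("PROTO_LEVEL_DE_1", ["ospfv3-mgmt"], [me1, me2])
func_de = DecisionElement("FUNC_LEVEL_DE_1", ["qos-mgmt"], [proto_de])
node_main_de = DecisionElement("NODE_MAIN_DE", ["node-mgmt"], [func_de])

# The aggregated model would then be published to ONIX and on-link neighbors.
capability_model = node_main_de.describe()
```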
For the Topology-Discovery mechanism employed by the NET LEVEL RM DE, the neighbor information of each node in the network is required; it is used for constructing the network topology graph. Each node publishes its neighbor information, obtained by using IPv6's Neighbor Discovery (ND) protocol, to the NET LEVEL RM DE. The neighbor information is provided by the NODE MAIN DE by publishing and updating a list of
its on-link nodes on each interface to the NET LEVEL RM DE. Every time the neighborhood of a node changes, either due to failures, due to the bootstrapping of new nodes, or due to dynamic network conditions, the NODE MAIN DE updates the NET LEVEL RM DE with new neighbor information. This keeps the topology-discovery function in sync with the changing network topology. For a detailed sketch of the GANA Capability Description and the algorithms describing the behavior of the auto-discovery mechanism in GANA, the reader is directed to [11]. The information obtained through the auto-discovery mechanisms enables Network-Level DEs to compute node configurations and decisions for various self-adaptation mechanisms in the network. Auto-Configuration in a GANA-conformant network is achieved through the use of the GANA Network Profile (GANA NETPROF) [11]. The GANA NETPROF is composed of a NETPROF, a GANA Configurations Options Map (MAP) and several vendor-specific configuration files. The NETPROF can be considered an entity that provides a structural and monolithic framework for defining policies, objectives and high-level DE configurations for the network and its nodes, along with hooks for adding vendor-specific configurations for the nodes. Thus, in a NETPROF, the policies, objectives and high-level configurations are categorized in terms of the various node roles planned in the network, rather than the actual nodes available in the network. This allows a vendor-agnostic implementation of the DEs. As MEs are vendor-specific by the nature of their design and implementation, all vendor-specific configuration options are delegated to the vendor-specific files of the GANA NETPROF. The configurations provided to a node, the GANA Node Configurations (NODECONF), are specific to the vendor type of the node and are generated from the NETPROF.
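The topology-discovery step described above, in which each node publishes its ND-derived neighbor list and the NET LEVEL RM DE assembles the network graph, can be sketched roughly as follows. This is a hypothetical in-memory model, not the actual DE implementation.

```python
def build_topology(neighbor_reports):
    """neighbor_reports: {node_id: iterable of on-link neighbor ids},
    as published by each node's NODE_MAIN_DE.
    Returns an undirected adjacency map of the network graph."""
    graph = {}
    for node, neighbors in neighbor_reports.items():
        graph.setdefault(node, set())
        for n in neighbors:
            graph[node].add(n)
            graph.setdefault(n, set()).add(node)
    return graph

def diff_topology(old, new):
    """Detect nodes that appeared (boot-ups) or disappeared (failures),
    so the topology view stays in sync with the network."""
    return set(new) - set(old), set(old) - set(new)

g1 = build_topology({"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2"]})
g2 = build_topology({"r1": ["r2"], "r2": ["r1"]})  # r3 has failed
appeared, disappeared = diff_topology(g1, g2)
```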
As a node boots up in a network, the NODE MAIN DE bootstraps its interfaces and initiates the self-description process shown in Figure 3. The neighbor information is also computed side by side. The aggregated Capability Description and neighbor information are published to ONIX and to on-link neighbors. ONIX ensures the dissemination of the information to the Network-Level DEs. The NET LEVEL RM DE uses the neighbor information for the computation of the network topology graph (topology-discovery). If OSPF routing is an objective of the network, the obtained topology graph is used for partitioning the network into OSPF areas. New areas are formed when the number of routers in an area conflicts with a policy delineating the threshold number of routers for an OSPF area. Existing areas are merged when node failures occur and the number of nodes in a given area falls below the required minimum number of routers for an area. Thus, to successfully employ OSPF routing, the network self-adapts to changing network conditions by partitioning and merging OSPF areas, a behavior executed by the NET LEVEL RM DE. In the context of OSPF routing, the node roles are classified as follows: Core Router (CR), Area Border Router (ABR) and Autonomous System Border Router (ASBR). The Capability Description of a node provides information regarding the vendor, software and hardware attributes and the current role of the node in the network. These, along with the partitioning information, are used
by the NET LEVEL RM DE for the computation of the NODECONF. The NETPROF is searched for the appropriate node role sub-profile and the vendor specific configuration files in order to generate the NODECONF. The exact nature of the algorithm used for the NODECONF generation is described in [11]. Once the NODECONF is generated, it is disseminated to the respective nodes through ONIX. The NODE MAIN DE receives the NODECONF and uses it to self-configure its DEs and MEs to reflect the dynamic objectives of the network.
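The area partitioning and merging policy described above can be illustrated with a toy sketch. The thresholds and the greedy split/merge strategy are our own assumptions for illustration; the actual algorithm is described in [11].

```python
MAX_ROUTERS_PER_AREA = 4   # policy threshold that triggers a partition
MIN_ROUTERS_PER_AREA = 2   # areas below this are merged away

def rebalance_areas(areas):
    """areas: list of lists of router ids.
    Split oversized areas; merge undersized ones into the smallest survivor."""
    result = []
    for area in areas:
        while len(area) > MAX_ROUTERS_PER_AREA:        # partition
            result.append(area[:MAX_ROUTERS_PER_AREA])
            area = area[MAX_ROUTERS_PER_AREA:]
        result.append(area)
    merged = [a for a in result if len(a) >= MIN_ROUTERS_PER_AREA]
    for area in (a for a in result if len(a) < MIN_ROUTERS_PER_AREA):
        if merged:
            min(merged, key=len).extend(area)          # merge into smallest
        else:
            merged.append(area)
    return merged

# A 6-router area is split, and a lone router is merged into a neighbor area.
areas = rebalance_areas([["r1", "r2", "r3", "r4", "r5", "r6"], ["r7"]])
```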
5
Autonomic Routing in a Wired Network Environment
The Routing Functionality of nodes in an IPv6-based fixed network, and of the network as a whole, can be made autonomic by having the diverse Routing Schemes, Policies and Routing Protocol Parameters employed and altered based on network objectives, changes to the network's context and dynamic network views in terms of events, topology changes, etc. Figure 4 depicts how the routing behavior of a node/device and of the network as a whole can be made autonomic. Two types of Control Loops are required for managing/controlling the routing behavior. The first type is a node-local control loop that consists of a Function-Level Routing-Management DE embedded inside an autonomic routing node, e.g. a router. The local Function-Level Routing-Management DE is meant to process only that kind of information that is required to enable the node to react autonomically and autonomously (according to some goals) by adjusting or changing the behavior of the individual Routing protocols and mechanisms required to be running on the node. The Function-Level Routing-Management DE reacts to views, such as events or incidents, exposed by its Managed Entities (MEs), i.e. the Routing protocols and mechanisms. Therefore, the Routing-Management DE implements the self-configuration and dynamic reconfiguration features specific to the routing functionality of the routing node. It is important to note that, due to the scalability, overhead and complexity problems that arise when attempting to make the Routing-Management DE of a node process huge amounts of information/data for the control loop, a logically centralized Decision Element may be required in order to relieve the burden. In such a case, a network-wide slower Control Loop is required in addition to the faster node-local control loop, with both types of loops working together in controlling/managing the routing behavior in an autonomic way.
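The interplay of the fast node-local loop and the slower network-wide loop might be skeletonized as follows. This is an illustrative sketch only: the event names, actions and the escalation rule are our own assumptions, not GANA-specified behavior.

```python
class NetworkLevelRoutingMgmtDE:
    """Slower, logically centralized loop with a network-wide view."""
    def on_event(self, event):
        # Reason over global knowledge (topology, policies) before acting.
        return f"network:recompute-routing-policy({event})"

class FunctionLevelRoutingMgmtDE:
    """Fast, node-local loop: reacts directly to events exposed by its MEs
    (the routing protocols and mechanisms on the node)."""
    LOCAL_ACTIONS = {
        "link-flap": "dampen-interface",
        "timer-expiry": "readjust-timers",
    }

    def __init__(self, network_de):
        self.network_de = network_de

    def on_event(self, event):
        action = self.LOCAL_ACTIONS.get(event)
        if action is not None:
            return f"local:{action}"
        # Needs wider, global knowledge: escalate to the slower loop.
        return self.network_de.on_event(event)

net_de = NetworkLevelRoutingMgmtDE()
node_de = FunctionLevelRoutingMgmtDE(net_de)
```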
In [12], an instantiation case of GANA for autonomic management and control of IPv6 routing protocols and mechanisms, as discussed in this section, is illustrated and elaborated. More details on the inter-working of the two DEs (control loops) and on the aspects related to cognition can be found in [2]. In [2], related work on how this framework is applied to Auto-Discovery and Auto-Configuration of routers in an autonomic network is also presented. Next, using the example of the Open Shortest Path First (OSPFv3) protocol, we show how we examined the features of IPv6 protocols that are fundamental to designing and building self-configuring, self-optimizing and self-healing networks, i.e. IPv6-based autonomic networks. Each paragraph below corresponds to a row
Fig. 4. Autonomicity as a Feature in the Routing Functionality of an IPv6-based Network
in our questionnaire template. Additionally, Table 1 enumerates the basic features of OSPFv3 and shows how and where they fit into the GANA framework; each feature is examined from the three aspects detailed in the table's header.
Summary of the usage of the protocol and any of its exploitable features. OSPFv3 is a link-state routing protocol associated with the control plane of today's IP networks. The protocol comes with a management interface that can support the automation of (re)configuring the protocol's behavior according to the goals set for the routing behavior of the network that OSPFv3 is meant to fulfill. CLI types of (re)configuration by a human are normally supported by most implementations of OSPFv3. For the management of OSPFv3 by traditional, centralized NMS-type approaches to network management, the OSPFv3 MIB can be used to perform the (re)configuration of the protocol. Management of the protocol from within the node running an OSPFv3 instance may require a different approach from CLI or MIB types of interfaces. OSPFv3 implements a simple control loop that realizes a self-adaptation mechanism in response to link failures, as described below.
Mapping the IPv6 protocol to the GANA Functional Planes. OSPFv3 belongs to the Control Plane, a sub-plane of the Dissemination Plane of GANA. Information such as routes and link-state information is disseminated among OSPFv3-supporting nodes. Because OSPFv3 implements some control loop, we can loosely associate it with some kind of Decision Element intrinsic to the protocol itself by design. Therefore, we can consider such a protocol-intrinsic DE as belonging to the Decision Plane of GANA. Because there is no separation between the
208
R. Chaparadza et al.
Table 1. Exploitable OSPFv3 Features. For each protocol feature that can be exploited, three aspects are listed: (1) any Decision Element (DE) that can use the protocol feature(s) [Case-3], where "use" means getting information supplied by or via the protocol feature to aid the decision-making process of the DE, or managing the use of the available feature for some purpose/goal; (2) the network environment(s); (3) any other Self-* behavior that can benefit from the feature (if applicable).

• OSPF Hello Parameter Configuration: (1) The DE at the Network level responsible for self-configuration/self-adaptation of protocol behavior could (re)adjust OSPF Hello timers to rate-limit Hello traffic. (2) Fixed or slowly changing wired or wireless network environments. (3) Self-configuration and self-optimization.
• OSPF Interface Configuration: (1) The DE at the Network level responsible for self-configuration MUST provide a routing profile (IP addresses, identity, link cost, etc.) for OSPF to bootstrap its operation on interfaces where needed. (2) Any network environment.
• Area Border Router (ABR) Support: (1), (2) N/A.
• OSPF Area Support: (1) The DE at the Network level responsible for self-configuration/self-organization could restructure the areas in order to limit/optimize LS flooding traffic. For instance, stub (sub-)networks could be separated into distinct areas. (2) Fixed, relatively large, heterogeneous wired topologies, where the amount of LSA traffic could be substantial.
• Route Redistribution: (1) The DE at the Network level responsible for self-configuration might readjust route import/export from/to external, inter-domain routing protocols to holistically optimize the interaction of intra- and inter-domain routing (e.g., to avoid hot-potato routing). (2) Fixed, relatively large, heterogeneous wired topologies.
• Type 1 and Type 2 External Routing Support: (1), (2) see above.
• Virtual Link Support: (1) As a consequence of the reorganization of the area structure, readjustment of virtual links might become necessary to ensure the consistency of the Backbone area (area 0.0.0.0). This is the task of the Network-level Routing Management DE. (2) Fixed, relatively large, heterogeneous wired topologies.
• Unknown LSA: (1), (2) N/A.
• OSPF Management Information Base (MIB): (1), (2) N/A.
• Traffic Engineering Support: (1) Network-level self-optimization functionality might use resource-availability information disseminated in extended TE-LSAs to make informed Traffic Engineering decisions. (2) Any network environment. (3) Monitoring.
• Non-broadcast Multi-Access (NBMA) Support: (1), (2) N/A.
decision logic, i.e. a DE that implements the control loop intrinsic to OSPFv3, and the rest of the functions of OSPFv3 that can be considered as regulated (managed) by such a virtual DE, the protocol itself (as a single module at implementation and run time), as a whole, can be considered as belonging to the Decision Plane of GANA. This means that OSPFv3 can be associated with both the Decision Plane and the Dissemination Plane of GANA. Note: an autonomic protocol can be designed in such a modular way that it has a clear, distinct separation between its protocol-intrinsic DE and the regulated (managed) functions of the protocol that are managed/regulated by that protocol-intrinsic DE.
Any Self-* Behavior that can be considered as a feature intrinsic to the protocol. OSPFv3, as a link-state routing protocol, supports Self-Adaptation by performing failure detection and adaptation to the underlying network topology and to link or node failures. After link weights have been configured (typically a manual process), the routing protocol will discover the network topology, disseminate routing information and set up consistent forwarding tables. Additionally, OSPFv3 is capable of adapting to failures: it detects link failures through a variety of methods, such as repeated failures to receive packet acknowledgements. Following failure detection, it will reroute, eventually converging on a new valid path. Some forms of Auto-Discovery, Self-Description, Self-Advertisement and Self-Organization functionality can also be identified in the operation of OSPFv3.
Any other Self-* Behavior that can benefit from using the protocol in general. Self-Optimization functionality can be achieved by optimizing shortest paths. This involves extending the OSPFv3 LS flooding protocol to convey resource (bandwidth) availability at network links, and incorporating OSPFv3 into a control loop that fine-tunes the link costs with respect to monitoring data and actual user demands.
This Self-Optimization functionality can be extended to involve multi-path routing (OSPF-OMP). Additionally, proposals exist to facilitate fast OSPFv3 Self-Healing (IPFRR).
Any Decision Element (DE) that can manage the protocol [Case-1] OR is intrinsic to the protocol [Case-2]. The Routing-Management DE of an autonomic node, operating at the abstracted networking-functions level of the GANA HCLs framework and managing all routing protocols and mechanisms on the node, is the DE considered to be the autonomic manager of OSPFv3 and of the other routing protocols and mechanisms of the node (collectively). By design, OSPFv3 does not have a distinct DE intrinsic to the protocol per se.
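The link-cost fine-tuning loop sketched in the Self-Optimization paragraph might look roughly like this. The utilization-to-cost rule below is our own illustrative choice, not a quoted OSPF-OMP algorithm.

```python
def tune_link_costs(links, base_cost=10, sensitivity=40):
    """links: {link_id: utilization in [0, 1]}, taken from monitoring data.
    Raise the cost of hot links so that SPF shifts traffic away from them."""
    return {lid: base_cost + round(sensitivity * util)
            for lid, util in links.items()}

# A congested link gets a much higher cost than a lightly loaded one;
# the DE would then push the new costs into OSPFv3's configuration.
costs = tune_link_costs({"r1-r2": 0.9, "r1-r3": 0.1})
```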
6
Autonomicity for the Data Plane
The Data Plane and the Forwarding functionality of nodes, and of the network as a whole, in an IPv6-based network can be made autonomic by having diverse Forwarding schemes and GANA Data-Plane parameters, such as FIBs, ACLs, packet filters, etc., employed and changed based on network objectives, changing network context and dynamic network views in terms of events, topology changes, etc. As with Autonomic Routing, two types of Control Loops are required for managing/controlling the data plane and the forwarding behavior. The first type is
a node-local control loop (the faster control loop) that consists of a Data-Plane-and-Forwarding-Management DE (FUNC LEVEL DP FWD M DE) embedded inside an autonomic node, e.g. a router. The FUNC LEVEL DP FWD M DE of a node is meant to process only that kind of information that is required to enable the node to react by adjusting or changing the behavior of the Data-Plane protocols supported by the node, which include IPv6 Forwarding, Layer-2.5 Forwarding, Layer-2 Forwarding, Layer-3 Switching, Layer-2 Switching, etc. The FUNC LEVEL DP FWD M DE reacts to views, such as events or incidents, exposed by its MEs - the GANA Data-Plane protocols and mechanisms. The DE thus implements the self-configuration and dynamic reconfiguration features specific to the Data Plane and the forwarding functionality of the autonomic node. The node-scoped FUNC LEVEL DP FWD M DE also relays views, such as events or incidents, to the Network-Level Data-Plane-and-Forwarding-Management DE, i.e. the network-level control loop (the slower control loop), for further reasoning, in case wider, global knowledge is required in addressing problems affecting the Data Plane and the forwarding behavior.
7
IPv6 in Autonomic Wireless Networks
Wireless ad-hoc Mesh Networks (WMNs) can serve as community networks, temporary networks for event handling, or as sensor networks. WMNs are usually built using 802.11 devices due to their popularity, very low cost, royalty-free deployment and high link throughput. It is worth emphasizing that about 120 routing protocols have been developed for WMNs; 30 of them are expired IETF drafts, 2 are active IETF drafts, and 4 are IETF RFCs. Unfortunately, many of these protocols are well suited for a specific scenario only (for example, for highly mobile or static networks, for sparse or dense networks, etc.) and, as a result, it is very hard, if not impossible, to select a proper protocol for a specific case. In most WMNs, throughput degrades exponentially with the number of wireless hops as a result of the usage of a single radio channel, which is the case with most of the deployed or tested networks. The use of multiple radio channels may efficiently improve the overall network performance, while the cost of adding extra radio interfaces to 802.11 nodes is negligible. In order to use multiple radio channels, an algorithm and a protocol for channel allocation are needed. Another mechanism that may improve the performance of WMNs is the multi-path approach, which may improve transfer reliability and increase the end-to-end throughput. In the case of proactive routing, and especially multi-path routing, the selection of the best path or paths for data forwarding is of great importance. Path quality is typically evaluated by routing metrics. In WMNs a plethora of routing metrics are used, some of them especially designed for WMNs (namely AirTime, ETX and ETT [13]). It has been shown that proper metric selection has an important impact on network performance [14]. In many WMN routing protocols the hop-count metric is
used, which selects the shortest path in terms of the number of hops but is neither load- nor interference-aware. So, in autonomic WMNs, more sophisticated metrics have to be used. Unfortunately, in the existing protocols the metric is an integral part of the routing protocol and there is no easy way to change it. Another generic problem of WMNs is that many approaches ignore information about the physical layer. The usage of information about physical- and link-layer characteristics (the cross-layer approach) has a positive impact not only on routing (the avoidance of unreliable and low-throughput paths), but also on radio channel selection and configuration. There is ongoing work on the IEEE 802.11s standard, which standardizes WMNs; unfortunately, this standard supports neither multi-interface nodes, multi-path routing and energy-efficient operation, nor advanced auto-configuration schemes. The management of 802.11s networks is centralized, and thus not well suited for autonomic networks with dynamic topologies; there is no support for real-time distributed management operations that could be used to optimize the network behavior according to user requirements and/or environmental changes. Therefore, a new approach to WMNs is required. This approach should cope with proper, dynamic routing-protocol selection and appropriate tuning of the routing protocol used for the specific use case, according to user or application preferences, network density, the importance of energy saving, node mobility, etc. It should enable a dynamic selection of routing metrics and a dynamic choice of the path for data forwarding according to path quality, expressed in terms of load, path length (number of hops), path SNR, etc. This new approach should also provide support for cross-layer operations (physical- and link-layer monitoring and control), radio channel management and multi-path routing.
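To make the metric discussion concrete: ETX estimates the expected number of transmissions on a link as 1/(df*dr), where df and dr are the measured forward and reverse delivery ratios; a path's ETX is the sum over its links, and lower is better. The following small sketch (topology and delivery ratios are made up for illustration) shows how ETX can prefer a longer but more reliable path over a shorter, lossy one that hop count would pick.

```python
def link_etx(df, dr):
    """Expected transmission count of one link: 1 / (df * dr)."""
    return 1.0 / (df * dr)

def path_etx(path, ratios):
    """Sum of link ETX along a path; ratios maps (u, v) -> (df, dr)."""
    return sum(link_etx(*ratios[(u, v)]) for u, v in zip(path, path[1:]))

def best_path(paths, ratios):
    """Pick the candidate path with the lowest cumulative ETX."""
    return min(paths, key=lambda p: path_etx(p, ratios))

ratios = {
    ("a", "b"): (1.0, 1.0), ("b", "d"): (0.5, 0.5),  # short but lossy
    ("a", "c"): (1.0, 0.9), ("c", "d"): (0.9, 1.0),  # equally long, reliable
}
paths = [["a", "b", "d"], ["a", "c", "d"]]
# hop count cannot distinguish the two paths; ETX prefers a-c-d
chosen = best_path(paths, ratios)
```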
All the above-mentioned mechanisms, if implemented, would increase the reliability and performance of a WMN and bring about autonomic WMNs, which are able to adapt to the environment in a much more flexible way than is achieved by current solutions. This holistic approach requires monitoring and control of the nodes' behavior in near real time. In order to implement such autonomic management, the GANA Model can be used. As a result, the routing protocol can be dynamically fine-tuned, multiple routing protocols can be employed on demand and multiple radio interfaces can be handled. In order to make all the above-mentioned operations possible, we developed a Wireless and Autonomic Routing Framework (WARF). WARF has a modular structure, in which functions like forwarding, routing, radio channel management and policy control are separated. That way it is possible to modify one of the components while leaving the others untouched. We use the IPv6 protocol with some Extensions (the so-called WARF Extensions to IPv6) to carry all WARF control messages. WARF control messages are based on the commonalities of the existing WMN routing protocols. The uniform format of routing messages enables a multi-protocol approach - the control messages have the same format for every routing protocol, but of course every protocol handles them differently. The WARF components enable fine-tuning
of protocol parameters according to current needs. An outstanding feature of WARF is the integration of the routing, channel-allocation and monitoring parts, which is able to cope with multiple radio interfaces and to control physical-layer properties (channel number, transmitted power, etc.). IPv6 provides the flexibility to implement the mentioned functions, but also makes it possible to increase control-plane efficiency - the WARF messages can be piggybacked onto user data packets using the IPv6 Extension Header mechanism, thereby reducing the number of MAC requests. We distinguish four basic WARF components: the Route Maintenance, Route Representation, Data Forwarding and Policy Control (WARF management) components. A more detailed description of WARF can be found in [15]. After validation in a real testbed, the WARF Extension Header is intended to be submitted as an IETF Internet-Draft proposal.
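How a WARF control message could ride in an IPv6 extension header can be sketched with plain struct packing. The option type value, TLV layout and payload below are purely hypothetical (the real WARF Extension Header is defined by the authors, not here); only the outer Destination Options framing (Next Header, Hdr Ext Len in 8-octet units, Pad1/PadN padding) follows the generic IPv6 extension-header format of RFC 8200.

```python
import struct

NEXT_HEADER_UDP = 17
HYPOTHETICAL_WARF_OPT_TYPE = 0x1E  # illustrative only, not an IANA-assigned value

def pack_dest_options_with_warf(payload):
    """Build a Destination Options extension header carrying one
    WARF-like TLV option, padded out to a multiple of 8 octets."""
    # Option TLV: type, length, then the opaque WARF payload.
    option = struct.pack("!BB", HYPOTHETICAL_WARF_OPT_TYPE, len(payload)) + payload
    # Header starts with Next Header and a Hdr Ext Len placeholder.
    body = struct.pack("!BB", NEXT_HEADER_UDP, 0) + option
    pad = (-len(body)) % 8
    if pad == 1:
        body += b"\x00"                                            # Pad1
    elif pad > 1:
        body += struct.pack("!BB", 1, pad - 2) + b"\x00" * (pad - 2)  # PadN
    hdr_ext_len = len(body) // 8 - 1  # 8-octet units, excluding the first 8
    return body[:1] + bytes([hdr_ext_len]) + body[2:]

ext = pack_dest_options_with_warf(b"route-update:r1")
```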
8 Conclusion
In this paper, we presented an insight into IPv6 features and a few examples of proposed Extensions to IPv6 protocols (IPv6++), which enable autonomic network set-up and operation. The GANA Model provides a framework based upon which architects can reason about an autonomic network in its holistic sense. The work presented here illustrates how the GANA Model can be instantiated for autonomic management and control of IPv6 protocols and lower-layer transport and control-plane protocols and mechanisms, to achieve an IPv6-based autonomic/self-managing network capable of auto-discovering, auto-configuring (self-configuring) and self-adapting its resources in response to challenges such as adverse conditions and incidents, as well as policy changes received from the human operator. We presented the enablers, i.e., enabling methods and mechanisms for designing an autonomic/self-managing network, including a distributed information/knowledge sharing system. We also presented a method to view GANA as a guide to examining and exploiting the strengths and features of IPv6 protocols, in order to obtain the big picture of where Extensions to IPv6 protocols (IPv6++) can be introduced and for what purposes. In that regard, this approach to designing the self-managing Future Internet has led to a number of Extensions to IPv6 Protocols (IPv6++) being proposed by the FP7 EFIPSANS project. For more information on the kinds of extensions to IPv6 that are necessitated by a GANA-compliant network, we refer to the project deliverables, available soon on the project’s website. We presented selected examples of the proposed Extensions to IPv6, and illustrated an IPv6-based autonomic network and associated architecture in which routers perform auto-discovery and auto-configuration (self-configuration) functions. Acknowledgment. This work is partially supported by the EC FP7 EFIPSANS project (INFSO-ICT-215549) [1].
IPv6 Features and Extended IPv6 (IPv6++) Features
References
1. EC FP7-IP EFIPSANS Project (2008-2010), www.efipsans.org, INFSO-ICT-215549
2. Chaparadza, R.: Requirements for a Generic Autonomic Network Architecture (GANA), suitable for Standardizable Autonomic Behavior Specifications for Diverse Networking Environments. In: International Engineering Consortium (IEC), Annual Review of Communications, vol. 61 (2008)
3. Chaparadza, R., et al.: Creating a viable Evolution Path towards Self-Managing Future Internet via a Standardizable Reference Model for Autonomic Network Engineering. In: Towards the Future Internet - A European Research Perspective, pp. 313–324. IOS Press, Amsterdam (2009)
4. Chaparadza, R.: Evolution of the current IPv6 towards IPv6++ (IPv6 with Autonomic Flavours). In: International Engineering Consortium (IEC) Annual Review of Communications, vol. 60 (December 2008)
5. Stoica, I., et al.: Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications. IEEE/ACM Transactions on Networking 11(1), 17–32 (2003)
6. Zhao, B.Y., et al.: Tapestry: A Resilient Global-Scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications 22(1), 41–53 (2004)
7. Maymounkov, P., Mazières, D.: Kademlia: A Peer-to-Peer Information System Based on the XOR Metric. In: Druschel, P., Kaashoek, F., Rowstron, A. (eds.) IPTPS 2002. LNCS, vol. 2429, pp. 53–65. Springer, Heidelberg (2002)
8. Rowstron, A., Druschel, P.: Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. In: Liu, H. (ed.) Middleware 2001. LNCS, vol. 2218, pp. 329–350. Springer, Heidelberg (2001)
9. Rabin, M.O.: Fingerprinting by Random Polynomials. Technical Report TR-15-81, Center for Research in Computing Technology, Harvard University (1981)
10. Bloom, B.H.: Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13(7), 422–426 (1970)
11. Prakash, A., Starschenko, A., Chaparadza, R.: Auto-Discovery and Auto-Configuration of Routers in an Autonomic Network. In: SELFMAGICNETS 2010: Proc. of the International Workshop on Autonomic Networking and Self-Management, ICST ACCESSNETS 2010, Budapest, Hungary (November 2010)
12. Rétvári, G., Németh, F., Chaparadza, R., Szabó, R.: OSPF for Implementing Self-adaptive Routing in Autonomic Networks: A Case Study. In: Strassner, J.C., Ghamri-Doudane, Y.M. (eds.) MACE 2009. LNCS, vol. 5844, pp. 72–85. Springer, Heidelberg (2009)
13. Baumann, R., Heimlicher, S., Strasser, M., Weibel, A.: A Survey on Routing Metrics. TIK Report 262, Computer Engineering and Networks Laboratory, ETH-Zentrum, Switzerland (February 2007)
14. Stefanescu, H., Skrocki, M., Kuklinski, S.: AAODV Routing Protocol: The Impact of the Routing Metric on the Performance of Wireless Mesh Networks. In: Proc. of the 6th International Conference on Wireless and Mobile Communications, ICWMC 2010, Valencia, Spain (September 2010)
15. Kuklinski, S., Radziszewski, P., Wytrebowicz, J.: WARF: A Routing Framework for IPv6 based Wireless Mesh Networks. In: Proc. of the 2nd International Conference on Internet, ICONI 2010, Cebu, Philippines (December 2010)
A Min-Max Hop-Count Based Self-discovering Method of a Bootstrap Router for the Bootstrap Mechanism in Multicast Routing Toshinori Takabatake Department of Information Science, Shonan Institute of Technology, Fujisawa, 251-8511 Kanagawa, Japan
[email protected]
Abstract. In this paper, a min-max hop-count based self-discovering method of a Bootstrap Router (BSR) for the bootstrap mechanism in multicast routing is proposed and its performance is evaluated. The key idea of the proposed method is that the BSR is selected so that it is positioned at almost equal distances from the other routers, using a min-max hop-count criterion. In particular, the proposed method can efficiently reduce the processing time to discover a Rendezvous Point, which plays a central role in data transmission. Simulation results show that the proposed method can reduce the processing time by more than 43% on average compared with that of the conventional one. Keywords: multicast, routing, protocol, bootstrap mechanism, bootstrap router, self-discovery.
1 Introduction
Recently, multicast has come to play an important role in delivering messages to many specific users of the Internet. Many multicast routing protocols have been studied and well surveyed in [1]–[4]. Among them, Protocol Independent Multicast – Sparse Mode (PIM-SM for short) has been the center of focus [5],[6]. Data transmission in a PIM-SM domain is performed on a shared tree. A shared tree is a distribution tree for one multicast group in the domain. Packets destined to a group are delivered on the shared tree, which is generated beforehand by a multicast routing algorithm [2]–[4]. The shared tree has a single router called a Rendezvous Point (RP), which plays a central role in data transmission in PIM-SM [12]–[16]. All packets from a sender are first sent to the RP, and the packets are then delivered on the RP-rooted tree to all receivers of a multicast group. The methods to configure or discover the RP can be classified into two kinds: static manual selection, e.g., static configuration and embedded-RP; and dynamic discovery by some mechanism, e.g., Cisco’s Auto-RP and the Bootstrap Router mechanism [6].
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 214–225, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
In discovering the RP on the shared tree in the PIM-SM domain, one of the important mechanisms is the “Bootstrap Mechanism” [7],[8]. The
mechanism is dynamic, largely self-configuring, and robust to router failures. However, because the mechanism needs data flooding to all routers in the domain to find the RP, traffic or congestion is prone to occur in the domain. The processing of the mechanism is also time-consuming. Besides, the RP selection is performed by a Bootstrap Router (BSR). Before discovering the RP, one of the routers is first selected as the BSR. As a result, a bordering router of the domain may be selected as the BSR. In that case, since the distances from the BSR to the other routers may become longer, data transmission also takes much time. In this paper, to overcome the above-mentioned problem, a min-max hop-count based self-discovering method of a BSR for the Bootstrap Mechanism in multicast routing is proposed and its performance is evaluated. The key idea of the proposed method is that the BSR is selected so that it is positioned at almost equal distances from the other routers, using a min-max hop-count criterion. Thus, the proposed method can efficiently reduce the processing time to discover the BSR and the RP. The proposed method is characterized by the self-discovery of the BSR. Simulation results show that the proposed method reduces the processing time efficiently compared with that of the conventional one. The rest of this paper is organized as follows: Section 2 gives an overview of multicast routing, PIM-SM, the bootstrap mechanism, and the hybrid one. Section 3 presents the proposed method. Section 4 evaluates the performance of the bootstrap mechanism, the hybrid one, and the proposed one by simulation. Section 5 discusses and compares the proposed method with the other methods. Section 6 summarizes and concludes this paper.
2 Preliminaries
In this section, an overview of multicast routing, the PIM-SM multicast routing protocol, the bootstrap mechanism, and the hybrid bootstrap mechanism in PIM-SM are described.

Fig. 1. Examples of multicast routing in PIM-SM: (a) multicast; (b) shared tree for group A; (c) shared tree for group B.
2.1 Overview of Multicast Routing
In the Internet, there are four kinds of data communication: unicast, broadcast, anycast, and multicast. Multicast in particular has attracted much public attention, since efficient communication is needed for specific users of the Internet. Thus, many multicast routing protocols have been studied [2]–[4]. Multicast aims at sending packets efficiently to specific destinations (i.e., receivers) or a group of destinations. Fig. 1a shows an example of multicast in which packets are sent to specific groups on a shared tree. As shown in Figs. 1b and 1c, the packets toward group A are delivered on a group-specific Rendezvous Point (RP)-rooted tree; the packets toward group B are likewise delivered on an RP-rooted tree, as described in detail in the following subsection. To construct such a shared tree, there are also many multicast routing algorithms [2]–[4]. One feature of a multicast group in these algorithms is that the receivers have group addresses, which are Class D addresses from 224.0.0.0 to 239.255.255.255 in IPv4 [4]; in IPv6, the upper eight bits are all ones [11].

2.2 PIM-SM
Here, PIM-SM [5],[6] is described briefly and informally for simplicity. The join mechanism of PIM-SM is described in the following.

Fig. 2. An overview of PIM-SM: (a) sender and receiver joining the RP; (b) Rendezvous Point Tree (RPT); (c) Shortest Path Tree (SPT).
As shown in Fig. 2a, when a receiver joins a multicast group by sending an Internet Group Management Protocol (IGMP) message, its first-hop router (i.e., Designated Router) sends a PIM join message toward a Rendezvous Point (RP), as in Fig. 2a(1). The processing of this message by intermediate routers maintains status information for the group. From this information, a new branch of the distribution tree, i.e., a multicast tree, from the RP to the receivers is built. On the other hand, when a sender joins the multicast group, the sender sends a PIM register message, with data packets encapsulated, to the RP by unicast, as in Fig. 2a(2). After the RP receives the PIM register message from the sender, the RP sends a PIM join message toward the sender, as in Fig. 2a(3).
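The join propagation just described can be sketched as a toy simulation. The topology and router names below are illustrative, not from the paper: each router on the path toward the RP records group state, and propagation stops once an existing branch is reached.

```python
# Toy sketch of PIM join propagation: a join from a receiver's DR travels
# hop-by-hop toward the RP, and each intermediate router records group state,
# forming a branch of the RP-rooted tree. Router names are hypothetical.

def propagate_join(path_to_rp, group, state):
    """path_to_rp: routers from the DR to the RP; state: router -> set of groups."""
    for router in path_to_rp:
        if group in state.setdefault(router, set()):
            break  # a branch toward the RP already exists from this router
        state[router].add(group)
    return state

state = {}
propagate_join(["DR1", "R2", "R3", "RP"], "224.10.10.10", state)
propagate_join(["DR4", "R2", "R3", "RP"], "224.10.10.10", state)  # stops at R2
```

The second join only adds state at DR4, since R2 already has a branch for the group; this is how intermediate routers avoid duplicating tree state.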
Note that a single router (i.e., the RP) is selected in the PIM-SM domain and the RP-rooted tree, called a Rendezvous Point Tree (RPT), is set up. The packets of the sender are first sent to the RP by unicast, and then sent to the receivers on the RP-rooted tree as shown in Fig. 2b. If traffic or congestion occurs on the RPT, delivery is switched over to a tree originated at the sender (i.e., the shortest path from the sender to the receiver), called a Shortest Path Tree (SPT), as shown in Fig. 2c.

2.3 Bootstrap Mechanism
Here, an overview of the bootstrap mechanism [7],[8] is given.

Fig. 3. Outline of the bootstrap mechanism: (a) BSR election (Step 1); (b) C-RP advertisement and RP-Set formation (Steps 2 and 3); (c) RP-Set flooding (Step 4); (d) Group-to-RP mapping (Step 5).
Fig. 3 shows an outline of the bootstrap mechanism. The bootstrap mechanism is an algorithm that finds an RP on a shared tree dynamically. Some of the PIM routers within a PIM domain are configured to be Candidate Bootstrap Routers (C-BSRs) for the domain, as shown in Fig. 3a. In addition, some of the PIM routers in the PIM-SM domain act as potential RPs, called Candidate RPs (C-RPs), as shown in Fig. 3b. A PIM router may be configured as both C-BSR and C-RP; a C-BSR may be identical to a C-RP or may be different from it. One of the C-BSRs is elected as the Bootstrap Router (BSR). The BSR plays an important role in the mechanism. The procedures of the bootstrap mechanism [7],[8] are as follows:
Step 1. (BSR Election). Each C-BSR generates bootstrap messages (BSMs). Every BSM contains a BSR priority field, a BSR address field, and so on.
BSMs are flooded hop-by-hop throughout the domain. If a C-BSR hears about a higher-priority C-BSR than itself, the C-BSR stops sending further BSMs for some period of time. The single remaining C-BSR becomes the elected BSR.
Step 2. (C-RP Advertisement). Each C-RP within a domain sends periodic C-RP-Advertisement (C-RP-Adv) messages to the elected BSR, as shown in Fig. 3b. A C-RP-Adv message includes the priority of the advertising C-RP, group addresses, and so on.
Step 3. (RP-Set Formation). The BSR collects a set of C-RP information (the RP-Set). To form the RP-Set, the BSR selects a subset of the C-RPs from which it has received C-RP-Adv messages. Note that the RP-Set contains the following elements: multicast group range, RP priority, RP address, hash mask length, and so on.
Step 4. (RP-Set Flooding). In future BSMs, the BSR includes the RP-Set information. BSMs are flooded through the domain, which ensures that the RP-Set rapidly reaches all the routers in the domain, as shown in Fig. 3c.
Step 5. (Group-to-RP Mapping). When a Designated Router (DR) receives an IGMP message from a directly connected receiver for a group for which it has no state, the DR uses an algorithmic mapping to bind the group to one of the RPs in the RP-Set. In this paper, the following hash function is used as the algorithmic mapping. The same function is used in all routers in the domain, which guarantees that every router selects the same RP from the RP-Set for a given group address.

f = (n · ((n · (g & m) + k) XOR C(i)) + k) mod 2^31

where g is the multicast group address; m is a hash mask (for IPv4, a mask length of m = 30 is recommended by RFC 2362 and RFC 4601); C(i) is the IP address of the i-th RP in the RP-Set; and n and k are constants, n = 1103515245 and k = 12345.
In the procedures of Step 1 and Step 4 mentioned above, data flooding is performed to all routers within the domain.
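Returning to the Step 5 mapping, the hash can be sketched as follows. The constants are from the text; the addresses are illustrative, and the "highest hash value wins" selection rule follows the common BSR convention rather than being stated in this paper.

```python
# Sketch of the Group-to-RP hash (RFC 2362/4601 style) described in Step 5.
# Addresses are illustrative; highest-hash-wins is an assumed selection rule.
import ipaddress

N = 1103515245
K = 12345
MOD = 2 ** 31

def hash_mask(length):
    """32-bit mask with `length` leading one bits (m = 30 recommended for IPv4)."""
    return ((1 << length) - 1) << (32 - length) if length else 0

def rp_hash(group, mask_len, c_rp):
    g = int(ipaddress.IPv4Address(group))
    c = int(ipaddress.IPv4Address(c_rp))
    m = hash_mask(mask_len)
    return (N * ((N * (g & m) + K) ^ c) + K) % MOD

def select_rp(group, mask_len, rp_set):
    # Every router runs the same function over the RP-Set, so all routers
    # pick the same RP for a given group address.
    return max(rp_set, key=lambda c_rp: rp_hash(group, mask_len, c_rp))
```

Because the mask zeroes the low bits of the group address, consecutive group addresses within the same masked block hash to the same RP, which spreads whole blocks of groups across the RP-Set.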
However, these processes may cause traffic in the network and may take a lot of time.

2.4 Hybrid Bootstrap Mechanism
Here, an overview of the hybrid mechanism [9],[10] is given. As described in Subsection 2.3, Steps 1 and 4 (Figs. 3a and 3c, respectively) perform flooding to all routers within the domain. However, the flooding may cause traffic or congestion, and its processing may take a lot of time. The procedures from Step 1 to Step 3 in the hybrid method, shown in Figs. 4a and 4b, are the same as those in the conventional one, shown in Figs. 3a and 3b. However, in Step 4, as shown in Fig. 4c, the selection of one RP from the C-RPs is performed by the BSR in the hybrid method, not by the DR. Since only the BSR knows the RP-Set after Step 3 (Fig. 4b), the BSR can select the RP. Thus, the BSR can also embed the information of the RP into the BSM in advance.
Fig. 4. Hybrid bootstrap mechanism: (a) BSR election (Step 1); (b) C-RP advertisement and RP-Set formation (Steps 2 and 3); (c) Group-to-RP mapping (Step 4).
In this way, Step 4 of the hybrid method, as shown in Fig. 4c, is performed by unicast between the DR and the BSR as follows: the DR sends a query to the BSR; then, the BSR sends a BSM containing the RP-Set to the DR. Thus, the hybrid method does not need the flooding of BSMs to all routers in Step 4 (Fig. 4c). The procedures of the hybrid method are as follows:
Step 1. (BSR Election). The same as in Subsection 2.3.
Step 2. (C-RP Advertisement). The same as in Subsection 2.3.
Step 3. (RP-Set Formation). The same as in Subsection 2.3.
Step 4. (Group-to-RP Mapping). When a Designated Router (DR) receives an IGMP message from a directly connected receiver for a group for which it has no state, the DR sends a query to the BSR; then, the BSR sends a BSM containing the RP-Set information to the DR.
Note that, because the hybrid method does not need to distribute the RP-Set by flooding as in Step 4 of Fig. 3c, the communication load on the domain is reduced. In addition, the processing time is reduced.
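The load reduction can be illustrated with a rough message-count model. The model below (one message per router per flooded BSM, one query and one reply per joining DR) is an assumption for illustration, not a result from the paper:

```python
# Rough message-count comparison between the conventional bootstrap (Step 4
# floods the RP-Set to every router) and the hybrid method (each joining DR
# queries the BSR by unicast). The cost model is a simplifying assumption.

def conventional_step4_messages(num_routers):
    # RP-Set flooding reaches all routers in the domain.
    return num_routers

def hybrid_step4_messages(num_joining_drs):
    # Each joining DR exchanges one query and one reply with the BSR.
    return 2 * num_joining_drs
```

Under this model, with 676 routers and a handful of joining DRs the hybrid method exchanges an order of magnitude fewer Step-4 messages, which matches the direction of the reduction the text describes.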
3 Proposed Method
In this section, an overview of the proposed method and its procedures is given.

3.1 Overview of Proposed Method
In both the conventional bootstrap method and the hybrid one mentioned above, a BSR is selected from the C-BSRs in Step 1. Since the procedure of Step 1 (i.e., the BSR election) is based on the BSR priority, e.g., the IP address, in the BSM of each C-BSR, as shown in Figs. 3a and 4a, a boundary router of the domain may happen to be selected as the BSR. In this case, BSMs of the BSR in the procedure of Step 4 (i.e., RP-Set Flooding) are flooded to all routers in the domain, as shown in Fig. 3c. However, since some routers require many hops to reach from the BSR, delivering the BSM to all routers may
take a lot of time, so the processing time may become longer. As a result, the performance of both methods can decrease. In the proposed min-max hop-count method, when one BSR is selected from the C-BSRs in Step 1, the procedure is based on distances, i.e., the number of hops, rather than on IP addresses. Because the BSR is selected by hop count, the distances over which data are sent and received between sources and destinations tend to become middle-range or short rather than long. Thus, the proposed method makes it possible to reduce the processing time compared with the conventional one. Note that it is assumed that the BSM of each C-BSR contains this information, i.e., the distances between C-BSRs.

3.2 Procedures of Proposed Min-Max Hop-Count Method
Fig. 5 shows the proposed method for Step 1 of the mechanism. Steps 2 to 5 in the proposed method are the same as those in the conventional one.

Fig. 5. Example of the proposed min-max hop-count method in the bootstrap mechanism. IP addresses of the C-BSRs: A: 224.10.10.10, B: 239.10.10.20, C: 224.10.10.30, D: 224.10.10.40, E: 224.10.10.50, F: 239.10.10.60. (a) Step 1-1: hop counts from each C-BSR to the others, A(B, C, D, E, F) = A(2, 1, 2, 1, 3); B(C, D, E, F, A) = B(2, 2, 3, 2, 2); C(D, E, F, A, B) = C(1, 1, 2, 1, 2); D(E, F, A, B, C) = D(2, 1, 2, 2, 1); E(F, A, B, C, D) = E(3, 1, 3, 1, 2); F(A, B, C, D, E) = F(3, 2, 2, 1, 3). (b) Step 1-2: the vectors are sent to all routers (A, B, C, D, E, F). (c) Step 1-3: maximum hop count per router: A = 3, B = 3, C = 2, D = 2, E = 3, F = 3. (d) Step 1-4: C and D share the minimum value; D is selected.
The procedures from Step 1-1 to Step 1-4 in the proposed method are as follows:
Step 1. (BSR Election). Each C-BSR generates BSMs. Every BSM contains a BSR priority field, a BSR address field, and so on. BSMs are flooded hop-by-hop throughout the domain.
Step 1-1. Each router finds the minimum number of hops between itself and each of the other routers, as shown in Fig. 5a.
Step 1-2. Each router sends the information obtained in Step 1-1 to all routers in its BSM, as shown in Fig. 5b.
Step 1-3. For each router, the maximum is taken over the hop counts obtained in Step 1-2, as shown in Fig. 5c.
Step 1-4. The router which has the minimum value among the results of Step 1-3 is selected. Note that, if several routers share the same value, the router with the higher Class D IP address is selected, as shown in Fig. 5d.
Step 2. (C-RP Advertisement). The same as in Subsection 2.3.
Step 3. (RP-Set Formation). The same as in Subsection 2.3.
Step 4. (RP-Set Flooding). The same as in Subsection 2.3.
Step 5. (Group-to-RP Mapping). The same as in Subsection 2.3.
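Steps 1-1 to 1-4 can be sketched directly with the hop-count vectors and addresses from Fig. 5 (the election function itself is a sketch of the procedure, not the paper's implementation):

```python
# Sketch of the min-max hop-count BSR election (Steps 1-1 to 1-4), using the
# hop-count vectors and Class D addresses from Fig. 5.
import ipaddress

hops = {  # Steps 1-1/1-2: each C-BSR's hop counts to the other C-BSRs
    "A": [2, 1, 2, 1, 3],
    "B": [2, 2, 3, 2, 2],
    "C": [1, 1, 2, 1, 2],
    "D": [2, 1, 2, 2, 1],
    "E": [3, 1, 3, 1, 2],
    "F": [3, 2, 2, 1, 3],
}
addr = {
    "A": "224.10.10.10", "B": "239.10.10.20", "C": "224.10.10.30",
    "D": "224.10.10.40", "E": "224.10.10.50", "F": "239.10.10.60",
}

def elect_bsr(hops, addr):
    # Step 1-3: per-router maximum hop count (its eccentricity).
    ecc = {r: max(v) for r, v in hops.items()}
    # Step 1-4: minimize the maximum; break ties by the higher IP address.
    best = min(ecc.values())
    tied = [r for r, e in ecc.items() if e == best]
    return max(tied, key=lambda r: int(ipaddress.IPv4Address(addr[r])))

print(elect_bsr(hops, addr))  # C and D tie at 2; D wins by the higher address
```

Minimizing the maximum hop count places the BSR near the "center" of the domain, which is exactly what keeps the subsequent RP-Set flooding paths short.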
4 Evaluation
In this section, the computer simulations performed to verify the proposed method are described. We conducted a simulation to compare the performance of the bootstrap mechanism, the hybrid one, and the proposed one. The simulation was written in the C programming language, compiled with Visual C++ 2008, and run on Windows XP SP3 with a Pentium Dual-Core 2.5 GHz CPU and 2048 MB RAM. Simulation runs were repeated until the 95 percent confidence intervals for the sample means were acceptable, where a source (i.e., sender) and destinations (i.e., receivers) were randomly placed on the four assumed topologies shown in Fig. 6. The topologies were a mesh-type topology and complete-, partial-, and incomplete-binary-tree topologies.
Fig. 6. Topologies used for simulations: (a) mesh; (b) complete binary tree; (c) partial binary tree; (d) incomplete binary tree.
Note that the Class D IP addresses of the routers regarded as nodes were also randomly assigned. To examine the scalability of the mechanism, the number of nodes used for a topology was 16, 49, 100, 144, 256, and 676, respectively, in the simulations. Also note that, to investigate the worst-case influence of the flooding, all routers were configured as C-BSRs and C-RPs. For the bootstrap mechanism, the hybrid one, and the proposed one, the mean processing times were measured from start to finish in the simulations for each topology; that is, from Step 1 to Step 5 as described in Subsection 2.3, from Step 1 to Step 4 as described in Subsection 2.4, and from Step 1 to Step 5 as described in Section 3, respectively.
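The stopping rule for the runs can be sketched as follows. The acceptance threshold (a half-width within 5% of the mean) and the normal approximation are assumptions; the paper does not state its exact criterion.

```python
# Sketch of "repeat runs until the 95% confidence interval is acceptable".
# The 5%-of-mean threshold and normal approximation are assumptions.
import math
import random
import statistics

def mean_with_ci(runs):
    m = statistics.mean(runs)
    s = statistics.stdev(runs)
    half = 1.96 * s / math.sqrt(len(runs))  # normal approx. of the 95% CI half-width
    return m, half

def simulate_until_acceptable(run_once, min_runs=30, rel_width=0.05, max_runs=10000):
    runs = [run_once() for _ in range(min_runs)]
    while True:
        m, half = mean_with_ci(runs)
        if half <= rel_width * m or len(runs) >= max_runs:
            return m, half, len(runs)
        runs.append(run_once())

random.seed(1)
m, half, n = simulate_until_acceptable(lambda: random.gauss(100, 10))
```

The `run_once` callable stands in for one measured simulation run; with the synthetic distribution above, the loop terminates once the interval half-width drops below 5% of the running mean.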
5 Discussion
Figs. 7 to 12 show the simulation results for 676, 256, 144, 100, 49, and 16 nodes, respectively. Table 1 summarizes the reduction ratio of the hybrid method [9] relative to the conventional one [7], and that of the proposed method relative to the conventional one [7], in processing time. From Fig. 7 and Table 1, for 676 nodes, the reduction ratio of the proposed method relative to the conventional one [7] is more than 70% for the mesh and the complete binary tree, and roughly 7% for the partial and incomplete binary trees. Furthermore, in uniform topologies such as the mesh and the complete binary tree in Figs. 8 to 12, the proposed method is almost always superior to the other methods.
Fig. 7. Simulation Results on 676 nodes
Fig. 8. Simulation Results on 256 nodes
Fig. 9. Simulation Results on 144 nodes
Fig. 10. Simulation Results on 100 nodes
Fig. 11. Simulation Results on 49 nodes
Fig. 12. Simulation Results on 16 nodes
Table 1. Comparison of the reduction ratio of [9] to [7] and that of the proposed method to [7] in processing time for four topologies on 676, 256, 144, 100, 49, and 16 nodes, respectively

Topology                 Bootstrap [7]  Hybrid method [9] (#1)  Prop. method (#2)
Mesh (26 × 26)           100%           55.4%                   73.3%
Complete binary tree     100%           55.5%                   74.8%
Partial binary tree      100%           55.6%                   7.6%
Incomplete binary tree   100%           50.8%                   7.1%
Mesh (16 × 16)           100%           54.2%                   72.5%
Complete binary tree     100%           52.2%                   69.2%
Partial binary tree      100%           54.1%                   17.9%
Incomplete binary tree   100%           50.1%                   16.5%
Mesh (12 × 12)           100%           51.9%                   66.0%
Complete binary tree     100%           47.5%                   57.8%
Partial binary tree      100%           54.5%                   23.4%
Incomplete binary tree   100%           32.6%                   23.5%
Mesh (10 × 10)           100%           51.6%                   64.2%
Complete binary tree     100%           49.3%                   47.8%
Partial binary tree      100%           44.6%                   38.9%
Incomplete binary tree   100%           56.6%                   36.3%
Mesh (7 × 7)             100%           48.3%                   51.7%
Complete binary tree     100%           47.2%                   45.3%
Partial binary tree      100%           26.2%                   40.5%
Incomplete binary tree   100%           36.1%                   47.5%
Mesh (4 × 4)             100%           14.3%                   28.6%
Complete binary tree     100%           31.6%                   42.1%
Partial binary tree      100%           30.8%                   46.2%
Incomplete binary tree   100%           14.3%                   50.0%
ave.                     −              44.4%                   43.7%

#1: the reduction ratio (%) of the hybrid method [9] to [7] = (1 − [9]/[7]) × 100;
#2: the reduction ratio (%) of the proposed method to [7] = (1 − Prop./[7]) × 100.
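The footnote formulas can be applied directly. For example, a proposed-method processing time that is 26.7% of the baseline's yields the 73.3% entry for the 26 × 26 mesh:

```python
# Reduction ratio as defined in Table 1's footnotes: (1 - method/baseline) * 100,
# where times are processing times and the bootstrap mechanism [7] is the baseline.

def reduction_ratio(method_time, baseline_time):
    return (1 - method_time / baseline_time) * 100
```

A ratio of 0% means no improvement over the baseline; negative values would indicate a slowdown.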
However, from the reduction ratios in Table 1, in non-uniform topologies such as the partial and incomplete binary trees, the performance of the proposed method is better than that of the conventional one but worse than that of the hybrid one for large numbers of nodes, as shown in Figs. 7 to 10. This is because some nodes may still be many hops away from the BSR, even though the BSR was selected to minimize the maximum hop count. On the other hand, the smaller the number of nodes in a topology, the shorter the processing time becomes, as shown in Figs. 11 and 12. As a result, the reduction ratio of the proposed method is better than that of the other methods for small numbers of nodes. Thus, for large numbers of nodes in uniform topologies, the processing time of the proposed method can be reduced greatly compared with that of the other methods. On the other hand, for small numbers of nodes in uniform topologies and large numbers of nodes in non-uniform ones, the proposed method may affect the whole processing time compared with the other methods, because finding the distances (i.e., the numbers of hops) between routers takes much time.
6 Conclusions
In this paper, a min-max hop-count based self-discovering method of a Bootstrap Router (BSR) for the bootstrap mechanism in multicast routing was proposed and its performance evaluated. The key idea of the proposed method is that the BSR is selected so that it is positioned at almost equal distances from the other routers, using a min-max hop-count criterion. In particular, the proposed method can efficiently reduce the processing time to discover the BSR and a Rendezvous Point, which plays a central role in data transmission. Simulation results show that the proposed method reduces the processing time by more than 43% on average compared with that of the original mechanism. Further research issues remain to be explored; these include running precise simulations on various topologies, developing an efficient self-management mechanism with fault tolerance to find the BSR, and applying the mechanism to mobile environments. Acknowledgments. The author would like to thank Mr. Masaki Furuya, who was a member of the author’s laboratory, for his support of this work.
References
1. Edwards, B.M., Giuliano, L.A., Wright, B.R.: Interdomain Multicast Routing. Addison-Wesley, Reading (2002)
2. Paul, P., Raghavan, S.V.: Survey of Multicast Routing Algorithms and Protocols. In: 15th Int’l. Conf. Computer Communication, pp. 902–926. IEEE Press, New York (2002)
3. Sahasrabuddhe, L.H., Mukherjee, B.: Multicast Routing Algorithms and Protocols: A Tutorial. IEEE Network 14(1), 90–102 (2000)
4. Ramalho, M.: Intra- and Inter-Domain Multicast Routing Protocols: A Survey and Taxonomy. IEEE Communications Surveys & Tutorials 3(1), 2–25 (2000)
5. Deering, S., Estrin, D., Farinacci, D., Jacobson, V., Liu, C., Wei, L.: The PIM Architecture for Wide-Area Multicast Routing. IEEE/ACM Trans. Networking 4(2), 153–162 (1996)
6. Fenner, W., Handley, M., Holbrook, H., Kouvelas, I.: Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised). RFC 4601 (August 2006)
7. Estrin, D., Handley, M., Helmy, A., Huang, P., Thaler, D.: A Dynamic Bootstrap Mechanism for Rendezvous-Based Multicast Routing. In: IEEE 18th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 1999), vol. 3, pp. 1090–1098. IEEE Press, New York (1999)
8. Bhaskar, N., Gall, A., Lingard, J., Venaas, S.: Bootstrap Router (BSR) Mechanism for Protocol Independent Multicast (PIM). RFC 5059 (January 2008)
9. Takabatake, T.: A Hybrid Bootstrap Mechanism for Multicast Routing in PIM-SM. In: Australasian Telecommunication Networks and Applications Conference 2007 (ATNAC 2007), pp. 332–336. IEEE Press, New York (2007)
10. Takabatake, T.: Evaluation of the Hybrid Bootstrap Mechanism for Multicast Routing. In: 2nd IEEE Int’l. Symp. Advanced Networks and Telecommunication Systems (IEEE ANTS 2008), pp. 1–3. IEEE Press, New York (2008)
11. Savola, P., Haberman, B.: Embedding the Rendezvous Point (RP) Address in an IPv6 Multicast Address. RFC 3956 (November 2004)
12. Bista, B.B., Chakraborty, G., Takata, T.: An Efficient RP (Rendezvous Point) Replacement Mechanism for Rendezvous-Based Multicast Routing. In: 23rd Int’l. Conf. Distributed Computing Systems Workshops (ICDCSW 2003), pp. 514–518. IEEE Press, New York (2003)
13. Harutyunyan, H.A., Dong, X.: A New Algorithm for RP Selection in PIM-SM Multicast Routing. In: 3rd IASTED Int’l. Conf. Wireless and Optical Communications, pp. 566–571. ACTA Press (2003)
14.
Lin, H.-C., Lin, Z.-H.: Selection of Candidate Cores for Core-Based Multicast Routing Architectures. In: IEEE Int’l. Conf. Communications (ICC 2002), vol. 1, pp. 2662–2666. IEEE Press, New York (2002)
15. Lin, Y.-D., Hsu, N.-B., Pan, C.-J.: Extension of RP Relocation to PIM-SM Multicast Routing. In: IEEE Int’l. Conf. Communications (ICC 2001), vol. 1, pp. 234–238. IEEE Press, New York (2001)
16. Gulati, A., Rai, S.: Core Discovery in Internet Multicast Routing Protocol. Int’l. J. Communication Systems 12(5–6), 349–362 (1999)
ALPHA: Proposal of Mapping QoS Parameters between UPnP Home Network and GMPLS Access
Lukasz Brewka 1, Pontus Sköldström 2, Anders Gavler 2, Viktor Nordell 2, Henrik Wessing 1, and Lars Dittmann 1
1 Department of Photonics Engineering, Technical University of Denmark, Kgs. Lyngby, Denmark {ljbr,hewe,ladit}@fotonik.dtu.dk
2 Department of Networking and Transmission Lab, Acreo AB, 164 40 Kista, Sweden {ponsko,andgav,viknor}@acreo.se
Abstract. This paper treats interdomain QoS signaling between the home and access domains, with a focus on providing QoS between a UPnP-QoS based home network and a GMPLS based access network. The work presented here defines a possible approach for an interface between UPnP-QoS and GMPLS in order to move towards end-to-end QoS establishment, and investigates the complexity of such a solution. We present the QoS parameters and mechanisms of both UPnP-QoS and GMPLS and how they can be matched to create a coherent QoS architecture. Keywords: Interdomain QoS, UPnP-QoS, GMPLS, Auto-Discovery and Auto-Configuration of end-systems and access devices.
1 Introduction
Home networks and the network services available to residential users are under constant development. The integration of services is becoming a reality, and lately much attention has been attracted by triple-play services. One of the requirements for quality delivery of triple-play services over a single broadband connection is the ability to differentiate between the services as well as to provide end-to-end QoS. The means for guaranteeing QoS usually differ between domains. In this paper we consider the two areas that are the ICT ALPHA project’s main focus, i.e., home and access networks. In the home network domain we consider a service based architecture for further analysis. In this paper we describe the UPnP-QoS Architecture [1] as a control and management protocol. We examine the suitability of UPnP-QoS for the management of a modern home network, and we consider an implementation of UPnP-QoS version 3. UPnP-QoS defines the approach for providing QoS as an application layer management protocol; it does not define any actual means of mapping policy based priorities into link/network layer technologies such as Ethernet or WiFi. This allows more flexibility and leaves the decision about the marking R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 226–239, 2011. c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
Mapping QoS Parameters between UPnP and GMPLS
227
to the network implementers. In this paper we focus on the edge between the home and access networks and on mapping the parameters signaled by UPnP-QoS components to the QoS scheme used in the access network. As the access network technology we consider a packet based Active Optical Network (AON), e.g. based on Ethernet or MPLS. Within the scope of AON networks we investigate the GMPLS protocol suite, using OSPF-TE for routing and RSVP-TE as its resource reservation protocol. MPLS and GMPLS are often seen as core technologies, but during recent years MPLS usage has been pushed towards the end-customers and is commonly referred to as “MPLS access”. Together with the common belief that future broadband access should be viewed as the “fourth utility”, and the future need for higher bandwidth in this part of the network, this makes GMPLS a possible future control plane due to its traffic engineering and multi-technology data plane support (e.g. high capacity optical networks). The use case that motivates the discussion in this paper is depicted in Fig. 1. Integration of QoS provisioning in the home and access networks allows preservation of the flow transmission parameters, like delay, jitter and data loss, between a host in the home and a server in the access network, e.g. preventing the above listed traffic flow parameters from degradation due to background traffic (as in Fig. 1, where the solid line, a Video on Demand service, is protected from the dashed line, background traffic).
Fig. 1. UPnP-GMPLS use case
Proposing a control and management plane interface between the UPnP network and the GMPLS network is an important step towards the integration of these two, in our view, important technologies in home and access networks, respectively. The integration of QoS within these domains would allow end-to-end QoS provisioning for services that are provided by the access network operator, or services where the operator has direct connectivity with an external service provider (which might be a common case). End-to-end services that traverse more domains, e.g. the entire Internet, are out of scope. In this paper we usually use “mapping” to mean the translation of QoS parameters from one domain to a neighboring domain (which we also call horizontal mapping); to distinguish it from the mapping performed between different OSI layers within the same domain (usually in the same network component), we refer to the latter as vertical mapping.
Related Work
The authors of [2] recognize the need for QoS information exchange between the home and access network, but propose to “outsource” the flow classification to the access network. They correctly claim that the use of RSVP imposes limitations: “applications need to be specially (re)written; the approach is not scalable as routers need to track resource requests and usage of multiple independent flows; typical consumer access routers are low-power devices and potentially lack the resources to implement this solution” [2]. Our solution does not require redirecting a copy of all customer traffic to a centralized classifier, and additionally the user’s equipment only needs to be UPnP-QoS enabled (an extension to UPnP, which is already widely deployed on personal devices). As for scalability, in our scenario only a few quality-sensitive applications need to support UPnP-QoS; moreover, scalability is not a great concern, since the described scenario does not consider global end-to-end reservations but is limited to smaller domains designed to meet scalability requirements. Additionally, we do not necessarily have a 1:1 relationship between application flows and network reservations, i.e., application flows can be merged into a single reservation, thus reducing the amount of signaling state. An investigation of end-to-end QoS establishment and some work on the integration of reservations is presented in [3], where the authors use SIP information to discover the domains in which to request QoS. The authors however do not present how specific QoS parameters like bandwidth, delay, etc. are signaled in different domains. This paper contributes with the first, to our knowledge, QoS mapping and signaling scheme between a UPnP-QoS based home network and a GMPLS based access network. It outlines the design that later enables an implementation and verification phase. The design, implementation and verification of such an interface are included as FP7 ICT ALPHA goals. 
The remainder of this paper is organized as follows: Section 2 treats UPnP-QoS, Section 3 describes the QoS approaches in GMPLS. These sections are followed by the mapping strategies in Section 4; finally, in Section 5 the conclusions are given.
2 In Home QoS - UPnP-QoS
The UPnP-QoS Architecture [1] defines a number of services responsible for QoS provisioning in the home network. There are four distinct components in the UPnP-QoS Architecture; these can be seen in the overview of the architecture in Fig. 2. The Control Point (CP) is the entity that requests QoS for a traffic flow (typically it is part of an application that is the traffic source); it is aware of the requirements of the traffic flow and of its source and destination addresses. The QoS Manager (QM) is the entity responsible for QoS establishment; it contacts the QoS Policy Holder (QPH) to obtain the policies that should be enforced for particular traffic flows, and it also monitors the state of, and requests the admittance of traffic on, the QoS Devices (QDs) along the calculated network path.
[Fig. 2 shows the four components and their interactions: a Control Point sends a Request QoS (carrying a Traffic Descriptor) to the QoS Manager; the QoS Manager obtains the Traffic Policy from the QoS Policy Holder (configured via Set Policy) and configures the QoS Devices, which Report back.]
Fig. 2. UPnP architecture
UPnP-QoS defines three types of QoS: prioritized, parameterized, and a hybrid. For these QoS types, UPnP-QoS uses different parts of the Traffic Descriptor [1] for defining the requirements towards devices’ capabilities and configurations. In the subsections below we describe the Traffic Descriptor parameters for prioritized and parameterized QoS. Later, in Section 4, we discuss the task of mapping the parameters conveyed by the Traffic Descriptor to the interface proposed in this paper, specifying the input for the establishment of the resource reservations in the access network.
2.1 Prioritized QoS in UPnP
Traffic prioritization usually gives good results in preserving the transmission parameters of different flow types, though only when there is no oversubscription within the priority classes. It is performed by marking packets belonging to different classes with their priority and then treating them differently during forwarding. The main advantage of this approach is its simplicity and scalability, though it is important to point out that a prioritized setup does not provide any end-to-end guarantees, since it acts on a per-hop basis and there is no traffic flow specific bandwidth allocation [4]. This type of QoS provisioning is performed by the UPnP-QoS prioritized setup, which works as follows: after the CP requests QoS, the QM determines which QoS Devices (QDs) should take part in the forwarding of the traffic flow by invoking the GetPathInformation action; it also verifies the state of these devices via the GetExtendedQosState action. Next, the QM obtains the Traffic Importance Number (TIN) for this particular traffic flow from the QPH and attempts the establishment of the QoS on the QDs using the AdmitTrafficQoS action, passing the Traffic Descriptor with the proper TIN as this action’s argument. If no errors occur throughout the above procedure and the configuration of the QDs, the specific traffic flow should be admitted, and the QM sends the CP an UpdatedTrafficDescriptor containing up-to-date information about the traffic specification. The messages exchanged between the UPnP-QoS entities are presented in Fig. 3 above the dashed line.
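The QM-side sequence just described can be sketched as follows; the UPnP-QoS action names (GetPathInformation, GetExtendedQosState, AdmitTrafficQoS) come from the specification, but the Python classes and method names here are purely illustrative stubs, not a UPnP stack.

```python
class StubQD:
    """Illustrative stand-in for a QoS Device."""
    def __init__(self):
        self.admitted = None
    def get_extended_qos_state(self):     # GetExtendedQosState
        return "OK"
    def admit_traffic_qos(self, td):      # AdmitTrafficQoS(TrafficDescriptor)
        self.admitted = dict(td)

class StubQM:
    """Illustrative stand-in for the path-discovery part of the QoS Manager."""
    def __init__(self, qds):
        self._qds = qds
    def get_path_information(self, td):   # GetPathInformation
        return self._qds

class StubQPH:
    """Illustrative QoS Policy Holder returning a fixed policy."""
    def get_traffic_importance_number(self, td):
        return 6  # e.g. a high-priority class

def setup_prioritized_qos(qm, qph, td):
    qds = qm.get_path_information(td)          # QDs on the flow's path
    for qd in qds:
        qd.get_extended_qos_state()            # verify device state
    td["Tin"] = qph.get_traffic_importance_number(td)
    for qd in qds:
        qd.admit_traffic_qos(td)               # admit the flow on each QD
    return td                                  # the UpdatedTrafficDescriptor

qds = [StubQD(), StubQD()]
result = setup_prioritized_qos(StubQM(qds), StubQPH(), {"TrafficId": "flow-1"})
```

After the call, every QD on the path holds the Traffic Descriptor with the TIN obtained from the QPH.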
Fig. 3. The UPnP-QoS architecture
As stated before, UPnP-QoS does not consider how a QD configures the vertical mapping from TIN to link/network layer prioritization; however, the UPnP-QoS specification provides guidelines on how to map the TIN into the VLAN priority tag (802.1Q) and the DSCP field. This mapping is presented in Table 1. The TIN, beside the TrafficId (used for unique identification of packets belonging to a particular stream), is the only mandatory part of the Traffic Descriptor when setting up prioritized QoS.

Table 1. Vertical mapping between UPnP-QoS TIN and link/network layers

Traffic Importance Number   VLAN / IEEE 802.1Q priority   DSCP
0                           0                             0x00
1                           1                             0x08
2                           2                             0x10
3                           3                             0x18
4                           4                             0x20
5                           5                             0x28
6                           6                             0x30
7                           7                             0x38
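Assuming the one-to-one guideline of Table 1, the vertical mapping can be expressed as a pair of trivial functions (the function names are ours, not part of the UPnP-QoS specification):

```python
def tin_to_vlan_priority(tin: int) -> int:
    """Map a Traffic Importance Number to the IEEE 802.1Q priority (one-to-one)."""
    if not 0 <= tin <= 7:
        raise ValueError("TIN must be in 0..7")
    return tin

def tin_to_dscp(tin: int) -> int:
    """Map a TIN to the DSCP field; Table 1 uses DSCP = TIN * 8 (0x00..0x38)."""
    if not 0 <= tin <= 7:
        raise ValueError("TIN must be in 0..7")
    return tin << 3

print(hex(tin_to_dscp(7)))  # prints 0x38, matching the last row of Table 1
```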
2.2 Parameterized QoS in UPnP
With parameterized QoS, network resources (typically tokens or forwarding buffer space) are reserved on all nodes involved in forwarding the traffic, based on a set of parameters such as bandwidth, thus guaranteeing that admitted traffic will be treated in the desired manner. A communication sequence diagram showing the signaling between the different UPnP-QoS entities for this setup is presented in Fig. 3 (both above and below the dashed line, showing the possibility of preemption). As for the prioritized setup, the CP initiates the QoS establishment. Next, the QM requests the topology information from the QDs, then the policies from the QPH, and attempts the traffic admittance on the devices on the traffic path.
If the reservation fails, the QM can attempt to preempt (if requested) already admitted traffic and re-admit the traffic. Finally, upon successful QoS admittance, the QM sends the CP an UpdatedTrafficDescriptor (for the parameterized setup containing rate, end-to-end delay, jitter and other values described later in this section). The key parameters for setting up parameterized QoS are placed in the Traffic Descriptor structure, which is passed as an argument of the AdmitTrafficQoS action. This invokes the admission mechanisms towards the network/link layer. Among the many parameters included in the Traffic Descriptor, the most relevant for the parameterized QoS setup is the AvailableOrderedTspecList, which contains a list of Traffic Specifications (Tspec); the Tspec in turn is composed of a number of traffic parameters. Below the Tspec parameters (precisely the v3TrafficSpecification fragment) are listed together with their unit and an indication of whether the field is optional (o) or mandatory (m); for clarity, chosen parameters are shortly described.
– RequestedQosType (o) - prioritized, parameterized or hybrid
– DataRate (m) - bytes per second
– TimeUnit (o) - this integer field specifies the smallest time interval, in μs
– PeakDataRate (o) - bytes per second
– MaxBurstSize (o) - bytes
– MinServiceRate (o) - bytes per second
– ReservedServiceRate (o) - bytes per second
– MaxPacketSize (o) - bytes
– E2EMaxDelayHigh (o) - desired upper bound of the end-to-end delay, in microseconds
– E2EMaxJitter (o) - microseconds
– E2EMaxDelayLow (o) - expresses that packet delays smaller than E2EMaxDelayLow are not necessary, in microseconds
– QosSegmentSpecificParameters - Interface ID, QoSSegment ID and segment-specific delay and jitter values
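As a rough illustration, the v3TrafficSpecification fragment above could be held in a container such as the following sketch; the field names mirror the list above, with only DataRate mandatory (the Python names, types and example values are our assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrafficSpecification:
    """Illustrative container for the v3TrafficSpecification fields."""
    data_rate: int                               # DataRate (m), bytes per second
    requested_qos_type: Optional[str] = None     # "prioritized", "parameterized" or "hybrid"
    time_unit: Optional[int] = None              # smallest time interval, microseconds
    peak_data_rate: Optional[int] = None         # bytes per second
    max_burst_size: Optional[int] = None         # bytes
    min_service_rate: Optional[int] = None       # bytes per second
    reserved_service_rate: Optional[int] = None  # bytes per second
    max_packet_size: Optional[int] = None        # bytes
    e2e_max_delay_high: Optional[int] = None     # microseconds
    e2e_max_jitter: Optional[int] = None         # microseconds
    e2e_max_delay_low: Optional[int] = None      # microseconds

# Example: a 5 Mbit/s video flow requesting a 100 ms delay bound
tspec = TrafficSpecification(data_rate=625_000, peak_data_rate=1_250_000,
                             max_burst_size=15_000, e2e_max_delay_high=100_000)
```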
3 In Access QoS - GMPLS/RSVP
Generalised MPLS (GMPLS) is a suite of protocols developed by the IETF for reserving resources in networks that may consist of multiple network technologies, for example MPLS, OTN, or SDH. The signaling protocol, RSVP-TE, is of interest here, as it is responsible for the actual reservations. The GMPLS suite involves other protocols, e.g. OSPF-TE, which is responsible for distributing routing information such as the available bandwidth on a particular link (see Fig. 4). RSVP-TE reserves resources by transmitting a request (the RSVP Path message) from the ingress node through the network to the egress node. The egress node confirms the reservation by replying with an RSVP Resv message, which traverses the same path as the request back to the ingress. Any of the network nodes involved in the reservation may, upon reception of either message, abort the
setup by transmitting a PathErr/ResvErr message if, e.g., its available resources are less than the requested amount. A GMPLS network may include other entities separate from the network nodes themselves, such as a Service Management System for initiating the reservation process or a Path Computation Engine that calculates which path is suitable for a particular reservation. Since GMPLS is an extensive effort we will not go into details; more information can be found on the homepage of the IETF CCAMP working group [5].
Fig. 4. The GMPLS architecture
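The hop-by-hop admission behavior described above (Path downstream, Resv upstream, PathErr on insufficient resources) can be caricatured in a few lines; this is a conceptual toy model, not an RSVP-TE implementation:

```python
def reserve(path_nodes, requested_bw):
    """path_nodes: list of dicts with an 'available_bw' entry (bytes/s)."""
    for hop, node in enumerate(path_nodes):      # Path message going downstream
        if node["available_bw"] < requested_bw:
            return f"PathErr at hop {hop}"       # a node aborts the setup
    for node in reversed(path_nodes):            # Resv message going upstream
        node["available_bw"] -= requested_bw     # commit the reservation
    return "Resv confirmed"

nodes = [{"available_bw": 10_000_000}, {"available_bw": 5_000_000}]
print(reserve(nodes, 4_000_000))  # prints Resv confirmed
print(reserve(nodes, 4_000_000))  # prints PathErr at hop 1
```

The second request fails because the first one left only 1 MB/s on the second hop; in real RSVP-TE the state would additionally be refreshed periodically and torn down explicitly.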
3.1 Prioritized QoS in GMPLS
Prioritized QoS in a GMPLS network is based on Differentiated Services (DiffServ), where the Per Hop Behavior (PHB) defines how the flows associated with a particular label should be processed in the node; this information is carried in the RSVP-TE DiffServ object [6]. RSVP-TE can signal DiffServ in two ways:
– for packet oriented networks an E-LSP-like approach can be used, where packets or frames contain a priority indication. E-LSPs (originally designed for MPLS and named after the Experimental (EXP) bits in the Shim header) support multiple Ordered Aggregates (OAs); the priority bits indicate the PHB to be applied to the packet (OAs are DiffServ Ordered Aggregates: when traffic belongs to a single OA, it is assigned the same Per Hop Behavior Scheduling Class (PSC) and drop precedence),
– for cases where the priority is determined by the label (e.g. where there is no possibility of using priority bits, as in λ switching) L-LSPs are used. An L-LSP carries the traffic belonging to a single OA and supports a single PSC that is signaled during the LSP setup procedure (Path message); in this case the priority bits can be used for drop precedence indication.
In GMPLS the Shim header will in most cases not be available, and consequently it is impossible to convey traffic requirements using the EXP bits. That is why, for the mapping described later and for further implementations, we consider L-LSPs. The DiffServ object for the L-LSP is presented in Fig. 5.

 0                   1                   2                   3
+-------------------------------+---------------+---------------+
|          Length = 8           | Class-Num 65  | Class-Type 2  |
|                               |  (DiffServ)   |   (L-LSP)     |
+-------------------------------+---------------+---------------+
|           Reserved            |             PHBID             |
+-------------------------------+-------------------------------+

Fig. 5. DiffServ object for the L-LSP
3.2 Parameterized QoS in GMPLS
In the parameterized setup, two types of services are distinguished: Controlled Load [7] and Guaranteed Services [8]. Controlled Load should provision QoS so as to give a flow the forwarding characteristics that it would receive in an unloaded network. The Controlled Load traffic parameters are listed below:
– Token Bucket Rate (r)
– Token Bucket Size (b)
– Peak Data Rate (p)
– Minimum Policed Unit (m)
– Maximum Packet Size (M)
The Guaranteed Services provide a specific QoS with guarantees of no packet drops and bounded delay, and as such the list of parameters is extended with delay information:
– Token Bucket Rate (r)
– Token Bucket Size (b)
– Peak Data Rate (p)
– Minimum Policed Unit (m) - used for overhead calculation
– Maximum Packet Size (M)
– Rate (R) - the reserved service rate, with R ≥ r; increasing R above the token bucket rate (r) reduces queuing delays
In order to collect information about the capabilities and resources available on a path, the Path message carries the Adspec object, which is updated by the traversed nodes. Once the Path message reaches the destination, it reflects the end-to-end state of the network path. The Adspec object is composed of a default fragment, used by both Controlled Load and Guaranteed Services, and of service-specific fragments. The default Adspec contains the number of hops, a bandwidth estimate, the minimum path latency, and the composed MTU; if present, the Guaranteed Services fragment additionally contains rate-dependent (the C term) and rate-independent (the D term) error factors, both end-to-end and from the last traffic shaping point.1 The Flowspec object traverses the network in the reverse direction as part of the Resv message and contains the Receiver TSpec, which describes the traffic flow, and the RSpec, defining the desired service parameters required for the service to be invoked.
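A sketch of how such an Adspec could accumulate per-hop contributions (hop count, bandwidth estimate, minimum path latency, and the C and D error terms); the field names are illustrative, loosely following RFC 2210 semantics:

```python
def update_adspec(adspec, hop):
    """Each traversed node updates the Adspec carried in the Path message."""
    adspec["hops"] += 1
    adspec["bw_estimate"] = min(adspec["bw_estimate"], hop["bw"])
    adspec["min_path_latency"] += hop["latency"]
    adspec["ctot"] += hop["c"]   # rate-dependent error term (C)
    adspec["dtot"] += hop["d"]   # rate-independent error term (D)
    return adspec

adspec = {"hops": 0, "bw_estimate": float("inf"),
          "min_path_latency": 0, "ctot": 0, "dtot": 0}
for hop in [{"bw": 1e9, "latency": 200, "c": 1500, "d": 100},
            {"bw": 1e8, "latency": 500, "c": 1500, "d": 300}]:
    adspec = update_adspec(adspec, hop)
# At the egress, adspec reflects the end-to-end state of the path.
```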
4 Inter-domain Control and Management Plane QoS Interworking
The study of the QoS mechanisms and methodologies used in UPnP-QoS and GMPLS shows a good match between the UPnP-QoS TrafficDescriptor and the RSVP-TE parameters. The following subsections treat the mapping for the prioritized and parameterized QoS setups separately.
4.1 Inter-domain Mapping for Prioritized QoS
In the prioritized QoS setup case the mapping can be considered fairly straightforward. The only parameter that is used in the UPnP domain is the TIN, which should be mapped into the PHB in the RSVP-TE domain. For the simplest case, the eight TINs could be mapped into the eight different values of the EXP bits, defining a one-to-one mapping, though as described before this is possible only in a packet oriented network, e.g. MPLS, where each packet carries the EXP bits in the Shim header. For the more general case, where the TIN matching has to be done with the L-LSP, the LER connected to the home link has to be aware of the level of QoS support in a particular LSP in order to properly match the TIN with the PHB. This could be realized by having a number of pre-established LSPs matching the number of supported classes, with the information about the PHBID assigned to each LSP stored in the LER. For cases of dynamic L-LSP establishment, the LER needs to match the PHBID with the TIN ad hoc and set up the LSP with the proper PHB properties.
1 The error term C is the rate-dependent error term. It represents the delay a datagram in the flow might experience due to the rate parameters of the flow, e.g. serializing delay; the error term D is a rate-independent error term representing the worst-case non-rate-based transit time variation per element [8].
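The pre-established-LSP approach described above can be sketched as a simple lookup at the LER; the TIN-to-PHBID table below is purely illustrative (it also shows eight TINs merged into four PHBs), not a standardized mapping:

```python
# Hypothetical table of pre-established L-LSPs, keyed by PHBID,
# and an illustrative many-to-few TIN -> PHBID mapping.
PREESTABLISHED_LSPS = {phbid: f"lsp-{phbid}" for phbid in (0x00, 0x10, 0x28, 0x38)}
TIN_TO_PHBID = {0: 0x00, 1: 0x00, 2: 0x10, 3: 0x10,
                4: 0x28, 5: 0x28, 6: 0x38, 7: 0x38}

def select_lsp(tin):
    """LER-side selection of the L-LSP whose PHBID matches the flow's TIN."""
    phbid = TIN_TO_PHBID[tin]
    lsp = PREESTABLISHED_LSPS.get(phbid)
    if lsp is None:
        # No pre-established LSP: trigger dynamic L-LSP setup instead.
        raise LookupError(f"no L-LSP for PHBID {phbid:#x}")
    return lsp
```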
The situation becomes more complex when there is a mismatch between the number of classes in the UPnP home network and the GMPLS/DiffServ access network (which can be the case, for example, when the networks are set up at different times using different policies). For such a case there is a need for class merging or splitting. This could be addressed in a couple of ways:
– the traditional approach would be merging based on the traffic properties: merging all control and management traffic into one group, all real-time traffic classes into another group, and similarly with all assured forwarding and all best effort flows;
– the mismatch in the number of traffic priorities could also be addressed in another way. Within the scope of the project we are considering remote management of the Home Gateway (HG) using TR-069 [11]. In that case it would be possible to limit the number of TINs returned by the QPH for flows that will be directed to the access network, and in this way achieve a one-to-one mapping. Using TR-069 also addresses the issue, pointed out by [2], of end users being responsible for keeping their devices’ rule sets up to date.
4.2 Inter-domain Mapping for Parameterized QoS Setup
In order to perform the mapping for the parameterized QoS setup (and we assume here that both home and access networks support this QoS type), the most important task is to match all required RSVP SENDER_TSPEC parameters with the UPnP Traffic Descriptor. The part of the Traffic Descriptor that contains the information required for parameterized QoS setup and mapping is the v3TrafficSpecification described in Section 2. This UPnP-QoS traffic flow specification has to be mapped into the Controlled Load or Guaranteed Services parameters described in the previous section. Table 2 presents the proposed mapping between the UPnP-QoS parameters and the GMPLS/RSVP-TE parameters. An explanation of the unmapped parameters and a clarification of the chosen mappings is given below. The MinServiceRate parameter is defined as the minimal bit-rate that is acceptable as a resource reservation for the requesting application [12]; it is not mapped, as there is no equivalent parameter in the GMPLS domain. This is not an issue, as the reservation is performed to provision the proper QoS for the service in question and the DataRate parameter is sufficient for that purpose. There is no parameter defined in UPnP-QoS that could indicate the Minimum Policed Unit (m), which indicates the minimum size of the processed packets in order to estimate the worst-case overhead for bandwidth calculation [10]. Translation of this information is not mandatory, though its lack might cause miscalculation of the available bandwidth. Rate (R) is the reserved service rate; this is the rate parameter contained in the RSpec (Receiver's Specification) and reflects the actual rate that is reserved. This information should also be fed back to the CP to update the TrafficDescriptor.
Table 2. Mapping between UPnP-QoS parameters and GMPLS/RSVP-TE

UPnP-QoS parameter    GMPLS/RSVP-TE parameter
RequestedQosType      DiffServ/IntServ
DataRate              Token Bucket Rate (r)
TimeUnit              1000000
PeakDataRate          Peak Data Rate (p)
MaxBurstSize          Token Bucket Size (b)
MinServiceRate        -
ReservedServiceRate   Rspec (R) - FLOWSPEC
MaxPacketSize         Maximum Packet Size (M)
-                     Minimum Policed Unit (m)
E2EMaxDelayHigh       to be calculated - Ctot, Dtot
E2EMaxJitter          to be calculated - Min and Max Latency / Minimum Path Latency
E2EMaxDelayLow        Slack Term
ServiceType           0 (CL) or 1 (GS)
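A hedged sketch of the horizontal mapping of Table 2, converting a UPnP-QoS v3TrafficSpecification (here a plain dict) into IntServ-style TSpec parameters; the key names are ours, unmapped fields such as MinServiceRate are simply dropped, and the service-type choice is an assumption of this sketch:

```python
def map_tspec_to_rsvp(upnp):
    """Translate a UPnP-QoS Tspec dict into RSVP-TE-style parameters (Table 2)."""
    rsvp = {
        "token_bucket_rate_r": upnp["DataRate"],     # bytes per second (mandatory)
        "peak_data_rate_p": upnp.get("PeakDataRate"),
        "token_bucket_size_b": upnp.get("MaxBurstSize"),
        "max_packet_size_M": upnp.get("MaxPacketSize"),
        # ReservedServiceRate maps to the RSpec rate R carried in the FLOWSPEC
        "rspec_rate_R": upnp.get("ReservedServiceRate"),
        # Controlled Load vs Guaranteed Service, chosen here by whether a
        # delay bound was requested (an assumption of this sketch)
        "service": "GS" if upnp.get("E2EMaxDelayHigh") else "CL",
    }
    # Drop the optional fields that were not supplied
    return {k: v for k, v in rsvp.items() if v is not None}

print(map_tspec_to_rsvp({"DataRate": 625_000, "MaxBurstSize": 15_000}))
```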
Slack Term [8], expressed in microseconds, is used to indicate the difference between the requested and obtained delay, due to the fact that packets are transmitted at the Rate R from the RSpec instead of the token bucket rate r. The delay and jitter parameters could be used for path selection, but this is out of scope for our work at this stage; instead we focus on communicating the delay and jitter values between the access and home networks. The most critical delay related parameter is E2EMaxDelayHigh. As the LSR does not have any knowledge about the committed delay in the home network, it cannot be sure that the total LSP delay is low enough to meet the requirement of the requesting application. In order to save resources, we propose a LER behavior where the LSP is released, or an error is signaled, once the LSP delay is higher than the requested E2EMaxDelayHigh. Additionally, the interface between home and access network should include the possibility of reporting the MaxCommittedDelay parameter (in UPnP-QoS terminology) for the LSP. That will allow the QM to send the E2EMaxCommittedDelayHigh in the UpdatedTrafficDescriptor (the result of traffic admittance on network devices) to the CP. The UpdatedTrafficDescriptor received by the CP would then include the delay calculated up to the end of the LSP in the access network, which allows the CP to verify whether the obtained delay value is within acceptable bounds. The maximum delay for the LSP can be calculated from the token bucket parameters and the Ctot and Dtot values according to formula (1) [10]. The resulting parameter, as described earlier, should be mapped to MaxCommittedDelayHigh and reported to the QM.

maxE2Edelay = b/R + Ctot/R + Dtot    (1)

where b is the token bucket depth, R is the reserved rate, and Ctot and Dtot are the error terms described earlier.
[Fig. 6 shows the testbed: on the UPnP side, the service destination host (Service Destination IP) behind the HG/LER, together with the QDs, the QM and the QPH; a GMPLS Adaptor (DRAGON) bridging the two control planes over a virtual interface; and on the GMPLS side, LSRs and a LER exchanging RSVP-TE PATH/RESV messages towards the VoD server acting as the service source (Service Source IP).]
Fig. 6. The UPnP/GMPLS testbed architecture
For reporting MaxCommittedJitter (where MaxJitter is the upper bound on the end-to-end jitter, defined as the difference between the maximum and the minimum of the end-to-end delay [12]), we propose the maximum LSP jitter to be calculated based on the Minimum Path Latency (part of the default Adspec [10]), assuming that formula (2) holds.

MaxCommittedJitter = max(Jitter1, Jitter2, ...) ≤ b/R + Ctot/R + Dtot − MinimumPathLatency    (2)

where Jittern is a jitter value based on a number of consecutive packet delay measurements. This value, similarly to the delay values, should be reported to the QM, which composes the E2EMaxCommittedJitter value to be sent to the CP in the UpdatedTrafficDescriptor.
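Formulas (1) and (2) can be computed directly from the token bucket depth b, the reserved rate R, the accumulated error terms Ctot and Dtot, and the Minimum Path Latency; a sketch of the LER-side calculation (consistent units are assumed across all arguments):

```python
def max_e2e_delay(b, R, ctot, dtot):
    """Formula (1): maxE2Edelay = b/R + Ctot/R + Dtot."""
    return b / R + ctot / R + dtot

def max_committed_jitter(b, R, ctot, dtot, min_path_latency):
    """Formula (2): the delay bound of formula (1) minus the Minimum Path Latency."""
    return max_e2e_delay(b, R, ctot, dtot) - min_path_latency
```

The resulting values would be mapped to MaxCommittedDelayHigh and MaxCommittedJitter respectively and reported to the QM as described above.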
4.3 Implementation
We have implemented an OSGi-based interface that acts as a proxy between the home and access networks. Upon receiving a traffic description, the interface converts the UPnP Traffic Descriptor into the parameters expected by the access network testbed. The access network used is based on a number of virtual machines running a modified version of the GMPLS suite DRAGON [13]. The interface has been tested and shown to successfully connect the different domains. The architecture of the testbed is depicted in Fig. 6.
4.4 Network Security Consideration
When deploying a system that allows an end user to interact with the access network control plane, security is a large concern. We propose to integrate the QoS setup with existing AAA solutions, where users are authenticated and granted access to certain services. The amount of accessible resources could be controlled by the user’s account type; one could imagine that, for example, premium subscribers have access to more resources and/or have priority in case of QoS preemption, etc.
5 Conclusions and Future Work
This paper presents a proposal for the integration of the UPnP-QoS architecture in the home network with a GMPLS based access network. The parameters required for inter-domain QoS provisioning are outlined, and a mapping between the different domains is presented, while making sure that all relevant information is translated. Additionally, signaling between the domains is presented which allows reporting delay and jitter parameters in order to achieve an end-to-end view during the traffic setup procedure. The work presented here is a first step towards a test setup: after the development of an interface capable of signaling the parameters indicated here, the integration of the UPnP-QoS architecture with the GMPLS test-bed will be presented. Acknowledgments. The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7) under project 212 352 ALPHA "Architectures for fLexible Photonic Home and Access networks".
References 1. UPnP Forum. UPnP QoS Architecture:3 Service Template Version 1.01 For UPnP Version 1.0 (November 2008) 2. But, J., Armitage, G., Stewart, L.: Outsourcing automated QoS control of home routers for a better online game experience. Communications Magazine, IEEE 46(12), 64–70 (2008)
3. Good, R., Ventura, N.: End to end session based bearer control for IP multimedia subsystems. In: IFIP/IEEE International Symposium on Integrated Network Management, IM 2009, pp. 497–504 (June 2009) 4. Welzl, M., Muhlhauser, M.: Scalability and quality of service: a trade-off? Communications Magazine, IEEE 41(6), 32–36 (2003) 5. Common control and measurement plane, http://tools.ietf.org/wg/ccamp/ 6. Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, P., Krishnan, R., Cheval, P., Heinanen, J.: Multi-Protocol Label Switching (MPLS) Support of Differentiated Services. RFC 3270 (Proposed Standard), Updated by RFC 5462 (May 2002) 7. Wroclawski, J.: Specification of the Controlled-Load Network Element Service. RFC 2211 (Proposed Standard) (September 1997) 8. Shenker, S., Partridge, C., Guerin, R.: Specification of Guaranteed Quality of Service. RFC 2212 (Proposed Standard) (September 1997) 9. Braden, R., Zhang, L., Berson, S., Herzog, S., Jamin, S.: Resource ReSerVation Protocol (RSVP) – Version 1 Functional Specification. RFC 2205 (Proposed Standard), Updated by RFCs 2750, 3936, 4495 (September 1997) 10. Wroclawski, J.: The Use of RSVP with IETF Integrated Services. RFC 2210 (Proposed Standard) (September 1997) 11. The Broadband Forum. TR-069 CPE WAN Management Protocol v1.1 (December 2007) 12. UPnP Forum. UPnP QosManager:3 Service Template Version 1.01 For UPnP Version 1.0 (November 2008) 13. Jabbari, B., Lehman, T., Sobieski, J.: DRAGON: A framework for service provisioning in heterogeneous grid networks. Communications Magazine, IEEE 44(3) (March 2006)
Methodology towards Integrating Scenarios and Testbeds for Demonstrating Autonomic/Self-managing Networks and Behaviors Required in Future Networks Vassilios Kaldanis1, Peter Benko2, Domonkos Asztalos2, Csaba Simon3, Ranganai Chaparadza4, and Giannis Katsaros1 1
VELTI S.A. -Mobile Marketing & Advertising, 44 Kifissias Ave, GR 15125 Maroussi, Greece {vkaldanis,gkatsaros}@velti.com 2 Ericsson Hungary, Laborc u. 1, H-1037, Hungary {peter.benko,domonkos.asztalos}@ericsson.com 3 Budapest University of Technology and Economics, Dept. of Telecom and Media Informatics, Magyar T. krt. 2, 1117 Budapest, Hungary
[email protected] 4 Fraunhofer FOKUS Institute for Open Communication Systems, Kaiserin-Augusta-Allee 31, Berlin, Germany
[email protected]
Abstract. In this paper we report insights from our experiences in devising a methodology for validating Scenarios demonstrating autonomic/self-managing network behaviors required in Future Networks, powered by IPv6 and its evolution along the path to the Self-Managing Future Internet. Autonomic networking introduces “autonomic manager components” at various levels of abstraction of functionality within device architectures and the overall network architecture, which are capable of performing autonomic management and control of their associated Managed Entities (MEs), e.g. protocols and mechanisms, as well as co-operating with each other in driving the self-managing features of the network(s). MEs are started, configured, constantly monitored and dynamically regulated by the autonomic managers towards optimal and reliable network services. There are some challenges involved when designing and applying a framework for integrating and validating Scenarios demonstrating autonomic behaviors; we share these challenges in this paper and show how we have addressed them. We present the EU funded FP7 EFIPSANS Integration and Validation Framework that we designed for demonstrating a substantial selection of essential autonomic behaviors of “autonomic managers” whose implementations are based on the principles of the GANA architectural Reference Model for Autonomic Networking and Self-Management, and on the IPv6 protocols and associated extensions proposed and developed in the frame of the EC funded FP7 EFIPSANS Project. Keywords: Autonomic behaviors of Decision-Making-Elements (DMEs/DEs), Validation of the GANA Model for Autonomic Networking and SelfManagement, Testbeds Integration, Validation methodology, framework, IPv6 networks, Self-Management, Managed Entities (MEs). R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 240–252, 2011. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
Methodology towards Integrating Scenarios and Testbeds
1 Introduction

The main benefits of self-management technology in systems and networks, from the operator's perspective, are to minimize operator involvement and OPEX in the deployment, provisioning and maintenance of the network, and to increase network reliability through self-adaptation and on-the-fly reconfiguration in response to challenges such as faults, errors, failures, attacks and threats [1], [2], [3], [4], [5], [6], [8], [10]. There are some challenges involved in designing and applying a framework for integrating and validating Scenarios that demonstrate the autonomic/self-managing network behaviors required in Future Networks. A Scenario must consist of a clear description of the problems/limitations of the "current network management practices" and/or the "current technology that comes with the devices/systems of today". The problems/limitations are with respect to either of the following needs:

a) Reducing human involvement in the management aspects considered, while at the same time reducing the probability of introducing faults into any item supplied as input to the devices/network for its operation, e.g. policy specifications, configuration data, etc.; OR
b) the introduction of advanced algorithms, components and mechanisms that enable the network entities to perform Self-* operations such as auto-discovery, self-configuration, self-healing, self-protection, self-diagnosis and self-optimization, towards guaranteeing reliable services, including on-demand services.
Special components called "autonomic manager components" (referred to as DEs in the GANA Model [2], [3]), introduced into node/device architectures and the overall network architecture, are meant to address the two issues ("a" and "b"). In a Scenario, one must be able to talk and reason about the "current practices" and/or the "current technology that comes with the devices/systems of today", so that the Scenario reflects what is being solved by the Self-* technologies introduced through the prototyped components, mechanisms and algorithms. The integration challenges include the following:

• Interconnecting multiple testbed environments and the diverse types of testbeds required by each Scenario;
• Validating the functionality, algorithms, autonomic behaviors and architectures on which each Scenario is realized, as proposed by the research and prototyping team;
• Visualizing the behaviors of the "autonomic manager components" involved in a Scenario;
• Visualizing the autonomic architectural Reference Model framework in action, i.e. the framework applied to derive the implementation of the components of the Scenario architecture being demonstrated.
V. Kaldanis et al.
2 Autonomic Networking and Self-Management Fundamentals

The concept of autonomicity, realized through control-loop structures embedded within node/device architectures and the overall network architecture, is an enabler for advanced self-manageability of network devices and the network as a whole. The emerging GANA architectural Reference Model for Autonomic Networking and Self-Management ([1], [2], [3]) introduces "autonomic manager components" at four levels of abstraction of functionality within device architectures and the overall network architecture. These components are capable of performing autonomic management and control of their associated Managed Entities (MEs), e.g. protocols, as well as co-operating with each other in driving the self-managing features of the network(s). MEs are started, configured, constantly monitored and dynamically regulated by the autonomic managers towards optimal and reliable network services. The GANA Model defines a framework of hierarchical "autonomic managers", referred to as Decision Elements (DEs) in GANA, at four levels of abstraction of functionality ([1], [2], [3]). The fundamental principles of the setup and operation of an autonomic network can be described as three cascaded phases of automated behaviors of nodes/devices being connected together to form an autonomic network:

• [Phase-1]: Boot-up and Bootstrap Phase for each initializing node/device;
• [Phase-2]: Auto-Configuration Phase for each node/device and the network as a whole;
• [Phase-3]: Operation and Self-Adaptation Phase for each node/device and the network as a whole, i.e. adaptation to challenges such as faults, errors, failures and adverse conditions, and to policy changes by the human operator.
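As an illustration only, the three cascaded phases can be sketched as a minimal node lifecycle. The class, phase and event names below are our own assumptions for the sketch, not part of the GANA specification:

```python
from enum import Enum, auto

class Phase(Enum):
    BOOTSTRAP = auto()    # Phase-1: boot-up and bootstrap
    AUTO_CONFIG = auto()  # Phase-2: auto-configuration
    OPERATION = auto()    # Phase-3: operation and self-adaptation

class AutonomicNode:
    """Hypothetical node that advances through the three cascaded phases."""

    def __init__(self, name):
        self.name = name
        self.phase = Phase.BOOTSTRAP
        self.discovered = []
        self.config = {}

    def bootstrap(self):
        # Phase-1: the minimal auto-discovery needed to join the network
        self.discovered = ["neighbor-A", "neighbor-B"]  # placeholder results
        self.phase = Phase.AUTO_CONFIG

    def auto_configure(self):
        # Phase-2: derive a configuration from what was discovered
        self.config = {n: "link-up" for n in self.discovered}
        self.phase = Phase.OPERATION

    def handle_event(self, event):
        # Phase-3: self-adaptation to faults, failures and policy changes
        if event == "link-failure":
            self.config.pop("neighbor-A", None)  # illustrative reaction
        return self.phase

node = AutonomicNode("r1")
node.bootstrap()
node.auto_configure()
node.handle_event("link-failure")
```

The point of the sketch is only the cascading: each phase consumes the output of the previous one, and the node never leaves Phase-3 once operational.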
The following automated behaviors of nodes/devices and the network (realized as autonomic behaviors orchestrated or triggered by autonomic managers, i.e. GANA DEs) map to the three phases described above:

(1) Auto-Discovery (Network-Layer services discovery, Service/Application-Layer services discovery): the associated behaviors apply to Phase-1; some Auto-Discovery behaviors serving more advanced service-provisioning requirements, beyond the minimum required at boot-up/bootstrap time, may still be attempted during the operation and self-adaptation time of a node/device or network.
(2) Auto-Configuration/Self-Configuration (in the Service Layer and Network Layer): the associated behaviors apply to Phase-2.
(3) Self-Diagnosing and Self-Healing, Self-Optimization and other Self-* functions: the associated behaviors apply to Phase-3.

Such automated behaviors must be orchestrated and regulated by specific context-aware Decision Elements (DEs) designed to detect context, start, configure, and constantly monitor and dynamically regulate the behavior of their specifically assigned (by design) Managed Entities (MEs), i.e. managed resources such as protocols, protocol stacks and mechanisms. More details on these phases and behaviors can be found in [6], [8], [9], [10], [11], [12], [13]. Two things determine autonomicity for a functionality: (1) the auto-discovery of the items required by the functionality to perform an auto-configuration/self-configuration process; (2) the predictions/forecasting of, and listening for, events, and the reactions of the Decision Element (DE) that controls and adapts the behavior of the functionality towards some goal based on those events.
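The two ingredients above suggest the general shape of a DE's control loop: discovery feeding (self-)configuration, followed by event-driven regulation of the ME. The following sketch is purely illustrative; the class names, methods and the dummy ME are our own assumptions, not GANA-defined interfaces:

```python
class DecisionElement:
    """Illustrative DE controlling a single Managed Entity (ME)."""

    def __init__(self, me):
        self.me = me

    def run_once(self, events):
        # (1) auto-discover the inputs needed for (self-)configuration
        params = self.me.discover()
        self.me.configure(params)
        # (2) listen for events and adapt the ME's behaviour towards a goal
        for ev in events:
            self.me.adjust(ev)
        return self.me.state

class DummyME:
    """Stand-in for a real protocol or mechanism under DE control."""
    def __init__(self):
        self.state = "init"
    def discover(self):
        return {"mtu": 1500}          # placeholder discovered parameter
    def configure(self, params):
        self.state = "configured"
    def adjust(self, event):
        if event == "congestion":
            self.state = "throttled"  # illustrative regulation action

de = DecisionElement(DummyME())
state = de.run_once(["congestion"])
```

In a real GANA instantiation the "ME" would be a protocol such as OSPFv3 accessed through its management interface, and the loop would run continuously rather than once.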
3 Scenarios and Demonstration Testbeds

The following are the key functionalities for which autonomic elements (DEs) were specified and designed for the selected diverse networking environments (i.e., instantiation cases for the GANA Model), and which are demonstrated in the testbeds:

• Routing and Autonomicity. Special DEs that implement control-loops over the "management interfaces" of routing protocols and mechanisms as their associated Managed Entities (MEs). The DEs apply configuration profiles (which include policies) on this type of MEs, and then react to incidents, state changes and context changes by communicating with other DEs to enforce changes on the behavior of the various types of MEs of the devices, to ensure optimal conditions of network operation. Parameters of the MEs are dynamically adjusted, e.g. timers and link weights in OSPF are dynamically adjusted by the Routing-Management-DEs (see [8], [10] for more details). For other general aspects related to Control Plane and Autonomicity, special DEs apart from the Routing-Management-DEs have been introduced to address these other aspects of the control plane (signalling plane).
• Data Plane & Forwarding and Autonomicity. Special DEs that implement control-loops over the "management interfaces" of the Data Plane and forwarding protocols and mechanisms as their associated Managed Entities (MEs). The DEs apply configuration profiles (which include policies) on this type of MEs, and then react to incidents, state changes and context changes by communicating with other DEs to enforce changes on the behavior of the various types of MEs of the devices, to ensure optimal conditions of network operation. Parameters of the MEs, e.g. IPv6 forwarding-engine parameters, MPLS-related Management Objects, and other types of Layers-1/2/2.5/3 related parameters, are dynamically adjusted by the Data-Plane-and-Forwarding-Management-DEs (see [10] for more details).
• Auto-Discovery, Auto-Configuration/Self-Configuration, Self-Provisioning and dynamic Re-Configuration. Special DEs that implement control-loops over the "management interfaces" of the protocols and mechanisms of a node/device that are fundamental to enabling the device to advertise and update its capabilities to the network, and to discover network resources at boot-up time and during the device's operation. The DEs apply configuration profiles (which include policies) on this type of MEs, and then react to incidents, state changes and context changes by communicating with other DEs to enforce changes on the behavior of the various types of MEs of the devices, to ensure optimal conditions of network operation (see [6] for more details).
• Mobility Management and Autonomicity. Special DEs that implement control-loops over the "management interfaces" of mobility protocols and mechanisms as their associated Managed Entities (MEs), e.g. MIPv6 and PMIPv6. The DEs apply configuration profiles (which include policies) on this type of MEs, and then react to incidents, state changes and context changes by communicating with other DEs to enforce changes on the behavior of the various types of MEs of the devices, to ensure optimal conditions of network operation. Parameters of the MEs are dynamically adjusted by the Mobility-Management-DEs (see [11] for more details).
• QoS Management and Autonomicity. Special DEs that implement control-loops over the "management interfaces" of QoS protocols and mechanisms as their associated Managed Entities (MEs). The DEs apply configuration profiles (which include policies) on this type of MEs, and then react to incidents, state changes and context changes by communicating with other DEs to enforce changes on the behavior of the various types of MEs of the devices, to ensure optimal conditions of network operation. Parameters of the MEs are dynamically adjusted by the QoS-Management-DEs (see [11], [13] for more details).
• Resilience, Survivability and Autonomicity. Special DEs that implement control-loops over the "management interfaces" of resilience and survivability protocols and mechanisms as their associated Managed Entities (MEs). The DEs apply configuration profiles (which include policies) on this type of MEs, and then react to incidents, state changes and context changes by communicating with other DEs to enforce changes on the behavior of the various types of MEs of the devices, to ensure optimal conditions of network operation. Parameters of the MEs are dynamically adjusted by the Resilience-and-Survivability-DEs (see [12] for more details).
• Autonomic Fault-Management. Special DEs that implement control-loops over the sub-interfaces of the "management interfaces" of components and modules of devices, enabling fault-management operations to be performed by Fault-Management-DEs on the components/modules (as MEs). Fault-Management-DEs also manage and control special MEs that handle challenges such as the detection of faults, errors and failures. The DEs apply configuration profiles (which include policies) on this type of MEs, and then react to incidents, state changes and context changes by communicating with other DEs to enforce changes on the behaviour of the various types of MEs of the devices, to ensure optimal conditions of network operation. Parameters of the MEs are dynamically adjusted by the Fault-Management-DEs (see [12] for more details).
• The role of Monitoring in enabling Autonomicity, and Self-Monitoring/Autonomic Monitoring as an autonomic feature. Special DEs that implement control-loops over the "management interfaces" of monitoring protocols and mechanisms as their associated Managed Entities (MEs). The DEs apply configuration profiles (which include policies) on this type of MEs, and then react to incidents, state changes and context changes by communicating with other DEs to enforce changes on the behaviour of the various types of MEs of the devices, to ensure optimal conditions of network operation. The MEs are orchestrated, and their parameters dynamically adjusted, by the Monitoring-DEs (see [13] for more details).
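As a concrete illustration of the routing case, where timers and link weights in OSPF are dynamically adjusted by the Routing-Management-DEs, such a DE policy might look roughly like this. This is a toy sketch under our own assumptions (threshold-based weight penalties); the actual DE logic in [8], [10] is far more elaborate:

```python
def recompute_ospf_weights(link_util, base_weight=10, threshold=0.8, penalty=50):
    """Raise the OSPF weight of links whose monitored utilization crosses a
    threshold, steering traffic away from hot links (illustrative policy only).

    link_util maps a link name to its utilization in [0, 1]; the returned dict
    maps each link to the weight the DE would push to the OSPF ME.
    """
    weights = {}
    for link, util in link_util.items():
        weights[link] = base_weight + (penalty if util > threshold else 0)
    return weights

# A hot link (95% utilized) gets penalized; a lightly used one keeps the base weight.
weights = recompute_ospf_weights({"r1-r2": 0.95, "r1-r3": 0.30})
```

In a running network, the DE would feed `link_util` from its Monitoring MEs and apply the resulting weights through the OSPF management interface, closing the control loop.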
4 Integration and Validation Framework

In this section the overall approach to Integration and Validation is described. We consider that Templates should be used to describe Scenarios in such a way as to show the paradigm shift brought by Autonomics and Self-Management, as well as the benefits brought by the various technologies conveyed by a Scenario to the key players: manufacturers, operator/network management personnel, content providers, etc. Questionnaires, answered by all project partners/developers, are also used to capture information that helps those building the testbeds. From our experience, some decisions must be made when selecting open-source tools, such as the selection of common libraries to ease the integration of Scenarios whose components emerge from multiple partner testbeds.

4.1 Integration Methodology

There are twelve (12) template-based Scenarios defined by EFIPSANS that span heterogeneous networking environments, functionality and use-cases. In order to give a clear, uniform picture of the overall benefits of autonomic networking, the integration of some of these Scenarios in a use-case trial is required. The objectives of integration are the following: harmonize the autonomic functions to be demonstrated with regard to inter-operability and networking environment; create a common testbed that can be used for experimentation; and describe a high-level story-line for the Scenarios.
[Figure omitted: the scenarios (e.g. Autonomic QoS Management, Traffic Monitoring and QoS Management, Auto-Configuration of Routers using Routing Profiles, Risk-Aware Routing, Autonomic Fault-Management for Selected Types of Black Holes, Autonomic Multipath Routing, Auto-Configuration for Mobile Ad Hoc Networks, Auto-Configuration of Radio Channels) mapped onto partner testbeds and the fixed, wireless (802.11) and cellular network environments, across three phases.]

Fig. 1. EFIPSANS Scenarios and Testbeds versatility
246
V. Kaldanis et al.
In order to fulfill the above objectives, we first selected the scenarios that can be used in a given network environment. This is necessary since most autonomic behaviors in EFIPSANS are specific to a certain network environment, such as fixed, wireless or cellular. The grouping of scenarios ensures that each scenario is demonstrated in the appropriate environment. Fig. 1 shows how the different scenarios were mapped to the different networking environments: the yellow boxes represent the individual scenarios, while the magenta boxes at the bottom of the figure indicate the networking environments. The code names shown (BUPT, ETH-BME, TARC, WUT, GRNET, etc.) correspond to project partner abbreviations and can be ignored. One of the main objectives here was to create a proof-of-concept testbed that can be used to demonstrate the autonomic functions researched and developed by the project. Since the consortium members are spread practically all over Europe, it would have required considerable effort to create an integrated testbed installed at a single geographical location. However, the public Internet infrastructure enables a more or less straightforward interconnection of fixed networks. This motivated our decision to create a common integrated demonstrator core testbed composed of interconnected network segments. This core testbed hosts a number of important scenarios, as seen in Fig. 1, that cover all the areas of GANA-defined autonomicity and focus on wired networks. The interconnection of the networks is based on a layer 2 tunneling solution, which enables passing both link-layer and IPv6 packets (see Fig. 2). The configured tunnels transfer layer 2 packets over IPv4 packets.
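The paper does not name the tunneling implementation used. Conceptually, each tunnel endpoint wraps a whole layer-2 frame into a payload carried over IPv4, along these lines; the 4-byte header and its magic number are invented here purely for illustration:

```python
import struct

MAGIC = 0xEF1B  # arbitrary marker for this toy tunnel header (our invention)

def encapsulate(l2_frame: bytes) -> bytes:
    # Prefix the raw Ethernet frame with a tiny tunnel header; in the real
    # testbed the result would travel as the payload of an IPv4 packet.
    return struct.pack("!HH", MAGIC, len(l2_frame)) + l2_frame

def decapsulate(payload: bytes) -> bytes:
    magic, length = struct.unpack("!HH", payload[:4])
    assert magic == MAGIC, "not a tunnel payload"
    return payload[4 : 4 + length]

# A fake Ethernet frame (broadcast dst, EtherType 0x86DD = IPv6) survives the round trip.
frame = bytes.fromhex("ffffffffffff00112233445586dd") + b"ipv6-payload"
assert decapsulate(encapsulate(frame)) == frame
```

Because the entire layer-2 frame is preserved, both IPv6 packets and raw link-layer traffic cross the tunnel transparently, which is exactly the property the testbed interconnection relies on.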
[Figure omitted: clients, routers and content servers in the FOKUS, BME and Ericsson partner testbeds, interconnected by remote tunnels and overseen by Network-Level DEs.]

Fig. 2. Integrated Testbed overview
The tunneling choice was motivated by the fact that the connectivity provided by the current Internet is still predominantly based on IPv4. We chose to tunnel layer 2 packets so that, in addition to IPv6 packets, link-layer packets can also be exchanged between the tunnel endpoints. This provides totally transparent connectivity on layer 2 and layer 3, which is necessary to demonstrate some of the EFIPSANS scenarios.

4.2 Validation Methodology

The EFIPSANS validation methodology in principle aims to validate a number of features of an autonomic network that are fundamental to autonomic network engineering, categorized according to the project objectives per network type (fixed or mobile), functionality (layer-specific), topology (e.g. mobile ad hoc) and other criteria (e.g. security). These features have been categorized in general under the following five key concepts:
• Auto-Discovery
• Auto/Self-Configuration
• Autonomic Routing & Self-Adaptation
• Autonomic Mobility & QoS Management
• Autonomic Network Monitoring & Fault-Management
Validation of the aforementioned key concepts is required to assess their impact on the project technology framework and to evaluate to what extent the defined R&D challenges and objectives derived from those key concepts were successfully addressed and implemented within the project lifespan. In order to complete such an assessment successfully, a common, step-based evaluation process must be specified to guarantee a smooth, effective and unified evaluation. The EFIPSANS validation methodology incorporates a number of purpose-driven activities with specific expected outcomes, such as:

1) Analysis of project-specific documentation deemed suitable and essential for analyzing the key autonomic network functionalities, in terms of:

• Verification that the R&D features and challenges identified at the project's start phase have been implemented in the underlying framework;
• Identification and analysis of specific project deliverables which provide evaluations and recommendations on the adoption of the key concepts in the individual work packages;
• Identification and analysis of project publications related to the key concepts, to gain feedback and recommendations;
• Analysis of the outcomes of project-related events (e.g. workshops), to get external insight into how the key concepts were received and anticipated by the general public.
2) Completion and distribution of specific questionnaires

Key-concept questionnaires are distributed to various (business and technology) groups in order to obtain feedback and recommendations on issues like:

• IPv6 vs. IPv4
• Autonomic systems awareness and usability
• Anticipation of autonomic behaviors and advanced functionality
• How the key autonomic concepts improve the end-user experience
• Benefits for industry players (ISPs, Operators, SPs) from deploying the relevant autonomic features and functionality in existing infrastructures
• Autonomic networking: applicability, deployment and acceptance
• Impact on higher-layer services and application deployment
3) Validation via Simulation/Emulation

Validation of the key concepts via simulation/emulation is used as part of the overall evaluation, mainly to assess important issues around performance, stability and scalability that cannot easily be estimated in the project testbeds. The specialized satellite testbeds (satellite to the integrated one) deployed simulation/emulation approaches in order to prove and experiment with autonomic features around Mobility and QoS/QoE management in mobile/wireless environments, in scenarios where the mechanisms and functionality of a real operator's core/access network were represented successfully.

4) Analysis of the Qualitative/Quantitative Testing, Results and Reporting

Qualitative/Quantitative (Q&Q) tests and results tightly coupled to the project scenarios that directly incorporate autonomic functionality related to the key concepts have been selected for analysis and evaluation. This works towards identifying:

• Which innovations and advancements have been validated in each scenario;
• Recommendations and lessons learnt from the integration of DEs/MEs with specific functionality implementing each key concept;
• The impact of each scenario and the relevant DEs/MEs on the respective business and technology area in present and future systems;
• Any problems encountered during pilot or productive system operation.

5) Analysis of the key concepts with respect to the project's Business Model

The fact that the existing project scenarios already analyze the business aspects around the corresponding key concept(s) they deal with (as part of the overall business framework) itself creates the necessity to evaluate in practice the real impact of each autonomic concept in industry today. This will reveal the real value added by the project in the converged IP-based systems of today and tomorrow, especially in the area of autonomic service management.
6) Analysis of the key concepts with respect to standards

The fact that the project is tightly coupled with the IPv6 technology framework, and is expected to highly influence IPv6 deployment in current and future systems enhanced with autonomic functionality, itself creates the inherent need for a thorough analysis of the surrounding standardization issues. Addressing the important standardization issues around the required IPv6 extensions (IPv6++) [4], [10] will allow the aforementioned key concepts to have a compatible and aligned impact in this area, and in particular to answer:
• Have the results or concepts coming from the project been taken up outside the project consortium?
• Which specific IETF drafts or standards have the key concepts contributed towards?
• How has the adopted strategy achieved its purposes?
• Has the required framework been set up for other projects to build on top of and further expand the IPv6++ context [4], [10]?

The validation framework described above aims to complement the project's demonstration environment around the key concepts in a number of scenarios implemented in specific testbeds (the integrated one and the satellite ones). Fig. 3 illustrates how the developed components that realize the project key concepts are mapped, per work package, onto the different physical nodes in the testbeds as software installed on those nodes.
[Figure omitted: a mobile terminal, a WLAN AP & access router, the access & core network, and an application server farm, each hosting autonomic modules such as Context Awareness, Personalization, QoS Management, Mobility Management, Network Monitoring & Management, AAA & Security, Resource Discovery, Routing and Addressing.]
Fig. 3. EFIPSANS Validation Framework
Each grey box represents a physical node (e.g. mobile terminal, router, etc.), and the colored boxes within each grey box represent autonomic components or modules that implement purpose-specific autonomic functionality (in the form of DEs/DMEs) on top of existing functionality (e.g. QoS management).

4.3 IPv6 Integration and Validation

This section reflects on the aspects related to IPv6 and the EFIPSANS-proposed extensions to IPv6 (IPv6++) required for designing and building IPv6-based Autonomic Networks and Services (we refer to the upcoming EFIPSANS deliverable D2.6 [1] for IPv6++). It also summarizes the key features of IPv6 that make the integration and validation of large-scale autonomic networks in testbeds easy to achieve. IPv6 features such as auto-discovery (e.g. neighbor and parameter discovery), auto-configuration, advanced addressing schemes and route aggregation, and support for a large address space can be considered enablers for designing large-scale testbeds. This is because scalability and certain automated discovery and auto-configuration features are prerequisites for the more advanced autonomic/self-managing network behaviors that leverage the basic auto-discovery and auto-configuration of nodes' interfaces. But how does EFIPSANS achieve autonomic management and control of IPv6 protocols through its mechanisms? Autonomic management and control of IPv6 protocols and mechanisms, as so-called Managed Entities (MEs) at GANA's lowest level/layer, is based on the assignment of specific IPv6 protocols and mechanisms to specific Decision Elements (DEs) that autonomously manage and regulate/control the behavior of the different MEs. Autonomic routing involves the development of Routing-Management-DEs that are meant to be context-aware and to start, configure, monitor and dynamically regulate the behavior of the IPv6 protocols and mechanisms of specific devices (as the associated MEs), such as OSPFv3 (the main routing protocol of focus in EFIPSANS). More information on how GANA has been instantiated for realizing autonomic routing functionality in wired networks can be found in the EFIPSANS deliverables, e.g. D1.7 in [1], and particularly in those from work package 1. Regarding the instantiation of the GANA Mobility-Management DE(s) in an IPv6 network, the associated Managed Entities (MEs) of the Mobility-Management DE(s) are MIPv6 and PMIPv6.
For Autonomic QoS Management via the QoS-Management DE(s), the associated MEs are mechanisms such as the IPv6-based DiffServ and IntServ protocols and mechanisms (see [11], [13]). The Managed Entities (MEs) associated with the GANA DEs for Auto-Discovery and Auto-Configuration/Self-Configuration, i.e. the NODE_MAIN_DE, are protocols and mechanisms such as Neighbor Discovery (ND), DHCPv6, NETCONF and IPv6 Stateless Address Autoconfiguration. For autonomicity of the Data Plane and Forwarding functionality, parameters of the Data Plane protocols and mechanisms as MEs, e.g. IPv6 forwarding-engine parameters and Layers-1/2/2.5/3 related parameters, are dynamically adjusted by the Data-Plane-and-Forwarding-Management DEs. Also being validated are the IPv6 protocol extensions proposed by EFIPSANS [1], which include ICMPv6++ [9], ND++ (extensions to the ND protocol), DHCPv6++ and PMIPv6++.
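The DE-to-ME assignments described in this section can be summarized as a simple mapping; the dictionary below merely restates the pairings given in the text, with abbreviated DE names:

```python
# DE -> associated IPv6 Managed Entities, as listed in the text above.
DE_TO_MES = {
    "Routing-Management-DE": ["OSPFv3"],
    "Mobility-Management-DE": ["MIPv6", "PMIPv6"],
    "QoS-Management-DE": ["DiffServ", "IntServ"],
    "NODE_MAIN_DE": ["Neighbor Discovery (ND)", "DHCPv6", "NETCONF",
                     "IPv6 Stateless Address Autoconfiguration"],
}

def mes_for(de_name):
    """Look up the Managed Entities assigned (by design) to a given DE."""
    return DE_TO_MES.get(de_name, [])
```

Such a static table reflects GANA's "assignment by design": each DE knows at design time which protocols it is responsible for managing and regulating.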
5 Concluding Remarks We presented our Methodology Towards Integrating Scenarios and Testbeds for demonstrating autonomic/self-managing networks and behaviors required in Future Networks.
The evolvable Architectural Reference Model for Autonomic Networking and Self-Management called GANA enables the design of interworking hierarchical decision-making processes at different levels of abstraction, which react to changes in the state of the network and its environment (refer to [7] for information on the evolution of the model). GANA has been successfully "instantiated" by EFIPSANS for the autonomic management and control of different types of Managed Entities (protocols and mechanisms at GANA's lowest level/layer) for diverse network environments (fixed/mobile/wireless networks). Examples include: Autonomic Routing, Auto-Discovery, Auto-Configuration/Self-Configuration, Autonomic Mobility Management, Autonomic QoS Management, Autonomic Resilience and Survivability, Autonomic Fault-Management, Autonomic Monitoring, and Autonomic Security Management. We have designed and implemented an integrated testbed that realizes the core features of an autonomic network based on GANA. The testbed serves as proof of concept for the applicability of the GANA model in a heterogeneous networking environment. Since our work on validating the GANA concepts in the testbed continues, we expect to draw more lessons from running field trials and to report on how GANA-based, advanced self-managing IPv6 networks can be built. We seek to show how to build diverse types of autonomic IPv6-enabled networks based on the autonomic management and control of IPv6 and lower-layer protocols, and on the use of the EFIPSANS-proposed extensions to IPv6 (IPv6++). We also aim to look deeper into autonomic network services built on top of such networks. This will demonstrate how the Future Internet will emerge along an evolution path that focuses on IPv6 and its extensibility towards the Self-Managing Future Internet.
Acknowledgement This work has been partially supported by EC FP7 EFIPSANS project (INFSO ICT215549).
References

1. EC-funded FP7 EFIPSANS Project, http://efipsans.org/
2. Chaparadza, R., Papavassiliou, S., Kastrinogiannis, T., Vigoureux, M., Dotaro, E., Davy, A., Quinn, K., Wodczak, M., Toth, A.: Creating a viable Evolution Path towards Self-Managing Future Internet via a Standardizable Reference Model for Autonomic Network Engineering. In: Towards the Future Internet - A European Research Perspective, pp. 136–147. IOS Press, Amsterdam (2009)
3. Chaparadza, R.: Requirements for a Generic Autonomic Network Architecture (GANA), suitable for Standardizable Autonomic Behaviour Specifications of Decision-Making-Elements (DMEs) for Diverse Networking Environments. International Engineering Consortium (IEC), Annual Review of Communications 61 (December 2008)
4. Chaparadza, R.: Evolution of the current IPv6 towards IPv6++ (IPv6 with Autonomic Flavours). International Engineering Consortium (IEC), Review of Communications 60 (December 2007)
5. Greenberg, A., et al.: A Clean Slate 4D Approach to Network Control and Management. ACM SIGCOMM Computer Communication Review 35(5), 41–54 (2005)
6. Prakash, A., Starschenko, A., Chaparadza, R.: Auto-Discovery and Auto-Configuration of Routers in an Autonomic Network. In: SELFMAGICNETS 2010: Proceedings of the International Workshop on Autonomic Networking and Self-Management in Access Networks, ICST ACCESSNETS 2010, Budapest, Hungary (November 2010)
7. AFI ISG: Autonomic network engineering for the self-managing Future Internet (AFI), http://portal.etsi.org/afi
8. Retvari, G., Nemeth, F., Chaparadza, R., Szabo, R.: OSPF for Implementing Self-adaptive Routing in Autonomic Networks: a Case Study. In: Strassner, J.C., Ghamri-Doudane, Y.M. (eds.) MACE 2009. LNCS, vol. 5844, pp. 72–85. Springer, Heidelberg (2009)
9. Internet Draft: ICMPv6 based Generic Control Protocol (IGCP): draft-chaparadza-6man-igcp-00.txt, https://datatracker.ietf.org/doc/draft-chaparadza-6man-igcp/
10. Chaparadza, R., et al.: IPv6 and Extended IPv6 (IPv6++) Features that enable Autonomic Network Setup and Operation. In: SELFMAGICNETS 2010: Proceedings of the International Workshop on Autonomic Networking and Self-Management in Access Networks, ICST ACCESSNETS 2010 (November 2010)
11. Aristomenopoulos, G., et al.: Autonomic Mobility and Resource Management Over an Integrated Wireless Environment - A GANA Oriented Architecture. In: Proceedings of the IEEE MENS Workshop at Globecom 2010, Miami, Florida, USA, December 6-10 (2010)
12. Tcholtchev, N., Chaparadza, R.: Autonomic Fault-Management and Resilience from the Perspective of the Network Operation Personnel. In: Proceedings of the IEEE MENS Workshop at Globecom 2010, Miami, Florida, USA, December 6-10 (2010)
13. Liakopoulos, A., et al.: Applying Distributed Monitoring Techniques in Autonomic Networks. In: Proceedings of the IEEE MENS Workshop at Globecom 2010, Miami, Florida, USA, December 6-10 (2010)
How Autonomic Fault-Management Can Address Current Challenges in Fault-Management Faced in IT and Telecommunication Networks

Ranganai Chaparadza¹, Nikolay Tcholtchev¹, and Vassilios Kaldanis²

¹ Fraunhofer FOKUS Institute for Open Communication Systems, Berlin, Germany
{ranganai.chaparadza,nikolay.tcholtchev}@fokus.fraunhofer.de
² VELTI S.A. - Mobile Marketing & Advertising, Athens, Greece
[email protected]
Abstract. In this paper we discuss the perspectives the research community should take into account when evolving Fault-Management towards Autonomic Fault-Management. The well-known and established FCAPS Management Framework (Fault-Management, Configuration-Management, Accounting-Management, Performance-Management and Security-Management) assumes the involvement of human technicians in the management of systems and networks, as is the practice today. Due to the growing complexity of networks, services and the management of both, it is now widely believed in academia and industry that the concept of Self-Managing Networks will address some of the current challenges in the management of networks and services, and emerging self-management technologies promise to reduce OPEX for the network operator. Nevertheless, much work remains before we see advanced, production-level self-manageability of systems and networks beyond what has been achieved through the scripting-based automation techniques that have been successfully applied to management and network-operation processes. The concept of autonomicity (realized through control-loop structures, feedback mechanisms and processes, and the information/knowledge flow used to drive the control loops) becomes an enabler for such advanced self-manageability of networks and services. A control loop can be introduced to bind the processes involved in each of the FCAPS areas, and the "autonomic manager components" that drive the control loops of the different FCAPS areas should interwork with each other in order to close the gaps characterized by dependencies among the FCAPS functional areas as these areas go autonomic and realize self-management.
The dependencies among the FCAPS functional areas need to be studied so that the functions/operations and processes belonging to the different areas can be well interconnected to achieve global system goals, such as integrity, resilience and a high-degree guarantee of system and service availability.

Keywords: Autonomic Fault-Management; GANA Architectural Reference Model for Autonomic Networking and Self-Management; Resilience; Self-Healing/Self-Repair; dependencies among FCAPS functional areas; interactions between the Operator and the Autonomic Network.

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 253–268, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
254
R. Chaparadza, N. Tcholtchev, and V. Kaldanis
1 Introduction

The main benefits of introducing self-management technology into systems and networks, from the operator's perspective, are minimizing operator involvement and OPEX in the deployment, provisioning and maintenance of the network, and increasing network reliability (self-adaptation and reconfiguration on the fly in response to challenges, e.g. faults, errors, failures, attacks and threats). According to the research/survey study conducted and published by the authors of [1], operators have recently provided a set of requirements on the evolution of network and services management, offering an insight into the need for self-management in the networks and processes of next-generation Operations Support Systems (OSSs). In [1], we also learn about other requirements that operators have identified as requiring serious attention from researchers in the coming years. Apart from self-management, the requirements noted by operators in [1] include alarm correlation, self-healing, auto-configuration/provisioning, model/interface integration, service quality, and service modeling and discovery. The concept of autonomicity (realized through control-loop structures [2, 3], feedback mechanisms and processes, and the information/knowledge flow used to drive the control loops) becomes an enabler for advanced self-manageability of networks and services, beyond what has been achieved through scripting-based automation techniques. A control loop can be introduced to bind the processes involved in each of the FCAPS areas, and the "autonomic manager components" that drive the control loops of the different FCAPS areas should interwork with each other in order to close the gaps characterized by dependencies among the FCAPS functional areas as these areas go autonomic and realize self-management.
Autonomic Manager Components (referred to as "Decision Elements" (DEs) in the GANA Model [3, 4]) must serve the purpose of automating management processes, even by executing some scripts, while at the same time governing the autonomicity of the particular functionality for which the autonomic manager is responsible. The autonomicity of a functionality, e.g. routing, involves the following: (1) the auto-discovery of the items required by the functionality to perform an auto-configuration/self-configuration process; (2) predictions/forecasting and listening for events, with reactions by the "autonomic manager" that controls and adapts the behaviour of the functionality towards some goal, based on those events. The dependencies among the FCAPS functional areas need to be studied so that the functions/operations and processes belonging to the different areas can be well connected to achieve global system/network goals, such as integrity, resilience and a high-degree guarantee of system and service availability. In this paper we focus on the Fault-Management functional area of the FCAPS framework and illustrate how to close the gap of its dependencies with the other FCAPS functional areas in the context of an autonomic, self-managing network. The following is an illustration of the chain of processes that form the control loop of an "autonomic manager component" specifically designed to perform Autonomic Fault-Management (we refer the reader to [5, 6] for more details on the subject of Autonomic Fault-Management (AFM), as well as to the subsequent sections of this paper). Autonomic Fault-Management [5, 6, 7] is understood as a control
loop structure (Figure 1) that facilitates the interplay between the processes of Fault-Management as defined by TMN (Telecommunications Management Network) [8]: Fault-Detection ("detect the presence of a fault"), Fault-Isolation ("find the fault (root cause) of the observed erroneous state"), and Fault-Removal ("remove or reduce the impact of the root cause"). The autonomic fault-management control loop is characterized by the behaviour of components (including the respective "autonomic manager components") and mechanisms that collaboratively work towards implementing this chain of operations/functions:
Fig. 1. Autonomic Fault Management Control Loop
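The Fault-Detection, Fault-Isolation and Fault-Removal chain of Figure 1 can be sketched in a few lines of code. The following is a minimal, illustrative Python sketch, not part of the GANA specification: all class names, the observation format, and the rule that maps a symptom set to a root cause are hypothetical assumptions chosen for the example.

```python
# Minimal sketch of the Autonomic Fault-Management control loop (Figure 1).
# All names and data formats here are illustrative assumptions.

class AutonomicFaultManager:
    def __init__(self, causality_model, removal_actions):
        # causality_model: frozenset of observed symptoms -> root cause
        self.causality_model = causality_model
        # removal_actions: root cause -> repair callable
        self.removal_actions = removal_actions
        self.log = []

    def detect(self, observations):
        """Fault-Detection: keep only observations flagged as erroneous."""
        return {o["symptom"] for o in observations if o["erroneous"]}

    def isolate(self, symptoms):
        """Fault-Isolation: find the root cause explaining the symptoms."""
        return self.causality_model.get(frozenset(symptoms), "unknown")

    def remove(self, root_cause):
        """Fault-Removal: execute the repair action bound to the root cause."""
        action = self.removal_actions.get(root_cause)
        outcome = action() if action else "escalate-to-operator"
        self.log.append((root_cause, outcome))
        return outcome

    def run_once(self, observations):
        symptoms = self.detect(observations)
        if not symptoms:
            return "healthy"
        return self.remove(self.isolate(symptoms))


# Example: a lost adjacency plus packet loss is diagnosed as a link failure.
afm = AutonomicFaultManager(
    causality_model={frozenset({"adjacency-lost", "packet-loss"}): "link-failure"},
    removal_actions={"link-failure": lambda: "rerouted-around-link"},
)
result = afm.run_once([
    {"symptom": "adjacency-lost", "erroneous": True},
    {"symptom": "packet-loss", "erroneous": True},
])
```

Note how an inconclusive isolation verdict ("unknown") falls through to escalation rather than to an automatic repair, mirroring the role the paper later assigns to the operator for unknown faults.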
The rest of the paper is organized as follows: Section 2 presents the implications of FCAPS going autonomic; Section 3 discusses Autonomic Fault-Management in relation to network and service management; Sections 4 and 5 discuss further implications of Autonomic Fault-Management; Sections 6 and 7 introduce the GANA Model in brief, which forms the basis for a unified architecture for realizing Autonomic Fault-Management, Resilience and Survivability. Finally, we give concluding remarks.
2 Closing the Gap of Dependencies among FCAPS Functional Areas as the Areas Go Autonomic and Realize Self-management

In [9], we learn of the relationships, i.e. dependencies, between the dependability of systems and security. These dependencies should inspire efforts towards closing the gap of dependencies among the FCAPS functional areas as the areas go autonomic and realize self-management. Indeed, efforts must be stepped up towards creating frameworks that expose these dependencies and show how to harmonize and integrate the corresponding operations of the diverse FCAPS functional areas. This would avoid the disjoint, non-interworking solutions that are currently inherent in OSSs (Operations Support Systems), the consequences of which are well known and studied, as described very well in [1]. In this section we present a consolidated picture of how to link (close the gaps among) the FCAPS functional areas for an autonomic/self-managing network. The links can be established primarily through: (1) the interaction and synchronization of the "autonomic manager components" that automate the management processes involved in a particular FCAPS functional area with those in the other areas, in order to assure a high degree of system/network integrity, reliability and availability; (2) the sharing of information/knowledge and data among the functional areas that need to use them in their respective functions; (3) the use of common models (Information Models, Data Models, Ontologies, Resource Description Models, Service Models, Network Description Models, Service Topology Models, Dependability and Causality Models, etc.).
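The sharing of information/knowledge among functional areas described in item (2) can be pictured as a small publish/subscribe knowledge base. The sketch below is purely illustrative: the topic names, the callback interface and the storage layout are assumptions for the example; only the FCAPS area names come from the text.

```python
# Illustrative sketch of a shared Information/Knowledge Base through which
# FCAPS functional areas exchange models and incident knowledge.
# Topic names and the API are hypothetical, not from the paper.
from collections import defaultdict

class SharedKnowledgeBase:
    def __init__(self):
        self._entries = {}
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def publish(self, topic, key, value):
        """One area publishes knowledge (a model, a detected incident, ...)."""
        self._entries[(topic, key)] = value
        for callback in self._subscribers[topic]:
            callback(key, value)

    def subscribe(self, topic, callback):
        """Another area subscribes, closing the dependency across areas."""
        self._subscribers[topic].append(callback)

    def lookup(self, topic, key):
        return self._entries.get((topic, key))


kb = SharedKnowledgeBase()
fault_mgmt_inbox = []
# Fault-Management needs to know about detected attacks (cf. Figure 2) ...
kb.subscribe("security.attacks", lambda k, v: fault_mgmt_inbox.append((k, v)))
# ... so when Security-Management publishes one, Fault-Management sees it.
kb.publish("security.attacks", "attack-17", {"target": "router-3"})
```

The same repository can hold the common models of item (3): a published Causality Model or Service Topology becomes immediately visible to every subscribed area.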
It is in such a consolidated picture that we seek to show how Autonomic Fault-Management can help address some of the challenges that operators are facing with respect to network and services management. Figure 2 illustrates how the gaps can be interpreted and closed. In terms of the dependencies and desired interactions of the FCAPS functional areas, we see that information/knowledge sharing, the use of Models and Ontologies, and collaboration among the functions belonging to the diverse functional areas are key to closing the gaps so as to achieve automation and self-management in an autonomic manner. This picture can be applied when reasoning about designing a system, e.g. a router, or the network architecture as a whole, required to be autonomic and self-managing in its operation. As can be observed in the figure, Autonomic Fault-Management requires, apart from information about detected faults/errors/failures and alarms, access to information/knowledge about detected threats and attacks, in order to use this information during the Fault-Diagnosis/Localization/Isolation process, and even during fault-forecasting and fault-removal operations, i.e. in the associated algorithms. Autonomic Fault-Management may trigger reconfiguration procedures while attempting to remove certain types of detected faults, and so it needs to interact with Autonomic Configuration Management components and mechanisms. In the figure, we show some shared repositories that store information/knowledge, such as models, required as input to the different FCAPS functional areas. Some of the information/knowledge may be created and possibly used only by one or a few of the functional areas, as shown for the case of Fault-Management and Security Management.
It is worth noting that the process of creating and populating the "Information/Knowledge Bases" depicted in the figure, with the types of information/knowledge depicted, must be "automated", in line with what the operators indicated in [1]. For example, the auto-discovery of the resources and capabilities of functional entities such as network elements and service components, and the building up of such information/knowledge in the corresponding repository, must be automated. Autonomic Fault-Management in particular uses, for example, information about Resource and Capability Descriptions to find appropriate mechanisms that can be applied to remove a fault once it has been narrowed down to a particular faulty resource. Capability information may then be used by Autonomic Fault-Management, in collaboration with the Configuration Management functions, to re-configure a system based on knowledge about its capabilities. Also, from resource descriptions obtained automatically through the self-description and advertisement of Resource and Capability Descriptions by the network elements, it is possible to create a picture of a candidate topology the operator could deploy, as well as the advantages of that candidate topology. The advantages could be expressed in terms of employable resilience and recovery strategies (see [10] for more information on the resilience and recovery strategies the operator needs to consider for different types of networks, resources and topologies, including strategies that can be implemented through a Network Management System (NMS)). Later in this paper, we describe how Autonomic Fault-Management uses some of the information/knowledge highlighted in Figure 2.
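The way Fault-Removal could consult auto-discovered Resource and Capability Descriptions to pick an applicable repair mechanism can be sketched as follows. This is a hypothetical illustration: the capability names, the repository layout and the least-disruptive-first selection rule are all assumptions, not something the paper prescribes.

```python
# Sketch: selecting a fault-removal mechanism for a faulty resource based on
# the capabilities that the resource itself has advertised (auto-discovery).
# Capability names and the selection policy are illustrative assumptions.

def select_removal_mechanism(faulty_resource, capability_repo, mechanisms):
    """Return the first repair mechanism whose required capabilities are all
    advertised by the faulty resource, or None if none is applicable."""
    advertised = set(capability_repo.get(faulty_resource, []))
    for name, required in mechanisms:
        if set(required) <= advertised:
            return name
    return None


# Capability descriptions built up via self-description and advertisement.
capability_repo = {
    "line-card-2": ["soft-reset", "firmware-reload"],
    "fan-tray-1": [],
}
# Candidate mechanisms, ordered from least to most disruptive.
mechanisms = [
    ("soft-reset", ["soft-reset"]),
    ("firmware-reload", ["soft-reset", "firmware-reload"]),
]

choice = select_removal_mechanism("line-card-2", capability_repo, mechanisms)
```

A resource that advertises no usable capability yields no mechanism, which is exactly the case where Fault-Removal must fall back on the Configuration Management functions or the operator.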
Fig. 2. Closing the gaps among dependencies of FCAPS functional areas
3 Implications of Autonomic Fault-Management (AFM) on Network and Services Management

The implication of Autonomic Fault-Management for Network Management is that all the so-called traditional network management functions defined by the FCAPS management framework (Fault-Management, Configuration-Management, Accounting-Management, Performance-Management and Security-Management), as well as the fundamental network functions such as routing, forwarding, monitoring, discovery, fault-detection, fault-removal and resilience, are made to automatically feed each other with information (knowledge), such as goals and events, in order to effect feedback processes among the diverse functions. Since some degree of self-management and self-adaptation needs to be introduced into network device architectures, e.g. for routers and switches, the implication is that the FCAPS functions become diffused within node/device architectures, apart from being part of an overall network architecture, whereas traditionally a distinct Management Plane is
engineered separately from the other functional planes of the network [3, 4]. The recently emerged architectural Reference Model for Autonomic Network Engineering and Self-Management, dubbed GANA (Generic Autonomic Network Architecture), presents a framework for how the FCAPS functions become diffused into device architectures, while still maintaining an outer Management Plane that would be required to evolve towards performing more sophisticated decisions for the operation and optimization of the whole network in an autonomic manner. In this paper, we present the GANA Model in brief, as we seek to illustrate how we have derived a GANA-oriented architecture that unifies Autonomic Fault-Management with Resilience and Survivability within the device architecture and the overall network architecture. It is this unified architecture, the autonomic interworking of its components, and the placement of functions and algorithms that we consider as the machinery for addressing the challenges currently faced by operators in Fault-Management.
4 Implications of Autonomic Fault-Management on Network Element Architectures, Protocols and Services

Apart from the architectural implications that can be derived easily from the GANA Model, as discussed above and in [11, 5, 6, 12], the need for information sharing is crucial. Looking at current practices in the design of network elements, protocols, services and the fault-tolerance implemented in such entities, in most cases the entities execute their intrinsic fault-tolerance mechanisms and do not share with other functional entities (through, say, a shared information/knowledge base) the information about what happened, what was detected, and whether the problem was handled successfully. This information may be useful elsewhere, at a higher level, where algorithms for fault diagnosis and resolution would benefit from knowing it. This requirement for information sharing in Autonomic Fault-Management is described in more detail in [5], where the need for information repositories for registering and sharing such information is discussed at length. Later in this paper we briefly touch on this subject when we discuss the unified framework for Autonomic Fault-Management, Resilience and Survivability.
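The information-sharing requirement argued for above can be made concrete with a small sketch: a fault-tolerant entity registers what it detected, which intrinsic mechanism reacted, and whether the reaction succeeded, so that higher-level diagnosis can later query the records. The record fields and names below are illustrative assumptions, not a standardized schema.

```python
# Sketch of a shared incident repository into which protocols and services
# register the outcome of their intrinsic fault-tolerance reactions.
# The record schema is a hypothetical illustration.

class IncidentRepository:
    def __init__(self):
        self._records = []

    def register(self, entity, detected, reaction, handled_ok):
        self._records.append({
            "entity": entity,          # who detected the incident
            "detected": detected,      # what was detected
            "reaction": reaction,      # which intrinsic mechanism reacted
            "handled_ok": handled_ok,  # did the reaction succeed?
        })

    def unresolved(self):
        """Incidents whose intrinsic handling failed: candidates for
        escalation to higher-level Autonomic Fault-Management."""
        return [r for r in self._records if not r["handled_ok"]]


repo = IncidentRepository()
# A routing protocol handled its own flap successfully ...
repo.register("ospf", "adjacency-flap", "fast-hello-restart", True)
# ... while an MPLS protection switch failed and needs escalation.
repo.register("mpls", "lsp-down", "protection-switch", False)
```

Keeping the successful records as well, rather than only the failures, is the point of the paragraph above: a diagnosis algorithm may need to know that a low-level reaction did occur, even when it worked.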
5 How Different Types of Faults Can Be Addressed Autonomically

In this section (in the table below), we summarize different types of faults and discuss those that are handled by the resilience and recovery mechanisms found in fault-tolerant systems, services and protocols (let us call this Domain-A), contrasting them with those that must be handled by Autonomic Fault-Management (let us call this Domain-B). The role of Autonomic Fault-Management in relation to each type of fault is also discussed, as is the inter-working required between functions in Domain-A and functions in Domain-B.
Table 1. Types of faults; how they are handled in Domain-A (resilience, recovery and fault-tolerance mechanisms intrinsically built into protocols, services and applications); and the role of Domain-B (Autonomic Fault-Management), structured into Fault-Detection, Fault-Diagnosis/Localization/Isolation and Fault-Removal, including the inter-working required between functions in Domain-A and functions in Domain-B.

Faults, errors and failures detected at the Network, Link and Physical Layers (e.g. faulty components, faulty modules, component failure, link failure, node failure and other related types of errors and failures; see [13]):
- Domain-A: Some protocols are designed with intrinsic mechanisms to detect faults/errors/failures and react by executing resilience and recovery mechanisms, i.e. fault-tolerance mechanisms and strategies, e.g. Protection and Restoration Schemes in Telecommunication networks [14, 15, 10]. Telecommunication protocols such as ATM, MPLS/IP-FR and SONET/SDH exhibit such resilience and recovery characteristics [15, 10]. According to today's systems engineering approaches, mechanisms exist for detecting all these kinds of "problems", enabling reactive resilience and recovery behaviours in some protocols and layers [10].
- Domain-B, Fault-Detection: needs to rely not only on Alarms generated as symptoms, but also on access to detected errors and failures, or at least knowledge of low-level incidents that are normally not communicated to the Network Management System (NMS) through Alarms (simply because "humans" may have considered that some low-level incidents need not be communicated or even archived for offline analysis). To augment the Alarms that are often generated for some faults, the sets and quality of the communicated incident information need to be enriched, and the outcome of reactions by resilience mechanisms also needs to be communicated to the Autonomic Fault-Management components that keep track of global state and perform long-term fault resolutions.
- Domain-B, Fault-Diagnosis/Localization/Isolation: Normally this process is performed using Alarm information coming from the network elements. For Autonomic Fault-Management, information about even low-level detected incidents often not considered necessary to communicate to the NMS, as well as detected threats and attacks, may be required for performing extensive Fault-Localization, provided that it is practically feasible to include all such information in a Causality Model of a device or of a network of some scope. Diverse Fault-Diagnosis/Localization and Isolation techniques exist today (see [16]) and may be applied in Autonomic Fault-Management as discussed in [5, 6].
- Domain-B, Fault-Removal: Fault-Removal mechanisms must infer whether the resilience and recovery, i.e. fault-tolerance, mechanisms employed by some protocols upon the detection of an incident have executed successfully or not, in order to execute a strategy at the global level to remove or reduce the impact of a fault and thereby ensure system/network integrity, reliability and availability. Decisions regarding Fault-Removal are taken locally by a system while the system also performs actions requested by a higher decision level, e.g. the NMS level; some distributed and collaborative Fault-Removal can be performed by network systems [17]. Fault-Removal relies on Dependability and Causality Models and on the Service Topology (if not embedded as part of a Dependability Model), as discussed in [7] and later in this paper.

Faults, errors and failures detected at the Services and Application Layers:
- Domain-A: A great deal of work has been achieved, and research continues, in systems engineering practices for the design of fault-tolerant service components and applications. Multi-layer resilience strategies (see [14, 10]) also cover how service/application-layer reactions should be taken into account.
- Domain-B, Fault-Detection: As OSSs evolve to cover Service Management aspects, Fault-Detection in the Services/Applications Layer follows requirements similar to those discussed for Fault-Detection at the network layer and below, namely information sharing with the Fault-Management functions of the system and network. Since Service Management is the "playground" whose responsibilities fall solely on the network operator, faults emanating from service creation, deployment and maintenance must be solved with the tools available to the operator, i.e. Fault-Management processes. As operators desire to see automated service monitoring [1], Autonomic Fault-Management must perform Service Monitoring, e.g. Service Probing, to detect problems.
- Domain-B, Fault-Diagnosis/Localization/Isolation: It remains a challenge to build management systems that unify the management of services and network resources. As noted in [1], MIBs and alarms specific to services and applications are not widely developed to aid the overall Service Management processes, including Fault-Management. The concept of Service Topology, as discussed in [1], must be instrumental to Fault-Diagnosis for Services; as illustrated in Figure 2, the model describing the Service Topology must be captured, stored and shared automatically by all the FCAPS functional areas.
- Domain-B, Fault-Removal: relies on Dependability and Causality Models and on the Service Topology (if not embedded as part of a Dependability Model). It also remains a challenge to automate the processing of trouble tickets in order to execute steps towards removing problems that may be narrowed down to the network level, though some OSSs can already perform limited steps in that direction.

Alarms:
- Domain-A: At this level, some resilience and recovery functions process alarms and react upon them, while at the same time generating other types of alarms towards the Management Systems or OSSs (see [18] for the alarm reporting function).
- Domain-B: Alarms are used primarily as symptoms for triggering Fault-Diagnosis/Localization, and Alarm "severity" indications are vital input to Fault-Removal functions and to decisions made autonomically. The problem of redundant Alarms discussed in [1] can be eliminated through models that enable network elements to collaborate in suppressing some alarms, so that multiple alarms triggered by the same fault are not all sent to the NMS. Advanced alarm-correlation techniques are required as integrated solutions with the Autonomic Fault-Management system [1].

Human errors during the network and service management phase:
- Domain-A: Such human errors are not handled at this level, but rather by Fault-Management (Domain-B).
- Domain-B: As illustrated in Figure 2, some aspects of Fault-Detection and Fault-Removal should be handled in the Configuration Management area, while others should be handled in the Fault-Management functional area. In [19], an insight is provided into the sort of faults that may be created during the configuration of the network by humans, e.g. erroneous configuration data provided as input to systems, and automation techniques are discussed that relieve the operator from introducing severe faults into the input supplied to configure network elements. We believe that, as further automation via autonomics is required beyond what is described in [19], the autonomic Fault-Manager components of the network (presented later in this paper) should incorporate the fault-tolerance, or generally speaking fault-removal, techniques described in [9], such as Rollback, Rollforward, Compensation, Reconfiguration and Re-initialization.

Unknown faults resulting from insufficient testing of software:
- Domain-A: We cannot expect resilience and recovery mechanisms to handle such faults.
- Domain-B: Such faults are hard to detect and are often classified as "unknown" during Fault-Diagnosis/Localization/Isolation; they are often handled by upgrading the software and hardware components, applying patches and resorting to new releases [19]. From the perspective of Autonomic Fault-Management, detecting situations in which faults must be classified as "unknown", because the Fault-Localization techniques have reached an inconclusive verdict, means that the Autonomic Manager Components (i.e. DEs in the GANA Model [3]) responsible for autonomically managing a device should be designed to synchronize with each other to fetch patches and new releases from vendor servers and to schedule update events for the network, after which they can perform self-validation to ensure that the systems and the network are working properly.
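The redundant-alarm suppression described above (collapsing the multiple alarms that trace back to a single fault before they reach the NMS) can be sketched briefly. The alarm format, severity ordering and symptom-to-cause model below are illustrative assumptions for the example.

```python
# Sketch: suppressing redundant alarms by collapsing alarms whose symptoms
# map to the same root cause under a shared model. All names are illustrative.

def suppress_redundant(alarms, root_cause_model):
    """Forward only one alarm per root cause, preferring higher severity."""
    ordered = sorted(alarms, key=lambda a: a["severity"], reverse=True)
    forwarded, seen_causes = [], set()
    for alarm in ordered:
        # Symptoms absent from the model count as their own root cause.
        cause = root_cause_model.get(alarm["symptom"], alarm["symptom"])
        if cause not in seen_causes:
            seen_causes.add(cause)
            forwarded.append(alarm)
    return forwarded


# Two symptoms of the same link failure, plus one unrelated hardware alarm.
root_cause_model = {"adjacency-lost": "link-failure",
                    "packet-loss": "link-failure"}
alarms = [
    {"symptom": "adjacency-lost", "severity": 2},
    {"symptom": "packet-loss", "severity": 3},
    {"symptom": "fan-failure", "severity": 1},
]
forwarded = suppress_redundant(alarms, root_cause_model)
```

Only the more severe of the two link-failure alarms is forwarded, together with the unrelated fan alarm; the NMS thus receives one alarm per underlying fault rather than one per symptom.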
6 Generic Autonomic Network Architecture in Brief

A central concept of GANA is that of an autonomic Decision-Making Element ("DME", or simply "DE" for Decision Element), i.e. an "Autonomic Manager Component". A Decision Element (DE) implements the logic that drives a control loop over the "management interfaces" of its assigned Managed Entities (MEs). Therefore, in GANA, self-* functionalities such as self-configuration, self-healing and self-optimization are functionalities implemented by Decision Elements. The Generic Autonomic Network Architecture (GANA) is based on a set of requirements derived in [3, 4]. Since control loops on different levels of functionality are possible, e.g. on the node or network level, GANA defines the Hierarchical Control Loops (HCLs) framework. The HCLs framework fixes and establishes four levels of abstraction for which DEs, MEs and associated control loops can be designed. Level-1: Protocol Level, i.e. control loops embedded within protocol modules (e.g. within some routing protocol). Level-2: Abstracted Functions Level, i.e. DEs managing some abstracted networking functions inside a device, e.g. routing, forwarding, mobility management. Level-3: Node Level, which consists of a Node_Main_DE that takes care of the management of aspects related to the state/fitness of the overall node, e.g. Fault-Management and Auto-Configuration. Level-4: Network Level, where DEs manage the different aspects that need to be handled at the network level, e.g. routing or monitoring, of a group of nodes according to a network scope. Thereby, control loops (i.e. DEs) on a higher level manage DEs on a lower level, down to the lowest-level "pure" MEs. Detailed information about all the presented concepts, examples, as well as discussions on the application of GANA to diverse aspects of Autonomic Networking, can be found in [3].
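The four-level hierarchy of DEs managing lower-level DEs down to "pure" MEs can be sketched as a simple tree. The class below is an illustrative data structure only; the DE names and the `span` helper are assumptions for the example, not part of the GANA specification.

```python
# Sketch of the GANA Hierarchical Control Loops: DEs at a higher level manage
# DEs at the level below, down to pure MEs (plain strings here).
# All instance names are illustrative.

class DecisionElement:
    def __init__(self, name, level, managed=()):
        self.name = name
        self.level = level            # 1=protocol, 2=function, 3=node, 4=network
        self.managed = list(managed)  # lower-level DEs or pure MEs (strings)

    def span(self):
        """All entities transitively managed by this DE, top-down."""
        result = []
        for m in self.managed:
            if isinstance(m, DecisionElement):
                result.append(m.name)
                result.extend(m.span())
            else:
                result.append(m)
        return result


# Level-1 DE embedded in a routing protocol, managing a pure ME.
ospf_de = DecisionElement("OSPF_DE", 1, ["ospf-module"])
# Level-2 DE for the abstracted routing function of the device.
routing_de = DecisionElement("Routing_DE", 2, [ospf_de])
# Level-3 Node_Main_DE governing the node's function-level DEs.
node_main_de = DecisionElement("Node_Main_DE", 3, [routing_de])
# Level-4 network-level DE managing Node_Main_DEs within its network scope.
net_de = DecisionElement("Net_Routing_DE", 4, [node_main_de])
```

Walking `net_de.span()` traverses the whole management chain, from the network-level DE down to the protocol module, which is exactly the top-down control relation the HCLs framework establishes.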
7 Decision Notification in GANA for the "Human in the Loop"

Certain types of decisions made by DEs should be communicated to the administrator. Potentially, decisions made at the "node level" in GANA, or at the "network level" by Network-Level DEs, are candidates for "Decision Notification" to the human; the majority, though, should be handled by the autonomic entities. This requires that: (a) when the administrator informs the network that for certain types of decisions (the human will specify them using some means, e.g. a Rule or Policy Specification Language) she wants a Decision Notification, the DEs notify the administrator so that the human closes the loop by providing a response to the Decision Notification; this may happen during the early days/weeks/months of operating the autonomic network, until the administrator has built trust. (b) When the administrator deactivates Decision Notification, DEs shall proceed with executing decisions without notifying the human. (c) When Decision Notification is deactivated as in (b), DEs shall nevertheless record (possibly by simply logging) the decision(s) taken in response to a "triggering event", meaning that both the decision(s) and the associated event should be logged. Example cases for Decision Notification are given in [20].
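Behaviours (a)–(c) above can be captured in a few lines. The sketch below is illustrative: the policy format (a set of decision-type names), the approval callback and the log layout are assumptions standing in for the Rule/Policy Specification Language the text mentions.

```python
# Sketch of GANA Decision Notification, behaviours (a)-(c).
# Policy format and names are hypothetical illustrations.

class NotifyingDE:
    def __init__(self, notify_types, ask_admin):
        self.notify_types = set(notify_types)  # decision types needing approval
        self.ask_admin = ask_admin             # callback closing the human loop
        self.log = []

    def decide(self, decision_type, action, triggering_event):
        if decision_type in self.notify_types:
            # (a) notify the administrator and let the human close the loop
            executed = self.ask_admin(decision_type, action)
        else:
            # (b) Decision Notification deactivated: execute directly
            executed = True
        # (c) always log both the decision and the associated triggering event
        self.log.append((decision_type, action, triggering_event, executed))
        return executed


# Early operation: reconfigurations still need approval (here: admin refuses),
# while module restarts are trusted and run without notification.
de = NotifyingDE({"reconfigure"}, ask_admin=lambda t, a: False)
did_run = de.decide("reconfigure", "reroute-traffic", "link-failure")
auto_run = de.decide("restart-module", "restart-ospf", "process-crash")
```

Shrinking `notify_types` over time models the administrator gradually building trust in the autonomic network, as described in (a).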
8 GANA-Oriented Unified Architecture for Autonomic Fault-Management, Resilience, and Survivability in Self-managing Networks

According to [2], self-healing is defined as follows: "To detect incidents such as adverse conditions, faults, errors, failures; diagnose, and act to prevent or solve disruptions". That is, on the one hand a network equipped with autonomic self-healing mechanisms should aim at automatically preventing future fault activations, and on the other hand it should resolve faulty conditions that have already occurred or been activated. This corresponds to a number of concepts and approaches that have been investigated recently, such as Autonomic Fault-Management as well as reactive and proactive Resilience in autonomic networks.
Fig. 3. The UAFAReS [11] architecture inside an Autonomic Node
Within the EFIPSANS project [21], we introduced a Unified Architecture for Autonomic Fault-Management, Resilience and Survivability in Self-Managing Networks (UAFAReS). UAFAReS is based on the observation that the evolution of traditional Fault-Management towards Autonomic Fault-Management enables network devices to exercise self-healing and recover from faulty conditions. That is, the nodes of the network become able to automatically self-heal (to some degree) without the need for human intervention. Hence, Autonomic Fault-Management has to interplay with the concepts and mechanisms related to Fault-Tolerance, Fault-Masking and Multi-layer Resilience [14]. This implies a harmonization (i.e. an ordered time-scaling) of the reactions to incidents at the different levels of autonomicity and self-management defined by GANA.
UAFAReS is based on the GANA reference model and specifies a number of components which aim at realizing the interplay of the aforementioned aspects. The node components of the architectural framework are illustrated in Figure 3. The main UAFAReS entities in a device are the Fault-Management Decision Element (FM_DE) and the Resilience and Survivability Decision Element (RS_DE). The RS_DE is responsible for an immediate reaction to the symptoms of an erroneous state, while in parallel the FM_DE performs Fault-Isolation and Fault-Removal in order to eliminate the corresponding root cause(s). Both DEs are part of the Node_Main_DE, i.e. they are introduced at the node level inside a GANA-conformant device in order to have exclusive access to all node functional entities (i.e. DEs and MEs), such that the overall autonomic behaviours of a node with respect to coping with incidents and alarms are synchronized to ensure node integrity. The UAFAReS DEs operate based on distributed control loops. The distributed nature of the UAFAReS control loops is enabled by a number of components that facilitate incident-information exchange across the network nodes. A set of repositories for storing incident information and an Incident Information Dissemination Engine (IDE) enable the synchronization of the faults/errors/failures/alarms knowledge held by UAFAReS DEs residing in different devices, and allow the DEs to perform Fault-Masking, Fault-Isolation and Fault-Removal in a node-specific manner, based on the same information.
The FM_DE consists of four modules: 1) a component responsible for Fault-Isolation (Fault-Diagnosis/Localization/Isolation functions, abbreviated FDLI); 2) Fault-Removal Functions (FRF); 3) Action Synchronization Functions (ASF), responsible for synchronizing (allowing and/or disallowing) tentative actions issued by the RS_DE and FM_DE control loops running in parallel; 4) Fault-Removal Assessment Functions (FRAF), a component responsible for assessing and verifying the success of the fault-removal actions issued as output of the FM_DE. The interactions of these modules towards the realization of an Autonomic Fault-Management control loop are illustrated in Figure 4. Specially instrumented monitoring entities, which have the capability to share incident information over the UAFAReS incident repositories, push descriptions of symptoms to the UAFAReS fault/error/failure/alarm registries, such that the information gets conveyed (i.e. stored in the UAFAReS node registries) by the Incident Dissemination Engine (IDE) to the UAFAReS instances across the network scope, e.g. a subnet/LAN. Once an incident description has been reported to the FM_DE over the UAFAReS incident repositories, it is received and processed first by the FDLI functions, as depicted in Figure 4. That is, the FDLI functions collect such events and correlate them in order to find the root cause of the observed faulty conditions. Algorithms that can be used for event correlation are presented in [6, 16, 22]. The correlation of incident events is realized by the FDLI functions based on a Causality Model that is kept in the Causality Model Repository (CMR) inside a node. The identified root cause(s) (faults) are then submitted to the Fault-Removal Functions, which implement an "if-then-action" logic that issues the reaction required to eliminate the faults, e.g. the reconfiguration of an entity using, for instance, the command line interface (CLI).
Since the tentative reaction might interfere with actions intended to be performed by the RS_DE control loop (next paragraph), or with a parallel Autonomic Fault-Management control-loop process (i.e., a thread in a multi-threading environment), the ASF should be invoked in order to allow or disallow the tentative action in question.
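The FM_DE control loop described above (FDLI correlation, FRF "if-then-action" reaction, ASF synchronization, FRAF assessment) can be sketched in a few lines. This is only an illustrative skeleton: the symptom names, the toy causality model, and the action table below are invented, and the real FDLI would use the correlation algorithms of [6, 16, 22] rather than simple set matching.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    symptom: str
    source: str  # reporting monitoring entity

# Toy stand-in for the Causality Model Repository (CMR):
# a set of co-occurring symptoms maps to a root cause.
CAUSALITY_MODEL = {
    frozenset({"link_down", "ping_loss"}): "broken_interface",
    frozenset({"high_cpu", "slow_response"}): "process_overload",
}

# Toy "if-then-action" table of the Fault-Removal Functions (FRF).
REMOVAL_ACTIONS = {
    "broken_interface": "restart_interface",
    "process_overload": "throttle_process",
}

def fdli_correlate(incidents):
    """FDLI: correlate reported symptoms to root causes via the causality model."""
    symptoms = frozenset(i.symptom for i in incidents)
    return [cause for pattern, cause in CAUSALITY_MODEL.items() if pattern <= symptoms]

def asf_allows(action, actions_in_flight):
    """ASF: disallow a tentative action already pending in a parallel loop."""
    return action not in actions_in_flight

def fm_de_loop(incidents, actions_in_flight=()):
    """One pass of the FM_DE loop: isolate -> remove -> synchronize -> assess."""
    executed = []
    for cause in fdli_correlate(incidents):
        action = REMOVAL_ACTIONS.get(cause)
        if action and asf_allows(action, actions_in_flight):
            executed.append(action)  # FRF would issue this, e.g. via the CLI;
            # the FRAF would then verify success, e.g. by re-checking symptoms.
    return executed
```

A root cause is acted upon only when all of its symptoms have been reported and the ASF has not vetoed the action.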
How Autonomic Fault-Management Can Address Current Challenges
265
The ASF is based on techniques from the area of optimal control, and selects the optimal subset of tentative actions so as to optimize the network performance, reflected by the values of selected key performance indicators, while ensuring integrity. An applicable algorithm can be found in our previous work [12]. Once the ASF has allowed a tentative action, the FRF issues it on the MEs in question inside the device. Thereby the FRF can make use of information regarding the dependencies among protocol entities and services, kept in the Dependability Model Repository (DMR). Finally, the success of the executed action is assessed by the FRAF, which may choose to notify the network operator in case the UAFAReS mechanisms cannot cope with the pending challenges.
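To give a feel for the action-synchronization step, the following is a deliberately simplified, greedy stand-in for the optimal-control algorithm of [12]: it picks a conflict-free subset of tentative actions, preferring those with the highest estimated KPI gain. The conflict pairs and gain estimates are assumed inputs, not part of the original design.

```python
def select_actions(tentative, conflicts, gain):
    """Greedily pick a non-conflicting subset of tentative actions,
    highest estimated KPI gain first. `conflicts` is a set of action
    pairs that must not run together; `gain` maps an action to its
    estimated KPI improvement (both illustrative)."""
    chosen = []
    for action in sorted(tentative, key=gain, reverse=True):
        if all((action, c) not in conflicts and (c, action) not in conflicts
               for c in chosen):
            chosen.append(action)
    return chosen
```

Because "reboot" conflicts with "reroute" below, only the higher-gain of the two survives, together with the independent "throttle" action.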
Fig. 4. The Autonomic Fault-Management control loop inside a node
The Resilience and Survivability DE contains the Fault-Masking Functions (FMF) component and a Risk Assessment Module (RAM). The Fault-Masking Functions realize a reaction immediately after the symptoms of a faulty condition have been registered in the UAFAReS alarm/incident repositories. The goal of the FMF is thereby to implement a fault-tolerant behavior such that some fundamental level of service can be sustained in the face of a pending challenging condition. The FMF follow a logic similar to that of the Fault-Removal Functions of the FM_DE, and consult the Action Synchronization Functions of the FM_DE in order to ensure that the best possible set of actions is executed. The FMF, as the instance of first reaction, should also consider the aspect of Multilayer Resilience while orchestrating a fault-tolerant/masking behavior. Multilayer Resilience is a model that deals with the capabilities of functional entities at different layers of the protocol stack to execute their own embedded resilience behaviors. For instance, in IP networks, generated ICMP messages enable systems (especially end systems) to overcome issues occurring in the network, e.g., sudden changes of the PMTU (Path Maximum Transmission Unit) during the lifetime of a connection. Thus, the FMF are expected to allow the protocol modules to recover based on their own intrinsic capabilities and should intervene only when these mechanisms fail. [14] proposes the usage of "hold-off" timers specifying the time that should be given to a protocol to recover on its own. Information on
how to handle the resilience properties of a protocol module (e.g., the protocol module ID and the corresponding "hold-off" timer) is kept in the Multi-Layer Resilience Properties Repository. In addition, the operation of the Risk Assessment Module (RAM) is based on monitoring information about diverse key performance indicators (e.g., CPU temperature) that is used to estimate the probability of future failures. This results in notifications to the FMF, which consequently have to trigger mechanisms that proactively help to avoid significant QoS degradation in the future.
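The hold-off-timer idea of [14] can be sketched as follows: the FMF notes when a symptom first appears and intervenes only once the protocol's own recovery time budget has elapsed. The protocol names, timer values, and method names are invented for illustration; a real repository would hold per-module resilience properties.

```python
import time

# Hypothetical contents of the Multi-Layer Resilience Properties Repository:
# protocol module id -> hold-off timer in seconds (values are made up).
HOLD_OFF = {"ospf": 5.0, "sdh_protection": 0.05, "tcp": 30.0}

class FaultMaskingFunctions:
    """Let each protocol recover on its own; flag it for FMF intervention
    only after its hold-off timer has expired."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.pending = {}  # protocol -> time the symptom was first seen

    def on_symptom(self, protocol):
        self.pending.setdefault(protocol, self.clock())

    def on_recovered(self, protocol):
        self.pending.pop(protocol, None)  # intrinsic recovery succeeded

    def due_for_intervention(self):
        now = self.clock()
        return [p for p, t0 in self.pending.items()
                if now - t0 >= HOLD_OFF.get(p, 0.0)]
```

An injectable clock makes the behavior testable without real waiting.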
9 Concluding Remarks

In this paper, we presented some perspectives on how Autonomic Fault-Management can address some of the Fault-Management challenges faced in IT and telecommunication networks. The work presented is framework-oriented, and is inspired by the need to move the well-known and established FCAPS management framework towards automated management through the concepts and principles of autonomic networking and self-management. Given that there exist dependencies between the dependability of systems and security, we see the need to close the gaps characterized by dependencies among the FCAPS functional areas as these areas go autonomic and realize self-management. The dependencies among the FCAPS functional areas need to be studied such that the functions/operations and processes that belong to the different areas can be interconnected well to achieve global system goals, such as integrity, resilience and a high-degree guarantee of system and service availability. We have illustrated how this can be achieved by providing a framework that can be further refined during actual implementations. We believe that such a framework should be the basis for reasoning about how Autonomic Fault-Management can address some of the challenges faced by operators and vendors in the design and operation of Self-Managing Future Networks. We categorized some types of faults and discussed how certain faults are handled by resilience and recovery mechanisms intrinsic to some protocols and service components, how certain faults can only be handled within the realm of the Configuration-Management phase, and how some can only be handled within the realm of Fault-Management during the operation time of the network. The role of enriched information/knowledge sharing has also been discussed as part of the glue required to interconnect functions and operations belonging to the different FCAPS functional areas.
We do not claim that there are no obstacles to implementing the framework: what we offer are guidelines, and issues such as scalability and complexity need to be addressed when deriving implementation architectures from the framework we provided. Our further work will be based on the evaluation of frameworks, such as the proposed UAFAReS framework, whose architectural components are built on the concepts and principles prescribed by the emerging, standardizable GANA architectural Reference Model for Autonomic Networking and Self-Management.

Acknowledgement. This work has been partially supported by the EC FP7 EFIPSANS project (INFSO-ICT-215549).
References

[1] Wallin, S., Leijon, V.: Telecom network and service management: An operator survey. In: Pfeifer, T., Bellavista, P. (eds.) MMNS 2009. LNCS, vol. 5842, pp. 15–26. Springer, Heidelberg (2009)
[2] Autonomic Computing: An Architectural Blueprint for Autonomic Computing. IBM White Paper (2006), http://www-01.ibm.com/software/tivoli/autonomic/
[3] Chaparadza, R.: Requirements for a Generic Autonomic Network Architecture (GANA), suitable for Standardizable Autonomic Behavior Specifications for Diverse Networking Environments. International Engineering Consortium (IEC), Annual Review of Communications 61 (2008)
[4] Chaparadza, R., Papavassiliou, S., Kastrinogiannis, T., Vigoureux, M., Dotaro, E., Davy, A., Quinn, K., Wodczak, M., Toth, A.: Creating a viable Evolution Path towards Self-Managing Future Internet via a Standardizable Reference Model for Autonomic Network Engineering. In: Towards the Future Internet – A European Research Perspective, pp. 136–147. IOS Press, Amsterdam (2009); published by the Future Internet Assembly (FIA) in Europe
[5] Chaparadza, R.: UniFAFF: a unified framework for implementing autonomic fault management and failure detection for self-managing networks. Int. J. Netw. Manag. 19(4), 271–290 (2009)
[6] Tcholtchev, N.: Scalable Markov Chain based Algorithm for Fault-Isolation in Autonomic Networks. Accepted to appear in the Proceedings of the NGN Symposium of Globecom (2010)
[7] Li, N., Chen, G., Zhao, M.: Autonomic fault management for wireless mesh networks. Electronic Journal for E-Commerce Tools and Applications (eJETA) 2(4) (January 2009), http://www.cs.uml.edu/~glchen/papers/fault-ejeta09.pdf
[8] The FCAPS Management Framework. ITU-T Rec. M.3400 (February 2000)
[9] Avizienis, A., Laprie, J.C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secur. Comput. 1(1), 11–33 (2004)
[10] Autenrieth, A.: Differentiated Resilience in IP-Based Multilayer Transport Networks. Ph.D. thesis, Technische Universität München (2003); presented in 2003 at "Lehrstuhl für Kommunikationsnetze"
[11] Tcholtchev, N., Grajzer, M., Vidalenc, B.: Towards a Unified Architecture for Resilience, Survivability and Autonomic Fault-Management for Self-Managing Networks. In: MONA+ 2009: Proc. of the 2nd Workshop on Monitoring, Adaptation and Beyond (2009)
[12] Tcholtchev, N., Chaparadza, R., Prakash, A.: Addressing stability of control-loops in the context of the GANA architecture: Synchronization of actions and policies. In: Spyropoulos, T., Hummel, K.A. (eds.) IWSOS 2009. LNCS, vol. 5918, pp. 262–268. Springer, Heidelberg (2009)
[13] Markopoulou, A., Iannaccone, G., Bhattacharyya, S., Chuah, C.N., Ganjali, Y., Diot, C.: Characterization of Failures in an Operational IP Backbone Network. IEEE/ACM Trans. Netw. 16(4), 749–762 (2008)
[14] Touvet, F., Harle, D.: Network Resilience in Multilayer Networks: A Critical Review and Open Issues. In: Lorenz, P. (ed.) ICN 2001. LNCS, vol. 2093, pp. 829–838. Springer, Heidelberg (2001)
[15] Types and Characteristics of SDH Network Protection Architectures. ITU-T Rec. G.841 (December 1997)
[16] Steinder, M., Sethi, A.S.: A survey of fault localization techniques in computer networks. Science of Computer Programming 53(2), 165–194 (2004), http://dx.doi.org/10.1016/j.scico.2004.01.010
[17] Tcholtchev, N., Chaparadza, R.: On Self-Healing based on collaborating End-Systems, Access, Edge and Core Network Components. In: SELFMAGICNETS 2010: Proc. of the International Workshop on Autonomic Networking and Self-Management, ICST AccessNets 2010 (November 2010)
[18] Information Technology – Open Systems Interconnection – Systems Management: Alarm Reporting Function. ITU-T Rec. X.733 (February 1994)
[19] Juniper Networks Inc.: What's Behind Network Downtime? Proactive Steps to Reduce Human Error and Improve Availability of Networks. Juniper Networks White Paper (2008)
[20] Tcholtchev, N., Chaparadza, R.: Autonomic Fault-Management and Resilience from the Perspective of the Network Operation Personnel. In: IEEE MENS 2010: IEEE International Workshop on Management of Emerging Networks and Services, Miami (December 2010); in conjunction with IEEE Globecom 2010
[21] EC FP7-IP EFIPSANS Project (2008–2010), INFSO-ICT-215549, http://www.efipsans.org
[22] Hasan, M., Sugla, B., Viswanathan, R.: A conceptual framework for network management event correlation and filtering systems. In: Sloman, et al. (eds.), pp. 233–246 (1999)
Efficient Data Aggregation and Management in Integrated Network Control Environments

Patrick-Benjamin Bök, Michael Patalas, Dennis Pielken, and York Tüchelmann

Ruhr-University Bochum, 44801 Bochum NRW, Germany
(boek,patalas,pielken,tuechelmann)@iis.rub.de
http://www.iis.rub.de
Abstract. Due to the emerging growth of computer networks, broadly based measurement, monitoring and management become necessary, for example, to solve occurring problems. Many different concepts exist for each of these functionalities. Therefore, distributed network control architectures integrating all of these functionalities are in the focus of current research. To take advantage of such architectures, advanced data aggregation and management schemes are required, because efficient access to the distributed data is critical in this case. In this paper, we present a data aggregation and management scheme that improves the performance of data handling in Integrated Network Control environments. The Integrated Network Control concept is enhanced by a multidimensional on-line analytical processing (OLAP) scheme. A performance analysis shows that the proposed scheme noticeably improves the overall performance of the Integrated Network Control environment.

Keywords: Business Intelligence, Data Aggregation, Hierarchical Communication, Network Control, Network Measurement, OLAP.
1 Introduction
Computer networks have to be monitored and managed efficiently, with a focus on the performance of the network and its observed elements. Due to the increasing size and complexity of computer networks, broadly based measurement, monitoring and management become necessary, especially if performance problems occur within the network. Many different concepts and tools exist for each of these functions. Because of this large variety, Integrated Network Control architectures have been developed that try to subsume these tools and concepts into a controlled architecture. Instead of using a lot of additional and expensive high-performance hardware, Integrated Network Control environments such as [1] have the advantage of performing measurement, monitoring and management functions using one central system that distributes the functions of network control towards the observed elements, e.g., the traffic-generating hosts. Thereby, these elements are utilized for network control purposes. To work efficiently, [1] uses a hierarchical group communication structure. An efficient data aggregation and management scheme is required to take advantage of this communication structure, because synchronization and analysis are of fundamental importance to the fully distributed approach of Integrated Network Control. Due to the fact that the measured or monitored data is distributed over the elements, accessing this data is critical, because the observed network should not be influenced significantly by the data transfers caused by network control. However, an administrator must be able to access the data when it is required; this ranges from historical analysis to near-term monitoring. Therefore, the data has to be transferred to the administrator, and the distributed data has to be merged and aggregated, respectively. In the field of business intelligence, several approaches to data warehousing with on-line analytical processing (OLAP) exist. These are already in use for data analysis and monitoring purposes, also in distributed systems, but disregard any performance aspects with respect to the underlying computer network. A brief review of related work is given in Section 2.

On the basis of the related work, we propose a data aggregation and management scheme that improves the performance of data handling in Integrated Network Control environments. To this end, a central data warehouse including an OLAP engine is integrated into the data-handling component of the Integrated Network Control Framework (INControl-F) [1]. The enhanced data management allows an efficient analysis of live and historical data using composed multidimensional data models. Data aggregates are pre-calculated in a number of operational data stores. The access by and the transfer to a data warehouse are managed by a specially designed data management component. Thereby, the performance of the hosts and the computer network is not influenced negatively.

The remainder of this paper is structured as follows. The Integrated Network Control architecture is outlined in Section 3. In Section 4, the proposed data aggregation and management scheme is presented. The general improvements induced by the proposed scheme are discussed in Section 5 and confirmed by the results of a performance analysis presented in Section 6. We conclude our contribution in Section 7.

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 269–282, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
2 Related Work
Data warehousing and, accordingly, different types of OLAP are used in several business segments, e.g., e-commerce or clinical reporting, as presented by Thomas et al. [2], Xiangdong and Xiao [3] as well as Hamm et al. [4]. Because they are used for business purposes and not for performance-sensitive network control purposes, the influence caused by computation on nodes and by data exchange over a network has been neglected. Therefore, these approaches cannot be used or adopted for network control purposes. To improve the efficiency of computation and analysis of data, Albrecht et al. [5] propose an advanced management of multidimensional data cubes, introducing a context-sensitive model that uses hierarchical data stores. Specialized methods and systems for distributed and parallel OLAP and data warehousing are presented by several researchers. Chen et al. [6] focus on the organization of distributed data
sources in OLAP infrastructures. They present an approach that allows harmonizing the data of different distributed sources according to dynamic patterns and rules. Their prototype confirms the claimed improvements in data organization in distributed OLAP environments. Jianzhong and Hong [7] also show improvements for handling the huge amount of data in parallel and distributed data warehouses; the authors present a scheme that improves range-sum query processing in such OLAP environments. Akinde et al. [8] increase the efficiency of OLAP query processing, which is very critical in distributed OLAP environments. They show that storing data without using an operational data store is impractical. Using an OLAP query translator, queries on the data warehouse can be spread to a number of distributed OLAP elements. Besides organizing and accessing the data of data warehouses, the storage of data within the databases of data warehouses is critical. He et al. [9] focus on improving a storage algorithm for multidimensional OLAP (MOLAP) that uses multidimensional data cubes. The proposed algorithm reduces the time to maintain multidimensional data cubes, because in the majority of cases this task is time-critical. The performance of data cubes is thereby improved on the basis of the hierarchies existing in multidimensional data cubes. A similar concept is presented by Shimada et al. [10], who simplify the dimension dependency in the storage process. The dimension dependency is a central problem when data arrays are accessed in multidimensional data cubes. All of these researchers show different improvements for the aggregation and storage of data. However, they neglect the performance of the systems and networks used for computation and transfer, respectively. Yeung et al. [11] discuss the problem of transferring data from distributed, operational data stores to a central data warehouse.
Their concept allows updating the data warehouse, initiated by the operational data stores whenever data is modified. It allows a centralized near-term analysis, but impairs performance significantly as the number of operational data stores to include increases. Xu et al. [12] introduce an on-demand pull mechanism to address this shortcoming: instead of using an autonomous push mechanism, data is pulled from the operational data stores by a central data warehouse. This concept works more efficiently with an increasing number of operational data stores. Besides, Hose et al. [13] improve the performance of OLAP by means of an a priori preparation of complex aggregation requests. All of the approaches mentioned in this section present data warehousing concepts with on-line analytical processing (OLAP) that are already in use for data analysis and monitoring purposes in distributed systems. However, the impact of these concepts on the performance of the computer network or the system as a whole is ignored, which makes it impossible to use them in Integrated Network Control environments.
3 Integrated Network Control Architecture
In most network control architectures, a lot of additional hardware is required to perform network control. Network control includes functions and
software for network monitoring, measurement and management. For these functionalities it is necessary to place active and passive elements in the network. In [1], an architecture is presented that utilizes up to every client of a network, instead of placing probing and steering elements across the network. To efficiently utilize every client for network measurement, monitoring and management purposes, an efficient communication structure is necessary, because the overhead created by control and command flows, as well as by data migration flows (e.g., consisting of network captures) between a central component and the utilized elements (Fig. 1 (a)), may become critical. To avoid a negative influence on the network and its elements, a dynamic hierarchical group communication structure (see Fig. 1 (b)) is introduced.
Fig. 1. Hierarchical Communication Structure
Instead of using a flat hierarchy, dynamic groups are formed based on selectable parameters (e.g., the logical subnets). Thereby, a central controller (Master) that organizes all tasks is enabled to delegate tasks to group managers (Super-Nodes), which communicate with the clients of their group and forward the corresponding tasks. Super-Nodes are chosen by the Master on the basis of a score that, among other things, includes values for computing power, memory, data transfer rate, uptime and past experience of availability. A fault-detection mechanism allowing a fault-tolerant behavior of this structure is also included: the availability of Super-Nodes is periodically checked by the Master and the structure is changed accordingly. Data consistency, which becomes an issue in this case, is guaranteed by the data aggregation and management scheme presented in the next section. In the following, we use host synonymously with client and node for the instrumented elements. The task forwarding is illustrated in Fig. 2. To improve the efficiency of the architecture and its communication structure, the Super-Nodes apply data aggregation methods to the data retrieved from the clients of their group (see Fig. 2) before it is forwarded to the Master. The proposed data aggregation and management scheme for Integrated Network Control purposes, which uses this communication structure, will be discussed in the next section.
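The Super-Node election score mentioned above could be computed as a simple weighted sum. The attribute names and weights below are invented for illustration; the paper does not specify the exact scoring function.

```python
def supernode_score(metrics, weights=None):
    """Weighted score the Master might use to rank Super-Node candidates.
    `metrics` maps attribute names to normalized values in [0, 1].
    Attribute names and weights are illustrative, not from the paper."""
    weights = weights or {"cpu": 0.3, "memory": 0.2, "bandwidth": 0.2,
                          "uptime": 0.15, "availability_history": 0.15}
    return sum(w * metrics.get(attr, 0.0) for attr, w in weights.items())

def elect_supernode(candidates):
    """Pick the highest-scoring client of a group as its Super-Node."""
    return max(candidates, key=lambda c: supernode_score(c["metrics"]))
```

Normalizing every metric to [0, 1] before weighting keeps incommensurable quantities (CPU power, uptime, bandwidth) comparable.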
Fig. 2. Hierarchical Task Forwarding and Answer Aggregation
Fig. 3. Enhanced INControl-F Architecture
The enhanced Integrated Network Control architecture has been implemented within a framework called INControl-F, illustrated in Fig. 3. It consists of several components that provide the basic functionality required to build the communication structure and to enable the described features. The INControl-Framework runs on the hosts under inspection within a network; no additional hardware except for a managing Master is required. The Connection Service is a lightweight communication component that is responsible for any communication between Master, Super-Nodes and hosts. The Security Service ensures secure and confidential data transmission between the different levels in the communication hierarchy; it works transparently to any component. The Control Layer Service includes sub-components for patching and dispatching messages to or from components (the Message Patcher/Dispatcher), for controlling the framework (the Command Controller), and for the automatic build-up of the hierarchical group communication structure (the Node Structure Builder). An INControl-Plug-In represents a measurement, monitoring or management function that has been included as an attachable component into the framework. To allow an easy integration of plug-ins, a Plug-In Service offers a stable API and essential functions to be used for communication by and with plug-ins. As illustrated in Fig. 3, a Data Aggregation and Management component is included in the architecture. It is responsible for the aggregation and management of all collected data. The novel data aggregation and management scheme used within this component is presented in the next section.
4 Data Aggregation and Management Scheme
The proposed data aggregation and management scheme for Integrated Network Control environments considers the hierarchical communication structure, its various layers and their specific demands. The lowest layer in the hierarchy contains the hosts of a network, on which data is collected and initially aggregated. Per the definition of the Integrated Network Control architecture, computationally intensive data analysis is not performed on the lowest layer. Thus, measured data is transferred to the corresponding Super-Node, which represents the next higher layer in the hierarchy and has sufficient capabilities to serve a number of hosts. Because a Super-Node is part of the network, it may also collect data itself. In our scheme we activate an operational data store on each node that acts as a Super-Node; it contains the collected data received from the lower layer and from the Super-Node itself. Data aggregation and management functions have to be the same on each host, because every host may become a Super-Node. Because of performance restrictions on each host or Super-Node, a Super-Node does not contain its own OLAP engine for multidimensional data analysis in our scheme. Instead, a central data warehouse that includes an OLAP engine to perform the multidimensional data analysis is attached to the Master. The data warehouse automatically builds up the multidimensional data cubes based on the available information of each INControl-Plug-In that generated the corresponding data. To minimize the network overhead created by the Integrated Network Control architecture, collected data from the operational data stores can be transferred to the data warehouse at long intervals, for example when the network utilization is low. Thereby, historical analysis becomes possible. The processing of a request is explained in detail in Fig. 4. First, a request is sent to the data warehouse.
If a request relates to data that has not yet been transferred from the corresponding operational data store to the data warehouse, an aggregate of the requested data, compressed with the aggregation factor klevel, is transferred to the data warehouse. The value of the aggregation factor klevel depends on the current utilization of the network and the corresponding Super-Node, which are checked periodically using statistical measurements from routers in the core network and from the Super-Nodes. Thereby, it can be ensured that neither the network nor the Super-Node is influenced negatively. Finally, the OLAP engine on the data warehouse is used as an MDX/SQL converter, and SQL queries are sent to the corresponding Super-Node to request the data. Since the data structures in each operational data store are identical to the data structures in the data warehouse, short SQL statements can be executed. Afterwards, the result is returned and added to the data warehouse, so that it is available for future requests. Using this concept, near-term analysis can be performed efficiently. Nodes store their measured values in flat files, whereas the Super-Nodes store the data retrieved from nodes in an embedded database and the Master stores the data retrieved from the Super-Nodes in a framework-external database. Because nodes and also Super-Nodes should not be overloaded with data to store
Fig. 4. OLAP Request Processing

Fig. 5. Scheme Components within INControl-F
within their structures, data has to be deleted after some time. The failure of a Super-Node leads to a change of Super-Node for a certain group of nodes. This may become critical if data has already been transferred to the Super-Node, but not yet to the Master. If the data were erased on the nodes directly after being transferred to the next higher level in the communication hierarchy, such a Super-Node failure or a link failure would lead to the loss of data. To counter this problem, data transferred to the next higher level is only erased once the completion of a transfer to the Master has been acknowledged. It can be assumed that the Master, as a high-performance element, offers redundancy in data storage; thereby, the loss of data can be eliminated. The data aggregation and management scheme extends the INControl-Framework with several sub-components, as illustrated in Fig. 5. The active instance of INControl-F on the Master is extended with a data warehouse plug-in (INControl-DWH-Plug-In) that connects INControl-F to a data warehouse. An operational database of the data warehouse is connected to INControl-F through the INControl-Kernel.
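The erase-only-after-acknowledgment rule can be sketched as a small store that keeps every transferred record until the Master's acknowledgment arrives. Class and method names are invented for illustration.

```python
class NodeStore:
    """Keep transferred records until the Master acknowledges receipt,
    so that a Super-Node or link failure cannot lose data (sketch)."""

    def __init__(self):
        self.records = {}       # record_id -> data
        self.unacked = set()    # transferred but not yet acknowledged

    def transfer(self, record_id, send):
        send(record_id, self.records[record_id])  # up to the next hierarchy level
        self.unacked.add(record_id)               # must NOT delete yet

    def on_master_ack(self, record_id):
        self.unacked.discard(record_id)
        del self.records[record_id]               # safe to erase only now

    def retransmit_pending(self, send):
        for rid in list(self.unacked):            # e.g. after a Super-Node failover
            send(rid, self.records[rid])
```

After a Super-Node change, `retransmit_pending` replays everything the Master never confirmed, which is exactly the data that would otherwise have been lost.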
276
P.-B. B¨ ok et al.
The active instance of INControl-F on a Super-Node or node is extended with a business intelligence plug-in (INControl-BI-Plug-In) that connects such an instance to the data warehouse plug-in of the Master or to the business intelligence plug-in of a Super-Node, respectively. Both plug-ins together realize the functions of the data aggregation and management scheme, which utilize the operational database and the data warehouse. The analysis of the data in the data warehouse is enabled through a web application on the basis of the open-source OLAP tool Pentaho Mondrian OLAP.
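The acknowledgement-driven deletion described earlier in this section can be sketched as follows. This is a minimal illustration under our own naming (the class and its methods are not part of the INControl-F API):

```python
class NodeStore:
    """Buffers measured data on a node until the Master confirms receipt.

    Data forwarded to a Super-Node is kept locally, so a Super-Node or
    link failure cannot lose it; only a Master-level acknowledgement
    allows deletion (the Master itself is assumed to store redundantly).
    """

    def __init__(self):
        self.buffered = {}  # transfer_id -> list of measured values

    def forward(self, transfer_id, values):
        # Hand the data to the next higher level, but keep a local copy.
        self.buffered[transfer_id] = list(values)
        return list(values)  # payload sent towards the Super-Node

    def on_master_ack(self, transfer_id):
        # Only now is it safe to erase the local copy.
        self.buffered.pop(transfer_id, None)

    def on_supernode_failure(self, transfer_id):
        # Unacknowledged data is still buffered and can be re-sent.
        return self.buffered.get(transfer_id)
```

A node that forwards a batch and then observes a Super-Node failure can thus simply re-send the still-buffered batch to the newly elected Super-Node.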
5 General Improvements
The data aggregation and management scheme described before leads to several improvements discussed in this section. Because network control data is additional overhead for the network, it should not influence operative business applications using the network. Using the aggregation factor k_level, the granularity of the data of interest to be transferred can be varied flexibly. Thereby, the amount of data from the INControl-Framework that has to be transferred over the network to the central data warehouse can be reduced, as confirmed in the next section.

The aggregation of data within the given hierarchical communication structure is performed bottom-up, starting at the hosts with an aggregation factor of k_1. This factor defines how strongly the data from a host is aggregated before it is transferred to the Super-Node. As soon as the aggregated data of all nodes is available in the operational data store of a Super-Node, this already aggregated data is further reduced using the aggregation factor k_2. Thereby, the total amount of data transferred from the operational data store of a Super-Node to the data warehouse of the Master is reduced. Aggregated data remains available on the Super-Node until it is requested and transferred to the data warehouse, or until the recurring transfer to the data warehouse is performed. Thereby, the total amount of transferred data within the hierarchical communication structure can be significantly reduced, as shown in the following.

In Fig. 6 the ideal impact of the aggregation factor on each level in the communication hierarchy is illustrated. The total amount of data to be transferred from i ∈ {0, ..., n} clients to the operational data store of a corresponding Super-Node j ∈ {0, ..., s}, with each client collecting an amount of d_{i,j}(ΔT) data within period ΔT, is

\[
\sum_{i=0}^{n} \frac{d_{i,j}(\Delta T)}{k_1}.
\]
Hence, the total amount of data to be transferred from the s Super-Nodes j to the central DW, considering the amount of data d_j(ΔT) measured within period ΔT on the Super-Node itself, is

\[
\sum_{j=0}^{s}\sum_{i=0}^{n} \frac{d_{i,j}(\Delta T)}{k_1 \cdot k_2}
\;+\; \sum_{j=0}^{s} \frac{d_j(\Delta T)}{k_1 \cdot k_2}
\;<\; \sum_{j=0}^{s}\sum_{i=0}^{n} d_{i,j}(\Delta T)
\;+\; \sum_{j=0}^{s} d_j(\Delta T),
\]
Efficient Data Aggregation and Management
277
[Figure: tree diagram. Clients C_1 ... C_q feed the Super-Nodes SN_1 ... SN_s using aggregation factor k_1, so each Super-Node receives Σ_{i=0}^{n} d_{i,j}(ΔT)/k_1; the Super-Nodes feed the Master using aggregation factor k_2, so the Master receives Σ_{j=0}^{s} Σ_{i=0}^{n} d_{i,j}(ΔT)/(k_1 k_2) + Σ_{j=0}^{s} d_j(ΔT)/(k_1 k_2). Legend: n: clients i excluding Super-Nodes; s: Super-Nodes j; d(ΔT): measured data within ΔT; k_1: aggregation factor on the clients; k_2: aggregation factor on the Super-Nodes.]
Fig. 6. Impact of the aggregation factor
which is smaller than without aggregation. Because k_1 > 1 and k_2 > 1, the aggregation has a significant impact on the number of values and the amount of data transferred. To demonstrate this, consider four Super-Nodes, each having three hosts beneath it; a group of hosts with the corresponding Super-Node thus consists of four measuring hosts. Each node or Super-Node creates a measured value once per second. The total network control overhead created by the transferred data can be reduced according to the aggregation factors k_level. For example, in the scenario described before, nearly 75% of the amount of data to be transferred can be saved using the proposed scheme if the aggregation factors k_1 and k_2 are both chosen to be 2. Using these aggregation factors, a flexible reaction to different demands becomes possible.

For real-time data warehouse architectures, an essential requirement is to send data to a central data warehouse as fast as possible. The data warehouse therefore has to manage a large number of connections. On the one hand, this loads the server; on the other hand, real-time traffic is created even when the network is close to its limit. If monitoring requests are only rarely required, these disadvantages cannot be tolerated, because network control is merely an additional application to observe the vitality of a network. The proposed scheme overcomes these disadvantages. The Super-Nodes reduce the operational burden of the data warehouse: the data warehouse does not have to address all hosts individually, but only has to manage the aggregated data from the Super-Nodes. Additionally, an initial aggregation of the collected data can be performed on the nodes. The dynamic variation of the aggregation factors k_level based on the current utilization of Super-Nodes, hosts and network satisfies the requirement that none of these elements should be affected by the Integrated Network Control architecture at any time.
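The 75% figure can be checked by evaluating the transfer-volume formula for the example scenario (four Super-Nodes with three hosts each, one value per second, k_1 = k_2 = 2); the function name is ours:

```python
def transferred(d_host, d_sn, n, s, k1=1.0, k2=1.0):
    """Total data arriving at the Master per period, after aggregation.

    d_host: data collected per host; d_sn: data measured on a Super-Node
    itself. Per the inequality above, both are divided by k1*k2 on the
    way to the Master.
    """
    return s * n * d_host / (k1 * k2) + s * d_sn / (k1 * k2)

# Example from the text: 4 Super-Nodes, 3 hosts each, 1 value/second.
baseline = transferred(d_host=1.0, d_sn=1.0, n=3, s=4)               # k1 = k2 = 1
aggregated = transferred(d_host=1.0, d_sn=1.0, n=3, s=4, k1=2, k2=2)

saving = 1 - aggregated / baseline
print(f"saved: {saving:.0%}")   # saved: 75%
```

With k_1 = k_2 = 2 every value stream is divided by four, so exactly 75% of the ideal (uncompressed) transfer volume is saved, matching the figure quoted above.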
6 Results
As shown in this section, the results of the performance analysis are consistent with the general improvements motivated in the former section. For the performance analysis, a basic scenario is constructed that consists of five nodes (Dell OptiPlex 760, E7000-Series Core2Duo, 2 GB DDR2 RAM, Windows Vista SP2), including two Super-Nodes that are both connected to the Master, thereby forming an unbalanced tree. The INControl-Framework on every node controls a monitoring tool that creates a measured value once per second. For the performance analysis, the content of such a value does not matter. After one thousand measured values, the test is stopped and the values are transferred to the Master, passing the corresponding Super-Nodes. The OLAP view within the data warehouse on the Master is configured to use three measures over five dimensions and eleven dimension levels overall to restructure the 5000 values received from the nodes. In the basic scenario, no aggregation of data is performed and k_1 and k_2 are set to 1. Besides aggregation, compression is applied before transferring the data from the nodes to their Super-Nodes and from them to the Master.

In comparison to the basic scenario, the measured values are aggregated at the Super-Node using our scheme in three other scenarios, according to the configured measures mentioned before. In each of these scenarios, k_2 is set to 1. In the first aggregation scenario, the measured values are aggregated with k_1 = 2. Note that this does not mean building the average value, but creating a value based on the defined measures for each aggregation interval. The two other aggregation scenarios aggregate the measured values with k_1 = 5 and k_1 = 10, respectively. The number of measured values transferred to the Master in each scenario is presented in Fig. 7. As mentioned before, the total number of values transferred to the Master without aggregation is 5000.
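The interval-wise aggregation described above, which emits one record per configured measure set instead of a plain average, can be sketched as follows (the concrete measure set is an assumption for illustration only):

```python
def aggregate(values, k, measures=(min, max)):
    """Collapse each window of k raw values into one record of measures.

    With k = 2 the number of emitted records is roughly halved; the last
    window may be shorter, which is why the reduction is never exact.
    """
    out = []
    for start in range(0, len(values), k):
        window = values[start:start + k]
        out.append(tuple(m(window) for m in measures))
    return out

samples = [3, 7, 4, 4, 9]
print(aggregate(samples, 2))   # [(3, 7), (4, 4), (9, 9)]
```

Note that five raw values yield three records rather than 2.5, mirroring why the measured counts below are "a few percent more" than the ideal fraction.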
After aggregation with k_1 = 2 is applied to the measured values, the number of transferred values is nearly half of that without aggregation. Because enough measured values are not available at every moment of the aggregation interval, the number of transferred values is not exactly half, but a few percent more. This also applies to the other aggregation scenarios. Accordingly, the number of transferred values decreases nearly linearly if the aggregation factor is set to 5 or 10, respectively. In contrast to the number of values transferred to the Master, which is the overhead normally created by network control, the amount of compressed data representing
[Figure: bar chart, number of values transmitted over the value of the aggregation factor: 5000 (factor 1), 2517 (factor 2), 1010 (factor 5), 517 (factor 10).]
Fig. 7. Amount of measures transmitted depending on aggregation
[Figure: bar chart, KiBytes transferred over the value of the aggregation factor: 2026 (factor 1), 1215 (factor 2), 968 (factor 5), 869 (factor 10).]
Fig. 8. Amount of compressed data transmitted depending on aggregation
[Figure: bar chart comparing, per aggregation factor, the amount of transferred data and the amount of transferred measured values relative to the non-aggregated case: factor 1: 100% / 100%; factor 2: 60% / 50%; factor 5: 48% / 20%; factor 10: 43% / 10%.]
Fig. 9. Aggregation Improvements
these values that are transferred to the Master does not decrease linearly with the aggregation factor. As shown in Fig. 8, when roughly half of the original number of values is transferred, the amount of data is still nearly 60% of the original. In the other scenarios, one can see that the amount of data is not reduced linearly like the number of values. When the aggregation factor is set to 10, the number of values is nearly 10% of the original, but the amount of data is still above 40% of the amount with no aggregation activated. These results were expected, because compression functions work more efficiently if the number of measured values is higher, so the efficiency of the compression algorithm must decrease with a decreasing number of measured values.

The comparison of the number of transferred values and the amount of transferred data shows large differences with an increasing aggregation factor and a constant number of measured values created on the nodes (see Fig. 9). The number of transferred values decreases linearly with the aggregation factor k_level, whereas the amount of transferred data is nearly constant once the aggregation factor reaches 10. From these results we can derive that the efficiency of our scheme is driven not only by the aggregation factor, but also by the number of measured values, which drives the efficiency of the compression algorithm. To verify this, additional large-scale tests have been performed. After the frequency of creating measured values was increased, more measured values were retrieved and the compression algorithm worked more efficiently for k_1 = 5 and k_1 = 10.

As confirmed, the performance of the network can be increased by the proposed data aggregation and management scheme. But the influence of the scheme on the performance of the nodes, Super-Nodes and the Master, i.e. the elements that run these additional components, is also of special interest.
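The observed dependence of compression efficiency on the number of measured values is easy to reproduce with a general-purpose compressor; zlib is used below merely as a stand-in, since the paper does not name the compression algorithm employed, and the record format is invented for illustration:

```python
import zlib

def ratio(n_values):
    """Compressed size relative to raw size for n CSV-like records."""
    raw = b"".join(b"node-03;2010-11-04T12:00:%02d;cpu=17.5\n" % (i % 60)
                   for i in range(n_values))
    return len(zlib.compress(raw)) / len(raw)

small, large = ratio(50), ratio(5000)
print(f"{small:.3f} vs {large:.3f}")
assert small > large   # fewer records -> worse compression ratio
```

The fixed per-stream overhead and the smaller amount of exploitable redundancy make the ratio for a small batch strictly worse, which matches the flattening of the data-volume curve for high aggregation factors.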
The average memory usage is nearly 10 MB for each node, 16 MB for each Super-Node and only 50 MB for the central Master. The memory usage on the nodes is not
[Figure: additional CPU time in percent (0 to 25) over a 30-minute run, plotted for INControl-F and the MySQL service on the Master.]
Fig. 10. Additional CPU Usage (Master)
CPU-Time in %
25 20 15 10 5 0 00:05:00
00:10:00
INControl-F
00:15:00
00:20:00
00:25:00
00:30:00
MySQL-Service
Fig. 11. Additional CPU Usage (Super-Node)
influenced by the aggregation of data, because aggregation is controlled by the Super-Nodes or the Master, respectively. The memory usage on the nodes is therefore negligible. The Super-Nodes that control and perform the aggregation of measured values also use an acceptable amount of memory. Finally, the Master uses a larger amount of memory because the OLAP server, which uses Apache Tomcat, and the MySQL database run as additional components on the Master. This is also acceptable because the Master, as central component, is a high-performance element. The Integrated Network Control architecture was designed with the constraint that the utilized elements, except the Master, must not be affected by their additional features. The analysis of the memory usage shows that the additional memory usage is acceptable.

Besides memory usage, additional CPU utilization may become critical. The detailed results of the analysis of the additional CPU utilization are shown in Fig. 10 and Fig. 11. The maximum additional CPU utilization of a simple node is less than 1%. Thus, the results confirm that the additional CPU utilization is also negligible. If the data of more nodes is managed by the Master, the additional CPU utilization of the simple nodes stays the same, and the utilization of a Super-Node's CPU does not increase significantly. Only the CPU utilization of the Master increases, because of the growing amount of values and data to handle.
Summarizing, the results show that the proposed data aggregation and management scheme significantly reduces the network traffic overhead created by Integrated Network Control structures, especially for the INControl-Framework, while the performance of the probing and steering elements is not affected; the computational overhead created by the proposed scheme is thereby negligible.
7 Conclusion
Efficient data aggregation and management is crucial for distributed network control environments. We therefore presented a data aggregation and management scheme that is fitted to the special purpose of distributed network control. The proposed scheme improves the performance of data handling in these environments. By adapting concepts and schemes of data warehousing and on-line analytical processing that are already in use for performance-insensitive applications, a component has been designed that is integrated into the data management of the Integrated Network Control Framework. Because the degree of data aggregation is driven by the utilization of the involved elements, the introduced overhead can be automatically adjusted and minimized. With these enhancements, an efficient analysis of live and historical data from network measurement and monitoring becomes possible without negatively influencing the network or its hosts. Thereby, the network management can be performed using a broad range of data sources. The improvements of the proposed scheme have been verified within several testbed scenarios. The results of the performance analysis confirm that the data management of Integrated Network Control environments is improved by the proposed scheme.
On Self-healing Based on Collaborating End-Systems, Access, Edge and Core Network Components

Nikolay Tcholtchev and Ranganai Chaparadza
Fraunhofer-FOKUS Institute for Open Communication Systems, Berlin, Germany
{nikolay.tcholtchev,ranganai.chaparadza}@fokus.fraunhofer.de
Abstract. Autonomic Networking, realized through control loops, is an enabler for advanced self-manageability of network nodes and, respectively, of the network as a whole. Self-healing is one of the desired autonomic features of a system/network that can be facilitated through autonomic behaviors realized by control loop structures. Autonomicity, implemented over existing protocol stacks as managed resources, requires an architectural framework that integrates the diverse aspects and levels of self-healing capabilities of individual protocols, systems and the network as a whole, such that they all co-operate as required towards achieving reliable network services. This integration should include the traditional resilience capabilities intrinsically embedded within some protocols, e.g. some telecommunication protocols, as well as diverse proactive and reactive schemes for incident prevention and resolution, which must be realized by autonomic entities implementing control loops at a higher level outside of the protocols. In this paper, we present our considerations on how such an architectural framework, integrating the diverse resilience aspects inside an autonomic node, can facilitate collaborative self-healing across end systems, access networks, edge and core network components.

Keywords: Autonomic Fault-Management, GANA-orientated architecture for Autonomic Fault-Management, Resilience, Self-Healing.
1 Introduction

Autonomic Computing as introduced by IBM [1] is based on a MAPE (Monitor→Analyze→Plan→Execute) type of control loop. Such a control loop is also meant to realize self-management features like self-healing, self-protection, self-optimization and self-configuration. According to [1], self-healing is defined as follows: "To detect incidents such as adverse conditions, faults, errors, failures; diagnose, and act to prevent or solve disruptions". That is, on the one hand, a network equipped with self-healing mechanisms should aim at automatically preventing future fault activations, and on the other hand it should detect, diagnose and remove faults (provided that the detected faults can be automatically removed or have their impact reduced to a minimum). This corresponds to a number of concepts and approaches that have been investigated recently, such as Autonomic Fault-Management as well as reactive and proactive resilience in autonomic networks. Autonomic Fault-Management [7][9] is understood as a control loop structure that facilitates the

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 283–298, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
284
N. Tcholtchev and R. Chaparadza
interplay between the processes of Fault-Management as defined by TMN (Telecommunications Management Network) [3], namely: Fault-Detection ("detect the presence of a fault"), Fault-Isolation ("find the fault (root cause) for the observed erroneous state"), and Fault-Removal ("remove or reduce the impact of the root cause"). Moreover, resilience mechanisms have been developed for a variety of communication protocols and technologies, such as restoration schemes in SONET/SDH and recovery schemes in MPLS [RFC4427]. The introduction of control loops that govern the self-management and control of the existing protocols (as managed entities, i.e. managed resources of specific control loops) calls for a framework that integrates an overall picture of the self-healing aspects and the levels at which to reason about self-management within a node/device architecture and the network architecture as a whole. A first draft of such an architecture was presented in our previous work [4] and is briefly described in Section 3. Hitherto, only the application of this framework to a single administrative domain of limited scope was taken into account. In this work, we present our further considerations on how this architectural framework could be used in a multi-domain environment consisting of end systems, access networks, edge routers, and core routers. Thereby, on the network provider's side, we restrict the discussion to the internet service provider's (ISP) network, which enables access to the internet backbone for its subscribers.

The rest of this paper is organized as follows: Section 2 presents the Generic Autonomic Network Architecture (GANA), the reference model for Autonomic Networking and Self-Management on which our self-healing framework is based. Section 3 presents the architectural framework (based on GANA), which reflects the current status of our research on self-healing/resilience mechanisms in autonomic networks.
Section 4 presents the different aspects and mechanisms facilitating the collaboration of end systems, access and core network components towards implementing self-healing mechanisms across the different domains. Section 5 presents a case study and a scenario of how such collaboration can work in practice. Finally, Section 6 provides some concluding remarks.
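As background for the following sections, the MAPE type of control loop can be sketched generically. The sketch below is a toy illustration under names of our own choosing, not IBM's or GANA's implementation:

```python
def control_loop(sense, analyze, plan, execute, steps):
    """Run one Monitor -> Analyze -> Plan -> Execute cycle per step."""
    for _ in range(steps):
        state = sense()             # Monitor the managed entities
        symptoms = analyze(state)   # Detect adverse conditions
        actions = plan(symptoms)    # Decide on a reaction
        for a in actions:
            execute(a)              # Act on the managed entities

# Toy usage: restart a "service" whenever it is observed down.
log = []
svc = {"up": False}
control_loop(
    sense=lambda: dict(svc),
    analyze=lambda s: [] if s["up"] else ["service_down"],
    plan=lambda symptoms: [("restart",)] if symptoms else [],
    execute=lambda a: (svc.update(up=True), log.append(a)),
    steps=2,
)
print(log)   # [('restart',)] - the second cycle finds nothing to fix
```

The same four-phase shape recurs, at different levels of abstraction, in every Decision Element discussed below.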
2 Generic Autonomic Network Architecture

The Generic Autonomic Network Architecture (GANA) is based on a set of requirements derived in [2]. The core autonomic concept in GANA is that of a Decision Element (DE). A Decision Element implements a control loop and manages a set of Managed Entities (MEs) assigned to be autonomically managed and controlled. That is, self-*/autonomic features are realized by Decision Elements implementing control loops within the GANA reference model. Since control loops are possible on different levels of functionality, e.g. on node or network level, GANA defines the Hierarchical Control Loops (HCLs) framework. The HCLs framework establishes four levels of abstraction for which DEs, MEs and the associated control loops can be designed. Level-1: Protocol-Level, i.e. control loops embedded within protocol modules (e.g. within some routing protocol). Level-2: Abstracted-Functions-Level, i.e. DEs managing some abstracted networking functions inside a device, e.g. routing, forwarding, mobility management. Level-3: Node-Level, which consists of a Node_Main_DE that takes care of the management of aspects related to
the state/fitness of the overall node, e.g. Fault-Management and Auto-Configuration. Level-4: Network-Level, i.e. DEs on that level manage different aspects that require to be handled at the network level, e.g. the routing or monitoring of a group of nodes according to a network scope. Thereby, control loops (i.e. DEs) on a higher level manage DEs on a lower level, down to the lowest-level "pure" MEs. Detailed information about all the presented concepts, examples, as well as discussions on the application of GANA to diverse aspects of Autonomic Networking can be found in [2].
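The hierarchy of the four GANA levels, with higher-level DEs managing lower-level DEs down to pure MEs, can be captured in a small sketch; the entity names and attributes are illustrative, not taken from the GANA specification:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    level: int                  # 4=network, 3=node, 2=function, 1=protocol, 0=pure ME
    managed: list = field(default_factory=list)

    def manage(self, child):
        # A DE may only manage entities strictly below its own level.
        assert child.level < self.level, "management must go downwards"
        self.managed.append(child)
        return child

net_de = Entity("Network_Level_Routing_DE", level=4)
node_de = net_de.manage(Entity("Node_Main_DE", level=3))
func_de = node_de.manage(Entity("Routing_Functions_DE", level=2))
proto = func_de.manage(Entity("routing_protocol_module", level=1))
proto.manage(Entity("forwarding_table", level=0))  # lowest-level "pure" ME
```

The assertion encodes the HCLs rule that control loops on a higher level manage only lower-level DEs/MEs.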
3 Unified Architecture for Autonomic Fault-Management, Resilience, and Survivability in Self-Managing Networks

Within the EFIPSANS [17] project, we introduced a Unified Architecture for Autonomic Fault-Management, Resilience and Survivability in Self-Managing Networks (UAFAReS). UAFAReS is based on the observation that the evolution of traditional Fault-Management towards Autonomic Fault-Management enables network devices to exercise self-healing and to recover from faulty conditions. That is, the nodes of the network are then able to automatically self-heal (to some degree) without the need for human intervention. Hence, Autonomic Fault-Management has to interplay with concepts and mechanisms related to Fault-Tolerance, Fault-Masking, and Multilayer Resilience [6]. This implies a harmonization (i.e. an ordered time-scaling of reactions) of the responses to incidents at the different levels of autonomicity and self-management defined by GANA.

UAFAReS is based on the GANA reference model and specifies a number of components which aim at realizing the interplay of the aforementioned aspects. The node components of the architectural framework are illustrated in Figure 1. The main UAFAReS entities in a device are the Fault-Management Decision Element (FM_DE) and the Resilience and Survivability Decision Element (RS_DE). The RS_DE is responsible for an immediate reaction to the symptoms of an erroneous state, while in parallel the FM_DE performs Fault-Isolation and Fault-Removal in order to eliminate the corresponding root cause(s). Both DEs are part of the Node_Main_DE, i.e. they are introduced at node level inside a GANA-conformant device, in order to have exclusive access to all node functional entities (i.e. DEs and MEs), such that the overall autonomic behaviors of a node with respect to coping with incidents and alarms are synchronized to ensure node integrity. The UAFAReS DEs operate based on distributed control loops.
The distributed nature of the UAFAReS control loops is enabled by a number of components that facilitate the exchange of incident information across the network nodes. A set of repositories for storing incident information and an Incident Information Dissemination Engine (IDE) enable the synchronization of the faults/errors/failures/alarms knowledge held by UAFAReS DEs residing in different devices, and allow the DEs to perform Fault-Masking, Fault-Isolation and Fault-Removal in a node-specific manner, based on the same information.

The FM_DE consists of four modules: 1) a component responsible for Fault-Isolation (Fault-Diagnosis/Localization/Isolation functions, abbr. FDLI); 2) Fault-Removal Functions (FRF); 3) Action Synchronization Functions (ASF), responsible for synchronizing (allowing and/or disallowing) tentative actions issued by the RS_DE and the FM_DE control loops running in parallel; 4) Fault-Removal Assessment Functions (FRAF), a component responsible for assessing and verifying the success of the fault-removal actions issued as output of the FM_DE.
Specially instrumented monitoring entities, which have the capability to share incident information over the UAFAReS incident repositories, push descriptions of symptoms to the UAFAReS fault/error/failure/alarm registries, such that the information gets conveyed (i.e. stored in the UAFAReS node registries) by the Incident Dissemination Engine (IDE) to the UAFAReS instances across the network scope, e.g. a subnet/LAN. Once an incident description has been reported to the FM_DE over the UAFAReS incident repositories, it is received and processed first by the FDLI functions. That is, the FDLI functions collect such events and correlate them in order to find the root cause of the observed faulty conditions. Algorithms that can be used for event correlation are presented in [8]. The correlation of incident events is realized by the FDLI functions based on a Causality Model that is kept in the Causality Model Repository (CMR) inside a node. The identified root cause(s) (faults) are then submitted to the Fault-Removal Functions, which implement an "if-then-action" logic that issues the reaction required to eliminate the faults, e.g. a reconfiguration of an entity using the corresponding command line interface (CLI).

Since the tentative reaction could interfere with other actions intended to be performed by the RS_DE control loop (next paragraph), or with a parallel Autonomic Fault-Management control loop process (i.e. a thread in a multi-threading environment), the ASF should be invoked in order to allow or disallow the tentative action in question. The ASF is based on techniques from the area of optimal control and selects the optimal subset of tentative actions in order to optimize the network performance, reflected by the values of selected key performance indicators, while ensuring integrity. Given that the ASF has allowed a tentative action, the FRF issues it on the MEs in question inside the device.
Thereby, the FRF can make use of information regarding the dependencies among protocol entities and services, kept in the Dependability Model Repository (DMR). Finally, the success of the executed action is assessed by the FRAF functions, which may choose to notify the network operator in case the UAFAReS mechanisms cannot cope with the pending challenges.

The Resilience and Survivability DE contains the Fault-Masking Functions (FMF) component and a Risk Assessment Module (RAM). The Fault-Masking Functions realize a reaction immediately after the symptoms of a faulty condition have been registered in the UAFAReS alarm/incident repositories. Thereby, the goal of the FMF is to implement a fault-tolerant behavior such that some fundamental level of service can be sustained in the face of a pending challenging condition. The FMF follow a similar logic as the Fault-Removal Functions of the FM_DE and, as the instance reacting first, consult the Action Synchronization Functions of the FM_DE in order to ensure that the best possible set of actions is executed. The FMF should also consider the aspect of Multilayer Resilience [6] while orchestrating a fault-tolerant/masking behavior. Multilayer Resilience is a model that deals with the capabilities of functional entities at different layers in the protocol stack to execute their own embedded resilience behaviors. For instance, in IP networks, generated ICMP messages enable systems (especially end systems) to overcome issues occurring in the network, e.g. sudden changes of the PMTU (Path Maximum Transmission Unit) during the lifetime of a connection. Thus, the FMF is expected to allow the protocol modules to recover based on their own intrinsic capabilities and should intervene only in the case when these mechanisms fail. [6] proposes the usage of "hold-off" timers specifying the time that should be given to a protocol to recover on
its own. Information on how to handle the resilience properties of a protocol module (e.g. the protocol module ID and the corresponding "hold-off" timer) is kept in the Multi-Layer Resilience Properties Repository. In addition, the operation of the Risk Assessment Module (RAM) is based on monitoring information about diverse key performance indicators (e.g. the CPU temperature) that is used to calculate the probability of failures in the future. This results in notifications to the FMF, which consequently have to trigger mechanisms that help to proactively avoid significant degradations in QoS in the future.
Fig. 1. The UAFAReS [4] architecture inside an Autonomic Node
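The hold-off behavior described above, giving a protocol module time to recover on its own before the FMF intervenes, can be sketched as follows; the repository content and the function signature are illustrative assumptions, not part of UAFAReS:

```python
import time

# Multi-Layer Resilience Properties Repository (illustrative content):
# per protocol module, how long it may try to recover on its own (seconds).
HOLD_OFF = {"routing_protocol": 2.0, "mpls_recovery": 0.05}

def masking_reaction(module, recovered, now=time.monotonic, first_seen={}):
    """Return the FMF decision for a symptom reported against `module`.

    The protocol is granted its hold-off period to self-heal before the
    Fault-Masking Functions intervene.
    """
    if recovered():
        first_seen.pop(module, None)
        return "recovered"
    t0 = first_seen.setdefault(module, now())
    if now() - t0 < HOLD_OFF.get(module, 0.0):
        return "wait"        # let the protocol's own resilience act
    return "intervene"       # hold-off expired -> FMF takes over
```

A first symptom for a module with a 2-second hold-off yields "wait"; repeated symptoms after the timer expires yield "intervene", and a recovery observed at any point clears the timer.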
Hitherto, we considered the application of UAFAReS to a single domain of limited scope. In this work, we present our view on how UAFAReS can be used in a specific multi-domain environment consisting of end systems, access networks, edge routers, and core routers implementing UAFAReS. Thereby, we have identified that what differs from domain to domain is the type of knowledge and models that the UAFAReS instances are supplied with. Hence, it is imperative that, on the one hand, effective collaboration, i.e. knowledge/information exchange, is facilitated, and that, on the other hand, the confidentiality and integrity of the information exchanged between the domains is ensured.
4 Aspects of Collaboration

In this section, we present the different aspects of collaboration among end systems, access network entities, and core network components. These aspects should be considered when implementing self-healing based on collaboration of network entities across multiple network segments. Furthermore, we attempt to identify situations and issues where such collaboration can be beneficial for enabling fault
resolution. Thereby, we assume that the devices across the different networks are equipped with UAFAReS components, and we specify the collaboration aspects as interactions between the UAFAReS instances.

4.1 Means for Auto-collaboration

In general, the basis for UAFAReS collaboration across the domains in question is the exchange of different types of information, either during the device/terminal subscription phase for the end systems, or during the phase in which they are utilizing the ISP’s network. The information used by a UAFAReS instance can be classified as follows: 1) runtime information about detected incidents, stored in the incident repositories of a UAFAReS-implementing device; 2) a Causality Model that describes potential chains of events (fault→error …→failure), stored in the Causality Model Repository of a node; 3) information about the resilience properties of involved network components, stored in the Multi-Layer Resilience Properties Repository; 4) policy configurations for the FRF, FMF and FRAF modules of the FM_DE and the RS_DE. Figure 2 gives an overview of the information flows required to facilitate collaborative self-healing. In the following subsections, we discuss the importance of the different types of information outlined in Figure 2, distinguishing between the subscription phase of a device to the ISP network and the operating phase in which the network is utilized by the end system for services and applications.
Fig. 2. Aspects of collaboration
4.2 Auto-configuration During the Subscription Phase for a Terminal: Enabling UAFAReS-Based Self-healing

The end systems are expected to share with the operator’s network information about their settings and configurations, e.g. OS (Operating System) type and characteristics,
type and version of the employed protocol stack including enabled protocol features such as PMTU discovery, applications and services hosted on the end system, etc. This information is denoted as device capabilities and is provided during the subscription phase of the device into the operator’s network. The device capabilities can be seen as the accumulated capabilities of the plethora of software and hardware components of an end system. As described in Figure 3, the capabilities are required by the ISP network in order to select the appropriate UAFAReS-related configurations, which are sent back to the end system. This information may include (as illustrated in Figure 2) a Causality Model related to potential problems that can occur in the network, such that the end system can better understand certain anomalies; Fault-Removal/Masking policies which enhance the set of such policies already in place on the end system; and information about Multilayer Resilience aspects related to the core network, which allows the UAFAReS instance on the end system to take into account the intrinsic resilience mechanisms of the lower communication layers of the network. An interesting research issue is that of the information model that facilitates the processes in Figure 3. The current trends in telecommunications strive to reduce the usage of proprietary models and promote the usage of a single standardized model such as CIM [14]. For the purpose of self-description by end systems, one has to be aware that due to different standards and technological domains, the exchanged information will suffer major drawbacks because of the lack of a standardized terminology. Hence, the usage of semantics-based model concepts such as ontologies is imperative for the process of self-description and auto-configuration for UAFAReS.
That is, the end system must submit (to the network) its capabilities in an ontological format such that the obstacle of different terminologies across software/hardware vendors, standardization bodies, etc. can be avoided.
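The capability-to-configuration mapping described above can be sketched as follows. The capability vocabulary, policy names and configuration fields here are our own illustrative assumptions; UAFAReS does not define this interface.

```python
# Illustrative sketch of the subscription-phase exchange: the network maps
# submitted device capabilities to UAFAReS-related configurations.
# All names and the capability vocabulary are assumptions, not part of UAFAReS.
def select_configurations(capabilities):
    config = {"causality_model": "generic", "fm_policies": [], "mlr_properties": {}}
    if capabilities.get("pmtu_discovery"):
        # Supply resilience hints for PMTU-related incidents
        config["fm_policies"].append("readjust-pmtu-on-packet-too-big")
        config["mlr_properties"]["ip"] = {"hold_off_s": 2}
    if "dns" in capabilities.get("services", []):
        config["fm_policies"].append("switch-dns-on-unavailability")
    return config

device_caps = {"os": "linux", "pmtu_discovery": True, "services": ["dns", "ftp"]}
cfg = select_configurations(device_caps)
print(cfg["fm_policies"])
# ['readjust-pmtu-on-packet-too-big', 'switch-dns-on-unavailability']
```

In an ontology-based realization, the dictionary keys would be replaced by concepts from a shared vocabulary, so that different vendors' self-descriptions match the same rules.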
[Figure 3 shows a message sequence between End System, Access Network, and Network: 1. subscribe and get authenticated; 2. submit device capabilities; 3. forward device capabilities; compile/select UAFAReS-related configurations for the end system; 4. respond with UAFAReS-related configurations; 5. deliver UAFAReS-related configurations to the end system.]

Fig. 3. UAFAReS auto-configuration of an end system
Another aspect that must be addressed when specifying the Causality Model as well as Fault-Removal/Masking policies to be supplied to an end system, is that of confidentiality of the exported information. If not designed properly, the aforementioned models and policies may contain information that directly or indirectly reveals
some sensitive information about the operator’s network. Such information can be easily exploited by hackers. Therefore, it may be useful to use only cryptographically generated strings (i.e. to avoid full descriptions of the events) in the version of the Causality Model which is given to an end system, or to delegate as many reactions (Fault-Masking/Removal) of the control loops as possible to the edge or access network components.

[Figure 4 shows a UML information model: a Survivability Requirement (SR) with +UrgencyLevel : int is specialized into an Alarm-based SR (+AlarmID, +AlarmDescription, +Keywords : string; +PerceivedSeverity : int; +correlatedNotifications : string) and an Incident-based SR (+incidentType, +incidentID, +Keywords : string). An SR is associated with Threshold Information (+ThresholdDescription : string, +ThresholdLevel : int) via the +threshold role, with Entity Information (+EntityID, +EntityDescription, +EntityLocation : string) via the +Provider Entity, +Detecting Entity and +location roles, and with Service Information (+ServiceID, +ServiceDescription : string) via the +Provided Service role.]

Fig. 4. Information Model for Survivability Requirements
Additionally, after getting to know the capabilities of the new subscriber, the autonomic mechanisms of the network have to prepare the information flows from the network to the subscriber such that the end system can access information about incidents in the ISP network and can react according to the aforementioned Fault-Masking policies. This means that the network mechanisms formulate Survivability Requirements (SR) for the applications and services on the newly subscribed end system. A survivability requirement expresses the time frame within which an entity requires notification of incidents, enabling it to employ its own mechanisms to avoid failure or degradation beyond an unacceptable level of service. These SRs define “filters” and specify the incident information that should be conveyed from the network to the end system. Furthermore, the end system can explicitly express its SRs to the network UAFAReS mechanisms as mentioned in [7]. The SRs are used as filters by the IDEs across the ISP‘s network in order to enable automatic notifications towards the end system upon the detection of matching incident events. Figure 4 depicts the information model of a Survivability Requirement that we propose. This model is based on alarm/incident event descriptions as recommended by ITU-T [15] and CIM [14]. Due to space limitations, we omit the detailed description of the different attributes, which are largely self-explanatory.
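The filter role of an SR can be sketched as follows, following the attributes of the Incident-based SR in the information model of Figure 4. The matching logic itself is our own illustration, not defined by the paper.

```python
# Minimal sketch of a Survivability Requirement used as an incident filter.
# Field names follow Figure 4 (incidentType, Keywords, UrgencyLevel);
# the matching rule (type equality plus keyword overlap) is an assumption.
class IncidentBasedSR:
    def __init__(self, incident_type, keywords, urgency_level):
        self.incident_type = incident_type
        self.keywords = set(keywords)
        self.urgency_level = urgency_level

    def matches(self, incident):
        # Notify the subscriber only for incidents it declared interest in
        if incident["type"] != self.incident_type:
            return False
        return bool(self.keywords & set(incident.get("keywords", [])))

sr = IncidentBasedSR("packet-delivery", {"pmtu", "black-hole"}, urgency_level=1)
incident = {"type": "packet-delivery", "keywords": ["pmtu", "icmp-suppressed"]}
print(sr.matches(incident))  # True: type and at least one keyword match
```

An IDE would evaluate each detected incident against the SRs registered for its subscribers and forward only the matching ones.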
4.3 Collaboration and Information Flow during the Operation Phase

After an end system has subscribed to the ISP over the access network, selected types of monitoring information are expected to flow between the end system, the access network and the ISP’s network. Figure 5 describes a generic scheme of how an end system can mask the local impact of an erroneous state that has been detected in the core network (step 1). The node in the core network on which the erroneous state was detected reasons about the need to send the incident description to the corresponding end system; e.g. the decision could be based on a previously expressed/generated Survivability Requirement and/or on the end system being one of the end points of a flow in which traffic anomalies were detected. The Edge of the ISP network then intercepts (step 3) the message (after it has been sent in step 2) and checks whether the information inside can be sent to the end system based on the security policies in place. Obviously, such an incident message can contain information which the operator does not want to share in order to keep certain structures and configurations of its network secret. Hence, the edge should be equipped with a policy that allows, disallows, or manipulates incident descriptions being sent to end systems. Given that the message has been allowed or anonymized, steps 4 and 5

[Figure 5 shows a message sequence between End System, Access Network, Edge, and ISP Core Network: 1. detect an incident; 2. send detected incident to end system; 3. process incident forwarding request; 4. forward (anonymized) incident description; 5. deliver detected incident to end system; 6. mask local impact of the erroneous state.]

Fig. 5. An end system masking the local impact of an erroneous state in the ISP network
[Figure 6 shows a message sequence between End System, Access Network, Edge, and ISP Core Network: 1. detect an incident; 2. send detected incident to end system; 3. process incident forwarding request; 4. forward (anonymized) incident description; 5. deliver detected incident to end system; 6. perform Fault-Isolation; 7. perform Fault-Removal.]

Fig. 6. An end system performing Fault-Isolation and Fault-Removal in case the incident, detected in the ISP network, indicates a fault (root cause) on the end system
deliver it to the end system. Based on the Fault-Masking policy, which has been supplied during the subscription phase of the end system (Figure 3), the end system is expected to react and mask/mediate the local impact of the erroneous state indicated by the received incident description. In addition to the mediation behavior of the end system, which is realized by the Resilience and Survivability DE of UAFAReS, we also consider a behavior like the one described in Figure 6. In this case, the end-system FM_DE is activated and performs Fault-Isolation, and consequently Fault-Removal, in case the root cause of the traffic anomaly observed in the core network is located on the end system. The logic that facilitates these processes includes the Causality Model and the Fault-Removal policies, which must have been supplied during the subscription phase of the end system. Furthermore, monitoring information, i.e. metric values or incident descriptions of observed anomalies, may flow from the end systems to the ISP core network, thereby enabling self-healing in the access network, the edge and the core network. The signaling that realizes fault-tolerant behaviors in the core network is described in Figure 7. In addition, the sequence of interactions and actions towards eliminating the root cause(s) of a faulty condition in the core network is illustrated in Figure 8.
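The edge's allow/anonymize/disallow decision from step 3 can be sketched as follows. Replacing sensitive event descriptions with cryptographically generated strings follows the suggestion in Sect. 4.2; the policy values and field names are our own assumptions.

```python
import hashlib

# Hypothetical list of fields considered topology-sensitive by the operator.
SENSITIVE_FIELDS = {"router_id", "link_id"}

def process_incident_forwarding(incident, policy="anonymize"):
    """Sketch of the edge policy: allow, anonymize, or drop an incident
    description before it leaves the operator's network."""
    if policy == "drop":
        return None
    if policy == "anonymize":
        out = dict(incident)
        for field in SENSITIVE_FIELDS & out.keys():
            # Replace topology details with an opaque, stable token
            out[field] = hashlib.sha256(str(out[field]).encode()).hexdigest()[:12]
        return out
    return incident  # policy == "allow"

incident = {"type": "pmtu-change", "router_id": "R2", "new_pmtu": 1500}
fwd = process_incident_forwarding(incident)
print(fwd["new_pmtu"], fwd["router_id"] != "R2")  # 1500 True
```

The end system can still correlate repeated incidents via the stable token, while the operator's router identities stay hidden.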
[Figure 7 shows a message sequence between End System, Access Network, Edge, and ISP Core Network: 1. detect an incident; 2. send detected incident to ISP network; 3. forward detected incident description; 4. forward detected incident description; 5. perform Fault-Masking inside the core network.]

Fig. 7. Fault-Masking in the ISP network based on information supplied by an end system
[Figure 8 shows a message sequence between End System, Access Network, Edge, and ISP Core Network: 1. detect an incident; 2. send detected incident to ISP network; 3. forward detected incident description; 4. forward detected incident description; 5. perform automatic Fault-Isolation; 6. perform automatic Fault-Removal.]

Fig. 8. Autonomic Fault-Management in the ISP network, including automatic Fault-Isolation and Fault-Removal, based on information supplied by an end system
[Figure 9 shows a message sequence between End System 1, Access Network, and End System 2: 1. detect an incident; 2. share incident; 3. forward detected incident description; 4. perform Fault-Masking and mediate the local impact of the faulty condition.]

Fig. 9. Information sharing between end systems and consequent Fault-Masking of the local impact of a faulty condition
The sharing of incident event descriptions among end systems can enable UAFAReS-based resilient behaviors on the subscribed end systems, as illustrated in Figure 9. This sharing can be realized over the SOHO network or directly over the communication medium of the access network. It is also possible that different access network components serving different subscribers collaborate and enable the exchange of incident information between the subscribed end systems. An example of such autonomic behavior, facilitated by the UAFAReS incident sharing mechanisms among end systems, is given in the next subsection.

4.4 Faulty Conditions Resolvable by UAFAReS-Based Auto-collaboration

In this section, we give some examples of issues which can be resolved when different access/core nodes and end systems implementing UAFAReS collaborate.

Black Holes: The term Black Hole denotes a family of packet delivery problems in which the physical (and in some cases even logical) connectivity between two systems is present, but the packets sent between the two nodes (e.g. hosts) do not reach their destination. The erroneous state results from the fact that the systems, even though they have the capabilities to react to the fault activation, are not notified of packet delivery failures and cannot even localize the fault. That is, the sender may continue sending packets without detecting the packet loss problem and cannot react. The forwarding nodes are likewise unaware of the problem and do not adapt their behavior correspondingly. The phenomenon of Black Holes has been extensively studied and its relevance for ISP operators is well known [12], [13]. Black Holes may occur due to: 1) loss of connectivity because of an incorrect PMTU or broken tunnels, e.g. MPLS; 2) software bugs; 3) delayed routing protocol convergence, etc.
The UAFAReS architecture can remediate some Black Hole issues by detecting and isolating them, reconfiguring the corresponding core network components, and informing the end systems in question so that they adapt their behaviors.

Duplex mismatches: IEEE 802.3 Ethernet networks are an established communication environment for LANs. These networks support an auto-negotiation procedure that allows the Ethernet interfaces on the involved nodes to automatically obtain and set up the optimal parameters on a link. The aforementioned parameters include the speed of the interface cards (10 Mbit/s, 100 Mbit/s, 1000 Mbit/s), the duplex mode, as well as
flow control information. While this procedure makes setting up a network easier, it can lead to a mismatch in the settings assumed by the two NICs (Network Interface Cards) involved. For instance, a failure of the auto-negotiation procedure may result in a mismatch of the duplex configuration on two peer interfaces, i.e. one operates in full and the other in half duplex mode. This can seriously cripple the network and lead to performance degradation. Such duplex mismatches can occur in the SOHO network attached to the access network, as well as in the core or access network. In [11], the symptoms of such Ethernet duplex mismatches are investigated. The potential symptoms include duplicate ACKs in TCP flows, fluctuations in the CWND (congestion window) size of TCP connections, as well as increased collisions on the MAC layer and frame losses at the nodes' NICs. Some of these anomalies can be detected only on the end systems, e.g. CWND size fluctuations. On the other hand, in case a duplex mismatch is identified somewhere along a path, the UAFAReS framework would disseminate the information to the involved network components and enable/implement diverse resilient behaviors. That is, the nodes attached to the misconfigured link would restart the corresponding NICs in order to re-initiate duplex auto-negotiation. Given that the involved end systems have been notified about the issue (Figure 5), the UAFAReS mechanisms could hold outgoing traffic for a period long enough for the forwarding nodes along the path to recover. That way, traffic that would certainly be lost during the fault removal phase is not unnecessarily pushed into the network, and the user will probably experience just a short delay.

TCP MSS size problems: [RFC2923] describes a case in which a Maximum Segment Size (MSS) smaller than the actual PMTU would allow is advertised and used in a TCP connection.
This leads to a limited segment size for the TCP connection between end systems, and correspondingly to QoS degradation for TCP applications using the affected TCP session. UAFAReS mechanisms (monitoring sensors) inside the network could detect such symptoms, identify the aforementioned root cause of the QoS degradation, and inform the UAFAReS instance on the end system in question to readjust its MSS settings.

(D)DoS detection: We hold that the mechanisms of the UAFAReS architecture can also be employed for security-related issues, realizing a self-protecting/defending functionality. Recently, a lot of research has been conducted in the area of intrusion detection. The study and classification of traffic anomalies towards the detection of malicious software preparing a Distributed Denial of Service (DDoS) attack is an ongoing research topic, the results of which can be easily integrated with the UAFAReS incident sharing mechanisms. In this way, end systems that detect such suspicious activities may send this information to the UAFAReS instance at the edge of the ISP's network. After having correlated these notifications and identified the threat of a DDoS attack, the UAFAReS instance on the edge can implement a policy in order to counter the intended attack.

Corrupted services, e.g. DNS: Given that an end system or any functional entity has detected a malfunctioning service, the UAFAReS incident sharing mechanisms can be employed to disseminate this information and inform other potentially affected end systems. For instance, an end system may detect the unavailability of a DNS server
and use the UAFAReS mechanisms to share this information, enabling other end systems to switch to another DNS server as long as the primary one is offline.

Maintenance activities: According to [5], maintenance activities are responsible for around 20% of the failures observed in an operational IP backbone. The UAFAReS architecture can potentially detect the corresponding outage, identify its reason, and apply, for example, admission control policies in order to guarantee the best possible QoS for the already subscribed end systems.

PMTU issues: Traditionally, IP protocols implement a PMTU (Path Maximum Transmission Unit) discovery procedure on end systems. PMTU discovery is performed before data transmission starts. This enables the end systems to obtain a valid PMTU for a connection towards another end system using the services of an operator's network. However, it is possible that the PMTU towards a host decreases during the lifetime of a connection. In such cases, the router at which the PMTU has decreased will fail to forward a packet and is expected to respond to the end system with an ICMP message indicating the packet loss and its reason. The IP module on the sending end system is expected in turn to readjust its PMTU settings towards the receiver. However, this procedure is very often not possible for reasons such as firewall ICMP suppression on routers, or simply because the traffic that cannot be forwarded is being tunneled (e.g. a VPN tunnel) and the ICMP messages are correspondingly not relayed. In such cases, the sending host fails to adapt its fragmenting behavior and the packets towards the receiver fail to reach their destination even though physical connectivity is intact. Obviously, this constitutes a specific type of Black Hole as previously described.
We believe that a framework such as UAFAReS can enable the implementation of mechanisms that allow collaborating end systems, access and core network components to overcome such issues with PMTU changes.
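The faulty conditions listed in this subsection can be summarized as a mapping from detected condition to collaborative reaction. This table is our own condensation of the text, not an interface defined by UAFAReS, and the action names are illustrative.

```python
# Condensed sketch of the collaborative reactions discussed in Sect. 4.4:
# each faulty condition maps to the set of actions the collaborating
# UAFAReS instances would take (names are our own summary of the text).
REACTIONS = {
    "black-hole":      ["isolate-fault", "reconfigure-core-node", "notify-end-systems"],
    "duplex-mismatch": ["restart-nics", "hold-end-system-traffic"],
    "dns-unavailable": ["share-incident", "switch-to-secondary-dns"],
    "maintenance":     ["apply-admission-control"],
}

def collaborative_reaction(condition):
    # Unknown conditions fall back to plain incident sharing
    return REACTIONS.get(condition, ["share-incident"])

print(collaborative_reaction("duplex-mismatch"))
# ['restart-nics', 'hold-end-system-traffic']
print(collaborative_reaction("unknown-fault"))
# ['share-incident']
```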
5 Case Study: Overcoming Potential Problems with PMTU Changes in an IPv6 Network, Resulting from IPv4 to IPv6 Transitioning

Our scenario is based on the Path MTU problems described in [RFC2923], which covers a specific type of Black Hole phenomenon that can occur in IPv6 networks wrongly configured with firewall rules adopted from IPv4 configurations that drop ICMP messages. The faulty condition is activated by a change in the PMTU towards a host during the lifetime of a connection. This problem can become critical and lead to extensive loss of traffic in IPv6 networks, since IPv6 packets are not fragmented on forwarding nodes for performance reasons. The next paragraphs describe the outlined issues in detail. The phenomenon in our case study is caused by the loss of packets at a router without any (ICMPv6) error notification being conveyed back to the sending host. We look at the specific case when packets are dropped at a forwarding node because of an incorrect PMTU (Path Maximum Transmission Unit), i.e. the packets of a flow are larger than the maximum size of a packet that can be transmitted. Such a situation can
occur in case of a link/node failure followed by the automatic (as done in OSPF) rerouting of packets over a link with an MTU smaller than the initially obtained PMTU for a flow originating from an end system, as illustrated in Figure 10 (the Gigabit Ethernet links have a bigger MTU than the Fast Ethernet links). Usually, if a router fails to forward a packet whose size exceeds the MTU set for the packet's egress interface, it is supposed to notify the sender of the flow via an ICMPv6 Packet Too Big message. Upon receiving such an ICMPv6 message, the sender is expected to adjust the PMTU of the corresponding flow in order to continue serving the end user's applications. However, administrators often suppress ICMP messages in their firewalls (for security considerations). Therefore, this problem can easily occur whenever security concerns prevail inside an operator's network, or simply when an old (IPv4-relevant) firewall configuration has been carried over to an IPv6 network. The issue of ICMP message suppression is extremely critical in the context of IPv6, since ICMPv6 plays a significant role in IPv6; suppressing ICMPv6 messages in routers could lead to major performance and connectivity issues, amongst others because intermediate nodes in an IPv6 network do not perform packet fragmentation for performance reasons.
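The forwarding-node behavior just described can be sketched as a simple decision function. The ICMPv6 Packet Too Big message is type 2 and carries the egress MTU (RFC 4443); the function itself is our own illustration of how firewall suppression turns an oversized packet into a silent black hole.

```python
# Sketch of the forwarding decision at a router: a packet larger than the
# egress MTU is dropped, and an ICMPv6 Packet Too Big (type 2, RFC 4443)
# carrying the egress MTU is returned -- unless a firewall rule suppresses
# ICMPv6, which produces exactly the black hole discussed in the case study.
def forward(packet_len, egress_mtu, icmp_suppressed):
    if packet_len <= egress_mtu:
        return ("forwarded", None)
    if icmp_suppressed:
        return ("dropped", None)  # silent loss: the sender never learns why
    return ("dropped", {"icmpv6_type": 2, "mtu": egress_mtu})

print(forward(2346, 1500, icmp_suppressed=False))
# ('dropped', {'icmpv6_type': 2, 'mtu': 1500})
print(forward(2346, 1500, icmp_suppressed=True))
# ('dropped', None)
```

With suppression off, the sender can clamp its PMTU to the reported value; with suppression on, it keeps sending 2346-byte packets that are silently lost.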
Fig. 10. Reference network and scenes for the IPv6 Black Hole case study
The reference network for our IPv6 PMTU scenario is illustrated in Figure 10. It consists of a WLAN access point which is connected to a service provider network of five routers with different types of links. On the services' side, there is an FTP server which is used by subscribed users to share files. Such a network can be seen as part of a university campus network. Assuming that R2 is a router with a misconfigured firewall that suppresses ICMP messages, one can think of traffic between the FTP server and the host on the left over the path: end system ↔ R1 ↔ R2 ↔ R4 ↔ R5 ↔ FTP server. The flows running over this path would also have an established Path MTU of the size allowed by Gigabit Ethernet jumbo frames and WLAN at the same
time: 2346 bytes (the WLAN MTU). Assuming that at a particular point in time the link R2 ↔ R4 fails, as illustrated in Figure 10, the routing protocol would shift the traffic to the link R2 ↔ R3. This link has the Fast Ethernet MTU of 1500 bytes. Thus, the PMTU drops from 2346 bytes to 1500 bytes. For that reason, R2 starts losing packets, since no fragmentation is allowed on intermediate nodes in IPv6 for performance reasons. Because R2 suppresses ICMP messages, the end system on the left will not be notified to readjust the size of the packets it sends out. This would normally lead to a timeout of the connection on the transport layer in the case of connection-oriented protocols. In the case of TCP, the connection could survive given that TCP PMTU discovery is implemented and activated on the end system, which is often not the case in the standard configuration of many operating systems. The application of UAFAReS in this context aims at extending the IPv6 capabilities for handling PMTU changes during the lifetime of a flow even in the presence of misconfigured firewalls, and at eliminating the misconfigurations in the network. [RFC2923] provides packet flow descriptions that illustrate such a PMTU Black Hole as observed on an intermediate router, i.e. a router between one of the end systems and the black hole router. Such a router would be R1 in our case, because R2 is considered the black hole router. [RFC2923] illustrates a case where large packets fail to traverse the network. Given that such symptoms are detected on R1, the corresponding incident descriptions are submitted to the local UAFAReS repositories and in turn disseminated by the IDE to R2 and to the end system in order to facilitate resilient behaviors on these devices.
After the FM_DE on R2 has been fed with the information regarding the detected symptoms, its FDLI functions component will perform Fault-Isolation based on the Causality Model stored in the local CMR repository and will convey the results of the Fault-Isolation process to the FRF module in order to execute a fault removal action on R2. We expect that the Autonomic Fault-Management control loop will be completed by an action reconfiguring the firewall in question such that ICMPv6 messages (according to the IPv6 security model described in [RFC4942]) can get through, and that the RS_DE (more specifically the FMF) on the end system (on the left) will readjust the PMTU towards the FTP server while trying to prolong the lifetime of the connection; this corresponds to the behavior specified in Figure 5. In that sense, the actions undertaken by UAFAReS will benefit all transport layer communications, including connectionless UDP.
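The two reactions in the case study can be sketched as follows: the Fault-Removal action on R2 re-enables ICMPv6 through the firewall, and the Fault-Masking action on the end system clamps the cached PMTU for the affected flow. Function, rule and cache names are our own illustration, not UAFAReS interfaces.

```python
# Illustrative sketch of the closing of the control loop in the case study.
def frf_remove_fault(firewall_rules):
    # Fault-Removal on R2: stop suppressing ICMPv6 (hypothetical rule name)
    return [r for r in firewall_rules if r != "drop-icmpv6"]

def fmf_mask_fault(pmtu_cache, flow, reported_mtu):
    # Fault-Masking on the end system: shrink the cached PMTU, never grow it
    pmtu_cache[flow] = min(pmtu_cache.get(flow, reported_mtu), reported_mtu)
    return pmtu_cache[flow]

rules = ["allow-tcp", "drop-icmpv6"]
print(frf_remove_fault(rules))  # ['allow-tcp']

cache = {("host", "ftp-server"): 2346}
print(fmf_mask_fault(cache, ("host", "ftp-server"), 1500))  # 1500
```

The masking side is deliberately monotone (never grows the PMTU here), so a stale larger value cannot reintroduce the black hole while the flow is alive.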
6 Concluding Remarks

In this paper, we presented the developments within the EFIPSANS [10] project towards the definition of a standardizable Reference Model for Autonomic Networking and Self-Management, dubbed the GANA Model, and the investigation of the interplay between Autonomic Fault-Management, Resilience and Survivability concepts and mechanisms towards the implementation of self-healing in autonomic networks. We presented our framework for achieving self-healing, called UAFAReS (Unified Architecture for Autonomic Fault-Management, Resilience and Survivability), which is based on GANA. UAFAReS was initially designed to implement resilience within a limited network scope, e.g. a subnet, LAN, or OSPF routing area. In this paper, we extend it, proposing and investigating aspects and mechanisms on how collaboration between end
systems, access, edge and core network components can be facilitated and exploited in such a way that collaborative self-healing across the different domains is realized through the UAFAReS framework. Moreover, we presented some types of faulty conditions that can be resolved by the collaboration of UAFAReS instances inside the ISP's network, the edge, the access network and the subscribed end systems (user terminals). Finally, we looked into the details of a particular case study, showing how the collaboration of UAFAReS instances across different domains can enable overcoming potential problems with PMTU changes in an IPv6 network. Our future research efforts will mainly concentrate on the implementation of the UAFAReS framework and the investigation of the extensive number of issues which can be addressed by UAFAReS mechanisms in enabling self-healing across different domains.

Acknowledgement. This work has been partially supported by the EC FP7 EFIPSANS project (INFSO-ICT-215549).
References
1. Autonomic Computing: An architectural blueprint for autonomic computing. IBM White Paper (2006), http://www-01.ibm.com/software/tivoli/autonomic/
2. Chaparadza, R.: Requirements for a Generic Autonomic Network Architecture (GANA), suitable for Standardizable Autonomic Behavior Specifications for Diverse Networking Environments. IEC Annual Review of Communications 61 (December 2008)
3. The FCAPS management framework: ITU-T Rec. M.3400
4. Tcholtchev, N., et al.: Towards a Unified Architecture for Resilience, Survivability and Autonomic Fault-Management for Self-Managing Networks. To appear in the Proceedings of the 2nd Workshop on Monitoring, Adaptation and Beyond (MONA+)
5. Markopoulou, A., Iannaccone, G., Bhattacharyya, S., Chuah, C.N., Ganjali, Y., Diot, C.: Characterization of Failures in an Operational IP Backbone Network. IEEE/ACM Transactions on Networking 16(4), 749–762 (2008)
6. Touvet, F., Harle, D.: Network Resilience in Multilayer Networks: A Critical Review and Open Issues. In: Proceedings of the First International Conference on Networking, Part 1, July 9–13, pp. 829–838 (2001)
7. Chaparadza, R.: UniFAFF: A Unified Framework for Implementing Autonomic Fault-Management and Failure-Detection for Self-Managing Networks. John Wiley & Sons, Chichester (2008)
8. Steinder, M., Sethi, A.S.: A survey of fault localization techniques in computer networks. Science of Computer Programming 53, 165–194 (2004)
9. Li, N., Chen, G., Zhao, M.: Autonomic Fault Management for Wireless Mesh Networks. Electronic Journal for E-Commerce Tools and Applications, eJETA (January 2009)
10. EFIPSANS project, http://www.efipsans.org/ (as of September 17, 2010)
11. Shalunov, S., Carlson, R.: Detecting Duplex Mismatch on Ethernet. In: Dovrolis, C. (ed.) PAM 2005. LNCS, vol. 3431, pp. 135–148. Springer, Heidelberg (2005)
12. Kompella, R.R., Yates, J., Greenberg, A., Snoeren, A.C.: Detection and Localization of Network Black Holes. In: Proceedings of IEEE INFOCOM, Alaska, USA (May 2007)
13. Hubble: Monitoring Internet Reachability in Real-Time, http://hubble.cs.washington.edu/ (as of July 12, 2010)
14. CIM, http://www.dmtf.org/standards/cim/ (as of September 17, 2010)
15. ITU-T X.733: Information Technology – Open Systems Interconnection – Systems Management: Alarm Reporting Function
Priority Based Delivery of PR-SCTP Messages in a Syslog Context Mohammad Rajiullah, Anna Brunstrom, and Stefan Lindskog Department of Computer Science, Karlstad University SE-651 88 Karlstad, Sweden {mohammad.rajiullah,anna.brunstrom,stefan.lindskog}@kau.se
Abstract. Unquestionably, syslog provides the most popular and easily manageable computer system logging environment. In a computer network, syslog messages are used for several purposes, such as optimizing system performance, logging users' actions and investigating malicious activities. Due to all these essential utilities, a competent transport service for syslog messages becomes important. Most current syslog implementations use either the unreliable UDP protocol or the more costly reliable TCP protocol. Neither of these protocols can provide both timeliness and reliability while transporting inherently prioritized syslog messages in a congested network. In this paper, we propose and evaluate the use of PR-SCTP, an existing partial reliability extension of the SCTP transport protocol, as a candidate transport service for the next generation syslog standard. In our emulation-based experimental results, PR-SCTP shows better performance than TCP in terms of average delay for message transfer. Furthermore, PR-SCTP exhibits less average packet loss than UDP. In both cases, PR-SCTP exploits the priority properties of syslog messages during loss recovery. Keywords: Syslog, PR-SCTP, performance evaluations, transport service.
1 Introduction
An important task of network, system or security administrators is the analysis of system- or device-generated log files. These files contain entries about specific events that occurred in a system or a network. Logs within a computer network are generated by various applications and operating systems on servers, clients or other networking devices. Log files are normally used for several functions, for example optimizing system and network performance, recording the actions of users, and providing data important for the investigation of security related events. Generally, logging systems are used in large organizations where the number of computer systems can range in the thousands. Such logging systems must thus provide not only a high degree of reliability but also timeliness while transporting log messages within the network.

R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 299–310, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011

Traditionally, log messages have
been collected and compiled using the syslog protocol [1]. This protocol allows a machine or device to send any event notification message across an IP network to a logger, commonly known as a syslog server. The widely used syslog protocol does not specify any mechanism to provide reliability and is normally run over the unreliable User Datagram Protocol (UDP) [2]. Hence, messages can be dropped unnoticed, or may be maliciously intercepted and altered. A standard for reliable syslog transport has also been proposed [3]. Here, reliability in log delivery is provided using the connection-oriented Transmission Control Protocol (TCP) [4]. However, TCP has inherent difficulties in providing both reliability and timeliness when transporting log messages in a lossy or congested network. In this paper, we first identify several problems in the current syslog standard and then propose the use of PR-SCTP [5], an extension of the Stream Control Transmission Protocol (SCTP) [6], as a candidate for the underlying transport service to ensure both reliable and timely delivery of syslog messages. PR-SCTP uses the message-based abstraction of SCTP and chooses a transport policy on a per-message basis. In PR-SCTP, a sender application can apply a range of policies to define the retransmission limit for each transmitted message. In other words, the transport service can control the reliability level at a per-message granularity. This allows bandwidth to be devoted to prioritized messages while lowering the overall delay for all messages. We show a first set of evaluations of PR-SCTP performance in comparison with both TCP and UDP. The results show that in a congested network, PR-SCTP can ensure improved delay performance while still providing reliable delivery of high-priority messages. The remainder of the paper is organized as follows.
Section 2 describes the syslog protocol, its existing transport services and their problems in further detail. In Section 3, we describe several features of SCTP in relation to the syslog protocol and suggest PR-SCTP as a transport alternative for syslog. In Section 4, we detail the performance evaluation experiments and discuss the results. Finally, we give some concluding remarks and indicate future work in Section 5.
2 Background
In this section, we discuss the syslog protocol in general and then describe the major challenges for its existing transport services.

2.1 Syslog
The syslog protocol was designed simply to transport event notification messages from an originator to a collector. In addition, a relay forwards messages received from the originator or other relays to collectors or other relays. As the number of systems under surveillance increases, it becomes common to move to centralized logging of syslog messages. This allows an administrator to monitor the log files at one location rather than on a large number of individual systems.
[Figure: number of log messages generated per day; normal vs. important messages.]
Fig. 1. Number of Syslog messages received at the server between August 2008 and November 2009
Syslog is intended to be very simple. Each syslog message has only three parts. The first part specifies the priority of a syslog message. It represents, as a numeric value, the facility and severity of the corresponding event that generated the message. To produce a priority value, the facility value of a message is first multiplied by 8 and then added to the severity value of the message. For example, a user-level message (facility = 1) with a severity of alert (severity = 1) would produce a priority value of 9. As in [7], messages with a severity level of emergency, alert, critical, or error can be marked as important messages and the rest as normal messages. This marking can be used to prioritize the messages if needed. Figure 1 shows the statistics of syslog messages generated in a research network at the Computer Science Department at our university. To keep the small number of important messages visible, we cap the number of normal messages shown at one hundred thousand. The figure represents a common scenario where the number of normal messages surpasses the number of important messages by several orders of magnitude. The highest peak shows a sign of several mishaps in the network. The second part of the message contains a timestamp along with the host name or IP address of the source of the log. The third part, finally, is the actual message content and is human readable. Any application can generate syslog-compliant messages and send them across a network. Since each application and operating system was developed independently, there is little uniformity in the content of messages. Also, message transmission can happen without any explicit knowledge of a receiver; conversely, a syslog server or collector cannot ask a specific device to generate logs. Due to all this simplicity, syslog has been widely deployed and supported. The specification of the syslog protocol was standardized in March 2009 in RFC 5424 [8].
The specification is based on a layered architecture in which message content is separated from message transport.
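The priority computation described above (facility multiplied by 8, plus severity) can be made concrete with a short sketch. The helper names and the important/normal classification are ours, following the description in this section, not part of any syslog implementation:

```python
# Sketch of the syslog PRI computation: PRI = facility * 8 + severity.
# Severities 0-3 (emergency, alert, critical, error) are "important"
# under the classification used in this paper; the rest are "normal".

SEVERITY_ERROR = 3  # highest numeric severity still counted as important

def pri_value(facility: int, severity: int) -> int:
    """Combine facility and severity into a syslog PRI value."""
    return facility * 8 + severity

def is_important(pri: int) -> bool:
    """Recover the severity (low 3 bits of PRI) and classify the message."""
    return pri % 8 <= SEVERITY_ERROR

# A user-level message (facility 1) with severity alert (1) gives PRI 9.
assert pri_value(1, 1) == 9
assert is_important(9)               # alert -> important
assert not is_important(1 * 8 + 6)   # informational -> normal
```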
2.2 Transport Service for Syslog
Although it is practically simple to implement a syslog environment, it has several drawbacks with respect to both reliability and timeliness in transporting log messages. This is due to the fact that most syslog implementations are based on UDP, as standardized in RFC 5426 [9]. Since UDP is connectionless and unreliable, a collector does not send back any acknowledgement when a message is received. As a consequence, if a UDP packet is lost or damaged due to network congestion, resource unavailability, or interception or alteration by an intruder, this will not be noticed. Moreover, UDP lacks a congestion control mechanism. If the network path is not over-provisioned, voluminous syslog traffic over UDP may aggravate the congestion level and harm the fairness of other flows sharing the same communication path. Several syslog implementations such as syslog-ng [10] support TCP in addition to UDP. TCP is a reliable connection-oriented protocol. Using flow control, sequence numbers, acknowledgements, and adaptive timers, TCP guarantees the reliable, in-order delivery of a stream of bytes. Besides, it provides congestion control, which prevents an application from overwhelming the network. Considering all these advantages, the IETF has standardized TCP as an alternative transport protocol for delivering syslog messages. As pointed out by recent work [7], TCP has several shortcomings in providing the reliability that syslog messages demand. First of all, TCP provides reliable data delivery and at the same time maintains a strict order of data transmission. However, while syslog messages may need reliable transmission, they are semantically independent and do not demand sequence maintenance. Partial ordering is thus highly desirable, as the strict sequencing in TCP can introduce unnecessary delays to the overall message delivery service.
This happens when a transmitted TCP segment is lost in the network and a subsequent segment arrives out of order. The subsequent segment is held at the transport layer until the lost segment is retransmitted and arrives at the receiver. This problem is called head-of-line (HOL) blocking [11]. Besides, TCP can cause additional delay if there is not enough space in the receiver window. In this case, a sender must wait until sufficient space is freed. This waiting time becomes undesirable, even fatal, when some messages need to be delivered instantly. In this sense, strict reliability may not always be desirable [8]. The induced delay can block any syslog originator. According to [8], in Unix/Linux a syslog originator or relay runs inside a high-priority system process, meaning that if that process is blocked, the system may even face a deadlock situation. Additionally, TCP handles application data as a byte stream. This is often an inconvenience for the transportation of syslog messages, which are mostly independent in nature. Each application must therefore add explicit markings so that the receiver can recover the message boundaries. Lastly, an attacker may try to overwhelm the transport receiver by message flooding and cause denial of service (DoS) in the network. Hence, the transport
protocol should provide features that minimize this type of threat [8]. TCP is known to be relatively vulnerable to such DoS attacks, although some mitigations can be found in [12]. In addition to TCP, secure syslog transport over TLS is standardized in RFC 5425 [13], and there is ongoing work on specifying the use of DTLS for the transportation of syslog messages [14]. As mentioned above, none of the currently standardized transports offer any possibility to prioritize syslog traffic beyond the choice of an unreliable or reliable transport. In [7], the authors proposed an application-layer prioritized retransmission mechanism to provide reliable delivery of syslog messages. However, their work indicates limited possibilities for prioritization, and their rather simplistic description skips many important details of flow and congestion control. Also, the added complexity in the application layer may be considered excessive by an application designer. Due to the limitations in the current solutions, we propose the use of PR-SCTP as a transport service for the syslog protocol.
3 PR-SCTP as a Transport Service for Syslog Messages
In this section, we discuss PR-SCTP in detail. Since it is an extension of SCTP and retains all of its features, we start our discussion with SCTP. Similar to TCP and UDP, SCTP provides transport layer functions on top of a connectionless packet service such as IP. SCTP was primarily designed to overcome TCP's shortcomings as a telephony signaling transport in IP networks. Later it was noticed that SCTP is also useful in diverse application areas other than signaling transport [15]. SCTP is now a mature general-purpose transport protocol with implementations on various platforms, such as AIX, FreeBSD, HP-UX, Linux, Mac OS X, and Solaris/OpenSolaris. A separate kernel driver provides SCTP functionality in Windows XP and Vista. Like TCP, it provides a reliable, connection-oriented, and flow-controlled transport service to various applications. Several advanced features are available in SCTP that can help to deal with the shortcomings of TCP. SCTP supports multi-streaming, which ameliorates the HOL blocking effect that results from TCP's strict ordering requirement. An SCTP association consists of multiple individual streams, which are logical unidirectional paths between the sender and the receiver. Data sequencing is only preserved within any particular stream, not between streams. In other words, if a transmission unit is lost within a single stream, only that stream is blocked until the sender retransmits the missing data; other streams continue to deliver their data from the sender to the receiver in the usual fashion. Thus, HOL blocking is limited to a particular stream. SCTP also supports unordered delivery [6], where messages can be instantly delivered to the higher layer in whatever order they arrive at the receiver. This is an essential feature for the transportation of time-sensitive logs in syslog. Besides, syslog originators send small (less than 1024 bytes) messages [1] to the syslog servers.
The byte-oriented nature of TCP is thus inconvenient for transporting these messages. In comparison, SCTP is a message-oriented protocol where whole messages are delivered in their entirety as long as there is space in the receiver's window. SCTP-based syslog will not need any explicit message delimiters, which simplifies application-level message parsing. Moreover, as previously discussed, an attacker may try to flood a syslog server with continuous spoofed connection setup requests to consume all its resources and cause DoS in the network. SCTP uses a simple but powerful technique to eliminate the risk of SYN flooding as a DoS attack. It utilizes a four-way handshake with a cookie mechanism [6] during association initialization to avoid maintaining state information on the server side for incomplete sessions. This is an efficient yet simple technique that allows only legitimate users to access the server's resources. An authentication extension of SCTP [16] supports further security enhancements: it adds functionality for sender authentication and message integrity. TCP's firm reliability implementation may block a sender during transmission. This may, for instance, happen when a receiver becomes unable to receive any more messages, such as when the corresponding application layer is slow. In this case, one solution may be to discard some of the messages where a syslog sender application would otherwise be blocked. Several policies for doing so are conceivable. According to RFC 5424 [8], when messages need to be discarded, prioritization should be considered according to the severity value of a message, and if any message is abandoned, this should be reported to the receiver. PR-SCTP can be very useful for implementing such an idea in the next version of the syslog standard. In PR-SCTP, a sender can choose the (re)transmission behavior on a per-message basis. When a message is abandoned, the sender notifies its receiver to move the cumulative ACK point forward, which simply tells the receiver not to expect that particular message any more.
This serves two purposes: the sender's transport layer can drop messages according to the particular partially reliable semantics in use, and the receiver is notified of the event. Timed reliability is an example of this kind of service. In this case, an application defines a specific lifetime for every message. PR-SCTP tries to reliably transmit a message during its lifetime; upon expiration it simply abandons the message and notifies the receiver, regardless of whether the message was successfully transmitted or not. In this way, bandwidth wastage is reduced, and the freed bandwidth can be used to transmit unexpired messages. Under all circumstances, an unexpired message is a candidate for retransmission if it is lost. This is how PR-SCTP provides partial reliability. The overall benefits of using PR-SCTP are summarized as follows:
– A single PR-SCTP association can carry both reliable and unreliable messages according to an application-specific partial reliability policy,
– Compared to TCP, PR-SCTP avoids the HOL blocking problem,
– PR-SCTP allows message boundaries to be preserved during transportation,
– PR-SCTP provides several security benefits through the four-way handshake mechanism and the authentication extension.
Moreover, the PR-SCTP extension allows a receiver to be completely unaware of the sender's particular reliability policy. In this work, we adopt PR-SCTP as the transport service for syslog messages and evaluate its performance.
4 Performance Evaluation
In this section, we test and evaluate the performance of PR-SCTP in comparison to both UDP and TCP in an emulation-based experiment setting.

4.1 Experiment Setup
For the investigation of PR-SCTP, we adopt a single-bottleneck, emulation-based experiment setup. All experiments are carried out in a local LAN test bed. Figure 2 shows the setup we use. Both end machines are configured with FreeBSD 8.1 PRERELEASE. Our custom-made application implements all the necessary SCTP API calls and flags to enable the PR-SCTP extension. Traffic is routed through a middlebox also running a FreeBSD kernel. The Dummynet traffic shaper [17] is used on this machine to introduce artificial random packet loss, delays and bandwidth limitations in the network. The end-to-end delay in the network is set to 40 ms and the bandwidth is set to 10 Mbps. In this experiment, we only consider a single flow to understand the manifestation of various PR-SCTP properties under various network settings. In every experiment, the client machine initiates a connection and the server machine sends 10,000 syslog messages in response. Messages were generated according to a Poisson arrival process. Relating to our discussion in Section 2, the server sends both important and normal messages. We use several distributions of important messages ranging from 1% to 20%.
Fig. 2. Client-server based network used in the experiment
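The shaping described above could be approximated with a Dummynet configuration along the following lines. This is a hedged sketch, not our exact middlebox setup: interface names and rule numbers are illustrative assumptions, and the loss rate (plr) is varied per experiment.

```
# FreeBSD ipfw/dummynet sketch: 10 Mbit/s in each direction, 40 ms
# one-way delay, a 20-packet queue, and a configurable random loss rate.
ipfw add 100 pipe 1 ip from any to any in recv em0   # client -> server
ipfw add 200 pipe 2 ip from any to any in recv em1   # server -> client
ipfw pipe 1 config bw 10Mbit/s delay 40ms plr 0.01 queue 20
ipfw pipe 2 config bw 10Mbit/s delay 40ms plr 0.01 queue 20
```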
In these experiments, we use a timed-reliability-based PR-SCTP policy. Our intention is that, even under heavy congestion in the network, important messages should go through, whereas normal messages have a smaller chance. We use a 5000 ms TTL (time to live) for an important message and a 100 ms TTL for a normal message. In a network with a 40 ms end-to-end delay and under a light congestion scenario, a 100 ms TTL is considered sufficient for a single retransmission opportunity in response to packet loss. We use the same application settings on top of the various transport services TCP, UDP and PR-SCTP. We perform each experiment with 30 repetitions to allow for 95% confidence intervals. All the experiment-related parameters are given in Table 1.
Table 1. Experiment related parameters

Parameter               Value
Message size            250 Bytes
Number of msg sent      10000
Send rate               1 Mbps
Imp msg                 1%, 3%, 5%, 10%, 20%
Packet loss rate        0.5%, 1%, 1.5%, 2%, 3%, 4% and 5%
One way delay           40 ms
Queue size              20 packets
Bandwidth               10 Mbps (up and down)
Evaluated protocols     TCP, UDP and PR-SCTP
TTL for imp msg         5000 ms
TTL for norm msg        100 ms
Number of repetitions   30
Operating system        FreeBSD 8.1 PRERELEASE
We measure several performance metrics, such as average packet delay and average packet loss rate, for both important (imp msg) and normal messages (norm msg).

4.2 Experimental Results
In this section, we present the results from our measurements. We start with the delay performance of both important and normal messages under varying network conditions. The delay is measured as the time difference between the generation of a syslog message at the sender and the reception of the corresponding message at the receiver. It is crucial for the syslog protocol and the overall optimization of the system that messages go through the network in a timely manner. We evaluate the delay performance of PR-SCTP, TCP and UDP for various important-message distributions. Before going into the details of the evaluation, we first look at the per-message delay behavior in the network for both PR-SCTP and TCP, as displayed in Figure 3. The figure suggests that when the packet loss rate in the network is as high as 3%, TCP cannot keep up with the application demand and the delay keeps rising indefinitely, whereas PR-SCTP responds well to the application demand by prioritizing its traffic. This result motivates further investigation of PR-SCTP's properties. Figure 4a shows the numerical results from several individual experiments with different transport services. Each data point in the graphs is an average over 30 experimental runs. Along with each data point, a 95% confidence interval is also shown. Since PR-SCTP supports prioritization, we present the delay graphs for important and normal messages separately in this figure. PR-SCTP shows a much faster response compared to TCP for both normal and important messages. It is worth noticing that the graphs for both normal
[Figure: per-message delay over 10,000 messages; curves for TCP, PR-SCTP (20% imp msg) and PR-SCTP (5% imp msg).]
Fig. 3. Per message delay for 3% packet loss rate
and important messages coincide. The rationale behind this is that only a small portion of the total number of messages is important and treated specially in the event of loss. On the other hand, many of the normal messages are lost and not recovered during heavy congestion. PR-SCTP abandons these messages, and no delays affect the successive messages. Figure 4a also shows that when we have a small number of important messages, such as 3%, PR-SCTP becomes almost as fast as UDP in transferring messages through the network. However, when the number of important messages becomes as high as 10%, the average delay increases as the packet loss rate increases. In this case, a high packet loss rate also causes an increase in the number of retransmissions. Hence, as the number of important messages increases, PR-SCTP needs more time to deliver all the important messages in the event of heavy packet loss. This is confirmed in Figure 4b, where the PR-SCTP results for various distributions of important messages are presented together. Still, PR-SCTP sustains a delay several orders of magnitude lower than TCP's. According to Figure 3, TCP shows a stable trend of increasing delay because it does not have the benefit of prioritization and thus handles all messages with equal importance. As a consequence, Figure 4a shows that it is substantially slower than PR-SCTP in various packet loss scenarios. Thus, using TCP to carry important messages, which need special treatment in terms of both reliability and timeliness in a syslog environment, may not be very wise. Next, we evaluate the average message loss rate for the transport services under evaluation. We do not expect any packet loss for TCP, but the result for TCP is included for clarity of comparison. We use traces from both the sender and the receiver side to calculate the average loss rate.
In this case, we divide the number of lost messages by the number of transmitted messages to derive the average loss rate. Figure 5a shows the loss rate of important messages for UDP, TCP and PR-SCTP with various distributions of important messages. The figure clearly shows that by giving high priority to an important message, PR-SCTP ensures TCP-like reliability. Figure 5b, on the other hand, shows the
[Figure: average delay vs. packet loss rate (a) and vs. imp msg distribution (b).]
Fig. 4. (a) Average delay comparison between TCP, UDP and PR-SCTP (for both imp and normal messages). (b) Average delay comparison for different loss rates and imp msg distributions for PR-SCTP.
[Figure: message loss percentage vs. packet loss rate.]
Fig. 5. (a) Comparison of average loss percentage between TCP, UDP and PR-SCTP for important messages. (b) Comparison of average loss percentage between TCP, UDP and PR-SCTP for normal messages.
loss rate of normal messages for the same transport services. We avoid plotting all the PR-SCTP results to make the figure less crowded and more readable. Since the application-level policy for PR-SCTP gives lower priority to a normal message, there is no statistically significant difference in the average loss rate between UDP and PR-SCTP for normal messages. Still, some difference is notable between the two PR-SCTP results. When there is a large number of high-priority PR-SCTP messages, much of the bandwidth is spent on their retransmission during packet loss. When we have only a small number of high-priority messages, this bandwidth is instead used to transmit some of the normal messages. This shows the bandwidth-efficient behavior of PR-SCTP. According to both Figures 4 and 5, it is evident that PR-SCTP can provide both timeliness and reliability for the important messages in the evaluated scenario. For the lower
percentage of important messages, PR-SCTP can be almost as fast as UDP and as reliable as TCP. In the syslog protocol, PR-SCTP can thus be very useful as a transport service where log messages with a high severity value enjoy a reliable service while low-priority messages are allowed to be dropped when congestion or blocking appears in the network. In our evaluation, we set the TTL value of important messages long enough that, even in a highly congested network, important messages can get through. We intentionally do not set this value too high, as that may cause a sender to block for a long time in events such as a buffer shortage at the receiver side. Since the PR-SCTP policy is applied in the application layer, this type of optimization needs to be considered during application design.
5 Conclusion
In this paper, we have discussed several problems with both TCP and UDP, the currently used transport services, for transporting syslog messages. We propose PR-SCTP, an existing partially reliable extension of SCTP, as the transport service for both timely and reliable delivery of high-priority syslog messages to the receiver. By means of emulation, we have examined various performance aspects in a simple scenario. Our initial evaluation shows that PR-SCTP can provide lower average delay and can use bandwidth more intelligently for transporting messages while ensuring reliability for high-priority messages. Although PR-SCTP drops some low-priority messages, this saves time and capacity to effectively deliver more important messages. This is basically a trade-off of reliability against timeliness during times of congestion. Hence, PR-SCTP can be the right choice as a transport service for the syslog protocol. Our initial observations of PR-SCTP's properties are promising. However, we believe that more experiments in a wide range of settings, with multiple flows as well as RTT and capacity variations, are needed to understand the true potential of this transport service. We intend to perform these evaluations in our future work. PR-SCTP handles syslog messages on a priority basis, assigned at the sender's application layer. This layer exploits basic features of syslog messages, such as their associated severity parameters, to prioritize messages. For the sake of simplicity in this initial work, we only use two kinds of priorities, but multi-level priorities would be an obvious choice for a real implementation. In addition, our partial reliability policy is based on TTL values, which is one option, not a limitation. The policy can be implemented in other ways, such as based on the number of sender retransmissions or on a size limitation of the sender's transmission queue. In our future work, we also aim to evaluate these possibilities.
In conclusion, our motivation for applying prioritization while transmitting traffic in a syslog context seems appropriate. Furthermore, our current transport service evaluation confirms the effectiveness of PR-SCTP in terms of providing an attractive trade-off between reliability and timeliness for transporting syslog messages.
References
1. Lonvick, C.: The BSD Syslog Protocol. RFC 3164 (August 2001)
2. Postel, J.: User Datagram Protocol. RFC 768 (August 1980)
3. New, D., Rose, M.: Reliable Delivery for syslog. RFC 3195 (November 2001)
4. Postel, J.: Transmission Control Protocol. RFC 793 (September 1981)
5. Stewart, R., et al.: Stream Control Transmission Protocol (SCTP) Partial Reliability Extension. RFC 3758 (May 2004)
6. Stewart, R.: Stream Control Transmission Protocol. RFC 4960 (September 2007)
7. Tsunoda, H., et al.: A Prioritized Retransmission Mechanism for Reliable and Efficient Delivery of Syslog Messages. In: Proceedings of the Seventh Annual Communication Networks and Services Research Conference, Washington, DC, USA, pp. 158–165 (2009)
8. Gerhards, R.: The syslog Protocol. RFC 5424 (March 2009)
9. Okmianski, A.: Transmission of Syslog Messages over UDP. RFC 5426 (March 2009)
10. Syslog New Generation (syslog-ng), http://www.balabit.com/network-security/syslog-ng/ (visited September 20, 2010)
11. Marco, G.D., et al.: SCTP as a transport for SIP: a case study. In: 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI), Orlando, FL, USA, pp. 284–289 (July 2003)
12. Eddy, W.: TCP SYN Flooding Attacks and Common Mitigations. RFC 4987 (August 2007)
13. Miao, F., et al.: Transport Layer Security (TLS) Transport Mapping for Syslog. RFC 5425 (March 2009)
14. Salowey, J., et al.: Datagram Transport Layer Security (DTLS) Transport Mapping for Syslog, draft-ietf-syslog-dtls-06.txt (work in progress) (expires: January 9, 2011)
15. Fu, S., et al.: SCTP: State of the art in research, products, and technical challenges. IEEE Communications Magazine 42(4), 64–76 (2004)
16. Tuxen, M., et al.: Authenticated Chunks for the Stream Control Transmission Protocol (SCTP). RFC 4895 (August 2007)
17. Rizzo, L.: Dummynet: A simple approach to the evaluation of network protocols. ACM SIGCOMM Computer Communication Review 27(1), 31–41 (1997)
Auto-discovery and Auto-configuration of Routers in an Autonomic Network Arun Prakash, Alexej Starschenko, and Ranganai Chaparadza Fraunhofer FOKUS, Competence Center MOTION, Kaiserin-Augusta-Allee 31, 10589 Berlin, Germany {arun.prakash,alexej.starschenko,ranganai.chaparadza}@fokus.fraunhofer.de
Abstract. The domain of Autonomics and Self-Managing networks comes with a number of self-* features, such as auto-discovery and auto-configuration, to name a few. In this paper, we provide a novel approach to auto-discover and auto/self-configure routers for OSPF routing in an autonomic network. We present the enablers for realizing these self-* functionalities. This includes a framework for describing network policies, objectives and router configurations; models to be followed by a node/device implementation when self-describing a node's capabilities; and tokens for enforcing security and access control during the auto-discovery and auto-configuration processes of a node. We also present the algorithms that the various entities should employ for realizing these self-* features in their autonomic networks. Keywords: autonomics, auto-discovery, auto/self-configuration, GANA, self-managing networks.
1 Introduction
The domain of Configuration Management has been well studied over the past years. A number of problems ranging from configuration-related problems, such as the impact of configuration errors on the security and operation of the network [1], to problems due to manual configurations, to problems related to network configuration management frameworks, and the limitations of their associated approaches and protocols are all well-documented [2,3,4]. Traditional network management and configuration protocols such as SNMP [5] and COPS (for policy configurations in Policy-Based Network Management (PBNM)), which still play a significant role in device and network configuration, contribute their own set of problems [6]. Proprietary approaches to configuration management based on CLI (Command Line Interface), which are vendor-specific, are not free from problems either. The domain of autonomic networking promises to alleviate the problems infesting the domain of configuration management. The auto-discovery and auto-configuration functionalities of self-* networks move ahead from the traditional scripting and automation techniques to an advanced feedback-control-based configuration management. Several initiatives such as ANEMA [7], ANA [5], SelfNET [6], and Generic Autonomic Network Architecture (GANA) [8,9], to name
R. Szabó et al. (Eds.): AccessNets 2010, LNICST 63, pp. 311–324, 2011.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2011
a few, propose self-managing/autonomic architectures for network management and control. In this paper we focus on the auto-discovery and auto-configuration functionalities of GANA. The auto-discovery and auto-configuration functionalities of GANA are inbuilt with security features, and thus have explicit and far-reaching influence on other autonomic functionalities in the network, such as autonomic routing, self-organization, etc. The paper is structured as follows: In Section 1, the problems with state-of-the-art device and network configuration methods, and the need for advanced auto-discovery and auto-configuration techniques, are discussed. Sections 2.1, 2.2 and 2.3 showcase the enablers of auto-discovery and auto-configuration in GANA-conformant networks. The algorithms to auto-discover and auto/self-configure routers for realizing OSPF [10] routing in an autonomic network are provided in Section 3. Finally, in Section 4, the conclusion, insights and future research directions are deliberated.
2 Auto-discovery and Auto-configuration Enablers
The auto-discovery and auto-configuration mechanisms in GANA require the following concepts as enablers for their complete range of functionality, potential and dynamism: GANA Network Profiles, the GANA Capability Description Model, GANA Tokens and ONIX [11]. These self-* enablers, with the exception of ONIX, are discussed here. ONIX (Overlay Network for Information eXchange) [11] can be considered as a scalable, fault-tolerant and secure information and knowledge exchange system, with its own set of protocols and functions, which facilitates seamless data and information distribution in an autonomic network.
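The publish/subscribe behavior of ONIX, as relied upon by the algorithms in Section 3, can be illustrated with a minimal in-memory sketch; the class and method names below are illustrative and not part of the ONIX specification:

```python
class OnixSketch:
    """Toy stand-in for ONIX: a key/value store with change subscriptions."""

    def __init__(self):
        self._store = {}          # information items, keyed by name
        self._subscribers = {}    # key -> list of callbacks

    def publish(self, key, value):
        # Store the item and notify every subscriber of that key.
        self._store[key] = value
        for callback in self._subscribers.get(key, []):
            callback(key, value)

    def subscribe(self, key, callback):
        # Register interest in future updates of an information item.
        self._subscribers.setdefault(key, []).append(callback)

    def retrieve(self, key):
        return self._store.get(key)
```

In this picture, a router subscribed to its NODECONF item would have its callback invoked whenever the Network-Level Routing-Management DE publishes an updated configuration.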
2.1 GANA Network Profiles
A Profile is defined as a composition of Policies, Functional Objectives and Configuration Data required for defining overall goals and specific objectives for networking functions, such as routing, security, etc., of a network. A Goal, in the context of GANA, is defined as the overall target of the network as described by a network operator/administrator. Thus Goals delineate the Policies, Objectives and Configuration Data required for a network and its devices. In the context of GANA, the Profile is called a GANA Network Profile (GANA NETPROF) and is designed to provide:
– A structured and monolithic framework with a common data structure for specifying the policies, objectives and configuration data for an autonomic network and its nodes,
– A flexible framework for (re)configuring nodes, based on the dynamic roles they are computed to play in the network, and
– A mechanism to separate a node's role and functionality from its vendor-specific configuration requirements.
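As a rough illustration of this structure, the following sketch assembles a minimal NETPROF-like XML tree. All element and attribute names here are assumptions for illustration only, since the actual GANA NETPROF schema is not reproduced in this paper:

```python
import xml.etree.ElementTree as ET

def build_netprof_skeleton():
    """Build a minimal NETPROF-like XML tree (illustrative element names)."""
    root = ET.Element("GANA_NetworkProfile")
    netprof = ET.SubElement(root, "NETPROF")
    # Per the text, a NETPROF carries Policies, Objectives and Configuration
    # Data, plus hooks for importing vendor-specific CONFIGs.
    for part in ("Policies", "Objectives", "ConfigurationData"):
        ET.SubElement(netprof, part)
    ET.SubElement(netprof, "VendorConfigHook", vendor="example-vendor")
    # The MAP sits alongside the NETPROF in the GANA NETPROF.
    ET.SubElement(root, "ConfigurationOptionsMap")
    return root
```

The separation of the vendor hook from the rest of the profile mirrors the stated goal of decoupling a node's role from its vendor-specific configuration.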
Fig. 1. Network Profile - NETPROF: (a) Vertical Decomposition, (b) Horizontal Decomposition, (c) Routing Profile
A GANA NETPROF thus consists of a Network Profile (NETPROF), a GANA Configuration Options Map (MAP) and several Vendor-Specific Node Configuration Options (CONFIG). The NETPROF is composed of Policies, Objectives, Configuration Data and hooks for importing vendor-specific CONFIGs, and is structured along the GANA hierarchy [8,9] as shown in Figure 1a. At each GANA level, sub-profiles encapsulating the Policies, Objectives and Configuration Data for a GANA Decision-Element (DE) (≈ networking function) and its Managed Entities (MEs) are provided. The NETPROF can also be viewed as composed along the abstracted functionalities of the GANA architecture, i.e., along functionalities such as routing, security, etc. Thus each of these functionalities can be considered to be contributing a profile of its own, as shown in Figure 1b. Thus, there is an inherent relationship between the GANA levels and the abstracted functionalities, as reflected in Figure 1c. The NETPROF is designed to accept different vendor CONFIGs for a node, as such an arrangement provides a number of advantages. The network operator/administrator can use existing configuration files without major changes, thus avoiding potential errors during the migration from their traditional networks to a GANA-conformant network. They can continue to use their configuration files that are vendor-specific for the nodes/devices. Finally, the hooks in the NETPROF do not confine a node role to be vendor-specific at design time, allowing dynamic role switching and re-configuration of the nodes. While some configuration parameters are static, i.e., their values are not manipulated by the DEs, the values of many parameters need to be adjusted at runtime in a dynamic manner to reflect the goals and objectives of the network. For instance, the value of a parameter such as Area ID, in the case of OSPF [10] routing, is dynamic, as the network gets autonomically partitioned
and merged into new OSPF areas due to failures, new routing nodes being added and changing network conditions. The problem arises when such configuration parameters are expressed in different vendor-specific semantics for their devices. In order to enable a vendor-free implementation of a specific DE, we provide a solution with the MAP. The MAP is a tabular structure that maps configuration parameters used in GANA in their standard form (e.g., IETF RFCs) to vendor-specific formats. The MAP holds the semantics of both the name and the value of a configuration parameter. It is used by the Network-Level Routing-Management DE (NET_LEVEL_RM_DE) [8,9] for the generation of the GANA Node Configurations (NODECONF) from the NETPROF. The NODECONF is used by a node for its configuration. The GANA NETPROF is formalized through the well-known industry de facto XML standard. XML provides a formal and standardized approach to the design and engineering of GANA NETPROFs. The use of GANA NETPROFs for network governance is fully described in [12].
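The MAP-based translation can be sketched as a simple lookup from a standard parameter name to a vendor-specific rendering. The parameter names, vendor identifiers and output format below are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical MAP rows: standard (RFC-style) parameter name mapped to the
# vendor-specific name used in that vendor's configuration syntax.
CONFIG_OPTIONS_MAP = {
    "ospf.router_id": {"vendorA": "router-id", "vendorB": "ospf routerid"},
    "ospf.area_id":   {"vendorA": "area",      "vendorB": "ospf area"},
}

def to_vendor_option(standard_name, value, vendor):
    """Translate a standard parameter into a vendor-specific config line."""
    vendor_name = CONFIG_OPTIONS_MAP[standard_name][vendor]
    return f"{vendor_name} {value}"
```

A real MAP would also carry value-translation rules, since the text notes that the MAP holds the semantics of both the name and the value of a parameter.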
2.2 GANA Capability Description Model
A self-managing network needs a way to know the entities composing the network and the Capabilities of the individual functional entities of the nodes in the network. The auto-discovery functionality of the network should satisfy this need. In GANA, self-description, self-advertisement, topology-discovery and support for solicitation of Capabilities belong to the auto-discovery functions of a node.

Self-Description of Capabilities is the ability of a functional entity to describe itself, i.e., to describe its Capabilities, such as hardware and software specifications, supported protocols, services and tools, interface information, etc., its current role and the possible potential roles it can play in the network. These compose the GANA Capability Description Model of a functional entity. The Capability Description is formalized through the industry de facto XML standard.

Self-Advertisement of Capabilities is the process by which a functional entity spontaneously disseminates its Capability Description to other functional entities, either inside a node or in the network. The dissemination may be done over a distributed repository such as ONIX [11].

Support for solicitation of Capabilities is the ability of a functional entity to respond to requests for its Capability Description by initiating its self-description and self-advertisement functions. This is vital for the self-organization functionality of a network.

In GANA, the auto-discovery mechanism is initiated by the Node-Main DE (NODE_MAIN_DE) [9,8] of a node. The NODE_MAIN_DE generates the Capability Description of the node by triggering the iterative self-description process as shown in Figure 2. The Capabilities of the individual DEs and their MEs are obtained in a recursive manner. The aggregated Capabilities of the node are then advertised to the network, thus completing the auto-discovery process of a node.
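The recursive, bottom-up aggregation of Capabilities can be sketched as follows. The dictionary layout is an assumption made for the sketch; the actual Capability Description is an XML document:

```python
def aggregate_capabilities(entity):
    """Recursively collect an entity's own Capabilities plus those of the
    DEs/MEs below it, mirroring the iterative self-description process.

    `entity` is a dict: {"name": ..., "capabilities": [...], "children": [...]}.
    """
    description = {"name": entity["name"],
                   "capabilities": list(entity.get("capabilities", [])),
                   "children": []}
    for child in entity.get("children", []):
        child_desc = aggregate_capabilities(child)
        description["children"].append(child_desc)
        # Each level presents its aggregated Capabilities to the level above.
        description["capabilities"].extend(child_desc["capabilities"])
    return description
```

Applied to a node tree rooted at the NODE_MAIN_DE, the root of the returned structure carries the node's full aggregated Capabilities, ready to be self-advertised.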
Fig. 2. Auto-Discovery Functionality in a GANA-Conformant Node: each DE at the Protocol and Function Levels aggregates the Capabilities conveyed by the DEs/MEs below it together with its own and presents them to the upper DE; at the Node Level, the NODE_MAIN_DE aggregates the node's Capabilities and self-advertises them, employing security mechanisms.
Table 1. GANA Token Types and Token Information - Abridged Example

  Token Type  Info  Description
  0                 Entity Role Token
              0     Network Operator/Administrator
              1     NET_LEVEL_SEC_MNGT_DE
              2     Network-Level DE
              3     GANA Node
  1                 Permitted ONIX Operations Token
              0     On-Behalf Subscribe
              1     On-Behalf Update
  2                 ONIX Information Access Token
              0     Network Policy
              1     Monitoring Data

2.3 GANA Tokens
A GANA Token can be considered as an object that holds a token type and token information. Three different types of GANA Tokens have been defined:
– Entity Role Token: This identifies the token holder as a certain type of network entity / network role.
– Permitted ONIX Operations Token: This allows the token holder to perform privileged ONIX-related operations.
– ONIX Information Access Token: This provides the token holder with privileged access to information and data stored in ONIX.
A GANA Token is always encapsulated by a security key issued by the Network-Level Security-Management DE (NET_LEVEL_SM_DE) [8,9]. A key can have any number of ONIX-related tokens, but only one Entity Role Token. The Token Information (Info) can be configured manually by the network operator/administrator or can be dynamically set by the NET_LEVEL_SM_DE. The NET_LEVEL_SM_DE issues GANA Tokens to the nodes to be used during the above-mentioned operations in the network. A brief illustration of GANA Tokens is given in Table 1.
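A minimal sketch of GANA Tokens and of the one-Entity-Role-Token-per-key invariant, with type codes taken from Table 1; the class names and the list-based key layout are assumptions of the sketch, not the actual key format:

```python
from dataclasses import dataclass, field

# Token-type codes follow Table 1: 0 = Entity Role, 1 = Permitted ONIX
# Operations, 2 = ONIX Information Access.
ENTITY_ROLE, ONIX_OPERATIONS, ONIX_ACCESS = 0, 1, 2

@dataclass
class GanaToken:
    token_type: int
    info: int

@dataclass
class SecurityKey:
    """A key (issued by NET_LEVEL_SM_DE) encapsulating GANA Tokens.

    Invariant from the text: any number of ONIX-related tokens,
    but at most one Entity Role Token per key.
    """
    tokens: list = field(default_factory=list)

    def add(self, token: GanaToken):
        if token.token_type == ENTITY_ROLE and any(
                t.token_type == ENTITY_ROLE for t in self.tokens):
            raise ValueError("a key may carry only one Entity Role Token")
        self.tokens.append(token)
```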
3 Auto-discovery and Auto-configuration Algorithms
The auto-discovery and auto-configuration algorithms of a GANA Node and of the NET_LEVEL_RM_DE are discussed here. The algorithms focus on the mechanisms for realizing OSPF [10] routing in an autonomic network.
3.1 Network-Level Routing-Management Decision-Element
The NET_LEVEL_RM_DE [8,9] belongs to Level 4 of the GANA control-loop hierarchy. Thus it has the overall view of the routing functionality of the network, required for the configuration of the routers. In the context of this paper, the DE supports the network operator/administrator by assisting in network topology design and area-partitioning for OSPF routing during the network topology design phase. Further, autonomic network topology discovery and generation of router NODECONFs with respect to the network design and goals are carried out at runtime without any manual intervention. Additionally, the DE triggers the re-configuration of the routers to adapt to changing routing goals and dynamic network conditions. These functionalities of the NET_LEVEL_RM_DE are realized through a number of modules and functions, as discussed below.

Partitioning Parameters influence the partitioning of a network topology into OSPF areas and are provided by the operator through the Desired Topology element of the NETPROF. If no parameters are explicitly set, default values are used for the generation of OSPF areas. The parameters of interest are:
– Maximum/minimum number of routers in an area - determines the size of an area during network design and partitioning.
– Maximum/minimum number of areas - specifies the number of areas during network design and partitioning.
– Threshold - expresses the maximum size of an area permissible for OSPF routing.

OSPF Desired Topology Partitioning Module partitions the network topology designed (desired) by the network operator/administrator into OSPF areas with respect to the constraints imposed by the partitioning parameters. The behavior of the module is specified in Algorithm 1. The outcome of the
Auto-discovery and Auto-configuration of Routers in an Autonomic Network
317
Algorithm 1. OSPF Desired Topology Partitioning
Require: UPDATES of Desired Topology, Planned Topology and Partitioning Parameters
Ensure: Planned Topology computed according to Partitioning Parameters
1: loop
2:   if Desired Topology available then
3:     if not Planned Topology generated from Desired Topology then
4:       if not Partitioning Parameters published then
5:         Partitioning Parameters = DEFAULT
6:       Planned Topology = OSPF_Area_Partitioning(Desired Topology, Partitioning Parameters)
7:       if Partitioning FAILED then
8:         REPORT FAILURE
9:       else
10:        PUBLISH Planned Topology
11:  LISTEN Desired Topology updates
partitioning process (Planned Topology) is published to ONIX as an update to the NETPROF. The Planned Topology is applied to the network by the OSPF Actual Topology Partitioning and Configuration Module. Additionally, the operator may view and modify the Planned Topology with new data.

OSPF Actual Topology Partitioning and Configuration Module handles a number of tasks relevant to the actual network, such as:
1. To apply the Planned Topology (if one is present) when it matches the actual network topology.
2. To partition the network each time the number of routers in an OSPF area reaches the set threshold.
3. To compute and distribute NODECONFs that contain the required routing-related configuration for each node.

Algorithm 2 reflects the behavior of this module. The Actual Topology encapsulates the current network graph, i.e., the nodes and their interconnections. It is required in order to compute the point-of-attachment of the routers in the network, and thus determine their network roles and corresponding router NODECONFs. It may be generated by a topology-discovery function using IPv6's Neighbor-Discovery (ND) protocol, wherein each router publishes the state of its neighbors to the NET_LEVEL_RM_DE, facilitating the computation of the IP-layer topology. The generated Actual Topology is published into ONIX as an update to the NETPROF. The NETPROF contains several sub-profiles (Node Profiles) for various node roles. For OSPF routing, the node roles are classified as: Core Router (CR), Area Border Router (ABR) and Autonomous System Border Router (ASBR). Using the Area Assignment computed by the partitioning function, the Capability Description and the current role of the node, the role of each router in the
Algorithm 2. OSPF Actual Topology Configuration and Partitioning
Require: UPDATES of Desired Topology, Planned Topology, Actual Topology, Capability Descriptions
Ensure: MAX(AREA_SIZE(Actual Topology)) <= THRESHOLD, Planned Topology applied to network, routers configured
1: if not Partitioning Parameters published then
2:   Partitioning Parameters = DEFAULT
3: loop
4:   if Actual Topology changed then
5:     if Planned Topology and Actual Topology match then
6:       for all Router in Actual Topology do
7:         COMPUTE NodeConf using Planned Topology
8:         PUBLISH NodeConf to ONIX
9:         SUBSCRIBE Router to receive NodeConf
10:        WAIT for ACK from Router
11:    else
12:      if SIZE(Actual Topology) > SIZE(Planned Topology) then
13:        Partitioning Parameters = DEFAULT
14:      if MAX(AREA_SIZE(Actual Topology)) > THRESHOLD then
15:        Topology = OSPF_Area_Partitioning(Actual Topology, Partitioning Parameters)
16:        if Partitioning FAILED then
17:          Partitioning Parameters = DEFAULT
18:          Topology = OSPF_Area_Partitioning(Actual Topology, Partitioning Parameters)
19:          if Partitioning FAILED then
20:            REPORT FAILURE
21:        for all Router in Actual Topology do
22:          COMPUTE NodeConf using Topology
23:          PUBLISH NodeConf to ONIX
24:          SUBSCRIBE Router to receive NodeConf
25:          WAIT for ACK from Router
26:      else
27:        COMPUTE NodeConf of new router using Capability Description and Actual Topology
28:        PUBLISH NodeConf to ONIX
29:        SUBSCRIBE Router to receive NodeConf
30:        WAIT for ACK from Router
31:  LISTEN Actual Topology updates
network is computed. The appropriate node sub-profile of the NETPROF is chosen for the generation of the NODECONF, as shown in Algorithm 3.

OSPF Area Partitioning Module contains functions to partition a network into OSPF areas and is used by both of the modules specified above. It uses a modified version of the DDOA algorithm [13], adapted to fit the GANA requirements. The behavior is realized through three separate functions, as depicted in Algorithm 4.
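The role classification step can be sketched as follows, assuming a per-interface area assignment is already available. The function and parameter names are illustrative, and for simplicity external (out-of-AS) neighbors are passed in explicitly rather than discovered:

```python
def classify_router_role(router, area_of_interface, external_neighbors=()):
    """Derive a router's OSPF role from its per-interface area assignment.

    A router attached to a neighbor outside the autonomous system is an
    ASBR; one whose interfaces span several areas is an ABR; otherwise it
    is a Core Router (CR). This is a simplification of the classification
    performed by the NET_LEVEL_RM_DE.
    """
    if router in external_neighbors:
        return "ASBR"
    areas = set(area_of_interface[router].values())
    return "ABR" if len(areas) > 1 else "CR"
```

The returned role would then select the matching Node Profile from the NETPROF for NODECONF generation.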
Algorithm 3. GANA NODECONF Computation
Require: Capability Description of Router, Configuration Options, Configuration Options Map, Topology, Area Assignment, NetProf
Ensure: NodeConf for Router computed
1: Router Vendor = read vendor from Capability Description
2: RECEIVE Configuration Options for Router Vendor from ONIX
3: READ RouterID from Capability Description
4: SET RouterID in Configuration Options using Configuration Options Map
5: for all Interface of Router in Topology do
6:   COMPUTE AreaID for Interface using Topology and Area Assignment
7:   SET AreaID for Interface in Configuration Options using Configuration Options Map
8: NodeRole = role of Router in Area Assignment and Topology
9: GET NodeConf element for NodeRole from NetProf
10: SET Policies, Objectives, Configuration for FUNC_LEVEL_RM_DE in NodeConf
11: PASTE Configuration Options for Router Vendor into NodeConf
12: if NodeConf already published on ONIX then
13:   NodeConf_Current = RECEIVE NodeConf
14:   REPLACE elements classified under Routing in NodeConf_Current with the elements from NodeConf
15:   RETURN NodeConf_Current
16: else
17:   RETURN NodeConf
Node-Grouping Function applies the node-grouping constraints (if any) specified in the Partitioning Parameters. The constraint ensures that the grouped nodes remain in the same OSPF area after the partitioning. Nodes are grouped by replacing them with a single node in the network topology graph. The weight of this single node is the aggregate weight of the group. After partitioning, the area-assignment of the representative node is applied to the entire group.

Graph-Partitioning Function partitions a network graph into interconnected partitions under the constraints imposed by the Partitioning Parameters. For graph partitioning, the Chaco [14] partitioning tool is used. The number of desired areas (computed from the Partitioning Parameters), the weighted network topology graph, and optionally the geographical coordinates of the vertices (nodes/routers) serve as inputs. The graph is partitioned into the desired number of areas while balancing the number of vertices in each area (obtained from the Partitioning Parameters). Each vertex is then assigned an area-number based on the area it belongs to. The output (Area Assignment) cannot be directly used for OSPF area assignment, as Chaco does not guarantee interconnected areas. This function extends Chaco by reassigning disconnected areas with new area numbers. After graph partitioning, the function checks whether the resulting graph satisfies the partitioning requirements, namely the minimum size of an area, the maximum number of areas, etc., enclosed within the Partitioning Parameters. The function returns the computed area assignment if the requirements are satisfied. If
the number of areas is beyond the maximum, the function tries to merge some partitions without violating the minimum number of areas and the maximum number of nodes per area constraints. If the merging is successful, the new area-assignment is returned, else a failure is indicated.

Algorithm 4. OSPF Area Partitioning Module
Require: Topology, Partitioning Parameters
Ensure: Area Assignment for routers that fulfills the partitioning-constraints
1: Graph = graph-representation of Topology
2: Graph = Node_Grouping(Graph, Partitioning Parameters)
3: N_Areas = minimum number of areas
4: while N_Areas <= maximum number of areas do
5:   Area Assignment = Network_Partition(Graph, Partitioning Parameters, N_Areas)
6:   if SUCCEED Partitioning then
7:     Area Assignment = Area0_Design(Graph, Partitioning Parameters, Area Assignment)
8:     if SUCCEED Area0-Design then
9:       Area Assignment = Area Assignment + Area Assignment for grouped nodes
10:      return Area Assignment
11:  N_Areas = N_Areas + 1
12: return FAILURE
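The step that compensates for Chaco's lack of a connectivity guarantee, reassigning disconnected pieces of an area to fresh area numbers, can be sketched with a breadth-first search. The data-structure layout (adjacency dict, area dict) is an assumption of the sketch:

```python
from collections import deque

def split_disconnected_areas(adjacency, area_of):
    """Reassign area ids so every area induces a connected subgraph.

    `adjacency` maps node -> set of neighbors; `area_of` maps node -> area id.
    The first connected component found for an area keeps its number; any
    further (disconnected) component of that area receives a fresh id.
    """
    kept = set()                          # area ids whose first component is fixed
    next_area = max(area_of.values()) + 1
    seen = set()
    result = dict(area_of)
    for start in adjacency:
        if start in seen:
            continue
        area = area_of[start]
        # BFS over the component of `start` restricted to its original area.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nbr in adjacency[node]:
                if nbr not in seen and area_of[nbr] == area:
                    seen.add(nbr)
                    queue.append(nbr)
        if area in kept:
            for node in component:        # disconnected piece: new area id
                result[node] = next_area
            next_area += 1
        else:
            kept.add(area)
    return result
```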
Area0-Design Function computes the OSPF ABR candidates and builds the OSPF Area-0 for the partitioned network graph. The inputs to this function are the network graph, the partitioning-parameters and the area-assignment. The output of this function is the Area Assignment with Area-0 defined. Nodes that are connected to each other across partition boundaries are chosen as the ABR candidates for the network. Area-0 is computed by choosing a predefined number (default = 1) of nodes from each area's ABR candidate-list. If the network operator/administrator chooses some nodes as preferred ABRs, they are chosen first. Otherwise, the candidates with the highest sum of edge-weights are chosen. The chosen ABRs are used for constructing Area-0. If Area-0 is discontinuous, more candidate ABRs are added or virtual links [10] are used to construct a connected Area-0. If this operation succeeds, the resulting Area-0 assignment for the selected ABR candidates is saved. Other possible combinations of ABR candidates are tried, and the optimal solution (minimal, robust, etc.) is chosen. If no solution is found, a failure is indicated.
3.2 Node Main Decision-Element
The NODE_MAIN_DE [8,9] is the top element in the DE hierarchy of a GANA node. It provides several functions and manages the node as a whole. The auto-discovery and auto-configuration functionalities of a node are managed and orchestrated by the NODE_MAIN_DE. They are discussed below.
Security. In GANA, some security aspects are inbuilt into the auto-discovery and auto-configuration functionalities of the node and the network. GANA Tokens (see Section 2.3) are used for authentication of the node and access control of ONIX information. All communication between the nodes, the Network-Level DEs and ONIX makes use of GANA Tokens for network security. The GANA Tokens are provided by the NET_LEVEL_SM_DE [8,9].

Node Bootstrap. When the node/router is turned on by the network operator/administrator, the NODE_MAIN_DE is invoked and performs the initialization and orchestration of node parameters and DEs. For the initialization and orchestration operations, default/initial DE and protocol configurations are used. During the bootstrap phase, the NODE_MAIN_DE also triggers the address auto-configuration of the interfaces of the node/router. For the address auto-configuration, the Autonomic DHCP Architecture (ADA) [15] is employed. ADA enables the address auto-configuration of node/router interfaces by providing a set of extensions to DHCP, primarily to support interface bootstrapping and zero-conf DHCP relaying [15].

Auto-Discovery. The self-description process of the node involves the generation of the GANA Capability Description Model, a concept introduced in Section 2.2. The computation of the Capabilities of the DEs and their underlying MEs is an iterative process, as shown in Figure 2, triggered by the NODE_MAIN_DE. The aggregated Capability Description is augmented with node attributes such as class-of-device and hardware information. The self-advertisement process of a node involves the publication of the unified Capability Description Model into ONIX and to on-link nodes, subject to security policies. The self-description and self-advertisement functions are repeated when the Capabilities of the device change. As described in Section 3.1, the neighbor information of each device is required by the NET_LEVEL_RM_DE for topology-discovery.
This information is provided by the NODE_MAIN_DE by publishing and updating a list of the on-link routers on each interface to the NET_LEVEL_RM_DE. Algorithm 5 provides the algorithm for Node Bootstrap and Auto-Discovery.

Auto-Configuration. As described in Section 2.1, the NODECONF computed by the NET_LEVEL_RM_DE and distributed through ONIX is used in the auto-configuration of the node. Every time a node receives a NODECONF, its NODE_MAIN_DE parses it, and configures itself and the Function-Level DEs. Further, each DE also receives the sub-profiles containing the configuration data for its MEs from the NODE_MAIN_DE. This is reflected in Algorithm 6.
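The NODECONF dispatch performed by the NODE_MAIN_DE can be sketched as follows. The dictionary-based NODECONF layout and the `configure` interface of a DE are assumptions made for the sketch:

```python
class StubDE:
    """Minimal stand-in for a Function-Level DE that accepts configuration."""
    def __init__(self):
        self.received = None

    def configure(self, de_config, protocol_config):
        # A real DE would configure itself and push the protocol
        # configuration down to its MEs.
        self.received = (de_config, protocol_config)

def apply_nodeconf(nodeconf, decision_elements):
    """Parse a NODECONF and push each DE's sub-profile to that DE.

    `nodeconf` is a dict keyed by DE name; `decision_elements` maps DE name
    to an object exposing configure(de_config, protocol_config).
    Returns the names of the DEs that were configured.
    """
    configured = []
    for de_name, de in decision_elements.items():
        sub_profile = nodeconf.get(de_name)
        if sub_profile is None:
            continue  # no configuration for this DE in the NODECONF
        de.configure(sub_profile.get("de_config", {}),
                     sub_profile.get("protocol_config", {}))
        configured.append(de_name)
    return configured
```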
3.3 Function-Level Routing-Management Decision-Element
The Function-Level Routing-Management DE (FUNC_LEVEL_RM_DE) [8,9] is responsible for the configuration and management of the routing protocols and mechanisms inside a GANA node/router. Its functionalities, in the context of this paper, include:
Algorithm 5. NODE_MAIN_DE - Auto-Discovery
Require: GANA Token
Ensure: Auto-Discovery, Security
1: for all DEs on Node- and Function-Level do
2:   START DE
3: DISCOVER NET_LEVEL_SEC_MNGT_DE
4: REQUEST GANA Token
5: RECEIVE GANA Token
6: DISCOVER ONIX
7: Capability Description = NODE_MAIN_DE capabilities
8: Capability Description += device capabilities
9: for all MEs of NODE_MAIN_DE do
10:  Capability Description += ME capabilities
11: WAIT for global IP-address auto-configuration
12: Capability Description += global address
13: PUBLISH Capability Description on ONIX
14: COMPUTE Neighbour List using the IPv6 Neighbour Discovery mechanism
15: DISCOVER NET_LEVEL_RM_DE
16: PUBLISH Neighbour List to NET_LEVEL_RM_DE
17: loop
18:   LISTEN Event
19:   if Event == UPDATE of ME capabilities then
20:     Capability Description += ME capabilities
21:     PUBLISH update of Capability Description on ONIX
22:   else if Event == UPDATE Neighbour List then
23:     PUBLISH Neighbour List to NET_LEVEL_RM_DE
Algorithm 6. NODE_MAIN_DE - Auto-Configuration
Require: NodeConf
Ensure: Auto-Configuration of DEs and protocols
1: loop
2:   LISTEN Event
3:   if Event == RECEIVE NodeConf then
4:     for all DEs on Node- and Function-Level do
5:       PUSH DE Config derived from NodeConf to DE
6:       PUSH Protocol Config derived from NodeConf to DE, for the DE's function
1. Publication of the Capabilities of itself and of its MEs (routing protocols and mechanisms such as OSPFv3) when requested by the NODE_MAIN_DE.
2. (Re-)Configuration of the MEs according to the configuration provided by the NODE_MAIN_DE, and its acknowledgment and validation.

The Capabilities of the MEs contain information such as the protocol/ME version, supported protocol features and services, the computational cost of running the protocol, etc. As described in Section 2.1, the configuration of MEs in GANA is vendor-specific. Thus the DE uses the MAP defined in Section 2.1 to
understand the semantics and syntax of the configuration parameters for the various vendor-specific implementations of OSPFv3. The management interface (CLI or SNMP [5]) is used by the DE to configure OSPF. The configuration is validated by checking the values of the various OSPF parameters against the configuration parameters provided in the NODECONF. Along with the NODE_MAIN_DE, the FUNC_LEVEL_RM_DE ensures that the configurations for OSPF (the routing protocol) computed by the NET_LEVEL_RM_DE and distributed through ONIX are applied by the individual routers, completing the auto-configuration process.
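The validation step can be sketched as a parameter-by-parameter comparison of the values read back from the router (e.g., via its management interface) against those prescribed in the NODECONF; the function and parameter names are illustrative:

```python
def validate_applied_config(applied, nodeconf_params):
    """Compare parameter values read back from the router against those
    prescribed in the NODECONF.

    Returns the list of mismatching parameter names; an empty list means
    the configuration is validated.
    """
    mismatches = []
    for name, expected in nodeconf_params.items():
        if applied.get(name) != expected:
            mismatches.append(name)
    return mismatches
```

A non-empty result would trigger re-configuration (or an error report) rather than an acknowledgment back to the NET_LEVEL_RM_DE.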
4 Conclusion
In this paper, we presented the enablers and the algorithms required for realizing the auto-discovery and auto-configuration features in a GANA-conformant network. The approach outlined is realistic and moves ahead from the traditional scripting and automation techniques used for configuration management. Further, the profound issue of network security, often sidelined during a discourse on configuration management, is given paramount importance. Thus several security issues, such as authentication, access control and trust, are considered during the design of the enablers and algorithms. From our ongoing implementation, we believe that the approach provided here is pragmatic and effective. In the future, we aim to replace the MAP with a vendor configuration Ontology that captures the semantics and grammar of the configuration parameters used by different vendors. Every vendor conforming to GANA standards may provide their vendor-specific configuration ontology, which would be integrated with the ontology designed as part of the GANA-oriented design methodology [16]. The unified ontology would enable the Network-Level DEs to manipulate complex configuration parameters, of different vendor-specific MEs providing the same functionality, in a dynamic fashion. Further, we also intend to extend the current algorithms to auto-configure other routing protocols, such as RIP, BGP, etc. Finally, a case study evaluating the proposed algorithms will be completed and published.
Acknowledgments. This work is partially supported by the EC FP7 EFIPSANS project (INFSO-ICT-215549) [17].
References 1. Tayal, A.P., Patnaik, L.M.: An Address Assignment for the Automatic Configuration of Mobile Ad Hoc Networks. Personal Ubiquitous Comput. 8(1), 47–54 (2004) 2. Sanchez, L., McCloghrie, K., Saperia, J.: Requirements for Configuration Management of IP-based Networks. RFC 3139 (Informational) (June 2001), http://www.ietf.org/rfc/rfc3139.txt
3. Yoo, S.M., Ju, H.T., Hong, J.: Web Services Based Configuration Management for IP Network Devices. In: Dalmau Royo, J., Hasegawa, G. (eds.) MMNS 2005. LNCS, vol. 3754, pp. 254–265. Springer, Heidelberg (2005), http://dx.doi.org/10.1007/11572831_22
4. Choi, H.M., Choi, M.J., Hong, J.: Design and Implementation of XML-based Configuration Management System for Distributed Systems. In: IEEE/IFIP NOMS 2004: Proc. of Network Operations and Management Symposium, vol. 1, pp. 831–844 (2004)
5. Harrington, D., Presuhn, R., Wijnen, B.: An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks. RFC 3411 (Standard) (December 2002), http://www.ietf.org/rfc/rfc3411.txt; updated by RFCs 5343, 5590
6. Chatzimisios, P.: Security Issues and Vulnerabilities of the SNMP Protocol. In: ICEEE 2004: Proc. of 1st International Conference on Electrical and Electronics Engineering, June 2004, pp. 74–77 (2004)
7. Derbel, H., Agoulmine, N., Salaün, M.: ANEMA: Autonomic network management architecture to support self-configuration and self-optimization in IP networks. Comput. Netw. 53(3), 418–430 (2009)
8. Chaparadza, R.: Requirements for a Generic Autonomic Network Architecture (GANA), suitable for Standardizable Autonomic Behavior Specifications for Diverse Networking Environments. International Engineering Consortium (IEC), Annual Review of Communications 61 (2008)
9. Chaparadza, R., et al.: Creating a Viable Evolution Path towards Self-Managing Future Internet via a Standardizable Reference Model for Autonomic Network Engineering. In: Towards the Future Internet - A European Research Perspective, pp. 313–324. IOS Press, Amsterdam (2009)
10. Coltun, R., Ferguson, D., Moy, J., Lindem, A.: OSPF for IPv6. RFC 5340 (Proposed Standard) (July 2008), http://www.ietf.org/rfc/rfc5340.txt
11. Chaparadza, R., et al.: IPv6 and Extended IPv6 (IPv6++) Features that enable Autonomic Network Setup and Operation.
In: SELFMAGICNETS 2010: Proceedings of the International Workshop on Autonomic Networking and SelfManagement in the Access Networks. ICST ACCESSNETS 2010 (November 2010) 12. Lozano, J.A., Gonzalez, J.M., Chaparadza, R., Vigouraux, M.: Engineering Future Network Governance. Journal of ICT Future Technologies 36(204), 23–30 (2010) 13. Galli, S., et al.: A Novel Approach to OSPF-area Design for Large Wireless ad-hoc Networks. In: ICC 2005: Proc. of 2005 IEEE International Conference on Communications, May 2005, vol. 5, pp. 2986–2992 (2005) 14. Hendrickson, B., Lelandy, R.: The Chaco User’s Guide Version 2.0. Tech Report SAND95-2344, Sandia National Laboratories, Albuquerque, NM 87185-1110 (July 1995) 15. N´emeth, F., R´etv´ ari, G.: The Autonomic DHCP architecture for IPv6 (September 2010), http://qosip.tmit.bme.hu/~ retvari/autonomic_dhcp.html 16. Prakash, A., Chaparadza, R., Theisz, Z.: Requirements of a Model-Driven Methodology and Tool-Chain for the Design and Verification of Hierarchical Controllers of an Autonomic Network. In: CTRQ 2010: Proc. of International Conference on Communication Theory, Reliability, and Quality of Service, June 2010, vol. 0, pp. 208–213. IEEE Computer Society, Los Alamitos (2010), ISBN: 978-0-7695-4070-2 17. EC FP7-IP EFIPSANS Project (2008-2010), http://www.efipsans.org, INFSOICT-215549
Author Index
Abril, Evaristo J. 153
Aguado, Juan C. 153
Amirouche, Loucif 113
Ary, Bálint 62
Asztalos, Domonkos 240
Badache, Nadjib 113
Benko, Peter 240
Bök, Patrick-Benjamin 269
Brewka, Lukasz 226
Brunstrom, Anna 299
Chaparadza, Ranganai 198, 240, 253, 283, 311
Deconinck, Geert 100
Dittmann, Lars 226
Djenouri, Djamel 113
Dolezel, Radek 141
Dostal, Otto 141
Durán, Ramón J. 153
Farkas, Károly 71
Fazekas, Péter 18
Fehér, Gábor 127
Fernández, Patricia 153
Fornasa, Martino 83
Gavler, Anders 226
Gurtov, Andrei 3
Horváth, Ádám 71
Hosek, Jiri 141
Hutchison, David 187
Imre, Sándor 62
Jakeman, Matthew 187
Jiménez, Tamara 153
Kaldanis, Vassilios 240, 253
Katsaros, Giannis 240
Kőrösi, Attila 47
Kukliński, Sławomir 198
Lil, Emmanuel Van 100
Lindskog, Stefan 299
Lorenzo, Rubén M. 153
Maresca, Massimo 83
Marnerides, Angelos K. 187
Máté, Miklós 47
McInnes, Allan 32
Merayo, Noemí 153
Miguel, Ignacio de 153
Móczár, Zoltán 176
Molnar, Karol 141
Molnár, Sándor 176
Németh, Felicián 198
Nordell, Viktor 226
Nungu, Amos 168
Patalas, Michael 269
Pawlikowski, Krzysztof 32
Pehrson, Björn 168
Petre, Razvan 198
Pezaros, Dimitrios P. 187
Pielken, Dennis 269
Polishchuk, Tatiana 3
Prakash, Arun 198, 311
Rajiullah, Mohammad 299
Ray, Sayan Kumar 32
Ray, Swapan Kumar 32
Rucka, Lukas 141
Simon, Csaba 240
Sirisena, Harsha 32
Sköldström, Pontus 226
Starschenko, Alexej 198, 311
Székely, Balázs 47
Takabatake, Toshinori 214
Tcholtchev, Nikolay 253, 283
Törős, István 18
Tüchelmann, York 269
Wang, Linyu 100
Wessing, Henrik 226