Engineering Internet QoS
Engineering Internet QoS
Sanjay Jha
Mahbub Hassan
Artech House Boston • London www.artechhouse.com
Library of Congress Cataloging-in-Publication Data

Jha, Sanjay.
Engineering Internet QoS / Sanjay Jha, Mahbub Hassan.
p. cm. — (Artech House telecommunications library)
Includes bibliographical references and index.
ISBN 1-58053-341-8 (alk. paper)
1. Internet—Evaluation. 2. Telecommunications—Traffic management. 3. Quality control. I. Hassan, Mahbub. II. Title. III. Series.
TK5105.875.I57 J53 2002
004.6—dc21
2002074494

British Library Cataloguing in Publication Data

Jha, Sanjay
Engineering Internet QoS. — (Artech House telecommunications library)
1. Computer engineering 2. Internet 3. Computer networks—Quality control
I. Title II. Hassan, Mahbub
621.3'981
ISBN 1-58053-341-8

Cover design by Gary Ragaglia

© 2002 ARTECH HOUSE, INC.
685 Canton Street
Norwood, MA 02062

All rights reserved. Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

International Standard Book Number: 1-58053-341-8
Library of Congress Catalog Card Number: 2002074494
To my father, Krishna, my wife, Manju, and my pets, Kookaburra (Maansi) and Possum (Pranay). Sanjay Jha To my son, Aaron. Mahbub Hassan
Contents

Preface
Chapter 1 Introduction
1.1 QoS Framework
1.2 Video-Conferencing System
1.3 Overview of Audio-Video Compression Techniques
1.3.1 Video-Compression Standards
1.3.2 Audio-Compression
1.4 End-System Considerations
1.5 Operating-System Approach
1.6 Overview of Networking and Media Technologies
1.7 End-to-End QoS in the Internet
1.8 Supporting QoS in Best-Effort Networks
1.9 Application-Level Adaptation
1.9.1 Montgomery's Destination Wait Method
1.9.2 Adaptive Audio Playout
1.9.3 Feedback Control Mechanism
1.9.4 Forward Error Correction
1.9.5 Interleaving
1.9.6 Repair at Receiver
1.10 Real-Time Protocol
1.11 Real-Time Control Protocol
1.11.1 Interarrival Jitter Calculation
1.11.2 Example: Audio Transmission in the Internet
1.12 Summary
1.13 Review Questions
Chapter 2 QoS Fundamentals
2.1 Traffic Description
2.1.1 Types of Traffic Sources
2.1.2 Traffic Parameters
2.2 QoS Specification and Contract
2.3 QoS Signaling
2.4 Packet Classification
2.5 Resource Reservation
2.6 Admission Control
2.7 Traffic Policing
2.7.1 Requirements for Traffic Policing
2.7.2 Policing Parameters
2.7.3 Policing Algorithms
2.8 Traffic Shaping
2.9 Queuing and Scheduling
2.10 Congestion Control and Buffer Management
2.11 Research Directions
2.12 Summary
2.13 Review Questions
Chapter 3 Scheduling for QoS Management
3.1 Scheduling Goals
3.2 Scheduling Techniques
3.2.1 First Come First Serve
3.2.2 Priority Queuing
3.2.3 Generalized Processor Sharing
3.2.4 Round Robin
3.2.5 Weighted Round Robin
3.2.6 Deficit Round Robin
3.2.7 Weighted Fair Queuing
3.2.8 Virtual Clock
3.3 Class-Based Queuing
3.4 Implementation Status
3.5 Research Directions in Scheduling
3.6 Summary
3.7 Review Questions
3.8 Implementation Project
Chapter 4 TCP/IP and Queue Management
4.1 Internet Protocol
4.1.1 Datagram Forwarding
4.1.2 Unreliable Delivery of Datagrams
4.1.3 Datagram Format
4.2 User Datagram Protocol
4.3 TCP Basics
4.4 TCP Segment Format
4.5 TCP Three-Way Handshake
4.6 TCP Acknowledgment
4.7 Flow Control
4.8 Congestion Control
4.8.1 Packet Loss Detection
4.8.2 Retransmission Timer
4.8.3 RTT Estimation
4.8.4 Slow Start
4.8.5 AIMD
4.8.6 TCP Tahoe/Reno/Vegas
4.9 Queue Management
4.9.1 Explicit Congestion Notification
4.9.2 Packet Drop Schemes
4.9.3 Global Synchronization Problem
4.9.4 Random Early Detection Scheme
4.9.5 Weighted Random Early Detection
4.9.6 RED with In/Out
4.9.7 Problems with RED
4.10 Research Directions
4.10.1 Blue
4.10.2 Related Work
4.11 Summary
4.12 Review Questions
Chapter 5 Integrated Services Packet Network
5.1 Intserv Aim
5.2 Application Classification
5.2.1 Elastic Applications
5.2.2 Tolerant Real-Time Applications
5.2.3 Intolerant Real-Time Applications
5.3 Intserv Service Classes
5.3.1 Controlled Load Service Class
5.3.2 Guaranteed Service Class
5.4 Flow Definition
5.5 Signaling/Flow Setup
5.6 Routing Protocol Independence
5.7 Reservation Specs
5.8 IS-Capable Router Components
5.8.1 Admission Control
5.8.2 Policing and Shaping
5.8.3 Packet Classifier
5.8.4 Packet Scheduler
5.8.5 Packet Processing
5.8.6 Traffic Control Implementation
5.9 LAN QoS and Intserv
5.9.1 QoS Problem in LAN
5.9.2 IEEE Solution for LAN QoS
5.9.3 Mapping of Intserv QoS to LAN QoS
5.10 Intserv Problems
5.11 Research Directions
5.12 Summary
5.13 Review Questions
Chapter 6 Resource Reservation Protocol
6.1 RSVP Features
6.1.1 Simplex Protocol
6.1.2 Receiver-Oriented Approach
6.1.3 Routing-Protocol Independent
6.1.4 Reservation Setup
6.1.5 Soft State Refresh
6.2 Reservation Merger
6.3 Reservation Styles
6.3.1 Wildcard Filter
6.3.2 Shared Explicit
6.3.3 Fixed Filter
6.3.4 RSVP/ns Simulation
6.4 RSVP Messages
6.4.1 PATH Messages
6.4.2 RESV Messages
6.4.3 Other RSVP Messages
6.4.4 Message Processing
6.5 RSVP Message Format
6.5.1 Session Objects
6.5.2 TSpec Object
6.5.3 AdSpec Object Class
6.5.4 AdSpec Functional Block
6.5.5 Other RSVP Objects
6.5.6 PATH Message Format
6.5.7 RESV Message Format
6.5.8 Controlled Load Flow Specification
6.5.9 Guaranteed Load Flow Specification
6.6 RSVP APIs
6.7 RSVP Problems
6.8 Other Resource Reservation Protocols
6.9 RSVP Extensions
6.9.1 Improvement-Related Extensions
6.9.2 Subnet Bandwidth Manager
6.9.3 New Application-Related Extensions
6.10 Summary
6.11 Review Questions
6.12 Implementation Project

Chapter 7 IP Differentiated Services Network
7.1 Diffserv Architecture
7.1.1 Per-Hop Behavior
7.1.2 Per-Domain Behavior
7.1.3 Existing IPv4 ToS
7.1.4 Diffserv Codepoint
7.1.5 PHB Encoding
7.2 Diffserv Router
7.3 Premium Service
7.4 Experimental Evaluation of Premium Service Under Linux
7.5 Assured Service
7.6 Open Issues with Diffserv
7.7 Diffserv Research Directions
7.8 Summary
7.9 Review Questions
7.10 Implementation Project
Chapter 8 Policy-Based QoS Management
8.1 Definition of Terminologies
8.2 Bandwidth Broker
8.3 Policy Framework
8.3.1 Policy Protocols
8.3.2 Policy Rules and Representations
8.3.3 Policy Database
8.4 Policy and RSVP
8.5 Bandwidth Broker Implementation
8.6 Internet2 and QBone
8.7 Research Directions
8.8 Summary
8.9 Review Questions
Chapter 9 ATM QoS
9.1 Why ATM Networks?
9.2 Protocol Architecture
9.3 Connections
9.3.1 Virtual Channel
9.3.2 Virtual Path
9.3.3 Permanent and Switched Virtual Circuits
9.4 Interfaces
9.5 Cell Formats
9.6 QoS Support
9.6.1 Traffic Contract
9.6.2 Traffic Descriptions
9.6.3 QoS Parameters
9.6.4 Service Classes
9.7 Adaptation Layers
9.8 IP-ATM Integration
9.8.1 ATM Deployment in IP Networks
9.8.2 Encapsulation of IP Datagrams into ATM Cells
9.9 IP-ATM QoS Mapping
9.9.1 Intserv over ATM
9.9.2 Diffserv over ATM
9.9.3 Performance Implications of QoS Mapping
9.9.4 MPLS Solution
9.10 Research Directions
9.11 Summary
9.12 Further Reading
9.13 Review Questions
Chapter 10 Multiprotocol Label Switching
10.1 Proprietary Protocols
10.2 Motivation
10.3 MPLS Basics
10.4 Conventional IP Routing
10.5 MPLS Approach
10.5.1 Label Encoding
10.5.2 TTL Handling
10.5.3 MPLS Encapsulation
10.5.4 Label Processing
10.6 Label Distribution
10.6.1 Sample Network
10.6.2 Label Binding
10.6.3 Label Allocation
10.6.4 Label Switching
10.7 Hierarchical Routing
10.8 MPLS over ATM
10.9 Traffic Engineering Using MPLS
10.9.1 Constraint Routed LSP
10.9.2 Path Resource Reservation Protocols
10.9.3 Traffic Trunk
10.9.4 MPLS Experimental Results
10.9.5 No Trunking
10.9.6 Two Trunks Using LSPs
10.10 MPLS and Latest Developments
10.10.1 Diffserv over MPLS
10.10.2 Generalized MPLS (GMPLS)
10.11 Summary
10.12 Review Questions
Chapter 11 QoS in Mobile Wireless Networks
11.1 Mobile Applications
11.2 Mobile Wireless Networks
11.2.1 Wireless LAN
11.2.2 Bluetooth
11.2.3 Cellular Networks
11.2.4 Comparison of Wireless Networks
11.3 Mobile Services over IP Networks
11.3.1 Mobile IP
11.3.2 Cellular IP
11.4 Impact of Mobility on QoS
11.4.1 Effect of Wireless Links
11.4.2 Effect of Movement
11.4.3 Limitations of Portable Devices
11.5 Managing QoS in Mobile Environments
11.5.1 Resource Reservation
11.5.2 Context-Aware Handoff
11.5.3 Application Adaptivity
11.6 Research Directions
11.7 Summary
11.8 Review Questions
Chapter 12 Future
12.1 Intserv over Diffserv
12.1.1 Motivation
12.1.2 Generic Framework for Intserv over Diffserv
12.1.3 Guaranteed Service over EF PHB
12.1.4 Controlled Load over AF PHB
12.2 QoS Routing
12.3 Resource Discovery and QoS
12.4 Virtual Private Network and QoS
12.5 Content Distribution Network and QoS
12.6 Web QoS
12.7 Billing and Charging for QoS
12.8 Final Words
12.9 Summary
12.10 Review Questions
About the Authors

Index
Preface

Engineering Internet QoS addresses the technical issues raised by the emergence of new types, classes, and qualities of Internet services. The international research community has been working for the last decade on designing solutions to the QoS problems in the Internet. The Internet Engineering Task Force (IETF) has issued many standards recommending architectures and protocols to build a QoS support infrastructure for IP networks. The volume and pace of this QoS research and development have demanded new textbooks on the topic. Although several books on QoS have been published in the last few years, no current text provides a single comprehensive source for QoS concepts, architectures, and algorithms. Most books cover the latest developments in the IETF without going into the depth of fundamentals required to provide QoS. Readers must consult several reference books to gain an in-depth understanding of the fundamental concepts needed to build QoS in the Internet. This makes it difficult to adopt any of these books as a text for a course on QoS. We have written Engineering Internet QoS to provide a comprehensive source of knowledge in the field. We have attempted to provide sufficient depth for the major QoS concepts and architectures. Simulation results are presented to help readers understand some of the difficult concepts. The book also contains several Linux/FreeBSD-based practical examples to illustrate how QoS testbeds can be set up for experimentation and future research.
ASSUMED KNOWLEDGE

Readers are expected to have a basic knowledge of data communications and TCP/IP protocols, with some exposure to IP routing.
AUDIENCE

The book is designed for use in a second course on networking, with a prerequisite of introductory networking or data communications. The course can be as specific as "QoS in the Internet." The book can also be used in some existing networking courses; possible courses for which it can be adopted include Advanced Computer Networks and High Performance Networks. Graduate students will find the sections on research directions useful for their literature surveys in the respective fields. Professionals working as network engineers, telecommunications and network software developers, R&D managers, research scientists, and network administrators will also find this book valuable for understanding QoS issues and how to implement, maintain, and deploy QoS technologies. The QoS API and traffic control examples will help engineers learn how to configure a router/switch supporting QoS. Since this book is partly based on industry short courses the authors developed, it is also suitable for internal training in telecommunications companies. Trainees will find the knowledge acquired useful in creating new products or setting new directions for their teams.
ORGANIZATION

1. Introduction: We believe that it will be easier to understand the specific architectures once readers have absorbed the fundamental issues and techniques. This chapter defines the QoS problem and QoS parameters and explains why QoS is an important problem. This is followed by the issues in providing QoS in the Internet environment. Through an example, we discuss the issues related to providing end-to-end QoS for real-time multimedia communications over the Internet. End-system as well as network issues are discussed. The standard real-time transport protocol (RTP) and its companion RTCP are also discussed, with examples of how they provide features for supporting adaptive feedback as well as jitter calculation. Readers not familiar with the QoS area will find this chapter interesting.
2. QoS Fundamentals: In order to understand the various architectures for QoS provisioning in the Internet, it is important to understand the fundamental algorithms and techniques behind them. This chapter provides an overview of fundamental issues such as traffic specification and negotiation, admission control, resource reservation, scheduling, and congestion control. A detailed treatment of shaping and policing is provided in this chapter. Highly technical details of scheduling, congestion control, buffer management, and resource reservation are provided in later chapters. Readers familiar with these basic concepts may find the specialized chapters more useful.

3. Scheduling for QoS Management: Packet scheduling is an important building block for any QoS network. This chapter provides an in-depth treatment of the topic. A variety of schedulers, including FCFS, priority queuing, round robin, fair queuing, and their variants, are discussed in detail with examples of how they can meet the delay and bandwidth requirements of flows. There is an overview of several advanced schedulers, and current research issues are described. We recommend this chapter to both novice and advanced readers.

4. TCP/IP and Queue Management: This chapter provides a brief refresher on the TCP and IP protocols. Another objective of this chapter is to give readers background on how the congestion control problem is currently solved in the best-effort Internet. We have included this material to minimize cross-referencing to other books where possible. Readers familiar with these concepts may skip this material. The later part of the chapter deals with queue management techniques used in best-effort as well as QoS-capable networks. Algorithms such as RED/RIO, wRED, and Blue are discussed in detail. An overview of current research issues relating to queue management is also given.

5. Integrated Services Packet Network: One of the earliest architectures for QoS support within the IETF is the integrated services (Intserv) model. This chapter starts with the classification and requirements of applications, followed by a description of the various components of the Intserv model. The components of an Intserv-capable router are described in detail. QoS support issues in the LAN environment as well as the mapping of Intserv QoS to LAN QoS are also covered in this chapter.

6. Resource Reservation Protocol: Although flow setup has been discussed briefly in the context of Intserv, a separate chapter has been dedicated to the
resource reservation protocol (RSVP). This chapter covers the details of the RSVP protocol design and demonstrates its usefulness in the Intserv environment through examples and simulation. Resource reservation is an active research area, and this chapter will help researchers and implementors learn the issues involved in designing any new resource reservation protocol. An overview of various RSVP extensions and related research is also provided.

7. IP Differentiated Services Network: This chapter describes the Diffserv architecture and its various elements. We also describe the components of a Diffserv router. Premium and assured services are described in detail, with an experimental evaluation of premium service using a Linux testbed. The chapter concludes with Diffserv problems and new research developments such as per-domain behavior.

8. Policy-Based QoS Management: Policy-based QoS management is emerging as a strong research and development area for the next-generation Internet. We discuss the resource allocation protocol (RAP) framework. The bandwidth broker is considered to be the oracle that has a global view of resources within a Diffserv domain. This chapter also describes the intra- and interdomain protocols used in the Internet, along with the Internet2 and QBone architectures.

9. ATM QoS: The ATM network was designed to support QoS from the start, and many carriers have deployed ATM switches in their backbone networks. This chapter provides an overview of ATM technology as well as IP/ATM integration issues. This background is essential for understanding the next chapter, on MPLS. Readers familiar with ATM networks may skip this chapter.

10. Multiprotocol Label Switching: Multiprotocol label switching has been a popular topic for developers and researchers in the QoS area. This chapter begins with the motivation behind developing this new technology. A detailed description of the MPLS protocol, the label distribution protocol, and issues related to MPLS over ATM networks is provided. Finally, we discuss traffic engineering issues, with some examples of traffic trunking in MPLS networks.

11. QoS in Mobile Wireless Networks: Mobile wireless technology has experienced the same level of growth as the Internet. A vast topic like this deserves a separate book. We start with a discussion of applications and their QoS requirements in the wireless Internet. We also provide a high-level overview of
measures currently being proposed to address the QoS issue in the wireless Internet.

12. Future: Besides the various architectures discussed in this book, several new architectures are evolving in the Internet. It is impossible to cover all of these new developments in a single book. We start this chapter with a detailed discussion of Intserv over Diffserv, followed by an overview of QoS routing, VPNs and QoS, content distribution networks, and billing and charging for QoS. Each of these sections provides a list of key references for further reading.

Each chapter provides a section on either future research directions or the latest developments in the area. Readers keen to explore these areas further will find these sections and references very useful. We have structured the chapters in a way that lets readers absorb basic concepts before jumping into architectural issues. Each chapter is self-contained; we have briefly repeated some concepts in a few chapters to keep them self-contained, providing forward/backward references for details. This allows readers to carry on with the current chapter without breaking their continuity.
ON-LINE SUPPORT

On-line support material for each chapter, and presentation foils (for adopters of the text only), will be available from the following URLs:

http://www.cse.unsw.edu.au/qosbook
https://hhp.bvdep.com/artechhouse/Default.Asp?Frame=Book.Asp&Book=1-58053-341-8
ACKNOWLEDGMENTS

First of all, we would like to express our gratitude to the anonymous reviewer for the extensive reviews of the chapters and the comments on the book's organization. A number of other people helped with this book. Part of this book came from our joint work with Professor Raj Jain of Ohio State University. Muneyb Minhazuddin from Avaya Communications provided helpful suggestions on Diffserv. Professor William Atwood from Concordia University provided suggestions
on various aspects of this book. Our graduate students William Lau, Jahan Hassan, Alfandika, and Monir Hossain read some of the chapter drafts. Jim Wu provided graphics for the Diffserv chapter. Abdul Aziz Mustafa helped with the NS simulation setup. Matt Chalmers and Shaleeza Sohail provided references for the billing and charging as well as policy-based management sections. Filip Rosenbaum provided experimental results for MPLS and helped with the Linux TC examples. This book has been influenced by the authors of the books, articles, and RFCs listed in the reference list after each chapter. The MPLS chapter has been influenced by Bruce Davie's MPLS tutorial at Globecom '98. We are indebted to our employer, the University of New South Wales, and our head of school, Professor Arun Sharma, for their flexibility and encouragement, which enabled us to write this book. We acknowledge Mark Walsh, Barbara Lovenvirth, Judi Stone, Jen Kelland, Susanne Schott, and others at Artech House Publishers for their wonderful support. Geoff Oakley helped with LaTeX formatting. Finally, we extend our gratitude to our families for their continual support throughout the entire project.

Sanjay Jha
Mahbub Hassan
July 2002
Chapter 1
Introduction

There has been a dramatic increase in the processing power of workstations and the bandwidth of high-speed networks. This has given rise to new real-time applications such as multimedia. These applications have traffic characteristics and performance requirements that are quite different from those of existing data-oriented applications. Some examples of these applications are desktop teleconferencing, multimedia mail and documents, remote surveillance, and video on demand. Applications have different quality of service (QoS) requirements. For example, video-on-demand applications can tolerate moderate end-to-end delay but require high throughput and a very low error rate. In contrast, Internet telephony needs very low end-to-end latency but only moderate throughput, and a slightly higher error rate (than VoD) is acceptable. The Internet, in the past, has provided only best-effort service with no predictable performance. The term QoS has been used primarily in the networking community to define a set of network performance characteristics such as delay, jitter, bit error rate, packet loss, and more. With new multimedia services over packet networks such as the Internet, the concept of QoS involves not only the network but also the end systems.
1.1 QoS FRAMEWORK

The early 1990s saw a large number of frameworks being proposed for supporting QoS over the Internet [1, 2, 3, 4, 5]. With the IETF's efforts to come up with standard frameworks such as Intserv and Diffserv, work by individual groups on proposing new architectures has slowed down. Figure 1.1 shows an abstracted view of the common elements of these architectures to support end-to-end QoS, consisting of user, application, and system levels, adapted from Nahrstedt and Steinmetz [6]. At each of these levels, QoS needs to be specified for the level below. The user-specified QoS needs to be translated into layer-specific parameters, and then the application and system levels need to ensure that the QoS expectations of the user are met.
Figure 1.1 QoS framework: perceptual QoS at the user level, application QoS at the application level, and system QoS (device QoS for multimedia devices, network QoS for the network subsystem) at the system level (Source: [6]).
Users of multimedia applications are the ultimate decision makers on what they perceive as a good quality of transmission. For example, the audio/visual synchronization parameter requires coordinating the timing of the two streams. The term lip synchronization applies to synchronizing audio with the movement of a person's lips. If data is out of synchronization, human perception tends to identify the presentation as artificial, strange, and annoying. These user-perception parameters must be mapped to lower-level, technology-based parameters. Application QoS parameters could include media quality, end-to-end delay requirements, inter/intrastream synchronization, and others derived from the user's QoS specifications. Application parameters in turn must be mapped into system-level QoS. System-level QoS has two components: device-level QoS specifies timing and throughput requirements, while network-level QoS covers parameters such as those defined in Table 1.1. A brief description of these parameters is provided below:

Delay: Time it takes for a message to be transmitted;

Response time: Round-trip time from request transmission to reply receipt;

Jitter: Variation in delay or response time;

Systems-level data rate: Bandwidth required or available, in bits or bytes per second;

Application-level data rate: Bandwidth required or available, in application-specific units such as video frame rate;

Transaction rate: Number of operations requested or processed per second;

Mean time to failure (MTTF): Normal operation time between failures;

Mean time to repair (MTTR): Downtime from failure to restarting next operation;

Mean time between failures (MTBF): MTBF = MTTF + MTTR;

Percentage of time available: MTTF/(MTTF + MTTR). These parameters typically form part of the service level agreement (SLA) between a network service provider and a service user. For the Internet, this typically refers to the availability of the access link to the service provider;

Packet loss rate: Proportion of total packets that do not arrive as sent, e.g., lost because of congestion in the network;

Bit error rate: Proportion of total data that does not arrive as sent because of errors in the network transmission system. For example, the bit error rate increases if transmission speed is increased over a telephone line.

Table 1.1 Network QoS Parameters

Category      Parameters
Timeliness    Delay; Response time; Jitter
Bandwidth     Systems-level data rate; Application-level data rate; Transaction rate
Reliability   Mean time to failure (MTTF); Mean time to repair (MTTR); Mean time between failures (MTBF); Percentage of time available; Packet loss rate; Bit error rate

Source: [7]

QoS Translation

As we have seen in Figure 1.1, QoS parameters need to be translated between levels. For example, application-level QoS parameters such as frame rate, size of video window, and quality need to be mapped to network-layer QoS parameters such as bandwidth and delay. A description of some parameters related to user perception and their mapping to application/system parameters is presented in Table 1.2.

Table 1.2 Perceptual QoS Parameters

Perceptual Parameter           System Parameter
Picture detail                 Pixel resolution
Picture color accuracy         Color information per pixel
Video rate                     Frame rate
Video smoothness               Frame rate jitter
Audio quality                  Audio-sampling rate and number of bits
Audio-video synchronization    Video and audio streams synchronized (e.g., lip-sync)

Source: [7]
The task of QoS translation is nontrivial. For example, an application may specify a video frame rate and frame size that get mapped to a transmission data rate. If the network subsystem is unable to meet these requirements, a newly adjusted data rate may result in either lower image quality or a reduced frame rate. This would require some renegotiation of parameters with the user. Many research papers have been published in this area [8, 9, 10]. Details of these works are beyond the scope of this book.
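As an illustration of such a mapping, the sketch below (our own, with illustrative structure and parameter names, not a scheme from the works cited above) translates application-level video parameters into a network-level bandwidth requirement. A real translator would also derive delay and jitter bounds and support renegotiation.

#include <stdio.h>

/* Application-level QoS parameters for a video stream
 * (an illustrative subset). */
struct app_qos {
    int width, height;       /* frame size in pixels       */
    int bits_per_pixel;      /* color depth                */
    double frames_per_sec;   /* frame rate                 */
    double compression;      /* expected compression ratio */
};

/* Translate application-level QoS into a network-level
 * data rate requirement in bits per second. */
static double required_bps(const struct app_qos *q)
{
    double raw = (double)q->width * q->height *
                 q->bits_per_pixel * q->frames_per_sec;
    return raw / q->compression;
}

int main(void)
{
    struct app_qos q = { 640, 480, 24, 30.0, 20.0 };  /* 20:1 compression */
    printf("Required bandwidth: %.2f Mbps\n", required_bps(&q) / 1e6);
    return 0;
}

If the network cannot admit the computed rate, the translator would lower the frame rate or frame size and recompute, which is precisely the renegotiation loop described above.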
1.2 VIDEO-CONFERENCING SYSTEM

We describe a video-conferencing system with a view toward understanding the various components of this system that are responsible for providing end-to-end quality of service. In a video-conferencing system, the video source, such as a camera, converts analog signals to digital information via an analog-to-digital converter (ADC). The digital images can then be manipulated, stored, or transmitted in digital form. At the receiving end, the digital information must be transformed back into analog form via a digital-to-analog converter (DAC). Figure 1.2 shows the video data flow from the camera of the sender to the video display unit of the receiver. It shows analog video being digitized and placed in a frame buffer. This function is performed by special hardware cards called frame grabbers. These cards may also have the capability to compress the video in various formats. Alternatively, the compression can take place in a software encoder. Most of the time, the video data sent over the network is already compressed in order to save bandwidth. At the receiving end, decompression can take place either in hardware or in software.

Figure 1.2 Video data flow (sender: analog video, digitized video in frame buffer, application buffer, network buffer; receiver: the reverse path up to the display).

All multimedia applications need to process and exchange large volumes of information. An uncompressed ten-minute video clip, for example, consumes 22 Gbytes of storage, and its voice-grade audio requires another 5 Mbytes. An uncompressed National Television Standards Committee (NTSC) standard composite signal represents almost 2 Mbytes of pixel data per second. This number is almost 20 times larger for high-definition television [11]. There is a need to reduce the data to be processed or transmitted by end systems. Section 1.3 presents an overview of compression methods used for audio-video transmission. Section 1.4 considers the capabilities of end systems to process multimedia streams. The bandwidth requirement of video streams is very high. The network transferring the continuous media stream between two end systems may have limited bandwidth. It also may introduce variability in end-to-end delay. Section 1.5 provides a description of operating system issues. Section 1.6 looks at the characteristics of the different networking technologies available and their suitability for the task of transporting multimedia streams.
1.3 OVERVIEW OF AUDIO-VIDEO COMPRESSION TECHNIQUES

The biggest challenge posed by digital video is the volume of data involved and how it bears on storage, transmission, throughput, and display. A video image of the size of 640 x 480 pixels with a resolution of 24 bits per pixel and a standard NTSC frame rate of 30 frames per second represents a little over 26 MB of data per second. At this rate, a 1-GB hard disk could store only about 38 seconds of video. On top of this, audio consumes more resources.

1.3.1 Video-Compression Standards

The most promising solutions for integrating video and computers center on various compression technologies. Due to the massive amount of data involved and the need for a high compression ratio, most approaches are lossy in nature and take advantage of the artifacts of the human visual system. One prominent approach averages areas to take advantage of the human vision system's lack of sensitivity to slight changes. Most of the compression methods are adaptive so they can be implemented to optimize for a given series of images [12]. The other area of interest is motion compression. In scenes that involve little action, significant data reduction can be achieved by saving only the pixels that change from frame to frame. Most video compression solutions combine various approaches such as these to yield optimum results. Figure 1.3 shows a tree diagram of existing standards based on [12].
Figure 1.3 Multimedia standards: ITU-T P*64 (H.261 motion video; H.263 low bit-rate visual telephony; G.711, G.722, G.728 audio), ISO JPEG (JPEG baseline image; MJPEG motion JPEG), ISO MPEG (MPEG-1 stored video; MPEG-2 HDTV, an extension of MPEG-1; MPEG-4 motion video; MPEG audio layers 1, 2, 3), and proprietary standards and others (Intel's Indeo and DVI; Apple's QuickTime; Microsoft's AVI; Cell-B; nv; nvdct).
ITU-T Recommendation H.261 has been developed with the aim of video transmission at p x 64 Kbps (p in the range of 1 to 30). The possible applications of this standard include video phone and videoconferencing. Joint photographic experts group (JPEG) is a coding scheme for still images. The standard may be applied to various requirements such as image storage, facsimile, desktop publishing, medical imaging, electronic digital cameras, and so forth. This standard provides several coding modes, from basic to sophisticated, according to application fields. Motion video can also be achieved by transmitting, say, 30 still JPEG images per second.
Table 1.3 Bandwidth Requirements for Video

Encoding         Bandwidth        Resolution    Standard
H.261            64 Kbps-2 Mbps   177x144       QCIF (conference)
                                  352x288       CIF (VHS quality)
M-JPEG           3-8 Mbps         352x288       CIF (VHS quality)
                 15-25 Mbps       720x486       CCIR601 (PAL)
                 60-100 Mbps      1920x1080     HDTV
MPEG-1           1.2-3 Mbps       352x288       CIF (VHS quality)
                 5-10 Mbps        720x486       CCIR601 (PAL)
                 20-40 Mbps       1920x1080     HDTV
MPEG-2 (H.262)   1-2 Mbps         352x288       CIF (VHS quality)
                 4-5 Mbps         720x486       CCIR601 (PAL)
                 8-10 Mbps        960x576       EDT
                 20-30 Mbps       1920x1080     HDTV
This feature has been used in some video systems, which call it motion JPEG (M-JPEG). The motion picture experts group (MPEG) is working on standards for motion video. MPEG-1 aims at achieving plausible video and audio recording and transmission at about 1.5 Mbps. Compact-disc and laser-disc players using the MPEG-1 audio-video decoders have already entered the market. MPEG-2 is an extension of the MPEG-1 standard for higher bit-rate applications, including telecommunications, broadcasting, and high-definition television (HDTV) services. MPEG-4 and H.263 are the proposed standards for very low bit-rate visual telephony (< 64 Kbps). Applications of these standards include video telephone, multipoint video-conferencing systems, audiovisual database access, and remote monitoring and control. Besides these standards there are proprietary standards such as Intel's Indeo, Apple's QuickTime, and Microsoft's Audio Video Interleave (AVI). Table 1.3 summarizes the bandwidth required by a variety of these services. Depending upon the compression technique used, the bandwidth ranges from 64 Kbps for H.261 to tens of Mbps for the HDTV standards.
1.3.2 Audio-Compression

Audio is formed by analog sine waves that tend to repeat for milliseconds at a time. This repetitive nature of audio signals makes them ideal for compression. Schemes such as linear predictive coding and adaptive differential pulse code modulation (ADPCM) can achieve 40 to 80% compression. ITU-T (formerly CCITT) is the main source of audio standards. A brief description of some standards and required data rates is provided in Table 1.4. Technical details of these audio and video compression techniques, products available in the market, and references to standards can be found in [12, 13].

Table 1.4 Bandwidth Requirements for Audio

Coding Technique                           Standard    Data Rate (Kbps)
PCM (pulse code modulation)                G.711       64
4-bit ADPCM (adaptive differential PCM)    G.726       32
2-bit ADPCM                                G.726       16
CELP (code-excited linear-predictive)      G.728       16
Adaptive CELP                              G.729       8
Part of H.324                              G.723.1     5.3 or 6.3
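The first rows of Table 1.4 follow directly from the sampling parameters: telephone-quality audio is sampled 8,000 times per second, so

G.711 PCM:    8,000 samples/s x 8 bits/sample = 64 Kbps
4-bit ADPCM:  8,000 samples/s x 4 bits/sample = 32 Kbps
2-bit ADPCM:  8,000 samples/s x 2 bits/sample = 16 Kbps

The CELP-family coders reach still lower rates by transmitting speech-model parameters rather than quantized samples.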
1.4 END-SYSTEM CONSIDERATIONS

In Section 1.3, a range of compression/decompression standards was discussed. End systems require dynamic decoding of a multimedia stream before it can be presented to the user. The decoding and playback of a compressed multimedia stream require a very powerful central processing unit (CPU). Alternatively, special hardware such as the Intel i750B chip supporting the DVI standard or digital signal processing (DSP) hardware is required. Processors such as Sun Microsystems' SPARC, DEC's Alpha, and Intel's Pentium are very powerful. They can perform decompression and playback management in the CPU itself [12]. However, the speed of memory access is a bottleneck. Caches are used by CPUs to temporarily store commonly used code and data from slower memory. Video playback needs sufficient CPU time at regular intervals so that it can perform 30 repetitions (for the NTSC system) of the grab, decompress, and display sequence every second. Figure 1.4 shows the amount of compute time needed to decompress and display JPEG video frames from two different clips on a Sun Ultra 1 workstation. It is evident that the compute time needed for each frame within a stream varies and also that the CPU requirements of the two streams are different. It becomes hard to predict the CPU requirements for video streams.
Figure 1.4 Decompression time for JPEG (time in ms versus packet sequence number): (a) Lion King; (b) Song.
1.5 OPERATING-SYSTEM APPROACH

The processing requirements of multimedia applications are very dynamic. For example, video frames may need to be played every 40 ms (25 fps), but the processing requirements (compression/decompression) of each frame in the stream vary substantially. Most of the work to support real-time applications has been done for embedded real-time systems, where application timing requirements are periodic and static. The majority of computers connected to the Internet and used for multimedia sessions run general-purpose operating systems such as UNIX or Microsoft Windows. Most multimedia applications are CPU bound, and the time-sharing scheduling scheme of these systems lowers the priority of such applications continually over time. Time-sharing operating systems have unpredictable response times to generate, process, and display continuous media [14]. This may result in high levels of delay and jitter. In the absence of real-time support, event timings become more
skewed as the workloads on the receiving systems increase [15]. To address this problem, some experiments have been performed running multimedia applications as high-priority threads. Mauthe and others implemented such applications using high-priority threads on PS/2 workstations running OS/2 [16]. Nieh et al. [17] studied the fixed-priority real-time scheduler built into UNIX SVR4. They found several problems with this approach: with higher priority given to video applications, the X server could not display images in time. Several other researchers have attempted to modify operating system schedulers by increasing the number of preemption points in order to provide bounded dispatch latency [18, 19]. Lakshman [3] also conducted experiments with fixed-priority scheduling. He used an audio player, a video player, a Fast Fourier Transform (FFT) application, and various I/O-intensive tasks. Priorities of the audio and video players were assigned so that they achieved the needed QoS. The experiments showed that when video has higher priority, the computer stops responding to keystrokes or mouse movements and the FFT application never gets a chance to run. He also used a scheme called priority capping, in which the UNIX scheduler does not lower the priority of multimedia applications below a certain threshold. The results were similar to the fixed-priority scheme.
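For readers who want to reproduce this kind of experiment, the following minimal sketch (ours; the priority value is arbitrary) shows how a process requests fixed-priority real-time scheduling through the POSIX interface available on SVR4-derived and Linux systems:

#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp;
    sp.sched_priority = 50;   /* illustrative fixed priority */

    /* Place the calling process (pid 0 = self) in the fixed-priority
     * SCHED_FIFO class; normally requires superuser privileges. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        perror("sched_setscheduler");
        return 1;
    }

    /* From here on, the process preempts all time-sharing processes
     * and its priority is never aged down, which is why a runaway
     * video task can starve keyboard and mouse handling, as the
     * experiments above observed. */
    return 0;
}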
1.6 OVERVIEW OF NETWORKING AND MEDIA TECHNOLOGIES

Networking media are becoming faster. This applies to all forms of media. Consider copper cables. In the early 1980s, Ethernet designers (the IEEE 802.3 standards committee) had concluded that unshielded twisted pair (UTP) could not support much more than 1 Mbps, and so the 1BASE-5 standard was designed for UTP; for higher speed, coaxial cables were needed. In 1999, the same UTP cable carried 1 Gbps over four pairs, a three orders of magnitude increase in 15 years. Moving on to fibers, the fiber distributed data interface (FDDI) was designed in 1993. It allowed 100 Mbps over 2 km. In 1999, dense wavelength division multiplexing (DWDM) using 64 wavelengths of OC-192 (10 Gbps each) allowed 0.6 Tbps on a single fiber, and OC-768 (40 Gbps) was demonstrated using a single wavelength. This represents a three orders of magnitude increase in capacity in approximately 8 years. Wireless link speeds are also growing. The IEEE 802.11 standard that came out in 1998 was designed to run at 1 and 2 Mbps, which could not match the 10-Mbps wired Ethernet speeds of the time. However, just a year or two later, 11-Mbps wireless LAN products started appearing. Thus, we see that the media capacity is increasing fast, and this may lead some to argue that the QoS problem will be a short-lived one. Interestingly, the traffic on the networks is also growing exponentially. Barely a few years ago, the highest-speed home connections were limited to 28.8 Kbps. Today many households have 10-Mbps cable modem connections. Asymmetric digital subscriber lines (ADSL) and very high-speed digital subscriber lines (VDSL) allow 6 to 27 Mbps access from home on the same phone wire, resulting in increased demands on the Internet.

In the wide area networks, in the carrier market, bandwidth is still expensive, while in the local area networks, in the enterprise market, bandwidth is growing fast. New technology introduction in the LAN is controlled by the technology as well as the affordability for each organization of procuring and installing these technologies in a short time frame. Rollout of new services in the WAN is controlled more by business considerations, where the network service providers do not accept a new technology simply based on its technical merits. There are several networking solutions currently used for transmission of data, voice, images, and video. These networks provide connectivity of various characteristics. Disparity occurs in data rate, medium access protocol (switched versus shared), maximum transfer unit (MTU), connection-oriented versus connectionless operation, error rate, and so on. Table 1.5 presents a range of networks and the data rates they support. Details of these technologies can be found in [20].

Table 1.5 Data Rates Supported by Networks

Network Type                                  Bandwidth (Mbps)
Ethernet                                      10
Fast Ethernet                                 100
100VG-ANYLAN                                  100
Token Ring                                    4 to 16
Wireless LANs                                 1 to 11
Fiber distributed data interface (FDDI)       100
Distributed queue dual bus (DQDB)             45 to 140
Integrated services digital network (ISDN)    0.064 to 2.048
Asynchronous transfer mode (ATM)              155 to 622
Switched multimegabit data services           1 to 34
Frame relay                                   2
1.7 END-TO-END QoS IN THE INTERNET

Comparison of Tables 1.3 to 1.5 shows that it is possible to meet the bandwidth requirement of multimedia services over a range of available networks. Connection-oriented networks such as ATM are capable of providing varying QoS negotiated by applications. However, the problem is more complicated for the Internet. Because the Internet consists of heterogeneous networks, the quality of service provided to end systems is unpredictable. Bandwidth, delay, and jitter can vary dramatically, both in time and across destinations. More demand than resources results in queues at the resources. If there are queues, there will be those who will pay to get a preferred position in the queue. In other words, some will pay for QoS. If there is no queue, then it is not worthwhile to pay extra for QoS guarantees. The Internet provides a best-effort model. Traffic from various sources competes for resources. Most routers implement the first come first serve (FCFS) model and drop packets if their buffer is full. As is evident from past experience, this simple model has worked successfully for the past several years, supporting a very large number of users globally. Figure 1.5 shows the end-to-end delays suffered by 100 packets sent between two end systems that are 7 hops apart. Packets suffered delays ranging from 30 ms to 90 ms. This makes transmission of media such as audio and video very challenging.

Figure 1.5 End-to-end delay in the Internet (delay in ms versus packet sequence number).
1.8 SUPPORTING QoS IN BEST-EFFORT NETWORKS
The goals in designing a multimedia communication system that guarantees QoS are to meet the following properties:

Bounds on delay and jitter;
Effective utilization of bandwidth;
Acceptable error rate;
Low processing overhead for the underlying communication and end systems;
Adaptability to dynamically changing network and traffic conditions.

As discussed earlier, interactive multimedia applications such as video-conferencing require a bound on jitter in addition to a bound on delay (variation in end-to-end delay is called jitter). Delay jitter is an important performance metric for real-time traffic. For example, if the screen is not updated regularly, the user may notice flickering in the image. Similarly, if voice samples are not played out at regular intervals, voice output may sound distorted. Some factors responsible for causing jitter are:

The time required to digitize, compress, and decompress;
Variation in delay in the network transport system;
Operating system scheduling latencies.

Figure 1.6 shows the steps involved in playing out a video frame from the moment it is grabbed by the sender until it is displayed at the receiver. The grab line shows the interval between successive frame grabbing and compression. The time interval between two successive frames is variable. The next line shows the time line of frames being sent from the sender. The receive line shows the time when frames arrive at the receiver. The jitter in network delay introduces variance in the arrival of frames at the receiver. This variance can be very large in a connectionless network such as the Internet. In order to accommodate the jitter introduced by the grabber, sender, and network, each frame is delayed by an additional amount D. After this delay period, frames are played back at a regular interval, R, depending upon the frame rate. If the operating system at the receiver supports real-time guarantees, we can expect the interdisplay time to be a constant R. For non-real-time operating systems such as UNIX, this interval varies between R - r and R + r, where r is the variation in operating system latency. Bounds can be provided if the network is connection oriented and resources can be reserved in advance during the connection set-up phase.
Figure 1.6 Video playout pipeline (grab, send, receive, and playout time lines; playout under a real-time OS occurs at constant spacing R after a destination wait D, and under a non-real-time OS at spacings between R - r and R + r).
However, under the assumption that the network cannot guarantee the required bounds on delay and jitter, there is a need to accommodate the delay jitter in end systems. If the underlying network is connectionless (individual packets may take different routes), the video frames may arrive at the destination out of sequence or after the time at which they should be displayed. A gap may result if a frame is not available for display. This may affect the quality of audio-video playout. An application may reduce or eliminate delay jitter by carefully managing the process of acquiring, processing, transmitting, and displaying frames. However, this may require services from the operating system and the network transport system that are not provided in general purpose computing and networking environments.
1.9 APPLICATION-LEVEL ADAPTATION

We discuss some of the methods used for smoothing packet delay variability in the output by destination buffering. The basic idea behind each of them is to delay the display of the first packet by an amount D (called the destination wait time) so that the receiver has at least one frame buffered most of the time. This reduces the frequency and duration of gaps caused by late arrival of packets, but results in increased latency. Selection of the value of D is the key to the success of any playout algorithm.
1.9.1 Montgomery's Destination Wait Method

In this approach, the receiving system selects a destination wait time and attempts to play the received frames after this time. For each received packet, the playout time is selected as a fixed interval after the time at which the packet was produced. Packets that arrive before their playout time are placed in a buffer (to wait for the amount of destination wait time). Packets that arrive late may be considered lost, subject to playout policies. Selection of this destination wait time is a very challenging task and should take into consideration the variation in end-to-end delay. Such systems should also dynamically adjust the destination wait time. The destination wait time should be chosen to optimize the delay and loss. Figure 1.7 (adapted from [21]) shows two delay components, a fixed delay (Df) and a variable delay (Dv). The playout point is the sum of Df and Dv. It is evident from the figure that loss of packets (because of late arrival) can be reduced by increasing the playout time. For interactive communication, there is an upper limit on the acceptable end-to-end latency of packets. Hence the playout point has to be within this limit if interactivity is to be maintained. Subjective evaluations have shown that 250 ms may be an appropriate upper limit on one-way communication delay for a packet voice call in the public telephone network [22]. Montgomery suggested various ways of determining end-to-end delay and then displaying the frames based on this estimate. In a method called blind delay, the delay of the first packet is taken as the delay estimate, and the wait is chosen to optimize the loss factor. If sender and receiver clocks drift, then this factor has to be adjusted in the playout time calculation. However, this method is not suitable for WANs, where the variable delay may be too large. Round-trip measurement is another way of estimating delay. This involves measuring the round-trip delay to a destination and assuming that it is equally distributed in both directions (although this may not be true). This approach has been used for clock synchronization in distributed systems.
Figure 1.7 Impact of destination wait time on playout (percentage of packets arrived versus playout time, showing the fixed delay Df and the variable delay Dv).
Absolute timing uses synchronized clocks, which means that source and destination both use the same absolute timing reference. The network time protocol (NTP) [23] facilitates synchronization of clocks in distributed systems to a granularity of a few milliseconds. The task of clock synchronization is becoming easier with the use of the global positioning system (GPS) [24]. The GPS system is a space-based radio-positioning system consisting of a collection of satellites owned by the U.S. government that provide navigation and timing information to military and civilian users around the world. Signals from these satellites are so accurate that the time can be estimated to within tens of microseconds. GPS receivers are already available in the market and can be fitted to PCs and workstations. As the price of these receivers goes down, they will become ubiquitous. The added variable delay method suggests the use of an additional delay stamp field in the packet header. Each source of variable delay (queuing and processing) adds the delay it introduces to this field using its local clock (time the packet was sent - time the packet was received). The playout time of packets is calculated using the local clock of the receiver as follows:
!#"%$ $'&)(+*-,./"%01*023$ $'&4(+*5 (+6 7*#89:#&'<;/8* 7=*#=
(1.1)
18
Engineering Internet QoS
where
(6 7=*#8>,
Maximum destination wait time
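In code, the computation of (1.1) is a one-liner; the sketch below (ours, with illustrative names) assumes all quantities are expressed on the receiver's clock:

/* Added variable delay method, (1.1): play the packet out after it
 * has waited for whatever portion of the maximum destination wait
 * time it did not already spend queued inside the network. */
double playout_time(double arrival,      /* packet arrival time        */
                    double delay_stamp,  /* accumulated variable delay */
                    double d_max)        /* max destination wait time  */
{
    return arrival + (d_max - delay_stamp);
}

A packet whose delay stamp already exceeds d_max is late and is handled by the playout policy (typically discarded).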
Montgomery also recommends adaptation of the delay estimate as the call progresses so that the destination wait time can be changed during a silence interval. Silence intervals can be compressed or expanded without compromising the quality of speech. The Pandora system [25] uses clawback buffers to place packets as they arrive. These buffers are designed to remove the effects of jitter and are placed as close to the destination as possible. Some ways have been suggested for deciding how much to buffer, which in turn accommodates jitter.

1.9.2 Adaptive Audio Playout

In this section we describe an adaptive method of audio playout that is used by most MBONE audio applications [26]. The playout time for the first packet of a talk-spurt is calculated using:

p_i = s_i + d_i + K * v_i    (1.2)

where

p_i = playout time for packet i
s_i = send time for packet i
d_i = adapted delay value
v_i = jitter estimate
K = a constant multiplier that determines how many multiples of the jitter estimate are added to obtain safe playout delays. This parameter can be used to control the delay-versus-loss tradeoff in the playout algorithm.

The playout point p_j of all subsequent packets j in a talk-spurt is calculated using the following equation:

p_j = p_i + (s_j - s_i)    (1.3)

The adapted delay value d_i and jitter v_i are estimated using the following equations suggested by Jacobson [27]:

d_i = a * d_{i-1} + (1 - a) * n_i    (1.4)

v_i = a * v_{i-1} + (1 - a) * |d_i - n_i|    (1.5)

where

a = weighting factor
n_i = r_i - t_i
r_i = arrival time of packet i
t_i = transmit time of packet i

d_i and v_i are updated for each packet but are used only at the beginning of each talk-spurt. Ramjee et al. also analyzed various methods of delay adaptation. One method used a different weighting factor for increasing (0.75) and decreasing (0.998002) trends. They also tried the same weighting (0.998002) for the delay as well as the jitter estimate. In a slightly different approach, they used the minimum packet delay from the previous talk-spurt as the value of d for the playout calculation. The best results were achieved with different weighting factors for the delay and jitter estimates.
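The complete adaptive playout algorithm of (1.2) to (1.5) fits in a few lines of C. The sketch below is ours; it uses one of the weighting factors quoted above and treats K as a tunable knob for the delay-versus-loss tradeoff:

#include <math.h>

struct playout {
    double d;      /* adapted delay estimate d_i          */
    double v;      /* jitter estimate v_i                 */
    double alpha;  /* weighting factor, e.g., 0.998002    */
    double K;      /* jitter multiplier in (1.2)          */
};

/* Update the estimates for every arriving packet, per (1.4)-(1.5). */
void playout_update(struct playout *p, double t_send, double t_arrive)
{
    double n = t_arrive - t_send;                 /* packet delay n_i */
    p->d = p->alpha * p->d + (1 - p->alpha) * n;
    p->v = p->alpha * p->v + (1 - p->alpha) * fabs(p->d - n);
}

/* Playout time for the first packet of a talk-spurt, per (1.2). */
double playout_first(const struct playout *p, double s_i)
{
    return s_i + p->d + p->K * p->v;
}

/* Playout time for a later packet j of the same talk-spurt, per (1.3). */
double playout_next(double p_i, double s_i, double s_j)
{
    return p_i + (s_j - s_i);
}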
1.9.3 Feedback Control Mechanism

The Internet does not provide guaranteed resources such as bandwidth, or guaranteed performance measures such as maximum delay. One way to support packet video in these networks is to use feedback mechanisms that adapt the output rate of video coders based on the state of the network. Bolot et al. [28] have shown the suitability of this approach through experiments. Packet loss at the receiver is detected using sequence numbers (if packets n, n+1, and n+3 have been received and n+2 is missing, then n+2 is considered lost). For a conference with a small number of participants, this loss information can be sent in a NACK (negative acknowledgment) packet. For a large multicast conference, it may be sent periodically (every 100 packets or at least every 2 minutes [29]) as an average packet loss rate for the period to avoid the NACK explosion problem. The output rate of the coder is reduced if the loss rate increases. The feedback control facilitates maintaining good-quality videoconferencing even across congested Internet connections, and stops senders from wasting resources by regulating the data rate. Bolot et al. also examined rate control issues in a multicast environment. This mechanism has been implemented in the H.261 video coder of IVS [30].
1.9.4 Forward Error Correction

In this scheme, some redundant information is added to the original packets so that the receiver can reconstruct the original stream if some packets are lost [31].
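One simple instance of this idea, shown below as a sketch of our own rather than a scheme from [31], is an XOR parity packet computed over a block of n equal-length media packets: if any single packet of the block is lost, XOR-ing the survivors with the parity packet reproduces it, at the cost of one extra packet per block and added playout delay of up to a block.

#include <stddef.h>

/* Compute an XOR parity packet over n equal-length packets. */
void fec_parity(const unsigned char *pkt[], size_t n, size_t len,
                unsigned char *parity)
{
    for (size_t j = 0; j < len; j++) {
        unsigned char x = 0;
        for (size_t i = 0; i < n; i++)
            x ^= pkt[i][j];       /* byte-wise XOR across the block */
        parity[j] = x;
    }
}

/* Recover a single lost packet: XOR the surviving packets of the
 * block (passed in pkt[]) with the parity packet. */
void fec_recover(const unsigned char *pkt[], size_t n_survivors,
                 const unsigned char *parity, size_t len,
                 unsigned char *lost)
{
    for (size_t j = 0; j < len; j++) {
        unsigned char x = parity[j];
        for (size_t i = 0; i < n_survivors; i++)
            x ^= pkt[i][j];
        lost[j] = x;
    }
}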
1.9.5 Interleaving

The interleaving technique is useful for audio transmission: the sender interleaves the original stream so that the loss of a single chunk does not create a large gap. Because of the resequencing, the effect of a lost chunk is spread across the reconstructed stream.

1.9.6 Repair at Receiver

This technique tries to replace a lost packet with a packet that is similar to the original packet. Audio has a large amount of short-term self-similarity, which makes the substitution easier. In the simplest cases, previous packets of audio samples can be repeated (previous frames, for video).

1.10 REAL-TIME PROTOCOL

Real-time protocol (RTP) is an application layer protocol standardized by the IETF in RFC 1889 [29]. RTP is an end-to-end protocol that uses an underlying transport layer (such as TCP or UDP) to get the packets through to the destination. RTP is not capable of providing any QoS guarantees by itself. Multimedia data is encapsulated in RTP packets and sent over the network using the UDP socket interface (most audio-video tools in the Internet use this). RTP can be used for transporting a variety of formats such as MPEG-1/2 for video, PCM audio, GSM audio, and a variety of others. RTP packets contain information such as sequence numbers, timestamps, and payload type. Applications extract the data from the RTP packet and use the information fields from the RTP header to decompress/decode and play out the media stream. Figure 1.8 shows the header format used by RTP. A brief description of the RTP header fields is provided below:

Version (V): 2 bits. Identifies the version of RTP (currently 2);

Padding (P): 1 bit. If set, the packet contains one or more padding octets at the end of the payload;

Extension (X): 1 bit. If set, the fixed header is followed by a header extension;

CSRC count (CC): 4 bits. Indicates the number of CSRC identifiers that follow the fixed header. This is relevant when a mixer is used (not discussed here);
Figure 1.8 RTP header (32-bit words: V, P, X, CC, M, payload type, and sequence number; media timestamp; synchronization source identifier; contributing source (CSRC) identifiers).
Marker (M): 1 bit. May be used for marking events such as a frame boundary (as defined in a profile);

Payload type: 7 bits. Indicates the type of encoding for audio (PCM, LPC) or video (M-JPEG, MPEG-1/2, H.261, H.263);

Sequence number: 16 bits. A number incremented by 1 for each RTP packet sent. It is useful for loss detection and packet sequencing;

Timestamp: 32 bits. A number representing the sampling instant of the first data byte in an RTP packet. Take, for example, audio sampled with an 8-kHz sampling clock: the timestamp clock increases by 1 every 125 µs. Most audio tools send 160 encoded samples (20 ms of data) per packet, so the timestamp increases by 160 for each RTP packet. The timestamp clock keeps increasing even if the source is inactive;
Synchronization source identifier (SSRC): 32 bits. An identifier uniquely identifying the source of the RTP stream (each stream in a session has a distinct SSRC). Each source picks a random number for its SSRC. In case of a collision (an unlikely event), the sources pick new SSRCs;

CSRC list, 0 to 15 items: 32 bits each. Identifies the contributing sources for the payload contained in this packet. The identifiers are inserted by a mixer using the SSRCs of the contributing sources.

Issues such as RTP header compression and multiplexing are covered in detail in [32].
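To make the field layout concrete, the following sketch unpacks the fixed 12-byte RTP header from a raw packet using the bit positions described above. It is a minimal illustration, not a complete RTP implementation: CSRC entries and header extensions are only counted, not parsed.

    import struct

    def parse_rtp_header(packet: bytes) -> dict:
        """Extract the fixed 12-byte RTP header fields from a raw packet."""
        b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
        return {
            "version":      b0 >> 6,          # 2 bits
            "padding":      (b0 >> 5) & 0x1,  # 1 bit
            "extension":    (b0 >> 4) & 0x1,  # 1 bit
            "csrc_count":   b0 & 0x0F,        # 4 bits
            "marker":       b1 >> 7,          # 1 bit
            "payload_type": b1 & 0x7F,        # 7 bits
            "sequence":     seq,              # 16 bits
            "timestamp":    ts,               # 32 bits
            "ssrc":         ssrc,             # 32 bits
        }

    # Example: version 2, payload type 0 (PCM), seq 42, timestamp 160.
    hdr = struct.pack("!BBHII", 0x80, 0x00, 42, 160, 0x1234)
    print(parse_rtp_header(hdr))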
1.11 REAL-TIME CONTROL PROTOCOL

Real-time control protocol (RTCP) is a companion protocol that multimedia applications can use along with RTP. RTCP packets are sent periodically between sender(s) and receiver(s). These packets contain statistics such as the number of packets sent, the number of packets lost, and the interarrival jitter. The loss rate can be used by senders to adapt their sending rate. RTCP is used in unicast as well as multicast communications. With a large number of receivers, there is a danger of an explosion of RTCP messages, so RTCP attempts to limit its bandwidth to 5% of the total session bandwidth. For example, if a session has been allocated a bandwidth of 2 Mbps, RTCP is limited to 100 Kbps. Further, this 100 Kbps is divided between the sender and the receivers in the ratio 1:3. In this example, the receivers get 75 Kbps in total and, assuming that there are N receivers, each gets 75/N Kbps (the sketch following this list shows the general calculation).

An RTCP receiver generates a reception report (RR) for each RTP stream that it receives. These RRs are aggregated into a single RTCP packet. The report is multicast, which means that it is received by all participants in a session. The report contains several fields, the following being relevant from the QoS point of view:

SSRC identifying the source of the RTP stream for which the report has been generated;

Fraction of packets lost in the RTP stream since the last report;

Cumulative number of packets lost;

Interarrival jitter (calculated as an average).
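The bandwidth arithmetic above can be written down directly. The sketch below reproduces the worked example; the function name and the fixed 1:3 split are illustrative (the actual standard also adapts the reporting interval to packet sizes and membership).

    def rtcp_shares(session_bw_kbps, n_receivers):
        """Split the RTCP budget (5% of session bandwidth) between sender
        and receivers in a 1:3 ratio, as in the example above."""
        rtcp_bw = 0.05 * session_bw_kbps
        sender_share = 0.25 * rtcp_bw
        receiver_total = 0.75 * rtcp_bw
        return sender_share, receiver_total, receiver_total / n_receivers

    # 2-Mbps session with 10 receivers: RTCP gets 100 Kbps in total,
    # the sender 25 Kbps, receivers 75 Kbps total and 7.5 Kbps each.
    print(rtcp_shares(2000, 10))   # (25.0, 75.0, 7.5)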
Figure 1.9 RTCP sender report header (32-bit words: V, P, RC, PT=SR, and length; source identifier; NTP timestamp, most significant word; NTP timestamp, least significant word; media timestamp).
The sender also creates a sender report consisting of the following fields, as shown in Figure 1.9:

SSRC identifying the source of the RTP stream;

NTP timestamp and media timestamp. These provide a mapping between the media timestamp (as per the discussion in the RTP section) and the network time protocol (or wall clock) time. This mapping is particularly useful for cross-media synchronization;

Number of packets and bytes sent in the stream.

In addition, the source also sends source description packets containing information such as the e-mail address of the sender, the name of the sender, the application responsible for generating the RTP stream, and the SSRC. The RTCP standard allows sending the receiver report, sender report, and source descriptors in the same packet.
1.11.1 Interarrival Jitter Calculation

We describe the interarrival jitter calculation based on RFC1889. This RFC defines the interarrival jitter as the mean deviation (smoothed absolute value) of the difference in packet spacing at the receiver compared to the sender, for a pair of packets. Equation (1.6) shows how this field can be calculated using timestamps available from the RTP header.
D(i, j) = (R_j - R_i) - (S_j - S_i) = (R_j - S_j) - (R_i - S_i)   (1.6)

where

D(i, j) = jitter between packets i and j;
S_i = RTP timestamp from packet i;
R_i = time of arrival, in RTP timestamp units, of packet i.

The value of J is calculated using the following equation:

J = J + (|D(i - 1, i)| - J)/16   (1.7)

The value of J is continuously updated for each packet i received from a source. The difference D for that packet and the previous packet i - 1 in order of arrival is used for this purpose. It is worth noting that packet arrivals may not necessarily be in sequence. The gain parameter 1/16 in Equation (1.7) provides low-pass filtering to remove the impact of spikes in the measurement.
Packet loss information can be useful in tracking persistent congestion problems. The interarrival jitter field may be useful in tracking transient congestion problems before actual loss starts. RFC1889 recommends analyzing a number of reports from one receiver over a period, or possibly from multiple receivers belonging to the same network, as the interarrival jitter field represents only a snapshot of the jitter at the time the report was generated.

1.11.2 Example: Audio Transmission in the Internet

Figure 1.10 shows how the audio signal is captured from a microphone at the sender and played back at the receiver connected over the TCP/IP network. Most PCs/workstations these days have an on-board sound card, which is capable of capturing and digitizing audio samples. Digital voice can be produced in a variety of formats, from low data rates, such as LD-CELP (16 Kbps), to high data rates, such as PCM (64 Kbps).
Figure 1.10 TCP/IP protocol stack for audio transmission (at each end: analog audio from the microphone or to the speaker; digitized audio, PCM/ADPCM; RTP/RTCP; UDP/IP; subnet protocol; the two stacks communicate over the Internet).
RTP is used to encapsulate the audio data. The RTP packets are then sent using the user datagram protocol (UDP) socket interface. UDP is preferred over TCP as the transport protocol because interactive communications such as voice telephony are more sensitive to delay than to a limited amount of packet loss.

End-to-end latency is an important QoS parameter for interactive multimedia communications. For applications such as Internet telephony, it consists of several factors. First of all, the analog audio needs to be digitized and packetized. Once the packets are ready, they are passed on to the operating system (OS) for transmission. Depending upon the scheduling mechanism (real-time versus non-real-time OS), a packet will incur some amount of delay before being transmitted. Transmission delay depends on link speed (typically 10/100 Mbps in today's LAN environment). Link length determines propagation delay (this is very small in comparison to other factors). As the packets are forwarded by routers, they incur a variable amount of delay (depending upon the processing speed and load of the routers). At the receiver, OS scheduling, depacketization, and playout delays are added. For interactivity, the playout delay should ideally be low (50-150 ms). Studies have shown that communication becomes unintelligible if the end-to-end latency increases beyond 400 ms.
In a best effort network such as the Internet, there is no bound on delay, which results in a large variance in delay (jitter). Packets arrive at variable intervals at the receiver. If packets are played out with variable interpacket (interframe, for video) gaps, the resulting playout is not smooth (the result may be a jerky video stream or annoying audio). Most Internet multimedia tools currently deploy the receiver buffering techniques described earlier: a limited number of packets are buffered before playout of the stream starts. For interactive applications there is an upper bound on end-to-end latency. Buffering a large number of packets may cause this latency bound to be exceeded, but buffering too few may result in pauses in playout because of an empty buffer. Several playout algorithms have been proposed in the literature for playout adaptation [33, 34].
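As a concrete illustration, the sketch below follows the spirit of the adaptive playout algorithms in [26, 33]: exponentially weighted estimates of the network delay and its variation are maintained, and the playout deadline for the first packet of each talk-spurt is set a few deviations beyond the delay estimate. The specific weight and safety factor are illustrative assumptions, not a prescribed standard.

    ALPHA = 0.998002   # weighting factor for the delay estimate (as in [26])
    K = 4.0            # safety factor on the delay variation (assumed)

    def update_estimates(d_hat, v_hat, delay):
        """EWMA update of the mean delay d_hat and variation v_hat,
        using the measured network delay of the latest packet."""
        d_hat = ALPHA * d_hat + (1 - ALPHA) * delay
        v_hat = ALPHA * v_hat + (1 - ALPHA) * abs(delay - d_hat)
        return d_hat, v_hat

    def playout_time(send_time, d_hat, v_hat):
        """Playout deadline for the first packet of a talk-spurt;
        subsequent packets keep the same offset within the spurt."""
        return send_time + d_hat + K * v_hat

A larger K trades extra buffering delay for fewer late packets, which is exactly the tension between interactivity and smooth playout described above.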
1.12 SUMMARY

In this introductory chapter we defined QoS and its parameters. We covered the factors and trends that create a need for quality of service in today's data networks, and then discussed the components of quality of service for multimedia transmission over the Internet. A brief overview of media compression techniques, OS scheduling issues, and networking technologies was given. The Internet's current best effort model is not capable of guaranteeing QoS; end-system adaptation and other techniques to improve QoS were therefore also described. The IETF RTP/RTCP protocols, which give multimedia transmissions a header useful for stream reconstruction, rate adaptation, and jitter calculation, were discussed briefly. The rest of the book is network-centric, looking at providing QoS in the core and access networks. We concentrate mostly on IP networks, with a brief introduction to ATM networks.
1.13 REVIEW QUESTIONS

1. Why is QoS such an important issue?

2. Is overprovisioning an answer to QoS in the Internet? Support your argument with a description.

3. What are user-perceived QoS parameters?
4. What are technology-oriented QoS parameters?

5. How does OS scheduling affect QoS for a multimedia session?

6. What are some of the techniques for improving QoS in a best effort Internet?

7. Describe the basic elements of an audio playout algorithm.

8. Why was there a need for a protocol like RTP?

9. Why is there a need to restrict the bandwidth consumed by RTCP? How does RTCP achieve scaling in a multicast session?

10. Describe the protocol stack used by an interactive audio application in the Internet. Why is UDP used at the transport layer?
References

[1] Andrew Campbell and Geoff Coulson. Implementation and evaluation of the QoS-A transport system. In Walid Dabbous and Christophe Diot, editors, Protocols for High-Speed Networks V, pages 201–218, Inria, France, October 1996. Chapman and Hall.

[2] Klara Nahrstedt and Jonathan M. Smith. The QoS Broker. Technical Report MS-CIS-94-13, University of Pennsylvania, March 1994.

[3] K. Lakshman. AQUA: An Adaptive Quality of Service Architecture for Distributed Multimedia Applications. PhD thesis, Computer Science Department, University of Kentucky, Lexington, Kentucky, January 1996.

[4] P. Florissi and Y. Yemini. Management of application quality of service. In International Workshop on Distributed Systems Operations and Management, Toulouse, France, October 1994.

[5] M. Fry, V. Witana, P. Ray, and A. Seneviratne. Managing QoS in multimedia services. Journal of Network and Systems Management, 5(3), September 1997.

[6] K. Nahrstedt and R. Steinmetz. Resource management in networked multimedia systems. IEEE Computer, 28(5):52–63, May 1995.

[7] Morris Sloman and Dan Chalmers. A survey of quality of service in mobile computing environments. In IEEE Communications Surveys, pages 2–10, 1999.

[8] M. Alfano and R. Sigle. Controlling QoS in a collaborative multimedia environment. In Proceedings of the Fifth IEEE International Symposium on High-Performance Distributed Computing (HPDC-5), Syracuse, NY, August 1996.

[9] S. Fischer and R. Keller. Quality of service mapping in distributed multimedia systems. In IEEE International Conference on Multimedia Networking (MmNet95), pages 132–141, Aizu-Wakamatsu, Japan, September 1995.

[10] B. Landfeldt, A. Seneviratne, and C. Diot. User services assistant: An end-to-end reactive QoS architecture. In Proceedings of IWQoS'98, pages 177–186, Napa, California, USA, May 1998.
[11] L. Palmer, R. Palmer, P. Callahan, and J. Marsh. The arrival of desktop teleconferencing. White paper, October 1991. Digital Equipment Corporation.

[12] P. K. Andleigh and K. Thakkar. Multimedia Systems Design. Prentice Hall, Upper Saddle River, NJ, 1996.

[13] K. R. Rao and J. J. Hwang. Techniques and Standards for Image Video and Audio Coding. Prentice Hall, Upper Saddle River, New Jersey, 1996.

[14] Newton Faller. Measuring the latency time of real-time Unix-like operating systems. Technical Report TR 92-037, International Computer Science Institute, Berkeley, California, June 1992.

[15] K. Fall, J. Pasquale, and S. McCanne. Workstation video playback performance with competitive process load. In Proc. International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), pages 179–182, Durham, New Hampshire, April 1995. Springer.

[16] A. Mauthe, W. Schulz, and R. Steinmetz. Inside the Heidelberg Multimedia Operating System Support: Real-Time Processing of Continuous Media in OS/2. Technical Report 43.9214, IBM, European Networking Center, Heidelberg, Germany, September 1992.

[17] Jason Nieh, James G. Hanko, J. Duane Northcutt, and Gerard A. Wall. SVR4 UNIX scheduler unacceptable for multimedia applications. In Proceedings of the 4th International Workshop on Network and Operating System Support for Digital Audio and Video, pages 41–53, Lancaster, U.K., November 1993. Lancaster University. Lecture Notes in Computer Science 846.

[18] Sape J. Mullender, Ian M. Leslie, and Derek McAuley. Operating system support for real-time multimedia communication. In Proceedings of Usenix Summer Conference, Boston, Massachusetts, June 1994.

[19] Sandeep Khanna, Michael Sebree, and John Zolnowsky. Realtime scheduling in SunOS 5.0. In Proceedings of Usenix Winter Conference, San Antonio, TX, June 1992.

[20] William Stallings. Data and Computer Communications. Prentice Hall, Upper Saddle River, New Jersey, 1997.

[21] Warren A. Montgomery. Techniques for packet voice synchronization. IEEE Journal on Selected Areas in Communications, SAC-1(6):1022–1028, December 1983.

[22] K. Nahrstedt and R. Steinmetz. Resource management in networked multimedia systems. In IEEE Computer, pages 53–63, May 1995.

[23] David L. Mills. Measured performance of the network time protocol in the DARPA/NSF Internet system. ACM Computer Communication Review, 20(1), January 1990.

[24] P. H. Dana. Global positioning system overview. http://www.colorado.edu/geography/gcraft/notes/gps/gps_f.html, Department of Geography, University of Colorado at Boulder, 2000.

[25] Alan Jones and Andy Hopper. Handling audio and video streams in a distributed environment. Technical Report TR 93-4, Olivetti Research Laboratory (ORL), Cambridge, England, 1993. Proceedings of 14th ACM Symposium on Operating System Principles, OSR, Vol 27, No 5, December 1993.
[26] Ramachandran Ramjee, Jim Kurose, Don Towsley, and Henning Schulzrinne. Adaptive playout mechanisms for packetized audio applications in wide-area networks. In Proceedings of the Conference on Computer Communications (IEEE Infocom), pages 680–688, Toronto, Canada, June 1994. IEEE Computer Society Press, Los Alamitos, California.

[27] Van Jacobson. Congestion avoidance and control. ACM Computer Communication Review, 18(4):314–329, August 1988.

[28] Jean-Chrysostome Bolot and Thierry Turletti. A rate control mechanism for packet video in the Internet. In Proceedings of the Conference on Computer Communications (IEEE Infocom), pages 1216–23, Toronto, Canada, June 1994.

[29] Henning Schulzrinne, Stephen Casner, Ron Frederick, and Van Jacobson. RTP: A transport protocol for real-time applications. Internet Request for Comment RFC1889, IETF, January 1996.

[30] Thierry Turletti. The INRIA videoconferencing system IVS. Connexions, 8(10):20–24, October 1994.

[31] Jean-Chrysostome Bolot and Andres Vega Garcia. Control mechanisms for packet audio in the Internet. In Proceedings of the Conference on Computer Communications (IEEE Infocom), pages 232–239, San Francisco, California, March 1996.

[32] J. Crowcroft, M. Handley, and I. Wakeman. Internetworking Multimedia. Morgan Kaufman Publishers, San Francisco, California, 1999.

[33] S. K. Jha and P. A. Wright. Playout management of interactive video – an adaptive approach. In Proceedings of IWQoS'97, pages 145–156, New York, USA, May 1997.

[34] Vicky Hardman, Angela Sasse, Mark Handley, and Anna Watson. Reliable audio for use over the Internet. In Inet'95, Honolulu, Hawaii, June 1995.
Chapter 2

QoS Fundamentals

Supporting QoS in packet switching networks requires specialized infrastructure to be designed and developed. In this chapter, we discuss the fundamental concepts, issues, and algorithms required to implement QoS in packet switching communication networks. In the rest of the book, we will show how these fundamental concepts form the basis for building technology-specific QoS solutions for two popular networks, IP and ATM.
2.1 TRAFFIC DESCRIPTION

The user needs to provide an accurate description of its traffic to the network, so that the network can appropriately allocate its resources to support the required QoS. One of the basic requirements of QoS, therefore, is to be able to quantitatively describe the pattern of traffic generated by a given source. In this section, we discuss the methods used for quantitative description of the various types of traffic transported over the Internet.

2.1.1 Types of Traffic Sources

The bitrate of an application is an important factor for the network in allocating resources, and the dynamics of the bitrate over time describe the behavior of a traffic source. Based on the bitrate dynamics, all applications fall into two main categories:

Constant Bit Rate (CBR). These applications send traffic at a constant rate. Many multimedia applications fall under this category. As an example, we
consider the widely used PCM coded voice, which generates traffic at 64 Kbps.

Variable Bit Rate (VBR). The traffic rates of these applications are not constant. A good example is MPEG coded video: when there is a lot of scene change, it generates many bits per second; when there is not much change, for example while a speaker is delivering a speech, the bit rate is minimal.

2.1.2 Traffic Parameters

Although it is rather trivial to describe the traffic behavior of a CBR source, it is not easy to completely describe the traffic pattern of a VBR source. Nevertheless, it is possible to bound the traffic of a VBR source using a few cleverly chosen traffic parameters. The following three parameters are commonly used to bound source traffic:

Peak Rate is the maximum data rate in any time interval. CBR traffic can be completely described using the peak rate of the traffic.

Average Rate is the "long-term" mean of the traffic rate for a VBR source.

Burst Size refers to the number of packets that can be delivered at the peak rate. VBR traffic is inherently bursty, i.e., it generates traffic in bursts. For example, when there is a quick scene change, an MPEG video codec may generate several packets within an extremely short interval. Note that such bursty traffic is a source of potential congestion and delay in the network. Hence a limit on the burst size is a useful parameter for a QoS contract.

Figure 2.1 illustrates the three traffic parameters for a typical on-off bursty VBR source, which transmits at a peak rate for a while, then goes to a dormant state (no traffic), again sends a burst of traffic at the peak rate, and so on. As the figure shows, the average rate over a long-term interval is below the peak rate, and the burst size varies over time. Although the average rate gives us a rough estimate of the bitrate of a source over a long-term interval, the peak rate and the burst size play a crucial role in defining the traffic pattern more precisely. Traffic from two different sources may have the same average rate, yet display significantly different patterns. Figure 2.2 illustrates three different traffic patterns, all having the same average rate of 10 Kbps. When the peak rate is lower, larger bursts can be accommodated to maintain the same average rate.
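These parameters can be estimated from a packet trace. The sketch below computes the long-term average rate and the peak rate over a fixed measurement window; the window size is an assumption, since the peak rate of a trace depends on the interval over which it is measured.

    def traffic_rates(trace, window=0.1):
        """trace: list of (timestamp_seconds, bytes), sorted by time,
        spanning a nonzero duration. Returns (average, peak) in bits/s."""
        total_bits = 8 * sum(size for _, size in trace)
        average = total_bits / (trace[-1][0] - trace[0][0])
        peak, start, bits = 0.0, 0, 0
        for t, size in trace:
            bits += 8 * size
            while trace[start][0] <= t - window:   # slide window forward
                bits -= 8 * trace[start][1]
                start += 1
            peak = max(peak, bits / window)
        return average, peak

    # 1,000-byte packets: a burst of five in 10 ms, then one a second later.
    trace = [(0.000, 1000), (0.002, 1000), (0.004, 1000),
             (0.006, 1000), (0.008, 1000), (1.008, 1000)]
    print(traffic_rates(trace))   # about 47.6 Kbps average, 400 Kbps peak

The gap between the two numbers is exactly what makes VBR sources hard to describe with a single rate.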
Figure 2.1 Traffic parameters (transmission rate versus time for an on-off source: bursts 1-3 are transmitted at the peak rate, with the average rate lying below the peak rate).
Figure 2.2 Different traffic patterns with the same average rate (sources with peak rates of 10, 20, and 40 Kbps).
2.2 QoS SPECIFICATION AND CONTRACT

The user must specify the required QoS using a set of QoS parameters. A comprehensive list of QoS parameters was provided in Chapter 1. The purpose of specifying the QoS is to enter a QoS contract with the network provider. A well-specified QoS contract yields the following benefits:

QoS guarantee. The user will precisely know what QoS guarantee it is going to get from the network.

Quality monitoring. The user and the network can monitor the quality of a call against the set of QoS parameters.

Charging. Differential charging can be implemented based on the values of the QoS parameters specified in the QoS contract. For example, a contract with a delay limit of 10 ms could be charged higher than one with a delay limit of 100 ms.
2.3 QoS SIGNALING

To receive the required QoS for a given communication, the user must be able to negotiate the relevant QoS parameters with the network before the communication starts. Such negotiations can be achieved in two ways:

Static Configuration. Relevant table entries are created manually. Such static QoS configurations remain valid for a long period until changed (manually) again. Due to the need for manual intervention, it is not easy to renegotiate the QoS dynamically.

Dynamic Negotiation. QoS configurations are accomplished automatically using software. Such automatic negotiation is called signaling. During signaling, the user passes its traffic and QoS parameters and their associated values to the network, and the network provides details of the QoS guarantees to the user. The communication starts after the QoS signaling is completed successfully.
2.4 PACKET CLASSIFICATION

In order to provide differing grades of service, the network devices (routers/switches) must distinguish between packets. This is particularly true for IP networks that do
not maintain a virtual circuit. Packets can be marked with a distinct code so that they receive a certain kind of treatment from a network device. Which classification criteria are used for identifying packets, and what treatment is given to which class of packets, is a policy matter and depends upon the QoS architecture. We discuss these issues in detail in the context of the Intserv and Diffserv architectures in Chapters 5 and 7, respectively. Wang [1] provides a good description of several hashing-based schemes for packet classification and a comparison of their performance.
2.5 RESOURCE RESERVATION

Unlike best effort networks, QoS networks must be properly dimensioned, with enough communication resources in place, to ensure that the negotiated QoS is maintained. For absolute QoS guarantees, it may be necessary to reserve some resources in advance of a call and release them after the call ends. The most important network resources that can be reserved are link bandwidth and buffer space. Adequate bandwidth reservation helps in maintaining the delay and jitter requirements of critical calls; reserving buffer space helps in maintaining any negotiated limit on the packet loss rate in the network. If no signaling is in place, resource allocation has to be made manually while configuring the QoS. Per-call resource reservation has been adopted in ATM networks and in the Intserv model [2] of the Internet. We revisit the resource reservation issue in the context of the Intserv architecture in Chapter 5, with a detailed discussion of the IETF Resource Reservation Protocol (RSVP) in Chapter 6.
2.6 ADMISSION CONTROL

Best-effort networks that do not guarantee QoS accept traffic from any source at any time, without question. Such uncontrolled admission is not acceptable for QoS networks, however, as a newly admitted source may jeopardize the QoS of existing connections. Admission of new calls must therefore be controlled carefully in a QoS network to protect the QoS of the calls currently in progress. The concept of controlling the admission of new calls is known as admission control. If admission control is implemented in a network, there are basically two outcomes of a new call request: it is either accepted or rejected. Such per-call admission control has been adopted in ATM networks [3].

There are two ways to implement admission control. One natural way is to exercise it automatically during QoS signaling. If no signaling is implemented,
admission control has to be performed manually, i.e., someone (usually the system administrator or the network operator) has to decide whether a new call request can be accommodated. Manual control, however, is not practical on small time scales, such as seconds, though it can work on longer time scales, such as days or months. If manual control is used, it may not be applied on a per-call basis; some sort of aggregation is more practical.

One of the key challenges is to make sure that adequate resources are allocated to maintain the negotiated QoS. The question is how much resource should be considered adequate. For CBR sources, a bandwidth reservation at the peak rate is sufficient and appropriate. For VBR sources, the answer is not so easy. If the peak rate is reserved, bandwidth will be wasted, as VBR sources do not send traffic at the peak rate all the time. If the average rate is reserved, packets will be delayed during bursts. The answer definitely lies between the average and the peak rate, but one must devise a more precise algorithm that can quantitatively relate a given reservation to QoS parameters such as delay and loss.

2.7 TRAFFIC POLICING

Since a user enters a "QoS contract" with the network, the QoS network must police all entering traffic to detect any violation of the negotiated contract. The policing is performed on each packet entering the network. Figure 2.3 illustrates the concept of policing in a QoS network. For each packet, the policing function must detect whether the packet conforms to the contract or not. If it conforms, the packet is admitted to the network without further ado. Note that the network has a responsibility to maintain the negotiated QoS only for the conforming packets.

What can be done with the "illegal" packets trying to enter the network in violation of the contract? The policing function can either drop them unceremoniously, or accept them as lower priority packets that will be dropped first when congestion occurs. If no congestion occurs, these packets can make use of the available bandwidth in the network.

Policing must be implemented at the edge (or entrance) of the network. Therefore, only the routers with direct connections to users should implement traffic policing; no policing is needed in the core routers inside the network.

2.7.1 Requirements for Traffic Policing

There are several basic design and operating requirements for any policing algorithm:
Figure 2.3 Traffic policing in a QoS network (traffic arriving at the network entrance passes through the policing function; conforming traffic enters the network as is, while nonconforming traffic is dropped or may enter the network with reduced priority).
It must not discard or decrease the priority of packets that do not violate the negotiated contract.

It must detect every packet that violates the contract and take appropriate action (drop or decrease the priority).

It should operate in real time and should not cause any additional delay for the admitted packets.

It should not be too complex to implement.

2.7.2 Policing Parameters

Traffic is policed on various combinations of the traffic parameters discussed earlier in the chapter. For example, peak-rate policing tests whether a packet violates the negotiated peak rate; it does not test anything else. This type of policing is suitable for CBR sources, which have the peak rate as their only traffic parameter. For VBR sources with a peak rate limitation (the usual case), all three parameters (peak rate, average rate, and burst size) need to be policed. Such policing on multiple traffic parameters is more complex than peak rate policing. Table 2.1 shows three possible combinations of traffic parameters used for policing.

2.7.3 Policing Algorithms

The leaky bucket is a generic policing algorithm used to police the parameters shown in Table 2.1. Different forms of leaky bucket are used to police different combinations of parameters. If only the peak rate is policed, a simple leaky bucket is sufficient.
Table 2.1 Different Types of Policing Based on Policing Parameters

Policing Parameters                      Traffic Type
Peak rate                                CBR
Average rate, burst size                 VBR without limit on peak rate
Peak rate, average rate, burst size      VBR with peak rate limitation
Figure 2.4 Peak rate policing using a simple leaky bucket (packets from the source pass through a small bucket before entering the network).
If the average rate and burst size need to be policed, a more advanced leaky bucket, called a token bucket, is required. A token bucket is used in tandem with a leaky bucket (we refer to this combination as a dual leaky bucket) when both the peak rate and the average rate need to be policed. Details of these three variants of the leaky bucket are discussed below.

Simple Leaky Bucket

Figure 2.4 illustrates the concept of peak rate policing using a simple leaky bucket. There is a small bucket (buffer) to hold a few packets arriving from the source. Packets from the buffer, if the buffer is not empty, are accepted into the network at the negotiated peak rate. If an arriving packet finds the buffer full, it is considered nonconforming, i.e., the packet is violating the negotiated peak rate. The purpose of the small buffer is to allow for minor variations in the arriving rate caused by the hardware or software packet processing elements. The buffer should not be too large; a large buffer would allow large bursts to pass the conformance test. A zero-size buffer (no buffer) means strict policing of the peak rate without allowing for the small deviations caused by hardware or software.

Although the leaky bucket concept is based on a buffer that holds incoming packets, it is interesting to note that the implementation of leaky bucket policing does not involve any actual buffering.
All that is required is a simple counter or variable. The implementation of a simple leaky bucket with a counter is as follows. The counter is initially set to zero. Upon arrival of each packet, the counter is incremented by one; if multiple packets arrive simultaneously, the counter is incremented by more than one. The counter therefore keeps track of the virtual buffer occupancy. The counter is decremented periodically at the peak rate, simulating the servicing of packets from the buffer at a constant peak rate. A packet is found to be nonconforming if upon its arrival the counter is at its limit; the limit simulates the maximum size of the virtual buffer.

We illustrate the operation of peak-rate policing with a simple leaky bucket using the following example. Assume we are to police a peak rate of 1,000 packets per second; the period for decrementing the counter should therefore be set to 1 ms. We assume that the counter is decremented at the ms boundary, i.e., at or just before 1 ms, 2 ms, and so on. Let us further assume a counter threshold (burst allowed) of 2 packets. Table 2.2 shows the arrival times of 5 packets and the policing decision for each packet, given the above assumptions.

Table 2.2 Policing Decisions for Five Packets for Peak Rate Policing

Packet Arrival (ms)    Counter Value    Policing Decision
10.0                   1                Conforming
11.0                   1                Conforming
11.2                   2                Conforming
11.5                   2                Nonconforming
12.0                   2                Conforming

In the above example, we considered "packets" instead of bytes for measuring the traffic. For networks with fixed-size packets, such as ATM networks, this is perfectly all right. However, for networks with variably sized packets, such as IP networks, the counter threshold should be set in bytes instead of packets.
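The counter-based policer just described can be written in a few lines. The sketch below reproduces the example of Table 2.2 (peak rate of 1,000 packets per second, so one decrement per millisecond at ms boundaries, and a counter limit of 2); the names are illustrative, not from any particular router implementation.

    def police_peak_rate(arrivals_ms, period_ms=1.0, limit=2):
        """Simple leaky bucket: the counter is decremented once per period;
        a packet is nonconforming if the counter is already at its limit."""
        counter, last_tick, decisions = 0, 0, []
        for t in arrivals_ms:
            # Apply all decrements due at ms boundaries up to (and at) time t.
            ticks = int(t // period_ms)
            counter = max(0, counter - (ticks - last_tick))
            last_tick = ticks
            if counter >= limit:
                decisions.append((t, counter, "nonconforming"))
            else:
                counter += 1
                decisions.append((t, counter, "conforming"))
        return decisions

    # Matches Table 2.2: only the packet at 11.5 ms is nonconforming.
    print(police_peak_rate([10.0, 11.0, 11.2, 11.5, 12.0]))

For IP networks the same logic applies with the counter kept in bytes, as noted above.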
Token Bucket

Figure 2.5 illustrates the concept of a token bucket. Instead of data packets, the bucket this time contains tokens. Tokens trickle into the bucket at the average rate r to be policed. The bucket has a finite depth b; arriving tokens overflow the bucket (and are lost) when the bucket is full. The bucket depth corresponds to the burst size to be policed.

Figure 2.5 Traffic policing with a token bucket (tokens arriving at the average rate r fill a bucket of depth b; packets from the source consume tokens to enter the network).
Arriving packets can pass the conformance test and be accepted into the network only if there are enough tokens in the bucket. One token is removed from the bucket for each packet accepted into the network. From this description, we obtain the following:

If the source does not transmit for a while and allows the bucket to fill with tokens, it can send a burst of packets into the network at once, without waiting. The maximum burst size is therefore limited by the bucket depth b. This is burst size policing.

The token generation rate limits the long-term average rate that a source may receive. As the bucket is being filled at the average rate r, the maximum number of packets that can be served for a source within a time interval t is bounded by rt + b.

Arriving packets are found to be nonconforming if there is no token in the bucket.

Like the simple leaky bucket algorithm, the implementation of a token bucket requires only one counter; no actual buffering is needed. The implementation and operation are as follows. The counter is initially set to zero and is incremented by one, up to a maximum value of the burst size, periodically at the average rate. The counter is decremented by one for each packet accepted into the network. If multiple packets (a burst) are accepted simultaneously, the counter is decremented by more than one.
Packets are found to be nonconforming if upon arrival the counter is not large enough to accommodate them. If the counter is zero, all arriving packets are found to be nonconforming. If the counter is not zero, but not large enough to accommodate the entire burst, some packets from the burst are accepted and the others are considered nonconforming.

The following example illustrates the operation of a token bucket. Let us assume the average rate to be policed is 100 packets per second; hence, a token is added to the bucket every 10 ms. The maximum burst allowed is assumed to be 10 packets, so the counter threshold is set to 10. Note that a burst of packets from a given source cannot arrive at the policing function at exactly the same time, because all transmissions are serial: the packets of a burst follow each other back-to-back, separated by the time taken to put all the bits of a packet on the line (the serialization delay, which depends on the link speed and the packet size). In this example, the serialization delay is assumed to be 0.1 ms, meaning it takes 0.1 ms to complete the transmission of one packet on the link. Let us also assume that the link was idle for a while, so the counter has reached its maximum value of 10. Table 2.3 shows the packet arrival times, the counter value, and the policing decision for each packet.

Table 2.3 Policing Decisions for Token Bucket Policing

Packet Arrival (ms)                          Counter Value    Policing Decision
100.0, 100.1, 100.2, 100.3, 100.4, 100.5    4                Conforming
110.0                                        4                Conforming
120.1, 120.2, 120.3, 120.4, 120.5, 120.6    0                5 conforming, 1 nonconforming

Note that a single packet arrival at 110 ms is sandwiched between two bursts of 6 packets. All packets of the first burst are admitted and the counter is decremented to 4. Between 100 ms and 120 ms, there are two more token arrivals: one is used by the packet arriving at 110 ms, leaving 5 tokens in the bucket. These 5 tokens allow the first 5 packets of the second burst to be admitted to the network, but the sixth packet is detected as nonconforming.
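The same example can be traced in code. The sketch below implements the token counter (incremented every 10 ms up to a depth of 10, decremented per conforming packet) and reproduces the decisions of Table 2.3; the naming and structure are illustrative.

    def police_token_bucket(arrivals_ms, token_period_ms=10.0, depth=10,
                            initial_tokens=10):
        """Token bucket policing: a packet conforms only if a token is
        available; tokens accrue at the average rate up to the depth."""
        tokens, last_tick, decisions = initial_tokens, 0, []
        for t in arrivals_ms:
            ticks = int(t // token_period_ms)
            tokens = min(depth, tokens + (ticks - last_tick))  # add tokens
            last_tick = ticks
            if tokens >= 1:
                tokens -= 1
                decisions.append((t, "conforming"))
            else:
                decisions.append((t, "nonconforming"))
        return decisions

    burst = [100.0, 100.1, 100.2, 100.3, 100.4, 100.5, 110.0,
             120.1, 120.2, 120.3, 120.4, 120.5, 120.6]
    # First 12 packets conform; the last packet of the second burst does not.
    print(police_token_bucket(burst))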
Dual Leaky Bucket

When all three important parameters (peak rate, average rate, and burst size) are to be policed, the token bucket and the simple leaky bucket can be used in tandem.

Figure 2.6 Traffic policing with a dual leaky bucket (arriving traffic is first policed by a leaky bucket, which drops traffic violating the peak rate, PR; traffic conforming to PR then passes through a token bucket policing the average rate, AR, and burst size, BS; traffic conforming to PR, AR, and BS enters the network).
Figure 2.6 shows that the traffic is first policed on the peak rate, and the traffic conforming to the peak rate is then tested further for the average rate and burst size. Traffic passing the peak rate test may still be detected as nonconforming by the dual leaky bucket if it later fails the average rate and/or burst size test. To implement the dual bucket algorithm, we need two counters, one for the simple leaky bucket and one for the token bucket. No actual buffering is needed to implement the dual leaky bucket.
2.8 TRAFFIC SHAPING

Traffic shaping is about controlling the shape of the traffic. If terminals send traffic without shaping, it may be detected as nonconforming at the network edge and be subject to discard. It is therefore a good idea for the terminal to shape its traffic before sending it to the network, so that it does not send any packet that violates the traffic parameters negotiated during call setup.

Although traffic policing and traffic shaping may seem to be doing the same job in different locations, there is a clear conceptual and technical difference between them. Shaping traffic means smoothing out traffic bursts: traffic shapers do not discard violating traffic, but store it in actual buffers and smooth it out. In contrast, traffic policing is not concerned with smoothing traffic at all. The sole purpose of policing is to detect violations; once a violation is detected, the policer does not correct it, but either discards the traffic or downgrades its priority. There is no actual buffering in traffic policing.
Figure 2.7 Traffic shaping using a token bucket (arriving unshaped traffic is buffered; a server releases packets as tokens, arriving at the average rate r into a bucket of depth b, become available, producing shaped departing traffic).
Both shaping and policing can leverage the same leaky bucket fundamentals. Figure 2.7 illustrates the operation of a token bucket used for traffic shaping. The shaping is achieved as follows:

Average-rate and burst-size shaping. The server serves packets from the incoming packet buffer as long as there are tokens in the token bucket. Tokens arrive and accumulate in the token bucket at the average rate. Here the token bucket size limits the burst size.

Peak-rate shaping. If only peak rate shaping is desired, the bucket size is set to zero (no token bucket), and the token arrival rate is set to the peak rate.
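The difference from policing shows up directly in code: a shaper computes a departure time for each packet instead of a drop decision. The sketch below is a simplified fluid model of Figure 2.7 that delays each packet until a token is available (average-rate and burst-size shaping); it assumes arrival times are nondecreasing.

    def shape(arrivals, rate, depth):
        """Token bucket shaper: return (arrival, departure) per packet.
        rate: tokens per second; depth: bucket depth in packets."""
        tokens = float(depth)   # current token count
        clock = 0.0             # time at which `tokens` is valid
        out = []
        for t in arrivals:
            now = max(t, clock)                        # packets queue FIFO
            tokens = min(depth, tokens + (now - clock) * rate)
            if tokens < 1:                             # wait for a token
                now += (1 - tokens) / rate
                tokens = 1.0
            tokens -= 1
            clock = now
            out.append((t, now))
        return out

    # A 12-packet burst shaped to 100 packets/s with burst size 10:
    # 10 packets leave immediately, the rest are spaced 10 ms apart.
    print(shape([0.0] * 12, rate=100, depth=10))

Unlike the policer above, nothing is dropped here; nonconforming packets simply wait in the buffer, which is exactly the conceptual difference described in this section.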
2.9 QUEUING AND SCHEDULING

In packet switching networks, link bandwidth is shared by multiple traffic sources through two basic mechanisms, queuing and scheduling. Queuing refers to the process of buffering incoming packets at the entrance of a communication link. The links are serial links; they transmit one packet at a time from the buffers. Given multiple packets waiting in the buffer, a scheduling algorithm defines the transmission schedule of the packets over the serial link.
Packet loss rate, packet delay, and other QoS parameters of a given traffic flow may be significantly affected by the choice of queuing and scheduling techniques. QoS networks implement sophisticated queuing and scheduling techniques to facilitate the guarantee of the required QoS of an accepted call. We will discuss several queuing and scheduling algorithms in Chapter 3.
2.10 CONGESTION CONTROL AND BUFFER MANAGEMENT

As we saw in Chapter 1, packet loss rate is one of the network QoS parameters. A QoS-capable network must provide some guarantee on the packet loss rate, based on the negotiated QoS contract. Congestion in network devices such as routers and switches is a major cause of packet loss in wired networks. A network can take either proactive or reactive measures to control congestion. The best effort model of the Internet has relied mostly on reactive measures: if a TCP acknowledgment does not arrive before the expiry of a timer, this is interpreted as an indication of congestion in the network, and TCP sources reduce their sending rate in reaction. Buffer management is a proactive technique whereby the networking devices monitor their queue length and, once it exceeds a certain threshold, start dropping packets. Congestion control and buffer management issues and algorithms are described in detail in Chapter 4.
2.11 RESEARCH DIRECTIONS

Packet classification puts extra overhead on routers, as some core routers need to process packets at Gbps speeds or more. Several schemes for fast lookup have been proposed by researchers. Lakshman and Stiliadis [4] present a packet classification scheme with worst-case, traffic-independent performance that can classify packets against a few thousand filtering rules, at rates of a million packets per second, using range matches on more than four packet header fields. A binary-search-based scheme for solving the best-matching prefix problem of an IP network has been proposed [5]. Another scheme combines grid-of-tries and cross-producting algorithms for fast lookup in layer four switches [6].

ATM-specific descriptions of admission control, resource reservation, traffic policing, and shaping can be found in [3]. Mathematical techniques such as equivalent capacity [7] and the theory of large deviations [8] have been proposed to provide probabilistic bounds. Another technique uses a measurement-based approach to admission control, where a source is admitted initially based on its
nominal traffic specification, and admission control is later performed based on actual measurements [9, 10]. This scheme allows for occasional delay violations. Details of these techniques are beyond the scope of this book. Keshav [11] provides a good description of this topic with several examples. Good technical coverage of leaky bucket algorithms, in the context of ATM networks, is available in [12]. Golestani [13] proposed a congestion management scheme for integrated services packet networks that uses a moving average traffic descriptor. Cruz [14] presents a worst-case delay analysis of a leaky bucket controlled source. A large number of research papers have studied the performance of leaky bucket and token bucket schemes [15, 16, 17]. A simulation study of a dynamic token bucket (DTB) evaluates the performance of a bandwidth allocation algorithm that adjusts the token bucket threshold dynamically and measures the instantaneous arrival rate of flows [18]; it shows that the algorithm achieves fair bandwidth allocation and is robust. An adaptive resource negotiation control scheme based on optimal token bucket parameters has also been proposed [19]. This study proposes a scheme to find the optimal values of the token bucket parameters r and b from observed traffic, together with an admission control scheme that offers a renegotiation feature in the resource reservation process; as a result, higher admission ratios and higher resource utilization are achieved.
2.12 SUMMARY

We have discussed the fundamental concepts for implementing QoS in packet switching networks. Accurate description of the traffic and the required QoS using well-defined quantitative parameters is a prerequisite for any QoS network. There must be a signaling protocol and software for dynamic negotiation of QoS. Admission control of new calls is essential to protect the QoS of existing calls in the network. To guarantee the negotiated QoS, the network must reserve adequate resources before the communication starts. All traffic entering the network must be policed against the negotiated traffic contract; traffic violating the contract must be dropped or tagged with lower priority at the network entry. Several policing algorithms were discussed in this chapter.
2.13 REVIEW QUESTIONS

1. Explain the differences between CBR and VBR sources.
2. What do you understand by quantitative description of traffic? Why is quantitative description crucial for QoS?

3. What do you understand by QoS specification? Why is QoS specification so important?

4. What is QoS signaling? Can we achieve QoS without signaling?

5. What is admission control? Why should we have admission control in QoS networks?

6. Can we still support QoS without admission control on a per-call basis? If admission control is not applied on a per-call basis, what form does it take?

7. Why is it necessary to reserve resources? What sort of resources can we reserve in packet switching networks?

8. Can we avoid per-call resource reservation and still support QoS by doing appropriate network dimensioning? Explain your answer.

9. Why do we have to police traffic? Where in the network is traffic policed?

10. Illustrate the operation of a dual leaky bucket using an example similar to the ones shown in Tables 2.2 and 2.3.
References

[1] Z. Wang. Internet QoS Architectures and Mechanisms for Quality of Service. Morgan Kaufman Publishers, San Francisco, California, 1st edition, 2001.

[2] R. Braden, D. Clark, and S. Shenker. Integrated services in the Internet architecture: an overview. Request for Comments (Informational) RFC 1633, Internet Engineering Task Force, June 1994.

[3] D. McDysan and D. L. Spohn. ATM Theory and Application. McGraw-Hill, New York, 1998.

[4] T. V. Lakshman and D. Stiliadis. High-speed policy-based packet forwarding using efficient multidimensional range matching. ACM Computer Communication Review, 28(4):203–214, September 1998.

[5] Butler Lampson, Venkatachary Srinivasan, and George Varghese. IP lookups using multiway and multicolumn search. IEEE/ACM Transactions on Networking, 7(3):324–334, June 1999.

[6] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel. Fast and scalable layer four switching. ACM Computer Communication Review, 28(4):191–202, September 1998.

[7] R. Guerin, H. Ahmadi, and M. Naghshineh. Equivalent capacity and its application to bandwidth allocation in high-speed networks. IEEE Journal on Selected Areas in Communications, 9(7):968–981, September 1991.
[8] A. Shwartz and A. Weiss. Large Deviations for Performance Analysis: Queues, Communications, and Computing. Chapman & Hall, New York, 1995.

[9] S. Jamin, P. Danzig, S. Shenker, and L. Zhang. A measurement-based admission control algorithm for integrated services packet networks. ACM Computer Communication Review, 25(4):2–13, October 1995.

[10] Lee Breslau, Sugih Jamin, and Scott Shenker. Comments on the performance of measurement-based admission control algorithms. In Proceedings of the Conference on Computer Communications (IEEE Infocom), Tel Aviv, Israel, March 2000.

[11] S. Keshav. An Engineering Approach to Computer Networking. Addison Wesley, Boston, Massachusetts, 1st edition, 1997.
[12] W. Stallings. High-Speed Networks and Internets: Performance and Quality of Service. Prentice Hall, Upper Saddle River, New Jersey, 2002.

[13] S. Jamaloddin Golestani. A stop-and-go queueing framework for congestion management. In Sigcomm '90: Communication Architectures and Protocols, pages 8–18, Philadelphia, Pennsylvania, September 1990. ACM. (Also in Computer Communication Review, 20(4), October 1990.)

[14] Rene Leonardo Cruz. A Calculus for Network Delay and a Note on Topologies of Interconnection Networks. PhD thesis, University of Illinois at Urbana-Champaign, July 1987.

[15] Gustavo de Veciana. Leaky buckets and optimal self-tuning rate control. In Proceedings of the IEEE Conference on Global Communications (GLOBECOM), pages 1207–1211, San Francisco, California, November 1994.

[16] A. W. Berger and W. Whitt. The pros and cons of a job buffer in a token-bank rate-control throttle. IEEE Transactions on Communications, COM-42(2/3/4):857–861, 1994.

[17] Hamid Ahmadi, Roch Guérin, and Khoshrow Sohraby. Analysis of leaky bucket access control mechanism with batch arrival process. In Proceedings of the IEEE Conference on Global Communications (GLOBECOM), pages 344–349 (400B.1), San Diego, California, December 1990. IEEE.

[18] Jayakrishna Kidambi, Dipak Ghosal, and Biswanath Mukherjee. Dynamic token bucket (DTB): a fair bandwidth allocation algorithm for high-speed networks. Journal of High Speed Networks, 9(2):67–87, 2000.

[19] T.-Y. Tan, T.-H. Cheng, S. K. Bose, and T.-Y. Chai. Adaptive resource negotiation based control for real time applications. Computer Communications, 24(13):1283–1298, August 2001.
Chapter 3

Scheduling for QoS Management

Scheduling of resources such as link bandwidth and available buffers is key to providing performance guarantees to applications that require QoS support from the network. The routers and switches need to distinguish between the flows requiring different QoS (and possibly sort them into separate queues) and then, based on a scheduling algorithm, send these packets to the outgoing links. This chapter discusses various scheduling disciplines that are widely deployed in the routers and switches of the Internet.
3.1 SCHEDULING GOALS

The following are the goals to be achieved by scheduling techniques supporting QoS in packet switching networks:

Sharing bandwidth;
Providing fairness to competing flows;
Meeting bandwidth guarantees (minimum and maximum);
Meeting loss guarantees (multiple levels);
Meeting delay guarantees (multiple levels);
Reducing delay variations.
Best-effort traffic doesn’t demand any performance guarantees from the network. However, if there are multiple competing best effort flows, the scheduler is required to perform fair allocation of resources (such as bandwidth). The list of goals suggests that scheduling in QoS networks is nontrivial. A scheduler (server) decides the order in which it serves packets. The service order has impact on delay suffered by packets (eventually flows or users of these flows) waiting in the queue. Packets sharing the same source and destination address, same source destination port, and same protocol identification are considered to belong to a flow. The server can allocate bandwidth to packets from a flow by servicing a certain number of packets from that flow within a time interval. If packets are arriving at the output buffer at a rate faster than the server can serve them, packets will have to wait in the queue for service. If the buffer is of limited size, packets will be dropped. Again, the service order has an impact on packet loss, and a scheduler is capable of guaranteeing that the loss will be below minimum level. As we discussed earlier, fairness is an important criteria for competing best effort flows. The scheduler should allocate resources in a fair manner to these flows. Fairness is not an issue for a class-based network, where traffic demands have differing QoS (possibly a user is willing to pay more). A scheduler is called work-conserving if it is not idle when any of the queues has a packet waiting to be served. In contrast, a non-work-conserving scheduler may choose to remain idle even if it has packets to serve. At first, it may appear bizzare that a scheduler wastes bandwith by remaining idle. The reason for this idle time at the non-work-conserving scheduler is to reduce the burstiness of traffic entering a downstream network element. The Conservation Law [1] states that the sum of the mean queuing delays received by the set of multiplexed connections, weighted by their share of the link’s load, is independent of the scheduling discipline. This law is given by the following equations:
ρ_i = λ_i x_i   (3.1)

Σ_{i=1}^{N} ρ_i q_i = constant   (3.2)

where

ρ_i = mean utilization of flow i;
λ_i = mean arrival rate of flow i;
x_i = mean service time of packets from flow i;
q_i = mean wait time of flow i at the scheduler;
N = number of flows.
Intuitively, a flow can receive lower delay from a work-conserving scheduler only at the expense of another flow. For example, assume that there are two sources, A and B, going through a router with an outgoing link speed of 155 Mbps. Further assume that source A generates 15 Mbps of data and source B generates 45 Mbps of data. Assume that the first come first serve (FCFS) discipline gives a mean queuing delay of 1 ms to each of these sources, and that another discipline gives a mean queuing delay of 0.5 ms to source A. (Queuing disciplines are discussed later in this chapter.) Based on (3.1) and (3.2), the mean queuing delay for source B can be computed as 1.16 ms (ρ_A = 15/155, ρ_B = 45/155, and q_A = 0.5 ms in the new scheduling discipline). Source A observes a lower mean queuing delay at the cost of a higher mean queuing delay for B.
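The arithmetic of this example can be checked directly against Equation (3.2); the short sketch below fixes the constant from the FCFS case and solves for source B's delay under the new discipline.

    rho_a, rho_b = 15 / 155, 45 / 155
    constant = rho_a * 1.0 + rho_b * 1.0   # FCFS: 1 ms mean delay each
    q_a = 0.5                              # new discipline favors source A
    q_b = (constant - rho_a * q_a) / rho_b
    print(round(q_b, 2))                   # 1.17 (1.1666...; the text rounds
                                           # this to 1.16): B pays for A's gain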
Max-Min Fair Share

One of the techniques used for fair share allocation of resources is called max-min fair share. Intuitively, this scheme satisfies the smallest of the demands from all flows first, and remaining resources are distributed equally among the competing flows. Let us assume that there are n competing flows 1, 2, ..., n, demanding d_1, d_2, ..., d_n units of resource (assuming d_1 ≤ d_2 ≤ ... ≤ d_n), and that the total available resource is C units. Initially the flow with the lowest demand (i.e., d_1) gets C/n units of resource. If this is greater than what flow 1 needs, then the excess C/n − d_1 goes back to the resource pool. The remaining flows each get an additional (C/n − d_1)/(n − 1) units of resource, and this process iterates until either all resources are exhausted or all demands have been met. This scheme guarantees that a flow either gets what it wants or at least is not worse off than any other competing flow. For simplicity it has been assumed that each flow is entitled to an equal amount of resource; a variation of this scheme is possible by assigning weights w_1, w_2, ..., w_n to the flows.

The following example shows how the max-min fair share scheme can be applied in an ATM network with an outgoing link of capacity 155 Mbps. Assume 5 competing sources with bandwidth requests of 23, 27, 35, 45, and 55 Mbps. Initially, the resource is divided equally and all sources are allocated 31 Mbps each (155/5). Since the first source needs only 23 Mbps, the remaining
8 Mbps (31 − 23) are divided equally among the 4 remaining sources, giving them 33 Mbps each. However, the second source needs only 27 Mbps; the residual 6 Mbps (33 − 27) are divided among the remaining three sources. This increases the allocation for sources 3 to 5 to 35 Mbps each (since all of these sources need 35 Mbps or more, the algorithm stops at this point).
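The iterative allocation just described translates directly into code. The sketch below implements the unweighted max-min procedure and reproduces the 155-Mbps example; the function name is our own.

    def max_min_fair(demands, capacity):
        """Allocate capacity so each flow gets min(demand, fair share),
        redistributing any excess among the still-unsatisfied flows."""
        alloc = [0.0] * len(demands)
        remaining = sorted(range(len(demands)), key=lambda i: demands[i])
        left = float(capacity)
        while remaining:
            share = left / len(remaining)
            i = remaining[0]
            if demands[i] <= share:
                # Smallest demand fits within the equal share: satisfy it
                # fully and return the excess to the pool.
                alloc[i] = demands[i]
                left -= demands[i]
                remaining.pop(0)
            else:
                # No remaining demand fits: split what is left equally.
                for j in remaining:
                    alloc[j] = share
                break
        return alloc

    print(max_min_fair([23, 27, 35, 45, 55], 155))  # [23, 27, 35, 35, 35]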
3.2 SCHEDULING TECHNIQUES

A number of different scheduling techniques have been proposed in the literature. Several of these separate packets into different queues and serve each queue according to different criteria. A detailed discussion of the most prominent of the schemes currently used in the Internet is presented here.

3.2.1 First Come First Serve

In traditional packet switching networks, packets from all flows are enqueued into a common buffer and a server serves packets from the head of the queue. This scheme is called either first come first serve (FCFS) or first in first out (FIFO). Figure 3.1(a) shows the arrival of packets into a FCFS queue and the scheduler serving packets from the head of the queue; Figure 3.1(b) shows a scenario in which all buffers are occupied and a newly arrived packet is dropped.

FCFS fails to provide max-min fair share bandwidth allocation to individual flows. A greedy source can occupy most of the queue and cause delay to other flows using the same queue. Congestion-sensitive TCP flows (a TCP flow reduces its sending rate in the wake of congestion) are penalized in favor of applications using UDP flows with no congestion control. The question of which packet to drop when a buffer is full, and a variety of congestion control techniques, are discussed further in Chapter 4.

Figure 3.2 shows the throughputs of two flows sending packets through a bottleneck router that implements the FCFS scheme. Flow 1 starts sending at 1.5 Mbps and achieves this rate until Flow 2 starts at time 5 with a rate of 1.0 Mbps. Both flows continue until time 40. It is evident that there is no fixed pattern of sharing bandwidth between the flows, as the router simply serves whichever packet arrives first. Only after time 40, when the first flow ceases, does the second flow start getting its required bandwidth. Figure 3.3 shows the same two flows going through the same router implementing a stochastic fairness queuing scheme (discussed later in this chapter); in that case, both flows get a fair share of the bandwidth between times 5 and 40.
Figure 3.1 First come first serve: (a) packets arriving at and served from the head of a FCFS queue; (b) a newly arrived packet dropped when the buffer is full.
Use of a single queue and FCFS scheduling to serve packets from a queue has major limitations for QoS support. First of all, there is no flow isolation. Without flow isolation it is very difficult to guarantee a delay bound or bandwidth to specific flows. If different service is required for different flows, multiple queues are needed to separate the flows. Nagle [2] proposed a scheme that separates the flows using their source and destination IP address pair into different queues and then serves these queues in round robin order. However, the scheme may not work well when packets are of variable size. Several advanced scheduling techniques discussed in the following sections use multiple queues.

3.2.2 Priority Queuing

One simple way to provide differential treatment to flows is to use multiple queues with associated priorities. Multiple queues with different priority levels, 0 to N - 1, are maintained as shown in Figure 3.4. The number of priorities to be used will depend on the number of priority levels supported by a particular protocol.
Figure 3.2 Throughput achieved with first come first serve (throughput in Mbps versus time for Flows 1 and 2).
For example, the IPv4 header supports a field called type of service (ToS); details of the ToS field will be discussed in Chapter 7. The source can use this field to request preferential treatment from the network. If there are packets queued in both higher and lower priority queues, the scheduler serves packets from the higher priority queue before it attends to the lower priority queue. Priority 0 is always serviced first; priority i is serviced only if queues 0 through i - 1 are empty. The highest priority has the least delay, highest throughput, and lowest loss. This scheme has the potential of starvation of lower priority classes (i.e., the server will be unable to serve the lower class because it is always busy serving the higher class). However, priority queuing is simple from an implementation point of view, as it needs to maintain only a few states per queue. It is important to note that packets from a given priority queue are usually served FCFS. Therefore, FCFS is a useful and simple scheduling technique and it may be used in conjunction with other advanced scheduling techniques.
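As a rough illustration, the strict-priority selection rule can be written as a single scan over the queues. The sketch below is not from the book; it assumes simple singly linked packet queues, with level 0 the highest priority.

#include <stddef.h>

struct packet { struct packet *next; };
struct queue  { struct packet *head; };

struct packet *priority_dequeue(struct queue *q, int nlevels)
{
    for (int p = 0; p < nlevels; p++)      /* scan highest priority first */
        if (q[p].head) {
            struct packet *pkt = q[p].head;
            q[p].head = pkt->next;         /* pop the head: FCFS within a level */
            return pkt;
        }
    return NULL;                           /* every queue is empty */
}

Because the scan always restarts at level 0, a persistently busy high-priority queue keeps lower levels waiting indefinitely, which is exactly the starvation risk noted above.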
Figure 3.3 Throughputs achieved with stochastic fairness queuing.

3.2.3 Generalized Processor Sharing

Generalized processor sharing (GPS) is an ideal work-conserving scheme that is capable of achieving max-min fair share [3, 4]. In simple terms, GPS assumes that
each flow is kept in a separate logical queue. It serves an infinitesimal amount of data from each queue, such that in a finite time interval it visits every nonempty queue. Each queue can have an associated weight and is served in proportion to its weight. If we assume that there are n equally weighted active flows, then each flow gets a 1/n share of the max-min resource. (Remember that the GPS server serves an infinitesimal chunk of data from each flow.) If a queue is empty (because the flow needs less than its max-min fair share), the residual resource gets evenly distributed among the competing flows. GPS is capable of achieving the max-min weighted fair share as well. In GPS terminology, a connection is called backlogged when it has data present in its queue.

Let us assume that there are n flows to be served by a server implementing GPS with weights w_1, w_2, ..., w_n. The service received by the ith flow in the interval [t_1, t_2] is represented as S_i(t_1, t_2). For any flow i backlogged throughout the interval [t_1, t_2] and for any other flow j, the following equation holds:

    S_i(t_1, t_2) / S_j(t_1, t_2) >= w_i / w_j                    (3.3)
Equation (3.3) achieves the max-min fair share by allocating the residual (unused) resource such that it gets shared by the backlogged connections in proportion to their weights.
Figure 3.4 Priority scheduling server.
However, GPS is an ideal scheme, since serving an infinitesimal amount of data is not implementable. We discuss next some variations of GPS that can be implemented in a real system.

3.2.4 Round Robin

A simple implementation of GPS is round robin (RR) scheduling, whereby one packet replaces the infinitesimal data. To address the fairness problem of a single FCFS queue, the round robin scheduler maintains one queue for each flow. Each incoming packet is placed in the appropriate queue. The queues are served in a round robin fashion, taking one packet from each nonempty queue in turn. Empty queues are skipped over. This scheme is fair in that each busy flow gets to send exactly one packet per cycle. Further, it provides load balancing among the various flows. Note that there is no advantage to being greedy. A greedy flow finds that its queue becomes long, increasing its delay, whereas other flows are unaffected by this behavior.

If the packet sizes are fixed, such as in ATM networks, round robin provides a fair allocation of link bandwidth. If packet sizes are variable, which is the case in
the Internet, there is a fairness problem. Consider a queue with very large packets and several other queues with very small packets. With round robin, the scheduler will return to the large-packet queue quickly and spend a long time serving it. On average, the large-packet queue will get the lion's share of the link bandwidth. Another problem with round robin is that it tries to allocate bandwidth fairly to all queues; hence differential treatment, or any specific allocation of bandwidth to specific queues, is not achieved.
3.2.5 Weighted Round Robin

Weighted round robin (WRR) is a simple modification to round robin. Instead of serving a single packet from a queue per turn, it serves n packets, where n is adjusted to allocate a specific fraction of the link bandwidth to that queue. Each flow is given a weight that corresponds to the fraction of link bandwidth it is going to receive. The number of packets to serve in one turn is calculated from this weight and the link capacity. Assume three ATM sources (same cell size) with weights of 0.75, 1.0, and 1.5, respectively. If these weights are normalized to integer values, the three sources will be served 3, 4, and 6 ATM cells in each round.

WRR works fine with fixed size packets, such as in ATM networks. However, WRR has difficulty maintaining bandwidth guarantees with variable size packets (the Internet). The problem with variable size packets is that flows with large packets will receive more than the allocated weight. To overcome this problem, the WRR server needs to know the mean packet size of sources a priori. Now assume that a serial link (MTU 500 bytes), an Ethernet (MTU 1,500 bytes), and an FDDI ring network (MTU 4,500 bytes) share a high-speed outgoing link, and that these links are assigned weights of 0.33, 0.66, and 1.0, respectively. Using weights normalized to their respective packet sizes, the scheduler will need to serve 6 packets from the serial link (3,000 bytes), 4 packets from the Ethernet (6,000 bytes), and 2 packets from the FDDI (9,000 bytes) in each round, as the sketch below illustrates.

Short-term fairness is another problem encountered by WRR. On a small time scale, WRR does not meet the fairness criteria, since some flows may transmit more than others. In the previous example, while FDDI packets of 9,000 bytes are sent, other links will not get a chance to transmit. This means that WRR is not fair on a time scale shorter than the transmission time of 9,000 bytes (at 100 Mbps, around 720 microseconds).
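The normalization step in the example above can be made concrete with a small C sketch (not from the book). The integer weights 1:2:3 (the normalized 0.33:0.66:1.0) and the budget of 3,000 bytes per weight unit per round are assumptions chosen to reproduce the 6/4/2 split.

#include <stdio.h>

int main(void)
{
    int w[]   = { 1, 2, 3 };          /* normalized weights */
    int mtu[] = { 500, 1500, 4500 };  /* serial link, Ethernet, FDDI */
    int unit  = 3000;                 /* bytes served per weight unit per round */

    for (int i = 0; i < 3; i++)
        printf("queue %d: %d bytes = %d packets per round\n",
               i, unit * w[i], unit * w[i] / mtu[i]);
    return 0;
}

The output, 6, 4, and 2 packets per round, matches the example; note that the computation only works because the mean packet size of each source is known in advance.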
3.2.6 Deficit Round Robin

Deficit round robin (DRR) improves WRR by being able to serve variable length packets without knowing the mean packet size of connections a priori. The algorithm works as follows: Initially a variable quantum is initialized to represent the number of bits to be served from each queue. The scheduler serves each queue that has a packet waiting. If the packet size is less than or equal to the quantum, the packet is served. However, if the packet is bigger than the quantum size, the packet has to wait for another round; the unused allowance is recorded in a per-queue counter called a deficit counter, so that a packet that can't be served in a round has its queue's deficit counter incremented by the size of the quantum. The following pseudocode, adapted from Shreedhar and Varghese [5], describes the DRR scheme.

#define TRUE  1
#define FALSE 0

void DeficitRoundRobin()
{
    /* initialize the deficit counter of every connection */
    for (conn = 1; conn <= N; conn++)
        deficitCounter[conn] = 0;
    quantum = q;                        /* quantum for each queue */

    while (TRUE) {
        /* get the queue id of the next active connection */
        if (Activelist.Isempty() == TRUE)
            continue;                   /* nothing to serve yet */
        conn = Activelist.getconnid();

        /* add quantum to deficit */
        deficitCounter[conn] += q;

        while ((headofQueue[conn].Isempty() == FALSE) &&
               (deficitCounter[conn] > 0)) {
            packetSize = headofQueue[conn].packetSize;
            if (packetSize <= deficitCounter[conn]) {
                Dequeue(headofQueue[conn]);
                deficitCounter[conn] -= packetSize;
            } else {
                break;                  /* head packet larger than remaining deficit */
            }
        }

        /* stop accumulation of deficit counter for an idle queue */
        if (headofQueue[conn].Isempty() == TRUE)
            deficitCounter[conn] = 0;
        else
            Activelist.Setflag(conn);   /* queue still backlogged */
    }
}
Let's look at an example to understand the working of the DRR scheme. Figure 3.5 shows four queues, q1 to q4. It is assumed that the deficit counter of each queue has been initialized to 0. The queues have the following status initially:

- q1 has two packets of size 500 and 700 bytes.
- q2 is empty.
- q3 has two packets of size 200 and 500 bytes.
- q4 has one packet of size 400 bytes.
Assume that the quantum is set to 500 bytes for all queues. (It is possible to configure different quanta for different queues.) Figure 3.5 shows a snapshot of the first round; the deficit counter values are shown at the beginning and the end of processing each queue. For q1, the start value is 500, and the first packet of size 500 gets served from this queue, resulting in a value of 0 at the end of this round. The start and end values of q2's deficit counter are both 0, as it is not active. The third queue, q3, gets its first packet of size 200 served, resulting in a deficit counter value of 300, as it has another outstanding packet of size 500 in the queue. Queue q4 gets its first packet of size 400 served; however, its deficit counter is reset to zero. This ensures that credits cannot be accumulated for a long period of time and fairness can be maintained.

In round 2 (Figure 3.6), q1 has a packet of size 700 at the head of the queue. It cannot be served, as the queue receives a 500-byte quantum at the beginning of the round, which is not sufficient for a 700-byte packet. However, at the end of this round a deficit counter of 500 is accumulated for the next round. Queue q3 has a packet of size 500 at the front of the queue as well as a newly arrived packet of size 400. Its deficit counter is 800 (300 + 500) at the beginning of the round. A packet gets served from this queue, resulting in a carry-over of 300 in its deficit counter. Finally, in round 3 (Figure 3.7), the first packet of 700 bytes from q1 gets served, as its deficit counter is 1,000 at the start of this round. Queue q3 works exactly the same way, serving its last
packet of 400 bytes. The deficit counter is reset to zero for queues q1 and q3, as they no longer have a packet waiting in the queue.

Figure 3.5 DRR scheduling round 1.
DRR should set the quantum (bits to be served per round) to at least one packet from each connection. This requires setting the quantum to the MTU of the link. For example, if the link carries Ethernet packets, it should be set to 1,500 bytes. DRR is not fair at time scales shorter than a packet time. However, ease of implementation makes it an attractive scheduler.

3.2.7 Weighted Fair Queuing

For variable size packets (the Internet), a more complex scheduler such as weighted fair queuing (WFQ) is used. An identical scheme called packet-by-packet generalized processor sharing (PGPS) was invented at the same time as WFQ; for our purposes, discussion is restricted to WFQ. Each packet is tagged on ingress with a value identifying, theoretically, the time the last bit of the packet should be transmitted. Basically this tag value (or finish time) of the packet is the time at which the packet would have been transmitted had a GPS scheduler been used (as we have discussed earlier, GPS is not an implementable scheme). Each time the link is available to send a packet, the packet with the lowest tag value is selected.
Figure 3.6 DRR scheduling round 2.

Figure 3.7 DRR scheduling round 3.
Finish Time Calculation

The following equation is used to calculate the virtual finish time F_k^i (the time at which the router would have finished sending the kth packet on connection i):

    F_k^i = max(F_{k-1}^i, R(t)) + P_k^i / w_i                    (3.4)

where R(t) is called the round number. This is the number of rounds a bit-by-bit round robin scheduler (in place of GPS's nonimplementable infinitesimal data) has completed at a given time. The round number is a variable whose rate of increase depends on the number of active queues to be served (it is inversely proportional to the number of active queues); the more queues there are to serve, the longer a round takes to complete. P_k^i is the time required to transmit the kth packet from connection i, and w_i is the weight of connection i.
Example of Finish Time Calculation

The following example demonstrates how the WFQ scheme works. It is assumed that there are three equally weighted connections [i.e., w_i equals one for all connections] i, j, and k, and that the link service rate is 1 unit/s. Packet names such as one, two, and three are used to simplify understanding of the concepts. Following are the packet details used in our example:

- one: arrives at time 0 on connection i, size 2 units;
- two: arrives at time 1 on connection j, size 2 units;
- three: arrives at time 2 on connection i, size 3 units;
- four: arrives at time 2 on connection j, size 1 unit;
- five: arrives at time 3 on connection k, size 4 units;
- six: arrives at time 4 on connection i, size 1 unit.

Figure 3.8 shows this scheme. At time 0, counters such as the round number and the finish times for connections i, j, and k are initialized to zero. Calculations of the finish times of the packets are shown below:

- one: F_1^i = max(0, 0.0) + 2 = 2, where R(0) = 0, first packet on connection i;
- two: F_1^j = max(0, 1.0) + 2 = 3, where R(1) = 1, first packet on connection j;
- three: F_2^i = max(2, 1.5) + 3 = 5, where R(2) = 1.5 and F_1^i = 2;
- four: F_2^j = max(3, 1.5) + 1 = 4, where R(2) = 1.5 and F_1^j = 3;
- five: F_1^k = max(0, 2.0) + 4 = 6, where R(3) = 2.0, first packet on connection k;
- six: F_3^i = max(5, 2.33) + 1 = 6, where R(4) = 2.33 and F_2^i = 5.
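The calculations above can be checked mechanically. The short program below is a minimal sketch (not from the book) that applies equation (3.4) to the six packets; the round-number samples R(0) through R(4) are taken from Figure 3.8 rather than recomputed, since deriving R(t) requires tracking the set of active queues over time.

#include <stdio.h>

struct pkt { const char *name; int arrival; int conn; double size; };

int main(void)
{
    /* round number sampled at t = 0..4, read off Figure 3.8 */
    double R[] = { 0.0, 1.0, 1.5, 2.0, 2.33 };
    struct pkt pkts[] = {
        { "one",   0, 0, 2 }, { "two",  1, 1, 2 }, { "three", 2, 0, 3 },
        { "four",  2, 1, 1 }, { "five", 3, 2, 4 }, { "six",   4, 0, 1 },
    };
    double finish[3] = { 0, 0, 0 };   /* last finish time per connection i, j, k */
    double w = 1.0;                   /* equal weights */

    for (int m = 0; m < 6; m++) {
        double r = R[pkts[m].arrival];
        double prev = finish[pkts[m].conn];
        double start = prev > r ? prev : r;               /* max(F_{k-1}, R(t)) */
        finish[pkts[m].conn] = start + pkts[m].size / w;  /* equation (3.4) */
        printf("%-5s finish time = %.2f\n", pkts[m].name,
               finish[pkts[m].conn]);
    }
    return 0;
}

The program prints finish times 2, 3, 5, 4, 6, and 6, agreeing with the hand calculation.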
The above finish times have used the round number as shown in Figure 3.8. The rate of the round number is controlled by the number of active connections (also shown in the figure). The figure also shows the number of active connections at each point in time and the packets present in those queues. A point to note here is that F_{k-1}^i is nonzero only if there is a packet from the same connection still present in the active queue. For our example, had packet three (the second packet of connection i) arrived at real time 3 (at this time packet one, the first packet on connection i, had finished), F_1^i would have been zero.

As the finish time is used by the scheduler to select packets to be sent, the real order of packet departure for the example above is shown in Figure 3.9. The order of packet departure is one, two, four, three, five, and six. Since the finish times of packets five and six are the same, we have arbitrarily assumed that five is sent before six; schedulers may choose a policy to break such ties. In total, 13 units of packets are to be sent on the outgoing link, and Figure 3.9 shows that all packets are finished by the end of real time 13.

Figure 3.10 shows the rate of change of the round number with the progress of real time. As we can see from this figure, the rate keeps varying depending upon the number of active queues. Between times 0 and 1, the rate is 1, as there is only one active connection, i. At time 1, there are two active connections, i and j, and hence the slope of the rate curve is 1/2. At time 3, all three connections become active, and the slope drops further to 1/3. Intuitively, it takes more time to complete a round if more connections are active. Again at time 9, the slope of the rate curve changes to 1/2, as only two connections, i and k, are active.

For simplicity, we assumed equal weight for each connection. However, if a different weight is assigned to each connection, the connections will be served in proportion to their weights. These weights can either be configured manually or by using some form of signaling protocol [Chapter 6 describes a signaling protocol called resource reservation protocol (RSVP)].

Keshav [6] points out a problem related to the round number calculation, called iterated deletion, due to inaccurate estimation of active connections. Further, a complex round-number calculation is done for every packet arrival and departure; thus implementation of this scheme becomes hard for very high-speed links.
Figure 3.8 Finish time calculation for WFQ.
Figure 3.9 Real departure time.
Delay Bound with WFQ

One attractive property of WFQ is that it can bound the delay when each flow is policed using a token bucket. Figure 3.11 shows the scheme, whereby each flow goes through a token bucket filter before being serviced by the WFQ scheduler. We will discuss how flows can be classified in Chapter 5; in the simplest case, the ToS field of the IP header could be used for this purpose. Multiple input flows are shaped by their token bucket regulators and placed into queues based on some classification criteria. The WFQ scheduler serves each queue in proportion to its weight (w_1, ..., w_n).
Figure 3.10 Rate of round number (round number versus real time).
Under WFQ, class i, with an assigned weight of w(i), is guaranteed to receive a fraction of service that equals w(i) / SUM_j w(j), where the denominator represents the sum over all classes that have packets waiting for transmission. For a link with a transmission rate of R, the flow of class i will always receive a minimum throughput of

    throughput_i = R * w(i) / SUM_j w(j)                    (3.5)

With token bucket policing, we assume that initially the token bucket is full and a burst of b_i packets arrives for a flow of class i. These packets remove all the tokens and get queued (into the queue for class i) to be served by the WFQ scheduler. Since these packets are served by the WFQ scheduler at a minimum rate given by (3.5), the last packet to complete service will suffer a maximum delay d_max given by the equation below:

    d_max = b_i / (R * w(i) / SUM_j w(j))                    (3.6)

Intuitively, there are b_i packets to be served from the queue at the rate specified by equation (3.5), and the total amount of time taken to transmit the last bit of the last packet cannot exceed the d_max value given by (3.6).
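A quick numeric check of equations (3.5) and (3.6) is sketched below. The numbers are made up for illustration and are not from the book: a 10-Mbps link, a class holding weight 1 out of a total weight of 4, and a token bucket depth of 500,000 bits.

#include <stdio.h>

int main(void)
{
    double R = 10e6;               /* link rate, bits per second */
    double w_i = 1.0, w_sum = 4.0; /* this class's weight and the total weight */
    double b_i = 500e3;            /* token bucket depth, bits */

    double rate_i = R * w_i / w_sum;   /* equation (3.5): guaranteed rate */
    double d_max  = b_i / rate_i;      /* equation (3.6): worst-case delay */
    printf("guaranteed rate = %.0f bps, d_max = %.3f s\n", rate_i, d_max);
    return 0;
}

With these assumptions the class is guaranteed 2.5 Mbps, so a full-bucket burst drains in at most 0.2 seconds.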
Figure 3.11 Weighted fair queue with token bucket.
Several commercial router and switch vendors have started implementing WFQ in their products. Using these routers, it is possible to guarantee QoS for flows, at least within an autonomous system. For Internet-wide implementation, we need all core routers to implement WFQ. Chapter 5 discusses the scalability issues in providing per-flow QoS guarantees.

3.2.8 Virtual Clock

Virtual clock, proposed by Zhang [7], is a scheme similar to WFQ; in the literature, it is also referred to as fair queuing (FQ). The finish time emulates time division multiplexing (TDM) in place of GPS. The round time in Equation (3.4) is replaced by real time in virtual clock. This simplifies the complex finish time
calculation used by WFQ. The following equation calculates the new finish time:

    F_k^i = max(F_{k-1}^i, a_k) + P_k^i / w_i                    (3.7)

where a_k is the real arrival time of packet k.
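To make the simplification concrete, the two finish-time updates can be written side by side; this is an illustrative sketch under the notation above, not code from the book. The only difference is the second argument of the max: virtual clock substitutes the packet's real arrival time for the round number, so no per-event round-number bookkeeping is needed.

/* WFQ: stamp against the round number R(t), per equation (3.4) */
double wfq_finish(double prev_finish, double round_number,
                  double xmit_time, double weight)
{
    double start = prev_finish > round_number ? prev_finish : round_number;
    return start + xmit_time / weight;
}

/* Virtual clock: stamp against the real arrival time, per equation (3.7) */
double vc_finish(double prev_finish, double arrival_time,
                 double xmit_time, double weight)
{
    double start = prev_finish > arrival_time ? prev_finish : arrival_time;
    return start + xmit_time / weight;   /* no round-number tracking needed */
}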
Goyal et al. [8] have shown that when all connections are active (i.e., they have packets waiting for transmission in the queue), virtual clock and fair queuing achieve a similar worst-case end-to-end delay. Stiliadis and Verma [9] compared the fairness of several well-known scheduling algorithms and found that the relative fairness bound for virtual clock is infinity when used for best-effort flows. Figure 3.12 shows an example demonstrating the problem associated with virtual clock. There are two active queues, A and B. A has a large packet, with finish time 20. Since there is no other competing packet with a smaller finish time waiting in any other queue, A starts getting service. Now suppose that at time 2 another packet arrives at queue B with a smaller finish time, say 7. Despite the packet from queue B having a smaller finish time, it will have to wait until time 20, because the packet from queue A cannot be preempted.

Figure 3.12 Virtual clock problem.
3.3 CLASS-BASED QUEUING

Class-based queuing (CBQ) falls under the category of hierarchical schemes, whereby classification of packets to be scheduled can be done more than once [10]. Floyd [11] presents various algorithms for hierarchical link-sharing and argues that
controlled link-sharing may equip gateways with the flexibility to cater to new applications and network protocols in the Internet. All arriving packets are classified into several classes, with one queue for each class. Many different criteria are used to classify packets. For instance, an organization may purchase part of a link's bandwidth based on its needs but may wish to divide this bandwidth further into separate classes based on real-time and non-real-time traffic (or possibly based on the protocol types being used).
Figure 3.13 Class-based queue.
Figure 3.13 shows an example of a classification hierarchy. At the first level of classification, packets from two divisions, CSE and EE, are classified into separate classes. Packets from the CSE division are then further classified based on the protocols used, ATM or IP. For IP packets, three classes are used: TELNET, FTP, and real time (RTP, etc.). It is important to note that under one class there may be many flows. Resources, therefore, are allocated to a class, not to a flow: many TELNET flows will share the resources allocated for the TELNET class. CBQ can use any of the scheduling disciplines we have discussed so far.
3.4 IMPLEMENTATION STATUS

Under different flavors of UNIX operating systems, several of these advanced scheduling techniques have been developed. Linux has an iproute2 tool module that provides support for a variety of packet schedulers in the kernel [12]. Alternate queuing (ALTQ) is another set of utilities, available mostly on the FreeBSD platform [13]. We provide some sample usage of the iproute2 tool tc for traffic control.

CBQ Example

Let's take a simple example of a Linux-based router connecting a university to the outside world via an interface eth0 (100 Mbps) and to the internal network via interface eth1 (100 Mbps). The university decides to divide the available link capacity of 100 Mbps among the faculties that it connects to the outside world. The faculty of engineering gets 70 Mbps of the link capacity, whereas the remaining 30 Mbps is given to others. The Linux iproute2 module provides a command line interface to its traffic control tool tc. We provide a brief description of how this tool could be used to accomplish the aforesaid task.

The command below configures a root queuing discipline with 100: as a handle to be used later. The discipline used here is CBQ with a total available bandwidth of 100 Mbps and an average packet size of 1,000 bytes.

#tc qdisc add dev eth1 root handle 100: cbq \
   bandwidth 100Mbit avpkt 1000

The following command generates a root class from the above queuing discipline with a class id of 100:1. This class uses the complete available bandwidth of 100 Mbps and sets the MTU to 1,514. Details of command line arguments for tc can be found in the Linux manual [14].

#tc class add dev eth1 parent 100:0 classid 100:1 cbq \
   bandwidth 100Mbit rate 100Mbit allot 1514 weight 1Mbit \
   prio 8 maxburst 20 avpkt 1000

Now the root class needs to be divided between engineering and the other faculties in the ratio of 70:30. The following commands accomplish this task:

#tc class add dev eth1 parent 100:1 classid 100:70 \
   cbq bandwidth 100Mbit rate 70Mbit allot 1514 weight \
   7Mbit prio 5 maxburst 20 avpkt 1000

#tc class add dev eth1 parent 100:1 classid 100:30 \
   cbq bandwidth 100Mbit rate 30Mbit allot 1514 weight \
   3Mbit prio 5 maxburst 20 avpkt 1000

Figure 3.14 shows the classification performed by the above commands using CBQ.
Figure 3.14 CBQ example using tc (root 100: at 100 Mbps, divided into engineering, 100:70 at 70 Mbps, and others, 100:30 at 30 Mbps).
So far we have managed to define a CBQ discipline and its classes. However, we need to manage these classes using a queuing discipline. The following commands configure SFQ for these queues:

#tc qdisc add dev eth1 parent 100:70 sfq quantum \
   1514b perturb 15

#tc qdisc add dev eth1 parent 100:30 sfq quantum \
   1514b perturb 15

Finally, the kernel needs a packet classifier to decide which queue packets are sent to. This is achieved using the following commands, with the assumption that traffic from the engineering faculty is coming via gateway 192.168.21.29.
#tc filter add dev eth1 parent 100:0 protocol ip \
   prio 100 u32 match ip dst 192.168.21.29 \
   flowid 100:70

#tc filter add dev eth1 parent 100:0 protocol ip \
   prio 25 u32 match ip dst 192.168.0.0 flowid 100:30
3.5 RESEARCH DIRECTIONS IN SCHEDULING

In recent years, packet scheduling has been a very active research area, resulting in several variants of the fair queuing scheduler. Keshav [15, 6] provides details of many of these schedulers and a table comparing these scheduling disciplines. A brief description of a few prominent schedulers is provided here.

Worst-Case Fair Weighted Fair Queuing

WFQ doesn't provide an absolute fairness bound on very short time scales. Worst-case fair weighted fair queuing (WF2Q) [16] eliminates this problem by introducing a new rule for selecting the packet to be serviced. In place of the packet with the lowest tag, WF2Q serves the packet with the smallest finish number among all the packets that have started (and possibly finished) service in the GPS emulation at that point in time. This makes WF2Q more complex than WFQ, but it provides better fairness than WFQ on short time scales.

Self-Clocked Fair Queuing

As we have seen earlier, WFQ uses a complex scheme for round number calculation. Self-clocked fair queuing (SCFQ) [17] is distinguished from WFQ in that the round number is set to the finish number of the packet currently being served. This simplifies the round number calculation. The drawback of this scheme is that it results in a greater worst-case delay than WFQ.

Start Time Fair Queuing

This scheme also tries to simplify the complex round number calculation of WFQ [18]. The round number is set to the start number of the packet currently receiving service from the scheduler. The worst-case delay is similar to that of the
WFQ scheduler. However, flows that have less throughput suffer less delay than in the case of WFQ.
Core State Fair Queuing
As discussed earlier, the FQ algorithms are complex to implement, especially for high-speed links. Also, these algorithms suffer from a scalability problem, as they provide guarantees to each individual flow. The core state fair queuing (CSFQ) [19] algorithm proposes a two-tiered scheme whereby only the edge routers perform per-flow state management. The edge estimates the rate of each incoming flow and attaches a label indicating this rate. After this, neither edge nor core nodes need to perform per-flow management. They look at the incoming label and the fair rate of each outgoing link to calculate the forwarding probability p of the packet. While forwarding the packet, its flow label is replaced with the minimum of its previous label (l) and the fair rate of the output link (a), as given by the following equation:

    l_new = min(l, a)                    (3.8)

where l_new is the new label.

Using CSFQ, the core routers have state complexity of O(1), whereas the edge routers have O(n), where n is the number of flows.

Figure 3.15 shows the topology used in the simulation study performed by Stoica et al. [20]. Node1 and Node2 are assumed to be connected via a full duplex 10-Mbps point-to-point link with a 1-ms link delay. The output buffer of these nodes is 64,000 bytes, and the maximum packet size allowed on this link is assumed to be 1,000 bytes. Figure 3.15 also shows one UDP and four TCP (Tahoe) sources connected to Node1 sending packets to sinks connected to Node2. The UDP source sends 1,000-byte packets at a rate of 10 Mbps. The TCP sources (TCP-1 to TCP-4) also send 1,000-byte packets, with a window size of 28K bytes. The simulation runs for 10 seconds, with all sources starting and finishing at the same time. Table 3.1 shows the throughput received by these sources when CSFQ is applied at Node1 and Node2. As is evident from the table, all sources receive a fair share of the 10-Mbps bandwidth (approximately 2.0 Mbps). Although the UDP source attempts to send at 10 Mbps, it only receives 2.24 Mbps of bandwidth.
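A minimal sketch of the per-packet decision at a CSFQ core router follows, assuming the arrival-rate label l carried by the packet and the link fair rate a are already known; it is an illustration of the idea, not the authors' code. A packet from a flow exceeding its fair share is forwarded with probability a/l, and the label is rewritten per equation (3.8).

#include <stdlib.h>
#include <stdbool.h>

bool csfq_forward(double *label, double fair_rate)
{
    /* forwarding probability: min(1, fair_rate / label) */
    double p = *label > fair_rate ? fair_rate / *label : 1.0;
    if (drand48() > p)
        return false;            /* drop: flow is exceeding its fair share */
    if (*label > fair_rate)
        *label = fair_rate;      /* equation (3.8): l_new = min(l, fair_rate) */
    return true;
}

The probabilistic drop is what lets the core stay stateless: the rate estimate travels in the packet itself, so no per-flow counters are needed beyond the edge.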
Figure 3.15 CSFQ simulation (one UDP and four TCP sources at Node1 sending to sinks at Node2).
Table 3.1
CSFQ Simulation Results

Source    Throughput (Mbps)
UDP       2.24
TCP-1     1.99
TCP-2     1.88
TCP-3     1.74
TCP-4     1.89
3.6 SUMMARY

In today's Internet, networking devices such as routers and switches need to perform control functions to guarantee QoS. This chapter provided a detailed description of the fundamentals of packet scheduling techniques. Packet schedulers must achieve bandwidth partitioning and provide delay bound guarantees. We discussed the fair queuing scheme and several variants of it, such as WFQ and SFQ. Several commercial networking devices and Linux/FreeBSD-based routers provide support for hierarchical schemes. We discussed the CBQ scheme with an example of configuring CBQ using the Linux iproute2 module. Finally, we concluded this chapter with recent research developments in the packet scheduling area.
74
Engineering Internet QoS
3.7 REVIEW QUESTIONS

1. Why is the packet scheduler considered important for QoS-capable networks?

2. What is a work-conserving scheduler? Explain with an example.

3. Explain the max-min fair share scheme.

4. Assume that three queues, with MTUs of 500, 1,000, and 2,000 bytes, are competing for a link. How many packets from each of these queues will be served by a WRR scheduler in each round?

5. What factor influences the choice of quantum size in DRR?

6. What is the GPS scheme? Why is it impractical to implement?

7. Explain the finish time calculation scheme in WFQ.

8. How does the virtual clock scheme differ from WFQ?

9. A DRR scheduler is serving a queue with a quantum size of 500 bytes. Assume that three flows are identified and put into queues q1, q2, and q3, respectively. The queues hold packets (q1, 1200), (q2, 700), and (q3, 450). Show how packets will be served from these queues (you may like to draw diagrams similar to the example in the text). How would the deficit counter value for connection q3 change if it had another packet of size 400 waiting behind the first packet of size 450?

10. What is the motivation behind implementing hierarchical scheduling in gateways?
3.8 IMPLEMENTATION PROJECT

Configure CBQ on a gateway that serves as the point of connection for an organization to the outside world. The organization has three divisions, and the bandwidth should be distributed in the proportion 50:30:20. Further, each of the divisions may like to allocate bandwidth to certain applications: divide the bandwidth further in a 50:50 ratio between TCP and UDP flows. You may like to allow bandwidth borrowing between the flows. Once you have configured the router, generate traffic from multiple sources and measure the throughput received by each flow. Try to send at a rate faster than what is allowed for your flow to see if your configuration works correctly.
Hints: Use either Linux Traffic Control or ALTQ for configuring CBQ. The companion Web site has links to other tools for traffic generation and statistics collection.

References

[1] L. Kleinrock. Queuing Systems, Volume 2: Computer Applications. Wiley Interscience, New York, 1975.

[2] J. Nagle. On packet switches with infinite storage. Request for Comments 970, Internet Engineering Task Force, December 1985.

[3] Abhay K. Parekh and Robert G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: the multiple node case. In Proceedings of the Conference on Computer Communications (IEEE Infocom), volume 2, pages 521-530, San Francisco, California, March/April 1993.

[4] Alan Demers, Srinivasan Keshav, and Scott Shenker. Analysis and simulation of a fair queueing algorithm. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 1-12, Austin, Texas, September 1989.

[5] M. Shreedhar and George Varghese. Efficient fair queueing using deficit round robin. ACM Computer Communication Review, 25(4):231-242, October 1995.

[6] S. Keshav. An Engineering Approach to Computer Networking. Addison Wesley, Boston, Massachusetts, 1st edition, 1997.

[7] L. Zhang. Virtual clock: A new traffic control algorithm for packet switching networks. In Proceedings of ACM SIGCOMM, pages 19-29, Philadelphia, Pennsylvania, 1990.

[8] P. Goyal, S. Lam, and H. Vin. Determining end-to-end delay bounds in heterogeneous networks. In Proc. International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), pages 287-298, Durham, New Hampshire, April 1995.

[9] Dimitrios Stiliadis and Anujan Verma. Latency-rate servers: a general model for analysis of traffic scheduling algorithms. In Proceedings of the Conference on Computer Communications (IEEE Infocom), pages 111-119, San Francisco, California, March 1996.

[10] Ian Wakeman, Atanu Ghosh, Jon Crowcroft, Van Jacobson, and Sally Floyd. Implementing real-time packet forwarding policies using streams. In USENIX 1995 Technical Conference, New Orleans, Louisiana, January 1995.

[11] Sally Floyd and Van Jacobson. Link-sharing and resource management models for packet networks. IEEE/ACM Transactions on Networking, 3(4):365-386, August 1995.

[12] Werner Almesberger. Linux traffic control: implementation overview. Technical report, EPFL, January 1998. ftp://lrcftp.epfl.ch/pub/people/almesber/pub/tcio-current.ps.gz.

[13] Kenjiro Cho. A framework for alternate queueing: Towards traffic management by PC-UNIX based routers. In USENIX 1998 Annual Technical Conference, New Orleans, Louisiana, June 1998.

[14] Linux 2.4 advanced routing HOWTO. http://www.linuxdoc.org/HOWTO/Adv-Routing-HOWTO.html.

[15] Hui Zhang and Srinivasan Keshav. Comparison of rate-based service disciplines. In SIGCOMM '91 Symposium on Communications Architectures and Protocols, pages 113-121, Zurich, Switzerland, September 1991.

[16] Jon C. R. Bennett and Hui Zhang. WF2Q: worst-case fair weighted fair queueing. In Proceedings of the Conference on Computer Communications (IEEE Infocom), pages 120-128, San Francisco, California, March 1996.

[17] S. Jamaloddin Golestani. A self-clocked fair queueing scheme for broadband applications. In Proceedings of the Conference on Computer Communications (IEEE Infocom), pages 636-646, Toronto, Canada, June 1994.

[18] Pawan Goyal, Harrick Vin, and Haichen Cheng. Start-time fair queueing: A scheduling algorithm for integrated services packet switching networks. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 157-168, Stanford, California, August 1996.

[19] Ion Stoica, Scott Shenker, and Hui Zhang. Core-stateless fair queueing: Achieving approximately fair bandwidth allocations in high speed networks. ACM Computer Communication Review, 28(4):118-130, September 1998.

[20] Ion Stoica. Technical overview: CSFQ. http://www.cs.cmu.edu/~istoica/csfq/tech.html.
Chapter 4

TCP/IP and Queue Management
Reducing the packet loss rate is a challenging task for the Internet, particularly as QoS requirements become more stringent. Dropped packets along the path translate into wasted resources. The first part of this chapter provides an overview of the TCP/IP protocols and the congestion control algorithm used by TCP as a refresher on these topics. Detailed coverage of these topics may be found in Stevens [1] and Comer [2]. Readers familiar with these concepts may jump to Section 4.9, which discusses queue management issues in the Internet.

The Internet is also known synonymously as the TCP/IP network. TCP and IP are transport and network layer protocols that provide service to higher layers. In order to understand the queue management issues, it is important to understand the basic underlying principles of these protocols.
4.1 INTERNET PROTOCOL

The Internet Protocol (IP) is the main protocol used by all higher layer protocols in the TCP/IP protocol stack. The IP protocol is independent of the underlying subnetwork technology and can potentially run over any networking technology (including ATM, Ethernet, frame relay, and optical networks). On top of this, the network layer performs routing functions (determining the path to be followed by a packet) using routing protocols such as RIP and OSPF (details of these protocols are beyond the scope of this book). An accompanying protocol, the Internet Control Message
Protocol (ICMP), is used for communication between network layers and for error reporting. In the following, we discuss the IP protocol in detail.

4.1.1 Datagram Forwarding

The IP protocol provides a connectionless, unreliable datagram model of communication, as opposed to the virtual circuit-oriented communication of the ATM network. The IP layer encapsulates higher layer protocol data, such as a TCP segment, within an IP datagram; fills in the source and destination addresses and other fields of the IP header; and forwards the datagram to the next hop router (based on its routing table) along the path to the destination. Each intermediate router along the path (between source and destination) processes the IP datagram by looking up the next hop (from the routing table) for the destination address carried in the datagram and forwards it along this path. It is worth noting that each datagram is processed independently of the others.

The connectionless model used by IP in the Internet has several advantages. First of all, unlike the connection-oriented (virtual circuit) model, there is no need for explicit connection establishment and termination. This simplifies router design, as the routers don't need to maintain a large number of state variables. The connectionless model scales well to the large number of hosts in the Internet. Routers also have the flexibility of choosing an appropriate path for IP datagrams based on congestion levels or link availability. However, this approach also creates problems for end systems. Packets taking different routes may arrive out of order, as they will suffer different amounts of delay. The receiver needs to take care of buffering and packet sequencing.

4.1.2 Unreliable Delivery of Datagrams

The IP layer provides only best effort delivery guarantees to higher layers. This means that if routers can't process datagrams fast enough, they will drop packets. The IP layer doesn't make any effort to recover from packet loss. It is also possible that duplicate packets may be generated in the IP network. As discussed earlier, packets taking different routes may arrive out of sequence at the receiver. Higher layer protocols such as TCP can be used to build reliability on top of the unreliable IP layer.

4.1.3 Datagram Format

Figure 4.1 shows the format of the current version of IP, version 4 (IPv4). The IETF has a new standard for IP called IP version 6 (IPv6); details of IPv6 can be found in
Hinden and Deering [3]. We confine our discussion to IPv4 in this chapter.

Figure 4.1 IPv4 header format.
An IP datagram consists of a header followed by a payload. A description of some key fields of the IP header is provided in the following:

- Version: Bits in this field show the version number of the IP protocol in use. This field tells how to interpret the rest of the IP datagram, as headers and fields vary between versions. The current version of the IP protocol is 4, as discussed earlier. In the future, IP datagrams will carry 6 to represent IPv6.

- Header length: This field indicates the length of the header in multiples of 32-bit words. In most cases the header length of an IP datagram is 20 bytes (if the options field is not used). Since the header can be of variable length, this field also helps identify the start of the payload.

- Type of service (ToS): This field was included in IPv4 so that a source could request some form of privileged treatment from routers. Control packets, say, could get preferential treatment in the wake of congestion in the network. It could also be used to specify quality of service requirements of sources, such as delay and throughput. However, it is not mandatory for routers to support this feature, and many legacy routers interpret these fields differently. In recent years, the Diffserv working group under IETF has been working on standardizing these bits to support the class-based network [4].
- Total length: As the name suggests, the total length field indicates the total length of a datagram in bytes, including the header. The theoretical maximum size of a datagram is 65,535 bytes, as this field is 16 bits wide. This length is essential, as datagrams can be of variable length and there is no specific end-of-datagram flag in use.

- Identifier: This field is used as a sequence number to uniquely identify a datagram and its fragments. If a datagram is fragmented by a router, each fragment contains the same identifier. This is helpful in the datagram reassembly process.

- Flags and fragment offset: The flags and fragment offset are used for fragmentation.

- Time to live: Because of routing loops, datagrams can keep circulating in the network, which may result in wasted resources. The time-to-live (TTL) field restricts the life of a datagram in the network by indicating the maximum number of hops a datagram can traverse. Each router decrements this counter by one. Once the value of this field reaches zero, the datagram is discarded by the router.

- Protocol: The protocol field identifies the transport layer protocol entity at the receiver that should receive the data portion of the IP datagram. As an example, a value of 6 indicates that the IP datagram is destined for a TCP entity, whereas a value of 17 indicates that it should be passed to a UDP entity. In essence, this field helps multiplex and demultiplex a variety of higher layer protocols over the same IP layer.

- Header checksum: The header checksum is used by routers to identify bit errors in a received IP datagram header. An error in the header may potentially result in delivery of a datagram to a wrong destination. Routers simply discard a datagram for which the checksum gives an error. The data part of the IP datagram is not protected by this checksum; it is up to the higher layers to recover from errors in the data field.

- Source and destination address: These fields identify the source and destination of the IP datagram. Each contains a 32-bit IP address. Each interface connected to an IP network has a unique IP address.

- Options: The options field makes it possible to extend the IP header. As the name suggests, this field is not compulsory. It can be used to support options such as security, source routing, route recording, and timestamping. This field is of variable length, as the number of options is determined by the source of the datagram.
- Payload: In most cases, the payload field encapsulates higher layer PDUs (such as TCP or UDP segments). This field can also carry ICMP messages.

- Padding: As discussed earlier, the options field is of variable length. The padding field can be used to align the header to 32-bit words.
4.2 USER DATAGRAM PROTOCOL

The Internet protocol stack has two protocols at the transport layer. The User Datagram Protocol (UDP) is a simple, unreliable, connectionless protocol, defined in RFC 768 [5]. UDP uses a connectionless mode of delivery. A UDP datagram is encapsulated within an IP packet. UDP datagrams are delivered just like IP packets and may be discarded before reaching their destination. Figure 4.2 shows the header used by UDP datagrams. The header consists of four fields, as described below:
- Source and destination port numbers: 16 bits each. UDP provides port numbers to let multiple processes use UDP services on the same host. A UDP address is the combination of a 32-bit IP address and a 16-bit port number. With 16 bits, there are 65,535 possible ports; however, certain port numbers (called well-known ports) are reserved for server applications.

- Length: 16 bits. The length field represents the total length of the UDP datagram (including the header) in bytes.

- Checksum: 16 bits. UDP provides a checksum field to check the integrity of its data. A packet with an incorrect checksum is simply discarded, with no further action taken.

Figure 4.2 UDP header format.
The simplicity offered by UDP makes it attractive when TCP would be too slow or too complex. Especially for interactive multimedia transmission such as Internet telephony,
where a certain amount of packet loss is acceptable but end-to-end delay needs to be bounded, UDP becomes a protocol of choice. The overhead of retransmission of lost datagrams is too high and a retransmitted packet may be too late for playout.
4.3 TCP BASICS

In contrast to UDP, the Transmission Control Protocol (TCP) is a reliable, connection-oriented protocol. For certain applications the reliability feature becomes extremely important, as the applications do not have to worry about lost or reordered data. TCP is a full duplex protocol supporting data flow in both directions.

TCP is a byte-oriented protocol. Once a TCP connection is established, the sender writes bytes into the connection and the receiver reads bytes out of the connection. It is worth noting that this byte-oriented abstraction is provided to higher layer protocols only (such as application layer protocols). The TCP entity accumulates a certain number of bytes and sends a packet (called a TCP segment) to the destination. The receiving TCP entity buffers these bytes to be read by the receiving application process at a later stage. The sending TCP entity has a variable called maximum segment size (MSS) that determines the size of a segment (the number of bytes to be sent in a segment). This TCP segment is encapsulated within an IP datagram, as described later in this chapter.

Fragmentation at the lower layer is expensive. For this reason, the TCP entity selects an MSS that avoids fragmentation at lower layers. Each link has a maximum limit on the amount of data that can be carried [called the maximum transfer unit (MTU)]. For example, Ethernet has an MTU of 1,500 bytes. The TCP entity should therefore choose an MSS of 1,460 bytes (1,500 - 40 bytes of TCP and IP headers) to avoid link layer fragmentation.

4.4 TCP SEGMENT FORMAT

Figure 4.3 shows the format of a TCP segment.
- Source port number: Each application using a TCP connection can be uniquely identified by a 16-bit port number. This allows the TCP entity to multiplex and demultiplex multiple higher layer connections.

- Destination port number: The destination port number allows the destination TCP entity to identify the higher layer entity (application protocol) that is the recipient of a particular TCP segment.
Figure 4.3 TCP segment format.
- Sequence number: As discussed earlier, TCP is a byte-oriented protocol. The 32-bit-wide sequence number field contains the sequence number of the first byte of data carried in a TCP segment. As an example, if the preceding segment started with sequence number 2,000 and contained 1,460 bytes of data, then the sequence number of the next TCP segment would be set to 3,460.

- Acknowledgment number: This field is used by TCP for data flow in the reverse direction (remember that TCP is full duplex). This technique of sending an acknowledgment with data is also called piggybacking: data carries the acknowledgment for the reverse direction of traffic. The 32-bit acknowledgment number tells the sender the first byte of the next segment that the receiver expects. This also implicitly acknowledges that all bytes before this have been received successfully. It is worth noting that more than one segment can be acknowledged by a TCP entity using this field.

- Header length: This field indicates the length of the TCP header in multiples of 32-bit words. In most cases the header length of a TCP segment is 20 bytes. However, this may vary if the options field is used. Since the header can be of variable length, this field also helps identify the start of the data field of the segment.

- Reserved bits: The reserved bits (6) of the TCP segment are for future use.

- Flags: The 6-bit flag field is used by the peering TCP entities for communication of control information. Some of these flags are used during the connection establishment and termination phases. A TCP segment may carry either user or control information, and these flags help identify the type of information carried in the segment. Table 4.1 provides descriptions of these flags; each description relates to the value of the flag being set to true (1).

- Receiver window size: This field carries information about the flow of data in the reverse direction. The receiver window size is used for flow control purposes, as described later in this chapter.

- Checksum: The checksum field is computed over the TCP header, the TCP payload, and a pseudo header consisting of the source and destination addresses and the length fields of the IP header. This 16-bit field protects both the TCP header and the payload.

- Urgent pointer: A TCP segment may carry data that needs priority treatment. Urgent data is placed before any other data. The 16-bit urgent pointer indicates the start location of the nonurgent data in the segment; the receiving TCP entity can identify the urgent data using this pointer. A TCP segment carrying urgent data has the URG flag set.

- Options: RFC 793 [6] defines the format of the options field. Options are specified in multiples of bytes: the first byte indicates the option type, followed by the length of the option in bytes. We provide examples of two important options below:

  1. Maximum segment size (MSS): This option is used by TCP during the connection establishment phase to negotiate the maximum segment size to be used for the connection. The 16 bits used for this field limit the MSS to 64 KB.

  2. Timestamp: The timestamp option is defined in RFC 1323 [7] for use in round-trip time (RTT) calculation. Two 4-byte timestamp fields are used for this option. The sending TCP entity fills the first field with the current time; the receiver echoes back the timestamp value received in an acknowledgment segment. This facilitates accurate calculation of RTT by the sender.
Table 4.1
TCP Segment Flags

Flag       Description
ACK (A)    Acknowledgment field valid
FIN (F)    Final segment from sender
PSH (P)    Push operation invoked; receiving process needs notification
RST (R)    Connection to be reset
SYN (S)    Start of a new connection
URG (U)    Urgent pointer field valid
4.5 TCP THREE-WAY HANDSHAKE

As discussed earlier, TCP is a connection-oriented protocol. The two ends of TCP need to establish a connection before they can send any data. Once the data sending phase is over, the connection is closed; both ends need to explicitly close the connection. The process of establishing and terminating a TCP connection is called the three-way handshake.

Figure 4.4 shows how the three-way handshake works. First of all, the client end of the TCP connection sends a segment to the server with the SYN bit set in the flags field and the initial sequence number (say, SeqNo = 88) it is going to use. The server responds with a segment that has both the SYN and ACK bits set (SeqNo = 155, AckNo = 89). The acknowledgment number 89 indicates that the server has received bytes up to 88 correctly and that the next byte it expects has sequence number 89. The sequence number tells the client that the server will use starting sequence number 155 for its data. Finally, the client acknowledges the server's sequence number with an ACK segment (AckNo = 156).
Figure 4.4 TCP three-way handshake (client: SYN, SeqNo=88; server: SYN, ACK, SeqNo=155, AckNo=89; client: ACK, AckNo=156).
4.6 TCP ACKNOWLEDGMENT

TCP is a reliable, connection-oriented protocol. It uses acknowledgments from the receiver to the sender to confirm delivery of received data. The acknowledgment is cumulative (i.e., an acknowledgment for packet i confirms that all packets up to and including sequence number i were received correctly by the receiver). Although in reality TCP assigns sequence numbers to each byte that it sends, for simplicity this chapter assumes that packets are assigned sequence numbers. A receiver generates a new cumulative acknowledgment upon receipt of a new in-sequence packet. A duplicate acknowledgment is generated by the receiver whenever an out-of-order segment arrives at the receiver.
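The receiver-side ACK rule can be sketched as follows. This is a minimal illustration under the chapter's packet-numbering simplification (sequence numbers 1, 2, 3, ...), not the book's code; MAXSEQ bounds the toy sequence space.

#include <stdio.h>
#include <stdbool.h>

#define MAXSEQ 64
static bool received[MAXSEQ];     /* which packet numbers have arrived */

int ack_for(int seq)
{
    received[seq] = true;
    int hi = 0;                   /* packet 0 is implicitly "before the start" */
    while (hi + 1 < MAXSEQ && received[hi + 1])
        hi++;                     /* highest packet with no gap below it */
    return hi;                    /* cumulative ACK */
}

int main(void)
{
    int arrivals[] = { 1, 2, 4, 5, 3 };   /* packet 3 arrives late */
    for (int i = 0; i < 5; i++)
        printf("got %d -> ack %d\n", arrivals[i], ack_for(arrivals[i]));
    return 0;
}

The trace produced is 1, 2, 2, 2, 5: the two repeated ACKs for 2 are the duplicate ACKs generated by the out-of-order arrivals, and the ACK jumps to 5 once the gap at 3 is filled.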
4.7 FLOW CONTROL

TCP provides flow control to avoid a fast sender swamping a slow receiver. Each TCP receiver allocates a buffer for a TCP connection. Bytes received (correctly and in order) are placed in this buffer. The corresponding application reads bytes
from this buffer as soon as possible. In some cases the application may be busy and cannot read the bytes immediately. Once the receiver buffer is full, no more bytes can be received for this connection. Flow control is required to indicate the receiver's available buffer space to the sender so that the sending rate can be kept in tune with the receiving rate. The TCP receiver maintains a variable called AdvertisedWindow for this purpose. In TCP implementations, the code for flow control and congestion control is intertwined. Another variable, called CongestionWindow, is maintained by TCP to limit the amount of data that it can send into a connection. This variable is calculated based on current congestion levels in the network. The impact of congestion control and the CongestionWindow variable is discussed later in this chapter. Jacobson and Allman et al. [8, 9] provide detailed treatment of congestion control in TCP. The sender TCP uses the minimum of CongestionWindow and AdvertisedWindow to limit the amount of data sent into a connection. This has the effect of restricting the TCP source to the slowest component in the end-to-end connection. If the network is congested, there is no point sending faster than what can be forwarded by routers. If the end system is busy or slow, again there is no advantage in flooding the network and the end system.
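A two-line sketch of this rule (the variable names are illustrative; real implementations also subtract the data already in flight):

    def usable_window(congestion_window, advertised_window, bytes_in_flight):
        # The sender may transmit at most min(CongestionWindow, AdvertisedWindow).
        return max(0, min(congestion_window, advertised_window) - bytes_in_flight)

    print(usable_window(16000, 8000, 3000))   # 5000: the receiver is the bottleneck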
The TCP sliding window scheme used in TCP flow control is described next.

Sliding Window
Figure 4.5 shows the sliding window scheme used by TCP for flow control. For simplicity, let's assume that the window size is 6 and that segments with sequence number 0 onwards are available for transmission. The sequence space is determined by the sequence number field in the TCP header (32 bits). We explain this example through the steps below:
Step 1: The figure shows that segments 0, 1, and 2 have already been transmitted and acknowledged by the receiver. Segments 3, 4, and 5 have already been sent and the sender is waiting for acknowledgments to come back. Since the window size is 6, segments 6, 7, and 8 are eligible to be sent by the TCP process. Segments 9 and above cannot be sent because of the window size limitation.
Step 2: Shows that the sending TCP process has sent segments 6, 7, and 8 and is waiting for acknowledgments for all segments in its current window. No more segments can be sent at this stage.
Step 3: Shows the status when acknowledgments for segments 3 and 4 have been received. At this stage, the sliding window moves by two to the right, making segments 9 and 10 eligible to be sent.
Figure 4.5 TCP sliding window (segments 0 through 13 shown across Steps 1 to 4; legend: acknowledged, waiting for ACK, can be sent, can't be sent).
Step 4: Shows that the sender TCP process sends segments 9 and 10 and starts waiting for their acknowledgments. In summary, the right side of the window moves when a segment is sent, whereas the left side of the window moves when an acknowledgment is received. The maximum number of segments waiting for acknowledgment is determined by the window size.
4.8 CONGESTION CONTROL
Before turning to the congestion control issues in TCP, it is important to understand a few basic mechanisms such as packet loss detection, the retransmission timer, RTT estimation, and timer granularity.
4.8.1 Packet Loss Detection
Detection of packet loss is specific to variants of TCP, such as Tahoe, Reno, and Vegas (briefly discussed in Section 4.8.6). Some schemes use timeouts to detect
loss of a segment. The TCP sender sets a retransmission timer for packets sent; only one timer is active at a time. If the acknowledgment for the packet for which the timer was started is not received before the timer expires, the packet is considered lost. The value of the retransmission timer is calculated dynamically, as explained in the next subsection. Some variants of TCP use timeouts as well as the arrival of multiple ACKs with the same cumulative sequence number (duplicate ACKs) for packet loss detection.
4.8.2 Retransmission Timer
For each TCP segment sent, the sender starts a timer. Upon expiration of this timer (called the timeout period), the segment is retransmitted. Setting a correct value for this timer is very significant from a performance point of view. The timeout period should be greater than the RTT, which accommodates the transmission delay, the propagation delay of the link, packet header processing time, and ACK generation time by the protocol stack at the receiver. Setting the timeout much longer than this would result in a longer delay for applications. However, smaller values may result in premature retransmission of segments, which also wastes bandwidth.
4.8.3 RTT Estimation
Each TCP sender maintains an estimate of the RTT for each of its connections. We use the EstimatedRTT variable to represent this. The EstimatedRTT is calculated from sample RTTs (SampleRTT) of the connection. SampleRTT is defined as the time from the moment a TCP segment is delivered to the IP layer until an ACK is received for that segment. An exponentially weighted moving average is used for the calculation of EstimatedRTT, since the SampleRTT varies significantly between measurements. This variation may be caused by delays in router buffers as well as end-system processing time. The EstimatedRTT is given by the following equation:
EstimatedRTT = (1 − α) × EstimatedRTT + α × SampleRTT    (4.1)
The typical value of α is 0.125. This gives a very low weight to the SampleRTT value measured in the most recent period and a high weight to the historical data represented by EstimatedRTT. A lower value of α prevents the RTT estimate from being skewed by spikes in the measured samples. The timeout is given by the following equation:
"!#
»
± Ô
# $ %"&
(4.2)
where

DevRTT = (1 − β) × DevRTT + β × |SampleRTT − EstimatedRTT|    (4.3)
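A minimal sketch of update rules (4.1) through (4.3) in code (α = 0.125 as noted above; β = 0.25 and the initial deviation of half the first sample are commonly used values, assumed here rather than taken from the text):

    ALPHA, BETA = 0.125, 0.25

    class RttEstimator:
        def __init__(self, first_sample):
            self.estimated_rtt = first_sample
            self.dev_rtt = first_sample / 2   # assumed initial value

        def update(self, sample_rtt):
            # Equation (4.3): EWMA of the deviation
            self.dev_rtt = (1 - BETA) * self.dev_rtt + \
                           BETA * abs(sample_rtt - self.estimated_rtt)
            # Equation (4.1): EWMA of the RTT itself
            self.estimated_rtt = (1 - ALPHA) * self.estimated_rtt + ALPHA * sample_rtt
            # Equation (4.2): the retransmission timeout
            return self.estimated_rtt + 4 * self.dev_rtt

    est = RttEstimator(0.100)        # first measured RTT: 100 ms
    print(est.update(0.120))         # timeout after a 120-ms sample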
The deviation factor DevRTT in (4.2) accommodates the fluctuation of SampleRTT around its EstimatedRTT. For links with consistent SampleRTT, this factor will be negligible. Equation (4.3) keeps an exponentially weighted moving average of the deviation.
4.8.4 Slow Start
Figure 4.6 shows the slow start mechanism used by TCP for congestion control. Initially, the congestion window size (CongestionWindow) is set to 1 MSS (maximum segment size). A TCP sender increases its window size by one segment upon receipt of each ACK. This has the effect of doubling the window size every round-trip time (RTT). As is evident from the figure, initially one segment is sent. Upon receipt of the first ACK, two segments are sent. Once the ACKs for these segments are received, four segments are sent. This results in exponential growth of the TCP window size.
4.8.5 AIMD
Additive increase and multiplicative decrease (AIMD) is a scheme used by TCP for congestion control purposes. In the additive increase phase, the TCP sender increases the window size by 1/CongestionWindow for each ACK received. This results in the window size increasing by one segment every RTT. The transition out of the slow start (or exponential) phase is controlled by a variable called ssthresh. The ssthresh is initialized to half the initial window size. Once loss is detected, this variable is reset to half the current window size. To be precise, ssthresh is set to the maximum of min(CongestionWindow, AdvertisedWindow)/2 and 2 MSS. Figure 4.7 shows the AIMD scheme. For simplicity, it shows the increase of the congestion window as a function of RTT. Upon detecting a packet loss, the TCP sender implicitly assumes that network congestion has occurred. In the example used for Figure 4.7, it is assumed that congestion occurred when CongestionWindow was 16 MSS. Upon timeout, the CongestionWindow is set to 1 MSS. The ssthresh is set to 8 MSS in this case (half of CongestionWindow at the time of congestion). The congestion window increases exponentially during the slow start phase until it reaches the value of ssthresh (8). After this period, it enters the congestion avoidance phase and starts to grow linearly.
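The combined slow start/AIMD behavior of Figure 4.7 can be reproduced with a few lines of simulation (a sketch in MSS units; the loss_at parameter marks the RTT at which a timeout is assumed to occur):

    def evolve_cwnd(rtts, loss_at=None, cwnd=1, ssthresh=8):
        history = []
        for t in range(rtts):
            history.append(cwnd)
            if t == loss_at:                       # timeout: congestion assumed
                ssthresh = max(cwnd // 2, 2)       # half the window, at least 2 MSS
                cwnd = 1                           # restart from slow start
            elif cwnd < ssthresh:
                cwnd *= 2                          # slow start: double every RTT
            else:
                cwnd += 1                          # congestion avoidance: +1 MSS per RTT
        return history

    print(evolve_cwnd(10))              # [1, 2, 4, 8, 9, 10, 11, 12, 13, 14]
    print(evolve_cwnd(8, loss_at=3))    # [1, 2, 4, 8, 1, 2, 4, 5]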
Figure 4.6 TCP slow start (the sender transmits one segment, then two, then four in successive RTTs).
Figure 4.7 Congestion avoidance (congestion window in MSS versus round-trip time: exponential slow start up to the slow start threshold of 8 MSS, then linear congestion avoidance toward 16 MSS).
4.8.6 TCP Tahoe/Reno/Vegas
The congestion control algorithm described in the previous section is known as Tahoe. A major drawback of the Tahoe scheme is the time taken to react to congestion: a TCP-Tahoe source needs to wait for a timeout period to detect a segment loss. An improved algorithm called TCP-Reno is widely implemented in most TCP/IP stacks. TCP-Reno improves on the Tahoe version by using a scheme called fast retransmission. If three duplicate ACKs for a segment are received before the segment's timeout period, the dropped segment is retransmitted. In addition, TCP-Reno also employs a technique called fast recovery, which skips the slow start phase after a fast retransmission [9]. A newer algorithm called TCP-Vegas proposed proactive congestion management [10]. Congestion is monitored by observing the RTT: a longer RTT indicates that the network is congested. Once a longer RTT indicates possible future congestion, the source reduces its rate linearly.
4.9 QUEUE MANAGEMENT
Although we have studied the mechanisms deployed by TCP to control congestion, these mechanisms are considered insufficient in the Internet. Nonadaptive flows such as UDP, in particular, can do little to control congestion unless rate control is implemented at higher layers using protocols such as RTP/RTCP. Current Internet routers are expected to implement queue management techniques that can complement the end-to-end schemes [11]. These techniques include explicit signaling of congestion to the senders as well as active management of queues at network elements. Active queue management monitors the queue size proactively and starts marking and dropping packets before severe congestion occurs. Many authors consider packet scheduling (as discussed in Chapter 3) to be part of queue management. Queue management and scheduling are complementary to each other. Scheduling is primarily used to decide which packet to send next and to provide per-flow bandwidth guarantees. However, it has no mechanism to control the size of the queue. Many implementations of FQ and CBQ incorporate schemes such as random early detection (RED) for this purpose. In this book we consider queue management separately from packet scheduling. Queue management techniques are responsible for the following functions:
• Move packets to an appropriate queue;
• Remove packets from a queue on request from a packet scheduler;
• Drop and remark packets if the queue is full (or approaching saturation).
4.9.1 Explicit Congestion Notification
The TCP congestion control algorithms and other application-layer adaptive algorithms use implicit methods to recognize congestion. Also, most legacy routers drop packets in the wake of congestion, which results in wasted resources. An explicit method of signaling congestion was proposed by Ramakrishnan and Jain [12, 13] in a scheme called DECbit. The basic idea behind the DECbit scheme is that a packet header carries a bit that can be set by congested intermediate network elements. The receiver copies the congestion bit from data packets into the acknowledgment that is sent back to the sender. The sender treats this as a congestion signal and reduces its data rate. The IETF is currently working on standardizing explicit signaling of congestion to senders. RFC2481 [14] has proposed one such scheme, called explicit congestion notification (ECN), using two bits from the IPv4 ToS field:
• ECN capable transport (ECT) is used by end systems to indicate whether they are capable of ECN;
• Congestion experienced (CE) is used by network elements to mark this flag if the element is experiencing congestion.
Congested network elements are required to mark the CE flag only when the ECT bit is set; otherwise they may choose to drop the packet. As the congestion-related bits are marked at the IP layer, transport layers such as TCP need to be informed of the marking of the CE bit. RFC2481 has suggested modifications to TCP end systems so that ECN-enabled clients can communicate with each other and data packets can be marked with the ECT bit.
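As a concrete illustration, the two bits occupy the low-order positions of the ToS octet. The following sketch is our rendering of RFC2481's bit assignments (not code from the RFC):

    ECT = 0x02   # ECN capable transport (bit 6 of the ToS octet)
    CE  = 0x01   # congestion experienced (bit 7 of the ToS octet)

    def mark_if_congested(tos, congested):
        # Routers mark CE only for ECN-capable packets; others may be dropped.
        if congested and (tos & ECT):
            return tos | CE
        return tos

    print(hex(mark_if_congested(0x02, congested=True)))   # 0x3: ECT and CE set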
4.9.2 Packet Drop Schemes
This section concentrates on packet drop techniques and algorithms. Packet dropping is an important mechanism used by network elements (routers and switches) to avoid and reduce congestion. Most legacy routers have used the drop-tail scheme because of its ease of implementation: packets are dropped from the tail of the queue, so if the queue is full upon arrival of a packet, the newly arrived packet is dropped. RFC2309 [11] lists the following two problems with the drop-tail scheme:
• Lockout: It is possible for one or more flows to monopolize the queue space on a router. This phenomenon, called lockout, prevents other flows from accessing the queue space;
• Full queues: The drop-tail scheme allows routers to maintain queues at a nearly full level. As tail drop signals congestion only when the queue is full and packets start getting dropped, congestion can continue for a long period of time.
Sources such as TCP that detect congestion using duplicate ACKs will recognize the loss only after all packets already in the queue have been served. For such sources, dropping from the head of the queue is more effective [15], although implementation of the drop-head scheme is usually more expensive. Yet another possibility is to distribute the packet loss randomly across the queue. This scheme is quite attractive, as it distributes the loss among flows fairly: flows sending more packets are penalized more severely. However, implementation of random drop is quite complex.
4.9.3 Global Synchronization Problem
The drop-tail scheme has one major deficiency: it creates a problem known as the global synchronization problem. When a large number of flows pass through a congested Internet router, the router may start dropping packets from each of these flows almost simultaneously. Each TCP sender then detects packet loss and goes into the slow start phase. Congestion at the bottleneck router consequently improves, and the TCP sources start increasing their rates again. If the bottleneck router cannot cope with the load, all of these sources back off once more. The result is an unstable network prone to congestion collapse.
4.9.4 Random Early Detection Scheme
Floyd and Jacobson proposed the RED method, which drops packets randomly from the active flows in order to eliminate the global synchronization problem [16]. The RED scheme monitors the queue (buffer) fill level at routers, and as the queue builds up, it starts dropping packets from randomly selected flows.
Figure 4.8 RED queue thresholds (instantaneous queue length Qlen at time t, with minimum threshold Min_th and maximum threshold Max_th).
The RED algorithm requires an estimate of buffer occupancy and a calculation of packet drop probability. Two pointers, called the maximum threshold (Max_th) and the minimum threshold (Min_th), are maintained for this purpose, as shown in Figure 4.8. Another variable, Qlen, maintains the current value of the queue length at instant t.
Figure 4.9 EWMA calculation (actual queue size and its EWMA plotted against time; the EWMA varies much more smoothly than the actual queue size).
4.9.4.1 Buffer Occupancy Estimation
The RED algorithm estimates buffer occupancy with an exponentially weighted moving average (EWMA) of the queue length, QueueAvg, since the instantaneous queue length Qlen may vary significantly between measurements. The QueueAvg is given by the following equation:

QueueAvg = (1 − w_q) × QueueAvg + w_q × Qlen    (4.4)

where w_q is the averaging weight. Figure 4.9 shows the calculation of the EWMA using the simulation script provided by Floyd [17]. The solid line marked "actual" shows the queue size in packets waiting for transmission on the outgoing link of the RED gateway. The dotted line shows the EWMA of queue size calculated by the gateway. Typically, a small value of w_q ensures that QueueAvg changes slowly, so that short bursts of traffic do not trigger early packet drops.
4.9.4.2 Packet Drop Probability
The packet drop probability, P_drop, for a newly arrived packet is calculated by the following equation:

P_drop = Max_p × (QueueAvg − Min_th) / (Max_th − Min_th)

where Max_p is the maximum drop probability, reached as QueueAvg approaches Max_th.
Figure 4.10 RED drop probability (the drop probability grows linearly from 0 at Min_th to Max_p at Max_th, and jumps to 1 beyond Max_th; the x-axis is the average queue size).
The RED packet drop algorithm is shown in Figure 4.10. The following checks are performed by a RED queue manager:
• If QueueAvg < Min_th, the packet is buffered. This situation indicates that there is no congestion;
• If QueueAvg ≥ Max_th, the packet is discarded (drop probability is 1). This signifies a high congestion scenario;
• If Min_th ≤ QueueAvg < Max_th, the packet is dropped based on the calculated drop probability, P_drop. The packet drop probability increases linearly toward Max_p as the buffer occupancy approaches Max_th, which avoids overreaction to mild congestion (a sketch combining these checks follows this list).
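Putting the pieces together, the sketch below combines the EWMA of (4.4) with the three checks above (the threshold and weight values are illustrative, and the basic linear drop probability is used without the count-based correction of the full RED algorithm):

    import random

    W_Q, MIN_TH, MAX_TH, MAX_P = 0.002, 5, 15, 0.1

    class RedQueue:
        def __init__(self):
            self.queue_avg = 0.0     # EWMA of the queue length, in packets
            self.qlen = 0            # instantaneous queue length

        def enqueue(self):
            # Returns True if the arriving packet is dropped.
            self.queue_avg = (1 - W_Q) * self.queue_avg + W_Q * self.qlen   # (4.4)
            if self.queue_avg < MIN_TH:
                drop = False                         # no congestion: buffer the packet
            elif self.queue_avg >= MAX_TH:
                drop = True                          # high congestion: drop probability 1
            else:                                    # congestion avoidance region
                p = MAX_P * (self.queue_avg - MIN_TH) / (MAX_TH - MIN_TH)
                drop = random.random() < p
            if not drop:
                self.qlen += 1
            return drop

        def dequeue(self):
            if self.qlen > 0:
                self.qlen -= 1       # departures shrink the instantaneous queue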
The region between the two thresholds is a congestion avoidance scenario: packets are dropped in anticipation of queue buildup. Adaptive flows such as TCP can react to a congestion situation after only one RTT. The proactive approach taken by RED starts dropping packets before the queue gets saturated, so adaptive flows are likely to get congestion notification in time. Nodes implementing RED can also configure it so that, along with dropping packets, it marks offending packets. Clients implementing schemes such as DECbit and ECN can benefit from the marked packets by reducing their sending rates. Floyd and Jacobson [16] have shown that the loss of packets for flows is in proportion to their share of throughput. In essence, greedy sources (those sending more packets) are penalized heavily. A scheme such as RED is suitable for best effort flows where fairness is a key criterion. However, if a varying degree of QoS is to be supported for different flows, the queue manager must discard packets selectively: lower priority flows must lose more packets in the wake of congestion. A variant of RED called WRED is described below.
4.9.5 Weighted Random Early Detection
The weighted random early detection (WRED) scheme works with packets that are marked with different drop levels (Chapter 7 discusses assured forwarding for the diffserv network, which can mark packets with various drop precedences). A different drop probability function is used based on the marking of packets: packets with higher drop priority should be dropped first if queue occupancy increases beyond the minimum threshold. The WRED scheme allows a different set of parameters, Min_th, Max_th, and Max_p, to be configured for various traffic classes. Router vendors support up to eight sets of values based on the IPv4 precedence level [18]. Figure 4.11 shows that packets marked with the AF12 class are dropped more aggressively than packets marked with AF11.
4.9.6 RED with In/Out
RED with in/out (RIO) assumes that edge routers will mark packets conforming to the service level agreement (SLA) as in-profile and offending packets as out-of-profile [19]. When the network gets congested, queue management will drop packets marked out-of-profile first. A different set of RED parameters is used for in-profile and out-of-profile traffic. RIO differs from WRED in that it uses different EWMAs for estimating the average buffer occupancy level.
Figure 4.11 WRED drop probability (separate drop curves for the AF11 and AF12 classes, each with its own Min_th, Max_th, and Max_p; the AF12 curve rises earlier, so AF12 packets are dropped more aggressively; the x-axis is the average queue size).
RIO maintains the variable QueueAvg_in, which is the EWMA of only the in-profile packets. Another variable, QueueAvg_total, keeps the EWMA of both in- and out-of-profile packets. The basic reason for keeping separate EWMAs is to isolate in-profile traffic from bursts of out-of-profile traffic during congestion: a burst of out-of-profile traffic does not trigger dropping of in-profile traffic. As with the WRED scheme, RIO uses a different set of Min_th, Max_th, and Max_p values for in- and out-of-profile traffic, with a more aggressive drop probability for out-of-profile traffic. May et al. [20] developed an analytical model to study the performance of RIO in the Diffserv network. This study concluded that service differentiation is achievable in Diffserv using RIO.
4.9.7 Problems with RED
In the previous sections, we discussed the RED scheme and some of its variants. A large amount of research has been published on the performance of RED. Researchers have found several problems with the RED scheme when it is subjected to a mix of traffic. In particular, it may result in lower throughput, larger delay and jitter, and unfairness for some flows at the expense of other flows [21-23]. Lin and Morris [24] have shown that the RED scheme does not work particularly well when the queue is occupied by well-behaved (adaptive) TCP flows as well as greedy
(nonadaptive) UDP flows at the same time. Misbehaving flows do not back off even when their packets are dropped, and so continue to take more than their fair share of buffer space and bandwidth.
4.10 RESEARCH DIRECTIONS
Packet drop schemes have been a very active research area for the past decade. This section gives details of a scheme called Blue [26] and an overview of some of the prominent research related to this topic.
4.10.1 Blue
Blue is a totally new approach to active queue management. It uses packet loss and link-idle events to manage congestion. Blue uses a single probability, Pm, to mark (or drop) packets when they are queued. If congestion grows, the marking probability is increased (more packets are dropped). If the queue remains idle for a while, the marking probability is decreased. The following pseudocode, adapted from Feng et al. [26], describes the Blue algorithm:

    Upon packet loss event:
        if ((now - last_update) > freeze_time) then
            Pm = Pm + incr;
            last_update = now;

    Upon link idle event:
        if ((now - last_update) > freeze_time) then
            Pm = Pm - decr;
            last_update = now;
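A runnable rendering of this pseudocode (the parameter values below are illustrative assumptions, not those of Feng et al.):

    import time

    FREEZE_TIME, INCR, DECR = 0.1, 0.0025, 0.00025   # illustrative values
    pm, last_update = 0.0, 0.0                       # single marking probability

    def on_packet_loss():
        global pm, last_update
        now = time.monotonic()
        if now - last_update > FREEZE_TIME:
            pm = min(1.0, pm + INCR)     # congestion: mark/drop more aggressively
            last_update = now

    def on_link_idle():
        global pm, last_update
        now = time.monotonic()
        if now - last_update > FREEZE_TIME:
            pm = max(0.0, pm - DECR)     # idle link: back off
            last_update = now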
The freeze_time variable controls the update period of the probability Pm. It is recommended to randomize this time to avoid global synchronization [27]. The incr parameter determines the amount by which Pm is incremented when the queue manager starts dropping packets (a situation indicating that, at the current sending rate of the link, there is a lack of queue space). Loss of packets due to buffer overflow indicates that Blue has been marking too conservatively, so after freeze_time the marking probability is increased. The decr parameter determines the decrease in Pm when the link is idle: if the link remains idle for the duration of freeze_time, the queue manager takes a less aggressive approach by decreasing the probability by decr. Through simulation and experimentation, the authors have demonstrated that Blue outperforms RED. It needs a much smaller queue and has lower packet loss under a variety of traffic mixes and network conditions. Its lower memory requirement is particularly significant for legacy routers.
4.10.2 Related Work
Balanced RED (BRED) [28] attempts to solve the problem of fair bandwidth sharing between TCP and UDP traffic. It regulates the bandwidth of a flow by keeping per-flow state. Stochastic fair blue (SFB) is yet another scheme with similar objectives; it combines the Blue scheme with a Bloom filter [26]. Class-based threshold RED (CBT-RED) sets the queue threshold based on the traffic type and its associated priority level [29]. For example, UDP traffic is subjected to a different threshold from TCP, which provides protection for TCP flows. Stabilized RED (SRED) proposes an algorithm to make the RED queue stable [30]. SRED drops a packet based on the number of active flows and the instantaneous queue size, as opposed to the average queue size used by RED. The drop probability is also divided into three sections. Low throughput is considered to be the major problem with SRED. Double slope RED (DSRED) improves the throughput and delay of RED by dividing the queue region between Min_th and Max_th into two segments [31]; the authors also propose the use of a linear drop function with a different slope for each segment. The FRED scheme proposed by Lin and Morris maintains per-active-flow state to impose a loss rate proportionate to a flow's queue occupancy level [24]. Random exponential marking (REM) [32] is another scheme, one that combines queue and link rate information. The main objective of REM is to match the rate required by flows to the available link capacity and keep the gateway queue small. The major drawback of REM is that it needs configuration of multiple parameters.
4.11 SUMMARY
An overview of the TCP and IP protocols was provided in this chapter. We began with TCP and saw how it provides connection-oriented, reliable communication over the connectionless IP protocol. Concepts such as TCP's connection management, flow and congestion control, and retransmission timer were introduced. In the second half of this chapter, we described queue management issues. We looked at passive as well as active queue management techniques. A detailed discussion of the popular RED technique was provided. For building a class-based network, the WRED and RIO schemes were described. Finally, we looked at the problems associated with RED and current research related to active queue management. In particular, the Blue algorithm was discussed in detail.
4.12 REVIEW QUESTIONS
1. What is the difference between queue management and packet scheduling?
2. How is active queue management different from simple packet dropping?
3. What is the advantage of the drop-head over the drop-tail scheme?
4. What are the situations under which a queue manager would mark a packet rather than dropping it?
5. Explain the RED algorithm. What is the significance of early dropping?
6. Why does RED use an EWMA for queue size estimation rather than taking the instantaneous value?
7. Why does the packet drop probability need to grow only after the average queue size has reached the minimum threshold Min_th?
8. Why does the packet drop probability grow aggressively after the average queue size has reached the maximum threshold Max_th?
9. Explain the RIO drop scheme.
10. What is the major distinction between RED and RIO?
11. Explain the Blue algorithm.
12. What is the major distinction between RED and Blue?
References
[1] W. R. Stevens. TCP/IP Illustrated, Vol. 1: The Protocols. Addison-Wesley, Reading, Massachusetts, 1994.
[2] D. E. Comer. Internetworking with TCP/IP, Vol. 1: Principles, Protocols, and Architecture. Prentice Hall, Upper Saddle River, New Jersey, 2nd edition, 1991.
[3] R. Hinden and S. Deering. IP version 6 addressing architecture. Internet Request for Comment RFC2373, IETF, July 1998.
[4] K. Nichols, S. Blake, F. Baker, and D. Black. Definition of the differentiated services field (DS field) in the IPv4 and IPv6 headers. Internet Request for Comment RFC2474, IETF, December 1998.
[5] J. Postel. User datagram protocol. Internet Request for Comment RFC768, IETF, August 1980.
[6] J. Postel. Transmission control protocol. Internet Request for Comment RFC793, IETF, September 1981.
[7] V. Jacobson, R. Braden, and D. Borman. TCP extensions for high performance. Internet Request for Comment RFC1323, IETF, May 1992.
[8] V. Jacobson. Congestion avoidance and control. ACM Computer Communication Review, 18(4):314-329, August 1988.
[9] M. Allman, V. Paxson, and W. Stevens. TCP congestion control. Internet Request for Comment RFC2581, IETF, April 1999.
[10] L. S. Brakmo and L. L. Peterson. TCP Vegas: End-to-end congestion avoidance on a global Internet. IEEE Journal on Selected Areas in Communications, 13(8):1465-1480, October 1995.
[11] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L. Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, and L. Zhang. Recommendations on queue management and congestion avoidance in the Internet. Internet Request for Comment RFC2309, IETF, April 1998.
[12] K. K. Ramakrishnan and R. Jain. A binary feedback scheme for congestion avoidance in computer networks. ACM Transactions on Computer Systems, 8(2):158-181, May 1990.
[13] K. K. Ramakrishnan and R. Jain. A binary feedback scheme for congestion avoidance in computer networks with a connectionless network layer. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 303-313, Stanford, California, August 1988. ACM.
[14] K. Ramakrishnan and S. Floyd. A proposal to add explicit congestion notification (ECN) to IP. Internet Request for Comment RFC2481, IETF, January 1999.
[15] T. V. Lakshman, A. Neidhardt, and T. J. Ott. The drop from front strategy in TCP and in TCP over ATM. In IEEE INFOCOM96, pages 1242-1250, San Francisco, California, March 26-28, 1996.
[16] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking, 1(4):397-413, August 1993.
[17] S. Floyd. Ns simulator tests for random early detection (RED) queue management. http://www.aciri.org/floyd/red.html, October 1996.
[18] G. Armitage. Quality of Service in IP Networks. Macmillan Technical Publishing, Indianapolis, Indiana, April 2000.
[19] J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski. Assured forwarding PHB group. Internet Request for Comment RFC2597, IETF, June 1999.
[20] M. May, J. Bolot, A. Jean-Marie, and C. Diot. Simple performance models of differentiated services schemes for the Internet. In IEEE INFOCOM99, pages 1385-1394, New York, March 1999.
[21] M. May, T. Bonald, and J. Bolot. Analytic evaluation of RED performance. In IEEE INFOCOM2000, pages 1415-1424, Tel Aviv, Israel, March 26-30, 2000.
[22] B. Suter, T. V. Lakshman, D. Stiliadis, and A. K. Choudhury. Buffer management schemes for supporting TCP in gigabit routers with per-flow queuing. IEEE Journal on Selected Areas in Communications, 17(6):1159-1169, June 1999.
[23] G. Hasegawa, T. Matsuo, M. Murata, and H. Miyahara. Comparisons of packet scheduling algorithms for fair service among connections on the Internet. In IEEE INFOCOM2000, pages 1253-1262, Tel Aviv, Israel, March 26-30, 2000.
[24] D. Lin and R. Morris. Dynamics of random early detection. In Proceedings of ACM SIGCOMM97, pages 127-137, Cannes, France, September 14-18, 1997.
[25] C. Villamizar and C. Song. High performance TCP in ANSNET. ACM Computer Communication Review, 24(5):45-60, October 1994.
[26] W. Feng, D. Kandlur, D. Saha, and K. G. Shin. Stochastic fair blue: A queue management algorithm for enforcing fairness. In IEEE INFOCOM2001, volume 3, pages 1520-1529, Anchorage, Alaska, April 22-26, 2001.
[27] S. Floyd and V. Jacobson. On traffic phase effects in packet-switched gateways. Internetworking: Research and Experience, 3(3):115-156, September 1992.
[28] F. Anjum and L. Tassiulas. Balanced RED: An algorithm to achieve fairness in the Internet. In IEEE INFOCOM99, volume 3, pages 1412-1420, New York, March 1999.
[29] M. Parris, K. Jeffay, and F. D. Smith. Lightweight active router queue management for multimedia networking. In Proceedings of SPIE, pages 162-174, San Jose, California, January 25-27, 1999.
[30] T. J. Ott, T. V. Lakshman, and L. Wong. SRED: Stabilized RED. In IEEE INFOCOM99, pages 1346-1355, New York, March 1999.
[31] B. Zheng and M. Atiquzzaman. DSRED: Improving performance of active queue management over heterogeneous networks. In IEEE ICC2001, volume 8, pages 2975-2979, Helsinki, Finland, June 11-15, 2001.
[32] S. Athuraliya, S. H. Low, V. H. Li, and Q. Yin. REM: Active queue management. IEEE Network Magazine, pages 48-53, May/June 2001.
Chapter 5
Integrated Services Packet Network
High-speed networks have enabled new real-time multimedia applications, which need "deliver on time" assurances from the network. The best effort service model of the Internet, in which all packets and flows have equal status and the network is unable to provide packet delivery guarantees, is inappropriate for these applications. Research and experimentation on the Internet demonstrate that it is capable of supporting services such as transport of audio, video, and real-time traffic, together with classical data traffic, over a unified network infrastructure. It has been demonstrated through experiments that expanding the Internet service model would better serve the needs of these diverse applications. At the same time, there is a market push to provide different service grades so that users can be charged differently. Implementation of these new services requires a network architecture with different types of equipment, as well as different software for end hosts and routers, than that used for the best effort network. This chapter summarizes one of the early architectures for providing QoS over IP networks, called Intserv. The two service classes proposed by the IETF, called controlled load and guaranteed service, are discussed in detail. Various components of an Intserv capable router are also described. This chapter also discusses the need for LAN QoS to support end-to-end QoS in the Internet, followed by an overview of the IEEE solution for supporting QoS in LANs and the mapping of Intserv QoS to LAN QoS.
5.1 INTSERV AIM
In the early 1990s, the IETF working group on integrated services (Intserv) was formed with a view toward standardizing the types of services that could be provided to build such an integrated-service network. The Intserv working group focused on defining a minimal set of global requirements that could help the transition of the current best effort Internet into an integrated-service networking infrastructure. The Intserv framework [1] aimed at providing per-flow QoS guarantees to individual application sessions. It defined several new classes of service alongside the existing best effort service. The main idea behind this framework is that applications should be able to choose a particular class based on their QoS requirements. The integrated services network provides a mechanism for applications to choose between multiple levels of service delivery. It provides flow-oriented service using soft connection-oriented communication in conjunction with the existing best effort service. The complexity of the underlying heterogeneous network is hidden from application programmers. This chapter concentrates on the architectural aspects of Intserv and signaling issues. The associated signaling protocol, called the resource reservation protocol (RSVP), is discussed separately and in depth in Chapter 6.
5.2 APPLICATION CLASSIFICATION The Intserv framework has classified various applications into the following categories:
• Elastic applications;
• Tolerant real-time applications;
• Intolerant real-time applications.

5.2.1 Elastic Applications
Elastic applications are flexible in terms of their QoS requirements and can operate over a range of data rates, delay bounds, and loss rates. Applications such as Telnet, file transfer (using FTP), Web browsing, and net news fall into this category. The best effort service model is acceptable for these applications as long as some resource is available for them. Most of these applications are built on top of a reliable transport protocol such as the transmission control protocol (TCP). TCP slows
down or speeds up its data rate based on the sender's view of network capacity. TCP makes use of network bandwidth in a way that is cooperative with other competing traffic. TCP also performs error detection and retransmission to ensure that data is delivered to the destination reliably. Because of packet loss in the network, TCP receivers may receive segments out of order. These segments are put into a queue and passed to the higher layer upon receipt of the missing segments (i.e., strictly in sequence).
5.2.2 Tolerant Real-Time Applications
Real-time applications such as audio conferencing or video streaming are very sensitive to delay bounds. Timeliness is very important for these applications, but they can accept limited loss or delay in some cases. Such real-time applications run over the user datagram protocol (UDP) and do not perform any flow control or retransmissions, at least at the transport layer. (Applications can choose to perform rate adaptation, as we discussed in Chapter 1 with the RTP/RTCP protocol.) Applications can be real-time as well as tolerant to limited delay and loss if they are designed to perform receiver delay adaptation and FEC (as discussed in Chapter 1). A good example of a tolerant real-time application is audio/video streaming. These applications can tolerate moderate end-to-end delay but require high throughput and a very low error rate. Internet games interact with users but do not impose absolute timing constraints.
5.2.3 Intolerant Real-Time Applications
Intolerant real-time applications demand more stringent QoS from the network. These applications have precise bandwidth, delay, and jitter constraints, and they degrade severely if the timing constraints are not met. An example of such an application is interactive voice (Internet telephony), which needs very low end-to-end delay.
5.3 INTSERV SERVICE CLASSES Earlier we divided applications into three categories. Each of these categories has a set of requirements. Integrated services defines two additional services on top of the existing best effort service, namely, controlled load and guaranteed quality of service, to meet these requirements.
5.3.1 Controlled Load Service Class
In the words of Wroclawski, "Controlled load service is a quality of service closely approximating the QoS that the same flow would receive from an unloaded network element" [2]. This service is meant to be better than best effort service. The basic idea is that the service will provide the same bounds as an unloaded network would in the same situation. However, there is no consensus on what an unloaded network is, and service providers may find it hard to develop a service level agreement with customers based on such an abstract concept. Controlled load service is designed for tolerant real-time applications that require a sufficient amount of bandwidth and can tolerate occasional delays and losses. These applications perform quite well when the network is lightly loaded, but their performance degrades rapidly as the network load increases. Network elements implementing controlled load service need to set aside a sufficient amount of bandwidth to meet the needs of these applications. However, this service does not provide any fine-grain guarantees, since QoS parameters are not negotiated beforehand. As only a limited amount of bandwidth is reserved for this service, additional packets (above the controlled load allocation) can expect to receive only best effort delivery. Some form of admission control is required to limit the traffic using the controlled load service so that it matches the resources allocated to this class. Network elements (NEs) implementing controlled load service must have strategies to handle packet bursts. As no explicit rate negotiation takes place for this service, NEs may buffer small bursts rather than discard the packets. The receiver needs to have a buffer to smooth out the jitter.
5.3.2 Guaranteed Service Class
Guaranteed service, as per the specification of Shenker et al. [3], provides firm bounds on throughput and a deterministic upper bound on packet delays. Packets receiving guaranteed service will arrive at their destination within a defined delay bound. This requires computation of a delay target at each hop along the data path (the route to the destination) in order to calculate the cumulative maximum delay over all hops that the packet traverses. Guaranteed service is designed for intolerant real-time applications, comparable to the ATM CBR and rt-VBR service categories. Interactive multimedia applications (such as Internet telephony) are intolerant to delay and perform very limited receiver buffering. Applications have the option of controlling the delay by increasing the bandwidth they request.
In the guaranteed service specification, a source's traffic characteristics are provided by a token bucket with parameters (r, b), and the requested service is characterized by a transmission rate R. Since the traffic is specified using a token bucket and a rate guarantee of R is requested, it is also possible to bound the maximum queuing delay at network elements. With a token bucket, the traffic generated in an interval T is bounded by rT + b, as discussed in Chapter 2; since the traffic is serviced at rate R, which is no less than r, the queuing delay is bounded. The guaranteed service class is hard to implement in shared media environments that do not support resource reservation.
5.4 FLOW DEFINITION
As discussed earlier, the main objective of the Intserv framework is to provide QoS guarantees to individual application sessions, or flows. A flow is a chain of packets from a sending application to a receiving application traversing a network element, all of which are covered by the same QoS request. It has its own traffic and performance characteristics. For example, a video streaming application will send video frames to a destination (multicasting is also possible). Each frame belonging to this video stream is sent as a sequence of packets. All such packets will need a certain amount of bandwidth and bounds on end-to-end delay and jitter. In some cases, a flow could also represent aggregated traffic from multiple application sessions. From this point onward we will use the term "flow" in place of "application session" (in fact, a session may have more than one flow). Later in this chapter we will discuss the ways of identifying a flow in Internet routers.
5.5 SIGNALING/FLOW SETUP
In a class-based network such as Intserv, applications need to specify what service they want from the network. A signaling protocol is used for the purpose of indicating application requirements to the network. The signaling protocol carries QoS-related information from the end systems requesting QoS guarantees to the network elements. Figure 5.1 shows a pair of sender/receivers connected by a set of routers. Flow setup involves installing state on the set of routers between the source and destination, so a signaling protocol needs to interact with the routers in the path. Flow admission at network elements (routers) requires consideration of the required resources and the unreserved resources, from which the decision is made whether the request should be admitted or rejected. A signaling protocol may also gather QoS-related information from network elements that are along an application's data flow path
Figure 5.1 Flow setup (a sender and receiver connected by a set of routers; reservation state is installed along the path).
for delivery to end systems. It is worth noting that the QoS of a flow cannot be better than the lowest QoS provided by an NE along the end-to-end path. The following steps are typical of a signaling protocol:
1. The application requests resources;
2. The NE compares the required resources with its unreserved resources;
3. The NE admits or rejects the request.
Intserv uses a soft connection-oriented approach. A soft connection-oriented approach differs from a static connection-oriented approach (such as that of ATM networks) in that it maintains the connection state for a limited amount of time only. The state needs to be refreshed at regular intervals if the connection is active. Some examples of protocols that perform signaling functions are the resource reservation protocol (RSVP) [4], ST-II [5], and Q.2931. Other mechanisms using network management protocols are also possible. RSVP has already been standardized by the IETF, and a detailed discussion is provided in Chapter 6.
5.6 ROUTING PROTOCOL INDEPENDENCE
Use of QoS-based routing is not mandatory for the Intserv model. One reason for this is to make sure that existing next-hop forwarding does not need immediate modification. Secondly, QoS-based routing between domains is still an active research area and is not deployed in production networks. Underlying routing
entities may provide a fixed path by using mechanisms such as route pinning (see Chapter 6 for details).
5.7 RESERVATION SPECS A flow willing to use the Intserv network sends a reservation request consisting of the following two parts:
• Filterspec: specifies the set of data packets that is supposed to receive the QoS. Packets not passing the filter are forwarded using best effort service;
• Flowspec: specifies the desired QoS; it is used by admission control and by the scheduler during packet forwarding.

Flowspecs
Flows in the Intserv network need to communicate their QoS requirements in such a way that all NEs can understand their needs. The Intserv framework has standardized the service specification parameters using flowspecs. The following are the two categories of specification parameters.
Traffic Specification
A traffic specification (TSpec) is a description of the traffic pattern for which service is being requested. Once a request is accepted, the network elements along the path to the destination provide the agreed QoS as long as the flow's data traffic conforms to the TSpec. Table 5.1 shows the list
of parameters used for the TSpec. The first three parameters, r, b, and p, relate to the token bucket filter (discussed earlier, in Chapter 2). They specify the average data rate (r), the buffer size needed to absorb bursts (b), and the short-term peak rate (p). The token rate r may range from 1 byte/sec to 40 terabytes/sec (the theoretical limit on optical fiber). The bucket depth b can range from 1 byte to 250 gigabytes. The peak rate parameter p has the same range as r. The minimum policed unit m is the minimum datagram size used for resource allocation (this excludes the link layer header). The maximum packet size M sets the upper limit on packets that are eligible to receive the negotiated QoS. Senders may choose to set this parameter to the MTU of the path. Packets larger than M are considered nonconformant and may not receive any service guarantees.
Table 5.1 TSpec Parameters

Parameter              Symbol   Unit
Token rate             r        Bytes/second
Bucket depth           b        Bytes
Peak data rate         p        Bytes/second
Minimum policed unit   m        Bytes
Maximum packet size    M        Bytes
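A minimal sketch of checking packet arrivals against the token bucket portion of a TSpec (the (r, b) values are illustrative; p, m, and M are ignored here for brevity):

    class TokenBucket:
        def __init__(self, r, b):
            self.r, self.b = r, b            # token rate (bytes/sec), bucket depth (bytes)
            self.tokens, self.last = b, 0.0  # bucket starts full

        def conforms(self, size, now):
            # Tokens accumulate at rate r, capped at the bucket depth b.
            self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
            self.last = now
            if size <= self.tokens:
                self.tokens -= size
                return True                  # in-profile
            return False                     # nonconformant: drop, mark, or reshape

    tb = TokenBucket(r=10000, b=2000)
    print(tb.conforms(1500, now=0.0), tb.conforms(1500, now=0.05))   # True False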
Request Specification
A service request specification (RSpec) is a specification of the quality of service a flow wishes to request from a network element. The controlled load service is specified using a TSpec only; it does not specify the delay characteristics of the flows. The guaranteed service uses an RSpec to specify its delay characteristics in addition to the TSpec. The following parameters are used for the RSpec:
• Service rate R (bytes/sec): must be no less than the token rate r;
• Slack term S (microseconds): must be nonnegative.
The underlying principle behind these terms is that if the rate R is increased, the variable delay caused by queuing will decrease (as packets will be processed faster). These specifications are revisited in Chapter 6 in the context of the RSVP message format. For details of RSpec and TSpec, see Wroclawski [6] and Shenker et al. [7].
Worst-Case Queuing Delay for Guaranteed Service
The guaranteed service class is possible only if all NEs in the path are able to support this service. RFC2212 [3] states: "The definition of guaranteed service relies on the result that the fluid delay of a flow obeying a token bucket (r, b) and being served by a line with bandwidth R is bounded by b/R as long as R is no less than r. Guaranteed service with a service rate R, where now R is a share of bandwidth rather than the bandwidth of a dedicated line, approximates this behavior." The above result is valid for a fluid model of service, the service that would be provided by a dedicated wire of bandwidth R between sender and receiver (a fluid model assumes no packetization; i.e., bits flow continuously). However, the bound on
queuing delay should be adjusted by error terms that specify how the flow deviates from the fluid model. Two error terms, C and D, need to be calculated per hop as well as end-to-end for this purpose. The term C represents the delay contributed by the rate parameters as per the TSpec; this is also known as packet serialization delay. For example, an IP datagram may be fragmented by the link layer and then transmitted on an outgoing link, incurring some delay in the process. The second term, D, is rate independent and represents a queuing delay at NEs. The time division multiplexer (TDM) serves as a good example for understanding the delay term D. If we assume that the TDM has 8 terminals connected to its input link, then each terminal can send every 8 slots. In the worst case (assuming all terminals are active), a packet that arrived immediately after its designated time slot will have to wait a maximum of 8 time slots. The NE will therefore advertise an 8-time-slot delay as its own delay term D [8]. The partial sums of these terms between shaping points are Csum and Dsum. The cumulative values, Ctot and Dtot, represent the end-to-end delay contributed by all NEs in the data path. After incorporating these error terms, the worst-case queuing delay can be calculated using the following equations:
For p > R ≥ r:

Qdelay = [(b − M)/R] × [(p − R)/(p − r)] + (M + Ctot)/R + Dtot    (5.1)

For r ≤ p ≤ R:

Qdelay = (M + Ctot)/R + Dtot    (5.2)
Let's assume that Ddes is the desired end-to-end delay of the application and that it is larger than the worst-case delay of the fluid model. The slack term is then calculated using the following equation:

S = Ddes − (b/R + Ctot/R + Dtot)    (5.3)
The slack term above may be used by the NEs to reduce the amount of resource allocated to a flow (although this will increase the delay). Applications specify their traffic characteristics and receive the error terms and latency from the network using a signaling protocol. This enables them to verify whether their delay requirements can be met. The applications then set their playout point to this worst-case end-to-end delay, as discussed in Chapter 1.
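For concreteness, here is a sketch of (5.1) through (5.3) as code (the numbers in the example are invented to illustrate the units: bytes, bytes/sec, and seconds):

    def worst_case_delay(r, b, p, M, R, Ctot, Dtot):
        if p > R >= r:
            return ((b - M) / R) * ((p - R) / (p - r)) \
                   + (M + Ctot) / R + Dtot                  # (5.1)
        if R >= p >= r:
            return (M + Ctot) / R + Dtot                    # (5.2)
        raise ValueError("guaranteed service requires R >= r")

    def slack_term(D_des, b, R, Ctot, Dtot):
        return D_des - (b / R + Ctot / R + Dtot)            # (5.3)

    # 1,500-byte packets, 10-KB bucket, r = 100 KB/s, p = R = 125 KB/s
    print(worst_case_delay(r=100000, b=10000, p=125000, M=1500,
                           R=125000, Ctot=3000, Dtot=0.01))   # about 0.046 sec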
Figure 5.2 Intserv capable router (control plane: reservation setup agent, routing agent, management agent, admission control, routing database, and traffic control database; forwarding plane: input driver, packet classifier, queue, packet scheduler, and output driver).
5.8 IS-CAPABLE ROUTER COMPONENTS
Figure 5.2 shows the components of an IS capable router. A brief description of the various components, followed by the functioning of the IS capable router, is provided below. Many of these components have been discussed in detail in Chapter 2.
5.8.1 Admission Control
The main job of admission control is to determine whether access to the resources available at the network element (router) can be granted to a new flow. In order to perform this task, the admission control module needs to determine whether adequate unused resource is available. If resource is committed to a flow, it should be available for use by this flow for the lifetime of the flow. A new request may not be admitted if the resource is not available (or the flow may be asked to renegotiate with modified parameters). The admission control unit looks at the TSpec and RSpec of a flow to make a decision. Currently there is a move toward policy-based admission control, whereby policies can be used to determine who can access which resource at what time. There are a variety of ways of implementing admission control.
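One simple (hypothetical) admission test is rate-based: admit a flow only if its requested rate fits within the unreserved capacity of the outgoing link:

    class AdmissionControl:
        def __init__(self, link_capacity):
            self.unreserved = link_capacity      # bytes/sec still available

        def request(self, rate):
            if rate <= self.unreserved:
                self.unreserved -= rate          # committed for the flow's lifetime
                return True                      # admit
            return False                         # reject (or ask to renegotiate)

    ac = AdmissionControl(link_capacity=1000000)
    print(ac.request(600000), ac.request(600000))   # True False

Real implementations would also consult the delay parameters of the RSpec and any policy rules, but the structure is the same: compare the request against what remains unreserved.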
5.8.2 Policing and Shaping
Policing is the set of actions performed by a network element when a flow's actual data traffic characteristics exceed the negotiated values given in the flow's traffic specification (TSpec). Services that use policing functions must specify both the action to be taken when such violations occur and the locations in the network where violations are to be detected. Network elements can perform one of the following actions in response to a traffic contract violation:
• Drop packets using schemes such as RED or its variations (eRED, WRED, etc.);
• Downgrade packets to a lower service class (such as best effort);
• Mark packets as nonconforming; other network elements may then choose to drop these packets or reshape the traffic to conform to the TSpec.

Controlled load service must ensure the following steps when a nonconformant packet arrives at the edge of the network:

• Continue servicing conformant flows;
• Take measures to stop nonconformant traffic from unfairly impacting the conformant traffic;
• Send nonconformant traffic as best effort if sufficient resources are available.

Guaranteed service needs to implement policing at the edge of the network, as well as reshaping at intermediate NEs. Nonconforming traffic can either be dropped, or marked nonconformant and forwarded as best effort datagrams. Reshaping involves buffering packets so that they conform to the negotiated TSpec. Why do we need to reshape the traffic once policing has already been done at the edge? The answer lies in the fact that NEs at intermediate nodes in the path may introduce jitter.
5.8.3 Packet Classifier
The packet classifier on a host or network element is responsible for identifying the packets corresponding to a particular flow. This identification is essential in order to provide special treatment to those packets. For IPv4, packets are identified using a five-tuple (srcAddr, dstAddr, srcPort#, dstPort#, protocol ID). IPv6 proposes the use of a flow label; however, there is no consensus as to how this flow label can be picked so that it is unique globally.
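A small sketch of five-tuple classification (the addresses, ports, and class names are hypothetical):

    from typing import NamedTuple

    class FiveTuple(NamedTuple):
        src_addr: str
        dst_addr: str
        src_port: int
        dst_port: int
        protocol: int                # e.g., 6 = TCP, 17 = UDP

    reservations = {                 # installed by the reservation setup agent
        FiveTuple("10.0.0.1", "10.0.0.2", 5004, 5004, 17): "guaranteed",
    }

    def classify(pkt):
        return reservations.get(pkt, "best-effort")

    print(classify(FiveTuple("10.0.0.1", "10.0.0.2", 5004, 5004, 17)))   # guaranteed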
After the identification process is complete, each packet in the flow is associated with the particular QoS class designated for the flow.
5.8.4 Packet Scheduler
The packet scheduler is responsible for ensuring that the flows identified by the packet classifier receive the negotiated QoS guarantees. This involves servicing packets in the queues in such a way that they receive the service that has been requested. Chapter 3 provides details of packet schedulers and a variety of scheduling disciplines. For the guaranteed service class, WFQ may be used, in which each flow gets its own queue with a certain share of the link, providing a guaranteed end-to-end delay bound. Other methods are being tested for the controlled load service. In addition to queue management, the scheduler can also perform traffic shaping to make sure traffic remains in-profile. This means that bursty incoming traffic (the cause of the burstiness in this case being an upstream router, not the source sending these packets) will be shaped to avoid aggravating the condition of a flow.
5.8.5 Packet Processing
Routers in the Intserv network make use of a packet classifier to establish per-flow context for each accepted flow. This context is then used to drive schemes such as token bucket metering of the traffic, and to trigger a policing action (packet drop, nonconformant marking, etc.). Packets are assigned to one of the queues based on their QoS requirements. The reservation agent performs the signaling part (i.e., it accepts the QoS request and provides an appropriate response). The reservation agent interacts with the admission control module to check whether a new request can be admitted, and appropriately configures the classifier, the packet scheduler, and the traffic control database. As we have discussed earlier, the Intserv framework does not mandate QoS routing. Existing routing databases based on the current routing protocols in the Internet (IGP and BGP) are used for this purpose. The management agent performs network management related tasks. The Intserv framework does not mandate any management protocol; existing protocols such as the Simple Network Management Protocol (SNMP) and the Common Management Information Protocol (CMIP) can be used by IS routers.
5.8.6 Traffic Control Implementation
Although the implementation details of IS capable routers have been left to vendors, the Intserv working group has provided a possible implementation framework, as per Figure 5.3, using a hierarchical approach.
Figure 5.3 Hierarchical traffic control (guaranteed service flows GS flow 1 and GS flow 2 receive their own top-level queues; controlled load service classes are served at priorities 1 through n).
WFQ is used in this scheme to separate the flows; WFQ was discussed in detail in Chapter 3. Each flow is assigned to a separate queue, and WFQ schedules packets from the queues in such a way that each flow gets at least its preconfigured fraction of the link bandwidth under congested network conditions. Bandwidth unused by a flow gets assigned to other flows. As guaranteed service flows have stringent QoS requirements, they get their own queues at the top level. This achieves segregation of guaranteed service flows from other flows. All other traffic is sent to a pseudo WFQ flow. The pseudo flow queue separates controlled load traffic from best effort traffic by implementing a priority scheme. It is possible to have subclasses with varying delay bounds within the controlled load service using a priority scheme. Borrowing of bandwidth from a lower priority class is permissible to clear a traffic burst. This scheme achieves sharing of bandwidth as well as isolation of flows. It is recommended that the controlled load service be allocated bandwidth in such a way that it does not starve the best effort traffic.
Figure 5.4 Internet access via LAN (users on each LAN reach the Internet backbone through a LAN gateway).
5.9 LAN QoS AND INTSERV

The Intserv QoS model was discussed in the previous sections mainly in the context of backbone networks, where routers are usually connected via point-to-point links. In most working environments, however, users are actually connected to a LAN, which in turn provides access to the global Internet via a LAN gateway (see Figure 5.4). To provide end-to-end QoS in such environments, it is therefore not adequate to engineer QoS only in the backbone; appropriate QoS mechanisms must be deployed in LANs, too. However, LAN architectures are quite different from backbone architectures, which makes it difficult to readily deploy the Intserv model in the LAN environment. This section takes a look at the QoS problems in existing LANs and discusses the standards-based solutions for LAN QoS.

5.9.1 QoS Problem in LAN

To support any type of QoS, the first requirement is to be able to somehow differentiate the priority of one flow (or aggregation of flows) from another. In the most popular LANs, such as Ethernet and IEEE 802.11 wireless LANs, there is no mechanism at layer 2 to differentiate such priorities. Since LAN devices such as bridges and switches function at layer 2 (they do not have access to the IP header), they fail to support any type of QoS. Some LANs, for example Token Ring, do have such priority mechanisms in place. Nevertheless, it is the large installed base of Ethernet, and now the wireless LANs, that is the concern for LAN QoS. In this chapter, we therefore mainly look at the QoS issues in Ethernet and wireless LANs.

5.9.2 IEEE Solution for LAN QoS

The LAN QoS solution standardized by IEEE has two main arms: the addition of a priority field in layer 2 frames, and the replacement of the standard FIFO queues in the
Figure 5.5 Insertion of the 4-byte VLAN tag in Ethernet frames. (The 802.1Q tag, carrying protocol ID, 3-bit priority, CFI, and VLAN ID fields, is inserted between the source address and type fields of the standard Ethernet frame, which otherwise consists of preamble, SFD, destination address, source address, type, data, and CRC.)
bridges and switches with multiple priority queues. Although the latter is not subject to explicit standardization, it is required to reap the benefit of having priority fields in layer 2 frames in the first place. Without multiple queues, it becomes extremely difficult, if not impossible, to provide different levels of treatment to frames with different priorities. The structures of the priority field and the priority queues are discussed in more detail in the following sections.

5.9.2.1 Priority Field in LAN Frame

The IEEE 802.3 (Ethernet) and 802.11 (wireless LAN) frames do not have any priority field. The lack of a priority field makes it impossible to convey user priority to the bridge or switch where frames from many sources are queued up for transmission. To address this issue, IEEE 802.1Q can be used, which defines a special 32-bit tag to implement virtual LANs (VLANs) [9]. In a VLAN environment, this tag is mainly used to identify a given VLAN. The VLAN tag includes a 3-bit priority field, which comes in handy for implementing LAN QoS. The use of the VLAN tag is shown in Figure 5.5. The 4-byte tag is inserted between the source address and type fields of standard Ethernet frames. Once 802.1Q is in use by Ethernet and wireless LAN hosts, the user priority, along with the other tag fields, stays with the layer 2 frame all the way from the source through any intermediate bridge or switch to the destination. The bridge or switch can then look up this priority field to take appropriate queuing and scheduling actions. With three priority bits, a total of eight priorities are possible. A separate document, IEEE 802.1D, defines seven priorities, as shown in Table 5.2 [10]; one of the eight priorities is left unused.
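To illustrate the tag layout of Figure 5.5, the sketch below packs and unpacks the 4-byte 802.1Q tag: a 2-byte tag protocol identifier (0x8100) followed by 2 bytes holding the 3-bit priority, the CFI bit, and the 12-bit VLAN ID. It is a simplified sketch of the tag alone, not a complete frame builder.

import struct

TPID = 0x8100  # tag protocol identifier marking an 802.1Q tag

def make_vlan_tag(priority, vlan_id, cfi=0):
    # Tag control information: 3 priority bits, 1 CFI bit, 12 VLAN ID bits.
    tci = ((priority & 0x7) << 13) | ((cfi & 0x1) << 12) | (vlan_id & 0xFFF)
    return struct.pack("!HH", TPID, tci)

def read_priority(tag):
    tpid, tci = struct.unpack("!HH", tag)
    assert tpid == TPID, "not an 802.1Q tag"
    return tci >> 13            # the 3-bit user priority

tag = make_vlan_tag(priority=5, vlan_id=42)  # priority 5: video (Table 5.2)
print(read_priority(tag))                    # prints 5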
Table 5.2
LAN QoS Priorities (in Descending Order) Defined by IEEE 802.1D

Priority   Description
7          Network control (e.g., ICMP error messages)
6          Voice (less than 10-ms latency)
5          Video (less than 100-ms latency)
4          Controlled load (e.g., business-critical data)
3          Excellent effort (most important best-effort traffic)
2          Best effort (average best-effort traffic, such as file transfers)
1          Currently unused (left as a spare)
0          Background (bulk transfers, such as routine backup)
5.9.2.2 Priority Queuing in Bridges and Switches
The standard LAN bridges and switches use a single queue at each output port to hold frames from many sources. Frames from this queue are transmitted over the LAN using simple FIFO scheduling. Queuing delay in the output ports of LAN switches therefore becomes the QoS bottleneck: a frame carrying real-time traffic may have to wait behind many file-transfer frames. To address the queuing delay problem, multiple queues must be implemented at each output port. Figure 5.6 illustrates the operation of a bridge or switch output port augmented with multiple priority queues. Two additional modules are added to the standard bridge or switch: a priority mapper and a priority scheduler. The priority mapper inspects the priority field of an incoming frame and appends the frame to the end of the appropriate priority queue. If the number of queues matches the number of user priorities in use, the mapping is straightforward; otherwise, a mapping table may have to be consulted. The priority scheduler ensures that a frame from a given queue is transmitted only when there is no frame waiting in a higher priority queue. Several observations are in order:
• Frame loss due to queue overflow: Frames of a given priority may be lost if the corresponding queue is full. Many commercial bridges and switches do not allow sharing of queuing space across the priority queues. A possible consequence of such segregation is that while the queue of one priority remains empty, frames of another priority experience a high loss rate due to lack of queuing space.
Figure 5.6 LAN bridges/switches with multiple priority queues. (A priority mapper at the input port feeds priority queues 1 through N, which a priority scheduler serves at the output port.)
Table 5.3
Groupings of LAN QoS Priorities for Switches with Limited Number of Queues per Port

Number of Queues   Priority Groups
2                  {0,2,3}, {4,5,6,7}
4                  {0}, {2,3}, {4,5}, {6,7}
• Starvation of low priority frames: Due to strict priority scheduling, frames of lower priority may never get through in the presence of heavy high priority traffic. The solution to this problem is a more flexible scheduling discipline; several alternative scheduling techniques were covered in Chapter 3.

Priority queuing implies more than one queue, but exactly how many queues should there be per port (more queues mean more cost)? Many commercial switches have two or four queues per port. A grouping policy is thus required to assign multiple user priorities to a given physical queue. Table 5.3 shows the grouping structure for two and four queues as recommended by IEEE 802.1D.

5.9.2.3 Setting the Priority

The priority field may be set by user hosts, servers, bridges, switches, routers, or any other LAN device. When VLANs (IEEE 802.1Q) are in use, user hosts mark the priority field, which usually stays with the frame all the way to the
destination. The intermediate bridges and switches, however, may be given the authority to overwrite such priorities by remarking the priority field. By assigning such remarking capabilities to the central switches, a network administrator can achieve tighter control over the organization's QoS needs. For example, a high performance server connected to a given port may be given the highest priority while all other ports get a lower priority. With this policy, if a frame from another port arrives with the highest priority set by the user, the switch can explicitly remark the priority field to assign a lower priority to this frame. Assigning priorities to ports eliminates the need for hosts to have VLAN capability (there is no need for the priority field); in this case, having multiple queues in the switches suffices for implementing LAN QoS. In layer 3 switches or routers, priority can be assigned based on applications identified by TCP/UDP port numbers. For example, a frame carrying delay-sensitive TELNET traffic (identified by TCP port 23) can be placed in a higher priority queue while frames carrying FTP traffic (TCP port 21) are placed in a lower priority queue. Such TCP/UDP port-based prioritization, however, has significant processing overhead and does not easily lend itself to secured environments where IP payloads are encrypted (the TCP/UDP port numbers travel inside the encrypted IP payload).

5.9.2.4 Upgrading to LAN QoS

If the host applications are to mark the priorities, all existing hosts will need new IEEE 802.1Q compliant network interface cards (NICs). In addition, all bridges, switches, and routers must be upgraded with multiple queues per port. Therefore, QoS cannot be supported over existing Ethernet without a hardware upgrade. The good news is that prices of switches with multiple queues per port are falling rapidly. Installing LAN QoS in a new organization (with no existing LAN infrastructure) is therefore not an issue, and many organizations are upgrading their switches with priority queuing as hardware prices continue to fall. It is therefore expected that in the near future, most LANs will be capable of LAN QoS.

5.9.3 Mapping of Intserv QoS to LAN QoS

The support of Intserv over LANs is defined in three Internet documents [11], [12], and [13]. To support Intserv over LANs, the various Intserv services must be mapped to appropriate LAN QoS priorities. Given the description of LAN QoS priorities in Table 5.2, a possible mapping is shown in Table 5.4.
Table 5.4
Mapping of Intserv Services onto LAN QoS Priorities

Intserv Service                    LAN QoS Priority
Guaranteed (10-ms delay bound)     6
Guaranteed (100-ms delay bound)    5
Controlled load                    4
Best effort                        2
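Pulling Sections 5.9.2.1 and 5.9.2.2 together, the following hypothetical sketch shows an output port that maps the 802.1D user priorities onto the four-queue grouping of Table 5.3 and serves the queues in strict priority order. The names and the fallback for unlisted priorities are our choices for illustration.

from collections import deque

# Four-queue grouping from Table 5.3; queue 0 is lowest, queue 3 highest.
GROUPS = [(0,), (2, 3), (4, 5), (6, 7)]
PRIORITY_TO_QUEUE = {p: q for q, group in enumerate(GROUPS) for p in group}

class OutputPort:
    def __init__(self):
        self.queues = [deque() for _ in GROUPS]

    def enqueue(self, frame, priority):
        # Priority mapper: unlisted priorities fall to the lowest queue.
        self.queues[PRIORITY_TO_QUEUE.get(priority, 0)].append(frame)

    def dequeue(self):
        # Strict priority scheduler: serve the highest nonempty queue.
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None             # the port is idle

port = OutputPort()
port.enqueue("file-transfer frame", priority=2)   # best effort
port.enqueue("voice frame", priority=6)           # voice
print(port.dequeue())          # the voice frame is transmitted first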
5.10 INTSERV PROBLEMS

The single biggest reason why Intserv has not been accepted in the Internet is scalability: the aim of providing per-flow QoS does not scale beyond the intranet environment. Typically, more than 250,000 flows pass through an Internet core router, and maintaining state for such a large number of flows requires enormous resources. Neogi et al. performed an empirical study of the impact of a large number of flows on a router capable of supporting Intserv; details of the experimental methodology and results may be found in their analysis [14]. They observed that real-time scheduling overhead increases with the number of real-time flows. Figure 5.7 shows that when the number of Intserv flows reaches around 400, the scheduling overhead increases sharply. The gap between the two lines in the figure is the latency incurred due to the use of the WFQ scheduler for real-time flows on the commercial router used in their experiments; the router is unable to cope with the load and, as a result, packets need to wait in the queue. This experiment also showed that best-effort traffic suffers a 1% packet loss when the number of real-time flows exceeds 450. Such benchmarks may be useful in configuring bandwidth for real-time and best-effort traffic.

Signaling protocols required for flow setup are not very feature-rich and have limitations such as the lack of negotiation and backtracking, the requirement of frequent updates, and path pinning to maintain the soft state. We discuss this in the context of RSVP in Chapter 6.

Throughput and delay guarantees require support from the link layer. In widely deployed shared Ethernet, it is impossible to provide any bound on throughput and delay.
Figure 5.7 Best-effort packet latency. (Latency in milliseconds versus number of flows, for real-time (RT) and non-real-time (NRT) traffic.)
5.11 RESEARCH DIRECTIONS

Wang et al. [15] provide an experimental evaluation of the end-to-end QoS performance of QoS-demanding applications using different transport protocols. They use a new extension of QoS specification and management to the Berkeley sockets called QoSockets. Their results show that the performance of applications using Intserv resource reservations is significantly improved, but not always guaranteed.

Barzilai et al. [16] present the design, implementation, and performance evaluation of a protocol architecture for supporting Intserv. They introduce a new kernel module called the QoS Manager on UNIX-like Internet servers for managing resources such as bandwidth, buffers, and priorities on the network interface.

An efficiency study of voice over the guaranteed service class was performed by Buchli et al. [17]. They consider two scenarios: IP-phone-to-IP-phone and gateway-to-gateway. Their results show that with aggregated voice flows, bandwidth may be utilized efficiently. This work is interesting in that it deals with QoS provision for a voice application using the Intserv architecture.

Per-flow reservation, which requires that each router be signaled upon the arrival or departure of every new flow for which it will forward data, is not scalable in the Internet. Researchers have proposed solutions whereby reservation
aggregation is performed at the edge of the network to meet the quality-of-service demands of real-time flows in a scalable way. The scalable multipath aggregated routing (SMART) architecture aggregates flows along multipaths to reduce the per-flow reservation state in the routers to a small, scalable aggregated state [18]. The size of this state depends only on the number of destinations and flow classes. However, aggregation of flow reservations has its own problems. Aggregation provides a good approximation to Intserv if flow demand varies on slow time scales; if the bandwidth required by flows varies rapidly, aggregation may result in underutilization of resources. Fu [19] developed an analytical model and performed extensive trace-driven simulations to explore the efficacy of aggregation under varying conditions.

An architecture and admission control algorithm termed egress admission control has been proposed by Schlembach et al. [20] with a view to providing scalability and a strong service model. The available service on a network path is passively monitored, and admission control is performed only at egress nodes. The effects of cross traffic are captured with implicit measurements rather than with explicit signaling.

Priggouris et al. [21] have performed a simulation study implementing the Intserv architecture in the general packet radio service (GPRS) network. Their study quantified the effect of signaling overhead on GPRS operation and performance. The results show that the scheme has good scalability, even with a large population base.

Packet classification requires routers to look up a reservation table to match each packet based on flow identification criteria such as the five-tuple described above. Several packet filtering schemes have been devised by researchers for real-time classification of packets [22] [23] [24]. For edge routers with a small number of flows, algorithms such as hash table lookup may suffice. In core routers, the lookup process can become a bottleneck, especially at link speeds of gigabits per second and above. Several research efforts have been made in this direction, with proposals for fast lookup algorithms [25, 26]. Wang [27] provides details of several hashing-based schemes and their performance evaluation.

Making users pay for the resources they reserve is important in an Intserv network; if resources are free for all, it does not take long for users to saturate the network. Karsten et al. [28] have proposed a charging scheme to protect an integrated services network from arbitrary resource reservations. A funding mechanism is used to make additional capacity allocations at strategic locations. This work also proposes a technique called virtual resource mapping to apply well-known economic principles to an optimal pricing framework and other tasks related
to charging. The major focus of this work is on rate-based service guarantees for Intserv in conjunction with IP multicast and RSVP as the signaling protocol.
5.12 SUMMARY

This chapter provided an overview of the IETF Intserv architecture for providing QoS. We discussed application classification and the traffic and service specification procedures in Intserv. A simplified view of an Intserv capable router and associated functions, such as admission control, policing, shaping, and traffic classification, was presented. Scalability is a major issue with any Internet QoS scheme; we discussed scalability and other deficiencies of the Intserv architecture, and concluded the chapter by discussing research directions related to Intserv.

In the most popular LANs, such as Ethernet and IEEE 802.11 wireless LANs, there is no QoS mechanism at layer 2 devices (bridges and switches) to differentiate one flow (or aggregation of flows) from another. The IEEE solution, known as LAN QoS, extends standard Ethernet frames with priority tags and recommends multiple priority queues in LAN bridges and switches. With the growing base of such new bridges and LAN switches, QoS will soon become available over most LANs. To support Intserv over LANs, RSVP is extended to allow resource reservation on LAN segments. With this extended RSVP and LAN QoS in place, it is possible to establish some level of end-to-end QoS over the Internet.
5.13 REVIEW QUESTIONS

1. What are the main goals of the Intserv architecture?
2. Give three examples each of tolerant and intolerant real-time applications.
3. What are the major distinctions between the controlled load and guaranteed service classes?
4. What is the role of signaling in the Intserv architecture? Describe the flow setup procedure.
5. What is the use of Filterspec in Intserv?
6. What is the use of TSpec in Intserv?
7. What is the use of RSpec in Intserv?
8. What steps are required to perform policing in a controlled load network?
9. What steps are required to perform policing in a guaranteed service network?
10. Why are IS capable routers more complex than best-effort routers?
11. How does packet classification work with IPv6?
12. List three problems associated with Intserv.
13. What do we mean by flow aggregation? Can the same level of QoS be achieved with flow aggregation?
14. What is the main QoS challenge in Ethernet and wireless LANs?
15. If hosts do not take part in IEEE 802.1Q, they cannot convey any priority information to the switch. Under this circumstance, how can a switch prioritize arriving frames?
References

[1] R. Braden, D. Clark, and S. Shenker. Integrated services in the Internet architecture: an overview. Request for Comments (Informational) RFC 1633, Internet Engineering Task Force, June 1994.
[2] J. Wroclawski. Specification of the controlled-load network element service. Request for Comments (Standards Track) RFC 2211, Internet Engineering Task Force, September 1997.
[3] S. Shenker, C. Partridge, and R. Guerin. Specification of guaranteed quality of service. Request for Comments (Standards Track) RFC 2212, Internet Engineering Task Force, September 1997.
[4] R. Braden, L. Zhang, S. Berson, S. Herzog, and S. Jamin. Resource reservation protocol (RSVP) – version 1 functional specification. Request for Comments (Standards Track) RFC 2205, Internet Engineering Task Force, September 1997.
[5] C. Partridge and S. Pink. An implementation of the revised Internet Stream Protocol (ST-2). Internetworking: Research and Experience, 3(1), March 1992.
[6] J. Wroclawski. The use of RSVP with IETF integrated services. Request for Comments (Standards Track) RFC 2210, Internet Engineering Task Force, September 1997.
[7] S. Shenker and J. Wroclawski. General characterization parameters for integrated service network elements. Request for Comments (Standards Track) RFC 2215, Internet Engineering Task Force, September 1997.
[8] D. Durham and R. Yavatkar. Inside the Internet's Resource Reservation Protocol. John Wiley and Sons, New York, 1999.
[9] IEEE 802.1Q: Virtual Bridged Local Area Networks, 1998.
[10] IEEE 802.1D: Media Access Control (MAC) Bridges, 1998.
[11] R. Yavatkar, F. Baker, D. Hoffman, Y. Bernet, and M. Speer. SBM (subnet bandwidth manager): a protocol for admission control over IEEE 802-style networks. Request for Comments (Standards Track) RFC 2814, Internet Engineering Task Force, May 2000.
[12] M. Seaman, A. Smith, E. Crawley, and J. Wroclawski. Integrated service mappings on IEEE 802 networks. Request for Comments RFC 2815, Internet Engineering Task Force, May 2000.
[13] A. Ghanwani, W. Pace, V. Srinivasan, A. Smith, and M. Seaman. A framework for integrated services over shared IEEE LAN technologies. Request for Comments RFC 2816, Internet Engineering Task Force, May 2000.
[14] A. Neogi, T. Chiueh, and P. Stirpe. Performance analysis of an RSVP-capable router. IEEE Network, 13(5):56–69, September 1999.
[15] P. Wang, Y. Yemini, D. Florissi, J. Zinky, and P. Florissi. Experimental QoS performances of multimedia applications. In Proceedings of the Conference on Computer Communications (IEEE Infocom), volume 2, pages 970–979, Tel Aviv, Israel, March 2000.
[16] T. Barzilai, D. Kandlur, A. Mehra, D. Saha, and S. Wise. Design and implementation of an RSVP-based quality of service architecture for an integrated services Internet. IEEE Journal on Selected Areas in Communications, 16(3):397–411, April 1998.
[17] M. Buchli, D. De Vleeschauwer, J. Janssen, A. Van Moffaert, and G. Petit. On the efficiency of voice over integrated services using guaranteed service. In Internet Telephony Workshop 2001, New York, April 2001.
[18] S. Vutukury and J. J. Garcia-Luna-Aceves. SMART: a scalable multipath architecture for intra-domain QoS provisioning. Lecture Notes in Computer Science, 1989:67–79, January 2001.
[19] H. Fu and E. W. Knightly. Aggregation and scalable QoS: a performance study. Lecture Notes in Computer Science, 2092:307–324, June 2001. International Workshop on Quality of Service (IWQoS).
[20] J. Schlembach, A. Skoe, P. Yuan, and E. Knightly. Design and implementation of scalable admission control. Lecture Notes in Computer Science, 1989:1–15, January 2001.
[21] G. Priggouris, S. Hadjiefthymiades, and L. Merakos. Enhancing the general packet radio service with IP QoS support. Lecture Notes in Computer Science, 1989:365–379, January 2001.
[22] I. Wakeman, A. Ghosh, J. Crowcroft, V. Jacobson, and S. Floyd. Implementing real-time packet forwarding policies using streams. In USENIX 1995 Technical Conference, New Orleans, Louisiana, January 1995.
[23] S. McCanne and V. Jacobson. The BSD packet filter: a new architecture for user-level packet capture. In Proceedings of the USENIX Winter Conference, pages 259–269, San Diego, California, January 1993.
[24] D. Engler and M. F. Kaashoek. DPF: fast, flexible message demultiplexing using dynamic code generation. ACM Computer Communication Review, 26(4):53–59, October 1996.
[25] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink. Small forwarding tables for fast routing lookups. ACM Computer Communication Review, 27(4):3–15, October 1997. ACM SIGCOMM'97, September 1997.
[26] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner. Scalable high-speed IP routing lookups. ACM Computer Communication Review, 27(4):25–36, October 1997. ACM SIGCOMM'97, September 1997.
[27] Z. Wang. Internet QoS: Architectures and Mechanisms for Quality of Service. Morgan Kaufmann Publishers, San Francisco, California, 1st edition, 2001.
[28] M. Karsten, J. Schmitt, L. Wolf, and R. Steinmetz. Provider-oriented linear price calculation for integrated services. In Proceedings of the Seventh IEEE/IFIP International Workshop on Quality of Service (IWQoS'99), London, UK, pages 174–183. IEEE/IFIP, June 1999. ISBN 0-7803-5671-3.
Chapter 6 Resource Reservation Protocol
The integrated services (Intserv) framework described in Chapter 5 requires that applications communicate their resource requirements to network elements along the path. A signaling protocol is required for this purpose, and the IETF has proposed a standard Resource Reservation Protocol (RSVP) to fill this role [1]. RSVP carries resource reservation requests (traffic specifications, QoS specifications, network resource availability, etc.) through the network. This chapter looks at the design of the RSVP protocol and explains its usage through examples.
6.1 RSVP FEATURES

The designers of RSVP have listed several architectural features [2]. This section provides an overview of the major ones.

6.1.1 Simplex Protocol

RSVP supports a variety of communication methods: point-to-point, point-to-multipoint, and multipoint-to-multipoint. It supports simplex streams between sources and receivers: a reservation is made for data flowing from sources (upstream) to receivers (downstream). For full duplex communication, separate reservations must be made in each direction.
6.1.2 Receiver-Oriented Approach

RSVP was originally designed with multicasting applications in mind. Current Internet multimedia applications, such as RealAudio and RealVideo, vic (a video-conferencing tool), and vat/rat (audio-conferencing tools), have more receivers than senders [3, 4, 5]. For example, a NASA shuttle launch is viewed worldwide over the Mbone (multicasting backbone). Typically, unicast communication is handled as a degenerate case of multicast. RSVP has been designed to accommodate heterogeneous receiver systems and subnets; the receiver-oriented design caters to diverse receiver requirements. For example, in a multicast session with multiple senders, one receiver may be interested in a particular sender whereas another receiver may be interested in all senders. A receiver may modify its requested QoS at any time. This can also happen in response to a sender's modification of its traffic characteristics (TSpec): a new sender can start sending to a multicast group and may need a larger reservation, and a new receiver joining a multicast group may request a different QoS.

6.1.3 Routing-Protocol Independent
One major objective of the RSVP design is to be consistent with the robustness of the present connectionless model of the Internet [2]. Hence, RSVP does not mandate the use of any new routing protocol; it makes use of the existing routing tables set up by the current unicast or multicast routing protocols used in the Internet. Routing architecture mechanisms such as route pinning provide a route that is relatively stable.

6.1.4 Reservation Setup

Figure 6.1 illustrates the basics of the RSVP reservation mechanism. Network elements (NEs) are the intermediate communication devices, such as routers and switches, that are RSVP capable. There are two basic reservation setup models supported by RSVP:
• One Pass: A sender sends its TSpec to the destination. In this model, there is no support for indicating path characteristics to the sender.

• One Pass with AdSpec (OPWA): In this model, a sender sends its TSpec as well as an AdSpec to the NEs along the path toward the destination in a PATH message. Network elements examine the TSpec and forward it further, along with the AdSpec, which advertises a network element's capabilities (i.e., whether it can support a particular type of service) and its available resources. Based on the TSpec and AdSpec received, along with its own requirements, the receiver generates a QoS reservation request (RESV) message. TSpec and AdSpec are discussed later, in Section 6.5.

Figure 6.1 RSVP signaling setup. (The sender's traffic specification travels downstream through the network elements (routers), which record their available resources in the AdSpec; the receiver responds with a traffic specification and QoS specification.)

6.1.5 Soft State Refresh

Another important feature of RSVP is its use of soft state. Soft state differs from the static state used in ATM virtual circuits in that it is purged after a short interval. Soft state makes it possible to adapt to routing changes, link failures, and multicast group membership changes without requiring explicit signaling messages for changing and removing the state. The reservation state is purged automatically after a fixed interval; retaining it requires that RSVP PATH and RESV messages refresh the state installed in network elements at a regular interval (typically 30 seconds).
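A rough sketch of the soft-state idea, using the timeout of three missed refreshes (90 seconds) described in Section 6.4.3; the table structure and names are illustrative, not taken from any RSVP implementation.

REFRESH_PERIOD = 30               # seconds between PATH/RESV refreshes
LIFETIME = 3 * REFRESH_PERIOD     # purge after three missed refreshes

class SoftStateTable:
    def __init__(self):
        self.expiry = {}          # session -> absolute expiry time

    def refresh(self, session, now):
        # A PATH or RESV message (re)installs state and extends its life.
        self.expiry[session] = now + LIFETIME

    def sweep(self, now):
        # Run periodically: purge sessions whose state has timed out.
        for session, when in list(self.expiry.items()):
            if now >= when:
                del self.expiry[session]

table = SoftStateTable()
table.refresh("flow-1", now=0)
table.sweep(now=91)               # no refresh arrived in time
print("flow-1" in table.expiry)   # False: the state was purged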
6.2 RESERVATION MERGER

The merging of reservations is one of the key features of RSVP. It is particularly useful in multicast sessions, where the routing path (tree) between multiple source and destination pairs may have shared links. If a reservation state for a session is already in place at a network element, a new resource request for the same session will only recalculate and possibly update the state rather than create a new state. The main advantage of this reservation merging is that it utilizes resources efficiently and reduces the amount of control messages on the network. Figure 6.2 shows two receiving hosts, H1 and H2, connected to a router R. Let's assume that H1 needs
10 Kbps of video and H2 needs 20 Kbps of video from the same source of a multicast session, and that the reservation for H1 has already been made on interface if1 of the router. When the new request of 20 Kbps comes from H2 for the same session, the router R merges the two requests and forwards a reservation request of 20 Kbps to the upstream node. Now let's change the scenario by starting the reservation setup process with H2's 20 Kbps first. In this case, the new reservation request of 10 Kbps from H1 will not result in a change of state, as the least upper bound (LUB) of the merged reservations is 20 Kbps, and this reservation is already in place. At this point, if H2 wants to remove its reservation of 20 Kbps (because it is no longer interested in this session), the router will need to change its state by sending a reservation request of only 10 Kbps to the upstream router. However, if the data rate sent by the source of this multicast session exceeds 20 Kbps, the unreserved part of the flow will be delivered only on a best effort basis.
Figure 6.2 RSVP reservation merger. (H1 requests 10 Kbps and H2 requests 20 Kbps; router R forwards a merged 20-Kbps request upstream via interface if1.)
Reservation merging raises an interesting issue. In Figure 6.2, suppose the receiver H1 is interested in, or capable of receiving, only 10 Kbps. How should the router decide which packets to forward and which packets to discard on the link going toward H1? Randomly discarding packets may result in a multimedia stream that is useless to the receiver. Special media filters are needed to sensibly reduce the media stream by taking advantage of layered media encoding. For example, a video stream may be encoded in a base and an enhancement layer: receivers interested in lower quality (data rate) may receive only the base layer, whereas receivers interested in high quality may receive both layers. Amir et al. [6] have
demonstrated a video gateway capable of performing a media transcoding function to suit receiver bandwidth requirements. How requests from receivers are merged is determined by the reservation style; we provide a detailed description of reservation styles in Section 6.3.
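The merging rules of the next section can be summarized in a few lines of code. For the shared styles (wildcard and shared explicit), a router forwards the least upper bound of the downstream requests rather than their sum; the fixed-filter style keeps a distinct reservation per sender. A minimal sketch, using plain bandwidth values as in the examples that follow:

def merge_shared(requests):
    # Wildcard/shared explicit: forward the LUB (here simply the maximum).
    return max(requests)

def merge_fixed(requests_per_sender):
    # Fixed filter: merge each sender's requests by LUB; the link must
    # then carry the sum of the distinct per-sender reservations.
    merged = {s: max(r) for s, r in requests_per_sender.items()}
    return merged, sum(merged.values())

print(merge_shared([10, 20]))     # 20 Kbps, as in Figure 6.2
print(merge_fixed({"S1": [3, 2, 1], "S2": [4]}))
# ({'S1': 3, 'S2': 4}, 7): 3 Kbps for S1, 4 Kbps for S2, 7 Kbps in all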
6.3 RESERVATION STYLES

The reservation style indicates to the network element that an aggregation of reservation requests is possible for a multicast group. The resource reservation controls how much bandwidth is reserved, whereas the reservation filter determines the packets that can make use of this reservation. RSVP supports three styles of reservation, described in the following subsections.

6.3.1 Wildcard Filter

The wildcard-filter (and shared explicit) style of reservation is suitable for multicast sessions where sources are not likely to send information at the same time. Typically, audio applications are suitable for this style, since only a limited number of participants can converse with each other simultaneously. A reservation slightly exceeding the requirements of a single speaker (to allow for overspeaking and interjections) will be sufficient for this style. Multiple senders are required to coordinate the use of the shared bandwidth; the RSVP protocol does not take care of conference control and floor control issues. Handley et al. [7] describe in detail a conference control protocol developed by IETF for this purpose.

We look at the wildcard-filter style using the example in Figure 6.3. These examples use rates in Kbps for simplicity (in reality, token bucket parameters are used). The example uses a multicast session with three senders, S1, S2, and S3, and three receivers, H1, H2, and H3. The senders S1 and S2, as well as the receivers H1 and H2, are shown on a LAN segment capable of implementing traffic priority schemes. The requirements of the receivers are as follows:
• H1 wants to reserve 3 Kbps.
• H2 wants to reserve 2 Kbps.
• H3 wants to reserve 4 Kbps.
Figure 6.3 Example of a wildcard filter. (Senders S1, S2, and S3 and receivers H1, H2, and H3 are connected through routers R1 and R2; merged 4-Kbps requests propagate upstream.)
The reservations for H2 and H3 are merged to 4 Kbps at (if0, R2). Another request comes from H1 on if1 of R2 for 3 Kbps. R2 sends a merged request of 4 Kbps via (if2, R2): the larger of the 3 Kbps at (if1, R2) and the 4 Kbps at (if0, R2). Router R1 forwards a 4-Kbps request on if1 and if2 (to S1, S2, and S3). An important point to note here is that the source is not identified, and that the merging of requests at routers does not sum the incoming requests but takes the larger of the values.

6.3.2 Shared Explicit

The shared-explicit-filter style of reservation is similar to the wildcard filter, with the only difference being that the senders are identified. The reservation is shared among all senders in the list. Figure 6.4 shows an example of the shared-explicit-filter style of reservation. In this case, the requirements of the receivers are as follows:
• H1 wants to reserve 1 Kbps for S1 and S2.
• H2 wants to reserve 3 Kbps for S1 and S3.
• H3 wants to reserve 2 Kbps for S2.

Figure 6.4 Example of a shared explicit filter.

The reservations for sources S1, S2, and S3 from H2 and H3 on (if0, R2) are merged to 3 Kbps. Another request comes from H1 on if1 of R2 for 1 Kbps to S1 and S2. The requests on if0 and if1 of router R2 are merged and forwarded on if2 as 3 Kbps for S1, S2, and S3. The requests received on if0 of router R1 are forwarded as follows:
• on if2, 3 Kbps for S1 and S2;
• on if1, 3 Kbps for S3.

6.3.3 Fixed Filter

The fixed-filter style of reservation is suitable for applications such as video-conferencing, where one window is required for each sender and all these windows need to be updated simultaneously. Fixed-filter style reservation requires that receivers identify the source from which they want to receive, along with the bandwidth required. Bandwidth is not shared between sources, since reservations are made for a particular source.
Figure 6.5 Example of a fixed filter. (Per-source reservations for S1, S2, and S3 are merged at routers R2 and R1 and forwarded upstream.)
Figure 6.5 shows how the fixed-filter style of reservation can be used. The requirements of the receivers are as follows:
• H1 wants to reserve 3 Kbps for S1 and 4 Kbps for S2.
• H2 wants to reserve 2 Kbps for S1 and 2 Kbps for S3.
• H3 wants to reserve 1 Kbps for S1.

The reservations for source S1 from H2 and H3 are merged to 2 Kbps at (if0, R2). The reservation for source S3 of 2 Kbps from H2 arrives at (if0, R2). Another request comes from H1 on if1 of R2 for 3 Kbps to S1 and 4 Kbps to S2. The requests on if0 and if1 of router R2 are merged and forwarded on if2 as follows:
• 3 Kbps for S1;
• 4 Kbps for S2;
• 2 Kbps for S3.
The requests received on if0 of router R1 are forwarded as follows:
• on if2, 3 Kbps for S1 and 4 Kbps for S2;
• on if1, 2 Kbps for S3.

6.3.4 RSVP/ns Simulation

In this section, we provide the results of a simulation study using RSVP/ns, an implementation of RSVP for the ns simulator [8]. Figure 6.6 shows the topology used for the simulation.

Figure 6.6 Simulation topology. (Node 0 is connected through Router 1 to Nodes 2, 3, and 4; every link is 1 Mbps, of which 0.5 Mbps is reservable.)
6.3.4.1 Simulation of Fixed-Filter-Style Reservation

Figure 6.7 shows the results of a fixed-filter-style reservation.

Figure 6.7 Filter simulation result. (Throughput in Mbps of flows 1, 2, and 3 versus time.)

Node 0 receives multicast sessions from three sources, on Nodes 2, 3, and 4. Each source sends at a constant bit rate (CBR) of 0.5 Mbps, and the flows belong to the same video-conferencing session. The receiver performs explicit reservations for different sources at different times; if no reservation is made, traffic gets best effort treatment. As is evident from Figure 6.7, initially (up to time 200) all three sources share the 1-Mbps link between Node 0 and Router 1 (out of 1 Mbps, only 0.5 Mbps can be reserved by RSVP flows on any link) and get a throughput of 0.33 Mbps each. At time 200, Node 0 requests 0.1 Mbps for the flow from Node 2. (Remember that traffic not covered by this reservation gets best effort treatment; as a result, 0.4 Mbps of traffic from this source competes for resources as best effort.) This reservation succeeds, and the flow from Node 2 starts getting higher throughput compared to the other flows. At time 400, Node 0 makes another reservation request, of 0.5 Mbps, for the flow from Node 3. This reservation fails, since the maximum allowable bandwidth for RSVP flows is 0.5 Mbps and, on the shared link between Nodes 0 and 1, a reservation of 0.1 Mbps is already in place. At time 700, a request for 0.4 Mbps from Node 3 succeeds, and the flow from Node 3 starts getting higher throughput. At time 1,100, the reservation for the flow from Node 3 is dropped to 0.1 Mbps; as is evident from Figure 6.7, the flows from Node 2 and Node 3 then get equal bandwidth. At time 1,400, a reservation request of 0.3 Mbps for the flow from Node 4 succeeds.

6.3.4.2 RSVP Reservation Merger

Figure 6.8 shows the simulation results of an RSVP reservation merger, using the same topology as Figure 6.6. In this case, Node 0 takes the role of a multicast sender, sending three different multicast flows of CBR traffic at 0.5 Mbps to Nodes 2, 3, and 4. The receiving nodes make and remove reservations at different times. Figure 6.8 presents the results of traces collected at one of the receiving nodes.
Figure 6.8 Merge simulation result. (Throughput in Mbps of flows 1, 2, and 3 versus time.)

Up to time 200, no reservation has been made, and all flows share the bandwidth equally. At time 200, a reservation of 0.1 Mbps for flow 1 is made, and this flow starts getting more bandwidth. There are successive increases in the reservation at time 500 (to 0.3 Mbps) and time 800 (to 0.5 Mbps); from time 800 onward, flow 1 gets the maximum reservable bandwidth of 0.5 Mbps. The successive reservation requests for flow 1 have been merged; otherwise, the last reservation of 0.5 Mbps would have failed for exceeding the maximum reservable limit of 0.5 Mbps on a link. Finally, reservations are removed at times 1,100, 1,400, and 1,600, and the impact of this is obvious from the figure: all flows share the bandwidth equally once all reservations have been removed.
6.4 RSVP MESSAGES

RSVP has two main messages: PATH and RESV. The source transmits PATH messages every 30 seconds. RSVP messages travel hop by hop; the next hop is determined by the routing table. Routers remember where a message came from and maintain this state (called route pinning). RSVP messages are sent as raw IP datagrams (IP protocol number 46). End systems that cannot handle raw IP datagrams may encapsulate RSVP messages in UDP segments. The IP packets have the router alert
option set in the header. The alert option signals to NEs that this message needs special processing.
6.4.1 PATH Messages
Figure 6.9 shows the traversal of a PATH message in a simple network of four network elements, R1–R4, two sources, S1 and S2, and three destinations, D1–D3. The arrows in the forward direction show the route taken by the PATH message from each source to every destination. (For simplicity, we assume that each destination is interested in each source.) The forwarding decision is based on routing tables built by protocols such as OSPF, since RSVP does not mandate the use of a particular routing algorithm. At a minimum, a PATH message contains the IP address of the previous hop (PHOP), which is used to route subsequent RESV messages; the data packets follow the same path once the state has been set up. PATH messages also carry the following:
• Sender template: contains the data format, source address, and port number that uniquely identify the sender's flow among other RSVP flows;

• Sender TSpec: provides the traffic flow characteristics;

• AdSpec: helps the receiver identify non-RSVP routers in the path. It contains a cumulative summary of QoS parameters (calculated and updated at each node). AdSpec is discussed in detail later in this chapter.
Figure 6.9 Path message. (PATH messages from sources S1 and S2 traverse network elements R1–R4 toward destinations D1, D2, and D3.)
6.4.2 RESV Messages

Receivers must join a multicast address to receive PATH messages. Upon receipt of a PATH message, a receiver generates a reservation request (RESV) message containing a request for the resources to be reserved. RESV messages are forwarded along the reverse path of the PATH messages. Figure 6.10 shows RESV messages from receivers D1 and D3 moving toward sources S1 and S2. The resource reservation request is expressed by a filter specification and a flow specification. The filter specification defines the packets in the flow that will receive a specific class of service; this is used in the packet classification process. The flow specification is used by packet schedulers. The content of the flow specification depends upon the Intserv service class (controlled load or guaranteed service) and will generally include a TSpec and an RSpec. We will look at the format of TSpec and RSpec later in this chapter.
Figure 6.10 Reserve message. (RESV messages from receivers D1 and D3 travel back through R1–R4 toward sources S1 and S2 along the reverse of the PATH route.)
6.4.3 Other RSVP Messages

Table 6.1 provides a summary of a few important RSVP messages, their purpose, and their direction of flow. The direction of a message is considered downstream if it is being sent from the sender toward the receiver, and upstream if it is going from the receiver toward the sender. An exhaustive list of all RSVP messages can be found in Braden et al. [1]. A brief description of the purpose of some messages is given below:

RESV confirmation: Used by the sender to notify the receiver that its reservation request has been satisfactorily installed. This message is sent directly to the receiver (no hop-by-hop processing).
Table 6.1 RSVP Messages

Message Type        Purpose                                         Direction
PATH                Install path state and traffic specification    Downstream
RESV                Request for resource (QoS)                      Upstream
RESV confirmation   Send confirmation of reservation to receiver    Direct to receiver
PATH error          Report error in path installation               Upstream
RESV error          Report error in reservation installation        Downstream
PATH tear           Explicitly remove PATH state                    Downstream
RESV tear           Explicitly remove RESV state                    Upstream
PATH error: Used to indicate an error in processing the PATH message; sent to the sender (hop by hop).

RESV error: Used to indicate an error in processing the RESV message; sent to the receiver(s) (hop by hop). A possible cause could be the refusal of resources by the admission control module of an NE.

PATH tear: Explicitly generated by senders (or by an NE after a timeout of the path state in a node along the traffic path). This message is sent to all receivers. A timeout occurs only after three refresh messages are missed (90 seconds). PATH tear immediately removes the state (important if the user is billed).

RESV tear: Explicitly generated by the receiver or by any node in which the reservation state has timed out. This is sent to all pertinent senders. It helps free up resources to be used by other flows immediately.

6.4.4 Message Processing

Requests are checked for resource availability (admission control) and administrative permission (policy control). A request may be accepted, accepted with modified parameters, or refused because of a shortage or unavailability of resources. It is also possible to refuse a request because of administrative or policy constraints. Routers maintain a soft state for each accepted reservation; the receivers have to refresh it periodically, at 30-second intervals, by sending RESV messages. Two or more RESV messages for the same source over the same link are merged, which is particularly useful for multicast sessions. A description of the different merging styles used by RSVP was provided earlier in this chapter.
6.5 RSVP MESSAGE FORMAT

Figure 6.11 shows the format of the common header of RSVP messages. A message consists of a header and one or more objects. The header contains the following information:

• Vers (4 bits): version number of RSVP (currently 1);
• Flags (4 bits): various flags (see RFC 2205);
• Message type (8 bits): RSVP message type (e.g., 1 = PATH, 2 = RESV);
• Checksum (16 bits): detects message corruption;
• Send TTL (8 bits): counts RSVP aware hops;
• RSVP length (16 bits): length in bytes of the entire RSVP message, including the common header and the variable-length objects (limited by the use of IP to 64 KB, and possibly further limited by the MTU size of the path).
Figure 6.11 RSVP message common header format. (Vers, flags, message type, RSVP checksum, send TTL, a reserved byte, and RSVP length.)
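As a concrete reading of Figure 6.11, the eight header bytes can be assembled with Python's struct module. The checksum is left at zero here; this is a sketch of the layout only, not an implementation of RSVP.

import struct

def rsvp_common_header(msg_type, length, send_ttl, flags=0, version=1):
    # Version and flags share one byte; then message type, checksum (0),
    # send TTL, a reserved byte, and the total RSVP length.
    vers_flags = ((version & 0xF) << 4) | (flags & 0xF)
    return struct.pack("!BBHBBH", vers_flags, msg_type, 0, send_ttl, 0, length)

hdr = rsvp_common_header(msg_type=1, length=8, send_ttl=63)  # 1 = PATH
print(hdr.hex())   # prints the 8 header bytes in hex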
Figure 6.12 RSVP message object format. (Length, Class-Num, and C-Type fields followed by the object contents.)
The object contains fields required for describing a message, as shown in Figure 6.12. A message can contain multiple objects. There are 14 classes of objects
currently defined by the RSVP standard. Each class has multiple types that specify the format of the encapsulated data. The object header contains the following fields:

• Length (16 bits): length of the object in bytes, including the header (a multiple of 4, and at least 4);
• Class-Num (8 bits): general characterization of the data (e.g., SESSION, POLICY DATA, ADSPEC);
• C-Type (8 bits): object type, unique within a Class-Num (e.g., for the SESSION class, C-Type = 1 represents an IPv4 address and C-Type = 2 represents an IPv6 address);
• Object contents: this field contains the object itself and is limited to 65,528 bytes.

Examples of a few objects are provided next.

6.5.1 Session Objects

Figures 6.13 and 6.14 show the format of UDP session objects for IPv4 and IPv6 addresses, with C-Type = 1 and C-Type = 2, respectively. The object can be extended to accommodate other addresses by defining a new C-Type. The format contains fields such as the destination IP address, the IP protocol identifier, and the destination TCP or UDP port of a data flow. There are two types of this object, to support IPv4 and IPv6 addressing; the addresses can be either unicast or multicast. A flag field can be set if a host is not capable of policing and wants the edge network device to perform policing. All RSVP messages require this object to identify a flow.
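Continuing in the same vein, here is a minimal sketch of the IPv4 UDP SESSION object of Figure 6.13: a 4-byte object header (Length = 12, Class-Num = 1, C-Type = 1) followed by the destination address, protocol ID, flags, and destination port. The function name and defaults are illustrative.

import socket
import struct

def session_object_ipv4(dst_addr, protocol_id, dst_port, flags=0):
    # Object header: length 12, Class-Num 1 (SESSION), C-Type 1 (IPv4).
    header = struct.pack("!HBB", 12, 1, 1)
    body = socket.inet_aton(dst_addr) + struct.pack(
        "!BBH", protocol_id, flags, dst_port)
    return header + body

obj = session_object_ipv4("20.0.0.1", protocol_id=17, dst_port=2001)  # 17 = UDP
print(len(obj))   # 12, matching the object's Length field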
Figure 6.13 IPv4 UDP session object. (Length = 12, Class-Num = 1, C-Type = 1; a 4-byte IPv4 destination address, protocol ID, flags, and destination port.)

Figure 6.14 IPv6 UDP session object. (Length = 24, Class-Num = 1, C-Type = 2; a 16-byte IPv6 destination address, protocol ID, flags, and destination port.)
6.5.2 TSpec Object

An RSVP source uses the sender TSpec class object to specify the traffic characteristics of its data flow, as shown in Figure 6.15 [9]. It provides an indication to the
admission control module as to how much resource needs to be reserved for the flow. Intserv token bucket parameters are used to describe the data flow characteristics; in particular, the token bucket rate, token bucket size, peak data rate, minimum policed unit, and maximum packet size are the attributes of this object. The TSpec object is mandatory in a PATH message.

Figure 6.15 Sender TSpec object. (Length = 36, Class-Num = 12, C-Type = 2; carries the token bucket rate, token bucket size, peak data rate, minimum policed unit, and maximum packet size.)
6.5.3 AdSpec Object Class

Forwarding NEs use the AdSpec object class to specify the kinds of service they offer, along with service-specific attributes and the amount of QoS resources
available. This object is carried in PATH messages and manipulated by RSVP aware hops in the data path. The AdSpec object is modified by an NE only if its available resources or its capability to support a service is less than what is specified in the incoming PATH message's AdSpec (or if there is no AdSpec in the incoming PATH message). The information received in the AdSpec is used by the receiver to determine the available service types and the minimum capacity along the path; it helps the receiver formulate the request in the RESV message. This object class is informational only, because resource availability at an NE may change dynamically. As discussed earlier, this object is required in reservation setup with the OPWA model.

6.5.4 AdSpec Functional Block

The AdSpec object has three functional blocks:

1. General characterization parameters block: This block carries the following values:
• NUMBER OF IS HOPS: the number of IS compliant hops along the data path;

• AVAILABLE PATH BANDWIDTH: the largest data rate a receiver can reserve on the path;

• MINIMUM PATH LATENCY: the minimum delay a receiver can expect along this path. It is increased by each device along the path to reflect the delay the device introduces (the time taken to forward a packet). This is the fixed delay component on top of the queuing delay;

• PATH MTU: the maximum transmission unit (packet size) supported along this path under the QoS constraints.

2. Controlled load service (CLS) block: This block is used to advertise to the destination that CLS is supported on this path. It contains only a header (no values).

3. Guaranteed service functional block: This block provides information about the delay bound along the data path. An NE that does not support the service marks the header to indicate to the receiver that at least one
device along the path may not support this service. The block contains two components: a rate-independent delay term (D) and a rate-dependent delay term (C). C represents delay as a function of the transmission rate of the link; D is the variable delay caused by the queuing of packets at a device. Each device adds its local delay characteristics to the accumulated totals Ctot and Dtot. Other terms, Csum and Dsum, are used to describe the worst-case delay terms since the last traffic-shaping device. These components were discussed in detail in Chapter 5.

6.5.5 Other RSVP Objects

In addition to the objects discussed earlier, we provide a summary of some other important RSVP objects below:

RSVP hop: This object class is used for route pinning. A device's RSVP hop is the closest upstream or downstream device along the data path.

Time value: The time value object controls the interval between message refreshes (remember that, for soft state, maintenance messages must be refreshed at regular intervals).

Error specification: This class is used for the identification and reporting of errors. RSVP error and PATH error messages make use of this object.

Scope: Records a list of senders to avoid message looping in certain multicasting scenarios.

Style: Specifies the style of reservation (WF, FF, or SE).

Flow specification: Used by the receiver to specify Intserv style flow characteristics (controlled load or guaranteed service).

Filter specification: Used by the receiver to uniquely identify a flow source (particularly useful in multicast scenarios where there are multiple senders). Used by various RESV messages.

Sender template: Used by senders to uniquely identify themselves (and also provide some additional information) in PATH messages.

Policy data: An optional object in PATH and RESV messages that can be used for authentication, billing, and other policy matters.

Reservation confirmation: This object carries the IP address of a destination that is interested in receiving a confirmation from the sender for its RESV message.
6.5.6 PATH Message Format

Earlier we looked at the generic message format and a number of objects that can be used in these messages. Figure 6.16 shows the format of a PATH message. In Section 6.4.1 we saw how a PATH message traverses from a sender (upstream) to a receiver (downstream). The IP header indicates the source and destination (unicast or multicast) addresses of this message. The PATH message is received by an NE and interpreted by the RSVP process, which may either install a new PATH state or refresh the existing state. The session object contains the destination address and port used by the flow. The PATH message is also forwarded along the appropriate interface(s) after a lookup of the routing table. The PATH message consists of a number of objects on top of the RSVP header. The hop object identifies the upstream device (PHOP) that generated the message, for route pinning. As discussed earlier, the sender template and TSpec identify the source and its traffic characteristics, whereas the AdSpec indicates the kind of service offered. The PATH state should be refreshed at regular intervals, as per the time value object. Details of a number of other objects in the message can be found in RFC 2205.
Figure 6.16 PATH message format. (An RSVP header followed by session, hop, sender template, sender TSpec, and AdSpec objects.)
6.5.7 RESV Message Format
The RESV message is used to make a request for a reservation. Section 6.4.2 discussed how an RESV message traverses from a receiver (downstream) to a sender (upstream); the RESV message follows the reverse path of the incoming PATH message (the route having been pinned by PATH). Upon receipt of the RESV message, the network element checks for a valid installed PATH state. After finding a valid state, it performs the admission control function; the result may be acceptance (install the state) or rejection (generate an error message) of the reservation. Figure 6.17 shows that the RESV message consists of a number of objects on top of the RSVP header. The style object indicates the reservation style. The message also includes flow and filter specifications to identify a source of interest and the reservation for that source; the number of these objects depends on the style of reservation used. Like the PATH state, the RESV state should be refreshed at regular intervals, based on the time value object.
Figure 6.17 RESV message format. (An RSVP header followed by session, hop, style, flow specification, and filter specification objects.)
6.5.8 Controlled Load Flow Specification

Controlled load service was discussed earlier, in the context of Intserv, in Chapter 5. Figure 6.18 shows the format of the controlled load flow specification object. The most important point to observe is that it uses the token bucket parameters to make the bandwidth reservation (a token bucket does not specify delay characteristics). Token bucket parameters were discussed in Chapter 2.

Figure 6.18 Controlled load flow specification. (Length = 36, Class-Num = 9, C-Type = 2, with the service number indicating controlled load; carries the same token bucket parameters as the sender TSpec.)
6.5.9 Guaranteed Load Flow Specification

Figure 6.19 shows the guaranteed service flow specification. It has an additional RSpec term to deal with delay bounds, in addition to the token bucket parameters. The RSpec has a rate and a slack term. The underlying principle behind these terms is that if the rate is increased, the variable delay caused by queuing will decrease (as packets will be processed faster). Flow merging becomes very complicated for guaranteed service.
Figure 6.19 Guaranteed load flow specification: a 48-byte object (length = 48, Class-Num = 9, C-Type = 2) carrying the same token bucket parameters as Figure 6.18 (parameter ID = 127), followed by a second parameter (ID = 130) holding the RSpec rate and slack term.
6.6 RSVP APIS

The RSVP stack is supported by many operating system vendors these days. GQoS (generic QoS) takes advantage of the WinSock 2 APIs; WinSock 2 is a general API usable by a variety of network protocol stacks [10]. RSVP implementations under various flavors of UNIX are available from the research community at no cost. The RSVP application programming interface (RAPI) is one such example. Applications use RAPI to initialize a session and provide a callback routine; when a network event associated with the callback routine occurs, the application gets notified. Applications provide the TSpec and the sender template to the RSVP daemon via RAPI. SCRAPI is a simplified version of RAPI that needs only a few parameters to interact with the RSVP daemon [11]. This simplifies the task of application programmers, as they do not need a detailed understanding of RSVP.
SCRAPI contains four commands, namely, sender, receiver, close, and status. Their definitions and usage are listed below:

sender: is used to register as a data sender.
receiver: is used to make a QoS reservation as a data receiver. The receiver command may be repeated with different parameters to dynamically modify the state of a session at any time.
close: is used to close the session and delete all of its resource reservations.
status: is used to check the current status of a session. Three types of status are possible: Green, Yellow, and Red. Green means the reservation is done and is working properly for the session. Yellow indicates that the operation is pending. Red means no reservation state exists for the requested session.

A brief description of the SCRAPI command line syntax and an example of making a resource reservation using this interface is provided below:

receiver destination protocol source reservation service style
sender destination protocol source bw ttl
close destination protocol source
status destination protocol source

Parameters:
destination or source ::= host | host/port | host/0
protocol    ::= tcp | udp
reservation ::= on | off
service     ::= cl | gs
style       ::= shared | distinct

Note: A session is defined by a particular transport protocol, IP destination address, and destination port.
cl - Controlled load service
gs - Guaranteed service
bw - Bandwidth
ttl - Time-to-live
Example of an Application Using SCRAPI

A sending host wants to reserve 1 Mbps (1,000,000 bps) for a TCP stream from itself to a receiving host, where the IP address and port used by the sending and receiving hosts are 10.0.0.1/1001 and 20.0.0.1/2001, respectively. To do this, the sending host first needs to execute the following command in SCRAPI:

scrapit> sender 20.0.0.1/2001 tcp 10.0.0.1/1001 1000000 10

Note that the command above has ttl equal to 10; this is just an implementation issue. A PATH message to the receiving host is generated as a result of this command. After the receiving host receives the PATH message, it can use the following command to make a reservation:

scrapit> receiver 20.0.0.1/2001 tcp 10.0.0.1/1001 on cl distinct

This command will create a controlled load service reservation with the distinct merger style. After executing this command, an RESV message is sent to the sending host, and the reservation for the TCP data flow is done. Both the sender and receiver can use the command below to check the status of the reservation:

scrapit> status 20.0.0.1/2001 tcp 10.0.0.1/1001

If either host wants to close the reservation, it can do so by executing this command:

scrapit> close 20.0.0.1/2001 tcp 10.0.0.1/1001
6.7 RSVP PROBLEMS

RSVP introduces additional complexity in routers, as they need packet classification, scheduling, and admission control modules. This also slows down packet processing. Neogi et al. [12] carried out an extensive performance analysis of a commercial router with a 133-MHz processor and 8K DRAM. They measured the latency of RSVP PATH and RESV messages as the difference between the timestamps at which a packet appears on the input and output links. The x-axis of Figure 6.20 shows the number of sessions in progress when the measurement was
performed; a value of 9, for example, indicates that 9 RSVP sessions are already in progress. Measurements were performed under loaded (marked L in the figure) and unloaded (marked U in the figure) conditions for only the first RSVP message (as subsequent messages may be delayed by the routers). The loaded condition is created by increasing and decreasing best effort traffic through the router. It is evident from this figure that, on average, the RESV message takes longer than the PATH message, as it requires interaction with the admission control module of the router. This study also found that the packet scheduling overhead for real-time traffic starts affecting the performance guarantees if the number of such flows is very high.

Figure 6.20 Latency of RSVP messages: latency (ms) versus number of sessions (0 to 1,000) for PATH and RESV messages under unloaded (U) and loaded (L) conditions.
Since RSVP is receiver oriented, it scales well in the number of receivers per flow. However, it does not scale in the number of flows, since the per-flow state grows as O(N) with the number of flows N. It is suitable for the intranet environment, where the number of flows is comparatively small. The number of flows in the backbone can be very large (some core routers may carry more than 250K flows at any time). This has brought about the concept of "virtual paths" or aggregated flow groups for the backbone.

Earlier we saw the merits of the receiver based approach adopted by RSVP. However, in some cases we may need sender control and notifications. Also, if
receiver reservations are merged in a multicast scenario, which receiver is liable to pay for a shared part of the tree? This gets further complicated by receivers joining and leaving dynamically.

The soft state approach has its own problems. It requires route and path pinning: after a change in route, it will take a while before a new state is established and the old state gets purged.

Throughput and delay guarantees require the support of lower layers. This poses challenges for shared media LANs, since delay is not bounded on these LANs. Supporting QoS at the link layer will need switched full-duplex LANs. The integrated services over the specific link layer (ISSLL) working group is currently addressing this problem.

Talwar [13] describes the killer reservation problem encountered when merging RSVP reservation requests. RSVP requests get merged as they travel up the multicast distribution tree, and in this process they lose information about the individual requests. A request that would have succeeded on its own may suffer denial of service when the merged request fails admission control. One possible solution to this problem is to install an additional state, called blockade state, in routers using RESV error messages. The blockade state modifies the merging algorithm by omitting the offending reservation from the merge. This allows the smaller request to be forwarded and established; however, this method adds further complexity to router processing.

Another important issue is the need for policy controls: Who can make reservations? ISPs would like to make sure that an authorized sender and receiver are making a request and that they are able to bill the user. The IETF resource allocation protocol (RAP) working group has work in progress to address this issue [14].
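Returning to the killer reservation problem, the effect of blockade state on merging can be shown with a toy example in which requests are reduced to single bandwidth numbers (real RSVP merges entire flowspecs):

def merged_request(requests, blockaded):
    """Merge RESV bandwidth requests on a branch, omitting any request
    covered by blockade state (installed after a RESV error)."""
    active = [r for r in requests if r not in blockaded]
    return max(active) if active else None

requests = [8_000_000, 64_000]                          # a large and a small request (bps)
print(merged_request(requests, blockaded=set()))        # 8000000: may fail upstream
print(merged_request(requests, blockaded={8_000_000}))  # 64000: the small one proceeds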
6.8 OTHER RESOURCE RESERVATION PROTOCOLS

An early reservation protocol for multicasting is the stream protocol (ST) and its successor ST-II [15]. ST-II was proposed at a time when multicast routing was still in its infancy. It builds its own multicast tree by combining the paths derived from unicast routing protocols. As opposed to the earlier version, which used a central access controller, ST-II establishes multiple simplex reservations. This obviates the problem of a centralized access controller being responsible for coordination with all participants and management of tree establishment. ST-II doesn't accommodate heterogeneous receivers: a single reservation pipe is set up from every source to every receiver in a multicast group. Partridge and Pink [16] provide details of an ST-II implementation. Delgrossi et al. [17] provide a comparison
of the RSVP and ST-II protocols. The authors identify the classes of applications that are better supported by one or the other protocol.

The session reservation protocol (SRP) [18] is based on a workload and scheduling model called the DASH resource model [19]. This model defines a parameterization of client workload, an abstract interface for hardware resources, and an end-to-end algorithm for negotiated resource reservation based on cost minimization. SRP implements this end-to-end algorithm, handling those resources related to network communication. SRP allows communicating parties to reserve resources, such as CPU and network bandwidth, to provide QoS support (delay and throughput). Herrtwich [20] provides a comparison of SRP, ST-II, and RSVP.

As we discussed earlier, RSVP suffers from two major problems: complexity and scalability. Ping Pan et al. [21] developed a new reservation mechanism called YESSIR (yet another sender session internet reservation) that simplifies the process of establishing reserved flows while preserving many unique features introduced in RSVP. Additional features such as robustness, advertising network service availability, and resource sharing among multiple senders are part of YESSIR. It runs on top of RTCP, and reservation requests are generated by senders to reduce the processing overhead. Like RSVP, it uses soft state to maintain reservation state and supports shared reservations and flow merging. Another distinct feature of YESSIR is its extension of the all-or-nothing reservation model to support partial reservations that improve over the duration of the session.

The scalable resource reservation protocol (also called SRP) [22] is another attempt to solve the scalability problem of RSVP. SRP proposes a new architecture that automatically aggregates flows on each link in the network. SRP introduces a packet type with three values (reserved, request, or best effort) that can be encoded in two bits.

Researchers at University College London [23] developed alternative reservation protocols for packet networks. The ATM block transfer (ABT) reservation protocol was designed to modify reservations on the fly to efficiently utilize the bandwidth. Their new reservation protocol, the dynamic reservation protocol (DRP), combines the best features of RSVP and ABT.
6.9 RSVP EXTENSIONS

Since its inception, RSVP has become a very extensively used signaling protocol. This has resulted in several extensions to solve problems associated either with RSVP implementation or with new applications of the RSVP protocol. Many
of these applications may not have been foreseen by the designers of the protocol. This section overviews some of the significant extensions to the RSVP protocol.

6.9.1 Improvement-Related Extensions

Aggregation of individual RSVP flows into an aggregate class has been proposed by Baker [24]. A single RSVP reservation is used to aggregate other RSVP reservations across a transit routing region. This is analogous to the use of virtual paths in an ATM network. Recommendations are also made for various algorithms and policies for predictive reservations.

The RSVP protocol doesn't guarantee the delivery of control messages, as it relies on periodic refresh messages. This doesn't work efficiently in a congested network. Reservation establishment and removal can be delayed by as much as 30 seconds if the corresponding PATH/RESV or PATH TEAR/RESV TEAR messages are dropped. Further, state removal will not happen until the RSVP cleanup timer expires. An enhancement of RSVP proposed by Pan and Schulzrinne [25], called staged refresh timers, supports fast and reliable message delivery: it ensures hop-by-hop delivery of control messages while retaining the soft state mechanism.

Refreshing soft state in RSVP can generate a lot of traffic and load on routers. A number of mechanisms that reduce the refresh overhead of RSVP are discussed in Berger et al. [26]. These extensions are useful for reducing the processing requirements of refresh messages, eliminating the state synchronization latency caused by the loss of an RSVP message, and, when desired, suppressing the generation of refresh messages.

RSVP has also been extended to provide security [27]. These extensions allow support of individual data flows using the RFC1826 IP authentication header (AH) or the RFC1827 IP encapsulating security payload (ESP). These extensions facilitate the use of security features for both IPv4 and IPv6.

6.9.2 Subnet Bandwidth Manager

The Intserv model is designed to be largely independent of the underlying networking technologies. As a consequence, additional mapping mechanisms are needed to implement Intserv over LANs. Specifically, there are two components for such mapping: an extension of RSVP to allow Intserv-capable hosts to reserve LAN resources, and a service mapping of the Intserv service classes onto the IEEE 802.1D queuing priorities, which was discussed in Chapter 5. In IETF terminology, the RSVP extension is known as the subnet bandwidth manager (SBM).
Figure 6.21 Operation of SBM using RSVP messages: PATH and RESV messages are relayed through the DSBM on an SBM-capable switch.
The SBM is a signaling method and protocol for RSVP-based admission control over IEEE 802-style LANs [28, 29]. It enables hosts and routers to reserve LAN bandwidth for RSVP flows in a given LAN segment. In its most basic form, SBM is an admission control entity that can be implemented in a router or on a separate node on the LAN segment. An SBM can be in charge of a single segment, or it may manage multiple segments. For fault tolerance, more than one SBM can be active in a given segment. An election protocol is used to nominate one of the active SBMs as the designated SBM (DSBM). The presence of a DSBM within a segment is broadcast on the LAN periodically; the absence of such broadcast messages over a certain interval indicates a failure of the DSBM, and the other active SBMs then elect a new DSBM. The conventional RSVP can be modified to reserve resources on the LAN through a DSBM. Figure 6.21 illustrates the resource reservation process on the LAN using SBM. Initially, the host sends its RSVP PATH message through the DSBM. The DSBM forwards this message to the next hop router or other Intserv device and waits for the RESV message from the destination. The DSBM uses a special MAC and IP address (a reserved multicast address) to listen to incoming requests from DSBM clients. The DSBM keeps track of all resources consumed on the LAN segment for accepted reservations and determines whether to accept new requests based on the remaining LAN capacity and any administrative policies.
Interference with Best Effort Traffic

SBM serves as a reservation and admission control module for LANs. Intserv capable hosts can reserve bandwidth, subject to the availability of bandwidth and any policy imposed by the LAN administrator. For example, the administrator may set aside only 20% of the total LAN bandwidth for reservation purposes; the rest is used by existing best-effort traffic. The presence of best effort traffic on the LAN poses an interference problem for RSVP flows. To reduce such interference, IEEE 802.1D switches use priority queues, with RSVP traffic assigned higher priorities than best effort traffic. However, if SBM is implemented in existing shared legacy LANs without priority switches, the control of best effort traffic is left entirely to TCP's control engine at the transport layer. It is well known that TCP can adapt its average sending rate to the available bandwidth in the underlying network, using an adaptive algorithm with cyclic increase and decrease of its transmission window. For guaranteed flows, however, the instantaneous bandwidth is as important as, if not more important than, the average bandwidth. Unfortunately, TCP is not capable of keeping the instantaneous rate of best effort traffic within the available bandwidth. As a result, it may be difficult to guarantee QoS over shared LANs. Baig et al. [30] have performed a simulation study to quantify the impact of best effort traffic on RSVP flows.

6.9.3 New Application-Related Extensions

To support Diffserv-style networks (details of the Diffserv framework are discussed in Chapter 7), RSVP has been extended to include DCLASS objects [31]. The DCLASS object indicates the Diffserv DSCP that the sender is required to include when submitting packets on the admitted flow to the Diffserv network. In this model, certain network elements (routers) within or at the edges of the Diffserv network may use RSVP messages to effect admission control or to apply QoS policy. Policy-based management is covered in Chapter 8.

A set of extensions for supporting generic policy-based admission control in RSVP has been proposed by Herzog [32]. These extensions include the standard format of POLICY DATA objects and a description of RSVP's handling of policy events. RSVP+ [33] describes the impact on RSVP router message processing of a variety of extensions proposed for RSVP. These variations are driven under policy control.
RSVP has been extended for use as a label distribution protocol in the multiprotocol label switching (MPLS) network [34]. Several additional objects have been proposed as extensions to RSVP that allow the establishment of explicitly routed label switched paths using RSVP as a signaling protocol. RSVP extensions to address some of the unique requirements of optical trails are proposed by Lang et al. [35]. This extension serves as a control plane for dynamically provisionable optical cross-connects (OXCs) for future optical networks and multiprotocol lambda switching (MPλS).
6.10 SUMMARY

Resource reservation is an important part of building a class-based Internet. This chapter examined the prominent IETF resource reservation protocol, RSVP. The basic building blocks of RSVP and its functions were described in detail. We discussed the scalability problem and other drawbacks of RSVP. An overview of extensions to the RSVP protocol for a variety of new applications was also provided. To support Intserv over LANs, RSVP has been extended to allow resource reservation on LAN segments. With this extended RSVP and LAN QoS in place, it is possible to establish some level of end-to-end QoS over the Internet. Finally, we concluded the chapter by discussing research directions in resource reservation protocols.
6.11 REVIEW QUESTIONS

1. List four architectural features of RSVP.
2. Why does RSVP use a receiver oriented approach for resource reservation?
3. List the drawbacks associated with RSVP.
4. Explain the concept of request merger in RSVP.
5. Compare and contrast the three filters used by RSVP.
6. For Figure 6.3, what is the forwarded reservation on if2 of router R2 if H1 is requesting 5 Kbps and H2 is requesting 4 Kbps?
7. For Figure 6.5, what is the forwarded reservation on if2 of router R2 if H1 is requesting S1 (3 Kbps) and S2 (5 Kbps)?
8. Explain the killer reservation problem in RSVP.
9. What are the problems associated with the soft-state timer refresh period being set to large values such as 30 seconds?
10. Routers supporting RSVP are required to identify flows. How are flows identified by an RSVP-capable router? Discuss any problems that the flow identification method may pose for providing secure services to the end user.
11. What is the use of the AdSpec functional block?
12. What is the main threat to the performance of SBM in shared (not switched) LANs?
6.12 IMPLEMENTATION PROJECT

Write a client/server program that makes use of the SCRAPI interface to reserve bandwidth. Your prototype test-bed will need a client, a server, and at least one intermediate router, all of which should be RSVP capable. Your measurements should show that, despite the presence of other best effort flows, the reserved flow receives the bandwidth it has requested. Hint: you may like to use ALTQ for traffic control. The companion Web site for this book has URLs for various implementations related to this project.
References

[1] R. Braden, L. Zhang, S. Berson, S. Herzog, and S. Jamin. Resource reservation protocol (RSVP) – version 1 functional specification. RFC 2205, Internet Engineering Task Force, November 1997.

[2] Lixia Zhang, Stephen Deering, Deborah Estrin, Scott Shenker, and Daniel Zappala. RSVP: a new resource ReSerVation protocol. IEEE Network, 7(5):8–18, September 1993.

[3] Steve McCanne and Van Jacobson. Vic: A flexible framework for packet video. In Proc. of ACM Multimedia '95, pages 511–522, San Francisco, California, November 1995.

[4] Van Jacobson and Steve McCanne. The LBL audio tool vat. Manual page, July 1992.

[5] Vicky Hardman, Angela Sasse, Mark Handley, and Anna Watson. Reliable audio for use over the Internet. In Inet'95, Honolulu, Hawaii, June 1995.

[6] Elan Amir, Steve McCanne, and Hui Zhang. An application level video gateway. In Proceedings of ACM Multimedia, San Francisco, California, November 1995.
[7] Mark Handley, Ian Wakeman, and Jon Crowcroft. The conference control protocol CCCP: a scalable base for building conference control applications. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 275–281, Cambridge, Massachusetts, September 1995.

[8] M. Greis. RSVP/ns: An implementation of RSVP for the network simulator ns-2. RSVP/ns Documentation, 2000.

[9] D. Durham and R. Yavatkar. Inside the Internet's Resource Reservation Protocol. John Wiley and Sons, New York, 1999.

[10] Martin Karsten, Jens Schmitt, Lars Wolf, and Ralf Steinmetz. An embedded charging approach for RSVP. In Proceedings of 6th IEEE/IFIP International Workshop on Quality of Service, pages 91–100, Napa, California, May 18–20, 1998. IEEE/IFIP.

[11] B. Lindell. SCRAPI — a simple bare bones API for RSVP. Internet Draft, Internet Engineering Task Force, March 1999. Work in progress.

[12] Anindya Neogi, Tzi-cker Chiueh, and Paul Stirpe. Performance analysis of an RSVP-capable router. IEEE Network, 13(5):56–69, September 1999.

[13] M. Talwar. RSVP killer reservations. Internet Draft, Internet Engineering Task Force, January 1999. Work in progress.

[14] Resource allocation protocol working group. http://www.ietf.org/html.charters/rap-charter.html.

[15] C. Topolcic. Experimental Internet Stream Protocol: Version 2 (ST-II). RFC 1190, Internet Engineering Task Force, 1990.

[16] Craig Partridge and Stephen Pink. An implementation of the revised Internet Stream Protocol (ST-2). Internetworking: Research and Experience, 3(1), March 1992.

[17] Luca Delgrossi, Ralf Guido Herrtwich, Carsten Vogt, and Lars C. Wolf. Reservation protocols for internetworks: A comparison of ST-II and RSVP. In Proceedings of the 4th International Workshop on Network and Operating System Support for Digital Audio and Video, pages 195–203, Lancaster, U.K., November 1993. Lancaster University. Lecture Notes in Computer Science 846.

[18] David P. Anderson. SRP: a resource reservation protocol for guaranteed-performance communication in the Internet. Report UCB/CSD 90/562, Computer Science Division, University of California, Berkeley, February 1990.

[19] David P. Anderson, Shin-Yuan Tzou, Robert Wahbe, Ramesh Govindan, and Martin Andrews. Support for continuous media in the DASH system. Technical Report CSD 89/537, University of California, Berkeley, October 1989.

[20] Ralf-Guido Herrtwich. Reservation mechanisms for internetworks. In Architecture and Protocols for High-Speed Networks, pages 279–294, Wadern, Germany, September 1993. Dagstuhl Seminar.

[21] Ping Pan and Henning Schulzrinne. YESSIR: A simple reservation mechanism for the Internet. Technical Report RC 20697, IBM Research, Hawthorne, New York, September 1997.
[22] Werner Almesberger, Tiziana Ferrari, and Jean-Yves Le Boudec. SRP: a scalable resource reservation protocol for the Internet. Technical Report SSC/1998/009, EPFL, Lausanne, Switzerland, March 1998.

[23] Paul Patrick White and Jon Crowcroft. A case for dynamic sender-based reservations in the Internet. Technical report, University College London, London, England, May 1998.

[24] F. Baker. Aggregation of RSVP for IP4 and IP6 reservations. Internet Draft, Internet Engineering Task Force, June 1999. Work in progress.

[25] Ping Pan and Henning Schulzrinne. Staged refresh timers for RSVP. In Proceedings of Global Internet, Phoenix, Arizona, November 1997. Also IBM Research Technical Report TC20966.

[26] L. Berger, D. Gan, G. Swallow, and P. Pan. RSVP refresh reduction extensions. Internet Draft, Internet Engineering Task Force, July 1999. Work in progress.

[27] L. Berger and T. O'Malley. RSVP extensions for IPSEC data flows. Internet Draft, Internet Engineering Task Force, August 1997. Work in progress.

[28] R. Yavatkar, F. Baker, D. Hoffman, Y. Bernet, and M. Speer. SBM (subnet bandwidth manager): a protocol for admission control over IEEE 802-style networks. RFC 2814, Standards Track, Internet Engineering Task Force, May 2000.

[29] A. Ghanwani, W. Pace, V. Srinivasan, A. Smith, and M. Seaman. A framework for integrated services over shared IEEE LAN technologies. RFC 2815, Internet Engineering Task Force, May 2000.

[30] A. Baig, M. Hassan, and S. Jha. Quality of service of RSVP flows over legacy LANs in the presence of best effort traffic. IEICE Transactions on Communications: Appendix, E84(11):25–28, November 2001.

[31] Y. Bernet. Usage and format of the DCLASS object with RSVP signaling. Internet Draft, Internet Engineering Task Force, March 1999. Work in progress.

[32] S. Herzog. RSVP extensions for policy control. Internet Draft, Internet Engineering Task Force, April 1999. Work in progress.

[33] S. Gai, G. Dutt, N. Elfassy, and Y. Bernet. RSVP+: an extension to RSVP. Internet Draft, Internet Engineering Task Force, July 1999. Work in progress.

[34] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, and G. Swallow. RSVP-TE: extensions to RSVP for LSP tunnels. Internet Draft, Internet Engineering Task Force, August 2000. Work in progress.

[35] J. Lang, K. Mitra, and J. Drake. Extensions to RSVP for optical networking. Internet Draft, Internet Engineering Task Force, March 2000. Work in progress.
Chapter 7

IP Differentiated Services Network

IETF proposed another framework, called Diffserv, that could support a scalable form of QoS and provide a variety of end-to-end services across multiple, separately administered domains. Trying to maintain per-flow QoS becomes a monumental task for large networks: experimental measurements show that, even for an OC-3 link, more than 250,000 source-destination pairs may be passing through the backbone routers each minute [1]. Diffserv works at the class level, where a class is an aggregate of many such flows; for example, packets coming from a set of source addresses may fall into one class. The rest of this chapter discusses various aspects of the Diffserv architecture, services, and current research trends.
7.1 DIFFSERV ARCHITECTURE

RFCs 2474 and 2475 define the fundamental framework of the Diffserv architecture [2, 3]. The scaling properties of the Diffserv architectural framework are achieved by marking each packet's header with one of the standardized codepoints. Each packet carrying the same codepoint receives identical forwarding treatment by routers and switches in the path. This obviates the need for per-flow state or complex per-flow forwarding decisions in core routers, as is the case with Intserv. Figure 7.1 shows a Diffserv domain with a set of interior (core) routers and boundary (edge) routers. The ingress boundary router is normally required to classify traffic into microflows based on TCP/IP header fields. Diffserv microflows are subjected to policing and marking at the ingress boundary router according to a contracted service level specification (SLS). Depending on the particular Diffserv model, out-of-profile packets are either dropped at the boundary or marked with
Figure 7.1 Diffserv domain: traffic enters through an ingress boundary node, crosses interior nodes, and leaves through an egress boundary node.
a different priority level, such as best-effort. These functions are termed traffic conditioning in Diffserv language. A traffic conditioner is governed by rules that are defined in the traffic conditioning agreement (TCA). A TCA typically includes traffic characteristics (token bucket parameters may be used for this) and performance metrics (delay, throughput, etc.), as well as the actions required for dropping nonconformant packets. Details of SLS, TCA, and related issues are discussed in Chapter 8.

A Diffserv flow, along with similar Diffserv traffic, forms an aggregate. All subsequent forwarding and policing are performed on aggregates by Diffserv interior nodes. As the interior nodes are not expected to perform an expensive classification function, their ability to process packets at high speeds becomes viable. At interdomain boundaries, SLSs specify the service to be given to each aggregate in transit. Enforcement of the aggregate traffic contracts between Diffserv domains is key to providing QoS: the admission control modules must ensure that new reservations do not exceed the aggregate traffic capacity. These features make it possible to provide end-to-end services using the Diffserv architecture. We discuss service models used by Diffserv later, in Sections 7.3 and 7.5.

A new kind of network entity known as a bandwidth broker (BB) has emerged for QoS sensitive networks. It plays an important role in automating admission control for Diffserv networks. The bandwidth broker is discussed in detail in Chapter 8.
The rest of the chapter uses the term edge router for boundary node and core router for interior node, to be consistent with the rest of the book.

7.1.1 Per-Hop Behavior

The Internet currently supports several services, namely mail, file transfer, Web, and so on. In the future, there may be more services with different characteristics. In contrast to Intserv, the Diffserv model does not define any service; it defines certain behaviors a packet may receive at each hop. This is called per-hop behavior (PHB). PHBs are combined with a much larger number of policing policies at the edge routers to provide a range of services. Many different PHBs can be defined. An example of a simple PHB is one that guarantees that a given class of marked packets receives strictly x% of the outgoing link bandwidth. A variation would be to define a PHB that guarantees a minimum of x% of the link bandwidth and then a fair share of any excess bandwidth. Yet another PHB might specify that one class of traffic will always receive strict priority over another class; that is, if a high-priority and a low-priority packet are queued in a router at the same time, the higher priority packet must leave before the lower priority packet. Note that Diffserv does not standardize any particular queuing discipline: vendors may use priority queuing, WFQ, or anything else they like, as long as the observable behavior meets the PHB specification.

In the Diffserv model, several traffic flows are aggregated into one of a small number of behavior aggregates (BAs). Each BA gets treated using the same PHB. Flows identified by the same Diffserv codepoint (DSCP) belong to a BA. We describe the DSCP later in Section 7.1.4. A PHB group is a set of PHBs that share a common constraint. Within a group, resources can be allocated relative to each other, and the drop precedence of packets may be defined within a group. An example of a PHB group, the AF PHB, is defined in Section 7.5.

7.1.2 Per-Domain Behavior

The Diffserv WG has standardized a new RFC3086 [4] that uses a term called per-domain behavior (PDB). The PDB describes the behavior experienced by packets as they pass through a DS domain. Specific metrics are used to quantify the treatment that packets with a particular DSCP are expected to receive. These metrics should be suitable for use in SLAs between domains (or at the edge of a network). A Diffserv WG draft in progress describes a PDB called assured rate (AR). The AR PDB is suitable for carrying traffic that requires rate assurance but does not
Table 7.1 RFC1349 Semantics for ToS

ToS    Semantics
1000   Minimize delay
0100   Maximize throughput
0010   Maximize reliability
0001   Minimize monetary cost
0000   Normal service
require quantitative bounds on metrics such as delay and jitter. The AR PDB may be implemented using the Diffserv AF PHB (discussed in Section 7.5) in conjunction with suitable policers at the DS domain ingress nodes.

7.1.3 Existing IPv4 ToS

To provide differentiated services, a mechanism is needed to distinguish packets of different classes. IPv4 already has the type of service (ToS) field, as shown in Figure 7.2. Of the 8-bit ToS field, 3 bits are used for IP precedence, 4 bits are used for type of service, and 1 bit remains unused. The 4-bit ToS subfield can be interpreted in different ways by different routers (there is no consensus), and many routers do not support this field and simply ignore it. RFC1349 gives the semantics of the 4-bit ToS subfield, and RFC791 discusses the values of the IP precedence field. The semantics of these fields are shown in Tables 7.1 and 7.2. Routing protocols such as OSPF provide information to the routing entity so that it can compute paths based on different QoS criteria. Datagrams can then be forwarded along different routes based on their marked ToS field. However, vendors use ToS in different ways, and this poses a compatibility risk between routers.

7.1.4 Diffserv Codepoint

Figure 7.3 shows that at every hop (router), packets come in through the input ports and go out through the output ports. The objective of Diffserv is to provide a facility so that packets marked with a specific Diffserv codepoint (DSCP) receive well-defined performance or forwarding behavior at every hop. IETF formed the DS working group to standardize the definition and use of the ToS field. The DS byte can be used to mark packets with different codes.
Table 7.2 RFC791 Semantics for IP Precedence

Precedence   Semantics
111          Network control
110          Internetwork control
101          CRITIC/ECP
100          Flash override
011          Flash
010          Immediate
001          Priority
000          Routine
Table 7.3 Diffserv Codepoint Pool

Pool      Codepoint Space   Assignment
1         xxxxx0            Standard action
2         xxxx11            Experimental/local action
3         xxxx01            Experimental/local action (subject to standardization)
Default   000000            Best-effort forwarding
Default   xxx000            For IP precedence compatibility
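The pool structure of Table 7.3 depends only on the low-order bits of the codepoint, so it can be checked mechanically. The helper below is our own small illustration, not part of the standard:

def dscp_pool(dscp):
    """Classify a 6-bit DSCP into the pools of Table 7.3."""
    if dscp & 0b000001 == 0:
        return 1            # xxxxx0: standard action
    if dscp & 0b000011 == 0b11:
        return 2            # xxxx11: experimental/local action
    return 3                # xxxx01: experimental/local (may be standardized)

print(dscp_pool(0b101110))  # the EF codepoint falls in pool 1
print(dscp_pool(0b000011))  # pool 2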
The intermediate routers can be configured so that packets with different codes are forwarded differently. The ToS field of IPv4 (Figure 7.2) and the traffic class field of IPv6 (Figure 7.4) have been renamed the DS byte in RFC2474 [2]. Figure 7.5 shows the DSCP bit allocation. The DSCP uses 6 bits to mark a packet, so 2^6 = 64 different codepoints are possible. The 2 bits of the DS byte that are currently unused are marked CU. Each codepoint must map to a PHB (standard or local). Table 7.3 shows the code space allocation of the DSCP.

7.1.5 PHB Encoding

There are several cases when it is necessary to identify a PHB in a protocol message rather than deriving it from the DSCP in the IP ToS field. An example of such a case is a bandwidth management message within a domain. A new RFC3140 [5] defines a binary encoding to uniquely identify PHBs or a set of PHBs in protocol
Figure 7.2 IPv4 packet header: the 8-bit ToS field (3-bit precedence, 4-bit type of service, 1 unused bit) sits alongside the version, header length, total length, identification, flags, fragment offset, time to live, protocol, header checksum, source and destination IP addresses, and options fields.
Figure 7.3 Diffserv per-hop behavior: packets arriving at a hop receive the forwarding treatment (PHB) associated with their marking before going out.
messages. The encoding uses a 16-bit binary field. Figure 7.6(a) shows the encoding for standard action. The standard single PHB encoding uses the recommended DSCP value with bits 6 to 15 set to zero. The encoding for a set of PHBs requires bit 14 to be set to 1, with the DSCP being the smallest of the set; the encoding for the set AF1y, for example, uses AF11 with bit 14 set to 1. Figure 7.6(b) shows the encoding for nonstandard (experimental/local action) PHBs. The 12-bit PHB identifier code is to be assigned by IANA. Bit 14 is set to 0 or 1 for a single PHB or a set of PHBs, as for standard action, and bit 15 is marked 1 in this case.
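The two encodings can be written down directly from the bit positions just described. The sketch below is our own illustration of the RFC3140 scheme, with bit 0 taken as the most significant bit of the 16-bit field:

def encode_standard_phb(dscp, phb_set=False):
    """16-bit PHB identifier for a standard PHB: DSCP in bits 0-5,
    bit 14 = 1 for a set of PHBs, all other bits zero."""
    return (dscp << 10) | (0b10 if phb_set else 0)

def encode_local_phb(phb_id, phb_set=False):
    """Nonstandard encoding: 12-bit IANA-assigned code in bits 0-11,
    bit 14 as above, bit 15 = 1."""
    return (phb_id << 4) | (0b10 if phb_set else 0) | 0b1

AF11 = 0b001010
print(f"{encode_standard_phb(AF11):016b}")                # single PHB AF11
print(f"{encode_standard_phb(AF11, phb_set=True):016b}")  # the AF1y set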
Figure 7.4 IPv6 header: version (4), traffic class (8), flow label (20), payload length (16), next header (8), hop limit (8), source and destination IPv6 addresses (128 bits each), and payload.

Figure 7.5 Diffserv codepoint: bits 0 to 5 of the DS byte carry the DSCP; the remaining 2 bits are currently unused (CU).
7.2 DIFFSERV ROUTER

Figure 7.7 shows the data path operation performed by a Diffserv router. It needs a series of components (classifier, meter, marker, shaper, and dropper) commonly known as a traffic conditioner. Many concepts discussed in this section were covered in detail in Chapter 2. The functions of these components are as follows:
• Classifier: The packet received by the Diffserv router is first classified by the classifier module. The classifier selects packets based on the values of one or more packet header fields. The two types of classification supported by Diffserv are described next.
Figure 7.6 PHB encoding: (a) standard track, with the DSCP in bits 0 to 5, bit 14 indicating a single PHB (0) or a set of PHBs (1), and the remaining bits zero; (b) nonstandard (experimental/local), with a 12-bit PHB identifier code in bits 0 to 11 and bit 15 set to 1.
• Multifield (MF) classification: Supports classification based on multiple fields. It may be similar to Intserv classification, whereby the 5-tuple (source and destination address, source and destination port, and protocol identification) is used to classify packets. This type of classification is required at any Intserv capable router at the edge of a network connecting to a Diffserv domain. The MF classified flows need to be marked with the appropriate DSCP either by the egress router of the Intserv domain or by the ingress router of the Diffserv domain. In the latter case, the Diffserv ingress router needs to perform MF classification.
• Behavior aggregate (BA) classification: Sorts packets based on the ToS field that contains the DSCP. This classification is performed in the Diffserv core routers and results in faster classification.
• Marker: Once the MF classification process is complete, the packet is handed over to the marker. The job of the marker is to insert the appropriate DSCP value in the DS byte so that the packet receives the appropriate service (PHB) at subsequent routers. Once the packet has been marked, all downstream routers need to perform only BA classification. RFC2698 [6] defines a two rate three color marker (trTCM), to be used as a component in a Diffserv traffic conditioner. The trTCM meters the incoming packet stream against the peak information rate (PIR) and the committed information rate (CIR), each with an associated burst size, and marks packets with one of three colors: green, yellow, or red. A packet exceeding the PIR is marked red; if the packet exceeds only the CIR, it is marked yellow; otherwise it gets a green marking. Another scheme is called the single rate three color marker (srTCM) [7]. This scheme uses three traffic parameters: committed information rate (CIR), committed burst size (CBS), and excess burst size (EBS). A packet is marked green if it doesn't exceed the CBS, yellow if it exceeds the CBS but not the EBS, and red otherwise. (A sketch of the srTCM coloring logic appears after this list.)

Figure 7.7 Data path operation: the classifier feeds the meter and marker, which in turn feed the shaper and dropper.
• Meter: A meter compares the incoming flow with the negotiated traffic profile and either passes violating packets to the shaper and dropper or remarks them with a lower grade of service using a different DSCP. The meter can also be used for accounting management of the network.
• Shaper: A packet may be sent to the shaper module, which may introduce some delay in order to bring the flow into compliance with its profile. Shapers usually have a limited buffer, and packets that don't fit into the buffer may be discarded. The shaper buffers may accept a burst of traffic and then send it at an acceptable rate to the next hop.
• Dropper: A dropper performs a policing function by simply dropping packets that are out of profile. It is a special instance of a packet shaper with no buffer.
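The coloring logic of the single rate three color marker described in the marker item above is compact enough to sketch. The following is a minimal color-blind sketch along the lines of RFC2697; the per-packet token replenishment and the variable names are our own simplifications:

import time

class SrTCM:
    """Color-blind single rate three color marker (sketch after RFC2697).
    CIR in bytes/s; CBS and EBS in bytes."""

    def __init__(self, cir, cbs, ebs):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = cbs, ebs          # committed and excess token counts
        self.last = time.monotonic()

    def color(self, nbytes):
        # Replenish tokens at CIR; tokens overflowing the C bucket spill into E.
        now = time.monotonic()
        new_tc = self.tc + (now - self.last) * self.cir
        self.last = now
        self.te = min(self.ebs, self.te + max(0.0, new_tc - self.cbs))
        self.tc = min(self.cbs, new_tc)
        if self.tc >= nbytes:                # within the committed burst
            self.tc -= nbytes
            return "green"
        if self.te >= nbytes:                # exceeds CBS but not EBS
            self.te -= nbytes
            return "yellow"
        return "red"                         # out of profile

marker = SrTCM(cir=125_000, cbs=10_000, ebs=20_000)   # 1 Mbps, expressed in bytes
print(marker.color(1500))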
These components (meter, marker, shaper, and dropper) are also known as traffic conditioners in the Diffserv world. Combining these components facilitates building a scalable Diffserv network. MF classification combined with metering at the edge is scalable, as the traffic volume there is not very high in comparison to the core. The core network doesn't need to maintain per-flow state, as classification is performed based on BAs. QoS guarantees can be achieved by separating flows using different DSCPs and by shaping and policing traffic.
7.3 PREMIUM SERVICE

As we discussed earlier, the Diffserv architecture standardizes only PHBs, not services. IETF researchers have worked on a few sample services that can be constructed using PHBs. RFC2598 [8] has standardized a PHB called expedited forwarding (EF). Using the EF PHB, carriers can develop a service that provides low loss, low latency, low jitter, and a bandwidth assurance through their DS domain. Such a service is also known as premium service. Premium service is intended for traffic that requires a virtual leased line. The virtual leased line is similar to constant bit rate (CBR) traffic: it provides a simple abstraction of a link with a minimum guaranteed bandwidth. The EF PHB is defined as a forwarding treatment for a particular Diffserv aggregate where the departure rate of the aggregate's packets from any Diffserv node must equal or exceed a configurable rate [8]. The EF traffic receives this rate independent of the intensity of any other traffic attempting to transit the node. It averages at least the configured rate when measured over any time interval equal to or longer than the time it takes to send an output link MTU-sized packet at the configured rate. The configured minimum rate is settable by a network administrator. If the EF PHB is implemented by a mechanism that allows unlimited preemption of other traffic (e.g., a priority queue), the implementation has to include some means to limit the damage EF traffic could inflict on other traffic (e.g., a token bucket rate limiter). Traffic that exceeds this limit is discarded. This maximum EF rate, and burst size if appropriate, is settable by a network administrator. Codepoint 101110 is used for the EF PHB.
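For illustration, a sending application on a Diffserv-aware host can itself request EF treatment by writing the DS byte through the standard IP_TOS socket option. The sketch below is our own example, not taken from the testbed that follows; whether the mark survives depends on the host OS and on edge routers, which may police or re-mark it:

import socket

EF_DSCP = 0b101110                       # expedited forwarding codepoint
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_DSCP << 2)  # DS byte = 0xb8
sock.sendto(b"probe", ("192.0.2.1", 9))  # 192.0.2.1 is a documentation address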
7.4 EXPERIMENTAL EVALUATION OF PREMIUM SERVICE UNDER LINUX

An implementation project is proposed at the end of this chapter. Readers interested in setting up an experimental testbed may find this section of interest; others should skip to the Experimental Results part of this section. The testbed described in this section is based on a Linux kernel patched with the ds-8 distribution of Diffserv on Linux [9]; the iproute2 package is used for traffic control [10]. Figure 7.8 shows the experimental setup used to produce the results. Netcom Systems' SmartBits 200 is a multiport, multistream, and multilayered performance analysis system. The SmartBits has two ML-7710 cards, and each card can emulate the traffic generation of about 1,024 nodes. Multiple streams, called virtual transmission engines (VTEs), can be generated from each card and analyzed when captured at the other card. Marvin is a Diffserv capable PC-based network router that is directly connected to the ports of the SmartBits through 10/100 Mbps Ethernet yellow crossover cables. The ef-prio script is used to invoke scheduling at the outgoing interface (eth1) of the router; this script invokes the queuing disciplines through the user level program tc.
Figure 7.8 Experimental testbed: ports 1 and 2 of the SmartBits 200 are connected to interfaces eth0 and eth1 of the PC router Marvin.
Traffic generation and analysis are performed by the SmartBits. Five virtual transmission engines (VTEs) are defined on the sending port of the SmartBits (port 1). Two of these VTEs were configured to send packets marked with the EF DSCP (0xb8). The SmartBits doesn't recognize DSCPs as such but has the capability to configure custom packets; the DSCP (0xb8 = 10111000) is mapped into the ToS byte of the IP packet header. At the receiving port, a trigger is written that would map
the ToS byte of the IP header of the incoming packets so that the traffic profiles of both the BE and EF traffic can be analyzed. The traffic profile for all the VTEs is described in Table 7.4. Results were collected at destination port 2 of the SmartBits for various combinations of loads on the network router Marvin.

Table 7.4 Traffic Profile for VTE

Parameter          Value
Packet size        64 bytes
Interpacket gap    9.6 usec
Link utilization   100%
TTL                32
Protocol no.       17 (UDP)
The Linux traffic control is invoked by running the following ef-prio script.

DiffServ ef-prio script.

#!/usr/bin/perl
# Pipe the output to your shell to execute, e.g., ./ef-prio | sh

$TC = "/iproute2/tc/tc";    # modify to your environment
$DEV = "dev eth1";
$efrate = "1.0Mbit";
$MTU = "1.5kB";

print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex " .
      "mask 0xfc shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 prio\n";
# EF class: at most about one MTU-sized packet allowed on the queue
print "$TC qdisc add $DEV parent 2:1 tbf rate $efrate burst $MTU limit 1.6kB\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 " .
      "handle 0x2e tcindex classid 2:1 pass_on\n";
# BE class
print "#BE class(2:2) \n";
print "$TC qdisc add $DEV parent 2:2 red limit 60KB " .
      "min 15KB max 45KB burst 20 avpkt 1000 bandwidth 10Mbit " .
      "probability 0.4\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 2 " .
      "handle 0 tcindex mask 0 classid 2:2 pass_on\n";
The ef-prio script invokes the priority queuing discipline and the RED packet discard algorithm. Readers are encouraged to read the references [11, 10] for detailed descriptions of the commands used in this script. A brief description of some commands of this script is provided below.

tc qdisc add dev eth1 root handle 1:0 dsmark indices 64 set_tc_index

This adds the root queuing discipline to interface eth1 and assigns 1:0 as a handle for future references. The qdisc is of type dsmark (the Diffserv type). The indices value is given as 2 raised to a power n, such as 2^6 = 64. It represents the number of entries in a table that holds (mask, value) pairs; in this example we can set up 64 different pairs. The set_tc_index option simply instructs TC to retrieve the ToS field of every packet sent to this interface (eth1) so we can use it to classify packets. This value will be referred to as tcindex from now on. It is actually 8 bits wide, but only 6 bits are used for the DSCP value, so shift operations by 2 will be necessary later on.

tc filter add dev eth1 parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2

Here, a filter is applied to the queuing discipline identified by 1:0, which is our dsmark qdisc. It will select all IP packets, mask the tcindex with 0xfc, and right shift the result 2 positions. If a packet is marked with 0xb8, the masking yields 10111000 AND 11111100 = 10111000; right shifted 2 positions, the result is 00101110 (=0x2e). This value is later used to find another filter with 0x2e as its handle; see below. The prio value creates an order among different filters attached to the same qdisc; prio 1 is executed before prio 2, and so on.

tc qdisc add dev eth1 parent 1:0 handle 2:0 prio

This adds a new queuing discipline on eth1. It is placed inside (under) the root qdisc 1:0 and called qdisc 2:0. The type is prio, which contains one or more internal queues or bands (three by default) identified by their priorities. Priority 1 is handled before priority 2, and so forth. This prio parameter is not to be confused with the prio mentioned in the context of filters.

tc qdisc add dev eth1 parent 2:1 tbf rate 1.0Mbit burst 1.5kB limit 1.6kB

This is yet another qdisc added on eth1, inside the 2:0 qdisc; it is attached to band 1, hence the value 2:1 (there are three bands in total). It is a token bucket filter (tbf) discipline, rate-limited to 1 Mbps.

tc filter add dev eth1 parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:1 pass_on

Here a new filter is added to 2:0, the prio qdisc, and assigned handle 0x2e. As a result, all IP packets sent to eth1 marked with 0xb8 are passed on to this filter (see the description of the first filter, added above). The filter simply selects all such packets and sends them to 2:1 for further processing. The pass_on flag means that if no matching handle identifier is found, the packet is passed on to the next filter.

tc qdisc add dev eth1 parent 2:2 red limit 60kB min 15kB max 45kB burst 20 avpkt 1000 bandwidth 10Mbit probability 0.4

This is the queuing discipline intended for all BE traffic. It is attached to the second band of the prio qdisc 2:0, hence the parent 2:2 in the command line above. It is a RED queue rate limited to 10 Mbps. It drops packets randomly when the queue length exceeds 15 kB and tries to keep the queue below 45 kB. If the buffer fills up completely (60 kB), it turns into an ordinary tail-drop queuing discipline.

tc filter add dev eth1 parent 2:0 protocol ip prio 2 handle 0 tcindex mask 0 classid 2:2 pass_on

This filter is added to the same qdisc as the EF filter. It selects all IP packets passed to it and passes them on to 2:2, our BE qdisc. All IP packets received by the main filter with ToS equal to 0x00 (BE) are forwarded to this filter according to the same principles as seen above.

The statistics collected at Marvin using the tc command show the number of packets channeled through each qdisc. In the qdisc statistics part
from the output given below, we first see the current configuration for each qdisc. Then different statistics, such as the number of bytes and the number of packets sent through the qdisc, are shown. The output also shows how many packets have been dropped by each qdisc.

For the ef-prio script, output of the command tc -s qdisc:

qdisc red 8006: dev eth1 limit 60Kb min 15Kb max 45Kb
 sent 7237644 bytes 120628 pkts (dropped 0, overlimits 0)
qdisc tbf 8005: dev eth1 rate 1Mbit burst 1535b lat 950us
 sent 2286120 bytes 38102 pkts (dropped 82523, overlimits 0)
 backlog 1560b 26p
qdisc prio 2: dev eth1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 sent 9523764 bytes 158730 pkts (dropped 82523, overlimits 0)
 backlog 26p
qdisc dsmark 1: dev eth1 indices 0x0040 set_tc_index
 sent 9523764 bytes 158730 pkts (dropped 82523, overlimits 0)
 backlog 26p
Experimental Results

Figure 7.9 illustrates the behavior of the BE and EF traffic streams with and without the Diffserv module running at the outgoing interface eth1 of Marvin, the network router. At time 1, only the BE traffic stream is present and is using up all the available bandwidth of the link. At time 2, the EF VTE of the SmartBits is started; as expected, EF and BE share the link bandwidth. At time 3, we invoke the Diffserv capability at Marvin by running the ef-prio script. This results in the EF traffic's bandwidth utilization being policed to 1.2 Mbps, and BE starts consuming the rest of the bandwidth. The EF queue (TBF) has higher priority than the RED queue for BE, so the delays faced by EF traffic are much lower than those faced by BE. At time 7, the BE VTE of the SmartBits is switched off; however, the EF bandwidth utilization remains at 1.2 Mbps due to policing (token bucket).
Figure 7.9 Experimental results for EF traffic: bandwidth (Mbps) of the EF and BE streams over time.
7.5 ASSURED SERVICE

The assured forwarding (AF) PHB group, as defined in RFC2597, is the means for a provider DS domain to offer different levels of forwarding assurances for IP packets received from a customer DS domain [12]. The customer or the provider DS domain separates traffic into one or more of these AF classes according to the services that the customer has subscribed to. Packets within each class are further divided into drop precedence levels. A typical example used to describe the AF PHB is the provision of different service types such as gold, silver, and bronze. Service providers in this case could guarantee that gold service gets lower delay and loss than the other services. This requires allocation of resources such as buffers and bandwidth at routers and switches. Service providers also need to perform admission control to ensure that they don't overcommit the provisioned capacity for each service. Suppose, for example, that gold service is provisioned 155 Mbps and silver is provisioned 45 Mbps; if the level of traffic generated by customers using gold service is very large (i.e., no admission control is performed), then it is likely that the silver customers will experience better service. Nonconformant packets are marked so that if insufficient resources are available, these packets will be dropped.
Four AF classes are defined, where each AF class in each DS node gets allocated a certain amount of forwarding resources (buffer space and bandwidth). Packets are assigned to a queue based on the service class, and a scheduler can be configured to assign bandwidth to each queue (for example, WFQ may be used for this purpose). Within each AF class, IP packets are marked (again by the customer or the provider DS domain) with one of three possible drop precedence values. In case of congestion, the drop precedence of a packet determines the relative importance of the packet within the AF class: a congested DS node tries to protect packets with a lower drop precedence value from being lost by preferentially discarding packets with a higher drop precedence value. Congestion avoidance techniques such as random early detection (or variants) may be used for packet dropping from each queue to keep the long-term congestion low while absorbing short-term burstiness. A detailed discussion of these techniques was provided in Chapter 4.
Figure 7.10 Assured forwarding implementation: a classifier inspects the DSCP (qqqdd0, plus the CU bits), assigns each packet to one of four queues, each with its own dropper driven by the dd drop precedence, and a scheduler serves the queues.
Table 7.5 gives the values of DSCP associated with each combination of service class and drop precedence. DSCPs for the four general-use AF classes use the notation AFxy, where x is the class and y is the drop precedence in that class. For example, AF11 = 001010 is AF class 1 with low drop precedence, and AF43 = 100110 belongs to AF class 4 with high drop precedence. With the drop probabilities, it is required that high be dropped more aggressively than low. The example in Figure 7.10 assumes the DSCP to be qqqdd0, where qqq determines the queue to be assigned and dd determines the drop precedence. It is worth noting that an AF DSCP identifies a queue but doesn't specify what the size
Table 7.5 AF Drop Precedence

Drop Precedence   Class AF1   Class AF2   Class AF3   Class AF4
Low (1)           001010      010010      011010      100010
Medium (2)        001100      010100      011100      100100
High (3)          001110      010110      011110      100110
of this queue should be or how the packets should be scheduled from this queue. Schedulers such as WRR, WFQ, and so on may be used to serve each queue; Chapter 3 describes these schedulers in detail. A service provider can provide a variety of services based on the AF classes. Further, each precedence level within a class may be associated with a certain drop probability. In a simple case, a service provider offering a video-phone service may decide to assign drop precedence based on some form of charging scheme: users willing to pay more get better quality of service, and if the network gets congested, their packet drop probability is lowered, as AF provides only soft guarantees. Figure 7.11 shows the impact of the drop level on video frames. The flow with a high drop precedence level has the worst-affected video frames [Figure 7.11(c)].
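Because the general-use AF codepoints follow the regular qqqdd0 pattern, the mapping from class and drop precedence to DSCP reduces to two bit shifts. The helper below is our own small illustration of Table 7.5, not code from the standard:

def af_dscp(af_class, drop_prec):
    """Return the 6-bit DSCP for AF<class><precedence>, per Table 7.5."""
    assert 1 <= af_class <= 4 and 1 <= drop_prec <= 3
    return (af_class << 3) | (drop_prec << 1)   # qqqdd0

print(f"AF11 = {af_dscp(1, 1):06b}")   # 001010
print(f"AF43 = {af_dscp(4, 3):06b}")   # 100110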
Figure 7.11 Video-phone service with different drop precedences: (a) low drop; (b) medium drop; (c) high drop.
7.6 OPEN ISSUES WITH DIFFSERV

There are several open problems that need to be addressed before this framework can see commercial deployment. Service can easily be stolen in a Diffserv network by simply marking packet headers with appropriate DSCP codes. The edge router must perform authentication to make sure that service is not stolen. Diffserv has no dynamic admission control; therefore, network managers must make sure that enough resources are available for the agreed SLAs. To achieve scalability, Diffserv doesn't support per-flow QoS guarantees: the QoS is supported over aggregates of many flows belonging to the same class. It then becomes challenging to maintain QoS, especially for voice and video, which need per-flow guarantees. The ways in which PHBs, edge functionality, and traffic profiles can be combined to provide an end-to-end service, such as a virtual leased line service [8] or an Olympic-like gold/silver/bronze service [12], are still active research areas. Diffserv does not support any closed-loop flow control for data service similar to the ABR service in ATM networks. While drop-based services work well with voice or video, data services work better with feedback flow control, where data waits at the source rather than being dropped in the network. In a typical end-to-end communication, traffic is likely to be carried over multiple administrative domains. Mechanisms need to be in place for smooth transfer of traffic from domain to domain so that end-to-end service transparency is maintained. The following questions must be answered:
• How do we decide which users get special service?
• Where should the bandwidth-sharing policy be implemented?
• Who is responsible for ensuring that simultaneous uses of the special service fit within the allocation?

A new entity called the bandwidth broker, proposed by Jacobson, is being developed by the IETF community in conjunction with the resource allocation protocol (RAP) working group. These topics are discussed in detail in Chapter 8.
7.7 DIFFSERV RESEARCH DIRECTIONS

Diffserv has been a very active research area for the last four to five years. This is evidenced by the large number of papers published in various conferences and journals related to
this topic. This section surveys only a small number of works, with a view toward showing the flavor of research related to Diffserv. Ferrari [13] provides an experimental study of the provisioning of end-to-end services in Diffserv. This study uses two scheduling schemes: priority queuing (PQ) and weighted fair queuing (WFQ). In particular, it shows the effect of stream multiplexing on delay- and jitter-sensitive traffic. The evaluation methodology uses three different cases:
• With different aggregate traffic loads;
• With a variable number of flows multiplexed in the same class;
• With different packet sizes.

End-to-end measurement-based connection admission control (EMBAC) proposes a decentralized admission control mechanism for Diffserv. For real-time flows, each individual user probes the network during the connection establishment (flow setup) phase [14]. A decision is made on whether the connection should go ahead or not, based on statistics collected at the destination. Performance evaluation (analytical and simulation) was carried out to demonstrate that the scheme can provide strict QoS guarantees even with very light probing overhead. A huge research effort has gone into proposing new packet drop mechanisms to support QoS in the Diffserv network. A mechanism called selective pushout with random early detection (SPRED) is proposed by Hou et al. [15]. They show that SPRED is a generalized buffer management algorithm that combines the best features of the pushout (PO), RED, and RED with in/out (RIO) mechanisms. Through a simulation study they demonstrate that, under identical conditions, network nodes employing this mechanism show significant performance improvement over the best-effort model for streaming applications. An experimental evaluation of providing bandwidth assurance for flows in a RIO-enabled AF PHB differentiated services network is performed by Seddigh et al. [16]. It is worth recalling that RIO is an extension of the RED algorithm that uses differentiated drop treatment during congestion to provide differentiated throughput to end users. This study shows the impact of various factors on throughput assurances for UDP and TCP flows in such a network. The authors show that these factors can result in different throughput for end users having identical service level agreements (SLAs). Another study, by Nandy et al. [17], has investigated similar issues. Their simulation and prototype look at more than seven different factors that can bias bandwidth assurance for customers with identical SLAs. This study goes a bit further by providing design options for traffic conditioning schemes at the edge
of a network, to mitigate these effects. An enhanced random early detection buffer management scheme, called AMRED-G, to support service guarantees using the AF PHB group has been proposed by Chaskar et al. [18]. This scheme involves buffer dimensioning and adaptive adjustment of drop thresholds. The study extends the Olympic service model proposed for the AF PHB group to provide guarantees on packet-drop rates. Aggregate flow control (AFC), proposed by Nandy et al. [19] together with a Diffserv traffic conditioner, improves the bandwidth and delay assurance of services using a Diffserv network. This work has developed a prototype to study the end-to-end behavior of customer aggregates. The main claims of this work are:
• Fairness among aggregated customer traffic with different numbers of microflows in an aggregate, as well as under the impact of nonadaptive traffic (UDP) and adaptive traffic (TCP);
• Improved performance for short, transaction-oriented TCP flows;
• Reduced interarrival jitter for streaming UDP traffic.

Resource allocation and packet scheduling are major research areas in the context of Diffserv. Applying WFQ to provide QoS in Diffserv is not uncommon. Assigning a weight to each queue so that it gets serviced at a certain rate is a challenging task. Buchli et al. [20] show how to set this weight such that the edge-router-to-edge-router queuing delay in the Diffserv network is statistically bounded. The M/D/1 queuing system is used to model the nodes. A simulation study of different approaches for call admission control (CAC) and resource allocation in the Diffserv network has been performed by Gerla et al. [21]. Each approach incurs different processing and signaling loads on edge and core routers. This paper attempts to identify solutions that provide QoS guarantees without requiring per-flow processing in the core routers. Capacity planning of a network using Diffserv has been performed by Fiedler et al. [22]. Through simulation, they demonstrate significant capacity savings using the Diffserv framework over a network without Diffserv. They also provide guidelines on provisioning a Diffserv network for real-time traffic. The impact of marking strategies employed by aggregated sources on the provided service in a Diffserv network has been studied by Yeom and Reddy [23]. The authors propose two new marking algorithms that improve fairness among the individual flows within an aggregate. RFC2963 [24] describes several rate adaptive shapers (RASs) that can be used in combination with the single rate three color marker (srTCM) and the
two rate three color marker (trTCM) discussed earlier. Applying these RASs at the ingress of a Diffserv network improves the performance of TCP by reducing the burstiness of the traffic. An RAS can be particularly useful in providing the assured forwarding per-hop behavior (AF PHB) when the TCM is used to mark traffic consisting of a small number of TCP connections. Traffic shaping is one of the essential components of Diffserv. A family of RASs has been proposed by De Cnodder et al. [25] with the goal of reducing traffic burstiness. This simulation study aims at increasing the ratio of packets with the highest level of forwarding treatment by buffering and appropriately scheduling packets before applying traffic control functions. A network such as Diffserv that supports multiple classes of service requires a differentiated pricing structure. Wang et al. [26] propose a pricing scheme in a Diffserv environment based on the cost of providing different service classes and on long-term demand. This simulation study compares the performance of a network supporting congestion-sensitive pricing and adaptive-service negotiation to that of a network with a static pricing policy. Some users may adapt to price changes by either adjusting their sending rate or by selecting a different service class.
7.8 SUMMARY

This chapter provided an overview of the IETF differentiated services architecture for providing QoS. The Diffserv architecture is capable of extending the current best-effort paradigm of the Internet to a class-based network. It also achieves scalability and allows gradual upgrade of the existing system. A simplified view of the data-path operation of a Diffserv capable router and associated functions such as classification, marking, metering, shaping, and dropping was presented. The chapter discussed how services such as expedited forwarding and assured forwarding can be supported. There are still several open issues in supporting Diffserv. We discussed issues such as service stealing, resource allocation, and scheduling in this chapter. Finally, we concluded the chapter by discussing the research directions related to Diffserv.
7.9 REVIEW QUESTIONS

1. What are the main goals of the Diffserv architecture?
2. How does Diffserv achieve aggregation and scalability?
3. What is the need for defining per domain behavior?
4. What are the components essential for data path operation of a Diffserv capable router?
5. What is the appropriate location for performing MF classification?
6. What is the role of a marker in a Diffserv capable router?
7. What is the difference between a single rate three color marker (srTCM) and a two rate three color marker (trTCM)?
8. What is the role of the shaper in a Diffserv capable router?
9. How is packet classification performed in a Diffserv network?
10. Give two examples each of applications that can make use of EF and AF PHBs.
11. Why is there a need for various precedence levels in AF service?
12. List some problems associated with Diffserv.
7.10 IMPLEMENTATION PROJECT

Install a Diffserv module on a PC-based router. Connect this router to two other PCs and use one of them as the source and the other as the sink. Write an application using sockets and mark the ToS field with the DSCP for EF and AF. Produce measurements and results similar to those discussed in Section 7.4. You may configure different scheduling disciplines and packet discard mechanisms to study their impact on the QoS. Vary the number of best-effort and real-time flows and study the variation in the QoS. Hints: Use either Linux traffic control or ALTQ for configuring the scheduler. You may use the ttcp program and modify it to mark the DSCP. The companion Web site has links to other tools for traffic generation and statistics collection.
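As a hint for the marking part, the standard java.net.Socket API can set the ToS/DS byte of outgoing packets. The following minimal sketch marks a TCP flow with the EF DSCP; the sink address and port are placeholders, and operating-system support for setTrafficClass varies, so verify the marking with a packet sniffer.

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal sketch: mark a TCP flow with a DSCP via the ToS/DS byte.
public class DscpMarkedSender {
    static final int EF_DSCP = 46;    // 101110: expedited forwarding
    static final int AF11_DSCP = 10;  // 001010: AF class 1, low drop precedence

    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket()) {
            // The DSCP occupies the upper six bits of the ToS/DS byte.
            s.setTrafficClass(EF_DSCP << 2);
            s.connect(new InetSocketAddress("192.168.4.2", 5001)); // sink
            OutputStream out = s.getOutputStream();
            byte[] buf = new byte[1024];
            for (int i = 0; i < 1000; i++) out.write(buf);
        }
    }
}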
References

[1] K. Thompson, G. J. Miller, and R. Wilder. Wide-area Internet traffic patterns and characteristics. IEEE Network, 11(6):10–23, November/December 1997.

[2] K. Nichols, S. Blake, F. Baker, and D. Black. Definition of the differentiated services field (DS field) in the IPv4 and IPv6 headers. Request for Comments 2474, Internet Engineering Task Force, December 1998.
[3] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss. An architecture for differentiated services. Request for Comments 2475, Internet Engineering Task Force, December 1998.

[4] K. Nichols and B. Carpenter. Definition of differentiated services per domain behaviors and rules for their specification. Request for Comments 3086, Internet Engineering Task Force, April 2001.

[5] D. Black, S. Brim, B. Carpenter, and F. Le Faucheur. Per hop behavior identification codes. Request for Comments 3140, Internet Engineering Task Force, June 2001.

[6] J. Heinanen and R. Guerin. A two rate three color marker. Request for Comments 2698, Internet Engineering Task Force, September 1999.

[7] J. Heinanen and R. Guerin. A single rate three color marker. Request for Comments 2697, Internet Engineering Task Force, September 1999.

[8] V. Jacobson, K. Nichols, and K. Poduri. An expedited forwarding PHB. Request for Comments 2598, Internet Engineering Task Force, June 1999.

[9] Differentiated services on Linux. URL:http://diffserv.sourceforge.net/.

[10] Linux 2.4 advanced routing HOWTO. URL: http://www.linuxdoc.org/HOWTO/Adv-Routing-HOWTO.html.

[11] Werner Almesberger. Linux traffic control—implementation overview. Technical report, EPFL, January 1998. ftp://lrcftp.epfl.ch/pub/people/almesber/pub/tcio-current.ps.gz.

[12] J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski. Assured forwarding PHB group. Request for Comments 2597, Internet Engineering Task Force, June 1999.

[13] Tiziana Ferrari. End-to-end performance analysis with traffic aggregation. Computer Networks, 34(6):905–914, December 2000.

[14] Giuseppe Bianchi, Antonio Capone, and Chiara Petrioli. Packet management techniques for measurement based end-to-end admission control in IP networks. Journal of Computer Networks, 2(2):147–156, June 2000.

[15] Yiwei Thomas Hou, Dapeng Wu, Bo Li, Takeo Hamada, Ishfaq Ahmad, and H. Jonathan Chao. A differentiated services architecture for multimedia streaming in next generation internet. Computer Networks, 32(2):185–209, February 2000.

[16] Nabil Seddigh, Biswajit Nandy, and Peter Pieda. Bandwidth assurance issues for TCP flows in a differentiated services network. In Proceedings of the IEEE Conference on Global Communications (GLOBECOM), page 6, Rio de Janeiro, Brazil, December 1999.

[17] Biswajit Nandy, Nabil Seddigh, and Peter Pieda. Diffserv's assured forwarding PHB: what assurance does the customer have? In Proc. International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Basking Ridge, New Jersey, June 1999.
[18] Hemant M. Chaskar, Eleftherios Dimitriou, and Rayadurgam Ravikanth. Service guarantees in the Internet: differentiated services approach. In Proceedings of International Workshop on Quality of Service, pages 176–178, Pittsburgh, Pennsylvania, June 2000.

[19] Biswajit Nandy, Jeremy Ethridge, Abderrahmane Lakas, and Alan Chapman. Aggregate flow control: improving assurances for differentiated services network. In Proceedings of the Conference on Computer Communications (IEEE Infocom), volume 3, pages 1340–1349, Anchorage, Alaska, April 2001.

[20] Maarten Buchli, Danny De Vleeschauwer, Jan Janssen, Annelies Van Moffaert, and Guido Petit. Resource allocation and management in DiffServ networks for IP telephony. In Proc. International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Port Jefferson, New York, June 2001.

[21] Mario Gerla, Claudio Casetti, Scott Seongwook Lee, and Gianluca Reali. Resource allocation and admission control styles in QoS DiffServ networks. Lecture Notes in Computer Science, 1989:113–128, January 2001.

[22] Ulrich Fiedler, Polly Huang, and Bernhard Plattner. Towards provisioning diffserv intranets. In International Workshop on Quality of Service (IWQoS), pages 27–43, Karlsruhe, Germany, June 2001.

[23] I. Yeom and A. L. Narasimha Reddy. Impact of marking strategy on aggregated flows in a differentiated services network. In Proceedings of International Workshop on Quality of Service, pages 156–158, London, United Kingdom, June 1999.

[24] O. Bonaventure and S. De Cnodder. A rate adaptive shaper for differentiated services. Request for Comments 2963, Internet Engineering Task Force, October 2000.

[25] Stefaan De Cnodder, Omar Elloumi, and Kenny Pauwels. Rate adaptive shaping for the efficient transport of data traffic in diffserv networks. Computer Networks and ISDN Systems, 35(2-3):263–285, February 2001.

[26] Xin Wang and Henning Schulzrinne. Pricing network resources for adaptive applications in a differentiated services network. In Proceedings of the Conference on Computer Communications (IEEE Infocom), volume 2, pages 943–952, Anchorage, Alaska, April 2001.
Chapter 8
Policy-Based QoS Management

In order to support QoS in the network, new architectures such as Intserv and Diffserv have been proposed in the IETF [1]. As we discussed in Chapters 5 and 7, these architectures support diverse service levels for multimedia and real-time applications. For example, the Diffserv architecture is capable of providing well-defined end-to-end service over interconnections of autonomous domains. These domains need to enter into contractual agreements regarding the aggregate traffic levels that they will send to and receive from each other. These agreements need to be converted into a set of actions that the networking devices will implement and enforce. Also, in order to provide end-to-end service, service provisioning policies are required. These policies configure the edge devices at the DS boundary and define rules that may be used for mapping traffic to DS behavior aggregates. Policy-based management has become a significant research and implementation area in the past few years. This chapter discusses domain management issues and policy-based management as a possible solution for managing inter- and intradomain issues.
8.1 DEFINITION OF TERMINOLOGIES

We provide a few definitions from RFC2475 [1] and RFC3198 [2] that will be used subsequently in this chapter.

Service-Level Agreement (SLA): A legal service contract between a customer and a service provider that specifies the service the customer is expected to receive. This contract includes levels of availability, serviceability, performance, operation, or other attributes of the service. The SLA can be negotiated between
a service provider and a user organization (source domain) or between two service providers (forming transit domains). An SLA may include a service level specification (SLS) or, in Diffserv terminology, a traffic conditioning agreement (TCA) [1].

Service-Level Objective (SLO): SLAs are legal documents. An SLO is basically a set of parameters and their values, derived from the SLA, that is to be enforced or monitored to meet the SLA. These can be specified as part of an SLA, an SLS, or a separate document [2].

Service-Level Specification (SLS): This is used to specify how a customer's traffic is to be treated by a service provider. In the context of a Diffserv environment, it may define parameters such as specific DSCPs and the per-hop behavior, and the profile characteristics and treatment of the traffic for those codepoints. An SLS is a specific SLA (a negotiated agreement) and its SLOs (the individual metrics and operational data to enforce) to guarantee QoS for network traffic [2].

Traffic-Conditioning Agreement (TCA): As defined in RFC2475 [1], this is "an agreement specifying classifier rules and any corresponding traffic profiles and metering, marking, discarding, and/or shaping rules that are to apply to the traffic streams selected by the classifier. A TCA encompasses all of the traffic conditioning rules explicitly specified within a SLA, along with all of the rules implicit from the relevant service requirements and/or from a DS domain's service provisioning policy."
8.2 BANDWIDTH BROKER

As we discussed earlier, in Chapter 7, each Diffserv network may be augmented with a component called a bandwidth broker (BB), as shown in Figure 8.1. The bandwidth broker is responsible for automating the process of SLS negotiation. In addition, it may also perform admission control, resource management, and network management tasks such as configuration of network devices to support the provisioned QoS according to a common set of operational policies specified for the network [3]. The bandwidth broker is a logical entity, and its implementation is not subject to standardization. The BB manages the QoS resources within a given domain based on the service-level specifications (SLSs) that have been agreed upon in that domain. The BB is also responsible for managing interdomain communication with the BBs in neighboring domains, with a view to coordinating SLSs across the domain boundaries.
Figure 8.1 Bandwidth broker (a BB in each of the DS domains A, B, and C).
An ISP, for example, can dynamically negotiate different service level agreements and bandwidth guarantees with a given customer. Alternatively, a service provider could charge different rates for bandwidth, depending on the demand. To do this, the bandwidth broker must be able to communicate with a remote bandwidth broker in order to negotiate the SLS. It will also communicate with local enforcers to determine the state of the network as well as to configure the network. The BB also gathers and monitors the state of resources within its domain and on the edges of the domain (edge routers connected to and from adjacent domains). The bandwidth broker will take into consideration the ability of the entire network to deliver the policy request. Figure 8.1 shows that domain A is connected to domain B and domain B is connected to domain C. The bandwidth broker of each domain needs to communicate with those of its directly connected neighboring domains. Unfortunately, at this point in time, there is no standard protocol for this purpose. The simple interdomain bandwidth broker signaling (SIBBS) protocol is under development to fill this gap [3]. SIBBS is a client-server oriented protocol that uses a TCP connection between peering BBs. The next section describes the IETF policy framework and how the bandwidth broker could be used to manage a Diffserv network using this framework.
8.3 POLICY FRAMEWORK

The resource allocation protocol (RAP) working group [4] is developing a policy-based management framework, as shown in Figure 8.2. For enhanced network services that support QoS and traffic engineering, the network devices need different capabilities. This becomes an especially significant issue in medium and large scale networks with hundreds or thousands of devices to be managed. The RAP working group is currently defining protocols and frameworks so that policies can be implemented on devices to support QoS-enabled services. Before we discuss the policy framework, a definition of policy in the networking context would be appropriate. Rajan et al. [5] provide the following definition: policy is used to denote the unified regulation of access to network resources and services based on administrative criteria. They also define a policy hierarchy with three levels of policy: network level, node level, and device level. The network level policy is responsible for network-wide resource utilization, topology, and objectives. The network consists of nodes; node-level policy consists of TCAs that can meet specific QoS objectives of a provisioned service. Finally, device level policy is the translation of node level policy into classification rules, scheduling mechanisms, policing criteria, etc. The entity responsible for admission control may use a policy-based system that provides answers to questions such as whether to admit traffic into the network (accept an RSVP RESV request) or which PHB to assign (marking the DSCP code). Some possibilities are to assign priority based on the importance of a user, or to give preference based on willingness to pay more. Also, the time of day will impact the cost and quality of service received. In order to support QoS, we also need support for monitoring and accounting from the network infrastructure. The RAP working group addresses these issues. As with any other working group, the RAP WG has its own set of nomenclatures, such as PDP and PEP. A policy enforcement point (PEP) is a networking device (router or switch) that is directly responsible for receiving and forwarding packets. It executes and/or enforces the policy of a domain on data flows. The policy decision point (PDP) is a logical entity that determines the treatment packets should receive when passing through a network. The PDP has a view over the whole network area (for example, an administrative domain) through its PEPs. It may decide whether or not to admit a specific data flow. There exists at least one PDP per domain. The PDP may use additional protocols such as DIAMETER or RADIUS [6, 7] for authentication and billing.
Figure 8.2 Policy architecture (the BB acts as the policy decision point (PDP), backed by a policy DB; the policy enforcement point (PEP) is an edge router).
The local policy decision point (LPDP) is an optional entity that substitutes for the PDP in its absence and makes local decisions. The PDP and PEP communicate with each other using a protocol called common open policy service (COPS) [8], described in Subsection 8.3.1. The PEP usually initiates a TCP connection to a PDP, and the PEP uses this connection to send requests and receive decisions from the remote PDP. However, in some situations the PDP may send an unsolicited decision to the PEP to force changes in previously installed request states. After a PDP decision is applied at the PEP, the PEP must send a report to the remote PDP that the decision has been successfully applied. These reports are used for accounting and monitoring purposes. If a request state changes suddenly, the PEP must notify the PDP straightaway. If a state is no longer applicable because of events at the client or a decision issued by the server, the PEP must delete it. In the Diffserv architecture, the bandwidth broker can be configured as the PDP to make decisions about whether or not to admit users to a certain transport service (admission control). The PEP can be implemented on edge routers that consult the PDP for policy matters. In some implementations, it is possible that the PDP and PEP are colocated on the same device.
8.3.1 Policy Protocols

This section provides a brief introduction to the COPS protocol and its use for provisioning policies (COPS-PR). The COPS and COPS-PR specifications can be found in RFC2748 [8] and RFC3084 [9], respectively.

Figure 8.3 COPS message exchange (after initialization, the PEP sends a request; the PDP performs a policy DB lookup and returns a decision; the PEP configures its resources and sends a report; the PDP updates its policy DB).
The purpose of the COPS protocol is to exchange policy information between a PDP and its clients, the PEPs. The protocol employs a client/server model, where the PDP is the server and the PEPs are the clients. It uses a persistent TCP connection as its transport; thus there is no need for reliability mechanisms in the protocol itself. Figure 8.3 shows the message exchange between a PEP and a PDP. After the PEP sends a configuration request, the PDP sends a decision message containing configuration data for the PEP. When the configuration data is successfully installed on the PEP, the PEP
should send a report message to the PDP about the installation. The server must update or remove the configuration information by using a new decision message. If the PDP sends a decision to remove the named configuration data from the PEP, the PEP will delete the specified configuration and must send a report message back to the PDP as confirmation. COPS provides fault tolerance to support the security and service management of distributed network devices. The connection between the PEP and the remote PDP is continually monitored using keep-alive messages. In case of a broken connection, the PEP keeps trying to reconnect to the remote PDP or to connect to an alternative PDP. While disconnected from the PDP, the LPDP makes local decisions for the PEP. Once the connection is reestablished, a report must be sent to the PDP if there have been state deletions or new events. In addition, the PDP can request that the PEP resynchronize all previously installed request states. It is also possible to create PEP caches so that when a failure is detected, and before a new connection is established, the PEP cache continues to serve decisions and accept requests for some time.

COPS supports two models: outsourcing and policy provisioning. The policy provisioning model is supported in the Diffserv architecture, where the user contacts the PDP. The outsourcing model, in which the user approaches the PEP (for instance, the router), which in turn contacts the PDP, is supported by the Intserv/RSVP architecture. For COPS to support policy provisioning, a new client type, called COPS for provisioning (COPS-PR), has been introduced. It is independent of the policy type and can carry information about such things as QoS, virtual private networks, and security. To provide a high level of security, COPS messages can use the hashed message authentication code (HMAC) algorithm, IPSEC, or transport layer security (TLS). These provide authentication and security of the channel between the PEP and the PDP.
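All COPS messages, including the requests, decisions, and reports above, begin with a common 8-byte header defined in RFC2748: a 4-bit version, 4-bit flags, an 8-bit op code, a 16-bit client type, and a 32-bit message length. A minimal Java sketch of encoding this header follows; the class and constant names are ours, and only a few of the op codes are shown.

import java.nio.ByteBuffer;

// Sketch of the fixed 8-byte COPS common header (RFC2748).
public class CopsHeader {
    public static final int OP_REQ = 1, OP_DEC = 2, OP_RPT = 3;
    public static final int OP_OPN = 6, OP_CAT = 7, OP_KA = 9;

    public static byte[] encode(int flags, int opCode,
                                int clientType, int messageLength) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) ((1 << 4) | (flags & 0x0F))); // version 1 + flags
        buf.put((byte) opCode);
        buf.putShort((short) clientType);
        buf.putInt(messageLength); // total length in octets, incl. header
        return buf.array();
    }
}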
8.3.2 Policy Rules and Representations

Policies can be implemented using simple rules. The policy rules usually follow If, What, When, and Then logic. We show a very simplified example of a policy rule below:

If: The user is the CEO of the company, and
What: The application is streaming video, and
When: The time is between 9:00 and 17:00,
Then: The user is entitled to a premium service level that gives a throughput of 2 Mbps and an end-to-end latency of no more than 150 ms.
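A rule of this form can be represented and evaluated directly, as the following minimal Java sketch shows; the class and field names are illustrative and not from any IETF specification.

import java.time.LocalTime;

// Minimal sketch of an If/What/When/Then policy rule.
public class PolicyRule {
    String role = "CEO";                     // If: who
    String application = "streaming-video";  // What: which application
    LocalTime from = LocalTime.of(9, 0);     // When: start of validity
    LocalTime to = LocalTime.of(17, 0);      // When: end of validity
    int rateKbps = 2000;                     // Then: premium throughput
    int maxLatencyMs = 150;                  // Then: latency bound

    boolean matches(String userRole, String app, LocalTime now) {
        return role.equals(userRole)
                && application.equals(app)
                && !now.isBefore(from)
                && !now.isAfter(to);
    }

    public static void main(String[] args) {
        PolicyRule rule = new PolicyRule();
        if (rule.matches("CEO", "streaming-video", LocalTime.now())) {
            System.out.println("Entitled to " + rule.rateKbps + " Kbps, "
                    + rule.maxLatencyMs + " ms latency bound");
        }
    }
}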
The simplified rule above uses parameters such as bandwidth and latency. The real parameters would be similar to the Tspec and the others discussed earlier in Chapters 5 and 6. Policy needs a standard representation for interoperability purposes. If the lightweight directory access protocol (LDAP) is used for policy storage, the LDAP schema can serve as a good candidate to represent policies. Another approach, taken by the IETF RAP and Diffserv working groups, is to use a virtual information store called the policy information base (PIB). All the information of the policies provisioned through COPS-PR is kept in sets of PIBs. The model underlying this structure is one of well-defined policy rule classes, and instances of these classes, residing in the PIB. The PIB is based on the model of the structure of management information (SMI) and management information bases (MIBs) as used with the simple network management protocol (SNMP). Inside the PIB, all the policy data is classified according to type or class of policy. The PIB can be thought of as a tree, where the branches represent types of policy rules, or policy rule classes (PRCs), and the leaves represent the content of the policy rules, or policy rule instances (PRIs). Moreover, a PRC can have multiple PRIs. Each PRI is identified by a provisioning instance identifier (PRID), which is a unique name in a COPS object. An example of the PRID numbering of a PIB tree is "1.2.3.4.5", where the first four numbers represent the PRC ("1.2.3.4") and the last number represents the PRI ("5"). Several PIBs are being defined by the IETF; these are beyond the scope of this book.
8.3.3 Policy Database
Policy can be stored in a policy database. This database can be implemented as a directory service, which the PDP can access using LDAP. LDAP runs over the TCP/IP protocol stack and provides ease of use for stand-alone databases. More complex policies are likely to be stored in relational database management systems (RDBMSs), which can be accessed using standard SQL queries.
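As an illustration of the RDBMS option, the following minimal Java sketch shows a PDP-side lookup over JDBC. The JDBC URL, the credentials, and the policy_rules table schema are hypothetical; a real deployment would define its own.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of a PDP looking up a policy rule in an RDBMS via JDBC.
public class PolicyDbLookup {
    public static Integer lookupRateKbps(String userRole, String app)
            throws Exception {
        String url = "jdbc:postgresql://localhost/policydb"; // hypothetical
        try (Connection c = DriverManager.getConnection(url, "pdp", "secret");
             PreparedStatement ps = c.prepareStatement(
                 "SELECT rate_kbps FROM policy_rules " +
                 "WHERE user_role = ? AND application = ?")) {
            ps.setString(1, userRole);
            ps.setString(2, app);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt("rate_kbps") : null;
            }
        }
    }
}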
8.4 POLICY AND RSVP

Earlier, in Section 8.3.1, we noted that the COPS outsourcing model needs an edge router to perform the function of the PEP. An application launched by an end user may use a resource reservation protocol such as RSVP. As the RSVP request reaches the PEP, it contacts the PDP to receive a policy decision. The PEP needs to interpret the POLICY DATA object carried in the PATH and RESV messages of RSVP. A policy object may carry a list of policy elements (PEs) that are used to describe the policy attributes. For example, the authentication policy element may carry the authentication information necessary to securely identify the source of an RSVP flow. Based on these PEs, the PDP communicates its decision as to whether the flow can be admitted and what sort of resources should be allocated for the flow. From a QoS perspective, the priority element may be used to provide the priority level to be used for a flow. As an example, the PEP may be required to preempt a low priority flow to accommodate a high priority flow when resource availability becomes scarce. Details of the policy object format and a description of various PEs may be found in Durham et al. [10].
8.5 BANDWIDTH BROKER IMPLEMENTATION

This section describes how to build a simple experimental policy-based network using publicly available tools. The Linux kernel version 2.2.14 was patched with the Diffserv ds-8 distribution [11]. This kernel was recompiled and patched with the iproute2 package that comes with the ds-8 distribution [12]. The teletraffic tapper (TTT) package is used for real-time packet monitoring. The test tcp (ttcp) program is used as a traffic generator as well as to mark the DSCP in the ToS field of the IP header. A Java implementation of a bandwidth broker using the COPS-PR protocol is used for building the PDP and PEP [13]. This implementation allows an edge router to register itself with a BB and configure its parameters according to decisions sent from the BB. When a BB is initialized, it creates and initializes a PIB object. The initialization can be done through configuration files. The PEP module opens a connection to the bandwidth broker by sending an open message to a specified port. The bandwidth broker issues a decision message with the configurations. This decision is then added to the local PIB. The decision is installed in the router by creating a LinuxRouterConfig object and calling the method install() after initializing the object's parameters. This object
configures the router, based on the given parameters. The following code shows the Java implementation of the LinuxRouterConfig class and associated methods.

package diffserv.test;

import pib.*;
import pib.test.*;

public class LinuxRouterConfig {
    // General settings for this router
    public String tc = "/home/hans/cse/thesis/iproute2/bin/tc";
    public String ip = "/home/hans/cse/thesis/iproute2/bin/ip";
    public String ipAddr = "192.168.2.1";
    public IpFilterEntry ipFilter;
    public IfQueueEntry queue;

    // Configurations
    public static int classid = 2;
    public int prio = 3;
    public String dev = "eth1";

    public void install() {
        if (ipFilter == null) return;
        if (queue == null) return;
        // byte[] filterPrid = ipFilter.getPRID();
        // classid = filterPrid[filterPrid.length - 1];
        int rate = queue.bandwidthAllocation;

        // Add a CBQ class with the allocated rate under the root class 1:1
        StringBuffer command = new StringBuffer(256);
        command.append(tc).append(" class add dev ").append(dev);
        command.append(" parent 1:1 classid 1:").append(classid);
        command.append(" cbq bandwidth 10Mbit rate ").append(rate);
        command.append("Kbit allot 1514 cell 8 weight 100Kbit prio ").append(prio);
        command.append(" maxburst 20 avpkt 1000 split 1:0 bounded");
        System.out.println(command);
        try {
            Runtime.getRuntime().exec(command.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Attach a route filter that maps this realm to the class
        command = new StringBuffer(256);
        command.append(tc).append(" filter add dev ").append(dev);
        command.append(" parent 1:0 protocol ip prio ").append(prio);
        command.append(" route from ").append(classid);
        command.append(" classid 1:").append(classid);
        System.out.println(command);
        try {
            Runtime.getRuntime().exec(command.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Tag traffic from the filter's source address with the realm
        command = new StringBuffer(256);
        command.append(ip).append(" route add ").append(new ObjectID(ipFilter.srcAddr));
        command.append(" via ").append(ipAddr).append(" realm ").append(classid);
        System.out.println(command);
        try {
            Runtime.getRuntime().exec(command.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }

        classid++;
    }
}
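A PEP receiving a COPS-PR decision would then apply it roughly as follows; this is a minimal sketch, and the filter and queue entries are assumed to have already been extracted from the decision message.

import pib.*;
import pib.test.*;

// Sketch: how a PEP might apply a BB decision using the class above.
public class ApplyDecision {
    public static void apply(IpFilterEntry filter, IfQueueEntry queue) {
        LinuxRouterConfig config = new LinuxRouterConfig();
        config.ipFilter = filter;  // identifies the source traffic
        config.queue = queue;      // carries the bandwidth allocation
        config.install();          // issues the tc/ip commands
    }
}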
BB is conceptually a policy decision point in the context of policy-based management. For our experiments we configured a laptop to perform the BB function. The router in Figure 8.4 acts as a policy enforcement point. For simplicity, the request for resource allocation to the BB is made via a Web interface. The BB interacts with the PEP (the router in the figure) using the COPS-PR implementation that we have developed. We use the route classifier, which means we differentiate traffic based on the originating IP address. For this test, we have a host Mango (at IP address 192.168.2.2) and a sink Lychee (at IP address 192.168.4.2). The BB sends the decision to the router, and the router installs the configuration accordingly. The router is configured by adding a class to the root class and attaching a filter to that class. The router will attach a CBQ and a root class to the device being configured. As a result, the flow gets policed at a specified rate corresponding to the PDP request.

Policy Enforcement Scenario 1: Domain A has a policy that during the peak hour, download from the entertainment video server Mango should not exceed 2 Mbps (i.e., the policy is to limit the traffic from the video server to the external domain, B in this case). Using the testbed in Figure 8.4, we demonstrate how this prototype can be used to enforce the above policy. The BB instructs the PEP (edge router) to configure itself so that the policy can be enforced as a TCA. Figure 8.5 shows these measurements.
Figure 8.4 Experimental testbed (domains A and B, each with a BB/PDP driven by a Web-based policy editor and a PEP edge router speaking COPS; hosts Mango (BE) and Guava, and sink Lychee (EF)).
At time 20 sec, traffic is generated from server Mango; it occupies the available bandwidth at about 7 Mbps. Around time 40 sec, the PEP installs the configuration on the router. As a result, the traffic from server Mango gets policed and receives approximately 2 Mbps of bandwidth.
Policy Enforcement Scenario 2: After analyzing historical data, the domain A administrator decides that gaming traffic is consuming a substantial amount of bandwidth. After discussion with management, a policy is formulated that restricts gaming traffic to 2 Mbps.
In the second experiment, using the same setup, we started an EF flow from Guava and a BE flow from Mango, going through the routers to the sink Lychee, as seen in Figure 8.4. The BE flow goes to port 5001 of the sink, and the gaming application uses port 5000. Figure 8.6 shows that at time 10 sec, both flows start and share the available bandwidth equally. At around 27 sec, the PEP (router) is configured to restrict the gaming traffic to 2 Mbps as a result of the request from the PDP (BB). The gaming traffic drops to around 2 Mbps while the BE flow occupies 6 Mbps. Even after the BE flow terminates (time 40 sec), the gaming traffic is still restricted to 2 Mbps.
Figure 8.5 Sample policy scenario 1.
Figure 8.6 Sample policy scenario 2.
8.6 INTERNET2 AND QBONE

Internet2 provides a very good infrastructure for experimentation with Diffserv networking concepts. Currently the Internet2 membership consists of over 180 universities and 50 corporations. Universities are connecting to the Internet2 backbone at high-speed access points called GigaPOPs (gigabit per second points of presence). The QoS working group under Internet2 is working to support the development and deployment of advanced network applications through the use of IP traffic differentiation. The Internet2 community is actively developing the bandwidth broker concept and interdomain signaling. They have also conducted some interoperability tests between various developers. One of the key projects of the WG is the QBone. The goal of the QBone architecture is to specify requirements for participation in an interdomain Diffserv testbed where new IP services may be deployed and tested. The QBone architecture has the following two components:
• Measurement architecture: The QBone measurement architecture makes recommendations for the collection of a set of QoS metrics at interdomain peering points. The collected data will assist in the auditing of SLAs and the debugging of new services. Collection of data has been divided into two categories. The active collection involves IP delay variation, one-way loss, and traceroutes. The passive collection involves interface losses, loads, and link capacities. All data need to be collected for expedited forwarding (EF) as well as best-effort service.

• Service architecture: The service architecture in its draft form currently specifies only one service, called QBone premium service (QPS). The QPS is expected to provide virtual wirelike assurances end to end. According to the draft, "QPS requires strict policing at all trust boundaries, carefully provisioned priority queues on all interfaces, call admission control, and eventually some means of accounting to recoup the cost of provisioning an elevated service." For these reasons, deployment of QPS has been difficult. The working group has started looking at the QBone scavenger service (QBSS) as a second QBone service. QBSS is expected to be a lightweight service in which users and upstream leaf networks voluntarily mark some traffic for possible downgraded service at downstream congestion points. Details of the Internet2 and QBone projects can be found on the Internet2 and QBone home pages [14, 15].
8.7 RESEARCH DIRECTIONS

The research areas of resource management and bandwidth brokers have been very active within the IETF. Several drafts have been proposed for policy-based management and related issues. Many researchers have proposed architectures and mechanisms to support resource management [16, 17]. End-to-end management of virtual leased line service using the policy-based management framework is described in Rajan et al. [18]. This paper describes a process through which a customer's request for connectivity between multiple sites is translated into network level policies, and how these policies are resolved into role level policies and activated on network devices. In Bhatia et al. [19], the authors describe a policy server that provides centralized administration of packet voice gateways (soft switches). The policies are specified using a domain-independent policy-description language (PDL). They propose a policy evaluation algorithm and demonstrate that it is quite efficient and well suited for enforcing policies in real networks. Braun and Khalil [20] describe a range-based approach for SLAs where users specify an upper and lower range rather than one static value for SLA parameters. Through implementation they show that their concept works for Diffserv-based VPNs. A two-tier resource management model is proposed in [21]; it resembles the two-tier routing hierarchy of the Internet. Autonomous administrative domains make independent decisions on the strategies and protocols to use within their networks, while the aggregate traffic transiting between domains is governed by long-term bilateral agreements. Chuah et al. [17] propose a clearing house (CH) architecture to facilitate resource reservations over multiple network domains. They also propose a mechanism for local admission control. The CH design is scalable to a large user base, as it uses a hierarchy and aggregation approach. Pagani et al. [22] describe a new protocol for providing bandwidth guarantees to multicast sessions on IP-based networks. The end-to-end call admission multicast protocol (CAMP) operates as a distributed bandwidth broker and allows one to combine the benefits of both the Intserv and Diffserv approaches.
8.8 SUMMARY

Resource allocation and policy-based management are an important part of building a class-based Internet. This chapter examined the IETF RAP working group efforts
in this direction. A bandwidth broker can perform policy-based management and resource allocation in a QoS-sensitive network. The basic building blocks of the bandwidth broker and its functions were described in detail. Overviews of the COPS and COPS-PR protocols for performing policy-based management were also provided. Finally, we concluded the chapter by discussing research directions in resource allocation and policy-based management.
8.9 REVIEW QUESTIONS

1. What is a service level agreement?
2. What is a service level specification?
3. What is the role of the bandwidth broker in QoS networks?
4. What do we understand by policy-based management?
5. What are some of the inter- and intradomain protocols that may be used for building a bandwidth broker?
6. What is the difference between COPS and COPS-PR?
7. What is the role of a PEP?
8. What is the role of a PDP?
9. What is a policy information base (PIB)? Are there alternatives to PIB to perform the same function?
10. How can RSVP policy be used for QoS management?
References

[1] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss. An architecture for differentiated services. Request for Comments 2475, Internet Engineering Task Force, December 1998.

[2] J. Schnizlein, J. Strassner, M. Scherling, B. Quinn, S. Herzog, A. Huynh, M. Carlson, J. Perry, and S. Waldbusser. Terminology for policy-based management. Request for Comments 3198, Internet Engineering Task Force, November 2001.
[3] Benjamin Teitelbaum, Susan Hares, Larry Dunn, Robert Neilson, Vishy Narayan, and Francis Reichmeyer. Internet2 QBone: Building a testbed for differentiated services. IEEE Network, 13(5):8–16, September/October 1999.

[4] Resource allocation protocol working group. http://www.ietf.org/html.charters/rap-charter.html.

[5] Raju Rajan, Dinesh Verma, Sanjay Kamat, Eyal Felstaine, and Shai Herzog. A policy framework for integrated and differentiated services in the internet. IEEE Network, 13(5):36–41, September/October 1999.

[6] S. Faccin et al. Profile management framework and diameter profile management application. Internet Draft, Internet Engineering Task Force, November 2001. Work in progress.

[7] P. Calhoun et al. Diameter framework document. Internet Draft, Internet Engineering Task Force, March 2001. Work in progress.

[8] D. Durham, J. Boyle, R. Cohen, S. Herzog, R. Rajan, and A. Sastry. The COPS (common open policy service) protocol. Standards Track, Internet Engineering Task Force, January 2000. RFC2748.

[9] K. Chan, J. Seligson, D. Durham, S. Gai, K. McCloghrie, S. Herzog, F. Reichmeyer, R. Yavatkar, and A. Smith. COPS usage for policy provisioning (COPS-PR). Standards Track, Internet Engineering Task Force, March 2001. RFC3084.

[10] D. Durham and R. Yavatkar. Inside the Internet's Resource Reservation Protocol. John Wiley and Sons, New York, 1999.

[11] Differentiated services on Linux. URL:http://diffserv.sourceforge.net/.

[12] Linux 2.4 advanced routing HOWTO. URL: http://www.linuxdoc.org/HOWTO/Adv-Routing-HOWTO.html.

[13] Hans Halim and Mardi Darmadi. Implementation of bandwidth broker using COPS-PR. Honours thesis report, School of Computer Science and Engineering, UNSW, November 2000.

[14] Internet2 home page. URL:http://www.internet2.edu/.

[15] Qbone home page. URL:http://qbone.internet2.edu/.

[16] Peter Key. Service differentiation: congestion pricing, brokers and bandwidth futures. In Proc. International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Basking Ridge, New Jersey, June 1999.

[17] Chen-Nee Chuah, Lakshminarayanan Subramanian, Randy H. Katz, and Anthony D. Joseph. QoS provisioning using a clearing house architecture. In Proceedings of International Workshop on Quality of Service, pages 115–124, Pittsburgh, Pennsylvania, June 2000.

[18] Raju Rajan, Angela Chiu, and Seyhan Civanlar. A policy based approach for QoS-On-Demand over the Internet. In Proceedings of International Workshop on Quality of Service, pages 167–169, Pittsburgh, Pennsylvania, June 2000.
[19] Randeep Bhatia, Jorge Lobo, and Madhur Kohli. Policy evaluation for network management. In Proceedings of the Conference on Computer Communications (IEEE Infocom), Tel Aviv, Israel, March 2000.

[20] T. Braun and E. Khalil. A range based SLA and edge driven virtual core provisioning in Diffserv-VPN. In Proc. of the IEEE Conference on Local Computer Networks, Tampa, Florida, November 2001.

[21] A. Terzis, L. Wang, J. Ogawa, and L. Zhang. A two-tier resource management model for the Internet. In Proceedings of Global Internet, December 1999.

[22] Elena Pagani, Gian Paolo Rossi, and Dario Maggiorini. A multicast transport service with bandwidth guarantees for Diffserv networks. Lecture Notes in Computer Science, 1989:129–140, January 2001.
Chapter 9
ATM QoS

The international telecommunication union (ITU) has selected asynchronous transfer mode (ATM) as the network technology for realizing the broadband integrated services digital network (B-ISDN). ATM has been designed to satisfy the requirement of carrying both real-time and non-real-time data over a single network. In this chapter, we describe the motivation for adopting ATM technology in the Internet infrastructure. The chapter provides an overview of ATM technology, discusses the concepts of QoS mapping when ATM is integrated into IP networks, and describes the mechanisms for supporting IP Diffserv over ATM networks. This background is essential for understanding the MPLS topic. However, readers familiar with ATM networks may skip this chapter. Readers interested in learning more about ATM networks will find the references in the Further Reading section (Section 9.10) useful.
9.1 WHY ATM NETWORKS?

The following limitations or drawbacks of traditional telecommunication networks have worked as a driving force behind B-ISDN and ATM:

Service Dependence: Traditional networks were designed to support a particular service. For example, the telephone network was designed to carry only voice. This network is not suitable to carry data or video. Cable TV (CATV) was designed to carry TV channels and cannot be used for voice or data communication.

Inflexibility: Network switches and other equipment were designed for connections of specific bandwidth, such as 4-KHz analog signals for analog telephony
or 64 Kbps voice for narrowband ISDN. Such designs do not easily adapt to support new technologies, e.g., compressed voice requiring less than 64 Kbps.

Inefficiency: Traditional telecommunication networks are based on circuit switching. Once a circuit is established between two end points, the bandwidth of the circuit is exclusively dedicated to these end points, and other users cannot share the bandwidth even if it is not fully utilized. This leads to waste of network resources, especially when data services are supported, as data communication is inherently bursty and does not make continuous use of the entire circuit bandwidth.

In the late 1980s, the ITU started working on a new networking technology, known as B-ISDN, to address the above limitations of traditional networks and meet any future service requirements without requiring fundamental changes in the core networking infrastructure. Clearly, the networking technology that would support the concept of B-ISDN would have to have the following two important features:

Packet Switching or Asynchronous Transfer: Packet switching can transfer information between two end points asynchronously without requiring dedicated synchronous circuits. This reduces waste of network resources, as packets from multiple sources are queued for transmission over the same link and any bandwidth unused by one source can be used by others. In addition to increasing network utilization, packet switching can support connections with a wide range of bandwidth requirements, as it is not tied to a specific data rate. This makes packet switching future-safe.

Fast Processing: The packet switches should be able to operate at very high speeds to support the high bandwidth connections of the future, e.g., HDTV.

ATM was chosen as the transfer technology for B-ISDN because ATM was designed to be a fast packet switching technology based on small, fixed-size packets called cells. ATM is service independent, and information from any type of service is carried in ATM cells. An overview of the ATM networking architecture is given in the following sections.
9.2 PROTOCOL ARCHITECTURE

Figure 9.1 shows the protocol architecture for ATM, defined by ITU-T. ATM is the common layer used by all services running over ATM networks. All information
Figure 9.1 ATM protocol architecture (control plane with higher layers such as Q.2931, user plane with higher layers such as TCP, and management plane, over the adaptation layer (e.g., AAL5), the ATM layer, and the physical layer (e.g., SONET)).
at the ATM layer is transported in 53-byte fixed-size cells. The adaptation layer is service specific; there are different adaptation protocols for different services. The adaptation layer maps application information into ATM cells and vice versa. The physical layer supports the encoding of data onto physical transmission media. In addition to the protocol layers, ATM protocol architecture includes three separate planes: user plane, control plane, and management plane. The user plane supports transmission of user information; the control plane provides connection controls; and the management plane performs coordination among layers as well as management of network resources.
9.3 CONNECTIONS

ATM is a connection-oriented networking technology. In this section, we describe the various types of connections possible in ATM networks.
9.3.1 Virtual Channel

Transferring information through the ATM network requires first setting up a connection with the destination. All subsequent cells follow the same connection. Such an end-to-end connection is called a virtual channel (VC). There can be many VCs multiplexed onto the same physical link. A VC supports only one-way communication. For full-duplex or two-way communication, a pair of VCs, one in the forward direction and the other in the reverse direction, is established between the two end systems. The bandwidth allocated for the forward and the reverse directions can be different.

9.3.2 Virtual Path

An additional level of connections, called virtual paths or VPs, is supported by ATM networks. A number of VCs can be multiplexed onto the same VP. This way, the intermediate switches can manage a large number of VCs by simply maintaining a small number of VPs, which significantly reduces the connection management load in the ATM switches. This also allows for two levels of switching in the ATM network. Figure 9.2 illustrates the concept of multiplexing VCs and VPs into a physical channel.

9.3.3 Permanent and Switched Virtual Circuits

Two different types of connections can be set up in ATM networks, depending on the user requirements. Permanent virtual circuits (PVCs) are set up between two points by the network operator. PVCs are more appropriate for high volumes of data transfer, and they usually stay in place for months before being torn down by the operator. A semipermanent PVC can be set up and torn down more frequently as need arises. The duration of semipermanent PVCs usually ranges from a few days to months. PVCs and semipermanent PVCs are suitable for implementing virtual leased lines over ATM networks. It is worth mentioning that data transfer does not encounter any VC setup delay with PVCs, as these PVCs are set up prior to any data transfer. For low volume, bursty data transfers lasting only a few seconds to a few minutes, a switched virtual circuit (SVC) is set up and released by the user using signaling from the application. A finite amount of time is required to set up the circuit before data transmission can start.
Figure 9.2 Relationship between virtual circuits and virtual paths (multiple VCs are multiplexed into VPs, which are in turn multiplexed into a physical channel).
PVCs do not provide a scalable solution for ATM networks. When there are a large number of sources and destinations, PVC-based solutions will require a large number of virtual circuits to be maintained in the network. Although most early implementations of ATM used PVCs for simplicity (no signaling), future ATM networks will increasingly use SVCs to dynamically set up and release ATM connections on demand.
9.4 INTERFACES

Like any large network, high-speed ATM networks have a core where high-speed switches are interconnected via high-capacity links, while hosts are connected to the edge switches using relatively low-speed links. The link-layer protocol used for interswitch communication is slightly different from that used between a host and a switch. Therefore, there are two main categories of ATM interfaces: the user-to-network
interface (UNI) to connect a host to a switch, and the network-to-network interface (NNI) to connect a switch to another switch. Figure 9.3 illustrates the use of these two interfaces. The cell formats, as we will see in the following section, are also slightly different for these two interfaces.

Figure 9.3 User-to-network and network-to-network interfaces.
9.5 CELL FORMATS

ATM cells have 5-byte headers and 48-byte payloads. The cell formats for the UNI and the NNI interfaces are shown in Figures 9.4 and 9.5, respectively. As we can see, the only difference is the presence of a flow control field (the GFC field) in the UNI, at the expense of a smaller virtual path field. There are a total of six fields in the UNI cell header and five in the NNI cell header:

Generic Flow Control (GFC, 4 bits): This field was originally included in the UNI format to support a flow control protocol between the user and the network. However, no such flow control has been standardized.

Virtual Path Identifier (VPI) and Virtual Channel Identifier (VCI): These two fields carry the identifiers of the VP and the VC to which the cell belongs. These identifiers are used by the ATM switches to route (or switch) a cell to the correct destination. The VPI field is 8 bits long in the UNI format and 12 bits long in the NNI format. The VCI is 16 bits long in both UNI and NNI.

Payload Type (PT, 3 bits): The high-end bit indicates whether the cell is a data (set to 0) or a management (set to 1) cell. For data cells, the ATM switches use the middle bit to indicate network congestion (1 means congestion; 0 means no congestion) to the end systems, and the low-end bit can be used to identify two types of cells. For AAL5, the cells carrying the last segment can be marked as a different type from the cells carrying other segments (useful for reassembly).
Figure 9.4 Cell format for the user-to-network interface.
Cell Loss Priority (CLP): CLP allows two levels of priority for user cells. During congestion, the ATM switches drop the low priority cells (CLP = 1) before discarding high priority (CLP = 0) ones.

Header Error Correction (HEC, 8 bits): These 8 bits contain an error code calculated on the remaining 32 bits of the cell header. The 8-bit error code is capable of correcting any single-bit error and detecting most multibit errors. However, the receiver does not always try to correct an error when it detects one. The receiver works in two modes, correcting mode and detecting mode. It will correct a single-bit error only when it is in the correcting mode. All cells with an error in the header, even if the error is a single-bit error, are discarded when the receiver is in the detecting mode. The receiver starts in the correcting mode and switches to the detecting mode when it receives a cell with an error in the header. It remains in the detecting mode as long as it continues to receive cells with incorrect headers. It switches back to the correcting mode when it receives a cell with no error in the header.
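The two-mode receiver behavior amounts to a small state machine. The following is a minimal sketch of that logic; the class name and the header_error/single_bit inputs (assumed to come from the HEC syndrome computation) are our own illustration, not a standard API.

```python
class HecReceiver:
    """Minimal sketch of the two-mode HEC receiver state machine."""

    def __init__(self):
        self.correcting = True  # the receiver starts in correcting mode

    def on_cell(self, header_error, single_bit):
        """Return True if the cell is accepted (possibly after correction)."""
        if not header_error:
            self.correcting = True   # error-free header: back to correcting mode
            return True
        if self.correcting and single_bit:
            self.correcting = False  # correct the single-bit error, switch mode
            return True              # cell accepted with a corrected header
        self.correcting = False      # detecting mode: discard all errored cells
        return False
```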
Figure 9.5 Cell format for the network-to-network interface.
9.6 QoS SUPPORT

ATM has several mechanisms in place to support QoS guarantees for the end user. In this section, we describe the basic mechanisms for supporting QoS in ATM networks.

9.6.1 Traffic Contract

When an ATM connection is established, the user and the network enter into a traffic contract that has to be honored by both parties for the entire duration of the connection. The traffic contract has two parts, as shown in Figure 9.6: the traffic descriptor part defines the traffic pattern the source promises not to violate, and the QoS descriptor part defines the QoS the network promises to guarantee to the user for this connection. The traffic and QoS descriptors are defined by several parameters, as described below.
Figure 9.6 ATM traffic contract: a traffic descriptor part obeyed by the user (traffic parameters PCR, SCR, MCR, MBS, MFS, CDVT) and a QoS descriptor part guaranteed by the network (QoS parameters maxCTD, CDV, CLR).
9.6.2 Traffic Descriptions

Combinations of the following five parameters define the source traffic pattern for a given connection:

Peak Cell Rate (PCR): PCR defines the maximum rate at which the source can submit cells to the network over this connection.

Sustainable Cell Rate (SCR): SCR defines the upper bound on the average cell rate for this connection. If the average cell rate exceeds SCR, the source would be violating the traffic contract.

Minimum Cell Rate (MCR): MCR is the minimum cell rate commitment asked from the network.

Maximum Burst Size (MBS): MBS is the maximum number of cells that can be sent back-to-back at the peak rate.

Maximum Frame Size (MFS): MFS is the maximum size of a frame that can be carried over ATM.

In addition to the above five source traffic descriptors, another parameter used to define the traffic pattern of an ATM connection is the cell delay variation tolerance (CDVT). Even when a source is transmitting at the peak cell rate, the delay between cells is not always constant at the user network interface (UNI). The variation in delay is caused by factors such as the transmission of physical layer overhead cells, cell multiplexing, and so on. CDVT defines the maximum cell delay variation the connection is allowed at the UNI.
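Conformance with PCR and CDVT is typically checked with the generic cell rate algorithm (GCRA) defined in the ATM Forum's traffic management specification [2]. The sketch below shows its virtual-scheduling form; the class and parameter names are our own illustration.

```python
class GCRA:
    """Virtual-scheduling form of the generic cell rate algorithm.

    increment: expected inter-cell time T (e.g., 1/PCR seconds)
    limit:     tolerance tau (e.g., CDVT in seconds)
    """

    def __init__(self, increment, limit):
        self.T = increment
        self.tau = limit
        self.tat = 0.0  # theoretical arrival time of the next cell

    def conforming(self, t):
        """Test a cell arriving at time t; update state if it conforms."""
        if t < self.tat - self.tau:
            return False                  # cell arrived too early: nonconforming
        self.tat = max(t, self.tat) + self.T
        return True

# Example: police a PCR of 1,000 cells/s with a CDVT of 0.5 ms.
police = GCRA(increment=1e-3, limit=5e-4)
```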
9.6.3 QoS Parameters

ATM defines absolute end-to-end guarantees using several QoS parameters. The following QoS parameters are defined by the ATM Forum:

Maximum Cell Transfer Delay (maxCTD): Cell transfer delay (CTD) is the time spent by a cell in the network, including any fixed delays, such as propagation and switching delays, and variable delays, such as queuing delay. maxCTD imposes an upper limit on CTD.

Cell Delay Variation (CDV): CDV is the difference between the best case and the worst case of CTD. The best case is the fixed delay, and the worst case is the maxCTD.

Cell Loss Ratio (CLR): This is the ratio of lost cells to total cells transmitted. Although the primary cause of cell loss in ATM networks is buffer overflow at intermediate switches, this is not the exclusive cause. Cells may also be lost due to misrouting.

The above parameters are used to define absolute, quantitative guarantees for cells traveling through a given VC. In addition to these absolute parameters, the ATM Forum has recently defined a new parameter, called the behavior class selector (BCS), to support differential treatment of groups of VCs belonging to different behavior (priority) classes [1]. An ATM network provider can define a set of behavior classes, where a given class from this set can be indicated via the BCS parameter. The support of behavior classes and BCS is optional. Later in the chapter we will see how BCS can be used to support Diffserv over ATM.

9.6.4 Service Classes

ATM supports five different service classes, each one providing a different level of QoS. The class of service is specified by the source during connection setup. Depending on the required service, the network reserves appropriate resources to satisfy the QoS requirements. The following services have been defined by the ATM Forum [2]:

Constant Bit Rate (CBR): The CBR service is similar to a leased line and is characterized by the peak cell rate (PCR). It requires a guaranteed bandwidth and delay from the network.
Variable Bit Rate (VBR): The VBR service is characterized by the peak cell rate, the sustainable cell rate, and the maximum burst size. The VBR service is further divided into real-time VBR (rt-VBR) and non-real-time VBR (nrt-VBR), depending on the guarantees required from the network.

Available Bit Rate (ABR): The ABR service was designed for data applications and uses rate-based congestion control to manage the traffic in the network. In case of network congestion, the network uses feedback to inform the sources to reduce their data rates. An MCR is negotiated at connection setup, but the MCR may be zero. The feedback is implemented by resource management (RM) cells. ABR sources periodically send special RM cells that are marked by the switches with congestion information and the amount of bandwidth available at the switches. The destination turns the RM cells around toward the sources. The sources, on receiving the RM cells, adjust their rates depending on the network congestion and the amount of bandwidth available at the switches. A set of source, destination, and network rules specifies the operation of ABR connections. As long as the ABR sources conform to the rate specified by the network, the loss rate of an ABR connection is very low.

Unspecified Bit Rate (UBR): The UBR service does not provide the user with any bandwidth or delay guarantee. The network tries its best to carry such traffic. In case of congestion, the network drops cells and relies on the end applications to detect such losses and initiate retransmission. UBR is equivalent to the best-effort service provided by the IP layer in the current Internet. The service provided to a UBR connection can optionally be differentiated by associating a BCS parameter with the connection. A UBR connection associated with a BCS parameter is called differentiated UBR. Later in the chapter, we will see how differentiated UBR can be used to map Diffserv over ATM.

Guaranteed Frame Rate (GFR): The GFR service is designed to work at the frame boundary rather than the cell boundary for the purposes of cell discard during congestion. The frames refer to the upper-layer protocol data units (PDUs). Like ABR, GFR has an MCR, but it has no flow control.

The relevance of the traffic and QoS parameters to each service class is shown in Table 9.1. Figure 9.7 shows the usage of link bandwidth among the various service classes during steady-state operation, i.e., a period of time without any addition or deletion of VCs. A constant amount of bandwidth is reserved and used by the CBR sources, followed by a variable amount of bandwidth used by the VBR sources.
Table 9.1 Traffic and QoS Specifications for Different Service Classes

Service Class | Traffic Parameter | QoS Parameter
CBR | PCR | maxCTD, CDV, CLR
rt-VBR | PCR, SCR, MBS | maxCTD, CDV, CLR
nrt-VBR | PCR, SCR, MBS | CLR
ABR | PCR, MCR | CLR (some networks may not allow specification of a CLR)
UBR | PCR | No QoS, but service differentiation via BCS
GFR | PCR, MCR, MBS, MFS | CLR (some networks may not allow specification of a CLR)
There is no mechanism to slow down the data rate of the CBR or VBR connections; hence, congestion control in ATM networks is achieved by reducing the data rate of the ABR sources. It should be noted, however, that it is possible to reserve a nonzero minimum bandwidth for an ABR connection, in which case the ABR connection continues to receive at least the minimum bandwidth plus any leftover bandwidth not used by the CBR and VBR sources. Any bandwidth not used by CBR, VBR, or ABR is used to transmit UBR traffic.

Figure 9.7 Distribution of link bandwidth among various traffic classes during steady state.
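The source's reaction to returning RM cells follows an increase/decrease rule. Below is a simplified sketch of the source rate adjustment; the function signature and the default RIF/RDF values are illustrative assumptions, not the full set of TM 4.1 source rules.

```python
def adjust_acr(acr, pcr, mcr, ci, ni, rif=1 / 16, rdf=1 / 16):
    """Simplified ABR source rate adjustment on a backward RM cell.

    acr: current allowed cell rate; ci, ni: congestion / no-increase bits;
    rif, rdf: rate increase / decrease factors negotiated at setup.
    """
    if ci:
        acr -= acr * rdf            # congestion: multiplicative decrease
    elif not ni:
        acr += rif * pcr            # no congestion: additive increase
    return max(mcr, min(acr, pcr))  # ACR is kept within [MCR, PCR]
```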
9.7 ADAPTATION LAYERS

Different applications have different service requirements in terms of timing, delay, variability of transmission rate, and so on. The ATM adaptation layer (AAL), which resides on top of the ATM layer, allows any type of service or protocol to connect to ATM. The AAL accepts data from any other protocol and transmits the data in fixed-size ATM cells. Each adaptation layer is further divided into two sublayers: the segmentation and reassembly (SAR) sublayer and the convergence sublayer (CS). The CS can add more information to the data units, such as a sequence number. The higher layer protocol data units (PDUs) are broken down into fixed-size ATM cells by the SAR sublayer. To satisfy the different service requirements of the applications, the following four AALs have been defined:
• AAL1 for constant bit rate services, such as 64 Kbps voice;
• AAL2 to carry variable bit rate data, such as MPEG-2 coded video;

• AAL3/4 for transporting data from connection-oriented packet switching networks;
• AAL5 to support connectionless data communications, such as TCP/IP traffic.

Since, in this book, we are primarily concerned with IP traffic, we are going to discuss the details of only AAL5. The protocol stack for running TCP/IP applications over ATM using AAL5 is shown in Figure 9.8. AAL5 does not have any header; it has only an 8-octet trailer. The format and the protocol fields of the AAL5 PDU at the CS layer are shown in Figure 9.9, with the fields having the following meaning:

Payload (0-65,535 octets): A variable-length field to hold the higher layer data.

PAD (0-47 octets): A variable-length field used to make the length of the AAL5 PDU a multiple of 48 octets; each 48-octet block is then carried in one ATM cell.
Figure 9.8 Protocol stack for TCP/IP over ATM: Application over TCP/UDP over IP over AAL5 over ATM.
Figure 9.9 Format of AAL5 encapsulation: Payload (0-65,535 octets) and PAD (0-47 octets), followed by an 8-octet trailer consisting of UU (1 octet), CPI (1 octet), Length (2 octets), and CRC (4 octets).
User-to-User (UU, 1 octet): A single-octet field not directly used by the adaptation layer, but used by a higher layer for purposes like sequencing and multiplexing.

Common Part Indicator (CPI, 1 octet): Indicates which interpretation should be used to interpret the rest of the fields in the trailer. Since only one interpretation is defined at the moment, this field is not used.

Length (2 octets): Specifies the length of the data being carried in the payload field.

CRC (4 octets): Used to detect bit errors in the AAL5 PDU using a cyclic redundancy check.
The AAL5 CPCS-PDU is broken down into segments of 48 octets that are carried as payload of ATM cells. A 5-octet header is added to the 48-byte payload to form an ATM cell that is then transported over the ATM network.
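The padding and segmentation arithmetic is easy to check in a few lines. A minimal sketch (the function name is ours):

```python
def aal5_pdu_layout(payload_len):
    """Compute the PAD length and ATM cell count for an AAL5 CPCS-PDU.

    Payload + PAD + the 8-octet trailer must be a multiple of 48 octets
    so that the PDU segments exactly into ATM cell payloads.
    """
    TRAILER = 8
    pad = (48 - (payload_len + TRAILER) % 48) % 48
    cells = (payload_len + TRAILER + pad) // 48
    return pad, cells

# A 1,500-octet IP datagram: aal5_pdu_layout(1500) -> (28, 32), i.e.,
# 28 octets of PAD and 32 cells (32 x 53 = 1,696 octets on the wire).
```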
9.8 IP-ATM INTEGRATION

The ATM network has been designed to be used both as a local area network (LAN) and a wide area network (WAN). Consequently, ATM was once envisioned to become a ubiquitous network that would replace most current networks. However, because of the large installed base of legacy LANs (e.g., Ethernet) and the associated TCP/IP infrastructure, and the higher pricing of ATM equipment, the current trend is to continue to use legacy LANs as much as possible and use ATM as a high-speed backbone network to interconnect legacy LANs using TCP/IP. Therefore, TCP/IP and ATM are going to coexist and interwork for the foreseeable future. This chapter describes various configurations and architectures that have been proposed for running TCP/IP over ATM networks, and discusses the QoS interworking between these two networks.

9.8.1 ATM Deployment in IP Networks

Running TCP/IP over ATM networks involves, among other things, breaking large, variable-size TCP/IP packets into small, fixed-size ATM cells. Depending on the point in the network where the breaking of the IP packets takes place, there are two scenarios for deploying TCP/IP over ATM networks: ATM to the desktop and ATM in the backbone. The ATM-to-the-desktop approach is based on connecting the hosts directly to an ATM network using ATM network interface cards, as shown in Figure 9.10. The ATM network essentially replaces the legacy LAN, and ATM is configured to operate as an emulated LAN. The segmentation of the TCP/IP packets into ATM cells takes place at the hosts. In the ATM-in-the-backbone approach, the hosts in an enterprise are connected using legacy LANs, and a gateway connects the enterprise to the ATM network using a multiprotocol router, as shown in Figure 9.11. In this case, the segmentation of the TCP/IP packets into ATM cells occurs at the gateway, and the hosts are not aware of the presence of ATM.
Figure 9.10 TCP/IP hosts connected to an ATM LAN using the ATM-to-the-desktop approach.
9.8.2 Encapsulation of IP Datagrams into ATM Cells
To transfer IP traffic over ATM networks, IP datagrams must be encapsulated into ATM cells. In this section, we discuss the standard methods of encapsulating IP into ATM. Two different methods of encapsulating connectionless data over an ATM network using AAL5 have been suggested in RFC 1483 [3]. The first method, called LLC/SNAP encapsulation, multiplexes a number of connectionless data streams (belonging to different protocols) over a single ATM VC using a logical link control (LLC) header. The second method uses individual ATM VCs to carry packets belonging to different protocols; hence, it is called VC-based multiplexing. Since both routers (or hosts) working at layer 3 and bridges (or LAN emulation hosts) working at layer 2 can be connected to ATM networks, the two standard encapsulation methods use slightly different formats for carrying layer 3 (routed) and layer 2 (bridged) protocols. The following two subsections describe LLC/SNAP encapsulation and VC-based multiplexing for both routed and bridged protocols.
Figure 9.11 TCP/IP networking using the ATM-in-the-backbone approach.
LLC/SNAP Encapsulation

The LLC/SNAP multiplexing technique is used when several protocols are carried over the same ATM VC. The AAL5 payload formats for the routed and bridged protocols vary and are given below.

Routed Protocols

To enable the destination to differentiate between the different protocols being carried over the same VC, the source prepends a 3-octet logical link control (LLC) header and a 5-octet subnetwork attachment point (SNAP) header, which are carried in the AAL5 payload of the transmitted data. The AAL5 payload for encapsulating an IP packet is shown in Figure 9.12. The SNAP header consists of a 3-octet organizationally unique identifier (OUI) and a 2-octet protocol identifier (PID). The OUI identifies the organization that administers the meaning of the codes used to identify different protocols in the PID field.

Bridged Protocols

The payload for bridged protocols differs slightly from the payload of the routed protocols described above. As an example, the AAL5 payload for the bridged Ethernet is shown in Figure 9.13. A value of 0xAA-AA-03 in the LLC field indicates the presence of the SNAP header consisting of the OUI and PID fields.
Figure 9.12 AAL5 payload for encapsulating an IP packet: LLC (3 octets), OUI (3 octets), PID (2 octets), and a non-ISO PDU (up to 2^16 - 9 octets).
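For routed IPv4, the LLC value 0xAA-AA-03 announces a SNAP header, the all-zero OUI indicates that the PID carries an EtherType, and the EtherType for IPv4 is 0x08-00. A minimal sketch of building this AAL5 payload is shown below; the function name is our own.

```python
def llc_snap_ipv4(ip_datagram: bytes) -> bytes:
    """Build the AAL5 payload for a routed IPv4 PDU (LLC/SNAP method)."""
    llc = bytes([0xAA, 0xAA, 0x03])  # LLC: a SNAP header follows
    oui = bytes([0x00, 0x00, 0x00])  # OUI 0x00-00-00: PID is an EtherType
    pid = bytes([0x08, 0x00])        # EtherType for IPv4
    return llc + oui + pid + ip_datagram
```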
A value of 0x00-80-C2 in the OUI field represents the organizational code of the IEEE 802.1 working group. A PID value of 0x00-01 represents the presence of a LAN FCS field and a bridged Ethernet protocol, whereas a PID value of 0x00-07 represents a bridged Ethernet protocol with no LAN FCS field. The LAN FCS field contains the frame check sequence of the original PDU. Since the PDUs of all the different protocols are carried over the same VC in the LLC/SNAP encapsulation method, the method is suitable when it is not convenient or possible to dynamically open a large number of VCs without incurring significant cost.

VC-Based Multiplexing

The VC-based multiplexing scheme is used when a large number of VCs can be opened without incurring significant cost. In the VC-based multiplexing scheme, a host opens a number of VCs to the destination, each VC being used to carry packets of a different protocol, as shown in Figure 9.14. The destination host differentiates between the PDUs of the different protocols by the different VC numbers. There is no overhead (such as the LLC overhead used in the LLC multiplexing scheme) to differentiate packets from different protocols. The advantage of this method is that it requires minimal bandwidth to transmit data, and only a small overhead to process headers. This scheme is better than the LLC multiplexing scheme when a large number of VCs can be dynamically opened very quickly without incurring much cost. It is anticipated that this scheme will prevail in private ATM networks.
Figure 9.13 AAL5 payload for the bridged Ethernet: LLC 0xAA-AA-03 (3 octets), OUI 0x00-80-C2 (3 octets), PID 0x00-01 or 0x00-07 (2 octets), PAD, MAC destination address, remainder of MAC frame, and LAN FCS (if PID is 0x00-01).
Figure 9.14 Two hosts exchanging multiprotocol data (IP, NetBEUI, IPX over VC1, VC2, VC3) over ATM using VC-based multiplexing.
Figure 9.15 AAL5 payload format for encapsulating IP datagrams in the VC-based multiplexing scheme: the IP datagram (up to 65,535 octets) alone.
We describe below the techniques to implement the VC-based multiplexing scheme for the routed and bridged protocols.

Routed Protocols

Since the destination does not need to differentiate between PDUs carried in a particular VC, the AAL5 payload field using VC-based multiplexing and routed PDUs consists of only the IP PDU, as shown in Figure 9.15 for IP over ATM.

Bridged Protocols

When PDUs of bridged protocols are carried using the VC-based multiplexing scheme, the payload format is similar to that of the LLC encapsulation scheme, except that the LLC, OUI, and PID fields are no longer required. The payload format for a bridged Ethernet carried using the VC-based multiplexing scheme is shown in Figure 9.16.
9.9 IP-ATM QoS MAPPING

The QoS models and parameters native to connection-oriented ATM (defined by the ATM Forum) are different from those defined for connectionless IP networks by the IETF. Therefore, when IP traffic needs to be carried over ATM networks, appropriate QoS mapping at the IP-ATM boundary is necessary to preserve the desired QoS indicated through the IP QoS mechanisms. In this section, we discuss the QoS mapping techniques that allow us to map IP QoS onto the QoS requirements of ATM networks.
Figure 9.16 Payload for a bridged Ethernet in the VC-based multiplexing scheme: PAD, MAC destination address, remainder of MAC frame, and LAN FCS (if PID is 0x00-01).
Table 9.2 Recommended Mapping of Intserv Service Classes to ATM Service Classes

Intserv Service Class | ATM Service Class
Guaranteed service | CBR or rt-VBR
Controlled load | CBR or rt-VBR
Best effort | UBR or ABR
9.9.1 Intserv over ATM

Table 9.2 shows the mapping of Intserv service classes to ATM service classes recommended by RFC 2381 [3]. The traffic parameter mappings of the guaranteed and the controlled load services are shown in Tables 9.3 and 9.4, respectively.

Table 9.3 Traffic Parameter Mapping for Intserv's Guaranteed Service

ATM Parameter | Guaranteed Service Parameter
PCR | p
SCR | R
MBS | b

Table 9.4 Traffic Parameter Mapping for Intserv's Controlled Load Service

ATM Parameter | Controlled Load Parameter
PCR | p
SCR | r
MBS | b

9.9.2 Diffserv over ATM

There is a difference between the ways Diffserv and ATM support QoS. ATM explicitly provides a finite number of service classes with firm end-to-end QoS guarantees on parameters like maximum delay and loss rate. Diffserv does not have service classes; it supports well-defined PHBs that can be used as building blocks to build a desired service. Diffserv can support both types of services: services with absolute guarantees and services with relative differentiation. Different mapping techniques are used for these two types of services:

Mapping Services with Absolute Guarantees: The EF PHB can be used to define services with absolute guarantees (also known as premium services). For example, a virtual leased line service with a guaranteed bandwidth can be built by marking all IP datagrams with the DSCP of 101110. This service can be easily mapped over an ATM core using the ATM CBR service.

Mapping Services with Differential Treatment: The AF PHBs can be used to establish several priority services, such as gold, silver, and bronze, in Diffserv networks. There are no absolute guarantees for these services, but Diffserv guarantees that gold packets will always receive better treatment than silver and bronze packets in the intermediate routers. Such differentiated services can be mapped to the ATM network using differentiated UBR. Figure 9.17 illustrates the mapping of three Diffserv AF classes (representing gold, silver, and bronze services) onto three differentiated UBR VCs between a pair of IP-ATM devices. Each UBR VC has been assigned a different BCS. A mapping table can be instantiated to map a particular Diffserv DSCP to a particular ATM VC, depending on the class and loss priority. A mapping module (the QoS mapper in Figure 9.17) can then determine which VC to select for an incoming datagram based on the DSCP in the datagram header.
Figure 9.17 Mapping of Diffserv to ATM using differentiated UBR connections: an incoming IP datagram is assigned by the QoS mapper (using a QoS mapping table) to one of three differentiated UBR VCs between a pair of IP-ATM devices.
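The mapping table and the selection step can be sketched in a few lines. The DSCP values below are the standard AF class-1/2/3 low-drop codepoints, but the gold/silver/bronze assignment and the VC numbers are our own hypothetical illustration.

```python
# Hypothetical QoS mapping table: AF DSCPs -> differentiated UBR VC numbers.
qos_mapping_table = {
    0b001010: 1,  # AF11 "gold"   -> differentiated UBR VC 1
    0b010010: 2,  # AF21 "silver" -> differentiated UBR VC 2
    0b011010: 3,  # AF31 "bronze" -> differentiated UBR VC 3
}

def select_vc(dscp, default_vc=3):
    """QoS mapper: pick the outgoing VC for a datagram's DSCP field."""
    return qos_mapping_table.get(dscp, default_vc)
```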
9.9.3 Performance Implications of QoS Mapping

The QoS mapping techniques discussed in the previous sections provide an approximation for the QoS translation from IP to ATM. There are several issues in ensuring an acceptable mapping between these two different networks. For example, ATM cells have only two drop precedences, but Diffserv AF classes have three. Therefore, if an IP provider is using an AF class with three drop precedences, these will have to be mapped onto two drop precedences. Another mismatch is in the units used to measure rates. IP uses bytes per second to measure data rate, but ATM uses cells per second. Translating cells per second into bytes per second is not that simple, due to padding in some cells. The padding is variable and depends on the IP datagram size. Due to these inherent difficulties, QoS mapping may lead to less-than-expected results. Studies conducted in an experimental environment [4] have shown that IP-ATM QoS mapping can indeed have a significant impact on the QoS received by the applications. It is therefore advisable to rigorously study the performance of a given mapping before using it in production systems. It may be necessary to overprovision resources to accommodate inaccuracies in QoS mapping between IP and ATM networks.

9.9.4 MPLS Solution

QoS mapping provides a rather clumsy way of integrating IP QoS and ATM QoS. Multiprotocol label switching (MPLS) provides a more elegant solution for IP over ATM. With MPLS, "connections" or paths can be set up over IP networks
where these connections can actually be established over ATM VCs, making it much simpler to support QoS over IP and ATM. MPLS is discussed in Chapter 10.
9.10 RESEARCH DIRECTIONS

ATM research has already matured, with products currently deployed in backbone networks. We provide a few references for interested readers. Details of IP over ATM may be found in Siu and Jain [5] and in Armitage and Adams [6]. IP multicast over ATM has been discussed in Armitage [7]. Performance of TCP/IP protocols over ATM has been discussed in Kalyanaraman et al. [8], and Hoang and Wang [9]. In Ahuja et al. [10] the authors describe the design, implementation, and performance tuning of a transport layer targeted specifically for ATM networks. A survey on congestion control and traffic management in ATM networks has been provided in Jain [11]. Bandwidth management and admission control issues in ATM networks have been discussed in Chong et al. [12] and Elwalid et al. [13]. Crowcroft et al. [14] provide a comparison of IETF and ATM service models. The interworking and mapping of Intserv QoS and ATM QoS have been recommended in three IETF RFCs: RFC 2379 [15], RFC 2380 [16], and RFC 2381 [3].
9.11 SUMMARY

Like IP, ATM is a packet switching networking technology. ATM is connection-oriented and can support QoS guarantees for each connection. To support QoS, ATM relies on a traffic contract between an application session and the network. The application describes its traffic using a set of traffic description parameters. Given these traffic parameters, the agreed QoS is defined using a set of QoS parameters. ATM has several service classes; each uses a subset of all possible QoS parameters to define the service. ATM uses very short packets, called cells. Large IP datagrams are transmitted over ATM networks using an adaptation layer. For QoS-capable IP networks, a QoS mapping is needed to run IP over high-speed ATM networks. Differentiated UBR is a new option defined by the ATM Forum to facilitate mapping of Diffserv over ATM networks.
9.12 FURTHER READING

Numerous books, articles, and standard documents (RFCs and ATM Forum documents) have been published on ATM in the past decade or so. For nontechnical readers, ATM for Dummies [17] provides a good tutorial on ATM. ATM Theory and Application [18] is a good book for technical readers. Performance analysts interested in the performance issues and concepts in ATM networks will find Asynchronous Transfer Mode Networks: Performance Issues [19] a useful reference. Good technical coverage of ATM traffic control functions, such as the policing functions used to enforce a given cell rate, can be found in Stallings's High-Speed Networks and Internets: Performance and Quality of Service [20].
9.13 REVIEW QUESTIONS

1. Where does ATM fit in the Internet infrastructure?
2. Why are ATM cells so short?
3. What are the main functions of AAL5?
4. What are the two standard encapsulating techniques for transporting IP datagrams over ATM networks? Explain the differences between these two techniques.
5. What is the benefit of virtual paths?
6. Why do NNI cells not have the flow control field in the cell header?
7. Describe the QoS parameters used in ATM networks.
8. What is the key difference between GFR and any other service?
9. What is QoS mapping? What are the challenges in defining an effective QoS mapping between IP and ATM?
10. What is differentiated UBR? How can differentiated UBR be used to support Diffserv over ATM?
References

[1] ATM Forum. Addendum to TM Version 4.1: Differentiated UBR, July 2000.
[2] ATM Forum. Traffic Management Specification Version 4.1, March 1999.

[3] M. Garrett and M. Borden. Interoperation of Controlled-Load Service and Guaranteed Service with ATM. Request for Comments 2381, Internet Engineering Task Force, August 1998.

[4] P. Francis-Cobley and N. Davies. Performance Implications of QoS Mapping in Heterogeneous Networks Involving ATM. In IEEE International Conference on ATM, pages 529–535, France, June 1998.

[5] K.-Y. Siu and R. Jain. A brief overview of ATM: protocol layers, LAN emulation, and traffic management. ACM Computer Communication Review, 25(2):6–20, April 1995.

[6] Grenville J. Armitage and Keith M. Adams. How inefficient is IP over ATM anyway? IEEE Network, 9(1):18–26, January/February 1995.

[7] G. J. Armitage. Multicast and multiprotocol support for ATM based internets. ACM Computer Communication Review, 25(2):34–46, April 1995.

[8] Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal, and Seong-Cheol Kim. Performance and buffering requirements of Internet protocols over ATM ABR and UBR services. IEEE Communications Magazine, 36(6):152–157, September 1996.

[9] D. B. Hoang and Z. Wang. Performance of TCP applications over ATM networks with ABR and UBR services: a simulation analysis. Computer Communications, 23(9):802–815, April 2000.

[10] R. Ahuja, S. Keshav, and H. Saran. Design, implementation, and performance of a native mode ATM transport layer. In Proceedings of the Conference on Computer Communications (IEEE Infocom), San Francisco, California, March 1996.

[11] R. Jain. Congestion control and traffic management in ATM networks: recent advances and a survey. Computer Networks and ISDN Systems, February 1995.

[12] Song Chong, San-qi Li, and Joydeep Ghosh. Dynamic bandwidth allocation for efficient transport of real-time VBR video over ATM. IEEE Journal on Selected Areas in Communications, 13, January 1995.

[13] Anwar Elwalid, Debasis Mitra, and Robert H. Wentworth. A new approach for allocating buffers and bandwidth to heterogeneous regulated traffic in an ATM node. IEEE Journal on Selected Areas in Communications, 13(6):1115–1127, August 1995.

[14] J. Crowcroft, Z. Wang, A. Smith, and J. Adams. A rough comparison of the IETF and ATM service models. IEEE Network, 9(6):12–16, November 1995.

[15] L. Berger. RSVP over ATM Implementation Guidelines. Request for Comments 2379, Internet Engineering Task Force, August 1998.

[16] L. Berger. RSVP over ATM Implementation Requirements. Request for Comments 2380, Internet Engineering Task Force, August 1998.

[17] C. Gadecki and C. Heckart. ATM for Dummies. Hungry Minds Inc., 1997.
[18] D. McDysan and D. L. Spohn. ATM Theory and Application. McGraw-Hill, New York, 1998.

[19] R. Onvural. Asynchronous Transfer Mode Networks: Performance Issues. Artech House, Norwood, Massachusetts, 1995.

[20] W. Stallings. High-Speed Networks and Internets: Performance and Quality of Service. Prentice Hall, Upper Saddle River, New Jersey, 2002.
Chapter 10

Multiprotocol Label Switching

The early 1990s saw the introduction of ATM by service providers to provide QoS guarantees that were not possible over the best-effort IP network. However, this approach had several deficiencies, including late deployment of end-to-end ATM connectivity, scalability problems, and the high management cost of maintaining two separate networks, namely IP and ATM. Also, ATM- and Frame Relay-based VPNs and SVCs proved difficult to provision and hard to maintain. Researchers started working on alternative ways of engineering IP networks that could provide QoS, integration of IP and ATM, as well as VPN provisioning. This effort resulted in the development of multiprotocol label switching (MPLS). It is a new technology aimed at reducing the packet forwarding bottleneck at backbone routers. It enables routers at the edge of a network to apply simple labels to packets. The label swapping process is similar to ATM VCI/VPI swapping. ATM switches or existing routers in the network core can switch packets according to the labels with minimal lookup overhead.
10.1 PROPRIETARY PROTOCOLS

Several incarnations of MPLS existed from different vendors before standardization efforts started. Some of the initial efforts were designed to increase the speed of packet-level forwarding by simplifying the forwarding mechanism. These approaches assumed a cell switched core network. Examples of such efforts included the Cell Switched Router (CSR) and IP Switching. Later approaches involved both cell-based and packet-based cores and introduced traffic engineering aspects (more discussion on this later in the chapter). Cisco's Tag Switching and IBM's Aggregate
Route-Based IP Switching (ARIS) were prominent efforts in this direction. Details of these protocols can be found in [1]. Multiprotocol label switching (MPLS) is currently under development by the IETF MPLS working group [2]. The main objective of this group is to standardize this variety of proprietary approaches so that interoperable products can be developed.
10.2 MOTIVATION

The following are some of the motivating factors behind the development of this new solution:
• MPLS combines layer 2 switching and layer 3 routing functions. It is a convergence of connection-oriented (ATM) forwarding techniques and the Internet's routing protocols. This approach has the advantage of the scalability of the Internet's routing protocols and the traffic engineering benefits achieved by the optimizing capabilities of ATM switches.
• The MPLS network obviates the need for the expensive longest-match lookup procedure for each packet at each router along the path to the destination. This results in significant performance enhancement.
• The hierarchy of routing supported by the MPLS network reduces the size of the routing table for internal routers within a domain. These routers do not need to keep routing information for transit traffic. We discuss hierarchical routing further in Section 10.7.
• MPLS is capable of forwarding packets via routes that are not on the shortest path. It is not possible to do nonshortest-path routing with conventional IP routing. MPLS also facilitates the establishment of tunnels between domains that do not support label switched paths [3].

These novel features make MPLS a technology of choice for service providers to launch new services, such as QoS-based virtual private networks, over an MPLS backbone.
10.3 MPLS BASICS

As is traditional with various standards bodies, the MPLS WG has its own set of terminology. We introduce below a few terms that will be used subsequently in this chapter:
• Flow: A single instance of an application-to-application data transfer;

• Forwarding equivalence class (FEC): A group of IP packets that are forwarded along the same path. These packets are treated the same way by a router. The concept of FEC provides for a great deal of flexibility and scalability. An example of an FEC could be all IP packets with their destination addresses matching a certain prefix (such as 192.25.8). Other examples may include an address prefix and other IP header fields such as type of service (ToS);
• Label: A short fixed-length physically contiguous identifier that is used to identify an FEC (with local significance). In practical implementation terms, it could be an index into the routing/forwarding table;
• Label switched router (LSR): An MPLS node that is capable of forwarding layer 3 packets;
• Label edge router (LER): An LER is the entry/exit point of IP packets into/from the MPLS domain. The LERs are capable of performing conventional IP routing as well as label-based forwarding.

With label switching, the complete analysis of the layer 3 IP header is performed only once: at the ingress LER, which is the entry point of a packet into this network. At this location, the layer 3 header is mapped into a label. In some cases it is possible that the packet arrives with a label from another MPLS domain. At each LSR across the network, only the label of an incoming packet need be examined for making forwarding decisions. At the other end of the MPLS domain, an egress LER strips the label. Figure 10.1 shows an MPLS domain with a set of LSRs as well as the ingress/egress LERs. All routers at the boundary of the MPLS domain are called LERs; however, the ingress/egress designations are associated with a particular flow.
Figure 10.1 MPLS domain.
10.4 CONVENTIONAL IP ROUTING

In the current IP network, as a datagram traverses the network, each router extracts the information relevant to forwarding from the IP packet header (typically the destination address). This information is then used as a key for a routing table lookup (longest match) to determine the packet's next hop and output port. The longest prefix match is quite complex, as it needs to find the best match between the destination and all routing table entries. Many core routers may have over 10,000 entries in their routing tables [4]. Let's take the example of an incoming packet with the destination IP address 192.94.172.14, and a routing table containing the entries 192.94.*, 192.94.172.*, and 192.94.172.14. In this instance, the router should match the last entry, that is, 192.94.172.14. The packet is scheduled on the output port using FCFS in a best-effort network. If QoS support is available, packet classification and more sophisticated packet scheduling, such as WFQ, are possible. This process is repeated at each router along the path to the destination.
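To make the cost of this lookup concrete, here is a toy, string-based sketch of longest-prefix matching over the three entries of the example above; production routers match binary prefixes using tries or TCAMs, and the port names here are our own illustration.

```python
def longest_prefix_match(table, dest):
    """Return the entry of the most specific matching prefix ('*' wildcard)."""
    best_len, best_val = -1, None
    for prefix, port in table.items():
        p = prefix.rstrip('*').rstrip('.')       # '192.94.*' -> '192.94'
        if dest == p or dest.startswith(p + '.'):
            if len(p) > best_len:                # keep the longest match
                best_len, best_val = len(p), port
    return best_val

table = {'192.94.*': 'port0', '192.94.172.*': 'port1', '192.94.172.14': 'port2'}
print(longest_prefix_match(table, '192.94.172.14'))  # -> 'port2'
```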
Figure 10.2 Label encoding: Label (bits 0-19), Exp (bits 20-22), S (bit 23), and TTL (bits 24-31).
10.5 MPLS APPROACH

As we discussed earlier, one of the major improvements MPLS makes over the existing IP network is that it obviates the need for the expensive longest match procedure at most of the routers along the path to the destination. In place of the network layer IP address longest match, the MPLS network uses a fixed-length identifier called a label. All packets belonging to a particular FEC are assigned the same label. The labels are assigned to packets at the ingress LER of an MPLS domain. Inside the domain, the labels attached to these packets are used for forwarding. Labels are removed at the exit point (egress LER) of the MPLS domain. As we saw earlier, in conventional routing a longest match lookup is performed at each hop; in MPLS it is done only once, at the network ingress.

10.5.1 Label Encoding

MPLS uses a 32-bit identifier for label encoding, as shown in Figure 10.2. From this identifier, 20 bits are used for encoding the label itself [5]. A stack of labels can be used to support hierarchical routing. The S bit is set to 1 to indicate the last (bottom) entry in the stack. Time to live (TTL) is copied from the value of the IP TTL field when the packet is first labeled. When the last label is popped off the stack, the MPLS TTL is copied back to the IP TTL field. Since MPLS can support multiple protocols, such as IP and IPX, we need to know which protocol a packet belongs to. The protocol information is inferred from the last label in the stack. The following label values are used for specific purposes:
• IPv4 explicit null label (label 0): Indicates that the label stack must be popped and the packet should be forwarded using the normal IPv4 header;
• Router alert label (label 1): Indicates that the packet should be forwarded based on the label beneath this one in the stack. This option is similar to the router alert option in IP packets that alerts the router software to do special processing. This option should be pushed onto the label stack again if the packet is forwarded to the next hop;
• IPv6 explicit null label (label 2): Indicates that the label stack must be popped and the packet should be forwarded using the normal IPv6 header;
• Implicit null label (label 3): Used for assignment and distribution, but should not appear in any label stack in the packets. In place of the usual label swapping process, an LSR performs a pop operation in this case;
• Reserved (labels 4-15): Currently reserved for future use.

10.5.2 TTL Handling

The TTL entry is similar to the time-to-live field carried in the IP header. An LSR processes the TTL field of the top entry on the stack. At each hop, the TTL value is decremented by one. A packet is dropped if the TTL reaches zero. At the exit point of the MPLS domain, the TTL value from the label is copied back to the TTL in the IP header. For a network such as ATM, the entire MPLS domain may be considered as a virtual link of a single hop.

10.5.3 MPLS Encapsulation

MPLS is intended to run over multiple link layers. Link layer technologies that have adequate semantics to carry the MPLS label simply use their header fields to carry it. Examples of such networks include ATM and Frame Relay. For ATM, the label is contained in the VCI/VPI field of the ATM header, and for Frame Relay the label is contained in the DLCI field of its header. Other link layer technologies, such as point-to-point links, Ethernet, and FDDI, do not provide the semantics to carry the MPLS label. In these cases, a shim header is inserted between the L2 and L3 headers. If PPP is used, the protocol field may identify frames that carry labels.

10.5.4 Label Processing

Routers use the label as an index into a table that specifies the next hop and a new label. This eliminates the longest match calculation at subsequent hops. As with ATM VPI/VCI swapping, an old label is replaced with a new label and the packet is forwarded to the next hop. In addition, the experimental field of a label may also carry class of service (CoS) information. This field is useful for scheduling and discarding packets. Routers do not need to look at the IP header for CoS information again, and this saves header processing time. One problem with label processing is that the label is an index into a table that is local to the router. This index is advertised to the neighboring routers for label binding (we discuss the label distribution protocol later). In the event of a router crash, the new label (index) needs to be advertised. This may cause transient problems, as forwarding tables may become inconsistent.
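The 32-bit shim entry of Figure 10.2 is easy to pack and unpack with shift-and-mask arithmetic. A minimal sketch with the field widths from the figure (the helper names are ours):

```python
def pack_shim(label, exp, s, ttl):
    """Pack one shim entry: label (20 bits), Exp (3), S (1), TTL (8)."""
    return (label << 12) | (exp << 9) | (s << 8) | ttl

def unpack_shim(word):
    """Split a 32-bit shim entry back into its four fields."""
    return ((word >> 12) & 0xFFFFF,  # label
            (word >> 9) & 0x7,       # Exp (experimental / CoS bits)
            (word >> 8) & 0x1,       # S (bottom-of-stack flag)
            word & 0xFF)             # TTL

assert unpack_shim(pack_shim(17, 0, 1, 64)) == (17, 0, 1, 64)
```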
10.6 LABEL DISTRIBUTION

Label distribution protocols define a set of procedures to exchange label/FEC bindings. In simple terms, this involves the assignment of labels to identify traffic. Consider LSR A forwarding traffic to LSR B. We call A the upstream (with respect to data flow) and B the downstream LSR. A must apply a label to the traffic that B understands. Label distribution must ensure that the meaning of the label is communicated between A and B. An important question is whether A or B (or some other entity) allocates the label. Downstream label allocation refers to a method in which the label allocation is done by the downstream LSR; upstream allocation is done by the upstream LSR. Local allocation of the label at a node (known as local binding) refers to the operation in which the local node sets up a label relationship with the FEC. This can be done either when an LSR receives traffic or beforehand, upon receipt of control information from an upstream or downstream LSR. Adjacent LSRs decide which method to use. Control-driven binding is set up in advance using control messages or preprovisioning craft commands to the LSR. Flow control binding (also known as data binding) is performed dynamically, based on analysis of a data stream. Label distribution can either piggyback on top of an existing routing protocol, or a dedicated label distribution protocol (LDP) can be created. LDP is based on the union of Cisco's Tag Distribution Protocol (TDP) and IBM's ARIS protocols. It provides a discovery mechanism to establish communication with peers and uses TCP's reliable delivery mechanism. This chapter deals with the architectural components that are important to understanding QoS issues in an MPLS domain. Details of the LDP and various extensions are covered by Davie and Rekhter [1] and Andersson et al. [6].

10.6.1 Sample Network

Let's take an example network consisting of four LSRs: R1, R2, R3, and R4. We assume that R1 is attached to subnet 162.25.8 and R3 is attached to subnet 192.35.10, as shown in Figure 10.3.
Figure 10.3 Sample network topology.
We assume that these routers run existing Internet routing protocols, such as RIP or OSPF, to build their routing tables. We show part of the routing table at R2 in Table 10.1. At index 17, there is an entry for subnet 162.25.8, which indicates that any incoming packet with a destination address matching this entry should be sent along interface if1 of LSR R2. Similarly, at index 18, there is an entry for subnet 192.35.10, which indicates that any incoming packet with a destination address matching this entry should be sent along interface if0 of LSR R2. These indexes (17, 18) are later used as labels by LDP. Similarly, we show entries for these two subnets at LSR R4 in Table 10.2. For our example, we do not need entries for other indexes.

10.6.2 Label Binding

Figure 10.4 shows the label binding scheme for our example network. In this example we are considering that the data traffic will flow from LSR R4 to LSR R1 via LSR R2. R1 is the downstream router with respect to R2. Now R2 must apply a label that R1 understands. Similarly, R4 must apply a label that R2 understands. Figure 10.4 shows that R1 is sending an LDP binding <162.25.8, 19> to R2. This binding indicates to R2 that packets destined for 162.25.8 should be sent with label 19.
Table 10.1 Forwarding Table at LSR R2

Index | IP Address | Interface
16 | ... | ...
17 | 162.25.8 | 1
18 | 192.35.10 | 0
19 | ... | ...
Table 10.2 Routing Table at LSR R4

IP Address | Interface
... | ...
162.25.8 | 1
192.35.10 | 1
... | ...
A possible implementation could have the forwarding information for 162.25.8 located at index 19 in R1's forwarding table. When a packet arrives at R1 with the label 19, R1 will refer to the 19th entry in its table (eliminating the longest match) and find the forwarding port (and a corresponding label if it is not the last router in the path). Similarly, R3 is sending an LDP binding <192.35.10, 16> to R2 (assume that the entry at index 16 in R3's table has the forwarding information for 192.35.10). R2 sends <162.25.8, 17> and <192.35.10, 18> to R4. The table for LSR R2 that has these entries at indexes 17 and 18 was shown earlier in Table 10.1.

10.6.3 Label Allocation

Once the LDP bindings are done, the modified forwarding tables at LSRs R4 and R2 are as shown in Tables 10.3 and 10.4. In our example, R4 is assumed to be an ingress LSR. We assume that labeling is performed first at this router and that incoming packets contain no label. Table 10.3 at R4 shows which outgoing label is to be applied and which interface is to be used for the address prefixes in the destination address of packets. For example, an IP datagram with destination address 162.25.8.9 will be forwarded along interface if1 with an outgoing label of 17.
Figure 10.4 Label binding example.
Similarly, Table 10.4 at LSR R2 shows that a packet arriving with incoming label 17 should be sent along interface if1 with outgoing label 19.
10.6.4 Label Switching
The example in Figure 10.5 shows how packets are forwarded in the LSR domain. An IP datagram arrives at LSR R4 with the destination address 162.25.8.4 from a non-MPLS domain. R4 performs a longest match and finds that the outgoing label 17 should be applied to this packet, which is then forwarded along interface if1. When this packet arrives at LSR R2, a simple label swap takes place: in place of label 17, a new label 19 is applied, and the packet is forwarded on interface if1. Finally, when this packet arrives at LSR R1, R1 discovers that it is directly connected to the subnet 162.25.8 and uses the direct delivery mechanism to send this packet to host 162.25.8.4 (it may use ARP, etc., to find the MAC address of the destination host).
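The per-hop work at an interior LSR thus reduces to one table lookup. Below is a minimal sketch of the swap step at R2 using the Table 10.4 entries; the dictionary and the transmit callback are our own illustration.

```python
# LFIB at R2, from Table 10.4: in-label -> (out-label, out-interface)
lfib_r2 = {17: (19, 'if1'), 18: (16, 'if0')}

def lsr_forward(in_label, packet, transmit):
    """Forward a labeled packet at R2: one dictionary index, no longest match."""
    out_label, out_if = lfib_r2[in_label]
    transmit(out_if, out_label, packet)  # swap the label and send it on
```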
Table 10.3 MPLS Forwarding Table at R4

In Label | Out Label | Address Prefix | Interface
... | ... | ... | ...
? | 17 | 162.25.8 | 1
? | 18 | 192.35.10 | 1
... | ... | ... | ...
Table 10.4 MPLS Forwarding Table at R2

In Label | Out Label | Address Prefix | Interface
... | ... | ... | ...
17 | 19 | 162.25.8 | 1
18 | 16 | 192.35.10 | 0
... | ... | ... | ...
Figure 10.5 Label switching: R4 performs a longest match and adds label 17, R2 performs a label swap only (17 to 19), and R1 delivers the packet to host 162.25.8.4 on subnet 162.25.8.
Figure 10.6 Hierarchy of routing.
10.7 HIERARCHICAL ROUTING

Current Internet routing protocols require that all routers in the transit domain maintain in their forwarding tables all the IGP routes provided by the interdomain routing, regardless of whether a router is an interior router or a border router. Maintaining full routing tables in all routers limits the scalability of interior routing and results in slower convergence, larger routing tables, and poorer fault isolation. Using MPLS, the volume of routing information is reduced, since routers need to keep only information sufficient to get packets to the right border router. MPLS enables an ingress router to identify an egress router and label packets based on an interior route. Interior LSRs require only enough information to forward packets to egress routers. We consider an example network of two corporate sites (1 and 2) connected via a network service provider (NSP) in Figure 10.6. The NSP has R2 and R5 border routers that run both interior and exterior gateway protocols. The path followed by transit traffic between site 1 and site 2 through the NSP is R2, R3, R4, R5 (other routers in the figure are of no significance for this example). The routers R3 and R4 run only an IGP. The border router R1 connects site 1 to the NSP via R2, and the site 2 border router R6 connects to the NSP via router R5. R1 and R6 run an EGP; we are not interested in the other routers of these sites. The border routers R2 and R5 run an EGP, providing interdomain routing. Interior transit routers R3 and R4, as well as R2 and R5, run an IGP, providing intradomain routing. The IGP forwarding table entries at R2 and R5 are built using LDP, as described earlier. Table 10.5 shows the BGP entries used by the NSP. This table is built using the BGP protocol.
Table 10.5 BGP Forwarding Table

Router | In Label | Out Label | Next Hop
R1 | ... | 21 | R2
R2 | 21 | 31 | R5
R5 | 31 | 41 | R6
R6 | 41 | ? | ?
Table 10.6 IGP Forwarding Table

Router | In Label | Out Label | Next Hop
R2 | ... | 52 | R3
R3 | 52 | 62 | R4
R4 | 62 | 72 | R5
R5 | 72 | ? | R5
The next hop entry at border router R2 for incoming label 21 indicates that the packet is to be forwarded with outgoing label 31 and that the next hop is R5. Since R2 is not connected directly to router R5, it will need the help of the IGP entries. Table 10.6 shows the IGP entries for the NSP. The IGP entries indicate that it should use label 52 and that the next hop is R3.

Stack of Labels

A packet may carry several labels organized as a label stack. A packet forwarded from one domain to another contains one label. A packet forwarded through a transit domain contains two labels. LSRs use the label from the top of the stack. Labels are pushed (at ingress) and popped (at egress) at domain boundaries. We will see how this works through an example in Figure 10.7. Let's assume that a packet arrives from site 1/router R1 with label 21. The BGP table at LSR R2 indicates that label 31 should be used and that the next hop is R5, which is not directly connected to R2. LSR R2 performs a push operation on the label stack and puts yet another new label, 52, on the stack after consulting the IGP table. The IGP table also gives the next hop as R3 (directly connected to R2). At routers R3 and R4, simple label swapping takes place (on the top label).
Figure 10.7 Push-and-pop operation on stack.
R3 replaces label 52 with 62 and forwards the packet to R4. R4 replaces label 62 with 72 and forwards it to R5. At the egress router R5, the IGP entry indicates the next hop as R5 (itself), so R5 pops the top label off the stack, exposing the BGP label 31. The BGP table indicates that the outgoing label should be 41. The packet is then forwarded to the site 2 LSR R6 with this new label 41.
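The whole transit can be traced with a list used as the label stack. A sketch of the walk in Figure 10.7 (the stack contents mirror the labels shown on each link):

```python
stack = [21]          # packet leaves R1 with BGP label 21
stack[-1] = 31        # R2: BGP table swaps 21 -> 31 ...
stack.append(52)      # ... and pushes IGP label 52 (stack is now [31, 52])
stack[-1] = 62        # R3: swap the top label 52 -> 62
stack[-1] = 72        # R4: swap the top label 62 -> 72
stack.pop()           # R5: pop the IGP label, exposing BGP label 31
stack[-1] = 41        # R5: BGP table swaps 31 -> 41
assert stack == [41]  # packet reaches site 2 (R6) with label 41
```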
10.8 MPLS OVER ATM

ATM switches performing label switching are called ATM-LSRs. At the ingress point of ATM-based LSPs, a device performs the conversion from IP packets to ATM cells and encodes MPLS labels. Figure 10.8 shows the ATM adaptation layer 5 (AAL5) encoded MPLS frame. The AAL5 PDU is segmented into ATM cells. The top label is encoded in the VPI/VCI fields in one of several possible ways currently being developed by the IETF: the VPI/VCI together, the VPI alone, or the VCI alone could be used for the label. MPLS forwarding is similar to label swapping in ATM. Multiple tags per destination may be used to avoid frame merging. The VPI/VCI space may be segmented for label switching and normal ATM switching. The VCI field is sufficient for one-level tagging; the VPI may be used for the second level. The same ATM user plane may be used by MPLS LSRs. However, the control plane needs to be changed for MPLS-based ATM. Internet routing protocols such as OSPF and BGP replace the ATM UNI/PNNI, and ATM-LSR switches need to participate in these network-layer routing protocols.
Figure 10.8 ATM AAL5 encoding of an MPLS frame (the AAL5 PDU carries the MPLS label stack, the higher-layer PDU such as an IP datagram, and the AAL5 pad and trailer).
Now the receiving router LSR3 has no way of distinguishing between the cells coming from LSR1 and those coming from LSR2.

Figure 10.9 ATM cell interleave problem.
One possible solution to the cell interleave problem in ATM-LSRs is to provide a separate label for each upstream LSR. In this case, both LSR1 and LSR2 request a label for destination 162.25.8. The ATM-LSR records these requests (arriving on interfaces if0 and if1). In turn, it requests labels for LSR1 and LSR2 from LSR3 and receives separate labels, 51 and 41, respectively.
Now the cells from LSR1 are swapped with label 51 and the cells from LSR2 with label 41, removing the ambiguity at LSR3. A predefined VPI/VCI is used for the label binding.

Another approach to solving cell interleave requires buffering of cells at the ATM-LSR. In Figure 10.10, cells from LSR2 are delayed until the ATM-LSR has received the complete AAL5 frame; the end of an AAL5 frame is identified by the EOF marker. However, this approach introduces substantial delay for the stream from LSR2.

Figure 10.10 ATM VC merge.
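The interleave ambiguity and the separate-labels remedy can be seen in a toy script. Nothing here models real AAL5 segmentation; the cell tuples and label strings are hypothetical.

```python
# Toy illustration of the ATM cell-interleave problem and the
# label-per-upstream-LSR fix. Cells are (source, payload) tuples.

def merge_with_one_label(cells):
    """Swap every incoming label to 31: the two frames become indistinguishable."""
    return [("31", payload) for (_, payload) in cells]

def merge_with_separate_labels(cells, mapping):
    """Swap to a distinct outgoing label per upstream LSR."""
    return [(mapping[src], payload) for (src, payload) in cells]

# Interleaved cells of two AAL5 frames, both arriving on label 21.
cells = [("LSR1", "a1"), ("LSR2", "b1"), ("LSR1", "a2"), ("LSR2", "b2")]

print(merge_with_one_label(cells))
# [('31', 'a1'), ('31', 'b1'), ...] -- LSR3 cannot reassemble the two frames.

print(merge_with_separate_labels(cells, {"LSR1": "51", "LSR2": "41"}))
# [('51', 'a1'), ('41', 'b1'), ...] -- labels 51 and 41 keep the frames apart.
```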
10.9 TRAFFIC ENGINEERING USING MPLS

MPLS is capable of supporting traffic engineering concepts [7] such as performance optimization and QoS support. This involves maximizing the utilization of links and nodes throughout the network: an MPLS-based network can spread traffic across network links, which also minimizes the impact of a single node failure. MPLS is becoming popular for backbones, where it is possible to ensure reliability by making spare link capacity available for rerouting traffic in case of a failure. It is possible to engineer links to achieve the required delay and grade of service.
Using policy-based resource allocation, it is also possible to meet policy requirements imposed by network operators.

10.9.1 Constraint Routed LSP

In current shortest-path-based routing protocols, a shortest path is selected based on metrics such as hop count, cost, or link speed. In Figure 10.11, we assume that R2 → R5 → R6 is a high-speed path at 55 Mbps, while the alternate route R2 → R4 → R6 is an 8-Mbps path. The shortest path for the source-destination pair (A, C), selected on link speed, is therefore R1 → R2 → R5 → R6 (we assume that the other links do not affect the shortest-path calculation). Similarly, the shortest path for the source-destination pair (B, C) is R3 → R2 → R5 → R6.

Figure 10.11 Conventional shortest path.
At this stage, if multiple sessions run concurrently between A → C and B → C, all packets from both A and B will follow the same path. The alternate path R2 → R4 → R6 has the potential to support QoS in this case; however, it remains unused, because nonshortest-path routing is hard to do with connectionless forwarding. Constraint-routed label-switched paths allow selective nonshortest-path routing, and make it possible to balance load by splitting traffic across routes according to the load on particular routes and links.
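Constraint-based path selection can be sketched as an ordinary shortest-path search over a pruned topology, in the spirit of the example above. The access-link capacities below are assumptions added for illustration; real constraint routing considers richer constraints than a single bandwidth figure.

```python
# Sketch of constraint-based (CSPF-style) path selection: prune links that
# cannot satisfy the bandwidth constraint, then run a fewest-hop search.
# The topology mirrors Figure 10.11.
import heapq

links = {  # (node, node): available bandwidth in Mbps (bidirectional)
    ("A", "R1"): 100, ("R1", "R2"): 100, ("B", "R3"): 100, ("R3", "R2"): 100,
    ("R2", "R5"): 55, ("R5", "R6"): 55,   # high-speed path
    ("R2", "R4"): 8,  ("R4", "R6"): 8,    # alternate 8-Mbps path
    ("R6", "C"): 100,
}

def neighbors(node, min_bw):
    for (u, v), bw in links.items():
        if bw >= min_bw:
            if u == node:
                yield v
            elif v == node:
                yield u

def constrained_path(src, dst, min_bw):
    """Fewest-hop path using only links with at least min_bw of capacity."""
    frontier, seen = [(0, src, [src])], set()
    while frontier:
        hops, node, path = heapq.heappop(frontier)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in neighbors(node, min_bw):
            heapq.heappush(frontier, (hops + 1, nxt, path + [nxt]))
    return None

print(constrained_path("A", "C", min_bw=20))  # forced onto R2-R5-R6
print(constrained_path("B", "C", min_bw=5))   # either NSP path satisfies 5 Mbps
```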
10.9.2 Path Resource Reservation Protocols

Signaling protocols for path establishment, such as CR-LDP and RSVP-TE, are being developed by the IETF. The fundamentals of both protocols are the same: each initiates and controls the establishment of an LSP between itself and a remote LER, and supports strict routes (every core LSR on the path is specified) or loose routes (some transit LSRs are specified, but others in between may be discovered using IP routing protocols). They also specify the queue management (scheduling) techniques and associated parameters to be used with the LSP at each hop. Brief descriptions of the two major efforts in this direction are provided below.

10.9.2.1 RSVP-TE

One proposal is an extension of RSVP called the traffic engineering extension (RSVP-TE) [8]. It is based on RSVP's soft-state model: PATH and RESV messages are exchanged to establish the label swapping table for an LSP and the associated QoS parameters. Both hop-by-hop routing and explicit routes are supported. RSVP-TE supports the fixed-filter (FF) and shared-explicit (SE) styles of reservation, which were described in Chapter 6. The FF style is useful for creating point-to-point LSPs established exclusively for a sender (or FEC). The SE style allows a list of senders to be included in the reservation and may be used to create a multipoint-to-point connection. An SE reservation is also useful for creating a backup LSP in case the original LSP fails: backup LSPs are set up to share resources with the primary LSPs, and if the primary fails, the backup uses the resources allocated to the primary. This avoids making double reservations.

10.9.2.2 CR-LDP

The other protocol, the constraint-routed label distribution protocol (CR-LDP), is an extension to LDP (at the time of writing it is at the IETF draft stage only). It is a hard-state signaling protocol, in which state needs no periodic refreshing. It adds features on top of LDP, such as support for explicit routes, resource reservations, and a priority scheme for paths. The resource reservation for a path is communicated using parameters such as the peak data rate, committed data rate, and peak burst size; in addition, a weight parameter determines the proportion of excess bandwidth, relative to the committed rate, that the path may claim. The data rate calculation is performed using the granularity parameter.
Figure 10.12 Constrained routed LDP.
CR-LDP supports renegotiation of QoS requirements: if the original request cannot be met by the network, an LSR may specify a new set of parameters, presumably with a lower resource requirement. LSPs with higher priority can preempt lower-priority LSPs when sufficient resources are not available. Wang [9] provides details and a good comparison of CR-LDP and RSVP-TE.

MPLS supports constraint-routed label-switched paths. An LSP is a path from the ingress node to the egress node of an MPLS domain, followed by packets with the same label. Topology-driven MPLS will establish two LSPs matching the shortest-path topology for reaching C (one consisting of R2 → R5 → R6 and the other of R2 → R4 → R6). Steering selected traffic onto the second path is hard to do with connectionless forwarding. With MPLS, however, it is easy to install an LSP that follows R1 → R2 → R5 → R6 for traffic from A to C and another that follows R3 → R2 → R4 → R6 for traffic from B to C, as shown in Figure 10.12. There are several advantages to doing this. First, the shared queue at router R2 is reduced (improving its burst tolerance). Second, the available bandwidth for traffic across R2 → R5 → R6 is increased. We have used a very simple example to demonstrate the principle; LSPs such as these are termed constraint-routed LSPs.
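The CR-LDP parameters named above can be collected in a simple record, together with the setup and holding priorities commonly used to decide preemption. This is an illustrative sketch only: the field layout is hypothetical and is not the actual CR-LDP TLV encoding, and the 0 (highest) to 7 (lowest) priority convention is an assumption of the example.

```python
# Illustrative container for CR-LDP traffic parameters (peak/committed data
# rate, peak burst size, weight) plus priorities used for preemption.
from dataclasses import dataclass

@dataclass
class CrLdpParams:
    peak_data_rate: float        # Mbps
    committed_data_rate: float   # Mbps
    peak_burst_size: int         # bytes
    weight: float                # share of excess bandwidth above committed
    setup_priority: int          # 0 (highest) .. 7 (lowest), assumed convention
    holding_priority: int

def can_preempt(new_lsp: CrLdpParams, existing_lsp: CrLdpParams) -> bool:
    """A new LSP may displace an existing one only if its setup priority is
    stronger (numerically lower) than the holding priority of the victim."""
    return new_lsp.setup_priority < existing_lsp.holding_priority

voice = CrLdpParams(2.0, 1.0, 1500, 0.2, setup_priority=1, holding_priority=1)
bulk  = CrLdpParams(8.0, 2.0, 9000, 0.8, setup_priority=5, holding_priority=5)
print(can_preempt(voice, bulk))  # True: the voice LSP can displace bulk
```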
Figure 10.13 Traffic trunking.
10.9.3 Traffic Trunk
Bhaniramka et al. [10] proposed the idea of traffic trunking. A traffic trunk is an aggregation of traffic flows of the same class that are placed inside an LSP. All traffic inside a trunk carries the same label and the same 3-bit class-of-service field (currently the Exp field of the MPLS label) in the MPLS header. A flow consists of packets that have the same MPLS header as well as the same IP and TCP/UDP headers. A set of traffic parameters can be specified to determine the FEC. A trunk can carry any aggregate of microflows, where each microflow consists of packets belonging to a single TCP or UDP flow; generally, trunks are expected to carry several such microflows of different transport types. Traffic trunks are routable entities, like VCs in ATM and Frame Relay. Trunks can be established either statically or dynamically between any two nodes of an MPLS domain, and multiple trunks can be used in parallel to the same egress. Each traffic trunk can have a set of associated characteristics, e.g., priority, preemption, policing, and overbooking. Figure 10.13 shows the relationship between flows, trunks, and LSPs.

The decision of what label to choose at the ingress can be based on any field in the packet headers, on a predetermined policy, and/or on current state information; the scheme used to choose a label at the ingress node is not part of MPLS. One way of selecting labels is to aggregate different microflows into trunks, as sketched below. Different trunks can be routed along the same LSP; in that case, the only thing that distinguishes flows in different trunks is the Exp field. In general, a trunk is expected to carry any aggregate of flows, both congestion-sensitive (e.g., TCP) and congestion-insensitive (e.g., UDP).
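A toy ingress classifier illustrating this idea follows. The policy table, port numbers, and Exp assignments are invented for the example, since MPLS deliberately leaves the label-selection scheme open.

```python
# Toy ingress classifier: microflows (identified by the 5-tuple) are
# aggregated into trunks, each mapped to an LSP plus a 3-bit class-of-service
# value carried in the Exp field. The policy is a hypothetical example.

def classify(flow):
    """flow: (src, dst, proto, sport, dport) -> (lsp, exp_bits)"""
    _, _, proto, _, dport = flow
    if dport in (5004, 5005):          # hypothetical multimedia ports
        return ("LSP-1", 0b101)        # premium trunk
    if proto == "tcp":
        return ("LSP-1", 0b010)        # best-effort TCP trunk, same LSP
    return ("LSP-2", 0b000)            # remaining UDP onto the slower LSP

flows = [
    ("10.0.0.1", "10.0.1.1", "udp", 4000, 5004),
    ("10.0.0.2", "10.0.1.1", "tcp", 3321, 80),
    ("10.0.0.3", "10.0.1.2", "udp", 4001, 9999),
]
for f in flows:
    print(f, "->", classify(f))
```

Note that the first two flows share LSP-1 but land in different trunks, distinguished only by their Exp values, exactly as described in the text.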
10.9.4 MPLS Experimental Results

Rosenbaum et al. [11] performed experimental work to qualitatively analyze the performance of an MPLS network supporting traffic trunking. This work analyzed the interference between TCP and UDP flows when they are mixed in trunks that follow LSPs through an MPLS core network. TCP is responsive to congestion, while UDP ignores congestion completely: when congestion occurs, TCP reduces its transmission rate, but UDP keeps transmitting at the same rate. (Some applications implement a form of flow control on top of UDP, but most still don't.) To see how TCP flows are affected by nonresponsive UDP flows when transported through a core network that uses trunking, two test configurations were set up. Common to both is that, in each configuration, the UDP transmission rate is varied and the received throughput is monitored for both the UDP and the TCP flows; in each case there are two TCP flows and one UDP flow between the six end systems. The measurements are then plotted as a function of the UDP transmission rate. Where trunks are used, they are bounded and isolated, meaning that they neither borrow available bandwidth from neighboring trunks nor lend their own free bandwidth to any other trunk.

The testbed consists of six routers, two of which serve as ingress/egress routers. They are arranged in a fish-like topology with one high-bandwidth path (maximum 50-Mbps throughput) and one low-bandwidth path (maximum 15-Mbps throughput); see Figure 10.14. Each router runs Linux 2.4, patched with MPLS code developed by the University of Wisconsin, on an 800-MHz Intel Pentium. The machines are connected to each other with 100-Mbps Ethernet links; Linux Traffic Control (TC) is used to narrow the available bandwidth. The program Netperf was used to generate both the TCP and the UDP traffic at the end systems.

10.9.5 No Trunking

The baseline case considered is an ordinary IP network that routes packets through the core according to best-effort policies. In this case all traffic follows the very same path through the core: the high-bandwidth path over router LSR 2 (which has its MPLS capability turned off in this scenario, like all the routers).
Figure 10.14 MPLS test-bed (LER A and LER B are connected through LSR 1 and LSR 4, with a 50-Mbps path via LSR 2 and a 15-Mbps path via LSR 3; sources src1–src3 and destinations dest1–dest3 attach at the edges).
Figure 10.15 reveals that the UDP flow consumes almost all the bandwidth it wants, without any consideration for other traffic. The TCP flows, on the other hand, using their slow-start mechanism, back off as soon as they discover congestion, and therefore keep decreasing their demand for bandwidth in favor of the UDP flow.

10.9.6 Two Trunks Using LSPs

In the second experiment, the MPLS mechanism is turned on in the core network. LSP 1 consists of LER A–LSR 1–LSR 2–LSR 4–LER B and is the high-bandwidth path through the core. The other path, LSP 2, follows LER A–LSR 1–LSR 3–LSR 4–LER B and has a maximum bandwidth of 15 Mbps. Two trunks are created: trunk 1 aggregates TCP 1 and the UDP flow, is sent along LSP 1, and is assigned a maximum bandwidth of 50 Mbps; TCP 2 is mapped to trunk 2, which is sent along LSP 2 with a maximum bandwidth of 15 Mbps allocated to it. Figure 10.16 shows the outcome of this test. It is quite clear that TCP 2, now isolated from the other flows in the core, is unaffected by the transmission rate of the UDP flow: its throughput remains rather constant no matter how flooded the other trunk is. TCP 1, on the other hand, still has severe problems getting reasonable throughput, since it competes with the UDP flow for a share of the bandwidth allocated to trunk 1. A rough numeric model of the two setups is sketched below.
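The qualitative outcome of both experiments can be captured in a crude numeric model. All numbers below are illustrative only; they are not the measured results from the testbed.

```python
# Very rough model: capacity is shared by one non-responsive UDP flow and
# two TCP flows that back off under congestion.

def shared_path(udp_rate, capacity=50.0):
    """No trunking: UDP takes what it sends; the TCPs split the leftover."""
    udp = min(udp_rate, capacity)
    tcp_each = max(capacity - udp, 0.0) / 2
    return udp, tcp_each, tcp_each

def isolated_trunks(udp_rate, trunk1=50.0, trunk2=15.0):
    """TCP 2 rides its own bounded trunk; TCP 1 still shares with UDP."""
    udp = min(udp_rate, trunk1)
    tcp1 = max(trunk1 - udp, 0.0)
    tcp2 = trunk2                      # unaffected by the UDP flood
    return udp, tcp1, tcp2

for rate in (15, 30, 45):
    print(rate, shared_path(rate), isolated_trunks(rate))
```

Even this toy model reproduces the shape of Figures 10.15 and 10.16: without trunking, both TCP flows starve as the UDP rate grows, whereas with isolated trunks TCP 2's share stays constant.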
10.10 MPLS AND LATEST DEVELOPMENTS

10.10.1 Diffserv over MPLS

Researchers have started looking at supporting Diffserv over MPLS [12]. The IETF MPLS WG [2] is currently working on a draft to support Diffserv over the MPLS network.
Figure 10.15 MPLS results with no trunks (throughput of the UDP, TCP 1, and TCP 2 flows, in Mbps, versus the UDP transmission rate).
The main issue is the mapping of the 6-bit DSCP into the 3-bit Exp field of the MPLS label. It is worth noting that the decision to have a 3-bit Exp field, matching the IPv4 CoS (precedence) field, was taken before the establishment of the IETF Diffserv working group; another motivation for keeping this field small was to keep the MPLS overhead low. In its simplest form, a many-to-one mapping is required to reduce 64 DSCPs to 8 Exp values. The IETF is working on two principal ideas.

The first idea requires each packet belonging to a stream to carry information about the desired service (PHB). In this case, each LSP is capable of carrying eight different PHBs based on the Exp field in the MPLS shim header; the Exp field determines both the scheduling treatment and the packet drop precedence. This scheme poses problems for ATM, as shim-layer processing requires a segmentation/reassembly function at each ATM-LSR; a possible solution could use part of the VPI/VCI for the label and 8 bits for the DSCP. A hypothetical many-to-one mapping is sketched below.
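The DSCP codepoints in this sketch are the standard EF/AF values, but the particular DSCP-to-Exp assignments are assumptions made for illustration; a deployment would negotiate its own mapping.

```python
# Sketch of a many-to-one mapping from 6-bit DSCPs to the 3-bit Exp field.

DSCP_TO_EXP = {
    0b101110: 0b101,   # EF
    0b001010: 0b001,   # AF11
    0b001100: 0b001,   # AF12 (shares the AF1x treatment: many-to-one)
    0b000000: 0b000,   # best effort
}

def exp_for(dscp: int) -> int:
    """Collapse a 6-bit DSCP into a 3-bit Exp value (default: best effort)."""
    return DSCP_TO_EXP.get(dscp & 0x3F, 0b000)

print(bin(exp_for(0b101110)))  # 0b101
print(bin(exp_for(0b001100)))  # 0b1 -> AF12 mapped onto the same Exp as AF11
```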
Figure 10.16 MPLS results with two trunks (throughput of the UDP, TCP 1, and TCP 2 flows, in Mbps, versus the UDP transmission rate).
The second solution infers the PHB from the label itself: in this case, the label also tells an LSR what PHB to use for an MPLS frame. Packet drop precedence may be conveyed via the Exp field or mapped to the appropriate link-layer header. This scheme requires an additional mechanism in LDP to convey the association of the desired service category with the label. A DSCP is mapped to an LSP at the ingress LER: for each DSCP, a separate LSP is established to the egress LER, and the MPLS label acts as the classifier for the behavior aggregate.

10.10.2 Generalized MPLS (GMPLS)

With recent developments in optical fiber switching, new types of switching schemes have become feasible [13]. These schemes may perform switching that does not recognize packet or cell boundaries, which means that existing MPLS/ATM schemes based on a packet/cell header cannot be applied to such devices. The new LSRs may contain devices whose forwarding decisions are based on time slots, wavelengths, or physical ports. An IETF draft [2] categorizes the interfaces on these devices as follows:

1. Packet-switch capable (PSC): Interfaces of MPLS LSRs and ATM-LSRs that forward data based on the content of the packet or cell header belong to the PSC category.
2. Time-division multiplex capable (TDM): Interfaces that forward data based on the data's time slot in a repeating cycle; an SDH/SONET cross-connect is an example of such an interface.

3. Lambda switch capable (LSC): Interfaces that forward data based on the wavelength of the channel on which the data is received. Recent optical cross-connects that can operate at the level of an individual wavelength fit into this category [14].

4. Fiber-switch capable (FSC): Interfaces that forward data based on the position of the data in physical space. An optical cross-connect that can operate at the level of single or multiple fibers is an example.

These interfaces allow carriers to build switching hierarchies, with FSC at the top and PSC at the bottom. The IETF is currently working on extensions to MPLS that will support all four classes of interfaces.
10.11 SUMMARY

One of the major advantages of MPLS is its simplified forwarding, based on an exact match of a fixed-length label. MPLS provides a clean separation of routing and forwarding in IP networks: new routing functionality can be deployed without changing the forwarding techniques of every router in the Internet. MPLS also facilitates the integration of ATM and IP, allowing carriers to leverage their large investment in ATM equipment. This chapter provided an overview of MPLS technology. We discussed how MPLS improves routing scalability (hierarchy) through stacking of labels and eliminates the need for full routing tables in the interior routers of transit domains. Traffic engineering is seen as one of the most significant advantages of MPLS over traditional IP routing; this chapter discussed constraint-routed LSPs and the associated signaling mechanisms, CR-LDP and RSVP-TE. Finally, the chapter concluded with the latest work in progress on supporting Diffserv over MPLS as well as GMPLS.
10.12 REVIEW QUESTIONS

1. What are the main advantages of MPLS over traditional IP forwarding?

2. Which of the following applications will benefit from MPLS technology, and why?

a. File transfer
b. DNS queries

c. Multimedia streaming

3. Refer to Figure 10.4. Assume that a new subnet 168.22.5 is connected via a new router R5, which is connected directly to router R2 at its new interface if2.

a. What binding information will be sent from R5 to R2?

b. What binding information will be sent from R2 to R4?

c. Show the new, updated label allocation tables at R4 and R2.

d. Show an example of an IP packet entering at R4, destined for 168.22.5, passing through the example network.

4. Review the push/pop operations on the stack of labels in the example of Figure 10.7 to demonstrate hierarchical routing.

5. Suppose you are required to design a label distribution protocol. From your knowledge of other routing protocols, recommend at least four messages that may be required for LDP, and describe their usage. Explain how these messages would be useful in creating and maintaining the label swapping table at label-switched routers (LSRs). What additional mechanism would be required for LDP to support QoS?

6. What is the cell interleave problem in an ATM-LSR? Describe the proposed solutions to this problem.

7. Can RSVP-TE use the SE filter to set up a backup path? Explain how this works and what the advantages are of using the SE filter in this case.

8. What do we mean by traffic engineering in the context of MPLS?

9. What is constraint routing? Give an example for an MPLS network.

10. What is the GMPLS extension? Why is there a need for GMPLS?
References

[1] B. Davie and Y. Rekhter. MPLS: Technology and Applications. Morgan Kaufmann Publishers, San Francisco, California, 1st edition, 2000.

[2] Multiprotocol label switching working group charter. URL: http://www.ietf.org/html.charters/mpls-charter.html, 2001.
[3] G. Armitage. MPLS: The magic behind the myths. IEEE Communications Magazine, pages 124–131, January 2000.

[4] S. Keshav. An Engineering Approach to Computer Networking. Addison-Wesley, Boston, Massachusetts, 1st edition, 1997.
[5] E. Rosen, G. Fedorkow, Y. Rekhter, D. Farinacci, T. Li, and A. Conta. MPLS label stack encoding. RFC 3032, Internet Engineering Task Force, January 2001.

[6] L. Andersson, P. Doolan, N. Feldman, A. Fredette, and B. Thomas. LDP specification. RFC 3036, Internet Engineering Task Force, January 2001.

[7] D. Awduche, J. Agogbua, M. O'Dell, and J. McManus. Requirements for traffic engineering over MPLS. RFC 2702, Internet Engineering Task Force, September 1999.

[8] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, and G. Swallow. RSVP-TE: Extensions to RSVP for LSP tunnels. RFC 3209, Internet Engineering Task Force, December 2001.

[9] Z. Wang. Internet QoS: Architectures and Mechanisms for Quality of Service. Morgan Kaufmann Publishers, San Francisco, California, 1st edition, 2001.

[10] P. Bhaniramka, B. Sun, and R. Jain. Quality of service using traffic engineering over MPLS: An analysis. In Proceedings of the 25th IEEE Conference on Local Computer Networks, pages 238–241, Tampa, Florida, November 2000.

[11] F. Rosenbaum, S. Jha, and M. Hassan. Experimental evaluation of traffic engineering in an MPLS network. In SPIE Conference on Quality of Service over Next Generation Data Networks, ITCom2001, Denver, Colorado, August 2001.

[12] I. Andrikopoulos and G. Pavlou. Supporting differentiated services in MPLS networks. In Proceedings of IWQoS'99, pages 207–215, London, United Kingdom, May 1999.

[13] A. Banerjee, J. Drake, J. Lang, B. Turner, D. Awduche, L. Berger, K. Kompella, and Y. Rekhter. Generalized multiprotocol label switching (GMPLS): Overview of signaling enhancements and recovery techniques. IEEE Communications Magazine, 39(7):144–151, July 2001.

[14] D. Awduche and Y. Rekhter. Multiprotocol lambda switching: Combining MPLS traffic engineering control with optical crossconnects. IEEE Communications Magazine, 39(9):111–116, March 2001.
Chapter 11

QoS in Mobile Wireless Networks

Mobile wireless technology is the only other technology experiencing the same rapid growth enjoyed by the Internet. The number of mobile handsets sold is almost doubling each year (see Figure 11.1), pointing to an obvious future: a mobile wireless Internet. The QoS technologies discussed so far do not address some of the unique challenges of the mobile wireless Internet. In this chapter, we present a number of applications and networking technologies of the mobile wireless Internet and then discuss some of the unique QoS challenges faced in this new mobile environment. A vast topic like this deserves a complete book; from this chapter, readers should gain a high-level understanding of the measures currently considered for addressing the QoS problem in the wireless Internet.
11.1 MOBILE APPLICATIONS

Since it is the rapid growth in mobile phone sales that is primarily driving us toward the mobile wireless world of the future, traditional voice calls will remain a primary application for the mobile user. However, the increasing dependence on Internet data applications, such as the WWW, e-mail, and file transfer, has made it clear that Internet access will be a major attraction to mobile consumers. On top of data, we are likely to see demand for emerging multimedia applications, such as videoconferencing and multicasting; good coverage of adaptive multimedia is available from a variety of sources [2, 3, 4]. In addition to multimedia applications, work is already in progress to support traditional voice calls on the emerging IP-enabled mobile handsets to be used in the next-generation (3G) cellular networks.
Figure 11.1 The growth of the mobile phone market [1] (number of subscribers, in millions, 1992–2000).
This technology trend requires that mobile QoS be in place before we can see a true convergence of the wireless world and the Internet.
11.2 MOBILE WIRELESS NETWORKS
There are several wireless networking technologies, sometimes competing and sometimes complementary, that enable the mobile wireless Internet. Detailed coverage of these networks is beyond the scope of this book; good coverage of wireless network architectures can be found in Schiller [5] and in Lin and Chlamtac [6]. However, for the benefit of novice readers, we present an overview of the most prominent wireless networks.
Figure 11.2 The basic architecture of an infrastructure-based 802.11b wireless LAN.
11.2.1 Wireless LAN

High-speed Internet connectivity within an indoor business environment is typically provided through local area networks (LANs). Although most of today's LANs are fixed, wireless LANs are gaining popularity. Wireless LANs not only support some degree of mobility, but they also eliminate the need for cabling to connect devices to the LAN. The freedom of mobility and the cable-free connection are driving the wireless LAN market to rapid success.

Figure 11.2 shows how computers and wireless devices, collectively called mobile hosts (MH), are typically connected to a wireless LAN. A building or any specific indoor area, such as an airport lounge, is serviced through a set of access points (AP). Each AP provides wireless LAN connectivity to any MH within a certain range; an MH wishing to communicate with a nearby AP must install a wireless LAN card. The access points themselves are connected through a wired backbone that has a gateway to the global Internet.
IEEE 802.11b is the most popular standard for wireless LANs, which operate in the free, unlicensed industrial, scientific, and medical (ISM) band. To operate an IEEE 802.11b wireless LAN, there is therefore no need to seek a license from any authority, which makes such wireless LANs even more popular. The rate supported by 802.11b is 11 Mbps, which is on par with traditional 10-Mbps Ethernet LANs. An 802.11b LAN can also be deployed rapidly without any infrastructure (no APs and no wired backbone) to provide temporary network services in any desired location, such as a building hit by an earthquake. Such infrastructureless installations are known as ad hoc networks. To support routing of data packets from one device to another device outside the range of the wireless card, the devices must load ad hoc routing software.

HIPERLAN, a competitor of IEEE 802.11b, was standardized by the European Telecommunications Standards Institute (ETSI). Like 802.11b, it operates in license-free spectrum, but it supports a much higher data rate than 802.11b: 23.5 Mbps (for type 1). However, due to higher costs, the take-up of HIPERLAN remains slow.

11.2.2 Bluetooth

Bluetooth is a low-cost, short-range (10 to 100 meters) wireless connectivity technology that enables instant, effortless communication between mobile phones, PDAs, notebooks, and other nearby devices. Like its wireless LAN counterparts, Bluetooth uses the free ISM band for all its communications, and it supports a nominal data rate of 1 Mbps. Application scenarios for Bluetooth include:

• Office environment: could be used as a low-speed wireless LAN;

• Home environment: will allow communication between home devices, such as a cordless phone, TV, VCR, heating system, and so on;

• Personal area network (PAN): can allow the mobile phone to search an address directory stored in the PDA;

• Public environment: will allow PDAs to access information databases in public areas, such as airports and railway stations;

• Ad hoc networking: will allow devices to establish connectivity independently of any fixed infrastructure.
11.2.3 Cellular Networks

Wide-area (outdoor) wireless connectivity is supported by the traditional cellular telephone networks. The generic architecture of such networks is shown in Figure 11.3. MHs connect to the network through base stations. Each base station forms a cell around it, and an MH in a given cell remains within the range of the base station responsible for all communications in that cell. Base stations in cellular networks are therefore similar to access points in wireless LANs. Each base station is connected to a base station controller (BSC) via fixed links; one of the functions of a BSC is to perform a handoff between base stations when an MH leaves one cell and enters another. The BSCs, together with other equipment, form the fixed switching backbone of the cellular network.
Figure 11.3 The basic cellular network architecture.
There are several standards and systems deployed in different regions to build the cellular networks. The global system for mobile communications (GSM) is
the most widely deployed cellular network; it supports circuit-switched digital telephony and 9.6-Kbps data communication. Recently, an extension of GSM called the general packet radio service (GPRS) has been deployed to support packet-switched data services for bursty Internet applications such as e-mail and the WWW. GPRS boosts the data rate from 9.6 Kbps to 28.8 Kbps or even higher.

Launched in 2001, the next generation of cellular networks, referred to as third generation or simply 3G, is gradually being deployed in many parts of the world. 3G networks enable IP-based communication between the mobile handset and the network, bringing it closer to the existing IP-based Internet infrastructure. 3G supports 144 Kbps inside a moving vehicle, 384 Kbps for pedestrians, and 2 Mbps inside a building (fixed wireless access). 3G aims to integrate indoor and outdoor communications to support global seamless connectivity through a hierarchical cell structure using satellite, macro, micro, and pico cells. Satellite cells cover a large geographical area to provide global coverage; using satellites, it is possible to cover every inch of planet Earth. Next in the hierarchy are the traditional hexagonal macro cells (see Figure 11.3), providing wide-area coverage. Micro cells cover outdoor areas, such as city streets hidden by large buildings, which cannot be covered effectively by macro cells. Finally, pico cells provide high-speed indoor coverage.

11.2.4 Comparison of Wireless Networks

Although wireless connectivity remains the common denominator of all the wireless networks discussed above, they differ from each other in several ways. Table 11.1 compares and contrasts various wireless data networks.
11.3 MOBILE SERVICES OVER IP NETWORKS

The research community has focused on providing mobile data service over the existing Internet. We provide a brief overview of two architectures that support mobility in IP networks: Mobile IP and Cellular IP.

11.3.1 Mobile IP

Mobile IP is a standard protocol described in RFC 2002 [7]. Before explaining the protocol architecture, we first define a set of terms:
Table 11.1 Comparison of Various Wireless Data Networks

Network     Speed                  Range      Spectrum               Switching   Applications
802.11b     11 Mbps                10–20m     Unlicensed (free)      Packet      LAN
HIPERLAN    23.5 Mbps              50m        Unlicensed (free)      Packet      LAN
Bluetooth   1 Mbps                 10–100m    Unlicensed (free)      Packet      PAN, home network
GSM         9.6 Kbps               35 km      Licensed (expensive)   Circuit     Outdoor
GPRS        28.8 Kbps              35 km      Licensed (expensive)   Packet      Outdoor
3G          144/384 Kbps, 2 Mbps   10–35 km   Licensed (expensive)   Packet      Indoor, outdoor
Mobile node (MN): A device that moves from one network or subnetwork to another while retaining its IP address. The device could be as small as a PDA or as big as a router onboard a vehicle or aircraft. It is able to continue using the Internet at other locations without changing its IP address, as long as it has physical connectivity to the new point of attachment.

Home network (HN): The network to which the MN belongs, sharing the subnet part of its IP address. Even when the MN is away, standard IP routing delivers packets destined to the MN to its home network.

Home agent (HA): A router on the MN's home network responsible for forwarding IP datagrams to the mobile node when it is away from home. It also maintains the current location information for the MN.

Foreign network (FN): Any network visited by an MN that is not its home network.

Foreign agent (FA): A router in the visited foreign network that provides routing services to the MN once the MN is registered with it. The foreign agent forwards the datagrams that it receives from the MN's home agent, and it can act as a default router for packets originating from the MN.

Correspondent node (CN): A node in the Internet that is currently communicating with the MN. It can be mobile or stationary.
Care-of address (COA): Tunnels are used to deliver packets to an MN while it is away from the home network; the COA acts as the termination point of a tunnel toward an MN. There are two possible types of COA:
• Foreign agent COA: the address of a foreign agent with which the MN is currently registered;

• Colocated COA: a local address that the MN acquires temporarily and associates with one of its network interfaces. In this case the MN itself becomes the termination point of the tunnel, as it has a topologically correct IP address.
Figure 11.4 Mobile IP architecture.
Figure 11.4 shows the basic components of the Mobile IP architecture. A CN willing to send IP datagrams to an MN doesn't need to know the current location of the MN; it sends the datagram as usual. As the IP routers in the Internet are not aware of the MN's new location, the datagram is forwarded to the MN's home address, where the HA intercepts it (being aware that the MN is away). IP-in-IP encapsulation is then used to tunnel the packet to the COA: the original datagram is placed in the data part of a new datagram whose destination address is the COA and whose source address is the HA, with the protocol identifier field set to indicate IP-in-IP encapsulation. The encapsulated packet reaches the foreign agent (assuming the FA's address is the COA), which decapsulates it and forwards the original datagram (sent by the CN) to the mobile node. This scheme suffers from the problem of introducing excessive delay; several optimizations have been proposed to ameliorate it. Textbooks such as [5] provide a detailed treatment of these topics.
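A minimal sketch of this tunneling step follows, assuming simplified header objects rather than real packet encoding; the host names are placeholders. Protocol number 4 is the standard identifier for IP-in-IP encapsulation.

```python
# Sketch of Mobile IP's IP-in-IP tunneling: the HA wraps the original
# datagram in an outer header addressed to the COA with protocol number 4.
from dataclasses import dataclass

IPPROTO_IPIP = 4  # protocol identifier for IP-in-IP encapsulation

@dataclass
class Datagram:
    src: str
    dst: str
    proto: int
    payload: object

def encapsulate(original: Datagram, ha: str, coa: str) -> Datagram:
    """Home agent: tunnel the intercepted datagram toward the COA."""
    return Datagram(src=ha, dst=coa, proto=IPPROTO_IPIP, payload=original)

def decapsulate(tunneled: Datagram) -> Datagram:
    """Foreign agent (the COA): strip the outer header, forward the inner."""
    assert tunneled.proto == IPPROTO_IPIP
    return tunneled.payload

cn_packet = Datagram("cn.example", "mn.home.example", 6, "tcp segment")
tunneled = encapsulate(cn_packet, ha="ha.home.example", coa="fa.visited.example")
print(decapsulate(tunneled).dst)  # mn.home.example -- delivered to the MN
```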
Figure 11.5 Cellular IP architecture (steps: 1, the MH enters a new access network; 2, it registers with its home agent; 3, the HA forwards packets to the MH).
11.3.2 Cellular IP

The Mobile IP approach is not suitable in a wireless environment in which host mobility is very high. The Cellular IP project proposes an architecture, similar to the GSM architecture, for a packet-based IP network that interoperates with the Internet and attempts to solve some of the problems with Mobile IP. This architecture separates fine-grain local mobility (frequent) from coarse-grain mobility (infrequent) over a wide area. While Mobile IP is used for the wider area, Cellular IP uses a new mobile host protocol that is optimized to provide access to a Mobile IP-enabled Internet in support of fast-moving wireless hosts [8, 9]. Figure 11.5 (adapted from Valko [8]) shows the step-by-step operation of Cellular IP. When an MH enters a new access network (step 1), it registers with its home agent (step 2). From this point onward, the home agent forwards packets addressed to the MH to the access network (step 3).
No new registration is required if the MH moves between the cells of this access network (for example, between cell 1 and cell 2); the assumption here is that mobility between access systems will be infrequent. Details of location management and handoff are beyond the scope of this chapter, and readers are encouraged to check the references.
11.4 IMPACT OF MOBILITY ON QoS

Mobile IP and Cellular IP support only the routing of packets, without addressing the QoS problems. Several limitations and constraints of the mobile environment pose additional challenges to QoS. In this section we discuss three important sources of QoS problems in the mobile Internet: link quality, movement of the user, and the limitations of portable devices.

11.4.1 Effect of Wireless Links

The mobile user accesses the Internet via wireless links. Compared with their wired counterparts, wireless links have two main disadvantages. One is their (comparatively) poor quality, resulting in a much higher bit error rate (BER); a high BER means that QoS mechanisms in the mobile Internet must deal with high packet loss rates. A second problem, perhaps the most crucial one, is quality variation. The quality of the link can vary for several reasons: the weather, interference from other mobile users, and barriers such as buildings, bridges, and mountains can all temporarily degrade the quality of the wireless link. The distance of the mobile user from the base station or access point also plays a major role in the received signal strength and quality. The worst part of such quality variations is that they happen quite randomly; although some can be predicted or statistically modeled, most occurrences remain unpredictable. Mobile QoS mechanisms must therefore account for such unpredictable changes in link quality during a communication session.

Although it is intuitive to think that forward error correction (FEC) techniques can effectively solve the wireless link quality problem, the problem of quality variation is more subtle than that. Each FEC technique has some overhead, which is a function of the redundancy used to correct a certain number of bit errors: the more bit errors we wish to correct, the higher the overhead. Therefore, one has to know the actual quality of the link in advance to deploy an appropriate FEC technique. If the link quality varies over time, a single choice of FEC will not be very effective. Adaptive FEC schemes, which dynamically select the appropriate FEC coding, could be used to address varying link quality, but such techniques require more logic in the system.
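The trade-off can be expressed as a small selector that picks the cheapest FEC code whose correction capability covers the currently estimated BER. The code table and thresholds below are invented for illustration; real schemes are driven by measured channel state and the codec in use.

```python
# Toy adaptive-FEC selector: choose the lowest-overhead code that still
# handles the estimated bit error rate.

FEC_CODES = [  # (name, overhead fraction, highest BER it handles well)
    ("none",      0.00, 1e-7),
    ("light-fec", 0.10, 1e-5),
    ("heavy-fec", 0.33, 1e-3),
]

def pick_fec(estimated_ber: float):
    for name, overhead, max_ber in FEC_CODES:
        if estimated_ber <= max_ber:
            return name, overhead
    return FEC_CODES[-1][0], FEC_CODES[-1][1]  # worst case: strongest code

for ber in (1e-8, 5e-6, 2e-4):
    name, overhead = pick_fec(ber)
    print(f"BER {ber:.0e}: use {name} ({overhead:.0%} redundancy)")
```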
Low bandwidth is another practical problem with most existing wireless links. Restricted bandwidth makes it difficult to support multimedia applications, especially if multiple sessions run simultaneously. Work on broadband wireless access is in progress, and hopefully the bandwidth problem will largely be solved in the future. For the time being, current solutions concentrate on developing low-bandwidth protocols and new compression techniques; the bit rates required by various codecs were discussed in Chapter 1.

11.4.2 Effect of Movement

User mobility has a direct impact on QoS. If the user is free to move, the network route from source to destination is likely to change during a communication session. Since network resources along the route must be reserved in advance to support the desired QoS of the session, the resource reservation task becomes extremely complicated if the route changes frequently and unpredictably.

The other mobility-related issue is handoff. When the user moves out of the coverage of one wireless base station or access point and enters another, the session must be handed off from the previous base station to the new one. For traditional voice calls in circuit-switched cellular networks, such handoff is rather easy, as there is no need to relocate processing, data, and other contexts. However, handoff of real-time multimedia applications in packet-switched wireless networks is quite problematic, as a few seconds of disruption can have a detrimental effect on the QoS of the ongoing communication. The resource reservation and handoff issues in the mobile QoS environment are discussed further later in the chapter.

11.4.3 Limitations of Portable Devices

Mobile handsets, such as mobile phones and personal digital assistants (PDAs), have severe limitations in terms of processing power, memory, and interfaces. In addition, mobile devices have power restrictions, which lead to intermittent availability. Therefore, even if the network has excellent QoS management schemes in place, the result may still be unacceptable if the mobile device cannot cope with the QoS requirements of the user's communication. QoS management techniques in the mobile environment must therefore take the mobile device into consideration.
Chen et al. [10] have developed a framework to study the energy consumption of MAC protocols such as IEEE 802.11, EC-MAC, PRMA, MDR-TDMA, and DQRUMA from the transceiver usage perspective.
11.5 MANAGING QoS IN MOBILE ENVIRONMENTS

It is now clear that additional mechanisms must be in place before QoS can be achieved in mobile wireless networks. In this section, we discuss some measures to address the QoS problems in wireless networks.

11.5.1 Resource Reservation

As discussed in Chapters 5 and 7, Intserv and Diffserv are the two basic approaches to supporting QoS in the Internet. Neither the Intserv nor the Diffserv approach works adequately with mobile hosts, due to the difficulty of reserving resources in a mobile environment. In this section, we analyze the resource reservation problems in mobile networks and briefly discuss some proposed solutions.

11.5.1.1 Intserv

The Intserv approach is designed to work with RSVP for all resource reservations. RSVP allocates resources on the links along the data path from the sender to the destination. For a mobile user, however, the data path changes as the user moves. The main limitation of RSVP is that no resources will have been reserved at a future router or base station that the mobile host may visit. MRSVP [11] is a proposed extension of RSVP that addresses this limitation. MRSVP is based on the following concepts:

MSPEC: A set of locations that the mobile host will visit in the lifetime of a session, specified in advance. An MSPEC for a mobile host with a "regular" mobility pattern can be obtained rather easily; moreover, with advances in GPS technology and in-car navigation systems, it will become easier to predict the future cells quite accurately. However, obtaining the MSPEC will not be easy in a highly unpredictable environment, and there is ongoing research into approximately predicting the MSPEC of a mobile host [12, 13].

Proxy agents: RSVP proxy agents are required in the remote locations specified in the MSPEC to make reservations on behalf of the mobile host.
Passive reservations: To avoid wasting resources, reservations in future locations must be passive, in the sense that those resources can be used by best-effort sessions until the mobile host arrives. Any reservation made by a proxy agent is therefore initially labeled as a passive reservation and is switched to an active reservation as soon as the mobile host arrives. A toy model of this behavior is sketched below.

At the time of this writing, there are no RFCs specifying extensions of RSVP to support mobile hosts. The MRSVP protocol proposed by Talukdar et al. [11] is a good starting point for discussing the problems associated with the current RSVP, but it is not a standards document; hence, we do not discuss the protocol details of MRSVP in this book.
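The class names, fields, and capacities in this sketch are hypothetical; it models only the passive/active distinction, not the MRSVP protocol messages themselves.

```python
# Toy model of MRSVP-style passive reservations: proxy agents reserve
# bandwidth at every location in the MSPEC; a reservation stays passive
# (lendable to best-effort traffic) until the mobile host actually arrives.

class Cell:
    def __init__(self, name, capacity):
        self.name, self.capacity = name, capacity
        self.active, self.passive = 0.0, 0.0

    def reserve_passive(self, bw):       # proxy agent, ahead of arrival
        if self.active + bw <= self.capacity:
            self.passive += bw
            return True
        return False

    def activate(self, bw):              # mobile host arrives in the cell
        self.passive -= bw
        self.active += bw

    def best_effort_share(self):
        """Passive reservations remain usable by best-effort traffic."""
        return self.capacity - self.active

mspec = [Cell("cell-1", 10.0), Cell("cell-2", 10.0), Cell("cell-3", 10.0)]
for cell in mspec:                        # proxies pre-reserve 2 Mbps each
    cell.reserve_passive(2.0)
mspec[1].activate(2.0)                    # host hands off into cell-2
print([(c.name, c.best_effort_share()) for c in mspec])
```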
11.5.1.2 Diffserv
Next we look at the problems faced by the Diffserv approach in the mobile Internet. The main problem in the Diffserv environment is the establishment of a service level agreement (SLA) between the user and the Internet service provider (ISP) of the network to which the user connects. For fixed corporate users, such SLAs are established manually, and any amendments are also made manually, via fax, phone, or Web forms. This is acceptable in fixed environments, where such amendments are infrequent. In the mobile environment, however, when a user leaves the home network and enters a foreign network, the user must establish an SLA with the foreign network. Manual establishment is out of the question, as the time required for a manual update would jeopardize any ongoing sessions. Therefore, to support Diffserv in the mobile Internet, SLA signaling must be in place that allows mobile users to dynamically establish new SLAs with any visited foreign network. A dynamic SLA framework is discussed in [14]. Such signaling must be scalable, to support a large number of mobile hosts entering a network, and also very quick, to minimize the impact on application QoS.

Of course, Diffserv has a problem similar to Intserv's in terms of resource allocation for mobile hosts. When a mobile host moves to a new network and tries to establish new SLAs, resources must be available in the network to support the QoS specified in the SLA. If sufficient resources are not available, the mobile host must deal with degraded QoS. Mobile host applications must therefore support adaptivity.
11.5.2 Context-Aware Handoff

When best-effort applications are in use, rerouting is the only problem that needs to be addressed during a handoff. For real-time multimedia applications, however, more than routing is involved. When a mobile user is connected to a base station, the base station knows a number of QoS and other features associated with the connection, collectively known as the context of the flow. Among other things, a context may include the QoS parameters (for example, the PHB of the Diffserv service), the security method, and any packet filtering rules the flow was receiving. Because such contexts are not transferred to the new base station, in a traditional handoff the new base station must establish the context from scratch. Such establishment of the context, even if done dynamically using signaling, may take a long time, which may in turn cause significant disruption to the ongoing QoS of any real-time session. It is therefore necessary to transfer the context to the new base station before or during the handoff, as sketched below, for smooth operation of real-time applications. The exact syntax for holding the context information and the protocol for transferring the context from one base station to another must be standardized to accommodate third-party products; the IETF is already working on this issue.
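The sketch below illustrates the idea with a hypothetical context record and transfer call. The field names follow the examples in the text (QoS parameters, security method, packet-filter rules), but the API is our own invention, not the IETF's still-evolving context transfer protocol.

```python
# Sketch of transferring a flow's context between base stations during
# handoff so the new base station need not rebuild it from scratch.
from dataclasses import dataclass, field

@dataclass
class FlowContext:
    flow_id: str
    qos_phb: str                       # e.g., the Diffserv PHB in use
    security_method: str
    filter_rules: list = field(default_factory=list)

class BaseStation:
    def __init__(self, name):
        self.name, self.contexts = name, {}

    def admit(self, ctx: FlowContext):
        self.contexts[ctx.flow_id] = ctx

    def handoff(self, flow_id: str, target: "BaseStation"):
        """Push the stored context to the target before/with the handoff."""
        target.admit(self.contexts.pop(flow_id))

old_bs, new_bs = BaseStation("bs-1"), BaseStation("bs-2")
old_bs.admit(FlowContext("flow-7", qos_phb="EF", security_method="ipsec-esp",
                         filter_rules=["permit udp any 5004"]))
old_bs.handoff("flow-7", new_bs)
print(new_bs.contexts["flow-7"].qos_phb)  # EF -- QoS context preserved
```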
11.5.3 Application Adaptivity

Mobile wireless networks exhibit basically two types of QoS fluctuations: hideable and nonhideable. Hideable fluctuations stem from minor variations such as increased latency, reduced signal quality, and so on. Such minor variations in quality can be effectively addressed (or hidden from the applications) by traditional QoS management schemes, such as FEC, resource reservation, jitter buffering, and so on. Once these minor fluctuations are shielded from the application, a consistent QoS level can be guaranteed, and existing applications will continue to work satisfactorily without requiring any further adjustment.

In reality, however, some QoS fluctuations are quite severe and cannot be completely hidden from the applications (nonhideable fluctuations). Causes of such severe fluctuations include loss of available power in the mobile handset (due to battery drainage), moving to a new base station that has a severe shortage of resources to allocate, and moving from a high-speed wireless cell to a low-speed cell. The last cause will be typical in future 3G cellular networks, which will have high-speed pico cells in buildings and low-speed macro cells in the streets.

Nonhideable QoS fluctuations require cooperation from the applications, which must adapt to the resources available in the network. For example, an adaptive video application might switch from color to black-and-white pictures during low-bandwidth periods to reduce its bit rate, and also to conserve power during low-power situations. Such adaptivity does require rebuilding existing applications to accommodate the necessary logic and communication mechanisms with the underlying network.

11.6 RESEARCH DIRECTIONS

Mobile/wireless packet networks need to provide end-to-end QoS support to communication-intensive applications. This is challenging in the presence of scarce and variable wireless bandwidth, bursty wireless channel errors, and user mobility. Providing QoS in mobile/wireless packet networks is consequently a very active research area; we provide a few samples of research work in this direction.

QoS Architecture

In recent years, many researchers have proposed architectures to support QoS in mobile/wireless networks. The Mobiware project [15, 16] developed a QoS-aware middleware platform capable of supporting adaptive multimedia applications operating over combined wireline and wireless networks. One of the primary objectives of this project is the transport and seamless delivery of audio and video flows to mobile devices, with a smooth change of perceptual quality during periods of QoS fluctuation and handoff. Srivastava and Mishra [17] propose a novel architecture for QoS support in mobile networks. They argue in favor of simple wireless link-layer mechanisms (not oblivious of QoS) and suggest that applications should be made more sophisticated, with QoS renegotiation and adaptation capabilities; they also identify the QoS support required in the various layers of mobile wireless networks. TIMELY [18] is an adaptive resource management architecture that offers algorithms for resource management in the mobile computing environment. It provides reservation, advance reservation, and resource adaptation by coordinating adaptation between the different layers of the network, in order to solve the problems introduced by scarce and dynamic network resources. TOMTEN (total management of transmissions for the end-user) is a framework for managing resources in this type of environment as well as in the traditional single-network environment [19]. Yasuda et al. [20] discuss an end-to-edge QoS framework that consists of a mechanism for resource reservation, QoS translation, and QoS arbitration.
A simple mobility scheme for IP-based networks, termed the "anchor chain" scheme, has been proposed by Bejerano et al. [21]. The scheme combines pointer forwarding and caching methods: every mobile host (MH) is associated with a chain of anchors that connects it to its home agent. Details of the scheme are beyond the scope of this chapter. The Mobile IP protocol based on IPv4 has some drawbacks in the areas of survivability, performance, and interoperability with protocols for providing QoS; an alternate protocol called Mobile IP with location registers (MIPLR) has been proposed by Jain et al. [22] to address some of these problems. A framework for QoS support in Mobile IPv6 is under discussion at the IETF [23]. In recent years we have seen deployment of the general packet radio service (GPRS) as an enhancement of the GSM infrastructure; GPRS is capable of handling IP traffic, but it lacks QoS support. A QoS-aware architecture for this network, based on the Intserv model, has been proposed by Priggouris et al. [24]. Koodli and Puuskari [25] discuss a QoS architecture and specific mechanisms being defined for multiservice QoS provisioning in universal mobile telecommunication systems (UMTS).

Resource Reservation

Work related to the Resource Reservation Protocol has mostly focused on extending and enhancing RSVP. Mahadevan and Sivalingam [26] describe a routing architecture that guarantees QoS using a modified RSVP. In the proposed architecture, the wireless/mobile networks are partitioned into a hierarchy with the Internet at the top and mobile base stations at the bottom; a number of neighboring mobile base stations are grouped into a routing domain. Local mobility is handled with routing table changes, while interdomain mobility is handled by Mobile IP. QoS during mobility is handled by active and passive reservations, and the modified RSVP accommodates new QoS parameters such as loss profiles, probability of seamless communication, and rate reduction. Other proposals for modifying RSVP to support mobility have also been discussed [27, 28].

Resource Management and Handoff

A resource estimation and reservation technique for mobile networks is discussed by Levine et al. [29]. The authors propose the "shadow cluster" concept, a predictive resource estimation scheme that provides high wireless network utilization by dynamically reserving the resources required to maintain a negotiated call-dropping probability.
The shadow cluster scheme estimates future resource requirements based on the collection of cells that a mobile is likely to visit in the future. Ramanathan et al. [30] present strategies for providing continuous service to mobile users by estimating the resource requirements of potential handoff connections. The paper investigates static as well as dynamic resource allocation schemes; the results indicate that dynamic estimation and allocation significantly reduce the dropping probability for handoff connections, as the scheme probabilistically estimates, for each class of traffic, the number of connections likely to be handed off from neighboring cells. Admission control algorithms in wireless networks need to incorporate additional parameters such as user mobility, and they should provide probabilistic limits so that sufficient resources are available as handoffs take place. Jain and Knightly [31] developed a framework for designing admission control algorithms in wireless networks that support guaranteed QoS. Pati et al. [32] propose techniques for bandwidth reservation and call admission control in wireless mobile networks. Packet scheduling in wireless networks has its own challenges because of bursty channel errors and location-dependent channel capacity and errors: a base station schedules packets for both downlink and uplink flows in a cell with only limited knowledge of the arrival processes of uplink flows. Lu et al. [33] propose a new model for wireless fair scheduling based on an adaptation of fluid fair queuing that handles location-dependent error bursts; their simulation results demonstrate that the algorithm achieves the desirable properties identified in the wireless fluid fair queuing model. Remote-queuing multiple access (RQMA) [34] addresses the problem at the (wireless) link level, allowing the wireless link to be seamlessly integrated with a wired network that supports QoS guarantees.

Handoff

Handoff in mobile networks has been an active research topic. Tripathi et al. [35] cover different aspects of handoff and discuss handoff-related features of cellular systems as well as implementation of the handoff process. Choi et al. [36] provide a new cutoff priority scheme that provides better QoS to handoff traffic while maintaining high throughput for originating calls.

Routing

A study of routing in the wireless area has been performed by Hsiao et al. [37]. The authors describe a new distributed routing algorithm that performs dynamic load balancing by constructing a load-balanced backbone tree.
The authors claim that their approach simplifies routing and avoids per-destination state for routing and per-flow state for QoS reservations. The paper also reports on the performance of the algorithm in terms of convergence speed, degree of load balance, and adaptation to mobility.

Context Transfer

The issue of context transfer for the handoff of real-time applications in mobile networks is currently being studied in the Seamoby working group of the IETF [38]. Seamless offering of QoS to a mobile node during handovers is crucial for enabling a variety of application services in the mobile Internet. The working group is currently defining QoS contexts and the mechanism for transferring contexts during handovers. Internet drafts from this working group discuss the notion of QoS profile types and the associated QoS profiles, which describe the QoS contexts; QoS interoperability among the participating access routers is also being explored.

Transport Layer Issues

The traditional TCP protocol, as described in Chapter 4, has several problems when used in the wireless environment. Any missing acknowledgment is interpreted by TCP as a sign of congestion, triggering the slow-start algorithm. This is not always the correct interpretation in a mobile/wireless network, where packets can be lost because of the high error rate of wireless links or because of handoffs of mobile terminals. As the true cause of a missing acknowledgment is not identified, the performance of the TCP session suffers badly. Many researchers have proposed schemes to optimize TCP (the transport layer) for mobility. The indirect-TCP (I-TCP) scheme divides a TCP connection into fixed and wireless segments. A standard version of TCP is used for the fixed part; the TCP connection is terminated at the access point, which acts as a proxy for the mobile host, and a special version of TCP optimized for the wireless link is used between the base station (access point) and the mobile host. Details of this scheme can be found in Bakre and Badrinath [39]. Snooping TCP [40] and mobile-TCP (M-TCP) [41] suggest further improvements on I-TCP. Freeze-TCP [42] proposes a mechanism that supports a true end-to-end scheme: it does not require the involvement of any intermediaries (such as base stations) for flow control, and code changes are restricted to the TCP code on the mobile
client side; no change is required on the sender side or at intermediate routers, as most other schemes require. This makes the scheme interoperable with the existing IP network. Another project, TULIP, has similar objectives; it is tailored to the half-duplex radio links available in today's commercial systems, such as IEEE 802.11 [43]. A sketch of the Freeze-TCP idea is shown below.
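To make the receiver-side logic of Freeze-TCP concrete, the following Python sketch simulates it; the class, method names, and window size are assumptions for illustration and are not the authors' implementation of [42].

    # Illustrative sketch of the Freeze-TCP idea: before an anticipated
    # handoff, the mobile receiver advertises a zero window, pushing the
    # sender into persist mode so it neither times out nor shrinks its
    # congestion window while the link is down. Names and numbers here
    # are assumed for illustration only.

    class FreezeTCPReceiver:
        def __init__(self, recv_window=65535):
            self.recv_window = recv_window
            self.frozen = False
            self.last_ack = 0          # last cumulative ACK sent (sketch only)

        def advertised_window(self):
            # A zero window advertisement (ZWA) freezes the sender.
            return 0 if self.frozen else self.recv_window

        def handoff_imminent(self):
            # Called when link-layer signal strength predicts disconnection.
            self.frozen = True

        def handoff_complete(self):
            # On reconnection, unfreeze and resend copies of the last ACK
            # so the sender resumes at its pre-handoff rate immediately.
            self.frozen = False
            return [self.last_ack] * 3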
11.7 SUMMARY

To support telephony and other real-time multimedia applications such as video conferencing, we must develop QoS technologies for the mobile wireless Internet. There are many competing and complementary wireless technologies, such as IEEE 802.11b, HIPERLAN, and Bluetooth indoors, and the cellular networks outdoors. Large variations in wireless link quality and the dynamic change of the network access point for the mobile host are the major QoS challenges in the mobile Internet. Although some of the variations in the wireless links can be masked effectively by traditional QoS technologies, severe fluctuations require application adaptation. Adaptive multimedia applications will hold the key to QoS management in the mobile Internet.
11.8 REVIEW QUESTIONS

1. What are the benefits of mobile wireless networking?
2. Name two indoor and two outdoor wireless networking technologies.
3. Name a few limitations of portable devices. Explain how such limitations affect the QoS in mobile networks.
4. Explain why user mobility poses new QoS challenges in mobile networks.
5. Discuss the limitations of RSVP in the mobile environment.
6. Explain how MRSVP addresses the limitations of RSVP.
7. The idea of passive reservation is to reduce waste of resources. Can you think of any side effects of passive reservations?
8. Explain how wireless link quality affects the QoS in wireless networks.
9. What is application adaptivity? How does application adaptivity help manage QoS in mobile networks?
10. What do you understand by context transfer? Why is context transfer needed for mobile multimedia applications?
References

[1] A. Gyasi-Agyei. Mobile IP-DECT Internetworking Architecture Supporting IMT-2000 Applications. IEEE Network, 15(6):10–22, November/December 2001.
[2] I. Busse, B. Deffner, and H. Schulzrinne. Dynamic QoS control of multimedia applications based on RTP. Computer Communications, 19:49–58, January 1996.
[3] T. Fitzpatrick, G. Blair, G. Coulson, N. Davies, and P. Robin. Software architecture for adaptive distributed multimedia applications. IEE Proceedings – Software, 145(5):163–171, October 1998.
[4] R. Rejaie, M. Handley, and D. Estrin. Layered Quality Adaptation for Internet Video Streaming. IEEE Journal on Selected Areas in Communications, 18(12):2530–2543, December 2000.
[5] J. Schiller. Mobile Communications. Addison-Wesley, London, United Kingdom, 2000.
[6] Y-B. Lin and I. Chlamtac. Wireless and Mobile Network Architectures. John Wiley and Sons, New York, 2000.
[7] C. Perkins. IP mobility support. Request for Comments 2002, Internet Engineering Task Force, October 1996.
[8] Andras G. Valko. Cellular IP: a new approach to Internet host mobility. ACM Computer Communication Review, January 1999.
[9] A. Valko, J. Gomez, S. Kim, and A. Campbell. Performance of cellular IP access networks. In Proc. of 6th IFIP International Workshop on Protocols for High Speed Networks (PfHSN'99), Salem, August 1999.
[10] J.-C. Chen, K. M. Sivalingam, P. Agrawal, and S. Kishore. A comparison of MAC protocols for wireless local networks based on battery power consumption. In Proceedings of the Conference on Computer Communications (IEEE Infocom), page 150, San Francisco, California, March/April 1998.
[11] A. Talukdar, B. R. Badrinath, and A. Acharya. MRSVP: A resource reservation protocol for an integrated services network with mobile hosts. The Journal of Wireless Networks, 7(1):5–19, 2001.
[12] T. Liu, P. Bahl, and I. Chlamtac. Prediction algorithm for efficient management of resources in cellular networks. In IEEE Globecom, pages 982–986, November 1997.
[13] S. Lu and V. Bhargavan. Adaptive resource management algorithms for indoor mobile computing environments. In ACM SIGCOMM, pages 231–242, Stanford, California, August 1996.
[14] A. Iwata and N. Fujita. A hierarchical multilayer QoS routing system with dynamic SLA management. IEEE Journal on Selected Areas in Communications, 18(12):2603–2616, December 2000.
[15] Anand Balachandran, Andrew T. Campbell, and Michael E. Kounavis. Active filters: delivering scaled media to mobile devices. In Proc. International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), pages 133–142, St. Louis, Missouri, May 1997.
[16] Oguz Angin, Andrew T. Campbell, Michael E. Kounavis, and Raymond R.-F. Liao. The Mobiware toolkit: programmable support for adaptive mobile networking. IEEE Personal Communications Magazine, 5(4):32–43, August 1998.
[17] Mani Srivastava and Partho P. Mishra. On quality of service in mobile wireless networks. In Proc. International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), pages 155–166, St. Louis, Missouri, May 1997.
[18] Vaduvur Bharghavan, Kang-Won Lee, Songwu Lu, Sungwon Ha, Jia-Ru Li, and Dane Dwyer. The TIMELY adaptive resource management architecture. IEEE Personal Communications Magazine, 5(4):20–31, August 1998.
[19] R. De Silva, B. Landfeldt, S. Ardon, A. Seneviratne, and Christophe Diot. Managing application level quality of service through TOMTEN. Computer Networks, 31(7):727–739, April 1999.
[20] Yasunori Yasuda, Nobuhiko Nishio, and Hideyuki Tokuda. End-to-edge QoS system integration: integrated resource reservation framework for mobile Internet. Lecture Notes in Computer Science, 2092:294–299, June 2001. International Workshop on Quality of Service (IWQoS).
[21] Yigal Bejerano and Israel Cidon. An anchor chain scheme for IP mobility management. In Proceedings of the Conference on Computer Communications (IEEE Infocom), Tel Aviv, Israel, March 2000.
[22] Ravi Jain, Thomas Raleigh, Danny Yang, Li Fung Chang, Charles Graff, Michael Bereschinsky, and Mitesh Patel. Enhancing survivability of mobile Internet access using mobile IP with location registers. In Proceedings of the Conference on Computer Communications (IEEE Infocom), New York, March 1999.
[23] H. Chaskar and R. Koodli. A framework for QoS support in mobile IPv6. Internet Draft, Internet Engineering Task Force, March 2001. Work in progress.
[24] Giannis Priggouris, Stathes Hadjiefthymiades, and Lazaros Merakos. Enhancing the general packet radio service with IP QoS support. Lecture Notes in Computer Science, 1989:365–379, January 2001.
[25] Rajeev Koodli and Mikko Puuskari. Supporting packet-data QoS in next-generation cellular networks. IEEE Communications Magazine, 39(2), February 2001.
[26] I. Mahadevan and K. Sivalingam. An architecture for QoS guarantees and routing in wireless/mobile networks. In Proceedings of First ACM International Workshop on Wireless Mobile Multimedia, Dallas, Texas, October 1998.
[27] Jukka Manner and Kimmo Raatikainen. Extended quality-of-service for mobile networks. Lecture Notes in Computer Science, 2092:275–280, 2001. International Workshop on Quality of Service (IWQoS).
[28] Andreas Terzis, Mani Srivastava, and Lixia Zhang. A simple QoS signaling protocol for mobile hosts in the integrated services Internet. In Proceedings of the Conference on Computer Communications (IEEE Infocom), New York, March 1999.
[29] D. A. Levine, I. Akyildiz, and M. Nagshineh. A resource estimation and call admission algorithm for wireless multimedia networks using the shadow cluster concept. IEEE/ACM Transactions on Networking, 5(1):1–12, 1997.
[30] Parameswaran Ramanathan, Krishna M. Sivalingam, Prathima Agrawal, and Shalinee Kishore. Dynamic resource allocation schemes during handoff for mobile multimedia wireless networks. IEEE Journal on Selected Areas in Communications, 17(7):1270–1283, July 1999.
[31] Rahul Jain and Edward Knightly. A framework for design and evaluation of admission control algorithms in multi-service mobile networks. In Proceedings of the Conference on Computer Communications (IEEE Infocom), New York, March 1999.
[32] H. K. Pati, R. Mall, and I. Sengupta. An efficient bandwidth reservation and call admission control scheme for wireless mobile networks. Computer Communications, 25(1):74–83, January 2002.
[33] S. Lu, V. Bharghavan, and R. Srikant. Fair scheduling in wireless packet networks. ACM Computer Communication Review, 27(4):63–74, October 1997.
[34] Norival R. Figueira and Joseph Pasquale. Providing quality of service for wireless links: wireless/wired networks. IEEE Personal Communications Magazine, 6(5), October 1999.
[35] Nishith D. Tripathi, Jeffrey H. Reed, and Hugh F. VanLandingham. Handoff in cellular systems. IEEE Personal Communications Magazine, 5(6):26–37, December 1998.
[36] Sung-Ho Choi and Khosrow Sohraby. Analysis of a mobile cellular system with hand-off priority and hysteresis control. In Proceedings of the Conference on Computer Communications (IEEE Infocom), Tel Aviv, Israel, March 2000.
[37] Pai-Hsiang Hsiao, Adon Hwang, H. T. Kung, and Dario Vlah. Load balancing routing for wireless access networks. In Proceedings of the Conference on Computer Communications (IEEE Infocom), Anchorage, Alaska, April 2001.
[38] Context transfer, handoff candidate discovery, and dormant mode host alerting (seamoby). http://www.ietf.org/html.charters/seamoby-charter.html, 2001.
[39] Ajay Bakre and B. R. Badrinath. Handoff and system support for indirect TCP/IP. In Second USENIX Symposium on Mobile and Location-Independent Computing Proceedings, Ann Arbor, Michigan, April 1995.
[40] Hari Balakrishnan, Srinivasan Seshan, Elan Amir, and Randy H. Katz. Improving TCP/IP performance over wireless networks. In Proc. of 1st ACM Conference on Mobile Computing and Networking, Berkeley, California, November 1995.
[41] K. Brown and S. Singh. M-TCP: TCP for mobile cellular networks. ACM Computer Communication Review, 27(5):19–43, October 1997.
[42] Tom Goff, James Moronski, Dhananjay S. Phatak, and Vipul Gupta. Freeze-TCP: a true end-to-end TCP enhancement mechanism for mobile environments. In Proceedings of the Conference on Computer Communications (IEEE Infocom), Tel Aviv, Israel, March 2000.
[43] Christina Parsa and J. J. Garcia-Luna-Aceves. Improving TCP performance over wireless networks at the link layer. Mobile Networks and Applications, 5(1):57–71, April 2000.
Chapter 12

Future

In this chapter we discuss some of the latest QoS developments that may shape the future of the Internet. The chapter starts with a discussion of a new architecture that combines Intserv and Diffserv for end-to-end service provision. Several other issues also play an important role in providing end-to-end QoS; it is hard to provide details of all of them within a single book, so we briefly describe them, with references for further reading, in the second half of this chapter.
12.1 INTSERV OVER DIFFSERV

12.1.1 Motivation

The strength of Intserv is per-flow reservation using RSVP and per-flow QoS guarantees. A per-flow QoS guarantee is possible via per-flow state management in the core routers. This leads to the state explosion and scalability problem that we discussed earlier, in Chapter 5. On the other hand, Diffserv enables scalability across large networks but may not be able to support per-flow QoS guarantees. Table 12.1 provides a comparison between these two approaches to recapitulate our earlier discussions in Chapters 5 and 7.

12.1.2 Generic Framework for Intserv over Diffserv

IETF researchers are proposing a new hybrid architecture that takes the best features of these two approaches [1]. Intserv can be used at the edge of the network, as state explosion is unlikely to happen there. With this new Intserv over Diffserv
Table 12.1 Comparison of Intserv and Diffserv

Criterion             Intserv                                 Diffserv
QoS guarantee         Per flow                                Aggregate (class-based)
Classification        Multifield                              DSCP or multifield
State maintenance     Per flow                                Per class
Admission control     All routers                             Edge routers
Service type          CL or GS                                AF or EF
Signaling protocol    RSVP                                    Bandwidth broker based
Scalability           Unscalable for large number of flows    Scalable (flows aggregated)
Figure 12.1 Intserv over Diffserv architecture. [Figure: the sender and receiver each sit in an Intserv domain at the network edge; edge router 1 and border router 1 connect the sender's Intserv domain to a Diffserv domain in the core, which connects through border router 2 and edge router 2 to the receiver's Intserv domain.]
(IS/DS) framework, the core of an Intserv network is replaced with a Diffserv cloud to address the scalability problem. Because the hosts are still connected to Intserv at the edge, IS/DS can support end-to-end per-flow QoS. Figure 12.1 is a graphic representation of the IS/DS framework. The network architects may choose the size of the Diffserv domain. At one extreme, the Diffserv domain is pushed all the way to the periphery, with hosts alone having Intserv capability. At the other extreme, Intserv is pushed all the way to the core, with no Diffserv domain. Figure 12.1 shows a middle ground. Interconnection between IS and DS is achieved through edge and border routers, as shown in Figure 12.1. However, the DS border routers may not be RSVP capable. In order to achieve seamless connectivity between IS and DS, QoS mapping is used at the intersection of IS and DS. This mapping provides transparent end-to-end connectivity across an IS/DS framework. Hosts connected to Intserv at the edge use either guaranteed service or controlled load service as defined in the Intserv
specification. These services are invoked by individual flows. However, the DS domain in the core does not understand these services, as it supports only two PHBs, EF and AF. In this respect, the DS domain appears as a virtual link to the Intserv network. This is similar to an IEEE 802.1p link with its own QoS priorities. We have seen before how Intserv services are mapped to 802.1p priority classes. Similarly, we need to map Intserv services to Diffserv PHBs.

Point-to-Point Communication in IS over DS

We describe below how communication takes place in an IS over DS architecture. For simplicity we take a single point-to-point connection; the DS domain is RSVP unaware. The process of establishing end-to-end QoS proceeds through the following steps, using Figure 12.1:
• The sender generates an RSVP PATH message describing its traffic profile;
• The PATH message is processed (path state is installed) by all routers in the IS domain and is forwarded to the DS domain toward the receiver;
• Routers in the DS domain ignore the PATH message (no path state processing) and forward it to the IS domain toward the receiver;
• Finally, the receiver receives the PATH message and generates a RESV message;
• The IS domain carries the RESV message toward the sender; if any router has insufficient resources to support the reservation, it generates an error message and rejects the request;
• When the RESV message reaches edge router 1, it checks the service level specification (SLS) of the DS interface to see if there are enough unused resources in the agreement to support the reservation request. If not, the request is rejected; otherwise the RESV is forwarded toward the sender;
• Receipt of the RESV message by the sender's RSVP process indicates that the resource reservation has been successful. The sender remains unaware that the core of the network does not support Intserv. (A sketch of the edge router's SLS check appears after this list.)
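The SLS check performed by edge router 1 amounts to simple bookkeeping against the aggregate rate contracted with the DS domain. The following Python sketch is illustrative only: the data structure and rates are assumptions, and real RSVP/SLS processing involves far more state.

    # Minimal sketch of the SLS check an edge router performs on a RESV
    # message (assumed data structures and rates, for illustration).

    class DiffservSLS:
        def __init__(self, contracted_rate_bps):
            self.contracted = contracted_rate_bps  # aggregate rate in the SLS
            self.reserved = 0.0                    # rate already admitted

        def admit(self, resv_rate_bps):
            """Admit a per-flow RESV only if unused SLS capacity remains."""
            if self.reserved + resv_rate_bps > self.contracted:
                return False                       # reject the reservation
            self.reserved += resv_rate_bps         # forward RESV to sender
            return True

    sls = DiffservSLS(contracted_rate_bps=10e6)    # 10-Mbps agreement
    print(sls.admit(4e6), sls.admit(4e6), sls.admit(4e6))  # True True False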
12.1.3 Guaranteed Service over EF PHB

Guaranteed service can be mapped only over the EF PHB, as EF can provide the delay bounds required by guaranteed service. Equation (12.1) shows how to compute the delay bound for the entire DS cloud with h hops [2, 3].
D_{max} = \frac{h}{1 - (h-1)\alpha} \left( \frac{E}{R} + \alpha \frac{b}{r} \right) \qquad (12.1)
where

E: maximum deviation of the amount of service of the EF queue from the ideal fluid service at rate R;
R: minimum service rate of the EF queue;
α: bounded ratio of the EF traffic load to the service rate of the EF queue;
r: token bucket rate;
b: bucket depth;
h: maximum number of hops.
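As a numeric illustration of (12.1), the following Python sketch evaluates the bound and enforces its validity condition; all parameter values are assumed, not taken from the text.

    # Numeric sketch of the delay bound in (12.1); parameter values are
    # assumed for illustration.

    def ef_delay_bound(h, alpha, E, R, b, r):
        """Worst-case delay (seconds) across a DS cloud of h hops;
        valid only for alpha < 1/(h - 1)."""
        if h > 1 and alpha >= 1.0 / (h - 1):
            raise ValueError("bound requires utilization alpha < 1/(h-1)")
        return (h / (1 - (h - 1) * alpha)) * (E / R + alpha * b / r)

    # Example: 5 hops, 10% EF utilization, E = 1,500-byte deviation at
    # R = 10 Mb/s, token bucket of r = 1 Mb/s with depth b = 3,000 bytes.
    print(ef_delay_bound(h=5, alpha=0.1, E=1500 * 8, R=10e6,
                         b=3000 * 8, r=1e6))       # about 0.030 (30 ms)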
Every link configures a certain portion of bandwidth for the EF traffic. The utilization of this bandwidth should be kept low: the formula is valid only when the utilization α is less than 1/(h−1). R is the ideal EF service rate assuming a fluid-flow (close to bit-by-bit transmission) model. However, real-life transmissions complete at packet boundaries rather than bit boundaries, leading to some deviation from the true service rate R; E denotes the maximum such deviation.

12.1.4 Controlled Load over AF PHB

Controlled load service does not require strict delay bounds and hence can be transported using either EF or AF PHBs. As we learned in Chapter 5, each CL flow is described by a token bucket rate r and depth b. All CL flows can be grouped into several classes based on their b/r ratios. After this classification, each class needs to be transported over a different AF class. In Chapter 7 we looked at several drop priority levels within the same AF PHB. This makes it possible to downgrade nonconforming packets by marking them with a higher drop precedence. Because both conforming and nonconforming packets of a CL flow share the same PHB and hence the same queue, packet reordering does not happen at the exit from the DS cloud. Mapping of CL over EF is nontrivial. As EF does not have subclasses, all CL flows must be mapped to the same EF PHB. The bandwidth for the EF PHB should be configured so that the CL flow with the minimum delay requirement can be supported. The lack of drop precedence for the EF PHB means that nonconforming packets can either be dropped or remarked with the default PHB (best-effort service).
Remarking with the default PHB creates a new problem: packets from the same flow are put into separate router queues and hence are likely to arrive out of order at the exit point of the DS cloud. Because of this limitation, it is desirable that CL service be mapped to the AF PHB unless AF is not supported. A sketch of the b/r grouping and drop-precedence marking is shown below.
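The following Python sketch illustrates the two mechanisms just described; the AF class boundaries and the API are assumptions for illustration.

    # Sketch of the CL-to-AF mapping described above: flows are binned by
    # their b/r ratio into (assumed) AF classes, and a token bucket marks
    # out-of-profile packets with a higher drop precedence so they stay in
    # the same queue and are not reordered.
    import time

    AF_CLASS_BOUNDS = [0.01, 0.1, 1.0]      # assumed b/r bin edges (seconds)

    def af_class_for(b_bytes, r_bps):
        ratio = (b_bytes * 8) / r_bps       # burst duration in seconds
        for i, bound in enumerate(AF_CLASS_BOUNDS, start=1):
            if ratio <= bound:
                return "AF%d" % i
        return "AF4"

    class TokenBucketMarker:
        def __init__(self, r_bps, b_bytes):
            self.rate = r_bps / 8.0         # bucket fill rate, bytes/second
            self.depth = float(b_bytes)
            self.tokens = self.depth
            self.t = time.monotonic()

        def mark(self, pkt_bytes):
            now = time.monotonic()
            self.tokens = min(self.depth,
                              self.tokens + (now - self.t) * self.rate)
            self.t = now
            if pkt_bytes <= self.tokens:
                self.tokens -= pkt_bytes
                return "low drop precedence"    # conforming packet
            return "high drop precedence"       # nonconforming packet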
12.2 QoS ROUTING

QoS routing has been a very active research area for many years. It selects network routes that satisfy the requested QoS for a connection or set of connections. On top of this, QoS routing should achieve global efficiency in resource utilization. For example, the shortest-widest-path algorithm uses bandwidth as a metric and selects the paths that have the largest bottleneck bandwidth, where the bottleneck bandwidth represents the minimum unused capacity over all links on the path. In the event of two paths with the same bottleneck bandwidth, the path with the minimum number of hops may be selected as a tie breaker; a sketch of this selection appears at the end of this section. A very good overview of QoS routing, along with challenges and possible solutions, has been provided by Chen and Nahrstedt [4]. Also, Ma and Steenkiste have studied QoS path selection for traffic requiring bandwidth and delay guarantees [5]. The Wang and Crowcroft algorithm [6] finds a path for any given constraint on bottleneck bandwidth and propagation delay. They suggest that only bandwidth and delay metrics are necessary for QoS routing and that this works well for computing a route with a particular QoS at setup time. We discussed constraint-based routing in Chapter 10; QoS routing is required to select paths that satisfy specific QoS requests in order to support constraint-based routing, which may in addition select paths based on policies. Introducing QoS routing globally in the Internet requires developing standard QoS routing protocols and upgrading the Internet. In addition, most QoS routing protocols are known to have high overhead in terms of processing, storage, and communication [7]. As a result of these limitations, QoS routing protocols have been restricted mostly to the Intranet environment.
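A minimal Python sketch of shortest-widest-path selection follows; the graph encoding is assumed, and this is a textbook widest-path variant of Dijkstra's algorithm, not a production routing protocol.

    # Widest-path variant of Dijkstra: maximize the bottleneck (minimum
    # unused bandwidth) along the path; among equally wide paths, prefer
    # the one with fewer hops. Graph encoding is assumed for illustration.
    import heapq

    def shortest_widest_path(graph, src, dst):
        # graph: {node: {neighbor: unused_bandwidth}}
        best = {src: (float("inf"), 0)}        # node -> (width, hops)
        heap = [(-float("inf"), 0, src, [src])]
        while heap:
            neg_w, hops, node, path = heapq.heappop(heap)
            width = -neg_w
            if node == dst:
                return path, width
            for nbr, bw in graph.get(node, {}).items():
                w = min(width, bw)             # bottleneck so far
                cand = (w, -(hops + 1))        # wider first, fewer hops next
                if nbr not in best or cand > (best[nbr][0], -best[nbr][1]):
                    best[nbr] = (w, hops + 1)
                    heapq.heappush(heap, (-w, hops + 1, nbr, path + [nbr]))
        return None, 0

    g = {"A": {"B": 100, "C": 40}, "B": {"D": 30}, "C": {"D": 40}, "D": {}}
    print(shortest_widest_path(g, "A", "D"))   # (['A', 'C', 'D'], 40)

Note that A–B–D has a faster first hop but a 30-unit bottleneck, so the algorithm prefers A–C–D, whose bottleneck is 40.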
12.3 RESOURCE DISCOVERY AND QoS

Many distributed applications are deployed all over the Internet. Network clients need to first discover and then access these services. One of the major challenges in such a wide area service network is to build an efficient service discovery framework. This becomes especially significant when a number of service providers
are capable of providing the same service. Clients would be inclined to choose a service provider that could deliver the best QoS for a particular service at a competitive price. Research is required on developing a QoS-aware service discovery framework that could improve query responsiveness for new services and provide QoS-capable service to end users. Researchers from Berkeley have developed an architecture called the secure service-discovery service (SDS) [8]. Salient features of this architecture include security and fault tolerance. Service providers use the SDS to advertise complex descriptions of available or already running services, while clients use the SDS to compose queries for locating these services, using XML to encode factors such as cost, performance, location, and device- or service-specific capabilities. However, this framework does not address QoS issues. The Monet research group from UIUC has extended the SDS work by providing a hierarchical structure to the SDS model [9]. The following improvements have been shown through simulation results:
• The service client is extended with the capability to give feedback on past experiences;
• The discovery server uses caching and propagates service advertisements with QoS feedback up the discovery server hierarchy.

Nakao et al. [10] from Princeton University have proposed a framework for constructing network services for accessing media objects. User input and the resource requirements of media objects are used to discover an end-to-end path that has sufficient resources to play the object. Once the path has been discovered, individual nodes along it are configured with the modules that implement the service. After a service has been discovered, QoS routing is needed for its efficient realization: all possible routes between the particular source and sink must be assessed in conjunction with the server load. Currently, server replication is commonly used to improve scalability, and the primary concern then is how a client chooses the best server. Fu et al. [11] have performed a simulation study using a composite metric that combines QoS routing with the dynamic load on servers for resource discovery; a toy version of such a metric is sketched below. A brief discussion of this issue is provided in Section 12.5.
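The following Python sketch illustrates what a composite server-selection metric might look like; the weights, normalization, and data model are assumptions for illustration and are not taken from [11].

    # Toy composite server-selection metric in the spirit of combining a
    # path QoS measure with server load; weights and normalization are
    # assumed for illustration.

    def best_server(candidates, w_path=0.5, w_load=0.5):
        # candidates: list of (name, path_delay_ms, server_load_fraction)
        def score(c):
            _, delay_ms, load = c
            return w_path * (delay_ms / 100.0) + w_load * load  # lower wins
        return min(candidates, key=score)[0]

    # The nearer server loses here because it is heavily loaded.
    print(best_server([("east", 20, 0.9), ("west", 60, 0.2)]))  # 'west'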
12.4 VIRTUAL PRIVATE NETWORK AND QoS

Throughout this book we have discussed issues relating to providing end-to-end QoS in the Internet. In the foreseeable future, however, the virtual private network (VPN) seems to be the most likely vehicle for deploying a QoS based network. So far,
enterprises have been restricted to connecting their sites using leased lines, frame relay, or ATM virtual circuits. In recent years, IP based VPNs have started to emerge. Besides the savings on infrastructure cost and the network management benefits, IP based VPNs could also provide QoS guarantees to performance-sensitive flows. IP based VPNs must provide a service comparable to leased-line networks; they have to provide features such as closed user groups, security, and performance guarantees [12]. Several routers/switches with IP VPN support have appeared in the market. Fendick et al. [13] describe the architecture and features of a commercial IP switch designed specifically to meet the bandwidth demand and the need for differentiated services of the QoS-capable VPN. Balakrishnan and Venkateswaran [14] have looked at various technologies for an integrated services infrastructure and analyzed their ability to satisfy multiservice requirements for Intranet VPN applications. MPLS is emerging as a strong candidate for building very high-speed VPNs, as ATM interfaces are not available at high speeds such as OC-192. In addition, MPLS integrates well with the IP network, as we discussed in Chapter 10. Busschbach [15] describes two ways of implementing QoS-capable VPNs: one uses a pure ATM infrastructure; the second is based on MPLS and integrates well with IP transport over a variety of infrastructures.
12.5 CONTENT DISTRIBUTION NETWORK AND QoS

During the past several years, people have started using the Web for mainstream business. In many cases, these applications require performance guarantees from the network, and users are continuously requesting faster and more reliable access to such services. As the current Internet is unable to satisfy these performance requirements, a new architecture called the content distribution network (CDN) has become popular. Special streaming-media events, such as live sporting events (a one-day cricket match between India and Australia), fashion parades, concerts, interactive video, and e-commerce, are examples of applications driving CDNs. A CDN is basically an overlay network on the Internet, built specifically for high-performance delivery of content. A CDN achieves better performance by reducing the number of hops a Web request and the resulting response (content) must traverse. It takes advantage of a strategically located set of distributed caching, load balancing, and Web-request redirection systems. Cache servers (also called surrogates) replicating the developer's content are located within network edge points of presence (PoPs). Qiu et al. [16] have developed several placement algorithms
and studied their performance. These algorithms use workload information, such as client latency and request rates, to make informed placement decisions. A PoP is generally an IP network service provider's central office that connects end users to the Internet via access links. This overlay may result in having the content just one hop away from the end user. Johnson [17] has studied two commercial content distribution networks (CDNs) and provides some insight into their performance. In order to have a CDN interwork with the rest of the Internet, a content provider's Web site is redirected to the data center of the CDN provider. It is the job of the CDN to direct the request to the surrogate closest to the user (one that can provide better QoS); a toy redirection policy is sketched below. In addition, CDNs are making use of a new technology called Web switching or content switching. These techniques fall under the category of application layer switching, whereby routers and switches in the Internet take higher layer information, such as content type and user capability, into consideration when forwarding a datagram, rather than only the traditional layer 3 IP header. Industry groups such as the Content Alliance and the Content Bridge Alliance are working closely with the IETF to develop technical and business standards. For example, the content delivery network peering (CDNP) group is concerned with interoperability issues between CDN network providers. Details can be found in specialized texts such as Verma's [18].
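As a toy illustration of the redirection decision, the following Python sketch sends the client to the closest surrogate that still has capacity; all names, thresholds, and the data model are assumed.

    # Toy sketch of CDN request redirection: choose the closest surrogate
    # (by measured latency) that is not saturated; otherwise fall back to
    # the origin server. Assumed data for illustration only.

    def redirect(surrogates, max_load=0.9):
        usable = [s for s in surrogates if s["load"] < max_load]
        if not usable:
            return None                        # fall back to origin server
        return min(usable, key=lambda s: s["latency_ms"])["pop"]

    surrogates = [
        {"pop": "syd", "latency_ms": 12, "load": 0.95},
        {"pop": "mel", "latency_ms": 18, "load": 0.40},
    ]
    print(redirect(surrogates))  # 'mel': syd is closer but nearly saturated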
12.6 WEB QoS
Even though a network built using QoS architectures such as Intserv or Diffserv is capable of delivering network-level quality of service, applications such as Web transactions may not achieve end-to-end QoS if the Web servers use traditional FIFO scheduling of requests. Bhatti et al. [19, 20] propose an architecture for supporting server QoS. They add a process called the connection manager to the Apache Web server. This process works transparently to normal HTTP transactions by intercepting HTTP requests and classifying them based on configured policies. The classified requests are put into separate queues and scheduled based on preconfigured priority levels. Through experimental results, they have shown that this architecture is able to protect premium-grade requests from basic (best-effort) transactions. The connection manager also performs admission control and rejects requests if resources are not available. A sketch of this idea is shown below.
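The following Python sketch captures the connection-manager idea; the classification policy, queue limit, and API are assumptions for illustration and are not the authors' code.

    # Minimal sketch of a server-side connection manager: classify requests
    # by a configured policy, queue them per class, serve the premium queue
    # first, and reject new requests when the queues are full. All policies
    # and limits here are assumed.
    from collections import deque

    class ConnectionManager:
        def __init__(self, limit=100):
            self.queues = {"premium": deque(), "basic": deque()}
            self.limit = limit

        def classify(self, request):
            # Example policy: gold-tier customers get premium service.
            return "premium" if request.get("tier") == "gold" else "basic"

        def accept(self, request):
            # Admission control: reject when the server is saturated.
            if sum(len(q) for q in self.queues.values()) >= self.limit:
                return False
            self.queues[self.classify(request)].append(request)
            return True

        def next_request(self):
            # Strict priority: premium requests are always served first.
            for cls in ("premium", "basic"):
                if self.queues[cls]:
                    return self.queues[cls].popleft()
            return None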
12.7 BILLING AND CHARGING FOR QoS

The Internet has used a very simple model, where all packets are treated equally. In most cases users are charged only for the access link to the network. Many ISPs use the following equation to price user services:

Charge = a + bV + cT    (12.2)
where V is the volume downloaded, T is the connection time, and (a, b, c) describes the selected tariff [21]. Equation (12.2) gives the charge generated within a single connected session. The parameter a is any associated connection cost for the current session (for example, a dial-up ISP may charge a fee every time one dials in). The unit for V is typically megabytes. Many ISPs once favored hours as the unit for T, but some now use minutes or even seconds. It is usual for either b or c to be set to zero, though this is certainly not a requirement. The biggest reason that one of them is typically zero is marketing simplicity, so that users can easily understand the scheme; this also simplifies the accounting process. Most of the information related to accounting is collected using the remote authentication dial-in user service (RADIUS) [22]. The INDEX project [23] at Berkeley has found that flat charging wastes resources, subsidizes heavy users, and hinders the deployment of high-quality Internet service. With the introduction of QoS in the network, different billing/charging schemes will be required, as customers will demand varying grades of service. Economists use a term called Nash equilibrium, whereby users are expected to act selfishly to maximize their own satisfaction level, with the network reaching equilibrium when no user can unilaterally increase his or her satisfaction level. Cocchi et al. [24] have studied differential pricing schemes for multiple-service-class networks that can be chosen so as to produce a Nash equilibrium at optimum network efficiency. Many researchers have used an auction based approach in which a user bids in competition with all other users to obtain service from the network. A scheme called the smart market uses a second-price or Vickrey auction, whereby each packet carries a bid and is admitted if the bid exceeds the current marginal cost of transportation; the user finally pays the price of the highest unsuccessful bid (the second price) [25]. Several variants of this smart market scheme have also been proposed [26, 27, 28]. Although there are advantages to using dynamic pricing/charging schemes, they suffer from heavy accounting overhead. Researchers have shown that usage-based charging schemes can place substantial overhead on the system, depending upon the granularity of accounting required [29, 30]. A toy illustration of the smart market follows.
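The smart market can be illustrated in a few lines of Python; the capacity and bids are assumed numbers, and real proposals are, of course, far more elaborate.

    # Toy sketch of the smart market: each packet carries a bid, the top
    # bids up to link capacity are admitted, and every admitted packet
    # pays the highest losing bid (the second price).

    def smart_market(bids, capacity):
        ranked = sorted(bids, reverse=True)
        admitted = ranked[:capacity]
        price = ranked[capacity] if len(ranked) > capacity else 0.0
        return admitted, price

    admitted, price = smart_market([5.0, 3.0, 9.0, 1.0, 4.0], capacity=3)
    print(admitted, price)       # [9.0, 5.0, 4.0] admitted; each pays 3.0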
For the past few years there has been a continuing decline in computing and communication prices. This has led some to believe that the solution to the QoS problem is simply to provision an abundance of resources: if there is no resource shortage, there is no need for QoS. Many researchers feel that the Internet already has so much dark fiber, providing an abundance of bandwidth, that there is no need to complicate the network with dynamic charging [31]. Resource shortage, however, has been a perennial problem. Back in the 1980s, when memory was relatively expensive, a lot of research was done on memory management. In those days, some gurus were predicting that within 10 years memory would be cheap and we would not have to worry about memory management in computers. It turns out that cheap resources do not obviate the need for resource management. Today's computers have gigabytes of memory, yet memory management is still required; in fact, today's computers have many more levels of caches and memory hierarchy than early-era computers. The lesson is that having cheap bandwidth will not obviate the need for proper bandwidth management.
12.8 FINAL WORDS

QoS provisioning in the Internet has been a popular research and development area for over a decade. Despite the progress made in the past decade, we are far away from an Internet that supports QoS seamlessly over a variety of wired and wireless link technologies, across multiple network providers, between heterogeneous end systems. Switches and routers with QoS capabilities have been appearing in the market for the past few years. QoS support in the Intranet environment is being implemented using switched Ethernet. Last-mile problems are being solved with the introduction of technologies such as the digital subscriber loop (DSL) and cable modems. Network service providers are continually upgrading their core-network capacity using optical technologies such as wave division multiplexing (WDM). Connectivity to such core networks with some form of QoS guarantee will enable provisioning of end-to-end QoS. Whether a QoS capable Internet is feasible ultimately depends on whether there will be enough QoS-sensitive user applications and whether the market is willing to pay for them.
12.9 SUMMARY

Earlier we looked at the Intserv and Diffserv architectures and their relative deficiencies. This chapter discussed a new architecture that combines Intserv and Diffserv. We also gave a brief overview of other related topics: QoS routing, VPNs and QoS, billing and charging for QoS, and content distribution networks and QoS. Finally, the authors' view of the future Internet concludes this chapter and the book.
12.10 REVIEW QUESTIONS

1. Compare and contrast the Intserv and Diffserv architectures.
2. What are some of the challenges in providing Intserv over Diffserv?
3. How can end-to-end QoS signaling be achieved in an Intserv over Diffserv architecture?
4. What are the goals of QoS routing? Why is it hard to implement QoS routing in the Internet?
5. Why is QoS becoming important for resource discovery protocols?
6. What are the benefits of combining QoS routing with server load for resource discovery?
7. What are the benefits of an IP based VPN?
8. What is a content distribution network (CDN)?
9. What is the significance of the data center in a CDN?
10. Why is server QoS an important issue in providing end-to-end QoS?
11. Why do some researchers feel that charging differently for different services is not required in the Internet?
12. Can the QoS problem be solved by throwing lots of bandwidth at it?
References

[1] Y. Bernet, P. Ford, R. Yavatkar, F. Baker, L. Zhang, M. Speer, R. Braden, B. Davie, J. Wroclawski, and E. Felstaine. A framework for integrated services operation over diffserv networks. RFC 2998, Internet Engineering Task Force, November 2000.
[2] J. Wroclawski and A. Charny. Integrated service mappings for differentiated services networks. Internet Draft draft-ietf-issll-ds-map-01.txt, IETF, February 2001.
[3] A. Charny and Y. Le Boudec. Delay bounds in a network with aggregate scheduling. In Quality of Future Internet Services, First COST 263 International Workshop (QofIS 2000), pages 1–13, Berlin, Germany, September 2000. Springer.
[4] S. Chen and K. Nahrstedt. An overview of quality of service routing for next-generation high-speed networks: problems and solutions. IEEE Network, 12(6):64–79, November 1998.
[5] Q. Ma and P. Steenkiste. Quality-of-service routing for traffic with performance guarantees. In Andrew T. Campbell and Klara Nahrstedt, editors, The Fifth IFIP International Workshop on Quality of Service, Columbia University, New York, May 1997. IFIP WG6.1, Chapman and Hall.
[6] Z. Wang and J. Crowcroft. Quality of service routing for supporting multimedia communications. IEEE Journal on Selected Areas in Communications, 14(7):1465–1480, September 1996.
[7] G. Apostolopoulos, R. Gurin, S. Kamat, A. Orda, and S. K. Tripathi. Intradomain QoS routing in IP networks: a feasibility and cost/benefit analysis. IEEE Network, 13(5):42–53, September 1999.
[8] Steven E. Czerwinski, Ben Y. Zhao, Todd D. Hodes, Anthony D. Joseph, and Randy H. Katz. An architecture for a secure service discovery service. In ACM/IEEE International Conference on Mobile Computing and Networking, pages 24–35, Seattle, Washington, August 1999.
[9] Dongyan Xu, Klara Nahrstedt, and Duangdao Wichadakul. QoS-aware discovery of wide-area distributed services. In Proceedings of IEEE/ACM Int'l Symposium on Cluster Computing and the Grid (CCGrid 2001), Brisbane, Australia, May 2001.
[10] A. Nakao, L. Peterson, and A. Bavier. Constructing end-to-end paths for playing media objects. In Proceedings of the OpenArch'2001, Anchorage, Alaska, March 2001.
[11] Zhenghua Fu and Nalini Venkatasubramanian. Directory based composite routing and scheduling policies for dynamic multimedia environments. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium 2001, San Francisco, April 2001.
[12] R. Yuan and T. Strayer. Virtual Private Networks: Technologies and Solutions. Addison Wesley, New Jersey, 2001.
[13] K. W. Fendick, V. P. Kumar, T. V. Lakshman, and D. Stiliadis. The PacketStar 6400 IP switch — an IP switch for the converged network. Bell Labs Technical Journal, 3(4):32–47, October–December 1998.
[14] M. Balakrishnan and R. Venkateswaran. QoS and differentiated services in a multiservice network environment. Bell Labs Technical Journal, 3(4):222–239, October–December 1998.
[15] Peter B. Busschbach. Toward QoS-capable virtual private networks. Bell Labs Technical Journal, 3(4):161–175, October–December 1998.
[16] L. Qiu, V. Padmanabhan, and G. Voelker. On the placement of Web server replicas. In Proceedings of the Conference on Computer Communications (IEEE Infocom), volume 3, pages 1587–1596, Anchorage, Alaska, April 2001.
[17] K. L. Johnson, J. F. Carr, M. S. Day, and M. F. Kaashoek. The measured performance of content distribution networks. Computer Communications, 24(2):202–206, February 2001.
[18] D. Verma. Content Distribution Networks: An Engineering Approach. John Wiley and Sons, New York, 2001.
[19] Nina Bhatti and Rich Friedrich. Web server support for tiered services. IEEE Network, 13(5):64–71, September/October 1999.
[20] Nina Bhatti, Anna Bouch, and Allan Kuchinsky. Integrating user-perceived quality into Web server design. Computer Networks, 33(1-6):1–16, June 2000.
[21] M. Chalmers, S. Jha, and M. Hassan. Internet charging—behind the time$. In Proceedings of 9th IEEE International Conference on Networking, pages 116–121, Bangkok, Thailand, October 2001.
[22] C. Rigney. RADIUS Accounting. IETF RFC 2139, April 1997.
[23] R. Edell and P. Varaiya. Providing internet access: what we learn from INDEX. IEEE Network, 13(5):18–25, September 1999.
[24] R. Cocchi, et al. Pricing in computer networks: motivation, formulation and example. IEEE/ACM Transactions on Networking, 1(6):614–627, December 1993.
[25] J. MacKie-Mason and H. Varian. Pricing the Internet. In Public Access to the Internet, JFK School of Government, May 26–27, 1993, page 37, 1993.
[26] J. MacKie-Mason. A smart market for resource reservation in a multiple quality of service information network. Research paper, unpublished, University of Michigan, http://www-personal.umich.edu/~jmm/, September 1997.
[27] N. Semret, R. R.-F. Liao, A. T. Campbell, and A. A. Lazar. Pricing, Provisioning and Peering: Dynamic Markets for Differentiated Internet Services and Implications for Network Interconnections. IEEE Journal on Selected Areas in Communications, 18(12):2499–2513, 2000.
[28] N. Semret and A. A. Lazar. Spot and derivative markets in admission control. In Proceedings of 16th International Teletraffic Congress, pages 925–941, Edinburgh, UK, June 1999.
[29] S. Shenker, D. Clark, D. Estrin, and S. Herzog. Pricing in computer networks: reshaping the research agenda. ACM Computer Communications Review, 26(2):183–201, April 1996.
[30] J. MacKie-Mason and H. Varian. Pricing congestible network resources. IEEE Journal on Selected Areas in Communications, 13(7):1141–1149, 1995.
[31] A. Odlyzko. The economics of the Internet: utility, utilization, pricing, and quality of service. http://www.research.att.com/~amo/doc/internet.economics.pdf, October 1999.
About the Authors

Sanjay Jha ([email protected]) is an associate professor in the School of Computer Science and Engineering at the University of New South Wales, Sydney, Australia. Dr. Jha holds a Ph.D. from the University of Technology (UTS), Sydney, Australia. His research activities cover a wide range of topics in networking, including quality of service (QoS), the mobile/wireless Internet, and active/programmable networks. He has been working as an industry consultant for major organizations, such as Canon Research Lab (CISRA), Lucent, and Fujitsu. He was a visiting scientist at the DCC labs of Columbia University in 1995. In his previous job at UTS, he taught in the graduate program in internetworking (in collaboration with Cisco). He is a member of the advisory board of the International Journal of Network Management. More information on his research and teaching activities is available from http://www.cse.unsw.edu.au/~sjha.
Mahbub Hassan ([email protected]) is an associate professor at the University of New South Wales, Sydney, Australia. He is a member of the editorial advisory board of Computer Communications Journal and was an associate technical editor of IEEE Communications Magazine, 1999–2001. He chaired the SPIE International Conference on Quality of Service over Next Generation Data Networks, 2001 (Denver) and 2002 (Boston). He is the author of the book Performance of TCP/IP over ATM Networks (Artech House, 2000). He has published over 70 research publications. Dr. Hassan holds a Ph.D. from Monash University, Melbourne, Australia, and an M.Sc. from the University of Victoria, Canada. He is a senior member of IEEE. More information on his research and professional activities is available from http://www.cse.unsw.edu.au/~mahbub.
Index

1BASE-5 standard, 11 AAL. See Asynchronous transfer mode adaptation layer ABR. See Available bit rate Absolute timing, 17 ABT. See Asynchronous transfer mode block transfer Access point, 271–72 Acknowledgment, transmission control protocol, 83, 85–86, 90, 92, 94 Acknowledgment number, transmission control protocol, 83, 85–86 Adaptation layers, asynchronous transfer mode, 225–28 Adaptive audio playout, 18–19 Adaptive differential pulse code modulation, 9 Adaptive flow, 98, 99–100 Adaptive resource negotiation, 45 ADC. See Analog-to-digital converter Added variable delay method, 17–18 Additive increase and multiplicative decrease, 90, 92 Adequate resources, 36 Ad hoc network, 271, 273
Admission control controlled load, 110 differentiated services, 170, 185, 190 end-to-end measurement-based, 188–89 integrated services, 116–17 mobile networks, 286 overview, 35–36 policy-based, 116–17, 146–47, 160, 164, 198 research, 45 resource reservation protocol, 146–47, 158–60, 162–63 ADPCM. See Adaptive differential pulse code modulation ADSL. See Asymmetric digital subscriber line AdSpec, 135, 144 functional block, 150–151 object class, 149–150 Advertised window, 87, 90 AFC. See Aggregate flow control Aggregate flow control, 162, 189–90, 209, 260–61 Aggregate Route-Based IP Switching, 242, 247 Aggregate traffic contract, 170, 187
AIMD. See Additive increase and multiplicative decrease Alternate queuing, 69 ALTQ. See Alternate queuing AMRED-G technology, 189 Analog-to-digital converter, 5 Anchor chain mobility, 285 AP. See Access point Application-level adaptation adaptive audio playout, 18–19 destination wait method, 16–18 feedback control mechanism, 19 forward error correction, 20 interleaving, 20 repair at receiver, 20 Application-level data rate, 3 Application quality of service parameters, 3 ARIS. See Aggregate Route-Based IP Switching AR PDB. See Assured rate per-domain behavior Assured forwarding per-hop behavior, 172, 185–87, 189, 190, 235, 236, 296–97 Assured rate per-domain behavior, 172 Asymmetric digital subscriber line, 12 Asynchronous transfer, 214 Asynchronous transfer mode, 13, 45, 57, 241 adaptation layers, 225–28 cell formats, 218–21 connections, 215–18 interfaces, 218 Internet protocol integration, 228–37 multiprotocol label switching over, 254–55 protocol architecture, 214–15 quality of service support, 221–25 reasons for, 213–14 Asynchronous transfer mode adaptation layer, 225–28, 254–55 Asynchronous transfer mode block transfer, 161 Asynchronous Transfer Mode Forum, 223, 233 ATM. See Asynchronous transfer mode Audio compression, 9
Audioconferencing, 109 Audio transmission, in Internet, 25–26 Audio Video Interleave, 8 Authentication header, 162 Available bit rate, 187, 224, 225, 226 Average bandwidth, 164 Average rate, 32–33, 36 Average-rate policing, 38, 39–42 Average-rate shaping, 43 AVI. See Audio Video Interleave Backbone, asynchronous transfer mode to, 228–30 BA classification. See Behavior aggregate classification Balanced random early detection, 101 Bandwidth guaranteed flows, 164 wireless links, 280 Bandwidth broker, 170–71, 188, 196–97, 203–6 Base station controller, 273–74 BB. See Bandwidth broker BCS. See Behavior class selector Behavior aggregate classification, 171, 177 Behavior class selector, 223, 224 BER. See Bit error rate Best-effort network, 13–16, 26, 50, 98, 107, 108, 126, 164, 224, 296 BGP protocol, 118, 252–53, 254 Billing models, Internet service, 301–2 B-ISDN. See Broadband integrated services digital network Bit error rate, 4, 279 Blind delay method, 16 Blockage state, 160 Blue method, 100–1 Bluetooth, 273, 275 Bottleneck bandwidth, 297 BRED. See Balanced random early detection Bridged protocols, encapsulation, 230–31, 233 Bridges, local area network, 122–24 Broadband integrated services digital network, 213, 214 Bronze service, 235
Index BSC. See Base station controller Buffer occupancy estimation, 96 Buffers, 26, 44, 255 in leaky bucket, 38–39 space reservation, 35 in transmission control protocol, 86–87, 96, 100 See also Application-level adaptation Burst size, 32–33, 177, 222 Burst-size policing, 38, 39–42 Burst-size shaping, 43 Bursty traffic, random early detection for, 100 Byte-oriented protocol, 82. See also Transmission control protocol CAC. See Call admission control Cache memory, 9 Call admission control, 190 Call admission multicast protocol, 209 CAMP. See Call admission multicast protocol Capacity planning, differentiated services, 190 Care-of address, 276–78 CBQ. See Class-based queuing CBR. See Constant bit rate CBS. See Committed burst size CDN. See Content distribution network CDNP. See Content delivery network peering CDV. See Cell delay variation CDVT. See Cell delay variation tolerance CE. See Congestion experienced Cell delay variation, 223 Cell delay variation tolerance, 223 Cell interleave problem, 254–55, 256 Cell loss priority, 221 Cell loss ratio, 223 Cell switched router, 242 Cell transfer delay, 223 Cellular Internet protocol, 278–79 Cellular networks, 273–75 Central processing unit, 9–10 Checksum field, 81, 84, 147 CIR. See Committed information rate
CL architecture. See Clearing house architecture Class-based queuing, 68–69, 70–71, 93, 165, 169, 296–97 Class-based threshold random early detection, 101 Classifier, differentiated services, 177 Class of service field, 246–47 Clawback buffer, 18 Clearing house architecture, 209 Clock synchronization, 17 Close command, SCRAPI, 156 CLP. See Cell loss priority CLR. See Cell loss ratio CLS. See Controlled load service CL service class. See Controlled load service class CMIP. See Common management information protocol CN. See Correspondent node COA. See Care-of address Colocated care-of address, 276–78 Committed burst size, 177 Committed information rate, 177 Common management information protocol, 118 Common open policy service, 199, 200–1 Common open policy service, outsourcing, 201 Common open policy service, policy provisioning, 200, 201, 202, 205 Common part indicator, 227 Compression audio standards, 9 video standards, 5, 6–9 Congestion avoidance, 92, 98, 185. See also Random early detection Congestion control, 44, 87, 88–92, 93–94 Congestion experienced, 94 Congestion notification, explicit, 93–94, 98 Congestion window, 87, 90 Connectionless service, 12 delay jitter, 15–16 Internet protocol, 78 user datagram protocol, 81
312
Engineering Internet QoS
Connection-oriented service, 12, 78, 82, 85, 108, 112, 215. See also Asynchronous transfer mode; Integrated services; Transmission control protocol Conservation law, 50–51 Constant bit rate, 32, 141, 143, 178 adequate resources, 36 asynchronous transfer mode, 224, 225, 226, 235 guaranteed service, 110 policing parameters, 37 Constraint routed label distribution protocol, 258, 259–60, 297 Content Alliance, 300 Content Bridge Alliance, 300 Content delivery network peering, 300 Content distribution network, 299–300 Context-aware handoff, 283 Context transfer, 287 Contract. See Quality of service contract Contract, quality of service, 34 Controlled load service block, 151 Controlled load service class, 110, 117, 118, 119, 145, 154, 296–97 Convergence sublayer, 226 Copper cabling, 11 COPS. See Common open policy service Core state fair queuing, 72–74 Correspondent node, 276–78 COS field. See Class of service field Counters in deficit round robin, 59–62 in leaky bucket, 38–39, 40–42 CPI. See Common part indicator CPU. See Central processing unit CRC. See Cyclic redundancy check CS. See Convergence sublayer CSE packet classification, 69 CSFQ. See Core state fair queuing CSR. See Cell switched router CTD. See Cell transfer delay Cyclic redundancy check, 227 DAC. See Digital-to-analog converter DASH resource model, 161 Datagrams, Internet protocol, 78–81
DCLASS object, 164 DECbit, 93, 98 Deficit counter, 59–62 Deficit round robin, 58–62 Delay bound with weighted fair queuing, 66–68 Delay jitter, 13–16, 18, 19, 26 Delay time, 3, 10, 13–16, 26, 115 Dense wavelength division multiplexing, 11 Designated subnet bandwidth manager, 163 Desktop approach, asynchronous transfer mode to, 228–29 Destination address field, 80, 148 Destination port number, 81, 82, 148–49 Destination wait method, Montgomery’s, 16–18 Diameter, 198 Differentiated services, 1, 79 architecture, 169–75 assured service, 185–87 integrated services over, 293–97 open issues, 187–88 over asynchronous transfer mode, 234–36 over multiprotocol label switching, 265 premium service, 178–85 research directions, 188–90 resource reservation, 282 routers, 175–78 RSVP support for, 164–65 Differentiated services code point, 171, 172, 174–75, 178, 179, 185–86, 235, 265 Differentiated unspecified bit rate, 224 Diffserv. See Differentiated services Digital signal processing, 9 Digital subscriber loop, 302 Digital-to-analog converter, 5 Double slope random early detection, 101 Drop-based services, 186–88, 190, 296–97 Dropped packets. See Packet dropping Dropper, differentiated services, 178 Drop-tail packet dropping, 94–95 DRP. See Dynamic reservation protocol DRR. See Deficit round robin DSBM. See Designated subnet bandwidth manager
Index DSCP. See Differentiated services code point DSL. See Digital subscriber loop DSP. See Digital signal processing DSRED. See Double slope random early detection DTB. See Dynamic token bucket Dual leaky bucket, 38, 42 Duplex protocol, 82, 133 DVI standard, 9 DWDM. See Dense wavelength division multiplexing Dynamic negotiation, 34 Dynamic reservation protocol, 161 Dynamic resource allocation, 286 Dynamic service-level agreement, 282 Dynamic token bucket, 45 EBS. See Excess burst size ECN. See Explicit congestion notification ECT. See Explicit congestion notification capable transport EE packet classification, 69 EF. See Expedited forwarding Elastic applications, 108–9 EMBAC. See End-to-end measurementbased admission control Encapsulating security payload, 162 Encapsulation Internet protocol datagrams, 229–33, 278 multiprotocol label switching, 246 End systems, multimedia processing, 9–10 End-to-end delay, 3, 13–16 End-to-end latency, 26 End-to-end measurement-based admission control, 188–89 End-to-end quality of service, 13–14, 120, 126–27, 188 End-to-end service transparency, 188 Error specification object, 151 Ethernet, 11, 82, 120, 121, 125, 246, 271, 302 ETSI. See European Telecommunications Standards Institute European Telecommunications Standards Institute, 273
Event timing, operating systems and, 10–11 EWMA. See Exponential weighted moving average Excess burst size, 177 Expedited forwarding, 178–79, 184–85, 208, 235, 295–97 Expedited forwarding per-hop behavior, 178–79, 235, 295–97 Explicit congestion notification, 93–94, 98 Explicit congestion notification capable transport, 94 Exponential weighted moving average, 96 Extension header, 22 FA. See Foreign agent Fair queuing, 68, 72–74, 93, 98 Fair share resource allocation, 35–36, 50, 51–52, 57, 58 Fast Fourier transform, 11 Fast lookup, 45 Fast recovery, 92 FCFS model. See First come first served model FDDI. See Fiber distributed data interface FEC. See Forward error correction; Forwarding equivalence class Feedback control mechanism, 19 FF. See Fixed filter FFT. See Fast Fourier transform Fiber distributed data interface, 11, 246 Fiber-switch capable, 266 FIFO. See First in first out File transfer protocol, 69, 108 Filters, reservation, 137–41, 145, 151 Filter specification object, 151 Finish time, weighted fair queuing, 62–66 First come first serve, 52–54, 55, 244 First come first serve model, 13, 51 First in first out, 52–53, 120, 300 Fixed filter, 139–42, 259 Fixed priority scheduling, 11 Flags Internet protocol, 80 resource reservation protocol, 147 transmission control protocol, 84, 85, 94
Flow control aggregate, 189–90, 209, 260–61 differentiated services, 189–90 integrated services, 111–12, 113 transmission control protocol, 86–88 user-to-network interface, 218, 221 Flow label, 118 Flow specification integrated services, 113–16 resource reservation protocol, 145, 151, 154–56, 162 Flow specification object, 151 Fluctuations, quality of service, 283–84 Fluid model service, 114–15 FN. See Foreign network Foreign agent, 276–78 Foreign agent care-of address, 276–78 Foreign network, 276–78 Forward error correction, 20, 279–80, 283 Forwarding equivalence class, 243, 245, 247, 259, 260 FQ. See Fair queuing Fragment offset field, 80 Frame grabber, 5–6, 14–15 Frame loss, 122 FRED scheme, 101 FreeBSD platform, 69 Freeze time variable, 101 Freeze transmission control protocol, 287–88 FSC. See Fiber-switch capable FTP. See File transfer protocol Full-queue problem, 94 Generalized multiprotocol label switching, 265–66 Generalized processor sharing, 55, 57, 62, 72 General packet radio service, 127, 274–75, 285 Generic flow control, 218, 221 Generic quality of service, 156 GFC. See Generic flow control GFR. See Guaranteed frame rate Global positioning system, 17, 281 Global synchronization problem, 95
Global system for mobile communications, 20, 274–75 GMPLS. See Generalized multiprotocol label switching Gold service, 185, 235 GPRS. See General packet radio service GPS. See Generalized processor sharing; Global positioning system GQoS. See Generic quality of service GSM. See Global system for mobile communications Guaranteed frame rate, 225 Guaranteed service class, 110–11, 114–19, 145, 154–55, 164 Guaranteed service functional block, 151 Guarantees asynchronous transfer mode, 221–25, 235 per-hop behavior, 171, 295–96 quality of service, 34, 35, 49, 50 H.261 standard, 7–8, 19 HA. See Home agent Handoff, 279, 280, 283, 285–86, 287 Handshake, three-way, 85–86 Hashing-based packet classification, 35, 127 HDTV. See High-definition television Header checksum field, 80 Header error correction, 221 Header length field, 79, 83–84 Headers Internet protocol, 79–81, 173–74, 260 real-time protocol, 20–22, 24 resource reservation protocol, 147–48 transmission control protocol, 169, 260 user datagram protocol, 81–82 HEC. See Header error correction Hideable fluctuations, 283–84 Hierarchy routing, 252–54 High-definition television, 8 High-priority thread, 11 Hiperlan, 273, 275 HMAC algorithm, 201 HN. See Home network Home agent, 276–78, 279 Home network, 276–78
Hop-by-hop messaging, 144–45, 151, 163
Human visual system, 7
Hypertext transfer protocol, 300
IANA, 175
ICMP. See Internet control message protocol
Identifier field, 80
IEEE 802.1D standard, 121–23, 162, 164
IEEE 802.1Q standard, 121, 123–24
IEEE 802.3 standard, 11, 121
IEEE 802.11 standard, 11, 120–21, 281
IEEE 802.11b standard, 271–73, 275
IETF. See Internet Engineering Task Force
IGP protocol, 118, 252–54
Illegal packet, 36
Indeo, 8
INDEX project, 301
Indirect transmission control protocol, 287
Instantaneous bandwidth, 164
Instrumentation, scientific and medical band, 271, 273
Integrated services, 1, 35, 108
  application classification, 108–9
  flow definition, 111
  local area network gateway, 120–25
  over asynchronous transfer mode, 234, 235
  over differentiated services, 293–97
  problems, 125–26
  research directions, 126–28
  reservations specs, 113–16
  resource reservation, 281–82
  router components, 116–20
  routing protocol, 112–13
  service classes, 109–11
  signaling/flow setup, 111–12
Integrated services over specific link layer, 160
Interactive multimedia, 26
Interarrival jitter, 22, 24–25
Interdomain communication management, 196–97
Interleaving, 20
International Telecommunication Union
  ATM recommendation, 213–15
  audio and video standards, 7–9
  Recommendation H.261, 7–8, 19
Internet
  billing models, 301–2
  quality of service provisioning, 302
Internet2 working group, 206–8
Internet control message protocol, 77
Internet Engineering Task Force, 1, 20, 35, 78, 79, 93–94, 107, 108, 137, 169, 178, 188, 195, 202, 208, 258, 259, 266, 287, 293, 300
Internet Engineering Task Force differentiated services working group, 174, 265
Internet Engineering Task Force MPLS working group, 242, 243
Internet protocol
  ATM integration, 228–33
  cellular, 277, 278–79
  datagram format, 78–81
  datagram forwarding, 78
  layers and functions, 77–78
  mobile, 276, 278, 285
  mobile services over, 275–79
  packet classification, 69
  quality of service mapping, 233–37
  version 4, 54, 78–81, 93–94, 118, 148, 149, 172–73, 175, 245, 285
  version 6, 78, 118, 148, 149, 174, 246
Internet protocol-based virtual private network, 299
Internet service provider, 282, 301
Internet telephony, 1, 26, 81–82
Intserv. See Integrated services
IP. See Internet protocol
IPSEC, 201
IS/DS. See Integrated services, over differentiated services
ISM. See Instrumentation, scientific and medical band
ISP. See Internet service provider
ISSLL. See Integrated services over specific link layer
I-TCP. See Indirect transmission control protocol
ITU. See International Telecommunication Union
Jitter, 3, 10, 13–16, 18, 19, 178, 189, 283
  delay, 13–16, 18, 19, 26
  interarrival, 22, 24–25
Joint photographic experts group, 7–8, 10
JPEG. See Joint photographic experts group
Killer reservation problem, 160
Label allocation, 249–50
Label binding, 248–49, 255
Label distribution, 247–52
Label distribution protocol, 247, 258–60
Label edge router, 243, 245, 265
Label encoding, 245–46
Label processing, 246–47
Label stack, 253–54
Label switched path, 258–61, 263–64, 265
Label switched router, 241, 243, 247–52, 253–55, 263–64
Label switching, 241–43, 250–52. See also Multiprotocol label switching
Lambda switch capable, 266
Layered media encoding, 137
LDAP. See Lightweight directory access protocol
LD-CELP, 25
LDP. See Label distribution protocol
Leaky bucket algorithms, 37–38
  dual, 42
  research on, 45
  simple, 38–39
  token, 39–41
Least upper bound, 136
Length field, header, 81
LER. See Label edge router
Lightweight directory access protocol, 202
Linear predictive coding, 9
Link bandwidth reservation, 35
Link length, real-time protocol, 26
Link speed, real-time protocol, 26
Linux-based bandwidth broker implementation, 203–6
Linux-based evaluation, premium service, 179–85
Linux-based router, 69, 70–71
Lip synchronization, 2–3
LLC. See Logical link control encapsulation
Local area networks, 12
  asynchronous transfer mode, 228
  and integrated services, 120–25
  and resource reservation protocol, 162–64
  wireless, 271–73
Local policy decision point, 199, 201
Lockout problem, 94
Logical link control encapsulation, 229–31, 233
LPDP. See Local policy decision point
LSC. See Lambda switch capable
LSP. See Label switched path
LSR. See Label switched router
LUB. See Least upper bound
Management agent, integrated services, 118
Management information base, 202
Manual admission control, 36
Mapping
  Internet protocol-ATM, 233–37
  integrated services, 124–25
  integrated services/differentiated services, 294–95
Marker, differentiated services, 177, 178, 190
Marker header field, 22
Maximum burst size, 222
Maximum cell transfer delay, 223
Maximum frame size, 223
Maximum segment size, 82, 85, 90
Maximum threshold pointer, 95, 97, 98, 99
Maximum transfer unit, 12, 82, 113, 150
Max-min fair share, 51–52, 57
MBS. See Maximum burst size
MCR. See Minimum cell rate
Mean time between failures, 4
Mean time to failure, 4
Mean time to repair, 4
Media filter, 136–37
Media timestamp, 24
Medium access, 12, 281
Merger, reservation, 135–37, 142–43
Meter, differentiated services, 177, 178
MF. See Multifield classification
MFS. See Maximum frame size
MH. See Mobile host
MIB. See Management information base
Microsoft Windows, 10
Minimum cell rate, 222, 224, 225
Minimum threshold pointer, 95, 97, 98, 99
MIP-LR. See Mobile Internet protocol with location registers
MN. See Mobile node
Mobile applications, 269–71
  adaptivity, 283–84
  managing quality of service, 281–84
  over Internet protocol networks, 275–79
  research directions, 284–88
Mobile host, 271–72, 273–74, 279, 282, 285
Mobile Internet protocol, 276–78, 285
Mobile Internet protocol with location registers, 285
Mobile node, 276–78
Mobile phones
  limitations, 280–81
  market for, 270
Mobile resource reservation protocol, 281–82
Mobile transmission control protocol, 287
Mobile wireless networks
  Bluetooth, 273
  cellular, 273–75
  local area, 271–73
  network comparison, 275
Mobility, quality of service and
  movement effect, 280
  portable devices limitations, 280–81
  wireless link effect, 279–80
Mobiware project, 284
Montgomery’s destination wait method, 16–18
Motion compression, 7
Motion joint photographic experts group, 7–8
Motion JPEG. See Motion joint photographic experts group
Motion picture experts group, 8
Motion picture experts group 1, 8, 20
Motion picture experts group 2, 8, 20
Motion picture experts group 4, 8
Movement, user, 280
MPEG. See Motion picture experts group
MPEG-1. See Motion picture experts group 1
MPEG-2. See Motion picture experts group 2
MPEG-4. See Motion picture experts group 4
MPLS. See Multiprotocol label switching
MRSVP. See Mobile resource reservation protocol
MSPEC, 281–82
MSS. See Maximum segment size
MTBF. See Mean time between failures
M-TCP. See Mobile transmission control protocol
MTTF. See Mean time to failure
MTTR. See Mean time to repair
MTU. See Maximum transfer unit
Multicasting, 22, 111, 134, 135, 137, 142–43, 149, 160, 209
Multifield classification, 177, 178
Multimedia applications
  capacity increases, 11–12
  handoff, 280, 283
  mobile support, 269–70
  quality of transmission, 2–3, 280
Multiple queues, 121, 123–24
Multiprotocol label switching, 165, 236–37, 241–42
  basics, 243–44
  development factors, 242
  developments, latest, 265–66
  label distribution, 247–52
  over asynchronous transfer mode, 254–55
  routing, 245–47, 252–54
  traffic engineering, 255–64
  virtual private network, 299
Multiprotocol lambda switching, 165
Nash equilibrium, 301
National Television Standards Committee signal, 5, 9
NE. See Network element
Negative acknowledgment packet, 19
Net news, 108
Network element, 110, 112, 113, 114, 115, 117, 134, 135, 137, 149, 151, 152, 164
Networking media, 11–12
Network interface card, 124
Network-level quality of service parameters, 3–4
Network service provider, 252–53
Network time protocol, 17
Network-to-network interface, 218, 220
NIC. See Network interface card
NNI. See Network-to-network interface
Nonconforming packet, 38, 40, 41, 42, 117, 118, 296
Nonhideable fluctuation, 283–84
Non-real-time variable bit rate, 224
Non-work-conserving scheduler, 50–51
NSP. See Network service provider
NTP. See Network time protocol
NTP timestamp, 24
NTSC. See National Television Standards Committee
Object classes, resource reservation protocol, 148–52, 164
One-pass model, 134–35
One-pass with AdSpec, 135
Operating systems, 10–11, 26
Optical cross-connect, 165
Option field, 80, 85
OPWA. See One-pass with AdSpec
Organizationally unique identifier, 230–31, 233
OS. See Operating system
OSPF protocol, 77, 248, 254
OUI. See Organizationally unique identifier
OXC. See Optical cross-connect
Packet-by-packet generalized processor sharing, 62
Packet classification, 34–35, 45, 68–69
  integrated services, 117–18, 127
  resource reservation protocol, 158
Packet delay, 44
Packet dropping, 36, 50, 77, 78, 93
  differentiated services, 169–70, 178, 185–87, 189, 265
  drop-tail scheme, 94–95
  integrated services, 117, 118
  multiprotocol label switching, 246
  probability equation, 97–98
  research directions, 100–101
  transmission control protocol, 94–100
  See also Drop-based services
Packet flow, 50
Packet loss detection, transmission control protocol, 88–89
Packet loss information, in tracking congestion, 25
Packet loss rate, 4, 44, 50, 77, 109
Packet processing, integrated services, 118
Packet scheduling
  differentiated services, 185–87, 190
  goals, 49–52
  integrated services, 118
  mobile networks, 286
  multimedia processing, 10–11
  packet-switched network, 43–44
  resource reservation protocol, 158–60
  versus queue management, 93
Packet scheduling techniques
  deficit round robin, 58–62
  delay bound with WFQ, 66–68
  first come first serve, 13, 52–54, 55
  generalized processor sharing, 55–57
  priority queuing, 54–55
  research directions, 72–74
  round robin, 57–58
  virtual clock, 68
  weighted fair queuing, 62–68
  weighted round robin, 58
Packet serialization delay, 115
Packet service order, 50
Packet-switch capable, 266
Packet-switching network, 31, 43–44, 214
Packet voice gateways, 208–9
Padding field, 81
Pad field, 227
PAN. See Personal area network
Pandora system, 18
Passive reservation, 282
Path error message, 146
Path messages, 144, 146, 149, 152–53, 157, 158, 162, 163, 203, 259, 295
Path pinning, 160
Path resource reservation protocols, 258–60
Path tear message, 146, 162
Payload type field, 22, 81, 221, 227, 228
PCM. See Pulse code modulation
PCR. See Peak cell rate
PDA. See Personal digital assistant
PDB. See Per-domain behavior
PDL. See Policy description language
PDP. See Policy decision point
PDU. See Protocol data unit
PE. See Policy element
Peak cell rate, 222, 224
Peak rate, 32–33, 36
Peak-rate policing, 37, 38–39, 42
Peak-rate shaping, 43
PEP. See Policy enforcement point
Per-call resource reservation, 35
Percentage of time available, 4
Perceptual quality of service parameters, 2–3, 4
Per-domain behavior, 172
Per-hop behavior, 171, 175, 177, 187, 234, 235, 265, 295–96
  assured forwarding, 172, 185–87, 189, 190, 235, 236, 296–97
  encoding, 175, 176
  expedited forwarding, 178–79, 235, 295–97
Per-hop domain encoding, 175
Permanent virtual circuit, 216–18
Personal area network, 273
Personal digital assistant, 273, 280–81
PGPS. See Packet-by-packet generalized processor sharing
PHB. See Per-hop behavior
PHOP. See Previous hop
PIB. See Policy information base
PID. See Protocol identifier
Piggybacking, 83
Playout time, 16–18
PO. See Pushout
Point of presence, 299–300
Point-to-point communication, 295
Point-to-point protocol, 246
Policing. See Traffic policing
Policy, definition of, 198
Policy-based admission control, 116–17, 146–47, 160, 164, 198
Policy-based management, 195
  bandwidth broker, 203–6
  framework, 198–99
  Internet2, 206–8
  policy database, 202
  policy rules and representations, 201–2
  protocols, 200–1
  research directions, 208–9
  and resource reservation protocol, 202–3
Policy-based resource allocation, 255
Policy data object, 152
Policy decision point, 198–99, 200–1, 203, 205, 206
Policy description language, 209
Policy element, 203
Policy enforcement point, 198–99, 200–1, 202–3, 205, 206
Policy information base, 202, 203
Policy rule class, 202
Policy rule instance, 202
PoP. See Point of presence
PPP. See Point-to-point protocol
PQ. See Priority queuing
PRC. See Policy rule class
Premium service
  differentiated services, 178–85
  QBone, 208
Previous hop, 144
PRI. See Policy rule instance
Pricing, differentiated services, 186–88, 190
PRID. See Provisioning instance identifier
Priority capping, 11
Priority downgrades, 36, 43
Priority field, local area network, 121–24
Priority mapper, local area network, 122
Priority queuing, 54–55, 56, 119, 121, 178–79, 181, 188–89
  local area network, 121–24, 164
Priority scheduler, local area network, 122
Propagation delay, 26
Protocol data unit, 225, 226, 227
Protocol field, 80
Protocol identifier, 230–31, 233
Provisioning instance identifier, 202
Proxy agent, 282
PSC. See Packet-switch capable
Pulse code modulation, 9, 20
Pushout, 189
PVC. See Permanent virtual circuit
Q.2931 standard, 112
QBone, 208
QBone premium service, 208
QBone scavenger service, 208
QBSS. See QBone scavenger service
QoS. See Quality of service
QPS. See QBone premium service
Quality of service
  best effort, 13–16
  framework, 1–4
  parameters, 3–4, 34
  translation, 4–5
Quality of service contract, 34
Queue management
  balanced random early detection, 101
  Blue method, 100–1
Queue management, transmission control protocol, 93
  explicit congestion notification, 93–94
  global synchronization, 95
  packet drop schemes, 94
  random early detection, 95–100
  weighted random early detection, 98
Queuing, 43–44. See also Priority queuing; Scheduling
Queuing delay, worst case, 114–16
QuickTime, 8
RADIUS, 198, 301
Random early detection, 93, 117, 181, 183
  balanced, 101
  basics, 95–98
  differentiated services, 185
  problems with, 99–100
  selective pushout with, 189
  weighted scheme, 98
Random early detection with in/out, 98–99, 189
Random exponential marking, 101
RAP. See Resource allocation protocol
RAPI. See Resource reservation protocol application programming interface
RAS. See Rate adaptive shaper
Rate adaptation, 109
Rate adaptive shaper, 190
RDBMS. See Relational database management system
Real-time applications
  delay and jitter, 15
  embedded, 10
  integrated services, 109
  tolerant and intolerant, 109, 110
Real-time control protocol, 161
  audio transmission example, 25–26
  congestion control, 93
  jitter calculation, 24–25
  overview, 22–24
Real-time protocol, 69
  in audio transmission, 26
  congestion control, 93
  overview, 20–22
Real-time variable bit rate, 224
Receiver command, 156
Receiver-oriented resource reservation protocol, 134, 158
Receiver window size field, 84
Reception report, 23
RED. See Random early detection
Refresh. See Soft state refresh
Refresh messages, 162
Relational database management system, 202
REM. See Random exponential marking
Remote-queuing multiple access, 286
Reno algorithm, 88, 92
Repair at receiver, 20
Request for comment 701, 172
Request for comment 791, 173
Request for comment 793, 85
Request for comment 1323, 85
Request for comment 1349, 172
Request for comment 1483, 229
Request for comment 1826, 162
Request for comment 1827, 162
Request for comment 1889, 20, 24, 25
Request for comment 2002, 276
Request for comment 2212, 114
Request for comment 2379, 237
Request for comment 2380, 237
Request for comment 2381, 237
Request for comment 2474, 169, 174
Request for comment 2475, 169, 195, 196
Request for comment 2481, 93–94
Request for comment 2598, 178
Request for comment 2698, 177
Request for comment 2748, 200
Request for comment 2963, 190
Request for comment 3084, 200
Request for comment 3086, 172
Request for comment 3198, 195
Request specification, integrated services, 114–16
Reservation confirmation object, 152
Reservations. See Resource reservation
Reservation setup, resource reservation protocol, 134–35
Reserve confirmation message, 146
Reserved bit, 84
Reserve error message, 146
Reserve messages, 144–45, 146, 158, 162, 203, 259, 295
Reserve tear message, 146, 162
Resource allocation, differentiated services, 190
Resource allocation protocol, 160, 188
Resource allocation protocol working group, 198–99
Resource discovery, 297–98
Resource management cell, 224
Resource reservation
  differentiated services, 282
  integrated services, 113, 126–27, 281–82
  mobile environments, 280, 281–82, 285
  multiprotocol label switching, 258–60
  overview, 35, 36
  research on, 45
Resource reservation protocol, 35, 66, 112
  APIs, 156–58
  extensions, 161–65
  features, 133–35
  message format, 147–56
  message types, 144–47
  other protocols, 160–61
  policy-based management, 202–3
  problems, 158–60, 161
  reservation merger, 135–37, 142–43
  reservation styles, 137–43
Resource reservation protocol application programming interface, 156–58
Resource reservation protocol hop object, 151
Resource reservation protocol/ns, 141–43
Resource reservation protocol, traffic engineering, 258–59
Resources, adequate, 36
Response time, 3
Resv messages. See Reserve messages
Retransmission timer, 89
RFC. See Request for comment
RIO. See Random early detection with in/out
RIP protocol, 77, 248
RM cell. See Resource management cell
Round robin scheduling, 57–62
Round-trip delay measurement, 17
Routed protocols, encapsulation, 230, 233
Route pinning, 144, 160
Routers/routing
  asynchronous transfer mode, 241
  differentiated services, 169–70, 175–78
  hierarchy, 252–54
  integrated services, 112–13, 116–20, 124, 125, 127
  Internet protocol, 78, 244
  research directions, 297
  resource reservation protocol, 134, 144, 147, 158–60, 164–65
  wireless networks, 286–87
RQMA. See Remote-queuing multiple access
RR. See Reception report
RR scheduling. See Round robin scheduling
RSpec, 145
RSVP. See Resource reservation protocol
RSVP TE. See Resource reservation protocol, traffic engineering
RTCP. See Real-time control protocol
RTP. See Real-time protocol
RTT estimation, 89–90
SAR sublayer. See Segmentation and reassembly sublayer
SBM. See Subnet bandwidth manager
Scalability, 125, 169
Scalable multipath aggregated routing, 127
Scalable resource reservation protocol, 161
SCFQ. See Self-clocked fair queuing
Scheduler, work-conserving and non-work-conserving, 50–51
Scheduling. See Packet scheduling; Packet scheduling techniques
Scope object, 151
SCR. See Sustainable cell rate
SCRAPI. See Simplified resource reservation protocol application programming interface
SDS. See Service-discovery service
Security, 162, 187
Segment, transmission control protocol, 82–85
Segmentation and reassembly sublayer, 226
Selective pushout with random early detection, 189
Self-clocked fair queuing, 72
Self-similarity, 20
Sender command, 156
Sender template object, 152
Sequence number field, 83
Sequence number header field, 22
SE reservation. See Shared explicit reservation
Serialization delay, 41, 115
Serial link, 44
Serial transmission, 41
Service discovery framework, 297–98
Service-discovery service, 298
Service-level agreement, 4, 187, 189, 195–96, 197, 209, 282
Service-level objective, 196
Service level specification, 169, 170, 196, 295
Service rate, request specification, 114
Session objects, 148–49
Session reservation protocol, 161
SFB. See Stochastic fair blue
Shadow cluster concept, 285–86
Shaper module, 178, 190
Shared-explicit-filter reservation, 138–39
Shared explicit reservation, 259
Shared media, 160
Shim layer/header, 265
Shortest-path routing protocol, 258, 260
Shortest-widest path algorithm, 297
SIBBS. See Simple interdomain bandwidth broker signaling protocol
Signaling, quality of service, 34
Signaling protocol
  integrated services, 111–12, 125
  resource reservation protocol, 162–63
Silver service, 185, 235
Simple interdomain bandwidth broker signaling protocol, 197
Simple leaky bucket, 38–39
Simple network management protocol, 118, 202
Simplex resource reservation protocol, 133, 160
Simplified resource reservation protocol application programming interface, 156–58
SLA. See Service-level agreement
Slack term, request specification, 114, 115–16
Sliding window, 87–88
SLO. See Service-level objective
Slow start, transmission control protocol, 90, 91, 95
SLS. See Service level specification
SMART. See Scalable multipath aggregated routing
Smart market scheme, 301
SMI. See Structure of management information
SNAP. See Subnetwork attachment point
SNMP. See Simple network management protocol
Snooping transmission control protocol, 287
Soft connection-orientation, 108, 112
Soft state refresh, 135, 147, 160, 162, 259
Source field, header, 80, 81
Source port number, 82
SPRED. See Selective pushout with random early detection
SRED. See Stabilized random early detection
SRP. See Scalable resource reservation protocol; Session reservation protocol
SSRC. See Synchronization source identifier
ST. See Stream protocol
Stabilized random early detection, 101
Staged refresh timer, 162
Start time fair queuing, 72
Static configuration, service negotiation, 34
Status command, 156–57
ST-II. See Stream protocol-II
Stochastic fair blue, 101
Stream protocol, 160
Stream protocol-II, 112, 160
Structure of management information, 202
Style object, 151
Subnet bandwidth manager, 162–63
Subnetwork attachment point, 229–31
Sustainable cell rate, 222
SVC. See Switched virtual circuit
Switched virtual circuit, 216–18, 241
Switches, local area network, 122–24
Synchronization source identifier, 22, 23, 24
Synchronized clock, 17
System-level quality of service parameters, 3
Systems-level data rate, 3
Tag Distribution Protocol, 247
Tahoe algorithm, 88, 92
TCA. See Traffic conditioning agreement
TCM. See Three-color marker
TCP. See Transmission control protocol
TDM. See Time division multiplexing
TDP. See Tag Distribution Protocol
Teletraffic tapper, 203
Telnet, 69, 108, 124
Third-generation cellular networks, 274–75
Three-color marker, 177, 190
Three-way handshake, 85–86
Time division multiplexing, 68, 115, 266
Timefly, 284
Timeout period, 89, 92
Timeouts, in packet loss detection, 88–89, 92
Time-sharing operating system, 10–11
Timestamp, 22, 24, 85
Time to live, 80, 245, 246
Time value object, 151
Token bucket, 38, 39–42, 43, 45, 66–67, 111, 113, 114, 118, 149, 296
Token ring network, 120
TOMTEN, 284
ToS field. See Type of service field
Total length field, 80
Traffic conditioning, differentiated services, 170, 178
Traffic conditioning agreement, 196
Traffic contract, asynchronous transfer mode, 222
Traffic control, integrated services, 119
Traffic engineering, multiprotocol label switching, 255–64
Traffic parameters, 32–33
Traffic patterns, asynchronous transfer mode, 222–23
Traffic policing, 36
  algorithms, 37–38
  dual leaky bucket, 42
  integrated services, 117, 118
  parameters, 37
  requirements, 36–37
  research on, 45
  simple leaky bucket, 38–39
  token bucket, 39–41
  versus traffic shaping, 42–43
Traffic shaping, 42–43, 45
  integrated services, 117, 118
Traffic sources, types of, 31–32. See also Constant bit rate; Variable bit rate
Traffic specification
  integrated services, 113–14, 117, 134
  resource reservation protocol, 134–35, 144, 145, 149
Traffic specification object, 149
Traffic trunking, 260–64
Transaction rate, 3
Translation, service parameters, 4–5
Transmission control protocol, 20, 44, 52, 73
  acknowledgment, 86
  basics, 82
  best effort traffic, 164
  congestion control, 88–92, 93–94
  differentiated services research, 189–90
  flow control, 86–88
  functions, 108–9
  queue management, 93–100
  segment format, 82–85
  three-way handshake, 85–86
  traffic trunking, 261–64
  wireless networks, 287–88
Transmission control protocol/Internet protocol, 25, 77, 169, 202, 226, 228–29
  ATM integration, 228–37
Transport layer, wireless network, 287–88
Trunk, traffic, 260–64
TTL. See Time to live
TTT. See Teletraffic tapper
TULIP project, 288
Twisted-pair cable, 11
Type of service field, 54–55, 66, 243
  Internet protocol, 79, 172–74, 175
UBR. See Unspecified bit rate
UDP. See User datagram protocol
UMTS. See Universal mobile telecommunication system
UNI. See User-to-network interface
Unicast communication, 22, 134, 149
Universal mobile telecommunication system, 285
UNIX, 10, 11, 15, 69, 156
Unspecified bit rate, 224, 225, 226, 235
Urgent pointer, 84
User datagram protocol, 20, 26, 52, 73, 189
  congestion control, 93
  header fields, 81–82
  real-time applications, 109
  resource reservation, 144
  traffic trunking, 261–64
User mobility, 280
User perception parameters, 2–3, 4
User-to-network interface, 218–19, 223
User-to-user field, 227
UTP protocol, 11
Variable bit rate, 32
  adequate resources, 36
  asynchronous transfer mode, 224, 225, 226
  guaranteed service, 110
  policing parameters, 37
VBR. See Variable bit rate
VC. See Virtual channel
VCI. See Virtual channel identifier
VDSL. See Very high-speed digital subscriber line
Vegas algorithm, 88, 92
Version field, IP header, 79
Very high-speed digital subscriber line, 12
Vickrey auction, 301
Video compression, 5, 6–9
Videoconferencing, 5–6, 139–40
Video-on-demand applications, 1
Video-phone service, 187
Videostreaming, 109, 111
Virtual buffer, 39
Virtual channel, 216–18, 235–36, 254–55, 257, 261
Virtual channel identifier, 221, 246, 254–55, 265
Virtual channel multiplexing, 229–30, 231–33
Virtual clock, 67, 68
Virtual leased line, 178, 208
Virtual local area network, 121, 123, 124
Virtual path, 216–17
Virtual path identifier, 221, 246, 254–55, 265
Virtual private network, 241, 298–99
Virtual resource mapping, 127–28
Virtual transmission engine, 179–80, 184
Visual telephony, 8
VLAN. See Virtual local area network
VP. See Virtual path
VPI. See Virtual path identifier
VPN. See Virtual private network
VTE. See Virtual transmission engine
Wave division multiplexing, 302
WDM. See Wave division multiplexing
Weighted fair queuing, 62–68, 72, 118, 119, 125, 185, 186, 188, 190
Weighted random early detection, 98, 99
Weighted round robin, 58, 186
WFQ. See Weighted fair queuing
Wide area network, 12, 228
Wildcard-filter reservation, 137–38
Wireless links, quality of service and, 279–80
Wireless link speed, 11–12
Wireless local area network, 271–73
Work-conserving scheduler, 50–51
World Wide Web browsing, 108
World Wide Web quality of service, 300
Worst-case fair weighted fair queuing, 72
WRED. See Weighted random early detection
WRR. See Weighted round robin
YESSIR. See Yet another sender session Internet reservations
Yet another sender session Internet reservations, 161
Zero-size buffer, 38