Delivering Voice over IP Networks
Second Edition

Daniel Minoli
Emma Minoli

Wiley Publishing, Inc.
Publisher: Robert Ipsen
Editor: Margaret Eldridge
Assistant Editor: Adaobi Obi
Managing Editor: Angela Smith
New Media Editor: Brian Snapp
Text Design & Composition: North Market Street Graphics

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

This text is printed on acid-free paper. ∞

Copyright © 2002 by Dan Minoli, Emmanuelle Minoli. All rights reserved.

Published by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspointe Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail:
[email protected].

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.

Library of Congress Cataloging-in-Publication Data:

Minoli, Daniel
Delivering voice over IP networks / Dan Minoli, Emma Minoli.—2nd ed.
p. cm.
ISBN 0-471-38606-5
1. Internet telephony. 2. TCP/IP (Computer network protocol). 3. Digital telephone systems. 4. Computer networks. 5. Data transmission systems. I. Minoli, Emma. II. Title.
TK5105.8865 .M57 2002
621.385—dc21 2002071368

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic versions. For more information about Wiley products, visit our web site at www.wiley.com.

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
Contents

PREFACE xiii
ACKNOWLEDGMENT xv
ABOUT THE AUTHORS xvii

Chapter 1: Introduction and Motivation 1
1.1 INTRODUCTION 1
1.2 DRIVERS FOR VOICE OVER IP 6
    THE NEGATIVE DRIVERS 12
1.3 APPROACHES FOR IP-BASED VOICE SYSTEMS 14
    VOICE SERVERS APPROACH 15
    IP VOICE AND VIDEO PHONES 18
1.4 THE FUTURE 18
REFERENCES 18

Chapter 2: An Overview of IP, IPOATM, MPLS, and RTP 21
2.1 INTRODUCTION 21
2.2 INTERNET PROTOCOL 24
    THE ROLE OF THE IP 24
    IP ROUTING 26
    IP DATAGRAMS 29
    SUPPORT OF VOICE AND VIDEO IN ROUTERS 32
    IP VERSION 6 (IPV6) 33
2.3 IP OVER ATM (IPOATM) 36
2.4 BASIC SYNOPSIS OF MPLS 39
    MPLS FORWARDING/LABEL-SWITCHING MECHANISM 41
    MPLS LABEL-DISTRIBUTION MECHANISM 43
2.5 REAL-TIME TRANSPORT PROTOCOL (RTP) 45
2.6 RTP CONTROL PROTOCOL (RTCP) 50
2.7 STREAM CONTROL TRANSMISSION PROTOCOL (SCTP) 52
2.8 ATM QOS MECHANISMS 54
    QUALITY OF SERVICE PARAMETERS 56
    QOS CLASSES 57
REFERENCES 59
NOTES 61

Chapter 3: Issues in Packet Voice Communication 63
3.1 INTRODUCTION 63
    SCOPE 64
    SUMMARY OF RESULTS 65
3.2 TRAFFIC MODELS 66
    INTRODUCTION 66
    SPEECH EVENTS 66
    SPEAKER MODELS 67
    CALL ORIGINATION MODEL 72
3.3 PERFORMANCE CRITERIA 74
    RESULTS OF SUBJECTIVE STUDIES 74
    SMOOTHNESS CRITERIA 76
3.4 LINK MODEL 78
    INTRODUCTION 79
    MODEL DESCRIPTION 79
3.5 RESULTS 84
    PROPERTIES OF THE DELAY DISTRIBUTION 84
    FINITE-BUFFER CASE 86
    EFFECT OF SPEECH MODELS 88
    OPTIMAL PACKET LENGTH 90
    TRANSIENT BEHAVIOR 92
3.6 CONCLUSION 95
REFERENCES 96

Chapter 4: Voice Technologies for Packet-Based Voice Applications 101
4.1 INTRODUCTION 101
    GENERAL OVERVIEW OF SPEECH TECHNOLOGY 101
    WAVEFORM CODING 102
    VOCODING (ANALYSIS/SYNTHESIS) IN THE FREQUENCY DOMAIN 107
4.2 G.727: ADPCM FOR PACKET NETWORK APPLICATIONS 111
    INTRODUCTION 111
    ADPCM ENCODER PRINCIPLES 114
    ADPCM DECODER PRINCIPLES 121
4.3 EXAMPLE OF APPLICATION 123
REFERENCES 123
NOTES 123

Chapter 5: Technology and Standards for Low-Bit-Rate Vocoding Methods 125
5.1 INTRODUCTION 125
    OVERVIEW 127
    VOCODER ATTRIBUTES 128
    LINEAR PREDICTION ANALYSIS-BY-SYNTHESIS (LPAS) CODING 130
5.2 INTRODUCTION TO G.729 AND G.723.1 133
    DIFFERENTIATIONS 133
    STANDARDIZATION PROCESS 134
    STANDARDIZATION INTERVAL 135
5.3 G.723.1 136
    INTRODUCTION 136
    ENCODER/DECODER 136
5.4 G.728 138
    LD-CELP ENCODER 139
    LD-CELP DECODER 140
5.5 G.729 140
    ENCODER 141
    DECODER 143
5.6 EXAMPLE OF APPLICATIONS 145
    H.263 VIDEO CODING FOR LOW-BIT-RATE COMMUNICATION 145
    H.324 MULTIMEDIA COMMUNICATION 146
    H.323 MULTIMEDIA COMMUNICATIONS STANDARD FOR LANS AND ENTERPRISE NETWORKS 148
REFERENCES 150
NOTES 151

Chapter 6: Voice over IP and the Internet 153
6.1 INTRODUCTION 153
6.2 IP/INTERNET BACKGROUND 157
    INTERNET PROTOCOL SUITE 157
    THE INTERNET 157
6.3 VOICE TRANSMISSION AND APPROACHES IN ATM, FRAME RELAY, AND IP 162
    ATM 162
    FRAME RELAY 164
    IP 164
    ITU-T H.323 GROUP OF STANDARDS 165
    STREAMING AUDIO 166
6.4 QOS PROBLEMS AND SOLUTIONS 167
6.5 PROTOCOLS FOR QOS SUPPORT FOR AUDIO AND VIDEO APPLICATIONS 169
    RSVP APPLICATIONS 169
    IP MULTICAST 171
6.6 INTERNET TELEPHONY SERVERS (ITSS) 172
6.7 THE VOICE OVER IP/INTERNET MARKET 177
6.8 VOIP REGULATORY ISSUES 177
6.9 CONCLUSION 180
REFERENCES 181
NOTES 181

Chapter 7: Signaling Approaches 183
7.1 INTRODUCTION 183
7.2 SIGNALING IN CIRCUIT-SWITCHED NETWORKS 187
7.3 H.323 STANDARDS 189
    FUNCTIONAL ELEMENTS 189
    H.323 BASICS 190
    EXAMPLE OF SIGNALING 196
7.4 MGCP 202
7.5 SIP 207
    SIP PROTOCOL COMPONENTS 209
    SIP-T 210
7.6 OTHER IETF SIGNALING EFFORTS 215
    PINT AND SPIRITS 215
    ENUM 218
    TRIP 219
7.7 MEGACO 219
7.8 SIGTRAN PROTOCOLS 221
    PERFORMANCE CONSIDERATIONS FOR CCSS7 OVER IP 223
    SECURITY REQUIREMENTS FOR CCSS7 OVER IP 223
    SCTP USE IN CCSS7 223
    TRANSPORTING MTP OVER IP 226
    TRANSPORTING SCCP OVER IP 229
7.9 SCTP 230
    INTRODUCTION 230
    MOTIVATION 230
    ARCHITECTURAL VIEW OF SCTP 230
    FUNCTIONAL VIEW OF SCTP 231
    KEY TERMS 236
    SERIAL NUMBER ARITHMETIC 236
    SCTP PACKET FORMAT 239
    SCTP ASSOCIATION STATE DIAGRAM 258
    ASSOCIATION INITIALIZATION 260
    USER DATA TRANSFER 262
    TERMINATION OF AN ASSOCIATION 273
REFERENCES 276
NOTES 277

Chapter 8: Quality of Service 279
8.1 INTRODUCTION 279
8.2 BACKGROUND 281
8.3 QOS APPROACHES 284
    PER-FLOW QOS 284
    CLASS-BASED QOS 288
    MPLS-BASED QOS 289
    TRAFFIC MANAGEMENT/QUEUE MANAGEMENT 291
8.4 QOS DETAILS 294
    IETF INTSERV APPROACH 294
    IETF DIFFSERV APPROACH 305
    ADDITIONAL DETAILS ON QUEUE MANAGEMENT 320
    CONCLUSION 326
8.5 CASE STUDY 327
    REAL-TIME SERVICE REQUIREMENTS 327
    TECHNICAL CHALLENGES 330
    CISCO SOLUTIONS FOR SUPPORTING IP-BASED REAL-TIME SERVICES 330
REFERENCES 340
NOTES 342

Chapter 9: Voice over MPLS and Voice over IP over MPLS 343
9.1 INTRODUCTION AND BACKGROUND 343
9.2 MOTIVATIONS 344
9.3 BASIC MPLS FEATURES 349
    MPLS FORWARDING/LABEL-SWITCHING MECHANISM 353
    MPLS LABEL-DISTRIBUTION MECHANISM 356
    OTHER FEATURES 358
    COMPARISON 359
9.4 QOS CAPABILITIES 363
    INTRODUCTION 365
    DETAILS 368
9.5 VOICE APPLICATIONS 370
    IP HEADER COMPRESSION 371
    VOIPOMPLS PROPOSAL 372
    MPLS FORUM SPECIFICATION 374
REFERENCES 375
NOTES 376

Chapter 10: Telephone Number Mapping (ENUM) 377
10.1 INTRODUCTION 377
10.2 BACKGROUND 379
10.3 INTRODUCTION TO ENUM 383
    ENUM: AN EVOLVING ARCHITECTURE 385
    DEFINING ENUM APPLICATIONS 387
    THE ENUM ROAD TO SUCCESS 389
10.4 SUMMARY OF CAPABILITIES AND ISSUES 390
    CAPABILITIES 390
    ADVOCACY 397
10.5 NUMBER PORTABILITY 398
    TYPES OF NP 400
    SPNP SCHEMES 401
    DATABASE QUERIES IN THE NP ENVIRONMENT 405
    CALL ROUTING IN THE NP ENVIRONMENT 408
    NP IMPLEMENTATIONS FOR GEOGRAPHIC E.164 NUMBERS 411
    NP-ENABLED NUMBER CONSERVATION METHODS 411
    CONCLUSION 414
10.6 E.164 NUMBERS AND DNS 417
    INTRODUCTION 417
    E.164 NUMBERS AND DNS 417
    FETCHING UNIFORM RESOURCE IDENTIFIERS (URIS) GIVEN AN E.164 NUMBER 418
    IANA CONSIDERATIONS 420
    SECURITY CONSIDERATIONS 420
10.7 APPENDIX TO THE RFC 2916 SCENARIO 421
REFERENCES 422
NOTES 424

Chapter 11: Carrier Applications 427
11.1 INTRODUCTION AND OPPORTUNITIES 427
11.2 WHERE THE ACTION SHOULD BE 432
11.3 CARRIER VOICE NETWORKS 439
11.4 DEPLOYMENT AND ISSUES 446
    WIRELESS NETWORKS 450
    CABLE TELEPHONY NETWORKS 458
11.5 ENTERPRISE APPLICATIONS 465
11.6 INTERNATIONAL OPPORTUNITIES 466
11.7 EQUIPMENT/VENDOR TRENDS 469
REFERENCES 473
NOTES 474

INDEX 477
Preface

2001 represented the twenty-fifth anniversary of voice over packet work for the senior author. He was involved with work sponsored by the Advanced Research Projects Agency (ARPA) in 1976 to develop packet-based low-bit-rate vocoded voice for transport in integrated packet networks used by the U.S. government, and he continued to work on such solutions through the 1980s. Developments in this field continued throughout the 1980s and into the 1990s, particularly regarding the standardization of vocoding methods that occurred in the mid-1990s and the major push by many router/switch vendors for the introduction of IP/Internet telephony, which started around 1997 and continues to the present. The first edition of this book was compiled in August 1997; the book appeared in print in April 1998.

Voice over packet in general and voice over IP (VOIP) in particular have seen tremendous trade press, consultant, vendor, and conference hype in the past five years. Google.com identifies 285,000 hits when a search with the exact phrase “voice over IP” is used (a search on Marilyn Monroe shows only 252,000 hits). Considering the amount of air time given to the topic, the amount of actual deployment to date is underwhelming. What has held back the deployment of the technology includes the following:

• An underestimation of the importance, complexity, and purpose of signaling and the need to interconnect with the 1.4 billion telephone sets already deployed globally.
• The confusion brought about by the multiplicity of signaling protocols that have been advanced.
• A lack of understanding of the economic value of the embedded base of equipment in carrier networks that likely will continue to operate flawlessly for what it was intended to do (make money with a net bottom line of around 20 percent) for a decade or more.
• A lack of understanding of where the true costs to deliver telephone services are.
• A lack of understanding of the commoditization of both switched voice and high-speed bandwidth, obliterating the value from any savings related to “bandwidth efficiency” of VOIP.
• A lack of understanding that any bandwidth savings in VOIP has nothing to do with IP but everything to do with vocoding algorithms that can well be supported without IP.
• The difficulty in securing Quality of Service (QoS) in public (and even private) IP networks.
• The most critical failure of all: a misunderstanding of where the value of VOIP has to be to make it successful.

The bright spot for VOIP is that it brings forth a panoply of new services not possible with circuit-switched voice, not that it acts as a transport mechanism to replace existing trunks and/or Class 5 switches. Why would anyone scrap what has over the years been called the best telephone system in the world just to replace it with something that does just the same and nothing more? A VOIP network cannot be just a “me too” network: “me too” carries voice. Developers have to start focusing in earnest on bringing out new applications, not the chronic litany of “bandwidth efficiency” advantages.

There is a reported glut of fiber, and DWDM can deliver terabits per second per fiber. The alleged need to save bandwidth is suspect and anachronistic; it is a dead end for VOIP. Telecom professionals have to follow the lead of the PC developers: for well over a decade now, software developers have stopped trying to be highly parsimonious with RAM and disk space so that an entire suite of new services of convenience on the desktop could be developed. Many new applications and capabilities have been developed on the desktop, including program linkage (audio and video) and easy-to-use browsers.
The story of VOIP has to be less on IP and the many new protocols that one can invent overnight, and much more on new applications, possibilities, services, interactions, voice capture, storage, manipulation, dissemination, interplay, and so on. That is why the first edition of this book had a chapter that, at face value, did not seem to fit in with the sophistication and intricacies of all those 300-page Internet protocols out there; it was a chapter on applications, or more specifically, on self-serve customer service Web-site applications. The message conveyed for VOIP to be successful: It is the applications, not the protocols. The same is true with audiophiles: It is the musical composition/content coming off the speaker that makes all the difference, not that one speaker rolls off at 44 Hz and the other at 43 Hz. Voice over IP is not going to replace existing carrier networks in North America. Also, with PSTN-based voice becoming so cheap, it may not be worth the trouble to place voice on the intranet, except possibly for international applications. The future of VOIP must reside with new applications that it can and should bring forth.
For the second edition, we have updated Chapter 1, the introduction and motivation. We have updated Chapter 2, on packet technology, by adding material on Multiprotocol Label Switching (MPLS). Chapter 6, which looks at IP/Internet telephony, has been updated. Chapter 7 has been changed; it now covers the issue of signaling. Four new chapters have been added. Chapter 8 looks at QoS issues. Chapter 9 looks at voice over MPLS (VOMPLS) services. Chapter 10 looks at directory services, with one example of a service capability that must be supported: the IETF’s ENUM. Chapter 11 looks at carrier applications in addition to migration strategies; it addresses the question of where it makes sense to use VOIP.
Acknowledgment

The authors wish to thank Mr. Peter Johnson for contributing Chapter 11 to this book, which describes VOIP opportunities in the wireless space.
About the Authors

Daniel Minoli

Mr. Minoli, Managing Director, Econsultex Inc., has twenty-seven years of marquee corporate experience in technology and applications related to, and focused on, broadband enterprise and carrier communications, Metro Ethernet and next-gen optics, LAN technology, voice over IP, Internet and intranet architectures and design, e-commerce, and multimedia applications. Most recently, Mr. Minoli was founder and/or cofounder of several high-tech companies, including InfoPort Communications Group, an optical and gigabit Ethernet metro carrier. In addition to a full-time corporate career, Mr. Minoli has taught graduate optical networking/data communications, voice over IP, multimedia, and e-commerce at Stevens Institute, New York University, Carnegie Mellon University, and Monmouth University as an adjunct for eighteen years. He has been a technical analyst for Dataquest Gartner (Datapro) for seventeen years and has also been a consultant for numerous Venture Capitalists’ high-tech investments (exceeding $125 million of investment). Mr. Minoli enjoys over 3330 Web hits with his name (e.g., Google.com “Minoli Daniel”). He has been a columnist for trade periodicals (ComputerWorld, NetworkWorld, and Network Computing), has written dozens of technology reports for Datapro Corporation and the Gartner Group, and has written five market reports for Probe Research Corporation. Mr. Minoli has spoken at over 75 conferences and has published 300 articles. He is often sought out for advice by companies, patent attorneys, and Venture Capitalists, and he has been involved in mezzanine investments, providing in-depth reviews of technology and market baseline for high-tech companies in the multimedia, digital video, CTI/Java, VSAT, and telemedicine arenas.
In the 1990s, he was a senior executive (Vice President, packet services) at TCG/AT&T, where he deployed several large data networks with a cumulative P&L of $250 million in capital expenditure, $75 million in operating expenditure, $125 million in direct revenue, and $4 billion in revenue impacted. Mr. Minoli started the Broadband/IP Services operation at the Teleport Communications Group in late 1994; he was “Broadband Data Employee #2” and built the operation to a $25-million-per-year business (unit value: $250 to $300 million). Mr. Minoli’s team deployed 2000 backbone/concentration/access routers and 100 ATM/frame relay
switches in 20 cities in 5 years; his team turned up 1500 active broadband ports and secured 400 broadband customers. Prior to AT&T/TCG, Mr. Minoli worked at DVI Communications (as Principal Consultant), where he managed the deployment of three major corporate ATM networks, including one for the Federal Reserve Bank. From 1985 to 1994, Mr. Minoli worked at Bellcore/Telcordia (as Technology Manager), where he thoroughly researched all aspects of broadband data networking—from service creation to standards writing, from architecture design to marketing, and so on. In the mid-1980s, he worked at Prudential Securities (as Assistant Vice President), deploying dozens of data networks, including a 300-node VSAT branch data network. At ITT Worldcom (from 1980 to 1984), he deployed a fax over packet network and automated nearly all carrier operations by developing over a dozen OSSs, including Customer Care/CRM. At Bell Labs, he worked on internal and public packet networks; at the Network Analysis Corporation, he undertook ARPA work on voice over packet, wireless networks, and integrated communications.
Emma Minoli

Ms. Emma Minoli is CEO and Founder of Red Hill Consulting, a high-tech consulting firm specializing in the area of voice over IP, multimedia, and e-commerce technologies. Red Hill Consulting also focuses on marketing and public relations/advocacy for the aforementioned technologies. During the past five years, its clients have included AT&T, J. D. Edwards, Chase Manhattan Bank, InfoPort Communications Group, and EtherCarrier Communications. Ms. Minoli has coauthored several books, including The Telecommunications Handbook—Digital Video section (CRC Press/IEEE Press), 1999; Delivering Voice over IP Networks (Wiley), 1998; Delivering Voice over Frame Relay and ATM (Wiley), 1998; and Web Commerce Technology Handbook (McGraw-Hill), 1997.
CHAPTER 1
Introduction and Motivation

1.1 Introduction
Data networks have progressed to the point that it is now possible to support voice and multimedia applications right over the corporate enterprise network and the intranet, for both on-net and off-net applications. Many companies have already deployed IP-based backbones that provide both broadband capabilities and Quality of Service (QoS)–enabled communications. Some companies have deployed Asynchronous Transfer Mode (ATM) networks. Switching technology, particularly in terms of the switched local area network (LAN), has come a long way in the past five years, providing higher-capacity, lower-contention services across the enterprise campus network. High-speed wide area technology, such as Packet over SONET (POS) and metro optical services such as metro gigabit Ethernet, provides increased bandwidth across the enterprise regional, nationwide, and international networks. The Internet Engineering Task Force (IETF) Multiprotocol Label Switching (MPLS) specification also directly or indirectly provides improved support of IP services. In addition, QoS-supporting protocols, such as IPv6, the Resource Reservation Protocol (RSVP), the Integrated Services Architecture (intserv), differentiated services (diffserv) in IPv4, and the Real-time Transport Protocol (RTP), are now entering the corporate enterprise network (Reference [1] provides a treatment of the trends listed here). A lot of industry effort has gone into supporting IP over ATM using a number of technologies, such as Classical IP over ATM (CIOA). All of this opens the door for the possibility of carrying voice over the enterprise network. Interest exists in the carrier arena as to
the possibility of modernizing the existing public-switched telephone network (PSTN) with an IP-based infrastructure that would support multiple services, including Voice over IP (VOIP) (see Figure 1.1). At the same time, commercialized Internet use has increased significantly in the past few years, as companies ventured into Web-based commerce [2].

Figure 1.1 The VoIP environment. (Elements shown: the PSTN with CCSS7 signaling and legacy analog/digital phones, a VoIP gateway, an ENUM directory server, a call server, application servers, Ethernet IP phones on a switched Ethernet LAN, remote IP phones reached over the Internet and the intranet, and MGCP, SIP, and H.323 signaling along the call path through a network router running MPLS.)

During the
late 1990s, the Internet was reportedly growing 100 percent per quarter. More recently, people have quoted 80 percent growth (or perhaps slightly less) per year. That large collection of backbones, access subnetworks, server farms, and hypertext information that is known as the Internet is acquiring ever-increasing importance, not only for the business community but also for the population at large. Access to information is proving increasingly valuable for education, collaborative work, scientific research, commerce, and entertainment. The advent of HTML-formatted, URL-addressable, and HTTP-obtainable information over the Internet—what is often called the World Wide Web (WWW or W3)—has generated a lot of attention in the past 10 years. Now there is a movement afoot to make the transition to fully multimedia-enabled sites that allow voice, video, data, and graphics to be accessed anywhere in the world. The issue so far, however, has been that voice and video, by and large, have been of the stored kind—namely, a one-way download of sound files that are played out in non–real time at the user’s PC. Given this extensive deployment of data networking resources, the question naturally presents itself: Is it possible to use the investment already made to carry real-time voice in addition to the data? The desire to build one integrated network goes back to the 1970s, if not even earlier. The Advanced Research Projects Agency, with project DACH-15-75-C0135 (and many other projects with many other researchers), funded the senior author’s work in 1975 to look at the feasibility of integrated voice and data packet networks. And Integrated Services Digital Network (ISDN) research started in Japan in the early 1970s (before the idea started to get some real attention in the late 1970s and early 1980s) with the explicit goal of developing and deploying integrated networks.
However, a lot of the mainstream work has been in supporting voice and data over circuit-switched time-division multiplexed (TDM) networks. Only some early packet over data work, and then some Fiber Distributed Data Interface II (FDDI II) and Integrated Voice/Data LAN (IEEE 802.9) work, looked at voice support in a non-circuit-mode network. Even for ATM, the emphasis has been, until the past few years, on data services. The idea of carrying voice over data networks has received considerable commercial attention in the past five years. The ATM Forum, the Frame Relay Forum, and the MPLS Forum have published specifications, and a whole range of voice over data network products has appeared and/or is appearing. The work of the ATM Forum and the Frame Relay Forum has focused on connection-oriented networks. However, connectionless IP-based networks are ubiquitous, and so there is a desire to carry business-quality voice over them. The major challenge in this regard is that IP networks do not yet support QoS features. Nonetheless, a plethora of IP phones and IP-to-public-network gateways has entered the market. This book is one of two related Wiley books published by the authors; it focuses on the IP telephony technology itself. Figure 1.2 depicts the various voice over data network technologies now evolving, including VOIP. Also note that IP can utilize a number of data link layer services, such as ATM, MPLS, and frame relay. Figure 1.3 depicts a possible scenario of VOIP, as is addressed in this book.
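As a small illustration of the kind of QoS marking that diffserv-aware networks act on, an application originating voice traffic can set the diffserv code point on its socket. This sketch is illustrative and not taken from the book: DSCP 46 (Expedited Forwarding) is the code point conventionally used for voice, and it occupies the upper six bits of the IP TOS byte. Note that setting it only requests preferential treatment; every router along the path must be configured to honor the marking.

```python
import socket

DSCP_EF = 46               # Expedited Forwarding, the usual marking for voice
TOS_VALUE = DSCP_EF << 2   # DSCP sits in the upper 6 bits of the TOS byte -> 0xB8

# Voice media would typically flow over UDP (as RTP); mark the socket's packets.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)

# Read the value back to confirm the kernel accepted it.
tos = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(hex(tos))
sock.close()
```

On the open Internet such markings are generally ignored, which is precisely the QoS gap noted above; Chapter 8 examines the intserv and diffserv machinery in detail.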
Figure 1.2 Various packet technologies usable for Voice over Packet. (Protocol stacks shown: FRAD stacks for voice over a frame relay WAN, voice over a frame relay WAN interworked to ATM, voice over an ATM WAN using AAL 1, AAL 2, or AAL 5 with CBR/VBR service, voice over IP carried in a frame relay or ATM WAN, voice over IP/LAN, and voice over the Internet, each with optional compression, over 56/T1, DS3, or SONET facilities; voice over LANs uses IP with H.323 and compression over a LAN MAC.)

Figure 1.3 Internet telephony. (Elements shown: analog phones, faxes, and PCs with dial-up or cable modems on PSTN local access and HFC networks; Internet telephony servers, modem pools, and routers at data POPs and cable modem POPs, interconnected via PRI/T-1 and DS-3 links across the PSTN wide area and the Internet or a data network; LAN attachment via 10baseT/100baseT.)
After this introduction in Chapter 1, a basic review of IP technologies is provided in Chapter 2, which covers IP, IPv6, RSVP, RTP, and MPLS. Chapter 3 discusses voice characteristics that can be utilized in packet networks. Chapter 4 discusses adaptive differential pulse code modulation (ADPCM) as applied to packet network environments. Chapter 5 provides an overview of vocoder-based compression methods used in IP. Chapter 6 covers various proposals for delivery of voice in IP environments. Chapter 7 covers the important topic of signaling. Chapter 8 provides a major review of QoS technologies. Chapter 9 covers voice over MPLS. Chapter 10 addresses directory services. Finally, Chapter 11 looks at opportunities for traditional carriers, including a brief look at wireless opportunities.
1.2 Drivers for Voice over IP

This section discusses a number of drivers for voice over IP.
The Positive Drivers

Besides the potential for savings on long-distance phone charges to communicate with friends or relatives, Internet phones already have a place in the business world. For example, one can leave Internet phones turned on and ready for calls throughout the day; the technology is useful for communicating with coworkers in other parts of the building and at other locations by simply dialing them up on the Internet videophone. If they are at their desks, they can answer immediately. It can be a fine way to ask work-related questions without taking one’s hands off the keyboard [3]. The technology is good for telecommuters, who can dial in to the office and see and speak to coworkers while getting a glimpse of the office from home [3, 4]. Similarly, it can be good for distance learning applications [5].

There are both market and business drivers for the introduction of voice telephony over IP at this time. There have been four main stages of VOIP evolution in the past few years [6]:

1. PC-to-PC (since 1994)
   ⇒ Connects multimedia PC users, simultaneously online
   ⇒ Cheap, good for chat, but inconvenient and low quality
2. PC-to-phone (since 1996)
   ⇒ PC users make domestic and international calls via gateway
   ⇒ Increasingly services are “free” (e.g., Dialpad.com)
3. Phone-to-phone (since 1997)
   ⇒ Accounting rate bypass
   ⇒ Low-cost market entry (e.g., using calling cards)
4. Voice/Web integration (since 1998)
   ⇒ Calls to Web site/call centers and freephone numbers
   ⇒ Enhanced voice services (e.g., integrated messaging)

Deregulation in the United States and elsewhere could mean that both incumbent carriers and new carriers can enter the market with new services. At various times in the past twenty years, a variety of carriers in the United States were precluded from entering certain telecommunication service sectors. One of the goals of the Telecommunications Act of 1996 was to change that. However, there has recently been a major slowdown in the competitive carrier landscape. This slowdown will be a major drag on the introduction of VOIP, since the new carriers were the principal beneficiaries of a less expensive technology that could be deployed in greenfield environments.

The technology to carry voice over data networks is evolving, as noted in the introduction. There are economic advantages to end users in utilizing an integrated network, not only in terms of direct transmission costs, but also in reducing the network management costs of running separate and technologically different networks. That is the ultimate goal. In the meantime, many companies are, and will be for some time in the future, supporting the infrastructure and cost of multiple networks, including PSTN, private enterprise networks, wireless networks, intranets, business video networks, Internet access networks, and Internet-based Virtual Private Networks. Hence, the need to optimize the usage of all media components on all networks simultaneously, and to take advantage of pricing alternatives between networks, will become even more important as these networks proliferate in the corporate environment, and as the service providers offer increasingly competitive prices [7]. Separate from technology considerations, business drivers must come into play.
Carriers need to make a positive bottom line (e.g., a 15 to 25 percent net bottom line) and be sustainable and self-sufficient. There are new revenue opportunities for Internet service providers (ISPs) in bundling voice service with Internet access. The interexchange carriers (IXCs) can avoid access charges. The local exchange carriers (LECs) can undercut the long-distance prices and offer InterLATA services without necessarily having to follow the traditional approach. Cable TV operators can bundle packet voice with cable services and perhaps find a better way to enter the telephony business without having to follow the classical time-slot-interchange method. Wireless companies can make more efficient use of the precious radio spectrum. Figures 1.4 to 1.9 depict typical carrier applications, based on reference [6]. All of these stakeholders can benefit by adding value to the network instead of just growing linearly to simply reach more physical points, and they can benefit by optimizing the economics of both packet-switched and circuit-switched networks. However, the major breakthrough has to come in the form of new services. Simply replacing a circuit-based transport mode with a packet-based transport mode will not justify the replacement of the old network with the new.
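On the bandwidth-efficiency argument that often accompanies these business cases, a quick back-of-the-envelope calculation is instructive. The sketch below is illustrative only: it assumes one vocoder frame per packet and uncompressed IPv4/UDP/RTP headers, and uses the frame sizes of G.723.1 at 5.3 kbps (20-byte frames every 30 ms) and of 64-kbps G.711 PCM with a common 20-ms packetization.

```python
# Per-packet header overhead for voice over IP: IPv4 (20 B) + UDP (8 B) + RTP (12 B).
IP_UDP_RTP_OVERHEAD_BYTES = 40

def on_wire_rate_bps(payload_bytes: int, packet_interval_ms: int) -> float:
    """IP-layer bit rate of one voice stream, one vocoder frame per packet."""
    total_bytes = payload_bytes + IP_UDP_RTP_OVERHEAD_BYTES
    return total_bytes * 8 * 1000 / packet_interval_ms

# G.723.1 at 5.3 kbps: one 20-byte frame every 30 ms.
g723 = on_wire_rate_bps(20, 30)    # 16000.0 bps on the wire
# G.711 PCM at 64 kbps: 160-byte payloads every 20 ms.
pcm = on_wire_rate_bps(160, 20)    # 80000.0 bps on the wire

print(f"nominal coder gain: {64.0 / 5.3:.1f}x, on-wire gain: {pcm / g723:.1f}x")
```

The nominal 12:1 coder advantage shrinks to about 5:1 on the wire once the 40 bytes of per-packet headers are counted, which is one reason IP header compression (discussed in Chapter 9) matters for VOIP.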
Figure 1.4 PC-to-PC, over IP.
• Needs similarly equipped Internet users (e.g., IP telephony software, multimedia PC, etc.), both logged on simultaneously
• Main applications: avoidance of usage-based telephone charges, chat rooms, company LANs
• Application providers include Firetalk, Phonefree
• Potential market: <50 million users?
In particular, the past few years have seen the emergence of reduced bit-rate voice compression algorithms that can increase the carrying capacity of a network by nearly tenfold (that is, by an order of magnitude) without the investment of additional resources in long-haul transmission facilities. The deployment, for example, of a network supporting near-toll-quality voice at 5.3 kbps rather than the twenty-five-year-old method of 64-kbps-per-call pulse code modulation (PCM) is not likely to be feasible in the context of an existing public switched telephone network because of the extensive embedded base of legacy equipment. Hence, if
Figure 1.5 PC-to-phone (or fax), over IP. (Diagram: a desktop PC on the Internet reaches a telephone or fax machine via a phone gateway computer and a public switch.)
• Internet users with multimedia PC able to call any phone or fax user (not, at present, vice versa)
• Main motivation: reduced telephone charges, "free" calls to the U.S., Korea, Hong Kong SAR, etc.
• Service providers include Net2Phone, DialPad, etc.
• Market potential: sending, >250 million Web users; receiving, >1.3 billion telephone/mobile users
Figure 1.6 Phone/mobile-to-phone/mobile (fax-to-fax), over IP. (Diagram: phones and fax machines on public switches at each end, linked by phone gateway computers across the Internet.)
• Any phone/fax/mobile phone user to any other
• Main motivation: reduced call charges, accounting rate bypass, market entry for non-facilities-based carriers (e.g., via prepaid cards)
• Service providers include speak4free, I-link, etc.
• Market potential: >1.4 billion phones/faxes/mobiles
Figure 1.7 PC-to-Website/call center, over IP. (Diagram: a desktop PC reaches, across the Internet and a public switch, the service provider's Web server and phone gateway computer.)
• Internet users with multimedia PC browse a Web site and choose a voice/video connection option
• Main motivation: service provider can interact directly with potential clients, via voice or video, for instance for telemarketing or freephone access
• Service providers include NetCall, ITXC, etc.
• Market potential: >300 million Internet users
Figure 1.8 Phone/mobile-to-Website/e-mail, over IP. (Diagram: a telephone or mobile phone reaches a public switch, then a phone gateway computer at a local PoP, and, across the Internet, the service provider's Web server and phone gateway computer.)
• Phone or mobile phone users utilize enhanced services (e.g., integrated messaging, voice response) available from an IP service provider
• Main motivation: integrated messaging, computer telephony integration, m-commerce
• Market potential: >1.4 billion phone/mobile users
• Service providers include Yac.com, T2mail, etc.
there is a desire to use the new compression algorithms and achieve a tenfold efficiency gain, then the IP route may be the way to go. Voice over IP can be deployed in private enterprise networks, but some technology suppliers are concentrating on providing new solutions for carriers, consistent with the approach just outlined. Applications of the evolving VOIP technology include the following:

• Internet voice telephony
• Intranet and enterprise network voice telephony
• Internet fax service
• Internet videoconferencing
• Multimedia Internet collaboration
• Internet call centers
• PBX interconnection
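The tenfold efficiency gain mentioned above can be checked with simple arithmetic; a minimal sketch comparing calls per T1 trunk under 64-kbps PCM versus 5.3-kbps G.723.1 (payload rates from the text; packetization and header overhead deliberately ignored):

```python
# Voice-call capacity of a T1 trunk under two codecs (payload rates only;
# per-packet header overhead is ignored for this comparison).
T1_KBPS = 1536          # 24 DS-0 channels x 64 kbps of usable payload
CODECS = {"G.711 PCM": 64.0, "G.723.1": 5.3}

for name, rate_kbps in CODECS.items():
    calls = int(T1_KBPS // rate_kbps)
    print(f"{name:10s}: {calls} simultaneous calls")

# 64 / 5.3 is roughly 12x -- the "order of magnitude" gain cited in the text.
print(f"Compression gain: {64.0 / 5.3:.1f}x")
```

In practice the realized gain is lower, since each compressed frame still pays packet-header overhead on an IP network; the payload-rate ratio is the upper bound.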
It is worth noting that there has been considerable progress recently in developing standards (with supporting equipment to follow) in the area of LAN/intranet-based multimedia (with compressed speech), as shown in Figure
1.9. These efforts will likely become the underpinnings of standards-based approaches to VOIP. Recent analysis from the firm Frost & Sullivan, published in the 2001 report U.S. Market for Enhanced IP-Based Voice Services, reveals that the VOIP industry generated revenue of $520 million in 2001 and that the market is projected to reach $31.8 billion by 2007 [8]. There has been some penetration in intranet environments. For example, Cisco reportedly expects to deploy VOIP service at over 41 sites, connecting the 35,000 Cisco IP phones already in place, to provide enhanced internal communications [9]. A few years ago, VOIP was the domain of just a handful of early pioneering companies, including 3Com, Cisco, Clarent, Nuera Communications, and Hypercom. Now, however, this convergence technology has finally been embraced by the more traditional networking and telecommunications vendors, who had previously viewed VOIP as a serious threat to their installed bases. The early VOIP pioneering companies have been joined by the classic PBX vendors—Alcatel, Avaya, Ericsson,
Figure 1.9 Evolution of voice over data networks via multimedia applications. The figure's matrix can be summarized per umbrella standard [(M) = mandatory; T.120 data supports, for example, whiteboarding applications]:

• H.320 (N-ISDN): audio G.711 (M), G.722, G.728 at 64, 48–64, and 16 kbps; video H.261 (M); data T.120; multiplex H.221 (M); control H.242 (M); signaling Q.931.
• H.324 (PSTN): audio G.723.1 (M), G.729 at 5.3–6.3 and 8 kbps; video H.261 (M), H.263 (M); data T.120; multiplex H.223 (M); control H.245 (M); signaling —.
• H.322 (Iso-Ethernet, IEEE 802.9): audio G.711 (M), G.722, G.728 at 64, 48–64, and 16 kbps; video H.261 (M); data T.120; multiplex H.221 (M); control H.242 (M); signaling Q.931.
• H.323 (packet-switched): audio G.711 (M), G.722, G.728, G.723.1, G.729 at 64, 48–64, 16, 5.3–6.3, and 8 kbps; video H.261 (M), H.263; data T.120; multiplex H.225.0 (M); control H.245 (M); signaling H.225.0 (Q.931).
• H.321 (B-ISDN/ATM): audio G.711 (M), G.722, G.728 at 64, 48–64, and 16 kbps; video H.261 (M); data T.120; multiplex H.221 (M); control H.242 (M); signaling Q.2931.
• H.310 (B-ISDN/ATM): audio MPEG-1 (M), G.711 (M), G.722, G.728 at n × 64, 64, 48–64, and 16 kbps; video H.262 (M) (MPEG-2), H.261 (M); data T.120; multiplex H.222.0 (M), H.222.1 (M); control H.245 (M); signaling Q.2931.
Figure 1.10 VOIP vendor breakdown. (Courtesy of Network World, 1/29/01.) (Bar chart of vendor counts, 0 to 40, by survey product category: complete VoIP system, VoIP gateway only, VoIP gatekeeper/call agent only, IP PBX, VoIP-based PSTN/CO replacement switch, IP phone.)
Mitel, NEC, Nortel Networks, and Siemens. By 2000, all of the vendors had introduced viable VOIP products—often in the form of add-ons that "IP-enabled" the latest versions of the vendors' TDM- and switching-matrix-based PBXs [10]. There were 175 VOIP vendors in 2000; a segment-by-segment breakdown of the key players is shown in Figure 1.10. Interoperability among VOIP products has been a major stumbling block to widespread acceptance of the technology. The International Telecommunication Union—Telecommunication Standardization Sector (ITU-T) H.323 "umbrella" standard, the first proposed for VOIP interoperability, proved complex to implement. As a result, other, less complicated standards were proposed in its place, and a number of loosely related specifications have emerged—for example, ITU-T H.323 and H.248/MEGACO, the IETF Session Initiation Protocol (SIP), the Media Gateway Control Protocol (MGCP), and the International Softswitch Consortium (ISC) specifications. The expectation is that no single standard will be predominant over the next couple of years; interoperability and coexistence will therefore be important [10]. Naturally, there are going to be challenges in deploying IP-based voice services. Table 1.1 depicts some of these challenges and some potential ways around them.
The Negative Drivers

Voice over packet in general and voice over IP in particular have seen tremendous trade press, consultant, vendor, and conference hype in the past five years. Considering the amount of air time given to the topic, the amount of actual deployment to date is underwhelming. What has held back the deployment of the technology includes the following:
Table 1.1 Challenges and Quality Issues

Problem: PC too overloaded to run vocoder; processing delays too long. Possible solution: Use standard terminals and PBXs, and use a high-power Internet voice telephony processor.
Problem: Congested data networks. Possible solution: Use compression, in conjunction with echo cancellation and packet recovery technology. Move to POS, gigabit, and/or Ethernet-based backbones and switched LAN segments.
Problem: Protocol limitations. Possible solution: Look to deploy IPv6, RSVP, and MPLS.
Problem: Poor end-to-end network service. Possible solution: Upgrade to broadband, use switching, and use premium Internet services.
Problem: Limited routing and directory capabilities. Possible solution: Directory services are becoming available, for example, IETF ENUM.
• An underestimation of the importance, complexity, and purpose of signaling and the need to interconnect with the 1.4 billion telephone sets already deployed globally. • The confusion brought about by the multiplicity of signaling protocols that have been advanced. • A lack of understanding of the economic value of the embedded base of equipment in carrier networks that likely will continue to operate flawlessly for what it was intended to do (make money with net bottom line of around 20 percent) for a decade or more. • A lack of understanding of where the true costs to deliver telephone services are. • A lack of understanding of the commoditization of both switched voice and high-speed bandwidth, obliterating the value from any savings related to the “bandwidth efficiency” of VOIP. • A lack of understanding that any bandwidth savings in VOIP has nothing to do with IP but everything to do with vocoding algorithms that can well be supported without IP. • The difficulty in securing QoS in public (and even private) IP networks. • The most critical failure of all: a misunderstanding of where the value of VOIP has to be to make it successful. The bright spot for VOIP is that it brings forth a panoply of new services not possible with circuit-switched voice, not that it acts as a transport mechanism to replace existing trunks and/or Class 5 switches. Why would anyone scrap what has over the years been called the best telephone system in the world just to replace it with something that does just the same and nothing more? A VOIP network cannot be just a “me too” network: “me too” carries voice.
Developers have to start focusing in earnest on bringing out new applications, not the chronic litany of "bandwidth efficiency" advantages. There is a reported glut of fiber. Dense Wavelength Division Multiplexing (DWDM) can deliver terabits per second per fiber. The alleged need to save bandwidth is suspect and anachronistic; it is a dead end for VOIP. Telecom professionals have to follow the lead of the PC developers: For well over a decade now, software developers have stopped trying to be highly parsimonious with RAM and disk space, so that an entire suite of new services of convenience on the desktop could be developed. Many new applications and capabilities have been developed on the desktop, including program linkage (audio and video) and easy-to-use browsers. The story of VOIP has to be less about IP and the many new protocols that one can invent overnight, and much more about new applications, possibilities, services, interactions, voice capture, storage, manipulation, dissemination, interplay, and so on. That is why the first edition of this book had a chapter that, at face value, did not seem to fit in with the sophistication and intricacies of all those 300-page Internet protocols out there; it was a chapter on applications, or more specifically, on self-serve customer service Web-site applications. The message conveyed: For VOIP to be successful, it is the applications, not the protocols. The same is true for audiophiles: It is the musical composition/content coming off the speaker that makes all the difference, not that one speaker rolls off at 44 Hz and the other at 43 Hz. Voice over IP is not going to replace existing carrier networks in North America. Also, with PSTN-based voice becoming so cheap, it may not be worth the trouble to place voice on the intranet, except possibly for international applications. The future of VOIP must reside with new applications that it can and should bring forth.
1.3 Approaches for IP-Based Voice Systems
There have basically been two approaches to deploying VOIP in networks:

1. Desktop approach. Each individual in the organization purchases VOIP-enabled terminals, which can be used to support remote communications. This is in the vein of the Computer Telephony Integration (CTI) approach and the ITU-T H.323 terminal (as implied by the capabilities listed in Figure 1.10). In effect, this is similar to each user having a personal computer (PC)–based modem, a desk-resident fax machine, or a dedicated printer, rather than having a shared, network-resident server to support these functions.
Introduction and Motivation
15
2. Shared approach. The VOIP capabilities are developed in an industrial-strength mode, using shared, network-resident servers (as implied by Figure 1.2). The desktop approach is usually the first to market, but there are advantages in migrating up to the network-resident model. Figure 1.11 depicts what a VOIP server could look like.

As noted, voice compression is going to be a key enabling technology for new IP-based voice services. Digital Signal Processing (DSP) has progressed to the point where (in supporting 10 to 20 million instructions per second [MIPS]) good-quality voice is achieved. Recently adopted ITU-T standards, such as G.723.1, G.729, and G.729A, are discussed in some depth in the chapters that follow. In the past, voice analysis and synthesis using what is called vocoder technology produced a robotic-sounding voice, but this changed dramatically during the 1990s. A lot of work has also gone into subjective testing of the voice to determine how good the proposed algorithms are. The most frequently used test in ITU-T SG12 is the absolute category rating (ACR) test. Subjects listen to about 8 to 10 seconds of speech material and are asked to rate the quality of what they heard. Usually a five-point scale is used to represent the quality ratings, such as 5 = excellent to 1 = bad. By assigning the corresponding numerical values to each rating, a mean opinion score (MOS) can be computed for each coder by averaging these scores. Typically, a test must include a selection of material (e.g., male and female utterances) [11]. Vocoder technologies are covered in Chapters 4 and 5.
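The ACR/MOS procedure just described is a simple average of listeners' category ratings; a minimal sketch, with the scores invented purely for illustration:

```python
# Mean opinion score (MOS) from absolute category rating (ACR) tests:
# listeners rate short speech samples on a five-point scale
# (5 = excellent ... 1 = bad); the MOS is the arithmetic mean.
from statistics import mean

# Hypothetical ratings for two coders over the same test material.
ratings = {
    "G.711 (64 kbps)":    [5, 4, 4, 5, 4, 4, 5, 4],
    "G.723.1 (5.3 kbps)": [4, 4, 3, 4, 4, 3, 4, 4],
}

for coder, scores in ratings.items():
    print(f"{coder}: MOS = {mean(scores):.2f}")
```

A real test averages over many listeners and a balanced selection of material (male and female utterances, multiple languages), so published MOS figures carry far more samples than this toy example.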
Voice Servers Approach

A number of vendors have announced Internet Telephony Server (ITS) gateway technologies. These servers can be installed on PSTNs to route telephone calls over data networks such as the Internet. This kind of technology is being positioned to the carriers as a means to offer cost-effective alternatives to traditional long-distance calling or to enhance their data offerings by adding voice service, therefore creating a new revenue-generating or revenue-protecting opportunity. In addition to voice or fax services over the Internet, future applications of the ITS include phone-to-computer, computer-to-phone, and computer-to-computer connections. It will also enable audioconferencing, messaging, videoconferencing, call center operations, and media collaboration over the Internet [12]. A typical ITS server functions at a user or system level, like a PBX Tie Line. User selection of the IP network for voice or fax calling can be automated using PBX Least Cost Routing algorithms or can be dial-selected by users (e.g., by dialing 8 to access the IP/Internet). The use of the IP network could even be mandated through customer programming of PBX networking features. For example, customers could choose to have fax machines use only the Internet for intra-company
correspondence, while allowing voice calls to use a combination of private or PSTN and intranet facilities. External fax correspondence could also use the intranet or Internet by using the tail-end hop-off networking feature of the PBX (or ISP) to escape from the data network at the closest point of presence to the terminating fax number. Also, remote users could access the ITS for voice and fax calls over the Internet from PC-net phones (IP phones), allowing a single remote connection to be used for data, e-mail, voice mail, fax, and real-time voice calls. Such remote users might also receive all of the benefits of their host PBX or ISP service through remote access over the intranet or Internet [7].

Figure 1.11 Typical voice over IP server. (Diagram: T-1/E-1, PRI, or analog trunks from the PSTN or PBX terminate on a trunk interface feeding a DSP array over a DS-0 TDM bus; a CPU with databases and MIBs, plus a management port/console, connects over I/O and PCI buses to 10BaseT/100BaseT interfaces into the data network.)

Figure 1.12 Typical Internet Telephony Server (ITS). (Diagram: PCs with Internet phone clients such as CoolTalk or NetMeeting, and PCs with fax clients, exchange IP traffic over the Internet with two ITS gateways; each ITS connects via T1/E1/PRI to a switch on the PSTN or a private network serving telephones and fax machines.)

[T]o make a call, one of the users will simply pick up the phone as usual and dial the phone number of the second user. The PBX will route the call to its Internet Telephony Server (treating it as just another PBX Tie Line) [see Figure 1.12]. The ITS will then make a "packet call" over the Internet to the distant ITS that will place the call through its associated switch. After the connection between the parties has been established, the rest of the phone
operation will continue as if the call were going over the regular PSTN. Up to 24 sessions can take place simultaneously, each carrying a single voice or fax call, for each T1 card in the ITS. An ITS may employ a dynamic and transparent routing algorithm for its operation (e.g., no routing decisions on the part of the customers are needed). In addition, if the quality of the "IP" network falls below a pre-specified threshold, backup to the PSTN is automatic. [7]
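The routing behavior described in the passage above—transparent selection of the IP network, with automatic PSTN backup when quality falls below a pre-specified threshold or when the T1 card's 24 sessions are in use—can be sketched as follows (the MOS-based quality metric and the 3.5 threshold are illustrative assumptions, not from the text):

```python
# Sketch of an ITS trunk-selection decision: prefer the IP network,
# but fall back to the PSTN when measured quality drops below a
# pre-specified threshold, or when all 24 T1 sessions are busy.
MOS_THRESHOLD = 3.5        # assumed minimum acceptable quality
SESSIONS_PER_T1 = 24       # one voice or fax call per DS-0 time slot

def select_route(measured_mos: float, active_ip_sessions: int) -> str:
    """Return 'IP' or 'PSTN' for the next outbound call."""
    if measured_mos < MOS_THRESHOLD:
        return "PSTN"                  # automatic backup on poor quality
    if active_ip_sessions >= SESSIONS_PER_T1:
        return "PSTN"                  # T1 card full; overflow to PSTN
    return "IP"

print(select_route(4.0, 10))   # healthy network, capacity free -> IP
print(select_route(3.1, 10))   # quality below threshold -> PSTN
print(select_route(4.0, 24))   # all 24 sessions busy -> PSTN
```

The point of the sketch is the transparency: the caller dials normally, and the decision is made inside the gateway, not by the user.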
IP Voice and Video Phones

Internet phones and Internet videophones are entering the market, making it possible to talk to and see one or more remote parties over the Internet. Proponents make the case that PCs have always been the perfect vehicle for communication. Now, in conjunction with an ISP, one can use the PC to call anyone on the Internet (with the appropriate hardware and software) anywhere, anytime. For the time being, it does not cost any more to make these calls than the monthly ISP charges of $19.95, $14.95, or even $9.95. Some voice over Internet providers do not charge monthly access fees, but charge only on a per-minute basis. There is a gamut of products on the market, including the following:

1. Audio-only Internet phones
2. Videoconferencing systems
3. Internet videophones with audio and video transmission capabilities
4. Server-based products or higher-end videoconferencing products that cost between $5,000 and $10,000

1.4 The Future
It is expected that IP-based telephony is going to see continued penetration in corporate intranets, extranets, and the Internet in the next few years. This book, in conjunction with the companion Wiley texts [1, 13], should give planners enough information to begin to assess and evaluate the value of this evolving technology for their own environments.
References

1. D. Minoli and A. Schmidt. Switched Network Services. New York: Wiley, 1998.
2. D. Minoli and E. Minoli. Web Commerce Handbook. New York: McGraw-Hill, 1998.
3. L. Sweet. "Toss Your Dimes—Internet Video Phones Have Arrived." ZD Internet Magazine (August 1997): 57 ff.
4. O. Eldib and D. Minoli. Telecommuting. Norwood, MA: Artech House, 1995.
5. D. Minoli. Distance Learning Technology and Applications. Norwood, MA: Artech House, 1996.
6. T. Kelly. "IP Telephony: Substitute or Supplement." Telecoms@The Internet VI, IIR, Geneva, June 2000.
7. Lucent Technologies. www.lucent.com/BusinessWorks/internet/wap119a1.html.
8. D. Malossi and T. Harrison. "Bundled IP-Based Voice Applications Speak to Consumer Needs." www.pulver.com. December 2001.
9. R. Acher. "Genuity Rolls Out Black Rocket Voice VOIP Service." www.pulver.com. December 2001.
10. B. Yocom. "Voice over IP Is a (Fast) Moving Target." Network World (January 29, 2001).
11. R. V. Cox and P. Kroon. "Low Bit-Rate Speech Coders for Multimedia Communication." IEEE Communications Magazine (December 1996): 34 ff.
12. Lucent Technologies. www.lucent.com/press/0497/970416.nsa.html.
13. D. Minoli and E. Minoli. Delivering Voice over Frame Relay and ATM. New York: Wiley, 1998.
CHAPTER 2

An Overview of IP, IPOATM, MPLS, and RTP

2.1 Introduction
IP-based networks are ubiquitous in the corporate landscape and in the Internet. Now developers and planners are looking at voice over IP (VOIP) for intranet and enterprise network applications, and at voice over the Internet for geographically dispersed applications [1]. Bandwidth efficiency and quality are the principal trade-offs in this arena. Products for VOIP are emerging because organizations have significant investments in private data facilities that have the capacity available to carry additional on-net traffic at what is perceived to be little initial incremental expense. Table 2.1 documents key technical requirements for the carriage of voice over an IP network. The issue, however, is that IP by itself has limited QoS support. Hence, one needs to look at other supplementary methods, such as ATM support of IP [3], Multiprotocol Label Switching (MPLS) [2], differentiated services (diffserv), or integrated services (intserv).

Table 2.1 Basic Voice-Feature Requirements for Voice over Data Applications

Compression. Sub-PCM compression significantly reduces the amount of bandwidth used by a voice conversation, while maintaining high quality. Requirement: Must have.
Silence suppression. The ability to recover bandwidth during periods of silence in a conversation makes that bandwidth available for other users of the network. Requirement: Must have.
QoS. Assuring priority for voice transmission is critical. This keeps delay, delay variation, and loss to a tolerable minimum. Requirement: Must have. Very little current support [type of service (TOS) and its generalization, diffserv, are not generally implemented on a global scale]. There is hope that the Resource Reservation Protocol (RSVP), which reserves resources across the network, will help. However, RSVP is only a protocol; intrinsic network bandwidth must be provided before a reservation can be made.
Signaling for voice traffic. Support of traditional PBXs and the associated signaling is critical. Requirement: Must have for real applications.
Echo control. Echo is annoying and disruptive. Control is key. Requirement: Must have for real applications.
Voice switching. Data network equipment can generally support on-net applications. Off-net is also critical. At the very least, the adjunct equipment must decide whether to route a call over the internal data network or to route it to the public switched telephone network. Requirement: Ability to route off-net is a must for real applications.

The Internet now has millions of hosts connecting millions of people worldwide. The Internet provides connectivity for a wide range of application processes called network services: One can exchange e-mail, access and participate in discussion forums, search databases, browse indexes and sites, transfer files, and so forth. Use of the Internet for multimedia applications, including voice, is a relatively new development. The ability to carry voice (and fax) across an IP network or the Internet creates a cost-effective way to support intracorporate and intercorporate communications (see Table 2.2).

Table 2.2 Advantages of Voice over IP

Long-distance cost savings. By integrating voice, data, and fax over an IP enterprise network, a company can reduce long-distance charges for intracompany calls. By reducing the number of access lines, the organization can also reduce the FCC charges. Employees, regardless of location, can communicate with each other toll-free for as long as is wanted.
Reduced equipment investment. Companies generally lease or purchase separate equipment and facilities for voice support. With VOIP, the cost of securing and servicing equipment is reduced, because all intracompany traffic, voice and data, is delivered over the same network.

It is a well-known fact that ATM is a multimedia, multiservice, multipoint technology; hence, support of voice should be technically and theoretically more practical than over IP. ATM was designed to be a multimedia, multiservice technology. ATM supports extensive QoS and service class capabilities, allowing time-sensitive traffic, such as voice, to be transported across the network in a reliable, jitter-free manner, and ATM switches have been designed with effective traffic management capabilities to support the QoS and service classes needed for the various applications, including voice [3]. The issues, however, are: (1) ATM is not widely deployed and is relatively expensive (this being partially due to the increased throughput it supports), (2) many existing applications are IP-based, and (3) ATM should be able to support IP in an effective manner.

Figure 2.1 Media transport functions. (Diagram: the H.323-style media stack—audio codecs G.711, G.722, G.723, G.728, G.729 and video codecs H.261, H.263 ride over RTP/RTCP on UDP; H.225.0 call control and H.245 control ride over TCP, with RAS control over UDP; T.120 carries user data applications. Note: Signaling messages are also carried over the same stack, but RSVP and LDP are not shown. The lower layers contrast traditional overlay transport networks—IP over ATM and/or POS on SONET/SDH over DWDM—with intelligent IP-based optical networks running IP/MPLS over Ethernet (GbE, 10GbE) directly on an intelligent optical network.)

Figure 2.1 depicts the typical protocol stack for VOIP in the media plane. There is an upper-layer portion to the stack based on RTP/UDP and a lower-layer transport portion based on ATM, POS, MPLS, and Ethernet. Media plane protocols are covered in this chapter; signaling plane protocols are covered in Chapter 7; and QoS protocols are covered in Chapter 8. This chapter covers the following issues that will play a role in voice over data networks:
• IP and IPv6 [4, 5]
• IP over ATM
• MPLS
• RTP
• Stream Control Transmission Protocol (SCTP)

2.2 Internet Protocol

This section highlights key IP functionality and capabilities.
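Before looking at IP itself, the bandwidth trade-off noted in the introduction can be made concrete: compressed voice carried over RTP/UDP/IP pays a fixed per-packet header tax. A minimal sketch, assuming G.729 at 8 kbps with 20-ms packetization (typical values, but assumptions not stated in the text; layer-2 framing ignored):

```python
# Per-call bandwidth of G.729 voice carried in RTP/UDP/IP, assuming
# 20 ms of speech per packet (two 10-ms G.729 frames = 20 bytes).
IP_HDR, UDP_HDR, RTP_HDR = 20, 8, 12       # bytes; no options/extensions
PAYLOAD = 20                                # bytes of G.729 per packet
PACKETS_PER_SEC = 50                        # one packet every 20 ms

bits_per_packet = (IP_HDR + UDP_HDR + RTP_HDR + PAYLOAD) * 8
kbps = bits_per_packet * PACKETS_PER_SEC / 1000
print("G.729 codec rate: 8.0 kbps")
print(f"On-the-wire IP rate: {kbps:.1f} kbps")   # headers triple the rate
```

The 40 bytes of headers equal twice the voice payload here, which is why header compression and QoS mechanisms matter so much for VOIP.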
The Role of the IP

TCP/IP is the name for a family of communications protocols used to support internetting in enterprise and interenterprise applications. Protocols include the Internet Protocol (IP), the Transmission Control Protocol (TCP), the User Datagram Protocol (UDP), and other protocols that support specific tasks, such as transferring files between computers, sending mail, or logging into another computer. TCP/IP protocols are normally deployed in layers, with each layer responsible for a different facet of the communications. Each layer has a different responsibility.

1. The link layer (sometimes called the network interface layer) normally includes the device driver in the operating system and the corresponding network interface card in the computer. Together they handle all the hardware details of physically interfacing with the cable.
2. The network layer (sometimes called the internet layer) handles the movement of packets in the network. Routing of packets, for example, takes place here. IP provides the network layer in the TCP/IP protocol suite.
3. The transport layer provides a flow of data between two end system hosts for the application layer above. In the Internet protocol suite there are two transport protocols, TCP and UDP. TCP provides a reliable flow of data between two hosts. It is concerned with such things as partitioning the data passed to it from the application into appropriately sized frames for the network layer below, acknowledging received packets, and setting time-outs to
make certain that the other end acknowledges packets that are sent. Because this reliable flow of data is provided by the transport layer, the application layer can ignore all those details. UDP, on the other hand, provides a much simpler service to the application layer. It sends packets of data called datagrams from one host to the other, but there is no guarantee that the datagrams will be delivered to the other end. Any desired reliability must be added by the application layer. 4. The application layer handles the details of the particular application. There are many common TCP/IP applications that almost every implementation provides: • Telnet for remote login • The file transfer protocol (FTP) • The Simple Mail Transfer Protocol (SMTP) for e-mail • The Simple Network Management Protocol (SNMP) • Others In this architecture, TCP is responsible for verifying the correct delivery of data from the sender to the receiver. TCP allows a process on one end system to reliably send a stream of data to a process on another end system. It is connectionoriented: Before transmitting data, participants must establish a connection. Data can be lost in the intermediate networks. TCP adds support to detect lost data and to trigger retransmissions until the data is correctly and completely received. IP is responsible for relaying packets of data [protocol data units (PDU)] from node to node. IP provides the basis for connectionless best-effort packet delivery service. IP’s job is to move—specifically to route—blocks of data over each of the networks that sit between the end systems that want to communicate. IP provides for the carriage of datagrams from a source host to destination hosts, possibly passing through one or more gateways (routers) and networks in the process. An IP protocol data unit (datagram) is a sequence of fields containing a header and a payload. The header information identifies the source, destination, length, and characteristics of the payload contents. 
The payload is the actual data transported. Both end system hosts and routers in an internet are involved in the processing of the IP headers. The hosts must create and transmit them and process them on receipt; the routers must examine them for the purpose of making routing decisions and modify them (e.g., update some fields in the header) as the IP packets make their way from the source to the destination. IP protocols are supported over a variety of underlying media, such as ATM, frame relay, dedicated lines, ISDN, Ethernet, DSL, and so forth. As IP networks have become ubiquitous, the business community has become sophisticated about utilizing IP networks as a cost-effective corporate tool, first in data communications and now for other real-time applications. Organizations favor networks based
on IP because of the flexibility and vendor support. IP networks run under the most widely used network operating systems; they are scalable to a large degree; and they enjoy extensive implementation across product lines (e.g., in routers, PC clients, server switches, etc.). As noted, a relatively new IP application now in demand is toll-quality, low-bandwidth voice (and fax) transmission over IP networks. Intranets use the same WWW/HTML/HTTP and TCP/IP technology used for the Internet. When the Internet caught on in the early to mid-1990s, planners were not looking at it as a way to run their businesses. But just as the act of putting millions of computers around the world on the same protocol suite fomented the Internet revolution, so connecting islands of information in a corporation via intranets is now sparking a corporate-based information revolution. Thousands of corporations now have intranets. Across the business world, employees from engineers to office workers are creating their own home pages and sharing details of their projects with the rest of the company.
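The IP datagram structure described above—a header identifying the source, destination, length, and characteristics of the payload, followed by the payload itself—can be illustrated by parsing the fixed 20-byte IPv4 header; a minimal sketch using Python's standard struct and ipaddress modules (the sample header bytes are fabricated for illustration):

```python
# Parse the fixed 20-byte IPv4 header (RFC 791 layout, no options).
import struct
from ipaddress import IPv4Address

def parse_ipv4_header(raw: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, cksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,
        "total_length": total_len,          # header + payload, in bytes
        "ttl": ttl,
        "protocol": proto,                  # 6 = TCP, 17 = UDP
        "src": str(IPv4Address(src)),
        "dst": str(IPv4Address(dst)),
    }

# Fabricated header: IPv4, 20-byte header, 60-byte datagram, TTL 64, UDP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 60, 1, 0, 64, 17, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
print(parse_ipv4_header(sample))
```

Routers inspect exactly these fields on every hop—decrementing the TTL, checking the destination, and updating the checksum—which is the per-packet processing the text describes.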
IP Routing

One of the common ways to interconnect LANs and subnetworks at this time is through the use of routers. Routers are found at the boundary points between two logical or physical subnetworks. Routing is a more sophisticated—and, hence, more effective—method of achieving internetworking, as compared to bridging. In theory, a router or, more specifically, a network layer relay, can translate between a subnetwork with a physical layer protocol P1, a data link layer protocol DL1, and a network layer protocol N1 and a subnetwork with a physical layer protocol P2, a data link layer protocol DL2, and a network layer protocol N2. In general, however, a router is used for internetworking two networks or subnetworks that use the same network layer but have different data link layer protocols [6–8] (see Figure 2.2). Routers have become the fundamental and predominant building technology for data internetworking; however, ATM technology will likely impact the overall outlook. Routers permit the physical as well as the logical interconnection of two networks. Routers support interconnection of LANs over WANs using traditional as well as new services, including frame relay and ATM. Some routers operate directly over synchronous optical network (SONET). They also are utilized to interconnect dissimilar LANs, such as Token Ring to Ethernet. With the introduction of Layer 2 switching, ATM, and/or MPLS, however, the role of routers in enterprise networks could change slightly. For example, devices enabling connectivity between locations based on router technology may, conceivably, no longer be obligatory elements—but the concept of routing (forwarding frames at the network layer of the protocol model) will certainly continue to exist. In addition, routers work well for traditional data applications, but new broadband video and multimedia applications need different forwarding treatment, higher throughput, and tighter QoS control.
An Overview of IP, IPOATM, MPLS, and RTP
Figure 2.2 Protocol view of routers. (Host A and Host B, the end systems, each run the full Application, Presentation, Session, Transport, Network, Data Link, and Physical stack; the router between them implements only the Network, Data Link, and Physical layers, relaying bits between medium type A and medium type B.)
The use of routers allows the establishment of distinct physical and logical networks, each having its own network address space. Routing methodologies are becoming increasingly sophisticated as topologies become larger and more complex. There are a variety of protocols supported by various LAN subnetworks that need to interwork to make end-to-end connectivity feasible. The more common network layer protocols are IP, IPX, and AppleTalk, although the general direction is in favor of IP. Routers can be used to connect networks in building or campus proximity or to support wide area connections. The communication technology that can be used includes low-speed, high-speed, and broadband dedicated-line services, as well as low-speed, high-speed, and broadband switched services. Routing deals with techniques to move PDUs to distinguishable Layer 3 entities [8]. The issue of distinguishability relates to the address assignment, which is covered later on in this chapter. There are two key underlying functions: (1) determination of optimal routes, and (2) movement (forwarding) of information through the internet. Routers build their routing tables through information obtained via routing protocols; these protocols allow routers on an internet to learn about one another and to keep current about the optimal way to reach all attached networks. Routers interconnect different types of networks and embody the ability to determine the best route to reach the destination. Path determination is accomplished through the use of algorithmic metrics that are functions of such network parameters as path length, available bandwidth, path security level, path cost, path QoS, and so
forth. Generally, these metrics are implemented in software. Values required by the path determination algorithm are stored in router-resident routing tables of appropriate depth; the entries of the table are populated through local, as well as remote, information that is circulated around the network. Routing protocols are the adjunct mechanism by which routers obtain information about the status of the network. That is to say, routing protocols are used to populate routing tables and to calculate costs. Conceptually, routers operate by distributing, often through broadcast, advertisement PDUs that signal their presence to all pertinent network nodes. These advertisement PDUs also signal to the other routers destinations that are reachable through the advertising router or through links to neighbors. Routers communicate with other routers for the purpose of propagating the view of the network connections they have, the cost of connections, and the utilization levels. A number of techniques are available to populate the routing tables, and thereby support routing of information PDUs. Static routing requires the network manager to build and maintain the routing tables at each router or at a central route server. This implies that once configured, the network paths used for the PDUs must not change. A router using static routing can issue alarms when it recognizes that a communication link has failed, but it will not automatically update the routing table to reroute the traffic around the failure. Static routing is, therefore, typically used in limited-distance internets, such as in a building’s backbone or in a campus. Dynamic routing allows the router to automatically update the routing table and recalculate the optimal path, based on real-time network conditions (e.g., link failures, congestion, etc.). Routers implementing dynamic routing exchange information about the network’s topology with other routers. 
Dynamic routing capabilities are the most desirable, because they allow internets to adapt to changing network conditions.1 Dynamic routing is nearly always used when internetworking across WANs. Dynamic routers regularly update the view of the entire network; this view also includes a map of devices operating at or below the network layer. Some dynamic routers also support traffic balancing. Special routing protocols are used to communicate across different administrative domains (such as an organization’s intranet and the Internet). To support effective communication, the exchange of appropriate routing and status information among routers is required. The routers exchange information about the state of the network’s links and interfaces, and about available paths, based on different metrics. Metrics used to calculate optimal paths through the network include cost, bandwidth, distance, delay, load, congestion, security, QoS, and reliability. Routing protocols are used as the means to exchange this vital information. The three protocols commonly used in the TCP/IP context are RIP, IGRP, and OSPF. The process of reconfiguring the routing tables (the process called convergence) must occur rapidly so as to prevent routers with dated information from misrouting PDUs. Two methodologies are used for information dissemination, distance vector and link-state. Routers that employ distance vector techniques create a network map by
exchanging information in a periodic and progressive sequence. Each router maintains a table of relative costs (hop count or other weights, such as bandwidth availability) from itself to each destination. The information exchanged is used to determine the scope of the network via a series of router hops. After a router has calculated each of its distance vectors, it propagates the information to each of its neighboring routers on a periodic basis, say, once every 60 seconds.2 If any changes have occurred in the network, as inferred from these vectors, the receiving router modifies its routing table and propagates it to each of its own neighbors. The process continues until all routers in the network have converged on the new topology. Distance vector routing was the early kind of dynamic routing. Distance vector protocols include RIP, IGRP, and DECnet Phase IV. Distance vector protocols can be implemented in a reasonably simple manner. Routers using link-state protocols learn the topology of the internetwork infrastructure and update each other’s tables by periodically flooding the network with link-state information. This information includes the identification of the links or subnetworks directly connected to each router, and the cost of the connection. Routers using the Open Shortest Path First (OSPF) algorithm send link-state information to all routers on the internet; in turn, these routers use the information to populate a table of routers and link and subnetwork connections. After this, each router calculates the optimal path from itself to each link; indirect paths are discarded in favor of the shortest path. Link-state routing is a newer form of dynamic routing. Here, routers broadcast their routing updates to all routers within the administrative domain. Since routing information is flooded, rather than just sent between neighboring routers as is the case in distance vector environments, each router can develop a complete map of the network topology. 
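The distance vector exchange described above can be sketched in miniature. The sketch below performs one round of the update rule (keep the cheaper of the direct cost and the cost via a neighbor); the three-router topology and costs are invented for illustration.

```python
INF = float("inf")

def distance_vector_update(own_costs, neighbor_vectors):
    """One round of a distance vector exchange: for each destination,
    keep the cheapest of the direct cost and the cost to a neighbor
    plus the distance that neighbor advertises."""
    destinations = set(own_costs)
    for vec in neighbor_vectors.values():
        destinations |= set(vec)
    table = {}
    for dest in destinations:
        best = own_costs.get(dest, INF)
        for neighbor, vec in neighbor_vectors.items():
            best = min(best, own_costs.get(neighbor, INF) + vec.get(dest, INF))
        table[dest] = best
    return table

# Router A is directly linked to B (cost 1) and C (cost 5);
# B advertises that it can reach C at cost 2.
own = {"B": 1, "C": 5}
advertised = {"B": {"C": 2}}
print(distance_vector_update(own, advertised))  # C is cheaper via B: cost 3
```

In a real network this exchange repeats, hop by hop, until all routers converge on the new topology.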
Given the topology map, each router can then calculate the best path to each destination. Link-state routing may well be the preferred choice in the future because it requires less bandwidth than distance vector routing and converges much faster following a topology change. The higher processing requirement for link-state routing algorithms becomes less important as processor performance increases and the price per million operations per second continues to go down. Link-state protocols are indicated for rules-based routing and support of type of service or quality of service features. These protocols tend to be resistant to the creation of routing loops. In addition, they enjoy low overhead to support the routing function; bandwidth frugality is achieved through the use of more intensive computing resources and higher memory requirements for the router.
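The link-state calculation each router performs over its topology map is, in essence, Dijkstra's shortest-path-first algorithm. A minimal sketch over a small, invented topology:

```python
import heapq

def shortest_paths(topology, source):
    """Dijkstra's shortest-path-first calculation over a link-state
    database: every router knows the full topology and computes its
    own shortest-path tree from it."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, link_cost in topology.get(node, {}).items():
            candidate = cost + link_cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Invented four-router topology with symmetric link costs.
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 7},
    "C": {"A": 4, "B": 2, "D": 3},
    "D": {"B": 7, "C": 3},
}
print(shortest_paths(topology, "A"))  # D is reached via B and C at cost 6
```

Indirect paths are discarded in favor of the shortest one, exactly as the OSPF description above indicates.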
IP Datagrams
As noted, in a TCP/IP environment, IP provides the underlying mechanism to transfer information from one end system on one LAN to another end system on the same or a different LAN.3 IP makes the underlying network transparent to the upper layers, TCP in particular. It is a connectionless protocol, where each IP PDU
is treated independently. In this context, PDUs are also called datagrams. IP provides two basic services, addressing and fragmentation and reassembly of long TCP PDUs. IP adds no guarantees of delivery, reliability, flow control, or error recovery to the underlying network other than what the data link layer mechanism already provides. IP expects the higher layers to handle such functions. IP may lose PDUs, deliver them out of order, or duplicate them; IP defers these problems to the higher layers (TCP, in particular). Effectively, IP delivers PDUs on a best-effort basis. There are no network connections, physical or virtual, maintained by IP. The format of an IP PDU is shown in Figure 2.3. It is 20 or more octets long. A partial discussion of the fields, their purposes, and their format follows. The VERS field describes the version of the IP protocol, for example, Version 4. The LEN field is the length of the IP header, counted in 32-bit units. The Type of Service field describes the quality of service requested by the sender for this IP PDU. It has the format: Precedence|D|T|R|xxx
where Precedence is an indication of the priority of the IP PDU; D specifies whether this IP PDU can be delayed (0) or cannot be delayed (1); T indicates the type of throughput desired (0 = normal, 1 = high); R specifies whether a reliable subnetwork is required (1) or not (0); and xxx is reserved for future use. The precedence options are Routine (000), Priority (001), Immediate (010), Flash (011), Flash Override (100), Critical (101), Internetwork Control (110), and Network Control (111). diffserv includes and extends these concepts.

Figure 2.3 IP protocol data unit (PDU). (The header fields, in order: VERS and LEN; Type of Service; Total Length; Identification; Flags; Fragment Offset; TTL; Protocol; Header Checksum; Source IP Address; Destination IP Address; Options; Padding; followed by the Information field.)

The Total Length field specifies the length of the entire IP PDU. Since the IP PDU is encapsulated in the underlying network frame (e.g., LLC/MAC), its length is constrained by the frame size of the underlying network. For example, as mentioned, the Ethernet limitation is 1500 octets. However, IP itself deals with this limitation by using segmentation and reassembly (SAR; also called fragmentation and defragmentation). IP does require, however, that all underlying networks be able to handle IP PDUs up to 576 octets in length without having to use SAR capabilities. Fragments of an IP PDU all have a header, basically copied from the original IP PDU, and segments of the data. They are treated as normal IP PDUs while being transported to the destination. However, if one of the fragments gets lost, the entire IP PDU is declared lost because IP does not support an acknowledgment mechanism; any fragments that have been delivered will be discarded by the destination. The Identification field contains a unique number assigned by the sender to aid in reassembling a fragmented IP PDU (all fragments of an initial IP PDU have the same and unique identification number). The Flags field is of the form 0|DF|MF, where DF (“don’t fragment”) specifies whether the IP PDU can be segmented (0) or not (1), and MF (“more fragments”) specifies whether there are more segments (1) or no more segments, the present one being the last (0). The Fragment Offset field is used with fragmented IP PDUs and aids in the reassembly process. The value represents the number of 64-bit blocks (excluding header octets) that are contained in earlier fragments.
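The fragmentation bookkeeping just described—a shared Identification value, the MF flag, and offsets counted in 64-bit (8-octet) units—can be sketched as follows. The payload size and MTU figures are illustrative.

```python
def fragment(payload: bytes, ident: int, mtu_payload: int):
    """Split a datagram payload into fragments. Offsets are counted in
    8-octet (64-bit) units, so each non-final fragment must carry a
    multiple of 8 payload octets."""
    chunk = (mtu_payload // 8) * 8  # round down to an 8-octet boundary
    fragments = []
    for start in range(0, len(payload), chunk):
        data = payload[start:start + chunk]
        more_fragments = 1 if start + chunk < len(payload) else 0
        fragments.append({
            "identification": ident,  # the same for every fragment
            "mf": more_fragments,     # MF flag: 1 = more fragments follow
            "offset": start // 8,     # in 8-octet units
            "data": data,
        })
    return fragments

# A 2000-octet payload over a link that carries 1480 payload octets per frame
# splits into two fragments; the second starts at offset 185 (= 1480 / 8).
frags = fragment(b"x" * 2000, ident=42, mtu_payload=1480)
for f in frags:
    print(f["offset"], f["mf"], len(f["data"]))
```

The receiver uses the shared Identification value to group the fragments and the offsets to place each one; if any fragment is missing, the whole datagram is discarded, as noted above.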
In the first segment (or if the IP PDU consists of a single segment), the value is set to zero. The Time to Live (TTL) field specifies the time in seconds that this IP PDU is allowed to remain in circulation. Each IP gateway through which this IP PDU passes subtracts from this field the processing time expended on this IP PDU (each gateway is requested to subtract at least one unit from this counter). When the value of the field reaches zero, it is assumed that this IP PDU has been traveling in a loop, and it is therefore discarded. The Protocol field indicates the higher-level protocols to which this gateway should deliver the data. For example, a code of decimal 6 (= 00000110) means TCP; 29 is for ISO TP4; 10 is for BBN’s RCC; 22 is for Xerox’s IDP; 66 is for MIT’s RVD; and so forth. The Header Checksum field is a checksum covering the header only. It is calculated as the 16-bit ones complement of the ones complement sum of all 16-bit words in the header (for the purpose of the calculation, the Header Checksum field is assumed to be all zeros). The Source IP Address field contains the 32-bit IP address of the device sending this IP PDU. The Destination IP Address field contains the destination for this IP PDU. The IP addresses consist of the pair <network ID, device ID>.
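The header checksum rule described above (the 16-bit ones complement of the ones complement sum of the header's 16-bit words, with the checksum field taken as zero) can be computed directly. The sample header octets below are invented, using documentation addresses.

```python
def header_checksum(header: bytes) -> int:
    """Ones complement of the ones complement sum of all 16-bit words
    in the header (checksum field assumed zero during calculation)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold in the carry
    return ~total & 0xFFFF

# A minimal 20-octet IPv4 header with the checksum field (octets 10-11)
# zeroed; addresses are documentation values, purely illustrative.
header = bytes([
    0x45, 0x00, 0x00, 0x14,  # VERS/LEN, Type of Service, Total Length = 20
    0x00, 0x01, 0x00, 0x00,  # Identification, Flags/Fragment Offset
    0x40, 0x06, 0x00, 0x00,  # TTL = 64, Protocol = 6 (TCP), checksum = 0
    0xC0, 0x00, 0x02, 0x01,  # source 192.0.2.1
    0xC0, 0x00, 0x02, 0x02,  # destination 192.0.2.2
])
print(hex(header_checksum(header)))
```

A receiver recomputes the sum over the header as received (checksum field included); a total of all ones indicates an intact header.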
IP allows a portion of the host or device field to be used to specify a subnetwork (the network ID portion cannot be changed). Subnetworks are an extension to this scheme by considering a part of the <device ID> to be a local network address, that is, a subnetwork address. IP addresses are then interpreted as <network ID><subnetwork address><device ID>.
Subnet masks are used to describe subnetworks; they tell devices residing on the network how to interpret the device ID portion of the IP address. An example is: 255.255.255.240 (also represented as /28) for an environment with 16 subnets, each with 16 addresses, of which 14 are usable for hosts (the all-zeros and all-ones host values are reserved). The address-checking software in each device is informed via the subnet mask not to treat the device ID exclusively as a device identifier, but as a subnetwork identifier followed by a smaller device identifier. Naturally, since the address space is finite, there is a trade-off between the number of subnetworks that can be supported and the number of devices on each subnetwork. The Internet community has adopted a scheme called Classless Inter-Domain Routing (CIDR) that will preserve addresses by abandoning the old class rules. CIDR is expected to provide relief for the foreseeable future. The Options field (which must be processed by all devices in the interconnected network, although not all devices must be able to generate such a field) defines additional specific capabilities. These include explicit routing information, recording routes traveled, and timestamping.
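Python's standard ipaddress module can illustrate how a /28 mask partitions an address into subnetwork and device portions; the address used is a documentation value.

```python
import ipaddress

# Interpret the documentation address 192.0.2.77 under a /28 mask
# (255.255.255.240): the top 28 bits identify the subnetwork.
iface = ipaddress.ip_interface("192.0.2.77/28")

print(iface.network)          # 192.0.2.64/28 - the subnetwork this device is on
print(iface.network.netmask)  # 255.255.255.240
hosts = list(iface.network.hosts())
print(len(hosts))             # 14 usable host addresses per /28 subnet
```

The hosts() iterator excludes the all-zeros (subnetwork) and all-ones (broadcast) values, matching the reserved-address trade-off described above.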
Support of Voice and Video in Routers
Even if new technologies such as those described later in this book (e.g., the technologies of Chapter 8) enter the market in a significant way, there will still be a large pool of enterprise networks based on traditional routers. Hence, support of such QoS metrics as delay, loss, jitter, and bandwidth in routers is important. Applications such as desktop conferencing, distance learning, mission-critical applications, voice, e-mail, and file transfer all compete for enterprise network resources. PDUs for all these applications show up in the routers and have to be appropriately handled, if QoS is to be secured. QoS attention is being focused at the WAN level since, in general, there is adequate bandwidth at the LAN level. In addition, the move to switched Ethernet all but eliminates delays due to random access contention (however, queuing delays in campus routers remain to be addressed). Cisco Systems, which is the market leader for routers, is approaching router-level QoS by using the following techniques4:
• Smart queuing
  • Priority queuing
  • Custom queuing
  • Weighted fair queuing
  • Weighted Random Early Detection (WRED)
• Filtering and shaping
  • Traffic shaping
  • Frame relay traffic shaping
These techniques have been introduced in routers since the mid-1990s and are proving to be an initial step in the direction of end-to-end QoS in both the Internet and in intranets. It is expected that in the coming years the protocols discussed in this chapter will become standard features in high-end routers. The topic of QoS is discussed at length in Chapter 8.
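As a simplified illustration of the smart-queuing idea—not Cisco's actual implementation—a strict priority scheduler can be sketched as follows; the queue names and two-level scheme are invented.

```python
from collections import deque

class PriorityQueuing:
    """Strict priority queuing sketch: the scheduler always drains the
    highest-priority non-empty queue first."""
    def __init__(self, levels=("high", "normal")):
        self.queues = {level: deque() for level in levels}
        self.order = levels

    def enqueue(self, level, pdu):
        self.queues[level].append(pdu)

    def dequeue(self):
        for level in self.order:           # scan queues best-first
            if self.queues[level]:
                return self.queues[level].popleft()
        return None                        # nothing waiting

scheduler = PriorityQueuing()
scheduler.enqueue("normal", "file-transfer PDU")
scheduler.enqueue("high", "voice PDU")
scheduler.enqueue("normal", "e-mail PDU")
print(scheduler.dequeue())  # the voice PDU jumps the queue
```

Strict priority is the simplest of the listed disciplines; custom and weighted fair queuing refine it so that lower-priority traffic cannot be starved indefinitely.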
IP Version 6 (IPv6)
The explosion of interest in Internet-based communication, the plethora of WWW sites being established on a daily basis, the introduction of electronic commerce [1], and the proliferation of networked resources all over the world may, in the next few years, exhaust the address space of IPv4. Therefore, in order to support an uninterrupted growth of the Internet while maintaining the current IP routing and addressing architecture, not only is a larger IP address space needed, but the assignments of addresses must also enable scaleable routing [9]. Although the 32-bit address mechanism in IPv4 can handle over 4 billion devices on about 16.7 million networks, the usable address space is more limited, particularly given the classification of addresses into Classes A, B, and C that exists for pre-CIDR applications. For example, an organization might have received a Class B address (i.e., the last 16 bits all represent hosts), but not made full use of it. Any time the administrator decides to subnet, a price must be paid in available devices. The total number of devices under subnets is always less than the number of devices without subnetting. That is the trade-off for the ability to network more easily. The lost addresses are all zeros and all ones addresses for each subnet and the all ones and all zeros values for the subnet field itself. The price of subnetting varies with the number of subnet bits and the class of network used. The IETF started to look at this problem in 1991. Part of its 1992 recommendation was to look at “bigger Internet addresses.” IPv6 has a 128-bit address.5 The group working on the protocol, sometimes known as the Simple Internet Protocol Plus Group, included the “working” IPv4 functions (though it included them in some different places in the IP header, or by different names), and removed (or
made optional) infrequently used or “nonworking” functions. The following list depicts some highlights of IPv6 [10]. The protocol is described in RFC 1883; additional information on transition mechanisms can be found in RFC 1933 (see ds.InterNIC.net) [11, 12]. It now appears that the “address crunch” will not impact usage for the next 2–5 years; however, additional address space will be required over time.

IPv6 Highlights
Priority. There are four bits of the datagram to indicate its priority relative to other datagrams traveling across the network. The priority field first distinguishes between two broad types of traffic and then further refines the relative priorities within each traffic type. The broadest distinction is between congestion-controlled and non-congestion-controlled traffic; the former remains sensitive to congestion in the network. If the source detects congestion, it slows down, reducing traffic to the network. By slowing down, the system helps alleviate the congested situation.
Streamlined header format. The IPv6 header is optimized for efficient processing. Superfluous fields have been eliminated, and all multibyte fields align on their natural boundaries.
Flow label. The header now includes a flow label. Flow values may be assigned to particular streams of traffic with special quality-of-service requirements.
128-bit network addresses. As needed for growth, IPv6 now supports 128-bit network addresses.
Elimination of header checksum. IPv6 no longer has a checksum of its own header.
Fragmentation only by source host. Intermediate routers can no longer fragment a datagram. Only the sending host can create fragments.
Extension headers. IPv6 is much more flexible in its support of options. Options appear in extension headers that follow the IP header in the datagram.
Built-in security. IPv6 requires support for both authentication and confidentiality.
The IPv4 address space problem is just one of the motivations behind IPv6.
Some argue that today’s IPv4 host implementations lack such features as autoconfiguration, network layer security, priority, and others. IPv6 is intended to address most of the inadequacies in the existing IPv4 implementations. These inadequacies require the introduction of a new network layer header (IPv6). IPv6 removes the Header Length field and the IP Header Checksum and changes the Time to Live field to become a Hop Limit (see Figure 2.4). IPv6 is identified in the Ethertype field in the SNAP/LLC with the siglum 86dd hex instead of 0800 hex.
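The Ethertype distinction can be sketched in a few lines, cross-checked against the version nibble that leads both header formats; the sample octet values are invented.

```python
# Distinguish IPv4 from IPv6 by the Ethertype carried in the frame
# (0x0800 vs 0x86DD), then cross-check the version nibble of the
# first header octet (4 or 6).
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD

def classify(ethertype: int, first_octet: int) -> str:
    version = first_octet >> 4  # VERS occupies the top 4 bits
    if ethertype == ETHERTYPE_IPV4 and version == 4:
        return "IPv4"
    if ethertype == ETHERTYPE_IPV6 and version == 6:
        return "IPv6"
    return "mismatch"

print(classify(0x0800, 0x45))  # IPv4 (version 4, header length 5 words)
print(classify(0x86DD, 0x60))  # IPv6
```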
Figure 2.4 IPv6 header. (A 32-bit-wide layout: VERS, PRI, and Flow Label; Payload Length, Next Header, and Hop Limit; Source Address; Destination Address; followed by the Information field.)
IPv6 supports datagram priority. Datagram-level priority values are placed in the priority field of the traffic’s IP datagrams. For congestion-controlled traffic, IP defines the following priority values [10]:
0 No specific priority
1 Background traffic (e.g., news)
2 Unattended data transfer (e.g., e-mail)
3 Reserved for future definition
4 Attended bulk transfer (e.g., file transfer)
5 Reserved for future definition
6 Interactive traffic (e.g., remote login and windowing systems)
7 Control traffic (e.g., routing protocols and network management)
The second traffic type, non-congestion-controlled, does not adjust to congestion in the network. Such traffic sources include real-time audio, which cannot be delayed. IP reserves the priority values 8 through 15 for such traffic. For now, however, the IP standards offer no guidelines for specific assignment of these priorities. The source marks each datagram based on how willing it is to have the network
discard that datagram. Lower priority values indicate a greater willingness to have a datagram discarded [10]. Real-time audio offers an example of how an application may use noncongestion-controlled priority. Many algorithms that digitize audio can tolerate the loss of some packets. In most cases, though, the algorithms have more difficulty reconstructing the audio when successive packets are lost. To reduce the probability of this happening, an audio application may vary the priority of its datagram. It might choose to alternate the priority between 8 and 9 with each datagram. Should the network have to discard two of the application’s datagrams, it will try to discard two of priority 8 before any of priority 9. Note that there is no relative ordering between congestion-controlled and non-congestion-controlled traffic. A datagram of priority 8 (non-congestion-controlled), for example, has neither a lower nor higher priority than a datagram of priority 7 (congestion-controlled) [10].
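The alternating-priority idea for an audio stream can be sketched as follows; the frame names are invented placeholders.

```python
from itertools import cycle

# Non-congestion-controlled audio datagrams alternate between
# priorities 8 and 9, so under pressure the network discards two
# priority-8 datagrams before any of priority 9, making the loss of
# successive packets less likely.
priorities = cycle([8, 9])

def mark(datagrams):
    return [(next(priorities), d) for d in datagrams]

marked = mark(["frame-0", "frame-1", "frame-2", "frame-3"])
print(marked)
```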
2.3 IP over ATM (IPOATM)
As a consequence of the popularity of IP implied in the previous section, one of the key considerations about ATM technology in recent years has been the support of IP. This requirement is driven by: (1) the desire to support the embedded base of applications and enterprise networks (including intranets), and (2) the desire to have access to the Internet, including virtual private networks (VPNs), over it. Beyond basic support of IP over ATM, the industry has also looked at ways to use the advantages of ATM to simplify IP-level Layer 3 PDU forwarding. In view of increased corporate dependence on information, including data, video, voice, graphics, and distributed resources (Web access), users and planners want faster, larger, and better-performing networks—namely, higher speeds, scalability, and better performance and management. Some of the challenges being addressed are [13]: (1) Support more than best-effort and constant-bit-rate services on the same router-based network, (2) support circuitlike service via IP on a router-based network, and (3) support service levels, regardless of physical media and network discipline (e.g., ATM/LAN switching, traditional IP routing, and IP routing over ATM). Classical IP over ATM is the method of moving LAN and intranet traffic over ATM that has been developed by the IETF. The IETF’s specification is defined to provide native IP support over ATM and is documented in the following RFCs:
• RFC 1483: Multiprotocol Encapsulation over ATM Adaptation Layer 5
• RFC 1577: Classical IP and ARP over ATM
• RFC 1755: ATM Signaling Support for IP over ATM
• RFC 2022: Multicast Address Resolution (MARS) Protocol
These protocols are designed to treat ATM as virtual wire with the property of being connection-oriented, therefore, as with LAN Emulation (LANE), requiring unique means for address resolution and broadcast support. In the CIOA model,6 the ATM fabric interconnecting a group of hosts is considered a network, called Nonbroadcast Multiple Access (NBMA). An NBMA network is made up of a switched service like ATM or Frame Relay with a large number of end stations that cannot directly broadcast messages to each other. While there may be one Layer 2 network on the NBMA network, it is subdivided into several logical IP subnetworks (LISs) that can be traversed only via routers (see Figure 2.5).

Figure 2.5 Classical IP over ATM model. (An ATM network interconnects logical IP subnet 1 and logical IP subnet 2, each with its own ARP server; the figure shows registration, ARP request, and ARP response flows over access links, trunks, and PVCs, along with the IP PDU path across the ATM network.)

One of the design philosophies behind CIOA is that network administrators started out building networks using the same techniques that are used today—that is, dividing hosts into physical groups, called subnetworks, according to administrative workgroup domains.7 Then, the subnetworks are interconnected to other subnetworks via routers. A LIS in CIOA is made up of a collection of ATM-attached hosts and ATM-attached IP routers that are part of a common IP subnetwork. Policy administration, such as security, access controls, routing, and filtering, will still remain a function of routers because the ATM network is just fat wire. In CIOA, the functionality of address resolution is provided with the help of special-purpose server processes that are typically located together. This is accomplished via software upgrades on legacy routers. Each CIOA LIS has an Address Resolution Protocol (ARP) server that maintains IP address–to–ATM address mappings. All members of the LIS register with the ARP server; subsequently, all ARP requests from members of the LIS are handled by the ARP server. IP ARP requests are forwarded from hosts directly to the LIS ARP server, using MAC/ATM address
38
Chapter Two
mapping that is acquired at CIOA registration. The ARP server, which is running on an ATM-attached router, replies with an ATM address. When the ARP request originator receives the reply with the ATM address, it can then issue a call setup message and directly establish communications with the desired destination. One of the limitations of this approach is that CIOA has no understanding of ATM QoS. Also, CIOA has the drawback of supporting only the IP protocol because the ARP server is knowledgeable only about IP. In addition, this approach does little to reduce the use of routers, although it does have the effect of separating, to a degree, the data forwarding function from the IP PDU processing function. In effect, IP PDUs do not have to be examined at the end of each hop, but can be examined at the end of a virtual channel (VC) or virtual path (VP) that may consist of several hops. The challenge is how to identify (address) the VC in question to reach a specific remote IP peer—hence, the address resolution function. The CIOA model’s simplicity does reduce the amount of broadcast traffic and the number of interactions with various servers8; in addition, once the address has been resolved there is a potential that the data transfer rate may subsequently be reduced. However, the reduction in complexity does come with a reduction in functionality. Communication between LISs must be made via ATM-attached routers that are members of more than one LIS. One physical ATM network can logically be considered to be several logical IP subnetworks, but the interconnection across IP subnets, from the host perspective, is accomplished via another router. Using an ATM-attached router as the path between subnetworks prevents ATM-attached end stations in different subnetworks from creating direct virtual circuits between one another. This restriction has the potential to degrade throughput and increase latency. 
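A toy model of the per-LIS ARP server described above can make the registration and resolution steps concrete; the IP and ATM addresses are invented placeholders.

```python
class ArpServer:
    """Illustrative CIOA ARP server: one per logical IP subnet,
    mapping registered members' IP addresses to ATM addresses."""
    def __init__(self):
        self.table = {}  # IP address -> ATM address

    def register(self, ip_addr, atm_addr):
        """Record a LIS member (RFC 1577 learns this via InATMARP)."""
        self.table[ip_addr] = atm_addr

    def resolve(self, ip_addr):
        """Answer an ATMARP request; None if the member never registered."""
        return self.table.get(ip_addr)

server = ArpServer()
server.register("192.0.2.10", "atm-endpoint-A")
server.register("192.0.2.11", "atm-endpoint-B")
print(server.resolve("192.0.2.11"))  # ATM address then used for call setup
print(server.resolve("192.0.2.99"))  # unregistered host -> None
```

The single, manually configured server is also the model's weak point: as the text notes, there is no provision for redundancy.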
There are also questions about the reliability of the IP ARP server, in that the current version of the specification has no provisions for redundancy: If the ARP server were to fail, all hosts on the LIS would be unable to use the ARP. Finally, CIOA suffers from the drawback that each host needs to be manually configured with the ATM address of the ARP server, as opposed to the dynamic discovery allowed. RFC 1577 specifies two major modifications to traditional connectionless ARP. The first modification is the creation of the ATMARP message, which is used to request addresses. The second modification is the InATMARP message, which inverts address registration. When a client wishes to initialize itself on a LIS, it establishes a switched virtual circuit to the CIOA ARP server. Once the circuit has been established, the server obtains the client’s ATM address from the calling party field of the call setup message. The server can now transmit an InATMARP request in an attempt to determine the IP address of the client that has just created the virtual circuit. The client responds to the InATMARP request with its IP address, and the server uses this information to build its ATMARP table cache. The ARP table in the server will contain listings for IP-to-ATM pairs for all hosts that have registered and periodically refreshed their entries to prevent them from timing out. The ATMARP server cache answers subsequent ATMARP requests for the clients’ IP addresses. Clients wishing
to resolve addresses generate ATMARP messages that are sent to their servers and locally cache the replies. Client cache table entries expire and must be renewed every 15 minutes. Server entries for attached hosts time out after 20 minutes. Data transfer is done by creating a VC between hosts, then using Logical Link Control/Subnetwork Access Protocol (LLC/SNAP) encapsulation of data that has been segmented by AAL 5. Mapping IP packets onto ATM cells using LLC/SNAP is specified in RFC 1483, Multiprotocol Encapsulation over ATM. RFC 1483 specifies how data is formatted prior to segmentation (the RFC documents several different methods; however, the vast majority of host and router implementations use the LLC/SNAP encapsulation, and LLC/SNAP specifies that each datagram is prefaced with a bit pattern that the receiver can use to determine the protocol type of the source). The advantages provided by the encapsulation method specified in RFC 1483 are that it treats ATM as a data link layer that supports a large maximum transfer unit (MTU) and that it can operate in either a bridged or a multiplexed mode. Because the network is not emulating an Ethernet or Token Ring, like LANE, the MTU has been specified to be as large as 9180 bytes. The large MTU can improve the performance of hosts attached directly to the ATM network. As noted, multicast support is also of interest. CIOA provides multicast support via the Multicast Address Resolution Server (MARS). The MARS model is similar to a client/server design because it operates by requiring a multicast server to keep membership lists of multicast clients that have joined a multicast group. A client is assigned to a multicast server by a network administrator at configuration time. In the MARS model, a MARS system, along with its associated clients, is called a cluster.
The MARS approach uses an address resolution server to map an IP multicast address from the cluster onto a set of ATM endpoint addresses of the multicast group members.
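The LLC/SNAP encapsulation described above prefixes each routed IPv4 datagram with a fixed 8-octet header (LLC AA-AA-03, SNAP OUI 00-00-00, then the EtherType) before AAL 5 segmentation. A minimal sketch (the field values follow RFC 1483; the helper name is ours):

```python
def llc_snap_encapsulate(ip_datagram: bytes) -> bytes:
    """Prefix a routed IPv4 datagram with the RFC 1483 LLC/SNAP header
    before it is handed to AAL 5 for segmentation into ATM cells."""
    llc = bytes([0xAA, 0xAA, 0x03])          # DSAP=0xAA, SSAP=0xAA, ctrl=UI
    snap = bytes([0x00, 0x00, 0x00])         # SNAP OUI 0x000000: EtherType follows
    ethertype = (0x0800).to_bytes(2, "big")  # 0x0800 = IPv4
    return llc + snap + ethertype + ip_datagram

# Example: a dummy 20-octet IP header yields a 28-octet encapsulated PDU.
pdu = llc_snap_encapsulate(bytes(20))
```

The 8-octet prefix is what lets a receiver demultiplex several protocols over one VC, which is why most host and router implementations prefer it over the VC-multiplexed mode.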
2.4 Basic Synopsis of MPLS

MPLS protocols allow high-performance label switching of IP packets; network traffic is forwarded by using a simple label, as described in RFC 3031 [14]. This section provides a working-level snapshot view of MPLS that is sufficient to highlight its potential value for VOIP and the QoS capabilities that it supports. Chapter 9 provides a more extended view of MPLS itself. MPLS is a five-year-old set of standards providing a link layer–independent transport framework for IP. MPLS runs over ATM, frame relay (FR), Ethernet, and point-to-point packet mode links. MPLS-based networks use existing IP mechanisms for the addressing of elements and for the routing of traffic. MPLS adds connection-oriented capabilities to the connectionless IP architecture, and it is the industry-accepted manifestation of the Network Layer/Layer 3/tag/IP switching technology that developed from various sources during the mid- to late 1990s. In an MPLS domain, when a stream of data traverses a common path, a label-switched path (LSP) can be established by using MPLS signaling protocols. At the
ingress label switch router (LSR), each packet is assigned a label and is transmitted downstream. At each LSR along the LSP, the label is used to forward the packet to the next hop. By combining the attributes of Layer 2 switching and Layer 3 routing into a single entity, MPLS provides [15] (1) enhanced scalability via switching technology; (2) support of class of service (CoS)- and QoS-based services (diffserv and intserv); (3) elimination of the need for an IPOATM overlay model and associated management overhead; and (4) enhanced traffic-shaping and traffic-engineering (TE) capabilities. Table 2.3 describes some MPLS features that make it a useful networking technology; Table 2.4 lists key RFCs related to MPLS; and Table 2.5 lists key Internet Drafts related to MPLS.

Table 2.3 Application-Oriented Features of MPLS

Aggregation of PDU streams. Typically, when multiple streams have to be aggregated for forwarding into a switched path, processing is required at both Layer 2 and Layer 3. In MPLS, however, the label-stacking mechanism can be used to perform the aggregation within Layer 2 itself. The top label of the MPLS label stack is used to switch PDUs along the label-switched path, whereas the rest of the label stack is application-specific.

Explicit, improved routes. MPLS supports explicit routes—those that have not been set up by normal IP hop-by-hop routing but for which ingress and egress nodes have specified all or some of the downstream nodes.

Improved performance. MPLS enables higher data transmission performance through simplified packet forwarding and switching mechanisms.

Link layer independence. MPLS works with any type of link layer medium, such as ATM, FR, packet over SONET, and Ethernet.

QoS support. Explicit routes provide a mechanism for QoS constraint routing, among others. As an example, some of the initial deployment of MPLS was over ATM infrastructures; in other cases, it was over a metro optical network. In the ATM scenario, the core LSRs and edge LSRs can allocate QoS to different user requirements and map them to different ATM VCs that support different ATM QoS. Because the edge LSRs constitute the ingress to the ATM overlay network, they are responsible for efficiently classifying the IP flows and mapping them to the ATM QoS.

Scalability of network layer routing. A key MPLS desideratum was to achieve a better, more efficient transfer of PDUs in current IP networks. Combining the routing knowledge at Layer 3 with the ATM switching capability in ATM devices results in a better solution. In the MPLS scenario, it is sufficient to have adjacencies with the immediate peers; the edge LSRs interact with adjacent LSRs, which is sufficient for creating LSPs to transfer data.

Traffic engineering (TE). MPLS supports TE, a process of selecting the paths chosen by the data traffic for balancing the traffic load on the various links, routers, and switches within a network. To meet key performance objectives, TE must be (1) traffic-oriented, to include those aspects that enhance the QoS of traffic streams, and (2) resource-oriented, to include those aspects that optimize resource use.

Virtual private network (VPN) support. VPN is an application that uses label-stacking mechanisms. At the VPN ingress node, the VPN label is mapped onto the MPLS label stack, and the packets are label-switched along the LSP within the VPN until they emerge at the egress. At the egress node, the label stack is used for determining further forwarding of the PDUs.

Table 2.4 Basic MPLS RFCs

RFC 2702. Requirements for TE over MPLS. Identifies the functional capabilities required for implementing policies that facilitate efficient, reliable network operations in an MPLS domain. These capabilities can optimize the use of network resources and enhance traffic-oriented performance characteristics.

RFC 3031. Specifies the MPLS architecture.

RFC 3032. MPLS label-stack encoding. Specifies the encoding to be used by an LSR for transmitting labeled packets on Point-to-Point Protocol (PPP) data links, on LAN data links, and possibly on other data links. Also specifies rules and procedures for processing the various fields of label-stack encoding.

The delivery of QoS is where MPLS finds its greatest proficiency in the support of voice applications. The improved traffic management, the QoS capabilities, and the expedited packet forwarding via the label mechanism can all represent significant technical advantages for voice applications. Two approaches have evolved: (1) voice over MPLS (VOMPLS) and (2) voice over IP (VOIP) over, or encapsulated in, MPLS (VOIPOMPLS). Both approaches are discussed in Chapter 9. MPLS can be logically and functionally divided into two mechanisms for providing label-switching functionality: (1) the MPLS forwarding/label-switching mechanism and (2) the MPLS label-distribution mechanism.
MPLS Forwarding/Label-Switching Mechanism

The key mechanism of MPLS is the forwarding/label-switching function (see Figure 2.6 for a view of the label mechanism). This is an advanced form of packet forwarding that replaces the conventional longest-address-match forwarding algorithm with a more efficient label-swapping forwarding algorithm. The IP header analysis is performed once at the ingress of the LSP for the classification of PDUs. PDUs that are forwarded via the same next hop are grouped into a forwarding equivalence class (FEC) based on one or more of the following parameters: (1) the address prefix, (2) the host address, and (3) the host address and QoS. The FEC to which the PDU belongs is encoded at the edge LSRs as a short fixed-length value known as a label. When the PDU is forwarded to its next hop,
Table 2.5 QoS-Related Internet Drafts

Policy Framework MPLS Information Model for QoS and TE (December 2000)
MPLS Support of Differentiated Services (April 2001)
MPLS Support of Differentiated Services Using E-LSP (April 2001)
Integrated Services Across MPLS Domains Using CR-LDP Signaling (May 2001)
Figure 2.6 MPLS label mechanism. The 32-bit MPLS shim sits between the link layer header and the network layer (IP) header and carries a 20-bit label, 3 experimental bits, a 1-bit bottom-of-stack flag, and an 8-bit TTL; the MPLS-encapsulated IP/TCP information is then carried in the appropriate Layer 2/Layer 1 encapsulation (e.g., ATM's AAL 5, frame relay's LAPD, Ethernet 802.3, SONET's POS frame).

the label is sent along with it. At downstream hops, there is no further analysis of the PDU's network layer header; instead, the label is used as an index into a table, the entry of which specifies the next hop and a new label. The incoming label is replaced with this outgoing label, and the PDU is forwarded to its next hop. Labels usually have a local significance and are used to identify FECs based on the type of underlying network. For example, in ATM networks, the virtual path identifier (VPI) and virtual channel identifier (VCI) are used in generating the MPLS label; in FR networks, the data link connection identifier (DLCI) is used. In ATM networks, the labels assigned to the FECs (PDUs) are the VPI and VCI of the virtual connections established as part of the LSP; in FR networks, the labels assigned to the FECs (PDUs) are the DLCIs. Label switching has been designed to leverage the Layer 2 switching function done in current data link layers, such as ATM and FR. It follows that the MPLS forwarding mechanism should be able to update the switching fabrics in ATM and FR
hardware in the LSR for the relevant sets of LSPs, which can be switched at the hardware level [16]. In Ethernet-based networks, the labels are short headers placed between the data link headers and the data link layer PDUs.
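The shim fields shown in Figure 2.6 and the label-swapping lookup described above can be sketched in a few lines (a simplified illustration, not a production forwarder; the table contents and router names are invented):

```python
import struct

def encode_shim(label: int, exp: int, bottom: bool, ttl: int) -> bytes:
    """Pack one 32-bit MPLS shim: 20-bit label, 3 experimental bits,
    1-bit bottom-of-stack flag, 8-bit TTL."""
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def decode_shim(shim: bytes):
    (word,) = struct.unpack("!I", shim)
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 1), word & 0xFF

# Per-LSR label-swapping table: incoming label -> (next hop, outgoing label).
lfib = {17: ("lsr-b", 42), 42: ("lsr-c", 99)}

def forward(shim: bytes):
    """Swap the incoming label for the outgoing one and decrement the TTL;
    no network layer header analysis is needed at this hop."""
    label, exp, bottom, ttl = decode_shim(shim)
    next_hop, out_label = lfib[label]
    return next_hop, encode_shim(out_label, exp, bottom, ttl - 1)

next_hop, new_shim = forward(encode_shim(17, 0, True, 64))
```

The single dictionary lookup is the point: once the ingress LSR has classified a PDU into an FEC and labeled it, every downstream hop does only this fixed-length swap.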
MPLS Label-Distribution Mechanism

The MPLS architecture does not assume a single label-distribution protocol. Label distribution in MPLS is accomplished in the following two ways:
1. Using the Resource Reservation Protocol (RSVP) signaling mechanism to distribute labels mapped to RSVP flows.
2. Using the Label Distribution Protocol (LDP).
Label Distribution Using the RSVP

The RSVP defines a session to be a data flow with a particular destination and transport layer protocol [17]. From the early to late 1990s, the RSVP was considered appropriate for QoS in IP networks. When RSVP and MPLS are combined, a flow or session can be defined with greater generality. The ingress node of an LSP can use a variety of means to determine which PDU sets are assigned to a particular label. Once a set of PDUs is assigned to a label, the label effectively defines the flow through the LSP. Such an LSP is referred to as an LSP tunnel because the traffic flowing through it is opaque to intermediate nodes along the label-switched path. The label-request information for the labels associated with RSVP flows will be carried as part of the RSVP path messages; the label-mapping information for the labels associated with RSVP flows will be carried as part of the RSVP resv messages [16]. The initial implementors of MPLS chose to extend RSVP into a signaling protocol for supporting the creation of LSPs that could be routed automatically away from network failures and congestion. An Internet Draft defines the extension to RSVP for establishing LSPs in MPLS networks [18]. The use of RSVP as a signaling protocol for TE is quite different from that envisioned by its original developers in the mid-1990s, as follows [19]:
• A number of extensions were added to the base RSVP specification (RFCs 2205 and 2209) for supporting the establishment and maintenance of explicitly routed LSPs.
• RSVP signaling takes place between pairs of routers (rather than pairs of hosts) that act as the ingress and egress points of a traffic trunk. Extended RSVP installs a protocol state that applies not to a single host-to-host flow but to a collection of flows sharing a common path and a common pool of network resources.
By aggregating numerous host-to-host flows into each LSP tunnel, extended RSVP significantly reduces the amount of RSVP state that must be maintained in the core of a service provider’s network.
• RSVP signaling installs a distributed state related to packet forwarding, including the distribution of MPLS labels.
• The scalability, latency, and traffic-overhead concerns regarding RSVP's soft-state model are addressed by a set of extensions that reduce the number of refresh messages and the associated message-processing requirements.
• The path established by RSVP signaling is not constrained by conventional destination-based routing, so it is a good tool for establishing TE trunks.
In 1997, the initial implementors of MPLS had many reasons for choosing to extend RSVP rather than design an entirely new signaling protocol to support TE requirements [19]. Extended RSVP provides a unified signaling system that delivers everything that network operators needed to dynamically establish LSPs, including the following:
• Extended RSVP creates an LSP along an explicit route to support the TE requirements of large service providers.
• Extended RSVP establishes an LSP state by distributing label-binding information to the LSRs in the LSP.
• Extended RSVP can reserve network resources in the LSRs along the LSP, which is the traditional role of RSVP. Extended RSVP, however, also permits an LSP to carry best-effort traffic without making a specific resource reservation.
As will be seen in Chapter 8, RSVP can serve a dual role in MPLS: one for label distribution, another for QoS support.
LDP

The LDP is a set of procedures and messages by which LSRs establish LSPs through a network by mapping network layer–routing information directly to data link layer–switched paths. These LSPs may have an endpoint at a directly attached neighbor (comparable to IP hop-by-hop forwarding) or an endpoint at a network egress node, enabling switching via all intermediary nodes. The LDP associates an FEC with each LSP that it creates. The FEC associated with an LSP specifies which PDUs are mapped to that LSP. LSPs are extended through a network as each LSR splices incoming labels for an FEC to the outgoing label assigned to the next hop for the given FEC. The messages exchanged between LSRs are classified into four categories:
1. Discovery messages, used to announce and maintain the presence of an LSR in a network.
2. Session messages, used to establish, maintain, and terminate sessions between LDP peers.
3. Advertisement messages, used to create, change, and delete label mappings for FECs.
4. Notification messages, used to provide advisory information and to signal error information.
The LDP uses the Transmission Control Protocol (TCP) for session, advertisement, and notification messages. The TCP is used to provide reliable, sequenced messages. Discovery messages, transmitted via the UDP, are sent to the LDP port at the all-routers-on-this-subnet group multicast address. These messages provide a mechanism through which LSRs can indicate their presence within a network. An LSR sends a hello message periodically, and when it chooses to establish a session with another LSR discovered via a hello message, it uses (via the TCP) the LDP initialization procedure. Upon successful completion of the initialization procedure, the two LSRs become LDP peers and may exchange advertisement messages. An LSR requests label mapping from a neighboring LSR when it needs one, and it advertises a label mapping to a neighboring LSR when it wishes the neighboring LSR to use a label.
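The four message categories and their transports can be condensed into a small lookup (an illustrative summary only; the well-known LDP port, 646, is background from the LDP specification rather than stated in the text):

```python
LDP_PORT = 646  # well-known UDP and TCP port for LDP

# Discovery hellos run over UDP to the all-routers multicast group;
# session, advertisement, and notification messages run over the
# reliable, sequenced TCP session between LDP peers.
TRANSPORT_BY_CATEGORY = {
    "discovery": "udp",
    "session": "tcp",
    "advertisement": "tcp",
    "notification": "tcp",
}

def transport_for(category: str) -> str:
    """Return which transport carries a given LDP message category."""
    return TRANSPORT_BY_CATEGORY[category]
```

The split matters operationally: hellos must work before any session exists, so they cannot depend on TCP, while label advertisements must never be lost or reordered, so they must.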
2.5 Real-time Transport Protocol (RTP)
There has been a flurry of activity in the recent past in the development of real-time protocols. These protocols are called real-time because they are used when there are tight constraints on the QoS that must be delivered in the network (e.g., the total transit delay or interpacket arrival time must be bounded). The following primary protocols have been developed to support real-time delivery of information:
• Real-time Transport Protocol (RTP). A real-time end-to-end protocol utilizing existing transport layers for data that has real-time properties. (RFC 1889, Jan. 1996.)
• RTP Control Protocol (RTCP). A protocol to monitor the QoS and to convey information about the participants in an ongoing session. It provides feedback on the quality of the information transmitted, so that modifications can be made, as well as feedback on overall performance. (RFC 1889, Jan. 1996.)
• Real-Time Streaming Protocol (RTSP). A protocol designed for controlling the transmission of audio and video over the Internet (this protocol is not discussed further here—see Reference [20]).
RTP provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video. Those services include payload type identification, sequence numbering, timestamping, and delivery monitoring. Applications typically run RTP on top of UDP to make use of its multiplexing and
checksum services; both protocols contribute parts of the transport protocol functionality. However, RTP may be used with other suitable underlying network or transport protocols. RTP supports data transfer to multiple destinations using multicast distribution, if provided by the underlying network [21]. RTP provides transport of data with an inherent notion of time, and it also provides a means for transmitting real-time data because it, unlike the legacy transport layer protocols, has been optimized for such tasks. RTP has been developed with flexibility and scalability in mind and is now being used as the core protocol for real-time transport on both pure IP networks and MPLS systems. Figure 2.7 depicts the packet format for RTP. It entails a 12-octet header. An RTP-encoded payload will have a 40-octet header (12 for RTP, 8 for UDP, and 20 for IP). RFC 2508 (called CRTP) compresses the header information to 2 to 4 octets on a hop-by-hop basis. In addition, extensions are being developed (e.g., draft-ietf-avt-crtp-enhance-02.txt and draft-ietf-avt-tcrtp-04.txt). RTP by itself does not address resource reservation and does not guarantee QoS for real-time services. Specifically, RTP does not provide any mechanism to ensure timely delivery or provide other QoS guarantees, but relies on lower-layer services to do so. (The functions of QoS guarantees and delivery are the responsibility of RSVP and network support of QoS-based services.) It does not guarantee delivery or prevent out-of-order delivery, nor does it assume that the underlying network is reliable and delivers packets in sequence. The sequence numbers included in RTP allow the receiver to reconstruct the sender's packet sequence, but sequence numbers might also be used to determine the proper location of a packet—for example, in video decoding—without necessarily decoding packets in sequence [21].
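The 40-octet overhead figure above is significant for voice: with a hypothetical 20-octet voice payload (e.g., two 10-ms G.729 frames), the uncompressed IP/UDP/RTP header makes up two-thirds of the packet, while CRTP's 2-to-4-octet compressed header largely removes that penalty. A quick calculation (the payload size is an illustrative assumption):

```python
def header_overhead(payload_octets: int, header_octets: int) -> float:
    """Header overhead as a fraction of the total packet size."""
    return header_octets / (payload_octets + header_octets)

payload = 20                          # assumed voice payload, octets
full = header_overhead(payload, 40)   # IP (20) + UDP (8) + RTP (12)
crtp = header_overhead(payload, 4)    # CRTP worst case of 2 to 4 octets
```

With these numbers the uncompressed case spends about 67 percent of the bandwidth on headers, versus roughly 17 percent with CRTP, which is why hop-by-hop header compression is attractive on low-speed voice links.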
The data transport is augmented by a control protocol (RTCP) to allow monitoring of the information delivery in a manner scalable to large multicast networks and to provide minimal control and identification functionality. Like RTP, RTCP is designed to be independent of the underlying transport and network layers [21]. While RTP is primarily designed to satisfy the needs of multiparticipant multimedia conferences, it is not limited to that particular application. Storage of continuous data, interactive distributed simulation, and control and measurement applications may also find RTP applicable. RTP is intended to be malleable to provide information required by a particular application and will often be integrated into the application processing rather than being implemented as a separate layer. RTP is a protocol framework that is deliberately not complete [21]. RTP's primary role is to act as a simple, improved, scalable interface between real-time applications and existing transport layer protocols; RTP does not dictate which transport layer protocol is used. (The protocol is independent of the underlying transport and network layer, although, as noted, UDP is typically utilized.) RTP provides functions that allow transport protocols to work in a real-time environment and provides functionality just above the transport layer. The underlying network is assumed to be any IP network; this implies that in all likelihood,
Figure 2.7 RTP header. The fixed 12-octet header comprises a 2-bit version (V), padding (P) and header extension (X) flags, a 4-bit CSRC count, a marker bit (M), a 7-bit payload type, a 16-bit sequence number, a 32-bit timestamp, and the 32-bit synchronization source (SSRC) identifier; optional contributing source (CSRC) identifiers and a header extension may follow, then the payload (audio, video, etc.).

Fields:
• X indicates a header extension.
• P indicates padding is present; padding is used in encryption to pad to a natural crypto block boundary.
• M is the marker bit; it indicates the start of a talkspurt (audio) or the end of a frame (video).
• Payload type gives the format of the RTP payload and determines its interpretation by the application.
• Sequence number increments by one for each RTP packet and may be used by the receiver to detect packet loss or restore proper sequence.
• Timestamp is used by the receiver for synchronization and jitter calculations.
• SSRC identifies the source of the real-time stream; the receiver needs it to group packets with the same SSRC for playback.
• CSRC lists contributing sources; it is used by receivers downstream from a mixer.
but not with certainty, a packet will arrive at its destination. Due to the nature of packet switching (including FR and ATM), variable delay is to be expected. In addition, due to packet switching and routing, packets may arrive out of order. The protocol also contains definitions of which component should perform which specified function. The RTP component carries individual real-time data streams
with a source identifier and payload type, time, and sequencing information. The feedback component monitors application performance and conveys information about the session (i.e., information about participants). The following lists provide a snapshot of RTP and a glossary of key concepts and terms. In a nutshell, when packets arrive at the destination, the sequence number of each packet is examined to determine the correct sequencing of data and also to record the fraction of lost frames. The RTP packet's timestamp is used to determine the interpacket gap. The timestamp value is set by the source as it encodes the data and transmits the packet into the network. As packets arrive at the destination, the change in interpacket gap can be examined, and during playback this information can be used to regenerate the contents at the same rate as they were encoded. By utilizing buffering, the receiver can pace the playback independently of the jitter introduced by the packet network [20].

RTP Highlights
• Designed to provide end-to-end delivery services for temporally sensitive data, with support for both unicast and multicast delivery.
• Can be carried inside a UDP payload.
• Provides data source and payload type identification, used to determine payload contents.
• Provides packet sequencing, used to confirm correct ordering at the receiver.
• Provides timing and synchronization, used to set timing at the receiver during content playback.
• Provides monitoring, used to facilitate diagnosis or feedback to the sender on the quality of data transmission.
• Supports integration of heterogeneous traffic, merging multiple transmitting sources into a single flow.

RTP Glossary
contributing source (CSRC) A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer (see mixer).
The mixer inserts a list of the SSRC identifiers of the sources that contributed to the generation of the particular packet into the RTP header of that packet. This list is called the CSRC list. An example application is audioconferencing, where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker even though all the audio packets contain the same SSRC identifier (that of the mixer).
end system An application that generates the content to be sent in RTP packets and/or consumes the content of received RTP packets. An end system can act as one or more synchronization sources in a particular RTP session, but typically acts only as one.
mixer An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner, and then forwards a new RTP packet. Since the timing among multiple input sources will not generally be synchronized, the mixer will make timing adjustments and generate its own timing for the combined streams. Thus all data packets originating from a mixer will be identified as having the mixer as their synchronization source.
monitor An application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis, and long-term statistics. The monitor function is likely to be built into the applications participating in the session, but may also be a separate application that does not otherwise participate and does not send or receive the RTP data packets. These are called third-party monitors.
non-RTP means Protocols and mechanisms that may be needed in addition to RTP to provide a usable service. In particular, for multimedia conferences, a conference control application may distribute multicast addresses and keys for encryption, negotiate the encryption algorithm to be used, and define dynamic mappings between RTP payload type values and the payload formats they represent for formats that do not have a predefined payload type value. For simple applications, e-mail or a conference database may also be used. The specification of such protocols and mechanisms is outside the scope of this text.
RTCP packet A control packet consisting of a fixed header part similar to that of RTP data packets, followed by structured elements that vary depending upon the RTCP packet type. Typically, multiple RTCP packets are sent together as a compound RTCP packet in a single packet of the underlying protocol; this is enabled by the length field in the fixed header of each RTCP packet.
RTP packet A data packet consisting of the fixed RTP header, a possibly empty list of contributing sources (see contributing source), and the payload data. Some underlying protocols may require an encapsulation of the RTP packet to be defined. Typically, one packet of the underlying protocol contains a single RTP packet, but several RTP packets may be contained if permitted by the encapsulation method.
RTP payload The data transported by RTP in packets, for example, audio samples or compressed video data. The payload format and interpretation are beyond the scope of the RTP specification.
RTP session The association among a set of participants communicating with RTP. For each participant, the session is defined by a particular pair of destination transport addresses (one network address plus a port pair for RTP and RTCP). The destination transport address pair may be common for all participants, as in the case of IP multicast, or may be different for each, as in the case
of individual unicast network addresses plus a common port pair. In a multimedia session, each medium is carried in a separate RTP session with its own RTCP packets. The multiple RTP sessions are distinguished by different port number pairs and/or different multicast addresses.
synchronization source (SSRC) The source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to be dependent upon the network address. All packets from an SSRC form part of the same timing and sequence number space, so a receiver groups packets by SSRC for playback. Examples of SSRCs include the sender of a stream of packets derived from a signal source, such as a microphone, a camera, or an RTP mixer (see mixer). An SSRC may change its data format (e.g., audio encoding) over time. The SSRC identifier is a randomly chosen value meant to be globally unique within a particular RTP session; the binding of the SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session—for example, from separate video cameras—each must be identified by a different SSRC.
translator An intermediate system that forwards RTP packets with their SSRC identifiers intact. Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.
transport address The combination of a network address and port that identifies a transport-level endpoint; for example, an IP address and a UDP port. Packets are transmitted from a source transport address to a destination transport address.
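The fixed 12-octet RTP header shown in Figure 2.7 can be built and parsed in a few lines (a minimal sketch; the field values in the example are illustrative, and CSRCs and extensions are omitted):

```python
import struct

def build_rtp_header(pt, seq, ts, ssrc, marker=False, version=2):
    """Pack the fixed 12-octet RTP header (no CSRC list, no extension)."""
    byte0 = version << 6                 # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | pt      # M bit plus 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq, ts, ssrc)

def parse_rtp_header(hdr):
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", hdr[:12])
    return {"version": b0 >> 6, "cc": b0 & 0x0F,
            "marker": bool(b1 >> 7), "pt": b1 & 0x7F,
            "seq": seq, "ts": ts, "ssrc": ssrc}

# Example: payload type 0, sequence 17, timestamp 160, marker set.
hdr = build_rtp_header(pt=0, seq=17, ts=160, ssrc=0xDEADBEEF, marker=True)
fields = parse_rtp_header(hdr)
```

Note that the sequence number and timestamp advance independently: the sequence number counts packets, while the timestamp counts media-clock ticks, which is what lets a receiver distinguish loss from silence.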
2.6 RTP Control Protocol (RTCP)
The previous section highlighted that RTP is a simple protocol designed to carry real-time traffic and to provide a few additional services that are not present in existing transport protocols like UDP: With RTP, receivers can utilize the timestamp along with sequence numbers to better synchronize sessions and improve playback. As a companion to RTP, the IETF has designed the RTP Control Protocol (RTCP), which is used to communicate between the sources and destinations. RTCP is not used to establish QoS parameters with the ATM switch; instead, it is oriented toward state information. RTCP is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets, for example, by using separate port numbers with UDP. RTCP performs four functions [21]:
1. The primary function is to provide feedback on the quality of the data distribution. This is an integral part of the RTP's role as a transport and is related to the flow and congestion control functions of other transport protocols. The feedback may be directly useful for control of adaptive encodings, but experiments with IP multicasting have shown that it is also critical to get feedback from the receivers to diagnose faults in the distribution. Sending reception feedback reports to all participants allows one who is observing problems to evaluate whether those problems are local or global. With a distribution mechanism like IP multicast, it is also possible for an entity such as a network service provider that is not otherwise involved in the session to receive the feedback information and act as a third-party monitor to diagnose network problems.
2. RTCP carries a persistent transport-level identifier for an RTP source called the canonical name (CNAME). Since the SSRC identifier may change if a conflict is discovered or a program is restarted, receivers require the CNAME to keep track of each participant. Receivers also require the CNAME to associate multiple data streams from a given participant in a set of related RTP sessions, for example, to synchronize audio and video.
3. The first two functions require that all participants send RTCP packets; therefore, the rate must be controlled in order for RTCP to scale up to a larger number of participants.
4. A fourth, optional function is to convey minimal session control information, for example, participant identification to be displayed in the user interface. This is most likely to be useful in loosely controlled sessions where participants enter and leave without membership control or parameter negotiation. RTCP serves as a convenient channel to reach all the participants, but it is not necessarily expected to support all the control communication requirements of an application.
A higher-level session control protocol may be needed. The following list depicts the kinds of reports that are generated by RTCP:
• Sender Report (SR), for transmission and reception statistics from participants who are active senders
• Receiver Report (RR), for reception statistics from participants who are not active senders
• Source Description (SDES) items, including CNAME
• BYE, for indicating end of participation
• APP, for application-specific functions
Figure 2.8 provides an example of an RTCP report—specifically, the sender report.
Figure 2.8 RTCP report: Sender Report.

[The figure shows the Sender Report as a sequence of 32-bit words.]
Header: V(2), P, RC, PT=SR=200, Length, SSRC of sender (Synchronization Source)
Sender Info: NTP Timestamp (msw), NTP Timestamp (lsw), RTP Timestamp, Sender's Packet Count, Sender's Octet Count
Receiver Report: SSRC_1 (SSRC of first source), Fraction Lost, Cumulative Number of Packets Lost, Extended Highest Sequence Number Received, Inter-arrival Jitter, Last SR, Delay Since Last SR (DLSR)

• SSRCs - Indicate the SSRC of both the sender and the actual source
• NTP (Network Time Protocol) and RTP Timestamps - Correlate a global clock with the sender's media clock
• Packet and Octet Counts - Enable receivers to know actual transmit numbers for this source
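The interarrival jitter carried in the receiver report block is a running estimate defined in RFC 1889 [21]: for consecutive packets, the difference in transit time is folded into the estimate with gain 1/16. A minimal sketch (function and variable names are ours, not from the RFC; times are in media-clock units):

```python
def update_jitter(jitter, prev_transit, transit):
    """One step of the RFC 1889 interarrival-jitter estimator.

    transit = arrival_time - rtp_timestamp for the current packet;
    prev_transit is the same quantity for the previous packet.
    """
    d = abs(transit - prev_transit)
    # The estimator converges on the mean deviation with gain 1/16.
    return jitter + (d - jitter) / 16.0

def jitter_for_stream(arrivals, rtp_timestamps):
    """Fold update_jitter over a packet stream; returns the final estimate."""
    jitter = 0.0
    transits = [a - t for a, t in zip(arrivals, rtp_timestamps)]
    for prev, cur in zip(transits, transits[1:]):
        jitter = update_jitter(jitter, prev, cur)
    return jitter
```

A perfectly periodic stream yields zero jitter; any variation in the arrival pattern relative to the RTP timestamps raises the estimate.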
2.7 Stream Control Transmission Protocol (SCTP)

The Stream Control Transmission Protocol (SCTP) is a new transport layer protocol developed by the IETF (RFC 2960) to overcome limitations imposed by TCP with respect to transport of signaling messages in VOIP networks. It may find other applications in the future. The SCTP provides all the TCP features, but it also provides (1) multistreaming support, allowing independent transport and delivery of multiple streams between hosts, and (2) multihoming, allowing more than one path between hosts. Also, the SCTP allows the reliable transfer of signaling messages between signaling endpoints in an IP network. Figure 2.9 shows the packet format. Carrier-related concerns have sponsored work in this area.
An Overview of IP, IPOATM, MPLS, and RTP
Figure 2.9 Stream Control Transmission Protocol.

[The figure shows a (signaling) application running over SCTP over IP at each endpoint, communicating across an IP network, together with the SCTP packet format in 32-bit words.]
SCTP Common Header: Source Port, Destination Port, Verification Tag, Checksum
Chunk 1: Type, Flags, Length, User Data
. . .
Chunk N: Type, Flags, Length, User Data

Chunks may hold different types of user and control data.
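The common-header-plus-chunks layout can be walked mechanically. The following sketch (ours, for illustration only) parses the 12-byte common header and the chunk list; field widths follow RFC 2960, the checksum is not verified, and the sample packet is hand-built:

```python
import struct

def parse_sctp(packet: bytes):
    """Split an SCTP packet into its common header and chunk list."""
    # Common header: source port (16), destination port (16),
    # verification tag (32), checksum (32) -- 12 bytes in all.
    src, dst, vtag, checksum = struct.unpack_from("!HHII", packet, 0)
    chunks = []
    offset = 12
    while offset + 4 <= len(packet):
        # Every chunk starts with type (8), flags (8), length (16).
        ctype, flags, length = struct.unpack_from("!BBH", packet, offset)
        chunks.append((ctype, flags, packet[offset + 4 : offset + length]))
        # Chunk lengths are padded to a 4-byte boundary on the wire.
        offset += (length + 3) & ~3
    return {"src_port": src, "dst_port": dst, "vtag": vtag,
            "checksum": checksum, "chunks": chunks}

# A hand-built packet with one DATA-style chunk carrying b"hello"
# (chunk length 9 = 4-byte chunk header + 5 payload bytes, padded to 12).
pkt = struct.pack("!HHII", 2905, 2905, 0xDEADBEEF, 0)
pkt += struct.pack("!BBH", 0, 3, 9) + b"hello" + b"\x00" * 3
info = parse_sctp(pkt)
```

The 4-byte padding rule is what lets a receiver step from one chunk to the next without knowing each chunk type in advance.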
The SCTP is designed to transport PSTN signaling messages over IP networks, although it is capable of broader applications. It is a reliable transport protocol that operates over a connectionless packet network, such as IP, and offers the following services to its users [22]: • Acknowledged error-free nonduplicated transfer of user data • Data fragmentation to conform to the discovered path’s maximum transmission unit (MTU) size • Sequenced delivery of user messages within multiple streams, with an option for order-of-arrival delivery of individual user messages • Optional bundling of multiple user messages into a single SCTP packet • Network-level fault tolerance through the support of multihoming at one or both ends of an association IP-signaling traffic usually comprises many independent message sequences between many signaling endpoints. The SCTP allows signaling messages to be
independently ordered within multiple streams (i.e., unidirectional logical channels established from one SCTP endpoint to another) to ensure in-sequence delivery between associated endpoints. The transfer of independent message sequences in separate SCTP streams makes the retransmission of a lost message less likely to affect the timely delivery of other messages in unrelated sequences. To establish an association between SCTP endpoints, one endpoint provides the other endpoint with a list of its transport addresses (i.e., multiple IP addresses in combination with an SCTP port). These transport addresses identify the addresses that will send and receive SCTP packets. The SCTP is discussed in detail in Chapter 7.
2.8 ATM QoS Mechanisms
The focus of this book is on IP-based QoS. This topic is covered at length in Chapter 8. What follows is a brief review of ATM QoS capabilities. ATM's claim to fame is its support of QoS. This is done via expansive use of resource sharing techniques, so that communications resources (specifically, broadband communication channels) are available on a per-VC basis, without having to allocate the maximum number of resources, which would grow linearly with the number of VCs or ports. ATM is a statistical multiplexing technology par excellence; yet the statistical multiplexing is done in such an intelligent way that QoS is guaranteed to the user. Statistical multiplexing allows higher utilization of resources based both on allocating unused bandwidth to those that need it and on the intrinsic higher efficiency of pooled traffic.11 Furthermore, the judicious use of overbooking also increases efficiency. The good news is that not only have standards been developed (e.g., UNI 3.1 and UNI/TM 4.0), but switches have been brought to the market by many vendors that support these standards, as described in Reference [3]. In general, support of QoS implies buffer management; in addition to algorithmic resources, this implies the presence of relatively large buffers. Besides the photonics (specifically, long-reach lasers), the bulk of the cost in an ATM switch is in memory. Traffic management allows ATM networks to have well-behaved operations in terms of predictable performance matching the expected (negotiated) level, thereby minimizing congestion and maximizing efficiency. ATM's QoS support is useful not only in pure ATM networks, but also in IP/RSVP-based networks that rely on ATM for Layer 2 transport. Today, applications are not QoS-aware, but new voice, video, multimedia, and CTI applications may be developed with QoS in mind.
In spite of the benefits, QoS and the switches' approach to supporting it (e.g., dropping cells) have to be clearly understood, because some protocols, such as TCP, may not operate well in a scenario of high cell loss; the cell loss naturally
depends on the class of service that is selected (with CBR having the lowest and UBR having the highest). QoS is achieved by managing the traffic intelligently. There are controls for the rate at which the traffic enters the network, at the VC level. The parameters used by ATM (specifically from the ATM layer pacing mechanism) to do traffic management are obtained at SVC or PVC setup time [24]. In ATM, the host signals its requirements to the network via the signaling mechanism. Each ATM switch in the path uses the traffic parameters to determine, via the Call Admission Control (CAC) mechanism, if sufficient resources are available to set up the connection at the requested QoS level. In private networks the Private Network Node Interface (P-NNI)12 protocol is responsible for determining if the required resources are available across the network (end-to-end). The CAC is used in each individual switch to determine if locally controlled resources are available, consistent with the request of the SETUP message. If the switch does have resources to support the call request, it then routes the message to the next switch along a possible best path to the destination.13 To convey QoS requests, there has to be a capability for the end system to signal to the connecting ATM switch its requirements. In turn, this switch must propagate that request across the network. The former is done via User-Network Interface (UNI) signaling (e.g., ATMF UNI 4.0); the latter is done via NNI signaling (e.g., PNNI 1.0) [20, 25]. The signaling mechanism whereby the various QoS parameters are coded into the SETUP message supplements the QoS Class procedures defined in ATMF UNI 3.1 and ITU-T I.356, briefly discussed in the following. It is worth noting that many switches and carriers actually support UNI 3.1 (rather than UNI 4.0) at this writing.
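As an illustration only (this is not a mechanism taken from the specifications), a switch-local CAC decision for peak-rate-allocated CBR traffic can be reduced to a sum test against link capacity; the capacity figure and the overbooking knob below are our assumptions:

```python
# Illustrative per-switch Call Admission Control: admit a new CBR
# connection only if the sum of already-admitted peak cell rates, plus
# the new request, stays within the link cell rate scaled by an
# (administrator-set) overbooking factor.

def cac_admit(admitted_pcr, requested_pcr, link_cell_rate, overbook=1.0):
    """Return True if this switch can locally carry the new connection."""
    return sum(admitted_pcr) + requested_pcr <= link_cell_rate * overbook

# An OC-3 payload carries roughly 353,208 cells/s (assumed figure).
admitted = [100_000, 120_000]          # PCRs of connections already up
ok = cac_admit(admitted, 50_000, 353_208)
```

A real switch would apply such a test per QoS class and, for VBR, use an equivalent-bandwidth estimate rather than the raw PCR sum; the structure of the decision is the same.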
In practice, however, many networks will continue to offer discrete, class-based values for services, although the "vocabulary" is now available for the user to communicate the QoS values to several digits of precision. The specification indicates that "implementations capable of stating QoS in terms of individual numeric parameter values may do so using the procedures defined in UNI Signaling 4.0 and PNNI 1.0; implementations must at a minimum support QoS indication via QoS classes." An important requirement of QoS is to exactly define measurements, cell events, outcomes, and so forth, and to have a reference model. For example, a lost cell outcome is defined as the situation when no cell is received corresponding to the transmitted cell within a specified time Tmax. Another important point is that quantitative values for performance objectives are not defined in the specifications; rather, the document specifies means to measure or estimate the values of defined performance metrics.14 It is understood why no numbers were specified: No one wanted to commit to some specific goal. This is not the case with other transmission standards. For example, ANSI standards define exact jitter and bit error rate (BER) values for DS1 lines, DS3 lines, and so forth. The consequence of this is that the VBR service from one carrier may be different from the service obtained from another carrier, even though the name of the service is the same. Somewhat
mitigating this possibility for inconsistency is that carriers may all use a few kinds of switches. Hence, to a degree there may be some derivative commonality. It is to be understood that the measurement of the network performance on a VC is likely to be different from the negotiated objective at any given time. This is because (1) the negotiated objective is the worst case of network performance that the network will allow, including peak intervals (hopefully, the QoS measures will exceed these numbers in many cases); and (2) transient events may cause the measured performance to be worse than the negotiated objective (if and when a measurement is taken over a small time base). QoS commitments are probabilistic in nature; therefore, both users and carriers have to realize that statements like “guaranteed QoS” are actually incorrect. The stated QoS is only an approximation of the performance that the network plans to offer over the duration of the connection. Specifically, since there is no limit to the length of the connection and the network makes resource decisions based only on information available at the time the connection is established, the actual QoS may well vary over the course of time. Transient events such as intermittent physical trunk failure, higher transient bit error rate (e.g., for circuits over microwave links), and even bursts of traffic from other sources when the UPC parameters (including switch-specific “fudge knobs”) are not properly set by the switch administrator, can all impact QoS. Thus, the ATMF TM 4.0 document indicates that “QoS commitments can only be evaluated over a long period of time and over multiple connections with similar QoS commitments.” Although this implies that in the long term the QoS is met, it could also mean temporary problems with real-time traffic such as voice, particularly if CBR services are not used.
Quality of Service Parameters

The ATMF TM 4.0 supports the following six QoS parameters:
• Peak-to-peak cell delay variation (ptpCDV)
• Maximum cell transfer delay (MaxCTD)
• Cell loss ratio (CLR)
• Cell error ratio (CER)
• Severely errored cell block ratio (SECBR)
• Cell misinsertion rate (CMR)
The first three can be negotiated as part of the call setup, while the last three are more network-intrinsic. Negotiation may entail specifying one or more of the parameters in question; also, the QoS could be set up differently for the two directions (or multiple legs) of a VC. By definition of call setup, QoS can be established on a per-call per-VC basis. In the network, QoS support is achieved by appropriate
dynamic routing of the connection or by implementation-specific mechanisms. What may well fit in this last category is the current tendency of carriers to overengineer the network to make sure that QoS can be achieved and sustained. It should be noted, however, that carriers may provide a small set of discrete choices for the negotiable parameters, rather than accept a continuum of request values. For example, there may be a low (10^-9), medium (10^-8), and high (10^-7) CLR to choose from, and so forth. Maximum cell transfer delay and peak-to-peak cell delay variation (both of which are negotiable) have to be defined very exactly, also using the reference model (as done in ATMF TM 4.0 [26]). Table 2.6 depicts some of the dependability measures that can be defined by formula. A service agreement for ATM services involves a traffic contract.15 The user's traffic is described via traffic parameters, specifically the following:
• Peak Cell Rate (PCR)
• Sustainable Cell Rate (SCR)
• Maximum Burst Size (MBS)
• Minimum Cell Rate (MCR)
QoS Classes

As noted, carriers may support only a QoS-class service level (i.e., select from a menu), rather than a continuum of values for the parameters in question. Also, various parameters may be bundled, to arrive at classes, as follows:
• Gold package: ptpCDV < 250 ms, MaxCTD = 30 ms, CLR = 10 in 1 billion
• Silver package: ptpCDV < 350 ms, MaxCTD = 40 ms, CLR = 10 in 0.1 billion
• Bronze package: ptpCDV < 450 ms, MaxCTD = 40 ms, CLR = 10 in 0.01 billion
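Selecting from such a menu amounts to finding the least stringent package whose bounds satisfy the request. A hypothetical sketch using the example bounds above (the package table and function names are ours, not from any specification):

```python
# Packages ordered least to most stringent, so the first match is the
# "cheapest" class that still meets the request.
PACKAGES = [
    ("Bronze", 450, 40),   # (name, ptpCDV bound in ms, MaxCTD bound in ms)
    ("Silver", 350, 40),
    ("Gold",   250, 30),
]

def pick_package(cdv_req_ms, ctd_req_ms):
    """Return the least stringent package meeting the request, else None."""
    for name, cdv_bound, ctd_bound in PACKAGES:
        # A package satisfies the request if its bounds are at least as
        # tight as what the user asked for.
        if cdv_bound <= cdv_req_ms and ctd_bound <= ctd_req_ms:
            return name
    return None
```

A request tighter than the Gold bounds falls off the menu entirely, which mirrors the carrier behavior described in the text: you get a class, not an arbitrary point in the parameter space.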
Table 2.6 Dependability QoS Metrics

CLR = Lost cells / Total transmitted cells
CER* = Errored cells / (Successfully transferred cells + Errored cells)
SECBR = Severely errored cell blocks / Total transmitted cell blocks
CMR = Misinserted cells / Time interval

*CLR is negotiable in UNI 4.0; the other metrics are not negotiable.
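The Table 2.6 ratios transcribe directly into code (a sketch; the counter names are ours):

```python
def clr(lost_cells, total_transmitted_cells):
    """Cell loss ratio."""
    return lost_cells / total_transmitted_cells

def cer(errored_cells, successfully_transferred_cells):
    """Cell error ratio."""
    return errored_cells / (successfully_transferred_cells + errored_cells)

def secbr(severely_errored_cell_blocks, total_transmitted_cell_blocks):
    """Severely errored cell block ratio."""
    return severely_errored_cell_blocks / total_transmitted_cell_blocks

def cmr(misinserted_cells, time_interval_s):
    """Cell misinsertion rate; unlike the others, a rate per unit time."""
    return misinserted_cells / time_interval_s
```

Note the asymmetry: CLR, CER, and SECBR are dimensionless ratios, while CMR is normalized by time, since misinserted cells do not belong to the measured connection's transmitted stream.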
According to the standards (e.g., ITU-T Recommendations I.150 and Q.2931), a user of an ATM connection (a VCC or a VPC) is provided with one of a number of QoS classes supported by the network. It should be noted that a VPC may carry VC links of various QoS classes; here the QoS of the VPC must meet the most demanding QoS of the VC links carried. The QoS class associated with a given ATM connection is indicated to the network at the time of connection establishment and will not change for the duration of that ATM connection. QoS class (the reader may think of these as packaged menus) can have specified performance parameters (called Specified QoS class) or no specified performance parameters (called Unspecified QoS class). A Specified QoS class specifies a set of performance parameters and the objective values for each performance parameter identified. Examples of performance parameters that could be in a QoS class are all or a subset of the following: cell transfer delay, cell delay variation, and cell loss ratio. Within a Specified QoS class, at most two cell loss ratio parameters may be specified. If a Specified QoS class does contain two cell loss ratio parameters, then one parameter is for all CLP = 0 cells and the other parameter is for all CLP = 1 cells of the ATM connection. As presently foreseen, other performance parameters besides the cell loss ratio would apply to the aggregate cell flow of the ATM connection. A QoS class could contain, for example, the following performance parameters: maximum cell transfer delay, a cell delay variation, and a cell loss ratio on CLP = 0 cells. The performance provided by the network should meet (or exceed) performance parameter objectives of the QoS class requested by the ATM endpoint. A Specified QoS class provides a quality of service to an ATM connection in terms of a subset of the ATM performance parameters discussed in the preceding. 
For each Specified QoS class, there is one specified objective value for each performance parameter. Initially, each network should define objective values for a subset of the ATM performance parameters for at least one of the following Service Classes from ITU-T Recommendation I.362 in a reference configuration that may depend on propagation delay and other factors16:
• Service Class A: Circuit Emulation and Constant Bit Rate Video
• Service Class B: Variable Bit Rate Audio and Video
• Service Class C: Connection-Oriented Data Transfer
• Service Class D: Connectionless Data Transfer
The following Specified QoS Classes are currently defined: Specified QoS Class 1. Supports a QoS that will meet Service Class A performance requirements. Specified QoS Class 1 should yield performance comparable to current digital private line performance.
Specified QoS Class 2. Supports a QoS that will meet Service Class B performance requirements. Specified QoS Class 2 is intended for packetized video and audio in teleconferencing and multimedia applications. Specified QoS Class 3. Supports a QoS that will meet Service Class C performance requirements. Specified QoS Class 3 is intended for interoperation of connection-oriented protocols, such as FR. Specified QoS Class 4. Supports a QoS that will meet Service Class D performance requirements. Specified QoS Class 4 is intended for interoperation of connectionless protocols, such as IP or SMDS. In the Unspecified QoS class, no objective is specified for the performance parameters. However, the network may determine a set of internal objectives for the performance parameters. In fact, these internal performance parameter objectives need not be constant during the duration of a connection. Thus, for the Unspecified QoS class, there is no explicitly specified QoS commitment on either the CLP = 0 or the CLP = 1 cell flow. Services using the Unspecified QoS class may have explicitly specified traffic parameters. An example application of the Unspecified QoS class is the support of a best-effort service (i.e., UBR). For this type of service, the user selects the Best-Effort Capability, the Unspecified QoS class, and only the traffic parameter for the PCR on CLP = 0 + 1. This capability can be used to support users that are capable of regulating the traffic flow into the network and to adapt to time-variable available resources.
References
1. D. Minoli and E. Minoli. Web Commerce Handbook. New York: McGraw-Hill, 1998.
2. D. Minoli and A. Schmidt. Switched Network Services. New York: Wiley, 1998.
3. D. Minoli and J. Amoss. IP over ATM. New York: McGraw-Hill, 1998.
4. D. Minoli and A. Alles. LAN, ATM, and LAN Emulation. Norwood, MA: Artech House, 1997.
5. D. Minoli. Internet and Intranet Engineering. New York: McGraw-Hill, 1997.
6. D. Minoli. Telecommunication Technology Handbook. Norwood, MA: Artech House, 1991.
7. D. Minoli. Designing Broadband Networks. Norwood, MA: Artech House, 1993.
8. R. Perlman. Interconnections: The Theory of Bridges and Routers. Reading, MA: Addison-Wesley, 1991.
9. Cisco Systems. Internet Protocol Version 6. www.cisco.com/warp/public/732/ipv6/ipv6_wp.html.
10. S. A. Thomas. IPng and the TCP/IP Protocols. New York: Wiley, 1996.
11. ARG. Hands-on Internetworking with TCP/IP. Morristown, NJ, Spring 1997.
12. IPng. playground.sun.com/ipng; ftp://ftp.parc.xerox.com/pub/ipng; and [email protected].
13. J. McQuillan. The NGN Executive Seminar. New York, March 20, 1997. Business Communications Review.
14. E. Rosen, A. Viswanathan, and R. Callon. "Multiprotocol Label Switching Architecture." RFC 3031 (January 2001).
15. R. Pulley and P. Christensen. A Comparison of MPLS Traffic Engineering Initiatives. NetPlane Systems, Inc., White Paper. Westwood, MA. www.netplane.com.
16. Future Software Limited White Paper. MultiProtocol Label Switching. Chennai, India, 2001. www.futsoft.com.
17. S. Braden. "Resource ReSerVation Protocol (RSVP)—Version 1 Functional Specification." RFC 2205 (September 1997).
18. D. Awduche, L. Berger, et al. Extensions to RSVP for LSP Tunnels. Internet Draft draft-ietf-mpls-rsvp-lsp-tunnel-08.txt, www.ietf.org. February 2001.
19. C. Semeria. RSVP Signaling Extensions for MPLS Traffic Engineering. Juniper Networks, Inc., White Paper. Sunnyvale, CA, 2000. www.juniper.net.
20. A. Schmidt and D. Minoli. MPOA. Greenwich, CT: Prentice-Hall/Manning, 1998.
21. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. "RTP: A Transport Protocol for Real Time Applications." RFC 1889 (1996). www.globecom.net/(nocl,sv)/ietf/rfc/rfc1889.shtml.
22. R. Stewart, K. Morneault, et al. "Stream Control Transmission Protocol." RFC 2960 (October 2000).
23. N. Giroux. "ATM Traffic Management for Efficient Multi-Service Networks." Network & Service Management for ATM. ICM-sponsored conference, Chicago, IL, August 1997.
24. D. Minoli and G. Dobrowski. Signaling Principles for Frame Relay and Cell Relay Services. Norwood, MA: Artech House, 1994.
25. The ATM Forum. Traffic Management Specification 4.0, af-tm-0056.000. April 1996.
Notes
1. Dynamic routing mechanisms can coexist in networks that use static routing on certain routers.
2. More pedantically, periodic updates are not a property of distance vector routing; they are the mechanisms of choice over unreliable links. Broadcast is used to provide reliability through retransmission.
3. This section is reduced from a more extensive treatment in D. Minoli, 1st, 2nd, and Next-generation LANs (New York: McGraw-Hill, 1993).
4. Information from Cisco Materials, including Networker's 1997 CD-ROM.
5. Preserving the current IP routing and addressing architecture while increasing the total size of the IP address space is just one possible way of supporting growth of the Internet; however, it is not the only possible way. If the restriction on uniqueness of IP addresses within a private internet is relaxed, then the size of an internet (for example, the Internet) would no longer be bound by the size of the available IP address space. This would change the current architecture, but it would also allow the continued use of IPv4 without constraint by the size of the IPv4 address space. One technology that supports connectivity in the presence of nonunique addresses is Network Address Translation (NAT; RFC 1631). NAT technology allows each organization connected to the Internet to reuse the same block of addresses (for example, the addresses defined in RFC 1918), while requiring only a small number (relative to the total number of addresses used by the organization) of globally unique addresses for external connectivity. The case can be made that the use of NAT devices represents a significant departure from the current IP routing and addressing architecture. However, widespread deployment of mediating gateways indicates that the traditional IP-level connectivity may not be that crucial and that the connectivity provided by such gateways could be sufficient [9].
6. This section is based on A. Schmidt and D. Minoli, MPOA (Greenwich, CT: Prentice-Hall/Manning, 1997).
7. As was discussed earlier, however, this need not be an obligatory imperative in the future.
8. By reducing communication that would be required in LANE with the LECS, LES, and BUS, the time required for address resolution can be reduced.
9. The latter aspect of RTCP may be sufficient for loosely controlled sessions—that is, where there is no explicit membership control and setup—but it is not necessarily intended to support all of an application's control communication requirements. (This functionality may be fully or partially subsumed by a separate session-control protocol.) [21]
10. RTP is intended to be tailored through modifications and/or additions to the header as needed. Therefore, a complete specification of RTP for a particular application will require one or more companion documents, such as: (1) a profile specification document, which defines a set of payload type codes and their mapping to payload formats (e.g., media encodings); and (2) payload format specification documents, which define how a particular payload, such as an audio or video encoding, is to be carried in RTP.
11. For example, pooling the tellers at a bank and merging the queue into one queue is more efficient than having multiple servers and multiple queues behind them. Similarly, pooling the voice, video, and data traffic is intrinsically more efficient than having separate networks because of the teletraffic/queuing principles.
12. PNNI is used for global QoS support in a private ATM network. This is accomplished via hierarchical, link-state, source-based QoS-driven routing, where the information is propagated using the ATMF UNI signaling mechanism (rather than, for example, being based on the Broadband ISDN User Part of the Common Channel Signaling System No. 7).
13. The PNNI interswitch routing protocol is used to identify the shortest path between the current location and the destination; the switch then computes the probability of successfully completing the call (based on available resources to support the requested QoS) over all available paths. The path with the highest likelihood for completion is selected.
14. The only exception to this is the Circuit Emulation Service, where jitter, wander, and BER values are specified by the ATMF.
15. Formally, the negotiated characteristics of a connection. The traffic contract at the public UNI consists of a connection traffic descriptor and a set of QoS parameters for each direction of the connection. The connection traffic descriptor consists of parameters such as PCR, SCR, MBS, and MCR, along with the Cell Delay Variation Tolerance and the conformance definition used to unambiguously specify the conforming cells in the connection. Refer to Reference [26] for more discussion and definition.
16. As noted, though, the standards do not specify the numerical values of these parameters. This discussion only makes the point about the structure of the QoS request (i.e., that it ought to be based on a predefined menulike mechanism).
Chapter 3
Issues in Packet Voice Communication

3.1 Introduction
Packet-switched IP networks have established themselves as an attractive option in a wide variety of data communication environments. To the present, circuit-switched networks have been the principal mechanism for transmitting human speech on a real-time conversational basis. However, because of perceived economic and technical benefits, digital voice techniques and corresponding network architectures have received considerable attention in the recent past. These benefits range from noise and crosstalk immunity to data and voice compatibility, security, bandwidth conservation, integrated networking, new service synergies, and network management cost reductions, among others. To briefly illustrate the pervasiveness of digital techniques, consider the widely used interoffice digital transmission carriers and digital switching of the PSTN; the plethora of voice analysis/synthesis technology and standards now emerging; digital telephone sets and next-generation PBXs; IP phones; and Computer Telephony Integration (CTI). During the past decade or so, the interest in integration started with a move to support simple digitized voice (i.e., supporting PCM-encoded voice in a circuit-based network such as ISDN) and moved to packetized voice over packet/IP networks and the use of compressed voice. To fully exploit the benefits intrinsic in voice codification, these techniques must be coupled with a network concept that allows real-time sharing of resources and guaranteed availability. A number of configurations and related models have
been presented in the literature. The natural network construction for digital voice appears to be a packet-switched IP arrangement; packet voice, for example, has been considered in conjunction with packet radio networks and processing satellites via random packet-access schemes, in addition to the standard use for telephonic communication. The common goal of the models is to distill the cost-performance trade-offs obtained by integration. Some of the key parameters to be quantified for voice calls in this network environment are blocking probability, end-to-end packet delay, interpacket gap time mean and variance, end-to-end packet loss probability, header overhead, and throughput. Similar or related factors need to be studied for the data section of the network, if any. Concentration here is on delay as the performance parameter of interest. Network transmission adds corruptive effects—noise, delay, echo, jitter, and packet loss—to the speech signal. Subjective studies have been conducted to assess the impact of these effects. Many customers may be willing to tolerate degradation in some or all the listed criteria, if other benefits, notably transmission cost reduction, could be achieved. The sharing of transmission and switching facilities between data and voice, and the apparent gain in capacity obtained by interpolating several conversations on the basis of the on/off statistics of voice, appear to ensure such cost reduction. In view of this flexibility in customer performance requirements, the network designer may construct cost-effective architectures if analytical relations interconnecting some or all of the preceding criteria can be attained. Research is under way to develop such design techniques so that, with the cost-performance trade-offs as a tool, system optimizations can be performed with the characteristics of the terminals, the network topology, and the protocol all considered jointly.
Furthermore, assessments can be made of the relative merits of packet switching, circuit switching, and hybrid alternatives for handling speech traffic and, ultimately, for mixed data and speech traffic. This chapter covers some classical issues for the support of voice in general packet networks. The reader not interested in some level of queuing machinery may wish to skip Sections 3.4 and 3.5. The models presented in this chapter are not to be interpreted as the only applicable models; in fact, there is quite an extensive literature on this topic. The models presented are simply some of the work that the senior author has undertaken over the years [1–56].
Scope

Consideration here is restricted to the modeling, analysis, and design of packet-switching networks carrying only real-time speech packets; thus, the presence of network control traffic is ignored. Any switch or router design considerations are further ignored, and ideal switch or router behavior is assumed. Section 3.2 of this chapter contains a discussion of packetized speech traffic models, and Section 3.3 discusses performance criteria. These traffic models and performance criteria are significant because these are the environmental conditions
that provide reasons why packetized speech network design differs from the data case. In general, speech traffic has a more regular and predictable arrival pattern than data; so, intuitively, one would expect the network design to be able to capitalize on this by achieving higher facility utilization with speech than is possible with data. On the other hand, network performance criteria for speech will be more stringent and, in particular, will require a regularity and consistency that is not required for data transmission. A queuing model for a single link carrying packetized speech data is presented, and its solution is used to study the end-to-end performance of the system under consideration. This is, of course, the simplest possible network, but it has sufficient complexity to reveal many of the issues that will be present in more general links and networks. A fully connected network with only direct routing would consist of links of this type. Multilink hopping will generally be less viable with speech traffic because of tight delay constraints; however, these will be encountered in typical intranets and corporate enterprise networks. Also briefly outlined is how the steady-state distribution of the queuing delay experienced on the typical link of the packetized speech network can be obtained. Finally, the results and implications of this link model for a packetized speech network are addressed; some results on the optimal packet length are presented. As discussed in Chapter 4, there are two major techniques used in speech analysis, waveform coding and vocoding. Waveform coding attempts to reconstruct the facsimile of the input wave by sampling and digitizing its amplitude-time domain at regular time intervals. Vocoding methods generally attempt to model some key characteristic of the power-frequency domain (e.g., the sequence of local maxima, called formants) and resynthesize the speech at the remote end. Vocoding is a lossy compression method.
Summary of Results The major results reported in this chapter are as follows:

1. A unified treatment of speaker behavior models is presented.
2. An analysis of the relationship between a variety of protocols and performance criteria is presented.
3. Closed-form and computational schemes for obtaining the steady-state delay distribution are presented.
4. The delay distribution is shown to be approximately exponential; the single parameter characterizing the distribution is obtained as a function of system parameters.
5. The delay dependencies on packet size are obtained. A closed-form expression for optimal packet size is reported. It is indicated that a network serving low-bandwidth terminals (e.g., vocoders) requires very small packets (20 to 50 bytes), whereas a network for high-bandwidth terminals (e.g., PCM) operates best with somewhat longer packets (60 to 70 bytes).
Chapter Three
6. The effects of finite buffers are presented; excellent performance is shown to be sustained even at high utilization.
7. The transient behavior of the system is analyzed. Transient performance degradation is shown to be of limited duration.
3.2 Traffic Models Although the generation process is given only cursory attention in most of the published models, detailed scrutiny of such input is required for any further analysis.
Introduction The statistical analysis of speech patterns has attracted attention for the past half century. As a product of such study, several models for telephonic speech patterns have emerged. In general, to obtain a better fit to empirically measured data, the models must grow correspondingly in sophistication and complexity. The major realization of the investigation has been that a Markov process, with an appropriate number of states, describes speech mechanics well. This section describes, in an organized fashion, various models that can be used to study speech behavior in a statistical sense. Some of the models have been studied in the literature; others have been developed to fill in the trade-off gaps between model complexity and correspondence with the empirical data. A short discussion of the relative merits and weaknesses of each model is given. The accuracy of a model is its ability to predict the length of the ten speech events described in the following subsection. The most sophisticated models can accurately predict the distribution of all of these events. The less sophisticated models can predict the distribution of only a few events, particularly the talkspurt length and the pause length. However, these are precisely the events of most interest for the traffic models used in network analysis, and the simpler models yield more tractable analytical formulations. These models are applicable to waveform coding methods with silence suppression.
Speech Events The following ten events are relevant to speech patterns:

1. Talkspurt
2. Pause
3. Doubletalk
4. Mutual silence
5. Alternative silence
6. Pause in isolation
7. Solitary talkspurt
8. Interruption
9. Speech after interruption
10. Speech before interruption
Consider two speakers, A and B. The major events are defined as follows:
1. Talkspurt, Pause. The technique of obtaining on/off speech patterns is summarized as follows. A flip-flop is set each time speech (full-wave rectified and unfiltered) from speaker A crosses a threshold. This flip-flop is examined and cleared every 5 ms, with the output being a one if the threshold was crossed, zero otherwise. The resulting string of ones (spurts) and zeros (gaps) is examined for short spurts; all spurts less than 15 ms are erased. After this has been done, all gaps less than 200 ms are filled in to account for momentary interruptions, such as those due to stop consonants. The resulting on/off pattern consists, by the definition used here, of talkspurts and pauses. An identical procedure is used for speaker B. (Note that this definition of talkspurt is not universal; other investigators use alternate definitions.)
2. Doubletalk. A time when speech is present from both A and B.
3. Mutual silence. A time when silence is present from both A and B.
There is a slight divergence from the approach employed in the literature, in that a packetized talkspurt is considered: namely, instead of considering a Markov process, a Markov chain is used. Furthermore, the chain is assumed to be homogeneous (time invariant). The existence of a device or software algorithm that can test a speech packet for an energy threshold is presumed. The definition of talkspurt is as follows: A contiguous sequence of nonempty packets from a single talker constitutes a talkspurt. A packet is considered to be nonempty if it exceeds the energy threshold. Thus, any pause of duration less than one packet's timelength will most likely be swallowed up in the discrete packetization, and the talkspurt will be considered not to have been interrupted by a pause. Pauses of up to two packets in length could be swallowed up, depending on the time phasing of the pause (see Figure 3.1).
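The spurt/gap smoothing described in item 1 can be sketched in a few lines of Python. This is a minimal illustration, not the measurement apparatus itself: the 5-ms slot size and the 15-ms/200-ms limits come from the text, while the helper names (`runs`, `expand`, `smooth`) and the simplified treatment of gaps at the edges of the recording are assumptions of the sketch.

```python
def runs(bits):
    """Collapse a 0/1 slot sequence into [value, length] runs."""
    out = []
    for b in bits:
        if out and out[-1][0] == b:
            out[-1][1] += 1
        else:
            out.append([b, 1])
    return out

def expand(rs):
    """Inverse of runs(): turn [value, length] runs back into a slot sequence."""
    return [b for b, n in rs for _ in range(n)]

def smooth(bits, slot_ms=5, min_spurt_ms=15, max_gap_ms=200):
    # Erase spurts (runs of ones) shorter than min_spurt_ms.
    rs = [[0, n] if b == 1 and n * slot_ms < min_spurt_ms else [b, n]
          for b, n in runs(bits)]
    # Fill in gaps (runs of zeros) shorter than max_gap_ms; note that this
    # simplified sketch also fills short gaps at the edges of the sequence.
    rs = [[1, n] if b == 0 and n * slot_ms < max_gap_ms else [b, n]
          for b, n in runs(expand(rs))]
    return expand(rs)

# 50 ms talk, 50 ms gap, 50 ms talk: the 50-ms gap is filled in.
assert smooth([1]*10 + [0]*10 + [1]*10) == [1]*30
# An isolated 10-ms spurt is erased, leaving one long pause.
assert smooth([0]*50 + [1]*2 + [0]*50) == [0]*102
```

The two-pass order matters: erasing short spurts first can merge adjacent gaps into one long gap, which is then (correctly) not filled in.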
Speaker Models A time-slotted environment in which the speech takes place is assumed. That is, a clock divides the time axis into segments, during each of which a speaker generates
Figure 3.1 Packet stream for an actual speech wave (actual speech power versus time, with the corresponding packets P1 through P6).
an empty or nonempty packet, depending on whether the threshold energy was exceeded during that time period. These time segments will be called frames. Note that this specifically excludes voice-actuated synchronization. In the transmitted speech signal, it is assumed that empty packets are suppressed rather than transmitted. The usual simplifying assumption is made that only one member of the speaker-listener pair changes speech activity state during a frame.
Six-State Markov Chain Model (Brady Model) The Markov chain state-transition diagram is depicted in Figure 3.2a. A possible sequence of events is depicted in Figure 3.2b. Observe that a talkspurt for A is made up of any concatenation of states 1, 2, and 3. Although this six-state model is of interest because of its excellent predictive ability for the ten events, there exist simpler models that, although they cannot accurately describe certain events in the dynamics of a conversation, still yield accurate talkspurt and pause lengths.
Four-State Markov Chain Model By collapsing states 2 and 3 into a single state, and similarly collapsing states 4 and 5 into a single state, we obtain the four-state chain depicted in Figure 3.2c. Figure 3.2d shows one of the possible event sequences. Observe that now a talkspurt from A consists of the concatenation of states 1 and 2 only. This model predicts talkspurt length and pause length well, but does not provide an accurate fit to empirical data for doubletalk.
Issues in Packet Voice Communication
69
Three-State Markov Chain Model By ignoring the possibility of doubletalk altogether—namely, by eliminating state 2 in the previous model—a model is produced that can predict talkspurt length and silence length distribution but cannot represent doubletalk (see Figures 3.2e and 3.2f). Note that the talkspurt is characterized by a single state, and the silence period of a particular talker is characterized by two states. With this model, the talkspurt length is geometrically distributed, but the silent-period length is not. This is also true of the empirical data.
Two-State Markov Chain Model As a final simplification, an elementary two-state chain can be assumed. This can still model the talkspurt length fairly reasonably, but it does not model the silence length well. This chain is obtained by eliminating the possibility of mutual silence, namely, state 2 in the previous chain. Thus, speakers A and B alternate their turns, speaking with no pauses or response-time silences (see Figures 3.2g and 3.2h). The talkspurt length and the silence length are both geometrically distributed (only the former is realistic). In view of its simplicity, this model has been investigated further and applied to this link model analysis. The other models could also be used with a slight increase in computational effort. It is easily shown that in this case the probability that there are k packets input at the nth frame, given that m pairs of terminals are active, is
$$\binom{m}{k}\,\bigl[P_1^{(n)}\bigr]^k\,\bigl[P_2^{(n)}\bigr]^{m-k}$$

where

$$P_1^{(n)} = P_1^{(0)}p_{11}^{(n)} + P_2^{(0)}p_{21}^{(n)}, \qquad P_2^{(n)} = P_1^{(0)}p_{12}^{(n)} + P_2^{(0)}p_{22}^{(n)}$$

and

$$P_{AB}^{\,n} = \begin{bmatrix} p_{11}^{(n)} & p_{12}^{(n)} \\ p_{21}^{(n)} & p_{22}^{(n)} \end{bmatrix} = \begin{bmatrix} \dfrac{p}{r+p} + \dfrac{r(1-r-p)^n}{r+p} & \dfrac{r}{r+p} - \dfrac{r(1-r-p)^n}{r+p} \\[2ex] \dfrac{p}{r+p} - \dfrac{p(1-r-p)^n}{r+p} & \dfrac{r}{r+p} + \dfrac{p(1-r-p)^n}{r+p} \end{bmatrix}$$
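The closed form for the n-step transition matrix can be checked numerically against repeated matrix multiplication. The sketch below assumes the one-step matrix implied by the closed form (state 1 = talking with exit probability r, state 2 = silent with exit probability p); the parameter values are illustrative.

```python
def mat_mul(a, b):
    """2x2 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(m, n):
    """n-fold product of a 2x2 matrix with itself."""
    out = [[1.0, 0.0], [0.0, 1.0]]  # identity
    for _ in range(n):
        out = mat_mul(out, m)
    return out

def closed_form(r, p, n):
    """The closed-form n-step matrix given in the text."""
    lam = (1.0 - r - p) ** n
    s = r + p
    return [[p / s + r * lam / s, r / s - r * lam / s],
            [p / s - p * lam / s, r / s + p * lam / s]]

# Illustrative parameters; one-step matrix implied by the closed form.
r, p = 0.1, 0.3
P = [[1 - r, r], [p, 1 - p]]
for n in (1, 5, 20):
    direct, closed = mat_pow(P, n), closed_form(r, p, n)
    assert all(abs(direct[i][j] - closed[i][j]) < 1e-12
               for i in range(2) for j in range(2))
```

As n grows, both rows converge to the stationary probabilities (p/(r+p), r/(r+p)), which is the basis of the steady-state result below.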
Figure 3.2 Speech origination models (XYTS = X talks, Y silent): (a) six-state model for speech pattern; (b) representative conversational sequence of events—shaded areas indicate a nonempty packet issued by the speaker; (c) four-state Markov chain model; (d) example of four-state model event sequence; (e) three-state model with transition probabilities; (f) example of a three-state model event sequence; (g) two-state model; (h) example of a two-state model event sequence.
One can also obtain

$$\lim_{n\to\infty}\ \mathrm{prob}(k \text{ packets from } m \text{ speakers at frame } n) = \binom{m}{k}\left(\frac{p}{r+p}\right)^{k}\left(\frac{r}{r+p}\right)^{m-k}$$
Thus, it is observed that the steady-state arrival process is a binomial process. This is clearly due to the fact that each speaker’s behavior is an independent Bernoulli trial, with probability p/(r + p) of supplying a nonempty packet. This result can be used to easily compute the steady-state traffic statistics; it has, in fact, been used in the model in Section 3.4. The steady-state traffic can be used in studying long-term network behavior but the nth-step unconditional probabilities would be needed for any investigation of network transient behavior.
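The binomial steady-state arrival distribution is easy to evaluate directly. The sketch below uses illustrative parameter values (m = 10, r = 0.1, p = 0.3); the function name is an assumption of the sketch.

```python
from math import comb

def steady_state_pmf(m, r, p):
    """Binomial pmf of the number of nonempty packets per frame from m
    off-hook speakers, each supplying a packet with probability p/(r+p)."""
    q = p / (r + p)  # per-speaker steady-state probability of a nonempty packet
    return [comb(m, k) * q**k * (1 - q)**(m - k) for k in range(m + 1)]

pmf = steady_state_pmf(m=10, r=0.1, p=0.3)
assert abs(sum(pmf) - 1.0) < 1e-12            # a proper distribution
mean = sum(k * pk for k, pk in enumerate(pmf))
assert abs(mean - 10 * 0.75) < 1e-12          # mean = m * p/(r+p)
```

The mean m·p/(r+p) is the average per-frame packet load used in the steady-state traffic statistics of Section 3.4.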
Other Models For the purpose of studying network behavior, the most significant aspect of the traffic models for a speaker is the resulting distribution of the length of the stream of consecutive packets (i.e., the talkspurt length). In comparing the implications of the previous models with empirical data, it can be noticed that, although the empirical distribution of talkspurt length is close to the geometric distribution that the models predict, the most significant discrepancy is an underprediction of the frequency of short talkspurts. An intuitive conjecture is that a speaker operates in at least two modes: one where he or she is the controlling speaker, and the other where he or she is merely issuing short utterances to reinforce the speech he or she is receiving. Thus, any one of the previous models can be extended to include two distinct states for A talking, B silent. In Figure 3.2g the two-state chain can be augmented to a four-state chain by introducing two such new states, one for each speaker. One state has a high probability for speech to be continued into the next frame, and the other state has a much lower continued-speech probability. Thus, each active speaker can be in a long-burst or short-burst mode.
Call Origination Model Thus far, the behavior of a population of m terminals, where m is fixed, has been considered. This section addresses the issue of the call origination process. Let M be the maximum number of speakers that can access the packet switch under consideration; let m(n) be the number of off-hook terminals (i.e., active terminals) at frame n.
Two issues need to be addressed:

1. The statistical behavior of m(n)
2. The length (holding time) of a typical call

While specific answers can be given only with exact data from the community of users for which the network is intended, mathematical models that capture the flavor of the phenomenon (rather than the exact numbers) are easily constructed. The speech-origination process will be modeled by a more complex Markov chain that drives the embedded speaker models. Figure 3.3 illustrates the technique for the two-state model. Let the new chain be denoted by Yi(n). Note that the holding time (i.e., the time that Yi(n) is in a state greater than or equal to 1) is geometrically distributed in each case, because Yi(n) remains in state 0 with probability 1 − sα, where s is the number of states in the speech chain and α is the probability of a transition from an on-hook state to each of the off-hook states. This is consistent with the standard memoryless holding-time assumption. It is clear that, by appropriately selecting the parameter Ω, the conversation can be made statistically longer or shorter; also, the number of transitions to the off-hook state (namely, the number of calls the pair A-B is likely to make in a time interval t) can be controlled.
Figure 3.3 Call-origination model interfaced to a two-state speech model. The on-hook state 0 and the two off-hook speech states form a chain with transition matrix

$$\begin{bmatrix} 1 - 2\alpha_{AB} & \alpha_{AB} & \alpha_{AB} \\ \Omega & 1 - r - \Omega & r \\ \Omega & r & 1 - r - \Omega \end{bmatrix}, \qquad \alpha_{AB},\ \Omega \text{ very small}$$
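Per the Ω entries of the call-origination chain, the chain hangs up (returns to state 0) with probability Ω from either off-hook state in each frame, so the holding time is geometric with mean 1/Ω. A small Monte Carlo sketch, with an illustrative Ω and a seeded generator:

```python
import random

def sample_holding_time(omega, rng):
    """Frames spent off-hook before the chain returns to state 0."""
    frames = 0
    while True:
        frames += 1
        if rng.random() < omega:  # hang-up transition back to state 0
            return frames

rng = random.Random(42)     # seeded for reproducibility
omega = 0.01                # illustrative hang-up probability per frame
n = 20000
mean = sum(sample_holding_time(omega, rng) for _ in range(n)) / n
assert abs(mean - 1 / omega) < 10   # mean holding time ~ 100 frames
```

The tolerance is loose (several standard errors of the sample mean); the point is the geometric shape and its mean 1/Ω, not a precise estimate.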
3.3 Performance Criteria Well-documented subjective testing and measurements have established ranges of transmission network corruptive effects on speech where these effects are either: (1) not perceptible, (2) perceptible but tolerable, or (3) not tolerable. Estimates for the boundaries between these regions are available, although further investigation into combinations of effects is needed. The not-perceptible range is easy to deal with. The not-tolerable range is slightly deceptive, in that it will not usually result in a disruption of the conversation unless sustained for an unacceptable period of time. Fortunately, it is only in the not-tolerable range that speaker behavior leads to unstable degradation (shouting, heavy doubletalking, no information transfer, etc.). In the middle range, the speakers eventually begin to act to produce more efficient information transfer (fewer interruptions, etc.); thus, speaker models based on ideal network performances will tend to be conservative in estimating the degree of corruptive effects.
Results of Subjective Studies Packetized speech belongs to the category of real-time data traffic. Consistent with this classification, it has stringent delivery requirements with respect to loss or error. Generally speaking, the delivery requirements can be divided into two categories:

1. Owing to the psychological effects induced by delay, the end-to-end average network delivery time must be small.
2. Owing to the psychological effects induced by glitching (gaps due to delay fluctuation, noise, buffer overflow losses, and other protocol discarding and misaddressing), the end-to-end variation of the delivery time, including losses, must be small.

In other words, the human listener in a conversation has limited tolerance to both the average delay and the fluctuation of delay. It is thus apparent that the network designer must control not only the first moment of the delay (mean), but also the second moment (variance). This latter requirement is formally established in the “Smoothness Criteria” subsection.
End-to-End Delay The overall end-to-end delay can be written as

D(t) = V + h + d(t) + B

where V is the delay due to the speech analog-digital conversion, h is the delay due to the packetizing period (equivalent to the packet timelength), d(t) is the
network delay at time t, and B is the waiting time at the receiver end. The precise value of V depends on the terminal technology and is usually small compared with the other terms; it will be ignored in the sequel. Therefore, if D = h + d, D represents the total end-to-end delay before the application of any receive-end buffering. The overall delay D(t) should not exceed 200 ms, the value of delay that has been shown to be commercially acceptable. When the delay reaches 800 ms, adverse psychological factors impede normal telephonic conversation. A delay of 200 to 800 ms is conditionally acceptable for a short portion of the conversation, when the occurrences of such delays are rare and far apart. In other words, there is a well-established range of acceptable delay, and temporary degradation is admissible, as long as such degradation occurs with low probability and short duration. Particular applications usually require more stringent constraints. As noted in Chapters 4 and 5, delays in the 100 to 200 ms range are now the typical goal; during the 1980s researchers were willing to consider longer delays.
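For concreteness, the delay formula can be evaluated as a back-of-the-envelope budget. The values below are assumptions for illustration only: 64-kbps PCM carrying 160 bytes of speech per packet (so h = 20 ms), a hypothetical network delay d = 80 ms, receiver wait B = 40 ms, and V neglected as in the text.

```python
def packetizing_delay_ms(payload_bits, codec_bps):
    """h: time to fill one packet with speech, in milliseconds."""
    return 1000.0 * payload_bits / codec_bps

h = packetizing_delay_ms(payload_bits=160 * 8, codec_bps=64000)
D = 0.0 + h + 80.0 + 40.0   # D = V + h + d + B
assert h == 20.0
assert D == 140.0           # within the 100-200 ms target cited in the text
```

Note how the packetizing term alone consumes a fifth of a tight 100-ms budget, which is one reason the optimal-packet-length results later in the chapter matter.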
Glitching Studies have been conducted in which speech is temporarily segmented and interrupted at constant (deterministic) rates, and in which speech is manipulated according to some random process. The following results have been shown:
• In interrupted speech (equivalent to loss or discard of packets): Intelligibility decreases to very low values (10 percent) as the packet size approaches 0.25 s. Intelligibility increases to 80 percent as the packet size approaches 0.020 s.
• In segmented speech (equivalent to waiting for late packets): For fixed active-speech segment length, intelligibility increases as the silence period decreases. For fixed silence length, intelligibility decreases as the active-speech segment length decreases.
Remedies, such as short packets and interleaving, have been suggested. Owing to the high redundancy of the speech signal, speech losses as high as 50 percent can be sustained with marginal degradation if such loss occurs for very small (e.g., 20 ms) segments. This concept might be employed to control total network traffic in case of congestion. Thus, the acceptable packet loss rate is a function of packet size. Under certain speech encoding techniques, such as vocoding, the packets themselves may be composed of self-contained speech elements, called parcels, the selective discarding of which could be used as a traffic throttle. With PCM, for example, if an eight-bit quantization is used, each 8-bit character can be considered a parcel.
76
Chapter Three
Smoothness Criteria The reconstructed continuous speech delivered to a listener by a packet-switched IP network contains gaps due to the statistical fluctuation of the network load and the consequent fluctuation of the network delay and loss performance. The gap structure perceived by the listener will be not only a function of these fluctuations but also a function of the network policy or protocol at the receiver end for dealing with these gaps. This section shows the importance of obtaining the delay distribution, or at least the second moment of delay. For data traffic, the mean delay has usually been the only design criterion.
Waiting for Late Packets Consider an A talkspurt, that is, a segment of continuous active speech between a speaker-listener pair A-B, with a consecutive stream of nonempty packets issued by speaker A. Assume that at frame i (i = 1, 2, . . .) a packet issued by A experiences delay Di, where the Di are identically (but not independently) distributed for all i. Then, for a fixed packet timelength h, ai = ih is the time when A issues packet i, and bi = Di + ih is the time when B receives packet i. Note that, as defined here, h is identical to the packetizing delay in the previous delay formula. Let fi be the temporal fluctuation between packets i − 1 and i, as received by B. More precisely (see Figure 3.4),
fi = Di − Di−1 + min(0, fi−1),   i = 1, 2, . . .
with the initial condition D0 = 0, f0 = 0. Then fi represents the a posteriori lateness of the ith packet relative to the lateness of the (i − 1)st packet. Because 0 ≤ Di ≤ ∞, the range of fi is −∞ ≤ fi ≤ ∞. If fi ≤ 0
Figure 3.4 Unlimited waiting protocol, for packet delays D0, . . . , D5 = 0, 1, 2, 6, 1, 3:
f1 = 1 − 0 + 0 = 1, Z1 = 0, G1 = 1
f2 = 2 − 1 + 0 = 1, Z2 = 0, G2 = 1
f3 = 6 − 2 + 0 = 4, Z3 = 0, G3 = 4
f4 = 1 − 6 + 0 = −5, Z4 = G4 = 0
f5 = 3 − 1 − 5 = −3, Z5 = −3, G5 = 0
corresponding to the ith packet arriving no later than the earliest time at which an in-order delivery to B can be made, no temporal disruption results, provided that adequate buffering facilities exist at the receiver end. For this protocol the gap in the reconstructed speech is simply
$$G_i = \max(f_i, 0) = \begin{cases} f_i & \text{if } f_i > 0 \\ 0 & \text{if } f_i \le 0 \end{cases}$$
That is, gaps are introduced only when fi is positive. Clearly, the network should be designed such that the distribution of Gi is acceptable from a performance point of view. After long algebraic manipulation, one arrives at
$$\mathrm{prob}(G_i > K_\alpha) \le \frac{2\sigma_D^2(1-\rho_D)}{K_\alpha^2} < \frac{4\sigma_D^2}{K_\alpha^2}$$

where σD² is the variance of the end-to-end delay, and ρD is the correlation in deliveries. Thus, for any performance criterion where one wishes to control the tail of the gap distribution, given Kα, one can impose constraints on σD² so that the probability of gaps exceeding Kα can have arbitrarily small probability. Note that Kα itself can be made arbitrarily small, certainly sufficient in an engineering sense, but that it cannot be made strictly zero. Thus, one can ensure the constraint

$$\mathrm{prob}(G_i > K_\alpha) \le \alpha$$

by satisfying

$$\sigma_D^2 \le \frac{\alpha K_\alpha^2}{4}$$

Note that as α decreases, the amount of area σD² allowed in the tail must decrease. Also, as the cutoff point for the tail Kα decreases, the needed σD² decreases.
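The fluctuation recursion and gap definition can be checked directly against the worked example of Figure 3.4 (delays D0, . . . , D5 = 0, 1, 2, 6, 1, 3). A minimal sketch; the function name is an assumption.

```python
def gaps(delays):
    """Apply f_i = D_i - D_{i-1} + min(0, f_{i-1}) and G_i = max(f_i, 0)
    to a sequence of packet delays D_0, D_1, ..."""
    f_prev, fs, gs = 0.0, [], []
    for i in range(1, len(delays)):
        f = delays[i] - delays[i - 1] + min(0.0, f_prev)
        fs.append(f)
        gs.append(max(f, 0.0))  # a gap occurs only when the packet is late
        f_prev = f
    return fs, gs

fs, gs = gaps([0, 1, 2, 6, 1, 3])
assert fs == [1, 1, 4, -5, -3]  # matches f1..f5 in Figure 3.4
assert gs == [1, 1, 4, 0, 0]    # matches G1..G5 in Figure 3.4
```

Note how the large negative f4 (an early packet) absorbs part of the next interarrival fluctuation, so no gap occurs at packets 4 and 5 despite the delay swings.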
Limited Waiting for Late Packets Under the protocol in the preceding subsection it is shown that the gaps in the reconstructed speech are
Gi = max (fi, 0)
If the designer of the network could control the variance of the delay, then the tail of the gap distribution would also be controlled. The variance, however, cannot be completely controlled or, in general, reduced to any arbitrary value. Therefore, an alternative protocol must be sought to prevent long gaps. Waiting indefinitely for late packets at the receiver end (say, by buffering subsequent packets) implies not only a long gap in the reconstructed speech until such late packets arrive, but also temporal distortion of the consecutive spoken material from that point on. Also, such a protocol implies no recovery from lost or misaddressed packets. To avoid these complications, a protocol can declare missing, and subsequently ignore, a packet whose lateness exceeds a certain preset limit S. Such an action is called a discard. Assume that (J − 1)h ≤ S ≤ Jh, where J is a positive integer. By substituting a period of silence of length Jh whenever a packet does not arrive in time, the temporal distortion of the overall speech string can be bounded by a predetermined value. Note that J = 1, for example, implies a temporal distortion of at most S and a gap of at most h before the protocol is reapplied to the next packet. The worst case occurs when a packet arrives infinitesimally before the discard decision, in which case the packet is still delivered. Further improvement can be obtained if the silent period following a rejection is aborted on the arrival of the next packet, but this alternative has not been investigated here. It can be shown that under this protocol the gap structure can again be controlled by controlling σD, but σD need not be made as small as for the protocol in the preceding subsection.
Receiver-End Buffering The limited waiting period S can be regarded as a delay-variance reduction technique, wherein temporal gaps due to a single packet are prevented from exceeding S, paying for it with a glitch silence of length Jh. An additional protocol strategy can be employed to reduce the gap variance by buffering packets at the receiver end so that their total delay to the receiver is at least some minimum quantity. What is sacrificed with this technique is an increase in the average delay in return for the improved smoothness. This modification involves buffering at the receiver end those packets whose delay does not exceed a certain appropriately chosen value w. These packets would be stored for an amount of time equal to w − Di.
3.4 Link Model
So far, the input to the system and the requirements to be satisfied in delivering the input traffic to its destinations have been discussed. The system to accomplish such transfer is now described.
Introduction This section considers a distributed population of digital voice terminals and a network of packet switches (or routers) interconnected by a topology of (usually high-capacity) links. Terminals are homed to a specific packet switch or router. Communication between remote terminals takes place via the appropriate backbone packet switches or routers, which are used for entry, exit, and store-and-forward relay operations. The objective of this network model is to obtain end-to-end delay, the percentage of lost packets, the amount of glitching, and so forth. The discussion initially looks at a single link (channel) and draws appropriate conclusions. Later, the analysis is extended to a network of tandem links and then to a more general network.
Model Description The discussion begins by identifying the underlying assumptions and introducing some descriptive notation.
Traffic Given any pair of speakers A-B, consider any one of the traffic models of Section 3.2, interfaced to the call-origination model. Let Pi(n) be the vector of unconditional probabilities of speaker i being in state 0, 1, 2, 3, . . . at frame n; for example:

$$P_i^{(n)} = \bigl[P_{i,0}^{(n)}, P_{i,1}^{(n)}, \ldots, P_{i,s}^{(n)}\bigr]$$

This discussion is concerned only with some of the elements of this vector, namely Pi(n), the probability that speaker i supplies a packet at frame n, given that the pair A-B is off-hook. Then

$$P_i^{(n)} = \frac{\displaystyle\sum_{j\,:\,i \text{ talks in state } j} P_{i,j}^{(n)}}{\displaystyle\sum_{j=1}^{s} P_{i,j}^{(n)}}$$
Queue Operation

1. m terminals accessing the packet switch or router under consideration are off-hook and are engaged in infinitely long conversations.
2. Each terminal has two buffers for potential packets, so it can start building the next packet after completing one. Before the new one is completed, the old one must be cleared or it will be overwritten. This implies a constraint on the processor.
80
Chapter Three
3. A scheduled arrival of potential packets is assumed; namely, terminal Ti generates completed empty or nonempty packets at times rh + i∆, r = 0, 1, 2, . . . , where ∆ is a parameter, possibly dynamically adjustable, of the entry switch or router. Although such synchronization is impossible to arrange immediately following the off-hook condition, it can be accomplished in subsequent frames by appropriately clipping speech or adding silence. This needs to be done only once.
4. The appropriate buffer of Ti is processed at time rh + i∆. If occupied by a nonempty packet, the contents of the buffer are placed on a queue for transmission. The queue itself operates in a first in, first out (FIFO) mode.
5. It is assumed that τi, the speaker behavior, is independent of the queue backlog.

It is clear that m∆ ≤ h must be met to preserve the real-time requirement (avoid extra delay or packet loss) at the entry node caused by the processor falling further and further behind in its cyclical scan for packets requiring transmission. Summarizing the preceding assumptions, it is seen that a packet joins the output transmission queue at time rh + i∆ with probability Pi(r). The time period h between possible successive submissions of a packet by a particular terminal will still be referred to as a frame (see Figure 3.5).
Link Parameters The following parameters influence the link performance:

m = Number of off-hook users accessing the switch
B = Rate at which the digital voice terminal supplies bits, bps
Figure 3.5 Probabilistic queue arrival sequence (within a frame of length h, the m terminals are scanned ∆ apart, leaving an arrival idle period of h − m∆).
C = Capacity of the transmission link, bps
P = Speech content per packet, bits
h = P/B = Timelength of speech carried in a packet, s
φ = Packet overhead (header and so forth), bits
s = Number of states in the speech model, as discussed in Section 3.2
p = Steady-state probability that a speaker will issue a packet in a frame; determinable, in general, from s and the speech model state-transition parameters (p = lim(r→∞) Pi(r))
For formulation simplicity, it is assumed that s, B, P, φ, and, therefore, h are identical for all terminals. From the preceding model parameters, it follows that the transmission service time for a packet is the constant µ obtained by dividing the total packet length in bits by the line capacity, namely, µ = (P + φ)/C. An expression for line utilization ρ can be stated in terms of the preceding parameters:

$$\rho = \frac{pm(P+\phi)B}{PC} = \frac{pmB}{C}\left[1 + \frac{\phi}{P}\right]$$
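The utilization expression can be evaluated directly. The values below (P = 800 bits, φ = 200 bits, C = 50 kbps, B = 5 kbps) match the example parameters used in Section 3.5, with p = 0.5 for a symmetric two-state speaker model; m = 8 is an illustrative choice.

```python
def utilization(p, m, B, C, P, phi):
    """Line utilization rho = (p*m*B/C) * (1 + phi/P)."""
    return (p * m * B / C) * (1 + phi / P)

rho = utilization(p=0.5, m=8, B=5000, C=50000, P=800, phi=200)
assert abs(rho - 0.5) < 1e-12  # 8 off-hook speakers load the link to 50%
```

Note the role of the overhead factor (1 + φ/P): shrinking the packet raises φ/P and wastes capacity, which is the tension behind the optimal-packet-length results in Section 3.5.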
Outline of Queue Solution Thus, it is assumed that at any instant i∆ (i = 0, 1, 2, . . . , with ∆ an integer number of time units) a packet joins the queue with probability p. Namely, with τi = 1 if a customer arrives at i∆ and τi = 0 otherwise,

prob(τi = 1) = p
prob(τi = 0) = 1 − p

Assume that the service time per packet is given by

prob(S = j) = sj,   j = 1, 2, 3, . . .
This is slightly more general than the constant-server problem stated previously. Note that ∆ can certainly be made an integer for all practical purposes by appropriately choosing the underlying time unit (say, seconds, milliseconds, microseconds, and so forth). The utilization of the system is ρ = pE(S)/∆. Using the method of embedded Markov chains, consider the amount of unfinished work Wn just after the nth potential arrival point (not actual arrival). Then
W0 = 0
Wn+1 = max(0, Wn − ∆) + τn+1 S
Under the assumption that Wn and τn+1 are statistically independent (namely, the arrival behavior is independent of the queue state), Wn (n = 0, 1, 2, . . .) is a Markov chain. Because the queue is assumed empty at time zero, Wn can take only the values j = 0, 1, 2, . . . Assuming that ρ < 1, so that a steady state exists, and letting

W = lim(n→∞) Wn
πj,n = prob(Wn = j)
πj = lim(n→∞) πj,n
the following steady-state recurrence relations are obtained:

$$\pi_0 = (1 - p)\sum_{i=0}^{\Delta}\pi_i$$

$$\pi_Q = p\,s_Q\sum_{i=0}^{\Delta}\pi_i + \sum_{i=1}^{Q-1} p\,s_{Q-i}\,\pi_{\Delta+i} + (1 - p)\,\pi_{\Delta+Q}, \qquad Q = 1, 2, 3, \ldots$$

Let ψπ(Z) be the generating function of the π's and ψs(Z) the generating function of the server's discrete distribution. Then, by the usual technique,

$$\psi_\pi(Z) = \frac{[\,p\psi_s(Z) + 1 - p\,]\sum_{i=0}^{\Delta-1}(Z^{\Delta} - Z^{i})\,\pi_i}{Z^{\Delta} - [\,p\psi_s(Z) + 1 - p\,]}$$

Note that at this point there are ∆ unknowns (π0, π1, . . . , π∆−1) in the generating function. Evaluating ψπ(Z) at Z = 1 obtains

$$\sum_{i=0}^{\Delta-1}(\Delta - i)\,\pi_i = \Delta - pE(S)$$
∆ − 1 other equations are needed to obtain the first ∆ π's. Even without the evaluation of these ∆ unknowns, an expression for the expected amount of unfinished work can be obtained. In fact,

$$E(W) = \psi_\pi'(1) = \frac{[\Delta - pE(S)][\Delta - 1 + 2pE(S)] - \{\Delta(\Delta - 1) - pV(S) - pE(S)[E(S) - 1]\} + \sum_{i=0}^{\Delta-1} i(\Delta - i)\,\pi_i}{2[\Delta - pE(S)]}$$
Of interest here is the delay U seen at steady state by an incoming customer. Clearly,

U = max(W − ∆, 0) + S

Thus,

E(U) = E[max(W − ∆, 0)] + E(S) = E(W) − pE(S) + E(S)

With a constant server, it can be shown that the mean of the delay distribution (mean waiting time for the link, plus transmission) is
ρ 0.75 − ᎏᎏ 2 E(U) = ᎏᎏ µ 1−ρ
冤
冥
so that

E(U) = [(P + φ)/C] · [0.75 − mB/(4C) − mBφ/(4CP)] / [1 − mB/(2C) − mBφ/(2CP)]
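As a numerical check (our own sketch, with a helper name of our choosing), the parameterized expression reproduces the single-link numbers used throughout this section: for P = 800, φ = 200, C = 50,000 bps, B = 5000 bps, and m = 13, it gives ρ ≈ 0.813 and, adding the packetization delay P/B = 0.16 s, a mean total delay of about 0.197 s, in agreement with Table 3.1.

```python
def mean_link_delay(P, phi, C, B, m):
    """E(U): mean wait-for-link plus transmission time, per the
    closed-form expression above (times in seconds, lengths in bits)."""
    service = (P + phi) / C                  # packet transmission time
    num = 0.75 - m * B / (4 * C) - m * B * phi / (4 * C * P)
    den = 1 - m * B / (2 * C) - m * B * phi / (2 * C * P)
    assert den > 0, "line is saturated (rho >= 1)"
    return service * num / den

P, phi, C, B, m = 800, 200, 50_000, 5000, 13
rho = m * B * (P + phi) / (2 * P * C)            # line utilization, about 0.813
E_D = P / B + mean_link_delay(P, phi, C, B, m)   # about 0.197 s total
```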
Generally speaking, the delay distribution is a function of the five parameters driving the model, namely: (1) the line capacity C, (2) the packet length P, (3) the overhead φ, (4) the digitization bit rate B, and (5) the number of simultaneous users m. To obtain such a distribution, the ∆ unknowns previously mentioned must be determined. This can be done with the aid of the following three propositions. Let
ψπ(Z) = M(Z)N(Z)/D(Z)

Proposition 1: D(Z) = Z^∆ − pψs(Z) − 1 + p has exactly ∆ roots inside or on |Z| = 1.

Proposition 2: N(Z) = Σ(i=0..∆−1) (Z^∆ − Z^i) π_i has exactly ∆ roots inside or on |Z| = 1.

Proposition 3: No root of M(Z) = pψs(Z) + 1 − p coincides with a root of D(Z).
Thus, it follows that the zeros Zi of D(Z) with |Zi| ≤ 1 must coincide with zeros of N(Z); more specifically, because D(Z) has ∆ zeros inside or on the unit disk and ψπ(Z) converges at least inside the unit disk, the zeros of the denominator must also be zeros of the numerator. Thus, there are exactly ∆ equations for the determination of the ∆ unknowns π_0, π_1, . . . , π_(∆−1).
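Proposition 1 can be checked numerically for a small constant-server case. The sketch below is our own illustration (the root finder is a generic Durand-Kerner iteration, not part of the original treatment): it counts the roots of D(Z) = Z^∆ − pZ^s − (1 − p) in the closed unit disk for ∆ = 4, s = 2, p = 0.5, and finds exactly ∆ of them.

```python
def poly_roots(coeffs, iters=300):
    """All complex roots of a monic polynomial via the Durand-Kerner
    iteration; coeffs are [1, c_(n-1), ..., c_0], highest power first."""
    n = len(coeffs) - 1
    def val(z):
        v = 0j
        for c in coeffs:          # Horner evaluation
            v = v * z + c
        return v
    roots = [(0.4 + 0.9j) ** k for k in range(1, n + 1)]  # distinct seeds
    for _ in range(iters):
        for i in range(n):
            den = 1 + 0j
            for j in range(n):
                if j != i:
                    den *= roots[i] - roots[j]
            roots[i] -= val(roots[i]) / den
    return roots

# D(Z) = Z**4 - 0.5*Z**2 - 0.5: Delta = 4, constant server s = 2, p = 0.5
delta = 4
roots = poly_roots([1, 0, -0.5, 0, -0.5])
in_disk = [z for z in roots if abs(z) <= 1 + 1e-9]
```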
3.5 Results This section studies the following for the steady-state situation: the delay distribution faced by a typical packet, both for the case of infinite buffer capacity and the finite case; the effect of the speech model on the delay and on the utilization of the line; and, finally, the determination of optimal packet lengths, with respect to a specific performance criterion, as a function of the number of users, the overhead, the digitization rate, and the capacity of the transmission line. The model is also used to address some transient issues via numerical solution.
Properties of the Delay Distribution As indicated previously, the general interest here is in studying the characteristics of the delay distribution. A concentration of probability around the delay mean necessarily implies a related concentration of probability near zero for the gap structure. A distribution with a high-probability tail is poor, in the sense that one must pay a high price in terms of network control and facility utilization to ensure a certain performance confidence. Packet speech is particularly vulnerable to high-probability tails in the delay distribution, because the value of its information is perishable. If a network is not designed properly—say, with much hopping (e.g., too many router links) and a very high utilization of line capacity—the variance may be such that one would be forced to wait several seconds before ensuring that the fraction of the packets needed for fidelity is received. This would violate the tight delivery-time requirements previously described. As a first-order approximation, the end-to-end variance can be taken as the sum of the link variances (and is bounded by L²σ_M², where L is the number of links and σ_M² is the largest link variance). Therefore, the variance of the end-to-end delay can be directly related to the variance of the delay on a single link. A key issue is: What is the single-link delay distribution for a packet voice network? Figure 3.6 shows the distribution of total delay for five values of ρ for the case P = 800, φ = 200, C = 50 kbps, B = 5 kbps, and a symmetric two-state speaker model. Naturally, as the utilization increases, the distribution becomes more dispersed. Figure 3.7 shows the expected value and the 95th percentile, with the mean as predicted from an M|M|1 model. In terms of concentration of probability, the
Figure 3.6 Typical total delay distributions: (a) ρ = 0.563, m = 9; (b) ρ = 0.688, m = 11; (c) ρ = 0.813, m = 13; (d) ρ = 0.875, m = 14; (e) ρ = 0.938, m = 15.

Figure 3.7 Expected delay as a function of ρ: (a) E(D) for M|M|1, (b) K0.95, and (c) E(D).
distributions under consideration are comparable to the exponential distribution for the cases of ρ = 0.813 and ρ = 0.938. The agreement is excellent. From this observation it appears that the single-link delay distribution can be approximately characterized with one parameter, its expected value. For the infinite-buffer case, the following conclusions can be made: 1. The exponential distribution accurately fits the actual delay distribution. 2. As a consequence of the good exponential fit, the 95th, 99th, and 99.9th percentiles of the model's total delay distribution are comparable to those of the exponential distribution (in the number of standard deviations). The model delay has an accumulation of probability at an initial point, indicating the possibility of finding the server idle; this feature is not captured by the exponential random variable. On the other hand, the finite-buffer case reduces the expected delay and contracts the delay distribution as compared to the infinite-buffer case, at the expense of blocking some packets. The “Finite-Buffer Case” subsection addresses this issue in more detail. To point out the peculiarities of the delay distribution (Figure 3.8) for a voice environment, compare the results of the present model to an M|D|1 model (representative of packet radio, random-access schemes); an M|M|1 model (representative of a data packet-switched network); and also against the well-known G|G|1 heavy-traffic approximation (see Figure 3.9). As anticipated, the other models turn out to be conservative in that their distributions imply a larger mean value and a higher probability tail.
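If the delay above its initial accumulation point is approximately exponential with a given mean, its percentiles follow immediately from K_q = mean · ln(1/(1 − q)); the short sketch below (our own illustration, helper name ours) shows why the 95th percentile sits near three mean values out on the tail.

```python
import math

def exp_percentile(mean, q):
    """q-quantile of an exponential distribution with the given mean."""
    return mean * math.log(1 / (1 - q))

k95 = exp_percentile(1.0, 0.95)   # about 3.0 mean values
k99 = exp_percentile(1.0, 0.99)   # about 4.6 mean values
```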
Finite-Buffer Case This finite-buffer case is interesting from an engineering point of view. First, there is the issue of how many buffers to provide at a packet switch; second, what is the buffer overflow probability, given that a fixed number of buffers is provided? The modified version of the model in the “Outline of Queue Solution” subsection can be employed to successfully address and answer these issues. The examples in the “Properties of the Delay Distribution” subsection were rerun with two, three, and four buffers, and the results were analyzed. Figure 3.10 shows a plot of some of these results. The delay distributions for finite-buffer cases are contrasted with the infinite-buffer situation. Finite-buffer situations have a delay distribution that is much less dispersed; the long tail of the delay distribution is cut off. This has the beneficial effect of improving the performance of those packets that are not blocked, at the expense of dropping a few packets. Table 3.1 compares the mean and the 95th percentile of those packets
Figure 3.8 Comparison between model delay distribution and exponential approximation: (a) ρ = 0.813, (b) ρ = 0.938.
that are successfully delivered; the improvement effect previously mentioned can be seen, particularly for high utilization. Note that now the largest possible delay is Kµ + h, where K is the number of buffers. Not yet shown is the probability that a packet is blocked; this is expected to increase as the number of buffers decreases. The significant observation is that such
Figure 3.9 Comparison of approximations, where P = 0.5, f = 110, φ = 110, B = 2000, C = 10,000, m = 5, and ρ = 0.95: (a) model, (b) G|G|1 heavy traffic, (c) M|D|1, and (d) M|M|1. (NOTE: Only the dots should be compared.)
Figure 3.10 Comparison between the delay distribution for infinite- and finite-buffer facilities, where P = 112, f = 800, φ = 200, m = 13–15, B = 5000, C = 50,000, h = 0.16, and µ = 0.02: (a) k = 2, 2 buffers; (b) k = 4, 4 buffers; (c) k = ∞, infinite buffer.
probability is low, even when the number of buffers is small. Table 3.2 depicts results that are typical for any model parameter selection. Thus, even at ρ = 0.938, four buffers are sufficient to guarantee that 95 percent of the packets are not blocked.
Effect of Speech Models Thus far the speaker model has been kept frozen with respect to two factors: (1) the number of states in the speech chain, and (2) the transition matrix of the speech chain.
Table 3.1 Effect of Finite Buffer (E(D) and K0.95, in seconds)

Utilization   K = ∞            K = 4            K = 3            K = 2
              E(D)   K0.95     E(D)   K0.95     E(D)   K0.95     E(D)   K0.95
0.563         0.182  0.188     0.182  0.188     0.182  0.188     0.182  0.188
0.688         0.187  0.207     0.187  0.205     0.187  0.203     0.185  0.190
0.813         0.197  0.238     0.194  0.226     0.191  0.213     0.186  0.196
0.938         0.239  0.357     0.203  0.234     0.196  0.216     0.188  0.198
Table 3.2 Fraction of Packets Not Blocked

Utilization   K = ∞    K = 4     K = 3     K = 2
0.563         1.0      1.0       1.0       0.9995
0.688         1.0      0.9996    0.9967    0.9653
0.813         1.0      0.9905    0.9738    0.9139
0.938         1.0      0.9557    0.9275    0.8653
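The qualitative content of Table 3.2 can be reproduced with a small discrete-time simulation. The sketch below is entirely our own: talkers are modeled as independent Bernoulli(1/2) sources per frame and the link serves one packet per slot, so the numbers only roughly track the table, but blocking stays small for moderate buffers and grows as K shrinks.

```python
import random

def delivered_fraction(m, K, frames=40_000, slots=8, seed=11):
    """Fraction of offered packets accepted by a link with K buffers.
    Each frame, each of m talkers is active with probability 1/2 and
    places its packet in a random slot; one packet is served per slot."""
    rng = random.Random(seed)
    q = accepted = offered = 0
    for _ in range(frames):
        arrivals = [0] * slots
        for _ in range(m):
            if rng.random() < 0.5:
                arrivals[rng.randrange(slots)] += 1
        for a in arrivals:
            offered += a
            take = min(a, K - q)   # packets finding a full buffer are blocked
            q += take
            accepted += take
            if q:
                q -= 1             # one packet transmitted per slot
    return accepted / offered

high_load = delivered_fraction(m=15, K=4)   # rho about 0.94
small_buf = delivered_fraction(m=15, K=2)
```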
Transition Matrix Changes for the Two-State Model Because of the symmetry requirement, the steady-state distribution is invariant under changes in the transition matrix. This follows, quite simply, because the steady-state distribution of the arrival process can be shown to be
P(Z(∞) = k) = (m choose k) (1/2)^m
for a two-state chain. Changes in the transition matrix affect only the transient behavior unless the symmetry requirement is eliminated.
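In other words, the number of packets offered per frame is simply Binomial(m, 1/2), as the short sketch below illustrates (helper name ours):

```python
from math import comb

def packets_per_frame(m):
    """P(Z = k) = (m choose k)(1/2)**m for the symmetric two-state model."""
    return [comb(m, k) * 0.5 ** m for k in range(m + 1)]

dist = packets_per_frame(10)
mean = sum(k * p_k for k, p_k in enumerate(dist))   # m/2 packets per frame
```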
Three-State Speaker Model A brief investigation of a three-state speaker model was undertaken. It has already been indicated that the two-state model is very conservative, predicting more traffic than actual. The average number of packets per frame supplied by a population of m users under the two- and three-state model are m/2 and mx/(2x + p), respectively, where the three-state transition matrix is
⎡ 1 − p    p         0     ⎤
⎢ x        1 − 2x    x     ⎥
⎣ 0        p         1 − p ⎦
Typical values of the transition probabilities are p = 0.1, a high tendency to continue speech once initiated, and x = 0.46, a high tendency to break away from mutual silence. Thus, for m terminals, 0.46m packets would be expected, compared with 0.5m for the two-state model. With the assumption of a three-state speech model, 10 percent more users can be accommodated (if x = 0.23, the model could accommodate 20 percent more speakers). The delay distributions for the examples of the “Finite-Buffer Case” subsection have been obtained for the three-state model and are depicted in Figure 3.11 for infinite buffers. Observe that the delay distribution is fairly similar to the distribution obtained with a two-state model at the same utilization. Compare ρ = 0.733, s = 3; ρ = 0.845, s = 3; and ρ = 0.813, s = 2. The effect of buffer size is similar to that described for the finite-buffer case. For the three-state speaker model, with the same value of m, fewer packets arrive
Figure 3.11 Comparison between the delay distribution for a two-state speech model and a three-state speech model, where P = 800, φ = 200, m = 13–15, B = 5000, C = 50,000, h = 0.16, and µ = 0.02: (a) three states, m = 13, ρ = 0.733; (b) two states, m = 13, ρ = 0.813; (c) three states, m = 15, ρ = 0.845; (d) two states, m = 15, ρ = 0.938.
at the packet switch, and so the number of buffers necessary for acceptable operation is the same or reduced. In addition, the three-state speaker model has little or no effect on the steady-state output distribution, reflecting only the change in utilization. The explicit dependence of E(D) on ρ, h, and µ continues to hold.
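The mx/(2x + p) figure is just m times the stationary probability of the talking state; a quick check by power iteration (our own sketch, assuming state 1 is the active state):

```python
def stationary(T, iters=5000):
    """Stationary distribution of a finite Markov chain by power iteration."""
    n = len(T)
    pi = [1 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi

p, x = 0.1, 0.46
T = [[1 - p, p,         0    ],
     [x,     1 - 2 * x, x    ],
     [0,     p,         1 - p]]
pi = stationary(T)
active_prob = pi[0]     # equals x/(2x + p), about 0.451
```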
Optimal Packet Length Figure 3.12 depicts the convexity of the total delay experienced by a voice packet (packetization plus network delay) when plotted against the packet length P. Figure 3.13 shows an actual curve where the mean total delay and the K0.01 and K0.05 percentile curves are shown. From these curves, it is obvious that the designer must be careful in selecting the appropriate packet length if the delay is to be optimized. This model has been used in Reference [8] to show that Popt = φL(1 + F)/(1 − L) and
a companion closed-form expression for the minimized delay E(D)|P = Popt, which is proportional to φ/B and otherwise a function of the dimensionless quantities A, L, and F only (the complete expression appears in Reference [8]).
Figure 3.12 Convexity of total delay against packet length: (a) link delay, (b) packetizing delay, and (c) total delay.

Figure 3.13 Representative optimization curve for the packet length: (a) K0.01, (b) K0.05, and (c) E(D).
where

F = √{ [A/(4L(1 − L))] / [1 + A(0.75 − L/2)/(1 − L)] }

and

A = B/C,  L = mA/2
Detailed analysis of this formula indicates that vocoder technologies usually require short packet lengths (150 to 400 bits), while PCM can operate at slightly longer packets (600 bits).
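Because the total-delay curve is convex, the optimum can also be located by direct search, without the closed form. The sketch below is our own illustration, with an arbitrary vocoder-like parameter set; it lands in the 150- to 400-bit range quoted above.

```python
def total_delay(P, phi, C, B, m):
    """Packetization delay P/B plus the mean link delay E(U) of
    the single-link model (times in seconds, lengths in bits)."""
    den = 1 - m * B / (2 * C) - m * B * phi / (2 * C * P)
    if den <= 0:
        return float("inf")      # this packet length saturates the link
    num = 0.75 - m * B / (4 * C) - m * B * phi / (4 * C * P)
    return P / B + ((P + phi) / C) * num / den

phi, C, B, m = 200, 50_000, 2400, 20     # illustrative vocoder scenario
P_opt = min(range(50, 2001), key=lambda P: total_delay(P, phi, C, B, m))
```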
Transient Behavior The model at hand can be employed to study the transient behavior of the single link. This reveals the time duration of degraded performance when a bad state is entered, that is, an unusual period of high speech activity in one direction. The basic procedure involves finding the steady-state delay, then perturbing the speaker activity states and observing the effect on the system. As an extreme case, at a particular frame, every terminal is forced to supply a packet by restarting each speaker Markov chain in state 1 (active), an event that has probability (1/2)^m. On average, this would be expected only once every (0.5)^−m = 2^m frames; for m = 20, this is about one in 1 million frames. In other terms, for a frame length of 0.11 second, this transient would occur once every 30 hours, on average. This perturbation overloads the system for a period of time, causing a related increase in the delay faced by a typical incoming packet. Figure 3.14 shows the instantaneous expected delay as a function of time for various examples. The following observations can be made: 1. The maximum delay perturbation caused by the transient is very dependent on the steady-state link utilization and the speaker behavior parameter r (the per-frame state-transition probability of the speaker chain).
Figure 3.14 Transient behavior: (a) 9 users, r = 0.1, ρ∞ = 0.563; (b) 11 users, r = 0.08, infinite buffers, ρ∞ = 0.688; (c) 11 users, r = 0.1, infinite buffer, ρ∞ = 0.688; (d) 11 users, r = 0.15, infinite buffer, ρ∞ = 0.688; (e) 11 users, r = 0.1, finite buffer, ρ∞ = 0.688; (f) 13 users, r = 0.1, ρ∞ = 0.813.
2. The duration of the transient is a function of r only. As r decreases, the frame-to-frame correlation increases (r = 0.15, r = 0.1, r = 0.08) and the duration of the transient increases. The transient dies off rather rapidly (2 to 3 s) for typical values of r, r ≥ 0.08. 3. Finite buffer size has a damping effect on the transient. This behavior is intuitive. The system becomes overloaded: If the buffer is infinite, there is no way to unload but to pump the packets out the line; in the case of a finite buffer, an additional unloading takes place when packets are blocked because there is no room in the buffer. The packet-blocking rate for this example immediately follows the overload (Figure 3.15). Typical transient delay distributions are shown in Figure 3.16, and Figure 3.17 depicts the overload situation. Note that the utilization is temporarily pushed over 1. Because packet voice networks would be designed to be driven at high utilization, designers must guard against transients, or at least be aware of their effects.
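The transient itself is easy to reproduce in simulation. The sketch below is entirely our own: symmetric two-state talkers with per-frame switching probability r, a link draining eight packets per frame, and every talker forced active at one frame.

```python
import random

def transient_queue(m=11, r=0.1, force_frame=200, frames=240, cap=8, seed=3):
    """Queue length per frame; at frame `force_frame` every talker is
    forced into the active state (the extreme perturbation in the text)."""
    rng = random.Random(seed)
    active = [rng.random() < 0.5 for _ in range(m)]
    q, trace = 0, []
    for t in range(frames):
        if t == force_frame:
            active = [True] * m
        q = max(0, q + sum(active) - cap)   # arrivals minus service
        trace.append(q)
        # each talker switches state with probability r per frame
        active = [(not a) if rng.random() < r else a for a in active]
    return trace

trace = transient_queue()
```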
Figure 3.15 Transient blocking rate for a four-buffer system.

Figure 3.16 Transient delay distribution: (a) frames 7 to 11, and (b) frames 12 to 20.

Figure 3.17 Instantaneous utilization.
3.6 Conclusion A methodology is needed for the design of packet voice networks. Such networks differ from the packet data case in the following fundamental respects:
• Regularity of input traffic
• More complex performance criteria (smoothness and error tolerance)
• Performance criteria imposed on a worst-case and end-to-end basis
• Different set of applicable protocols
A single-link situation has been modeled and implemented with a computer program. The link model can accept a wide variety of speech models and system parameters and yields the complete steady-state or transient distribution of delay. Some of the important facts learned from the studies are as follows: 1. Standard approximations are overly conservative in that they predict poorer performance than can be actually attained. 2. The single-link delay distribution is approximately exponential. 3. Percentile delay performance criteria track very well with equivalent performance criteria placed on the mean delay. 4. A closed-form expression for the mean delay is obtained. 5. A closed-form expression for optimal packet length is obtained. 6. Only a small number of buffers is found to be necessary to sustain adequate performance. A small number of buffers also reduces the transient excursions and durations.
The single-link model was incorporated into a tandem-link model. Implementation and study of this model gives results for the situation when a link must service an incoming line as well as local terminals. Based on the results already obtained, a foreseeable methodology for general network design for packet voice can be outlined as follows: 1. Interactively set topology, link parameters, and system parameters. 2. Using closed-form analytical approximations for link delay behavior, obtain an optimal routing pattern. 3. Evaluate performance results using the detailed link model. 4. Return to step 1 until satisfied that the network is the cheapest one that satisfies all the design criteria.
References 1. D. Minoli. “Issues in Packet Voice Communication.” Proceedings of IEE 126 (8): 729–740. 2. D. Minoli. “Packetized Speech Networks. Part 1: Overview.” Australian Electronics Engineer (April 1979): 38–52. 3. D. Minoli. “Packetized Speech Networks. Part 2: Queueing Model.” Australian Electronics Engineer (July 1979): 68–76. 4. D. Minoli. “Packetized Speech Networks. Part 3: Delay Behavior and Performance Characteristics.” Australian Electronics Engineer (August 1979): 59–68. 5. D. Minoli. “General Geometric Arrival, Constant Server Queuing Problem with Applications to Packetized Voice.” ICC 1978 Conference Record 3: 36.6.1–36.6.5. 6. D. Minoli and E. Lipper. “Mixed Classes of Traffic Obeying Different Queueing Rules with Reference to Integrated Packet Switched Networks.” 1978 IEEE Canadian Conf. on Comm. and Power, Record: 1–4. 7. D. Minoli. “General Geometric Arrival, Discrete Server Queue.” NTC 1978 Conference Record: 44.2.1–44.2.5. 8. D. Minoli. “Optimal Packet Length for Packet Voice Communication.” IEEE Trans. on Comm. (concise paper) COMM-27 (March 1979): 607–611. 9. D. Minoli. “Satellite On-Board Processing of Packetized Voice.” ICC 1979 Conference Record: 58.4.1–58.4.5. 10. D. Minoli. “Some Design Parameters for PCM Based Packet Voice Communication.” 1979 International Electrical/Electronics Conference Record.
11. D. Minoli and K. Schneider. “Computing Average Loss Probability in a Circuit Switched Network.” IEEE Trans. on Comm. COMM-28 (January 1980): 27–33. 12. D. Minoli and K. Schneider. “An Algorithm for Computing Average Loss Probability in a Circuit Switched Network.” Telecommunications Journal 29-1 (June 1979): 28–37. 13. D. Minoli. “Digital Techniques in Sound Reproduction. Part 1” Audio (April 1980): 54–61. 14. D. Minoli and W. Nakamine. “Mersenne Numbers Rooted on 3 for Number Theoretic Transforms.” 1980 IEEE International Conf. on Acoust., Speech and Signal Processing. 15. D. Minoli. “Cost Effective Design of Local Access Networks Using Common Carriers Bulk Service Offering.” Electrical Communication 55 (2): 118–126. 16. D. Minoli. “Diseno mas economico de redes de acceso locales usando las ofertas de servicio masivo de las companias explotadoras.” Comunicaciones Electricas 55 (2): 118–126. 17. D. Minoli. “Gestaltung Kostengunstinger Anschlussnetze für das FAXPAKNetz.” Elektriches Nachrichtenwesen 55 (2): 118–126. 18. D. Minoli. “Optimisation de cout des reseaux d’acces locaux utilisant les options de tarif forfaitaires pour communications publiques.” Revue des Telecommunicationes 55 (2): 118–126. 19. D. Minoli. “Digital Voice Communication over Digital Radio Links.” SIGCOMM Computer Communications Review 9 (4): 6–22. 20. D. Minoli. “Sizing Trunk Bundles Which Can Be Seized at Both Ends with Different Grade of Service.” IEEE Trans. on Comm. COMM-28 (6): 794–801. 21. D. Minoli. “Getting the Most WATS for Every Communication Dollar.” Data Communications (September 1980): 91–102. 22. D. Minoli. “Engineering Two-Way Foreign Exchange Trunk Bundle Systems.” Computer Communication 3 (2): 69–76. 23. D. Minoli. “Digital Techniques in Sound Reproduction. Part 2.” Audio (May 1980): 34–42. 24. D. Minoli. “Selection of Communications Facilities under a Multigraduated Tariff.” Computer Networks 4 (6): 295–301. 25. D. Minoli. 
“Optimal Allocation in a Multi-Resources Graduate Tariff Communication Environment.” Computer Communications 3 (4): 117–124. 26. D. Minoli. “Unmasking a Puzzling New Tariff: A Look at Some Hard Facts On WATS.” Telephony 199 (21): 24–27. 27. D. Minoli. “A Case for Simpler Tariffs.” Telephony 201 (7): 22–24. 28. D. Minoli. “Designing Large Scale Private Voice Networks.” Telephony 201 (12): 130ff.
29. D. Minoli. “Strategy in Multigraduated Tariffs under Random Usage.” Computer Communications 4 (6). 30. D. Minoli. “A New Design Criterion for Store-and-Forward Networks.” Computer Networks 7 (1983): 9–15. 31. D. Minoli. “Designing Practical Voice and Data Communications Networks. Part 1.” Computer World (May 6, 1985): 67, 73. 32. D. Minoli. “All About Channel Banks: Technology Briefing.” Datapro Report CA-80-010-902 (October 1987). 33. D. Minoli. “Evaluating Communications Alternatives. Part 1: Cost Analysis Methods.” Datapro Report CA03-010-401 (June 1986). 34. D. Minoli. “Evaluating Communications Alternatives. Part 2: Pragmatic Network Design Issues.” Datapro Report CA09-010-451 (June 1986). 35. D. Minoli. “Phone Changes Benefit Users.” Computer World (May 12, 1986): 19, 23. 36. D. Minoli. “Integrated Voice/Data PBX.” Teleconnect (May 1986). 37. D. Minoli. “Engineering PBX Networks. Part 1: Design Modules.” Datapro Report MT30-315-101 (September 1986). 38. D. Minoli. “Engineering PBX Networks. Part 2: Gathering Support Data.” Datapro Report MT30-315-201 (September 1986). 39. D. Minoli. “ISDN: Good News for the Communications Manager. Part 2.” Computer World (January 20, 1986). 40. D. Minoli. “ISDN: Good News for the Communications Manager. Part 1.” Computer World (January 13, 1986). 41. D. Minoli. “An Overview of ADPCM Transcoders.” Datapro Report CA80-010604 (November 1986). 42. D. Minoli. “Designing Voice Networks.” Datapro Report 5401MVN (May 1995). 43. D. Minoli. “Traffic Engineering Basics.” Datapro Report 5410MVN (June 1995). 44. D. Minoli. “Designing End-to-End Networks for New Multimedia Applications. Proceedings, ICA. Portland, OR, 1995. 45. D. Minoli. “Common Channel Signaling System Number 7.” Datapro Report 8420 (March 1996). 46. D. Minoli. “Designing Voice Networks.” Datapro Report 5401 (April 1996). 47. D. Minoli. “Queueing Fundamentals for Telecommunications.” Datapro Report 5430 (January 1996). 48. D. Minoli. 
“Signaling Concepts.” Datapro Report 2912 (February 1996). 49. D. Minoli. “Advanced Intelligent Networks.” Datapro Report 3070 (March 1996). 50. D. Minoli. “Installing and Maintaining a Premises-Based Voice Wiring System.” Datapro Report 5701 (March 1997).
51. D. Minoli. “Private T1 Networks for Business.” Datapro Report (May 1996). 52. D. Minoli. “The Telephone Room Environment.” Datapro Report 5720 (May 1996). 53. D. Minoli. “Traffic Engineering Basics.” Datapro Report 5420 (July 1996). 54. D. Minoli. “Interstate Private Line Facilities.” Datapro Report 3501 (August 1996). 55. D. Minoli. “T-Carrier Network Planning and Design.” Datapro Report 5240 (September 1996). 56. D. Minoli. “AT&T Tariff 12.” Datapro Report 3010 (November 1996).
chapter 4 Voice Technologies for Packet-Based Voice Applications 4.1 Introduction This chapter provides a synopsis of voice digitization and compression methodologies. After the general discussion, it focuses on ADPCM techniques for packet networks (ITU-T G.727). Specific standardized vocoding methods are discussed in more detail in Chapter 5.
General Overview of Speech Technology This section provides an overview of speech encoding methodologies that are relevant to voice over IP applications. In general, low-bit-rate voice (LBRV) methods are of interest for IP at this time. Two disciplines play a role: • Speech analysis is that portion of voice processing that converts speech to digital forms suitable for storage on computer systems and transmission on digital (data or telecommunications) networks. • Speech synthesis is that portion of voice processing that reconverts speech data from a digital form to a form suitable for human usage. These functions are essentially the inverse of speech analysis.
101
102
Chapter Four
Speech analysis processes are also called digital speech encoding (or coding), and speech synthesis is also called speech decoding. The objective of any speech-coding scheme is to produce a string of voice codes of minimum datarate, so that a synthesizer can reconstruct an accurate facsimile of the original speech in an effective manner, while optimizing the transmission (or storage) medium. Many of the LBRV methods make use of the features of human speech, in terms of the properties that can be derived from the vocal tract apparatus. The vocal tract is excited by air from the lungs. The excitation source is either voiced or unvoiced. In voiced speech, the vocal cords vibrate at a rate called the fundamental frequency; this frequency is what we experience as the pitch of a voice. Unvoiced speech is created when the vocal cords are held firm without vibration, and the air is either aspirated through the vocal tract or is expelled with turbulence through a constriction at the glottis, tongue, teeth, or lips.
Waveform Coding Two major techniques are used in speech analysis, waveform coding and vocoding. Waveform coding, which is applicable to traditional voice networks and voice over ATM, is treated extensively in the Wiley companion book to this volume [1]. Therefore, only a summary treatment is included here. Two processes are required to digitize an analog signal, as follows: 1. Sampling. This discretizes the signal in time. 2. Quantizing. This discretizes the signal in amplitude. The devices that accomplish speech analysis (digitization) are called codecs, for coder/decoder. Coders include analog to digital (A/D) converters, which typically perform a digitization function, and analysis modules, which further process the speech to reduce its datarate and prepare it for transmission. The reverse process uses synthesis modules to decode the signal and D/A converters to reconvert the signal back to analog format. Naturally, the goal of the entire digitizing process is to derive from an analog waveform a digital waveform that is a faithful facsimile (at the acoustical perception level) of the original speech. The sampling theorem indicates that if the digital waveform is to represent the analog waveform in useful form, then the sampling rate must be at least twice the highest frequency present in the analog signal. Waveform coding methods are driven by this theorem. Toward that end, analog telephonic speech is filtered before digitization to remove higher frequencies. The human speech spectrum contains frequencies beyond 12,000 Hz, but for telephony applications the higher frequencies can be safely filtered out. Specifically, the channel bank and digital loop carrier equipment in telephone
Voice Technologies for Packet-Based Voice Applications
103
networks is designed to eliminate frequencies above 3.3 kHz, although nominally the voice band is 4 kHz. Consequently, analog speech signals are sampled at 8000 Hz for pulse code modulation (PCM) applications. PCM is currently the most often used digitization in telephony. Today, nearly every telephone call in the United States is digitized at some point along the way using PCM. As noted, sampling used in the waveform coding of voice makes an analog waveform discrete in time, and quantizing makes it discrete in amplitude. This discreteness is a direct consequence of the fact that computers are digital devices, wherein the values that are allowed for variables are discrete. The digitization process measures the analog signal at each sample time and produces a digital binary code value representing the instantaneous amplitude. Optimizing speech quality means production of a digital waveform that can be reconverted to analog with as small an error as possible. Quantization is the process that maps a continuum of amplitudes into a finite number of discrete values. This results in a loss of information and the ensuing introduction of noise, called quantization noise or quantization error. In waveform coding this loss of information is small, and the results are called (nearly) lossless; vocoding methods discard much more information and are therefore called lossy. Signal-to-noise ratio (SNR), expressed in decibels, is a measure used to discuss voice quality. For telephony applications, speech coders are designed to have a signal-to-noise ratio above 30 dB over most of their range.
Uniform Quantization In a basic PCM system, input to the quantizer hardware comes in the form of an analog voltage provided by the sampler circuit. The simplest approach would be to use a uniform quantization method. Here, the range of input voltages is divided into 2n segments, and a unique codeword of n bits is associated with each segment. The width of each segment is known as the step size. The range R of an n-bit quantizer with step size s is clearly
R = s · 2^n

This implies that if the input voltage were to exceed R, clipping would result. To address this issue, logarithmic quantization is used.
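A uniform quantizer is only a few lines; the sketch below (names are ours) maps a nonnegative input voltage to an n-bit code and clips anything beyond the range R = s · 2^n.

```python
def uniform_quantize(v, n_bits, step):
    """Map voltage v to an n-bit code; inputs beyond the quantizer
    range R = step * 2**n_bits are clipped to the extreme codes."""
    top = 2 ** n_bits - 1
    return max(0, min(top, round(v / step)))

# An 8-bit quantizer with 10 mV steps covers R = 0.01 * 256 = 2.56 V.
code = uniform_quantize(1.0, 8, 0.01)      # 1.00 V -> code 100
clipped = uniform_quantize(3.0, 8, 0.01)   # beyond R -> top code 255
```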
Logarithmic Quantization The goal of logarithmic quantization is to maintain a reasonably constant signal-to-noise ratio over a wide range of analog amplitudes; using this technique, the signal-to-noise ratio will not vary with incoming signal amplitude. To accomplish this, one quantizes not the incoming signal but the log value of the signal; for example, for
analog values w, the equation y = h + k log(w), with h and k constant, provides such a logarithmic function. Logarithmic quantization is a compression process: It reduces the dynamic range of a signal according to a logarithmic function. After compression a reverse process, exponentiation, is required to recover a facsimile of the original; the entire cycle is often referred to as companding, for compressing/expanding [2]. Figure 4.1 depicts a logarithmic curve and a linear approximation to it. The x axis shows the input level; the y axis shows the companded value. The piecewise linear approximation defines, for illustrative purposes, eight regions, four on the positive side of the x axis and four on the negative side of the axis (in North America, a specific logarithmic scheme called µ-law is used; in Europe, a similar but not identical approach called A-law; both methods employ 8-bit logarithmic quantization with 16 regions and 16 steps per region). For the illustrative example, 6 bits are used to encode the incoming signal: 3 bits for the region and 3 bits for eight quantization levels in each region, based on the piecewise linear approximation. For example, one can assign the binary region number to the leftmost 3 bits of the code as noted in the figure; each step in the region is uniquely identified, this assignment being the same in each region.
Figure 4.1 Example of discrete-step piecewise linear approximation to logarithmic function.
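The continuous µ-law characteristic underlying the piecewise approximation can be sketched as follows; µ = 255 is the standard North American constant, while the function names are illustrative:

```python
import math

MU = 255.0  # North American mu-law constant

def mu_compress(x):
    """Compress a sample x in [-1, 1] with the mu-law characteristic:
    F(x) = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse (expanding) half of the companding cycle."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Note how small amplitudes are boosted before quantization, which is exactly what keeps the signal-to-noise ratio roughly constant across input levels.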
Voice Technologies for Packet-Based Voice Applications
Adaptive Quantization To achieve further reductions in voice bitrate, one can employ analysis algorithms that dynamically adapt the quantizer step size in response to variations in input signal amplitude. The goal is to maintain a quantizer range that is matched to the input signal's dynamic range. PCM techniques that adapt step size are referred to as adaptive PCM (APCM). The technique can be applied to both uniform and logarithmic (nonuniform) quantizers. There are several adaptation algorithms, but all aim to estimate the slowly varying amplitude of the input signal while balancing the need to increase step size to attain appropriate range against the worsening in signal-to-noise ratio that results from larger step sizes. In syllabic companding techniques, the quantization characteristics change at about the same rate as syllables occur in speech. Other methods use instantaneous companding. Yet other methods calculate signal amplitude statistics over a relatively short group of samples and adjust the step size accordingly (e.g., feed-forward adaptive PCM and feedback adaptive PCM).
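A minimal sketch of feedback step-size adaptation follows, in the spirit of Jayant-style multiplier adaptation; the constants s0, m_small, and m_big are illustrative, not taken from any standard:

```python
def adapt_step(codes, s0=0.02, m_small=0.9, m_big=1.6, n_bits=2):
    """Feedback adaptive step-size sketch.

    The step grows when the quantizer output magnitude hits the outer
    levels (the signal is outrunning the range) and shrinks on inner
    levels, keeping the range matched to the signal's dynamic range.
    Returns the step size used at each sample time.
    """
    steps = []
    s = s0
    outer = 2 ** (n_bits - 1) - 1  # largest magnitude code
    for c in codes:
        steps.append(s)
        s *= m_big if abs(c) >= outer else m_small
    return steps
```

Because the adaptation is driven by the transmitted codes themselves, the decoder can track the step size without any side information — the same property the feedback variants above rely on.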
Waveform Coding Mechanisms Speech signals contain a significant amount of redundant information. By exploiting this fact and removing some of these redundancies through processing, one can produce data parameters describing the waveform at a lower datarate than is otherwise possible while still making a reasonably faithful reconstruction of the original. Speech samples generated at the Nyquist rate are correlated from sample to sample. (Actually, they remain moderately correlated over a number of consecutive samples.) This implies that values of adjacent samples do not differ significantly. Consequently, given some number of past samples, it is possible to predict with a degree of accuracy the value of the next sample. Some of these techniques are discussed next. In the differential coding technique (also called linear prediction2), rather than coding the input waveform directly, one codes the difference between that waveform and one generated from linear predictions of past quantized samples. At sample time j, the encoder codes e(j), the prediction error at time j, where
e(j) = y(j) − [a1y(j − 1) + a2y(j − 2) + . . . + apy(j − p)]

and where y(j) is the input sample and the term in square brackets is a predicted value of the input based on previous values. The terms ai are known as prediction coefficients. The output values e(j) have a smaller dynamic range than the original signal; hence, they can be coded with fewer bits.3 This method entails linear prediction because, as the preceding equation shows, the predictions involve only first-order (linear) functions of past samples. The prediction coefficients ai are selected so as to minimize the total squared prediction error
E = e^2(0) + e^2(1) + . . . + e^2(n)

where n is the number of samples. Once computed, the coefficients are used with all samples until they are recalculated. In differential coding, a trade-off can be made by adapting the coefficients less frequently in response to a slowly changing speech signal. In general, predictor coefficients are adapted every 10 to 25 ms. As is the case with adaptive quantization, adaptive prediction is performed with either a feedback or feed-forward approach. In the case of feedback predictive adaptation, the adaptation is based on calculations involving the previous set of n samples; with feed-forward techniques, a buffer is needed to accumulate n samples before the coefficients can be computed (this, however, introduces a delay, because the sample values have to be accumulated) [2]. Values of n = 4 to n = 10 are used. For n ≥ 4, adaptive predictors achieve signal-to-noise ratios 3 or 4 dB better than their nonadaptive counterparts and more than 13 dB better than PCM. A basic realization of linear prediction can be found in differential PCM (DPCM) coding, where, rather than quantizing samples directly, the difference between adjacent samples is quantized. This results in one less bit being needed per sample compared to PCM, while maintaining the signal-to-noise ratio. Here, if y(j) is the value of a sample at time j for a PCM waveform, then the DPCM sample at time j is given by e(j), where
e(j) = y(j) − [a1y(j − 1) + a2y(j − 2) + . . . + apy(j − p)]

and where a1 is a scaling factor, while a2 = a3 = . . . = ap = 0; namely,
e(j) = y(j) − a1y(j − 1)

Further gains over PCM and DPCM are obtained by including adaptation. This is done either by incorporating adaptive quantization or by adjusting the scale factor (at syllabic rate), or both. The adaptive differential PCM (ADPCM) method uses these techniques. A simplified version of differential coding is found in a scheme called delta modulation (DM). DM is a first-order linear prediction in which the codeword is limited to one bit. A sign bit representing the direction of the difference between the input waveform and the accumulated output is stored at each sample time. The sign bits are used by the decoder to determine whether to increment or decrement the output waveform by one step size [2]. In a variant technique called adaptive delta modulation (ADM), the step size of the baseline DM is adapted according to a number of possible algorithms. The objective of these various algorithms is more accurate tracking of the input signal. This is accomplished by increasing the step size during periods of slope overload and decreasing it when slope overload is not occurring.
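The first-order DPCM loop above can be sketched as follows. The uniform residual step q and the function names are illustrative; note that the encoder tracks the decoder's reconstruction, not the true past samples, so the two stay in lockstep:

```python
def dpcm_encode(samples, a1=1.0, q=0.05):
    """First-order DPCM sketch: quantize e(j) = y(j) - a1*y(j-1).

    q is an illustrative uniform quantizer step for the residual.
    Returns integer residual codes.
    """
    codes, prev = [], 0.0
    for y in samples:
        e = y - a1 * prev
        c = round(e / q)                 # quantize the prediction error
        codes.append(c)
        prev = a1 * prev + c * q         # mirror the decoder's state
    return codes

def dpcm_decode(codes, a1=1.0, q=0.05):
    """Invert the encoder by accumulating quantized residuals."""
    out, prev = [], 0.0
    for c in codes:
        prev = a1 * prev + c * q
        out.append(prev)
    return out
```

Because adjacent speech samples are correlated, the residual e(j) has a smaller dynamic range than y(j) and can be coded with fewer bits, as the text argues.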
Vocoding (Analysis/Synthesis) in the Frequency Domain The waveform methods discussed previously relate to time-domain (signal amplitude versus time) representations of the speech signal. Another approach is to look at the signal in the frequency domain. A spectrum represents the frequency distribution of energy present in speech over a period of time. There are advantages to looking at the signal in this fashion: frequency-domain coders attempt to produce code of minimum datarate by exploiting the resonant characteristics of the vocal tract, and there is a great deal of information that can be extracted and exploited in the speech spectrum.
Formants Certain frequencies resonate within the vocal tract, depending on the tract's size and shape. Resonant frequencies appear in the spectrum as local maxima and are called formant frequencies or formants. The energy at these frequencies is reinforced when reflections of the wave coincide and additively build on each other; energy at other frequencies tends to dissipate. This results in the distinctive formants depicted in Figure 4.2.
Filters Filters are utilized to derive the frequency spectrum from the speech waveform. Traditionally, filters have used analog circuitry.
Figure 4.2 Example of formants. (Speech energy, dB, versus frequency, Hz; formants f1 through f7 appear as local maxima of the voiced-speech spectrum.)
The discrete Fourier transform (DFT) is a mathematical process for filtering waveforms digitally. Typically, DFTs are used to calculate correlation functions and to produce frequency spectra from discrete waveforms of finite length. The DFT divides the spectrum from 0 Hz through the sampling frequency (say, 4000 Hz) into n equal steps and provides an energy value for each [2]. Specifically, formant frequencies can be determined from the digital representation of a frequency spectrum. The result of DFT processing of the logarithm of a frequency spectrum is called a cepstrum, which is useful in the analysis process.
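The DFT and cepstrum definitions can be illustrated directly. A production system would use an FFT rather than this O(n²) DFT, and the names are illustrative:

```python
import cmath, math

def dft(x):
    """Naive discrete Fourier transform of a finite discrete waveform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def real_cepstrum(x):
    """Cepstrum sketch: transform of the log magnitude spectrum.

    The small offset avoids log(0); the real part is kept, since the
    log magnitude spectrum of a real signal is symmetric.
    """
    spectrum = dft(x)
    log_mag = [math.log(abs(v) + 1e-12) for v in spectrum]
    return [v.real / len(x) for v in dft(log_mag)]
```

The slowly varying vocal tract envelope and the rapidly varying pitch harmonics separate into different regions of the cepstrum, which is what makes it useful in the analysis process.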
Parametric Vocoders Parametric vocoders model speech production mechanisms rather than the resulting waveforms. They do so by taking advantage of the slow rate of change of the signals originating in the vocal tract, allowing one set of parameters to approximate the state over a period of up to about 25 ms. Most vocoders aim to characterize the frequency spectrum and the vocal tract excitation source (lungs and vocal cords) with only a small set of parameters. These parameters, called a data frame, include the following:

• About a dozen coefficients that define vocal tract resonance characteristics
• A binary parameter specifying whether the excitation source is voiced or unvoiced
• A value for the excitation energy
• A value for pitch (during voicing only)

The vocal tract state is approximated by analyzing the speech waveform every 10 to 25 ms and calculating a new set of parameters at the end of the period. A sequence of data frames is used remotely (or on playback from storage) to control synthesis of a mirror waveform. Because only a handful of parameters are transmitted, the voice datarate is low. One of the advantages of vocoders is that they often separate excitation parameters: pitch, gain, and voiced and unvoiced indications are carried individually in the data frame, so each of these variables can be modified separately before or during synthesis (see Figure 4.3). Vocoder datarates run from about 1200 to 8000 bps; the rate depends upon the frame rate, the number of parameters in the frame, and the accuracy with which each parameter is coded (see Table 4.1) [3]. As seen in Figure 4.3, there are excitation sources (voiced and unvoiced), loudness controls, and a vocal tract filter network. The excitation source for voiced speech consists of a periodic impulse generator and a pulse-shaping circuit. The impulse period adjusts to follow the original pitch according to the pitch frequency parameter being fed to it from the data frame.
The vocal tract filter network emulates resonance characteristics of the original vocal tract. The synthetic glottal waveform entering this section of the synthesizer is transformed to a speech
waveform approximating the original [2]. Different vocoder technologies have different filter network designs, as shown in Table 4.2.

Figure 4.3 Block diagram of a typical decoder.
Linear Predictive Coding Linear predictive coding (LPC) utilizes linear prediction methods. The term is applicable to those vocoding schemes that represent the excitation source parametrically and that use a higher-order linear predictor (n > 1). LPC analysis enjoys a number of desirable features in the estimation of such speech parameters as
Table 4.1 Datarate Requirements of Various Encoders

                            Minimum datarate, kbps
Coder                       Toll      Communications  Synthetic
                            quality   quality         quality
log-PCM                     56        36.6
Adaptive delta modulation   40        24.6
Adaptive DPCM               32        16.6
Subband coder               24        9.6
Adaptive predictive         16        7.2
Channel vocoder                                       2.4
LPC                                                   2.4
Formant vocoder                                       0.5

SOURCE: Reference [3].
Table 4.2 Vocal Tract Mechanism for Various Vocoders

Vocoder               Vocal tract mechanism
Formant vocoder       Reproduces the formants; a filter for each of the first few formants is included, then all higher formants are lumped into one final filter.
Channel vocoder       Network divides the spectrum into a number of bands.
Homomorphic vocoder   Uses calculation of a cepstrum every 10 or 20 ms for coding of both excitation and vocal tract parameters.
Phase vocoders        Considers the phase of a signal in addition to its magnitude in an attempt to achieve a lower datarate for the same voice quality.
LPC vocoders          Concatenated acoustic tubes (see text).
spectrum, formant frequencies, pitch, and other vocal tract measures. LPC analysis is conducted as a time-domain process. LPC coding produces a data frame at a rate of about 40 to 100 frames per second (lower frame rates produce lower-quality speech). As should be clear, the size of the frame depends on the number of coefficients (e.g., the order of the predictor) and the accuracy to which each of the parameters is quantized. It should be noted that speech synthesized from LPC coders is most sensitive to the first few coefficients; this, in turn, implies that the coefficients need not all be quantized with the same accuracy. The analog model that is solved by LPC is an approximation of the vocal tract (glottis and lips, but no nasal cavities) using concatenated acoustic tubes. If the number of cylinders in the model is appropriately selected, the frequency-domain mathematics of the concatenated-tubes problem approximately solves the vocal tract problem. LPC allows one to estimate frequency-domain acoustic tube parameters from the speech waveform, as follows. The LPC prediction coefficients obtained from the time-domain signal can be converted to reflection coefficients representing the set of concatenated tubes. This implies that with this methodology frequency-domain estimations that approximately describe the vocal tract can be obtained from time-domain data, using linear algebra. Specifically, the n prediction coefficients of an nth-order predictor can be calculated by solving a system of n linear equations in n unknowns; the n reflection coefficients that appear in the equations describing resonances in a concatenated acoustic tube of 0.5(n − 1) sections can be calculated from the n prediction coefficients. Hence, LPC analysis generates a set of reflection coefficients, excitation energy, a voiced/unvoiced indication bit, and the fundamental frequency (if the signal is voiced). This functionality is very similar to what is implied in Figure 4.3.
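The system of n linear equations mentioned above is commonly solved with the Levinson-Durbin recursion, which yields the reflection coefficients as a by-product. A sketch of the autocorrelation method follows (names are illustrative; windowing and numerical safeguards of a real coder are omitted):

```python
def lpc(samples, order):
    """Levinson-Durbin recursion, a sketch.

    Returns (prediction coefficients a_1..a_p, reflection coefficients
    k_1..k_p) estimated from the signal's autocorrelation.
    """
    n = len(samples)
    # Autocorrelation values r[0..order]
    r = [sum(samples[t] * samples[t + i] for t in range(n - i))
         for i in range(order + 1)]
    a = [0.0] * order
    k = []
    err = r[0]                     # prediction error energy
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        ki = acc / err             # i-th reflection coefficient
        k.append(ki)
        new_a = a[:]
        new_a[i] = ki
        for j in range(i):
            new_a[j] = a[j] - ki * a[i - 1 - j]
        a = new_a
        err *= (1 - ki * ki)       # error shrinks at every order
    return a, k
```

For a signal dominated by a single decaying resonance, a first-order fit recovers the decay factor, illustrating how the time-domain recursion exposes vocal tract structure.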
Residual-Excited Linear Prediction Residual-excited linear prediction (RELP) does not derive pitch, gain, and the voiced/unvoiced decision from the prediction residual, as is done in LPC. Instead, a filter network can be driven directly by the residual waveform. RELP is also
referred to as voice-excited linear prediction. Reflection coefficients are used (as in LPC) instead of prediction coefficients [2].
Vector Quantization Vector quantization (VQ) replaces a vector of information with a single value (or symbol) that represents a cluster of vectors that are close, based on some measure of distance. A vector may consist of a block of accumulated digital samples, a set of LPC reflection coefficients (with or without excitation parameters), or another frame or block of parameters. Given a set of vectors, K clusters can be defined in such a manner that each vector is a member of some cluster.4 Each cluster in its entirety can be represented in a codebook by one of its members or by some symbol or vector. The codebook contains K entries, one for each cluster. The clusters and codebook are chosen to best represent the original collection of vectors. At coding time, each vector is presented to the vector quantizer decision entity, which decides which cluster the vector belongs to, according to the same distance measure, and substitutes the appropriate symbol or value for the incoming vector. Here, quantization noise is measured by the distance between the codebook entry and the input vector [2]. Vector quantization methods have not yet seen wide-scale deployment.
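A minimal nearest-neighbor vector quantizer can be sketched as follows; the codebook is assumed to have been designed offline (e.g., by clustering training vectors), and the names are illustrative:

```python
def vq_encode(vectors, codebook):
    """Replace each vector with the index of its nearest codebook entry.

    Distance is squared Euclidean; the index is the transmitted symbol.
    """
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda i: d2(v, codebook[i]))
            for v in vectors]

def vq_decode(indices, codebook):
    """Substitute the codebook entry for each received index."""
    return [codebook[i] for i in indices]
```

Only log2(K) bits per vector are transmitted; the quantization noise is exactly the distance between each input vector and its chosen codebook entry, as the text notes.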
4.2 G.727: ADPCM for Packet Network Applications The International Telecommunication Union (ITU) is a specialized agency of the United Nations. Within the ITU-T (the sector dedicated to telecommunications), Study Group 15 (SG15) is charged with making recommendations related to speech and video processing, while Study Group 14 (SG14) makes recommendations for modems, such as V.34 and V.32. This section discusses G.727. Although vocoder technology is expected to enter the scene for voice over IP, many carriers and suppliers of carrier equipment are still looking at ADPCM technologies, even for voice over ATM using AAL 2, a development recent enough that it could conceivably have leapfrogged past ADPCM all the way to vocoder technology.
Introduction ITU-T Recommendation G.727 contains the specification of an embedded adaptive differential pulse code modulation (ADPCM) algorithm with 5, 4, 3, and 2 bits per sample (i.e., at rates of 40, 32, 24, and 16 kbps).5 The following characteristics are recommended for the conversion of 64-kbps A-law or µ-law PCM channels to or from variable-rate embedded ADPCM channels. The recommendation defines the transcoding law when the source signal is a pulse code modulation signal at a pulse rate of 64 kbps developed from voice frequency analog signals as specified in ITU-T G.711. Figure 4.4 [4] shows a simplified block diagram of the encoder and the decoder. Applications where the encoder is aware of how the ADPCM codeword bits have been altered but the decoder is not, where both are aware, or where neither is aware can benefit from other embedded ADPCM algorithms.
Figure 4.4 Simplified block diagrams of (a) the G.727 encoder and (b) the G.727 decoder. (From Reference [4].)
The embedded ADPCM algorithms specified in G.727 are extensions of the ADPCM algorithms defined in ITU-T G.726,6 and are recommended for use in packetized speech systems operating according to the Packetized Voice Protocol (PVP) specified in Draft Recommendation G.764. PVP is able to relieve congestion by modifying the size of a speech packet when the need arises. Utilizing the embedded property of the algorithm described here, the least significant bits of each codeword can be disregarded at packetization points and/or intermediate nodes to relieve congestion. This provides for significantly better performance than is achieved by dropping packets during congestion.
Embedded ADPCM Algorithms Embedded ADPCM algorithms are variable-bit-rate coding algorithms with the capability of bit dropping outside the encoder and decoder blocks. They consist of a series of algorithms such that the decision levels of the lower-rate quantizers are subsets of the quantizer at the highest rate. This allows bit reductions at any point in the network without the need for coordination between the transmitter and the receiver. In contrast, the decision levels of the conventional ADPCM algorithms, such as those in Recommendation G.726, are not subsets of one another; therefore, the transmitter must inform the receiver of the coding rate and the encoding algorithm. Embedded algorithms can accommodate the unpredictable and bursty characteristics of traffic patterns that require congestion relief. This might be the case in IP-like networks, or in ATM networks with early packet discard. Because congestion relief may occur after the encoding is performed, embedded coding differs from variable-rate coding, where the encoder and decoder must use the same number of bits in each sample. In both cases, however, the decoder must be told the number of bits to use in each sample. Embedded algorithms produce codewords that contain enhancement bits and core bits. The feed-forward (FF) path utilizes enhancement and core bits, while the feedback (FB) path uses core bits only. The inverse quantizer and the predictor of both the encoder and the decoder use the core bits. With this structure, enhancement bits can be discarded or dropped during network congestion.7 However, the number of core bits in the FB paths of both the encoder and decoder must remain the same to avoid mistracking. The four embedded ADPCM rates are 40, 32, 24, and 16 kbps, where the decision levels for the 32-, 24-, and 16-kbps quantizers are subsets of those for the 40-kbps quantizer.
Embedded ADPCM algorithms are referred to by (x, y) pairs, where x refers to the FF (enhancement and core) ADPCM bits and y refers to the FB (core) ADPCM bits. For example, if y is set to 2 bits, (3, 2) represents the 24-kbps embedded algorithm and (2, 2) the 16-kbps algorithm. The bit rate is never less than 16 kbps because the minimum number of core bits is 2. Simplified block diagrams of both the embedded ADPCM encoder and decoder are shown in Figure 4.4.
The G.727 recommendation provides coding rates of 40, 32, 24, and 16 kbps and core rates of 32, 24, and 16 kbps. This corresponds to the following pairs: (5, 2), (4, 2), (3, 2), (2, 2); (5, 3), (4, 3), (3, 3); and (5, 4), (4, 4).
ADPCM Encoder Subsequent to the conversion of the A-law or µ-law PCM input signal to uniform PCM, a difference signal is obtained by subtracting an estimate of the input signal from the input signal itself. An adaptive 4-, 8-, 16-, or 32-level quantizer is used to assign 2, 3, 4, or 5 binary digits to the value of the difference signal for transmission to the decoder. (Not all the bits necessarily arrive at the decoder, since some of them can be dropped to relieve congestion in the packet network. For a given received sample, however, the core bits are guaranteed to arrive, provided there are no transmission errors and the packets arrive at their destination.) FB bits are fed to the inverse quantizer. The number of core bits depends on the embedded algorithm selected; for example, the (5, 2) algorithm always contains 2 core bits. The inverse quantizer produces a quantized difference signal from these binary digits. The signal estimate is added to this quantized difference signal to produce the reconstructed version of the input signal. Both the reconstructed signal and the quantized difference signal are operated upon by an adaptive predictor that produces the estimate of the input signal, thereby completing the feedback loop.
ADPCM Decoder The decoder includes a structure identical to the FB portion of the encoder. In addition, there is an FF path that contains a uniform PCM to A-law or µ-law conversion. The core as well as the enhancement bits are used by the synchronous coding adjustment block to prevent cumulative distortion on synchronous tandem codings (ADPCM-PCM-ADPCM, etc., digital connections) under certain conditions. The synchronous coding adjustment is achieved by adjusting the PCM output codes to eliminate quantizing distortion in the next ADPCM encoding stage.
ADPCM Encoder Principles Figure 4.5 shows a block schematic of the encoder. For each variable to be described, k is the sampling index and samples are taken at 125-µs intervals. A description of each block is given in the subsections that follow.
Input PCM Format Conversion This block converts the input signal s(k) from A-law or µ-law PCM to a uniform PCM signal sI(k).
Figure 4.5 Encoder block schematic.
Difference Signal Computation This block calculates the difference signal d(k) from the uniform PCM signal sI(k) and the signal estimate se(k):
d(k) = sI(k) − se(k)
Adaptive Quantizer A 4-, 8-, 16-, or 32-level nonuniform midrise adaptive quantizer is used to quantize the difference signal d(k). Prior to quantization, d(k) is converted to a base-2 logarithmic representation and scaled by y(k), which is computed by the scale factor adaptation block. The normalized input/output characteristic (infinite precision values) of the quantizer is given in tables in the standard for the 16-, 24-, 32-, and 40-kbps algorithms, respectively (Table 4.3 depicts the normalized input/output characteristics for 40 kbps, for illustrative purposes). Two, three, four, or five binary digits are used to specify the quantized level representing d(k) (the most significant bit represents the sign bit and the remaining bits represent the magnitude). The 2-, 3-, 4-, or 5-bit quantizer output I(k) forms the 16-, 24-, 32-, or 40-kbps output signal and is also fed to the bit-masking block. I(k) includes both the enhancement and core bits.
Table 4.3 Quantizer Normalized Input/Output Characteristic for 40-kbps Embedded Operation

Normalized quantizer input range*   |I(k)|   Normalized quantizer output
log2|d(k)| − y(k)                            log2|dq(k)| − y(k)

(−∞, −1.05)                           0        −2.06
[−1.05, −0.05)                        1        −0.48
[−0.05, 0.54)                         2         0.27
[0.54, 0.96)                          3         0.76
[0.96, 1.30)                          4         1.13
[1.30, 1.58)                          5         1.44
[1.58, 1.82)                          6         1.70
[1.82, 2.04)                          7         1.92
[2.04, 2.23)                          8         2.13
[2.23, 2.42)                          9         2.33
[2.42, 2.60)                         10         2.51
[2.60, 2.78)                         11         2.69
[2.78, 2.97)                         12         2.87
[2.97, 3.16)                         13         3.05
[3.16, 3.43)                         14         3.27
[3.43, ∞)                            15         3.56

*[ indicates that the endpoint value is included in the range, and ( or ) indicates that the endpoint value is excluded from the range.
Bit Masking This block produces the core bits Ic(k) by logically right-shifting the input signal I(k) so as to mask the maximum droppable (least-significant) bits. The number of bits to mask and the number of places to right-shift depend on the embedded algorithm selected. For example, this block will mask the two least-significant bits (LSBs) and shift the remaining bits two places to the right when the (4, 2) algorithm is selected. The output of the bit-masking block Ic(k) is fed to the inverse adaptive quantizer, the quantizer scale factor adaptation, and the adaptation speed control blocks.
Inverse Adaptive Quantizer The inverse quantizer uses the core bits to compute a quantized version dq(k) of the difference signal, using the scale factor y(k) and the tables alluded to previously (e.g., Table 4.3) and then taking the antilog to the base 2 of the result. The signal estimate se(k) is added to dq(k) to produce the reconstructed version sr(k) of the input signal. The tables previously alluded to are applicable only when there are specific bits (e.g., 5 bits for Table 4.3) in the FF path.
Quantizer Scale Factor Adaptation This block computes y(k), the scaling factor for the quantizer and for the inverse quantizer. [The scaling factor y(k) is also fed to the adaptation speed control block.] The inputs are the bit-masked output Ic(k) and the adaptation speed control parameter aI(k). The basic principle used in scaling the quantizer is bimodal adaptation: fast for signals (e.g., speech) that produce difference signals with large fluctuations, and slow for signals (e.g., voiceband data and tones) that produce difference signals with small fluctuations. The speed of adaptation is controlled by a combination of fast and slow scale factors. The fast (unlocked) scale factor yu(k) is recursively computed in the base-2 logarithmic domain from the resultant logarithmic scale factor y(k):
yu(k) = (1 − 2^−5) y(k) + 2^−5 W[Ic(k)]

where 1.06 ≤ yu(k) ≤ 10.00. For 2-core-bit operation (1 sign bit), the discrete function W[Ic(k)] is defined as in Table 4.4a. For 3-core-bit operation (1 sign bit), it is defined as in Table 4.4b. For 4-core-bit operation (1 sign bit), it is defined as in Table 4.4c.
Table 4.4 Values of W[Ic(k)]

(a) 2-core-bit operation:
|Ic(k)|      1       0
W[Ic(k)]    27.44   −1.38

(b) 3-core-bit operation:
|Ic(k)|      3       2      1      0
W[Ic(k)]    36.38   8.56   1.88   −0.25

(c) 4-core-bit operation:
|Ic(k)|      7       6       5       4      3      2      1      0
W[Ic(k)]    69.25   21.25   11.50   6.13   3.13   1.69   0.25   −0.75
The factor (1 − 2^−5) introduces finite memory into the adaptive process so that the states of the encoder and decoder converge following transmission errors. The slow (locked) scale factor y1(k) is derived from yu(k) with a low-pass filter operation:

y1(k) = (1 − 2^−6) y1(k − 1) + 2^−6 yu(k)

The fast and slow scale factors are then combined to form the resultant scale factor:

y(k) = a1(k) yu(k − 1) + [1 − a1(k)] y1(k − 1)

where 0 ≤ a1(k) ≤ 1.
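A simplified sketch of the bimodal scale-factor update for 2-core-bit operation follows. The W values are those of Table 4.4a; the update ordering is slightly simplified relative to the recommendation, so this is illustrative rather than bit-exact:

```python
W2 = {1: 27.44, 0: -1.38}  # W[|Ic|] for 2-core-bit operation (Table 4.4a)

def scale_factor_step(yu, yl, ic_mag, al):
    """One simplified update of the bimodal scale factor.

    yu, yl: previous fast (unlocked) and slow (locked) log scale factors;
    ic_mag: |Ic(k)| (0 or 1); al: speed-control parameter in [0, 1].
    Returns (resultant y, new fast factor, new slow factor).
    """
    y = al * yu + (1.0 - al) * yl                      # combine fast and slow
    yu_new = (1 - 2 ** -5) * y + 2 ** -5 * W2[ic_mag]  # fast update
    yu_new = min(10.00, max(1.06, yu_new))             # limit the fast factor
    yl_new = (1 - 2 ** -6) * yl + 2 ** -6 * yu_new     # low-pass (locked)
    return y, yu_new, yl_new
```

Large quantizer outputs (speech-like fluctuation) push the fast factor up; small outputs (tone-like signals) pull it down, which is the bimodal behavior the text describes.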
Adaptation Speed Control The controlling parameter a1(k) can assume values in the range [0, 1]. It tends toward unity for speech signals and toward zero for voiceband data signals. It is derived from a measure of the rate of change of the difference signal values. Two measures of the average magnitude of Ic(k) are computed:
dms(k) = (1 − 2^−5) dms(k − 1) + 2^−5 F[Ic(k − 1)]

and
dm1(k) = (1 − 2^−7) dm1(k − 1) + 2^−7 F[Ic(k − 1)]

where F[Ic(k)] is defined as follows.

For 2-core-bit (1 sign bit) operation:

|Ic(k)|     1   0
F[Ic(k)]    7   0

For 3-core-bit (1 sign bit) operation:

|Ic(k)|     3   2   1   0
F[Ic(k)]    7   2   1   0

For 4-core-bit (1 sign bit) operation:

|Ic(k)|     7   6   5   4   3   2   1   0
F[Ic(k)]    7   3   1   1   1   0   0   0

Thus, dms(k) is a relatively short-term average of F[Ic(k)] and dm1(k) is a relatively long-term average of F[Ic(k)]. Using these two averages, the variable ap(k) is defined:
ap(k) =
(1 − 2^−4) ap(k − 1) + 2^−3   if |dms(k) − dm1(k)| ≥ 2^−3 dm1(k)
(1 − 2^−4) ap(k − 1) + 2^−3   if y(k) < 3
(1 − 2^−4) ap(k − 1) + 2^−3   if td(k) = 1
1                             if tr(k) = 1
(1 − 2^−4) ap(k − 1)          otherwise
Thus, ap(k) tends toward the value 2 if the difference between dms(k) and dm1(k) is large [average magnitude Ic(k) changing], for an idle channel [indicated by y(k) < 3] or for partial band signals [indicated by td(k) = 1 as described following]. The value of ap(k) tends toward the value 0 if the difference is small [average magnitude of Ic(k) relatively constant]. Note that ap(k) is set to 1 upon detection of a partial band signal transition [indicated by tr(k) = 1]. ap(k − 1) is then limited to yield the a1(k) used in the “Quantizer Scale Factor Adaptation” subsection.
a1(k) = 1 if ap(k − 1) > 1; a1(k) = ap(k − 1) if ap(k − 1) ≤ 1
This asymmetrical limiting has the effect of delaying the start of a fast- to slow-state transition until the absolute value of Ic(k) remains constant for some time. This tends to eliminate premature transitions for pulsed input signals, such as switched carrier voiceband data.
Adaptive Predictor and Feedback Reconstructed Signal Calculator The primary function of the adaptive predictor is to compute the signal estimate se(k) from the quantized difference signal dq(k). Two adaptive predictor structures are used, a sixth-order section that models zeros and a second-order section that models poles in the input signal. This dual structure effectively caters to the variety of input signals that might be encountered. The signal estimate is computed by
se(k) = a1(k − 1)sr(k − 1) + a2(k − 1)sr(k − 2) + sez(k) where
sez(k) = b1(k − 1)dq(k − 1) + b2(k − 1)dq(k − 2) + b3(k − 1)dq(k − 3) + b4(k − 1)dq(k − 4) + b5(k − 1)dq(k − 5) + b6(k − 1)dq(k − 6) and the reconstructed signal is defined as
sr(k − i) = se(k − i) + dq(k − i) Both sets of predictor coefficients are updated using a simplified gradient algorithm, as shown in Figure 4.6.
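The pole-zero signal estimate can be sketched directly from the equations above; the history buffers and names are illustrative:

```python
def signal_estimate(a, b, sr_hist, dq_hist):
    """Compute se(k) from the two-pole and six-zero predictor sections.

    a: (a1, a2) pole coefficients; b: (b1..b6) zero coefficients;
    sr_hist[0] is sr(k-1), sr_hist[1] is sr(k-2);
    dq_hist[i-1] is dq(k-i).
    """
    sez = sum(b[i] * dq_hist[i] for i in range(6))   # six-zero section
    return a[0] * sr_hist[0] + a[1] * sr_hist[1] + sez
```

The two-pole section tracks resonant (pole-like) structure in the input while the six-zero section handles zero-like structure, which is why the dual arrangement caters to such a variety of input signals.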
Tone and Transition Detector In order to improve performance for signals originating from frequency shift keying (FSK) modems operating in the character mode, a two-step detection process is defined. First, partial band signal (e.g., tone) detection is invoked so that the quantizer can be driven into the fast mode of adaptation:
td(k) = 1    if a2(k) < −0.71875
td(k) = 0    otherwise
In addition, a transition from a partial band signal is defined so that the predictor coefficients can be set to zero and the quantizer can be forced into the fast mode of adaptation [tr(k), defined following the predictor equations below].
Voice Technologies for Packet-Based Voice Applications
Figure 4.6 Gradient algorithms for updating prediction coefficients.
For the second-order predictor:

a1(k) = (1 − 2^−8)a1(k − 1) + (3 · 2^−8) sgn [p(k)] sgn [p(k − 1)]

a2(k) = (1 − 2^−7)a2(k − 1) + 2^−7{sgn [p(k)] sgn [p(k − 2)] − f [a1(k − 1)] sgn [p(k)] sgn [p(k − 1)]}

where

p(k) = dq(k) + sez(k)

f(a1) = 4a1          if |a1| ≤ 2^−1
f(a1) = 2 sgn (a1)   if |a1| > 2^−1
sgn [0] = 1, except that sgn [p(k − i)] is defined to be 0 only if p(k − i) = 0 and i = 0, with the stability constraints |a2(k)| ≤ 0.75 and |a1(k)| ≤ 1 − 2^−4 − a2(k).

If tr(k) = 1, then a1(k) = a2(k) = 0.
For the sixth-order predictor:

bi(k) = (1 − 2^−8)bi(k − 1) + 2^−7 sgn [dq(k)] sgn [dq(k − i)]

for i = 1, 2, . . . , 6. If tr(k) = 1, then b1(k) = b2(k) = . . . = b6(k) = 0.
As above, sgn [0] = 1, except sgn [dq(k − i)] is defined to be 0 only if dq(k − i) = 0 and i = 0. Note that bi(k) is implicitly limited to ±2.
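The leaky sign-sign update of the sixth-order coefficients can be sketched as follows. This is a floating-point illustration: the Recommendation's fixed-point formats, the implicit saturation to ±2, and the special zero cases of the sign function are omitted here.

```python
def sgn(x):
    # sgn[0] = 1 by the algorithm's convention; the special zero
    # cases for sgn[dq(k - i)] are omitted in this sketch.
    return 1.0 if x >= 0 else -1.0

def update_b(b, dq_hist, dq_k, tr=0):
    """Leaky sign-sign gradient update of b1..b6.
    b: current coefficients; dq_hist: [dq(k-1), ..., dq(k-6)]."""
    if tr:                        # partial band transition: reset predictor
        return [0.0] * 6
    leak = 1 - 2 ** -8            # leakage pulls coefficients toward zero
    step = 2 ** -7
    return [leak * bi + step * sgn(dq_k) * sgn(dq_i)
            for bi, dq_i in zip(b, dq_hist)]

b = update_b([0.0] * 6, [1.0, -1.0, 2.0, -2.0, 3.0, -3.0], 1.0)
b_reset = update_b(b, [1.0] * 6, 1.0, tr=1)   # tr(k) = 1 zeroes b1..b6
```

Each coefficient moves a fixed small step toward the sign agreement of dq(k) with its delayed value, while the leakage term keeps the adaptation stable.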
The partial band signal transition referenced above is defined by

tr(k) = 1    if a2(k) < −0.71875 and |dq(k)| > 24 · 2^yl(k)
tr(k) = 0    otherwise
ADPCM Decoder Principles

Figure 4.7 shows a block schematic of the decoder. There is a feedback (FB) path and a feed-forward (FF) path. The FB path uses the core bits to calculate the signal estimate. The FF path contains the core and enhanced bits and reconstructs the output PCM code word.
Figure 4.7 Decoder block schematic. [Blocks shown: feedback bit masking of the ADPCM input; feed-forward and feedback inverse adaptive quantizers, producing dq(k)FF and dq(k)FB; feed-forward and feedback reconstructed signal calculators, producing sr(k)FF and sr(k)FB; the adaptive predictor, producing se(k); quantizer scale factor adaptation, producing y(k) and yl(k); adaptation speed control, producing al(k); the tone and transition detector, producing tr(k) and td(k); output PCM format conversion, producing sp(k); and synchronous coding adjustment, producing sd(k).]
4.3 Example of Application

For intercontinental connections, the use of ADPCM at 32 or 40 kbps for improved voice transmission efficiency has become commonplace. ITU-T standards for ADPCM support about the same bandwidth as PCM but provide a reduced SNR: about 21 dB at 32 kbps (G.721), or about 28 dB at 40 kbps (G.726). Proprietary 32-kbps ADPCM encoders/decoders (codecs) that support a reduced bandwidth of less than 3200 Hz at an SNR of about 28 dB are also in common use [5]. Use of this technology over IP networks is also possible, although not all that common.
References

1. D. Minoli and E. Minoli. Delivering Voice over Frame Relay and ATM. New York: Wiley, 1998.
2. G. E. Pelton. Voice Processing. New York: McGraw-Hill, 1993.
3. J. Bellamy. Digital Telephony. New York: Wiley, 1982.
4. ITU-T Recommendation G.727: 5-, 4-, 3-, and 2-bit Sample Embedded Adaptive Differential Pulse Code Modulation (ADPCM). Geneva: ITU, 1990.
5. G. D. Forney et al. “The V.34 High-Speed Modem Standard.” IEEE Communications Magazine (December 1996): 28 ff.
Notes

1. This function is applicable when w > 0. A piecewise linear approximation to the function can be utilized that is valid both for the value zero and for negative values.
2. Devices that use this technique are referred to as adaptive predictive coders (APCs).
3. Alternatively, one can achieve a higher signal-to-noise ratio with the same number of bits.
4. Membership in a cluster is specified by some rule, typically an n-dimensional distance measure in vector space.
5. This section is based on the ITU-T Recommendation G.727. This material is for pedagogical purposes only. Developers, engineers, and readers requiring more information should acquire the recommendation directly from the ITU-T [4].
6. The reader may wish to consult the companion Wiley text [1] for a description of ITU-T G.726.
7. In the anticipated application with G.764, the Coding Type (CT) field and the Block Dropping Indicator (BDI) fields in the packet header defined in G.764 will inform the coder of what algorithm to use. For all other applications, the information that PVP supplies must be made known to the decoder.
Chapter 5
Technology and Standards for Low-Bit-Rate Vocoding Methods

5.1 Introduction
As noted in the previous chapter, during the past quarter century there has been a significant level of research and development in the area of vocoder technology and compressed speech. During the early to mid-1990s, the ITU-T (specifically SG14 and SG15) standardized several vocoders that are applicable to low-bit-rate multimedia communications in general, and to voice over IP Internet and intranet applications in particular. Standardization is critical for interoperability and assurance of ubiquitous end-to-end connectivity. The recent standards are G.728, G.729, G.729A, and G.723.1. For some applications, the dominant factor is cost; for other applications, quality is paramount. This is part of the reason why several standards have evolved in the recent past. For completeness, Table 5.1 depicts the various standards that are available. However, to be ultimately successful, voice over IP will have to narrow down to one choice so that anyone can call anyone else (as is done today with modems and telephones) without worrying about what technology the destination party may be using. The winner will likely be the algorithms that will be bundled with an operating system, such as Windows 97, or a popular business application. The vocoders discussed in this chapter require between 10 and 20 million instructions per second (MIPS). In contemplating running these on a desktop PC,
Table 5.1 ITU-T Speech Coding Standards

Standard   Description
G.711      64-kbps pulse code modulation (PCM) (both A-law and µ-law)
G.722      Wideband vocoder operating at 64, 56, or 48 kbps
G.726      ADPCM vocoder recommendation that folds in G.721 and G.723
G.727      Embedded ADPCM operating at 40, 32, 24, or 16 kbps (see Chapter 4)
G.728      16-kbps low-delay code-excited linear prediction vocoder (LD-CELP)
G.729      8-kbps conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)
G.723.1    Low-bit-rate vocoder for multimedia communications operating at 6.3 and 5.3 kbps [this vocoder standard number has an extension (.1) because all the numbers in the G series had already been used]
it is worth noting that a 33-MHz 80486 ran at 27 MIPS; a 266-MHz Pentium II runs at 560 MIPS; and the successor to the Pentium Pro is expected to operate at 300 MHz and achieve 700 MIPS. There is also the expectation that by the year 2000, 500-MHz machines will deliver 1000 MIPS.

Network capacity does not grow at the same rapid pace as Moore’s law, which says, in effect, that the power of the microprocessor goes up by an order of magnitude every five years (in fact, it has been documented by the senior author that usable aggregate network speed has historically been going up by an order of magnitude every 20 years). Corporate enterprise networks and intranets are chronically congested. Hence, these observations seem to imply that the only way to really have voice over IP take off is to trade off high desktop computational power for compressing speech down to the lowest possible rates to keep congestion low, but to do this without compromising the delay budget. (This could even mean developing higher-complexity algorithms with less end-to-end delay; however, these would not be applicable to mobile and cellular applications.)

This discussion focuses principally on G.729, G.729A, and G.723.1; G.728 is also covered, but its data rate (16 kbps) may be too high for IP applications. ITU-T G.729 is an 8-kbps conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP) speech algorithm providing good speech quality. The algorithm has a 15-ms algorithmic codec delay. G.729 was originally designed for wireless environments, but it is applicable to IP and multimedia communications as well. Annex A of Recommendation G.729 (also called G.729A) describes a reduced-complexity version of the algorithm that has been designed explicitly for integrated voice and data applications [called Digital Simultaneous Voice and Data (DSVD)] that are prevalent in small office and home office (SOHO) low-bit-rate multimedia communications.
These vocoders use the same bitstream format and can interoperate with one another.1 The ITU Recommendation G.723.1 is a 6.3- and 5.3-kbps vocoder for multimedia communications that was originally designed for low-bit-rate videophones.
The algorithm’s frame size is 30 ms and the one-way codec delay is 37.5 ms. In applications where low delay is important, the delay in G.723.1 may not be tolerable; however, if the delay is tolerable, G.723.1 provides a lower-complexity, lower-bandwidth alternative to G.729, at the expense of a small degradation in speech quality. Each of these three ITU Recommendations has the potential to become a key commercial mechanism for voice over IP on the Internet and other networks, since all three are low-bandwidth and are simple enough in complexity to be executed on the host processor, such as a PC, or be implemented on a modem chip. Hence, this chapter examines these standards in some level of detail.
Overview

As noted in Chapter 4, the design goal of vocoders is to reduce the bit rate of speech for transmission or storage, while maintaining a quality level acceptable for the application at hand. On intranets and the Internet, voice applications may be standalone or multimedia-based. Since multimedia implies the presence of a number of media, speech coding for such applications implies that the speech bitstream shares the communication link with other signals. Some such applications include the following:

• Simultaneous voice and video, for example, a videophone, stored video presentation, and so forth
• Digital simultaneous voice and data (DSVD) whiteboarding applications, where the data stream could be transmission of shared files that the parties are developing, discussing, creating, updating, or synthesizing
• Simultaneous voice and fax, where a copy of a document is transmitted from one person to a group of one or more recipients

In principle, the use of a uniquely specified vocoder might be desirable. Unfortunately, short-term local optimization considerations have led developers to the conclusion that it is more economical to tailor the vocoder to the application at hand. Consequently, a number of new vocoders were standardized during the mid-1990s. Specifically, three new international standards (ITU-T G.729, G.729A, and G.723.1) and three new regional standards (enhanced full-rate vocoders for North American and European mobile systems) have emerged of late. As a consequence of this overabundance of standards, making an appropriate choice can be challenging. Vocoder attributes can be used to make trade-off analyses during the vocoder selection process that the developer of an intranet or Internet multimedia or telephony application (i.e., speech bandwidth of 200 to 3400 Hz sampled at 8 kHz) needs to undertake.
128
Chapter Five
Vocoder Attributes

Vocoder speech quality is a function of bit rate, complexity, and processing delay. Developers of intranet and Internet telephony products must review all these attributes. There usually is a strong interdependence between all these attributes, and they may have to be traded off against each other. For example, low-bit-rate vocoders tend to have more delay than higher-bit-rate vocoders. Low-bit-rate vocoders also require higher VLSI complexity to implement. As might be expected, low-bit-rate vocoders often have lower speech quality than the higher-bit-rate vocoders.2
Bit Rate

Since the vocoder is sharing the access communications channel or the likely overloaded enterprise network or Internet with other information streams, the peak bit rate should be as low as possible, particularly for SOHO applications. Today, most vocoders operate at a fixed bit rate regardless of the input signal characteristics; however, the goal is to make the vocoder variable-rate. For simultaneous voice and data applications, a compromise is to create a silence compression algorithm (see Table 5.2) as part of the coding standard. A common solution is to use a fixed rate for active speech and a low rate for background noise [1]. The performance of the silence compression mechanism is critical to speech quality: If speech is declared too often, the gains of silence compression are not realized. The challenge is that with loud background noises it may be difficult to distinguish between speech and noise. Another problem is that if the silence compression mechanism fails to recognize the onset of speech, the beginning of the speech will be cut off; this front-end clipping significantly impairs the intelligibility of the coded speech. The comfort noise generation mechanism must be designed in such a way that the encoder and decoder remain synchronized, even when there are no bits transmitted during some intervals. This allows for smooth transitions between active and nonactive speech segments.
Table 5.2 Silence Compression Algorithms

Voice activity detector (VAD): Determines if the input signal is speech or background noise. If the signal is declared to be speech, it is coded at the full fixed bit rate; if the signal is declared to be noise, it is coded at a lower bit rate. As appropriate, no bits are transmitted.

Comfort noise generation (CNG): Mechanism invoked at the receiver end to reconstruct the main characteristic of the background noise.
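The VAD idea in Table 5.2 can be illustrated with a toy energy-threshold detector. Real VADs use several signal features and adaptive thresholds; the fixed threshold and sample values here are arbitrary illustrations:

```python
def vad(frame, threshold=1000.0):
    """Toy VAD: declare speech when mean-square frame energy
    exceeds a fixed threshold (illustrative only)."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold

speech_like = [500.0, -400.0, 300.0, -450.0]   # large-amplitude samples
noise_like = [5.0, -3.0, 4.0, -2.0]            # low-level background noise

# Frames classified as speech would be coded at the full fixed rate;
# noise frames would be coded at a lower rate (or not transmitted at
# all), with CNG at the receiver filling in matching background noise.
```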
Delay

The delay in a speech coding system usually consists of three major components:

• Frame delay
• Speech processing delay
• Bridging delay

Typically, low-bit-rate vocoders process a frame of speech data at a time, so that the speech parameters can be updated and transmitted for every frame. Hence, before the speech can be analyzed, it is necessary to buffer a frame’s worth of speech samples. The resulting delay is called algorithmic delay. It is sometimes necessary to analyze the signal beyond the frame boundary (this is referred to as look-ahead); here, additional speech samples need to be buffered, with additional concomitant delay. Note that this is the only implementation-independent delay (other delay components depend on the specific implementation, e.g., how powerful the processor used to run the algorithm is, the kind of RAM used, etc.). Algorithmic delays are unavoidable; hence, they need to be considered as part of the delay budget by the planner.

The second major component of the delay originates from the processing time it takes the encoder to analyze the speech and the processing time required by the decoder to reconstruct the speech. This processing delay depends on the speed of the hardware used to implement the vocoder. The combined algorithmic and processing delay is called the one-way system delay. The maximum tolerable value for the one-way system delay is 400 ms if there are no echoes, but for ease and efficiency of communication it is preferable to have the one-way delay below 200 ms. If there are echoes, the tolerable one-way delay is 20 to 25 ms; therefore, the use of echo cancellation is often necessary.

In applications such as teleconferencing, it may be necessary to bridge several callers with a multipoint control unit (MCU) in order to allow each person to communicate with the others. This requires decoding each bitstream, summing the decoded signals, and then reencoding the combined signal.
This process doubles the delay and at the same time it reduces the speech quality because of the multiple (tandem) encodings. Given the previous observation, a bridged system can tolerate a maximum one-way delay of 100 ms, because the bridging will result in the doubling of the one-way system delay to 200 ms.
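The delay-budget bookkeeping described above can be illustrated with the G.729 figures quoted later in this chapter (10-ms frame, 5-ms look-ahead); the processing and transmission values below are assumptions for illustration:

```python
# Illustrative one-way delay budget for a frame-based vocoder.
frame_ms = 10.0            # one frame must be buffered before analysis
look_ahead_ms = 5.0        # samples needed beyond the frame boundary
processing_ms = 10.0       # encoder + decoder compute time (assumed)
transmission_ms = 10.0     # link delay (assumed)

algorithmic_ms = frame_ms + look_ahead_ms           # implementation-independent part
one_way_ms = algorithmic_ms + processing_ms + transmission_ms
bridged_ms = 2 * one_way_ms                         # MCU decodes, sums, re-encodes
```

This is why a bridged system can tolerate at most a 100-ms one-way delay: bridging doubles whatever one-way figure the codec and network produce.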
Algorithm’s Complexity

Vocoders are often implemented on DSP hardware. Complexity can be measured in terms of computing speed in MIPS, of random access memory (RAM), and of read-only memory (ROM). Complexity determines cost; hence, in selecting a vocoder for an application, the developer must make an appropriate choice. When
the vocoder shares a processor with other applications, the developer must decide how much of these resources to allocate to the vocoder. Vocoders utilizing less than 15 MIPS are considered to be low-complexity; those using 30 MIPS or more are considered to be high-complexity. As noted, increased complexity results in higher costs and greater power usage. Power usage is an important consideration in portable applications, since greater power usage implies reduced time between battery recharges or the necessity of using larger batteries, which, in turn, means more expense and weight.
Quality

The measure used in comparisons is how good the speech sounds under ideal conditions—namely, clean speech, no transmission errors, and only one encoding (note, however, that in the real world these ideal conditions are often not met because there can be large amounts of such background noise as street noise, office noise, air conditioning noise, etc.). Table 5.3 shows the quality of the major coding schemes being utilized in voice over data networks. How well the vocoder performs under adverse conditions (e.g., what happens when there are channel errors or the loss of entire frames; how good the vocoder sounds when the speech is encoded and decoded in tandem, as is the case in a bridging application; how good it sounds when transcoding with another standard vocoder; how it sounds for a variety of languages) is the question that the standards bodies try to answer during the testing phase of the standards drafting and generation process.
Linear Prediction Analysis-by-Synthesis (LPAS) Coding

The ITU-T Recommendations G.723.1, G.728, and G.729 belong to a class of linear prediction analysis-by-synthesis (LPAS) vocoders. Code-excited linear predictive (CELP) vocoders are the most common realization of the LPAS technique. Figure 5.1 shows a block diagram of an LPAS vocoder.
Table 5.3 Quality of Coding Schemes

Algorithm      Rate, kbps   Quality   Complexity
G.723.1        5.3–6.3      Good      Highest
G.729/G.729A   8            Good      High
G.728          16           Good      Lower
G.726/G.727    32           Good      Low
G.711          64           Good      Lowest

Basic Mechanisms

Decoded speech is produced by filtering the signal produced by the excitation generator through both a long-term (LT) predictor synthesis filter and a short-term
Figure 5.1 Block diagram of an LPAS vocoder. [The figure shows an excitation generator driving a long-term predictor synthesizer and then a short-term predictor synthesizer; the synthesized signal is subtracted from the input speech, the difference is passed through a weighting filter, and a minimization procedure adjusts the excitation.]
(ST) predictor synthesis filter. The excitation signal is found by minimizing the mean-squared-error signal (the difference between the original and the decoded signal) over a block of samples.3 It is weighted by filtering it through an appropriate filter. Both ST and LT predictors are adapted over time. Since the encoder analysis procedure includes the decoder synthesis procedure, the description of the encoder also defines the decoder.

The ST synthesis filter models the short-term correlations in the speech signal. This is an all-pole filter with an order between 8 and 16. The predictor coefficients of the short-term predictor are adapted in time, with rates varying from 30 to as high as 400 times per second.

The LT predictor filter models the long-term correlations in the speech signal. Its parameters are a delay and a gain coefficient. For periodic signals, the delay corresponds to the pitch period (or possibly to an integral number of pitch periods); for nonperiodic signals the delay is random. Typically, the long-term predictor coefficients are adapted at rates varying from 100 to 200 times per second [1].

A frequently used alternative for the pitch filter is the adaptive codebook. Here, the LT synthesis filter is replaced by a codebook that contains the previous excitation at different delays. These vectors are searched, and the one that provides the best match is selected. To simplify the determination of the excitation for delays smaller than the length of the excitation frames, an optimal scaling factor can be determined for the selected vector.

To achieve a low bit rate, the average number of bits per sample for each frame of excitation samples must be kept small. The multipulse excitation vocoder represents the excitation as a sequence of pulses located at nonuniformly spaced intervals. The excitation analysis procedure determines both amplitudes and positions of the pulses.
Finding these parameters all at once is a difficult problem, and simpler procedures, such as determining locations and amplitudes one pulse at a time, are typically used. The number of pulses required for an acceptable speech quality varies from four to six pulses every 5 ms. For each pulse, both amplitude and location have to be transmitted, requiring about 7 or 8 bits per pulse [1].
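The adaptive-codebook (pitch) search mentioned earlier can be sketched as follows. This simplification uses integer delays only, no interpolation or sub-sample lags, and the excitation history, target, and lag range are made up for illustration:

```python
def adaptive_codebook_search(past_exc, target, min_lag, max_lag):
    """Search integer delays over the past excitation; return the
    (lag, gain) minimizing the squared error against the target.
    Assumes min_lag >= len(target) so each candidate is wholly past."""
    n = len(target)
    best_lag, best_gain, best_err = min_lag, 0.0, float("inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(past_exc) - lag
        cand = past_exc[start:start + n]      # excitation at this delay
        energy = sum(c * c for c in cand)
        if energy == 0.0:
            continue
        corr = sum(c * t for c, t in zip(cand, target))
        gain = corr / energy                  # optimal scaling factor
        err = sum((t - gain * c) ** 2 for t, c in zip(target, cand))
        if err < best_err:
            best_lag, best_gain, best_err = lag, gain, err
    return best_lag, best_gain

# Period-5 pulse train as the stored past excitation; the target
# continues the pattern at twice the amplitude.
past_exc = [1.0 if i % 5 == 0 else 0.0 for i in range(20)]
target = [2.0, 0.0, 0.0, 0.0]
lag, gain = adaptive_codebook_search(past_exc, target, min_lag=4, max_lag=15)
```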
Code-excited linear predictive vocoders approach the issue of reducing the number of bits per sample as follows: Both encoder and decoder store the same collection of C possible sequences of length L in a codebook, and the excitation for each frame is described by the index to an appropriate vector in the codebook. This index is typically found by conducting an exhaustive search of the codebook vectors and identifying the one that produces the smallest error between the original and decoded signals. To simplify the search procedure, many implementations use a gain-shape codebook where the gain is searched and quantized separately. The index requires (log2 C)/L bits per sample, typically 0.2 to 2 bits per sample, and the gain requires 2 to 5 bits for each codebook vector.

Algebraic code-excited linear prediction (ACELP) introduces a further simplification by populating the codebook vectors with a multipulse structure: By using only a few nonzero unit pulses in each codebook vector, the search procedure can be sped up. The partitioning of the excitation space is known as an algebraic codebook, hence the name of the vocoder.
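A gain-shape codebook search of this kind can be sketched as follows. The four-vector codebook and target below are toy values; a real codebook holds hundreds of vectors, and the gain itself would also be quantized:

```python
import math

def codebook_search(codebook, target):
    """Exhaustive gain-shape search: return the (index, gain) of the
    shape vector minimizing the squared error against the target."""
    best_i, best_gain, best_err = 0, 0.0, float("inf")
    for i, shape in enumerate(codebook):
        energy = sum(s * s for s in shape)
        if energy == 0.0:
            continue
        gain = sum(s * t for s, t in zip(shape, target)) / energy
        err = sum((t - gain * s) ** 2 for t, s in zip(target, shape))
        if err < best_err:
            best_i, best_gain, best_err = i, gain, err
    return best_i, best_gain

codebook = [[1.0, 1.0, 1.0, 1.0],      # C = 4 shapes of length L = 4
            [1.0, -1.0, 1.0, -1.0],
            [1.0, 1.0, -1.0, -1.0],
            [1.0, -1.0, -1.0, 1.0]]
index, gain = codebook_search(codebook, [3.0, -3.0, 3.0, -3.0])

# The index cost per sample from the text: (log2 C)/L bits.
bits_per_sample = math.log2(len(codebook)) / len(codebook[0])
```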
Error-Weighting Filter

The mean-squared-error minimization approach described in the preceding discussion results in quantization noise that has equal energy across the spectrum of the input signal. However, by making use of properties of the human auditory system, the vocoder designer can focus on reducing the perceived amount of noise. It has been found that greater amounts of quantization noise are undetectable in the frequency bands where the speech signal has high energy. Namely, the designer wants to shape the noise as a function of the spectral peaks in the speech signal. To put this masking effect to work in the vocoder design, the quantization noise has to be properly distributed among different frequency bands. This can be achieved by minimizing a weighted error derived from the short-term predictor filter.
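The text does not spell out the weighting filter's form; one common CELP-style choice (an assumption here, not taken from this chapter) is W(z) = A(z/g1)/A(z/g2), obtained by bandwidth-expanding the short-term predictor polynomial A(z) with factors g1 > g2, which concentrates the quantization noise under the spectral peaks. A sketch with made-up coefficients:

```python
def bandwidth_expand(a, gamma):
    """Scale LPC polynomial coefficients a_i -> a_i * gamma**i (a[0] == 1),
    i.e., form A(z/gamma) from A(z)."""
    return [ai * gamma ** i for i, ai in enumerate(a)]

def filter_iir_fir(x, num, den):
    """Direct-form filter: y(n) = sum_i num[i] x(n-i) - sum_j den[j] y(n-j),
    with den[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(num[i] * x[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[j] * y[n - j] for j in range(1, len(den)) if n - j >= 0)
        y.append(acc)
    return y

a = [1.0, -0.9]                  # toy first-order A(z) = 1 - 0.9 z^-1
num = bandwidth_expand(a, 0.94)  # A(z/g1): mild bandwidth expansion
den = bandwidth_expand(a, 0.6)   # A(z/g2): strong bandwidth expansion
weighted = filter_iir_fir([1.0, 0.0, 0.0], num, den)  # impulse response of W(z)
```

The error signal, filtered this way before minimization, is penalized less near the formant peaks, where the masking effect hides the noise.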
Adaptive Postfilter

The noise in speech caused by the quantization of the excitation signal remains an area of vocoder design improvement (in the low-energy frequency regions in particular, the noise can dominate the speech signal). The perceived noise can be further reduced by using a postprocessing technique called postfiltering after reconstruction by the decoder. This operation trades off spectral distortion in the speech versus suppression of the quantization noise, by emphasizing the spectral peaks and attenuating the spectral valleys. The postfilter is generally implemented as a combination ST/LT filter. The ST postfilter modifies the spectral envelope, based on the transmitted ST predictor coefficients (it can also be derived from the reconstructed signal). The parameters for the LT postfilter are either derived from the transmitted LT predictor coefficients or computed from the reconstructed speech [1].
5.2 Introduction to G.729 and G.723.1

The excitation signals (e.g., ACELP) and the partitioning of the excitation space (the algebraic codebook) represent a distinguishing vocoder design feature. For example, G.729 and G.723.1 can be differentiated in this manner, although both assume that all pulses have the same amplitudes and that the sign information will be transmitted. The two vocoders also show major differences in terms of delay.
Differentiations

G.729 has excitation frames of 5 ms and allows four pulses to be selected. The 40-sample frame is partitioned into four subsets. The first three subsets have eight possible locations for pulses; the fourth has sixteen. One pulse must be chosen from each subset. This is a four-pulse ACELP excitation codebook method (see Figure 5.2). G.723.1 has excitation frames of 7.5 ms, and also uses a four-pulse ACELP excitation codebook for the 5.3-kbps mode. For the 6.3-kbps rate a technique called multipulse excitation with a maximum likelihood quantizer (MP-MLQ) is employed. Here the frame positions are grouped into even-numbered and odd-numbered subsets. A sequential multipulse search is used for a fixed number of pulses from the even subset (either five or six, depending on whether the frame itself is odd- or even-numbered); a similar search is repeated for the odd-numbered subset. Then, the set resulting in the lowest total distortion is selected for the excitation [1].

At the decoder stage, the linear prediction coder (LPC) information and adaptive and fixed codebook information are demultiplexed and then used to reconstruct the output signal. An adaptive postfilter is used. In the case of the G.723.1 vocoder, the LT postfilter is applied to the excitation signal before it is passed through the LPC synthesis filter and the ST postfilter.
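The four-subset structure described for G.729's algebraic codebook can be sketched as follows. The interleaved 8/8/8/16 position layout below follows the description in the text; treat the exact layout as an illustration rather than the normative table:

```python
# Four "tracks" over a 40-sample subframe: one signed unit pulse per
# track; the first three tracks have 8 candidate positions, the last 16.
TRACKS = [
    list(range(0, 40, 5)),                              # 0, 5, ..., 35
    list(range(1, 40, 5)),                              # 1, 6, ..., 36
    list(range(2, 40, 5)),                              # 2, 7, ..., 37
    list(range(3, 40, 5)) + list(range(4, 40, 5)),      # 16 positions
]

def build_excitation(positions, signs, n=40):
    """positions: one pulse position per track; signs: +1/-1 per pulse."""
    exc = [0.0] * n
    for track, pos, sign in zip(TRACKS, positions, signs):
        assert pos in track, "each pulse must lie on its own track"
        exc[pos] += sign
    return exc

exc = build_excitation([0, 6, 12, 19], [1, -1, 1, 1])
```

Because each pulse is confined to its own small position set, the index of a codebook vector is just the concatenation of four position fields and four sign bits, which is what makes the exhaustive search tractable.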
Figure 5.2 Parameters for new vocoders.

Vocoder parameter        G.729    G.729A   G.723.1
Bit rate, kbps           8        8        5.3–6.3
Frame size, ms           10       10       30
Subframe size, ms        5        5        7.5
Algorithmic delay, ms    15       15       37.5
MIPS                     20       10       14–20
RAM, bytes               5.2K     4K       4.4K
Quality                  Good     Good     Good
Standardization Process

As noted, standardization is a critical requirement if the technology is to proliferate. Standards should also be developed quickly and not be unduly complex or long. As part of the standardization process, a document called terms of reference (ToR) is generated that contains a schedule and the performance requirements and objectives—in this instance, in the areas of quality, bit rate, delay, and complexity.

In terms of bit rates, the ToR requirements for the ITU-T standards under discussion were derived from the amount of speech data that could be carried over a 14.4-kbps modem or over a digital cellular system. Specifically, for G.729, the ToR requirements were that the vocoder should operate at 8 kbps to support the range of first-generation digital cellular standards (about 7 kbps for Japanese systems, 8 kbps for U.S. systems, and 13 kbps in the European systems), as well as complete the vocoder bit rate sequence (that is, 64, 32, 16, and now 8 kbps). For G.723.1, the ToR requirement was that the vocoder should operate below 9.6 kbps. Participant contributions were based on 5.0- to 6.8-kbps technologies; hence, a 6.3-kbps rate was settled upon. In the later development of G.723.1, a rate of 5.3 kbps was added for flexibility. For the digital simultaneous voice and data (DSVD) vocoder (G.729A), modem throughput (specifically, that of the V.34 modem) was used as a peg, and the rate was selected at 8 kbps. Initially, none of the vocoders had a silence compression capability as part of the recommendation. More recent work has standardized silence compression schemes for both G.723.1 and G.729, now included as annexes to the recommendations.

The ToR requirement for delay for G.729 was discussed for some time. The frame size settled on was 10 ms. The algorithm has a 5-ms look-ahead. Hence, assuming a 10-ms processing delay and a 10-ms transmission delay, the one-way system delay of G.729 is 35 ms.
G.723.1 has a look-ahead of 7.5 ms and a frame size of 30 ms, making the one-way system delay 97.5 ms. This delay was back-engineered from the intended application, namely, low-bit-rate videophones. These videophones typically operate at 5 frames per second (or fewer), with a video frame period of 200 ms. The standard development group picked a one-way delay of 100 ms for the vocoder, keeping the delay in a bridging environment to 200 ms. Working backward from the 100-ms value, a maximum frame size of 32 ms was set. In selecting the delay requirements for a DSVD vocoder (G.729A), the delay inherent in V.34 modems was taken into account (one-way delays are greater than 35 ms); also, the issue of bridging was noted, with modem delay now greater than 70 ms. Therefore, SG14 and SG15 agreed on a one-way codec delay maximum of 40 ms (G.723.1 was rejected for DSVD applications because the combined one-way delay for a single encoding could be 135 ms or greater).
Delay and complexity are often traded off against each other. For G.729, the ITU Radiocommunication Sector (ITU-R) was concerned about complexity, but eventually accepted a delay target that allowed a reduction in complexity compared with the G.728 vocoder. The vocoder needs 17 MIPS; however, the amount of RAM required is 50 percent more than for G.728, with the additional memory being used to process larger frames. G.723.1 is of lower complexity than G.729 (14 to 16 MIPS). The DSVD vocoder has a 10-MIPS complexity. Quality is a complex topic, as Table 5.4 illustrates for the G.729 vocoder; see Reference [1] for a discussion.
Standardization Interval

The standardization discussed in this chapter occurred from mid-1990 to 1996. G.729 work started in July 1990 and was completed by November 1995 (total time 64 months). G.723.1 work started in November 1992 and was completed by November 1995 (36 months). G.729A work started in November 1994 and was completed by May 1996 (18 months). As noted, there is a desire to bring out standards as quickly as can be done (during the 1980s, standards used to take four to eight years to complete). One can partition the process into three main parts: (1) time spent determining the requirements and objectives (which is culminated by the completion of the ToR), (2) time spent on submissions and testing (which is culminated by the selection of the vocoder), and (3) time spent drafting the recommendation and following the procedures of the ITU required for ratification [1].
Table 5.4 Example of Quality Requirements (G.729)

Issue or parameter                              Example of requirement
Quality without bit errors                      No worse than G.726 (32 kbps)
Quality with errors:
  Random bit errors < 10^−3                     No worse than G.726
  Detected frame erasures (random and bursty)   No more than 0.5 MOS degradation from 32-kbps ADPCM without errors
  Undetected burst errors                       None
Level dependency                                No worse than G.726
Talker dependency                               No worse than G.726
Music support                                   No artifacts generated
Tandeming:
  General capability                            Two codings with distortion < 4 G.726 codings
  With other ITU vocoders                       Two codings with distortion < 4 G.726 codings
  With new regional standards                   For further study
Idle channel noise                              No worse than G.726
Capability to carry signaling tones             DTMF and others
5.3 G.723.1

G.723.1 specifies a coded representation that can be used for compressing the speech or other audio signal component of multimedia services at a very low bit rate.4 In the design of this coder, the principal application considered by the Study Group was very low bit rate visual telephony as part of the overall H.324 family of standards.
Introduction

This coder has two bit rates associated with it, 5.3 and 6.3 kbps. The higher bit rate gives greater quality. The lower bit rate gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the encoder and decoder. It is possible to switch between the two rates at any 30-ms frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during nonspeech intervals is also possible. The G.723.1 coder was optimized to represent speech with a high quality at the stated rates, using a limited amount of complexity. Music and other audio signals are not represented as faithfully as speech, but can be compressed and decompressed using this coder. The G.723.1 coder encodes speech or other audio signals in 30-ms frames. In addition, there is a look-ahead of 7.5 ms, resulting in a total algorithmic delay of 37.5 ms. All additional delay in the implementation and operation of this coder is due to the following:

1. Actual time spent processing the data in the encoder and decoder
2. Transmission time on the communication link
3. Additional buffering delay for the multiplexing protocol
Encoder/Decoder

The G.723.1 coder is designed to operate with a digital signal by first performing telephone bandwidth filtering (Recommendation G.712) of the analog input, then sampling at 8000 Hz, and then converting to 16-bit linear PCM for the input to the encoder. The output of the decoder is converted back to analog by similar means. Other input/output characteristics, such as those specified by Recommendation G.711 for 64-kbps PCM data, should be converted to 16-bit linear PCM before encoding or from 16-bit linear PCM to the appropriate format after decoding. The coder is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. The encoder operates on blocks (frames) of 240 samples each. That is equal to 30 ms at an 8-kHz sampling rate. Each block is first high-pass filtered to remove the DC component
Technology and Standards for Low-Bit-Rate Vocoding Methods
and then is divided into four subframes of 60 samples each. For every subframe, a tenth-order linear prediction coder filter is computed using the unprocessed input signal. The LPC filter for the last subframe is quantized using a predictive split vector quantizer (PSVQ). The quantized LPC coefficients are used to construct the short-term perceptual weighting filter, which is used to filter the entire frame and to obtain the perceptually weighted speech signal [2]. For every two subframes (120 samples), the open-loop pitch period L_OL is computed using the weighted speech signal. This pitch estimation is performed on blocks of 120 samples. The pitch period is searched in the range from 18 to 142 samples. From this point, the speech is processed on a basis of 60 samples per subframe. Using the estimated pitch period computed previously, a harmonic noise shaping filter is constructed. The combination of the LPC synthesis filter, the formant perceptual weighting filter, and the harmonic noise shaping filter is used to create an impulse response. The impulse response is then used for further computations.
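Conceptually, the open-loop pitch search picks the lag (18 to 142 samples) that maximizes a normalized correlation of the weighted speech. The sketch below is a simplification of that idea, not the exact G.723.1 procedure, which is defined in the recommendation:

```python
import math

# Conceptual open-loop pitch search: among lags 18..142, keep the lag
# that maximizes the normalized correlation energy of the weighted
# speech with its delayed copy. Simplified stand-in, not G.723.1 itself.
def open_loop_pitch(w, lo=18, hi=142):
    n = len(w)
    best_lag, best_score = lo, 0.0
    for lag in range(lo, hi + 1):
        num = sum(w[i] * w[i - lag] for i in range(lag, n))
        if num <= 0.0:
            continue                 # consider only positively correlated lags
        den = sum(w[i - lag] ** 2 for i in range(lag, n)) or 1.0
        score = num * num / den      # normalized correlation energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A perfectly periodic signal with period 40 samples should yield lag 40.
w = [math.sin(2 * math.pi * i / 40) for i in range(240)]
print(open_loop_pitch(w))
```

The positive-correlation guard matters: without it, a lag of half the true period (perfect anti-correlation) could win the squared-correlation score.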
Figure 5.3 Block diagram of the speech coder.
Using the estimated open-loop pitch period L_OL and the impulse response, a closed-loop pitch predictor is computed. A fifth-order pitch predictor is used. The pitch period is computed as a small differential value around the open-loop pitch estimate. The contribution of the pitch predictor is then subtracted from the initial target vector. Both the pitch period and the differential values are transmitted to the decoder. Finally, the nonperiodic component of the excitation is approximated. For the high bit rate, multipulse maximum likelihood quantization (MP-MLQ) excitation is used, and for the low bit rate, an algebraic code excitation is used. The block diagram of the encoder is shown in Figure 5.3. The mathematics of the following are beyond the scope of this text (the interested reader should consult G.723.1 directly [2]):

• Framer
• High-pass filter
• LPC analysis
• Line spectral pair (LSP) quantizer
• LSP decoder
• LSP interpolation
• Formant perceptual weighting filter
• Pitch estimation
• Subframe processing
• Harmonic noise shaping
• Impulse response calculator
• Zero-input response and ringing subtraction
• Pitch predictor
• High-rate excitation (MP-MLQ)
• Excitation decoder
• Pitch information decoding
5.4 G.728

ITU-T Recommendation G.728 contains the description of an algorithm for the coding of speech signals at 16 kbps using low-delay code-excited linear prediction (LD-CELP).5 The LD-CELP algorithm consists of an encoder and a decoder, as illustrated in Figure 5.4 [3]. The essence of the CELP technique, an analysis-by-synthesis search approach, is retained in LD-CELP. However, LD-CELP uses
Figure 5.4 Simplified block diagram of the LD-CELP coder: (a) coder and (b) decoder. (From Reference [3].)
backward adaptation of predictors and gain to achieve an algorithmic delay of 0.625 ms. Only the index to the excitation codebook is transmitted. The predictor coefficients are updated through LPC analysis of previously quantized speech. The excitation gain is updated by using the gain information embedded in the previously quantized excitation. The block size for the excitation vector and gain adaptation is five samples only. A perceptual weighting filter is updated using LPC analysis of the unquantized speech.
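These figures are mutually consistent, as a few lines of arithmetic show (the constant names are ours, not from the recommendation):

```python
# LD-CELP rate and delay arithmetic implied by the text above.
SAMPLE_RATE = 8000        # Hz
VECTOR_SAMPLES = 5        # excitation vector (block) size
INDEX_BITS = 10           # transmitted codebook index per vector

delay_ms = 1000.0 * VECTOR_SAMPLES / SAMPLE_RATE       # buffering delay
bit_rate = INDEX_BITS * SAMPLE_RATE // VECTOR_SAMPLES  # bits per second
codebook_size = 2 ** INDEX_BITS                        # candidate vectors
print(delay_ms, bit_rate, codebook_size)   # 0.625 16000 1024
```

So the 0.625-ms algorithmic delay, the 16-kbps rate, and the 1024-entry codebook all follow from transmitting one 10-bit index per 5-sample vector.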
LD-CELP Encoder

After the conversion from A-law or µ-law PCM to uniform PCM, the input signal is partitioned into blocks of five consecutive input signal samples. For each input
block, the encoder passes each of 1024 candidate codebook vectors (stored in an excitation codebook) through a gain-scaling unit and a synthesis filter. From the resulting 1024 candidate quantized signal vectors, the encoder identifies the one that minimizes a frequency-weighted mean squared error measured with respect to the input signal vector. The 10-bit index of the codebook vector (or codevector) that gives rise to that best candidate quantized signal vector is transmitted to the decoder. The best codevector is then passed through the gain-scaling unit and the synthesis filter to establish the correct filter memory in preparation for the encoding of the next signal vector. The synthesis filter coefficients and the gain are periodically updated, based on the previously quantized signal and gain-scaled excitation, in a backward-adaptive manner.
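The search loop just described can be sketched as follows. This is a simplified stand-in (flat error weighting, fixed filter and gain), not the G.728 procedure itself, which uses a 1024-entry codebook with backward-adapted gain and filter and a frequency-weighted error measure:

```python
# Simplified analysis-by-synthesis codebook search: pass each candidate
# codevector through a gain scaler and a synthesis filter (here a plain
# FIR given by its impulse response), then keep the index minimizing the
# squared error against the target (input) vector.
def search_codebook(target, codebook, gain, impulse_response):
    best_idx, best_err = 0, float("inf")
    for idx, code in enumerate(codebook):
        scaled = [gain * c for c in code]
        # Convolve the scaled codevector with the impulse response,
        # truncated to the vector length.
        cand = [sum(scaled[k] * impulse_response[i - k]
                    for k in range(i + 1)
                    if i - k < len(impulse_response))
                for i in range(len(target))]
        err = sum((t - c) ** 2 for t, c in zip(target, cand))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx

codebook = [[0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]
print(search_codebook([0, 1, 0, 0, 0], codebook, 1.0, [1.0]))   # 2
```

Only the winning index crosses the channel; the decoder repeats the gain scaling and filtering on the same codevector to stay in sync.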
LD-CELP Decoder

The decoding operation is also performed on a block-by-block basis. Upon receiving each 10-bit index, the decoder performs a table look-up to extract the corresponding codevector from the excitation codebook. The extracted codevector is then passed through a gain-scaling unit and a synthesis filter to produce the current decoded signal vector. The synthesis filter coefficients and the gain are then updated in the same way as in the encoder. The decoded signal vector is then passed through an adaptive postfilter to enhance the perceptual quality. The postfilter coefficients are periodically updated using the information available at the decoder. The five samples of the postfilter signal vector are next converted to five A-law or µ-law PCM output samples.
5.5 G.729

ITU-T Recommendation G.729 contains the description of an algorithm for the coding of speech signals at 8 kbps using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP).6 This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (Recommendation G.712) of the analog input signal, then sampling it at 8000 Hz, followed by conversion to 16-bit linear PCM for the input to the encoder. The output of the decoder should be converted back to an analog signal by similar means. Other input/output characteristics, such as those specified by Recommendation G.711 for 64-kbps PCM data, should be converted to 16-bit linear PCM before encoding, or from 16-bit linear PCM to the appropriate format after decoding. The CS-ACELP coder is based on the code-excited linear prediction (CELP) coding model. The coder operates on speech frames of 10 ms, corresponding to 80
Table 5.5 Bit Allocation of the 8-kbps CS-ACELP Algorithm (10-ms Frame)

  Parameter                  Codeword          Subframe 1   Subframe 2   Total per frame
  Line spectrum pairs        L0, L1, L2, L3                              18
  Adaptive-codebook delay    P1, P2            8            5            13
  Pitch-delay parity         P0                1                         1
  Fixed-codebook index       C1, C2            13           13           26
  Fixed-codebook sign        S1, S2            4            4            8
  Codebook gains (stage 1)   GA1, GA2          3            3            6
  Codebook gains (stage 2)   GB1, GB2          4            4            8
  Total                                                                  80
samples at a sampling rate of 8000 samples per second. For every 10-ms frame, the speech signal is analyzed to extract the parameters of the CELP model (linear-prediction filter coefficients and adaptive- and fixed-codebook indices and gains). These parameters are encoded and transmitted. The bit allocation of the coder parameters is shown in Table 5.5. At the decoder, those parameters are used to retrieve the excitation and synthesis filter parameters. The speech is reconstructed by filtering this excitation through the short-term synthesis filter, as is shown in Figure 5.5. The short-term synthesis filter is based on a tenth-order linear prediction (LP) filter. The long-term, or pitch synthesis, filter is implemented using the adaptive-codebook approach. After it is computed, the reconstructed speech is further enhanced by a postfilter [4].
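The allocation in Table 5.5 can be sanity-checked in a few lines (the dictionary keys are just labels):

```python
# Sanity check of the Table 5.5 bit allocation: per-frame bits must total
# 80, i.e., 8 kbps at 100 frames per second (10-ms frames).
allocation_bits = {
    "line spectrum pairs": 18,
    "adaptive-codebook delay": 8 + 5,
    "pitch-delay parity": 1,
    "fixed-codebook index": 13 + 13,
    "fixed-codebook sign": 4 + 4,
    "codebook gains, stage 1": 3 + 3,
    "codebook gains, stage 2": 4 + 4,
}
total_bits = sum(allocation_bits.values())
frames_per_second = 1000 // 10
print(total_bits, total_bits * frames_per_second)   # 80 8000
```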
Encoder The encoding principle is shown in Figure 5.6 [4]. The input signal is high-pass filtered and scaled in the preprocessing block. The preprocessed signal serves as the input signal for all subsequent analysis. LP analysis is done once per 10-ms frame to compute the LP filter coefficients. These coefficients are converted to line spectrum pairs (LSPs) and quantized using predictive two-stage vector quantization
Figure 5.5 Block diagram of conceptual CELP synthesis model.

Figure 5.6 Encoding principle of the CS-ACELP encoder. (From Reference [4].)
(VQ) with 18 bits. The excitation signal is chosen by using an analysis-by-synthesis search procedure in which the error between the original and reconstructed speech is minimized according to a perceptually weighted distortion measure. This is done by filtering the error signal with a perceptual weighting filter whose coefficients are derived from the unquantized LP filter. The amount of perceptual weighting is made adaptive to improve the performance for input signals with a flat frequency response. The excitation parameters (fixed- and adaptive-codebook parameters) are determined per 5-ms subframe (40 samples). The quantized and unquantized LP filter coefficients are used for the second subframe, while in the first subframe
interpolated LP filter coefficients are used (both quantized and unquantized). An open-loop pitch delay is estimated once per 10-ms frame, based on the perceptually weighted speech signal. Then the following operations are repeated for each subframe. The target signal x(n) is computed by filtering the LP residual through the weighted synthesis filter W(z)/Â(z). The initial states of these filters are updated by filtering the error between LP residual and excitation. This is equivalent to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal. The impulse response h(n) of the weighted synthesis filter is computed. Closed-loop pitch analysis is then done to find the adaptive-codebook delay and gain by searching around the value of the open-loop pitch delay, using the target x(n) and the impulse response h(n). A fractional pitch delay with 1/3 resolution is used. The pitch delay is encoded with 8 bits in the first subframe and differentially encoded with 5 bits in the second subframe. The target signal x(n) is used in the fixed-codebook search to find the optimum excitation. An algebraic codebook with 17 bits is used for the fixed-codebook excitation. The gains of the adaptive- and fixed-codebook contributions are vector-quantized with 7 bits, with moving average prediction applied to the fixed-codebook gain. Finally, the filter memories are updated using the determined excitation signal.
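The zero-input-response equivalence mentioned above can be seen on a toy one-tap filter (everything here is illustrative; G.729's actual weighted synthesis filter is a tenth-order cascade):

```python
# Toy illustration of the zero-input-response idea on a one-tap all-pole
# filter y[n] = x[n] + a*y[n-1]. By linearity, the full response splits
# into a part due only to the filter memory (zero-input response) and a
# part due only to the current subframe (zero-state response); the
# encoder subtracts the former to form the per-subframe target.
def run_filter(x, a, state):
    y, prev = [], state
    for v in x:
        prev = v + a * prev
        y.append(prev)
    return y

a, memory = 0.5, 2.0
subframe = [1.0, 0.0, 0.0, 0.0]
full = run_filter(subframe, a, memory)        # response including memory
zir = run_filter([0.0] * 4, a, memory)        # zero-input response
zsr = run_filter(subframe, a, 0.0)            # zero-state response
print([zi + zs for zi, zs in zip(zir, zsr)] == full)   # True
```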
Decoder

The decoder principle is shown in Figure 5.7. First, the parameters' indices are extracted from the received bitstream. These indices are decoded to obtain the coder parameters corresponding to a 10-ms speech frame. These parameters are the LSP coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive- and fixed-codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe. Then, for each 5-ms subframe, the following steps are done:
Figure 5.7 Principle of the CS-ACELP decoder.
Table 5.6 Functions, Signals, and Variables Required by the Coder

  Variables
    gp        Adaptive-codebook gain
    gc        Fixed-codebook gain
    gl        Gain term for long-term postfilter
    gf        Gain term for short-term postfilter
    gt        Gain term for tilt postfilter
    G         Gain for gain normalization
    Top       Open-loop pitch delay
    ai        LP coefficients (a0 = 1.0)
    ki        Reflection coefficients
    k1′       Reflection coefficient for tilt postfilter
    oi        LAR coefficients
    ωi        LSF normalized frequencies
    p̂i,j      MA predictor for LSF quantization
    qi        LSP coefficients
    r(k)      Autocorrelation coefficients
    r′(k)     Modified autocorrelation coefficients
    wi        LSP weighting coefficients
    l̂i        LSP quantizer output

  Symbols
    1/Â(z)    LP synthesis filter
    Hh1(z)    Input high-pass filter
    Hp(z)     Long-term postfilter
    Hf(z)     Short-term postfilter
    Ht(z)     Tilt-compensation filter
    Hh2(z)    Output high-pass filter
    P(z)      Prefilter for fixed codebook
    W(z)      Weighting filter

  Signals
    c(n)      Fixed-codebook contribution
    d(n)      Correlation between target signal and h(n)
    ew(n)     Error signal
    h(n)      Impulse response of weighting and synthesis filters
    r(n)      Residual signal
    s(n)      Preprocessed speech signal
    ŝ(n)      Reconstructed speech signal
    s′(n)     Windowed speech signal
    sf(n)     Postfiltered output
    sf′(n)    Gain-scaled postfiltered output
    sw(n)     Weighted speech signal
    x(n)      Target signal
    x′(n)     Second target signal
    u(n)      Excitation to LP synthesis filter
    v(n)      Adaptive-codebook contribution
    y(n)      Convolution v(n) ∗ h(n)
    z(n)      Convolution c(n) ∗ h(n)
• The excitation is constructed by adding the adaptive- and fixed-codebook vectors scaled by their respective gains.
• The speech is reconstructed by filtering the excitation through the LP synthesis filter.
• The reconstructed speech signal is passed through a postprocessing stage, which includes an adaptive postfilter based on the long-term and short-term synthesis filters, followed by a high-pass filter and scaling operation.

As implied from this discussion, the coder encodes speech and other audio signals with 10-ms frames. In addition, there is a look-ahead of 5 ms, resulting in a total algorithmic delay of 15 ms. All additional delays in a practical implementation of this coder are due to the following:

• Processing time needed for encoding and decoding operations
• Transmission time on the communication link
• Multiplexing delay when combining audio data with other data

The mathematics of the algorithm are beyond the scope of this text. Table 5.6 depicts the functions, signals, and variables required by the coder, giving a sense of the nontrivial nature of the computational and analytical machinery involved. The interested reader should consult G.729 directly [4].
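The first two decoder steps above can be shown in miniature: the excitation u(n) is the gain-scaled sum of the adaptive- and fixed-codebook vectors, filtered through the all-pole LP synthesis filter. The coefficients and vectors below are made up for illustration:

```python
# Miniature CELP decoder core: u(n) = gp*v(n) + gc*c(n), then the
# all-pole synthesis filter s[n] = u[n] - sum_i a_i * s[n-i].
def synthesize(v, c, gp, gc, lp_coeffs):
    u = [gp * vi + gc * ci for vi, ci in zip(v, c)]   # excitation
    s, hist = [], [0.0] * len(lp_coeffs)
    for un in u:
        sn = un - sum(ai * h for ai, h in zip(lp_coeffs, hist))
        hist = [sn] + hist[:-1]                       # shift filter memory
        s.append(sn)
    return s

v = [1.0, 0.0, 0.0, 0.0]      # adaptive-codebook contribution
c = [0.0, 1.0, 0.0, 0.0]      # fixed-codebook contribution
out = synthesize(v, c, gp=0.5, gc=0.25, lp_coeffs=[-0.5])
print(out)   # [0.5, 0.5, 0.25, 0.125]
```

In the real decoder the gains and LP coefficients change every subframe, and a postfilter follows; the structure of the synthesis step is as above.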
5.6 Example of Applications

In this section, some applications of low-bit-rate vocoders are discussed.
H.263 Video Coding for Low-Bit-Rate Communication

There is a growing interest in video coding technology and its applications over both circuit-switched and packet-switched (e.g., IP) networks. Applications include video telephony and videoconferencing, computer-supported cooperative work, whiteboarding, and other value-added services. The limited transmission rate available on the public switched telephone network (PSTN), on wireless networks, and on intranets and the Internet presents a significant challenge to digital video communications. With V.34 modem technology the bit rate achievable on the PSTN has increased, but it currently is still limited to 33.6 kbps, which is a stretch for video applications. Digital wireless communication, which has gained acceptance recently, is also limited to a few kilobits per second in available transmission rate. Therefore, there is an increasing interest in video coding at such low bit rates [5]. Although there may be more bandwidth in the backbone of a network, the access
speed, particularly for telecommuters, SOHO workers, and residential users, remains a key gating factor. Implicit in these video applications is the use of compressed speech, as shown in Figure 5.8 (modeled after Reference [6]). ITU-T recommendations for very low bit rate multimedia terminals include the following two algorithms [5]:

• H.263. Based on existing technology, developed by late 1995 (same time schedule as for the recommendations for the H.324 terminal description, multiplexing, control, and speech). The objective for H.263 is to provide significantly better picture quality than the existing ITU-T algorithm for video compression (H.261), while operating at 28.8 to 33.6 kbps.
• H.263/L. The long-term algorithm, including technology with more advanced performance, to be developed by 1998. The objective for H.263/L is to provide considerably better picture quality than H.263, with improved resiliency.

In this context, it should be noted that V.34 is a relatively new standard (1994) for full-duplex data transmission over the PSTN at bit rates up to 28.8 kbps (recently extended to 33.6 kbps). V.34 modems are low-priced and are now displacing 14.4-kbps V.32bis modems (1990) in such applications as remote access to corporate networks, online services, and the Internet. The increased bit rates of V.34 modems, combined with the recent advances in digital voice codings previously described that provide near-toll quality for certain applications at rates below 8 or 16 kbps, allow the simultaneous transmission of voice, data, and video over ordinary voice-grade PSTN lines. These advances have recently led to the development of new multimedia modem standards, such as H.324 [7].
H.324 Multimedia Communication

ITU-T Recommendation H.324, Terminal for Low Bitrate Multimedia Communication (1995), is the new international standard for multimedia conferencing on circuit-switched networks (e.g., the PSTN); however, many of the elements of H.324 can be adapted to run on IP networks, as inspection of Figure 5.8 shows. Focusing on the audio component, H.324 specifies the G.723.1 speech codec, which (although it runs at 5.3 or 6.3 kbps, as previously noted) provides near-toll-quality speech, using a 30-ms frame size and 7.5-ms look-ahead. A G.723.1 implementation is estimated to require 14 to 20 fixed-point MIPS on a general-purpose DSP. Terminals may use either rate and can change rates for each transmitted frame, since the vocoder rate is sent as part of the syntax for each frame (receivers can use an H.245 message to signal their preference for low- or high-rate audio) [6]. The average audio bit rate can be lowered further by using silence suppression. In such implementations, silence frames are not transmitted or are replaced with smaller
Figure 5.8 Use of compressed speech in multimedia standards. (Modeled after Reference [6].) The figure tabulates the audio, video, and network choices of H.320 (1990, G.711/H.261 over ISDN), H.323 (1996, G.711/H.261 over LANs/intranets), H.310 (1996, over ATM/B-ISDN), and H.324 (1995, G.723.1/H.263 over the PSTN), and shows an example H.324 sender/receiver protocol stack (H.245 control, H.223 multiplex, V.34/V.8 modem).
frames carrying background noise. Generally, both end users rarely speak at the same time, so this can save significant bandwidth for use by video or data channels. The G.723.1 codec imposes about 97.5 ms of end-to-end audio delay, which, together with modem, jitter buffer, transmission time, multiplexer, and other system delays, results in about 150-ms total end-to-end audio delay (propagation delay is additional) [6]. Interestingly, this audio delay is often less than the delay of the video codec; hence, additional delay has to be added in the audio path at the receiver (see Figure 5.8) to achieve lip synchronization (here H.245 is employed to send a message indicating the time differential between the transmitted video and audio signals).7 A number of H.324 applications may not require lip synchronization, or may not require video at all. For these applications, optional H.324 audio codecs (e.g., the 8-kbps speech codec G.729) can be used, which, as noted earlier, can reduce the total end-to-end audio delay to about 85 ms.
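The delay figures quoted above add up as follows; the split of the non-codec portion is an assumption for illustration, not from the cited reference:

```python
# Rough end-to-end audio delay budget from the discussion above.
# The 97.5-ms codec figure is from the text; the single lumped figure
# for everything else (~52.5 ms) is illustrative only.
budget_ms = {
    "G.723.1 codec (frame + look-ahead + processing)": 97.5,
    "modem, multiplexer, jitter buffer, transmission (assumed)": 52.5,
}
total_ms = sum(budget_ms.values())
print(total_ms)   # 150.0, excluding propagation delay
```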
H.323 Multimedia Communications Standard for LANs and Enterprise Networks

ITU-T Recommendation H.323, Visual Telephone Systems and Equipment for Local Area Networks which Provide a Non-Guaranteed Quality of Service (1996), is a recommendation that defines the components, procedures, and protocols necessary to provide audiovisual communication over LANs. H.323 can be used in any packet-switched network, regardless of the underlying physical layer. At the upper layers, IP can be utilized in conjunction with a reliable transport mechanism via the Transmission Control Protocol (TCP), as well as in conjunction with an "unreliable" transport mechanism [e.g., the User Datagram Protocol (UDP)]. As noted in Chapter 2, reliable transport mechanisms use acknowledgment and retransmission to guarantee delivery of PDUs, while unreliable transport mechanisms make a best effort to deliver PDUs without the overhead and delay incurred by retransmission. H.323 also uses the Real-time Transport Protocol/Real-time Control Protocol (RTP/RTCP) of the Internet Engineering Task Force (IETF), with extensions for call signaling and additional audio and video coding algorithms. H.323 is independent of network topology per se,8 and H.323 terminals (see Figure 5.9, which is a particularization of Figure 5.8) can communicate over LANs (via hubs, LAN switches, etc.), over local or remote internets (via routers, bridges, etc.), and over dial-up connections. Proponents see the "most exciting application" of this recommendation as video telephony over the Internet [8]. The H.323 recommendation provides various service levels of multimedia communication over a data network: for example, voice only; voice and video; voice and data; or voice, video, and data communications. All of these provide collaborative tools in support of the virtual corporation paradigm, via intranets, extranets, or the Internet.
For example, with H.323-ready devices, on-demand interactive multipoint multimedia conferences can be established without the need for reservations.
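To make the RTP usage concrete, the sketch below packs the 12-byte fixed RTP header that precedes each audio payload on the UDP media channel. Payload type 18 is the static RTP assignment for G.729; the sequence number, timestamp, and SSRC values are arbitrary examples:

```python
import struct

# Minimal fixed RTP header (RFC 3550) such as an H.323 terminal sends
# over UDP for each audio packet.
def rtp_header(seq, timestamp, ssrc, payload_type=18):
    byte0 = 2 << 6                   # version 2; no padding/extension/CSRC
    byte1 = payload_type & 0x7F      # marker bit clear
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

# One 10-ms G.729 frame advances the 8-kHz RTP clock by 80 samples.
hdr = rtp_header(seq=1, timestamp=80, ssrc=0x12345678)
print(len(hdr), hdr[0], hdr[1])   # 12 128 18
```

RTCP then carries the reception-quality and synchronization reports alongside this media stream.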
Figure 5.9 Logical and protocol views of H.323. (The protocol view shows audio codecs G.711, G.722, G.723.1, G.728, and G.729 and video codecs H.261/H.263 carried over RTP/UDP with RTCP, H.225.0 call signaling and RAS, H.245 control over reliable transport (TCP), and the T.120 series for data, all over an IP network on a LAN infrastructure; the logical view shows the terminal's audio, video, data, and system control components over the H.225.0 layer.)
Table 5.7 Umbrella of H.323 Standards

  Standard   Description
  H.323      Provides system and component descriptions, call model descriptions, call signaling procedures, control messages, multiplexing, audio codecs, video codecs, and data protocols. Baseline standard that references other ITU-T documents.
  H.225.0    Describes the media (audio and video) stream packetization, media stream synchronization, control stream packetization, and control message formats.
  H.245      Describes the messages and procedures used for opening and closing logical channels for audio, video, and data; capability exchange; mode requests; control; and indications.
H.323 recommendations govern the operation of H.323 equipment and the communications between H.323 endpoints. The recommendation is a baseline standard that references many other ITU-T documents. H.323 provides the system and component descriptions, call model descriptions, and call signaling procedures. Table 5.7 provides a synopsis of the related recommendations. Other recommendations are listed in H.323 for audio and video coding. For audio coding, G.711 is mandatory, while G.722, G.728, G.723.1, and G.729 are optional; for video coding, H.261 QCIF mode is mandatory, while H.261 CIF and all H.263 modes are optional. The T.120 series of recommendations is used for data applications [8]. A design goal in the development of the H.323 recommendation was interoperability with other multimedia terminal types, including H.320 terminals on N-ISDN, H.321 terminals on ATM, H.322 terminals on IsoEthernet, H.324 terminals on the public switched telephone network, and H.310 terminals over ATM. The H.323 terminal provides real-time bidirectional audio, video, and data communications. Figure 5.9 depicts an H.323 terminal from both a logical and a protocol point of view. (Note, however, that H.323 does not specify audio or video equipment, data applications, or the network interface, these being outside the scope of the specification.)
References

1. R. V. Cox and P. Kroon. "Low Bit-Rate Speech Coders for Multimedia Communication." IEEE Communications Magazine (December 1996): 34 ff.
2. ITU-T Recommendation G.723.1: Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kbps. Geneva, CH: ITU, March 1996.
3. ITU-T Recommendation G.728: Coding of Speech at 16 kbps Using Low-Delay Code Excited Linear Prediction. Geneva, CH: ITU, September 1992.
4. ITU-T Recommendation G.729: Coding of Speech at 8 kbps Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP). Geneva, CH: ITU, March 1996.
5. K. Rijkse. "H.263: Video Coding for Low-Bit-Rate Communication." IEEE Communications Magazine (December 1996): 42 ff.
6. D. Lindbergh. "The H.324 Multimedia Communication Standard." IEEE Communications Magazine (December 1996): 47 ff.
7. G. D. Forney, et al. "The V.34 High-Speed Modem Standard." IEEE Communications Magazine (December 1996): 28 ff.
8. G. A. Thom. "H.323: The Multimedia Communications Standard for Local Area Networks." IEEE Communications Magazine (December 1996): 52 ff.
Notes 1
A signal analyzed with the G.729A coder can be reconstructed with the G.729 decoder, and vice versa. The major complexity reduction in G.729A is obtained by simplifying the codebook search for both the fixed and adaptive codebooks; by doing this the complexity is reduced by nearly 50 percent, at the expense of a small degradation in performance. 2 Additional factors that influence the selection of a speech vocoder are availability, licensing conditions, or the way the standard is specified (some standards are only described as an algorithmic description, while others are defined by bitexact code) [1]. 3 That is, the vocoder parameters are selected in such a manner that the error energy between the reference and the reconstructed signal is minimized. 4 This section is based on ITU-T Recommendation G.723.1. This material is for pedagogical purposes only. Developers, engineers, and readers requiring more information should acquire the recommendation directly from the ITU-T [2]. 5 This section is based on ITU-T Recommendation G.728. This material is for pedagogical purposes only. Developers, engineers, and readers requiring more information should acquire the recommendation directly from the ITU-T [3]. 6 This section is based on ITU-T Recommendation G.729. This material is for pedagogical purposes only. Developers, engineers, and readers requiring more information should acquire the recommendation directly from the ITU-T [4]. 7 Since the receiver knows its local decoding delay for the video and audio stream, the time-skew message allows the receiver to insert the appropriate audio delay. Alternatively, the receiver can bypass lip synchronization and present the audio with minimal delay. 8 Specifically, the protocol architecture up to Layer 3.
CHAPTER 6
Voice over IP and the Internet

6.1 Introduction
This chapter provides an overview of the issues that impact voice over IP (VOIP) deployment and some of the product categories now emerging. There will be several potential uses of this technology, including cable TV operators looking for new revenues, wireless carriers seeking state-of-the-art platforms to meet expansion demands, Internet service providers (ISPs) looking to offer more on the Internet, and corporate planners looking to save money for domestic and international calls. As an example of the new entrants, America Online has announced an IP voice service; the service is targeted at consumers, the quality being far from business-standard. However, statements made by proponents of VOIP, such as "computer telephony holds such service and price promise that corporate America soon will not stand for less" [1], have turned out to be greatly overoptimistic. Although ATM is a multimedia, multiservice, multipoint technology where the support of voice is technically more practical, at the theoretical level, than is the case over IP, Voice over ATM (VoATM) has nonetheless seen rather limited deployment. Several specifications have emerged in the past few years to support voice using either constant bit rate methods (via ATM Adaptation Layer 1) or variable bit rate methods (via ATM Adaptation Layer 2). However, it continues to be the case that ATM is not widely deployed, particularly in SOHO and branch locations, and is still relatively expensive. On the other hand, IP-based networks are ubiquitous in the corporate landscape. Hence, there is keen interest in applying IP technology to voice. The Internet now has several tens of millions of hosts connecting hundreds of millions of people worldwide. Many individuals use the Internet for a variety of
applications: business people, educators, telecommuters, researchers, government officials, and hobbyists, to list just a few. However, use of the Internet for multimedia applications, including voice, is a relatively new development, at least for business applications. The evolving ability to carry voice and fax across an IP enterprise network or the Internet could afford a cost-effective way of managing intracorporate communications. With the latest equipment, a corporate user is able to dial a long-distance number or an office extension, and not be aware that the call is making the journey over the Internet or the intranet. And this will happen (at least initially) for the price of a call to the local ISP. VOIP can support intercorporate communications by bringing down the cost of the equipment and facilities necessary to build an integrated communication infrastructure and by greatly simplifying call center management and troubleshooting. For business customers, Internet-based telephony promises many new features, including, but not limited to, Internet-only call transport, fax over the Internet, conference calling, respond-now customer service, and voicemail that broadcasts calendar scheduling to groupware programs, electronic organizers, or even one’s smart watch. There are also new service and revenue opportunities for such service providers as international carriers, wireless carriers, and cable TV operators. Table 6.1 depicts a classification of product categories now becoming available. Figure 6.1 depicts a server for Internet and intranet support. Figure 6.2 depicts a hybrid server/desktop example for intranet or PSTN access. Table 6.1
Classification of VOIP Equipment

Desktop only:
• Internet only: IP phones (dedicated or shared PC hardware/software)
• Intranet only: IP phones
• Internet and intranet: IP phones and routing support
• Internet and PSTN: IP phones and legacy support
• Intranet and PSTN: IP phones and legacy support
• Internet, intranet, and PSTN: IP phones and legacy/routing support

Server only: Regular phones, but with PBX/key system connections, in every scope (e.g., Figure 6.1).

Hybrid desktop/server: Hardware for IP-phone-like function in the PC, with PBX/key system connections, in every scope (e.g., Figure 6.2).
Figure 6.1 Voice over the Internet and intranets: server-based. (R = router; FW = firewall; NSP = network service provider; ISP = Internet service provider. The figure shows headquarters and branch-office telephony servers with VOIP boards, connected to PBXs/switches (DMS-500) and MDFs over T1 and 10BaseT, and linked through routers and firewalls to an access ISP, a backbone NSP, an ATM backbone, the MIS intranet, and the Internet.)

Figure 6.2 Voice over an intranet: hybrid server/station-based. (Courtesy of Microsoft Corp. The figure shows fax devices and VOIP boards, in server and station roles, at headquarters, regional office, field office/SOHO, and warehouse sites; FXS, FXO, E&M, and T1/E1 connections to PBXs and key systems; and Cisco, Bay, 3Com, or other IP routers with firewalls tying the sites together across a T1/FT1-based corporate WAN, a VPN, the Internet, and the PSTN.)
Voice over IP and the Internet
6.2 IP/Internet Background
Network communications can be categorized into two basic types, as implied in Chapter 2: circuit-switched (sometimes called connection-oriented) and packet- or fast-packet-switched (these can be connectionless or connection-oriented). Circuit-switched networks operate by forming a dedicated connection (circuit) between two points. In packet-switched networks, data to be transferred across a network is segmented into small blocks called packets [also called datagrams or protocol data units (PDUs)] that are multiplexed onto high-capacity intermachine connections. A packet, which usually contains a few hundred bytes of data, carries identification that enables the network hardware to know how to send it forward to the specified destination. In frame relay, the basic transfer unit is the data link layer frame; in cell relay, this basic unit is the data link layer cell. Services such as frame relay and ATM use circuit-switching principles; namely, they use a call setup mechanism similar to that of a circuit-switched (ISDN) call. IP has become the de facto standard connectionless packet network layer protocol for both local area networks (LANs) and wide area networks (WANs). In a connectionless environment there is no call setup. Each packet finds its way across the network independently of the previous one.
Internet Protocol Suite

Chapter 2 provided a basic review of the TCP/IP and UDP/IP suite of networking protocols. TCP/IP is a family of over 100 data communications protocols used in the Internet and in intranets. In addition to the communication functions supported by TCP (end-to-end reliability over a connection-oriented session) and IP (subnetwork-level routing and forwarding in a connectionless manner), the other protocols in the suite support specific application-oriented tasks, for example, transferring files between computers, sending mail, or logging into a remote host. TCP/IP protocols support layered communication, with each layer responsible for a different facet of the communications (as seen in Table 6.2). Some of the VOIP applications utilize TCP, while others utilize RTP/RTCP over UDP.
The Internet

The same IP technology now used extensively in corporate internets is also used in (and, in fact, originated from) the Internet. The Internet is a global collection of interconnected business, government, and education computer networks—in effect, a network of networks. Recently there has been a near-total commercialization of the Internet, allowing it to be used for pure business applications (the original roots of the Internet were in the research and education arenas). A person at a computer terminal or personal computer equipped with the proper software communicates across the Internet by having the driver place the data in an IP packet and addressing
Table 6.2 Functionality of the TCP/IP Suite Layers

Network interface layer: This layer is responsible for accepting and transmitting IP datagrams. It may consist of a device driver (e.g., when the network is a local network to which the machine attaches directly) or of a complex subsystem that uses its own data link protocol.

Network layer (Internet layer): This layer handles communication from one machine to another. It accepts a request to send data from the transport layer, along with the identification of the destination. It encapsulates the transport layer data unit in an IP datagram and uses the datagram routing algorithm to determine whether to send the datagram directly or to a router. The Internet layer also handles incoming datagrams and uses the routing algorithm to determine whether each datagram is to be processed locally or be forwarded.

Transport layer: In this layer the software segments the stream of data being transmitted into small data units and passes each packet, along with a destination address, to the next layer for transmission. The software adds information to the packets, including codes that identify which application program sent it, as well as a checksum. This layer also regulates the flow of information and provides reliable transport, ensuring that data arrives in sequence and with no errors.

Application layer: At this level, users invoke application programs to access available services across the TCP/IP internet. The application program chooses the kind of transport needed, which can be either messages or a stream of bytes, and passes it to the transport layer.
the packet to a particular destination on the Internet. Communications software in routers in the intervening networks between the source and destination networks reads the addresses on packets moving through the Internet and forwards the packets toward their destinations. TCP guarantees end-to-end integrity. From a thousand or so networks in the mid-1980s, the Internet has grown to an estimated 100 million connected network hosts with about 300 million people having access to it (as of 2001). The majority of these Internet users currently live in the United States or Europe, but the Internet is expected to have ubiquitous global reach over the next few years. In 1973, ARPA initiated a research program to investigate techniques and technologies for interlinking packet networks of various kinds. The objective was to develop communication protocols that would allow networked computers to communicate transparently across multiple packet networks. The project became very successful and there was increasing demand to use the network, so the government separated military traffic from civilian research traffic, bridging the two by using
common protocols to form an internetwork or internet. The term internet is defined as "a mechanism for connecting or bridging different networks so that two communities can mutually interconnect." So, in the mid-1970s ARPA became interested in establishing a packet-switched network to provide communications between research institutions in the United States. With the goal of heterogeneous connectivity in mind, ARPA funded research by Stanford University and Bolt, Beranek, and Newman to create an explicit series of communication protocols. The ARPA-developed technology included a set of network standards that specified the details of how computers communicate, as well as a set of conventions for interconnecting networks and routing traffic. The result of this development effort, completed in the late 1970s, was the Internet suite of protocols. Soon thereafter, there were a large number of computers and thousands of networks using TCP/IP, and it is from their interconnections that the modern Internet has emerged. As noted in Chapters 1 and 3, ARPA was also interested in integrated voice and data. While the ARPAnet was growing into a national network, researchers at Xerox Corporation's Palo Alto Research Center were developing one of the technologies that would be used in local area networking, namely, Ethernet. Ethernet became one of the important standards for implementing LANs. At about the same time, ARPA funded the integration of TCP/IP support into the version of the UNIX operating system that the University of California at Berkeley was developing. It followed that when companies began marketing non-host-dependent workstations that ran UNIX, TCP/IP was already built into the operating system software, and vendors such as Sun Microsystems included an Ethernet port on their devices. Consequently, TCP/IP over Ethernet became a common way for workstations to interconnect.
The same technology that made PCs and workstations possible made it possible for vendors to offer relatively inexpensive add-on cards to allow a variety of PCs to connect to Ethernet LANs. Software vendors took the TCP/IP software from Berkeley UNIX and ported it to the PC, making it possible for PCs and UNIX machines to use the same protocol on the same network. In 1986, the U.S. National Science Foundation (NSF) initiated the development of the NSFnet. NSFnet has provided a backbone communication service for the Internet in the United States. It should be noted that the NSFnet operated under an acceptable use policy (AUP). The policy stated that the NSFnet was to support open research and education in and among U.S. research and instructional institutions, plus support the research arms of for-profit firms when engaged in open scholarly communication and research. Use for other purposes was not acceptable. The commercialization of the Internet that is now being experienced is not based on the AUP. By the end of 1991, the Internet had grown to include over 5000 networks in over three dozen countries, serving more than half a million host computers. These numbers have continued to grow at geometric rates throughout the 1990s. There are now several thousand Internet service providers (ISPs), although the number is expected to decrease greatly over the next five years. Table 6.3 depicts highlights of the history of the Internet over a 30-year span.
Table 6.3 Snapshot of Internet-Related Activities over the Years
• Late 1960s: ARPA (DoD think tank) introduces the ARPAnet.
• 1970s: ARPAnet expands geographically and functionally to allow nonmilitary traffic (e.g., universities and defense contractors).
• Late 1970s: The realization takes hold that the ARPAnet cannot scale.
• TCP/IP is developed for heterogeneous networking and interenterprise connectivity. Protocols to support global addressing and scalability are developed.
• Early 1980s (1983): TCP/IP is a standard operating environment for all attached systems.
• Network splits into a military component (MILNET) and a civilian component (ARPAnet).
• 1986: Six supercomputer centers are established by NSF.
• Interagency dynamics and funding considerations lead to the creation of the NSFnet by the NSF. IP protocol and newer equipment are utilized in the NSFnet. NSFnet and ARPAnet intersect at Carnegie Mellon University.
• Late 1980s: ARPAnet is absorbed into NSFnet.
• Phase 1. Three-tiered architecture developed:
1. NSF to undertake overall management and fund the backbone operationally and in terms of technology upgrades
2. Regional and state network providers to supply Internet services between universities and the backbone and become self-supportive through service fees
3. Campus networks and organizations, colleges, and universities to use TCP/IP-based systems to provide widespread access to researchers and students
• 1987: Six supercomputer sites interconnected using DEC routers and 56-kbps links.
• Traffic congestion begins to be experienced.
• Phase 2. Merit partnership formed with IBM and MCI to upgrade network.
• Mid-1988: A DS1-line (1.544-Mbps) network connects more than a dozen sites, using IBM-based switches.
• 1989: Reengineering due to fast growth (15 percent per month); new routers and additional T1 links (MCI) are installed.
• Phase 3.
Third redesign of NSFnet, using an outsourcing approach—NSFnet is overlaid upon a public Internet (NSF is relieved from the responsibility of upgrading the network on an ongoing basis). Lines are upgraded to DS-3 rates (45 Mbps).
• Merit, IBM, and MCI form Advanced Network Services Inc. (ANS); the not-for-profit organization is to build and manage a commercial Internet.
• DS3 lines are provided by MCI; routers by IBM (RS/6000-based). Network is also called ANSnet. NSFnet is now a virtual network in the ANSnet (migration accomplished in two years).
• 1992: Original NSFnet is dismantled.
• ANS launches a for-profit subsidiary (ANS CORE) to face costs.
• Debates are sparked by commercial Internet providers: PSINet, CERFNet, and AlterNet form the Commercial Internet Exchange (CIX) as a backbone and bypass to the NSFnet. 155 other members join, including NEARnet, JvNCnet, SprintLink, and InfoLAN.
• Based on the CIX approach, CICnet, NEARnet, BARRnet, NorthWestNet, NYSERnet, WestNet, and MIDnet form the Corporation for Regional and Enterprise Networking (CoREN). Regional commercial providers (not in CoREN) compete against CoREN.
• Phase 4. Rapid increase requires NSF to redesign the backbone.
• Two years of bidding and planning leads to two awards to replace the current NSFnet:
1. MCI to deploy the very high speed backbone network service (vBNS), based on 155-Mbps SONET/ATM, to connect NSF supercomputing centers
2. Merit and USC Information Sciences Institute to do routing coordination
• Network access providers (NAPs) are to provide access to the vBNS; NAP functions go to Ameritech, Sprint, MFS, and PacTel.
• NSF institutes the Routing Arbiter for fair treatment among various Internet service providers with regard to routing administration; provision of a database of route information, network topology, routing path preferences, and interconnection information; and deployment of routing that supports type of service, precedence routing, bandwidth on demand, and multicasting (accomplished by route servers using Border Gateway Protocol and Interdomain Routing Protocol).
• Fund established to support a Network Information Center (NIC): Registration Services (by Network Solutions Inc.) include IP, Domain Names, whois, and white pages. Directory Services (by AT&T) include a directory of directories and white and yellow pages. Information Services (by General Atomics) include coordination services, a clearinghouse for information, training, workshops, a reference desk, and education.
• Phase 5 (1994–2000). Major expansion of the dot-com economy.
• Phase 6 (2000–present). Burst of the dot-com bubble.
• Shakeout of ISPs is predicted by 2002: of more than 1400 ISPs in 1995, about 100 are expected to survive.
TCP and IP were developed for basic control of information delivery across the Internet. Application layer protocols, such as TELNET (Network Terminal), file transfer protocol (FTP), simple mail transfer protocol (SMTP), and hypertext transfer protocol (HTTP), have been added to the TCP/IP suite of protocols to provide specific network services. Access and backbone speeds have increased from 56 kbps, to 1.5 Mbps (most common now), to 45 Mbps and beyond for most of the backbones. Voice applications over IP have to ride over the Internet systems developed for traditional data services. Most problematic is the lack of QoS support; this, however, is expected to slowly change. Nonetheless, in spite of the emergence of new technologies, such as RSVP and RTP, a retarding factor to true QoS support is the Internet's own success: The number of people using it is increasing at such a rapid rate that it is difficult to add enough resource and protocol improvements to keep up with the demand.

Intranets use the same WWW/HTML/HTTP and TCP/IP technology that is used for the Internet. When the Internet caught on in the early to mid-1990s, planners were not looking at it as a way to run their businesses. But just as the action of putting millions of computers around the world on the same protocol suite fomented the Internet revolution, so connecting islands of information in a corporation via intranets is now sparking a corporate-based information revolution. Tens of thousands of corporations now have intranets. Across the business world, employees from engineers to office workers are creating their own home pages and sharing details of their projects with the rest of their companies.
6.3 Voice Transmission and Approaches in ATM, Frame Relay, and IP

In principle, the emerging technologies for transmitting voice over data networks present opportunities for organizations to reduce costs and enable new applications. In particular, traditional router vendors see the opportunity to cannibalize the existing voice traffic by adding appropriate features to their routers. Clearly, if a company uses separate facilities to carry on-net voice from company location to company location, there could be additional costs in terms of communication channels, equipment, and carrier charges. In looking to carry voice over IP, one must keep in mind that voice transmission tolerates only a relatively low round-trip delay and jitter; in fact, for traditional commercial toll applications, that delay has been on the order of 10 to 30 ms. For voice over data networks, occasionally dropping packets, frames, or cells is not an issue, since the human ear can tolerate small glitches without loss of intelligibility. Also, for practical design considerations, delay ranges are allowed to be higher (up to 200 ms). Many of the algorithms utilized in voice over data networks are not transparent (being lossy), but preserve reasonable to good voice quality while greatly reducing data rates. A synopsis of evolving voice over data protocols follows.
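The delay figures just cited (10 to 30 ms for traditional toll calls, with up to 200 ms allowed in practical designs) can be combined into a rough end-to-end budget. The sketch below illustrates the bookkeeping; every component value is an illustrative assumption, not a figure from the text:

```python
# End-to-end delay budget for packetized voice. Every component value is
# an illustrative assumption (not from the text); the point is that the
# pieces must sum to less than the ~200-ms design ceiling cited above.

BUDGET_MS = {
    "codec_frame": 30,      # e.g., one vocoder frame
    "lookahead": 7.5,       # encoder look-ahead
    "packetization": 30,    # waiting to fill the packet payload
    "network_transit": 70,  # propagation plus queuing
    "jitter_buffer": 40,    # playout buffer absorbing delay variation
}

def total_delay_ms(budget):
    return sum(budget.values())

total = total_delay_ms(BUDGET_MS)
print(f"total delay: {total} ms; within 200-ms range: {total <= 200}")
```

If the sum exceeds the ceiling, the usual remedies are a shorter packetization interval or a smaller jitter buffer, both of which trade bandwidth efficiency or loss tolerance for latency.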
ATM

Current approaches for voice over ATM assume a PCM model where voice encoding and transmission take place in real time. In effect, an entire DS1, composed of up to 24 voice channels, is transported end-to-end over ATM using network-interworking techniques (structured n × 64 kbps is also supported by newer equipment). This model imposes a need to preserve timing in speech delivery and playback. This can be accomplished with a kind of timestamping in the ATM Adaptation Layer 1 (AAL 1) header or via adaptive clocking. But there has been interest in using other kinds of AALs. The ATM Forum started work on voice transport in 1993, and it was not until April 1995 that the Voice and Telephony Services over ATM (VTOA) working group published its first document, which contained the structured and unstructured circuit emulation specifications. More specifications were approved in 1997. Other documents that utilize other AALs (AAL 2 or 5) for voice followed (see Figure 6.3).1
Figure 6.3 ATM Forum and ITU-T documents on voice over ATM.

ATM Forum voice over ATM documents:
• VTOA Phase 1 (September 1997), now "ATM Trunking Using AAL 1 for Narrowband Services," af-vtoa-0089.000. Approved July 1997 without MIB (Yes: 36, No: 1, Abstain: 2; 39 total votes, 16.7 percent of 215 principal members).
• DBCES, "Dynamic Bandwidth Utilization in 64-kbps Time Slot Trunking over ATM—Using CES," af-vtoa-0085.000. Approved July 1997 (Yes: 31, No: 0, Abstain: 6; 37 total votes, 14.4 percent of 215 principal members).
• VTOA Phase 2 (late 1998), now "ATM Trunking Using AAL 2 for Narrowband Services." Draft, coordinated with the ITU-T.

Timetable for AAL 1 and AAL 2 specifications: CES-V2 (SCE), January 1997; I.363.1 (AAL 1 revision), September 1996; VTOA Phase 1, September 1997; I.363.2 AAL 2, September 1997; AAL 2 SSCS, late 1998; VTOA Phase 2, late 1998.

ITU-T recommendations [AAL 2 structure: Service Specific Convergence Sublayer (SSCS) over Common Part Sublayer (CPS), below the AAL-SAP]:
• I.363.2 AAL 2: frozen February 1997, approved September 1997.
• I.SEG: frozen September 1997, approval June 1998. Segmentation and reassembly for data; three sublayers (assured mode via Q.921 or SSCOP, error protection, and basic SAR with AAL 5 trailer), with assured mode and error protection optional.
• I.TRUNK: frozen June 1998. The LLT SSCS specifies support of the following traffic types: compressed voice, PCM voice, silence indication, status and alarm, control, CAS, dialed digits, fax/modem demod/remod, circuit-mode data, and frame-mode data.
Frame Relay

A compressed and packetized model of voice transmission has emerged that separates the time scales for encoding, transmission, and playback. Hence, preserving synchronous timing is no longer necessary: Improvements in encoding algorithms and faster and cheaper DSP hardware have changed the paradigm. At this juncture, most voice systems in this context use some kind of prediction technique (vocoding). In addition to proprietary methods, typical compression schemes being used include G.729 CS-ACELP, G.728 LD-CELP, and G.723.1 MP-MLQ; also supported are G.726/G.727 (ADPCM) and G.711 (PCM). Predictions are based on the most recently received information. Therefore, if a frame is lost, the newly arriving frame will show that the receiver's prediction is not current, since the missing frame did not update the receiver. It follows that the output is not correct, and the result is somewhat distorted speech. Hence, the performance is related to both delay and loss in the frame relay network. One of the issues is how much time is needed for the receiver to catch up with the arriving frames and get current, so that the voice output will be as intended. Voice compression algorithms of the early 1990s could require several seconds to synchronize after a loss of bits; newer algorithms are able to self-synchronize within the length of a single vocoder frame, as implied by the discussions in Chapters 4 and 5. This makes each frame effectively independent. Since human ears can compensate for the loss of 20 ms of sound, an occasional lost frame does not disrupt communications. (However, if every other frame were lost, then there would be a serious problem.) In 1997 the Frame Relay Forum adopted the FRF.11 Voice over Frame Relay specification to encourage interoperability. Until then the same frame relay access device (FRAD) was needed at both ends. There are many FRADs that support voice on the market.
A number of FRADs now support voice switching, fax demodulation, echo cancellation, silence suppression, and dynamic bit rate adaptation technologies, with support for both the FRF.11 and the ITU-T’s G.729 standard voice algorithms.
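The loss-tolerance argument above can be made concrete with a small sketch (the helper functions are hypothetical, not from the text): what matters is both the length of each audio gap, relative to the roughly 20 ms the ear can ride through, and the overall fraction of speech lost.

```python
# Sketch of frame-loss tolerance with 20-ms vocoder frames. An isolated
# loss leaves only a single 20-ms gap (tolerable); alternating loss also
# produces short individual gaps, but half the speech is gone.

FRAME_MS = 20

def longest_gap_ms(received):
    """received: list of booleans, True where the frame arrived."""
    longest = run = 0
    for ok in received:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return longest * FRAME_MS

def loss_fraction(received):
    return received.count(False) / len(received)

occasional = [True] * 9 + [False] + [True] * 10    # one isolated loss
alternating = [i % 2 == 0 for i in range(20)]      # every other frame lost

print(longest_gap_ms(occasional), loss_fraction(occasional))    # 20 0.05
print(longest_gap_ms(alternating), loss_fraction(alternating))  # 20 0.5
```

The two patterns have the same longest gap, which is why gap length alone is not the whole story: the fraction of speech discarded is what turns tolerable glitches into the "serious problem" the text describes.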
IP

This model uses Web telephones that can be used in conjunction with Internet services to bypass the public telephone network. This approach envelops frames of compressed speech into IP packets (IP encapsulation). Typically, voice is compressed to 8 kbps (or less) using proprietary or, preferably, standard methods. The IP overhead increases the datarate to 14.8 kbps. Some devices use silence compression technology, so that bandwidth is required only when someone is actually talking. (During periods of silence, bandwidth for voice is automatically freed up for other traffic on the enterprise network.) This typically reduces the bandwidth utilization to about 6 kbps (assuming 60 percent silence compression). Some devices also use forward error correction techniques to minimize loss, and jitter-buffer techniques to reduce latency variations. Based on current technology at this writing, the quality of the speech generally needs improvement. Quality is impacted by
both the compression algorithms (relatively less important) and the lack of guaranteed QoS in the IP network (relatively more important). In addition to the current quality issue, Web/IP phones have traditionally suffered from the fact that they are proprietary—hence, the need for standards. Fortunately, as described in previous chapters, in 1995 the ITU-T standardized the ACELP voice algorithms for the coding of speech signals in WANs; ACELP is used for compression rates at or below 16 kbps. ITU-T G.729 (CS-ACELP) is an international standard that compresses the standard 64-kbps PCM streams used in typical voice transmission to as low as 8 kbps. ITU-T G.728 (LD-CELP) is an international standard that compresses to 16 kbps. ITU-T G.723.1 compresses voice to rates as low as 5.3 kbps (it also operates at 6.3 kbps). Early supporters of G.723.1 included Microsoft, Intel, PictureTel, and the major videoconferencing vendors. This standard is used in the H.323 recommendation for conferencing over LANs. G.723.1 is considered to be a good first step and is best suited for intranets and controlled point-to-point IP-based connections. G.729A, a simplified version of G.729, operates at 8 kbps and is therefore slightly better in quality than G.723.1. In recent years, ITU-T G.723.1 has gained major penetration in the VOIP space.
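The codec rates above translate into larger on-the-wire rates once per-packet IP overhead is added, which is how 8 kbps of speech becomes the 14.8 kbps cited earlier. A sketch of the arithmetic follows; the 40-byte IP/UDP/RTP header and the 30-ms packetization interval are illustrative assumptions (the text does not specify the framing, so the resulting figures differ from its 14.8-kbps example):

```python
# Effective on-the-wire bit rate for compressed voice over IP: codec
# payload plus per-packet header overhead. Header size (IP 20 + UDP 8 +
# RTP 12 = 40 bytes) and the 30-ms interval are illustrative assumptions.

IP_UDP_RTP_HEADER_BYTES = 40

def wire_rate_kbps(codec_kbps, packet_ms, header_bytes=IP_UDP_RTP_HEADER_BYTES):
    payload_bits = codec_kbps * packet_ms            # kbps * ms = bits
    header_bits = header_bytes * 8
    return (payload_bits + header_bits) / packet_ms  # back to kbps

for name, rate in [("G.729", 8.0), ("G.728", 16.0), ("G.723.1", 6.3)]:
    print(f"{name}: {rate} kbps codec -> "
          f"{wire_rate_kbps(rate, 30):.1f} kbps on the wire (30-ms packets)")
```

Note how the lower the codec rate, the larger the header overhead is in relative terms, which is why silence suppression and (in later practice) header compression matter so much at these speeds.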
ITU-T H.323 Group of Standards

If VOIP is going to be successful, standardization will be critical. As alluded to in Chapters 1 and 5, H.323 describes terminals, equipment, and services for multimedia communication over LAN and IP networks that do not provide a guaranteed quality of service (see Figure 6.4). H.323 terminals and equipment may carry real-time voice, data, and video, or any combination, including videotelephony. In particular, H.323 allows terminals to signal their network connection needs (setup and teardown) to the switch infrastructure. The LAN over which H.323 terminals communicate may be a single segment, or it may be multiple segments with complex intranet topologies. It should be noted that operation of H.323 terminals over multiple LAN segments (including the Internet) may actually result in poor performance, since the possible means by which QoS might be assured on such types of LANs and internets is beyond the scope of the recommendation. H.323 terminals may be integrated into personal computers or implemented in standalone devices, such as videotelephones. Support for voice is mandatory in the standard, while data and video are optional; but, if supported, the ability to use a specified common mode of operation is required so that all terminals supporting that media type can interwork. Other components in the H.323 series include H.225.0 packetization and synchronization, H.245 control, the H.261 and H.263 video codecs, the G.711, G.722, G.728, G.729, and G.723 audio and speech codecs, and the T.120 series of multimedia communications protocols. H.323 terminals may be used in multipoint configurations and may interwork with H.310 terminals on B-ISDN, H.320 terminals on N-ISDN, H.321 terminals on B-ISDN, H.322 terminals on guaranteed-QoS LANs (e.g., IEEE 802.9), H.324 terminals in PSTN and wireless networks, and V.70 terminals on PSTN. The H.323 standards are covered in more detail in Chapter 7.
Figure 6.4 Terminal and terminal equipment. (The H.323 terminal comprises video I/O equipment with a video codec (H.261, H.263); audio I/O equipment with an audio codec (G.711, G.722, G.723, G.728, G.729) and receive-path delay; user data applications (T.120, etc.); system control (H.245 control, H.225.0 call control, and H.225.0 RAS control) with a system control user interface; the H.225.0 layer; a LAN network interface card; and peripherals and peripheral functions.)
Streaming Audio

Compression is also applicable to streaming audio and video that may also be delivered over IP and/or the Internet (e.g., Internet radio). Streaming is a technique of breaking up a file into pieces and sending those to the user in a sequence. The receiver is able to use or play the multimedia data as it arrives. User software can begin to process the pieces as soon as it receives them. For example, a streaming system would break compressed audio data into many packets, sized appropriately for the bandwidth available between the client and the server. When the client has received enough packets, the user software can be playing one packet, decompressing another, and receiving a third. This approach contrasts with the more traditional (and less user-friendly) method of delaying the playing until the entire file has been downloaded [2]. These systems use an encoding process for compressing and packetizing the datastream and a decoding process for managing buffers
according to available bandwidth, decompressing the packets, and rendering their contents. Real-Time Streaming Protocol (RTSP) and Microsoft's Active Streaming Format may be employed. Most products available today are based on proprietary methods, limiting interoperability. Because coder/decoder systems usually aim at reducing the contents' datarate, they are lossy. This implies that the quality of the contents will be degraded to various degrees. Lossy compression is one of the reasons these systems are proprietary.
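The benefit of the buffering scheme described above can be quantified with a toy model (all names and numbers are illustrative assumptions, not from the text): playback can begin after a small prebuffer of packets arrives, rather than after the entire file.

```python
# Toy model of streaming startup latency: compare the time until a small
# prebuffer has arrived with the time to download the whole file. The
# packet size, file size, and link rate are illustrative assumptions.

def startup_delay_s(file_kb, link_kbps, prebuffer_packets, packet_kb=4):
    """Return (streaming start delay, full-download delay) in seconds."""
    per_packet_s = packet_kb * 8 / link_kbps       # transfer time per packet
    streaming = prebuffer_packets * per_packet_s   # wait for the prebuffer
    download = (file_kb / packet_kb) * per_packet_s
    return streaming, download

stream, full = startup_delay_s(file_kb=2000, link_kbps=56, prebuffer_packets=5)
print(f"streaming start: {stream:.1f} s; full download: {full:.1f} s")
```

The gap between the two numbers grows with file size, which is why streaming was compelling on the dial-up links of the period: the listener hears audio seconds after clicking rather than minutes.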
6.4 QoS Problems and Solutions

Voice over IP is impacted by network congestion. QoS encompasses various levels of bandwidth reservation and traffic prioritization for multimedia and other bandwidth-intensive applications. The topic is discussed at the protocol level in Chapter 8. The specific QoS solution depends on the applications and circumstances at hand. QoS is generally not required for batch applications; it is needed for most, if not all, real-time applications (see Table 6.4 [3]). For nonmultimedia applications, QoS in enterprise networks is useful for allocating and prioritizing bandwidth to specific users. For example, accounting departments may need more bandwidth when they are closing the books each month, and CEOs need more bandwidth during extensive videoconferencing sessions. QoS is also important to supply the streams of data that continuously move across users' computer screens, such as stock tickers, real-time news, or vital data.
Table 6.4 Applications and QoS

Non-real-time data (little or no QoS required): data file transfer; imaging; simulation and modeling; exchange of text e-mail.

Non-real-time multimedia (little or no QoS required): exchange of audio/video e-mail; Internet browsing with voice and video; intranet browsing with voice and video; multimedia playback from a server.

Real-time one-way (various QoS levels): broadcast video; distance learning; surveillance video; animation playback.

Real-time interactive (various medium or high QoS levels): videoconferencing; audioconferencing; process control.

SOURCE: From Reference [3].
Various QoS solutions are available, beginning at the low end with more bandwidth to the LAN desktop via Layer 2 switching. New protocols and standards offer the next level of QoS for enterprise network environments, including 802.1p, 802.1Q, and RSVP [3]. Using ATM as a backbone improves bandwidth between subnetworks, and MPLS adds performance improvements in environments where IP dominates. Besides the capability for bandwidth reservation, QoS is affected by the ability of switches to perform real-time IP routing. Advances in silicon integration are being brought to bear to optimize the performance of third-wave switches and pave the way for wire-speed IP routing capabilities.2 Third-generation switches are optimized for switching at gigabit-per-second speeds. This is possible thanks to advancements in high-performance custom ASICs that can process packets simultaneously and in real time across multiple ports in a switch. Furthermore, the design of ultrawide data paths and multigigabit switching backplanes enables third-wave switches to perform at gigabit speeds through full-duplex connections on all ports without blocking.3

In VOIP, the need for QoS is driven by the plethora of performance-related requirements for speech. The Alliance for Telecommunications Industry Solutions (ATIS) recently looked into requirements for voice over data networks. The following list enumerates some of the requirements for voice support in data networks, including IP [4]:

• Attenuation distortion goals
• Crosstalk goals
• Delay (steady-state) goals
• Delay (variation) goals
• Dropout goals
• Echo return loss
• Group delay
• Gain hits
• Listener echo
• Loss (single frequency)
• Noise (impulse)
• Noise (quantization, including total distortion)
• Noise (steady-state)
• Phase hit
• Phase jitter
• Relative level—output
• Return loss
Voice over IP and the Internet
169
• Signal clipping (power)
• Singing return loss
• Talker echo path loss

Some of these do not necessarily apply for PC-to-PC communication, but could apply when the voice over data network is interconnected with the PSTN.
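As an illustration of one item on this list, delay variation is commonly tracked with the smoothed interarrival-jitter estimator defined in the RTP specification (RFC 1889). The sketch below is our own illustration, not from the text:

```python
# Smoothed interarrival jitter, J += (|D| - J) / 16, per the RTP spec
# (RFC 1889).  `transit_times` are per-packet network transit times
# (receive time minus send timestamp), in milliseconds.

def interarrival_jitter(transit_times):
    j = 0.0
    for prev, cur in zip(transit_times, transit_times[1:]):
        d = abs(cur - prev)   # transit-time difference between adjacent packets
        j += (d - j) / 16.0   # exponential smoothing with gain 1/16
    return j
```

A perfectly steady stream (constant transit time) yields zero jitter; a single 16-ms swing nudges the estimate up by 1 ms, so the estimator reacts gradually rather than to isolated spikes.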
6.5 Protocols for QoS Support for Audio and Video Applications

This section discusses, in a preliminary fashion, a number of approaches to QoS, beginning with the Resource Reservation Protocol (RSVP). RSVP intserv was advocated before diffserv became well known. Both approaches are treated more extensively in Chapter 8.
RSVP Applications

RSVP, along with available network bandwidth, is used in some situations to improve QoS support in IP networks. New applications are now emerging that require such capabilities. For example, some call centers are adding Web telephone access, letting customers reach the carrier’s customer service agent by clicking a “speak to the agent” icon at the Web site. But to deploy this broadly, standards are required so that QoS can be supported and made available as a network service.

RSVP and the underlying intserv model are based on receiver-controlled reservation requests for unicast or multicast communication. RSVP carries a specific QoS request through the network, visiting each node the network uses to carry the stream. At each node, RSVP attempts to make a resource reservation for the stream. To make a resource reservation at a node, the RSVP daemon communicates with two local decision modules, admission control and policy control. Admission control determines whether the node has sufficient available resources to supply the requested QoS. Policy control determines whether the user has administrative permission to make the reservation. If either check fails, the RSVP program returns an error notification to the application process that originated the request. If both checks succeed, the RSVP daemon sets parameters in a packet classifier and packet scheduler to obtain the desired QoS. The packet classifier determines the QoS class for each packet, and the scheduler orders packet transmission to achieve the promised QoS for each stream (see Figure 6.5).

A receiver-controlled reservation allows scaling of RSVP for large multicast groups. This support is based on RSVP’s ability to merge reservation requests as they progress up the multicast tree. The reservation for a single receiver does not need to travel to the source of a multicast tree; rather, it travels only until it reaches a reserved branch of the tree.
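The two-check decision described above can be sketched as follows (illustrative Python; the class and method names are our invention, not an RSVP API):

```python
# Sketch of the per-node reservation decision: policy control (is the user
# allowed?) followed by admission control (is there capacity left?).
# Capacities and user lists are hypothetical placeholders.

class RsvpNode:
    def __init__(self, capacity_kbps):
        self.capacity_kbps = capacity_kbps   # schedulable bandwidth at this hop
        self.reserved_kbps = 0
        self.authorized_users = set()

    def admission_control(self, requested_kbps):
        # enough unreserved bandwidth left to supply the requested QoS?
        return self.reserved_kbps + requested_kbps <= self.capacity_kbps

    def policy_control(self, user):
        # does the requester have administrative permission?
        return user in self.authorized_users

    def reserve(self, user, requested_kbps):
        if not self.policy_control(user):
            return "error: policy control rejected the request"
        if not self.admission_control(requested_kbps):
            return "error: insufficient resources at this node"
        # both checks passed: the classifier/scheduler would now be programmed
        self.reserved_kbps += requested_kbps
        return "reserved"
```

The error returns mirror the notification sent back to the originating application when either check fails.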
170
Chapter Six
Figure 6.5 RSVP routing. (The figure shows a router with three incoming and two outgoing links; within the router, the RSVP daemon consults policy control and admission control and programs the packet classifier and packet scheduler on the path between the transmitter and receiver subnetworks.)
RSVP does not perform its own routing; instead, it uses underlying routing protocols. RSVP is now available in many router products. To ensure delivery through the network, RSVP allows listeners to request a specific quality of service for a particular data flow. Listeners can specify how much bandwidth they will need and what maximum delay they can tolerate; internetworking devices then set aside the bandwidth for that flow. Users are either granted the channel they have requested or are given a busy signal.

RSVP hosts and networks interact to achieve a guaranteed end-to-end QoS transmission. All the hosts, routers, and other network infrastructure elements between the receiver and the sender must support RSVP. They each reserve such system resources as bandwidth, CPU, and memory buffers to satisfy a request. RSVP operates on top of IP (either IPv4 or IPv6), occupying the place of a transport protocol in the protocol stack, but it provides session-layer services (it does not transport any data). The RSVP protocol is used by routers to deliver control requests to all nodes along the paths of the flows. Vendors have implemented RSVP both above and below Winsock. RSVP-aware applications can be developed with Winsock 2, which has a QoS-sensitive API. Another approach is to use an RSVP proxy that runs independently of the real application, making RSVP reservations.

RSVP raises questions about billing for Internet bandwidth. In the current model, ISPs oversell their available capacity, and customers accept slowdowns. Since resource reservation puts a specified demand on bandwidth, overselling would result in unacceptable performance by the admission control module. ISPs will probably offer different service levels, and premiums will be charged for RSVP
reservations. Billing across multiple carriers will also have to be resolved, as will the allocation of computational resources in routers to inspect and handle packets on a prioritized basis. It is unclear whether existing routers would be able to handle wide-scale implementation of RSVP across the whole Internet. RSVP has seen relatively little deployment over the years because it has difficulty scaling beyond 30–40 nodes.
IP Multicast

For the Internet to be a viable real-time audio medium, it needs a method for serving a community of users. IP Multicast is a suite of tools that addresses the bandwidth cost, availability, and service-quality problems facing real-time, large-scale Webcasting. Rather than duplicating data, multicast sends the same information just once to multiple users. When a listener requests a stream, the Internet routers find the closest node that has the signal and replicate it, making the model scalable. IP multicast can run over just about any network that can carry IP, including ATM, frame relay, dial-up, and even satellite links. Originally developed in the late 1980s, it is now supported by all major internetworking vendors, and its implementation and usage are picking up speed.

Reliability is a problem with multicast because there is not necessarily a bidirectional path from the server to the user to support retransmission of lost packets. A string of lost packets could create enough return traffic to negate multicast’s bandwidth savings. For this reason, TCP cannot be used. Among the transport protocols developed for IP Multicast, RTP and RTCP are the main ones for real-time multimedia delivery. RTP adds to each packet header the timing information necessary for data sequencing and synchronization. It does not provide mechanisms to ensure timely delivery or provide QoS guarantees; it does not guarantee delivery, nor does it assume that the underlying network is reliable.

Uninterrupted audio requires a reliable transport layer; nevertheless, existing basic concealment techniques, such as frequency-domain repetition combined with packet interleaving, work reasonably well if packet loss is minimal and occasional departures from perfection can be tolerated. One approach is to use forward error correction (FEC).
Adding some redundant data improves performance considerably; combined with interleaving, this can be a good strategy, but it requires more bandwidth for a given quality level. This can be a challenge on a 28.8-kbps modem connection. On the business side, reliable multicast can be used to increase the performance of many applications that deliver information or live events to large numbers of users, such as financial data or video streaming. Reliable multicast creates higher-value application services for today’s IP-based networks. According to a study conducted by the IP Multicast Initiative (IPMI) in the late 1990s, 54 percent of information systems managers stated that IP Multicast had created new business opportunities for their companies, and these numbers are likely to grow from year to year [5].
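The packet-interleaving idea mentioned above can be sketched in a few lines (our illustration, not from the text): frames are written row-by-row into a block and sent column-by-column, so the loss of one packet leaves evenly spaced single-frame gaps, which concealment handles far better than one long audible gap.

```python
# Block interleaver: treat each group of depth*depth audio frames as a
# square matrix written row-by-row and emit it column-by-column.  Because
# the transform is a transpose, applying it again restores the original
# frame order at the receiver.

def interleave(frames, depth):
    out = []
    block_size = depth * depth
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        for col in range(depth):
            for row in range(depth):
                idx = row * depth + col
                if idx < len(block):
                    out.append(block[idx])
    return out
```

With a depth of 4, frames 0 through 15 leave the sender as four packets carrying (0, 4, 8, 12), (1, 5, 9, 13), and so on; losing the first packet costs frames 0, 4, 8, and 12, each of which can be concealed from its intact neighbors.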
6.6 Internet Telephony Servers (ITSs)
Enterprises used to justify the cost of a private WAN by the cost savings these networks achieved for on-net voice traffic. Now, bandwidth requirements for data networks are so great that organizations can add voice capabilities to these networks at relatively limited incremental cost. A number of vendors offer voice over IP gateways that integrate hardware and software designed and developed to provide seamless voice and fax transmission over an IP network. For example, Micom (now a Nortel subsidiary) offers the V/IP Phone/Fax IP gateway, as shown in Figure 6.6. Cisco Systems has a battery of products in this space.

The gateway approach has the design advantage of installing directly in an organization’s IP network and interoperating with existing PBX and key systems, and thus with existing telephones and fax machines. On the enterprise side of the network, the gateway interoperates with LANs, routers, switches, and WANs. The issue with this and other approaches is that they are proprietary solutions; hence, the exact same equipment must be employed at every site.

Gateways generally combine PC-bus-compatible (e.g., ISA) interface cards, which terminate the PBX, compress the voice, and pack it into IP datagrams, with a suite of call management software. Each gateway typically receives a telephone number and an IP address, which are entered into the gateway’s database. The database provides the mapping of the gateway telephone number to the appropriate IP address (see Figure 6.7).
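A minimal sketch of such a directory lookup follows (hypothetical Python; the entries are modeled loosely on Figure 6.7, and all addresses other than 193.30.18.10 are invented for illustration):

```python
# Hypothetical phone-directory database: each gateway is provisioned with a
# dialing code and an IP address; the originating gateway resolves the
# leading digits of the dialed number to the destination gateway's address.

directory = {
    "808": "193.30.18.10",   # entry taken from Figure 6.7
    "215": "193.30.19.1",    # invented host addresses on the
    "709": "193.30.20.1",    # networks shown in the figure
}

def resolve(dialed_digits):
    """Map the gateway code prefix of the dialed number to an IP address."""
    for code, ip in directory.items():
        if dialed_digits.startswith(code):
            return ip
    return None   # no matching gateway: reject, or fall back to the PSTN
```

A real gateway database would hold thousands of extension-to-address mappings and would need a distributed update mechanism, as noted in the feature list later in this section.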
Figure 6.6 Companywide V/IP deployment. (Courtesy of Micom Communications Corp., copyright 1999.) (The figure shows PBXs and an IBM host with a cluster controller at two sites, each attached to a Marathon 20K Digital Direct Series unit over T1/E1 at 192–256 kbps, interconnected via public frame relay or a private leased line.)
Figure 6.7 Phone directory database maps phone numbers entered by callers to the IP address of the destination gateway site being called. (The figure shows three sites, serving telephone area codes 808, 215, and 709, on networks 193.30.18.0, 193.30.19.0, and 193.30.20.0, connected by routers over a WAN; the gateway serving area code 808, at IP address 193.30.18.10, acts as the phone directory database server for the network.)
In general, the planner needs to take into account the operating system, such as Novell NetWare, MS-DOS, or Microsoft Windows XP. Some gateways support multiple operating systems. The gateway must handle all aspects of the call setup: digitization and compression, IP encapsulation, IP address mapping, and datagram delivery to and from the closest router. The interface cards connect to the PBX, take the voice or fax signals from the existing system, and convert them into a digital format that can be processed by a PC. These cards must be available for analog voice (e.g., older PBXs) and for digital voice (e.g., newer PBXs). Also, the cards must be available in single-channel or multiple-channel configurations (say, four-port T1/E1) and must support traditional termination modes (e.g., FXS/FXO) and signaling (e.g., E&M). Table 6.5 defines some of these terms (also see Figure 6.8).

Table 6.5 Basic PBX Nomenclature

FXS   Foreign exchange station. An interface for connecting a standard telephone set, facsimile machine, key system, or PBX.
FXO   Foreign exchange office. An interface that emulates a telephone and connects to the station side of a PBX or directly to a central office.
E&M   Ear & mouth. A trunking arrangement used for two-way switch-to-switch or switch-to-network connections. An E&M tie line connects PBXs in different locations. The E lead receives signals, while the M lead transmits signals.

Figure 6.8 VF interfaces to be supported by IP gateways. (The figure shows gateways with voice interface cards across an IP network: FXS ports serve telephones and fax machines, an E&M port connects to the trunk side of a PBX, and an FXO port connects to the station side of a PBX, a central office, or CENTREX.)

Digital signal processing capabilities are built into the cards to digitize and compress voice using some proprietary or, preferably, standard compression scheme, such as ITU-T G.729. An enterprise-network-based call would proceed as follows:

1. The caller picks up a standard desk telephone, which is supported by a PBX. The PBX is physically connected to the gateway over one of the access cards.
2. The caller then dials an access code (e.g., 8) that tells the PBX to route this call over the PBX trunk connected to the gateway. Next, the caller types in the branch or extension number (e.g., 892-2345).
3. The gateway routes call setup messages over the enterprise network to the remote gateway. The gateway sets up the call via the PBX, and if the called
party is available, voice bits will be encapsulated within the IP payload. More precisely:
• The access number to the gateway, the destination office number, and the remote extension number trigger a calling-out signal that travels from the telephone through the PBX.
• The calling-out signal goes into either an analog or a digital origination gateway interface card.
• The origination gateway undertakes call setup based on the digits entered. The gateway’s telephone database maps the destination office number to the remote gateway’s IP address.
• The gateway establishes the availability of an open channel on the remote gateway. If a priority queuing protocol such as RSVP is available, the gateway can use it to request allocation of bandwidth on the network. Otherwise, a standard best-effort IP service is utilized.
• During the course of a conversation, the voice signal is digitized and compressed into datagrams. The datagrams are encapsulated into IP protocol data units (PDUs). The PDUs are transmitted from the gateway’s voice interface card through the PC’s (Ethernet) network interface card over the LAN medium and over to the router. The router forwards these PDUs across the network on a priority/RSVP basis or on a best-effort basis.
• The destination gateway handles comparable functions.

Desirable features of a voice over IP gateway include the following:
• Optimal use of bandwidth.
• Support of speech quality.
• Capability for flexible integration into a voice environment (e.g., PBX).
• Enterprise-network readiness (e.g., must support LAN/WAN protocols for direct attachment).
• System-level scalability: must be easy to add PBX-side interfaces to support growth in requirements. This also means supporting the appropriate links (such as FXO, FXS, and E&M).
• Network-level scalability: must be able to support the mapping of a sufficient number of extensions (perhaps several thousand) to the respective IP addresses of the destination gateway. A master database with distributed upgrade may be desirable.
• Capability for attachment to the Internet, if desired, as illustrated in Figures 6.9 and 6.10.
• Support of fax.
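Optimal use of bandwidth is dominated by header overhead when voice is packed into IP datagrams. A back-of-the-envelope calculation (ours, not from the text) makes the point:

```python
# Rough per-call bandwidth estimate for compressed voice carried in
# RTP/UDP/IP.  Assumed values: 40 bytes of combined IP+UDP+RTP headers per
# packet and no link-layer overhead; these are simplifications for
# illustration only.

def voip_bandwidth_kbps(codec_kbps, packet_ms, header_bytes=40):
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_s = 1000 / packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_s / 1000
```

With G.729 at 8 kbps and 20-ms packetization, each packet carries a 20-byte payload behind 40 bytes of headers, so the call consumes about 24 kbps on the wire: the headers triple the codec rate. Longer packetization intervals cut the overhead at the cost of added delay.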
Figure 6.9 Internet-supported IP gateway for voice. (The figure shows three sites, each with a PBX, a gateway, and a premises router, interconnected through an IP-Internet router across the IP-Internet.)
There is interest in IP-voice technology from an Internet-enabled call center perspective. With the development of Web-based commerce, there is demand for an integrated contact point for online voice and data communications that delivers convenient, quick, and high-quality service no matter where the transaction originates. Microsoft’s Telephony Application Programming Interface (TAPI), among other interfaces, can be used in conjunction with Web browsers, Java, and voice over IP to provide a realization of computer telephony integration (CTI) and interactive voice response (IVR).
6.7 The Voice over IP/Internet Market
Proponents have predicted that the market for VOIP will grow rapidly. A report from Britain predicted that 15 percent of all voice calls would be made via the IP/Internet by 2000. Naturally, this was an overestimation. According to original figures from Forrester Research, the U.S. market was expected to be $2 billion by 2004, estimated to be more than 4 percent of U.S. long-distance revenues. Frost & Sullivan was even more bullish on Internet telephony: The firm estimated that spending would approach $2 billion as early as 2001 [6, 7]. These early predictions were all too bullish; Chapter 11 provides more up-to-date market information.

Voice over IP/Internet is developing in three directions: (1) PC-to-PC, where individuals online talk through their PCs; (2) PC-to-phone, where individuals make and receive voice calls and messages while on the Internet; and (3) phone-to-phone, where calls are made and received using normal phone handsets.

While a number of vendors tout products that support voice over the Internet, performance issues, particularly transmission delays and communications-quality audio signals (e.g., MOS around 3.0), have created what some characterize as “a lackluster response, especially among corporate IS managers seeking a business-quality voice over IP solution.”4 Corporate and other IP networks need to be better managed to handle real-time network traffic needs, for example, prioritization and bandwidth allocation. Evolving QoS standards promise to solve some of these problems, but support is limited. In addition, while nearly all the early CPE was proprietary, there is now good acceptance of standards, particularly H.323.

Initially, VOIP vendors were of the small upstart category rather than the mainstay, because the initial interest came from nonbusiness Internet users looking for free telephone service over the Internet. Of late, high-tech leaders such as Cisco and Nortel have entered the market.
Their equipment is generally inexpensive (e.g., less than $10,000 in some cases) and can pay for itself in a few months. Cisco is now a major supplier of H.323 phones for enterprise applications.
6.8 VOIP Regulatory Issues
Some claim that several key regulatory and economic issues need to be resolved before VOIP will see a major deployment, even by new entrants, such as the Regional Bell Operating Companies now tackling the long-distance telephony market. The attitude of some carriers may be similar to the following one that appeared in the trade press in the late 1990s: “Yes, we want to eventually deliver voice-over-IP services, but everyone [regulators and carriers] wants to buy some time right now. There is so much involved when it comes to large-scale services, and this IXC just can’t jump in and end up in a position where delivery is costing a fortune” [8].
The issues related to tariffs have both domestic and international dimensions. According to some, these tariffs will play a major role in determining the cost of offering such services and the prices that long-distance carriers can charge.

On the domestic front, long-distance carriers have to determine how local access charges will affect Internet telephony. Local access charges are the fixed per-call rates that a long-distance carrier must pay to a LEC to carry a long-distance call over the final leg of its journey. In recent years, the long-haul carriers have taken the position that the access charge is too high and that the current scheme is not fair. A reduction in local access charges would help IXCs finance the cost of developing and delivering Internet telephony services. In fact, Internet service providers do not pay access charges at all.

On the international front, the tariff in question, known as the accounting rate, is the same one that has been in the news lately because the FCC has announced that it will fight to lower the fees that U.S. carriers pay their overseas counterparts to deliver international calls in other countries. The ultimate resolution of the accounting rate issue will play a major role in determining the development of VOIP, because in some countries regulators have decided that the Internet is different from the telephone network, so the accounting rate does not apply to Internet long-distance calls. U.S. carriers want all foreign governments to decide that Internet telephony is a new service. Worldwide regulators, on the other hand, lean toward the position that they cannot allow the Internet to be tariff-free, because that would cause established carriers and their public networks to lose significant traffic and revenue [8]. Proponents make the case that a tariff-free market would be best, because products and services would come to market faster and costs would drop.
This would aid the general proliferation of the videoconferencing and whiteboarding services that packet networks enable.

Another issue is that carriers will still have to decide how to package and price Internet telephone services in order to make them appealing to consumers. Some analysts estimate that packetized voice services would be priced about 20 percent below regular long-distance prices, while others believe that not to be the case; that is, consumers will see only a little discount, but in return will get more functionality for about the same cost as today’s long-distance charges [8].

When the Internet infrastructure that must be built up to provide truly reliable phone service on a large scale is taken into account, one is talking about a significant expense. Planners need to understand how the Internet is built: what the components are and who the players are; how it can be expanded and improved; how a new company can offer Internet service; and how new services can be developed. Such an angle is not necessarily as creative as it appears prima facie. Just make the mental association, for argument’s sake, of the Internet with the PSTN we now have in the United States. The PSTN allows any properly equipped and connected user to reach any ear, brain, or person that is suitably connected and equipped with a handset, in the United States or abroad. The Internet allows any properly equipped
Figure 6.10 Micom V/IP phone/fax IP gateway integrates with data and phone equipment to create a voice/fax overlay on an IP network. (Courtesy of Micom Communications Corp., copyright 1997.) (The figure shows headquarters, regional office, field office, and warehouse sites linked by IP routers over a WAN; the sites variously host a PBX or key terminal system, telephone and fax sets, and a DOS PC, NetWare server, or Windows 95 PC running V/IP software with a voice interface card.)
and connected user to reach any server or host that is suitably connected and equipped with appropriate hardware and software, in the United States or abroad. The PSTN is a collection of regional and backbone networks. The Internet is a collection of regional and backbone networks. The PSTN is an overlay of many regional and many backbone networks, and the user can pick a regional and a backbone network of choice; all these networks are interconnected so that any user can call any user. The Internet is an overlay of many regional and many backbone networks, and the user can pick a regional and a backbone network of choice; all these networks are interconnected so that any user can reach any host or server. The PSTN is composed of switching gear belonging to a provider and interconnecting links owned or leased by the same provider. The Internet is composed of routing gear belonging to a provider and interconnecting links owned or leased by the same provider.

Given this demystification of the PSTN, just as somebody may want to plan, design, deploy, own, extend, and interconnect (with existing players) some or all portions of the PSTN, including local or backbone components, so somebody should be able, and may want, to plan, design, deploy, own, extend, and interconnect (with existing players) some or all portions of the Internet, including local or backbone components. At the macroeconomic scale (not from the casual surfer’s point of view), the point remains: If we already have an optimized nationwide PSTN network, why do we need to develop another one of the same scale, reliability, and robustness?

There are a number of ways to charge for Internet phone services. What users may be offered in the next couple of years is a menu of à la carte line items that will appear on the phone bill.
It would include a basic charge for packetized service and extra charges for such additional services as whiteboarding, videoconferencing, operator assistance, tech-support packages, and equipment integration [8]. Additional considerations regarding the international regulatory landscape are provided in Chapter 11.
6.9 Conclusion
The integration of several media is desirable due to the potential economic advantages. However, voice over IP remains somewhat of a technical novelty today, at least in terms of domestic PSTN user-to-user applications. There has been some international penetration (see Chapter 11) and also some switch-to-switch trunk applications, particularly in cable TV and wireless environments. The likely prognosis over the next few years is that the advent of QoS technology and increased enterprise network bandwidth (say, achieved with ATM WAN services) could give a boost to voice over IP in the intranet segment.
References
1. R. Tadjer. “Web Commerce—Voice over IP.” CMP Media Inc., Issue 670 (June 30, 1997).
2. Microsoft ASF White Paper. www.microsoft.com.
3. Infonetics Research, Inc. Quality of Service White Paper. www.atminc.com.
4. ATIS Letter Ballot LB 526: “Transmission Performance Standards Issues.” March 1996.
5. S. Pizzi and S. Church. “Audio Webcasting Demystified.” Web Techniques Magazine 2 (8).
6. News lead, Sydney Morning Herald, August 19, 1997.
7. Yahoo! Technology News, August 18, 1997.
8. R. Tadjer. “Tariffs Put Snarl in Carrier IP Plans.” Telepath (September 1, 1997): 1 ff.
Notes
1. Based on personal communication with M. McLoughlin of GDC, September 1997.
2. Extreme Networks, Inc., promotional material.
3. Ibid.
4. Micom Communications Corp. promotional material.
CHAPTER 7
Signaling Approaches

7.1 Introduction
This chapter describes call control signaling protocols that are applicable to VOIP implementations. Signaling is a mechanism critical to enabling not only call setup but also the delivery of advanced supplementary services. Signaling is needed for on-net call establishment and, in particular, for interworking with the PSTN. What has prevented deployment of VOIP technology at the broad carrier level includes (1) underestimation by enterprise-oriented vendors of the importance, complexity, and purpose of signaling, and of the need to interconnect with the approximately two billion telephone sets already deployed globally, and (2) confusion created by the multiplicity of signaling protocols that have been advanced. Understanding signaling is, therefore, crucial.

The material presented in this chapter is based on various industry sources, as well as on a book written by the senior author in 1994 that discusses signaling systems and technologies [1]. The senior author also covered traditional signaling, including Common Channel Signaling System 7 (CCSS7), in a 1991 telecommunications technology handbook [2]. Figure 7.1 illustrates the pervasiveness of the signaling interactions that must be efficiently supported in a VOIP environment. The VOIP signaling protocol choices and characterizations are:

• ITU-T H.323
  Standard implementations
  Complex protocol
Figure 7.1 VOIP signaling: decomposed gateway approach. (The figure shows a media gateway controller driving a media gateway via MGCP/H.248 and a signaling gateway backhauling PSTN signaling (SS7, ISDN, Q.Sig) via sigtran; together the three elements form one logical VoIP gateway. Global signaling between media gateway controllers uses H.323, SIP-BCP, ISUP/TCP, sigtran ISUP/SCTP, or BICC; PSTN bearers terminate on T1/E1/PRI, OCx, E&M, or FXS/FXO interfaces; media transport options include RTP/UDP/IP over SONET, SONET/WDM, VLAN/GbE, or RPR, and AAL1/3/5 over ATM. An annotation notes 1.4 billion existing users generating $900 billion annually.)
• Media Gateway Control Protocol (MGCP)
  Client/server telephony device
  Used in cable TV networks
  Not for mobility support
• Session Initiation Protocol (SIP)
  “Clean” end-to-end architecture
  Stateless protocol

Three philosophical approaches have arisen in the past five years regarding signaling in VOIP applications:
Figure 7.2 Comparison of H.323, MGCP, and SIP stacks. (The figure shows, for call control, signaling, and gateway control, the H.323 suite (H.225, H.245, Q.931, RAS), SIP, and MGCP running over TCP and UDP, alongside the media protocols RTP, RTCP, and RTSP, all over IP. H.323 versions 1 and 2 support H.245 over TCP, Q.931 over TCP, and RAS over UDP; versions 3 and 4 support H.245 and Q.931 over UDP or TCP and RAS over UDP. SIP supports both TCP and UDP.)
1. All elements (NEs and CPEs) are intelligent; in this case, one would employ ITU-T H.323. 2. The network is intelligent, but the end nodes are dumb; in this case, one would employ MGCP, H.248/MEGACO, or Media Gateway Controller,1 CCSS7, and Bearer-Independent Call Control (BICC). 3. The end nodes are intelligent, but the network is dumb; in this case, one would employ SIP. Obviously, carriers use the first two approaches, while enterprise-oriented vendors tend to use the last approach. H.323, in various versions, has the largest market share to date. It is derived from ISDN signaling protocols and, therefore, has an affinity for PSTN-like infrastructures and PSTN interworking environments. It is a kind of ITU-T Q.931 on TCP/IP. Figure 7.2 [3] depicts the protocol stack for the various candidate choices. Table 7.1 shows the chronology of standardization efforts. The different signaling protocols were developed in different communities to address the need for real-time session signaling over packet-based networks. Each protocol has different origins and different supporters with differing priorities. H.323 was developed in the enterprise LAN community as a videoconferencing technique and has much in common with ISDN signaling protocols, such as Q.931. MGCP/MEGACO comes from the carrier community and is closely associated with control of softswitches, media gateways, and so forth, within a single domain. The IETF developed SIP, reusing many familiar Internet elements—SMTP, HTTP, URLs, MIME, and DNS. Despite the fact that all signaling protocols address
Table 7.1 Chronology of Signaling Standards

Standard                                Date             Proponents                  Comments
ITU-T CCSS7                             1970s            Carriers/ITU-T              Ubiquitous
ITU-T Q.931                             1980s            Carriers/ITU-T              Basis for ATM, Frame Relay, and H.323
ITU-T H.323v1                           May 1996         Multimedia vendors/ITU-T
ITU-T H.323v2                           January 1998
ITU-T H.323v3                           September 1999
ITU-T H.323v4                           November 2000
IETF SGCP (Simple Gateway
  Control Protocol)                     July 1998        Telcordia, Cisco            Superseded
IPCD (IP Device Control)                August 1998      Level 3                     Folded into MGCP 0.1
IETF MGCP 0.1                           October 1998     IETF
MDCP (Media Device Control Protocol)    December 1998    Lucent                      Folded into MEGACO
MEGACO (MGCP+)                          April 1999       IETF/ITU-T
IETF SCTP (RFC 2960)                    October 2000     Carriers                    Carrier concerns
the same problem, they are neither equals nor peers; however, they can coexist, although some debate exists as to what extent [4]. Table 7.2 provides a basic comparison of the protocols [4]. The prevailing opinion among observers is that H.323 will become the enterprise legacy standard, while MGCP and H.248/MEGACO will be used between carrier call agents and other media gateways. Many observers believe that SIP will be used between call agents and between call agents and residential IP phones. In other words, how protocols are implemented will depend on where within a network the VOIP equipment is situated. Clearly, there will need to be coexistence among all these standards—at least in the short term (see Figure 7.3) [5].
Table 7.2 Comparison between Three Major Signaling Protocols

                       H.323       MGCP/MEGACO   SIP
Philosophy             Vertical    Vertical      Horizontal
Complexity             High        High          Low
Scope                  Full        Partial       Simple
Scalability            Poor        Moderate      Good
New service revenues   No          No            Yes
Internet fit           No          No            Yes
CCSS7 compatibility    Poor        Good          Poor
Cost                   High        Moderate      Low
Signaling Approaches
Figure 7.3 Deployment of product signaling standards (as of early 2001). [Figure: a bar chart showing, for H.323 Versions 1 through 4, the Session Initiation Protocol, the Media Gateway Control Protocol, and H.248/MEGACO, the percentage of vendors currently supporting each and the percentage planning to add that support within six months. H.323 is the most widely supported standard, and VoIP vendors foresee a coexistence of several standards. Note: Many vendors support more than one VoIP standard; as a result, the total of all products represented in the chart exceeds 100%.]
7.2
Signaling in Circuit-Switched Networks
We begin the discussion with a brief overview of PSTN signaling. A thorough understanding of traditional signaling is mandatory for all developers in their initial approach to VOIP. Our intention is not, however, to provide such a background here, beyond a brief review of some basics. Circuit-switched telephone networks use a signaling protocol called CCSS7 (or SS7 or C7), defined in the previous section as Common Channel Signaling System 7. In the PSTN, signaling endpoints send and receive CCSS7 signaling messages. There are three kinds of signaling endpoints, as shown in Figure 7.5:
• Service Switching Point (SSP) (also called the central office switch)
• Signal Transfer Point (STP)
• Service Control Point (SCP)
Figure 7.4 provides a pictorial view of the protocol stack. Table 7.3 lists the key International Telecommunication Union (ITU) and American National Standards Institute (ANSI) standards for PSTN signaling.
Figure 7.4 CCSS7 protocol stack. [Figure: TUP, ISUP, and TCAP occupy OSI layers 4 through 7, with TCAP carried over SCCP; MTP Level 3, MTP Level 2, and MTP Level 1 occupy layers 3, 2, and 1, respectively.]
In CCSS7 networks, ISDN User Part (ISUP) signaling messages are used to set up, manage, and release trunk circuits that carry voice calls between SSPs. ISUP messages also carry caller ID information, such as a calling party's name and telephone number. ISUP is used for both ISDN and non-ISDN calls between SSPs. Transaction Capabilities Application Part (TCAP) signaling messages support telephony services, including toll-free (freephone), calling card, local number portability, mobile (wireless) roaming, and authentication services. Mobile services are enabled by information carried in the Mobile Application Part (MAP) of a TCAP message. TCAP supports non-circuit-
Figure 7.5 CCSS7 signaling endpoints in a switched circuit network. [Figure: SSPs are interconnected by voice trunks; SS7 links connect the SSPs to STPs, and the STPs to SCPs.]
Table 7.3 PSTN Key Signaling Standards

              ITU                       ANSI
MTP Level 2   ITU Q.701–Q.703, 1992     ANSI T1.111.2-3, 1992
MTP Level 3   ITU Q.704–Q.707, 1992     ANSI T1.111.4-7, 1992
SCCP          ITU Q.711–Q.714, 1992     ANSI T1.112, 1992
TUP           ITU Q.721–Q.724, 1988     N/A
ISUP          ITU Q.761–Q.764, 1992     ANSI T1.113, 1992
TCAP          ITU Q.771–Q.775, 1992     ANSI T1.114, 1992
related information exchange between signaling points using the Signaling Connection Control Part (SCCP) connectionless service. VOIP networks carry CCSS7-over-IP messages by using protocols defined by the Signaling Transport (sigtran) Working Group of the IETF, the international organization responsible for recommending Internet standards. The sigtran protocols support the stringent requirements for CCSS7 signaling defined by the ITU. The issue of signaling carriage and interworking is revisited later; we now shift our attention to the VOIP framework architecture model.
7.3
H.323 Standards
This section looks at key highlights of the H.323 standard.
Functional Elements In IP telephony networks, signaling information is exchanged between the following functional elements [6]: • Media gateway (MG) A media gateway (MG) terminates voice calls on interswitch trunks from a PSTN, compresses and packetizes voice data, and delivers compressed voice packets to an IP network. For voice calls originating within an IP network, the MG performs the aforementioned functions in reverse order. For ISDN calls from a PSTN, Q.931 signaling information is transported from the MG to the media gateway controller (described below) for call processing. • Media gateway controller (MGC) A media gateway controller (MGC) handles the registration and management of resources at the MGs. It exchanges ISUP messages with SSPs via a signaling gateway (described below). Because vendors of MGCs often use off-the-shelf computer platforms, an MGC is sometimes called a softswitch.
• Signaling gateway A signaling gateway provides transparent interworking of signaling between circuit-switched and IP networks. It may terminate CCSS7 signaling or translate and relay messages over an IP network to an MGC or another signaling gateway. Because of their critical role in integrated voice networks, signaling gateways are often deployed in groups of two or more to ensure high availability. An MG, signaling gateway, and MGC may be separate physical devices or integrated in any combination (see Figure 7.6). Figure 7.7 depicts a generalized gateway, which combines an MG and signaling gateway. Figure 7.8 depicts an example of a specific gateway that combines an MG and signaling gateway. Figure 7.9 depicts an MGC (in H.323 language, it is called a gatekeeper).
H.323 Basics ITU-T Recommendation H.323 Version 4 states that H.323 “describes terminals and other entities that provide multimedia communications services over Packet Based Networks (PBNs) which may not provide a guaranteed Quality of Service. H.323 entities may provide real-time audio, video and/or data communications.” Figure 7.10 depicts the classical H.323 architecture model. H.323 is an “umbrella” standard that covers multimedia communications over LANs. H.323 covers (1) call establishment and teardown and (2) conferencing (audio, visual, and multimedia). H.323 defines sophisticated applications that support multimedia conferencing, including whiteboarding, data collaboration, and videoconferencing. Basic call features include call hold, call waiting, call transfer, call forwarding, caller identification, and call park. Figure 7.11 depicts the protocol model, which consists of
Figure 7.6 Example of a VOIP network configuration. [Figure: media gateways terminate voice trunks from SSPs and connect to an IP network; signaling gateways terminate the SS7 links from the STPs (which also reach an SCP) and relay signaling to media gateway controllers attached to the IP network.]
Figure 7.7 Generalized gateway. [Figure: a signaling plane (SS7oIP, H.323, SIP, MGCP/MEGACO, ISUP, B-ISUP, Q.2931, ISDNoIP, SS7oMPLS, ISDNoMPLS, VoATM, and POTS stacks), a media plane (e.g., RTP/UDP/IP over Ethernet with G.723.1 for LAN/H.323, PCM on ATM AAL1/AAL2/AAL5, ISDN 2B+D or 23B+D, and POTS over copper or T1), and common services for call control, resource management, and routing.]
• Parts of H.225 [Registration, Admission, Status (RAS)] and Q.931
• H.245
• RTP and RTCP
• Audio and video coder/decoder, or codec, standards
H.323 models consist of terminals, gateways, gatekeepers, and MCUs. See Figures 7.12 and 7.13 for examples.
H.323 Entities: Terminals Terminals are end systems (or endpoints) on a LAN (see Figure 7.14). The terminal embodies capabilities to support real-time two-way communications with another H.323 entity. The terminal must support (1) voice and audio codecs (such as those described in Chapter 5)3 and (2) signaling and setup (Q.931, H.245, and RAS). Optional support includes video codecs and data (whiteboarding). Audio codecs (G.711, G.723.1, G.728, and so on) and video codecs (H.261 and H.263) compress and decompress media streams. Media streams are transported on RTP and RTCP; RTP
Figure 7.8 Specific gateway. [Figure: on the media plane, a PC/H.323 phone on the LAN interface (RTP and RTCP over UDP/IP) connects through switching logic/fabric and DSP-card voice codecs to a PSTN phone on the voice interface. On the control plane, H.225.0 RAS (over UDP) and H.225.0 call signaling and H.245 control signaling (over TCP) on the LAN side interwork, through a call manager with billing services and a DNS client, with the PSTN signaling stack: physical interface, link control layer (e.g., LAPD), and call control layer (e.g., Q.931).]
Figure 7.9 Gateway controller, or gatekeeper. [Figure: the control-plane protocol stack comprises H.225.0 RAS (server), H.225.0 call signaling, and H.225.0 control signaling over UDP, TCP, and IP on the LAN interface, coordinated by a gatekeeper manager with billing, directory, and security services and policy/call management.]
carries actual user media content, while RTCP carries status and control information. RTP and RTCP are carried on UDP, and signaling is transported reliably over TCP. RAS supports registration, admission, and status; Q.931 handles call setup and termination; and H.245 provides end system–to–end system capabilities exchange [7].
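As a concrete illustration of the media plane, the fixed 12-byte RTP header that precedes each voice packet can be packed in a few lines of Python. This is a sketch only; the payload-type value (0 for G.711 mu-law/PCMU) and the sample field values are illustrative, not taken from the text.

```python
import struct

def pack_rtp_header(seq, timestamp, ssrc, payload_type=0, marker=0):
    """Pack the 12-byte fixed RTP header.

    Byte 0 carries version (2), padding, extension, and CSRC count
    (all zero here); byte 1 carries the marker bit and the 7-bit
    payload type; then come the 16-bit sequence number, 32-bit
    timestamp, and 32-bit SSRC, all in network byte order.
    """
    byte0 = 2 << 6                                   # version 2, P=X=CC=0
    byte1 = ((marker & 1) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

# A 20-ms G.711 packet carries 160 samples, so the timestamp advances
# by 160 per packet while the sequence number advances by 1.
hdr = pack_rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD)
```

In a real gateway this header, followed by the codec payload, would be handed to a UDP socket, while RTCP reports travel on a companion port.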
H.323 Entities: Gateways Gateways provide interfaces between the LAN and the circuit-switched network, as well as translation between entities in a packet-switched network (for example, an
Figure 7.10 Classical H.323 architecture model. [Figure: within the scope of H.323, H.323 terminals, an H.323 gatekeeper, an H.323 MCU, and an H.323 gateway share a LAN (packet technology). The gateway bridges to circuit technology: the PSTN, with POTS, V.70, and H.324 terminals, and ISDN, with H.320 terminals and ISDN phones.]
Figure 7.11 H.323 protocol stack. [Figure: media audio codecs (G.711, G.723, G.729) and video codecs (H.261, H.263) run over RTP, with RTCP alongside, on UDP; data/fax uses T.120 and T.38 on TCP; call control and signaling comprise H.225 RAS on UDP and H.225/Q.931 and H.245 on TCP; everything runs over IP.]
Figure 7.12 H.323 domain (implementation). [Figure: an H.323 domain encompasses H.323 terminals and an L2 switch on an enterprise network, LSR/routers in the core, a gatekeeper, an MCU, and gateways to wireless networks, the PSTN, and ISDN.]
Figure 7.13 H.323 domain (logical). [Figure: a terminal, gatekeeper, multipoint control unit, and gateway sit on packet-based networks; the gateway connects to circuit-switched networks.]
IP/MPLS network) and a circuit-switched network (for example, a PSTN network). They also can provide translation of transmission formats and communication procedures, interworking between H.323 and non-H.323 endpoints, and codec conversion (compression and packetization of voice) [7]. Gateways translate communication procedures and formats between networks and handle call setup and clearing. Various types of gateways exist, as shown in Figure 7.12; however, the most common is an IP/PSTN gateway. Naturally, the gateway must support the same protocol stack described above on the local side.
Figure 7.14 H.323 terminal. [Figure: video I/O equipment feeds the video codec (H.261, H.263); audio I/O equipment feeds the audio codec (G.711, G.722, G.723, G.728, G.729), with receive-path delay for synchronization; user data applications use T.120, and so on. System control, with its user interface, comprises H.245 control, H.225.0 call control, and H.225.0 RAS control. Media pass through the H.225 layer over RTP/RTCP on UDP, signaling over TCP and UDP, all via the local area network interface (10/100/1000-Mbps Ethernet).]
H.323 Entities: Gatekeepers Gatekeepers are optional. If present, however, they must perform certain functions. Gatekeepers manage a zone, which is a collection of H.323 devices. Usually, there is one gatekeeper per zone; an alternate gatekeeper might exist for backup and load balancing. Gatekeepers typically are software applications implemented on a PC or workstation, but they can also be integrated into a gateway or terminal. Some protocol messages pass through the gatekeeper; others pass directly between the two endpoints. The more messages routed through the gatekeeper, the greater its load and responsibility (that is, more information and more control). It is important to note that media streams never pass through the gatekeeper [7]. Mandatory gatekeeper functions include
• Address translation (routing)
• Admission control
• Minimal bandwidth control–request processing
• Zone management
Optional gatekeeper functions include
• Call control signaling (direct handling of Q.931 signaling between endpoints)
• Call authorization, bandwidth management, and call management using some policy
• Gatekeeper management information
• Directory services
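The mandatory functions above (address translation, admission control, bandwidth control) can be sketched as a toy zone manager. The class name, reject reasons, addresses, and the single zone-bandwidth policy below are illustrative assumptions, not taken from H.225.0.

```python
class Gatekeeper:
    """Toy sketch of one H.323 zone: registration (RRQ -> RCF/RRJ)
    and admission control (ARQ -> ACF/ARJ)."""

    def __init__(self, zone_bandwidth_kbps):
        self.capacity = zone_bandwidth_kbps
        self.in_use = 0
        self.registry = {}            # endpoint alias -> transport address

    def rrq(self, alias, address):
        """Registration Request: bind an endpoint alias to its address."""
        if alias in self.registry and self.registry[alias] != address:
            return "RRJ"              # alias already bound elsewhere
        self.registry[alias] = address
        return "RCF"

    def arq(self, caller, callee, bandwidth_kbps):
        """Admission Request: resolve the callee, then check zone bandwidth."""
        if callee not in self.registry:
            return ("ARJ", "calledPartyNotRegistered")
        if self.in_use + bandwidth_kbps > self.capacity:
            return ("ARJ", "requestDenied")          # zone bandwidth exhausted
        self.in_use += bandwidth_kbps
        # Address translation: the caller gets the callee's address back.
        return ("ACF", self.registry[callee])

gk = Gatekeeper(zone_bandwidth_kbps=128)
gk.rrq("gwA", "10.0.0.1:1720")
gk.rrq("gwB", "10.0.0.2:1720")
```

Note that, as in the text, only signaling touches this component; media streams would flow directly between the endpoints it admits.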
H.323 Entities: Multipoint Control Units (MCUs) MCUs are end systems that support conferences between three or more endpoints. The MCU can be a standalone device (for example, a PC), or it can be integrated into a gateway, gatekeeper, or terminal. Typically, the MCU consists of a multipoint controller (MC) and a multipoint processor (MP). The MC handles control and signaling for conference support, while the MP receives streams from endpoints, processes them, and returns them to the endpoints in the conference. MCUs can be centralized or decentralized. Centralized MCUs handle both signaling (MC) and stream processing (MP); decentralized MCUs handle only signaling, because the streams go directly between endpoints (in this case, MCUs function without MP).
Example of Signaling Signaling between endpoints can be supported directly or through a gatekeeper (see Figure 7.15).
Figure 7.15 Supported signaling approaches. [Figure: with direct call signaling, RAS signaling runs between each gateway and the gatekeeper, while H.225/H.245 call signaling and RTP voice bearer traffic flow directly between the gateways. With gatekeeper-routed call signaling, RAS again runs between gateway and gatekeeper, H.225 and H.245 signaling is routed through the gatekeeper (e.g., for prepaid cards), and RTP voice bearer traffic still flows directly between the gateways.]
Figure 7.16 Endpoint signaling through a gateway. [Figure: a terminal exchanges call signaling (Q.931), call control (H.245), and the media stream (RTP) with the gateway; the gatekeeper provides address translation, admission control, and bandwidth control via RAS. Mandatory gatekeeper services: address translation, admission control, bandwidth control, and zone management. Optional services: call control signaling, call authorization, bandwidth management and reservation, call management, and directory services.]
Here, we briefly illustrate signaling interactions [7]. Figure 7.16 depicts an endpoint signaling through a gateway. Figure 7.17 shows a gatekeeper-routed call signaling process. Figure 7.18 shows the use of two gateways. Figure 7.19 further illustrates H.245-provided capabilities exchange. The text that follows expands on the interaction shown in Figure 7.17 for a gatekeeper-routed call signaling (Q.931/
Figure 7.17 Gatekeeper-routed call signaling (Q.931). [Figure: call signaling (Q.931) between terminal and gateway is routed through the gatekeeper, which provides address translation, admission control, and bandwidth control (RAS); call control (H.245) and the media stream (RTP) flow directly.]
Figure 7.18 Gatekeeper signaling over two gateways. [Figure: the terminal's call signaling (Q.931) is routed through gatekeeper A, which answers the ARQ (Admission Request, "I have a call for 212-555-1212") with an ACF (Admission Confirm, "Use Gateway B at IP location X.Y.Z.W"); H.323 call setup then proceeds to gateway B, with call control (H.245) and the media stream (RTP) flowing between the gateways.]
Figure 7.19 Gatekeeper-routed call signaling (Q.931/H.245). [Figure: both call signaling (Q.931) and call control (H.245), together with RAS address translation, admission control, and bandwidth control, are routed through the gatekeeper; only the media stream (RTP) flows directly between terminal and gateway.]
H.245) interaction between client A and client B [7]. This interaction supports the establishment of a call between client A and client B. The steps are as follows:
1. Discover and register with the gatekeeper (RAS channel; see Figure 7.20).
• Gatekeeper discovery via RAS works as follows:
- The client transmits a multicast Gatekeeper Request packet, which essentially asks: Who is my gatekeeper?
- The gatekeeper responds with either a Gatekeeper Confirmation packet or a Gatekeeper Reject packet.
• Registration with the gatekeeper via RAS works as follows:
- The client notifies the gatekeeper of its address and aliases.
- The client transmits the Gatekeeper Registration Request packet.
- The gatekeeper responds with either a Gatekeeper Registration Confirmation or a Registration Rejection packet.
- In the network deployment shown in Figure 7.20, both client A and client B register with gatekeeper A.
Figure 7.20 Registration, Admission, Status (RAS). [Figure: each gateway learns of the gatekeeper via static configuration and registers its name or E.164 address with a Registration Request (RRQ); gatekeeper A answers with a Registration Confirm (RCF) or Registration Reject (RRJ). RAS runs over UDP, port 1719. The RAS message pairs are: GRQ/GCF/GRJ (discovery: unicast or multicast, find a gatekeeper); RRQ/RCF/RRJ (registration: endpoint alias/IP address binding, endpoint authentication); ARQ/ACF/ARJ (admission: destination address resolution, call routing); LRQ/LCF/LRJ (location: inter-gatekeeper communication); and DRQ/DCF/DRJ (disengage: flush call state).]
2. Set up routed calls between endpoints through the gatekeeper (Q.931 call signaling).
• Call admission via RAS is handled as follows:
- Client A initiates an Admission Request packet, which essentially asks: Can I make this call? (The packet includes a maximum bandwidth requirement for the call.)
- The gatekeeper responds with an Admission Confirmation packet.
- The bandwidth for the call is either confirmed or reduced.
- The gatekeeper provides its call signaling channel address.
• Call setup through the gatekeeper (Q.931) is handled as follows:
- Client A sends a call setup message to the gatekeeper.
- The gatekeeper routes the message to client B.
- If client B accepts, it initiates its own Admission Request exchange with the gatekeeper.
- If the gatekeeper accepts, client B sends a connect message to client A, specifying the H.245 call control channel for capabilities exchange.
3. Undertake initial communications and capabilities exchange (H.245 call control).
• Capabilities exchange (H.245) is handled as follows:
- The clients exchange call capabilities with the Terminal Capability Set message, which describes each client's ability to transmit media streams—that is, the audio and video codec capabilities of each client.
- If conferencing, MCU determination is negotiated during this phase.
- After the capabilities exchange, the clients will have a compatible method for transmitting media streams, and multimedia communication channels can then be opened.
4. Establish multimedia communication and call services (H.245 call control).
• Establishing multimedia communication works as follows:
- To open a logical channel for transmitting media streams, the calling client transmits an Open Logical Channel message (H.245).
- The receiving client responds with an Open Logical Channel Acknowledgment message (H.245).
- The media streams are transmitted over an unreliable channel; the control messages are transmitted over a reliable channel.
- Once the channels are established, either the client or the gatekeeper can request call services—that is, the client or the gatekeeper can initiate an increase or decrease of the call bandwidth.
5. Call termination (H.245 call control and Q.931 call signaling).
• Call termination is handled as follows:
- Although either party can terminate the call, assume that client A terminates the call.
- Client A completes transmission of the media and closes the logical channels used to transmit the media.
- Client A transmits the End Session command (H.245).
- Client B closes the media logical channels and transmits the End Session command.
- Client A closes the H.245 control channel.
- If the call signaling channel is still open, a Release Complete message (Q.931) is sent between the clients to close this channel.
Figure 7.21 summarizes the call setup steps, while Figure 7.22 depicts an interaction between multiple gatekeepers. Figure 7.23 depicts the FastStart call flow [8]. The following features have been added in H.323 Version 2:
1. H.235, for security and authentication (that is, passwords for registration with the gatekeeper)
2. H.450.x, for supplementary services (for example, call transfer and forwarding)
3. Fast call setup, which bypasses some setup messages and is triggered by the Q.931 Fast Start message that contains basic capabilities
4. A mechanism for specifying alternative gatekeepers to endpoints
5. The ability of a gatekeeper to request forwarding of Q.931 information on direct routed calls
6. Better integration of T.120 (an optional standard for data)—for example, a T.120 channel that opens as any H.323 channel should
H.323 Version 3 defines a number of supplementary services that are grouped under the H.450 umbrella. Key supplementary services are
• H.450.4 (Call Hold)
• H.450.5 (Call Park and Call Pickup)
• H.450.6 (Call Waiting)
• H.450.7 (Message Waiting)
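The five-step walkthrough above can be condensed into a single message trace. The sketch below is a deliberate simplification: the message names follow the text, while the function shape and the min-based modeling of the gatekeeper's "confirmed or reduced" bandwidth decision are illustrative assumptions.

```python
def gatekeeper_routed_call(bandwidth_request, bandwidth_limit):
    """Trace the message flow of a gatekeeper-routed H.323 call:
    RAS admission, Q.931 setup/connect, H.245 capability exchange
    and logical channels, then H.245/Q.931 teardown."""
    trace = []
    # Step 2 (admission): the gatekeeper confirms or reduces the bandwidth.
    granted = min(bandwidth_request, bandwidth_limit)
    trace.append(("A->GK", "ARQ", bandwidth_request))
    trace.append(("GK->A", "ACF", granted))
    # Step 2 (setup): Q.931 call signaling routed through the gatekeeper.
    trace += [("A->GK", "Setup", None), ("GK->B", "Setup", None),
              ("B->A", "Connect", None)]
    # Step 3: H.245 capability exchange; step 4: logical channels.
    trace += [("A<->B", "TerminalCapabilitySet", None),
              ("A->B", "OpenLogicalChannel", None),
              ("B->A", "OpenLogicalChannelAck", None)]
    # Step 5: teardown via H.245 EndSession, then Q.931 ReleaseComplete.
    trace += [("A->B", "EndSessionCommand", None),
              ("B->A", "EndSessionCommand", None),
              ("A->B", "ReleaseComplete", None)]
    return granted, trace

granted, trace = gatekeeper_routed_call(bandwidth_request=128,
                                        bandwidth_limit=64)
```

Running the sketch with a 128-kbps request against a 64-kbps limit shows the bandwidth being reduced in the ACF before any Q.931 signaling takes place.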
Figure 7.24 identifies the strengths and weaknesses of H.323.
7.4
MGCP
IETF RFC 2705 (October 1999) states that the “Media Gateway Control Protocol [MGCP] is a protocol for controlling telephony gateways from exter-
Figure 7.21 Example of H.323 call-setup steps. [Figure: both endpoints have previously registered with the gatekeeper. (1) Terminal A initiates the call with an ARQ to gatekeeper A; (2) the gatekeeper replies with an ACF containing the information terminal A needs to contact terminal B; (3) terminal A sends a SETUP message to terminal B; (4) terminal B responds with a Call Proceeding message and (5) contacts the gatekeeper for permission with its own ARQ; (6) the gatekeeper replies with an ACF; (7) terminal B sends an Alerting and (8) a Connect message. Terminals A and B then exchange H.245 messages to determine master/slave and terminal capabilities and to open logical channels, after which the RTP media path is established. This diagram illustrates a simple point-to-point call setup in which call signaling is not routed through the gatekeeper; in the network view, the terminals sit on Ethernet segments behind gateways A and B across an IP/MPLS network, with gatekeepers A and B exchanging LRQ/LCF messages.]
Figure 7.22 Interaction between multiple gatekeepers. [Figure: H.323 clients C1 and C2 are in gatekeeper zone A, and client C5 is in gatekeeper zone B, connected across an IP/MPLS network; clients C3 and C4 are on local PSTNs reached through trunking gateways, with a long-haul PSTN between them. Call cases: C1–C2 is an on-net call; C1–C3 is a handoff via the local gateway; C1–C4 is either a handoff via the local gateway and the long-distance PSTN or a handoff to gatekeeper B (the remote gatekeeper) with a local handoff at the remote end via a gateway; C1–C5 is a handoff to gatekeeper B followed by an on-net call.]
nal call control elements called media gateway controllers or call agents.” MGCP is a “master/slave” protocol; it assumes limited intelligence at the edge (endpoints) and intelligence at the core (call agent). It differs from, but interoperates with, SIP and H.323, which are peer-to-peer protocols. MGCP is also a media/device control protocol; services are provided by network elements. It was developed by Telcordia, Cisco, and Level 3; Lucent and Nortel got the ITU-T to work with the IETF to generate the H.248/MEGACO extension to MGCP. MGCP 0.1 is the result of fusing SGCP 1.2 (a Telcordia-sponsored protocol) with IPDC (a Level 3–sponsored protocol). MGCP defines interactions between MGs and MGCs; clearly, the MGCs control the MGs. MGCP uses the Session Description Protocol (SDP) (from RFC 2327) to describe media capabilities. (SDP is also used by SIP and MEGACO/H.248.) MGCP can be used as a standalone protocol; however, current implementations pair it with other architectures and protocols, such as H.323 and SIP. MGCP components include call agents (MGCs) and media gateways (MGs), as depicted in Figure 7.25. MGCP is used between call agents and MGs. The call agent, or MGC, provides call signaling, control, and processing intelligence to the gateway and
Figure 7.23 H.323 Version 2 end-to-end FastStart call flow. [Figure: FastStart allows endpoints to establish a basic connection in as little as one round trip. A FastStart element, containing the OpenLogicalChannel proposals normally carried in H.245 control channels, is appended to the H.225 Setup message; the Connect message returns the accepted proposals, and the logical channels (RTP and RTCP) open immediately, establishing the media path (UDP). H.245 procedures may be established later, and an endpoint may refuse fast connect and revert to normal H.245 procedures. The flow assumes the endpoints (clients) know each other's IP addresses; each gateway still exchanges an ARQ ("I have a call for 212-555-1212") and ACF ("Yes, you can use Gateway B, IP address Y.Y.Y.Y") with gatekeeper A.]
Figure 7.24 Strengths and weaknesses of H.323.
H.323 strengths: ITU standards-based, with seven years of experience; a mature protocol with many large-scale deployments; major vendor support; includes standards for supplementary services; the network retains call state for the duration of the call, giving greater call control; application services are available through gatekeeper and application platforms.
H.323 limitations: keeping call state in the network increases the cost to scale (but avoids a single point of failure); limited deployment of softphones, many of which are proprietary; modeled after network layer standards (vs. application layer) and, hence, complex.
sends and receives commands to and from the gateway. The gateway provides translations between circuit-switched networks and packet-switched networks; it also sends notifications to the call agent about endpoint events and executes commands from call agents. MGCP's primitives are as follows:
• Endpoint configuration (EPCF), which instructs a gateway about the encoding characteristics of an endpoint's line side
• Notification request (RQNT), which instructs a gateway to watch for specific events
• Notify (NTFY), which informs call agents of requested events
• CreateConnection (CRCX), which creates a connection to an endpoint inside a gateway
Figure 7.25 MGCP. [Figure: a call agent, or media gateway controller (MGC), controls its media gateways (MGs) via MGCP and communicates with other call agents via SIP or H.323. MGCP provides endpoint control, media establishment, scalable telemetry, and an event-action table.]
• ModifyConnection (MDCX), which changes the parameters associated with an established connection
• DeleteConnection (DLCX), which deletes an existing connection; the ACK message returns any call statistics
• AuditEndPoint (AUEP) and AuditConnection (AUCX), of which the former audits the status of an endpoint and the latter audits any associated connections
• RestartInProgress (RSIP), which notifies call agents that an endpoint (or a group of endpoints) is taken out of service
A simplified call flow can be described as follows [3]:
1. Phone A goes off hook, after which gateway A sends a signal to the call agent.
2. Gateway A generates a dial tone and collects the dialed digits.
3. The digits are forwarded to the call agent.
4. The call agent determines how to route the call.
5. The call agent sends commands to gateway B.
6. Gateway B calls phone B.
7. The call agent sends commands to both gateways to establish RTP and RTCP sessions.
At the time of writing, MGCP was a working document, not a standard. Both the IETF and the ITU have decided to jointly mandate a single standard, endorsed by both communities, known as MEGACO/H.248 (MEGACO is the IETF name; H.248 is the ITU name). MEGACO is covered in more detail in Section 7.7.
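Because MGCP commands are plain text (a verb, a transaction identifier, an endpoint name, and the protocol version, followed by parameter lines), a CreateConnection command such as the one in the primitive list above can be composed with simple string handling. The endpoint name, call ID, and parameter values below are illustrative, and the layout is a sketch in the spirit of RFC 2705 rather than a verified implementation.

```python
def crcx(transaction_id, endpoint, call_id, mode="recvonly", codec="PCMU"):
    """Build a CreateConnection (CRCX) command as a call agent might
    send it to a media gateway (all values here are illustrative)."""
    lines = [
        f"CRCX {transaction_id} {endpoint} MGCP 1.0",
        f"C: {call_id}",           # call identifier
        f"L: p:20, a:{codec}",     # local options: 20-ms packets, codec
        f"M: {mode}",              # connection mode
    ]
    return "\r\n".join(lines) + "\r\n"

# Hypothetical endpoint name and call ID for illustration only.
msg = crcx(1234, "aaln/1@gw1.example.net", "A3C47F21456789F0")
```

The gateway's acknowledgment would carry a connection identifier and an SDP description of the RTP session, which the call agent then relays to the far-end gateway in its own CRCX/MDCX commands.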
7.5
SIP
According to IETF RFC 2543, SIP is an application layer–signaling protocol that defines the initiation, modification, and termination of interactive multimedia communication sessions between users. Figure 7.26 provides a snapshot of key SIP concepts [8]. Currently undergoing further standardization by the IETF SIP Working Group, SIP is designed for (1) integration with existing IETF protocols, (2) scalability and simplicity, (3) mobility, and (4) easy feature-and-service creation. It is also designed to be fast and simple in the core of an enterprise network. It can support basic call features (call waiting, call forwarding, call blocking, and so on), unified messaging, call forking, click to talk, instant messaging, and find me/follow me. SIP is a peer-to-peer protocol (as are other Internet protocols) that permits a client to establish a session with another client. By contrast, MEGACO is a master/slave protocol. Figure 7.27 identifies key SIP components.
Figure 7.26 Key SIP concepts. [Figure: SIP components include user agents, proxy servers, a redirect server, a register server, and a location server on a LAN (layer 2 switch), with a gateway to the PSTN.]
SIP is a signaling protocol for creating, modifying, and terminating sessions—such as IP voice calls or multimedia conferences—with one or more participants in an IP network. Although the sigtran protocols mentioned previously are currently the protocols of choice for interworking IP networks with the PSTN, SIP is the protocol of choice for the converged communications networks of the near future. SIP provides the following functions [9]:
• Name translation and user location, to ensure that a call reaches the called party regardless of location. SIP addresses users with an e-mail-like address. Each user is identified through a hierarchical URL built around such elements as a user's telephone number or host name (for example, SIP:[email protected]). Because of this similarity, SIP URLs are easy to associate with a user's e-mail address.
• Feature negotiation, which allows all parties involved in a call to negotiate and agree on the features supported, recognizing that not all participants may be able to support the same features. For example, a user on a mobile voice-only telephone and two users on video-enabled devices engaged in a session would agree to support voice features only. When the mobile voice-only user leaves the call, the remaining participants may renegotiate session features to activate video communications.
• Call participant management, so that during a session, a participant can bring other users into the call or transfer, hold, or cancel connections.
Figure 7.27 SIP components. [Figure: SIP user agents on LAN A and LAN B communicate peer to peer across an intranet. Key points: SIP follows a client-server model in which the user agent client (UAC) initiates sessions and the user agent server (UAS) responds to session requests (user agent = UAC + UAS). SIP was defined by the IETF MMUSIC working group as RFC 2543 in March 1999, with work continuing within the SIP WG; it is ASCII text-based for easy implementation and debugging. SIP is an application layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants; it uses URL-style addresses and syntax, MIME definitions for multimedia, and is a simple, extensible protocol in which methods define a transaction, headers describe it, and the body carries SDP. SIP sessions include Internet multimedia conferences, Internet telephone calls, and multimedia distribution; members can communicate via multicast, via a mesh of unicast relations, or a combination. SIP reuses several existing IETF protocols for message formatting (HTTP 1.1), media (RTP), name resolution and mobility (DHCP and DNS), and application encoding (MIME).]
SIP Protocol Components SIP has two basic components: the user agent and the network server. The user agent is effectively the end-system component for the call, while the network server is the device that handles the signaling associated with multiple calls. The user agent consists of the user agent client (UAC), which initiates calls, and the user agent server (UAS), which answers calls. This architecture allows peer-to-peer calls to be made using a client/server protocol. The network server element comes in three forms: the stateful proxy server, the stateless proxy server, and the redirect server. The main function of these servers is to provide name resolution and user location, because callers are unlikely to recall the IP address or host name of called parties. By using an easier-to-remember e-mail-like address, the caller's user agent can identify a specific server (or server cluster) to resolve the called-party address information (see Figure 7.28).
Chapter Seven
Figure 7.28 SIP Network servers.
SIP network servers are optional SIP components that sit in the IP/MPLS network between user agents:

• Redirect server: returns a redirect to the UA for direct routing; does not route SIP messages.
• Proxy server: handles routing of SIP signaling; does not initiate SIP messages.
• Registrar server: allows SIP UAs to receive calls; not needed to make outgoing calls.
SIP provides its own reliable transfer mechanism independent of the packet layer. For this reason, SIP does not require the services of the sigtran SCTP protocol and functions reliably over an unreliable datagram protocol, such as UDP.
SIP-T

SIP for telephones (SIP-T) is a mechanism that allows SIP to be used for ISUP call setup between CCSS7-based PSTNs and SIP-based IP telephony networks. SIP-T carries an ISUP message payload in the body of an SIP message. The SIP header carries translated ISUP-routing information. Also, SIP-T specifies the use of the SIP INFO method for effecting in-call ISUP signaling in IP networks.
Signaling Approaches
The user agent is an application that initiates, receives, and terminates calls. As mentioned previously, there are two types: the UAC and the UAS. Both types can terminate a call. The proxy server is an intermediary program that acts as both a server and a client to make requests on behalf of other clients. Requests are serviced internally or by passing them on, possibly after translation, to other servers. The server interprets, rewrites, or translates a request message before forwarding it. The location server is used by a SIP redirect or proxy server to obtain information about a called party’s possible location(s). The redirect server accepts a SIP request, maps the address into zero or more new addresses, and returns these addresses to the client. Unlike a proxy server, the redirect server does not initiate its own SIP request. Unlike a UAS, the redirect server does not accept or terminate calls [9]. The registrar server is a server that accepts REGISTER requests. This server may support authentication, and it is typically colocated with a proxy or redirect server and may offer location services.

SIP components communicate by exchanging SIP messages, as depicted in Figure 7.29 [3]. SIP borrows most of its syntax and semantics from HTTP. An SIP message looks like an HTTP message, with message formatting, header, and Multipurpose Internet Mail Extension (MIME) support. The SIP address is identified by a SIP URL, and the URL has the following format: user@host. Establishing communication with SIP usually takes place in six steps (see Figure 7.30 [3]):

1. Registering, initiating, and locating the user
2. Determining the media to use, which involves delivery of a description of the session to which the user is invited
3. Determining the willingness of the called party to communicate; the called party must send a response message to indicate willingness to communicate (acceptance or rejection)
Figure 7.29 SIP messages.

SIP Methods:
• INVITE - Initiates a call by inviting a client to participate in a session.
• ACK - Confirms that the client has received a final response to an INVITE request.
• BYE - Indicates termination of the call.
• CANCEL - Cancels a pending request.
• REGISTER - Registers the user agent.
• OPTIONS - Used to query the capabilities of a server.
• INFO - Used to carry out-of-band information, such as DTMF digits.

SIP Responses:
• 1xx - Informational messages.
• 2xx - Successful responses.
• 3xx - Redirection responses.
• 4xx - Request failure responses.
• 5xx - Server failure responses.
• 6xx - Global failure responses.
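Because SIP is text based and HTTP-like, a minimal message can be assembled and parsed with ordinary string handling. The sketch below is illustrative only (it is not from the book, and the addresses, Call-ID, and SDP body are hypothetical examples); it builds a bare-bones INVITE with an SDP payload and extracts the class of a response such as 180 (Ringing).

```python
# Illustrative sketch of SIP's text-based format. All names and addresses
# here are hypothetical examples, not values from any real deployment.

def build_invite(caller: str, callee: str, call_id: str, sdp: str) -> str:
    """Assemble a minimal SIP INVITE request with an SDP body."""
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com:5060",   # hypothetical host
        f"From: sip:{caller}",
        f"To: sip:{callee}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp)}",
    ]
    # Headers are CRLF-separated; a blank line separates headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp

def parse_status(response: str) -> tuple:
    """Extract the numeric code and reason phrase from a SIP response."""
    start_line = response.split("\r\n", 1)[0]   # e.g. "SIP/2.0 180 Ringing"
    _, code, reason = start_line.split(" ", 2)
    return int(code), reason

# Hypothetical example session description and call.
sdp = "v=0\r\no=alice 123 123 IN IP4 10.0.0.1\r\ns=Call\r\nm=audio 49170 RTP/AVP 0\r\n"
msg = build_invite("alice@example.com", "bob@example.org", "a84b4c76", sdp)
code, reason = parse_status("SIP/2.0 180 Ringing\r\nVia: SIP/2.0/UDP client.example.com\r\n")
```

The 1xx-6xx response classes above map directly onto the first digit of the parsed code, which is why simple text tooling suffices for debugging SIP traffic.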
Figure 7.30 SIP call handling.
(a) SIP call handling through the network. The calling user agent sends an INVITE, which a proxy server forwards toward a location/redirect server; a 302 (Moved Temporarily) response comes back and is acknowledged with an ACK, and the INVITE is reissued to the new address through the proxy servers to the called user agent. The called side answers with 180 (Ringing) and then 200 (OK), which the caller acknowledges with an ACK; an RTP media path is then established. Either party tears the call down by sending BYE, which is answered with 200 (OK).

(b) SIP end-to-end signaling without a server, which assumes the endpoints (clients) know each other’s IP addresses. In the signaling plane, the calling user agent sends an Invite, receives 200 (OK), and replies with an ACK; in the bearer plane, logical channels for RTP and RTCP carry the media over UDP. In the redirect variant of this flow, the caller’s Invite draws a 302 (Moved) response from the redirect server, which the caller acknowledges before sending a new Invite to the returned 3xx address; after Ringing (180), OK (200), and ACK, the RTP session is established.
4. Call setup
5. Call modification or handling
6. Call termination

Figure 7.31 depicts SIP call-forking capabilities, as well as third-party call control [8] (a taboo for traditional carriers). Figure 7.32 depicts SIP strengths and weaknesses.
Figure 7.31 Additional SIP capabilities. In third-party call control, a controller behind a Web interface sends user agent B an Invite carrying the SDP of A and receives an OK carrying the SDP of B, then sends A an Invite (initially with no SDP) and receives an OK with the SDP of B; after the ACKs, a session is established directly between A and B. In call forking, a proxy forks an Invite to user agents B, C, and D, returning Trying (100) to the caller; the first agent to answer returns Ringing (180) and OK (200), the caller ACKs and the session is established with that agent, and the proxy Cancels the remaining branches.
Figure 7.32 SIP strengths and weaknesses.

SIP strengths:
• Simple, Internet/IP friendly
• Emerging support; widespread development under way
• Softphones and etherphones available
• Integration with IP clients and instant messaging
• Facilitates application development
• Bare-bones call state duration maintained in network

SIP limitations:
• Late to market
• Limited commercial deployments; small-scale deployments
• Lack of call state in network (billing, security)
• Moving target, under development
SIP was designed for integration with IETF environments. Existing IETF protocol standards can be used to build an SIP application. SIP can work with existing IETF protocols such as RSVP (to reserve network resources); RTP (to transport real-time data and provide QoS feedback); RTSP (to control delivery of streaming media); the Session Announcement Protocol (SAP) (to advertise multimedia sessions via multicast); SDP (to describe multimedia sessions); MIME (for content description); and HTTP (for Web-page delivery). SIP supports flexible, intuitive feature creation via the SIP Common Gateway Interface (SIP-CGI) and the Call Processing Language (CPL). Functionally, SIP and H.323 are similar, as shown in Table 7.4 [3] (also see References [10] and [11]). Both protocols provide for call control, setup, and teardown (with capabilities exchange). Both also provide such basic call features as call waiting, call hold, call transfer, call forwarding, call return, call identification, and call park. Tables 7.5 and 7.6 provide additional comparisons among SIP, H.323, and ISUP [12].
7.6 Other IETF Signaling Efforts
Here, we highlight other efforts under way [9].
PINT and SPIRITS

Sigtran is not the only IETF Working Group involved in defining new protocols to enable PSTN integration with IP networks. PSTN and Internet Interworking
Table 7.4 SIP/H.323 Protocol Comparison

Standards body. H.323: ITU. SIP: IETF.

Origins. H.323: Telephony based; borrows its call signaling protocol from ISDN Q.SIG. SIP: Internet based and Web centric; borrows syntax and messages from HTTP.

Client. H.323: Intelligent H.323 terminals. SIP: Intelligent user agents.

Core servers. H.323: H.323 Gatekeeper. SIP: SIP proxy, redirect, location, and registration servers.

Current deployment. H.323: Widespread. SIP: SIP is gaining interest; interoperability testing between various vendors' products is ongoing at SIP bakeoffs.

Capabilities exchange. H.323: Supported by the H.245 protocol, which provides structure for detailed and precise information on terminal capabilities. SIP: Uses the SDP protocol for capabilities exchange; SIP does not provide as extensive a capabilities exchange as H.323.

Control channel encoding type. H.323: Binary ASN.1 BER encoding. SIP: Text-based UTF-8 encoding.

Server processing. H.323: Version 1 or 2, stateful; version 3 or 4, stateless or stateful. SIP: Stateless or stateful.

Quality of service. H.323: Bandwidth management/control and admission control are handled by the H.323 gatekeeper; H.323 recommends using RSVP for resource reservation. SIP: Relies on other protocols, such as RSVP, COPS, and OSP, to implement or enforce quality of service.

Security. H.323: Registration: if a gatekeeper is present, endpoints register and request admission with the gatekeeper. Authentication and encryption: H.235 provides recommendations for authentication and encryption in H.323 systems. SIP: Registration: the user agent registers with a proxy server. Authentication: user agent authentication uses HTTP digest or basic authentication. Encryption: the SIP RFC defines three methods of encryption for data privacy.

Endpoint location and call routing. H.323: Uses an E.164 or H.323 ID alias and an address-mapping mechanism if gatekeepers are present in the H.323 system; the gatekeeper provides routing information. SIP: Uses the SIP URL for addressing; redirect or location servers provide routing information.

Conferencing. H.323: Comprehensive audiovisual conferencing support; data conferencing or collaboration is defined by the T.120 specification. SIP: Basic conferencing without conference or floor control.

Service or feature creation. H.323: H.450.1 defines a framework for supplementary service creation. SIP: Supports flexible and intuitive feature creation using SIP-CGI and CPL; example features include presence, unified messaging, and find me/follow me.
Table 7.5 Service Comparison with ISUP

SIP: Lightweight (6 messages/37 headers); text encoding; voice, data, video; different versions; signalling and bearer setup by different protocols; SDP for capabilities exchange (limited in expressing terminal capabilities); highly scalable; easily extensible (SIP Require header); security protocols (IPSEC, PGP, SSL, HTTP authentication); distributed servers; loop detection (Via header field); IP address with domain name resolution via DNS; call setup: 1.5 round trips using UDP; differentiated services support (bit rate, delay negotiation).

H.323: Complex (H.225.0 signalling: 13 messages/263 IEs; H.225 RAS: 30 messages/303 IEs; H.245: 72 messages/127 IEs); binary encoding; voice, data, video; different versions; signalling and bearer setup by different protocols; H.245 for capabilities exchange (rich protocol in expressing terminal capabilities); scalable; not easily extensible (NonStandardParam IE); security protocols (H.235, IPSEC, TLS); fault tolerance through redundant Gatekeepers and endpoints; loop detection (Path Value); zone management through distributed Gatekeepers; IP address, with multizone and multidomain support through the Gatekeeper (Annex G, Border Element); call setup: 1.5 round trips using UDP (fast call, no Gatekeeper); admission (bandwidth control and management) control through the Gatekeeper.

ISUP: Complex (44 messages/60 IEs); binary encoding; voice, data; different national variants; signalling and bearer setup by the same protocol; ISUP IEs for capabilities exchange; scalable; extensible (message and parameter compatibility IEs); physical security; loop detection (timer, hop count, loop message); E.164 address, static; dedicated circuit, no QoS required; call setup: 1.0 round trips; admission control through fallback procedures.

Courtesy: Trillium
Table 7.6 Feature Comparison with ISUP

Services                                   SIP      H.323 Version 3         ISUP
Call Hold & Retrieve                       Yes      H.450.4                 Q.733.2
Call Transfer                              Yes      H.450.2                 Q.732.1
Call Diversion                             Yes      H.450.3                 Q.732.2
Call Park & Pick-up                        Yes      H.450.5
Call Waiting                               Yes      H.450.6                 Q.733.1
Message Waiting Indication                 No       H.450.7
Terminal Portability                       Notify                           Q.733.4
Conference Calling                         Yes      Facility/Setup/H.245    Q.734.1
Three Party                                Yes      Notify/Facility/H.245   Q.734.2
Call Completion on Busy Subscriber         Yes      H.450.9                 Q.733.3
Calling Line ID Presentation (CLIP)        Yes      H.450.8                 Q.731.3
Calling Line ID Restriction (CLIR)         Yes      H.450.8                 Q.731.4
Connected Line ID Presentation (COLP)      Yes      H.450.8                 Q.731.5
Connected Line ID Restriction (COLR)       Yes      H.450.8                 Q.731.6
Click-For-Dial                             Yes      Yes                     No

Courtesy: Trillium
(PINT) and Service in the PSTN/IN Requesting Internet Service (SPIRITS) are two IETF Working Groups that address the need to interwork telephony services between the PSTN and the Internet. PINT deals with services originating from an IP network; SPIRITS deals with services originating from the PSTN. In PINT, PSTN network services are triggered by IP requests. An SIP Java client embedded in a Java servlet on a Web server launches requests to initiate voice calls on the PSTN. The current focus of this initiative is to allow Web access to voice content and enable click-to-dial and fax services. In SPIRITS, IP network services are triggered by PSTN requests. SPIRITS is primarily concerned with Internet-related call waiting, caller ID delivery, and call forwarding [9].
ENUM

IETF’s ENUM Working Group is devising a scheme to map E.164 telephone numbers to IP addresses by using the Internet Domain Name System (DNS) so that any application, including SIP, can discover resources associated with a unique phone number. An SIP phone or proxy server will use number domain translation and
DNS resolution to discover a DNS resource that yields an SIP address at which a dialed number can be reached. This issue is covered in more detail in Chapter 10.
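The number-to-domain translation that ENUM relies on can be sketched in a few lines: the digits of the E.164 number are reversed, dot-separated, and suffixed with the ENUM zone (e164.arpa) to form a DNS domain whose records can point to an SIP address. The function name and phone numbers below are my own illustrative choices, not part of any ENUM specification text.

```python
# Hedged sketch of the ENUM number-to-domain translation described above.

def enum_domain(e164_number: str, zone: str = "e164.arpa") -> str:
    """Map an E.164 number (e.g. '+46-8-9761234') to its ENUM DNS domain:
    keep only the digits, reverse them, dot-separate, and append the zone."""
    digits = [c for c in e164_number if c.isdigit()]   # drop '+', '-', spaces
    return ".".join(reversed(digits)) + "." + zone

print(enum_domain("+46-8-9761234"))
# 4.3.2.1.6.7.9.8.6.4.e164.arpa
```

A resolver would then query that domain for records (NAPTR, in the actual ENUM scheme) yielding, for example, an SIP URL at which the dialed number can be reached.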
TRIP

The IPTEL Working Group is developing telephony routing over IP (TRIP), a policy-driven interadministrative domain protocol for advertising the reachability of telephony destinations between location servers and for advertising attributes of the routes to those destinations. TRIP is designed to allow service providers to exchange routing information by using established Internet protocols to avoid the overprovisioning or duplication of gateways. If a telephone number does not have an associated SIP resource, the IP network routes the call to a telephone-routing gateway, which connects to the PSTN. In an interconnected environment with many peer-to-peer relationships between service providers, IP network resources must discover which telephone numbers are associated with which gateways [9].
7.7 MEGACO
As noted previously, MEGACO is a protocol that has been evolving from MGCP and was developed jointly by ITU and IETF; it is known as MEGACO in the IETF and H.248/H.GCP in the ITU-T (see Figure 7.33 for a basic diagram [4]). It is defined by IETF in the Informational RFC 3015 (November 2000), which uses the exact text from ITU-T Recommendation H.248 (February 2001). MEGACO/H.248 defines a protocol used between physically decomposed MGs and MGCs, uses SDP (RFC 2327) to describe media capabilities, and is designed to be a standalone architecture. MEGACO has been developed by the carrier community to address the issue of CCSS7/VOIP integration. Having grown out of the LAN, the H.323 initiative initially had trouble scaling to public network proportions. The architecture that it created was incompatible with the world of public telephony services; it struggled with multiple gateways and the CCSS7. To address this problem, the new initiative exploded the gatekeeper model and removed the signaling control from the gateway, putting it in an MGC. This device controls multiple MGs and is in effect a decomposition of the gatekeeper to its CCSS7 equivalents. MGCP/MEGACO is the protocol used for communication between the MGC and the MGs [4]. Compared to MGCP, MEGACO brings a performance enhancement, for it can support thousands of ports on a gateway or multiple gateways and can accommodate connection-oriented media such as Time Division Multiplexing (TDM) and Asynchronous Transfer Mode (ATM). In the MGCP/MEGACO architecture, the intelligence (control) is unbundled from the media (data). It is a master/slave protocol, where the master has absolute
Figure 7.33 MEGACO. Two softswitches communicate with each other via SIP, H.323, or Q.BICC; each controls its gateway via MEGACO, and the gateways carry media over RTP or AAL 1/2/5.

MEGACO/H.248 primitives:
• Add: Adds a termination (an endpoint) to a context
• Modify: Modifies the properties of a termination
• Subtract: Subtracts (disconnects) a termination
• Move: Moves a termination to another context
• AuditValue: Returns the current values of properties, events, signals, and statistics
• AuditCapabilities: Returns the possible values of properties, events, signals, and statistics
• Notify: Allows the media gateway to notify the MGC of events within the MG
• ServiceChange: Allows the MG to inform the MGC that it is going in or out of service
control and the slave simply executes commands. The master is the MGC, or softswitch (or call agent), and the slave is the MG, which can be a VOIP gateway, an MPLS router, an IP phone, and so forth [4]. MGCP/MEGACO is used for communication to the MGs. It instructs the MG to connect streams coming from outside a packet network to a packet stream such as RTP. The MGC issues commands to send and receive media from addresses, generate tones, and modify configuration. The architecture, however, requires a session initiation for communication between MGCs. When an MG detects an off-hook condition, the softswitch instructs the gateway, via MEGACO commands, to put dial tone on the line and collect Dual Tone MultiFrequency (DTMF) tones. After detecting the number, the MGC determines how to route the call and, by using an intergateway signaling protocol such as SIP, H.323, or Q.BICC, contacts the terminating controller. The terminating controller can instruct the appropriate MG to ring the dialed line. When the MG detects that the dialed line is off hook, both MGs can be instructed via their respective MGCs to establish two-way voice across the data network. Thus, these protocols have ways to detect conditions on endpoints and to notify the MGC of their occurrence, as well as
Signaling Approaches
221
to place signals, such as dial tone, on the line and create media streams between endpoints on the gateway and the data network (for example, RTP streams) [4]. There are two basic constructs in MGCP/MEGACO: terminations and contexts. Terminations represent streams entering or leaving the MG (for example, analogue telephone lines, RTP streams, and MP3 streams). Terminations have properties, such as the maximum size of a jitter buffer, that can be inspected and modified by the MGC. A termination is given a name, or ID, by the gateway. Terminations that typically represent ports on the gateway, such as analog loops or DS-0s, are instantiated by the MG whenever it boots; they remain active all the time. Other terminations are created when they are needed; then get used and released. Such terminations, called ephemerals, are used to represent flows over the packet network, such as RTP streams. Terminations may be placed into contexts, defined as two or more termination streams that are mixed and connected. The normal, active context might have one physical termination (say, one DS-0 in an E-3) and one ephemeral termination (the RTP stream connecting the gateway to the network). Contexts are created and released by the MG under command of the MGC. Once created, a context is given a name, or ID, and can have terminations added and removed from it. A context is created by adding the first termination; it is released by removing the last termination. 
MGCP/MEGACO uses a series of commands to manipulate terminations, contexts, events, and signals [4]:

Add. Adds a termination to a context; may be used simultaneously to create a new context.
Subtract. Removes a termination from a context; may result in the context being released if no terminations remain.
Move. Moves a termination from one context to another.
Modify. Changes the state of the termination.
AuditValue and AuditCapabilities. Return information about the terminations, the contexts, and the general MG state and capabilities.
ServiceChange. Creates a control association between an MG and MGC, as well as deals with some failover situations.

MEGACO is a robust architecture that supports multimedia and large-scale PSTN gateways, as well as business CPEs (PBXs) and the IP Centrex. Residential VOIP over DSL or cable is also supported. IP-phone profiles are defined in IETF (Informational RFC 3054) and ITU-T specifications.
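The context/termination model described above can be made concrete with a toy sketch. This is my own illustration, not MEGACO code: the class, method, and termination names (e.g. "ds0/1", "rtp/0001") are hypothetical, but the behavior mirrors the rules in the text, namely that Add can create a context and Subtract releases a context when its last termination is removed.

```python
# Toy model (assumptions mine) of MEGACO contexts and terminations.

class MediaGateway:
    """Holds contexts, each grouping terminations whose streams are mixed."""

    def __init__(self):
        self.contexts = {}      # context ID -> set of termination IDs
        self._next_ctx = 1

    def add(self, termination, context=None):
        """MEGACO Add: place a termination in a context; if no context is
        given, create a new one (Add creates a context with its first
        termination, as described in the text)."""
        if context is None:
            context = self._next_ctx
            self._next_ctx += 1
            self.contexts[context] = set()
        self.contexts[context].add(termination)
        return context

    def subtract(self, termination, context):
        """MEGACO Subtract: remove a termination; release the context when
        the last termination is removed."""
        self.contexts[context].discard(termination)
        if not self.contexts[context]:
            del self.contexts[context]

mg = MediaGateway()
ctx = mg.add("ds0/1")          # physical termination; creates the context
mg.add("rtp/0001", ctx)        # ephemeral RTP termination joins the context
mg.subtract("rtp/0001", ctx)
mg.subtract("ds0/1", ctx)      # last termination removed: context released
```

The demo mirrors the "normal, active context" from the text: one physical DS-0 termination plus one ephemeral RTP termination, created and then torn down under MGC command.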
7.8 Sigtran Protocols
As highlighted previously,2 sigtran protocols specify the means by which CCSS7 messages can be reliably transported over IP networks. This kind of capability can
be used either as an upgrade to a traditional CCSS7 network in the circuit-switched PSTN or in conjunction with a VOIP network. One could view the carriage of PSTN signaling messages over IP as a step in the migration of the PSTN to an IP-based infrastructure, for it addresses the signaling plane of the PSTN. The sigtran architecture identifies two components [9]: a common transport protocol for the CCSS7 protocol layer being carried and an adaptation module to emulate lower layers of the protocol. For example, if the native protocol is MTP Level 3, the sigtran protocols provide the equivalent functionality of MTP Level 2. If the native protocol is ISUP or SCCP, the sigtran protocols provide the same functionality as that of MTP Levels 2 and 3. If the native protocol is TCAP, the sigtran protocols provide the functionality of SCCP (connectionless classes) and MTP Levels 2 and 3. Table 7.7 identifies key documents applicable to sigtran. The sigtran protocols provide all the functionality needed to support CCSS7 signaling over IP networks, including

• Flow control
• In-sequence delivery of signaling messages within a single control stream
• Identification of the originating and terminating signaling points
• Identification of voice circuits
• Error detection, retransmission, and other error-correcting procedures
• Recovery from outages of components in the transmission path
• Controls for avoiding congestion on the Internet
• Detection of the status of peer entities (in-service, out-of-service, and so on)
• Support for security mechanisms to protect the integrity of the signaling information
• Extensions for supporting security and future requirements
Table 7.7 Key Documents Applicable to sigtran

Architectural Framework for Signaling Transport, RFC 2719, October 1999
Stream Control Transmission Protocol, RFC 2960, October 2000
SS7 MTP2-User Peer-to-Peer Adaptation Layer (M2PA), Internet Draft, July 2001
SS7 MTP3-User Adaptation Layer (M3UA), Internet Draft, July 2001
SS7 MTP2-User Adaptation Layer (M2UA), Internet Draft, July 2001
SS7 SCCP-User Adaptation Layer (SUA), Internet Draft, July 2001
Site Security Handbook, RFC 2196, September 1997
Security Architecture for the Internet Protocol, RFC 2401
SIP: Session Initiation Protocol, RFC 2543
ENUM Service Reference Model, Internet Draft, February 23, 2001
Management Information Base for Telephony Routing over IP (TRIP), Internet Draft
SPIRITS Protocol Requirements, Internet Draft
IN- and PINT-Related Requirements for SPIRITS Protocol, Internet Draft
Restrictions imposed by narrowband CCSS7 networks, such as the need to segment and reassemble messages greater than 272 bytes, are not applicable to IP networks and are therefore not supported by sigtran protocols.
Performance Considerations for CCSS7 over IP

CCSS7 messages transported over IP networks must meet the stringent performance requirements imposed by both ITU CCSS7 standards and user expectations [9]. For example, although the ITU standard specifies that end-to-end call setup delay cannot exceed 20 to 30 s after the ISUP Initial Address Message (IAM) is transmitted, users have generally come to expect much faster response times. For this reason, VOIP networks must be engineered to satisfy user expectations and ITU standards for performance.
Security Requirements for CCSS7 over IP

If signaling messages are transported over a private intranet, security measures can be applied as deemed necessary by the network operator. For signaling messages transported over the public Internet, the use of security measures is mandatory [9]. Several security mechanisms are currently available for use in IP networks. For transmission of signaling information over the Internet, the sigtran Working Group recommends the use of IPSEC (see RFC 2401). IPSEC provides the following security services:

Authentication, to ensure that information is sent to and from a known and trusted partner
Integrity, to ensure that the signaling information has not been modified while in transit
Confidentiality, to ensure that the transported information is encrypted to avoid illegal use or violation of privacy laws
Availability, to ensure that communicating endpoints under attack remain in service for authorized use

Sigtran protocols do not define new security mechanisms, for the currently available security protocols provide the necessary mechanisms for secure transmission of CCSS7 messages over IP networks.
SCTP Use in CCSS7

To reliably transport CCSS7 messages over IP networks, the sigtran Working Group has devised the SCTP. This protocol allows the reliable transfer of signaling messages between signaling endpoints within an IP network. Carrier-related concerns have prompted companies such as Telcordia, Cisco, Motorola, Siemens, Nortel, and Ericsson to sponsor work in the area of stream transmission control.

To establish an association between SCTP endpoints, one endpoint provides the other with a list of its transport addresses (multiple IP addresses combined with an SCTP port). These transport addresses identify the addresses that will send and receive SCTP packets. IP signaling traffic usually comprises many independent message sequences between many signaling endpoints. SCTP allows signaling messages to be independently ordered within multiple streams (unidirectional logical channels established from one SCTP endpoint to another) to ensure in-sequence delivery between associated endpoints. By transferring independent message sequences in separate SCTP streams, the retransmission of a lost message is less likely to delay the timely delivery of other messages in unrelated sequences (a problem known as head-of-line blocking) [9].

SCTP is designed to transport PSTN signaling messages over IP networks, although it is capable of broader applications as well. SCTP is a reliable transport protocol operating over a connectionless packet network, such as IP, and offers the following services to its users [13]:

• Acknowledged error-free nonduplicated transfer of user data
• Data fragmentation to conform to the discovered path’s maximum transmission unit (MTU) size
• Sequenced delivery of user messages within multiple streams, with an option for order-of-arrival delivery of individual user messages
• Optional bundling of multiple user messages into a single SCTP packet
• Network-level fault tolerance through the support of multihoming at one or both ends of an association

The design of SCTP includes appropriate congestion-avoidance behavior and resistance to flooding (denial-of-service) and masquerade attacks.
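Python's standard library has no SCTP support, so the sketch below is only a toy model (class and variable names are mine) of the multistreaming idea just described: each stream delivers its messages in sequence independently, so a gap left by a lost packet in one stream does not hold back delivery on another stream.

```python
# Toy model of SCTP-style per-stream in-sequence delivery (not real SCTP).

from collections import defaultdict

class MultiStreamReceiver:
    """Buffers arriving (stream, seq) messages and releases each stream's
    messages in order, independently of the other streams."""

    def __init__(self):
        self.expected = defaultdict(int)    # stream ID -> next expected seq
        self.pending = defaultdict(dict)    # stream ID -> {seq: message}

    def receive(self, stream, seq, msg):
        """Buffer an arriving message; return whatever becomes deliverable."""
        self.pending[stream][seq] = msg
        delivered = []
        # Deliver consecutively numbered messages for this stream only.
        while self.expected[stream] in self.pending[stream]:
            delivered.append(self.pending[stream].pop(self.expected[stream]))
            self.expected[stream] += 1
        return delivered

rx = MultiStreamReceiver()
first = rx.receive(1, 1, "ISUP-b")    # seq 0 on stream 1 was lost: buffered
other = rx.receive(2, 0, "ISUP-c")    # stream 2 is unaffected by that gap
```

Here `first` is empty (stream 1 is waiting for its lost message 0) while `other` delivers immediately, which is exactly the head-of-line-blocking relief the text attributes to separate SCTP streams.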
Because TCP is subject to head-of-line blocking, the sigtran Working Group recommends SCTP rather than TCP for the transmission of signaling messages over IP networks (see Section 7.9). The Message Transfer Part (MTP) is divided into three levels. The lowest level, MTP Level 1, is equivalent to the Open Systems Interconnection (OSI) Physical Layer. MTP Level 1 defines the physical, electrical, and functional characteristics of the digital signaling link. Physical interfaces defined include E-1 (2048 kbps; thirty-two 64-kbps channels), DS-1 (1544 kbps; twenty-four 64-kbps channels), V.35 (64 kbps), DS-0 (64 kbps), and DS-0A (56 kbps). MTP Level 2 ensures accurate end-to-end transmission of a message across a signaling link. It implements flow control, message-sequence validation, and error checking. When an error occurs on a signaling link, the message (or set of messages) is retransmitted. MTP Level 2 is
equivalent to the OSI Data Link Layer. MTP Level 3 provides message routing between signaling points in the CCSS7 network. It reroutes traffic away from failed links and signaling points and controls traffic when congestion occurs. It is equivalent to the OSI Network Layer [9]. There are three types of messages, called signal units (SUs), in CCSS7 (see Figure 7.34):

1. Message signal units (MSUs)
2. Link-status signal units (LSSUs)
3. Fill-in signal units (FISUs)

FISUs are transmitted continuously on a signaling link in both directions, unless other signal units (MSUs or LSSUs) are present, and carry basic Layer 2 (known as Level 2 in CCSS7) information only. LSSUs carry one or two octets (8-bit bytes) of link-status information between signaling points at each end of a link. In other words, LSSUs allow peer MTP Level 2 layers to exchange link-status information. The link status is used to control link alignment and to indicate the status of a signaling point to the remote signaling point. MSUs carry all call control, database query and response, network management, and network maintenance data in the Signaling Information Field (SIF). MSUs have a routing label that allows an originating signaling point to send information to a destination signaling point across the network. MSUs originate at a level higher than MTP Level 2 and are destined to have a peer at another node. As noted, FISUs are sent when no other signal units are waiting to be sent
Figure 7.34 CCSS7 signal units. Field widths in bits:

Fill-In Signal Unit: Flag (8) | BSN (7) | BIB (1) | FSN (7) | FIB (1) | LI (6) | Spare (2) | CRC (16)
Link Status Signal Unit: Flag (8) | BSN (7) | BIB (1) | FSN (7) | FIB (1) | LI (6) | Spare (2) | Status (8 or 16) | CRC (16)
Message Signal Unit: Flag (8) | BSN (7) | BIB (1) | FSN (7) | FIB (1) | LI (6) | Spare (2) | SIO (8) | SIF (8n, n ≤ 272) | CRC (16)
across the synchronous link. This purpose is preserved by the heartbeat messages in SCTP. FISUs also carry acknowledgment of messages, a function also assumed by SCTP. In summary, SCTP provides

• Acknowledged error-free nonduplicated transfer of signaling information
• In-sequence delivery of messages within multiple streams, with an option for order-of-arrival delivery of individual messages
• Optional bundling of multiple messages into a single SCTP packet
• Data fragmentation as required
• Network-level fault tolerance through support of multihoming at one or both ends of an association
• Appropriate congestion-avoidance behavior and resistance to flooding (denial-of-service) and masquerade attacks

To meet stringent CCSS7 signaling reliability and performance requirements for carrier-grade networks, VOIP network operators ensure that there is no single point of failure in the end-to-end network architecture between a CCSS7 node and an MGC. To achieve carrier-grade reliability in IP networks, links in a linkset are typically distributed among multiple signaling gateways, MGCs are distributed over multiple central processing unit (CPU) hosts, and redundant IP network paths are provisioned to ensure survivability of SCTP associations between SCTP endpoints [9]. SCTP is discussed in greater detail in Section 7.9.
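The three signal-unit types are distinguished on the wire by the 6-bit length indicator (LI) shown in Figure 7.34: an LI of 0 marks a FISU, 1 or 2 marks an LSSU (one- or two-octet status field), and anything larger marks an MSU. A small sketch of that convention (the function name is mine):

```python
# Sketch of the MTP Level 2 signal-unit classification by length indicator.

def signal_unit_type(length_indicator: int) -> str:
    """Classify a CCSS7 signal unit from its 6-bit length indicator (LI):
    0 -> FISU, 1 or 2 -> LSSU, anything larger -> MSU."""
    if not 0 <= length_indicator <= 63:
        raise ValueError("LI is a 6-bit field (0-63)")
    if length_indicator == 0:
        return "FISU"
    if length_indicator in (1, 2):
        return "LSSU"
    return "MSU"
```

This is why an idle link (FISUs only) is recognizable from the LI field alone, before any higher-layer parsing.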
Transporting MTP over IP

For MTP messages transported over CCSS7 or IP networks, ITU specifies the following requirements:

• MTP Level 3 peer-to-peer procedures require a response time within 0.5 s (500 ms) to 1.2 s (1200 ms).
• No more than 1 in 10 million messages will be lost from transport failure.
• No more than 1 in 10 billion messages (including duplicated ones) will be delivered out of sequence from transport failure.
• No more than 1 in 10 billion (1 in 1 billion for ANSI specifications) messages will contain an error that is undetected by the transport protocol.
• Availability of any signaling route set (the complete set of allowed signaling paths from a given signaling point toward a specific destination) is 99.998 percent or better (downtime of approximately 10 min/yr or less).
• The message length (payload accepted) is 272 bytes for narrowband CCSS7 and 4091 bytes for broadband CCSS7.
Signaling Approaches
227
To achieve the functional and performance requirements for MTP, the sigtran Working Group has recommended three new protocols: M2UA, M2PA, and M3UA.
M2UA: MTP Level 2 User Adaptation Layer M2UA is a sigtran protocol used for the transport of CCSS7 MTP Level 2 user-part signaling messages (that is, MTP Level 3) over IP by way of SCTP. The M2UA protocol layer provides a set of services to its users equivalent to that which MTP Level 2 provides to MTP Level 3 users. M2UA is used between the signaling gateway and MGC in VOIP networks. The signaling gateway receives CCSS7 messages over an MTP Level 1 and Level 2 interface from a signaling endpoint (SCP or SSP) or STP in PSTNs. The signaling gateway terminates the CCSS7 link at MTP Level 2 and transports MTP Level 3 and above to an MGC or other IP endpoint by using M2UA over SCTP/IP. The signaling gateway maintains the availability state of all MGCs to manage signaling traffic flows across active SCTP associations.
M2PA: MTP Level 2 User Peer-to-Peer Adaptation Layer Like M2UA, M2PA is a sigtran protocol used for the transport of CCSS7 MTP Level 2 user-part signaling messages (MTP Level 3) over IP by way of SCTP. Unlike M2UA, however, M2PA is used to support full MTP Level 3 message handling and network management between any two CCSS7 nodes communicating over an IP network. IP signaling points function as traditional CCSS7 nodes by using the IP network instead of the CCSS7 network. Each circuit-switched or IP signaling point has a CCSS7 point code. The M2PA protocol layer provides the same set of services as that which MTP Level 2 provides to MTP Level 3. M2PA can be used between a signaling gateway and an MGC, between a signaling gateway and an IP signaling point, and between two IP signaling points. Signaling points may use M2PA over IP or MTP Level 2 over standard CCSS7 links to send and receive MTP Level 3 messages. M2PA facilitates the integration of CCSS7 and IP networks by enabling nodes in circuit-switched networks to access IP telephony databases and other nodes in IP networks by using CCSS7 signaling. Conversely, M2PA allows IP telephony applications to access CCSS7 databases, such as local number portability, calling card, freephone, and mobile subscriber databases. In addition, using M2PA over IP may result in cost advantages if traditional CCSS7 links are replaced by IP connections. In summary, M2PA and M2UA differ in the following ways:
• In M2PA, the signaling gateway is a CCSS7 node with a point code; in M2UA, the signaling gateway is not a CCSS7 node and has no point code.
• In M2PA, the connection between the signaling gateway and IP signaling points is a CCSS7 link; in M2UA, the connection between the signaling gateway and the MGC is not a CCSS7 link but is instead an extension of MTP from the signaling gateway to the MGC.
228
Chapter Seven
• In M2PA, the signaling gateway can have upper CCSS7 layers, such as SCCP; in M2UA, the signaling gateway has no upper CCSS7 layers, because it has no MTP Level 3.
• In M2PA, management procedures rely on MTP Level 3; in M2UA, M2UA defines its own management procedures.
• In M2PA, IP signaling points process MTP Level 3 and MTP Level 2 primitives; in M2UA, the MGC transports MTP Level 3 and MTP Level 2 primitives to the signaling gateway’s MTP Level 2 for processing.
M3UA: MTP Level 3 User Adaptation Layer M3UA is a sigtran protocol used for the transport of MTP Level 3 user-part signaling messages (such as ISUP, TUP, and SCCP) over IP by way of SCTP. TCAP or Radio Access Network Application Port (RANAP) messages, as SCCP user protocols, may be carried either by SCCP via M3UA or by a different sigtran protocol called SCCP User Adaptation (SUA) Layer, as described below. M3UA is used between a signaling gateway and an MGC or IP telephony database. The signaling gateway receives CCSS7 signaling by using MTP as a transport over a standard CCSS7 link. The signaling gateway terminates MTP Levels 2 and 3 and delivers ISUP, TUP, SCCP, and/or any other MTP Level 3 user messages—as well as certain MTP network management events—over SCTP associations to MGCs or IP telephony databases. The ISUP or SCCP layer at an IP signaling point is unaware that the expected MTP Level 3 services are provided not locally but by the remote signaling gateway. Similarly, the MTP Level 3 layer at a signaling gateway may be unaware that its local users are actually remote user parts operating over M3UA. Conceptually, M3UA extends access to MTP Level 3 services at the signaling gateway to remote IP endpoints. If an IP endpoint is connected to more than one signaling gateway, the M3UA layer at the IP endpoint maintains the status of configured CCSS7 destinations and routes messages according to the availability and congestion status of the routes to these destinations via each signaling gateway. M3UA does not impose a 272-octet-length SIF limit, as specified by CCSS7 MTP Level 2. Larger information blocks can be accommodated directly by M3UA/SCTP without the need for an upper-layer segmentation/reassembly procedure, as specified by the SCCP and ISUP standards. However, a signaling gateway will enforce the maximum 272-octet-length limit when it is connected to a CCSS7 network that does not support the transfer of larger information blocks to the destination. 
For broadband MTP networks, the signaling gateway will fragment ISUP or SCCP messages larger than 272 octets as required. At the signaling gateway, the M3UA layer provides interworking with MTP-3 management functions to support seamless operation of signaling between CCSS7 and IP networks. For example, the signaling gateway notifies remote MTP Level 3 users at IP endpoints whenever a CCSS7 signaling point is reachable or unreachable or whenever CCSS7 network congestion or restrictions occur. The M3UA layer at an IP endpoint keeps the state of the routes to remote CCSS7 destinations, and it may request the state of remote CCSS7 destinations from the M3UA layer at the signaling gateway. The M3UA layer at an IP endpoint may also notify the signaling gateway that an IP-endpoint M3UA is congested.
Transporting SCCP over IP SCCP User Adaptation Layer (SUA) is a sigtran protocol used for the transport of CCSS7 SCCP user-part signaling messages (such as TCAP and RANAP) over IP using SCTP. SUA is used between a signaling gateway and an IP signaling endpoint and between IP signaling endpoints. It supports both SCCP unordered and in-sequence connectionless services and bidirectional connection-oriented services with or without flow control, message-loss detection, and out-of-sequence-error detection (SCCP Protocol Classes 0 through 3). For connectionless transport, SCCP and SUA interface at the signaling gateway. From the perspective of a CCSS7 signaling point, the SCCP user is located at the signaling gateway. CCSS7 messages are routed to the signaling gateway based on point code and SCCP subsystem number. The signaling gateway then routes SCCP messages to the remote IP endpoint. If redundant IP endpoints exist, the signaling gateway can load-share among active IP endpoints by using a round-robin approach. Load sharing of TCAP messages occurs only for the first message in a TCAP dialogue; subsequent TCAP messages in the same dialogue are always sent to the IP endpoint selected for the first message, unless endpoints share state information and the signaling gateway is aware of the message-allocation policy of the IP endpoints. The signaling gateway may also perform Global Title Translation (GTT) to determine the destination of an SCCP message. The signaling gateway routes on the global title—that is, digits present in the incoming message, such as a called party number or mobile subscriber identification number. For connection-oriented transport, SCCP and SUA interface at the signaling gateway to associate the two connection sections needed for connection-oriented data transfer between a CCSS7 signaling endpoint and an IP endpoint. 
Messages are routed by the signaling gateway to CCSS7 signaling points based on the destination point code (in the MTP Level 3 address field) and to IP endpoints based on the IP address (in the SCTP header). SUA can also be used to transport SCCP user information between IP endpoints directly rather than via the signaling gateway. The signaling gateway is needed only to enable interoperability with CCSS7 signaling in the circuit-switched network. If an IP-resident application is connected to multiple signaling gateways, multiple routes may exist to a destination within the CCSS7 network. In this case, the IP endpoint must monitor the status of remote signaling gateways before initiating a message transfer.
7.9
SCTP
This section provides a description of SCTP.
Introduction This section describes the reasoning behind the development of SCTP, the services it offers, and the basic concepts needed to understand the detailed functioning of the protocol [13].4 Our discussion is only a summarization; developers and software engineers should refer directly to RFC 2960. Because SCTP is expected to play a role in VOIP, we provide herewith a description of SCTP, based directly on the RFC. This material expands the information provided in Section 7.8.
Motivation Over the years, TCP [14] has been the primary means for reliable data transfer in IP networks. However, an increasing number of recent applications have found TCP too limiting and have incorporated their own reliable data transfer protocol on top of UDP [15]. The limitations that users wish to bypass include the following:
• TCP provides both reliable data transfer and strict order-of-transmission data delivery. Some applications need reliable transfer without sequence maintenance, while others are satisfied with partial ordering of the data. In both cases, the head-of-line blocking offered by TCP causes unnecessary delay.
• The stream-oriented nature of TCP is often an inconvenience. Applications must add their own record marking to delineate their messages and must make explicit use of the push facility to ensure that a complete message is transferred in a reasonable time.
• The limited scope of TCP sockets complicates the task of providing highly available data transfer capability using multihomed hosts.
• TCP is relatively vulnerable to denial-of-service attacks, such as SYN attacks.
Transport of PSTN signaling across the IP network is an application for which all of these limitations of TCP are relevant. While this application directly motivated the development of SCTP, other applications may find SCTP a good match to their requirements.
Architectural View of SCTP SCTP is viewed as a layer between the SCTP user application (SCTP user for short) and a connectionless packet network service such as IP. The remainder of this chapter assumes that SCTP runs over IP. The basic service offered by SCTP is the reliable transfer of user messages between peer SCTP users. SCTP performs
this service within the context of an association between two endpoints. Application programming interfaces (APIs) exist at the boundary between the SCTP and SCTP user layers—specifically, the upper-layer protocol (ULP). Tables 7.8 and 7.9 depict ULP-to-SCTP primitives and SCTP-to-ULP primitives, respectively. SCTP is connection-oriented in nature, but the SCTP association is a concept broader than the TCP connection (see Figure 7.35). SCTP provides the means for each SCTP endpoint to provide the other endpoint (during association startup) with a list of transport addresses (that is, multiple IP addresses in combination with an SCTP port) through which the other endpoint can be reached and from which it will originate SCTP packets. The association spans transfers over all of the possible source/destination combinations that may be generated from each endpoint’s lists.
Functional View of SCTP The SCTP transport service can be decomposed into a number of functions. These functions are depicted in Figure 7.36 and explained in the remainder of this section.
Association Startup and Takedown An association is started by a request from the SCTP user (via the ASSOCIATE or SEND primitive). A cookie mechanism, similar to one described by Karn and Simpson [16], is employed during the startup to provide protection against security attacks. The cookie mechanism uses a four-way handshake, the last two legs of which are allowed to carry user data for fast setup. SCTP provides for a graceful close (that is, shutdown) of an active association on request from the SCTP user. The SHUTDOWN primitive is used. SCTP also allows for an ungraceful close (that is, abort), either on request from the user (ABORT primitive) or as a result of an error condition detected within the SCTP layer. Unlike TCP, SCTP does not support a half-open state wherein one end may continue sending data while the other end is closed. When either endpoint performs a shutdown, the association on each peer stops accepting new data from its user and only delivers data in queue at the time of the graceful close.
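The four-way startup just described can be sketched as a small state machine. This is an illustrative sketch only: the state names (CLOSED, COOKIE-WAIT, COOKIE-ECHOED, ESTABLISHED) follow RFC 2960, but the event labels and the run helper are invented for the example.

```python
# Association establishment per RFC 2960: INIT -> INIT ACK -> COOKIE ECHO
# -> COOKIE ACK. The last two legs may also carry bundled user DATA chunks.
HANDSHAKE = {
    ("CLOSED", "send INIT"): "COOKIE-WAIT",
    ("COOKIE-WAIT", "recv INIT ACK / send COOKIE ECHO"): "COOKIE-ECHOED",
    ("COOKIE-ECHOED", "recv COOKIE ACK"): "ESTABLISHED",
}

def run(events, state="CLOSED"):
    """Drive the association through a sequence of handshake events."""
    for event in events:
        state = HANDSHAKE[(state, event)]
    return state
```

An out-of-order event simply has no entry in the table, mirroring the fact that a real implementation discards handshake chunks that do not match its current state.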
Sequenced Delivery within Streams The term stream is used in SCTP to refer to a sequence of user messages that are to be delivered to the ULP in order with respect to other messages within the same stream. This use contrasts with its use in TCP, where the term refers to a sequence of bytes (in this section, a byte is assumed to be eight bits). The SCTP user can specify at association startup time the number of streams to be supported by the association. This number is negotiated with the remote end. User messages are associated with stream numbers (SEND and RECEIVE primitives). Internally, SCTP assigns a stream-sequence number to each message passed to it by the SCTP user. On the receiving end, SCTP ensures that messages are delivered to the SCTP user in sequence within a given stream. However, while
Table 7.8 ULP-to-SCTP Primitives

Initialize
Format: INITIALIZE([local port], [local eligible address list]) → local SCTP instance name
This primitive allows SCTP to initialize its internal data structures and allocate necessary resources for setting up its operation environment. Once SCTP is initialized, ULP can communicate directly with other endpoints without reinvoking this primitive.
Associate
Format: ASSOCIATE(local SCTP instance name, destination transport addr, outbound stream count) → association id [,destination transport addr list] [,outbound stream count]
This primitive allows the upper layer to initiate an association to a specific peer endpoint.
Shutdown
Format: SHUTDOWN(association id) → result
This primitive gracefully closes an association. Any locally queued user data will be delivered to the peer. The association will be terminated only after the peer acknowledges all the SCTP packets sent. A success code will be returned on successful termination of the association. If attempting to terminate the association results in a failure, an error code will be returned.
Abort
Format: ABORT(association id [, cause code]) → result
This primitive ungracefully closes an association. Any locally queued user data will be discarded and an ABORT chunk is sent to the peer. A success code will be returned on successful abortion of the association. If attempting to abort the association results in a failure, an error code will be returned.
Send
Format: SEND(association id, buffer address, byte count [,context] [,stream id] [,life time] [,destination transport address] [,unorder flag] [,no-bundle flag] [,payload protocol-id]) → result
This primitive is the main method to send user data via SCTP.
Set Primary
Format: SETPRIMARY(association id, destination transport address, [source transport address]) → result
This primitive instructs the local SCTP to use the specified destination transport address as primary path for sending packets.
Receive
Format: RECEIVE(association id, buffer address, buffer size [,stream id]) → byte count [,transport address] [,stream id] [,stream sequence number] [,partial flag] [,delivery number] [,payload protocol-id]
This primitive will read the first user message in the SCTP in-queue into the buffer specified by the ULP, if one is available. The size of the message read, in bytes, will be returned. It may, depending on the specific implementation, also return other information, such as the sender’s address, the stream id on which it is received, and whether there are more messages available for retrieval. The stream-sequence number for ordered messages may also be returned.
Status
Format: STATUS(association id) → status data
This primitive should return a data block containing the following information: association connection state, destination transport address list, destination transport address reachability states, current receiver window size, current congestion window sizes, number of unacknowledged DATA chunks, number of DATA chunks pending receipt, primary path, most recent
SRTT on the primary path, RTO on the primary path, SRTT and RTO on the other destination addresses, and so forth.

Change Heartbeat
Format: CHANGEHEARTBEAT(association id, destination transport address, new state [,interval]) → result
This primitive instructs the local endpoint to enable or disable heartbeat on the specified destination transport address.
Request Heartbeat
Format: REQUESTHEARTBEAT(association id, destination transport address) → result
This primitive instructs the local endpoint to perform a Heartbeat on the specified destination transport address of the given association. The returned result should indicate whether the transmission of the HEARTBEAT chunk to the destination address is successful.
Get SRTT Report
Format: GETSRTTREPORT(association id, destination transport address) → SRTT result
This primitive instructs the local SCTP to report the current SRTT measurement on the specified destination transport address of the given association. The returned result can be an integer containing the most recent SRTT in milliseconds.
Set Failure Threshold
Format: SETFAILURETHRESHOLD(association id, destination transport address, failure threshold) → result
This primitive allows the local SCTP to customize the reachability failure detection threshold Path.Max.Retrans for the specified destination address.
Set Protocol Parameters
Format: SETPROTOCOLPARAMETERS(association id, [,destination transport address,] protocol parameter list) → result
Receive Unsent Message
Format: RECEIVE_UNSENT(data retrieval id, buffer address, buffer size [,stream id] [,stream sequence number] [,partial flag] [,payload protocol-id])
Receive Unacknowledged Message

Format: RECEIVE_UNACKED(data retrieval id, buffer address, buffer size [,stream id] [,stream sequence number] [,partial flag] [,payload protocol-id])

Destroy SCTP Instance

Format: DESTROY(local SCTP instance name)
one stream may be blocked as it waits for the next in-sequence user message, delivery from other streams may proceed. SCTP provides a mechanism for bypassing the sequenced delivery service. User messages sent via this mechanism are delivered to the SCTP user as soon as they are received.
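The behavior just described, in which one blocked stream does not hold up delivery on the other streams, can be sketched as follows. This is an illustrative Python sketch, not an implementation of RFC 2960; the class and field names are invented.

```python
from collections import defaultdict

class StreamReceiver:
    """Deliver messages to the ULP in sequence within each stream.
    A gap in one stream blocks only that stream; other streams proceed,
    so there is no head-of-line blocking across streams."""

    def __init__(self):
        self.next_ssn = defaultdict(int)   # next expected SSN, per stream
        self.pending = defaultdict(dict)   # out-of-order messages, per stream

    def receive(self, stream_id: int, ssn: int, data):
        """Accept one message; return the list now deliverable to the ULP."""
        self.pending[stream_id][ssn] = data
        delivered = []
        while self.next_ssn[stream_id] in self.pending[stream_id]:
            delivered.append(self.pending[stream_id].pop(self.next_ssn[stream_id]))
            self.next_ssn[stream_id] += 1
        return delivered
```

A message flagged for unordered delivery would simply bypass this buffer and be handed up immediately.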
User Data Fragmentation When needed, SCTP fragments user messages to ensure that the SCTP packet passed to the lower layer conforms to the path MTU. On receipt, fragments are reassembled into complete messages before being passed to the SCTP user.
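The fragmentation step can be sketched as follows. The helper names are invented; in the real protocol the first and last pieces are marked by the B (beginning) and E (ending) flag bits in the DATA chunk header, which is what this sketch models.

```python
def fragment(message: bytes, max_payload: int):
    """Split a user message into (b_bit, e_bit, piece) fragments that each
    fit within max_payload, so the resulting packets respect the path MTU."""
    pieces = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    frags = []
    for i, piece in enumerate(pieces):
        b_bit = 1 if i == 0 else 0                # beginning fragment
        e_bit = 1 if i == len(pieces) - 1 else 0  # ending fragment
        frags.append((b_bit, e_bit, piece))
    return frags

def reassemble(frags) -> bytes:
    """Receiver side: rebuild the complete message before handing it up."""
    return b"".join(piece for _, _, piece in frags)
```

An unfragmented message carries both flags set (B = 1, E = 1), which falls out of the single-piece case above.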
Table 7.9 SCTP-to-ULP Primitives

DATA ARRIVE Notification
SCTP will invoke this notification on the ULP when a user message is successfully received and ready for retrieval.
SEND FAILURE Notification
If a message cannot be delivered, SCTP will invoke this notification on the ULP.
NETWORK STATUS CHANGE Notification
When a destination transport address is marked inactive (for example, when SCTP detects a failure), or marked active (for example, when SCTP detects a recovery), SCTP will invoke this notification on the ULP.
COMMUNICATION UP Notification
When SCTP becomes ready to send or receive user messages, or when a lost communication to an endpoint is restored, SCTP will use this notification.
COMMUNICATION LOST Notification
When SCTP loses communication to an endpoint completely (for example, via heartbeats) or detects that the endpoint has performed an abort operation, it will invoke this notification on the ULP.
COMMUNICATION ERROR Notification
When SCTP receives an ERROR chunk from its peer and decides to notify its ULP, it can invoke this notification on the ULP.
RESTART Notification
When SCTP detects that a peer has restarted, it may send this notification to its ULP.
SHUTDOWN COMPLETE Notification
When SCTP completes the shutdown procedures, this notification is passed to the upper layer by SCTP.
Acknowledgment and Congestion Avoidance SCTP assigns a transmission sequence number (TSN) to each user data fragment or unfragmented message. The TSN is independent of any stream-sequence number assigned at the stream level. The receiving end acknowledges all TSNs received, even if there are gaps in the sequence. In this way, reliable delivery is kept functionally separate from sequenced stream delivery. The acknowledgment and congestion-avoidance function is responsible for packet retransmission when timely acknowledgment has not been received. Packet retransmission is conditioned by congestion-avoidance procedures similar to those used for TCP.

Figure 7.35 SCTP association.
[Figure 7.35 shows an SCTP association between SCTP Node A and SCTP Node B. On each node, an SCTP user application sits above the SCTP transport service, which in turn sits above the IP network service; each node may present one or more IP address appearances to the network transport.]
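The acknowledgment scheme described under Acknowledgment and Congestion Avoidance, in which every received TSN is acknowledged even when there are gaps in the sequence, can be illustrated by computing a SACK-style report. This is a simplified sketch: the function name is invented, and TSN wraparound is ignored for brevity.

```python
def sack_report(received_tsns, last_cum_ack):
    """Return (cumulative TSN ack point, gap-ack blocks) from the set of
    TSNs received so far. Gap blocks are (start, end) offsets from the
    cumulative ack point, as in the SACK chunk."""
    cum = last_cum_ack
    while cum + 1 in received_tsns:      # advance over the contiguous prefix
        cum += 1
    gaps, start, prev = [], None, None
    for t in sorted(received_tsns):
        if t <= cum:
            continue                     # already covered by the cum ack
        if start is None:
            start, prev = t, t
        elif t == prev + 1:
            prev = t                     # extend the current gap-ack block
        else:
            gaps.append((start - cum, prev - cum))
            start, prev = t, t
    if start is not None:
        gaps.append((start - cum, prev - cum))
    return cum, gaps
```

The receiver thus tells the sender exactly which TSNs beyond the cumulative ack point have arrived, so only the missing ones need retransmission.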
Figure 7.36 Functional view of SCTP transport service.
[Figure 7.36 shows the functions that make up the SCTP transport service beneath the SCTP user application: association startup and takedown, sequenced delivery within streams, user data fragmentation, acknowledgment and congestion avoidance, chunk bundling, packet validation, and path management.]
Chunk Bundling The SCTP packet as delivered to the lower layer consists of a common header followed by one or more chunks, which are units of information within an SCTP packet consisting of a header and chunk-specific content. Each chunk may contain either user data or SCTP control information. The SCTP user has the option to request bundling of more than one user message into a single SCTP packet. The chunk-bundling function of SCTP is responsible for assembly of the complete SCTP packet and its disassembly at the receiving end. During times of congestion, an SCTP implementation may still perform bundling even if the user has requested that SCTP not bundle. The user’s disabling of bundling affects only those implementations that would otherwise introduce a small time delay before transmission to encourage bundling: when the user layer disables bundling, this delay is prohibited, but the bundling performed during congestion or retransmission is not.
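Chunk assembly can be sketched as follows. Each chunk carries a 4-octet header (type, flags, length) and is padded to a 4-octet boundary; the length field covers the header and value but not the padding. The helper names are invented, and a real bundler must also honor the rule (stated later in this section) that INIT, INIT ACK, and SHUTDOWN COMPLETE chunks travel alone.

```python
import struct

def make_chunk(ctype: int, flags: int, value: bytes) -> bytes:
    """Build one chunk: type, flags, length (header + value), then padding
    out to a 4-octet boundary."""
    length = 4 + len(value)
    pad = (-length) % 4
    return struct.pack("!BBH", ctype, flags, length) + value + b"\x00" * pad

def bundle(common_header: bytes, chunks, mtu: int) -> bytes:
    """Bundle as many whole chunks as fit under the path MTU into one
    SCTP packet; leftover chunks would go into the next packet."""
    packet = common_header
    for chunk in chunks:
        if len(packet) + len(chunk) > mtu:
            break
        packet += chunk
    return packet
```

Disassembly at the receiver simply walks the packet chunk by chunk, reading each length field and skipping its padding.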
Packet Validation A mandatory Verification Tag field and a 32-bit (Adler-32) checksum field are included in the SCTP common header. (The Adler-32 checksum is described in the RFC.) The Verification Tag value is chosen by each end of the association during association startup. Packets received without the expected Verification Tag value are discarded to protect against blind masquerade attacks and stale SCTP packets from a previous association. The Adler-32 checksum is set by the sender of each SCTP packet to provide additional protection against data corruption in the network. The receiver of an SCTP packet with an invalid Adler-32 checksum silently discards the packet.
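The checksum procedure can be sketched with Python's zlib.adler32, which computes the same Adler-32 value: the sender zeroes the checksum field (octets 8 through 11 of the common header), computes the checksum over the whole packet, and writes the result back; the receiver repeats the computation and compares. The function names below are invented. (RFC 3309 later replaced Adler-32 with CRC32c, but the fill-and-verify procedure is the same.)

```python
import struct
import zlib

def fill_checksum(packet: bytes) -> bytes:
    """Sender side: compute Adler-32 over the packet with the checksum
    field zeroed, then write it into the common header."""
    zeroed = packet[:8] + b"\x00\x00\x00\x00" + packet[12:]
    csum = zlib.adler32(zeroed) & 0xFFFFFFFF
    return packet[:8] + struct.pack("!I", csum) + packet[12:]

def checksum_ok(packet: bytes) -> bool:
    """Receiver side: recompute and compare; a mismatch means the packet
    is silently discarded."""
    received = struct.unpack("!I", packet[8:12])[0]
    zeroed = packet[:8] + b"\x00\x00\x00\x00" + packet[12:]
    return received == (zlib.adler32(zeroed) & 0xFFFFFFFF)
```

The Verification Tag check is separate and cheaper: it is a straight comparison of the 32-bit tag field against the value expected for the association.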
Path Management The sending SCTP user is able to manipulate the set of transport addresses used as destinations for SCTP packets through the API primitives. The SCTP path management function chooses the destination transport address for each outgoing SCTP packet based on the SCTP user’s instructions and the currently perceived reachability status of the eligible destination set. The path management function monitors reachability through heartbeats when other packet traffic is inadequate to provide this information, and it advises the SCTP user when the reachability of any far-end transport address changes. The path management function is also responsible for reporting the eligible set of local transport addresses to the far end during association startup and for reporting the transport addresses returned from the far end to the SCTP user. At association startup, a primary path is defined for each SCTP endpoint and is used for the normal sending of SCTP packets. On the receiving end, the path management function is responsible for verifying the existence of a valid SCTP association to which the inbound SCTP packet belongs before passing it for further processing. Note: Although path management and packet validation are described separately here, they are performed at the same time and cannot be separated.
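The reachability bookkeeping described above can be sketched as follows. This is an illustrative sketch with invented class and method names: a real implementation also drives heartbeat timers and RTO calculation, and the failure threshold corresponds to the Path.Max.Retrans parameter from Table 7.8.

```python
class PathManager:
    """Track reachability of a peer's transport addresses. An address is
    marked inactive after more than path_max_retrans consecutive failures
    and active again when a heartbeat is acknowledged."""

    def __init__(self, addresses, primary, path_max_retrans=5):
        self.errors = {addr: 0 for addr in addresses}
        self.active = {addr: True for addr in addresses}
        self.primary = primary
        self.path_max_retrans = path_max_retrans

    def heartbeat_ack(self, addr):
        self.errors[addr] = 0
        self.active[addr] = True       # recovery: address reachable again

    def heartbeat_timeout(self, addr):
        self.errors[addr] += 1
        if self.errors[addr] > self.path_max_retrans:
            self.active[addr] = False  # failure: stop sending here

    def choose(self):
        """Prefer the primary path; fall back to any active alternate."""
        if self.active[self.primary]:
            return self.primary
        for addr, up in self.active.items():
            if up:
                return addr
        return None
```

Each activity or inactivity transition is also the point at which the NETWORK STATUS CHANGE notification of Table 7.9 would be raised to the ULP.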
Key Terms Some of the language used to describe SCTP was introduced in previous sections. Table 7.10 provides a consolidated list of key terms and their definitions.
Serial Number Arithmetic The actual TSN space is finite, though very large. This space ranges from 0 to 2³² − 1. Since the space is finite, all arithmetic dealing with TSNs must be performed modulo 2³². This unsigned arithmetic preserves the relationship of sequence numbers as they cycle from 2³² − 1 to 0 again. There are some subtleties to computer modulo arithmetic, so great care should be taken in programming the comparison of such values. When referring to TSNs, the ≤ symbol means “less than or equal to” (modulo 2³²). Comparisons and arithmetic on TSNs use serial number arithmetic as defined in RFC 1982 [17], where SERIAL_BITS = 32. An endpoint should not transmit a DATA chunk with a TSN that is more than 2³¹ − 1 above the beginning TSN of its current send window. Doing so will cause problems in comparing TSNs. TSNs wrap around when they reach 2³² − 1. That is, the next TSN that a DATA chunk must use after transmitting TSN = 2³² − 1 is TSN = 0.
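The RFC 1982 comparison rule for SERIAL_BITS = 32 can be written compactly; the helper names are invented for illustration.

```python
TSN_BITS = 32
MOD = 1 << TSN_BITS           # 2**32: size of the TSN space
HALF = 1 << (TSN_BITS - 1)    # 2**31: the comparison threshold

def tsn_gt(a: int, b: int) -> bool:
    """True if TSN a is 'greater than' TSN b in serial number arithmetic:
    a != b and the modular distance from b to a is less than 2**31."""
    return a != b and ((a - b) % MOD) < HALF

def tsn_next(a: int) -> int:
    """The TSN after a, wrapping from 2**32 - 1 back to 0."""
    return (a + 1) % MOD
```

This is exactly why a sender must keep its window within 2³¹ − 1 of the window base: if two outstanding TSNs were 2³¹ or more apart, the comparison above would become ambiguous.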
Table 7.10 Key Terms

Active destination transport address
A transport address on a peer endpoint that a transmitting endpoint considers available for receiving user messages.
Bundling
An optional multiplexing operation whereby more than one user message may be carried in the same SCTP packet. Each user message occupies its own DATA chunk.
Chunk
A unit of information within an SCTP packet, consisting of a chunk header and chunk-specific content.
Congestion window (cwnd)
An SCTP variable that limits the data, in number of bytes, a sender can send to a particular destination transport address before receiving an acknowledgment.
Cumulative TSN Ack Point
The TSN of the last DATA chunk acknowledged via the Cumulative TSN Ack field of a SACK.
Idle destination address
An address that has not had user messages sent to it within some length of time— normally, the HEARTBEAT interval or longer.
Inactive destination transport address
An address considered inactive from errors and unavailable to transport user messages.
Message = user message
Data submitted to SCTP by the ULP.
Message Authentication Code (MAC)
An integrity check mechanism, based on cryptographic hash functions, using a secret key. Typically, message authentication codes are used between two parties that share a secret key in order to validate information transmitted between these parties. In SCTP, it is used by an endpoint to validate the State Cookie information returned from the peer in the COOKIE ECHO chunk. The term MAC has different meanings in different contexts. SCTP uses this term with the same meaning as in RFC 2104.
Network Byte Order
Most significant byte first; also known as Big Endian.
Ordered Message
A user message that is delivered in order with respect to all previous user messages sent within the stream on which the user message was sent.
Outstanding TSN (at an SCTP endpoint)
A TSN, and its associated DATA chunk, that has been sent by the endpoint but for which an acknowledgement has yet to be received.
Path
The route taken by the SCTP packets sent by one SCTP endpoint to a specific destination transport address of its peer SCTP endpoint. Sending to different destination transport addresses does not necessarily guarantee getting separate paths.
Primary Path
The primary path is the destination and source address that will be put into a packet outbound to the peer endpoint by default. The definition includes the source address, since an implementation may wish to specify both destination and source address to better control the return path taken by reply chunks and on which interface the packet is transmitted when the data sender is multihomed.
Receiver Window (rwnd)
An SCTP variable that a data sender uses to store the most recently calculated receiver window of its peer, in number of bytes. This variable gives the sender an indication of the space available in the receiver’s inbound buffer.
SCTP association
A protocol relationship between SCTP endpoints, composed of the two SCTP endpoints and protocol state information—Verification Tags, the currently active set of TSNs, and so on. An association can be uniquely identified by the transport address used by the endpoints in the association. Two SCTP endpoints must not have more than one SCTP association between them at any given time.
SCTP endpoint
The logical sender and receiver of SCTP packets. On a multihomed host, an SCTP endpoint is represented to its peers as a combination of (1) a set of eligible destination transport addresses to which SCTP packets can be sent and (2) a set of eligible source transport addresses from which SCTP packets can be received. All transport addresses used by an SCTP endpoint must use the same port number, although they can use multiple IP addresses. A transport address used by an SCTP endpoint must not be used by another SCTP endpoint. In other words, a transport address is unique to an SCTP endpoint.
SCTP packet (or simply packet)
The unit of data delivery across the interface between SCTP and the connectionless packet network (for example, IP). An SCTP packet includes the common SCTP header, possible SCTP control chunks, and user data encapsulated within SCTP DATA chunks.
SCTP user application (SCTP user)
The logical higher-layer application entity that uses the services of SCTP; also called the ULP.
Slow Start Threshold (ssthresh)
An SCTP variable, given in number of bytes. This is the threshold that the endpoint will use to determine whether to perform slow start or congestion avoidance on a particular destination transport address.
Stream
A unidirectional logical channel established from one to another associated SCTP endpoint, within which all user messages are delivered in sequence except for those submitted to the unordered delivery service. The relationship between stream numbers in opposite directions is strictly a matter of how the applications use them. It is the responsibility of the SCTP user to create and manage these correlations if they are so desired.
Stream-sequence number
A 16-bit sequence number used internally by SCTP to ensure the sequenced delivery of user messages within a given stream. One stream-sequence number is attached to each user message.
Tie Tags
Verification Tags from a previous association. These tags are used within a State Cookie so that the newly restarting association can be linked to the original association within the endpoint that did not restart.
Transmission Control Block (TCB)
An internal data structure created by an SCTP endpoint for each of its existing SCTP associations to other SCTP endpoints. TCB contains all the status and operational information for the endpoint to maintain and manage the corresponding association.
Transmission Sequence Number (TSN)
A 32-bit sequence number used internally by SCTP. One TSN is attached to each chunk containing user data to permit the receiving SCTP endpoint to acknowledge its receipt and detect duplicate deliveries.
Transport address
A transport address is traditionally defined by a network-layer address, a transport-layer protocol, and a transport-layer port number. In the case of SCTP over IP, a transport address is defined by the combination of an IP address and an SCTP port number (where SCTP is the transport protocol).
Unacknowledged TSN (at an SCTP endpoint)
A TSN (and the associated DATA chunk) that has been received by the endpoint but for which an acknowledgment has not yet been sent; or, from the sender's perspective, a TSN that has been sent but for which no acknowledgment has yet been received.
Unordered message
Unordered messages are unordered with respect to any other message, including both other unordered messages and other ordered messages. An unordered message might be delivered before or after an ordered message sent on the same stream.
User message
The unit of data delivery across the interface between SCTP and its user.
Verification Tag
A 32-bit unsigned integer that is randomly generated. The Verification Tag provides a key that allows a receiver to verify that the SCTP packet belongs to the current association and is not an old or stale packet from a previous association.
Signaling Approaches
Figure 7.37 SCTP packet format.
[Common Header; Chunk #1; ...; Chunk #n]
SCTP Packet Format An SCTP packet is composed of a common header and chunks. A chunk contains either control information or user data. The SCTP packet format is shown in Figure 7.37. Multiple chunks can be bundled into one SCTP packet up to the MTU size except for the INIT, INIT ACK, and SHUTDOWN COMPLETE chunks. These chunks must not be bundled with any other chunk in a packet. If a user data message does not fit into one SCTP packet, it can be fragmented into multiple chunks. All integer fields in an SCTP packet must be transmitted in network-byte order unless otherwise stated.
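As a small illustration of the bundling rule just stated (a sketch of ours, not from the text): INIT (type 1), INIT ACK (type 2), and SHUTDOWN COMPLETE (type 14) must each travel alone in a packet.

```python
# Chunk-type IDs from Table 7.11; these three must not be bundled.
STANDALONE_CHUNKS = {1, 2, 14}  # INIT, INIT ACK, SHUTDOWN COMPLETE

def bundling_ok(chunk_types):
    """True if the listed chunk-type IDs may share a single SCTP packet."""
    if any(t in STANDALONE_CHUNKS for t in chunk_types):
        return len(chunk_types) == 1
    return True
```

For example, two DATA chunks and a SACK may share a packet, but a DATA chunk may not accompany an INIT.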
SCTP Common Header Field Descriptions
Figure 7.38 shows the SCTP common header format.
Source Port Number: 16 bits (unsigned integer). This is the SCTP sender's port number. It can be used by the receiver in combination with the source IP address, the SCTP destination port, and possibly the destination IP address to identify the association to which this packet belongs.
Destination Port Number: 16 bits (unsigned integer). This is the SCTP port number to which this packet is destined. The receiving host uses this port number to demultiplex the SCTP packet to the correct receiving endpoint per application.
Verification Tag: 32 bits (unsigned integer). The receiver of this packet uses the Verification Tag to validate the sender of this SCTP packet. On transmission, the value of this Verification Tag must be set to the value of the Initiate Tag received from the peer endpoint during the association setup, with some exceptions listed in the RFC. A packet containing an INIT chunk must have a 0 Verification Tag. An INIT chunk must be the only chunk carried in the SCTP packet.
Checksum: 32 bits (unsigned integer). This field contains the checksum of this SCTP packet. SCTP uses the Adler-32 algorithm for calculating the checksum.
Figure 7.38 SCTP common header format.
[Source Port Number (16 bits) | Destination Port Number (16 bits); Verification Tag (32 bits); Checksum (32 bits)]
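The 12-byte common header can be unpacked directly. A minimal sketch in Python (field names are ours; the checksum is not verified here):

```python
import struct

def parse_common_header(packet: bytes):
    """Split the 12-byte SCTP common header from the chunks that follow.

    All fields are transmitted in network-byte order: source port and
    destination port (16 bits each), then the Verification Tag and the
    Adler-32 checksum (32 bits each). Checksum verification is omitted.
    """
    src_port, dst_port, tag, checksum = struct.unpack_from("!HHII", packet, 0)
    header = {"src_port": src_port, "dst_port": dst_port,
              "verification_tag": tag, "checksum": checksum}
    return header, packet[12:]  # remaining bytes are the chunks
```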
Chapter Seven
Figure 7.39 Field format for those chunks to be transmitted in the SCTP packet.
[Chunk Type (8 bits) | Chunk Flags (8 bits) | Chunk Length (16 bits); Chunk Value (variable length)]
Chunk Field Descriptions Figure 7.39 illustrates the field format for those chunks to be transmitted in the SCTP packet. Each chunk is formatted with a Chunk-Type field, a chunk-specific Flag field, a Chunk-Length field, and a Chunk-Value field. Chunk Type: 8 bits (unsigned integer). This field identifies the type of information contained in the Chunk-Value field. It takes a value from 0 to 254. The value of 255 is reserved for future use as an extension field. The values of chunk types are defined in Table 7.11. Chunk types are encoded so that the highest-order two bits specify the action that must be taken if the processing endpoint does not recognize the chunk type.
Table 7.11 Values of Chunk Types
ID value     Chunk type
0            Payload Data (DATA)
1            Initiation (INIT)
2            Initiation Acknowledgment (INIT ACK)
3            Selective Acknowledgment (SACK)
4            Heartbeat Request (HEARTBEAT)
5            Heartbeat Acknowledgment (HEARTBEAT ACK)
6            Abort (ABORT)
7            Shutdown (SHUTDOWN)
8            Shutdown Acknowledgment (SHUTDOWN ACK)
9            Operation Error (ERROR)
10           State Cookie (COOKIE ECHO)
11           Cookie Acknowledgment (COOKIE ACK)
12           Reserved for Explicit Congestion Notification Echo (ECNE)
13           Reserved for Congestion Window Reduced (CWR)
14           Shutdown Complete (SHUTDOWN COMPLETE)
15 to 62     Reserved by IETF
63           IETF-defined chunk extensions
64 to 126    Reserved by IETF
127          IETF-defined chunk extensions
128 to 190   Reserved by IETF
191          IETF-defined chunk extensions
192 to 254   Reserved by IETF
255          IETF-defined chunk extensions
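The encoding of the two highest-order type bits described above can be sketched as follows (a hypothetical helper of ours, not from the text):

```python
# Action taken when a chunk type is not recognized, keyed by the
# two highest-order bits of the 8-bit Chunk Type field.
ACTIONS = {
    0b00: "stop processing and discard the packet",
    0b01: "stop processing, discard, and report the unrecognized type",
    0b10: "skip this chunk and continue",
    0b11: "skip this chunk, continue, and report in an ERROR chunk",
}

def unrecognized_type_action(chunk_type: int) -> str:
    """Map an unrecognized chunk type to the mandated action."""
    return ACTIONS[(chunk_type >> 6) & 0b11]
```

Note how the reserved/extension ranges in Table 7.11 fall on these boundaries: type 63 yields the 00 action, 127 the 01 action, 191 the 10 action, and 255 the 11 action.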
00 Stop processing this SCTP packet and discard it; do not process any further chunks within it.
01 Stop processing this SCTP packet and discard it; do not process any further chunks within it, and report the unrecognized chunk in an Unrecognized Chunk Type cause of error (in either an ERROR or INIT ACK chunk).
10 Skip this chunk and continue processing.
11 Skip this chunk and continue processing, but report it in an ERROR chunk by using the Unrecognized Chunk Type cause of error.
Note: The ECNE and CWR chunk types are reserved for future use of Explicit Congestion Notification (ECN).
Chunk Flags: 8 bits. The use of these bits depends on the chunk type. Unless otherwise specified, Chunk Flags are set to 0 on transmit and are ignored on receipt.
Chunk Length: 16 bits (unsigned integer). This value represents the size of the chunk in bytes, including the Chunk-Type, Chunk-Flags, Chunk-Length, and Chunk-Value fields. Therefore, if the Chunk-Value field length is 0, the Chunk-Length field will be set to 4. The Chunk-Length field does not count any padding.
Chunk Value: variable length. The Chunk-Value field contains the actual information to be transferred in the chunk. The use and format of this field depend on the chunk type. The total length of a chunk (including the Type, Length, and Value fields) must be a multiple of 4 bytes. If the length of the chunk is not a multiple of 4 bytes, the sender must pad the chunk with all-0 bytes; this padding is not included in the Chunk-Length field. The sender should never pad with more than 3 bytes, and the receiver must ignore the padding bytes.
Optional and Variable-Length Parameter Format
Values of SCTP control chunks consist of a chunk-type-specific header of required fields, followed by 0 or more parameters. The optional and variable-length parameters contained in a chunk are defined in a Type-Length-Value format, as shown in Figure 7.40.
Figure 7.40 Optional and variable-length parameter format.
[Parameter Type (16 bits) | Parameter Length (16 bits); Parameter Value (variable length)]
Chunk-Parameter Type: 16 bits (unsigned integer). This field is a 16-bit identifier of the type of parameter. It takes a value of 0 to 65534. The value of 65535 is reserved for IETF-defined extensions. Values other than those defined in specific SCTP-chunk descriptions are reserved for use by the IETF.
Chunk-Parameter Length: 16 bits (unsigned integer). This field contains the size of the parameter in bytes, including the Parameter-Type, Parameter-Length, and Parameter-Value fields. Thus, a parameter with a 0-length Parameter-Value field will have a Length field of 4. The Parameter Length does not include any padding bytes.
Chunk-Parameter Value: variable length. This field contains the actual information to be transferred in the parameter. The total length of a parameter (including the Parameter-Type, -Length, and -Value fields) must be a multiple of 4 bytes. If the length of the parameter is not a multiple of 4 bytes, the sender pads the parameter at the end (that is, after the Parameter-Value field) with all-zero bytes. The length of the padding is not included in the Parameter-Length field. A sender should not pad with more than 3 bytes, and the receiver must ignore the padding bytes.
The parameter types are encoded so that the highest-order two bits specify the action that must be taken if the processing endpoint does not recognize the parameter type:
00 Stop processing this SCTP packet and discard it; do not process any further chunks within it.
01 Stop processing this SCTP packet and discard it; do not process any further chunks within it, and report the unrecognized parameter in an Unrecognized Parameter Type (in either an ERROR or INIT ACK chunk).
10 Skip this parameter and continue processing.
11 Skip this parameter and continue processing, but report the unrecognized parameter in an Unrecognized Parameter Type (in either an ERROR or INIT ACK chunk).
The actual SCTP parameters are defined in the specific SCTP-chunk sections.
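The padding rule means a parser advances to the next 4-byte boundary after each parameter. A minimal sketch (a hypothetical helper; the unrecognized-type actions above are not applied here):

```python
import struct

def iter_parameters(value: bytes):
    """Walk the TLV parameters in a chunk's value field.

    Each parameter carries its type (16 bits) and length (16 bits,
    padding excluded), followed by the value; the next parameter
    starts at the next 4-byte boundary.
    """
    offset = 0
    while offset < len(value):
        ptype, plen = struct.unpack_from("!HH", value, offset)
        yield ptype, value[offset + 4 : offset + plen]
        offset += (plen + 3) & ~3  # skip up to 3 padding bytes
```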
SCTP-Chunk Definitions
Here, we define the format of the different SCTP-chunk types.
Payload Data (DATA) Chunk (ID=0)
The format of Figure 7.41 must be used for the DATA chunk.
Reserved: 5 bits. Should be set to all 0s and ignored by the receiver.
U bit: 1 bit. The (U)nordered bit, if set to 1, indicates an unordered DATA chunk to which no stream-sequence number has been assigned. Therefore, the
Figure 7.41 Payload data.
[Type = 0 | Reserved | U B E | Length; TSN; Stream Identifier S | Stream Sequence Number n; Payload Protocol Identifier; User Data (seq n of Stream S)]
receiver must ignore the Stream-Sequence-Number field. After reassembly (if necessary), unordered DATA chunks must be dispatched to the upper layer by the receiver without any attempt to reorder. If an unordered user message is fragmented, each fragment of the message must have its U bit set to 1.
B bit: 1 bit. The (B)eginning fragment bit, if set, indicates the first fragment of a user message.
E bit: 1 bit. The (E)nding fragment bit, if set, indicates the last fragment of a user message. An unfragmented user message must have both the B and E bits set to 1. Setting both B and E bits to 0 indicates a middle fragment of a multifragment user message, as summarized in Table 7.12. When a user message is fragmented into multiple chunks, the receiver uses the TSNs to reassemble the message, which means that the TSNs for each fragment of a fragmented user message must be strictly sequential.
Length: 16 bits (unsigned integer). This field indicates the length of the DATA chunk in bytes from the beginning of the Type field to the end of the User Data field (excluding any padding). A DATA chunk with no User Data field has a Length set to 16, indicating 16 bytes.
TSN: 32 bits (unsigned integer). This value represents the TSN for this DATA chunk. The valid range of TSN is from 0 to 4294967295 (2**32 − 1). The TSN wraps back to 0 after reaching 4294967295.
Stream Identifier S: 16 bits (unsigned integer). This value identifies the stream to which the following user data belongs.
Table 7.12 Fragment Description Flags
B bit   E bit   Description
1       0       First piece of a fragmented user message
0       0       Middle piece of a fragmented user message
0       1       Last piece of a fragmented user message
1       1       Unfragmented message
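Table 7.12 maps directly to a small lookup (an illustrative sketch of ours):

```python
def fragment_kind(b_bit: int, e_bit: int) -> str:
    """Classify a DATA chunk from its B and E flags, per Table 7.12."""
    kinds = {
        (1, 1): "unfragmented message",
        (1, 0): "first piece",
        (0, 0): "middle piece",
        (0, 1): "last piece",
    }
    return kinds[(b_bit, e_bit)]
```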
Stream-Sequence Number n: 16 bits (unsigned integer). This value represents the stream-sequence number of the following user data within stream S. The valid range is 0 to 65535. When a user message is fragmented by SCTP for transport, the same stream-sequence number must be carried in each of the fragments of the message.
Payload Protocol Identifier: 32 bits (unsigned integer). This value represents an application-specified, or upper layer–specified, protocol identifier. It is passed to SCTP by its upper layer and sent to its peer. It is not used by SCTP but can be used by certain network entities, as well as by the peer application, to identify the type of information carried in this DATA chunk. This field must be sent even in fragmented DATA chunks to ensure that it is available for agents in the middle of the network. The value 0 indicates that no application identifier is specified by the upper layer for this payload data.
User Data: variable length. (Note: This is the payload user data.) The implementation must pad the end of the data to a 4-byte boundary with all-0 bytes. Any padding must not be included in the Length field, and a sender must never add more than 3 bytes of padding.
Initiation (INIT) Chunk (ID=1)
This chunk is used to initiate an SCTP association between two endpoints. The format of the INIT chunk is shown in Figure 7.42. The INIT chunk contains the parameters given in Tables 7.13 and 7.14. Unless otherwise noted, each parameter must be included only once in the INIT chunk. The Chunk-Flags field in INIT is reserved, and all bits in it should be set to 0 by the sender and ignored by the receiver. The sequence of parameters within an INIT can be processed in any order.
Initiate Tag: 32 bits (unsigned integer). The receiver of the INIT (the responding end) records the value of the Initiate Tag parameter. This value must be placed into the Verification Tag field of every SCTP packet that the receiver of the INIT transmits within this association.
The Initiate Tag is allowed to have any value except 0. If the value of the Initiate Tag in a received INIT chunk is
Figure 7.42 INIT chunk.
[Type = 1 | Chunk Flags | Chunk Length; Initiate Tag; Advertised Receiver Window Credit (a_rwnd); Number of Outbound Streams | Number of Inbound Streams; Initial TSN; Optional/Variable-Length Parameters]
Table 7.13 Fixed Parameters
Fixed parameters                     Status
Initiate Tag                         Mandatory
Advertised Receiver Window Credit    Mandatory
Number of Outbound Streams           Mandatory
Number of Inbound Streams            Mandatory
Initial TSN                          Mandatory
0, the receiver must treat it as an error and close the association by transmitting an ABORT.
Advertised Receiver Window Credit (a_rwnd): 32 bits (unsigned integer). This value represents the dedicated buffer space, in number of bytes, that the sender of the INIT has reserved in association with this window. During the life of the association, this buffer space should not be lessened (that is, dedicated buffers taken away from this association); however, an endpoint may change the value of a_rwnd that it sends in SACK chunks.
Number of Outbound Streams (OSs): 16 bits (unsigned integer). Defines the number of outbound streams the sender of this INIT chunk wishes to create in this association. The value of 0 must not be used. A receiver of an INIT with the OS value set to 0 should abort the association.
Number of Inbound Streams (MISs): 16 bits (unsigned integer). Defines the maximum number of streams that the sender of this INIT chunk allows the peer end to create in this association. The value 0 must not be used. There is
Table 7.14 Variable Parameters
Variable parameters          Status     Type value
IPv4 Address*                Optional   5
IPv6 Address*                Optional   6
Cookie Preservative          Optional   9
Reserved for ECN Capable†    Optional   32768 (0x8000)
Host Name Address‡           Optional   11
Supported Address Types§     Optional   12
*The INIT chunks can contain multiple addresses that can be IPv4 and/or IPv6 in any combination.
†The ECN Capable field is reserved for future use of ECN.
‡An INIT chunk must not contain more than one Host Name address parameter. Moreover, the sender of the INIT must not combine any other address types with the Host Name address in the INIT. The receiver of the INIT must ignore any other address types if the Host Name address parameter is present in the received INIT chunk.
§This parameter, when present, specifies all the address types that the sending endpoint can support. The absence of this parameter indicates that the sending endpoint can support any address type.
Figure 7.43 IPv4 address parameter (5).
[Type = 5 | Length = 8; IPv4 Address]
no negotiation of the actual number of streams; instead, the two endpoints will use the min(requested, offered). A receiver of an INIT with an MIS value of 0 should abort the association.
Initial TSN (I-TSN): 32 bits (unsigned integer). Defines the initial TSN that the sender will use. The valid range is from 0 to 4294967295. This field may be set to the value of the Initiate Tag field.
Optional and Variable-Length Parameters in INIT
The parameters in Figures 7.43 and 7.44 follow the Type-Length-Value format as defined previously. Any Type-Length-Value fields must come after the fixed-length fields.
IPv4 Address: 32 bits (unsigned integer). Contains an IPv4 address of the sending endpoint. It is binary-encoded.
IPv6 Address: 128 bits (unsigned integer). Contains an IPv6 address of the sending endpoint. It is binary-encoded. Note: A sender must not use an IPv4-mapped IPv6 address [18] but should instead use an IPv4 Address parameter for an IPv4 address.
Combined with the source port number in the SCTP common header, the value passed in an IPv4 or IPv6 Address parameter indicates a transport address that the sender of the INIT will support for the association being initiated. That is, during the lifetime of this association, this IP address can appear in the source address field of an IP datagram sent from the sender of the INIT and can be used as a destination address of an IP datagram sent from the receiver of the INIT. More than one IP Address parameter can be included in an INIT chunk when the INIT sender is multihomed. Moreover, a multihomed endpoint may have access to different types of networks; thus more than one address type can be present in one INIT chunk. In other words, IPv4 and IPv6 addresses are allowed in the same INIT chunk.
Figure 7.44 IPv6 address parameter (6).
[Type = 6 | Length = 20; IPv6 Address]
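Encoding these address parameters is a direct TLV construction. A sketch (helper names are ours):

```python
import socket
import struct

def ipv4_address_parameter(addr: str) -> bytes:
    """IPv4 Address parameter: type 5, total length 8 bytes."""
    return struct.pack("!HH", 5, 8) + socket.inet_aton(addr)

def ipv6_address_parameter(addr: str) -> bytes:
    """IPv6 Address parameter: type 6, total length 20 bytes."""
    return struct.pack("!HH", 6, 20) + socket.inet_pton(socket.AF_INET6, addr)
```

A multihomed INIT sender would simply concatenate one such parameter per address after the chunk's fixed fields.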
Figure 7.45 Suggested cookie lifespan increment.
[Type = 9 | Length = 8; Suggested Cookie Lifespan Increment (msec)]
If the INIT contains at least one IP Address parameter, then the source address of the IP datagram containing the INIT chunk and any additional addresses provided within the INIT can be used as destinations by the endpoint receiving the INIT. If the INIT does not contain any IP Address parameters, the endpoint receiving the INIT must use the source address associated with the received IP datagram as its sole destination address for the association. Note that not using any IP Address parameters in the INIT and INIT ACK is an alternative means for making an association more likely to work across a NAT box.
Cookie Preservative (ID = 9)
The sender of the INIT must use the parameter in Figure 7.45 to suggest to the receiver of the INIT a longer lifespan for the State Cookie.
Suggested Cookie Lifespan Increment: 32 bits (unsigned integer). This parameter indicates to the receiver the increment, in milliseconds, that the sender wishes the receiver to add to its default cookie lifespan. This optional parameter should be added to the INIT chunk by the sender when it reattempts to establish an association with a peer for which its previous attempt failed because of a Stale Cookie operation error. The receiver may choose to ignore the suggested cookie lifespan increase for its own security reasons.
Host Name Address (ID = 11)
The sender of the INIT uses the parameter in Figure 7.46 to pass its host name (in place of its IP addresses) to its peer. The peer is responsible for resolving the name. Using the Host Name Address parameter might make the association more likely to work across a NAT box.
Host Name: variable length. This field contains a host name in host name syntax per RFC 1123, Section 2.1 [19]. The method for resolving the host name is out of the scope of SCTP. Note: At least one null terminator is included in the host name string and must be included in the length.
Figure 7.46 Host name address.
[Type = 11 | Length; Host Name]
Figure 7.47 Address types.
[Type = 12 | Length; Address Type #1 | Address Type #2; ...]
Supported Address Types (ID = 12)
The sender of the INIT uses the parameter in Figure 7.47 to list all the address types that it can support.
Address Type: 16 bits (unsigned integer). This is filled with the type value of the corresponding address TLV, such as IPv4 = 5, IPv6 = 6, or Host Name = 11.
Initiation Acknowledgment (INIT ACK) Chunk (ID=2)
The INIT ACK chunk is used to acknowledge the initiation of an SCTP association. The parameter part of the INIT ACK is formatted similarly to that of the INIT chunk. It uses two extra variable parameters: the State Cookie and the Unrecognized Parameters. The format of the INIT ACK chunk is shown in Figure 7.48; also see Tables 7.15 and 7.16.
Initiate Tag: 32 bits (unsigned integer). The receiver of the INIT ACK records the value of the Initiate Tag parameter. This value must be placed into the Verification Tag field of every SCTP packet that the INIT ACK receiver transmits within this association. The Initiate Tag must not take the value 0. If the value of the Initiate Tag in a received INIT ACK chunk is found to be 0, the receiver must treat it as an error and close the association by transmitting an ABORT.
Advertised Receiver Window Credit (a_rwnd): 32 bits (unsigned integer). Represents the dedicated buffer space, in number of bytes, that the sender of the INIT ACK has reserved in association with this window. During the life of the association, this buffer space should not be lessened (that is, dedicated buffers taken away from the association).
Number of Outbound Streams (OSs): 16 bits (unsigned integer). Defines the number of outbound streams that the sender of this INIT ACK chunk wishes to create in this association. The value of 0 must not be used. A receiver of an
Figure 7.48 Initiation acknowledgment.
[Type = 2 | Chunk Flags | Chunk Length; Initiate Tag; Advertised Receiver Window Credit; Number of Outbound Streams | Number of Inbound Streams; Initial TSN; Optional/Variable-Length Parameters]
Table 7.15 Fixed Parameters
Fixed parameters                     Status
Initiate Tag                         Mandatory
Advertised Receiver Window Credit    Mandatory
Number of Outbound Streams           Mandatory
Number of Inbound Streams            Mandatory
Initial TSN                          Mandatory
INIT ACK with the OS value set to 0 should destroy the association, discarding its TCB.
Number of Inbound Streams (MISs): 16 bits (unsigned integer). Defines the maximum number of streams that the sender of this INIT ACK chunk allows the peer end to create in this association. The value 0 must not be used. There is no negotiation of the actual number of streams; instead, the two endpoints will use the min(requested, offered). A receiver of an INIT ACK with the MIS value set to 0 should destroy the association, discarding its TCB.
Initial TSN (I-TSN): 32 bits (unsigned integer). Defines the initial TSN that the INIT ACK sender will use. The valid range is from 0 to 4294967295. This field may be set to the value of the Initiate Tag field.
Implementation Note: An implementation must be prepared to receive an INIT ACK whose large size (more than 1500 bytes) is due to the variable size of the State Cookie and the variable address list. For example, if a responder to the INIT has 1000 IPv4 addresses it wishes to send, it would need at least 8000 bytes to encode this in the INIT ACK chunk.
In combination with the source port carried in the SCTP common header, each IP Address parameter in the INIT
Table 7.16 Variable Parameters
Variable parameters          Status      Type value
State Cookie                 Mandatory   7
IPv4 Address*                Optional    5
IPv6 Address*                Optional    6
Unrecognized Parameters      Optional    8
Reserved for ECN Capable†    Optional    32768 (0x8000)
Host Name Address‡           Optional    11
*The INIT ACK chunks can contain any number of IP Address parameters that can be IPv4 and/or IPv6 in any combination.
†The ECN Capable field is reserved for future use of Explicit Congestion Notification.
‡The INIT ACK chunks must not contain more than one Host Name address parameter. Moreover, the sender of the INIT ACK must not combine any other address types with the Host Name address in the INIT ACK. The receiver of the INIT ACK must ignore any other address types if the Host Name address parameter is present.
ACK a valid transport address supported by the sender of the INIT ACK for the lifetime of the association being initiated.
If the INIT ACK contains at least one IP Address parameter, then the source address of the IP datagram containing the INIT ACK and any additional addresses provided within the INIT ACK may be used as destinations by the receiver of the INIT ACK. If the INIT ACK does not contain any IP Address parameters, the receiver of the INIT ACK must use the source address associated with the received IP datagram as its sole destination address for the association.
The State Cookie and Unrecognized Parameters use the Type-Length-Value format as defined previously and are described below. The other fields are defined the same way as their counterparts in the INIT chunk.
Optional and Variable-Length Parameters
State Cookie
Parameter-type value: 7.
Parameter length: variable size. The size depends on the cookie size.
Parameter value. This parameter value must contain all the necessary state and parameter information required for the sender of this INIT ACK to create the association, along with a Message Authentication Code (MAC).
Unrecognized Parameters
Parameter-type value: 8.
Parameter length: variable size.
Parameter value. This parameter value is returned to the originator of the INIT chunk when the INIT contains an unrecognized parameter with a value that indicates that this parameter should be reported to the sender. This parameter-value field will contain unrecognized parameters copied from the INIT chunk, complete with parameter type, length, and value.
Selective Acknowledgment (SACK) Chunk (ID=3)
This chunk (see Figure 7.49) is sent to the peer endpoint to acknowledge received DATA chunks and to inform the peer endpoint of gaps in the received subsequences of DATA chunks as represented by their TSNs. The SACK must contain the Cumulative TSN Ack and a_rwnd parameters.
By definition, the value of the Cumulative TSN Ack parameter is the last TSN received before a break in the sequence of received TSNs occurs; the next TSN value will not yet have been received at the endpoint sending the SACK. This parameter therefore acknowledges receipt of all TSNs less than or equal to its value. The handling of a_rwnd by the receiver of the SACK is discussed in detail in the RFC. The SACK also contains 0 or more Gap Ack Blocks. Each Gap Ack Block acknowledges a subsequence of TSNs received following a break in the sequence
Figure 7.49 Selective acknowledgment.
[Type = 3 | Chunk Flags | Chunk Length; Cumulative TSN Ack; Advertised Receiver Window Credit (a_rwnd); Number of Gap Ack Blocks = N | Number of Duplicate TSNs = X; Gap Ack Block #1 Start | Gap Ack Block #1 End; ...; Gap Ack Block #N Start | Gap Ack Block #N End; Duplicate TSN 1; ...; Duplicate TSN X]
of received TSNs. By definition, all TSNs acknowledged by Gap Ack Blocks are greater than the value of the Cumulative TSN Ack.
Chunk Flags: 8 bits. Set to all 0s on transmit and ignored on receipt.
Cumulative TSN Ack: 32 bits (unsigned integer). This parameter contains the TSN of the last DATA chunk received in sequence before a gap.
Advertised Receiver Window Credit (a_rwnd): 32 bits (unsigned integer). This field indicates the updated receive buffer space, in bytes, of the sender of this SACK.
Number of Gap Ack Blocks: 16 bits (unsigned integer). Indicates the number of Gap Ack Blocks included in this SACK.
Number of Duplicate TSNs: 16 bits (unsigned integer). This field contains the number of duplicate TSNs that the endpoint has received. Each duplicate TSN is listed following the Gap Ack Block list.
Gap Ack Blocks. These fields contain the Gap Ack Blocks. They are repeated for each Gap Ack Block up to the number of Gap Ack Blocks defined in the Number of Gap Ack Blocks field. All DATA chunks with TSNs greater than or equal to (Cumulative TSN Ack + Gap Ack Block Start) and less than or equal to (Cumulative TSN Ack + Gap Ack Block End) of each Gap Ack Block are assumed to have been received correctly.
Gap Ack Block Start: 16 bits (unsigned integer). Indicates the Start offset TSN for this Gap Ack Block. To calculate the actual TSN number, the Cumulative TSN Ack is added to this offset number. This calculated TSN identifies the first TSN in this Gap Ack Block that has been received.
Figure 7.50 Example.
[Newly arrived DATA chunks, lowest TSN first: TSN = 10; TSN = 11; TSN = 12; TSN = 13 still missing; TSN = 14; TSN = 15; TSN = 16 still missing; TSN = 17]
Gap Ack Block End: 16 bits (unsigned integer). Indicates the End offset TSN for this Gap Ack Block. To calculate the actual TSN number, the Cumulative TSN Ack is added to this offset number. This calculated TSN identifies the TSN of the last DATA chunk received in this Gap Ack Block.
For example, assume the receiver has the DATA chunks of Figure 7.50 newly arrived at the time when it decides to send a SACK. The parameter part of the SACK must then be constructed as shown in Figure 7.51, assuming the new a_rwnd is set to 4660 by the sender.
Duplicate TSN: 32 bits (unsigned integer). Indicates the number of times a TSN was received in duplicate since the last SACK was sent. Every time a receiver gets a duplicate TSN (before sending the SACK), it adds the TSN to the list of duplicates. The duplicate count is reinitialized to 0 after each SACK is sent. For example, if a receiver were to get TSN 19 three times, it would list 19 twice in the outbound SACK. After sending the SACK, if it received yet one more TSN 19, it would list 19 as a duplicate once in the next outgoing SACK.
Heartbeat Request (HEARTBEAT) Chunk (ID=4)
An endpoint should send this chunk to its peer endpoint to probe the reachability of a particular destination
Figure 7.51 Example of SACK.
[Cumulative TSN Ack = 12; a_rwnd = 4660; number of Gap Ack Blocks = 2; number of duplicate TSNs = 0; Gap Ack Block #1 start = 2, end = 3; Gap Ack Block #2 start = 5, end = 5]
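The construction in Figure 7.51 from the arrivals of Figure 7.50 can be sketched as follows (a simplified illustration of ours: TSN wraparound and duplicate tracking are ignored, and the previous Cumulative TSN Ack is assumed known):

```python
def build_sack(prev_cum_ack, received):
    """Compute the new Cumulative TSN Ack and Gap Ack Block offsets.

    `received` holds the TSNs seen so far above `prev_cum_ack`; the
    returned block offsets are relative to the new Cumulative TSN Ack.
    """
    cum = prev_cum_ack
    while cum + 1 in received:          # advance past the contiguous run
        cum += 1
    blocks = []
    for tsn in sorted(t for t in received if t > cum):
        if blocks and cum + blocks[-1][1] + 1 == tsn:
            blocks[-1][1] += 1          # extend the current block
        else:
            blocks.append([tsn - cum, tsn - cum])
    return cum, [tuple(b) for b in blocks]
```

With a previous Cumulative TSN Ack of 9 and received TSNs {10, 11, 12, 14, 15, 17}, this yields Cumulative TSN Ack 12 and Gap Ack Blocks (2, 3) and (5, 5), matching the figure.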
Figure 7.52 Heartbeat.
[Type = 4 | Chunk Flags | Heartbeat Length; Heartbeat Information TLV (variable length)]
transport address defined in the present association. The parameter field (see Figure 7.52) contains the heartbeat information, which is a variable-length opaque data structure understood only by the sender.
Chunk Flags: 8 bits. Set to 0 on transmit and ignored on receipt.
Heartbeat Length: 16 bits (unsigned integer). Set to the size of the chunk in bytes, including the chunk header and the Heartbeat Information field.
Heartbeat Information: variable length. Defined as a variable-length parameter that uses the format described in the "Optional and Variable-Length Parameter Format" section. (See also Figure 7.53.) The sender-specific Heartbeat Information field should normally include information about the sender's current time when this HEARTBEAT chunk is sent and the destination transport address to which this HEARTBEAT chunk is sent.
Heartbeat Acknowledgment (HEARTBEAT ACK) Chunk (ID=5)
An endpoint should send this chunk to its peer endpoint as a response to a HEARTBEAT chunk. A HEARTBEAT ACK (see Figure 7.54) is always sent to the source IP
Variable parameters    Status      Type value
Heartbeat Info         Mandatory   1
Figure 7.53 Heartbeat information.
[Heartbeat Info Type = 1 | HB Info Length; Sender-specific Heartbeat Info]
Figure 7.54 HEARTBEAT ACK chunk.
[Type = 5 | Chunk Flags | Heartbeat Ack Length; Heartbeat Information TLV (variable length)]
Chapter Seven
address of the IP datagram containing the HEARTBEAT chunk to which this ACK is responding. The parameter field contains a variable-length opaque data structure. Chunk Flags: 8 bits. Set to 0 on transmit and ignored on receipt. Heartbeat Ack Length: 16 bits (unsigned integer). Set to the size of the chunk in bytes, including the chunk header and the Heartbeat Information field. Heartbeat Information: variable length. This field must contain the heartbeat information parameter of the Heartbeat Request to which this HEARTBEAT ACK chunk is responding.

Variable Parameters       Status      Type Value
Heartbeat Information     Mandatory   1
Abort Association (ABORT) Chunk (ID=6) The ABORT chunk (see Figure 7.55) is sent to the peer of an association to close the association. The ABORT chunk may contain Cause parameters to inform the receiver of the reason for the abort. DATA chunks must not be bundled with ABORT. Control chunks (except for INIT, INIT ACK, and SHUTDOWN COMPLETE) may be bundled with an ABORT, but they must be placed before the ABORT in the SCTP packet; otherwise, they will be ignored by the receiver. If an endpoint receives an ABORT with a format error or for an association that does not exist, it must silently discard it. Moreover, under any circumstances, an endpoint that receives an ABORT must not respond to that ABORT by sending an ABORT of its own. Chunk Flags: 8 bits. Reserved: 7 bits. Set to 0 on transmit and ignored on receipt. T bit: 1 bit. The T bit is set to 0 if the sender had a TCB that it destroyed. If the sender did not have a TCB, it should set this bit to 1. Note: Special rules apply to this chunk for verification; refer to the RFC for details. Length: 16 bits (unsigned integer). Set to the size of the chunk in bytes, including the chunk header and all the Error Cause fields present.
Figure 7.55 ABORT chunk.
[Chunk layout: Type = 6 | Reserved | T | Length, followed by zero or more Error Causes]
Figure 7.56 SHUTDOWN chunk.
[Chunk layout: Type = 7 | Chunk Flags | Length = 8, followed by the Cumulative TSN Ack]
Shutdown Association (SHUTDOWN) Chunk (ID=7) An endpoint in an association must use this chunk to initiate a graceful close of the association with its peer. This chunk has the format shown in Figure 7.56. Chunk Flags: 8 bits. Set to 0 on transmit and ignored on receipt. Length: 16 bits (unsigned integer). Indicates the length of the parameter. Set to 8. Cumulative TSN Ack: 32 bits (unsigned integer). This parameter contains the TSN of the last chunk received in sequence before any gaps. Note: Since the SHUTDOWN message does not contain Gap Ack Blocks, it cannot be used to acknowledge TSNs received out of order. In a SACK, the lack of Gap Ack Blocks that were previously included indicates that the data receiver reneged on the associated DATA chunks. Since SHUTDOWN does not contain Gap Ack Blocks, the receiver of the SHUTDOWN should not interpret the lack of a Gap Ack Block as a renege. Shutdown Acknowledgment (SHUTDOWN ACK) (ID=8) This chunk (see Figure 7.57) must be used to acknowledge the receipt of the SHUTDOWN chunk at the completion of the shutdown process. The SHUTDOWN ACK chunk has no parameters. Chunk flags: 8 bits. Set to 0 on transmit and ignored on receipt. Operation Error (ERROR) (ID=9) An endpoint sends this chunk (see Figure 7.58) to its peer endpoint to notify it of certain error conditions. It contains one or
Figure 7.57 SHUTDOWN ACK chunk.
[Chunk layout: Type = 8 | Chunk Flags | Length = 4]
Figure 7.58 ERROR chunk.
[Chunk layout: Type = 9 | Chunk Flags | Length, followed by one or more Error Causes]
Figure 7.59 Error causes.
[Parameter layout: Cause Code | Cause Length, followed by the Cause-Specific Information]
more error causes. An operation error is not considered fatal in and of itself, but it may be used with an ABORT chunk to report a fatal condition. It has the following parameters: Chunk Flags: 8 bits. Set to 0 on transmit and ignored on receipt. Length: 16 bits (unsigned integer). Set to the size of the chunk in bytes, including the chunk header and all the Error-Cause fields present. Error causes are defined as variable-length parameters that use the format shown in Figure 7.59. Cause Code: 16 bits (unsigned integer). Defines the type of error conditions being reported. (See Table 7.17.) Cause Length: 16 bits (unsigned integer). Set to the size of the parameter in bytes, including the Cause Code, Cause Length, and Cause-Specific Information fields. Cause-Specific Information: variable length. This field carries the details of the error condition. Refer to the RFC for definitions of error causes for SCTP. Cookie Echo (COOKIE ECHO) Chunk (ID=10) This chunk (see Figure 7.60) is used only during the initialization of an association. It is sent by the initiator of
Table 7.17 Cause Codes

Cause Code Value    Cause Code
1                   Invalid Stream Identifier
2                   Missing Mandatory Parameter
3                   Stale Cookie Error
4                   Out of Resource
5                   Unresolvable Address
6                   Unrecognized Chunk Type
7                   Invalid Mandatory Parameter
8                   Unrecognized Parameters
9                   No User Data
10                  Cookie Received while Shutting Down
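Each error cause is a simple type-length-value structure. As a non-normative sketch (the helper name is invented, and the padding of the enclosing ERROR chunk to a 4-byte boundary is not shown), packing one cause in Python might look like this:

```python
import struct

# Cause codes from Table 7.17 (RFC 2960).
CAUSE_CODES = {
    1: "Invalid Stream Identifier", 2: "Missing Mandatory Parameter",
    3: "Stale Cookie Error", 4: "Out of Resource",
    5: "Unresolvable Address", 6: "Unrecognized Chunk Type",
    7: "Invalid Mandatory Parameter", 8: "Unrecognized Parameters",
    9: "No User Data", 10: "Cookie Received while Shutting Down",
}

def pack_error_cause(cause_code, info=b""):
    """Pack one error-cause TLV: Cause Code (16 bits), Cause Length
    (16 bits, counting the 4-byte header), Cause-Specific Information.
    Illustrative only."""
    return struct.pack("!HH", cause_code, 4 + len(info)) + info

# e.g., an Invalid Stream Identifier cause carrying stream id 42:
cause = pack_error_cause(1, struct.pack("!HH", 42, 0))
```

One or more of these TLVs can then be placed after the common chunk header of an ERROR (or ABORT) chunk.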
Figure 7.60 COOKIE ECHO chunk.
[Chunk layout: Type = 10 | Chunk Flags | Length, followed by the Cookie]
an association to its peer to complete the initialization process. This chunk must precede any DATA chunk sent within the association, although it may be bundled with one or more DATA chunks in the same packet. Chunk Flags: 8 bits. Set to 0 on transmit and ignored on receipt. Length: 16 bits (unsigned integer). Set to the size of the chunk in bytes, including the 4 bytes of the chunk header and the size of the cookie. Cookie: variable size. This field must contain the exact cookie received in the State Cookie parameter of the previous INIT ACK. An implementation should make the cookie as small as possible to ensure interoperability. Cookie Acknowledgment (COOKIE ACK) Chunk (ID=11) This chunk (see Figure 7.61) is used only during the initialization of an association. It is used to acknowledge receipt of a COOKIE ECHO chunk. It must precede any DATA or SACK chunk sent within the association, although it may be bundled with one or more DATA chunks or a SACK chunk in the same SCTP packet. Chunk Flags: 8 bits. Set to 0 on transmit and ignored on receipt. Shutdown Complete (SHUTDOWN COMPLETE) Chunk (ID=14) This chunk (see Figure 7.62) must be used to acknowledge receipt of the SHUTDOWN ACK chunk at the completion of the shutdown process. The SHUTDOWN COMPLETE chunk has no parameters.
Figure 7.61 COOKIE ACK chunk.
[Chunk layout: Type = 11 | Chunk Flags | Length = 4]
Figure 7.62 SHUTDOWN COMPLETE chunk.
[Chunk layout: Type = 14 | Reserved | T | Length = 4]
Chunk Flags: 8 bits. Reserved: 7 bits. Set to 0 on transmit and ignored on receipt. T bit: 1 bit. The T bit is set to 0 if the sender had a TCB that it destroyed. If the sender did not have a TCB, it should set this bit to 1. Note: Special rules apply to this chunk for verification; see the RFC for details.
SCTP Association State Diagram During the lifetime of an SCTP association, the SCTP endpoint's association progresses from one state to another in response to various events. The events that potentially advance an association's state include • SCTP user primitive calls: for example, [ASSOCIATE], [SHUTDOWN], and [ABORT] • Reception of control chunks: INIT, COOKIE ECHO, ABORT, SHUTDOWN, and so on • Some timeout events The state diagram in Figure 7.63 illustrates state changes, together with the causing events and resulting actions. Note that some of the error conditions are not shown in the state diagram. State names are given in all capital letters, chunk types in all-capital letters, and parameter names with the first letter capitalized: for example, ESTABLISHED (state name), COOKIE ECHO (chunk type), and State Cookie (parameter name). Where more than one event or message causes the same state transition, the alternatives are labeled (A), (B), and so forth. To summarize Figure 7.63: 1. If the State Cookie in the received COOKIE ECHO is invalid (that is, it failed to pass the integrity check), the receiver must silently discard the packet; if the received State Cookie has expired, the receiver must send back an ERROR chunk. In either case, the receiver stays in the CLOSED state. 2. If the T1-init timer expires, the endpoint must retransmit INIT and restart the T1-init timer without changing state. This must be repeated up to Max.Init.Retransmits times; after that, the endpoint must abort the initialization process and report the error to the SCTP user. 3. If the T1-cookie timer expires, the endpoint must retransmit COOKIE ECHO and restart the T1-cookie timer without changing state. This must be repeated up to Max.Init.Retransmits times; after that, the endpoint must abort the initialization process and report the error to the SCTP user. 4. In the SHUTDOWN SENT state, the endpoint must acknowledge any received DATA chunks without delay.
Figure 7.63 Association state diagram.
[Diagram: state transitions among CLOSED, COOKIE-WAIT, COOKIE-ECHOED, ESTABLISHED, SHUTDOWN-PENDING, SHUTDOWN-SENT, SHUTDOWN-RECEIVED, and SHUTDOWN-ACK-SENT. From CLOSED, [ASSOCIATE] creates a TCB, sends INIT, and starts the init timer (to COOKIE-WAIT); receiving INIT ACK sends COOKIE ECHO and starts the cookie timer (to COOKIE-ECHOED); receiving COOKIE ACK stops the cookie timer (to ESTABLISHED). On the passive side, receiving INIT generates a cookie and sends INIT ACK, and receiving a valid COOKIE ECHO in CLOSED creates a TCB and sends COOKIE ACK (to ESTABLISHED). From ESTABLISHED, [SHUTDOWN] leads to SHUTDOWN-PENDING and, once no data remains outstanding, to SHUTDOWN-SENT (SHUTDOWN sent, shutdown timer started); receiving SHUTDOWN leads to SHUTDOWN-RECEIVED and, once no data remains outstanding, to SHUTDOWN-ACK-SENT (SHUTDOWN ACK sent, shutdown timer started). Receiving SHUTDOWN ACK stops the shutdown timer, sends SHUTDOWN COMPLETE, and deletes the TCB; receiving SHUTDOWN COMPLETE stops the shutdown timer and deletes the TCB; either returns the endpoint to CLOSED. From any state, a received ABORT or the [ABORT] primitive deletes the TCB and returns to CLOSED. Labels (1) through (7) refer to the numbered notes in the text.]
5. In the SHUTDOWN RECEIVED state, the endpoint must not accept any new send requests from its SCTP user. 6. In the SHUTDOWN RECEIVED state, the endpoint must transmit or retransmit data and leave this state when all the data in queue has been transmitted. 7. In the SHUTDOWN ACK SENT state, the endpoint must not accept any new send requests from its SCTP user. The CLOSED state indicates that an association has not been created (that is, it does not exist).
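The normal paths through the state diagram can also be written down as a transition table. The sketch below is a simplification of Figure 7.63, not part of the specification: timeouts, error paths, ABORT, and simultaneous-shutdown cases are omitted, and the event names are informal.

```python
# (state, event) -> next state, covering only the normal life cycle.
TRANSITIONS = {
    ("CLOSED", "ASSOCIATE"):                        "COOKIE-WAIT",
    ("COOKIE-WAIT", "rcv INIT ACK"):                "COOKIE-ECHOED",
    ("COOKIE-ECHOED", "rcv COOKIE ACK"):            "ESTABLISHED",
    ("CLOSED", "rcv valid COOKIE ECHO"):            "ESTABLISHED",
    ("ESTABLISHED", "SHUTDOWN"):                    "SHUTDOWN-PENDING",
    ("SHUTDOWN-PENDING", "no more outstanding"):    "SHUTDOWN-SENT",
    ("ESTABLISHED", "rcv SHUTDOWN"):                "SHUTDOWN-RECEIVED",
    ("SHUTDOWN-RECEIVED", "no more outstanding"):   "SHUTDOWN-ACK-SENT",
    ("SHUTDOWN-SENT", "rcv SHUTDOWN ACK"):          "CLOSED",
    ("SHUTDOWN-ACK-SENT", "rcv SHUTDOWN COMPLETE"): "CLOSED",
}

def step(state, event):
    """Advance the association state; unknown events leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Walking the active-open path ASSOCIATE, rcv INIT ACK, rcv COOKIE ACK from CLOSED lands in ESTABLISHED, matching steps 1 through 5 of the establishment procedure described in the next section.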
Association Initialization Before the first data transmission can take place from one SCTP endpoint (A) to another SCTP endpoint (Z), the two endpoints must complete an initialization process to set up an SCTP association between them. The SCTP user at either endpoint should use the ASSOCIATE primitive to initialize an SCTP association to another SCTP endpoint. Implementation Note: From an SCTP user's point of view, an association may be implicitly opened, without an ASSOCIATE primitive being invoked, by the initiating endpoint's act of sending the first user data to the destination endpoint. The initiating SCTP will assume default values for all mandatory and optional parameters of the INIT/INIT ACK chunk. Once the association is established, unidirectional streams are opened for data transfer on both ends.
Normal Establishment of an Association The initialization process consists of the following steps (assuming that SCTP endpoint A tries to set up an association with SCTP endpoint Z and Z accepts the new association): 1. Endpoint A first sends an INIT chunk to endpoint Z. In the INIT, endpoint A must provide its Verification Tag (Tag_A) in the Initiate Tag field. Tag_A should be a random number in the range of 1 to 4294967295. After sending the INIT, endpoint A starts the T1-init timer and enters the COOKIE WAIT state. 2. Endpoint Z must respond immediately with an INIT ACK chunk. The destination IP address of the INIT ACK must be set to the source IP address of the INIT to which this INIT ACK is responding. In the response, besides filling in other parameters, endpoint Z must set the Verification Tag field to Tag_A, and it must also provide its own Verification Tag (Tag_Z) in the Initiate Tag field. Moreover, endpoint Z must generate and send a State Cookie with the INIT ACK.
Note: After sending out INIT ACK with the State Cookie parameter, endpoint Z must neither allocate any resources nor keep any states for the new association. Otherwise, endpoint Z will be vulnerable to resource attacks. 3. Upon reception of the INIT ACK from endpoint Z, endpoint A must stop the T1-init timer and leave the COOKIE WAIT state. Endpoint A must then send the State Cookie received in the INIT ACK chunk in a COOKIE ECHO chunk, start the T1-cookie timer, and enter the COOKIE ECHOED state. Note: The COOKIE ECHO chunk can be bundled with any pending outbound DATA chunks, but it must be the first chunk in the packet. Until the COOKIE ACK is returned, the sender must not send any other packets to the peer. 4. Upon reception of the COOKIE ECHO chunk, endpoint Z will reply with a COOKIE ACK chunk after building a TCB and moving to the ESTABLISHED state. A COOKIE ACK chunk may be bundled with any pending DATA chunks (and/or SACK chunks), but the COOKIE ACK chunk must be the first chunk in the packet. Implementation Note: An implementation may choose to send the Communication Up notification to the SCTP user upon reception of a valid COOKIE ECHO chunk. 5. Upon reception of the COOKIE ACK, endpoint A will move from the COOKIE ECHOED state to the ESTABLISHED state, stopping the T1-cookie timer. It may also notify its ULP of the successful establishment of the association with a Communication Up notification. INIT and INIT ACK chunks must not be bundled with any other chunk. They must be the only chunks present in the SCTP packets that carry them. An endpoint must send the INIT ACK to the IP address from which it received the INIT. If an endpoint receives an INIT, INIT ACK, or COOKIE ECHO chunk but decides not to establish the new association because of missing mandatory parameters in the received INIT or INIT ACK, invalid parameter values, or lack of local resources, it must respond with an ABORT chunk.
It should also specify the cause of abort, such as the type of the missing mandatory parameters, by including the error-cause parameters with the ABORT chunk. The Verification Tag field in the common header of the outbound SCTP packet containing the ABORT chunk must be set to the Initiate Tag value of the peer. After the reception of the first DATA chunk in an association, the endpoint must immediately respond with a SACK to acknowledge the DATA chunk. When the TCB is created, each endpoint must set its internal Cumulative TSN Ack Point to the value of its transmitted Initial TSN minus one. Implementation Note: The IP addresses and SCTP port are generally used as the key to find the TCB within an SCTP instance.
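The four-way handshake can be walked through in code. The sketch below is only a model of the message flow, not a wire-format implementation: messages are tuples, the cookie is an unsigned dictionary standing in for the opaque, integrity-protected State Cookie, and timers are omitted.

```python
import random

def four_way_handshake():
    """Model the normal INIT / INIT ACK / COOKIE ECHO / COOKIE ACK
    exchange between endpoints A and Z. Hypothetical sketch only."""
    tag_a = random.randint(1, 4294967295)      # A's Initiate Tag (Tag_A)
    init = ("INIT", {"initiate_tag": tag_a})

    tag_z = random.randint(1, 4294967295)      # Z's Initiate Tag (Tag_Z)
    # Z keeps no state yet; everything it needs rides inside the cookie.
    cookie = {"tag_a": tag_a, "tag_z": tag_z}
    init_ack = ("INIT ACK", {"verification_tag": tag_a,
                             "initiate_tag": tag_z,
                             "state_cookie": cookie})

    # A echoes the cookie verbatim; Z builds its TCB only at this point.
    cookie_echo = ("COOKIE ECHO", init_ack[1]["state_cookie"])
    cookie_ack = ("COOKIE ACK", {})
    return [init, init_ack, cookie_echo, cookie_ack]
```

The key property the model preserves is that the COOKIE ECHO carries back exactly the State Cookie from the INIT ACK, so endpoint Z can recover the association state it deliberately avoided storing.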
Handling Duplicate Items Refer to the RFC for additional protocol states for handling duplicate or unexpected INIT, INIT ACK, COOKIE ECHO, and COOKIE ACK.
User Data Transfer Data transmission must only happen in the ESTABLISHED, SHUTDOWN PENDING, and SHUTDOWN RECEIVED states. The only exception to this is that DATA chunks are allowed to be bundled with an outbound COOKIE ECHO chunk when in the COOKIE WAIT state. DATA chunks must only be received according to the rules below in the ESTABLISHED, SHUTDOWN PENDING, and SHUTDOWN SENT states. A DATA chunk received in the CLOSED state is “out of the blue” and should be handled according to the rules in Section 8.4 of the RFC. A DATA chunk received in any other state should be discarded. A SACK must be processed in the ESTABLISHED, SHUTDOWN PENDING, and SHUTDOWN RECEIVED states. An incoming SACK may be processed in the COOKIE ECHOED state. A SACK in the CLOSED state is out of the blue and should be processed according to the rules in Section 8.4 of the RFC. A SACK chunk received in any other state should be discarded. An SCTP receiver must be able to receive a minimum of 1500 bytes in one SCTP packet. This means that an SCTP endpoint must not indicate less than 1500 bytes in its initial a_rwnd sent in the INIT or INIT ACK. For transmission efficiency, SCTP defines mechanisms for the bundling of small user messages and the fragmentation of large user messages. Figure 7.64 depicts the flow of user messages through SCTP. Here, the term data sender refers to the endpoint that transmits a DATA chunk and the term data receiver refers to the endpoint that receives a DATA chunk. A data receiver will transmit SACK chunks. To summarize Figure 7.64: 1. When converting user messages into DATA chunks, an endpoint will fragment user messages larger than the current association path MTU into multiple DATA chunks. The data receiver will normally reassemble the fragmented message from DATA chunks before delivery to the user (see Section 6.9 of the RFC for details). 2. Multiple DATA and control chunks may be bundled by the sender into a single SCTP packet for transmission, as long as the final size of the packet does not exceed the current path MTU.
The receiver will unbundle the packet back into the original chunks. Control chunks must come before the DATA chunks in the packet.
Figure 7.64 Flow of user messages through SCTP.
[Diagram: user messages pass from the SCTP user into SCTP, where they are converted into SCTP DATA chunks (1); DATA and SCTP control chunks are bundled into SCTP packets (2), which are handed to a connectionless packet transfer service (e.g., IP). The figure's notes (1) and (2) restate the fragmentation and bundling rules summarized in the text.]
The fragmentation and bundling mechanisms are optional to implement by the data sender, but they must be implemented by the data receiver; that is, an endpoint must properly receive and process bundled or fragmented data.
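Fragmentation can be sketched with the DATA chunk's B (beginning) and E (ending) bits. This is an illustration under simplifying assumptions: the function names are invented, chunks are plain dictionaries, and the per-chunk header overhead that a real implementation would subtract from the MTU is ignored.

```python
def fragment(user_message, max_payload):
    """Split a user message into DATA-chunk payloads of at most
    max_payload bytes. B marks the first fragment, E the last; an
    unfragmented message carries both."""
    pieces = [user_message[i:i + max_payload]
              for i in range(0, len(user_message), max_payload)] or [b""]
    return [{"B": i == 0, "E": i == len(pieces) - 1, "data": p}
            for i, p in enumerate(pieces)]

def reassemble(chunks):
    """Receiver side: concatenate the fragments back into the message."""
    return b"".join(c["data"] for c in chunks)
```

A 3000-byte message with 1400 bytes of payload space per chunk yields three fragments: the first with B set, the last with E set, and a middle fragment with neither.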
Transmission of DATA Chunks This protocol is specified as if there is a single retransmission timer per destination transport address, although implementations may have a retransmission timer for each DATA chunk. The data sender must apply the following general rules for transmission and/or retransmission of outbound DATA chunks: 1. At any given time, the data sender must not transmit new data to any destination transport address if its peer’s rwnd indicates that the peer has no
buffer space (that is, rwnd is 0). However, regardless of the value of rwnd (including if it is 0), the data sender can always have one DATA chunk in flight to the receiver if allowed by cwnd (see rule 2 below). This rule allows the sender to probe for a change in rwnd that the sender missed because a SACK was lost in transit from the data receiver to the data sender. 2. At any given time, the sender must not transmit new data to a given transport address if it has cwnd or more bytes of data outstanding to that transport address. 3. When the time comes for the sender to transmit, before sending new DATA chunks, the sender must first transmit any outstanding DATA chunks that are marked for retransmission (limited by the current cwnd). 4. The sender can then send out as many new DATA chunks as rules 1 and 2 above allow. Multiple DATA chunks committed for transmission may be bundled in a single packet. Furthermore, DATA chunks being retransmitted may be bundled with new DATA chunks, as long as the resulting packet size does not exceed the path MTU. A ULP may request that no bundling be performed, but this should only turn off any delays that an SCTP implementation may use to increase bundling efficiency. It does not in itself stop all bundling from occurring (that is, in case of congestion or retransmission). Before an endpoint transmits a DATA chunk, if any received DATA chunks have not been acknowledged (such as from delayed acknowledgment), the sender should create a SACK and bundle it with the outbound DATA chunk, as long as the size of the final SCTP packet does not exceed the current MTU. Implementation Note: When the window is full (that is, transmission is disallowed by rule 1 and/or rule 2), the sender may still accept send requests from its upper layer, but it must transmit no more DATA chunks until some or all of the outstanding DATA chunks are acknowledged and transmission is allowed again per rules 1 and 2.
Whenever a transmission or retransmission is made to any address, if the T3-retransmission timer of that address is not currently running, the sender must start that timer. If the timer for that address is already running, the sender must restart the timer if the earliest (that is, the lowest TSN) outstanding DATA chunk sent to that address is being retransmitted. Otherwise, the data sender must not restart the timer. When starting or restarting the T3-retransmission timer, the timer value must be adjusted (see the RFC). Note: The data sender should not use a TSN that is more than 2^31 − 1 above the beginning TSN of the current send window.
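Rules 1 and 2 above amount to a simple gate on new data. A minimal sketch, with invented names and byte counts in place of real chunk accounting:

```python
def may_send_new_data(rwnd, outstanding_to_addr, cwnd):
    """Decide whether a new DATA chunk may be sent to a destination
    transport address. Rule 2: never exceed cwnd bytes outstanding to
    the address. Rule 1: when the peer's rwnd is 0, allow a single
    chunk in flight as a zero-window probe."""
    if outstanding_to_addr >= cwnd:
        return False                      # rule 2: congestion window full
    if rwnd == 0:
        return outstanding_to_addr == 0   # rule 1: lone probe only
    return True
```

Note that this checks only rules 1 and 2; rule 3 (retransmissions drain first) is an ordering constraint on the send queue rather than a per-chunk predicate.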
Acknowledgment on Reception of DATA Chunks The SCTP endpoint must always acknowledge the reception of each valid DATA chunk. The guidelines on the delayed acknowledgment algorithm specified in Section 4.2 of RFC 2581 [20] should be followed. Specifically, an acknowledgment should be generated for at least every second packet—not every second DATA chunk—received and should be generated within 200 ms of the arrival of any unacknowledged DATA chunk. In some situations, it may be beneficial for an SCTP transmitter to be more conservative than that allowed by the algorithms detailed in this chapter. However, an SCTP transmitter must not be more aggressive than that allowed by the following algorithms. An SCTP receiver must not generate more than one SACK for every incoming packet, except for updating the offered window as the receiving application consumes new data. Implementation Note: The maximum delay for generating an acknowledgment may be configured by the SCTP administrator, either statically or dynamically, to meet the specific timing requirement of the protocol being carried. An implementation must not allow the maximum delay to be configured to exceed 500 ms. In other words, an implementation may lower this value below 500 ms but must not raise it above 500 ms. Acknowledgments must be sent in SACK chunks unless shutdown was requested by the ULP, in which case an endpoint may send an acknowledgment in the SHUTDOWN chunk. A SACK chunk can acknowledge the reception of multiple DATA chunks. In particular, the SCTP endpoint must fill in the Cumulative TSN Ack field to indicate the latest sequential TSN (of a valid DATA chunk) that it has received. Any received DATA chunks with TSN greater than the value in the Cumulative TSN Ack field should also be reported in the Gap Ack Block fields. Note: The SHUTDOWN chunk does not contain Gap Ack Block fields. 
Therefore, the endpoint should use a SACK instead of the SHUTDOWN chunk to acknowledge DATA chunks received out of order. When a packet arrives with duplicate DATA chunks and with no new DATA chunks, the endpoint must immediately send a SACK. If a packet arrives with duplicate DATA chunks bundled with new DATA chunks, the endpoint may immediately send a SACK. Normally, receipt of duplicate DATA chunks occurs when the original SACK chunk has been lost and the peer’s RTO has expired. The duplicate TSN numbers should be reported in the SACK as duplicates. When an endpoint receives a SACK, it may use the Duplicate TSN information to determine whether SACK loss is occurring. Further use of this data is for future study. The data receiver is responsible for maintaining its receive buffers. The data receiver should promptly notify the data sender of changes in its ability to receive data. How an implementation manages its receive buffers is dependent on many factors: for example, the operating system, memory management system, and
amount of memory. However, the data sender strategy is based on the assumption of receiver operation similar to the following: 1. At initialization of the association, the endpoint tells the peer how much receive-buffer space it has allocated to the association in the INIT or INIT ACK chunk. The endpoint sets a_rwnd to this value. 2. As DATA chunks are received and buffered, the endpoint decrements a_rwnd by the number of bytes received and buffered. This in effect closes rwnd at the data sender and restricts the amount of data it can transmit. 3. As DATA chunks are delivered to the ULP and released from the receive buffers, the endpoint increments a_rwnd by the number of bytes delivered to the upper layer. This in effect opens up rwnd on the data sender and allows it to send more data. The data receiver should not increment a_rwnd unless it has released bytes from its receive buffer. For example, if the receiver is holding fragmented DATA chunks in a reassembly queue, it should not increment a_rwnd. 4. When sending a SACK, the data receiver should place the current value of a_rwnd into the a_rwnd field. The data receiver should take into account that the data sender will not retransmit DATA chunks that are acknowledged via the Cumulative TSN Ack (that is, they will drop from its retransmit queue). Under certain circumstances, the data receiver may need to drop DATA chunks that it has received but has not released from its receive buffers—that is, delivered to the ULP. These DATA chunks may have been acknowledged in Gap Ack Blocks. For example, the data receiver may hold data in its receive buffers while it reassembles a fragmented user message from its peer when it runs out of receive-buffer space. It may drop these DATA chunks even though it acknowledges them in Gap Ack Blocks. If a data receiver drops DATA chunks, it must not include them in Gap Ack Blocks in subsequent SACKs until they are received again via retransmission. 
In addition, the endpoint should take into account the dropped data when calculating its a_rwnd. An endpoint should not revoke a SACK and discard data. Only in extreme circumstances should an endpoint use this procedure, such as when it is out of buffer space. The data receiver should take into account that dropping Gap Ack Block–acknowledged data can result in suboptimal retransmission strategies in the data sender and thus in suboptimal performance. The example of Figure 7.65 illustrates the use of delayed acknowledgments. If an endpoint receives a DATA chunk with no user data (that is, the length field is set to 16), it must send an ABORT with an error cause set to No User Data. An endpoint should not send a DATA chunk with no user data part.
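The receiver-side a_rwnd bookkeeping of rules 1 through 3 can be sketched as a small class. The names are hypothetical and the sketch tracks only byte counts, with none of the reassembly-queue subtleties noted above.

```python
class ReceiveWindow:
    """Receive-buffer accounting: a_rwnd is advertised at association
    setup, shrinks as DATA chunks are buffered, and grows back only
    when bytes are released (delivered) to the upper layer."""

    def __init__(self, initial_a_rwnd):
        self.a_rwnd = initial_a_rwnd          # sent in INIT / INIT ACK

    def on_data_buffered(self, nbytes):
        """A DATA chunk of nbytes was received and buffered."""
        self.a_rwnd = max(0, self.a_rwnd - nbytes)

    def on_data_delivered_to_ulp(self, nbytes):
        """nbytes were released from the receive buffer to the ULP."""
        self.a_rwnd += nbytes

    def sack_a_rwnd(self):
        """Current value to place in an outgoing SACK's a_rwnd field."""
        return self.a_rwnd
```

Fragments held in a reassembly queue would call on_data_buffered but not on_data_delivered_to_ulp, which is exactly why a receiver holding partial messages keeps its advertised window closed.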
Figure 7.65 Use of delayed acknowledgments.
Endpoint A                                      Endpoint Z

{App sends 3 messages; strm 0}
DATA [TSN=7, Strm=0, Seq=3] ------------------> (ack delayed)
(Start T3-rtx timer)

DATA [TSN=8, Strm=0, Seq=4] ------------------> (send ack)
(cancel T3-rtx timer)       <------------------ SACK [TSN Ack=8, block=0]

DATA [TSN=9, Strm=0, Seq=5] ------------------> (ack delayed)
(Start T3-rtx timer)
...
                                                {App sends 1 message; strm 1}
                                                (bundle SACK with DATA)
(cancel T3-rtx timer)       <------------------ SACK [TSN Ack=9, block=0]
(ack delayed)                                   DATA [TSN=6, Strm=1, Seq=2]
                                                (Start T3-rtx timer)
(send ack)
SACK [TSN Ack=6, block=0]   ------------------> (cancel T3-rtx timer)
Processing a Received SACK Each SACK that an endpoint receives contains an a_rwnd value. This value represents the amount of buffer space that the data receiver, at the time of transmitting the SACK, has left of its total receive-buffer space (as specified in the INIT/INIT ACK). Using a_rwnd, Cumulative TSN Ack, and Gap Ack Blocks, the data sender can develop a representation of the peer's receive-buffer space. One problem that the data sender must take into account when processing a SACK is that a SACK can be received out of order; that is, a SACK sent by the data receiver can pass an earlier SACK and be received first by the data sender. If a SACK is received out of order, the data sender can develop an incorrect view of the peer's receive-buffer space. Since there is no explicit identifier that can be used to detect out-of-order SACKs, the data sender must use heuristics to determine whether a SACK is new. An endpoint should use the following rules to calculate the rwnd, using the a_rwnd value, the Cumulative TSN Ack, and the Gap Ack Blocks in a received SACK. 1. At the establishment of the association, the endpoint initializes the rwnd to the a_rwnd the peer specified in the INIT or INIT ACK. 2. Any time a DATA chunk is transmitted (or retransmitted) to a peer, the endpoint subtracts the data size of the chunk from the rwnd of that peer.
3. Any time a DATA chunk is marked for retransmission (via either T3-retransmission timer expiration or fast retransmit), the data size of those chunks must be added to the rwnd. Note: If the implementation maintains a timer on each DATA chunk, only DATA chunks with expired timers will be marked for retransmission. 4. Any time that a SACK arrives, the endpoint performs the following: (a) If the Cumulative TSN Ack is less than the Cumulative TSN Ack Point, the endpoint drops the SACK. Since the Cumulative TSN Ack is monotonically increasing, a SACK in which the Cumulative TSN Ack is less than the Cumulative TSN Ack Point indicates that it is out of order. (b) The endpoint sets rwnd equal to the newly received a_rwnd minus the number of bytes still outstanding after it processes the Cumulative TSN Ack and the Gap Ack Blocks. (c) If the SACK is missing a TSN that was previously acknowledged via a Gap Ack Block (for example, if the data receiver reneged on the data), the endpoint marks the corresponding DATA chunk as available for retransmit and as missing for fast retransmit. If no retransmit timer runs for the destination address to which the DATA chunk was originally transmitted, the endpoint starts the T3-retransmission timer for that destination address.
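The rule 4 checks can be condensed into a small sender-side function. This sketch uses invented names, simple integer TSNs (no modulo-2^32 wraparound), and leaves the rule 4(c) renege handling aside.

```python
def process_sack(cum_tsn_ack_point, sack_cum_tsn_ack,
                 sack_a_rwnd, bytes_still_outstanding):
    """Apply rules 4(a) and 4(b): discard an out-of-order SACK, else
    recompute rwnd as the advertised a_rwnd minus the bytes still in
    flight after the cumulative ack and Gap Ack Blocks are processed.
    Returns (new_rwnd, new_cum_tsn_ack_point), or None if stale."""
    if sack_cum_tsn_ack < cum_tsn_ack_point:
        return None                       # rule 4a: out-of-order SACK
    rwnd = sack_a_rwnd - bytes_still_outstanding   # rule 4b
    return rwnd, sack_cum_tsn_ack
```

The key point the code makes explicit is that rwnd is not the raw a_rwnd from the SACK: data already sent but not yet covered by the SACK still consumes peer buffer, so it is subtracted before the sender decides how much more it may transmit.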
Management of Retransmission Timer An SCTP endpoint uses a T3-retransmission timer to ensure data delivery in the absence of any feedback from its peer. The duration of this timer is referred to as the retransmission timeout (RTO). When an endpoint’s peer is multihomed, the endpoint will calculate a separate RTO for each destination transport address of its peer endpoint. The computation and management of RTO in SCTP closely follows how TCP manages its retransmission timer. To compute the current RTO, an endpoint maintains two state variables per destination transport address: smoothed round-trip time (SRTT) and round-trip time variation (RTTVAR). Refer to the RFC for details.
Multihomed SCTP Endpoints An SCTP endpoint is considered multihomed if there is more than one transport address to be used as a destination address to reach that endpoint. Moreover, the ULP of an endpoint must select one of the multiple destination addresses of a multihomed peer endpoint as the primary path. By default, an endpoint should always transmit to the primary path unless the SCTP user explicitly specifies the destination transport address (and possibly source transport address) for use.
An endpoint should transmit reply chunks (SACK, HEARTBEAT ACK, and so on) to the same destination transport address from which it received the DATA or control chunk to which it replies. This rule should also be followed if the endpoint bundles DATA chunks together with the reply chunk. However, when it acknowledges multiple DATA chunks received in packets from different source addresses in a single SACK, the SACK chunk may be transmitted to one of the destination transport addresses from which the DATA or control chunks being acknowledged were received. When the receiver of a duplicate DATA chunk sends a SACK to a multihomed endpoint, it may be beneficial for the receiver endpoint to vary the destination address and not use the source address of the DATA chunk. The reason is that receiving a duplicate from a multihomed endpoint might indicate that the return path (as specified in the source address of the DATA chunk) for the SACK is broken. Furthermore, when its peer is multihomed, an endpoint should try to retransmit a chunk to an active destination transport address that differs from the last destination address to which the DATA chunk was sent. Retransmissions do not affect the total outstanding data count. However, if the DATA chunk is retransmitted onto a different destination address, both the outstanding data counts on the new destination address and the old destination address to which the data chunk was last sent must be adjusted accordingly. Failover from Inactive Destination Addresses Some of the transport addresses of a multihomed SCTP endpoint may become inactive by the occurrence of certain error conditions or adjustments from the SCTP user. 
When there is outbound data to send and the primary path becomes inactive (for example, because of failures), or when the SCTP user explicitly requests that data be sent to an inactive destination transport address, the SCTP endpoint should try to send the data to an alternate active destination transport address, if one exists, before reporting an error to its ULP. When retransmitting data, a multihomed endpoint should consider each source-destination address pair in its retransmission selection policy and should attempt to pick the source-destination pair most divergent from the one on which the packet was originally transmitted. Note: The rules for picking the most divergent source-destination pair are an implementation decision and are not specified in the RFC.
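Since the RFC leaves the divergence rule to the implementation, the following is one plausible selection policy, sketched in Python (the function name and pair representation are illustrative, not from the RFC): prefer an active source-destination pair that shares neither address with the pair last used.

```python
def pick_retransmit_path(pairs, last_used, active):
    """A hypothetical 'most divergent' policy: prefer an active
    source-destination pair sharing neither address with the last pair
    used; fall back to any other active pair, then to the last pair."""
    candidates = [p for p in pairs if p in active and p != last_used]
    divergent = [p for p in candidates
                 if p[0] != last_used[0] and p[1] != last_used[1]]
    if divergent:
        return divergent[0]
    return candidates[0] if candidates else last_used
```

For two endpoints with two addresses each, retransmission after sending on (a1, z1) would prefer (a2, z2), which shares neither the source nor the destination address with the failed transmission.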
Stream-Identifier and Stream-Sequence Number

Every DATA chunk must carry a valid stream identifier. If an endpoint receives a DATA chunk with an invalid stream identifier, it must acknowledge the reception of the DATA chunk following the normal procedure, immediately send an ERROR chunk with the cause set to Invalid Stream Identifier, and discard the DATA chunk. The endpoint may bundle the ERROR chunk in the same packet as the SACK as long as the ERROR chunk follows the SACK.
270
Chapter Seven
The stream-sequence number in all the streams must start from 0 when the association is established. Also, when the stream-sequence number reaches the value 65535, the next stream-sequence number must be set to 0.
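The 16-bit wraparound rule can be expressed as a one-line helper (a trivial sketch; the function name is illustrative):

```python
def next_ssn(ssn: int) -> int:
    # Stream-sequence numbers are 16 bits wide: after 65535 the
    # next value wraps to 0, per the rule above.
    return (ssn + 1) % 65536
```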
Ordered and Unordered Delivery

Within a stream, an endpoint must deliver DATA chunks received with the U flag set to 0 to the upper layer in the order of the chunks’ stream-sequence numbers. If DATA chunks arrive out of stream-sequence order, the endpoint must withhold the received DATA chunks from delivery to the ULP until they are reordered.

However, an SCTP endpoint can indicate that no ordered delivery is required for a particular DATA chunk transmitted within a stream by setting the U flag of the DATA chunk to 1. When an endpoint receives a DATA chunk with the U flag set to 1, it must bypass the ordering mechanism and immediately deliver the data to the upper layer (after reassembly if the user data was fragmented by the data sender). Doing so provides an effective way of transmitting out-of-band data in a given stream. A stream can also be used as an unordered stream by simply setting the U flag to 1 in all DATA chunks sent through that stream. Implementation Note: When sending an unordered DATA chunk, an implementation may choose to place the DATA chunk in an outbound packet that is at the head of the outbound transmission queue if this is possible.

The Stream-Sequence Number field in a DATA chunk with the U flag set to 1 has no significance. The sender may fill it with an arbitrary value, but the receiver must ignore the field. Note: When transmitting ordered and unordered data, an endpoint does not increment its stream-sequence number when transmitting a DATA chunk with the U flag set to 1.
Reporting Gaps in Received DATA TSNs

Upon the reception of a new DATA chunk, an endpoint must examine the continuity of the TSNs received. If the endpoint detects a gap in the received DATA chunk sequence, it should send a SACK with Gap Ack Blocks immediately. The data receiver continues sending a SACK after receipt of each SCTP packet that does not fill the gap. From the Gap Ack Blocks in a received SACK, the endpoint can determine the missing DATA chunks and decide whether to retransmit them. Multiple gaps can be reported in one single SACK. When its peer is multihomed, the SCTP endpoint should always try to send the SACK to the same destination address from which the last DATA chunk was received.

Upon reception of a SACK, the endpoint must remove from its transmit queue all DATA chunks that have been acknowledged by the SACK’s Cumulative TSN Ack. The endpoint must also treat as missing all the DATA chunks with TSNs not included in the Gap Ack Blocks reported by the SACK. The number of missing reports for each outstanding DATA chunk must be recorded by the data sender to make retransmission decisions. The example in Figure 7.66 shows the use of SACK for reporting a gap.

Figure 7.66 Use of SACK to report a gap. [Figure: Endpoint A’s application sends three messages on stream 0. A transmits DATA [TSN = 6, Strm = 0, Seq = 2] (starting the T3-rtx timer), DATA [TSN = 7, Strm = 0, Seq = 3] (lost in transit), and DATA [TSN = 8, Strm = 0, Seq = 4]. Endpoint Z delays its ack for TSN 6; on detecting the gap at TSN 8, it immediately sends SACK [TSN Ack = 6, Block = 1, Strt = 2, End = 2]. Endpoint A removes TSN 6 from its out-queue and marks TSN 7 with one missing report.]

The maximum number of Gap Ack Blocks that can be reported within a single SACK chunk is limited by the current path MTU. When, because of the MTU limitation, a single SACK cannot cover all the Gap Ack Blocks that need to be reported, the endpoint must send only one SACK—reporting the Gap Ack Blocks from the lowest to highest TSNs—within the size limit set by the MTU and leave the remaining highest TSN numbers unacknowledged.
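The Gap Ack Block offsets (reported relative to the Cumulative TSN Ack) can be derived with a small sketch, assuming a simple in-memory set of received TSNs (the function name is illustrative, not from the RFC):

```python
def gap_ack_blocks(cum_tsn, received):
    # Report (start, end) offsets, relative to the Cumulative TSN Ack,
    # covering runs of TSNs received beyond the cumulative point.
    blocks = []
    for tsn in sorted(t for t in received if t > cum_tsn):
        offset = tsn - cum_tsn
        if blocks and blocks[-1][1] == offset - 1:
            blocks[-1] = (blocks[-1][0], offset)  # extend the current run
        else:
            blocks.append((offset, offset))       # start a new run
    return blocks
```

For the exchange in Figure 7.66, a Cumulative TSN Ack of 6 with TSN 8 received yields the single block (2, 2), matching the SACK shown there.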
Adler-32 Checksum Calculation

When sending an SCTP packet, the endpoint must strengthen the data integrity of the transmission by including in the packet the Adler-32 checksum value calculated on the packet. After the packet containing the SCTP common header and one or more control or DATA chunks is constructed, the transmitter must

1. Fill in the proper Verification Tag in the SCTP common header and initialize the checksum field to 0s.
2. Calculate the Adler-32 checksum of the whole packet, including the SCTP common header and all the chunks.
3. Put the resultant value into the checksum field in the common header and leave the rest of the bits unchanged.

When an SCTP packet is received, the receiver must first check the Adler-32 checksum, as follows:

1. Store aside the received Adler-32 checksum value.
2. Replace the 32 bits of the checksum field in the received SCTP packet with all 0s and calculate an Adler-32 checksum value of the whole received packet.
3. Verify that the calculated Adler-32 checksum is the same as the received Adler-32 checksum; if it is not, the receiver must treat the packet as an invalid SCTP packet.

The default procedure for handling invalid SCTP packets is to silently discard them.
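The sender and receiver procedures can be sketched with Python’s standard zlib.adler32 (the helper names are illustrative; the checksum field occupies bytes 8–11 of the SCTP common header, after the two ports and the Verification Tag):

```python
import zlib

CHECKSUM_OFFSET = 8  # source port (2) + destination port (2) + Verification Tag (4)

def adler32_checksum(packet: bytes) -> int:
    # Sender steps 1-2 / receiver step 2: zero the checksum field,
    # then run Adler-32 over the whole packet.
    zeroed = packet[:CHECKSUM_OFFSET] + b"\x00" * 4 + packet[CHECKSUM_OFFSET + 4:]
    return zlib.adler32(zeroed) & 0xFFFFFFFF

def fill_checksum(packet: bytes) -> bytes:
    # Sender step 3: place the resultant value into the checksum field.
    value = adler32_checksum(packet).to_bytes(4, "big")
    return packet[:CHECKSUM_OFFSET] + value + packet[CHECKSUM_OFFSET + 4:]

def checksum_ok(packet: bytes) -> bool:
    # Receiver steps 1-3: compare the stored and recomputed values.
    stored = int.from_bytes(packet[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 4], "big")
    return stored == adler32_checksum(packet)
```

A corrupted chunk byte changes the recomputed value, so checksum_ok fails and the packet is discarded per the default procedure.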
Fragmentation and Reassembly

An endpoint may support fragmentation when sending DATA chunks, but it must support reassembly when receiving DATA chunks. If an endpoint supports fragmentation, it must fragment a user message if the size of the user message to be sent causes the outbound SCTP packet size to exceed the current MTU. If an implementation does not support fragmentation of outbound user messages, the endpoint must return an error to its upper layer and not attempt to send the user message. Implementation Note: In this error case, the Send primitive would need to return an error to the upper layer.

If its peer is multihomed, the endpoint must choose a size no larger than the association path MTU, which is the smallest path MTU of all destination addresses. Once a message is fragmented, it cannot be refragmented; instead, if the PMTU has been reduced, IP fragmentation must be used. When determining when to fragment, the SCTP implementation must take into account the SCTP packet header as well as the DATA chunk headers. It must also take into account the space required for a SACK chunk if it bundles a SACK chunk with the DATA chunk. Fragmentation takes the following steps:

1. The data sender must break the user message into a series of DATA chunks so that each chunk plus SCTP overhead fits into an IP datagram smaller than or equal to the association path MTU.
2. The transmitter must then assign, in sequence, a separate TSN to each of the DATA chunks in the series. The transmitter assigns the same stream-sequence number to each of the DATA chunks. If the user indicates that the user message is to be delivered via unordered delivery, the U flag of each DATA chunk of the user message must be set to 1.
3. The transmitter must also set the B and E bits of the first DATA chunk in the series to 10, the B and E bits of the last DATA chunk in the series to 01, and the B and E bits of all other DATA chunks in the series to 00.
An endpoint must recognize fragmented DATA chunks by examining the B and E bits in each of the received DATA chunks and queuing the fragmented DATA chunks for reassembly. Once the user message is reassembled, SCTP must pass the reassembled user message to the specific stream for possible reordering and final dispatching.
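The fragmentation steps above can be sketched as follows (a simplified illustration that returns only per-chunk flag bits and payloads; TSN assignment and the chunk headers are omitted):

```python
def fragment_message(message: bytes, max_payload: int, unordered: bool = False):
    # Step 1: split the message so each piece fits the per-chunk budget.
    pieces = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    chunks = []
    for i, piece in enumerate(pieces):
        # Step 3: B (beginning) and E (end) bits mark the fragment position;
        # an unfragmented message carries B = 1 and E = 1.
        b_bit = 1 if i == 0 else 0
        e_bit = 1 if i == len(pieces) - 1 else 0
        u_bit = 1 if unordered else 0  # step 2: U flag per the user's request
        flags = (u_bit << 2) | (b_bit << 1) | e_bit
        chunks.append((flags, piece))
    return chunks
```

Fragmenting an 8-byte message with a 3-byte budget yields B/E patterns 10, 00, and 01 for the first, middle, and last chunks, exactly as step 3 requires; a message that fits in one chunk gets B = 1 and E = 1.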
Signaling Approaches
273
Note: If the data receiver runs out of buffer space while still waiting for more fragments to complete the message reassembly, it should dispatch part of its inbound message through a partial delivery API, freeing some of its receive buffer space so that the rest of the message may be received.
Bundling

An endpoint bundles chunks by simply including multiple chunks in one outbound SCTP packet. The total size of the resultant IP datagram, including the SCTP packet and IP headers, must be less than or equal to the current path MTU. If its peer endpoint is multihomed, the sending endpoint must choose a size no larger than the latest MTU of the current primary path.

When it bundles control chunks with DATA chunks, an endpoint must place the bundled control chunks first in the outbound SCTP packet. The transmitter must transmit DATA chunks within an SCTP packet in increasing order of TSN. Note: Since control chunks must be placed first in a packet, and since DATA chunks must be transmitted before SHUTDOWN or SHUTDOWN ACK chunks, DATA chunks cannot be bundled with SHUTDOWN or SHUTDOWN ACK chunks. Partial chunks must not be placed in an SCTP packet.

An endpoint must process received chunks in their order in the packet. The receiver uses the Chunk Length field to determine the end of a chunk and the beginning of the next chunk, taking account of the fact that all chunks end on a 4-byte boundary. If the receiver detects a partial chunk, it must drop the chunk. An endpoint must not bundle INIT, INIT ACK, or SHUTDOWN COMPLETE chunks with any other chunks. Refer to the RFC for a discussion of congestion control and fault management.
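The ordering and exclusion rules for bundling can be captured in a short sketch (the chunk representation and function name are illustrative, not part of the protocol):

```python
def order_for_bundling(chunks):
    # INIT, INIT ACK, and SHUTDOWN COMPLETE must travel alone.
    solo = {"INIT", "INIT ACK", "SHUTDOWN COMPLETE"}
    if len(chunks) > 1 and any(c["type"] in solo for c in chunks):
        raise ValueError("INIT, INIT ACK, and SHUTDOWN COMPLETE cannot be bundled")
    # DATA cannot ride with SHUTDOWN or SHUTDOWN ACK.
    types = {c["type"] for c in chunks}
    if "DATA" in types and types & {"SHUTDOWN", "SHUTDOWN ACK"}:
        raise ValueError("DATA cannot be bundled with SHUTDOWN / SHUTDOWN ACK")
    control = [c for c in chunks if c["type"] != "DATA"]
    data = sorted((c for c in chunks if c["type"] == "DATA"),
                  key=lambda c: c["tsn"])  # increasing TSN order
    return control + data  # control chunks placed first
```

Ordering a SACK together with DATA chunks carrying TSNs 9 and 8 yields the SACK first, then TSNs 8 and 9.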
Termination of an Association

An endpoint should terminate its association when it exits from service. An association can be terminated by either abort or shutdown. An abort of an association is abortive by definition in that any data pending on either end of the association is discarded and not delivered to the peer. A shutdown of an association is considered a graceful close in which all data queued by either endpoint is delivered to the respective peer. However, in the case of a shutdown, SCTP (like TCP) does not support a half-open state wherein one end may continue to send data while the other end is closed. When either endpoint performs a shutdown, the association on each peer stops accepting new data from its user and delivers only the data queued at the time the SHUTDOWN chunk is sent or received.
Abort of an Association

When an endpoint decides to abort an existing association, it must send an ABORT chunk to its peer endpoint. The sender must fill in the peer’s Verification
Tag in the outbound packet and not bundle any DATA chunk with the ABORT. An endpoint must not respond to any received packet that contains an ABORT chunk. An endpoint receiving an ABORT must apply the special Verification Tag check rules described in the RFC. After checking the Verification Tag, the receiving endpoint must remove the association from its record and report the termination to its upper layer.
Shutdown of an Association

Using the SHUTDOWN primitive, the upper layer of an endpoint in an association can gracefully close the association. Doing so allows all outstanding DATA chunks from the peer of the shutdown initiator to be delivered before the association terminates.

Upon receipt of the SHUTDOWN primitive from its upper layer, the endpoint enters the SHUTDOWN PENDING state and remains there until all outstanding data has been acknowledged by its peer. The endpoint accepts no new data from its upper layer; instead, it retransmits data to the far end if necessary to fill gaps. Once all its outstanding data has been acknowledged, the endpoint must send a SHUTDOWN chunk to its peer, including in the Cumulative TSN Ack field the last sequential TSN it has received from the peer. It must then start the T2-shutdown timer and enter the SHUTDOWN SENT state. If the timer expires, the endpoint must resend the SHUTDOWN with the updated last sequential TSN received from its peer. The rules given in the RFC must be followed to determine the proper timer value for T2-shutdown. To indicate any gaps in TSN, the endpoint may also bundle a SACK with the SHUTDOWN chunk in the same SCTP packet.

An endpoint should limit the number of retransmissions of the SHUTDOWN chunk to the protocol parameter Association.Max.Retrans. If this threshold is exceeded, the endpoint should destroy the TCB and report the peer endpoint as unreachable to the upper layer (thus, the association enters the CLOSED state). The reception of any packet from its peer (that is, as the peer sends all of its queued DATA chunks) should clear the endpoint’s retransmission count and restart the T2-shutdown timer, giving its peer ample opportunity to transmit all of its queued DATA chunks that have not yet been sent. Upon the reception of the SHUTDOWN chunk, the peer endpoint must

1. Enter the SHUTDOWN RECEIVED state.
2. Stop accepting new data from its SCTP user.
3.
Verify by checking the Cumulative TSN Ack field of the chunk that all its outstanding DATA chunks have been received by the SHUTDOWN sender.
Once an endpoint has reached the SHUTDOWN RECEIVED state, it must not send a SHUTDOWN chunk in response to a ULP request and should discard subsequent SHUTDOWN chunks. If there are still outstanding DATA chunks left, the SHUTDOWN receiver must continue to follow normal data transmission procedures until all outstanding DATA chunks are acknowledged; however, the SHUTDOWN receiver must not accept new data from its SCTP user.

While in the SHUTDOWN SENT state, the SHUTDOWN sender must immediately respond to each received packet containing one or more DATA chunks with a SACK and a SHUTDOWN chunk, and restart the T2-shutdown timer.

Once it has no more outstanding DATA chunks, the SHUTDOWN receiver must send a SHUTDOWN ACK chunk and start a T2-shutdown timer of its own, entering the SHUTDOWN ACK SENT state. If the timer expires, the endpoint must resend the SHUTDOWN ACK chunk. The sender of the SHUTDOWN ACK chunk should limit the number of retransmissions of the SHUTDOWN ACK chunk to the protocol parameter Association.Max.Retrans. If this threshold is exceeded, the endpoint should destroy the TCB and report the peer endpoint as unreachable to the upper layer (thus, the association enters the CLOSED state).

Upon reception of the SHUTDOWN ACK chunk, the SHUTDOWN sender must stop the T2-shutdown timer, send a SHUTDOWN COMPLETE chunk to its peer, and remove all record of the association. Upon reception of the SHUTDOWN COMPLETE chunk, the endpoint verifies that it is in the SHUTDOWN ACK SENT state; if it is not, the chunk should be discarded. If the endpoint is in the SHUTDOWN ACK SENT state, the endpoint should stop the T2-shutdown timer and remove all knowledge of the association (thus, the association enters the CLOSED state). An endpoint should ensure that all of its outstanding DATA chunks have been acknowledged before initiating the shutdown procedure.
An endpoint should reject any new data request from its upper layer if it is in the SHUTDOWN PENDING, SHUTDOWN SENT, SHUTDOWN RECEIVED, or SHUTDOWN ACK SENT state. If an endpoint is in SHUTDOWN ACK SENT state and receives an INIT chunk (for example, if the SHUTDOWN COMPLETE chunk was lost) with source and destination transport addresses (either in the IP addresses or in the INIT chunk) that belong to this association, it should discard the INIT chunk and retransmit the SHUTDOWN ACK chunk. Note: Receipt of an INIT with the same source and destination IP addresses as those used in the transport addresses assigned to an endpoint but with a different port number indicates the initialization of a separate association. The sender of the INIT or COOKIE ECHO should respond to the receipt of a SHUTDOWN ACK with a standalone SHUTDOWN COMPLETE in an SCTP
packet with the Verification Tag field of its common header set to the same tag that was received in the SHUTDOWN ACK packet. This is considered an Out-of-the-Blue packet. The sender of the INIT lets the T1-initiation timer continue running and remains in the COOKIE WAIT or COOKIE ECHOED state. Normal T1-initiation timer expiration causes the INIT or COOKIE chunk to be retransmitted and thus starts a new association.

If a SHUTDOWN is received in the COOKIE WAIT or COOKIE ECHOED state, the SHUTDOWN chunk should be silently discarded. If an endpoint is in the SHUTDOWN SENT state and receives a SHUTDOWN chunk from its peer, the endpoint must respond immediately with a SHUTDOWN ACK to its peer and move into the SHUTDOWN ACK SENT state, restarting its T2-shutdown timer. If an endpoint is in the SHUTDOWN ACK SENT state and receives a SHUTDOWN ACK, it must stop the T2-shutdown timer, send a SHUTDOWN COMPLETE chunk to its peer, and remove all record of the association. Refer to the RFC for a description of the interface with the ULP.
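The shutdown exchange described above can be summarized as a state-transition table (a condensed sketch; the event names are illustrative, and side effects such as starting T2-shutdown or sending chunks are noted only as comments):

```python
# (current state, event) -> next state, condensed from the shutdown procedure.
SHUTDOWN_TRANSITIONS = {
    ("ESTABLISHED", "ulp_shutdown"): "SHUTDOWN PENDING",
    ("SHUTDOWN PENDING", "all_data_acked"): "SHUTDOWN SENT",       # send SHUTDOWN
    ("ESTABLISHED", "recv_shutdown"): "SHUTDOWN RECEIVED",
    ("SHUTDOWN RECEIVED", "all_data_acked"): "SHUTDOWN ACK SENT",  # send SHUTDOWN ACK
    ("SHUTDOWN SENT", "recv_shutdown_ack"): "CLOSED",              # send SHUTDOWN COMPLETE
    ("SHUTDOWN SENT", "recv_shutdown"): "SHUTDOWN ACK SENT",       # simultaneous shutdown
    ("SHUTDOWN ACK SENT", "recv_shutdown_complete"): "CLOSED",
    ("SHUTDOWN ACK SENT", "recv_shutdown_ack"): "CLOSED",          # send SHUTDOWN COMPLETE
}

def next_state(state: str, event: str) -> str:
    # Unlisted (state, event) pairs leave the state unchanged,
    # mirroring the "discard" behavior for unexpected chunks.
    return SHUTDOWN_TRANSITIONS.get((state, event), state)
```

Walking the initiator through ulp_shutdown, all_data_acked, and recv_shutdown_ack ends in the CLOSED state, as in the normal graceful close.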
References

1. D. Minoli and G. Dobrowski. Signaling Principles for Frame Relay and Cell Relay Services. Norwood, MA: Artech House, 1994.
2. D. Minoli. Telecommunications Technology Handbook. Norwood, MA: Artech House, 1991.
3. Vovida.org. “Voice Over IP Protocols, An Overview.” March 2001.
4. “SIP, H.323, and MGCP/MEGACO Comparison.” www.sipcenter.com/aboutsip/siph323/mgcpback.htm.
5. B. Yocom. “Voice over IP Is a (Fast) Moving Target.” Network World (January 29, 2001).
6. Performance Technologies. “Signaling in Switched Circuit and VoIP Networks.” www.pt.com/tutorials/iptelephony/tutorial_ss7_ip_interworking.pdf.
7. RadCom Corporation. H.323 Tutorial. March 1998.
8. D. Oran. “Understanding VoIP.” NGN 2001 Conference Record, Boston, MA.
9. “Signaling in Switched Circuit and VoIP Networks.” www.pt.com/tutorials/iptelephony/tutorial_ss7_ip_interworking.pdf.
10. I. Dalgic and F. Hanlin. “Comparison of H.323 and SIP for IP Telephony Signaling.” Proceedings of Photonics East. Boston, MA: SPIE, September 1999.
11. www.cs.columbia.edu/~hgs/papers/others/Dalg9909_Comparison.pdf.
12. Trillium promotional materials.
13. R. Stewart, K. Morneault, et al. “Stream Control Transmission Protocol.” RFC 2960 (October 2000).
14. J. Postel (ed.). “Transmission Control Protocol.” RFC 793, STD 7 (September 1981).
15. J. Postel (ed.). “User Datagram Protocol.” RFC 768, STD 6 (August 1980).
16. P. Karn and W. Simpson. “Photuris: Session-Key Management Protocol.” RFC 2522 (March 1999).
17. R. Elz and R. Bush. “Serial Number Arithmetic.” RFC 1982 (August 1996).
18. R. Hinden and S. Deering. “IP Version 6 Addressing Architecture.” RFC 2373 (1998).
19. R. Braden. “Requirements for Internet Hosts—Application and Support.” RFC 1123, STD 3 (October 1989).
20. M. Allman, V. Paxson, and W. Stevens. “TCP Congestion Control.” RFC 2581 (April 1999).
Notes

1. The MEGACO initiative has its genesis in IP Device Control (IPDC), proposed by Level 3, 3Com, Alcatel, Cisco, and others, and the Simple Gateway Control Protocol (SGCP), proposed by Telcordia. These protocols were brought together by the IETF to form MGCP (Media Gateway Control Protocol), and work continues under the responsibility of the MEGACO Working Group.
2. Portions of the material in Sections 7.2 and 7.8 are based on the Performance Technologies white paper Signaling in Switched Circuit and VoIP Networks (www.pt.com/tutorials/iptelephony/tutorial_ss7_ip_interworking.pdf). Performance Technologies is a leading supplier of packet-based telecommunications and networking products that enable convergence of wireline, wireless, and next-generation IP networks. The company designs and develops a variety of technology solutions, among them carrier-grade signaling gateways and Ethernet switching and network-access products, all targeted toward specific opportunities in the expanding Internet-driven marketplace. Specifically, the company is a supplier of next-generation IP telephony products.
3. The support of video is optional.
4. Copyright (C) The Internet Society (2000). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
CHAPTER 8

Quality of Service

8.1 Introduction
Among the technical factors that have held back broad-scale deployment of VOIP, one finds (1) the lack, heretofore, of true support for Quality of Service (QoS) capabilities in packet networks and (2) as noted in Chapter 7, the lack of robust signaling capabilities in available products for supporting efficient, pervasive, reliable, and routine interworking with the embedded PSTN base of 1.4 billion stations.1 Business factors that have held back VOIP deployment include the dearth of new applications, as well as the all-but-missing unbiased economic analysis in support of a VOIP business case for incumbent carriers. To begin addressing these issues, this chapter focuses on QoS protocols and approaches for packet networks in general and IP and MPLS in particular. A case study is included at the end of the chapter to illustrate the concepts discussed.

QoS relates to the use of design criteria, selection of protocols, determination of architectures, identification of approaches, choice of network restoration techniques, design of node buffer management, and other network aspects to ensure that end-to-end goals for congestion and availability, delay, jitter, throughput, and loss are reliably met over a specified time span and traffic load between any two chosen points in the network. These parameters are defined as follows [1]:

1. Congestion. A network condition in which traffic bottles up in queues to the point that it noticeably and negatively impacts the operation of the application.
2. Service availability. The reliability of a user’s connection through the network.
3. Delay. The time taken by a packet to travel through the network from one end to another.
4. Delay jitter. The variation in the delay encountered by similar packets following the same route through the network.
5. Throughput. The rate at which packets go through the network.
6. Packet-loss rate. The rate at which packets are dropped, become lost, or become corrupted (some bits changed in the packet) while they go through the network.

The industry has been working on the QoS issue for a decade now, but besides a cornucopia of multi-inch-thick protocols,2 relatively little deployment of QoS-enabled networks has been seen in extranets, intranets, the Internet, and carrier networks. However, there is no dearth of QoS literature, as indicated by the reference list at the end of this chapter.3 Obviously, the protocol cornucopia leaves something to be desired; otherwise we would have seen by now statistically significant penetration of these protocols in the tens of thousands of networks that are deployed currently. Given the prolific nature of the specification development activities in recent years, in addition to the clear need for QoS in support of VOIP penetration beyond its current one percent market share, we provide in this chapter a short review of the field. The material is based on various industry sources, as well as on a book written by the senior author that discusses Internet technologies and includes an extensive treatment of QoS [2]. References [3] and [4] also cover QoS. In reference [5], the senior author describes a number of analytical design techniques for broadband networks.

No fewer than five approaches have evolved for QoS in recent years, as follows:

1. ATM-based QoS approaches
2. Overengineering the network without using any special QoS discipline
3. Use of high-throughput “gigarouters,” with advanced queue management, without using any special QoS discipline
4. Per-flow QoS technology—IETF’s Integrated Services (intserv) Working Group recommendations
5. Class-based QoS technology—IETF’s Differentiated Services (diffserv) Working Group recommendations

Some of these approaches reflect different philosophies regarding QoS. One school of thought looks to overprovisioning (assuming that the bandwidth exceeds demand); a second looks to traffic engineering (steering traffic away from congestion); and a third looks to advanced queueing techniques, where true contention exists for the resource (because it is considered scarce). “Internet folks” often take the approach of overprovisioning without much mathematically sophisticated
analysis. “Incumbent carriers” often prefer robust (but complex) controls; however, they have focused more on Permanent Virtual Connections (PVCs) networks (for example, X.25 PVCs, Frame Relay PVCs, and ATM PVCs) rather than on switched/connectionless environments. In this chapter, we look briefly at the approach of advanced queue management and focus the discussion on intserv and diffserv. Figure 8.1 provides an easy-to-grasp look at the various techniques [6].
8.2 Background
QoS is defined as those mechanisms that give network administrators the ability to manage traffic bandwidth, delay, jitter, loss, and congestion throughout the network [7]. To realize true QoS, a QoS-endowed architecture must be applied end to end, not just at the edge of the network or at select network devices.4 The solution must encompass a variety of technologies that can interoperate so that they deliver scalable, feature-rich services throughout the network. The services must provide efficient use of resources by facilitating the aggregation of large numbers of IP flows where needed while, at the same time, providing fine-tuned granularity to those premium services defined by service level agreements (SLAs) in general and real-time requirements in particular.

Figure 8.1 Cost-complexity trade-off. [Figure: a spectrum plotted against cost and complexity. Best-Effort (lowest cost and complexity): fair access to all, FIFO queuing, no priority. “In-between”: packets carry a class/priority identifier; routers manage per-class queues. Integrated Services (highest cost and complexity): per-flow state maintenance, RSVP signaling, guaranteed service, bounded delay.]

The architecture must also provide the
mechanisms and capabilities to monitor, analyze, and report detailed network status, because the need to continuously undertake traffic engineering, network tuning, and provisioning of new facilities is not going to go away, considering that the growth of demand on the network will continue to be in the double-digit percentage points for many years. Armed with this knowledge, network administrators or network-monitoring software can react quickly to changing conditions, ensuring the enforcement of QoS policies. Finally, the architecture must also provide mechanisms to defend against the possibility of theft, to prevent denial of service, and to anticipate equipment failure [8]. Figure 8.2 depicts a simplified environment for a QoS-enabled network.

Figure 8.2 Basic concept of a QoS-enabled network. [Figure: high-priority VOIP traffic, low-priority data traffic, and best-effort Internet traffic enter an IP/MPLS network. The customer edge defines the class, the network edge recognizes the class marking, and egress traffic is prioritized based on the class setting.]

In general terms, QoS services in packet-based networks can be achieved in two possible ways:

1. Out-of-band signaling mechanisms, to secure allocations of shared network resources. This includes signaling for different classes of services in ATM and RSVP. It should be immediately noted, however, that RSVP only reserves, but does not provide, bandwidth; as such, it augments existing unicast/multicast routing protocols—IP in particular. In turn, IP will have to rely on Packet over SONET (POS), ATM [say, via Classical IP over ATM (CIOA)], or GMPLS (optical switch control) to obtain bandwidth. This approach is used in the intserv model described following this list.
2. In-band signaling mechanisms, whereby carriers and ISPs can provide priority treatment to packets of certain types. This could be done, for example, with the Type of Service (TOS) field in the IPv4 header, the priority field in the IPv6 header, or the priority field in the Virtual LAN (VLAN) IEEE 802.1Q/1p header. The MPLS label is another way to identify to the router or IP switch the traffic for which special treatment is required. If routers, switches, and end systems all recognized and used the appropriate fields, and if the queues in the routers or switches were managed effectively according to the priorities, and if adequate resources (buffers, links, backup routes, and so forth) were provided in the network, this method of providing QoS “guarantees” could be called the simplest. The reason is that no new protocols would be needed, that the carrier’s routers can be configured in advance to recognize labels of different information-flow types, and that relatively little state needs to be kept in the network. The in-band approach is used in the diffserv model described below.

Specific tools available to the designer of an IP/MPLS network that is intended to support VOIP include the following:

• Intserv/RSVP, a bandwidth “reservation” mechanism targeted to enterprise networks (because of size considerations) and also targeted to MPLS label distribution and MPLS QoS.
• diffserv, in which a diffserv Codepoint (DSCP) is associated with every packet and per-hop behaviors (PHBs) are defined.
• MPLS, where label-switched paths (LSPs) with different characteristics (link use, link capacity, number of link hops, and so on) are defined, especially in the core of the network for aggregating traffic flows. MPLS takes the approach of mapping diffserv PHBs in the access network to flows in the core network.
• Traffic-management mechanisms, which embrace traffic shaping, marking, dropping, and queue handling. These include priority- and class-based queuing with disciplines such as Random Early Detection (RED), among other methods.

As noted previously, two philosophical approaches exist to satisfy the service requirements of applications:
1. Overprovisioning, the overallocation of resources that meets or exceeds peak load requirements
2. Managing and controlling allocation of network and computing resources

Depending on the deployment, overprovisioning can be viable if it is a simple matter of upgrading to faster LAN switches and network interface cards (NICs) or adding memory, CPUs, disks, and so on. However, it may not be viable or cost-effective in many other cases, such as when dealing with relatively expensive long-haul WAN links. Overprovisioned resources remain underused; they are used only during short peak periods. Better management consists of optimizing the use of existing resources, such as limited bandwidth and CPU cycles. VOIP stakeholders (carriers and intranet planners) have an economic incentive to deploy viable QoS capabilities so that an acceptable grade of service can be provided to the end users [9].5
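As an illustration of the in-band marking discussed earlier, an application on a Linux host can request Expedited-Forwarding treatment by setting the DSCP bits of the former IPv4 TOS octet on its socket (a sketch; whether the network honors the marking depends on the diffserv configuration along the path):

```python
import socket

EF_DSCP = 46  # Expedited Forwarding codepoint, commonly used for VOIP bearer traffic

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# The DSCP occupies the upper six bits of the TOS octet, so shift left by 2.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_DSCP << 2)
```

All datagrams subsequently sent on this socket carry the EF marking, which a diffserv-aware edge router can map to a priority queue.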
8.3 QoS Approaches
Per-Flow QoS

The IETF intserv Working Group has developed mechanisms for link-level, per-flow QoS control. RSVP is used for signaling. The services of intserv are (1) guaranteed service and (2) controlled load service; the ITU-T Y.iptc (IP traffic control) effort has renamed these delay-sensitive statistical bandwidth capability and delay-insensitive statistical bandwidth capability, respectively. (The ITU Y.iptc effort uses the intserv services and diffserv expedited forwarding.) The intserv architecture [10] defines QoS services and reservation parameters to be used to obtain the required QoS for an Internet flow. RSVP [11] is the signaling protocol used to convey these parameters from one or multiple senders toward a unicast or multicast destination. RSVP assigns QoS with the granularity of a single application’s flows [12]. The Working Group is now also looking at new RSVP extensions. The following is a snapshot of RSVP:

• The IETF developed the intserv model to support real-time services on the Internet and in enterprise internets.
• The intserv model defines the architecture of RSVP service guarantees.
• intserv uses a setup protocol whereby hosts and routers signal QoS requests that pertain to a flow to the network and to each other.
• intserv starts with a flow-based description of the problem being solved.
• intserv defines the traffic and QoS characteristics for a flow.
• Traffic-control mechanisms control traffic flows within a host/router to support the required QoS.
Quality of Service
• intserv encompasses three QoS classes: (1) guaranteed service, (2) controlled load service, and (3) best-effort service.
• Guaranteed service. This service allows the user to request a maximum delay bound for an end-to-end path across a packet network.
• Controlled load service. This service provides a small set of service levels, each differentiated by delay behavior. It supports three relative levels, but without particular numerical values of delay associated with them.
• Best-effort service. This baseline (default) service can be achieved over the Internet or an intranet without any QoS modifications.
• The goal of the intserv model is to mask the underlying technology from the application but still provide the following features:
• Internetwork routing, allowing applications to achieve their desired performance from the network via optimal path selection.
• Multicast capability, permitting one-to-many or many-to-many communication flows.
• QoS facilities, representing parameters that describe the desired characteristics that applications can expect from a network.
• intserv requires work for each Layer 2 technology.
• The IETF uses different subgroups to look at Ethernet, ATM, and others.
• The RSVP signaling protocol uses resource reservation source-to-destination messages to secure QoS-based connectivity and bandwidth.
• The operation is as follows:
• Along the path between the source and target, resource requests are used to obtain permission from admission control software to use available local resources—buffers, trunks, and so forth—to support the desired QoS.
• The resource requests reestablish the reservation state, thereby committing the reservation.
• When the desired request cannot be fulfilled, a request-failure message is generated and returned to the appropriate party.
• It is possible for reservation messages to become lost.
The following list shows the messages used by the RSVP protocol:
• PATH—sent by the sender; records a path between sender and receiver.
• RESV—sent by the receiver; reserves resources along a path.
• ResvErr, PathErr—for errors.
• ResvTear, PathTear—for teardown of a reservation or path.
• ResvConf—for confirmation.
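The PATH/RESV handshake just listed can be sketched as a toy simulation. All class and function names below are invented for illustration and do not come from any real RSVP implementation; a real RSVP daemon is far more involved (soft state, refresh timers, merging).

```python
# Toy sketch of the RSVP PATH/RESV exchange along a chain of routers.
# Names are illustrative; real RSVP maintains soft state and refreshes it.

class Router:
    def __init__(self, name, free_kbps):
        self.name = name
        self.free_kbps = free_kbps   # local resources available to admission control

    def admit(self, kbps):
        """Admission control: commit local resources if available."""
        if kbps <= self.free_kbps:
            self.free_kbps -= kbps
            return True
        return False

def path_then_resv(routers, kbps):
    """The sender's PATH records the route; the receiver's RESV travels back
    upstream, reserving resources hop by hop. Returns a ResvConf on success
    or a ResvErr naming the hop that refused the request."""
    recorded_path = [r.name for r in routers]          # PATH message
    for router in reversed(routers):                   # RESV travels upstream
        if not router.admit(kbps):
            return ("ResvErr", router.name)            # request-failure message
    return ("ResvConf", recorded_path)

net = [Router("R1", 1000), Router("R2", 300), Router("R3", 1000)]
print(path_then_resv(net, 256))   # reservation fits at every hop -> ResvConf
print(path_then_resv(net, 256))   # R2 now lacks capacity -> ResvErr
```

Note the second call fails at R2: once a hop's resources are committed, later reservations contend for what remains, which is exactly what admission control is for.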
Chapter Eight
Signaling traffic is exchanged between routers belonging to a core area. After a reservation has been established, each router must classify each incoming IP packet to determine whether it belongs to a QoS flow; if so, it assigns the needed resources to the flow. Figure 8.3 shows the RSVP classifier function. The intserv classifier performs a multifield classification because it checks five parameters in each IP packet, namely, source IP address, destination IP address, protocol ID, source transport port, and destination transport port. The classifier function generates a FLOWSPEC object, as shown in Figure 8.4. The following is a list of the application categories that intserv addresses:

Elastic Applications. These applications have no constraints on delivery as long as the packets reach their destination, and there is no specific demand on delay bounds or bandwidth requirements. Examples are Web browsing and e-mail.

Real-Time Tolerant (RTT). These applications demand weak bounds on the maximum delay over the network. Occasional packet loss is acceptable. An example is Internet radio, which uses buffering (hiding packet losses from the application).

Real-Time Intolerant (RTI). These applications demand tight bounds on latency and jitter. An example is a VOIP application, where excessive delay and jitter are hardly acceptable.

To service these classes, intserv, using various mechanisms at the routers, supports the following classes of service:
Figure 8.3 RSVP classifier function. (Integrated services building blocks: in the host, the application and RSVP daemon feed policy control (PC), admission control (AC), a classifier, and a scheduler; in the router, the RSVP daemon and routing feed the same PC/AC/classifier/scheduler elements on the data path.)
Figure 8.4 RSVP flow descriptors. (A flow descriptor pairs a FilterSpec—the specification of the packets belonging to a flow, e.g., source address and source port—with a FlowSpec—the QoS description, comprising a Tspec, the source behavior such as rate and burstiness, and an Rspec, the network reservation, e.g., delay or priority.)
Guaranteed service. This service is meant for RTI applications. It “guarantees” bandwidth for the application traffic and a deterministic upper bound on delay.

Controlled load service. This service is intended for RTT traffic. The average delay is guaranteed, but the end-to-end delay experienced by an arbitrary packet cannot be determined. An example is H.323 traffic.

RSVP can support an intserv view of QoS; as noted elsewhere, it can also be used as a signaling protocol for MPLS for distributing labels (although a distinct label-distribution protocol is also available to MPLS, as seen in Chapter 2). In the mid-1990s, RSVP was developed to address network congestion by allowing routers to decide in advance whether they could meet the requirements of an application flow and then reserve the desired resources if they were available. RSVP was originally designed for the installation of a forwarding state associated with resource reservations for individual host-to-host traffic flows [13]. The physical path of the flow across a service provider’s network was determined by conventional destination-based routing (for example, an IGP such as RIP or OSPF). By the late 1990s, RSVP became a proposed standard, and it has since been implemented in a variety of IP networking equipment. However, RSVP has not been widely used in service provider/carrier networks because of operator concerns about its scalability and the overhead required to support potentially millions of host-to-host flows. RFC 2208, an informational document, discusses the scalability issues posed by the signaling, classification, and scheduling mechanisms [14]. An important consequence of this problem is that intserv-level QoS can be provided only within peripheral areas of a large network, preventing its extension into core areas and the implementation of end-to-end QoS. IETF RSVP-related working groups have undertaken some work to overcome these problems. The RSVP Working Group
has recently published RFC 2961, which describes a set of techniques to reduce the overhead of RSVP signaling. This RFC, however, does not deal with the classification problem, which remains to be addressed. One Internet Draft discusses the possibility of aggregating RSVP sessions into a larger one [15]. This aggregated RSVP session would use a DSCP for its traffic [12].
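The multifield classification described earlier in this section, matching each packet's five header fields against installed reservations, can be sketched as follows. Addresses, ports, and the reservation table are invented for illustration:

```python
# Sketch of intserv multifield classification: each arriving packet is
# matched on the five fields named in the text (source/destination IP
# address, protocol ID, source/destination transport port). Values are
# illustrative only.
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int        # protocol ID, e.g., 17 = UDP
    src_port: int
    dst_port: int

# Installed reservations: five-tuple -> QoS class (result of a successful RESV)
reservations = {
    FiveTuple("10.0.0.1", "10.0.9.9", 17, 5004, 5004): "guaranteed",
}

def classify(pkt: FiveTuple) -> str:
    """Multifield classifier: exact match on all five fields;
    unmatched packets get best-effort treatment."""
    return reservations.get(pkt, "best-effort")

print(classify(FiveTuple("10.0.0.1", "10.0.9.9", 17, 5004, 5004)))  # guaranteed
print(classify(FiveTuple("10.0.0.2", "10.0.9.9", 17, 5004, 5004)))  # best-effort
```

This per-packet five-field lookup, repeated at every RSVP hop, is the classification cost that RFC 2208 flags as a scaling concern.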
Class-Based QoS

The IETF diffserv Working Group has developed a class-based QoS approach. Packets are marked at network edges, and routers use the markings to decide how to handle the packets. There are four services, as follows:

1. Best-effort—normal Internet traffic
2. Seven precedence levels—prioritized classes of traffic
3. Expedited Forwarding (EF)—leased-line-like service
4. Assured Forwarding (AF)—four queues with three drop classes
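As a concrete illustration of edge marking, the sketch below rewrites the DSCP bits of the IP TOS/DS byte. The `mark` helper and its service names are invented for this example; the codepoint values themselves (EF = 46, AF41 = 34, best-effort = 0) are the standard diffserv assignments:

```python
# Sketch of diffserv edge marking: the ingress router classifies a packet
# into a behavior aggregate and writes the corresponding DSCP into the
# six high-order bits of the IP TOS/DS byte. The two low-order (ECN) bits
# are left untouched. Mapping of service names to codepoints uses the
# standard values; the function itself is illustrative.

DSCP = {"expedited": 46, "assured": 34, "best-effort": 0}  # EF, AF41, BE

def mark(tos_byte: int, service: str) -> int:
    """Return the TOS/DS byte with its six DSCP bits rewritten."""
    return (DSCP[service] << 2) | (tos_byte & 0b11)  # preserve the 2 ECN bits

print(hex(mark(0x00, "expedited")))    # EF-marked byte
print(hex(mark(0x03, "best-effort")))  # BE, ECN bits preserved
```

Transit routers then need only this one byte, not a five-field lookup, to select the PHB, which is where diffserv's scaling advantage over intserv comes from.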
The diffserv approach requires edge policing, but this technology is not yet fully defined. In a diffserv domain (RFC 2475), all the IP packets crossing a link and requiring the same diffserv behavior are said to constitute a behavior aggregate (BA). At the ingress node of the diffserv domain, the packets are classified and marked with a DSCP, which corresponds to their BA. At each transit node, the DSCP is used to select the PHB that determines the scheduling treatment and, in some cases, the drop probability for each packet. At face value, diffserv appears capable of scaling more easily than intserv; it is also simpler. Packet purists will probably argue that diffserv is the best approach because there is very little, if any, state information kept along the route; those more in the carriers’ camp will probably argue that intserv is a better approach because resource reservations and allocations can be better managed in the network in terms of their capability to engineer networks and maintain SLAs. It is within reason to assume that if the design is properly supported by statistically valid and up-to-date demand information,6 and resources are quickly added when needed, either approach will probably provide reasonable results. Table 8.1 depicts a mapping between the various QoS classes advanced by developers. One cannot generalize which of these techniques is better for VOIP, because the decision has to be based on the type of network architecture one chooses to implement and on the size of the network, both in terms of network elements (NEs) and lines supported. One cannot argue that a metric wrench is better than a regular wrench: if one is working on a European-built engine, then a metric wrench would obviously be superior; if one is working on a U.S.-built engine, then a regular wrench would be the answer. For example, in theory, a reservation
Table 8.1 Mapping between Various QoS Classes

CoS queue  Applications                          Service class     diffserv definition  ATM definition
1          Virtual private line                  Pure priority 3   EF3                  CBR
2          Multimedia (VOMPLS, video)            Real-time         AF1                  VBR-rt
3          Business applications                 Assured delivery  AF4                  VBR-nrt
4          VPN/Internet                          Best-effort       BE                   UBR/ABR
5          Network control/routing protocols     Pure priority 1   EF1                  —
6          Network control/signaling protocols   Pure priority 2   EF2                  —
scheme (specifically, intserv) would seem fine in a small network with a 3- to 7-hop end-to-end diameter (the U.S. voice network roughly fits this range); a network of large diameter, where paths may be 8 to 15 hops, may find a reservation scheme too burdensome and a node-by-node distributed approach (specifically, diffserv) better (the Internet roughly fits this range). The same kind of argument also applies when looking at the total number of nodes (separate and distinct from the network diameter): If the network in question is a national core network with 10 to 20 core nodes, the reservation (intserv) scheme may be fine; but if the network in question covers all the tiers of a voice network with around 400 to 500 interacting nodes, the diffserv approach may be better. These are just general observations; the decision regarding the best method must be made based on careful network-specific analysis, as well as product availability.
MPLS-Based QoS

Prima facie, the use of MPLS affords a packet network the possibility of an improved level of QoS control as compared with pure IP. MPLS developers have proposed both diffserv- and intserv-style approaches to QoS in MPLS. QoS controls are critical for multimedia applications in intranets, dedicated (WAN) IP networks, VPNs, and a converged Internet. Services such as VOIPOMPLS, VOMPLS, MPLS VPNs, Layer 2 VPNs (L2VPNs), Differentiated Services Traffic Engineering (DSTE), and draft-martini typically require service differentiation in particular and QoS support in general. It is important to realize, however, that MPLS per se is not a QoS solution; it still needs a distinct mechanism to support QoS. The issue of QoS in an MPLS network is treated at length in Reference [16].
In the diffserv-style case, the EXPerimental (EXP) bits of the header are used to trigger scheduling and/or drop behavior at each LSR. This solution, based on Reference [17], allows the MPLS network administrator to select how diffserv BAs are mapped onto LSPs so that he or she can best match the diffserv, traffic engineering (TE), and protection objectives within his or her particular network. The proposed solution allows the network administrator to decide whether different sets of BAs are to be mapped onto the same LSP or onto separate LSPs. The MPLS solution relies on the combined use of two types of LSPs, as follows:

1. LSPs that can transport multiple ordered aggregates, so that the EXP bits field of the MPLS shim header conveys to the LSR the PHB to be applied to the packet, covering information about the packet’s scheduling treatment and its drop precedence.

2. LSPs that transport only a single ordered aggregate, so that the LSR infers the packet’s scheduling treatment exclusively from the packet’s label value, while the packet’s drop precedence is conveyed in the EXP field of the MPLS shim header or in the encapsulating link layer–specific selective drop mechanism (ATM, Frame Relay, or 802.1).

Some developers have proposed a solution that efficiently combines the application-oriented intserv QoS with the power of MPLS label switching. The proposal is contained in Reference [12], which defines intserv-like QoS services in MPLS domains and targets the following problems:

1. Providing a user-driven MPLS QoS path setup. An application uses the standard intserv Reservation API to allocate network resources; the intserv reservation (signaled by using RSVP) is then mapped at the Ingress LSR of the MPLS domain into the proper CR-LSPs.

2. Reducing the Constraint-Based Label Distribution Protocol (CR-LDP) signaling overhead by providing caching and aggregation of CR-LSPs.
Manual configuration of the bandwidth/signaling trade-off, as well as automatic load-discovery mechanisms, is allowed. The key element of this solution is the MPLS Ingress LSR, which acts like an MPLS/intserv QoS gateway. The CR-LDP protocol allows an LSP with QoS constraints [18] to perform QoS classification by using a single-valued label, not a multifield one. The main limitation of this solution is that it cannot be used by end hosts, because they cannot support CR-LDP signaling. However, intserv has been designed to allow applications to signal QoS requirements on their own (e.g., Reservation APIs are available, and many operating systems allow applications to use them). The basic idea given in Reference [12] is to combine the application-oriented intserv QoS with the power of MPLS label switching—that is, to define
intserv-like QoS services in MPLS domains. By using these mechanisms, end-to-end QoS is reached without service disruptions between MPLS domains and intserv areas (the MPLS Ingress LSR acts like an MPLS/intserv QoS gateway). At the same time, the number and effects of the changes to the current CR-LDP specifications are minimal. Most of the integration work is bounded within the Ingress LSR at the sender side of the MPLS domain’s border.
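The two LSP types described above (often called E-LSPs and L-LSPs in the diffserv-over-MPLS work) can be sketched as a small lookup. The EXP-to-PHB and label-to-class mappings below are invented for illustration; in practice they are configured or signaled per LSP:

```python
# Sketch of PHB selection for the two diffserv-over-MPLS LSP types:
# - E-LSP style: one LSP carries multiple ordered aggregates, and the 3-bit
#   EXP field selects the whole PHB (scheduling + drop precedence).
# - L-LSP style: the label itself fixes the scheduling class; EXP carries
#   only the drop precedence.
# All mapping values are illustrative, not standardized assignments.

EXP_TO_PHB = {0: "BE", 1: "AF11", 2: "AF12", 5: "EF"}   # per-LSP E-LSP map
LABEL_TO_CLASS = {1001: "EF", 1002: "AF1x"}              # L-LSP: class from label

def phb_for(label: int, exp: int, lsp_type: str) -> str:
    if lsp_type == "E-LSP":
        return EXP_TO_PHB.get(exp, "BE")                 # unknown EXP -> best effort
    # L-LSP: scheduling from the label, drop precedence from EXP
    return f"{LABEL_TO_CLASS[label]}/drop{exp}"

print(phb_for(2000, 5, "E-LSP"))   # EF
print(phb_for(1002, 1, "L-LSP"))   # AF1x/drop1
```

The trade-off mirrors the text: an E-LSP multiplexes several BAs onto one label at the cost of only eight EXP codepoints, while an L-LSP spends a label per class but keeps the EXP field free for drop precedence.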
Traffic Management/Queue Management

As noted previously, two approaches have been used for resource allocation: (1) the out-of-band reservation model (intserv/RSVP and ATM), which requires applications to signal their traffic requirements to the serving switch and, in turn, sets up a source-to-destination path with reserved resources, such as bandwidth and buffer space, that either guarantee the desired QoS or ensure with reasonable certainty that the desired service will be provided; and (2) the in-band precedence/priority model, where packets are marked or tagged according to priority (for example, the diffserv DSCP, IP Precedence/TOS, and IEEE 802.1Q/1p), and where a router takes aggregated traffic, segregates its flows into classes, and provides preferential treatment of those classes by reading the markings and treating the packets accordingly. Both approaches, but especially in-band priority, require advanced traffic and queue management. Typically, delays and QoS degradation accumulate at points in the network where there are queues. Queues arise when the server capacity (for example, an outgoing link or a CPU undertaking a task such as a sort or table lookup) is less than the aggregated demand for service brought along by the incoming “jobs” (for example, packets). Because of how internetworking technology has developed in the past fifteen years, queues are typically found at routing points rather than at switching points. Furthermore, the distribution of the delay (and hence the jitter) increases as the number of queues that must be traversed increases, as shown in Table 8.2. To manage resources and support QoS, routers require sophisticated queue management.
QoS mechanisms for controlling resources so as to achieve more predictable delays (see Figure 8.5) include
• Classification
• Conditioning—specifically, policing/shaping traffic (for example, Token Bucket)
• Queuing management (for example, RED)
• Queue/packet scheduling [for example, weighted fair queuing (WFQ)]
• Bandwidth reservation via signaling and path establishment (for example, RSVP, H.225, MPLS CR-LDP)
Routers can implement the following mechanisms to deal with QoS [9]:
Table 8.2 Increasing Variance as the Number of Queues Increases

Let the random variables X, Y, and Z (the delay through a single queue) each take the values 1, 3, and 5 with probability 0.333333 each, and let A = X + Y (two queues) and B = X + Y + Z (three queues). Then:

A = X + Y takes the values 2, 4, 6, 8, 10 with probabilities 0.111111, 0.222222, 0.333333, 0.222222, 0.111111.
B = X + Y + Z takes the values 3, 5, 7, 9, 11, 13, 15 with probabilities 0.037037, 0.111111, 0.222222, 0.259259, 0.222222, 0.111111, 0.037037.

Figures of merit:
E(X) = 3, E(X^2) = 11.66667, V(X) = E(X^2) − E(X)^2 = 2.666667
E(A) = 6, E(A^2) = 41.33333, V(A) = E(A^2) − E(A)^2 = 5.333333
E(B) = 9, E(B^2) = 89, V(B) = E(B^2) − E(B)^2 = 8
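The figures of merit in Table 8.2 follow from direct enumeration, and the pattern is general: for independent per-queue delays, the variances add, so jitter grows with every queue traversed. A quick check:

```python
# Check of Table 8.2: enumerate the sums of 1, 2, and 3 independent
# per-queue delays (each uniform over {1, 3, 5}) and compute the moments.
# Variances add, so delay variance (and hence jitter) grows with the
# number of queues traversed.
from itertools import product

vals = [1, 3, 5]  # per-queue delay values, each with probability 1/3

def moments(samples):
    n = len(samples)
    e = sum(samples) / n                  # E(x)
    e2 = sum(s * s for s in samples) / n  # E(x^2)
    return e, e2, e2 - e * e              # E(x), E(x^2), V(x)

print(moments(vals))                                     # X: E=3, V about 2.67
print(moments([x + y for x, y in product(vals, vals)]))  # A=X+Y: E=6, V about 5.33
print(moments([sum(t) for t in product(vals, repeat=3)]))  # B: E=9, V=8
```

Note that V(B) = 3 V(X) exactly, which is why an 8-to-15-hop path has markedly worse jitter than a 3-hop path even when each hop behaves identically.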
Figure 8.5 Pictorial view of QoS tools. (The QoS toolbox: classify, meter, police/shape, reserve bandwidth, schedule queue, select constrained route, advanced queue manager.)
Admission control. Accepts or rejects access to a shared resource. This is a key component for intserv and ATM networks, ensuring that resources are not oversubscribed; hence, it is more expensive and less scalable.

Congestion management. Prioritizes and queues traffic access to a shared resource during congestion periods (as is done in diffserv).

Congestion avoidance. Instead of waiting for congestion to occur, uses measures to prevent it. Algorithms such as Weighted Random Early Detection (WRED) exploit TCP’s congestion-avoidance algorithms to reduce the traffic injected into the network, thereby preventing congestion.

Traffic shaping. Reduces the burstiness of ingress network traffic by smoothing the traffic and then forwarding it to the egress link.

Basic elements of a router include some or all of the following [9]:

Packet classifier. This functional component is responsible for identifying flows and matching them with a filter. The filter is composed of parameters such as source and destination IP address, port, protocol, and TOS field. The filter is also associated with information that describes the treatment of the packet. Aggregate ingress traffic flows are compared against these filters. Once a packet header is matched with a filter, the QoS profile is used by the meter, the marker, and the policing/shaping functions.

Metering. The metering function compares the actual traffic flow against the QoS profile definition.

Marking. Marking is related to metering in that when the metering function compares the actual measured traffic against the agreed QoS profile, the traffic is marked and handled appropriately.
Policing/shaping. The policing functional component uses the metering information to determine whether ingress traffic should be buffered or dropped. The shaping functional component buffers packets and dispenses them at a constant rate to achieve a constant output rate. A common algorithm used here is the Token Bucket, which shapes the egress traffic and polices the ingress traffic.

Queue manager/scheduler. This capability handles the packets that are in the router’s set of queues, based on the priority-management and traffic-handling components described above. Queue management is discussed in more detail in Section 8.4.
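The Token Bucket mentioned above can be sketched in a few lines: tokens accrue at the committed rate up to the bucket depth, and a packet conforms only if enough tokens are available. The class name, rates, and sizes below are illustrative:

```python
# Sketch of a Token Bucket policer. Tokens (in bytes) accrue at the
# committed rate up to the bucket depth; a packet is in profile if the
# bucket holds at least its size in tokens. Parameters are illustrative.

class TokenBucket:
    def __init__(self, rate_bps: float, depth_bytes: float):
        self.rate = rate_bps / 8.0    # token fill rate, bytes per second
        self.depth = depth_bytes      # maximum burst size, bytes
        self.tokens = depth_bytes     # bucket starts full
        self.last = 0.0               # time of the last check, seconds

    def conforms(self, now: float, pkt_bytes: int) -> bool:
        # Accrue tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes  # in profile: the policer lets it pass
            return True
        return False                  # out of profile: drop, mark, or buffer

tb = TokenBucket(rate_bps=64_000, depth_bytes=1500)  # 8000 bytes/s, 1500-byte burst
print(tb.conforms(0.0, 1500))   # True: the full bucket absorbs one burst
print(tb.conforms(0.01, 1500))  # False: only ~80 bytes of tokens have accrued
print(tb.conforms(0.2, 1500))   # True: the bucket has refilled (capped at 1500)
```

Used as a policer, out-of-profile packets are dropped or remarked; used as a shaper, the same test decides when a buffered packet may be dispensed, which is what produces the constant output rate described above.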
8.4 QoS Details
This section provides a detailed view of QoS approaches and architecting.
IETF intserv Approach

A description of intserv follows.
intserv Architecture

The IETF intserv model is designed to support real-time services on the Internet and in enterprise internets. This model defines the architecture of RSVP service “guarantees.” The intserv model uses a setup protocol whereby hosts and routers signal QoS requests pertaining to a flow to the network and to each other. The intserv/RSVP model relies upon traditional datagram forwarding in the default case but allows sources and receivers to exchange signaling messages that establish additional packet classification and forwarding state on each node along the path between them. In the absence of state aggregation, the amount of state on each node scales in proportion to the number of concurrent reservations, which can be large on high-speed links. This model also requires application support for the RSVP signaling protocol. diffserv mechanisms can be used to aggregate intserv/RSVP state in the core of the network [19]. The key RSVP RFCs and Internet Drafts are identified in Table 8.3. RSVP has evolved into a general-purpose signaling protocol for IP-based networks, applications, and services. Classic RSVP (RFC 2205) can be used for applications requesting end-to-end QoS signaling. Router-based RSVP-TE is a modification used in MPLS TE environments. RSVP-TE also has extensions for fast restoration and for GMPLS. Vendors such as Microsoft, Cisco, and Intel support RSVP; for example, Microsoft has defined APIs from applications to RSVP [20]. Efforts are under way to retain the application-initiated RSVP signaling while minimizing the state information that must be kept in the network; these activities are
Table 8.3 Key RSVP RFCs and Internet Drafts

Requests for Comments (RFCs):
• RFC 2205, Resource ReSerVation Protocol—Version 1 Functional Specification, R. Braden (ed.), L. Zhang, S. Berson, S. Herzog, and S. Jamin, September 1997.
• RFC 2207, RSVP Extensions for IPSEC IPv4 Data Flows, L. Berger and T. O’Malley, September 1997.
• RFC 2208, Resource ReSerVation Protocol (RSVP) Version 1 Applicability Statement—Some Guidelines on Deployment, A. Mankin (ed.), F. Baker, B. Braden, M. O’Dell, A. Romanow, A. Weinrib, and L. Zhang, September 1997.
• RFC 2209, Resource ReSerVation Protocol (RSVP)—Version 1 Message Processing Rules, R. Braden and L. Zhang, September 1997.

Internet Drafts:
• D. Awduche, J. Malcolm, J. Agogbua, M. O’Dell, and J. McManus, Requirements for Traffic Engineering over MPLS, draft-ietf-mpls-traffic-eng-01.txt, June 1999.
• D. Awduche, L. Berger, D.-H. Gan, T. Li, G. Swallow, and V. Srinivasan, Extensions to RSVP for LSP Tunnels, draft-ietf-mpls-rsvp-lsp-tunnel-02.txt, March 1999.
• R. Callon, A. Viswanathan, and E. Rosen, Multiprotocol Label Switching Architecture, draft-ietf-mpls-arch-05.txt, April 1999.
• R. Callon, G. Swallow, N. Feldman, A. Viswanathan, P. Doolan, and A. Fredette, A Framework for Multiprotocol Label Switching, draft-ietf-mpls-framework-03.txt, June 1999.
• M. Yuhara and M. Tomikawa, RSVP Extensions for ID-Based Refreshes, draft-yuhara-rsvp-refresh-00.txt, April 1999.
encompassed by the aggregated RSVP/RFC 3175, the intserv/diffserv interworking work (RFC 2998), and the refresh reduction proposals (RFC 2961). The intserv model starts with a flow-based description of the problem being solved. A flow is a single data stream from a single sending application to a set of receiving applications. Aggregated flows form a session, which is a homogeneous stream of simplex data from several senders to several receivers. An example of a flow is the data sent from a TCP source to a TCP destination; the reverse is a separate flow. Each TCP stream is one of a series of successive steps in moving information from a sender to a receiver. In this case, the flow identifiers are the source and destination IP addresses, the IP transport protocol identifier (UDP, TCP, and so on), and the port number. intserv defines traffic and QoS characteristics for a flow and encompasses three QoS classes: (1) guaranteed service, (2) controlled load service, and (3) best-effort service. Traffic-control mechanisms control traffic flows within a host/router to support the required QoS. The goal of intserv is to mask the underlying technology from the application while still providing the following features: • Internetwork routing, allowing applications to achieve their desired performance from the network via optimal path selection
• Multicast capability, permitting one-to-many or many-to-many communication flows
• QoS facilities, representing parameters that describe the desired characteristics that applications can expect from the network

intserv requires work for each Layer 2 technology. Hence, the IETF has used different subgroups to look at Ethernet, Token Ring, and ATM. By using intserv methods, a shared IP network such as the Internet can be designed for real-time applications (for example, real-time video). However, the overall performance efficiency at the network level remains to be understood (that is, how many customers can be supported over a given router or link). In recent years, the QoS development effort has been divided between two working groups: the RSVP Group (rsvp) and the Integrated Services Group (intserv). When an IP network that supports QoS is built, the RSVP specification is the mechanism that performs QoS requests; this is analogous to ATM signaling. The intserv specifications aim at documenting what capabilities are available to QoS-aware applications. The IETF originally defined service categories in intserv, as follows [10]:
• Guaranteed service. This service allows the user to request a maximum delay bound for an end-to-end path across a packet network. The service is guaranteed to be within that delay bound, but no minimum is specified. This is analogous to ATM’s CBR. Real-time applications can make use of this service. Leaky bucket, reserved rate, and weighted fair queuing are used for application control.
• Controlled load service. This service provides a small set of service levels, each differentiated by delay behavior. It supports three relative levels, but without particular numerical values of delay associated with them, and provides a best-effort end-to-end capability with a load baseline. Applications sensitive to congestion can make use of this service. Leaky bucket queuing methods are used for application control.
• Best-effort service. This baseline (default) service can be achieved over the Internet and intranets without any QoS modifications. It provides a best-effort end-to-end capability. Legacy applications can make use of this service.
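The weighted fair queuing cited above for guaranteed service can be sketched with virtual finish times. This is a deliberately simplified single-server version (it omits the system virtual clock that full WFQ uses to handle idle flows); flow names and weights are illustrative:

```python
# Simplified sketch of weighted fair queuing (WFQ): each packet is stamped
# with a virtual finish time equal to the flow's previous finish time plus
# packet_length / flow_weight, and the scheduler always transmits the
# packet with the smallest finish time. Real WFQ also tracks a system
# virtual time; that refinement is omitted here for brevity.
import heapq

class WFQ:
    def __init__(self, weights):                     # flow_id -> weight
        self.weights = weights
        self.finish = {f: 0.0 for f in weights}      # last finish time per flow
        self.queue = []                              # (finish, seq, flow, bytes)
        self.seq = 0                                 # tie-breaker for equal times

    def enqueue(self, flow, pkt_bytes):
        ft = self.finish[flow] + pkt_bytes / self.weights[flow]
        self.finish[flow] = ft
        heapq.heappush(self.queue, (ft, self.seq, flow, pkt_bytes))
        self.seq += 1

    def dequeue(self):
        _, _, flow, pkt_bytes = heapq.heappop(self.queue)
        return flow, pkt_bytes

w = WFQ({"voice": 4, "data": 1})   # voice gets 4x the bandwidth share
for _ in range(2):
    w.enqueue("voice", 200)
    w.enqueue("data", 200)
print([w.dequeue()[0] for _ in range(4)])  # ['voice', 'voice', 'data', 'data']
```

Because a flow's finish times advance in proportion to bytes/weight, the heavier-weighted voice flow drains ahead of data even though the packets arrived interleaved; this bounded share is what underpins the guaranteed-service delay bound.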
RSVP Background

In the RSVP environment, the application must know the characteristics of its traffic and must signal the supporting network element to reserve certain resources to meet the application’s traffic needs. According to the availability of resources, the network either reserves the resources and sends back a positive acknowledgment or answers in the negative. This portion of the standard is called admission control,
a policy decision implemented by the router/switch. Admission control decides which traffic will receive protection and services and which will not. Without any admission control, all available resources would have to be granted to all classes of traffic—which is what one has in best-effort networks. If the network accepts the request, the application sends the data over the network; the data flow is expected to conform to the traffic properties that the application has negotiated with the network. If the application attempts to send out-of-profile traffic, the data will be given best-effort service or the packets will be dropped altogether. The RSVP signaling protocol uses resource reservation source-to-destination messages to secure QoS-based connectivity and bandwidth. Along the path between the source and the target, resource requests are used to obtain permission from admission control software to use available local resources (buffers, trunks, and so on) to support the desired QoS. Resource requests then reestablish the reservation state, thereby committing the reservation. When the desired request cannot be fulfilled, a request-failure message is generated and returned to the appropriate party. In cases where the reservation messages are transmitted but lost somewhere in the network, the endstations may assume that their request was accepted and begin transmitting information to a destination that, in fact, has no resources reserved; such information will likely be dropped by the routers. To allow a host to determine whether the RSVP message was successful, the host can, if desired, explicitly query the network for state information. Multicasting is an evolving application that developers are also looking to support. RSVP is designed to support heterogeneity of QoS if there are multiple receivers in a multicast session; each receiver can get a different QoS by either merging requests or using different QoS layers.
Because RSVP is a receiver-driven protocol, it has the capability of scaling to a large number of recipients. There is a mechanism in RSVP to reduce the number of messages traveling upstream via a merging function. It should be clear that RSVP, from a functional perspective, is similar to ATM signaling; with RSVP, a user can provision a network connection with a carrier/ISP that uses a single physical connection but over which it can provide dynamic QoS.
RSVP Nomenclature and Mechanisms

Some of the RSVP nomenclature follows; see also Table 8.4. Flow is the term used in RSVP, MPLS, and other protocols to describe a sequence of PDUs with the same QoS requirements. Typically, flows are segregated by the pair of IP destination address and port number. A session designates flows with a particular destination IP address and port; in this manner, a session can be identified and provided with special QoS treatment. RSVP uses two terms to describe traffic categories: (1) flowspec, which is the information contained in the reservation request pertaining to the QoS requirements for the reservation in question, and (2) filterspec, which specifies the flows received or scheduled by the host. Table 8.4 provides some highlights of the protocol.
Table 8.4 RSVP Nomenclature

Advertised specification (Adspec). A set of modifiable parameters used to describe the QoS capability of the path between the source and destination.

Filterspec. The set of PDUs (packets) that receive the QoS specified by the flowspec. The session ID, an implicit part of the filter, segregates and schedules output packet streams in the packet classifier according to their source address|port.

Flow specification (flowspec). A description of the desired QoS reservation. The flowspec in a reservation request contains the service class and two sets of numeric parameters: Tspec and Rspec. If the request is successful, the flowspec sets the packet scheduler.

Packet filter. A unique header pattern occurring in packet classification.

Resource specification (Rspec). A way to characterize the desired QoS. The characterization of resources reserved to satisfy receivers in terms of what QoS characteristics the packet stream will use; this information evaluates QoS requests.

Sender template. The sender’s IP address (and, optionally, port number).

Session. The specific parameters that describe a reservation, including unique information used to differentiate the traffic flow associated with the session. A session is identified by the combination destination address|protocol|port.

Transmission specification (Tspec). A way to characterize traffic. The characterization of the information flow from the standpoint of the packet stream’s physical appearance (that is, headers, packets per second, and so on); this information differentiates the QoS requests.
Some highlights of RSVP are as follows:
• It supports the ability for entities to signal their desired QoS.
• It is not a routing protocol.
• It assumes the prior existence of network layer–routing support via such protocols as IGRP and BGP.
• It requests state information but does not help provide it.
• It maintains soft, not hard, state.
• It is not an admission control or packet scheduling application.
• It sends QoS requests upstream toward senders, which works particularly well in multicast environments (that is, a receiver can best determine the acceptable quality of a videoconference and/or whether additional costs are justified).
• It supports two reservation styles for use in multisender sessions:
• Distinct reservations, which are separate for each sender.
Quality of Service
  • Shared reservations, which are shared by multiple senders.
• Its applications have the ability to request different reservation styles, depending on the type of service or economic considerations.

RSVP work started in 1991 at Lawrence Berkeley National Laboratories and Xerox's Palo Alto Research Center in support of first-generation Internet-based multimedia tools. The desiderata were efficient use of Internet resources, scalability, support of unicast and multicast transmission, and coexistence with TCP/IP. There are three components used by end systems (hosts) to determine and signal QoS:

1. The setup protocol used by routers or hosts to signal QoS into the network
2. A traffic model or specification (the flowspec) that defines the traffic-flow data and QoS characteristics of the data leaving a source
3. Traffic controls (shaping mechanisms) that measure traffic flows leaving a host or router to ensure that the flows do not exceed the pre-agreed QoS

RSVP uses IP as the basic method of carrying the signaling messages, facilitating broad application because, for example, ISPs' networks are IP-based. However, RSVP produces a simplex reservation; that is, the endstations specify resource reservations for one direction at a time. Hence, two reservation requests are needed if bidirectional quality of service is desired. As noted, if an RSVP reservation is successful, there is no acknowledgment from the network, as there would be with an ATM call request for an SVC. This design decision was made to keep the protocol simple, although it can pose problems when the protocol interworks with other technologies, such as ATM and SONET/SDH. RSVP messages can be passed from router to router and be processed only by routers that support RSVP; as we covered for IPv6, the messages are ignored in the case where the PDUs cross non-RSVP-capable routers.
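Because RSVP messages are carried as ordinary IP payloads, building one begins with the 8-byte common header defined in RFC 2205. The following Python sketch packs that header; the helper name and constants are illustrative only, not taken from any particular implementation.

```python
import struct

# RSVP common-header layout per RFC 2205: version/flags (1 byte),
# message type (1 byte), checksum (2 bytes), send TTL (1 byte),
# reserved (1 byte), total message length (2 bytes) -- big-endian.
RSVP_PROTO = 46          # IP protocol number for "raw" RSVP
PATH_MSG, RESV_MSG = 1, 2

def rsvp_common_header(msg_type: int, body_len: int, ttl: int = 63) -> bytes:
    """Build the 8-byte RSVP common header (checksum left at 0 here)."""
    version, flags = 1, 0
    length = 8 + body_len                  # header plus object bodies
    return struct.pack("!BBHBBH",
                       (version << 4) | flags,
                       msg_type,
                       0,                  # checksum: computed over full msg
                       ttl,
                       0,                  # reserved
                       length)

hdr = rsvp_common_header(PATH_MSG, body_len=24)
assert len(hdr) == 8 and hdr[1] == PATH_MSG
```

A full message would append the RSVP objects (session, sender template, TSpec, and so on) after this header and then fill in the checksum.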
With the receiver-driven nature of RSVP, the server can send a PATH message characterizing the traffic to be sent. The receivers return RSVP reservation requests, specifying a QoS, to routers along the route. Once PDUs begin to flow, a protocol like the Real-Time Protocol (RTP) can ensure real-time delivery of the time-sensitive information (for example, video) and keep related streams in the same program (for example, voice and video) synchronized.

There are two ways to place the RSVP data into the IP payload. The endstations can transmit the messages in direct mode (that is, mapped directly into the IP PDU, with protocol type 46) or by using UDP encapsulation. The UDP method, currently the most common encapsulation method found on end-system implementations, is supported for systems that cannot generate raw IP packets.

By design, the RSVP suite forces little permanent state information upon the network devices supporting the protocol. This state is referred to as soft. For soft state to work, the system must be refreshed periodically. The developers took the approach that handling dynamic routing changes should be a normal procedure,
Chapter Eight
not an exception; therefore, routers should continuously update their reservations when they periodically receive resource requests (see Table 8.5 [21]). With RSVP, resource requests are made and then refreshed periodically. The refresh messages are identical to the original resource request messages, only repeated. The merging capability alluded to previously has the benefit of possibly requiring less state in routers.

• Resource reservations in IP hosts and routers are represented by soft state; that is, reservations are not permanent but time out after some period. Reservations must be refreshed to prevent time-out, and they may be explicitly deleted. In ATM, resources are reserved for the duration of a connection, which must be explicitly and reliably deleted.
• The soft-state approach of RSVP allows the QoS reserved for a flow to be changed at any time, whereas ATM connections have a static QoS that is fixed at setup time.
• RSVP is a simplex protocol; resources are reserved in one direction only. In ATM, connections (and associated reservations) are bidirectional in point-to-point calls and unidirectional in point-to-multipoint calls.
• Resource reservation is receiver-initiated in RSVP. In ATM, resources are reserved by the end system setting up the connection. In point-to-multipoint calls, connection setup (and hence resource reservation) must be done by the sender.
• RSVP has explicit support for sessions containing multiple senders, namely the ability to select a subset of senders and dynamically switch between senders. No such support is provided by ATM.

Table 8.5
RSVP versus ATM

Feature             RSVP                                  ATM
Initiation          Receiver-driven                       Source-driven
Directionality      Unicast/simplex                       Duplex
Uniformity          Allows receivers with heterogeneous   Homogeneous QoS per SVC
                    QoS for a given session
QoS renegotiation   Allows dynamic reconfiguration        Requires new setup (new PVC/PVP/SVC)
                    of resources                          to support a change (except for ABR)
Length of session   Reservations expire (time-out)        Permanently reserved for the connection
                                                          until the connection is dropped
Maturity            Under development at press time       Well developed at press time
State               Soft state (refresh/time-out)         Hard state (explicit delete)
• RSVP has been designed independently of other architectural components, routing in particular. Moreover, route setup and resource reservation are done at different times. In ATM, resource reservation and route setup are done at the same time (the connection setup time).

If the path from source to destination has changed, possibly because of a routing change or link failure, the next refresh message will create a new resource reservation. (There is the possibility, however, that the network will return an error message specifying that the requested resources are not available on the new route.) Dynamically changing routes can pose a problem for reliable QoS support. If a route fails because of an outage, a soft-state approach with dynamic network-initiated rerouting will, with some nonzero probability, temporarily impact QoS; the length of time will depend on the time required to determine a new route and process the reservation message. When a route fails in a hard-state protocol, such as ATM, the network will drop the connection and require a new call setup message. Hence, a hard-state protocol requires the endstation to receive a message from the network notifying it that the virtual connection has been deleted, upon which the endpoints must reestablish the circuit.
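The refresh/time-out behavior described above can be made concrete with a small sketch. The class, timer values, and session keys below are illustrative; real RSVP implementations derive the state lifetime from values carried in the messages themselves.

```python
import time

REFRESH_PERIOD = 30.0          # seconds between refreshes (illustrative)
LIFETIME = 3 * REFRESH_PERIOD  # state expires if ~3 refreshes are missed

class SoftStateTable:
    """Toy per-session reservation table with RSVP-style soft state."""
    def __init__(self):
        self._expires = {}     # session id -> absolute expiry time

    def refresh(self, session, now=None):
        # A refresh is identical to the original request: it simply
        # (re)installs the state and pushes the expiry time out.
        now = time.monotonic() if now is None else now
        self._expires[session] = now + LIFETIME

    def purge(self, now=None):
        # Reservations that were not refreshed simply time out;
        # no explicit teardown from the network is required.
        now = time.monotonic() if now is None else now
        dead = [s for s, t in self._expires.items() if t <= now]
        for s in dead:
            del self._expires[s]
        return dead

table = SoftStateTable()
table.refresh(("10.0.0.1", 17, 5004), now=0.0)   # dest|protocol|port
assert table.purge(now=LIFETIME - 1) == []       # still alive
assert table.purge(now=LIFETIME + 1) == [("10.0.0.1", 17, 5004)]
```

By contrast, a hard-state protocol would keep the entry until an explicit delete arrived, which is exactly why ATM must notify endpoints when a connection is torn down.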
RSVP Protocol Operation

The operation of RSVP is defined by the exchange of RSVP messages that contain information objects. PATH messages flow downstream from the senders to notify receivers of the pending content and the associated characteristics required to adequately accept the material. Reservations flow upstream toward the senders to join the multicast distribution tree and/or place QoS reservations. The information flows in RSVP can be categorized as follows [22]:

1. RSVP data generated by the content source specifying the characteristics of its traffic (sender TSpec) and the associated QoS parameters (sender RSpec). This information is carried, unmodified, by interconnecting network elements in an RSVP SENDER_TSPEC object to the receivers. An RSVP Adspec is also generated by the content source; it carries information describing properties of the data path, including the availability of specific QoS services.
2. RSVP data generated by the interconnecting network elements (the ATM switches and IP routers), which is used by receivers to determine what resources are available in the network. The QoS parameters that can be reported help the receivers determine available bandwidth, link delay values, and operating parameters. As in the sender's RSVP data, an RSVP Adspec can be generated by the interconnecting network elements and carries a description of available QoS services. (The SENDER_TSPEC contains
information that cannot be modified, while the Adspec's content may be updated within the network.)
3. RSVP data generated by the receiver specifying the traffic characteristics from both a packet description (receiver TSpec) and a resource perspective (receiver RSpec). This information is placed into an RSVP FLOWSPEC and carried upstream to interconnecting network elements and the content source. Along the path toward the sender, the FLOWSPEC may be modified by routers because of reservation merging.

Implementations of the RSVP protocol are very similar to client/server models. The specification identifies the messages exchanged and determines which sequences are supported. The RSVP protocol also defines several data objects, which carry resource reservation information. There are five basic message types (see Table 8.6) used in RSVP, and each message type carries several subfields. The PATH and RESV messages are described below in some detail.

PATH Messages

The protocol operates by the source sending a quasiperiodic PATH message (out of band from the actual reserved-quality data session) to the destination address (that is, the receivers) along the physical path that joins the end systems. As the PATH datagrams traverse the network, the interconnecting routers consult their normal routing tables to decide where to forward the message. When a PATH message is processed by a router, the router establishes some PATH state gleaned from fields in the message. PATH state records information about the IP address of the sender, along with its policy and QoS class descriptions. Upon reception of the PATH message, the receiver determines that a connection has been requested and attempts to determine whether, and how, it would like to join the session. The receiver uses the address specified in the SENDER_TSPEC because the source can be a Class D multicast address (hence, it does not use the IP address of the sender of the PATH message).
Table 8.6  RSVP Messages

Message type    Function
PATH            Sent by the source to specify that a resource exists and, optionally, which parameters should be used when transmitting.
RESV            Transmission of a message in hopes of reserving resources.
CONFIRMATION    Sent by a receiver, this optional message signals successful resource reservation.
TEARDOWN        Deletes an existing reservation.
ERROR           Notifies that an abnormal condition, such as a reservation failure, exists.

PATH messages contain the following fields:
• Session ID
• Previous hop address of the upstream RSVP neighbor
• Sender descriptor (filter + TSpec)
• Options (integrity object, policy data, and Adspec)

The PATH messages are sent at a quasiperiodic rate to protect the systems from changes in state. If a network failure causes the route the PATH messages took to change, the next PATH message will reserve resources in the next cycle. If interconnecting devices along the old path cannot be reached, their stored state will time out when they do not receive the quasiperiodic PATH message. The PATH message contains the previous hop address of the upstream RSVP neighbor, which is used to ensure that the PATH message has traversed the network without looping. Finally, the PATH message contains a SENDER_TEMPLATE object, which is simply the sender's IP address used for identification.

RESV Messages

If the receiver elects to communicate with the sender, it will send a reservation message (RESV) upstream along the same route that the PATH message used. If the RESV message fails at one of the intermediate routers, an error message will be generated and transmitted to the requester. To improve network efficiency, if two or more RESV messages for the same source pass through a common router or switch, the device can attempt to merge the reservations. The merged reservation is then forwarded as an aggregate request to the next upstream node. The RESV messages are addressed to the upstream node, with the receiver becoming the source address. The RESV contains a TSpec corresponding to the session's source. RESV messages contain the following fields:
• Session ID
• Previous hop address of the downstream RSVP neighbor
• Reservation style
• Flow descriptor (different combinations of filterspec and flowspec are used, based on the reservation style)
• Options (integrity; policy data)

If the request is admitted, in addition to forwarding the RESV messages upstream, the host or router will install packet filtering into its forwarding database. The forwarding database is queried when the device has a packet to be transmitted and is used to segregate traffic into different classes. The flow parameters established for this QoS-enabled traffic will also be passed to the packet scheduler. The parameters are used by the scheduler to forward packets at a rate that is compliant with the flow's description.

If the interconnecting network contains routers that do not support the RSVP protocol, the PATH/RESV messages are forwarded through the non-RSVP network, since they are just regular IP packets. The routers at the edge of the RSVP system communicate with their neighbors as if they were directly connected. The protocol will operate in this environment; however, the quality of the reservations
will be impacted by the fact that the network will now contain spots providing only best-effort performance. The performance across these spots must be estimated and communicated to the receivers in Adspec messages.

Operational Procedures

An application wishing to make use of RSVP signaling communicates with the protocol through an API. Before receivers can make reservations, the network must have knowledge of the source's characteristics. This information is communicated across the API when the hosts register themselves. The RSVP code in the host then generates a SENDER_TSPEC object that contains the details on the resources required and what the packet headers will look like. The source also constructs the initial Adspec containing generic parameters. Both of these objects are then transmitted in the PATH message.

As the PATH message travels from the source to the receivers, routers along the physical connection modify the Adspec to reflect their current state. The traffic-control module in the router checks the services requested in the original Adspec and the parameters associated with those services. If the values cannot be supported, the Adspec will be modified; if the service is unavailable, a flag will be set in the Adspec to notify the receiver. By flagging exceptions, the Adspec will notify the receiver if any non-RSVP routers exist along the path (that is, links that will provide only best-effort service), if any routers exist along the path that do not support one of the service categories (controlled load or guaranteed), and if a value for one of the service categories exists that differs from what is selected in the SENDER_TSPEC.

At the receiver, the Adspec and SENDER_TSPEC are removed from the PATH message and delivered to the receiving application. At this juncture, the receiver uses the Adspec/SENDER_TSPEC combination to determine what resources it needs to receive the contents from the network.
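The way a receiver folds the per-hop advertisements into end-to-end path properties can be sketched as follows. The dictionary field names are illustrative, not the actual RFC 2210 encodings; the point is that bandwidth and MTU are path minima, delays accumulate, and a flag records any best-effort-only hops.

```python
# Sketch of how a receiver might interpret the Adspec: each RSVP-capable
# hop updates path properties, so the usable values are the minima along
# the path, while latency is additive.

def fold_adspec(hops):
    """Combine per-hop advertisements into end-to-end path properties."""
    return {
        "path_mtu":  min(h["mtu"] for h in hops),        # bounds packet size
        "bandwidth": min(h["bandwidth"] for h in hops),  # bottleneck rate
        "latency":   sum(h["latency"] for h in hops),    # delays accumulate
        "best_effort_only": any(h.get("non_rsvp", False) for h in hops),
    }

hops = [
    {"mtu": 1500, "bandwidth": 10_000_000, "latency": 0.002},
    {"mtu": 1500, "bandwidth": 1_500_000, "latency": 0.010, "non_rsvp": True},
    {"mtu": 1006, "bandwidth": 45_000_000, "latency": 0.001},
]
props = fold_adspec(hops)
assert props["path_mtu"] == 1006 and props["best_effort_only"] is True
```

Here the second hop is a non-RSVP "spot," so the receiver knows part of the path is best effort even though a reservation can still be placed end to end.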
Since the receiver has the best information on how it interacts with the source application, it can accurately determine the packet headers and traffic-parameter values for both directions of the session from the Adspec and SENDER_TSPEC. Finally, the receiver's maximum transfer unit (MTU) must be calculated, because both the guaranteed and controlled-load QoS control services place an upper bound on packet size. (The MTU is the maximum packet size that can be transmitted. It is specified to help bound delay.) Each source places the desired MTU in the SENDER_TSPEC, and routers may optionally modify the Adspec's MTU field on a per-class-of-service basis.

Once the receiver has identified the parameters required for the reservation, it will pass those values to the network via its RSVP API. The parameters from the TSpec and RSpec objects are used to form the FLOWSPEC, which is placed in a RESV message and transmitted upstream by way of the default route. When they are received by an internetworking device, the RESV message and its corresponding PATH message are used to select the correct resources to be reserved for the session.
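Reservation merging, mentioned earlier, can be illustrated with a toy example. Real flowspec merging follows richer rules (RFC 2210); the max-of-parameters rule below is a deliberate simplification.

```python
# Toy illustration of RSVP reservation merging: when RESV messages for
# the same session arrive on different downstream interfaces, the router
# forwards one aggregate request upstream covering the "largest" flowspec,
# so upstream routers hold less state than one entry per receiver.

def merge_flowspecs(flowspecs):
    """Merge per-receiver flowspecs into one upstream request."""
    return {
        "rate":   max(f["rate"] for f in flowspecs),    # bytes/sec
        "bucket": max(f["bucket"] for f in flowspecs),  # bytes
    }

resv_a = {"rate": 64_000, "bucket": 8_000}    # receiver A's request
resv_b = {"rate": 128_000, "bucket": 4_000}   # receiver B's request
merged = merge_flowspecs([resv_a, resv_b])
assert merged == {"rate": 128_000, "bucket": 8_000}
```

This is the "merging capability" whose benefit is possibly requiring less state in routers: one merged entry can satisfy many heterogeneous receivers downstream.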
Deployment

As discussed previously, RSVP/intserv augments best-effort connectionless services with a QoS request/allocation mechanism. New software and hardware are needed on routers and end systems to support the QoS negotiations. It should be noted that with RSVP, the network still uses routers and IP. The kinds of functionality required in the router include the following:

• Classifier, which maps PDUs to a service class
• Packet scheduler, which forwards packets based on service classes
• Admission control, which determines whether the QoS requests can be met
• Setup protocol state machine
RSVP updates the classifier with the filterspec and updates the scheduler with the flowspec. These capabilities must be included in routers. RSVP software is now available in many high-end routers. However, actual field deployment appears to be rather limited.*
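The classifier/scheduler split can be sketched as follows: the filterspec keys a forwarding database that maps packets to a reserved class, with everything else falling through to best effort. The class and field names here are illustrative, not from any router implementation.

```python
# Hedged sketch of the router-side split: RESV admission installs a
# filterspec into the forwarding database; at forwarding time the
# classifier looks packets up and unmatched traffic stays best effort.

BEST_EFFORT = "best-effort"

class Classifier:
    def __init__(self):
        self._filters = {}   # (dst addr, protocol, dst port, src addr) -> class

    def install(self, session, sender, service_class):
        # session = destination address|protocol|port; sender = filterspec
        self._filters[session + (sender,)] = service_class

    def classify(self, pkt):
        key = (pkt["dst"], pkt["proto"], pkt["dport"], pkt["src"])
        return self._filters.get(key, BEST_EFFORT)

clf = Classifier()
clf.install(session=("10.1.1.1", 17, 5004), sender="10.2.2.2",
            service_class="controlled-load")
pkt = {"dst": "10.1.1.1", "proto": 17, "dport": 5004, "src": "10.2.2.2"}
assert clf.classify(pkt) == "controlled-load"
assert clf.classify({**pkt, "src": "10.9.9.9"}) == BEST_EFFORT
```

The flowspec, in turn, would parameterize the scheduler's queue for the matched class; that side is omitted here.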
* A Google.com search on the phrase "RSVP protocol" reveals 68,700 hits; a search on the phrase "penetration of RSVP in intranets" identifies only 173 hits; a search on "actual deployment of RSVP in intranets" shows 651 hits.

IETF diffserv Approach

This section looks at the diffserv approach to QoS. Of the key documents for diffserv, shown in Table 8.7, RFC 2475 is the most basic. Figure 8.6 shows some of the elements used to support diffserv. In a diffserv domain, all the IP packets crossing a link and requiring the same diffserv behavior are said to constitute a BA. At the ingress node of the diffserv domain, the packets are classified and marked with a DSCP, which corresponds to their BA. At each transit node, the DSCP is used to select the PHB that determines the scheduling treatment and, in some cases, the drop probability for each packet.

Focusing briefly just on MPLS, Reference [17] specifies a solution for supporting the diffserv BAs whose corresponding PHBs are currently defined (in [DIFF_HEADER], [DIFF_AF], and [DIFF_EF], per Table 8.7) over an MPLS network. As mentioned in [DIFF_HEADER] (see Table 8.7), "Service providers are not required to use the same node mechanisms or configurations to enable service differentiation within their networks, and are free to configure the node parameters in whatever way that is appropriate for their service offerings and traffic engineering objectives." Therefore, the solution defined in Reference [17] gives service providers flexibility in selecting how diffserv classes of service are routed or traffic-engineered within their domain (for example, separate classes of service supported via separate LSPs and routed separately, as well as all classes of service supported on the same LSP and routed together).

Table 8.7  diffserv RFCs and Documents

[DIFF_ARCH]    RFC 2475, An Architecture for Differentiated Services, Blake et al., December 1998.
[DIFF_AF]      RFC 2597, Assured Forwarding PHB Group, Heinanen et al., June 1999.
[DIFF_EF]      Davie et al., An Expedited Forwarding PHB, draft-ietf-diffserv-rfc2598bis-01.txt, April 2001.
[DIFF_HEADER]  RFC 2474, Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, Nichols et al., December 1998.

Figure 8.6  Key node elements for diffserv. Incoming packets pass through a classifier, a marker, and a shaper/dropper, with a meter observing the stream, before leaving the node.

Classifier  An entity which selects packets based on the content of packet headers according to defined rules.
Marker  A device that performs marking; marking is the process of setting the DS codepoint in a packet based on defined rules; pre-marking, re-marking.
Meter  A device that performs metering; metering is the process of measuring the temporal properties (e.g., rate) of a traffic stream selected by a classifier. The instantaneous state of this process may be used to affect the operation of a marker, shaper, or dropper, and/or may be used for accounting and measurement purposes.
Shaper  A device that performs shaping; shaping is the process of delaying packets within a traffic stream to cause it to conform to some defined traffic profile.
Dropper  A device that performs dropping; dropping is the process of discarding packets based on specified rules; policing.
Policing  The process of discarding packets (by a dropper) within a traffic stream in accordance with the state of a corresponding meter enforcing a traffic profile.
Introduction

This section provides a synopsis of diffserv, as summarized from RFC 2475 [19]. diffserv uses edge-based packet marking, local per-class forwarding mechanisms (known as behaviors), and network provisioning to support multiple QoS classes in IP (and now MPLS) networks. The DSCP in the IP packet header indicates how the packet should be treated at each node. DSCPs are set at ingress nodes based on an analysis of the packet. Intermediate routers and/or switches service the packets based on the value of the DSCP. diffserv is simpler than intserv/RSVP and ATM because no signaling or per-flow state information is maintained in the core of the network. No changes are required for applications. The mechanism can be implemented efficiently in routers and switches: Perusal of the six DSCP bits is all that is required (the "difficult" work is done at the edges). This affords interior network flexibility; the core network can be IP, ATM, MPLS, Frame Relay, or GMPLS/optical. diffserv PHBs cover four arenas:

1. Best-effort services (default)
2. Expedited forwarding (EF), for low-loss, low-latency, low-jitter service (as described in RFC 2598)
3. Assured forwarding (AF), the four "relative" classes of service defined in RFC 2597
4. Class selectors, for backward compatibility with IP precedence methods

Figures 8.7 to 8.9 define key concepts. Figure 8.10 depicts the implementation of various diffserv functions in routers [6]. QoS is supported by the following activities:

• Classification
• Traffic management/conditioning (policing, marking, shaping, and/or dropping)
• Queue management, such as RED (a vendor-specific function)
• Queue scheduling, such as Weighted Round-Robin, Weighted Fair Queuing, and Deficit Round-Robin (vendor-specific functions)
• TE (an MPLS-specific function)

RFC 2475 defines an architecture for implementing scalable service differentiation in the Internet.
This architecture achieves scalability by aggregating the traffic-classification state, which is conveyed by means of IP-layer packet marking using the
Figure 8.7 Key concepts.
The figure shows ingress, interior, and egress routers/LSRs, each with a traffic conditioner, along with the following definitions:

DS boundary node  A DS node that connects one DS domain to a node either in another DS domain or in a domain that is not DS-capable.
DS-capable  Capable of implementing differentiated services as described in this RFC 2475 architecture; usually used in reference to a domain consisting of DS-compliant nodes.
DS codepoint  A specific value of the DSCP portion of the DS field, used to select a PHB.
DS-compliant  Enabled to support differentiated services functions and behaviors as defined in [DSFIELD], RFC 2475, and other differentiated services documents; usually used in reference to a node or device.
DS domain  A DS-capable domain; a contiguous set of nodes which operate with a common set of service provisioning policies and PHB definitions.
DS egress node  A DS boundary node in its role in handling traffic as it leaves a DS domain.
DS ingress node  A DS boundary node in its role in handling traffic as it enters a DS domain.
DS interior node  A DS node that is not a DS boundary node.
DS node  A DS-compliant node.
DS region  A set of contiguous DS domains which can offer differentiated services over paths across those DS domains.
Downstream DS domain  The DS domain downstream of traffic flow on a boundary link.
Behavior aggregate (BA)  A DS behavior aggregate.
BA classifier  A classifier that selects packets based only on the contents of the DS field.
DS behavior aggregate  A collection of packets with the same DS codepoint crossing a link in a particular direction.
DS field  The IPv4 header TOS octet or the IPv6 Traffic Class octet when interpreted in conformance with the definition given in [DSFIELD]. The bits of the DSCP field encode the DS codepoint, while the remaining bits are currently unused.
Traffic conditioner  An entity which performs traffic conditioning functions and which may contain meters, markers, droppers, and shapers. Traffic conditioners are typically deployed in DS boundary nodes only. A traffic conditioner may re-mark a traffic stream or may discard or shape packets to alter the temporal characteristics of the stream and bring it into compliance with a traffic profile.
Traffic conditioning  Control functions performed to enforce rules specified in a TCA, including metering, marking, shaping, and policing.
Traffic conditioning agreement (TCA)  An agreement specifying classifier rules and any corresponding traffic profiles and metering, marking, discarding, and/or shaping rules which are to apply to the traffic streams selected by the classifier. A TCA encompasses all of the traffic conditioning rules explicitly specified within an SLA along with all of the rules implicit from the relevant service requirements and/or from a DS domain's service provisioning policy.
DS (differentiated services) field [23]. Packets are classified and marked to receive a particular per-hop forwarding behavior on nodes along their path. Sophisticated classification, marking, policing, and shaping operations need only be implemented at network boundaries or hosts. Network resources are allocated to traffic streams by service-provisioning policies that govern how traffic is marked and conditioned upon
Figure 8.8  Other key concepts.

Classification  Identify packets for subsequent forwarding treatment; performed in routers or hosts, combined with other actions; based on one or more fields in the packet header, the payload contents, or the input interface.
Microflow  A single instance of an application-to-application flow of packets which is identified by source address, source port, destination address, destination port, and protocol ID.
MF classifier  A multi-field (MF) classifier that selects packets based on the content of some arbitrary number of header fields; typically some combination of source address, destination address, DS field, protocol ID, source port, and destination port.
Per-hop behavior (PHB)  The externally observable forwarding behavior applied at a DS-compliant node to a DS behavior aggregate.
PHB group  A set of one or more PHBs that can only be meaningfully specified and implemented simultaneously, due to a common constraint applying to all PHBs in the set, such as a queue servicing or queue management policy. A PHB group provides a service building block that allows a set of related forwarding behaviors to be specified together (e.g., four dropping priorities). A single PHB is a special case of a PHB group.
Pre-mark  To set the DS codepoint of a packet prior to entry into a downstream DS domain.
Re-mark  To change the DS codepoint of a packet, usually performed by a marker in accordance with a TCA; performed at the edge.
entry to a diffserv-capable network and how that traffic is forwarded within that network. A variety of services can be implemented on top of these building blocks. The RFC defines an architecture for implementing scalable service differentiation in the Internet. A service defines some significant characteristics of packet transmission in one direction across a set of one or more paths within a network. These characteristics may be specified in quantitative or statistical terms of throughput, delay, jitter, and/or loss, or they may otherwise be specified in terms of some relative priority of access to network resources. Service differentiation is desired to accommodate heterogeneous application requirements and user expectations and to permit differentiated pricing of Internet service. This architecture is composed of a number of functional elements implemented in network nodes, including a small set of per-hop forwarding behaviors, packet-classification functions, and traffic-conditioning functions including metering, marking, shaping, and policing. This architecture achieves scalability by implementing complex classification and conditioning functions only at network boundary nodes and by applying PHBs to aggregates of traffic that have been
Figure 8.9  Additional key concepts.

Classifier  An entity which selects packets based on the content of packet headers according to defined rules.
Marker  A device that performs marking; marking is the process of setting the DS codepoint in a packet based on defined rules; pre-marking, re-marking.
Meter  A device that performs metering; metering is the process of measuring the temporal properties (e.g., rate) of a traffic stream selected by a classifier. The instantaneous state of this process may be used to affect the operation of a marker, shaper, or dropper, and/or may be used for accounting and measurement purposes.
Policer/shaper  A device that performs shaping; shaping is the process of delaying packets within a traffic stream to cause it to conform to some defined traffic profile.
Leaky bucket or token bucket  Tokens arrive at rate R (tokens/sec) into a bucket of depth B (tokens). Conformant traffic is injected into the network; nonconformant traffic may be marked, delayed, or discarded. Policing checks conformance to a configured (or signaled) traffic profile and is performed at network ingress or logical policing (flow) points. Shaping, via a shaping queue, removes jitter at the expense of some latency.
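The token bucket of Figure 8.9 can be expressed in a few lines. The parameters below are illustrative; a real policer would also decide whether nonconformant packets are marked, delayed, or dropped.

```python
# A minimal token-bucket meter: tokens accumulate at rate R up to
# depth B; a packet conforms if enough tokens are available, and
# nonconformant packets are left to the policer to mark/delay/drop.

class TokenBucket:
    def __init__(self, rate_bps: float, depth_bytes: float):
        self.rate = rate_bps / 8.0      # token rate in bytes/sec
        self.depth = depth_bytes
        self.tokens = depth_bytes       # start full
        self.last = 0.0

    def conforms(self, pkt_bytes: int, now: float) -> bool:
        # Replenish tokens for the elapsed interval, capped at the depth.
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes    # conformant: spend tokens
            return True
        return False                    # nonconformant

tb = TokenBucket(rate_bps=64_000, depth_bytes=1_000)
assert tb.conforms(1_000, now=0.0)      # bucket starts full
assert not tb.conforms(1_000, now=0.0)  # immediately empty
assert tb.conforms(800, now=0.1)        # 0.1 s * 8000 B/s = 800 tokens
```

Used at ingress, this is a policer; placed in front of a queue that delays packets until tokens are available, the same bucket becomes a shaper.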
Figure 8.10 diffserv router functions. (Source: Cisco Systems)
The figure contrasts diffserv functions in edge and interior (core) routers.

Edge router diffserv functions (classifier → meter → marker → shaper/dropper):
• Classifier: MF or BA
• Meter: measures traffic against profile
• Packet marker: marks DSCPs
• Shaper/dropper: traffic conditioners

Interior router diffserv functions (classifier → PHB queues):
• Classifier: BA only
• PHB: supported by queue management/scheduling techniques
appropriately marked using the DS field in the IPv4 or IPv6 headers [23] (see Figure 8.11). PHBs are defined to permit a reasonably granular means of allocating buffer and bandwidth resources at each node among competing traffic streams. Per-application flow or per-customer forwarding states need not be maintained within the core of the network. A distinction is maintained between

• The service provided to a traffic aggregate
• The conditioning functions and PHBs used to realize services
• The DS field value (DSCP) used to mark packets to select a PHB
• The particular node implementation mechanisms that realize a PHB
Service-provisioning and traffic-conditioning policies are sufficiently decoupled from the forwarding behaviors within the network interior to permit implementation of a variety of service behaviors, with room for future expansion. This architecture provides only service differentiation in one direction of traffic flow and is therefore asymmetric.
Figure 8.11 DSCP.
The DS octet: bits 0 through 5 carry the DSCP; bits 6 and 7 are CU(*).

Differentiated Services Codepoint (DSCP) (RFC 2474):
• Used to select the service (PHB) the packet will receive at each DS-capable node
• Formerly the IPv4 TOS and IPv6 Traffic Class fields
(*) CU: currently unused (bits 6 and 7)
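Because the DSCP occupies the upper six bits of the former TOS octet, marking reduces to a shift on one header byte. The codepoint values below are the standard ones (RFC 2474, RFC 2597, RFC 2598); the helper function is illustrative.

```python
# Marking a packet: place a 6-bit DSCP into the 8-bit TOS/Traffic Class
# octet, leaving the two CU bits zero.

DSCP_EF = 0b101110        # expedited forwarding (46), RFC 2598
DSCP_AF11 = 0b001010      # assured forwarding class 1, low drop, RFC 2597
DSCP_DEFAULT = 0b000000   # best effort

def mark_tos_byte(dscp: int) -> int:
    """Place a 6-bit DSCP into the 8-bit TOS/Traffic Class octet."""
    assert 0 <= dscp < 64
    return dscp << 2      # low two bits (CU) left as zero

assert mark_tos_byte(DSCP_EF) == 0xB8     # the familiar EF TOS value
assert mark_tos_byte(DSCP_AF11) == 0x28
```

This shift is also why legacy IP-precedence values survive as class selectors: the three precedence bits land in the top three bits of the DSCP.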
The following requirements have been identified and are addressed in the RFC 2475 architecture:

• Should accommodate a variety of services and provisioning policies, extending end to end or within a particular (set of) network(s)
• Should allow decoupling of the service from the particular application in use
• Should work with existing applications without the need for application programming interface changes or host software modifications (assuming suitable deployment of classifiers, markers, and other traffic-conditioning functions)
• Should decouple traffic-conditioning and service-provisioning functions from forwarding behaviors implemented within the core network nodes
• Should not depend on hop-by-hop application signaling
• Should require only a small set of forwarding behaviors whose implementation complexity does not dominate the cost of a network device and that will not introduce bottlenecks for future high-speed system implementations
• Should avoid a per-microflow or per-customer state within core network nodes
• Should use only an aggregated classification state within the network core
• Should permit simple packet-classification implementations in core network nodes (BA classifier)
• Should permit reasonable interoperability with non-DS-compliant network nodes
• Should accommodate incremental deployment
diffserv Architectural Model

The diffserv architecture is based on a simple model in which traffic entering a network is classified and possibly conditioned at the boundaries of the network, then assigned to different behavior aggregates. Each behavior aggregate is identified by a single DSCP. Within the core of the network, packets are forwarded according to
the PHB associated with the DSCP. In this section, we discuss the key components within a diffserv region, traffic-classification and -conditioning functions, and how diffserv is achieved through the combination of traffic conditioning and PHB-based forwarding.

diffserv Domain

A DS domain is a contiguous set of DS nodes that operate with a common service-provisioning policy and set of PHB groups implemented on each node. A DS domain has a well-defined boundary consisting of DS boundary nodes that classify and possibly condition ingress traffic, ensuring that packets transiting the domain are marked appropriately for selecting a PHB from one of the domain-supported PHB groups. Nodes within the DS domain select the forwarding behavior for the packets based on their DSCP, mapping that value to one of the supported PHBs by using either the recommended DSCP —> PHB mapping or a locally customized mapping [23]. Inclusion of non-DS-compliant nodes within a DS domain may result in unpredictable performance and may impede the ability to satisfy SLAs. A DS domain normally consists of one or more networks under the same administration, for example, an organization’s intranet or an ISP. The administration of the domain is responsible for ensuring that adequate resources are provisioned and/or reserved to support the SLAs offered by the domain.

DS Boundary Nodes and Interior Nodes

A DS domain consists of DS boundary nodes and DS interior nodes. DS boundary nodes interconnect the DS domain to other DS or non-DS-capable domains, while DS interior nodes only connect to other DS interior or boundary nodes within the same DS domain. Both DS boundary nodes and interior nodes must apply the appropriate PHB to packets based on the DSCP; otherwise, unpredictable behavior may result.
In addition, DS boundary nodes may be required to perform traffic-conditioning functions as defined by a traffic-conditioning agreement (TCA) between their DS domain and the peering domain to which they are connected. Interior nodes may be able to perform limited traffic-conditioning functions, such as DSCP remarking. Interior nodes that implement more complex classification and traffic-conditioning functions are analogous to DS boundary nodes. A host in a network containing a DS domain may act as a DS boundary node for traffic from applications running on that host; we therefore say that the host is within the DS domain. If a host does not act as a boundary node, the DS node topologically closest to that host will act as the DS boundary node for that host’s traffic.

DS Ingress Node and Egress Node

DS boundary nodes act both as a DS ingress node and as a DS egress node for different traffic directions. Traffic enters a DS domain at a DS ingress node and leaves a DS domain at a DS egress node. A DS ingress node is responsible for ensuring that the traffic entering the DS domain conforms to any TCA between it and the other domain to which the ingress node is connected. A DS egress node may perform traffic-conditioning functions on traffic forwarded to a directly connected peering domain, depending on the details of
the TCA between the two domains. Note that a DS boundary node may act as a DS interior node for some set of interfaces.

diffserv Region

A diffserv region (DS region) is a set of one or more contiguous DS domains. DS regions are capable of supporting diffserv along paths that span the domains within the region. The DS domains in a DS region may support different PHB groups internally and different DSCP —> PHB mappings. However, to permit services that span across the domains, the peering DS domains must each establish a peering SLA that defines (either explicitly or implicitly) a TCA that specifies how transit traffic from one DS domain to another is conditioned at the boundary between the two DS domains. It is possible for several DS domains within a DS region to adopt a common service-provisioning policy and to support a common set of PHB groups and DSCP mappings, thus eliminating the need for traffic conditioning between those DS domains.

Traffic Classification and Conditioning

diffserv is extended across a DS domain boundary by establishing an SLA between an upstream network and a downstream DS domain. The SLA may specify packet-classification and remarking rules and also specify traffic profiles and the actions applied to traffic streams that are in or out of profile. The TCA between the domains is derived (explicitly or implicitly) from this SLA. The packet-classification policy identifies the subset of traffic that receives diffserv by being conditioned and/or mapped to one or more BAs (by DSCP remarking) within the DS domain. Traffic conditioning comprises metering, shaping, policing, and/or remarking to ensure that the traffic entering the DS domain conforms to the rules specified in the TCA, in accordance with the domain’s service-provisioning policy. The extent of traffic conditioning required depends on the specifics of the service offering and may range from simple DSCP remarking to complex policing and shaping operations.
The details of traffic-conditioning policies negotiated between networks are outside the scope of this book.

Classifiers

Packet classifiers select packets in a traffic stream based on the content of some portion of the packet header. We define two types of classifiers. The BA classifier classifies packets based on the DSCP only; the MF classifier selects packets based on the value of a combination of one or more header fields, such as source address, destination address, DS field, protocol ID, source port, and destination port numbers, as well as other information (for example, incoming interface). Classifiers are used to “steer” packets matching some specified rule to an element of a traffic conditioner for further processing. Classifiers must be configured by some management procedure in accordance with the appropriate TCA, and they must also authenticate the information that they use to classify packets.
In the event of upstream packet fragmentation, MF classifiers that examine the contents of transport-layer header fields may incorrectly classify packet fragments subsequent to the first. A possible solution to this problem is to maintain fragmentation state; however, this is not a general solution, because of the possibility of upstream fragment reordering or divergent routing paths. The policy for applying MF classifiers to packet fragments is outside the scope of this book.

Traffic Profiles

A traffic profile specifies the temporal properties of a traffic stream selected by a classifier. It provides rules for determining whether a particular packet is in profile or out of profile. For example, a profile based on a token bucket might look like the following:

DSCP = X, use token bucket r, b
The above profile indicates that all packets marked with DSCP X should be measured against a token bucket meter with rate r and burst size b. In this example, out-of-profile packets are those packets in the traffic stream that arrive when insufficient tokens are available in the bucket. The concept of in- and out-of-profile packets can be extended to more than two levels; for example, multiple levels of conformance with a profile may be defined and enforced. Different conditioning actions may be applied to in-profile and out-of-profile packets, or different accounting actions may be triggered. In-profile packets may be allowed to enter the DS domain without further conditioning; alternatively, their DSCP may be changed. The latter happens when the DSCP is set to a nondefault value for the first time [23] or when the packets enter a DS domain that uses a different PHB group or DSCP —> PHB mapping policy for this traffic stream. Out-of-profile packets may be queued until they are in profile (shaped), discarded (policed), marked with a new DSCP (remarked), or forwarded unchanged while triggering some accounting procedure. Out-of-profile packets may be mapped to one or more behavior aggregates that are inferior in some dimension of forwarding performance to the BA into which in-profile packets are mapped. A traffic profile is an optional component of a TCA. Its use is dependent on the specifics of the service offering and the domain’s service-provisioning policy.

Traffic Conditioners

A traffic conditioner may contain the following elements: meter, marker, shaper, and dropper. A traffic stream is selected by a classifier, which steers the packets to a logical instance of a traffic conditioner. A meter is used (where appropriate) to measure the traffic stream against a traffic profile. The state of the meter with respect to a particular packet (for example, whether it is in or out of profile) may be used to affect a marking, dropping, or shaping action.
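The token bucket profile above can be sketched as a simple meter; the class and parameter names are illustrative, with the rate taken in bytes per second:

```python
# Sketch of a token bucket meter: packets marked DSCP X are measured
# against rate r (bytes/sec) and burst size b (bytes). A conforming
# packet is "in profile"; a nonconforming one may be shaped, policed,
# or remarked.
class TokenBucketMeter:
    def __init__(self, r: float, b: float):
        self.rate, self.depth = r, b
        self.tokens = b          # bucket starts full
        self.last = 0.0

    def in_profile(self, now: float, pkt_bytes: int) -> bool:
        # refill tokens for the elapsed interval, capped at the burst size
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes   # packet conforms: spend its tokens
            return True
        return False                   # out of profile

meter = TokenBucketMeter(r=1000.0, b=1500.0)   # 1,000 bytes/s, 1,500-byte burst
print(meter.in_profile(0.0, 1500))  # True: the burst allowance covers it
print(meter.in_profile(0.5, 1000))  # False: only 500 tokens have accrued
```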
When packets exit the traffic conditioner of a DS boundary node, the DSCP of each packet must be set to an appropriate value.
Figure 8.9 shows a block diagram of a classifier and traffic conditioner. Note that a traffic conditioner may not necessarily contain all four elements. For example, in the case where no traffic profile is in effect, packets may pass only through a classifier and a marker.

Meters

Traffic meters measure the temporal properties of the stream of packets selected by a classifier against a traffic profile specified in a TCA. A meter passes state information to other conditioning functions to trigger a particular action for each packet that is either in or out of profile (to some extent).

Markers

Packet markers set the DS field of a packet to a particular DSCP, adding the marked packet to a particular DS BA. The marker may be configured to mark all packets steered to it with a single DSCP, or it may be configured to mark a packet to one of a set of DSCPs used to select a PHB in a PHB group, according to the state of a meter. When the marker changes the DSCP in a packet, it is said to have remarked the packet.

Shapers

Shapers delay some or all of the packets in a traffic stream to bring the stream into compliance with a traffic profile. A shaper usually has a finite-size buffer, and packets may be discarded if there is insufficient buffer space to hold the delayed packets.

Droppers

Droppers discard some or all of the packets in a traffic stream to bring the stream into compliance with a traffic profile. This process is known as policing the stream. Note that a dropper can be implemented as a special case of a shaper by setting the shaper buffer size to zero (or a few) packets.

Location of Traffic Conditioners and MF Classifiers

Traffic conditioners are usually located within DS ingress and DS egress boundary nodes, although they may also be located in nodes within the interior of a DS domain or within a non-DS-capable domain (see Figure 8.12).
Within the Source Domain

We define the source domain as the domain containing the nodes that originate the traffic that receives a particular service. Traffic sources and intermediate nodes within a source domain may perform traffic-classification and -conditioning functions. The traffic originating from the source domain across a boundary may be marked by the traffic sources directly or by intermediate nodes before leaving the source domain. This is referred to as initial marking, or premarking. Consider the example of a company with a policy stating that its CEO’s packets should be given higher priority. The CEO’s host may mark the DS field of all outgoing packets with a DSCP that indicates “higher priority.” Alternatively, the first-hop router directly connected to the CEO’s host may classify the traffic and mark the CEO’s packets with the correct DSCP. Such high-priority traffic may also be conditioned near the source so that a limit is placed on the amount of high-priority traffic forwarded from a particular source. There are some advantages to marking packets close to the traffic source. First, a traffic source can more easily take into account an application’s preferences
Figure 8.12 DS domains and other elements. Microflows enter the DS boundary node of a DS domain, where an MF classifier, metering, marking, and traffic conditioning are applied; within and between DS domains, traffic travels as BA traffic, with DS interior nodes performing BA classification, PHB support, and queue management/scheduling; the DS boundary node into the next DS domain again applies MF classification, metering, marking, and traffic conditioning.
DS domain. A DS-capable domain; a contiguous set of nodes that operate with a common set of service-provisioning policies and PHB definitions.

DS boundary node. A DS node that connects one DS domain to a node either in another DS domain or in a domain that is not DS-capable.

DS egress node. A DS boundary node in its role in handling traffic as it leaves a DS domain.

DS ingress node. A DS boundary node in its role in handling traffic as it enters a DS domain.

DS interior node. A DS node that is not a DS boundary node.

DS node. A DS-compliant node.

DS region. A set of contiguous DS domains that can offer differentiated services over paths across those DS domains.

Behavior Aggregate (BA). A DS behavior aggregate.

BA classifier. A classifier that selects packets based only on the contents of the DS field.

Microflow. A single instance of an application-to-application flow of packets identified by source address, source port, destination address, destination port, and protocol ID.

MF classifier. A multifield (MF) classifier that selects packets based on the content of some arbitrary number of header fields; typically some combination of source address, destination address, DS field, protocol ID, source port, and destination port.

Per-Hop Behavior (PHB). The externally observable forwarding behavior applied at a DS-compliant node to a DS behavior aggregate.

Traffic conditioner. An entity that performs traffic-conditioning functions and that may contain meters, markers, droppers, and shapers. Traffic conditioners are typically deployed in DS boundary nodes only. A traffic conditioner may remark a traffic stream or may discard or shape packets to alter the temporal characteristics of the stream and bring it into compliance with a traffic profile.
when it decides which packets should receive better forwarding treatment. Second, classification of packets is much simpler before the traffic has been aggregated with packets from other sources, since the number of classification rules that need to be applied within a single node is reduced. Because packet marking may be distributed across multiple nodes, the source DS domain is responsible for ensuring that the aggregated traffic toward its provider DS domain conforms to the appropriate TCA. Additional allocation mechanisms, such as bandwidth brokers or RSVP, may be used to dynamically allocate resources for a particular DS BA within the provider’s network [24]. The boundary node of the source domain should also monitor conformance with the TCA, and it may police, shape, or remark packets as necessary.

At the Boundary of a DS Domain

Traffic streams may be classified, marked, and otherwise conditioned on either end of a boundary link (the DS egress node of the upstream domain or the DS ingress node of the downstream domain). The SLA between the domains should specify which domain has the responsibility of mapping traffic streams to the DS BAs and conditioning those BAs in conformance with the appropriate TCA. However, a DS ingress node must assume that the incoming traffic might not conform to the TCA and must be prepared to enforce the TCA in accordance with local policy. When packets are premarked and conditioned in the upstream domain, potentially fewer classification and traffic-conditioning rules need to be supported in the downstream DS domain. In this circumstance, the downstream DS domain may only need to remark or police the incoming BAs to enforce the TCA. However, more sophisticated services that are path- or source-dependent may require MF classification in the downstream DS domain’s ingress nodes.
If a DS ingress node is connected to an upstream non-DS-capable domain, the DS ingress node must be able to perform all the necessary traffic-conditioning functions on the incoming traffic.

In Non-DS-Capable Domains

Traffic sources or intermediate nodes in a non-DS-capable domain may employ traffic conditioners to premark traffic before it reaches the ingress of a downstream DS domain. In this way, the local policies for classification and marking may be concealed.

In Interior DS Nodes

Although the basic architecture assumes that complex classification and traffic-conditioning functions are located only in a network’s ingress and egress boundary nodes, deployment of these functions in the interior of the network is not precluded. For example, more restrictive access policies may be enforced on a transoceanic link, requiring MF classification and traffic-conditioning functionality in the upstream node on the link. This approach may have scaling limits because of the potentially large number of classification and conditioning rules that might need to be maintained.

PHBs

A PHB is a description of the externally observable forwarding behavior of a DS node applied to a particular DS behavior aggregate. The term forwarding
behavior is a general concept in this context. For example, in the event that only one BA occupies a link, the observable forwarding behavior (that is, loss, delay, jitter) will often depend only on the relative loading of the link (assuming a work-conserving scheduling discipline). Useful behavioral distinctions are observed mainly when multiple BAs compete for buffer and bandwidth resources on a node. The PHB is the means by which a node allocates resources to BAs, and it is on top of this basic hop-by-hop resource-allocation mechanism that useful differentiated services may be constructed. The simplest example of a PHB is one that guarantees a minimal bandwidth allocation of X percent of a link (over some reasonable time interval) to a BA. This PHB can be fairly easily measured under a variety of competing traffic conditions. A slightly more complex PHB would guarantee a minimal bandwidth allocation of X percent of a link, with fair, proportional sharing of any excess link capacity. In general, the observable behavior of a PHB may depend on certain constraints on the traffic characteristics of the associated BA or the characteristics of other behavior aggregates. PHBs may be specified in terms of their resource (for example, buffer, bandwidth) priority relative to other PHBs or in terms of their relative observable traffic characteristics (for example, delay, loss). These PHBs may be used as building blocks to allocate resources and should be specified as a group (PHB group) for consistency. PHB groups will usually share a common constraint that applies to each PHB within the group, such as a packet-scheduling or buffer-management policy. The relationship between PHBs in a group may be in terms of absolute or relative priority (for example, discard priority by means of deterministic or stochastic thresholds), but this is not required (for example, N equal link shares).
A single PHB defined in isolation is a special case of a PHB group. PHBs are implemented in nodes by means of some buffer-management and packet-scheduling mechanisms. PHBs are defined in terms of behavior characteristics relevant to service-provisioning policies, not in terms of particular implementation mechanisms. In general, a variety of mechanisms may be suitable for implementing a particular PHB group. Furthermore, it is likely that more than one PHB group may be implemented on a node and used within a domain. PHB groups should be defined so that the proper resource allocation between groups can be inferred and that integrated mechanisms having the ability to simultaneously support two or more groups can be implemented. A PHB group definition should indicate possible conflicts with previously documented PHB groups that might prevent simultaneous operation. As described in [23], a PHB is selected at a node by a mapping of the DSCP in a received packet. Standardized PHBs have a recommended DSCP. However, the total space of DSCPs is larger than the space available for recommended DSCPs for standardized PHBs, and [23] leaves provisions for locally configurable mappings. A DSCP —> PHB mapping table may contain both 1 —> 1 and N —> 1 mappings. All DSCPs must be mapped to some PHB; in the absence of some local
policy, DSCPs that are not mapped to a standardized PHB in accordance with that PHB’s specification should be mapped to the default PHB.

Network Resource Allocation

The implementation, configuration, operation, and administration of supported PHB groups in the nodes of a DS domain should effectively partition the resources of those nodes and the internode links between BAs, in accordance with the domain’s service-provisioning policy. Traffic conditioners can further control the use of these resources through enforcement of TCAs and possibly through operational feedback from the nodes and traffic conditioners in the domain. Although a range of services can be deployed in the absence of complex traffic-conditioning functions (for example, using only static marking policies), functions such as policing, shaping, and dynamic remarking enable the deployment of services providing quantitative performance metrics. The configuration of and interaction between traffic conditioners and interior nodes should be managed by the administrative control of the domain and may require operational control through protocols and a control entity. A wide range of possible control models exists. The precise nature and implementation of the interaction between these components is outside the scope of this architecture. However, scalability requires that the control of the domain not require micromanagement of the network resources. The most scalable control model would operate nodes in open loop in the operational time frame and would require only administrative-time-scale management as SLAs are varied. This simple model may be unsuitable in some circumstances, and some automated but slowly varying operational control (minutes rather than seconds) may be desirable to balance the use of the network against the recent load profile.
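The DSCP —> PHB mapping table described earlier can be sketched as a simple lookup with a default fallback. The EF and AF codepoints below are the recommended values from RFCs 2598 and 2597; folding the three class-1 AF codepoints onto one local queue is shown only as an example of a locally customized N —> 1 mapping, not the recommended behavior:

```python
# Sketch of a node-local DSCP -> PHB mapping table with both 1 -> 1 and
# N -> 1 entries. Unmapped codepoints fall back to the default PHB, as
# the architecture requires.
DEFAULT_PHB = "default"   # best-effort forwarding

DSCP_TO_PHB = {
    0b101110: "EF",     # Expedited Forwarding (RFC 2598), a 1 -> 1 mapping
    0b001010: "AF1",    # the three class-1 AF codepoints (RFC 2597) mapped
    0b001100: "AF1",    #   N -> 1 onto one local queue, purely as an
    0b001110: "AF1",    #   illustration of a locally customized mapping
}

def select_phb(dscp: int) -> str:
    """Select the forwarding behavior for a received packet's DSCP."""
    return DSCP_TO_PHB.get(dscp, DEFAULT_PHB)

print(select_phb(0b101110))  # EF
print(select_phb(0b111111))  # default: unmapped codepoint
```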
Additional Details on Queue Management

This section expands on the queue-management methodologies, discussed in Section 8.3, that have evolved in the past decade to support improved routing and QoS. These methodologies are now included in many midrange and high-end routers. The methodologies described here can be implemented with or without the presence of diffserv or intserv, but they are clearly more effective when there is underlying QoS protocol support. Queuing per se embodies two functions:

Congestion control (management). This mechanism controls the actual queue occupancy and can be designed to discard packets when a threshold is exceeded, depending on some specified algorithm.

Service scheduling. (In this context, service denotes packet transmission.) This mechanism is based on an appropriate criterion. Queues are serviced by the packet scheduler in a manner consistent with the intended traffic-administration goal. It is possible that in the process, some packets are treated in “an unfair manner,” whereby packets in certain queues are serviced more often than others, to provide differentiated treatment to certain flows.

By design, there is a finite amount of buffer space or memory in a queue; therefore, the number of packets that can be buffered within a queue needs to be controlled (congestion control). This shortage of buffers has more to do with the availability of queue servers (typically, time slots on an outgoing WAN communication link, whether a T1, DS-3, or SONET/SDH) than with memory size on the board, although memory can also become an issue at very high speeds (OC-48, OC-192, and OC-768, for example). This concept is similar to not having twenty telephone access lines into a customer care center staffed by only five people; what is the purpose of letting twenty customers simultaneously reach the care center if only five people are able to accept (service) the calls? The router or switch forwards packets at the line rate; however, when a burst occurs, or if the switch is oversubscribed and congestion occurs, packets are buffered. Some buffering is acceptable, but excessive buffering, particularly when supporting real-time (streaming) traffic, is counterproductive. Traditional first-in, first-out/first-come, first-served (FIFO/FCFS) queuing provides no service differentiation and can result in network performance problems, such as long delay, high jitter, and high packet loss. QoS requires routers to support some kind of queue scheduling and management to prioritize outbound packets and to control queue depth (which, in turn, minimizes congestion). An example of queue management is RED; examples of queue scheduling are Fair Queuing/Weighted Fair Queuing and Deficit Round Robin (DRR).
Queues reaching full occupancy are problematic because new flows cannot get through (called lock-out) and packets from existing flows can be dropped, causing across-the-board TCP slow starts (called global synchronization). Refer to RFC 2309 for a discussion. There are several packet-discard algorithms. The simplest is tail drop: once the queue fills up, any new packets are dropped. This is acceptable for UDP packets; however, there are disadvantages for TCP traffic. (This is not an issue for H.323 VoIP, although the data traffic will suffer.) With TCP, already established flows quickly go into a congestion-avoidance mode and exponentially drop the rate at which packets are sent. This problem, the aforementioned global synchronization, occurs when all TCP traffic sources are simultaneously increasing and decreasing their flow rates. Some flows should slow down so that other flows can take advantage of the freed-up buffer space. We previously mentioned that RED is an active queue-management algorithm that randomly drops packets before buffers fill up, reducing global synchronization (see Figure 8.13 [25]). For example, the algorithm can be set so that when the average queue occupancy is between 0 and 70 percent full, no packets are dropped; however, as the queue grows past 70 percent, the probability that random packets are discarded quickly increases, until the queue is full and the probability reaches 1.

Figure 8.13 RED. RED monitors the time-based average queue length (AvgLen) and drops arriving packets with increasing probability as AvgLen increases; no action is taken if AvgLen < MinTH, and all packets are dropped if AvgLen > MaxTH.

Weighted RED (WRED) takes RED one step further by giving some packets different thresholds at which their probabilities of discard start [9]. RED is useful only where congestion can occur (for example, WAN links). Also, it is largely useful when most of the traffic is TCP-based but not ideal in UDP environments. The packet scheduler, performing the service-scheduling function, selects packets from the queues and sends them out the egress port. There are several packet-scheduling algorithms that service the queues in a service-differentiated manner. Weighted Round-Robin (WRR) scans each queue, and depending on the weight assigned to a certain queue, WRR allows a certain number of packets to be pulled from the queue and sent out. The weights represent a certain percentage of the bandwidth. In practice, unpredictable delays are still experienced, since a large packet at the front of the queue may hold up smaller packets. WFQ is a more sophisticated packet-scheduling algorithm that computes the time that the packet arrives and the time to actually send out the entire packet. WFQ is therefore able to handle varying-sized packets and to optimally select packets for scheduling. WFQ is work-conserving, meaning that no packets idly wait when the scheduler is free. WFQ is also able to put a bound on the delay as long as the input flows are policed and the length of the queues is bounded. In Class-Based Queuing (CBQ), an algorithm used in many commercial products, each queue is associated with a class, where higher classes are assigned a higher weight, translating to relatively more service time from the scheduler than the lower-priority queues receive [9]. What follows amplifies these two concepts.
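The RED drop decision described above can be sketched as follows; the threshold and probability values are illustrative, not taken from any particular implementation:

```python
# Sketch of the RED drop decision: no drops below min_th, a drop
# probability rising linearly to max_p at max_th, and certain drops
# beyond max_th. avg_len is the time-averaged queue length.
import random

def red_drop(avg_len: float, min_th: float, max_th: float,
             max_p: float = 0.1) -> bool:
    if avg_len < min_th:
        return False                       # queue lightly loaded: never drop
    if avg_len >= max_th:
        return True                        # queue (nearly) full: always drop
    # linear ramp between the two thresholds
    p = max_p * (avg_len - min_th) / (max_th - min_th)
    return random.random() < p

# WRED simply gives each class of packet its own (min_th, max_th, max_p)
# triple, so lower-priority packets start being discarded earlier.
```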
Congestion-Management Techniques

Congestion-management techniques are methods implemented in routers to support the various signaling protocols and provide different classes of service. They involve
• Creating different queues for different classes of traffic
• An algorithm for classifying incoming packets and assigning them to different queues
• Scheduling packets out of the various queues and preparing them for transmission

Queue-scheduling algorithms are used to manage bandwidth resources by deciding which packets are to be sent out next. Different solutions have trade-offs in capabilities and complexity. There is no accepted standard, but most routers provide some sort of non-FIFO mechanism. There are several types of queuing techniques commonly implemented, as follows. (See References [1] and [26], as well as Figure 8.14 [6].)

First in, first out (FIFO) queues. In this method, packets are transmitted in the order in which they arrive. There is only one queue for all the packets. Packets are stored in the queue when the outgoing link is congested and are sent when there is no congestion. If the queue is full, packets will be dropped.

Weighted Fair Queuing (WFQ). In this method, packets are classified into different “conversations” by inspection of the TOS value, destination and source port numbers, destination and source IP addresses, and so on. One queue is maintained for each conversation. Each queue has some priority value or weight assigned to it (calculated from header data). Low-volume traffic is given priority over high-volume traffic (for example, DNS traffic over FTP traffic). After accounting for high-priority traffic, the remaining bandwidth is allocated fairly among multiple queues (if any) of low-priority traffic. WFQ also divides packet streams into separate packets so that the bandwidth is shared fairly among individual conversations.
The aim of WFQ is to ensure that low-volume, high-priority traffic gets the service levels it expects. WFQ also adapts itself whenever the network parameters change, and it cycles through the fair queues and picks up units of information proportional to the above calculation for transmission from each queue. WFQ prepares the way for RSVP, setting up the packet classification and scheduling required for the reserved flows. By using WFQ, RSVP can deliver guaranteed service. RSVP uses the mean data rate, the largest amount of data that the router will keep in the queue, and the minimum QoS to determine the bandwidth reservation. During congestion periods, ordinary data packets are dropped, but messages that carry control data continue to be placed on the queue.
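A simplified sketch of WFQ's finish-time computation follows; real implementations track a global virtual clock and per-flow state more carefully, and the names here are illustrative. Each flow keeps F(i) = max(F(i-1), A(i)) + P(i)/w(i), and the scheduler always transmits the packet with the smallest finish time, so a large packet cannot indefinitely hold up small ones:

```python
# Sketch of weighted fair queuing via per-flow virtual finish times.
import heapq

class WFQ:
    def __init__(self):
        self.last_finish = {}   # per-flow F(i-1)
        self.heap = []          # (finish_time, flow_name, packet_length)

    def enqueue(self, flow: str, arrival: float, length: int, weight: float):
        # F(i) = max(F(i-1), A(i)) + P(i)/w(i)
        start = max(self.last_finish.get(flow, 0.0), arrival)
        finish = start + length / weight
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, flow, length))

    def dequeue(self) -> str:
        # transmit the packet with the smallest virtual finish time
        return heapq.heappop(self.heap)[1]

wfq = WFQ()
wfq.enqueue("voice", arrival=0.0, length=200, weight=2.0)   # F = 100
wfq.enqueue("data", arrival=0.0, length=1500, weight=1.0)   # F = 1500
print(wfq.dequeue())  # voice: smaller finish time, despite equal arrival
```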
Figure 8.14 Queue-management techniques for contemporary routers. The figure summarizes the following schemes:

Fair Queuing. Objectives: fair access to bandwidth and resources in routers; no one flow receives more than its fair share. Assume queues are serviced in a bit-by-bit round-robin (BR) fashion, transmitting one bit from each queue. Because bits from different queues cannot actually be interleaved, compute when each packet would have left the router using BR: with P(i) the packet length of flow i, A(i) the time the packet arrives at the router, S(i) the time the router begins sending the packet, and F(i) the time the router finishes sending it, F(i) = S(i) + P(i), or F(i) = MAX(F(i-1), A(i)) + P(i). Each F(i) (timestamp) is computed for each packet, and the packet with the smallest F(i) value is transmitted.

Weighted Fair Queuing. The WFQ scheduler orders packets for departure based on “weights” signaled into the network from the source; for example, source 1 gets 50 percent, source 2 gets 33 percent, and source 3 gets 16.67 percent of the bandwidth. WFQ provides flow protection and can be used to bound delay.

Priority Queuing. Some traffic classes (voice, for example) require low latency and jitter; queuing strategies augmented with a strict priority queue for voice alongside WFQ classes for data are useful in supporting voice.

Weighted Round Robin (WRR). Assign a weight to each queue and service each non-empty queue in proportion to its weight in round-robin fashion. WRR is optimal when packet sizes are uniform, the number of flows is small, and connections are long.

Deficit Round Robin (DRR). DRR does not need to know the mean packet size up front. Assign a quantum to the queue scheduler and initialize each deficit counter (DC) to 0; service a non-empty queue if the packet size is less than or equal to the quantum plus the DC and update the DC, else add the quantum to the DC and try again on the next pass. DRR is simple to implement and has the same optimality caveats as WRR. (The figure walks through a two-pass example with a quantum of 1,000 bytes and queued packets of 1,200, 800, and 1,500 bytes.)
Quality of Service
Custom queuing (CQ). In this method, separate queues are maintained for separate classes of traffic. The algorithm requires an octet count to be set per queue. This octet count, rounded up to the nearest whole packet length, represents the amount of data scheduled for delivery, ensuring that the minimum bandwidth requirements of the various classes of traffic are met. CQ round-robins through the queues, picking the required number of packets from each. If a queue is empty, the next queue is serviced. CQ is a static strategy; it does not adapt to network conditions. The system takes somewhat longer to switch packets, since packets are classified by the processor card.
Priority queuing (PQ). One can define four traffic priorities—high, medium, normal, and low. Incoming traffic is classified and placed in one of the four queues. Classification criteria are protocol type, incoming interface, packet size, fragments, and access lists. Unclassified packets are put in the normal queue. The queues are emptied in the order high, medium, normal, and low. Within each queue, packets are served in FIFO order. During congestion, when a queue grows beyond a predetermined queue limit, packets are dropped. The advantage of PQ is its absolute preferential treatment of high-priority traffic, so that mission-critical traffic always gets top-priority treatment. The disadvantage is that PQ is a static scheme; it does not adapt itself to network conditions, and it is not supported on any tunnels.
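The PQ discipline just described — four queues emptied strictly in priority order, FIFO within each queue, and tail drop past a queue limit — can be sketched as follows. The class name and default limit are illustrative assumptions, not values from the text.

```python
from collections import deque

PRIORITIES = ("high", "medium", "normal", "low")

class PriorityQueuing:
    """Sketch of PQ: four FIFO queues emptied strictly in priority
    order; packets beyond the queue limit are dropped."""
    def __init__(self, limit=40):
        self.limit = limit
        self.queues = {p: deque() for p in PRIORITIES}

    def enqueue(self, packet, priority="normal"):
        # Unclassified packets go to the normal queue by default.
        q = self.queues[priority]
        if len(q) >= self.limit:
            return False                # drop during congestion
        q.append(packet)
        return True

    def dequeue(self):
        for p in PRIORITIES:            # high first, low last
            if self.queues[p]:
                return self.queues[p].popleft()
        return None                     # all queues empty
```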
Congestion-Avoidance Mechanisms
Whereas congestion management deals with strategies to control congestion once it has set in, congestion avoidance deals with strategies to anticipate and avoid congestion in the first place. There are several often-used strategies [1, 27]:
Tail drop. As usual, at the output there are queues of packets waiting to be scheduled for delivery. Tail drop simply drops an incoming packet if the output queue for the packet is full. When congestion abates, queues again have room, and tail drop allows packets to be queued. As noted earlier, the main disadvantage is the problem of TCP global synchronization (GS), where all the hosts send at the same time and stop at the same time. This can happen because tail drop can drop packets from many hosts at the same time.
RED drop. These strategies should be employed only on top of reliable transport protocols like TCP; only then can they act as congestion avoiders. RED starts dropping packets randomly when the average queue size exceeds a minimum threshold value. The rate of packet drop increases linearly as the average queue size increases, until the average queue size reaches the maximum threshold. After that, a certain fraction—designated by the mark probability denominator—of packets is dropped, once again randomly. The minimum threshold should be large enough that packets are not dropped unnecessarily, and the difference between the maximum and minimum thresholds should be great enough to prevent GS.
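The RED behavior described above can be sketched as a per-packet drop decision. This follows the text's description (a linear ramp between the thresholds, and a fixed fraction of 1/mark_probability_denominator dropped beyond the maximum); the function name and parameterization are illustrative.

```python
import random

def red_drop(avg_qsize, min_th, max_th, mark_prob_denominator):
    """RED drop decision: no drops below the minimum threshold, a
    linearly increasing random drop probability between the thresholds,
    and a fixed fraction (1/mark_prob_denominator) dropped randomly
    beyond the maximum threshold."""
    if avg_qsize < min_th:
        return False                    # no congestion, no drops
    max_p = 1.0 / mark_prob_denominator
    if avg_qsize >= max_th:
        return random.random() < max_p  # fixed-fraction random drop
    # Linear ramp from 0 at min_th up to max_p at max_th.
    p = max_p * (avg_qsize - min_th) / (max_th - min_th)
    return random.random() < p
```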
Chapter Eight
WRED drop. This is a RED-based strategy in which, in addition, the algorithm drops low-priority packets in preference to high-priority packets when the output interface starts getting congested. For intserv environments, WRED drops non-RSVP-flow packets; for diffserv environments, WRED looks at the IP precedence bits to determine priorities and hence which packets to selectively discard. WRED is usually configured at the core routers, since IP precedence is set at the edge routers. WRED drops more packets from heavy users than from light users, so that sources generating more traffic are slowed in times of congestion. Non-IP packets have precedence 0—that is, the highest probability of being dropped. The average queue size is computed as
Average = (old_average ∗ (1 − 2^(−n))) + (current_queue_size ∗ 2^(−n))
where n is the exponential weight factor configured by the user. A high value of n means a slow change in the average, which implies a slow reaction of WRED to changing network conditions: It will be slow to start and stop dropping packets. A very high n implies no WRED effect; a low n means WRED will be more in synch with the current queue size and will react sharply to congestion and decongestion. A very low n, however, means that WRED will overreact to temporary fluctuations and may drop packets unnecessarily.
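The exponentially weighted average above can be computed directly; the small sketch below shows how n controls responsiveness (the function name is illustrative).

```python
def wred_average(old_average, current_queue_size, n):
    """WRED exponentially weighted moving average:
    average = old_average * (1 - 2**-n) + current_queue_size * 2**-n.
    A large n tracks the queue slowly; a small n tracks it closely."""
    w = 2.0 ** -n
    return old_average * (1.0 - w) + current_queue_size * w

# A burst to queue size 100 from an empty queue:
#   n = 1 moves the average halfway (to 50) in one step,
#   n = 9 barely moves it (100/512), so transient bursts are ignored.
```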
Conclusion
To summarize this entire discussion of QoS: Packet purists believe that diffserv is the best approach to QoS because little if any state information is kept along the route, while those more in the carriers' camp believe that intserv is a better approach because resource reservations and allocations can be better managed in the network, in terms of being able to engineer networks and maintain SLAs. Either approach will probably provide reasonable results when the entire system is optimally designed. Purists note that in Frame Relay and ATM (and, to a degree, MPLS), the path forwarding state and traffic-management or QoS state are established for traffic streams on each hop along a network path. Traffic aggregates of varying granularity are associated with a label-switched path at an ingress node, and packets/cells within each label-switched path are marked with a forwarding label that is used to look up the next-hop node, the per-hop forwarding behavior, and the replacement label at each hop [19]. This model permits fine-granularity resource allocation to traffic streams, since label values are not globally significant but significant only on a single link; therefore, resources can be reserved for the aggregate of packets/cells received on a link with a particular label, and the label-switching semantics govern the next-hop selection, allowing a traffic stream to follow a specially engineered path through the network. This improved granularity comes at the cost of additional management and configuration requirements to establish and maintain the label-switched paths. In addition, the amount of forwarding state maintained at
each node scales in proportion to the number of edge nodes of the network in the best case (assuming multipoint-to-point label-switched paths) and in proportion with the square of the number of edge nodes in the worst case, when edge-to-edge label-switched paths with provisioned resources are employed.
8.5 Case Study
This section includes as a case study a White Paper generated by Cisco Systems, which pulls together nicely many of the QoS concepts described in this chapter [28]. A multiservice IP infrastructure must be enabled with IP traffic-measurement features to feed several applications, such as traffic engineering, billing, route analysis, performance monitoring, and service-attack analysis. Control and efficiency of network resources are becoming the objective of Internet service providers (ISPs) in addressing increased competition and decreasing profit margins. These economic challenges translate into service requirements and challenges that an IP router vendor must address; these service requirements in turn translate into network design requirements and IP platform requirements. This section focuses on platform requirements that help the network to predict the service performance of a real-time signal. The terms IP traffic management and measurement refer to the collective intelligence (specifically, hardware and software functions) within the Cisco 12000 Series Internet router that allows provisioning and monitoring of services, such as VOIP, that require predictability. Figure 8.15 shows a typical service provider topology for carrying long-haul or international VOIP traffic.
Real-Time Service Requirements
By their nature, voice and video applications do not need the network to deliver a fixed amount of bandwidth. What these applications do need is for the network to minimize transit delay and to keep delay within a reasonably narrow range. Within a VOIP connection, one endpoint takes a voice stream in digitized format, packetizes it, and then transmits it over the IP network. The network introduces some variation in the delay (jitter) with each packet delivered. The receiving endpoint depacketizes the voice stream, buffers it, and plays back the original signal. Buffering cancels network-induced jitter, and the voice signal can be played back at a steady rate as long as transit delay is contained within a narrow range. Packets arriving before playback time can be used to reconstruct the source signal; packets arriving after playback time are useless in reconstructing a real-time signal. Real-time applications require predictable service from the network; that is, they require a bound (known a priori) on the delivery delay of each packet. In general, lower delay is preferred. Typically, a one-way transit delay of up to 150 ms does not cause a perceivable degradation in voice quality for most telephony applications.
Figure 8.15 Cisco 12000 Series network supporting VOIP (IP phones, voice gateways, and gatekeepers at the POPs, other TDM LD carriers, and the PSTN interconnected through a QoS-enabled 12000 core alongside the public Internet).
To set playback time (essentially defining total application delay), a telephony application needs to have some information about the maximum delay (the statistical bound) that each packet will experience. Since all data is buffered until playback time, an application is indifferent regarding when data is delivered as long as it arrives before playback time. Certain telephony gateways use an adaptive algorithm to size jitter buffers; the adaptive algorithm uses the current value of jitter experienced at the terminating gateway to size the jitter buffer. A large contributor to network-induced jitter is the queuing delay that each packet accumulates in routers. There are two components of this jitter: one is caused by contention for resources between packets from multiple real-time applications, such as voice calls; the other is contention for resources between a real-time packet and a non-real-time packet. This jitter must be bounded and minimized by the network to support real-time communication. Network resources (bandwidth and buffers) are shared between voice and video streams, as well as Transmission Control Protocol (TCP) traffic, which is bursty and bandwidth-consuming. To provide predictable and reliable service for voice and other real-time applications, IP routers must implement IP traffic-management features, including intelligent scheduling algorithms (traffic isolation), congestion avoidance (traffic protection), and network outage recovery mechanisms to protect from node or path failures (traffic engineering and fast reroute).
Figure 8.16 VOIP quality with and without IP traffic-management features, plotting voice quality against link saturation over time, from lightly loaded to congested: under load, a router without traffic-management capabilities does not maintain voice quality. (Source: CiscoLabs)
Figure 8.16 shows the relationship between voice quality with and without QoS when congestion occurs in a network. The center curve shows how voice quality degrades significantly as a direct consequence of increases in traffic load without
QoS. The line at the top shows a network with QoS capabilities maintaining voice quality even during heavy network load conditions.
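The playout rule described earlier in this section — data is buffered until playback time, and packets arriving after playback time are useless — can be sketched as a simple split of received packets. The function name and tuple format are illustrative assumptions.

```python
def playable(packets, playback_delay):
    """Split received packets into usable and late under the playout
    rule: a packet sent at time t must arrive by t + playback_delay
    to be played back. Each packet is (send_time, arrival_time)."""
    usable, late = [], []
    for send_time, arrival_time in packets:
        if arrival_time <= send_time + playback_delay:
            usable.append((send_time, arrival_time))
        else:
            late.append((send_time, arrival_time))   # useless for playback
    return usable, late

# With a 100-ms playout delay, a packet delayed 50 ms is played back,
# while one delayed 180 ms arrives after its playback time and is lost.
```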
Technical Challenges One major challenge in providing end-to-end QoS in a shared infrastructure is supporting scalability. Traditional connection-oriented approaches such as TDM and ATM provide hard end-to-end QoS guarantees, which involve per-flow signaling, buffering, and scheduling and are difficult to implement in high-speed core routers. IP traffic-management methods based on traffic aggregation are required to control end-to-end service performance. A key element for planning and managing these enhanced services is having a scalable IP traffic-measurement capability. It is important that these methods be consistent across different nodes within a network and consistent across different media interfaces along a given path.
Cisco Solutions for Supporting IP-Based Real-Time Services
The following sections examine the components of the IP traffic-management features built into the Cisco 12000 Series Internet routers and the methods (architecture) for deploying them within a network to achieve predictability of service performance within a scalable and efficient infrastructure. These features consist of a control plane embedded in Cisco IOS software and a data plane fully implemented in hardware. The following features will be examined:
• Classification
• Policing and marking
• Shaping
• Scheduling
• Congestion manager
End-to-End IP QoS Architecture
Contention for network resources between applications with conflicting requirements is addressed by separating traffic into classes for different treatment. Classes can be defined to meet specific requirements, such as delay/jitter limits and packet-loss limits. Admission control to a class is needed to ensure that no more traffic gets admitted than the resources allocated to that class can satisfy. Contention for resources between traffic within the same class (the same requirement) is addressed either by call admission control (voice, video) or by congestion-avoidance mechanisms (TCP flows).
intserv defines fine-grained (flow-based) methods of performing IP traffic admission control that uses RSVP; diffserv defines methods of classifying IP traffic into coarse-grained service classes and defines forwarding treatment based on these classifications. The end-to-end Cisco QoS architecture is based on a combination of diffserv architecture and the integration of resource admission control from RSVP/intserv.
intserv/RSVP The intserv model inherits the connection-oriented approach from telephony network design. Every individual communication must explicitly specify to the network its traffic descriptor as well as requested resources. The edge router performs an admission control to ensure that available resources are sufficient in the network. The intserv standard assumes that routers along a path set and maintain state for each individual communication. The intserv model has not been widely adopted in the Internet, because the connection-oriented approach assumes a “flat” model of the Internet, that is, one that is administratively homogeneous. The number of connections required to handle all traffic of the Internet leads to a state explosion in the core routers. However, the resource admission control concept defined within intserv is a useful tool to manage application-level traffic with strict QoS requirements. The role of RSVP in the Cisco QoS Architecture is to provide resource admission control for VOIP networks. If resources are available, RSVP accepts a reservation and installs a traffic classifier in the QoS forwarding path. The traffic classifier tells the QoS forwarding path how to classify packets from a particular flow and what forwarding treatment to provide. The installation of a traffic classifier and flow treatment is the interface between RSVP and diffserv. RSVP is a control plane feature that limits accepted VOIP load to what the network can support. Integration of resource-based admission control with the diffserv network (RSVP aggregation) aims at achieving scalable, strict QoS guarantees for VOIP calls.
diffserv
The diffserv approach divides QoS into a number of functional elements to be implemented in network interfaces. Each element provides a traffic-control function or forwarding treatment (PHB) on a packet-by-packet basis. The standard defines three types of functional elements:
1. A small set of PHBs. Typically, a PHB represents the scheduling and discard priorities that a packet should receive on a router interface. Each PHB is identified by a DSCP, which occupies six bits of the TOS byte in the IP header.
2. Packet classification.
Figure 8.17 diffserv PHB functionality (boundary interfaces: classification, policing, marking, shaping, and scheduling; core interfaces: classification and scheduling only).
3. Traffic-conditioning functions, including metering, marking, shaping, and policing.
This model, as shown in Figure 8.17, achieves scalability by implementing complex classification and conditioning functions at network boundary interfaces and by applying a PHB to aggregates of traffic (BAs) in the core. A simple traffic reclassification may be required at the core. An important component of the QoS architecture is MPLS QoS. MPLS may be deployed within an IP network, for example, to provide a scalable VPN solution, traffic engineering/tunneling, and other applications. Because MPLS is not an end-to-end host protocol, an MPLS infrastructure is required to support IP QoS features rather than extend them to a new QoS. It is important to note that QoS is an end-to-end characteristic of an application or host that should not be altered by the transport technology (in this case, MPLS).
Classification
Classification is the process of defining a class of traffic and identifying packets that belong to the same class of traffic or, more simply, packets from applications having the same performance requirements. (See Figures 8.18 and 8.19.) TCP/IP traffic classification can be done in two modes: BA or multifield (MF). BA classification is based only on the diffserv DSCP; for best scaling, this mode is more suitable in the core. MF classification selects packets based on a combination of one or more fields of the TCP/IP header. All fields are user-programmable and include input interface, Media Access Control (MAC) address, DS field, source and destination IP address/prefix, source and destination port, and protocol type. MF mode is typically specified by access control lists (ACLs), against which the classification criteria are matched. Its implementation is recommended to be as close to the edge as possible.
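MF classification as described above — matching a combination of header fields against ACL-like entries, first match wins — can be sketched as follows. The field names, rule format, and example classes are illustrative assumptions, not Cisco syntax.

```python
def classify(packet, rules, default="best-effort"):
    """Multifield (MF) classification sketch: each rule pairs a dict of
    header fields that must all match (an ACL-like entry) with the
    traffic class to assign; the first matching rule wins."""
    for match, traffic_class in rules:
        if all(packet.get(k) == v for k, v in match.items()):
            return traffic_class
    return default                      # unmatched traffic

# Hypothetical rule list matching on protocol and destination port.
rules = [
    ({"protocol": "udp", "dst_port": 16384}, "voice"),
    ({"protocol": "tcp", "dst_port": 80}, "web"),
]
```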
Figure 8.18 An example of diffserv and RSVP integration: RSVP Path and Resv messages travel end to end between the flow source and the flow destination. The diffserv entry router performs MF policing, admission, MF-to-BA translation, and BA or MF shaping and scheduling; transit routers perform BA policing, shaping, and scheduling; and the diffserv exit router performs admission and MF scheduling. (MF = micro flow; BA = behavior aggregate; control-path and forwarding-path requirements are shown separately.)
Figure 8.19 Classification at the edge: a network-edge packet classifier sits between the customer-controlled router at the customer premises and the provider network.
Cisco IOS software offers flexible and multipolicy classification solutions that address current requirements and scale to future needs.
Metering
Traffic metering provides traffic controls that accommodate temporary bursts while limiting traffic sources to a long-term average rate. (See Figure 8.20.) Cisco IOS software uses a token bucket technique to measure the traffic rate. Token bucket rate control is not intended to shape traffic streams. Token buckets are specified by defining three traffic parameters:
1. Committed rate—Measured in bits per second, this is the long-term average rate permitted for the traffic source. Tokens are inserted into the bucket at the committed rate.
2. Normal burst—Measured in bytes, this allows for temporary bursts of packets that conform to the token bucket limit. The normal burst represents the bucket's depth.
3. Excess burst—Provides a "bonus round" in which excess packets are gradually dropped, warning the violating traffic source to slow down before groups of packets are discarded for exceeding the rate limit.
When traffic arrives, if sufficient tokens are available, the traffic is said to conform; if not, the traffic is said to exceed. The appropriate action policy is then executed.
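The conform/exceed decision above can be sketched with a basic token bucket. This is a simplified illustration, not the Cisco implementation: it uses one consistent unit (bytes and bytes per second) rather than the bits/bytes mix in the parameter list, and it omits the excess-burst "bonus round."

```python
class TokenBucket:
    """Token bucket meter sketch: tokens accrue at the committed rate
    up to the normal burst (the bucket's depth); a packet conforms if
    enough tokens are available, otherwise it exceeds."""
    def __init__(self, committed_rate, normal_burst):
        self.rate = committed_rate      # tokens added per second
        self.depth = normal_burst       # bucket depth (max tokens)
        self.tokens = normal_burst      # start with a full bucket
        self.last = 0.0

    def conforms(self, size, now):
        # Refill for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True                 # conform action applies
        return False                    # exceed action applies
```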
Figure 8.20 IP traffic metering: tokens are inserted into a bucket of depth B (burst size) at rate limit P and extracted for arriving packets; packets within the normal burst conform, while the packet-discard percentage rises through the excess-burst region to 100 percent, beyond which packets exceed.
Policing and Marking
Policing is the process of ensuring that incoming traffic belonging to a given class conforms to the traffic profile defined (either signaled or provisioned) for that class. Policing happens at the ingress of a service provider network (domain). Typically, the profile is specified in terms of traffic type, incoming interface, average rate, and instantaneous rate limits. A flexible policing implementation can also be used to protect against attacks. If the traffic does not conform because the source is exceeding its allocated resources, corrective/proactive actions are needed to protect conforming traffic. Options include dropping the packets in excess of the contract, marking the packets as lower priority in the scheduler, or simply passing the packets through. In Cisco IOS software, the policer and marker components are implemented via the committed access rate (CAR). Typically, the user specifies both metering and action parameters. The policer and marker components can operate in four modes:
1. Policing off (transmit)—Every packet is transmitted
2. Policing only—Nonconforming packets are dropped
3. Policing and marking (set precedence and transmit)—Nonconforming packets are marked
4. Marking—All incoming packets are marked
From an operational perspective, CAR allows the user to specify an action to be taken for packets that either conform to or exceed the specified rate limit. The conform-actions and exceed-actions from which drop policies are selected are as follows:
• Transmit—Switch the packet
• Set precedence and transmit—Mark the precedence or DSCP bits and then switch
• Continue—Evaluate the next rate limit in a chain of rate-limit statements
• Set precedence and continue—Mark the precedence or DSCP bits and then evaluate the next rate limit in the chain
• Drop—Discard the packet
The packet-classification functionality of CAR is an outcome of its traffic-matching and rate-measurement functionality; it is not a standalone capability. Multiple policing actions can be specified for an interface as long as they are associated with token bucket parameters. If a packet matches the traffic-matching specification associated with a rate limit, the associated token bucket is examined and the conform or exceed action is executed. Once the transmit action is selected and executed, subsequent policing statements are not evaluated. CAR policing statements can be nested by using the continue keyword. The default action at the end of a list of one or more rate-limit statements is to transmit the packet.
Traffic Shaping
A traffic shaper delays some or all outgoing IP packets to bring a stream into compliance with the traffic profile associated with the output link on which it will be sent. Typically, traffic shaping happens at the egress of a service provider network (domain). A shaper uses a buffer to store the incoming packets in excess (the burst) and to delay their transmission to the output interface. Typically, the shaper uses the result of the traffic meter to decide whether a packet should be expedited or delayed. Distributed traffic shaping (DTS) is the implementation of the Cisco IOS traffic-shaping scheme on a per-line-card basis. DTS on the Cisco 12000 Series Internet router provides traffic shaping regardless of the encapsulation configured on the interface; it can shape the output traffic to a specified bit rate and buffer excess packets in the shape queue for later transmission. Each shape queue uses first in, first out (FIFO) as its scheduling scheme. Traffic-shaping parameters (traffic descriptors) are user-configurable and include three components:
1. Committed information rate (CIR), specified in bits per second to sustain
2. Committed burst size (Bc), specified in bits per burst
3. Excess burst size (Be), specified in bits of queuing maintained in the pipeline
The traffic shaper smooths the incoming traffic to the average bit rate defined by the CIR before sending it out on the outgoing link. The Bc is used to derive the time interval over which the shaper monitors arriving traffic (time interval = Bc/CIR). The user defines the expected average bit rate and the acceptable burst size for that shape entity. The excess burst size defines the acceptable number of bits permitted beyond the burst size.
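The interval relation above is simple arithmetic; a worked example (the function name is illustrative) makes the time scale concrete.

```python
def shaper_interval(cir_bps, bc_bits):
    """Monitoring interval derived from the relation in the text:
    Tc = Bc / CIR (seconds). Per interval, the shaper releases at
    most Bc bits, plus any acceptable excess burst Be."""
    return bc_bits / cir_bps

# A CIR of 128 kbit/s with Bc = 8,000 bits yields a 62.5-ms interval;
# halving the CIR to 64 kbit/s doubles the interval to 125 ms.
```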
Policy-Based Routing Policy-based routing (PBR) is a Cisco IOS QoS feature that allows users to override routing results obtained from a standard IP routing engine. Customers can define policies that selectively cause packets to take a different path than the one computed by the routing algorithm. The selected traffic can then benefit from a high-bandwidth or low-delay link. PBR works in conjunction with a Cisco IOS software classifier that filters traffic to which the policy should be applied. PBR may also work with a marker to set IP precedence bits before forwarding a packet to a defined path. PBR is typically used at network peering interfaces; incoming traffic that matches a defined user policy is directed to a private Internet connection (private peering link), whereas the rest of the traffic is directed to a public connection (public peering link).
Traffic Scheduling
The Cisco 12000 Series uses two-stage scheduling: input scheduling into the switching fabric and output scheduling from the switching fabric. For each stage, a combination of virtual output queuing (VOQ), round-robin, and an enhanced version of modified deficit round-robin (MDRR) is used to control and isolate voice and video traffic. RED/WRED is used for traffic protection within a class. Each traffic class is assigned a queue that isolates its traffic from the traffic of other classes. The role of the scheduler is to provide a traffic class with a level of service that is independent of other traffic behavior. The scheduler uses multiple queue-service disciplines to assign an amount of link bandwidth to a class, and it must support head-of-line (HOL) blocking avoidance. As shown in Figure 8.21, HOL blocking occurs when a packet at the head of a queue blocks traffic to all destinations because its destination line card is unavailable. Typically, the scheduler is provisioned so that the voice and video classes have controlled latency and jitter. VOQ is a queuing technique in which packets from the routing engine or switching fabric are sorted according to output interface and IP precedence (Figure 8.22). At the receive side, an output port is the destination slot of a packet; this queue is called the local output queue (LOQ). At the transmit side, the port is a physical port connected to a network; this queue is called the output queue (OQ). Along with these queues, there are eight multicast queues per line card. VOQ must be implemented to avoid head-of-line blocking.
Figure 8.21 Head-of-line blocking: with a single queued fabric in front of line cards LC0 through LC3 and without virtual output queuing, the second packet in a queue is blocked even though line card 2 (LC2) is ready to receive traffic.
LOQs are serviced in hardware using a combination of RR and MDRR. The RR algorithm cycles through the queues one after another, transmitting one packet before moving to the next queue. Each LOQ, along with multicast, is serviced round-robin. Within a group, the queues are serviced by MDRR. MDRR enhances DRR by adding a high-priority (HP) queue. The DRR servicing discipline tracks the byte deficit (specifically, the difference between the number of bytes that should have been sent and the number of bytes that have been sent) for each queue and uses it to regulate the long-term bandwidth assigned to the queue. The MDRR service algorithm can work in two modes:
1. Strict-priority mode, where the high-priority queue is serviced first. Only when all high-priority traffic is cleared are the other queues considered; the remaining queues are serviced in DRR fashion. This mode has the advantage of guaranteeing minimum latency for the high-priority queue.
2. Alternate-priority mode, where a quantum of data is taken from the high-priority queue, followed by one quantum from one of the other queues (chosen via DRR), followed again by a quantum from the high-priority queue, and so on.
Figure 8.22 VOQ/CoS queuing in the Cisco 12000 Series router.
(Incoming traffic on the receive line card passes through CAR, CEF, and WRED into per-CoS local virtual output queues serviced by MDRR/DRR, across the crossbar switch fabric, and into per-CoS queues on the transmit line card, again serviced by MDRR/DRR.)
Figure 8.23 MDRR operation in strict-priority and alternate-priority modes (LLQ = low-latency queue).
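The DRR discipline underlying MDRR — a quantum added to each queue's deficit counter per pass, with the head packet sent only when it fits within the counter — can be sketched as follows. This is textbook DRR, not Cisco's implementation, and it omits the MDRR high-priority queue; names and values are illustrative.

```python
from collections import deque

def drr_schedule(queues, quantum):
    """Deficit round robin sketch: on each pass a non-empty queue's
    deficit counter grows by the quantum; head packets are sent while
    their byte length fits within the counter, which is then reduced.
    Queues are lists of packet lengths; returns (queue_index, length)
    in transmission order."""
    queues = [deque(q) for q in queues]
    deficits = [0] * len(queues)
    sent = []
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                deficits[i] = 0         # an idle queue carries no credit
                continue
            deficits[i] += quantum
            while q and q[0] <= deficits[i]:
                pkt = q.popleft()
                deficits[i] -= pkt
                sent.append((i, pkt))
    return sent

# With a 1,000-byte quantum, an 800-byte packet is served on the first
# pass while a 1,500-byte packet must wait for a second quantum.
```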
Congestion Manager
The previous section described the way the scheduler provides isolation between multiple traffic classes. The scheduler, however, has no way to differentiate between traffic in the same queue. The congestion manager, known as drop preference, protects one traffic class (an application or a user) from the misbehavior of other traffic classes within the same queue. The Cisco 12000 Series Internet routers use WRED as a mechanism for adjusting the congestion notification to TCP sources without causing TCP GS. WRED is an active queue-management technique that uses a weighted value of a queue's average occupancy, discard thresholds, traffic priority (such as DSCP or MPLS EXP bits), and a random function to decide whether to drop a packet during network congestion. When multiple traffic classes are combined into the same queue, each class uses a different set of RED parameters. Typically, the drop threshold for higher-priority traffic is set above that used for lower-priority traffic, meaning that as the average depth of the queue increases, the drop probability increases for the lower-priority traffic first while the drop probability for the higher-priority traffic remains the same. Unlike some other network equipment vendors' implementations of WRED, the Cisco 12000 Series Internet router computes the drop probability of an incoming packet based on a weighted average queue length instead of the current queue length. Using an average queue length allows RED to ignore short (transient) bursts and react only to persistent congestion; by contrast, a drop probability based on the current queue length adversely affects QoS and network stability.
Traffic Engineering
Network traffic volumes can vary over time and by circumstance. Real-world variables, such as time of day, day of week, and natural disasters, can create extra traffic loads that cannot be easily handled by the network paths computed by the IP layer. Traffic engineering is the ability to control the entire path that traffic will use to move from an ingress point to an egress point within a network. This means
Chapter Eight
that each individual flow can be forwarded independently of the rest of the traffic and independently of the paths chosen by Layer 3. Based on the offered load, a portion of the traffic can be dynamically shifted onto less-used paths. An added benefit is higher use of the available bandwidth. The connection characteristics of MPLS provide a practical way to exercise traffic engineering within an IP network. MPLS tunnels are created explicitly between an ingress router and an egress router of a network to carry a portion of, or all, the IP traffic between the ingress and egress routers. The hierarchical and connection characteristics of MPLS provide the ability not only to set up a traffic path but also to define a protected path around a link or node to cover a link or node failure. This concept is known as fast reroute.
Notes
1. Products may have signaling, but generally speaking, this signaling is not sufficiently robust in design; nor does it support interworking in an efficient, pervasive, reliable, and routine manner.
2. For over two decades, the senior author has advanced a heuristic that states, "the probability that a standard achieves broad implementation and deployment is inversely proportional to the size (measured in pages, words, or states) of the specification." Very few protocols thicker than 1 inch have seen broad implementation and deployment in this field (the reader is encouraged to identify a few). Upon reflection, the reason is obvious: The thicker the specification document, the larger the hardware chipset, real estate, power, chassis size, fan assembly, and hardware debugging needed. Also, the thicker the specification document, the larger the software coding effort, the programming complexity, the memory, the CPU power, and the debugging needed. The time to market will be perilously elongated as well.
3. The listing is based on www.cis.ohio-state.edu/~jain/refs/ipq_book.htm.
4. However, if IP is actually deployed at the core of the network in support of VOIP, as discussed in Chapter 11, the QoS can also initially be targeted for the core.
5. Grade of service relates to an overall level of service delivery (similar to an SLA-oriented view), while QoS refers to the achievement of specific network parameters within defined ranges, such as 0.100 < delay < 0.200 s.
6. The $700-billion debt created by the telecommunications industry in 2000 and the abundance of carrier failures in 2001 strongly argue not only for mathematically sound, statistically significant primary market research, but also for mathematically sound forecasting of demand and analytical decision making regarding engineering and deployment.
7. WRED provides per-class RED queues. A variant is Flow RED (FRED), which supports per-flow queues.
8. This replaces the case study paper that was included in the first edition of this book.
CHAPTER 9
Voice over MPLS and Voice over IP over MPLS
9.1 Introduction and Background
Voice over multiprotocol label switching (VOMPLS) networks are receiving industry attention. MPLS is an emerging standard that provides a link layer-independent transport framework for IP. Specifically, MPLS runs over Asynchronous Transfer Mode (ATM), Frame Relay (FR), Ethernet, and Point-to-Point Protocol (PPP) links. MPLS adds connection-oriented capabilities to the connectionless IP architecture while using existing IP mechanisms for addressing of elements and routing of traffic. Proponents see MPLS as a key development in IP/Internet technologies to help add capabilities—including traffic engineering (TE) capabilities—essential to today's best-effort packet networks, providing traffic with different qualitative Classes of Service (CoS) and different quantitative Quality of Service (QoS), as well as providing IP-based virtual private networks (VPNs). It is expected that MPLS will help address the ever-present scaling issues faced by the Internet as it continues to grow (see Figure 9.1). MPLS enjoys certain attributes that conceivably make it a better technology than pure IP for the support of packetized voice applications. Two approaches have evolved: (1) voice over MPLS (VOMPLS) and (2) voice over IP (VOIP) with IP over, or encapsulated in, MPLS (VOIPOMPLS). IP purists prefer the latter; MPLS advocates prefer the former. In this chapter, VOMPLS refers generically to either approach, except where noted. The MPLS Forum, an industrywide association of leading networking and telecommunication companies focused on accelerating the deployment of MPLS,
Chapter Nine
Figure 9.1 MPLS features of interest. The fundamental features of MPLS are traffic engineering (TE), QoS, and VPNs (Layer 2 and Layer 3); all three are of interest to VOMPLS/VOIPOMPLS.
announced in July 2001 that its members have approved for general release an implementation agreement for VOMPLS. The MPLS Forum defines VOMPLS as voice traffic carried directly over MPLS without IP encapsulation of the voice packet. VOMPLS represents the use of MPLS as an efficient transport of voice services within an IP MPLS network. The announcement, representing the MPLS Forum’s first completed implementation agreement, provides a standards-based approach for service providers offering voice over packet (VOP) services to interconnect voice media gateways over their MPLS networks. According to the Application and Deployment Working Group of the MPLS Forum, the VOMPLS implementation agreement is an important milestone toward the deployment of reliable IP-based voice services on multiservice networks. This standard will accelerate product innovation and the deployment of MPLS-based equipment. According to the working group, voice is a key application in the success of core network technologies. The VOMPLS implementation agreement enables voice to be carried directly on the MPLS network, filling a significant gap in current MPLS standards. While developing this implementation agreement, the MPLS Forum cooperated with the ITU-T study groups. As a result, two ITU-T study groups—SG 11 (for signaling requirements and protocols) and SG 13 (for multiprotocol and IP-based networks)—have recently initiated work on VOMPLS. This chapter is a short synopsis of a 400-page book on the topic of VOMPLS published by the senior author [1].
9.2 Motivations
Two technical factors that so far have held back the deployment of VOP and VOIP on a broad scale are (1) QoS considerations for packet networks and (2) robust
signaling in support of interworking with the embedded PSTN, which is not going to go away at any time in the next decade or two. MPLS is a new technology that is expected to be used by many future core networks, including converged data and voice networks. The promise of MPLS is to
1. Provide a connection-oriented protocol for Layer 3 IP
2. Support the ability to traffic-engineer the network
3. Support wireline-speed forwarding of protocol data units
MPLS enhances the services that can be provided by IP networks, offering TE (that is, specified routes through the network), guaranteed QoS, and VPNs. (MPLS, however, is not really a multiprotocol system: it works only with IP, not other protocols, although it can be multiservice.) QoS is where MPLS can find its best potential in supporting voice applications. The improved traffic management, the QoS capabilities, and the expedited packet forwarding via the label mechanism can represent significant technical advantages to voice [2]. The following predicaments have motivated the development of a new technology [3]:
• New applications require services that are deterministic in nature. The specific service characteristics required by the applications must be guaranteed across the complete network path that the application data traverses. Providing deterministic service over the nondeterministic IP network presents a major challenge.
• Current routing technology chooses the best available path based only on the destination address; the attributes of the application data are not considered.
• As the network grows, increased demand is placed on the routers to handle huge amounts of routing information in addition to the application data. Moreover, the forwarding decision made at each hop as a packet travels from one router to another inhibits scalability and performance.
MPLS addresses a number of the aforementioned predicaments.
It is yet another type of network compared with IP, FR, and ATM. What follows are some key highlights of MPLS [4]:
• Improves packet-forwarding performance in the network:
  • Enhances and simplifies packet forwarding through routers using Layer 2 switching paradigms
  • Is simple, allowing for easy implementation
  • Increases network performance by enabling routing via switching at wireline speeds
• Supports QoS and CoS for service differentiation:
  • Uses traffic-engineered path setup and helps achieve service-level guarantees
  • Incorporates provisions for constraint-based and explicit path setup
• Supports network scalability:
  • Can be used to avoid the N² overlay problem associated with meshed IP over ATM networks
• Integrates IP and ATM in the network:
  • Provides a bridge between access IP and core ATM
  • Can reuse existing router and ATM switch hardware, effectively joining the two disparate networks
• Builds interoperable networks:
  • Is a standards-based solution that achieves synergy between IP and ATM networks
  • Facilitates IP over synchronous optical network (SONET) integration in optical switching
  • Helps build scalable VPNs with TE capability
In MPLS networks, the packet-forwarding functions are decoupled from the route-management functions. In these networks, packets are forwarded based on short labels. The traditional IP header analysis is not performed at each hop; instead, each packet is assigned to a flow once, when it enters the network. MPLS label switching uses Layer 3 routing information while performing the switching, with hardware support, at Layer 2. Consequently, MPLS results in high-speed routing of information (data, voice, video, and multimedia) through the network, based on parameters such as QoS and application requirements. MPLS does not replace IP routing but, instead, works alongside existing routing technologies to provide very high speed data forwarding between label-switched routers (LSRs). Figure 9.2 further highlights the separation of functions. Figure 9.3 shows a typical network [5]. Figure 9.4 depicts the basic operation of MPLS. At first pass, MPLS is a better packet technology for voice support because of its QoS capabilities, its TE capabilities, and its ability to create Layer 2 or Layer 3 tunnels.
Also, MPLS allows better integration with the ATM networks already deployed in carrier environments than a pure IP network does. It is important to realize, however, that MPLS per se is not a QoS solution; it still needs a distinct mechanism to support QoS. As discussed later, either a diffserv or an intserv mechanism can be used. (For some multitiered networks, both diffserv and intserv could be used.)
Figure 9.2 Control/data plane in MPLS. The control plane comprises addressing (IP), signaling (CR-LDP or RSVP-TE), and routing (OSPF-TE, IS-IS-TE); the data plane transports cells, frames, and optical payloads. (IS-IS: Intermediate System-to-Intermediate System; CR-LDP: Constraint-Based Routed Label Distribution Protocol; RSVP-TE: Resource Reservation Protocol-Traffic Engineering.)
Figure 9.3 MPLS elements.
An LSR is a high-speed router device in the core of an MPLS network that participates in the establishment of LSPs using the appropriate label signaling protocol and in the high-speed switching of the data traffic based on the established paths.
An LER (label edge router) is a device that operates at the edge of the access network and the MPLS network. LERs support multiple ports connected to dissimilar networks (such as frame relay, ATM, and Ethernet); they forward this traffic onto the MPLS network after establishing LSPs, using the label signaling protocol at the ingress, and distribute the traffic back to the access networks at the egress. The LER plays a very important role in the assignment and removal of labels as traffic enters or exits an MPLS network.
The forwarding equivalence class (FEC) is a representation of a group of packets that share the same requirements for their transport. All packets in such a group are provided the same treatment en route to the destination. As opposed to conventional IP forwarding, in MPLS the assignment of a particular packet to a particular FEC is done just once, as the packet enters the network. FECs are based on service requirements for a given set of packets or simply on an address prefix. Each LSR builds a table to specify how a packet must be forwarded. This table, called a label information base (LIB), is comprised of FEC-to-label bindings.
Figure 9.4 Basic MPLS operation. (Courtesy: Altera, altera.com.) The diagram illustrates the flow of a packet through an MPLS-enabled network. The source network is on the left and the destination network on the right; the large cloud in the center is the wide area network (WAN).
1a. Existing routing protocols (OSPF, IS-IS) establish the reachability of the destination networks.
1b. The label distribution protocol (LDP) establishes label-to-destination network mappings.
2. The ingress edge LSR receives a packet, performs Layer 3 value-added services, and labels the packet.
3. Core LSRs switch the packet using label swapping.
4. The egress edge LSR removes the label and delivers the packet.
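The numbered steps can be reduced to a toy push/swap/pop walk-through in Python; the table names, the FEC name, and the label values (17, 42) below are hypothetical, chosen only to illustrate the sequence:

```python
# Hypothetical label bindings, invented for illustration.
INGRESS_FEC = {"dest-net": 17}   # step 2: FEC -> label pushed at the ingress LER
CORE_LIB = {17: 42}              # step 3: incoming label -> outgoing label at a core LSR

def ingress(packet, fec):
    """Push: classify once at the edge and prepend the FEC's label."""
    return (INGRESS_FEC[fec], packet)

def core(labeled):
    """Swap: a pure table lookup; no IP header analysis at transit LSRs."""
    label, packet = labeled
    return (CORE_LIB[label], packet)

def egress(labeled):
    """Pop: strip the label and deliver the original packet."""
    _, packet = labeled
    return packet
```

Chaining the three stages, `egress(core(ingress(p, "dest-net")))` returns the original packet `p`, mirroring steps 2 through 4 above.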
9.3 Basic MPLS Features
Table 9.1 identifies the key RFCs supporting MPLS that were available as of this writing. There are about 25 IETF Internet Drafts, as well as approximately 100 individual submission papers to the IETF, on the topic of MPLS. As of this writing, however, there were no IETF drafts or submittals on the topic of VOMPLS. The LSRs provide a connection-oriented service, similar to ATM and FR permanent virtual circuits (PVCs), using label-switched paths (LSPs). At each node, the label on the incoming packet is used for a table lookup to determine the outbound link and a new label (this is the label-swapping mechanism). A new shim header is required except on links to ATM switches, which reuse the virtual path identifier/virtual channel identifier (VPI/VCI) fields in cells. Labels have local (single-hop) significance only. MPLS adds tags to IP packets at ingress routers, as shown in Figure 9.5. The tags are used by MPLS switches in making the forwarding decisions, which enables the MPLS switch/router to direct traffic along a path that the routing engine would not necessarily pick (this is called constraint-based routing). The tags are stripped at the egress switch/router. Initially, MPLS was a TE tool; its original performance-enhancement goal was to direct traffic along a path with available capacity. Now, MPLS is seen as offering QoS-supportive infrastructures, as well as other capabilities. MPLS supports reservation of bandwidth for traffic flows with differing QoS requirements, and it has the ability to bring reliability and predictability to IP-based networks, particularly by way of its support of Service Level Agreements (SLAs), CoS, Grade of Service (GoS), and QoS. MPLS has the ability to converge multiple networks, technologies, and services into a single core network. As noted, VOMPLS is voice traffic carried directly over MPLS without IP encapsulation of the voice packet, representing the use of MPLS as an efficient transport of voice services within an IP/MPLS network.
Furthermore, MPLS has the potential to bring operational savings by streamlining provisioning, and it can make bandwidth management more efficient by supporting dynamic bandwidth allocation. The default MPLS connection establishment creates multiple "trees," each rooted on a single egress node, in lieu of setting up N² point-to-point paths between ingress and egress routers (see Figure 9.6 for an example). The reachability information is automatically managed by using the Open Shortest Path First (OSPF) or Intermediate System-to-Intermediate System (IS-IS) algorithm as the nodes boot up. The nodes notify their nearest neighbors of label assignments via a Label Distribution Protocol (LDP). The paths merge as they approach the egress node, simplifying the routing table. An optional capability supports constraint-based routing, which can be used by service providers to engineer the path where the traffic is to be transited (by planner's choice). The constraint-based routes are calculated either through Constraint-Based Routing LDP (CR-LDP) or through an
Table 9.1 RFCs on the IETF Standard Track as of Mid-2001

RFC 2702: Requirements for TE over MPLS. Identifies the functional capabilities required for implementing policies that facilitate efficient, reliable network operations in an MPLS domain. These capabilities can optimize the use of network resources and enhance traffic-oriented performance characteristics.
RFC 3031: Specifies the MPLS architecture.
RFC 3032: MPLS label-stack encoding. Specifies the encoding to be used by an LSR for transmitting labeled packets on Point-to-Point Protocol (PPP) data links, on LAN data links, and possibly on other data links. Also specifies rules and procedures for processing the various fields of label-stack encoding.
RFC 3034: Use of label switching on Frame Relay networks specification. Defines the model and generic mechanisms for MPLS on FR networks. Extends and clarifies portions of the MPLS architecture and the LDP relative to FR networks.
RFC 3035: MPLS using LDP and ATM VC switching. Specifies in detail the procedures to be used when distributing labels to or from ATM LSRs when those labels represent FECs for which the routes are determined on a hop-by-hop basis by network layer-routing algorithms. Also specifies the MPLS encapsulation to be used when sending labeled packets to or from ATM LSRs.
RFC 3036: LDP specification. Defines a set of procedures called LDP by which LSRs distribute labels to support MPLS forwarding along normally routed paths.
RFC 3037: LDP applicability. Describes the applicability of LDP, by which LSRs distribute labels to support MPLS forwarding along normally routed paths.
RFC 3038: VCID notification over ATM link for LDP. Specifies the procedures for the communication of VCID values between neighboring ATM LSRs.
RFC 3033: Specifies the assignment of the information field and protocol identifier in the Q.2941 Generic Identifier and Q.2957 User-to-User Signaling for the Internet Protocol.
RFC 3107: Carrying label information in BGP-4. Specifies the way in which the label-mapping information for a particular route is piggybacked in the same BGP Update message that is used to distribute the route itself. When BGP is used to distribute a particular route, it can also be used to distribute an MPLS label that is mapped to that route.
augmented OSPF or IS-IS. In the latter case, RSVP route-establishment methods—specifically, Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) methods—determine the path between ingress and egress routers. This method is well accepted by the leading router vendors. Two approaches to MPLS use have emerged of late. ISPs and IP-oriented individuals focus on TE using RSVP-based signaling. Incumbent carriers view MPLS as comparable to ATM but with variable cell lengths; in this context, LDP and the extension CR-LDP are used. The industry has come a long way from the basic TE
Figure 9.5 MPLS mechanisms at a glance. Key points from the figure:
• MPLS forwarding (user plane) employs the label-swapping forwarding process; at MPLS ingress and egress, a packet-based LSR may send the packet to Layer 3 (the FIB) for processing.
• The shim header defined in RFC 3032 is 32 bits: Label Value (20 bits), Exp (experimental use, 3 bits), S (bottom of stack, 1 bit), and TTL (time to live, 8 bits).
• LSR operations are swap, pop, or push a label.
• ATM-based MPLS may use the VPI/VCI as the label; GMPLS defines label values for other media.
• MPLS routing uses the same IP routing mechanisms: a control protocol distributes labels between neighboring LSRs, and labels are placed in the LIB, which contains input I/F, input label, output I/F, and output label.
(I/F = interface; LIB = label information base; FIB = forward information base.)
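As an illustration of the RFC 3032 label-stack entry layout (label value 20 bits, Exp 3 bits, S 1 bit, TTL 8 bits), a short Python sketch packs and parses one 32-bit entry; the field values used are arbitrary examples:

```python
import struct

def pack_label_entry(label, exp, s, ttl):
    """Build one 32-bit MPLS label-stack entry per the RFC 3032 layout:
    label value (20 bits) | Exp (3 bits) | S, bottom of stack (1 bit) | TTL (8 bits)."""
    assert 0 <= label < (1 << 20) and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)           # network byte order

def unpack_label_entry(entry):
    """Recover (label, exp, s, ttl) from a packed 4-byte entry."""
    (word,) = struct.unpack("!I", entry)
    return word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF
```

For example, `pack_label_entry(100, 5, 1, 64)` produces a 4-byte shim entry that `unpack_label_entry` reads back as `(100, 5, 1, 64)`.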
Figure 9.6 MPLS logical topology. Two MPLS "trees" are shown, one rooted at egress node A and one at egress node B; label values (e.g., 100, 120, 310) have only local significance on each hop, and paths merge as they approach the egress node.
application of the late 1990s; yet, some see MPLS as a way of converting the Internet to a circuit base [6]. LDP is defined for label distribution inside one MPLS domain. One of the most important services that may be offered using MPLS in general and LDP in particular is support for constraint-based routing (CR) of traffic across the routed network. CR offers the opportunity to extend the information used to set up paths beyond what is available to the routing protocol [7]. For instance, an LSP can be set up based on explicit route, QoS, and other constraints. CR is a mechanism used to meet TE requirements, which may be met by extending LDP for support of CR-LSPs. Other uses for CR-LSPs include MPLS-based VPNs. More discussion on label distribution follows. TE is concerned with the task of mapping traffic flows to the physical network topology. Specifically, it provides the ability to move traffic flows away from the shortest path calculated by a routing protocol such as the Routing Information Protocol (RIP), OSPF, or an Interior Gateway Protocol (IGP) and onto a less congested path (see Figure 9.7). The purpose of TE is to balance the traffic load on the various links, routers, and switches in the network so that none of these components is over- or underused [8]. TE allows service providers to fully exploit the network infrastructure, a feature that can also be useful for route selection in QoS-based applications, such as VOMPLS. At the MPLS Working Group meeting held in December 1997, there was consensus that LDP should support explicit routing of LSPs with provision for indication of associated (forwarding) priority. Specifications exist for an end-to-end setup mechanism of a CR-LSP initiated by the ingress LSR; mechanisms also exist to provide means for the reservation of resources using LDP. Procedures exist for the support of
• Strict and loose explicit routing
• Specification of traffic parameters
Figure 9.7 MPLS TE. Between San Jose and Boston, the IGP shortest path (computed by RIP, OSPF, etc.) is shown alongside an explicitly traffic-engineered path.
• Route pinning
• CR-LSP preemption through setup/holding priorities
• Handling failures
• Resource class
MPLS can be logically and functionally divided into two elements to provide the label-switching functionality (see Figure 9.8):
1. The MPLS forwarding/label-switching mechanism
2. The MPLS label-distribution mechanism
MPLS Forwarding/Label-Switching Mechanism
The key mechanism of MPLS is the forwarding/label-switching function. See Figures 9.9 and 9.10 for a view of the label and label insertion. This is an advanced form of packet forwarding that replaces the conventional longest-address-match forwarding with a more efficient label-swapping forwarding algorithm. The IP header analysis is performed once, at the ingress of the LSP, for the classification of protocol data units (PDUs). PDUs that are forwarded via the same next hop are grouped into a forwarding equivalence class (FEC) based on one or more of the following parameters:
• The address prefix
• The host address
• The host address and QoS
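A minimal sketch of this one-time ingress classification, assuming a hypothetical FEC table keyed by address prefix (the `lsr-b`/`lsr-c` next hops and the label values are invented for illustration):

```python
import ipaddress

# Hypothetical FEC table for illustration: prefix -> (next hop, label) binding.
FEC_BINDINGS = {
    ipaddress.ip_network("10.1.0.0/16"): ("lsr-b", 17),
    ipaddress.ip_network("10.1.2.0/24"): ("lsr-c", 18),
}

def classify(dst):
    """One-time ingress classification: the longest matching prefix selects the
    FEC, whose label binding then drives every downstream forwarding decision."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FEC_BINDINGS if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)   # longest match wins
    return FEC_BINDINGS[best]
```

Here a destination inside 10.1.2.0/24 maps to the more specific FEC, while the rest of 10.1.0.0/16 maps to the broader one; after this single lookup, only labels are examined.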
Figure 9.8 Protocol stack. (Courtesy: Altera, altera.com.) Edge and core LSRs each run a control plane of LDP/CR-LDP over TCP and UDP over IP, and a data plane of IP forwarding (IP FWD) and MPLS forwarding (MPLS FWD) over the link and PHY layers, with label management tying the planes together through the LIB.
Key: LDP = label distribution protocol; CR-LDP = constraint-based LDP, used for traffic engineering (resource reservation protocol traffic engineering, RSVP-TE, is another signaling mechanism used for traffic engineering); TCP = transmission control protocol; UDP = user datagram protocol; LIB = label information base, a table of labels mapping input port/label to output port/label; IP FWD = next-hop forwarding based on IP address, using longest-match forwarding; MPLS FWD = label switching based on the MPLS label and a LIB lookup.
The FEC to which the PDU belongs is encoded at the edge LSRs as a short fixed-length value known as a label. When the PDU is forwarded to its next hop, the label is sent along with it. At downstream hops, there is no further analysis of the PDU's network layer header; instead, the label is used as an index into a table, and the entry in the table specifies the next hop and a new label. The incoming label is replaced with this outgoing label, and the PDU is forwarded to its next hop. Labels usually have local significance and are used to identify FECs based on the type of underlying network. For example, in ATM networks, the VPI and VCI are used in generating the MPLS label; in FR networks, the data link control identifier (DLCI) is used. In ATM environments, the labels assigned to the FECs (PDUs) are the VPI/VCI of the virtual connections established as part of the LSP; in FR environments, the labels assigned to the FECs (PDUs) are the DLCIs. Label switching has been designed to leverage the Layer 2 switching function done in the current data link layers such as ATM and FR. It follows that the MPLS forwarding mechanism should update the switching fabrics in the ATM and FR hardware in the LSR for the relevant sets of LSPs, which can be switched at the hardware level [3]. In Ethernet-based networks, the labels are short headers placed between the data link headers and the data link layer PDUs.

Figure 9.9 MPLS label. The 32-bit shim sits between the link layer header and the network layer header (ahead of the TCP header and data): Label (20 bits), Exp. bits (3 bits), BS (1 bit), and TTL (8 bits).

Figure 9.10 MPLS label insertion in (a) an ATM network, where the label maps to the VPI/VCI of the cell; (b) a frame relay network, where it maps to the DLCI of the frame; (c) a SONET/PPP (POS) network, where the shim is carried between the PPP header and the Layer 3 header in the SONET frame; and (d) an Ethernet network, where the shim is carried between the MAC header and the Layer 3 header.
MPLS Label-Distribution Mechanism
The MPLS architecture does not assume a single label-distribution protocol. The distribution of labels in MPLS is accomplished in several ways:
• Extending routing protocols, such as the Border Gateway Protocol (BGP), to support label distribution
• Using the RSVP signaling mechanism to distribute labels mapped to the RSVP flows
• Using the LDP
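A toy model may help fix ideas before the protocol-specific discussion: the class below is a deliberately simplified, invented sketch of downstream-unsolicited label distribution (a downstream node binds a label for a FEC and advertises it to its upstream neighbor, which installs it as its outgoing label); it is not an implementation of LDP, BGP, or RSVP label distribution:

```python
class ToyLSR:
    """Toy downstream-unsolicited label distribution; names and behavior are
    illustrative only."""

    def __init__(self, name):
        self.name = name
        self.next_label = 16            # label values 0-15 are reserved
        self.lib = {}                   # FEC -> (in_label, next_hop, out_label)

    def advertise(self, fec, upstream, out_label=None):
        """Bind a local label for `fec` and advertise it to the upstream peer."""
        in_label = self.next_label
        self.next_label += 1
        self.lib[fec] = (in_label, None, out_label)
        upstream.on_mapping(fec, self.name, in_label)

    def on_mapping(self, fec, peer, peer_label):
        """Install the peer's label as the outgoing label for this FEC."""
        in_label = self.next_label
        self.next_label += 1
        self.lib[fec] = (in_label, peer, peer_label)
```

After an egress node advertises a FEC, the upstream node's LIB holds a swap entry mapping its own incoming label to the egress node's label, which is the state the real protocols below are distributing.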
Label Distribution Using BGP
When a pair of LSRs that maintain BGP peering with each other exchange routes, they also need to exchange the label-mapping information for these routes. The exchange is accomplished by piggybacking the label-mapping information for a route in the same BGP update message used to exchange the route.
Label Distribution Using RSVP
RSVP defines a session to be a data flow with a particular destination and transport layer protocol [9]. From the early 1990s to the late 1990s, RSVP was considered only for QoS purposes in IP networks. When RSVP and MPLS are combined, a flow or session can be defined with greater generality. The ingress node of an LSP can use a variety of means to determine which PDUs are assigned a particular label. Once a label is assigned to a set of PDUs, the label effectively defines the flow through the LSP. Such an LSP is referred to as an LSP tunnel, because the traffic flowing through it is opaque to intermediate nodes along the label-switched path. The label-request information for the labels associated with RSVP flows is carried as part of the RSVP path messages; the label-mapping information for the labels associated with RSVP flows is carried as part of the RSVP resv messages [3]. The initial implementors of MPLS chose to extend RSVP into a signaling
Voice over MPLS and Voice over IP over MPLS
357
protocol to support the creation of LSPs that could be automatically routed away from network failures and congestion. [RSVP_MPLS_TE] defines the extension to RSVP for establishing LSPs in MPLS networks [10]. The use of RSVP as a signaling protocol for TE is quite different from that envisioned by its original developers in the mid-1990s, as follows [8]:
• A number of extensions were added to the base RSVP specification (RFCs 2205 and 2209) to support the establishment and maintenance of explicitly routed LSPs.
• RSVP signaling takes place between pairs of routers, rather than between pairs of hosts, that act as the ingress and egress points of a traffic trunk. Extended RSVP installs a state that applies not to a single host-to-host flow but to a collection of flows sharing a common path and a common pool of network resources. By aggregating numerous host-to-host flows into each LSP tunnel, extended RSVP significantly reduces the amount of RSVP state that must be maintained in the core of a service provider’s network.
• RSVP signaling installs a distributed state related to packet forwarding, including the distribution of MPLS labels.
• The scalability, latency, and traffic-overhead concerns regarding RSVP’s soft-state model are addressed by a set of extensions that reduce the number of refresh messages and the associated message-processing requirements.
• The path established by RSVP signaling is not constrained by conventional destination-based routing, so it is a good tool for establishing TE trunks.
In 1997, the initial implementors of MPLS had many reasons for choosing to extend RSVP rather than design an entirely new signaling protocol to support TE requirements [8]. Extended RSVP provides a unified signaling system that delivers everything that network operators need to dynamically establish LSPs:
• Extended RSVP creates an LSP along an explicit route to support the TE requirements of large service providers.
• Extended RSVP establishes an LSP state by distributing label-binding information to the LSRs in the LSP. • Extended RSVP can reserve network resources in the LSRs along the LSP, which is the traditional role of RSVP. Extended RSVP, however, also permits an LSP to carry best-effort traffic without making a specific resource reservation. As will be seen later, RSVP can serve a dual role in MPLS: one for label distribution, another for QoS support (which we discussed in Chapter 8).
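The upstream label distribution performed by extended RSVP can be illustrated with a toy model (node names and label values are hypothetical, and real RSVP message processing is far richer): the Path message carries a label request toward the egress, and the Resv message travels back toward the ingress while each LSR allocates the label its upstream neighbor must use and installs the corresponding swap entry.

```python
# Toy model of Resv-driven label binding along an explicitly routed LSP.
def signal_lsp(lsrs):
    """lsrs: ordered node names, ingress first, egress last. Returns the
    per-node cross-connect tables {in_label: out_label} and the label
    the ingress pushes onto packets entering the tunnel."""
    label_pool = iter(range(100, 100 + len(lsrs)))
    tables = {name: {} for name in lsrs}
    downstream_label = 3                  # label advertised by the egress (assumed value)
    for name in reversed(lsrs[1:-1]):     # Resv visits transit LSRs, egress side first
        in_label = next(label_pool)       # label advertised to the upstream neighbor
        tables[name][in_label] = downstream_label
        downstream_label = in_label
    return tables, downstream_label       # the ingress pushes this label

tables, push_label = signal_lsp(["A", "B", "C", "D"])
```

Once signaling completes, forwarding needs only the swap tables: here the ingress A pushes the label advertised by B, and each transit node swaps toward the egress.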
358
Chapter Nine
Label-Distribution Protocol The LDP is a set of procedures and messages by which LSRs establish LSPs through a network by mapping network layer–routing information directly to data link layer–switched paths. These LSPs may have an endpoint at a directly attached neighbor (comparable to IP hop-by-hop forwarding) or an endpoint at a network egress node, enabling switching via all intermediary nodes. The LDP associates an FEC with each LSP that it creates. The FEC associated with an LSP specifies which PDUs are mapped to that LSP. LSPs are extended through a network as each LSR splices the incoming label for an FEC to the outgoing label assigned to the next hop for the given FEC. The messages exchanged between the LSRs are classified into four categories, as shown in Table 9.2. The LDP uses the Transmission Control Protocol (TCP) for session, advertisement, and notification messages. The TCP is used to provide reliable, sequenced messages. Discovery messages, transmitted via the User Datagram Protocol (UDP), are sent to the LDP port at the all-routers-on-this-subnet group multicast address. Discovery messages provide a mechanism through which LSRs can indicate their presence within a network. An LSR sends a hello message periodically, and when it chooses to establish a session with another LSR discovered via a hello message, it uses (via the TCP) the LDP-initialization procedure. Upon successful completion of the initialization procedure, the two LSRs become LDP peers and may exchange advertisement messages. An LSR requests a label mapping from a neighboring LSR when it needs one, and it advertises a label mapping to a neighboring LSR when it wishes the neighboring LSR to use a label.
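The hello/initialization sequence just described can be reduced to a toy state machine (class and method names are invented for illustration; a real implementation exchanges LDP PDUs over UDP and TCP on the well-known LDP port, 646):

```python
class LdpNeighbor:
    """Toy LDP neighbor: UDP hellos create the adjacency, a TCP-based
    initialization brings the session up, and only then may label
    mappings (advertisement messages) be accepted."""
    LDP_PORT = 646

    def __init__(self):
        self.state = "NONEXISTENT"
        self.mappings = {}              # FEC prefix -> label learned from peer

    def receive_hello(self):            # discovery message (UDP multicast)
        if self.state == "NONEXISTENT":
            self.state = "DISCOVERED"

    def initialize_session(self):       # session messages (TCP)
        if self.state == "DISCOVERED":
            self.state = "OPERATIONAL"

    def receive_label_mapping(self, fec, label):   # advertisement message
        if self.state != "OPERATIONAL":
            raise RuntimeError("advertisements flow only between session peers")
        self.mappings[fec] = label

peer = LdpNeighbor()
peer.receive_hello()
peer.initialize_session()
peer.receive_label_mapping("10.0.0.0/8", 42)
```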
Other Features Besides the three main facets of MPLS discussed above, work is underway in the following areas: • Layer 2 VPNs over an MPLS core • Generalized MPLS (optical control) • VOMPLS
Table 9.2 LDP Messages

Message type             Function
Discovery messages       Used to announce and maintain the presence of an LSR in a network
Session messages         Used to establish, maintain, and terminate sessions between LDP peers
Advertisement messages   Used to create, change, and delete label mappings for FECs
Notification messages    Used to provide advisory information and to signal error information
• Real-time service provisioning
• Pseudo-wire emulation (end-to-end)
• diffserv
• Convergence of the core network
Several dozen draft documents and other IETF submissions have been generated; 10 RFCs have been created, as seen in Table 9.1. Draft martini (Layer 2 MPLS) defines the support of other transport modes besides routed IP service. Examples include Transparent LAN Service (TLS), ATM, and FR. RFC 2547 specifies IP VPN transport service at Layer 3. The VPN logically separates customer traffic across the backbone; the Border Gateway Protocol (BGP) permits access between different VPNs. Proponents claim the security to be as good as that of FR PVCs. MPLS promises to be the most effective way to provide a stable packet network and to integrate ATM and IP in the same backbone network, thereby enabling carriers to preserve investments that they have made in ATM.1 MPLS is not an IP network, although it does use IP routing protocols such as OSPF or IS-IS. Similarly, MPLS is not an ATM network, although it can use reprogrammed ATM switches. Hence, MPLS can be realized over router hardware as well as over switch hardware. Proposals have been made for transporting the PDUs of Layer 2 protocols such as FR, ATM AAL5, and Ethernet and for providing a SONET circuit-emulation service (CEM) across an MPLS network. In an MPLS network, it is possible to carry the PDUs of Layer 2 protocols by prepending an MPLS label stack to these PDUs. The MPLS Working Group has specified the necessary label-distribution procedures for accomplishing this task via the encapsulation methods [11]. Reference [12] describes a method for transporting time-division multiplexed (TDM) digital signals (TDM circuit emulation) over a packet-oriented MPLS network. The transmission system for circuit-oriented TDM signals is SONET and/or the Synchronous Digital Hierarchy (SDH). To support TDM traffic, which includes voice, data, and private leased-line service, the MPLS network must emulate the circuit characteristics of SONET/SDH payloads.
MPLS labels and a new circuit-emulation header are used to encapsulate TDM signals and provide CEM over MPLS. The MPLS Forum mentioned previously is an industry-advocates’ association aimed at stimulating deployment of MPLS technology. The group is working on service provider requirements, deployment scenarios, implementation agreements, test-plan definitions, test-script development, and educational materials.
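The Layer 2 transport idea above can be sketched as prepending a two-entry label stack, an outer tunnel label plus an inner label identifying the emulated circuit, to the Layer 2 PDU. The function below is only illustrative (the martini encapsulations also define a control word, omitted here, and the label values are arbitrary):

```python
def encapsulate_l2(pdu: bytes, tunnel_label: int, vc_label: int) -> bytes:
    """Prepend a two-entry MPLS label stack (RFC 3032 entry layout)
    to a Layer 2 PDU: outer tunnel label, inner circuit label."""
    def entry(label: int, bottom: bool) -> bytes:
        word = (label << 12) | (int(bottom) << 8) | 64   # EXP=0, TTL=64
        return word.to_bytes(4, "big")
    return entry(tunnel_label, False) + entry(vc_label, True) + pdu

frame = encapsulate_l2(b"\xaa" * 8, tunnel_label=2000, vc_label=30)
```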
Comparison The MPLS protocol allows high-performance label switching of packets; network traffic is forwarded using a simple label [13]. In an MPLS domain, when a stream of data traverses a common path, an LSP can be established by using MPLS signaling
protocols. At the ingress LSR, each packet is assigned a label and transmitted downstream. At each LSR along the LSP, the label is used to forward the packet to the next hop. By combining the attributes of Layer 2 switching and Layer 3 routing into a single entity, MPLS provides [14]
1. Enhanced scalability via switching technology
2. Support of CoS- and QoS-based services (diffserv and intserv)
3. Elimination of the need for an IP over ATM overlay model and its associated management overhead
4. Enhanced traffic-shaping and -engineering capabilities
Table 9.3 depicts some MPLS features that make it a useful networking technology. MPLS draws the best of both worlds, particularly regarding the support of QoS-demanding applications. Figure 9.11 depicts the convergence of ATM and IP technologies in MPLS. MPLS reduces the processing overhead in routers, improving packet-forwarding performance, and it provides a new way to deliver QoS in networks—an approach that is both complementary to and in competition with diffserv, intserv/RSVP, and ATM. Constraint-based routing (CR) is superior to the routing mechanism of pure IP systems because the routing decision is based on additional information rather than just the calculation of the shortest path (the shortest path can be congested or unsecured, or it can have low available capacity). Since multiprotocol over ATM (MPOA) never saw realization, IP use of ATM was left with the single choice of Classical IP over ATM (CIOA; RFC 1577), which has, unfortunately, given rise to an N² route-propagation problem (very problematic in large networks). In this problem, routers have to be interconnected with a mesh of ATM VCs (or FR VCs, for that matter). MPLS also allows carriers to provision VPN services that can meet specified service-quality metrics, such as bronze, silver, gold, and platinum. This technology allows ISPs to scale their networks to support TE demands without having to use ATM overlay networks.
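The ingress-classify, transit-swap behavior described at the start of this comparison can be sketched as follows (the FEC prefix and label values are hypothetical):

```python
# Ingress: classify to an FEC and push a label. Transit: one table
# lookup per hop to swap labels. Egress: pop (None below).
ingress_fec_map = {"10.1.0.0/16": 18}                # FEC -> pushed label
lsp_tables = [{18: 25}, {25: 96}, {96: None}]        # per-LSR swap tables

def forward(prefix: str):
    label = ingress_fec_map[prefix]
    path = [label]
    for table in lsp_tables:
        label = table[label]
        path.append(label)
    return path                  # None marks the pop at the egress

assert forward("10.1.0.0/16") == [18, 25, 96, None]
```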
It should be noted, however, that as of this writing, the maturity of MPLS equipment in general—when measured in terms of carrier-class reliability, feature set, and OAM&P support—and of VOMPLS in particular leaves a lot to be desired. Much more work is needed in VOMPLS, for example, at the theoretical level and in terms of implementations. A carrier could decide that at this time, VOATM has had nearly a decade of maturity; VOIP, five years of maturity. Yet MPLS enjoys certain attributes that prima facie make it a better technology than pure IP for the support of packetized voice applications. Developers and vendors should therefore bring forth VOMPLS technologies and products, but they must do so with an understanding of the principles expounded herein to be successful.
Table 9.3
Application-Oriented Features of MPLS
Link layer independence
MPLS works with any type of link layer medium, such as ATM, FR, Packet over SONET, and Ethernet.
Improved performance
MPLS enables higher data transmission performance from simplified packet forwarding and switching mechanisms.
Explicit/improved routes
MPLS supports explicit routes, which are not set up by normal IP hop-by-hop routing but for which an ingress/egress node has specified all or some of the downstream nodes.
QoS support
Explicit routes provide a mechanism for QoS constraint routing. As an example, some of the initial MPLS deployment was over ATM infrastructures; in other cases, it was over metro optical networks. In the ATM scenario, the core LSRs and edge LSRs can allocate QoS to different user requirements and map them to different ATM VCs that support different ATM QoS. Because the edge LSR is the ingress to the ATM overlay network, it is responsible for efficiently classifying the IP flows and mapping to the ATM QoS.
Traffic engineering (TE)
MPLS supports TE, a process of selecting the paths chosen by data traffic for balancing the traffic load on the various links, routers, and switches in the network. To meet key performance objectives, TE must be (1) traffic-oriented, where it enhances the QoS of traffic streams, and (2) resource-oriented, where it pertains to optimizing resource use.
Aggregation of PDU streams
In MPLS, the label-stacking mechanism can be used to perform the aggregation within Layer 2 itself. Typically, when multiple streams must be aggregated for forwarding into a switched path, processing is required at both Layers 2 and 3. The top label of the MPLS label stack is used to switch PDUs along the label-switched path; the rest of the label stack is application-specific.
Virtual private network (VPN) support
VPN is an application that uses label-stacking mechanisms. At the VPN ingress node, the VPN label is mapped onto the MPLS label stack, and the packets are label-switched along the LSP within the VPN until they emerge at the egress. At the egress node, the label stack is used to determine further forwarding of the PDUs.
Scalability of network layer routing
A key MPLS desideratum is to achieve a better, more efficient transfer of PDUs in the current IP networks. Combining the routing knowledge at Layer 3 with the switching capability in ATM devices results in a better solution. In the MPLS scenario, it is sufficient to have adjacencies with the immediate peers. The edge LSR interacts with the adjacent LSR, which is sufficient for the creation of LSPs to transfer the data.
Figure 9.11 Comparison of three leading protocols.

                 MPLS                        ATM                   IP
Control plane    IP routing software         ATM control plane     IP routing software
                 plus MPLS
Forwarding       Label swapping              Label swapping        Forwarding
Addressing       IP                          IP                    IP
Signalling       CR-LDP or RSVP-TE           PNNI Signalling       N/A
Routing          OSPF-TE, ISIS-TE            PNNI Routing          OSPF, ISIS
Transport        Cells & Frames & Optical    Cells                 Frames

PNNI: Private Network-Network Interface
ISIS: Intermediate System-to-Intermediate System
CR-LDP: Constraint-Based Routed Label Distribution Protocol
RSVP-TE: Resource Reservation Protocol-Traffic Engineering
Table 9.4 provides the rationale for the improved treatment of packets through an MPLS network as compared to traditional IP. Figure 9.12 depicts a basic MPLS engine. Table 9.5 depicts the deployment status of MPLS as a whole. According to observers [15], “Carriers are not expected to converge their IP and ATM networks to MPLS for at least another [three to four] years. This delay actually may afford a window of opportunity to developers who seek to bring forward
Table 9.4 Comparison between MPLS and IP

                              MPLS                                     IP
Entire IP header analysis     Performed only once, at the ingress      Performed at each hop of the
                              of the packet’s path in the network      packet’s path in the network
Routing decisions             Based on a number of parameters,         Based on the destination
                              including the destination address in     address in the IP header
                              the IP header, QoS, and data type
                              (e.g., voice)
Support for unicast and       Requires only one forwarding             Requires special multicast
multicast data                algorithm                                routing and forwarding algorithms
Hardware                      Cheaper                                  —
OAM&P                         Cheaper                                  Lacking
new services under the VOMPLS umbrella.” There is recognition that “MPLS is now an entry criterion, in spite of the lack of wide-scale MPLS deployments in North American IP networks.2 Most vendors are positioning their products for value-added services, recognizing that service providers are looking for new sources of revenue.”
9.4 QoS Capabilities
Table 9.5 Deployment Status of MPLS
[The table charts each capability (IP TE, IP VPN, multiservice, L2VPN, GMPLS, wireless core, and VOMPLS) against its deployment stage: demo, field trials, limited deployment, or wide deployment.]

Chapter 8 examined QoS in IP networks from a general perspective. This section focuses on those aspects of QoS that are of interest in an MPLS context. In general terms, QoS services in packet-based networks can be achieved in two possible ways:
1. In-band signaling mechanisms, where carriers and ISPs can provide priority treatment to packets of a certain type. This could be done, for example,
Figure 9.12 Basic MPLS engine. (Source: Trillium Digital Systems, Inc.)
[The engine’s control plane comprises resource management, a label manager (LIB/FIB), LDP and CR-LDP, explicit-route calculation, constraint-based routing, and a traffic-engineering database fed by user inputs and the routing database (via the TUCL, over UDP/TCP). The data plane comprises IP forwarding, a packet classifier/MPLS forwarder, and label mapping and switching over ATM, Frame Relay, and PPP interfaces.]
• Label management support for various label operations, including:
  • Label assignment modes (topology-driven and data-driven)
  • Label control (independent and ordered)
  • Label retention modes (liberal and conservative)
• Discovery mechanisms support for both basic and extended modes
• Label distribution that supports both downstream-on-demand and unsolicited methods
• Constraint-based traffic-engineered routes (CR-LDP) for QoS/CoS
• LSP tunnel creation and label stacking
• Stream merging/aggregation for ATM interface support
• Loop-detection mechanism during LSP setup using path vector and hop counts
• Multipath TE path setup capability for load balancing
• MIB support for LIB and FIB entries
with the TOS field in the IPv4 header or with the Priority field in the IPv6 header. The MPLS label is another way to identify to the router/IP switch that special treatment is required. If routers, switches, and end systems all recognized and used the appropriate fields, and the queues in the routers and switches were effectively managed according to the priorities, this method of providing QoS guarantees could be called the simplest. The reason is that no new protocols would be needed and the carrier’s router can
be configured in advance to recognize labels of different types of information flows. This approach is used in the diffserv model. 2. Out-of-band signaling mechanisms, for securing allocations of shared network resources. This type of signaling includes ATM signaling for different classes of services and RSVP. With RSVP, the end user can request services based on QoS. It should be noted, however, that RSVP only reserves, but does not provide, bandwidth. As such, it augments existing unicast/multicast routing protocols—in particular, IP. IP, in turn, may well have to rely on packet over SONET (POS), ATM (say, via CIOA), or GMPLS (optical switch control) to obtain bandwidth. This approach is used in the intserv model. Table 9.6 lists key Internet Drafts related to MPLS QoS.
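As a concrete example of the in-band approach in item 1, a host or gateway can mark the IPv4 TOS byte through the standard sockets API. The value 0xB8 places diffserv code point 46 (Expedited Forwarding) in the upper six bits, a common marking for voice; the sketch below is illustrative, not from this book:

```python
import socket

def make_marked_socket(tos: int = 0xB8) -> socket.socket:
    """UDP socket whose outgoing packets carry the given TOS/DSCP byte."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return s

voice_socket = make_marked_socket()    # EF-marked voice bearer socket
```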
Introduction As noted previously, the use of MPLS gives a packet network an increased level of QoS control. QoS is defined as those mechanisms that give network administrators the ability to manage traffic bandwidth, delay, jitter, loss, and congestion throughout the network. QoS controls are critical for multimedia applications in intranets, dedicated (WAN) IP networks, VPNs, and a converged Internet. Technologically, there are three approaches to MPLS QoS:
Per-flow QoS technology. The intserv Working Group offers link-level per-flow QoS control. RSVP is used for signaling in intserv (MPLS also uses RSVP as a general signaling protocol), and the Working Group is now looking at new RSVP extensions. The services of intserv are controlled-load service and guaranteed service, terms that have been renamed by the ITU-T Y.iptc (IP traffic control) to delay-insensitive statistical bandwidth capability and delay-sensitive statistical bandwidth capability, respectively. The ITU Y.iptc effort uses intserv services and diffserv expedited forwarding.
Class-based QoS technology. The diffserv Working Group has developed a class-based QoS capability, in which packets are marked at the network’s “edge” (see Figure 9.13 for an example). Routers use the markings to decide how to handle packets. There are four services:
Table 9.6 QoS-Related Internet Drafts

May 2001        Integrated Services across MPLS Domains Using CR-LDP Signaling
April 2001      MPLS Support of Differentiated Services
April 2001      MPLS Support of Differentiated Services Using E-LSP
December 2000   Policy Framework MPLS Information Model for QoS and TE
Figure 9.13 Example of MPLS QoS (diffserv).
[Customer routers (CR) attach through edge LSRs/access routers (AR) in metro networks (Metro1 through Metro4) to core LSRs in the MPLS network. diffserv/TOS marking and honoring with managed queuing is applied at the edges; example precedence levels are real-time, high-grade data, medium-grade data, and low-grade data. TOS = type of service; CR = customer router; AR = access router; LSR = label switched router.]
1. Best-effort normal (Internet) traffic
2. Seven precedence levels (prioritized classes of traffic)
3. Expedited forwarding (EF) (leased-line-like service)
4. Assured forwarding (AF) (four queues with three drop classes)
Class-based QoS requires edge policing, although this technology is not yet defined.
Other QoS technologies and ideas. Many ideas come from traditional telecommunications providers, mapping flow-based QoS into a circuit of some type. Examples include
• MPLS LSPs
• ATM VCs
• Optical lambdas
Optical lambdas (optical circuits) could make sense for core network trunks, but the idea resuscitates the old circuits-versus-packets discussion within the industry. In summary, the following list identifies techniques, current as of this writing, for supporting QoS in packet networks [16]:
• diffserv
  • Associate a DSCP with every packet; define per-hop behavior (PHB)
• RSVP/intserv
  • Perceived as more widespread in enterprise networks
  • Now used in MPLS for label distribution
• MPLS
  • LSPs, used in the core to aggregate traffic flows
  • Mapping of diffserv PHB in access networks to MPLS flows in the core network
• Queuing, dropping, and traffic-shaping mechanisms
  • Exhaustive and priority- and class-based queuing
  • Random Early Detection (RED)
To realize true QoS, its architecture must be applied end to end, not just at the edge or at select network devices.3 The solution must provide a variety of technologies that can interoperate for delivering scalable, feature-rich services
throughout the network. The services must provide efficient use of resources by aggregating large numbers of IP flows where needed while providing fine-tuned granularity to those premium services defined by service level agreements (SLAs). The architecture must provide the devices and capabilities to monitor, analyze, and report detailed network status. Armed with this knowledge, network administrators or network-monitoring software can react quickly to changing conditions, ensuring the enforcement of QoS guarantees. Finally, the architecture must also provide mechanisms to defend against the possibility of theft, to prevent denial of service, and to anticipate equipment failure [14]. The use of MPLS gives a packet network an improved level of QoS control. QoS controls are critical for multimedia applications in intranets, dedicated (WAN) IP networks, VPNs, and a converged Internet. QoS requirements for packet networks come from many sources: ITU-T, TIA, QoS Forum, ETSI, IEPS, and so on. Two groups, each with differing philosophies regarding QoS, exist: (1) Internet folks, who often take the approach of overprovisioning, and (2) incumbent carriers, who often prefer robust but complex controls. Services such as VOMPLS, MPLS VPNs, Layer 2 VPNs (L2VPNs), differentiated services traffic engineering (DS-TE), and the Draft martini typically require service differentiation in particular and QoS support in general. It is important to realize, as stated earlier, that MPLS per se is not a QoS solution. To support QoS, MPLS needs a distinct mechanism. One such mechanism is diffserv-style forwarding treatment at each node. In this case, the EXP (EXPerimental) bits of the header are used to trigger scheduling and/or drop behavior at each LSR.
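Of the queuing and dropping mechanisms listed above, Random Early Detection can be sketched in a few lines (the thresholds and maximum drop probability are illustrative; real RED also maintains an exponentially weighted average of the queue depth):

```python
import random

def red_drop(avg_queue: float, min_th: float = 5.0, max_th: float = 15.0,
             max_p: float = 0.1, rng=random.random) -> bool:
    """RED drop decision: never drop below min_th, always drop at or
    above max_th, and drop with linearly rising probability in between."""
    if avg_queue < min_th:
        return False
    if avg_queue >= max_th:
        return True
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return rng() < p
```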
Details The intserv Working Group has developed the per-flow QoS capability alluded to previously. RSVP offers signaling for intserv, and it is also used as a general signaling protocol (for example, MPLS). The intserv architecture [17] defines QoS services and reservation parameters used to obtain the required QoS for an Internet flow. RSVP [9] is the signaling protocol used to convey these parameters from one or multiple senders toward a unicast or multicast destination. RSVP assigns QoS with the granularity of a single application’s flow [18]. Signaling traffic is exchanged between routers belonging to a core area. After a reservation has been established, each router must classify each incoming IP packet to determine whether it belongs to a QoS flow; if it does belong, the router will assign the needed resources to the flow. The intserv classifier is based on a multifield classification, for it checks five parameters in each IP packet: namely, the Source IP address, Destination IP address, Protocol ID, Source Transport Port, and Destination Transport Port. As noted previously, RSVP plays two roles in MPLS: to support an intserv view of QoS and as a signaling protocol for distributing labels. In the mid-1990s, RSVP
was developed to address network congestion by allowing routers to decide in advance whether they could meet the requirements of an application flow and then reserve the desired resources if they were available. RSVP was originally designed to install a forwarding state associated with resource reservations for individual traffic flows between hosts [8]. The physical path of the flow across a service provider’s network was determined by conventional destination-based routing (for example, RIP, OSPF, and IGP). By the late 1990s, RSVP became a proposed standard, and it has since been implemented in IP networking equipment. However, RSVP has not been used widely in service provider and carrier networks, because of operator concerns about its scalability and the overhead required to support potentially millions of host-to-host flows. Informational RFC 2208 [19] discusses scalability issues posed by the signaling, classification, and scheduling mechanisms. An important consequence of this problem is that intserv-level QoS can be provided only within peripheral areas of a large network, preventing its extension inside core areas and the implementation of end-to-end QoS. IETF RSVP-related Working Groups have undertaken some work to overcome these problems. The RSVP Working Group has recently published RFC 2961, which describes a set of techniques to reduce the overhead of RSVP signaling. However, this RFC does not deal with the classification problem, which is still to be addressed. Draft [20] discusses the possibility of aggregating RSVP sessions into a larger one. The aggregated RSVP session uses a DSCP for its traffic. Such a solution, however, forfeits the undoubted benefits given by the intserv quantitative QoS approach [18].
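The multifield intserv classification described above, matching on five header fields, can be sketched as a dictionary lookup on the 5-tuple (the addresses, ports, and reservation values below are hypothetical):

```python
# (src IP, dst IP, protocol ID, src port, dst port) -> reserved rate (kbit/s)
reservations = {
    ("192.0.2.1", "198.51.100.7", 17, 16384, 16385): 64,
}

def classify(pkt: dict):
    """Return the reserved rate for a packet's flow, or None (best-effort)."""
    key = (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
    return reservations.get(key)

voice_pkt = {"src": "192.0.2.1", "dst": "198.51.100.7",
             "proto": 17, "sport": 16384, "dport": 16385}
```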
Packet purists will probably argue that diffserv is the best approach, because there is very little if any state information kept along the route, whereas those more in the carriers’ camp will probably argue that intserv is a better approach, because resource reservations and allocations can be better managed in the network in terms of being able to engineer networks and maintain SLAs. It is within reason to assume that if the design is properly supported by statistically valid and up-to-date demand information, and if resources are added quickly when needed, either approach will probably provide reasonable results. Table 9.7 depicts a mapping between the various QoS classes advanced by developers. At face value, diffserv appears to be able to scale more easily than intserv; it is also simpler. One is not able to generalize about which technique is better for VOMPLS, because the decision will have to be based on the type of network architecture one chooses to implement. One cannot argue that a metric wrench is better than a regular wrench. If one works on a European-made engine, the metric wrench will obviously be superior; if one works on a U.S.-made engine, the regular wrench will be the answer. For example, in a small network where the end-to-end hop diameter is around 3 to 7, a reservation scheme—specifically, intserv—would seem fine (the U.S. voice network somewhat fits this range); in a network of large diameter, where paths may be 3 to 15 hops, a reservation scheme may prove too burdensome, and a node-by-node distributed approach—specifically, diffserv—may be better (the Internet somewhat fits this range).
Table 9.7 Mapping between Various QoS Classes

CoS queue  Applications                              Service class     diffserv definition  ATM definition
1          Virtual private line                      Pure priority 3   EF3                  CBR
2          Multimedia (VOMPLS, video)                Real-time         AF1                  VBR-rt
3          Business applications                     Assured delivery  AF4                  VBR-nrt
4          VPN/Internet                              Best-effort       BE                   UBR/ABR
5          Networking control/routing protocols      Pure priority 1   EF1                  —
6          Networking control/signaling protocols    Pure priority 2   EF2                  —
The same kind of argument also applies when looking at the total number of nodes, separate and distinct from the network diameter. If the network in question is a national core network with 10 to 20 core nodes, the reservation/intserv approach may be appropriate; if the network in question covers all the tiers of a voice network with around 400 to 500 interacting nodes, the diffserv approach may be better. These are just general observations; the decision regarding the best method must be made based on careful network-specific analysis, as well as product availability.
9.5 Voice Applications
If voice and data applications are to be mixed in a single network, attention must be given to the QoS of the voice application.4 Data, which usually rides on TCP, responds gracefully to transient congestion. TCP ensures reliable delivery of data; if packets are dropped, TCP will retransmit them. TCP responds to congestion by backing off the rate it sends. At an aggregate level, this back-off keeps traffic in the network within reasonable bounds for data applications. Voice is severely impacted by network congestion, and excessive delays cannot be tolerated. A packet containing voice cannot be played if it is received late. If packets are lost, retransmission is not an option, because they will very likely arrive too late. A quality voice service demands low delay and low loss. There are three elements in achieving QoS. First, the network resources must be committed to the voice application. Second, the traffic must be marked so that at each point of congestion, the committed resources can be applied to the guaranteed flow. Third, the amount of marked traffic must be kept at or below the amount of bandwidth committed to the flow.
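The third element, keeping the marked traffic at or below the committed bandwidth, is commonly enforced with a token-bucket policer. A minimal sketch with illustrative parameters (rate in bytes per second, burst in bytes):

```python
class TokenBucket:
    """Token-bucket policer: a packet conforms only if enough tokens
    (bytes of credit) have accumulated at the committed rate."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0

    def conforms(self, size: int, now: float) -> bool:
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False           # out of profile: drop or remark

policer = TokenBucket(rate=8000, burst=1600)   # ~64 kbit/s, 1600-byte burst
```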
Committing network resources to the voice application must be done on an “accounting basis,” link by link. MPLS TE provides the means to achieve this task. TE uses RSVP to reserve resources for the voice flows; for example, if the shortest path (that is, the normal IP route) has too few resources, TE can use an explicit route to send the flow on a less congested path. The MPLS label information provides a means of marking the traffic. Because RSVP sets up an MPLS TE tunnel, it also assigns the labels to be used by the flow. These labels, in addition to the EXP bits (QoS-specific bits that are carried in the label information), provide a simple and—in a proponent’s view—scalable way of identifying the voice flows. This is the diffserv approach discussed previously; an intserv approach is also available.5 The TE tunnels can be employed like voice tie lines.6 Consider a tunnel between two voice gateways: The tunnel can be sized to the expected number of calls, and as calls arrive that are destined for the remote gateway, they are routed over the tunnel by applying the appropriate MPLS label. By aggregating calls into tunnels, network scalability is enhanced. Intermediate nodes need not be aware of individual calls; they deal only in the aggregated tie lines. All of the voice calls within the tunnel are marked with the same MPLS label. At each intermediate node, the packets belonging to the voice flow are identified solely by the MPLS label information. Intermediate nodes have a simple, scalable means of applying the proper queuing treatment to the flow. If the number of calls received exceeds the call capacity of the tunnel, several options exist. One is that a dynamic capacity-adjustment strategy can be used: as a tunnel nears its call capacity, the voice gateway can signal a new tunnel or request increased capacity for the existing tunnel.
A second option is that the gateway may address the problem at the call control level by routing through an alternate gateway or directing the call off-net. A third option is that the call can simply be blocked and a busy signal returned. In all options, however, the call load is carefully regulated to not exceed the reserved network resources. Thus, because MPLS provides an effective and scalable means of allocating resources—marking voice flows, and matching the number of voice calls to network resources—high-quality voice service can be guaranteed.
IP Header Compression
The IP header represents a large overhead on voice. That overhead can exceed 100 percent; thus, compression of the headers to a minimum is highly desirable. MPLS can be used to achieve this task. A number of well-known header-compression techniques exist. In the data plane one needs a means to uniquely identify the flow of packets of a single call. If a tunnel carries only one call, the tunnel label can be used for this purpose. However, for scalability it makes sense to use a stack of two labels. The outer label represents the tunnel as described above; the inner label is used to identify the compressed flow. Behind the inner label is either a compressed header or, in the case of header suppression, the voice data.
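The overhead arithmetic is easy to check. With the standard header sizes (20-byte IPv4, 8-byte UDP, 12-byte RTP) and common 20-ms packetization, a short calculation shows why compression matters; the 2-byte compressed-header figure is a typical compression result, not a fixed value:

```python
def overhead_pct(header_bytes, payload_bytes):
    """Header overhead as a percentage of the voice payload."""
    return 100.0 * header_bytes / payload_bytes

IP_UDP_RTP = 20 + 8 + 12  # uncompressed IPv4/UDP/RTP headers: 40 bytes

# G.711 at 64 kbps with 20-ms packetization carries 160 payload bytes per packet;
# G.729 at 8 kbps carries only 20 bytes, so the same 40-byte header dominates.
print(overhead_pct(IP_UDP_RTP, 160))  # G.711: 25.0 percent
print(overhead_pct(IP_UDP_RTP, 20))   # G.729: 200.0 percent -- well over 100
# With headers compressed to ~2 bytes behind the inner label:
print(overhead_pct(2, 20))            # 10.0 percent
```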
Chapter Nine
Another concern for voice calls is disruption from a failure of network elements. In IP, when a link or node fails, IP automatically proceeds to find a new route. The task can, however, be time-consuming—taking more time than voice users may tolerate. Again, MPLS offers a solution. TE tunnels can be locally repaired by sending the packets over a predetermined and preestablished backup tunnel. Because the repair is made locally, the time to repair can be well under 50 ms, which is the standard currently used in the PSTN.
There are two divergent schools of thought regarding MPLS as a technology. One school views MPLS as an important enhancement to IP networks, allowing the creation of new services; the other school views MPLS as a technology in its own right—a frame-based successor to ATM. Correspondingly, as noted previously, there are two approaches to carrying voice in MPLS networks. One approach maps voice directly to MPLS as a service layer; the other maps voice to the IP service layer, which, in turn, uses MPLS to provide the enhanced IP services outlined above. The IETF has standardized both (1) VOIP and (2) IP over MPLS. What about VOMPLS?
VOIPOMPLS Proposal
A proposal for VOMPLS was brought to the IETF in March 2000. It was rejected by the IETF in favor of VOIPOMPLS. The VOMPLS proposal has since been taken up by the MPLS Forum. There were two arguments that carried the IETF’s decision. The first concerned the ubiquity of connectivity offered by IP as representing much of the promise of VOIP. Telephone networks can be extended across LANs and cable networks, and building wiring can be simplified. Eliminating IP from the protocol stack limits the connectivity to the edge of the MPLS cloud. Thus, VOMPLS is not appropriate for bringing voice to the desktop workstation. The second argument concerned efficiency. Because IP overhead is so expensive, the need for a VOMPLS standard was postulated. However, when header compression/suppression is used, the data on the wire looks close to that which VOMPLS offers. Thus, with VOIPOMPLS one can have both efficiency and ubiquitous connectivity. Some argue that VOMPLS cannot replace VOIP. The ubiquity of connectivity does not exist for MPLS as it does for IP. VOIPOMPLS with header suppression can obviate the need for VOMPLS. According to suppliers (Cisco sources), many customers appear to be interested in the former and few in the latter. Indeed, MPLS can be used to eliminate IP from the stack, but the benefits of doing so are far from obvious, according to some observers. MPLS does offer benefits to VOIP. The efficiency of header compression and the scalability of aggregating flows into simple labels both offer economic benefits. MPLS QoS and fast restoration offer the means of creating high-quality voice service. Figure 9.14 depicts one application example that uses Cisco routers [21].
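The efficiency argument can be made concrete with a rough bytes-on-the-wire comparison. The header sizes below are assumptions for illustration (4-byte MPLS label entries, a ~2-byte compressed IP/UDP/RTP header, and an assumed 4-byte VOMPLS bearer header), not figures taken from the IA:

```python
# Bytes on the wire per 20-byte G.729 voice frame under each stack (assumed sizes).
MPLS_LABEL = 4  # one MPLS label stack entry
PAYLOAD = 20    # G.729, 20-ms frame

voipompls_plain      = PAYLOAD + 12 + 8 + 20 + 2 * MPLS_LABEL  # RTP/UDP/IP + tunnel/flow labels
voipompls_compressed = PAYLOAD + 2 + 2 * MPLS_LABEL            # headers compressed to ~2 bytes
vompls               = PAYLOAD + 4 + MPLS_LABEL                # assumed 4-byte VOMPLS header

for name, size in [("VOIPOMPLS (uncompressed)", voipompls_plain),
                   ("VOIPOMPLS (compressed)", voipompls_compressed),
                   ("VOMPLS", vompls)]:
    print(f"{name:26s} {size:3d} bytes/frame")
```

Under these assumptions, compressed VOIPOMPLS lands within a couple of bytes per frame of VOMPLS, which is the substance of the IETF's efficiency argument.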
Figure 9.14 MPLS-supported VoIP. (The figure shows enterprise LANs and traditional telephony feeds entering the MPLS network through CE devices, VoIP gateways, and PE routers at central offices; regular TE tunnels carry Internet and VPN service, while GB-TE tunnels carry voice trunking and toll bypass alongside the PSTN’s traditional TDM network and Class 5 legacy switches. Legend: GB = “Guaranteed Bandwidth”; CE = Customer Equipment; PE = Provider’s Equipment.)
MPLS Forum Specification
The purpose of the newly released MPLS Forum’s Implementation Agreement (IA), VOMPLS—Bearer Transport Implementation Agreement (July 2001), is to define a method for conveying voice directly over MPLS without first encapsulating the voice sample in IP. There are many possible arrangements in which voice may be carried in an MPLS environment, of which two are the most commonly discussed:
1. VOIPOMPLS, where the typical protocol stack would contain voice samples encapsulated in IP layer protocols (for example, RTP/UDP/IP) followed by encapsulation in the MPLS protocol. Compressed headers may be used in some implementations. The result is then conveyed by an MPLS transport arrangement, such as FR, ATM, PPP, or Ethernet.
2. VOMPLS—without the IP encapsulation of the voice packet—where the typical protocol stack would consist of voice samples encapsulated in the MPLS protocol on top of an MPLS transport arrangement, such as FR, ATM, PPP, or Ethernet.
The first arrangement, VOIPOMPLS, is essentially a method of implementing VOIP and is largely supported by existing IETF standards. It is not the subject or purpose of the MPLS Forum’s IA. The second arrangement, VOMPLS, provides a very efficient transport mechanism for voice in the MPLS environment and is the subject and purpose of the MPLS Forum’s IA. There are many similarities between this arrangement and other architectures in use today for voice over ATM (VOATM) and voice over Frame Relay (VOFR). The IA defines how a voice payload is encapsulated directly in the MPLS frame. It defines a VOMPLS header format supporting various payload types, including audio, dialed digits (DTMF), channel-associated signaling, and a silence insertion descriptor. The IA-defined VOMPLS bearer-transport header formats differ from the RTP formats used in VOIP.
The IA defines MPLS support for the transport of digital voice payloads and describes frame formats and procedures required for voice transport. Also, it addresses the following functions:
1. Transport of uncompressed (that is, G.711—64 kbps) and compressed voice within the payload of an MPLS frame, as well as support for a diverse set of voice-compression algorithms
2. Silence removal and silence insertion descriptors
3. Dialed digits (DTMF information)
4. Channel-associated signaling bits
The IA does not define algorithms for encoding audio streams. It references existing algorithms and specifies how the bits that they output are conveyed
within an MPLS packet structure. Support for the unique needs of the different voice-compression algorithms is accommodated with algorithm-specific transfer syntax definitions, which establish algorithm-specific frame formats and procedures. Transport of supporting information for voice communication, such as signaling indications (for example, ABCD bits) and dialed digits, is also provided through the use of transfer syntax definitions specific to the information being sent.
References
1. D. Minoli. Planning and Designing Voice over MPLS Networks and Voice over IP over MPLS Networks. New York: McGraw-Hill, 2002.
2. A. Kankkunen. Voice over MPLS Framework, Internet Draft. draft-kankkunen-vompls-fw-01.txt. July 19, 2000.
3. Future Software Limited White Paper. MultiProtocol Label Switching. Chennai, India, 2001. www.futsoft.com.
4. International Engineering Consortium tutorial material on MPLS.
5. International Engineering Consortium notes. www.iec.org.
6. S. Bradner. “The Future of the Net.” NGN 2001. Boston, MA.
7. B. Jamoussi. MPLS Working Group. “Constraint-Based LSP Setup using LDP.” February 2001.
8. C. Semeria. RSVP Signaling Extensions for MPLS Traffic Engineering. Juniper Networks, Inc., White Paper. Sunnyvale, CA, 2000. www.juniper.net.
9. R. Braden. “Resource ReSerVation Protocol (RSVP)—Version 1 Functional Specification.” RFC 2205 (September 1997).
10. D. Awduche. Extensions to RSVP for LSP Tunnels, Internet Draft. draft-ietf-mpls-rsvp-lsp-tunnel-08.txt. February 2001.
11. L. Martini. Transport of Layer 2 Frames over MPLS, Internet Draft. draft-martini-l2circuit-trans-mpls-08.txt. November 2001.
12. A. Malis. SONET/SDH Circuit Emulation Service over MPLS (CEM) Encapsulation, Internet Draft. draft-malis-sonet-ces-mpls-05.txt. July 23, 2001.
13. E. Rosen, A. Viswanathan, and R. Callon. “Multiprotocol Label Switching Architecture.” RFC 3031 (January 2001).
14. R. Pulley and P. Christensen. A Comparison of MPLS Traffic Engineering Initiatives. NetPlane Systems, Inc., White Paper. Westwood, MA. www.netplane.com.
15. RHK Inc. Market Report. Switching and Routing 1H01 Market Share. August 2001.
16. J. Zeitlin. “Voice QoS in Access Networks—Tools, Monitoring and Troubleshooting.” NGN 2001 Proceedings.
17. R. Braden, D. Clark, and S. Shenker. “Integrated Services in the Internet Architecture: An Overview.” RFC 1633 (June 1994).
18. F. Tommasi, S. Molendini, and A. Tricco. Integrated Services across MPLS Domains Using CR-LDP Signaling, Internet Draft. May 2001.
19. A. Mankin (ed.), F. Baker, B. Braden, S. Bradner, M. O’Dell, A. Romanow, A. Weinrib, and L. Zhang. “Resource ReSerVation Protocol (RSVP)—Version 1 Applicability Statement—Some Guidelines on Deployment.” RFC 2208 (September 1997).
20. F. Baker, C. Iturralde, F. Le Faucheur, and B. Davie. Aggregation of RSVP for IPv4 and IPv6 Reservations, Internet Draft. draft-ietf-issll-rsvp-aggr-04. April 2001.
21. Cisco Systems White Paper. Voice Trunking and Toll-Bypass Trunking Using Cisco MPLS diffserv-Aware Traffic Engineering. 2001.
Notes
1. According to The ATM and IP Report (April/May 2001, page 1), “Carriers have $17B invested in ATM infrastructure with no replacement in sight.”
2. However, a number of major North American carriers have announced MPLS trials and interest as of this writing. For example, Verizon and Qwest announced MPLS testing in spring 2001, and WorldCom and AT&T announced IP-VPNs with QoS in summer 2001.
3. However, if MPLS is actually deployed at the core of the network in support of VOMPLS, as discussed in Chapter 1, the QoS can also initially be targeted for the core.
4. This section is based on the succinct sidebar “MPLS Enhancements to VoIP,” by G. Swallow, Cisco Systems (May 28, 2001; mplsworld.com).
5. Note that in this context, RSVP (more precisely, RSVP-TE) is used to set up the LSP tunnels for TE, not to support the QoS apparatus per se (as would be the case in intserv).
6. Tie lines, however, are more the purview of private PBX networks—the PSTN needs trunks, not tie lines.
CHAPTER 10 Telephone Number Mapping (ENUM)

10.1 Introduction
The point has been made in previous chapters of this book that the true market opportunity for VOIP must be in the PSTN, since the enterprise intranet market is limited. Although the volume of data is thirteen times that of voice in 2002 and is projected to be twenty-three times that of voice by 2006, at present voice still brings in about 80 percent of revenue for carriers; the U.S. voice revenue for local plus long distance is around $200 billion per year, whereas the global revenue is around $800 billion, including that for mobile services (see Figure 10.1).1–3 Even in “greenfield” applications, such as cable TV telephony and wireless networks, interworking with the PSTN is crucial. Therefore, PSTN voice is a desirable market to optimize (with new technologies) and/or penetrate. This chapter looks at one facet of what is needed to effect a packet/circuit integration in the PSTN for basic services in addition to advanced new services. That facet is telephone number mapping, or ENUM (RFC 2916; September 2000). ENUM is a proposal that defines domain name system (DNS)–based architecture and protocols for mapping a telephone number to a set of attributes (for example, URLs) that can be used to contact a resource associated with that number. This IETF protocol will help converge the PSTN and the IP network, for it supports the mapping of a telephone number from the PSTN to Internet services. Directory services are part of an overall networking functionality—X.500 directory services, for example. VOIP must support effective addressing and address
Figure 10.1 Global telecommunications revenue. (Source: ITU. The chart projects service revenue growth in US$ billions from 1990 through 2002—actual and projected—broken out into domestic telephone/fax, international, mobile, and other revenue: data, Internet, leased lines, telex, etc.)
translation. These functions can be supported in a customer-resident gateway, but for full scalability, the directory function is best located in the network at large. A protocol is then needed to support various interactions with the directory. ENUM is an approach proposed to support a number of directory-related functions. With ENUM, the telephone number can also serve as the basis for a person’s e-mail address. This arrangement allows a person to reach multiple services by knowing a single contact address: a telephone number. ENUM supports a capability that takes “a telephone number in and gives a URL out.” The protocol takes a complete, international telephone number and resolves it to a series of URLs by using a DNS-based architecture. ENUM was developed as a solution to how one can find services on the Internet by using only a telephone number and how telephones, which have an input mechanism limited to twelve keys on a keypad, can be used to access Internet services [1]. Because ENUM puts telephone numbers into the DNS, it allows for a gamut of applications based only on a given telephone number. Proponents see the most promising application as an improvement in VOIP for telephone calls made over the Internet; additional applications include addressing for fax machines, e-mail, instant messaging, and Web sites. No current consensus exists on how to tackle the issue of convergence. Although the technical issue is somewhat straightforward, the politics are sensitive. The issue has generated controversy because “number administration” has intrinsic prestige, influence, and power. The issue is particularly thorny in the international arena. People who control addressing control certain aspects of network operation, design, and even ownership. There is competition that pits the heavily regulated telephone industry against Internet entities that are uninterested in government regulation.
Some countries are involved in the debate out of concern that a merged network could undermine state-owned telephone networks [2].
10.2 Background
The convergence can be facilitated—that is, the PSTN can be linked organically to the Internet—by making the telephone number part of an Internet address. ENUM [1] was developed as a potential solution to how network elements can find services on the Internet by using only a telephone number and to how telephones, having an input mechanism limited to twelve keys on a keypad, can be used to access Internet services. At its most basic level, ENUM aims to facilitate the convergence of PSTN and IP networks; it is the mapping of a telephone number from the PSTN to Internet functionalities. ENUM has a number of meanings. For example, it is the name of a protocol that resolves fully qualified telephone numbers to fully qualified domain name addresses by using a DNS-based architecture; it is the name of an IETF working group chartered to develop protocols that map telephone numbers to resources found on the Internet using the DNS; and it is the title of RFC 2916, the approved protocol document that discusses the use of DNS for the storage of ITU-T E.164 numbers and the available services connected to an E.164 number. It should be noted that ENUM changes neither the telephone numbering plan nor its administration in any way; moreover, it will not drain already scarce numbering resources, because it uses existing numbers. E.164 is the specification of the international telephone numbering plan administered by the ITU that specifies the format, structure, and administrative hierarchy of telephone numbers. Specifically, E.164 refers to the ITU document that describes the structure of telephone numbers. The ITU issues country codes to each nation; the administration of telephone numbers within each country is governed by that country’s telecommunications regulatory agency. A fully qualified E.164 number is designated by a country code, an area or city code, and a telephone number.
For example, a fully qualified E.164 number for the telephone number 555-1234 in Washington, D.C. (area code 202) in the United States (country code 1) is +1-202-555-1234. E.164 numbers are appropriate for use in ENUM because they are an existing system for global traceability. Under the ENUM proposal, the number 1-202-555-1234 would become 4.3.2.1.5.5.5.2.0.2.1.e164.arpa as an Internet address (that is, the telephone number backward—separated by periods—with the extension .e164.arpa added in). The system would recognize both addresses as belonging to the same individual or entity.4 The .e164.arpa extension was picked to appease both camps; E.164 is the specification for the carrier’s numbering scheme, as noted, while arpa refers to the Advanced Research Projects Agency (ARPA), the U.S. agency that funded much of the Internet work in the 1970s and 1980s. Telephone numbers currently identify many different types of end terminals, supporting different services and protocols, and are used to identify telephone stations, fax machines, pagers, data modems, e-mail clients, text terminals for the hearing impaired, and so on. A prospective caller may wish to discover which ser-
vices and protocols are supported by the terminal named by a given telephone number. The caller may also require more information beyond simply the telephone number to communicate with the terminal. As an example, certain telephones can receive short e-mail messages. The telephone number does not embody sufficient information to send e-mail; the sender must have more information equivalent to that in a mailto: URL. From the perspective of the person receiving the call, the owner of the telephone number or device may wish to control the information that prospective callers may receive. The architecture must allow for different service providers competing openly to furnish the directory information required by clients to reach the desired telephone numbers. To address these issues, the IETF Working Group recently specified a DNS-based architecture and set of protocols that fulfill the following requirements:
1. The system must enable resolving input telephone numbers into a set of URLs that represent different ways to start communication with a device associated with the input phone number.
2. The system must scale to handle quantities of telephone numbers and queries comparable to those of current PSTN usage. It is highly desirable that the system respond to queries with a speed comparable to that of current PSTN queries, including in the case of a query failure.
3. The system must have some means to insert the information needed to answer queries into the servers via the Internet. The source of this information may be individual owners of telephone numbers (or the devices associated with those numbers), or it may be service providers that own servers that can answer service-specific queries. The system is designed not to preclude the insertion of information by competing service providers, in a manner that allows for the source of the information to be authenticated.
4. The system must enable the authorization of requests and updates.
5. The effort must carefully consider and document the security and performance requirements for the proposed system and its use.
6. The effort must take into account the impact of developments in local number portability on the proposed system.
Naturally, the protocol put forth needs to consider how number resolution via the ENUM system is affected by the PSTN infrastructure for telephone numbering plans, such as the ITU-T E.164 standard. Figure 10.2 provides a pictorial view of the operation of ENUM [3]. Figure 10.3 depicts an ENUM deployment with softswitches [3]. Figure 10.4 shows typical access to ENUM services. The following are not goals of ENUM:
1. ENUM does not develop any protocol or system for routing calls of a specific service or for locating gateways to a specific service. One example of
Figure 10.2 Pictorial operation of ENUM: the practical value of deploying ENUM today. (A user “dials” 1 (703) 631-3621; a resolver converts the number to 1.2.6.3.1.3.6.3.0.7.1.e164.com; the NetNumber ENUM Service returns a NAPTR record with sip: and smtp: URLs; and the call terminates on the second provider’s network—via a SIP proxy, gatekeeper, softswitch, or IP PBX—without ever hitting the PSTN. The NetNumber ENUM Service provides a global directory shared among service providers, using conventional telephone numbers as pointers; this approach can accelerate the adoption of IP-based services among end points, applications, and intelligent IP agents.)
such a service is mobile telephony; one example of such a gateway is an IP telephony gateway.
2. ENUM does not develop protocols for the “intelligent” resolution of these queries. In other words, updates to ENUM data are limited to the insertion, update, and removal of URL information and do not include inserting “logic” into the servers (to be used to respond to queries in an intelligent manner). Of course, servers are free to support such intelligent services, but the insertion of such logic is not the object of ENUM standardization.
Documentation developed in the recent past—the Internet Draft Number Portability in the GSTN: An Overview and RFC 2916, E.164 Number and DNS—specifies the architecture and protocols (query, update) of the ENUM system. Proponents argue that a government-sanctioned standard, such as through adoption via the ITU, for a centralized directory system is needed to avoid the consumer confusion that will arise as the plethora of devices used for communicating becomes more pronounced [2].
Figure 10.3 ENUM use with softswitches. (The figure shows SIP termination from a softswitch network: at the ingress of the softswitch network, an ENUM query against the NetNumber ENUM Service provides a second routing option—a SIP-based IP interconnect—alongside the traditional PSTN-based interconnect.)
As inferred from the material presented thus far, the IETF has looked at the numbering issue and generated RFCs and Drafts. Liaisons with the ITU have occurred. Lately, the following U.S. governmental and nongovernmental groups have studied where to best assert their power over the telephone–Internet database:
1. The Federal Communications Commission (FCC), which oversees the U.S. telephone system
2. The Commerce Department, which has taken charge of the Internet’s addressing system
3. The State Department, which has responsibility for cross-border issues and is the department having representation at the ITU
4. The Federal Trade Commission (FTC), which is responsible for protecting consumers (for example, privacy issues)
5. The Internet Corporation for Assigned Names and Numbers (ICANN), which has some jurisdiction over the Internet addressing system5
Because the ENUM plan sits at the point of interaction of the telephone and Internet worlds, it presents a nontrivial policy issue in the United States and elsewhere. One issue that regulators can decide is who will take charge of the database that will map telephone numbers to Internet addresses. Some companies (for example, NeuStar) have lobbied the government to approve the proposal quickly; others (for example, VeriSign) want regulators to steer clear of the debate. Proponents in the meantime continue to press forward. At the fall 2001 Voice on the Net (VON) industry conference in Atlanta, Georgia, the formation of the ENUM Alliance was announced. The ENUM Alliance is the first organization of
Figure 10.4 Access to ENUM service. (The figure shows a NetNumber ENUM SIP proxy reachable over a private IP network and the public Internet, supporting direct proxy access and origination via a SIP phone, IP origination/termination via a SIP proxy with SIP redirect using ENUM, and PSTN origination/termination from a wireline phone via a softswitch with proxy.)
ENUM implementers in the commercial marketplace and represents a major step forward in developing viable, user-driven applications.
10.3 Introduction to ENUM
The average business card today lists a host of information on how to find an individual.6 At the least, it lists an individual’s name and his or her company’s name, address, telephone number, fax number, e-mail address, and Web site URL. It is a great deal of information for one small card; it is also a great deal of information for a person to remember. But what if a business card could list a single point of contact for all media? What if a person could be reached using the information contained in one line of information instead of three or four? Is it possible that network elements can be enabled to find services on the Internet by using only the twelve keys on a telephone keypad? How can telephone numbers be used to access Internet services? ENUM was developed to answer many such questions. It is a solution to how network elements can find services on the Internet by using only a telephone number, and how telephones, having an input mechanism limited to twelve keys on a keypad, can be used to access Internet services. It will provide cost savings and revenue opportunities for both carriers and customers, and it will solve many of the interoperability problems faced by the VOIP industry. To understand the significance of this convergence, one needs to understand the differences between PSTN and IP networks. The telephone network is circuit-switched. A call that originates on that network travels from switch to switch along a dedicated path so that only one call can use that path at a time. Addressing for the PSTN is by telephone number. By contrast, the Internet uses a packet-switched network. Data on this network is segmented into packets that are sent in multiple
routes across the network to their destinations. There is no dedicated path for each call, unlike that which exists over the PSTN. Addressing for the IP network is by uniform resource locator (URL); for example, http://www.redhillco.com, where http is the protocol (hypertext transfer protocol) and www.redhillco.com is the address of the http server. A side-by-side comparison is shown in Table 10.1. ENUM enables what would traditionally be circuit-switched traffic to be carried along a packet-switched network because it matches a circuit address (a telephone number) to a network address (a URL). Because this traffic is not restricted to traveling along a dedicated line, traffic flow becomes more efficient and much more flexible. What will ENUM allow one to do? As an example, ENUM will change the way that VOIP works by removing many limitations of the current VOIP technology. ENUM solves the problem of addressing by translating a telephone number into a URL. Currently, VOIP allows one to make a call over the Internet to others who have subscribed to a specific service, but that system is not interoperable with the PSTN. Technologies in conjunction with ENUM will also allow one to send an e-mail message by using a telephone number, and a subscriber’s e-mail, fax, instant messenger, and telephone will all be reachable by using the same telephone number. If an application is capable of using a URL, it can be enabled for use with ENUM. The workings of ENUM are designed to be invisible to both user and subscriber. The IP network will be accessible by use of either an Internet-enabled telephone or a standard telephone that has access to a softswitch (a switch that allows access to the IP network) or a circuit switch with IP-enabling software added to it. It is important to note that a call placed from an Internet-enabled telephone will reach the PSTN if that phone number cannot be found on the Internet. How ENUM completes a telephone call is described in the next section. 
To avoid confusion regarding what ENUM is and is not, note that ENUM does not change the North American Numbering Plan (NANP) or any other telephone numbering plan, nor does it change telephony numbering or its administration in any way. Because it uses existing assigned telephone numbers, ENUM will not drain already scarce numbering resources; in fact, if properly implemented, ENUM can conserve these resources. ENUM does not change telephone numbers themselves but, instead, translates them into domain names for use by the DNS.

Table 10.1 Comparison of PSTN and IP Networks

Network Aspect   Telephone Network      IP Network
Switch Type      Circuit Switched       Packet Switched
Traffic Type     Voice                  Data
Connection       Dedicated Connection   Multiple Routes, Multiple Sessions
Addressing       Telephone Number       URL

ENUM
does not carry traffic, nor does it set up calls; however, it does allow both the caller and the called party (that is, the ENUM subscriber) to establish preferences regarding how communications are to be handled.
ENUM: An Evolving Architecture
To take advantage of ENUM, a telephone company must first assign a telephone number to the user for services. The user must then register that number for one or more ENUM services. For example, the user might wish to register the number to receive calls at a home or office phone. Additionally, that user might wish to register an e-mail address, as well as a fax machine, to match the telephone number. However the user chooses to set up these services, the information for the registered services will be saved in what are called naming authority pointer (NAPTR) resource records.
To place an ENUM call, first the person initiating the call dials the telephone number as he or she would normally dial it. For instance, the caller dials the number 1-202-555-1234. In cases where the caller dials less than a complete number (for example, a caller within the 202 area code might leave off the 1, or a caller within an office system might dial only 1234), network equipment will re-create the complete form of the number for use with ENUM. Next, the phone number is translated into an address that can be used by the DNS. Because this address is based on a complete international telephone number—say, +1-202-555-1234—a unique Internet address exists for every unique phone number. The phone number 1-202-555-1234, for example, would first be stored as +1-202-555-1234. The 1 is the country code for the United States, Canada, and the seventeen other countries that make up NANP. The + indicates that the number is a complete international telephone number, that is, an E.164 number. E.164 is the name of the international telephone numbering plan administered by the ITU. To determine whether the number and address are ENUM-registered, the telephone number is translated in the following manner:
1. All characters of the number are removed except for the digits—for example, 12025551234
2. The order of the digits is reversed—for example, 43215552021
3. Dots are placed between each digit—for example, 4.3.2.1.5.5.5.2.0.2.1
4. The domain .e164.arpa is appended to the end—for example, 4.3.2.1.5.5.5.2.0.2.1.e164.arpa
E164.arpa has been proposed as the DNS domain for use with ENUM. This designation may change as a result of ongoing discussions between the ITU, the IETF, and other international organizations involved with ENUM. In the event that
Chapter Ten
the international community chooses a different ENUM domain, the structures discussed here will apply to that new designated domain. The .arpa domain is under the administrative management of the Internet Architecture Board (IAB) and has been designated specifically for Internet infrastructure purposes. ENUM is considered appropriate as an infrastructure application because it provides a set of DNS-based resource directories, referenced by phone number, for use by various ENUM-enabled application clients.

The telephone number is reversed because DNS reads addresses from right to left, from the highest level to the lowest level. In this case, a DNS lookup would start at the .arpa domain and continue with .e164. Under e164, it would look for the 1 as the NANP country code; then it would look up each succeeding digit in the telephone number until the address becomes fully resolved. Figure 10.5 shows a number of branches, with top-level domains of .com, .uk, .int, and .arpa. As one can see from Figure 10.5, if DNS begins to search under .arpa, it can then search under .e164, followed by the country code and reversed telephone number. DNS cannot, however, look under .int once it has begun to look in the .arpa tree.

Once the phone number is translated into an Internet address, ENUM issues a DNS query on the domain, as previously described. One of two events can happen:

1. If an authoritative name server is found, ENUM will retrieve the relevant NAPTR resource records, and the call will proceed according to the subscriber's registered services for that number. The telephone call that is connected will be conducted entirely over the Internet, without using the PSTN. This call will be connected in as little time as, or even in less time than, a circuit-switched call.
2. If an authoritative name server cannot be found, ENUM will return a 404 Not Found error to the telephone, will open a connection to the PSTN, and will route the call conventionally.
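The four translation steps described above can be sketched in a few lines of Python. This is only an illustration; the function name and the use of Python are ours and are not part of any ENUM specification.

```python
import re

E164_ROOT = "e164.arpa"  # proposed ENUM root domain; subject to change

def to_enum_domain(number: str, root: str = E164_ROOT) -> str:
    """Translate a complete E.164 telephone number into an ENUM
    domain name, following the four steps described in the text."""
    # 1. Remove all characters except the digits.
    digits = re.sub(r"\D", "", number)
    # 2. Reverse the order of the digits.
    reversed_digits = digits[::-1]
    # 3. Place dots between each digit.
    dotted = ".".join(reversed_digits)
    # 4. Append the ENUM root domain.
    return f"{dotted}.{root}"

print(to_enum_domain("+1-202-555-1234"))
# 4.3.2.1.5.5.5.2.0.2.1.e164.arpa
```

Note that the same function handles any dialing punctuation, since everything except the digits is stripped in step 1.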
[Figure: DNS tree with the root branching to the top-level domains .arpa, .com, .int, and .uk; under .arpa, the .e164 branch leads to country codes 1 (NANP), 4.4 (UK), and 3.3 (FR), and under country code 1, the reversed number 4.3.2.1.5.5.5.2.0.2.]

Figure 10.5 DNS search.
Telephone Number Mapping (ENUM)
Figure 10.6 displays one possibility for a voice call flow using an ENUM lookup. In this case, the subscriber has registered for ENUM services by using the following SIP address: sip:[email protected]. A query based on the telephone number dialed is sent to the DNS server, which returns the SIP address, and the SIP proxy sets up the call. This is only one of a number of ways that ENUM can be used to set up a call. The flow of information will remain the same no matter what the application; of course, depending on the originating and receiving media, the results will vary. The text that follows shows some of the various applications that will be possible under ENUM.
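The NAPTR records returned by such a query carry a substitution expression that rewrites the dialed E.164 number into a URI such as the SIP address above. The following is a minimal sketch of applying that expression, assuming a record whose regexp field is !^.*$!sip:[email protected]! (the helper name and record values are illustrative):

```python
import re

def apply_naptr_regexp(naptr_regexp: str, e164_number: str) -> str:
    """Apply the substitution expression from a NAPTR record's regexp
    field (delimiter, pattern, replacement) to an E.164 number."""
    delim = naptr_regexp[0]  # the first character defines the delimiter
    pattern, replacement = naptr_regexp[1:].rstrip(delim).split(delim)
    # NAPTR back-references (\1..\9) are understood by re.sub directly.
    return re.sub(pattern, replacement, e164_number)

uri = apply_naptr_regexp("!^.*$!sip:[email protected]!", "+12025551234")
print(uri)  # sip:[email protected]
```

The resulting URI is what the SIP proxy uses to set up the call, as in Figure 10.6.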
Defining ENUM Applications

After reading the previous sections, it should be clear that ENUM can be used to register numerous services in addition to what is fast becoming the most popular service: VOIP. Because telephone numbers will be stored in the DNS, any device that has access to the Internet should be able to look up that telephone number as an Internet address. That device will be able to tell what other services are ENUM-registered and can be accessed for that number.

The voice application has been explained in the previous section. Along the same line as telephone use, faxing will become just as easy and efficient. In fact, for fax applications it is even more logical and much more efficient to use the IP network. Circuit-switched networks were designed to carry voice, but fax machines do not send voice traffic—they send data. An Internet-enabled fax machine (or a fax machine on an Internet-enabled circuit-switched network) will have the same basic functionality as an Internet-enabled telephone. As long as the owner of a phone number has registered that number for fax services, another Internet-enabled fax machine will be able to reach it using an ENUM lookup. If not, that
[Figure: A caller dials +1-202-555-1234; the originating SIP proxy sends the query 4.3.2.1.5.5.5.2.0.2.1.e164.arpa? to the DNS server, which responds with SIP:[email protected]; the SIP proxies then perform the SIP call setup to SIP:[email protected].]

Figure 10.6 Example of voice call flow.
fax machine will still be reachable via the PSTN; if a 404 Not Found error is sent to the originating fax machine, that machine will be able to open a connection to the PSTN and dial the number traditionally.

Using e-mail is just as easy. Rather than typing in an e-mail address, the sender could type in the recipient's telephone number. Once again, if that number has been mapped to an e-mail address, and if the e-mail software is ENUM-enabled, the mail will be sent and the address lookup will be invisible to both sender and recipient. In this case, the sender would type in the complete international telephone number; the e-mail client would see that this is a phone number, not an e-mail address. The only changes to e-mail clients will be the need to accommodate ENUM functions that will look up the e-mail address registered for that phone number in the DNS.

The possibilities become even more far-reaching when one considers the number of Internet applications that can be used. Instant messaging could easily be modified for use with telephone numbers, and gone will be the problem of remembering screen names. A fax machine could send a document to an e-mail address, or a computer could e-mail a document to a fax machine. Voice Protocol for Internet Mail (VPIM), a protocol that stores voicemail electronically, could use ENUM as a method to retrieve voicemail from anywhere in the world. Figure 10.7 shows the several types of numbers that may be registered for ENUM services, along with examples of services that can be enabled under ENUM.

In addition to enabling convergence, ENUM will help enable many SIP functions. SIP, in turn, can enhance convergence by enabling convergence services. Some of the applications and services that SIP may enable include traditional call
[Figure: The ENUM directory (DNS) maps E.164 numbers (10-digit, 800, international, and private-network numbers) to Internet domain plane addresses, enabling applications such as voice over IP, unified messaging, instant messaging, IP fax, and personal Web pages.]

Figure 10.7 Services enabled by ENUM.
forwarding, as well as traditional follow-me and do-not-disturb functions; however, many new applications and services used to merge Internet applications with video and voice communications might also be enabled by SIP. As an example of an application that uses SIP resources, a person who has made a telephone connection via his or her computer could be alerted to an incoming call by a prompt on the computer. That person could then make a selection on the computer to end the dial-up session and answer the phone, forward the call to another number, or send the call to voicemail. As another example, a user could transfer a call to a Web page instead of another phone, in which case the call would end and the user's Web browser would open the new page. In terms of ENUM support for applications, remember that any application that can be addressed with a URL might benefit from use with ENUM.
The ENUM Road to Success

It is important to remember that ENUM does not disallow the placing or receiving of traditional PSTN telephone calls and that customers are not restricted to communicating only with other ENUM customers. ENUM gives telephone customers the best of two distinct worlds: the ability to place telephone calls over the Internet when a recipient is an ENUM customer and the ability to place traditional PSTN calls.

The one-contact business card is merely the start of the advantages that ENUM will bring to customers. If an application exists on the Internet, it can potentially be mapped to a telephone number and reached by using ENUM. A large number of applications will be accessible via the ENUM protocol, but by far the most exciting of these applications have to do with IP telephony. Standards organizations and equipment providers have in the last two years been working toward improving the quality of, and developing equipment for, IP telephony. This equipment is meant to increase the quality of VOIP significantly, so that a telephone call made over the Internet will sound as good as or better than a traditional circuit-switched call and will be connected faster. The QoS for VOIP has been improved greatly over versions of the technology that existed just two years ago, not only because of improvement in the quality of equipment but also because of incredible advances in broadband technology.

The time for true IP telephony has come. There are 400 million telephone numbers and over 130 million Internet customers in the United States alone, and because both DNS and E.164 are consistent worldwide, ENUM is not just a U.S. system but a global one.7 Toward the goal of convergence are unstoppable trends that are destined to create fantastic opportunities for both the telephony and Internet industries. According to self-identified proponents, the Internet has matured to the point where true VOIP is not only possible but inevitable.
The last two years have seen several conferences and meetings of industry and standards organizations to discuss and define ENUM and its potential roles. Equipment manufacturers have spent that time developing Internet-enabled equipment for use with ENUM. Plans for
testing of ENUM are underway, and companies already experienced in administering telephony and IP systems are prepared to take leadership positions in the administration of ENUM. The telephony and Internet industries are on the verge of significant changes. ENUM will do much more than merge all the contact information on a business card into just a single phone number. It will expand the access capabilities that we have with current technology and promote research into new technologies. Once ENUM becomes widespread, it may well enable the greatest changes to the Internet that we have yet seen.
10.4 Summary of Capabilities and Issues
Having given a tutorial description in the previous section, we can now proceed to highlight some features of ENUM-based integration. We do this in a question-and-answer format.
Capabilities

What are the main benefits that ENUM provides to subscribers? ENUM [1] enables Internet-based users to make a selection from a range of services available for communicating with another person when the caller knows only a telephone number or has access only to a telephone keypad.* ENUM allows users to access Internet-based services and resources from Internet-aware telephones, ordinary telephones connected to Internet gateways or proxy services, and other Internet-connected devices where input is limited to numeric digits. ENUM enables users to specify their preferences for receiving incoming communications and gives greater user control over communications. For example, a user can specify a preference for voicemail messages over live calls during certain times of day or may indicate a destination for call forwarding.

What kinds of customers could use ENUM? Although ENUM can be used by individual residential telephone customers, users could also be corporations, government agencies, the military, and many other nonindividual customers.

How is ENUM going to work for the average telephone subscriber? What does this system look like to that subscriber? If an average telephone customer were to make a telephone call using, for instance, an Internet-enabled phone to another Internet-enabled phone, all of the steps in between would be invisible to the customer. To the call initiator and call receiver, this phone call would appear the same as a call made over the PSTN. Subscribers will, of course, have to identify the ENUM services that they wish to use.

*This section is based in its entirety on Reference [1].
What kinds of applications could use ENUM? ENUM links a telephone number to a host or resources on the Internet that can connect the call, either end-to-end over IP networks or through a designated gateway to the PSTN. This capability is useful for connecting SIP- or H.323-compatible endpoints that exist across domain boundaries. Although the potential applications of ENUM are far-reaching, the principal applications for ENUM have centered on two areas: VOIP and VPIM. The long-stated goal of the VOIP industry has been to make a phone call over the Internet as easy and as high-quality as a regular PSTN phone call. The goal of the VPIM industry has been to develop a comprehensive mechanism by which voicemail systems can exchange messages over IP networks. ENUM enables carrier and enterprise voicemail systems to find each other, interoperate, and exchange messages. Although VOIP and VPIM are the most widely discussed applications of ENUM, other application possibilities exist, including but not limited to Internet faxing and instant messaging.

What is .arpa, and why is it the top-level domain for ENUM? Why not create a new top-level domain specifically for ENUM? E164.arpa is the proposed DNS domain for use with ENUM. This designation may change as a result of ongoing discussions between the ITU, the IETF, and other international organizations involved with ENUM. In the event that the international community chooses a different ENUM domain, the structures discussed herein and in IETF RFC 2916 will apply to that new designated domain. The .arpa domain has been designated for Internet infrastructure purposes and is managed by the Internet Assigned Numbers Authority (IANA) in cooperation with the Internet technical community under the guidance of the IAB. A new top-level domain (for example, .e164) was not created, because ENUM is an infrastructure application appropriate for designation within the previously established .arpa domain.
ENUM is considered appropriate as an infrastructure application because it provides a set of DNS-based resource directories, referenced by phone number, for use by various ENUM-enabled application clients, such as telephones, SIP servers, and voice-messaging systems.

Why is the number reversed? DNS names are structured from right to left. In the example used above (4.3.2.1.5.5.5.2.0.2.1.e164.arpa), DNS would first search for the top-level domain arpa, then search second-level domains for e164, then search the next level for the country code 1, and so forth. More information on DNS structuring can be found in RFC 2672, Non-Terminal DNS Name Redirection (www.rfc-editor.org/rfc/rfc2672.txt).

Why are there dots between the numbers? Each dot separates the number into administrative domains, or zones. This separation allows for delegation of authority at various points along the name and eliminates the need for clients to know individual delegation schemes to know where to put the dots.

Will a user have to type in the dots and reversed numbers? No, this will be done by the application (for example, a Web browser) or device (for example, an Internet-enabled telephone) that supports ENUM. The user simply dials a telephone number in the traditional manner.

Will ENUM telephone routing confuse the PSTN routing system? ENUM facilitates the discovery of resources associated with a telephone number. Therefore, it facilitates various applications that identify appropriate peer servers associated with an intended end user. It does not, however, impact how these applications will operate once the location of an end user–associated application server has been established. Consequently, ENUM does not affect application-level functions, such as call routing and signaling, regardless of the underlying technology employed. However, it should be noted that a telephone company's call routing mechanism could use ENUM as well. A core principle of ENUM is that by providing a unified resource directory service, it will not change the existing right-to-use rules and principles for telephone numbers. ENUM is not intended to change how telephone numbers are administered; instead, it will facilitate a wide range of applications by using phone numbers as subscriber names. ENUM also will not interfere with existing PSTN functions and technology, such as circuit switching, Common Channel Signaling System 7 (CCSS7) (ISUP or TCAP), and Intelligent Networking, where similar resource-discovery activities are performed through PSTN legacy technologies.

How will the e164.arpa domain be organized? One convenient way of organizing this domain would be to delegate according to the 243 country codes designated by the ITU. It is important to understand, however, that delegation in DNS can occur at any digit or, in DNS terms, zone domain.
For example, within the root e164.arpa, there would be the following:

• An NS listing for .1.e164.arpa, representing the country code (1) of the NANP (the United States, Canada, and several Caribbean countries)
• An NS listing for .4.4.e164.arpa, representing the country code (44) of the United Kingdom
• An NS listing for .6.4.e164.arpa, representing the country code (46) of Sweden
• An NS listing for .1.8.e164.arpa, representing the country code (81) of Japan
• An NS listing for .8.5.3.e164.arpa, representing the country code (358) of Finland

At the national TN/NS level, further NS delegation (DNAME, CNAME, and PTR) can occur to enterprises, TN/NS application service providers, carriers, and even individuals who have DNS servers in their homes.

Why use DNS? DNS has been selected for use with ENUM because the technology is there—it already works, it scales, and it is global, fast, and open. ENUM-enabled DNS provides a low-cost shared infrastructure for IP services, similar to SCP functionality.
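The reversed-label construction of these country-code delegation points can be illustrated with a short helper (a sketch only; the function name is ours):

```python
def country_code_zone(country_code: str, root: str = "e164.arpa") -> str:
    """Build the DNS zone name at which a country code would be
    delegated under the ENUM root, as in the NS listings above."""
    # Reverse the country-code digits and join them with dots,
    # then append the ENUM root domain.
    return ".".join(reversed(country_code)) + "." + root

for cc in ("1", "44", "46", "81", "358"):
    print(cc, "->", country_code_zone(cc))
# 1 -> 1.e164.arpa
# 44 -> 4.4.e164.arpa
# 46 -> 6.4.e164.arpa
# 81 -> 1.8.e164.arpa
# 358 -> 8.5.3.e164.arpa
```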
What is the effect of e164.arpa deployment on the global DNS system? The answer to this question requires research, such as research into the effect of misdials on the root of e164 (that is, caller specification of a wrong number can result in many additional queries to the e164.arpa root). Additional work will be necessary to advise on the use of ENUM for such applications as determining the level of data caching necessary for relieving stress, suppressing the escalation of poorly formed queries, misdials, and cache misses on the root structure. For telephony applications, performance and load engineering is critical, as query volumes from small- to medium-sized cities alone can easily reach many thousands per second. Response times, as well as transaction loads, must be carefully considered. Conventional DNS caching is of significantly reduced value in ENUM because of the huge size of the name space and relatively even distribution of queries into the name space over arbitrary time intervals. Unlike conventional DNS queries, call volumes are not highly concentrated into a popular small subset of the number space.

What are SRV and NAPTR records? These are DNS resource records that contain information about the resources, services, and applications associated with a specific phone number. These services are determined by the subscriber.

What happens if a user dials a number that cannot be resolved by DNS? Similar to a 404 Not Found message in a Web browser, an error message will be returned to the device or software initiating the call. In the case of a Web browser, a 404 Not Found message will be displayed. In the case of an SIP telephone, ENUM will open a gateway to the PSTN and connect in the traditional way.

What happens if a user dials an emergency number (for example, 911 in the United States and 112 in Europe)? Emergency numbers are generally considered access codes and, as such, are outside of E.164 and ENUM services.
If the user dials an emergency number from an SIP phone, the phone will recognize that it cannot make an SIP connection and will open a gateway to the PSTN.

What protocol does ENUM use for Internet telephony? ENUM itself is protocol-agnostic because it is application-agnostic. It does not specify the applications with which a particular number is associated; instead, it provides a unified way of discovering resources associated with it. It can, for example, work with either H.323 or SIP.

What can be said about articles in the technical press claiming that VOIP just does not work? VOIP, an evolving technology, is in an early but rapidly improving stage of development. It is only a question of when—not if—Internet telephony will become a reality, that is, fully integrated with the existing global telephone service. However, ENUM is not intended solely to facilitate VOIP but, rather, to work with a range of applications (including VOIP) where a telephone number is used as a subscriber name.

How does ENUM relate to SIP? SIP initiates interactive communications sessions between users and also modifies and terminates those sessions. SIP is one protocol that ENUM may use to send out initiation attempts to multiple locations for finding the user who is receiving a call.
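The fallback behavior described above (try an ENUM lookup first and revert to the PSTN when no record is found) can be sketched as follows. The lookup is stubbed with a dictionary rather than a real DNS query; all names and data here are illustrative:

```python
def route_call(e164_number, enum_lookup):
    """Decide how to place a call: attempt an ENUM lookup first, then
    fall back to the PSTN when no record is found. `enum_lookup`
    stands in for a real DNS NAPTR query and returns a URI or None."""
    uri = enum_lookup(e164_number)
    if uri is not None:
        return ("ip", uri)        # connect end-to-end over IP networks
    return ("pstn", e164_number)  # open a gateway to the PSTN instead

# Stub directory standing in for the DNS (illustrative data):
directory = {"+12025551234": "sip:[email protected]"}

print(route_call("+12025551234", directory.get))
# ('ip', 'sip:[email protected]')
print(route_call("+12025559999", directory.get))
# ('pstn', '+12025559999')
```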
What happens to a phone number when a subscriber ports from one service provider to another? When a number ports, the service provider of record changes; that is, the industry recognizes a different service provider as the holder for that particular number. This recognition is important for routing and billing purposes. Subscribers who port from one service to another should still be able to continue using their ENUM-enabled services, assuming that their new service providers support them. The actual location of server resources identified by ENUM will likely change as the subscriber changes any of the underlying service providers. When the user disconnects services, the number goes back to the original service provider's inventory, not to the new service provider's inventory.

What happens to the ENUM services when a subscriber cancels telephone service? As we have seen, the number returns to the communications service provider's inventory for reassignment. The subscriber who cancels telephone services must realize that the carrier that issued that phone number will have to cancel the associations that number has with all ENUM services, even those provided by other service providers. If carrier cancellation is not done, the new user of that telephone number could have a conflict with the old user's services. However, where number portability is available, a user has the option of porting the telephone number over to a new service provider instead of canceling the existing service and losing that current number.

Could ENUM be used to provide telephone number portability? In a country that does not yet have a centralized database administration service, having a shared directory service like ENUM might be of interest. However, ENUM is not intended to serve this function, for significant technical, regulatory, security, and operational limitations exist for using ENUM for this purpose. ENUM is a shared-resource discovery service, not an industry-provisioning service.
In most countries where number portability is deployed, a telephone service provider is generally required to comply with regulatory and industry processes, procedures, and systems, regardless of the underlying technology (SIP, H.323, circuit-switched, and so forth) that it employs for telephony service delivery. How ENUM is administered in those countries will also likely require the mirroring of number-portability and number-administration provisioning rules (for example, antislamming) to ensure that service providers using ENUM-enabled services do not violate applicable regulatory rules or industry guidelines. ENUM is another downstream use of number-provisioning and number-administration activities and will need to be deployed consistent with applicable national requirements, for it does not create an alternate numbering system with its own set of rules and policies.

How is the user of a number authenticated? Users could be corporations, individuals, government agencies, military organizations, and numerous other nonindividual users. Service providers typically assign large blocks of numbers to these entities. The telecommunications manager within these entities then assigns numbers to users; thus, even the service providers cannot identify the users for a large
portion of the allocated numbers. Unresolved at present, this issue is one that must be settled before the deployment of a robust, secure ENUM service. It is likely that a service provider that allocates a number to a user will be involved in the authentication process.

What about private numbering plans within a company? The ENUM protocol can be used in private numbering plans the same way that it can be used in the public E.164 numbering plan. The Internet telephony gateway or proxy needs some intelligence to decode a particular dialing string and then decide how to look up resources for that particular number. Instead of looking for resources in e164.arpa, the gateway or proxy would look for SRV or NAPTR records for private numbers under some other structure, such as e164.bigcompany.com.

Are users going to have to pay to have their telephone numbers ENUM-provisioned? Probably yes; however, the costs will most likely be recovered indirectly through the underlying prices that subscribers pay for ENUM-enabled services. This is a DNS-based system, and someone must pay to have a domain name registered in DNS. Listing telephone numbers will be no different. Whether the cost will be a direct charge to the subscriber or an indirect charge as part of some larger service will depend on those offering the services. It is important to remember that a user does not need to have ENUM list his or her phone number. ENUM would be a subscriber-controlled, opt-in system for announcing over the Internet that a particular telephone number is available to accept service sessions, and how those sessions are to be managed, as a result of the user's having subscribed to an ENUM-enabled service. If a customer does not have an Internet telephony device or service, the associated phone number will likely not be listed. However, a subscriber may not necessarily be aware that he or she has subscribed to such a service; the service provider may have provisioned that service with ENUM on the subscriber's behalf.
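The private numbering plan lookup described above differs from the public case only in the root under which records are sought. A sketch, reusing the e164.bigcompany.com structure mentioned in the text (the function name is ours; a private deployment would choose its own root):

```python
def private_enum_domain(extension: str, root: str = "e164.bigcompany.com") -> str:
    """Translate a private dialing-plan extension into a DNS lookup
    name under a private ENUM-style root rather than e164.arpa."""
    # Keep only the digits, reverse them, and dot-separate, exactly as
    # in the public translation; only the appended root differs.
    digits = "".join(ch for ch in extension if ch.isdigit())
    return ".".join(reversed(digits)) + "." + root

print(private_enum_domain("1234"))
# 4.3.2.1.e164.bigcompany.com
```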
Are users going to have control over how this system is used with their phone numbers? Ultimately, yes. To reiterate, the first principle in the creation and operation of a global ENUM service is that phone number subscribers or their designated representatives are the ultimate decision makers on how a DNS record for a phone number is to be provisioned.

How will the rights of telephone number subscribers be protected? This is an essential question that must be resolved, although a clear statement of policy to protect subscribers should be part of any ENUM system charter. A simple answer to this question is to respect existing regulatory and business rules regarding number administration, slamming, nonreliance, and so on. Only by replicating or reimplementing ENUM analogs to the existing rules of the road will we avoid a wide range of serious administrative, operational, and political conflicts.

How is "slamming," or "hijacking," going to be prevented? Slamming, or the involuntary transfer of service provider, must be avoided in any ENUM system. However, it is a serious problem in the PSTN, and one must be careful not to expect more from Internet services than one is able to guarantee elsewhere. Antislamming fundamentally requires a neutral third-party solution. The U.S. industry is grappling with this issue on long distance services right now; for number portability, it was solved from the outset. Authenticated subscriber access is not a total solution, because if subscribers disconnect their telephony service, they lose rights to the phone number. Consequently, some combination of originator authentication and telephone number rights validation, using new and existing validation sources, can be used to solve the problem, depending on the level of standard required.

Are there any examples of global namespace delegation that should be considered models? The closest technical equivalent is in-addr.arpa, a domain that provides a reverse mapping from IP address to domain name. It is used as part of the Internet infrastructure operation to help authenticate an IP address and identify the operator associated with an IP address. As will be the case with e164.arpa, it is not seen directly by users and is intended for operational infrastructure rather than for direct access by end users. As with e164.arpa, in-addr.arpa allocations are hierarchical, according to the infrastructure administrative structure. For in-addr.arpa, the CIDR address allocation hierarchy is used; for e164.arpa, the hierarchy will be based on the ITU E.164 Recommendation.

What will be the effort to administer the root of the e164.arpa namespace? Any solution ought to require little or no work on the part of the e164.arpa root administrator. Optimally, the root of e164.arpa should contain a small listing of all of the national ENUM top-level country code name servers.

Who can administer the ENUM registry in the short term? ENUM is approaching a stage where the industry will want to begin interoperability testing, for which it will want to use the e164.arpa domain. The interoperability test would follow the same principles that current ones do—that is, no charge, sharing of information, and so forth.
One method of enabling the registry is to develop an RFC that defines the interim delegation principles for IANA, as well as the principles for the transition to the permanent registry.

What can be done in the long term? A formal effort will need to be made to define and establish the structure for the ENUM registry. A potential example of the charter for that effort is as follows:

1. Define the global ENUM service.
2. Perform the task of certifying to IANA those organizations that wish to operate national TN/NS once they have been nominated by their respective nation states. A simple letter could be sent to the appropriate national authorities that asks them how they wish to proceed or whether they even want to participate.
3. Coordinate technical standards for the operation of the ENUM service in cooperation with the IETF.
4. Establish guidelines and policies under which national TN/NS administrators operate.
5. Promote public policy on how ENUM resources should be used.
Oversight for this activity should comprise several constituencies, such as the following:

1. The potential ENUM user community
2. The potential ENUM provider community
3. National governments (at least in an advisory capacity)
4. IAB-IESG representatives
Who will administer the national telephone number name servers? Many competent companies or organizations can operate these servers. Many companies have, in fact, already come forward to express interest in running these servers initially at no charge and on an experimental basis until consensus can be reached regarding how this system is ultimately to be organized.

A number of regulatory constraints exist in various countries that might apply to the ENUM administrator, name service operators, and delegation policies below the national level. For example, in a country where local telephony service competition and number portability are being deployed, it is not unusual for a neutral third party to be required to provide master database administration services, nor is it unusual for a requirement for antislamming and nonreliance on competing carriers to exist for routing or resolution functions.
Advocacy
The NetNumber Alliance [3] is the first and only initiative in the SIP industry to organize and advance SIP-standardized IP-based interconnects. Alliance members include
• CLECs
• Unified Communications (UC) service providers
• Voice portals
The NetNumber "Interconnect with ENUM" Alliance provides a context for advancing SIP interconnect issues; in fact, SIP is expected to become the interconnect standard. ENUM helps maximize end-to-end SIP and helps softswitch traffic terminate to SIP endpoints. The approach uses softswitch, H.323, and SIP technologies. A common view is that SIP is the interconnect standard, as noted above. IP-based interconnects are cheaper, and a "common denominator" protocol is needed for them. SIP standard implementations are relatively uniform, and there is strong momentum behind SIP-based endpoints (for example, UC services and XP clients). A trend exists toward SIP adoption within softswitch architectures.
398
Chapter Ten
NetNumber Alliance objectives include
1. Promotion and testing of practical interoperability of SIP and global ENUM across multiple platforms
2. Provision of a context to resolve technical issues, such as NAT and firewalls
3. Offering of a forum to explore business issues associated with SIP-interconnect origination and termination settlement charges
4. Promotion of the availability of SIP-addressable endpoints to interexchange carriers
10.5 Number Portability
This section provides an overview of ITU-T E.164 telephone number portability (NP) in the Global Switched Telephone Network (GSTN), namely, the worldwide PSTN. As discussed elsewhere, if VOIP is to compete with the traditional telephone network, it needs to support such basic capabilities as transport, signaling, Advanced Intelligent Network functions, NP, and so on, as shown by the following:
Traditional Telephone Services
• Local number portability (LNP)
• Line information database (LIDB)
• Toll-free, prepaid, debit card, credit card, and similar services
• Call forwarding, call waiting, conference calling, and similar services
• Virtual private network (VPN)
• Voicemail (VM)
• Unified messaging (UM)
• Network call center
• Dynamic call routing
• Mobility
There are three types of NP*: (1) service provider number portability (SPNP), (2) location portability (not to be confused with terminal mobility), and (3) service portability [4]. Service provider portability, the focus herein, is a regulatory imperative in many countries seeking to liberalize local telephony service competition by enabling end users to retain preexisting telephone numbers while changing service providers.
*This section (Section 10.5) is based in its entirety on Reference [4].
Implementation of NP within a national GSTN entails potentially significant changes to numbering administration, network element signaling, call routing, call processing, service management, billing, and other functions. NP changes the fundamental nature of a dialed E.164 number from a hierarchical physical routing address to a virtual address, thereby requiring the transparent translation of the latter to the former. In addition, there are various regulatory constraints that establish relevant parameters for NP implementation, most of which are not network technology–specific. Consequently, the implementation of NP behavior consistent with applicable regulatory constraints, as well as the need for interoperation with existing GSTN NP implementations, are relevant topics for numerous areas of IP telephony and VOIP work in progress at the IETF. SPNP is a regulatory imperative in many countries seeking to liberalize telephony service competition, especially local service. Historically, local telephony service, as compared to long distance or international service, has been regulated as a utility-like form of service. Although a number of countries began liberalization (for example, privatization, deregulation, and reregulation) many years ago, the advent of NP is relatively recent (mid-1990s). E.164 numbers can be nongeographic or geographic. Nongeographic numbers do not reveal their location information. Geographic E.164 numbers were intentionally designed as hierarchical routing addresses that can systematically be digit-analyzed to ascertain the country, the serving network provider, the serving end-office switch, and the called party's specific line. As such, without NP a subscriber wishing to change service providers would incur a number change as a consequence of being served off a different end-office switch operated by the new service provider.
The cost and convenience impact to the subscriber of changing numbers is seen as a barrier to competition. Hence, NP has become associated with GSTN infrastructure enhancements for a competitive environment driven by regulatory directives. Forms of SPNP have been deployed or are being deployed widely in the GSTN in various parts of the world, including the United States, Canada, western Europe, Australia, and the Pacific Rim (for example, Hong Kong). Other regions, including South America (for example, Brazil), are actively considering it. As already noted, implementation of NP within a national telephony infrastructure entails potentially significant changes to numbering administration, network element signaling, call routing, call processing, service management, billing, and other functions. NP changes the fundamental nature of a dialed E.164 number from a hierarchical physical routing address to a virtual address. NP implementations attempt to encapsulate the impacts to the GSTN and make NP transparent to subscribers by incorporating a translation function to map a dialed, potentially ported E.164 address into a network routing address (either a number prefix or another E.164
address) that can be hierarchically routed. This approach is roughly analogous to the use of network address translation on IP addresses to enable IP address portability: the address change impact is contained at the edge of the network, while the core retains CIDR blocks that can be route-aggregated by the network service provider to the rest of the Internet. NP bifurcates the historical role of a subscriber's E.164 address into two or more data elements—a dialed, or virtual, address and a network routing address—that must be made available to network elements through an NP translations database, carried by forward call signaling, and recorded on call detail records. Not only are call processing and routing affected, but so is CCSS7 messaging. A number of TCAP-based SS7 message sets use an E.164 address as an application-level network-element address in the Global Title Address (GTA) field of the SCCP message header. Consequently, CCSS7 signaling transfer points (STPs) and gateways need to be able to perform n-digit Global Title Translation (GTT) to translate a dialed E.164 address into its network address counterpart via the NP database. In addition, there are various national regulatory constraints that establish relevant parameters for NP implementation, most of which are not network technology–specific. Consequently, implementations of NP behavior in IP telephony consistent with applicable regulatory constraints, as well as the need for interoperation with existing GSTN NP implementations, are relevant topics for numerous areas of IP telephony work in progress at the IETF. The text that follows describes the three types of NP and four schemes that have been standardized specifically to support SPNP for geographic E.164 numbers. Call routing and database query implementations are described for two regions (North America and Europe) and two industries (wireless versus wireline).
Number portability database (NPDB) interfaces and call routing schemes used in North America and Europe are described to show the variety of standards that may be implemented globally. A glance at global NP implementations is provided. Number pooling is discussed briefly to show how NP is being enhanced in the United States to conserve North American area codes. The conclusion briefly touches upon the potential impacts of NP on IP and telecommunications interoperability.
Types of NP
As there are several types of E.164 numbers (telephone numbers, or just TNs) in the GSTN, there are correspondingly several types of E.164 NP in the GSTN. First, there are so-called nongeographic E.164 numbers, commonly used for service-specific applications such as freephone (800 or 0800). Portability of these numbers is called nongeographic number portability (NGNP). NGNP was deployed in the United States between 1986 and 1992. Geographic number portability (GNP), which includes traditional fixed or wireline numbers as well as mobile numbers that are allocated out of geographic number range prefixes, is sometimes called local number portability (LNP) in the United States.
NP allows telephony subscribers in the GSTN to keep their phone numbers when they change their service providers or subscribed services or when they move to a new location. The ability to change the service provider while keeping the same phone number is called service provider portability (SPNP), or operator portability. The ability to change a subscriber's fixed service location while keeping the same phone number is called location portability. The ability to change the subscribed services (for example, from old, traditional-type telephone service to ISDN services) while keeping the same phone number is called service portability. Another aspect of service portability is to allow subscribers to enjoy their subscribed services in the same manner when they roam beyond their home networks into cellular and/or wireless networks. Mobile number portability (MNP) refers to a specific NP implementation in mobile networks, either as part of a broader NP implementation in the GSTN or on a standalone basis. Where interoperation of LNP and MNP is supported, service portability between fixed and mobile service types is possible. At present, SPNP has been the primary form of NP deployed because of its relevance in enabling local service competition. In the GSTN there are interim NP (INP), or interim LNP (ILNP), and true NP. Interim NP usually refers to the use of remote call forwarding–like measures to forward calls to ported numbers through the donor network to the new serving network. These measures are considered interim relative to true NP, which seeks to remove the donor network, or old service provider, from the call or signaling path altogether. Often, the distinction between interim and true NP is a national regulatory matter tied to the technical and operational requirements imposed on NP in that country.
Implementations of true NP in certain countries (the United States, Canada, Spain, Belgium, and Denmark, for example) may pose specific requirements for IP telephony implementations as a result of regulatory and industry requirements for providing call routing and signaling independent of the donor network or last previous serving network.
SPNP Schemes
Four schemes can be used to support SPNP. These schemes are described briefly in the following text, but first some additional terminology is introduced. The donor network (donor SP) is the network that first assigns a telephone number (for example, TN +1-202-533-1234) to a subscriber out of a number range (for example, +1 202-533) administratively assigned to it. The current service provider (new SP), or new serving network, is the network that currently serves the ported number. The old serving network (or old SP) is the network that served the ported number before the number was ported to the new serving network. Because a TN can be ported many times, the old SP is not necessarily the same as the donor network, except for the first time the TN ports away or if the TN ports back
into the donor network and away again. Although the new SP and old SP roles are transitory as a TN ports around, the donor network is always the same for any particular TN based on the SP to whom the subtending number range was administratively assigned. As noted in the following discussion on number pooling, this enhancement to NP further bifurcates the donor network role into two: the number range, or code holder, network and the block holder network. To simplify the illustration, all of the transit networks are ignored, the originating or donor network is the one that performs the database queries or call redirection, and the dialed directory number has been ported out of the donor network before. It is assumed that the old serving network, the new serving network, and the donor network are different networks to show which networks are involved in call handling and routing and database queries in each of four SPNP schemes discussed here. The port of the number (the process of moving it from one network to another) is assumed to have happened before the call setup and is not included in the call steps. Information carried in the signaling messages to support each of the four schemes is not discussed to simplify the explanation.
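The terminology above can be made concrete with a few lines of code. The following is an illustrative Python sketch (the network names, number range, and function names are invented for this example): the donor network is fixed once by the number-range assignment, while the old-SP and new-SP roles shift with every port.

```python
# Illustrative model of SPNP terminology (hypothetical names and numbers).
# The donor network is determined once by the number-range assignment;
# the "old SP" and "new SP" roles shift with every port event.

RANGE_ASSIGNMENTS = {"+1202533": "NetA"}  # NetA is the holder of +1 202-533

def donor_network(tn: str) -> str:
    """The donor is fixed by the administratively assigned number range."""
    for prefix, operator in RANGE_ASSIGNMENTS.items():
        if tn.startswith(prefix):
            return operator
    raise LookupError("no range assignment for " + tn)

def port_history(tn: str, ports: list) -> list:
    """Return (old_sp, new_sp) pairs for each successive port of the TN."""
    current = donor_network(tn)   # before any port, the donor serves the TN
    history = []
    for new_sp in ports:
        history.append((current, new_sp))
        current = new_sp
    return history

tn = "+12025331234"
hops = port_history(tn, ["NetB", "NetC", "NetA"])
# Donor stays NetA throughout, while old/new SP change per port:
# [("NetA", "NetB"), ("NetB", "NetC"), ("NetC", "NetA")]
```

Note that in the last port the TN happens to return to the donor, illustrating that "old SP equals donor" holds only for the first port away or a port back.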
All-Call Query (ACQ)
Figure 10.8 shows the call steps for the ACQ scheme. Those call steps are as follows:
1. The originating network receives a call from the caller and sends a query to a centrally administered NPDB, a copy of which is usually resident on a network element within its network or accessed through a third-party provider.
2. The NPDB returns the routing number associated with the dialed directory number. (The routing number is discussed later.)
3. The originating network uses the routing number to route the call to the new serving network.
Query on Release (QoR)
Figure 10.9 shows the call steps for the QoR scheme. Those call steps are as follows:
Figure 10.8 ACQ scheme.
Figure 10.9 QoR scheme.
1. The originating network receives a call from the caller and routes the call to the donor network.
2. The donor network releases the call and indicates that the dialed directory number has been ported out of that switch.
3. The originating network sends a query to its copy of the centrally administered NPDB.
4. The NPDB returns the routing number associated with the dialed directory number.
5. The originating network uses the routing number to route the call to the new serving network.
Call Dropback
Figure 10.10 shows the call steps for the dropback scheme, also known as Return to Pivot (RTP). Those call steps are as follows:
1. The originating network receives a call from the caller and routes the call to the donor network.
2. The donor network detects that the dialed directory number has been ported out of the donor switch and checks with an internal network–specific NPDB.
3. The internal NPDB returns the routing number associated with the dialed directory number.
Figure 10.10 Dropback scheme.
4. The donor network releases the call by providing the routing number.
5. The originating network uses the routing number to route the call to the new serving network.
Onward Routing (OR)
Figure 10.11 shows the call steps for the OR scheme, also known as remote call forwarding. Those call steps are as follows:
1. The originating network receives a call from the caller and routes the call to the donor network.
2. The donor network detects that the dialed directory number has been ported out of the donor switch and checks with an internal network–specific NPDB.
3. The internal NPDB returns the routing number associated with the dialed directory number.
4. The donor network uses the routing number to route the call to the new serving network.
Comparison of the Four Schemes
Only the ACQ scheme does not involve the donor network when routing the call to the new serving network of the dialed ported number. The other three schemes involve call setup to, or signaling with, the donor network. Only the OR scheme requires the setup of two physical call segments, one from the originating network to the donor network and the other from the donor network to the new serving network; the OR scheme is therefore the least efficient in its use of network resources. The QoR and dropback schemes set up calls to the donor network first but release the call back to the originating network, which then initiates a new call to the current serving network. For the QoR and dropback schemes, circuits are still reserved one by one between the originating network and the donor network while the originating network sets up the call toward the donor network; those circuits are released one by one when the call is released from the donor network back to the originating network. The ACQ scheme is the most efficient in its use of the switching and transmission facilities for the call.
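The trade-offs just discussed can be tabulated mechanically. The sketch below is illustrative Python only; the field names are invented, and the per-scheme properties are simply transcribed from the comparison above.

```python
# Properties of the four SPNP schemes as described in the text
# (illustrative encoding; field names are invented for this sketch).
SCHEMES = {
    "ACQ":      {"donor_involved": False, "npdb": "centralized"},
    "QoR":      {"donor_involved": True,  "npdb": "centralized"},
    "Dropback": {"donor_involved": True,  "npdb": "internal"},
    "OR":       {"donor_involved": True,  "npdb": "internal"},
}

# For QoR and dropback the first segment (to the donor) is released before
# the second is set up; only OR holds two call segments for the whole call.
for name in ("ACQ", "QoR", "Dropback"):
    SCHEMES[name]["segments_held"] = 1
SCHEMES["OR"]["segments_held"] = 2   # least efficient use of circuits

def donor_free_schemes():
    """ACQ is the only scheme that never involves the donor network."""
    return [n for n, p in SCHEMES.items() if not p["donor_involved"]]

# donor_free_schemes() -> ["ACQ"]
```

A routing-policy engine could use a table like this to pick a scheme per regulatory regime; here it merely restates the chapter's comparison in checkable form.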
Figure 10.11 OR scheme.
Both the ACQ and QoR schemes involve centralized NPDBs from which the originating network retrieves the routing information. Centralized NPDB means that the NPDB contains ported number information from multiple networks. This is in contrast to the internal network–specific NPDB that is used for the dropback and OR schemes. The internal NPDB contains only information about the numbers that were ported out of the donor network. The internal NPDB can be a standalone database that contains information about all or some ported-out numbers from the donor network. It can also reside on the donor switch, where it contains only information about those numbers ported out of the donor switch; in that case, no query to a standalone internal NPDB is required. The donor switch for a particular phone number is the switch to which the number range is assigned from which that phone number was originally assigned. For example, number ranges in the NANP are usually assigned in the form of central office (CO) codes comprising a six-digit prefix formatted as NPA+NXX. Thus a switch serving +1-202-533 would typically serve +1-202-533-0000 through +1-202-533-9999. In major cities, switches usually host several CO codes. The NPA, for numbering plan area (also known as the area code), is three digits long and has the format NXX, where N is any digit from 2 to 9 and X is any digit from 0 to 9. The NXX in the NPA+NXX format is known as the office code, which has the same format as the NPA. When the first number out of an NPA+NXX code is ported to another switch, that NPA+NXX is called a portable NPA+NXX. Similarly, in other national E.164 numbering plans, number ranges cover a contiguous range of numbers. Once a number within that range has been ported away from the donor network, all numbers in that range are considered potentially ported and should be queried in the NPDB. The ACQ scheme has two versions.
One version is for the originating network to always query the NPDB when a call is received from the caller, regardless of whether the dialed directory number is ported; the other version first checks whether the dialed directory number belongs to any portable number range. If it does, an NPDB query is sent; if not, no NPDB query is sent. The former version performs better when there are many portable number ranges; the latter performs better when there are few, at the expense of checking every call to see whether an NPDB query is needed. The latter version is similar to the QoR scheme, except that the QoR scheme uses call setup and relies on the donor network to indicate that the number is ported out before the NPDB query is launched.
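The two ACQ variants amount to a trigger-policy choice, which can be sketched as follows. This is illustrative Python only; the prefix table, NPDB contents, and telephone numbers are invented for the example.

```python
# Illustrative ACQ trigger policies (hypothetical data throughout).
PORTABLE_PREFIXES = {"+1202533", "+1202555"}   # number ranges marked portable
NPDB = {"+12025331234": "+12025380000"}        # dialed DN -> routing number

def acq_always(dialed: str) -> str:
    """Variant 1: query the NPDB on every call."""
    # Nonported numbers come back unchanged from the NPDB.
    return NPDB.get(dialed, dialed)

def acq_prefix_check(dialed: str) -> str:
    """Variant 2: query only if the number falls in a portable range."""
    if dialed[:8] in PORTABLE_PREFIXES:        # "+1" + NPA + NXX = 8 chars here
        return NPDB.get(dialed, dialed)
    return dialed                              # prefix not portable: no query

# acq_prefix_check("+12025331234") -> "+12025380000" (portable range, ported)
# acq_prefix_check("+12127771234") -> "+12127771234" (no query performed)
```

Variant 2 trades a cheap local prefix check on every call for fewer remote NPDB queries, which matches the text's observation that it wins only while portable ranges remain scarce.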
Database Queries in the NP Environment
As indicated earlier, the ACQ and QoR schemes require a switch to query the NPDB for routing information. Various standards have been defined for switch-to-NPDB interfaces. Those interfaces, with their protocol stacks, are described briefly in the following text. NPDB here refers to a standalone database that may
support one, some, or all of the interfaces discussed. The NPDB query contains the dialed directory number; the NPDB response contains the routing number. Certainly, other information is sent in the query and response; however, the primary interest is to get the routing number from the NPDB to the switch for call routing.
The United States and Canada
One of the following five NPDB interfaces can be used to query an NPDB:
1. Advanced Intelligent Network (AIN), using the American National Standards Institute (ANSI) version of the Intelligent Network Application Part (INAP) [5, 6]. The INAP is carried on top of the protocol stack that includes the ANSI Message Transfer Part (MTP) Levels 1 through 3, the ANSI Signaling Connection Control Part (SCCP), and the ANSI Transaction Capabilities Application Part (TCAP). This interface can be used by wireline or wireless switches, is specific to the NP implementation in North America, and is modeled on the Public Office Dialing Plan (PODP) trigger defined in the AIN 0.1 call model.
2. Intelligent Network (IN), similar to the interface used for querying 800 databases, carried on top of the protocol stack that includes the ANSI MTP Levels 1 through 3, ANSI SCCP, and ANSI TCAP. This interface can be used by wireline or wireless switches.
3. ANSI IS-41 [7], carried on top of the protocol stack that includes the ANSI MTP Levels 1 through 3, ANSI SCCP, and ANSI TCAP. This interface can be used by IS-41–based cellular/personal communication services (PCS) wireless switches, such as AMPS, TDMA, and CDMA systems. Cellular systems use spectrum in the 800-MHz range; PCS systems use spectrum in the 1900-MHz range.
4. Global System for Mobile Communication Mobile Application Part (GSM MAP) [8], carried on top of the protocol stack that includes the ANSI MTP Levels 1 through 3, ANSI SCCP, and International Telecommunication Union—Telecommunication Sector (ITU-TS) TCAP. It can be used by PCS 1900-MHz wireless switches that are based on GSM technologies. GSM is a series of wireless standards defined by the European Telecommunications Standards Institute (ETSI).
5. ISUP triggerless translation, in which NP translations are performed transparently to the switching network by the signaling network (STPs or signaling gateways).
ISUP IAM messages are examined to determine whether the Called Party Number (CdPN) field has already been translated; if the field has not been translated, an NPDB query is performed and the appropriate parameters in the IAM message are modified to reflect the results of the
translation. The modified IAM message is forwarded by the signaling node to the designated DPC in a transparent manner to continue call setup. The NPDB can be integrated with the signaling node or accessed via an API, locally or by query to a remote NPDB using a proprietary protocol or the four schemes described previously. Wireline switches can use interfaces 1, 2, or 5 from the preceding list; IS-41–based wireless switches can use interfaces 1, 2, 3, or 5; PCS 1900-MHz wireless switches can use interfaces 1, 2, 4, or 5. In the United States, SPNP will be supported by both wireline and wireless systems, not only within the wireline or wireless domain but also across the wireline/wireless boundary. However, this is not true in Europe, where SPNP is usually supported only within the wireline or wireless domain, not across the wireline/wireless boundary, because of the explicit use of service-specific number-range prefixes to avoid caller confusion about the call charge. GSM systems in Europe are assigned distinctive destination network codes, and the caller pays a higher charge when calling a GSM directory number.
Europe
One of the following three interfaces can be used to query an NPDB:
1. Capability Set 1 (CS1) of the ITU-TS INAP [9], which is carried on top of the protocol stack that includes the ITU-TS MTP Levels 1 through 3, ITU-TS SCCP, and ITU-TS TCAP.
2. Capability Set 2 (CS2) of the ITU-TS INAP [10], which is carried on top of the protocol stack that includes the ITU-TS MTP Levels 1 through 3, ITU-TS SCCP, and ITU-TS TCAP.
3. ISUP triggerless translation, in which NP translations are performed transparently to the switching network by the signaling network (STPs or signaling gateways). ISUP IAM messages are examined to determine whether the CdPN field has already been translated; if the field has not been translated, an NPDB query is performed and the appropriate parameters in the IAM message are modified to reflect the results of the translation. The modified IAM message is forwarded by the signaling node to the designated DPC in a transparent manner to continue call setup.
Wireline switches can use interfaces 1, 2, or 3; however, all of the implementations in Europe so far are based on CS1. As indicated previously, NP in Europe does not cross the wireline/wireless boundary. The wireless switches can also use interfaces 1 or 2 to query the NPDBs if those NPDBs contain ported wireless directory numbers. MNP is used to support SPNP by GSM networks in Europe. In most if not all cases in Europe, calls to wireless directory numbers are first routed to the wireless donor network; there, an internal NPDB is queried to
determine whether the dialed wireless directory number has been ported out. In this case, the interface to the internal NPDB is not subject to standardization. MNP in Europe can also be supported via the MNP Signaling Relay Function (MNP-SRF). An internal NPDB or a database integrated at the MNP-SRF is used to modify the SCCP Called Party Address parameter in GSM MAP messages so that they can be redirected to the wireless serving network. (Call routing involving MNP is explained later.)
Call Routing in the NP Environment
This section discusses call routing after the routing information is retrieved, either through an NPDB query, through an internal database lookup at the donor switch, or from the ISUP signaling message (for example, in the dropback scheme). For the ACQ, QoR, and dropback schemes, it is the originating network that has the routing information and is ready to route the call; for the OR scheme, it is the donor network. A number of triggering schemes may be employed for determining where in the call path the NPDB query is performed. In the United States, for domestic calls, the originating local carrier performs the query; otherwise, the long distance carrier is expected to do so. To ensure independence from the actual trigger policy employed in any one carrier, forward call signaling is used to flag that an NPDB query has already been performed and, therefore, to suppress any subsequent NP triggers that may be encountered in downstream switches in downstream networks. This approach allows the earliest able network in the call path to perform the query without introducing the additional costs and call setup delays of redundant queries performed downstream.
The United States and Canada
In the United States and Canada, a ten-digit NANP number called a location routing number (LRN) is assigned to every switch involved in NP. In the NANP, a switch is not reachable unless it has a unique number range (CO code) assigned to it. Consequently, the LRN for a switch is always assigned out of a CO code assigned to that switch. The LRN assigned to the switch currently serving a particular ported telephone number is returned as the network routing address in the NPDB response. The SPNP scheme adopted in North America is therefore very often referred to as the LRN scheme, or LRN method. The LRN serves as a network address for terminating calls to ported numbers served off that switch. The LRN is assigned by the switch operator using any of the unique CO codes (NPA+NXX) assigned to that switch. The LRN is considered a nondialable address because the same ten-digit number value may be assigned to a line on that switch. A switch may have more than one LRN.
During call routing and processing, a switch performs an NPDB query to obtain the LRN associated with the dialed directory number. NPDB queries are performed for all dialed directory numbers whose NPA+NXX codes are marked as portable at that switch. When formulating the ISUP IAM to be sent to the next switch, the switch puts the ten-digit LRN into the ISUP CdPN parameter and the originally dialed directory number into the ISUP Generic Address parameter (GAP). A new code in the GAP is defined to indicate that the address information in the GAP is the dialed directory number. A new bit in the ISUP Forward Call Indicator (FCI) parameter—the Ported Number Translation Indicator (PNTI) bit—is set to indicate that an NPDB query has already been performed. Downstream switches will not perform an NPDB query if the PNTI bit is set. When the terminating switch receives the IAM and sees the PNTI bit set in the FCI parameter and its own LRN in the CdPN parameter, it retrieves the originally dialed directory number from the GAP and uses that number to terminate the call. A dialed directory number with a portable NPA+NXX does not imply that the directory number has been ported; the NPDBs currently do not store records for nonported directory numbers. In that case, the NPDB returns the dialed directory number itself instead of an LRN. The switch then sets the PNTI bit but keeps the dialed directory number in the CdPN parameter. In the real-world environment, the originating network is not always the one that performs the NPDB query. For example, it is usually the long distance carriers that query the NPDBs for long distance calls. In that case, the originating network operated by the local exchange carrier (LEC) simply routes the call to the long distance carrier that is to handle that call.
A wireless network acting as the originating network can also route the call to the interconnected local exchange carrier network if it does not want to support the NPDB interface at its mobile switches.
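The LRN processing described above can be sketched end to end. The following is an illustrative Python sketch, not real ISUP coding: the dictionary keys abbreviate the CdPN, GAP, and FCI/PNTI fields, and the portable code, LRN, and directory numbers are invented.

```python
# Sketch of the North American LRN method (hypothetical numbers throughout).
PORTABLE_CODES = {"202533"}             # NPA+NXX codes marked portable
NPDB = {"2025331234": "2025380000"}     # dialed DN -> LRN of current serving switch

def originate(dialed: str) -> dict:
    """Querying switch builds the outgoing IAM (simplified field names)."""
    iam = {"cdpn": dialed, "gap": None, "pnti": False}
    if dialed[:6] in PORTABLE_CODES:            # portable NPA+NXX: query the NPDB
        result = NPDB.get(dialed, dialed)       # nonported DNs come back unchanged
        if result != dialed:                    # ported: route on the LRN
            iam["cdpn"], iam["gap"] = result, dialed
        iam["pnti"] = True                      # suppress downstream queries
    return iam

def terminate(iam: dict, my_lrn: str) -> str:
    """Terminating switch recovers the dialed DN to complete the call."""
    if iam["pnti"] and iam["cdpn"] == my_lrn:
        return iam["gap"]                       # originally dialed number from GAP
    return iam["cdpn"]

iam = originate("2025331234")
# iam carries the LRN in cdpn, the dialed DN in gap, and pnti set.
terminate(iam, my_lrn="2025380000")             # -> "2025331234"
```

The nonported case falls out naturally: the NPDB echoes the dialed number, so the CdPN is left untouched while the PNTI bit still suppresses further queries.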
Europe
In Europe, a routing number is prefixed to the dialed directory number. The ISUP CdPN parameter in the IAM will contain the routing prefix and the dialed directory number. For example, the United Kingdom uses routing prefixes with the format 5XXXXX; Italy uses C600XXXXX. The networks use the information in the ISUP CdPN parameter to route the call to the new/current serving network. The routing prefix can identify either the current serving network or the current serving switch of a ported number. In the former case, another query to the internal NPDB at the current serving network is required to identify the current serving switch before routing the call to that switch. This shields the current serving switch information for a ported number from the other networks at the expense of an additional NPDB query. Another routing number, which may be meaningful within the current serving network, will replace the previously prefixed routing
number in the ISUP CdPN parameter. In the latter case, the call is routed to the current serving switch without an additional NPDB query. When the terminating switch receives the IAM and sees its own routing prefix in the CdPN parameter, it retrieves the originally dialed directory number after the routing prefix and uses it to terminate the call. In addition to the routing prefix in the CdPN parameter, some other information may be added or modified, as listed in draft ITU-T Recommendation Q.769.1 [11]. Those ISUP enhancements to support NP are described briefly below. Three methods can be used to transport the directory number (DN) and the routing number (RN):
1. Two separate parameters, with the CdPN parameter containing the RN and a new Called Directory Number (CdDN) parameter containing the DN. A new value for the Nature of Address (NOA) indicator in the CdPN parameter is defined to indicate that the RN is in the CdPN parameter. The switches use the CdPN parameter to route the call, as is done today.
2. Two separate parameters, with the CdPN parameter containing the DN and a new Network Routing Number (NRN) parameter containing the RN. This method requires that the switches use the NRN parameter to route the call.
3. A concatenated parameter, with the CdPN parameter containing the RN plus the DN. A new NOA indicator value in the CdPN parameter is defined to indicate that the RN is concatenated with the DN in the CdPN parameter.
Some countries may not use new NOA values, because the routing prefix does not overlap with the dialed directory numbers. However, if the routing prefix overlaps with the dialed directory numbers, a new NOA value must be assigned. Spain uses XXXXXX as the routing prefix to identify the new serving network and uses a new NOA value of 126.
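Method 3 (the concatenated parameter) can be illustrated with a minimal sketch. This is illustrative Python using a UK-style 5XXXXX prefix as an example; the prefix value, directory number, and NOA label are invented, not real ISUP coding.

```python
# Sketch of the European concatenated RN+DN method (method 3 above).
# Prefix value and directory number are invented for illustration.
ROUTING_PREFIXES = {"512345": "Serving-Network-B"}   # UK-style 5XXXXX prefixes

def build_cdpn(rn: str, dn: str) -> dict:
    """Concatenate RN and DN in the CdPN; flag it with a new NOA value."""
    return {"noa": "ported-concatenated", "digits": rn + dn}

def terminate(cdpn: dict, my_prefix: str) -> str:
    """Terminating switch strips its own routing prefix to recover the DN."""
    if cdpn["noa"] == "ported-concatenated" and cdpn["digits"].startswith(my_prefix):
        return cdpn["digits"][len(my_prefix):]
    return cdpn["digits"]

cdpn = build_cdpn("512345", "02075551234")
terminate(cdpn, my_prefix="512345")    # -> "02075551234"
```

The new NOA value is what makes this safe when prefixes could otherwise be confused with dialable digits, which is why Spain, whose prefixes overlap the number space, assigns one.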
As a network option, a new ISUP parameter known as the Number Portability Forwarding Information parameter can be added; it has a 4-bit Number Portability Status Indicator field that indicates whether an NP query has been done for the called directory number and, if the query has been done, whether that number is ported. All of the enhancements discussed above are for national use because NP is supported within a nation. Within each nation, the telecommunications industry or the regulatory bodies can decide which method or methods to use. NP-related parameters and coding are never passed across national boundaries. As indicated previously, an originating wireless network can query the NPDB, concatenate the RN with the DN in the CdPN parameter, and then route the call directly to the current serving network. If NPDBs do not contain information about the wireless DNs, the call, whether originated from a wireline or a wireless network, will be routed to the wireless
Telephone Number Mapping (ENUM)
donor network. There, an internal NPDB is queried to retrieve the RN, which is then concatenated with the DN in the CdPN parameter. If MNP-SRF is supported, the wireless donor network’s Gateway Mobile Services Switching Center (GMSC) that receives a call from the wireline network can send the GSM MAP Send Routing Information (SRI) message to the MNP-SRF. The MNP-SRF interrogates an internal or integrated NPDB for the RN of the MNP-SRF of the wireless current serving network and prefixes the RN to the dialed wireless DN in the global title address information in the SCCP Called Party Address (CdPA) parameter. This SRI message will be routed to the MNP-SRF of the wireless current serving network, which then responds with an acknowledgment by providing the RN plus the dialed wireless DN as the Mobile Station Roaming Number (MSRN). The GMSC of the wireless donor network formulates the ISUP IAM with the RN plus the dialed wireless DN in the CdPN parameter and routes the call to the wireless current serving network. A GMSC of the wireless current serving network receives the call and sends an SRI message to the associated MNP-SRF, where the global title address information of the SCCP CdPA parameter contains only the dialed wireless DN. The MNP-SRF replaces the global title address information in the SCCP CdPA parameter with the address information associated with a home location register (HLR) hosting the dialed wireless DN and then forwards the message to that HLR after verifying that the dialed wireless DN is a ported-in number. The HLR then returns an acknowledgment, providing an MSRN that the GMSC uses to route the call to the MSC currently serving the mobile station associated with the dialed wireless DN. (See Reference [12] for details and additional scenarios.)
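A toy model of the MNP-SRF translation step may make the flow concrete. The sketch below is not GSM MAP; the NPDB contents and number formats are invented purely for illustration.

```python
# Illustrative sketch of the MNP-SRF lookup: the donor network's MNP-SRF
# prefixes the routing number (RN) of the current serving network to the
# dialed wireless DN in the SCCP CdPA global title. The NPDB below is
# hypothetical example data, not a real database schema.
npdb = {"07700900123": "D123"}  # ported DN -> RN of current serving network

def mnp_srf_translate(dialed_dn: str) -> str:
    """Return the global title address after MNP-SRF translation:
    RN + DN for a ported number, the DN unchanged otherwise."""
    rn = npdb.get(dialed_dn)
    return (rn + dialed_dn) if rn else dialed_dn

gt = mnp_srf_translate("07700900123")
```

A non-ported DN falls through untouched, which mirrors the text: only ported-in numbers trigger the RN prefixing and rerouting.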
NP Implementations for Geographic E.164 Numbers

Number Portability has been implemented by a large number of operators. Table 10.2 shows the known SPNP implementations worldwide.
NP-Enabled Number Conservation Methods

In addition to porting numbers, NP provides number administrators the ability to assign numbering resources to operators in smaller increments. Today, it is common for numbering resources to be assigned to telephone operators in a large block of consecutive telephone numbers (TNs). For example, in North America each of these blocks contains 10,000 TNs and is of the format NXX+0000 to NXX+9999. Operators are assigned a specific NXX, or block; thus, operators are referred to as block holders. A specific block has 10,000 TNs with line numbers ranging from 0000 to 9999. Instead of assigning an entire block to an operator, NP allows an administrator to assign a subblock or even an individual TN, processes known as block pooling and individual telephone number (ITN) pooling, respectively.
Table 10.2 Snapshot of SPNP Implementations

Argentina: Analyzing operative viability now. Will determine whether portability should be made obligatory after a technical solution has been determined.

Australia: NP supported by wireline operators since November 30, 1999. NP among wireless operators around March or April 2000, but may be delayed to 1Q01. The access provider or long distance provider has the obligation to route the call to the correct destination. The donor network is obligated to maintain and make available a register of numbers ported away from its network. Telstra uses onward routing via an on-switch solution.

Austria: Uses onward routing at the donor network. Routing prefix is 86xx, where xx identifies the recipient’s switch.

Belgium: ACQ selected by the industry. Routing prefix is Cxxxx, where xxxx identifies the recipient’s switch. Another routing prefix is C00xx, where xx identifies the recipient’s network. Plans to use NOA for identifying concatenated numbers, and plans to abandon the hexadecimal routing prefix.

Brazil: Considering NP for wireless users.

Chile: Has recently discussed NP.

Colombia: There was an Article 3.1 on NP to support NP before December 31, 1999, when NP became technically possible. Regulators have not yet issued regulations concerning this matter.

Denmark: Uses ACQ. Routing number not passed between operators; however, NOA is set to 112 to indicate the ported number. QoR can be used based on bilateral agreements.

Finland: Uses ACQ. Routing prefix is 1Dxxy, where xxy identifies the recipient’s network and service type.

France: Uses onward routing. Routing prefix is Z0xxx, where xxx identifies the recipient’s switch. The originating network needs to do the necessary rerouting.

Germany: Operators decide their own solutions. Deutsche Telekom uses ACQ. Routing prefix is Dxxx, where xxx identifies the recipient’s network. Recipient network informs other networks about ported-in numbers.

Hong Kong: Routing prefix is 14x, where 14x identifies the recipient’s network, or a routing number of 4x plus 7 or 8 digits is used, where 4x identifies the recipient’s network and the rest of the digits identify the called party. Operators choose their own solution but use onward routing.

Ireland: Routing prefix is 1750 as the intranetwork routing code (network-specific) and 1752xxx to 1759xxx for GNP, where xxx identifies the recipient’s switch.

Italy: Uses onward routing. Routing prefix is C600xxxxx, where xxxxx identifies the recipient’s switch. Telecom Italia uses an IN solution; other operators use an on-switch solution.

Japan: Uses onward routing. Donor switch uses IN to get the routing number.

Mexico: NP is considered in the telecom law; however, the regulator (Cofetel) or the new local entrants have started no initiatives on this process.

Netherlands: Operators decide which NP scheme to use. Operators have chosen ACQ or QoR. KPN implemented an IN solution similar to the U.S. solution. Routing prefix is not passed between operators.

Norway: OR for the short term and ACQ for the long term. QoR is optional. Routing prefix can be xxx (with NOA = 8) or 142xx (with NOA = 3), where xxx or xx identifies the recipient’s network.

Peru: Wireline NP was planned to be supported in 2001.
Table 10.2 Snapshot of SPNP Implementations (Continued)

Portugal: No NP today.

Spain: Uses ACQ. Telefonica uses QoR within its network. Routing prefix is xxyyzz, where xxyyzz identifies the recipient’s network. NOA is set to 126.

Sweden: Standardized the ACQ but OR for operators without IN. Routing prefix is xxx (with NOA = 8) or 394xxx (with NOA = 3), where xxx identifies the recipient’s network. Operators decide which NP scheme to use. Telia uses onward routing between operators.

Switzerland: Uses OR now and QoR in 2001. Routing prefix is 980xxx, where xxx identifies the recipient’s network.

United Kingdom: Uses onward routing. Routing prefix is 5xxxxx, where xxxxx identifies the recipient’s switch. NOA is 126. BT uses the dropback scheme in some parts of its network.

United States: Uses ACQ. LRN is used in the Called Party Number parameter. Called Party Number is carried in the Generic Address Parameter. Uses a PNTI indicator in the Forward Call Indicator parameter to indicate that the NPDB dip has been performed.
Block Pooling

Block pooling refers to the process whereby a number administrator assigns a range of numbers defined by a logical subblock of the existing block. Using North America as an example, block pooling would allow a number administrator to assign subblocks of 1000 TNs to multiple operators; that is, NXX+0000 to NXX+0999 can be assigned to operator A, NXX+1000 to NXX+1999 can be assigned to operator B, NXX+2000 to NXX+2999 can be assigned to operator C, and so on. In this example, block pooling divides one block of 10,000 TNs into ten blocks of 1000 TNs. Porting the subblocks from the block holder enables block pooling. Using the example above, operator A is both the block holder and the holder of the first subblock, NXX+0000 to NXX+0999. The second subblock, NXX+1000 to NXX+1999, is ported from operator A to operator B; the third subblock, NXX+2000 to NXX+2999, is ported from operator A to operator C; and so on. NP administrative processes and call processing will enable proper and efficient routing. From a number administration and NP administration perspective, block pooling introduces a new concept, that of the subblock holder. Block pooling requires coordination among the number administrator, the NP administrator, the block holder, and the subblock holder. Block pooling must be implemented in a manner that allows for NP within the subblocks. Each TN can have a different serving operator, subblock holder, and block holder.
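The subblock arithmetic is simple. The sketch below (operator names and pooling assignments are hypothetical) maps a 4-digit line number to its subblock holder, falling back to the block holder for subblocks that have not been ported:

```python
# Hypothetical sketch of block pooling: one 10,000-number block
# (NXX+0000 to NXX+9999) split into ten 1000-number subblocks, some of
# which have been ported to other subblock holders.
subblock_holders = {
    0: "Operator A",  # NXX+0000 to NXX+0999 (also the block holder)
    1: "Operator B",  # NXX+1000 to NXX+1999, ported from A
    2: "Operator C",  # NXX+2000 to NXX+2999, ported from A
}

def holder_for_line(line_number: str, block_holder: str = "Operator A") -> str:
    """Return the subblock holder for a 4-digit line number; subblocks
    not in the pooling table still belong to the block holder."""
    subblock = int(line_number) // 1000
    return subblock_holders.get(subblock, block_holder)
```

Note that individual TNs within a subblock can still be ported away from the subblock holder, so a real routing decision would consult per-TN NP data on top of this table.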
ITN Pooling

ITN pooling refers to the process whereby a number administrator assigns individual telephone numbers to operators. Using North America as an example, one
414
Chapter Ten
block of 10,000 TNs can be divided into 10,000 ITNs. ITN is more commonly deployed in freephone services. In ITN, the block is assigned not to an operator but to a central administrator. The administrator then assigns ITNs to operators. NP administrative processes and call processing will enable proper and efficient routing.
Conclusion

There are three general areas of impact on IP telephony work in progress at the IETF:

1. Interoperation between NP in the GSTN and IP telephony

2. NP implementation or emulation in IP telephony

3. Interconnection to the NP administrative environment

A good understanding of how NP is supported in the GSTN is important when addressing the interworking issues between IP-based networks and the GSTN, and it is especially important when the IP-based network needs to route calls to the GSTN. As shown previously, a variety of standards exist with various protocol stacks for the switch-to-NPDB interface. Moreover, national variations of the protocol standards make NP complicated to deal with in a global environment. If an entity in the IP-based network needs to query those existing NPDBs for routing number information to terminate calls to the destination GSTN, it would be impractical, if not impossible, for that entity to support all those interface standards to access the NPDBs in many countries. Several alternatives may address this particular problem. One is to use certain entities in the IP-based networks for dealing with the NP query, similar to the international switches used in the GSTN to interwork different national ISUP variations. This alternative would force signaling information associated with calls to certain NP-capable networks in the terminating GSTN to be routed to those IP entities that support the NP functions. Those IP entities would then query the NPDBs in the terminating country, which would limit the number of NPDB interfaces that certain IP entities need to support. Another alternative is to define a common interface to be supported by all the NPDBs so that all the IP entities use that standardized protocol to query them. Existing NPDBs can support this additional interface, or new NPDBs can be deployed that contain the same information but support the common IP interface.
The candidates for such a common interface include Lightweight Directory Access Protocol (LDAP) and SIP [13] (using, perhaps, the SIP redirection capability). Of course, another alternative is to use the interworking function to convert from one protocol to another. IP-based networks can handle domestic calls between two GSTNs. If the originating GSTN has performed the NPDB query, SIP will need to transport and
make use of some of the ISUP signaling information, even if the ISUP signaling is encapsulated in SIP. Also, IP-based networks may perform the NPDB queries as the N-1 carrier. In that case, SIP also needs to transport the NP-related information while the call is routed to the destination GSTN. There are three pieces of NP-related information that SIP needs to transport: (1) the called directory number, (2) a routing number, and (3) an NPDB dip indicator. The NPDB dip indicator is needed so that the terminating GSTN will not perform another NPDB dip; the routing number is needed to route the call to the destination network or switch in the destination GSTN; and the called directory number is needed so that the terminating GSTN switch can terminate the call. When the routing number is present, the NPDB dip indicator may not be present, because there are cases where the routing number is added for routing the call even if NP is not involved. One issue is how to transport the NP-related information via SIP. Using the SIP URL is one mechanism, but a better choice may be to add an extension to the “tel” URL [14] that is also supported by SIP. If the called directory number is +1-202-533-1234, and its associated routing number is +1-202-5440000, the tel URL may look like tel:+1-202-533-1234;rn=+1-202-5440000;npdi=yes, where “rn” stands for routing number and “npdi” stands for NPDB dip indicator. The rn and npdi would be two new parameters added as extensions to the tel URL to support NP. Since the “fax” URL is similar to the tel URL, and NP can impact fax calls as well as telephone calls, the same extensions to the tel URL need to be applied to the fax URL as well. (See [15] for the proposed extensions to the tel URL to support NP and freephone service.) Those extensions to the tel and fax URLs will be supported automatically by SIP, for they can be carried as optional parameters in the user portion of the “sip” URL.
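A minimal parser for such a tel URL might look like the following sketch. This is a simplified illustration, not a full RFC 2806 parser, and rn/npdi are the proposed extensions discussed above, not standard parameters:

```python
# Sketch of parsing the proposed "rn"/"npdi" tel URL extensions.
def parse_tel_url(url: str) -> dict:
    """Split a tel URL into the dialed number plus NP parameters."""
    assert url.startswith("tel:")
    number, *params = url[len("tel:"):].split(";")
    info = {"number": number, "rn": None, "npdi": False}
    for p in params:
        name, _, value = p.partition("=")
        if name == "rn":
            info["rn"] = value
        elif name == "npdi":
            info["npdi"] = True  # presence of npdi marks the NPDB dip as done
    return info

info = parse_tel_url("tel:+1-202-533-1234;rn=+1-202-5440000;npdi=yes")
# Route on the routing number when present, else on the dialed number.
routing_key = info["rn"] or info["number"]
```

The last line reflects the routing rule the text goes on to describe: the rn parameter, when present, takes precedence over the called directory number for routing decisions.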
For a called directory number that belongs to a country supporting NP, and if the IP-based network is to perform the NPDB query, it is logical to perform the NPDB dip first to retrieve the routing number, then use that routing number to select the correct IP telephony gateways that can reach the switch serving the called directory number. If the rn parameter is present in the tel URL in the SIP INVITE message, it therefore should be used instead of the called directory number for making routing decisions. If rn is not present, however, the dialed directory number can be used as the routing number for making routing decisions. Telephony Routing Information Protocol (TRIP) [16] is a policy-driven inter-administrative domain protocol for advertising the reachability of telephony destinations between location servers and for advertising attributes of the routes to those destinations. With NP in mind, it is important to note that it is the routing number (if present) rather than the called directory number against which the TRIP tables should be checked when making the routing decisions. Overlap signaling exists in the GSTN today. For a call routing from the originating GSTN to the IP-based network that involves overlap signaling, NP will impact the call processing within the IP-based networks if those networks must deal with the overlap signaling. The entities in the IP-based networks that are to retrieve the NP information (for example, the routing number) must collect the complete called directory number before retrieving the NP information for a ported number; otherwise the information retrieval will not be successful. This issue is important for the IP-based networks when the originating GSTN does not handle the overlap signaling and collect the complete called directory number. The IETF ENUM working group has defined the use of DNS for identifying available services associated with a particular E.164 number [17]. Reference [18] outlines the principles for the operation of a telephone number service that resolves telephone numbers into Internet domain name addresses and service-specific directory discovery. Reference [18] describes a three-level approach: the first level is the mapping of the telephone number delegation tree to the authority to which the number has been delegated; the second level, the provision of the requested DNS resource records from a service registrar; and the third level, the provision of service-specific data from the service provider itself. NP certainly must be considered at the first level, because the telephony service providers do not own or control the telephone numbers under the NP environment; therefore, they may not be the proper entities to have authority for a given E.164 number. Moreover, the donor network should not be relied on to reach the delegated authority during the DNS process, because there is a regulatory requirement on NP in some countries. The delegated authority for a given E.164 number is likely to be either an entity designated by the end user that owns and controls a specific telephone number or a third party designated by the end user or by the industry.
The IP-based networks also may need to support some form of NP in the future if E.164 numbers [19] are assigned to IP-based end users. One method is to assign a GSTN routing number for each IP-based network domain or entity in an NP-capable country; doing so may increase the number of digits in the routing number to incorporate the IP entities and impact the existing routing in the GSTN. Another method is to associate each IP entity with a particular GSTN gateway at which the called directory number is used to locate the IP entity serving that dialed directory number. Yet another method is to assign a special routing number so that the call to an end user currently served by an IP entity is routed to the nearest GSTN gateway. The called directory number is then used to locate the IP entity serving that dialed directory number, and some mechanism would be needed for the IP-based network to locate the IP entity that serves a particular dialed directory number. One such mechanism can be ENUM. Many other types of networks use E.164 numbers to identify the end users or terminals in those networks. NP among the GSTN, IP-based networks, and various other types of networks may also need to be supported in the future.8
10.6 E.164 Numbers and DNS
This section presents the RFC 2916 approach to the use of the DNS for storage of E.164 numbers. Specifically, this section* discusses how the DNS can be used for identifying available services connected to one E.164 number [17].
Introduction

Through transformation of E.164 numbers into DNS names, the use of existing DNS services such as delegation through NS records, and the use of NAPTR [20] records in the DNS [21, 22], one can look up the available services for a specific domain name in a decentralized way, with distributed management of the different levels in the lookup process.
ITU Numbers and DNS

The e164.arpa domain is being populated to provide the DNS infrastructure for storage of E.164 numbers. To facilitate distributed operations, this domain is divided into subdomains. Holders of E.164 numbers who want to be listed in the DNS should contact the appropriate zone administrator by examining the SOA resource record associated with that zone, just as they would do in normal DNS operations. Of course, as with other domains, policies for such listings are controlled on a subdomain basis and may differ depending on the country of origin. To find the DNS names for a specific E.164 number, use the following procedure:

1. See that the E.164 number is written in its full form, including the country code IDDD (for example: +46-8-9761234).

2. Remove all nondigit characters with the exception of the lead-in + (for example: +4689761234).

3. Remove all characters with the exception of the digits (for example: 4689761234).

4. Put dots (.) between each digit (for example: 4.6.8.9.7.6.1.2.3.4).

5. Reverse the order of the digits (for example: 4.3.2.1.6.7.9.8.6.4).

6. Append the .e164.arpa string to the end (for example: 4.3.2.1.6.7.9.8.6.4.e164.arpa).

Note: The + is kept in step 2 of the preceding list to indicate that the number on which the regular expression operates is an E.164 number. Future work will be needed to determine how other numbering plans (such as closed ones) might
*This section (Section 10.6) is based in its entirety on Reference [17].
be identified. It is possible, though not definite, that such plans will use a similar mechanism as the one described in RFC 2916.
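The six-step procedure above can be sketched in a few lines of code (a simplified illustration; a real ENUM client would also validate its input and handle error cases):

```python
# Sketch of the E.164-to-domain conversion steps listed above.
def e164_to_enum_domain(number: str) -> str:
    """Map a full E.164 number such as +46-8-9761234 to its
    e164.arpa domain name (steps 1 through 6 of the procedure)."""
    assert number.startswith("+"), "step 1: full E.164 form required"
    digits = [c for c in number if c.isdigit()]        # steps 2 and 3
    return ".".join(reversed(digits)) + ".e164.arpa"   # steps 4, 5, and 6

domain = e164_to_enum_domain("+46-8-9761234")
```

Running this on the example number from the text yields the domain name used throughout the rest of the section, 4.3.2.1.6.7.9.8.6.4.e164.arpa.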
Fetching Uniform Resource Identifiers (URIs) Given an E.164 Number

For a record in the DNS, the NAPTR record is used for identifying available ways of contacting a specific node identified by that name. It can be used for knowing what services exist for a specific domain name, including phone numbers, by the use of the e164.arpa domain as described above. The identification uses the NAPTR resource record defined for use in the URI resolution process, although it can be generalized in a way that suits the needs specified in this document. The input to the NAPTR algorithm is the string from step 2 of the foregoing list.
The NAPTR Record

The key fields in the NAPTR resource record are order, preference, service, flags, regexp, and replacement.

• The order field specifies the order in which records must be processed when multiple NAPTR records are returned in response to a single query.

• The preference field specifies the order in which records should be processed when multiple NAPTR records have the same value of order.

• The service field specifies the resolution protocol and resolution services that will be available if the rewrite specified by the regexp or replacement fields is applied.

• The flags field contains modifiers that affect what happens in the next DNS lookup, typically for optimizing the process.

• The regexp field is one of two fields used for the rewrite rules; it is the core concept of the NAPTR record.

• The replacement field is the other field that may be used for the rewrite rule.

Note that the client applies all the substitutions and performs all the lookups; these activities are not performed in the DNS servers. Note also that URIs are stored in the regexp field.

Specification for Use of NAPTR Resource Records

The input is an E.164-encoded telephone number. The output is a URI in its absolute form, according to the “absoluteURI” production in the Collected ABNF found in RFC 2396 [23]. An E.164 number, without any characters except for a lead-in + and digits (the result of step 2 from the list early in this section), is the input to the NAPTR algorithm. The service supported for a call is E2U.
Telephone Number Mapping (ENUM)
419
Specification of Service E2U (E.164 to URI)

• Name: E.164 to URI
• Mnemonic: E2U
• Number of operands: 1
• Type of each operand: First operand is an E.164 number
• Format of each operand: First operand is the E.164 number in the form specified in step 2 from the list early in this section
• Algorithm: Opaque
• Output: One or more URIs
• Error conditions: E.164 number not in the numbering plan; E.164 number in the numbering plan, but no URIs exist for that number; service unavailable
• Security considerations: Malicious redirection; denial of service

One of the fundamental dangers related to any service such as this is that a malicious entry in a resolver’s database will cause clients to resolve the E.164 number into the wrong URI. The possible intent may be to cause the client to retrieve a resource containing fraudulent or damaging material. Also, by removing the URI to which the E.164 number maps, a malicious intruder may remove the client’s ability to access the resource (denial of service). This operation is used to map one E.164 number to a list of URIs. The first well-known step in the resolution process is to remove all nondigits apart from the lead-in + from the E.164 number, as described in steps 1 and 2 of the list early in this section.
Examples

Example 1

$ORIGIN 4.3.2.1.6.7.9.8.6.4.e164.arpa.
IN NAPTR 100 10 "u" "sip+E2U" "!^.*$!sip:[email protected]!" .
IN NAPTR 102 10 "u" "mailto+E2U" "!^.*$!mailto:[email protected]!" .
This describes that the domain 4.3.2.1.6.7.9.8.6.4.e164.arpa is preferably contacted by SIP and secondly by SMTP. In both cases, the next step in the resolution process is to use the resolution mechanism for each of the protocols, SIP and SMTP, to know what node to contact for each.
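Client-side processing of Example 1 can be sketched as follows. The record data comes from the example above (addresses shown as rendered in the text); the sorting and rewrite logic is a simplified illustration of the NAPTR algorithm, not a complete implementation:

```python
import re

# Simplified NAPTR processing: sort records by (order, preference),
# take the first record whose service the client supports, and apply
# its !-delimited rewrite rule to the step 2 string (e.g. "+4689761234").
records = [
    (102, 10, "u", "mailto+E2U", "!^.*$!mailto:[email protected]!"),
    (100, 10, "u", "sip+E2U", "!^.*$!sip:[email protected]!"),
]

def resolve_e2u(number, naptrs, supported=("sip+E2U",)):
    for order, pref, flags, service, regexp in sorted(naptrs):
        if service in supported:
            delim = regexp[0]  # the delimiter character, "!" here
            _, pattern, replacement, _ = regexp.split(delim)
            if re.search(pattern, number):
                return re.sub(pattern, replacement, number, count=1)
    raise LookupError("no usable NAPTR record for " + number)

uri = resolve_e2u("+4689761234", records)
```

A SIP-capable client thus ends up with the sip: URI even though the records arrive in arbitrary order, because the order field (100 versus 102) decides precedence.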
420
Chapter Ten
Example 2

$ORIGIN 4.3.2.1.6.7.9.8.6.4.e164.arpa.
IN NAPTR 10 10 "u" "sip+E2U" "!^.*$!sip:[email protected]!" .
IN NAPTR 102 10 "u" "mailto+E2U" "!^.*$!mailto:[email protected]!" .
IN NAPTR 102 10 "u" "tel+E2U" "!^.*$!tel:+4689761234!" .
Note that the preferred method is to use the SIP protocol; however, the result of the NAPTR record rewrite is a URI (the "u" flag in the NAPTR record). In the case of the SIP protocol, the URI might be a SIP URI, which is resolved as described in RFC 2543 [24]. In the case of the tel URI scheme [25], the procedure is restarted with this new E.164 number. The client is responsible for loop detection. The rest of the resolution of the routing is done as described above.

Example 3

$ORIGIN 6.4.e164.arpa.
* IN NAPTR 100 10 "u" "ldap+E2U" "!^+46(.*)$!ldap://ldap.se/cn=0\1!" .
We see in this example that information about all E.164 numbers in country code 46 (Sweden) exists in an LDAP server; the required search is specified by the LDAP URI [26].
IANA Considerations

A request has been made to IANA to delegate the E164.ARPA domain following instructions to be provided by the IAB. Names within this zone are to be delegated to parties according to the ITU Recommendation E.164. The names allocated should be hierarchical in accordance with ITU Recommendation E.164, and the codes should be assigned in accordance with that Recommendation. Delegations in the zone e164.arpa (not delegations in delegated domains of e164.arpa) should be done after expert review (the IESG will appoint a designated expert).
Security Considerations

Because this system is built on top of the DNS, the information received is no more secure than any other DNS query. To address that, the use of DNSSEC [27] for securing and verifying zones is recommended. Caching in the DNS can make the propagation time for a change as long as the time to live of the NAPTR records in the zone that is changed. Therefore, the use of this approach in an environment where IP addresses are for hire (for example, when using DHCP [28]) must be done carefully. Many countries (and other numbering environments) have multiple providers of call routing and number/name translation services. In these areas, any system
that permits users, or putative agents for users, to change routing or supplier information may provide incentives for changes that are actually unauthorized (and, in some cases, for denial of legitimate change requests). Such environments should be designed with adequate mechanisms for identifying and authenticating those requesting changes and for authorizing those changes.
10.7 Appendix to the RFC 2916 Scenario
Say that the content of the e164.arpa zone is the following:

$ORIGIN e164.arpa.
6.4 IN NS ns.regulator-e164.example.se.
The regulator has in turn given a series of 10,000 numbers to the telco with the name Telco-A. For that reason, the regulator has in his DNS:

$ORIGIN 6.4.e164.arpa.
6.7.9.8 IN NS ns.telco-a.example.se.
A user named Sven Svensson has from Telco-A the phone number +46-8-9761234. The user gets the service of running DNS from the company called Redirection Service. Sven Svensson has asked Telco-A to point out Redirection Service as the authoritative source for information about the number +46-8-9761234. For that reason, Telco-A puts in his DNS the following:

$ORIGIN 6.7.9.8.6.4.e164.arpa.
4.3.2.1 IN NS ns.redirection-service.example.se.
Sven Svensson already has traditional telephony from Telco-A, but he also has a SIP service from the company Sip Service, which provides Sven with the SIP URI sip:[email protected]. The ISP with the name ISP A runs e-mail and Web pages for Sven under the e-mail address of [email protected] and URI http://svensson.ispa.se. For that reason, the DNS for the redirection service contains the following:

$ORIGIN 4.3.2.1.6.7.9.8.6.4.e164.arpa.
IN NAPTR 10 10 "u" "sip+E2U" "!^.*$!sip:[email protected]!" .
IN NAPTR 10 10 "u" "mailto+E2U" "!^.*$!mailto:[email protected]!" .
IN NAPTR 10 10 "u" "http+E2U" "!^.*$!http://svensson.ispa.se!" .
IN NAPTR 10 10 "u" "tel+E2U" "!^.*$!tel:+46-8-9761234!" .
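The delegation chain in this scenario can be modeled as a toy lookup: each zone either refers the client onward via an NS entry or finally answers with NAPTR data. The dict layout below is an illustration of the referral walk, not a DNS implementation; the host names are the example.se servers from the text.

```python
# Toy model of the e164.arpa delegation chain from the scenario.
zones = {
    "e164.arpa": ("NS", "ns.regulator-e164.example.se"),
    "6.4.e164.arpa": ("NS", "ns.telco-a.example.se"),
    "6.7.9.8.6.4.e164.arpa": ("NS", "ns.redirection-service.example.se"),
    "4.3.2.1.6.7.9.8.6.4.e164.arpa":
        ("NAPTR", ["sip+E2U", "mailto+E2U", "http+E2U", "tel+E2U"]),
}

def resolve(name: str):
    """Walk suffixes of the query name from shortest to longest,
    collecting NS referrals until a zone answers with NAPTR records."""
    referrals = []
    labels = name.split(".")
    for i in range(len(labels) - 2, -1, -1):
        zone = ".".join(labels[i:])
        rtype, data = zones.get(zone, (None, None))
        if rtype == "NS":
            referrals.append(data)
        elif rtype == "NAPTR":
            return referrals, data
    raise LookupError(name)

path, services = resolve("4.3.2.1.6.7.9.8.6.4.e164.arpa")
```

The resulting path (regulator, then Telco-A, then the redirection service) mirrors how authority for Sven's number has been delegated down the tree.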
A user named John Smith wants to contact Sven Svensson; all he has is Sven's E.164 number, +46-8-9761234. He takes the number and enters it into his communication client, which happens to know how to handle the SIP protocol. The client removes the hyphens and ends up with the E.164 number +4689761234. That number is used in the algorithm for NAPTR records, as described above. The client converts the E.164 number into the domain name 4.3.2.1.6.7.9.8.6.4.e164.arpa. and queries for NAPTR records for this domain name. Using DNS mechanisms, which include following the NS record referrals, the following records are returned:

$ORIGIN 4.3.2.1.6.7.9.8.6.4.e164.arpa.
IN NAPTR 10 10 "u" "sip+E2U" "!^.*$!sip:[email protected]!" .
IN NAPTR 10 10 "u" "mailto+E2U" "!^.*$!mailto:[email protected]!" .
IN NAPTR 10 10 "u" "http+E2U" "!^.*$!http://svensson.ispa.se!" .
IN NAPTR 10 10 "u" "tel+E2U" "!^.*$!tel:+46-8-9761234!" .
Because the client knows SIP, the first record above is selected, and the regular expression !^.*$!sip:[email protected]! is applied to the original string, +4689761234. The output is sip:[email protected], which is used according to SIP resolution.9
References

1. www.enum.org/information/faq.cfm.
2. A. E. Cha. “Showdown at the Digital Corral—Internet-Based Single Number Plan Starts a Tug of War over Control.” Washington Post, April 22, 2001.
3. N. Turner. “Interconnect with ENUM Alliance—Advancing SIP and Global ENUM as the Standards for IP Interconnects.” Fall VON Conference, Atlanta, GA, October 15–18, 2001. [email protected].
4. M. Foster. “Number Portability in the GSTN: An Overview.” Internet Draft, October 19, 2001.
5. ANSI Technical Requirements No. 2. “Number Portability—Switching Systems.” April 1999.
6. ANSI Technical Requirements No. 3. “Number Portability Database and Global Title Translation.” April 1999.
7. TIA/EIA IS-756 Rev. A. “TIA/EIA-41-D Enhancements for Wireless Number Portability Phase II (December 1998) Number Portability Network Support.” April 1998.
8. GSM 09.02. “Digital Cellular Telecommunications System (Phase 2+); Mobile Application Part (MAP) Specification.” April 1995. European Telecommunications Standards Institute.
9. ITU-T Q-Series Recommendations—Supplement 4. “Number Portability—Capability Set 1 Requirements for Service Provider Portability (All Call Query and Onward Routing).” May 1998.
10. ITU-T Q-Series Recommendations—Supplement 5. “Number Portability—Capability Set 2 Requirements for Service Provider Portability (Query on Release and Dropback).” March 1999.
11. ITU-T COM 11-R 162-E, Draft Recommendation Q.769.1. “Signaling System No. 7—ISDN User Part Enhancements for the Support of Number Portability.” May 1999.
12. Draft GSM 03.66 V7.2.0 (1999-11), European Standard (Telecommunications series). “Digital Cellular Telecommunications System (Phase 2+); Support of Mobile Number Portability (MNP); Technical Realisation; Stage 2.” (GSM 03.66 Version 7.2.0 Release 1998).
13. M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg. “SIP: Session Initiation Protocol.” RFC 2543 (March 1999).
14. A. Vaha-Sipila. “URLs for Telephone Calls.” RFC 2806 (April 2000).
15. J. Yu. “Extensions to the ‘tel’ and ‘fax’ URLs to Support Number Portability and Freephone Service.” Internet Draft draft-yu-tel-url-03.txt, November 1, 2001.
16. J. Rosenberg, H. Salama, and M. Squire. “Telephony Routing Information Protocol (TRIP).” RFC 3219 (September 2001).
17. P. Faltstrom. “E.164 Number and DNS.” RFC 2916 (September 2000).
18. A. Brown and G. Vaudreuil. “ENUM Service Specific Provisioning: Principles of Operations.” Internet Draft draft-ietf-enum-operation-02.txt, February 23, 2001.
19. ITU-T Recommendation E.164. “The International Public Telecommunications Numbering Plan.” 1997.
20. M. Mealling and R. Daniel. “The Naming Authority Pointer (NAPTR) DNS Resource Record.” RFC 2915 (September 2000).
21. P. Mockapetris. “Domain Names—Concepts and Facilities.” RFC 1034, STD 13 (November 1987).
22. P. Mockapetris. “Domain Names—Implementation and Specification.” RFC 1035, STD 13 (November 1987).
23. T. Berners-Lee, R. T. Fielding, and L. Masinter. “Uniform Resource Identifiers (URI): Generic Syntax.” RFC 2396 (August 1998).
24. M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg. “SIP: Session Initiation Protocol.” RFC 2543 (March 1999).
25. A. Vaha-Sipila. “URLs for Telephone Calls.” RFC 2806 (April 2000).
26. T. Howes and M. Smith. “An LDAP URL Format.” RFC 1959 (June 1996).
424
Chapter Ten
27. D. Eastlake. “Domain Name System Security Extensions.” RFC 2535 (March 1999). 28. R. Droms. “Dynamic Host Configuration Protocol.” RFC 2131 (March 1997).
Notes

1. Ovum sources. Volume 2002: voice, 30 Tbps; data, 400 Tbps. Volume 2006: voice, 30 Tbps; data, 700 Tbps.
2. In the figure, data, Internet, and private lines represent a $100 billion international market in 2002; the majority of this figure, however, is from private line services.
3. The 80 percent figure is an often-quoted number. However, Figure 10.1 depicts a $900 billion market in 2002, with about $50 billion in data and Internet (the remainder of the $120 billion "other" bucket is for private line services). This actually makes the data and Internet figure about 5.5 percent (= 50/900).
4. Users are not required to use the common number, and they could still keep separate telephone and e-mail addresses. But if they use the common number, they gain the advantages intrinsic to the system.
5. ICANN has turned down, up to the time of this writing, proposals to set up .tel and .num domains.
6. This material (Section 10.3) is based on the NeuStar, Inc., white paper ENUM, Driving Convergence in the Internet Age (www.neustar.com). NeuStar, Inc., founded in 1996, is the leading provider of critical clearinghouse services that enable communications networks to interoperate. NeuStar positions itself as a trusted third-party provider of services such as administering crucial public resources (telephone numbers), managing vital real-time network routing databases (number portability routing registry), and exchanging data for provisioning and billing. NeuStar demonstrates its reliability in managing large databases, consistently executing millions of transactions daily, maintaining secure and confidential data, and enabling the secure exchange of network and business information to ensure interoperability between next-generation networks. For example, NeuStar provisions NANPA services, as a neutral third party, based on strict FCC-established criteria for neutrality described in FCC Order 99-346.
7. Based on Numbering Resource Utilization/Forecast Reports, data filed as of October 23, 2000; database roll-ups by C. Stroup, Industry Analysis Division, FCC. Also, "The 5 Year Forecast," TheStandard.com, March 19, 2001.
8. The Internet Draft is Copyright (C) The Internet Society. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
9. The RFC is Copyright (C) The Internet Society. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
CHAPTER 11

Carrier Applications

11.1 Introduction and Opportunities
This chapter discusses issues related to the penetration of VOIP services. It looks at the applicability of VOIP to the carrier space: the U.S. PSTN market, the foreign PTT market, and the continent-to-continent (international) market. While bandwidth saving is a factor in VOIP, it is only one element of an overall approach. The volume of sales achieved via an enterprise network entry or via a greenfield carrier network entry will be rather limited in total absolute dollars in the short term; therefore, there is interest in penetrating the PSTN carrier market. To accomplish this penetration, however, developers must understand the requirements of carriers. Carriers are aggressively seeking to build new, sustainable revenue streams based on high-margin enhanced services, which carry higher profit potentials than commodity transmission services [1]. There are also opportunities in wireless networks and in cable TV (cable telephony) networks, which we explore briefly in this chapter.

Many VOIP-related metrics more than doubled year over year in recent years, such as in 2000 [2]: the number of installed VOIP networks, the number of players in the VOIP arena, the dollars spent on VOIP products, the number of VOIP channels shipped, and even the capacity of VOIP products. On the scale of other telecom and datacom services, however, the absolute value is still small. As of this writing, estimates are that approximately 100 billion minutes per year are carried on VOIP systems worldwide. The expectation is that by 2004, about 400 billion minutes per year will be carried over packet networks; that figure is forecast to grow to 1,250 billion minutes per year by 2006. The revenue associated with these services is estimated to be $1 billion in 2002, $4 billion in 2004, and $9 to $12.5 billion in 2006 (see Figure 11.1). Compared with the total worldwide telecom services market of $900 billion
[Figure 11.1 Revenues for VOIP (calculated). U.S. panel: circuit-switched and VOIP minutes of use (billions), 1997-2004, showing circuit-switched substitution, demand stimulation, and enhanced applications, with VOP reaching roughly 275 billion minutes. Global panels: packet-voice minutes (billions), 2000-2006, and worldwide revenues under two pricing assumptions: $0.01/minute with no erosion, and $0.01/minute through 2004 falling to $0.0075/minute thereafter. Source: FCC history, industry forecasts, Dean & Company analysis, AT&T, Probe Research Next Generation Networks, November 2001.]
in 2002, this revenue is around 0.1 percent of the total. Considering the amount of airtime given to the topic during the past five years, however, the amount of actual deployment to date has been underwhelming, particularly in North America. The factors that have held back the deployment of VOIP technology at the broad carrier level include the following:
1. An underestimation of the importance, complexity, and purpose of signaling, and of the need to interconnect with the approximately two billion telephone sets already deployed globally
2. The confusion brought about by the multiplicity of signaling protocols that have been advanced
3. A lack of understanding of the economic value of the embedded base of equipment in carrier networks, which will likely continue to operate flawlessly for its intended purpose—that is, to make money, with a net bottom line of around 20 percent, for a decade or more
4. A lack of understanding of what constitutes the carriers' true costs of delivering telephone services
5. A lack of understanding of the commoditization of both switched voice and high-speed bandwidth, which obliterates the value of any savings related to VOIP bandwidth efficiency
6. A lack of understanding that any VOIP bandwidth savings has nothing to do with IP but everything to do with vocoding algorithms, which can well be supported without IP
7. The difficulty in securing QoS in public and even private IP networks
8. The most critical failure of all: a misunderstanding of where the value of VOIP must lie for it to be successful

The bright spot for VOIP is that it will bring forth a panoply of new services not possible with circuit-switched voice; the bright spot is not its use as a transport mechanism to replace existing trunks and/or Class 5 switches. Until new applications are developed, VOIP will remain a 0.1 percent solution. Why would anyone scrap what has been called over the years the "best telephone system in the world" just to replace it with something that does the same and nothing more? A VOIP network cannot be just a "me too" network; "me too" carries voice. Developers have to start focusing in earnest on bringing out new applications, not the chronic litany of "bandwidth efficiency" advantages.
There is a reported oversupply of fiber in the United States (long-haul bandwidth) and in the continent-to-continent arena. DWDM can deliver terabits per second on fibers. And whenever a construction crew digs up a street to lay fiber, it deploys cables with ribbons of 24, 96, or even 500 fiber pairs; nobody ever deploys cable with a single fiber pair. The alleged need to save bandwidth is suspect and anachronistic; it is a dead end for VOIP, at least for wireline networks. Telecom developers have to follow the lead of the PC developers: For well over a decade now, software developers have stopped trying to be overly parsimonious with RAM and disk space in favor of an entire suite of new convenience services on the desktop and in favor of ease of use. Many new applications and capabilities have been developed on the desktop, including program linkage (audio
[Figure 11.2 Worldwide revenue and percentage allocation. Top panel: global telecom service revenue (US$ billions), 1998-2002, split into data/Internet/leased lines, mobile, international, and domestic telephone/fax (Source: ITU projections). Bottom panel: 1998 global telecom service revenue, total US$724 billion: domestic fixed-line 59.2 percent, mobile 21.2 percent, Internet/leased lines 10.6 percent, international 8.8 percent (Source: ITU World Telecommunications Report 1999: Mobile Cellular).]
and video) and easy-to-use browsers. The same mental shift has to occur regarding backbone bandwidth—this mental shift is worth billions, if not trillions, of dollars for those who get it. Therefore, the story of VOIP has to be less about IP and the bucketful of new protocols that one can invent overnight, and much more about new applications, possibilities, services, interactions, voice capture, storage, manipulation, dissemination, interplay, and so forth. New applications need to be advanced by all of the VOIP techniques for those techniques to have a chance of penetrating the PSTN. Regrettably, new applications so far seem to have taken a backseat during the past
[Figure 11.3 Worldwide telephone and Internet users: fixed-voice, mobile-voice, and Internet users worldwide (millions), 1910-2010, shown on both logarithmic and normal scales. Source: ITU.]
five years of industry advocacy. VOIP conserves bandwidth, which has been the misplaced single-dimensional focus of the proponents. Times have changed since the 1970s, when bandwidth was scarce, and to avoid a stale anachronism, proponents must realize that backbone bandwidth has become a near commodity of late and that bandwidth conservation is of limited interest in terms of the overall benefits that VOIP can afford to “main street” carriers. Although the volume of data is reported to be thirteen times that of voice in 2002 and is forecast to be twenty-three times that of voice in 2006, voice still brings in about 80 percent of the revenue for carriers. The U.S. voice revenue is around $200 billion per year at this writing (for local and long distance1) and, as noted previously, the worldwide revenues are around $900 billion (including mobile services). Of this total, data and private line services represent about $125 billion per year—data is about $50 billion per year and private line is about $75 billion per year. (See Figure 11.2 for an allocation of the revenue.) Figure 11.3 depicts the worldwide number of subscribers to fixed voice services, wireless voice services, and Internet services; notice that the crossover for fixed voice services to wireless voice services occurs in 2003, whereas the crossover for fixed voice services to Internet services occurs in 2007 [3]. Even with these services, a subscriber on a telephone landline may spend $30, $50, $100, or even $500 per month, while an Internet user spends $19.95 per month or maybe $29.95 for a broadband connection. Figure 11.3 demonstrates that traditional telephone users are still a major constituency who must be addressed by VOIP. Table 11.1 lists aspects and issues relevant to the discussion of IP telephony.
Table 11.1 Aspects and Issues in IP Telephony

Why IP telephony?
  Why? When? What?
Technical aspects of IP telephony
  IP telephony standards activities
  Quality of service
  Bandwidth
  Numbering
  Impact of IP telephony on network architectures
Economic aspects of IP telephony
  Consumers, carriers, and countries
  Size, substitutability, and settlements
  Impact on the public telecommunication operator
Regulatory aspects of IP telephony
  Changing conceptions of telecommunication networks and services
  Current approaches to the regulatory status of IP telephony
IP telephony in high-price markets
  Sustainability of local-access networks
  IP telephony development does not equal Internet development
IP telephony in falling-price markets
IP telephony in low-price markets
  Impact of IP telephony on Universal Service schemes
  IP telephony puts downward pressure on IDD charges
IP telephony in practice
  The impact of the regulatory environment on IP telephony
  Impact of public telecommunication operators on the evolution of IP telephony
  The impact of IP telephony on the regulatory environment
  Who benefits from restrictive policies?
Macro issues
  Cost-oriented pricing: A drive to total commoditization
  Towards technology-neutral regulation
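The chapter's recurring claim that any VOIP bandwidth saving is attributable to vocoding algorithms rather than to IP can be checked with a short, self-contained calculation. The sketch below compares the raw rates of the G.711 and G.729 vocoders with the on-the-wire rate after RTP/UDP/IPv4 encapsulation; the 20-ms packetization interval is a common choice, and the framing assumptions (one frame per packet, no link-layer overhead) are ours, not the book's.

```python
# Per-call bandwidth: vocoder rate vs. on-the-wire IP rate.
# Assumptions (ours): 20-ms packetization, IPv4 (20 B) + UDP (8 B) + RTP (12 B)
# headers, no link-layer overhead.

HEADER_BYTES = 20 + 8 + 12           # IPv4 + UDP + RTP
PACKET_INTERVAL_S = 0.020            # one packet every 20 ms (50 packets/s)

def on_wire_kbps(codec_kbps: float) -> float:
    """IP bandwidth for one call leg, given the vocoder's raw bit rate."""
    payload_bytes = codec_kbps * 1000 / 8 * PACKET_INTERVAL_S
    return (payload_bytes + HEADER_BYTES) * 8 / PACKET_INTERVAL_S / 1000

for name, rate in [("G.711 (PCM)", 64.0), ("G.729 (CS-ACELP)", 8.0)]:
    print(f"{name}: codec {rate:5.1f} kbps -> on-wire {on_wire_kbps(rate):5.1f} kbps")
```

Running this shows G.711 at 80 kbps and G.729 at 24 kbps on the wire: the tenfold reduction from 64 to 8 kbps is entirely the vocoder's doing, while the IP/UDP/RTP headers actually triple the compressed rate, which is exactly the book's point.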
11.2 Where the Action Should Be
We amplify in this section some of the foregoing points and offer some additional observations. There are two main subsets of packetized voice services, as discussed elsewhere in this book:

1. Internet telephony—VOIP using the public Internet
2. Voice over packet—VOIP using private, carrier-provided, managed "private label" IP-based networks or private, managed MPLS networks

Historically, IP telephony has been a substitute for high-cost PSTN telephony: it has aimed to avoid (1) long distance and international call prices and (2) above-cost settlement rates. Increasingly, IP telephony is becoming a supplementary application offered by ISPs; examples include "free" PC-to-phone calls to the United States and elsewhere, as well as integrated messaging and computer/telephony. According to some observers, traditional carriers are not as afraid of VOIP over wireless systems (for example, 3G) as they are of wireline applications, because a worldwide wireline toll-quality VOIP service would be catastrophic to revenue [4].

The key question is, "In the future, will a majority of telephony offered by telecom carriers be IP telephony?" In other words, "Will carriers be deploying integrated voice and data networks?" A number of proponents will answer in the affirmative, particularly for newly launched carriers operating abroad [5]; others would be more reserved, particularly for wireline networks over the next two to three years. At this juncture, near-term carrier opportunities for VOIP are tied to international applications and wireless networks, as discussed later in this chapter. Growth is anticipated to be significant in the wireless space, with the worldwide subscriber population growing from 1.2 billion in 2002 to 1.8 billion in 2005. It makes sense to have one integrated handheld terminal for all mobile communications needs, including voice and messaging; hence, VOIP may be appropriate in this space.

VOIP technology providers have to answer the question "Who is the buyer of this technology?" Enterprise-based solutions are the easiest to deploy, but even though enterprise players could easily deploy VOIP systems (including VOMPLS), the market opportunity is rather small when limited to that stratum. Developers must therefore target the technology to carriers. There are conceivably several possible motivations for considering a packetized approach to voice, as advocated by various constituencies. Ranking these publicly stated motivations by the order that seems to make the most sense, one has the following:

1. New applications become possible with VOIP, thereby generating opportunities for new services and new revenues; for example, computer telephony integration (CTI)-type applications.
2. Carrier cost savings can be achieved by using packetized technology in the operations, equipment, or transmission budget.
3. Although the volume of data is ten to twenty times that of voice, voice still brings in about 80 percent of the revenue for carriers. There are more than a billion potential voice customers, as shown in Figures 11.4 to 11.6. Voice is, therefore, a desirable market to optimize with new technologies and/or to penetrate.
4. New technologies have become available that, because of their technological niceties, must be injected into the network, according to some constituencies.
5. An elegant new integrated architecture becomes possible with connectionless packet technologies; indeed, a new "does-it-all" network based on a single approach supporting a gamut of services is elegant and ostensibly efficient, according to some constituencies.
6. Because IP has become ubiquitous in the data arena, it is therefore desirable to use IP for everything; IP can, in fact, be made good enough to support anything and everything, according to some constituencies.
7. A transfer of wealth (that is, market share) from traditional telephony vendors to newer data-oriented vendors should occur, according to some constituencies.

[Figure 11.4 Worldwide telephone lines (users), 1960 to 2000: steady growth in telephone main lines worldwide (millions), with forecast values for the late 1990s and annual average growth declining from about 7 percent toward 1 percent. Source: ITU World Telecommunication Indicators Database.]

According to the market research firm Giga Information Group, global data traffic has or will soon surpass voice transmissions via telephone lines when measured by volume. By 2002, international voice traffic will represent less than 10 percent of all traffic over international telecommunications lines, or 160 billion minutes2; global data traffic, by contrast, will exceed an equivalent of 1.6 trillion minutes of traffic (a one-to-ten ratio). These kinds of figures, however, fail to reveal that revenue for carriers is skewed exactly the opposite way; therefore, carrier investments, policies, staffing, deployments, and attention will correspondingly be directed toward those services (specifically, voice) that bring in 83 percent of their revenue. According to various well-respected sources, U.S. voice traffic by volume is 35 percent of the total, while data traffic is 65 percent of the total (partitioned as 30 percent corporate and 35 percent Internet).3 The revenue picture, however, is as follows—voice, 83 percent; data, 17 percent (the data revenue figure partitioned as 12 percent corporate and 5 percent Internet). The challenge faced by IP planners is how to bring revenue and profitability to the IP
[Figure 11.5 Projections on the worldwide voice subscriber population: fixed lines and mobile subscribers worldwide (millions, left scale) and international telephone traffic (billions of minutes, right scale), actual and projected, 1995-2005. Source: ITU.]
network, be it the intranet or the Internet. VOIP is seen as an opportunity for achieving this goal. However, time-division multiplexing (TDM) trunk replacement with statistical TDM (STDM) VOIP trunks does little to change this revenue picture.

Bandwidth parsimony is, in our opinion, a losing proposition in terrestrial networks, as an inspection of Figure 11.7 will reveal. Observers now quote a $4,000 monthly recurring charge (MRC) for 1 Gbps of long-haul bandwidth, which equates to $4,000,000 per month for 1 Tbps; recall also the 30 Tbps used worldwide for voice.4, 5 A bandwidth compression of ten to one would save 27 Tbps, which, based on the metric just described, would be valued globally at $108 million per month, or about $1.3 billion per year. Against a global revenue base of $900 billion per year, however, this is less than a 0.15 percent saving. No strategist would embark on a large project involving new technology, new expenditures, new technical approaches, new billing approaches, new operations approaches, new training, new provisioning, and new customer care to save 0.15 percent from the bottom line. In any event, compression is achievable by using vocoding techniques such as CELP and ACELP; it is not attributable to IP. We hope that this paragraph will register the following punch lines and enable the discourse to graduate to a higher plane:
[Figure 11.6 Comparing various classes of users: growth trends in fixed-line telephones, mobile phones, and estimated Internet users (millions), with actual and projected year-end figures, 1995-2003. Source: ITU.]
1. A discussion of how VOIP is more bandwidth-efficient simply makes the case against deploying this technology at all, because the savings are completely trivial in the full context of the problem.
2. Any value at all for VOIP must be in the total new application horizon that it opens up once voice can be generated, stored, transmitted, received, manipulated, enhanced, and correlated in a user's below-one-thousand-dollar PC or handheld voice terminal. We have had, after all, more than 125 years to optimize voice transport, and if VOIP only does something to transport, then it will have rather limited success in the market.6

And, as noted, the "packet" portion of VOIP has very little to do with the bandwidth saving itself; that credit must go to the vocoding technology, for compression could well be accomplished without any packetization of any kind (IP, ATM, Frame Relay, or MPLS).

Particularly in North America, the early 2000s saw major availability of long-haul bandwidth. Many fiber routes were reported to be only 2 to 5 percent used.7 Therefore, the emphasis of VOIP, absent new applications, must be on local networking, where bandwidth availability remains an issue. At the same time, however, this is the most difficult space to penetrate because of the size of the problem and the prevalence of the embedded base; furthermore, the early 2000s saw a major shakeout of service providers in the United States. The competitive pendulum has shifted, at least in the short term, against the formation of new (greenfield) carriers, such as competitive local exchange carriers (CLECs), DSL-based local exchange carriers (DLECs), radio-based local exchange carriers (RLECs), Ethernet local exchange carriers (ELECs), and building local exchange carriers (BLECs). This shift dims the opportunity for VOIP installations at the local level, even when softswitch technology is employed.

[Figure 11.7 Bandwidth as a commodity. (a) Trans-Atlantic circuit capacity (thousands of 56/64-kbps circuits) from TAT-8 (1988) through TAT-14 and Flag Atlantic (2000-2001), with capacity rising by 89 percent per year and circuit costs falling by 72 percent per year (Source: ITU, adapted from FCC). (b) Band-X composite bandwidth prices for major city-pair routes (London-Sydney, New York-London, London-Paris, and others), falling steadily between October 1998 and December 2000 (Source: Band-X). (c) Long-haul revenue per Gbps ($, thousands), declining sharply over 1995-2003 (Source: RHK, 2000). (d) Metropolitan-service revenue per Gbps (Source: CIBC World Markets and Dell'Oro estimates, June 2001).]

In a greenfield environment, the planner might consider deploying VOIP architecture (softswitches, for instance), rather than a traditional Class 5 switch, to achieve transmission savings and switch-cost reduction when small- to medium-sized switches are needed.8 In existing environments, however, the advantages of VOIP must be secured through new applications, not through technical niceties or the elegance of new architecture.

Figure 11.1, cited earlier, shows the calculated worldwide revenue figures for VOIP services: a $4 billion global market is seen for 2004 and an $8 to $12 billion market is seen for 2006, figures that are approximate yet provide an order-of-magnitude sense of the market. Typically, the equipment is one-third of the revenue; namely, financial markets demand $3 of revenue per year for every $1 invested in equipment. The equipment revenue then would be $1.3 billion in 2004 and $2.6 to $4 billion in 2006. New carriers in Asia and elsewhere may in fact use VOIP/softswitch technology, but this is not yet the case to any significant degree in North America.9

Nonetheless, VOIP has achieved some market penetration in the early 2000s. Figure 11.8 depicts the number of worldwide yearly outgoing international minutes for VOIP use over time. As points of reference, there were around 105 billion minutes of international PSTN traffic (mainly voice and fax) in 199910 and around 5 trillion minutes of total PSTN traffic (ITU sources). In 1999, only 0.6 percent of the total voice traffic was on VOIP, according to Probe Research Corporation sources. IDC forecasts that "Web talk" revenues will reach $16.5 billion by 2004, with 135 billion minutes of traffic.11 The Gartner Group forecasts that VOIP and competition in Europe will reduce prices 75 percent by 2002.
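The back-of-the-envelope valuation given earlier in this section (a 10:1 compression of 30 Tbps of worldwide voice bandwidth, priced at $4,000 per Gbps per month) can be reproduced in a few lines; all inputs below are the chapter's own.

```python
# Value of a 10:1 bandwidth compression on worldwide voice traffic,
# using the chapter's inputs.
COST_PER_GBPS_MONTH = 4_000          # $ MRC for 1 Gbps of long-haul bandwidth
VOICE_TBPS = 30                      # worldwide voice bandwidth in use
COMPRESSION = 10                     # 10:1 vocoder compression
WORLD_REVENUE = 900e9                # $/year, worldwide telecom services market

saved_tbps = VOICE_TBPS * (1 - 1 / COMPRESSION)             # 27 Tbps freed
saved_per_month = saved_tbps * 1_000 * COST_PER_GBPS_MONTH  # $108M per month
saved_per_year = saved_per_month * 12                       # ~$1.3B per year

print(f"Saved bandwidth: {saved_tbps:.0f} Tbps")
print(f"Saving: ${saved_per_month/1e6:.0f}M/month, ${saved_per_year/1e9:.2f}B/year")
print(f"As a share of worldwide revenue: {saved_per_year / WORLD_REVENUE:.2%}")
```

The result, about 0.14 percent of worldwide service revenue, matches the "less than 0.15 percent" figure in the text and underlines why bandwidth efficiency alone is a weak business case.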
[Figure 11.8 Importance of VOIP in international outgoing traffic: IP telephony traffic (millions of minutes) and as a percentage of international outgoing traffic, growing from 0.0 percent in 1997 to 5.5 percent in 2001. Source: ITU Internet Reports, adapted from TeleGeography Inc.]

According to Tarifica, IP telephony as a percentage of all international calls in
2004 could be 40 percent; according to Analysys, 25 percent. According to IDC, the majority of IP telephony calls in developing countries are incoming calls [6].

The geography of IP needs to be a factor when considering VOIP. Investment in IP networks is still highly United States-centered. More than 95 percent of interregional IP bandwidth connectivity is to and from North America. Europe is catching up; major investment in fiber-based networks has occurred since the opening up of European Union (EU) markets in the late 1990s. Asia and other Pacific Rim regions are, however, still lagging behind.

In summary, carriers will deploy VOIP when it provides the following:

1. Major new revenue opportunities beyond the revenue stream that carriers currently have
2. Major savings in the operations, administration, maintenance, and provisioning (OAM&P) side of the house

Regarding point 1, a major breakthrough would be the introduction of an entire set of new data-enriched voice applications; for example, distributed call centers, dynamic interactive voice response (IVR), personal voicemail (PVM), Voice Protocol for Internet Mail (VPIM)-based services, unified messaging, and voice-based PCs with network extensions. Regarding point 2, even though budgets vary from carrier type to carrier type, some have the following percentage allocation: 8 percent for amortized equipment (capital expenditures); 6 percent for backbone network (transmission); 11 percent for access; 18 percent for SG&A; 17 percent for operating expenditures; 11 percent for building-related costs; 4 percent for NOC costs; and 25 percent for profit plus tax. (Total: 100 percent of the income.)
These kinds of cost breakouts exemplify the fact that equipment and transmission costs are usually not the major components of a carrier's budget (in this example, 14 percent).12 Hence, any breakthrough that reduces these items, even by 50 percent, has by itself only a limited impact on the carrier's bottom line. With operating expenditures typically exceeding the equipment and transmission costs, carriers look to any new technology being introduced for improved OAM&P support tools that keep those operating costs reduced and in check.
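The illustrative budget above can be modeled directly to see why halving equipment and transmission costs barely moves a carrier's bottom line. The percentage allocation is taken from the text; the simplifying assumption that freed-up costs flow straight to profit is ours.

```python
# Illustrative carrier income allocation (percent of revenue), per the text.
budget = {
    "amortized equipment (capex)": 8,
    "backbone network (transmission)": 6,
    "access": 11,
    "SG&A": 18,
    "operating expenditures": 17,
    "building-related costs": 11,
    "NOC costs": 4,
    "profit plus tax": 25,
}
assert sum(budget.values()) == 100  # the allocation covers all income

# Equipment + transmission: the only lines VOIP trunking economics touch.
eq_tx = budget["amortized equipment (capex)"] + budget["backbone network (transmission)"]
print(f"Equipment + transmission: {eq_tx}% of revenue")

# Even a 50% reduction in those lines frees only 7 points of revenue.
freed = eq_tx * 0.50
print(f"A 50% reduction in those lines improves the bottom line by {freed:.0f} points")
```

Seven points of income is not negligible, but it is far less leverage than the operating-expenditure lines, which is why carriers weigh OAM&P tooling more heavily than raw transmission savings.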
11.3 Carrier Voice Networks
Figure 11.9 shows a simplified view of a North American IXC environment within which VOIP must operate to be successful. The environment is considerably more complex than a handful of nodes casually connected by links, as perennially illustrated by vendor diagrams and conference talks. The figure shows a three-tier network. Some IXC networks are three-tiered; others are two-tiered. (IXCs tend to keep their architecture confidential to claim faster connection time compared with their competitors.)
[Figure 11.9 Simplified view of an IXC's current voice network: a three-tier network spanning West, Mountain, Central, and East regions under CCSS7 signaling, with IXC tandems and the Class 5 switches of ILECs across multiple LATAs, interconnecting with a second IXC.]
The typical North American voice network environment comprises various ILECs, IXCs, CLECs, and wireless cellular companies (the last two classes implicit in the figure). A national network with 300 to 500 nodes is not atypical. Figure 11.10 depicts a boiled-down IXC environment; Figure 11.11 depicts a local ILEC environment. (ILECs, too, are getting into long distance voice services, but for this discussion we focus on local service.) A carrier-grade network for VOIP must scale to serve large metropolitan areas; today's local digital switches can serve 100,000 lines of voice. While some believe that one should use IP (or, more specifically, packets) for everything and that IP can be made good enough to support anything and everything, multiplexing in general, and statistical multiplexing in particular, has always been known to pay for itself only when the cost of the link to be shared is high (for example, in national or international applications). Typically, this condition has translated into the pragmatic result that local-only Frame Relay is not financially advantageous, that local-only fractional T1 is not financially advantageous, and that local-only ATM service is not (all that) financially advantageous. Therefore, if the main tenet of the VOIP industry is that VOIP is bandwidth-efficient, then VOIP will simply have an extremely limited ILEC/local-service market; the converse would be true if the focus were on new services. The fact is, engineers designing new technology must move beyond the dry, clinical abstraction offered by packet disciplines, queue management, and the protocol state machine and offer the buyers of their gear (the carriers) and the clients of the buyers of their gear (the end users) new solutions, not new protocols.
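The observation that statistical multiplexing pays for itself only on expensive links can be made concrete with a toy trunk-sizing model. Assuming independent voice sources that are each active 40 percent of the time (a common talk-spurt figure) and a 1 percent overflow target, both our illustrative assumptions rather than the book's, a binomial tail computation shows how many channels a shared trunk actually needs:

```python
# Channels needed so that P(more than C of N sources are simultaneously
# active) stays below the overflow target, with independent 40%-active sources.
from math import comb

ACTIVITY = 0.40   # fraction of time each source is in a talk spurt (assumed)
TARGET = 0.01     # acceptable probability of overflow (assumed)

def channels_needed(n_sources: int) -> int:
    """Smallest C with P(active sources > C) below the overflow target."""
    for c in range(n_sources + 1):
        tail = sum(comb(n_sources, k) * ACTIVITY**k * (1 - ACTIVITY)**(n_sources - k)
                   for k in range(c + 1, n_sources + 1))
        if tail < TARGET:
            return c
    return n_sources

for n in (24, 96, 480):
    c = channels_needed(n)
    print(f"{n:4d} sources -> {c:4d} channels ({c / n:.0%} of peak)")
```

The relative saving grows with the number of sources, but the dollar value of the saving is the saved channels times the cost of the shared link, which is why the technique has historically paid off on national and international routes rather than on inexpensive local ones.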
Figure 11.12 graphically depicts some initial applications of VOIP in carrier networks; Figure 11.13 shows a typical enterprise application; and Figure 11.14 depicts a possible transition to VOIP on the part of the IXCs, based on better-networking-only features such as bandwidth conservation and QoS. If VOIP designers start adding new end-user solutions, the migration could happen faster. The best action would be to enlarge the application scope to increase the opportunity space.

[Figure 11.10 Boiled-down IXC network: the three-tier topology simplified, with one Tier 1 node, about 20 Tier 2 nodes, and about 80 Tier 3 nodes per region, under CCSS7 signaling.]

It is always easy to redesign the core of a network, since there are significantly fewer nodes. It may be that VOIP can, for the next couple of years or so, set its sights on a core national voice network, although, as noted, that segment is where the least pressure on bandwidth savings exists. This choice, however, is driven less by actual needs and more by the capability of the new technology. A core application, as the simple diagrams shown at conferences or in vendors' white papers tend to illustrate, implies supporting trunking needs for the interconnection of the 8-to-10-router national-level backbone deployed by typical national carriers. However, if this is the course of entry (that is, of deployment), the feature set of VOIP solutions would need to focus on core-level needs rather than edge needs. Moving to the edge of the network, one starts to find hundreds, thousands, tens of thousands, and hundreds of thousands of nodes. A migration at the edge level will invariably take many years, even a decade or so. It should be noted, however, that as of this writing, the maturity of VOIP equipment in general, when measured in terms of carrier-class reliability, features
442
Chapter Eleven
Figure 11.11 Boiled-down ILEC network.
ILECs CCSS7
IXCs
Region/ LATA
LD Tandem
CITY 2 to SS7
Local Tandem
to SS7
Local Tandem
CITY 3 SONET RING
IXC1 IXC2 IXC3
LD Tandem
to SS7 Class 5
to SS7
Local Tandem Local Tandem
Class 5 to SS7
20-100K lines
WEST SECTOR 20-100K lines
CITY 1 (~1Million Lines)
20-100K lines
DOWNTOWN SECTOR CLEC 2
CLEC 1
20-100K lines
Local Tandem
Class 5
EAST SECTOR
Note: DCS, ADN, channel backs not shown
set, port density, and OAM&P support, still leaves something to be desired. Much more work is needed in VOIP at the theoretical level and in terms of implementations. A carrier could decide that by this time, VOATM has had nearly a decade of maturity; VOIP, five years of maturity. Furthermore, MPLS enjoys certain attributes that prima facie make it a better technology than pure IP for supporting packetized voice applications and, as such, could be a competitive technology to pure VOIP. Therefore, to be successful developers and vendors should bring forth VOIP technologies and products with an understanding of the principles expounded herewith. Developers need to understand the existing voice architecture and physical topologies of the providers of voice services. The new technology will not displace
Figure 11.12 Multitechnology packet applications for voice. (Data routers feeding an IP and/or MPLS network over a GMPLS-based optical core of optical switches/OXCs; voice switches, circuit-switched tandems, and circuit-switched Class 5 offices connecting to the PSTN at Tier 1, 2, or 3.)
existing technology, at least for a decade, on a broad scale. Therefore, the new technology has to work in a complementary fashion and must offer new services. There may also be international opportunities (that is, interconnection between continents), wireless applications, and a handful of greenfield applications. Hence, the migration to VOIP will likely proceed in this order:

• International trunking, national-level trunking, regional-level trunking, local trunking, and local applications
• Enterprise applications
• New network expansion, such as wireless networks and cable TV/telephony networks
• Greenfield applications
• New services
Figure 11.13 Toll bypass with the voice/data-converged network (Cisco example). (Class 5 legacy switches on the traditional TDM PSTN; PBXs with circuit-emulation interfaces, multiservice switches, and IP phones on enterprise LANs reach customer-equipment (CE) routers, which connect through provider-equipment (PE) routers and tunnels across the converged core. Solution requirements: QoS on the CE router + QoS on the PE router + mapping of QoS traffic to tunnels + diffserv-aware traffic engineering in the core routers. CE = customer equipment; PE = provider equipment.)
Figure 11.14 Possible IXC transition timetable for VOIP. (VOIP penetration phases 1, 2, and 3 over roughly 2003, 2006, and 2009, progressing across the Tier 1 national, Tier 2 region, and Tier 3 metro-tandem levels, spanning areas such as the West and Mountain regions and states such as NJ and NY.)
Figure 11.15 Example of core usage of VOIP in an initial transition. (A national router core spanning Seattle, San Francisco, Los Angeles, Phoenix, Denver, Houston, Chicago, St. Louis, Washington, and New York, with voice tandems homed on routers at selected cities.)
National-level networks can be somewhat meshy, as shown in the figures of Chapter 1, but there are also natural aggregation tiers where the traffic is simply dual-homed (for reliability) to two nodes in the higher tier. This means that meshiness exists to some degree within each tier but not among all nodes in the network. Voice networks have been designed and deployed (at least in North America) in a hierarchical manner. When considering the national backbone, there may be an opportunity to take advantage of traffic fluctuations across time zones, such as between New York and San Francisco. Therefore, a network such as the one shown in Figure 11.15 might benefit from MPLS techniques supporting a VOIPOMPLS design for constraint-based routing (for example, selecting the route that is least congested at a particular time of day) as compared with a pure IP solution. Designers need to know the kind of traffic generated by voice (about 200 KB per call). Designers also urgently need to understand that traffic in core networks is never bursty unless the network is grossly overdesigned, with hundreds of links where dozens would do.
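The 200-KB-per-call figure can be checked with a back-of-the-envelope sketch. The 8-kbps codec rate, 200-second hold time, 20-ms packetization, and 40-byte RTP/UDP/IP header below are illustrative assumptions (only the 200-KB total comes from the text):

```python
def call_traffic_bytes(codec_kbps=8.0, hold_time_s=200.0,
                       packet_interval_ms=20.0, header_bytes=40):
    """Approximate one-way traffic for a single packetized voice call.

    codec_kbps: vocoder payload rate (8 kbps is G.729-class)
    hold_time_s: assumed average call hold time
    header_bytes: IP + UDP + RTP headers per packet (20 + 8 + 12)
    """
    payload = codec_kbps * 1000 / 8 * hold_time_s      # payload bytes
    packets = hold_time_s * 1000 / packet_interval_ms  # packets sent
    return payload + packets * header_bytes            # payload + header overhead

# ~200 KB of vocoded payload, plus ~400 KB of headers without compression:
print(call_traffic_bytes())  # → 600000.0
```

Note how uncompressed headers would dwarf the payload at low codec rates, which is why header compression matters so much for packetized voice.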
11.4 Deployment and Issues
Carrier networks have at least two characteristics that must be taken into account when the introduction of VOIP is being considered:
1. As discussed earlier, carrier networks are complex, with national networks typically entailing several hundred nodes arranged in a natural multitiered architecture. There is an absolute need for interworking between and among the parallel local, long distance, and international networks.

2. There is a major embedded base of equipment (estimated at around $150 billion in the United States alone) that needs to be fully used.

Recent events in the carrier space (especially in the United States) have clearly demonstrated that financial markets expect carriers to make a positive bottom line of 15 to 25 percent net (30 to 50 percent gross). Carriers do not make money by simply trying and deploying every technology that comes along, particularly if these technologies are not developed with the network, customer, and architecture needs of the carriers in mind. To advance the VOIP cause, there should be much less emphasis on protocol development and more emphasis on delivering products that cut equipment cost at least in half and operations cost by 50 percent. Only at these levels do carriers begin to consider alternatives to the PMO from a financial-viability point of view for a simple replacement of an existing system with a new one that has the same functions. Reliability continues to be a major requirement; costs cannot be cut at the expense of reliability. Alternatively, many new revenue-generating applications are needed. One might be tempted to add "toll quality" as a mandatory requirement to the list of characteristics that must be taken into account. However, considering the generally low quality of cellular telephony service that people have lately gotten used to, quality is perhaps a negotiable parameter. The opportunities for VOIP have to be in the telephone-to-telephone solution (see Figure 11.16).
Only a small subset of people in the world can afford a two-thousand-dollar PC just to make and receive telephone calls, as illustrated in Figure 11.17. The prospective market is represented by the total number of telephone users: about one billion on wireline systems and half a billion on wireless systems in 1999 alone. The VOIP market will expand the most when it enables (likely in the next few years) telephone terminals to transmit straight over the Internet or over an IP/MPLS carrier network, sparing the use of PCs. Figure 11.16 reinforces the point about telephone-to-telephone VOIP advantages in opportunities when compared with PC-to-PC opportunities [7]. It should be noted that, according to FCC numbers, only about 2 percent of all consumer expenditures are devoted to telephone service (approximately $1,000 per year per household). This percentage has remained virtually unchanged over the past fifteen years despite major changes in the telephone industry and in telephone usage. Therefore, new services are a must when addressing new revenues. Most public telecommunications operators in the world are still heavily dependent on voice revenues. Mobile revenues (largely voice) represent the main area of growth at present. Price erosion of Internet revenues is offsetting volume gains
Figure 11.16 Telephone-to-telephone VOIP opportunities as compared with PC-to-PC opportunities. (Source: ITU World Telecommunication Indicators Database. Computer-to-computer and computer-to-phone telephony each address fewer than 50 million users with poor quality; phone-to-phone addresses more than 912 million users with good quality. Current market size, market growth, and projections span roughly 300 million Internet users, 490 million mobile users, 912 million telephone lines, and a population of 6 billion, from high-income through low-income segments.)

Figure 11.17 Opportunities for VOIP. (Source: ITU projections. Volume of IP telephony traffic, in yearly minutes of use from 1996 through 2004, growing from computer-to-computer toward telephone-to-telephone traffic on the order of 100 billion to 200 billion minutes.)
(for example, falling leased-line prices). Some, on the other hand, paint the following very optimistic picture for the future:

1. Mobile Internet is likely to be a major area of future revenue growth.
2. There is a possible future shift of broadcast entertainment (TV, music, and pay-per-view services) onto telecom-type networks (broadband Internet).
3. PSTN voice traffic will shift to IP-based networks.

In principle, VOIP/IP telephony is important for two reasons: in the short term, because it cuts the cost of calls (especially those routed over the public Internet), and in the long term, because telecom carriers are migrating their separate voice and data networks to converged IP-based networks [8]. Examples of IP telephony service providers include Net2Phone, Dialpad.com, and iBasis. Some forecasts that may be of interest to VOIP planners are [9]:

• By 2005, there could be
• 1.4 billion telephone lines
• 1.3 billion cellular telephone subscribers
• 500 to 750 million Internet users
• These could account for
• 250 billion minutes of international voice/fax traffic
• trillions of minutes of total voice/fax traffic
• 1 million gigabits (1 petabit) per second of Internet traffic
• A services market of around U.S. $1.1 trillion
• An equipment market of around U.S. $400 billion
• The premium of an international call over a domestic call (currently more than 300 percent) will be less than 20 percent
• An Internet-like pricing structure

Some of the basic issues under consideration for VOIP are [8]:

• Technical:
• How is IP telephony defined?
• Is QoS comparable? Will it improve?
• How are numbering issues handled?
• Economic:
• What price and cost savings can be expected?
• How quickly will carriers migrate their networks?
• Is it just a form of bypass of telecom monopolies?
• Regulatory:
• Is it voice or is it data?
• Is it a substitute service? Is it an Internet application?
• License it? Prohibit it? Restrict it? Liberalize it?
• Should IP telephony contribute to Universal Service?

Many service providers are now bundling various value-added solutions to supply customers with greater value and to address specific customer needs. Unified messaging is now included in complex solutions, such as IP centrex, and customers are charged for the whole package. While unified messaging may not generate significant revenue as a standalone service, it does enhance the value of the IP centrex solution. Bundling allows service providers to use existing applications to create customized solutions and to differentiate themselves from the competition while also targeting specific consumer groups. In addition, bundled services allow providers to charge package rates and thus balance their portfolios with popular low-margin and high-value, high-margin applications. Flat monthly service fees have become popular with the increasing adoption of mobile communication. Service providers have started bundling a variety of value-added applications and are offering entire packages at monthly fees rather than charging per-usage fees. Monthly fees are attractive to customers because they allow for easier budgeting of monthly telecommunication expenses. Service providers, on the other hand, benefit from bundling low-margin basic voice and data services with some high-margin enhanced services. Flat monthly fees allow service providers to generate higher profits during off-peak hours and from less-frequent users. However, it is imperative that market participants balance charges so that they produce acceptable margins during peak usage times [10]. These are just a few of the considerations necessary for exploring VOIP opportunities.
Wireless Networks

Introduction

A (typical) press quote puts the wireless discussion in context:

While it is hard to be bullish about anything these days, the Boston, Massachusetts–based Yankee Group is extremely gung-ho about wireless. If the research firm is right, wireless penetration will double over the next five years—reaching 21 percent of the world's population by 2006, or a total of 1.3 billion subscribers. That is a bold forecast considering the market slowdown. Currently, wireless penetration amounts to 10.6 percent of Earth's 6 billion people, according to the Yankee Group. [11]

Major opportunities in wireless appear to exist, as this other quote implies:
Cisco is betting "tornado markets" will spur new business this decade, much the way the Internet opened a torrent of new opportunities during the past one. The tornado markets include (i) Internet-based phone systems for businesses, (ii) wireless networks for homes and offices, (iii) Internet-based systems to streamline the way corporations store data, and (iv) gear that reduces Internet traffic congestion by putting copies of popular digital content on computers in multiple geographic locations. A fifth twister is the gear for building metropolitan fiber-optic networks, the speedy on-ramps that connect the long-haul to local traffic centers. Cisco conservatively estimates these markets at $20 billion, and $40 billion by 2004. [12]

Wireless technologies have seen major rollout worldwide in the past fifteen years, particularly in the context of "wide area" cellular voice services. A couple of key drivers in this arena have been voice quality and bandwidth conservation. While bandwidth conservation is already achievable with vocoding technology by itself, there are advantages in moving from a circuit-based vocoded-wireless voice network to a packet-based vocoded-wireless voice network; the latter can offer integrated services (new applications) to the user, as well as simplified network design, planning, engineering, and OAM&P. There also have been some low-bandwidth WAN data applications. More recently, there has been major interest in LAN-oriented wireless technologies, enabling users to access mobile/nomadic data services at high data rates (for example, at 11 Mbps). Carriers, planners, and proponents are looking, in the upcoming few years, to blend all of these discrete services, technologies, architectures, and approaches into a distance-insensitive, campus-to-national, voice-and-data single handheld device to support all of a person's communications needs.
An integrated solution is clearly more cost-effective all around if it can be properly conceived, designed, and deployed. There is currently a heightened interest in penetrating the wireless applications market: "From airport lounges and hotel meeting rooms to coffee shops and restaurants across the globe, a wireless world is being built for mobile professionals to stay connected" [13]. In the shorter term, before the push to an all-integrated network, there has been a push for voice over IP over wireless (VOIPOW) into the traditional cellular/wireless carrier networks, particularly since the wireless industry itself is about to undergo its biggest change ever. Figure 11.18 depicts a simple application. VOIPOW solutions targeted at data (mobile laptop) users allow mobile workers to make and receive telephone calls on a shared wireless infrastructure. VOIPOW is becoming available for both wireless LAN and wireless WAN applications. For wireless LAN applications, companies are providing wireless capability to multiple applications. For example, using an IEEE 802.11b wireless PCMCIA card, mobile callers can conduct Internet telephony calls with ease. Future development includes voice-enabled wireless personal digital assistants (PDAs) and other Internet devices.

Figure 11.18 VOIPOW. (A mall site with a key telephone system and a VOIPOW hub connecting through the CO to the PSTN.)

Travelers carrying equipment with an Internet browser and a wireless PCMCIA card can pass within 30 to 500 feet of a wireless access point and log on to their corporate intranet or check their e-mail or portfolios. An access point is the size of a portable CD player and has an antenna connected to it. Behind this device is a network/Internet infrastructure. Developers plan to use Bluetooth systems and IEEE 802.11, 802.11b (Wi-Fi), and 802.15 systems to enable users in top-of-the-line buildings to achieve global reach over an intranet, an extranet, or the Internet. According to recent Cahners In-Stat Group market data [14], "Public area net services in hotels, convention centers and airports are poised to flourish in coming years." The firm projects $180 million of service revenues in 2002. This field is being tracked by the Wireless Ethernet Compatibility Alliance (WECA), which includes such members as Cisco, IBM, Intel, 3Com, and Microsoft. WECA is looking to forge relationships and network standards among wireless ISPs (WISPs) and, eventually, carriers that will enable roaming for IEEE 802.11b wireless LAN users.
According to WECA members, the public access wireless LANs that will be deployed in airports, convention centers, and even restaurants will create a burgeoning web of wireless LAN hot spots. These hot spots will let mobile workers with 802.11b-equipped computers connect over a shared 11-Mbps link to Internet-based services and corporate networks. [14]

There will be significant investment, according to observers:

The overall 802.11b market is expected to keep growing at a healthy rate despite the economic slowdown, according to an April 2001 report by Cahners In-Stat Group. By 2005, the firm estimates that companies will be spending nearly $6.4B on WLAN equipment.

Companies such as IBM, Compaq, and Dell are introducing notebook PCs with built-in 802.11b radios and antennas. Adapter-card vendors have just started bringing out 802.11b cards for handheld computers, such as those using Microsoft PocketPC software. The carriers are watching the project closely, according to WISPr (Wireless Internet Service Provider Roaming—a contingent within WECA that is preparing roaming proposals/specifications) members . . . "There is a tremendous amount of work going on by all carriers; they are quiet about it, but they are all doing it." [12]

Public Access Location (PAL) services are already available at the Dallas/Fort Worth and Austin, Texas, airports, as well as at Hilton hotels and Starbucks coffee shops and in marinas (e.g., Brewer's Marinas in Connecticut).
Standards Snapshot

A major issue facing wireless-systems designers is the large variety of wireless protocols, as seen in Table 11.2. The IEEE standard developed by Working Group 802.11 was accepted by the IEEE board during the summer of 1997 and became IEEE Standard 802.11-1997. This standard defines three different physical implementations (signaling techniques and modulations), a media access control function, and a management function. The three physical implementations are

1. Direct sequence spread spectrum (DSSS) radio in the 2.4-GHz band
2. Frequency hopping spread spectrum (FHSS) radio in the 2.4-GHz band
3. Infrared (IR) light

All of the implementations support data rates of 1 Mbps and, optionally, 2 Mbps. The 802.11 Working Group then considered additions to the standard to provide higher data rates (5.5 and 11 Mbps) in the 2.4-GHz band and additions that allow wireless LANs to operate in a 5-GHz band (see Table 11.2). DSSS is the most prevalent implementation. The differentiation with Bluetooth technology is that the latter is a low-cost, low-power, short-range radio link. The 802.15 wireless personal area network (WPAN) effort focuses on the development of consensus standards for PANs, or short-distance wireless networks.
454
Provides application-development platform for mobile devices, including cell phones and PDAs. The Open Group's initiative aimed at defining standards for mobile-device management, including session management, synchronization, device-independent content, security and accounting. 3G standard similar to CDMA 2000 but uses wider 5-MHz radio channels; provides data rates up to 2 Mbps, but more spectrum needs to be allocated in some areas.
J2ME (Java 2 Platform, Micro Edition) MMF (Mobile Management Forum) W-CDMA (Wideband-CDMA)
Source: Network Computing (www.networkcomputing.com), 12/17/01.
WISPR (Wireless Internet Driven by the WECA (Wireless Ethernet Compatibility Association); represents industry's first Service Provider Roaming) effort to provide transparent roaming and billing across public wireless LANs.
Goal is to define physical and MAC standards for fixed point-to-multipoint broadband wireless access systems.
IEEE 802.16
Basic IEEE specifications for wireless local area networks.
IEEE 802.11 and IEEE 802.11b
Mired in technical debate and politics, it is critical to wireless LAN market expansion, but delays and indecisiveness may make it meaningless if de facto standards emerge.
Security framework for all IEEE 802 networks; one of the key components of future multivendor interoperable wireless security systems, but implementation won't be simple.
IEEE 802.1x
IEEE 802.11i
2.5G standard for wireless WAMs based on GSM (Global System for Mobile Communications) system deployed throughout Europe and in other parts of the world; on IP-based pocket-data system providing theoretical peak data rates of up to 160 Kbps.
GPRS (General Packet Radio Service)
New standard for 2.4-GHz wireless LANs, providing a bump in data rate to 20+ Mbps, but backward-compatible products won't arrive soon.
Extends GPRS data rate to 384 Kbps, but upgrade may be costly for carriers.
EDGE (Enhanced Data rates for Global Evolution)
IEEE 802.11g
3G standard for wireless WAMs; uses the same architecture as 1X; offers 384 Kbps outdoors and 2 Mbps indoors, but operators will likely need to wait for new spectrum.
CDMA 2000 3X
Revision of 802.11 MAC (Media Access Control) standard; provides QoS (Quality of Service) capabilities needed for real-time applications like IP telephony and video.
Qualcomm is pushing 1XEV as an evolution of 1X technology; uses 1.25-MHz CDMA radio channel dedicated to and optimized for packet data; throughputs of more than 2 Mbps.
CDMA 2000 1XEV
IEEE 802.11e
2.5G standard for wireless WAMs; provides more efficient voice and packet-switched data services with peak data rates of 153 Kbps.
CDMA (Code-Division Multiple Access) 2000 1X
Significance Derivative of Bluetooth 1.x spec; more meaningful standards development relate to Bluetooth application profiles.
Wireless Standards
Bluetooth/IEEE 802.15
Standards
Table 11.2
These WPANs address wireless networking of portable and mobile computing devices, such as PCs, PDAs, peripherals, cell phones, pagers, and consumer electronics, allowing these devices to communicate and interoperate with one another. The goal is to publish standards with broad market applicability and to deal effectively with issues of coexistence and interoperability with other wired and wireless networking solutions. The basic wireless technology in use today is IEEE 802.11b; it supports up to 11 Mbps for distances up to 1500 feet or thereabouts, depending on design. As an alternative, the IEEE 802.15 TG1 is deriving a WPAN standard based on the Bluetooth v1.x Foundation Specification. Approval by the IEEE Standards Board is expected in 2002. The scope and purpose are as follows:

1. To define PHY and MAC specifications for wireless connectivity with fixed, portable, and moving devices within or entering a personal operating space (POS). A goal of the WPAN Group is to achieve a level of interoperability that can allow the transfer of data between a WPAN device and an 802.11 device. The POS is the space about a person or object that typically extends up to ten meters in all directions and envelops the person, whether stationary or in motion. The proposed WPAN standard will be developed to ensure coexistence with all 802.11 networks.

2. To provide a standard for low-complexity, low-power-consumption wireless connectivity to support interoperability among devices within or entering the POS, including devices that are carried, worn, or located near the body. The proposed project will address QoS to support a variety of traffic classes. Examples of devices that can be networked include computers, personal digital assistants (PDAs)/handheld personal computers (HPCs), printers, microphones, speakers, headsets, bar code readers, sensors, displays, pagers, and cellular and personal communications service (PCS) phones.

A number of problems exist.
For example, both IEEE 802.11 and Bluetooth operate in the same 2.4-GHz ISM band, and Bluetooth-enabled devices will likely be portable and need to operate in an IEEE 802.11 WLAN environment. As a result, some level of mutual interference will occur. The IEEE 802.15 Coexistence Task Group 2 (TG2) for WPANs is developing recommended practices to facilitate the coexistence of WPANs (802.15) and WLANs (802.11). The task group is developing a coexistence model to quantify the mutual interference of a WLAN and a WPAN and a set of coexistence mechanisms to facilitate the coexistence of WLAN and WPAN devices:

1. A coexistence model, to demonstrate the effectiveness of the adopted coexistence mechanism and to quantify the effect of the mutual interference of a WLAN (for example, 802.11) and a WPAN (for example, Bluetooth) upon one another under various scenarios:
• WLAN in a laptop and Bluetooth in a nearby PDA
• WLAN and Bluetooth in the same laptop

2. Coexistence mechanisms: mechanisms or techniques to facilitate the coexistence of WLAN and WPAN devices.

3. Both to be documented in an IEEE recommended practice.

The IEEE P802.15.3 High Rate (HR) Task Group (TG3) for WPANs is chartered to draft and publish a new standard for high-rate (20 Mbps or greater) WPANs. Besides a high data rate, the new standard will provide low-power, low-cost solutions addressing the needs of portable consumer digital imaging and multimedia applications. In addition, when approved, the new WPAN-HR standard may provide compatibility with the TG1 draft standard based upon the Bluetooth specification. The IEEE 802.15 TG4 is chartered to investigate a low-data-rate solution with multimonth-to-multiyear battery life and very low complexity. It is intended to operate in an unlicensed international frequency band. Potential applications are sensors, interactive toys, smart badges, remote controls, and home automation (e.g., telemetry).
VOIP Applications

The following is a list of three cases for the use of VOIP technology that are directly transferable to the wireless area:

1. Using IP as a transport technology in the long distance network. (Some call this long distance bypass.) Using VOIP as a long distance bypass by placing calls from a computer has now been in wide use for several years. Wireless VOIP as a long distance bypass has been used in parts of Europe and Asia (mainly over GSM), regions that are still developing means of inexpensive long distance cellular calling. In the Czech Republic, for example, wireless VOIP over GSM has been in use since 1999 to cut the cost of international long distance calls from that country.

2. Using IP as a transport technology for existing services. An example would be a server gateway architecture. New developments in the use of VOIP in wireless LANs have used this approach to cut costs, and many companies and institutions now use it as a viable cost-reduction alternative.

Table 11.3 Key IEEE Standardization Activities
IEEE 802.11: Basic set of wireless standards. Supports wireless devices stationed in or roaming through buildings.
IEEE 802.15: Supports short-range links among computers, mobile telephones, peripherals, and other consumer electronics that are worn or carried.
IEEE 802.16: Supports high-rate broadband-wireless access services to buildings through rooftop antennas from central base stations.
3. Using IP end to end. This use positions IP as an application enabler. Using VOIP end to end in a wireless environment greatly opens up the available applications to wireless users and enables new applications to be developed that would not have been possible without VOIP.

Although these points were originally used as arguments for VOIP in the landline area, all three cases can now be applied to the wireless area as well. The demand for VOIP technology in wireless networks is increasing for a variety of reasons. For example, limited spectrum in developed countries has driven the wireless world to develop and deploy bandwidth-efficient, high-quality technologies such as VOIP. In addition, the increase in mobile workers requires quick, easy communication between remote locations. Wireless IP enables businesses to efficiently converge voice and data over a single wireless infrastructure. Finally, in greenfield environments (such as in developing countries), it is easier and more economical to deploy a wireless network than a wireline network. While bandwidth savings is a factor in using VOIPOW, it is only one element of the overall advantages. Supportive devices are poised to rapidly penetrate the consumer networking market, now that standards have emerged to enable low-cost development. Both traditional and new wireless carriers are aggressively seeking to build new, sustainable revenue streams based on high-margin enhanced services, which carry higher profit potentials. VOIPOW is seen by proponents as one way of delivering these services to increase revenue. To date, however, VOIP has been used in a limited fashion in the wireless domain; wireless networks around the world are still almost completely circuit-switched and voice-oriented. Furthermore, some observers say that VOIPOW is not ready for prime time. So what is the problem with wireless VOIP?
In short, according to observers in the wireless space, VOIP is still under development, with the goal of matching circuit-switched voice quality at similar spectral efficiency. All mobile networks are optimized to give the best possible voice quality through the bottleneck—the air interface—where every bit of extra capacity counts on the bottom line. Either VOIP will add extra headers over the air interface, or an additional network element will perform the packing and unpacking in the core. If voice packets are passed over the air interface, even with compressed-header technology at least ten percent of the voice traffic will consist of packet headers. If packing and unpacking are done in the core network, some delay will be added without improving voice quality, and some of the flexibility of the IP service on offer will disappear. Packet delays and the handling of lost packets will also be problematic. Coding delay (if the packet is generated on the network side), network delay, jitter-buffer delay, and queuing delay can add up, especially on overseas calls. At present, sending a packet halfway around the world and back typically takes 500 ms. Even if that time decreases slightly in the future, it will still not match the delay on existing voice lines. When
one adds 3G voice coding to that configuration, with two levels of interleaving at both ends, the delay will be very noticeable. Competition will also exist in the VOIP world, and probably the greatest threat will come from wireless LANs. A localized wireless LAN (IEEE 802.11b) can offer an inexpensive VOIP alternative with up to 11 Mbps of speed and 100 m of cell range, with Internet and voice access points. It is already used widely in educational institutions and in the corporate world as a cost-saving application, and it operates on a free, unlicensed frequency band. Global ISPs, multinational companies, and even public groups can build very cost-efficient networks in metropolitan areas. As noted, the Wireless Metropolitan Area Network (IEEE 802.16) for "the last mile" promises even higher speeds and will compete for the same wireless high-speed corporate data revenues as 3G networks. Are 3G operators willing to compete with that network in the wireless VOIP market after paying for an expensive license, as well as expensive financing and infrastructure? The building of 3G networks has already started in competitive, financially tough environments where mistakes are not tolerated. In these new networks, wireless VOIP will be provided as part of the 3G multimedia applications. Some Eastern European countries have already experimented successfully with VOIP over GSM. With cheaper call rates, wireless VOIP will definitely have its place in the future, when lower voice quality becomes acceptable. Initially, all voice connections in 3G networks will be circuit-switched. 3G VOIP experiments will follow soon, but do not expect 3G operators to turn to VOIP-only operation at any time in the near future. Meanwhile, 3G vendors can showcase their product development by making their international 3G network sales calls over wireless IP.
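The delay components enumerated above can be summed into a rough one-way (mouth-to-ear) budget. The individual figures below are illustrative assumptions, not measurements from the text, except for the 250 ms of one-way transit implied by the 500-ms round trip cited earlier; the budget is compared against the ITU-T G.114 guideline, which considers about 400 ms the upper limit for acceptable one-way delay.

```python
# Rough one-way delay budget for a wireless VOIP call, summing the
# delay components named in the text. All per-component figures are
# illustrative assumptions.
DELAY_MS = {
    "codec_and_packetization": 45,    # low-rate codec frame plus lookahead
    "air_interface_interleaving": 80, # 3G interleaving at both ends
    "network_transit": 250,           # half of the ~500 ms round trip cited
    "jitter_buffer": 60,
    "queuing": 20,
}

def one_way_delay(budget):
    """Total one-way delay in milliseconds for a dict of components."""
    return sum(budget.values())

total = one_way_delay(DELAY_MS)
print(f"one-way delay: {total} ms")
# ITU-T G.114 prefers < 150 ms one way and treats ~400 ms as the limit.
print("within G.114 upper limit (400 ms):", total <= 400)
```

Under these assumptions the budget overshoots even the 400-ms limit, which is consistent with the text's observation that the added 3G interleaving makes the delay very noticeable.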
While we wait for that, it appears that one of the 3G "killer applications," traditional voice traffic, will remain on the circuit-switched side to guarantee quality [4]. Fixed-line VOIP enables people to make multiple overseas and long-distance calls for the price of one local call. In VOIPOW, operators can still control call prices per the application used.
Cable Telephony Networks
This section provides a summarized view of the cable telephony space. A study by RHK Inc. indicates that voice over broadband (cable) would reach $646 million by 2004, a 63 percent compound annual growth rate. RHK foresees ILECs and multiservice operators starting to offer bundled services to consumers and small- and medium-sized businesses. Also, a mid-2001 survey by Gartner Dataquest found that since the beginning of 2001, 6 percent of households had replaced a traditional telephony line with another form of access, usually broadband and ILEC high-speed access. Additionally, 61 percent of households reported having at least one cellular phone [15]. Major work in this arena has been undertaken by Cable Television Laboratories and PacketCable. PacketCable is a collaboration among multiple system operators (MSOs) and VOIP vendors to develop system specifications for VOIP over cable. Figure 11.19 depicts the basic elements of the PacketCable model [16].
Figure 11.19 PacketCable model.
The PacketCable model defines the functional elements for providing a residential voice telephony service using voice over Internet protocol (VoIP) over a hybrid fiber/coax (HFC) network. Elements of a VoIP cable network include:
• An embedded client or cable modem/multimedia terminal adapter (CM/MTA)
• An HFC access network
• A cable modem termination system (CMTS)
• A call management server (CMS)
• A managed IP network
• A signaling gateway
• A media gateway
• A public switched telephone network (PSTN)
The CM/MTA sits at the subscriber premises and provides a tip/ring pair to the telephone set. The CM/MTA acts as a residential gateway, converting analog voice into packetized voice for transmission over the HFC access network to the CMTS. These elements comprise the subscriber access network. The call management server establishes voice connections over the managed IP network. The signaling gateway passes messages to the SS7 signaling network, and the media gateway provides the voice-bearer path to the PSTN.
Technology advances and an increasingly competitive environment are forcing MSOs to take a strategic view of technology implementation and answer fundamental questions on how to leverage technology to improve profitability.13 Many hybrid fiber/coax (HFC) operators are considering an infrastructure migration to IP. Carriers have spent billions of dollars deploying TDM equipment in access and core networks. This legacy infrastructure has evolved as a series of overlay networks—one for video, one for data, and one for voice (see Figure 11.20). Although
Figure 11.20 Legacy HFC multiservice delivery. (HDT = host digital terminal; RSU = remote service unit; CMTS = cable modem termination system; CM = client modem; STB = set top box.)
the HFC network medium is shared, in the TDM world the services reside on different frequency allocations and have different head-end equipment and customer premises equipment (CPE). This nonoptimal infrastructure is the result of two primary factors. First, an efficient way to converge discrete services to gain any real economies in bandwidth or capital outlay has not existed historically. Voice, video, and data have very different technical requirements—with some requirements more forgiving than others—and very different degrees of ubiquity resulting from these requirements. Video is primarily a downstream service that is somewhat tolerant of a noisy plant, sporadic service interruption, latency, and jitter. Cable data is the next most widely deployed service, despite more stringent network requirements. Network upgrades to a two-way plant often are required here, and the industry is still deciding the issues of security, ISP colocation, and QoS differentiation. These issues are related in many ways and may be solved with an enhanced IP infrastructure. The third major service offering, voice, is the most QoS-intolerant service and is arguably the service where MSOs have the most difficulty luring customers away from the competition. Nonetheless, MSOs have a competitive edge: they have deployed hundreds of thousands of circuit-switched HFC telephony lines, and the growth rate for this service easily outpaces new fixed-line subscriber deployment for local exchange telephony carriers. Second, there is the heterogeneity of service offerings and implementation timeframes. MSO offerings range from providing video services exclusively to
offering all three converged services; the operators that deploy multiple services typically stagger the deployment times.
Why IP?
The current network dynamic views IP as replacing TDM in access and core networks. The migration of HFC networks from circuit-switched services to IP-based services has long been accepted as a foregone conclusion; the only remaining question is not if, but when. Migration to an IP-based network clearly is de rigueur for operators as they outline their vision. One key advantage for IP is critical mass: Although ATM was designed for exactly what HFC service providers wrestle with—namely, combined multiservices with very different QoS requirements—it was overdefined by the standards organizations, which prevented homogeneous mass deployments and sent implementation costs skyward. IP, by contrast, reflects the style of the IETF, which keeps definitions relatively simple and lets deployments proliferate as the market dictates. The catch, as HFC operators have found out, is that IP was not originally defined as a mechanism for differing service requirements. All of the activity around developments such as diffserv, MPLS, and tunneling represents attempts to provide this functionality for a protocol with a best-effort, connectionless, data-only heritage. Packet-based networks enable more efficient use of bandwidth in the access network by interleaving, for example, voice and data traffic within the same spectrum allocation. Although cable TV systems support 1-Gbps rates in the downstream direction, the upstream channel is considerably smaller (approximately 30 Mbps), and the bandwidth has to be shared among all users on the cable segment. Hence, bandwidth conservation has some appeal. Because voice and data traffic have different QoS requirements, accommodating these requirements is processor- and protocol-intensive. The disadvantage of being processor-intensive diminishes over time, because Moore's Law cuts the cost of processing in half every eighteen months or so.
The protocol-intensive nature of converged networks is more of an issue, as myriad standards, protocols, and pet technologies compete to be proven by commercial deployment. Moreover, operators have to weigh the benefits of niche protocols for specific applications against the benefits of choosing a less-than-perfect protocol in the interest of ubiquity. IP has the momentum because it is so commonplace: it has become a de facto standard. It is extremely beneficial for an operator to use IP in lieu of other packet- and cell-based protocols, even if those protocols are better suited to certain applications.
Implementation Alternatives
Once operators have decided that a converged IP-based infrastructure is their objective, they need to address the most effective way to transition the network (or implement it as a greenfield deployment). The issues include technical, economic, and operational considerations and relate to an operator's service mix and plans to
change that mix, as well as the embedded base, the expected growth rate, and the portion of the infrastructure that will migrate to IP. Although packet-based infrastructures are applicable to certain types of video standards—an example is Moving Picture Experts Group 2 (MPEG-2)—the momentum of IP convergence for the purpose of capital equipment convergence and bandwidth conservation is with voice and data. Operators have four strategic scenarios:
1. Deploying data-only services
2. Deploying circuit-switched voice and Data over Cable Service Interface Specification (DOCSIS) data
3. Leading with data and following with VOIP later
4. Deploying VOIP and data concurrently
Several large MSOs have already deployed hundreds of thousands of circuit-switched HFC telephony lines, and deployment continues at a vigorous pace. Many such MSOs have no immediate plans to change from circuit-switched to packet-based voice; they are instead content with scenario 1 of Figure 11.21. Some operators view data as the only viable service addition to video, at least until the arrival of long-term end-to-end IP multiservice maturity. Many cable data systems are deployed today, particularly in the DOCSIS 1.0 environment, separate from the video transmission spectrum. This specification is beneficial because it standardizes the HFC transmission link between the cable modem (CM) and the cable modem termination system (CMTS). In this case, it is a best-effort implementation of the statistically aggregated IP traffic (see scenario 2 in Figure 11.21). If QoS is to be applied to these data streams—with different classes of end users, for example—it will be necessary
Figure 11.21 Deployment scenarios. (Courtesy of Stuart Benington)
to upgrade these networks to DOCSIS 1.1. This upgrade is also necessary for such functions as applying different QoS parameters for a colocated ISP in an operator's network. Different deployment models exist for an operator who plans to deploy data as a long-term strategy and an operator who plans to lead with data and follow with VOIP while waiting for a more pronounced sign of technical maturity on the call management server (CMS) end. The CMS solutions, often referred to under the general label of softswitches, are the least mature component of the PacketCable VOIP architecture (illustrated in the lower right quadrant of Figure 11.21). An interim solution that is gaining momentum is the concept of deploying DOCSIS 1.1 in the access network for voice and data and then performing an IP-to-GR-303 gateway conversion in the head end. This solution is particularly attractive for an operator who owns, or has access to, Class 5 switches. It gives operators the ability to capitalize on the economies and operational advantages of DOCSIS while retaining the comfort level, sunk costs, and depreciation benefits of employing their Class 5 switches for IP voice traffic in the access network. This solution does not change the DOCSIS data solution in the core; either the CMTS performs both the Layer 2 and the Layer 3 (routing) functions, or a Layer 2-only CMTS connects to an edge router in the head end as a handoff point to the IP backbone. The premises device evolves from strictly a cable modem to an embedded multimedia terminal adapter (E-MTA) that accommodates voice and data. This interim solution is illustrated as scenario 3 in Figure 11.21. Some MSOs that have a large amount of circuit-switched telephony deployed today are pursuing the reverse of this hybrid approach, using a GR-303 interface to a Class 5 switch in the core as the handoff point.
This embedded base represents a major capital investment and source of revenue for operators who do not want to uproot this infrastructure unnecessarily. That said, operators may gain huge economies once the converged “managed IP backbone” becomes a reality, which provides a compelling incentive to roll circuit-switched HFC telephony traffic to a CMS. In this case, a packet voice gateway function in the host digital terminal (HDT) provides this link. This is illustrated as scenario 4 in Figure 11.21. A crucial difference between a DOCSIS data-only solution versus a DOCSIS voice-and-data solution is how the optical node layout is constructed. Operators cannot be as aggressive in data subscriber concentration if they plan to add voice in the foreseeable future. For example, the ratio of optical nodes to CMTS interfaces will drop substantially with the addition of G.711 VOIP for any reasonable blocking rate and centum call second (CCS) traffic load. Moreover, operators uncertain of whether they will pursue VOIP need to balance the up-front deployment costs needed to make the network VOIP-ready versus the overhaul costs associated with adding voice after the network has been engineered for data only. The ratio of the incremental cost of prebuilding a network to support voice (Ci) over the cost for a retroactive network upgrade (Cnu) gives a rough estimate
for the necessary probability of adding voice to justify the prebuild (Pv):
Pv = Ci/Cnu
This calculation emphasizes an important point: If operators believe that even a modest chance of deploying VOIP exists, they should investigate the option of preparing the HFC deployment for voice and data from the outset. One other relevant factor is how strictly the operator and vendor communities adhere to the PacketCable standards. Many standards and node interconnections have to work seamlessly for VOIP to become a reality (see Figure 11.22). The need for seamless connections holds true for both DOCSIS-to-PSTN connectivity and DOCSIS-to-CMS (end-to-end) VOIP solutions. The more a given standard or set of standards approaches ubiquity, the more choices the operator has for each node and the more the consumer benefits from a price and feature standpoint.
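The prebuild break-even rule above reduces to a few lines of code. The cost figures in the example are hypothetical, chosen only to illustrate how the ratio is read.

```python
def prebuild_breakeven_probability(c_incremental: float, c_retrofit: float) -> float:
    """Return Pv = Ci / Cnu: the minimum probability of eventually adding
    voice at which prebuilding the HFC network for voice pays off,
    given the incremental prebuild cost (Ci) and the cost of a
    retroactive upgrade later (Cnu)."""
    if c_retrofit <= 0:
        raise ValueError("retrofit cost must be positive")
    return c_incremental / c_retrofit

# Hypothetical figures: prebuilding costs $2M extra now, while
# retrofitting a data-only build later would cost $10M.
pv = prebuild_breakeven_probability(2e6, 10e6)
print(f"prebuild justified if P(add voice) >= {pv:.0%}")
```

With these numbers, even a 20 percent chance of eventually offering voice justifies the up-front cost, which is the "modest chance" point the text makes.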
Criteria for Success
Consumers do not care whether they have circuit-switched voice or VOIP services, or whether their data is integrated in the same bandwidth. Key issues for consumers are price, reliability, and features. Effective implementation of IP, moreover,
Figure 11.22 Typical PacketCable architecture.
should only be done when operators are comfortable with the technology's maturity and economics. A network architecture based on IP multiservices clearly has the foundation to meet subscriber needs, because prices fall when capital and operational costs drop. Feature availability also flourishes as more vendors become PacketCable-compliant and as reliability approaches that of the circuit-switched world. The true measure of success is how well an operator fulfills consumer needs.
Other Considerations
The PSTN model requires 99.94 percent system availability for telephone service subscribers. According to PacketCable, the same end-to-end network availability objective of 99.94 percent exists for VOIP over cable networks. The subscriber access portion has an availability target of 99.99 percent, or 0.01 percent downtime (53 minutes per line per year) [17]. To meet industrywide standards, carrier-grade VOIP networks must meet certain levels of system availability, transmission quality, scalability, and maintainability [16]. The proposed AT&T-Comcast combined cable TV network is reported to have 22 million subscribers and to generate $19 billion per year in revenue, corresponding to $80 per month per home ($1,000 per year per home). Comcast is reportedly offering $72 billion to AT&T for 13.7 million homes, which puts the price tag at around $5,500 per home. Assuming that 30 percent of the $1,000 revenue from the customer goes to cover content fees and that 70 percent is infrastructure-supporting revenue, it would take eight years for a payback. Now assume that new services, such as cable telephony, can be overlaid on the cable plant to generate another $1,000 per year, which is the amount of telephone service expenditure in a typical household. Clearly, the payback can be reduced appreciably, perhaps to as little as four years; this serves as the economic motivation for cable telephony. Taking a different perspective, for the last several years pundits have been anticipating VOIP over HFC as "eighteen months to two years away"; hence, it is easy to see why skeptics have risen from the dust left behind by the original heady promises of deployment. Be that as it may, the key to launching VOIP lies in cable's ability to take IP applications beyond plain-old telephony [18].
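The payback arithmetic above can be checked with the chapter's round numbers. The figures are per home per year; the $5,500 price tag is the chapter's rounding of the $72B/13.7M-home offer.

```python
# Payback arithmetic for the AT&T-Comcast example, using the
# chapter's round numbers (USD per home).
price_per_home = 5_500          # rough acquisition price per home
annual_revenue = 1_000          # $80/month per home
infrastructure_share = 0.70     # the other 30% covers content fees

# Video-only: only the infrastructure-supporting revenue repays the price.
base_payback = price_per_home / (annual_revenue * infrastructure_share)
print(f"video-only payback: {base_payback:.1f} years")   # ~8 years

# Overlaying cable telephony adds ~$1,000/year of new revenue per home.
with_telephony = price_per_home / (annual_revenue * infrastructure_share + 1_000)
print(f"with telephony overlay: {with_telephony:.1f} years")
```

The video-only case comes out near the eight-year payback cited, and adding the telephony overlay cuts it to roughly three to four years, consistent with the "as little as four years" estimate.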
11.5 Enterprise Applications
Figure 11.13 depicts typical enterprise applications. Most enterprise VOIP environments are a variant of this basic figure. Because of the extensive trade-press coverage of this issue, our discussion on the topic is limited to this paragraph. Reportedly, Cisco has been very successful in introducing this technology to the enterprise environment. We note that a study released in late 2001 by Sage Research indicated that 86 percent of American companies were evaluating or
deploying IP telephony. This encouraging news was tempered, however, by reports that companies continue to use PBXs long after they depreciate and that PBX capacity is underused. These signs suggest that getting enterprises to migrate will still require compelling new features.
11.6 International Opportunities
There continue to be international opportunities in VOIP. Figure 11.23 shows the cost savings achievable with VOIP technologies on selected routes, giving VOIP a window of opportunity while the arbitrage exists. In countries where there is competition, and hence the emergence of new carriers, there clearly is a VOP/VOMPLS opportunity. As shown in Figure 11.24, the pool of such countries is increasing, although there has been a major practical retrenchment in North America of late. There is relatively strong interest in VOIP in Asia (e.g., Taiwan, China, and Japan); IDC has forecast the market for VOIP services in the Asia and Pacific Rim regions to reach $6.2 billion by 2006 [19].14 The following shows the typical revenue "pie" for an international call [5]:
Figure 11.23 VOIP advantages in international applications: price of a one-minute call to the United States (USD, February 2000) for Bolivia, Hungary, Nepal, and Singapore. (Source: Summary of ITU country case studies, available at www.itu.int/wtpf/casestudies; Net2Phone; PTOs.) Note: "IDD" refers to published prices from the incumbent operator for international direct dialing. "IP out" refers to using the Net2Phone IP telephony service within the country. "IP in" refers to using Net2Phone in the United States to call the country.
Figure 11.24 Countries with telecom competition: increasing competition by number of countries and by service (local, long distance, international), 1995-2000. (Source: ITU Telecommunication Regulatory Database)
Countries permitting international competition:
1990: Japan, New Zealand, UK, USA.
1995: Australia, Canada (partial), Colombia, Chile, Denmark, Finland, Japan, Korea (Rep.), Malaysia, New Zealand, Philippines, Sweden, UK, USA.
1998: Australia, Austria, Belgium, Brunei Darussalam, Canada, Chile, Colombia, DPR Congo, Denmark, Dominican Republic, El Salvador, Finland, France, Germany, Ghana, Guatemala, Hong Kong-China (after Dec. 1998), Indonesia, Ireland (after Dec. 1998), Israel, Italy, Japan, Korea (Rep.), Malaysia, Mexico, Netherlands, New Zealand, Norway, Peru, Philippines, Russia, Somalia, Spain (after Dec. 1998), Sweden, Switzerland, Uganda, Ukraine, UK, USA.
• International telephone call at $3.00 for three minutes:
  • Telco that "owns" the customer gets its share of line rental (under U.S. $0.01 per call)
  • Telco that originates the call gets an international call charge (U.S. $2.00)
  • Telco that terminates the call gets a net settlement (U.S. $1.00)
• PC-to-phone call (dialup) at $1.00 per call:
  • Telco that "owns" the customer gets a fractional share of the line rental plus a local call charge (under U.S. $0.10 per call)
  • ISP that "owns" the customer or IP telephony provider gets a fractional share of the subscription charge (U.S. $0.10)
  • IP telephony provider gets profit (more than U.S. $0.70)
  • Telco that terminates the call gets an interconnect or local fee (under U.S. $0.10)
Note: Interconnect rates are a fraction of settlement rates.
Pricing VOIP services has been a recent driver at the international level [8]:
• In competitive low-price markets, the main market opportunity for VOIP is value-added services, such as unified messaging.
• In markets in transition to competition, VOIP offers a route toward early introduction of competition and creates downward pressure on prices.
• In high-price monopoly markets, VOIP creates (where permitted) opportunities for low-cost calls. Even where not permitted, VOIP is used widely to reduce costs for international call termination.
Some of the economic and strategic questions asked at the international level include the following:
• How big is the market for IP telephony? How big will it become?
• What impact is IP telephony having on net settlement payments to developing countries?
• Does IP telephony generate new traffic, or does it substitute for existing traffic?
• What impact will IP telephony have on the tariff-rebalancing strategies of carriers?
• Should developing-country carriers attempt to block IP telephony or provide it?
• Should incoming and outgoing IP telephony calls be treated differently?
The ITU-T recently published these opinions on VOIP:
• Implications for Member States
  • The deployment of IP-based networks benefits users, industries, and the economy at large, because it fosters technical and market innovation, diversity, and economic growth.
  • IP . . . could be viewed as a significant opportunity for all countries to respond to the convergence of information and communication technologies and evolve their networks.
• Implications for Operators
  • Continuing development of the Internet and IP-based networks is a significant medium for communications and commerce.
  • Mobile wireless systems are expected to migrate toward an IP-based architecture in order to deliver integrated voice, data, and multimedia services, as well as access to the Internet.
  • ITU . . . is of the view that IP telephony applications are best supplied in a market in which consumers have choices among multiple, alternative sources, because only then will citizens, businesses, and the overall economy reap the benefits of innovation and cost effectiveness.
From a regulatory point of view, in the United States there is currently no specific regulation of IP telephony; it is exempt from the FCC's international settlements policy. In the European Union (EU), IP telephony is not considered voice telephony, because it is not considered real-time. In Canada, IP telephony service providers are treated like other telephony providers and contribute to Universal Service funds. In Hungary, IP telephony is allowed provided it has a delay of more than 250 ms and packet loss of more than 1 percent. In China, operators have negotiated a specific accounting rate for IP telephony traffic [5].
11.7 Equipment/Vendor Trends
This section provides a snapshot view (current as of this writing) of some vendor activities to give a sample of the state of affairs in the VOIP arena. Only a few illustrative examples are covered to provide a sense of some of the trends. A number of equipment and software providers subscribe, at least at face value, to observations such as, Hidden behind the hype and promise of converged services is a basic truth: next-generation networks are only as valuable as their ability to allow us to
quickly, economically shape and scale enhanced services that meet customers' highest service-quality expectations [1]. Table 11.4 identifies some key vendors in each equipment/software category. NetNumber, Inc., announced at the fall 2001 Voice on the Net (VON) conference in Atlanta, Georgia, that it had created the Interconnect with ENUM Alliance program to bring together SIP and ENUM technologies to advance service-provider interconnection for voice communications over the Internet (ENUM is discussed in Chapter 10). NetNumber has stated that its new program is intended to promote SIP as the dominant signaling standard for interconnecting VOIP service providers and carriers and to establish NetNumber's global ENUM service as the global address registry for cross-domain SIP server lookups based on telephone numbers. As covered in Chapter 10, using SIP and ENUM together will enable
Table 11.4 Key Vendors in Each Equipment/Software Category
Signaling gateway: AB Network Limited; ACE*COMM; Adir; Agilent Technologies; Aspect; Clarent; Cognitronics; Comgates; Comgates Ltd.; CommWorks Corporation; Continuous Computing Corporation; CopperCom; IntelliNet; Performance Technologies; Sonus Networks; Star21 Networks AG; Tekelec; Telica
Media gateway: AudioCodes; Comgates; Continuous Computing; Convedia; Convergent Networks; CopperCom; Nuera; Sonus Networks; Telica; Unisphere; VocalData
Media gateway controller: ADC; Alcatel; Cisco; Comgates; Continuous Computing Corporation; Convergent Networks; CopperCom; GNP; ipVerse; Lucent; Marconi; Nortel; Nuera; Santera; Sonus Networks; Sylantro; Syndeo Corporation; Tekelec; Telcordia Technologies; Telica; Telos Technology; Unisphere; VocalData; Vsys
Application server: Convergent Networks; Kabira; Sonus; Unisphere Networks; VocalData
Note: Several vendors manufacture products in several categories; the key product category is shown.
edge service providers to promote the availability of SIP-addressable endpoints to each other and to interexchange carriers [20]. The Interconnect with ENUM Alliance has four primary objectives: (1) to promote and test practical interoperability of SIP and global ENUM across multiple platforms, (2) to provide a context for resolving technical issues such as network address translations (NATs) and firewalls, (3) to offer a forum for exploring business issues associated with origination and termination settlement charges for SIP interconnects, and (4) to promote the availability of SIP-addressable endpoints to interexchange carriers. As part of the ENUM Alliance, members agree to
1. Provision E.164 phone numbers and associated network-facing SIP server addresses into NetNumber's global ENUM directory, thereby allowing this information to be queried by other global ENUM participants
2. Have an ENUM-compliant, network-facing SIP server that addresses IP-enabled endpoints provided by the company to its customers
3. Query NetNumber and accept SIP requests from other alliance members
4. Participate in interoperability trials with other alliance members, agreeing to promote the use of global ENUM within the telecommunications industry
5. Ensure that their network equipment is compliant with the NetNumber ENUM service
6. Have a basic commercial agreement with NetNumber for global ENUM service
The Frame Relay Forum (FRF) has recently announced the adoption of FRF.20, the Frame Relay IP Header Compression Implementation Agreement. This agreement defines how IP header-compressed packets should be handled over an FR interface, including how the compression is to be negotiated between the compressor and decompressor and how packets are to be encapsulated. IP header compression is used on virtual circuits between end-user systems (that is, DTE to DTE).
When an end user transports voice packets over an IP network, each voice packet is made up of the voice content (payload) and a header that contains additional protocol information. IP voice calls use three types of protocols—RTP, UDP, and IP—with the header consuming a substantial portion of the total voice packet. While the voice payload itself can be as little as 20 bytes long, the VOIP header can be as large as 40 bytes long. IP header compression examines the 40 bytes of header and finds redundancy and other opportunities for sending only a reference to the context rather than the entire header. By reducing the VOIP header from 40 bytes to a minimum of 2 to 4 bytes, IP header compression offers bandwidth efficiencies for enterprises, service providers, and providers of managed services networks. A 64-kbps link can transport only two uncompressed G.729 VOIP calls, but by using IP header compression, the same link can carry five concurrent calls [21].
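The two-versus-five-calls claim follows from simple arithmetic: a G.729 coder emits a 20-byte payload every 20 ms, the uncompressed RTP/UDP/IP header adds 40 bytes, and compression shrinks the header to 2 to 4 bytes. The 7 bytes of per-packet Frame Relay framing assumed below is an illustrative figure, not one taken from the FRF.20 agreement.

```python
def voip_call_bandwidth_kbps(payload_bytes=20, header_bytes=40,
                             framing_bytes=7, interval_ms=20):
    """Wire bandwidth of one VOIP stream in kbps: one packet of
    payload + RTP/UDP/IP header + link-layer framing sent every
    packetization interval."""
    bits_per_packet = (payload_bytes + header_bytes + framing_bytes) * 8
    packets_per_second = 1000 / interval_ms
    return bits_per_packet * packets_per_second / 1000

link_kbps = 64
uncompressed = voip_call_bandwidth_kbps()               # full 40-byte header
compressed = voip_call_bandwidth_kbps(header_bytes=4)   # compressed header

print(f"uncompressed G.729 call: {uncompressed} kbps -> "
      f"{int(link_kbps // uncompressed)} calls on a 64-kbps link")  # 2 calls
print(f"compressed G.729 call: {compressed} kbps -> "
      f"{int(link_kbps // compressed)} calls on a 64-kbps link")    # 5 calls
```

Under these assumptions the 64-kbps link carries two uncompressed calls and five compressed ones, matching the figures cited above; the exact count depends on the framing overhead and the compressed-header size actually negotiated.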
472
Chapter Eleven
Cable Television Laboratories, Inc. (CableLabs), had planned to complete the DOCSIS (Data over Cable Service Interface Specification) 2.0 Specification by the end of 2001. As part of this specification, advanced physical-layer modulation techniques will be used to triple the raw upstream capacity of cable modem technology for cable operators and cable ISPs, without requiring any physical rebuilding of cable networks. The initial DOCSIS 1.0 Specification delivers raw downstream and upstream capacities of approximately 37 Mbps and 5 Mbps, respectively. The current 1.1 Specification raises the upstream capacity to approximately 10 Mbps. (No change has been made to downstream capacity; that was already highly optimized in the initial specification.) Cable ISPs parcel out this bandwidth in accordance with their best economic models to provide the current maximum 3-Mbps (downstream)/500-kbps (upstream) service that cable modem customers can experience. By raising the upstream capacity to the approximately 30 Mbps contemplated by the DOCSIS 2.0 Specification, cable ISPs will be able to offer other advanced high-speed data services, including VOIP, on a more cost-effective basis: with only minimal capital investment, they can raise per-customer upstream capacity far enough to allow matched two-way speeds. DOCSIS 2.0 will include two modulation techniques: synchronous code division multiple access (S-CDMA) and advanced frequency-agile time division multiple access (A-TDMA).
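The raw-capacity figures above can be summarized numerically. The 3-Mbps symmetric-service count below is our own illustration of the "matched two-way speeds" point, not a CableLabs figure:

```python
# Raw upstream capacity by DOCSIS version (Mbps), per the text
upstream_mbps = {"DOCSIS 1.0": 5, "DOCSIS 1.1": 10, "DOCSIS 2.0": 30}
downstream_mbps = 37  # unchanged across versions

# DOCSIS 2.0 triples the raw upstream capacity of 1.1
assert upstream_mbps["DOCSIS 2.0"] == 3 * upstream_mbps["DOCSIS 1.1"]

# Rough headroom for matched two-way 3-Mbps services per upstream channel
for version, up in upstream_mbps.items():
    print(version, up // 3)   # 1.0 -> 1, 1.1 -> 3, 2.0 -> 10
```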
DOCSIS 2.0 will offer full compatibility with DOCSIS 1.0 and 1.1 cable modems and cable modem termination systems; additional channel capacity in the upstream path; a more robust operating environment through increased protection against the electronic impairments that occur in cable systems; coexistence of S-CDMA and A-TDMA in the same physical channel as DOCSIS 1.0/1.1; and further enhancement and augmentation of the international DOCSIS standard (ITU J-114), which has been adopted worldwide and provides international interoperability of data signals [22]. At the end of 2001, AT&T announced two new service options that extend the reach of VOIP for businesses headquartered in the United States. First, AT&T Managed Internet Service with VOIP will now permit enterprise customers to define additional enterprises as on-net, thus creating VOIP extranets. Second, AT&T Managed Router Service with VOIP now supports DS-3 FR speeds at U.S.-headquarters sites, enabling hub-and-spoke VOIP architectures to reach sites in more than forty countries. The company notes that early adopters experimented with VOIP on a small scale—on the fringes of their networks—to verify that the network had been engineered for VOIP quality. As their confidence grew, customers realized how efficient it could be to piggyback voice on top of data. Some corporate planners are now ready to move VOIP from the network fringes to the center and reap even greater benefits. Businesses with AT&T Managed Internet Service (MIS) now have the option to include other businesses in a VOIP extranet, meaning that a business can define a community of vendors, partners, and customers—all of whom must have AT&T MIS—as on-net, thus making the entire
Carrier Applications
473
community eligible for an on-net credit against the monthly flat-rate channel charge. The result is a monthly discount on intersite VOIP calls that can be as high as 50 percent against the effective rates. The addition of DS-3 FR capability to the AT&T Managed Router Service with VOIP changes the potential configurations of VOIP implementations based on FR service. The superior efficiency of DS-3 FR means that many voice channels can converge on a single location to be handled by a single Cisco 7200 Series router. AT&T states that as business continuity and operational efficiency become increasingly important to network planners, interest in VOIP has gone far beyond experimentation. According to AT&T, “Customers have made it clear that it is time for VOIP to carry real weight in supporting intracorporate communication” [23].
References
1. D. Malossi and T. Harrison. “SnowShore Networks Introduces Web Content Integration for Enhanced Voice Services.” www.pulver.com. December 3, 2001.
2. B. Yocom. “Voice over IP Is a (Fast) Moving Target.” Network World (January 29, 2001).
3. T. Kelly. “When Will Data Overtake Voice?” ITU, Competitive Carrier Forces 2000, Montreux, April 5, 2000.
4. P. Possi. “Voice in a Packet—Don’t Scrap Your Wireless Circuit Switches Yet!, Wireless Voice over IP.” www.umtsworld.com/articles/c0001.htm.
5. T. Kelly. “IP Telephony: Substitute or Supplement?” ITU, “Telecoms @ The Internet VI,” IIR, Geneva, Switzerland, June 12, 2000.
6. T. Kelly. “When and Where Will IP Overtake Voice?” ITU, TeleNor Carrier Event, Lofoten, Norway, August 29–September 1, 2000.
7. F. Carducci. “Voice over IP and VIP-TEN Project.” [email protected].
8. T. Kelly. “Internet Pricing and Voice over IP: Implications and Developments—Learning Initiatives on Reforms in Network Economies.” Lime.Net, Copenhagen, Denmark, March 11–16, 2001.
9. T. Kelly. “The New Network Economy.” ITU, Webster’s University, Geneva, Switzerland, February 29, 2000.
10. D. Malossi and T. Harrison. “Bundled IP-Based Voice Applications Speak to Consumer Needs.” www.pulver.com. December 12, 2001.
11. D. McDonough, Jr. Wireless News Factor. July 16, 2001.
12. Wall Street Journal. July 19, 2001.
13. IEEE Institute Magazine (July 2001).
14. Network World (May 28, 2001).
15. K. Dave. “Studies Reveal Growing Interest in Deploying New Telephony Solutions.” www.pulver.com. September 19, 2001.
16. W. Carter. “Planning 2002: Carrier-Grade IP Telephony, Designing Tomorrow’s Network” (September 2001). www.cabletoday.com/ct2/archives/0901/0901carrier-grade.htm.
17. VOIP Availability and Reliability Model for PacketCable Architecture, PKT-TR-VOIPAR-V01-001128. www.packetcable.com/specifications.html.
18. L. Hamilton and M. Robuck. “Ho-Hum on VoIP? Here’s How to Get Your IP Groove Back” (October 2001). www.cabletoday.com.
19. K. Dave. “VoIP Penetration in Asia Continues.” www.pulver.com. September 26, 2001.
20. R. Archer. “NetNumber Launches Alliance Program to Promote SIP and ENUM Service Provider Interconnections.” www.pulver.com. October 15, 2001.
21. R. Archer. “Frame Relay Forum Approves IP Header Compression Implementation Agreement.” www.pulver.com. July 5, 2001.
22. R. Archer. “DOCSIS 2.0 to Triple Cable Operator Upstream Capacity.” www.pulver.com. September 4, 2001.
23. D. Malossi and T. Harrison. “AT&T Upgrades Voice over IP for Hubs, Extranets.” www.pulver.com. December 3, 2001.
Notes
1. For example, according to the FCC, in 1998 annual long distance service revenue in the United States was about $105 billion.
2. As compared with 105 billion minutes in 1999.
3. Nortel Networks, Lehman Brothers, Merrill Lynch, and Dr. Bilel Jamoussi at Nortel.
4. Ovum sources. Volume 2002—voice, 30 Tbps; data, 400 Tbps. Volume 2006—voice, 30 Tbps; data, 700 Tbps.
5. A different and smaller figure would be derived from the above Giga Information Group data, as follows: bits transacted per year—2 × 160 × 10^9 × 60 × 64 × 10^3 ∼ 128 × 10^16 (the factor 2 comes into play because the link is duplex); applicable time base—3600 × 8 × 250 ∼ 7.2 × 10^6 seconds (this equates to 8 hours a day, 250 days a year). The resulting bit rate is ∼15 × 10^10 bps, or 150 Gbps; the figure above was 30 × 10^12.
6. At face value, one could argue that the digitization of the PSTN, which occurred from the mid-1960s to the mid-1980s, improved transport only. However,
while digital voice was a substitute for analog voice, the digital technology significantly reduced operating costs for the carriers. Operating digital switches required significantly fewer people and, in conjunction with digital cross-connect systems (similarly enabled by digitization), also greatly reduced provisioning costs (by reducing wire-room work). Requirements for power, space, air conditioning, and so forth were, in turn, reduced. Typically, the operating-expenditure fraction of a carrier’s budget is 20 to 30 percent of the total; reducing operating expenditures thus has an even more profound effect on a carrier’s budget than savings in equipment or transmission.
7. Some observers note that while a “fiber/bandwidth glut” may be an issue in some parts of the world, it is not an issue in all parts of the world.
8. The cost per line of a traditional switch is in the range of $300 to $350.
9. Sprint recently announced a multiyear billion-dollar project for VOATM.
10. As compared with 106 billion minutes in 2002.
11. We included this figure for reporting purposes. We believe the 2004 figure for services to be about $4 billion.
12. Similar numbers hold true for enterprise networks.
13. The material that follows in Section 11.4 is based on S. Benington ([email protected]), “Planning 2002: HFC to IP Migration; Follow the Leader, or Set Your Own Course” (September 2001). www.cabletoday.com/ct2/archives/0901/0901hfc-ip.htm. Also, Tellabs’ Broadband Media Group.
14. These numbers are higher than the numbers we subscribe to in this book, but we report them nonetheless.
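The back-of-the-envelope derivation in note 5 can be checked directly. The note rounds to ∼15 × 10^10 bps (150 Gbps); unrounded arithmetic lands slightly higher, about 1.7 × 10^11 bps, the same order of magnitude:

```python
minutes_per_year = 160e9                 # transacted voice minutes (note 5)
bits_per_year = 2 * minutes_per_year * 60 * 64e3  # duplex x 60 s/min x 64 kbps
busy_seconds = 3600 * 8 * 250            # 8 hours a day, 250 days a year
rate_bps = bits_per_year / busy_seconds
print(f"{rate_bps:.2e} bps")             # ~1.71e+11, i.e., roughly 150-170 Gbps
```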
Index
Abort, 232 Abort Association (ABORT) chunk, 254 Absolute category rating (ACR) test, 15 Access points, wireless, 452 Acknowledgment, 234 Adaptation speed control, 118–120 Adaptive codebook, 131 Adaptive delta modulation (ADM), 106 Adaptive differential pulse code modulation (ADPCM), 111–123, 123 adaptation speed control in, 118–120 adaptive quantization in, 116 algorithms used by, 112–114 applications for, 123 bit masking in, 117 codecs in, 123 decoder used by, 114, 121–122
difference signal computation in, 116 encoder used by, 114–121 input PCM format conversion in, 114 inverse adaptive quantization in, 117 quantizer scale factor adaptation in, 117–118 SNR for, 123 Adaptive pulse code modulation (APCM), 105 SNR for, 106 Adaptive quantization, 116 inverse, 117 Add, 221 Addresses: assignment of, in Internet, 32, 33 destination, inactive, 269 in DNS, 386, 391 Internet, converting telephone numbers to, 379, 385, 396 in IP, 61 resolution of, in CIOA, 37–38
Address Resolution Protocol (ARP), 37–38 Address Type, 248 Adler-32 checksum, 271–272 Admission control, 169, 293, 296–297 Adspec, 298, 301–302 Advanced Intelligent Network (AIN), 406 Advertised Receiver Window Credit, 245, 248, 251 Advertisement messages, 45 Algebraic codebook-excited linear prediction (ACELP), 132, 165 All-call query (ACQ), 402, 404–405, 408 America Online, 153 Analog to digital (A/D) converters, 102 Analysis modules, 102 ANSI IS-41, 406 Application layer, 25, 158 Application servers, vendors of, 470 APP reports, 51 a_rwnd, calculating, 267–268
Associate, 232 Associations: abort of, 273–274 initialization of, 260–262 shutdown of, 274–276 startup/takedown of, 231 state diagram for, 258–260 termination of, 273–276 Asynchronous Transfer Mode (ATM), 3, 21–22 effect on router roles, 26 IP over, 36–39 QoS mechanisms in, 54–59 versus RSVP, 300 traffic management in, 54–55 voice over, 153, 162–163 AT&T: merger with Comcast, 465 VOIP services offered by, 472–473 ATMARP messages, 38–39 Audio: coding of, standards for, 150 real-time, 35–36 streaming, 166–167 Audioconferencing, mixers in, 48 AuditCapabilities, 221 AuditConnection (AUCX), 207 AuditEndPoint (AUEP), 207 AuditValue, 221 Authentication, 223 Availability, 223, 279 Bandwidth: availability of, 431 in cable, 461 compression of, 435 reservation of, in RSVP, 323 utilization of, and VOIP, 164
B bit, 243 Behavior aggregates (BAs), 288, 305 Best-effort service, 285, 296 Bit masking, 117 Bit rates: for G.723.1, 136 of vocoders, 128 Block pooling, 413 Bluetooth, 455 Border Gateway Protocol (BGP), 359 label distribution using, 356 Buffering, 321 Buffer management, 54 Buffers, finite, 86–88, 93 Businesses, communications among, 154 BYE reports, 51 Cable: bandwidth in, 461 capacity of, 472 voice over, 465 Call dropback, 403–405, 408 Call origination model, 72–73 Call participant management, 208 Canonical name (CNAME), 51 Capability Set 1 (CS1), 407 Capability Set 2 (CS2), 407 Carriers and VOIP, 7 Carrier voice networks, 439–447 Cause Code, 256 Cause Length, 256 Cause-Specific Information, 256 Cell relay, 157
Cellular telephony, spectrum used by, 406 Cepstra, 108 Change Heartbeat, 233 Checksum, 239 Adler-32, 271–272 Chunk Flags, 241, 244, 251, 253–258 Chunk Length, 241, 273 Chunk-Parameter Length, 242 Chunk-Parameter Type, 242 Chunk-Parameter Value, 242 Chunks: ABORT, 254 bundling of, 235, 239, 261, 264, 273 COOKIE ACK, 257 COOKIE ECHO, 256–257 DATA, 242–244, 263–268 ERROR, 255–256 HEARTBEAT, 252–253 HEARTBEAT ACK, 253–254 INIT, 244–248 INIT ACK, 248–250 SACK, 250–252, 265, 267–268 in SCTP, 240–258 SHUTDOWN, 255, 265 SHUTDOWN ACK, 255 SHUTDOWN COMPLETE, 257–258 Chunk types, values of, 240, 242 Chunk Value, 241 Circuit-switched networks: versus packet-switched, 157, 383–384 signaling in, 187–189 Cisco Systems, 32, 465
Class-Based Queuing (CBQ), 322 Classical IP over ATM (CIOA), 37–39 address resolution in, 37–38 data transfer in, 39 problems with, 360 subnetworks in, 37, 38 Classless Internet Domain Routing (CIDR), 32 Codecs, 102 in ADPCM, 123 Code-excited linear predictive (CELP) vocoders, 130, 132 Coding: of audio, standards for, 150 differential, 105–106 linear predictive, 109–110 of speech, 126, 127, 130 of video, 145–146, 150 waveform, 65, 102–106 Comcast, merger with AT&T, 465 Commerce Department, 382 Committed access rate (CAR), 335 Committed rate, 334 Common Channel Signaling System 7 (CCSS7), 187–189 messages in, 225–226 and NP, 400 performance considerations for, 223 and SCTP, 223–226 security requirements for, 223
COMMUNICATION LOST, 234 COMMUNICATION UP, 234 Companding, 104 syllabic, 105 Confidentiality, 223 CONFIRMATION messages, 302 Congestion, 279 avoidance of, 234, 293, 325–326 effects on data versus voice, 370 management of, 293, 320, 323–325, 339 Conjugate-structure algebraic code-excited linear prediction (CS-ACELP), 140–141, 165 Constraint-Based Label Distribution Protocol (CR-LDP), 290–291 Constraint-based routing (CR), 349, 352 Contexts, 221 Contributing sources (CSRCs), 48 Controlled load service, 285, 287, 296 Convergence, 28 politics of, 378 Cookie, 257 Cookie Acknowledgment (COOKIE ACK) chunk, 257 Cookie Echo (COOKIE ECHO) chunk, 256–257 Cookie mechanism, 231 Cookie Preservative, 247 CreateConnection (CRCX), 206
Cumulative TSN Ack, 251, 255 Custom Queuing (CQ), 325 Data: receivers of, 265–266 requirements of, 460 senders of, 266 traffic volume of, versus voice, 377, 431, 434 transfer of: in CIOA, 39 and congestion, 370 DATA ARRIVE, 234 Datagrams, 25 fragmentation/reassembly of, 272–273 fragments of, 31, 34 in IP, 29–32 priority of, 35–36 Data over Cable Service Interface Specification (DOCSIS): 1.0 specification, 472 1.1 specification, 472 2.0 specification, 472 Datarates: of LPC, 110 of vocoding, 108 Delay, 280 algorithmic, 129 causes of, 291 components of, 129 controlling with QoS, 291 end-to-end, 74–75 in G.723.1, 134, 136, 148 in G.728, 139 in G.729, 134, 145 one-way system, 129 processing, 129 in vocoding, 129 for voice, 327–329 for VOIP, 162, 457–458
Delay distribution, properties of, 84–86 Delay jitter, 280 DeleteConnection (DLCX), 207 Delivery, ordered versus unordered, 270 Delta modulation (DM), 106 Destination addresses, inactive, 269 Destination IP Address, 31 Destination Port Number, 239 Destroy SCTP Instance, 233 Difference signal computation, 116 Differential coding, 105–106 Differential pulse code modulation (DPCM), 106 diffserv, 283, 305–320, 331–334, 365–367 architecture of, 309–320 boundary/interior nodes in, 313 documents relating to, 306 ingress/egress nodes in, 313–314, 318 versus intserv, 288–289, 307, 326, 331, 369–370 and MPLS, 290 QoS support in, 307 scalability in, 320 traffic management in, 307–309 diffserv domain, 313 hosts in networks containing, 313 diffserv region, 314 Digital to analog (D/A) converters, 102 Digital wireless, transmission rate of, 145 Directory numbers, transporting, 410
Directory services, 377–378 Discards, 78 Discovery messages, 44, 358 Discrete Fourier transform (DFT), 108 Distance learning, technologies for, 6 Distance vector routing, 28–29 Distributed traffic shaping (DTS), 336 Domain Naming System (DNS): addresses in, 386, 391 and E.164 numbers, 417–421 effects of e164.arpa on, 393 resource records in, 393 use in ENUM, 392 Doubletalk, 67 Draft martini, 359 Droppers, 316 Drop policies, criteria for, 336 Drop preferences, 339 Duplicate TSN, 252 E.164, 379 fetching URIs with, 418–420 numbers in, and DNS, 417–421 numbers in, and NP, 416 e164.arpa, 385–386, 391, 417 administration of, 396 effect on global DNS system, 393 IANA considerations and, 420 organization of, 392 security considerations and, 420–421
E bit, 243 Elastic applications, 286 E-mail, and ENUM, 388 Endpoint configuration (EPCF), 206 End systems, 48 determining/signaling delay by, 299 ENUM. See Telephone number mapping ENUM Alliance, 382–383 ERROR messages, 302 Error-weighting filters, 132 Ethernet, 159 Excess burst, 334 Extension headers, 34 Fast reroute, 340 Fax machines, Internetenabled, 387–388 Feature negotiation, 208 Federal Communications Commission (FCC), 382 Federal Trade Commission (FTC), 382 Fill-in signal units (FISUs), 225–226 Filters: error-weighting, 132 long-term predictor filters, 131 short-term synthesis filters, 131 for speech, 102–103, 107–108 Filterspec, 297, 298 First in, first out/first come, first served (FIFO/ FCFS), 321, 323 Flags, 31, 418 Flow label, 34 Flows, 295, 297
Flowspec, 297, 298, 299 Formant frequencies, 107 Forward error correction (FEC), 171 Forwarding equivalence classes (FECs), parameters of, 353 Fragment Offset, 31 Frame Relay, 157 voice over, 164 Frame relay access devices (FRADs), 164 FRF.20, 471 G.723.1, 126–127, 133–138, 146–148, 165 bit rates of, 136 decoder of, 138 delay of, 134, 136, 148 encoder of, 136–138 excitation frames in, 133 operation of, 136–138 ToR requirements for, 134 G.728, 138–140 delay in, 139 G.729, 126, 133–135, 140–145 decoder of, 143–145 delay of, 134, 145 encoder of, 141–143 excitation frames in, 133 ToR requirements for, 134 G.729A, 126, 151, 165 Gap Ack Block End, 252 Gap Ack Blocks, 251 Gap Ack Block Start, 251 Gatekeepers, 196 Gateways, 193–195 Geographic number portability (GNP), 400 Get SRTT Report, 233 Glitching, 75
Global namespace delegation, 396 Global switched telephone network (GSTN), and NP, 414–415 Global synchronization, 321 Global System for Mobile Communication Mobile Application Part (GSM MAP), 406 Grade of service, versus QoS, 341 Guaranteed service, 285, 287, 296 H.248, 186, 219. See also MEGACO H.263, 146 H.263/L, 146 H.323, 148–150, 165, 185, 186, 189–202 applications for, 190 components of, 165 versus ISUP, 217 problems with, 219 protocol model, 191 versus SIP, 215, 216–217 Version 2, 202 Version 3, 202 H.324, 146–148 Hard-state protocols, 301 Header Checksum, 31 Headers, 25 compression of: in IP, 371–372, 471 in VOIP, 471 in IPv6, 34 Head-of-line (HOL) blocking, 337 Heartbeat Ack Length, 254 Heartbeat Acknowledgment (HEARTBEAT ACK) chunk, 253–254
Heartbeat (HEARTBEAT) chunk, 252–253 Heartbeat Information, 253, 254 Heartbeat Length, 253 Heartbeat messages, 226 Host Name, 247 Host Name Address, 247 Identification, 31 IEEE 802.11, 456 IEEE 802.11-1997, 453 IEEE 802.16, 456 IEEE 802.15 TG1, 455, 456 IEEE 802.15 TG2, 455 IEEE 802.15 TG4, 456 IEEE 802.15.3 HR TG3, 456 Initial marking, 316 Initial TSN, 246, 249 Initialize, 232 Initiate Tag, 244–245, 248 Initiation Acknowledgment (INIT ACK) chunk, 248–250 parameters of, 249, 250 Initiation (INIT) chunk, 244–248 parameters of, 245, 246 Input PCM format conversion, 114 Instant messaging, and ENUM, 388 Integrated Services Digital Network (ISDN), 3 Integrity, 223 Intelligent Network (IN), 406 Interconnect with ENUM Alliance, 470–471 objectives of, 471 Interexchange carrier (IXC) environment, 439–440 International Telecommunications Union (ITU), 111
Internet, 157–162 accessing via PSTN, 377–378 address assignment in, 32, 33 addresses, conversion of telephone numbers to, 379, 385, 396 call centers and, 169, 176 faxing via, 387–388 growth of, 3, 158, 159 history of, 158–161 versus PSTN, 178–180 QoS and, 161 speed of, 161 telephony over: packaging/pricing of, 178, 180 services offered by, 154 uses of, 153–154 voice over, market for, 177 Internet Corporation for Assigned Names and Numbers (ICANN), 382, 424 Internet layer, 24 Internet phones, 18 advantages of, 6 Internet Protocol (IP), 24–36 addresses in, 61 applications for, 456–458 criteria for success with, 464–465 datagrams in, 29–32 functions of, 25, 30 header compression in, 371–372, 471 link/node failure and, 372 migration to, 461–464 versus MPLS, 363 MTP over, 226–229 multicasting with, 171
networks using, 25–26, 383–384 over ATM (IPOATM), 36–39 RFCs for, 36 QoS architecture in, 330–331 routing with, 26–29 SCCP over, 229 telephony using, 389, 414–415, 432 questions about, 468 regulations regarding, 469 traffic management with, 329 uses of, 157, 432–433 version 4, address mechanism of, 33 version 6, 33–36 headers in, 34 security in, 34 voice over (see Voice over IP) Internet service providers (ISPs), billing practices of, 170–171 Internet Telephony Servers (ITSs), 15–18, 172–176 Internetwork routing, 285, 295 Intranets, 26, 161–162 intserv, 283–285, 294–305, 365, 368–369 applications for, 286 architecture of, 294–296 deployment of, 305 versus diffserv, 288–289, 307, 326, 331, 369–370 and MPLS, 290 QoS classes in, 285, 295 service classes in, 287, 296
Inverse adaptive quantization, 117 IP centrex, 450 IP Security (IPSEC), 223 IPv4 Address, 246 IPv6 Address, 246 ISDN User Part (ISUP), 188 versus H.323, 217 versus SIP, 217, 218 triggerless translation in, 406–407 ITU J-114, 472 ITN pooling, 413–414 Jitter, network-induced, 327 Label distribution: in MPLS, 43–45, 356–358 using RSVP, 43–44, 356–357 Label Distribution Protocol (LDP), 44–45, 358 Labels: in MPLS, 41–42, 354 for TE tunnels, 371 Label-switched paths (LSPs), in MPLS QoS, 290 Label switching, in MPLS, 41–43, 353–356 Label switch routers (LSRs), messages exchanged by, 44–45, 358 LEN, 30 Length, 243, 254, 255, 256, 257 Linear prediction analysis-by-synthesis (LPAS), 130–132 Linear predictive coding (LPC), 109–110 datarate of, 110 Link layer, 24
Links: parameters of, 80–81 transient behavior of, 92–94 Link-state protocols, 29 Link-status signal units (LSSUs), 225 Local number portability (LNP), 400 Local output queues (LOQs), 337–338 Location portability, 401 Location routing numbers (LRNs), 408–409 Location server, 211 Lock-out, 321 Long-term (LT) predictor filters, 131 Low-delay code-excited linear prediction (LD-CELP), 138–140, 165 decoding in, 140 encoding in, 139–140 LSP tunnels, 43, 356 M2PA, 227–228 versus M2UA, 227–228 M2UA, 227 versus M2PA, 227–228 M3UA, 228–229 Markers, 316 Marking, 293, 335–336, 365–367 initial, 316 modes of, 335 Media Gateway Controller (MEGACO), 185, 186, 219–221 commands in, 221 constructs in, 221 functions of, 220, 221 versus MGCP, 219
Media gateway controllers (MGCs), 189–190 vendors of, 470 Media Gateway Control Protocol (MGCP), 185, 186, 202–207 commands in, 221 constructs in, 221 functions of, 220 versus MEGACO, 219 operation of, 207 primitives of, 206–207 Media gateways (MGs), 189–190 vendors of, 470 Message signal units (MSUs), 225 Message Transfer Part (MTP), 224–225 over IP, 226–229 Metering, 293 Meters, 316, 334 Micom, 172 Mixers, 48, 49 Mobile number portability (MNP), 401 Modems, V.34, 145, 146 Modified deficit round-robin (MDRR), 337–338 modes of, 338 Modify, 221 ModifyConnection (MDCX), 207 Monitors, 49 Move, 221 MPLS Forum, 359 Implementation Agreement of, 374–375 Multicast Address Resolution Server (MARS) model, 39 Multicasting, 297 in intserv, 296
in IP, 171 in RSVP, 285, 297 support for, in CIOA, 39 Multipoint controllers (MCs), 196 Multipoint control units (MCUs), 129, 196 Multipoint processors (MPs), 196 Multiprotocol Label Switching (MPLS), 39–45, 283, 343–363 advantages of, 360 approaches to, 350–352 connection establishment in, 349 deployment status of, 363 and diffserv, 290 documents relating to, 350 elements of, 355 features of, 40, 345–346, 349–363 goals of, 345 and intserv, 290 versus IP, 363 label distribution in, 43–45, 356–358 label switching in, 41–43, 353–356 motivators for, 345 packet forwarding in, 41–43, 346, 353–356 QoS based on, 289–291, 332, 345, 361, 365–368 routing technologies and, 346, 361 scalability and, 361 standards for, 41 TDM traffic on, 359 and traffic engineering, 340, 352–353, 361
Multiprotocol Label Switching (MPLS) (Continued) views of, 372 and voice, 343–376 VPN support in, 361 Multipulse excitation vocoders, 131 Multipulse excitation with a maximum likelihood quantizer (MP-MLQ), 133 Mutual silence, 67 Name translation, 208 Naming authority pointer (NAPTR) records, 385, 393, 418–420 fields in, 418 uses of, 418 NetNumber, Inc., 470–471 NetNumber Alliance, 397–398 objectives of, 398 Network Address Translation (NAT), 61 Network interface layer, 24, 158 Network layer, 24, 158 Network-Network Interface (NNI) signaling, 55 Networks: achieving reliability in, 226 cable, capacity of, 472 for cable telephony, 458–465 carrier voice, 439–447 circuit-switched: versus packet-switched, 157, 383–384 signaling in, 187–189 design of, 446 elements of, 189–190
increasing capacity of, 8, 126 integrated, advantages of, 7 mobile, 457–458 national-level, 446 resource allocation in, 320, 365 using IP, 25–26, 383–384 for voice, 64, 371 versus data, 95 wireless, 450–458 Network server, 209 NETWORK STATUS CHANGE, 234 NeuStar, 424 Nonbroadcast Multiple Access (NBMA) networks, 37 Nongeographic number portability (NGNP), 400 Non-RTP means, 49 Normal burst, 334 North American Numbering Plan (NANP): number assignment in, 405, 411 switch numbers in, 408 Notification messages, 45 Notification request (RQNT), 206 Notify (NTFY), 206 NSFnet, 159 Number of Duplicate TSNs, 251 Number of Gap Ack Blocks, 251 Number of Inbound Streams, 245–246, 249 Number of Outbound Streams, 245, 248–249 Number portability (NP), 398–416
call routing and, 408–411 and CCSS7, 400 changes caused by, 399 database interfaces, 414 database queries in, 405–408 and ENUM, 394 and e.164 numbers, 416 geographic, 400 and GSTN, 414–415 interim versus true, 401 nongeographic, 400 and number conservation, 411–414 and SIP, 415 types of, 398–401 Onward routing (OR), 404–405, 408 Open Shortest Path First (OSPF), 29 Operation Error (ERROR) chunk, 255–256 Options, 32 Order, 418 Output queues (OQs), 337 Overprovisioning, 280, 284 PacketCable, 458 Packet classifier, 293 Packet filter, 298 Packetized Voice Protocol (PVP), 113 Packet loss rate, 280 Packets, 157 classification of, 314–315 discarding, algorithms for, 321–322 forwarding, in MPLS, 41–43, 346, 353–356 optimal length of, 90–92 in PCM, 103 prioritizing, 363–367 in SCTP, 239–258
sequencing of, in RTP, 48 validation of, 235 waiting for, 76–78 Packet voice communication, 63–99 delivery requirements for, 74 technologies for, 101–124 Parcels, 75 PathErr messages, 285 Path management, 236 PATH messages, 285, 302–303 fields in, 302–303 PathTear messages, 285 Payload Data (DATA) chunk, 242–244 acknowledgment of, 265–268 transmission of, 263–264 Payload Protocol Identifier, 244 Payloads, 25 PCS telephony, spectrum used by, 406 Per-hop behaviors (PHBs), in diffserv, 307, 309–311, 318–320, 331 implementing, 319 specifying, 319 Personal operating spaces (POSs), 455 Policing, 294, 335–336 modes of, 335 Policy-based routing (PBR), 337 Policy control, 169 Postfiltering, 132 Power usage, 130 Preference, 418 Premarking, 316 Primitives: of MGCP, 206–207
SCTP-to-ULP, 234 ULP-to-SCTP, 232–233 Priority Queuing (PQ), 325 Private branch exchanges (PBXs), nomenclature for, 173 Private Network-Network Interfacing (PNNI), 62 Protocol, 31 Protocol data units (PDUs), forwarding of in MPLS, 353–356, 361 Protocols: for ENUM, 393 hard-state, 301 link-state, 29 for QoS, 280 real-time, 45 for routing, 27–28 for signaling, 185–186 soft-state, 299–300 Protocol stacks: for NP database queries, 406–407 for VOIP, 22–24 in VOIPOMPLS versus VOMPLS, 374 Proxy server, 211 PSTN and Internet Interworking (PINT), 215–218 Public Access Locations (PALs), 453 Public switched telephone network (PSTN): versus Internet, 178–180 versus IP networks, 383–384 system availability of, 465 transmission rate of, 145 using to access Internet, 377–378
Pulse code modulation (PCM), 103 packet length for, 92 QoS classes, 57–59 in intserv, 285, 295 mapping between, 289, 370 Specified, 58–59 Unspecified, 59 QoS facilities, 285, 296 Quality of Service (QoS), 279–342 approaches to, 280–281, 284, 368 architecture of, 281–282, 367–368 in IP, 330–331 in ATM, 54–59 capabilities of, 363–370 case study, 327–340 class-based, 288–289, 365–367 controlling delay with, 291 definition of, 281 degradation of, 291 determining/signaling, by end systems, 299 in diffserv, 307 documents relating to, 41, 365 factors affecting, 168 versus grade of service, 341 and the Internet, 161 means of achieving, 282–283, 363–365, 370 MPLS-based, 289–291, 332, 345, 361, 365–368 parameters for, 56–57, 279–280 per-flow, 284–288, 365 problems/solutions for, 167–169
Quality of Service (QoS) (Continued) protocols for, 280 requirements for, 55 router mechanisms for dealing with, 293 in routing, 32–33, 346 and RSVP, 169–171, 367, 368–369 scalability and, 330 stated versus actual, 56 technical challenges of, 330 uses of, 167 and voice quality, 329–330 Quantization noise, 103 minimizing, 132 Quantizing, 102, 103 adaptive, 105, 116 inverse adaptive, 117 logarithmic, 103–104 scale factor adaptation, 117–118 uniform, 103 with vectors, 111 Query on release (QoR), 402–403, 404–405, 408 Queue operation, 79–80 Queues: causes of, 291 management of, 291–294, 320–326 scheduling of, 294, 321 and variance, 292 weighting of, 322 Queuing: FIFO/FCFS, 321, 323 functions of, 320–321 Random Early Detection (RED), 321 Real-Time Intolerant (RTI) applications, 286 Real-time protocols, 45
Real-time services: Cisco support for, 330–340 requirements of, 327–330 Real-Time Streaming Protocol (RTSP), 45 Real-Time Tolerant (RTT) applications, 286 Real-Time Transport Protocol (RTP), 45–50, 171, 191–193 applications of, 46 customizing, 62 features of, 48 packet sequencing in, 48 terms used in, 48–50 Receive, 232 Receiver-end buffering, 78 Receiver Reports (RRs), 51 Receive Unacknowledged Message, 233 Receive Unsent Message, 233 RED drop, 325 Redirect server, 211 regexp, 418 Register server, 211 Request Heartbeat, 233 Reservations, distinct versus shared, 298–299 Reserved, 258 Residual-excited linear prediction (RELP), 110–111 Resource requests, 297 Resource reservation, 300–301 Resource Reservation Protocol (RSVP), 283–285, 294–305 versus ATM, 300 bandwidth reservation in, 323 in Cisco QoS architecture, 331
deployment of, 305 documents relating to, 287–288, 295 extended, 43–44, 357 information flows in, 301–302 and ISP billing practices, 170–171 label distribution in, 43–44, 356–357 messages used by, 285 operation of, 301–304 and QoS, 169–171, 367, 368–369 reservation types in, 298–299 and routing, 170 signaling with, 357 terms used in, 297–298 traffic management with, 357 uses of, 287 RESTART, 234 RestartInProgress (RSIP), 207 ResvConf messages, 285 ResvErr messages, 285 RESV messages, 285, 302–304 fields in, 303 ResvTear messages, 285 Retransmission timer, management of, 268 RFC 2208, 287, 369 RFC 2475, 307–309 RFC 2547, 359 RFC 2702, 41, 350 RFC 2961, 288, 369 RFC 3031, 41, 350 RFC 3032, 41, 350 RFC 3033, 350 RFC 3034, 350 RFC 3035, 350 RFC 3036, 350
RFC 3037, 350 RFC 3038, 350 RFC 3107, 350 Routers, 26–29 criteria for using with RSVP/intserv, 305 effect of ATM on, 26 elements of, 293–294 information exchange by, 28 mechanisms for dealing with QoS, 293 metrics used by, 28 operation of, 28 voice/video support in, 32–33 Routing: constraint-based, 349, 352 distance vector, 28–29 dynamic, 28, 29 internetwork, 285, 295 with IP, 26–29 and MPLS, 346, 361 policy-based, 337 protocols for, 27–28 QoS in, 32–33 and RSVP, 170 static, 28 Routing numbers, transporting, 410 Routing prefixes, 409 Routing tables, 28 updating of, 28, 29 Rspec, 298 RTCP packets, 49 RTP Control Protocol (RTCP), 45, 46, 50–52, 191–193 functions of, 50–51 reports generated by, 51 RTP packets, 49 RTP payload, 49 RTP session, 49–50
Sampling, 102–103
Sampling theorem, 102
Scalability:
  in diffserv, 320
  and MPLS, 361
  and QoS, 330
  and TE tunnels, 371
  and VOIP, 440
SCCP User Adaptation Layer (SUA), 229
Security:
  and CCSS7, 223
  and e164.arpa, 420–421
  in IPv6, 34
Selective Acknowledgment (SACK) chunk, 250–252, 265
  processing of, 267–268
Send, 232
SEND FAILURE, 234
Sender Reports (SRs), 51
Sender template, 298
Serial number arithmetic, 236
Service, 418
ServiceChange, 221
Service E2U, specification of, 419
Service in the PSTN/Internet Interworking Requesting Internet service (SPIRITS), 215–218
Service portability, 401
Service provider number portability (SPNP), 398–399, 401
  implementations of, 412–413
  schemes for, 401–405
Services, 309
  bundling of, 450
  classes in intserv, 287, 296
  real-time, 327–340
Service scheduling, 320–321
Session Description Protocol (SDP), 204
Session Initiation Protocol (SIP), 185, 207–215
  compatibility with other protocols, 215
  components of, 209–210
  and ENUM, 388–389, 393, 470–471
  establishing communication with, 211–214
  functions of, 208
  versus H.323, 215, 216–217
  versus ISUP, 217, 218
  and NP, 415
  for telephones (SIP-T), 210
Session messages, 44
Sessions, 295, 297, 298, 356
Set Failure Threshold, 233
Set Primary, 232
Set Protocol Parameters, 233
Shapers, 316
Shaping, 294
Short-term (ST) synthesis filters, 131
Shutdown, 232
Shutdown Acknowledgment (SHUTDOWN ACK) chunk, 255
Shutdown Association (SHUTDOWN) chunk, 255, 265
SHUTDOWN COMPLETE, 234
Shutdown Complete (SHUTDOWN COMPLETE) chunk, 257–258
Signaling, 183–277
  in circuit-switched networks, 187–189
  endpoints for, 187
  in-band, 282, 363–365
  out-of-band, 282–283, 365
  overlap in, 415–416
  philosophies of, 185
  protocols used for, 185–186
  with RSVP, 357
  standards for, 186, 189
  steps in, 200–202
Signaling Connection Control Part (SCCP) over IP, 229
Signaling gateways, 190
  vendors of, 470
Signaling route set, 226
Signals:
  compression/exponentiation of, 104
  digitizing, 102
  reconstruction of, 120
Signal-to-noise ratio (SNR):
  for ADPCM, 123
  for APCM, 106
  for telephony, 103
Signal units (SUs), 225
Sigtran protocols, 221–229
  components of, 222
  documents relating to, 222
  functions of, 222
Silence compression algorithms, 128, 164
Silence suppression, 146–148
Slamming, 395–396
Soft-state protocols, 299–300
Softswitches, 189
  functions of, 220
Source Descriptions (SDESs), 51
Source domain, 316
Source IP Address, 31
Source Port Number, 239
Speaker models, 67–72
  effects of, 88–90
Specifications, size versus efficacy, 342
Speech:
  analysis of, 101
  coding of:
    for multimedia applications, 127
    quality of, 130
    standards for, 126
  decoding, 130–131
  filtering for telephony, 102–103
  filters for, 107–108
  synthesis of, 101
  voiced versus unvoiced, 102
Speech events, 66–67
SRV records, 393
Standardization, 134–135
State Department, 382
Static routing, 28
Status, 232
Stream Control Transmission Protocol (SCTP), 52–54, 230–276
  architectural view of, 230–231
  association initialization in, 260–262
  association state diagram for, 258–260
  association termination in, 273–276
  and CCSS7, 223–226
  chunk definitions in, 242–258
  chunk field descriptions, 240–242
  common header field descriptions, 239
  functional view of, 231–236
  multihomed endpoints in, 268–269
  packet format in, 239–258
  primitives of, 232–234
  services offered by, 53, 224, 226, 230–231
  terminology of, 237–238
Stream Identifier, 243, 269
Streaming, 166
Streams, sequenced delivery within, 231–233
Stream-Sequence Number, 244, 270
Subnet masks, 32
Subnetworks, 32
  in CIOA, 37, 38
Subtract, 221
Suggested Cookie Lifespan Increment, 247
Supported Address Types, 248
Switches:
  interfaces used by, 407–408
  numbers assigned to, 408–409
  optimizing, 168
Synchronization sources (SSRCs), 50
Synthesis modules, 102
Tail drop, 321, 325
Talkspurt, 67
T bit, 254, 258
TCP/IP, 24–26, 157
  applications for, 25
  layers in, 158
TEARDOWN messages, 302
Telecommuters, technologies for, 6
Teleconferencing, technology needed for, 129
Telephone calls, routing of, 386, 392, 407–411
Telephone number mapping (ENUM), 218–219, 377–425
  administration of, 396–397
  advantages of, 378, 384, 389, 390
  applications for, 378, 387–391
  architecture criteria for, 380
  call routing in, 386, 392
  cancellation of service in, 394
  cost of, 395
  criteria for use of, 385, 390
  domain used by, 385–386, 391
  emergency numbers in, 393
  future of, 390
  and NP, 394
  operation of, 384–386, 390–392
  and private numbering plans, 395
  protocol used by, 393
  and SIP, 388–389, 393, 470–471
  and slamming, 395–396
  subscriber rights and, 395
  things it does not do, 379, 380–381, 384–385, 392
  use of DNS in, 392
  user control over, 395
  and VOIP, 384, 393
  wrong numbers in, 393
Telephone numbers:
  authentication of, 394–395
  conservation of, 411–414
  conversion to Internet addresses, 379, 385, 396
  under E.164, 379
  geographic versus nongeographic, 399, 400
  porting of, 394 (see also Number portability)
Telephones, Internet, 18, 164–165
  advantages of, 6
Telephony:
  cable, 458–465
  cellular/PCS, spectrum used by, 406
  consumer cost of, 447
  future of, 433
  international, revenue breakdown for, 466–468
  Internet-based:
    packaging/pricing of, 178, 180
    services offered by, 154
  using IP, 389, 414–415, 432
    questions about, 468
    regulations regarding, 469
  mobile, 188, 447
  SNR for, 103
  speech filtering for, 102–103
  tariffs on, 178
Telephony Routing Information Protocol (TRIP), 415
Telephony routing over IP (TRIP), 219
Terminals, 191–193
Terminations, 221
Terms of reference (ToR), 134
TE tunnels, 371–372
  labels for, 371
  repair of, 372
  and scalability, 371
Third-party monitors, 49
Throughput, 280
Time to Live, 31
Token buckets, 294
  parameters of, 334
Tone/transition detection, 120–121
Total Length, 31
Traffic:
  classes of, 332
  conditioning of, 314, 315–318
  controls for, 299
  descriptors for, 336
  factors affecting, 339–340
  management of, 283, 291–294
    in ATM, 54–55
    in diffserv, 307–309
    in IP, 329
  models of, for voice, 66–73
  non-congestion-controlled, 35–36
  profiles for, 315
  scheduling of, 337–338
  shaping of, 293, 336–337
  volume of, for voice versus data, 377, 431, 434
Traffic engineering (TE), 280, 339–340
  and MPLS, 340, 352–353, 361
  and RSVP, 357
  and voice, 371
Transaction Capabilities Application Part (TCAP), 188–189
Transfer syntax definitions, 375
Translators, 50
Transmission Control Protocol (TCP):
  functions of, 24–25
  limitations of, 230
  use by LDP, 45, 358
Transmission sequence number (TSN), 234, 243
  duplicate, 252
  reporting gaps in, 270–271
Transport addresses, 50
Transport layer, 24, 158
Transport mechanisms, reliable versus unreliable, 148
Tspec, 298
Type of Service, 30
U bit, 242–243
Uniform resource identifiers (URIs), fetching with E.164 numbers, 418–420
Uniform resource locators (URLs), 384
User agent, 209, 211
User agent client (UAC), 209
User agent server (UAS), 209
User data:
  fragmentation of, 233, 239
  transfer of, 262–273
User Data, 244
User Datagram Protocol (UDP):
  functions of, 25
  in LDP, 358
User location, 208
User-Network Interface (UNI) signaling, 55
Vector quantization (VQ), 111
Vendors, 470
  of VOIP, 12, 177, 469–473
Verification Tag, 239
VERS, 30
Video:
  access to, 3
  coding of, 145–146
    standards for, 150
  requirements of, 327, 460
  router support for, 32–33
Virtual output queuing (VOQ), 337
Virtual private networks (VPNs), support in MPLS, 361
Vocoding, 15, 65, 107–111
  advantages of, 108
  applications for, 145–150
  attributes of, 128–130
  bit rates of, 128
  complexity of, 129–130
  computing power required for, 125–126
  data rates of, 108
  delay in, 129
  high-bit-rate, 128
  low-bit-rate, 125–151
  LPAS, 130–132
  packet length for, 92
  parametric, 108–109
  quality of, 130
  selecting technology for, 129–130, 151
  standards relating to, 125, 126–127, 134–135
  versus waveform coding, 103
Voice:
  access to, 3
  allowable delay for, 327–329
  committing network resources to, 371
  compression of, 8, 15, 164
  and congestion, 370
  cost of, 172
  digital:
    benefits of, 63
    network construction for, 64
  and MPLS, 343–376
  over Frame Relay, 164
  over Internet, market for, 177
  quality of, and QoS, 329–330
  requirements of, 327, 460
  revenues generated by, 377, 431, 433, 434
  router support for, 32–33
  technologies for, 101–124
  traffic engineering for, 371
  traffic models for, 66–73
  traffic volume of, versus data, 377, 431, 434
  user tolerance for degradation of, 64, 74, 75, 162, 164
  wireless, 451
Voicemail, and ENUM, 388
Voice over ATM (VoATM), 153, 162–163
  documents relating to, 163
Voice over broadband (cable), growth of, 458
Voice over data networks, 3, 162–167
  requirements for, 168–169
  standards for, 11
Voice over IP (VOIP), 153–181, 164–165
  advantages of, 22, 468
  applications for, 10, 13–14, 153, 154, 456–458
    developing, 429–431, 439, 447
    enterprise, 465–466
  approaches to, 14–18
  bandwidth utilization of, 164
  over cable, system availability of, 465
  carriers and, 7
  challenges of, 13
  compression used with, 164
  consumer cost of, 450
  delay associated with, 457–458
    allowable, 162
  deployment criteria for, 439, 447–450
  and ENUM, 384, 393
  equipment used for, 154
  evolution of, 6–7
  factors impeding, 183, 279, 344–345, 428–429
  functioning of, 327
  future of, 436, 440–441, 449
  and 3G, 458
  goals of, 391
  growth of, 427–428
  header compression in, 471
  improvements in, 389
  interface cards for, 172–174
  international opportunities for, 466–469
  ITU-T opinions on, 469
  market for, 177
  market penetration of, 438–439
  maturity of, 441–442
  migration to, 443
  over MPLS (VOIPOMPLS), 372–374
    versus VOMPLS, 374
  negative drivers for, 12–14
  positive drivers for, 6–12, 433–434
  pricing for, 7, 18
  protocol stack for, 22–24
  quality of, 164–165
  regulatory issues regarding, 177–180
  requirements for, 22, 126
  revenue generated by, 11, 427–428, 438
  scalability and, 440
  services offered by AT&T, 472–473
  signaling and, 183–277
  standards for, 12
  threat from WLANs, 458
  tools for supporting, 283
  types of, 177
  vendors of, 12, 177, 469–473
  versus VOMPLS, 372
Voice over MPLS (VOMPLS), 344, 349
  factors impeding, 360
  versus VOIP, 372
  versus VOIPOMPLS, 374
Voice over packet (VOP), factors impeding, 344–345
Voice Protocol for Internet Mail (VPIM), goals of, 391
VOIP gateways, 172–175
  advantages of, 172
  desirable features of, 175
  operation of, 174–175
VOIP over wireless (VOIPOW), 451–452, 457–458
  advantages of, 457
Waveform coding, 65, 102–106
  versus vocoding, 103
Web telephones, 164–165
Weighted Fair Queuing (WFQ), 322, 323
Weighted Random Early Detection (WRED), 293, 321–322, 339
Weighted Round Robin (WRR), 322
Wireless Ethernet Compatibility Alliance (WECA), 452–453
Wireless local area networks (WLANs):
  coexistence with WPANs, 455–456
  as VOIP competitor, 458
Wireless metropolitan area networks (WMANs), 458
Wireless networks, 450–458
  access points in, 452
  advantages of, 457
  physical implementations of, 453
  standards relating to, 453–456
Wireless personal area networks (WPANs), 453–456
  coexistence with WLANs, 455–456
  standards relating to, 455
Wireless services:
  future of, 451–453
  growth of, 433, 451
  routing of calls on, 407–408, 411
WRED drop, 326