Implementing Mobile TV
ATSC Mobile DTV, MediaFLO, DVB-H/SH, DMB, WiMAX, 3G Systems, and Rich Media Applications
Amitabh Kumar
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Focal Press is an imprint of Elsevier
Focal Press is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, UK
© 2010 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
All information presented in this book is based on the best efforts by the author and is believed to be accurate at the time of writing. It should be recognized that Mobile TV is still an emerging technology and many facets of the technology including standards, regulatory treatment, spectrum and applications may undergo changes. The author or the publisher make no warranty of any kind, expressed or implied with regard to the accuracy or completeness of the information contained, documentation or intended uses of any product or service described herein. The author or publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing or use of this information in any manner whatsoever.
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Application submitted
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 978-0-240-81287-8
For information on all Focal Press publications visit our website at www.elsevierdirect.com
Printed in the United States of America
This book is dedicated to my father.
I hope that posterity will judge me kindly, not only as to the things which I have explained, but also as to those which I have intentionally omitted so as to leave to others the pleasure of discovery.
René Descartes, La Geometrie (1637)
Contents
The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents.
H. P. Lovecraft, The Call of Cthulhu (1926)
Mobile TV-A Prologue
Introduction to the Second Edition
Part I Overview of Technologies
Chapter 1: About Mobile TV
1.1 The Beginning
1.2 Mobile TV: A New Reality
1.3 What Else is Different in Mobile TV?
1.4 Standards for Mobile TV
1.5 New Growth Areas with Mobile TV
1.6 What Type of Opportunity Does Mobile TV Present?
1.7 What Handset Types Does Mobile TV Work On?
1.8 Is Mobile TV Really Important?
Chapter 2: Introduction to Digital Multimedia
2.1 Introduction
2.2 Picture
2.3 Image Compression
2.4 Video
2.5 Analog TV Signal Formats
2.6 Digital TV Formats
2.7 Video Bit Rate Reduction
2.8 Compression Standards
2.9 The AVS-M Video Coding Standard (China)
2.10 Video Files
2.11 File Containers and Wrappers
2.12 Audio Coding
2.13 Audio Compression
2.14 Streaming
2.15 Streaming Players and Servers
2.16 Summary and File Formats
Chapter 3: Introduction to Streaming and Mobile Multimedia
3.1 What is Mobile Multimedia?
3.2 How Do Mobile Devices Access Multimedia?
3.3 File Formats for Mobile Multimedia
3.4 3GPP Mobile Media Formats
3.5 Internet Video
3.6 Flash Lite™
3.7 DivX Mobile
3.8 Rich Media–Synchronized Multimedia Integration Language (SMIL)
3.9 Delivering Multimedia Content
3.10 Graphics and Animations in the Mobile Environment
3.11 Mobile Multimedia Applications
3.12 Summary of File Formats Used in Mobile Multimedia
Chapter 4: Overview of Cellular Mobile Networks
4.1 Introduction
4.2 Cellular Mobile Services: A Brief History
4.3 CDMA Technologies
4.4 3G Networks
4.5 3G Technologies: CDMA and GSM
4.6 4G Technologies
4.7 Data and Multimedia Over Mobile Networks
4.8 Multimedia and Data Over 3G Networks
4.9 Mobile Networks: A Few Country-Specific Examples
Chapter 5: Overview of Technologies for Mobile TV
5.1 Why New Technologies for Mobile TV?
5.2 What Does a Mobile TV Service Require?
5.3 Mobile TV Using 3G Technologies
5.4 Terrestrial TV Technology Overview
5.5 Mobile TV Using Terrestrial Broadcasting Networks
5.6 Comparison of Mobile TV Services
5.7 Outlook for Mobile TV Services
Part II Technologies for Mobile TV and Multimedia Broadcasting
Chapter 6: Mobile TV Using 3G Technologies
6.1 Introduction
6.2 The Beginning: Streaming on Mobile Devices
6.3 Overview of Cellular Network Capabilities for Carrying Mobile TV
6.4 Understanding a 3G Streaming Service
6.5 Mobile TV Streaming Using 3GPP Standards: Packet-Switched Streaming Service
6.6 Broadcasting to 3GPP Networks
6.7 Examples of Streaming Platforms
6.8 Practical Implementation of Video Services over 3G Networks
6.9 Operator-Specific Issues in 3GPP Streaming Services
6.10 Multimedia Broadcast and Multicast Service (MBMS)
6.11 Mobile TV Services Based on CDMA Networks
6.12 Other Multimedia Services over 3G Networks
6.13 Wi-Fi Mobile TV Delivery Extensions
Chapter 7: Mobile TV Services in the ATSC Framework
7.1 Introduction: Digital Broadcasting to Handhelds and Mobile Devices
7.2 Why ATSC Mobile DTV?
7.3 The Open Mobile Video Coalition (OMVC)
7.4 Technology of ATSC Mobile DTV
7.5 The ATSC Mobile DTV Standard
7.6 ATSC Frame Structure with Mobile Channels
7.7 Content Types, Encoding, and Capacity
7.8 Multiplexing of M/H Channels
7.9 Upgrading Transmitters for Mobile Services
7.10 ATSC Mobile DTV Transmission
7.11 ATSC Transmitter Networks
7.12 Receivers and Handheld Units
7.13 Data Transmission on ATSC Mobile DTV
7.14 Electronic Service Guide (ESG)
7.15 ATSC Mobile DTV Pilot Projects and Commercial Launches
7.16 Example of an ATSC Mobile DTV Transmission System for Mobile TV
Chapter 8: Mobile TV Using DVB-H Technologies
8.1 Introduction: Digital Video Broadcasting to Handhelds
8.2 Why DVB-H?
8.3 How Does DVB-H Work?
8.4 Technology of DVB-H
8.5 DVB-H Higher Layer Protocols
8.6 Network Architecture
8.7 DVB-H Transmission
8.8 Transmitter Networks
8.9 Terminals and Handheld Units
8.10 DVB-H Implementation Profiles
8.11 Electronic Service Guide in DVB-H
8.12 Content Security
8.13 DVB-H Commercial Services
8.14 Example of a DVB-H Transmission System for Mobile TV
Chapter 9: Mobile TV Using DVB-SH Technologies
9.1 Satellite Mobile TV with a Terrestrial Component
9.2 The DVB-SH Standard
9.3 Characteristics of Satellites for Mobile Broadcasting
9.4 Ground Transmitters for DVB-SH
9.5 Receiver Characteristics
9.6 The ICO DVB-SH System (MIM)
9.7 DVB-SH System for Europe
9.8 Future Systems Using DVB-SH Technology
Chapter 10: DMB and China Multimedia Mobile Broadcasting (CMMB)
10.1 Introduction to DMB Services
10.2 A Brief Overview of DAB Services
10.3 How is the DAB Structure Modified for DMB Services?
10.4 Satellite and Terrestrial DMB Services
10.5 DMB Services in Korea
10.6 DMB Services Ground Segment
10.7 S-DMB System Specifications
10.8 DMB Trials and Service Launches
10.9 China Multimedia Mobile Broadcasting (CMMB)
10.10 The DTMB Standard
Chapter 11: Mobile TV Using MediaFLO™ Technology
11.1 Introduction to MediaFLO
11.2 How Does MediaFLO Work?
11.3 MediaFLO Technology Overview
11.4 System Capacities and Content Types
11.5 MediaFLO Transmission
11.6 MediaFLO Transmitter Networks
11.7 Terminals and Handheld Units
11.8 MediaFLO Electronic Service Guide
11.9 MediaFLO Commercial Networks
11.10 Example of a MediaFLO System for Mobile TV: Verizon Wireless
Chapter 12: Mobile TV Using WiMAX
12.1 A Brief Overview of WiMAX Technology
12.2 Why is Mobile WiMAX Suited for Mobile TV?
12.3 WiMAX-Based Mobile TV Basics
12.4 WiMAX Devices and Handsets
12.5 Examples of Mobile TV Services Based on WiMAX
Chapter 13: Spectrum for Mobile TV Services
13.1 Introduction
13.2 An Overview of Spectrum Bands
13.3 Mobile TV Spectrum
13.4 Country-Specific Allocation and Policies
13.5 Spectrum for MediaFLO Services
13.6 Spectrum Allocation for Wireless Broadband Services
Part III Multimedia Handsets and Related Technologies
Chapter 14: Chipsets for Mobile TV and Multimedia Applications
14.1 Introduction: Multimedia Mobile Phone Functionalities
14.2 Functional Requirements of Mobile TV Chipsets
14.3 Chipsets and Reference Designs
14.4 Chipsets for ATSC Mobile DTV
14.5 Chipsets for 3G Mobile TV
14.6 Chipsets for DVB-H Technologies
14.7 Eureka 147 DAB Chipset
14.8 Chipsets for DMB Technologies
14.9 Industry Trends
14.10 Outlook for Advanced Chipsets
Chapter 15: Operating Systems and Software for Mobile TV and Multimedia Phones
15.1 Do I Need to Worry About the Software Structure on Mobile Phones?
15.2 Application Clients
15.3 An Introduction to the Software Structure on Mobile Phones
15.4 Common Operating Systems for Mobile Devices
15.5 Middleware in Mobile Phones
15.6 Application Software Functionalities for Mobile Multimedia
15.7 Applications for Mobile Phones
Chapter 16: Handsets for Mobile TV and Multimedia Services
16.1 Introduction: Do You Have a Target Audience Out There?
16.2 Mobile Receiver Devices
16.3 Handset Features for a Rich Multimedia Experience
16.4 Handsets for 3G Services
16.5 Handsets for Terrestrial Broadcast Services
16.6 Handsets for Satellite Technologies with a Terrestrial Component
16.7 Handsets for CMMB
16.8 Phones for WiMAX and WiBro Technologies
16.9 Portable Navigation Devices (PNDs)
16.10 Can Handsets Be Upgraded with the Latest Technology?
16.11 Summary
Chapter 17: Mobile TV and Multimedia Services Interoperability
17.1 Introduction
17.2 Organizations for Advancement of Interoperability in Mobile TV
17.3 Interoperability in Mobile TV
17.4 Interoperability in Terrestrial Mobile TV Networks
17.5 Interoperability in 3G-Based Mobile TV Services
17.6 Interoperability in Mobile TV Provided via the Internet: IP Networks
17.7 Interoperability of Multimedia Services
17.8 Summary
Part IV Content and Services on Mobile TV and Multimedia Networks
Chapter 18: Mobile TV and Multimedia Services Worldwide
18.1 Introduction
18.2 China
18.3 Japan
18.4 Germany
18.5 Italy
18.6 Netherlands
18.7 The United States
18.8 Hong Kong
18.9 India
18.10 Summary
Chapter 19: Content and Revenue Models for Mobile TV
19.1 Introduction
19.2 Mobile TV Content
19.3 Interactive Services
19.4 Delivery Platforms
19.5 Preparing Content for Mobile Delivery
19.6 Content Authoring Tools
19.7 Mobile TV as a Business Proposition
19.8 Summary: Focus on Content Development and Delivery Platforms
Chapter 20: Interactivity and Mobile TV
20.1 Introduction: Why Interactivity in Broadcast Mobile TV?
20.2 Making Mobile TV Interactive
20.3 3G Networks
20.4 Broadcast Networks and Interactivity
20.5 Summary
Chapter 21: Content Security for Mobile TV
21.1 Introduction: Pay TV Content Security
21.2 Security in Mobile Broadcast Networks
21.3 Conditional Access Systems for Mobile TV
21.4 Examples of Mobile CA Systems
21.5 Digital Rights Management (DRM) and OMA
21.6 Content Security and Mobile TV Standards
21.7 Multimedia Applications: High-Capacity SIMs and Removable Media
21.8 Examples of Mobile Broadcast Content Security
21.9 Models for Selection of Content Security
Chapter 22: Mobile TV: The Future
22.1 Some Initial Happenings in the Industry
22.2 Where Does Mobile TV Stand Today?
22.3 Challenges for Mobile TV and Multimedia Services in the Future
22.4 Leading Indicators for Growth in Mobile TV Services
22.5 Summary
Glossary
Index
Mobile TV-A Prologue
The economists are generally right in their predictions, but generally a good deal out in their dates.
Sidney Webb, The Observer, Sayings of the Week, February 25, 1924
When mobile TV was first launched in 2005, it was perceived as one of the most important developments that would shape the mobile industry in the coming years. But events were to prove otherwise, to the disappointment and, to an extent, the surprise of a very large industry. In fact, the situation in 2008 was such that many virtually wrote off mobile TV. It was only in 2009 that a dramatic turnaround in fortunes began, with mobile TV in 2010 set to reach a critical mass for a very large ecosystem of viewers, operators, handset and chip manufacturers, and software developers.
The reasons, in hindsight, are not difficult to understand, and the industry did struggle valiantly to overcome them. The problem was that there were too many issues. First was the issue of mobile operators and broadcasters going different ways in leveraging their own networks to provide mobile TV. This led to the use of 3G unicast streaming by mobile operators and terrestrial transmission by the broadcasters, based on replication of TV programs with little or no interactivity and only a handful of receivers available that could actually receive them. Second was the use of multiple standards, which split networks even within the same country, as was the case in Germany with DVB-H and DMB networks, both of which eventually closed down. In addition, the regulators were not helpful with spectrum issues, which held up launches in large parts of Europe and Asia. Third, the operators did not seem to get the model right. They attempted to offer the service as pay TV, which restricted the market and the handsets available. This is evident from the success of the free-to-air T-DMB services in Korea and ISDB-T in Japan. Korea had over 20 million users of its free T-DMB service, while Japan had over 60 million phones sold that had tuners for its 1-Seg ISDB-T services, which are aired free. A majority of multimedia handsets in these markets now come with the mobile TV tuners and decoders built in.
In contrast, the users of pay mobile TV in any market did not reach even a fraction of this number. The only exception was 3G-based services such as MobiTV (over 6 million customers), which do not need special handsets. However, even these networks did not make a breakthrough, as operators in most markets levied high data usage charges for bandwidth that was at a premium.
The 3G quality was also restricted for various reasons, such as low encoding resolution, the usage environment, and the limitations of unicast streaming. It was not a surprise that the initial years left bruised operators and foreclosed networks, even while the major product vendors touted successful trials in each country.
In the United States, the broadcast systems based on ATSC DTV had no mobile extension until as late as 2009. The initial DVB-H services launched by Modeo and Hi-Wire were closed down, as it was impractical to build entirely new infrastructure. MediaFLO, which operated on its own spectrum and provided services through AT&T and Verizon Wireless, also garnered fewer than half a million users in the first year after its launch, due to the requirement of a separate FLO-enabled handset and the availability of the service in limited markets only. The situation changed only in 2009, when additional spectrum became available after the digital transition. The success story of AT&T was being written with the iPhone, a device that did not support mobile TV. Mobile TV was not a priority for the major operators: AT&T, Verizon Wireless, or T-Mobile.
In Europe, the European Union (EU) took the bold step of declaring DVB-H the standard to be followed across Europe. Despite this apparent advantage, mobile TV continued to face heavy challenges. DVB-H met the same fate in Germany as in the United States, where Operator ‘3’ returned the DVB-H license to the regulator. In the United Kingdom, no spectrum was made available for DVB-H, while in France and Spain, commercial launches were delayed. With the exception of Italy, the pioneer of mobile TV in Europe, no country could get even a million users, with their pay mobile TV offerings requiring special handsets and conditional access systems. The users could opt for either a substandard phone that offered mobile TV or one that burnt a hole in their pockets.
The phones used by large segments of customers remained outside the domain addressed by the mobile operators. The model of set-top boxes as applied to mobile TV was not working. In Asia, China and India were delayed in their regulatory processes, which would have enabled the provision of mobile TV to large communities. Smaller countries did launch mobile TV, but these launches were prodded by the vendors and looked more like “me-too” efforts than successful mobile TV offerings. China came out of the time warp only in 2009, with SARFT driving terrestrial mobile TV with the CMMB standard. In order to address the split markets, new operators ventured forth with satellite-based mobile TV. In 2008, this appeared to be a panacea for all the ills of mobile TV. China, going into the 2008 Olympics, had signed a deal with CMBsat, a subsidiary of EchoStar, for a high-powered S-band satellite providing services over China. However, its regulators failed to give the necessary permissions for the satellite to be placed in orbit. On April 18, 2008, the ICO G1 satellite was launched and was all set to provide mobile TV services for the U.S. market. In January 2009, the W2A satellite was launched for providing high-powered DVB-SH mobile TV services for Europe by Solaris after it won the license. However, all was to go
wrong with this industry as early as 2009. The CMBsat satellite was delayed, while the W2A mobile broadcasting payload failed after its launch in early 2009. By May 2009, ICO North America had filed for bankruptcy under Chapter 11, despite having an operational satellite in orbit and an operational network on the ground. The successes of Japan and Korea again did not appear to work elsewhere. The quest for business models was unending. No single model, such as subscription, advertising, or sponsored content, seemed to work, as there were too few handsets except in Korea and Japan. Mobile networks did embrace multimedia, but in ways that were not predicted by analysts and research reporters. Mobile devices came with such large memories (upwards of 16 GB) that a connection to online music services was unnecessary. On-device storage of videos and music became the norm. Where video was concerned, it was YouTube and Google Video that emerged as the winners, apart from social networking sites. But in an industry with more than 4 billion mobile users, the initial fallacies in embarking on mobile TV were quickly understood. ATSC has now come out with its mobile handheld standard, ATSC Mobile DTV (formerly ATSC M/H), which can enable thousands of transmitters across the United States to also broadcast simultaneously to mobile phones at a relatively low cost. Despite apparently different mobile TV standards, the underlying technologies have converged to a set of uniform standards, such as IP-Datacasting (IPDC), the Open Mobile Alliance’s Electronic Service Guide (ESG), smartcard profiles (SCP) for content protection, and multistandard universal chipsets that can tune in to any type of transmission. After a dawn-to-dusk cycle, the sun is again rising on the horizon for mobile TV, and with renewed intensity. The use of video content on mobile phones is entering a new phase, with customers increasingly wanting video access on their mobile phones.
The number of 3G users has ballooned, as have the smartphones needed for multimedia. Equipment vendors now make multistandard transmission equipment as well as receivers, making the diverse standards not such a major issue at the end of the day. Spectrum has begun to be available after WRC 07 and the digital transition, which was completed in 2009. The launch of CMMB in China has led to a massive uptake of mobile TV. According to an In-Stat report on China¹ released in 2006, the number of mobile TV users in China was predicted to grow at a compound annual growth rate of over 315% in the next five years. It is now estimated that by 2012, more than 20% of users will be using mobile TV. The scales will be tilted by the increasing use of free-to-air broadcast networks, including ATSC Mobile DTV in the United States, and the spread of mobile TV to user communities in China and India. There are likely to be four major streams for the growth of mobile TV. The first will continue to be the mobile operators, where improved quality will be offered through the upgrades to
¹ Mobile TV in China, Anty Zheng, Research Director, In-Stat China (http://www.instat.com.cn/index.php/archives/672)
3GPP standards and the use of MBMS. These operators will also embrace LTE by 2012. The second stream remains that of broadcasters, which are scaling up the operations as spectrum and standards issues get resolved. The third stream is that of wireless broadband (including mobile WiMAX, a technology that has weathered many a storm and is now here to stay, with more than 500,000 users being added per quarter) and broadband for all plans on the horizon in the United States. The fourth category of providers is that of satellite-based mobile TV providers with a terrestrial component. This book is a second journey into the exciting world of mobile TV and multimedia, with new operators, technologies, and business models.
Introduction to the Second Edition
The trouble with doing something right the first time is that nobody appreciates how difficult it was.
Walt West
This book is exclusively dedicated to mobile TV, which is the killer application of the twenty-first century, riding on the success of 3G mobile networks, the transition to digital TV, and wireless broadband. A lot has changed since mobile TV initially appeared in 2005. 3G networks have achieved a critical mass of over 500 million users. There have been breakthroughs in terrestrial broadcasting of mobile TV across countries, addressing potentially a billion additional users in 2010 alone. Today it presents an opportunity that is unparalleled in history. This is an opportunity for service providers, content producers, application developers, handset vendors, and users alike to target high revenue generating applications. This revised edition is about the new opportunity. It provides a comprehensive overview of the entire landscape, answers all your questions, and provides all the tools you need to be a meaningful player in the new markets.
About This Book
Even though mobile TV is slated to grow exponentially in the very near future, concise information on the subject remains scattered. It is true that many of the technologies have only recently emerged from trials, but the basic bedrock of the structure on which such services will be based is now firmly in place. Not a single week passes today without the announcement of a new commercial launch of mobile TV somewhere in the world. The standards for the services have the status of recommendations of ATSC, DVB, ETSI, ITU, and the 3G Partnership Projects. The implementation is swift and multifronted, in the form of technology itself as well as every other form: handsets, applications, chipsets, software, operating systems, spectrum, transmission technologies, and even content writing for mobile TV. The book provides a comprehensive introduction to the technological framework in which such services are being provided, with extensive clarity on how one type of service, for example, a mobile TV service based on 3G (MobiTV™, AT&T®), differs from the DMB service in Korea, CMMB in China, or ISDB-T in Japan. Will it be possible to use one handset for
all these services? What types of services can be expected on mobile networks? What are the techniques used for digital rights management on these networks? What spectrum will they use? What limitations do they have? What quality of viewing can they offer? What type of content will make such networks work and how will it make money? Mobile multimedia has brought about a profound change in the industry. The handsets are now designed to deliver multimedia rather than voice. They support large, 3-inch WVGA screens, stereo speakers, A2DP Bluetooth, media players, and 16 GB flash memories. Their software is empowered to deliver content tailored for cellphones or mobiles with rich animations. It is a different world, carrying with it smaller screens, and requiring lower data rates to carry the information, but in a much more challenging delivery environment. It deals with media formats that are unique to the mobile domain. It deals with players that are for mobiles and with browsers that are unique to the mobile world. It also deals with technologies that not only deliver content but also provide mechanisms for its payment and user interactivity. The growth of mobile TV brings challenges for everyone. The users now have a very powerful device in their hands that can do much more than connect calls or play music. Are they ready to use such services? The operators are aggressively launching services. Are the content providers ready for them? Is the content secure? What type of advertising will work on such networks? What are the technology options for operators and service providers and customers? Are the regulatory authorities ready to enable the environment for mobile TV? What spectrum will be available for such services? What are the limitations for services based on each individual technology? The book addresses all these questions.
About the Second Edition
The technology and markets for mobile TV have changed dramatically in the very recent past. In July 2009, the ATSC Mobile DTV transmitters went on the air, signifying a new era in the United States, where most local stations will have a mobile simulcast based on the newly recognized ATSC Mobile DTV standards. CMMB, a mobile TV standard for China, had spread to about 200 cities by the end of 2009, and 3G is now enabled in China and India. MediaFLO technology has gained a new lease on life with additional spectrum having been released in the United States with the DTV transition and its recognition as an approved technology for mobile TV in Japan, the largest mobile TV market in the world and a bastion of ISDB technologies. This revised second edition is a completely rewritten volume that updates technologies, services, and media formats and presents all information in a practical framework. Four new chapters have been added on ATSC Mobile DTV, MediaFLO technologies, WiMAX, and DVB-SH, while information on others such as CMMB has also been added in detail. The book is divided into four parts:
Part I: Overview of Technologies
Part II: Technologies for Mobile TV and Multimedia Broadcasting
Part III: Multimedia Handsets and Related Technologies
Part IV: Content and Services on Mobile TV and Multimedia Networks
Part I begins by laying down the fundamentals that go into the mobile multimedia networks, such as those that deliver mobile TV. Though digital multimedia is discussed in brief, the key focus is on mobile multimedia. Part I also gives an overview of mobile networks worldwide as well as an overview of technologies for mobile TV. The need to carry mobile TV and rich media applications has led to 3G networks evolving rapidly in order to add higher data carrying capabilities with HSDPA, EV-DO, and LTE. This book seeks to piece together the technologies of video, audio, data, and networks that make mobile TV possible and presents an integrated view of the interfaces, services, and applications that will be at the forefront of mobile TV developments in the coming years. These are discussed in two chapters, “Overview of mobile networks” (Chapter 4) and “Overview of technologies for mobile TV” (Chapter 5). In Part II, the book discusses each of the mobile TV technologies, including those based on 3G, ATSC Mobile DTV, MediaFLO, DMB and CMMB, DVB-H, and WiMAX in detail, with one chapter devoted to each service. The technology-specific chapters dwell on all aspects of the services, ranging from standards, protocols, transmission, ESG, and broadcast characteristics to examples of networks where these are implemented. The rollout of mobile TV is also closely linked to the availability of spectrum as a resource. One chapter (Chapter 13) is devoted to spectrum for mobile TV services and the manner of rollout in various countries. This chapter presents the information in a holistic manner, including the impacts of the digital dividend after the digital transition and the WRC 07 harmonized allocations. Interoperability issues between networks and roaming have proved to be very important in the past, and will be more so in the future.
Interoperability for mobile TV and multimedia networks is discussed in a separate chapter (Chapter 17). Mobile TV has spawned many new industries and fast-paced developments are happening in operating systems for mobile devices, application software, chipsets, and the handsets themselves. The industry is aware that the past growth has been possible due to increasing volumes and continuously lowering prices. The revenues that can be derived from the networks will depend on understanding the optimum multimedia formats and delivery modes, smartphones, feature phones available in the market, and how they can be addressed. The new handsets and user devices present in all cases frontline developments in each area of technology ranging from satellite or terrestrial tuners to multimode devices such as portable navigation devices or personal media players. Part III of the book is exclusively dedicated to presenting the new devices and what drives them. We discuss the chipsets, operating systems, and handsets for multimedia in Chapters 14, 15, and 16.
Finally, Part IV of the book, devoted to content, presents a series of interlinked chapters on content types that can be delivered along with their preparation tools, user interactivity, and content security. Although mobile TV will undoubtedly have its share of live TV channels, a host of new content best suited for viewing on the small screens is already appearing and will be the key to the usage and growth of mobile TV services. The mobile environment needs content that is specifically designed for it and compelling to watch. The content for mobile TV, already a specialized business, will be more so in the coming years. Along with the content, the delivery platforms for such content are equally important. This book discusses the emerging trends and prerequisites in this regard. Mobile networks have emerged as important vehicles for delivery of content. However, such delivery of content needs to be secure, and the license holders need to be able to exercise rights on how the content is used. Content security technologies common across the industry, such as OMA BCAST and smartcard profiles, are discussed.
Intended Audience
The book is primarily intended to give a coherent view of the world of mobile TV and multimedia applications on mobile networks. It offers an insight into the maze of technologies, processes, and dimensions involved in providing mobile TV services. The book, while technical, does not contain any formulae or mathematical calculations that go into the design of networks. It has been planned in a manner to benefit all those in the mobile industry, such as professionals, engineers, and managers, as well as students and the academic community. The mobile industry directly or indirectly comes into contact with every individual, and extensive work is being done to further the capabilities of the networks. The book is intended to help all those who are in any manner connected with mobile networks and multimedia, as they need to get a complete picture of what is happening in the field and how they can be a part of the momentum. It helps users, content providers, and operators, as well as those who are planning such services, understand the dimensions of the new medium, which is the best possible integration of communication, broadcasting, and multimedia technologies. The understanding of the basic technologies and all related developments in the field prepares the ground for an easy introduction to the complex world of mobile TV, which will be essential for success in the coming years.
How to Read This Book
Any of the four parts of the book can be read independently, with the other parts being used as a reference to the technologies or networks in use. However, as mobile TV and multimedia networks are characterized by their own file formats, encoding technologies, and content delivery mechanisms, it is useful to read through the book in sequence if time permits. Readers will find some repetition in the content in some chapters, which was
necessary to present the matter in a self-contained format without excessive referrals to other sections or chapters.
Acknowledgments
The information in a book of this nature is based on the work of numerous standards bodies, industry organizations, and operators who have deployed the technology in their networks. These include the OMA, DVB, ATSC, ETSI, ITU, 3GPP, CDG, GSMA, and many others. I would like to thank Paul Temme, Senior Acquisitions Editor at Focal Press, who not only encouraged me to write this extensively revised edition but also provided valuable suggestions. I would also like to thank Anais Wheeler, who managed the production of the book in the most friendly and efficient manner. Finally, I would also like to thank the many readers who provided valuable input after the first edition, which makes the second much more practical and aligned to readers as well as the industry.
Amitabh Kumar
PART 1
Overview of Technologies
Do we really need another way to rot our brains? Yes, yes we do, and live TV on our phones is just the ticket.
Danny Dumas in Gadget Lab, Wired (www.wired.com/gadgetlab/2008/05/review-att-mobi/)
CHAPTER 1
About Mobile TV
Television? No good will come of this device. The word is half Greek and half Latin.
C. P. Scott, journalist (http://en.wikiquote.org/wiki/Television)
Are you one of those who are fascinated with the idea of being able to deliver content to mobile devices? Or by the new mobile Flash Player, which lets you watch amazing streaming videos from thousands of sites? Have you recorded a movie using a Handycam®, then edited it and posted it on your website? Or watched a baseball game on MobiTV or VCAST? Or are you a content producer, broadcaster, or network operator who is at the other end of the line feeding content to millions of users? Are you intrigued by P2P networks and the way they deliver video and audio? You have lots in common, then, with many others who are deep into the world of handling audio, video, and pictures on mobile networks. Join me on a practical journey into the realm of mobile TV, which has emerged as the most effective way to deliver high-quality interactive content and get paid for it. Over 6 million users are using just one of the services of mobile TV (MobiTV). Millions more are connected to other networks, some based on 3G streaming, while others are using terrestrial broadcast much like digital TV for the big screens.
© 2010 Elsevier, Inc. All rights reserved. DOI: 10.1016/B978-0-240-81287-8.00001-1
1.1 The Beginning
For the first time in the history of the Emmy awards, a new category was created for 2006: original production of content designed for the new platforms, including PCs and the mobile world (cellphones, PDAs, Palm devices, iPods® and iPhones®) and the platform of mobile TV. Seventy-four entries were received, more than in any other Emmy awards category. The entries included “24 Mobisodes” from leading Hollywood studio 20th Century Fox. In October 2006, the content industry’s biggest event, MIPCON 2006, described mobile TV as the most significant wireless trend for the mobile industry in the coming years. The excitement in the industry was not unfounded, as subsequent events revealed. Every major international event since has been telecast live on mobile TV, from the Olympics to President Obama’s election. However, a real breakthrough had still eluded the industry. It is at the turn of the year 2010 that the long-awaited breakthrough is finally in sight: the transition to 3G networks, which were nascent in 2006 but had grown to over 500 million users at the close of 2009. The first quarter of 2009 added 50 million active 3G subscribers (as reported by Maravedis®), indicating that 3G is now adding at least 200 million users a year. Subscriber numbers will double in the next two years, with countries in Europe and Asia, including India and China, enabling the 3G networks and creating a pool of nearly a billion customers with mobile multimedia devices. This now is the new audience, not counting millions of terrestrial mobile TV receivers. They are ready to receive mobile TV, multimedia, and advertising, and to generate interactive content. They will be buying more than a billion smartphones in the next three years. An equally powerful sequence of events is being staged in the field of terrestrial broadcasting.
The digital transition has finally been completed in the United States, releasing the newly auctioned spectrum to players such as MediaFLO, which has triggered their nationwide rollout. The broadcast industry also got its act together and agreed on different but regionally harmonized standards such as ATSC Mobile DTV (formerly ATSC M/H) for the United States, DVB-H for Europe (and parts of Asia), DMB for Korea, and ISDB for Japan. China was a surprise: out of multiple standards such as DMB and DTMB, a single implementation of mobile TV, CMMB, rose like a phoenix from the flames of the summer Olympics in 2008. Before 2009 was finished, over 200 cities and provincial markets were live with CMMB. Today China is the fastest-growing market for mobile TV in the world, taking even seasoned industry observers by surprise. India is next, with 400 million users waiting for 3G and a countrywide rollout of mobile TV before the sun rises on the Commonwealth Games in November 2010. There is now no single area where the focus of delivery is greater than that of mobile receivers. These receivers are not mobile phones alone. Far from it: these include standalone receivers, navigation devices, personal media players, and car receivers. The production
of content and applications for the tiny screens of mobile TVs and navigation devices has indeed unleashed the imagination of the industry with the production of short-form programs and original content designed to be effective even for the limited span of time available for viewing.
1.2 Mobile TV: A New Reality
Mobile TV is now a reality. The technology, though new, has been proven. It is now inconceivable that a major entertainment, sports, or other national or international event, or a major news story, will not be available on the mobile TV medium. Operators have started gearing up their networks for adding mobile TV services or have rolled out entirely new networks. There are over 4 billion mobile users around the globe with over 500 million smartphones capable of handling mobile multimedia. The growth in the markets is expected to be exponential, and will be aided by the falling price of handsets and better harmonization of standards. The price of chipsets for mobile TV has already fallen below $10, opening the way for advanced handsets to be inexpensively available. The price of chipsets such as mobile TV receivers is expected to fall below $5 in the next year.
1.2.1 What is Mobile TV?
Mobile TV is the transmission of TV programs or video for a wide range of wireless devices ranging from mobile TV–capable cellphones to PDAs and mobile receivers usable in every conceivable mode of transport. The programs can be transmitted in a broadcast mode to every viewer in a coverage area or be unicast so as to be delivered to a user on demand. They can also be multicast to a group of users. The broadcast transmissions can be via the terrestrial medium, just as analog or digital TV is delivered to our homes, or via high-powered satellites directly to mobile devices. The transmissions can also be delivered using the Internet as the delivery mechanism.
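The practical difference between the three delivery modes is how network load scales with audience size. The sketch below is illustrative only: the function name `delivery_load_kbps`, the 256 kbps stream rate, and the simplified multicast rule (one shared copy whenever at least one viewer has joined) are assumptions made for this example, not figures from any standard.

```python
def delivery_load_kbps(mode: str, viewers: int, stream_kbps: float) -> float:
    """Downlink capacity consumed by one program in one coverage area."""
    if viewers < 0:
        raise ValueError("viewers must be non-negative")
    if mode == "unicast":
        # One dedicated stream per viewer, delivered on demand.
        return viewers * stream_kbps
    if mode == "multicast":
        # One shared copy, sent only while the group has members.
        return stream_kbps if viewers > 0 else 0.0
    if mode == "broadcast":
        # One copy sent to the whole coverage area regardless of audience.
        return stream_kbps
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("unicast", "multicast", "broadcast"):
        for viewers in (0, 1, 1000):
            load = delivery_load_kbps(mode, viewers, stream_kbps=256.0)
            print(f"{mode:9s} viewers={viewers:5d} load={load:10.1f} kbps")
```

Under these assumptions, unicast load grows linearly with the audience, while broadcast load stays constant, which is why broadcast suits popular live channels and unicast suits on-demand viewing.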
1.2.2 How is Mobile TV Different from Ordinary Terrestrial or Satellite TV?
Mobile phones constitute an entirely different domain. The phones come with screens that are tiny in comparison to a standard TV. They have a limitation on power consumption, as preservation of the battery and talk time is of paramount importance. Every component in the cellphone is designed with features that can conserve power. The processors in cellphones, though powerful even in comparison to PCs of just a few years back, cannot be harnessed to run complicated encoding or decoding tasks or format and frame rate conversions. The cellphones are connected via 3G cellular networks, which can support high data rates for multimedia but are not designed to handle the 4–5 Mbps needed for standard-definition TV. Hence, though there are cellphones that can receive ordinary TV telecasts, they are not really ideal for such use.
Mobile TV is a technology that has been specifically designed to fit in the mobile world of limited bandwidth and power and small screens, yet add new features such as interactivity via the cellular network. Taking advantage of the small screen size, the number of pixels that need to be transmitted is reduced to roughly 1/16 of that of standard-definition TV. Digital TV today is based on the use of MPEG-2 compression, mainly because this was the best compression available in the 1990s when widespread cable- and satellite-delivered TV became common. Mobile TV uses more efficient compression algorithms, such as MPEG-4, Flash Lite, or H.264, for compressing video and audio, typically with simple visual profiles. Compressing voice efficiently has been the hallmark of cellular networks using audio coding in AMR or QCELP. In mobile TV, we need high-fidelity stereo, and the use of Advanced Audio Coding (AAC) based on MPEG-2 or MPEG-4 has become the norm. In the 3G world, which is characterized by the need to use bandwidth efficiently to accommodate thousands of users in a cell area, file formats based on cellular industry standards such as 3GPP (Third-Generation Partnership Project) are commonly used. Based on transmission conditions, cellular networks may also reduce the frame rates or render frames with fewer bytes per frame. However, reducing the bit rates needed to deliver video is not the only characteristic of mobile TV services. The broadcast technologies have been specially modified to enable the receivers to save power. The terrestrial broadcast mobile TV services, such as DVB-H or ATSC Mobile DTV, use a technique called time slicing, which allows the receiver to switch off power to the tuner for up to 90% of the time while showing uninterrupted video. The transmissions also incorporate features to overcome the highly unpredictable signal reception in mobile environments by providing robust forward error correction.
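Two of the figures above can be checked with simple arithmetic. The sketch below is a rough illustration under stated assumptions: it takes 720×576 as the standard-definition frame and QCIF (176×144) as the mobile frame, neither of which is named in the text, and it models time slicing as one burst per cycle plus a fixed tuner wake-up time. The burst size, bit rates, and wake-up time in the example are hypothetical values chosen for illustration.

```python
def pixel_ratio(small: tuple[int, int], large: tuple[int, int]) -> float:
    """Ratio of pixel counts between two frame sizes given as (width, height)."""
    return (small[0] * small[1]) / (large[0] * large[1])


def tuner_off_fraction(service_kbps: float, burst_kbps: float,
                       burst_kbits: float, wakeup_s: float = 0.0) -> float:
    """Fraction of time the tuner can stay powered down under time slicing.

    The data for one cycle of a service is sent as a single burst at the
    full multiplex rate; the tuner is on only for the burst plus wake-up time.
    """
    cycle_s = burst_kbits / service_kbps   # playback time covered by one burst
    burst_s = burst_kbits / burst_kbps     # time needed to receive the burst
    on_s = burst_s + wakeup_s
    if on_s >= cycle_s:
        return 0.0                         # no idle time left in the cycle
    return 1.0 - on_s / cycle_s


if __name__ == "__main__":
    ratio = pixel_ratio((176, 144), (720, 576))
    print(f"QCIF carries about 1/{1 / ratio:.1f} of the SD pixel count")
    # Hypothetical service: 350 kbps video in a 10 Mbps multiplex,
    # 2000-kbit bursts, 250 ms for the tuner to wake up and resynchronize.
    off = tuner_off_fraction(service_kbps=350, burst_kbps=10_000,
                             burst_kbits=2_000, wakeup_s=0.25)
    print(f"Tuner off for about {off:.0%} of the time")
```

The saving depends mainly on the ratio of the service bit rate to the burst bit rate: the smaller the share of the multiplex a service occupies, the longer the tuner can sleep between bursts.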
Mobile environments are also characterized by users traveling at high speeds, for example, in cars or trains. Standard terrestrial transmissions based on ATSC (Advanced Television Systems Committee) or even DVB-T (Digital Video Broadcasting – Terrestrial) are not suited for such environments. In DVB-T this is because of the Doppler effect: to a fast-moving receiver, the roughly 8000 closely spaced carriers used in orthogonal frequency division multiplexing (OFDM) appear to be at frequencies different from those intended. To counter this, a special modulation mode, COFDM with 4K carriers, is used for mobile reception. ATSC, which uses 8-VSB, is subject to severe multipath fading and relies on a distributed transmission system (DTS) to overcome these effects. Mobile TV has spawned its own set of standards for terrestrial, satellite, and 3G cellular network deliveries.
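The effect of speed on an OFDM signal can be estimated by comparing the Doppler shift with the spacing between carriers. The sketch below is a back-of-the-envelope illustration, not a receiver model: it assumes the 8 MHz DVB-T channel raster (elementary period of 7/64 µs), a 700 MHz UHF carrier, and a 120 km/h vehicle, none of which come from the text. The function names are hypothetical.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
ELEMENTARY_PERIOD_S = 7 / 64 * 1e-6   # assumption: DVB-T/H in an 8 MHz channel


def doppler_shift_hz(speed_kmh: float, carrier_hz: float) -> float:
    """Maximum Doppler shift seen by a receiver moving at speed_kmh."""
    speed_m_s = speed_kmh / 3.6
    return speed_m_s * carrier_hz / SPEED_OF_LIGHT_M_S


def carrier_spacing_hz(fft_size: int) -> float:
    """OFDM carrier spacing: the inverse of the useful symbol duration."""
    return 1.0 / (fft_size * ELEMENTARY_PERIOD_S)


if __name__ == "__main__":
    shift = doppler_shift_hz(speed_kmh=120.0, carrier_hz=700e6)
    for name, fft_size in (("8K", 8192), ("4K", 4096), ("2K", 2048)):
        spacing = carrier_spacing_hz(fft_size)
        print(f"{name}: spacing {spacing:7.1f} Hz, "
              f"Doppler/spacing = {shift / spacing:.3f}")
```

With these assumptions the 4K mode has twice the carrier spacing of the 8K mode, so the same Doppler shift is half as large relative to the spacing, which is the sense in which fewer, wider-spaced carriers tolerate higher speeds.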
1.3 What Else is Different in Mobile TV?
Mobile TV is designed to be received by cellphones, which are basically processors with their own operating systems (e.g., Windows Mobile™) and application software packages (e.g., browsers, mailing programs). The handsets support animation and graphics software such as Java or Adobe Flash, players such as Real Player or Windows Media, and so on. Content providers have been aware of these capabilities and hence have designed content
that now takes advantage of the devices on which it will be played. The new content that is prepared for mobile TV takes advantage of intermixing rich animations, graphics, and video sequences, which play either natively or through software clients on mobile phones. The bandwidth used to deliver a Flash animation file is a fraction of that used for delivering the same length of video. This means that mobile phones, with all their limitations, can indeed display very appealing content and presentation for simple programs such as weather or news. They can also be used to create entirely new interactive services such as voting, online shopping, chat, or mail, which are delivered with video, music, and animations. Mobile TV programming is delivered with a new interactive electronic service guide (ESG), which makes access to content and its purchase much easier. It needs to be delivered with a user interface (UI) that makes reading on the tiny screens much easier and more intuitive through the use of widgets or interactive icons. Animation software such as Java or Flash, which is basically taken from the PC world, is again not ideally suited for the constrained environment of mobile sets. This has led to the need to adopt profiles of implementation that are suited for mobile devices. Java MIDP, Flash Lite™ profiles, and graphics delivered via scalable vector graphics (SVG Tiny or SVG-T) are results of marathon standardization efforts across the industry to make a uniform environment for creation and delivery of content.
1.4 Standards for Mobile TV
Watching mobile TV appears deceptively simple. After all, it is carrying the same programs that were being broadcast anyway. But this simplicity hides a vast trove of technologies and standards that have been developed over time to make the feat of bringing TV to the small 2-inch screens possible. Audio enthusiasts have long been used to handling over 30 types of audio file formats ranging from simple .wav files to .mpg, Real, QuickTime, Windows Media, and other file formats. Video has been available in no fewer than 25 different formats, from uncompressed video to MPEG-4/AVC. Moreover, video can be shown in a wide range of resolutions, frame sizes, and rates. It has been a massive job for the industry to come together and agree on standards that will be used as a common platform for delivering mobile TV services. The standards may differ slightly based on technology, but the extent of harmonization that has been achieved in a time frame as short as a decade reflects a new life cycle of technology and products. The effort required countless groups to work together: chip designers, handset manufacturers, software developers, TV broadcasters, and mobile operators were among the hundreds of stakeholders involved. It also required the content generation industry to design content for the mobiles, the broadcasting and the cellular mobile industry to prepare the transmission systems, and security specialists to come up with new ways to secure content. The change, which became abundantly clear with the advent of mobile phones, had been in the air for quite some time. Mobile phones are no longer “phones,” but are multimedia
devices for receiving and creating content, for entertainment, and for professional use. The handsets can be connected to PCs, digital and video cameras, office systems, and a host of other devices to deliver or play multimedia files or presentations.
1.4.1 Resources for Delivering Mobile TV A mobile phone is a versatile device. It is connected to cellular networks and at the same time receives FM broadcasts through its FM tuner or connects to a wireless LAN using Wi-Fi. The delivery of mobile TV can similarly be multimodal through the 3G networks, Wi-Fi, satellite, or terrestrial broadcast networks. In all these manifestations of delivery, a common necessary resource is the spectrum. The rapid growth of mobile TV and its momentum and scale was indeed an event not foreseen by the industry, though not all may agree with this statement. The result has been that the mobile TV industry has been left scrambling to search ways to find spectrum and deliver mobile TV. In Europe, the traditional TV broadcast spectrum in UHF and VHF stands occupied by the transition to digital TV and the need to simulcast content in both modes. The United States, after completion of its digital transition, has auctioned the excess spectrum, which has enabled technologies such as MediaFLO (AT&T mobile TV and Verizon VCAST) to cover all markets in the country. In Korea the DAB spectrum for audio broadcast services was used to deliver mobile TV services in a format named as Digital Multimedia Broadcast-Satellite or S-DMB. The government also allowed the use of the VHF spectrum for mobile TV services and this led to the terrestrial version of the DMB services, called T-DMB, being launched and used in Europe including in Germany and Italy. DVB-H is a standard largely designed to use the existing DVB-T networks and ideally use the same spectrum. This is indeed the case in many countries with the UHF spectrum being earmarked for such services. The United States has now adopted the ATSC Mobile DTV standard for mobile TV, which will enable virtually all digital TV stations to simulcast content for reception on mobile devices. 
In Japan, which uses ISDB-T broadcasting, the industry chose to allow the same spectrum to be used for mobile TV with technology called 1-Seg broadcasting. The scramble to provide mobile TV services by using the available networks and resources partly explains the multiple standards that now characterize this industry. Serious efforts are now on to find spectrum and resources for mobile TV on a regional or global basis, which will in the future lead to convergence of the standards.
1.4.2 The Mobile TV Ecosystem It is not only the TV viewers or content producers that constitute the mobile TV community. The new multimedia phones that can display mobile TV can also play music, and that too is directly taken off the networks rather than downloaded from a PC. A new content industry for sale to mobiles was born. The new opportunities unleashed by software for mobile TV
About Mobile TV
and content development in Java or Flash brought millions of software developers working in these fields into the industry at a stroke, and their products are now available through the application marketplaces. The chipset industry needed to come up with specialized mobile chips for handling multimedia, content security, and connectivity. The family expanded with new content creators, content aggregators, music stores, and e-commerce platform developers. The need to protect content so that the rights holders could receive their dues (unlike the early days of Internet content sharing) led to serious measures for digital rights management (DRM). The traditional content production community of Hollywood has indeed expanded manifold and now encompasses cellular operators, broadcasters, content producers, and those in the vast software, hardware, and services industries.
1.5 New Growth Areas with Mobile TV Although mobile TV may appear to be an end in itself, it is in fact a part of the portfolio of multimedia services that can be delivered by the new generation mobile networks. It is thus in company with YouTube, Twitter, Facebook, multimedia messaging (MMS), video calling, and multimedia client server, Java applications, location-based services (LBS), instant messaging, and so on. In fact, the increasing use of multimedia was a foregone conclusion after the success of i-mode services in Japan, which demonstrated the power of the data capabilities of the wireless networks. The launch of FOMA (Freedom of Multimedia Access) services, with its 3G network, took interactivity and multimedia applications to a new level. The new generation of networks empower users to generate their own content, which can be broadcast or shared with others. The rich media services have become a part of all advanced third-generation networks.
1.6 What Type of Opportunity Does Mobile TV Present?
What is available today as mobile TV is only the tip of the iceberg. Although there are over 200 networks operational today for delivering mobile TV, what is happening right now is a major move toward regionally harmonized broadcasting using new technology networks such as ATSC Mobile DTV, FLO, or DVB-H, and open standards for ESG, encryption, and content. This is changing the landscape for content providers and network operators, who are able to target larger audiences more efficiently, while users benefit from open handsets and lower-priced offerings. Using the mobile networks to address over 4 billion mobile users is an exciting idea that drives content producers to MIPCOM. It is even more exciting to be able to target hundreds of millions of devices that have the capabilities to process and deliver video in real time. Broadcasters crowd the NAB or IBC mobile TV forums to get the quickest entry into the new world of mobile broadcasting.
However, it is not only the number of subscribers or revenues that reveal the future potential of mobile TV. The medium is much more personal, direct, and interactive, a significant departure from broadcasting to a faceless set of customers, which is what most broadcast environments provide. Mobile TV provides a new opportunity to a wide range of users. The users get new power from the multimedia capabilities built into the handsets, which now include video and audio and multimedia applications properly configured to deliver live TV or video on demand. The nature of content needed for mobile networks is different, so the media industry also gets an opportunity to create new distribution platforms, target advertising, and reuse existing content for the new networks. The broadcast and cellular operators have been seeing a new growth market and there is considerable new opportunity for the manufacturing and software industries.
1.7 What Handset Types Does Mobile TV Work On? The capability to receive mobile TV is today largely dependent on the delivery network. Terrestrially delivered mobile TV can be received only with handsets that have a tuner specifically built in for the type of broadcast, e.g., DVB-H or MediaFLO. Such handsets may be operator-specific and the choice may be limited to just a few types. In markets such as Korea or Japan, where free-to-air transmissions exist, more than 80% of handsets have a tuner built in. Most 3G multimedia smartphones, on the other hand, permit reception of streamed mobile TV.
1.8 Is Mobile TV Really Important?
A question that has been asked in millions of mobile TV blogs is whether mobile TV is really that important. Would anyone really watch TV on the sets once the initial craze was over? The answer, it would appear from initial responses, is probably positive. This is so because mobile TV can be made widely available through broadcast networks, and watching it is not necessarily expensive. Users today are on the move, and refreshing new content and updates, fun, and music seem to be always welcome, as do the opportunities to remain connected using the new generation of smartphones. Continuous additions to mobile phone capabilities, beginning with a simple camera, MP3 player, and FM radio, and now mobile TV, have shifted the handset from a mere calling and answering device to an advanced entertainment, Internet access, gaming, office application, mobile commerce, and utility device. We are now squarely in this new age.
CHAPTER 2
Introduction to Digital Multimedia There must be an ideal world, a sort of mathematician’s paradise where everything happens as it does in textbooks. Bertrand Arthur William Russell
2.1 Introduction
When mobile TV was initially implemented, it was seen as a textbook case of transporting large-screen content to mobile devices. However, things did not work out as expected. This was no different from the way mobile websites had gone in the initial days of the Wireless Application Protocol (WAP). There was a certain uniqueness about the mobile world: the small screens, interactive applications, and creative users made it a different world from the relatively passive large-screen TV. The world of digital video is indeed very challenging. It involves delivering video to a range of devices, from tiny mobile screens to giant digital cinemas. In between lies a wide range of devices such as HDTVs, DTVs, PDAs, and so on with varying sizes and resolutions. The delivery may be via terrestrial, satellite, or cable systems; DTH platforms; IP TV; 3G networks; or mobile broadcast networks such as ATSC, DMB, or DVB-H. All these are made possible by standards and technologies that define audio and video coding, transmission, broadcast, and reception. The basic elements of the digital transmission system are, however, very simple. These constitute a still or moving picture (video) and audio in one or more tracks. The audio and video are handled using compression and coding standards and transmitted using well-defined network protocols. An understanding of the coding formats and standards, and of the protocols and standards for transmission, is useful to fully understand the dimensions of mobile TV and other frontline technologies. We begin our journey into the world of mobile TV by taking a broad overview of the media types used in the digital domain. Audio and video compression is a very common topic, and the only reason we need to discuss it here is that multimedia in mobile networks is handled with specific formats and characteristics. The quality of what you see on the mobile screen is largely determined by some of the characteristics we define in this chapter.
© 2010 Elsevier, Inc. All rights reserved. DOI: 10.1016/B978-0-240-81287-8.00002-3
Figure 2.1: Broadcasting environment today.
Digital video on the web and on broadcast networks has traditionally employed different resolutions and techniques of coding. As the broadcast networks begin to target the mobile devices, an understanding of the multimedia formats with origins in different domains becomes very important. Mobile networks are characterized by transmissions at bit rates much lower than those of standard-definition TV and require audio and video to be compressed by very efficient algorithms such as MPEG-4 with limited profiles. The mobile devices present a very constrained environment for applications, owing to the limitations of power, processor, and memory capacity. This implies that they can handle only simple visual profiles of the video, comprising limited objects suited for the tiny screens. We look at pictures, video, and audio and the manner in which they are compressed for mobile networks.
2.2 Picture
The basic element of multimedia is a picture. The picture in its native format is defined by its intensity, color, and size. For example, a picture presented on a full screen of a VGA (Video Graphics Array) monitor would be represented by 640 × 480 pixels. The size of the picture file depends on the number of bytes used to represent each pixel. For example, if a picture is stored as 3 bytes per pixel, the picture size is 640 × 480 × 3 = 921,600 bytes, or about 921 KB. The image size is thus about 0.92 megabytes (MB), and the picture resolution is about 0.31 megapixels. The same picture on an extended graphics array (XGA) monitor (1024 × 768) would be displayed at a higher resolution, with a file size of 1024 × 768 × 3 = 2,359,296 bytes, i.e., about 2359.3 KB or 2.4 MB. The picture resolution is about 0.79 megapixels.
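The arithmetic for uncompressed image sizes can be checked with a short Python sketch. The function name and the default of 3 bytes per pixel are illustrative assumptions that mirror the VGA and XGA examples in the text.

```python
def uncompressed_image_size(width, height, bytes_per_pixel=3):
    """Return (pixel_count, size_in_bytes) for an uncompressed image."""
    pixels = width * height
    return pixels, pixels * bytes_per_pixel


# The two monitor formats used as examples in the text.
for name, (w, h) in {"VGA": (640, 480), "XGA": (1024, 768)}.items():
    pixels, size = uncompressed_image_size(w, h)
    print(f"{name}: {pixels / 1e6:.2f} megapixels, {size / 1e6:.2f} MB")
```

Changing `bytes_per_pixel` shows directly how the color depth scales the file size for a fixed pixel count.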
Figure 2.2: A picture.
2.2.1 Image File Sizes
A picture is represented by pixels, and the number of pixels has a direct bearing on the image’s file size. An image as transmitted for standard-definition TV (CCIR 601, now called ITU-R BT.601) is represented by 720 × 576 pixels (or 720 × 480 for NTSC), i.e., about 415 K pixels (about 346 K for NTSC). The same image if represented on a mobile TV screen could be represented as 352 × 240 and would need only about 84 K pixels. An HDTV transmission with 1920 × 1080 pixels will need about 2 M pixels for displaying on one screen. In general, different screen sizes and resolutions can be represented by different pixel counts. The pixel count, together with the number of bits used to represent each pixel, directly reflects the quality.
Figure 2.3: Screen size and pixels. (Courtesy of 3G.co.uk)
2.2.2 Image Resolution: Communications, Computer, and Broadcast Domains
A number of image formats are used for carrying video at lower bit rates and for lower resolutions. One of the early formats was the CIF (Common Intermediate Format), which was needed for applications such as video conferencing, which connect across national borders. As the ISDN lines supported only 64–128 kilobits per second (Kbps), full-screen resolution could not be supported. The ITU-T (formerly CCITT) H.261 video conferencing standard, for example, uses the CIF and Quarter CIF (QCIF) resolutions. The CIF format is defined as 352 × 288, which translates to 288 lines with 352 pixels per line. The QCIF format is also used for low-bandwidth applications or on a web page and has only 176 × 144 pixels. The use of CIF and QCIF notations to determine the picture “window size” is quite common in the telecommunications and Internet domains.
Analog broadcasting is based on one of three standards: PAL, NTSC, and SECAM. The key transmission parameters can be represented by NTSC (525 lines, field frequency 60 fields/sec), PAL (625 lines, field frequency 50 fields/sec), or SECAM (625 lines, field frequency 50 fields/sec). (The actual number of active lines transmitted is, however, lower: 480 for NTSC and 576 for PAL video.) The digital representation of TV signals was standardized in the MPEG-1 standard, which used the term Source Input Format or SIF. The SIF for NTSC was defined as 360 × 240 (active pixels 352 × 240); for PAL it was defined as 360 × 288 (active pixels 352 × 288). This is the resolution used in the VCDs. It is thus evident that the SIF for PAL is identical to the CIF format used in the communication domain, except for the fact that the aspect ratio used is 1.22 in CIF as opposed to 1.33 (4/3) in PAL. The CIF format of 352 × 288 was chosen so as to be divisible into 8 × 8 “macroblocks” of pixels for compression, as you will see later in this chapter.
Figure 2.4: Image representations.
For the computer industry, using video monitors, mentioning resolution in the form of VGA is much more common, as this was the resolution of the erstwhile color video monitors (monitors used today have much higher resolutions, such as XGA [1024 × 768] and SXGA [1280 × 1024]). The VGA resolution is 640 pixels × 480 lines. A Quarter VGA (QVGA) is then 320 × 240 and has 0.0768 megapixels. QVGA is a commonly used format in mobile TV applications, although VGA and CIF resolutions are also used, depending on whether the service originates on a cellular network or a broadcast network. QVGA is sometimes also called Standard Interchange Format (SIF) as defined in the computer industry and represents the same resolution as the Source Input Format (SIF) in the broadcast domain. There are other sizes that can be used to define an image. These can be one half or 1/16 of a VGA screen (e.g., 160 × 120, or QSIF). As the displays moved to higher resolutions, new formats that represented the higher pixel densities as well as higher aspect ratios (e.g., 16:9 instead of 4:3) became common. Table 2.1 depicts the pixels recommended for various image size applications.

Table 2.1: Common Display Formats.

Broadcast Domain
Format                  Pixels         Aspect Ratio
SIF (PAL)               352 × 288      4:3
SIF (NTSC)              352 × 240      4:3
480i (SDTV NTSC)        704 × 480      4:3 or 16:9
480p                    704 × 480      4:3 or 16:9
720p (HDTV)             1280 × 720     16:9
1080i, 1080p (HDTV)     1920 × 1080    16:9
QSIF                    176 × 144      4:3

Computer Displays/Mobile
Format                  Pixels         Aspect Ratio
QVGA                    320 × 240      4:3
CGA                     320 × 200      4:3
VGA                     640 × 480      4:3
WQVGA                   400 × 240      5:3
WVGA                    768 × 480      8:5
SVGA                    800 × 600      4:3
XGA                     1024 × 768     4:3
WSXGA                   1280 × 720     16:9
SXGA                    1280 × 1024    5:4
WXGA                    1368 × 766     16:9
UXGA                    1600 × 1200    4:3

Communications
Format                  Pixels         Aspect Ratio
CIF                     352 × 288      1.2:1
QCIF                    176 × 144      1.2:1
4CIF                    704 × 576      1.2:1
16CIF                   1408 × 1152    1.2:1
SQCIF                   128 × 96       1.33:1
Web 720                 720 × 540      4:3
Web 720HD               720 × 400      16:9
Web 360                 360 × 270      4:3
Web 360HD               360 × 203      16:9
Web 640                 640 × 480      4:3
Web 640HD               480 × 270      16:9
Cinema 2K               1998 × 1080    1.85:1
Cinema 4K               3996 × 2160    1.85:1
Academy 2K              1828 × 1332    1.37:1
Academy 4K              3656 × 2664    1.37:1
Quick Facts
Image Resolution and Display Resolution
The distinction between the image resolution and the display quality should be appreciated. The quality of display depends on pixels per inch (ppi) on the display device. An HD image with 1920 × 1080 resolution will display well on a TV screen but will not have the same perceived quality of display on a cinema screen, as the pixels per inch will be too low and pixel size too large. Digital cinema has a typical resolution of 4K, in full aperture (4096 × 3112), and has 12 million pixels per frame as opposed to 2 million for HDTV.
Figure 2.5: Image quality and pixel size (or grain size).
Most mobile phones today have high enough resolution to support either 320 × 240 or 640 × 480 image resolution. An iPhone 3G, for example, has a screen resolution of 480 × 320 pixels, which is half of VGA resolution. However, it has 163 pixels per inch, giving a good perceived image quality.
In general, the screen can display only low-resolution images in mobile phones. This is the reason why a low-resolution camera (i.e., a VGA or CGA camera) is placed near the screen for video calls. Most phones also have cameras that can go up to 8 megapixels. However, such pictures displayed on the screen of the mobile phone look quite different, due to the lower resolution of the display. For the same size picture, quality can vary widely based on the resolution of the camera used. The need for high resolution can imply very large pixel counts for digital images. As an example, the Kodak recommended image resolutions for different picture sizes are listed in Table 2.2.

Table 2.2: Image Resolution.
Print Size       Megapixels    Image Resolution
Wallet           0.3           640 × 480 pixels
4 × 5 inches     0.4           768 × 512 pixels
5 × 7 inches     0.8           1152 × 768 pixels
8 × 10 inches    1.6           1536 × 1024 pixels

Figure 2.6: Picture quality by pixels. (Courtesy of cnet.com)
Quick FAQs
1. What is a high-resolution-screen phone? One of the phones with a high-resolution screen is the Sharp Aquos Fulltouch 931SH. It has a resolution of 1024 × 480, which brings it on par with XGA resolution horizontally, except for the smaller screen height of 480 lines. The phone, incidentally, also has a 1-Seg TV tuner, which can be used to receive mobile TV using ISDB-T.
Figure 2.7: Sharp Aquos Fulltouch 931SH.
2. What is the screen resolution of PDAs? Most common PDA resolutions are 640 × 480 (VGA) and 800 × 480 (WVGA).
2.3 Image Compression
Transmission of a picture in uncompressed format is not practical, due to its large size and the consequent time needed for its transmission. For use on the Internet and email, the image sizes need to be much smaller than the uncompressed formats. There are many ways to reduce the file size, such as:
● Changing the picture size to suit the receiver
● Reducing the number of bytes used to represent each pixel
● Compression
Obviously, there is a very wide range of image formats, with different compression ratios, and the techniques used have a bearing on the image portability and quality. For local storage and special applications (e.g., publication, large-screen displays), it may still be necessary to handle images in uncompressed format.
2.3.1 JPEG Image Format
The JPEG format is one of the most commonly used image formats. JPEG encoders work by dividing a picture into blocks of 8 × 8 pixels and applying the DCT (Discrete Cosine Transform) process. The coefficients are then read out through a “zig-zag” scan (selecting the lower-frequency components first, followed by higher frequencies), and the higher-frequency coefficients are discarded, leading to a reduction in file size. The reduction depends on how many coefficients we are willing to discard and, correspondingly, on the loss acceptable in compression. The quantized values are further compressed using lossless Huffman coding. The entire process of compression using DCT is based on the fact that the human eye cannot perceive fine details that are represented by higher-frequency coefficients, so these can be discarded without discernible loss of quality.
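The block transform and zig-zag selection described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the DCT is the textbook orthonormal DCT-II, and the hard cutoff `keep` stands in for the quantization tables that real JPEG encoders use. Huffman coding is omitted.

```python
import math

N = 8  # JPEG works on 8 x 8 blocks


def dct_2d(block):
    """Forward 2-D DCT-II (orthonormal) of an N x N block of samples."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order(n=N):
    """(row, col) positions of an n x n block, lowest frequencies first."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (diagonal, row if diagonal % 2 else col)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def keep_low_frequencies(coeffs, keep):
    """Zero every coefficient except the first `keep` in zig-zag order.

    Assumption: a hard cutoff replaces JPEG's quantization tables.
    """
    kept = [[0.0] * N for _ in range(N)]
    for r, c in zigzag_order()[:keep]:
        kept[r][c] = coeffs[r][c]
    return kept


# A smooth gradient block: most of its energy lands in a few low frequencies.
gradient = [[x * N + y for y in range(N)] for x in range(N)]
compact = keep_low_frequencies(dct_2d(gradient), keep=10)
```

Lowering `keep` discards more high-frequency detail, which is the lossy step; the coefficients that remain would then be entropy coded.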
Figure 2.8: Compression using DCT.
Figure 2.9: DCT quantization using zig-zag scanning.
In most cases, a 20:1 compression can be achieved without discernible loss of quality. The compression so achieved is lossy, as the higher-frequency components that are discarded cannot be recovered. It is for this reason that images needed for studio work and editing are stored in an uncompressed format. JPEG files are stored with the .jpg extension and are widely supported by browsers as well as a majority of applications.
2.3.2 The GIF Format
The GIF or Graphical Interchange Format was originally developed by CompuServe in 1987 and has been a de facto standard for image storage and transmission since then. It is virtually a lossless compression. The GIF format uses the LZW compression technique and is efficient at condensing color information for pixel rows of identical color, with the color palette limited to 16 or 256 colors. It is particularly effective for drawings and sketches with large areas of the same color. There are two variants: GIF87a, which supports 8-bit color (256 colors) and interlacing, and GIF89a, which in addition supports transparency and animation. GIF files are saved with the .gif extension and have universal browser support. The LZW technique used in the GIF format was patented by Unisys.
2.3.3 The BMP Format
BMP is the Bitmapped Graphics Format, defined by Microsoft and commonly used in the Windows environment. BMP supports 1-, 4-, 8-, or 16-bit color depth, and a lower color depth reduces the file size. The images can be uncompressed or use RLE (run-length encoding) compression. When the images are stored uncompressed, the file size is very large. The files have the .bmp extension. Some of the common picture formats are summarized in Table 2.3.
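The idea behind run-length encoding can be shown with a short Python sketch. It is a generic scheme written for illustration and does not reproduce the exact byte layout of the BMP RLE modes. The function names and the 255-pixel run limit (one byte per count) are assumptions.

```python
def rle_encode(pixels, max_run=255):
    """Collapse runs of identical pixel values into (count, value) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][1] == value and runs[-1][0] < max_run:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, value])   # start a new run
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Expand (count, value) pairs back into the original pixel row."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels


row = [7] * 5 + [3] * 2 + [7]
encoded = rle_encode(row)
print(encoded)
```

Rows with long stretches of one palette index compress well, while rows with no repeated neighbors can grow, since every pixel then costs a count as well as a value.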
Table 2.3: Common Picture Formats.
bmp              Bitmap
exif             Exchange Image File Format
gif87a/gif89a    Graphical Interchange Format
jpeg             Joint Photographic Expert Group
paint/pict       MacPaint and MacDraw
pdf              Portable Document Format
png              Portable Network Graphics
tiff             Tagged Image File Format
wmf              Windows Meta File

2.4 Video
When there is motion, there is a need to convey continuous information on the objects as they move, which brings us into the realm of video. The handling of video is based on the principle of persistence of vision of the human eye, which cannot distinguish rapid changes in a scene. Taking advantage of the persistence of vision, it is possible to transmit a predetermined number of pictures (called frames) every second, and the human eye would not see any discontinuity in the motion. This is the principle used in cinema projection, where 24 frames are shown per second, with each frame being shown twice to give a refresh rate of 48 images per second and provide a feeling of continuous motion.
2.4.1 Generation of Video Signals: Scanning
The word “frame” originates from cameras, which capture a series of pictures called frames that are then passed on in the form of a video output together with one or two audio channels. Each frame essentially represents a picture, and the motion is captured by transmitting either 25 or 30 frames per second (based on PAL or NTSC standards). The first step in the generation of video from pictures is that of scanning a picture. A camera typically measures the intensity and color information in a picture by scanning a horizontal line across the picture. A series of horizontal lines is used to complete the full picture.
Figure 2.10: Scanning of images.
In the analog domain, the scanning process generates a series of levels (amplitude vs. time) representing the variation of the picture from white to black. The process generates a waveform representing each horizontal line until all the lines are scanned and converted into the analog waveform to complete a frame. Each frame has a predefined number of lines. The lines are separated by horizontal blanking intervals, and successive frames (or fields) by a vertical blanking interval.
Figure 2.11: Scanning in a television frame.
The scanning must be repeated a number of times each second to cover motion, resulting in the transmission of 25–30 frames per second.
2.4.2 Interlaced and Progressive Scanning
When the pictures were scanned at a frame rate of 25 or 30 frames per second, there was a visible flicker owing to the time gap between the frames (i.e., 40 milliseconds (msec) for 25 frames per second (fps), while a 20 msec refresh is needed to give a flicker-free viewing experience). Hence the techniques that had been used in the motion picture industry came to be used, whereby the projector shows each frame twice to reduce flicker. In the days of analog signals, this could not be implemented easily, as there was no way to store the frame. Hence a new mechanism called interlaced scanning was used. The frame was divided into two halves, each called a field and containing about half the lines. The first field displayed the odd-numbered lines, while the other displayed the even-numbered lines. Interlaced scanning is still used today, even in digital transmissions. Interlaced scanning does not work well when applied to computer monitors or mobile screens. Computer monitors need to display small character images and produce a visible flicker with interlaced scan. These therefore work on progressive scan, which produces better pictures. Nonlinear editing of video also requires signals to be processed in progressive scan.
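The division of a frame into two fields can be illustrated with a small Python sketch. A frame is modeled here simply as a list of scan lines. The function names are illustrative assumptions, and timing, blanking, and field order signaling are ignored.

```python
def split_into_fields(frame):
    """Split a progressive frame (a list of scan lines) into two fields."""
    # Lines are numbered from 1, so odd-numbered lines sit at even indices.
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field


def weave(odd_field, even_field):
    """Rebuild the full frame by interleaving the two fields line by line."""
    frame = []
    for i, line in enumerate(odd_field):
        frame.append(line)
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


frame = [f"line {n}" for n in range(1, 8)]  # a toy 7-line frame
odd, even = split_into_fields(frame)
print(odd)   # shown in the first field
print(even)  # shown in the second field
```

Each field carries about half the lines, so sending 50 fields per second costs the same as 25 full frames while doubling the refresh rate seen by the viewer.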
Figure 2.12: Interlaced and progressive scan.
2.4.3 Color
The human eye perceives light in three colors: red, green, and blue (called RGB in the video world). While this is a good way to represent the signals, it is more convenient to have the luminance component separate and the color components carried separately. This allows a black-and-white TV to use only the luminance signal. This mapping is done easily by representing the signals as Y (luminance) and color components called U (representing B-Y) and V (representing R-Y). Historically, when all TV sets were monochrome, only the luminance component was used. For backward compatibility, the technology of transmitting the luminance and color signals separately was adopted. The monochrome monitors could continue to display the Y (luminance) signal while the color TV sets would use both the luminance and color signals. As the human eye perceives color details at lower acuity than luminance, the bandwidth of the color signals is kept lower than that of luminance. For example, in PAL, the luminance channel is transmitted at a larger bandwidth of 5.5 MHz while the U and V channels are transmitted at 1.5 MHz. Similarly, in NTSC the color channels called I and Q are transmitted at bandwidths of 1.3 MHz and 400 kHz, respectively, against that of luminance, which is 4.2 MHz.
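The mapping from RGB to Y, U, and V can be sketched as follows. The text states only that U represents B-Y and V represents R-Y; the luminance weights (0.299, 0.587, 0.114) and the scale factors (0.492 and 0.877) used below are the commonly quoted analog values and are assumptions added here for illustration, as is the function name.

```python
def rgb_to_yuv(r, g, b):
    """Convert normalized RGB (0.0 to 1.0) to Y, U, V.

    Assumed coefficients: standard-definition luminance weights and the
    usual analog scaling of the color-difference signals.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # scaled B-Y
    v = 0.877 * (r - y)                     # scaled R-Y
    return y, u, v


for name, rgb in {"white": (1, 1, 1), "gray": (0.5, 0.5, 0.5),
                  "red": (1, 0, 0)}.items():
    y, u, v = rgb_to_yuv(*rgb)
    print(f"{name}: Y={y:.3f} U={u:.3f} V={v:.3f}")
```

White and gray produce zero color-difference values, which is why a monochrome receiver can display the Y signal alone and ignore U and V.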
2.5 Analog TV Signal Formats Analog video comprises the color components R, G, and B, which may be carried separately on three different wires or cable for local connectivity. This type of video carriage is known as the component format. Computer monitors usually have connectors for accepting the RGB format component video. In TVs, the YUV format is used owing to compatibility with monochrome devices.
2.5.1 Composite Video
Where the carriage of signals is over medium distances (e.g., within a facility), the three-cable method proves cumbersome and instead the technique of a composite video signal is used. A composite video signal comprises the luminance component (Y) combined with a color subcarrier that has been modulated by the color components.
Figure 2.13: Composite and component video.
The NTSC standard uses quadrature amplitude modulation (QAM) of the two color components; the SECAM standard uses frequency modulation.
Figure 2.14: NTSC composite signal.
2.5.2 S-Video
S-video avoids the combining of luminance and chroma components by keeping the two separate. This means that the video is carried using two cables, one carrying the luminance (Y) and the other carrying the chrominance (C). S-video connectors were frequently used in higher-grade home video equipment.
2.6 Digital TV Formats
Analog video can be digitized by sampling at a frequency at or above the Nyquist rate, i.e., at least twice the bandwidth of the signal. The sampling of video is done on the component video (Y, U, V) to generate the digital streams Y, Cb, and Cr, which are then combined to generate a digital representation of the signal. The sampling of the color signals is done at lower rates than the luminance signal without perceptible loss of quality. For this purpose, it is usual to sample the U and V components at half the rate of the luminance component. This is denoted by the nomenclature of YUV samples as 4:2:2; i.e., for every four samples of Y there are two samples of U and two samples of V. It is possible to reduce the bit rates further by sampling the color only on alternate lines. This gives rise to the 4:2:0 notation for the sampling employed. Table 2.4 shows the ITU BT.601 recommended rates for component digital video. In both PAL (576i) and NTSC (480i), each horizontal scan line is represented by 720 samples for Y and 360 for each of the color components Cr and Cb. This makes processing of multistandard digital signals easier. The samples are represented digitally by using 10 bits each in professional video equipment. This generates interlaced digital component video. There is a difference between the total picture area and the active area used for carrying video information (see Table 2.5). In digital video, the capacity available through the inactive lines (horizontal ancillary area) and the vertical blanking (vertical ancillary areas) is used to carry pairs of stereo audio channels. In the case of NTSC, the available capacity can carry data rates of up to 5.7 Mbps.

Table 2.4: Sampling for Generation of Component Digital Video.
Component Digital Video Sampling (PAL)
4:2:2 Sampling    Luma at 13.5 MHz, Chroma at 6.75 MHz (2 × 3.375 MHz)
4:1:1 Sampling    Luma at 13.5 MHz (4 × 3.375 MHz), Chroma at 3.375 MHz
4:2:0 Sampling    Luma at 13.5 MHz, Chroma at 6.75 MHz (interleaved)
4:4:4 Sampling    Luma and Chroma are each sampled at 13.5 MHz
Table 2.5: Active Picture Areas Used in Digital Standards.
              Total Area Including Sync    Active Picture Area    Frame Rate
              Width     Height             Width     Height
NTSC          858       525                720       486          29.97
PAL/SECAM     864       625                720       576          25
Table 2.6: SDI Signal Standards.
Video Format               SMPTE Standard    Bit Rate
480i, 576i (SD-SDI)        SMPTE 259M        270 Mbps
480p, 576p                 SMPTE 344M        540 Mbps
1080i, 720p (HD-SDI)       SMPTE 292M        1.485 Gbps
1080p (Dual-Link SDI)      SMPTE 372M        2.970 Gbps
In the AES/EBU format, two audio channels can be carried for a data rate of 3.072 Mbps. Thus two to four channels of audio are carried along with component digital video to generate the SDI signal.
2.6.1 SDI Video An analog-to-digital conversion of video signals by sampling of video generates digital video in uncompressed format. This video, along with audio, is delivered in broadcast stations using a serial digital interface and is commonly called SDI. The Society of Motion Picture and Television Engineers (SMPTE) has standardized SDI video and SDI-HD video as per the notations in Table 2.6. SDI video is generally the source used for all further processing such as encoding, compression, and storage. The SDI signals can also carry embedded closed-caption information.
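The 270 Mbps rate of standard-definition SDI can be cross-checked from the sampling figures given earlier for 4:2:2 component video: luminance at 13.5 MHz, each of the two color components at 6.75 MHz, and 10 bits per sample. The following Python sketch does the arithmetic; the function and constant names are illustrative.

```python
LUMA_HZ = 13.5e6     # Y sampling rate for 4:2:2 component video
CHROMA_HZ = 6.75e6   # sampling rate of each of Cb and Cr
BITS_PER_SAMPLE = 10


def component_bit_rate(luma_hz, chroma_hz, bits_per_sample):
    """Bit rate of Y plus two chroma streams, in bits per second."""
    samples_per_second = luma_hz + 2 * chroma_hz
    return samples_per_second * bits_per_sample


rate = component_bit_rate(LUMA_HZ, CHROMA_HZ, BITS_PER_SAMPLE)
print(f"{rate / 1e6:.0f} Mbps")
```

The total covers the whole line and frame, including the blanking intervals, which is why ancillary data such as embedded audio fits within the same serial stream.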
2.6.2 Digital Video for Small-Screen Devices

For CIF and QCIF signals, the ITU provides for a lower chroma sampling structure of 4:2:0. This leads to the following representation:

Common Intermediate Format (CIF): 288 lines of luminance information (with 360 pixels per line) and 144 lines of chrominance information (with 180 pixels per line).

Quarter Common Intermediate Format (QCIF): 144 lines of luminance information (with 180 pixels per line) and 72 lines of chrominance information (with 90 pixels per line).

Table 2.7 lists the CCIR recommended video standards.
Table 2.7: ITU Video Standards.

                          CCIR 601 525/60    CCIR 601 625/50
                          NTSC               PAL/SECAM          CIF        QCIF
Luminance Resolution      720×480            720×576            352×288    176×144
Chrominance Resolution    360×480            360×576            176×144    88×72
Color Subsampling         4:2:2              4:2:2              4:2:0      4:2:0
Fields Per Sec            60                 50                 30         30
Interlacing               Yes                Yes                No         No
2.6.3 Interlaced Scanning vs. Progressive Scan for Small-Screen Devices

Small-screen devices (CIF and below) use progressive scan instead of interlaced scan, as shown in Figure 2.15. (Progressive scan is denoted by “p” and interlaced scan by “i”.)
Figure 2.15: Display on small-screen devices.
2.7 Video Bit Rate Reduction

SDI video at 270 Mbps is a commonly used standard for professional use in studios, broadcast systems, and a variety of other video handling environments for standard-definition video. However, for most transmission and broadcast applications there is a need to reduce the bit rate while maintaining acceptable quality. There are two ways in which the bit rate of video can be reduced: scaling and compression.

2.7.1 Scaling

In applications where a smaller window size can be used, the number of pixels, and consequently the number of bits required to carry them, can be reduced. This type of scaling is called spatial scaling.
Temporal scaling

Bit rates can be reduced for certain applications by reducing frame rates. This is particularly true for content where motion is limited (such as a news reader on TV). An example is RealVideo© streaming, which can drop the frame rate from 30 (or 25) fps to 15 fps or even lower. In mobile streaming, transmission conditions may force frame rates down to as low as 7–10 frames per second. At such rates video looks “jerky,” because the frame rate is too low for persistence of vision to provide a perception of continuous motion.
2.7.2 Video Compression

Compression of video is a complex process, and a number of techniques are employed to compress video by factors of 100 or more while maintaining quality for designated applications. The compression of video builds on the techniques for compression of pictures, such as JPEG compression using DCT. As each frame represents largely the same picture, with motion in some areas of the picture, the techniques of frame prediction and interpolation are used in addition to the compression of the picture itself. All the compression techniques take advantage of the redundancies that are present in the video signal to reduce the bit rates of video for use in digital TV, mobile TV, IPTV, and other networks.

Compression of video can be lossy or lossless. In the case of lossy compression (such as dropping of bits or coefficients in the compression algorithms), the original picture cannot be restored exactly.

Spatial redundancy

In normal pictures, there are areas where the pixels all depict the same object, e.g., sky or clouds. In such cases, the variation from one pixel to another is minimal, and instead of describing each pixel with all its Y and color information bits, the pixels can be coded more compactly by exploiting this redundancy. A code such as Run-Length Encoding (RLE), for example, replaces a run of identical values with a single value and a count.

Temporal redundancy

In the case of motion, each frame has some pixels that have changed with respect to the previous frame. However, this is not the case for all pixels in the frame, many of which carry the same information, as the frame rate is quite high (e.g., 25–30 frames per second). Hence conveying all the information of a frame, as if it were totally unrelated to the previous frame, is unnecessary. Only the “change” information (denoted as motion vectors) between one frame and another needs to be conveyed.
It is also possible to predict some frames based on the motion vector information. A frame for which all information is carried is called an I-frame; frames that are predicted using motion vectors from previous frames are called P-frames, per the notation used in MPEG-2 compression. There is another type of predicted frame called the B-frame, which is predicted from I- and P-frames using the previous as well as the next frames as references. Temporal or interframe compression is possible owing to the large amount of common information between frames, which is carried using only motion vectors rather than full frame information.

Perceptual redundancy

The human retina and the visual cortex are inherently able to distinguish the edges of objects with far greater acuity than fine details or color. This characteristic of human vision is used to advantage in object-based coding in some higher compression protocols such as MPEG-4, which use contour-based image coding.

Statistical redundancy

In natural images, not all parameters occur with the same probability. This fact can be used to code frequently occurring parameters with fewer bits and less frequently occurring parameters with a larger number of bits. This type of coding enables the carriage of a greater number of pixels with fewer bits, thereby reducing the bit rate of the stream. Variable-length coding of this kind, of which Huffman coding is a widely used example, is common in compression algorithms.

Scaling: reducing pixel count

An important parameter for the bit rate of a signal is the number of pixels that need to be carried, as each pixel may need to be encoded with up to 24 bits. As an example, although standard-definition video is 720×480 (345.6 K pixels), the MPEG-1 format, which is used to carry “VCD quality” (SIF) video, uses a resolution of only 352×288 (101.4 K pixels), thus reducing the number of pixels to be carried to less than one-third. Video conferencing over multiple 128 Kbps ISDN telephone lines (H.261) employs only a quarter of the SIF pixels by using a resolution of 176×144 (25.3 K pixels per frame). This is lower by a factor of more than 13 as compared to standard-definition video (Table 2.8).

Table 2.8: Bit Rates for Small Screen Devices.
S.No    Compression Format    Picture Representation        Application            Bit Rate
1       MPEG-1                352×288 SIF                   Video CD               0–1.5 Mbps
2       MPEG-2                720×480 CCIR                  Broadcast TV, DVD      1.5–15 Mbps
3       MPEG-4                176×144 QCIF, 352×288 CIF     Internet, Mobile TV    28.8–512 Kbps
4       H.261                 176×144 QCIF, 352×288 CIF     Video Conferencing     384 Kbps–2 Mbps
5       H.263                 128×96 to 720×480             Video Conferencing     28.8–768 Kbps
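The run-length and statistical coding ideas described under spatial and statistical redundancy can be illustrated with a short sketch. It is not the entropy coder of any particular MPEG standard; the scan line values and the function names are hypothetical, chosen only to show that runs collapse into (value, count) pairs and that frequent symbols receive codes no longer than rare ones.

```python
import heapq
from collections import Counter
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # a single symbol still needs one bit
        return {next(iter(counts)): 1}
    # Heap entries: (weight, tie_breaker, {symbol: depth so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)         # the two least frequent subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical scan line: mostly "sky" (value 200) with a short darker patch.
line = [200] * 12 + [90] * 3 + [200] * 5
runs = run_length_encode(line)
lengths = huffman_code_lengths("aaaabbc")
print(runs)      # [(200, 12), (90, 3), (200, 5)]
print(lengths)   # the most frequent symbol 'a' gets the shortest code
```

Twenty pixel values shrink to three pairs in this example, and the symbol that occurs four times out of seven is given a one-bit code while the rarer symbols get two bits each.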
Figure 2.16: Compressing video.
Now it is easy to visualize the two processes involved in reducing bit rates, i.e., scaling and compression. In our example, SD video (720×480) with 345.6 K pixels per frame at 25 frames per second requires the transmission of 8.64 megapixels per second. For mobile TV with QCIF resolution of 176×144 (25.3 K pixels per frame) at 15 frames per second, the required transmission rate is only about 380 K pixels per second. By scaling the picture and the frame rate, the pixel rate has thus been reduced from 8.64 megapixels to 0.38 megapixels per second, a reduction of approximately 23 times.

The final bit rate is of course determined by how many bits are required to carry each pixel. This is the domain of compression. The pixels are now ready to be subjected to compression, which begins with the formation of 8×8 blocks and the application of the DCT process, followed by Huffman coding, RLE, object-based coding, and so on, depending on the compression protocol employed. Once the entire process is completed, a bit rate as low as 64 kbps is needed to carry the information, which would otherwise have needed 9.12 Mbps to carry even the scaled-down video rate of 0.38 megapixels per second at 24 bits per pixel.
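The arithmetic in this example can be reproduced in a few lines. The sketch below uses only the figures quoted in the text (frame sizes, frame rates, 24 bits per pixel, and a 64 kbps target); the function and variable names are illustrative.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for a given frame size and frame rate."""
    return width * height * fps


sd = pixel_rate(720, 480, 25)      # SD example from the text
qcif = pixel_rate(176, 144, 15)    # mobile TV example from the text

scaling_factor = sd / qcif                  # reduction achieved by scaling alone
raw_qcif_bps = qcif * 24                    # uncompressed, 24 bits per pixel
compression_ratio = raw_qcif_bps / 64_000   # against a 64 kbps stream

print(sd, qcif)                        # 8640000 380160
print(round(scaling_factor, 1))        # 22.7
print(round(raw_qcif_bps / 1e6, 2))    # 9.12
print(round(compression_ratio))        # 143
```

Scaling accounts for a factor of about 23, and compression must then supply a further factor of roughly 140 to reach 64 kbps.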
2.7.3 MPEG Compression

MPEG stands for the Moving Picture Experts Group; compression standards formulated by MPEG have been widely used and adopted as international standards.
Figure 2.17: MPEG compression process.
Compressing within a frame

The DCT and quantization process for each frame is the same as that used for still images. Each 8×8 block of pixels is transformed by the DCT into another 8×8 block, which now contains frequency coefficients. For color pictures, a macroblock comprises four blocks of luminance and one block each of U and V color. The upper-left corner of the coefficient block contains the lower frequencies, and these are picked up for transmission. The lower-right corner contains higher-frequency content that is less discernible by the human eye. The number of coefficients dropped is one of the factors determining the compression. If no coefficient is dropped, the compression of the picture is essentially lossless and can be reversed by the inverse discrete cosine transformation process.

Compressing between frames

Compression between frames is used so that not all 30 frames (NTSC) need to be transmitted in full every second. This is called “temporal compression.” For this purpose, the video is divided into series of frames, each called a “group of pictures.” The group of pictures carries three types of frames.

Intraframe or I-Frame: These frames are coded based on the actual picture content in the frame. Thus, each time an I-frame is transmitted, it contains the full information for the picture in the frame, and the receiving decoder can generate the picture without reference to any previous or next frames.

Predicted Frame or P-Frame: Generated from the previous I- or P-frames by using the motion vector information to predict the content.

Bidirectional Frame or B-Frame: Generated by interpolation of past and future I- and P-frame information using motion vector information. Because a B-frame depends on a frame that comes later in display order, the encoder holds frames in memory and reorders them for transmission so that the decoder receives the reference frames before the B-frames that depend on them.
The degree of temporal compression depends on the ratio of I-frames to B- and P-frames transmitted. This depends on the type of source video content and can be set in the encoders. The data rate is lowered because a B-frame contains only about half the data contained in an I-frame, and a P-frame only about one-third.
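The intra-frame step described in this section (transform an 8×8 block, keep the low-frequency coefficients, invert) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the quantizer of any MPEG standard: the block is a hypothetical smooth gradient, and zeroing coefficients by index is a crude stand-in for real quantization tables.

```python
import math

N = 8  # block size used by JPEG/MPEG-style transforms


def _scale(k):
    """Orthonormal DCT scaling factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct2(block):
    """Forward 2-D DCT-II of an N x N block; result[u][v] is a frequency coefficient."""
    return [[_scale(u) * _scale(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2; result[x][y] is a pixel value."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def keep_low_frequencies(coeffs, keep):
    """Zero every coefficient whose index sum u + v is >= keep."""
    return [[c if u + v < keep else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


def max_abs_error(a, b):
    return max(abs(a[x][y] - b[x][y]) for x in range(N) for y in range(N))


# Hypothetical block: a smooth horizontal gradient, typical of sky or a wall.
block = [[100 + 2 * y for y in range(N)] for _ in range(N)]
coeffs = dct2(block)
lossless_error = max_abs_error(block, idct2(coeffs))
lossy_error = max_abs_error(block, idct2(keep_low_frequencies(coeffs, keep=3)))
print(round(coeffs[0][0], 1))        # DC coefficient: 856.0
print(lossless_error < 1e-9)         # True: nothing dropped, block recovered
print(lossy_error)                   # small but nonzero: high frequencies dropped
```

With all 64 coefficients kept the block is recovered to within floating-point rounding, while keeping only the six lowest-frequency coefficients still reproduces this smooth block closely, which is why dropping high-frequency content is an effective lossy step.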
Figure 2.18: Temporal compression in MPEG.
2.7.4 Motion Vectors and Motion Estimation

The techniques of spatial compression using DCT largely address the compression of still pictures. To compress video with moving images effectively, it is also necessary to employ techniques that directly target moving objects. Motion estimation is one such technique used in MPEG.

Motion estimation is done by comparing the position of picture elements (macroblocks) in one frame with their position in previous frames and estimating the direction and magnitude of motion, which is represented by motion vectors. The process is complex, and encoder quality is determined by how accurately the motion can be estimated.
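One simple way to estimate a motion vector is exhaustive block matching: try every displacement within a small search window and keep the one with the smallest sum of absolute differences (SAD). The sketch below is illustrative only; real encoders use faster search strategies and larger blocks, and the frame contents, block size, and search range here are hypothetical.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(current, reference, x, y, size=4, search=2):
    """Full search for the (dx, dy) whose reference block best matches the current block at (x, y)."""
    height, width = len(reference), len(reference[0])
    best = (0, 0)
    best_cost = sad(current, reference, x, y, x, y, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(current, reference, x, y, rx, ry, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(width, height, square_x, square_y, size=4):
    """Black frame with one bright square; a stand-in for a moving object."""
    return [[255 if square_x <= i < square_x + size and square_y <= j < square_y + size else 0
             for i in range(width)] for j in range(height)]


reference = make_frame(12, 12, 3, 3)   # object at (3, 3) in the previous frame
current = make_frame(12, 12, 5, 4)     # object has moved to (5, 4)
vector, cost = find_motion_vector(current, reference, x=5, y=4)
print(vector, cost)                    # (-2, -1) 0
```

The vector (-2, -1) tells the decoder to fetch the block from two pixels to the left and one pixel up in the reference frame; a cost of zero means no residual would need to be sent for this block.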
2.8 Compression Standards

A number of compression formats exist and are used depending on the application. The existence of many formats also reflects the historical evolution of compression technology, which has become more complex as the cost of processing-intensive chips has fallen.
2.8.1 MPEG-1

MPEG-1 (ISO/IEC 11172) was the first multimedia audiovisual coding standard. MPEG-1 was designed for CD-quality video and audio coding with a limited resolution of 352×288 for PAL and 352×240 for NTSC. The frame rates are 25 fps for PAL and 30 fps for NTSC, as in the analog systems, but MPEG-1 uses progressive scanning. It generates a compressed stream at rates up to 1.5 Mbps and has been largely used for VCDs. It uses the processes of DCT and RLE as well as motion estimation based on pixel motion. MPEG-1 provides for up to two audio channels and three layers of audio coding complexity (layer 1 to layer 3), of which layer 3 is the most popular and is known as MP3. The MPEG-1 standard does not address streaming formats.
2.8.2 MPEG-2

The MPEG-2 standard (ISO/IEC 13818) is a compression standard that was finalized in 1994 and is today the most widely used standard for broadcast TV as well as storage applications such as DVDs. MPEG-2 was designed to handle full-resolution video, including HDTV. It can generate bit rates from 1.5 to 15 Mbps for standard-definition video. The type of compression employed is defined through the use of MPEG-2 profiles. For transmission of broadcast-quality standard-definition video (CCIR 601), the “main profile at main level (MP@ML)” is used, which can generate bit rates up to 15 Mbps, although in practice bit rates of 2.5 Mbps may be adequate. For studio processing, the use of B- and P-frames is dispensed with and only I-frames are used, resulting in a compressed video stream at 50 Mbps. I-frame-only compression makes the compressed video suitable for frame-by-frame editing.

MPEG-2 transport frame

MPEG-2 provides a unique structure for the transport stream whereby the stream can carry any number of video, audio, and data channels, which are identified by their packet IDs (PIDs) and can be grouped into programs in any manner using the Program Association Table (PAT).
Figure 2.19: MPEG-2 transport stream.
MPEG-2 is also backward-compatible with MPEG-1 and has provision for carriage on different transport media, including streaming and ATM (asynchronous transfer mode) through an adaptation layer. MPEG-2 is the most widely used system in digital broadcasting today. The digitalization of analog TV transmission networks is based on the use of the MPEG-2 transmission format and frame structure. The MPEG-2 transport frame is also used in mobile TV broadcasting networks such as ATSC Mobile DTV and DVB-H, as you will see in later chapters.
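To make the transport stream structure more concrete, the sketch below decodes the fixed 4-byte header of a single transport stream packet. It relies on the standard packet layout (188-byte packets starting with the sync byte 0x47, a 13-bit PID, and the PAT carried on PID 0); the two packets are built by hand for illustration, and PID 0x100 is a hypothetical video PID rather than a value taken from the text.

```python
def parse_ts_header(packet):
    """Decode the 4-byte header of one 188-byte MPEG-2 transport stream packet."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a transport stream packet")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],      # 13-bit packet identifier
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


# Hand-built packets: header bytes followed by 184 bytes of zero padding.
pat_packet = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)     # PID 0 carries the PAT
video_packet = bytes([0x47, 0x01, 0x00, 0x17]) + bytes(184)   # hypothetical PID 0x100

pat_header = parse_ts_header(pat_packet)
video_header = parse_ts_header(video_packet)
print(pat_header["pid"], video_header["pid"])   # 0 256
```

A demultiplexer would read the PAT from PID 0 to learn which PIDs belong to which program, then route packets such as the second one to the matching video or audio decoder.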
2.8.3 MPEG-4

MPEG-4 follows an entirely different approach to video compression. The video objects and the background are considered as distinct and basic constituents of a picture. This is a departure from the approach used in the MPEG-1 and MPEG-2 standards of using only pixels and blocks to describe the picture. Under MPEG-4, the picture is analyzed so as to identify a single background (generally static) and a number of objects that are in motion. The objects are identified and compressed separately. Information on the motion of video objects is sent as part of the stream. The decoder then reconstructs the picture by combining the background and the individual video objects, including their motion.
The MPEG-4 algorithms, which were primarily oriented toward providing high compression and lower bit rates than MPEG-2, have subsequently found application in streaming video applications. To cater to the wide range of applications that are possible using MPEG-4, a number of profiles and levels are defined. Figure 2.20 shows the bit rates generated by MPEG-4 for various screen resolutions.
Figure 2.20: MPEG-4 profiles for mobile devices.
The MPEG-4 visual simple profile is the prescribed standard for video and audio transmission over mobile networks under 3GPP Release 5, as explained in the next chapter. In addition, the profiles for MPEG-4 have been enhanced to include the Advanced Simple Profile (ASP). The ASP provides for interlaced video to be coded using B-frames and global motion compensation.

MPEG-4 standards now also include scalable video coding (SVC) by adding the concept of enhancement layers. The basic level of encoding is the base layer, with image quality as per the MPEG-4 ASP (visual). One level of enhancement is provided by better picture quality per frame (also known as fine-grain scalability, or FGS). This increases the number of bits used to represent each picture or frame. The second layer of enhancement is provided by improving the frame rate, or temporal enhancement (called the FGS Temporal Scalability layer or FGTS).

As the MPEG-4 standards define a video object separately, it is possible to define three-dimensional (3D) objects as well, and this makes the MPEG-4 standard ideally suited for video handling in many applications such as video games and rich media.

Compression under MPEG-4 has a number of steps, some of which are:

Identification of video objects: The picture is broken up into separately identified video objects and the background.

Video object coding: Each video object is then coded. The texture coding within the object is handled using the DCT process.
2.8.4 Multimedia and Interactivity with MPEG-4

The high efficiency of video and audio coding achieved by MPEG-4 is not the only success factor that led to its increasing use in applications such as IP streaming or mobile TV. It is also tailor-made for interactive and multimedia applications. Why?

First, as it is based on object-based coding, it can deal separately with video, audio, graphics, and text as objects. Second, synthetic (and natural) objects can be created and incorporated in the decoded picture. Third, as it is based on object-based coding rather than frame-based coding, it provides flexibility in adapting to different bit rates. It is not limited by the need to transmit a certain number of frames per second, with repeated coding of the same objects in case of scene changes. This makes it ideally suited for mobile environments, where the user may travel from near a base station transmitter to the outer fringes and the usable bit rates may change considerably. Finally, it has a provision for scene coding called Binary Format for Scenes (BIFS), which can be used to recreate a picture based on commands. This implies that objects can be reordered or omitted, thus virtually recompositing a picture with objects, graphics, and text.

A picture can be rendered by adding or deleting streams. When such changes are made based on commands (termed Directed Channel Change or DCC), this can be used for a host of applications with powerful interactivity, such as targeted advertising. The BIFS information determines the source of the elementary streams in the final picture, and these can be different from those of the originating source.
Figure 2.21: Object-based decoding in MPEG-4.
MPEG-4 has 22 parts, which define various attributes of the standard, such as Delivery Multimedia Integration Framework (MPEG-4, part 6), Carriage Over IP Networks (part 8), and Advanced Video Coding (MPEG-4, part 10, now standardized as H.264/AVC).
2.8.5 MPEG-4 Applications

Use of MPEG-4 now spans all applications, from web-based video to digital television, production, and transmission. MPEG-4 provides a file structure in which .mp4 files can contain video, audio, presentation, images, or other information. MPEG-4 files may or may not contain audio. Files carrying only MPEG-4 audio are denoted by .m4a; files carrying AAC audio outside the MP4 container are denoted by .aac.
2.8.6 H.264/AVC (MPEG-4, Part 10)

The H.264 coding standard was the result of a joint development effort of the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) and was released in 2003. The standard was adopted by the ITU in May 2003 under the H.264 recommendation and by the ISO/IEC as MPEG-4 part 10 (ISO/IEC 14496-10) in July 2003.

The H.264 standard was oriented toward the twin objectives of improved video coding efficiency and better network adaptation (i.e., the coding is independent of the transmission network that will be used). These are achieved by distinguishing between two conceptual layers, i.e., the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL).

H.264/AVC represents a significant improvement over the previous MPEG-4 standard in terms of bit rates. The lower bit rates and the use of the Network Abstraction Layer make H.264/AVC ideally suited for use in wireless multimedia networks, CDMA, 3G, and other packet-based transport media. For mobile devices, 3GPP release 6 has adopted H.264 video coding as the standard for wireless and mobile broadcast networks; 3GPP release 5 was limited to the use of the MPEG-4 visual simple profile.

H.264 enables the transmission of video at bit rates about half of those generated by MPEG-2. This, together with better network layer flexibility and the use of TCP/IP and UDP protocols, is leading to its increasing use in DSL/ADSL networks for IPTV as well as in conventional broadcast networks, which are today completely dominated by MPEG-2. In the coming years, with reduction in the cost of encoding and decoding equipment, the transition to H.264 is expected to be significant.

The comparison in Figure 2.22 reflects the bit rates and storage requirements using the MPEG-2, MPEG-4 (advanced simple profile or ASP), and H.264 standards. MPEG-4 can deliver HD content at 7–8 Mbps as opposed to 15–20 Mbps using MPEG-2.

H.264 has been adopted as a standard in both the HD-DVD and Blu-ray Disc formats. It has also been built into Apple QuickTime 7 (and higher versions) as a video codec.
Figure 2.22: Performance comparison of a 120-minute DVD-quality movie at 768 Kbps.
2.8.7 H.264/AVC Encoding Process

In the H.264 encoding process, a picture is split into blocks. The first picture in a sequence is coded as an I-frame, without reference to any other picture. The remaining pictures in the sequence are then predicted using motion estimation and motion compensation. Motion data comprising the displacement of each block relative to the reference frame (spatial displacement) is transmitted as “side information” and is used by both the encoder and the decoder to arrive at the predicted frame (called the inter-frame). The residual information (the difference between the original block and its intra- or inter-prediction) is then transformed, scaled, and quantized. The quantized transform coefficients are then entropy-coded and transmitted together with the prediction information. In the encoder, the quantized coefficients are also inverse-scaled and inverse-transformed to generate the decoded residual information. This residual is added to the prediction, and the result is fed to a deblocking filter to generate the decoded video, which serves as the reference for subsequent predictions.
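The predict, subtract, quantize, and reconstruct loop described above can be sketched on a single row of samples. This is a deliberately simplified illustration and not the H.264 algorithm: the transform, entropy coding, and deblocking stages are omitted, and the sample values and quantizer step are hypothetical.

```python
def encode_block(current, prediction, step):
    """Quantize the residual (current minus prediction) to integer levels."""
    return [round((c - p) / step) for c, p in zip(current, prediction)]


def reconstruct_block(levels, prediction, step):
    """Rebuild samples as the decoder would: prediction plus de-quantized residual."""
    return [p + level * step for level, p in zip(levels, prediction)]


current = [52, 55, 61, 66, 70, 61, 64, 73]      # samples of the block being coded
prediction = [50, 54, 60, 64, 68, 62, 65, 70]   # motion-compensated prediction
step = 4                                        # hypothetical quantizer step size

levels = encode_block(current, prediction, step)
decoded = reconstruct_block(levels, prediction, step)
worst = max(abs(d - c) for d, c in zip(decoded, current))
print(levels)          # mostly zeros, which entropy coding represents cheaply
print(decoded, worst)  # reconstruction error is bounded by half the step size
```

The encoder runs the same reconstruction as the decoder so that both predict the next picture from identical reference data, which prevents quantization errors from accumulating over a sequence.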
2.8.8 TV and Video

At this point, it is also important to understand the distinction between TV and video.
Figure 2.23: H.264 encoding.

Table 2.9: H.264/AVC Profiles.

Level 1      15 Hz QCIF at 64 Kbit/s
Level 1b     15 Hz QCIF at 128 Kbit/s
Level 1.1    30 Hz QCIF at 192 Kbit/s
Level 1.2    15 Hz CIF at 384 Kbit/s
Level 1.3    30 Hz CIF at 768 Kbit/s
Level 2      30 Hz CIF at 2 Mbit/s
Level 2.1    25 Hz 625HHR at 4 Mbit/s
Level 2.2    12.5 Hz 625SD at 4 Mbit/s
Level 3      25 Hz 625SD at 10 Mbit/s
Level 3.1    30 Hz 720p at 14 Mbit/s
Level 3.2    60 Hz 720p at 20 Mbit/s
Level 4      30 Hz 1080 at 20 Mbit/s
Level 4.1    30 Hz 1080 at 50 Mbit/s
Level 4.2    60 Hz 1080 at 50 Mbit/s
Level 5      30 Hz 16VGA at 135 Mbit/s
Level 5.1    30 Hz 4Kx2K at 240 Mbit/s
The TV world is characterized by the use of interlaced video (for economy of bandwidth in transmission) and the use of specific frame rates, frame resolutions, and color information. In the analog domain, these manifest themselves as the NTSC, PAL, or SECAM standards and their variations. In the digital domain, the terms NTSC or PAL are strictly not applicable, as these terms also include the manner in which color information is handled using subcarriers. However, digital TV still retains the characteristics of the original analog TV signal, such as frame rate and pixels per frame (for example, 480i at 30 fps or 576i at 25 fps). NTSC signals are characterized by a resolution of 720×480 and a frame rate of 30 fps; PAL signals have a resolution per frame of 720×576 and a frame rate of 25 fps. TV signals, even after digitalization or compression, retain their source-specific format (such as NTSC or PAL) and can be used only in compatible environments unless converted. “TV” thus retains its distinct identity as against video, which is widely used in the Internet-dominated arena and is common across the world (Figure 2.24).
Figure 2.24: A TV signal maintains its identity of original format after compression.
The term “video” used here denotes the way it is handled on the Internet or displayed on common display devices such as CRT or LCD monitors. Computer monitors display video in a progressive format rather than an interlaced format. Similarly, on the Internet, the formats used for video follow Internet Engineering Task Force (IETF) standards, which are common across the globe. This is how streaming video can be displayed or websites opened worldwide without any additional thought to the standards underlying the video and audio.
2.9 The AVS-M Video Coding Standard (China)

China uses video and audio coding per the specifications defined by the Audio and Video Coding Standards (AVS) workgroup established by the Ministry of Information Industry (MII). The AVS standards, as they are known, have 10 parts, of which part 7 pertains to video coding for mobile devices. Part 7 of the AVS standard is popularly known as AVS-M.
2.9.1 AVS Standard Parts

Part 1: The AVS System
Part 2: Video
Part 3: Audio
Part 4: Conformance
Part 5: Reference Software
Part 6: Digital Media Rights Management
Part 7: Mobile Video
Part 8: Transmission of Video via an IP Network
Part 9: AVS File Format
Part 10: Audio and Speech Coding

The architecture of the AVS-M codec is very similar to that of the H.264 standard. AVS-M supports only progressive scanning of video and uses a 4:2:0 scheme for color components. Hence there is no concept of fields, and one picture is always one frame. Only two types of pictures are specified: I-pictures, which are derived from a full encoding of the frame, and P-pictures, which are predicted based on a maximum of two reference frames for forward prediction.

For the purpose of encoding, the picture is divided into macroblocks. A slice is a sequence of macroblocks in raster-scan order. Slices are always nonoverlapping. A macroblock is partitioned into six 8×8 blocks (four luma and two chroma). Alternatively, it can be partitioned into 24 4×4 blocks (16 luma and 8 chroma blocks). AVS-M uses VLC coding and a deblocking filter in a manner similar to H.264. Blocks of 4×4 are used for the integer cosine transform (ICT). For faster processing in the encoder, AVS-M uses a prescaled integer transform (PIT), where the scaling results are precalculated and available in the encoder.

The predicted pictures (P) need not be adjacent to their reference frames, which gives better error resilience. It is also possible to mark frames so that they will not be used as reference frames for prediction. This makes it possible to drop these frames, if temporal scaling is required, without affecting the predicted frames. Thus there is no cascading effect of dropping frames on video quality.
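The block counts quoted for a macroblock follow directly from 4:2:0 sampling. The sketch below assumes a 16×16 luma macroblock, which the text does not state explicitly; the function name is illustrative.

```python
def blocks_per_macroblock(block_size, mb_size=16):
    """Count (luma, chroma) blocks in one 4:2:0 macroblock; the 16x16 size is an assumption."""
    luma = (mb_size // block_size) ** 2
    chroma_plane = mb_size // 2                       # 4:2:0 halves chroma in both directions
    chroma = 2 * (chroma_plane // block_size) ** 2    # one plane each for Cb and Cr
    return luma, chroma


print(blocks_per_macroblock(8))   # (4, 2)  -> six 8x8 blocks
print(blocks_per_macroblock(4))   # (16, 8) -> twenty-four 4x4 blocks
```

Both partitionings cover the same 256 luma and 128 chroma samples; the smaller block size simply divides them more finely.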
Table 2.10: AVS-M Profiles and Levels (JiBen Profile).

Level    Screen Size                          Maximum Bit Rate    Maximum Frame Rate
1        SQCIF (128×96) or QCIF (176×144)     64 Kbps             30 fps for SQCIF, 15 fps for CIF
1.1      SQCIF (128×96) or QCIF (176×144)     128 Kbps            30 fps for SQCIF, 15 fps for CIF
1.2      CIF (352×288)                        384 Kbps            15 fps
1.3      CIF (352×288)                        768 Kbps            30 fps
2        CIF (352×288)                        2 Mbps              30 fps
2.1      352×480 or 352×576                   4 Mbps              30 fps or 25 fps
2.2      352×480 or 352×576                   4 Mbps              30 fps or 25 fps
3        VGA (640×480)                        6 Mbps              30 fps
3.1      D1 (720×480 or 720×576)              8 Mbps              30 fps or 25 fps
As is the case with H.264, specific levels and profiles have been standardized to promote interoperability amongst AVS-M systems. Under AVS-M a “JiBen Profile” has been defined, and has nine levels. Table 2.10 lists these levels. The encoders and decoders for AVS-M can be implemented by using the software provided in the standard. The implementations give performance similar to H.264 but at a relatively lower cost.
2.10 Video Files

Video is not always transmitted after compression. In general, it is also necessary to store video. A number of file formats are used in the multimedia industry. Many of the file formats have their origin in the operating systems used and the manner in which the files were sampled and stored on computers. Others are based on the compression standard used. Conversions between file formats are today easily done using a variety of available software.

2.10.1 The Windows AVI Format (.avi)

AVI (Audio Video Interleave) is the de facto standard for video on Windows-based machines, where the codecs for generating AVI video are built in. The name describes how audio and video are carried: interleaved within the same file.
Figure 2.25: AVI format.
AVI is generated through sampling of audio and video input and does not have any significant compression. For this reason, AVI files are used for storage but not for transmission over networks.
2.10.2 Windows Media Format (.wmv)

The Windows Media Format is a proprietary format of Microsoft and is used with Windows Media 9 codecs and players. Despite being proprietary, it is used extensively owing to the wide use of Windows machines. The use of .wmv files on other machines such as Macs requires Windows Media software.

2.10.3 MPEG (.mpg) Format

As the name suggests, the MPEG format denotes video and audio compressed as per MPEG-1 or MPEG-2 compression. Motion JPEG (MJPEG) files are also represented by .mpg files. MPEG being an international standard, both Windows and Mac operating systems provide native support for MPEG.

2.10.4 QuickTime™ (.mov) Format

QuickTime™ is a proprietary format from Apple Computer. It is widely used in the industry for audio and video as well as graphics and presentations. However, it is closely aligned to the standards at its core and has MPEG-4 as the base in QuickTime 6 and H.264/AVC in QuickTime 7. Due to their friendly and advanced features, QuickTime players are available for most operating systems.

2.10.5 RealMedia™ Format (.rm)

The RealMedia format has gained popularity through the extensive use of RealMedia players and servers on the Internet. The basic versions of RealMedia Producer, Server, and Player have been available as free downloads (such as the free player RealPlayer® 10), which has also contributed to their widespread use. Full-length movies and music through the Rhapsody® music store are now available. Most websites support content hosted in the RealMedia format, and for this reason it is almost mandatory for any device accessing the web to support RealMedia.
2.10.6 Flash Video™ (.flv)

The Flash Video format has gained popularity as one of the most extensively used formats for web-based video delivery, both for streaming and download. Flash Video can be played by Adobe Flash players (Adobe Flash Player 10) as well as web browser plug-ins. Flash Video is used by websites such as YouTube®, Google Video®, Yahoo Video®, and many others. Flash Video players can be downloaded and installed free, and their popularity has led to these players being installed on an estimated 80% of computers connected to the Internet. Third-party players that support DirectShow®, such as Windows Media, QuickTime (with the Perian® plug-in), and the VLC media player, also support Flash Video.

Originally, Flash Video was created using a proprietary variant of the H.263 codec (called Sorenson Spark). However, version 10 of Flash Video supports video compression in H.264 and audio in MP3 or AAC, making it an open format.
2.10.7 The DivX Format DivX is a media format (and a media player) that is generated using DivX codecs. The name owes its identity to DivX, Inc., the company that originally introduced these codecs, which can compress lengthy videos into manageably sized files. These files can be played by DivX players. The DivX codec can also be used as a plug-in for a variety of players such as Windows Media Player. DivX files are denoted by the file extension .divx and denote a media container that contains multiple video streams, multiple audio tracks, subtitling (in the DivX-specific XSUB format), interactive video menus, and so on. Although the original DivX codec used proprietary encoding, release 7 of DivX includes MPEG-4 video and AAC audio (amongst other formats) in the .divx containers. DivX has multiple profiles based on the screen resolution needed; they span the range from mobile devices to HD content (Table 2.11). An alternative container format used for DivX content is the Matroska Multimedia Container format. Files in this format are denoted most often by the file extension .mkv. The Matroska container format is characterized by its capability to store a large number and variety of audio, video, subtitling, and metadata streams, and features an interactive menu for access to multimedia content similar to DVDs. The latest version of DivX (DivX Plus HD) can play all DivX video formats, including content in the HD (.mkv) format. Most well-known players such as ALL player®, VLC media player®, DivX player, and Media Player Classic provide native support to content in Matroska format. DivX Player is available for free download from the divx.com website. A player with more enhanced features (DivX Pro) is available for download for about $20 and features conversion to DivX video, advanced encoding features, and a DivXPlus™ HD encoding profile. DivX encoding can compress typical DVD content of 4.7 GB into a file of about 700 MB, which makes it a good option for Internet-based downloads.
Table 2.11: DivX Profiles.
DivX Profile | Application | Resolution | Bit Rate (Average–Peak)
6.5 and above | High Def | 1920×1080, 30 fps | 4–20 Mbps
4 and above | High Def | 1280×720, 30 fps | 4–20 Mbps
3.11 and above | Standard Def, Home Theater | 480×720, 30 fps; 576×720, 25 fps | 4–8 Mbps
5 and above | Mobile | 320×240, 30 fps | 0.6 Mbps
2.10.8 XVID Format Xvid is a family of video codecs developed by some erstwhile DivX staff who had embarked on a separate line of development. It is based on the use of MPEG-4 part 2 (advanced simple profile) for encoding. Xvid is free software available for a number of platforms. It can be downloaded from www.xvid.org. Like DivX, Xvid codecs provide high compression for video. Xvid is available for mobile devices and Xvid mobile players can be downloaded free.
2.10.9 MXF File Format The MXF (material exchange format) is an SMPTE standard (SMPTE 377M) that has been in use in professional video equipment such as video servers or nonlinear editing machines (NLEs). The MXF defines a file container that can carry multiple types of video and audio and is thus an open file standard. Having a common file wrapper (defined by MXF) eases the transport and storage of video content. MXF files are denoted with a file extension .mxf. It is possible to use multivendor NLEs, servers, cameras, and other devices by exchanging the content using MXF format. The MXF format also effectively captures and transfers metadata, which is very important in a multidevice environment. MXF can also store files in a streamable format. This is made possible by index tables that maintain partitioning for streaming or for file transfer.
Figure 2.26: MXF file format.
However, not all MXF files are interchangeable, despite the original intention of having a common format. This is due to variants in subformats of MXF files created by certain equipment, such as the Panasonic DVCPRO-V2.
Quick FAQs Multimedia Formats
1. Where is DivX format most commonly used? The most common use of DivX format is for online transfer of movies and in home theater systems. Its advantage lies in manageable file sizes for even HD content. It is also used in consumer devices such as Sony’s PlayStation Portable (PSP®) and many video games.
2. If MXF is a common media format, does it mean that the source of files such as PAL or NTSC content is no longer relevant? NTSC and PAL are specifically used to denote the analog signal format. However, the MXF provides only a container for the format for common transport. The source formats such as 480i/30 fps or 576i/25 fps remain the same as the original content source.
3. Is the AVS-M format of China compatible with MPEG4? No; even though both formats are very similar, they are not compatible. However, it is interesting to note that China has chosen to use H.264 for video and HE-AAC V2 for audio in its standard for mobile TV (CMMB).
4. What is the official audio coding standard for China? China uses audio coding per the DRA standard, which constitutes the official Chinese standard for audio.
5. Why does MPEG4 not have its own transport stream like MPEG2? MPEG4 is designed for delivery over multiple types of media, including IP-based transport. An abstraction layer is used to logically separate the transport from the compression standard.
2.11 File Containers and Wrappers As discussed earlier in the chapter, there are many standards for compression of video and its storage in files. Some of these formats are proprietary (such as Windows Media and RealVideo); others are based on different ways of “wrapping” video and audio compressed per one of the open standards (MPEG2, MPEG4, AAC, MP3, etc.). The file wrapper specifies the types of video and audio that are present along with the metadata such as title, author, or other details. The players for video frequently work with multiple types of file formats. Further, digital video can be a transmitted digital stream (e.g., being transmitted from an ATSC transmitter), streamed over IP, or saved in the form of a file. We find that even though these different variants may look bewildering at first sight, they are based on a few common standards. Figure 2.27 represents the architecture of digital video in different manifestations.
Figure 2.27: An overview of digital video and audio.
In case the video and audio need to be stored in a file format, the common method is to use a container for audio and video, which is then placed in a file wrapper that describes the types of audio and video content and metadata. These stored files are then recognized by appropriate wrappers such as QuickTime, Flash Video, MP4, DivX, ASF, and others. Moreover, even these containers can carry video and audio in many alternative formats. Figure 2.28 further elaborates on how the video and audio content is handled after compression. The MPEG-2 transport stream may, for example, be transmitted via a terrestrial transmission system using ATSC (or DVB-T, or ISDB-T), or a satellite or cable system. The stored files can be played by an appropriate player or streamed by a media streamer.
2.11.1 File Format Converters File format conversion is frequently required in view of multiple standards of video capture, storage, editing, and transmission. In most cases in which the file formats are in an open standard, the file format conversion is a straightforward process converting the “wrapper” to the desired format such as QuickTime, AVI, MXF, and so on. An example of such a format
Figure 2.28: Overview of digital video formats, continued.
converter is the XFconverter™ 1.1 from OpenCube Technologies. It provides compatibility with different container formats such as AVI, MXF, GXF, QuickTime, Wave, and so on, and can handle video and audio essences in all common formats such as MPEG2, MPEG4, DV, DVCPro, DVCProHD, and many other formats. An easy-to-use GUI provides an excellent way to manage file wrapper conversions.
2.12 Audio Coding There are many ways to represent audio, depending on whether the audio is compressed or uncompressed, and the standard used for compression. Many of these formats have a historical origin based on use (e.g., telecommunications systems such as PCM) or the operating systems of the computers used. The audio standard used also depends on the application. Music systems require high-fidelity audio—i.e., 20 Hz–20 KHz with two or more channels—and mobile phones use 100 Hz–4 KHz for speech.
2.12.1 Audio Sampling Basics The range of frequencies audible to the human ear is from 20 Hz to 20 KHz. In order to handle this audio range digitally, the audio needs to be sampled at a rate of at least twice the highest frequency. The rates of sampling commonly used are as follows:
● Audio CDs: 44.1 KHz at 16 bits per sample per channel (1.411 Mbps for stereo)
● DATs (Digital Audio Tapes): 48 KHz at 16 bits per sample
● DVDs: 48–192 KHz at 16–24 bits per sample
The large number of bits needed to code audio is due to the large dynamic range of audio of over 90 dB. Using a smaller number of bits leads to higher quantization noise and loss of fidelity.
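The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the dynamic-range estimate uses the common rule of thumb of roughly 6 dB per bit of linear PCM, which is an assumption added here rather than something stated in the text.

```python
import math


def pcm_bit_rate_bps(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def approx_dynamic_range_db(bits_per_sample):
    """Rule-of-thumb dynamic range of linear PCM: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits_per_sample)


# Audio CD: 44.1 KHz, 16 bits per sample, two channels
cd_rate = pcm_bit_rate_bps(44_100, 16, 2)
print(cd_rate / 1e6)                           # 1.4112 (Mbps)
print(round(approx_dynamic_range_db(16), 1))   # 96.3 (dB)
```

With 16 bits per sample the estimate comes out a little above the 90 dB range mentioned in the text, which is consistent with the choice of 16 bits or more for high-fidelity audio.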
Figure 2.29: Sampling and coding of analog audio.
The process of sampling and coding generates Pulse Code Modulated (PCM) audio. PCM audio is the most commonly used digital audio in studio practice. From the perspective of mobile TV, it is useful to distinguish between music (stereo audio of CD quality) and voice (mono and limited in bandwidth to 4 KHz). Some of the sampling rates commonly used are given in Table 2.12.
Table 2.12: Audio Sampling Rates.
S Number | Audio Source | Frequency Band | Sampling Rate
1 | Speech Telephony | 200 Hz to 3.4 KHz | 8 KHz
2 | Wideband Speech | 100 Hz to 7 KHz | 16 KHz
3 | Music | 50 Hz to 15 KHz | 32 KHz
4 | Music (CD Quality) | 20 Hz to 20 KHz | 44.1 KHz
5 | Music (Professional and Broadcast) | 20 Hz to 20 KHz | 48 KHz
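Each row of Table 2.12 follows the rule from Section 2.12.1 that the sampling rate must be at least twice the highest frequency in the band. The following sketch simply checks that rule against the table values; the list and function names are hypothetical.

```python
# (audio source, upper band edge in Hz, sampling rate in Hz), from Table 2.12
TABLE_2_12 = [
    ("Speech telephony", 3_400, 8_000),
    ("Wideband speech", 7_000, 16_000),
    ("Music", 15_000, 32_000),
    ("Music (CD quality)", 20_000, 44_100),
    ("Music (professional and broadcast)", 20_000, 48_000),
]


def satisfies_nyquist(upper_freq_hz, sampling_rate_hz):
    """True if the sampling rate is at least twice the highest frequency."""
    return sampling_rate_hz >= 2 * upper_freq_hz


results = {name: satisfies_nyquist(f, fs) for name, f, fs in TABLE_2_12}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'undersampled'}")
```

Every row passes, whereas a 20 KHz band sampled at 32 KHz would not.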
2.12.2 PCM Coding Standards Owing to the logarithmic nature of the human ear in perceiving audio levels and the wide dynamic range involved, PCM coding is usually done using logarithmic coding. The A-law and μ-law codecs that have been standardized by the ITU under recommendation G.711 form the basis of digital telephony. The A-law codec is used internationally; the μ-law codec is used in the United States, Canada, and Japan, amongst others. Both coding standards are similar and provide for different step sizes for quantization of audio. The small step size, near zero (or low level), helps code even low-level signals with high fidelity while maintaining the same number of bits for coding. A-law voice at 64 Kbps and μ-law voice at 56 Kbps are most commonly used in digital fixed-line telephony.
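The effect of logarithmic coding can be illustrated with the continuous μ-law companding curve. This is a minimal sketch under stated assumptions: it uses the textbook formula with μ = 255 followed by a plain uniform 8-bit quantizer, whereas actual G.711 codecs use a piecewise-linear segment approximation of the curve. All function names are hypothetical.

```python
import math

MU = 255  # companding constant commonly quoted for mu-law


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize_8bit(y):
    """Uniformly quantize a value in [-1, 1] to one of 256 levels (0..255)."""
    return round((y + 1) / 2 * 255)


quiet = 0.01  # a low-level sample at 1% of full scale
print(mu_law_compress(quiet))                 # about 0.23 of the code range
print(mu_law_expand(mu_law_compress(quiet)))  # back to about 0.01
```

A sample at 1% of full scale is mapped to roughly 23% of the coded range, which is how the small step size near zero is obtained without using more bits. At 8 bits per sample and 8 KHz sampling, the resulting rate is 64 Kbps.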
2.12.3 Audio Interfaces When the audio is coded, such as when using PCM or a coder, it consists of a bitstream. There is a need to define audio interfaces that prescribe the line codes and formats for the audio information.
AES-3 audio The standards for the physical interface of audio have been standardized by the Audio Engineering Society (AES) and the European Broadcasting Union (EBU) under the AES-3/EBU. This physical interface provides for a balanced shielded pair cable that can be used up to around 100 meters. Due to the need to carry the signal on cable for such distances, the signals are coded with a line code. In the case of AES-3, a Non Return to Zero (NRZ) code is used with BPM (Biphase Mark) coding in order to recover the digital audio at the distant end. AES-3 can carry uncompressed or compressed audio and is most commonly used for carriage of PCM audio. Commonly used AES bit rates are as follows (for two audio channels):
● 48 KHz sampling rate: 3.072 Mbps
● 44.1 KHz sampling rate: 2.822 Mbps
● 32 KHz sampling rate: 2.048 Mbps
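The three bit rates listed above are consistent with an interface frame made up of two 32-bit subframes (one per channel) sent once per sampling period. The frame layout is not described in the text, so it is stated here as an assumption; the names in the sketch are hypothetical.

```python
SUBFRAME_BITS = 32        # assumed: one 32-bit subframe per channel sample
SUBFRAMES_PER_FRAME = 2   # two audio channels per frame


def aes3_bit_rate_bps(sampling_rate_hz):
    """Interface bit rate for two channels at the given sampling rate."""
    return sampling_rate_hz * SUBFRAMES_PER_FRAME * SUBFRAME_BITS


rates = {fs: aes3_bit_rate_bps(fs) for fs in (48_000, 44_100, 32_000)}
for fs, rate in rates.items():
    print(f"{fs / 1000} KHz -> {rate / 1e6} Mbps")
```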
2.13 Audio Compression In most applications, audio must be transmitted at rates that may range from 8–12 Kbps (cellular mobile) to 144 Kbps (stereo music). There is thus a need to compress audio. Audio compression is based on the characteristics of the human ear, which does not perceive all frequencies equally. The following are the main characteristics of the human ear that can be used to advantage when introducing compression.
Stereophonic perception limit The ear does not recognize sound as stereo below 2 KHz and hence sound can be transmitted as mono for frequencies below this threshold.
Hearing threshold When the sound level is low, the human ear is most sensitive only to the middle band. It is relatively insensitive to low-frequency and high-frequency sounds.
Temporal masking A high-level tone signal masks the lower-level signals near this tone (i.e., raises the threshold of hearing). Lower-level signals, which will not be heard, can be discarded in the compression process.
2.13.1 Audio Compression and Coding Principles Audio coders use perceptual compression on the twin basis of human ear perception and discarding of irrelevant data. Sub-band coding is the most common technique, whereby the spectrum is split into a number of sub-bands. The bands that would be masked by the louder components nearby are then discarded.
MPEG compression MPEG has developed standards for audio coding that are used widely in the industry. MPEG-1 audio coding uses the psychoacoustic model.
MPEG-1 For audio coding, the MPEG-1 compression standard is most popular; MPEG-1 Layer 3 (MP3) has been used widely over the Internet and in media players. MPEG-1 has three layers that denote increasing complexity of the compression and encoding process. The MPEG-1 Layer 1 is used in the digital compact cassettes; Layer 2 is based on the MUSICAM compression format. The MUSICAM format is also used in the Digital Audio Broadcasting Systems (which are replacements for analog FM broadcast systems). Layer 3 (known as MP3) is an Internet standard and is used in many popular MP3 players. The sampling rates provided for in MPEG-1 are 32, 44.1, and 48 KHz. MP3 is also used for audio associated with digital video in VCDs, which use MPEG-1. The MPEG-1 standard has the following components describing video and audio:
● Part 1 MPEG-1 Program Stream
● Part 2 MPEG-1 Video for CD
● Part 3 MPEG-1 Audio
● Part 4 Conformance
● Part 5 Reference Software
2.13.2 Advanced Audio Coding (AAC) (MPEG-2 Part 7) MPEG-2 has been the standard for digital broadcast TV since its introduction and is one of the most widely used standards in the industry. The Advanced Audio Coding Standard (AAC) was developed as an improvement over the MP3 audio standard (MPEG-1 Part 3). Three profiles were defined for AAC: low complexity (AAC-LC), main profile (AAC-Main), and scalable sampling rate profile (AAC-SSR).
2.13.3 Audio Codecs in MPEG-4 MPEG-4 audio coding constitutes a family of standards that cater to a wide range of bit rates. MPEG-4 audio coding brings in much more complex algorithms with superior compression. MPEG-4 encoding also generates audio in AAC format. MPEG-4 AAC is backward-compatible with MPEG-2 AAC. MPEG-4 AAC adds an additional tool called Perceptual Noise Substitution, which removes the coding of background noise to reduce the data rates. It also uses a tool called Joint Stereo Coding whereby similarity between the left and right audio channels is used to remove the redundancy between the channels. The redundancy between consecutive audio frames is reduced by using a tool called the Long-Term Predictor (LTP), which removes the stationary harmonic signals from the encoding cycle. The AAC standard is very popular because of its use in Apple’s iPod™ and the iTunes™ music store. MPEG-4 AAC provides better quality than MP3 at the same bit rates. It also supports coding of multichannel audio. AAC codecs have three profiles, based on the environment in which they are being used. These are the AAC-MP (main), the AAC-LC (low complexity), and the AAC-SSR (scalable sampling rate) profiles. The functionalities introduced in MPEG-4 include “Multiple Bit Rate Coding” and scalable coding. Variable bit rate coding algorithms are better suited to media where streaming is involved and fixed rates of delivery cannot be guaranteed. The new techniques introduced in MPEG-4 AAC include:
● Speech Codec HVXC: Stands for Harmonic Vector eXcitation Coding and is used to code speech at 2 Kbps and 4 Kbps.
● CELP Coder (Code Excited Linear Prediction): Provides encoding from 6 Kbps to 18 Kbps. The encoder has options for 8 KHz and 16 KHz sampling rates.
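The idea behind the Joint Stereo Coding tool described earlier in this section can be illustrated with mid/side coding, one common way of exploiting left/right similarity. This is a simplified sketch, not the AAC algorithm itself: a real encoder applies such a transform per frequency band and decides band by band whether to use it. The function names are hypothetical.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Nearly identical channels: the side signal is almost all zeros
left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = ms_encode(left, right)
print(side)                  # [0.0, 0.0, -0.125]
print(ms_decode(mid, side))  # ([0.5, 0.25, -0.75], [0.5, 0.25, -0.5])
```

When the two channels are similar, the side signal carries very little energy and needs few bits, which is where the saving comes from.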
MPEG-4 high-efficiency AAC V2 The HE-AAC V2 codec (or AAC-Plus, as it is popularly known) is an improvement over AAC coding in that it lowers the bit rate without degradation of quality.
Figure 2.30: MPEG audio formats.
Figure 2.31: MPEG-4 audio encoder bit rates.
This audio codec is very important, owing to its adoption by the DVB as well as standards bodies such as 3GPP and 3GPP2 for use on mobile and 3G networks. It is also the mandatory audio coding standard for Korea’s S-DMB mobile TV system as well as the Japanese ISDB-T mobile TV system. It is used extensively for music downloads over 3G and 2.5G networks.
In addition, it is used in the U.S. satellite radio service XM Satellite Radio and in other radio systems, such as Digital Radio Mondiale, which is the international system of broadcasting digital radio in the shortwave and medium-wave bands.
Figure 2.32: AAC encoder families.
The AAC encoding is improved in two steps called v1 and v2. HE-AAC v1 uses a technique called Spectral Band Replication (SBR) whereby the correlation between the high-frequency and low-frequency bands is used to replicate one from another. Version v2 goes further by adding another tool called the Parameterized Representation of Stereo (PS). In this technology, the stereo images of the two channels (L and R) are parameterized and transmitted as monaural information together with difference signals. These are then used to reconstruct the signal.
Structure of MPEG-4 audio files MPEG-4 files have an ISO container structure that contains metadata with content:
● MPEG-4 container file
● Song title
● Album cover
● …
● Audio
The audio files coded in MPEG-4 are denoted by an .MP4 or .M4A suffix. The MPEG-4 container has multiple parts, including the title and album cover (constituting all information in the signal being transmitted). It is possible to apply DRM to MPEG-4 audio.
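The ISO container structure mentioned above is built from "boxes", each starting with a 4-byte big-endian size and a 4-byte type code. The sketch below lists the top-level boxes in a byte string. It is a minimal illustration with hypothetical function names; the sample data is synthetic and is not a playable file.

```python
import struct


def list_top_level_boxes(data):
    """Return (type, size) for each top-level box in ISO container data."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        boxes.append((box_type.decode("ascii", "replace"), size))
        offset += size
    return boxes


def make_box(box_type, payload):
    """Build a synthetic box for demonstration purposes only."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = (
    make_box(b"ftyp", b"M4A \x00\x00\x00\x00")  # file type
    + make_box(b"moov", b"")                     # metadata (title, cover, ...)
    + make_box(b"mdat", b"\x00" * 4)             # coded audio data
)
print(list_top_level_boxes(sample))
# [('ftyp', 16), ('moov', 8), ('mdat', 12)]
```

In a real file, items such as the song title and album cover sit in boxes nested inside the metadata box, and the coded audio sits in the media data box.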
2.13.4 The AMR-WB+ Codec The extended adaptive multirate wideband codec (denoted by AMR-WB+) is derived from the AMR series of codecs as opposed to the MPEG encoding used in AAC. Mobile multimedia applications require an audio codec that can handle a wide range of content such as speech, sports, news, and music, and provide extremely low bit rates at consistent quality. Because the quality of received channels fluctuates widely, the audio codecs also need to adapt bit rates rapidly in order to maintain consistent quality. AMR-WB+ meets these requirements and provides bit rates in the range of 6 Kbps to 48 Kbps for stereo audio (up to 48 KHz sampling) by using the dual technologies of ACELP for speech and transform coded excitation (TCX) for audio.
Quick FAQs AMR-WB+ Codec
1. Where is the AMR-WB+ codec used? AMR-WB+ has been adopted for use in mobile broadcasting by both 3GPP and DVB. It is used in Packet Switched Streaming Service (PSS), Multimedia Broadcasting and Multicasting (MBMS), DVB-H (IP Datacasting [IPDC] over DVB-H), and MMS in mobile messaging. It is also used for podcasting, audio books, and commercial video clips.
2. Why is MP3, which is very popular for music, not preferred for mobile streaming? At the bit rates available in a mobile broadcasting channel (i.e., 8 to 48 Kbps), AMR-WB+ performs much better than MP3. Its adaptable bit rates make it more efficient for mixed speech and music content. AMR-WB+ tackles variable-quality mobile broadcast channels efficiently.
3. How is AMR-WB+ encoding done in practice? 3GPP Release 5 (and above) encoders support selection of AMR-WB+ as one of the methods of converting audio.
4. What is the Sony® ATRAC audio coding? ATRAC (for “adaptive transform acoustic coding”) is a proprietary coding scheme of Sony developed for portable audio players, e.g., using minidisks (ATRAC CDs). In order to maintain high fidelity, frequencies up to 22.5 KHz are covered, giving a stereo encoding rate of 292 Kbps. It is used in the Sony Walkman and in Walkman phones in some markets.
2.13.5 Proprietary Audio Codecs Some of the codecs used in the industry do not fall under the MPEG umbrella. The prominent ones include Windows Media, Apple QuickTime, and Real Audio. Windows Media 9 Players are available as a default on Windows-based machines and use Windows Media Codec version 9. A wide range of sampling and encoding rates can be selected depending on the application.
Apple QuickTime™ 9 supports a wide range of codecs including the choice of MPEG4. Some of the proprietary options include the Qualcomm PureVoice™ codec for speech encoding, the Fraunhofer IIS MP3 audio codec, and the QDesign Music codec for music. RealAudio from RealNetworks provides its proprietary audio codecs, which include the ATRAC3 codec jointly developed with Sony. The ATRAC3 codec provides high-quality music encoding from 105 Kbps for stereo music.
2.14 Streaming Streaming of content such as video became a popular technology alongside the growth of the Internet in the 1990s. The alternative was to download a file (which can be 20 Mbytes, even with MPEG-4 compression for three minutes of play). But the wait time for download was generally unacceptable. In streaming mode, video and audio are delivered to the users of mobiles or other devices at the same rate (on the average) at which it is played out. For example, for a connection at 128 Kbps, video at 64–100 Kbps can be streamed continuously, giving the user effectively live access to multimedia content. Streaming is made possible by high compression codecs together with the technology to “stream” content by converting a storage format to a packetized format (i.e., UDP packets) suitable for delivery over the Internet or IP networks. In principle, there are two approaches to streaming. It is possible to receive video, audio, and web pages using HTTP itself (i.e., without the use of any special protocol). This is referred to as HTTP streaming and is possible if the delivery channel is capable of sustained HTTP data delivery at the required bit rates. A more efficient approach is by using real-time streaming, which uses the standard IETF protocols RTP and RTSP. In addition, there are proprietary formats for streaming, e.g., Apple QuickTime Server, RealTime Server, Windows Media, and Flash Video streaming server.
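The trade-off between downloading and streaming described above comes down to simple arithmetic. The sketch below uses the figures from the text (a 20-Mbyte clip, a 128 Kbps connection, video at up to 100 Kbps); the function names are hypothetical, and 1 Mbyte is taken as 10^6 bytes, which is an assumption.

```python
def download_time_s(file_size_bytes, link_rate_bps):
    """Seconds needed to fetch a whole file before playback can begin."""
    return file_size_bytes * 8 / link_rate_bps


def can_stream(media_rate_bps, link_rate_bps):
    """True if the link can sustain the average media rate."""
    return media_rate_bps <= link_rate_bps


clip_bytes = 20 * 10**6                       # 20-Mbyte clip, 3 minutes long
wait = download_time_s(clip_bytes, 128_000)   # over a 128 Kbps connection
print(wait / 60)                              # about 20.8 minutes of waiting

clip_rate = clip_bytes * 8 / 180              # average rate of that clip
print(can_stream(clip_rate, 128_000))         # False: about 889 Kbps needed
print(can_stream(100_000, 128_000))           # True: 100 Kbps video fits
```

The same link that makes the user wait over 20 minutes for the download can carry a 64–100 Kbps stream continuously, which is why streaming relies on high-compression codecs.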
2.14.1 Streaming Network Architecture Streaming involves the following steps:
● Capture and encoding of content
● Conversion to streaming format
● Stream serving
● Stream transport over IP networks
● Media player
Complete streaming and delivery solutions have been developed by RealNetworks, Microsoft Windows Media Platforms, and Apple QuickTime multimedia. All of these are widely used. Formats such as QuickTime have support for MPEG-4 coding.
2.14.2 The Capture and Encoding Processes The capture of video involves the acceptance of a video and audio stream in a format that is compatible with the video capture card of the PC or server. The input streams can be in uncompressed or compressed format. After compression, the files are stored in the appropriate compressed format, such as .mpg or .mp4, depending on the encoder used.
2.14.3 File Conversion to Streaming Format In order for the files to be delivered via real-time streaming, they need to have timing control information, which can be used by the server to manage the delivery rate. For this purpose, the files are converted to the streaming format, which adds the timing control information as well as metadata to help the orderly delivery of streaming data for a variety of applications. QuickTime uses a feature called Hint Tracks to provide control information that points to the streamed video and audio information.
2.14.4 Stream Serving Stream serving is a specialized application that is used in a client–server mode to deliver a continuous series of packets over the IP network to the client. The streaming application uses real-time multimedia transport protocols that have been developed by the IETF. These include the Real-time Transport Protocol (RTP), the RTP Control Protocol (RTCP), and the Real Time Streaming Protocol (RTSP). The streaming process involves two separate channels that are set up for the streaming session. The data channel provides for the transfer of the video and audio data, whereas the
Figure 2.33: Streaming protocol stack.
control channel provides feedback from the streaming client (i.e., the media player) to the server. The video and audio data that forms the bulk of the transfer in the streaming process is handled by the RTP using UDP and IP as the underlying layers. Hence the data is delivered as a series of datagrams without needing acknowledgments, making it very efficient. The client provides information such as the number of received packets and the quality of the incoming channel to the server via the RTCP channel. The server, based on the information received, knows the network congestion and error conditions and the rate at which the client is actually receiving the packets. The server can take action to deliver the packets at the correct rate. For example, based on the feedback from the client, the server can select one of the available streaming bit rates (64 Kbps, 128 Kbps, 256 Kbps, etc.) or choose to lower the frame rate to ensure that the sustained data rate of the transfer does not exceed the capability of the IP channel. RTSP is thus the overall framework under which the streaming content is delivered to a client over the IP network. It supports VCR-like control of playback such as the play, forward, reverse, and pause functions, which in association with the client media player provide the user with full control over the functionality of the playback process via streaming.
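Each datagram on the data channel begins with a 12-byte RTP fixed header carrying a sequence number and a timestamp, which is what lets the client count received packets and report back over RTCP. The sketch below packs and parses that fixed header as defined in the IETF RTP specification. The function names are hypothetical, and optional header fields (CSRC list, extensions, padding) are left out for brevity.

```python
import struct

RTP_VERSION = 2


def build_rtp_header(payload_type, sequence_number, timestamp, ssrc, marker=False):
    """Pack the 12-byte RTP fixed header (no CSRC list, no extension)."""
    byte0 = RTP_VERSION << 6  # version=2, padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        ">BBHII",
        byte0,
        byte1,
        sequence_number & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def parse_rtp_header(packet):
    """Unpack the fixed header fields from the start of an RTP packet."""
    byte0, byte1, seq, ts, ssrc = struct.unpack_from(">BBHII", packet, 0)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }


# Payload type 96 is in the dynamic range; the other values are arbitrary
header = build_rtp_header(96, sequence_number=1000, timestamp=90_000, ssrc=0x1234ABCD)
print(len(header))               # 12
print(parse_rtp_header(header))
```

A gap in the received sequence numbers tells the client that packets were lost, and this is the kind of information it returns to the server in its RTCP reports.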
2.14.5 Stream Serving and Bandwidth Management The streaming server and the media client that sets up a connection to the server for streaming operate in a handshake environment.
Figure 2.34: Stream serving.
Quick Facts Transmission Rates Needed for QuickTime Streaming
● 1 megabit per second, 640×480
● 1 megabit per second, 480×360
● 768 kilobits per second, 320×240
● 512 kilobits per second, 320×240
● 384 kilobits per second, 320×240
● 256 kilobits per second, 240×180
● 112 kilobits per second, 240×180
● 56 kilobits per second, 192×144
In a streaming session, if the data rate drops due to link conditions, the client needs to signal to the server to carry out intelligent stream switching or other measures such as dropping of the frame rate. The process mentioned previously constitutes a one-to-one connection and handshake and is termed a “unicast” connection. For each client (e.g., a mobile or a media player), there is a separate stream (i.e., a separate data channel and separate control channel) that is set up to successfully run the streaming process. This type of connection may not be ideal when there is a large number of users accessing the same content, as the number of streams and the data to be supplied multiplies rapidly. The other option is to have a multicast transmission, in which all users receive the same content. The routers in the network that receive the multicast stream are then expected to repeat the data to the other links in the network. Instead of hundreds or thousands of unicast sessions, each link then carries only one stream of multicast content. The approach has many advantages, but the individual clients have no control or mechanism to request that the server change the bit rate in the event of transmission disturbances. In MPEG-4, there is another mechanism to provide higher bit rates to clients on a higher bandwidth network. The MPEG-4 streaming server transmits a basic low-resolution stream as well as a number of additional streams (helper streams). The client can then receive additional helper streams and assemble a higher quality of picture if bandwidth is available.
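The scaling difference between unicast and multicast described above can be made concrete with a small calculation. The client count and stream rate below are illustrative values chosen for the example, not figures from the text, and the function names are hypothetical.

```python
def unicast_server_load_bps(num_clients, stream_rate_bps):
    """Unicast: the server sends one separate stream per client."""
    return num_clients * stream_rate_bps


def multicast_link_load_bps(stream_rate_bps):
    """Multicast: each link carries one copy, whatever the audience size."""
    return stream_rate_bps


clients = 1_000      # illustrative audience size
rate = 128_000       # illustrative 128 Kbps stream

print(unicast_server_load_bps(clients, rate) / 1e6)   # 128.0 (Mbps)
print(multicast_link_load_bps(rate) / 1e6)            # 0.128 (Mbps)
```

The unicast load grows linearly with the audience, while the multicast load per link stays fixed, at the cost of per-client rate adaptation.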
2.15 Streaming Players and Servers There are a number of encoders and streaming servers, some of them based on proprietary technologies.
2.15.1 RealNetworks RealNetworks streaming setup consists of the RealVideo™ codec and SureStream™ streaming server. The RealVideo is based on the principles of MPEG-4 coding. It uses frame rate
Figure 2.35: MPEG-4 layered video coding and streaming.
upsampling. This allows frame rates required for the selected delivery rate to be generated by motion vectors and frame interpolation. This implies that a single media file can be created for different encoding rates. While serving streams, the RealNetworks SureStream server will set up the connection after negotiation with the player. The lowest rate (duress) is streamed in the most congested conditions. SureStream uses dynamic stream switching to switch to a lower
Figure 2.36: Stream switching.
(or higher) bit rate depending on the transmission conditions and feedback from the client. For example, it can switch from a 64 Kbps stream to a 128 Kbps stream or vice versa. The RealMedia format uses both the RTP protocol and its proprietary RDP protocol for data transfer. The RealMedia family of protocols is oriented towards unicast streaming.
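Dynamic stream switching of the kind described above can be sketched as a rule that picks the highest encoded rate fitting within the throughput the client reports, falling back to the lowest rate when nothing fits. This is a generic illustration and not RealNetworks' actual algorithm; the rate list, the 20% headroom factor, and the function name are all assumptions.

```python
AVAILABLE_RATES_BPS = (64_000, 128_000, 256_000)  # illustrative encoded rates


def select_stream_rate(measured_throughput_bps, available=AVAILABLE_RATES_BPS,
                       headroom=0.8):
    """Pick the highest rate within a fraction of the measured throughput.

    Falls back to the lowest available rate when the link cannot sustain
    any of the encoded rates (the congested or "duress" case).
    """
    budget = measured_throughput_bps * headroom
    candidates = [rate for rate in sorted(available) if rate <= budget]
    return candidates[-1] if candidates else min(available)


print(select_stream_rate(400_000))  # 256000
print(select_stream_rate(200_000))  # 128000
print(select_stream_rate(100_000))  # 64000
print(select_stream_rate(50_000))   # 64000 (lowest rate under congestion)
```

Keeping some headroom below the measured throughput is a design choice that reduces how often the stream has to switch back down.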
2.15.2 Microsoft Windows Media Format Windows Media is the Microsoft family of coders and decoders as well as of streaming servers and players. The encoders can take video files stored in various formats such as .avi and generate files in the .wmv (Windows Media Video Format) or .asf (Advanced Streaming Format) formats. The codecs used are of two types: Windows Media–based and MPEG-4-based. Windows Media Player is available as a part of the Windows operating system. Windows Media Servers (WMS) stream files in the .wmv or .asf formats.
Figure 2.37: Buffered playout in streaming.
Release 9 of Windows Media provides advanced features such as Fast Streaming and Dynamic Content Programming. Fast Streaming provides for instant-on streaming (i.e., no buffering before playback) and “Always On” features, which are suited for broadband connections. This ensures that there are no interruptions during playback. Windows Media streaming is not based on the RTP, RTSP, and SDP protocols but is proprietary. Multicasting is supported via IGMPv3 support. Windows Media has support for IPv6.
Figure 2.38: Media players.
2.15.3 Apple QuickTime Apple’s QuickTime is a complete set of tools and players for handling multimedia and streaming. QuickTime components include a browser plug-in or QuickTime multimedia player, and the QuickTime streaming server. QuickTime, in addition to handling video, audio, graphics, and music (MP3), can handle virtual reality scenes. QuickTime uses the RTP and RTSP protocols as the underlying stack in its latest releases, and MPEG4 as the core compression standard.
2.16 Summary and File Formats
In this chapter, you have seen that the basic element of multimedia is a picture. The size of the picture in pixels determines the size of the file that represents it. Mobile phones have screens that range from a quarter of a VGA screen (QVGA) to WVGA or higher pixel counts. The size of the picture file can be further reduced by compression schemes such as JPEG. Moving images are carried as a series of pictures called frames. Commercial television systems carry 25 or 30 frames per second. It is common to reduce the bit rates for carriage of video by compression or reduction of frame rates. There are many schemes for compression, beginning with MPEG-1 and increasing in complexity. MPEG-2 is today widely used for the carriage of digital television. MPEG-4 and H.264 are further developments that provide lower bit rates. With mobile phones having a small screen size such as QVGA and high compression such as MPEG-4, it is possible to carry video at very low bit rates ranging from 64 to 384 Kbps. Audio needs to be similarly coded for carriage on mobile networks, and a number of protocols have been developed for this purpose. These range from MPEG-1 layer 3 (MP3) to AAC (MPEG-2 part 7) and MPEG-4 AAC for music and AMR for speech. The use of advanced compression techniques makes it possible to deliver multimedia to the world of mobile phones. Some of the commonly used file formats found in various applications are given in Table 2.13.

Table 2.13: Summary of File Formats.

Picture File Formats
BMP (*.bmp): Microsoft Windows Bitmap
GIF (*.gif): Graphics Interchange Format
PNG (*.png): Portable Network Graphics
JPEG (*.jpeg) or (*.jpg): Joint Photographic Experts Group
WBMP (*.wbmp): Wireless Bit Map

Video File Formats
AVI files (*.avi): Audio Video Interleaved
DV video files (*.dv, *.dif): Digital Video
MJPEG video files (*.mjpg, *.mjpeg): Motion JPEG
MPEG-2 files (*.mp2): MPEG-2
MPEG-4 files (*.mp4): MPEG-4
QuickTime files (*.mov, *.qt): Apple QuickTime
Raw MPEG-4 video files (*.m4v): Source MPEG-4 files
Flash Video (*.flv): Adobe Flash Video format
Raw video files (*.yuv): YUV video files
Real Media files (*.rm): Real Media Video
MPEG2 program stream files (*.mpg): MPEG2 Program Stream
MPEG2 video elementary files (*.m2v)
WAV files (*.wav, *.wmv)

Audio File Formats
MP3 files (*.mp3)
Windows Media Audio (*.wma)
MPEG-4 audio files (*.m4a, *.mp4)
AAC files (*.aac): Advanced Audio Coding, MPEG-4
Real Media Audio (*.rma, *.ra)
WAV files (*.wav, *.wmv): Windows Audio and Video
MIDI: Musical Instrument Digital Interface

Before We Close: Some FAQs
1. In which format do popular online sites offering movies provide the content? What are the file sizes for such movies?
Hulu® is a popular online cinema website that provides videos in the Flash Video format. VuzeHD® is a network where HD videos can be downloaded. This site uses the DivX format. CinemaNow® has over 7500 full-length feature films available for download with support for
multiple formats (i.e., DivX, Flash Video, Real video/audio, and Windows Media). A 2.5-hour movie will have a download file size of 1.5 GB and typically requires a T1 line (1.5 Mbps) for download.
2. Is it possible to capture streaming video in a home theater for later viewing?
Many players offer the facility for capture of streaming video (e.g., Replay Media Catcher 3 can capture streaming video in QuickTime, Real video and audio, Windows Media, and Flash formats).
3. I have a MiniDV camera bought in the United States that has a FireWire output cable. I am able to process my content using common editing software such as Windows Movie Maker and play it using Windows Media Player. Is this content still NTSC?
Unless the content is converted to another standard such as PAL, it retains its original frame rate and frame resolution. However, that does not prevent it from being played using any of the common media players. Again, unless the signal is in analog format, the term NTSC is strictly not applicable; it should be termed 720×480, 30 fps video.
4. What is the .ogg file format?
The .ogg file format is an open-standard container format for audio files. It is promoted by the Xiph.org foundation as a patent-free technology for encoding, transporting, and downloading audio. The encoder from Xiph.org is Vorbis. Files in the .ogg format can be played by popular players such as iTunes, iMovie, QuickTime, and Windows Media Player.
5. Can Final Cut Pro™ (FCP) 5 be used to create 3GPP content?
Yes, FCP is based on QuickTime and can be used to save content in the 3GPP or 3GPP2 formats.
6. How can Xvid content be displayed on mobile devices?
The Xvid mobile player can be downloaded for a range of devices such as Windows Mobile or Symbian. The Xvid mobile profile is designed for use on mobile devices with limited resources, and its certification ensures compatibility across many devices.
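The figures quoted in FAQ 1 (a 2.5-hour movie of 1.5 GB over a 1.5 Mbps line) can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second); the function names are hypothetical and not taken from any library.

```python
def average_bitrate_mbps(size_gb: float, duration_hours: float) -> float:
    """Average bit rate of a file in Mbps (assumes 1 GB = 10**9 bytes)."""
    total_bits = size_gb * 1e9 * 8
    return total_bits / (duration_hours * 3600) / 1e6


def download_hours(size_gb: float, link_mbps: float) -> float:
    """Time in hours to download the file over a link of the given rate."""
    total_bits = size_gb * 1e9 * 8
    return total_bits / (link_mbps * 1e6) / 3600


if __name__ == "__main__":
    # Values from FAQ 1: 1.5 GB file, 2.5-hour movie, 1.5 Mbps line.
    print(f"average bit rate: {average_bitrate_mbps(1.5, 2.5):.2f} Mbps")
    print(f"download time:    {download_hours(1.5, 1.5):.2f} hours")
```

Under these assumptions the movie averages roughly 1.33 Mbps, so a 1.5 Mbps line delivers it slightly faster than its running time, which is consistent with the T1 recommendation.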
CHAPTER 3
Introduction to Streaming and Mobile Multimedia

In a way staring into a computer screen is like staring into an eclipse. It is brilliant and you don’t realize the damage until it is too late.
Bruce Sterling (http://thinkexist.com/quotation/in_a_waystarting_into_a_computer_screen_is_like/262688.html)
3.1 What is Mobile Multimedia?
Imagine how it would be if all the multimedia that you have in your home (the music and the DVDs) could be transported into the mobile domain. Well, we are almost there. But it has involved “fitting the multimedia to the pipe size available.” When mobile devices began to be targeted for the delivery of multimedia (video, music, pictures, animations, Flash movies, and voice), a number of issues needed to be settled. These include:
● Technologies used for multimedia in the mobile domain
● File formats used for multimedia
● Transport protocols to be used on mobile networks
● Procedures for call setup, release, and transfer of multimedia content
Many of these characteristics are dependent on the mobile devices themselves. How large are the screen sizes? What capabilities do they possess to handle multimedia files, e.g., those in MPEG-2, MPEG-4, or Windows Media? Can they handle multiple services at one time? In this chapter, we will discuss some of the characteristics that set mobile devices apart from fixed desktop devices or home TVs. It turns out that this has been not a single step but a continuous journey through a series of formats to find the right fit, as illustrated by Apple coming out with “adaptive live streaming” for 3G iPhones in late 2009 or Microsoft with Silverlight™.
3.1.1 The Mobile World Legacy for Multimedia: The 3GP
The mobile world is today dominated by networks using the GSM, CDMA, or 3G technologies. Devices that can handle multimedia on these networks number in the hundreds of millions. Hence many of the media formats that exist in the industry today are those defined by the 3GPP or 3GPP2 standards.

© 2010 Elsevier, Inc. All rights reserved. DOI: 10.1016/B978-0-240-81287-8.00003-5
3.1.2 Encoding for Audio and Video
Mobile devices are characterized by small screens, limited processing power, and limited memory.¹ This implies that complex encoding and decoding tasks for video, pictures, graphics, animations, and so on need to be defined as subsets of the full-resolution, full-powered desktop applications. The limited capabilities of mobile devices have meant defining encoding standards and encoder profiles (such as MPEG-4 simple profile Level 1) to ensure that these can be used safely in a wide range of devices. Specific formats for video, voice, and audio encoding have been prescribed for use in GSM and 3G (UMTS or CDMA) networks.
3.1.3 Screen Sizes, Frame Rates, and Resolutions
Mobile devices are characterized by the use of small screen sizes, e.g., CIF (352×288), QVGA (320×240), or QCIF (176×144). These have a correspondingly lower resolution than SDTV or a desktop display (SVGA or XGA). The frame rates of video offered on mobile devices may be at the prescribed NTSC or PAL rate (30 or 25 frames per second, respectively) or may be at lower rates, e.g., 15 frames per second. Service providers select the screen sizes and frame rates at which particular services (such as mobile TV) are offered.
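To see why these small formats and reduced frame rates matter, the raw (uncompressed) bit rate for each screen size can be estimated. The sketch below is illustrative only: it assumes 12 bits per pixel (YUV 4:2:0 sampling), which is an assumption not stated in the text, and the names used are hypothetical.

```python
# Screen sizes named in the text, as (width, height) in pixels.
FORMATS = {"CIF": (352, 288), "QVGA": (320, 240), "QCIF": (176, 144)}


def raw_bitrate_kbps(width: int, height: int, fps: float,
                     bits_per_pixel: int = 12) -> float:
    """Uncompressed video bit rate in kbps.

    bits_per_pixel=12 assumes YUV 4:2:0 sampling (an assumption made here).
    """
    return width * height * bits_per_pixel * fps / 1000


def compression_ratio(width: int, height: int, fps: float,
                      target_kbps: float) -> float:
    """How much the raw video must be compressed to fit target_kbps."""
    return raw_bitrate_kbps(width, height, fps) / target_kbps


if __name__ == "__main__":
    for name, (w, h) in FORMATS.items():
        for fps in (30, 25, 15):
            print(f"{name:5s} {w}x{h} @ {fps:2d} fps: "
                  f"{raw_bitrate_kbps(w, h, fps):10.1f} kbps raw")
```

Under this assumption, QCIF at 15 frames per second is about 4562 kbps uncompressed, so carrying it at 64 Kbps implies a compression ratio of roughly 71:1.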
3.1.4 File Formats
A number of file formats for video, music, and voice are prevalent in the industry. These range from completely uncompressed video and audio to compressed and commonly used formats such as MPEG-4 video and MP3 or AAC audio. The file formats are also standardized by bodies such as 3GPP and 3GPP2 so that the files can be delivered and played by universally used players. When video is used in a broadcast environment, it is not sufficient to merely have raw or encoded data. It needs to be associated with “metadata,” which describes the file and media properties. This requires “containers” in which everything can be put together. It is common to use ISO-based file-format containers for storage and transmission of multimedia information. MPEG-4 and 3GPP files, for example, are based on an ISO file container format.

¹ It should be recognized that the capabilities of mobile devices have been growing exponentially. With advanced graphics and multimedia processors, 8 GB micro SD cards, 8 megapixel cameras, and WVGA resolution screens already available, it is expected that there will be essentially no major limitations on these devices except power consumption in the next two years.
3.1.5 Transmission Media
The transmission of content to mobile devices implies the use of wireless media. The media can be a cellular network such as GSM, GPRS, CDMA-1X, CDMA2000, or 3G evolutions such as EVDO and HSPA. Wireless is a very challenging environment for the delivery of multimedia because signal strengths, and consequently error rates, can vary sharply as the user moves through the coverage area. The protocols used are expected to recover the data to deliver error-free files. However, for real-time services such as video or music streaming, the maintenance of a basic transmission rate is critical. There are various mechanisms for sustaining the rate of data transfer, such as buffering, service flows in WiMAX, and automatic reassignment of resources. Operators need to select these parameters as part of service planning.
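The role of buffering in sustaining a streaming rate over a variable wireless link can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions that are not from the text: throughput is sampled once per second, the media plays at a constant bit rate, and playback starts only after a chosen amount of media has been pre-buffered. The function name and the sample trace are hypothetical.

```python
def count_stalls(throughput_kbps, media_rate_kbps, prebuffer_s):
    """Count the seconds in which playback stalls.

    throughput_kbps: kilobits received in each one-second interval.
    media_rate_kbps: constant rate at which the player consumes media.
    prebuffer_s:     seconds of media to accumulate before playback starts.
    """
    buffered = 0.0      # kilobits currently held in the client buffer
    playing = False
    stalls = 0
    for received in throughput_kbps:
        buffered += received
        if not playing:
            # Start playback (from the next second) once enough is buffered.
            if buffered >= prebuffer_s * media_rate_kbps:
                playing = True
            continue
        if buffered >= media_rate_kbps:
            buffered -= media_rate_kbps   # one second of media played
        else:
            stalls += 1                   # not enough data: playback stalls
    return stalls


if __name__ == "__main__":
    trace = [64, 64, 0, 0, 64]            # link drops out for two seconds
    print(count_stalls(trace, media_rate_kbps=64, prebuffer_s=1))
    print(count_stalls(trace, media_rate_kbps=64, prebuffer_s=2))
```

With this trace, a one-second pre-buffer leads to one stalled second, while a two-second pre-buffer rides through the dropout; the cost is a longer start-up delay.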
Figure 3.1: Elements of mobile multimedia in the cellular mobile world.
3.1.6 Service Definitions and Transfer Protocols
The transfer of multimedia information can involve a number of steps. These can include setting up a connection, selection of services (voice call, video call, browsing, and so on), negotiation of parameters (e.g., data rates for streaming), and the tearing down of the connection after successful data transfer. Some services may involve point-to-multipoint data transfer, such as a video conferencing service. The 3GPP, for example, defines certain predefined services such as MMS messaging, 64 Kbps circuit-switched data calls (3G-324M), packet-switched streaming (3GPP-PSS), multimedia broadcast and multicast service (MBMS), etc. The services may be associated with specific transfer protocols such as FLUTE for file transfers.
3.1.7 Animation and Graphics
Another aspect of multimedia in the mobile domain relates to the transfer of applications from the desktop to the mobile space. These applications include streaming servers and players, video players on mobiles, and mobile browsers. Gaming and animation applications also need to be “squeezed in” so as to fit into the capabilities of mobile devices. Hence it is common to use profiles that are more suitable for mobile phones than for desktops, both for the development of such applications and for their execution on mobile platforms.
Figure 3.2: Software environments for mobile phones.
In general, the mobile multimedia standards prescribe the use of limited types of encoders and encoding formats, subsets of graphics applications (such as Scalable Vector Graphics Tiny), and scaled-down animation software (such as Adobe Flash Lite or Java MIDP® (Mobile Information Device Profile)). To summarize, mobile multimedia has a number of elements:
● Multimedia files
● Handling graphics and animation
● Call setup and release procedures to deliver multimedia
● Multimedia transfer protocols
● Multimedia players or receive-end clients
This is not to suggest that all mobile devices need to be confined to the same limitations on encoder profiles, file formats, and applications that can be supported. A range of mobile devices (including mobile phones) is constantly emerging that can support much higher capabilities than the baseline profiles that are recommended for backward compatibility. Although the standardized protocols and formats mentioned in this chapter restrict the recommended use to a few specific file types and protocols, networks in practice may use other file formats, encoders, and players, such as DivX, QuickTime, Windows Media, Flash Video, and Real, which are proprietary but used so widely as to be considered de facto standards in their own right.
3.2 How Do Mobile Devices Access Multimedia?
There are primarily four ways in which mobile devices gain access to multimedia information:

The first is via a connection made through a telecom network to a “service” providing multimedia content. These types of connections, made via 3G or GPRS/EDGE networks, are often used for video calls or circuit-switched video conferencing. These are set up using the protocols and procedures specifically prescribed for such calls by 3GPP.

The second method of access is by making an Internet connection. This connection can be via any means:
• Using the 3G/GPRS network on which the device operates.
• As an “external connection” using networks such as Wi-Fi, WiMAX, Bluetooth, or others.
Most 3G networks now support access to such unlicensed mobile access (UMA) networks seamlessly while roaming. The iPhone 3GS can, for example, switch between Wi-Fi and 3G networks where available. More details on the characteristics of third-generation networks are given in Chapter 4.

Thirdly, the mobile device may be in the receiving area of a terrestrial transmission and receive mobile TV or multimedia files.

Lastly, WiMAX and 4G networks such as LTE (long term evolution technologies) deliver multimedia content using unicasting or multicasting.
3.2.1 The Mobile Internet and the .mobi Domain
Mobile devices are being used increasingly to access the mobile Internet. So is there a separate “Internet” for these devices, where all content is designed to suit the capabilities of mobile phones?
Figure 3.3: Mobile devices can access multimedia in diverse ways.
It turns out that that is not the case, even though initial moves were toward designing WAP-enabled sites. It is not a “must” to design the sites for mobile devices. Browsers such as Opera Mini can convert the content designed for desktop devices for mobiles. However, it was natural that an Internet top-level domain be made available and customized for mobile devices. Such a domain is the .mobi domain, the registrations for which commenced in 2006. Having a .mobi website helps in providing a better web experience to users than a “normal website.” Such sites may be built with special mobi screen makers and Flash-based animation content in addition to a better organization of information or media.
3.3 File Formats for Mobile Multimedia
It was evident to all that without the harmonization of standardization efforts, deployment of mobile multimedia would be a difficult proposition. Operators, equipment manufacturers, and handset vendors, as well as the standards bodies, became very seriously involved with efforts to standardize the file formats, protocols, call setup procedures, and applications for mobile networks. The standardization was done under 3GPP.
3.3.1 3GPP Standardization Areas
3GPP
The 3GPP is a partnership project of a number of standards bodies that are setting standards for third-generation cellular mobile services and LTE. 3GPP specifically refers to the third-generation partnership project of GSM-evolved 3G networks, i.e., 3G-UMTS. Evolution technologies such as HSDPA, HSUPA, and HSPA (LTE) for higher-speed data transfer are a result of coordinated effort by the 3GPP. The 3GPP releases include standards for encoding and decoding of audio, video, graphics, and data, as well as call control procedures and cell phones and user devices.

3GPP2
The 3GPP2 partnership was created to provide specifications and an evolution path for technologies for the cellular networks based on the ANSI-41 (CDMA-1x RTT) standard and its successors, such as CDMA2000 and EV-DO. Its members included organizational partners such as the Telecommunications Industry Association (TIA) USA, ARIB, TTC Japan, TTA Korea, and CCSA China, amongst others.
3.3.2 3GPP Mobile Networks
The services that can be provided using 3G networks have been defined progressively in different releases of 3GPP, which mirror the capabilities of new networks, and consequently the types of connections that can be established and the multimedia services that can be provided. 3GPP recommendations provide for end-to-end protocols for call establishment, media transfer, and release. The first release of the industry-coordinated specifications for the mobile networks was in 1999. Since then there have been progressive developments that have been reflected in further releases and upgrades.

3GPP release 1999
The 3GPP release 1999 resulted in the adoption of Universal Terrestrial Radio Access (UTRA). UTRA is the radio standard for WCDMA; release 99 had provisions for both the FDD and the TDD (3.84 Mcps) modes. Release 99 also standardized a new codec, narrowband AMR.
Figure 3.4: 3GPP-supported services.
3GPP release 4, March 2001
3GPP release 4 took the first steps toward an IP-based infrastructure. The 3GPP embraces an all-IP core network based on IPv6. Release 4 provided for the following main features:
● New Messaging Systems: Provided for enhanced messaging systems, including rich text formatting (RTF) and still image and multimedia messaging (MMS).
● Circuit-Switched Network Architecture: Release 4 provided for a bearer-independent network architecture.
● IP Streaming: The release provided a protocol stack for streaming of real-time video over the network.
● GERAN (GPRS/EDGE Interface): Release 4 provided for the EDGE/GPRS interface.
3GPP release 5, March 2002
Reflecting the rapid pace of standardization in 3G systems, the 3GPP release in 2002 unveiled the IMS (IP Multimedia System) as the packet core of 3G mobile networks. Voice over Internet Protocol (VoIP) calls became possible using the Session Initiation Protocol (SIP). At the same time, legacy switched voice calls can be made using the circuit-switched core. The concept of HSDPA (High-Speed Downlink Packet Access) based on higher-order modulation (16 QAM; quadrature amplitude modulation) was also unveiled. The release also provided for a wide-band AMR (AMR-WB) codec and end-to-end quality of service (QoS). HSDPA is a major step toward services such as unicast mobile TV on 3G networks.

The framework provided by the IP multimedia system of release 5 sets the stage for end-to-end IP-based multimedia services, breaking away from the circuit-switched architecture of the previous generations. It also provides for an easier integration of instant messaging and real-time conversational services. The messaging enhancements include enhanced messaging and multimedia messaging.

It is important to note that the IMS is access-independent. Hence it can support IP-to-IP sessions over packet data GPRS/EDGE or 3G, packet data CDMA, IP wireless LANs (802.11 and 802.15), as well as wire-line IP networks. The IMS consists of session control, connection control, and an application services framework. Security interfaces were also introduced in release 5, which included access security, access domain security, and a lawful interception interface.
Figure 3.5: 3GPP releases for mobile multimedia.
3GPP release 6, March 2005
A major feature of release 6 was the introduction of the MBMS services. The following were the major new features of release 6 of 3GPP:
● Wide-band codec: Release 6 introduced an enhancement of the AMR wide-band codec (AMR-WB) for better sound quality and coding.
● Packet Streaming Services (3GPP-PSS protocols)
● Wireless LAN to UMTS interworking: A mobile subscriber can connect to a wireless LAN using the IP services via the W-LAN.
● Digital Rights Management
● Push Services: Pushing of content to mobile devices
● Multimedia broadcast and multicast services (MBMS)
The 3GPP packet streaming services that were introduced in Release 6 also brought in new media types for streaming. New “brands” were introduced for the description of these services in the ISO basic media file formats so that the content could be identified and directed to the appropriate players in receiving devices. Examples of new brands introduced are:
● Streaming servers: 3gs6
● Progressive download: 3gr6
● MBMS: 3ge6
3.3.3 Evolution to Packet-Switched Architecture
One of the major areas of standardization of both the partnership projects has been the migration from the circuit-switched domains used in the GSM and ANSI-41 CDMA networks to packet-switched domains. In the case of 3GPP, the new architecture for migration to packet-switched domains is the IP multimedia system (IMS), which in 3GPP2 is under the multimedia domain (MMD). The initial implementations of the new 3G networks involved the packet networks as an overlay. However, later implementations have been merged into a single IP core.
3.4 3GPP Mobile Media Formats
In the following sections, we will be reviewing progressive evolutions in the data transmission speeds and delivery mechanisms with new releases of the 3GPP. Instead of considering these releases in an abstract manner, it is also important to understand the forces that have been driving these changes. The simplest way to demonstrate this is the use of megapixel cameras. A 2 MP camera has a picture file size of approximately 6 MB with 24 bits per pixel. Transmission of such pictures via MMS became practical when HSDPA handsets were introduced in 2006 (such as by Cingular, now part of AT&T in the United States), which could provide a data rate of 1.8 Mbps. This made it possible to download a picture in less than 30 seconds using such a service. HSDPA was unveiled in 3GPP release 5 and was a result of the forecasting of impending higher connectivity requirements.
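The camera example can be verified with simple arithmetic. The following sketch assumes an uncompressed picture, decimal units (1 MP = 10^6 pixels, 1 Mbps = 10^6 bits per second), and no protocol overhead; the function names are hypothetical.

```python
def picture_size_bytes(megapixels: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed picture size in bytes (assumes 1 MP = 10**6 pixels)."""
    return megapixels * 1_000_000 * bits_per_pixel // 8


def transfer_seconds(size_bytes: int, rate_mbps: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_bytes * 8 / (rate_mbps * 1_000_000)


if __name__ == "__main__":
    size = picture_size_bytes(2)                 # 2 MP camera, 24 bits/pixel
    print(f"picture size: {size / 1e6:.1f} MB")
    print(f"time at 1.8 Mbps: {transfer_seconds(size, 1.8):.1f} s")
```

With these assumptions, the 2 MP picture is 6 MB and takes just under 27 seconds at 1.8 Mbps, in line with the "less than 30 seconds" figure quoted above.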
3.4.1 Mobile Streaming
Streaming video (MP4) and MP3 music are some of the most commonly used services on desktops. It was natural that these would be extended to the world of mobile devices. This meant that the devices had to support RTSP over mobile networks. However, it was soon evident that in many cases, the connections were not good enough to sustain continuous streaming. This led mobile streaming technology to also include a “progressive streaming format” as well as delivery of content as a file before viewing. This became the basis of RSS feeds and podcasting. Using an RSS feed, content creators can publish their broadcasts as progressive download files rather than streaming them in real time, making the viewing more pleasant.

The Packet Switched Streaming Services (PSS) were defined in release 6 of 3GPP. These have become the basis of packet IP connectivity of mobile devices and video on 3G and HSDPA networks. The new enhancements in 3GPP permit Mo-blogging, video sharing (YouTube), and picture sharing (Flickr™) types of services with portals, IP-based connectivity, and large file transfer capability.
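For readers unfamiliar with RTSP, the requests a streaming client sends are plain text, similar in shape to HTTP. The sketch below only formats RTSP/1.0 request messages; it does not open a network connection. The helper name, the URL, and the choice of headers are hypothetical and serve only as an illustration.

```python
from typing import Dict, Optional


def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[Dict[str, str]] = None) -> str:
    """Format an RTSP/1.0 request as text (no network I/O is performed)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF; an empty line terminates the request.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    url = "rtsp://example.com/clip.3gp"          # hypothetical stream URL
    # A typical session asks for the stream description first; SETUP, PLAY,
    # and TEARDOWN requests follow the same textual pattern.
    print(rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}))
```

The reply to a DESCRIBE request normally carries an SDP description of the media, which is why SDP and RTSP are usually mentioned together.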
3.4.2 The IP Multimedia System
IMS, which was unveiled in release 5 of the 3GPP, was destined to become one of the most important developments toward the migration to IP-based core networks. Today it provides the only widely implemented mechanism for fixed-mobile convergence by virtue of its IP core network, SIP-based call initiation, and media gateways to all types of networks.

The mobile environment today requires users to be continuously connected to networks, even though there may be little activity, because establishing and releasing connections is resource-intensive in mobile networks. At the same time, upon becoming active, users expect minimum latency in restarting their applications. With an increasing base of UMTS and HSDPA users, this has meant that mechanisms to keep thousands of users continuously connected in every cellular base station area needed to be evolved. This feature, a result of lifestyle evolution, is being introduced in release 7 of the 3GPP as continuous packet connectivity (CPC).

Large storage is now a common feature of all smartphones and mobile devices, demonstrated nowhere better than in the iPhone or iPod with 80 GB storage. Release 7 of the 3GPP provides for a new approach to multimedia and large files by allowing a high-speed protocol based on USB technology. The new enhancements permit the UICC to be treated as large, secure storage, including the use of flash memory technology, an OMA smartcard web server, and remote file management technologies.
3.4.3 File Formats for Mobile Multimedia in 3GPP
What are file formats for 3GPP?
The 3GPP, for third-generation UMTS (or WCDMA) networks, has defined a standard file format to contain audio/visual sequences that may be downloaded to cellular phones so that they play uniformly, regardless of the country the user is in, the handset type, or the operator network. The 3GPP2 (which as mentioned earlier is the body for CDMA evolved networks) has also adopted similar file formats. The files are based on the ISO file format. Within the file (as with all files in the ISO family), there is an intrinsic file-type box, which identifies the specifications with which the file complies, and which players are permitted by the content author to play the file. This identification is through four-letter “brands.”

The media files generated by the encoders are based on MPEG-4 and H.263 coding standards for the initial releases of 3GPP. The files that are used in GSM, 2.5G, and 3G WCDMA networks are denoted by .3gp and are based on MPEG-4 video and AAC or AMR audio. The files used in CDMA and evolved networks (CDMA2000, CDMA1x, and 3X) are denoted by .3g2 and are based on the same codecs, i.e., MPEG-4 for video and AAC or AMR for audio, with additional support for QCELP.

MPEG-4 is an object-based encoding standard that constitutes a layered structure with separation between the coding layers and the network abstraction layers, making it ideal for delivery over a variety of media. MPEG-4 also has a large number of profiles that can enable its application to very low bit rate applications, while at the same time maintaining the flexibility to go up to higher bit rates through enhancement layers, or to broadcast-quality transmissions right up to high definition (HD). It is also ideally suited for handling computer-generated and animation objects such as synchronized graphics and text, face and body animation, and many other applications.

MPEG-4 part 10 is also standardized as the ITU standard H.264 and fits into the MPEG-4 framework.
3.4.4 3GPP File Formats for Circuit-Switched 3G-324M Services
As MPEG-4 has many profiles and levels, 3GPP has standardized the information in Table 3.1 as the baseline media specifications for use over 3G networks with 3G-324M encoders/decoders. The standardization was considered necessary to limit the complexity of the encoders and decoders used in mobile devices that could be used over circuit-switched 3G-324M services. The simple profile permits the use of three compression levels with bit rates from 64 kbps in Level 1 to 384 kbps in Level 3. The MPEG-4 simple visual profile Level 1 has adequate error resilience for use on wireless networks, while at the same time having low complexity. It also meets the needs for low delay in multimedia communications.

Table 3.1: 3GPP File Formats for 3G-324M Networks.
Codec Feature: Specification
Video Codec: MPEG-4 Simple Profile Level 1; recommended support of MPEG-4 Simple Visual Profile Level 1 (ISO/IEC 14496-2)
Frame Rate: Up to 15 fps
Resolution: 176×144
Audio Coding: AMR coding and decoding is mandatory; G.723.1 is recommended

The MPEG-4 simple visual profile Level 1 has support for the H.263 baseline profile codec. The encoding mechanism recommends the enabling of all error resilience tools in the simple visual profile. Conversational calls using 3G-324M essentially use H.263. The 3GPP recommends the features and parameters that should be supported by such codecs, and such extensions are covered in the Mobile Extension Annex of H.263. Support for the MPEG-4/AVC (H.264) codec with full baseline profile has been recommended as optional in release 6 of the 3GPP. Today, the support of H.264 or H.264 with modifications for mobile networks is quite common.
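The baseline in Table 3.1 can be read as a small set of constraints that an encoder configuration either meets or does not. The sketch below checks only the frame rate and resolution rows; treating the resolution as an exact match and the names used here are assumptions made for illustration, not part of the 3GPP specification.

```python
# Values taken from Table 3.1; the dictionary layout is illustrative.
BASELINE_3G324M = {"width": 176, "height": 144, "max_fps": 15}


def baseline_violations(width: int, height: int, fps: float) -> list:
    """Return a list of reasons why the settings miss the Table 3.1 baseline."""
    problems = []
    expected = (BASELINE_3G324M["width"], BASELINE_3G324M["height"])
    if (width, height) != expected:
        problems.append(f"resolution {width}x{height} is not 176x144")
    if fps > BASELINE_3G324M["max_fps"]:
        problems.append(f"frame rate {fps} fps exceeds 15 fps")
    return problems


if __name__ == "__main__":
    print(baseline_violations(176, 144, 15))   # meets the baseline
    print(baseline_violations(320, 240, 30))   # QVGA at 30 fps does not
```

A check of this kind is useful before transcoding content for video calls, since a clip outside the baseline may not play on a device that supports only the mandatory settings.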
3.4.5 ISO File Formats
3GPP files (.3gp) are “structurally” based on the ISO file format, which is the primary standard for MPEG-4-based files. The ISO/IEC formats have been standardized by the ISO Moving Picture Experts Group and were originally derived from QuickTime formats. The 3GPP file format described in Table 3.1 is a simpler version of the ISO file format (ISO-14496-1 Media Format) supporting only video in H.263 or MPEG-4 (visual simple profile) and audio in AMR or AAC-LC formats. The .3gp and .3g2 formats, both of which are based on the ISO file format, have structures to incorporate non-ISO codecs such as H.263, AMR, and AMR-WB (for 3GPP) and EVRC and QCELP (for 3GPP2).
3.4.6 The ISO File Container
The ISO file format essentially provides a container in which the media metadata and information is carried based on a universally accepted format (Figure 3.6). For this purpose, the ISO-based media file format defines a “file type” box called “ftyp.” This field precedes any variable-length fields such as media data. The file type box also contains fields called “brand” and “compatible brands.” The field “brand” describes the best use of the file; “compatible brands” gives a list of compatible formats, e.g., players on which the playback may be possible. The values that can be used are defined by 3GPP and 3GPP2.

Figure 3.6: MPEG file format definitions in ISO.

Brand
The field “brand” can be used to indicate the 3GPP release version in 3GPP, thus indicating to the receiver the file capabilities. In general, higher releases such as release 7 will be compatible with the lower releases, and these will fall in the “compatible brands” field. An ISO-compatible file in 3GPP (release 5 and beyond) would also have the value “isom” as a “compatible brand” to indicate compatibility of the file with the ISO baseline format. 3GPP2 has its own specific values for these fields.
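The layout of the file type box is simple enough to read by hand: a 4-byte big-endian size, the 4-character type “ftyp,” a 4-character major brand, a 4-byte minor version, and then zero or more 4-character compatible brands. The following is a minimal sketch that builds and parses such a box in memory. The helper names are hypothetical, the brand values are used purely as examples, and special size values (0 and 1) are not handled.

```python
import struct


def build_ftyp(major: str, minor_version: int, compatible: list) -> bytes:
    """Build an ftyp box: size, 'ftyp', major brand, minor version, brands."""
    payload = major.encode("ascii") + struct.pack(">I", minor_version)
    payload += b"".join(brand.encode("ascii") for brand in compatible)
    return struct.pack(">I", 8 + len(payload)) + b"ftyp" + payload


def parse_ftyp(data: bytes):
    """Return (major_brand, minor_version, compatible_brands).

    Simplification: box sizes of 0 or 1 (open-ended or 64-bit) are not handled.
    """
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("first box is not an ftyp box")
    major = data[8:12].decode("ascii")
    minor_version = struct.unpack(">I", data[12:16])[0]
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major, minor_version, compatible


if __name__ == "__main__":
    # Example brands: "3gr6" (progressive download) with "isom" as fallback.
    box = build_ftyp("3gr6", 0, ["3gr6", "isom"])
    print(parse_ftyp(box))
```

A player can inspect the first few bytes of a file in this way to decide whether it recognizes the major brand or at least one of the compatible brands before attempting playback.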
3.4.7 3GPP Files
3GPP files generated by the encoders are based on MPEG-4 and H.263 coding and packaged in accordance with 3GPP standards. In this book, we will refer to .3gp and .3g2 files by the generic name .3gp.
Figure 3.7: Examples of brand usage in 3GPP files for MMS and download.
Figure 3.8: File conversion to 3GPP.

Quick Facts: .3GP File Format
1. How are .3GP Files Generated?
Mobile phones, while recording video, record files in .3gp format (some advanced phones also record video in DV or HDV formats). An example is the Sony P1i phone. .3GP files can also be generated from any other content format by all major video processing packages such as QuickTime Pro.
2. How are .3GP Files Played?
.3GP files can be played by software available on virtually all mobile phones in addition to desktop players such as QuickTime, iTunes (movies), the VLC media player, RealPlayer, and so on.
3. How can Video Content be Converted to .3GP?
Video content can be easily converted to .3GP by a large number of software packages, some of which are free. One such package is IamTOO™ 3GP Video Converter. The software can convert any video files or DVDs to any of the common cellphone formats as well as screen resolutions (the software provides the save option). Another example of a media converter is the Nokia Multimedia Converter 2.0 for conversion of AVI, WAV, MPEG, and MP3 into standard 3GPP/AMR, H.263, wide-band, and narrow-band AMR-supported formats.
The MPEG-4 format (.mp4) (ISO 14496-14) with wrapper and container attributes allows multiplexing of multiple audio and video streams in one file (which can be delivered over any type of network using the network abstraction layer). It also permits variable frame rates, subtitles, and still images. 3GPP files may conform to one of the following profiles:
● 3GPP Streaming Server Profile: The profile ensures interoperability while selecting the alternative encoding options available between the streaming server and other devices.
● 3GPP Basic Profile: Used for PSS and messaging applications (MMS). The use of the basic profile guarantees that the server will work with the other networks and mobile devices.

Quick Facts: 3GPP-PSS Streaming Profile
Latest Release in Common Use: Release 6
Video Encoding: Mandatory, H.263 Profile 0, Level 10 (QCIF 64 Kbps); Optional, H.264 Visual Simple Profile, Basic Profile
Audio Encoding: AAC-LC (stereo, 48 Kbps) mandatory; AMR-WB mandatory; AAC, AAC optional
Vector Graphics: SVG-T mandatory, SVG-Basic optional
Image Coding: JPEG mandatory; GIF 87a, 89a, PNG optional
Session Setup and Control: SDP, RTSP
Presentation Format: Synchronized Multimedia Integration Language (SMIL) 2.0 Basic Language Profile, Meta Information, Media Description, Media Clipping, Event Timing, Basic Transitions
Transport: RTP over UDP for media; RTCP for control, QoS signaling, progressive download
Encryption: Mandatory per 3GPP for storage and transmission, DRM based on OMA DRM 2.0 standard
3.4.8 Creating and Delivering 3GPP and 3GPP2 Content
Content in 3GPP and 3GPP2 formats can be prepared and delivered using a number of available industry products. As an example, Apple QuickTime provides a platform for the creation, delivery, and playback of 3GPP and 3GPP2 multimedia content. It provides native support of mobile standards as well as a full suite of tools for ingesting, editing, encoding, and stream serving.
Apple’s QuickTime Pro, which can be installed on Windows or Macintosh computers, allows the user to ingest video and audio files, to compress using H.264 or 3GPP(2), and to prepare multimedia files using Dolby 5.1 or AAC audio. The output files can also be saved as 3GPP for delivery over mobile networks.
Figure 3.9: Screenshot of QuickTime video file creation. (Courtesy of Apple Computers)
Apple’s QuickTime Streaming Server (QTSS) provides the capability to stream MPEG-4, H.264, or 3GPP files over IP networks using the open standards RTP/RTSP protocols. The QuickTime family also has other tools, such as XServer, where playlists can be loaded with 3GPP, MPEG-4, or MP3 files so that the server can be used as an Internet or mobile network TV station.
Limitations of 3GPP
.3GP files have many limitations, many of which were intentional to make encoding, decoding, and transmission in the mobile environment practical. The picture size of 176×144 and encoding at 128 Kbps is fine for video calls but is proving to be inadequate as the networks advance beyond 3G (HSDPA or EVDO) and further to 4G, and as applications come to include video clips and movies. Many handsets now support the WVGA (720×480) format or higher resolutions and can display much higher-resolution video than is the norm in 3GPP encoding. Figure 3.10 shows how .3GPP content looks at QCIF (176×144), WVGA (720×480), and a larger picture size as it is scaled up. Notice the compression artifacts and loss of resolution, which are unacceptable in larger screen sizes.
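The loss of resolution on larger screens follows from simple arithmetic. As a rough illustration, the short Python sketch below computes how far a QCIF picture must be stretched to fill the 720×480 display size quoted in the text. The function name is a hypothetical helper introduced only for this example.

```python
def upscale_factors(src, dst):
    """Return (horizontal, vertical, pixel-count) scaling factors."""
    src_w, src_h = src
    dst_w, dst_h = dst
    return dst_w / src_w, dst_h / src_h, (dst_w * dst_h) / (src_w * src_h)

QCIF = (176, 144)
WVGA_AS_QUOTED = (720, 480)  # display size as given in the text

h_scale, v_scale, pixel_ratio = upscale_factors(QCIF, WVGA_AS_QUOTED)
print(f"horizontal x{h_scale:.2f}, vertical x{v_scale:.2f}, pixels x{pixel_ratio:.1f}")
```

Each encoded pixel has to cover more than thirteen display pixels, so blocking and other compression artifacts are magnified by the same amount.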
Figure 3.10: 3GPP video on different display sizes.
Quick Facts
Other Common Container Formats
Container Format            Video/Audio Codecs
3GPP, 3GPP2                 MPEG-4 SP, H.264, and HE-AAC, H.264, and AMR WB/NB
Apple QuickTime (.mov)      MPEG-4 SP, H.264, and HE-AAC, AMR WB/NB
Microsoft ASF/ASX           VC-1 and WMA
Flash Video (.fli/.flv)     FLV (Sorenson Spark or H.264) and MP3
RMVB/RA/RM (China)          RV and RA
OGG/OGM                     Vorbis, FLAC
DVD (VOB)                   MPEG-2, AAC
3.5 Internet Video
“Internet TV” is commonly used to describe video and associated audio that is available and can be viewed on the Internet. The simplest way to view it is to open a website by entering its URL in the browser; this type of viewing falls under the category of HTTP access to web content (HTTP streaming). Video can also be streamed from a streaming server using the RTSP protocol, or played by a media player that uses RTSP streaming or proprietary formats (Windows Media Player, Adobe Flash Player, or RealVideo, for example). Any video files that are embedded in the HTML content of the page and natively supported by the browser (e.g., Internet Explorer, Firefox, Opera) will then “play” on the web page.
Apart from live streaming, video files can also be downloaded and played by launching the appropriate player (Windows Media Player, Adobe Flash Player, RealPlayer, Apple QuickTime, Google Video Player, etc.). Downloads are practical only with short videos, owing to file size and the time involved in downloading.
3.5.1 YouTube
The file formats and resolutions of videos that are available on the Internet are site-dependent, in most cases. For example, YouTube, which is designed to accept video uploads from camcorders and cellphones, accepts video in .wmv (Windows media), .avi (Audio Video Interleaved, used by Windows for video), .mov (QuickTime movies format), .3gp, and .mpg (compressed MPEG-2) formats. When users download video from YouTube, the format used is Flash Video (.flv).
3.5.2 Google Video
Google Video can be downloaded in three formats: .avi, .mp4, and .gvi. Video from the website can also be streamed by using a shortcut that points to a .gvp file. The downloaded video can be played by using Google Video Player (a free download) if in .avi or .gvi formats and by using a DivX player if it is in .mp4 format. If the browser used is Flash-enabled, Flash Video (.flv) can also be played.
Videos uploaded to the websites can vary in resolution depending on the source, which can range from a cellphone with a VGA camera to a digital camera with 5 MP resolution or even HD. These are usually converted to a common format and scaled down in resolution so that the viewers can retrieve them universally. The common screen resolutions supported on mobile devices are 320×240 (QVGA) and 176×144 (QCIF). YouTube automatically converts videos to the 320×240 format. The videos can also be downloaded to other devices such as iPods or the Sony PSP.
Internet video is in effect TV (or video) on a PC with “best-effort delivery.” There is no end-to-end quality-of-service control to ensure that video can be viewed without serious degradation or interruption. There is no encryption of services to generate pay TV revenues and generally no TV business model.
3.5.3 Apple HTTP Live Streaming
The mainstay of streaming applications by Apple has been its QuickTime Streaming Server (QTSS), which streams video using the RTSP protocol. However, RTSP streaming in mobile environments has been difficult, due to highly variable bit rates. In fact, the iPhone 3GS does not even support RTSP-based streaming. RTSP video can also be affected by firewalls that may restrict this type of traffic.
Apple has now introduced HTTP Live Streaming, where the media file is broken down into segments of about 10 seconds each and packaged using an MPEG transport stream. These segments are picked up by the HTTP player in a receiver device such as the iPhone 3G, which begins playout after three or four segments are received and then keeps receiving further segments through HTTP requests. There is no restriction on content type, and the media player can negotiate the type of stream that best fits the bandwidth available, for example, on a 3G mobile network.
With the improvements in core and access networks, the Internet can be used for good-quality video streaming, particularly in areas where the access networks provide high speeds of 4 Mbps or higher. Content delivery networks (CDNs) such as iBEAM™, Limelight networks®, and so on provide content caching at the edge so that the streaming video can be delivered with the highest data rates. Some operators now offer services over the Internet using CDNs to deliver content on a unicast basis. The use of Flash Video for providing TV services over the Internet (replicated by the CDNs) has become a very popular method for delivery of video in the recent past. However, such content is viewable only where the access networks support the speeds needed to view live streaming video.
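The segmenting step described for HTTP Live Streaming can be sketched in a few lines of Python. The example below splits a clip into segments of roughly 10 seconds and writes the index (playlist) file that the player fetches over HTTP. It is a minimal sketch: the tag names follow the HLS playlist format as later published in RFC 8216, while the function names and the segment file names are assumptions made for illustration, and no actual media is cut.

```python
import math

def split_durations(total_seconds, target=10.0):
    """Split a clip length into segment durations of about `target` seconds."""
    full = int(total_seconds // target)
    durations = [target] * full
    remainder = total_seconds - full * target
    if remainder > 1e-9:
        durations.append(remainder)
    return durations

def build_media_playlist(durations, prefix="segment"):
    """Return the text of a playlist listing one .ts file per segment."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for index, duration in enumerate(durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"{prefix}{index}.ts")  # hypothetical file name
    lines.append("#EXT-X-ENDLIST")  # marks a finished (non-live) stream
    return "\n".join(lines) + "\n"

durations = split_durations(35.0)
playlist = build_media_playlist(durations)
print(playlist)
```

For a live stream the end marker would be omitted and the playlist rewritten as new segments appear, which is what lets the player keep requesting segments over plain HTTP.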
3.6 Flash Lite™
Nothing better signifies the importance of Flash Video than its use on YouTube as an exclusive format for delivery of video. You can upload content in a variety of formats (such as .mpg, .3gp, .avi, or QuickTime), but YouTube has standardized on the use of Flash Video for streaming (or download) from the site. It also supports Flash Lite for video delivery to mobile devices.
The number of handsets that have been shipped with the Flash Lite runtime and players preinstalled is in the hundreds of millions and is likely to exceed 2.5 billion by 2010. This makes it an important format for those targeting the delivery of any type of video content to mobile devices. The fact that YouTube is now providing access to full-length movies and other video content makes Flash Video a key format for delivery of mobile multimedia. In addition, many content delivery networks now use Flash Video.
Flash Video files are denoted by .flv (for Flash Live Video). Flash Video is generated by high-compression codecs and is ideally suited for downloading or streaming on the web. Flash version 6 used a proprietary codec called the Sorenson codec. Flash 8 and higher versions use the On2® VP6 codec, with H.264 support added in later versions. Adobe Flash Lite is a runtime version of Flash that has been optimized for mobile devices.
Quick Facts
Adobe Flash Lite
Encoding Specifications for Flash Lite Content
Encoder Support: Sorenson Spark and On2 VP6
Screen Resolution: 480×320, 320×240, or 176×144; other resolutions possible.
Flash Lite Player on Mobile Phones
Flash Lite 3.1 Distributable Player is a runtime version of Adobe Flash Player for mobile devices. Having a size of about 400 KB (for Symbian S60), it does not require the Flash Lite software to be preinstalled in phones.
Generation of Flash Lite Content
Flash content targeted for mobile devices can be generated by the Adobe Flash Media Encoding Server. The Flash encoding server can generate Flash content for both mobile devices (Flash Lite) and regular Flash players. Most other video editing packages support the saving of video in Flash Lite format. These include Adobe CS3 Professional with the Flash Lite upgrade and higher versions. Adobe After Effects® can be used to export Flash content (.swf) as Flash Video (.flv).
Adobe Mobile Client
Adobe Mobile Client is the new generation of mobile software platform from Adobe that includes a Flash-based rendering engine and device APIs. An example of a mobile phone based on the Adobe Mobile Client is the LG KP500-Cookie Phone.
Delivery of Flash Lite Content
Flash Lite content can be delivered by:
● Streaming video using RTSP
● Using Flash Media Server (with Flash Lite 3.0)
● Delivering video through an HTTP connection
● Video download and play from local storage
YouTube Access for Mobile Devices http://m.youtube.com
3.6.1 Mobile TV Using Flash Video
Delivery of video to mobile devices using Flash Video is now very common. Most of the user-generated content (UGC) sites now use Flash Video for delivery of video content to desktops as well as mobiles.
One of the early mobile TV service providers using Flash Lite is Singtel, with its mio TV service. The service operates on the 3G network of Singtel. Users need to have a supported handset with Flash Lite 2.0 or above and download the mio TV application before they can use the service. Other users can download the players from the http://www.adobe.com/products/flashlite/ website. mio TV channels were offered at Singapore $6.00 per month in bundled offerings at the time of its launch in May 2008. There is no separate data charge. The Flash content is streamed from an access point to which the application connects. Using Flash, subscribers have access to higher-quality content than would have been possible via .3GP codecs, due to superior compression at the same bit rates.
Many other operators are moving from proprietary players in mobiles to Flash players. An example is the web TV service provided by Babelgum™ in the United Kingdom, Italy, and the United States. The freely downloadable application can be used on a range of mobile devices including the iPhone and the PSP.
3.7 DivX Mobile
Movies that are downloadable in the DivX format have been very popular in home theaters and Internet-based TV. DivX mobile is now available on many DivX-certified phones. Many of the Qualcomm chipset–based mobile phones have DivX incorporated. It can also be downloaded from the DivX website for free for a wide range of mobile phones. The DivX mobile player is available for download for Windows Mobile 5 and 6 and Symbian S60 (third edition and higher), UIQ 3.0, and higher versions.
DivX-enabled phones can play DivX video directly from any website hosting content in DivX format. Alternatively, videos can be downloaded from the website http://mm.divx.com. The website also has Google Video as a partner.
Figure 3.11: DivX mobile. (Courtesy of DivX)
FAQs on DivX Mobile
1. How is DivX Mobile content created?
DivX mobile content can be created by the DivX converter that is downloadable from http://www.divx.com. This software can convert from DVDs or other video formats to DivX mobile. While converting, various parameters such as picture resolution, audio bit rate, file size, and mobile profile can be selected. The default mobile profile is for 320×240 (QVGA). There are also third-party converters available, such as Super™Converter.
2. Which phones can play DivX mobile content?
All phones that are DivX certified can play DivX content. Examples are the Samsung Omnia, the SGH-F500 (Ultra Video), the LG Viewty, and the Cookie. DivX mobile can be downloaded on phones that are not DivX certified. This includes Windows Mobile, UIQ, and Symbian phones.
3. Can DivX content be streamed?
DivX content is meant for viewing on a video on demand (VOD) basis. The DivX encoder generates file sizes that are small enough for Internet download. However, the file can begin playing after download has started and it is not necessary to download the entire file before viewing. Typical bit rates needed are 400 Kbps for video and 128 Kbps for stereo audio.
4. From which sites can DivX mobile content be downloaded?
DivX mobile content is available for download from http://mm.divx.com as well as a number of other mobile content websites, such as Nokia OVI (store.ovi.com), Cinemanow.com, and others.
5. How can content producers sell content in DivX format online?
Content producers can use an open hosted service such as an “Open Video System.” The platform provides encoding, encryption, hosting, and online delivery of content. Content producers can thus open their own video stores.
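The bit rates quoted in the FAQ (400 Kbps video plus 128 Kbps audio) translate directly into download sizes. The Python sketch below is a back-of-the-envelope estimate that ignores container overhead; the function names and the 90-minute duration are assumptions chosen for illustration.

```python
def download_size_mb(duration_s, video_kbps=400, audio_kbps=128):
    """Approximate file size in megabytes (10^6 bytes), ignoring overhead."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1_000_000

def can_play_while_downloading(link_kbps, video_kbps=400, audio_kbps=128):
    """True if the link keeps up with the combined media bit rate."""
    return link_kbps >= video_kbps + audio_kbps

movie_mb = download_size_mb(90 * 60)  # hypothetical 90-minute movie
print(f"{movie_mb:.1f} MB")
print(can_play_while_downloading(link_kbps=1000))
```

When the sustained link speed is above the combined 528 Kbps, playback can start shortly after the download begins, which matches the behavior described in the answer to question 3.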
3.7.1 Microsoft Silverlight®
Just as Flash Player has been available for free download or original equipment manufacturer (OEM) installation for many years and is now available in a majority of devices, Microsoft has been quietly but firmly increasing availability of its new web application framework called Silverlight. In fact, many events such as the Indian Premier League 2009 could be viewed only if the Silverlight plug-in was downloaded. Version 3.0 of Silverlight was released in July 2009 and is compatible with Windows and Mac OS X. Mobile versions compatible with Symbian S60 and Windows Mobile 6 are being released.
Silverlight integrates multimedia, animations, graphics, and interactivity in a single runtime environment, providing a very efficient way to present rich media applications. It supports WMV, WMA, AAC, H.264, and VC-1 multimedia formats. Applications for Silverlight are written using a text-based markup language (XAML) and can be developed using the .NET framework. Visual elements are very easy to manage using XAML and are represented as “visual trees” that can be rendered along with other vector or bitmap graphics.
Figure 3.12: Microsoft Silverlight. (Courtesy of Microsoft®)
3.7.2 Microsoft Smooth Streaming
Much like Apple, Microsoft has also adopted an HTTP-based streaming format called Smooth Streaming (something that seemed to be sorely needed by users annoyed by frequent buffering). Smooth Streaming dynamically switches the video quality of a file being delivered based on the bandwidth and CPU load. Microsoft Smooth Streaming, based on adaptive HTTP streaming, is an extension to Microsoft’s Internet Information Services (IIS 7.0 Media Services) and signals a departure from its ASF format.
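The idea of switching quality on bandwidth and CPU load can be illustrated with a small selection rule. The Python sketch below is not Microsoft’s actual heuristic; the bit-rate ladder, the safety margin, and the CPU threshold are all hypothetical values chosen to show the principle.

```python
def pick_bitrate(available_kbps, measured_kbps, cpu_load,
                 safety=0.8, cpu_limit=0.9):
    """Choose an encoded bit rate for the next chunk of video.

    available_kbps: bit rates at which the content has been encoded
    measured_kbps:  recent download throughput seen by the client
    cpu_load:       fraction of CPU in use (0.0 to 1.0)
    """
    levels = sorted(available_kbps)
    budget = measured_kbps * safety  # leave headroom for fluctuations
    affordable = [rate for rate in levels if rate <= budget]
    # Fall back to the lowest level if even that exceeds the budget.
    index = len(affordable) - 1 if affordable else 0
    # A busy CPU may drop frames at high quality, so step down one level.
    if cpu_load > cpu_limit and index > 0:
        index -= 1
    return levels[index]

ladder = [350, 700, 1500, 3000]  # hypothetical encoding ladder in Kbps
print(pick_bitrate(ladder, measured_kbps=2000, cpu_load=0.5))
print(pick_bitrate(ladder, measured_kbps=2000, cpu_load=0.95))
print(pick_bitrate(ladder, measured_kbps=200, cpu_load=0.5))
```

Because the decision is remade for every chunk, the player can move down when the network degrades and back up when it recovers, instead of stopping to rebuffer.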
3.8 Rich Media–Synchronized Multimedia Integration Language (SMIL)
Many applications require not just the display of a few images or graphics or audio files but also that these files be synchronized and presented as integrated media. An example is a voice-over associated with an image or a speech associated with a presentation. This type of synchronization enables the delivery of “rich media” and can effectively represent a playlist running at the originating end much like a TV station. Synchronized Multimedia Integration Language (SMIL), pronounced “smile,” is one technique that can accomplish this objective. SMIL is supported by Real as well as Apple’s QuickTime architectures. It is also possible to add media synchronization to HTML by using XML to allow the description of parameters for synchronization of streaming video, images, and text.
Figure 3.13: Rich media presentation using SMIL.
In the absence of a synchronization language, the images and clips are delivered as separate units that, when opened by users in differing sequence, do not present an integrated picture as the sender might have desired. SMIL is a World Wide Web Consortium (W3C) standard that allows the writing of interactive multimedia applications involving multimedia objects and hyperlinks and allows full control of the screen display. SMIL can be played out by SMIL-compatible players. The content can either be transmitted in the streaming mode (PSS) or be downloaded, stored, and played.
SMIL is similar to HTML and can be created using a text-based editor. (SMIL files have the .smil extension.) The language has parameters that can define the location and sequence of displays and prescribe the content layout, i.e., windows for text, video, and graphics. As an example, the SMIL language has commands for sequencing of clips (seq), parallel playing of clips (par), switching between alternate choices such as languages or bandwidth (switch), location of media clips on the screen (region), and others. Detailed SMIL language authoring guidelines and tools are widely available.
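Because SMIL is plain XML, a presentation can also be assembled by a program rather than typed by hand. The Python sketch below uses the standard library to build a small document that plays two clips in sequence and then a video and a text window in parallel. The element names (seq, par, region) are those named in the text; the file names, region identifiers, and dimensions are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_smil():
    """Build a small SMIL presentation as an XML element tree."""
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", width="320", height="240")
    # Screen regions: a video window with a text window underneath it.
    ET.SubElement(layout, "region", id="video_region",
                  left="0", top="0", width="320", height="180")
    ET.SubElement(layout, "region", id="text_region",
                  left="0", top="180", width="320", height="60")
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")  # children play one after another
    ET.SubElement(seq, "video", src="clip1.3gp", region="video_region")
    ET.SubElement(seq, "video", src="clip2.3gp", region="video_region")
    par = ET.SubElement(seq, "par")   # children play at the same time
    ET.SubElement(par, "video", src="weather.3gp", region="video_region")
    ET.SubElement(par, "text", src="weather.txt", region="text_region")
    return smil

document = build_smil()
print(ET.tostring(document, encoding="unicode"))
```

Generating the file this way is convenient when the playlist changes often, since only the list of clips has to be updated.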
Figure 3.14: SMIL-based content streaming.
A typical case of SMIL may be the streaming of two video clips one after another followed by a weather bulletin containing video, a text window, and a text ticker. The following code is an example of such an SMIL file using Real media files.
<smil>