File Interchange Handbook For Images, Audio, and Metadata
Editor in Chief: Brad Gilmer
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Focal Press is an imprint of Elsevier
200 Wheeler Road, Burlington, MA 01803, USA
Linacre House, Jordan Hill, Oxford OX2 8DP, UK

Copyright © 2004, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting “Customer Support” and then “Obtaining Permissions.”

Recognizing the importance of preserving what has been written, Elsevier prints its books on acid-free paper whenever possible.

Library of Congress Cataloging-in-Publication Data
Application submitted.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 0-240-80605-0

For information on all Focal Press publications visit our website at www.focalpress.com

04 05 06 07 08 09 10   9 8 7 6 5 4 3 2 1

Printed in the United States of America
Contents

Introduction vii

1 Convergence of Information Technology and Traditional Television Production 1
by Hans Hoffmann

2 Structure and Organization of Metadata and the SMPTE Metadata Dictionary 31
by Oliver Morgan

3 The Digital Picture Exchange File Format 61
by Dave Bancroft

4 SMPTE 360M: General Exchange Format 101
by Bob Edge and Ray Baldock

5 The Material Exchange Format 123
by Bruce Devlin and Jim Wilkinson

6 Advanced Authoring Format 177
by Phil Tudor

7 Advanced Systems Format 227
by Nick Vicars-Harris

8 QuickTime File Format 275
by George Towner

Index 295
Introduction
It was not long ago that the average computer had a 200 MB hard drive, a 50 MHz processor, and a connection to the outside world that consisted of either a 3.5” floppy disk or a 9,600 baud modem. If the computer was connected to a network, which was rare, the network consisted of a simple hub operating at 10 Mbps. A single, large file transfer from one computer to another brought the entire network to its knees.

In this environment, it seemed ridiculous to move a project from a video tape to a computer, where it could occupy at least 200 GB of storage. It was equally ridiculous to move film images to such a computer environment. There were a few digital film facilities, but they employed special computers and huge storage systems. To sum up, just a few years ago, computers and networks could not support professional imaging applications.

Fast forward to 2003: You cannot purchase a hard disk smaller than 80 GB. Processors run at 1 GHz or faster. Floppies are almost obsolete, having been replaced by keychain USB drives and gigabyte PCMCIA cards. Many businesses have at least a 1 Mbps DSL connection, and networks are deployed at 100BaseT or gigabit speeds for a few hundred dollars. Clearly, this is a different environment from just a few years ago.

But one additional advance has been crucial in enabling imaging applications on the desktop: compression. Using current compression technologies, that 200 GB file is now 8 GB. New compression technologies promise to reduce file sizes even more. Today, it does not seem unrealistic to transport images, sound, and metadata on a computer network. The promise of doing so on a generic computer platform is quickly becoming a reality.

Given these advances, the time is ripe for the development of file formats for our industry. But why not use generic information technology (IT) file formats? Why continue the expensive and, some might argue, arrogant past by creating additional industry-specific file formats?
The industry is using many computer-related technologies, and that trend is increasing. However, some requirements are not
met by existing generic formats. Take partial file transfer, for example. This feature comes in handy when you want to retrieve two frames of video from an 8 GB file. Another requirement is metadata support. Many of the file formats in this book have metadata support tailored for imaging applications.

In addition, many of the formats in this book represent a move by the industry from purpose-built television infrastructure to more commodity-driven IT infrastructure. To the extent that the needs of the user can be met by a commodity IT product, this is good news: users and manufacturers alike can take advantage of lower-cost commodity computer products applicable to our industry. Although we still may require some specialized file formats, we can use these file formats on commodity technology platforms.

This book covers all major file formats used for the professional interchange of images, sound, and metadata. Typically, these file formats are used to transfer content from one system to another in a film, postproduction, or broadcast facility. Out of the thousands of available file formats, the ones in this book were selected either because they are established in the industry, or because they are in development and have features targeted toward professional applications. The industry has made a huge financial commitment to the development of these file formats. It has spent millions of dollars and thousands of hours developing the formats in this book. Clearly, the industry believes these formats are extremely important.

This book is an interesting study in the advancement of technology and ideas. Digital Picture Exchange (DPX) was invented before the concept of metadata became popular, yet it envisioned a world in which the transfer of metadata with the image would be critical. The work of the EBU/SMPTE Task Force was seminal in establishing a roadmap for the future.
In retrospect, it was successful despite its long name (The EBU/SMPTE Task Force for Harmonized Standards for the Exchange of Programme Material as Bitstreams). The task force caused the reorganization of the Society of Motion Picture and Television Engineers (SMPTE), and significant work began on the standardization of several technologies, which led to many of the formats covered in this book. The General Exchange Format (GXF) built upon previous experience in the industry and took into account many advanced user requirements, such as partial file transfer and index table support. QuickTime and the Advanced Systems Format (ASF) come from the computer industry and have built on lessons learned there. The Material Exchange Format (MXF) spans the divide between the tape and the computer worlds with support for streamable content and enhanced metadata. The Advanced Authoring Format (AAF) draws from both camps, combining computer technology with extensive metadata support for film and postproduction workflows.
This book collects the writings of many of the best authorities on the subject of professional file interchange. In most cases, the authors contributed significantly to the file formats themselves. Many of the authors have a long history in the motion picture and television industry, but not surprisingly, some of them come from the computer industry, which has exerted a growing influence on, and changed, the world of professional production.

This book does not cover nonprofessional file formats. That does not mean such formats do not have a place in the professional environment. It just means that the Editor elected to keep the book at a manageable size by limiting its scope to professional formats. For the most part, this book also does not cover streaming formats. Other books cover streaming in great detail. This book is focused on the movement of images, sound, and metadata via file transfer from one system to another. Finally, this book does not cover the application of file formats. This may be added in a future edition.

Much of the work represented in this book would not have been possible if some enlightened people in the industry had not tackled issues surrounding the convergence of IT and content production. I would like to thank all the participants of the European Broadcasting Union (EBU)/SMPTE Task Force report for their landmark efforts in this area. I would also like to thank Merrill Weiss, who had the original concept for this book. Merrill has made many contributions to the industry over his career. In addition, Mike Cox has been involved in file formats for several years. Mike was one of the primary reviewers of this book, and I thank him for his time and effort.
This book is dedicated to its contributors. They are the best in the industry, and it is an honor to work with them.
CHAPTER 1

Convergence of Information Technology and Traditional Television Production

Hans Hoffmann
1.1 INTRODUCTION

The process of television program production has changed dramatically because of several technological and economic shifts. The often-cited convergence between Information Technology (IT) and traditional television production not only enabled television facility designers to benefit from increased processing power and storage capacity but also created the need to address radical changes in workflow. At the same time, challenging economic conditions have pressured broadcasters to increase efficiency, reduce production costs, and establish new businesses. Information technology offers ways to achieve these goals. It enables significantly enhanced creativity, improved efficiency, and more reasonable economies of scale in the creation of television programming. Driven by the growing demand for programming to fill the multiplicity of competitive distribution channels to consumers being installed around the world, IT likely will become a pervasive force in teleproduction.

The term IT-based production (television, audio/sound, and motion pictures) has become common in the industry. The following are some of its major characteristics:

◆ Content is handled in file form and can be transferred in non-real time.
◆ Content is transported via standard IT networks and protocols.
◆ IT-based production relies on the creation and use of metadata as computerized information.
◆ IT-based production relies on the use of identifiers (i.e., Unique Material Identifiers, or UMIDs).
◆ IT-based production requires significant efforts in information management (content and asset management systems).
◆ IT-based production comprises an end-to-end solution (e.g., from acquisition through editing to playout).
The digital transformation of the production chain, which started in the 1990s, was characterized by two significant technical standards: International Telecommunication Union-Radiocommunication (ITU-R) Recommendation BT.601 (studio encoding parameters of digital television for the standard 4:3 and wide-screen 16:9 aspect ratios) and the Serial Digital Interface (SDI) according to ITU-R Recommendation BT.656 (interfaces for digital component signals in 525-line and 625-line television systems operating at the 4:2:2 level of Recommendation ITU-R BT.601, Part A). These baseband standards were developed by user organizations and the industry in a worldwide initiative. By applying a sampling rate of 13.5 MHz for the luminance channel (Y) and 6.75 MHz for each of the color-difference channels (B-Y and R-Y), the resulting rate of 27 Megawords per second (at 10-bit resolution) necessitated a serial bit rate of 270 Mbps. Traditional digital television studio installations apply these standards for the digital encoding and for the infrastructure (backbone) used to exchange TV signals.

After the ratification of Recommendations 601 and 656, an entire industry (the traditional broadcast vendors) became established in the market and developed successful products. Several manufacturers provided components such as signal routers for moving Recommendation 601 and 656 signals between the devices in the studio environment. Others developed videotape recorders such as the D1 (8-bit uncompressed) and the D5 (8- or 10-bit uncompressed). At that time, the IT industry played a less significant role in the professional production area, because the bit rate of 270 Mbps made it difficult to provide cost-effective products, in particular for moving 270 Mbps signals over IT networks and for affordable hard-disk storage (e.g., the storage requirement for one minute of Recommendation 601 video at 8 bits is about 1.6 GB).
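The figures quoted above follow from simple arithmetic. As a quick sketch (not part of the original standards text), the rates can be checked like this:

```python
# Data-rate arithmetic for ITU-R BT.601 4:2:2 sampling (illustrative sketch).
Y_RATE = 13.5e6   # luminance (Y) samples per second
C_RATE = 6.75e6   # samples per second for each color-difference channel

words_per_second = Y_RATE + 2 * C_RATE       # 27 Mwords/s
serial_bit_rate = words_per_second * 10      # 10-bit words -> 270 Mbps

# Storage for one minute of 8-bit Recommendation 601 video:
bytes_per_minute = words_per_second * 60     # one byte per 8-bit word
gigabytes_per_minute = bytes_per_minute / 1e9

print(f"Serial bit rate: {serial_bit_rate / 1e6:.0f} Mbps")          # 270 Mbps
print(f"Storage per minute (8-bit): {gigabytes_per_minute:.2f} GB")  # ~1.62 GB
```

The result, about 1.62 GB per minute, matches the "about 1.6 GB" figure in the text.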
However, users and the industry soon discovered that, by applying digital compression to the baseband signal according to Recommendation 601, both the large storage requirements for uncompressed ITU-R BT.601 signals and the transfer bit rate could be reduced. An obvious drawback was that different, incompatible compression algorithms were introduced into the professional production environment (unlike MPEG-2 compression, widely adopted for distribution to the viewer). Most of the first server systems in the 1990s used the Motion JPEG compression algorithm. For videotape recorders, prominent examples were the Digital Betacam (compression based on the Discrete Cosine Transform, or DCT) and, more recently, the IMX (compression based on MPEG-2) and the DVCPRO (compression based on digital video, or DV).

Most compression algorithms (e.g., DCT, DV, and MPEG) reduce the bit rate by deleting information from the original signal. Because of the characteristics of the human visual system, this information usually cannot be recognized as missing. (This removal of irrelevant information is also called lossy coding.)

The existence of incompatible, and sometimes proprietary, compression algorithms implemented in products from competing vendors resulted in discord. For example, videotapes (apart from different cassette sizes and recording technologies) could not be exchanged across vendors because of different compression systems. The solution to this problem was to decode from the compressed domain to ITU-R BT.601 (baseband, uncompressed) and use the Recommendation 656 interface to transfer program material (e.g., video) between equipment from different manufacturers. This decoding and reencoding resulted in a loss of picture quality. Considering the whole production chain with all of its systems, multiple decoding and reencoding steps (or generations) led to significant impairment of the picture quality.

The first step in avoiding such picture-quality loss was the development, standardization, and finally, the market introduction of the Serial Data Transport Interface (SDTI, standardized in SMPTE 305M). This interface was developed to transport packetized data such as compressed video within a studio production environment in its native form. The transport mechanism is compatible with SDI (ITU-R BT.656 or SMPTE 259M). This helped to avoid picture-quality reduction during compressed content exchange—as long as the source and destination systems supported the same compression format.
In addition to the advantages that compression technology has delivered to the traditional broadcast environment, nonbroadcast industries (computer, IT, and multimedia, in particular) have provided solutions for the professional TV production market. This significant development typically is referred to as the convergence between the traditional broadcast and IT (computer) industries. With rapid developments in storage capacity and signal-processing power, computer equipment became a better candidate for replacing traditional video-production equipment. This has already led to dramatic changes in workflow and the way programs are produced. Nonlinear editing (NLE) became affordable with computer/server-based storage systems and has started to penetrate almost every broadcast operation.

Compression, as applied today in many professional TV products, has enabled not only efficient storage on different media but also the transfer of content via computer networks in non-real time, in real time, or faster than real time. Figure 1.1 gives an overview of the storage and interface technologies in professional TV production. It also shows the effect of compression on interface technologies—first on the development and
standardization of SDTI, which enables the exchange of compressed video signals via SDI-based studio infrastructure, and later by introducing file exchange via computer networks or removable media (e.g., optical disc).

FIGURE 1.1 Recording and interface technologies in television production
1.2 HISTORY OF FILE FORMATS IN PROFESSIONAL CONTENT PRODUCTION

As outlined, computer-based technology has increasingly proved its usefulness in many applications within the professional broadcasting environment. Examples can be found worldwide in server systems for production, postproduction, playout, and archiving. Users and vendors are witnessing the first attempts to extend the application range of hard-disk, optical-disc, and memory-based storage to mobile applications in professional news gathering. Meanwhile, broadband networking, including the continuous recording of transmitted television programs at home, has started to penetrate the consumer domain.

The common denominator in all these applications is the transport of program content and its storage on nonlinear media or via networks within proprietary file formats. Therefore, program exchange can only be carried out across platforms that can manage and exploit such proprietary file structures. Users have already expressed a strong requirement to share files between systems made by different vendors. Sharing in this context refers to the exchange of content assembled in files by means of removable media or, in particular, by directly
accessing the content stored in these files through standardized interfaces and network protocols. The operational and economic benefits of sharing files using nonproprietary file formats can be summarized as follows:

◆ Multiple users can simultaneously access data related to a common project within a distributed production environment.
◆ File exchange does not degrade picture quality, because the compressed video in the file body can be transferred in its native compressed form.
◆ File exchange can be carried out through local and wide area networks (LANs and WANs) at different speeds (i.e., slower than, equal to, or faster than real time).
◆ The speed of the file transfer can be adapted to the available channel bandwidth. (If the network allows 10 Mbps, the file is transferred at that speed; if a faster network is available and the peripheral equipment can support it, then the file can be transferred at a higher speed.)
◆ Users can balance transfer costs against transfer time.
◆ Metadata, audio, video, and data can be transferred in one wrapper.
◆ The physical media (tape, optical disc, etc.) can be separated from the content embedded in the file.
◆ A horizontal system, following a layered model, is possible.
◆ Broadcast systems can be built using readily available computer equipment, which might result in lower overall system cost.
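The bandwidth-adaptation and cost-versus-time points above reduce to a simple calculation. The following sketch (with assumed example figures, not drawn from any standard) shows how transfer time scales with link speed:

```python
# Idealized file-transfer time, ignoring protocol overhead (illustrative).
def transfer_seconds(file_bytes: float, link_mbps: float) -> float:
    """Seconds needed to move file_bytes over a link of link_mbps."""
    return file_bytes * 8 / (link_mbps * 1e6)

EIGHT_GB = 8e9  # e.g., a program file of the size mentioned in the Introduction

for mbps in (10, 100, 1000):
    minutes = transfer_seconds(EIGHT_GB, mbps) / 60
    print(f"{mbps:>4} Mbps link: {minutes:7.1f} minutes")
```

An 8 GB file that takes nearly two hours over a 10 Mbps link moves in about a minute over gigabit infrastructure, which is the trade-off between transfer cost and transfer time described above.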
Within a distributed, multiuser environment, these advantages can only be exploited if the source and destination systems can interoperate. This requires the file format and its content to be well defined and open. However, most implementations available to the broadcast market have employed different proprietary file formats. Some of them have been directly adopted from the IT industry, such as QuickTime, Audio Video Interleaved (AVI), and the Advanced Systems Format (ASF), whereas others, such as MPEG and GXF, have been developed for more demanding applications in the professional broadcast world and have been successfully standardized.

Unfortunately, the professional video market has faced challenges in adopting IT standards, in particular the numerous, incompatible, or nonstandardized file formats of the IT world. In addition, most IT file formats had difficulties complying with the emerging needs of the professional broadcast industry (e.g., file size, editing capabilities within a file, and payload neutrality). Nevertheless, the enthusiastic introduction of server-based NLE stations in most broadcast installations worldwide and the requirement to
interconnect different NLE stations has created a need for common, standardized, and open file formats that can cope with all the requirements of professional production. The major criterion is the availability of file formats that permit the exchange of information in its native form, such as compressed or uncompressed video, audio, data, or metadata (content). This exchange occurs between different systems as files, independent of the location of the users (i.e., distributed production).

Initially, many supposed that a single, standardized file format would meet the needs of the entire postproduction and broadcast community. This view had to be quickly corrected. First, professional media production involves features that vary significantly between application environments. Second, depending on the viewpoint of the user or manufacturer, a single file format standard would be either over- or underdesigned for their requirements. Third, standardizing a file format that already exists in large installations cannot satisfy state-of-the-art and future-proof demands, because the technology may already be obsolete.

Taking this situation into account, several accredited organizations in the professional TV production world—such as the European Broadcasting Union (EBU), the Society of Motion Picture and Television Engineers (SMPTE), and the Association of Radio Industries and Businesses (ARIB)—as well as the ProMPEG Forum and the Advanced Authoring Format (AAF) Association, started to develop a new file format called the Material Exchange Format (MXF). This format addresses the user requirements for mainstream, IT-based TV program production such as news, archives, and production. Substantial contribution came from users in the United States, Asia, and especially Europe. The development of MXF also had strong and broad industry support (via the ProMPEG Forum and projects of the European Commission).
It has established a degree of interoperability with AAF for the exchange of information between the graphics and postproduction environments.
1.3 MIGRATION TO SERVER- AND NETWORK-BASED TECHNOLOGIES

Users will be faced with several technology choices when introducing IT-based system components, such as computer servers or networks. Choices include the signal and coding formats for acquisition, contribution, archiving, and production; the methods of interconnecting systems; the management and control of these systems; and the file format used for storage and transfer.

Usually, any computer- or server-based system that stores television signals on nonlinear media (hard disk, optical disc, etc.) or even on data tape handles television
signals in file form. An NLE system, for example, would convert incoming SDI or SDTI streams immediately to a file for storage or other application purposes. Once files have been created, it is natural to want to maintain and exchange these files via networks or on removable media.

Since convergence began, several interface technologies have been offered for content exchange. Real-time, uncompressed digital video signals in the studio for standard digital television are still best handled by ITU-R BT.601 and ITU-R BT.656 SDI. In some applications involving the transfer of compressed video, users may wish to use the SDTI derivative of SDI, which offers not only the exchange of compressed signals (without picture-quality loss) but also faster-than-real-time transfer (e.g., four times 25 Mbps). Similar traditional interfaces are available and used for high-definition television (e.g., SMPTE 292M for uncompressed HDTV). The IT industry has been offering several interfaces or networks originating mainly from the telecommunication or multimedia market (Fig. 1.2). Here are some examples:

◆ IEEE 1394: This consumer point-to-point interface supports up to 64 nodes at speeds up to 800 Mbps. Its main application is in the real-time (and faster-than-real-time) transfer of compressed video signals. The SMPTE also has standardized the mapping of the DV-based compression scheme. In the professional domain, its application can be found in the desktop environment of VTR-to-VTR, VTR-to-computer, and hard-disk interconnects.
◆ Fibre Channel: This LAN-oriented network supports data rates up to 10 Gbps, mainly for the high-speed interchange of files. It also supports large storage arrays and server systems. Fibre Channel has become a state-of-the-art technology for internal and external interconnection of professional studio servers.
◆ Gigabit Ethernet: This uses chip technology similar to Fibre Channel and is increasingly a cost-effective LAN technology for medium-speed applications in professional studios. It seems to be an attractive technology for users who want to migrate from low-bit-rate Ethernet to higher-speed networks.
◆ ATM: This technology has become one of the dominant technologies for wide area interconnects. It supports high speeds (e.g., 622 Mbps, STM-4) and offers different transfer modes (quality of service, or QoS) for real-time and file-transfer applications.
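Whether a given link moves content slower than, at, or faster than real time follows directly from the ratio of the link rate to the material's native bit rate. A minimal sketch (the figures are examples from the text, such as 25 Mbps DV and 270 Mbps uncompressed video; real links also lose some capacity to protocol overhead):

```python
def realtime_factor(link_mbps: float, content_mbps: float) -> float:
    """How many times faster than real time the link can move the content."""
    return link_mbps / content_mbps

# 25 Mbps DV over IEEE 1394 at 400 Mbps: comfortably faster than real time.
print(realtime_factor(400, 25))       # 16.0
# 270 Mbps uncompressed video over 100BaseT: slower than real time.
print(realtime_factor(100, 270) < 1)  # True
```

A factor above 1 means faster-than-real-time transfer is possible in principle; below 1, the link can only deliver the material as a non-real-time file transfer.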
Internet Protocol (IP), with different protocols such as the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP), runs over some of the interface technologies just described. It can also be thought of as a transport mechanism. Traditionally used for on-demand, non-real-time file transfer, the
FIGURE 1.2 Network technology scenarios in TV studios
multimedia revolution has added streaming over IP. Video over IP is a term that has been used in this context for many applications, such as low-bit-rate MPEG transport in videoconferencing and content browsing. The major uses of IP in the professional broadcasting domain, however, have been the exchange of files and search and retrieval in newsroom- or archive-browsing applications.

At the beginning of its use in the professional media environment, some proprietary modifications were introduced to TCP/IP (tuned TCP buffers) and to its file-transfer application, the File Transfer Protocol (FTP). This allowed compliance with the challenging requirements of gigabyte file exchange and avoided congestion or packet loss. Today, the Internet Engineering Task Force (IETF) provides additional specifications (Requests for Comments, or RFCs) to meet the demands of handling and transmitting large files (e.g., via “Jumbograms”). Challenges remain, including the issue of signal loss during dynamic rerouting that may happen across public networks, but these challenges are being actively addressed.

Most interface technologies proposed by the IT world for professional program production were designed to operate in asynchronous modes with files, rather than with high-bit-rate, synchronous, real-time signals. This is no surprise considering that almost all original Internet traffic came from on-demand (non-real-time), file-based business and military applications. As a consequence, the broadcast industry is first addressing file-based applications as it moves to adopt IT. Eventually, many of today’s applications, such as time-consuming VTR dubbing, may be accomplished using file transfer (preferably faster than real time). Centralized storage of content, using either Storage Area Network (SAN) or Network Attached Storage (NAS) applications, will greatly reduce the need for file copying—if distributed file systems are applied.
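The “tuned TCP buffers” mentioned above correspond, in today’s socket APIs, to enlarging the send and receive buffers so that a high-bandwidth, high-latency link can be kept full of in-flight data. A generic sketch using standard sockets follows (an illustration of the idea, not any vendor’s actual modification; the link figures are assumed examples):

```python
import socket

# Bandwidth-delay product: a 1 Gbps link with a 20 ms round-trip time
# carries roughly (1e9 / 8) * 0.020 = 2.5 MB of data in flight, so the
# socket buffers must be at least that large to keep the pipe full.
BUF_SIZE = 4 * 1024 * 1024  # 4 MB, comfortably above this example's BDP

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

# The kernel may round or clamp the request; read back what was granted.
print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```

Note that operating systems often cap or adjust the granted buffer size, which is why the sketch reads the value back rather than assuming the request succeeded.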
Broadcasters need real-time transmission of content, and the IT world is offering solutions. The IETF, for example, provides additional specifications to facilitate streaming video over IP via protocols such as the Real-Time Transport Protocol (RTP), the Real-Time Control Protocol (RTCP, specified in RFC 1889, RFC 1890, etc.), and the Resource Reservation Protocol (RSVP, specified in RFC 2205). There is even an RFC that deals with the transfer of serial high-definition signals (SMPTE 292M) via IP.
1.4 GOING DIGITAL, BIT STREAMS, AND FILES: THE EBU/SMPTE TASK FORCE

In the mid-1990s, the two largest forums on professional broadcasting, the EBU and the SMPTE, began to study the effect of IT on professional television
program production. In mid-1996, EBU findings1 stated that the performance, stability, and reliability of traditional television production can only be matched using IT if users insist on open and standardized technologies. The SMPTE, a standards-setting body for professional broadcasting and production, came to similar conclusions in an independent study. The two organizations have worked closely together over the years.

During the International Broadcast Convention (IBC-Amsterdam) on September 12, 1996, the two groups decided to establish the EBU/SMPTE Task Force for Harmonized Standards for the Exchange of Program Material as Bit Streams (later referred to as the Task Force). The Task Force was charged with two assignments: “a) to produce a blueprint for the implementation of the new technologies, looking forward a decade or more, and b) to make a series of fundamental decisions that will lead to standards which will support the vision of future systems embodied in the blueprint.” Two significant reports were published as a result of the joint effort.2

In carrying out its work, the Task Force divided its effort into five fundamental areas: compression; physical link and transport layers for networks; wrappers and file formats; metadata; and file transfer protocols. Each area was assigned to a dedicated subgroup. The major aims of the Task Force were to provide a framework, principal architectures, and a structured point of view for future IT-based television production (a layered approach); to identify suitable technologies; to provide stability via standards; and to initiate new developments.
1.4.1 Interoperability: the Value and Need for Standards

Some users have been concerned that the migration toward IT in digital video systems could result in the abandonment of specific industry standards, such as those of the International Telecommunication Union (ITU), the SMPTE, and the EBU. For that reason, users tried to establish joint efforts to increase their effect on the market and to initiate standardization in several areas. It may be useful to consider standardization efforts as providing a well-balanced “force” to vendors. Attempts to over-standardize a technology might be cost-intensive and might hamper competitive products as they enter the market. Even in the case of well-accepted and proper standardization, users and the industry may need to specify additions on top of a standard to meet the last 10% of functionality that broadly based commercial systems were never designed to meet. It is important for the market that a variety of systems can be set up by “mixing and matching” products from different vendors. An example of the need for clearly defined best practices is the use of MPEG-2 4:2:2P@ML. MPEG provided the baseline standards, but the SMPTE provided additional recommendations and standards (e.g., SMPTE 356M) to define the bit-stream usage in professional production
environments. By carefully selecting “nominal” values from the ranges of choices within a standard, users can better achieve interoperability for their individual and sometimes competing applications. In other words, the role of professional broadcast standards organizations may be a combination of writing standards where none exist and advising the broadcasting industry about applicable IT standards and how they can be used in professional video applications.

As a general design philosophy, user organizations should attempt to choose open standards to which all qualified vendors can design and supply systems. Under such conditions, vendors would be able to develop products that can interoperate, while differentiating themselves in application functionality. (Users would have the benefit of selecting products from a wider range of manufacturers.) The EBU has published a document titled “Statement on Open Standards (D79)” that emphasizes this requirement.3 Similarly, the SMPTE defines, in its “Recommended Practice RP 213 Annex B,” the meaning and levels of interoperability.4 This definition of interoperability levels became important in guiding the development of standards because of the increasing complexity of systems and the adoption of horizontal system designs. For example, a single standard may define interoperability for a particular layer, such as a compressed bit-stream syntax, but this standard does not guarantee that the file within which the compressed bit stream is wrapped can be opened by the target application. Consequently, additional standards are required (in this example, the file format and the standard to map the bit stream into the file format) to achieve interoperability for a particular application.
By selecting international standards wherever possible, global competition can be maintained, providing all international players with opportunities to contribute their technologies to common systems and data exchange. The terms compatibility, interoperability, and standardization are often used almost interchangeably. Note, however, one clear distinction important for the niche market of professional TV production: Products that are interoperable or that can interchange content in a compatible way, such as via a common file format, may increase their value if this interoperability is achieved using a standard developed by an accredited and ratified standardization body. This will ultimately assist long-term stability.5 For example, files with compressed content may be stored in archives and might be accessible by today’s products. Nevertheless, only a well-defined standard, describing the technical details of the file and how to decode the compressed signals, will allow users to access the material over time. Applying these considerations to the Task Force and the subsequent work on file formats in the SMPTE, the EBU, and other bodies (such as the Pro-MPEG Forum and the AAF Association), the ideal result of a standardized file format
FIGURE 1.3 Worldwide standards bodies (TV and IT organizations such as the SMPTE, EBU, ITU-R, ITU-T, ETSI, AES, IETF, ISO, IEEE, and ANSI, together with the formats and interfaces they cover, e.g., MXF, AAF, SDI, SDTI, JPEG, MPEG, DV, IEEE 1394, SAN, NAS, FC, and SCSI).
would be a TV program encapsulated in a file format that can be exchanged between different systems. Professionals in the broadcast world have had to learn, sometimes painfully, to deal with standards-setting organizations originally set up to serve different markets, such as the telecommunications and IT worlds. In this context, the major challenge is to achieve mutual understanding about the requirements for technology, workflow, and processes when developing or adopting standards (Fig. 1.3).
1.4.2 Problem Description and Layered Approach

Television production systems of the future will integrate news environments, production, postproduction, graphics, and archives more tightly than current installations. In particular, archives will need to be open to different forms of content, and they will have to embrace both the international broadcast community and the multimedia industry. Moreover, metadata, or information about the content, will be as important as the video and audio itself. The Task Force came up with the following formula:
CONTENT = ESSENCE + METADATA

Here, essence represents video, audio, and data; metadata represents information about the essence. After the introduction of content and asset management, this equation was broadened as follows:

ASSET = CONTENT + RIGHTS

Here, the rights information is carried as metadata. The logical statement is that if you, as a broadcaster, have content but do not have the right to use it, then it has no value: it is not an asset. In fact, it may be a liability.

In future IT-based TV production scenarios, large content-information storage systems will become central to the production process and will need to be managed in an efficient way. They will become the core elements of news and production environments and will likely be the central storage and management system for the entire production process. This is a major shift in the view of the role of archive systems. Traditionally, archives have been viewed as an “end of pipe” process. With IT-based technology, however, archives are migrating to a core role in the facility.

In the traditional broadcast production environment, systems were developed with a vertical approach. This made the integration of different information types difficult. Often, solutions from one vendor made it impossible for users to replace parts of their system with products from a different vendor. This “lock-in” to one vendor’s products, often associated with proprietary signal or interface technologies, was inconvenient for many users. The IT world, on the other hand, has followed a horizontal or modular approach to systems. This was a natural consequence of being software-centric and dealing with a rapid rate of change. In addition, the rate of change in one area did not keep pace with the rate of change in other areas. This created a strong requirement for the ability to replace individual system components, rather than a whole IT infrastructure.
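The two formulas above can be sketched as a simple data model. The class and field names below are purely illustrative, not taken from any SMPTE or EBU standard:

```python
from dataclasses import dataclass


@dataclass
class Essence:
    """Video, audio, or data payload."""
    kind: str        # e.g. "video", "audio", "data"
    payload: bytes


@dataclass
class Content:
    """CONTENT = ESSENCE + METADATA."""
    essence: list    # Essence items
    metadata: dict   # information about the essence


@dataclass
class Asset:
    """ASSET = CONTENT + RIGHTS: without rights, content is not an asset."""
    content: Content
    rights: dict


clip = Content(
    essence=[Essence("video", b"\x00" * 16)],
    metadata={"title": "Evening News", "codec": "MPEG-2 4:2:2P@ML"},
)
asset = Asset(content=clip, rights={"territory": "worldwide", "expires": "2030-01-01"})
```

The point of the model is only structural: rights travel alongside the content, while metadata travels alongside the essence it describes.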
The approach of the IT world has been to follow a horizontally layered model known as the ISO reference model for open system interconnection (OSI-layer model, Fig. 1.4).6 In an ideal world, the model permits the exchange of individual layers without affecting the others and, in consequence, provides a framework for the development of standards valid for individual layers. Theoretically, this approach allows a user to upgrade the physical topology of an IT network (e.g., move from Ethernet to Gigabit Ethernet) without modifying all the applications using the network. Applying the OSI approach to file formats in practical broadcast production processes would mean, for example, that a fully standardized file format represents a single, horizontal layer of interoperability.
FIGURE 1.4 OSI-layer model (two user applications, A and B, communicate via peer-to-peer protocols across the Application, Presentation, Session, Transport, Network, Data Link, and Physical layers, with a router connecting the lower layers of the two stacks).
The different compression schemes, different interfaces, and different workflow scenarios have presented problems in all technology layers of professional production systems. The EBU/SMPTE Task Force identified at the outset of its work that the IT-based OSI model would be a suitable methodology for providing solutions to the multilayer interoperability problem in network- and server-based (IT) broadcasting environments. On a case-by-case basis, the Task Force used a simplified model of the OSI reference system to organize traditional broadcast technology according to a layered approach. Figure 1.5 provides an example of a simplified OSI model. The model will vary with the broadcast technology used in each of the layers. For example, if an SDI infrastructure is used in the networks layer, the transfer protocol layer would not exist. SDI (a traditional, vertical standard comprising several layers) is a unidirectional interface technology with its own clear framework. For instance, the start and end of active video signaling of SDI could belong to the formatting layer, rather than the transfer layer. If SDTI over SDI is used, it could be argued that this represents either a protocol layer or an addition to the network layer. In a typical IT network example, such as transferring a video file via Ethernet, the OSI-layer model is clearer: A video signal is either compressed or uncompressed and formatted into a file (file formatting). Then, the file is transferred with a QoS (e.g., the FTP application via TCP/IP) via the network. This example clearly shows the difficulties that the Task Force encountered during its work. Nevertheless, it became clear that interoperability in future professional broadcasting could only be achieved through a well-structured and well-layered approach, particularly as file formats and file transfer via networks become dominant applications.

FIGURE 1.5 Simplified OSI-layer models applied to television systems (content, e.g., video, audio, metadata, data; file formatting and transfer protocols, e.g., MXF, GXF, AAF, and FTP sessions; networks, e.g., SDI, ATM, FC, Ethernet, IP; physical, e.g., coax, fibre).

FIGURE 1.6 Storage and file transfer: simplified layer model with technology examples (file transfer: file format such as GXF or MXF, transfer protocol such as TCP/IP, network such as ATM or FC, physical such as fibre; file storage: file format such as MXF or AVI, file system such as VFAT or NTFS, low-level clustering and sectoring, magnetic hard disk).

Another important categorization of the technologies surrounding files and file formats is storage versus transfer of files (Fig. 1.6). A file is like a container holding all the program elements (content) of a specific project. The file, including its content, can be transferred over a network or stored on a storage medium as a single entity, easily identified by a unique file name for unambiguous retrieval. Once a file is opened, metadata provides a description of the essence accommodated in the file body and defines the relationships (contextual, timing, etc.) of the elements. The definition and standard for a file format is, in principle, independent of the transport mechanism (file
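The layered encapsulation idea behind the OSI model can be illustrated with a toy sketch. The layer names and framing bytes below are invented for illustration and do not correspond to any real protocol's wire format:

```python
def wrap(header: bytes, payload: bytes, trailer: bytes = b"") -> bytes:
    """Each layer adds its own framing around the payload handed down to it."""
    return header + payload + trailer


# A toy descent through a simplified stack:
content   = b"video+audio+metadata"                       # content layer
file_data = wrap(b"HDR:", content)                        # file formatting (e.g., a wrapper)
segment   = wrap(b"TCP:", file_data)                      # transfer protocol
packet    = wrap(b"IP:", segment)                         # network layer
frame     = wrap(b"ETH:", packet, b":FCS")                # data link framing

# Swapping the bottom layer (e.g., Ethernet for Gigabit Ethernet) changes only
# the outermost framing; the upper layers remain untouched.
```

This is exactly the property the Task Force wanted: each layer can be standardized, and replaced, independently of the others.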
transfer) or the storage mechanism for files (layer model, as just explained). However, the often-mentioned requirement to accommodate partial file transfers requires a degree of interrelationship between the file format and the transport mechanism. If a minor transmission error occurs during a transfer of a video file, the network protocols usually initiate a retransmission of the corrupted packets. However, if large parts of the file are corrupted, or if a user wants to transfer only part of an existing file (e.g., a few video frames from a large movie file), limited interaction between the file format layer and the file transfer mechanism is required.

A basic file format structure is shown in Figure 1.7. Usually, a file consists of a preamble with run-in sequences, followed by the body and an end-of-file marker. Editorial or descriptive information, such as metadata, is typically located in the preamble. The file body consists of the so-called payload. This can be uncompressed or compressed video, audio, data, or additional metadata. If a real-time video signal is transferred onto a hard-disk-based server (e.g., for NLE), the incoming data stream is stored as a file. In file transfers between servers over networks, the incoming signal is usually already in file form. It may be transferred directly to the storage medium, or it may need conversion to a different file format before storage. High data throughput, fast nonlinear access to the stored content, and efficient usage of storage are required. Therefore, the file format may need to be restructured to match the inherent file structure and the segmented format of the storage medium. The latter is called structured storage, low-level storage format, or native file format.
FIGURE 1.7 Example of common signal formats mapped into a generalized file format (preamble; body carrying metadata, video, audio, and data, e.g., 4:2:2P@ML per ISO/IEC and the Pro-MPEG/SMPTE operating points, DV-IEC per SMPTE 322M, and DV-based 25/50 (4:1:1 and 4:2:2) compression per SMPTE 314M/321M; end-of-file marker).
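The preamble/body/end structure just described can be sketched as a toy container. This is not MXF, GXF, or any real format; the run-in bytes, end marker, length fields, and JSON metadata encoding are all invented for illustration:

```python
import json
import struct

RUN_IN = b"\x00\x00\x00\x01"   # hypothetical run-in sequence
END    = b"EOF!"               # hypothetical end-of-file marker


def build_file(metadata: dict, payload: bytes) -> bytes:
    """Preamble (run-in + length-prefixed metadata), body (payload), end marker."""
    meta = json.dumps(metadata).encode()
    preamble = RUN_IN + struct.pack(">I", len(meta)) + meta
    body = struct.pack(">I", len(payload)) + payload
    return preamble + body + END


def parse_file(data: bytes):
    """Walk the container: check the run-in, read metadata, read payload, check the end."""
    assert data.startswith(RUN_IN), "bad run-in"
    pos = len(RUN_IN)
    (mlen,) = struct.unpack_from(">I", data, pos); pos += 4
    metadata = json.loads(data[pos:pos + mlen]); pos += mlen
    (plen,) = struct.unpack_from(">I", data, pos); pos += 4
    payload = data[pos:pos + plen]; pos += plen
    assert data[pos:] == END, "missing end-of-file marker"
    return metadata, payload


blob = build_file({"codec": "DV-based 50", "frames": 250}, b"\xde\xad" * 4)
meta, payload = parse_file(blob)
```

Note how the descriptive information sits in the preamble, ahead of the payload, so a reader can understand the body before decoding it.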
The following are the main elements that must be considered when discussing files in professional broadcasting:

◆ The format used to transfer the information as a file, which may exist only on the wire, may be different from the file format used to store the information on disk or tape.

◆ The storage format that contains the file format used to write the bits to disk or tape.

◆ The file transfer protocols being used.

◆ The API and operating system responsible for providing access to the file stored on disk or tape.
Regarding IT-based installations, the discussions on the constraints imposed on moving files between systems have not ended. In particular, streaming files with real-time capabilities poses a challenge for typical IT networks, considering TV production requirements such as full synchronization and jitter measured in nanoseconds.
1.4.3 Results of the EBU/SMPTE Task Force

Systems

To better understand the requirements of system design, the Task Force developed a model based on orthogonal parameters and intersected by an underlying control and monitoring layer. This model has been used to explore the relationships between signals, workflows, and processes, as well as networks/interfaces, control, and monitoring (management) systems. The Task Force model can be used to describe or analyze any type of program or activity. The description of part of any system can be made in terms of the model by describing the technologies used to carry each of the planes for any given layer. It can also describe the control and monitoring functions across the activities and planes. A television system can be considered as several signal-carrying planes controlled by an intersecting control plane. Each production task requires the manipulation of signals in some or all of the planes. In traditional television systems, the planes consisted of distinct physical systems: Video, audio, and data were carried on different cables. Metadata was often simply written on a piece of paper or a tape label. Future systems will not necessarily have these distinct physical systems. Instead, they will be based on
FIGURE 1.8 System model of the EBU/SMPTE Task Force (activities: pre-production, acquisition and production, post-production, distribution, storage, transmission and emission, archiving; planes: metadata, data essence, audio essence, video essence; layers: application*, network, data link, physical; all intersected by a network, resource, control and management plane. *Additional layers for communications and message exchange between system components may be required.)
networks carrying multiplexed signals. It is useful, however, to consider new IT systems in terms of a logical model in which the signal types are distinct. Figure 1.8 shows the model developed by the Task Force. Recent findings, however, suggest that a further distinction in the application layer is required to address the interaction and message exchange between system components. The additional layers required are Message Exchange Protocols, the definitions of the messages themselves (see, for example, the work of the SMPTE on Management Information Bases, or MIBs), and the API.
Compression

With the introduction of modern compression schemes (e.g., MPEG-2 4:2:2 profile and DV/DV-based) in professional production (Fig. 1.9), users have been faced with the following set of key questions:

◆ Will the data reduction provide the anticipated economic benefits without impairing the picture quality, especially considering the multiple decoding and reencoding required in most production workflows?

◆ Will the compression algorithm and the bit stream support all operational functions (e.g., editing), and will it be sufficiently standardized so that archived material can be accessed many years in the future?

◆ Will existing and future interfaces be able to transport the compressed data in an interoperable and standardized way that also allows third-party products to process the compressed signals?
FIGURE 1.9 Evolution of video compression formats over time (JPEG, MPEG 4:2:2P@ML, DV and DV-based, MPEG-4, H.264, and beyond).
The Task Force findings for audio are as follows: “The EBU/SMPTE Task Force strongly recommends that the AES-3 data stream be utilized for the carriage of all audio signals, compressed or full bit-rate. In some cases, standards will be required to define the mapping of the data into the AES stream.” The Task Force further states that the sampling rate will normally be 48 kHz (AES5-1984, reaffirmed 1992), locked to the video frame rate (AES11-1991), with 16, 20, or 24 bits per sample. With respect to purely file-oriented audio signal processing, the Broadcast Wave Format (BWF) provides an appropriate solution. It is important to recognize that this essential definition for audio must be maintained within any file format used in professional applications. In other words, a mapping of source signals such as uncompressed or compressed audio, or a mapping of the BWF into any other file format, must ensure that no modification is applied to the source format.

The Task Force findings for video were as follows: For core video applications in mainstream TV production and postproduction for standard definition television, two compression families on the market were advocated as candidates for use in future networked television production:

◆ DV/DV-based 25 Mbps with a sampling structure of 4:1:1, and DV-based 50 Mbps (SMPTE 314M) with a sampling structure of 4:2:2, using fixed bit rates and intraframe coding techniques exclusively. DV-based 25 Mbps with a sampling structure of 4:2:0 should be confined to special applications.

◆ MPEG-2 4:2:2P@ML using both intraframe encoding and Group of Pictures (GoP) structures, with data rates up to 50 Mbps. MPEG-2 MP@ML with a sampling structure of 4:2:0 should be confined to special applications.
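The audio findings above rest on a clean numerical lock between the 48 kHz sampling rate and common video frame rates, which a quick calculation confirms:

```python
# Samples of 48 kHz audio carried per video frame at common frame rates.
SAMPLE_RATE = 48_000

for fps in (24, 25, 30):
    samples = SAMPLE_RATE / fps
    print(f"{fps} fps -> {samples:g} samples per frame")

# At 25 fps, each frame carries exactly 1920 audio samples, which is why
# 48 kHz locks cleanly to the video frame rate.
```

At the NTSC-derived 30/1.001 frame rate the division is not integral (1601.6 samples per frame), but it works out over a five-frame sequence (8008 samples), so the lock still holds.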
According to the Task Force, standard definition television in uncompressed form should follow ITU-R BT.601. In HDTV applications, similar requirements for interoperability have been defined. Currently dominant compression schemes such as MPEG and DV provide enhancements to cover HDTV applications (MPEG with its high-level profile, and DV with its HDCAM and DV-based 100 Mbps derivatives).

With respect to the transport of the compressed or uncompressed data in file form, an essential user requirement has been to provide mapping standards. This documentation provides the technical information needed to map compressed bit streams into the file format in a consistent and interchangeable way. The current SMPTE standards that define the mapping of DV, DV-based, and MPEG 4:2:2 compressed video bit streams into SDTI have also been used to define the mappings into the newly defined MXF. (Uncompressed video mapping, according to ITU-R BT.601, is also being developed.) The functional and operational advantages of adopting a common layer (e.g., the compressed video bit-stream layer) both for mapping into (traditional) streaming interfaces (e.g., SDTI) and for file formats are obvious.
Wrappers and Metadata

The findings of the Task Force in the area of wrappers and metadata provided the foundation for most of the standardization work on professional file formats over the last few years, as will be discussed later in this book (Fig. 1.10). File formats and wrappers are almost synonymous. According to the user requirements,2 the principal characteristics of wrappers have been defined as follows:

◆ Wrappers should provide the means to convey and link essence and metadata in the logical and physical domains.

◆ Wrappers must support content organization in the wrapper payload area in a playable form (streaming wrapper) as well as for specific storage or content manipulation purposes (e.g., audio stored separately from video).

◆ Wrappers have to provide links to external data. This can be other wrappers, metadata in a database, essence, and so on.

As a consequence, wrappers (or file formats) have to meet several challenging functional requirements in the different application areas of electronic news gathering (ENG), postproduction, production, archiving, and so on. Further analysis has shown that these user requirements cannot be met by a single wrapper. For that reason, more than one wrapper will be required. An important task for those creating standards in this area was to ensure that future wrappers provide so-called low-processing conversion capabilities. An appropriate example of ensuring the successful development of different wrappers with low-processing conversion capabilities is the use of similar object models for AAF, whose application is proposed for postproduction environments, and MXF, to be used in mainstream TV production.

FIGURE 1.10 Basic Task Force model for wrapper/file exchange applications (software applications accessing and exchanging files that carry unique identifiers, essence, and metadata over a file interconnect).

Metadata describes the characteristics of the media information in computerized or electronic form. Metadata is, in principle, all the information that is not essence (i.e., not video, audio, or data). It may describe scripts (simple text files) for a particular shoot, business transactions, rights, or simply the name of a program. You can think of metadata as the information contained on a tape label or on printed format sheets in a tape case. All this descriptive information, now in electronic, computerized form, can be categorized as metadata. Users and the industry already have a set of standards to support metadata applications (Fig. 1.10). The SMPTE Metadata Dictionary (SMPTE Recommended Practice 210) is one successful outcome of the Task Force. Additional activities by the EBU (e.g., P_Meta, EBU TECH 3295), the International Federation of Television Archives (FIAT/IFTA), the MPEG community (e.g., MPEG-7), and the TV-Anytime Forum followed.
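Metadata elements drawn from the dictionary are encoded in binary as Key-Length-Value (KLV) triplets per SMPTE 336M: a 16-byte universal label key, a BER-encoded length, and the value. The sketch below is simplified, and the label used is illustrative rather than a registered SMPTE label (only the `06 0E 2B 34` prefix is the real SMPTE designator):

```python
def ber_encode_length(n: int) -> bytes:
    """BER length: short form below 128, long form (0x80 | byte count) otherwise."""
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def klv_encode(key: bytes, value: bytes) -> bytes:
    assert len(key) == 16, "SMPTE universal labels are 16 bytes"
    return key + ber_encode_length(len(value)) + value


def klv_decode(data: bytes):
    key = data[:16]
    first = data[16]
    if first < 128:                          # short-form length
        length, pos = first, 17
    else:                                    # long-form length
        nbytes = first & 0x7F
        length = int.from_bytes(data[17:17 + nbytes], "big")
        pos = 17 + nbytes
    return key, data[pos:pos + length]


# Illustrative (not registered) 16-byte universal label:
UL = bytes.fromhex("060e2b34" + "00" * 12)
packet = klv_encode(UL, b"Actor: Jane Doe")
key, value = klv_decode(packet)
```

Because a decoder that does not recognize a key can still skip the value using the length field, KLV streams remain parseable even when they carry unknown metadata.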
The SMPTE Metadata Dictionary represents a classified list of metadata (also called data elements) that has already grown to hundreds of entries (e.g., a data element could be the “name” of an actor). Many practical applications of metadata, however, require only some of the data elements in the dictionary. For that reason, a further categorization of metadata into sets has been defined. A good example of a set would be a list of the people who took part in a movie. The SMPTE Metadata Dictionary contains data elements for “Name,” “First Name,” and so on. Using these data elements, a set (e.g., called Actors) could be created. This set could be placed in the header of an audiovisual file, for example.

To encode metadata in binary form, the SMPTE developed a technology called the binary Key-Length-Value (KLV) protocol (SMPTE 336M). In addition, the SMPTE developed several standards to map this encoding protocol into different interfaces (e.g., SDI ancillary data, MPEG-2, and AES) to provide a transport mechanism for metadata. For metadata exchange, the findings of the EBU described two specific areas:7

1. The first area is system-to-system (S2S) exchange of metadata for interoperability purposes. The S2S point of view concentrates on the technological and implementation aspects when metadata is generated, exchanged, and processed. The interoperability architecture follows a layered model by introducing metadata definitions (i.e., the dictionary), a metadata encoding protocol (KLV or XML), and the mapping of this protocol into different transport mechanisms, as shown in Figure 1.11.

2. The second area is business-to-business (B2B) transactions, in which different applications can talk to each other, such as database-to-database or automatic billing and budgeting interactions. These interactions may occur within or between broadcasters, and may ultimately reach the viewer at home.
(For examples, see the work of the TV-Anytime Forum, available via www.tvanytime.org.) Both areas clearly show that different businesses within the broadcasting chain will require different kinds of metadata and metadata technologies. It has often been said that the success of metadata in professional TV production will go hand in hand with the successful standardization of a common file format that supports the relevant metadata standards. There will also be a broad range of applications that store the metadata in databases and work independently of any media file format. Therefore, metadata is not only applied to define and describe the essential functions and the payload (e.g., type of compression algorithm or aspect ratio) of a file; it can also be used to formulate
FIGURE 1.11 S2S, layered metadata approach (a dictionary and its subdefinitions feed a protocol encode/decode layer, e.g., KLV or XML, which maps onto different transports, e.g., MPEG-2, AES/EBU, and ancillary data, for carriage in streams, SDTI, and files).
business interaction models in which essence (video, audio, and data) is irrelevant. Practically, several applications in the IT-based production chain will require the following:

1. Some metadata is directly associated with the essence (audiovisual material). This means that it is part of the media file (e.g., located in the header of the media file containing the audiovisual material).

2. Some metadata is located in a database. There is no need to embed it in the media file, but a link (like a Web URL) from the file carrying the audiovisual material to the metadata in the database is provided.

Case 2 requires the link or association from the file containing the audiovisual material to be handled in a reliable way, with particular emphasis on maintaining the appropriate storage source and destination information if either the file with the audiovisual material or the metadata is moved. To provide an appropriate technology for managing the association between metadata and audiovisual material, the SMPTE developed the UMID (SMPTE 330M).
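Case 2 can be sketched as a lookup keyed by material identifier. The dictionary standing in for the database, the identifier string, and the function name are all illustrative; a real UMID is a 32- or 64-byte binary value defined by SMPTE 330M, not a text string:

```python
# Toy "database" keyed by a material identifier (illustrative string, not a real UMID).
metadata_db = {
    "umid-0001": {"title": "Evening News", "rights": "in-house"},
}

# Structural metadata stays inside the media file; the descriptive
# metadata is reached via the link carried in the file header.
media_file_header = {
    "umid": "umid-0001",              # link from the essence file to the database
    "codec": "MPEG-2 4:2:2P@ML",      # structural metadata kept in the file
}


def resolve_descriptive_metadata(header: dict, db: dict) -> dict:
    """Follow the link from the essence file to metadata held externally."""
    return db.get(header["umid"], {})


info = resolve_descriptive_metadata(media_file_header, metadata_db)
```

The reliability concern in the text maps directly onto this sketch: if the file or the database record moves, the link must be updated, or the lookup returns nothing.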
It has already been mentioned that the proper acceptance of a file format used in professional broadcasting will have to encompass different complexity levels. Depending on its intended application range, a file format to be used in postproduction, such as AAF, will have to provide rich functionality for picture and audio manipulation. A file format for TV production will require less complex functionality (i.e., only simple edit cuts) but may include more production-oriented and real-time functions. These functional capabilities of a file format are described in a particular type of metadata, called structural (or technical) metadata. Other metadata that is added to the file but is not required for the function of the file is called descriptive (or user) metadata. Descriptive metadata adds value to the file; examples include the name of the author, the production location, the names of participants, the rights holders, budget information, and the shooting script.

MXF supports descriptive metadata plug-in interfaces. This means that the file format is able to transfer all types of user metadata that follow the rules described in the MXF standard (described later in this book). It is important that the user requirements for transparent metadata transfer are met and that file formats treat metadata as just another type of data to be carried in the file. Proper processing of the metadata at the destination will also be required. During 2003, the EBU published an information paper8 describing the metadata implications of the introduction of file-oriented production. The paper distinguished between content-driven and information-driven approaches to introducing metadata and files in broadcast facilities.
Networks and Transfer Protocols

The results of the Task Force have shown that, to meet the user requirements for content transfer in future IT-based production environments, a clear distinction between stream and file transfer methods is necessary. This is required particularly because certain interface types (or networks) adopted from the IT world cannot meet the challenging real-time (or streaming) performance requirements of professional broadcast applications.* The most important performance parameters for real-time transfer via interfaces can be found in the first report of the Task Force on user requirements. The IT world usually prefers the term streaming to real-time transfer.
*Traditional broadcast engineers will argue that streaming (as often associated with simple multimedia applications) by no means correctly characterizes the high-end QoS requirements of professional broadcasting for bit error rate (BER), jitter/wander, bandwidth, delay, and bit rate.
1.4
Going Digital, Bit Streams, and Files: The EBU/SMPTE Task Force
In streaming mode, content is transferred in a way that maintains certain timing relationships to a clock. This allows immediate display of the content (synchronous, isochronous transmission). You will see later in this book that a streamable file format will be a key user requirement for newly introduced formats such as MXF. In streaming operations, the transport system must comply with certain QoS parameters. These define the tolerances for bit rate, delay, jitter/wander, and BER. The network topology applied to streaming is point-to-point and point-to-multipoint (broadcast) with, usually, a unidirectional data transfer. Different methods with different technical performance are used in the network and protocol layer to achieve the required QoS parameters. The most popular are UDP or RSVP on IP networks, or direct mapping of the file into the transport without additional flow control protocols (e.g., direct mapping into ATM or Fibre Channel). With respect to the streaming of a file, user requirements can be summarized as follows.

◆ Essence must be arranged in the file body in a directly playable order.
◆ Resynchronization information must be distributed over the file to permit relock after interruption.
◆ The transport system, such as networks or unidirectional links, has to meet certain QoS parameters.
◆ Depending on the network used, the file has to be transferred slightly faster than real time to compensate for the terminal buffer delays.
◆ Sufficient metadata information should be available to understand the payload to be played.
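The fourth point above can be made concrete with a back-of-the-envelope calculation. The sketch below is my own illustration, not taken from the Task Force reports; the figures (a 25 Mbit/s clip, 2 seconds of receiver buffering) are assumptions chosen only to show the arithmetic.

```python
# Illustrative sketch: estimate the minimum sustained network rate needed to
# deliver a clip "slightly faster than real time", so that the terminal
# buffer delay is absorbed before playout must begin.

def min_transfer_rate(content_bitrate_bps: float,
                      duration_s: float,
                      buffer_delay_s: float) -> float:
    """Rate needed to move duration_s of content in duration_s - buffer_delay_s."""
    total_bits = content_bitrate_bps * duration_s
    return total_bits / (duration_s - buffer_delay_s)

# A 60-second clip at 25 Mbit/s with 2 s of buffering at the receiver:
rate = min_transfer_rate(25e6, 60.0, 2.0)
print(f"{rate / 1e6:.2f} Mbit/s")  # → 25.86 Mbit/s, slightly above the payload rate
```

The margin grows as the tolerated buffer delay grows, which is why the requirement is stated relative to the network used rather than as a fixed percentage.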
In contrast to streaming, file transfer usually provides reliable transport of the information with guaranteed delivery, even under adverse conditions. Often, the terms generating a clone and generating a bit-for-bit copy are used in discussions to emphasize that no differences between the sender and destination files are permitted. This can be achieved to a limited degree by either Forward Error Correction or flow control protocols (e.g., TCP/IP) over bidirectional links that initiate a retransmission of corrupted packets, if necessary. The topologies applied include point-to-point and point-to-multipoint (reliable) transfer. Time-critical applications, in which a file has to arrive at the destination at a predetermined moment, require certain QoS parameters concerning bandwidth and bit-rate control to be met as well. Bit-rate control is required if many users need to share the bandwidth on a network; it avoids the full consumption by a single user of the available network bandwidth. The transfer time of a file is normally determined by the delay experienced when transiting the network and, in particular, by the flow-control protocols
(e.g., TCP/IP) and the delays found in the source and destination servers (buffer memories, DMA transfers, disk access, etc.). Simple solutions to avoid blocking on the network include the use of protocols that permit an adjustment in the maximum bit rate per user (Fig. 1.12). More sophisticated solutions offer QoS parameters at the network level (e.g., ATM and Fibre Channel). In applications requiring faster-than-real-time transfer, the network must provide both adequate bandwidth and bandwidth control. As shown in Figure 1.13, the Task Force was studying alternatives to the widespread FTP and TCP in an attempt to meet some crucial user requirements, such as partial file transfer and bit-rate control commands (maximum bandwidth control), and, in particular, to facilitate the demand for reliable point-to-multipoint transfer. An enhanced FTP (FTP+) that includes additional commands for partial transfer and rate control and the Xpress Transfer Protocol (XTP) have been investigated as possible candidates. Unfortunately, neither protocol succeeded
[Figure 1.12 plots transfer rate against distance: fast file transfer methods and NFS-based file access sit in the region around 100 Mbit/s, while FTP/TCP-based transfers sit around 10 Mbit/s, across local links, campus and city distances, and the WAN.]

FIGURE 1.12 File transfer modes, as defined by the Task Force
[Figure 1.13 shows four protocol stacks for file transfer and access. Fast, local Fibre Channel transfer: a custom file access application over ANSI X3T11 and FTP+, FC-4 SCSI/IP, and Fibre Channel FC-2. Core FTP: an FTP file transfer application over the FTP API (RFC 959), TCP (RFC 793), and IP (RFC 791) with best-effort QoS. Point-to-multipoint transfers: a custom file access application over FTP+ and XTP (XTP Forum). Distributed file access: NFS file system access over the NFS 3.0 API (RFC 1813) and TCP/UDP. The network level offers class-of-service and rate-set mechanisms (e.g., COS 3, COS 4, explicit QoS for IP networks, ATM, and Fibre Channel), and all stacks run over MAC and physical layers such as 802.3 (10/100/1000 Mbit/s), IEEE 1394, ATM/SONET/SDH, T1, E1, T3, E3, and Fibre Channel FC-0/1.]

FIGURE 1.13 File formats and TCP/UDP stack, as defined by the Task Force
in the follow-up standardization work (after the Task Force). To some extent, this is because the IT industry provided workarounds that include enhancements to the existing TCP/IP stack as well as TCP/IP extensions. On the other hand, work of the Pro-MPEG Forum in mid-2003 has shown increasing interest among the broadcast user community and the industry in providing “codes of practice” for point-to-multipoint transfer of files as well as for file manipulation commands (enhanced FTP for partial transfer of files).
1.5
DATA FLOW IN FUTURE CONTENT PRODUCTION

In the broadcast community, the description and classification of metadata—with the use of a defined coding scheme (KLV), including UMIDs—has been agreed on, has been standardized by the SMPTE, and is finding acceptance in the market. Metadata can be used to define and describe the essential functions of a file format, as well as the payload of a file and business interactions. As content flows through the broadcast chain, different types of data are required at different points. This applies to the ability to manage and “filter” metadata across different metadata applications, as shown in Figure 1.14. Because all video, audio, data, and metadata in future broadcast installations will be handled as information, the unique identification of each information element (e.g., through a UMID) becomes critical. However, the proper management of all information types (content and asset management systems) is challenging. For example, rules for dealing with UMID generation (if essence is copied or renewed) or prioritizing and limiting access to a company’s internal metadata (e.g., budget information) need to be set individually to meet each broadcaster’s needs. Exploiting all the advantages of files and metadata in broadcasting requires substantial homework, in terms of workflow and process analysis internally for each broadcaster, and this has often been a hampering factor in the rapid introduction of IT. On the other hand, there is increasing recognition in the broadcast community that complete, file-based production will happen. It is just a matter of time.

[Figure 1.14 traces essence and metadata from the initial idea through shooting, editing, playout, and archive to the consumer. At each stage a metadata filter (M-Filter) passes the items needed there—UMID, format, EDL, rights, and business information such as budgets—while a data I/O layer connects the chain to databases, object servers, data tape, and contribution and transmission paths; EPG/TV-Anytime data flows toward the consumer.]

FIGURE 1.14 Example for data flow applying UMID, metadata, and files
1.6
SUMMARY

This chapter has shown that files and file formats will become the predominant technology for the storage and exchange of content. Over the longer term, they may replace traditional videotapes (tapeless production). The results of the EBU/SMPTE Task Force have initiated the appropriate actions in all the technology areas concerned with future server and network-based content production environments. Traditional broadcast as well as IT companies are discovering new business opportunities in professional media and have created IT-based solutions for the broadcasters. These will result in enhanced services: more flexibility, new processes, more effective workflows in program making, and promising economic benefits. In this environment, interoperability, and consequently standardization, is mutually advantageous for users and manufacturers. File formats and their associated technologies such as material identification and metadata are key enablers of interoperability. This chapter has also shown that the large user and industry bodies of the broadcast media world, such as the EBU, the SMPTE, and MPEG, have identified the actions that will lay a technology foundation for future network and server-based broadcasting.
ACKNOWLEDGEMENTS

I would like to thank all participants in the EBU/SMPTE Task Force and, in particular, its chairmen, Horst Schachlbauer (representing the EBU) and Merrill Weiss (representing the SMPTE), for their tremendous efforts. Moreover, I want to extend my thanks to all of my former colleagues in the Institut für Rundfunktechnik, Munich, and the EBU, Geneva, and to all who have worked with me over the years in many projects of the EBU and the SMPTE.
REFERENCES

1. European Broadcasting Union, working papers of the EBU Project P/BRRTV, presented at the EBU workshop “Making the BRR Connection,” Geneva, Switzerland, November 4–5, 1996; and the EBU workshop “Totally Digital,” Geneva, July 17–18, 1996.
2. EBU/SMPTE Task Force for Harmonized Standards for the Exchange of Program Material as Bit Streams, “First Report: User Requirements,” April 1997; “Final Report: Analyses and Results,” July 1998, available via http://www.ebu.ch or http://www.smpte.org.
3. European Broadcasting Union, “Statement on Open Standards (D79),” available via http://www.ebu.ch.
4. Society of Motion Picture and Television Engineers, “Recommended Practice RP213,” available via http://www.smpte.org.
5. Katharina Grimme, “Digital Television Standardization Strategies” (Norwood: Artech House, 2002), pp. 18–19.
6. Fred Halsall, “Data Communications, Computer Networks and Open Systems” (Boston: Addison-Wesley, 1996).
7. European Broadcasting Union, publications of the EBU P/META Project, available via http://www.ebu.ch.
8. European Broadcasting Union, “Metadata Implementation Considerations for Broadcasters, Information Paper I36,” available via http://www.ebu.ch.
CHAPTER 2

Structure and Organization of Metadata and the SMPTE Metadata Dictionary

Oliver Morgan
2.1
INTRODUCTION TO METADATA

In the early days of digital media, little thought was given to metadata. Recording, mixing, editing, and display systems typically worked with a single media format. The challenges of coaxing sufficient performance from the hardware to process the pixels were such that the idea of building agile decoders seemed ludicrous. Equipment was designed to be single-purpose, and most essence parameters were burned into hardware. In this context, it is little wonder that metadata was limited to controlling the few switches that existed—such as selecting 625 lines or 525 lines, or choosing one of ten possible wipe patterns. Further, the industry was vertically integrated, and it was possible to obtain most or all components of a system from a single supplier. Design engineers felt at liberty to specify a unique metadata code for each switch. They were asked to share the specification only with those immediately before and after them in the signal chain. As the number of picture formats, viable compression methods, recording formats, and video effects devices increased, it became untenable to design (or, more usually, lash together) a new system for every combination of parameters. The only solution was to carry with the essence an adequate set of information to configure each step along the production chain automatically. Many focused efforts in the 1980s covered subjects such as universal time and control code, universal VTR remote control, universal switcher effects control, universal ancillary data format, and universal edit list format. In each case, “universal” really meant “within this device category.” Unfortunately, the pace of innovation exceeded the pace of standardization. Every year, a set of product
innovations would stretch the previous year’s standards to their breaking point and beyond. This was aggravated by the increasing penetration of software into the television industry. This software was inherently more flexible and, at the same time, needed a broader definition of parametric information; therefore, the developers repeated the tradition of inventing from scratch. By the mid-1990s, this dysfunctional cycle was causing serious problems for production professionals and equipment manufacturers alike, so much so that the industry embarked on the creation of a new architecture for defining, formatting, and transmitting this information, dubbed metadata. This became the SMPTE Metadata Dictionary. Some lessons from previous decades had to be remembered in the new architecture:
◆ Innovations will continue apace, so the new architecture must stretch to include them.
◆ There will be several solutions to every design challenge, and the new architecture cannot favor one at the expense of others.
◆ Neither hardware nor software will go away, so a happy medium must be found between the convenience of hardware engineers and the convenience of software developers.
◆ It will remain uncommon to be able to design a studio from scratch. As old and new devices will coexist, they must be enabled to provide metadata to each other through appropriate translation.
◆ To prolong their service life, all devices and processors must be encouraged to deliver as much metadata as possible to downstream devices.
◆ During its service life, every device will be used in systems beyond its original design goals. Downstream devices must be prepared to encounter strange new metadata and to select those necessary and appropriate for their operation.

2.1.1

Historical Metadata Formats

Metadata is not new; every file format and signal format includes it. Depending on the originating device, several formatting methods have been used.
Digital Video

The most prevalent approach in digital video (DV) is to insert a small number of bit flags in predetermined places in the signal outside the displayable area, usually associated with the picture vertical or horizontal refresh. Some examples of this technique include IEC 61834 DV signals,1 the SMPTE 352M signal identifier,2 and SDTI-CP system items.3 An example of this from the SMPTE DV-based specification is shown in Figure 2.1.

[Figure 2.1 is a byte-by-byte table of a DV pack: byte 1 carries the secondary data type; byte 2 carries the VIDEO INVALID, AUDIO INVALID, LOCK, and TRANSFER MODE flags; bytes 3–4 describe the video (V STYPE, V 50/60, FF, FS, V REC MODE, BCSYS, AP1–AP3, APT, DISP); and bytes 5–10 describe audio channels A-1 and A-2 (LF, REC ST, REC END, REC MODE, DRF, PA, AUDIO MODE, EF, SMP, QU, SPEED, CHN, A 50/60, A STYPE).]

FIGURE 2.1 Example from the SMPTE 322M DV-based specification
Wave

Another simple approach is to start a file with a block of metadata. A good example of this file header approach is the Wave audio file format, used (with some variations) by personal computers. It is also found in broadcast as the EBU Broadcast Wave Format (BWF),4 shown in Figure 2.2.
-> fmt( ) -> struct {
    WORD  wFormatTag;       // Format category
    WORD  nChannels;        // Number of channels
    DWORD nSamplesPerSec;   // Sampling rate
    DWORD nAvgBytesPerSec;  // For buffer estimation
    WORD  nBlockAlign;      // Data block size
}

FIGURE 2.2 Wave header example from EBU T3285
Features of interest in the Wave file header include the following:

◆ The use of a short “magic byte sequence” to identify the format (in this case, RIFF WAVE)
◆ The use of an encoded value to refer to external tables and signal specifications (in this case, the FORMAT TAG field)
◆ The specification of several numerical parameters of the file content (in this case, SAMPLE RATE and NUMBER OF CHANNELS) as full range numbers, even though only a few of the possible values are in common use
◆ The undesirable introduction of special cases and format variations based upon known rules, sometimes flagged by special values of a certain parameter (in this case, BITS PER SAMPLE)

This technique is used by many other formats, including Digital Picture Exchange (DPX)5 (Chapter 3 of this book), JPEG,6 and MPEG.7
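The fixed-header approach of Figure 2.2 is easy to decode precisely because the field positions are fixed. The following is a minimal sketch in Python, assuming the Wave convention of little-endian storage; the helper name and dictionary keys are my own choices, not part of any specification.

```python
import struct

def parse_fmt_chunk(data: bytes) -> dict:
    """Decode the fixed fields of a RIFF/Wave 'fmt ' chunk body.

    The five fields match the structure in Figure 2.2: two 16-bit WORDs,
    two 32-bit DWORDs, and one WORD, all little endian ("<HHIIH").
    """
    tag, channels, sample_rate, avg_bytes, block_align = struct.unpack(
        "<HHIIH", data[:14])
    return {
        "format_tag": tag,            # encoded value referring to an external table
        "channels": channels,
        "samples_per_sec": sample_rate,
        "avg_bytes_per_sec": avg_bytes,
        "block_align": block_align,
    }

# A hand-built chunk body: format tag 1 (PCM), stereo, 48 kHz, 16-bit samples
body = struct.pack("<HHIIH", 1, 2, 48000, 48000 * 2 * 2, 4)
info = parse_fmt_chunk(body)
print(info["channels"], info["samples_per_sec"])  # → 2 48000
```

Note how nothing in the bytes themselves says what each field means: the reader must already know the layout, which is exactly the fragility the next technique addresses.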
Tagged Image File Format

Another useful technique makes each parameter self-identifying. Instead of the fixed format tables of the previous example, a simple and regular syntax allows variations and extensions on a basic specification with greater reliability. A good example is taken from the Tagged Image File Format (TIFF),8 shown in Figure 2.3.
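As a sketch of how such self-identifying parameters are decoded, the 12-byte IFD entry layout of Figure 2.3 can be unpacked in a few lines. The helper name and returned keys are mine; a complete reader would also honor the byte order declared in the TIFF header and the inline-value rules for small fields.

```python
import struct

def parse_ifd_entry(entry: bytes, little_endian: bool = True) -> dict:
    """Decode one 12-byte TIFF IFD entry: tag, field type, count, value offset."""
    prefix = "<" if little_endian else ">"
    tag, field_type, count, value_offset = struct.unpack(prefix + "HHII", entry)
    return {"tag": tag, "type": field_type,
            "count": count, "value_offset": value_offset}

# Tag 256 (ImageWidth), field type 3 (SHORT), one value, stored at offset 640
entry = struct.pack("<HHII", 256, 3, 1, 640)
print(parse_ifd_entry(entry)["tag"])  # → 256
```

Because every entry carries its own tag, a reader can skip tags it does not recognize, which is what makes the format extensible.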
IFD Entry

Each 12-byte IFD entry has the following format:

Bytes 0–1    The tag that identifies the field
Bytes 2–3    The field type
Bytes 4–7    The number of values, count of the indicated type
Bytes 8–11   The value offset, the file offset (in bytes) of the value for the field. The value is expected to begin on a word boundary; the corresponding value offset will thus be an even number. This file offset may point anywhere in the file, even after the image data.

FIGURE 2.3 Simple metadata syntax

Markup

Another common technique is to add additional metadata into a preexisting format by specifying an escape sequence. An early example of this is the UNIX “troff” format, in which the underlying format is plain text, and the escape sequence is a line beginning with a period. An example in digital video is the venerable Edit Decision List (EDL) format,9 in which the underlying format is a series of fixed format text records, and the escape sequence is a record beginning with a keyword. An example is shown in Figure 2.4. This technique has developed into structured markup, the most current example being Extensible Markup Language (XML).10 Various aspects of XML make it particularly suitable as a metadata format. These are discussed in greater detail later in this chapter.

Untitled
0028 0001 VA1A2 KB 99          00:03:33.00 00:03:39.00 01:02:33.00 01:02:39.00
0028 0002 VA1A2 KI 0480 01:00  00:04:28.00 00:04:31.00 01:02:34.00 01:02:37.00
SPLIT-IN   CH1=-01:00 CH2=+01:25 CH3=+00:00 CH4=+00:00
SPLIT-OUT  CH1=+01:00 CH2=-01:25 CH3=+00:00 CH4=+00:00

FIGURE 2.4 Section of an EDL
2.1.2
Why a Public Dictionary?

The most important development in metadata has been the creation of a public dictionary. The definition of every metadata parameter used in digital media has been gathered in a single place. The dictionary records only a minimal set of information about each parameter, and includes a reference to the defining document that specifies the meaning of the parameter.
Benefits

The first benefit of a public dictionary is that you can refer to a single database and use the public name or identifier of an item to validate and interpret it. Thus, you can always use a consistent method. As digital media becomes more elaborate, files are beginning to include a wider range of items from a wider range of communities. A central reference point greatly simplifies the process of decoding, interpretation, and display. A less obvious benefit is that a public dictionary discourages unnecessary duplication of effort. When it is easy to find out if a piece of metadata has been previously defined, reuse is easier, and new devices are less likely to specify the same parameter (such as video frame rate) in subtly different and incompatible ways.
Undesirable Alternative

Recent practice shows many examples of the undesirable alternative. With a little effort, you could discover four or five ways of specifying video frame rate, all in use at the same time within a single project. Looking deeper, you would find that each method uses different encoding of a few bits to represent a repertoire of common frame rates, and that each provides a different escape mechanism to signal less common variants.* Worse, to uncover these differences, you would have to know where to find the different reference documents. Only then could you consider what might happen if one device or program was configured incorrectly. For example, it is not uncommon to find defects such as lip sync problems caused by incompatible matching of 47.952, 48.000, and 48.048 kHz audio.
Downside

The primary complaint against a public dictionary is the required compilation effort. In the initial phases, it often seems the return on the effort is minimal, especially when each item is only used in one place. For the compilers of the dictionary, it is also dispiriting to catalog multiple variants of similar items and to research the underlying specifications to determine if they are the same or different definitions. The benefits only become obvious with new development.
Mixing Private and Public Metadata

A second concern is the many private vocabularies of metadata, often with valid reasons to remain private. This seems to undermine the concept of a public dictionary. The SMPTE Metadata Dictionary addresses this concern by providing dedicated areas to record the existence of private metadata. In these areas, it is not required to document the metadata fully, if a link is provided for authorized users to discover further details. Organizations registering such metadata are encouraged to reveal as much as they feel appropriate. Knowing that private metadata exists and knowing who can interpret it is a benefit to the users of digital media systems. Armed with this information, not only users but also equipment can decide which simple action to take: find out more, or enable a specific process on a file.
*MPEG, DV, AVI, SMPTE 352M, and SMPTE 258M are examples of different methods of describing frame rate.
2.1.3
Required Technologies

To create such a dictionary, it is necessary to differentiate between the name of an item and its purpose, as well as between the representation of an item of metadata and its meaning. The name of the item says where to place it in the dictionary so it can be found; it is important for this name to be unique. A description of the purpose of the item tells which of a collection of similar items is most appropriate in a given context. The representation of an item explains how it is encoded and formatted in text or binary form. The meaning of an item specifies how a range of values relates to the outside world—either as algorithmic mapping, such as a count of horizontal pixels, or as a set of discrete alternative choices, such as which compression technology has been used on a signal.
Unique Identifiers

The names of metadata items need to be unique. There are several good algorithms for creating universal names, typified by three standards: ASN.1 Object Identifiers,11 ISO Universal Unique Identifiers (UUIDs),12 and Standard Generalized Markup Language (SGML) Public Identifiers.13 Today, these are better known by their prevalent implementations: SMPTE Universal Labels (ULs),14 Globally Unique Identifiers (GUIDs),15 and XML namespace-qualified names,16 respectively. Each of these three techniques combines a publicly registered part and a locally assigned part. In the case of SMPTE ULs, the entire label is registered publicly (except for the private part of labels for private metadata). For GUIDs, the registered part is an Ethernet node address, and the local part is a high-precision timestamp. For XML names, the registered part is the Uniform Resource Identifier (URI) scheme name, and the local part is a combination of a public identifier called the namespace identifier and a unique text tag defined in the schema for each class of XML documents. Figure 2.5 summarizes the characteristics of these unique identifier algorithms.

                         SMPTE UL                             GUID                                    XML QName
Based on                 ISO ASN.1 OID                        ISO UUID                                ISO SGML Public
Binary length (typical)  16 bytes                             16 bytes                                N/A
Standard text form       SMPTE 298M or IETF RFC 3061          IETF Internet Draft                     W3C Recommendation
Registered part          All 128 bits                         48 bits                                 URI Scheme and Domain Name
Registration authority   SMPTE-RA                             IEEE (24 bits), Manufacturer (24 bits)  IANA
Locally defined part     48 or more bits (for type 13 or 14)  Timestamp                               Programmer’s responsibility

FIGURE 2.5 Comparison of Unique Identifiers

The SMPTE Metadata Dictionary uses SMPTE ULs. These are specified by SMPTE 298M as a string of groups of 4 bytes. (In practice, most SMPTE ULs are 16 bytes.) The uniqueness of SMPTE labels is provided by an ASN.1-compliant magic byte sequence, 0x060e2b34, and a series of subidentifiers that form an expanding tree of nodes and possible leaves. Each metadata item is a leaf on the tree. The top level of the tree is shown in Figure 2.6. GUIDs are also 16-byte numbers. Because of peculiarities of the encoding in GUIDs and ULs, it is possible to use either method to allocate unique identifiers within a 16-byte space without collision. The SMPTE Metadata Dictionary does not exploit this, although both the Advanced Authoring Format (AAF) and the Material Exchange Format (MXF) do. XML namespace-qualified names may require a long string of characters to represent them. To manage this, XML employs namespace tags, explained in detail later in this chapter. It is possible to map any SMPTE UL onto an XML name using a URI scheme such as IETF RFC 3061,17 although a dedicated method is now under discussion.
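As a small illustration of these 16-byte identifier spaces, the sketch below checks for the SMPTE UL magic byte sequence mentioned above and generates a node-plus-timestamp GUID with Python's standard uuid module. The helper name is mine, and the example label bytes are illustrative only.

```python
import uuid

# Every SMPTE UL begins with the ASN.1-compliant magic byte sequence 0x060e2b34.
SMPTE_UL_PREFIX = bytes([0x06, 0x0E, 0x2B, 0x34])

def is_smpte_ul(identifier: bytes) -> bool:
    """True if a 16-byte identifier sits in the SMPTE UL space."""
    return len(identifier) == 16 and identifier.startswith(SMPTE_UL_PREFIX)

# An illustrative 16-byte label built from the SMPTE prefix
label = SMPTE_UL_PREFIX + bytes([0x04, 0x01, 0x01, 0x01,
                                 0x0D, 0x01, 0x02, 0x01,
                                 0x01, 0x01, 0x01, 0x00])
print(is_smpte_ul(label))  # → True

# A type-1 GUID combines a registered node address with a timestamp,
# yet occupies the same 16 bytes.
guid = uuid.uuid1()
print(len(guid.bytes))     # → 16
```

Because the SMPTE prefix cannot occur in the positions UUIDs reserve for their version and variant bits, the two allocation methods can share the 16-byte space, which is the property AAF and MXF rely on.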
SMPTE Metadata Dictionary
   1  Identification
   2  Administration
   3  Interpretive
   4  Parametric
   5  Process
   6  Relational
   7  SpatioTemporal
  13  Publicly Registered
  14  Organizationally Registered
  15  Experimental (Transient)

FIGURE 2.6 Top-level SMPTE metadata nodes

Identifier Allocation Strategies

The tree structure of SMPTE ULs is used primarily to provide an expandable numbering method that ensures the availability of additional unique numbers in the tree at all levels of subidentifiers. Each subidentifier occupies at least 1 byte; but once the first 126 values have been allocated (ignoring zero), the subidentifier grows to occupy 2 bytes. Growth to 3 bytes would occur if the first 16,383 values had been allocated, and so on. Each time a subidentifier moves up a size, fewer bytes are left for allocation to deeper levels in the tree. This method of variable-length coding is called BER Object Identifier coding.18 The top-level nodes in the SMPTE Metadata Dictionary were shown in Figure 2.6. A SMPTE Engineering Guideline19 enumerates several levels of subnodes, but it is the dictionary20 that contains the latest list of subnodes. You can find the dictionary at http://www.smpte-ra.org/mdd. The hierarchical allocation of identifiers makes it possible to construct a recursive parsing algorithm that uses a series of small lookup tables, one for each subidentifier within each recognized node. This approach may be used by some applications, in particular those that build a database of metadata definitions dynamically. However, it is typical to treat the entire 16 bytes of a UL as an opaque byte string and to recognize known labels using a logarithmic search across an ordered list or search tree. The hierarchical allocation tends to leave unused zero bytes at the end of a UL. It is tempting to assign these byte positions to encode the values of metadata items. Unfortunately, whenever this is done, it becomes impossible to parse the UL properly as an opaque number, because each permissible value of encoded data adds a new entry to the lookup table. Therefore, this approach is avoided except when the number of distinct values is limited. Several examples of acceptable encoding of values can be found in the SMPTE Metadata Dictionary (e.g., UMID). In each case, the encoded variations on the base UL number no more than a dozen. In documents that define collections of metadata items, it has become customary to include a table showing the derivation of the identifier sublevels.
An example is given in Figure 2.7. When the final bytes of the label contain encoded value data, these usually are shown as wildcard values. In these cases, all permissible values are registered in the SMPTE Metadata Dictionary. An encoded value data of zero is not permitted, and the base label with trailing zeros is also registered in the dictionary as a node. These wildcard nodes sometimes are referred to as degenerate labels. An example is shown in Figure 2.8. Its appearance in the SMPTE Metadata Dictionary is shown in Figure 2.9.
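The variable-length growth rule described above can be sketched in a few lines. This is an illustrative BER object identifier subidentifier encoder of my own, not code from the standard: each byte carries 7 bits of the value, with the high bit set on every byte except the last.

```python
def encode_subidentifier(value: int) -> bytes:
    """BER Object Identifier coding of one subidentifier:
    7 bits per byte, continuation bit (0x80) on all but the last byte."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

print(encode_subidentifier(5).hex())      # → 05      (small values fit in 1 byte)
print(encode_subidentifier(200).hex())    # → 8148    (2 bytes once 7 bits overflow)
print(encode_subidentifier(16384).hex())  # → 818000  (3 bytes beyond 14 bits)
```

This is why allocating many values at one level of the tree consumes bytes that would otherwise be available to deeper levels.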
Byte No.  Description                     Value (hex)  Meaning
1         Object Identifier               06h
2         Label size                      0Eh
3         Designator                      2Bh          ISO, ORG
4         Designator                      34h          SMPTE
5         Registry Category Designator    02h          KLV Sets & Packs
6         Registry Designator             05h          Fixed Length Packs (no length fields)
7         Structure Designator            01h          Set / Pack registry
8         Version Number                  01h          Registry Version 1
9         Item Designator                 0Dh          Organizationally registered
10        Organization                    01h          AAF Association
11        Application                     02h          MXF File Structure
12        Structure Version               01h          Version 1
13        Structure Kind                  01h          MXF File Structure sets & packs
14        Set / Pack Kind                 05h          Primer Pack
15        Primer version                  01h          Version of the Primer Pack
16        Reserved                        00h

FIGURE 2.7 Example derivation of an SMPTE UL

Byte No.  Description                     Value (hex)  Meaning
1         Object Identifier               06h
2         Label size                      0Eh
3         Designator                      2Bh          ISO, ORG
4         Designator                      34h          SMPTE
5         Registry Category Designator    04h          Labels
6         Registry Designator             01h          Labels
7         Structure Designator            01h          Labels
8         Version Number                  01h          Registry Version 1
9         Item Designator                 0Dh          Organizationally Registered
10        Organization                    01h          AAF Association
11        Application                     02h          Operational Patterns
12        Structure Version               01h          Version 1
13        Operational Pattern Definition  xxh          Item Complexity
14~16     Definition depends on byte 13   xxh

FIGURE 2.8 Example derivation of an SMPTE UL with wildcards

06.0E.2B.34.04.01.01.01  0D.01.02.00.00.00.00.00  Application
06.0E.2B.34.04.01.01.01  0D.01.02.01.00.00.00.00  Structure Version
06.0E.2B.34.04.01.01.01  0D.01.02.01.01.01.qq.00  MXF OP1a, Single Item, Single Package
06.0E.2B.34.04.01.01.01  0D.01.02.01.01.02.qq.00  MXF OP1b, Single Item, Ganged Packages
06.0E.2B.34.04.01.01.01  0D.01.02.01.01.03.qq.00  MXF OP1c, Single Item, Alternate Packages
06.0E.2B.34.04.01.01.01  0D.01.02.01.02.01.qq.00  MXF OP2a, Play-list Items, Single Package
06.0E.2B.34.04.01.01.01  0D.01.02.01.02.02.qq.00  MXF OP2b, Play-list Items, Ganged Packages
06.0E.2B.34.04.01.01.01  0D.01.02.01.02.03.qq.00  MXF OP2c, Play-list Items, Alternate Packages
06.0E.2B.34.04.01.01.01  0D.01.02.01.03.01.qq.00  MXF OP3a, Edit Items, Single Package
06.0E.2B.34.04.01.01.01  0D.01.02.01.03.02.qq.00  MXF OP3b, Edit Items, Ganged Packages
06.0E.2B.34.04.01.01.01  0D.01.02.01.03.03.qq.00  MXF OP3c, Edit Items, Alternate Packages

FIGURE 2.9 Wildcard entries in the SMPTE Metadata Dictionary

Data Representations

Each metadata item has a preferred representation—for example, as an unsigned binary number or a text string. In many cases, several representations of the same data value are used. For example, it is common for some text items to be constrained to use 7-bit ISO 646 characters in many contexts; in other contexts, they are represented in 16-bit Unicode (UTF-16). Similarly, integer values may be represented as text strings or binary values. Even binary values need additional description, because some hardware or software store numbers with the least significant byte first (so-called little endian), whereas others store the most significant byte first (big endian). In other cases, the choice depends on context. Every known representation of a metadata item is registered in the SMPTE Metadata Dictionary. Whenever an item is used in a file, the correct UL for the preferred representation in that file must be given. This permits file translators to reformat items reliably. Note that many file formats take a straightforward approach: in AAF and MXF, all text strings are UTF-16, whereas the General Exchange Format (GXF) and DPX use ISO 8859-1 8-bit characters (not Unicode UTF-8). Usually, the general representation is registered first, and constrained representations are registered sequentially with the next subidentifier. This is the only case in which a leaf in the dictionary has leaves beneath it.
Uniform Syntax The SMPTE Metadata Dictionary is often erroneously thought to be tied to the representation of metadata in binary form—specifically, in KLV-encoded form according to SMPTE 336M. This is not so. The SMPTE Metadata Dictionary is primarily concerned with cataloging items and collections of metadata and their known representations, independent of the transmitted form. The documents concerned with this are EG37; SMPTE 335M, 395M, and 400M; and a forthcoming document on the structure of a registry of types for individual values, enumerated values, and collections of values.
2
42
Structure and Organization of Metadata and the SMPTE Metadata Dictionary
KLV encoding is orthogonal to these documents. It defines a uniform syntax for binary serialization, not only of metadata but also for pure essence streams. KLV encoding is discussed later in this chapter.
Mapping of Legacy Metadata Definitions The SMPTE Metadata Dictionary is not exclusive. It permits several entries for representations of metadata items, including representations that differ only in the encoding used for particular values of an enumerated type, and encoding that are bit fields as opposed to numbers. This permissive approach was taken, among other reasons, to encourage the mapping of legacy metadata formats into the dictionary. For example, metadata items particular to SMPTE 268M DPX have been added to the dictionary, even though they are not used elsewhere and similar items are already in the dictionary. As demand arises, future revisions of legacy documents may add crossreferences to the appropriate metadata dictionary entries in each of their tables, following the style adopted for MXF documents.
Metadata Dictionary Structure The record structure of entries in the dictionary is defined by SMPTE 335M. For each entry, the dictionary includes the following: ◆
Key: The SMPTE UL for the item, including the dictionary version number at the time this item was introduced
◆
Name: A plain text name, not necessarily suitable for machine processing
◆
Symbol: A name that conforms to relevant computer language syntax restrictions (such as XML and other popular languages)
◆
Description: For human understanding
◆
Defining Document: A reference to the document that precisely defines the meaning of this item, or to an authoritative source for such information
◆
Type Specification: A textual description of the type; ULs for types are being added as links into the forthcoming SMPTE Types Registry
◆
Value Length and Range restrictions
◆
Node/Leaf: If the entry is a node or a leaf in the naming tree
◆
Administrative Notes: Such as whether the item has been approved, is a place holder, or has been deprecated (not recommended for new equipment)
2.1 Introduction to Metadata
2.1.4
Metadata Registries Another name for a public dictionary, especially one updated regularly, is a registry. In 1999, SMPTE set up the SMPTE Registration Authority, an independent nonprofit organization, to create, administer, and publish dictionaries (or registries). The registries administered by the SMPTE Registration Authority can be found at http://www.smpte-ra.org. They include the SMPTE Metadata Dictionary; other SMPTE registries are being added.
2.1.5
Comparison with XML Compared to the organization of the SMPTE Metadata Dictionary, XML technology takes a different approach. A simple example of the same metadata shown in both KLV and XML format is in Figure 2.10. The contents of XML documents are written according to XML dialects, which may be created as required. XML dialects consist of a collection of definitions for named elements and the specifications for combining these elements. Dialects may be expressed formally in one or more description languages, the most well-known of which are Document Type Definitions (DTDs)10 and XML Schema.21 These are not the same. DTDs are thought to be obsolete, primarily because of their lack of support for the qualified names used in namespaces—although some of the more interesting facilities of DTDs have no clear counterpart in XML Schema. (For instance, DTDs still provide the only method of describing any external nonXML files associated with an XML document, such as binary essence data.) XML Schema was approved in 2002 and is in widespread use. An individual XML document declares the dialects to which it conforms through a set of namespace declarations at the head of the document. XML Schema formally defines namespaces and provides a wealth of facilities for combining, extending, and revising them. There is no central registry of namespaces or their schemas, although a namespace identifier must be unique according to the rules for URIs. Several approaches to locating, obtaining, and applying schemas are defined by the XML Schema specification. These facilities of XML Schema are useful, but much of their flexibility is contrary to creating an XML dialect that faithfully uses the SMPTE Metadata Dictionary. For example, to preserve the ability to check metadata against the dictionary, every time new items are added to the dictionary, a new XML Schema must be created that includes all previous versions of the dictionary and adds the
43
2
44
Structure and Organization of Metadata and the SMPTE Metadata Dictionary
KLV 06 0a 06 29
0e 30 0e 06 04 06 01 06 0e 0b 00 00
2b 31 2b 0e 30 0e 30 2b 01 02
34 32 34 2b 31 2b
01 33 02 34 32 34
01 34 01 01 33 01
01 35 01 01
01 36 01 01
01 37 02 01
00 38 00 01
00 00 00 00 00 00 39 00 00 00 00 00 00 01 00 00 00 00 00 00
01 01 01 01 02 00 00 00 00 00 00
34 02 33 01 04 00 00 00 00 00 00 00 04 30 31 32 33 01 39
Direct Conversion to XML 0123456789 0123 0 0123 9
XML Namespace Example Here is a script with embedded elements from several namespaces unrelated to the original