THE INDUSTRIAL COMMUNICATION TECHNOLOGY HANDBOOK

INDUSTRIAL INFORMATION TECHNOLOGY SERIES
Series Editor: RICHARD ZURAWSKI

Forthcoming Books
Embedded Systems Handbook, edited by Richard Zurawski
Electronic Design Automation for Integrated Circuits Handbook, Luciano Lavagno, Grant Martin, and Lou Scheffer
THE INDUSTRIAL COMMUNICATION TECHNOLOGY HANDBOOK

Edited by RICHARD ZURAWSKI
Library of Congress Cataloging-in-Publication Data

The industrial communication technology handbook / Richard Zurawski, editor.
p. cm. — (The industrial information technology series ; 1)
Includes bibliographical references and index.
ISBN 0-8493-3077-7 (alk. paper)
1. Computer networks. 2. Data transmission systems. 3. Wireless communication systems. I. Zurawski, Richard. II. Series.
TK5105.5.I48 2005
670'.285'46—dc22    2004057922
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher. All rights reserved. Authorization to photocopy items for internal or personal use, or the personal or internal use of specific clients, may be granted by CRC Press, provided that $1.50 per page photocopied is paid directly to Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA. The fee code for users of the Transactional Reporting Service is ISBN 0-8493-3077-7/05/$0.00+$1.50. The fee is subject to change without notice. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. The consent of CRC Press does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press for such copying. Direct all inquiries to CRC Press, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.
Visit the CRC Press Web site at www.crcpress.com © 2005 by CRC Press No claim to original U.S. Government works International Standard Book Number 0-8493-3077-7 Library of Congress Card Number 2004057922 Printed in the United States of America 1 2 3 4 5 6 7 8 9 0 Printed on acid-free paper
Foreword
A handbook on industrial communication technology! What a challenge! When we know the complexity of industrial applications, the number of possible solutions, the number of standards, the variety of applications, of contexts, and of products! The challenge can be expressed in just a few words: applications diversity, need for networking, integration of functions, and technologies.

Applications diversity: The applications concerned with industrial communications are known under the following terms: process control, manufacturing and flexible systems, building automation, transport management, utilities, and embedded systems, in trains, aircraft, cars, etc. All these applications need similar services, but in very different environments and also with very different qualities of service.

Need for networking: The need for networking is not new. Since the MAP and TOP projects in the field of automation, it has been clear that the future of automation lies in distributed systems supported by distributed (heterogeneous) communication systems. The sharing of information, the necessity of interoperability, and the necessity of abstraction levels are just some of the reasons why industrial communication has always been considered a major challenge.

Integration: In all domains, integration is a key word, meaning that all the functions in an enterprise need to be interconnected, in real time, as much as possible. This is only feasible through the use of robust communication systems, real-time features, and coherent design of the applications. With the development of ubiquitous computing and ambient intelligence, industrial communication applications will become the next challenge.

Technologies: Numerous technologies are available for use at different levels of control and command and in all the services provided by a company; in addition, they exist for maintenance, supervision and monitoring, diagnosis, spare parts management, and so on. Specific solutions are frequently dictated by specific problems. The importance of standards cannot be overemphasized.

Wireless systems, fieldbuses and cell or plant networks, building automation, device buses and applications, embedded systems, Internet technologies and related applications, security and safety, MAC protocols, and representative application domains are just some of the topics treated in this handbook. Methodology considerations for choosing and developing systems are also presented. This handbook will become the major reference source for this domain. Setting aside some technological details, the methods and principles presented will be relevant for years to come.

Putting together such a book would not have been possible without the cooperation of a great number of authors, all specialists in their fields and involved in the development of communication systems and applications, as well as the members of the International Advisory Board. The Industrial Communication Technology Handbook is a must for industrial communication professionals.

Jean-Pierre Thomesse
Institut National Polytechnique de Lorraine
Nancy, France
International Advisory Board
Jean-Pierre Thomesse, LORIA-INPL, France (Chair)
Salvatore Cavalieri, University of Catania, Italy
Dietmar Dietrich, Vienna University of Technology, Austria
Jean-Dominique Decotignie, CSEM, Switzerland
Josep M. Fuertes, Universitat Politècnica de Catalunya, Spain
Jürgen Jasperneite, Phoenix Contact, Germany
Chris Jenkins, Proces-Data, U.K.
Ed Koch, Akua Control, U.S.
Thilo Sauter, Austrian Academy of Sciences, Austria
Viktor Schiffer, Rockwell Automation, Germany
Wolfgang Stripf, Siemens AG, Germany
Preface
Introduction

Aim

The purpose of The Industrial Communication Technology Handbook is to provide a reference useful for a broad range of professionals and researchers from industry and academia interested in or involved in the use of industrial communication technology and systems. This is the first publication to cover this field in a cohesive and comprehensive way. The focus of this book is on existing technologies used by the industry and on newly emerging technologies and trends, the evolution of which is driven by actual needs and by industry-led consortia and organizations. The book offers a mix of basics and advanced material, as well as overviews of recent significant research and implementation/technology developments. The book is aimed at novices as well as experienced professionals from industry and academia. It is also suitable for graduate students. The book covers extensively the areas of fieldbus technology, industrial Ethernet and real-time extensions, wireless and mobile technologies in industrial applications, linking the factory floor with the Internet and wireless fieldbuses, industrial networks’ security and safety, automotive applications, industrial automation applications, building automation applications, energy systems applications, and others. It is an indispensable companion for those who seek to learn more about industrial communication technology and systems and for those who want to stay up to date with recent technical developments in the field. It is also a rich source of material for any university or professional development course on industrial networks and related technologies.

Contributors

The book contains 42 contributions, written by leading experts from industry and academia directly involved in the creation and evolution of the ideas and technologies treated in the book. Over half of the contributions are from industry and industrial research establishments at the forefront of the developments shaping the field of industrial communication technology, for example, ABB, Bosch Rexroth Corporation, CSEM, Decomsys, Frequentis, Phoenix Contact, PROCES-DATA, PSA Peugeot-Citroen, PROFIBUS International, Rockwell Automation, SERCOS North America, Siemens, and Volcano. Most of the mentioned contributors play a leading role in the formulation of long-term policies for technology development and are key members of the industry–academe consortia implementing those policies. The contributions from academia and governmental research organizations come from some of the most renowned institutions, such as Cornell University, Fraunhofer, LORIA-INPL, the National Institute of Standards and Technology (U.S.), Politecnico di Torino (Italy), the Singapore Institute of Manufacturing Technology, the Technical University of Berlin, and the Vienna University of Technology.

Format

The presented material is in the form of tutorials, surveys, and technology overviews, combining fundamentals with advanced issues, making this publication relevant to beginners as well as seasoned professionals
from industry and academia. Particular emphasis is on the industrial perspective, illustrated by actual implementations and technology deployments. The contributions are grouped in sections for cohesive and comprehensive presentation of the treated areas. The reports on recent technology developments, deployments, and trends frequently cover material released to the profession for the first time.

Audience

The handbook is designed to cover a wide range of topics that comprise the field of industrial communication technology and systems. The material covered in this volume will be of interest to a wide spectrum of professionals and researchers from industry and academia, as well as graduate students, from the fields of electrical and computer engineering, industrial and mechatronic engineering, mechanical engineering, computer science, and information technology.
Organization

The book is organized into two parts. Part 1, Basics of Data Communication and IP Networks, presents material that covers in a nutshell the basics of data communication and IP networks. This material is intended as a handy reference for those who may not be familiar with or wish to refresh their knowledge of some of the concepts used extensively in Part 2. Part 2, Industrial Communication Technology and Systems, is the main focus of the book and presents a comprehensive overview of the field of industrial communication technologies and systems. Some of the topics presented in this part have received limited coverage in other publications, due either to the fast evolution of the technologies involved, material confidentiality, or limited circulation in the case of industry-driven developments.

Part 1 includes six chapters that present in a concise way the vast area of IP networks. As mentioned, it is intended as supplementary reading for those who would like to refresh and update their knowledge without resorting to voluminous publications. This background is essential to understand the material presented in the chapters in Part 2. This part includes the following chapters: “Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks,” “IP Internetworking,” “A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues,” “Fundamentals in Quality of Service and Real-Time Transmission,” “Survey of Network Management Frameworks,” and “Internet Security.”

Part 2 includes five major sections: Field Area and Control Networks, Ethernet and Wireless Network Technologies, Linking Factory Floor with the Internet and Wireless Fieldbuses, Security and Safety Technologies in Industrial Networks, and Applications of Networks and Other Technologies.

Field Area and Control Networks

The section on fieldbus technology provides a comprehensive overview of selected fieldbuses. The focus is on those most widely used in industry and most widely known. The presentation is not exhaustive, however; one of the limiting factors was the availability of qualified authors to write authoritatively on the topics. This section begins with “Fieldbus Systems: History and Evolution,” presenting an extensive introduction to fieldbus technology, a comparison and critical evaluation of the existing technologies, and the evolution and emerging trends. This chapter is a must for anyone with an interest in the origins of the current fieldbus technology landscape. It is also compulsory reading for novices to understand the concepts behind fieldbuses.

The chapter “The WorldFIP Fieldbus” was written by Jean-Pierre Thomesse, one of the pioneers of fieldbus technology. WorldFIP is one of the first fieldbuses, developed in France at the beginning
of the 1980s and widely used nowadays, particularly in applications that require hard real-time constraints and high dependability. This is almost a “personal” record of a person involved in the development of WorldFIP.

A brief record of the origins and evolution of the FOUNDATION Fieldbus (H1, H2, and HSE) and its technical principles is presented in the chapter “FOUNDATION Fieldbus: History and Features.”

The description of PROFIBUS (PROFIBUS DP) is presented in “PROFIBUS: Open Solutions for the World of Automation.” This is a comprehensive overview of PROFIBUS DP, one of the leading players in the fieldbus field, and it includes material on HART on PROFIBUS DP, application, master, and system profiles, and integration technologies such as GSD (general station description), EDD (electronic device description), and DTM (device type manager).

The chapter “Principles and Features of PROFInet” presents a new automation concept, and the technology behind it, that has emerged as a result of the trend in automation technology toward modular, reusable machines and plants with distributed intelligence. PROFInet is an open standard for industrial automation based on industrial Ethernet. The material is presented by researchers from the Automation and Drives Division of Siemens AG, the leading provider of automation solutions within Siemens AG.

Dependable time-triggered communication and architecture are presented in “Dependable Time-Triggered Communication,” written by Hermann Kopetz et al. Hermann Kopetz is the inventor of the concept and the driving force behind the technology development. The TTP (Time-Triggered Protocol) and TTA (Time-Triggered Architecture) had a profound impact on the development of safety-critical systems, particularly in the automotive industry. This is one of the most authoritative presentations on this topic.

The time-triggered CAN (TTCAN) protocol was introduced by Bosch in 1999 with the aim of making CAN suitable for the new needs of the automotive industry. This technology is introduced in “Controller Area Network: A Survey,” a chapter that describes the main features of the Controller Area Network (CAN) protocol, including TTCAN.

The chapter “The CIP Family of Fieldbus Protocols” introduces the following CIP (Common Industrial Protocol)-based networks: DeviceNet, a CIP implementation employing a CAN data link layer; ControlNet, implementing the same basic protocol on new data link layers that allow for much higher speed (5 Mbps), strict determinism, and repeatability while extending the range of the bus (several kilometers with repeaters); and EtherNet/IP, in which CIP runs over TCP/IP. The chapter also introduces CIP Sync, a CIP-based communication principle that enables synchronous low-jitter system reactions without the need for low-jitter data transmission. This is important in applications that require much tighter control of a number of real-time parameters characterizing hard real-time control systems. The chapter also overviews CIP Safety, a safety protocol that adds additional services to transport data with high integrity.

The P-NET fieldbus is presented in the chapter “The Anatomy of the P-NET Fieldbus.” The chapter was written by the chairman of the International P-NET User Organization and the technical director of PROCES-DATA (U.K.) Ltd., which provides the real-time PC operating system for P-NET.
The chapter “INTERBUS Means Speed, Connectivity, Safety” introduces INTERBUS, a fieldbus with over 6 million nodes installed, and a broad base of device manufacturers. The chapter also briefly introduces IP over INTERBUS and looks at data throughput for IP tunneling. The IEEE 1394 FireWire, a high-performance serial bus, principles of its operation, and applications in the industrial environment are presented in “Data Transmission in Industrial Environments Using IEEE 1394 FireWire.” The issues involved in the configuration (setting up a fieldbus system) and management (diagnosis and monitoring, and adding new devices to the network, to mention some activities) of fieldbus systems
are presented in “Configuration and Management of Fieldbus Systems.” This chapter also discusses the plug-and-participate concept and its implementations in the industrial environment.

The section on fieldbus technology is concluded by an excellent chapter discussing the pros and cons of selecting control networks for specific applications and application domains. The material in this chapter is authored by Jean-Dominique Decotignie. It includes many practical recommendations that can be useful for practicing professionals. It is the kind of material that cannot easily be found in the professional literature.

Ethernet and Wireless Network Technologies

This section on Ethernet and wireless/mobile network technologies contains four chapters discussing the use of Ethernet and its variants in industrial automation, as well as selected issues related to wireless technologies. Ethernet is fast becoming a de facto industry standard for communication in factories and plants at the fieldbus level. The random and native CSMA/CD (carrier-sense multiple access with collision detection) arbitration mechanism is being replaced by other solutions allowing for the deterministic behavior required in real-time communication to support soft and hard real-time deadlines. The idea of using wireless technology on the factory floor is appealing, since fieldbus stations and automation components can be mobile, and furthermore, the need for (breakable) cabling is reduced. However, the wireless transmission characteristics are fundamentally different from those of other media types, leading to comparably high and time-varying error rates. This poses a significant challenge for fulfilling the hard real-time and reliability requirements of industrial applications.

This section begins with the chapter “Approaches to Enforce Real-Time Behavior in Ethernet,” which discusses various approaches to ensure real-time communication capabilities, including those that support probabilistic as well as deterministic analysis of the network access delay. This chapter also presents a brief description of the Ethernet protocol. The practical solutions to ensure real-time communication capabilities using switched Ethernet are presented in “Switched Ethernet in Automation Networking.” This chapter provides an evaluation of the suitability of switched Ethernet in the context of industrial automation and presents practical solutions obtained through R&D to address actual needs. The issues involving the use of wireless and mobile communication in the industrial environment (factory floor) are discussed in “Wireless LAN Technology for the Factory Floor: Challenges and Approaches.” This is a very comprehensive chapter dealing with topics such as error characteristics of wireless links and lower-layer wireless protocols for industrial applications. It also briefly discusses hybrid systems involving extending selected fieldbus technologies (such as PROFIBUS and CAN) with wireless stations. The chapter “Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment” concludes this section. This chapter discusses, from the radio network perspective, the potentials and limits of technologies such as Bluetooth, IEEE 802.11, and ZigBee for deployment in industrial environments.
Linking Factory Floor with the Internet and Wireless Fieldbuses

The demand for process data availability at different levels of factory organizational hierarchy, from production to the business level, has caused an upsurge in the activities to link the “factory floor” with the intranet/Internet. The issues, solutions, and technologies for linking industrial environments with the Internet and wireless fieldbuses are extensively discussed in this section.
The issues and actual and potential solutions behind linking factory floor/industrial environments with the Internet/intranet are discussed in “Linking Factory Floor and the Internet.” This chapter also discusses new trends involving industrial Ethernet. The chapter “Extending EIA-709 Control Networks across IP Channels” presents a comprehensive overview of the use of the ANSI/EIA-852 standard to encapsulate the ANSI/EIA-709 control network protocol. This contribution comes from authors from industry involved directly in the relevant technology development. The means for interconnecting wireline fieldbuses to wireless ones in the industrial environment, various design alternatives, and their evaluation are presented in “Interconnection of Wireline and Wireless Fieldbuses.” This is one of the most comprehensive and authoritative discussions of this challenge, presented by one of the leading authorities on fieldbus technology.

Security and Safety Technologies in Industrial Networks

Security in the field area networks employed in the industrial environment is a major challenge. The requirement for process data availability via intranet/Internet access opens possibilities for intrusion and potentially hostile actions that may result in engineering system failures, including catastrophic ones if, for instance, chemical plants are involved. These and safety issues are the focus of this section. This section begins with the chapter “Security Topics and Solutions for Automation Networks,” which provides a comprehensive discussion of the issues involved, challenges, and existing solutions amenable to adaptation to industrial environments, and outlines a need for new approaches and solutions. The second chapter in this section is “PROFIsafe: Safety Technology with PROFIBUS,” which focuses on the existing solutions and supporting technology in the context of PROFIBUS, one of the most widely used fieldbuses in industrial applications. The material is presented by some of the creators of PROFIsafe. CIP Safety, a safety protocol for CIP, is presented in the Field Area and Control Networks section in “The CIP Family of Fieldbus Protocols.”

Applications of Networks and Other Technologies

This is the last major section in the book. It has eight subsections dealing with specialized field area networks (synonymous with fieldbuses) and their applications, covering automotive communication technology, building automation, manufacturing message specification in industrial communication systems, motion control, train communication, smart transducers, energy systems, and SEMI (Semiconductor Equipment and Materials International). This section tries to present some of the most representative applications of field area networks outside the industrial controls and automation presented in the Field Area and Control Networks section.

The “Automotive Communication Technologies” subsection has four chapters discussing different approaches, solutions, and technologies. The automotive industry is a very fast-growing consumer of field area networks, aggressively adopting mechatronic solutions to replace or duplicate existing mechanical/hydraulic systems. This subsection begins with the chapter “Design of Automotive X-by-Wire Systems,” which gives an overview of the X-by-wire approach and introduces safety-critical communication protocols (TTP/C, FlexRay, and TTCAN) and operating systems and middleware services (OSEKTime and FTCom) used in automotive applications.
The chapter also presents a comprehensive case study illustrating the design of a Steer-by-Wire system. The newly emerging standard and technology for automotive safety-critical communication — FlexRay — is presented in the chapter “FlexRay Communication Technology.” The material is among the most
comprehensive and authoritative available at the time of this book’s publication, and it is written by industry people directly involved in the standard and technology development. The LIN (Local Interconnect Network) communication standard, enabling fast and cost-efficient implementation of low-cost multiplex systems for local interconnect networks in vehicles, is presented in “The LIN Standard.” The Volcano concept and technology for the design and implementation of in-vehicle networks using the standardized CAN and LIN communication protocols are presented in “Volcano: Enabling Correctness by Design.” The material comes directly from the source, Volcano Communications Technologies AG. This chapter provides insight into the design and development process of an automotive communication network.

Another fast-growing consumer of field area networks is building automation. At this stage, particularly for office, commercial, and industrial complexes, the use of automation solutions offers substantial financial savings on lighting and HVAC costs and can considerably improve the quality of the environment. There are other benefits as well. Relevant communication solutions for this application domain are presented in the subsection “Networks in Building Automation.” This subsection is composed of three contributions, outlining the issues involved and the specific technologies currently in use. An excellent introduction to issues, architectures, and available solutions is presented in “The Use of Network Hierarchies in Building Telemetry and Control Applications.” The material was written by one of the pioneers of the concept of building automation and a technology developer. The details of the European Installation Bus (EIB), a field area network designed specifically for building automation purposes, are presented in “EIB: European Installation Bus.” This chapter was contributed by one of the most active proponents of using field area networks in building automation and a co-founder of one of the largest research groups in this field, at the Vienna University of Technology. The chapter “Fundamentals of LonWorks/EIA-709 Networks: ANSI/EIA-709 Protocol Standard (LonTalk)” introduces the technical aspects of LonWorks networks, one of the main contenders for building automation. It covers protocol, development environments, and tools.

The subsection “Manufacturing Message Specification in Industrial Automation” focuses on the highly successful international standard MMS (manufacturing message specification), which is an Open Systems Interconnection (OSI) application layer messaging protocol designed for the remote control and monitoring of devices such as remote terminal units (RTUs), programmable logic controllers (PLCs), numerical controllers (NCs), robot controllers (RCs), etc. This subsection features two chapters: “The Standard Message Specification for Industrial Automation Systems: ISO 9506 (MMS),” which gives a fairly comprehensive introduction to the standard and illustrates its use; and “Virtual Factory Communication System Using ISO 9506 and Its Application to Networked Factory Machine,” which shows the use of MOTIP (MMS on top of TCP/IP) in the development and operation of a virtual factory environment. The chapter also discusses an MMS-based Internet monitoring system.

The chapter “The SERCOS interface™” describes the international standard (IEC/EN 61491) for communication between digital motion controls, drives, input/output (I/O), and sensors.
It includes definitions, a brief history, a description of SERCOS interface communication methodology, an introduction to SERCOS interface hardware, a discussion of speed considerations, information on conformance testing, and information on available development tools. A number of real-world applications are presented and a list of sources for additional information is provided. The “IEC/IEEE Train Communication Network” chapter presents details of the international standard IEC 61375, adopted in 1999. It also discusses other European and U.S. initiatives in this field.
“A Smart Transducer Interface Standard for Sensors and Actuators” presents material on the IEEE 1451 standards for connecting sensors and actuators to microprocessors, control and field area networks, and instrumentation systems. The standards also define the Transducer Electronic Data Sheet (TEDS), which allows for the self-identification of sensors. The IEEE 1451 standards facilitate sensor networking, a new trend in industrial automation, which, among other benefits, offers strong economic incentives.

The use of IEC 61375 (Train Communication Network) in substation automation is presented in “Applying IEC 61375 (Train Communication Network) to Data Communication in Electrical Substations.” This is an interesting case study illustrating the suitability of some of the field area networks for various application domains.

The last subsection and chapter in the Applications of Networks and Other Technologies section is “SEMI Interface and Communication Standards: An Overview and Case Study.” This is an excellent introduction to SEMI, providing an overview of the fundamentals of the SEMI Equipment Communication Standard, commonly referred to as SECS, its interpretation, the available software tools, and case study applications. The material was written by experts from the Singapore Institute of Manufacturing Technology who were involved in a number of SEMI technology developments and deployments.
Locating Topics

To assist readers with locating material, a complete table of contents is presented at the front of the book. Additionally, each chapter begins with its own table of contents. For further assistance, two indexes are provided at the end of the book: an index of authors who contributed to the book, together with the titles of their contributions, and a detailed subject index.
Acknowledgments
I thank all members of the International Advisory Board for their help with structuring the book, selection of authors, and material evaluation. I have received tremendous cooperation from all contributing authors. I thank all of them for that. I also express gratitude to my publisher, Nora Konopka, and other CRC Press staff involved in the book’s production, particularly Jessica Vakili, Elizabeth Spangenberger, and Gail Renard. My gratitude goes also to my wife, who tolerated the countless hours I spent preparing this book.

Richard Zurawski
ISA Corp.
Santa Clara, CA
The Editor
Dr. Richard Zurawski is president and CEO of ISA Corp., South San Francisco and Santa Clara, CA, a company involved in providing solutions for industrial and societal automation. He is also chief scientist with and a partner in a Silicon Valley-based start-up involved in the development of wireless solutions and technology. Dr. Zurawski is a co-founder of the Institute for Societal Automation, Santa Clara, a research and consulting organization.

Dr. Zurawski has over 25 years of academic and industrial experience, including a regular appointment at the Institute of Industrial Sciences, University of Tokyo, and full-time R&D advisor with Kawasaki Electric, Tokyo. He has provided consulting services to Telecom Research Laboratories, Melbourne, Australia, and Kawasaki, Ricoh, and Toshiba Corporations, Japan. He has participated in an IMS package: Formal Methods in Distributed Autonomous Manufacturing Systems and Distributed Logic Controllers, Task 8: Distributed Intelligence in Manufacturing Systems; Globeman 21 Group I: Global Product Management. He has also participated in a number of Japanese Intelligent Manufacturing Systems programs. Dr. Zurawski’s involvement in R&D projects and activities in the past few years includes remote monitoring and control, network-based solutions for factory floor control, network-based demand side management, MEMS (automatic microassembly), Java technology, SEMI (Semiconductor Equipment and Materials International) implementations, development of DSL telco equipment, and wireless applications.

Dr. Zurawski currently serves as an associate editor of the IEEE Transactions on Industrial Electronics and Real-Time Systems: The International Journal of Time-Critical Computing Systems, Kluwer Academic Publishers. He was a guest editor of three special sections in IEEE Transactions on Industrial Electronics: two sections on factory automation and one on factory communication systems. He has also been a guest editor of a special issue of the Proceedings of the IEEE dedicated to industrial communication systems. In addition, Dr. Zurawski was invited by IEEE Spectrum to contribute material on Java technology to “Technology 1999: Analysis and Forecast Issue.”

Dr. Zurawski is the series editor for The Industrial Information Technology Series, CRC Press, Boca Raton, FL, and has served as a vice president of the Institute of Electrical and Electronics Engineers (IEEE) Industrial Electronics Society (IES), chairman of the Factory Automation Council, and chairman of the IEEE IES Ad Hoc Committee on IEEE Transactions on Factory Automation. He was an IES representative to the IEEE Neural Network Council and IEEE Intelligent Transportation Systems Council. He was also on a steering committee of the ASME/IEEE Journal of Micromechanical Systems. In 1996, he received the Anthony J. Hornfeck Service Award from the IEEE Industrial Electronics Society.

Dr. Zurawski has established two IEEE events: the IEEE Workshop on Factory Communication Systems, the only IEEE event dedicated to industrial communication networks; and the IEEE International Conference on Emerging Technologies and Factory Automation, the largest IEEE conference on factory automation. He has served as a general, program, and track chair for a number of IEEE conferences and workshops.
Dr. Zurawski has published extensively on various aspects of control systems, industrial and factory automation, industrial communication systems, robotics, formal methods in the design of embedded and industrial systems, and parallel and distributed programming and systems. Currently, he is preparing The Embedded Systems Handbook, soon to be published by CRC Press.
Contributors
Luís Almeida, Universidade de Aveiro, Aveiro, Portugal
Herbert Barthel, Siemens AG, Nürnberg-Moorenbrunn, Germany
Günther Bauer, Vienna University of Technology, Vienna, Austria
Ralph Büsgen, Siemens AG, Nürnberg, Germany
Salvatore Cavalieri, University of Catania, Catania, Italy
Gianluca Cena, IEIIT-CNR, Torino, Italy
Jean-Dominique Decotignie, Centre Suisse d’Electronique et de Microtechnique, Neuchatel, Switzerland
Wilfried Elmenreich, Vienna University of Technology, Vienna, Austria
Joachim Feld, Siemens AG, Nürnberg, Germany
A.M. Fong, Singapore Institute of Manufacturing Technology, Singapore
Klaus Frommhagen, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
K.M. Goh, Singapore Institute of Manufacturing Technology, Singapore
Zygmunt J. Haas, Cornell University, Ithaca, New York
Scott C. Hibbard, Bosch Rexroth Corporation, Hoffman Estates, Illinois
Helmut Hlavacs, University of Vienna, Vienna, Austria
Mai Hoang, University of Potsdam, Potsdam, Germany
Øyvind Holmeide, OnTime Networks, Billingstad, Norway
Jürgen Jasperneite, Phoenix Contact GmbH & Co. KG, Bad Pyrmont, Germany
Ulrich Jecht, UJ Process Analytics, Baden-Baden, Germany
Christopher G. Jenkins, PROCES-DATA (U.K.) Ltd., Wallingford, Oxon, United Kingdom
Svein Johannessen, ABB Corporate Research, Billingstad, Norway
Wolfgang Kampichler, Frequentis GmbH, Vienna, Austria
Wolfgang Kastner, Vienna University of Technology, Vienna, Austria
Dong-Sung Kim, Kumoh National Institute of Technology, Gumi-Si, South Korea
Hubert Kirrmann, ABB Corporate Research, Baden, Switzerland
Edward Koch, Akua Control, San Rafael, California
Hermann Kopetz, Vienna University of Technology, Vienna, Austria
Christopher Kruegel, Vienna University of Technology, Vienna, Austria
Christian Kurz, University of Vienna, Vienna, Austria
Ronald M. Larsen, SERCOS North America, Lake in the Hills, Illinois
Kang Lee, National Institute of Standards and Technology, Gaithersburg, Maryland
Y.G. Lim, Singapore Institute of Manufacturing Technology, Singapore
Lucia Lo Bello, University of Catania, Catania, Italy
Dietmar Loy, LOYTEC Electronics GmbH, Vienna, Austria
Peter Lutz, Interests Group SERCOS interface e.V., Stuttgart, Germany
Kirsten Matheus, Carmeq GmbH, Berlin, Germany
Dietmar Millinger, DECOMSYS — Dependable Computer Systems, Vienna, Austria
Petra Nauber, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Nicolas Navet, LORIA, Vandoeuvre-lès-Nancy, France
Georg Neugschwandtner, Vienna University of Technology, Vienna, Austria
Roman Nossal, DECOMSYS — Dependable Computer Systems, Vienna, Austria
Paulo Pedreiras, Universidade de Aveiro, Aveiro, Portugal
Stefan Pitzek, Vienna University of Technology, Vienna, Austria
Manfred Popp, Siemens AG, Fürth, Germany
Antal Rajnák, Volcano AG, Tägerwilen, Switzerland
Thilo Sauter, Austrian Academy of Sciences, Wiener Neustadt, Austria
Uwe Schelinski, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Viktor Schiffer, Rockwell Automation, Haan, Germany
Michael Scholles, Fraunhofer Institute of Photonic Microsystems, Dresden, Germany
Christian Schwaiger, Austria Card GmbH, Vienna, Austria
Karlheinz Schwarz, Schwarz Consulting Company (SCC), Karlsruhe, Germany
Françoise Simonot-Lion, LORIA, Vandoeuvre-lès-Nancy, France
Tor Skeie, ABB Corporate Research, Billingstad, Norway
Ye Qiong Song, LORIA, Vandoeuvre-lès-Nancy, France
Stefan Soucek, LOYTEC Electronics GmbH, Vienna, Austria
Wilfried Steiner, Vienna University of Technology, Vienna, Austria
Wolfgang Stripf, Siemens AG, Karlsruhe, Germany
Jean-Pierre Thomesse, Institut National Polytechnique de Lorraine, Vandoeuvre-lès-Nancy, France
O. Tin, Singapore Institute of Manufacturing Technology, Singapore
Albert Treytl, Vienna University of Technology, Vienna, Austria
Adriano Valenzano, IEIIT-CNR, Torino, Italy
Peter Wenzel, PROFIBUS International, Karlsruhe, Germany
Andreas Willig, University of Potsdam, Potsdam, Germany
Cédric Wilwert, PSA Peugeot–Citroen, La Garenne Colombe, France
Hagen Woesner, Technical University of Berlin, Berlin, Germany
K. Yi, Singapore Institute of Manufacturing Technology, Singapore
Pierre A. Zuber, Bombardier Transportation, Total Transit Systems, Pittsburgh, Pennsylvania
Contents
Part 1
Basics of Data Communication and IP Networks
1 Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks (Andreas Willig and Hagen Woesner) 1-1
2 IP Internetworking (Helmut Hlavacs and Christian Kurz) 2-1
3 A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues (Lucia Lo Bello) 3-1
4 Fundamentals in Quality of Service and Real-Time Transmission (Wolfgang Kampichler) 4-1
5 Survey of Network Management Frameworks (Mai Hoang) 5-1
6 Internet Security (Christopher Kruegel) 6-1
Part 2
Industrial Communication Technology and Systems
Section I
Field Area and Control Networks
7 Fieldbus Systems: History and Evolution (Thilo Sauter) 7-1
8 The WorldFIP Fieldbus (Jean-Pierre Thomesse) 8-1
9 FOUNDATION Fieldbus: History and Features (Salvatore Cavalieri) 9-1
10 PROFIBUS: Open Solutions for the World of Automation (Ulrich Jecht, Wolfgang Stripf, and Peter Wenzel) 10-1
11 Principles and Features of PROFInet (Manfred Popp, Joachim Feld, and Ralph Büsgen) 11-1
12 Dependable Time-Triggered Communication (Hermann Kopetz, Günther Bauer, and Wilfried Steiner) 12-1
13 Controller Area Network: A Survey (Gianluca Cena and Adriano Valenzano) 13-1
14 The CIP Family of Fieldbus Protocols (Viktor Schiffer) 14-1
15 The Anatomy of the P-NET Fieldbus (Christopher G. Jenkins) 15-1
16 INTERBUS Means Speed, Connectivity, Safety (Jürgen Jasperneite) 16-1
17 Data Transmission in Industrial Environments Using IEEE 1394 FireWire (Michael Scholles, Uwe Schelinski, Petra Nauber, and Klaus Frommhagen) 17-1
18 Configuration and Management of Fieldbus Systems (Stefan Pitzek and Wilfried Elmenreich) 18-1
19 Which Network for Which Application (Jean-Dominique Decotignie) 19-1
Section II
Ethernet and Wireless Network Technologies
20 Approaches to Enforce Real-Time Behavior in Ethernet (Paulo Pedreiras and Luís Almeida) 20-1
21 Switched Ethernet in Automation Networking (Tor Skeie, Svein Johannessen, and Øyvind Holmeide) 21-1
22 Wireless LAN Technology for the Factory Floor: Challenges and Approaches (Andreas Willig) 22-1
23 Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment (Kirsten Matheus) 23-1
Section III
Linking Factory Floor with the Internet and Wireless Fieldbuses
24 Linking Factory Floor and the Internet (Thilo Sauter) 24-1
25 Extending EIA-709 Control Networks across IP Channels (Dietmar Loy and Stefan Soucek) 25-1
26 Interconnection of Wireline and Wireless Fieldbuses (Jean-Dominique Decotignie) 26-1
Section IV
Security and Safety Technologies in Industrial Networks
27 Security Topics and Solutions for Automation Networks (Christian Schwaiger and Albert Treytl) 27-1
28 PROFIsafe: Safety Technology with PROFIBUS (Wolfgang Stripf and Herbert Barthel) 28-1
Section V
Applications of Networks and Other Technologies
Automotive Communication Technologies
29 Design of Automotive X-by-Wire Systems (Cédric Wilwert, Nicolas Navet, Ye Qiong Song, and Françoise Simonot-Lion) 29-1
30 FlexRay Communication Technology (Dietmar Millinger and Roman Nossal) 30-1
31 The LIN Standard (Antal Rajnák) 31-1
32 Volcano: Enabling Correctness by Design (Antal Rajnák) 32-1
Networks in Building Automation
33 The Use of Network Hierarchies in Building Telemetry and Control Applications (Edward Koch) 33-1
34 EIB: European Installation Bus (Wolfgang Kastner and Georg Neugschwandtner) 34-1
35 Fundamentals of LonWorks/EIA-709 Networks: ANSI/EIA-709 Protocol Standard (LonTalk) (Dietmar Loy) 35-1
Manufacturing Message Specification in Industrial Automation
36 The Standard Message Specification for Industrial Automation Systems: ISO 9506 (MMS) (Karlheinz Schwarz) 36-1
37 Virtual Factory Communication System Using ISO 9506 and Its Application to Networked Factory Machine (Dong-Sung Kim and Zygmunt J. Haas) 37-1
Motion Control
38 The SERCOS interface™ (Scott C. Hibbard, Peter Lutz, and Ronald M. Larsen) 38-1
Train Communication Network
39 The IEC/IEEE Train Communication Network (Hubert Kirrmann and Pierre A. Zuber) 39-1
Smart Transducer Interface
40 A Smart Transducer Interface Standard for Sensors and Actuators (Kang Lee) 40-1
Energy Systems
41 Applying IEC 61375 (Train Communication Network) to Data Communication in Electrical Substations (Hubert Kirrmann) 41-1
SEMI
42 SEMI Interface and Communication Standards: An Overview and Case Study (A.M. Fong, K.M. Goh, Y.G. Lim, K. Yi, and O. Tin) 42-1
Part 1
Basics of Data Communication and IP Networks
1
Principles of Lower-Layer Protocols for Data Communications in Industrial Communication Networks

Andreas Willig, University of Potsdam
Hagen Woesner, Technical University of Berlin

1.1 Introduction (1-1)
1.2 Framing and Synchronization (1-2): Bit Synchronization • Frame Synchronization • Example: Bit and Frame Synchronization in the PROFIBUS
1.3 Medium Access Control Protocols (1-6): Requirements and Quality-of-Service Measures • Design Factors • Random Access Protocols • Fixed-Assignment Protocols • Demand-Assignment Protocols • Meta-MAC Protocols
1.4 Error Control Techniques (1-15): Open-Loop Approaches • Closed-Loop Approaches • Hybrid Approaches • Further Countermeasures
1.5 Flow Control Mechanisms (1-18): XON/XOFF and Similar Methods • Sliding-Window Flow Control • Further Mechanisms
1.6 Packet Scheduling Algorithms (1-20): Priority Scheduling • Fair Scheduling
1.7 Link Layer Protocols (1-22): The HDLC Protocol Family • The IEEE 802.2 LLC Protocol
References (1-24): Bit and Frame Synchronization • Medium Access Control Protocols • Error Control • Flow Control • Link Layer Protocols • Packet Scheduling
1.1 Introduction

In packet-switched networks the lower layers (data link layer, medium access control layer, physical layer) have to solve some fundamental tasks to facilitate successful communication. The lower layers are concerned with communication between neighboring stations, in contrast to the layers above (network layer, transport layer), which are concerned with end-to-end communications over multiple intermediate stations.
The lower layers communicate over physical channels, and consequently, their design is strongly influenced by the properties of the physical channel (bandwidth, channel errors). The importance of the lower layers for industrial communication systems is related to the requirement for hard real-time and reliability guarantees: if the lower layers are not able to guarantee successful delivery of a packet/frame* within a prescribed amount of time, this cannot be compensated by any actions of the upper layers. Therefore, a wide variety of mechanisms have been developed to implement these guarantees and to deal with harmful physical channel properties like transmission errors.

In virtually all data communication networks used in industrial applications the transmission is packet based; i.e., the user data are segmented into a number of distinct packets, and these packets are transmitted over the channel. Therefore, the following fundamental problems have to be solved:

• What constitutes a frame and how are the bounds of a frame specified? How does the receiver detect frames and the data contained? To this end, framing and synchronization schemes are needed, discussed in Section 1.2.
• When should a frame be transmitted? If multiple stations want to transmit their frames over a common channel, appropriate rules are needed to share the channel and to let each station determine when it may send its frames. This problem is tackled by medium access control (MAC) protocols, discussed in Section 1.3.
• How should channel errors be coped with? The topic of error control is briefly touched on in Section 1.4.
• How should the receiver be protected against too much data sent by the transmitter? This is the problem of flow control, discussed in Section 1.5.
• Which packet should be transmitted next? This is the problem of packet scheduling, sketched in Section 1.6.
• Finally, in link layer protocols all these mechanisms are combined into a working protocol. We discuss two important protocols in Section 1.7.

The chapter is necessarily short on many topics. The interested reader will find further references in the text.
1.2 Framing and Synchronization

The problem of synchronization is related to the transmission of information units (packets, frames) between a sending and a receiving entity. In computer systems, information is usually stored and processed in a binary digital form (bits). A packet is formed from a group of bits and shall be transmitted to the receiver. The receiver must be able to uniquely determine the start and end of a packet as well as the bits within the packet.

The transmission of information over short distances, for instance, inside the computer, can be done with parallel transmission. Here, a number (say 64) of parallel copper wires transport all bits of a 64-bit data word at the same time. In most cases, one additional wire transmits the common reference clock. Whenever the transmitter has applied the correct voltage (representing a 0 or 1 bit) on all wires, it signals this by sending a sampling pulse on the clock wire toward the receiver. Conversely, on receiving a pulse on the clock wire, the receiver samples the voltage levels on all data wires and converts them back to bits by comparing them with a threshold. This kind of transmission is fast and simple, but cannot span large distances, because the cabling cost becomes prohibitive. Therefore, the data words have to be serialized and transmitted bit by bit on a single wire.†

*We will use both terms interchangeably.
†The term wire is actually used here as a synonym for a transmission channel. It therefore could also be a wireless or ISDN channel.
FIGURE 1.1 NRZ, Manchester, and differential Manchester codes (waveforms for the bit sequence 1 0 1 1 0 0 1 0; in the differential Manchester code, a 1 means no level change and a 0 means a level change).
1.2.1 Bit Synchronization

The spacing of bits generated by the transmitter depends on its local clock. The receiver needs this clock information to sample the incoming signal at appropriate points in time. Unfortunately, the transmitters’ and receivers’ clocks are not synchronized, and the synchronization information has to be recovered from the data signal; the receiver has to synchronize with the transmitter. This process is called bit synchronization. The aim is to let the receiver sample the received signal in the middle of the bit period in order to be robust against the impairments of the physical layer, like bandwidth limitation and signal distortions. Bit synchronization is called asynchronous if the clocks are synchronized only for one data word and have to be resynchronized for the next word. A common mechanism used for this employs one start bit preceding the data word and one or more stop bits concluding it. The Universal Asynchronous Receiver/Transmitter (UART) specification defines one additional parity bit, which is appended to the 8 data bits, leading to the transmission of 11 bits total for every 8 data bits [3]. The upper row in Figure 1.2 illustrates this.

For longer streams of information bits, the receiver clock must be synchronized continuously. The digital phase-locked loop (DPLL) is an electrical circuit that controls a local clock and adjusts it to the received clock being extracted from the incoming signal [23]. To recover the clock from the signal, sufficiently frequent changes of signal levels are needed. Otherwise, if the wire shows the same signal level for a long time (as may happen for the non-return to zero (NRZ) coding method, where bits are directly mapped to voltage levels), the receiver clock could drift away from the transmitter clock. The Manchester encoding (shown in the second row of Figure 1.1) ensures that there is at least one signal change per bit. Every logical 1 is represented by a signal change from one to zero, whereas a logical 0 shows the opposite signal change. The internal clock of the DPLL samples the incoming signal with a much higher frequency, for instance, 16 times per bit. For a logical 0 bit that is arriving exactly in time, the DPLL receives a sample pattern of 0000000011111111. If the transition between the 0 and 1 samples is not exactly in the middle of the bit but rather left or right of it, the local clock has to be readjusted to run faster or slower, respectively.

In the classical IEEE 802.3 Ethernet, the bits are Manchester encoded [2]. To allow the DPLL of the receiver to synchronize to the received bit stream, a 64-bit-long preamble is transmitted ahead of each frame. This preamble consists of alternating 0 and 1 bits that result in a square wave of 5 MHz. A start-of-frame delimiter of two consecutive 1 bits marks the end of the preamble and the beginning of the data frame.
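The relationship between Manchester coding and mid-bit sampling can be made concrete with a short sketch. The following Python snippet is a minimal illustration, not part of the original chapter: it encodes the bit pattern of Figure 1.1 with 16 line samples per bit and recovers the bits by looking at the two samples around the expected mid-bit transition; clock drift and the DPLL readjustment step are deliberately left out.

    # Minimal sketch (not from the chapter): Manchester coding with 16 line
    # samples per bit, following the convention of Figure 1.1 (a logical 1 is
    # a high-to-low transition in the middle of the bit period, a logical 0
    # a low-to-high transition). Clock drift and DPLL readjustment are ignored.

    SAMPLES_PER_BIT = 16
    HALF = SAMPLES_PER_BIT // 2

    def manchester_encode(bits):
        """Return a list of 0/1 line samples, SAMPLES_PER_BIT per data bit."""
        samples = []
        for b in bits:
            first, second = (1, 0) if b == 1 else (0, 1)
            samples += [first] * HALF + [second] * HALF
        return samples

    def manchester_decode(samples):
        """Recover the bits by sampling just before and just after mid-bit."""
        bits = []
        for i in range(0, len(samples), SAMPLES_PER_BIT):
            before_mid = samples[i + HALF - 1]
            after_mid = samples[i + HALF]
            bits.append(1 if (before_mid, after_mid) == (1, 0) else 0)
        return bits

    data = [1, 0, 1, 1, 0, 0, 1, 0]          # the bit pattern of Figure 1.1
    assert manchester_decode(manchester_encode(data)) == data

A real receiver would additionally compare the observed position of the mid-bit transition with the expected one and speed its local clock up or slow it down accordingly, which is exactly the DPLL adjustment described above.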
UART character (11 bits): Start | D7 D6 D5 D4 D3 D2 D1 D0 | Parity | Stop
Control frame (no data): SD1 | DA | SA | FC | FCS | ED
Fixed data length frame (8 data characters): SD3 | DA | SA | FC | Data | FCS | ED
Variable-length data frame (0–249 characters): SD2 | LE | LEr | SD2 | DA | SA | FC | Data | FCS | ED
Legend: Start, Stop = start/stop bit; D7–D0 = data bits; SD1–SD3 = start delimiters; DA, SA = destination and source address; FC = frame control byte; FCS = frame check sequence (CRC); LE = length field; LEr = length field repeated; ED = end delimiter.
FIGURE 1.2 EN 50170 PROFIBUS: character and selected frame formats.
1.2.2 Frame Synchronization

It is of interest for the receiver to know whether the received information is (1) complete and (2) correct. Section 1.4 treats the latter problem in some more detail. To decide about the first problem, the receiver needs to know where a packet starts and ends. The question that arises immediately is that of marking the start and end of a frame. There are several ways to accomplish this; in real-world protocols one often finds combinations of them. In the following, the most important will be discussed briefly.

1.2.2.1 Time Gaps

The most straightforward way to distinguish between frames is to leave certain gaps of silence between them. However, when many stations share the same medium, all of them have to obey these time gaps. As will be seen in Section 1.3.3.2, several MAC protocols rely on minimum time gaps to determine if the medium is accessible. While time gaps are a simple way to detect the start of a frame, it should be possible to detect the end of it, too. Using time gaps, the end of the previous packet can be detected only after a period of silence. Even if the receiver detects a silent medium, it cannot be sure if this is the result of a successful transmission or of a link or node failure. Therefore, additional mechanisms are needed.

1.2.2.2 Code Violations

A bit is usually encoded by a certain signal pattern (e.g., a change in voltage or current levels) that is, of course, different for a 0 and a 1 bit. A signal pattern that represents neither of the allowed values can be taken as a marker for the start of a frame. An example of this is the IEEE 802.5 Token Ring protocol [28], which uses differential Manchester encoding (see Figure 1.1 and [28]). Here, two special symbols appear: J for a so-called positive code violation and K for a negative one. In contrast to the bit definitions in the encoding, these special symbols do not show a transition in the middle of the bit. Special 8-bit-long characters that mark the beginning and end of the frame are constructed from these symbols.

1.2.2.3 Start/End Flags

Some protocols use special flags to indicate the frame boundaries. A sequence of 01111110, that is, six 1 bits in a sequence surrounded by two 0 bits, marks the beginning and the end of each frame. Of course, since the payload that is being transmitted can be an arbitrary sequence of bits, it is possible that the flag is contained in the payload. To avoid misinterpretation of a piece of payload data as being the end of a frame, the sender has to make sure that it only transmits the flag pattern if it is meant as a flag.
Any flag-like data have to be altered in a consistent way to allow the receiver to recover the original payload. This can be done using bit- or byte-/character-stuffing techniques.

Bit stuffing, as exercised in high-level data link control (HDLC) [23] protocols, requires the sender to insert a zero bit after each sequence of five consecutive 1 bits. The receiver checks whether the sixth bit that follows five 1s is a zero or a one bit. If it detects a zero, this bit is removed from the sequence of bits. If it detects a 1, it can be sure that this is a frame boundary. Unfortunately, it might happen that a transmission error modifies the data sequence 01111100 into 01111110 and thus creates a flag prematurely. Therefore, additional mechanisms like time gaps are needed to remove the following bits and detect the actual end of the frame.

Byte stuffing, as employed in the Point-to-Point Protocol (PPP), uses the same flag pattern, but relies on a byte-oriented transmission medium [5, 6]. The flag can be written as the hexadecimal value 0x7E. Every unintentional appearance of the flag pattern is replaced by the two characters 0x7D 0x5E. This way, the flag character disappears, but 0x7D (also called the escape character) has to be replaced as well if it is found in the user data. To this end, 0x7D is replaced by 0x7D 0x5D. The receiver, after detecting the escape character in the byte stream, discards this byte and performs an exclusive-or (XOR) operation of 0x20 with the following byte to recover the original payload (a code sketch of this escaping appears at the end of this section).

In both cases, more data are being transmitted than would be necessary without bit- or byte-stuffing techniques. To make things worse, the amount of overhead depends on the contents of the payload. A malicious user might effectively double the amount of transmitted data (with byte stuffing) or increase it by around 20% (with bit stuffing) by choosing the payload adversarially, for example, as a continuous stream of flag patterns (byte stuffing) or of 1 bits (bit stuffing). To avoid this, several measures can be taken. One is to scramble the user data before they are put into data frames [4]. Another possibility is the so-called consistent overhead byte stuffing (COBS), proposed in [1]. Here, the stream of data bytes is scanned in advance for appearing flags. The sequence of data bytes is then cut into chunks of at most 254 bytes not containing the flag. Every flag that appears in the flow is thus replaced by one byte representing the number of nonflag data bytes following it. This way, no additional data have to be transmitted as long as there is at least one flag every 255 data bytes. Otherwise, one byte is inserted every 254 bytes, indicating a full-length chunk.

1.2.2.4 Length Field

To avoid the processing overhead that comes with bit or character stuffing, it is possible to reserve a field in the frame header that indicates the length of the whole frame. Having read this field, the receiver knows in advance how many characters or bytes will arrive. No end delimiter is needed anymore. Either a continuous transmission of packets followed by idle symbols, or the usual combination of preamble and start delimiter is needed to correctly determine which of the header fields carries the length information. Although potentially the best solution with respect to transmission overhead, the length field mechanism suffers from erroneous transmission media. If the packet length information is lost or corrupted, the frame boundary is difficult to find again. Therefore, the length field has to be protected separately using error-correcting codes or redundant transmission.
Additional mechanisms (for example, time gaps) should be employed to find the end of a frame even when the length field is erroneous.
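To illustrate the byte stuffing of Section 1.2.2.3, the following sketch applies the PPP-style escaping described there (flag 0x7E, escape 0x7D, XOR with 0x20). It is a simplified illustration and omits details of the real PPP framing, such as the additional escaping of control characters.

```python
FLAG, ESC = 0x7E, 0x7D

def byte_stuff(payload: bytes) -> bytes:
    """Escape every flag or escape byte in the payload (PPP-style)."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ 0x20])   # 0x7E -> 0x7D 0x5E, 0x7D -> 0x7D 0x5D
        else:
            out.append(b)
    return bytes(out)

def byte_unstuff(stuffed: bytes) -> bytes:
    """Reverse operation performed by the receiver."""
    out, it = bytearray(), iter(stuffed)
    for b in it:
        out.append(next(it) ^ 0x20 if b == ESC else b)
    return bytes(out)

payload = b"\x01\x7e\x02\x7d\x03"
# The transmitted frame is the stuffed payload enclosed in two flag bytes.
frame = bytes([FLAG]) + byte_stuff(payload) + bytes([FLAG])
assert byte_unstuff(byte_stuff(payload)) == payload
```

Because the flag value can no longer occur inside the stuffed payload, the receiver can scan the byte stream for 0x7E to find frame boundaries and then undo the escaping.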
1.2.3 Example: Bit and Frame Synchronization in the PROFIBUS

As an example to illustrate the mechanisms introduced above, let us look at the lower layers of the EN 50170/DIN 19245 process fieldbus (PROFIBUS) [52]. This standard defines a fieldbus system for industrial applications. The lowest layer, the physical layer, is based on the RS-485 electrical interface. Shielded twisted-pair or fiber cable may be used as the transmission medium. The UART converts every byte into 11-bit transmission characters by adding start, parity, and stop bits. Thus, asynchronous bit synchronization is used on the lowest layer of the PROFIBUS. The second layer is called the fieldbus data link (FDL). It defines the frame format, as shown in Figure 1.2. Start and end delimiters are used in every frame, but different start delimiters SDx (SD1 to SD4; the
latter is not shown in the figure) define different frame types. Thus, a receiver knows after reading an SD1 that a control frame of fixed length will arrive. In addition, time gaps of 33 bit times are required between the frames. After receiving an SD3, the receiver interprets the next byte as LE (length field) and checks this against the redundant transmission of LE in the third byte, thereby decreasing the probability of undetected errors in the length field. Using the combination of time gaps and the redundant transmission of the length field, character stuffing to replace all possible start and end delimiters in the payload becomes unnecessary.
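The construction of the 11-bit UART character described above can be sketched in a few lines; the bit ordering (least significant data bit first) and the use of even parity are assumptions made for illustration and should be checked against the standard.

```python
def uart_character(byte):
    """Build the 11-bit transmission character: start, 8 data bits, parity, stop.

    Assumptions for this sketch: data bits are sent LSB first and the parity
    bit makes the number of 1s among the data bits even (even parity).
    """
    data = [(byte >> i) & 1 for i in range(8)]   # LSB first
    parity = sum(data) % 2                       # even parity over the data bits
    return [0] + data + [parity] + [1]           # start bit = 0, stop bit = 1

char = uart_character(0x53)
assert len(char) == 11 and char[0] == 0 and char[-1] == 1
```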
1.3 Medium Access Control Protocols

All medium access control or multiple-access control (MAC) protocols try to solve the same problem: to let a number of stations share a common resource (namely, the transmission medium) in an efficient manner and such that some desired performance objectives are met. They are a vital part of local area network (LAN) and metropolitan area network (MAN) technologies, which typically connect a small to moderate number of users in a small geographical area, such that a user can communicate with other users. With respect to the Open Systems Interconnection (OSI) reference model, the MAC layer does not form a protocol layer on its own, but is considered a sublayer of either the physical layer or the data link layer [45]. However, due to its distinct task, the MAC sublayer deserves separate treatment. The importance of the MAC layer is reflected by the fact that many MAC protocol standards exist, for example, the IEEE 802.x standards.

Its most fundamental task is to determine for each station attached to a common broadcast medium the points in time where it is allowed to access the medium, i.e., to send data or control frames. To this end, each station executes a separate instance of a MAC protocol. The design and behavior of a MAC protocol depend on the design goals and the properties of the underlying physical medium. Specifically for hard real-time communications, the MAC layer is a key component: if the delays on the MAC layer are not strictly bounded, the upper layers cannot compensate for this.

A large number of MAC protocols have been developed during the last three decades. The following references are landmark papers or survey articles covering the most important protocols: [7], [8], [17], [20], [21], [32], [33], [34], [36], [42], [43], [48], [49], [50], [54]. Furthermore, MAC protocols are covered in many textbooks on computer networking, for example, [12], [23], [45]. In this survey, we stick to those protocols that are important for industrial applications and that have found some deployment in factory plants, either as stand-alone solutions or as building blocks of more complex protocols.
1.3.1 Requirements and Quality-of-Service Measures

There are a number of (sometimes conflicting) requirements for MAC protocols; some of them are specific to industrial applications with hard real-time and reliability constraints.

There are two main delay-oriented measures: the medium access delay and the transmission delay. The medium access delay is the time between the arrival of a frame and the time where its transmission starts. This delay is affected by the operational overhead of the MAC itself, which may include collisions, MAC control frames, backoff and waiting times, and so on. The transmission delay denotes the time between frame arrival and its successful reception at the intended receiver. Clearly, the medium access delay is one component of the transmission delay. For industrial applications with hard real-time requirements, both delays must be upper bounded. In addition, a desirable property is to have low medium access delays in case of low network loads.

A key requirement for industrial applications is the support for priorities: important frames (for example, alarms, periodic process data) should be transmitted before unimportant ones. This requirement can be posed locally or globally: in the local case, each station decides independently which of its waiting frames is transmitted next. There is no guarantee that station A's important frames are not blocked by station B's unimportant frames. In the global case, the protocol identifies the most important frame of all stations to be transmitted next.
The need to share bandwidth between stations constitutes another important class of desired MAC properties. A frequently posed requirement is fairness: stations should get their fair share of the bandwidth, even if other stations demand much more. It is also often required that a station receives a minimum bandwidth, like for the transmission of periodic process data of fixed size. With respect to throughput, it is clearly important to keep the MAC overhead small. This concerns the frame formats, the number and frequency of MAC control frames, and efficiency losses due to the operation of the MAC protocol. An example for efficiency loss is collisions: the bandwidth spent for collided packets is lost, since typically the collided frames are useless and must be retransmitted. A MAC protocol is said to be stable if an increase in the overall load does not lead to a decrease in throughput. Depending on the application area, other constraints can be important as well. For simple field devices, the MAC implementation should have a low complexity and be simple enough to be implementable in hardware. For mobile stations using wireless media, the energy consumption is a major concern; therefore, power-saving mechanisms are needed. For wireless transmission media, the MAC should contain additional mechanisms to adapt to the instantaneous error behavior of the wireless channel; possible control knobs are the transmit power, error-correcting codes, the bit rate, and several more.
1.3.2 Design Factors

The most important factors influencing the design of MAC protocols are the medium properties/medium topology and the available feedback from the medium. We can broadly distinguish between guided media and unguided media. In guided media the signals originating from frame transmissions propagate within well-specified geographical bounds, typically within copper or fiber cables. If the medium is properly shielded, then beyond these bounds the communications are invisible and two cables can be placed close to each other without mutual interference. In contrast, in unguided media (with radio frequency or infrared wireless media being the prime example) the wave propagation is visible in the whole geographical vicinity of the transmitter, and ongoing transmissions can be received at any point close enough to the transmitter. Therefore, two different networks overlaid within the same geographical region can influence each other. This coexistence problem appears, for example, with IEEE 802.11b [37] and Bluetooth [13, 22]. Both systems utilize the 2.4-GHz industrial, scientific, and medical (ISM) band [16, 25, 35].

Guided media networks can have a number of topologies. We discuss a few examples. In a ring topology (see Figure 1.3), each station has a point-to-point link to its two neighbors, such that the stations form a ring. In a bus topology like the one shown in Figure 1.4, the stations are connected to a common bus
FIGURE 1.3 Ring topology.
FIGURE 1.4 Bus topology (the black boxes are line terminations).
FIGURE 1.5 Star topology.
FIGURE 1.6 Partial mesh topology.
and all stations see the same signals. Hence, the bus is a broadcast medium. In the star topology illustrated in Figure 1.5, all stations only have a physical connection to a central device, the star coupler, which repeats and optionally amplifies the signals coming from one line to all the other lines. A network with a star topology also provides a broadcast medium, where each station can hear all transmissions. When using wireless transmission media, the distance between stations might be too large to allow all stations to receive all transmissions. Therefore, the network is often only partially connected or has a partial mesh structure, shown in Figure 1.6. Additional routing mechanisms have to be employed to implement multihop transmission, for example, from station 4 to station 8. An important property of a physical channel is the available feedback. Specifically, some kinds of media allow a station to read back data from the channel while transmitting. This can be done to detect faulty transceivers (like in the PROFIBUS protocol [52]), collisions (like in the Ethernet protocol), or parallel ongoing transmissions of higher priority (like in the Controller Area Network (CAN) protocol). This feature is typically not available when using wireless technologies: it is not possible to send and receive simultaneously on the same channel.
1.3.3 Random Access Protocols

In random access (RA) protocols the stations are uncoordinated and the protocols work in a fully distributed manner. RA protocols typically incorporate a random element, for example, by exploiting random packet arrival times, setting timers to random values, and so on. The lack of central coordination and of fixed resource assignment allows the sharing of a channel between a potentially infinite number of stations, whereas fixed-assignment and polling protocols support only a finite number of stations. However, the randomness can make it impossible to give deterministic guarantees on medium access delays and transmission delays. There are many RA protocols that are used not only on their own, but also as building blocks of more complex protocols. One example is the GSM system, where speech data are transmitted in exclusively allocated time slots on a certain frequency, but the call setup messages have to contend for a shared channel using an ALOHA-like protocol.
1.3.3.1 ALOHA and Slotted ALOHA

A classical protocol is ALOHA [7], for which we present two variants here. In both variants a number of stations want to transmit packets to a central station. In pure ALOHA a station sends a newly arriving data frame immediately without inquiring about the status of the transmission medium. Hence, frames from multiple stations can overlap at the central station (collision) and become unrecognizable. In slotted ALOHA all stations are synchronized to a common time reference and the time is divided into fixed-size time slots. Newly arriving frames are transmitted at the beginning of the next time slot.

In both ALOHA variants the transmitter starts a timer after frame transmission. The receiver has to send an immediate acknowledgment frame upon successful reception of the data frame. When the transmitter receives the acknowledgment, it stops the timer and considers the frame successfully transmitted. If the timer expires, the transmitter selects a random backoff time and waits for this time before the frame is retransmitted. The backoff time is chosen randomly to avoid synchronization of colliding stations.

This protocol has two advantages: it is extremely simple and it offers short delays in case of a low network load. However, the protocol does not support priorities, and with increasing network load, the collision rate increases and the transmission delays grow as well. In addition, ALOHA is not stable: above a certain threshold load an increase in the overall load leads to a decrease in overall throughput. The maximum normalized throughput of pure ALOHA is 1/(2e) ≈ 18% under Poisson arrivals and an infinite number of stations. The maximum throughput can be doubled with slotted ALOHA.

A critical parameter in ALOHA is the backoff time, which is typically chosen from a certain time interval (backoff window). A collision can be interpreted as a sign of congestion. If another collision occurs after the backoff time, the next backoff time should be chosen from a larger backoff window to reduce the pressure on the channel. A popular rule for the evolution of the backoff window is the truncated binary exponential backoff scheme, where the backoff window size is doubled upon every collision. Above a certain number of failed trials, the window remains constant. After successful transmission the backoff window is restored to its original value.

1.3.3.2 CSMA Protocols

In carrier-sense multiple-access (CSMA) the stations act more carefully than in ALOHA: before transmitting a frame they listen on the medium (carrier sensing) to see whether it is busy or free [32, 46]. If the medium is free (many protocols require it to be contiguously free for some minimum amount of time), the station transmits its frame. If the medium is busy, the station defers transmission. The various CSMA protocols differ in the following steps. In nonpersistent CSMA the station simply defers for a random time (backoff time) without listening to the medium during this time. After this waiting time the station listens again. All other protocols discussed next wait until the end of the ongoing transmission before starting further activities. In p-persistent CSMA (0 < p < 1) the time after the preceding transmission ends is divided into time slots. A station listens to the medium at the beginning of a slot. If the medium is free, the station starts transmitting its frame with probability p, and with probability 1 – p it waits until the next slot comes.
In 1-persistent CSMA the station transmits immediately without further actions. Both approaches still have the risk of collisions, since two or more stations can decide to transmit (1-persistent CSMA) or can choose the same slot (p-persistent CSMA). The problem is the following: if station A senses the medium as idle and starts transmission at time t0, station B would notice this at the earliest at some later time t0 + t, due to the propagation delay. If B performs carrier sensing at a time between t0 and t0 + t, it senses the medium to be idle and starts transmission too, resulting in a collision. Therefore, the collision probability depends on the propagation delay, and thus on the maximum geographical distance between stations. Similar to ALOHA, pure CSMA protocols rely on acknowledgments to recognize collisions. Although the throughput of CSMA-based protocols is much better than that of ALOHA (ongoing transmissions can be completed without disturbance), the number of collisions and their duration limit the throughput. Collision detection and collision avoidance techniques can be used to relax these problems. These are discussed in the following sections.

Specifically for wireless media the task of carrier sensing is not without problems. After all, the transmitter senses the medium ultimately because it wants to know the state of the medium at the
FIGURE 1.7 Hidden-terminal scenario.
intended receiver, since collisions are only important at the receiver. However, due to path loss [40, Chapter 4], any signal experiences attenuation with increasing distance. If a minimum signal strength is required, the hidden-terminal problem occurs (refer to Figure 1.7): consider three stations, A, B, and C, with transmission radii as indicated by the circles. Stations A and C are in range of B, but A is not in the range of C and vice versa. If C starts to transmit to B, A cannot detect this by its carrier-sensing mechanism and considers the medium to be free. Hence, A also starts frame transmission and a collision occurs at B.

For wireless media there is a second scenario where carrier sensing leads to false predictions about the channel state at the receiver: the so-called exposed-terminal scenario, depicted in Figure 1.8. The four stations A, B, C, and D are placed such that the pairs A/B, B/C, and C/D can hear each other; all remaining combinations cannot. Consider the situation where B transmits to A, and one short moment later C wants to transmit to D. Station C performs carrier sensing and senses that the medium is busy, due to B's transmission. As a result, C postpones its transmission. However, C could safely transmit its frame to D without disturbing B's transmission to A. This leads to a loss of efficiency.

Two approaches to solve these problems are busy-tone solutions [50] and the request-to-send (RTS)/clear-to-send (CTS) protocol, as applied in the IEEE 802.11 wireless LAN (WLAN) medium access control protocol [47]. In the busy-tone approach the receiver transmits a busy-tone signal on a second channel during frame reception. Carrier sensing is performed on this second channel. This solves the exposed-terminal problem. The hidden-terminal scenario is also solved, except in the rare cases where A and C start their transmissions simultaneously.

The RTS/CTS protocol attacks the hidden-terminal problem using only a single channel. Consider the case that A has a data frame for B. After A has obtained channel access, it sends a short RTS frame to B, indicating the time duration needed for the whole frame exchange sequence (the sequence consists of the RTS frame, the CTS frame, a data frame, and a final acknowledgment frame). If B receives the RTS frame properly, it answers with a CTS frame, indicating the time needed for the remaining frame exchange sequence. Station A starts transmission after receiving the CTS frame. Station C, hearing the RTS and CTS frames, defers its transmissions for the indicated time, thus not disturbing the ongoing frame exchange. It is a conservative choice to defer on any of these frames, but the exposed-terminal problem still exists. If station C defers only on receiving both frames, the exposed-terminal problem is solved. However, there is the risk of bit errors in the CTS frame, which may lead C to start transmissions falsely. The RTS/CTS protocol of IEEE 802.11 does not resolve collisions of RTS frames at the receiver, nor does
FIGURE 1.8 Exposed-terminal scenario.
it entirely solve the hidden-terminal problem [39]. Furthermore, this four-way handshake imposes serious overhead, which only pays off for large frames.

1.3.3.3 CSMA Protocols with Collision Detection

If two or more stations collide without recognizing this, they would uselessly transmit their entire frames. If the stations could quickly detect a collision and abort transmission, less bandwidth is wasted. The class of carrier-sense multiple access with collision detection (CSMA/CD) protocols enhances the basic CSMA method with a collision detection facility. The collision detection is performed by reading back the signal from the cable during transmission, and by comparing the measured signal with the transmitted one. If the signals differ, a collision has been detected [23, Section 6.1.3].

When a station experiences a collision, it executes a backoff algorithm. In the IEEE 802.3 Ethernet this algorithm works with slotted time. A time slot is large enough to accommodate the maximum round-trip time, in order to make sure that all stations have the chance to reliably recognize an ongoing transmission. As an example, in the CSMA/CD method of IEEE 802.3 a truncated binary exponential backoff scheme is used: after the first collision, a station randomly chooses to wait either 0 or 1 slot. If another station starts transmission during the waiting time, the station defers. After the second collision, a station chooses to wait between 0 and 3 slots, and for all subsequent collisions, the backoff window is doubled. After 10 collisions the backoff window is kept fixed at 1024 slots, and after 16 collisions the station gives up and discards the frame.

In wireless LANs (for example, in the IEEE 802.11 wireless LAN) acknowledgment frames are used to give the transmitter feedback, since wireless transceivers cannot transmit and receive simultaneously on the same channel. The lack of an acknowledgment frame indicates either a collision or a transmission error. Furthermore, two colliding frames do not necessarily result in a total loss of information: when the signal strength of one frame is much stronger than that of the other, the receiver may be able to successfully decode the stronger frame (near–far effect).

1.3.3.4 CSMA Protocols with Collision Resolution

This class of CSMA protocols reacts to collisions not by going into a backoff mode and deferring transmissions, but by trying to resolve them. One approach to resolving a collision is to determine one station among the contenders that is ultimately allowed to send its frame. One example of this is protocols with bit-wise priority arbitration like the MAC protocol of the Controller Area Network (CAN) [30] and the protocol used for the D-channel of the Integrated Services Digital Network (ISDN) [41]. Another approach is to give all contenders a chance to transmit, as is done in the adaptive tree walking protocol [14], which works as follows: The time is slotted, just as in the Ethernet CSMA/CD protocol. Furthermore, all stations are arranged in a balanced binary tree T and know their respective positions in this tree. All stations wishing to transmit a frame (called backlogged stations) wait until the end of the ongoing transmission and start to transmit their frame in the first slot (slot 0). If there is only one backlogged station, then it can transmit its frame without further disturbance. If two or more stations collide, then in slot 1 only the members of the left subtree TL are allowed to try transmission again.
If another collision happens, only stations of the left subtree TL,L of TL are allowed to transmit in slot 2, and so forth. On the other hand, if only one station from TL transmits its frame, then for fairness reasons the next frame transmission is reserved for a station from the right subtree TR, and so on.

The bit-wise arbitration protocols do not try to be fair to stations. As an example, we present the MAC protocol of CAN. CAN requires a transmission medium that guarantees that overlapping signals do not destroy each other, but lead to a valid signal. If two stations transmit the same bit, the medium adopts the common bit value. If one station transmits a zero bit and the other a one bit, the medium adopts a well-defined state, for example, a zero bit. The CAN protocol uses a priority field of a certain length at the beginning of a MAC frame. Backlogged stations wait until the end of an ongoing frame and then transmit the first bit of their priority field. In parallel, they read back the state of the medium and compare it with their transmitted bits. If both agree, the station continues with the second bit of the priority field. If the bits differ, the station has lost contention and defers until the end of the next frame. This process is continued until the end of the priority field is reached. If it can be guaranteed that all priority values are distinct, only one station survives contention. This protocol supports global frame priorities in a natural way, and the medium access time for the highest-priority frame is tightly bounded. However, the assignment of priorities to stations or frames is nontrivial when fairness is a goal. If the priorities are assigned on a per-station basis, the protocol is inherently unfair. One solution is to rotate station priorities over time. In CAN applications the priorities are not assigned to stations but to data. Another drawback of the protocol is that all stations have to be synchronized with a precision of a bit time, and the need for all stations to agree on the state of the medium limits either the bit rate or the geographical extension of a CAN network.
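A minimal sketch of this bit-wise arbitration (assuming, as in CAN, that a 0 bit is dominant and overwrites a 1 bit on the medium; the wired-AND medium model and the example priority values are simplifications of ours):

```python
def arbitrate(priority_fields):
    """Simulate bit-wise arbitration over a wired-AND medium.

    priority_fields: list of equal-length bit lists, one per backlogged
    station; a 0 bit is dominant.  Returns the indices of the surviving
    stations (exactly one if all priority values are distinct).
    """
    contenders = set(range(len(priority_fields)))
    for bit_pos in range(len(priority_fields[0])):
        # The medium carries the dominant (0) level if any contender sends a 0.
        medium = min(priority_fields[i][bit_pos] for i in contenders)
        # A station that reads back a level different from what it sent
        # has lost contention and defers.
        contenders = {i for i in contenders
                      if priority_fields[i][bit_pos] == medium}
    return contenders

# The station sending 010 wins against stations sending 011 and 100.
assert arbitrate([[0, 1, 1], [0, 1, 0], [1, 0, 0]]) == {1}
```

The sketch also shows why all stations must agree on the medium state within one bit time: the read-back comparison happens for every single bit of the priority field.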
1.3.3.5 CSMA Protocols with Collision Avoidance

If it is technically not feasible to immediately detect collisions, one might try to avoid them. Protocols belonging to this class are called carrier-sense multiple access with collision avoidance (CSMA/CA) protocols. An important application area is wireless LANs, where (1) stations cannot transmit and receive simultaneously on the same channel, and (2) the transmitter cannot directly detect collisions at the receiver due to path loss and the need for a minimum signal strength (see the discussion of the hidden-terminal scenario in Section 1.3.3.2).

The IEEE 802.11 WLAN protocol combines two mechanisms to avoid collisions. The first one is the RTS/CTS handshake protocol described in Section 1.3.3.2. The second mechanism, the carrier-sensing mechanism of IEEE 802.11, not only requires a minimum idle time on the channel, but each station also chooses a random backoff time, during which the carrier-sense operation is continued. If another station starts to transmit in the meantime, the station defers and resumes after the other frame is finished. This approach with random backoff times also enables the introduction of stochastic priorities into IEEE 802.11: frames with different priorities can choose their backoff times from different distributions, with more important frames likely having shorter backoffs than unimportant ones. Such an approach is proposed in [11] and is also used for the IEEE 802.11e extension of the IEEE 802.11 standard.

Another example is the EY-NPMA protocol of HIPERLAN [18]. Here the collision avoidance part consists of three phases; all stations wishing to transmit a frame wait for the end of the ongoing transmission. In the first phase (priority phase), a station waits for a number of slots corresponding to its frame's priority (there are five distinct priorities). If station A decides to transmit in slot n and station B starts in slot m < n, then A defers, since B has a higher-priority frame. In the second phase (elimination phase), each surviving station transmits a burst of random length and then switches to receive mode. If it then detects energy on the channel, the station gives up, since another station has sent a longer burst. In the third phase (yield phase), the surviving stations keep idle for a random amount of time. If another station starts to transmit in the meantime, the station defers. Otherwise, the station starts to transmit its data frame.
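The idea of priority-dependent backoff distributions can be sketched as follows; the contention-window sizes are invented for illustration and are not the parameters of IEEE 802.11e.

```python
import random

# Illustrative contention windows (in slots, assumed values): high-priority
# traffic draws its backoff from a smaller window and therefore tends to
# access the medium earlier on average.
CONTENTION_WINDOW = {"high": 8, "medium": 16, "low": 32}

def draw_backoff(priority: str) -> int:
    """Draw a random backoff, counted in idle slots, for the given class."""
    return random.randrange(CONTENTION_WINDOW[priority])

# The frame whose backoff counter reaches zero first wins this contention round.
waiting = {"high": draw_backoff("high"), "low": draw_backoff("low")}
winner = min(waiting, key=waiting.get)
```

Note that the priorities obtained this way are only stochastic: a low-priority frame can still draw a smaller backoff and win occasionally.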
1.3.4 Fixed-Assignment Protocols

In fixed-assignment (FA) protocols a station is assigned a channel resource (frequency, time, code, space) exclusively; i.e., it does not need to contend with other stations when using its share, and it is intrinsically guaranteed that medium access can be achieved within a bounded time.

In frequency-division multiple-access (FDMA) systems the available spectrum is subdivided into N subchannels, with some guard band between them. A channel is assigned exclusively to a station. When a frame arrives, the station can transmit immediately on the assigned channel; the intended receiver has to know the channel in advance. Idle subchannels cannot be used by highly loaded stations. When a station wants to use multiple subchannels in parallel, it needs multiple transceivers.

In code-division multiple-access (CDMA) systems the stations spread their frames over a much larger bandwidth than needed while using different codes to separate their transmissions. The receiver has to know the code used by the transmitter; all parallel transmissions using other codes appear as noise. Similar to FDMA, stations can transmit newly arriving frames immediately.
In time-division multiple-access (TDMA) systems the time is divided into fixed-length superframes, which in turn are divided into time slots. Each station is assigned a set of slots in every superframe.* During its slots, a station can use the full channel bandwidth. The stations need to be synchronized on slot boundaries; however, the slots contain some guard times to compensate for inaccurate synchronization. In a centralized setting where all stations transmit to a central station, such inaccuracies can be introduced by different propagation delays resulting from different distances between stations and the central controller. In the GSM network a timing-advance mechanism is used to compensate different propagation delays [55]. (A simple slot-schedule sketch follows at the end of this subsection.)

In space-division multiple-access (SDMA) systems spatial resources are divided among stations. Consider, for example, a cellular system, where the base station is equipped with a smart antenna array. Using this array, the base station can form a number of spatially directed spot beams and focus them on the stations. If a beam covers two or more stations, they have to share the channel by some other protocol, but stations in different beams can transmit in parallel. Another example is the use of sectored antennas in cellular systems.

In all these schemes the allocation of channel resources to stations can be static or dynamic. In the static case the allocation may be preconfigured. In a dynamic scheme a station requests the resource once from some resource management facility, which may be part of a central station/access point. The Time-Triggered Protocol is an example of such a scheme [51]. Another example is the cyclic window in WorldFIP, which offers a preconfigured allocation of time slots. Some light dynamics can be introduced by changing the allocation tables at appropriate times.

The fixed assignment of channel resources is advantageous, specifically for industrial applications, for the following reasons:
• It allows the guarantee of a minimum bandwidth to a station.
• It allows the guarantee of a strictly bounded medium access time as well as strictly isochronous service.

*It is perfectly possible to assign slots every k-th superframe as well.
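A static TDMA allocation can be represented as a simple table; the sketch below (slot counts, slot duration, and station names are made up for illustration) shows how a station derives its next transmission opportunity from the superframe structure.

```python
# Illustrative static TDMA schedule: a superframe of 8 slots of 1 ms each.
SLOTS_PER_SUPERFRAME = 8
SLOT_DURATION_MS = 1.0
SCHEDULE = {0: "A", 1: "A", 2: "B", 3: "C", 4: "C", 5: "C", 6: "B", 7: "D"}

def next_transmit_time(station: str, now_ms: float) -> float:
    """Return the start time of the next slot assigned to `station`."""
    current_slot = int(now_ms // SLOT_DURATION_MS)
    for offset in range(1, SLOTS_PER_SUPERFRAME + 1):
        slot = (current_slot + offset) % SLOTS_PER_SUPERFRAME
        if SCHEDULE[slot] == station:
            return (current_slot + offset) * SLOT_DURATION_MS
    raise ValueError(f"station {station} owns no slot")

# Station B, asking at t = 2.4 ms (inside slot 2), transmits next at t = 6 ms.
assert next_transmit_time("B", 2.4) == 6.0
```

With such a static schedule, the medium access delay is bounded by one superframe length, which is exactly the property that makes fixed assignment attractive for industrial traffic.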
1.3.5 Demand-Assignment Protocols

In demand-assignment (DA) protocols channel resources are also assigned exclusively to a station, but on a much shorter timescale than for fixed-assignment protocols. In the latter case, assignment happens once and lasts for the lifetime of a session, while in demand-assignment protocols resources are assigned only for the duration of a data burst. Consequently, for each new data burst a station must obtain new channel resources. Clearly, this involves appropriate signaling mechanisms.

We can broadly distinguish two classes of DA protocols: In distributed protocols there is no central authority for resource allocation; instead, token-passing schemes are often used. On the other hand, in centralized protocols the stations have to signal their demands to a central station, which assigns resources and schedules transmissions. The signaling channel can be either in-band (requests can be piggybacked to transmissions of data or control frames) or a separate logical signaling channel with its own medium access procedure, for example, ALOHA/slotted ALOHA.

For industrial applications demand-assignment protocols have two major advantages: they can guarantee a bounded medium access time and they allow the use of idle resources. However, for the distributed schemes there is inevitably some jitter in the medium access times, which hinders strictly isochronous services. The centralized schemes introduce a single point of failure, namely, the resource manager.

1.3.5.1 Centralized Schemes: Hub-Polling Protocols and Reservation Protocols

As a very general description [44], a hub-polling system consists of a central station (called hub) and a number of stations, with each station conceptually having a queue of frames. The hub carries out two different tasks: (1) it queries the queue states from the stations, and (2) it assigns bandwidth to the stations according to the query results and some polling policy. Typically it is assumed that a query is less costly
than to serve a frame; otherwise, the query overhead would not be justified. To be queried, a station must register itself with the hub. Polling schemes differ in the sequence by which stations are polled:
• In round-robin, the stations are visited one after another.
• In table-driven schemes, the next station to be visited is determined from a prespecified table.
• In random polling, the next station to poll is determined randomly.
Furthermore, they differ in the type of service a polled station is granted (see the sketch at the end of this subsection):
• k-limited service: Up to k frames are served per station before proceeding to the next station.
• Time-limited service: The station may transmit frames, including retransmissions, for no longer than a specified time.
• Exhaustive service: A queue is serviced until it is empty.
• Gated service: The server serves only those frames of station i that were already present when starting service for i.
As an example, the master/slave protocol of PROFIBUS can be classified as a table-driven and time-limited service (however, with varying masters). In the BITBUS protocol [29], the role of the master does not change over time.

A variation of hub-polling protocols is the class of probing protocols [24, 42]. These are based on the observation that polling each station separately is wasteful if the load is low. Instead, it is more effective to poll a group of stations as a whole. For example, the hub may announce that a random access slot follows, which can be used by stations belonging to a certain group to signal their transmission needs. If no station answers, the next group can be polled. If a single station answers, it is granted access to the medium. If two or more stations answer, their requests will collide in the random access slot. Different methods can now be applied to resolve this collision, for example, the tree walking approach discussed in Section 1.3.3.4, or all stations in the group can be polled separately. In [56] the latter approach is introduced, along with a scheme that adapts the group sizes to the current load.

In reservation protocols the stations have to send a reservation message to the resource manager. The reservation message may specify the length of the desired data transmission and its timing constraints. The resource manager can perform an admission control test to decide whether the request can be satisfied without harming the guarantees given to already admitted requests. After successful admission control, the resource manager sends some feedback describing the allocated resources (for example, the time slots to use). There are three common methods to transmit reservation messages: (1) in piggybacking schemes the reservation requests are sent along with already admitted data or control frames; (2) the stations send request frames on a separate signaling channel using a contention-based MAC protocol (ALOHA or CSMA protocols); and (3) the resource manager may poll all stations that are currently idle and thus cannot use piggybacking. Many protocols developed in the context of wireless ATM [17] belong to this class, for example, the MASCARA protocol [38]. The FTT-CAN [10] protocol is another example of this class, where stations send reservation requests for periodic transmissions to a central master station.
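The following sketch (queue contents and the limit k are invented for illustration) shows one cycle of a round-robin, k-limited polling policy as described above.

```python
from collections import deque

def polling_cycle(queues, k=2):
    """One round-robin polling cycle with k-limited service.

    queues: dict mapping station name -> deque of waiting frames.
    Returns the frames served, in service order.
    """
    served = []
    for station, queue in queues.items():      # round-robin visit order
        for _ in range(min(k, len(queue))):    # at most k frames per visit
            served.append((station, queue.popleft()))
    return served

queues = {"S1": deque(["f1", "f2", "f3"]), "S2": deque([]), "S3": deque(["f4"])}
assert polling_cycle(queues) == [("S1", "f1"), ("S1", "f2"), ("S3", "f4")]
```

The k-limit is what bounds the time the hub spends at a heavily loaded station and hence the waiting time of the other stations.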
1.3.5.2 Distributed Schemes: Token-Passing Protocols

In distributed schemes there is no central facility controlling resource allocation or medium access. Instead, a special frame, called the token frame, circulates between stations. Only the station that currently holds the token (token owner) is allowed to initiate transmissions. After some time, the token owner must pass the token to another station by sending a token frame. Token-passing schemes can be applied in networks with a ring topology (examples: IEEE 802.5 Token Ring [28] or Fiber Distributed Data Interface (FDDI) [15, 31]) or with a bus/tree topology (examples: IEEE 802.4 Token Bus [27] or PROFIBUS with the FMS profile [52]).

To guarantee an upper bound on medium access delay, the IEEE Token Bus, FDDI, and PROFIBUS protocols use variants of the timed-token protocol [9]. In this protocol all stations agree on a common parameter, the target token rotation time TTTRT. Furthermore, each station is required to measure the time TRT that passed between the last time it received the token and the actual token reception time. This time
is called the token rotation time. If the difference TTTRT – TRT is positive, the arriving token is called an early token; otherwise, it is called a late token. Some protocols forbid a station to transmit when receiving a late token; however, the PROFIBUS protocol allows transmission of a single high-priority frame in case of a late token. When a station receives an early token, it may transmit for a time corresponding to TTTRT – TRT. This way, the timed-token protocol guarantees an upper bound on the medium access delay (see the sketch at the end of this subsection).

Token-passing protocols over broadcast media (bus, tree) construct a logical token-passing ring. The token frame is passed among all stations in this ring; each station gets the token once per cycle. The ring members have the additional burden of executing ring maintenance algorithms, which include, among others, the inclusion of new stations, the exclusion of leaving or crashed stations, and the detection and recovery of lost tokens. These mechanisms rely on control frames and are designed in a way that they do not harm the timing guarantees given by the timed-token protocol.
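The token-holding rule of the timed-token protocol can be summarized in a few lines; the treatment of a late token below follows the simple "transmit nothing" variant, whereas PROFIBUS, as noted above, would still allow one high-priority frame.

```python
def token_holding_time(ttrt: float, trt: float) -> float:
    """Time a station may use for transmissions on receiving the token.

    ttrt: agreed target token rotation time (TTTRT in the text).
    trt:  measured token rotation time since the previous token visit.
    An early token (trt < ttrt) grants the difference; a late token grants
    nothing in this simplified variant.
    """
    return max(0.0, ttrt - trt)

assert token_holding_time(ttrt=10.0, trt=7.5) == 2.5    # early token
assert token_holding_time(ttrt=10.0, trt=12.0) == 0.0   # late token
```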
1.3.6 Meta-MAC Protocols

In reference [19] meta-MAC protocols are introduced. The basic idea is simple and elegant: a station contains not only a single MAC instance, but several of them, running in parallel. These can be entirely different protocols or the same protocol, but with different parameters. However, only one protocol is really active at a given time in the sense that its decisions (transmit/not transmit) are executed; in the other instances decisions are only recorded. From time to time a new active protocol is selected. This selection is based on history information about transmission outcomes (success, failure). For each candidate protocol it is evaluated how successful the protocol would have been given the outcomes in the history. For example, a protocol that produced a lot of transmit decisions in successful time slots would get a high ranking, while a protocol whose transmit decisions would have resulted in collisions gets a bad ranking. Based on this ranking, a new protocol is chosen.
1.4 Error Control Techniques

When a packet is transmitted over a physical channel, it might be subject to distortions and channel errors. Potential error sources are noise, interference, loss of signal power, etc. As a result, a packet may be either completely or partially lost (for example, when the receiver fails to acquire bit synchronization or loses it somewhere), or a number of the bits within a packet are modified. In some types of channels, errors occur quite frequently, with wireless channels being the prime example [79].

One option to deal with errors is to tolerate them. For example, in Voice-over-IP systems a loss rate of speech packets of approximately 1% still gives an acceptable speech quality at the receiver, depending on the codec and the influence of error concealment techniques [65, Chapter 7]. Hence, as long as the loss rate is below this level, no action needs to be taken. However, in safety-critical industrial applications, errors are often not tolerable; they must be detected and subsequently corrected. There are the following fundamental approaches to error control [61, 69, 70]:
• In open-loop approaches the transmitter receives no feedback from the receiver about the transmission outcomes. Redundancy is introduced to protect the transmitted data against a certain amount of errors.
• In closed-loop schemes the transmitter gets feedback about erroneously received packets. The receiver requests retransmission of these packets by the transmitter.
• In hybrid schemes these two approaches are combined.
The detection of errors is based on checksums, which are appended to a packet. Well-known kinds of checksums are cyclic redundancy checks (CRCs) or parity bits [69]. However, no checksum algorithm is perfect; there are always bit error patterns that cannot be detected by a checksum algorithm. Hence, the residual error probability is nonzero, but fortunately very small for many practical channels. A study of the performance of checksum algorithms over real data is [76]. There is a rich literature on error control. Some standard references are [61], [69], [70], and [71].
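As an example of such a checksum, the sketch below computes a CRC with the common CRC-16/CCITT generator polynomial (0x1021) in a deliberately simple, unoptimized bit-by-bit fashion; real implementations usually use table-driven or hardware variants.

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bit-by-bit CRC computation with the CRC-16/CCITT generator polynomial."""
    crc = init
    for byte in data:
        crc ^= byte << 8                     # feed the next byte into the register
        for _ in range(8):
            if crc & 0x8000:                 # MSB set: subtract (XOR) the generator
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

assert crc16_ccitt(b"123456789") == 0x29B1   # well-known check value for this CRC

# Transmitter: append the checksum to the packet.  Receiver: recompute it over
# the received data and compare; a mismatch indicates a detected error.
payload = b"process value 42"
fcs = crc16_ccitt(payload)
```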
1.4.1 Open-Loop Approaches

In general, open-loop approaches involve redundant data transmission. Several kinds of redundancy can be used:
• Send multiple copies of a packet.
• Add redundancy bits to the packet data.
• Diversity techniques.
In the multiple-copies approach, the transmitter sends K identical copies of the same packet [57], each one equipped with a checksum. If the receiver receives at least one copy without checksum errors, this is accepted as the correct packet. If the receiver receives all copies with checksum errors, it might apply a bit-by-bit majority voting scheme [73, Chapter 4] on all received copies and check the result again. A variation of the multiple-copies scheme is to not send multiple copies of the same packet, but to send each bit of the user data multiple times: instead of sending 00110101, the transmitter sends, for example, 000.000.111.111.000.111.000.111. Hence, each user bit is transmitted three times and the receiver applies majority voting to each group of three bits (see the sketch at the end of this subsection).

In error-correcting or forward error correction (FEC) codes, a number n – k of redundancy bits is appended to k bits of user data and the block of n bits is transmitted (the fraction k/n is called the code rate), such that bit errors can be detected and a limited amount of bit errors can be corrected [69–71]. In block coding schemes, the user data are divided into blocks of k bits and each block is coded independently. Some well-known block FEC schemes are Reed–Solomon codes, Hamming codes, and Bose–Chaudhuri–Hocquenghem (BCH) codes. In convolutional coding schemes, the encoder has some memory, such that the coding of the current bit affects the coding of future bits. Therefore, there are no clear block boundaries. Recently, the class of turbo codes has attracted considerable attention [75, 60]. In this class of codes two convolutional codes are concatenated and combined with an interleaver [58].

Diversity techniques are often applied on wireless channels. In general, in diversity schemes multiple copies of the same signal are created and the receiver tries to combine these copies in a sensible way. These copies can be created either explicitly (by sending the same packet multiple times on the same channel, on different channels, in different directions, etc.) or implicitly (by letting the channel itself create the multiple signal copies, for example, through reflections). In the case of receiver diversity, the receiver is equipped with two or more antennas. If these are appropriately spaced [72], the antennas receive two copies of the transmitted waveform, which in the best case are uncorrelated. Hence, it might happen that one antenna receives only a weak signal while the other one experiences good signal quality. The two antenna signals may then be combined in different ways. Clearly, there are many more diversity schemes.
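The bit-repetition variant of the multiple-copies approach mentioned above can be written down directly (a repetition factor of 3 is chosen for illustration):

```python
def repeat_encode(bits, copies=3):
    """Transmit every user bit `copies` times (open-loop redundancy)."""
    return [b for b in bits for _ in range(copies)]

def majority_decode(received, copies=3):
    """Recover each bit by majority voting over its group of copies."""
    groups = [received[i:i + copies] for i in range(0, len(received), copies)]
    return [1 if sum(g) > copies // 2 else 0 for g in groups]

coded = repeat_encode([0, 0, 1, 1, 0, 1, 0, 1])
coded[7] ^= 1                   # a single bit error inside one group of copies
assert majority_decode(coded) == [0, 0, 1, 1, 0, 1, 0, 1]
```

With three copies per bit, any single bit error within a group is corrected, at the cost of tripling the transmitted data volume.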
1.4.2 Closed-Loop Approaches

In closed-loop approaches the receiving station B checks the arriving packets sent by station A by means of checksums and sequence numbers. In addition, station B provides A with feedback information indicating the transmission outcome (success or failure). Usually, B sends acknowledgment frames to provide this feedback to A, but the feedback information may as well be piggybacked onto data frames sent from B to A. Automatic repeat request (ARQ) protocols implement this approach [23, 63]. Some basic ARQ protocols are the send-and-wait/alternating-bit protocol, the Go-Back-N protocol, and the Selective-Repeat protocol.

In the send-and-wait/alternating-bit protocol, the transmitter sends a packet and starts a timer. The receiver sends an acknowledgment if the packet is received correctly; otherwise, it keeps quiet. If the transmitter receives the acknowledgment, the timer is canceled and the next packet is transmitted. If the transmitter's timer expires without acknowledgment, the transmitter retransmits the packet. A 1-bit sequence number is used to prevent duplicates at the receiver, which can occur if not the data frame but the acknowledgment is lost and the same data frame is transmitted again. If the receiver receives a duplicate packet, the packet is acknowledged, but the data are not delivered to the user. This protocol is simple and works reliably as long as the delay for data packets or acknowledgments can be upper bounded.
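A sketch of the transmitter side of the send-and-wait/alternating-bit protocol follows; the channel abstraction, the retry limit, and the loss probability are placeholders of ours, and timers are modeled simply as a failed exchange.

```python
import random

def send_and_wait(packets, send, timeout_retries=10):
    """Transmitter side of the alternating-bit protocol (simplified sketch).

    `send(seq, packet)` models one transmission attempt over an unreliable
    channel and returns True if the matching acknowledgment arrived before
    the timeout, False otherwise.
    """
    seq = 0
    for packet in packets:
        for _ in range(timeout_retries):
            if send(seq, packet):      # acknowledgment received: packet done
                break
            # timer expired: retransmit the same packet with the same sequence bit
        else:
            raise RuntimeError("link considered broken")
        seq ^= 1                       # alternate the 1-bit sequence number
    return True

# Toy channel that loses roughly 30% of the frame/acknowledgment exchanges.
lossy = lambda seq, pkt: random.random() > 0.3
send_and_wait([b"a", b"b", b"c"], lossy)
```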
A drawback of this protocol is its inability to fill "long fat pipes" (links with a high bandwidth-delay product), since there can be at most one outstanding (not yet acknowledged) frame at any time. Both the Go-Back-N and the Selective-Repeat protocols are not restricted to a single unacknowledged or outstanding frame, but allow for multiple outstanding frames; these protocols are also called sliding-window protocols. The frames are identified by sequence numbers.

In the Go-Back-N protocol there may be up to N outstanding frames. The transmitter sets a timer for each transmitted frame and resets it as soon as an acknowledgment for this frame is received. If the receiver receives an in-sequence frame, it delivers the frame to its local user and sends a positive acknowledgment; otherwise, the frame is dropped (even if it is received correctly) and the receiver sends a negative acknowledgment or keeps quiet. When the transmitter receives a negative acknowledgment for an outstanding frame, or if the timer for this frame expires, it retransmits this frame and all subsequent outstanding frames. Therefore, it might happen that correctly received frames are retransmitted, which is inefficient. This drawback is attacked by the Selective-Repeat protocol, which works similarly to Go-Back-N, but allows the receiver to buffer and acknowledge frames that are not received in sequence. As soon as the missing frames arrive, the buffered frames are delivered to the user in their correct sequence and the buffer is freed.
1.4.3 Hybrid Approaches

Open-loop and closed-loop approaches can be combined, forming so-called hybrid ARQ protocols. Some simple schemes are:
• To each packet some light FEC is applied; the remaining errors are corrected through the ARQ mechanism.
• Normal packets are not FEC coded; only retransmissions are.
• Increase the amount of redundancy for subsequent retransmissions, for example, by adapting the number of copies in a multicopy approach [57].
Another line of attack would be to make the receiver more clever and to take advantage of the information contained in already received erroneous packets by using packet-combining methods [64, 66, 78], for example, equal-gain combining or bit-by-bit majority voting. Such an approach is also referred to as type II hybrid ARQ [70]. In [59] and [77] this approach has been made deadline aware, by adopting the strategy to increase the coding strength (decreasing the code rate) more and more as the packet deadline comes closer (deadline-dependent coding). In reference [62] a scheme is described that takes both the estimated channel state and the packet deadline into account to select one coding scheme from a set of available schemes. For a bursty wireless channel, this scheme reduces the bandwidth need for a prescribed maximum failure probability, compared to a static scheme solely taking the channel state into account.

A scheme that utilizes already received and partially erroneous packets but does not require redundancy is the intermediate checksum method (see, for example, [68]), where a packet is not equipped with a single checksum covering its whole contents, but is subdivided into several chunks, such that each chunk is equipped with a separate checksum. The receiver requests only the erroneous chunks for retransmission.
1.4.4 Further Countermeasures

The transmitter has some further control knobs to reduce the probability of packet errors at the receiver. One control knob is the packet length: in a scenario where a packet is equipped with a checksum but not with redundant FEC data, longer packets have a higher probability of being received in error. On the other hand, short packets have a higher probability of being received successfully, but the overhead of the fixed-length packet header becomes prohibitive. For a given channel quality there is an optimum packet length, and adaptive schemes exist to find this [67, 68] (a small calculation is sketched at the end of this subsection).

It is a fundamental communications law that the bit error rate at the receiver depends on the ratio of the energy expended per bit to the channel noise level [74]. There are two possibilities to use this relationship to increase transmission reliability:
• If the transmit power is increased, the energy per bit is increased and the bit error rate is reduced. However, often the transmit power is technically or legally restricted. • If the bits are transmitted at lower speed, the energy per bit is increased, too. Hence, a transmitter might apply transmit power control or modulation rate control.
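The packet-length trade-off mentioned before this list can be made tangible with a small numerical sketch (Python). It assumes independent bit errors with probability p and a fixed header of H bytes, both simplifications; under these assumptions the fraction of channel bits that end up as useful payload is (L/(L+H)) · (1 - p)^(8(L+H)), and a simple scan over L reveals the optimum payload length for a given channel quality.

```python
def efficiency(payload_bytes, header_bytes, bit_error_rate):
    """Fraction of transmitted bits that are useful payload bits of frames
    received without error (independent bit errors assumed)."""
    total_bits = 8 * (payload_bytes + header_bytes)
    frame_ok = (1.0 - bit_error_rate) ** total_bits
    return payload_bytes / (payload_bytes + header_bytes) * frame_ok

def best_payload_length(header_bytes, bit_error_rate, max_bytes=2000):
    return max(range(1, max_bytes + 1),
               key=lambda L: efficiency(L, header_bytes, bit_error_rate))

# With a 16-byte header, a clean channel favors long frames,
# while a noisy channel favors much shorter ones.
for ber in (1e-6, 1e-4):
    L = best_payload_length(header_bytes=16, bit_error_rate=ber)
    print(ber, L, round(efficiency(L, 16, ber), 3))
```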
1.5 Flow Control Mechanisms Flow control compensates for different processing speeds of transmitters and receivers [12, Chapter 6], [23], [80]. Specifically, if the receiver does not have enough resources (buffers, processing speed) to process packets as fast as the transmitter sends them, mechanisms to slow down the transmitter are useful. Otherwise, the receiver would have to drop data packets, causing the transmitter to retransmit them and to waste further network resources. It is therefore necessary for the transmitter to receive feedback from the receiver. The function of flow control is to be distinguished from congestion control, although many authors consider the former to be a special case of the latter. Congestion control is relevant in multihop networks, where two end nodes are connected by a series of intermediate nodes, for example, routers. In congestion control, it is not the ultimate receiver but the intermediate nodes that need to be protected against resource exhaustion. However, here we do not discuss congestion control any further. Fieldbus systems offer different communication models. In one important model, all communications are performed between individually addressable stations and all packets are delivered from the link layer to the upper layers. Representatives of this class are PROFIBUS [52] and Foundation/IEC Fieldbus [26]. These systems can benefit from flow control mechanisms. Another important model is that of a real-time database, where not stations but data are addressed. The owner of a data item (the producer) broadcasts it, and all nodes interested in the data (the consumers) copy the data item into a preallocated local buffer. Each time the consumer receives an updated version of the same data item, it copies the data silently into the local buffer without notifying the applications running on the consumer node. The latter read the buffer contents when they need the value of the data item. Reading from and writing to this buffer are decoupled, and reading the buffer does not trigger any communications to fetch the data from the producer. Representatives of this class are CAN [30] and Factory Instrumentation Protocol (FIP)/WorldFIP [53]. Flow control is not an issue here, since the buffers are preallocated. Conceptually, flow control mechanisms need two key ingredients: • A signaling mechanism provides the transmitter with information about the available resources at the receiver. Different signaling mechanisms can vary in their accuracy (number of distinguishable states of receiver resource utilization), update frequency, signaling path (in-band or out-of-band), and relationship to other mechanisms, for example, error control. • A signaling answer determines the transmitter's reaction to flow control signals. Flow control is not restricted to the data link layer, but is used in higher-layer protocols like the Transmission Control Protocol (TCP) as well. In the following we describe some of the most important mechanisms frequently found on the link layer. A more general discussion can be found in textbooks like [12], [23], and [45].
1.5.1 XON/XOFF and Similar Methods This family of flow control methods is simple. The receiver distinguishes only two different states: ready or not ready to accept frames. The transmitter, upon acquiring a ready signal, transmits frames at an arbitrary rate until it acquires a not-ready signal. After this, the transmitter does not transmit any data packets until a ready signal is again acquired. This basic scheme is implemented in different protocols. One application of this scheme can be found in the ITU V.24 recommendation (equivalent to EIA RS232-C), providing an interface between a DTE (data terminal equipment, for example, a computer) and
a DCE (data communications equipment, for example, a modem). This interface targets asynchronous and serial communications. It provides either 9 or 25 lines, 2 of which are the request-to-send and clear-to-send lines. When the DTE wants to send data to the DCE, it raises the RTS line. If the DCE is willing to accept data, it answers by raising the CTS line. As soon as the DCE lowers the CTS signal, the DTE must stop transmission. This is an out-of-band signaling mechanism since data and flow control signals do not share the same line. The XON/XOFF mechanism is employed, for example, in the North American Digital Data System (DDS) [81, Section 24-11]. This mechanism rests on two characters of the underlying charset. For example, in the ASCII charset the DC1 character is used for XON and the DC3 character is used for XOFF. When the transmitter receives an XOFF character, it must stop its transmission, and it may resume as soon as XON is received. This is an in-band mechanism; the occurrence of these characters in the payload must be prevented by using proper escaping mechanisms (see also Section 1.2). The HDLC family of link layer protocols [81, Section 26-2] uses special supervisory frames for executing flow control: the RR (receive ready) and RNR (receive not-ready) frames. Since user data are transmitted in another frame type, the RR/RNR mechanism uses out-of-band signaling. The receiver issues an RNR frame when all its buffers are full. As soon as it can accept new data, the receiver sends an RR frame. This frame also contains the sequence number of the next expected data frame (see Section 1.7).
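A minimal sketch of the receiver side of XON/XOFF flow control is given below (Python). The DC1/DC3 characters are the ASCII codes mentioned above; the buffer thresholds are arbitrary, and the escaping of control characters that happen to occur in the payload is omitted for brevity. The transmitter side is symmetric: it stops feeding the line after seeing DC3 and resumes after DC1.

```python
XON, XOFF = b"\x11", b"\x13"   # ASCII DC1 and DC3

class XonXoffReceiver:
    """Receiver side: emits XOFF when its buffer fills up, XON when it drains."""

    def __init__(self, send_control, high=80, low=20):
        self.send_control = send_control  # callback that sends a control character
        self.high, self.low = high, low   # buffer thresholds (arbitrary here)
        self.buffer = []
        self.stopped = False

    def on_byte(self, byte):
        self.buffer.append(byte)
        if not self.stopped and len(self.buffer) >= self.high:
            self.send_control(XOFF)       # in-band signal: transmitter must pause
            self.stopped = True

    def consume(self, n):
        del self.buffer[:n]               # the local user processes some data
        if self.stopped and len(self.buffer) <= self.low:
            self.send_control(XON)        # transmitter may resume
            self.stopped = False

line = []
rx = XonXoffReceiver(send_control=line.append, high=3, low=1)
for b in b"abcd":
    rx.on_byte(b)          # the third byte triggers XOFF
rx.consume(3)              # the buffer drains, so XON is signaled
print(line)                # -> [b'\x13', b'\x11']
```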
1.5.2 Sliding-Window Flow Control In this class of schemes flow control is integrated with a sliding-window ARQ protocol like Goback-N or Selective-Repeat (see Section 1.4.2). The transmitter has a buffer for a number W of packets, called its window. The window size W specifies the number of allowed outstanding packets, i.e., packets for which the transmitter has not received an acknowledgment (yet). The receiver can use this for flow control by delaying its acknowledgments. This approach is tightly integrated with the ARQ protocol and does not need any extra control frames or escape mechanisms. However, there are two important drawbacks: • Delaying acknowledgments is not a good idea for time-critical transmissions with desired response times in the millisecond range. • Even without real-time requirements, the link layer protocol does not wait arbitrarily long for acknowledgments. Instead, with each frame a timeout is associated. In link layer protocols these timeouts are typically chosen such that propagation delays, packet generation delays, and processing delays are covered, but nothing more. This is in sharp contrast to multihop networks, where queueing delays are a significant fraction of the overall delay. In multihop networks, timeouts are therefore chosen much larger than necessary for a single link. Either way, if a timeout occurs, the transmitter retransmits the packet. If the receiver is still busy, the retransmission is wasted.
1.5.3 Further Mechanisms One straightforward approach to flow control can be applied in connection-oriented link layer protocols. Upon connection setup, the receiving station specifies a rate at which the transmitter may send packets [80] and which the receiver guarantees to always accept. Instead of a rate specification, the receiver can also specify the parameters (s, r) of a leaky bucket. The leaky-bucket scheme works as follows: The transmitter generates permits at rate r, i.e., one permit every 1/r seconds. The transmitter is allowed to keep a maximum number of s permits; any permit in excess of this number is dropped. When a packet is to be transmitted, the transmitter checks whether a permit is available. If so, the number of stored permits is decremented and the packet is transmitted. Otherwise, the transmitter has to wait for the arrival of the next permit. Another approach is used by TCP, which contains a flow control mechanism where the receiving end of a TCP connection explicitly tells the sender about its available buffer space (the advertised window). The advertised window is part of the TCP header and carried in each acknowledgment or data packet
going back from the receiver to the transmitter. This mechanism is independent of any underlying link layer flow control mechanism.
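The leaky-bucket permit scheme described at the beginning of this subsection can be sketched as follows (Python). The parameters correspond to the (s, r) pair from the text; deriving permits from wall-clock time instead of a separate permit-generating timer is a simplification of this sketch.

```python
import time

class LeakyBucket:
    """Permit-based leaky bucket: rate r permits per second, at most s stored."""

    def __init__(self, r, s):
        self.r, self.s = r, s
        self.permits = s                  # start with a full bucket
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.permits = min(self.s, self.permits + (now - self.last) * self.r)
        self.last = now

    def try_send(self):
        """True if a packet may be transmitted now, consuming one permit."""
        self._refill()
        if self.permits >= 1.0:
            self.permits -= 1.0
            return True
        return False                      # the transmitter must wait for a permit

bucket = LeakyBucket(r=100.0, s=10)       # 100 packets/s sustained, bursts of 10
sent = sum(bucket.try_send() for _ in range(50))
print(sent)                               # roughly the bucket depth, i.e., about 10
```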
1.6 Packet Scheduling Algorithms At an abstract level, a packet scheduling algorithm selects the packet to be transmitted next, after service of the current packet has been finished. The packet is selected from a set of waiting packets. The packet waiting room can be located within a single station, but it can also be distributed over several stations. In the latter case, a MAC protocol can be considered part of a packet scheduling algorithm, since it decides which station may transmit a packet, and the winning station has to make a local decision regarding which of its waiting packets to transmit. In this section we consequently restrict the perspective to a single station and its packet scheduler. As opposed to processor scheduling algorithms, packet scheduling algorithms are nonpreemptive; i.e., an ongoing transmission is not interrupted upon arrival of a more important packet. A packet scheduler bases its decision upon some performance objectives to be optimized. Typical objectives are low delay, avoidance of deadline misses, low jitter, fairness, high throughput, and respect for priorities. In the absence of any specific criterion, packets are often served on a first-come, first-served (FCFS) basis. In this section we discuss some popular scheduling schemes. More detailed introductions to packet scheduling and to more general scheduling problems can be found in [87, Chapter 9] and [86].
1.6.1 Priority Scheduling In priority scheduling each packet is tagged with an explicit priority value, or the packet priority is derived from other packet attributes like addresses, packet types, and so on. The scheduler always selects the packet that currently has the highest priority. Multiple packets of the same highest priority are served in random order or FCFS order. Some algorithms map time-dependent information onto priorities. One example is the rate-monotonic scheduling algorithm [88] and its nonpreemptive extensions. Here it is assumed that the packets are generated by different periodic streams or flows, and each packet is associated with a deadline corresponding to its flow period. The priorities are then assigned in inverse order of the periods; therefore, the stream with the smallest period receives the highest priority. Another example is the earliest-deadline-first (EDF) algorithm, where the packet with the tightest deadline has the highest priority.
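Both static priority scheduling and EDF can be realized with a priority queue keyed on the chosen criterion, as in the illustrative Python sketch below (our own sketch, not a prescribed implementation): for static priorities the key is the priority value, for EDF it is the absolute deadline.

```python
import heapq
import itertools

class PriorityScheduler:
    """Nonpreemptive packet scheduler that always serves the smallest key next:
    key = priority value for static priority scheduling,
    key = absolute deadline for earliest-deadline-first (EDF)."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()   # FCFS order among packets with equal keys

    def enqueue(self, key, packet):
        heapq.heappush(self._heap, (key, next(self._tie), packet))

    def next_packet(self):
        # Called only after the ongoing transmission has finished
        # (an ongoing transmission is never preempted).
        return heapq.heappop(self._heap)[2] if self._heap else None

# EDF example: the packet with the tightest deadline is served first.
edf = PriorityScheduler()
edf.enqueue(key=12.0, packet="sensor update")   # deadline at t = 12 ms
edf.enqueue(key=5.0, packet="alarm")            # deadline at t = 5 ms
print(edf.next_packet())                        # -> "alarm"
```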
1.6.2 Fair Scheduling In recent years, many algorithms for fair queueing have been developed. The packets are grouped into distinct flows, and a separate queue is associated with each flow. Within a flow, packets are served in FCFS order. A nonempty queue is said to be backlogged. The goal of a fair scheduling algorithm is twofold: • Each backlogged flow should get a minimum share of the available bandwidth independent of the behavior of the other flows (firewall property). • The bandwidth of a currently inactive flow should be fairly distributed among the other flows, thus making efficient use of bandwidth. One of the simplest fair queueing algorithms is the round-robin algorithm, where all backlogged queues are served in round-robin order. A modification of this scheme is weighted round-robin, where each flow i is associated with a specific weight $f_i$ such that
$\sum_{i \in F} f_i = 1$ (F is the set of all flows). The time is divided into epochs of fixed upper length t. At the start of an epoch the scheduler determines the set of backlogged queues. The first nonempty queue j receives service until it is empty or its transmission time approaches its share $f_j \cdot t$ of the overall epoch. Following this, the second nonempty queue is served, and so on. The next epoch starts when all nonempty queues of the previous epoch have been served.
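The epoch-based weighted round-robin just described might be sketched as follows (Python). Using byte counts in place of transmission times and a fixed per-epoch budget derived from $f_i \cdot t$ are simplifications of this sketch.

```python
from collections import deque

def weighted_round_robin_epoch(queues, weights, epoch_bytes):
    """Serve one epoch of weighted round-robin: backlogged flow i may
    transmit up to weights[i] * epoch_bytes bytes in this epoch
    (byte counts stand in for transmission time in this sketch)."""
    served = []
    for i, queue in enumerate(queues):
        budget = weights[i] * epoch_bytes
        while queue and len(queue[0]) <= budget:   # stop before exceeding the share
            packet = queue.popleft()
            budget -= len(packet)
            served.append((i, len(packet)))
    return served

queues = [deque([b"a" * 300, b"a" * 300]),   # flow 0, weight 0.75
          deque([b"b" * 300, b"b" * 300])]   # flow 1, weight 0.25
print(weighted_round_robin_epoch(queues, [0.75, 0.25], epoch_bytes=1200))
# flow 0 may send up to 900 bytes (both packets), flow 1 only 300 bytes (one packet)
```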
Other fair queueing algorithms have been derived from the generalized processor sharing (GPS) approach [90–92]. In its pure form, GPS assumes a number of flows with associated weights $f_i$, with $\sum_{i \in F} f_i = 1$. All backlogged queues are served in parallel. Serving a queue means transmission of the packet at the head of the queue. As soon as this packet is finished, the next packet's transmission is started. It can be shown that GPS has the following desirable property: if queue i is always nonempty during the time interval $(t_1, t_2)$ and if $W_i(t_1, t_2)$ is the amount of service queue i receives during $(t_1, t_2)$, then for a GPS server the following holds:

$$\frac{W_i(t_1, t_2)}{f_i} \ge \frac{W_j(t_1, t_2)}{f_j}$$

for all sessions j (except those with $W_j(t_1, t_2) = 0$). If both sessions i and j are backlogged during $(t_1, t_2)$, then we have

$$\frac{W_i(t_1, t_2)}{f_i} = \frac{W_j(t_1, t_2)}{f_j}$$

This scheme, however, is not directly usable for packet-based communications, since at most one packet can be transmitted at a time. Therefore, packet-based approximations to GPS have been developed. A good approximation strategy tries to pick the packets in the same order as they would finish under GPS. This decision has to be made each time a packet has finished transmission and the next packet is to be picked. However, at this time the packet that would finish next under GPS may not have arrived yet. The scheduler can only take the currently backlogged queues into account when making its decision. In weighted fair queueing (WFQ) the scheduler simulates the GPS operation. More specifically, for each flow a virtual time is maintained. For the k-th packet of flow i, the virtual start and finish times $S_i^k$ and $F_i^k$ are defined as [84]

$$S_i^k = \max\{F_i^{k-1}, V(a_i^k)\}, \qquad F_i^k = S_i^k + \frac{L_i^k}{r_i}$$

where $F_i^0 = 0$, $r_i = r \cdot f_i$ (r is the overall link capacity), $a_i^k$ is the arrival time of the k-th packet in flow i, $L_i^k$ is its length, and $V(\cdot)$ is a so-called virtual-time function. This basic algorithm can be performed with different virtual-time functions (vtf). For the vtf of WFQ, the following holds:

$$V_{WFQ}(t_1) = 0, \qquad \frac{\partial V_{WFQ}(t)}{\partial t} = \frac{1}{\sum_{i \in B_{WFQ}(t)} f_i}$$

Here $t_1$ denotes the beginning of a system busy period and $B_{WFQ}(t)$ denotes the set of backlogged queues at time t. By this definition the vtf may change on every packet arrival or departure, and the vtf needs to be tracked by the scheduler. Therefore, the vtf has a computational complexity proportional to the number of flows. A number of schemes have been developed with similar properties but lower complexity. One example is frame-based fair queueing [93]. Both GPS and WFQ can be shown to guarantee a minimum service rate of $r_i = r \cdot f_i$ to a flow i.
If flow i is leaky bucket constrained (i.e., the packets have a minimum interarrival time and the packet length is appropriately bounded), then it can also be shown that for each packet an upper bound on its finishing time can be guaranteed. In the context of wireless transmission media, the situation changes slightly: to avoid wasting resources, the scheduler should pick a packet only from those backlogged flows whose head-of-queue packet is destined for a station for which the wireless channel is currently in a good state. Therefore, a backlogged flow does not receive service if its packet is likely to fail [85]. A number of wireless fair queueing schemes have been developed (see, for example, [89]), differing, for example, in the amount of compensation granted to flows that have suffered from bad channels for some time.
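Returning to the wireline case, the WFQ stamping rules given above can be turned into a short sketch (Python). Each arriving packet is stamped with its virtual start and finish times, and the scheduler always transmits the packet with the smallest virtual finish time; maintaining the virtual-time function itself is left to the caller here, which is a simplification compared to a full WFQ implementation.

```python
import heapq

class WFQScheduler:
    """Simplified weighted fair queueing: packets are stamped with virtual
    start/finish times S_i^k and F_i^k and served in order of F_i^k.
    The current value of the virtual-time function V(t) is supplied by the
    caller; tracking V(t) itself is omitted in this sketch."""

    def __init__(self, link_rate, weights):
        self.r = link_rate                             # overall link capacity
        self.weights = weights                         # weights f_i (summing to 1)
        self.last_finish = {i: 0.0 for i in weights}   # F_i^0 = 0
        self._heap = []
        self._tie = 0

    def arrive(self, flow, length, virtual_now):
        start = max(self.last_finish[flow], virtual_now)          # S_i^k
        finish = start + length / (self.r * self.weights[flow])   # F_i^k
        self.last_finish[flow] = finish
        self._tie += 1
        heapq.heappush(self._heap, (finish, self._tie, flow, length))

    def next_packet(self):
        if not self._heap:
            return None
        _, _, flow, length = heapq.heappop(self._heap)
        return flow, length

wfq = WFQScheduler(link_rate=1e6, weights={"A": 0.8, "B": 0.2})
wfq.arrive("A", length=8000, virtual_now=0.0)     # lengths in bits
wfq.arrive("B", length=8000, virtual_now=0.0)
print(wfq.next_packet())   # ('A', 8000): flow A would finish first under GPS
```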
1.7 Link Layer Protocols In this section we present two standard link layer protocols. In general, a link layer protocol combines several of the mechanisms discussed in the previous sections.
1.7.1 The HDLC Protocol Family The HDLC protocol (high-level data link control) [81, Section 26.2; 82] can be considered the “mother” of many link layer protocols, including LAPB (used in X.25), LAPD (used in ISDN), LAPDm (used in GSM), and the IEEE 802.2 Logical Link Control (LLC) protocol discussed in Section 1.7.2. An HDLC variant is also used in the IEEE BITBUS standard [29]. It is designed for point-to-point links; however, it can also be used over multiple-access channels with unique station addresses. The HDLC protocol distinguishes the following station types: • A primary station controls the link; it is responsible for error control, flow control, and setup and teardown of the logical link. All frames generated by a primary station are called commands. • A secondary station is controlled by a primary station. Specifically, it may not initiate data transfers on its own. Frames generated by a secondary station are called responses. • A combined station combines these two roles. These station types can be used in two different configurations: • In the unbalanced configuration there is a single primary station and a number of secondary stations with distinct addresses. • In the balanced configuration two combined stations are connected. Since either station is both a primary and a secondary station, both can initiate data transfers. The HDLC protocol offers three modes of operation: • In the normal response mode (NRM) there is a central coordinator (a primary station) and a number of secondary stations. The secondary stations only send frames upon being polled by the primary station. • In the asynchronous response mode (ARM) the same configuration is used as in the normal response mode, but a secondary station may send frames on its own, without having to wait to be polled. • The asynchronous balanced mode (ABM) is used on point-to-point links. There is a combined station at either end of the link. This mode is used, for example, in the X.25 link layer and in the IEEE 802.2 Logical Link Control (see next section). The protocol is built upon three different frame types, illustrated in Figure 1.9. The general frame format has a flag field, used for bit and frame synchronization (see Section 1.2); an address field to identify a specific secondary on a multipoint link; a control field (explained below); an optional information field carrying the user data; a frame check sequence (FCS) field containing a 16- or 32-bit CRC checksum; and a closing flag. The three frame types are distinguished by their purpose and the different layouts of the control field:
FIGURE 1.9 HDLC frame structure. [Figure: the general frame consists of the fields Flag (8 bits), Address (8/16 bits), Control, Information (variable length), FCS (16/32 bits), and closing Flag (8 bits); the control-field layouts are I-frame: 0, N(S), P/F, N(R); S-frame: 1, 0, S, S, P/F, N(R); U-frame: 1, 1, M, M, P/F, M, M, M.]
• Supervisory frames (S-frames) are used for error control and flow control purposes: — The two S-bits in the supervisory frame correspond to four different receiver answers: RR and RNR are used for flow control (see Section 1.5), whereas the two answers REJ and SREJ belong to the Goback-N or Selective-Repeat ARQ protocol (the HDLC frame format is usable for both protocols). — The P/F (poll/final) bit is a poll bit if it is sent in a command frame; otherwise, it is called the final bit. If the poll bit is set, the primary requires an acknowledgment for the corresponding command frame. If the secondary answers with the final bit set to one, this indicates that the command frame has been received and the corresponding command has been successfully executed. If the final bit is zero, the secondary indicates successful reception of the command frame, but executing the requested command has not (yet) been finished. — The receiver sequence number N(R) is discussed below. • Unnumbered frames (U-frames) are used for link management purposes (link setup, teardown). The five M-bits (mode bits) encode commands and responses. When used as a command, the primary can set the secondary’s mode of operation (ABM, ARM, NRM), reset the secondary, disconnect from the secondary, or reject a frame. When used as a response, the M-bits either acknowledge or reject a command. • Information frames (I-frames) carry user data. The control field contains transmit and receive sequence numbers. A station transmitting data packets equips each I-frame with a sequence number N(S). The receiver checks whether N(S) is the same as the expected sequence number N(R). If so, the receiver increments N(R) and sends an acknowledgment carrying the new value. This acknowledgment is either piggybacked onto an I-frame going in the opposite direction or sent as a separate S-frame when there is no outgoing I-frame ready for transfer. With the help of these sequence numbers, the receiver can detect lost and duplicate frames. HDLC supports procedures for setup and teardown of a logical connection and allows specification of the operation mode (ABM, ARM, NRM) between two stations. With the available frame types, different ARQ protocols can be implemented, most notably Goback-N and Selective-Repeat (called Selective-Reject in the context of HDLC). The several HDLC variants differ in their ARQ protocols and in the supported modes. In the next section we briefly discuss one of HDLC’s descendants, the IEEE 802.2 Logical Link Control protocol.
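Before that, the control-field layouts of Figure 1.9 can be illustrated with a small decoding sketch (Python). It covers only the basic one-byte control field with 3-bit sequence numbers, and it assumes one common bit-ordering convention (the first transmitted bit is taken as the least significant bit of the octet); extended control fields and the exact command/response repertoires of the individual HDLC variants are outside this sketch.

```python
def decode_control(ctrl: int):
    """Decode a basic (modulo-8) HDLC control octet; the first transmitted
    bit is taken as the least significant bit of the integer value."""
    pf = (ctrl >> 4) & 0x01                    # poll/final bit
    if ctrl & 0x01 == 0:                       # I-frame: first bit is 0
        return {"type": "I", "N(S)": (ctrl >> 1) & 0x07,
                "P/F": pf, "N(R)": (ctrl >> 5) & 0x07}
    if ctrl & 0x03 == 0x01:                    # S-frame: first bits are 1 0
        function = ("RR", "RNR", "REJ", "SREJ")[(ctrl >> 2) & 0x03]
        return {"type": "S", "function": function,
                "P/F": pf, "N(R)": (ctrl >> 5) & 0x07}
    # remaining case (first bits 1 1): U-frame, five modifier bits
    return {"type": "U", "M": ((ctrl >> 2) & 0x03, (ctrl >> 5) & 0x07), "P/F": pf}

# An RR supervisory frame acknowledging frames up to N(R) = 5, with P/F set:
print(decode_control(0b101_1_00_01))
```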
1.7.2 The IEEE 802.2 LLC Protocol The IEEE 802.2/ISO/IEC 8802-2 Logical Link Control (LLC) protocol [83] is a member of the IEEE 802.x family of MAC and link layer protocols. Specifically, it operates on top of the different IEEE 802.x MAC protocols, like Ethernet (IEEE 802.3), Token Bus (IEEE 802.4), Token Ring (IEEE 802.5), or wireless LAN
(IEEE 802.11). The LLC protocol offers three services to upper layers: an unacknowledged connectionless datagram service (best effort), an acknowledged connectionless datagram service, and a reliable connection-oriented service. The LLC can run with several MAC protocols because it makes rather weak assumptions about the MAC services: nothing more than a connectionless best-effort service is assumed. All these services use addressing information consisting of four attributes: source and destination MAC addresses as well as source and destination service access points (SAPs). Consequently, all packets carry SAP addresses in addition to MAC addresses. The connection-oriented service requires explicit setup and teardown of a link layer connection. A link layer connection is characterized by source and destination MAC and SAP addresses. For each link layer connection there is a separate connection context, which includes, among other things, the sequence numbers. A connection provides reliable and in-sequence data delivery, and it is additionally possible to request flow control operations. Specifically, the upper layers can specify an amount of data they are willing to accept. The sequence number fields are larger than shown in Figure 1.9: the 7-bit sequence numbers (counted modulo 128) allow up to 127 outstanding frames. The ARQ protocol is essentially Goback-N; the Selective-Reject feature of HDLC is not used. The LLC uses the asynchronous balanced mode.
Abbreviations
ATM — Asynchronous Transfer Mode
EIA — Electronic Industries Association
FMS — Fieldbus Message Specification
FTT — Flexible Time-Triggered
GSM — Global System for Mobile communications
ITU — International Telecommunication Union
LAP — Link Access Procedure
References Bit and Frame Synchronization [1] Stuart Cheshire and Mary Baker. Consistent overhead byte stuffing. ACM SIGCOMM Computer Communication Review, 27:209–220, 1997. [2] IEEE. Carrier Sense Multiple Access with Collision Detection (CSMA/CD): ETHERNET, 1985. [3] International Organization for Standardization (ISO). IS 1177-1985, Character Structure for Start/ Stop and Synchronous Character Oriented Transmission, 1985. [4] J. Manchester, J. Anderson, B. Doshi, and S. Dravida. IP over SONET. IEEE Communications Magazine, 36(5): 136–142, May 1998. [5] W. Simpson. RFC 1661, The Point-to-Point Protocol (PPP), July 1994. [6] W. Simpson. RFC 1662, PPP in HDLC-Like Framing, July 1994. Obsolete RFC 1549. Status: STANDARD.
Medium Access Control Protocols [7] Norman Abramson. Development of the ALOHANET. IEEE Transactions on Information Theory, 31:119–123, 1985. [8] Norman Abramson, Editor. Multiple Access Communications: Foundations for Emerging Technologies. IEEE Press, New York, 1993. [9] G. Agrawal, B. Chen, W. Zhao, and S. Davari. Guaranteeing synchronous message deadlines with the timed-token medium access control protocol. IEEE Transactions on Computers, 43:327–339, 1994. [10] Luis Almeida, Paulo Pedreiras, and Jose Alberto G. Fonseca. The FFT-CAN protocol: why and how. IEEE Transactions on Industrial Electronics, 49:1189–1201, 2002.
[11] Michael Berry, Andrew T. Campbell, and Andras Veres. Distributed control algorithms for service differentiation in wireless packet networks. In Proc. INFOCOM 2001, Anchorage, AK, April 2001. IEEE. [12] D. Bertsekas and R. Gallager. Data Networks. Prentice Hall, Englewood Cliffs, NJ, 1987. [13] Bluetooth Consortium. Specification of the Bluetooth System. http://www.bluetooth.org, 1999. [14] J.I. Capetanakis. Tree algorithm for packet broadcast channels. IEEE Transactions on Information Theory, 25:505–515, 1979. [15] Biao Chen, Nicholas Malcolm, and Wei Zhao. Fiber distributed data interface and its use for timecritical applications. In Jerry D. Gibson, editor, The Communications Handbook, pp. 597–610. CRC Press/IEEE Press, Boca Raton, FL, 1996. [16] Carla-Fabiana Chiasserini and Ramesh R. Rao. Coexistence mechanisms for interference mitigation in the 2.4-GHz ISM band. IEEE Transactions on Wireless Communications, 2:964–975, 2003. [17] Lou Dellaverson and Wendy Dellaverson. Distributed channel access on wireless ATM links. IEEE Communications Magazine, 35:110–113, 1997. [18] ETSI. High Performance Radio Local Area Network (HIPERLAN): Draft Standard, March 1996. [19] Andras Farago, Andrew D. Myers, Violet R. Syrotiuk, and Gergely V. Zaruba. Meta-MAC protocols: automatic combination of MAC Protocols to optimize performance for unknown conditions. IEEE Journal on Selected Areas in Communications, 18:1670–1681, 2000. [20] Robert G. Gallager. A perspective on multiaccess channels. IEEE Transactions on Information Theory, 31:124–142, 1985. [21] Ajay Chandra V. Gummalla and John O. Limb. Wireless medium access control protocols. IEEE Communications Surveys and Tutorials, 3, 2–15, 2000. http://www.comsoc.org/pubs/surveys. [22] Jaap C. Haartsen. The Bluetooth radio system. IEEE Personal Communications, 7:28–36, 2000. [23] Fred Halsall. Data Communications, Computer Networks and Open Systems. Addison-Wesley, Reading, MA, 1996. [24] J.F. Hayes. Modeling and Analysis of Computer Communications Networks. Plenum Press, New York, 1984. [25] Ivan Howitt. Bluetooth performance in the presence of 802.11b WLAN. IEEE Transactions on Vehicular Technology, 51:1640–1651, 2002. [26] IEC. IEC 1158-1, FieldBus Specification: Part 1: FieldBus Standard for Use in Industrial Control: Functional Requirements. [27] IEEE. IEEE 802.4, Token-Passing Bus Access Method, 1985. [28] IEEE. IEEE 802.5, Token Ring Access Method and Physical Layer Specifications, 1985. [29] IEEE. IEEE 1118, Standard Microcontroller System Serial Control Bus, August 1991. [30] ISO. ISO 11898, Road Vehicle: Interchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication, 1993. [31] Raj Jain. FDDI Handbook: High-Speed Networking Using Fiber and Other Media. Addison-Wesley, Reading, MA, 1994. [32] Leonard Kleinrock and Fouad A. Tobagi. Packet switching in radio channels. Part I. Carrier sense multiple access models and their throughput-/delay-characteristic. IEEE Transactions on Communications, 23:1400–1416, 1975. [33] J.F. Kurose, M. Schwartz, and Y. Yemini. Multiple-access protocols and time-constrained communication. ACM Computing Surveys, 16:43–70, 1984. [34] S.S. Lam. Multiaccess protocols in computer communications. In W. Chon, Editor, Principles of Communication and Network Protocols, Volume I, Principles, pp. 114–155. Prentice Hall, Englewood Cliffs, NJ, 1983. [35] Jim Lansford, Adrian Stephens, and Ron Nevo. Wi-Fi (802.11b) and Bluetooth: enabling coexistence. IEEE Network Magazine, 15:20–27, 2001. 
[36] Andrew D. Myers and Stefano Basagni. Wireless media access control. In Ivan Stojmenovic, Editor, Handbook of Wireless Networks and Mobile Computing, pp. 119–143. John Wiley & Sons, New York, 2002.
[37] IEEE. IEEE 802.11, Standard for Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Networks: Specific Requirements: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher Speed Physical Layer (PHY) Extension in the 2.4 GHz Band, 1999. [38] Nikos Passas, Sarantis Paskalis, Dimitri Vali, and Lazaros Merakos. Quality-of-service-oriented medium access control for wireless ATM networks. IEEE Communications Magazine, 35:42–50, 1997. [39] C.S. Raghavendra and Suresh Singh. Pamas: power aware multi-access protocol with signalling for ad hoc networks. ACM Computer Communication Review, 27, 5–26, 1998. [40] Theodore S. Rappaport. Wireless Communications: Principles and Practice. Prentice Hall, Upper Saddle River, NJ, 2002. [41] Erwin P. Rathgeb. Integrated services digital network (ISDN) and broadband (B-ISDN). In Jerry D. Gibson, Editor, The Communications Handbook, pp. 577–590. CRC Press/IEEE Press, Boca Raton, FL, 1996. [42] Izhak Rubin. Multiple access methods for communications networks. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 622–649. CRC Press/IEEE Press, Boca Raton, FL, 1996. [43] S.R. Sachs. Alternative local area network access protocols. IEEE Communications Magazine, 26:25–45, 1988. [44] Hideaki Takagi. Analysis of Polling Systems. MIT Press, Cambridge, MA, 1986. [45] Andrew S. Tanenbaum. Computer Networks, 3rd edition. Prentice Hall, Englewood Cliffs, NJ, 1997. [46] Andrew S. Tanenbaum. Computernetzwerke, 3rd edition. Prentice Hall, Muenchen, 1997. [47] IEEE. IEEE 802.11, Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, November 1997. [48] Fouad A. Tobagi. Multiaccess protocols in packet communications systems. IEEE Transactions on Communications, 28:468–488, 1980. [49] Fouad A. Tobagi. Multiaccess link control. In P.E. Green, Editor, Computer Network Architectures and Protocols. Plenum Press, New York, 1982. [50] Fouad A. Tobagi and Leonard Kleinrock. Packet switching in radio channels. Part II. The hidden terminal problem in CSMA and busy-tone solutions. IEEE Transactions on Communications, 23:1417–1433, 1975. [51] TTTech Computertechnik GmbH, Vienna. TTP/C Protocol, Version 0.5, 1999. [52] Union Technique de l’Electricit’e. General Purpose Field Communication System, EN 50170, Volume 2, PROFIBUS, 1996. [53] Union Technique de l’Electricit’e. General Purpose Field Communication System, EN 50170, Volume 3, WorldFIP, 1996. [54] Harmen R. van As. Media access techniques: the evolution towards terabit/s LANs and MANs. Computer Networks and ISDN Systems, 26:603–656, 1994. [55] Bernhard Walke. Mobile Radio Networks: Networking, Protocols and Traffic Performance. John Wiley & Sons, Chichester, 2002. [56] Andreas Willig and Andreas Köpke. The adaptive-intervals MAC protocol for a wireless PROFIBUS. In Proc. 2002 IEEE International Symposium on Industrial Electronics, L’Aquila, Italy, July 2002.
Error Control [57] A. Annamalai and Vijay K. Bhargava. Analysis and optimization of adaptive multicopy transmission arq protocols for time-varying channels. IEEE Transactions on Communications, 46:1356–1368, 1998. [58] Sergio Benedetto, Guido Montorsi, and Dariush Divsalar. Concatenated convolutional codes with interleavers. IEEE Communications Magazine, 41:102–109, 2003. [59] Henrik Bengtsson, Elisabeth Uhlemann, and Per-Arne Wiberg. Protocol for wireless real-time systems. In Proc. 11th Euromicro Conference on Real-Time Systems, York, England, 1999. [60] Claude Berrou. The ten-year-old turbo codes are entering into service. IEEE Communications Magazine, 41:110–116, 2003.
[61] Daniel J. Costello, Joachim Hagenauer, Hideki Imai, and Stephen B. Wicker. Applications of errorcontrol coding. IEEE Transactions on Information Theory, 44:2531–2560, 1998. [62] Moncef Elaoud and Parameswaran Ramanathan. Adaptive use of error-correcting codes for real-time communication in wireless networks. In Proc. INFOCOM 1998, San Francisco, March 1998. IEEE. [63] David Haccoun and Samuel Pierre. Automatic repeat request. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 181–198. CRC Press/IEEE Press, Boca Raton, FL, 1996. [64] Bruce A. Harvey and Stephen B. Wicker. Packet combining systems based on the Viterbi decoder. IEEE Transactions on Communications, 42:1544–1557, 1994. [65] Olivier Hersent, David Gurle, and Jean-Pierre Petit. IP Telephony: Packet-Based Multimedia Communications Systems. Addison-Wesley, Harlow/England, London, 2000. [66] Samir Kallel. Analysis of a type-II hybrid ARQ scheme with code combining. IEEE Transactions on Communications, 38:1133–1137, 1990. [67] Paul Lettieri, Curt Schurgers, and Mani B. Srivastava. Adaptive link layer strategies for energyefficient wireless networking. Wireless Networks, 5:339–355, 1999. [68] Paul Lettieri and Mani Srivastava. Adaptive frame length control for improving wireless link throughput, range and energy efficiency. In Proc. INFOCOM 1998, pp. 564–571, San Francisco, 1998. IEEE. [69] Shu Lin and Daniel J. Costello. Error Control Coding: Fundamentals and Applications. Prentice Hall, Englewood Cliffs, NJ, 1983. [70] Hang Liu, Hairuo Ma, Magda El Zarki, and Sanjay Gupta. Error control schemes for networks: an overview. MONET: Mobile Networks and Applications, 2:167–182, 1997. [71] Arnold M. Michelson and Allen H. Levesque. Error-Control Techniques for Digital Communication. John Wiley & Sons, New York, 1985. [72] Arogyaswami Paulraj. Diversity techniques. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 213–223. CRC Press/IEEE Press, Boca Raton, FL, 1996. [73] Martin L. Shooman. Reliability of Computer Systems and Networks. John Wiley & Sons, New York, 2002. [74] Bernard Sklar. Digital Communications: Fundamentals and Applications. Prentice Hall, Englewood Cliffs, NJ, 1988. [75] Bernard Sklar. A primer on turbo code concepts. IEEE Communications Magazine, 35, 94–102, 1997. [76] Jonathan Stone, Michael Greenwald, Craig Partridge, and James Hughes. Performance of checksums and CRC’s over real data. IEEE/ACM Transactions on Networking, 6:529–543, 1998. [77] Elisabeth Uhlemann, Per-Arne Wiberg, Tor M. Aulin, and Lars K. Rasmussen. Deadline-dependent coding: a framework for wireless real-time communication. In Proc. International Conference on RealTime Computing Systems and Applications, pp. 135–142, Cheju Island, South Korea, December 2000. [78] Xin Wang and Michael T. Orchard. On reducing the rate of retransmission in time-varying channels. IEEE Transactions on Communications, 51:900–910, 2003. [79] Andreas Willig, Martin Kubisch, Christian Hoene, and Adam Wolisz. Measurements of a wireless link in an industrial environment using an IEEE 802.11-compliant physical layer. IEEE Transactions on Industrial Electronics, 49:1265–1282, 2002.
Flow Control [80] Rene L. Cruz. Routing and flow control. In Jerry D. Gibson, Editor, The Communications Handbook, pp. 650–660. CRC Press/IEEE Press, Boca Raton, FL, 1996. [81] Roger L. Freeman. Reference Manual for Telecommunications Engineering, 3rd edition, Volume 2. John Wiley & Sons, New York, 2002.
Link Layer Protocols [82] D.E. Carlson. Bit-oriented data link control procedures. IEEE Transactions on Communications, 28:455–467, 1980.
[83] LAN/MAN Standards Committee of the IEEE Computer Society. International Standard ISO/IEC 8802-2, Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Area Networks: Specific Requirements: Part 2: Logical Link Control, 1998.
Packet Scheduling [84] Jon C.R. Bennet and Hui Zhang. Hierarchical packet fair queueing algorithms. In Proc. ACM SIGCOMM, 1996. Association of Computing Machinery. [85] Pravin Bhagwat, Partha Bhattacharya, Arvind Krishna, and Satish K. Tripathi. Using channel state dependent packet scheduling to improve TCP throughput over wireless LANs. Wireless Networks, 3:91–102, 1997. [86] E.G. Coffman, Jr. Computer and Job-Shop Scheduling Theory. John Wiley & Sons, New York, 1982. [87] Srinivasan Keshav. An Engineering Approach to Computer Networking: ATM Networks, the Internet and the Telephone Network. Addison-Wesley, Reading, MA, 1997. [88] C.L. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20:46–61, 1973. [89] Songwu Lu, Vaduvar Bharghavan, and Rayadurgan Srikant. Fair queueing in wireless packet networks. In Proc. of ACM SIGCOMM ’97 Conference, pp. 63–74, Cannes, France, September 1997. [90] A.K. Parekh and R.G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: the single node case. In Proc. IEEE INFOCOM, Volume 2, pp. 915–924, 1992. IEEE. [91] A.K. Parekh and R.G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: the multiple node case. In Proc. IEEE INFOCOM, Volume 2, pp. 521–530, 1993. IEEE. [92] Abhay Kumar J. Parekh. A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks. Ph.D. dissertation, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, February 1992. [93] Anujan Varma and Dimitrios Stiliadis. Hardware implementation of fair queueing algorithms for ATM networks. IEEE Communications Magazine, 35:54–68, 1997.
2 IP Internetworking

Helmut Hlavacs, University of Vienna
Christian Kurz, University of Vienna

2.1 ISO/OSI Reference Model ..................................................2-1
    The Physical Layer • The Data Link Layer • The Network Layer • The Transport Layer • The Session Layer • The Presentation Layer • The Application Layer
2.2 The TCP/IP Reference Model.............................................2-4
    The Host-to-Network Layer • The Internet Layer • The Transport Layer • The Application Layer
2.3 Reference Model Comparison............................................2-6
2.4 Data Link Layer Protocols and Services ............................2-8
    Frame Creation • Error Detection and Correction • Media Access Control
2.5 Network Layer Protocols and Services ............................2-10
    IPv4 • IPv4 Multicasting • IPv6 • Address Resolution Protocol • Internet Control Message Protocol • Internet Group Management Protocol
2.6 Transport Layer Protocols and Services ..........................2-18
    Transmission Control Protocol • User Datagram Protocol • Resource Reservation Protocol
2.7 Presentation Layer Protocols and Services ......................2-21
2.8 Application Layer Protocols and Services .......................2-22
    TELNET • File Transfer Protocol • Hypertext Transfer Protocol • Simple Mail Transfer Protocol • Resource Location Protocol • Real-Time Protocol
2.9 Summary............................................................................2-26
References .....................................................................................2-26
2.1 ISO/OSI Reference Model The ISO/OSI reference model [ISO7498] was developed by ISO (International Organization for Standardization) and finished in 1982. The OSI (Open Systems Interconnection) reference model allows the connection of open systems. This objective is reached by applying a layered approach. The communication system is divided into seven layers (see Figure 2.1) [PET2000]. The lowest three layers are network dependent. They support the linking of two systems and the data communication between them. The upper three layers are application oriented. They allow the end-user application processes to interact with each other. The intermediate layer (transport layer) isolates the application-oriented layers from the communication details at the lower layers [HAL1996]. Each layer performs a well-defined function, which reduces the level of complexity at each layer; the behavior of each layer is defined by a protocol. The information flow between the layers is directed through interfaces and should be minimized [TAN1996]. Each layer exchanges messages using services of the layer below. It communicates with the related peer at the same level in a remote system and provides services to the layer above [COL2001]. At each layer the source host adds a header to the packet, which
[Figure: hosts A and B, each with the seven layers Application, Presentation, and Session (application-oriented layers), Transport (intermediate layer), and Network, Data Link, and Physical (network-dependent layers), communicating peer to peer through layer interfaces and a physical link (e.g., cable).]
FIGURE 2.1 ISO/OSI reference model. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
is read and removed again by the receiver. It is important to note that the implementation of one layer is therefore independent from the implementation of the other layers. In the next section, each of the layers is discussed separately, starting at the lowest one.
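Before doing so, the header handling just described, where each layer wraps the data of the layer above on the way down and its peer unwraps it on the way up, can be illustrated with a toy sketch (Python). The header strings are invented placeholders, not actual protocol formats.

```python
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]

def encapsulate(payload: str) -> str:
    """Source host: each layer prepends its own header on the way down,
    so the physical-layer header ends up outermost."""
    for layer in LAYERS:
        payload = f"[{layer} hdr]" + payload
    return payload

def decapsulate(frame: str) -> str:
    """Destination host: each layer reads and removes its peer's header
    on the way up, starting with the outermost one."""
    for layer in reversed(LAYERS):
        header = f"[{layer} hdr]"
        assert frame.startswith(header), f"malformed {layer} header"
        frame = frame[len(header):]
    return frame

wire = encapsulate("hello")
print(wire)                 # headers of all seven layers around the user data
print(decapsulate(wire))    # -> "hello"
```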
2.1.1 The Physical Layer The lowest layer is concerned with the transmission of raw bits from the electrical interface of the user equipment to the communication channel. This can be either an electrical, optical, or wireless medium, and it transfers a serial stream of data. It has to be ensured that a sent 1 bit is seen by the receiver as a 1 bit, not as a 0 bit. Design issues at this layer are, for example, how long one bit lasts and by which wavelength of light or by which voltage level a 1 and a 0 bit is represented. Additionally, the handling of the initial connection and the closure of the connection are carried out at the physical layer. Also, mechanical properties of the network equipment, such as size and shape of connectors and cables, have to be specified. Furthermore, electrical (or optical) parameters must be determined. These are the voltage levels, electrical resistance of the cable, duration of signaling elements and voltage changes, and coding method. The next issue handled by the physical layer is the functional specification. It concerns the meaning of switched connections, distinguishing between data and control wires and specifying clock rate and ground.
2.1.2 The Data Link Layer As the physical layer is only concerned with the transmission of raw data, the main function of the data link layer is to recognize and correct transmission errors. For this reason, the sender divides the data stream into frames that are transmitted sequentially. When a frame is received, an acknowledgment may be sent back to the sender. If a frame is destroyed by a noise burst on the line and therefore is not acknowledged, it is retransmitted by the sender. As the acknowledgment frame could also be lost, care
has to be taken that no duplicate frames are inserted into the data stream. The data link layer therefore solves problems arising from lost, damaged, or duplicate frames. This layer may also offer different service classes, e.g., for protected or unprotected services. If the receiver is slower than the sender, frames can be lost because of different processing speeds. To prevent this scenario, a mechanism is implemented to regulate network traffic. Therefore, the sender should know how much buffer space is left at the receiver. Another task of the data link layer is the media access control within broadcasting networks. In these networks, all connected computers see all transferred data; they share a common link. Therefore, it has to be made sure that there is only one sender at a time to avoid data collisions. If a collision occurs, it has to be detected and retransmission of all affected data has to be initiated. When data can be transmitted in both directions simultaneously, the acknowledgment frame for sender A sending to receiver B competes with the data frames that B is sending to A. A solution for this problem is piggybacking, where the acknowledgment information is added to the data frames sent in the reverse direction, instead of sending additional frames.
2.1.3 The Network Layer The network layer is responsible for the setup, handling, and termination of network-wide connections; it controls the operation of the subnet. There are two possible types of network connections: virtual connections and datagram connections. Virtual connections are set up at the start of a transmission to fix the route for the following data packets; packets are always sent using the same route. Using a datagram connection, the route is chosen separately for each packet. Sometimes it has to be ensured that packets arrive in the same order as they were sent. In datagram communication, a packet B sent after a packet A may arrive before A if a different route is used. Additionally, the network layer is concerned with packet routing from source to destination. Routing information can either be stored in a static table or be determined dynamically at the start of each transmission. The chosen route can also depend on the current network load. If too many packets are sent in one subnet, a capacity bottleneck forms. To avoid this situation, the network layer may implement congestion control. To be able to analyze network traffic, an accounting mechanism is incorporated at this layer. This mechanism counts how many packets are sent, also storing information about packet source and destination. The gathered information can be used to produce billing information. Also, there may be problems when a packet is traveling through heterogeneous networks. The network layer handles the issues of different packet sizes, varying addressing schemes, or different protocols.
2.1.4 The Transport Layer This layer is the interface between the higher application-oriented layers and the underlying network-dependent layers. Thus, the session layer can transfer messages independently of the network structure. Seen from the layers above, messages can be transferred transparently without any knowledge of the underlying network structure. The transport layer basically cuts messages into smaller packets if needed and passes them to the network layer. At the receiver, the messages are reassembled and passed to the session layer. An important task of the transport layer is the handling of transport connections. Normally, one network connection is created for each transport connection required by the session layer. If the session layer requires higher throughput than one connection can handle, the transport layer might create additional network connections. On the other hand, if one wants to save costs, a number of transport connections can be multiplexed onto one network connection. As there is the possibility to set up multiple connections, a transport header is added to distinguish between them. The transport layer provides different classes of quality of service (QoS). The lowest service class provides only basic functionality for connection establishment; the highest class allows full error control and flow control. To avoid the situation of a fast sender overrunning a slower receiver with messages, an
algorithm for flow control is provided. The most popular type of connection is an error-free point-to-point connection where messages are delivered in the same order they were sent. Additionally, messages with no guaranteed order can be sent. It is also possible to send messages not only to one, but to multiple destinations, or to send broadcast messages. The transport layer establishes and terminates connections across the network. Therefore, the need for a naming mechanism arises, allowing processes to choose with whom they converse.
2.1.5 The Session Layer Layer 5 organizes and synchronizes the data exchange for two application layer processes. It sets up and clears the communication channel for the whole duration of the network transaction between them, and therefore sets up sessions between users on different machines. A session might be used to log into another machine in a remote time-sharing environment or to transfer a file. The session layer provides interaction management (also called dialogue control). Data can be exchanged using duplex or half-duplex connections. A duplex connection transfers data both ways simultaneously. A half-duplex connection can transfer either one way or the other, where the session layer decides which party is allowed to use the link. Another task of the session layer is token management. It is useful when both sides are not allowed to perform the same operation at the same time. To schedule these operations, a token is issued to only one process at each given time, allowing only the process that holds the token to perform the critical task. For large data transfers, synchronization points can be set periodically. If the network connection fails, the transmission is restarted at the last synchronization point set. Thus, retransmission of all the data can be avoided. Nonrecoverable exceptions during transmissions are reported to the application layer.
2.1.6 The Presentation Layer The main task of the presentation layer is the representation of data, e.g., integers, floating point numbers, or character strings. Therefore, the syntax for these data containers is defined. As different computers may use varying internal data representations (for example, for characters or numbers), a conversion has to be done: the data sent are converted to an appropriate transfer syntax and are transformed back into the receiver's internal data format upon receipt. These syntax converters do not necessarily have to understand the semantics of the data. This layer may also provide services for data encryption and data compression.
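A very small example of such a conversion is the translation of a machine-internal integer into a common transfer representation, here network byte order using Python's struct module as an illustration; the receiving peer converts the value back to its own internal format.

```python
import struct

value = 1_000_000
wire = struct.pack("!I", value)        # "!" selects network (big-endian) byte order
print(wire.hex())                      # 000f4240, independent of the sender's CPU

received = struct.unpack("!I", wire)[0]
print(received == value)               # True: the receiver regains its own format
```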
2.1.7 The Application Layer This layer provides services not to other layers, but directly to application programs. Thus, there is no specific service in this layer, but there is a distinct combination of services offered to each application process. As the connected hosts may use different file systems, the application layer handles the differences and avoids incompatibilities. Therefore, this layer provides the means for network-wide distributed information services. This allows the application processes to transfer files, send e-mails, or perform directory lookups. Furthermore, the application layer provides services for the identification of intended communication partners, to check the availability of an intended communication partner, to verify communication authority, to provide privacy services, to authenticate communication partners, to select the dialogue discipline, to reach an agreement on the responsibility for error recovery, and to identify the constraints on data syntax [HAL1996].
2.2 The TCP/IP Reference Model TCP/IP (Transmission Control Protocol/Internet Protocol) was first used when the ARPANET was emerging. This network was developed for use by the U.S. Armed Forces. Therefore, it was required that even when some parts of the network were destroyed during battle, it should still provide communication
[Figure: hosts A and B, each with the four layers Application, Transport, Internet, and Host-to-Network, connected through layer interfaces and a physical link (e.g., cable).]
FIGURE 2.2 TCP/IP reference model. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
services. As long as two hosts were still functioning and there was any path available between them, communication had to remain possible. Another important issue was the ability to connect multiple different networks, regardless of the underlying protocols, physical transport media, or provided bandwidth. The TCP/IP reference model is structured similarly to the ISO/OSI model introduced in Section 2.1, but it consists of only four layers. A comparison of both models is given later in this chapter. The lowest layer is the host-to-network layer, above which the Internet layer and then the transport layer are located. Above the transport layer lies the highest layer, the application layer (Figure 2.2) [SCO1991]. The next sections give an overview of the services provided by the TCP/IP model layers, starting with the lowest one.
2.2.1 The Host-to-Network Layer TCP/IP does not specify services or operations at the host-to-network layer. It is only required that the host can somehow connect to the network to enable the Internet layer to send packets. As this layer is not defined, the implementation can vary on each system. The network service may be provided by Ethernet, Token Ring, asynchronous transfer mode (ATM), wide area network (WAN) technologies, wireless technologies, or any other means of transferring network packets.
2.2.2 The Internet Layer The main features of the Internet layer are addressing, packet routing, and error reporting. Additionally, services for fragmentation and reassembly of packets are provided [HAL1996]. The core protocols at the Internet layer are the Internet Protocol (IP) [RFC791], the Address Resolution Protocol (ARP) [RFC826], the Internet Control Message Protocol (ICMP) [RFC792], and the Internet Group Management Protocol (IGMP) [RFC3376]. The Internet Protocol is concerned with packet routing, IP addressing, and the fragmentation and reassembly of packets. It is a packet-switching protocol based on a best-effort connectionless architecture. Packets travel independently from each other from source to destination host. Each packet may be routed differently through the network; thus packets may be delivered in a different order than they were sent. Packets may also be lost, because delivery is not guaranteed. To be able to route packets across the network, each host has to know the location of a gateway or a router. The gateway decides which path a packet has to travel. For this reason, a routing table is maintained at the Internet layer. To send packets across networks that only support small packet sizes, the packets are broken down in size at the source host and are assembled again at the destination host.
ARP [RFC826], the Address Resolution Protocol, translates network layer addresses to link layer (hardware) addresses. Thus, an IP address is translated to, for example, an Ethernet address. The Internet Control Message Protocol (ICMP) [RFC792] is concerned with datagram error reporting and is able to provide certain information about the Internet layer. The Internet Group Management Protocol (IGMP) [RFC3376] is used to manage IP multicast groups [RFC1122].
2.2.3 The Transport Layer The transport layer provides stream and datagram communication services. Protocols specified at this layer are the Transmission Control Protocol (TCP) [RFC793] and the User Datagram Protocol (UDP) [RFC768]. Both protocols deliver end-to-end communication services (i.e., message transfer). The Transmission Control Protocol is a connection-oriented and reliable point-to-point communication service [RFC793]. A data stream can be sent to any other host on the Internet without errors. This data stream is broken down into messages and handed down to the Internet layer. TCP sets up and terminates the connection, and it sequences and acknowledges the packets it sends. It is also responsible for retransmitting packets lost during transmission. Also, a service for flow control is implemented, thus avoiding a receiver being flooded by a faster sender. The User Datagram Protocol is a connectionless, unreliable communication protocol [RFC768]. Thus sequencing or flow control is not provided. It is used when prompt delivery of packets is more important than error-free transmission, as is the case for the transmission of video or audio content. Compared to TCP, there is no connection establishment, no connection state, a smaller packet overhead, and an unregulated send rate [KUR2001].
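The difference between the two services is visible directly in the socket interface. The hedged sketch below (Python, with an arbitrarily chosen loopback address and port) exchanges a single UDP datagram without any connection setup; the closing comment indicates what TCP would additionally require.

```python
import socket

ADDR = ("127.0.0.1", 50007)   # arbitrary loopback address and port for this sketch

# UDP: connectionless datagram service, no handshake and no delivery guarantee.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(ADDR)
receiver.settimeout(1.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"sensor reading 42", ADDR)       # a single datagram, fire and forget

data, peer = receiver.recvfrom(1024)
print(data, peer)

# TCP (socket.SOCK_STREAM) would instead require listen()/connect()/accept()
# before any data flows, and would then provide an ordered, reliable byte stream.
sender.close()
receiver.close()
```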
2.2.4 The Application Layer This layer provides services to application processes. It accesses services of the transport layer and allows processes at different hosts to communicate with each other using a variety of protocols. These include the Hypertext Transfer Protocol (HTTP) [RFC2616] to send and receive files that make up Web pages. Also, protocols for sending electronic mail, the Simple Mail Transfer Protocol (SMTP) [RFC821], and interactive file transfer, the File Transfer Protocol (FTP) [RFC959], are implemented at this layer. Another frequently used service is Telnet, which is a terminal emulation protocol [RFC854]. It enables the user to log on to remote hosts. To access news articles on virtual bulletin boards, the Network News Transfer Protocol (NNTP) [RFC977] is provided. Additionally, protocols for the management of TCP/IP networks are available at this layer. The Domain Name Service (DNS) [RFC1034, RFC1035] resolves a host name to an IP address. Network management, including the collection and exchange of management information, is facilitated by the Simple Network Management Protocol (SNMP) [RFC1157]. Besides these basic protocols, a wide variety of other protocols are implemented for use at the TCP/IP application layer. An overview of the assignment of the protocols mentioned in this section to the respective layer is given in Figure 2.3.
2.3 Reference Model Comparison Both reference models described above are based on a layered approach. Also, the layers provide quite similar services. The application layer of the TCP/IP model corresponds to the application layer of the ISO/OSI reference model. Presentation and session layers are not present in the TCP/IP reference model. Thus, in the TCP/IP model, services provided by these two layers have to be performed by the application process itself. The two transport layers perform similar services. The next layer in the TCP/IP model, the Internet layer, is equivalent to the network layer in ISO/OSI. The data link layer and the physical layer of the OSI reference model are represented by the host-to-network layer in the TCP/IP model (Figure 2.4). In both models, layers above the transport layer are application dependent [TAN1996].
[Figure 2.3 arranges the protocols of this section in the TCP/IP architecture: HTTP, SMTP, FTP, NNTP, DNS, SNMP, and RTP at the application layer; TCP and UDP at the transport layer; IP, ICMP, IGMP, and ARP at the Internet layer; Ethernet, Token Ring, WLAN, and ATM at the host-to-network layer.]
FIGURE 2.3 TCP/IP architecture. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
[Figure 2.4 maps the seven ISO/OSI layers onto the four TCP/IP layers: the OSI application, presentation, and session layers (7-5) correspond to the TCP/IP application layer; the transport layers (4) correspond to each other; the OSI network layer (3) corresponds to the Internet layer; and the OSI data link and physical layers (2-1) correspond to the host-to-network layer.]
FIGURE 2.4 ISO/OSI vs. TCP/IP. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
The ISO/OSI model is mainly a conceptual model; it is an example of a universally applicable structured network model. It introduced three main concepts, which were also followed when the TCP/IP reference model was developed. Each layer provides exactly defined services to the layer above and uses services of the layer below. These services are accessed using interfaces specifying which parameters are expected and the results returned. Protocols defined in each layer communicate with their peer at the remote host, independently of the underlying network structure. These ideas, present in both models, resemble the principles of object-oriented software development [TAN1996]. In the beginning, the TCP/IP model did not strictly distinguish between services, interfaces, and protocols; these concepts were introduced later. Thus, protocols in the ISO/OSI model are better encapsulated than those in the TCP/IP model, and it is easier to alter services in the ISO/OSI reference model [TAN1996]. As the ISO/OSI model was developed before the respective protocols and their implementation, it was possible to easily distinguish between services, interfaces, and protocols for each layer. The developers were able to choose the appropriate number of layers such that each one could perform only a distinct set of matching services. For TCP/IP, the protocols were developed first, and afterwards, the
abstract model was created. The problem with this approach was that the model did not fit any other existing protocol stack [TAN1996]. Finally, there are differences in the area of connectionless vs. connection-oriented communication. The ISO/OSI reference model provides services for both kinds of communication at the network layer, but only connection-oriented services at the transport layer. The TCP/IP reference model supports both connectionless and connection-oriented communication at the transport layer, but only connectionless services at the network layer [TAN1996]. In the following, the most important TCP/IP protocols and services, their functionality, and their position in the OSI stack will be described.
2.4 Data Link Layer Protocols and Services In the OSI model, the data link layer is situated at layer 2, and in the TCP/IP reference model, at the host-to-network layer. Its purpose is to offer services to OSI layer 3 so that protocols at layer 3 can send data to neighboring computers (i.e., computers directly connected via a network link or via layer 1 or 2 repeaters, bridges, hubs, or switches) reliably. The data link layer may offer one of the following services to layer 3:
• Using an unacknowledged connectionless service, neither the sender nor the receiver takes measures to detect lost packets.
• In acknowledged connectionless services, the receiver must acknowledge the data it received by sending back an acknowledgment to the sender. If the sender does not receive the acknowledgment after a certain amount of time, it assumes that the data were lost and retransmits them.
• In connection-oriented services, the data link layer must first create a (possibly virtual) path between sender and receiver before data can be sent. Furthermore, the data link layer adds sequence numbers to the sent data units in order to detect lost or erroneous data units.
The bit-error rate of modern wire-line (electrical or optical) local area network (LAN) interconnections is too low to justify the additional effort for virtual path creation at this level. In LANs, therefore, usually acknowledged (Token Ring) or unacknowledged (Ethernet) connectionless services are used at layer 2. Lost packets or packets delivered out of order are then typically detected at layer 4 or higher. Wireless networks, however, may severely suffer from lost packets or high bit-error rates. Under these conditions, sophisticated data link layer protocols like IBM’s Synchronous Data Link Control (SDLC), or the closely related ISO norm High-Level Data Link Control (HDLC) and the CCITT recommendation Link Access Procedure (LAP), or the IEEE 802.2 norm Logical Link Control (LLC), are often used.
2.4.1 Frame Creation One major task of layer 2 is to pack the data it receives from a higher layer for transfer into so-called frames, i.e., data packets, which are then modulated onto the physical network medium. This is done in a way that the desired receivers are able to (1) detect that a frame has been sent, (2) decode the frame reliably and retrieve the sender and receiver addresses, and (3) identify those frames that are meant for them. Frames are nothing more than a sequence of bits modulated onto a carrier. In order to be able to decode information stored in a frame, a receiver first has to be able to identify the first bit of a frame. This start bit is then usually followed by a specific sequence of bits containing frame information like the frame length, the content type, checksums, etc. A simple method for finding the frame start is given by bit stuffing. In protocols like X.25, the start of a frame is signaled by a flag pattern containing six consecutive 1s (01111110). If the data transported in the frame also contain six 1s in a row, then after the fifth 1, a 0 has to be inserted by the data link layer. The receiving data link layer then knows that if it receives five consecutive 1s followed by a 0, the sender must have inserted the 0, and therefore removes it. Another method for synchronizing senders and receivers at the bit level is given by sending sync bytes, as, for instance, is done in Digital Video Broadcast (DVB) [REI2001]. Here, data frames are 204 bytes long
and contain a certain value (0x47) always at the same position (sync byte). The task of a sync-byte detector is to detect the regular occurrence of this value every 204 bytes. If this value is detected five times, then sender and receiver are synchronized and the receiver may easily compute the frame start from it. Other methods include octet counting or octet stuffing and will be described in the context of application protocols. Once the start bit of a frame is identified, a network card may determine whether it is the receiver of the sensed frame. In the IEEE 802 standard, each network interface card is assigned a unique 6-byte-long media access control (MAC) address, and each sent frame starts with the MAC address of the destination network card. Thus, each network card receiving a frame just compares the first 6 bytes of the frame with its own MAC address, and if they are equal, it passes the frame on to the next higher layer for further processing.
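The bit-stuffing rule described above can be illustrated with a small Python sketch operating on bit strings; the function names are chosen for illustration only, and a real data link layer would of course work on the bit stream in hardware.

    def stuff(bits):
        """Insert a 0 after every run of five consecutive 1s in the payload."""
        out, run = [], 0
        for b in bits:
            out.append(b)
            run = run + 1 if b == "1" else 0
            if run == 5:
                out.append("0")      # stuffed bit
                run = 0
        return "".join(out)

    def unstuff(bits):
        """Remove the 0 that the sender inserted after five consecutive 1s."""
        out, run, skip = [], 0, False
        for b in bits:
            if skip:                 # this is the stuffed 0; drop it
                skip = False
                run = 0
                continue
            out.append(b)
            run = run + 1 if b == "1" else 0
            if run == 5:
                skip = True
                run = 0
        return "".join(out)

    payload = "0111111011111100"
    assert unstuff(stuff(payload)) == payload

Because the sender always inserts a 0 after five data 1s, a receiver seeing six consecutive 1s knows it has found a frame flag rather than user data.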
2.4.2 Error Detection and Correction Sending data over certain media types is often unreliable and may be severely disturbed by external disruptions, causing data to be lost or wrongly received. Thus, another important task of the data link layer is either to correct corrupted frames or at least to detect the occurrence of bit errors. In order to detect or correct bit errors, the sender must add checksum information, which is additional to the transported headers and user data. The more information is added, the more wrong bits may be detected or even corrected. A popular method for error detection and correction is given by Hamming codes. Here, certain code words are sent, which differ in a specific number of bits, the Hamming distance. For instance, a code containing the words 000111, 111000, 000000, and 111111 has the Hamming distance 3; i.e., code words differ from each other in at least 3 bits. In order to detect that d bits have been changed during transmission, a Hamming distance of d + 1 is required. If the receiver has to be able to correct d bits, then a Hamming distance of 2d + 1 must be kept. The code above thus is able to detect two wrong bits and correct one wrong bit. A simpler way for detecting wrong bits is given by the parity bit. Here, only one bit is added to each code word, counting the number of 1s in the word. If this number is even, the parity bit is set to 0; otherwise, it is set to 1 (or vice versa). Codes with parity bits may detect one wrong bit per code word only. A more sophisticated error detection code is given by the cyclic redundancy check (CRC) code. Here, each sequence of bits is treated as a polynomial over the field of binary numbers (modulo 2). The number 101, for instance, is treated as the polynomial x² + 1. Modulo 2 means that each addition of single bits is treated as an exclusive-or (XOR) operation, i.e., 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, and 1 + 1 = 0. For CRC codes, a fixed polynomial is chosen, called generator polynomial G(x). If a code word W(x) is to be sent, it is replaced by another polynomial R(x), which can be divided by G(x) with remainder 0, and from which the original code word W(x) can be reconstructed. The polynomial R(x) is then transmitted and received. If the received R(x) can be divided by G(x) without remainder, then the transmission has been error-free with high probability. Otherwise, bit errors are detected and the transmitted code word is dropped. Other error correction techniques include Reed–Solomon codes and convolutional codes, but these will not be treated here.
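To make the modulo-2 division concrete, the following Python sketch generates and checks a CRC over bit strings. The data word and the generator x⁴ + x + 1 are textbook-style examples, not values prescribed by any particular standard.

    def mod2_div(bits, generator):
        """Modulo-2 (XOR) long division; returns the remainder as a bit string."""
        work = list(bits)
        glen = len(generator)
        for i in range(len(bits) - glen + 1):
            if work[i] == "1":                       # divide only where the leading bit is 1
                for j in range(glen):
                    work[i + j] = str(int(work[i + j]) ^ int(generator[j]))
        return "".join(work[-(glen - 1):])

    def crc_encode(data, generator):
        """Append a checksum so that the resulting code word R(x) is divisible by G(x)."""
        padded = data + "0" * (len(generator) - 1)   # append one zero per checksum bit
        return data + mod2_div(padded, generator)

    data = "1101011011"
    gen = "10011"                                    # generator polynomial x^4 + x + 1
    codeword = crc_encode(data, gen)                 # "11010110111110"
    assert mod2_div(codeword, gen) == "0000"         # receiver check: remainder 0 -> no error detected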
2.4.3 Media Access Control An important part of the data link layer is the media access control (MAC) sublayer. This sublayer controls access to the physical medium, which may be shared by several senders concurrently. Depending on the media type, one or several senders may transmit data at the same time. In case of conflicts, several techniques exist for granting the right to use the medium. 2.4.3.1 ALOHA The ALOHA technique, developed at the University of Hawaii, allows all senders to send their data to a commonly shared broadcast medium whenever they wish. In a broadcast medium, the data sent by one
host are received by all others listening to the same medium. In case of collisions due to the concurrent sending of two or more senders, the colliding frames are discarded and must be sent anew. 2.4.3.2 CSMA/CD For carrier-sense multiple access/collision detection (CSMA/CD), as, for example, implemented in Ethernet, several network cards share the same broadcast medium (e.g., an electrical wire). Each network card listens to the medium (carrier sense), and if no signal is detected, then a new sender may use the medium immediately. Due to the limited speed of signals, two or more senders may send simultaneously without noticing each other in time, resulting in collisions. In such a case, all colliding frames are discarded and each sender waits for a random amount of time until it tries to send again. 2.4.3.3 TDMA In time-division multiple access (TDMA), time is divided into time slices, and each sender is granted one slice where it may send its data into the medium. Here, bandwidth may be wasted as senders own their time slice, whether they have something to send or not. 2.4.3.4 FDMA In frequency-division multiple access (FDMA) several sending frequencies exist, and for each frequency, one sender may transmit without fearing interference from other frequencies. For example, GSM (global system for mobile communication) uses a mixture of TDMA and FDMA for its calls. Additionally, GSM terminals change their frequency according to a fixed scheme (frequency hopping). 2.4.3.5 CDMA The concept of code-division multiple access (CDMA) is fundamentally different from the previous concepts. Here, each sender is assigned a unique bit sequence of length N, called its chip (or chip sequence). Each sent bit is then added (modulo 2) to all chip bits, yielding the chip if a 0 is to be sent, or the inverse chip if a 1 is to be sent. If a terminal wants to transmit R bits per second (bps), then R chips have to be transferred per second, making necessary a much higher bandwidth of R × N bps in total. Thus, the necessary frequency band is broadened significantly. In essence, the signal is spread over a broad spectrum and the chip is thus often called the spreading sequence. In CDMA, senders with different chips can send concurrently and do not disturb the reception of other signals. This works because different chips are mathematically orthogonal to each other with respect to the inner products of chips (which can also be interpreted as bit vectors) and their inverse. Also, due to the use of a broader spectrum, the reconstruction of the signal is more robust with respect to other noise sources.
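The following Python sketch illustrates the CDMA idea using the common ±1 representation of chips (rather than the 0/1, modulo-2 view used above): because the chips are orthogonal, a receiver can recover each sender's bit by a simple correlation. The chip assignments and station names are arbitrary examples.

    # Orthogonal Walsh chips of length 4, written as +1/-1 vectors
    CHIPS = {
        "A": ( 1,  1,  1,  1),
        "B": ( 1, -1,  1, -1),
        "C": ( 1,  1, -1, -1),
    }

    def encode(station, bit):
        """A station transmits its chip for a 1 bit and the inverted chip for a 0 bit."""
        sign = 1 if bit == 1 else -1
        return tuple(sign * c for c in CHIPS[station])

    def channel(*signals):
        """On the shared medium the concurrently sent chips simply add up."""
        return tuple(sum(vals) for vals in zip(*signals))

    def decode(received, station):
        """Correlate with the station's chip: +1 -> bit 1, -1 -> bit 0, 0 -> station was silent."""
        n = len(CHIPS[station])
        corr = sum(r * c for r, c in zip(received, CHIPS[station])) / n
        return {1.0: 1, -1.0: 0}.get(corr)

    # A sends 1 and B sends 0 at the same time; C does not send.
    rx = channel(encode("A", 1), encode("B", 0))
    print(decode(rx, "A"), decode(rx, "B"), decode(rx, "C"))   # 1 0 None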
2.5 Network Layer Protocols and Services 2.5.1 IPv4 The term Internet Protocol (IP) usually denotes IP version 4 (IPv4), which has been specified in [RFC791] and [RFC1122] and is the established standard protocol for the Internet at layer 3 of the ISO/OSI reference model and at the Internet layer of the TCP/IP reference model. The task of IP is to transport a packet from one source computer to a destination computer, where both computers are interconnected by an internet. Here, internet denotes any (possibly privately managed) heterogeneous network that is interconnected using IP and IP-based routers. In contrast, the Internet denotes the well-known worldwide IP-based network interconnecting millions of computers and being managed by network information centers (NICs) and Internet service providers (ISPs). When traveling through an internet, a packet may pass through several intermediate networks with different network technologies, for instance, Ethernet, Token Ring, ATM, etc., used at layers 1 and 2. At the border between two different networks, the packet’s destination network address is examined by a router, i.e., a computer that is connected to both networks and that is able to select other routers in the path between sender and receiver, or to find the receiver in its own network. Routing decisions are usually made using
TABLE 2.1 IP Network Classes

Class   Most Significant Bits   Network Address   Host Address
A       0                       7 bits            24 bits
B       10                      14 bits           16 bits
C       110                     21 bits           8 bits
D       1110                    28 bits           0 bits
E       11110                   Reserved          Reserved
Source: From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.
predefined and regularly updated routing tables. However, the next chosen router is by no means fixed and may depend on runtime situations like congestion or link failures, or it may simply be chosen at random. As a consequence, packets may travel through different paths from sender to receiver, and neither the delivery itself nor the original order can be guaranteed. IP packets are called datagrams, which may have a total length of up to 65,535 bytes. Datagrams may be cut into a sequence of smaller datagrams if the datagram size is larger than the network’s maximum transfer unit (MTU), i.e., the largest OSI layer 2 frame that may be transmitted by the network. For Ethernet, for instance, the MTU is 1500 bytes. This process is called fragmentation, and the IP header includes several fields for reassembling such fragments into the original datagram again. Each datagram or fragment begins with a header of at least 20 bytes containing the following information:
• The version number of the IP (4).
• The IP header length (IHL), which may be larger than 20.
• The total length of the datagram, including header.
• An identification number for reassembling fragmented datagrams. All fragments with the same ID belong to the same datagram.
• Flags, including the don’t fragment (DF) flag, flagging that the datagram should not be fragmented, and the more fragments (MF) flag, signaling that more fragments are still to come.
• A fragment offset identifying the offset of the received fragment in the whole datagram.
• The time-to-live (TTL) counter, which is decreased by one by each router. A datagram with TTL equal to zero is discarded. This prevents faulty datagrams from circling through the Internet forever.
• A number identifying the used transport protocol (6 for TCP, 17 for UDP, …).
• A header checksum.
• The IP source and destination addresses.
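To make this layout concrete, the following Python sketch packs a minimal 20-byte IPv4 header (without options) with the struct module. The function name, the example addresses, and the choice to leave the header checksum at zero (many operating systems fill it in for raw sockets) are illustrative assumptions rather than part of the standard text above.

    import socket
    import struct

    def build_ipv4_header(src, dst, payload_len, proto=6, ttl=64, ident=0):
        """Minimal IPv4 header: version/IHL, ToS, total length, ID, flags/offset,
        TTL, protocol, checksum (left 0 here), source and destination addresses."""
        version_ihl = (4 << 4) | 5            # version 4, IHL = 5 words = 20 bytes
        total_length = 20 + payload_len
        flags_fragment = 0                    # DF/MF flags and fragment offset
        return struct.pack("!BBHHHBBH4s4s",
                           version_ihl, 0, total_length,
                           ident, flags_fragment,
                           ttl, proto, 0,     # TTL, protocol (6 = TCP), checksum
                           socket.inet_aton(src), socket.inet_aton(dst))

    hdr = build_ipv4_header("192.168.0.1", "192.168.0.2", payload_len=100)
    print(len(hdr))   # 20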
An important aspect of IPv4 is its 32-bit-long IP addresses. The written form follows the dotted decimal notation scheme X1.X2.X3.X4, where the Xi are decimals between 0 and 255. Each address starts with an address class identifier, and then is followed by the network address, and finally by the host address. There are different network classes, as shown in Table 2.1. Each network card attached to the Internet must have a unique IP address. The address assignment scheme is a two-step strategy. First, each site managing a network connected to the Internet is assigned a unique network address by a central authority called the network information center (NIC). Then each site may assign the unique host addresses belonging to this network address, which may include 2²⁴ – 2 = 16,777,214 addresses (class A), 2¹⁶ – 2 = 65,534 (class B), or 2⁸ – 2 = 254 (class C) unique host addresses. IP defines a set of private addresses that may be used freely, but whose traffic should not be routed over the Internet without modification [RFC1918]. The three address blocks are:
• 10.0.0.0 to 10.255.255.255 (one class A network)
• 172.16.0.0 to 172.31.255.255 (16 contiguous class B networks)
• 192.168.0.0 to 192.168.255.255 (256 contiguous class C networks)
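The classful scheme and the private ranges listed above can be explored with Python's standard ipaddress module; the helper function below is merely an illustration of the most-significant-bit rules from Table 2.1.

    import ipaddress

    def ipv4_class(addr):
        """Classify an IPv4 address by its most significant bits (classful scheme)."""
        first_octet = int(ipaddress.IPv4Address(addr)) >> 24
        if first_octet >> 7 == 0b0:
            return "A"
        if first_octet >> 6 == 0b10:
            return "B"
        if first_octet >> 5 == 0b110:
            return "C"
        if first_octet >> 4 == 0b1110:
            return "D (multicast)"
        return "E (reserved)"

    # Prints the class of each sample address and whether it lies in a private range.
    for a in ("10.1.2.3", "172.20.0.5", "192.168.1.1", "224.0.0.1", "8.8.8.8"):
        print(a, ipv4_class(a), ipaddress.ip_address(a).is_private)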
FIGURE 2.5 Unicast.
Multicast addresses are special addresses reserved for groups of hosts receiving the same multimedia program via multicast from a single source [RFC1112]. Multicast addresses may range from 224.0.0.0 to 239.255.255.255; details about multicasting are described in Section 2.5.2. Two host addresses are reserved in each (sub)network. The host address 0 denotes the network itself; the highest possible host address denotes a broadcast address that is received by all hosts of a given network.
2.5.2 IPv4 Multicasting The normal mode of communicating via IPv4 is unicast; i.e., one sender sends data to one receiver. Another possible transfer mode inside a subnet is broadcast. In this case one sender sends data to each node of the subnet, regardless of whether the node is interested in the data or not. IP broadcasting, however, does not work beyond the respective subnet boundaries. The third mode of communication is called multicast. Here one sender sends data to a well-defined group of nodes, which may be attached to the same subnet or attached to some other subnet that can be reached via the Internet. Nodes that do not belong to this group either do not receive the sent data or ignore it. The main advantage of multicasting can be seen in Figure 2.5 and Figure 2.6. A host sending a data packet to a group of N receivers in unicast mode (Figure 2.5) must send the data N times, once for each receiver, thus causing significant traffic and CPU overhead at the source. When using multicast (Figure 2.6), the host sends the data only once, and somewhere in between the source and the receivers, multicast routers duplicate the data packets (as done by MR1 in Figure 2.6) and pass them on to the interested receivers. This way, the source sends each packet only once, reducing traffic for the source itself and for the links between sender and receivers. It must be noted that in a multicasting network all routers must be able to route multicast traffic. If pure unicast routers are present, then multicast traffic must be embedded into unicast traffic, resulting in tunnels, as, for instance, is necessary for the Internet MBone example described below. Pure unicast traffic, however, can be routed by unicast and multicast routers.
FIGURE 2.6 Multicast.
FIGURE 2.7 MBone multicast islands.
FIGURE 2.8 Tunnel between two multicast routers MR1 and MR2. Logically, MR1 sends multicast traffic directly to MR2. Physically, the data are transported in the payload section of unicast packets.
2.5.2.1 MBone Most of the existing Internet routers either are not able to route multicast traffic, or this ability has not been activated. If a multicast data packet is received by such a pure unicast router, the packet cannot be routed and therefore is discarded. In contrast, the Internet multicast backbone (MBone) is a set of Internet routers that are able to route multicast data and that collaborate with each other. Each of these routers is also attached to a multicasting-enabled subnet; thus, the MBone forms a set of interconnected multicast islands (Figure 2.7). MBone routers act at two levels. At the usual unicast level, they are standard Internet routers, able to communicate with all other Internet routers via unicast. At the multicast level, they logically send multicast traffic or multicast routing information only to other members of the MBone. As there may be an arbitrary number of pure unicast routers physically located between two MBone routers, multicast data and usually also routing information are sent inside unicast tunnels (Figure 2.8). This means that if a multicast router sends a multicast packet PM toward its receivers, it creates a new UDP unicast packet PU and puts the whole multicast packet PM (including its IP/UDP headers) into the data section of the UDP packet PU. The UDP packet PU is then sent to the next multicast router via unicast. For tunneling, usually IP in IP [RFC1853] is used, but the more general Generic Routing Encapsulation (GRE) [RFC2784] may also be used. The MBone is a so-called overlay network on top of the Internet, because the MBone routers together with the tunnels form a second, smaller logical network on top of the Internet, which at the multicast level is not necessarily aware of the lower-level Internet structure and all its unicast routers. At the multicast level, only the MBone multicast routers and tunnels (connections between the MBone routers) are visible. Nowadays, the MBone consists of thousands of multicast islands being interconnected via tunnels, and users attached to a multicast island may multicast audio and video transmissions to all other users connected to the MBone worldwide. 2.5.2.2 IPv4 Multicast Addressing As in unicast, when sending a multicast UDP packet, the destination address field of the IP header represents the nodes that receive the packet. However, this destination address must be a class D IP multicast address, also called group address.
TABLE 2.2 IPv4 Multicast Addressing Scheme

Start        End                Description
224.0.0.0    224.0.0.255        Routing protocols (e.g., DVMRP, topology discovery, etc.)
224.0.1.0    238.255.255.255    Either permanently assigned or free for dynamic use
232.0.0.0    232.255.255.255    Source-specific multicast (SSM)
233.0.0.0    233.255.255.255    GLOP
239.0.0.0    239.255.255.255    Administratively scoped IP multicast
Thus, a multicast packet is always sent to a group of hosts rather than to a specific host. Table 2.2 shows parts of the Internet multicast addressing scheme [IANAM, ALB2004]. It can be seen that some parts of the addressing range are reserved, for instance, for routing protocols, etc.; some are reserved for static multicast groups, which are defined permanently; and some are reserved for different multicast address assignment schemes. For sending a multicast to a transient group (one that is created and destroyed again), the sender must obtain an unused multicast address. Unfortunately, there is no central authority for assigning such an address. Thus, users must either arbitrarily take an address from one of the free address ranges and hope that no one else uses it, or use tools like sd or sdr (see Section 2.5.2.5), which are able to suggest unused addresses. Alternatively, senders may use global scope multicast addresses (GLOP) [RFC2770] or multicast address-set claim (MASC) [RFC2909] for obtaining such an address. Finally, there is a range of multicast addresses that are devoted to limiting their scope within a hierarchically set scheme rather than with the somewhat crude TTL mechanism (explained in Section 2.5.2.4). These are called administratively scoped [RFC2365] addresses; i.e., a large company or institution may limit the set of multicast routers that may receive the sent traffic to their subnets, but not beyond. 2.5.2.3 Local Multicast Hosts wanting to receive multicast data must first join the respective group that will receive the data. If the multicast is restricted to a specific LAN, then a receiver at least needs to implement the Internet Group Management Protocol (see Section 2.5.6). It must provide the functions JoinHostGroup(group-address, interface) and LeaveHostGroup(group-address, interface) for its IP service interfaces [RFC1112]. With IGMP, the host joins a group at the IP level and informs its local multicast router that it wishes to receive data sent to this group. The two interface functions instruct each network interface card that it should either join or leave a multicast group at the data link layer (ISO/OSI layer 2). When sending a multicast packet in a LAN, it is advisable to use the existing multicasting capabilities of the used LAN data link layer technology, which are often available in addition to unicast and broadcast. This means that inside a LAN, multicast data should be handled by layer 2 only, rather than layer 3. For instance, a multicast IP address (4 bytes) can be mapped to a corresponding IEEE 802 (e.g., Ethernet, FDDI, etc.) MAC layer multicast address (6 bytes). For this purpose, the IANA [IANA] has been assigned the IEEE 802 MAC address block from 01-00-5E-00-00-00 to 01-00-5E-FF-FF-FF for the sole use of IP multicasting. For mapping the IP multicast address to the corresponding MAC multicast address, the least significant 23 bits of the IP multicasting address are added to the IANA MAC multicasting base address 01-00-5E-00-00-00. As an IP class D address (32 bits) starts with 4 fixed bits (see Table 2.1), leaving 28 bits free to choose, 5 bits of an IP multicast address are ignored in this mapping, leading to the fact that 2⁵ = 32 IP multicast addresses are always mapped to the same MAC multicast address. The procedure for the transmission of multicast traffic sent in the same LAN is simple.
The sender sends the data to a specific IP multicast address AI, which is mapped to the corresponding MAC multicast address AM, and the destination MAC address of each sent frame is set to AM. If a network interface card is instructed to receive multicast sent to the IP multicast address AI (via call to JoinHostGroup), the IP multicast address is again mapped to the same MAC multicast address AM. Once the network interface card detects a frame having the very multicast MAC address AM as the destination address, it accepts the frame and passes it on to layer 3.
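The described 23-bit mapping can be sketched in a few lines of Python; the function name and the example group addresses are chosen purely for illustration. Note how 224.0.0.1 and 239.128.0.1 differ only in the ignored bits and therefore map to the same MAC address.

    import ipaddress

    IANA_MCAST_MAC_BASE = 0x01005E000000   # 01-00-5E-00-00-00

    def multicast_ip_to_mac(group):
        """Map an IPv4 multicast (class D) address to its IEEE 802 MAC address
        by copying the lower 23 bits of the IP address into the IANA base address."""
        ip = int(ipaddress.IPv4Address(group))
        if ip >> 28 != 0b1110:
            raise ValueError("not a class D multicast address")
        mac = IANA_MCAST_MAC_BASE | (ip & 0x7FFFFF)     # keep only the lower 23 bits
        return "-".join("{:02X}".format((mac >> shift) & 0xFF)
                        for shift in range(40, -1, -8))

    print(multicast_ip_to_mac("224.0.0.1"))     # 01-00-5E-00-00-01
    print(multicast_ip_to_mac("239.128.0.1"))   # also 01-00-5E-00-00-01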
TABLE 2.3 Connection between TTL and Scope

TTL      Scope
128      Low-speed tunnels
64       Intercontinental
48       International (within the continent)
16–32    National (depending on the links involved)
1–16     Within institution
A call to LeaveHostGroup deletes this association at the receiver. From there on, received multicast frames sent to the group will be ignored. 2.5.2.4 Multicast Routing If multicast packets should be received outside their own LAN, things become more complicated. Whether packets should be sent beyond their own LAN via the local multicast router is in principle determined by the TTL field of the sent packet. Similar to unicast, this field is decremented by one by each router it passes by. Once it reaches zero, the packet is dropped. This automatically prevents packets from circulating through the Net forever due to incorrect routing tables and also provides scoping, i.e., a way for defining how far the sent packets may travel. For instance, if a packet should be received by hosts being attached to the same LAN only (and nowhere else), the TTL must be set to 1; if packets should be received only by hosts situated on the same continent as the sender, the TTL must be set to 48. Other values for the TTL limit the scope to certain areas centered around the sender (Table 2.3). If TTL is greater than one, then the local multicast router must forward the packet to each multicast router it is connected to. On the MBone this means the packet is sent over each tunnel going out of the local multicast router. As the sender does not know who the other members of the multicast group (i.e., the receivers) are, each multicast packet should be sent to all multicast routers of the MBone (i.e., flooding the whole network) in order to make sure that all group members get the sent data. However, this would lead to a drastic overload of the multicast network, and therefore routing protocols exist that minimize the traffic and yet guarantee that each member of the multicast group will receive each packet that is sent to the group, for instance, the distance-vector multicast routing protocol (DVMRP) [RFC1075, PUS2003], multicast extensions to open shortest path first (MOSPF) [RFC1584], or protocol-independent multicast (PIM) [ADA2003, RFC2362]. 2.5.2.5 Multicast Applications Several tools exist for creating, managing, and receiving multicast traffic over the MBone. For initializing and joining multicast sessions, the tools Session Directory (sd or sdr) or Multikit can be used. Sdr shows multicast programs currently being sent or scheduled for the future. It can also be used for obtaining an unused multicast address and announcing a multicast session to be scheduled for the future. When sessions are joined, sdr will launch the appropriate tools for presenting the program. This can be video tools like vic (video conferencing) or nv (network video), or audio tools like vat (visual audio tool) or rat (robust audio tool). Telephony is done via Free Phone (fphone), and a whiteboard application is given by wb. Other examples for multicast tools include text tools like the Network Text Editor (nt) and a polling tool (mpoll).
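As a small end-to-end illustration of the mechanisms above, the following Python sketch joins a multicast group with the standard socket options (which trigger the IGMP join and the MAC-level filtering described in Section 2.5.2.3) and restricts the sender's scope to the local LAN by setting the TTL to 1. The group address 239.1.2.3 and the port number are arbitrary examples, and error handling is omitted.

    import socket
    import struct

    GROUP, PORT = "239.1.2.3", 5007         # arbitrary example group and port

    # Receiver: joining the group corresponds to JoinHostGroup(group, interface).
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    rx.bind(("", PORT))
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    # Sender: a TTL of 1 keeps the traffic inside the local LAN (see Table 2.3).
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", 1))
    tx.sendto(b"hello group", (GROUP, PORT))

    print(rx.recvfrom(1024))                # (b'hello group', (<sender address>, <port>))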
2.5.3 IPv6 The Internet Protocol version 6 (IPv6) has been designed for replacing the old IPv4 in the next-generation Internet [RFC1883, RFC1887]. It represents a totally new approach and is incompatible with version 4. As most Internet hosts and routers still only support IPv4, IP packets following IPv6 often cannot be transported from sender to receiver without further modification. Usually, when leaving the IPv6 subnetwork of the sender, IPv6 packets are tunneled over IPv4, i.e., transported in IPv4 packets, where the whole IPv6 packet is treated as pure IPv4 data.
The header has been simplified and contains only 7 fixed fields (the IPv4 header includes 13):
• A version field containing the value 6.
• A priority field distinguishing between data and real-time traffic.
• A flow label for supporting pseudo end-to-end connections with guaranteed QoS.
• The payload length specifies the size of the data contained in the packet.
• The next header points at the next optional header or an ID for the used transport protocol (TCP or UDP).
• The hop limit is decreased by each passed-by router; a packet with zero hop limit is discarded. This prevents faulty packets from circling through the network forever.
• Finally, the 16-byte source and destination addresses are contained.
IPv6 offers the following enhancements with respect to IPv4:
• Addresses are 16 bytes long, written in groups of four hexadecimal digits separated by colons (e.g., 8000:0000:1111:2222:3333:4444:ABCD:EFFF). This solves the shortage of IPv4 addresses caused by the exponential growth of the Internet. Even when wasting a lot of such addresses due to the inefficient use of network addresses, thousands of IP addresses could be assigned to each square meter of the Earth’s surface.
• New address classes exist, including addresses for Internet service providers and geographical regions.
• Due to the simpler header, routing is made more efficient. Additionally, IPv6 supports an arbitrary list of options that may be skipped by routers that do not support them.
• IPv6 supports authentication and encryption.
• IPv6 supports QoS for real-time applications.
Of course, multicasting is also an intrinsic capability of IPv6 but will not be treated here. For more information, see [RFC2373] and [RFC2460]. Even though IPv6 offers substantial advantages, its implementation is costly and requires buying new routers and reconfiguring existing hosts. For these reasons, IPv4 still is the Internet Protocol today, and IPv6 will not dominate the Internet until the year 2010 or even later.
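For getting a feeling for the IPv6 notation, Python's standard ipaddress module can compress and expand addresses such as the example given above; this small illustration does not require IPv6 connectivity.

    import ipaddress

    addr = ipaddress.IPv6Address("8000:0000:1111:2222:3333:4444:ABCD:EFFF")
    print(addr)                  # compressed form: 8000:0:1111:2222:3333:4444:abcd:efff
    print(addr.exploded)         # full form with all leading zeros
    print(ipaddress.IPv6Address("::1").is_loopback)   # True
    print(2 ** 128)              # size of the IPv6 address space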
2.5.4 Address Resolution Protocol The Address Resolution Protocol (ARP) defined in [RFC826] and its complement, the Reverse Address Resolution Protocol (RARP) defined in [RFC903], are a means for connecting OSI layer 2 addresses to their corresponding layer 3 IP addresses. Basically, computers communicate with each other by sending messages on the data link layer (and subsequently the physical layer), for instance, by sending an Ethernet frame over an Ethernet variant. On this level all network cards following IEEE 802 are identified by globally unique 6-byte-long identifiers called MAC addresses. In order to successfully send an Ethernet frame, each sending network card must put both its own MAC address and the MAC address of the receiving card into the Ethernet frame. When many computers are connected by a single layer 2 network (possibly via hubs, bridges, or switches), a sender often knows only the IP address of a receiver. However, for Ethernet cards, IP addresses are meaningless. In such situations, ARP can be used to find out the MAC address of a network card, which at a higher layer is bound to a given IP address. If computer A wants to find out the MAC address of a network card on computer B, which according to its IP address belongs to the same layer 2 subnet, then on computer A ARP is automatically activated. At first, computer A looks into a small ARP cache to find out if the desired binding is already stored there. If not, computer A generates an ARP request message (who is B.B.B.B tell A.A.A.A, where B.B.B.B is the IP address of computer B and A.A.A.A the IP address of computer A), which is no more than a special Ethernet frame containing the following information:
• Ethernet protocol type is set to 0x0806.
• Sender MAC address.
• Sender IP address.
• Receiver MAC address is set to the Ethernet broadcast address FF:FF:FF:FF:FF:FF.
• Receiver IP address.
As the receiver in this Ethernet frame is the broadcast address, all network cards connected to the same subnet will receive this request, including computer B. Upon the reception of the ARP request message, computer B will then activate its own ARP, which will immediately send an ARP response message (B.B.B.B is HH:HH:HH:HH:HH:HH, where B.B.B.B is the IP address and HH:HH:HH:HH:HH:HH is the MAC address of the network card of computer B). Once the ARP response message has been received by computer A, computer A will store this IP-MAC address binding for computer B in its ARP cache and may start sending Ethernet frames to computer B. In order to avoid outdated ARP caches, these caches are periodically emptied. The purpose of RARP is to let computers find out their IP addresses upon start-up, in case they only know their MAC addresses. This can be the case, for example, for diskless workstations, which automatically attach to a server, or for workstations with identical disk images (which do not require manual setup). RARP works in a manner similar to that of ARP, except that the protocol type value is set to 0x8035. Also, a RARP server is required that contains a table with the MAC-IP bindings. Alternatives to RARP are given by the Bootstrap Protocol (BOOTP) or the Dynamic Host Configuration Protocol (DHCP), which allow the resolution of IP addresses in a more flexible way.
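For illustration, the following Python sketch assembles the Ethernet frame carrying an ARP request as described above. The MAC and IP addresses are made-up examples, and actually transmitting the frame would require a raw layer 2 socket (e.g., AF_PACKET on Linux), which is not shown here.

    import socket
    import struct

    def build_arp_request(sender_mac, sender_ip, target_ip):
        """Ethernet frame carrying an ARP request ("who has target_ip, tell sender_ip")."""
        broadcast = b"\xff" * 6
        eth_header = broadcast + sender_mac + struct.pack("!H", 0x0806)   # EtherType ARP
        arp_body = struct.pack("!HHBBH6s4s6s4s",
                               1,                    # hardware type: Ethernet
                               0x0800,               # protocol type: IPv4
                               6, 4,                 # hardware/protocol address lengths
                               1,                    # operation: 1 = request, 2 = reply
                               sender_mac, socket.inet_aton(sender_ip),
                               b"\x00" * 6, socket.inet_aton(target_ip))
        return eth_header + arp_body

    frame = build_arp_request(b"\xaa\xbb\xcc\xdd\xee\xff", "192.168.0.10", "192.168.0.20")
    print(len(frame))   # 14-byte Ethernet header + 28-byte ARP message = 42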
2.5.5 Internet Control Message Protocol The Internet Control Message Protocol (ICMP), defined in [RFC792] and [RFC1122], is used for automatically sending control signals and commands between computers attached to an IP network. Also, ICMP messages can be used for testing connections and measuring interconnection performance. ICMP messages are sent as special IP packets and thus can be handled by routers. As a consequence, ICMP messages can be sent to or received from arbitrary computers connected with each other over an IP network. An ICMP message contains the following data:
• The type defines the purpose of the ICMP packet. There are over 30 different ICMP types.
• The code further defines the packet’s purpose.
• A header checksum.
• The rest of the packet may then contain further data depending on the ICMP type.
The most important ICMP packet types are:
Echo Request: When receiving such an ICMP packet, the receiver should answer with an ICMP Echo Reply packet.
Echo Reply: Answer to an ICMP Echo Request packet.
Time Stamp Request: The same as Echo Request, except that the receiver answers with a Time Stamp Reply packet, which holds additional time stamps.
Time Stamp Reply: The answer to an ICMP Time Stamp Request, which holds the time points at which the Time Stamp Request was received and the Time Stamp Reply was sent back.
Destination Unreachable: This message is returned by a router to the source host to inform it that the destination of a previously sent packet cannot be reached.
Time Exceeded: Sent from a router to a source host to inform it that the lifetime of a previously sent packet has reached zero.
Parameter Problem: Sent to a source host to inform it that a previously sent packet contains invalid header data.
Source Quench: Sent to a source host to inform it that due to insufficient bandwidth, it should lower its sending bit rate.
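The checksum in the ICMP header is the standard Internet checksum also used by IP, TCP, and UDP. The following Python sketch builds an ICMP Echo Request (type 8, code 0) and computes that checksum; actually sending it would additionally require a raw socket and usually administrator privileges, which is omitted here, and the payload is an arbitrary example.

    import struct

    def inet_checksum(data):
        """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
        if len(data) % 2:
            data += b"\x00"                             # pad to an even number of bytes
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)    # fold carries back in
        return ~total & 0xFFFF

    def build_echo_request(identifier, sequence, payload=b"ping"):
        """ICMP Echo Request: type 8, code 0, checksum over the whole ICMP message."""
        header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
        checksum = inet_checksum(header + payload)
        return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

    packet = build_echo_request(identifier=1, sequence=1)
    print(packet.hex())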
2.5.6 Internet Group Management Protocol The Internet Group Management Protocol (IGMP) is defined in [RFC2236] and is used by IP hosts to inform multicast routers in their LAN about their multicast group memberships (see Section 2.5.2.4). IGMP messages are encapsulated into IP datagrams with protocol number 2. The goal is to ensure that the multicast router knows whenever a host in its multicast island joins or leaves a multicast group. As a necessary prerequisite, all hosts wishing to receive multicast traffic must join the local LAN all-hosts group with multicast IP address 224.0.0.1. Periodically, multicast routers send a Host Membership Query message to the all-hosts group of their attached LANs. Upon receiving this message, each host answers by reporting those host groups it is a member of by sending appropriate Host Membership Report messages (one for each group). In principle, the multicast router is interested only in whether, for a specific group A, there are members in the LAN. Thus, even if several hosts are members of the same group A, it is sufficient that only one membership report for A reaches the router. In order to minimize the sent membership reports, before sending a membership report for A, each member of A first waits a random amount of time. Then, if no other membership report of some other member of A has been received, the host sends its own membership report to the group address of A, thus reaching the multicast router (which receives all multicast traffic) and all other members of A (which in turn suppress their own membership reports for A). In addition to the above scheme, if a host newly joins a multicast group, it sends a Host Membership Report to the multicast router without waiting for a query, thus being able to receive the respective traffic immediately in case it is the first member of this group in the LAN. To allow for lost reports, this is done at least twice. An IGMP version 2 packet has the following format:
• The type field defines the type of the message.
• The maximum response time is meaningful only on Membership Query messages and defines the maximum allowed time before sending a report message (unit is 1/10 second).
• The checksum is computed over the whole IP payload.
• The group address field contains the respective multicast address.
There are various types of IGMP messages:
• Host Membership Query
• Group Specific Query
• Version 1 and Version 2 Membership Report
• Leave Group
Whenever a host leaves a group, it may send a Leave Group message to the all-routers multicast group (224.0.0.2) to inform all routers of the LAN that there are possibly no more members of this group present. If the leaving host was the one host that actually answered the last Membership Query for this group, then it should send this message. Upon receiving a Leave Group message, a router sends one or more Group Specific Query messages to the group that the host has left, to verify whether any members of this particular group remain.
2.6 Transport Layer Protocols and Services 2.6.1 Transmission Control Protocol The Transmission Control Protocol (TCP) operates at OSI layer 4 (in the TCP/IP reference model at the transport layer) on top of IP and is assigned the IP protocol number 6. It constitutes the most important Internet protocol and is defined in [RFC793], [RFC1122], and [RFC1323]. The purpose of TCP is twofold: • To guarantee the correct delivery of packets sent over an intrinsically unreliable packet-oriented IP network.
FIGURE 2.9 Full-duplex TCP connection. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
• To control the output bit rate of each sender in order to minimize packet losses due to congested routers or receivers.
TCP operates connection-oriented and in full duplex. Applications using TCP may assume that a TCP connection opened from a source host to a receiver host is like a reliable pipeline or byte stream. Data (arbitrary bytes) put into this pipeline are guaranteed to arrive at the receiver without losses and in the correct order (Figure 2.9). In order to guarantee this correctness, TCP divides the data to send into so-called segments, which are themselves sent in IP packets. In principle, IP packets can hold up to 65,535 bytes. However, in order to avoid fragmentation, the size of TCP segments is in practice limited by the network’s MTU. Each segment starts with a TCP header, which is at least 20 bytes long, but may hold additional options. The rest of the segment may hold user data, but may also be empty. The TCP header contains the following information:
• Source port and destination port.
• A sequence number identifying each sent byte. This wraps back to zero in case the highest number has been used.
• An acknowledgment number denoting the number of the next expected byte. This field only contains valid data if the ACK bit is set.
• The data offset holding the size of the TCP header.
• Explicit congestion notification (ECN) and control bits, including URG, ACK, PSH, RST, SYN, and FIN.
• Sender receive window size.
• A header checksum.
• An optional pointer to urgent data (URG flag set) and optional TCP headers.
In order to create a TCP connection between two applications X and Y running on computers A and B, both applications first must get a port number, an identifier between 0 and 65,535, which can be assigned only once on each computer. The application X initiating the connection then must provide its own port number, the IP address of computer B, and the port number of the partner application Y to TCP. TCP then sends a segment to the given IP address and port number, where the SYN flag is set to 1 and ACK is set to 0, and a random sequence number x is chosen. If application Y correctly waits at the given port, the TCP on computer B answers with a segment, where the SYN and ACK bits are set, the sequence number of side B is set to a random number y, and the acknowledgment number is set to x + 1. Upon reception of this second segment, the TCP on computer A sends a final segment, where the ACK flag is set, the sequence number is set to x + 1, and the acknowledgment number is set to y + 1. As three segments must be sent for establishing a TCP connection, this process is called three-way handshake (see Figure 2.10). After the establishment of the connection, each side may send arbitrary bytes to the other side. If one side wants to terminate the connection, a segment with the FIN flag set must be sent. Otherwise, if, for instance, application X sends data to Y, then the data are put into one or more TCP segments, which are then sent via IP to computer B. Due to the sequence numbers of each segment, the TCP layer at B is able to detect missing segments or the out-of-order delivery of segments. For each correctly received segment, B must send an acknowledgment segment back to A, where the acknowledgment number identifies the number of the next expected byte. The TCP on computer A, on the other hand, starts a
The Industrial Communication Technology Handbook
Host 1
Host 2
SY N ( S EQ =
=y (S E Q SYN
SYN
(S EQ
, AC K
= x+ 1 ,
x)
=x +1 )
AC K=
y+ 1 )
FIGURE 2.10 TCP three-way handshake. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
so-called retransmission timer for each sent segment. If no acknowledgment has been received within a certain amount of time, computer A assumes that the segment was lost and retransmits it. TCP also maintains two so-called sliding windows in order to control the transmission bit rate of each sender. One window simply tells each sender how many bytes the receiver may currently receive without risking a buffer overflow (flow control). This information is transmitted in each ACK segment in the receive window size field. The second window is called congestion window (CWND). Here, each sender additionally restricts the number of bytes it may send without acknowledgment to the congestion window size. Initially, the window size is set to 1 packet (i.e., the maximum allowed segment size), a strategy that is called slow start. For each acknowledged byte, TCP increases the size of its congestion window, at first with exponential speed, but after reaching a certain threshold h, only with linear speed. If a timeout of the retransmission timer occurs, h is set to half the current congestion window size and the congestion window is reset to one packet. Instead of waiting for the retransmission timer to time out, a strategy called fast retransmit enables receivers to send duplicate ACKs to the sender in case out-of-order segments are received. A sender receiving several (typically three) such duplicate ACKs may deduce that an intermediate segment has been lost rather than merely reordered on the way, and may retransmit the missing segment earlier [RFC1122].
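From the application's point of view, all of this machinery is hidden behind the socket interface. The following Python sketch, with an arbitrarily chosen port number, opens a TCP connection (the three-way handshake happens inside connect() and accept()) and exchanges a few bytes over the resulting reliable byte stream.

    import socket
    import threading
    import time

    def echo_server(port):
        """Accept one connection and echo every received byte back to the sender."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind(("127.0.0.1", port))
            srv.listen(1)
            conn, _ = srv.accept()          # completes the three-way handshake
            with conn:
                data = conn.recv(1024)
                while data:
                    conn.sendall(data)      # TCP delivers this reliably and in order
                    data = conn.recv(1024)

    threading.Thread(target=echo_server, args=(5050,), daemon=True).start()
    time.sleep(0.5)                          # crude wait until the server is listening

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect(("127.0.0.1", 5050))     # SYN, SYN+ACK, ACK
        cli.sendall(b"hello over a reliable byte stream")
        print(cli.recv(1024))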
2.6.2 User Datagram Protocol The User Datagram Protocol (UDP) is the second important transport protocol at OSI layer 4 (in the TCP/IP reference model at the transport layer) [RFC768, RFC1122] and is assigned the IP protocol number 17. It is meant for transporting application data in a message-oriented, unreliable manner from one application to another. As most functionality is already provided by IP, the UDP header only contains the port numbers of the source and receiver applications, the length of the UDP packet, and a checksum. As UDP does not provide any functionality for detecting lost packets or out-of-order delivery, it is mostly used either in local networks with large bandwidths and reliable layer 2 transport, or for transporting multimedia data like live broadcasts, where a few lost packets will not seriously decrease the perceived quality of the presentation. In any case, detection of lost packets or out-of-order delivery must be carried out by the receiving applications, usually by including sequence numbers in the UDP application data. The interpretation of these numbers is left solely to the applications.
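A minimal sketch of such application-level sequence numbering might look as follows; the port number and the payload are arbitrary examples, and a real application would additionally handle timeouts and duplicate datagrams.

    import socket
    import struct

    ADDR = ("127.0.0.1", 5051)              # arbitrary example address and port

    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(ADDR)

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(3):
        # UDP itself carries no sequence numbers, so the application prepends one.
        tx.sendto(struct.pack("!I", seq) + b"frame payload", ADDR)

    expected = 0
    for _ in range(3):
        datagram, _ = rx.recvfrom(2048)
        seq = struct.unpack("!I", datagram[:4])[0]
        if seq != expected:
            print("lost or reordered datagram: expected", expected, "got", seq)
        expected = seq + 1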
FIGURE 2.11 RESV and PATH messages in a multicast tree. (From Kurz, C. and Hlavacs, H., TCP/IP architecture, protocols, and services, in Industrial Information Technology Handbook, Zurawski, R. (Ed.) CRC Press, Boca Raton, 2004. With permission.)
2.6.3 Resource Reservation Protocol IPv4 does not contain mechanisms for guaranteeing a minimum quality of service (QoS) for its traffic, for instance, a minimum sustainable end-to-end bit rate or a maximum end-to-end delay or jitter (delay variation). This may severely affect the presentation quality of real-time transmissions, using, for example, the Real-Time Transport Protocol (RTP) (see application layer protocols). The Resource Reservation Protocol (RSVP) tries to fill this gap by providing means for guaranteeing certain quality of service parameters [RFC2205, RFC2750]. It is an optional add-on for Internet routers and clients using IP (IPv4 and IPv6), and is currently available on a small subset of Internet hosts only. Being at the same level as TCP or UDP, it has its own IP protocol number (46). RSVP is not a routing protocol itself, but rather a signaling protocol. It cooperates with other routing protocols for controlling efficient unicast and multicast over IP. RSVP allows two different QoS modes. In the controlled load service, RSVP simulates a lightly loaded network for its clients, although the network itself may be overloaded [RFC2211]. Although no hard QoS parameters are met, a lightly loaded network is likely to be sufficient for many load-tolerant and adaptive applications like audio/video streaming. In contrast, the guaranteed service guarantees that the RSVP path will meet the agreed QoS level at all times [RFC2212]. A client application wishing to receive a multicast multimedia stream passes this request to its local RSVP daemon. This daemon then sends a reservation (RESV) request to adjacent RSVP routers toward the multimedia source along the reverse multicast tree path. The RESV request contains a description of the desired quality of service in a so-called flow descriptor. Coming from the other side, the multicast source periodically sends PATH messages down the multicast tree. PATH messages create and acknowledge valid and active multicasting paths (Figure 2.11). Also, they carry information about the quality of service of the path from source to receiver. RSVP routers may merge different QoS requests into one single reservation, here choosing the maximum of each request as the prereserved QoS level. During runtime, reservations may be changed to other QoS levels. Also, RSVP paths must be acknowledged periodically by PATH and RESV messages, but RSVP is fault tolerant with respect to a few missing messages. Only if none have been received for a certain time is the whole path cancelled. On each RSVP router, an RSVP daemon manages and controls the IP routing process. It consists of the following modules: An incoming QoS reservation request is approved or denied by the admission control, depending on whether the QoS request can be satisfied. The rights for making reservations are checked in the policy control module. Incoming data packets are sorted by the packet classifier, which puts them into different queues. Finally, the packet scheduler is responsible for granting the agreed QoS to the packets in the routing queues; packets belonging to the same queue are treated identically.
2.7 Presentation Layer Protocols and Services Applications may send arbitrary data to others, often embedding complex data structures into their messages. In this process, the data structures have to be transformed (flattened, marshalled) into a sequence of bytes, containing the data as well as information about the used data representation. The
receiver must be able to understand the structure of the byte sequence and how to interpret the single bytes in order to reconstruct the sent data structures. This is achieved by the presentation layer (layer 6 of the ISO/OSI model). The presentation layer ensures that two computer systems may successfully communicate even if they use different data representations. Due to different data representation schemes, the presentation layer is often forced to translate sent or received messages. This, however, should be done in a manner totally transparent to the OSI application layer above. Problems may arise, for instance, because of the CPU byte order. In modern 32-bit architectures, CPUs store values and addresses using 32 bits, stored in four consecutive bytes. In Intel processors, for example, the least significant byte is stored first and the most significant byte last. This is called little endian. On the other hand, for example, Motorola processors store a 4-byte value in the reverse order, called big endian. If an Intel-based computer sends a 32-bit value to a Motorola-based computer, without further corrective measures, the receiver totally misinterprets the received value. This may be prevented, for instance, by forcing the sender to convert the data to the receiver’s format before sending, or alternatively forcing the receiver to convert the data from the sender’s format after receiving. A third approach is to agree to a commonly used format and to convert to this format before sending or from this format after receiving. TCP/IP, for instance, defines a common network byte order. Using, for example, the C programming language, 32-bit and 16-bit values may be converted to and from this format by the functions htonl()/ntohl() and htons()/ntohs(). Another system using an external data format is given by the external data representation (XDR), as specified by Sun [SUN1990]. An additional problem arising between different computers is character code interpretation. For instance, characters may be stored using one of the following codes: ASCII (common in Intel compatibles, 8 bits/character), EBCDIC (used on IBM mainframes, 8 bits/character), or UNICODE (16 or 20 bits/character). Here, the presentation layer is responsible for automatically translating between the various code schemes. At the next-higher decoding level, received complex data structures should be reconstructed (unmarshalled) from their flattened byte sequence representation. For inhomogeneous data, the data structures must be described by metadata, for instance, defining the data types belonging to each structure, being followed by the data values themselves. This, for instance, can be achieved by using the standardized Abstract Syntax Notation 1 (ASN.1) [X680]. Other tasks of the presentation layer include the encryption of messages and supporting authentication. Finally, the presentation layer may also be responsible for the compression of data.
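In Python, the same byte-order issue can be illustrated with the struct and socket modules (socket.htonl() and socket.ntohl() are the counterparts of the C functions mentioned above); the value below is an arbitrary example.

    import struct

    value = 0x12345678

    network = struct.pack("!I", value)    # "!" = network byte order (big endian)
    native = struct.pack("=I", value)     # whatever byte order the local CPU uses
    print(network.hex(), native.hex())    # e.g., 12345678 78563412 on an Intel CPU

    # Interpreting received bytes with the wrong byte order misreads the value.
    misread = struct.unpack("<I", network)[0]   # forced little-endian read
    correct = struct.unpack("!I", network)[0]   # network-to-host conversion applied
    print(hex(misread), hex(correct))           # 0x78563412 0x12345678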
2.8 Application Layer Protocols and Services In both the ISO/OSI scheme and the TCP/IP reference model, the application layer defines protocols directly to be used by applications for exchanging data with each other. These include, for instance, authentication, distributed databases and file systems, file transport, data syntax restrictions, coordination and agreement procedures, quality of service issues, e-mail, and terminal emulation. Many standard protocols are already specified by the Internet Engineering Task Force (IETF). They define standard data structures that are to be exchanged between applications. Applications following these protocols are guaranteed to be able to successfully interact with other applications over the Internet, even if these applications have been created by different sources. For instance, Web browsers following HTTP may download Web pages from any Web server connected to the Internet. The IETF-specified protocols usually use TCP for reliable transport and UDP for the transport of real-time multimedia data (although real-time multimedia data may also be sent over TCP). Usually, both control commands and pure data can be transmitted over the same TCP connection. For signaling the end of a data transmission, one of three approaches is used. In octet stuffing, the end of a data transmission is signaled by a certain byte sequence (similar to the bit stuffing used at the data link layer). If the transported data also contain this very sequence, the sequence is changed (escaped) into another sequence. The receiver must detect such a change and undo it. An example for octet stuffing is SMTP. In octet
counting, transported messages contain special headers that specify the number of data bytes to be transferred. This concept is used, for instance, in HTTP. Finally, in connection blasting, the end of a transmission is signaled by closing the TCP connection. This is used, for instance, in FTP.
2.8.1 TELNET The TELNET protocol is meant for providing a general 8-bit interface for the communication between users, hosts, and processes [RFC854]. Generally, a TELNET client running on computer A opens a TCP connection to port 23 of a TELNET server on computer B. Both sides then emulate a certain simple type of terminal called network virtual terminal (NVT), but may negotiate additional services after the connection has been established. An NVT is a bidirectional character device consisting of a printer that shows the information received from the other side, and a keyboard where keystrokes are produced and sent to the other side. TELNET defines a set of commands that may be sent in-band with the stream of data. The mechanism used here is the octet stuffing. Byte 255 is called interpret as command (IAC) and signals that the following byte specifies a TELNET command, for example, for sending an interrupt to the running process or for erasing the last character. If a data byte with value 255 is to be sent, then two bytes with value 255 are sent. Upon receiving two consecutive bytes with value 255, the receiver side must remove one of them automatically.
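The escaping rule can be sketched as follows. The function below illustrates the sender side only; the command parsing performed by a real TELNET implementation is omitted.

```c
#include <stdio.h>
#include <stddef.h>

#define IAC 255u   /* "interpret as command" */

/* Escape application data for transmission over a TELNET connection: every data
 * byte with value 255 is sent as two bytes with value 255, so the receiver can
 * tell it apart from a command.  'out' must be able to hold up to 2*n bytes. */
static size_t telnet_escape(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k++] = in[i];
        if (in[i] == IAC)
            out[k++] = IAC;   /* double the IAC byte */
    }
    return k;
}

int main(void)
{
    unsigned char data[] = {'A', 255, 'B'};
    unsigned char wire[6];
    size_t len = telnet_escape(data, sizeof data, wire);
    printf("escaped length: %zu bytes\n", len);   /* prints 4 */
    return 0;
}
```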
2.8.2 File Transfer Protocol The File Transfer Protocol (FTP) is used for transporting arbitrary binary data from one Internet host to another [RFC959]. On computer A, an FTP client is started with the IP or DNS address of the Internet computer B with which communication is desired. The FTP client then opens a TCP connection to port 21 of computer B, representing the control connection. The control connection uses the TELNET protocol underneath, and users may send control commands to the FTP server on computer B, including the request for showing the contents of the current directory at computer B (LIST), as well as changing this current directory to another one (CWD), creating new directories (MKD), etc. Additionally, the user may start uploads (STOR) or downloads (RETR) of files to and from the current directory. Upon the reception of a control command over the control connection, the server answers with a reply, sending status or error information to the client. One has to distinguish between the FTP control command that is actually sent over the control channel and commands that are typed in by users into a command line application, which may be different. Once data are to be sent, a TCP data connection is opened by the server on computer B from port 20 to a data port on computer A that the client has previously announced over the control connection. Then, depending on the specified direction, the data are sent either from A to B or vice versa. After transmitting the last byte, the sender must close the data connection, indicating to the other side that the transmission has ended. It is worth noting that FTP knows different transmission modes. In the binary mode, the data are sent without modification. In the ASCII mode, the FTP automatically changes different character representation codes, for instance, when sending a pure text file from an IBM mainframe (using EBCDIC) to a PC (using ASCII), or when exchanging data between different operating systems like Microsoft Windows and Unix or Unix-like operating systems (having different end-of-line representations in text files).
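A minimal sketch of the client side of the control connection is given below. The server address is a placeholder, the USER and PASS login commands (defined in [RFC959] but not discussed above) are included only to make the dialogue plausible, and all error handling and reply-code parsing are omitted.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Send one FTP command over the control connection and print the server reply.
 * Replies begin with a three-digit status code, as described above. */
static void ftp_cmd(int ctrl, const char *cmd)
{
    char buf[1024];
    ssize_t n;
    send(ctrl, cmd, strlen(cmd), 0);
    if ((n = recv(ctrl, buf, sizeof buf - 1, 0)) > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
}

int main(void)
{
    /* 192.0.2.10 is a placeholder address; replace with the FTP server ("computer B"). */
    struct sockaddr_in srv = {0};
    srv.sin_family = AF_INET;
    srv.sin_port = htons(21);                 /* FTP control connection port */
    inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr);

    int ctrl = socket(AF_INET, SOCK_STREAM, 0);
    if (connect(ctrl, (struct sockaddr *)&srv, sizeof srv) < 0)
        return 1;

    char greeting[1024];
    ssize_t n = recv(ctrl, greeting, sizeof greeting - 1, 0);   /* e.g., "220 ..." */
    if (n > 0) { greeting[n] = '\0'; printf("%s", greeting); }

    ftp_cmd(ctrl, "USER anonymous\r\n");   /* login commands from RFC 959 */
    ftp_cmd(ctrl, "PASS guest@\r\n");
    ftp_cmd(ctrl, "CWD /pub\r\n");         /* change the current directory */
    ftp_cmd(ctrl, "QUIT\r\n");
    close(ctrl);
    return 0;
}
```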
2.8.3 Hypertext Transfer Protocol The Hypertext Transfer Protocol (HTTP) is available as version 1.0 [RFC1945] and version 1.1 [RFC2616]. Its purpose is to manage the download of documents that are part of the World Wide Web (WWW), usually following the Hypertext Markup Language (HTML) [RFC1866]. Most Web browsers and servers nowadays understand HTTP/1.0, although [RFC1945] is not a standard but rather an informational guideline. Newer Web clients and servers also support the standardized HTTP/1.1.
HTTP is a client/server-based protocol following the octet-counting approach. A client wishing to download a specific document from a Web server opens a TCP connection to the server port 80 (sometimes 8080). The client then sends a request, containing a request line, various headers, an empty line, and an optional body. The request line specifies what the client wants the server to do. For example, a request line “GET /dir1/dir2/the_document.html HTTP/1.1” informs the server that the client wants to download the document “the_document.html,” which is situated in the directory “/dir1/dir2,” by using HTTP/1.1. Clients may also send data to the server, for example, a form that has been filled out by a user. This can be done using the POST method (or PUT, for storing a document at a given location on the server). The server then answers by sending a status line containing a code for success or an error description, various headers describing the downloaded document (e.g., its size or the time stamp of its last change), followed by an empty line. Finally, in the message body, the HTML document itself is transported to the client. HTTP/1.1 overcomes several limitations of HTTP/1.0. For example, an HTML document may contain several other subdocuments, like photos, background images, frames, etc. In HTTP/1.0, for each subdocument a new TCP connection has to be created. In HTTP/1.1, all subdocuments can be transported over the same persistent TCP connection.
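The request format can be illustrated with a short sketch. The host name and document path are placeholders; the Host header (mandatory in HTTP/1.1) and a Connection: close header are included so that the end of the reply is signaled by the server closing the connection.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int main(void)
{
    /* "www.example.com" and the path below are placeholders. */
    const char *host = "www.example.com";
    const char *request =
        "GET /dir1/dir2/the_document.html HTTP/1.1\r\n"
        "Host: www.example.com\r\n"        /* mandatory in HTTP/1.1            */
        "Connection: close\r\n"            /* server closes after the reply    */
        "\r\n";                            /* empty line ends the headers      */

    struct addrinfo hints = {0}, *res;
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "80", &hints, &res) != 0)   /* server port 80 */
        return 1;

    int s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (connect(s, res->ai_addr, res->ai_addrlen) < 0)
        return 1;
    send(s, request, strlen(request), 0);

    /* Read the status line, headers, and body until the server closes the connection. */
    char buf[4096];
    ssize_t n;
    while ((n = recv(s, buf, sizeof buf, 0)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(s);
    freeaddrinfo(res);
    return 0;
}
```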
2.8.4 Simple Mail Transfer Protocol The Simple Mail Transfer Protocol (SMTP) defines the exchange and relay of text mails over TCP/IP [RFC821]. If a mail client running on computer A wants to send mail to a receiver on computer B, it opens a TCP connection to port 25 of either computer B or an intermediate mail server that is able to pass on the mail to the receiver on computer B. Then the sender client sends SMTP commands to the receiver, which replies by sending SMTP responses. Once the sender wants to send an electronic mail, it sends the command MAIL with an identifier for the sender. If the receiver is willing to accept mail from the sender, it answers with an OK reply. Now, the sender client sends a sequence of receipt to (RCPT) commands, which identify the receivers of the mail. Each recipient is acknowledged individually by an OK reply. Once all receivers have been specified, the client sends a DATA command followed by the mail data itself. In order to indicate the end of the mail, the client sends a line containing only a period. If such a line is part of the message, the sender will introduce an additional period, which is removed by the receiver automatically (octet stuffing). In SMTP, (text) mail must be composed of 7-bit ASCII characters only (byte values 0 to 127), a limitation that was not severe in 1982 when SMTP was designed. Nowadays, electronic mail often contains multimedia attachments like audio or video files, where each byte may contain any value between 0 and 255. In order to be able to transport binary data over SMTP, these data are usually transformed into a sequence of 7-bit ASCII characters by using a byte-to-character mapping like Base64 or uuencode. Upon receiving such a transformed character sequence, the receiver must apply the inverse of the transform in order to retrieve the original binary data.
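The sender-side transparency (dot-stuffing) procedure can be sketched as follows; note that [RFC821] applies the rule to every line that begins with a period, not only to a line consisting of a single period.

```c
#include <stdio.h>

/* Apply SMTP transparency (dot stuffing) to a mail body before the DATA phase:
 * every line that begins with a period gets one extra period prepended, so that
 * the end-of-mail indication (a line containing only ".") cannot appear in the
 * data.  'out' must be large enough (at most twice the input size plus one). */
static void smtp_dot_stuff(const char *in, char *out)
{
    int at_line_start = 1;
    while (*in) {
        if (at_line_start && *in == '.')
            *out++ = '.';                  /* duplicate the leading period */
        *out++ = *in;
        at_line_start = (*in == '\n');
        in++;
    }
    *out = '\0';
}

int main(void)
{
    char stuffed[256];
    smtp_dot_stuff("Hello\r\n.\r\nBye\r\n", stuffed);
    printf("%s", stuffed);   /* the lone "." line is sent as ".." */
    return 0;
}
```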
2.8.5 Resource Location Protocol Computers connected over an IP network may offer a variety of services to others, including services standardized by the IETF like DNS, SMTP, FTP, etc., as well as self-created services, for instance, for managing personal information. The Resource Location Protocol (RLP) has been designed to enable arbitrary computers to automatically find other computers that provide specific services [RFC887]. For this purpose, RLP defines a set of request messages that may be sent by the searching computer. RLP uses UDP as a transport protocol. A request message is sent to the UDP port 39 of another host and contains a question and a description of one or more services that are looked for. Depending on the question, hosts that provide the service or know of others that do answer by sending a reply message. RLP defines the following request messages:
• Who Provides? is usually broadcast into a LAN. Hosts providing one of the described services may answer; hosts that do not provide any of the specified services may not.
• Do You Provide? is directly sent to some specific host. It may not be broadcast. A host receiving this message must answer, regardless of whether it provides any of the specified services.
• Who Anywhere Provides? also is usually broadcast into a LAN. Hosts either providing any service or knowing other hosts that do so may answer.
• Does Anyone Provide? is sent to a specific host, which must send back an answer, regardless of whether it knows of any host providing any of the services.
There are two possible answers. The I Provide reply contains a (possibly empty) list of services that are supported by the answering host. The They Provide reply contains a (possibly empty) list of supported services, qualified by a list of host IP addresses supporting them. An RLP message contains the following fields:
• The type field defines the question or reply type.
• The flag local only specifies whether only hosts with the same IP network address should answer or be included in the answer list.
• A message ID enables the mapping of received answers to previously sent requests.
• Finally, the resource list contains a description of the looked-for or provided services and supporting hosts.
Resources and services may be described by several fields. The first description byte specifies the IP number of the IP transport protocol the service uses, for instance, 6 for TCP or 17 for UDP. The next byte defines the port that is usually used by the service, for instance, 23 for TELNET or 25 for SMTP. Additional bytes may then define arbitrary self-created services.
2.8.6 Real-Time Protocol The Real-Time Protocol (RTP) has been designed for carrying real-time multimedia data like audio or video information [RFC3550]. Multimedia data usually are produced as a continuous stream of bits. For this stream to be carried over the network, it must be packetized and sent as a sequence of packets to one (unicast) or several (multicast) receivers. For real-time traffic, UDP is preferred over TCP, as the delivery of late or lost packets (which is mandatory for TCP) may cause the presentation to stall, which is undesirable, for example, for video conferences. Instead, in case of (a few) lost packets, small artifacts may be visible or audible, which are less annoying than a complete connection breakdown or stall. At the receiver, the original sequence of RTP packets and its content are restored, and lost packets are identified. Pure RTP does not know anything about the payload content. Instead, RTP headers may be altered to fit the needs of specific applications like audio and video conferences. Such changes are then defined in so-called profile specifications. Additionally, different RTP payload formats may be defined in payload format specifications, as, for instance, is given by [RFC2190] for H.263. The RTP specification defines the following header fields:
• The RTP version (1 or 2)
• Padding and header extension flags
• Contributing sources (CSRC) count, i.e., length of the CSRC list
• A marker flag M to be used freely by profiles
• Payload type (PT), must be interpreted by the application
• Sequence number, increased by one for each new RTP packet
• Time stamp of the sampling of the first RTP payload byte
• RTP synchronization source identifier, which must be unique for concurrent RTP sessions
• An optional contributing sources list (CSRC list)
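A minimal sketch of packing the fixed 12-byte header into network byte order is shown below; the field values in the example are arbitrary, the CSRC list and header extension are omitted, and the version field is set to 2 as used by [RFC3550].

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons(), htonl() */

/* Pack the fixed 12-byte RTP header (no CSRC list, no extension) into 'hdr'.
 * Field layout follows RFC 3550: V|P|X|CC, M|PT, sequence, timestamp, SSRC. */
static void rtp_pack(uint8_t hdr[12], unsigned pt, int marker,
                     uint16_t seq, uint32_t ts, uint32_t ssrc)
{
    hdr[0] = (uint8_t)(2u << 6);                             /* version 2, P = X = 0, CC = 0 */
    hdr[1] = (uint8_t)((marker ? 0x80 : 0) | (pt & 0x7F));   /* marker flag + payload type   */

    uint16_t nseq  = htons(seq);
    uint32_t nts   = htonl(ts);
    uint32_t nssrc = htonl(ssrc);
    memcpy(hdr + 2, &nseq, 2);        /* sequence number                    */
    memcpy(hdr + 4, &nts, 4);         /* sampling time stamp                */
    memcpy(hdr + 8, &nssrc, 4);       /* synchronization source identifier  */
}

int main(void)
{
    uint8_t hdr[12];
    rtp_pack(hdr, 34, 0, 1000, 90000, 0x1234ABCD);   /* payload type value is illustrative */
    for (int i = 0; i < 12; i++)
        printf("%02x ", hdr[i]);
    printf("\n");
    return 0;
}
```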
As RTP is transported over the best-effort protocols TCP/UDP/IP, no guarantee can be made that a required bit rate is available for the real-time transport. Instead, RTP provides a means for measuring
and controlling the output bit rate and perceived quality of service of a real-time stream. This procedure is provided by the Real-Time Control Protocol (RTCP). RTCP can carry the following information:
• In a sender report, statistics for each active sender are sent to the receivers.
• In a receiver report, receivers (which are not senders) send reception statistics to the active senders.
• Sender attributes like e-mail addresses, etc. (source description).
• The request for leaving the presentation.
• Application-specific control information.
For each real-time session (transporting exactly one medium like audio or video), each participant needs two ports — one for RTP and one for RTCP. RTP is able to multiplex several sessions into one. This is done by a so-called mixer. For example, the audio data of several participants of an audio conference may be mixed into one single audio stream and sent over a connection with low bandwidth. Here, the mixer would act as a new synchronization source; the IDs of the original sources, however, may then be stored additionally after the RTP header in the list of contributing sources. Another RTP entity is a translator, which is able to change payload content or tunnel packets through a firewall.
2.9 Summary The TCP/IP suite consists of numerous protocols covering several layers of the ISO/OSI stack or, alternatively, the TCP/IP reference model. Starting at OSI layer 2, protocols are defined for link-level services and secure frame transport; IP at OSI layer 3 provides the unreliable delivery of datagrams from one host connected to the Internet to another. At OSI layer 4, transport protocols regulate either the reliable and controlled or the unreliable transport of data from process to process. Further services may alter the data due to different presentation schemes or offer direct support to applications.
References
[ADA2003] A. Adams, J. Nicholas, W. Siadak, Protocol Independent Multicast–Dense Mode (PIM-DM): Protocol Specification (Revised), IETF Internet Draft, 2003, http://www.ietf.org/Internet-drafts/draft-ietf-pim-dm-new-v2-04.txt.
[ALB2004] Z. Albanna et al., IANA Guidelines for IPv4 Multicast Address Assignments, IETF Internet Draft, 2004, http://www.ietf.org/Internet-drafts/draft-ietf-mboned-rfc3171bis-01.txt.
[COL2001] G. Coulouris, J. Dollimore, T. Kindberg, Distributed Systems, 3rd edition, Addison-Wesley, Boston, 2001.
[HAL1996] F. Halsall, Data Communications, Computer Networks and Open Systems, 4th edition, Addison-Wesley, Reading, MA, 1996.
[IANA] Internet Assigned Numbers Authority, http://www.iana.org/.
[IANAM] Internet Assigned Numbers Authority, Internet Multicast Addresses, http://www.iana.org/assignments/multicast-addresses.
[ISO7498] ISO/IEC 7498-1, Information Technology–Open Systems Interconnection: Basic Model, ISO, 1994.
[KUR2001] J.F. Kurose, K.W. Ross, Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, Reading, MA, 2001.
[PET2000] L.L. Peterson, B.S. Davie, Computer Networks: A Systems Approach, 2nd edition, Morgan Kaufmann, San Francisco, 2000.
[PUS2003] T. Pusateri, Distance Vector Multicast Routing Protocol, Version 3, IETF Internet Draft, 2003, http://www.ietf.org/Internet-drafts/draft-ietf-idmr-dvmrp-v3-11.txt.
[REI2001] U. Reimers, Digital Video Broadcasting, Springer, Berlin, 2001.
[RFC768] RFC 768, User Datagram Protocol, IETF, 1980, http://www.ietf.org/rfc/rfc0768.txt.
[RFC791] RFC 791, Internet Protocol: DARPA Internet Program Protocol Specification, DARPA, 1981, http://www.ietf.org/rfc/rfc791.txt.
[RFC792] RFC 792, Internet Control Message Protocol, DARPA, 1981, http://www.ietf.org/rfc/rfc792.txt.
[RFC793] RFC 793, Transmission Control Protocol, DARPA, 1981, http://www.ietf.org/rfc/rfc793.txt.
[RFC821] RFC 821, Simple Mail Transfer Protocol, IETF, 1982, http://www.ietf.org/rfc/rfc821.txt.
[RFC826] RFC 826, An Ethernet Address Resolution Protocol, IETF, 1982, http://www.ietf.org/rfc/rfc826.txt.
[RFC854] RFC 854, Telnet Protocol Specification, IETF, 1983, http://www.ietf.org/rfc/rfc854.txt.
[RFC887] RFC 887, Resource Location Protocol, IETF, 1983, http://www.ietf.org/rfc/rfc887.txt.
[RFC903] RFC 903, A Reverse Address Resolution Protocol, IETF, 1984, http://www.ietf.org/rfc/rfc903.txt.
[RFC959] RFC 959, File Transfer Protocol (FTP), IETF, 1985, http://www.ietf.org/rfc/rfc959.txt.
[RFC977] RFC 977, Network News Transfer Protocol: A Proposed Standard for the Stream-Based Transmission of News, IETF, 1986, http://www.ietf.org/rfc/rfc977.txt.
[RFC1034] RFC 1034, Domain Names: Concepts and Facilities, IETF, 1987, http://www.ietf.org/rfc/rfc1034.txt.
[RFC1035] RFC 1035, Domain Names: Implementation and Specification, IETF, 1987, http://www.ietf.org/rfc/rfc1035.txt.
[RFC1075] RFC 1075, Distance Vector Multicast Routing Protocol, IETF, 1988, http://www.ietf.org/rfc/rfc1075.txt.
[RFC1112] RFC 1112, Host Extensions for IP Multicasting, IETF, 1989, http://www.ietf.org/rfc/rfc1112.txt.
[RFC1122] RFC 1122, Requirements for Internet Hosts: Communication Layers, IETF, 1989, http://www.ietf.org/rfc/rfc1122.txt.
[RFC1157] RFC 1157, A Simple Network Management Protocol (SNMP), IETF, 1990, http://www.ietf.org/rfc/rfc1157.txt.
[RFC1323] RFC 1323, TCP Extensions for High Performance, IETF, 1992, http://www.ietf.org/rfc/rfc1323.txt.
[RFC1584] RFC 1584, Multicast Extensions to OSPF, IETF, 1994, http://www.ietf.org/rfc/rfc1584.txt.
[RFC1853] RFC 1853, IP in IP Tunneling, IETF, 1995, http://www.ietf.org/rfc/rfc1853.txt.
[RFC1866] RFC 1866, Hypertext Markup Language: 2.0, IETF, 1995, http://www.ietf.org/rfc/rfc1866.txt.
[RFC1883] RFC 1883, Internet Protocol, Version 6 (IPv6) Specification, IETF, 1995, http://www.ietf.org/rfc/rfc1883.txt.
[RFC1887] RFC 1887, An Architecture for IPv6 Unicast Address Allocation, IETF, 1995, http://www.ietf.org/rfc/rfc1887.txt.
[RFC1918] RFC 1918, Address Allocation for Private Internets, IETF, 1996, http://www.ietf.org/rfc/rfc1918.txt.
[RFC1945] RFC 1945, Hypertext Transfer Protocol: HTTP/1.0, IETF, 1996, http://www.ietf.org/rfc/rfc1945.txt.
[RFC2190] RFC 2190, RTP Payload Format for H.263 Video Streams, IETF, 1997, http://www.ietf.org/rfc/rfc2190.txt.
[RFC2205] RFC 2205, Resource ReSerVation Protocol (RSVP), IETF, 1997, http://www.ietf.org/rfc/rfc2205.txt.
[RFC2211] RFC 2211, Specification of the Controlled-Load Network Element Service, IETF, 1997, http://www.ietf.org/rfc/rfc2211.txt.
[RFC2212] RFC 2212, Specification of Guaranteed Quality of Service, IETF, 1997, http://www.ietf.org/rfc/rfc2212.txt.
[RFC2236] RFC 2236, Internet Group Management Protocol, Version 2, IETF, 1997, http://www.ietf.org/rfc/rfc2236.txt.
[RFC2362] RFC 2362, Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification, IETF, 1998, http://www.ietf.org/rfc/rfc2362.txt.
[RFC2365] RFC 2365, Administratively Scoped IP Multicast, IETF, 1998, http://www.ietf.org/rfc/rfc2365.txt.
[RFC2373] RFC 2373, IP Version 6 Addressing Architecture, IETF, 1998, http://www.ietf.org/rfc/rfc2373.txt.
[RFC2460] RFC 2460, Internet Protocol, Version 6 (IPv6) Specification, IETF, 1998, http://www.ietf.org/rfc/rfc2460.txt.
[RFC2616] RFC 2616, Hypertext Transfer Protocol: HTTP/1.1, IETF, 1999, http://www.ietf.org/rfc/rfc2616.txt.
[RFC2750] RFC 2750, RSVP Extensions for Policy Control, IETF, 2000, http://www.ietf.org/rfc/rfc2750.txt.
[RFC2770] RFC 2770, GLOP Addressing in 233/8, IETF, 2000, http://www.ietf.org/rfc/rfc2770.txt.
[RFC2784] RFC 2784, Generic Routing Encapsulation (GRE), IETF, 2000, http://www.ietf.org/rfc/rfc2784.txt.
[RFC2909] RFC 2909, The Multicast Address-Set Claim (MASC) Protocol, IETF, 2000, http://www.ietf.org/rfc/rfc2909.txt.
[RFC3376] RFC 3376, Internet Group Management Protocol Version 3, IETF, 2002, http://www.ietf.org/rfc/rfc3376.txt.
[RFC3550] RFC 3550, RTP: A Transport Protocol for Real-Time Applications, IETF, 2003, http://www.ietf.org/rfc/rfc3550.txt.
[SCO1991] T. Socolofsky, C. Kale, A TCP/IP Tutorial, IETF, Network Working Group, RFC 1180, January 1991, http://www.ietf.org/rfc/rfc1180.txt.
[SUN1990] Network Programming, Sun Microsystems, Inc., Mountain View, CA, March 1990.
[TAN1996] A.S. Tanenbaum, Computer Networks, 3rd edition, Prentice Hall, 1996.
[X680] Information Technology: Abstract Syntax Notation One (ASN.1): Specification of Basic Notation, ITU-T Recommendation X.680 (1997), ISO/IEC 8824-1, 1998.
3
A Perspective on Internet Routing: IP Routing Protocols and Addressing Issues

Lucia Lo Bello
University of Catania

3.1 Introduction
3.2 Routing and Routers
3.3 Routing Algorithm Design Issues
    Optimality • Convergence • Scalability • Robustness • Flexibility and Stability • Simplicity
3.4 Classification of Routing Protocols
    Static or Dynamic • Global or Decentralized • Link State or Distance Vector • Single Path or Multipath • Flat and Hierarchical • Intra-AS and Inter-AS • Unicast and Multicast
3.5 IP Unicast Routing: Interior and Exterior Gateway Protocols
    Interior Gateway Protocols for IP Networks • Exterior Gateway Protocols for IP Internetworks
3.6 IP Multicast Routing
    Distance-Vector Multicast Routing Protocol • Multicast OSPF • Protocol-Independent Multicast • Core-Based Tree • Interdomain IP Multicast Routing
3.7 IP Addressing and Routing Issues
    Classful IP Addressing • Impact of IP Addressing on Routing Tables and Internet Scalability • Subnetting • Variable-Length Subnet Masks • Classless Interdomain Routing
3.8 IPv6 Overview
    IPv6 Addressing, Subnetting, and Routing • IPv6 Deployment in the Current Internet: State of the Art and Migration Issues
3.9 Conclusions
References
3.1 Introduction This chapter addresses routing from a broad perspective. After an introduction on routing algorithm principles, an overview of the routing protocols currently used in the Internet domain is presented. Both unicast and multicast routing are dealt with. The chapter then focuses on the strict correlation between Internet Protocol (IP) routing and addressing.
The impact of the traditional classful addressing scheme on the size of routing tables for Internet routers and its poor scalability in today’s Internet is discussed. Then, classless interdomain routing (CIDR), which, for the time being, has solved the problems previously mentioned, is presented and discussed. Finally, the next-generation IP, which represents the long-term solution to the problems of the current Internet, is introduced. IP version 6 (IPv6) is outlined and issues on the IPv4-to-IPv6 transition addressed.
3.2 Routing and Routers Two or more networks joined together form an internetwork, where network layer routing protocols implement path determination and packet switching. Path determination consists of choosing which path (or route) the packets are to follow from a source to a destination node, while packet switching refers to transporting them. Path determination is accomplished by routing algorithms that, given a set of routers and links connecting them, determine the best (i.e., least-cost) path from source to destination, according to a given cost metric. A router is a specialized network computing device, similar to a computer, but optimized for packet switching. It typically contains memory (ROM, RAM, Flash) and some kind of bus and is equipped with an operating system (OS), a configuration, and a user interface. As happens inside a computer, in a router a boot process loads bootstrap code from the ROM, thus enabling the device to load its operating system and configuration into the memory. A significant difference between a router and a computer lies in the user interface and memory configuration. While DOS or UNIX systems typically have one physical bank of memory chips that will be allocated by the software to different functions, routers feature several distinct banks of memory, each dedicated to a different function. In many routers, OSs are stripped-down schedulers derived from early versions of Free BSD (Berkeley Software Distribution) UNIX. A growing interest in the Linux OS has recently appeared. Some vendors run proprietary OSs on their routers (for example, Cisco routers run the Internetwork Operating System (IOS), which embeds a broad set of functions). A router’s task is to switch IP packets between interconnected networks. In order to allow the calculation of the best path for individual packets, routing protocols enable routers to communicate with each other, exchanging both topology information (e.g., about neighbors and routes) and state information (e.g., costs), which are fed into routing tables. A routing table consists of a list of routing entries indicating which outgoing link should be used to forward packets to a given destination. Figure 3.1 shows a simplified routing table. When a router receives an incoming packet, it checks the routing table to find a destination/nexthop association for the destination address specified in the packet. The routing table data structure contains all the information necessary to forward an IP data packet toward its destination. When forwarding an IP data packet, the routing table entry providing the best match for the packet’s IP destination is chosen.
Destination Network    Next Router        # of Hops to Destination    Interface
205.219.0.0            205.219.5.2        —                           Ethernet 0
151.5.0.0              160.4.2.5          5                           Ethernet 0
Default                193.55.114.128     2                           Ethernet 1

FIGURE 3.1 A simplified routing table.
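The destination/next-hop lookup described above can be sketched as follows. The network masks and the longest-match tie-breaking rule are assumptions made for illustration (Figure 3.1 does not show masks); the entries themselves are modeled on the figure.

```c
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

/* One routing table entry: destination network, network mask, next-hop address.
 * The masks below are illustrative, not taken from Figure 3.1 verbatim. */
struct route {
    uint32_t network;   /* in host byte order */
    uint32_t mask;
    const char *next_hop;
};

/* Select the entry whose network matches the destination with the longest mask
 * (the "best match" rule described above); the all-zero entry acts as default. */
static const char *lookup(const struct route *tab, int n, uint32_t dst)
{
    const char *best = "unreachable";
    uint32_t best_mask = 0;
    int found = 0;
    for (int i = 0; i < n; i++) {
        if ((dst & tab[i].mask) == tab[i].network &&
            (!found || tab[i].mask > best_mask)) {
            best = tab[i].next_hop;
            best_mask = tab[i].mask;
            found = 1;
        }
    }
    return best;
}

int main(void)
{
    struct route table[] = {
        { 0xCDDB0000u, 0xFFFF0000u, "205.219.5.2 (Ethernet 0)"    },  /* 205.219.0.0    */
        { 0x97050000u, 0xFFFF0000u, "160.4.2.5 (Ethernet 0)"      },  /* 151.5.0.0      */
        { 0x00000000u, 0x00000000u, "193.55.114.128 (Ethernet 1)" }   /* default route  */
    };
    uint32_t dst = ntohl(inet_addr("205.219.7.1"));
    printf("next hop for 205.219.7.1: %s\n", lookup(table, 3, dst));
    return 0;
}
```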
3.3 Routing Algorithm Design Issues While designing routing algorithms, several issues have to be addressed. The main ones are listed and discussed below.
3.3.1 Optimality Optimality is the ability to find the optimal (i.e., least-cost) path from source to destination according to a given metric. A single metric may be used, or a combination of multiple metrics in a single hybrid one. The most common routing metric is path length, usually expressed in terms of hop count, i.e., the number of hops that a packet must make on its path from a source to a destination. Alternatively, a network administrator may assign arbitrary costs (expressed as integer values) to each network link and calculate the path length as the sum of the costs associated with each link traversed. These costs account for several link features, such as:
• Bandwidth.
• Routing delay: The length of time required to move a packet from source to destination through the internetwork.
• Load: The degree of utilization of a router, obtained by monitoring variables such as CPU utilization or packets routed per second.
• Reliability: Accounts for the link’s fault probability or recovery time in the event of failure.
• Monetary cost: Companies may prefer to send packets over their own lines, even though slower, rather than through faster, but expensive external lines that charge money for usage time.
Different routing protocols generally adopt different metrics and algorithms that are not compatible with each other. As a result, in a network where multiple routing protocols are present, a way to determine the best path across the multiple protocols has to be found. Each routing protocol is therefore labeled with an integer value that defines the trustworthiness of the protocol, called the administrative distance. When there are multiple different routes to the same destination from two different routing protocols, routers will select the route supplied by the protocol with the shortest administrative distance.
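The selection rule can be sketched as follows; the administrative distance values in the example are illustrative defaults, not values prescribed by the text.

```c
#include <stdio.h>

/* A candidate route to the same destination, as learned by one routing protocol. */
struct candidate {
    const char *protocol;
    int admin_distance;   /* trustworthiness of the protocol (lower is preferred)          */
    int metric;           /* the protocol's own metric (only comparable within a protocol) */
};

/* Prefer the candidate with the lowest administrative distance; use the metric
 * only to break ties between routes supplied with the same distance. */
static const struct candidate *select_route(const struct candidate *c, int n)
{
    const struct candidate *best = &c[0];
    for (int i = 1; i < n; i++) {
        if (c[i].admin_distance < best->admin_distance ||
            (c[i].admin_distance == best->admin_distance && c[i].metric < best->metric))
            best = &c[i];
    }
    return best;
}

int main(void)
{
    /* Distance values below are illustrative defaults. */
    struct candidate c[] = {
        { "RIP",  120, 3  },
        { "OSPF", 110, 40 },
    };
    printf("installed route comes from %s\n", select_route(c, 2)->protocol);   /* OSPF */
    return 0;
}
```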
3.3.2 Convergence When a router detects a topology change (e.g., new routes being added, existing routes changing state, etc.), this information must be propagated through the network and a new routing topology calculated. Routers achieve this by distributing routing update messages to the other routers, thus stimulating recalculation of optimal routes and eventually causing all routers to agree on these routes. The time taken to detect changes in the network topology, reconfigure the topology correctly, and agree, called convergence time, is a very important characteristic of routing algorithms. Slow convergence should be avoided, as it may entail network interruption or routing loops. These occur when, due to slow convergence, a packet arriving at a router A or B and destined for a router C bounces back and forth between these two routers until either convergence is reached or the packet has been switched the maximum number of times allowed. Convergence time may depend either on the network topology and size (e.g., number of routers, link speeds, routing delays) or on the routing protocol used and the setting of the relevant timing parameters.
3.3.3 Scalability Routing algorithms that behave well in small systems should also scale well in larger internetworks. Unfortunately, some routing algorithms (such as those based on heavy flooding techniques), while performing well in small networks, are not suitable for use in large-size ones.
3.3.4 Robustness Routing algorithms should perform correctly even in the presence of unusual or unforeseen events (e.g., router failures, misbehavior, sabotage).
3.3.5 Flexibility and Stability When responding to network changes (e.g., in bandwidth, router queue size, network delay), routing algorithms should exhibit flexible, but stable behavior.
3.3.6 Simplicity In order to reduce the overhead on routers (in terms of both processing and storage), routing algorithms should be as simple as possible. Moreover, they have to exploit system resources efficiently (especially when executing on resource-constrained hosts).
3.4 Classification of Routing Protocols Routing protocols may be classified according to several different characteristics [Cisco03][Kenyon02][Kurose01]. Here we will address the most relevant. As will be seen, routing protocols may be static or dynamic, global or decentralized, link state or distance vector, single path or multipath, flat or hierarchical, intra-AS or inter-AS, unicast or multicast.
3.4.1 Static or Dynamic Static algorithms are based on fixed tables. Static routes seldom change, and when changes do occur, it is usually a result of human intervention (i.e., editing a router’s forwarding table). Static routing algorithms are simple, introduce a low overhead, and are suitable for environments where network traffic is stable and predictable. Static routing is commonly adopted where there is no need for an alternative path (for example, in permanent point-to-point wide area network (WAN) links to remote sites or dial-up Integrated Services Digital Network (ISDN) lines). Dynamic routing algorithms automatically generate the routing paths responding to the network traffic or topology changes. When a change occurs, the routing algorithm running on a router recalculates routes, reflects changes in the routing table, and then propagates updates throughout the network, thus stimulating recalculation in the other routers as well. Dynamic algorithms sometimes have static routes inserted in their routing tables. This is the case, for instance, of default routers, to which all traffic should be forwarded when the destination address is unknown (i.e., not explicitly listed in the routing table). For example, the last entry in Figure 3.1 indicates the default router, to which all traffic should be forwarded when the destination address is not explicitly listed in the routing table.
3.4.2 Global or Decentralized A global algorithm makes the routing decision on the basis of complete information about the network, in terms of connectivity and link costs. The calculation of the best path can be up to a single site or replicated over multiple ones. In a decentralized routing algorithm no site has complete knowledge of the network, and route calculation is iterative and distributed. Each node only knows the status of the links directly connected to it. This information is then distributed to its neighbors, i.e., nodes directly connected to it, and this iterative process of route calculations and exchanges enables a node to determine the least-cost path to a destination.
3.4.3 Link State or Distance Vector Link-state algorithms (also called shortest-path-first algorithms) compute the least-cost path using complete, global knowledge of the network in terms of connectivity and link costs. Each router maintains a complete copy of the topology database in its routing table and floods routing information to all the nodes in the internetwork. At the beginning, each router will only know about its neighbors, but it will increase its knowledge through link-state broadcasts received from all the other routers. The router does not send the entire routing table, but only the portion of the table that describes the state of its own links.
Distance-vector algorithms (also known as minimum-hop or Bellman–Ford algorithms) require each router to keep track of its distance (hop count) from all other possible destinations. Each node receives information from its directly connected neighbors, calculates the routing table, and then distributes the results back to the neighbors. When a change is detected in the link cost from a node to a neighbor, the router first updates its distance table and then, if the change also affects the cost of the least-cost path, notifies its neighbors. Distance-vector algorithms are distributed and iterative, as the routing table distribution process goes on until no more exchanges with neighbors occur. Routers can identify new destinations as they come into the network, learn of failures in the network, and calculate distances to all known destinations. Each router advertises on a regular basis all the destinations it is aware of with the relevant distances and sends update messages containing all the information maintained in the routing table to neighboring routers on directly connected segments. Each router can therefore build a detailed picture of the network topology by analyzing routing updates from all other routers. The best route for each destination is determined according to a minimum-distance (minimum-hop) rule.
Table 3.1 compares link-state and distance-vector routing algorithms. As all the routers share the same knowledge of the network in link-state algorithms, they have a consistent view of the best path to a given destination. This would entail a sudden change in the load on the least-cost link, and even congestion if all the routers decided to send their packets through that link at the same time.

TABLE 3.1 Link-State vs. Distance-Vector Routing Algorithms
• Type: link state is global; distance vector is decentralized.
• Route advertising: link state "tells the world about the neighbors," i.e., a router does not send the entire routing table, but only the portion of the table that describes the state of its own links; distance vector "tells all neighbors about the world," i.e., routers tend to distribute the entire routing table (or large portions of it) to their directly attached neighbors only.
• Robustness: link state is more robust, as each router autonomously calculates its routing table; with distance vector, an incorrect node calculation can be spread over the entire network.
• Convergence: fast for link state; slow for distance vector (routing loops may occur).
• Scalability: good for link state; poor for distance vector.
• Responsiveness to network changes: high for link state; for distance vector, "good news propagates fast, bad news propagates slowly," i.e., a decrease in the cost of the best path propagates fast, while an increase goes slowly.
• Message overhead: high for link state, since any change in a link cost entails the need to send all nodes the new cost; low for distance vector, since when link costs change, the results of the change will be propagated only if the latter entails a change in the least-cost path for one of the nodes attached to that link.
• Implementation complexity: high for link state; low for distance vector.
• Processor and memory requirements: high for link state; low for distance vector.
• Stability: problematic for link state, due to oscillation in routes; good for distance vector.
A way to avoid such oscillations would be to ensure that all the routers do not run the algorithm at the same time. However, it has been noted that routers on the Internet can self-synchronize. Even though they initially execute the routing algorithm at the same rate, but at different times, the algorithm execution instance will eventually become synchronized at the routers [Floyd97]. To deal with this problem, randomization is introduced into the period between the execution instants of the algorithm at each router.
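The distance-vector update step described in Section 3.4.3 can be sketched as follows. The data layout is illustrative, and the value 16 is used as "infinity" in the style of RIP-like protocols (see Section 3.5).

```c
#include <stdio.h>

#define N_DEST 4
#define INF    16   /* "infinity", as used by RIP-like protocols */

/* Process a distance vector received from a directly connected neighbor.
 * my_dist[d]  : this router's current distance to destination d
 * next_hop[d] : neighbor currently used to reach d
 * via_cost    : cost of the link to the advertising neighbor
 * adv[d]      : neighbor's advertised distance to d
 * Returns 1 if the local table changed (and should be re-advertised). */
static int dv_update(int my_dist[N_DEST], int next_hop[N_DEST],
                     int neighbor_id, int via_cost, const int adv[N_DEST])
{
    int changed = 0;
    for (int d = 0; d < N_DEST; d++) {
        int through_neighbor = via_cost + adv[d];
        if (through_neighbor > INF)
            through_neighbor = INF;
        if (through_neighbor < my_dist[d] || next_hop[d] == neighbor_id) {
            /* Either a shorter path was found, or the route already goes through
             * this neighbor and must track its (possibly worse) new distance.   */
            if (my_dist[d] != through_neighbor || next_hop[d] != neighbor_id) {
                my_dist[d] = through_neighbor;
                next_hop[d] = neighbor_id;
                changed = 1;
            }
        }
    }
    return changed;
}

int main(void)
{
    int dist[N_DEST]     = { 0, 5, INF, 7 };
    int next_hop[N_DEST] = { -1, 2, -1, 2 };
    int adv[N_DEST]      = { 3, 1, 2, 9 };         /* vector received from neighbor 1, link cost 1 */
    int changed = dv_update(dist, next_hop, 1, 1, adv);
    printf("changed=%d, distance to destination 2 is now %d\n", changed, dist[2]);   /* 3 */
    return 0;
}
```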
3.4.4 Single Path or Multipath Routing protocols may be single path or multipath. The difference lies in the fact that multipath algorithms support multiple entries for the same destination in the routing table, while single path ones do not. The presence of alternative routes in multipath routing protocols allows traffic to be multiplexed over several circuits (LAN (local area network) or WAN), thus providing not only greater throughput and topological robustness, but also support for load balancing (i.e., splitting traffic between paths that have equal costs). In multipath algorithms, multiplexing may be packet based or session based. In the former case, a round-robin technique is typically used. In the latter case, load sharing is performed on a session basis, typically using a source–destination or destination hash function.
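Session-based load sharing can be sketched as follows; the hash function is an arbitrary illustrative choice, not one mandated by any routing protocol.

```c
#include <stdio.h>
#include <stdint.h>

/* Pick one of 'n_paths' equal-cost paths for a packet so that all packets of the
 * same source/destination pair follow the same path (session-based load sharing).
 * The hash below is an illustrative choice. */
static unsigned pick_path(uint32_t src_ip, uint32_t dst_ip, unsigned n_paths)
{
    uint32_t h = src_ip ^ dst_ip;
    h ^= h >> 16;              /* mix the halves so small address differences matter */
    return h % n_paths;
}

int main(void)
{
    uint32_t src = 0xC0A80001u;   /* 192.168.0.1 (example address) */
    uint32_t dst = 0xC0A80102u;   /* 192.168.1.2 (example address) */
    printf("flow is pinned to path %u of 2\n", pick_path(src, dst, 2));
    return 0;
}
```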
3.4.5 Flat and Hierarchical Another distinction can be made between flat and hierarchical routing algorithms. In a flat routing algorithm, all routers are peers. Each router is indistinguishable from another as they all execute the same algorithm to compute routing paths through the entire network. This flat model suffers from two main problems: lack of scalability and poor administrative autonomy. The first derives from the growing computational, storing, and communication overhead the algorithm introduces when the number of routers becomes large. The second arises from the need to hide some features of a company’s internal network from the outside, a crucial requirement for enterprise networks. In hierarchical routing, logical groups of nodes, called domains or autonomous systems (ASs), are defined. A routing domain is a collection of routers that coordinate their routing knowledge using a single routing protocol. An AS is a routing domain that is administered by one authority (person or group). Each AS requires a registered AS number (ASN) to connect to the Internet. According to this hierarchy, some routers in an AS can communicate with routers in other ASs, while others can communicate only with routers within their own AS. In very large networks, additional hierarchical levels may exist, with routers at the highest hierarchical level forming a routing backbone (as shown in Figure 3.2). Packets from nonbackbone routers are conveyed along the backbone by backbone routers until a backbone router connected to the destination AS is found. Then packets are sent through one or more nonbackbone routers within the AS until the ultimate destination is reached. Compared to flat routing, a drawback of hierarchical routing is that suboptimal paths may sometimes be found. Nevertheless, hierarchical routing offers several advantages over flat routing. First, the amount of information maintained and exchanged by routers is reduced, and this increases the speed of route calculation, thus allowing faster convergence. Second, unlike flat routing, where a single router problem can affect all routers in the network, in a hierarchical algorithm the scope of router misbehavior is limited. This increases overall network availability. In addition, the existence of boundary interfaces between different levels in the hierarchy can be exploited to enforce security policies (e.g., access control lists or firewalls) on border routers. Hierarchical routing also improves scalability and protocol upgrades, thus making the task of the network manager easier. Thanks to the above-mentioned advantages, large companies generally adopt hierarchical routing.
3.4.6 Intra-AS and Inter-AS A very large internetwork in the IP domain is typically organized as a collection of ASs. An AS can be composed of one or more areas, made up of contiguous nodes and networks that may be further split
FIGURE 3.2 A routing architecture with three ASs connected through backbone routers.
into subnetworks. Within an area, network devices equipped with the capability to forward packets between subnetworks are called intermediate systems (ISs). ISs may be further classified into those that can communicate within routing areas only (intra-area ISs) and those that can communicate both within and between routing areas (inter-area ISs). Autonomous system border routers (ASBRs) on the backbone are entrusted with routing traffic between different ASs, while area border routers (ABRs) deal with traffic between different areas within the same AS. The routing protocol used within an AS (an intra-AS routing protocol) is commonly referred to as an Interior Gateway Protocol (IGP). A separate protocol is used to interface among the ASs, called the Exterior Gateway Protocol (EGP). EGPs are usually referred to as inter-AS routing protocols. All routers within each AS will run one or more IGPs. Routing information between ASs is exchanged through the routing backbone via an EGP. The use of an EGP limits the amount of routing information exchanged among the three ASs and allows them to be managed differently. The main differences between intra-AS routing protocols (IGPs) and inter-AS routing protocols (EGPs) can be summarized in terms of policy, scalability, and performance. When dealing with inter-AS routing, enforcing policy is crucial. For example, traffic originating from a given AS might be required not to pass through another specific AS. On the other hand, policy is less critical within an AS, where everything is under the control of a single administrative entity. Scalability represents a more critical requirement in inter-AS routing, where a large number of internetworks may be involved, than in intra-AS routing. Conversely, performance is more important within an AS than in inter-AS, where it is of secondary importance compared to policy. For instance, an EGP may prefer a more costly path to another one if the former complies with certain policy criteria that the second does not fulfill.
3.4.7 Unicast and Multicast Routing protocols involving just one sender and one receiver are called unicast protocols. However, several applications require addressing packets from one source to multiple destinations. This is the case, for instance, of applications that distribute identical information to multiple users. In this case, multicast routing is required. Multicast routing enables sending a packet from one source to multiple recipients with a single operation. Multicast addresses are present in IP. A multicast address is a single identifier for a group of receivers, which are thus members of a multicast group. To deploy multicast routing, two approaches are possible. In the first one, there is no explicit multicast support at the network layer, but an emulation using multiple
point-to-point unicast connections. This means that each application-level data unit is passed to the transport layer and here duplicated and transmitted over individual unicast network layer connections. In the second option, explicit multicast support is provided: a single packet is transmitted by the source and then replicated at a network router, i.e., forwarded over multiple outgoing links, to reach the destinations. The advantage of the second approach is that there is a more efficient use of bandwidth, as only one copy of a packet will cross a link. However, this approach does have a cost. In the Internet, for instance, multicast is not connectionless, as routers on a multicast connection have to maintain state information. This entails a combination of routing and signaling in order to establish, maintain, and tear down connection state in the routers. Compared to unicast routing, where the focus is on the destination of a packet, multicast routing is backward oriented: multicast routing packets are transmitted from a source to multiple destinations through a spanning tree. Multicast IP routing and relevant protocols will be addressed in Section 3.6. Unicast IP routing is addressed in Section 3.5.
3.5 IP Unicast Routing: Interior and Exterior Gateway Protocols With reference to unicast IP routing, among the IGPs that support IP there are [Lewis99]:
• The Routing Information Protocol (RIPv1 and RIPv2)
• The Cisco Interior Gateway Routing Protocol (IGRP)
• The Cisco Enhanced Interior Gateway Routing Protocol (EIGRP)
• The Open Shortest-Path-First Protocol (OSPF)
• The Intermediate System-to-Intermediate System Protocol (IS-IS)
EGPs in the IP domain include:
• The Exterior Gateway Protocol (EGP)
• The Border Gateway Protocol (BGP)
In the following, the different protocols in each class will be described and compared.
3.5.1 Interior Gateway Protocols for IP Networks 3.5.1.1 Distance-Vector IGPs This section describes and compares two popular distance-vector protocols supporting IP: RIP and IGRP. 3.5.1.1.1 Routing Information Protocol The Routing Information Protocol (RIP) was one of the first IGPs and has been used for routing computations in computer networks since the early days of the ARPANET. Formally defined in the XNS (Xerox Network Systems) Internet Transport Protocols publications (1981), its widespread use was favored by its inclusion, as the routed process, in the Berkeley Software Distribution (BSD) version of UNIX supporting Transmission Control Protocol (TCP)/IP (1982). Two RIP versions exist: version 1 [Hedri88] and version 2 [Malkin97]. In RIP, each router sends a complete copy of its entire routing table to all its neighbors on a regular basis (typical RIP update timer = 30 seconds). A single RIP routing update contains up to 25 route entries within the AS. Each entry contains the destination address of a host or network, the IP address of the next-hop, the distance to the destination (in hops), and the interface. To obtain the cost to a given destination, a router can also send RIP request messages. After receiving an update, a router compares the new information with the information it already possesses. If the routing update includes a new destination network, it is added to the routing table. If the router receives a route to an existing destination with a lower metric, it replaces the current entry with the new one. If an entry in the update message has the same next-hop as the current route entry, but a different metric, the new metric will be used to update the routing table.
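The three update rules described above can be sketched as follows. The structures and the string-based addresses are simplifications for illustration; a real implementation would also handle the timers and triggered updates discussed below.

```c
#include <stdio.h>
#include <string.h>

#define RIP_INFINITY 16
#define MAX_ROUTES   64

struct rip_route {
    char dest[16];      /* destination network, e.g. "151.5.0.0" */
    char next_hop[16];
    int  metric;        /* hop count */
};

static struct rip_route table[MAX_ROUTES];
static int n_routes = 0;

/* Apply one received route entry, advertised by 'neighbor' one hop away,
 * following the rules described above. */
static void rip_process_entry(const char *dest, const char *neighbor, int adv_metric)
{
    int metric = adv_metric + 1;                  /* count the hop to the neighbor */
    if (metric > RIP_INFINITY)
        metric = RIP_INFINITY;

    for (int i = 0; i < n_routes; i++) {
        if (strcmp(table[i].dest, dest) != 0)
            continue;
        if (metric < table[i].metric ||                    /* lower metric: replace entry   */
            strcmp(table[i].next_hop, neighbor) == 0) {    /* same next hop: track new metric */
            table[i].metric = metric;
            strcpy(table[i].next_hop, neighbor);
        }
        return;
    }
    /* New destination network: add it to the routing table. */
    strcpy(table[n_routes].dest, dest);
    strcpy(table[n_routes].next_hop, neighbor);
    table[n_routes].metric = metric;
    n_routes++;
}

int main(void)
{
    rip_process_entry("151.5.0.0", "160.4.2.5", 4);   /* new destination           */
    rip_process_entry("151.5.0.0", "160.4.2.9", 2);   /* lower metric: replaces it */
    printf("%s via %s, %d hops\n", table[0].dest, table[0].next_hop, table[0].metric);
    return 0;
}
```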
FIGURE 3.3 A simplified internetwork used to explain split horizon with poison reverse.
If a router does not hear from a neighbor for a given time interval, called a dead interval (an invalid timer is used for this purpose), it assumes that the neighbor is no longer available (either down or unreachable). As a result, the router modifies its local routing table and then notifies its neighbors of the unavailable route. After another predefined time interval (a flush timer is set for this purpose), if nothing is heard from the route, the information is flushed from the router’s routing table. Routers allow for configuration of an active or passive RIP mode on specific interfaces. The active mode means full routing capability, while passive means a listen-only mode; that is, no RIP updates are sent out. To speed up convergence, RIP uses triggered updates. That is, whenever a RIP router learns of a change, such as a link becoming unavailable, it sends out a triggered update immediately, rather than waiting for the next announcement interval. As it takes time for triggered updates to get to all the other routers in large networks, a gateway that has not yet received the triggered update may issue a regular update at the wrong time, thus causing a bad route to be reinserted in a neighbor that has already received the triggered update. In order to prevent new routes from reinstating an old link, a hold-down period is enforced in the protocol: when a route is removed, no update for that route will be accepted for a given period, until the topology becomes stable. Hold-downs have the drawback of slowing convergence. Besides hold-down, RIP implements other techniques to avoid routing loops between adjacent routers, called split horizon with poison reverse. The general split-horizon algorithm prevents routes from being propagated back to the source, i.e., down the interface from which they were learned. As an example, consider the case in Figure 3.3. During normal operations, router A will notify router B that it has a route to network 1. According to the split-horizon algorithm, when B sends updates to A, it will not mention network 1. Now let us assume that the router A interface to network 1 goes down. Without split horizon, router B would inform router A that it can get to network 1. Since it no longer has a valid route, router A might select that route. In this case, A and B would both have routes to 1. But this would result in a circular route, where A points to B and B points to A. Using split horizon with poison reverse, instead of not advertising routes to the source, routes are advertised back to the source with a cost of infinity (i.e., 16), which will make the source router ignore the route. On the whole, RIP is quite robust and very easy to set up. It is the only routing protocol that UNIX nodes universally understand and is therefore commonly used in UNIX environments. RIP is also commonly used in end-system routing as a dynamic router discovery protocol. Dynamic router discovery is an alternative to static configurations, which allows hosts to dynamically locate routers when they have to access devices external to the local network. RIP was designed to work with moderate-size networks using reasonably homogeneous technology. This makes it suitable for local networks featuring a small number of routers (about a dozen) and links with equal characteristics. RIP has a poor degree of scalability, so it is not recommended for use in more complex environments. RIP cannot be used to build a backbone larger than 15 hops in diameter,* as it
*RIP is usually configured in such a way that a cost of 1 is used for the outbound link. If a network administrator chooses to use larger costs, the upper bound of 15 can be quickly reached and will become a problem.
specifies a maximum hop count of 15. A number of hops equal to 16 corresponds to infinity. This is useful to prevent packets that get stuck in routing loops from being constantly switched back and forth between routers. As it uses a simple, fixed metric to compare alternative routes, RIP can generate suboptimal routing tables, resulting in packets sent over slow (or costly) links even in the presence of better choices. RIP is therefore not suitable for environments where routes need to be chosen based on dynamically varying parameters such as measured delay, reliability, or load (RIP does not support load balancing). RIP version 1 (RIPv1) lacks variable-length subnet mask (VLSM) [Brade87] support, so RIPv1 is classful. As will be explained in Section 3.7, this may seriously deplete the available address space to the detriment of scalability. On the other hand, RIP version 2 (RIPv2) is classless. VLSM, classful, and classless concepts will be addressed in detail in Section 3.7. RIPv1 uses a broadcast mode to advertise and request routes, while RIPv2 has the ability to send routing updates via multicast addresses. RIPv2 also supports route aggregation techniques, i.e., the use of a single network prefix to advertise multiple networks [Chen99]. Moreover, RIPv2 provides some support for authorization data link security, implementing authentication support on a per-message basis.* RIPv2 also offers some support for ASs and IGP/EGP interaction by means of external route tags. External routes are learned from neighbors situated outside the AS, while internal routes lie completely within a given AS. In RIPv2 a route tag field is used to separate internal RIP routes from external routes imported from an EGP. Finally, RIPv2 scales better than RIPv1, but, compared with link-state protocols, it suffers from slow convergence and scalability limitations. 3.5.1.1.2 Interior Gateway Routing Protocol The Interior Gateway Routing Protocol (IGRP) [Hedri91] was created by Cisco in the 1980s. Not an open standard, it only runs on Cisco routers. IGRP shares several features with RIP. IGRP sends out updates on a regular basis (every 90 seconds) and uses update, invalid, and flush timers as well (with different values from RIP, depending on the implementation). Like RIP, IGRP uses triggered updates to speed up convergence and hold-down timers to enforce stability. However, while RIP only allows a network diameter of 15 hops, IGRP can support a network diameter of up to 255 hops, so it can be used in larger networks. IGRP differs from RIP regarding metrics, parallel route support, the reverse poisoning algorithm, and the use of a default gateway. Like RIPv1, IGRP does not support VLSM. Unlike RIP, IGRP does not use a single metric, but a vector of metrics that takes into account the topological delay time, the bandwidth of the narrowest bandwidth segment of the path, channel occupancy, and the reliability of the path. A single composite metric can be computed from this vector [Hedri91], which encompasses the effect of the various components into a single number representing how good a path is. The path featuring the smallest value for the composite metric will be the best path. The hop count and MTU (maximum transmission unit, i.e., the maximum packet size that can be sent along the entire path without fragmentation) of each network are also considered in the best path calculation. With IGRP, network administrators are supplied with a large range of metrics and they are also allowed to customize them.
By giving higher or lower weight to specific metrics, network administrators can therefore influence IGRP’s automatic route selection. For example, suitable constants can be set to several different values to provide different types of service (interactive traffic, for instance, would typically give a higher weight to delay, whereas file transfer would assign a higher weight to bandwidth). IGRP is more accurate than RIP in calculating the best path, as a vector of metrics instead of a single metric improves the description of the status of a network. If, for instance, a single metric is used, several consecutive fast links will appear to be equivalent to a single slow link. While this would be appropriate for delay-sensitive traffic, it would not be so for bulk
*A whole RIP entry, denoted by a special address family identifier (AFI) value of 0xFFFF, is used for authentication purposes, thus reducing the total number of routing entries per advertisement to 24.
data transfer, which is more sensitive to bandwidth. This problem may arise with RIP, but not with IGRP, as it considers delay and bandwidth separately. An interesting feature of IGRP is that it provides multipath routing and can perform load balancing, by splitting traffic equally between paths that have equal values for the composite metrics. To deal with the case of multiple paths featuring unequal values for the composite metrics, a variance parameter is defined as a multiplier of the best metrics. Only routes with metrics within a given variance of the best route can be used as multiple paths. In this case, traffic is distributed among multiple paths in inverse proportion to the composite metrics. The variance can be fixed by the network administrator. The default value is 1, which means that when the metric values of two paths are not equal, only the best route will be used. A higher variance value means that any path with a metric up to that value will be considered for routing purposes. However, variance values other than 1 should be carefully managed, as they may cause routing loops*. Unlike RIP, which can only prevent routing loops between adjacent routers, the route poisoning algorithm used in IGRP prevents large routing loops between nonadjacent routers. This is achieved by poisoning routes for which the metrics increase by a factor of 10% or more after an update. The rationale behind this is that routing loops generate continually increasing metrics. It should be noted that if this poisoning rule is used, valid routes may be erroneously deleted from routing tables. However, if the routes are valid, they will be reinstalled by the next update. The advantage of the IGRP poisoning algorithm is that it safely allows a zero hold-down value, which significantly improves network convergence time. IGRP handles default routes differently from RIP. Instead of a single dummy entry for the default route, IGRP allows real networks to be flagged as default candidates. IGRP periodically analyzes all the candidate default routes to choose the one with the lowest metric, which will be the actual default route. This approach is more flexible than the one adopted by typical RIP implementations. The default route can change in response to changes within a network. 3.5.1.2 A Hybrid Protocol: The Enhanced Interior Gateway Routing Protocol EIGRP was introduced by Cisco in the early 1990s as an evolution of IGRP [Cisco02a]. As it offers support for multiple network layer protocols (e.g., IP, AppleTalk, and Novell NetWare), EIGRP is commonly used in mixed networking environments. EIGRP is called a hybrid protocol as it combines a distance-vector routing protocol with the use of a diffusing update algorithm (DUAL) [Garcia93], which has some of the features of link-state routing algorithms. The advantages of this combination are fast convergence and lower bandwidth consumption. DUAL, based on distance information provided by route advertisements from neighbors, finds all loop-free paths to any given destination. Among all the loop-free paths to the destination, the neighbor with the best path is selected as the successor, while the others are selected as feasible successors, i.e., eligible routes to be used if the primary route becomes unavailable. If, after a failure, one feasible successor is found, DUAL promotes it to primary route without performing recalculation, thus reducing the overhead on routers and transmission facilities. 
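A minimal sketch of the successor/feasible-successor selection may help here. The loop-freedom test used below (a neighbor is kept as a feasible successor only if the distance it reports is smaller than the current best total distance) is the feasibility condition usually associated with DUAL; the table layout and router names are illustrative assumptions, not EIGRP's actual data structures.

# Sketch: choosing a successor and feasible successors for one destination,
# in the spirit of DUAL. 'reported' is the distance a neighbor advertises;
# 'link' is the cost of the link to that neighbor.

neighbors = {               # neighbor -> (reported distance, cost of link to neighbor)
    "R1": (10, 5),
    "R2": (12, 2),
    "R3": (30, 1),
}

totals = {n: reported + link for n, (reported, link) in neighbors.items()}
successor = min(totals, key=totals.get)      # best loop-free path
feasible_distance = totals[successor]

# Loop-freedom check: keep a neighbor as a feasible successor only if its own
# reported distance is strictly smaller than the current feasible distance.
feasible_successors = [
    n for n, (reported, _) in neighbors.items()
    if n != successor and reported < feasible_distance
]
print("successor:", successor, "feasible successors:", feasible_successors)

# If the successor fails and a feasible successor exists, it is promoted
# immediately, without a diffusing computation; otherwise a recomputation
# (queries to all neighbors) would be needed.
failed = successor
if feasible_successors:
    successor = feasible_successors[0]
print("after failure of", failed, "-> new successor:", successor)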
If, on the other hand, no feasible successors exist, a recomputation (called a diffusing computation) is performed to select a new successor. Unnecessary recomputations affect convergence, so they should be avoided. EIGRP uses the same composite metrics as IGRP. However, it does not send periodic updates, but instead implements neighbor discovery/recovery, based on a hello mechanism, to assess neighbor reachability. Routing updates are only sent in the event of topology changes or a failure in a router or link. Moreover, when the metric for a route changes, only partial updates are sent, and they only reach the routers that really need the update. As a result, EIGRP requires less bandwidth than IGRP. EIGRP relies on the Reliable Transport Protocol (RTP) [Schulz03] to achieve guaranteed and ordered delivery of EIGRP packets to all neighbors. It supports VLSM, thus providing more flexibility in inter*It should also be pointed out that the load balancing performed by the IGRP may produce out-of-sequence packets. This should be taken into account when neither the data link nor transport layer protocols have the capability of handling out-of-sequence packets.
network design than RIPv1 or IGRP. This can be particularly useful when dealing with a limited address space (as will be explained in Section 3.7). EIGRP has a modular architecture that makes it possible to add support for new protocols to an existing network. On the whole, EIGRP is robust and easy to configure and use. EIGRP supports both internal and external routes, allowing tags to identify the source of external routes. This feature can be exploited by network administrators to develop their own interdomain routing policies [Pepel00][Cisco97].
3.5.1.3 Link-State Protocols
3.5.1.3.1 OSPF: The Open Shortest-Path-First Protocol
OSPF is a link-state IGP designed by the Internet Engineering Task Force (IETF) in the late 1980s to overcome the limitations of RIP for large networks. The name of this protocol derives from the fact that it is an open standard and that it uses the shortest-path-first (also called Dijkstra [Dijks59]) algorithm [Perlm92][Moy89]. It was specifically designed for the TCP/IP environment and runs directly over IP. OSPF is commonly used in medium to large IP networks and is implemented by all major router manufacturers. Several specifications have appeared since the first one [Moy89]. The RFC for OSPF version 2 (OSPFv2) is in [Moy88], while OSPF specifications to support IPv6 are given in [Coltun99]. One of the most appealing features of OSPF is its support for hierarchical routing design within ASs (although it is purely an IGP). An AS running OSPF can therefore be configured into areas that are interconnected by one backbone area (as shown in Figure 3.4). There are two types of routing within OSPF: intra-area and inter-area routing. In OSPF, each router monitors the state of its attached interfaces. If a topology change occurs, the router that has detected it distributes information about the change to all the other routers within its area, through broadcast messages called link-state advertisements (LSAs). LSAs include metrics, interface addresses, and other data. A router uses this information to build a topological database in the form of a directed graph. The costs associated with the various edges (i.e., links) are expressed by a single dimensionless metric configured by the network administrator. A topological database is therefore present in each router. When two routers have identical topological databases, they become adjacent. This is important, as routing information can only be exchanged between adjacent routers. Locally, the router applies a shortest-path tree algorithm to all the networks, taking itself as the root node. As said above, each router only sends LSAs to all the routers in the same area. This prevents intra-area routing from spreading outside the area. Each router within an area will know routes to any destination within the area and to the backbone.
FIGURE 3.4 OSPF routing hierarchy.
The internal topology of any area is hidden to every other area. Each router within a given area will know how to reach both every other router within its area and the backbone, but it does not have any idea of the number of routers existing or the way they are interconnected for any other area. Inter-area routing is performed through the backbone (which must be configured as area 0). It should be pointed out that an OSPF backbone is a routing area and not a physical backbone network, as the name might suggest. Four different types of routers can be configured in OSPF: internal routers (IRs), area border routers (ABRs), backbone routers (BRs), and autonomous system border routers (ASBRs). All of them are depicted in Figure 3.4. IRs are only entrusted with intra-area routing. All their interfaces are connected within an area. ABRs route packets between different areas within the AS. They interface to multiple areas (including the backbone), so they belong to both an area and the backbone and maintain topological information about both. BRs belong to the backbone (area 0), but are not ABRs; that is, their interfaces are connected only to the backbone (e.g., BR-10). As the backbone operates as an area itself, BRs maintain area routing information. ASBRs, also called boundary routers, are responsible for exchanging routing information with routers belonging to other ASs (e.g., ASBR-11 in Figure 3.4). They have an external interface to another AS and learn external routes from dynamic EGPs, such as BGP-4 (presented in Section 3.5), or static routes. External routes are passed transparently throughout the AS and kept separate from the OSPF link-state data. External routes can also be tagged by the advertising routers. A packet to be sent to another area in the AS (inter-area routing) is first routed to an ABR in the source area (intra-area routing) and then routed via the backbone to the ABR that belongs to the destination area. Finally, it will be routed to the ultimate destination. Route calculation introduces high complexity, and as an OSPF router stores all link states for all the areas, the overhead is high in a large internetwork. On the other hand, OSPF scales well and converges quickly. It also provides several appealing features, such as:
• Security: All the exchanges between OSPF routers (e.g., LSAs) are authenticated, thus preventing intruders from manipulating routing information.
• Type-of-service (TOS) support: Each link may feature different costs according to the traffic TOS requirements.
• Load balancing between multiple equal-cost paths.
• Triggered updates.
• Designated routers: On a LAN, where several routers may be connected, one will be elected as the designated router and another as its backup. The designated router is primarily responsible for generating LSAs for the LAN to all other networks in the OSPF area.
• Explicit support for VLSM and the ability to have discontinuous subnets (i.e., made up of single networks or sets of networks featuring noncontiguous addressing). This feature is very useful on the Internet (as will be discussed in Section 3.7).
• Tagged external routes: OSPF is capable of receiving routes from and sending routes to different ASs.
• Route summarization: An area border router can aggregate two subnets belonging to the same area so that only one entry in the routing table of another router will be used to reach both subnets. This minimizes the size of topological databases in the routers, thus significantly reducing protocol traffic.
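Because every OSPF router runs the shortest-path-first computation over its topological database with itself as the root, a compact sketch of that step is given below. The toy link-state database and its dimensionless costs are illustrative assumptions; real LSAs carry considerably more information.

import heapq

# Sketch: shortest-path-first (Dijkstra) over a toy link-state database.
# Keys are routers; values map neighbor -> dimensionless link cost
# (the administrator-configured metric mentioned in the text).
lsdb = {
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 2, "R4": 7},
    "R3": {"R1": 4, "R2": 2, "R4": 3},
    "R4": {"R2": 7, "R3": 3},
}

def shortest_path_tree(root):
    """Return cost and previous hop for every reachable router, rooted at 'root'."""
    dist = {root: 0}
    prev = {}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry
        for neighbor, cost in lsdb[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    return dist, prev

dist, prev = shortest_path_tree("R1")
print(dist)   # e.g. R4 is reached at cost 6 via R1 -> R2 -> R3 -> R4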
3.5.1.3.2 Integrated IS-IS The Intermediate System-to-Intermediate System Protocol (IS-IS) [Callon90] is a link-state protocol. It is an open standard and its name indicates that this protocol is used by routers to talk to each other. Developed for the OSI world, IS-IS was made integrated so that it can route both OSI and IP simultaneously. Similar to OSPF, integrated IS-IS uses the SPF algorithm and is based on LSAs sent to all routers within a given area and hello packets to check the current state of a router. As it is a link-state protocol, it converges fast, but at the expense of high complexity. Integrated IS-IS supports VLSM, load sharing,
and triggered updates. Still not widely deployed at present (it is confined to telco and government networks), integrated IS-IS has good potential for medium to large IP networks.
3.5.2 Exterior Gateway Protocols for IP Internetworks
EGPs are responsible for routing between ASs. Possible alternatives in the IP world are static routes, the Exterior Gateway Protocol (EGP) [Rosen82], and the Border Gateway Protocol (BGP) [Rekhr95].
3.5.2.1 Static Routes
Static routes can be applied to both intra-AS and inter-AS routing. However, as they offer particularly appealing features for inter-AS routing, they are included in this subsection. Configuration of static routes is simple, and it is very easy to enforce policy (as no routes equals no access). With static routes, no routing protocol messages travel over the links between ASs. On the other hand, maintenance of static routes in large internetworks may be complex, as they do not scale. For this reason, many network designers adopt the Dynamic Host Configuration Protocol (DHCP) [Droms97], which dynamically allocates a default gateway from a set of candidate gateways. Static routes also lack flexibility (as there is no way to choose a better path that could have been selected if dynamic routing protocols were used), so they are not suitable for changing environments. They do not respond to topological changes. To enforce fault tolerance in static routes, a secondary gateway is usually maintained, which could take the role of the primary one if the latter becomes unreachable or goes down.
3.5.2.2 Exterior Gateway Protocol
The Exterior Gateway Protocol [Rosen82] was the first EGP to be developed. It runs directly over IP and is a best-effort service. The routing information of EGP is similar to that of distance-vector protocols, but it does not use metrics. EGP suffers from some design limitations, as it supports neither routing loop detection nor multiple paths. If more than one path to a destination exists, packets can easily get stuck in routing loops. Nowadays, it has been declared obsolete and replaced by the Border Gateway Protocol (BGP).
3.5.2.3 Border Gateway Protocol
Border Gateway Protocol version 4 (BGP-4), originally specified in RFC 1771 [Rekhr95], is a very robust and scalable routing protocol that is becoming a de facto standard for inter-AS routing on the current Internet. BGP-4 is an open standard. It somewhat resembles distance-vector protocols, as it is a distributed protocol where information exchange occurs only between directly connected routers. However, it is more appropriate to define BGP-4 as a path-vector protocol. This is because, as stated in [Rekhr95], “the primary function of a BGP speaking system is to exchange network reachability information with other BGP systems.” Network reachability information includes only the list of autonomous systems (ASs) that need to be traversed in order to reach other networks. Path information, instead of cost information, is exchanged between neighboring BGP routers, and BGP-4 does not specify the rule for choosing a path from those advertised. The routing mechanism and routing policy are therefore separated. A policy is manually configured to allow a BGP router to rate possible routes to other ASs and choose the best path. Two BGP routers first have to establish a TCP connection. After a negotiation phase, in which the two systems exchange some parameters (such as BGP version number, AS number, etc.), they become BGP peers and can start exchanging information. Initially, they exchange the full routing tables. Thereafter, only incremental updates are sent when some change in the routing tables occurs.
No periodic refresh of the entire routing table is needed, so a BGP speaker maintains the current version of the entire BGP routing tables of all of its peers for the duration of the connection. In order to maintain the connection, peers regularly exchange keep-alive messages. Notification messages are sent in response to errors or special conditions. In the event of error, a notification message is sent and the connection is closed. Network reachability information is used to construct a graph of AS connectivity, from which routing loops may be pruned and policy decisions at the AS level may be enforced. Each BGP router maintains a routing table with all feasible paths to a given network.
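The loop pruning performed on the graph of AS connectivity can be illustrated with a short sketch: a speaker rejects any advertisement whose AS_path already contains its own AS number and prepends its AS number before propagating a route. The record layout and AS numbers are illustrative assumptions, not the actual BGP-4 UPDATE encoding.

# Sketch: path-vector style processing of route advertisements.
# An advertisement is accepted only if our own AS number is not already in the
# AS_path (loop prevention); before re-advertising, we prepend our AS number.

LOCAL_AS = 65010

def accept(advertisement):
    """Return True if the advertised path is loop-free from our point of view."""
    return LOCAL_AS not in advertisement["as_path"]

def readvertise(advertisement):
    """Prepend the local AS number, as a BGP speaker does when propagating a route."""
    return {
        "prefix": advertisement["prefix"],
        "as_path": [LOCAL_AS] + advertisement["as_path"],
    }

incoming = [
    {"prefix": "203.0.113.0/24", "as_path": [65020, 65030]},          # loop-free
    {"prefix": "198.51.100.0/24", "as_path": [65020, 65010, 65040]},  # contains our AS
]

for adv in incoming:
    if accept(adv):
        print("accepted and propagated:", readvertise(adv))
    else:
        print("rejected (AS_path loop):", adv["prefix"])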
As said above, BGP does not propagate cost information but path information, and does not specify which path should be selected from those that have been advertised, as this is a policy-dependent decision left up to the network administrator. This is because, as stated previously (in Section 3.4), when dealing with inter-AS routing, policy is more important than performance. However, when a BGP speaker receives several updates describing the best paths to the same destination, it has to choose one of them. The selection criteria are based on rules that take path attributes into account. Once the decision has been made, the speaker puts the best path in its routing table and then propagates the information to its neighbors. The path attributes generally used in the route selection process are [Lewis99][Halabi00][Huitem95]:
• Weight: A Cisco-defined attribute that is local to a router and is not advertised to neighboring routers. If the router learns about more than one route to the same destination, the route with the highest weight will be preferred.
• Local preference: Used to choose an exit point from the local autonomous system (AS). It is propagated throughout the local AS. If there are multiple exit points from the AS, the local preference attribute is used to select the exit point for a specific route.
• Multiexit discriminator, or metric attribute: Used as a suggestion to an external AS regarding the preferred route into the AS that is advertising the metric.
• Origin: Indicates how BGP learned about a particular route. The origin attribute can have one of three possible values: IGP (the route is interior to the originating AS), EGP (the route is learned via the Exterior Border Gateway Protocol (EBGP)), or incomplete (the origin of the route is unknown or learned in some other way; this occurs when a route is redistributed in BGP).
• AS_path: When a route advertisement passes through an autonomous system, the AS number is added to the ordered list of AS numbers the advertisement traversed.
• Next-hop: The IP address that is used to reach the advertising router.
• Community: Provides a way of grouping destinations into communities, i.e., sets of destinations to which common routing decisions can be applied. Predefined community attributes are no-export (this route must not be advertised to EBGP peers), no-advertise (this route must not be advertised to any peer), and Internet (this route may be advertised to the Internet community; all routers in the network belong to it).
Attributes are crucial to achieve scalability, define routing policies, and maintain a stable routing environment. Other BGP features include route filtering, i.e., the ability of a BGP speaker to specify which routes to send to and receive from any of its peers. Filtering may refer to inbound or outbound links and may be applied as permit or deny. BGP-4 supports supernetting or classless interdomain routing (CIDR) [Fuller93] (dealt with in Section 3.7), which enables route aggregation, that is, the combination of several routes within a single route advertisement. This minimizes the size of routing tables and protocol overhead. BGP was originally designed to perform inter-AS routing, but it can also be used for intra-AS routing. BGP connections between ASs are called external BGPs (e-BGPs), while those within an AS are called internal BGPs (i-BGPs). Figure 3.5 shows a typical scenario for e-BGP, i.e., a multihomed AS connected to the Internet. All the networks X, Y, Z, 1, 2, and 3 are ASs.
More specifically, 1 and 3 are stub networks, 2 is a multihomed stub network, and X, Y, and Z are backbone provider networks. A stub network is such that all traffic entering it should be destined to that network. A multihomed AS is connected to multiple ASs (for example, via two different service providers*), but does not allow transit traffic. Stub network 2 will be prevented from forwarding traffic between Y and Z by a selective route advertisement mechanism. BGP routes will be advertised in such a way that network 2 will not advertise
*In this situation, multiple service providers are used to increase availability and to allow load sharing.
FIGURE 3.5 A scenario for BGP.
to its neighbors (Y and Z) any path to other destinations except itself. For instance, 2 will not advertise path 2 Z 3 to network Y, so the latter will not forward traffic to 3 via 2. i-BGP can be used to distribute routing information to routers within the AS regarding destinations (networks) outside the AS. Running i-BGP offers several advantages, such as a consistent view of the AS to external neighbors, more control over information exchange within the AS, flexibility, and a slightly shorter convergence time than e-BGP (which is slow). Table 3.2(A) and Table 3.2(B) list the various IGPs and EGPs discussed and compare them according to multiple design criteria. The tables can be helpful to guide the designer’s choice when dealing with routing protocols in the IP domain. We have seen that various IGPs and EGPs exist in the IP domain, including both proprietary protocols and open standards. In a composite, complex internetwork they may coexist either for historical reasons or due to the presence of multiple vendor solutions. This coexistence may cause incompatibility problems due to both the different metrics adopted and peculiarities of the various protocols. It is therefore necessary to find a way to overcome incompatibilities and form a holistic view of the network topology, thus enabling interoperability. This is the case for route redistribution [Awdu02]. Route redistribution is the process that enables routing information from one protocol to be translated and used by a different one. A router can therefore be configured to run more than one routing protocol and redistribute route information between the two protocols.

TABLE 3.2(A) IGPs vs. EGPs in the IP Domain

       Protocol   Technology                 Type       Metrics                                    Scalability   VLSM Support
IGP    IGRP       Distance vector            Classful   Bandwidth, delay, load, reliability, MTU   Medium        No
       RIPv1      Distance vector            Classful   Hop count                                  Small         No
       RIPv2      Distance vector            Classless  Hop count                                  Small         Yes
       EIGRP      Advanced distance vector   Classless  Bandwidth, delay, load, reliability, MTU   Large         Yes
       OSPF       Link state                 Classless  Cost                                       Large         Yes
       IS-IS      Link state                 Classless  Cost                                       Very large    Yes
EGP    BGP-4      Path vector                Classless  Cost, hop, policy                          Large         Yes

TABLE 3.2(B) IGPs vs. EGPs in the IP Domain

       Protocol   Hop Count Limit   Load Balance     Load Balance       Standard   Convergence Time   Routing Algorithm
                                    (Equal Paths)    (Unequal Paths)
IGP    IGRP       100 (up to 255)   Yes              Yes                No         Slow               Bellman–Ford
       RIPv1      15                Yes              No                 Yes        Slow               Bellman–Ford
       RIPv2      15                Yes              No                 Yes        Slow               Bellman–Ford
       EIGRP      100 (up to 255)   Yes              Yes                No         Fast               DUAL
       OSPF       200               Yes              No                 Yes        Fast/slow          Dijkstra
       IS-IS      1024              Yes              No                 Yes        Fast/slow          Dijkstra (SPF)
EGP    BGP-4      n/a               No               No                 Yes        Slow               Path vector
Redistribution is needed in two cases:
• At the AS boundary, as IGPs and EGPs do not match
• Within an AS, when multiple IGPs are used
Redistribution has direction. That is, routing information can be redistributed symmetrically (mutual redistribution) or asymmetrically (hierarchical redistribution). Handling redistribution is not an easy task, and it is difficult to define a general approach. Solutions are typically vendor dependent, as the problems that can arise are strictly related to the details of the various protocols.
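As a rough illustration of redistribution, the sketch below copies routes learned by one protocol into another protocol's table, replacing the incompatible metric with a configured seed value and tagging the origin. The table layouts and the seed-metric policy are illustrative assumptions; actual behavior is vendor dependent, as noted above.

# Sketch: one-way (asymmetric) redistribution of routes learned by one protocol
# into another protocol's table. Because the metrics are not comparable, each
# imported route is assigned a configured seed metric and tagged with its origin.

rip_table = {                      # prefix -> hop count
    "10.1.0.0/16": 3,
    "10.2.0.0/16": 1,
}
ospf_table = {                     # prefix -> (cost, origin)
    "172.16.0.0/16": (20, "ospf"),
}

def redistribute(source, target, seed_metric, origin_tag):
    """Copy routes from 'source' into 'target', replacing the metric."""
    for prefix in source:
        if prefix not in target:               # do not overwrite native routes
            target[prefix] = (seed_metric, origin_tag)
    return target

redistribute(rip_table, ospf_table, seed_metric=100, origin_tag="redistributed-rip")
for prefix, (cost, origin) in sorted(ospf_table.items()):
    print(prefix, cost, origin)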
3.6 IP Multicast Routing
The aim of multicast routing is to build a tree of links connecting all the routers that are attached to hosts belonging to the multicast group. Two different approaches are possible. The first one, called group-shared tree, foresees a single tree for all sources in the multicast group, while the second approach, called source-based trees, provides an individual routing tree for each source in the multicast group. The group-shared tree approach entails solving the Steiner tree problem [Hakimi71]. Although several heuristics were devised to solve this problem [Wall80][Waxm88][Wei93], they were not adopted by existing Internet multicast routing algorithms, due to the complexity and poor scalability of these methods, which entail that multicast-aware routers maintain state information about all the links in the network. Moreover, the tree built using these techniques has to be recalculated at any change in link costs. A more effective way to find the group-shared tree is the center-based (or core-based) approach, which is adopted by several Internet multicast routing algorithms. Here a router of the multicast group (also called a rendezvous point) is elected to be the core of the multicast tree. The core node will be the recipient of join messages sent by all the routers attached to the hosts belonging to the multicast group. A join message is forwarded toward the core via unicast routing, and it stops when it reaches the core or when it reaches a router that is already part of the multicast tree. This way, the path followed by the join message is the branch to the core from the originating node, and it will become part of the multicast tree (the branch is said to be grafted on to the existing tree). The advantages of this approach are that unicast tables are exploited to forward the join messages and that multicast-aware routers do not need to maintain link state. The source-based trees approach entails solving a least-cost path multicast routing problem. This could be expensive, as each sender needs to know all links’ costs to derive the least-cost spanning tree. For this reason, the reverse path-forwarding (RPF) algorithm [Dalal78] is used, which is also useful as it prevents loops. The RPF algorithm allows a router to accept a multicast packet only on the interface from which the router would send a unicast packet to the source of the incoming multicast packet. RPF is effective, as each router only has to know the next-hop along its least-cost path to the source, but it has the drawback that even routers that are not attached to hosts belonging to the multicast group would receive multicast packets. This can be solved by pruning techniques, i.e., a prune message can be sent back upstream by a multicast router receiving a multicast message for which it has no recipients. In a dynamic scenario, it may happen that a receiver later joins a multicast-aware router that has already sent a prune message; to handle this case, unprune messages sent back upstream or suitable timeouts (time to live (TTL)) that remove stale prunes can be introduced. In the Internet, multicast is achieved by the combination of the Internet Group Management Protocol version 2 (IGMPv2) [Fenne97] and multicast routing protocols. IGMP is an end system to intermediate system (ES-IS) protocol for multicasts. End systems (ESs) are network devices without the capability to forward packets between subnetworks.
IGMP is used by a host to inform its directly attached router that an application running on it is interested in joining a given multicast group. IGMP allows multicast group members to join or leave a group dynamically and maintains state information on router interfaces that can be exploited by multicast routing protocols to build the delivery tree.
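The RPF rule described above lends itself to a very small sketch: a packet is accepted and flooded downstream only if it arrives on the interface the router would itself use to reach the source. The interface names and the stand-in unicast table are illustrative assumptions.

# Sketch: the reverse-path-forwarding (RPF) check. A router accepts a multicast
# packet only if it arrives on the interface the router would itself use to
# reach the packet's source via unicast routing.

unicast_next_hop_iface = {        # source network -> interface toward that source
    "10.0.1.0/24": "eth0",
    "10.0.2.0/24": "eth1",
}

downstream_ifaces = ["eth2", "eth3"]   # interfaces leading away from the source

def rpf_forward(source_net, arrival_iface):
    """Flood the packet downstream only if it passes the RPF check."""
    if unicast_next_hop_iface.get(source_net) != arrival_iface:
        return []                      # fails RPF: drop to avoid loops and duplicates
    return [i for i in downstream_ifaces if i != arrival_iface]

print(rpf_forward("10.0.1.0/24", "eth0"))  # accepted: forwarded on eth2, eth3
print(rpf_forward("10.0.1.0/24", "eth1"))  # rejected: arrived on the wrong interface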
Multicast routing algorithms, such as the protocol-independent multicast (PIM), the Distance-Vector Multicast Routing Protocol (DVMRP), the multicast open shortest path first (MOSPF), and the core-based tree (CBT), are in charge of coordinating the routers so that multicast packets are forwarded to their ultimate destinations. They are outlined below.
3.6.1 Distance-Vector Multicast Routing Protocol
The Distance-Vector Multicast Routing Protocol [Waitz88] is an Interior Gateway Protocol derived from RIP. DVMRP combines many of the features of RIP with the truncated reverse-path-broadcasting (TRPB) algorithm described by Deering [Deeri88]. DVMRP is based on flood and prune, according to the dense-mode multicast routing model, which assumes that the multicast group members are densely distributed over the network and bandwidth is plentiful. Such a model fits high-density enterprise networks. The multicast forwarding algorithm requires the building of per-source multicast trees based on routing information, and then the dynamic creation of per-source group multicast delivery trees, by pruning the multicast tree for each source in a selective way. To unprune a previously pruned link, DVMRP provides both an explicit graft message and a TTL on prune messages (2 hours by default). DVMRP includes support for tunneling IP multicast packets. This is very useful, as not all IP routers have multicast capabilities. Tunneling is performed by encapsulating multicast packets in IP unicast packets that are addressed and forwarded to the next multicast router on the destination path. Thanks to its tunneling capabilities, DVMRP has been used on the Internet for several years to support the multicast overlay network MBone [Kumar96], although its flooding nature makes it a poor fit for large internetworks. The tree building performed by DVMRP needs more state information than RIP, so DVMRP is more complicated than RIP. There is also a very important difference from RIP. While the target of RIP is to route and forward packets to a particular destination, the goal of DVMRP is to keep track of the return paths to the source of multicast packets.
3.6.2 Multicast OSPF
Multicast OSPF (MOSPF) [Moy94] is an extension of the OSPFv2 unicast protocol enabling the routing of IP multicast packets. MOSPF is not a separate routing protocol; the multicast extensions are built on top of OSPFv2 and have been implemented so that a multicast routing capability can be gradually introduced into an OSPFv2 routing domain. A new OSPF link-state advertisement (LSA) describing the location of multicast destinations is added. Each router builds least-cost multicast trees for each (sender, group) pair. The path for a multicast packet is obtained by building a source-rooted pruned shortest-path multicast tree. The state of the tree is cached: it has to be recalculated following link-state changes or when the cache times out. Unlike unicast packets, in MOSPF an IP multicast packet is routed based on both the packet’s source and its multicast destination. During packet forwarding, any commonality of paths is exploited. When multiple hosts belong to a single multicast group, a multicast packet will be replicated only when the paths to the separate hosts diverge [Moy94]. MOSPF is an example of a sparse-mode routing protocol, which assumes that multicast members are widely distributed over the network and bandwidth is possibly restricted. The sparse mode is suitable for internetworking applications. However, MOSPF does not support tunneling and, due to the inherent scaling problems of the shortest-path algorithm, is not suitable for large internetworks.
3.6.3 Protocol-Independent Multicast
Protocol-independent multicast (PIM) is a recent IP multicast protocol. The name indicates that PIM is not dependent on any particular unicast routing protocol. PIM works with IGMP and existing unicast routing protocols, such as RIP, IGRP, OSPF, IS-IS, and BGP. There are two operating modes for PIM, the sparse mode and the dense mode, which are described below.
3.6.3.1 Sparse-Mode PIM
The sparse-mode protocol-independent multicast (PIM-SM) [Estrin98] is a protocol for efficiently routing to multicast groups that may span wide-area (and interdomain) internetworks and is designed to support the sparse distribution model. PIM-SM works using a rendezvous point (RP) and requires explicit join/prune messages. A sender willing to send multicast packets has first to announce its presence by sending data to the RP. Analogously, a receiver willing to receive multicast data has first to register with the RP. Once the data flow from sender to RP to receiver starts, the routers on the path automatically optimize the path, removing unneeded hops. This way, traffic only flows where it is required and router state is maintained only along the path. A drawback of PIM-SM is that the RP could become a bottleneck, so multiple RPs may be introduced to avoid congestion through load sharing.
3.6.3.2 Dense-Mode PIM
The dense-mode protocol-independent multicast (PIM-DM) [Nicho03] forwards multicast packets out to all connected interfaces except the receiving one. Thus, it floods the network first and prunes specific branches later. PIM-DM efficiently supports routing in dense multicast networks, where it is reasonable to assume that every downstream system is potentially a member of the multicast group. Unlike DVMRP, PIM-DM does not use routing tables. PIM-DM is easy to configure, but less scalable and less efficient than PIM-SM for most applications.
3.6.4 Core-Based Tree
The core-based tree (CBT) multicast routing protocol builds a shared multicast distribution tree per multicast group. CBT follows the core-based approach described at the beginning of Section 3.6 for group-shared tree building. Despite its simplicity, the way the tree is built in CBT tends to concentrate traffic around the core routers, and this may lead to congestion. For this reason, some implementations feature multiple core routers and perform load sharing between them. CBT is very suitable for supporting multicast applications, such as distributed interactive simulations or distributed video gaming, as they are characterized by many senders within a single multicast group. The deployment of CBT until now has been limited. More details on CBT can be found in [CBT][CBTv2].
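The grafting of a join message toward the core, described at the beginning of Section 3.6, can be sketched as follows; the topology, next-hop table, and message handling are illustrative assumptions rather than the actual CBT packet exchange.

# Sketch: grafting a branch onto a core-based (shared) multicast tree.
# A join message travels hop by hop toward the core using unicast next hops,
# and stops at the core or at the first router already on the tree.

next_hop_to_core = {"R5": "R3", "R3": "R2", "R2": "CORE", "R4": "R2"}
on_tree = {"CORE"}                       # routers already part of the shared tree

def join(router):
    """Send a join from 'router' toward the core; graft the traversed branch."""
    branch = []
    node = router
    while node not in on_tree:
        branch.append(node)
        node = next_hop_to_core[node]    # forwarded via unicast routing
    on_tree.update(branch)               # the whole branch is grafted onto the tree
    return branch

print("graft from R5:", join("R5"))      # ['R5', 'R3', 'R2']
print("graft from R4:", join("R4"))      # ['R4'] (stops at R2, already on the tree)
print("tree:", sorted(on_tree))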
3.6.5 Interdomain IP Multicast Routing The current approaches to interdomain IP multicast routing are based on an extension of BGP-4, called the Multicast Border Gateway Protocol (MBGP) [Bates00]. MBGP carries two sets of routes, one for unicast routing and one for multicast routing. The routes associated with multicast routing are used by PIM-SM to build data distribution trees.
3.7 IP Addressing and Routing Issues This section focuses on the strict coupling between IP addressing and routing. First, the two commonly used addressing models, classful and classless, are described, together with their impact on routing. Then subnetting, variable-length subnet masks (VLSM), and classless interdomain routing (CIDR) are presented. Finally, IPv6 and its deployment in the current Internet, together with IPv4/IPv6 migration issues, are discussed.
3.7.1 Classful IP Addressing According to the first IP specification [Poste81], each system attached to an IP-based Internet has a globally unique 32-bit Internet address. IP addresses are administered by the Internet Assigned Numbers Authority [IANA]. Systems that have interfaces to more than one network must have a unique IP address for each network interface. The Internet address consists of two parts: (1) the network number (or
TABLE 3.3(A) Address Formats and Network Sizes for Each Class

Class     Class ID                          Network Prefix   Network Number   Host Number    Maximum No.         Maximum No. of Hosts
                                            Size (bit)       Size (bit)       Length (bit)   of Networks         per Network
Class A   Highest-order bit = 0             8                7                24             2^7 - 2 = 126 (a)   2^24 - 2 (b)
Class B   Two highest-order bits = 10       16               14               16             2^14                2^16 - 2
Class C   Three highest-order bits = 110    24               21               8              2^21                2^8 - 2
Class D   Four highest-order bits = 1110    na               na               na             na                  na
Class E   Five highest-order bits = 11110   tbd              tbd              tbd            tbd                 tbd

(a) The maximum number of networks that can be defined is 126, and not 128, as there are 2 reserved networks, i.e., network 0.0.0.0 (reserved for default routes) and network 127.0.0.0 (reserved for the loopback function).
(b) Host numbers with all bits set to 0 ("this network") and with all bits set to 1 ("broadcast") are reserved.
TABLE 3.3(B) Address Type and Ranges for Each Class

Class     Type                                 Dotted Decimal Notation Range
Class A   Unicast                              From 1.0.0.0 to 127.255.255.255
Class B   Unicast                              From 128.0.0.0 to 191.255.255.255
Class C   Unicast                              From 192.0.0.0 to 223.255.255.255
Class D   Multicast (a)                        From 224.0.0.0 to 239.255.255.255
Class E   Reserved for experimental use only   From 240.0.0.0 to 247.255.255.255

(a) Class D is used for multicast applications and routing protocols such as OSPF and RIPv2.
network identifier, netID), which identifies the network to which the host belongs, and (2) the host number (or host identifier, hostID), which specifies the particular host on the given network. In classful addressing, the IP address space is divided into three main address classes, Class A, Class B, and Class C, which differ in the position of the boundary between the network number and the host number within the 32-bit address. Two additional classes are also defined: Class D (used for multicast addresses) and Class E (reserved for future use). IP addresses are commonly expressed in what is called dotted decimal notation, which divides the 32-bit Internet address into four 8-bit fields. Each field value corresponds to each byte of the address, written in its decimal form and separated by a period (dot) from the other bytes in the address. For example, let us consider the IP address 192.32.215.8. The first number, 192, is the decimal equivalent of the first eight bits of the address, i.e., 11000000; the second, 32, is the equivalent of the second eight bits of the address; and so on. In binary notation, the address is therefore 11000000 00100000 11010111 00001000. Table 3.3(A) summarizes the address formats for each class and the maximum number of networks and hosts that can be defined within each class, while Table 3.3(B) shows the address type and address ranges for the different classes in the dotted decimal notation. Each unicast address class (i.e., A, B, and C) has an associated default mask, which is a bit mask used by hosts and routers to assess how much of the address belongs to the netID for forwarding decisions. The bit mask therefore indicates how much of the address is allocated to the netID and how much is left to the hostID. Table 3.4 shows the default masks associated with each unicast IP address class. For example, the default mask 255.0.0.0 for Class A indicates that only the first eight bits are used by the netID. The role of the mask is crucial for routers, as by ANDing such a mask with a destination address, they can easily determine if an incoming packet should be sent directly to the local network or forwarded to another one.
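The class lookup and mask-ANDing step just described can be sketched directly; the class boundaries and default masks are those listed in Table 3.3 and Table 3.4, while the helper name is an illustrative assumption.

import ipaddress

# Sketch: determine the class and default mask of a dotted-decimal address,
# then AND the mask with the address to obtain the netID, as a router would
# do when deciding whether a packet is local or must be forwarded.

def classful_info(dotted):
    first_octet = int(dotted.split(".")[0])
    if first_octet < 128:
        cls, mask = "A", "255.0.0.0"
    elif first_octet < 192:
        cls, mask = "B", "255.255.0.0"
    elif first_octet < 224:
        cls, mask = "C", "255.255.255.0"
    elif first_octet < 240:
        return "D (multicast)", None, None
    else:
        return "E (reserved)", None, None
    addr = int(ipaddress.IPv4Address(dotted))
    net = addr & int(ipaddress.IPv4Address(mask))      # bitwise AND with the mask
    return cls, mask, str(ipaddress.IPv4Address(net))

print(classful_info("192.32.215.8"))   # ('C', '255.255.255.0', '192.32.215.0')
print(classful_info("10.1.2.3"))       # ('A', '255.0.0.0', '10.0.0.0')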
TABLE 3.4 Prefix-Default Mask Association for Each Unicast Class

Class     Default Mask     Default Prefix
Class A   255.0.0.0        /8
Class B   255.255.0.0      /16
Class C   255.255.255.0    /24
Recently, masks have been indicated using the so-called prefix, which indicates the number of contiguous bits used by the mask. Prefix–default mask associations for each unicast class are also shown in Table 3.4. A host can be configured with multiple IP addresses of different classes on the same physical interface. Direct communication is only possible between nodes within the same class and prefix, while nodes with a different address class or the same address class and a different prefix need an intermediate device, such as a layer 3 switch, router, proxy, or network address translator (NAT). Network address translation allows a router to act as an agent between the Internet and a local private network. This means that only a single, unique IP address is required to represent an entire group of computers. NAT implements short-term address reuse and is based on the fact that a very small percentage of hosts in a stub domain (i.e., a domain such as a corporate network that only handles traffic originated or destined to hosts in the domain) are communicating outside of the domain at any given time. Because many hosts never communicate outside of their stub domain, only a subset of the IP addresses inside the domain needs to be translated into IP addresses that are globally unique when outside communications are required. Each NAT device has a table consisting of pairs of local IP addresses and globally unique addresses. The IP addresses inside the stub domain are not globally unique. They are reused in other domains, thus solving the address depletion problem. The globally unique IP addresses are assigned according to the CIDR address allocation schemes, which solve the scaling problem (as will be discussed below). The main advantage of NAT is that it can be installed without changes to routers or hosts. More details on NAT can be found in [Egeva94].
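A minimal sketch of the translation table kept by a NAT device, in the spirit of the description above: private addresses inside the stub domain are mapped on demand to a small pool of globally unique addresses. The pool, the outbound-only handling, and the function names are illustrative simplifications.

# Sketch: NAT address translation table. Private (reused) addresses inside the
# stub domain are paired with globally unique addresses only when outside
# communication is required.

global_pool = ["203.0.113.10", "203.0.113.11"]   # globally unique addresses
nat_table = {}                                   # private address -> global address

def translate_outbound(private_addr):
    """Assign (or reuse) a global address for a host opening an outside connection."""
    if private_addr not in nat_table:
        if not global_pool:
            raise RuntimeError("address pool exhausted")
        nat_table[private_addr] = global_pool.pop(0)
    return nat_table[private_addr]

def translate_inbound(global_addr):
    """Map a reply arriving from outside back to the private address."""
    for private, public in nat_table.items():
        if public == global_addr:
            return private
    return None

print(translate_outbound("10.0.0.5"))     # 203.0.113.10
print(translate_outbound("10.0.0.7"))     # 203.0.113.11
print(translate_inbound("203.0.113.10"))  # 10.0.0.5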
3.7.2 Impact of IP Addressing on Routing Tables and Internet Scalability
The two most compelling problems facing today’s Internet are IP address depletion and poor scaling due to the uncontrolled increase in the size of Internet routing tables. The first problem is the result of the IP version 4 (IPv4) addressing scheme, based on a 32-bit address, which limits the total number of IPv4 addresses available. The situation is further complicated by the traditional model of classful addressing, which led to inefficient allocation of some portions of the IP address space in the early days of the Internet. The second problem derives from the exponential growth of the number of organizations connected to the Internet, combined with the fact that Internet backbone routers have to maintain complete routing information for the Internet. This problem cannot be solved by hardware enhancements, such as expanding router memory or improving router processing power; to deal with large routing table processing, route flapping (i.e., rapid, repeated changes in route availability), and large volumes of information to be exchanged without jeopardizing routing efficiency or the reachability of Internet portions, a more comprehensive and effective approach is needed. In the following, a technique, called subnetting, to reduce the uncontrolled growth of Internet routing tables is described.
3.7.3 Subnetting Introduced in [Mogul85], subnetting allows a single Class A, B, or C network number to be divided into smaller parts. This is achieved by splitting the standard classful hostID field into two parts: the subnetID
FIGURE 3.6 Extended network prefix for subnetting.
and the hostID on that subnet (as shown in Figure 3.6). The (netID, subnetID) pair forms the extended network prefix. In this way, if the internal network of a large organization is split into several subnetworks, the division is not visible outside the organization’s private network. This allows Internet routers to use a single routing table entry for all the subnets of a large organization, thus reducing the size of their routing tables. However, subnetworks are visible to internal routers, which will have to differentiate between the internal routes. Internet routers therefore use only the netID of the destination address to route traffic, while internal routers will use the extended network prefix. Subnetting has two main advantages. First, it hides the complexity of private network organization within the private network boundary, preventing it from spreading outside and affecting the size of Internet router routing tables. Second, thanks to subnetting, local administrators do not have to obtain a new network number from the Internet when deploying new subnets. Each bit in the subnetID mask has a one-to-one correspondence with the Internet address. If a bit in the subnet mask is set to 1, the corresponding bit in the IP address will be considered by the router as part of the extended network prefix. Otherwise, the corresponding bit in the IP address will be considered as part of the host number. Modern routing protocols still carry the complete four-octet subnet mask. The use of a single subnet mask, however, limits organizations to hosting a fixed number of fixed-size subnets. A further improvement that greatly enhances flexibility is using more than one subnet mask for an IP network split into several subnetworks. This solution, called variable-length subnet masks, is described next.
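A short sketch of the subnet-mask mechanics just described: each bit set to 1 in the mask marks a bit of the extended network prefix, and the remaining bits identify the host on that subnet. The addresses and the /26 mask are illustrative.

import ipaddress

# Sketch: splitting the classful hostID into subnetID and hostID by applying a
# subnet mask longer than the default one. A Class C network (default /24) is
# divided here into four /26 subnets.

network = ipaddress.ip_network("192.168.10.0/24")
subnets = list(network.subnets(new_prefix=26))        # extended network prefix = 26 bits
print([str(s) for s in subnets])
# ['192.168.10.0/26', '192.168.10.64/26', '192.168.10.128/26', '192.168.10.192/26']

# For a given host address, the extended network prefix is obtained by ANDing
# the address with the subnet mask; the remaining bits are the host number.
host = ipaddress.ip_address("192.168.10.77")
mask = ipaddress.ip_address("255.255.255.192")        # the /26 mask
prefix = ipaddress.ip_address(int(host) & int(mask))
host_part = int(host) & ~int(mask) & 0xFFFFFFFF
print(str(prefix), host_part)                         # 192.168.10.64 13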
3.7.4 Variable-Length Subnet Masks Variable-length subnet masks (VLSM), introduced in [Brade87], proposed the possibility of using more than one subnet mask for a subnetted IP network. As in this case, where the extended network prefix may have different lengths, the subnetted IP network is called a network with variable length subnet masks. VLSM is a powerful improvement in flexibility. VLSM also offers another significant benefit — the possibility of introducing route aggregation (summarization). Route aggregation is defined as the ability of a router to collapse several forwarding information base entries into a single entry [Trotter01]. This aggregation process works in combination with VLSM to build hierarchically structured networked environments. VLSM allows us to recursively divide the address space of a large organization. This is accomplished by splitting a large network into subnets, some of which are then further divided into subnets, some of which will, in turn, be split into subnets. This division makes routing information relevant to one group of subnets invisible to routers belonging to another subnet group. A single router can therefore summarize multiple subnets behind it into a single advertisement, thus allowing a reduction in the routing information to be maintained at the top level. From the routing perspective, route summarization offers the following advantages: • It reduces the amount of information stored in routing tables.
• It simplifies the routing process, thus reducing the load on router resources (e.g., processor and memory).
• It improves network convergence time and isolates topology changes.
Without summarization, every router would need to have a route to every subnet in the network environment. The larger the network becomes, the more crucial route summarization is.
3.7.4.1 Routing Protocol Requirements for Deploying VLSM
Now that the advantages of VLSM are clear, let us analyze what features routing protocols have to offer in order to provide support for VLSM deployment. The first requirement is that routing protocols must carry extended network prefix information along with each route advertisement. This allows each subnetwork to be advertised with its corresponding prefix length or mask. As said before (in Section 3.5), some protocols, such as OSPF, Integrated IS-IS, and RIPv2, provide this feature. RIPv1, on the other hand, allows only a single subnet mask to be used within each network number, because it does not provide subnet mask information as part of its routing table update messages. In this case, a router would have to either guess that the locally configured prefix length should be used (but this cannot guarantee that the correct prefix will be applied) or perform a lookup in a statically configured prefix table containing all the masking information (but static tables raise severe scalability issues, require nonnegligible effort for maintenance, and are error-prone). As a result, to successfully deploy VLSM in a large complex network, the designer must choose an IGP such as OSPF, IS-IS, or RIPv2, while RIPv1 should be avoided. The second requirement for deploying VLSM is that all routers must adopt a consistent forwarding algorithm based on the longest match. When VLSM is implemented, it may happen that a destination address matches multiple routes in a router’s routing table. As a route with a longer extended network prefix is more specific than a route with a shorter one, when forwarding traffic, routers must always choose the route with the longest matching extended network prefix. The third requirement to support VLSM is related to route aggregation and consists of assigning addresses so that they have topological significance. This means that addresses have to reflect the hierarchical network topology. In general, network topology follows continental and national boundaries, so IP addresses should be assigned on this basis. If the organizational topology does not match the network topology, route aggregation should not be applied. While, in fact, it is reasonable to aggregate a pool of addresses assigned to a particular region of the network into a single routing advertisement, it is not meaningful to group together addresses that are not topologically significant. Wherever route aggregation cannot be applied, the size of the routing tables cannot be reduced. The solution that allows today’s Internet to operate normally despite the problems related to the depletion of IPv4 addressing space and to the growing size of Internet routing tables is classless interdomain routing (CIDR), which is described below.
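The longest-match forwarding rule stated in the second requirement above can be sketched as follows; the routing table contents are illustrative.

import ipaddress

# Sketch: longest-match forwarding. When a destination matches several routes,
# the route with the longest (most specific) extended network prefix wins.

routing_table = {
    "10.0.0.0/8":   "via R1",
    "10.1.0.0/16":  "via R2",
    "10.1.32.0/20": "via R3",
}

def lookup(destination):
    dest = ipaddress.ip_address(destination)
    matches = [
        (net, next_hop)
        for net, next_hop in ((ipaddress.ip_network(p), nh) for p, nh in routing_table.items())
        if dest in net
    ]
    if not matches:
        return None
    # Most specific route = largest prefix length.
    return max(matches, key=lambda m: m[0].prefixlen)

print(lookup("10.1.33.7"))   # matches all three entries; the /20 (via R3) is chosen
print(lookup("10.2.0.1"))    # only the /8 matches; via R1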
3.7.5 Classless Interdomain Routing
CIDR, also called supernetting, relaxes the traditional rules of classful IP addressing: with CIDR, the netID is no longer constrained to 8, 16, or 24 bits, but may be any number of bits long. This realizes the so-called classless addressing [Hinden93][Rekht93a][Fuller93][Rekht93b]. In the CIDR model, a prefix length specifies the number of leading bits in the 32-bit address that represent the network portion of the address. For example, the network address in the dotted decimal form a.b.c.d/21 indicates that the first 21 bits specify the netID, while the remaining 11 bits identify the specific hosts in the organization. As a result, CIDR supports the deployment of arbitrarily sized networks rather than the standard 8-bit, 16-bit, or 24-bit network numbers associated with classful addressing. Moreover, the rightmost 11 bits could be further divided through subnetting [Mogul85], so that new internal networks within the a.b.c.d/21 network can be created. For route advertising, instead of the traditional high-order bits scheme, prefix length is used. Routers supporting CIDR therefore rely on the prefix length information provided with the route.
FIGURE 3.7 CIDR and route summarization.
CIDR also supports route aggregation. As a result, with CIDR a single routing table entry can represent the address space of thousands of traditional classful routes. That is, in route advertisement, networks can be combined into supernets, as long as they have a common network prefix (Figure 3.7). This is crucial to reduce the size of Internet backbone router routing tables and to simplify routing management. The implementation of CIDR in the Internet is mainly based on the BGP-4 protocol. Internet is divided into addressing domains in a hierarchical way. Within a domain, detailed information about all the networks belonging to the domain is available, while outside the domain only the common network prefix is advertised. This allows a single routing table entry to specify a route to many individual network addresses. CIDR and VLSM are similar, since both allow a portion of the IP address space to be recursively divided into smaller pieces. Both approaches require that the extended network prefix information be provided with each route advertisement and use longest matching for addresses. The key difference between VLSM and CIDR is a matter of where recursion is performed. In VLSM the subdivision of addresses is done after the address range is assigned to the user (e.g., a private enterprise’s network). In CIDR the subdivision of addresses is done by the Internet authorities and ISPs before the user receives the addresses. CIDR deployment also imposes the same routing protocol requirements as VLSM. Although CIDR, in combination with network address translation (NAT), represents an acceptable short-term solution to today’s Internet deficiencies, the long-term solution is to redesign the address format to allow for more possible addresses and more efficient routing on the Internet. This is the reason for a new version of the IP, called IP version 6 (IPv6) [Deeri98], which was devised to overcome the limitations of IPv4.
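Route aggregation under CIDR can be demonstrated with Python's standard ipaddress module: four contiguous /24 networks sharing a common 22-bit prefix collapse into a single /22 advertisement. The prefixes are illustrative.

import ipaddress

# Sketch: CIDR route aggregation (supernetting). Four contiguous /24 networks
# share a common 22-bit prefix, so they can be advertised as a single /22 route.

internal_routes = [
    ipaddress.ip_network("192.168.0.0/24"),
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("192.168.2.0/24"),
    ipaddress.ip_network("192.168.3.0/24"),
]

aggregate = list(ipaddress.collapse_addresses(internal_routes))
print([str(n) for n in aggregate])      # ['192.168.0.0/22']

# Outside the addressing domain only the aggregate is advertised; inside it,
# the individual /24 entries are still used to reach each network.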
3.8 IPv6 Overview
IPv6 [Deeri98], the new version of the IP, solves the problem of a limited number of available IPv4 addresses and also adds many improvements in areas such as routing, network self-configuration, and QoS support. The IPv6 protocol has been streamlined to expedite packet handling by network nodes and provides support for congestion control, reducing the need for reliable, but untimely, higher-level protocols (e.g., TCP). Moreover, IPv6 characteristics can be exploited inside routers to entrust them with the task of providing diversified scheduling for real-time and non-real-time flows. The IPv6 header is different from the IPv4 header format, and this makes the two protocols incompatible. Although the IPv6 addresses are four times longer than the IPv4 addresses, the IPv6 header is only twice the size of the IPv4 header, as several functions present in the IPv4 header have been relocated to extension headers or dropped [Deeri98].
For example, in IPv6 there is no checksum field. This speeds up routing, as routers are relieved from recalculating the checksum for each incoming packet. IPv6 also provides the hop-limit field, which indicates the number of hops for which the packet remains valid. Used to limit the impact of routing loops, the hop limit field is decremented by 1 by each node that forwards the packet. The packet is discarded if the hop limit reaches zero. In IPv4 there was a time-to-live field, but it was expressed in seconds. The change from seconds to a number of hops reduces the processing time within routers. Another difference is that in IPv6 the minimum header length is 40 bytes and can be supplemented by a number of extension headers of variable length. For example, the payload length in bytes is specified in the payload length field. Note that IPv6 options can be of arbitrary length (and not limited to 40 bytes, as in IPv4). Another interesting field in the IPv6 header is the flow label [Partri95], which enables the source to label packets for special handling by intermediate systems. The important features of IPv6 can therefore be summarized as follows [Deeri98]:
• A new addressing scheme
• No fragmentation/reassembly at the intermediate routers: These time-consuming operations are left to the source and destination, thus improving IP forwarding within the network.
• Simplified header and fixed-length options in the header: Compared to IPv4, these features reduce the processing overhead on routers.
• Support for extension headers and options: IPv6 options are placed in separate headers located in the packet between the IPv6 header and the transport layer header. Most IPv6 option headers are not processed by any router along a path before the packet arrives at its final destination, and this improves router performance for packets containing options.
• Quality-of-service capabilities: These are crucial for the future evolution of the Internet. A new capability enables labeling of packets belonging to specific traffic flows for which the sender has requested special handling, such as nondefault quality of service or real-time service.
• Support for authentication and privacy: Through an extension that provides support for authentication and data integrity.
• Support for source routes: IPv6 includes an extended source routing header designed to support source-initiated selection of routes (used to complement the route selection provided by existing routing protocols for both interdomain and intradomain routes).
Some extension headers are exploited for routing purposes. The routing header is used by a source to list one or more intermediate nodes to be visited on the way to a packet’s destination (more details will be given below). This particular form of the routing header is designed to support source demand routing (SDR) [Estrin96]. The hop-by-hop options header is used to carry optional information that must be examined by every node along a packet’s delivery path. Finally, the end-to-end options header is used to carry optional information that needs to be examined only by a packet’s destination node (or nodes).
3.8.1 IPv6 Addressing, Subnetting, and Routing
The address field in IPv6 is 128 bits long. This not only provides a far larger number of possible IP addresses than the 32-bit IPv4 address space, but also allows more levels of addressing hierarchy and simpler autoconfiguration. IPv6 addresses are assigned to interfaces, not nodes. A node can have several interfaces, and it can therefore be identified by any of the unicast addresses assigned to any of its interfaces. IPv6 supports unicast, anycast, and multicast addresses. With unicast addressing, the packet is delivered to the interface identified by the specific address. With anycast addressing, the packet is delivered to one of the interfaces identified by that address, i.e., the nearest one according to the routing metric adopted. Anycast can be considered
a refinement of unicast devised to simplify and streamline the routing process. When used as part of a route sequence, anycast addresses permit a node to select through which of several ISPs it wants its traffic to be carried. This capability is referred to as source-selected policies. Finally, with multicast addressing, the packet is delivered to all the interfaces identified by that address. With multicast and anycast addressing, the addressed interfaces typically belong to different nodes. An IPv6 address can be expressed in three different ways: preferred, compressed, and mixed. The preferred form is the full IPv6 address in hexadecimal values, H:H:H:H:H:H:H:H, where each H refers to a hexadecimal integer (16 bits). The compressed form substitutes zero strings with a shorthand indicator — double colons (::) — to compress arbitrary-length strings of zeros. This is useful, as IPv6 addresses containing long strings of zeros are quite common. The mixed form is represented as H:H:H:H:H:H:D.D.D.D, where the Hs represent the hexadecimal values of the six high-order 16-bit parts of the address, while the Ds stand for the standard IPv4 decimal value representation of the four low-order 8-bit parts of the address. This mixed form is useful in hybrid IPv4/IPv6 environments. There are two special addresses, the unspecified address 0:0:0:0:0:0:0:0 and the loopback address. The first indicates the absence of an address; it must never be assigned to any node or used as a destination address. The second is the special unicast address 0:0:0:0:0:0:0:1, which may be used by a node to send a packet to itself. There are six types of unicast IPv6 addresses. Here we only mention aggregatable global unicast addresses, which can be routed globally on the IPv6 backbone, i.e., the 6Bone, and are equivalent to public IPv4 addresses. Aggregatable global unicast addresses can be aggregated or summarized to produce an efficient routing infrastructure. IPv6 subnetting can be compared to classless addressing. As in the CIDR notation, the prefix length indicates the leading bits that constitute the netID. IPv6 routing is hierarchical and reflects the classless concept. However, with IPv6, small to regional network service providers and end users are no longer able to obtain address space directly from numbering authorities such as IANA [IANA]. Only top-level aggregators (TLAs) (i.e., large ISPs) will be assigned address space from the Internet Registry. TLAs will be assigned address blocks, which they will in turn manage and delegate to their downstream connections, i.e., next-level aggregators (NLAs) (medium-size ISPs or specific customer sites) and site-level aggregators (SLAs) (individual organizations) [3COM]. With this new hierarchical architecture, the number of entries to be maintained in the routing tables of Internet core routers is reduced, thus limiting the routing complexity of the future Internet. IPv6 embeds simple routing extensions that support powerful new routing functionalities, such as:
• Provider selection (according to some criteria such as policy, performance, cost, etc.)
• Host mobility (route to current location)
• Auto-readdressing (route to new address)
The new routing functionality is obtained by creating sequences of IPv6 addresses using the IPv6 routing option. The routing option is used by an IPv6 source to list one or more intermediate nodes to be visited on the way to a packet's destination. (This function resembles IPv4's loose source and record route option.)
To enable address sequences, IPv6 hosts are in most cases required to reverse the route contained in a packet they receive (provided the packet was successfully authenticated using the IPv6 authentication header) in order to return a packet to its originator. The address sequence facility of IPv6 is simple but powerful. As an example, if host H1 were to enforce a policy that all packets to/from host H2 should only go through a given provider ISPx, it would construct a packet containing the following address sequence: H1, ISPx, H2. This ensures that when H2 replies to H1, it will reverse the route and the reply will go through ISPx. The addresses in H2's reply would be: H2, ISPx, H1.
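As a quick illustration of the preferred, compressed, and mixed notations described above, the following sketch uses Python's standard ipaddress module; the addresses themselves are arbitrary documentation examples, not values from the text.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(addr.exploded)     # preferred form: 2001:0db8:0000:...:0001
print(addr.compressed)   # compressed form: 2001:db8::1

# Mixed form embeds an IPv4 address in the four low-order bytes
mixed = ipaddress.IPv6Address("::ffff:192.0.2.1")   # IPv4-mapped address
print(mixed.ipv4_mapped)                            # 192.0.2.1

# The prefix length plays the same role as the CIDR netID
net = ipaddress.IPv6Network("2001:db8:abcd::/48")
print(net.num_addresses)   # 2**80 interface identifiers in this prefix
```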
FIGURE 3.8 Dual stack.
3.8.2 IPv6 Deployment in the Current Internet: State of the Art and Migration Issues
3.8.2.1 Transition from IPv4 to IPv6
The two versions of IP are not compatible, so they cannot coexist in the same subnet. This makes the spread of the protocol difficult: no company, organization, or university would switch to IPv6 if it meant turning the network off until all the nodes and routers are updated. To cope with these problems, the IETF has standardized an almost painless transition mechanism called SIT (simple Internet transition) [Gillig96]. The new protocol has been provided with properties that allow a simple, fast transition and mechanisms that allow the two protocols to coexist. IPv6 thus has the following properties:
• Incremental update: IPv4 nodes can be updated to IPv6 one at a time.
• Minimal dependence in update operations: The only requirement is that before performing a transition operation on a host, it has to be performed on the Domain Name Service (DNS) server; there are no requirements for transition on routers.
• Easy addressing: At the moment of transition, the same addresses can be used simply by transforming them. A class of addresses that exploits IPv4 addresses (IPv4 compatible) has been provided.
• Low initialization costs: Little effort is required to update IPv4 to IPv6 and initialize new systems supporting IPv6.
In addition, two cooperation mechanisms have been provided: dual stack and tunneling.
3.8.2.1.1 Dual Stack
Dual-stack gateways support both IPv4 and IPv6, implementing the two protocols completely, so as to allow IPv6 nodes to correctly receive traffic from nodes using IPv4 (as shown in Figure 3.8). These gateways receive IPv4 packets, replace the header with an IPv6 one, and forward them to the IPv6 subnet. Dual-stack nodes have at least two addresses, an IPv4 one and an IPv6 one, which can be related; for example, the IPv6 address may be IPv4 compatible, but not necessarily.
3.8.2.1.2 Tunneling
This mechanism actually predates IPv6; it was devised to solve certain communication problems in networks using different protocols. It consists of creating a virtual tunnel between two network nodes. In practice, a whole packet arriving at the first node (from the network header onward) is inserted into the payload of another packet. In IPv6 the routers providing tunneling are similar to the dual-stack gateways discussed above. They encapsulate IPv6 packets in IPv4 packets, as can be seen in Figure 3.9. This allows IPv6 packets to pass through networks that do not support this protocol. At the destination there obviously has to be a router to perform the inverse operation, i.e., to open the IPv4 packets, extract the IPv6 packets, and send them to the destination subnet. This mechanism is shown in Figure 3.10, where routers RA and RB handle the tunnel. Tunneling is fundamental for the worldwide spread of IPv6: it allows communication between IPv6 islands, i.e., the subnets using IPv6 instead of IPv4, through the IPv4 network. There are two kinds of tunneling:
FIGURE 3.9 Tunneling.
FIGURE 3.10 Tunneling.
• Configured tunneling: When the packet's destination is not at the end of the tunnel. There are two modes:
  • Router to Router: The tunnel interconnects two IPv6 routers that are neither the source nor the destination of the IPv6 packet. In this case, the tunnel is part of the path the packet has to cover.
  • Host to Router: The tunnel interconnects an IPv6 host, which is the source of the packet, to an IPv6 router, which is not the packet's destination. In this case, the tunnel is the first part of the path the packet has to cover.
• Automatic tunneling: When the packet's destination is at the end of the tunnel. Again, there are two modes:
  • Host to Host: The tunnel connects two IPv6 hosts; the first is the source of the packet and the second its destination. In this case, the tunnel is the whole path the packet has to cover.
  • Router to Host: The tunnel connects an IPv6 router and an IPv6 host; the former is not the source of the packet, but the latter is its destination. In this case, the tunnel is the final part of the path the packet has to cover.
3.8.2.2 An Experimental Network: The 6Bone
The first lab experiments on networking solutions based on IPv6 soon led to worldwide wide-area experimentation and the introduction in 1996 of the 6Bone network [6Bone]. The 6Bone (IPv6 backbone) is an experimental IPv6 network, parallel to the IPv4 Internet and realized by interconnecting IPv6 labs via tunneling. It became a reality in March 1996 with the setting up of a first tunnel between the IPv6 labs of G6 [G6] (France), UNI-C [UNI-C] (Denmark), and WIDE [WIDE] (Japan). The 6Bone has seen continuous growth in the number of interconnected labs and is the environment in which the most interesting IPv6 protocol experiments are being carried out: verification of the maturity of implementations, handling of the addressing spaces assigned to experimental providers, IPv6 routing, etc. The network is organized as a three-layer hierarchy. At the highest level are the sites making up the 6Bone backbone, i.e., the portion of the network that provides most of the geographical connectivity enjoyed by the other connected sites. At the next level down are the so-called 6Bone transit sites, i.e., sites that are connected to at least one of the backbone sites but which, in turn, operate as network access points for nodes that do not have a direct tunnel toward the backbone. The latter nodes form the lowest level in the current 6Bone hierarchy and are called leaves. Connectivity between the backbone sites is ensured by a large number of tunnels on the Internet and some direct links forming an arbitrary mesh topology within which the routing of IPv6 packets is based
on the BGPv4+ dynamic routing protocol [Bates00] (a version of BGP4 that is capable of supporting both IPv4 and IPv6).
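Returning to the tunneling mechanism of Section 3.8.2.1.2: conceptually, the entry router simply wraps the complete IPv6 packet in an IPv4 packet whose protocol field is 41 (IPv6 encapsulation), and the exit router strips the outer header again. The sketch below shows only this framing step; the IPv4 checksum is left at zero and the tunnel endpoint addresses are placeholders, so this is an illustration of the idea rather than a usable tunnel implementation.

```python
import socket
import struct

IPPROTO_IPV6 = 41   # protocol number used for IPv6-in-IPv4 encapsulation

def encapsulate(ipv6_packet: bytes, tunnel_src: str, tunnel_dst: str) -> bytes:
    """Prepend a minimal IPv4 header (no options, checksum omitted in this sketch)."""
    total_len = 20 + len(ipv6_packet)
    ipv4_header = struct.pack("!BBHHHBBH4s4s",
                              (4 << 4) | 5,      # version 4, IHL 5 words
                              0,                 # ToS
                              total_len,
                              0, 0,              # identification, flags/fragment
                              64,                # TTL
                              IPPROTO_IPV6,      # inner protocol = IPv6
                              0,                 # header checksum (not computed here)
                              socket.inet_aton(tunnel_src),
                              socket.inet_aton(tunnel_dst))
    return ipv4_header + ipv6_packet

def decapsulate(ipv4_packet: bytes) -> bytes:
    """At the tunnel exit, drop the outer IPv4 header and recover the IPv6 packet."""
    ihl_words = ipv4_packet[0] & 0x0F
    return ipv4_packet[ihl_words * 4:]

outer = encapsulate(bytes(40), "192.0.2.1", "198.51.100.1")   # placeholder inner packet
assert decapsulate(outer) == bytes(40)
```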
3.9 Conclusions
The adoption of IPv6 has been limited up to now, as it requires modification of the entire infrastructure of the Internet. More effort on the transition is therefore necessary to make it as simple as possible and open the way for the potential of IPv6. IPv6 is expected to gradually replace IPv4, with the two coexisting for a number of years during a transition period, thanks to tunneling and dual-stack techniques. Currently, IPv6 is implemented on the 6Bone network [6Bone], a collaborative project involving Europe, North America, and Japan. Studies about how to exploit novel IPv6 features have already appeared, and companies are also interested in IPv6 technology to overcome IPv4 limitations [Ficher03], [LoBello03], [VaSto03]. Migration from IPv4 to IPv6 is expected to be gradual, due to several factors that slow the process, for example:
• Increased memory requirements for network addresses in intermediate devices, such as routers, switches, etc.
• The extra load on domain name systems (DNSs), which need to maintain and provide both of the addresses that each IPv6 host will have during the transition, i.e., an IPv4 32-bit address and an IPv6 128-bit one
• The need to redesign the user interfaces of current TCP/IPv4 applications and services, which are based on traditional 32-bit addresses and therefore have to be adapted to work with the larger IPv6 addresses
Nevertheless, all the major router vendors have already started to enable IPv6 implementation on their systems. Among the routing protocols supporting IPv6 are RIPng [Malkin97], OSPFv3 [Coltun99], Integrated IS-ISv6 [Hopps03], and MP-BGPv6 [Marque99]. More details can be found in [Cisco02b].
References
[3COM] Understanding IP Addressing, White Paper, 3COM Corporation, www.3com.com.
[6Bone] http://www.6bone.net.
[Awdu02] D. Awduche, A. Chiu, A. Elwalid, I. Widjaja, X. Xiao, RFC 3272, Overview and Principles of Internet Traffic Engineering, May 2002.
[Bates00] T. Bates, Y. Rekhter, R. Chandra, D. Katz, RFC 2858, Multiprotocol Extensions for BGP-4, June 2000.
[Brade87] R. Braden, J. Postel, RFC 1009, Requirements for Internet Gateways, June 1987.
[CBT] RFC 2189, Core-Based Tree (CBT Version 2) Multicast Routing: Protocol Specification, September 1997.
[CBTv2] RFC 2201, Core-Based Tree (CBT) Multicast Routing Architecture, September 1997.
[Chen99] E. Chen, J. Stewart, RFC 2519, A Framework for Inter-Domain Route Aggregation, February 1999, ftp://ftp.rfc-editor.org/in-notes/rfc2519.txt.
[Cisco97] Cisco Systems, Integrating Enhanced IGRP into Existing Networks, 1997, http://www.cisco.com (search for the document title).
[Cisco02a] Cisco Systems, Inc., Enhanced IGRP, http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/en_igrp.htm.
[Cisco02b] Cisco IOS Learning Services, The ABCs of IP Version 6, 2002, www.cisco.com/go/abc.
[Cisco03] Cisco Systems, Inc., Internetworking Technologies Handbook, Cisco Press, Indianapolis, 2003.
[Callon90] R.W. Callon, RFC 1195, Use of OSI IS-IS for Routing in TCP/IP and Dual Environments, December 1990.
[Coltun99] R. Coltun, D. Ferguson, J. Moy, RFC 2740, OSPF for IPv6, December 1999.
[Dalal78] Y.K. Dalal and R.M. Metcalfe, Reverse path forwarding of broadcast packets, Communications of the ACM, 21(12), 1040–1048, December 1978.
[Deeri88] S. Deering, Multicast routing in internetworks and extended LANs, ACM Computer Communication Review, 18(4), Proceedings of ACM SIGCOMM'88, pp. 55–64, Stanford, Aug. 16–19, 1988.
[Deeri98] S. Deering, R. Hinden, RFC 2460, Internet Protocol, Version 6 (IPv6) Specification, December 1998.
[Dijks59] E.W. Dijkstra, A note on two problems in connection with graphs, Numer. Math., 1, 269–271, 1959.
[Droms97] R. Droms, RFC 2131, Dynamic Host Configuration Protocol, March 1997.
[Egeva94] K. Egevang, P. Francis, RFC 1631, The IP Network Address Translator (NAT), May 1994.
[Estrin96] D. Estrin, T. Li, Y. Rekhter, K. Varadhan, D. Zappala, RFC 1940, Source Demand Routing: Packet Format and Forwarding Specification (Version 1), May 1996.
[Estrin98] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei, RFC 2362, Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification, June 1998.
[Fenne97] W. Fenner, RFC 2236, Internet Group Management Protocol, Version 2, November 1997.
[Ficher03] S. Fichera, S. Visalli, O. Mirabella, QoS Support for Real-Time Flows in Internet Routers, paper presented at RTLIA '03, 2nd International Workshop on Real-Time LANs in the Internet Age, Satellite Workshop of the 15th Euromicro Conference on Real-Time Systems (ECRTS03), Porto, Portugal, June 2003.
[Floyd97] S. Floyd, V. Jacobson, Synchronization of periodic routing messages, IEEE/ACM Transactions on Networking, 2, 122–136, 1997.
[Fuller93] V. Fuller, T. Li, J. Yu, K. Varadhan, RFC 1519, Classless Inter-Domain Routing (CIDR): An Address Assignment and Aggregation Strategy, September 1993.
[G6] http://www.g6.asso.fr.
[Garcia93] J.J. Garcia-Luna-Aceves, Loop-free routing using diffusing computations, IEEE/ACM Transactions on Networking, 1, 130–141, 1993.
[Gillig96] R. Gilligan, E. Nordmark, RFC 1933, Transition Mechanisms for IPv6 Hosts and Routers, April 1996.
[Hakimi71] S.L. Hakimi, Steiner's problem in graphs and its implications, Networks, 1, 113–133, 1971.
[Halabi00] B. Halabi, D. McPherson, Internet Routing Architectures, Cisco Press, Indianapolis, 2000.
[Hedri88] C.L. Hedrick, RFC 1058, Routing Information Protocol, June 1988.
[Hedri91] C.L. Hedrick, An Introduction to IGRP, August 1991, http://www.cisco.com/warp/public/103/5.html.
[Hinden93] R. Hinden, Editor, RFC 1517, Applicability Statement for the Implementation of Classless Inter-Domain Routing (CIDR), September 1993.
[Hopps03] C.E. Hopps, Routing IPv6 with IS-IS, January 2003, draft-ietf-isis-ipv6-05.txt.
[Huitem95] C. Huitema, Routing in the Internet, Prentice Hall, 1995.
[IANA] Internet Assigned Number Authority homepage, http://www.iana.org/.
[Kenyon02] T. Kenyon, Data Networks, Digital Press, Elsevier Science, 2002.
[Kumar96] V. Kumar, Mbone: Interactive Media on the Internet, New Riders Publishing, Indianapolis, 1996.
[Kurose01] J.F. Kurose, K. Ross, Computer Networking, Addison-Wesley, Reading, MA, 2001.
[Lewis99] C. Lewis, Cisco TCP/IP Routing Professional Reference, McGraw-Hill Companies, New York, 1999.
[LoBello03] L. Lo Bello, S. Fichera, S. Visalli, O. Mirabella, Congestion Control Mechanisms for Multi-Hop Network Routers, paper presented at IEEE International Conference on Emerging Technologies and Factory Automation ETFA2003, Lisbon, Portugal, October 2003.
[Malkin97] G. Malkin, R. Minnear, RFC 2080, RIPng for IPv6, January 1997.
[Malkin98] G. Malkin, RFC 2453/STD 0056, RIP Version 2, November 1998.
[Marque99] P. Marques, F. Dupont, RFC 2545, Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing, March 1999.
[Mogul85] J. Mogul, J. Postel, RFC 950, Internet Standard Subnetting Procedure, August 1985.
[Moy89] J. Moy, RFC 1131, OSPF Specification, October 1989.
[Moy88] J. Moy, RFC 2328, OSPF Version 2, April 1998.
[Moy94] J. Moy, RFC 1584, Multicast Extensions to OSPF, March 1994.
[Nicho03] J. Nicholas, W. Siadak, Protocol Independent Multicast — Dense Mode (PIM-DM): Protocol Specification (Revised), September 2003, draft-ietf-pim-dm-new-v2-04.txt.
[Partri95] C. Partridge, RFC 1809, Using the Flow Label Field in IPv6, June 1995.
[Pepel00] I. Pepelnjak, EIGRP Network Design Solutions, Cisco Press, Indianapolis, 2000.
[Perlm92] R. Perlman, Interconnections: Bridges and Routers, Addison-Wesley, Reading, MA, 1992.
[Poste81] J. Postel, RFC 791, Internet Protocol, September 1981.
[Rekht93a] Y. Rekhter, T. Li, RFC 1518, An Architecture for IP Address Allocation with CIDR, September 1993.
[Rekht93b] Y. Rekhter, C. Topolcic, RFC 1520, Exchanging Routing Information across Provider Boundaries in the CIDR Environment, September 1993.
[Rekhr95] Y. Rekhter, T. Li, RFC 1771, A Border Gateway Protocol 4 (BGP-4), March 1995.
[Rosen82] E.C. Rosen, RFC 0827, Exterior Gateway Protocol, October 1982.
[Schulz03] H. Schulzrinne, S. Casner, R. Frederick, RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003.
[Trotter01] G. Trotter, RFC 3222, Terminology for Forwarding Information Base (FIB) Based Router Performance, December 2001.
[UNI-C] http://www.uni-c.dk.
[VaSto03] P. Van der Stok, M. van Hartskamp, Robust Real-Time IP-Based Multimedia Communication, paper presented at RTLIA '03, 2nd International Workshop on Real-Time LANs in the Internet Age, Satellite Workshop of the 15th Euromicro Conference on Real-Time Systems (ECRTS03), Porto, Portugal, June 2003.
[Waitz88] D. Waitzman, C. Partridge, RFC 1075, Distance Vector Multicast Routing Protocol, November 1988.
[Wall80] D. Wall, Mechanisms for Broadcast and Selective Broadcast, Ph.D. Dissertation, Stanford University, June 1980.
[Waxm88] B.M. Waxman, Routing of multipoint connections, IEEE Journal of Selected Areas in Communications, 6, 1617–1622, 1988.
[Wei93] L. Wei, D. Estrin, TR USC-CD-93-560, A Comparison of Multicast Trees and Algorithms, Department of Computer Science, University of Southern California, Los Angeles, September 1993.
[WIDE] http://6bone.v6.wide.ad.jp.
4 Fundamentals in Quality of Service and Real-Time Transmission

Wolfgang Kampichler, Frequentis GmbH

4.1 What Is Quality of Service?
4.2 Factors Affecting the Network Quality
    Bandwidth • Throughput • Latency • Queuing Delay • Transmission Delay • Propagation Delay • Processing Delay • Jitter • Packet Loss
4.3 QoS Delivery
    FIFO Queuing • Priority Queuing • Class-Based Queuing • Weighted Fair Queuing
4.4 Protocols to Improve QoS
    Integrated Services • Differentiated Services • Multi-Protocol Label Switching • Combining QoS Solutions
4.5 Protocols Supporting Real-Time Traffic
    Real-Time Transport Protocol • Real-Time Transport Control Protocol • Real-Time Streaming Protocol
References
4.1 What Is Quality of Service? It is difficult to find an adequate definition of what quality of service (QoS) actually is. There is a danger that because we wish to use quantitative methods, we might limit the definition of QoS to only those aspects of QoS that can be measured and compared. In fact, there are many subjective and perceptual elements to QoS, and there has been a lot of work done trying to map the perceptual to the quantifiable (particularly in the telephony industry). However, as yet there does not appear to be a standard definition of what QoS actually is in measurable terms. When considering the definition of QoS, it might be helpful to look at the old story of the three blind men who happen to meet an elephant on their way. The first man touches the elephant’s trunk and determines that he has stumbled upon a huge serpent. The second man touches one of the elephant’s massive legs and determines that the object is a large tree. The third man touches one of the elephant’s ears and determines that he has stumbled upon a huge bird. All three of the men envision different things, because each man examines only a small part of the elephant. In this case, think of the elephant as a concept of QoS. Different people see QoS as different concepts, because various and ambiguous QoS problems exist. Hence, there is more than one way to characterize QoS. Briefly described, QoS is the
ability of a network element (e.g., an application, host, or router) to provide some level of assurance for consistent and timely network data delivery [3]. By nature, the basic Internet Protocol (IP) service available in most of the network is best effort. For instance, from a router's point of view (upon receiving a packet), this service can be described as follows:
• The router first determines where to send the incoming packet (the next hop of the packet). This is usually done by looking up the destination address in the forwarding table.
• Once it knows the next hop, it sends the packet to the interface associated with this next hop. If the interface is not able to send the packet immediately, the packet is stored on the interface in an output queue.
• If the queue is full, the arriving packet is dropped. If the queue already contains packets, the newcomer is subjected to extra delay due to the time needed to emit the older packets in the queue.
Best effort allows the complexity to stay in the end hosts, so the network can remain relatively simple. This scales well, as evidenced by the ability of the Internet to support its growth. As more hosts are connected, the network degrades gracefully. Nevertheless, the resulting variability in delivery delay and packet loss does not adversely affect typical Internet applications (e.g., e-mail or file transfer). For applications with real-time requirements, however, delay, delay variation, and packet loss will cause problems. Generally, applications are of two main types:
• Applications that generate elastic traffic — The application would rather wait for reception of traffic in the correct order, without loss, than display incoming information at a constant rate (such as e-mail).
• Applications that generate inelastic traffic — Timeliness of information is more important to the application than zero loss, and traffic that arrives after a certain delay is essentially useless (such as voice communication).
In an IP-based network, applications run across User Datagram Protocol (UDP) or Transmission Control Protocol (TCP) connections. TCP guarantees delivery, doing so through some overhead and session layer sequencing of traffic. It also throttles back transmission rates to behave gracefully in the face of network congestion. By contrast, UDP is connectionless; thus, no guarantee of delivery is made, and sequencing of information is left to the application itself. Most elastic applications use TCP for transmission, whereas many inelastic applications use UDP as a real-time transport. Inelastic applications are often those that demand a preferential class of service or some form of reservation to behave properly. However, many of the mechanisms that network devices use (such as traffic discard or TCP session control) are less effective on UDP-based traffic, since UDP does not offer TCP's self-regulation. Common to all packets is that they are treated equally: there are no guarantees, no differentiation, and no attempt at enforcing fairness. However, the network should try to forward as much traffic as possible with reasonable quality. One way to provide a guarantee to some traffic is to treat those packets differently from packets of other types of traffic. Increasing bandwidth is seen as a necessary first step for accommodating real-time applications, but it is still not enough. Even on a relatively unloaded network, delivery delays can vary enough to continue to affect time-sensitive applications adversely.
To provide an appropriate service, some level of quantitative or qualitative determinism must be added to network services. This requires adding some "intelligence" to the net to distinguish traffic with strict timing requirements from other traffic. Yet there remains a further challenge: in the real world, the end-to-end communication path consists of different elements utilizing several network layers and traversing domains managed by different service providers. Therefore, it is unlikely that QoS protocols will be used independently; in fact, they are designed for use with other QoS technologies to provide top-to-bottom and end-to-end QoS between senders and receivers. What matters is that each element has to provide QoS control services and the ability to map to other QoS technologies in the correct manner. The following gives a brief overview
FIGURE 4.1 Network end-to-end communication path.
of end-to-end network behavior and some key QoS protocols and architectures. For a detailed description, refer to other chapters in this book.
4.2 Factors Affecting the Network Quality
A typical end-to-end communication path might look like that illustrated in Figure 4.1 and consist of two machines, each connected through a local area network (LAN) to an enterprise network. Further, these networks might be connected through a wide area network (WAN). The data exchange can be anything from a short e-mail message to a large file transfer, an application download from a server, or communication data from a time-sensitive application. While networks, especially local area networks, have been getting faster, perceived throughput at the application has not always increased accordingly. An application generally runs on a host CPU, and its performance is a function of the processing speed, memory availability, and overall operating system load. In many situations, it is this processing that is the real limiting factor on throughput, rather than the infrastructure that is moving data [14]. Network interface hardware transfers incoming packets from the network to the computer's memory and informs the operating system that a packet has arrived. Usually, the network interface uses the interrupt mechanism to do so. The interrupt causes the CPU to suspend normal processing temporarily and to jump to code called the device driver. The device driver informs the protocol software that a packet has arrived and must be processed. Similar operations occur in each intermediate network node. Routing devices pass packets along a chain of hops until the final address is reached. These hops are routing machines of various kinds that generally maintain a queue (or multiple queues) of outgoing packets on each outgoing physical port [2]. If these queues of outgoing data packets become full, a routing machine simply starts to discard packets to ease the buildup of congestion. It is evident that such nodes are customized for forwarding operations, which are mostly processed in hardware. In recent years, however, the Internet has seen increasing use of applications that rely on the timely, regular delivery of packets and that cannot tolerate the loss of packets or the delay caused by waiting in queues. In general, the one-way delay is the sum of the single-hop delays suffered between each pair of consecutive pieces of equipment encountered on the path. Measurable factors [7], [8] that are used to describe network QoS are as follows.
4.2.1 Bandwidth
Bandwidth (better described as data rate in this context) is the transmission capacity of a communications line, usually stated in bit/second. In reality, as data exchange approaches this maximum limit (in a shared environment), delays and collisions might mean a drop in quality. Basically, the bandwidths of all networks utilized in an end-to-end path need to be considered, as the narrowest section determines the maximum data transfer rate for the entire path. A routing device needs to be capable of transmitting data at a rate commensurate with the potential bandwidth of the network segments that it is servicing. The cost of bandwidth has fallen in recent years, but demand has obviously gone up.
TABLE 4.1 Queuing Delays

Number of Queued 1000-Bit Packets   STM-1 (155 Mbps)   STM-4 (622 Mbps)   Gigabit Ethernet (1 Gbps)
40 (80% load)                       256 µs             64 µs              40 µs
80 (85% load)                       512 µs             128 µs             80 µs
200 (93% load)                      1280 µs            320 µs             200 µs
500 (97% load)                      3200 µs            800 µs             500 µs
4.2.2 Throughput Throughput is the average of actual traffic transferred over a given link, in a given time span expressed in bit/second. It can be seen, for congestion-aware transport protocols such as TCP, as transport capacity = (data sent)/(elapsed time), where data sent represents the unique data bits transferred (i.e., not including header bits or emulated header bits). It should also be noted that the amount of data sent should only include the unique number of bits transmitted (i.e., if a particular packet is retransmitted, the data it contains should be counted only once). Hence, in such a case, the throughput is not only limited by the transmission window, but also limited by the value of the round-trip time.
4.2.3 Latency
In general, latency is the time taken to transmit a packet from a sending to a receiving node. This encompasses delay in a transmission path or in a device within the transmission path. The nodes might be end stations or intermediate routers. Within a single router, latency is the amount of time between the receipt of a data packet and its transmission, which includes processing and queuing delay, as described next, among other sources of delay.
4.2.4 Queuing Delay
The major random component of delay (that is, the only source of jitter) for a given end-to-end path is the queuing delay in the network. Queuing delay depends on the number of hops in the path and the queuing mechanisms used, and it also increases with the offered load, leading to packet loss if the queues fill up. The last packet in the queue has to wait (N*8)/X seconds before being emitted by the interface, where N is the number of bytes that have to be sent before the last queued packet and X is the sending rate (bit/s). Typical queuing delay values of state-of-the-art routers are summarized in Table 4.1; values are about 0.5 to 1 ms. Thus, queuing delay in a well-dimensioned backbone network (using priority scheduling mechanisms, as described later) would not dramatically increase latency, even if there are five to eight hops within the path. At this point, it should be mentioned that queuing delay may be much larger at edge routers connecting high- and low-bandwidth links and could easily reach tens of milliseconds, thus increasing latency more noticeably.
4.2.5 Transmission Delay
Transmission, or serialization, delay is the time taken to transmit all the bits of the frame containing the packet, i.e., the time between emission of the first bit of the frame and emission of the last bit (see also [4]). It is the ratio between packet size (bits) and transmission rate (bit/s) and is therefore inversely proportional to the line speed. For example, transmission of a 1500-byte packet over a 10-Mbps link takes 1.2 ms, whereas over a 64-kbps link it takes 187.5 ms (the protocol overhead is not considered in either case). In general, a small packet size and a high transmission rate lower the transmission time.
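The queuing and transmission delay formulas above are easy to check numerically; the short script below reproduces one row of Table 4.1 and the 1500-byte examples just given (the packet sizes and link rates are taken from the text).

```python
def queuing_delay(bytes_ahead, rate_bps):
    """Waiting time of the last queued packet: (N * 8) / X seconds (Section 4.2.4)."""
    return bytes_ahead * 8 / rate_bps

def transmission_delay(packet_bytes, rate_bps):
    """Serialization time: packet size in bits divided by the line rate (Section 4.2.5)."""
    return packet_bytes * 8 / rate_bps

# 40 queued 1000-bit (125-byte) packets on an STM-1 (155 Mbps) link:
print(queuing_delay(40 * 125, 155e6) * 1e6, "microseconds")   # ~258, the ~256 entry in Table 4.1

# 1500-byte packet: 1.2 ms on a 10 Mbps link, 187.5 ms on a 64 kbps link
print(transmission_delay(1500, 10e6) * 1e3, "ms")
print(transmission_delay(1500, 64e3) * 1e3, "ms")
```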
4.2.6 Propagation Delay Propagation delay is the time between emission (by the emitting equipment) of the first bit (or the last bit) and the reception of this bit by the receiving equipment. It is mainly a function of the speed of light
and the distance traveled. For local area networks, propagation delay is almost negligible. For wide area connections, it typically adds 2 ms per 250 miles to the total end-to-end delay. One can assume that a well-designed homogeneous high-speed backbone network (e.g., STM-4*) would have a network delay (only propagation and queuing taken into account) of 10 ms when considering 10 hops using priority queuing mechanisms and a network extension of about 625 miles.
4.2.7 Processing Delay
Most networks use a protocol suite that provides connectionless data transfer end to end, in our case IP. Link layer communication is usually implemented in hardware, but IP will usually be implemented in software, executing on the CPU in a communicating end station. Normally, IP performs very few functions. Upon input of a packet, it checks the header for correct form, extracts the protocol number, and calls the upper-layer protocol function. The executed path is almost always the same. Upon output, the operation is very similar, as shown in the following IP instruction counts:
• Packet receipt: 57 instructions
• Packet sending: 61 instructions
Since input occurs at interrupt time, arbitrary procedures cannot be called to process each packet. Instead, the system uses a queue along with message-passing primitives to synchronize communication. When an IP datagram arrives, the interrupt software must enqueue the packet and invoke a send primitive to notify the IP process that a datagram has arrived. When the IP process has no packets to handle, it calls the receive primitive to wait for the arrival of another datagram. Once the IP process accepts an incoming datagram, it must decide where to send it for further processing. If the datagram carries a TCP segment, it must go to the TCP module; if it carries a UDP datagram, it is forwarded to the UDP module. Because TCP is complex, most designs use a separate process to handle incoming segments. A consequence of having separate IP and TCP processes is that they must use an interprocess communication mechanism when they interact. Once TCP receives a segment, it uses the protocol port numbers to find the connection to which the segment belongs. If the segment contains data, TCP will add the data to a buffer associated with the connection and return an acknowledgment to the sender. If the incoming segment carries an acknowledgment for outbound data, the input process must also communicate with the TCP timer process to cancel the pending retransmission. The process structure used to handle an incoming UDP datagram is quite different from that used for TCP. As UDP is much simpler than TCP, the UDP software module does not execute as a separate process. Instead, it consists of conventional procedures executed by the IP process to handle an incoming UDP datagram. These procedures examine the destination UDP port number and use it to select an operating system queue for the incoming datagram. The IP process deposits the UDP datagram on the appropriate port, where an application program can extract it [15].
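The division of labor described above (interrupt-time enqueue, a separate IP process, IPC toward TCP, in-process handling of UDP) can be sketched with ordinary queues and threads. This is only an illustrative model of the structure described in [15]; the datagram representation and port numbers are invented for the example.

```python
import queue
import threading
import time

ip_input = queue.Queue()    # filled at "interrupt time" by the device driver
tcp_input = queue.Queue()   # IPC channel toward the separate TCP process

def interrupt_handler(datagram):
    """Device driver role: enqueue the packet and return immediately."""
    ip_input.put(datagram)

def udp_deliver(datagram):
    """UDP is simple: handled by conventional procedures inside the IP process."""
    print("UDP datagram delivered to port", datagram["dst_port"])

def ip_process():
    """IP process: wait for datagrams and dispatch by protocol number."""
    while True:
        datagram = ip_input.get()
        if datagram["proto"] == 6:      # TCP segment: hand over via IPC
            tcp_input.put(datagram)
        elif datagram["proto"] == 17:   # UDP datagram: direct procedure call
            udp_deliver(datagram)

threading.Thread(target=ip_process, daemon=True).start()
interrupt_handler({"proto": 17, "dst_port": 5004})
time.sleep(0.1)   # give the IP process a moment to run in this toy example
```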
4.2.8 Jitter Jitter is best described as the variation in end-to-end delay, and it has its main source in the random component of the queuing delay. Jitter can be expressed as the distortion of interpacket arrival times when compared to the interpacket departure times from the original sending station. For instance, packets are sent out at regular intervals, but may arrive at varying irregular intervals. Jitter is the variation in interval times. When packets are taking multiple paths to reach their destination, extreme jitter can lead to packets arriving out of order. Jitter is generally measured in milliseconds, or as a percentage of variation from the average latency of a particular connection.
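Jitter is commonly estimated in exactly this way in practice: RTP (RFC 3550) smooths the difference between interarrival and interdeparture spacing with a 1/16 gain. The sketch below follows that formula; the sample send and arrival timestamps are made up for illustration.

```python
def update_jitter(jitter, prev_transit, transit):
    """One step of the RFC 3550 interarrival jitter estimator."""
    d = abs(transit - prev_transit)        # change in (arrival - departure) spacing
    return jitter + (d - jitter) / 16.0    # exponentially smoothed estimate

# departure (send) and arrival times in milliseconds, illustrative values only
send   = [0.0, 20.0, 40.0, 60.0, 80.0]
arrive = [5.0, 26.0, 44.0, 69.0, 85.0]

jitter = 0.0
prev_transit = arrive[0] - send[0]
for s, a in zip(send[1:], arrive[1:]):
    transit = a - s
    jitter = update_jitter(jitter, prev_transit, transit)
    prev_transit = transit
print(round(jitter, 3), "ms")
```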
*Synchronous digital hierarchy (SDH) defines n transport levels (hierarchy) called a synchronous transport module-n (STM-n).
FIGURE 4.2 Classification, queuing, and scheduling.
4.2.9 Packet Loss Packets that fail to arrive, or arrive so late that they are useless, contribute to packet loss. Lost (or dropped) packets are a product of insufficient bandwidth on at least one routing device on the network path. Some packets may arrive, but have been corrupted in transit and are therefore unusable. Note that loss is relative to the volume of data that is sent and is usually expressed as a percentage of data sent. In some contexts, a high loss percentage can mean that the application is trying to send too much information and is overwhelming the available bandwidth. Packet loss starts to be a real problem when the percentage of loss exceeds a specific threshold or when loss occurs in bursts. Thus, it is important to know both the percentages of lost packets and their distribution [5].
4.3 QoS Delivery
As packet-switched networks operate in a store-and-forward paradigm, a solution for service differentiation in the forwarding process is to give priority to packets requiring, for instance, an upper-bounded delay over other packets. Considering that queuing is the central component in the internal architecture of a forwarding device, it is not difficult to imagine that managing such queuing mechanisms appropriately is crucial for providing the underlying QoS. Hence, queuing can be seen as one of the fundamental parts of differentiating service levels. The queuing delay can be minimized and kept under a certain value, even in the case of interface congestion. To achieve this, the forwarding device has to support classification, queuing, and scheduling (CQS) techniques: to classify packets according to traffic type and its requirements, to place packets on different queues according to this type, and finally to schedule outgoing packets by selecting them from the queues in an appropriate manner (see Figure 4.2). The following descriptions of queuing disciplines focus on output queuing strategies, the predominant strategic location for store-and-forward traffic management and QoS-related queuing [3], common to all QoS policies. Queuing should never happen permanently and continuously; instead, it is used to deal with occasional traffic peaks.
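A minimal illustration of the classify-queue-schedule idea: packets are sorted into two queues by a hypothetical real-time flag and the scheduler always drains the real-time queue first. This is a toy model of the CQS stages in Figure 4.2, not any particular router implementation; the packet representation and queue limit are invented for the example.

```python
from collections import deque

real_time, best_effort = deque(), deque()

def classify(packet):
    """Classification stage: pick a queue based on a packet attribute."""
    return real_time if packet.get("rt") else best_effort

def enqueue(packet, limit=100):
    """Queuing stage: finite buffers, packets beyond the limit are dropped."""
    q = classify(packet)
    if len(q) < limit:
        q.append(packet)

def schedule():
    """Scheduling stage: strict priority for the real-time class."""
    if real_time:
        return real_time.popleft()
    if best_effort:
        return best_effort.popleft()
    return None

enqueue({"rt": False, "id": 1})
enqueue({"rt": True, "id": 2})
print(schedule()["id"])   # 2: the real-time packet leaves first
```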
4.3.1 FIFO Queuing
First-in, first-out (FIFO) queuing is considered the standard method for store-and-forward handling of traffic from an incoming interface to an outgoing interface. Many router vendors have highly optimized forwarding paths that make this standard behavior as fast as possible. When a network operates with a sufficient level of transmission capacity and adequate levels of switching capability, FIFO
queuing is highly efficient. This is because, as long as the queue depth remains sufficiently short, the average packet-queuing delay is an insignificant fraction of the end-to-end packet transmission time. When the load on the network increases, however, transient bursts cause significant queuing delay, and if the queue is full, all subsequent packets are discarded.
4.3.2 Priority Queuing
One of the first queuing variations to be widely implemented was priority queuing. It is based on the concept that certain types of traffic can be identified and shuffled to the front of the output queue so that some traffic is always transmitted ahead of other types of traffic. Priority queuing may have an adverse effect on forwarding performance because of packet reordering (non-FIFO queuing) in the output queue. This method offers several levels of priority, and the granularity in identifying traffic to be classified into each queue is very flexible. However, the more differentiation attempted, the more impact on computational overhead and packet-forwarding performance. Another possible vulnerability in this queuing approach is that if the volume of high-priority traffic is unusually high, normal traffic may be dropped because of buffer starvation: too many packets are waiting to be queued and there is not enough room in the queue to accommodate them.
4.3.3 Class-Based Queuing Another queuing mechanism introduced several years ago is called class-based queuing (CBQ) or custom queuing (CQ). Again, this is a well-known mechanism used within operating system design intended to prevent complete resource denial to any particular class of service. CBQ is a variation of priority queuing, where several output queues can be defined. CBQ provides a mechanism to configure how much traffic can be drained off each queue in a servicing rotation. This servicing algorithm is an attempt to provide some semblance of fairness by prioritizing queuing services for certain types of traffic, while not allowing any one class of traffic to monopolize system resources. CBQ can be considered a primitive method of differentiating traffic into various classes of service, and for several years, it has been considered an efficient method for queue resource management. However, CBQ simply does not scale to provide the desired performance in some circumstances, primarily because of the computational overhead concerning packet reordering and intensive queue management in networks with very high speed links.
4.3.4 Weighted Fair Queuing
Weighted fair queuing (WFQ) is another popular method of queuing that algorithmically attempts to deliver predictable behavior and to ensure that traffic flows do not encounter buffer starvation. It gives low-volume traffic flows preferential treatment and allows higher-volume traffic flows to obtain equity in the remaining amount of queuing capacity. WFQ uses a servicing algorithm that attempts to provide predictable response times and to smooth out inconsistent packet transmission timing, which is done by sorting and interleaving individual packets by flow and queuing each flow based on the volume of traffic in the flow [6]. Typically, low-bandwidth streams, such as Voice-over-IP (VoIP), are given priority over larger-bandwidth consumers such as file transfer. The weighted aspect of WFQ refers to the way in which the servicing algorithm is affected by other extraneous criteria. This aspect is usually vendor specific, and at least one implementation uses the IP precedence bits in the type-of-service (ToS, or DiffServ Code Point (DSCP), as described later) field to weight the handling of individual traffic flows. WFQ shares some of the characteristics of priority and class-based queuing — it simply does not scale to provide the desired performance in some circumstances, primarily because of computational overhead. However, if these methods of queuing (priority, CBQ, and WFQ) are moved completely into hardware instead of being done in software, the impact on forwarding performance can be reduced greatly.
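A simplified, self-clocked sketch of the weighted fair queuing idea: each flow's packets receive virtual finish tags proportional to packet length divided by the flow's weight, and the scheduler always transmits the packet with the smallest tag. The flow names and weights are invented for illustration, and real WFQ implementations differ in how they track virtual time.

```python
import heapq

class WfqScheduler:
    def __init__(self, weights):
        self.weights = weights                     # e.g., {"voip": 4.0, "ftp": 1.0}
        self.last_finish = {f: 0.0 for f in weights}
        self.virtual_time = 0.0
        self.heap = []                             # (finish_tag, seq, flow, length)
        self.seq = 0

    def enqueue(self, flow, length):
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + length / self.weights[flow]   # tag grows slower for heavier weights
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, flow, length = heapq.heappop(self.heap)
        self.virtual_time = finish                 # self-clocked approximation of virtual time
        return flow, length

sched = WfqScheduler({"voip": 4.0, "ftp": 1.0})
sched.enqueue("ftp", 1500)     # bulk transfer, large packet
sched.enqueue("voip", 200)     # small voice packet on the higher-weight flow
print(sched.dequeue())         # ('voip', 200) is served first
```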
4.4 Protocols to Improve QoS
Delivering network QoS for a particular application implies minimizing the effects of sharing network resources (bandwidth, routers, etc.) with other applications. This means effective QoS aims to minimize delay, optimize throughput, and minimize jitter and loss. The reality is that network resources are shared with other competing applications. Some of the competing applications could also be time-dependent services (inelastic traffic); others might be the source of traditional, best-effort traffic. For this reason, QoS has the further goal of minimizing the parameters mentioned for a particular set of applications or users, but without adversely affecting other network users. In order to regulate network capacity, the network must classify traffic and then handle it in some way. The classification and handling may happen on a single device consisting of both classifiers and queues or routes. In a larger network, however, it is likely that classification will happen at the periphery, where devices can recognize application needs, while handling is performed at the core, where congestion occurs. The signaling between classifying devices and handling devices can come in a number of ways, like the ToS field of an IP header or other protocol extensions. Classification can occur based on a variety of information sources, such as protocol content, media identifier, the application that generated the traffic, or extrinsic factors such as time of day or congestion levels. Similarly, handling can be performed in a number of ways:
• Through traffic shaping (traffic arrives and is placed in a queue, where its forwarding is regulated; excess traffic will be discarded)
• Through various queuing mechanisms (first-in, first-out, priority weighting, and class-based queuing)
• Through throttling using various flow control algorithms such as those used in TCP
• Through the selective discard of traffic to notify transmitters of congestion
• Through packet marking for sending instructions to downstream devices that will shape the traffic
QoS protocols are designed to act in these ways, but they never create additional bandwidth; rather, they manage the existing bandwidth so that it is used more effectively. Briefly summarized, QoS is the ability of a network element (e.g., an application, host, or router) to provide some level of assurance for consistent and timely network data delivery. The following sections give a brief overview of some of the key QoS protocols and architectures.
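Before turning to the specific architectures, the shaping and policing handling mentioned in the list above is most often realized with a token bucket; the minimal sketch below shows the idea. The rate and burst values are arbitrary examples, and a real shaper would queue or remark out-of-profile packets rather than only report them.

```python
import time

class TokenBucket:
    """Minimal token bucket: conforms traffic to `rate` bytes/s with `burst` bytes of slack."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.stamp = time.monotonic()

    def allow(self, nbytes):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes    # in-profile: forward (or mark) the packet
            return True
        return False                 # out-of-profile: drop, delay, or remark

bucket = TokenBucket(rate=125_000, burst=10_000)   # ~1 Mbit/s with a 10 kB burst allowance
print(bucket.allow(1500), bucket.allow(20_000))    # True False
```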
4.4.1 Integrated Services The Integrated Services (IntServ) architecture provides a framework for applications to choose between multiple controlled levels of delivery of services for their traffic flows. Two basic requirements exist to support this framework. The first is for the nodes in the traffic path to support the QoS control mechanisms and guaranteed services. The second is for a mechanism by which the applications can communicate their QoS requirements to the nodes along the transit path, as well as for the network nodes to communicate between each other about the requirements that must be provided for the particular traffic flow. All this is provided by a Resource Reservation Setup Protocol called RSVP [9], which is best described as the QoS signaling protocol. The information presented here is intended to be a qualitative description of the protocol, as in [3]. There is a logical separation between the Integrated Services QoS control services and RSVP. RSVP is designed to be used with a variety of QoS control services, and the QoS control services are designed to be used with a variety of setup mechanisms [11]. RSVP does not define the internal format of the protocol objects related to characterizing QoS control services; rather, it can be seen as the signaling mechanism transporting the QoS control information. RSVP is analogous to other IP control protocols, such as Internet Control Message Protocol (ICMP) or one of the many IP routing protocols. RSVP itself is not a routing protocol, but it uses the local routing table in routers to determine routes to the appropriate destinations.
FIGURE 4.3 Traffic flow of the RSVP Path and Resv messages.
In general terms, RSVP is used to deliver QoS requests to all router nodes along the transit path of the traffic flows and to maintain the state necessary in the routers to actually provide the requested services. RSVP requests generally result in resources being reserved in each router in the transit path for each flow. RSVP requires the receiver, rather than the sender, to be responsible for requesting specific QoS services. This is an intentional design decision in RSVP that attempts to provide efficient accommodation of large groups (e.g., multicast traffic), dynamic group membership (also for multicast), and diverse receiver requirements. There are two fundamental RSVP message types, the Resv message and the Path message, which provide the basic RSVP operation illustrated in Figure 4.3. An RSVP sender transmits Path messages downstream along the traffic path provided by a discrete routing protocol (e.g., Open Shortest-Path First (OSPF)). The Resv message is generated by the receiver and is transported back upstream toward the sender, creating and maintaining a reservation state in each node along the traffic path. RSVP can still function across intermediate nodes that are not RSVP capable. However, end-to-end resource reservations cannot be made, because non-RSVP-capable devices in the traffic path cannot maintain reservation or Path state in response to the appropriate RSVP messages. Although intermediate nodes that do not run RSVP cannot provide these functions, they may have sufficient capacity to be useful in accommodating tolerant real-time applications. Since RSVP relies on a discrete routing infrastructure to forward RSVP messages between nodes, the forwarding of Path messages by non-RSVP-capable intermediate nodes is unaffected, since the Path message carries the IP address of the previous RSVP-capable node as it travels toward the receiver. RSVP is not a routing protocol by itself; it is designed to operate with current and future unicast and multicast routing protocols. An RSVP process consults the local routing database(s) to obtain routes. In the multicast case, for example, a host sends IGMP messages to join a multicast group and then sends RSVP messages to reserve resources along the delivery path(s) of that group. Routing protocols determine where packets get forwarded; RSVP is only concerned with the QoS of those packets that are forwarded in accordance with routing (Figure 4.4). Summing up, Integrated Services is capable of bringing enhancements to the IP network model to support real-time transmissions and guaranteed bandwidth for specific flows. In this context, a flow is defined as a distinguishable stream of related datagrams from a unique sender to a unique receiver that results from a single user activity and requires the same QoS. The Integrated Services architecture promises precise per-flow service provisioning, but it never really made it as a commercial end-user product, which is mainly attributed to its lack of scalability [16].
4.4.2 Differentiated Services
Differentiated Services (DiffServ) defines an architecture (RFC 2474 and RFC 2475) for implementing scalable service differentiation in the Internet. Here, a service defines some significant characteristics of packet transmission in one direction across a set of one or more paths within a network.
FIGURE 4.4 Resource Reservation Protocol (RSVP) common header (fields: version, flags, message type, RSVP checksum, send TTL, reserved, RSVP length).
• Version — The protocol version number; the current version is 1.
• Flags — No flag bits are defined yet.
• Message type — Possible values are: 1 Path, 2 Resv, 3 PathErr, 4 ResvErr, 5 PathTear, 6 ResvTear, and 7 ResvConf.
• RSVP checksum — The checksum.
• Send TTL — The IP TTL value with which the message was sent.
• RSVP length — The total length of the RSVP message in bytes, including the common header and the variable-length objects that follow.
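Given the field layout in Figure 4.4, the 8-byte RSVP common header can be decoded with a few lines of Python; the sample bytes below are fabricated for illustration and do not come from a captured message.

```python
import struct

RSVP_MSG_TYPES = {1: "Path", 2: "Resv", 3: "PathErr", 4: "ResvErr",
                  5: "PathTear", 6: "ResvTear", 7: "ResvConf"}

def parse_rsvp_common_header(data: bytes):
    """Decode the 8-byte RSVP common header: version/flags, type, checksum, TTL, length."""
    ver_flags, msg_type, checksum, send_ttl, _reserved, length = struct.unpack(
        "!BBHBBH", data[:8])
    return {"version": ver_flags >> 4,
            "flags": ver_flags & 0x0F,
            "type": RSVP_MSG_TYPES.get(msg_type, msg_type),
            "checksum": checksum,
            "send_ttl": send_ttl,
            "length": length}

# Fabricated example: version 1, Path message, TTL 63, total length 172 bytes
sample = struct.pack("!BBHBBH", 0x10, 1, 0, 63, 0, 172)
print(parse_rsvp_common_header(sample))
```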
FIGURE 4.5 DiffServ Code Point.
These characteristics may be specified in quantitative or statistical terms of throughput, delay, jitter, and loss, or may otherwise be specified in terms of some relative priority of access to network resources. Service differentiation is desired to accommodate heterogeneous application requirements and user expectations, and to permit differentiated pricing of Internet service. Differentiated Services mechanisms do not use per-flow signaling and, as a result, do not consume per-flow state within the routing infrastructure. Different service levels can be allocated to different groups of users, which means that all traffic is distributed into groups or classes with different QoS parameters. This reduces the maintenance overhead in comparison to Integrated Services. Network traffic is classified and apportioned to network resources according to bandwidth management criteria. To enable QoS, network elements give preferential treatment to classifications identified as having more demanding requirements. DiffServ provides a simple and coarse method of classifying the services of applications. The main goal of DiffServ is a more scalable and manageable architecture for service differentiation in IP networks [13]. The initial premise was that this goal could be achieved by focusing not on individual packet flows, but on traffic aggregates, large sets of flows with similar service requirements. By carefully aggregating a multitude of QoS-enabled flows into a small number of aggregates, which receive a small number of differentiated treatments within the network, DiffServ eliminates the need to recognize and store information about each individual flow in core routers. This basic approach to scalability succeeds by combining a small number of simple packet treatments with a larger number of per-flow policies to provide a broad and flexible range of services. The externally observable forwarding treatment applied at a Differentiated Services-compliant node to a behavior aggregate is defined as the per-hop behavior (PHB). Each DiffServ flow is policed and marked at the first QoS-enabled downstream router according to a contracted service profile, or service-level agreement (SLA). Downstream from this router, a DiffServ flow is mingled with similar DiffServ traffic into an aggregate. All further forwarding and policing activities are then performed on these aggregates. Current proposals [12] use a few bits of the IP version 4 (IPv4) ToS byte or the IPv6 traffic class byte, now called the DiffServ Code Point (DSCP) (Figure 4.5), for marking packets. The format of the field is as follows:
• CP — (Six-bit) Differentiated Services Code Point, used to select the PHB a packet experiences at each node
• CU — Currently unused
FIGURE 4.6 Edge router: DiffServ classification and conditioning (classifier, meter, marker, and conditioner).
There are currently two standard per-hop behaviors defined that effectively represent two service levels (traffic classes):
• Expedited forwarding (EF) — The objective with EF PHB (RFC 3246) is to provide a service that is low loss, low delay, and low jitter, such that the service approximates a virtual leased line. The basic approach is to minimize the loss and delay experienced in the network by minimizing queuing delays. This can be done by ensuring that, at each node, the rate of departure of packets from the node is a well-defined minimum (shaping on egress points), and that the arrival rate at the node is always less than the defined departure rate (policing on ingress points). For example, to ensure that the incoming rate is always below the configured outgoing rate, any traffic that exceeds the traffic profile, which is defined by local policy, is discarded. Generally, expedited forwarding could be implemented in network nodes by a priority queue. The recommended DSCP value for the EF PHB is 101110; see [20].
• Assured forwarding (AF) — AF PHB is defined in RFC 2597. Its objective is to provide a service that ensures that high-priority packets are forwarded with a greater degree of reliability than lower-priority packets. AF defines four priorities (classes) of traffic, receiving different bandwidth levels (sometimes described as the Olympic services: gold, silver, bronze, and best effort). There are three drop precedences within each priority class, resulting in 12 different DSCP values. The higher the drop precedence, the greater the chance of being dropped during congestion. Hence, AF PHB enables packets to be marked with different AF classes and, within each class, to be marked with different drop precedence values. Within a router, resources are allocated according to the different AF classes. If the resources allocated to a given class become congested, then packets must be dropped. The packets to be dropped are those with higher drop precedence, as in [20].
Normally, the traffic into a DiffServ network from a particular source should conform to a particular traffic profile; thus, the rate of traffic should not exceed some preagreed maximum. In the event that it does, excess traffic is not delivered with as high a probability as the traffic within the profile, which means it may be demoted but not necessarily dropped. The PHBs are expected to be simple and define forwarding behaviors that may suggest, but do not require, a particular implementation or queuing discipline. In general, a classifier selects packets based on one or more predefined sets of header fields. The mapping of the network traffic to the specific behaviors is indicated by the DSCP. The traffic conditioners enforce the rules of each service at the network ingress point. Finally, PHBs are applied to the traffic by the conditioner at a network ingress point according to predetermined policy criteria. The traffic may be marked at this point and routed according to the marking, and then unmarked at the network egress. Each DiffServ-enabled edge router implements traffic conditioning functions, which perform metering, shaping, policing, and marking of packets to ensure that the traffic entering a DiffServ network conforms to the SLA, as illustrated in Figure 4.6. The simplicity with which DiffServ prioritizes traffic belies its extensibility and power.
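The ingress policing described for EF above is commonly realized with a token-bucket meter: packets that find enough tokens are in-profile and forwarded, the rest are discarded. The sketch below is a minimal illustration of that idea only; the class name, rate, and burst values are invented for the example and are not taken from this chapter.

    import time

    class TokenBucket:
        """Token-bucket meter: tokens accrue at `rate` bytes/s up to `burst` bytes."""
        def __init__(self, rate, burst):
            self.rate = rate
            self.burst = burst
            self.tokens = burst
            self.last = time.monotonic()

        def conforms(self, packet_len):
            now = time.monotonic()
            # Refill tokens for the elapsed interval, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if packet_len <= self.tokens:
                self.tokens -= packet_len     # in-profile: consume tokens and forward
                return True
            return False                      # out-of-profile: drop (EF policing)

    ef_meter = TokenBucket(rate=125_000, burst=3_000)     # ~1 Mbit/s contracted profile
    def police(packet):
        return packet if ef_meter.conforms(len(packet)) else None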
Using RSVP parameters (as described in the next section) or specific application types to identify and classify constant-bit-rate (CBR) traffic might help to establish well-defined aggregate flows that may be directed to fixed-bandwidth pipes. DiffServ is more scalable at the cost of coarser service granularity, which may be the reason why it is not yet commercially available to the end users; see also [16].
• Label — The label field carries the actual value of the label. When a labeled packet is received, the label value at the top of the stack is inspected, and from it the router learns:
• The next hop to which the packet is to be forwarded.
• The operation to be performed on the label stack before forwarding; this operation may be to replace the top label stack entry with another, to pop an entry off the label stack, or to replace the top label stack entry and then to push one or more additional entries on the label stack.
• Exp — Experimental use: Reserved for experimental use.
• S — Bottom of stack: This bit is set to one for the last entry in the label stack, and zero for all other label stack entries.
• TTL — Time-to-live: This field is used to encode a time-to-live value.
FIGURE 4.7 MPLS label structure: a 32-bit label stack entry carrying the label, Exp, S, and TTL fields.
4.4.3 Multi-Protocol Label Switching
As we have seen, IntServ and DiffServ take different approaches to solving the QoS challenge. A third, somewhat different approach is already in use: Multi-Protocol Label Switching (MPLS). It is not primarily a QoS solution, although it can be used to support QoS requirements. More specifically, MPLS has mechanisms to manage traffic flows of various granularities and is independent of the layer 2 and layer 3 protocols such as asynchronous transfer mode (ATM) and IP. MPLS provides a means to map IP addresses to simple, fixed-length labels used by different packet-forwarding and packet-switching technologies. Additionally, MPLS interfaces to existing routing and switching protocols, such as IP, ATM, Frame Relay, Resource Reservation Protocol (RSVP), Open Shortest-Path First (OSPF), and others. In MPLS, data transmission occurs on label-switched paths (LSPs). An LSP is a sequence of labels, one at each node along the path from the source to the destination. Several label distribution protocols are used today, such as the Label Distribution Protocol (LDP) and RSVP; label distribution may also be piggybacked on routing protocols like the Border Gateway Protocol (BGP) and OSPF. High-speed switching of data is possible because the fixed-length labels are inserted at the very beginning of the packet or cell and can be used by hardware to switch packets quickly between links. MPLS is best viewed as a new switching architecture and is basically a forwarding protocol that simplifies routing in IP-based networks. It specifies a simple and scalable forwarding mechanism, since it uses labels instead of a destination address to make the routing decision. The label value that is placed in an incoming packet header is used as an index to the forwarding table in the router (Figure 4.7). This lookup requires only one access to the table, in contrast to the traditional routing table access that might require numerous lookups [1].
One of the most important uses of MPLS is in the area of traffic engineering, which can be summarized as the modeling, characterization, and control of traffic to meet specified performance objectives. Such performance objectives might be traffic oriented or resource oriented. The former deals with QoS and includes aspects such as minimizing delay, jitter, and packet loss. The latter deals with optimum usage of network resources, particularly network bandwidth. The current situation with IP routing and resource allocation is that the routing protocols are not well equipped to deal with traffic engineering issues. For example, a protocol such as OSPF can actually promote congestion because it tends to force traffic down the shortest route, although other acceptable routes might be less loaded. With MPLS, a set of flows that share specific attributes can be routed over a given path. This capability has the immediate advantage of steering certain traffic away from the shortest path, which is likely to become congested before other paths. In conclusion, we may say that label switching offers scalability to networks by allowing a large number of IP addresses to be associated with one or a few labels. This approach further reduces the size of the address (actually label) tables and allows a router to support more users or to set up fixed paths for different types of traffic.
FIGURE 4.8 QoS architecture: top-to-bottom QoS within each end system (a QoS-enabled application using a QoS API, RSVP, DiffServ, and 802.1p across the protocol stack) combined with end-to-end QoS across the network, where RSVP signaling at the hosts is mapped to DiffServ, MPLS, and 802.1p mechanisms along the path.
Since the main attributes of label switching are fast relay of the traffic, scalability, simplicity, and route control, label switching can be a valuable tool to reduce latency and jitter for data transmission on packet-switched networks.
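Because the label stack entry of Figure 4.7 is a fixed 32-bit word, it can be encoded and decoded with a few shifts, and the label can index a forwarding table directly. The sketch below assumes the standard 20/3/1/8-bit field widths; the helper names and the sample table entry are illustrative only.

    import struct

    def pack_label_entry(label, exp=0, bottom=False, ttl=64):
        """Encode one 32-bit MPLS label stack entry (label, Exp, S, TTL)."""
        word = ((label & 0xFFFFF) << 12) | ((exp & 0x7) << 9) \
               | ((1 if bottom else 0) << 8) | (ttl & 0xFF)
        return struct.pack("!I", word)

    def unpack_label_entry(data):
        (word,) = struct.unpack("!I", data)
        return {"label": word >> 12,
                "exp": (word >> 9) & 0x7,
                "bottom_of_stack": bool((word >> 8) & 0x1),
                "ttl": word & 0xFF}

    # A label-switching router uses the label as a direct index into its table.
    forwarding_table = {1001: ("next_hop_A", "swap", 2042)}      # invented entry
    entry = unpack_label_entry(pack_label_entry(1001, exp=5, bottom=True))
    next_hop, operation, out_label = forwarding_table[entry["label"]]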
4.4.4 Combining QoS Solutions
The QoS solutions previously described take different approaches, and each has its advantages and disadvantages. The Integrated Services approach is based on a sophisticated background of research in QoS mechanisms and protocols for packet networks. However, the acceptance of IntServ among network providers and router vendors has been quite limited, at least so far, mainly due to scalability and manageability problems [10]. The scalability problems arise because IntServ requires routers to maintain control and forwarding state for all flows passing through them. Maintaining and processing per-flow state for gigabit or terabit links, with many simultaneously active flows, is very difficult from an implementation point of view. Hence, the IntServ architecture makes the management and accounting of IP networks significantly more complicated. Additionally, it requires new application–network interfaces and can only provide service guarantees when all elements in the flow's path support IntServ. A common remedy is therefore to confine per-flow IntServ/RSVP signaling to the network edge and to use the more scalable DiffServ within the backbone; MPLS may be used as an alternative intradomain implementation technology. These architectures in combination can enable end-to-end QoS. End hosts may use RSVP requests with high granularity (e.g., bandwidth, jitter, threshold, etc.). Border routers at backbone ingress points can then map those RSVP reservations to a class of service indicated by a DSCP or to a dedicated MPLS path. At the backbone egress point, the RSVP provisioning may be honored again, to the final destination; see Figure 4.8. Such combinations clearly represent a trade-off between service granularity and scalability: as soon as flows are aggregated, they are not as isolated from each other as they possibly were in the IntServ part of the network. This means that, for instance, unresponsive flows can degrade the quality of responsive flows. The strength of a combination is the fact that it gives network operators another opportunity to customize their network and fine-tune it based on QoS and scalability demands, as stated in [16].
Until now, IP has provided a best-effort service in which network resources are shared equitably. Adding quality-of-service support to the Internet raises significant concerns, since it enables Differentiated Services that represent a significant departure from the fundamental and simple design principles that made the Internet a success. Nonetheless, there is a significant need for IP QoS, and protocols have
evolved to address this need. The most viable solution today is a trade-off between protocol complexity and bandwidth scarcity with the following results: • Different QoS levels are used in the core network (e.g., four MPLS levels). • Applications at the user side are distinguished by DiffServ mechanisms. • The marked user traffic is mapped to the appropriate core layers. Finally, we should always bear in mind that an application-to-application guarantee not only depends on network conditions but also on the overall performance of each end system and the way of supporting real-time traffic, as discussed next.
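The mapping of marked user traffic onto a small number of core levels, as listed above, often amounts to little more than a lookup table at the backbone ingress. The sketch below shows one possible table; the particular DSCP values and their assignment to four core classes are illustrative assumptions, not values prescribed by this chapter.

    # Map edge DSCP markings onto four core classes (e.g., four MPLS levels).
    # The grouping below is an example; an operator would define it per SLA.
    DSCP_EF = 46      # expedited forwarding
    DSCP_AF41 = 34    # assured forwarding, class 4, low drop precedence
    DSCP_AF21 = 18    # assured forwarding, class 2, low drop precedence
    DSCP_BE = 0       # best effort

    dscp_to_core_class = {DSCP_EF: 3, DSCP_AF41: 2, DSCP_AF21: 1, DSCP_BE: 0}

    def core_class(dscp):
        """Unknown code points fall back to the best-effort core class."""
        return dscp_to_core_class.get(dscp, 0)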
4.5 Protocols Supporting Real-Time Traffic
This section gives a brief overview of protocols supporting end-to-end transport of real-time data. Note, however, that these protocols do not by themselves provide the QoS guarantees described previously.
4.5.1 Real-Time Transport Protocol
The Real-Time Transport Protocol (RTP) provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video or simulation data, over multicast or unicast network services. Typical examples of real-time traffic are an audio conversation between two people and video frames that are played out at the receiver as they arrive from the transmitter. RTP itself, however, does not provide all of the functionality required for the transport of data, and therefore applications typically run RTP on top of UDP to make use of its multiplexing and checksum services. RTP is best described as an encapsulation protocol. The data field of the RTP packet carries the real-time traffic, and the RTP header contains information about the type of traffic that is transported [17]. RTP supports data transfer to multiple destinations using multicast distribution if provided by the underlying network, and may also be used with other suitable underlying network or transport protocols. RTP is described in the IETF's RFC 3550 [18] specification as being a protocol providing end-to-end delivery services, such as payload type identification, time stamping, and sequence numbering, for data with real-time characteristics. RTP itself does not provide any mechanism to ensure timely delivery or provide other quality-of-service guarantees, but relies on lower-layer services to do so. It does not guarantee delivery or prevent out-of-order delivery, nor does it assume that the underlying network is reliable and delivers packets in sequence. The sequence numbers included in RTP allow the receiver to reconstruct the sender's packet sequence (Figure 4.9). RTP consists of two closely linked parts:
• The Real-Time Transport Protocol (RTP), to carry data that has real-time properties
• The Real-Time Transport Control Protocol (RTCP), to monitor the quality of service and to convey information about the participants in an ongoing session
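Since the fixed RTP header is only 12 bytes (its fields are listed with Figure 4.9 below), it is easy to build and parse directly. The following sketch does so with Python's struct module; the payload type 0 (PCMU audio), the sample values, and the 20 ms payload size are assumptions made for the example.

    import struct

    def build_rtp_header(seq, timestamp, ssrc, payload_type, marker=False):
        """Build the 12-byte fixed RTP header (V=2, no padding/extension, CC=0)."""
        byte0 = 2 << 6                                    # version 2, P=0, X=0, CC=0
        byte1 = ((1 if marker else 0) << 7) | (payload_type & 0x7F)
        return struct.pack("!BBHII", byte0, byte1,
                           seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)

    def parse_rtp_header(data):
        byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
        return {"version": byte0 >> 6, "padding": bool(byte0 & 0x20),
                "extension": bool(byte0 & 0x10), "csrc_count": byte0 & 0x0F,
                "marker": bool(byte1 & 0x80), "payload_type": byte1 & 0x7F,
                "sequence": seq, "timestamp": ts, "ssrc": ssrc}

    header = build_rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD, payload_type=0)
    packet = header + b"\x00" * 160     # one 20 ms PCMU frame (illustrative payload)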
4.5.2 Real-Time Transport Control Protocol
RTP usually works in conjunction with another protocol called the Real-Time Transport Control Protocol (RTCP), which provides minimal control over the delivery and quality of the data. It is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets, for example, using separate port numbers with UDP. RTCP performs four main functions:
• Feedback information — This is used to check the quality of the data distribution. During an RTP session, RTCP control packets are periodically sent by each participant to all the other participants. These packets contain information such as the number of RTP packets sent, the number of packets
• V — Version: Identifies the RTP version (V = 2).
• P — Padding: When set, the packet contains one or more additional padding octets at the end that are not part of the payload.
• X — Extension bit: When set, the fixed header is followed by exactly one header extension, with a defined format.
• CSRC count (CC) — Contains the number of CSRC identifiers that follow the fixed header (0 to 15 items, 32 bits each).
• M — Marker: The interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream.
• Payload type (PT) — Identifies the format of the RTP payload and determines its interpretation by the application. A profile specifies a default static mapping of payload type codes to payload formats. Additional payload type codes may be defined dynamically through non-RTP means.
• Sequence number — Increments by 1 for each RTP data packet sent and may be used by the receiver to detect packet loss and restore packet sequence.
• Time stamp — Reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations.
• SSRC — Synchronization source: This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier.
• CSRC — Contributing sources identifier list: Identifies the contributing sources for the payload contained in this packet.
FIGURE 4.9 RTP header: the 32-bit-wide fixed header carries the V, P, X, CC, M, and PT bits, the sequence number, the timestamp, the synchronization source (SSRC) identifier, and an optional list of contributing source (CSRC) identifiers.
lost, etc., which the receiving application or any other third-party program can use to monitor network problems. The application might then change the transmission rate of the RTP packets to help reduce any problems. • Transport-level identification — This is used to keep track of each of the participants in a session. RTCP carries a persistent transport-level identifier for an RTP source called the canonical name or CNAME. Since the SSRC identifier may change if a conflict is discovered or a program is restarted, receivers require the CNAME to keep track of each participant. It is also used to associate multiple data streams from a given participant in a set of related RTP sessions, e.g., the synchronization of audio and video. • Transmission interval control — The first two functions require that all participants send RTCP packets; therefore, the rate must be controlled in order for RTP to scale up to a large number of participants. By having each participant send its control packets to all the others, each can independently observe the number of participants. This number is used to calculate the rate at which the packets are sent, which ensures that the control traffic will not overwhelm network resources. Control traffic is limited to at most 5% of the overall session traffic. • Minimal session control — This optional function is to convey minimal session control information, e.g., to display the name of a new user joining an informal session. This is most likely useful in loosely controlled sessions where participants enter and leave without membership control or parameter negotiation.
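The transmission interval control above can be summarized in a few lines: the RTCP bandwidth is capped at 5% of the session bandwidth, so the report interval grows with the number of participants. The sketch below is a simplified version of the RFC 3550 calculation; it omits the sender/receiver bandwidth split and the randomization that the full algorithm applies.

    def rtcp_interval(members, session_bw_bps, avg_rtcp_size_bytes, minimum=5.0):
        """Deterministic RTCP report interval in seconds (simplified from RFC 3550)."""
        rtcp_bw_bytes_per_s = 0.05 * session_bw_bps / 8     # 5% of session bandwidth
        interval = members * avg_rtcp_size_bytes / rtcp_bw_bytes_per_s
        return max(minimum, interval)                       # never below the minimum

    # 20 participants in a 256 kbit/s audio session with ~120-byte reports:
    print(rtcp_interval(members=20, session_bw_bps=256_000, avg_rtcp_size_bytes=120))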
When an RTP session is initiated, an application defines one network address and two ports for RTP and RTCP. If there are several media formats such as video and audio, a separate RTP session with its own RTCP packets is required for each one. Other participants can then decide which particular session and hence medium they want to receive. Overall, RTP provides a way in which real-time information can be transmitted over existing transport and underlying network protocols. It is important to realize that RTP is an application layer protocol and does not provide any QoS guarantees. However, it does allow for various types of impairments such as packet loss or jitter to be detected. With the use of a control protocol, RTCP, it provides a minimal amount of control over the delivery of the data. However, to ensure that the real-time data will be delivered on time, if at all, RTP must be used in conjunction with other mechanisms or protocols that will provide reliable service.
4.5.3 Real-Time Streaming Protocol
The Real-Time Streaming Protocol (RTSP) [19] establishes and controls either a single or several time-synchronized streams of continuous media such as audio and video. RTSP does not typically deliver the continuous streams itself, although interleaving of the continuous media stream with the control stream is possible. RFC 2326 [21] describes RTSP as being an application-level protocol that controls the delivery of streaming media with real-time properties. This media can be streamed over unicast or multicast networks. RTSP itself does not actually deliver the media data. This is handled by a separate protocol, and therefore RTSP can be described as a kind of network remote control for the server that is streaming the media. Sources of data can include both live data feeds and stored clips. RTSP is intended to control multiple data delivery sessions, provide a means for choosing delivery channels such as UDP, multicast UDP, and TCP, and provide a means for choosing delivery mechanisms based upon RTP. The underlying protocol that is used to control the delivery of the media is determined by the scheme used in the RTSP Uniform Resource Locator (URL). The schemes that are supported on the Internet are "rtsp:," which requires that the commands be delivered using a reliable protocol, e.g., TCP; "rtspu:," which identifies an unreliable protocol such as UDP; and "rtsps:," which requires a TCP connection secured by the Transport Layer Security (TLS) protocol. Therefore, a valid RTSP URL could be "rtspu://foo.bar.com:5150," which requests that the commands be delivered by an unreliable protocol to the server "foo.bar.com," on port 5150. There is no notion of an RTSP connection; instead, a server maintains a session labeled by an identifier. During an RTSP session, an RTSP client may open and close many reliable transport connections to the server to issue RTSP requests. Alternatively, it may use a connectionless transport protocol such as UDP. RTSP is intentionally similar in syntax and operation to the Hypertext Transfer Protocol (HTTP) so that extension mechanisms to HTTP can in most cases also be added to RTSP. The protocol supports the following operations:
• Retrieval of media from a media server: The client can request a presentation description via HTTP or some other method.
• Invitation of a media server to a conference: A media server can be invited to join an existing conference, either to play back media into the presentation or to record all or a subset of the media in a presentation.
• Addition of media to an existing presentation: Particularly for live presentations, it is useful if the server can tell the client about additional media becoming available.
Since most servers are designed to handle more than one user at a time, the server needs to be able to maintain a session state, i.e., whether it is setting up a session (the SETUP state), playing a stream (the PLAY state), etc. This allows it to correlate RTSP requests with the relevant stream. HTTP, however, is a stateless protocol since typically there is no need to save the state of each client. Another area in which HTTP and RTSP differ is in the way the client and server interact. With HTTP the interaction is one way: the client issues a request for a document and the server responds. With
RTSP both the client and server can issue requests. To summarize, RTSP is more of a protocol framework than a protocol itself.
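Because RTSP is a text-based, HTTP-like protocol, a client request can be composed by hand. The sketch below sends a DESCRIBE request over a reliable (TCP) connection, as an "rtsp:" URL implies; the server name, port, and stream path are placeholders, and a real client would continue with SETUP and PLAY requests carrying the Session identifier returned by the server.

    import socket

    def rtsp_request(host, port, method, url, cseq, extra_headers=None):
        """Compose and send one RTSP request; return the raw response text."""
        lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"] + (extra_headers or [])
        request = "\r\n".join(lines) + "\r\n\r\n"
        with socket.create_connection((host, port)) as conn:
            conn.sendall(request.encode("ascii"))
            return conn.recv(4096).decode("ascii", errors="replace")

    # Ask a (placeholder) media server to describe an available presentation.
    reply = rtsp_request("media.example.com", 554, "DESCRIBE",
                         "rtsp://media.example.com/stream1", cseq=1,
                         extra_headers=["Accept: application/sdp"])
    print(reply.splitlines()[0])        # e.g. "RTSP/1.0 200 OK"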
References
[1] Uyless D. Black, MPLS and Label Switching Networks, Prentice Hall, Englewood Cliffs, NJ, 2001.
[2] Douglas E. Comer, Computer Networks and Internets, 2nd edition, Prentice Hall, Englewood Cliffs, NJ, 1999.
[3] P. Ferguson and G. Huston, Quality of Service: Delivering QoS on the Internet and in Corporate Networks, John Wiley & Sons, New York, 1998.
[4] ITU-T Recommendation G.114, One-Way Transmission Time, International Telecommunication Union, 1996.
[5] S. Kalinindi, OWDP: A Protocol to Measure One-Way Delay and Packet Loss, Technical Report STR-001, Advanced Network and Services, September 1998.
[6] S. Keshav, An Engineering Approach to Computer Networking, Addison-Wesley, Reading, MA, 1997.
[7] T. Kushida, The traffic and the empirical studies for the Internet, in Proc. IEEE Globecom 98, Sydney, 1998, pp. 1142–1147.
[8] V. Paxson, Towards a Framework for Defining Internet Performance Metrics, Technical Report LBNL-38952, Network Research Group, Lawrence Berkeley National Laboratory, June 1996.
[9] RFC 2205, Resource ReSerVation Protocol (RSVP) Version 1 Functional Specification, September 1997.
[10] RFC 2208, Resource ReSerVation Protocol (RSVP) Version 1 Applicability Statement: Some Guidelines on Deployment, September 1997.
[11] RFC 2210, The Use of RSVP with IETF Integrated Services, September 1997.
[12] RFC 2474, Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, December 1998.
[13] RFC 2475, An Architecture for Differentiated Services, December 1998.
[14] R. Seifert, Gigabit Ethernet: Technology and Applications for High-Speed LANs, Addison-Wesley, Reading, MA, 1998.
[15] R.W. Stevens, TCP/IP Illustrated: The Protocols, Volume 1, Addison-Wesley, New York, 1994.
[16] M. Welzl and M. Mühlhäuser, Scalability and quality of service: a trade-off?, IEEE Communications Magazine, 41, 32–36, 2003.
[17] Uyless D. Black, Voice over IP, Prentice Hall, Englewood Cliffs, NJ, 2000.
[18] RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003.
[19] H. Schulzrinne, A. Rao, and R. Lanphier, Real Time Streaming Protocol, Internet Draft, 1998.
[20] D. Collins, Carrier Grade Voice over IP, 2nd edition, McGraw-Hill, New York, 2003.
[21] RFC 2326, Real Time Streaming Protocol (RTSP), April 1998.
5
Survey of Network Management Frameworks

Mai Hoang
University of Potsdam

5.1 Introduction
5.2 Network Management Architecture
5.3 ISO Systems Management Framework
    Functional Aspects • Information Aspects • Organization Aspects • Communication Aspects
5.4 Internet Management Framework
    SNMPv1 • SNMPv2 • SNMPv3
5.5 ISO and Internet Management Standards: Analysis and Comparison
    SNMP and CMIP • MIBs and SMI • Network Management Functions
5.6 DHCP: IP Address Management Framework for IPv4
    IP Address Allocation Mechanisms • The IP Address Management of DHCP • Advantages and Disadvantages of DHCP for IPv4
5.7 Conclusions
References
5.1 Introduction
Computer networks and distributed processing systems continue to grow in scale and diversity in business, government, and other organizations. Three facts become evident. First, new networks are added and existing ones are expanded almost as rapidly as new network technologies and products are introduced. The problems associated with network expansion affect day-to-day network operation management. Second, the network and its resources and distributed services become indispensable to organizations. Third, more things can go wrong, which can disable the network or degrade its performance to an unacceptable level. Large, heterogeneous networks cannot be put together and managed by human effort alone. Instead, their complexity dictates the use of a rich set of automated network management tools and applications. In response, the International Organization for Standardization (ISO) began work in 1978 to establish a standard for network management, the Open Systems Interconnection (OSI) network management, including the management model, functional areas, Common Management Information Services (CMIS), Common Management Information Protocol (CMIP), and management information base (MIB) [ROS90]. The network management model describes the main components of a network
management tool for a managed network. For a given managed network, it is necessary to know which problem areas have to be considered. These problem areas are specified as the ISO management functions, which were already contained in the first ISO working draft of the management framework and gradually evolved into what is presently known as the five functional areas of the ISO management framework (performance, faults, configuration, accounting, security). The important pieces of ISO management are CMIP and CMIS, which managing and managed devices use for their communication. CMIP carries a set of services, the so-called CMIS, which define the types of requests and responses and the actions they should invoke. In addition to being able to pass information back and forth, the managing and managed devices need to agree on a set of variables and means to initiate actions. The collection of this information is referred to as the management information base (MIB). Because of the slowness of the ISO standardization process, the complexity of the proposed new standard, and the urgent need for management tools, the Internet Engineering Task Force (IETF) devised the Simple Network Management Protocol (SNMP) [RFC1157], which was originally regarded as a provisional means for network management until the OSI management standards were complete, but subsequently became a de facto standard because of its dissemination and simplicity. The SNMP framework consists of three parts: the protocol, the structure of management information (SMI), and the management information base (MIB). The SNMP protocol includes the SNMP operations, the format of messages, and how messages are exchanged between a manager and an agent. The SMI is a set of rules allowing a user to specify the desired management information, e.g., by providing a means of naming and declaring the types of variables. Finally, the MIB is a structured collection of all managed objects maintained by a device. The managed objects are structured as a hierarchical tree. In order to address several weaknesses within SNMP, SNMP version 2 (SNMPv2) was initiated around 1994. SNMPv2 provides more functionality and greater efficiency than the original version of SNMP, but for various reasons SNMPv2 did not succeed. Finally, SNMP version 3 (SNMPv3) was issued in 1998. SNMPv3 describes an overall framework for present and future versions of SNMP and defines security features for SNMP. Both ISO and IETF frameworks are used for developing network management systems and applications for monitoring and controlling hardware as well as software components.
In addition to these complex frameworks, other, simpler management frameworks have been developed. Each of these frameworks focuses only on a particular management task. One sort of framework is for Internet Protocol (IP) address management, which has been around since the advent of networks: each component within a network must have a set of definite, unique parameters so the rest of the network can recognize it. Traditionally, most network administrators used pen and paper or a spreadsheet to keep track of their networks' parameters. While this was sufficient for small networks with a few hosts, increased management expenses naturally followed as the networks grew and changed. Thus, IP address management needed to be done through automated management applications. In response to this need, the IETF created the Dynamic Host Configuration Protocol (DHCP).
DHCP was developed from an earlier protocol called the Bootstrap Protocol (BOOTP) [RFC951, RFC1542], which was used to pass information to client systems during initial booting. BOOTP was designed to store and update static information for clients, including IP addresses. The BOOTP server always issued the same IP address to the same client. As a result, while BOOTP addressed the need for central management, it did not address the problem of managing IP addresses as a dynamic resource. To address the need to manage dynamic configuration information in general, and dynamic IP addresses specifically, the IETF standardized DHCP as a framework for automatic IP version 4 (IPv4) address management. To standardize the DHCP environment, the IETF issued a series of RFCs [RFC1542, RFC2131, RFC2132] focused on DHCP extensions to BOOTP. The most recent of these standards is RFC2131, which was issued in March 1997. DHCP is built on a client–server model. It includes two parts: the mechanisms for IP address allocation and the protocol for communication between DHCP servers and DHCP clients. The most important features of DHCP are as follows. First, DHCP permits a server to allocate IP addresses automatically. Automatic address allocation is needed for environments such as wireless networks, where a computer can attach and detach quickly. Second, DHCP allows a client to acquire all the configuration information it needs in a single message.
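The bookkeeping that distinguishes DHCP from BOOTP, managing addresses as a dynamic resource, can be pictured as a pool of addresses handed out as time-limited leases and reclaimed when they expire. The sketch below illustrates only that idea; the class, the address range, and the lease time are invented for the example and say nothing about DHCP's actual message formats, which Section 5.6 discusses.

    import ipaddress
    import time

    class LeasePool:
        """Toy dynamic address pool: time-limited leases that return on expiry."""
        def __init__(self, network, lease_seconds=3600):
            self.free = [str(ip) for ip in ipaddress.ip_network(network).hosts()]
            self.leases = {}                     # client_id -> (ip, expiry time)
            self.lease_seconds = lease_seconds

        def allocate(self, client_id):
            self._expire()
            if client_id in self.leases:         # a renewing client keeps its address
                ip, _ = self.leases[client_id]
            else:
                ip = self.free.pop(0)            # hand out the next free address
            self.leases[client_id] = (ip, time.time() + self.lease_seconds)
            return ip

        def _expire(self):
            now = time.time()
            for client, (ip, expiry) in list(self.leases.items()):
                if expiry < now:
                    del self.leases[client]
                    self.free.append(ip)         # expired addresses rejoin the pool

    pool = LeasePool("192.168.1.0/28")
    print(pool.allocate("00:11:22:33:44:55"))    # e.g. 192.168.1.1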
FIGURE 5.1 Manager–agent architecture.
This chapter focuses on the network management frameworks. First, it provides a comprehensive survey of conceptual models, protocols, services, and management information bases of the ISO and IETF management framework. Following that, the DHCP for IPv4 is discussed in detail. The chapter is organized as follows. Section 5.2 describes the network management model. The ISO network management framework is discussed briefly in Section 5.3, while Section 5.4 presents the IETF management framework. A comparison of these management standards is given in Section 5.5. Section 5.6 provides an overview of DHCP. Section 5.7 concludes the chapter with an overview of the open problems in network management.
5.2 Network Management Architecture
The network management architecture used in ISO and IETF frameworks is called manager–agent architecture and includes the following key components:
• Managed devices
• Management stations
• Management protocols
• Management information
These pieces are shown in Figure 5.1 and described below. Network management is done from management stations, which are computers running special management software. These management stations contain a set of management application processes called managers for data analysis, fault recovery, and so on. The manager is the locus of activity for network management: it provides or monitors information to users; it issues requests to managed devices in order to ask them to take some action; it receives responses to the requests; and it receives unsolicited reports from managed devices concerning the status of the devices — these reports are referred to as notification and are frequently used to report problems, anomalies, or changes in the agent environment. A managed device is a piece of network equipment that resides in a managed network. The managed devices might be hosts, routers, switches, bridges, or printers. To be managed from a management station, a device must be capable of running a management process, called (management) agent. These agents communicate with managers running on the management station and take local actions on the managed device under the command and control of the managers. An agent can act upon and respond to requests from a manager; furthermore, it can provide unsolicited notifications to a manager. Each managed device maintains one or more variables (for example, a network interface card or a set of configuration parameters for a piece of hardware or software) that describe its state. In the ISO and IETF management frameworks these variables are called managed objects. The collection of these managed objects is referred to as a management information base (MIB). These variables can be viewed and optionally modified by the managers. The network management protocol is needed for communication between managers and agents. This protocol allows the manager to query the status of managed devices and to initiate actions at these devices
by triggering the agents. Furthermore, agents can use the network management protocol to report exceptional events to the management stations. When describing any framework for network management, the following aspects must be addressed:
• Functional aspect: Specifies management functional areas supported by managers and agents. This aspect relates to specific management functions that are carried out by the manager or agent.
• Information aspect: Defines the kind of information that will be exchanged between manager and agent. The information aspect deals with MIBs and SMI.
• Communication aspect: Addresses the communication protocol between manager and agent for exchanging this information.
• Organization aspect: Deals with the definition of the principal structural components and the management architecture for a managed network.
The OSI and IETF management frameworks are discussed in the following subsections from the view of these aspects.
5.3 ISO Systems Management Framework The first standard for network management was ISO 7498-4, which specifies the network management framework for the OSI model [ISO7498-4]. Although the production of this framework took considerable time, it was not generally accepted as an adequate starting point. It was therefore decided to issue an additional standard, which was called the Systems Management Overview [ISO10040]. Subsequently, ISO has issued a set of other standards for network management. Together these standards provide the basis for the OSI management framework.
5.3.1 Functional Aspects
OSI Systems Management standardization followed a top-down approach, with a number of systems management functional areas (SMFAs) identified first. The intention was not to describe exhaustively all relevant types of management activity, but rather to investigate the key requirements and address these through a generic management model. The identified areas were fault, configuration, accounting, performance, and security management, collectively referred to as FCAPS from their initials [Sta93a].
5.3.1.1 Fault Management
Fault management deals with the mechanisms for the detection, isolation, and correction of abnormal operations. Fault management includes functions to:
• Maintain and examine error logs
• Trace and identify faults
• Accept and act upon error notifications
• Carry out diagnostic tests and correct faults
5.3.1.2 Configuration Management
Configuration management is the set of facilities that allow network managers to exercise control over the configuration of the network components and OSI layer entities. Configuration management includes functions to:
• Record the current configuration
• Record changes in the configuration
• Initialize and close down managed objects
• Identify the network components
• Change the configuration of managed objects (e.g., routing table)
5.3.1.3 Accounting Management
Accounting management deals with the collection and processing of accounting information for charging and billing purposes. It should enable accounting limits to be set and costs to be combined when multiple resources are used in the context of a service. Accounting management includes functions to:
• Inform users of the cost thus far
• Inform users of the expected cost in the future
• Set cost limits
5.3.1.4 Performance Management
Performance management is the set of facilities that enable the network managers to monitor and evaluate the performance of the system and layer entities. Performance management involves three main steps: (1) performance data are gathered on variables of interest to network administrators, (2) the data are analyzed to determine normal (baseline) levels, and (3) appropriate performance thresholds are determined for each important variable so that exceeding these thresholds indicates a network problem worth attention. Management entities continually monitor performance variables. When a performance threshold is exceeded, an alert is generated and sent to the network management system. Performance management provides functions to:
• Collect and disseminate data concerning the current level of performance of resources
• Maintain and examine performance logs for planning and analysis purposes
5.3.1.5 Security Management
Security management addresses the control of the access to network resources according to local guidelines so that the network cannot be damaged and persons without appropriate authorization cannot access sensitive information. A security management subsystem, for example, can monitor users logging on to a network resource and can refuse access to those who enter inappropriate access codes. Security management provides support for management of:
• Authorization facilities
• Access control
• Encryption and key management
• Authentication
• Security logs
Soon after the first working drafts of the management framework appeared, ISO started to define protocol standards for each of the five SMFAs. After some time, an interesting observation was made that most of the functional area protocols used a similar set of elementary management functions. ISO therefore decided to stop further progression of the five functional area protocols and concentrate on the definition of elementary management functions. Following this, a set of standards, e.g., object management, state management, relationships management, alarm reporting, event report management, and log control, have been issued as the general category systems management functions (SMFs). Each SMF standard defines the functionality to support specific management functional area (SMFA) requirements. Moreover, these standards provide a mapping between the CMIS (discussed below) and SMFs.
5.3.2 Information Aspects The information aspects of OSI systems management deal with the resources that are being managed by agents. OSI systems management relies on object-oriented concepts. Therefore, each resource being managed is represented by a managed object. A managed object may represent either a logical resource, such as a user account, or a real resource, like an ATM switch. Managed objects that refer to resources specific to an individual layer are called (N)-layer managed objects. Managed objects that refer to resources that encompass more than one layer are called systems managed objects. According to the OSI
Management Information Model [ISO10165-1], a managed object is defined in terms of attributes it possesses, operations that may be performed on it, notifications that it may issue, and its interactions with other managed objects. The managed objects are defined using two standards: Abstract Syntax Notation 1 (ASN.1), to define data types, and Guidelines for Definition of Managed Objects (GDMO), to define managed objects [ASN90, ISO10165-1]. Under systems management, all the managed objects are represented in the so-called management information base (MIB). The managed object concept is refined in a number of additional standards that are called the structure of management information (SMI) standards [ISO10165-1, ISO10165-2, ISO10165-4, ISO10165-5, ISO10165-7]. The SMI identifies the data types that can be used in the MIB and how the resources within the MIB are represented and named [Sta93b].
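The object-oriented view above, a managed object characterized by its attributes, the operations performed on it, and the notifications it issues, can be paraphrased outside the GDMO/ASN.1 notation. The sketch below does so in Python purely for illustration; the class and field names are not taken from the standards.

    from dataclasses import dataclass, field

    @dataclass
    class ManagedObject:
        """Toy model of an OSI managed object: attributes, operations, notifications."""
        name: str
        attributes: dict = field(default_factory=dict)
        notifications: list = field(default_factory=list)

        def get(self, attribute):                # operation performed on the object
            return self.attributes[attribute]

        def set(self, attribute, value):         # operation that modifies an attribute
            self.attributes[attribute] = value

        def notify(self, event):                 # notification issued by the object
            self.notifications.append(event)

    # A real resource (a switch port) and a logical one (a user account) look alike.
    port = ManagedObject("atmSwitchPort7", {"adminStatus": "up", "cellsIn": 0})
    port.set("adminStatus", "down")
    port.notify("linkDown")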
5.3.3 Organization Aspects
The key elements of the OSI architectural model include the systems management application process (SMAP), the systems management application entity (SMAE), the layer management entity, and the management information base. The SMAP is the process within a managing device that is responsible for executing the network management functions; it has access to all parameters of managed devices and can therefore manage all aspects of a managed network. An SMAP works in cooperation with SMAPs on other managed networks. An SMAE is responsible for communication with other devices, especially with devices exercising control functions. CMIP is used as a standardized application-level protocol by the SMAE. The layer management entity is the logic embedded into each layer of the OSI architecture to provide network management functions specific to that layer. To provide management of a distributed system, the elements in this architectural model must be implemented in a distributed fashion across all of the devices in a managed network.
OSI systems management is organized in a centralized manner. According to this scheme, a single manager may control several agents. Each agent contains a number of objects. Each object is a data structure that corresponds to an actual resource of the device to be managed. The SMAP is allowed to take on either a manager role or an agent role. The manager role for an SMAP occurs in a device that acts as a network control center. The agent role for an SMAP occurs in managed devices. The manager performs operations upon the agents, and the agents forward notifications to the managers. Because open systems can grow very large, the OSI management environment may be partitioned into a number of management domains. The partitioning can be based not only on the required management functions (security, accounting, performance, etc.), but also on other criteria (e.g., geographical).
5.3.4 Communication Aspects The communication aspect deals with the exchange of systems management information between manager and agents within a managed network. Relating to this aspect, ISO has issued two standards, the Common Management Information Services (CMIS) and the Common Management Information Protocol (CMIP) [ISO9595, ISO9596]. CMIS provides OSI management services to management applications. It defines a set of management services, specifies types of requests and responses, and defines what each request and response can do. The management processes initiate these services in order to communicate remotely. Seven services used to handle management information have been standardized. Table 5.1 lists the CMIS with their type and function. The CMIP provides the information exchange capability to support CMIS; it defines a set of protocol data units that implement the CMIS [ISO9596]. In particular, CMIP defines how the requests, responses, and notifications are encoded into messages and specifies which bearer service is used to transport those encoded messages between managers and agents. A CMIP request typically specifies one or more managed objects to which the request is to be sent. The correspondence between CMIS primitives and CMIP data units is described in [Sta99].
TABLE 5.1 CMIS Services

CMIS Service       Type                      Function
Notification services
  M-EVENT-REPORT   Confirmed/not confirmed   Gives notification of an event occurring on a managed object
Operation services
  M-GET            Confirmed                 Request for management data
  M-SET            Confirmed/not confirmed   Modification of management data
  M-ACTION         Confirmed/not confirmed   Execution of action on a managed object
  M-CREATE         Confirmed                 Creation of a managed object
  M-DELETE         Confirmed                 Deletion of a managed object
  M-CANCEL-GET     Confirmed                 Request to cancel any new responses to a previous request for M-GET services
5.4 Internet Management Framework
An interesting difference between the IETF and ISO is that the IETF takes a more pragmatic and result-driven approach than ISO. In the IETF, it is, for instance, unusual to spend much time on architectural discussions; people prefer to use their time for the development of protocols and implementations. This difference explains why no special management architecture and functional areas have been defined in the first two versions of SNMP; only the communication aspect (as SNMP), the information aspect (as SMI and MIB), and the security aspect have been standardized. SNMP is an application layer protocol that facilitates the exchange of management information between network devices. It is a part of the Transmission Control Protocol (TCP)/IP suite and operates over the User Datagram Protocol (UDP). As described in Section 5.2, the IETF network management is based on the manager–agent architecture. Figure 5.2 shows the architecture of Internet management. In this architecture, a manager process controls access to a central MIB at the management station and provides an interface to the management application. Furthermore, a manager may control many agents, whereby each agent interprets the SNMP messages and controls the agent's MIBs.
FIGURE 5.2 Internet management architecture: the manager process with its central MIB on the management station and the agent process with its agent MIB on the managed device exchange SNMP messages over UDP, IP, and network-dependent protocols.
In Section 5.2, we have provided an overview of the basic components of a management architecture used by ISO and IETF. The IETF network management framework consists of:
• SNMP. SNMP is a management protocol for conveying information and commands between a manager and an agent running in a managed network device [KR01].
• MIB. Resources in networks may be managed by representing them as objects. Each object is a data variable that represents one aspect of a managed device. In the IETF network management framework, the representation of a collection of these objects is called the management information base (MIB) [RFC1066, RFC1157, RFC1212]. A MIB object might be a counter such as the number of IP datagrams discarded at a router due to errors, descriptive information such as generic information about the physical interfaces of the entity, or protocol-specific information such as the number of UDP datagrams delivered to UDP users.
• SMI. SMI [RFC1155] allows the formal specification of the data types that are used in a MIB and specifies how resources within a MIB are named. The SMI is based on the ASN.1 (Abstract Syntax Notation 1) [ASN90] object definition language. However, since many SMI-specific data types have been added, SMI should be considered a data definition language in its own right.
• Security and administration are concerned with monitoring and controlling access to managed networks and access to all or part of management information obtained from network nodes.
In the following sections, an overview of several SNMP versions (SNMPv1, SNMPv2, SNMPv3) with respect to protocol operations, MIB, SMI, and security is given.
5.4.1 SNMPv1
The original network management framework is defined in the following documents:
• RFC 1155 and RFC 1212 define SMI, the mechanisms used for specifying and naming managed objects. RFC 1215 defines a concise description mechanism for defining event notifications, which are called traps in SNMPv1.
• RFC 1157 defines SNMPv1, the protocol used for network access to managed objects and event notification.
• RFC 1213 contains definitions for a specific MIB (MIB-II) covering TCP, UDP, IP, routers, and other inhabitants of the IP world.
5.4.1.1 SMI
The RFCs 1155, 1212, and 1215 describe the SNMPv1 structure of management information and are often referred to as SMIv1. Note that the first two SMI documents do not provide definitions of event notifications (traps). Because of this, the last document specifies a straightforward approach toward defining event notifications used with the SNMPv1 protocol.
5.4.1.2 Protocol Operations
In SNMPv1, communication between manager and agent is performed in a confirmed way. The manager at the network management station takes the initiative by sending one of the following SNMP protocol data units (PDUs): GetRequest, GetNextRequest, or SetRequest. The GetRequest and GetNextRequest are used to get management information from the agent; the SetRequest is used to change management information at the agent. After reception of one of these PDUs, the agent responds with a response PDU, which carries the requested information or indicates failure of the previous request (Figure 5.3). It is also possible that the SNMP agent takes the initiative. This happens when the agent detects some extraordinary event such as a status change at one of its links. As a reaction to this, the agent sends a trap PDU to the manager [RFC1215]. The reception of the trap is not confirmed (Figure 5.3(d)).
5.4.1.3 MIB
As noted above, the MIB can be thought of as a virtual information store, holding managed objects whose values collectively reflect the current state of the network. These values may be queried or set by a manager by sending SNMP messages to the agent. Managed objects are specified using the SMI discussed above. The IETF has been standardizing the MIB modules associated with routers, hosts, and other network equipment. This includes basic identification data about a particular piece of hardware and management information about the device's network interfaces and protocols. With the different SNMP standards, the IETF needed a way to identify and name the standardized MIB modules, as well as the specific managed objects within a MIB module. To do that, the IETF adopted ASN.1 as a standardized object identification (naming) framework. In ASN.1, object identifiers have a hierarchical structure, as shown in Figure 5.4. The global naming tree illustrated in Figure 5.4 allows for unique identification of objects, which correspond to leaf nodes. Describing an object identifier is accomplished by traversing the tree, starting
FIGURE 5.3 Initiative from manager (a, b, c) and agent (d).
at the root, until the intended object is reached. Several formats can be used to describe an object identifier, with integer values separated by dots being the most common approach. As shown in Figure 5.4, ISO and the Telecommunications Standardization Sector of the International Telecommunications Union (ITU-T) are at the top of the hierarchy. Under the Internet branch of the tree (1.3.6.1), there are seven categories. Under the management (1.3.6.1.2) and MIB-2 (1.3.6.1.2.1) branches of the object identifier tree, we find the definitions of the standardized MIB modules. The
FIGURE 5.4 ASN.1 object identifier tree. At the root are ITU-T (0), ISO (1), and joint ISO/ITU-T (2). Under ISO are standard (0), ISO member body (2), and ISO identified organization (3); under the latter is US DoD (6), and under that the Internet (1) subtree. The Internet subtree branches into directory (1), management (2), experimental (3), private (4), security (5), SNMPv2 (6), and mail (7). Under management lies MIB-2 (1), whose groups include system (1), interface (2), address translation (3), ip (4), icmp (5), tcp (6), udp (7), egp (8), cmot (9), transmission (10), snmp (11), and RMON (16).
lowest level of the tree shows some of the important hardware-oriented MIB modules (system and interface) as well as modules associated with some of the most important Internet protocols. RFC 2400 lists all standardized MIB modules. 5.4.1.4 Security The security capabilities deal with mechanisms to control the access to network resources according to local guidelines so that the network cannot be damaged (intentionally or unintentionally) and persons without appropriate authorization have no access to sensitive information. SNMPv1 has no security features. For example, it is relatively easy to use the SetRequest command to corrupt the configuration parameters of a managed device, which in turn could seriously impair network operations. The SNMPv1 framework only allows the assignment of different access rights to variables (READ-ONLY, READ-WRITE), but performs no authentication. This means that anybody can modify READ-WRITE variables. This is a fundamental weakness in the SNMPv1 framework. Several proposals have been presented to improve SNMP. In 1992, IETF issued a new standard, SNMPv2.
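The naming tree and the GetNextRequest operation described above combine into the familiar MIB walk: because object identifiers are totally ordered, a manager can retrieve a whole subtree by repeatedly asking for the lexicographic successor of the last OID it received. The sketch below plays this through against a toy agent-side MIB; the stored values are invented, but the OIDs under 1.3.6.1.2.1.1 (the system group) follow the standard numbering.

    # Toy agent MIB: OIDs (as tuples) with invented values, kept in lexicographic order.
    MIB = {
        (1, 3, 6, 1, 2, 1, 1, 1, 0): "Example router, version 1.0",   # sysDescr.0
        (1, 3, 6, 1, 2, 1, 1, 3, 0): 123456,                          # sysUpTime.0
        (1, 3, 6, 1, 2, 1, 1, 5, 0): "router-7",                      # sysName.0
    }

    def get_next(oid):
        """Return the first (OID, value) pair that sorts after `oid`, or None."""
        for candidate in sorted(MIB):
            if candidate > oid:
                return candidate, MIB[candidate]
        return None

    # Manager-side walk of the system subtree (1.3.6.1.2.1.1):
    subtree = (1, 3, 6, 1, 2, 1, 1)
    oid = subtree
    while (result := get_next(oid)) and result[0][:len(subtree)] == subtree:
        oid, value = result
        print(".".join(map(str, oid)), "=", value)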
5.4.2 SNMPv2 Like SNMPv1, the SNMPv2 network management framework [RFC1213, RFC1441, RFC1445, RFC1448, RFC1902] consists of four major components: • RFC1441 and RFC1902 define the SMI, the mechanisms used for describing and naming objects for the purpose of management. • RFC1213 defines MIB-2, the core set of managed objects for the Internet suite of protocols. • RFC1445 defines the administrative and other architectural aspects of the framework. • RFC1448 defines the protocol used for network access to managed objects. The main achievements of SNMPv2 are improved performance, better security, and a possibility to build a hierarchy of managers. 5.4.2.1 Performance SNMPv1 includes a rule that states that if the response to a GetRequest or GetNextRequest (each of which can ask for multiple variables) would exceed the maximum size of a packet, no information will be returned at all. Because managers cannot determine the size of response packets in advance, they usually take a conservative guess and request just a small amount of data per PDU. To obtain all information, managers are required to issue a large number of consecutive requests. To improve performance, SNMPv2 introduced the GetBulk PDU. In comparison with Get and GetNext, the response to GetBulk always returns as much information as possible in lexicographic order. 5.4.2.2 Security The original SNMP had no security features. To solve this deficiency, SNMPv2 introduced a security mechanism that is based on the concepts of parties and contexts. The SNMPv2 party is a conceptual, virtual execution environment. When an agent or manager performs an action, it does so as a defined party, using the party’s environment as described in the configuration files. By using the party concept, an agent can permit one manager to do a certain set of operations (e.g., read, modify) and another manager to do a different set of operations. Each communication session with a different manager can have its own environment. The context concept is used to control access to the various parts of a MIB; each context refers to a specific part of a MIB. Contexts may be overlapping and are dynamically configurable, which means that contexts may be created, deleted, or modified during the network’s operational phase. 5.4.2.3 Hierarchy of Managers Practical experience with SNMPv1 showed that in several cases managers are unable to manage more than a few hundred agent systems. The main cause for this restriction is due to the polling nature of
FIGURE 5.5 Hierarchy of managers.
SNMPv1. This means that the manager must periodically poll every system under its control, which takes time. To solve this problem, SNMPv2 introduced the so-called intermediate-level managers concept, which allows polling to be performed by a number of intermediate-level managers under the control of top-level managers (TLMs) via the InformRequest command provided by SNMPv2. Figure 5.5 shows an example of hierarchical managers: before the intermediate-level managers start polling, the top-level manager tells the intermediate-level managers which variables must be polled from which agents. Furthermore, the top-level manager tells the intermediate-level managers which events it wants to be informed about. After the intermediate-level managers are configured, they start polling. If an intermediate-level manager detects an event of interest to the top-level manager, a special Inform PDU is generated and sent to the TLM. After reception of this PDU, the TLM directly operates upon the agent that caused the event. SNMPv2 dates back to 1992, when the IETF formed two working groups to define enhancements to SNMPv1. One of these groups focused on defining security functions, while the other concentrated on defining enhancements to the protocol. Unfortunately, the group tasked with developing the security enhancements broke into separate camps with diverging views concerning the manner in which security should be implemented. Two proposals (SNMPv2u and SNMPv2*) for the implementation of encryption and authentication were issued. Thus, the goal of the SNMPv3 working group was to continue the effort of the disbanded SNMPv2 working group to define a standard for SNMP security and administration.
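To make the performance difference described in Section 5.4.2.1 concrete, the following sketch simulates a table walk against a small in-memory set of OID/value pairs. The OIDs and values are invented for illustration and no real SNMP messages are exchanged; the point is only that a GetNext-style walk costs one request per variable, while a GetBulk-style request retrieves several lexicographic successors at once.

```python
# Illustrative sketch only: simulates GetNext-style vs. GetBulk-style
# retrieval against an in-memory table of OID/value pairs.
MIB = {
    "1.3.6.1.2.1.2.2.1.2.1": "eth0",     # hypothetical ifDescr.1
    "1.3.6.1.2.1.2.2.1.2.2": "eth1",     # hypothetical ifDescr.2
    "1.3.6.1.2.1.2.2.1.10.1": 123456,    # hypothetical ifInOctets.1
    "1.3.6.1.2.1.2.2.1.10.2": 654321,    # hypothetical ifInOctets.2
}

def oid_key(oid):
    # Lexicographic OID ordering compares sub-identifiers numerically.
    return tuple(int(part) for part in oid.split("."))

ORDERED = sorted(MIB, key=oid_key)

def get_next(oid):
    """One request/response round-trip per variable (SNMPv1 style)."""
    for candidate in ORDERED:
        if oid_key(candidate) > oid_key(oid):
            return candidate, MIB[candidate]
    return None

def get_bulk(oid, max_repetitions):
    """Up to max_repetitions lexicographic successors in one exchange (SNMPv2 style)."""
    result, current = [], oid
    for _ in range(max_repetitions):
        nxt = get_next(current)
        if nxt is None:
            break
        result.append(nxt)
        current = nxt[0]
    return result

# Walking the table with GetNext needs one round-trip per row;
# GetBulk retrieves the same rows in a single exchange.
print(get_bulk("1.3.6.1.2.1.2.2.1.2", max_repetitions=4))
```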
5.4.3 SNMPv3
The third version of the Simple Network Management Protocol (SNMPv3) was published as a set of proposed standards in RFCs 2271 to 2275 [RFC2271, RFC2272, RFC2273, RFC2274, RFC2275], which describe an overall architecture plus specific message structure and security features, but do not define a new SNMP PDU format. This version is built upon the first two versions of SNMP, and so it reuses the SNMPv2 standards documents (RFCs 1902 to 1908). SNMPv3 can be thought of as SNMPv2 with additional security and administration capabilities [RFC2570]. This section focuses on the management architecture and security capabilities of SNMPv3.
5.4.3.1 The Management Architecture
The SNMPv3 management architecture is also based on the manager–agent principle. The architecture described in RFC 2271 consists of a distributed, interacting collection of SNMP entities. Each entity implements a part of the SNMP capabilities and may act as an agent, a manager, or a combination of both. The SNMPv3 working group defines five generic applications (Figure 5.6) for generating and receiving SNMP PDUs: command generator, command responder, notification originator, notification receiver, and proxy forwarder. A command generator application generates the GetRequest, GetNextRequest, GetBulkRequest, and SetRequest PDUs and handles Response PDUs. A command responder application executes in an agent and receives, processes, and replies to the received GetRequest, GetNextRequest,
FIGURE 5.6 SNMPv3 entity.
GetBulkRequest, and SetRequest PDUs. A notification originator application also executes within an agent and generates Trap PDUs. A notification receiver accepts and reacts to incoming notifications. A proxy forwarder application forwards request, notification, and response PDUs. The architecture shown in Figure 5.6 also defines an SNMP engine that consists of four components: dispatcher, message processing subsystem, security subsystem, and access control subsystem. This SNMP engine is responsible for preparing PDU messages for transmission, extracting PDUs from incoming messages for delivery to the applications, and doing security-related processing of outgoing and incoming messages.
5.4.3.2 Security
The security capabilities of SNMPv3 are defined in RFC 2272, RFC 2274, RFC 2275, and RFC 3415 [RFC3415]. These specifications include message processing, a user-based security model, and a view-based access control model. The message processing can be used with any security model as follows. For outgoing messages, the message processor is responsible for constructing the message header attached to the outgoing PDUs and for passing the appropriate parameters to the security entity so that it can perform authentication and privacy functions, if required. For incoming messages, the message processor is used for passing the appropriate parameters to the security model for authentication and privacy processing and for processing and removing the message headers of the incoming PDUs. The user-based security model (USM) specified in RFC 2274 uses the data encryption standard (DES) for encryption and hashed message authentication codes (HMACs) for authentication [Sch95]. USM includes means for defining procedures by which one SNMP engine obtains information about another SNMP engine, and a key management protocol for defining procedures for key generation, update, and use. The view-based access control model implements the services required for an access control subsystem [RFC2275]. It makes an access control decision that is based on the requested resource, the security model and security level used for communicating the request, the context to which access is requested, the type of access requested, and the actual object for which access is requested.
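As a rough illustration of the authentication part of the user-based security model, the sketch below computes and verifies an HMAC over a hypothetical encoded PDU using Python's standard hmac module. Real USM additionally localizes keys per SNMP engine, truncates the MAC, and enforces timeliness windows, none of which is shown here; the key and PDU bytes are placeholders.

```python
import hashlib
import hmac

# Illustration only: HMAC-based message authentication in the spirit of USM.
def authenticate(message: bytes, auth_key: bytes) -> bytes:
    # Compute a message authentication code over the outgoing message.
    return hmac.new(auth_key, message, hashlib.sha1).digest()

def verify(message: bytes, auth_key: bytes, received_mac: bytes) -> bool:
    # Recompute the MAC at the receiver and compare in constant time.
    expected = authenticate(message, auth_key)
    return hmac.compare_digest(expected, received_mac)

key = b"example-localized-key"   # hypothetical key material
pdu = b"encoded-snmp-pdu-bytes"  # placeholder for an encoded PDU
mac = authenticate(pdu, key)
assert verify(pdu, key, mac)                   # untampered message passes
assert not verify(pdu + b"tampered", key, mac) # modified message fails
```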
5.5 ISO and Internet Management Standards: Analysis and Comparison The purpose of this section is to compare the two different network management frameworks described in the previous sections. This comparison focuses on the four management aspects described above (functional, information, communication, organization). In particular, the network management protocols (SNMP and CMIP), the management information base (MIB), and the management functions, management architectures, and security capabilities of these two frameworks are discussed. Possible solutions to some disadvantages are also presented.
5.5.1 SNMP and CMIP
The biggest advantage of SNMP over CMIP is its simple design, which makes it easy to implement and as easy to use on a small network as on a large one. Users can specify the variables to be monitored in a straightforward manner. From a low-level perspective, each variable consists of the following information:
• Variable name
• Its data type
• Its access attributes (READ-ONLY or READ-WRITE)
• Its value
Another advantage of SNMP is that it is in wide use today around the world. It has become so popular that no other network management protocol appears likely to replace it. As a result, almost all major vendors of network hardware, such as bridges and routers, design their products to support SNMP, making it very easy to deploy. SNMP also has several disadvantages. The first deficiency of SNMP is that it has significant security weaknesses that can give network intruders access to managed devices; intruders could even potentially shut down managed systems. To address this problem, SNMPv2 and SNMPv3 have added security mechanisms, as described above, that help combat these weaknesses. In comparison with SNMP, CMIP has a number of advantages. The biggest advantage of CMIP is that an agent does not merely relay information to and from a management station, as an SNMP agent does; a CMIP agent can perform management functions on its own instead of being restricted to gathering information for remote processing by a manager. Another advantage of the CMIP approach is that it addresses many of the shortcomings of SNMP. For instance, it has built-in security management facilities that support authorization, access control, and security logs. The result is a safer system from the beginning; no security upgrades are necessary. Although CMIP has many advantages, it has not been widely implemented. One problem of CMIP is that it needs more system resources than SNMP. Furthermore, a full implementation of CMIP requires adding more processes to network elements. One possible work-around is to decrease the size of the protocol by changing its specifications. Another problem with CMIP is that it is very difficult to program.
5.5.2 MIBs and SMI MIB and SMI represent the information aspects of a network management framework. In the ISO management framework, managed objects within the MIB are complex and have sophisticated data structures with three attributes: variable attributes that represent the variable characteristics, variable behaviors that define what actions can be triggered on that variable, and notifications that generate an event report whenever a specific event occurs [Sta99]. In contrast, in the SNMP framework, variables are only used to relay information to and from managers. The SNMP MIB concept has two important disadvantages. The first one is that the user has to know the names and meanings of (thousands of) different variables, which can be a daunting task. The second problem is the lack of variable aggregation: when the user wants to inquire about the values contained in an array, he has to ask separately for each element instead of naming the array at once. The latter problem has been fixed in the newer releases of SNMP: SNMPv2 and SNMPv3. These versions provide means for aggregating variables, e.g., by the new GetBulkRequest service. In fact, so many new features have been added that the formal specifications for SNMP MIBs have expanded considerably.
5.5.3 Network Management Functions
One advantage of the ISO management framework over the IETF framework is that ISO has defined five specific management functional areas, which help users develop management applications. In contrast, the IETF management framework has not defined any specific network management functions; these have to be provided entirely by the user. In fact, the IETF management standards explain
how individual management operations should be performed, but they do not specify the sequence in which these operations should be carried out to solve particular management problems.
5.6 DHCP: IP Address Management Framework for IPv4
In the previous sections, two network management frameworks have been discussed. These standards are used for developing automated network management tools and applications for monitoring and maintaining the network. In this section, the Dynamic Host Configuration Protocol (DHCP), an IP address management framework developed by the IETF, is discussed. Unlike the standards described before, DHCP is based on neither the ISO nor the IETF network management framework. Each computer that can connect to the Internet needs a unique IP address. When an organization sets up its computers with a connection to the Internet, an IP address must be assigned to each machine. In the early phase of the Internet, the administrator had to manually assign an IP address to each computer, and if computers were moved to another location in another part of the network, a new IP address had to be entered. With the daily changes and additions of new IP addresses, it has become extremely difficult to keep track of the IP address records across the multitude of IP nodes and subnets. Problems involving duplicate IP addresses, missing devices, and overflows of allocated IP address pools can bring down parts or the whole of a network until the problems are manually remedied. To overcome those problems, the IETF has developed the Dynamic Host Configuration Protocol [RFC2131, RFC2132, RFC3046], which allows for the automatic assignment of IP addresses to devices as they connect to the network. DHCP allows a computer to acquire all the configuration information it needs in a single message. Furthermore, the protocol permits IP addresses to be allocated automatically. To use DHCP’s dynamic address allocation mechanism, the network administrator must configure a DHCP server by supplying a set of IP addresses. Whenever a new computer connects to the network, this computer contacts the DHCP server and requests an IP address. The server chooses one of the addresses the administrator specified and allocates that address to the computer. In the next subsections, IP address allocation and IP address management within DHCP are discussed in detail.
5.6.1 IP Address Allocation Mechanisms DHCP supports three mechanisms for IP address allocation: • Automatic allocation: DHCP server assigns a permanent IP address to a computer when it first attaches to the network. • Dynamic allocation: DHCP server assigns an IP address to a computer for a limited period of time (or until the client explicitly relinquishes the address). This mechanism is useful for assigning an address to a computer that will be connected to the network only temporarily or for sharing a limited IP address pool among a group of clients that do not need permanent IP addresses. • Manual allocation: The network administrator can configure a specific address for a specific computer. A particular network will use one or more of those mechanisms, depending on the policies of the network administrator.
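The following toy allocator sketches how a DHCP server might implement the dynamic allocation mechanism, assuming a fixed address pool and a single lease duration. The class name, addresses, and lease time are invented for illustration; real servers persist their bindings and apply per-subnet and per-client policies.

```python
import time

# A toy dynamic-allocation pool: a fixed set of addresses handed out for a
# limited lease time and reclaimed when the lease expires or is released.
class LeasePool:
    def __init__(self, addresses, lease_seconds):
        self.free = list(addresses)
        self.leases = {}                 # client_id -> (address, expiry time)
        self.lease_seconds = lease_seconds

    def allocate(self, client_id):
        self.expire()
        if client_id in self.leases:     # renew an existing binding
            addr, _ = self.leases[client_id]
        elif self.free:
            addr = self.free.pop(0)      # hand out the next free address
        else:
            return None                  # pool exhausted
        self.leases[client_id] = (addr, time.time() + self.lease_seconds)
        return addr

    def release(self, client_id):
        if client_id in self.leases:
            addr, _ = self.leases.pop(client_id)
            self.free.append(addr)

    def expire(self):
        # Reclaim addresses whose lease time has passed.
        now = time.time()
        for cid, (addr, expiry) in list(self.leases.items()):
            if expiry < now:
                self.leases.pop(cid)
                self.free.append(addr)

pool = LeasePool(["192.168.1.100", "192.168.1.101"], lease_seconds=3600)
print(pool.allocate("aa:bb:cc:dd:ee:ff"))   # e.g. '192.168.1.100'
```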
5.6.2 The IP Address Management of DHCP In this subsection, DHCP is discussed from the viewpoint of four aspects: organization, information, function, and communication, which were presented in the previous sections while describing the ISO and IETF management frameworks. 5.6.2.1 Organization Aspect DHCP is built on a client–server model. The DHCP system consists of three types of devices: clients, relays, and servers. DHCP servers provide configuration information for one or several subnets. A DHCP
FIGURE 5.7 Communication between DHCP server and DHCP client.
FIGURE 5.8 The DHCP PDU format.
client is a host configured using information obtained from DHCP servers. If a client and a server reside on different networks, then a relay agent on the client’s network is needed to relay broadcast messages between the server and the client. The organization architecture is shown in Figure 5.7. A DHCP server in a network receives DHCP requests from a client and, if a dynamic address allocation policy is selected, allocates an IP address to the requesting client.
5.6.2.2 Information Aspect
The information aspects of DHCP deal with the network parameters (configuration parameters and IP addresses) exchanged between DHCP servers and DHCP clients, and the persistent storage of these parameters. The DHCP server stores a key–value entry for each client, where the key is some unique identifier and the value contains the configuration parameters for the client. A client can query the DHCP server to retrieve its configuration parameters. The client’s interface to the configuration parameters repository consists of protocol messages to request configuration parameters, and responses from the server carrying the configuration parameters.
5.6.2.3 Functional Aspect
The functions of DHCP are defined through DHCP PDUs. The format of a DHCP PDU is shown in Figure 5.8. Table 5.2 describes the fields in a DHCP message. There are eight message types for DHCP: five of them are used for messages sent from the client to the server, and the other three are used for messages sent from the server to the client. The types of these messages are described in Table 5.3.
5.6.2.4 Communication Aspect
The communication aspect deals with rules for communication between a DHCP client and a DHCP server for exchanging the DHCP PDUs. The client–server interaction can be classified into two cases: (1) client–server interaction for allocating an IP address, and (2) client–server interaction for reusing a previously allocated IP address. In both cases, the communication between clients and servers is performed in a confirmed way and initiated by clients.
TABLE 5.2 DHCP Message Field Description

Field    Description
op       Message type (BOOTREQUEST or BOOTREPLY); identifies whether a message is sent from a client to a server (BOOTREQUEST) or from a server to a client (BOOTREPLY)
xid      Transaction ID, a random number chosen by the client, used by the client and server to associate messages and responses between a client and a server
ciaddr   Client IP address; only filled in if the client can respond to ARP requests
yiaddr   "Your" (client) IP address, i.e., the address being assigned to the client
giaddr   Relay agent IP address, used in booting via a relay agent
sname    Optional server host name
file     Boot file name
options  Optional parameters field
TABLE 5.3 DHCP Message Types

PDU (Message)   Description

Sent from client to server:
DHCPDISCOVER    Client broadcast to locate available servers
DHCPREQUEST     Requesting parameters from one server
DHCPDECLINE     Indicating that an IP address is already in use
DHCPINFORM      Asking for local configuration parameters
DHCPRELEASE     Relinquishing an IP address and canceling the remaining lease

Sent from server to client:
DHCPOFFER       Response to a DHCPDISCOVER with an offer of configuration parameters
DHCPACK         Acknowledgment with configuration parameters, including a committed IP address
DHCPNAK         Refusal of a request for configuration parameters (e.g., requested IP address already in use)
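For readers who want to relate these message names to the wire format, the small helper below lists the values that DHCP carries in option 53 (the DHCP message type option defined in RFC 2132) and shows how such an option is encoded as a tag, length, value triple. The enum is only an illustrative convenience, not part of the protocol text above.

```python
from enum import IntEnum

# DHCP message types as carried in option 53 (values from RFC 2132).
class DHCPMessageType(IntEnum):
    DHCPDISCOVER = 1
    DHCPOFFER = 2
    DHCPREQUEST = 3
    DHCPDECLINE = 4
    DHCPACK = 5
    DHCPNAK = 6
    DHCPRELEASE = 7
    DHCPINFORM = 8

# Example: encode option 53 for a DHCPDISCOVER as tag, length, value.
option_53 = bytes([53, 1, DHCPMessageType.DHCPDISCOVER])
print(option_53.hex())   # '350101'
```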
• The client–server interaction for allocating an IP address is performed as follows [RFC2131]:
1. A client that attaches to the network for the first time takes the initiative by sending a DHCPDISCOVER broadcast message to locate available servers. The DHCPDISCOVER message may include options that suggest values for the network address and lease duration.
2. DHCP servers receiving the DHCPDISCOVER message may or may not respond with a DHCPOFFER (several servers may receive the same DHCPDISCOVER message). If a server decides to respond, it puts an available address into the "yiaddr" field (and other configuration parameters into DHCP options) and broadcasts a DHCPOFFER message. At this point, there is not yet an agreement on an assignment between the server and the client.
3. The client receives one or more DHCPOFFER messages from one or more servers and chooses one server among them. The client puts the IP address of the selected server into the "server identifier" option of a DHCPREQUEST and broadcasts it to indicate which server it has selected. This DHCPREQUEST is broadcast and relayed through DHCP relay agents.
4. The servers receive the DHCPREQUEST broadcast from the client and check the "server identifier" option. If it does not match a server's own address, that server interprets the message as a notification that the client has declined its offer. The selected server sends a DHCPACK (if the address is still available) or a DHCPNAK (if, for example, the address has meanwhile been assigned to another client).
5. The client that receives the DHCPACK starts using the IP address; from that point on, the client is configured. If it receives a DHCPNAK, it restarts from step 1.
6. If the client finds a problem with the address assigned in the DHCPACK, it sends a DHCPDECLINE to the server and restarts from step 1.
7. The client may choose to relinquish its lease of a network address by sending a DHCPRELEASE message to the server.
• Client–server interaction for reusing a previously allocated IP address: If a client remembers and wishes to reuse a previously allocated IP address, it may omit some of the steps described above. The interaction is performed as follows [RFC2131]:
1. The client broadcasts a DHCPREQUEST message with the "requested IP address" option, which indicates the previously assigned address.
2. A DHCP server that has a binding for the address returns a DHCPACK or DHCPNAK to the client. The DHCPACK message indicates that the client can use the previously assigned address; the DHCPNAK indicates that the client can no longer use it (e.g., because the address has been assigned to another client).
3. If the client receives the DHCPACK message, it performs a final check of the parameters, notes the duration of the lease specified in the DHCPACK message, and starts using the IP address. If the client receives the DHCPNAK message, it restarts the configuration process by requesting a new network address. If the client receives neither a DHCPACK nor a DHCPNAK message, it times out and retransmits the DHCPREQUEST message.
4. The client may choose to relinquish its lease of an IP address by sending a DHCPRELEASE message to the server.
DHCP uses UDP as its transport protocol. DHCP messages from a client to a server are sent to the DHCP server's port (67), and DHCP messages from a server to a client are sent to the DHCP client's port (68).
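As a sketch of step 1 of the allocation exchange above, the code below builds a minimal DHCPDISCOVER message (the fixed BOOTP-style header followed by the magic cookie and option 53) and broadcasts it to the server port. The transaction ID and MAC address are made-up example values, only a single option is set, and binding to client port 68 normally requires administrative privileges; a real client would also parse the DHCPOFFER replies it receives.

```python
import socket
import struct

def build_discover(xid: int, mac: bytes) -> bytes:
    # Fixed 236-byte BOOTP-style header of a DHCP message.
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                       # op: BOOTREQUEST
        1,                       # htype: Ethernet
        6,                       # hlen: MAC address length
        0,                       # hops
        xid,                     # transaction ID
        0,                       # secs
        0x8000,                  # flags: broadcast bit set
        b"\x00" * 4,             # ciaddr
        b"\x00" * 4,             # yiaddr
        b"\x00" * 4,             # siaddr
        b"\x00" * 4,             # giaddr
        mac.ljust(16, b"\x00"),  # chaddr
        b"\x00" * 64,            # sname
        b"\x00" * 128,           # file
    )
    options = (
        b"\x63\x82\x53\x63"      # magic cookie
        + bytes([53, 1, 1])      # option 53: message type = DHCPDISCOVER
        + bytes([255])           # end option
    )
    return header + options

def send_discover():
    msg = build_discover(xid=0x12345678, mac=bytes.fromhex("aabbccddeeff"))
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.bind(("", 68))                          # DHCP client port
    sock.sendto(msg, ("255.255.255.255", 67))    # DHCP server port
    sock.close()
```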
5.6.3 Advantages and Disadvantages of DHCP for IPv4
Nowadays, DHCP is used in many installations to pass configuration information to workstations. One of the main advantages of DHCP is that a workstation is not required to have any kind of permanent storage space. All network configuration parameters can be passed using DHCP without any human interaction. Another advantage is that DHCP can play an important role in reducing the cost of ownership for large organizations by moving the administration of client systems to centralized management servers. DHCP helps reduce the impact of the increasing scarcity of available IP addresses in two ways. First, DHCP can be used to manage the limited standard IP addresses that are available to an organization. It does this by issuing the addresses to clients on an as-needed basis and reclaiming them when the addresses are no longer required. Second, DHCP can be used in conjunction with network address translation (NAT) to issue private network addresses to connect clients to the Internet. However, DHCP for IPv4 has a few inherent problems. First of all, the base DHCP specification provides no authentication or security mechanisms, although an authentication scheme for DHCP messages has been proposed in RFC 3118 [RFC3118]. One example of this security problem is that configuration messages broadcast by a rogue DHCP server can cause all traffic to be routed through a malicious host that then eavesdrops on it. An even more dreadful situation would arise if a computer could obtain a tampered boot file that logs all login–password pairs and forwards them to a remote host. Second, data about configuration parameters and IP addresses are held locally in the DHCP servers, and there exists no standard for controlling and monitoring this configuration data on the DHCP servers. Another disadvantage of DHCP relates to the leasing mechanism: the client is expected to stop using any dynamically allocated IP address after the lease time expires. Additionally, a client requesting a new lease is not guaranteed to receive the same IP address as it had previously.
5.7 Conclusions We have surveyed the management architecture, protocols, services, and management information base of the network management frameworks standardized by ISO and IETF. We also introduced DHCP, an IP address management framework standardized by IETF. Within each of those frameworks, standards relating to four fundamental aspects of network management (functional, information, communication, and organization aspects) were addressed.
Both the ISO and the IETF network management frameworks have their advantages and disadvantages. However, the key decision factor in choosing between the two frameworks lies in their implementation. So far it has been almost impossible to find a system with the necessary resources to support the ISO framework, although it is conceptually superior to SNMP (v1, v2, and v3) in both design and operation. In comparison with both network management frameworks, DHCP is much simpler in both design and implementation. This is largely due to the fact that it focuses on only one particular task: IP address management. DHCP can play an important role in making systems management simpler and less expensive by moving the management of IP addresses away from the client systems and onto centralized servers.
References
[ASN90] ISO/IEC 8824. Specification of Abstract Syntax Notation One (ASN.1), April 1990.
[ISO7498-4] ITU-T-ISO/IEC. ITU-T X.700-ISO/IEC 7498-4. Information Processing Systems: Open Systems Interconnection: Management Framework for Open System Interconnection, 1992.
[ISO9595] ISO 9595. Information Processing Systems: Open Systems Interconnection: Common Management Information Service Definition, Geneva, 1990.
[ISO9596] ISO 9596. Information Processing Systems: Open Systems Interconnection: Common Management Information Protocol, Geneva, 1991.
[ISO10040] ITU-T-ISO/IEC. ITU-T X.701-ISO/IEC 10040. Information Processing Systems: Open System Interconnection: System Management Overview, 1992.
[ISO10165-1] ITU-T-ISO/IEC. ITU-T X.720-ISO/IEC 10165-1. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Management Information Model, Geneva, 1993.
[ISO10165-2] ISO 10165-2. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Definition of Management Information, Geneva, 1993.
[ISO10165-4] ISO 10165-4. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Part 4: Guidelines for the Definition of Managed Objects, Geneva, 1993.
[ISO10165-5] ISO 10165-5. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: Generic Management Information, Geneva, 1993.
[ISO10165-7] ISO 10165-7. Information Processing Systems: Open Systems Interconnection: Structure of Management Information: General Relationship Model, Geneva, 1993.
[KR01] James F. Kurose, Keith W. Ross. Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, Reading, MA, 2001.
[RFC951] Bill Croft, John Gilmore. Bootstrap Protocol (BOOTP), RFC 951, September 1985.
[RFC1066] K. McCloghrie, M. Rose. Management Information Base for Network Management of TCP/IP-Based Internets, RFC 1066, 1988.
[RFC1155] K. McCloghrie, M. Rose. Structure and Identification of Management Information for TCP/IP-Based Internets, RFC 1155, 1990.
[RFC1157] J. Case, M. Fedor, M. Schoffstall, C. Davin. The Simple Network Management Protocol, RFC 1157, May 1990.
[RFC1212] K. McCloghrie, M. Rose. Concise MIB Definitions, RFC 1212, 1991.
[RFC1213] K. McCloghrie, M. Rose. Management Information Base for Network Management of TCP/IP-Based Internets: MIB-II, RFC 1213, 1991.
[RFC1215] M. Rose. A Convention for Defining Traps for Use with the SNMP, RFC 1215, 1991.
[RFC1441] K. McCloghrie, M. Rose, J. Case, S. Waldbusser. Introduction to Version 2 of the Internet-Standard Network Management Framework, RFC 1441, 1993.
[RFC1445] J. Galvin, K. McCloghrie. Administrative Model for Version 2 of the Simple Network Management Protocol (SNMPv2), RFC 1445, 1993.
[RFC1448] K. McCloghrie, M. Rose, J. Case, S. Waldbusser. Protocol Operations for Version 2 of the Simple Network Management Protocol (SNMPv2), RFC 1448, 1993.
[RFC1542] W. Wimer. Clarifications and Extensions for the Bootstrap Protocol, RFC 1542, October 1993.
[RFC1902] J. Case, K. McCloghrie, M. Rose, S. Waldbusser. Structure of Management Information for Version 2 of the Simple Network Management Protocol (SNMPv2), RFC 1902, January 1996.
[RFC2131] R. Droms. Dynamic Host Configuration Protocol, RFC 2131, March 1997.
[RFC2132] S. Alexander, R. Droms. DHCP Options and BOOTP Vendor Extensions, RFC 2132, March 1997.
[RFC2271] D. Harrington, R. Presuhn, B. Wijnen. An Architecture for Describing SNMP Management Frameworks, RFC 2271, 1998.
[RFC2272] J. Case, D. Harrington, R. Presuhn, B. Wijnen. Message Processing and Dispatching for the Simple Network Management Protocol (SNMP), RFC 2272, 1998.
[RFC2273] D. Levi, P. Meyer, B. Stewart. SNMPv3 Applications, RFC 2273, 1998.
[RFC2274] U. Blumenthal, B. Wijnen. User-Based Security Model (USM) for Version 3 of the Simple Network Management Protocol (SNMPv3), RFC 2274, 1998.
[RFC2275] B. Wijnen, R. Presuhn, K. McCloghrie. View-Based Access Control Model (VACM) for the Simple Network Management Protocol (SNMP), RFC 2275, 1998.
[RFC2570] J. Case, R. Mundy, D. Partain, B. Stewart. Introduction to Version 3 of the Internet-Standard Network Management Framework, RFC 2570, 1999.
[RFC3046] M. Patrick. DHCP Relay Agent Information Option, RFC 3046, January 2001.
[RFC3118] R. Droms, W. Arbaugh. Authentication for DHCP Messages, RFC 3118, June 2001.
[RFC3411] D. Harrington, R. Presuhn, B. Wijnen. An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks, RFC 3411, 2002.
[RFC3415] B. Wijnen, R. Presuhn, K. McCloghrie. View-Based Access Control Model (VACM) for the Simple Network Management Protocol (SNMP), RFC 3415, 2002.
[Ros90] Marshall T. Rose. The Open Book: A Practical Perspective on OSI, Prentice Hall, Englewood Cliffs, NJ, 1990.
[Sch95] Bruce Schneier. Applied Cryptography: Protocols, Algorithms, and Source Code in C, John Wiley, New York, 1995.
[Sta93a] William Stallings. Networking Standards: A Guide to OSI, ISDN, LAN, and MAN Standards, Addison-Wesley, Reading, MA, 1993.
[Sta93b] William Stallings. SNMP, SNMPv2 and CMIP: The Practical Guide to Network Management Standards, Addison-Wesley, Reading, MA, 1993.
[Sta99] William Stallings. SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, Addison-Wesley, Reading, MA, 1999.
6
Internet Security

Christopher Kruegel
Vienna University of Technology

6.1 Security Attacks and Security Properties
6.2 Security Mechanisms
    Attack Prevention • Attack Avoidance • Attack and Intrusion Detection
6.3 Secure Network Protocols
6.4 Secure Applications
6.5 Summary
References
In order to provide useful services or to allow people to perform tasks more conveniently, computer systems are attached to networks and get interconnected. This results in the worldwide collection of local and wide area networks known as the Internet. Unfortunately, the extended access possibilities also entail increased security risks, as they open additional avenues for an attacker. For a closed, local system, the attacker was required to be physically present at the network in order to perform unauthorized actions. In the networked case, each host that can send packets to the victim can potentially be utilized for an attack. As certain services (such as Web or name servers) need to be publicly available, each machine on the Internet might be the originator of malicious activity. This fact makes attacks very likely to happen on a regular basis. The following sections attempt to give a systematic overview of the security requirements of Internet-based systems and potential means to satisfy them. We define properties of a secure system and provide a classification of potential threats to it. We also introduce mechanisms to defend against attacks that attempt to violate desired properties. The most widely used means to secure application data against tampering and eavesdropping, the Secure Sockets Layer (SSL), and its successor, the Transport Layer Security (TLS) protocol, are discussed. Finally, we briefly describe popular application programs that can act as building blocks for securing custom applications. Before one can evaluate attacks against a system and decide on appropriate mechanisms against them, it is necessary to specify a security policy [23]. A security policy defines the desired properties for each part of a secure computer system. It is a decision that has to take into account the value of the assets that should be protected, the expected threats, and the cost of proper protection mechanisms. A security policy that is sufficient for the data of a normal user at home may not be sufficient for bank applications, as these systems are obviously a more likely target and have to protect more valuable resources. Although often neglected, the formulation of an adequate security policy is a prerequisite before one can identify threats and appropriate mechanisms to face them.
6.1 Security Attacks and Security Properties For the following discussion, we assume that the function of a system that is the target of an attack is to provide information. In general, there is a flow of data from a source (e.g., host, file, memory) to a
FIGURE 6.1 Security attacks.
destination (e.g., remote host, other file, user) over a communication channel (e.g., wire, data bus). The task of the security system is to restrict access to this information to only those parties (persons or processes) that are authorized to have access according to the security policy in use. In the case of an automation system that is remotely connected to the Internet, the information flow is from or to a control application that manages sensors and actuators via communication lines of the public Internet and the network of the automation system (e.g., a fieldbus). The normal information flow and several categories of attacks that target it are shown in Figure 6.1 and explained below (according to [22]):
1. Interruption: An asset of the system gets destroyed or becomes unavailable. This attack targets the source or the communication channel and prevents information from reaching its intended target (e.g., cutting the wire, or overloading the link so that the information gets dropped because of congestion). Attacks in this category attempt to perform a kind of denial of service (DoS).
2. Interception: An unauthorized party gets access to the information by eavesdropping on the communication channel (e.g., wiretapping).
3. Modification: The information is not only intercepted, but modified by an unauthorized party while in transit from the source to the destination. By tampering with the information, it is actively altered (e.g., modifying message content).
4. Fabrication: An attacker inserts counterfeit objects into the system without having the sender do anything. When a previously intercepted object is inserted, this process is called replaying. When the attacker pretends to be the legitimate source and inserts his desired information, the attack is called masquerading (e.g., replaying an authentication message, adding records to a file).
The four classes of attacks listed above violate different security properties of the computer system. A security property describes a desired feature of a system with regard to a certain type of attack. A common classification following [5, 13] is listed below:
• Confidentiality: This property covers the protection of transmitted data against its release to nonauthorized parties. In addition to the protection of the content itself, the information flow should also be resistant against traffic analysis. Traffic analysis is used to gather information other than the transmitted values themselves from the data flow (e.g., timing data, frequency of messages).
• Authentication: Authentication is concerned with making sure that the information is authentic. A system implementing the authentication property assures the recipient that the data are from the source from which they claim to be. The system must make sure that no third party can masquerade successfully as another source.
• Nonrepudiation: This property describes the feature that prevents either sender or receiver from denying a transmitted message. When a message has been transferred, the sender can prove that it has been received. Similarly, the receiver can prove that the message has actually been sent.
• Availability: Availability characterizes a system whose resources are always ready to be used. Whenever information needs to be transmitted, the communication channel is available and the receiver can cope with the incoming data. This property makes sure that attacks cannot prevent resources from being used for their intended purpose.
• Integrity: Integrity protects transmitted information against modifications. This property ensures that a single message reaches the receiver as it has left the sender, but integrity also extends to a stream of messages. It means that no messages are lost, duplicated, or reordered, and it makes sure that messages cannot be replayed. As destruction is also covered under this property, all data must arrive at the receiver. Integrity is not only important as a security property, but also as a property for network protocols. Message integrity must also be ensured in case of random faults, not only in case of malicious modifications.
6.2 Security Mechanisms Different security mechanisms can be used to enforce the security properties defined in a given security policy. Depending on the anticipated attacks, different means have to be applied to satisfy the desired properties. We divide these measures against attacks into three different classes: attack prevention, attack avoidance, and attack detection.
6.2.1 Attack Prevention
Attack prevention is a class of security mechanisms that contains ways of preventing or defending against certain attacks before they can actually reach and affect the target. An important element in this category is access control, a mechanism that can be applied at different levels, such as the operating system, the network, or the application layer. Access control [23] limits and regulates access to critical resources. This is done by identifying or authenticating the party that requests a resource and checking its permissions against the rights specified for the demanded object. It is assumed that an attacker is not legitimately permitted to use the target object and is therefore denied access to the resource. As access is a prerequisite for an attack, any possible interference is prevented. The most common form of access control used in multiuser computer systems is access control lists for resources that are based on the user identity of the process that attempts to use them. The identity of a user is determined by an initial authentication process that usually requires a name and a password. The login process retrieves the stored copy of the password corresponding to the user name and compares it with the presented one. When both match, the system grants the user the appropriate user credentials. When a resource should be accessed, the system looks up the user and group in the access control list and grants or denies access as appropriate. An example of this kind of access control is a secure Web server. A secure Web server delivers certain resources only to clients that have authenticated themselves and possess sufficient credentials for the desired resource. The authentication process is usually handled by the Web client, such as Microsoft Internet Explorer or Mozilla, by prompting the user to enter his name and password. The most important access control system at the network layer is a firewall [4]. The idea of a firewall is based on the separation of a trusted inside network of computers under single administrative
FIGURE 6.2 Demilitarized zone.
control from a potentially hostile outside network. The firewall is a central choke point that allows enforcement of access control for services that may run at the inside or outside. The firewall prevents attacks from the outside against the machines in the inside network by denying connection attempts from unauthorized parties located outside. In addition, a firewall may also be utilized to prevent users behind the firewall from using certain services that are outside (e.g., surfing Web sites containing pornographic material). For certain installations, a single firewall is not suitable. Networks that consist of several server machines that need to be publicly accessible and workstations that should be completely protected against connections from the outside would benefit from a separation between these two groups. When an attacker compromises a server machine behind a single firewall, all other machines can be attacked from this new base without restrictions. To prevent this, one can use two firewalls and the concept of a demilitarized zone (DMZ) [4] in between, as shown in Figure 6.2. In this setup, one firewall separates the outside network from a segment (DMZ) with the server machines, while a second one separates this area from the rest of the network. The second firewall can be configured in a way that denies all incoming connection attempts. Whenever an intruder compromises a server, he is now unable to immediately attack a workstation located in the inside network. The following design goals for firewalls are identified in [4]:
1. All traffic from inside to outside, and vice versa, must pass through the firewall. This is achieved by physically blocking all access to the internal network except via the firewall.
2. Only authorized traffic, as defined by the local security policy, will be allowed to pass.
3. The firewall itself should be immune to penetration. This implies the use of a trusted system with a secure operating system. A trusted, secure operating system is often purpose built, has heightened security features, and only provides the minimal functionality necessary to run the desired applications.
These goals can be reached by using a number of general techniques for controlling access. The most common is called service control and determines the Internet services that can be accessed. Traffic on the Internet is currently filtered on the basis of Internet Protocol (IP) addresses and Transmission Control Protocol (TCP)/User Datagram Protocol (UDP) port numbers. In addition, there may be proxy software that receives and interprets each service request before passing it on. Direction control is a simple mechanism to control the direction in which particular service requests may be initiated and permitted to flow through. User control grants access to a service based on user credentials, similar to the technique used in a multiuser operating system. Controlling external users requires secure authentication over the network (e.g., as provided by IPSec [10]). A more declarative approach, in contrast to the operational variants mentioned above, is behavior control. This technique determines how particular services are used. It may be utilized to filter e-mail to eliminate spam or to allow external access to only part of the local Web pages. A summary of capabilities and limitations of firewalls is given in [22]. The following benefits can be expected:
• A firewall defines a single choke point that keeps unauthorized users out of the protected network. The use of such a point also simplifies security management.
• It provides a location for monitoring security-related events. Audits, logs, and alarms can be implemented on the firewall directly. In addition, it forms a convenient platform for some nonsecurity-related functions, such as address translation and network management.
• A firewall may serve as a platform to implement a virtual private network (e.g., by using IPSec).
The list below enumerates the limits of the firewall access control mechanism:
• A firewall cannot protect against attacks that bypass it, for example, via a direct dial-up link from the protected network to an Internet service provider (ISP). It also does not protect against internal threats from an inside hacker or an insider cooperating with an outside attacker.
• A firewall does not help when attacks are directed against targets whose access has to be permitted.
• It cannot protect against the transfer of virus-infected programs or files. It would be impossible, in practice, for the firewall to scan all incoming files and e-mails for viruses.
Firewalls can be divided into two main categories. A packet-filtering router, or packet filter for short, is an extended router that applies certain rules to the packets that are forwarded. Usually, traffic in each direction (incoming and outgoing) is checked against a rule set that determines whether a packet is permitted to continue or should be dropped. The packet filter rules operate on the header fields used by the underlying communication protocols, for the Internet almost always IP, TCP, and UDP. Packet filters have the advantage that they are cheap, as they can often be built on existing hardware. In addition, they offer good performance for high-traffic loads. An example of a packet filter is the iptables package, which is implemented as part of the Linux 2.4 routing software. A different approach is followed by an application-level gateway, also called a proxy server. This type of firewall does not forward packets on the network layer but acts as a relay on the application level. The user contacts the gateway, which in turn opens a connection to the intended target (on behalf of the user). A gateway completely separates the inside and outside networks at the network level and only provides a certain set of application services. This allows authentication of the user who requests a connection and session-oriented scanning of the exchanged traffic up to the application-level data. This feature makes application gateways more secure than packet filters and offers a broader range of logging facilities. On the downside, the overhead of such a setup may cause performance problems under heavy loads. Another important element in the set of attack prevention mechanisms is system hardening. System hardening is used to describe all steps that are taken to make a computer system more secure. It usually refers to changing the default configuration to a more secure one, possibly at the expense of ease of use. Vendors usually preinstall a large set of development tools and utilities, which, although beneficial to the new user, might also contain vulnerabilities. The initial configuration changes that are part of system hardening include the removal of services, applications, and accounts that are not needed and the enabling of operating system auditing mechanisms (e.g., the event log in Windows). Hardening also involves a vulnerability assessment of the system. Numerous open-source tools such as network scanners (e.g., nmap [8]) and vulnerability scanners (e.g., Nessus [12]) can help to check a system for open ports and known vulnerabilities. This knowledge then helps to remedy these vulnerabilities and close unnecessary ports. An important and ongoing effort in system hardening is patching. Patching describes a method of updating a file that replaces only the parts being changed, rather than the entire file.
It is used to replace parts of a (source or binary) file that contain a vulnerability that is exploitable by an attacker. To be able to patch, it is necessary that the system administrators keep up to date with security advisories that are issued by vendors to inform about security-related problems in their products.
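To illustrate the rule-based filtering performed by a packet-filtering router, the toy first-match filter below checks a packet's source address, destination port, and protocol against an ordered rule list ending in a default deny. The rule set and addresses are invented examples; real filters such as iptables match on many more header fields and support stateful inspection.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    action: str                      # "ACCEPT" or "DROP"
    src: Optional[str] = None        # source address prefix, None matches any
    dst_port: Optional[int] = None   # destination port, None matches any
    proto: Optional[str] = None      # "tcp" or "udp", None matches any

# Example rule set: allow inside web traffic and DNS, drop everything else.
RULES = [
    Rule("ACCEPT", src="10.0.0.", dst_port=80, proto="tcp"),
    Rule("ACCEPT", dst_port=53, proto="udp"),
    Rule("DROP"),                    # default deny
]

def filter_packet(src: str, dst_port: int, proto: str) -> str:
    for rule in RULES:
        if rule.src is not None and not src.startswith(rule.src):
            continue
        if rule.dst_port is not None and rule.dst_port != dst_port:
            continue
        if rule.proto is not None and rule.proto != proto:
            continue
        return rule.action           # first matching rule decides
    return "DROP"

print(filter_packet("10.0.0.5", 80, "tcp"))    # ACCEPT
print(filter_packet("192.0.2.9", 22, "tcp"))   # DROP
```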
6.2.2 Attack Avoidance Security mechanisms in this category assume that an intruder may access the desired resource but the information is modified in a way that makes it unusable for the attacker. The information is preprocessed at the sender before it is transmitted over the communication channel and postprocessed at the receiver. While the information is transported over the communication channel, it resists attacks by being nearly useless for an intruder. One notable exception is attacks against the availability of the information, as an attacker could still interrupt the message. During the processing step at the receiver, modifications or errors that might have previously occurred can be detected (usually because the information cannot be correctly reconstructed). When no modification has taken place, the information at the receiver is identical to the one at the sender before the preprocessing step.
© 2005 by CRC Press
6-6
The Industrial Communication Technology Handbook
FIGURE 6.3 Encryption and decryption.
The most important member in this category is cryptography, which is defined as the science of keeping messages secure [18]. It allows the sender to transform information into a random data stream from the point of view of an attacker, but to have it recovered by an authorized receiver (Figure 6.3). The original message is called plain text (sometimes clear text). The process of converting it through the application of some transformation rules into a format that hides its substance is called encryption. The corresponding disguised message is denoted cipher text, and the operation of turning it back into clear text is called decryption. It is important to notice that the conversion from plain to cipher text has to be lossless in order to be able to recover the original message at the receiver under all circumstances. The transformation rules are described by a cryptographic algorithm. The function of this algorithm is based on two main principles: substitution and transposition. In the case of substitution, each element of the plain text (e.g., bit, block) is mapped into another element of the used alphabet. Transposition describes the process where elements of the plain text are rearranged. Most systems involve multiple steps (called rounds) of transposition and substitution to be more resistant against cryptanalysis. Cryptanalysis is the science of breaking the cipher, i.e., discovering the substance of the message behind its disguise. When the transformation rules process the input elements one at a time, the mechanism is called a stream cipher; in case of operating on fixed-size input blocks, it is called a block cipher. If the security of an algorithm is based on keeping the way the algorithm works (i.e., the transformation rules) secret, it is called a restricted algorithm. Those algorithms are no longer of any interest today because they do not allow standardization or public quality control. In addition, when a large group of users is involved, such an approach cannot be used. A single person leaving the group makes it necessary for everyone else to change the algorithm. Modern cryptosystems solve this problem by basing the ability of the receiver to recover encrypted information on the fact that he possesses a secret piece of information (usually called the key). Both encryption and decryption functions have to use a key, and they are heavily dependent on it. When the security of the cryptosystem is completely based on the security of the key, the algorithm itself may be revealed. Although the security does not rely on the fact that the algorithm is unknown, the cryptographic function itself and the used key, together with its length, must be chosen with care. A common assumption is that the attacker has the fastest commercially available hardware at his disposal in his attempt to break the cipher text. The most common attack, called known plain text attack, is executed by obtaining cipher text together with its corresponding plain text. The encryption algorithm must be so complex that even if the code breaker is equipped with plenty of such pairs and powerful machines, it is infeasible for him to retrieve the key. An attack is infeasible when the cost of breaking the cipher exceeds the value of the information or the time it takes to break it exceeds the life span of the information. Given pairs of corresponding cipher and plain text, it is obvious that a simple key guessing algorithm will succeed after some time. 
The approach of successively trying different key values until the correct one is found is called a brute force attack because no information about the algorithm is utilized whatsoever. For an encryption algorithm to be useful, it is a necessary condition that brute force attacks are infeasible. Depending on the keys that are used, one can distinguish two major cryptographic approaches: secret and public key cryptosystems.
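The deliberately weak toy cipher below combines one substitution round (a byte shift) with one transposition round (a fixed column permutation), purely to make the two principles and the notion of a tiny, brute-forceable key space tangible. The key values are arbitrary and the scheme offers no real security.

```python
KEY_SHIFT = 3                 # substitution key
KEY_PERM = [2, 0, 3, 1]       # transposition key (block size 4)

def substitute(data: bytes, shift: int) -> bytes:
    # Substitution: map every byte to another byte of the alphabet.
    return bytes((b + shift) % 256 for b in data)

def transpose(data: bytes, perm) -> bytes:
    # Transposition: pad to a full block, then reorder each block.
    padded = data + b" " * (-len(data) % len(perm))
    out = bytearray()
    for i in range(0, len(padded), len(perm)):
        block = padded[i:i + len(perm)]
        out.extend(block[p] for p in perm)
    return bytes(out)

def encrypt(plain: bytes) -> bytes:
    return transpose(substitute(plain, KEY_SHIFT), KEY_PERM)

def decrypt(cipher: bytes) -> bytes:
    inverse = [KEY_PERM.index(i) for i in range(len(KEY_PERM))]
    return substitute(transpose(cipher, inverse), -KEY_SHIFT)

cipher = encrypt(b"attack at dawn")
print(decrypt(cipher))   # b'attack at dawn  ' (note the two padding bytes)
```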
© 2005 by CRC Press
Internet Security
6-7
6.2.2.1 Secret Key Cryptography
This is the kind of cryptography that has been used for the transmission of secret information for centuries, long before the advent of computers. These algorithms require that the sender and receiver agree on a key before communication is started. It is common for this variant (which is also called single key or symmetric encryption) that a single secret key is shared between the sender and receiver. It needs to be communicated in a secure way before the actual encrypted communication can start, and it has to remain secret as long as the information is to remain secret. Encryption is achieved by applying an agreed function to the plain text using the secret key. Decryption is performed by applying the inverse function using the same key. The classic example of a secret key block cipher, which is widely deployed today, is the data encryption standard (DES) [6]. DES was developed by IBM and adopted in 1977 as a standard by the U.S. government for administrative and business use. Recently, it has been replaced by the advanced encryption standard (AES, Rijndael) [1]. DES is a block cipher that operates on 64-bit plain text blocks and utilizes a key 56 bits in length. The algorithm uses 16 rounds that are key dependent. During each round, 48 key bits are selected and combined with the block that is encrypted. Then, the resulting block is piped through a substitution and a permutation phase (which use known values and are independent of the key) to make cryptanalysis harder. Although there is no known weakness of the DES algorithm itself, its security has been much debated. The small key length makes brute force attacks possible, and several cases have occurred where DES-protected information has been cracked. A suggested improvement called 3DES applies the simple DES three times with three different keys. This extends the key length to 168 bits while still resting on the very secure DES base. A well-known stream cipher that has been debated recently is RC4 [16], which was developed by RSA. It is used to secure transmissions in wireless networks that follow the IEEE 802.11 standard and forms the core of the wired equivalent privacy (WEP) mechanism. Although the cipher itself has not been broken, the way current implementations use it is flawed and reduces the security of RC4 to a level where the key in use can be recovered by statistical analysis within a few hours.
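Because the RC4 algorithm itself is public and very short, it is a convenient way to see how a stream cipher turns a key into a keystream that is XORed with the data. The sketch below is for study only and must not be used to protect real data, for the reasons discussed above; the key and plain text are arbitrary examples.

```python
def rc4_keystream(key: bytes):
    # Key-scheduling algorithm (KSA): permute the state array using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): emit one keystream byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]

def rc4_crypt(key: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same operation for a stream cipher.
    stream = rc4_keystream(key)
    return bytes(b ^ next(stream) for b in data)

key = b"example key"
cipher = rc4_crypt(key, b"plain text")
assert rc4_crypt(key, cipher) == b"plain text"
```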
6.2.2.2 Public Key Cryptography
Before the advent of public key cryptography, knowledge of the key that is used to encrypt a plain text also implied the ability to perform the inverse process, the decryption of the cipher text. In 1976, this paradigm of cryptography was changed by Diffie and Hellman [7] when they described their public key approach. Public key cryptography utilizes two different keys, one called the public key, the other the private key. The public key is used to encrypt a message, while the corresponding private key is used to do the opposite. Their innovation was based on the fact that it is infeasible to retrieve the private key given the public key. This makes it possible to remove the weakness of secure key transmission from the sender to the receiver. The receiver can simply generate his public–private key pair and announce the public key without fear. Anyone can obtain this key and use it to encrypt messages that only the receiver with his private key is able to decrypt. Mathematically, the process is based on one-way functions with a trap door. A one-way function is a function that is easy to compute but very hard to invert. That means that given x, it is easy to determine f(x), but given f(x), it is hard to get x. Hard is defined as computationally infeasible in the context of cryptographically strong one-way functions. Although it is obvious that some functions are easier to compute than their inverse (e.g., the square of a value in contrast to its square root), there is no mathematical proof that one-way functions exist. There are a number of problems that are considered difficult enough to act as one-way functions (e.g., the factorization of large numbers), but this is more an agreement among cryptanalysts than a rigorously defined set. A one-way function is not directly usable for cryptography, but it becomes so when a trap door exists. A trap door is a mechanism that allows one to easily calculate x from f(x) when additional information y is provided. A common misunderstanding about public key cryptography is thinking that it makes secret key systems obsolete, either because it is more secure or because it does not have the problem of secretly exchanging keys. As the security of a cryptosystem depends on the length of the used key and the utilized
transformation rules, there is no automatic advantage of one approach over the other. Although the key exchange problem is elegantly solved with a public key, the process itself is very slow and has its own problems. Secret key systems are usually a factor of 1000 faster than their public key counterparts (see [18] for exact numbers). Therefore, most communication is still secured using secret key systems, and public key systems are only utilized to exchange the secret key for the later communication. This hybrid approach is the common design for benefiting both from the high speed of conventional cryptography (which is often implemented directly in hardware) and from a secure key exchange.
A problem in public key systems is the authenticity of the public key. An attacker may offer the sender his own public key and pretend that it originates from the legitimate receiver. The sender then uses the fake public key to perform his encryption, and the attacker can simply decrypt the message using his private key. In order to thwart an attacker who attempts to substitute his public key for the victim's, certificates are used. A certificate combines user information with the user's public key and the digital signature of a trusted third party that guarantees that the key belongs to the mentioned person. The trusted third party is usually called a certification authority (CA). The certificate of a CA itself is usually verified by a higher-level CA that confirms that the CA's certificate is genuine and contains its public key. The chain of third parties that verify their respective lower-level CAs has to end at a certain point, which is called the root CA. A user who wants to verify the authenticity of a public key and all involved CAs needs to obtain the self-signed certificate of the root CA via an external channel. Web browsers (e.g., Netscape Navigator, Internet Explorer) usually ship with a number of certificates of globally known root CAs. A framework that implements the distribution of certificates is called a public key infrastructure (PKI). An important standard in this area is X.509 [25], which specifies the format of public key certificates. Another important issue is revocation, the invalidation of a certificate when the corresponding key has been compromised.
The best-known public key algorithm and textbook classic is RSA [17], named after its inventors at MIT, Rivest, Shamir, and Adleman. It is a block cipher that is still utilized for the majority of current systems, although the key length has been increased over recent years. This has put a heavier processing load on applications, a burden that has ramifications especially for sites doing electronic commerce. A competing approach that promises security similar to RSA with far smaller key lengths is elliptic curve cryptography. However, as these systems are newer and have not been subject to sustained cryptanalysis, the confidence level in them is not yet as high as in RSA.

6.2.2.3 Authentication and Digital Signatures
An interesting and important feature of public key cryptography is its possible use for authentication. In addition to making the information unusable for attackers, a sender may utilize cryptography to prove his identity to the receiver. This feature is realized by digital signatures. A digital signature must have properties similar to those of a normal handwritten signature. It must be hard to forge, and it has to be bound to a certain document. In addition, one has to make sure that a valid signature cannot be used by an attacker to replay the same (or different) messages at a later time.
A way to realize such a digital signature is by using the sender’s private key to encrypt a message. When the receiver is capable of successfully decrypting the cipher text with the sender’s public key, he can be sure that the message is authentic. This approach obviously requires a cryptosystem that allows encryption with the private key, but many (such as RSA) offer this option. It is easy for a receiver to verify that a message has been successfully decrypted when the plain text is in a human readable format. For binary data, a checksum or similar integrity checking footer can be added to verify a successful decryption. Replay attacks are prevented by adding a time stamp to the message (e.g., Kerberos [11] uses time stamps to prevent messages to the ticket-granting service from being replayed). Usually, the storage and processing overhead for encrypting a whole document is too high to be practical. This is solved by one-way hash functions. These are functions that map the content of a message onto a short value (called message digest). Similar to one-way functions, it is difficult to create a message when given only the hash value itself. Instead of encrypting the whole message, it is enough to simply encrypt the message digest and send it together with the original message. The receiver can then apply
the known hash function (e.g., MD5 [15]) to the document and compare it to the decrypted digest. When both values match, the message is authentic.
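A minimal sketch of this hash-then-sign scheme, again assuming the Python cryptography package (the RSA key, the PSS padding, and the use of SHA-256 instead of MD5 are illustrative choices): the sender signs a digest of the message with his private key, and the receiver verifies it with the sender's public key.

```python
# Hash-then-sign: only a digest of the message is signed, not the whole
# document (requires the third-party "cryptography" package).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

sender_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
message = b"open valve 3 at 14:00"

# Sender: sign the SHA-256 digest of the message with the private key.
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)
signature = sender_key.sign(message, pss, hashes.SHA256())

# Receiver: verify with the sender's public key; any tampering with the
# message or the signature raises InvalidSignature.
try:
    sender_key.public_key().verify(signature, message, pss, hashes.SHA256())
    print("message is authentic")
except InvalidSignature:
    print("signature check failed")
```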
6.2.3 Attack and Intrusion Detection
Attack detection assumes that an attacker can obtain access to his desired targets and is successful in violating a given security policy. Mechanisms in this class are based on the optimistic assumption that most of the time the information is transferred without interference. When undesired actions occur, attack detection has the task of reporting that something went wrong and then reacting in an appropriate way. In addition, it is often desirable to identify the exact type of attack. An important facet of attack detection is recovery. Often it is enough to just report that malicious activity has been found, but some systems require that the effect of the attack be reverted or that an ongoing, discovered attack be stopped.
On the one hand, attack detection has the advantage that it operates under the worst-case assumption that the attacker gains access to the communication channel and is able to use or modify the resource. On the other hand, detection is not effective in providing confidentiality of information. When the security policy specifies that interception of information has a serious security impact, then attack detection is not an applicable mechanism.
The most important members of the attack detection class, which have received an increasing amount of attention in the last few years, are intrusion detection systems (IDSs). Intrusion detection [2, 3] is the process of identifying and responding to malicious activities targeted at computing and network resources. This definition introduces the notion of intrusion detection as a process, which involves technology, people, and tools. An intrusion detection system basically monitors and collects data from a target system that should be protected, processes and correlates the gathered information, and initiates responses when evidence of an intrusion is detected.
IDSs are traditionally classified as anomaly based or signature based. Signature-based systems act similarly to virus scanners and look for known, suspicious patterns in their input data. Anomaly-based systems watch for deviations of actual from expected behavior and classify all abnormal activities as malicious. The advantage of signature-based designs is that they can identify attacks with acceptable accuracy and tend to produce fewer false alarms (i.e., classifying an action as malicious when in fact it is not) than their anomaly-based cousins. The systems are also more intuitive to build and easier to install and configure, especially in large production networks. Because of this, nearly all commercial systems and most deployed installations utilize signature-based detection. Although anomaly-based variants offer the advantage of being able to find previously unknown intrusions, the cost of having to deal with an order of magnitude more false alarms is often prohibitive.
Depending on their source of input data, IDSs can be classified as either network based or host based. Network-based systems collect data from network traffic (e.g., packets captured by network interfaces in promiscuous mode), while host-based systems monitor events at the operating system level, such as system calls, or receive input from applications (e.g., via log files). Host-based designs can collect high-quality data directly from the affected system and are not influenced by encrypted network traffic. Nevertheless, they often seriously impact the performance of the machines they are running on.
Network-based IDSs, on the other hand, can be set up in a nonintrusive manner — often as an appliance box without interfering with the existing infrastructure. In many cases, this makes them the preferred choice. As many vendors and research centers have developed their own intrusion detection system versions, the Internet Engineering Task Force (IETF) has created the intrusion detection working group [9] to coordinate international standardization efforts. The aim is to allow intrusion detection systems to share information and to communicate via well-defined interfaces by proposing a generic architectural description and a message specification and exchange format (IDMEF). A major issue when deploying intrusion detection systems in large network installations is the huge number of alerts that are produced. These alerts have to be analyzed by system administrators who have to decide on the appropriate countermeasures. Given the current state of the art of intrusion detection, however, many of the reported incidents are in fact false alerts. This makes the analysis process for the
system administrator cumbersome and frustrating, resulting in the problem that IDSs are often disabled or ignored. To address this issue, two new techniques have been proposed: alert correlation and alert verification. Alert correlation is an analysis process that takes as input the alerts produced by intrusion detection systems and produces compact reports on the security status of the network under surveillance. By reducing the total number of individual alerts and aggregating related incidents into a single report, it is easier for a system administrator to distinguish actual and bogus alarms. In addition, alert correlation offers the benefit of recognizing higher-level patterns in an alert stream, helping the administrator to obtain a better overview of the activities on the network. Alert verification is a technique that is directly aimed at the problem that intrusion detection systems often have to analyze data without sufficient contextual information. The classic example is the scenario of a Code Red worm that attacks a Linux Web server. It is a valid attack that is seen on the network; however, the alert that an IDS raises is of no use because the Linux server is not vulnerable (as Code Red can only exploit vulnerabilities in Microsoft’s IIS Web server). The intrusion detection system would require more information to determine that this attack cannot possibly succeed than what is available from only looking at network packets. Alert verification is a term that is used for all mechanisms that use additional information or means to determine whether an attack was successful. In the example above, the alert verification mechanism could supply the IDS with the knowledge that the attacked Linux server is not vulnerable to a Code Red attack. As a consequence, the IDS can react accordingly and suppress the alert or reduce its priority and thus reduce the workload of the administrator.
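A toy sketch of the alert correlation idea (the alert fields, time window, and report format are invented for illustration): raw alerts are grouped by source and signature within a time window, so the administrator receives one compact report per incident instead of a flood of individual alarms.

```python
# Naive alert correlation: group raw IDS alerts by (source, signature) and
# report each run of related alerts once, with a count and time span.
from collections import defaultdict

def correlate(alerts, window=60.0):
    """alerts: iterable of dicts with 'time', 'src', 'sig' keys (assumed format)."""
    runs = defaultdict(list)                      # (src, sig) -> list of runs
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["src"], alert["sig"])
        current = runs[key]
        # Open a new run if there is none yet or the last alert is too old.
        if not current or alert["time"] - current[-1][-1]["time"] > window:
            current.append([])
        current[-1].append(alert)
    return [{"src": src, "sig": sig, "count": len(run),
             "first": run[0]["time"], "last": run[-1]["time"]}
            for (src, sig), groups in runs.items() for run in groups]

raw = [{"time": 10.0, "src": "10.0.0.5", "sig": "portscan"},
       {"time": 12.5, "src": "10.0.0.5", "sig": "portscan"},
       {"time": 300.0, "src": "10.0.0.9", "sig": "web exploit"}]
for report in correlate(raw):
    print(report)
```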
6.3 Secure Network Protocols Now that the general concepts and mechanisms of network security have been introduced, the following section concentrates on two actual instances of secure network protocols: the secure sockets layer (SSL [20]) and the transport layer security (TLS [24]) protocol. The idea of secure network protocols is to create an additional layer between the application and transport/network layers to provide services for a secure end-to-end communication channel. TCP/IP are almost always used as transport/network layer protocols on the Internet, and their task is to provide a reliable end-to-end connection between remote tasks on different machines that intend to communicate. The services on that level are usually directly utilized by application protocols to exchange data, for example, Hypertext Transfer Protocol (HTTP) for Web services. Unfortunately, the network layer transmits this data unencrypted, leaving it vulnerable to eavesdropping or tampering attacks. In addition, the authentication mechanisms of TCP/IP are only minimal, thereby allowing a malicious user to hijack connections and redirect traffic to his machine as well as to impersonate legitimate services. These threats are mitigated by secure network protocols that provide privacy and data integrity between two communicating applications by creating an encrypted and authenticated channel. SSL has emerged as the de facto standard for secure network protocols. Originally developed by Netscape, its latest version SSL 3.0 is also the base for the standard proposed by the IETF under the name TLS. Both protocols are quite similar and share common ideas, but they unfortunately cannot interoperate. The following discussion will mainly concentrate on SSL and only briefly explain the extensions implemented in TLS. The SSL protocol [21] usually runs above TCP/IP (although it could use any transport protocol) and below higher-level protocols such as HTTP. It uses TCP/IP on behalf of the higher-level protocols, and in the process allows an SSL-enabled server to authenticate itself to an SSL-enabled client, allows the client to authenticate itself to the server, and allows both machines to establish an encrypted connection. These capabilities address fundamental concerns about communication over the Internet and other TCP/IP networks and give protection against message tampering, eavesdropping, and spoofing. • SSL server authentication allows a user to confirm a server’s identity. SSL-enabled client software can use standard techniques of public key cryptography to check that a server’s certificate and
public key are valid and have been issued by a certification authority (CA) listed in the client's list of trusted CAs. This confirmation might be important if the user, for example, is sending a credit card number over the network and wants to check the receiving server's identity.
• SSL client authentication allows a server to confirm a user's identity. Using the same techniques as those used for server authentication, SSL-enabled server software can check that a client's certificate and public key are valid and have been issued by a certification authority (CA) listed in the server's list of trusted CAs. This confirmation might be important if the server, for example, is a bank sending confidential financial information to a customer and wants to check the recipient's identity.
• An encrypted SSL connection requires all information sent between a client and a server to be encrypted by the sending software and decrypted by the receiving software, thus providing a high degree of confidentiality. Confidentiality is important for both parties to any private transaction. In addition, all data sent over an encrypted SSL connection are protected with a mechanism for detecting tampering — that is, for automatically determining whether the data has been altered in transit.
SSL uses X.509 certificates for authentication, RSA as its public key cipher, and one of RC4-128, RC2-128, DES, 3DES, or IDEA (international data encryption algorithm) as its bulk symmetric cipher. The SSL protocol includes two subprotocols: the SSL Record Protocol and the SSL Handshake Protocol. The SSL Record Protocol simply defines the format used to transmit data. The SSL Handshake Protocol (using the SSL Record Protocol) is utilized to exchange a series of messages between an SSL-enabled server and an SSL-enabled client when they first establish an SSL connection. This exchange of messages is designed to facilitate the following actions:
• Authenticate the server to the client
• Allow the client and server to select the cryptographic algorithms, or ciphers, that they both support
• Optionally authenticate the client to the server
• Use public key encryption techniques to generate shared secrets
• Establish an encrypted SSL connection based on the previously exchanged shared secret
The SSL Handshake Protocol is composed of two phases. Phase 1 deals with the selection of a cipher, the exchange of a secret key, and the authentication of the server. Phase 2 handles client authentication, if requested, and finishes the handshaking. After the handshake stage is complete, the data transfer between client and server begins. All messages during handshaking and after are sent over the SSL Record Protocol layer. Optionally, session identifiers can be used to reestablish a secure connection that has been previously set up. Figure 6.4 lists in a slightly simplified form the messages that are exchanged between the client C and the server S during a handshake when neither client authentication nor session identifiers are involved. In this figure, {data}key means that data has been encrypted with a key. The message exchange shows that the client first sends a challenge to the server, which responds with an X.509 certificate containing its public key. The client then creates a secret key and uses RSA with the server's public key to encrypt it, sending the result back to the server. Only the server is capable of decrypting that message with its private key and can retrieve the shared, secret key. In order to prove to the client that
FIGURE 6.4 SSL handshake message exchange.
the secret key has been successfully decrypted, the server encrypts the client's challenge with the secret key and returns it. When the client is able to decrypt this message and successfully retrieve the original challenge by using the secret key, it can be certain that the server has access to the private key corresponding to its certificate. From this point on, all communication is encrypted using the chosen cipher and the shared secret key.
TLS uses the same two subprotocols described above and a similar handshake mechanism. Nevertheless, the algorithms for calculating message authentication codes (MACs) and secret keys have been modified to make them cryptographically more secure. In addition, the constraints on padding a message up to the next block size have been relaxed for TLS. This leads to an incompatibility between the two protocols.
SSL/TLS is widely used to secure Web and mail traffic. HTTP and the current mail protocols IMAP (Internet Message Access Protocol) and POP3 (Post Office Protocol version 3) transmit user credential information as well as application data unencrypted. By building these protocols on top of a secure network protocol such as SSL/TLS, they benefit from secured channels without modification. The secure variants simply utilize different well-known destination ports (443 for HTTPS, 993 for IMAPS, and 995 for POP3S) than their insecure cousins.
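A minimal client-side sketch of such a secured channel, using Python's standard ssl module (the host name is a placeholder; a current Python installation will negotiate TLS rather than the SSL 3.0 described above, but the handshake, server authentication, and encrypted transfer follow the same pattern).

```python
# Open a TLS-protected connection and inspect the negotiated parameters.
import socket
import ssl

host = "www.example.org"                 # placeholder host
context = ssl.create_default_context()   # loads trusted root CA certificates

with socket.create_connection((host, 443)) as raw_sock:
    # wrap_socket performs the handshake: server authentication via its
    # certificate chain, cipher negotiation, and key exchange.
    with context.wrap_socket(raw_sock, server_hostname=host) as tls:
        print("protocol:", tls.version())       # e.g. 'TLSv1.3'
        print("cipher:  ", tls.cipher())
        print("issuer:  ", tls.getpeercert().get("issuer"))
        # From here on, tls.sendall()/tls.recv() carry encrypted application
        # data, e.g. an HTTP request in the case of HTTPS.
```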
6.4 Secure Applications
A variety of popular tools that allow access to remote hosts (such as telnet, rsh, and rlogin) or that provide means for file transfer (such as rcp or ftp) exchange user credentials and data in plain text. This makes them vulnerable to eavesdropping, tampering, and spoofing attacks. Although the tools mentioned above could also have been built upon SSL/TLS, a different protocol suite called Secure Shell (SSH) [19] has been developed that pursues partially overlapping goals. The SSH transport and user authentication protocols have features similar to those of SSL/TLS. However, they differ in the following ways:
• TLS server authentication is optional, and the protocol supports fully anonymous operation, in which neither side is authenticated. As such connections are inherently vulnerable to man-in-the-middle attacks, SSH requires server authentication.
• TLS does not provide the range of client authentication options that SSH does — public key via RSA is the only option.
• Most importantly, TLS does not have the extra features provided by the SSH connection protocol.
The SSH connection protocol uses the underlying connection (also known as a secure tunnel) that has been established by the SSH transport and user authentication protocols between two hosts. It provides interactive login sessions, remote execution of commands, and forwarded TCP/IP as well as X11 connections. All these terminal sessions and forwarded connections are realized as different logical channels that may be opened by either side on top of the secure tunnel. Channels are flow controlled, which means that no data may be sent to a channel until a message is received to indicate that window space is available.
The current version of the SSH protocol is SSH 2. It represents a complete rewrite of SSH 1 and improves some of its structural weaknesses. Because it encrypts packets in a different way and has abandoned the notion of server and host keys in favor of host keys only, the two protocols are incompatible. For applications built from scratch, SSH 2 should always be the preferred choice. Using the mechanism of logical channels for interactive login sessions and remote execution, a complete replacement for telnet, rsh, and rlogin can easily be implemented. A popular site that lists open-source implementations freely available for many different platforms can be found at [14]. Recently, a secure file transfer protocol (sftp) application has been developed that makes the use of regular File Transfer Protocol (FTP)-based programs obsolete.
Notice that it is possible to tunnel arbitrary application traffic over a connection that has been previously set up by the SSH protocols. Similar to SSL/TLS, Web and mail traffic could be securely transmitted over an SSH connection before reaching the server port at the destination host. The difference is that SSH requires that a secure tunnel be created in advance that is bound to a certain port at the destination host. The setup
of this secure channel, however, requires that the client that is initiating the connection has to log in to the server. Usually, this makes it necessary that the user has an account at the destination host. After the tunnel has been established, all traffic sent by the client gets forwarded to the desired port at the target machine. Obviously, the connection is encrypted. In contrast, SSL/TLS connects directly to a certain point without prior logging in to the destination host. The encryption is set up directly between the client and the service listening at the destination port without a prior redirection via the SSH server. The technique of tunneling application traffic is often utilized for mail transactions when the mail server does not support SSL/TLS directly (as users have accounts at the mail server anyway), but it is less common for Web traffic.
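A small sketch of the kind of remote execution SSH provides in place of rsh or rlogin, using the third-party paramiko library (the library choice, host, user, and command are illustrative assumptions; in practice the server's host key should be verified rather than accepted automatically).

```python
# Remote command execution over an encrypted, authenticated SSH channel,
# replacing clear-text tools such as rsh/rlogin (requires paramiko).
import paramiko

client = paramiko.SSHClient()
# Accept unknown host keys for this demo only; production code should verify
# the server's host key to prevent man-in-the-middle attacks.
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("gateway.example.org", username="operator", password="secret")

stdin, stdout, stderr = client.exec_command("uptime")
print(stdout.read().decode())

client.close()
```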
6.5 Summary This chapter discusses security threats that systems face when they are connected to the Internet. In order to achieve the security properties that are required by the security policy in use, three different classes of mechanisms can be adopted. The first is attack prevention, which attempts to stop the attacker before it can reach its desired goals. Such techniques fall into the category of access control and firewalls. The second approach aims to make the data unusable for unauthorized persons by applying cryptographic means. Secret key as well as public key mechanisms can be utilized. The third class of mechanisms contains attack detection approaches. They attempt to detect malicious behavior and recover after undesired activity has been identified. The text also covers secure network protocols and applications. SSL/TLS as well as SSH are introduced, and their most common fields of operations are highlighted. These protocols form the base of securing traffic that is sent over the Internet on behalf of a variety of different applications.
References
[1] Advanced Encryption Standard (AES). National Institute of Standards and Technology, U.S. Department of Commerce, FIPS 197, 2001.
[2] Edward Amoroso. Intrusion Detection: An Introduction to Internet Surveillance, Correlation, Trace Back, and Response. Intrusion.Net Books, Andover, NJ, 1999.
[3] Rebecca Bace. Intrusion Detection. Macmillan Technical Publishing, Indianapolis, 2000.
[4] William R. Cheswick and Steven M. Bellovin. Firewalls and Internet Security. Addison-Wesley, Reading, MA, 1994.
[5] George Coulouris, Jean Dollimore, and Tim Kindberg. Distributed Systems: Concepts and Design, 2nd edition. Addison-Wesley, Harlow, England, 1996.
[6] Data Encryption Standard (DES). National Bureau of Standards, U.S. Department of Commerce, FIPS 46-3, 1977.
[7] W. Diffie and M. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22:644-654, 1976.
[8] Fyodor. Nmap: The Network Mapper. http://www.insecure.org/nmap/.
[9] Intrusion Detection Working Group. http://www.ietf.org/ids.by.wg/idwg.html.
[10] IP Security Protocol. http://www.ietf.org/html.charters/ipsec-charter.html, 2002.
[11] J. Kohl, B. Neuman, and T. Ts'o. The evolution of the Kerberos authentication system. Distributed Open Systems, 78-94, IEEE Computer Society Press, 1994.
[12] Nessus Vulnerability Scanner. http://www.nessus.org/.
[13] Steven Northcutt. Network Intrusion Detection: An Analyst's Handbook. New Riders, Indianapolis, 1999.
[14] OpenSSH: Free SSH Tool Suite. http://www.openssh.org.
[15] R.L. Rivest. The MD5 Message-Digest Algorithm. Technical report, Internet Request for Comments (RFC) 1321, 1992.
[16] R.L. Rivest. The RC4 Encryption Algorithm. Technical report, RSA Data Security, 1992.
[17] R.L. Rivest, A. Shamir, and L.A. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21:120-126, 1978.
[18] Bruce Schneier. Applied Cryptography, 2nd edition. John Wiley & Sons, New York, 1996.
[19] Secure Shell (secsh). http://www.ietf.org/html.charters/secsh-charter.html, 2002.
[20] Secure Socket Layer. http://wp.netscape.com/eng/ssl3/, 1996.
[21] Introduction to Secure Socket Layer. http://developer.netscape.com/docs/manuals/security/sslin/contents.htm, 1996.
[22] William Stallings. Network Security Essentials: Applications and Standards. Prentice Hall, Englewood Cliffs, NJ, 2000.
[23] Andrew S. Tanenbaum and Maarten van Steen. Distributed Systems: Principles and Paradigms. Prentice Hall, Englewood Cliffs, NJ, 2002.
[24] Transport Layer Security. http://www.ietf.org/html.charters/tsl-charter.html, 2002.
[25] Public Key Infrastructure X.509. http://www.ietf.org/html.charters/pkix-charter.html, 2002.
2 Industrial Communication Technology and Systems
I Field Area and Control Networks
7 Fieldbus Systems: History and Evolution (Thilo Sauter) 7-1
8 The WorldFIP Fieldbus (Jean-Pierre Thomesse) 8-1
9 FOUNDATION Fieldbus: History and Features (Salvatore Cavalieri) 9-1
10 PROFIBUS: Open Solutions for the World of Automation (Ulrich Jecht, Wolfgang Stripf, and Peter Wenzel) 10-1
11 Principles and Features of PROFInet (Manfred Popp, Joachim Feld, and Ralph Büsgen) 11-1
12 Dependable Time-Triggered Communication (Hermann Kopetz, Günther Bauer, and Wilfried Steiner) 12-1
13 Controller Area Network: A Survey (Gianluca Cena and Adriano Valenzano) 13-1
14 The CIP Family of Fieldbus Protocols (Viktor Schiffer) 14-1
15 The Anatomy of the P-NET Fieldbus (Christopher G. Jenkins) 15-1
16 INTERBUS Means Speed, Connectivity, Safety (Jürgen Jasperneite) 16-1
17 Data Transmission in Industrial Environments Using IEEE 1394 FireWire (Michael Scholles, Uwe Schelinski, Petra Nauber, and Klaus Frommhagen) 17-1
18 Configuration and Management of Fieldbus Systems (Stefan Pitzek and Wilfried Elmenreich) 18-1
19 Which Network for Which Application (Jean-Dominique Decotignie) 19-1
7 Fieldbus Systems: History and Evolution
Thilo Sauter, Austrian Academy of Sciences
7.1 What Is a Fieldbus? 7-1
7.2 Notions of a Fieldbus 7-2 (The Origin of the Word • Fieldbuses as Part of a Networking Concept)
7.3 History 7-5 (The Roots of Industrial Networks • The Evolution of Fieldbuses)
7.4 Fieldbus Standardization 7-8 (The German–French Fieldbus War • The International Fieldbus War • The Compromise)
7.5 Fieldbus Characteristics 7-15 (Communication Concepts • Communication Paradigms • Above the OSI Layers: Interoperability and Profiles • Management)
7.6 New Challenges: Industrial Ethernet 7-20 (Ethernet in IEC 61158 • Real-Time Industrial Ethernet)
7.7 Aspects for Future Evolution 7-25 (Driving Forces • System Complexity • Software Tools and Management • Network Interconnection and Security)
7.8 Conclusion and Outlook 7-30
Acknowledgments 7-31
References 7-31
Appendix 7-37
7.1 What Is a Fieldbus? Throughout the history of automation, many inventions and developments have influenced the face of manufacturing and information processing. But few novelties have had such a radical effect as the introduction of fieldbus systems, and no single achievement was so heavily disputed as these industrial networks. And yet, they have made automation what it is today. But even after some 20 years of fieldbus development, there exists no clear-cut definition for the term. The “definition” given in the IEC 61158 fieldbus standard is more a programmatic declaration, or a least common multiple compromise, than a concise formulation [1]: “A fieldbus is a digital, serial, multidrop, data bus for communication with industrial control and instrumentation devices such as — but not limited to — transducers, actuators and local controllers.” It comprises some important characteristics, but is far from being complete. On the other hand, it is a bit too restrictive. A more elaborate explanation is given by the Fieldbus Foundation, the user organization supporting one of the major fieldbus systems [2]: “A Fieldbus is a digital, two-way, multi-drop communication link among intelligent measurement and control devices. It serves as a Local Area Network (LAN) for advanced process control, remote input/output and high speed factory automation applications.” Again, this is a
bit restrictive, for it limits the application to process and factory automation, the primary areas where the Foundation Fieldbus is used. The lack of a clear definition is mostly due to the complex evolutionary history of fieldbuses. A look at today's situation reveals that fieldbus systems are employed in all automation domains, ranging from the aforementioned process and factory areas to building and home automation, machine building, automotive and railway applications, and avionics. In all those fields, bus systems emerged primarily to break up the conventional star-type point-to-point wiring schemes connecting simple digital and analog input and output devices to central controllers, thereby laying the groundwork for the implementation of truly distributed systems with more intelligent devices. As was declared in the original mission statement of the International Electrotechnical Commission (IEC) work, "the Field Bus will be a serial digital communication standard which can replace present signalling techniques such as 4-20 mA … so that more information can flow in both directions between intelligent field devices and the higher level control systems over a shared communication medium …" [3, 4]. But even though the replacement of especially the traditional 4–20 mA current loop by a digital interface is still handed down as the sole impetus of fieldbus development, even in contemporary publications [5], there is much more to the idea of the fieldbus:
• Flexibility and modularity: A fieldbus installation, like any other network, can be extended much more easily than a centralized system, provided the limitations of addressing space, cable length, etc., are not exceeded.
• Configurability: A network — other than an analog interface — permits the parameterization and configuration of complex field devices, which facilitates system setup and commissioning and is the primary requirement for the usability of intelligent devices.
• Maintainability: Monitoring of devices, applying updates, and other maintenance tasks are easier, if at all possible, via a network.
• Distribution: A network is the prerequisite of distributed systems; many data processing tasks can be removed from a central controller and placed directly in the field devices if the interface can handle reasonably complex ways of communication.
These aspects are not just theoretical contemplations but actual user demands that influenced the development from the beginning [4]. However, as the application requirements in the various automation domains were quite different, so were the solutions, and that makes it difficult to find a comprehensive definition. The purpose of this contribution is not to find the one and only precise definition of what constitutes a fieldbus. The vast literature on this topic shows that this is a futile attempt. Furthermore, such a definition would be mostly of academic nature and is not really necessary either. Instead, the following sections will treat the fieldbus as a given phenomenon in automation and look at it from different sides. Typical characteristics will be discussed, as well as the role of fieldbus systems in a globally networked automation world. The major part of this chapter is devoted to the historical evolution and the standardization processes, which will shed light on the current situation. Finally, future aspects and evolutionary potential are briefly discussed.
7.2 Notions of a Fieldbus Fieldbus systems have to be seen as an integrative part of a comprehensive automation concept and not as stand-alone solutions. The name is therefore programmatic and evocative. It seems to give an indication of the intentions the developers had in mind and thus deserves special attention.
7.2.1 The Origin of the Word Interestingly enough, not even the etymology of the term itself is fully clear. The English word fieldbus is definitely not the original one. It appeared around 1985 when the fieldbus standardization project
within IEC TC65 was launched [4] and seems to be a straightforward literal translation of the German term Feldbus, which can be traced back until about 1980 [6]. Indeed, the overwhelming majority of early publications in the area are available only in German. The word itself was coined in process industry and primarily refers to the process field, designating the area in a plant where lots of distributed field devices, mostly sensors and actuators, are in direct contact with the process to be controlled. Slightly after the German expression and sharing its etymological root, the French word réseau de terrain (or réseau d’instrumentation, instrumentation network) emerged. This term was not specifically targeted at the process industry, but refers also to large areas with scattered devices. The connection of such devices to the central control room was traditionally made via point-to-point links, which resulted in a significant and expensive cabling need. The logical idea, powered by the advances of microelectronics in the late 1970s, was to replace this star-like cabling in the field by a party-line, bus-like installation connecting all devices via a shared medium — the fieldbus [7, 8]. Given the large dimensions of process automation plants, the benefits of a bus are particularly evident. However, the concept was not undisputed when it was introduced. The fieldbus approach was an ambitious concept: a step toward decentralization, including the preprocessing of data in the field devices, which both increases the quality of process control and reduces the computing burden for the centralized controllers [9]. Along with it came the possibility to configure and parameterize the field devices remotely via the bus. This advanced concept, on the other hand, demanded increased communication between the devices that goes far beyond a simple data exchange. This seemed infeasible to many developers, and still in the mid-1980s, one could read statements like the following [10]: “The idea of the fieldbus concept seems promising. However, with reasonable effort it is not realizable at present.” The alternative and somewhat more conservative approach was the development of so-called field multiplexers, devices that collect process signals in the field, serialize them, and transfer them via one single cable to a remote location where a corresponding device de-multiplexes them again [11]. For quite some time, the two concepts competed and coexisted [12], but ultimately the field multiplexers mostly disappeared, except for niches in process automation, where many users still prefer such remote input/ output (I/O) systems despite the advantages of fieldbus solutions [13]. The central field multiplexer concept of sampling I/O points and transferring their values in simple data frames also survived in some fieldbus protocols, especially designed for low-level applications. The desire to cope with the wiring problem getting out of hand in large installations was certainly the main impetus for the development of fieldbus systems. Other obvious and appealing advantages of the concept are modularity, the possibility to easily extend installations, and the possibility to have much more intelligent field devices that can communicate not just for the sake of process data transfer, but also for maintenance and configuration purposes [14, 15]. A somewhat different viewpoint that led to different design approaches was to regard bus systems in process control as the spine of distributed real-time systems [16]. 
While the wiring optimization concepts were in many cases rather simple bottom-up approaches, these distributed real-time ideas resulted in sophisticated and usually well investigated top-down designs.
7.2.2 Fieldbuses as Part of a Networking Concept An important role in the fieldbus evolution has been played by the so-called automation pyramid. This hierarchical model was defined to structure the information flow required for factory and process automation. The idea was to create a transparent, multilevel network — the basis for computer-integrated manufacturing (CIM). The numbers vary, but typically this model is composed of up to five levels [7, 8]. While the networks for the upper levels already existed by the time the pyramid was defined, the field level was still governed by point-to-point connections. Fieldbus systems were therefore developed also with the aim of finally bridging this gap. The actual integration of field-level networks into the rest of the hierarchy was in fact considered in early standardization [4]; for most of the proprietary developments, however, it was never the primary intention. In the automation pyramid, fieldbuses actually populate two levels: the field level and the cell/process level. For this reason, they are sometimes further differentiated into two classes:
• Sensor–actuator buses or device buses have very limited capabilities and serve to connect very simple devices with, e.g., programmable logic controllers (PLCs). They can be found exclusively on the field level.
• Fieldbuses connect control equipment like PLCs and PCs as well as more intelligent devices. They are found on the cell level and are closer to computer networks.
Depending on the point of view, there may even be a third sublevel [17]. This distinction may seem reasonable but is in fact problematic. There are only a few fieldbus systems that can immediately be allocated to one of the groups; most of them are used on both levels. Therefore, it seems preferable to abandon this arbitrary differentiation.
How do fieldbus systems compare to computer networks? The classical distinction of the different network types used in the automation pyramid hinges on the distances the networks span. From the top down, the hierarchy starts with global area networks (GANs), which cover long, preferably intercontinental distances and nowadays mostly use satellite links. On the second level are wide area networks (WANs). They are commonly associated with telephone networks (no matter if analog or digital). Next come the well-known local area networks (LANs), with Ethernet as the most widely used specimen today. They are the classical networks for office automation and cover only short distances. The highest level of the model shown in Figure 7.1 is beyond the scope of the original definition, but is gaining importance with the availability of the Internet. In fact, Internet technology is penetrating all levels of this pyramid, all the way down to the process level.
From GANs to LANs, the classification according to the spatial extension is evident. One step below, on the field level, this criterion fails, because fieldbus systems or field area networks (FANs) can cover even larger distances than LANs. Yet, as LANs and FANs evolved nearly in parallel, some clear distinction between the two network types seemed necessary. As length is inappropriate, the classical border line drawn between LANs and FANs relies mostly on the characteristics of the data transported over these networks. Local area networks have high data rates and carry large amounts of data in large packets. Timeliness is not a primary concern, and real-time behavior is not required. Fieldbus systems, by contrast, have low data rates. Since they transport mainly process data, the size of the data packets is small, and real-time capabilities are important. For some time, these distinction criteria between LANs and FANs were sufficient and fairly described the actual situation. Recently, however, drawing the line according to data rates and packet sizes is no longer applicable. In fact, the boundaries between LANs and fieldbus systems have faded. Today, there are fieldbus systems with data rates well above 10 Mbit/s, which is still standard in older LAN installations. In addition, more and more applications require the transmission of video or voice data, which results in large data packets.
FIGURE 7.1 Hierarchical network levels in automation and protocols originally devised for them. (The figure contrasts the network types (global, wide, and local area networks, field area networks, and sensor–actuator networks) with the protocols originally devised for them (TOP, MAP, Mini-MAP, fieldbus) across the company, factory, shop floor, cell/process, field, and sensor levels.)
On the other hand, Ethernet as the LAN technology is becoming more and more popular in automation and is bound to replace some of today’s widely used midlevel fieldbus systems. The real-time extensions under development tackle its most important drawback and will ultimately permit the use of Ethernet in low-level control applications. At least for the next 5 years, however, it seems that Industrial Ethernet will not make the lower-level fieldbuses fully obsolete. They are much better optimized for their specific automation tasks than the general-purpose network Ethernet. But the growing use of Ethernet results in a reduction of the levels in the automation hierarchy. Hence the pyramid gradually turns into a flat structure with at most three, maybe even only two, levels. Consequently, a more appropriate distinction between LANs and FANs should be based on the functionality and the application area of these networks. According to this argumentation, a fieldbus is simply a network used in automation, irrespective of topology, data rates, protocols, or real-time requirements. Consequently, it need not be confined to the classical field level; it can be found on higher levels (provided they still exist) as well. A LAN, on the other hand, belongs to the office area. This definition is loose, but mirrors the actual situation. Only one thing seems strange at first: following this definition, the Industrial Ethernet changes into a fieldbus, even though many people are inclined to associate it with LANs. However, this is just another evidence that the boundaries between LANs and FANs are fading.
7.3 History The question of what constitutes a fieldbus is closely linked to the evolution of these industrial networks. The best approach to understanding the essence of the concepts is to review the history and intentions of the developers. This review will also falsify one of the common errors frequently purported by marketing divisions of automation vendors: that fieldbus systems were a revolutionary invention. They may have revolutionized automation — there is hardly any doubt about it. However, they were only a straightforward evolution that built on preexisting ideas and concepts.
7.3.1 The Roots of Industrial Networks Although the term fieldbus appeared only about 20 years ago, the basic idea of field-level networks is much older. Still, the roots of modern fieldbus technology are mixed. Both classical electrical engineering and computer science have contributed their share to the evolution, and we can identify three major sources of influence: • Communication engineering with large-scale telephone networks • Instrumentation and measurement systems with parallel buses and real-time requirements • Computer science with the introduction of high-level protocol design This early stage is depicted in Figure 7.2. One foundation of automation data transfer has to be seen in the classic telex networks and also in standards for data transmission over telephone lines. Large distances called for serial data transmission, and many of these comparatively early standards still exist, like V.21 (data transmission over telephone lines) and X.21 (data transmission over special data lines). Various protocols have been defined, mostly described in state machine diagrams and rather simple because of the limited computing power of the devices available at that time. Of course, these communication systems have a point-to-point nature and therefore lack the multidrop characteristic of modern fieldbus systems, but nevertheless, they were the origin of serial data transmission. Talking about serial data communication, one should notice that the engineers who defined the first protocols often had a different understanding of the terms serial and parallel than we have today. For example, the serial interface V.24 transmits the application data serially, but the control data in a parallel way over separate control lines. In parallel to the development of data transmission in the telecommunication sector, hardware engineers defined interfaces for stand-alone computer systems to connect peripheral devices such as printers. The basic idea of having standardized interfaces for external devices was soon extended to process control and instrumentation equipment. The particular problems to be solved were the synchronization of
FIGURE 7.2 Roots of fieldbus systems. (The figure traces fieldbus systems back to three roots: data transmission in telex/teletex and telecommunications (V.21, X.21, X.25, SS7, computer WANs), parallel instrumentation and printer interfaces (CAMAC, GPIB, Centronics), and serial interfaces (RS 485).)
spatially distributed measurement devices and the collection of measurement data from multiple devices in large-scale experimental setups. This led to the development of standards like CAMAC (computerautomated measurement and control, mostly used in nuclear science) and GPIB (general purpose interface bus, later also known as IEEE 488). To account for the limited data processing speed and real-time requirements for synchronization, these bus systems had parallel data and control lines, which is also not characteristic for fieldbus systems. However, they were using the typical multidrop structure. Later on, with higher integration density of integrated circuits and thus increased functionality and processing capability of microcontrollers, devices became smaller and portable. The connectors of parallel bus systems were now too big and clumsy, and alternatives were sought [18]. The underlying idea of developments like I2C [19] was to extend the already existing serial point-to-point connections of computer peripherals (based on RS 232) to support longer distances and finally also multidrop arrangements. The capability of having a bus structure with more than just two connections together with an increased noise immunity due to differential signal coding eventually made RS 485 a cornerstone of fieldbus technology up to the present day. Historically the youngest root of fieldbus systems, but certainly the one that left the deepest mark, was the influence of computer science. Its actual contribution was a structured approach to the design of high-level communication systems, contrary to the mostly monolithic design approaches that had been sufficient until then. This change in methodology had been necessitated by the growing number of computers used worldwide and the resulting complexity of communication networks. Conventional telephone networks were no longer sufficient to satisfy the interconnection requirements of modern computer systems. As a consequence, the big communication backbones of the national telephone companies gradually changed from analog to digital systems. This opened the possibility to transfer large amounts of data from one point to another. Together with an improved physical layer, the first really powerful data transmission protocols for wide area networks were defined, such as X.25 (packet switching) or SS7 (common channel signaling). In parallel to this evolution on the telecommunications sector, local area networks were devised for the local interconnection of computers, which soon led to a multitude of solutions. It took nearly a decade until Ethernet and the Internet Protocol (IP) suite finally gained the dominating position they have today.
7.3.2 The Evolution of Fieldbuses The preceding section gave only a very superficial overview of the roots of networking, which laid the foundations not only of modern computer networks, but also of those on the field level. But let us now look more closely at the actual evolution of the fieldbus systems. Here again, we have to consider the
different influences of computer science and electrical engineering. First and foremost, the key contribution undoubtedly came from the networking of computer systems, when the International Organization for Standardization (ISO) introduced the Open Systems Interconnection (OSI) model [20, 21]. This seven-layer reference model was (and still is) the starting point for the development of many complex communication protocols. The first application of the OSI model to the domain of automation was the definition of the Manufacturing Automation Protocol (MAP) in the wake of the CIM idea [22]. MAP was intended to be a framework for the comprehensive control of industrial processes covering all automation levels, and the result of the definition was a powerful and flexible protocol [23]. Its complexity, however, made implementations extremely costly and hardly justifiable for general-purpose use. As a consequence, a tightened version called MiniMAP, using a reduced model based on OSI layers 1, 2, and 7, was proposed to better address the problems of the lower automation layers [24]. Unfortunately, it did not have the anticipated success either. What did have success was Manufacturing Message Specification (MMS). It defined the cooperation of various automation components by means of abstract objects and services and was later used as a starting point for many other fieldbus definitions [25]. The missing acceptance of MiniMAP and the inapplicability of the original MAP/MMS standard to time-critical systems [26] were finally the reason for the IEC to launch the development of a fieldbus based on the MiniMAP model, but tailored to the needs of the field level. According to the original objectives, the higher levels of the automation hierarchy should be covered by MAP or PROWAY (process data highway) [22]. Independent of this development in computer science, the progress in microelectronics brought forward many different integrated controllers (ICs), and new interfaces were needed to interconnect the ICs in an efficient and cheap way. The driving force was the reduction of both the interconnect wires on the printed circuit boards and the number of package pins on the ICs. Consequently, electrical engineers — without knowledge of the ISO/OSI model or similar architectures — defined simple buses like the I2C. Being interfaces rather than fully fledged bus systems, they have very simple protocols, but they were and still are widely used in various electronic devices. Long before the invention of board-level buses, the demand for a reduction of cabling weight in avionics and space technology had led to the development of the MIL-STD-1553 bus, which can be regarded as the first real fieldbus. Introduced in 1970, it showed many characteristic properties of modern fieldbus systems: serial transmission of control and data information over the same line, master–slave structure, the possibility to cover longer distances, and integrated controllers. It is still used today. Later on, similar thoughts (reduction of cabling weight and costs) resulted in the development of several bus systems in the automotive industry, but also in the automation area. A characteristic property of these fieldbuses is that they were defined in the spirit of classical interfaces, with a focus on the lower two protocol layers, and no or nearly no application layer definitions. With time, these definitions were added to make the system applicable to other areas as well. 
Controller Area Network (CAN) is a good example of this evolution: for the originally targeted automotive market, the definition of the lowest two OSI layers was sufficient. Even today, automotive applications of CAN typically use only these low-level communication features because they are easy to use and the in-vehicle networks are usually closed. For applications in industrial automation, however, where extensibility and interoperability are important issues, higher-level functions are important. So, when CAN was found to be interesting also for other application domains, a special application layer was added. The lack of such a layer in the original definition is the reason why there are many different fieldbus systems (like CANopen, Smart Distributed System (SDS), DeviceNet) using CAN as a low-level interface. From today’s point of view, it can be stated that all fieldbuses that still have some relevance were developed using the top-down or computer science-driven approach, i.e., a proper protocol design with abstract high-level programming interfaces to facilitate usage and integration in complex systems. The fieldbuses that followed the bottom-up or electrical engineering-driven approach, i.e., that were understood as low-level computer interfaces, did not survive due to their inflexibility and incompatibility with modern software engineering, unless some application layer functions were included in the course of the evolution.
FIGURE 7.3 Milestones of fieldbus evolution and related fields.
From the early 1980s on, when automation made a great leap forward with PLCs and more intelligent sensors and actuators, something like a gold rush set in. The increasing number of devices used in many application areas called for reduced cabling, and microelectronics had grown mature enough to support the development of elaborated communication protocols. This was also the birth date for the fieldbus as an individual term. Different application requirements generated different solutions, and from today’s point of view, it seems that creating new fieldbus systems was a trendy and fashionable occupation for many companies in the automation business. Those mostly proprietary concepts never had a real future, because the number of produced nodes could never justify the development and maintenance costs. Figure 7.3 depicts the evolution timeline of fieldbus systems and their environments. The list of examples is of course not comprehensive; only systems that still have some significance have been selected. Details about the individual solutions are summarized in the tables in the appendix. As the development of fieldbus systems was a typical technology push activity driven by the device vendors, the users first had to be convinced of the new concepts. Even though the benefits were quite obvious, the overwhelming number of different systems appalled rather than attracted the customers, who were used to perfectly compatible current-loop or simple digital inputs and outputs as interfaces between field devices and controllers and were reluctant to use new concepts that would bind them to one vendor. What followed was a fierce selection process where not always the fittest survived, but often those with the highest marketing power behind them. Consequently, most of the newly developed systems vanished or remained restricted to small niches. After a few years of struggle and confusion on the user’s side, it became apparent that proprietary fieldbus systems would always have only limited success and that more benefit lies in creating open specifications, so that different vendors may produce compatible devices, which gives the customers back their freedom of choice [8, 27]. As a consequence, user organizations were founded to carry on the definition and promotion of the fieldbus systems independent of individual companies [28]. It was this idea of open systems that finally paved the way for the breakthrough of the fieldbus concept.
7.4 Fieldbus Standardization

It is only a small step from creating an open specification to standardizing a fieldbus system. The basic idea is that a standard establishes a specification in a very rigid and formal way, ruling out the possibility of quick changes. This attaches a notion of reliability and stability to the specification,
which in turn secures the trust of the customers and, consequently, also the market position. Furthermore, a standard is vendor independent, which guarantees openness. Finally, in many countries standards have a legally binding position, which means that when a standard can be applied (e.g., in connection with a public tender), it has to be applied. Hence, a standardized system gains a competitive edge over its nonstandardized rivals. This position is typical for, e.g., Europe (see [29] for an interesting U.S.-centric comment). It is therefore no wonder that after the race for fieldbus developments, a race for standardization was launched. This was quite easy on a national level, and most of today's relevant fieldbus systems soon became national standards. Troubles started when international solutions were sought.

One problem of fieldbus standardization is that the activities are scattered among a multitude of committees and working groups according to the application fields. This reflects the historical evolution and underpins the previous statement that the fieldbus is not a unique and revolutionary technology, but emerged independently in many different areas. Interestingly enough, the standardization activities are not even confined to the electrotechnical standardization bodies. Inside the IEC, the committees concerned are:

• IEC TC65/SC65C: Industrial-Process Measurement and Control/Digital Communications
• IEC TC17/SC17B: Switchgear and Controlgear/Low-Voltage Switchgear and Controlgear

In the ISO, work is being done in:

• ISO TC22/SC3: Road Vehicles/Electrical and Electronic Equipment
• ISO TC184/SC5: Industrial Automation Systems and Integration/Architecture, Communications and Integration Frameworks
• ISO TC205/WG3: Building Environment Design/Building Control Systems Design

The second group of players in the international standardization arena comprises the European standardization bodies CENELEC and CEN.* They are not mirrors of the IEC and ISO; the committees work independently, even though much work is being done in parallel. In recent years, cooperation agreements were established with the aim of facilitating the harmonization of international standardization. The cooperation of ISO and CEN is governed by the Vienna Agreement [30] (1990), and that of IEC and CENELEC by the Dresden Agreement [31] (1996). Roughly, these documents define procedures to carry out parallel votes and to simultaneously adopt standards on both the international and European levels. In practice, this comes down to international standards always superseding European ones, even though there is the theoretical possibility of European work being adopted on an international level. Hence, European committees are today much more closely connected to their worldwide counterparts than they were at the beginning of the fieldbus era. Within CENELEC, the relevant committees are:

• CLC TC65CX: Fieldbus
• CLC TC17B: Low-voltage Switchgear and Controlgear Including Dimensional Standardization
• CLC TC205: Home and Building Electronic Systems (HBES)

In CEN, fieldbuses are defined in:

• CEN TC247: Building Automation, Controls and Building Management

The committee with the longest track record in fieldbus standardization is IEC SC65C, which in May 1985 started the ambitious endeavor of defining an international and uniform fieldbus standard for process and industrial automation. This initiative came relatively early, soon after the trend toward field-level networking and the inability of MAP to fully cover it became apparent.
Against the background of several industry-driven solutions emerging all around, however, this project caused heavy turbulence and opened a battlefield for politics that gradually left the ground of technical discussion. Table 7.1 shows the overall timeline of these fieldbus wars, which form an essential and obscure chapter in fieldbus history and thus deserve special attention.

*CENELEC, Comité Européen de Normalisation Electrotechnique (European Committee for Electrotechnical Standardization); CEN, Comité Européen de Normalisation (European Committee for Standardization).
TABLE 7.1 Fieldbus Standardization Timeline from the Viewpoint of IEC 61158

Period | Status of Standards | Major Activities
1985–1990 | The claims are staked | Start of the IEC fieldbus project; selection of various national standards — German Profibus and French FIP are the main candidates; first attempts to combine the two approaches
1990–1994 | German–French fieldbus war | Attempt of a general specification based on WorldFIP and the Interoperable System Project (ISP)
1995–1998 | Standardization locked in stalemate | Development of the American Foundation Fieldbus (FF) in response to the European approach and formation of the CENELEC standards comprising several fieldbus systems in one standard; deadlock of the international standard through obstructive minorities
1999–2000 | The compromise | The eight-type specification becomes a standard
2000–2002 | Amendments to reach maturity for the market | The standard is enhanced by more types and the necessary profiles are specified in IEC 61784
7.4.1 The German–French Fieldbus War

The actual starting point of international fieldbus standardization in IEC SC65C was a new work item proposed by the German national mirror committee [32]. The task was allocated to the already existing working group 6 dealing with the definition of PROWAY, another fieldbus predecessor. At that time, the development of fieldbus systems was mainly a European endeavor, thrust forward by research projects that still had a strong academic background as well as many proprietary developments. The European activities — at least those on the nonproprietary level — also have to be seen as a response to MAP, where the U.S. had a dominating position. Hence, the two big European fieldbus projects at that time, Factory Instrumentation Protocol (FIP) and Profibus, were intended to be counterweights in the international automation world.

The IEC work started with a definition of requirements a fieldbus must meet [4]. In parallel, the ISA SP50 committee started its own fieldbus project on the U.S. level and defined a slightly different set of requirements [24, 33]. Work was coordinated between the two committees, with ISA taking the more active part. It launched a call for proposals to evaluate existing solutions. In response to this call, the following systems were identified as possible candidates [34]:

• FIP (Flux Information Processus, Factory Instrumentation Protocol), a French development started around 1982
• Profibus (derived from process field bus), a German project started around 1984
• A proposal from Rosemount based on the ISO 8802.4 token-passing bus
• A proposal from Foxboro based on the high-level data link control (HDLC) protocol
• The IEEE 1118 project, in fact an extension of Bitbus
• An extension of MIL-STD-1553B defined by a U.K. consortium

All these proposals were evaluated, and finally the two most promising projects retained for further consideration were the French FIP and the German Profibus. Unfortunately, the approaches of the two systems were completely different. Profibus was based on a distributed control idea and in its original form supported an object-oriented vertical communication according to the client–server model in the spirit of the MAP/MMS specification, with the lower two layers taken from the existing PROWAY project. FIP, on the other hand, was designed with a central, but strictly real-time-capable control scheme and with the newly developed producer–consumer (producer–distributor–consumer) model for horizontal communication. In fact, the idea behind FIP was to have a distributed operating system; a communication protocol was just one building block. Different as they were, the two systems were well suited for complementary application areas [35]. Evidently, a universal fieldbus had to combine the benefits of both, and so the following years saw strong efforts to find a viable compromise and a convergence between the two approaches. The most problematic part was the data link layer, where Profibus supported a token-passing scheme, while FIP relied on a
central scheduling approach. The suggestion to standardize both in parallel was not supported, and so two different proposals were put to a vote: a token-passing approach and a new proposal defined by an expert group with the aim of reconciling the two worlds [32]. The latter was more FIP oriented and finally prevailed [36], but it was very complex and left many Profibus supporters skeptical about its practical usability. In the meantime, the leading role in the standardization efforts on the IEC level had been taken not by the Europeans, but by the work of the SP50 committee of the Instrumentation, Systems and Automation Society (ISA, at that time still standing for Instrument Society of America). Owing to its mandatory composition involving manufacturers and users, it had taken a more pragmatic view and had been much more efficient during the late 1980s [37]. Actually, the committee had defined and issued (as a U.S. standard in 1993) a solution of its own. The results of this work exerted an important influence on the layer structure of the standard as we have it today [8, 38]. Finally, ISA and IEC decided to hold joint meetings [35], and from that point onward the actual technical work was done within ISA SP50, while IEC restricted its activities to organizing the voting process.

By the mid-1990s, the IEC committee was still struggling to overcome the differences between Profibus and FIP in what was sarcastically called the two-headed monster. With respect to its goal of defining a uniform fieldbus solution, it had not produced any substantial outcome for more than 8 years. The only exception was the definition of the physical layer, which was adopted as IEC 61158-2 in 1993. This part is the one that has since been used very successfully, mainly in the process automation area. On top of the physical layer, however, the standardization drafts became more and more comprehensive and overloaded with all kinds of communication and control principles imported from the different systems. In the data link layer specification, for example, three different types of tokens were introduced: the scheduler token, which determines which station controls the timing on the bus; the delegated token, with which another station can temporarily gain control over the bus; and the circulated token, which is passed from station to station for bus access. The problem with these all-inclusive approaches was that a full implementation of the standard was too expensive, whereas a partial implementation would have resulted in incompatible, noninteroperable devices (a problem that was encountered also in the early implementations of, e.g., Profibus-FMS (fieldbus message specification), where significant parts of the standard are optional and not mandatory).

Outside the international standardization framework, but alerted by the inability of the committees to reach a resolution, the big vendors of automation systems launched two additional initiatives to find a compromise. The international WorldFIP project, founded in 1993, had the goal of adding the functionality of the client–server model to FIP [39]. On the other side, the Interoperable System Project (ISP) attempted to demonstrate from 1992 onward how Profibus could be enhanced with the publisher–subscriber communication model, which is about the same as the producer–consumer model of FIP. Strangely enough, the ISP was abandoned in 1994, before reaching a mature state, for strategic reasons [40].
7.4.2 The International Fieldbus War In 1994, after long years of struggles between German and French experts to combine the FIP and Profibus approaches, several, mainly American, companies decided to no longer watch the endless discussions. With the end of the ISP project, several former project members joined forces with the WorldFIP North America organization and formed the Fieldbus Foundation. This new association began the definition of a new fieldbus optimized for the process industry: the Foundation Fieldbus (FF). The work was done outside the IEC committees within the ISA, and for some time, the IEC work seemed to doze off. Meanwhile in Europe, disillusion had run rampant [3]. Following the failure to find an acceptable IEC draft for a universal fieldbus, several players deemed it necessary to make a new start at least on a European level. Therefore, the CENELEC committee TC65CX was established in 1993 with the aim of finding an intermediate solution until an agreement was reached within IEC. By that time, the standardization issue had ceased to be a merely technical question. Fieldbus systems had already made their way into the
TABLE 7.2 Contents of the CENELEC Fieldbus Standards and Their Relation to IEC IS 61158

CENELEC Standards Part | Contained in IEC Standard | Brand Name
EN 50170-1 (July 1996) | IS 61158 type 4 | P-Net
EN 50170-2 (July 1996) | IS 61158 type 1/3/10 | Profibus
EN 50170-3 (July 1996) | IS 61158 type 1/7 | WorldFIP
EN 50170-A1 (Apr. 2000) | IS 61158 type 1/9 | Foundation Fieldbus
EN 50170-A2 (Apr. 2000) | IS 61158 type 1/3 | Profibus-PA
EN 50170-A3 (Aug. 2000) | IS 61158 type 2 | ControlNet
EN 50254-2 (Oct. 1998) | IS 61158 type 8 | Interbus
EN 50254-3 (Oct. 1998) | (IS 61158 type 3) | Profibus-DP (Monomaster)
EN 50254-4 (Oct. 1998) | (IS 61158 type 7) | WorldFIP (FIPIO)
EN 50325-2 (Jan. 2000) | IS 62026-3 (2000) | DeviceNet
EN 50325-3 (Apr. 2000) | IS 62026-5 (2000) | SDS
EN 50325-4 (July 2002) | — | CANopen
EN 50295-2 (Dec. 1998) | IS 62026-2 (2000) | AS-Interface
Note: The dates given in parentheses are the dates of ratification by the CENELEC Technical Board. The parenthetical IEC types denote that the respective fieldbus is contained in a superset definition.
market, much effort and enormous amounts of money had been invested in the development of protocols and devices, and there were already many installations. Nobody could afford to abandon a successful fieldbus; hence it was — from an economical point of view — impossible to start from scratch and create a unified but new standard that was incompatible with the established and widely used national ones. The emerging market pressure was also a reason that within CENELEC no uniform fieldbus solution could be agreed upon. However, the national committees found after lengthy and controversial discussions [3] a remarkable and unprecedented compromise: all national standards under consideration were simply compiled “as is” to European standards [41]. Every part of such a multipart standard is a copy of the respective national standard, which means that every part is a fully functioning system. Although this approach is very pragmatic and seems easy to carry out once adopted, it took a long time to reach it. After all, with the strict European regulations about the mandatory application of standards, being part of it would ensure competitiveness for the respective system suppliers. As there were mostly representatives of the big players present in the committees, they naturally tried to optimize their own positions. Consequently, the contents of the individual CENELEC standards that were adopted step by step still reflect the strategic alliances that had to be formed by the national committees to get “their” standard into the European ones. To make the CENELEC collection easier to handle, the various fieldbus systems were bundled according to their primary application areas. EN 50170 contains general-purpose field communication systems, EN 50254 has high-efficiency communication subsystems for small data packages, and EN 50325 is composed of different solutions based on the CAN technology. In the later phases of the European standardization process, the British national committee played the part of an advocate of the American companies and submitted also FF, DeviceNet, and ControlNet for inclusion in the European standards. Table 7.2 shows a compilation of all these standards, as well as their relation to the new IEC standard. For the sake of completeness, it should be noted that a comparable, though much less disputed, standardization process also took place for bus systems used in machine construction (dealt with by ISO), as well as building automation (in CEN and more recently in ISO). While the Europeans were busy standardizing their national fieldbus systems and simply disregarded what happened in IEC, the Fieldbus Foundation prepared its own specification. This definition was modeled after the bus access scheme of FIP and the application layer protocol of the ISP work (which was in turn based on Profibus-FMS). The FF specification naturally influenced the work in the IEC committee, and consequently, the new draft evolved into a mixture of FF and WorldFIP. By several members of IEC TC65, this was seen as a reasonable compromise able to put an end to the lengthy debate.
However, when the draft was put to a vote in 1996, it was rejected by a very narrow margin, and the actual fieldbus war started. What had happened? The casus belli was that Profibus (specifically the variant PA, which was named after the target application area, process automation, and which had been developed by the Profibus User Organization based on the ideas of the abandoned ISP project) was no longer properly represented in the IEC draft. When the majority of ISP members had teamed up with WorldFIP North America to form the Fieldbus Foundation, the main Profibus supporters had been left out in the cold. The fact that Profibus was already part of a CENELEC standard was no consolation. Given the strict European standardization rules and the Dresden Agreement, according to which international (i.e., IEC) standards supersede opposing CENELEC standards, the Profibus proponents feared that FF might gain a competitive advantage and “their” fieldbus might lose ground. Consequently, the countries where Profibus had a dominant position managed to organize an obstructive minority that prohibited the adoption of the standard. The fact that the IEC voting rules make it easier to cast positive votes (negative votes have to be justified technically) was no particular hindrance, as there were still many inconsistencies and flaws in the draft that could serve as a fig leaf.

The FF empire (as it was seen by the Profibus supporters) could not take this and struck back to save “their” standard. They launched an appeal to cancel negative votes that had, in their opinion, insufficient technical justification. The minority of votes against the draft was very small, so the cancellation of a few votes would have been enough to turn the voting result upside down. Because this idea of using sophisticated legal arguments to achieve the desired goal was rather delicate, they proposed that the members of the IEC committee (i.e., the respective national mirror committees) should decide about the (non-)acceptance of the incriminated votes — a procedure that is not in conformance with the IEC rules and caused substantial exasperation. The discredited countries filed a complaint with the Committee of Action (CoA) of the IEC and asked it to resolve the situation. Owing to the infrequent meetings and rather formal procedures, the controversy sketched here carried on for several months.

In the meantime, a new draft had been prepared with most of the editorial errors removed. The main discussion point was again the data link layer draft. But now the question was whether the draft in its present form could really be implemented to yield a functioning fieldbus. The Profibus supporters claimed it was not possible, and they envisioned — especially in Europe — a dreary scenario of a nonfunctional IEC fieldbus standard replacing the market-proven European counterparts. The FF proponents maintained it was possible. Their argument was that the Foundation Fieldbus was implemented according to the draft and that products were already being sold. The debate swayed to and fro, and Figure 7.4 tries to depict why it was so difficult to judge what was right. Over the years of development, several different versions of the data link layer specification had been submitted to the various standardization committees or implemented as products. Hence, both sides could find ample evidence for their claims.
FIGURE 7.4 Evolution of the IEC 61158 data link layer and the Foundation Fieldbus (FF) demonstrating the various inconsistent flavors of the document.
In the course of subsequent voting processes, the battle raged and things grew worse. There were countries voting — both in favor and against — that had never cast a vote before or that, according to their status in the IEC, were not even allowed to vote. There were votes not being counted because they were received on a fax machine different from the one designated at the IEC and were thus considered late (because the error was allegedly discovered only after the submission deadline and it took several days to carry the vote to the room next door). Finally, there were rumors about presidents of national committees who high-handedly changed the conclusions of their committee experts. Throughout this entire hot phase of voting, the meetings of the national committees were bursting with representatives of leading companies trying to convince the committees of one position or the other. Never before or since was the interest in fieldbus standardization so high, and never were the lobbying efforts so immense — including mobilization of the media, who had difficulties getting an objective overview of the situation [42]. The spiral kept turning faster and faster, but by and large, the obstruction of the standard draft remained unchanged, and the standardization process had degenerated into a playground for company tactics, into an economic and political battle that was apt to severely damage the reputation of standardization as a whole.
7.4.3 The Compromise

On June 15, 1999, the Committee of Action of the IEC decided to take a completely new approach to break the stalemate. One month later, on July 16, the representatives of the main contenders in the debate (Fieldbus Foundation, Fisher Rosemount, ControlNet International, Rockwell Automation, Profibus User Organization, and Siemens) signed a “Memorandum of Understanding,” which was intended to put an end to the fieldbus war. The Solomonic resolution was to create a large and comprehensive IEC 61158 standard accommodating all fieldbus systems — a move that displeased many of those who had been part of the IEC fieldbus project from the beginning [36, 43]. However, unlike CENELEC, where complete specifications had been copied into the standard, the IEC decided to retain the original layer structure of the draft with physical, data-link, and application layers, each separated into services and protocols parts (Table 7.3). The individual fieldbus system specifications had to be adapted to so-called types to fit into this modular structure. In a great effort and under substantial time pressure, the draft was compiled and submitted for vote. The demand of the CoA was clear-cut: either this new draft would finally be accepted, or the old draft would be adopted without further discussion. Hence it was no wonder that the new document passed the vote, and the international fieldbus was released as a standard on the carefully chosen date of December 31, 2000.

It was evident that the collection of fieldbus specification modules in the IEC 61158 standard was useless for any practicable implementation. What was needed was a manual for practical use showing which parts can be combined into a functioning system and how this can be accomplished. This guideline was later compiled as IEC 61784-1, a definition of so-called communication profiles [44]. At the same time, the specifications of IEC 61158 were corrected and amended.
TABLE 7.3 Structure of the IEC 61158 Fieldbus for Industrial Control Systems

Standards Part | Contents | Contents and Meaning
IEC 61158-1 | Introduction | Only technical report
IEC 61158-2 | PhL: Physical Layer | 8 types of data transmission
IEC 61158-3 | DLL: Data Link Layer Services | 8 types
IEC 61158-4 | DLL: Data Link Layer Protocols | 8 types
IEC 61158-5 | AL: Application Layer Services | 10 types
IEC 61158-6 | AL: Application Layer Protocols | 10 types
IEC 61158-7 | Network Management | Must be completely revised
IEC 61158-8 | Conformance Testing | Work has been canceled
TABLE 7.4 Profiles and Protocols according to IEC 61784-1 and IEC 61158

IEC 61784 Profile | Phy (IEC 61158) | DLL (IEC 61158) | AL (IEC 61158) | CENELEC Standard | Brand Names
CPF-1/1 | Type 1 | Type 1 | Type 9 | EN 50170-A1 (Apr. 2000) | Foundation Fieldbus (H1)
CPF-1/2 | Ethernet | TCP/UDP/IP | Type 5 | — | Foundation Fieldbus (HSE)
CPF-1/3 | Type 1 | Type 1 | Type 9 | EN 50170-A1 (Apr. 2000) | Foundation Fieldbus (H2)
CPF-2/1 | Type 2 | Type 2 | Type 2 | EN 50170-A3 (Aug. 2000) | ControlNet
CPF-2/2 | Ethernet | TCP/UDP/IP | Type 2 | — | EtherNet/IP
CPF-3/1 | Type 3 | Type 3 | Type 3 | EN 50254-3 (Oct. 1998) | Profibus-DP
CPF-3/2 | Type 1 | Type 3 | Type 3 | EN 50170-A2 (Oct. 1998) | Profibus-PA
CPF-3/3 | Ethernet | TCP/UDP/IP | Type 10 | — | PROFInet
CPF-4/1 | Type 4 | Type 4 | Type 4 | EN 50170-1 (July 1996) | P-Net RS-485
CPF-4/2 | Type 4 | Type 4 | Type 4 | EN 50170-1 (July 1996) | P-Net RS-232
CPF-5/1 | Type 1 | Type 7 | Type 7 | EN 50170-3 (July 1996) | WorldFIP (MPS, MCS)
CPF-5/2 | Type 1 | Type 7 | Type 7 | EN 50170-3 (July 1996) | WorldFIP (MPS, MCS, SubMMS)
CPF-5/3 | Type 1 | Type 7 | Type 7 | EN 50170-3 (July 1996) | WorldFIP (MPS)
CPF-6/1 | Type 8 | Type 8 | Type 8 | EN 50254-2 (Oct. 1998) | Interbus
CPF-6/2 | Type 8 | Type 8 | Type 8 | EN 50254-2 (Oct. 1998) | Interbus TCP/IP
CPF-6/3 | Type 8 | Type 8 | Type 8 | EN 50254-2 (Oct. 1998) | Interbus subset
CPF-7/1 | Type 6 | Type 6 | — | — | Swiftnet transport
CPF-7/2 | Type 6 | Type 6 | Type 6 | — | Swiftnet full stack
The collection of profiles shows that the international fieldbus today consists of seven different main systems (communication profile families) that in turn can be subdivided (see Table 7.4). All important fieldbuses from industrial and process automation are listed here, and the world's biggest automation companies are represented with their developments. Foundation Fieldbus consists of three profiles. The H1 bus is used in process automation, whereas high-speed Ethernet (HSE) is planned as an Ethernet backbone and for industrial automation. H2 is a remnant of the old draft. It allows for a migration of the WorldFIP solution toward FF, but the profile description explicitly notes that there are no products available. From the Profibus side, the two profiles DP (decentralized periphery) and PA are present (even the new PROFInet has been included). Interestingly, the experts did not consider it worthwhile to list the original version of Profibus, the FMS, which is a strong sign of the diminishing importance, if not abandonment, of this hard-to-engineer fieldbus, which is currently only contained in EN 50170-2. The Danish fieldbus P-Net was taken over, as were all definitions and variants of WorldFIP and Interbus. In the latter case, extensions for the tunneling of TCP/IP traffic have also been foreseen in the standard. A newcomer in the fieldbus arena is Swiftnet, which is widely used in airplane construction. The correct designation of an IEC fieldbus profile is shown here for the example of Profibus-DP: compliance with IEC 61784 Ed.1:2002 CPF 3/1. Table 7.5 shows some technical characteristics and the main fields of application for the different systems. Low-level fieldbus systems for simple inputs/outputs (I/Os), such as the ones based on CAN or the AS-Interface, are not part of IEC 61158; it is planned to combine them in IEC 62026.
7.5 Fieldbus Characteristics

The application areas of fieldbus systems are manifold; hence, many different solutions have been developed in the past. Nevertheless, there is one characteristic and common starting point for all these efforts. Fieldbus systems were always designed for efficiency, with two main aspects:

• Efficiency concerning data transfer, meaning that messages are rather short, in keeping with the limited amount of process data that must be transmitted at a time
• Efficiency concerning protocol design and implementation, in the sense that typical field devices do not provide ample computing resources
TABLE 7.5 Technical Characteristics and Application Domains of the Different Profiles
Profile
Name
CPF-1/1 CPF-1/2
FF (H1) FF (HSE)
CPF-1/3
Industry
Special Features
Nodes per Segment
Processing
Bus Access
Centralized Decentralized Decentralized Centralized Decentralized Centralized
Producer–consumer with distributor CSMA/CD Producer–consumer with distributor Producer–consumer
Centralized
Master–slave with token passing
Max. 99 Max. 30 Max. 126 Max. 32
Function blocks for decentralized control
FF (H2)
Process Factory Process Factory
Max. 32
CPF-2/1 CPF-2/2 CPF-3/1
ControlNet EtherNet/IP Profibus-DP
Factory Factory Factory
CPF-3/2
Profibus-PA
Process
CPF-3/3
PROFInet
Factory
Decentralized
Producer–consumer
Max. 30
CPF-4/1 CPF-4/2 CPF-5/1 CPF-5/2 CPF-5/3 CPF-6/1 CPF-6/2 CPF-6/3 CPF-7/1 CPF-7/2
P-Net RS-485 P-Net RS-232 WorldFIP
Factory Shipbuilding Factory
Optimized for factory applications Optimized for remote I/O Optimized for process control Distributed automation objects Multinet capability
Centralized
Max. 32
Distributed real-time database
Centralized Decentralized
Master–slave with token passing Producer–consumer with distributor
Interbus Interbus TCP/IP Interbus Subset Swiftnet transport Swiftnet full stack
Factory
Optimized for remote I/O
Centralized
Aircraft
Optimized for aircraft
Decentralized
Max. 30 Max. 32
Centralized
Single master with synchronized shift register Producer–consumer with distributor
Max. 256
Max. 256
Max. 1024
These two aspects, together with characteristic application requirements in the individual areas with respect to real time, topology, and economic constraints, have led to the development of concepts that are still peculiar to fieldbus systems and differ fundamentally from LANs.
7.5.1 Communication Concepts

One difference from LANs concerns the protocol stack. Like all modern communication systems, fieldbus protocols are modeled according to the ISO/OSI model. However, normally only layers 1, 2, and 7 are actually used [14]. This is in fact a tribute to the lessons learned from the MAP failure, where it was found that a full seven-layer stack requires far too many resources and does not permit an efficient implementation. For this reason, the MiniMAP approach and, based on it, the IEC fieldbus standard explicitly prescribe a three-layer structure consisting of physical, data link, and application layers. In most cases, this reduced protocol stack reflects the actual situation found in many automation applications anyway. Fieldbuses typically are single-segment networks, and extensions are realized via repeaters or, at most, bridges. Therefore, network and transport layers — which contain routing functionality and end-to-end control — are simply not necessary. If functions of these layers, as well as layers 5 and 6, are still needed, they are frequently included in layer 2 or 7. For the IEC 61158 fieldbus standard, the rule is that layer 3 and 4 functions can be placed in either layer 2 or layer 7, whereas layer 5 and 6 functionalities are always covered in layer 7 (Figure 7.5) [45]. In the building automation domain (LonWorks, EIB/KNX [European installation bus and its successor, Konnex], BacNet), the situation is different. Owing to the possibly high number of nodes, these fieldbus systems must offer the capability of hierarchically structured network topologies, and a reduction to three layers is not sensible.

FIGURE 7.5 Layer structure of a typical fieldbus protocol stack as defined by IEC 61158.

For typical process control applications, determinism of data transfer is a key issue, and cycle time is a critical parameter. This fact has been the optimization criterion for many different fieldbus protocols and the reason that they are different from conventional LANs. The physical layer in particular has to meet substantially more demanding requirements like robustness, immunity to electromagnetic disturbances,
intrinsic safety for hazardous areas, and costs. The significance of the physical layer is underpinned by the fact that this area was the first that reached (notably undisputed) consensus in standardization. On the data link layer, all medium access strategies also known from LANs are used, plus many different subtypes and refinements. Simple master–slave polling (ASi, Profibus-DP) is used as well as token-based mechanisms in either explicit (Profibus, WorldFIP) or implicit (P-Net) form. Carrier-sense multiple access (CSMA) is mostly used in a variant that tries to avoid collisions by either the dynamic adaptation of retry waiting times (LonWorks) or the use of asymmetric signaling strategies (CAN, EIB). Especially for real-time applications, time-division multiple-access (TDMA)-based strategies are employed (TTP [time-triggered protocol], but also Interbus). In many cases, the lower two layers are implemented with application-specific integrated circuits (ASICs) for performance and cost reasons. As a side benefit, the preference of dedicated controllers over software implementations also improves interoperability of devices from different manufacturers. An essential part of fieldbus protocol stacks is comprehensive application layers. They are indispensable for open systems and form the basis for interoperability. Powerful application layers offering abstract functionalities to the actual applications, however, require a substantial software implementation effort, which can negatively impact the protocol processing time and also the costs for a fieldbus interface. This is why in many cases (like Interbus or CAN) an application layer was originally omitted. While the application areas were often regarded as limited in the beginning, market pressure and the desire for flexibility finally enforced the addition of higher-layer protocols, and the growing performance of controller hardware facilitated their implementation. Network management inside fieldbus protocols is traditionally not very highly developed. This stems from the fact that a fieldbus normally is not designed for the setup of large, complex networks. There are exceptions, especially in building automation, which consequently needs to provide more elaborated functions for the setup and maintenance of the network. In most cases, however, the flexibility and functionality of network management is adapted to the functionality and application area of the individual fieldbus. There are systems with comparatively simple (ASi, Interbus, P-Net, J1939) and rather complex management functions (Profibus-FMS, WorldFIP, CANopen, LonWorks, EIB). The latter are typically more flexible in their application range but need more efforts for configuration and commissioning. In any case, network management functions are normally not explicitly present (in addition to the protocol stack, as suggested by the OSI model), but rather included in the protocol layers (mostly the application layer).
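As a rough illustration of the simplest of these medium access schemes, the following C sketch models a single master–slave polling cycle over a small process image. All names and sizes are invented for the example; it is not the ASi or Profibus-DP state machine, but it shows the characteristic pattern: a central master addresses every slave in a fixed order, so the bus cycle time is bounded and the exchange of short process data is deterministic.

    /* Minimal, hypothetical master-slave polling cycle (not a real fieldbus stack). */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_SLAVES 4
    #define IO_BYTES   2              /* short process data, typical for the field level */

    /* Stand-in for the bus transaction: send outputs to a slave, read its inputs back. */
    static void poll_slave(uint8_t address, const uint8_t *outputs, uint8_t *inputs) {
        for (int i = 0; i < IO_BYTES; i++)
            inputs[i] = (uint8_t)(outputs[i] ^ address);   /* dummy device behavior */
    }

    int main(void) {
        uint8_t out_image[NUM_SLAVES][IO_BYTES] = {
            {0x10, 0x01}, {0x20, 0x02}, {0x30, 0x03}, {0x40, 0x04}
        };
        uint8_t in_image[NUM_SLAVES][IO_BYTES] = {{0}};

        /* One bus cycle: every slave is polled exactly once in a fixed order. */
        for (uint8_t addr = 0; addr < NUM_SLAVES; addr++) {
            poll_slave(addr, out_image[addr], in_image[addr]);
            printf("slave %u polled, inputs: %02X %02X\n",
                   addr, in_image[addr][0], in_image[addr][1]);
        }
        return 0;
    }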
7.5.2 Communication Paradigms The characteristic properties of the various data types inside a fieldbus system differ strongly according to the processes that must be automated. Application areas like manufacturing, processing, and building automation pose different timing and consistency requirements that are not even invariant and consistent within the application areas [46]. Typical examples for different timing parameters are continuous measurement data that are sampled and transmitted in discrete-time fashion and form the basis for continuous process control and monitoring (like temperature, pressure, etc.). Other data are typically event based; i.e., they need transmission only in case of status changes (like switches, limit violations,
TABLE 7.6 Properties of Communication Paradigms

Property | Client–Server Model | Producer–Consumer Model | Publisher–Subscriber Model
Communication relation | Peer to peer | Broadcast | Multicast
Communication type | Connection oriented | Connectionless | Connectionless
Master–slave relation | Monomaster, multimaster | Multimaster | Multimaster
Communication service | Confirmed, unconfirmed, acknowledged | Unconfirmed, acknowledged | Unconfirmed, acknowledged
Application classes | Parameter transfer, cyclic communication | Event notification, alarms, error, synchronization | State changes, event-oriented signal sources (e.g., switches)
etc.). As far as consistency is concerned, there are on the one hand process data that are continuously updated and on the other hand parameterization data that are transferred only upon demand. In case of error, the former can easily be reconstructed from historical data via interpolation (or simply be updated by new measurements). The systemwide consistency of configuration data, on the other hand, is an important requirement that cannot be met by mechanisms suitable for process data. These fundamental differences led to the evolution of several communication paradigms that are used either individually or in combination. The applicability in different fieldbus systems is quite different because they require various communication services and media access strategies. The three basic paradigms are: • Client–server model • Producer–consumer model • Publisher–subscriber model The most relevant properties of these three are summed up in Table 7.6. The overview shows that processes with mostly event-based communication can get along very well with producer–consumer-type communication systems, especially if the requirements concerning dynamics are not too stringent. The obvious advantage is that all connected devices have direct access to the entire set of information since the broadcasting is based on identification of messages rather than nodes. Reaction times on events can be very short due to the absence of slow polling or token cycles. Generally, producer–consumer-type systems (or subsystems) are necessarily multimaster systems because every information source (producer) must have the possibility to access the bus. The selection of relevant communication relationships is solely based on message filtering at the consumer’s side. Such filter tables are typically defined during the planning phase of an installation. The publisher–subscriber paradigm uses very similar mechanisms; the only difference is that multicast communication services are employed. The subscribers are typically groups of nodes that listen to information sources (publishers). Relating publishers and subscribers can be done online. As both paradigms are message based and therefore connectionless on the application layer, they are not suited for the transmission of sensitive, nonrepetitive data such as parameter and configuration values or commands. Connectionless mechanisms can inform the respective nodes about communication errors on layer 2, but not about errors on the application layer. The client–server paradigm avoids this problem by using connection-oriented information transfer between two nodes with all necessary control and recovery mechanisms. The communication transfer itself is based on confirmed services with appropriate service primitives (request, indication, response, confirm) as defined in the OSI model. Basically, a client–server-type communication can be implemented in both mono- and multimaster systems. In the latter cases (CSMA- and token-based systems) every master can take on the role of a client, whereas in monomaster systems (polling based) this position is reserved for the bus master. Consequently, the client–server paradigm is used mainly for monomaster systems as well as generally for discrete-time (cyclic) information transfer and for reliable data transfer on the application level (e.g., for parameterization data).
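The consumer-side message filtering described above can be sketched in a few lines of C. The message structure and filter table below are hypothetical; the essential point is that broadcast traffic carries message identifiers rather than node addresses, and each consumer decides locally, from a filter table fixed during the planning phase, which messages are relevant to it.

    /* Hypothetical sketch of identifier-based filtering at the consumer side. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct broadcast_msg {
        uint16_t id;      /* identifies the produced variable, not a destination node */
        uint16_t value;
    };

    /* Filter table, typically fixed when the installation is engineered. */
    static const uint16_t subscribed_ids[] = { 0x101, 0x205 };

    static bool consumer_accepts(uint16_t id) {
        for (size_t i = 0; i < sizeof subscribed_ids / sizeof subscribed_ids[0]; i++)
            if (subscribed_ids[i] == id)
                return true;
        return false;
    }

    int main(void) {
        /* Traffic as seen on the bus: every node receives every message. */
        const struct broadcast_msg traffic[] = { {0x101, 42}, {0x300, 7}, {0x205, 1350} };

        for (size_t i = 0; i < sizeof traffic / sizeof traffic[0]; i++) {
            if (consumer_accepts(traffic[i].id))
                printf("consumed id=0x%03X value=%u\n", traffic[i].id, traffic[i].value);
            /* messages not in the filter table are simply ignored */
        }
        return 0;
    }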
It is a characteristic feature of fieldbus systems that they do not adhere to single communication paradigms, but support a mix of strategies on different levels of sophistication. Examples for typical client–server systems are Interbus, Profibus, P-Net, and ASi. Broadcast services are here only used for special cases like synchronization purposes. Likewise, there are special ways of receiving messages (e.g., direct slave-to-slave communication) that require temporary delegation of certain bus master aspects. The other two paradigms are widely used in systems like CAN, CANopen, DeviceNet, ControlNet, EIB, and LonWorks. Yet, these systems also employ the client–server paradigm for special functions such as node configuration, file transfer, or the like.
7.5.3 Above the OSI Layers: Interoperability and Profiles A key point for the acceptance of open fieldbus systems was the possibility to interconnect devices of different vendors. Multivendor systems and interoperability are still important arguments in fieldbus marketing. The standardization of fieldbuses was originally thought to be sufficient for interoperable systems, but reality quickly showed that it was not. Standards often leave room for interpretation, and implementations may vary, even if they conform to the standard. Certification of the devices is a suitable way to reduce the problems, but by no means a guarantee. Another reason for troubles is that the semantics of data objects are not precisely defined. This problem has been disregarded in many cases until recently. In fact, it is not a problem of the fieldbus itself, but of the application. Consequently, it must be tackled beyond the ISO/OSI model. The definition of appropriate profiles (or companion standards in MMS) addresses this problem. The creation of profiles originated from the recognition that the definition of the protocol layers alone is not sufficient to allow for the implementation of interoperable products, because there are simply too many degrees of freedom. Therefore, profiles limit the top-level functionality and define specialized subsets for particular application areas [47]. Likewise, they specify communication objects, data types, and their encoding. So they can be seen as an additional layer on top of the ISO/OSI model, which is why they have also been called layer 8 or user layer. One thing to be kept in mind is that nodes using them literally form islands on a fieldbus, which contradicts the philosophy of an integrated, decentralized system. Different profiles may coexist on one fieldbus, but communication between the device groups is normally very limited or impossible. From a systematic viewpoint, profiles can be distinguished into communication, device, and branch profiles. A bus-specific communication profile defines the mapping of communication objects onto the services offered by the fieldbus. A branch profile specifies common definitions within an application area concerning terms, data types, and their coding and physical meaning. Device profiles build on communication and branch profiles and describe functionality, interfaces, and in general the behavior of entire device classes such as electric drives, hydraulic valves, and simple sensors and actuators. The work of defining profiles is scattered among different groups. Communication profiles are usually in the hands of fieldbus user groups. They can provide the in-depth know-how of the manufacturers, which is indispensable for bus-specific definitions. Device and branch profiles are increasingly a topic for independent user groups. For them, the fieldbus is just a means to an end — the efficient communication between devices. What counts more in this respect is the finding and modeling of uniform device structures and parameters for a specific application. This forms the basis for a mapping to a communication system that is generic within a given application context. The ultimate goal is the definition of fieldbus-independent device profiles [47]. This is an attempt to overcome on a high level the still overwhelming variety of systems. 
Finally, such profiles are also expected to facilitate the employment of fieldbus systems by the end user, who normally is only concerned about the overall functionality of a particular plant — and not about the question of which fieldbus to use. The methods used to define data types, indices, default values, coding and meanings, identification data, and device behavior are based on functional abstractions (most promising are currently function blocks [43, 48]) and universal modeling techniques [49]. A first step in the direction of fieldbus harmonization
has been taken by the European research project NOAH (Network-Oriented Application Harmonization [48, 50]), the results of which are currently under standardization by IEC SC65C in project IEC 61804.
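What a device profile adds on top of the communication layers can be illustrated with a small C sketch of a profile-style parameter description: a fixed index, a standardized name and data type, and a default value. All indices and names below are invented for the example and do not come from any real profile; actual profiles and the function block work mentioned above are far more extensive.

    /* Hypothetical excerpt of a device profile: every vendor implementing the profile
     * must expose these parameters with the same index, type, and semantics. */
    #include <stdint.h>
    #include <stdio.h>

    enum data_type { DT_UINT16, DT_INT32, DT_BOOL };

    struct profile_parameter {
        uint16_t index;           /* position in the device's object dictionary */
        const char *name;         /* standardized meaning within the profile */
        enum data_type type;      /* standardized data type and coding */
        long default_value;
    };

    static const struct profile_parameter drive_profile[] = {
        { 0x4001, "control_word",    DT_UINT16, 0    },
        { 0x4002, "target_velocity", DT_INT32,  0    },
        { 0x4003, "velocity_limit",  DT_INT32,  3000 },
    };

    int main(void) {
        for (size_t i = 0; i < sizeof drive_profile / sizeof drive_profile[0]; i++)
            printf("0x%04X %-16s type=%d default=%ld\n",
                   drive_profile[i].index, drive_profile[i].name,
                   (int)drive_profile[i].type, drive_profile[i].default_value);
        return 0;
    }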
7.5.4 Management Owing to the different capabilities and application areas of fieldbus systems, fieldbus management shows varying complexity and its solutions are more or less convenient for the user. It has already been stated above that the various fieldbuses offer a wide range of management services with grossly varying levels of sophistication. Apart from the functional boundary conditions given by the protocols, fieldbus management always strongly relies on the tool support provided by the manufacturers. This significantly adds to inhomogeneity of the fieldbus world in that entirely different control concepts, user interfaces, and implementation platforms are used. Furthermore, a strict division between communication and application aspects of fieldbus management is usually not drawn. Typical communication-related management functions are bus parameter settings like address information, data rate, or timing parameters. These functions are rather low level and implicitly part of all fieldbus protocols. The user can access them via software tools mostly supplied by the device vendor. Application-related management functions concern the definition of communication relations, systemwide timing parameters (such as cycle times), priorities, or synchronization. The mechanisms and services offered by the fieldbus systems to support these functions are very diverse and should be integrated in the management framework for the application itself (e.g., the control system using the fieldbus). As a matter of fact, a common management approach for various automation networks is still not available today, and vendor-specific solutions are preferred. From the users’ point of view (which includes not only the end users, but also system integrators), this entails significantly increased costs for the buildup and maintenance of know-how because they must become acquainted with an unmanageable variety of solutions and tools. This situation actually revives one of the big acceptance problems that fieldbus systems originally had among the community of users: the missing interoperability. Communication interoperability (as ensured by the fieldbus standards) is a necessary but not sufficient precondition. For the user, handling interoperability of devices from different vendors is equally important. What is needed are harmonized concepts for configuration and management tools. As long as such concepts do not exist, fieldbus installations will typically be single-vendor systems, which is naturally a preferable situation for the manufacturers to secure their market position. With the increasing importance of LAN and Internet technologies in automation, new approaches for fieldbus management appeared that may be apt to introduce at least a common view at various fieldbuses. All these concepts aim at integrating fieldbus management into existing management applications of the higher-level network, which is nowadays typically IP based. One commonly employed high-level network management protocol is the Simple Network Management Protocol (SNMP), which can also be used to access fieldbus data points [51, 52]. Another approach involves the use of Directory Services [53]. These two solutions permit the inclusion of a large number of devices in specialized network management frameworks. An alternative that has become very popular is the use of Web technology, specifically HTTP tunneled over the fieldbus, to control device parameters. 
This trend is supported by the increasing availability of embedded Web servers and the use of Extensible Markup Language (XML) as a device description language [54]. The appealing feature of this solution is that no special tools are required and a standard Web browser is sufficient. However, Web pages are less suitable for the management of complete networks and are rather limited to single-device management. Nevertheless, this approach is now pursued by many manufacturers.
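As a toy illustration of this Web-based approach, the C sketch below formats a single device parameter as a small XML document wrapped in an HTTP response, roughly what an embedded Web server in a field device might return. The device name, parameter name, and XML layout are invented for the example; real devices follow their vendor's description schema.

    /* Toy sketch: a device parameter rendered as XML in an HTTP response
     * (all names and the document layout are invented). */
    #include <stdio.h>

    int main(void) {
        const char *param_name = "cycle_time_ms";   /* hypothetical device parameter */
        int param_value = 10;
        char body[256];
        int body_len = snprintf(body, sizeof body,
            "<device name=\"valve-17\">\n"
            "  <parameter name=\"%s\">%d</parameter>\n"
            "</device>\n",
            param_name, param_value);

        /* A standard Web browser pointed at the device could render this
         * without any fieldbus-specific tool. */
        printf("HTTP/1.0 200 OK\r\n"
               "Content-Type: text/xml\r\n"
               "Content-Length: %d\r\n"
               "\r\n"
               "%s", body_len, body);
        return 0;
    }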
7.6 New Challenges: Industrial Ethernet

As stated before, Ethernet has become increasingly popular in automation. As in the early days of fieldbus systems, this boom is driven mainly by industry — on an academic level, the use of Ethernet had been discussed decades ago. Hence, the initial situation is comparable to that of 15 years ago, and there is enough conflict potential in the various approaches to using Ethernet in automation. After all, a
key argument for the introduction of Ethernet was its dominating role in the office world and the resulting status of a uniform network solution. It was exactly this picture of uniqueness that marketing campaigns tried to project onto the automation world as well: Ethernet as the single, consistent network for all aspects. A quick look at reality, however, shows that things are different. Ethernet per se is but a solution for the two lower OSI layers, and as fieldbus history already showed, this is not sufficient. Even if the commonly used Internet protocol suite with TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) is taken into account, only the lower four layers are covered. Consequently, there are several possibilities to get Ethernet or Internet technologies into the fieldbus domain, all of which are actually used in practice (Figure 7.6):

• Tunneling of a fieldbus protocol over UDP/TCP/IP
• Definition of new real-time-enabled protocols
• Reduction of the free medium access in standard Ethernet
• Tunneling of TCP/IP over an existing fieldbus

FIGURE 7.6 Structures of Ethernet and fieldbus combinations.
The future role of Ethernet in the automation area is not clear. Initially, Ethernet was considered inappropriate because of its lack of real-time capabilities. With the introduction of switched Ethernet and certain modifications of the protocol, however, these problems have been alleviated. And even if there are still doubts about the predictability of Ethernet [55], its penetration into the real-time domain will influence the use of fieldbus-based devices and most likely restrict the future use of fieldbus concepts [56]. Today, Ethernet already takes the place of midlevel fieldbus systems, e.g., for the connection of PLCs. The first applications in manufacturing and building automation already exist in which Ethernet is the only network installed. To replace the existing lower-level fieldbuses with Ethernet and TCP/UDP/IP, more effort is needed. One critical issue is (hard) real time, and several solutions already exist to make Ethernet and TCP/IP meet the requirements of industrial applications [57]. One step below, on the sensor–actuator level, cost and implementation complexity are the most important factors. At the moment, fieldbus connection circuits for simple devices, often only one ASIC, are still cheaper than Ethernet connections. However, with modifications and simplifications of the controller hardware and the protocol implementations, Ethernet could finally catch up and become an interesting option.
7.6.1 Ethernet in IEC 61158

Only recently has standardization begun to address Industrial Ethernet. Still, in the wake of the fieldbus wars, several solutions based on Ethernet and TCP/UDP/IP have made their way into the IEC 61158 standard without much of a fight (see also Table 7.4):
• High-speed Ethernet (HSE) of the Foundation Fieldbus
• EtherNet/IP of ControlNet and DeviceNet
• PROFInet, defined by Profibus International
• TCP/IP over Interbus
HSE and EtherNet/IP (note that here IP stands for Industrial Protocol) are two solutions in which a fieldbus protocol is tunneled over TCP/IP. Strictly speaking, it is not real tunneling, where data packets of a lower fieldbus OSI layer are wrapped in a higher-layer protocol of the transport medium. Instead, the application layer protocol already defined for the fieldbus is also used over the TCP/IP or UDP/IP stack. In the case of ControlNet and DeviceNet, this is the Control and Information Protocol [58]. This solution allows device manufacturers to base their developments on existing and well-known protocols. The implementation involves little risk and can be done quickly.

The idea behind PROFInet goes more in the direction of implementing a new protocol. For the actual communication, however, it was decided to use the component object model (COM)/distributed component object model (DCOM) mechanism known from the Windows world. This solution opens up a wide range of interactions with the office IT software available on the market. The possibility of using fieldbus devices as objects in office applications will increase vertical connectivity. On the other hand, this also entails the risk of other applications overloading the network, which has to be avoided. Basically, the COM/DCOM model defines an interface for using modules as black boxes within other applications. PROFInet offers a collection of automation objects with COM interfaces that are independent of the internal structure of the device. The devices can therefore be virtual, and so-called proxy servers can represent the interfaces of any underlying fieldbus. This encapsulation enables the user to apply different implementations from different vendors; the only thing the user has to know is the structure of the interface. Provided the interfaces of two devices are equal, the devices are, at least theoretically, interchangeable. Although this proxy mechanism allows the connection of Ethernet to all types of fieldbus systems, it will not be a simple and real-time-capable solution. A second problem is that in order to achieve portability, the COM/DCOM mechanism has to be reprogrammed for different operating systems. DCOM is tightly connected to the security mechanisms of Windows NT, but there is also the possibility of using Windows 95/98 systems or, with restrictions, some UNIX systems. To simplify this, the PROFInet runtime system includes the COM/DCOM functionality, and the standard COM/DCOM functions inside the operating system have to be switched off when PROFInet is used.

The solution of tunneling TCP/IP over a fieldbus requires a certain minimum throughput from the fieldbus to be acceptable. Normally, the throughput of acyclic data (the transport mechanism preferred in this case) is not the strongest point of fieldbus systems. Nevertheless, Interbus defines the tunneling of TCP/IP over its acyclic communication channel [59]. The benefit of this solution is the parameterization of devices connected to the fieldbus with standard Internet services and well-known tools, e.g., a Web browser. This approach opens the possibility of achieving a new quality of user interaction, as well as a simpler integration of fieldbus management into existing high-level systems.
On the downside, however, it forces the manufacturer of the field device to implement the complete TCP/IP stack, possibly together with a Web server, on the device, and it requires the installation personnel to configure the IP addressing parameters.
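To make the tunneling idea more concrete, the following sketch sends a fieldbus-style application-layer request over UDP/IP in the spirit of HSE or EtherNet/IP. It is only an illustration: the IP address, port number, service code, and frame layout are invented for the example and do not correspond to any actual fieldbus specification.

```python
import socket
import struct

# All values below are illustrative: the device address, UDP port, service code,
# and frame layout do not correspond to any specific fieldbus specification.
DEVICE_ADDR = ("192.168.0.10", 34964)

def build_request(service_code, object_id, data=b""):
    """Pack a fictitious application-layer request into a byte string."""
    header = struct.pack(">BBH", service_code, 0, object_id)  # service, reserved, object id
    return header + data

def send_request(pdu):
    """Carry the application-layer PDU over UDP/IP and wait for the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(1.0)
        sock.sendto(pdu, DEVICE_ADDR)
        reply, _ = sock.recvfrom(1500)
        return reply

if __name__ == "__main__":
    request = build_request(service_code=0x0E, object_id=0x0001)  # e.g., "read attribute"
    print(send_request(request))
```

The point of the exercise is that the application-layer frame stays exactly as the fieldbus defined it; only the transport underneath changes, which is why such solutions could be implemented quickly and with little risk.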
7.6.2 Real-Time Industrial Ethernet

The Industrial Ethernet solutions discussed so far build on Ethernet in its original form; i.e., they use the physical and data link layers of ISO/IEC 8802-3 without any modifications. Furthermore, they assume that the network is lightly loaded or that Fast Ethernet switching technology is used in order to obtain predictable performance. Switching technology does eliminate collisions, but delays inside the switches and lost packets under heavy load conditions are unavoidable [60]. This gets worse if switches are used in a multilevel hierarchy and may result in grossly varying communication delays.
The real-time capabilities of native Ethernet are therefore limited and must rely on application-level mechanisms controlling the data throughput. For advanced requirements, such as drive control, this is not sufficient. These known limitations of conventional Ethernet stimulated the development of several alternative solutions that are more than just adaptations of ordinary fieldbus systems. These entirely new approaches were originally outside the IEC standardization process, but are now candidates for inclusion in the real-time Ethernet (RTE) standard, i.e., the second volume of IEC 61784.

The initial and boundary conditions for the standardization work, which started in 2003, are targeted at backward compatibility with existing standards. First of all, RTE is seen as an extension to the Industrial Ethernet solutions already defined in the communication profile families of IEC 61784-1. Furthermore, coexistence with conventional Ethernet is intended. The scope of the working document [61] states that “the RTE shall not change the overall behavior of an ISO/IEC 8802-3 communication network and their related network components or IEEE 1588, but amend those widely used standards for RTE behaviors. Regular ISO/IEC 8802-3 based applications shall be able to run in parallel to RTE in the same network.” Reference to the time distribution standard IEEE 1588 [62] is made because it will be the basis for the synchronization of field devices; the underlying timestamp exchange is sketched after the class overview below.

The work program of the RTE working group essentially consists of the definition of a classification scheme with RTE performance classes based on actual application requirements [63]. This is a response to market needs that demand scalable solutions for different application domains. One possible classification structure could be based on the reaction time of typical applications in automation:

• A first, low-speed class with reaction times around 100 ms. This timing requirement is typical for cases where humans are involved in system observation (10 pictures per second can already be perceived as a low-quality movie), for engineering, and for process monitoring. Most processes in process automation and building control fall into this class. The requirement can be fulfilled without many problems by a standard system with a TCP/IP communication channel.
• A second class with a required reaction time below 10 ms. This is the requirement for most tooling machine control systems such as PLCs or PC-based control. To reach this timing behavior, special care has to be taken in the RTE equipment: either sufficient computing resources are needed to handle TCP/IP in real time, or the protocol stack must be simplified and reduced to reach these reaction times on simple, cheap hardware.
• A third and most demanding class defined by the requirements of motion control: to synchronize several axes over a network, a timing precision well below 1 ms is needed. Current approaches to reach this goal rely on modifications of both the medium access protocol and the hardware structure of the controllers.

These classes will then be the building blocks for additional communication profiles. The intended structural resemblance to the fieldbus profiles is manifested by the fact that the originally assigned document number, IEC 62391, was changed to IEC 61784-2. The technological basis for the development will mostly be switched Ethernet.
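The IEEE 1588 synchronization referred to above rests on a simple timestamp exchange between master and slave clocks. The following sketch shows the widely documented offset and path delay calculation in simplified form; it ignores correction fields, path asymmetry, and the clock servo that a real implementation would add.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """
    Simplified IEEE 1588 delay request-response calculation.

    t1: Sync message leaves the master (master clock time)
    t2: Sync message arrives at the slave (slave clock time)
    t3: Delay_Req leaves the slave (slave clock time)
    t4: Delay_Req arrives at the master (master clock time)

    Assumes a symmetric path and ignores correction fields and servo filtering.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0           # slave clock minus master clock
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2.0  # one-way delay estimate
    return offset, mean_path_delay

# Example values: the slave clock runs ~50 us ahead over a ~10 us one-way path.
offset, delay = ptp_offset_and_delay(t1=0.000000, t2=0.000060, t3=0.000200, t4=0.000160)
print("offset = %.1f us, mean path delay = %.1f us" % (offset * 1e6, delay * 1e6))
```

The calculation itself is trivial; the sub-microsecond precision demanded by the motion control class comes from hardware time stamping of the messages close to the physical layer, not from the arithmetic.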
At the moment there are several systems that have the potential to fulfill at least parts of such an RTE specification and that are either already on the market or will be introduced shortly. Of these systems, three are extensions to fieldbuses already contained in IEC 61784:

EtherNet/IP: Defined by Rockwell and supported by the Open DeviceNet Vendor Association (ODVA) and ControlNet International, EtherNet/IP makes use of the Common Industrial Protocol (CIP), which is common to EtherNet/IP, ControlNet, and DeviceNet. CIP defines objects and their relations in different profiles and fulfills the class 1 requirements on EtherNet/IP. As such, it is part of IEC 61784-1. With the CIP Sync extensions it is possible to obtain isochronous communication that satisfies class 2 applications. These extensions use 100 Mbit/s networks together with IEEE 1588 time synchronization.

PROFInet: Defined mainly by Siemens and supported by Profibus International. Only the first version is currently included in the international fieldbus standard.
TABLE 7.7 Industrial Ethernet Profiles Defined in IEC 61784

IEC 61784 Profile    Volume    Brand Names
CPF-1                1         Foundation Fieldbus
CPF-2                1, 2      EtherNet/IP
CPF-3                1, 2      PROFInet
CPF-6                1, 2      Interbus
CPF-10               2         VNET/IP
CPF-11               2         TCnet
CPF-12               2         EtherCAT
CPF-13               2         EPL (Ethernet Powerlink)
CPF-14               2         EPA
CPF-15               2         Modbus
A second step was the definition of a soft real-time (SRT) solution for PROFInet IO. In this version, class 2 performance is reached even for small and cheap systems by eliminating the TCP/IP stack for process data: I/O data are packed directly into the Ethernet frame with a specialized protocol. Class 3 communication is reached with a special switch ASIC featuring a short and stable cut-through time and a special priority mechanism for real-time data [64]. Synchronization is based on an extension of IEEE 1588 using on-the-fly time stamping, an idea that was introduced in a different context [65]. The first application planned for PROFInet isochronous real time (IRT) is the PROFIdrive profile for motion control applications.

Interbus: Will also have an RTE extension, which will be identical to PROFInet. Still, it will be listed as a separate profile.

Apart from these approaches that merely extend well-known fieldbus systems, there is a multitude of new concepts collected in IEC 61784-2 (Table 7.7), not all of which were known in detail at the time of this writing:

VNET/IP: Developed by Yokogawa. The real-time extension of this protocol is called RTP (Real-Time and Reliable Datagram Protocol). Like many others, it uses UDP as a transport layer. Characteristic of the approach are an IP stack optimized with respect to processing times and a concept for redundant network connections.

TCnet: A proposal from Toshiba. Here, the real-time extension is positioned in the medium access control (MAC) layer. A dual redundant network connection is also proposed, based on shared Ethernet.

EtherCAT: Defined by Beckhoff and supported by the EtherCAT Technology Group (ETG), EtherCAT uses Ethernet frames and sends them around a special ring topology [66]. Every station in the ring removes and adds its information, which may be special input/output data or standard TCP/IP frames. To realize such a device, a special ASIC is needed for medium access that basically integrates a two-port switch into the device itself. The performance of this system is very good: it may reach cycle times of 30 µs.

Powerlink: Defined by B&R and now supported by the Ethernet Powerlink Standardization Group (EPSG). It is based on the principle of using a master–slave scheduling scheme on top of a regular shared Ethernet segment [67]. The master ensures real-time access to the cyclic data and lets standard TCP/IP frames pass only in specific time slots. To connect several segments, synchronization based on IEEE 1588 is used. This is the only solution available on the market that already fulfills the class 3 requirements today. In the future, the CANopen drive profiles will be supported.

EPA (Ethernet for Process Automation): A Chinese proposal. It is a distributed approach to deterministic communication based on a time-slicing mechanism.
Modbus/TCP: Defined by Schneider Electric and supported by Modbus-IDA (IDA, Interface for Distributed Automation, a consortium that originally worked on an independent solution but eventually merged with Modbus), Modbus/TCP uses the well-known Modbus protocol over a TCP/IP network. It is probably the most widely used Ethernet solution in industrial applications today and fulfills the class 1 requirements without problems. Contrary to all other fieldbus protocols, Modbus/TCP was submitted to the Internet Engineering Task Force (IETF) for standardization as an RFC (request for comments) [68]. The real-time extensions use the Real-Time Publisher–Subscriber (RTPS) protocol, which runs on top of UDP.

SERCOS, well known for its optical ring interface used in drive control applications, was originally outside IEC SC65C. SERCOS III, also an Ethernet-based solution, is under development [69]. The ring structure is kept, and the framing is replaced by Ethernet frames to allow an easy mixture of real-time data with TCP/IP frames. Every device will need special software or, for higher performance, an application-specific integrated circuit that separates the real-time time slot from the TCP/IP time slot with a switch function. Recently, cooperation between the committee working on SERCOS and SC65C has been established to integrate SERCOS into the RTE standard.

The recent activities of IEC SC65C show that there is substantial interest, especially from industry, in the standardization of real-time Ethernet. This situation closely resembles fieldbus standardization at the beginning of the 1990s, which ultimately led to the fieldbus wars. Given the comparable initial situation, will history repeat itself? Most likely not, because the structure of the intended standard documents already anticipates a multipart solution. The compromise that formerly took so long to find is thus foreseen from the start this time. Furthermore, the big automation vendors have learned their lessons and will avoid time- and resource-consuming struggles that eventually end in compromises anyway. Finally, the IEC itself cannot afford a new standardization war that would damage its image. Hence, all parties involved should have sufficient interest in keeping the standardization process smooth and fast, without too much noise inside the committees. Further evidence of this attitude is that the CENELEC committee TC65CX explicitly decided not to carry out standardization at the European level, but to wait for the outcome of the IEC work. The final standard is expected in 2007.
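To illustrate how lightweight the Modbus/TCP approach mentioned above is, the sketch below builds a “Read Holding Registers” request from the publicly documented MBAP header and PDU layout and sends it over a plain TCP connection. The host address, unit identifier, and register range are placeholders, not values from any real installation.

```python
import socket
import struct

def read_holding_registers(host, unit_id, start_addr, count, transaction_id=1, port=502):
    """Build and send a Modbus/TCP 'Read Holding Registers' (function 0x03) request."""
    pdu = struct.pack(">BHH", 0x03, start_addr, count)                     # function, address, quantity
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)  # MBAP header
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(mbap + pdu)
        return sock.recv(260)                                              # raw response (MBAP + PDU)

# Placeholder device address and register range:
# print(read_holding_registers("192.168.0.20", unit_id=1, start_addr=0, count=4))
```

The entire request fits in a dozen bytes and needs nothing beyond a standard TCP/IP stack, which goes a long way toward explaining the wide adoption of Modbus/TCP for class 1 applications.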
7.7 Aspects for Future Evolution

Even though fieldbus systems have reached a mature state, applications have become more demanding, which in turn creates new problems. Much work is still being done to improve the fieldbus itself, in particular concerning transmission speed and the large area of real-time capabilities [46, 70]. Another subject receiving considerable attention is the extension of fieldbuses to wireless physical layers [71, 72]. Apart from such low-level aspects, other problems are lurking at the system and application levels.
7.7.1 Driving Forces

Historically, the most important driving forces behind the development of fieldbus systems were the reduction of cabling and the desire to integrate more intelligence into the field devices. At least in Europe, the general need for automation, of which fieldbus systems are an integral part, also had a socioeconomic reason: rising production costs due to comparatively high wages required a higher degree of automation to stay competitive in an increasingly globalized market. The enabling technology for automation was, of course, microelectronics. Without the availability of highly integrated controllers, the development of fieldbus systems would never have been possible.

Today's driving forces for further evolution mainly come from the application fields reviewed below. Nevertheless, there are also technology push factors that promote the application of new
technologies, mainly at the lower layers of communication (e.g., Ethernet). It must not be overlooked, however, that these factors are to a certain extent marketing driven and aim at the development of new market segments or the redistribution of existing ones.

One important factor is what has recently become known as vertical integration. It concerns the ideally seamless interconnection between the traditional fieldbus islands and higher-level networks. The driving force behind this development is that people have become used to being able to access any information at any time over the Internet. Computer networks in the office area have reached a high level of maturity. Moreover, they are (quasi) standards that permit worldwide interconnectivity and, even more important, easy access and use for nonspecialists. Hence, it is not astonishing that the anytime–anywhere concept is also being extended to fieldbuses and automation systems in general. A common solution today is the coexistence, on the same communication medium, of real-time fieldbus traffic and non-time-critical tasks such as configuration and parameterization based on, e.g., user-friendly Web-based services. This becomes possible through the use of embedded Web servers in the field devices and the tunneling of TCP/IP over the fieldbus. Other approaches employ gateways to translate between the two worlds. In the near future, the increased use of Ethernet at the field level is expected to further ease network integration, even though it will not be able to solve all problems.

Another driving force for the development of new concepts comes from the area of building automation. Although networks in this field emerged relatively late compared with industrial automation, the benefits are evident: the operating costs of a building can be reduced dramatically if information about the status of the building is available for control purposes. This concerns primarily the energy consumption, but also service and maintenance costs. Energy control is a particularly interesting topic. Provided electrical appliances are interconnected via a fieldbus, they can adjust their energy consumption so as to balance the overall load [73, 74]. This demand-side management avoids peak loads, which in turn is honored by the utility companies with lower energy prices. Even more important will be the combination of fieldbuses in buildings (and also private homes) with Internet connections. This is a particular aspect of vertical integration and opens a window for entirely new services [75]. External companies could offer monitoring and surveillance services for private houses while the owners are on vacation. Such services already exist, but are limited to company customers (mostly within the context of facility management).

A very important topic for utility companies in many countries is remote access to energy meters [76]. With an appropriate communication link, they can monitor the actual energy consumption of their customers more precisely and with finer granularity, detect possible losses in the network, and better adapt their own production and distribution. As a side benefit, billing can be automated and tariffs can be made more flexible when load profiles can be recorded. Eventually, if the energy meters support prepayment, billing is no longer necessary at all.

An application field that is becoming increasingly relevant for networks is safety-related systems.
As this domain is subject to very stringent normative regulations, and is thus very conservative, it was dominated for a long time (and still is) by point-to-point connections between devices. The first bus system to penetrate this field was the CAN-based safety bus [77]. It took a long time and much effort for this system to pass the costly certification procedures. Nevertheless, it was finally accepted by the users, which was by no means obvious in an area concerned with the protection of human life, given that computer networks usually carry the psychological disadvantage of being considered unreliable. After this pioneering work, other approaches such as the ProfiSafe profile [78], Interbus safety [79], ASi safety [80], and recently EtherNet/IP safety [81] and WorldFIP [82] readily followed. The next big step is just ahead in car manufacturing, where in-vehicle networks in general and x-by-wire technology in particular will become determining factors [83]. Here, safety is of even more obvious relevance, and the latest developments of fieldbus systems for automotive use clearly address this issue. In the current Industrial Ethernet standardization process, safety considerations also play an important role.

Microelectronics will continue to be the primary enabling technology for automation networks. Increasing miniaturization and the possibility of integrating more and more computing power while at the same time reducing energy consumption will be the prerequisite for further evolution.
FIGURE 7.7 With increasing complexity of fieldbus installations, the important topics in research and practice change: around 1985, 3–6 nodes (physical layer, error control); around 1990, 6–12 nodes (networks, software tools); around 2000, up to 20,000 nodes (profiles, plug and play, Internet); toward 2015, up to 1,000,000 nodes (complex systems, agent-based approaches).
Today, system-on-a-chip (SoC) integration of a complete industrial PC with an Ethernet controller, on-chip memory, and a complete IP stack as firmware is available. Of course, the computing resources of such integrated solutions cannot be compared with those of high-end PCs, but they are sufficient for smart, low-cost sensors and actuators. On the one hand, this evolution is the foundation of the current boom of Ethernet in automation. On the other hand, it will stimulate more research in the emerging field of sensor networks [84]. Currently, most of the effort in this area goes into wireless networking approaches, but it can be expected that work on other aspects will gain importance in the future. From an application point of view, other emerging fields like ubiquitous computing [85] or concepts inspired by bionics [86] will also rely on low-level networking as an essential technological cornerstone.
7.7.2 System Complexity

If we consider the evolution of fieldbus systems, we observe a very interesting aspect. Until the mid-1990s, the developers of fieldbus systems concentrated on the definition of efficient protocols. Since the computing resources in the field devices were limited and the developers did not expect fieldbuses to have a complex network structure, most protocols use only the lower two or three layers and the top layer of the ISO/OSI model. In those days, typical applications in industrial automation had only about six nodes on average, so the assumption of not-so-complex structures was justified.

With the availability of more fieldbus devices and a growing acceptance of the technology, the number of nodes in a typical installation has also increased. A decade ago, the average application in industrial automation had 6 to 12 nodes. With time, however, it turned out that the main costs of fieldbus systems were determined not so much by the development of the nodes, but rather by the maintenance of the node software, as well as by the software tools necessary to integrate and configure the network. Actually, the development of a fieldbus system involves much more than just designing a clever protocol and implementing a few nodes, an aspect that was often underrated in the past. More important for the success of a fieldbus is that a user-friendly configuration and operating environment is available. This was, by the way, a strong argument in favor of open systems, where the development of field devices and software tools can be carried out by different companies. For proprietary systems, by contrast, the inventor must supply both devices and software, which is likely to overstrain a single company.

Today, the number of nodes per installation is increasing dramatically. The enormous numbers shown in Figure 7.7 are of course not found in industrial automation, but in building automation, where installations with 20,000 or more nodes are nowadays feasible. This evolution goes hand in hand with the advances of sensor networks in general. If we extrapolate the experience from other fields of computer technology, we can try to sketch the future evolution: the prices of the nodes will fall, and at the same time their performance will increase, allowing for the integration of more and more intelligence into the individual node. In this way, we can have complex networks with up to 1 million nodes working together. Such complex systems will be the challenge for the coming decades. It is evident that applications in such systems must be structured differently from today's approaches. What is required is a true distribution of the application. A promising concept is holonic systems, which have been thoroughly investigated in manufacturing [87, 88]. A holonic system consists of
distributed, autonomous units (holons) that cooperate to reach a global goal. In artificial intelligence, the same concept is better known as a multiagent system. Such agents could be an interesting way to cope with complex systems [89, 90]. The main problem, however, will be to provide tools that support the user in creating the distributed application.

A problem directly connected with system complexity is installation and configuration support through some plug-and-play capability. Ultimately, this means that new nodes can be attached to an existing network and integrate themselves without further input from the user. Realistically, this will remain only an appealing vision, as the user will always have to define at least the semantics of the information flow (in the trivial case of building automation, which switch is associated with which lamp), but nodes will have to be much more supportive than they are today. To date, the concepts for plug and play, or at least plug and participate, are at a very early stage. There are exemplary solutions for the automatic configuration of Profibus-DP devices [91] based on a manager–agent model inspired by management protocols like SNMP or the management framework of the ISO/OSI model. Here, a manager controls the status of the fieldbus and initiates the start-up and commissioning of the system in cooperation with the agents on the individual devices. The necessary data are kept in a (distributed) management information base (MIB).

Service broker concepts, such as Jini [92], could also be suitable for tackling the plug-and-play problem. The goal of Jini is to make distributed resources in a client–server network accessible. The term resource has a very abstract meaning here and covers both hardware and software. To locate the resources in the network, service offers, as well as service requests, are published by the nodes and matched by the service broker [93]. A problem of Jini is that it builds on the relatively complex programming language Java, so all Jini-enabled devices need a Java Virtual Machine as an interpreter, which is rather computing intensive. Jini is well developed today; however, hardware support still does not exist, and the breakthrough in smart devices as originally intended is not in sight. Competing approaches like Universal Plug and Play (UPnP) are catching up, but it is also questionable whether they will be suitable for complex systems.
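The service broker pattern described above can be reduced to a registry that matches published service descriptions against client requests. The following toy sketch illustrates the principle only; it is not the Jini or UPnP API, and the attribute names and the proxy object are invented for the example.

```python
from typing import Callable, Optional

class ServiceBroker:
    """Toy lookup service: devices register service offers, clients look them up by attributes."""

    def __init__(self) -> None:
        self._registry: list[tuple[dict, Callable]] = []

    def register(self, attributes: dict, proxy: Callable) -> None:
        """A device publishes a service description together with a callable proxy object."""
        self._registry.append((attributes, proxy))

    def lookup(self, **wanted) -> Optional[Callable]:
        """Return the first registered proxy whose attributes match the request."""
        for attributes, proxy in self._registry:
            if all(attributes.get(key) == value for key, value in wanted.items()):
                return proxy
        return None

broker = ServiceBroker()
# The lambda stands in for a real remote service proxy downloaded from the device.
broker.register({"type": "temperature-sensor", "room": "lab-1"}, proxy=lambda: 21.5)

sensor = broker.lookup(type="temperature-sensor", room="lab-1")
if sensor is not None:
    print("current value:", sensor())
```

What makes the real systems hard is everything around this core: spontaneous announcement and leasing of services, code mobility, and doing all of it on devices with very limited resources.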
7.7.3 Software Tools and Management

The fieldbus as a simple means of communication is only one part of an automation system. Today, it is the part that is best understood and developed. What increasingly becomes a problem, especially with growing complexity, is the support through software tools. Historically, such tools are provided by the fieldbus vendors or system integrators and are as diverse as the fieldbuses themselves. Moreover, there are different (and often inconsistent) tool sets for the different aspects of the life cycle of a plant, such as planning, configuration, commissioning, testing, and diagnosis or maintenance. Such tools typically support only a topological view of the installation, whereas modern complex systems would rather require functionality-oriented, abstract views. A major disadvantage of this tool variety is that the tools in many cases operate on incompatible databases, which hampers system integration and is likely to produce consistency problems. More advanced concepts build on unified data sets that present consistent views to the individual tools through well-defined interfaces [94, 95]. The data structures are nevertheless still specific to each fieldbus. Unification of the data representations is one of the goals of NOAH [50].

For fieldbus-independent access to the field devices and their data (not necessarily covering the entire life cycle), several solutions have been proposed. They mostly rely on a middleware abstraction layer using object-oriented models. Examples are OPC (OLE for Process Control) [96], Java, and other concepts [97]. Such platforms can ultimately be extended through the definition of suitable application frameworks that permit the embedding of generic or proprietary software components in a unified environment spanning all phases of the life cycle. Relevant approaches are, e.g., Open Control [95], the Field Device Tool [98], and a universal framework of the ISO [99].

Beyond pure communication management, essential aspects of engineering and management in the application domain are also not yet universally solved. The ample computing resources of modern field devices, however, allow the introduction of new and largely fieldbus-independent concepts for the modeling of applications. A promising development is function blocks, standardized in IEC 61499
[100]. Historically evolved as an extension to the PLC programming standard IEC 61131, they can be used to create a functional view (rather than a topological one) of distributed applications. The function block concept integrates the models known from PLCs in factory automation, as well as typical functions from process automation that are available in many fieldbuses as proprietary implementations. With its universal approach, it is also a good option for the implementation of fieldbus profiles.

In the context of management and operation frameworks, the unified description of device and system properties becomes of eminent importance. To this end, device description languages were introduced. The descriptions of the fieldbus components are mostly developed by the device manufacturer and are integral parts of the products. Alternatively, they are contained in libraries where they can be downloaded and parsed for further use. Over the years, several mutually incompatible languages and dialects were developed [101, 102]. This is not surprising, as device descriptions are the basis for effective installation and configuration support. Thus, they are a necessary condition for the plug-and-play concepts already discussed. In recent years, the diversity of description languages has been addressed by the increased usage of universal languages like XML [103, 104], which is also the basis for the electronic device description language (EDDL) standardized in IEC 61804 [105, 106].
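As a rough illustration of what an XML-based device description amounts to, the sketch below generates a small description document. The element and attribute names are invented for the example and follow neither the actual EDDL (IEC 61804) syntax nor any vendor-specific schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, invented purely for illustration.
device = ET.Element("Device", vendor="ExampleCo", product="FlowSensor-100")
variable = ET.SubElement(device, "Variable", name="flow_rate", type="float", unit="l/min")
ET.SubElement(variable, "Range", min="0", max="250")
ET.SubElement(device, "Command", name="reset_totalizer")

# An engineering tool could parse such a file to build configuration dialogs automatically.
print(ET.tostring(device, encoding="unicode"))
```

The value of such descriptions lies less in the markup itself than in the shared semantics: only when vendors agree on the meaning of the elements can a generic tool configure a device it has never seen before.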
7.7.4 Network Interconnection and Security

Security has never been a real issue in conventional fieldbus systems. This is understandable insofar as fieldbuses were originally conceived as closed, isolated systems, which raised no need for security concepts. In building automation, where networks are naturally larger and more complex, the situation is different, and at least rudimentary security mechanisms are supported [107]. In factory and process automation, things changed with the introduction of vertical integration and the interconnection of fieldbuses with office-type networks. In such an environment, security is an essential topic on all network levels. Given the lack of appropriate features on the fieldbus level, the development and application of security concepts is typically confined to the actual network interconnection [108, 109].

One important aspect is that popular firewalls are not sufficient to guarantee security. Likewise, encryption is no cure-all, albeit an important element of secure systems. To reach a meaningful security level, a thorough risk analysis is the first step. On this basis, a security strategy needs to be developed detailing all required measures, most of which are organizational in nature. In practice, one will face two major problems: (1) the additional computational effort for security functions on the field devices (e.g., for cryptographic functions), which may conflict with real-time demands; and (2) the logistical problem of distributing and managing the keys whose secrecy forms the basis of every security policy. Both problems can, to a certain extent, be tackled with the introduction of security tokens such as smart cards [107].

With the introduction of Ethernet in automation, a reconsideration of field-level security also becomes possible. This is facilitated by the fact that many Industrial Ethernet solutions use IP and the Internet transport protocols UDP and TCP on top of Ethernet, which means that standard security protocols like Transport Layer Security (TLS) [110] can be used (a minimal sketch follows the list of related work below). One should recognize, however, that there are other approaches that use proprietary protocols directly above Ethernet, and that Ethernet per se is not the layer where security features can reasonably be implemented.

The fact that automation networks have had no security features up to now is also reflected in the recent standardization work of IEC SC65C WG13. Unlike other working groups, where the aim of the members is to get concrete proposals of established systems into the standards, no ready-to-use proposals exist. Apart from general considerations, the work has to be started largely from scratch. There is, however, related work in other fields that is being considered:

• IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems, maintained by IEC/SC 65A. Functional safety is in principle covered by the work of WG 12, but the common understanding is that safety-related systems necessarily have security aspects.
• Work being done in IEC TC57/WG15: Power systems management and associated information exchange/data and communication security.
• ISO/IEC 17799: Code of Practice for Information Security Management.
• ISO/IEC 15408: Common Criteria for IT Security Evaluation.
• ISA SP99: Manufacturing and Control Systems Security. It can be expected that this U.S. activity will have significant influence on the WG 13 work.
• AGA/GTI 12: Cryptographic Protection of SCADA Communications.
• NIST PCSRF: Process Control Security Requirements Forum.
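Returning to the TLS option mentioned above, the following sketch wraps a plain TCP connection to a (hypothetical) plant gateway in TLS using the standard Python library. The host name, port, certificate file, and application payload are placeholders; a real deployment would rely on the plant's own PKI and certificates provisioned on the field devices and gateways.

```python
import socket
import ssl

# Host name, port, CA file, and payload are placeholders for this illustration.
GATEWAY = ("gateway.plant.example", 4843)

context = ssl.create_default_context(cafile="plant-ca.pem")  # verify the gateway certificate
context.minimum_version = ssl.TLSVersion.TLSv1_2

with socket.create_connection(GATEWAY, timeout=5.0) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=GATEWAY[0]) as tls_sock:
        tls_sock.sendall(b"READ sensor_7\n")   # illustrative application payload
        print(tls_sock.recv(1024))
```

The transport-layer protection comes almost for free once TCP/IP is present; the hard parts remain the key and certificate management on thousands of small devices and the extra processing load, which is exactly the trade-off discussed above.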
7.8 Conclusion and Outlook

Fieldbus systems have come a long way from the very first attempts at industrial networking to contemporary, highly specialized automation networks. What is currently at hand, even after the selection process of the last decade, nearly covers the complete spectrum of possible applications. Nevertheless, there is enough evolution potential left [70, 86].

On the technological side, the communication medium itself allows further innovation. Up to now, the focus has been on wired links, with twisted pair being the dominant solution. Optical media have been used comparatively early for large distances and electromagnetically disturbed environments. Recently, plastic optical fibers have reached a level of maturity that allows longer cable lengths at lower prices. Another option, especially for building automation, is the use of electrical power distribution lines. This possibility, although tempting in principle, is still impaired by the poor communication characteristics of the medium. Substantial research effort will be needed to overcome these limitations, which in practice comes down to a massive use of digital signal processing.

The most promising research field for technological evolution is the wireless domain. The benefits are obvious: no failure-prone and costly cabling, and high flexibility, even mobility. The problems, on the other hand, are equally obvious: the very peculiar properties of the wireless communication channel must be dealt with, such as attenuation, fading, multipath reception, temporarily hidden nodes, and easy access for intruders [71]. Wireless communication options exist today for several fieldbuses [72]. Up to now, they have been used merely to replace the conventional data cable. A really efficient use of wireless communication, however, would necessitate an entire redefinition of at least the lower fieldbus protocol layers. Evaluating currently available wireless technologies from the computer world with respect to their applicability in automation is a first step in this direction. Ultimately, we can expect completely new automation networks optimized for wireless communication, where perhaps only the application layer protocol remains compatible with traditional wired solutions to achieve integration.

Apart from mere technological issues, the currently largest trend is the integration of fieldbus systems into higher-level, heterogeneous networks and process control systems. Internet technologies play a particularly prominent role here, and the penetration of the field level by optimized Ethernet solutions creates additional momentum. The ultimate goal is a simplification and possibly harmonization of fieldbus operation. For the fieldbus itself, this entails increasing complexity in the higher protocol layers. At the same time, more and more field-level applications employ standard PC-based environments and operating systems like Windows or Linux [111]. These two trends together result in a completely new structure of the automation hierarchy. The old multilevel pyramid finally turns into a rather flat structure with two, maybe three, levels, as shown in Figure 7.8. Here, functions of the traditional middle layers (like the process and cell levels) are transferred into the intelligent field devices (and thus distributed) or into the management level. The traditional levels may persist in the organizational structure of the company, but not in the technical infrastructure. Does all this mean we have reached the end of the fieldbus era?
The old CIM pyramid, which was a starting point for the goal-oriented development of fieldbus systems, ceases to exist, and Ethernet is determined to reach down into the field level. This may indeed be the end of the road for the traditional fieldbus as we know it, but certainly not for networking in automation. What we are likely to see in the future are Ethernet- and Internet-based concepts at all levels, probably optimized to meet special performance requirements on the field level but still compatible with the standards in the management area. Below, very close to the technical process, there will be room for highly specialized sensor–actuator networks — new fieldbus systems tailored to meet the demands of high flexibility, energy optimization,
FIGURE 7.8 Flattened, two-level automation hierarchy: a process information level (management, marketing, planning, data servers with business data, statistics, quality control) connected over the Ethernet company network (backbone), and a process control level (PCs and PLCs with control parameters, process data management, visualization) connected via fieldbus and Ethernet to the measurement technology, sensors, actuators, and controllers.
small-footprint implementation, or wireless communication. The next evolution step in fieldbus history is just ahead.
Acknowledgments

The author thanks Dietmar Dietrich, Kurt Milian, Eckehardt Klemm, Peter Neumann, and Jean-Pierre Thomesse for the extensive discussions, especially about the historical aspects of fieldbus systems.
References

[1] International Electrotechnical Commission, IEC 61158, Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems, 2003. [2] Fieldbus Foundation, What Is Fieldbus? http://www.fieldbus.org/About/FoundationTech/. [3] G.G. Wood, Fieldbus status 1995, IEE Computing and Control Engineering Journal, 6, 251–253, 1995. [4] G.G. Wood, Survey of LANs and standards, Computer Standards and Interfaces, 6, 27–36, 1987. [5] N.P. Mahalik (Ed.), Fieldbus Technology: Industrial Network Standards for Real-Time Distributed Control, Springer, Heidelberg, 2003. [6] H. Töpfer, W. Kriesel, Zur funktionellen und strukturellen Weiterentwicklung der Automatisierungsanlagentechnik, Messen Steuern Regeln, 24, 183–188, 1981. [7] T. Pfeifer, K.-U. Heiler, Ziele und Anwendungen von Feldbussystemen, Automatisierungstechnische Praxis, 29, 549–557, 1987. [8] H. Steusloff, Zielsetzungen und Lösungsansätze für eine offene Kommunikation in der Feldebene, Automatisierungstechnik, 855, 337–357, 1990. [9] L. Capetta, A. Mella, F. Russo, Intelligent field devices: user expectations, IEE Coll. on Fieldbus Devices: A Changing Future, 6/1–6/4, 1994.
[10] K. Wanser, Entwicklungen der Feldinstallation und ihre Beurteilung, Automatisierungstechnische Praxis, 27, 237–240, 1985. [11] J.A.H. Pfleger, Anforderungen an Feldmultiplexer, Automatisierungstechnische Praxis, 29, 205–209, 1987. [12] H. Junginger, H. Wehlan, Der Feldmultiplexer aus Anwendersicht, Automatisierungstechnische Praxis, 31, 557–564, 1989. [13] W. Schmieder, T. Tauchnitz, FuRIOS: fieldbus and remote I/O: a system comparison, Automatisierungstechnische Praxis, 44, 61–70, 2002. [14] P. Pleinevaux, J.-D. Decotignie, Time critical communication networks: field buses, IEEE Network, 2, 55–63, 1988. [15] E.H. Higham, Casting a crystal ball on the future of process instrumentation and process measurements, in IEEE Instrumentation and Measurement Technology Conference (IMTC ’92), New York, May 1992, pp. 687–691. [16] J.P. Thomesse, Fieldbuses and interoperability, Control Engineering Practice, 7, 81–94, 1999. [17] J.-C. Orsini, Field Bus: A User Approach, Cahier Technique Schneider Electric 197, 2000, http:// www.schneider-electric.com.tr/ftp/literature/publications/ECT197.pdf. [18] R.D. Quick, S.L. Harper, HP-IL: A Low-Cost Digital Interface for Portable Applications, HewlettPackard Journal, January 1983, pp. 3–10. [19] Philips Semiconductor, The I2C-Bus Specification, 2000, http://www.semiconductors.philips.com/ buses/i2c/. [20] H. Zimmermann, OSI reference model: the ISO model of architecture for open system interconnection, IEEE Transactions on Communications, 28, 425–432, 1980. [21] J. Day, H. Zimmermann, The OSI reference model, Proceedings of the IEEE, 71, 1334–1340, 1983. [22] D.J. Damsker, Asessment of industrial data network standards, IEEE Trans. Energy Conversion, 3, 199–204, 1988. [23] H.A. Schutz, The role of MAP in factory integration, IEEE Transactions on Industrial Electronics, 35, 6–12, 1988. [24] B. Armitage, G. Dunlop, D. Hutchison, S. Yu, Fieldbus: an emerging communications standard, Microprocessors and Microsystems, 12, 555–562, 1988. [25] S.G. Shanmugham, T.G. Beaumariage, C.A. Roberts, D.A. Rollier, Manufacturing communication: the MMS approach, Computers and Industrial Engineering, 28, 1–21, 1995. [26] T. Phinney, P. Brett, D. McGovan, Y. Kumeda, FieldBus: real-time comes to OSI, in International Phoenix Conference on Computers and Communications, March 1991, pp. 594–599. [27] K. Bender, Offene Kommunikation: Nutzen, Chancen, Perspektiven für die industrielle Kommunikation, in iNet ’92, 1992, pp. 15–37. [28] T. Sauter and M. Felser, The importance of being competent: the role of competence centres in the fieldbus world, in FeT ’99 Fieldbus Technology, Magdeburg, Germany, September 1999, pp. 299–306. [29] Gesmer Updegrove LLP, Government Issues and Policy, http://www.consortiuminfo.org/ government/. [30] M.A. Smith, Vienna Agreement on Technical Cooperation between ISO and CEN, paper presented at ISO/IEC Directives Seminar, Geneva, June 1995, isotc.iso.ch/livelink/livelink/fetch/2000/2123/ SDS_WEB/sds_dms/vienna.pdf. [31] International Electrotechnical Commission, IEC-CENELEC Agreement, http://www.iec.ch/about/ partners/agreements/cenelec-e.htm. [32] E. Klemm, Der Weg durch die Gremien zur internationalen Feldbusnorm, paper presented at VDE Seminar Die neue, internationale Feldbusnorm: Vorteile, Erfahrungen, Beispiele, Zukunft, November 2002, Mannheim. [33] Instrument Society of America Standards and Practices 50, Draft Functional Guidelines, March 10, 1987, document ISA-SP50-1986-17-D.
[34] G.G. Wood, Current fieldbus activities, Computer Communications, 11, 118–123, 1988. [35] C. Gilson, Digital Data Communications for Industrial Control Systems or How IEC 61158 (Just) Caught the Bus, paper presented at IEC E-TECH, March 2004, http://www.iec.ch/online_news/ etech/arch_2004/etech_0304/focus.htm#fieldbus. [36] P. Leviti, IEC 61158: an offence to technicians? in IFAC International Conference on Fieldbus Systems and Their Applications, FeT 2001, Nancy, France, November 15–16, 2001, p. 36. [37] T. Phinney, Mopping up from bus wars, World Bus Journal, 22–23, December 2001. [38] H. Engel, Feldbus-Normung 1990, Automatisierungstechnische Praxis, 32, 271–277, 1990. [39] H. Wölfel, Die Entwicklung der digitalen Prozebleittechnik: Ein Rückblick (Teil 4), Automatisierungstechnische Praxis, 40, S25–S28, 1998. [40] J. Rathje, The fieldbus between dream and reality, Automatisierungstechnische Praxis, 39, 52–57, 1997. [41] G.H. Gürtler, Fieldbus standardization, the European approach and experiences, in Feldbustechnik in Forschung, Entwicklung und Anwendung, Springer, Heidelberg, 1997, pp. 2–11. [42] S. Bury, Are you on the right bus? Advanced Manufacturing, 1, 26–30, 1999, http://www.advanced manufacturing.com/October99/fieldbus.htm. [43] G.G. Wood, State of play, IEE Review, 46, 26–28, 2000. [44] International Electrotechnical Commission, IEC 61784-1, Digital Data Communications for Measurement and Control: Part 1: Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems, 2003. [45] International Electrotechnical Commission, IEC 61158-1, Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems: Part 1: Introduction, 2003. [46] J.-P. Thomesse, M. Leon Chavez, Main paradigms as a basis for current fieldbus concepts, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 2–15. [47] C. Diedrich, Profiles for Fieldbuses: Scope and Description Technologies, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 90–97. [48] U. Döbrich, P. Noury, ESPRIT Project NOAH: Introduction, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 414–422. [49] R. Simon, P. Neumann, C. Diedrich, M. Riedl, Field devices-models and their realisations, in IEEE International Conference on Industrial Technology (ICIT ’02), Bangkok, December 2002, pp. 307–312. [50] A. di Stefano, L. Lo Bello, T. Bangemann, Harmonized and consistent data management in distributed automation systems: the NOAH approach, in IEEE International Symposium on Industrial Electronics, ISIE 2000, Cholula, Mexico, December 2000, pp. 766–771. [51] M. Knizak, M. Kunes, M. Manninger, T. Sauter, Applying Internet management standards to fieldbus systems, in WFCS ’97, Barcelona, October 1997, pp. 309–315. [52] M. Kunes, T. Sauter, Fieldbus-Internet connectivity: the SNMP approach, IEEE Transactions on Industrial Electronics, 48, 1248–1256, 2001. [53] M. Wollschlaeger, Integration of VIGO into Directory Services, paper presented at 6th International P-NET Conference, Vienna, May 1999. [54] M. Wollschlaeger, Framework for Web integration of factory communication systems, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Antibes JuanLes-Pins, France, October 2001, pp. 261–265. [55] J.D. Decotignie, A perspective on Ethernet-TCP/IP as a fieldbus, in IFAC International Conference on Fieldbus Systems and Their Applications, FeT 2001, Nancy, France, November 15–16, 2001, pp. 138–143. [56] E. 
Byres, Ethernet to Link Automation Hierarchy, InTech Magazine, June 1999, pp. 44–47. [57] M. Felser, Ethernet TCP/IP in automation, a short introduction to real-time requirements, in Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 501–504.
[58] V. Schiffer, The CIP family of fieldbus protocols and its newest member: EtherNet/IP, in Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 377–384. [59] M. Volz, Quo Vadis Layer 7? The Industrial Ethernet Book, no. 5, Spring 2001. [60] K.C. Lee, S. Lee, Performance evaluation of switched Ethernet for real-time industrial communications, Computer Standards and Interfaces, 24, 411–423, 2002. [61] TC65/SC65C, New work item proposal, 65C/306/NP, 2003. [62] IEEE 1588, Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, 2002. [63] TC65/SC65C, Meeting minutes, 65C/318/INF, 2003. [64] A. Boller, Profinet V3: bringing hard real-time and the IT world together, Control Engineering Europe, September 2003, http://www.manufacturing.net/ctl/article/CA318939. [65] R. Höller, G. Gridling, M. Horauer, N. Kerö, U. Schmid, K. Schossmaier, SynUTC: high precision time synchronization over Ethernet networks, in 8th Workshop on Electronics for LHC Experiments (LECC), Colmar, France, September 9–13, 2002, pp. 428–432. [66] http://www.ethercat.org/. [67] http://www.ethernet-powerlink.com/. [68] Schneider Automation, Modbus Messaging on TCP/IP Implementation Guide, May 2002, http:// www.modbus.org/. [69] E. Schemm, SERCOS to link with ethernet for its third generation, IEE Computing and Control Engineering Journal, 15, 30–33, 2004. [70] J.-D. Decotignie, Some future directions in fieldbus research and development, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 308–312. [71] L. Rauchhaupt, J. Hähniche, Opportunities and problems of wireless fieldbus extensions, in Fieldbus Technology, Springer, Heidelberg, 1999, pp. 308–312. [72] L. Rauchhaupt, System and device architecture of a radio based fieldbus: the RFieldbus system, in IEEE Workshop on Factory Communication Systems, Västerås, Sweden, 2002, pp. 185–192. [73] P. Palensky, Distributed Reactive Energy Management, Ph.D. thesis, Vienna University of Technology, Austria, 2001. [74] G. Gaderer, T. Sauter, Ch. Eckel, What it takes to make a refrigerator smart: a case study, in IFAC International Conference on Fieldbus Systems and Their Applications (FeT), Aveiro, Portugal, July 2003, pp. 85–92. [75] L. Haddon, Home Automation: Research Issues, paper presented at EMTEL Workshop: The European Telecom User, Amsterdam, November 10–11, 1995. [76] M. Lobashov, G. Pratl, T. Sauter, Implications of power-line communication on distributed data acquisition and control systems, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Lisboa, Portugal, September 2003, pp. 607–613. [77] R. Piggin, An introduction to safety-related networking, IEE Computing and Control Engineering Journal, 15, 34–39, 2004. [78] PROFIBUS International, Profile for Failsafe with PROFIBUS, DP-Profile for Safety Applications, Version 1.2, October 2002, http://www.profibus.com. [79] INTERBUS Club, INTERBUS Safety, White Paper, 2003. [80] http://as-i-safety.net. [81] ODVA, Safety Networks: Increase Productivity, Reduce Work-Related Accidents and Save Money, Open DeviceNet Vendor Assoc., White Paper, 2003, http://www.odva.org. [82] J.-P. Froidevaux, O. Nick, M. Suzan, Use of fieldbus in safety related systems, an evaluation of WorldFIP according to proven-in-use concept of IEC 61508, WorldFIP News, http://www. worldfip.org. [83] G. Leen, D. Heffernan, Expanding automotive electronic systems, IEEE Computer, 35, 88–93, 2002. [84] H. 
Gharavi, S.P. Kumar (Eds.), Special issue on sensor networks and applications, Proceedings of the IEEE, 91, 2003.
[85] G. Borriello, Key challenges in communication for ubiquitous computing, IEEE Communications Magazine, 40, 16–18, 2002.
[86] D. Dietrich, T. Sauter, Evolution potentials for fieldbus systems, in Proceedings of the 3rd IEEE International Workshop on Factory Communication Systems, Porto, 2000, pp. 343–350.
[87] A. Koestler, The Ghost in the Machine, Arkana Books, London, 1967.
[88] F. Pichler, On the construction of A. Koestler's holarchical networks, in Cybernetics and Systems 2000, Austrian Society for Cybernetic Systems, Vienna, 2000.
[89] P. Palensky, The convergence of intelligent software agents and field area networks, in 1999 IEEE Conference on Emerging Technologies and Factory Automation, Barcelona, 1999, pp. 917–922.
[90] T. Wagner, An agent-oriented approach to industrial automation systems, in Agent Technologies, Infrastructures, R. Kowalczyk et al. (Eds.), Springer-Verlag, Berlin, 2003, pp. 314–328.
[91] A. Pöschmann, P. Krogel, Autoconfiguration Management für Feldbusse: PROFIBUS Plug & Play, Elektrotechnik und Informationstechnik, 117, 5, 2000.
[92] W. Kastner, M. Leupold, How dynamic networks work: a short tutorial on spontaneous networks, in IEEE Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 295–303.
[93] S. Deter, Plug and participate for limited devices in the field of industrial automation, in IEEE Conference on Emerging Technologies and Factory Automation, ETFA 2001, Antibes Juan-Les-Pins, France, October 15–18, 2001, pp. 263–268.
[94] O. Cramer Nielsen, A real time, object oriented fieldbus management system, in 3rd IEEE International Workshop on Factory Communication Systems, Porto, 2000, pp. 335–340.
[95] A. Baginski, G. Covarrubias, Open control: the standard for PC-based automation technology, in IEEE International Workshop on Factory Communication Systems, October 1997, pp. 329–333.
[96] OPC Data Access Automation Specification, Version 2.0, OPC Foundation, October 14, 1998.
[97] R. Bachmann, M.S. Hoang, P. Rieger, Component-based architecture for integrating fieldbus systems into distributed control applications, in Fieldbus Technology, Springer-Verlag, Heidelberg, 1999, pp. 276–283.
[98] R. Simon, M. Riedl, C. Diedrich, Integration of field devices using field device tool (FDT) on the basis of electronic device descriptions (EDD), in IEEE International Symposium on Industrial Electronics, ISIE '03, June 9–11, Rio de Janeiro, 2003, pp. 189–194.
[99] W.H. Moss, Report on ISO TC184/SC5/WG5 open systems application frameworks based on ISO 11898, in 5th International CAN Conference (iCC '98), San Jose, CA, 1998, pp. 07-02–07-04.
[100] Function Blocks for Industrial-Process Measurement and Control Systems: Committee Draft, IEC TC65/WG6, ftp://ftp.cle.ab.com/stds/iec/sc65bwg7tf3/html/news.htm.
[101] GSD Specification for PROFIBUS-FMS (version 1.0), PNO Karlsruhe.
[102] Device Description Language Specification, HART Communication Foundation, Austin, TX, 1995.
[103] T. Bray, J. Paoli, C.M. Sperberg-McQueen, Extensible Markup Language (XML) 1.0, 1998, http://www.w3.org/TR/REC-xml.
[104] M. Wollschlaeger, Descriptions of fieldbus components using XML, Elektrotechnik und Informationstechnik, 117, 5, 2000.
[105] International Electrotechnical Commission, IEC 61804-2, Function Blocks (FB) for Process Control: Part 2: Specification of FB Concept and Electronic Device Description Language (EDDL), 2003.
[106] P. Neumann, C. Diedrich, R. Simon, Engineering of field devices using device descriptions, paper presented at IFAC World Congress 2002, Barcelona, 2002.
[107] C. Schwaiger, A. Treytl, Smart card based security for fieldbus systems, in 2003 IEEE Conference on Emerging Technologies and Factory Automation, Lisbon, September 2003, pp. 398–406.
[108] T. Sauter, Ch. Schwaiger, Achievement of secure Internet access to fieldbus systems, Microprocessors and Microsystems, 26, 331–339, 2002.
[109] P. Palensky, T. Sauter, Security considerations for FAN-Internet connections, in IEEE International Workshop on Factory Communication Systems, Porto, September 2000, pp. 27–35.
[110] E. Rescorla, SSL and TLS, Addison-Wesley, Reading, MA, 2000.
[111] W. Kastner, C. Csebits, M. Mayer, Linux in factory automation? Internet controlling of fieldbus systems! in 1999 IEEE Conference on Emerging Technologies and Factory Automation, Barcelona, 1999, pp. 27–31.
[112] CAMAC, A Modular Instrumentation System for Data Handling, EUR4100e, March 1969.
[113] http://www.hit.bme.hu/people/papay/edu/GPIB/tutor.htm.
[114] National Instruments, GPIB Tutorial, www.raunvis.hi.is/~rol/Vefur/%E9r%20Instrupedia/CGPTUTO.PDF.
[115] W. Büsing, Datenkommunikation in der Leittechnik, Automatisierungstechnische Praxis, 28, 228–237, 1986.
[116] G. Färber, Bussysteme, 2nd ed., Oldenbourg-Verlag, Munich, 1987.
[117] M-Bus Usergroup, The M-Bus: A Documentation, Version 4.8, November 11, 1997, http://www.mbus.com/mbusdoc/default.html.
[118] G. Leen, D. Heffernan, A. Dunne, Digital networks in the automotive vehicle, IEE Computer and Control Engineering Journal, 10, 257–266, 1999.
[119] CAN-in-Automation, CAN history, http://www.can-cia.de/can/protocol/history/.
[120] Condor Engineering, MIL-STD-1553 tutorial, http://www.condoreng.com/support/downloads/tutorials/MIL-STD-1553Tutorial.PDF.
[121] Grid Connect, The Fieldbus Comparison Chart, http://www.synergetic.com/compare.htm.
[122] Interbus Club, Interbus Basics, 2001, http://www.interbusclub.com/en/doku/pdf/interbus_basics_en.pdf.
[123] H. Kirrmann, Industrial Automation, lecture notes, EPFL, 2004, http://lamspeople.epfl.ch/kirrmann/IA_slides.htm.
[124] H. Wölfel, Die Entwicklung der digitalen Prozeßleittechnik: Ein Rückblick (Teil 3), Automatisierungstechnische Praxis, 40, S17–S24, 1998.
[125] T. Sauter, D. Dietrich, W. Kastner (Eds.), EIB Installation Bus System, Publicis MCD, Erlangen, Germany, 2001.
[126] E.B. Driscoll, The History of X10, http://home.planet.nl/~lhendrix/x10_history.htm.
Appendix

The tables presented here give an overview of selected fieldbus systems, categorized by application domain. The list is necessarily incomplete, although care has been taken to include all approaches that either exerted a substantial influence on the evolution of the entire field or are still significant today. The year of introduction refers to the public availability of the specification or first products. This year is also the one used in the timeline in Figure 7.3. Note that despite careful research, the information obtained from various sources was frequently inconsistent, so there may be some uncertainty in the figures. Where respective data could be obtained, the start of the project has been listed as well, because there are several cases where much time elapsed between the start of development of the fieldbus and its first release.

TABLE 7.8 Instrumentation and PCB-Level Buses

Fieldbus | Developer (Country) | Introduced in | Standard | References
CAMAC | ESONE (Europe) | 1969 (start of development 1966) | IEEE 583 (1970, 1982, 1994), IEEE 595 (1974, 1982), IEEE 596 (1972, 1982), IEEE 758 (1979) | [112]
GPIB (HP-IB) | Hewlett-Packard (U.S.) | 1974 (start of development 1965) | ANSI IEEE-488 (1975, 1978), ANSI IEEE-488.2 (1987, 1992), IEC 60625 (1979, 1993) | [113, 114, 115]
HP-IL | Hewlett-Packard (U.S.) | 1980 (start of development 1976) | — | [18]
I2C | Philips (Netherlands) | 1981 | — | [116]
M-Bus | University of Paderborn, TI, Techem (Germany) | 1992 | EN 1434-3 (1997) | [117]
Measurement Bus | Industry consortium (Germany) | 1988 | DIN 66348-2 (1989), DIN 66348-3 (1996) | —
TABLE 7.9 Automotive and Aircraft Fieldbuses

Fieldbus | Developer (Country) | Introduced in | Standard | References
ABUS | Volkswagen (Germany) | 1987 | — | [118]
ARINC | Aeronautical Radio, Inc. (U.S.) | 1978 | AEEC ARINC 429 (1978, 1995) | —
CAN | Bosch (Germany) | 1986 (start of development 1983), CAL 1992 | ISO 11898 (1993, 1995), ISO 11519 (1994) | [119]
Flexray | DaimlerChrysler, BMW (Germany) | 2002 | — | —
J1850 | Ford, GM, Chrysler (U.S.) | 1987 | SAE J1850 (1994, 2001), ISO 11519-4 | [118]
J1939 | SAE (U.S.) | 1994 | SAE J1939 (1998) | [118]
LIN | Industry consortium | 1999 | — (open spec) | [118]
MIL-1553 | SAE (military and industry consortium, U.S.) | 1970 (start of development 1968) | MIL-STD-1553 (1973), MIL-STD-1553A (1975), MIL-STD-1553B (1978) | [120]
VAN | Renault, PSA Peugeot-Citroen (France), ISO TC22 | 1988 | ISO 11519-3 (1994) | [118]
SwiftNet | Ship Star Assoc., Boeing (U.S.) | 1997 | IEC 61158 (2000) | —
TTP | Vienna University of Technology (Austria) | 1996 | — | —
TABLE 7.10 Fieldbuses for Industrial and Process Automation and Their Foundations

Fieldbus | Developer (Country) | Introduced in | Standard | References
ARCNET | Datapoint (U.S.) | 1977 | ANSI ATA 878 (1999) | [121]
ASi | Industry and university consortium (Germany) | 1991 | EN 50295-2 (1998, 2002), IEC 62026-2 (2000) | [121]
Bitbus | Intel (U.S.) | 1983 | ANSI IEEE 1118 (1990) | —
CC-Link | Mitsubishi (Japan) | 1996 | — (open spec) | —
CANopen | CAN in Automation (user group, Germany) | 1995 (start of development 1993) | EN 50325-4 (2002) | [119]
ControlNet | Allen-Bradley (U.S.) | 1996 | EN 50170-A3 (2000) | [121]
DeviceNet | Allen-Bradley (U.S.) | 1994 | EN 50325-2 (2000) | [119]
FF | Fieldbus Foundation (industry consortium, U.S.) | 1995 (start of development 1994) | BSI DD 238 (1996), EN 50170-A1 (2000) | —
Hart | Rosemount (U.S.) | 1986 | — (open spec) | —
Interbus-S | Phoenix Contact (Germany) | 1987 (start of development 1983) | DIN 19258 (1993), EN 50254-2 (1998) | [122]
MAP | General Motors (U.S.) | 1982 (start of development 1980) | MAP 1.0 (1982), MAP 2.0 (1985), MAP 3.0 (1988) | [123]
MMS | ISO TC 184 | 1986 | ISO/IEC 9506 (1988, 2000) | —
Modbus | Gould, Modicon (U.S.) | 1979 | — (open spec) | —
PDV-Bus | Industry and university consortium (Germany) | 1979 (start of development 1972) | DIN 19241 (1982) | [124, 115]
P-NET | PROCES-DATA (Denmark) | 1983 | DS 21906 (1990), EN 50170-1 (1996) | [14]
PROWAY C | IEC TC 65 | 1986 (start of development 1975) | ISA S72.01 (1985), IEC 60955 (1989) | —
Profibus | Industry and university consortium (Germany) | 1989 (start of development 1984) | FMS: DIN 19245-1 and -2 (1991), DP: DIN 19245-3 (1993), PA: DIN 19245-4 (1995), FMS/DP: EN 50170-2 (1996), DP: EN 50254-3 (1998), PA: EN 50170-A2 (2000) | —
SDS | Honeywell (U.S.) | 1994 | EN 50325-3 (2000) | [119]
Sercos | Industry consortium (Germany) | 1989 (start of development 1986) | IEC 61491 (1995), EN 61491 (1998) | —
Seriplex | APC, Inc. (U.S.) | 1990 | IEC 62026-6 (2000) | [121]
SINEC L2 | Siemens (Germany) | 1992 | — | —
SP50 Fieldbus | ISA SP 50 (U.S.) | 1993 | ISA SP 50 (1993) | —
(World)FIP | Industry and university consortium (France) | 1987 (start of development 1982) | AFNOR NF C46601-7 (1989–1992), EN 50170-3 (1996), DWF: AFNOR NF C46638 (1996), DWF: EN 50254-4 (1998) | [16]
TABLE 7.11 Fieldbuses for Building and Home Automation

Fieldbus | Developer (Country) | Introduced in | Standard | References
BACnet | ASHRAE SPC135P (industry consortium, U.S.) | 1991 | ANSI/ASHRAE 135 (1995), ENV 1805-1 (1998), ENV 13321-1 (1999), ISO 16484-5 (2003) | —
Batibus | Industry consortium (France) | 1987 | AFNOR NF 46621-3 and -9 (1991), ENV 13154-2 (1998) | —
CEBus | Industry consortium (U.S.) | 1984 | ANSI EIA 600 (1992) | —
EHS | Industry consortium (Europe) | 1987 | ENV 13154-2 (1998) | —
EIB | Industry consortium (Germany) | 1990 | AFNOR NFC 46624-8 (1991), DIN V VDE 0829 (1992), ENV 13154-2 (1998) | [125]
HBS | Industry consortium (Japan) | 1986 (start of development 1981) | EIAJ/REEA ET2101 | —
LonWorks | Echelon (U.S.) | 1991 | ANSI EIA 709 (1999), ENV 13154-2 (1998) | [121, 126]
Sigma I | ABB (Germany) | 1983 | — | —
X10 | Pico Electronics (U.K.) | 1978 (start of development 1975) | — | [126]
8
The WorldFIP Fieldbus

Jean-Pierre Thomesse
Institut National Polytechnique de Lorraine

8.1 Introduction ............ 8-1
8.2 WorldFIP Origin ............ 8-2
8.3 Requirements ............ 8-2
8.4 Choices of WorldFIP ............ 8-3
    Identified Data vs. Classical Messages • Periodic and Aperiodic Traffic • Timeliness Attributes and Mechanisms for Time-Critical Systems
8.5 WorldFIP Architecture ............ 8-5
    Architecture and Standardization
8.6 Physical Layer ............ 8-6
    Figures • Topology • Coding
8.7 Data Link and Medium Access Control Layers ............ 8-7
    Introduction • Basic Mechanism • The Aperiodic Server • Variable Transfer Services • Message Transfer • Synthesis on the Data Link Layer
8.8 Application Layer ............ 8-11
    Services Associated with the Variables • Temporal Validity of Variables • Synchronous and Asynchronous • Synchronization Services • Services Associated with Variables Lists
8.9 WorldFIP State and Technology ............ 8-16
    Technology • Fieldbus Internet Protocol • New Development
8.10 Conclusion ............ 8-16
References ............ 8-17
8.1 Introduction

This chapter is dedicated to the study of the WorldFIP* fieldbus. It is one of the first fieldbuses, born at the beginning of the 1980s. It is also at the origin of several main concepts that are now implemented in other fieldbuses. For example, the producer–consumer model, the timeliness attributes used to qualify the validity of data, and the time coherence and consistency attributes are some of the most important WorldFIP contributions. Many of them come from research activities (academic and industrial) and from the analysis of the requirements of highly distributed, real-time applications. That is why the first sections of this chapter will briefly relate the origin of this fieldbus (Section 8.2), the requirements (Section 8.3), and the choices made in the WorldFIP specifications (Section 8.4). The technical aspects will be further studied in the four following sections: the architecture in Section 8.5, the physical layer in Section 8.6, the data link layer in Section 8.7, and the application layer in Section 8.8. The current state of this fieldbus is given in the last section (Section 8.9) before the conclusion and bibliography. A large body of theoretical work has been developed over more than 15 years to prove the protocols, to evaluate the performances, to guarantee the time constraints (Pleinevaux and Decotignie, 1988; Song et al., 1991; Simonot et al., 1995), or to estimate the performances of distributed applications (Bergé et al., 1995).

*WorldFIP is the current name of the previous FIP network. FIP stands for Factory Instrumentation Protocol, but in the French language, the acronym means Flux d'Information (de et vers le) Processus.

The Industrial Communication Technology Handbook
8.2 WorldFIP Origin

The first works on the WorldFIP specification started in September 1982, in a working group under the aegis of the French Ministry of Research and Technology. This working group was composed of representatives of end users, engineering companies, and laboratories. It was important not to include providers and manufacturers of networks at the beginning, in order to carry out a real analysis of end users' needs without the possible influence of existing products or projects. The first objective of this work was to analyze the needs for communication in automatic control systems, but it was necessary to take into account the following points:
• Local area networks were just starting to be developed.
• The Manufacturing Automation Protocol (MAP) project was beginning in the U.S. (MAP, 1988).
• New ideas appeared on application architectures, especially the idea of truly distributed systems.
• Intelligent devices started their development thanks to the progress of microelectronics.
The development of WorldFIP started in this context, with essentially two main types of contributions, coming from research and from end users' experiences. The functional analysis of the communication needs in automatic control systems led to the distinction between two main flows:
• A flow of information associated with the control rooms in continuous processes or with the plant in discrete part manufacturing applications
• A flow associated with the field devices, called the "flow of information of the process," which will be analyzed later and which led to the WorldFIP fieldbus profiles
To satisfy the former, different local area networks already existed, while nothing yet existed for the latter. It was then decided to specify a so-called instrumentation network.* The first specification of the FIP fieldbus was published in May 1984 (Galara and Thomesse, 1984). It was only at the beginning of the 1990s that the name was changed to WorldFIP. More information on the origins may be found in Thomesse (1993, 1998). The first results were presented to support a standardization process at the International Electrotechnical Commission (IEC) (Gault and Lobert, 1985).

*At this time the word fieldbus was not yet in use.
8.3 Requirements

The first (and abstract) requirement was to define a communication system to take the place of the usual connection standards (4 to 20 mA) between the devices and controllers in an automation system. Another formulation was more complete but also abstract: the objective was the design of an operating system for instrumentation. There was in fact a real need to build not only a communication system but truly distributed systems. It was then important to provide the right, well-suited services for the distribution of the applications (facilities for the management of coherence and consistency, and for dealing with the impossibility of a common global state and of perfect clock synchronization). The requirements could be stated at different abstraction levels. Starting from the most general (see above), they have led to the following:
• The conventional connections between the field devices and the control functions should be expensive enough to justify specifying another communication technique.
• The access to the data by the network should be standardized.
• The location of data should be transparent for the user.
• The system should be built to meet different dependability requirements by using the same basic components.
• The competitiveness of companies should be improved by such technologies.
• The development should go through international standardization.
• The protocols should be implemented in silicon.
• The data flows between the functions and the set of field and control equipment have then been identified and analyzed, leading to the identification of special needs for a so-called instrumentation network.
These led to the identification of the traffic and then to the more technical requirements:
• The exchanged data come from sensors or are sent to actuators. Most of them are known and identified (temperature, pressure, speed, position, and so on), but other transmitted data are not identified in the same sense, and usual messages must also be transmitted.
• The exchanges may be periodic or aperiodic. They are time constrained in terms of period, jitter, deadline, lifetime, promptness, and refreshment.
• The most critical traffic must be managed periodically, but sporadic traffic must also take place.
• Timeliness is important for the quality of service and the dependability of the applications.
• The distributed decisions must be consistent; i.e., the data and the physical process must be seen in a coherent manner by all application processes. The impossible global state must be approached by a reliable broadcasting of states and events.
8.4 Choices of WorldFIP

According to the previous requirements, the WorldFIP solution is based on a few basic ideas, which give this fieldbus the right quality of service:
• The distinction of two types of messages: the notion of identified data vs. the concept of classical messages, associated with the respective cooperation models, producer–consumer vs. client–server
• The predefined scheduling of periodic traffic, with periods suited to the physical needs, especially to sampling theory
• The online scheduling of sporadic traffic, with priority given to the critical traffic
• The cyclic updating of real-time data at the consumers' sites
• The timeliness attributes and mechanisms for time-critical systems
These choices will be presented and analyzed below.
8.4.1 Identified Data vs. Classical Messages

Data provided by sensors, data sent to the actuators, and more generally input/output (I/O) and control data are all identified in a given process. They are known within the application. These data are also called identified variables or identified objects. They are often simple objects (temperature, pressure, speed, etc.) of fixed syntax (integer, real, Boolean, record, list, or other structured data). For instance, a temperature sensor can produce a temperature value coded as an integer or as a real, and the manufacturer identification as a character string. The identified data receive a name, which is a global name for the whole application. This name is also used for managing access to the medium. Each variable value has a single producer and one or more consumers. Since transferred values correspond to variables in the process, an identifier is attached to each variable whose value is to be transmitted on the network. This identifier is used as the source address to control the medium access. The destination is not indicated. Consumers are responsible for deciding to update their copies of the data on reception, by recognizing the corresponding identifier. This is the so-called source addressing. This addressing technique offers several advantages. It allows communication in a one-to-many manner with broadcast. Not only is the communication channel used efficiently when the same information has to be transmitted to more than one consumer, but coherence may also be obtained with reliable broadcast. A new receiver may be added without address modification. Identifying the variables instead of the sources of the information on the variables offers an additional advantage: the variable is no longer bound to a node of the network. For example, in case of failure of the node providing the variable value, a new source may become active and replace the failed node without any modification of the receivers. For each identified object, a single active producer is defined, and all other stations may be defined as consumers.
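To make the source-addressing idea concrete, the following minimal Python sketch (not taken from the WorldFIP specification; the class name Station, the function broadcast, and the identifier "PRESSURE_3" are illustrative inventions) shows stations deciding, purely from a locally configured identifier table, whether to consume a broadcast (identifier, value) pair:

    # Minimal sketch of identifier-based (source) addressing, assuming an
    # abstract lossless broadcast medium. All names are illustrative only.

    class Station:
        def __init__(self, name, produces=(), consumes=()):
            self.name = name
            self.produces = set(produces)   # identifiers this station produces
            self.consumes = set(consumes)   # identifiers this station subscribes to
            self.values = {}                # local production values
            self.copies = {}                # local consumer copies (one-place buffers)

        def on_frame(self, identifier, value):
            # No destination address in the frame: every station checks the
            # identifier against its own configuration.
            if identifier in self.consumes:
                self.copies[identifier] = value   # overwrite the local copy

    def broadcast(stations, identifier, value):
        """Deliver one (identifier, value) pair to every station on the bus."""
        for s in stations:
            s.on_frame(identifier, value)

    # Example: one producer and two consumers of the same identifier.
    sensor = Station("sensor", produces=["PRESSURE_3"])
    plc = Station("plc", consumes=["PRESSURE_3"])
    display = Station("display", consumes=["PRESSURE_3"])
    bus = [sensor, plc, display]

    sensor.values["PRESSURE_3"] = 4.2
    broadcast(bus, "PRESSURE_3", sensor.values["PRESSURE_3"])
    assert plc.copies["PRESSURE_3"] == display.copies["PRESSURE_3"] == 4.2

Adding a further consumer is only an extra entry in its own subscription set; neither the producer nor the frame changes, which is exactly the property emphasized above.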
8.4.2 Periodic and Aperiodic Traffic

Control systems are usually based on sampling theory, and the input and output data should therefore be transferred periodically. WorldFIP has chosen to favor the periodic traffic of identified objects between producer and consumers. Variable values are stored in erasable buffers rather than in queues. There is neither acknowledgment nor retransmission for variable transfers. WorldFIP is, from this point of view, a time-triggered system (Kopetz, 1990). WorldFIP may also be seen as a distributed database updating and management system. The producer of an identified object periodically updates its own buffer, WorldFIP periodically updates the buffers at the consumer locations, and these consumers may then periodically use the copy of the produced value. If a failure occurs during the transmission, the last value remains available to the consumer until it receives a new one. In WorldFIP, a one-place erasable buffer is associated with each variable at its production and consumption locations. The usual acknowledgments are not necessary, and retransmissions are avoided in case of error. The question is how to handle critical data like alarms or rarely occurring events. In WorldFIP, there are two possible ways, depending on the criticality. If no real-time reaction is required, the best is to use the usual message transfer. Otherwise, the only good solution is to transform the alarm into a variable whose value reflects the presence of the alarm and to transfer this value periodically. One may think that this results in a waste of bandwidth. This is true, but it is the price to pay to ensure a deterministic response time; a rough estimate of this cost is sketched below. Moreover, multicast transfers are complex when acknowledgments from each receiver are required. In FIP, the choice to suppress acknowledgments drastically simplifies the solution.
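As a hedged estimate of that bandwidth price, the short calculation below uses the frame formats described later in this chapter (an ID frame of 8 + 16 + 16 bits plus 24 bits of physical-layer framing, and an RP frame of 8 bits of control, the value, 16 bits of CRC, and 24 bits of framing; see Sections 8.6.3 and 8.7.2). The 10 ms polling period and the 1 Mbps rate are arbitrary example values, and inter-frame turnaround times are ignored:

    # Cost of carrying a 1-byte alarm variable periodically instead of as an event.
    # Frame sizes follow Sections 8.6.3 and 8.7.2; period and rate are assumptions.

    DATA_RATE_BPS = 1_000_000          # 1 Mbps shielded twisted pair
    POLL_PERIOD_S = 0.010              # alarm variable scanned every 10 ms

    id_frame_bits = 8 + 16 + 16 + 24   # control + identifier + CRC + framing
    rp_frame_bits = 8 + 8 + 16 + 24    # control + 1-byte value + CRC + framing

    bits_per_poll = id_frame_bits + rp_frame_bits
    bandwidth_share = bits_per_poll / (DATA_RATE_BPS * POLL_PERIOD_S)

    print(f"{bits_per_poll} bits per poll -> {bandwidth_share:.2%} of the bus; "
          f"worst-case detection latency is about one period "
          f"({POLL_PERIOD_S * 1000:.0f} ms) plus the transaction time")

Even at a fast 10 ms polling rate, the alarm costs on the order of 1% of a 1 Mbps bus, which illustrates the deterministic-latency trade-off described above.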
8.4.3 Timeliness Attributes and Mechanisms for Time-Critical Systems

Because identified data are transferred periodically between a producer and its consumers, no acknowledgment has been provided and, basically, no retransmission is allowed. Consider the three following elements: a producer, the consumers, and the bus. The producer is a process producing a data item named X at a given period. Several processes consume X at different periods, and the bus updates at a given period the copy of X at each consumer site from the original of X. The question at each consumption site is: Is the value of X fresh, too old, or obsolete? Therefore, timeliness attributes have been defined in order to indicate to the consumers whether the data are correct and, if not, the cause of the error. These attributes are called refreshment and promptness. The former indicates whether the production is timely; the latter indicates whether the reception is correct. Based on these elementary attributes, it is then possible to define the time coherence of actions, i.e., the fact that different distributed actions take place in a given time interval. That is also the definition of the simultaneity of actions.
FIGURE 8.1 Simplified architecture of WorldFIP (MPS and MMS above the identified traffic management and messaging management, over the physical layer).

FIGURE 8.2 Architecture of WorldFIP (as Figure 8.1, with the (identifier, value) transfer shown above the physical layer).
Other mechanisms have been introduced as synchronization mechanisms between the local operations and the behavior on the network. All these mechanisms will be detailed in Section 8.8.
8.5 WorldFIP Architecture

The WorldFIP architecture is shown in Figure 8.1 and Figure 8.2, according to the Open Systems Interconnection (OSI) architecture model (Zimmermann, 1980). All elements were standardized in France between 1989 and 1992 (AFNOR, 1989). This architecture shows that two main profiles may be used. One is defined to handle the traffic of identified objects; the other is defined for the usual messaging exchanges. This architecture follows directly from the needs analysis. It is important to note that the messaging services in the data link layer are related to point-to-point exchanges of frames, with storage in queues, with or without acknowledgment, and with replication detection. The identified traffic services are related to the exchange of data in a broadcast manner, with storage in erasable buffers, without acknowledgment, except through the space consistency mechanism at the application layer. The messaging periodic service (MPS) is the service element for the periodic and aperiodic exchanges of identified data. It uses the identified traffic services of the data link layer. MMS is a subset of the well-known MMS standard (ISO, 1990) and uses the messaging services at the data link layer. We may say that the first profile (on the left of the figure) is a profile for real-time traffic management, with guaranteed quality-of-service and timeliness properties. The second profile is used more for noncritical exchanges, e.g., during commissioning, for maintenance and configuration, or more generally for management. Notice that the messaging services are based on the same medium access control.
8.5.1 Architecture and Standardization The European standard EN 50170 [CENELEC, 1996a] contains three national standards in Europe.* Volume 3 outlines all WorldFIP specifications according to the organization shown in Table 8.1 and Figure 8.3. *The other volumes are concerned with P-Net and Profibus.
TABLE 8.1 Parts of the European 50170-3 Standard

EN 50170 volume 3 | Part 1-3 | General Purpose Field Communication System
EN 50170 volume 3 | Part 2-3 | Physical Layer
 | Sub-part 2-3-1 | IEC Twisted Pair (IEC 61158-2)
 | Sub-part 2-3-2 | IEC Twisted Pair Amendment
 | Sub-part 2-3-3 | IEC Fiber Optic
EN 50170 volume 3 | Part 3-3 | Data Link Layer
 | Sub-part 3-3-1 | Data Link Layer Definitions
 | Sub-part 3-3-2 | FCS Definition
 | Sub-part 3-3-3 | Bridge Specification
EN 50170 volume 3 | Part 5-3 | Application Layer Specification
 | Sub-part 5-3-1 | MPS Definition
 | Sub-part 5-3-2 | SubMMS Definition
EN 50170 volume 3 | Part 6-3 | Application Protocol Specification
EN 50170 volume 3 | Part 7-3 | Network Management
TABLE 8.2 Data Rate and Maximum Possible Lengths

Data Rate | Length without Repeater | Length with 4 Repeaters
31.25 kbps | 10 km | 50 km
1 Mbps | 1 km | 5 km
2.5 Mbps | 700 m | 3.5 km
Several profiles of WorldFIP have been defined. One of them, the simplest, providing only periodic traffic of identified data, is called Device WorldFIP (DWF) and is standardized (AFNOR, 1996; CENELEC, 1996b).
8.6 Physical Layer

The physical layer of WorldFIP was naturally the first to conform to IEC 1158-2,* because this standard was defined starting from the French FIP standard C46-604. The medium may be a shielded twisted pair or an optical fiber.
8.6.1 Figures

8.6.1.1 Data Rates
The standard defines three data rates for the shielded twisted pair: 31.25 kbps, 1 Mbps, and 2.5 Mbps. For the optical fiber, a fourth data rate, 5 Mbps, is defined. However, some experiments have been carried out with other data rates, for example, 25 Mbps, with transfer of voice and video.

8.6.1.2 Maximum Length
The maximum number of stations is 256 and the maximum number of repeaters is 4. According to the data rate and the number of repeaters, Table 8.2 gives the possible maximum lengths.
8.6.2 Topology The topology for a twisted shielded pair may be like that shown in Figure 8.4.
*This number was the previous number of the current 61158 standard.
FIGURE 8.3 Architecture and European standard (each layer mapped to its part of EN 50170 volume 3; see Table 8.1).
FIGURE 8.4 Example of topology. Legend: JB: junction box; TAP: connector; DS: locally disconnectable device (diffusion box); NDS: not disconnectable device; REP: repeater; PC: principal cable.
8.6.3 Coding

The coding is based on a Manchester code. A physical data frame is composed of three parts: a frame start sequence composed of a preamble and a frame start delimiter (PRE and FSD), the data link information, and the frame end delimiter (FED). In total, 24 bits are added to each data link frame.
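As a reminder of how Manchester coding works in general (this is generic Manchester coding, not the exact WorldFIP symbol definitions for PRE, FSD, and FED; the polarity convention used below is an assumption), each bit is sent as two half-bit levels with a transition in the middle of the bit cell:

    # Generic Manchester encoder/decoder: one bit becomes two half-bit levels,
    # so the signal always carries a mid-bit transition (data plus clock).
    # Convention assumed here: '1' -> high then low, '0' -> low then high.

    def manchester_encode(bits):
        levels = []
        for b in bits:
            levels += ([1, 0] if b == 1 else [0, 1])
        return levels

    def manchester_decode(levels):
        bits = []
        for first, second in zip(levels[0::2], levels[1::2]):
            if (first, second) == (1, 0):
                bits.append(1)
            elif (first, second) == (0, 1):
                bits.append(0)
            else:
                raise ValueError("missing mid-bit transition: not valid Manchester data")
        return bits

    frame_bits = [1, 0, 1, 1, 0]
    assert manchester_decode(manchester_encode(frame_bits)) == frame_bits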
8.7 Data Link and Medium Access Control Layers

8.7.1 Introduction
The WorldFIP medium access control is centralized and managed by a so-called bus arbitrator (BA). All exchanges are under the control of this bus arbitrator. They are scheduled according to the timing requirements and time constraints (Cardeira and Mammeri, 1995), but the scheduling policy itself is not specified by the standard. The data link layer provides two types of services: for the identified objects and for the messages. Both may take place periodically. Thanks to the medium access control (MAC) protocol, it is easy to manage the periodic traffic, which may be scheduled before runtime. It is then necessary to provide well-suited services, and the associated protocol mechanisms, for sporadic or random traffic. As usual, the random traffic is managed by a periodic server: when a station is polled, it may express a request for extra polling. Such requests, corresponding to the aperiodic traffic, are managed dynamically by the bus arbitrator.
8.7.2 Basic Mechanism

The medium access control is based on the following principle: each exchange is composed of two frames, a request and a response. All of the exchanges are based on the couple (name of information, value of the information), implemented by two frames: an identification frame and a value frame. To exchange the value of an object, the bus arbitrator sends a frame that contains the identifier of this object. This frame is denoted ID_DAT (identification of data) (Figure 8.5a). It is received by all currently active stations and recognized by the station that produces the identified object, and also by the consumer stations that subscribe to this object* (Figure 8.5b). The station that recognizes itself as the producer sends the current value of the identified object. This value is transferred in a so-called RP_DAT frame (response) (Figure 8.5c). All interested stations, including all subscribers and the bus arbitrator, receive this RP_DAT frame (Figure 8.5d). The ID and RP frames have the following formats:
ID frame:
• A control field (8 bits), whose role will be developed later
• The identifier of the object (16 bits)
• A cyclic redundancy check (CRC) (16 bits)
RP frame:
• A control field (8 bits), whose role will be developed later
• The value of the object identified in the previous frame (maximum of 256 bytes)
• A CRC (16 bits)
The idea is now to extend this simple mechanism of polling by designation of the data to be sent, in order to handle message transfers and the transfer of aperiodic data or messages.

*Notice that the producer–consumer model is also called publisher–subscriber, following the concept of subscribing to the data by the consumers.
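The following minimal Python sketch (purely illustrative: the classes, the scanning table, and the identifier "TEMP_7" are inventions, and timing, errors, and the control field are ignored) walks a bus arbitrator through its scanning table, emitting one ID_DAT per identifier and letting the producer answer with an RP_DAT that every subscriber copies into its one-place buffer:

    # Sketch of one elementary WorldFIP transaction per identifier:
    # BA broadcasts ID_DAT(id); the producer answers RP_DAT(value);
    # every consumer overwrites its local buffer with that value.

    class Station:
        def __init__(self, name):
            self.name = name
            self.produced = {}    # identifier -> current produced value
            self.consumed = {}    # identifier -> one-place consumer buffer

        def on_id_dat(self, identifier):
            """Return an RP_DAT value if this station produces the identifier."""
            return self.produced.get(identifier)

        def on_rp_dat(self, identifier, value):
            if identifier in self.consumed:
                self.consumed[identifier] = value

    class BusArbitrator:
        def __init__(self, scanning_table, stations):
            self.scanning_table = scanning_table   # ordered list of identifiers
            self.stations = stations

        def elementary_cycle(self):
            for identifier in self.scanning_table:
                # ID_DAT is broadcast; exactly one producer is expected to answer.
                answers = [s.on_id_dat(identifier) for s in self.stations]
                values = [v for v in answers if v is not None]
                if values:
                    rp_dat_value = values[0]
                    for s in self.stations:       # RP_DAT is also a broadcast
                        s.on_rp_dat(identifier, rp_dat_value)

    sensor, controller = Station("sensor"), Station("controller")
    sensor.produced["TEMP_7"] = 21.5
    controller.consumed["TEMP_7"] = None
    ba = BusArbitrator(["TEMP_7"], [sensor, controller])
    ba.elementary_cycle()
    print(controller.consumed["TEMP_7"])   # -> 21.5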
8.7.3 The Aperiodic Server

The needs for aperiodic transfer were identified in Section 8.4.2. Aperiodic transfers take place in the time slots left free by the periodic traffic. The aperiodic traffic is dynamically managed by the bus arbitrator.

8.7.3.1 First Stage
The first stage is the expression of the request to the BA by a station. Any producer may request a new exchange by an indication in the control field of the RP frame, when it is polled by an ID_DAT frame. This indication specifies the type of RP frame. Three RP frame types may thus be observed as answers to an ID_DAT frame (listed after Figure 8.5):
FIGURE 8.5 Exchange and updating of a variable: (a) the bus arbitrator broadcasts the identification of VARK; (b) the producer and the consumers of VARK recognize it; (c) the producer broadcasts the new value of VARK while the consumers still hold the old value; (d) all consumer copies are updated with the new value.
RP_DAT: response to ID_DAT without any request
RP_DAT_RQ: response to ID_DAT with a request for an aperiodic exchange of other identified objects
RP_DAT_MSG: response to ID_DAT with a request for an aperiodic exchange of a message

8.7.3.2 Second Stage
The second stage is to satisfy the request. The BA then has to place the right ID frames in a free time slot of the scanning table, according to its own scheduling policy. The ID frames corresponding to the possible requests are ID_RQ and ID_MSG. The former satisfies RP_DAT_RQ and the latter RP_DAT_MSG. Following the reception of ID_RQ, the station at the origin of the request sends an RP_RQ frame containing, in the data field, a list of identifiers that have to be sent as ID frames by the BA. Following the reception of ID_MSG, the station at the origin of the request sends an RP_MSG frame with a message in the data field. This message is specified with or without acknowledgment; the corresponding frames are called RP_MSG_NOACK and RP_MSG_ACK. In the latter case, the receiver sends an RP_ACK after reception of the message. A special frame, RP_FIN, allows the BA to continue its polling.
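A hedged sketch of this two-stage mechanism at the arbitrator side follows (all names are again illustrative, and the scheduling policy — here a plain FIFO of pending requests served one per free slot — is an arbitrary choice, since the policy is not fixed by the standard):

    # Two-stage aperiodic service at the bus arbitrator, sketched with plain data.
    # Stage 1: a polled producer signals a request in the control field (RP_DAT_RQ);
    #          the BA only remembers which identifier asked.
    # Stage 2: in a free slot the BA sends ID_RQ to that identifier, receives an
    #          RP_RQ listing identifiers, and schedules an ID_DAT for each of them.

    from collections import deque

    class AperiodicServer:
        def __init__(self):
            self.pending_requests = deque()   # identifiers whose producer asked for service

        def on_rp_dat(self, identifier, request_flag):
            """Stage 1: record a request signalled in the control field of RP_DAT."""
            if request_flag:
                self.pending_requests.append(identifier)

        def serve_free_slot(self, send_id_rq):
            """Stage 2: use one free slot; return the extra ID_DATs to schedule."""
            if not self.pending_requests:
                return []
            identifier = self.pending_requests.popleft()
            wanted = send_id_rq(identifier)     # RP_RQ payload: a list of identifiers
            return [("ID_DAT", ident) for ident in wanted]

    # Example: the producer of "VALVE_CMD" asks for an aperiodic update of "ALARM_2".
    server = AperiodicServer()
    server.on_rp_dat("VALVE_CMD", request_flag=True)

    def fake_id_rq(identifier):
        # The requesting station answers ID_RQ with an RP_RQ listing what it needs.
        return ["ALARM_2"] if identifier == "VALVE_CMD" else []

    print(server.serve_free_slot(fake_id_rq))   # -> [('ID_DAT', 'ALARM_2')]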
8.7.4 Variable Transfer Services

To each identified object, the data link layer associates a buffer B_DAT_prod at the producer station and a buffer B_DAT_cons at each consumer station. Two main services are defined for writing and reading a buffer: L_PUT and L_GET, respectively. The write service (L_PUT) places the new value in the producer buffer; the previous buffer content is overwritten. The read service (L_GET) gets the value from the consumer buffer. These services do not cause any traffic on the bus. The content of each consumer buffer is updated with the value stored in the producer buffer under the control of the bus arbitrator (Figure 8.5d), as seen in Section 8.7.2. An L_SENT indication informs the producer when the transmission takes place; the consumers are informed by an L_RECEIVED indication when the update occurs.
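A minimal sketch of these local buffer services as they might appear to an application follows (the method names mirror L_PUT, L_GET, L_SENT, and L_RECEIVED, but the class, the callback style, and the Python types are inventions for illustration; note that neither local call touches the bus):

    # One-place erasable buffers with the local services of this section.
    # L_PUT/L_GET are purely local; the network-driven update and the
    # L_SENT / L_RECEIVED indications are triggered by the (simulated) bus.

    class VariableBuffer:
        def __init__(self, on_sent=None, on_received=None):
            self.value = None
            self.on_sent = on_sent            # optional L_SENT indication callback
            self.on_received = on_received    # optional L_RECEIVED indication callback

        # --- services used by the local application process ---
        def l_put(self, value):
            """Overwrite the producer buffer; no bus traffic."""
            self.value = value

        def l_get(self):
            """Read the last value from the buffer; no bus traffic."""
            return self.value

        # --- events driven by the data link layer / bus arbitrator ---
        def transmitted(self):
            if self.on_sent:
                self.on_sent(self.value)

        def updated_from_network(self, value):
            self.value = value
            if self.on_received:
                self.on_received(value)

    producer_buf = VariableBuffer(on_sent=lambda v: print("L_SENT:", v))
    consumer_buf = VariableBuffer(on_received=lambda v: print("L_RECEIVED:", v))

    producer_buf.l_put(42)                                    # local write only
    producer_buf.transmitted()                                # bus took the value -> L_SENT: 42
    consumer_buf.updated_from_network(producer_buf.l_get())   # -> L_RECEIVED: 42
    print(consumer_buf.l_get())                               # -> 42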
8.7.5 Message Transfer

For messages as for identified objects, WorldFIP defines periodic and aperiodic (or on-request) message transfers. Periodic messages are assigned an identifier and a queue. According to the application needs, more than one identifier and one queue may be used when different polling periods are wanted. Messages are deposited at the source side, and the transfer of the content of the queue is periodically triggered by the bus arbitrator (ID_MSG frame). If the queue is not empty, the source data link layer sends the first message in the queue in an RP_MSG_xx frame. The destination data link layer stores the message in the receive queue and, if requested, immediately acknowledges the transfer using an RP_ACK frame. The end of the transaction is signaled by the source to the bus arbitrator using an RP_FIN frame. If the queue is empty, no transfer takes place and only the RP_FIN frame is sent. The polling period is a configuration parameter. For aperiodic message transfer, on the source side, the data link layer defines a single queue, F_MSG_aper, that holds the pending messages. On the destination side, a receiving queue, F_MSG_rec, is defined. As for aperiodic variables, transfer requests are signaled to the bus arbitrator as piggybacks on RP_DAT frames sent in response to ID_DAT frames.
8.7.6 Synthesis on the Data Link Layer

Three types of objects are exchanged according to the basic principle based on the exchange of couples (name, value) (Thomesse and Rodriguez, 1986). These objects are identified objects, lists of identifiers of identified objects, or messages with or without acknowledgment (Figure 8.6). The data link layer protocol may be seen as connection oriented, with the connections established at the configuration stage. These connections are multicast, and the corresponding service access points (SAPs) are represented by the identifiers. Associated with each SAP, different connection end points (CEPs) are represented by the associated objects and the necessary resources:
• A CEP for data exchange, represented by the buffer storing the successive values
• A CEP for the exchange of lists of identifiers, represented by another buffer
• A CEP for the sending of messages, represented by a queue
Each of these CEPs is addressed first by the identifier addressing the SAP and second by the indication in the control field of the ID frame, which specifies the corresponding resource.

FIGURE 8.6 Service access point and connection end point.
8.8 Application Layer

The WorldFIP application layer comprises two application service elements: MPS for real-time data exchange (Thomesse and Delcuvellerie, 1987; Thomesse and Lainé, 1989) and sub-MMS for the usual messaging services and compatibility with other networks. For real-time data exchange, FIP behaves like a distributed database refreshed by the network periodically, or aperiodically on demand. All application services related to the periodic and aperiodic data exchange are called MPS. MPS provides local read–write services (periodic) as well as remote read–write services (aperiodic) on the values of variables or lists of variables. The read services return indications of the age of the object value. Considering a producer producing a data item named X at a given period, consumers consuming X at different periods, and the bus itself updating at a given period the consumers' copies of X from the original of X, the question at a consumption site is: Is the value of X fresh, too old, or obsolete? In WorldFIP, this information is based on local mechanisms and provided as two types of status: the refreshment status, elaborated by the producer, and the promptness status, elaborated by the consumer (Thomesse et al., 1986; Decotignie and Raja, 1993; Lorenz et al., 1994). These statuses are returned by the read services together with the value itself. This information may also be used to check whether a set of variables is time coherent (Figure 8.9). As a variable may have several consumers, there is a need to know whether the different copies of the variable value available to the various consumers are identical. This information, called the spatial coherency status (or spatial consistency) (Saba et al., 1993), is provided by the MPS read services related to lists of variables. The same services also offer a temporal coherence status.
8.8.1 Services Associated with the Variables

A variable can be of a simple type, such as integer, floating point, Boolean, or character, or of a composite type, such as arrays and records. It may have different semantics: variable, synchronization variable, consistency variable, or variable descriptor. Synchronization variables are used to synchronize application processes and also in the elaboration of the temporal and spatial statuses associated with ordinary variables. Consistency variables are used to elaborate the spatial coherence status. Variable descriptors hold all information concerning variables (type, semantics, periodicity, etc.) for configuration, commissioning, maintenance, and management. Three request services, A_READ, A_WRITE, and A_UPDATE, and two indication services, A_SENT and A_RECEIVED, are available. A_UPDATE is used to request an aperiodic transfer of a variable. A_SENT and A_RECEIVED are optional services. When used, A_RECEIVED informs the consumer, at its local communication entity, of the reception of a new variable value. Similarly, A_SENT notifies the producer of the transmission of the produced value. These indication services can be used by application processes (APs) to verify the proper operation of the communication entity and also to synchronize with each other by receiving synchronization variables (see Section 8.8.4). A_READ and A_WRITE exist in two forms, local and remote. A local read of a variable (denoted A_READLOC) provides the requesting AP with the current value of the local variable image. A local write (denoted A_WRITELOC) invoked by an AP updates the local value of the variable to be sent at a next scanning, as specified in the request. As already said, these operations do not invoke any communication. The transfer of the value from the producer to its copies in all consumers is handled by the distribution function of the application layer. The variable value given in an A_WRITELOC will be available for the distributor, which is in charge of broadcasting it to all consumers of this value. The variable value returned by an A_READLOC at a consumer site will be the last value updated by the distribution function. A remote write, A_WRITEFAR, is used by a producer to update the content of the local buffer assigned to the variable specified in the request and to ask for the transfer of this value to the consumers. It may be seen as the combination of an A_WRITELOC service invocation followed by an A_UPDATE. However, as for A_WRITELOC, this service may only be invoked by the producer of the variable. The operations of a remote READ (numbered in Figure 8.7) are as follows:
1. Remote READ request
2. Ask the distributor for an update
3. Transfer order from the distributor
4. Transfer of the producer's variable value to the users
5. Confirmation of the remote READ
In a similar way, with a remote read, A_READFAR, a consumer requests the transfer of the variable value from the producer to all consumers. This value is returned as a result of the request. This service is hence a combination of an A_UPDATE service invocation and an A_READLOC service invocation. However, read services may only be invoked on a consumer. Figure 8.7 shows this service from the producer–consumer model point of view, considering an AP that is a consumer of variable X. In order to request a transfer explicitly, this AP must also be the producer of a variable (Y in the example). Figure 8.8 depicts in detail all the primitive calls and frame exchanges necessary to achieve the remote READ of X. We can see that these remote operations are not symmetrical: the remote write is a service without confirmation, while the remote read is confirmed.
FIGURE 8.7 Remote READ using PDC model (user, producer, and distributor exchanging the five numbered operations listed above).
FIGURE 8.8 Primitives and frame exchanges to achieve a remote READ service. The AP is a consumer of X and a producer of Y; the sequence recoverable from the figure is: A_READFAR request for X, L_UPDATE request for X placed in the demand queue; ID_DAT(Y) answered by RP_DAT_RQ(val(Y)) carrying the request (with an L_SENT indication for Y); ID_RQ(Y) answered by RP_RQ(X); ID_DAT(X) answered by RP_DAT(val(X)); L_RECEIVED indication for X and L_UPDATE confirmation; finally the A_READFAR confirmation returning val(X).
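A hedged sketch of this composition follows (the function names echo the MPS primitives, but the synchronous call style and the Distributor object are simplifications invented for illustration; in the real protocol the update travels through the ID_RQ/RP_RQ and ID_DAT/RP_DAT exchanges of Figure 8.8):

    # A_READFAR sketched as A_UPDATE (ask the distributor to refresh the copy)
    # followed by A_READLOC (read the refreshed local copy).

    class Distributor:
        """Stands in for the bus arbitrator and the data link transfers."""
        def __init__(self, producer_buffers):
            self.producer_buffers = producer_buffers   # identifier -> produced value

        def update(self, identifier, consumer_copies):
            # In reality: ID_RQ / RP_RQ, then an ID_DAT / RP_DAT broadcast.
            consumer_copies[identifier] = self.producer_buffers[identifier]

    def a_readloc(consumer_copies, identifier):
        return consumer_copies.get(identifier)

    def a_update(distributor, identifier, consumer_copies):
        distributor.update(identifier, consumer_copies)

    def a_readfar(distributor, identifier, consumer_copies):
        """Remote read = explicit update request + local read, with confirmation."""
        a_update(distributor, identifier, consumer_copies)
        return a_readloc(consumer_copies, identifier)

    producer_buffers = {"X": 3.14}
    consumer_copies = {"X": None}
    dist = Distributor(producer_buffers)

    print(a_readloc(consumer_copies, "X"))          # -> None (stale local copy)
    print(a_readfar(dist, "X", consumer_copies))    # -> 3.14 (refreshed, then read)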
8.8.2 Temporal Validity of Variables

Normally, according to the producer–consumer model, operations should proceed in the following order: production, transfer, and consumption. The transfer is triggered by the receipt of an identifier, whereas productions and consumptions are triggered by local production or consumption orders. These independent behaviors may lead to abnormal situations. For example, several productions may occur successively without any transfer. Conversely, a number of transfers may take place between two successive productions of a variable. The same problems may arise on the consumer side: a consumer may read the same value of a variable several times, or may not have enough time (or may not be interested) to consume all of the received values. It is thus important to detect these deviations from normal behavior. This means that any consumer should be able to know whether the producer has produced on time, the transfer has been handled on time, and the consumer itself has consumed on time. Finally, the consumer should be able to check whether the value is still temporally valid (Lorenz et al., 1994a; Lorenz et al., 1994b).

8.8.2.1 Refreshment
The refreshment status of a variable is a Boolean that indicates whether a production occurred in the right time window. It is elaborated by the application layer of the producer. The time window is defined by a start event and a duration. In a simplified view, the refreshment is correct (true) if the production occurs in the time window, and it remains true during a given delay, which is the normal production period; after this deadline it becomes false. The value of the current refreshment status is sent with the value and indicates to the consumer whether, from the point of view of the producer and of the transmission, the value is valid (Figure 8.9).

FIGURE 8.9 Refreshment and promptness status elaboration in the synchronous case.
In summary, the refreshment status indicates to a consumer that the producer has produced its value while respecting a production delay called the production period.

8.8.2.2 Promptness
The promptness status of a variable value indicates whether the transmission of the data has been done within the right time window. It is elaborated by the communication entities of its consumers. It is returned with the variable value and the refreshment status as a result of an invocation of A_READ.
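The sketch below turns these two definitions into code under simplifying assumptions: timestamps are assumed available at the producer and the consumer, each time window reduces to "at most one period old," and the function names refreshment_status and promptness_status are illustrative rather than taken from the standard's state machines:

    # Simplified elaboration of the refreshment and promptness statuses.
    # Real WorldFIP uses finite-state machines driven by synchronization
    # variables and timers (Tprod, Tcons); here a timestamp comparison stands in.

    def refreshment_status(production_time, transmission_time, production_period):
        """True if the value put on the bus was produced within one production period."""
        return 0.0 <= (transmission_time - production_time) <= production_period

    def promptness_status(transmission_time, read_time, consumption_period):
        """True if the copy read by the consumer was received within one consumption period."""
        return 0.0 <= (read_time - transmission_time) <= consumption_period

    # A value produced at t = 0.000 s, transmitted at t = 0.004 s, read at t = 0.012 s,
    # with a 10 ms production period and a 20 ms consumption period:
    fresh = refreshment_status(0.000, 0.004, production_period=0.010)    # True
    prompt = promptness_status(0.004, 0.012, consumption_period=0.020)   # True
    print(fresh, prompt)

    # The same value read again much later is no longer prompt:
    print(promptness_status(0.004, 0.060, consumption_period=0.020))     # False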
8.8.3 Synchronous and Asynchronous

Refreshment and promptness are the timeliness attributes associated with a given variable. They indicate whether an event occurs in a time window defined by a starting event and a duration. The WorldFIP fieldbus has defined two types of timeliness attributes, the synchronous type and the asynchronous type.* An attribute is said to be synchronous when the starting event of the time window is the received indication of a dedicated variable. This event is normally periodic, and the bus arbitrator is, as for other variables, in charge of enforcing the period. The duration is the period of production and is managed by the network, through the bus arbitrator behavior. An attribute is said to be asynchronous when the starting event is the previous occurrence of this same event. The duration is the period of production and is here managed by the device itself. In fact, in WorldFIP, two attributes belong to the synchronous type: the so-called synchronous attribute, as defined previously, and the so-called punctual attribute, for which the starting event is the same as for the synchronous attribute but the duration is no longer the period of production; it is a shorter delay. Further details and the finite-state machines of these mechanisms can be found in the C46-602 standard and in Lorenz et al. (1994a).

*The terms synchronous and asynchronous could be replaced by "synchronized by the network" and "locally synchronized," respectively.
8.8.4 Synchronization Services

The processes of an application can be synchronized or asynchronous. An asynchronous application process is one whose execution is independent of the network behavior. An application process is said to be synchronized when its execution is related to the reception of some indication from the network. In many cases, the various distributed processes of an application are synchronized. This synchronization may be ensured through the indications of reception of variables. However, some application processes may not be able to handle such synchronization. For such asynchronous APs that need to participate in a synchronized distributed application, FIP provides a resynchronization mechanism. In addition to the existing buffer, called the public buffer, the resynchronization mechanism associates with each variable a second buffer, the private buffer. The private buffer is only accessible to the corresponding AP. The access can be performed by the local A_READ if the variable is consumed or A_WRITE if the variable is produced. The public buffer is only accessible to the network (Figure 8.10). The resynchronization mechanism consists of copying the content from one buffer to the other according to a synchronization order received via the network. Both variable production and variable consumption can be resynchronized. When variable consumption has to be resynchronized, its value in the private buffer is kept unchanged until the resynchronization order. If a new value is transferred on the network, it is kept in the public buffer. Only at the reception of a resynchronization order is the value in the public buffer copied into the private buffer. The process is similar for variable production. In both cases, the resynchronization order is given by a synchronization variable specified with each variable, produced or consumed, that needs to be resynchronized.
FIGURE 8.10 Resynchronization mechanism: the producer or consumer accesses the private buffer asynchronously, the network accesses the public buffer synchronously, and the occurrences of the same variable are copied between the two buffers on the synchronization order.
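A minimal double-buffer sketch of this mechanism for a consumed variable follows (class and method names are invented, and the synchronization order is modeled as a plain method call instead of the reception of a synchronization variable):

    # Public/private buffer pair for a resynchronized consumed variable.
    # The network may overwrite the public buffer at any time; the AP only
    # ever reads the private buffer, which changes on the resync order alone.

    class ResyncConsumerVariable:
        def __init__(self):
            self.public = None    # written by the network (buffer updates)
            self.private = None   # read by the application process (A_READLOC)

        def network_update(self, value):
            """New value delivered by the bus: it stays in the public buffer."""
            self.public = value

        def resync_order(self):
            """Synchronization order received: expose the latest value to the AP."""
            self.private = self.public

        def a_readloc(self):
            return self.private

    var = ResyncConsumerVariable()
    var.network_update(10)
    var.network_update(11)      # several updates may arrive between two orders
    print(var.a_readloc())      # -> None : the AP still sees its old image
    var.resync_order()
    print(var.a_readloc())      # -> 11   : all synchronized APs switch together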
8.8.5 Services Associated with Variables Lists

A variable list is an unordered set of variables that must satisfy time coherence, i.e., the fact that they are all produced in a given time window. All variables of a list are consumed at the same time by at least one consumption process. When more than one process is concerned with the consumption of a list, another property must be verified: space consistency, i.e., the fact that all copies of all variables of the list are the same on all consumption sites. The variables that comprise the list may be produced on different sites; they need not all be produced by the same producer, as in the usual application layers such as MMS or MMS-like ones (FMS or some sub-MMS, for example). Usually the productions are synchronous operations. The only service defined on lists is A_READLIST, which allows the reading of all variables of a list in a single invocation. This service returns the last received values for the variables in the list and three optional statuses: a production coherence status, a transmission coherence status, and a spatial consistency status. These statuses are provided to account for two important needs in systems using fieldbuses. First, a consumer of several variables is interested in knowing whether the corresponding values have been sampled at nearly the same time, which is called time coherence (Kopetz, 1990). Second, when the value of a variable has been distributed to several consumers, it might be useful to know whether all of the values are identical. This is referred to as spatial consistency. The idea in FIP is not to ensure temporal coherence and spatial consistency, but rather to indicate whether these properties hold. In FIP, the temporal coherence indication is given through two statuses, the production coherence status and the transmission coherence status. The production coherence status is a Boolean elaborated by the consumer application layer; it corresponds to a logical AND operation over all the corresponding refreshment statuses. Similarly, a transmission coherence status is calculated as the logical AND of all the promptness statuses of the variables in the list. The production coherence status and the transmission coherence status together are an indication of the temporal coherence of the variables in the list. The space consistency status of a list is elaborated by the application layer of each consumer of the list. The elaboration mechanism relies on the broadcast of a consistency variable by each of the consumers. This variable indicates, for each copy of the variable list, whether it has been received correctly and in a given window. After the reception of all consistency variables by all consumers, each of them has knowledge of the validity of the variables in the list at all consumers. A logical AND operation over all consistency variables gives the spatial consistency status. To make the variable list transfer more reliable, FIP defines an error recovery mechanism. If needed, a consumer can trigger a retransmission of its consumed variables when an error is detected. This mechanism issues a retransmission request (using the aperiodic variable transfer service of the data link layer) a bounded number of times, up to a maximum defined for each instance of the list. The duration of the whole transaction of the list (including retransmissions) must be bounded by a time window T smaller than the delay between two consecutive synchronization orders. It has been shown in Song et al. (1991) that this recovery mechanism (a kind of grouped acknowledgment technique) is very efficient and can be recommended for use in other multicast protocols.
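A compact sketch of how the three list statuses combine the per-variable statuses (pure AND-combinations as described above; the data structures and the way the consistency votes are collected are simplifications):

    # List-level statuses of A_READLIST, built from per-variable statuses.
    # Each entry: variable name -> (refreshment_status, promptness_status).

    variable_statuses = {
        "TEMP_1": (True, True),
        "TEMP_2": (True, True),
        "FLOW_9": (True, False),   # received too late at this consumer
    }

    production_coherence = all(refresh for refresh, _ in variable_statuses.values())
    transmission_coherence = all(prompt for _, prompt in variable_statuses.values())

    # Spatial consistency: every consumer broadcasts a consistency variable saying
    # whether it received the whole list correctly and within the window.
    consistency_votes = {"consumer_A": True, "consumer_B": True, "consumer_C": False}
    spatial_consistency = all(consistency_votes.values())

    print(production_coherence)    # True  : all values were produced on time
    print(transmission_coherence)  # False : FLOW_9 missed its transmission window
    print(spatial_consistency)     # False : consumer_C did not get a valid copy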
8.9 WorldFIP State and Technology

8.9.1 Technology
Many integrated circuits and software libraries are available for building devices compatible with the standard. They conform to the European standard EN 50170. The circuits cover all physical layer protocols and all profiles of the data link and application layers. The communication components are referenced as FIPIU2, FULLFIP2, and MICROFIP. The circuits for the physical layer are essentially FIELDRIVE and CREOL for copper wire and OPERA-FIPOPTIC for optical fiber. It is important to notice that redundant channels may be used with the FIELDUAL component. TransoFIP and FIELDTR are possible transformers for connection on copper wire. The libraries are used to create the link between the user application and the communication controller; each library is dedicated to a communication component.
8.9.2 Fieldbus Internet Protocol

FIP may now also be interpreted as FieldBus Internet Protocol. Indeed, the messaging services can be used to transfer Hypertext Transfer Protocol (HTTP) protocol data units (PDUs), so that each site on a WorldFIP fieldbus may host a Web server. For remote maintenance and remote configuration applications, it is then possible to access the stations through a browser. Two solutions are currently used:
1. A single station may be seen as the image of all stations of the fieldbus.
2. Each station is directly accessed by tunneling HTTP in the WorldFIP PDUs.
The interest of WorldFIP is that the data flow generated by Internet connections is managed in complete compatibility with the time constraints of the process, and hence with its dependability. The time-critical traffic is always served with priority.
8.9.3 New Development

In 2001, WorldFIP was certified safe according to the "proven in use" procedure defined by IEC 61508. According to this standard, a fieldbus is considered a subsystem. WorldFIP was certified by Bureau Veritas after examination of detailed records from field users covering a sufficient number of very reliable applications, with a high level of confidence in the operational figures. It is the only fieldbus in the world with such a certification.
8.10 Conclusion

The WorldFIP fieldbus is now 20 years old if we consider the beginning of its specification. It is only a few years old if we consider the very recent dates of the international standards (Leviti, 2001), knowing that the definition of the profiles standard (CENELEC, 2003b) is not yet finished. The current integrated circuits (ICs), the third generation since the first IC, are based on the latest developments in microelectronics. WorldFIP is the only fieldbus in the world certified at SIL 3 (safety integrity level 3) (Froidevaux, 2001). WorldFIP occupies a special place in the international market. It is used in all types of industry (iron and steel, car manufacturing, bottling, power plants, etc.), but also in many time-critical applications and embedded systems in trains, buses, ships, and subways, and in a very important application, the Large Hadron Collider at CERN (Centre Européen de Recherche Nucléaire) in Switzerland. The main reason is to be found in the technical specifications, in the services provided, and in the quality of service, essentially from a timeliness point of view. WorldFIP guarantees that time constraints are met, that periods are respected without jitter, and that distributed actions (productions, consumptions, data acquisitions, controls) are synchronized. The validation of data is also an important element of the quality of service and of the dependability of WorldFIP-based systems. The quality of the physical layer (IEC, 1993) and the redundancy capabilities are also important points in choosing WorldFIP for critical applications. Many of these concepts and mechanisms have been taken up in the TCCA (time-critical communication architecture) report (ISO, 1991; Grant, 1992) and in IEC TS 61158 (CENELEC, 2003a). Regarding other and newer requirements, WorldFIP is able to transport voice and video without disturbing or influencing the real-time traffic; some examples of voice transport are already in industrial use in trains. WorldFIP is also able to transport HTTP data units, allowing remote access to WorldFIP-based systems for monitoring, maintenance, configuration, etc.
References

AFNOR (1989). French Standards NF C46601 to C46607. FIP bus for exchange of information between transmitters, actuators and programmable controllers. Published between 1989 and 1992 (in French).
AFNOR (1996). French Standard C46-638, Système de Communication à haute performance pour petits modules de données (WorldFIP Profil 1, DWF).
Bergé, N., G. Juanole, and M. Samaan (1995). Using Stochastic Petri Nets for Modeling and Analysing an Industrial Application Based on FIP Fieldbus. Paper presented at International Conference on Emerging Technologies and Factory Automation, INRIA, Paris.
Cardeira, C. and Z. Mammeri (1995). A schedulability analysis of tasks and network traffic in distributed real-time systems. Measurement, 15, 71–83.
CENELEC (1996a). European standard EN 50170. Fieldbus. Volume 1: P-Net, Volume 2: PROFIBUS, Volume 3: WorldFIP.
CENELEC (1996b). High Efficiency Communications Subsystems for Small Data Packages, CLC TC/65CX, EN 50254.
CENELEC (2003a). prEN61158-2: Digital Data Communication for Measurement and Control: Fieldbus for Use in Industrial Control Systems. Part 2: Physical layer specification, Part 3: Data link layer service definition, Part 4: Data link layer protocol specification, Part 5: Application layer service definition, Part 6: Application layer protocol specification.
CENELEC (2003b). prEN61784-1 (65C/294/FDIS): Digital Data Communications for Measurement and Control: Part 1: Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems.
Decotignie, J.D. and P. Raja (1993). Fulfilling temporal constraints in fieldbus. In Proc. IECON ’93, Maui, HI, pp. 519–524.
Froidevaux, J.-P. (2001). Use of Fieldbus in Safety Related Systems, an Evaluation of WorldFIP According to the Proven-in-Use Concept of IEC 61508. Paper presented at 4th FET, IFAC Conference, Nancy, France.
Galara, D. and J.P. Thomesse (1984). Groupe de réflexion FIP, Proposition d’un système de transmission série multiplexée pour les échanges d’informations entre des capteurs, des actionneurs et des automates réflexes. Ministère de l’Industrie et de la Recherche.
Gault, M. and J.P. Lobert (1985). Contribution for the fieldbus standard. Presentation to IEC/TC65/SC65C/WG6.
Grant, K. (1992). Users Requirements on Time Critical Communications Architectures, Technical Report. ISO TC184/SC5/WG2/TCCA.
IEC (1993). IEC Standard 1158-2, Fieldbus Standard for Use in Industrial Control Systems: Part 2: Physical Layer Specification and Service Definition + AMD1, 1995.
ISO (1990). International Standard ISO 9506, Manufacturing Message Specification (MMS): Part 1: Service Definition, Part 2: Protocol Specification, 1991.
ISO (1991). ISO/TC 184/SC 5/WG 2-TCCA-N56, Draft Technical Report of the TCCA Rapporteurs’ Group of ISO/TC 184/SC 5/WG 2 Identifying User Requirements for Systems Supporting Time-Critical Communications, August 1991.
Kopetz, H. (1990). Event triggered vs. time triggered real time systems. LNCS, 563, 87–101.
Leviti, P. (2001). IEC 61158, An Offence to Technicians. Paper presented at 4th FET, IFAC Conference, Nancy, France.
Lorenz, P. and Z. Mammeri (1994a). Temporal Mechanisms in Communication Models Applied to Companion Standards. Paper presented at SICICA 94, Budapest.
Lorenz, P., J.-P. Thomesse, and Z. Mammeri (1994b). A State-Machine for Temporal Qualification of Time-Critical Communication. Paper presented at 26th IEEE Southeastern Symposium on System Theory, Athens, Ohio, March 20–22.
MAP (1988). General Motors, Manufacturing Automation Protocol, version 3.0.
Pleinevaux, P. and J.-D. Decotignie (1988). Time critical communications networks: field buses. IEEE Network Magazine, 2, 55–63.
Saba, G., J.P. Thomesse, and Y.Q. Song (1993). Space and time consistency qualification in a distributed communication system. In Proceedings of IMACS/IFAC International Symposium on Mathematical and Intelligent Models in System Simulation, Vol. 1, Brussels, Belgium, April 12–16, pp. 383–391.
Simonot, F., Y.Q. Song, and J.P. Thomesse (1995). On message sojourn time in TDM schemes with any buffer capacity. IEEE Transactions on Communication, 43, 2/3/4, 1013–1021.
Song, Y.Q., P. Lorenz, F. Simonot, and J.P. Thomesse (1991). Multipeer/Multicast Protocols for Time-Critical Communication. Paper presented at Multipeer/Multicast Workshop, Orlando, FL.
Thomesse, J.-P. (1993). Le réseau de terrain FIP. Revue Réseaux et Informatique Répartie, Ed. Hermès, 3, 3, 287–321.
Thomesse, J.-P. (1998). A review of the fieldbuses. Annual Reviews in Control, Pergamon, 22, 35–45.
Thomesse, J.-P. and J.-L. Delcuvellerie (1987). FIP: A Standard Proposal for Fieldbuses. Paper presented at IEEE-NBS Workshop on Factory Communications, Gaithersburg, MD, March 17–19.
Thomesse, J.-P., J.-Y. Dumaine, and J. Brach (1986). An industrial instrumentation local area network. Proceedings of IECON, 1, 73–78.
Thomesse, J.-P. and T. Lainé (1989). The field bus application services. In Proceedings of IECON ’89, 15th Conference IEEE-IES Factory Automation, Philadelphia, pp. 526–530.
Thomesse, J.-P. and M. Rodriguez (1986). FIP, A Bus for Instrumentation. Paper presented at Advanced Seminar on Real Time Local Area Networks, Colloque INRIA Bandol, France.
Zimmermann, H. (1980). OSI reference model. The ISO model of architecture for open system interconnection. IEEE Transactions on Communication, 28, 425–432.
9 FOUNDATION Fieldbus: History and Features

Salvatore Cavalieri
University of Catania

9.1 Principles of FOUNDATION Fieldbus
9.2 Technical Description of FOUNDATION Fieldbus
H1 and HSE FOUNDATION Fieldbus User Application Layer • H1 FOUNDATION Fieldbus • HSE FOUNDATION Fieldbus • Open Systems Implementation
9.3 Conclusions
References
FOUNDATION fieldbus is an all-digital, serial, two-way communication system. Its specification has been developed by the nonprofit Fieldbus Foundation [1]. From its very beginning, FOUNDATION Fieldbus has shown two fundamental and, at that time at least, unique features: an emphasis on standardizing the description of the devices to be connected to the fieldbus, and the adoption of both main link access mechanisms (i.e., the token-based and centralized ones) that the International Electrotechnical Commission (IEC)/Instrument Society of America (ISA) fieldbus committee (IEC 61158 and ISA SP50) was trying to combine, from the existing proposals, into a new and complete solution [2][3][4][5]. One of the aims of this chapter is to emphasize the value of the choices made by the Fieldbus Foundation as well as their impact on the current features of the FOUNDATION Fieldbus communication system. Those features will also be described in detail, allowing the reader to clearly understand the key points of the system. The chapter is organized into two parts: Section 9.1 gives an overview of the principles of FOUNDATION Fieldbus, and Section 9.2 discusses the main features of this communication system.
9.1 Principles of FOUNDATION Fieldbus As soon as the first fieldbus communication systems [6][7] appeared on the market, the need for a single fieldbus standard was felt. Over 15 years ago, the International Electrotechnical Commission (IEC) and the Instrument Society of America (ISA) embarked on a joint standardization effort identified by two codes: 61158 on the IEC side and SP50 on the ISA side. The main aim of the standards committee was the definition of a unique communication system able to merge the main features of the fieldbuses available on the market: FIP (Factory Instrumentation Protocol) [8] and Profibus [9]. The Fieldbus Foundation (note that Fieldbus Foundation refers to the name of the association, while FOUNDATION Fieldbus is used for the relevant communication system [1]), established in 1994 as a result of a merger between ISP (Interoperable System Project) and WorldFIP North America, defined a small set of basic principles. Those basic principles included two main cornerstones:
1. The adoption of both main medium access control (MAC) mechanisms that the IEC/ISA fieldbus committee was trying to derive from the existing proposals within a truly complete solution 2. Emphasis on a standard description of the devices to be connected on the fieldbus Cornerstone 1 freed the Fieldbus Foundation from the persistent solution issue: scheduled access vs. circulated token. The IEC 61158 type 1 data link layer (DLL) stated: “Both paradigms, circulated token and scheduled access were good, but insufficient at the same time; they were complementary, not alternative, and a complete fieldbus solution needs the two together” [3]. FOUNDATION Fieldbus, from its establishment, fully adopted an approach providing both the predefined scheduling philosophy of FIP and the token rotation philosophy of Profibus. Section 9.2 provides more details about these two fundamental mechanisms. Cornerstone 2 allowed the Fieldbus Foundation to avoid a situation (which affected most of the previous fieldbus proposals) in which, after defining the communication stack, much more still needed to be done in order to make devices operational once connected to a fieldbus. In fact, the previous fieldbus proposals started their developments by focusing on the communication aspects (physical media, access mechanisms, addressing, connections, quality of service, etc.). That was mostly motivated by the fact that when switching from a dedicated low-frequency 4- to 20-mA signal to a multidata high-frequency serial link, the most evident change concerned the communication mechanism itself: Was the existing cable still good? What is the best encoding method? How can we guarantee noise recovery? Will too many data on the same medium affect their timeliness? And so on. But once the communication aspects were defined and proven, the data that such communication was able to transfer between two or more devices did not, by themselves, make those devices able to interoperate. FOUNDATION Fieldbus had this concept clearly in sight from the beginning and included the definition of data semantics, plus their configuration and use, within the first set of specifications. The only aspect that the Fieldbus Foundation initially and intentionally left out was the higher-speed version of the fieldbus, H2. That was mostly because the market addressed by the Fieldbus Foundation was the replacement of existing 4- to 20-mA devices with ones compliant with FOUNDATION Fieldbus, while reducing the associated costs as much as possible. This market strategy, chosen to ease the adoption of the new fieldbus technology, could be realized only if the already-laid twisted-pair cables (connecting the old 4- to 20-mA devices), which support only the slow-speed version, H1, were retained. This explains why such a strong effort was put into the development of the H1 technology as opposed to the higher-speed H2 version of FOUNDATION Fieldbus. For H2, the Fieldbus Foundation initially planned to adopt the IEC/ISA high-speed standard [2], but ultimately decided to use High-Speed Ethernet (HSE) instead, mainly due to the wide availability of components and the existence of networks in the plants (at least at the backbone level).
9.2 Technical Description of FOUNDATION Fieldbus FOUNDATION fieldbus is an all-digital, serial, two-way communication system. FOUNDATION Fieldbus specifications include two different configurations, H1 and HSE [1][10]. H1 (running at 31.25 kbit/s) interconnects field equipment such as sensors, actuators, and inputs/outputs (I/Os). HSE (running at 100 Mbit/s) provides integration of controllers (such as distributed control systems [DCSs] and programmable logic controllers [PLCs]), H1 subsystems (via a linking device), data servers, and workstations. HSE is based on standard Ethernet technology to perform its role. In detail: • The H1 FOUNDATION Fieldbus communication system is mainly devoted to distributed continuous process control and to replacement of existing 4- to 20-mA devices. Its communication functionalities, specifically foreseen for time-critical applications, are supported by services grouped within levels, as in all other OSI-RM open-system architectures. The number of levels is minimal, in order to guarantee maximum speed in the data handling. Below the fieldbus application layer, H1 FOUNDATION Fieldbus directly presents the data link layer, managing access to the communication
FIGURE 9.1 H1 FOUNDATION Fieldbus vs. ISO/OSI architecture.
FIGURE 9.2 HSE FOUNDATION Fieldbus vs. ISO/OSI architecture.
channel. A physical layer deals with the problem of interfacing with the physical medium. A network and system management layer is also present. Figure 9.1 compares the H1 FOUNDATION Fieldbus architecture against the International Organization for Standardization (ISO)/Open Systems Interconnection (OSI) reference model. • The HSE FOUNDATION Fieldbus defines an application layer and associated management functions, designed to operate over a standard Transmission Control Protocol (TCP)/User Datagram Protocol (UDP)/Internet Protocol (IP) stack, over twisted-pair or fiber-optic switched Ethernet. It is mainly foreseen for discrete manufacturing applications, but of course, it can also be used to interconnect H1 segments, as well as foreign protocols through IP/TCP gateways, in order to build complete plant networks. Figure 9.2 compares the HSE FOUNDATION Fieldbus architecture against the ISO/OSI reference model. Looking at Figure 9.1 and Figure 9.2, it becomes evident that the Fieldbus Foundation has specified a user application layer, significantly differentiating the solution from the ISO/OSI model, which does not define such a layer. The H1 and HSE FOUNDATION Fieldbus user application layer is mainly based on function blocks, providing a consistent definition of inputs and outputs that allows seamless distribution and integration of functionality from various vendors [11].
9.2.1 H1 and HSE FOUNDATION Fieldbus User Application Layer As mentioned above, the Fieldbus Foundation has defined a standard user application layer based on blocks. Blocks are representations of different types of application functions. The types of blocks used in a user application are resource, transducer, and function. Devices are configured by using resource blocks and transducer blocks. The control strategy is built by using function blocks [11] instead.
TABLE 9.1 Basic Function Blocks

Function Block Name                  Symbol Name
Analog Input                         AI
Analog Output                        AO
Bias/Gain                            BG
Control Selector                     CS
Discrete Input                       DI
Discrete Output                      DO
Manual Loader                        ML
Proportional/Derivative              PD
Proportional/Integral/Derivative     PID
Ratio                                RA
FIGURE 9.3 Example of a complete control loop using function blocks (AI, PID, and AO) in FOUNDATION Fieldbus devices on an H1 fieldbus.
9.2.1.1 Resource Block The resource block describes characteristics of the fieldbus device such as the device’s name, manufacturer, and serial number. There is only one resource block in a device. 9.2.1.2 Function Block Function blocks (FBs) provide the control system behavior. The input and output parameters of function blocks running in different devices can be linked over the fieldbus. The execution of each function block is precisely scheduled. There can be many function blocks in a single user application. The Fieldbus Foundation has defined sets of standard function blocks. Ten standard function blocks for basic control are defined in [12]. These blocks are summarized in Table 9.1. Other, more complex standard function blocks are defined in References [13] and [14]. The flexible function block is defined in [15]. A flexible function block (FFB) is a user-defined block. An FFB allows a manufacturer or user to define block parameters and algorithms to suit an application that interoperates with standard function blocks and host systems. Function blocks can be built into fieldbus devices as required in order to achieve the desired device functionality. For example, a simple temperature transmitter may contain an analog input (AI) function block. A control valve might contain a proportional/integral/derivative (PID) function block as well as the expected analog output (AO) block. Thus, a complete control loop can be built using only a simple transmitter and a control valve (Figure 9.3). 9.2.1.3 Transducer Blocks Like the resource blocks, the transducer blocks are used to configure devices. Transducer blocks decouple function blocks from the local input/output functionalities required in order to read sensors or to command an actuator’s output. They contain information such as calibration date and sensor type [16][17].
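To make the linking of function block inputs and outputs concrete, the following minimal Python sketch mimics the loop of Figure 9.3: an AI block feeding a PID block, which drives an AO block, all executed once per macrocycle. The class names, the simplified PID algorithm, and the example values are assumptions for illustration and do not reproduce the FF-891 block model.

# Illustrative sketch (not the FF-891 block model): AI -> PID -> AO control loop.

class AI:
    """Analog input block: reads a process value via a transducer block."""
    def __init__(self, read_sensor):
        self.read_sensor = read_sensor
        self.out = 0.0
    def execute(self):
        self.out = self.read_sensor()

class PID:
    """Simplified PID block (proportional and integral terms only)."""
    def __init__(self, setpoint, kp=1.0, ki=0.1, period_s=1.0):
        self.setpoint, self.kp, self.ki, self.period_s = setpoint, kp, ki, period_s
        self.integral = 0.0
        self.out = 0.0
    def execute(self, pv):
        error = self.setpoint - pv
        self.integral += error * self.period_s
        self.out = self.kp * error + self.ki * self.integral

class AO:
    """Analog output block: drives the actuator via a transducer block."""
    def __init__(self, write_actuator):
        self.write_actuator = write_actuator
    def execute(self, value):
        self.write_actuator(value)

# Linking AI.OUT -> PID.IN and PID.OUT -> AO.CAS_IN, executed once per macrocycle.
ai = AI(read_sensor=lambda: 48.7)          # pretend temperature measurement
pid = PID(setpoint=50.0)
ao = AO(write_actuator=lambda v: print(f"valve command: {v:.2f}"))

for _ in range(3):                         # three macrocycles
    ai.execute()
    pid.execute(ai.out)
    ao.execute(pid.out)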
9.2.2 H1 FOUNDATION Fieldbus The H1 FOUNDATION Fieldbus is made up of different layers (as shown in Figure 9.1), whose functionalities will be described in the following. 9.2.2.1 H1 FOUNDATION Fieldbus Physical Layer The H1 FOUNDATION Fieldbus physical layer has been conceived to receive messages from the communication stack and convert them into physical signals on the fieldbus transmission medium, and vice versa [18]. Conversion tasks include the adding and removing of preambles, start delimiters, and end delimiters. The preamble is used by the receiver to synchronize its internal clock with the incoming fieldbus signal. The receiver uses the start delimiter to find the beginning of a fieldbus message. After finding the start delimiter, the receiver accepts data until the end delimiter is received. The physical layer is defined by approved standards issued by the IEC and ISA. In particular, the FOUNDATION Fieldbus H1 physical layer is the 31.25-kbaud version of the IEC (type 1)/ISA fieldbus [2][19]. Signals (±10 mA on a 50-ohm load) are encoded using the synchronous Manchester biphase-L technique and can be conveyed on low-cost twisted-pair cables. The signal is called synchronous serial because the clock information is embedded in the serial data stream. Data are combined with the clock signal while creating the fieldbus signal. The receiver of the fieldbus signal interprets a positive transition in the middle of a bit time as a logical 0 and a negative transition as a logical 1. Special codes are defined for the preamble, start delimiter, and end delimiter. Special N+ and N– characters are used in the start delimiter and end delimiter. Note that the N+ and N– signals do not have a transition in the middle of a bit time. 9.2.2.1.1 Fieldbus Signaling The transmitting device delivers ±10 mA at 31.25 kbit/s into a 50-ohm equivalent load to create a 1.0-volt peak-to-peak voltage modulated on top of the direct current (DC) supply voltage. The DC supply voltage can range from 9 to 32 volts. The 31.25 kbit/s fieldbus also supports intrinsically safe (I.S.) fieldbuses for bus-powered devices. To accomplish this, an I.S. barrier is placed between the power supply in the safe area and the I.S. device in the hazardous area. In this case (I.S. applications), the allowed power supply voltage depends on the barrier rating. 9.2.2.1.2 Fieldbus Wiring H1 FOUNDATION Fieldbus wiring is based on trunk cables featuring terminators installed at each end. The H1 FOUNDATION Fieldbus allows for stubs or spurs located anywhere along the trunk and connected to the trunk by a junction box, as shown in Figure 9.4. A single device can be connected to each spur. The existence of spurs allows 31.25 kbit/s devices to operate on wiring previously used for 4- to 20-mA devices [20][21]. More trunks can be connected in a fieldbus link by means of repeaters. Up to five trunks (by means of four repeaters) can be interconnected. Spur length varies from 1 up to 120 m, depending on the number of devices connected to the fieldbus link. The maximum number of devices on a fieldbus link is 32; the actual number depends on factors such as the power consumption of each device, the type of cable, the use of repeaters, etc. In particular, the maximum number of devices is usually 6 for intrinsically safe applications powered through the bus, 12 for nonintrinsically safe applications powered through the bus, and 32 for nonintrinsically safe applications not powered through the bus [22].
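As a brief aside on the signaling scheme described earlier in this section, the following sketch models Manchester biphase-L coding as used here: each bit becomes a pair of half-bit line levels, with a positive mid-bit transition for logical 0 and a negative one for logical 1. It is a didactic model only; preambles, delimiters, and the special N+ and N– symbols are not represented.

# Didactic Manchester biphase-L model: each bit becomes two half-bit line levels.
# Convention per the text: positive mid-bit transition = 0, negative transition = 1.

def manchester_encode(bits):
    halves = []
    for bit in bits:
        if bit == 0:
            halves += [-1, +1]   # low then high: positive transition in the middle
        else:
            halves += [+1, -1]   # high then low: negative transition in the middle
    return halves

def manchester_decode(halves):
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        bits.append(0 if (first, second) == (-1, +1) else 1)
    return bits

frame_bits = [1, 0, 1, 1, 0, 0, 1, 0]
assert manchester_decode(manchester_encode(frame_bits)) == frame_bits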
The total trunk length (including spurs) is 1900 m, and the number of network addresses available for each link is 240. 9.2.2.2 H1 FOUNDATION Fieldbus Data Link Layer The H1 FOUNDATION Fieldbus data link layer (DLL) controls transmission of messages onto the fieldbus. As mentioned already, FOUNDATION Fieldbus fully adopted the IEC (type 1)/ISA DLL statement: “Both
FIGURE 9.4 Trunk, junction box, and spurs in H1 FOUNDATION Fieldbus.
paradigms, circulated token and scheduled access were good, but insufficient at the same time; they were complementary, not alternative, and a complete fieldbus solution needs the two together” [3][23][24]. Thus, a fieldbus needs a good mix of circulated token and scheduled access, well balanced to avoid losing bandwidth to scheduled access when it is not really needed, but always giving priority to scheduled access over the circulated token when a conflict arises. That is what the IEC (type 1)/ISA DLL documents propose and FOUNDATION Fieldbus made real: an overall schedule able to guarantee the needed data at the needed time, but also allowing gaps within which a circulated token mechanism can take place while complying with a defined maximum rotation time. Such a philosophy clearly needs an arbitrator that univocally imposes the transmission of defined data at a defined time by a defined entity, when so required, but also guarantees a defined minimum amount of free time to each entity. This arbitrator is called the link active scheduler (LAS) within the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL [3][23][24]. Essentially, the LAS performs: • Access to the physical medium on a scheduled basis. • Circulation of the token only when no scheduled traffic is needed. The token is passed for a limited amount of time that is always shorter than the interval left before the next scheduled traffic. • A policy for the management of the token, according to which the token is returned to the LAS instead of being passed on to a new node, so that the LAS, depending on the time left, can decide whether to actually pass the token once more or to resume link control to manage the scheduled traffic. Specific needs of the token mechanism include: • Giving enough contiguous token time to each node • Guaranteeing a token rotation time among all the nodes that is as regular as possible These needs are met by the token method of the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL. Other needs include: • Keeping the token cycle short enough • Satisfying the occurrence of high-priority events
FIGURE 9.5 Centralized access mechanism: the LAS issues a compel data (CD) message and the addressed device publishes the data item (DT).
These needs are satisfied by the scheduled access method of the IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL. 9.2.2.2.1 Device Types Two types of devices are defined in the DLL specification: basic device and link master. Link master devices are capable of becoming the link active scheduler (LAS). Basic devices do not have the capability to become the LAS. 9.2.2.2.2 Scheduled Communication The LAS manages centralized, scheduled access with the following mechanism. The LAS has a list of transmission times for all data buffers in all devices that need to be transmitted cyclically. When it is time for a device to send the contents of a buffer, the LAS issues a compel data (CD) message to the device. Upon receipt of the CD, the device broadcasts, or publishes, the data item (DT) in the buffer to all devices on the fieldbus. Any device configured to receive the data is called a subscriber. Figure 9.5 shows this access mechanism. Scheduled data transfers are typically used for the regular, cyclic transfer of control loop data between devices on the fieldbus. 9.2.2.2.3 Unscheduled Communication The autonomy of each node is supported by a bandwidth distribution mechanism based on the use of a circulating token. In unused portions of the bandwidth (i.e., those not occupied by the transmission of CDs), the LAS sends a pass token (PT) to each node included in a particular list called the live list (described in the following). Each token is associated with a maximum utilization interval, during which the receiving node can use the available bandwidth to transmit what it needs. On the expiration of the time interval
FIGURE 9.6 Token passing mechanism: the LAS issues pass token (PT) messages and the nodes give the token back with return token (RT) messages.
or when the node completes its transmissions, the token is returned to the LAS by using another frame called return token (RT). A target token rotation time (TTRT) defines the interval time desired for each token rotation. The value to be assigned to this parameter is linked to the maximum admissible delay in the transmission of the asynchronous flow. Figure 9.6 shows the token circulation managed by the LAS. 9.2.2.2.4 Live List Maintenance The list of all devices that are properly responding to the pass token (PT) is called the live list. New devices may be added to the fieldbus at any time. The LAS periodically sends probe node (PN) messages to all the addresses not yet present in the live list. If a new device appears at an address and receives the PN, it immediately returns a probe response (PR) message. When a device returns a PR, the LAS adds the device to the live list and confirms its addition by sending the device a node activation message. The LAS is required to probe at least one address after it has completed a cycle of sending PTs to all the devices in the live list. The device will remain in the live list as long as it responds properly to the PTs sent from the LAS. The LAS will remove a device from the live list if the device does not reply to the relevant PT for three successive tries. Whenever a device is added or removed from the live list, the LAS broadcasts changes in the live list to all devices; this allows each link master device to maintain a current copy of the live list in order to be ready to become LAS if needed. 9.2.2.2.5 Data Link Time Synchronization A DLL time synchronization mechanism is provided so that any node can request the LAS for a scheduled action to be executed at a defined time that represents the same absolute instant for all the nodes.
FIGURE 9.7 Link active scheduler algorithm: if there is time before the next scheduled compel data (CD), the LAS issues a probe node (PN), time distribution (TD), or pass token (PT) message; otherwise it waits, sending idle messages, until it is time to issue the CD.
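A minimal Python sketch of the loop summarized in Figure 9.7 is given below: the LAS issues compel data (CD) messages at their scheduled times and fills the remaining gaps with pass token (PT), probe node (PN), or time distribution (TD) messages. The timing values, message strings, and data structures are invented for illustration; this is not the normative IEC (type 1)/ISA DLL state machine.

# Illustrative LAS loop (not the normative DLL state machine).
import itertools, time

schedule = [(0.00, "CD addr=0x11 buf=AI.OUT"),     # assumed macrocycle of 0.5 s
            (0.25, "CD addr=0x22 buf=PID.OUT")]
MACROCYCLE_S = 0.5
live_list = [0x11, 0x22, 0x33]
maintenance = itertools.cycle(["PN", "TD"])         # interleave probing and time distribution

def run_las(cycles=2, send=print, now=time.monotonic, sleep=time.sleep):
    start = now()
    for cycle in range(cycles):
        for offset, cd in schedule:
            due = start + cycle * MACROCYCLE_S + offset
            # Gap before the next scheduled CD: circulate the token and do housekeeping.
            for addr in live_list:
                if now() + 0.01 >= due:             # assume 10 ms per token holding time
                    break
                send(f"PT addr={hex(addr)}")        # the node gives the token back with RT
                sleep(0.01)
            if now() < due:
                send(next(maintenance))             # PN or TD if time is still left
            sleep(max(0.0, due - now()))            # idle until the scheduled instant
            send(cd)                                # compel data; the device publishes its DT

run_las()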
The LAS periodically broadcasts a time distribution (TD) message on the fieldbus so that all devices have exactly the same data link time. This is important because scheduled communications on the fieldbus and scheduled function block executions in the user application layer are based on timing derived from these messages. 9.2.2.2.6 Link Active Scheduler Operation The algorithm used by the LAS is shown in Figure 9.7. 9.2.2.2.7 LAS Redundancy The IEC (type 1)/ISA and H1 FOUNDATION Fieldbus DLL provides the possibility of having more than one potential LAS on each link, as well as backup procedures that are essential for fieldbus availability. In particular, as a fieldbus may have multiple link masters, if the current LAS fails, one of the link masters will become the LAS and the operation of the fieldbus will continue. 9.2.2.3 H1 FOUNDATION Fieldbus Application Layer The H1 FOUNDATION Fieldbus application layer includes two sublayers: FAS and FMS [25][26]. 9.2.2.3.1 Fieldbus Access Sublayer The fieldbus access sublayer (FAS) uses both the scheduled and unscheduled features of the data link layer to provide services for the fieldbus message specification (FMS) [25]. The type of each FAS service is described by a virtual communication relationship (VCR). A VCR defines the kind of information (messages) exchanged between two applications. Possible features of a VCR are the number of receivers (one or many) for each transmitter, the memory organization (queue or buffer) used to store the messages to be sent/received, and the DLL mechanism used to send the message (PT or CD). The types of VCR defined by the Fieldbus Foundation are: • Client–server VCR type. The client–server VCR type is used for queued, unscheduled, user-initiated, one-to-one communication between devices on the fieldbus. Queued means that messages are sent and received in the order submitted for transmission, according to their priority, without overwriting previous messages. When a device receives a pass token (PT) from the LAS, it may send a request message to another device on the fieldbus. The requester is called the client, and the device that receives the request is called the server. The server sends the response when it receives a PT from the LAS. The client–server VCR type is used for operator-initiated requests such as setpoint changes, access to and change of a tuning parameter, alarm acknowledgment, and device upload and download.
• Report distribution VCR type. The report distribution VCR type is used for queued, unscheduled, user-initiated, one-to-many communications. When a device, holding an event or a trend report to send, receives a PT from the LAS, it sends its message to a group address defined by its VCR. Devices that are configured to listen for that VCR will receive the report. The report distribution VCR type is normally used by fieldbus devices to send alarm notifications to the operator consoles. • Publisher–subscriber VCR type. The publisher–subscriber VCR type is used for buffered, one-to-many communications. Buffered means that only the latest version of the data is maintained within the network. New data completely overwrite previous data. When a device receives compel data (CD), the device will publish (broadcast) its message to all devices on the fieldbus. Devices that wish to receive the published message are called subscribers. CDs may be scheduled in the LAS, or they may be sent by subscribers on an unscheduled basis. An attribute of the VCR indicates which method is used. The publisher–subscriber VCR type is normally used by the field devices for cyclic, scheduled publishing of user application function block inputs and outputs. 9.2.2.3.2 Fieldbus Message Specification Fieldbus message specification (FMS) services allow user applications to send messages to each other across the fieldbus by using a standard set of message formats. FMS describes the communication services, message formats, and protocol behavior needed to build messages for the user application [26]. Data that are communicated over the fieldbus are described by an object description. Object descriptions are collected together in a structure called an object dictionary (OD). The object description is identified by its index in the OD. Index 0, called the object dictionary header, provides a description of the dictionary itself and defines the first index for the object descriptions of the user application. The user application object descriptions can start at any index above 255. Index 255 and below define standard data types such as Boolean, integer, float, bit string, and data structures that are used to build all other object descriptions. A virtual field device (VFD) is used to remotely view local device data described in the object dictionary. A typical device will have at least two VFDs: the network and system management VFD and the user application VFD. The network and system management VFD provides access to the network management information base (NMIB) and to the system management information base (SMIB). NMIB data include VCRs, dynamic variables, statistics, and LAS schedules (if the device is a link master). SMIB data include device tag and address information and schedules for function block execution. The user application virtual field device is used to make the device functions (the function of a fieldbus device is defined by the selection and interconnection of blocks) visible to the fieldbus communication system. The header of the user application object dictionary points to a directory that is always the first entry in the function block application. The directory provides the starting indices of all of the other entries used in the function block application. The VFD object descriptions and their associated data are accessed remotely over the fieldbus network using virtual communication relationships.
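The index-based access just described can be pictured with a small sketch: an object dictionary maps indices to object descriptions, index 0 is the header, and an FMS Read simply resolves an index to a value. The index values, field names, and data layout below are illustrative assumptions, not the FF-870 encoding.

# Illustrative object dictionary (OD): indices map to object descriptions.
# Index 0 = OD header; indices <= 255 = standard data types; > 255 = user application objects.

object_dictionary = {
    0:   {"kind": "header", "first_user_index": 300},
    8:   {"kind": "type", "name": "Float"},
    300: {"kind": "directory", "entries": [301, 302]},   # first entry of the FB application
    301: {"kind": "variable", "name": "AI_110.OUT",  "type_index": 8, "value": 48.7},
    302: {"kind": "variable", "name": "PID_110.OUT", "type_index": 8, "value": 3.2},
}

def fms_read(od, index):
    """Sketch of an FMS Read resolved against the local OD."""
    obj = od[index]
    if obj["kind"] != "variable":
        raise ValueError(f"index {index} is not a readable variable")
    return obj["value"]

print(fms_read(object_dictionary, 301))   # -> 48.7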
FMS communication services provide a standard way for user applications, such as function blocks, to communicate over the fieldbus. Specific FMS communication services are defined for each object type. Table 9.2 summarizes the communication services available. Detailed descriptions for each service are provided in [26]. All of the FMS services use the client–server VCR type except as noted (see notes a and b in Table 9.2). 9.2.2.4 H1 FOUNDATION Fieldbus System Management Inside the H1 FOUNDATION Fieldbus specification, system management handles important system features [27] such as: • Function block scheduling. Function blocks must often be executed at precisely defined intervals and in the proper sequence for correct control system operation. System management synchronizes execution of the function blocks to a common time clock shared by all devices. A macrocycle is a single iteration of a schedule within a device. Depending on the type of device, there can be an LAS macrocycle and a device macrocycle. On the basis of the LAS macrocycle, system management
TABLE 9.2 Set of Services in FMS

Group                                 Service                               Meaning
Management and environment services   Initiate                              Establish a communication
                                       Abort                                 Abort a communication
                                       Reject                                Reject a nonvalid service
                                       Status                                Give the status of a service
                                       Unsolicited status                    Send a nonrequested status
                                       Identify                              Read a device specification (vendor, type, and version)
Object dictionary (OD) services        Get OD                                Read an object dictionary (OD)
                                       Initiate put OD                       Start loading an OD
                                       Put OD                                Load an OD in a device
                                       Terminate put OD                      Stop loading an OD
Variable access services               Read                                  Read a variable
                                       Write                                 Update the value of a variable
                                       Information report (a)                Send data
                                       Define variable list                  Define a variable list
                                       Delete variable list                  Delete a variable list
Event services                         Event notification (b)                Notify an event
                                       Acknowledge event notification        Acknowledge an event
                                       Alter event condition monitoring      Enable or disable an event
Downloading/uploading services         Request domain upload                 Request a domain upload
                                       Initiate upload sequence              Initiate an upload
                                       Upload segment                        Upload data
                                       Terminate upload sequence             End an upload
                                       Request domain download               Request a domain download
                                       Initiate download sequence            Initiate a download
                                       Download segment                      Download data
                                       Terminate download sequence           End a download
                                       Generic initiate download sequence    Open a download
                                       Generic download segment              Send data to the device
                                       Generic terminate download sequence   Stop a download
Program handling services              Create program invocation             Create a program object
                                       Delete program invocation             Delete a program object
                                       Start                                 Start a program object
                                       Stop                                  Stop a program object
                                       Resume                                Resume execution of a program
                                       Reset                                 Reset a program
                                       Kill                                  Kill a program

(a) This service can only use the publisher–subscriber or report distribution VCR type. (b) This service can only use the report distribution VCR type.
can synchronize execution of the function blocks across the entire fieldbus link; on the basis of the device macrocycle, it can synchronize execution of the function blocks inside each device. • Application clock distribution. This function allows publication of the time of day to all devices, including automatic switchover to a redundant time publisher. FOUNDATION Fieldbus supports an application clock distribution function. The application clock is usually set equal to the local time of day or to coordinated universal time (UTC). System management has a time publisher that periodically sends an application clock synchronization message to all fieldbus devices. The data link scheduling time is sampled and sent with the application clock message so that the receiving devices can adjust their local application times. During the intervals between synchronization messages, application clock time is independently maintained within each device, relying on its own internal clock. • Device address assignment. Fieldbus devices do not use jumpers or switches to configure addresses. Instead, device addresses are set by configuration tools using system management services. Every fieldbus device must have a unique network address and physical device tag for the fieldbus to operate properly. To avoid the need for address switches on the devices, assignment
of network addresses can be performed by configuration tools using system management services. The sequence for assigning a network address to a new device is as follows: • An unconfigured device will join the network at one of four special temporary default addresses. • A configuration tool will assign a physical device tag to the new device using system management services. • A configuration tool will choose an unused permanent address and assign it to the device using system management services. • The sequence is repeated for all devices that enter the network at a default address. • Devices store the physical device tag and node address in nonvolatile memory, so they will also retain these settings after a power failure. • Find tag service. For the convenience of host systems and portable maintenance devices, system management supports a service for finding devices or variables by a tag search. The “find tag query” message is broadcasted to all fieldbus devices. Upon receipt of the message, each device searches its virtual field devices for the requested tag and returns complete path information (if the tag is found), including the network address, VFD number, VCR index, and OD index. Once the path is known, the host or maintenance device can access the data by its tag. All of the configuration information needed by system management, such as the function block schedule, is described by object descriptions in the network and system management VFD. This VFD provides access to the system management information base and also to the network management information base. 9.2.2.5 H1 FOUNDATION Fieldbus Network Management H1 FOUNDATION Fieldbus network management mainly provides for the configuration of the communication stack [28].
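The address assignment sequence described in Section 9.2.2.4 can be summarized in a short sketch. The default addresses, address range, tag names, and function names below are placeholders; the real exchange uses the system management services defined in FF-800.

# Sketch of system management address assignment (placeholder values throughout).

DEFAULT_ADDRESSES = [248, 249, 250, 251]      # assumed temporary default addresses

class FieldDevice:
    def __init__(self):
        self.address = DEFAULT_ADDRESSES[0]   # unconfigured device joins at a default address
        self.tag = None                       # physical device tag, kept in nonvolatile memory

def commission(device, tag, used_addresses):
    """Configuration-tool side: assign a tag, then a free permanent address."""
    device.tag = tag                                          # SM 'set physical device tag'
    device.address = next(a for a in range(20, 248)           # SM 'set address'
                          if a not in used_addresses)
    used_addresses.add(device.address)
    return device

in_use = {20, 21}
dev = commission(FieldDevice(), "TT-101", in_use)
print(dev.tag, dev.address)    # TT-101 22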
9.2.3 HSE FOUNDATION Fieldbus The HSE FOUNDATION Fieldbus foresees the architecture depicted in Figure 9.2. As shown, its main feature is the use of Internet architecture (full TCP/UDP/IP and IEEE 802.3u stack [29][30][31]) for high-speed discrete control and, more generally, for interconnecting several H1 segments in order to achieve a plantwide fieldbus network [32]. Before describing the HSE FOUNDATION Fieldbus specifications, a brief overview of its general features will be given, with particular emphasis on the capability to interconnect different H1 FOUNDATION Fieldbus segments. There are four basic HSE device categories (but several of them are typically combined into a single real device): linking device, Ethernet device, host device, and gateway device. A linking device (LD) connects H1 networks to the HSE network. An Ethernet device (ED) may execute function blocks and may have some conventional I/Os. A gateway device (GD) interfaces other network protocols such as Modbus [33], DeviceNet [34], or Profibus [9]. A host device (HD) is a non-HSE device capable of communicating with HSE devices. Examples include configurators, operator workstations, and an OPC server. The network in Figure 9.8 shows a host system operating on an HSE bus segment labeled Segment A. Communications to H1 segments (B and C, as shown in the figure) are achieved by means of an Ethernet switch. The same switch is used to connect a second HSE segment (D) and a segment running a foreign protocol (E). Any of the devices connected to the switch may attempt communication to any other device, and it is the function of the switch to provide the correct routing and to negotiate transmission without collisions. The connecting mechanism between HSE and H1 segments is performed by a linking device (LD). A typical LD will serve multiple H1 segments, though for simplicity, only one segment per LD is shown in Figure 9.8. The connection between HSE and a foreign protocol is made through a gateway device (GD). The capabilities of the interconnections shown in Figure 9.8 are as follows:
FIGURE 9.8 H1 and HSE FOUNDATION Fieldbus interconnection: a host system on HSE segment A, H1 segments B and C connected through linking devices (LDs), a second HSE segment D, and a foreign protocol segment E connected through a gateway device (GD), all attached to an Ethernet switch.
• HSE host/H1 segment. The HSE host interacts with a standard H1 device through an LD. In this situation, the HSE host is able to configure, diagnose, and publish and subscribe data to and from the H1 device. • HSE host/HSE segment. The HSE host interacts with an HSE device and is able to configure, diagnose, and publish and subscribe data to and from the HSE device. • H1/H1 segment. In this situation, the interaction is between two H1 devices on two distinct H1 bus segments. The segments are connected to the Ethernet by LDs. Communications between devices on two H1 segments are functionally equivalent to communications between two H1 devices on the same bus segment. But it is clear that real-time communication between devices belonging to different H1 segments cannot be guaranteed due to the lack of a unique scheduler of the communication among different H1 segments. • HSE host/foreign protocol. This connection defines the relationship between a foreign device and the FOUNDATION Fieldbus application environment. The connection is made by GD. The foreign device is seen as a publisher to an HSE resident subscriber; the HSE host can handle the data stream from the I/O gateway in the same manner as it treats the data streams from devices on H1/HSE segments. 9.2.3.1 HSE FOUNDATION Fieldbus Physical, Data Link, Network, and Transport Layers As explained before, a higher-speed physical layer specification was always intended for selected process applications and for factory (discrete parts) automation. The original high-speed solution, called H2, was based on the H1 protocol and function block application running on different media at either 1 or 2.5 Mbit/s. In March 1998, the Foundation board of directors reconsidered the high-speed solution options and terminated further work on H2. The new approach was based on Ethernet and was intended to make use, as much as possible, of commercially available, off-the-shelf technology (COTS) components and software. The new solution, high-speed Ethernet, is designed to integrate multiple protocols, including multiple H1 FOUNDATION Fieldbus segments as well as foreign protocols, as described above. For its high-speed physical layer version, the Fieldbus Foundation has selected high-speed Ethernet at 100 Mbaud. The specifications for the physical layer, as well as for the Ethernet data link layer, are maintained by the Institute of Electrical and Electronics Engineers (IEEE) [30][31].
HSE also makes use of well-established Internet protocols that are maintained by the Internet Architecture Board. These include TCP (Transmission Control Protocol), UDP (User Datagram Protocol), and IP (Internet Protocol) [29]. Standard HSE stack components are the Dynamic Host Configuration Protocol (DHCP, which assigns addresses), Simple Network Time Protocol (SNTP), and Simple Network Management Protocol (SNMP), which rely on TCP and UDP over IP and the IEEE 802.3 MAC and physical layers. This has resulted in a practically unlimited number of nodes (IP addressing) over star topology networks made of as many links as required, the length of which can be up to 100 m for twisted pair and 2000 m on fiber. Messages sent on the Ethernet are bounded by a series of data fields called frames. The combination of a message and frame is called an Ethernet packet. Typically, a packet encoded according to TCP/IP will be inserted in the message field of the Ethernet packet. FOUNDATION Fieldbus uses a similar data structure where messages are bounded by addressing and other data items. What corresponds to a packet in Ethernet is called a protocol data unit (PDU) in FOUNDATION Fieldbus. Let us consider a communication between two H1 devices over an interposed HSE segment, as illustrated in Figure 9.8. The easiest method for an LD might be, upon receiving a communication from an H1 device, to simply insert the entire H1 PDU into the message part of the TCP/IP packet. The LD on the destination H1 segment, upon receiving the Ethernet packet, would merely strip away the Ethernet frame and send the H1 PDU on to the receiving H1 bus segment. This technique is called tunneling and is commonly used in mixed-protocol networks. The solution developed by HSE FOUNDATION Fieldbus is somewhat more complex, but more efficient than tunneling. The HSE FOUNDATION Fieldbus PDU is inserted into the data field of a TCP/IP message. However, the fieldbus address is encoded as a unique TCP/IP address, so the fieldbus PDU address is used to fill the address field of the TCP/IP packet. The entire TCP/IP packet is then inserted into the message field of the Ethernet packet. Because of the HSE encoding scheme, networks having multiple LDs can locate and transfer messages to the correct destination much more quickly, and with far less extraneous bus traffic, than with tunneling. Perhaps even more important, every H1 device (and every HSE device, for that matter) has a unique TCP/IP address and can be directly accessed over standard IT and Internet networks. 9.2.3.2 HSE FOUNDATION Fieldbus Application Layer Existing fieldbus specifications, which have been widely tested in H1 applications and were already maintained by the Fieldbus Foundation, have been used in the HSE standard too, where applicable. These include the fieldbus message specification (FMS) and system management (SM). New specifications were developed and tested to provide complete high-speed communications and control solutions. The new technology is based on the field device access agent (FDA agent) [35]. The FDA agent allows SM and FMS services used by the H1 devices to be conveyed over the Ethernet using standard UDP and TCP. This allows HSE devices to communicate with H1 devices that are connected via a linking device. The FDA agent is also used by the local function blocks in an HSE device. Thus, the FDA agent enables remote applications to access HSE devices and H1 devices through a common interface.
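The difference between plain tunneling and the HSE encapsulation described above can be sketched as follows: rather than burying the complete H1 PDU opaquely in a payload addressed to a peer linking device, the fieldbus address is mapped to an IP address and port, so standard IP routing delivers the message directly. The address map, header layout, and use of UDP here are illustrative assumptions, not the FDA agent wire format.

# Illustrative contrast between tunneling and HSE-style address mapping (not the FDA format).
import socket, struct

h1_to_ip = {0x11: ("192.0.2.10", 1090),   # assumed mapping kept by the linking device
            0x22: ("192.0.2.11", 1090)}

def tunnel(h1_pdu: bytes, peer_ld=("192.0.2.2", 1089)):
    """Tunneling: the whole H1 PDU is opaque payload; only the peer LD address is used."""
    return peer_ld, h1_pdu

def hse_encapsulate(h1_dest: int, h1_payload: bytes):
    """HSE-style: the fieldbus address selects the IP destination; the payload carries the rest."""
    ip, port = h1_to_ip[h1_dest]
    header = struct.pack("!B", h1_dest)            # tiny illustrative header
    return (ip, port), header + h1_payload

dest, datagram = hse_encapsulate(0x22, b"\x05PID.OUT=3.2")
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(datagram, dest)   # uncomment on a real network; addresses above are documentation-only
print(dest, datagram)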
9.2.3.3 HSE FOUNDATION Fieldbus System Management The following aspects of management are supported in the HSE system management layer [36]:
• Each device has a unique and permanent identity and a system-specific configured name.
• Devices maintain version control information.
• Devices respond to requests aiming to locate objects, including the device itself.
• Time is distributed to all devices on the network.
• Function block schedules are used to execute function blocks. • Devices are added and removed from the network without affecting other devices on the network. 9.2.3.4 HSE FOUNDATION Fieldbus Network Management HSE network management allows HSE host systems to perform management operations over the HSE network [37]. The following capabilities are provided by the network management: • Configuring the H1 bridge, which performs data forwarding and republishing between H1 interfaces. • Loading the HSE session list or single entries in this list. An HSE session endpoint represents a logical communication channel between two or more HSE devices. • Loading the HSE VCR list or single entries in this list. An HSE VCR is a communication relationship used for accessing VFDs across HSE. • Performance monitoring for session endpoints, HSE VCRs, and the H1 bridge. • Fault detection monitoring. 9.2.3.5 HSE FOUNDATION Fieldbus Redundancy The HSE FOUNDATION Fieldbus specification provides for management of redundant network interfaces. This capability protects against single and multiple faults in the network. Each device monitors the network and selects the best route to the destination for each message it has to send. HSE provides for various levels of redundancy up to and including complete device and media redundancy. HSE fault tolerance is achieved by operational transparency; i.e., the redundancy operations are not visible to the HSE applications. This is necessary because HSE applications are required to coexist with standard information technology applications. The HSE local area network (LAN) redundancy entity (LRE) coordinates the redundancy function. Each HSE device periodically transmits on both of its Ethernet interfaces a diagnostic message (representing its view of the network) to the other HSE devices. Each device uses the diagnostic messages to maintain a network status table (NST), which is used for fault detection and transmission port selection. There is no central redundancy manager. Instead, each device determines its behavior in response to the faults it detects [38].
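A toy model of the LRE behavior described above is sketched below: each device records the diagnostic messages heard on its two Ethernet interfaces in a network status table and selects, per destination, an interface on which that destination was recently seen. The timeout value, table layout, and class names are assumptions; the actual rules are defined in FF-593.

# Toy model of the HSE LAN redundancy entity (LRE); details are assumptions, see FF-593.
import time

DIAG_TIMEOUT_S = 5.0

class LRE:
    def __init__(self):
        # network status table: (peer device id, interface) -> time last diagnostic was heard
        self.nst = {}

    def on_diagnostic(self, peer, interface, now=None):
        self.nst[(peer, interface)] = now if now is not None else time.monotonic()

    def select_port(self, peer, now=None):
        """Choose an interface on which the peer was recently heard; prefer port A."""
        now = now if now is not None else time.monotonic()
        for interface in ("A", "B"):
            seen = self.nst.get((peer, interface))
            if seen is not None and now - seen < DIAG_TIMEOUT_S:
                return interface
        return None   # peer unreachable on both interfaces -> fault detected

lre = LRE()
lre.on_diagnostic("LD-1", "A", now=100.0)
lre.on_diagnostic("LD-1", "B", now=103.0)
print(lre.select_port("LD-1", now=104.0))   # "A"
print(lre.select_port("LD-1", now=107.0))   # "B" (port A diagnostics have timed out)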
9.2.4 Open Systems Implementation One of the main features of FOUNDATION Fieldbus (in both H1 and HSE configurations) is the ability to build open communication systems. It is clear that this represents a key issue in a communication system, as building up a perfect communication stack between two devices is completely useless if those two devices are not able to understand the meaning of each other’s data or behavior. Implementation of open systems is achieved through the use of function blocks and the adoption of a standard way to represent them (the device description language (DDL)). FOUNDATION Fieldbus has defined a set of standard function blocks that can be combined and parameterized to build up a device [11]. Due to the standard format, behavior, and connection of such function blocks, their access and use through the bus is then immediate, allowing the achievement of interoperability and interchangeability. Further, a manufacturer can improve and innovate an existing function block, creating a new standard function block. The solution adopted inside FOUNDATION Fieldbus to realize this has been the DDL, able to provide a formal device description (DD) that can then be interpreted by the DD services library available through FOUNDATION Fieldbus [39][40][41]. Such a DD acts as a driver for each specific device and is supplied together with the device itself. Within each DD, and for each function block included in the device, a hierarchy of definitions is followed: (1) the universal parameters of the device itself, (2) the common parameters of each function block, (3) the common parameters of the transducer blocks, and (4) the parameters specific to the manufacturer. DD may also include small programs able to interoperate with the device (e.g., for its calibration), as well as download capability for managing manufacturers’ upgrading.
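The layered definitions within a DD can be pictured as nested parameter sets that a host merges when it loads the file supplied with the device. The structure and names in the sketch below are a loose illustration, not DDL syntax (FD-900).

# Loose illustration of the definition hierarchy inside a device description (not DDL syntax).

device_description = {
    "universal_parameters": {"TAG": "TT-101", "MANUFACTURER": "0x1234"},   # (1) device level
    "function_blocks": {
        "AI": {"common_parameters": {"OUT", "MODE", "STATUS"},             # (2) per block type
               "manufacturer_parameters": {"SENSOR_DAMPING"}},             # (4) vendor specific
    },
    "transducer_blocks": {
        "TEMP": {"common_parameters": {"CAL_DATE", "SENSOR_TYPE"}},        # (3) transducer level
    },
}

def visible_parameters(dd, block):
    """Merge the layers a host would see for one function block."""
    fb = dd["function_blocks"][block]
    return set(dd["universal_parameters"]) | fb["common_parameters"] | fb["manufacturer_parameters"]

print(sorted(visible_parameters(device_description, "AI")))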
9.3 Conclusions The Fieldbus Foundation so far appears to be the only multisupplier organization able to achieve concrete results in proposing a large-scale fieldbus solution merging the initial FIP/Profibus proposals. That is mainly due to its features providing true device interoperability and a combination of guaranteed scheduling and token rotation.
References

[1] www.fieldbus.org.
[2] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Part 2: Physical Layer Specification, 2001.
[3] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Parts 3 and 4: Data Link Layer Service and Protocol Definition, 2001.
[4] IEC 61158, Digital Data Communications for Measurements and Control: Fieldbus for Use in Industrial Control Systems: Parts 5 and 6: Application Layer Service and Protocol Definition, 2001.
[5] IEC 61784, Profile Sets for Continuous and Discrete Manufacturing Relative to Fieldbus Use in Industrial Control Systems, 2001.
[6] J.D. Decotignie, P. Pleinevaux, Time critical communication networks: field buses, IEEE Network, 2.
[7] J.D. Decotignie, P. Pleinevaux, A survey on industrial communication networks, Annales des Telecommunications, 48, 9–10.
[8] www.worldfip.org.
[9] www.profibus.org.
[10] CENELEC EN50170/A1, General Purpose Field Communication System, Addendum A1, Foundation Fieldbus, 2000.
[11] Fieldbus Foundation FF-890, Function Block Application Process: Part 1.
[12] Fieldbus Foundation FF-891, Function Block Application Process: Part 2.
[13] Fieldbus Foundation FF-892, Function Block Application Process: Part 3.
[14] Fieldbus Foundation FF-893, Function Block Application Process: Part 4.
[15] Fieldbus Foundation FF-894, Function Block Application Process: Part 5.
[16] Fieldbus Foundation FF-902, Transducer Block Application Process: Part 1.
[17] Fieldbus Foundation FF-903, Transducer Block Application Process: Part 2.
[18] Fieldbus Foundation FF-816, 31.25 kbit/s Physical Layer Profile Specification.
[19] ISA S50.02, Physical Layer Standard, 1992.
[20] Fieldbus Foundation AG-140, 31.25 kbit/s Wiring and Installation Guide.
[21] Fieldbus Foundation AG-165, Fieldbus Installation and Planning Guide.
[22] Fieldbus Foundation AG-163, 31.25 kbit/s Intrinsically Safe Systems Application Guide.
[23] Fieldbus Foundation FF-821, Data Link Services Subset.
[24] Fieldbus Foundation FF-822, Data Link Protocol Subset.
[25] Fieldbus Foundation FF-875, Fieldbus Access Sublayer.
[26] Fieldbus Foundation FF-870, Fieldbus Message Specification.
[27] Fieldbus Foundation FF-800, System Management Specification.
[28] Fieldbus Foundation FF-801, Network Management.
[29] Douglas E. Comer, Internetworking with TCP/IP, Vol. I, Principles, Protocols, and Architecture, Prentice Hall International, Englewood Cliffs, NJ, 1999.
[30] ANSI/IEEE 802.3, IEEE Standards for Local Area Networks: CSMA/CD Access Method and Physical Layer Specifications, 1985.
[31] ANSI/IEEE 802.3u, IEEE Standards for Local Area Networks: Supplement to CSMA/CD Access Method and Physical Layer Specifications: MAC Parameters, Physical Layer, MAUs, and Repeater for 100 Mb/s Operation, Type 100BASE-T, 1995.
[32] Fieldbus Foundation FF-581, System Architecture.
[33] www.modbus.org.
[34] CENELEC EN 50325-2, DeviceNet.
[35] Fieldbus Foundation FF-588, HSE Field Device Access Agent.
[36] Fieldbus Foundation FF-589, HSE System Management.
[37] Fieldbus Foundation FF-803, HSE Network Management.
[38] Fieldbus Foundation FF-593, HSE Redundancy.
[39] Fieldbus Foundation FD-900, Device Description Language Specification.
[40] Fieldbus Foundation FD-110, DDS User’s Guide.
[41] Fieldbus Foundation FD-100, DDL Tokenizer User’s Manual.
10 PROFIBUS: Open Solutions for the World of Automation

Ulrich Jecht
UJ Process Analytics

Wolfgang Stripf
Siemens AG

Peter Wenzel
PROFIBUS International

10.1 Basics
10.2 Transmission Technologies
10.3 Communication Protocol
PROFIBUS DP • System Configuration and Device Types • Cyclic and Acyclic Data Communication Protocols
10.4 Application Profiles
General Application Profiles • Specific Application Profiles • Master and System Profiles
10.5 Integration Technologies
10.6 Quality Assurance
10.7 Implementation
10.8 Prospects
PROFINET CBA • PROFINET IO • The PROFINET Migration Model
Abbreviations
References
10.1 Basics

Fieldbuses are industrial communication systems with bit-serial transmission that use a range of media such as copper cable, fiber optics, or radio to connect distributed field devices (sensors, actuators, drives, transducers, analyzers, etc.) to a central control or management system. Fieldbus technology was developed in the 1980s with the aim of saving cabling costs by replacing the then dominant central parallel wiring and analog signal transmission (4- to 20-mA or ±10-V interfaces) with digital technology. Because different industries placed different demands on the sponsored research and development projects, and because large system manufacturers often preferred proprietary solutions, several bus systems with differing principles and properties became established in the market. The key technologies are now included in the recently adopted standards IEC 61158 and IEC 61784 [1]. PROFIBUS is an integral part of these standards.

Fieldbuses are the basic prerequisite for distributed automation systems. Over the years they have evolved into instruments for automated processes that offer high productivity and flexibility compared with conventional technology. PROFIBUS is an open, digital communication system with a wide range of applications, particularly in the fields of factory and process automation, transportation, and power distribution. It is suitable both for fast, time-critical applications and for complex communication tasks (Figure 10.1).
FIGURE 10.1 PROFIBUS suitable for all decentralized applications (upstream/inbound logistics, mainstream/production, downstream/outbound logistics).
The application and engineering aspects are specified in the generally available guidelines of PROFIBUS International [2]. This fulfills user demands for standardization, manufacturer independence, and openness and ensures communication between devices of various manufacturers.

Based on a very efficient and extensible communication protocol, combined with the development of numerous application profiles (communication models for device type families) and a fast-growing number of devices and systems, PROFIBUS began its record of success, initially in factory automation and, since 1995, in process automation. Today, PROFIBUS is the world market leader among fieldbuses, with more than a 20% share of the market, approximately 500,000 equipped plants, and more than 12 million nodes. More than 2000 PROFIBUS products are available from a wide range of manufacturers.

The success of PROFIBUS stems in equal measure from its progressive technology and from the strength of its noncommercial PROFIBUS User Organization e.V. (PNO), the trade body of manufacturers and users founded in 1989. Together with 25 other regional PROFIBUS associations in countries around the world and the international umbrella organization PROFIBUS International (PI), founded in 1995, this organization now totals more than 1200 members worldwide. Its objectives are the continued development of PROFIBUS technology and increased worldwide acceptance.

PROFIBUS has a modular structure (the PROFIBUS toolbox) and offers a range of transmission and communication technologies, numerous application and system profiles, and device management and integration tools [8]. PROFIBUS thus covers the varied, application-specific demands of factory and process automation, from simple to complex applications, by selecting the appropriate set of components from the toolbox (Figure 10.2).
10.2 Transmission Technologies

PROFIBUS features four different transmission technologies, all of which are based on international standards and all of which are assigned to PROFIBUS in both IEC 61158 and IEC 61784: RS485, RS485-IS, MBP-IS (IS stands for intrinsic safety protection), and fiber optics.

RS485 transmission technology is simple and cost-effective and is primarily used for tasks that require high transmission rates. Shielded, twisted-pair copper cable with one conductor pair is used, and no expert knowledge is required for installation of the cable. The bus structure allows stations to be added or removed, or the system to be commissioned step by step, without interfering with other stations. Subsequent expansions (within defined limits) have no effect on stations already in operation.
FIGURE 10.2 Structure of PROFIBUS system technology: transmission technologies (RS485, RS485-IS, fiber optics, MBP), the communication protocol PROFIBUS DP (DP-V0 to DP-V2) according to IEC 61158/61784, common and specific application profiles, system profiles, and integration tools (GSD, EDD, DTM).
Various transmission rates can be selected, from 9.6 Kbit/s up to 12 Mbit/s; one uniform speed is selected for all devices on the bus when the system is commissioned. Up to 32 stations (masters or slaves) can be connected in a single segment; for connecting more than 32 stations, repeaters can be used. The maximum permissible line length depends on the transmission rate. Different cable types (type designations A to D) for different applications are available on the market for connecting devices either to each other or to network elements (segment couplers, links, and repeaters). When using RS485 transmission technology, PI recommends the use of cable type A.

RS485-IS transmission technology responds to an increasing market demand to support RS485, with its fast transmission rates, within intrinsically safe areas. A PROFIBUS guideline is available for the configuration of intrinsically safe RS485 solutions with simple device interchangeability. The interface specification details the current and voltage levels that must be adhered to by all stations in order to ensure safe operation during interconnection. An electric circuit limits the currents at a specified voltage level. When connecting active sources, the sum of the currents of all stations must not exceed the maximum permissible current. In contrast to the FISCO model (see below), all stations represent active sources. Up to 32 stations may be connected to the intrinsically safe bus circuit.

MBP transmission technology (Manchester-coded, bus-powered) is a new term that replaces the previously common designations for intrinsically safe transmission, such as "physics in accordance with IEC 61158-2," "1158-2," etc. The current version of IEC 61158-2 (physical layer) describes several different transmission technologies, MBP being just one of them, so a differentiation in naming became necessary. MBP is a synchronous, Manchester-coded transmission with a defined transmission rate of 31.25 Kbit/s. In its MBP-IS version it is frequently used in process automation, as it satisfies the key demands of the chemical and petrochemical industries for intrinsic safety and bus powering using two-wire technology. MBP transmission is usually limited to a specific segment of a plant (field devices in hazardous areas), which is then linked to an RS485 segment via a segment coupler or link (Figure 10.3). Segment couplers are signal converters that modulate the RS485 signals to the MBP signal level and vice versa; they are transparent from the bus protocol's point of view. Links, in contrast, provide more computing power: they map all the field devices connected to the MBP segment into the RS485 segment as a single slave. Tree and line structures (and any combination of the two) are the network topologies supported by PROFIBUS with MBP transmission, with up to 32 stations per segment and a maximum of 126 per network.
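The dependence of segment length on transmission rate lends itself to a simple planning check. The following sketch is only an illustration; the length limits coded in it are the values typically quoted for cable type A and are an assumption here, so a real installation should take its figures from the PROFIBUS installation guidelines.

```python
# Minimal planning check for a PROFIBUS RS485 segment (illustrative only).
# The length limits below are the values typically quoted for cable type A;
# consult the PROFIBUS installation guidelines for a real installation.

# Maximum segment length in meters per transmission rate in Kbit/s (assumed typical values).
MAX_SEGMENT_LENGTH_M = {
    9.6: 1200, 19.2: 1200, 45.45: 1200, 93.75: 1200,
    187.5: 1000, 500: 400, 1500: 200, 3000: 100, 6000: 100, 12000: 100,
}

MAX_STATIONS_PER_SEGMENT = 32   # masters and slaves combined, without repeaters


def check_segment(rate_kbit_s: float, length_m: float, stations: int) -> list[str]:
    """Return a list of problems found for one RS485 segment (empty list = OK)."""
    problems = []
    limit = MAX_SEGMENT_LENGTH_M.get(rate_kbit_s)
    if limit is None:
        problems.append(f"{rate_kbit_s} Kbit/s is not a standard PROFIBUS rate")
    elif length_m > limit:
        problems.append(f"segment length {length_m} m exceeds {limit} m at {rate_kbit_s} Kbit/s")
    if stations > MAX_STATIONS_PER_SEGMENT:
        problems.append("more than 32 stations: insert a repeater and split the segment")
    return problems


if __name__ == "__main__":
    # Example: 28 stations on a 250 m segment running at 1.5 Mbit/s.
    print(check_segment(1500, 250, 28))
```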
FIGURE 10.3 Intrinsic safety and powering of field devices using MBP-IS (an RS485 segment at up to 12 Mbit/s linked via a segment coupler/link to a 31.25 Kbit/s MBP-IS segment with field devices in the hazardous area).
Fiber-optic transmission technology is used for fieldbus applications in environments with very high electromagnetic interference or that are spread over a large area or distance. The PROFIBUS guideline for fiber-optic transmission [3] specifies the technology available for this purpose, including multimode and single-mode glass fiber, plastic fiber, and hard-clad silica (HCS) fiber. While developing these specifications, great care was taken to allow problem-free integration of existing PROFIBUS devices into a fiber-optic network without the need to change the protocol behavior of PROFIBUS. This ensures backward compatibility with existing PROFIBUS installations.

The internationally recognized FISCO model considerably simplifies the planning, installation, and expansion of PROFIBUS networks in potentially explosive areas. FISCO stands for fieldbus intrinsically safe concept and was developed by the German PTB [4]. The model is based on the specification that a network is intrinsically safe, and requires no individual intrinsic safety calculations, when the relevant four bus components (field devices, cables, segment couplers, and bus terminators) fall within predefined limits with regard to voltage, current, power, inductance, and capacitance. The corresponding proof can be provided by certification of the components through authorized accreditation agencies, such as PTB (Germany), UL and FM (U.S.), and others. If FISCO-approved devices are used, not only is it possible to operate more devices on a single line, but devices can be replaced during runtime by devices of other manufacturers, or the line can be expanded, all without time-consuming calculations or system certification. So you can simply plug and play, even in hazardous areas.
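The essence of the FISCO model can be read as a pair of simple comparisons: every FISCO-approved field device must be able to accept the certified output values of the supply, and the cable parameters must stay inside a predefined window. The sketch below only illustrates that rule of thumb; the numeric cable limits are placeholders, and the real limits and the formal proof come from the FISCO specification and the component certificates.

```python
# Illustrative FISCO plausibility check (not a substitute for certification).
# Rule sketched here: every FISCO-approved device must tolerate the supply's
# certified output values, and the cable must stay inside a predefined window.
from dataclasses import dataclass


@dataclass
class FiscoSupply:
    u_o: float  # maximum output voltage (V)
    i_o: float  # maximum output current (mA)
    p_o: float  # maximum output power (W)


@dataclass
class FiscoDevice:
    name: str
    u_i: float  # maximum permitted input voltage (V)
    i_i: float  # maximum permitted input current (mA)
    p_i: float  # maximum permitted input power (W)


# Placeholder cable window (ohm/km, mH/km, nF/km); the real limits come from
# the FISCO specification and the component certificates.
CABLE_WINDOW = {"r_per_km": (15, 150), "l_per_km": (0.4, 1.0), "c_per_km": (80, 200)}


def fisco_ok(supply: FiscoSupply, devices: list[FiscoDevice], cable: dict) -> bool:
    for d in devices:
        if d.u_i < supply.u_o or d.i_i < supply.i_o or d.p_i < supply.p_o:
            print(f"{d.name}: input ratings below the supply's output values")
            return False
    for key, (lo, hi) in CABLE_WINDOW.items():
        if not lo <= cable[key] <= hi:
            print(f"cable parameter {key} outside the predefined window")
            return False
    return True


if __name__ == "__main__":
    supply = FiscoSupply(u_o=15.0, i_o=200.0, p_o=2.0)          # example values
    devices = [FiscoDevice("transmitter", 17.5, 380.0, 5.3)]
    cable = {"r_per_km": 44, "l_per_km": 0.65, "c_per_km": 110}
    print(fisco_ok(supply, devices, cable))
```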
10.3 Communication Protocol

10.3.1 PROFIBUS DP

At the protocol level, PROFIBUS with decentralized peripherals (DP) and its versions DP-V0 to DP-V2 offer a broad spectrum of optional services, which enable optimum communication between different applications. DP has been designed for fast data exchange at the field level. Data exchange with the distributed devices is primarily cyclic. The communication functions required for this are specified through the DP basic functions (version DP-V0). Geared toward the special demands of the various areas of application, these basic DP functions have been expanded step by step with special functions, so that DP is now available
in three versions — DP-V0, DP-V1, and DP-V2 — each with its own special key features. All versions of DP are specified in detail in IEC 61158 and IEC 61784.

Version DP-V0 provides the basic functionality of DP, including cyclic data exchange as well as station diagnosis, module diagnosis, and channel-specific diagnosis.

Version DP-V1 contains enhancements geared toward process automation, in particular acyclic data communication for parameter assignment, operation, visualization, and alarm handling of intelligent field devices, in coexistence with cyclic user data communication. This permits online access to stations using engineering tools. In addition, DP-V1 defines alarms; examples of alarm types are status alarms, update alarms, and manufacturer-specific alarms.

Version DP-V2 contains further enhancements and is geared primarily toward the demands of drive technology. Thanks to additional functionalities such as isochronous slave mode and slave-to-slave communication (data exchange broadcast, DXB), DP-V2 can also be used as a drive bus for controlling fast motion sequences in drive axes.
10.3.2 System Configuration and Device Types

DP supports implementation of both monomaster and multimaster systems, which affords a high degree of flexibility during system configuration. A maximum of 126 devices (masters or slaves) can be connected to a bus network.

In monomaster systems, only one master is active on the bus during operation of the bus system. Figure 10.4 shows the configuration of a monomaster system. Here the master is hosted by a programmable logic controller (PLC), which is the central control component; the slaves are connected to the PLC via the transmission medium. This system configuration enables the shortest bus cycle times. In multimaster systems, several masters share the same bus. They form both independent subsystems, each comprising a master and its assigned slaves, and additional configuration and diagnostic master devices. The masters coordinate themselves by passing a token from one to the next; only the master that currently holds the token can communicate.

PROFIBUS DP differentiates three groups of device types on the bus. DP master class 1 (DPM1) is a central controller that cyclically exchanges information with the distributed stations (slaves) at a specified message cycle. Typical DPM1 devices are PLCs or PCs. A DPM1 has active bus access, with which it can read the measurement data (inputs) of the field devices and write the set-point values (outputs) of the actuators at fixed times. This continuously repeating cycle is the basis of the automation function (Figure 10.4).
FIGURE 10.4 PROFIBUS DP monomaster system (DP-V0): a PLC hosting the class 1 master and its slaves, polled in a repeating bus cycle.
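Conceptually, the monomaster configuration of Figure 10.4 reduces to a single loop in which the DPM1 exchanges output and input data with each configured slave in a fixed order. The sketch below is only an illustration of that bus cycle under that simplification; exchange_io is a placeholder standing in for the real data exchange telegram of a DP stack.

```python
# Conceptual sketch of the cyclic DP bus cycle of a monomaster system.
# `exchange_io` stands in for the real DP data exchange telegram of a stack.
import time


def exchange_io(slave_address: int, outputs: bytes) -> bytes:
    """Placeholder: send outputs to the slave and return its inputs."""
    return bytes(len(outputs))   # a real implementation would use the DP stack


def run_bus_cycle(process_image_out: dict[int, bytes]) -> dict[int, bytes]:
    """One bus cycle: poll every configured slave once, in a fixed order."""
    process_image_in = {}
    for address, outputs in sorted(process_image_out.items()):
        process_image_in[address] = exchange_io(address, outputs)
    return process_image_in


if __name__ == "__main__":
    outputs = {3: b"\x01\x00", 4: b"\x00", 5: b"\xff\xff"}   # slaves 3, 4, 5
    for _ in range(3):            # three cycles of the continuously repeating sequence
        inputs = run_bus_cycle(outputs)
        # ... the PLC program would now compute new outputs from `inputs` ...
        time.sleep(0.01)          # stand-in for the configured cycle time
    print(inputs)
```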
FIGURE 10.5 State machine for slaves: after Power_On the slave waits for parameterization, then for configuration, and then enters data exchange; a slave fault or timeout returns it to the start, and a diagnosis telegram can be requested instead of process data in any state.
DP master class 2 (DPM2) devices are engineering, configuration, or operating devices. They are put into operation during commissioning and for maintenance and diagnostics in order to configure connected devices, evaluate measured values and parameters, and request the device status. A DPM2 does not have to be permanently connected to the bus system; it also has active bus access.

DP slaves are peripherals (input/output (IO) devices, drives, human machine interfaces (HMIs), valves, transducers, analyzers) that acquire process information or use output information to intervene in the process. There are also devices that provide only input information or only output information. As far as communication is concerned, slaves are passive devices: they only respond to direct queries (see Figure 10.4). This behavior is simple and cost-effective to implement; in the case of DP-V0, it is already completely included in the bus ASIC.
10.3.3 Cyclic and Acyclic Data Communication Protocols

Cyclic data communication between the DPM1 and its assigned slaves is handled automatically by the DPM1 in a defined, recurring sequence (Figure 10.4). The appropriate services are called MS0. The user defines the assignment of the slave(s) to the DPM1 when configuring the bus system and also defines which slaves are to be included in or excluded from the cyclic user data communication.

The DPM1 and the slaves pass through three phases during start-up: parameterization, configuration, and cyclic data exchange (Figure 10.5). Before entering the cyclic data exchange state, the master first sends information about the transmission rate, the data structures within a PDU, and other slave-relevant parameters. In a second step, it checks whether the user-defined configuration matches the actual device configuration. In any state the master can request slave diagnosis in order to indicate faults to the user.

An example of the telegram structure for the transmission of information between master and slave is shown in Figure 10.6. The telegram starts with some synchronization bits, the type (SD) and length (LE) of the telegram, the source and destination addresses, and a function code (FC). The function code indicates the type of message or the content of the payload (processing data unit) and serves as a guard to control the state machine of the master. The PDU, which may carry up to 244 bytes, is followed by a frame-checking sequence (FCS) as a safeguard mechanism and an end delimiter. One example of the use of the function code is the indication of a fault situation on the slave side: in this case, the master sends a special diagnosis request instead of the normal process data exchange, and the slave replies with a diagnosis message. This message comprises 6 bytes of fixed information plus user-definable device-, module-, or channel-related diagnosis information [1], [7].
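The start-up sequence of Figure 10.5, described above, can be pictured as a small state machine. The sketch below is a simplified model under that reading, not the normative behavior from IEC 61158; the check helpers are placeholders for the corresponding parameterization and configuration telegrams.

```python
# Simplified model of the DP slave start-up state machine (Figure 10.5).
# The check_* helpers are placeholders for the corresponding DP telegrams.
from enum import Enum, auto


class SlaveState(Enum):
    WAIT_PRM = auto()        # Wait on Parameterization
    WAIT_CFG = auto()        # Wait on Configuration
    DATA_EXCHANGE = auto()   # Cyclic data exchange


def check_parameterization(frame: bytes) -> bool:
    return len(frame) > 0           # placeholder: accept any non-empty parameterization frame


def check_configuration(frame: bytes, expected: bytes) -> bool:
    return frame == expected        # placeholder: configuration must match the real modules


def next_state(state: SlaveState, frame: bytes, expected_cfg: bytes,
               fault_or_timeout: bool) -> SlaveState:
    """Advance the slave state machine by one received telegram."""
    if fault_or_timeout:
        return SlaveState.WAIT_PRM                  # fall back and start over
    if state is SlaveState.WAIT_PRM:
        return SlaveState.WAIT_CFG if check_parameterization(frame) else state
    if state is SlaveState.WAIT_CFG:
        return SlaveState.DATA_EXCHANGE if check_configuration(frame, expected_cfg) else state
    return state                                    # stay in data exchange


if __name__ == "__main__":
    cfg = b"\x11\x21"                               # example configuration bytes
    s = SlaveState.WAIT_PRM
    s = next_state(s, b"\x80", cfg, False)          # parameterization accepted
    s = next_state(s, cfg, cfg, False)              # configuration matches
    print(s)                                        # SlaveState.DATA_EXCHANGE
```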
FIGURE 10.6 PROFIBUS DP telegram structure (example): an SD2 telegram (start delimiter 68H, variable data length) consists of SD, LE, LEr (repeated length), SD, destination address (DA), source address (SA), function code (FC), a processing data unit (PDU) of 1 to 244 bytes, the frame-checking sequence (FCS, computed across the data counted by LE), and the end delimiter (ED, 16H); each character is transmitted as an 11-bit cell (start bit, 8 character bits, even parity bit, stop bit).
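The variable-length frame of Figure 10.6 can be assembled mechanically from its fields. The sketch below builds such a frame; it assumes, beyond what the figure states, that the FCS indicated as "across data within LE" is a simple modulo-256 sum over DA, SA, FC, and the PDU, and it ignores the 11-bit UART cell framing, which is handled by the transceiver hardware. The function code value in the example is purely hypothetical.

```python
# Sketch of an SD2 (variable data length) PROFIBUS DP frame, following Figure 10.6.
# Assumption: the FCS is a modulo-256 sum over the bytes counted by LE
# (DA, SA, FC, and the PDU); the 11-bit UART cell framing is left to hardware.
SD2 = 0x68   # start delimiter for variable data length
ED = 0x16    # end delimiter


def build_sd2_frame(da: int, sa: int, fc: int, pdu: bytes) -> bytes:
    if not 1 <= len(pdu) <= 244:
        raise ValueError("PDU must be 1..244 bytes")
    body = bytes([da, sa, fc]) + pdu          # the bytes counted by LE
    le = len(body)
    fcs = sum(body) % 256                     # assumed simple checksum (see lead-in)
    return bytes([SD2, le, le, SD2]) + body + bytes([fcs, ED])


if __name__ == "__main__":
    # Hypothetical request: master address 2 to slave 5, with 3 bytes of output data.
    frame = build_sd2_frame(da=5, sa=2, fc=0x7D, pdu=b"\x01\x02\x03")
    print(frame.hex(" "))
```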
In addition to the station-related user data communication, which is handled automatically by the DPM1, the master can also send control commands to all slaves or to a group of slaves simultaneously. These control commands are transmitted as multicast messages and enable the sync and freeze modes for event-controlled synchronization of the slaves [1], [7].

For safety reasons, DP must provide effective protective functions against incorrect parameterization or failure of the transmission equipment. For this purpose, the DP master and the slaves are fitted with monitoring mechanisms in the form of time monitors; the monitoring interval is defined during configuration.

Acyclic data communication is the key feature of version DP-V1. It forms the basis for parameterization and calibration of field devices over the bus during runtime and for the introduction of confirmed alarm messages. Transmission of acyclic data is executed in parallel with cyclic data communication, but with lower priority. Figure 10.7 shows some sample communication sequences for a class 2 master, which uses MS2 services. Using MS1 services, a class 1 master is also able to execute acyclic communication.

Slave-to-slave communication (DP-V2) enables direct and time-saving communication between slaves using broadcast communication, without the detour over a master. In this case the slaves act as publishers; i.e., the slave response does not go through the coordinating master but directly to other slaves embedded in the sequence, the so-called subscribers (Figure 10.8). This enables slaves to read data directly from other slaves and use them as their own inputs. It opens up the possibility of completely new applications and also reduces response times on the bus by up to 90%.

The isochronous mode (DP-V2) enables clock-synchronous control in masters and slaves, irrespective of the bus load. The function enables highly precise positioning processes with clock deviations of less than 1 ms. All participating device cycles are synchronized to the bus master cycle through a global control broadcast message. A special sign of life (a consecutive number) allows monitoring of the synchronization.

Clock control (DP-V2), via a new master-slave service, synchronizes all stations to a system time with a deviation of less than 1 ms. This allows the precise tracking of events and is particularly useful for the acquisition of timing functions in networks with numerous masters. It facilitates the diagnosis of faults as well as the chronological planning of events.

Upload and download (DP-V2) allow any size of data area in a field device to be loaded with the help of a few commands. Within IEC 61158 these services are called load region. This enables, for example, programs to be updated or devices to be replaced without the need for manual loading processes.
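The time monitoring described above for protection against transmission failures can be pictured, on the slave side, as a simple watchdog: if no cyclic telegram from the assigned master arrives within the configured interval, the slave leaves data exchange and puts its outputs into a safe state. The sketch below is only a schematic illustration; the interval value and the fail-safe action are application specific and assumed here.

```python
# Schematic slave-side watchdog: fall back to a safe state when the configured
# monitoring interval elapses without a telegram from the assigned master.
import time


class SlaveWatchdog:
    def __init__(self, interval_s: float):
        self.interval_s = interval_s          # monitoring interval from configuration
        self.last_telegram = time.monotonic()

    def telegram_received(self) -> None:
        """Call on every cyclic telegram from the assigned DPM1."""
        self.last_telegram = time.monotonic()

    def expired(self) -> bool:
        return time.monotonic() - self.last_telegram > self.interval_s


def fail_safe_outputs() -> bytes:
    return b"\x00\x00"                        # application-specific safe values (assumed)


if __name__ == "__main__":
    wd = SlaveWatchdog(interval_s=0.1)        # example: 100 ms monitoring interval
    time.sleep(0.15)                          # simulate a missing master telegram
    if wd.expired():
        outputs = fail_safe_outputs()         # leave data exchange, outputs go safe
        print("watchdog expired, outputs set to", outputs.hex())
```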
FIGURE 10.7 Cyclic and acyclic data communication with DP-V1: the class 1 master accesses slaves 1 to 3 cyclically, while a class 2 master uses the token to access slave 3 acyclically.

FIGURE 10.8 Slave-to-slaves data exchange: a publisher slave (e.g., a light curtain) sends its input data via broadcast to subscriber slaves (e.g., drives).
Addressing with slot and index is used for both cyclic and acyclic communication services (Figure 10.9). When addressing data, PROFIBUS assumes that the physical structure of the slaves is modular or can be structured internally in logical functional units, so-called modules (Figure 10.9). The slot number addresses the module and the index addresses the data records assigned to a module. Each data record can be up to 244 bytes. The modules begin at slot 1 and are numbered in ascending contiguous sequence. The slot number 0 is for the device itself. Compact devices are regarded as units of virtual modules. These can also be addressed with slot number and index.
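The slot/index model translates directly into an addressing tuple for the acyclic read and write services. The sketch below shows the idea with a purely in-memory stand-in for a modular slave; read_record and the example record contents are hypothetical placeholders for the real MS1/MS2 read service of a DP-V1 stack.

```python
# Slot/index addressing of data records in a modular slave (conceptual sketch).
# `read_record` is a stand-in for the acyclic MS1/MS2 read service of DP-V1.

# Slot 0 addresses the device itself; modules are numbered from slot 1 upward.
# Each (slot, index) pair addresses one data record of up to 244 bytes.
slave_records: dict[tuple[int, int], bytes] = {
    (0, 0): b"device-level record",        # hypothetical example contents
    (1, 5): b"record 5 of module 1",
    (2, 0): b"record 0 of module 2",
}


def read_record(slot: int, index: int) -> bytes:
    """Return the data record addressed by (slot, index)."""
    if not 0 <= index <= 255:
        raise ValueError("index must be 0..255")
    try:
        return slave_records[(slot, index)]
    except KeyError:
        raise LookupError(f"no data record at slot {slot}, index {index}") from None


if __name__ == "__main__":
    print(read_record(1, 5))   # b'record 5 of module 1'
```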
10.4 Application Profiles

Profiles are used in automation technology to define specific properties and behavior for devices, device families, or entire systems. Only devices and systems using a vendor-independent profile provide interoperability on a fieldbus, thereby fully exploiting the advantages of a fieldbus. Profiles take into account
FIGURE 10.9 Slot/index address model of a slave: slot 0 addresses the device itself and the modules occupy slots 1 upward (e.g., 8 digital out, 16 digital in, 8 digital out, 1 analog in); within each slot, the index (0 to 254, with index 255 reserved for the "load region" call mechanism, including the I&M functions) addresses data records for acyclic MS1 and MS2 operation.
application- and type-specific special features of field devices, controls, and methods of integration (engineering). The term profile ranges from just a few specifications for a specific device class to comprehensive specifications for applications in a specific industry. The generic term for all profiles is application profiles. A distinction is then drawn between general application profiles with implementation options for different applications (this includes, for example, the profiles identification and maintenance (I&M) functions, PROFIsafe, redundancy, and time stamp); specific application profiles, which are developed for a specific application, such as PROFIdrive, SEMI (Semiconductor Equipment and Materials International), or process automation (PA) devices; and system and master profiles, which describe specific system performance that is available to field devices. PROFIBUS offers a wide range of such application profiles, which allow application-oriented implementation.
10.4.1 General Application Profiles

Identification and maintenance (I&M) functions are mandatory for all PROFIBUS devices with MS1 or MS2 services. The main purpose of the I&M functions described hereinafter is to support the end user during the various scenarios of a device's life cycle, be it configuration, commissioning, parameterization, diagnosis, repair, firmware update, asset management, audit trailing, etc. They serve as a kind of electronic nameplate. The corresponding parameters are all stored in the same address space within the PROFIBUS slot/index address model. Using the call mechanism of the load region services [1] opens up an additional subindex address space of 65535 data records; the I&M functions are assigned the space between 65000 and 65199 for basic, profile-specific, and manufacturer-specific items. Table 10.1 itemizes the individual parameters.

The use of IDs may not be very helpful to the user if a tool displays the information directly out of the device. However, laptops and engineering tools nowadays normally have access to the Internet, at least temporarily. It is therefore quite easy to reference a central database (e.g., on the PROFIBUS Web server) and retrieve comprehensive and up-to-date information, even in the desired language (Figure 10.10).
TABLE 10.1 Basic Identification and Maintenance Functions

MANUFACTURER_ID (2 octets): I&M functions use company IDs instead of names. These IDs are harmonized with the list of the HART Foundation and comprise extensions for additional companies.
ORDER_ID (20 octets): Order number for a particular device type; for virtual modular devices, the root or highest level of the basic device.
SERIAL_NUMBER (16 octets): Unique identifier for a particular device (counter).
HARDWARE_REVISION (2 octets): Characterizes the edition of the hardware only.
SOFTWARE_REVISION (4 octets): Characterizes the edition of the software or firmware of a device or module. The structure supports coarse and detailed differentiation that may be defined by the manufacturer: Vx.y.z.
REV_COUNTER (2 octets): Indicates unplugging of modules or write access.
PROFILE_ID (2 octets): Indicates that the device or module corresponds to a particular PROFIBUS profile.
PROFILE_SPECIFIC_TYPE (2 octets): References a device class defined within a PROFIBUS profile.
IM_VERSION (2 octets): Version of the I&M functions implemented within a device or module.
IM_SUPPORTED (2 octets): Directory of the subset of I&M functions implemented within a device or module.
TAG_FUNCTION (32 octets): User-definable information about the "role" of the device within a plant facility.
TAG_LOCATION (22 octets): User-definable information about the "location coordinates" of the device within a plant facility.
INSTALLATION_DATE (16 octets): Indicates the date of installation or commissioning of a device or module, e.g., 1995-02-04 16:23.
DESCRIPTOR (54 octets): User-defined comments.
SIGNATURE (54 octets): Allows parameterization tools to store a "security" code as a reference for a particular parameterization session and audit trail tools to retrieve the code for integrity checks. Used for safety applications according to 21 CFR 11 [6] or hazardous machinery (PROFIsafe).
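Taken together, the fixed-length parameters of Table 10.1 from MANUFACTURER_ID through IM_SUPPORTED form a record that can be unpacked field by field. The sketch below does exactly that, using only the field widths listed in the table; the byte order, the lack of header bytes, and the example record contents are assumptions that would have to be checked against the I&M specification.

```python
# Unpack the basic I&M parameters using the field widths of Table 10.1.
# Assumptions (to be checked against the I&M specification): big-endian
# numeric fields, ASCII text fields, and no header bytes before the record.
import struct

IM0_FORMAT = ">H20s16sH4sHHHHH"      # 2+20+16+2+4+2+2+2+2+2 = 54 octets
IM0_FIELDS = (
    "MANUFACTURER_ID", "ORDER_ID", "SERIAL_NUMBER", "HARDWARE_REVISION",
    "SOFTWARE_REVISION", "REV_COUNTER", "PROFILE_ID", "PROFILE_SPECIFIC_TYPE",
    "IM_VERSION", "IM_SUPPORTED",
)


def parse_basic_im(record: bytes) -> dict:
    values = struct.unpack(IM0_FORMAT, record[: struct.calcsize(IM0_FORMAT)])
    result = dict(zip(IM0_FIELDS, values))
    for key in ("ORDER_ID", "SERIAL_NUMBER"):
        result[key] = result[key].decode("ascii", "replace").strip()
    return result


if __name__ == "__main__":
    # Hypothetical example record with manufacturer ID 42 and made-up contents.
    record = struct.pack(
        IM0_FORMAT, 42, b"EXAMPLE-DEVICE".ljust(20), b"SN0001".ljust(16),
        0x0100, b"V1.0", 0, 0x3400, 1, 0x0101, 0x001E,
    )
    print(parse_basic_im(record)["MANUFACTURER_ID"])   # 42
```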
PROFIsafe is a comprehensive, open fieldbus solution for safety-relevant applications that needs neither a second, relay-based layer nor proprietary safety buses. PROFIsafe defines how fail-safe devices (emergency stop push buttons, light curtains, level switches, etc.) can communicate over PROFIBUS with fail-safe controllers in such a manner that they can be used for safety-relevant automation tasks up to category 4 in compliance with EN 954 (ISO 13849) or SIL3 (safety integrity level 3) according to IEC 61508. It implements safe communication over a profile, i.e., over a special PROFIsafe data frame and a special protocol. PROFIsafe is a single-channel software solution that is implemented in the devices as an additional layer above layer 7 (Figure 10.11); the standard PROFIBUS components, such as lines, application-specific integrated circuits (ASICs), and protocols, remain unchanged. This ensures redundancy mode and retrofit capability. Devices with the PROFIsafe profile can be operated without restriction in coexistence with standard devices on the same bus (cable). PROFIsafe takes advantage of acyclic communication (DP-V1) for full maintenance support of the devices and can be used with RS485, fiber-optic, or MBP transmission technology. This ensures both fast response times (important for the manufacturing industry) and low power consumption with intrinsically safe operation (important for process automation).

HART on PROFIBUS DP integrates HART devices installed in the field into existing or new PROFIBUS systems. It offers the benefits of the PROFIBUS communication mechanisms without requiring any changes to the PROFIBUS protocol and services, the state machines, or the functional characteristics. This profile is implemented in the master and slave above layer 7, thus enabling mapping of the HART client–master–server model onto PROFIBUS. The cooperation of the HART Foundation in the specification work ensures complete conformity with the HART specifications. The HART client application is integrated in a PROFIBUS master and the HART master in a PROFIBUS slave (Figure 10.12), whereby the latter serves as a multiplexer and handles communication to the HART devices.

The time-stamp application profile describes mechanisms for supplying certain events and actions with a time stamp, which enables precise time assignment. A precondition is clock control in the slaves through
FIGURE 10.10 Referencing IDs via the Internet: an engineering tool reads MANUFACTURER_ID = 42 from a slave via MS2 and resolves it against the ID table on the PROFIBUS Web server, downloading the table in XML format to update its local copy.
FIGURE 10.11 Safe communication on PROFIBUS DP: the PROFIsafe layer sits above the standard PROFIBUS DP protocol in both the fail-safe and standard applications, using standard PROFIBUS (RS485/MBP-IS) as a "black channel."
a clock master via special services. An event can be given a precise system time stamp and read out accordingly. The concept of graded messages is used. The message types are summarized under the term alerts and are divided into high-priority alarms (these transmit a diagnostics message) and low-priority events. In both cases, the master acyclically reads the time-stamped process values and alarm messages from the alarm and event buffer of the field device (Figure 10.13). The slave redundancy application profile provides a slave redundancy mechanism (Figure 10.14): slave devices contain two different PROFIBUS interfaces that are called primary and backup (slave interface). These may be in a single device or distributed over two devices. The devices are equipped with two independent protocol stacks with a special redundancy expansion.
FIGURE 10.12 Operating HART devices over PROFIBUS DP: the HART client application in the PROFIBUS master and the HART master in the PROFIBUS slave communicate through the HART profile above layer 7; HART communication continues from the slave to the HART server in the field device.
FIGURE 10.13 Time stamping and alarm messages: the master reads time-stamped process values and alarms acyclically (MS1) from the alarm and event buffer of the slave, in parallel with cyclic data exchange (MS0).
FIGURE 10.14 Slave redundancy in PROFIBUS: a redundant slave with primary and backup PROFIBUS interfaces, redundancy extensions, and redundancy communication (RedCom); the control system maintains a life list via FDL_Status on both lines.
FIGURE 10.15 Different requirements for distributed drive applications: depending on the model, the position control loop is closed via the bus by a central controller, target positions are passed to drives with their own position and velocity control, or technological instructions are distributed to drives that coordinate themselves via slave-to-slave communication.
A redundancy communication (RedCom) runs between the protocol stacks, i.e., within a device or between two devices. It is independent of PROFIBUS and its performance capability is largely determined by the redundancy reversing times. Only one device version is required to implement different redundancy structures, and no additional configuration of the backup slave is necessary. The redundancy of PROFIBUS slave devices provides high availability, short reversing times, and no data loss and ensures fault tolerance.
10.4.2 Specific Application Profiles

The PROFIdrive application profile defines the device behavior and the access procedure to drive data for electric drives on PROFIBUS, from simple frequency converters to highly dynamic servo controls. How drives are integrated in automation solutions depends strongly on the task of the drives (Figure 10.15). The more independently the drives act from central host controllers, the more they require slave-to-slave communication capabilities; conversely, the more the central host controllers take over the computing tasks, the more synchronization of the drives involved is required. For this reason, PROFIdrive defines six classes covering the majority of applications (Figure 10.16):

Standard drives (class 1): The drive is controlled by means of a main set-point value (e.g., rotational speed), with speed control carried out in the drive controller.

Standard drives with technological function (class 2): The automation process is broken down into several subprocesses, and some of the automation functions are shifted from the central programmable controller to the drive controllers. PROFIBUS serves as the technology interface in this case. Slave-to-slave communication between the individual drive controls is a requirement for this solution.

Positioning drive (class 3): Integrates an additional position controller in the drive, thus covering an extremely broad spectrum of applications (e.g., the twisting on and off of bottle tops). The positioning tasks are passed to the drive controller over PROFIBUS and started there.

Central motion control (classes 4 and 5): Enables the coordinated motion sequences of multiple drives. The motion is primarily controlled by a central numeric control (CNC); PROFIBUS serves to close the position control loop as well as to synchronize the clock (Figure 10.17). The position control concept (dynamic servo control) of this solution also supports extremely sophisticated applications with linear motors.

Distributed automation by means of clocked processes and electronic shafts (class 6): Can be implemented using slave-to-slave communication and isochronous slaves. Sample applications include electrical gears, cam discs, and angular synchronous processes.
FIGURE 10.16 PROFIdrive defines six application classes, from a standard drive controlled by a PLC (class 1) through drives with their own motion control and positioning (classes 2 and 3) to centralized and distributed motion control with synchronization (classes 4 to 6); the position, velocity, and current control loops are distributed between controller and drive accordingly.
FIGURE 10.17 Positioning with central interpolation and position control: the controller sends the control word and speed set point and receives the status word and actual position, while a global control telegram provides the isochronous clock to which the velocity loops of the drives and their encoders are synchronized.
In contrast to other drive profiles, PROFIdrive defines only the access mechanisms to the parameters and a subset of approximately 30 profile parameters, which include fault buffers, drive controllers, device identification, etc. All other parameters (which may number more than 1000 in complex devices) are manufacturer specific; this gives drive manufacturers great flexibility when implementing control functions.

The profile for PA devices defines all functions and parameters for different classes of process automation devices with local intelligence. Such devices can execute part of the information processing or even take over the overall functionality in automation systems. The profile covers all steps of a typical signal flow — from the process sensor signal to the preprocessed process value that is communicated to the control system together with a value qualifier (Figure 10.18).
FIGURE 10.18 Signal processing defined in the profile for PA devices: calibration, linearization and scaling, filtering, limit value check, default behavior, and operating mode selection turn the raw measurement into a measured value with a value qualifier for transmission to the control system.
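The signal flow of Figure 10.18 can be read as a short processing pipeline: the raw sensor value is calibrated, linearized and scaled, filtered, checked against limits, and delivered together with a value qualifier. The sketch below mirrors that chain in a simplified form; the concrete functions, parameter values, and status coding are illustrative stand-ins, not the profile's normative definitions.

```python
# Simplified mirror of the PA signal chain of Figure 10.18: calibration,
# scaling, filtering, limit check, and output of value plus qualifier.
# Functions and status coding are illustrative stand-ins, not the profile text.
from dataclasses import dataclass


@dataclass
class ProcessValue:
    value: float
    qualifier: str        # e.g., "good" or "uncertain" (illustrative coding)


class MeasurementChain:
    def __init__(self, offset=0.0, gain=1.0, lo_limit=0.0, hi_limit=100.0, alpha=0.3):
        self.offset, self.gain = offset, gain            # calibration / scaling
        self.lo_limit, self.hi_limit = lo_limit, hi_limit
        self.alpha, self._filtered = alpha, None         # simple first-order filter

    def process(self, raw: float) -> ProcessValue:
        calibrated = (raw - self.offset) * self.gain     # calibration + scaling
        if self._filtered is None:                       # filtering
            self._filtered = calibrated
        self._filtered += self.alpha * (calibrated - self._filtered)
        value = self._filtered
        if self.lo_limit <= value <= self.hi_limit:      # limit value check
            return ProcessValue(value, "good")
        return ProcessValue(value, "uncertain")          # default behavior on violation


if __name__ == "__main__":
    chain = MeasurementChain(offset=4.0, gain=6.25)      # e.g., 4..20 mA mapped to 0..100 %
    print(chain.process(12.0))                           # ProcessValue(value=50.0, qualifier='good')
```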
The profile for PA devices is documented in a general model description, containing the currently valid specifications for all device types, and in device data sheets, containing the agreed additional specifications for individual device classes. Version 3.0 of the profile for PA devices includes device data sheets for the measurement of pressure and differential pressure, level, temperature, and flow rate, as well as data sheets for valves, actuators, analyzers, and analog and digital inputs and outputs.

In process engineering it is common to use blocks to describe the characteristics and functions of a measuring or manipulating point at a certain control point and to represent an automation application through a combination of such blocks. The specification for PA devices therefore uses a function block model according to IEC 61804 to represent functional sequences, as shown in Figure 10.19. The blocks are implemented by the manufacturers as software in the field devices and, taken as a whole, represent the functionality of the device.
FIGURE 10.19 Block structure of a PA field device: the signals of one or more sensors pass from transducer blocks (conditioning) through function blocks (preprocessing) onto the bus toward the control system, while a physical block describes the device itself.
The following three block types are used:

A physical block (PB) contains the characteristic data of a device, such as device name, manufacturer, version, serial number, etc. There can be only one physical block in each device.

A transducer block (TB) contains all the data required to process the unconditioned signal delivered by a sensor before it is passed on to a function block. If no processing is required, the TB can be omitted. Multifunctional devices with two or more sensors have a corresponding number of TBs.

A function block (FB) contains all the data for the final processing of a measured value prior to transmission to the control system or, conversely, for processing a setting before it is applied. Different FBs are available:

Analog input (AI) — Delivers the measured value from the sensor/TB to the control system
Analog output (AO) — Provides the device with the value specified by the control system
Digital input (DI) — Provides the control system with a digital value from the device
Digital output (DO) — Provides the device with the value specified by the control system

The profile for ident systems defines complete communication and processing models for bar code readers and transponder systems. These are primarily intended for extensive use with DP-V1 functionality. While the cyclic data transmission channel is used to transfer small volumes of status/control information, the acyclic channel serves the transmission of the large data volumes that result from the information in the bar code reader or transponder. The definition of standard PROXY function blocks [5] according to IEC 61131-3 has facilitated the use of these systems and paves the way for open solutions once international standards such as ISO/IEC 15962 and ISO/IEC 18000 are completed.

The profile for weighing and dosage systems follows approaches similar to those of the ident systems. Communication and processing models are defined for four classes of devices or systems:

• Simple scale
• Comfort scale
• Continuous scale
• Batch scale
These new types of profiles will dramatically reduce the engineering costs and improve the bidding process during project execution.

10.4.2.1 Summary of Specific Application Profiles

PROFIdrive: The profile specifies the behavior of devices and the access procedure to parameters for variable-speed electrical drives on PROFIBUS DP.
PA devices: The profile specifies the data formats for cyclic data exchange and the characteristics for process engineering of devices for process automation.
Robots/NC: The profile describes how handling and assembly robots are controlled via PROFIBUS.
Panel devices: The profile describes the interfacing of simple human machine interface devices (HMIs) to control components.
Encoders: The profile describes the interfacing of rotary, angle, and linear encoders with single-turn or multiturn resolution.
Fluid power: The profile describes the control of hydraulic drives via PROFIBUS.
SEMI: The profile defines models of devices for semiconductor production such that they comply with the PA model and the SEMI model.
Low-voltage switchgear: The profile describes data exchange for low-voltage switchgear like circuit breakers, switches, and starters.
Weighing/dosage: The profile describes the communication and processing models for simple and comfort scales as well as for batch and continuous scales.
Ident systems: The profile describes the communication and processing models for bar code readers and transponders.
FIGURE 10.20 Master and system profiles for PROFIBUS DP: application profiles (discrete manufacturing, continuous manufacturing, motion control, safety) are based on one or more of the master/system profiles, which in turn support one or more of the application profiles.
Remote IO for PA devices: This profile takes into account the special conditions of physically modular slaves like limited communication resources and extreme cost sensitivity. The profile follows the model of PA devices as much as possible but has some simplifications.
10.4.3 Master and System Profiles

Master profiles for PROFIBUS describe classes of controllers, each of which supports a specific subset of all the possible master functionalities, such as cyclic and acyclic communication, diagnostics, alarm handling, clock control, slave-to-slave communication, isochronous mode, and safety. System profiles for PROFIBUS go a step further and describe classes of systems, including the master functionality, the possible functionality of standard program interfaces (FBs in accordance with IEC 61131-3, safety layer, and field device tool (FDT)), and integration options (general station description (GSD), electronic device description (EDD), and device type manager (DTM)). Figure 10.20 shows the standard platforms available today.

In the PROFIBUS DP system, the master and system profiles provide the much-needed counterpart to the application profiles: master and system profiles describe specific system parameters that are made available to the field devices, while application profiles require specific system parameters in order to simplify their defined characteristics. By using these profiles, device manufacturers can target existing or specified system profiles, and system manufacturers can provide the platforms required by the existing or specified device application profiles.
10.5 Integration Technologies

Modern field devices in both factory and process automation provide a wide range of information and also execute functions that were previously executed in PLCs and control systems. To execute these tasks, the tools for commissioning, maintenance, engineering, and parameterization of these devices require an exact and complete description of device data and functions, such as the type of application function, configuration parameters, ranges of values, units of measurement, default values, limit values, identification, etc. The same applies to the controller/control system, whose device-specific parameters and data formats must also be made known (integrated) to ensure error-free data exchange with the field devices. PROFIBUS has developed a number of methods and tools (integration technologies) for this type of device description, which enable standardization of device management.
FIGURE 10.21 Integration technologies for PROFIBUS DP: the GSD covers network configuration with simple, fixed configurations and start-up parameterization typical of discrete manufacturing; the interpreter-based EDD provides uniform device handling of low to middle complexity; and FDT/DTM programs provide device-specific handling of middle to high complexity, as needed for tool-based parameterization, diagnostics, and device tuning at runtime in process automation.
The performance range of these tools is optimized to specific tasks (Figure 10.21), which has given rise to the term scalable device integration. GSD and EDD are both a kind of electronic device data sheet, developed with different description languages according to their scope. A DTM, in contrast, is a software component containing specific field device functions for parameterization, configuration, diagnostics, and maintenance; it is used together with the universal software interface FDT, which is able to host such software components.

A GSD is an electronically readable ASCII text file that contains both general and device-specific specifications for communication and network configuration. Each of the entries describes a feature that is supported by a device. By means of keywords, a configuration tool reads the device identification (ID number), the adjustable parameters, the corresponding data types, and the permitted limit values for the configuration of the device from the GSD. Some of the keywords are mandatory, for example, Vendor_Name; others are optional, for example, Sync_Mode_supported. The GSD replaces the previously conventional manuals and supports automatic checks for input errors and data consistency, even during the configuration phase. A distinction is made between a device GSD (for an individual device only) and a profile GSD, which may be used for devices that comply exactly with a profile such as PROFIdrive version 3 or PA devices version 3. The GSD for compact devices, whose block configuration is already known on delivery, can be created completely by the device manufacturer; the GSD for modular devices, whose block configuration is not conclusively specified on delivery, must be configured by the user in accordance with the actual module configuration using the configuration tool. The device manufacturers are responsible for the scope and quality of the GSD of their devices. Submission of a profile GSD (containing the information from the profile of a device family) or an individual, device-specific GSD is essential for certification of a device.

Like a GSD, an EDD is an electronic device data sheet, but it is developed using a more powerful and universal language, the electronic device description language (EDDL). An EDD typically describes the application-related parameters and functions of a field device, such as configuration parameters, ranges of values, units of measurement, default values, etc. An EDD is a versatile source of information for engineering, commissioning, runtime, asset management, and documentation. It also contains support mechanisms to integrate existing profile descriptions in the device description, to allow references to existing objects, to access standard dictionaries, and to allow assignment of the device description to a device. An EDD is independent of the operating system and supports both the user, through a uniform user and operation interface (only one tool, reliable operation, reduced training and documentation costs), and the device manufacturer (no specific programming knowledge required, existing EDDs and libraries can be used).
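Returning to the GSD: because it is plain ASCII text built from keyword entries, a configuration tool can read it with very little machinery. The sketch below parses a few keyword lines of the kind mentioned above (Vendor_Name, the device ID number, Sync_Mode_supported); the exact file syntax, the section structure, the comment character, and the full set of mandatory keywords are defined in the GSD specification and are only approximated here.

```python
# Toy reader for GSD-style keyword entries (the real syntax is richer; see the
# GSD specification). Assumed line form:  Keyword = value  ; optional comment
SAMPLE_GSD = """
Vendor_Name = "Example Corp"        ; mandatory keyword
Ident_Number = 0x1234               ; device ID number
Sync_Mode_supported = 1             ; optional keyword
"""


def parse_gsd(text: str) -> dict[str, str]:
    entries = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()     # drop comments and whitespace
        if not line or "=" not in line:
            continue
        keyword, value = (part.strip() for part in line.split("=", 1))
        entries[keyword] = value.strip('"')
    return entries


def check_mandatory(entries: dict[str, str],
                    mandatory=("Vendor_Name", "Ident_Number")) -> list[str]:
    """Return the assumed-mandatory keywords that are missing (empty list = consistent)."""
    return [kw for kw in mandatory if kw not in entries]


if __name__ == "__main__":
    gsd = parse_gsd(SAMPLE_GSD)
    print(gsd["Vendor_Name"], check_mandatory(gsd))   # Example Corp []
```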
FIGURE 10.22 Certification procedure: the test device is tested in a PROFIBUS test laboratory and, if the tests are passed, certified through PROFIBUS International.
The EDD concept is suitable for tasks of low to middle complexity.

A DTM is software that is generated by mapping the specific functions and dialogs of a field device for parameterization, configuration, diagnostics, and maintenance, complete with user interface, into a software component. This component, called a DTM, is integrated in the engineering tool or control system over the FDT interface. A DTM uses the routing function of an engineering system for communicating across the hierarchical levels. It works much like a printer driver, which the printer supplier includes in delivery and which must be installed on the PC by the user. The DTM is generated by the device manufacturer and is included in the delivery of the device. DTM generation may be performed using one of the following options:

• Specific programming in a higher programming language
• Reuse of existing components or tools through their encapsulation as a DTM
• Generation from an existing device description using a compiler or interpreter
• Use of the DTM tool kit of MS Visual Basic
With DTMs it is possible to obtain direct access to all field devices for planning, diagnostics, and maintenance purposes from a central workstation. A DTM is not a stand-alone tool, but an ActiveX component with defined interfaces. The FDT/DTM concept is protocol independent and, with its mapping of device functions in software components, opens up interesting new user options. The DTM/FDT concept is very flexible, resolves interface and navigation needs, and is suitable for tasks of middle to high complexity.
10.6 Quality Assurance

In order for PROFIBUS devices of different types and manufacturers to correctly fulfill tasks in the automation process, it is essential to ensure the error-free exchange of information over the bus. The requirement for this is a standard-compliant implementation of the communication protocol and application profiles by device manufacturers (Figure 10.22). To ensure that this requirement is fulfilled, PI
has established a quality assurance procedure whereby, on the basis of test reports, certificates are issued to devices that successfully complete the test. The basis for the certification procedure is the standard EN 45000. PROFIBUS International has approved manufacturer-independent test laboratories in accordance with the specifications of this standard. Only these test laboratories are authorized to carry out the device tests that form the basis for certification. The test procedure, which is the same for all test laboratories, is made up of several parts:

• The GSD/EDD check ensures that the device description files comply with the specification.
• The hardware test checks the electrical characteristics of the PROFIBUS interface of the device for compliance with the specifications. This includes terminating resistors, the suitability of the implemented drivers and other modules, and the quality of the line level.
• The function test examines the bus access and transmission protocol and the functionality of the test device.
• The conformity test forms the main part of the test. Its objective is to test the conformity of the protocol implementation with the standard.
• The interoperability test checks the test device for interoperability with the PROFIBUS devices of other manufacturers in a multivendor plant. It verifies that the functionality of the plant is maintained when the test device is added. Operation is also tested with different masters.

Once a device has successfully passed all the tests, the manufacturer can apply for a certificate from PROFIBUS International. Each certified device carries a certification number as a reference. The certificate is valid for 3 years but can be extended after further tests.
10.7 Implementation

For device development or implementation of the PROFIBUS protocol, a broad spectrum of standard components and development tools (PROFIBUS ASICs, PROFIBUS stacks, monitoring and commissioning tools) as well as services is available, enabling device manufacturers to realize cost-effective developments. A corresponding overview is available in the product catalog of PROFIBUS International [2].

PROFIBUS interface modules are ideal when a low or medium volume of devices is to be produced. These credit card-size modules implement the entire bus protocol and are fitted on the master board of the device as an additional module.

PROFIBUS protocol chips (single chips, communication chips, protocol chips) are recommended for an individual implementation in the case of a high volume of devices. The implementation of single-chip ASICs is ideal for simple slaves (IO devices): all protocol functions are already integrated on the ASIC, no microprocessor or software is required, and only the bus interface driver, the quartz crystal, and the power electronics are needed as external components. For intelligent slaves, parts of the PROFIBUS protocol are implemented on a protocol chip and the remaining protocol parts as software on a microcontroller. In most of the ASICs available on the market, all cyclic protocol parts, which are responsible for the transmission of time-critical data, have been implemented. For complex masters, the time-critical parts of the PROFIBUS protocol are likewise implemented on a protocol chip and the remaining protocol parts as software on a microcontroller. Various ASICs from different suppliers are currently available for the implementation of complex master devices; they can be operated in combination with many common microprocessors. An overview of commercially offered PROFIBUS chips and software (PROFIBUS stacks) is available at the PROFIBUS Web site [2]. For further information, please contact the suppliers directly.

Modem chips are available to achieve the low power consumption required when implementing a bus-powered field device with MBP transmission technology. Only a feed current of 10 to 15 mA over the bus cable is available for these devices, which must supply the overall device, including the bus interface and the measuring electronics. These modems take the required operating energy for the overall device from the MBP bus connection and make it available as the feed voltage for the other electronic
components of the device. At the same time, the digital signals of the connected protocol chip are converted into the bus signal of the MBP connection, which is modulated onto the energy supply.
10.8 Prospects
While fieldbuses have pioneered distributed automation in discrete and continuous manufacturing facilities over the past 15 years, Ethernet has gained great success in office automation. The technology has matured steadily and now offers a high degree of convenience and flexibility: high transmission speeds, easy-to-handle cables and connectors, efficient control protocols, network devices such as switches, and the tremendous success of the Internet. The fieldbus organizations are now eager to provide solutions for the steadily growing demand of the market. The solution from PROFIBUS International is PROFINET.
10.8.1 PROFINET CBA PROFINET CBA is a new automation concept that has emerged as a result of the trend in automation technology toward modular, reusable machines (mechatronic components) and plants with distributed intelligence. With its comprehensive design (uniform model for engineering, communication, and migration architecture to other communication systems, such as PROFIBUS and OPC), PROFINET CBA fulfills all the key demands of automation technology for: • Consistent communications from field level to corporate management level such as enterprise resource planning (ERP) and manufacturing execution systems (MES) using Ethernet • A vendor-independent plantwide engineering model for the entire automation landscape • Openness to other systems • Implementation of IT standards • Integration capability of PROFIBUS segments without changes PROFINET CBA is available as a specification and as operating system-independent source software for Ethernet based communications [9].
10.8.2 PROFINET IO
The PROFINET component model is ideal for intelligent field devices and programmable controllers with data format interfaces that can be standardized. Simple field devices with many IO signals do not fit into the engineering model of PROFINET CBA. PROFINET IO therefore offers an integration methodology based on the PROFINET communication protocols that lets a manufacturer of PROFIBUS DP slave devices switch over comfortably: the services defined for PROFIBUS DP are found in PROFINET IO, and more. The most essential feature of this integration is the use of distributed field devices whose input and output data are processed within the application program of a PLC.
10.8.3 The PROFINET Migration Model
This model allows the integration of PROFIBUS DP segments in PROFINET using proxies, which assume a proxy function for all the devices connected to PROFIBUS. This means that when rebuilding or expanding plants, the entire spectrum of PROFIBUS devices, including products of PROFIdrive and PROFIsafe, can be used unchanged, thus providing users with maximum investment protection. Proxy technology also allows the integration of other fieldbus systems (Figure 10.23). A second possibility is the use of PROFINET IO devices connected directly to the host controller (PLC) via PROFINET. Intelligent field devices may be connected directly as components. In this way, users can migrate from their current installations to PROFINET at their own pace.
FIGURE 10.23 The migration concept of PROFINET CBA and PROFINET IO (elements shown: PROFInet components, Ethernet, intelligent field device on Ethernet, PROFIBUS DP, PLC with distributed I/O on Ethernet, PLC with distributed I/O on PROFIBUS DP).
Abbreviations
ASIC      Application-specific integrated circuit
BIA       German Institute of Occupational Safety and Health
CBA       Component-based automation
CPU       Central processing unit
CRC       Cyclic redundancy check
DP        Decentralized peripherals
DPM1      PROFIBUS DP master class 1, usually a programmable logic controller
DPM2      PROFIBUS DP master class 2, usually a laptop or PC
DTM       Device type manager
DXB       Data exchange broadcast (slave-to-slaves communication)
EDD       Electronic device description
EMI       Electromagnetic interference
EN, prEN  European standard, preliminary European standard
FB        Function block
FDT       Field device tool
FISCO     Fieldbus intrinsically safe concept
FM        Factory Mutual Global, a commercial and industrial property insurance company with a unique focus on risk management (www.fmglobal.com)
GSD       General station description (electronically readable data sheet)
HMI       Human machine interface
HW        Hardware
IEC       International Electrotechnical Commission
IO        Input/output
ISO/OSI   International Organization for Standardization/Open Systems Interconnection (reference model)
MS0       Cyclic master-slave communication services of PROFIBUS DP
MS1/MS2   Acyclic master-slave communication services of PROFIBUS DP
NAMUR     Association of users of process control technology
OPC       OLE for process control
PA        Process automation
PC        Personal computer
PDU       Protocol data unit
PLC       Programmable logic controller
PTB       Physikalisch-Technische Bundesanstalt, the national institute of natural and engineering sciences and the highest technical authority for metrology and physical safety engineering of the Federal Republic of Germany (www.ptb.de)
SW        Software
UL        Underwriters Laboratories, Inc., an independent, not-for-profit product safety testing and certification organization (www.ul.com)
References [1] IEC 61158/61784: Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems, 2003. [2] PROFIBUS home page: www.profibus.com. [3] Optical transmission technology for PROFIBUS, version 2.0, 1999, PROFIBUS Order 2.021. [4] IEC/TS 60079-27: Electrical Apparatus for Explosive Gas Atmospheres: Part 27: Fieldbus Intrinsically Safe Concept (FISCO); Parts 11, 14, and 25: Constructional and installation requirements, 2002. [5] PROFIBUS communication and proxy function blocks according to IEC 61131-3, version 1.2, July 2001, PROFIBUS Order 2.182. [6] Food and Drug Administration: 21 CFR Part 11. [7] M. Popp, The New Rapid Way to PROFIBUS DP, PROFIBUS Order 4.072. [8] PROFIBUS System Description: Technology and Application, October 2002, free download from www.profibus.com or PNO Order 4.002. [9] PROFInet System Description: Technology and Application, November 2002, free download from www.profibus.com or PNO Order 4.132.
11 Principles and Features of PROFInet

Manfred Popp, Siemens AG
Joachim Feld, Siemens AG
Ralph Büsgen, Siemens AG

11.1 Introduction
11.2 PROFInet at a Glance: Decentralized Field Devices (PROFInet IO) • Distributed Automation (Component Model) • Communication • Network Installation • IT Integration • Fieldbus Integration
11.3 Decentralized Field Devices (PROFInet IO): Functional Scope • Device Model • Device Description (GSD) • Configuration and Data Exchange • Diagnostics
11.4 Distributed Automation: Technological Modules • PROFInet Components
11.5 Granularity of Technological Modules: PROFInet Engineering • Component Description (PCD) • Interconnection Editor • PROFInet Runtime
11.6 PROFInet Communication: Standard Communication with TCP/UDP • Real-Time Communication • Communication for PROFInet IO • Communication between Technological Modules
11.7 Installation Technology for PROFInet: PROFInet Cable Installation • Plug Connectors • Switches as Network Components
11.8 IT Integration: Network Management • Web Utilities
11.9 OPC: OPC DA (Data Access) • OPC DX (Data Exchange) • OPC DX and PROFInet
11.10 Integration of Fieldbus Systems: Migration Strategies • Integration by Means of Proxies • Integration of Fieldbus Applications • PROFInet and Other Fieldbus Systems
11.11 PNO Offer: Technology Development • Quality Measures • Technical Support
11.1 Introduction Automation technology is undergoing continuous change due to the ever-shorter innovation cycles for new products. The use of fieldbus technology in recent years has represented a significant innovation. It has enabled the migration of automation systems from centralized to decentralized systems. In this regard, PROFIBUS has set the standard as the market leader for more than 15 years.
Moreover, in today’s automation technology, information technology (IT) with established standards, such as Transmission Control Protocol (TCP)/Internet Protocol (IP) and Extensible Markup Language (XML), is increasingly dictating changes. Integration of information technology into automation is opening up significant advances in communication options between automation systems, far-reaching configuration and diagnostic options, and network-wide service functions. These functions have been a fixed component of PROFInet from the start. PROFInet is the open standard for industrial automation based on the Industrial Ethernet. PROFInet enables problem-free realization of distributed automation, integration of existing field devices, and operation of demanding, time-critical applications (such as motion control). In addition to utilization of IT technology, protection of investment also plays an important role with PROFInet. PROFInet enables existing fieldbus systems such as PROFIBUS to be integrated without modifications with existing devices. This protects the investments of plant operators, machinery/plant construction firms, and device manufacturers. Automation technology requirements are thoroughly covered by PROFInet. It was possible to transfer the many years of experience in the PROFIBUS sphere to PROFInet standardization. The use of open standards, simple handling ability, and integration into existing plant units have defined PROFInet from the start. PROFInet is currently integrated in IEC 61158. A long-term perspective is offered to users through continuous advancements in PROFInet. Costs incurred by plant or mechanical system engineers for installation, engineering, and start-up are minimized through the use of PROFInet. For plant operators, PROFInet enables plants to be easily expanded and achieve a high level of availability through independently operating plant units. Establishment of certification by the PROFIBUS User Organization (PNO) guarantees a high standard of quality for PROFInet products. This chapter describes in detail how experience gained with IT standards in the PROFIBUS sphere has been converted to PROFInet.
11.2 PROFInet at a Glance The motivation to create PROFInet comes from the user requirements outlined in Section 11.1 and the anticipated cost reduction resulting from manufacturer-independent, plantwide engineering. With PROFInet, a modern automation concept has emerged that is based on Ethernet and enables simple integration of existing fieldbus systems (in particular, PROFIBUS). This represents an important aspect for satisfying uniformity of requirements from the corporate management level to the field level.
11.2.1 Decentralized Field Devices (PROFInet IO)
Simple field devices are integrated in PROFInet using PROFInet IO and described by the familiar input/output (IO) view of PROFIBUS DP. Decentralized peripherals are also integrated in PROFInet using this approach. The essential feature of this integration is the use of decentralized field devices with their input and output data, which are processed in the programmable logic controller (PLC) user program. PROFInet IO describes a device model that differentiates slots and channels in a way similar to that of the model for PROFIBUS DP. The device properties are described by an XML-based description file (general station description (GSD)). PROFInet IO devices are engineered using the same approach that has long been familiar to system integrators of PROFIBUS DP. This includes assignment of the decentralized field devices to a controller during configuration. Productive data are then exchanged between the controller and the assigned field devices (Figure 11.1).
11.2.2 Distributed Automation (Component Model) Distributed automation systems typically consist of several subunits that act autonomously for the most part and coordinate with each other using signals for synchronization, sequence control, and information exchange.
FIGURE 11.1 Architecture of PROFInet IO.
FIGURE 11.2 Mechanical, electrical/electronic, and software aspects are combined in technological modules.
The PROFInet component model refers to these subunits as technological modules. The technological modules form an intelligent functional unit. Through the use of the component technology proven in the IT sphere, the overall functionality of a technological module is encapsulated in an associated software component. Such a component is modeled as an object and regarded as a black box. An outside technological component interface is defined in order for the component to communicate with other components within the distributed system. A distributed automation system designed in this way constitutes the prerequisite for modularization of plants and machinery, and hence for reuse of plant and machine parts. This significantly reduces engineering costs. A technological module is described in PROFInet within the component model using PROFInet component description (PCD). PCD is XML based and produced by either a component generator of a manufacturer-specific configuration tool or the PROFInet component editor. A manufacturer-neutral engineering concept is available for user-friendly configuration of a PROFInet system. Engineering of distributed automation systems distinguishes between programming of control logic for individual technological modules (manufacturer-specific configuration tools) and the technological configuration of the entire system (interconnection editor). A systemwide application is formed in three steps: create components, interconnect components, and download interconnection information (Figure 11.2).
11.2.3 Communication Different performance levels are available for PROFInet communication.
Parameters, configuration data, and interconnection information that are not critical with respect to time are transferred in PROFInet via the standard channel based on TCP/User Datagram Protocol (UDP) and IP. This satisfies the prerequisites for interfacing the automation levels with other networks (manufacturing execution systems (MES), enterprise resource planning (ERP)). For transfer of time-critical process data within the production plant, the real-time channel known as soft real time (SRT) is available. This channel is implemented as software on the basis of existing controllers. For isochronous applications, isochronous real-time (IRT) communication is available that enables cycle times of less than 1 ms and a jitter of less than 1 µs.
11.2.4 Network Installation PROFInet network installations are oriented toward specific requirements for Ethernet networks in industrial environments. The “PROFInet Installation Guideline” provides plant construction engineers and plant operators with simple rules for installing Ethernet networks and associated cabling. This guideline provides device manufacturers with clear specifications for device interfaces.
11.2.5 IT Integration The network management includes functions for administration of PROFInet devices in Ethernet networks. This includes the device configuration, network configuration, and network diagnostics. In the case of Web integration, PROFInet makes use of the Ethernet base technologies and enables access to a PROFInet component by means of standard Internet technologies. In order to preserve an open connection to other system types, PROFInet supports OPC DA (data access) and DX (data exchange).
11.2.6 Fieldbus Integration An important aspect of PROFInet is the seamless transition from existing fieldbus solutions such as PROFIBUS DP to Ethernet-based PROFInet. This contributes significantly to protection of investments by the device manufacturer, the plant construction/mechanical system engineer, and the end user (Figure 11.3). PROFInet offers two alternatives for integrating fieldbus systems: • Integration of fieldbus devices by means of proxies: The proxy is the representative for the lowerlevel field devices on the Ethernet. Through the proxy principle, PROFInet offers a completely transparent transition from existing to newly installed plant units. • Integration of whole fieldbus applications: A fieldbus segment represents a self-contained component. The representative for this component is the PROFInet device that operates a fieldbus such as PROFIBUS DP at a lower level. The entire functionality of a lower-level fieldbus is thereby implemented in the form of a component in the proxy, which is available on the Ethernet.
FIGURE 11.3 PROFIBUS systems, e.g., can be integrated in PROFInet using a proxy.
FIGURE 11.4 Device types in PROFInet IO.
11.3 Decentralized Field Devices (PROFInet IO) Decentralized field devices are integrated directly on the Ethernet using PROFInet IO. To accomplish this, the master–slave system familiar in PROFIBUS DP is transferred over to a provider–consumer model. Though all devices on the Ethernet have equal communication rights, the configuration specifies which field devices are assigned to a centralized controller. In this way, the familiar user view in PROFIBUS is transferred to PROFInet IO. IO signals are read in and processed by the PLC and then re-sent to the outputs.
11.3.1 Functional Scope PROFInet IO distinguishes between three device types: PN-IO controller, PN-IO device, and PN-IO supervisor (Figure 11.4): • PN-IO controller: A PLC on which the automation program runs • PN-IO device: Decentralized field device that is assigned to a PN IO controller (such as remote IO, valve terminals, frequency converters) • PN-IO supervisor: Programming device or PC with commissioning and diagnostic functions Data can be transferred between the IO controller and IO device by means of the following channels: • Cyclic user data via the real-time channel • Event-triggered interrupts (diagnostics) via real-time channel • Parameter assignment and configuration as well as reading of diagnostic information via the standard channel based on UDP/IP At the start, a communication relationship called application relation (IO-AR) is established between the IO controller and the IO device based on the acyclic UDP/IP channel (Figure 11.5). Then the IO
controller transfers the configuration data for the IO device over this established channel. Based on the configuration data, the correct operating mode is determined and the IO device is uniquely identified; then the high-speed, cyclic exchange of useful data via the real-time channel (IO-CR) is started. If a diagnostic event occurs (such as a wire break), an interrupt is sent to the IO controller via the high-speed, acyclic real-time channel (interrupt CR) for processing in the PLC program located on the IO controller.

FIGURE 11.5 Communication relationships in PROFInet IO governed by the consumer–provider model.

FIGURE 11.6 PROFInet IO device model is similar to that of PROFIBUS DP.
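To make the start-up sequence concrete, the following Python sketch simulates it at a very high level: the controller establishes the application relation, transfers configuration data, starts cyclic exchange, and handles a diagnostic interrupt. All class names, fields, and messages are illustrative only; a real implementation uses the RPC/UDP and real-time protocols defined in the PROFInet specification.

```python
# Minimal simulation of the PROFInet IO start-up sequence described above.
# All names are illustrative; real devices use the RPC/UDP and real-time
# protocols defined in the PROFInet specification.

class IODevice:
    def __init__(self, name):
        self.name = name
        self.configured = False
        self.inputs = {0: 0}          # channel -> process value

    def apply_configuration(self, config):
        # Parameters arrive over the acyclic UDP/IP channel (IO-AR).
        self.operating_mode = config["mode"]
        self.configured = True

    def cyclic_frame(self):
        # Cyclic input data sent over the real-time channel (IO-CR).
        return dict(self.inputs)

class IOController:
    def __init__(self):
        self.devices = []

    def establish_ar(self, device, config):
        # Application relation: configure the device before cyclic exchange.
        device.apply_configuration(config)
        self.devices.append(device)

    def cycle(self):
        # One cyclic data exchange; a real controller repeats this every few ms.
        for dev in self.devices:
            print(f"{dev.name}: inputs {dev.cyclic_frame()}")

    def on_diagnostic_alarm(self, device, alarm):
        # Interrupt CR: event-triggered diagnostics handled in the PLC program.
        print(f"alarm from {device.name}: {alarm}")

controller = IOController()
valve_island = IODevice("valve-island-1")
controller.establish_ar(valve_island, {"mode": "cyclic"})
controller.cycle()
controller.on_diagnostic_alarm(valve_island, "wire break on channel 0")
```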
11.3.2 Device Model A uniform device model has been specified for the PROFInet IO device (Figure 11.6). This model enables modular and compact field devices to be modeled. This model is oriented toward the main features in PROFIBUS DP and extends the advantages of these features into the future. An IO device with a modular configuration consists of slots in which modules are inserted. The modules contain channels over which process signals are read in or read out. The representative of the IO device is the interface module, which receives data from the IO controller and forwards it to the modules via the backplane bus. Conversely, it receives the process and diagnostic information from the modules via the backplane bus and forwards this information to the IO controller. Each IO device receives a global device identification that is uniquely assigned within the framework of PROFInet IO. This 32bit device ID number is divided into a 16-bit manufacturer identifier and a 16-bit device identifier. This device ID is assigned by the PROFIBUS User Organization (PNO).
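The split of the 32-bit device identification and the slot/module/channel hierarchy can be pictured with a short sketch. The vendor and device numbers and module names below are invented for illustration; only the 16-bit/16-bit split follows the description above.

```python
# Sketch of the PROFInet IO device identification and slot/module/channel model.
# The vendor and device numbers below are invented for illustration only.

def pack_device_id(vendor_id: int, device_id: int) -> int:
    """Combine the 16-bit manufacturer and 16-bit device identifiers."""
    return (vendor_id & 0xFFFF) << 16 | (device_id & 0xFFFF)

def unpack_device_id(packed: int) -> tuple[int, int]:
    return packed >> 16, packed & 0xFFFF

# A modular IO device: slots hold modules, modules provide channels.
device = {
    "device_id": pack_device_id(0x002A, 0x0105),
    "slots": {
        1: {"module": "DI 8x24VDC", "channels": [f"DI{i}" for i in range(8)]},
        2: {"module": "AO 2x4-20mA", "channels": ["AO0", "AO1"]},
    },
}

vendor, dev = unpack_device_id(device["device_id"])
print(f"vendor 0x{vendor:04X}, device 0x{dev:04X}")
for slot, mod in device["slots"].items():
    print(f"slot {slot}: {mod['module']} with {len(mod['channels'])} channels")
```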
11.3.3 Device Description (GSD)
As in PROFIBUS, a device description is used to integrate a PROFInet device into the configuration tool of an IO controller. The properties of an IO device are described in the form of a general station description (GSD), which contains all the necessary information:
• Properties of the IO device (e.g., communication parameters)
• Insertable modules (number and type)
• Configuration data for individual modules (e.g., 4- to 20-mA analog input)
• Parameters of the modules
• Error texts for diagnostics (e.g., wire break, short circuit)
XML is the description basis for the GSD of PROFInet IO devices. Because XML is an open, widespread, and accepted standard for describing data, appropriate tools and derived properties are automatically available, including: • Creation and validation through a standard tool
• Foreign language integration
• Hierarchical structuring
The GSD structure corresponds to ISO 15745 and consists of a header, the device description in the application layer (e.g., configuration data and module parameters), and the communication properties description in the transport layer.
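As a rough illustration of an XML-based device description, the sketch below parses a deliberately simplified, hypothetical fragment with the standard ElementTree module. The real GSD schema is defined by the PROFIBUS User Organization and is far more extensive; the element and attribute names here are not taken from it.

```python
# Simplified, hypothetical XML fragment in the spirit of an XML-based GSD;
# the real GSDML schema defined by the PROFIBUS User Organization is
# considerably richer.
import xml.etree.ElementTree as ET

GSD_EXAMPLE = """
<DeviceDescription>
  <Header vendorID="0x002A" deviceID="0x0105" name="Example Remote IO"/>
  <ApplicationLayer>
    <Module id="AI4" name="AI 4x4-20mA">
      <Parameter name="range" default="4-20mA"/>
      <DiagnosisText code="1">wire break</DiagnosisText>
      <DiagnosisText code="2">short circuit</DiagnosisText>
    </Module>
  </ApplicationLayer>
  <TransportLayer minCycleTime_ms="1"/>
</DeviceDescription>
"""

root = ET.fromstring(GSD_EXAMPLE)
header = root.find("Header")
print("device:", header.get("name"), header.get("deviceID"))
for module in root.iter("Module"):
    texts = {d.get("code"): d.text for d in module.iter("DiagnosisText")}
    print("module:", module.get("name"), "diagnosis texts:", texts)
```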
11.3.4 Configuration and Data Exchange The description files of the IO devices are imported into the configuration tool. IO addresses are assigned to the individual IO channels of the field devices. The IO input addresses contain the received process values. The user program evaluates and processes these values. The user program forms the IO output values and outputs them to the process via the IO output addresses. In addition, parameters are assigned to the individual IO modules or channels in the configuration tool, e.g., 4- to 20-mA current range for an analog channel. After conclusion of the configuration, the configuration data are downloaded to the IO controller. The IO devices are assigned and configured automatically by the IO controller and then enter into the cyclic data exchange (Figure 11.7).
11.3.5 Diagnostics
PROFInet IO supports a multilevel diagnostic concept that enables efficient fault localization and correction. When a fault occurs, the faulty IO device generates a diagnostic alarm to the IO controller. This alarm triggers a call in the PLC program to the appropriate program routine in order to be able to respond to the fault. If a device or module defect requires a complete replacement of the device or module, the IO controller automatically performs a parameter assignment and configuration of the new device or module. The diagnostic information is structured hierarchically:
• Slot number (module)
• Channel number
• Channel type (input/output)
• Coded fault cause (e.g., wire break, short circuit)
• Additional manufacturer-specific information
When an error occurs in a channel, the IO device generates a diagnostic alarm to the IO controller. This alarm triggers a call in the control program to the appropriate fault routine. After processing of the fault routine, the IO controller acknowledges the fault to the IO device. This acknowledgment mechanism ensures that sequential fault processing is possible in the IO controller.
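A minimal sketch of this hierarchical diagnostic information and the acknowledgment handshake might look as follows; the field names and fault codes are illustrative, not those of the PROFInet specification.

```python
# Sketch of the hierarchical diagnostic information and the acknowledgment
# handshake described above; field names and fault codes are illustrative.
from dataclasses import dataclass

@dataclass
class DiagnosticAlarm:
    slot: int                 # module
    channel: int
    channel_type: str         # "input" or "output"
    fault_code: int           # e.g., 1 = wire break, 2 = short circuit
    vendor_info: bytes = b""  # additional manufacturer-specific data

FAULT_TEXT = {1: "wire break", 2: "short circuit"}

def handle_alarm(alarm: DiagnosticAlarm) -> str:
    """Called in the controller when an IO device raises a diagnostic alarm."""
    text = FAULT_TEXT.get(alarm.fault_code, "unknown fault")
    print(f"slot {alarm.slot}, channel {alarm.channel} ({alarm.channel_type}): {text}")
    # Acknowledging the alarm lets the device report subsequent faults in order.
    return "ACK"

ack = handle_alarm(DiagnosticAlarm(slot=2, channel=0, channel_type="input", fault_code=1))
print("controller response:", ack)
```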
FIGURE 11.7 Configuration path for data exchange.
11.4 Distributed Automation Automation development has given rise to modular plants and machinery. This structuring has triggered further development in automation to produce distributed automation systems. PROFInet also offers a solution in this area. The PROFInet solution involves separation of plant units into technological modules.
11.4.1 Technological Modules The function of an automated plant or machine — for a goods manufacturing process — is produced by a defined interaction of mechanical, electrical/electronic, and control logic/software aspects. According to this principle, PROFInet defines the mechanical, electrical/electronic, and control logic/software aspects for a technological module (see Figure 11.2). A technological module is in turn modeled by a software component, i.e., the PROFInet component.
11.4.2 PROFInet Components The representative of a technological module during plant engineering is the so-called PROFInet component. Each PROFInet component has an interface that contains the technological variables to be exchanged with other components. The PROFInet components are modeled using the standardized component object model (COM) technology. COM is an advanced object orientation that enables applications to be developed on the basis of preassembled components. The components are characterized by the formation of complete units that can be in relationship with other components. Like blocks, the components can be combined flexibly and easily reused, irrespective of how they are implemented internally. Access mechanisms to the component interfaces are uniformly defined in PROFInet.
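The idea of a technological module exposed through a declared interface of technological variables can be sketched as follows. The real component model is based on COM/DCOM, so this Python class is only a structural analogy with invented names.

```python
# Sketch of a technological module exposed as a component with a declared
# interface of technological variables; the real component model uses
# COM/DCOM, so this is only a structural analogy.
class ProfinetComponent:
    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = {v: None for v in inputs}    # variables set by peers
        self.outputs = {v: None for v in outputs}  # variables offered to peers

    def interface(self):
        """The externally visible interface: variable names and directions."""
        return {"in": list(self.inputs), "out": list(self.outputs)}

conveyor = ProfinetComponent("Conveyor", inputs=["start", "speed_setpoint"],
                             outputs=["running", "fault"])
filler = ProfinetComponent("Filler", inputs=["bottle_present"], outputs=["done"])
print(conveyor.interface())
print(filler.interface())
```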
11.5 Granularity of Technological Modules When the granularity of modules is being specified, the ability to reuse the modules in different plants must be examined with cost and availability in mind. The objective is to be able to merge individual components into an overall plant as flexibly as possible according to the building block principle. On the one hand, with too fine a granularity the technological view of the plant can become more complex, resulting in higher engineering costs. On the other hand, with too coarse a granularity, the degree of reusability is reduced. This, in turn, results in higher implementation costs. The machine or system manufacturer creates the software component. The component design has a major influence on reduction of engineering and hardware costs and the time response of the automation system (Figure 11.8). When a component is being designed, the granularity (i.e., size of machine/plant or the specified machine parts/plant units) can extend from an individual device to a complete machine containing a number of devices.
FIGURE 11.8 Component creation is standardized in PROFInet.
FIGURE 11.9 Interconnected components.
11.5.1 PROFInet Engineering A manufacturer-neutral engineering concept has been created for user-friendly configuration of a PROFInet system. For one thing, this engineering concept enables development of configuration tools that can be used for components of different manufacturers. For another, it permits manufacturer-specific and application-specific function expansions. The engineering model distinguishes between programming of control logic for individual technological modules and the technological configuration of the overall plant. A plantwide application is generated in three stages. 11.5.1.1 Component Creation Components are created as an image of the technological modules by the machinery or plant construction firms. Devices are programmed and configured as before with the respective manufacturer-specific tools. That way, available user programs and the know-how of programmers and service personnel can continue to be used. Then the user software is encapsulated in the form of a PROFInet component. In so doing, a component description (PCD) in the form of an XML file is created. The contents of the component description are specified in PROFInet. These component descriptions are imported into the library of the interconnection editor. 11.5.1.2 Component Interconnection The created PROFInet components are moved from a library to an application and interconnected at the click of a mouse using the PROFInet interconnection editor (Figure 11.9). Interconnection replaces the previous labor-intensive programming of communication relationships with simple graphical configuration. During programming, detailed knowledge about the integration and sequence of communication functions in the device is required. At the time programming is performed, the following must already have been specified: which devices will communicate with one another, when the communication will occur, and which bus system will be used for the communication. By contrast, knowledge of the communication functions is not required during configuration because these functions run automatically in the devices. The interconnection editor consolidates the individual distributed applications on a plantwide basis. Operation of the interconnection editor is manufacturer neutral; that is, the editor interconnects any manufacturer’s PROFInet components. 11.5.1.3 Downloading Following component interconnection, the interconnection information as well as the code and configuration data for the components are downloaded to the PROFInet devices at the click of a mouse (Figure 11.10). As a result, each device knows all of its communication peers, communication relationships, and information to be exchanged. The distributed application can be executed afterward.
FIGURE 11.10 Downloading the interconnection information for PROFInet devices.
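The three engineering steps can be illustrated with a small sketch: components declare their variables, the interconnection editor produces a wiring list, and the download step distributes to each device only the interconnections it participates in. Component and variable names are invented for the example.

```python
# Sketch of the three engineering steps: components are created, their
# variables interconnected, and the interconnection information downloaded
# so that each device knows its communication peers. Names are illustrative.
components = {
    "Conveyor": {"out": ["running"], "in": ["start"]},
    "Filler":   {"out": ["start_conveyor"], "in": ["conveyor_running"]},
}

# Step 2: interconnect output variables with input variables (editor output).
interconnections = [
    ("Filler.start_conveyor", "Conveyor.start"),
    ("Conveyor.running", "Filler.conveyor_running"),
]

# Step 3: "download" - each device receives only the wiring it is part of.
def download(interconnections):
    per_device = {}
    for source, target in interconnections:
        for endpoint in (source, target):
            device, variable = endpoint.split(".")
            assert variable in components[device]["in"] + components[device]["out"]
            per_device.setdefault(device, []).append((source, target))
    return per_device

for device, wiring in download(interconnections).items():
    print(device, "knows about", wiring)
```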
11.5.2 Component Description (PCD) The PROFInet component description (PCD) is an XML file, which is created by manufacturer-specific tools. This assumes that these tools have a component generator. Alternatively, a manufacturer-neutral PROFInet component editor, available for download on the PROFIBUS Web site, can be used to create the PCD file. The PCD file contains information on the functions and objects of the PROFInet component. Specifically, this information includes: • Description of components as library elements: component ID, component name • Hardware description: IP address storage, diagnostic data access, interconnection download • Description of software functionality: software hardware assignment, component interface, properties of variables such as technological name, data type, and direction (input or output) • Storage location of component project Component libraries are generated to support reusability.
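As a rough illustration, the sketch below generates a strongly simplified, hypothetical component description as XML with the standard ElementTree module; the actual PCD schema defined in the PROFInet specification contains considerably more information.

```python
# Sketch that writes a simplified, hypothetical component description (PCD)
# as XML; the real PCD schema is defined in the PROFInet specification and
# contains considerably more detail.
import xml.etree.ElementTree as ET

pcd = ET.Element("ComponentDescription",
                 id="{12345678-0000-0000-0000-000000000001}", name="Conveyor")
ET.SubElement(pcd, "Hardware", ipAddressStorage="device")
sw = ET.SubElement(pcd, "Software", project="conveyor_v1")
iface = ET.SubElement(sw, "Interface")
ET.SubElement(iface, "Variable", name="start", datatype="BOOL", direction="input")
ET.SubElement(iface, "Variable", name="running", datatype="BOOL", direction="output")

ET.ElementTree(pcd).write("conveyor.pcd.xml", xml_declaration=True, encoding="utf-8")
print(ET.tostring(pcd, encoding="unicode"))
```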
11.5.3 Interconnection Editor In general, an interconnection editor has two views: plant view and network view. In the plant view, required components are imported from the library and placed on the screen, and the individual interconnections are established (Figure 11.11). This yields a technological structure and its local relationships within a plant. By contrast, the topological structure of the automation system is created in the network view. Here, the field devices and automation devices are assigned to a communication system or bus system, and the device addresses are specified according to the rules for the underlying bus system (Figure 11.12).
11.5.4 PROFInet Runtime
The PROFInet runtime model defines the functions and utilities that cooperating automation components require to accomplish an automation task. This model establishes and monitors the interconnections between PROFInet components that have been configured in engineering. The PROFInet runtime model sets up a provider–consumer model in which the provider creates and sends data and the consumer receives and processes data.
FIGURE 11.11 Plant view in the interconnection editor shows the interconnected components.
FIGURE 11.12 Network view in the interconnection editor shows the connected field devices.
11.6 PROFInet Communication Ethernet-based communication is scalable in PROFInet. The following three performance levels are differentiated: 1. TCP/UDP and IP for data that are not time critical, such as parameter assignment and configuration data
2. Soft real time (SRT) for time-critical process data in factory automation
3. Isochronous real time (IRT) for particularly challenging application requirements, such as those for motion control
These three PROFInet communication performance levels cover the spectrum of automation applications in their overall diversity. The PROFInet communication standard is characterized in particular by the following:
• Coexisting utilization of real-time and TCP-based IT communication on one line
• A uniform real-time protocol for all applications, both for communication between components in distributed systems and for communication between the controller and the distributed field devices
• Scalable real-time communication, from high performance to very high performance and isochronous operation
Scalability and a uniform communication basis represent two of the major strengths of PROFInet. They guarantee continuity up to the corporate management level and fast response times in the automation process.
11.6.1 Standard Communication with TCP/UDP
PROFInet uses Ethernet and TCP/UDP with IP as the basis for communication. TCP/UDP with IP is a de facto standard for communication protocols in the IT landscape. However, for interoperability (i.e., the interaction between applications), establishment of a common communication channel between the field devices based on TCP/UDP (layer 4) is not sufficient. TCP or UDP represents only the foundation on which Ethernet devices can exchange data via a transport channel in local and distributed networks. Therefore, additional specifications and protocols beyond TCP/UDP, the so-called application protocols, are required. Interoperability is only guaranteed if the same application protocol is used on the devices. Typical application protocols are Simple Mail Transfer Protocol (SMTP, for e-mail), File Transfer Protocol (FTP, for file transfer), and Hypertext Transfer Protocol (HTTP, used on the Internet).
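The point that TCP or UDP alone is not enough for interoperability can be demonstrated with a small sketch: two endpoints exchange a tiny, made-up request/response format over UDP on the local host. The message layout is invented for the example and has nothing to do with a real PROFInet or IT application protocol.

```python
# UDP only moves bytes; both ends must also agree on an application protocol.
# The tiny "read variable" message format below is invented for illustration.
import socket
import struct
import threading
import time

PORT = 50999
REQUEST_FMT = "!H16s"   # request id + fixed-width variable name (invented format)
REPLY_FMT = "!Hi"       # request id + signed 32-bit value

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", PORT))
        data, addr = sock.recvfrom(1024)
        req_id, name = struct.unpack(REQUEST_FMT, data)
        print("server: request for", name.rstrip(b"\x00").decode())
        value = 42  # pretend to look up the requested process variable
        sock.sendto(struct.pack(REPLY_FMT, req_id, value), addr)

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)  # give the server a moment to bind before sending

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.settimeout(2.0)
    sock.sendto(struct.pack(REQUEST_FMT, 1, b"motor_speed"), ("127.0.0.1", PORT))
    req_id, value = struct.unpack(REPLY_FMT, sock.recv(1024))
    print(f"client: request {req_id} answered with value {value}")
```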
11.6.2 Real-Time Communication Real-time applications in manufacturing automation require update, or response, times in the range of 5 to 10 ms. The update time refers to the time that elapses when a variable is generated by an application in a device, then sent to a peer device via the communication system, and then received updated by the application. For devices, the implementation of a real-time communication causes only a small load on the processor so that execution of the user program continues to take precedence. From experience, in the case of Fast Ethernet (100 Mb/s Ethernet), the transmission rate on the line in proportion to the execution in the devices can be disregarded. Most of the time gets lost in the application. The time it takes to provide data to the application of the provider is not influenced by the communication. This also applies to processing of data received in the consumer. As a result, noteworthy improvements in update rates, and thus the real-time performance, can be obtained primarily through proper optimization of the communication stack in the provider and consumer. 11.6.2.1 Soft Real Time (SRT) In order to satisfy real-time demands in automation, PROFInet uses an optimized real-time communication channel (Figure 11.13). This channel is based on the Ethernet (layer 2). The solution minimizes the throughput time in the communication stack considerably and increases performance in terms of the process data update rate. By doing away with several protocol layers, the message frame length is reduced. By not including these layers, the data to be transmitted from the provider can be sent sooner, or are available earlier to the application on the consumer side. At the same time, this significantly reduces the processor power required for communication in the device.
FIGURE 11.13 Communication channels in PROFInet.
11.6.2.1.1 Optimized Data Transmission through Prioritization
In addition to the minimized communication stack in the automation devices, the transmission of data in the network is also optimized in PROFInet. In order to achieve an optimal result, the packets are prioritized in PROFInet according to IEEE 802.1Q. The network components use this priority to control the data flow between the devices. The standard priority for real-time data is Prio 6. Thus, priority handling over other applications, such as Internet telephony, which uses Prio 5, is guaranteed.
11.6.2.2 Isochronous Real Time (IRT)
The solutions presented so far are not sufficient for motion control applications in particular. These applications require update rates in the range of 1 ms along with a jitter for consecutive update cycles of 1 µs for cases involving up to 100 stations. To satisfy these requirements, PROFInet defines the IRT time slot-controlled transmission process on layer 2 for Fast Ethernet. That means every device knows exactly in which time slot it is allowed to send data over the bus. Through synchronization of the devices involved (network components and PROFInet devices) with the accuracy indicated above, a time slot can be specified during which data critical for the automation task are transferred. The communication cycle is split into a deterministic part and an open part. The cyclic real-time message frames are dispatched in the deterministic channel, while the TCP/IP message frames are transported in the open channel. The process is comparable to traffic on a highway where the left lane is reserved for time-critical traffic (real-time traffic) and the remaining traffic (TCP/IP traffic) is prevented from switching to this lane. Even if there is a traffic jam in the right lane, the time-critical traffic is not impacted. Isochronous data transmission is realized in hardware; e.g., it is implemented in an ASIC. Such an ASIC covers the cycle synchronization and time slot reservation functionality for the real-time data. Realization in hardware ensures that the accuracy requirements are achieved within the necessary order of magnitude. Furthermore, the processor in the PROFInet device is relieved of communication tasks, and the processing time freed up can be made available for automation tasks.
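The prioritization mentioned above relies on the IEEE 802.1Q tag in the Ethernet header. The sketch below assembles such a tagged header with priority 6 to show where the 3-bit priority field sits; the MAC addresses are placeholders, 0x8892 is the EtherType commonly associated with PROFInet real-time frames, and actually transmitting raw frames would require a raw socket and administrative privileges.

```python
# Sketch of an Ethernet header with an IEEE 802.1Q tag carrying priority 6,
# as used to prioritize real-time frames. MAC addresses are placeholders;
# this only shows where the 3-bit priority (PCP) field sits in the tag.
import struct

def vlan_tagged_header(dst_mac: bytes, src_mac: bytes, priority: int,
                       vlan_id: int, ethertype: int) -> bytes:
    tci = (priority & 0x7) << 13 | (vlan_id & 0x0FFF)   # PCP | DEI=0 | VID
    return dst_mac + src_mac + struct.pack("!HHH", 0x8100, tci, ethertype)

header = vlan_tagged_header(
    dst_mac=bytes.fromhex("010ecf000000"),   # placeholder destination
    src_mac=bytes.fromhex("020000000001"),   # placeholder source
    priority=6,                              # real-time traffic class
    vlan_id=0,
    ethertype=0x8892,                        # EtherType associated with PROFInet RT
)
print(header.hex())
```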
11.6.3 Communication for PROFInet IO For PROFInet IO, RPC based on UDP/IP is used in the start-up phase for initiation of the data exchange between devices, parameter assignment of distributed field devices, and diagnostics. Through the open and standardized RPC protocol, human machine interface (HMI) stations or engineering systems (IO supervisor) can also access PROFInet IO devices. The PROFInet real-time channel is employed for transmission of user data and alarms. In a typical IO configuration, there is an IO controller that exchanges the user data cyclically by means of communication relationships that are established during the start-up phase with multiple distributed
field devices (IO devices). In each cycle, the input data are sent from the assigned field devices to the IO controller, and in return, the output data are sent back to the appropriate field devices. The communication relationship is monitored by keeping track of the cyclic messages: if, for example, the cyclic messages of a device fail to arrive for three consecutive cycles, the IO controller recognizes that the corresponding IO device has failed. The data transmission layer of PROFInet is based on IEEE 802.3, which describes the protocol design and malfunction monitoring. A user data message frame consists of a minimum of 64 bytes and a maximum of 1500 bytes. The overall protocol overhead for real-time data is 28 bytes.

FIGURE 11.14 PROFInet communication between PROFInet components and PROFInet IO devices.
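The failure detection based on missed cycles can be sketched as a simple watchdog that declares a device failed after three consecutive cycles without a frame; the bookkeeping below is simplified and purely illustrative.

```python
# Sketch of the cyclic-data monitoring described above: if a device misses
# three consecutive cycles, the controller treats it as failed. A real stack
# does this bookkeeping per communication relation.
class CyclicWatchdog:
    MAX_MISSED = 3

    def __init__(self, device_names):
        self.missed = {name: 0 for name in device_names}
        self.failed = set()

    def cycle(self, frames_received):
        """Call once per cycle with the set of devices whose frame arrived."""
        for name in self.missed:
            if name in frames_received:
                self.missed[name] = 0
            elif name not in self.failed:
                self.missed[name] += 1
                if self.missed[name] >= self.MAX_MISSED:
                    self.failed.add(name)
                    print(f"{name} considered failed after {self.MAX_MISSED} missed cycles")

wd = CyclicWatchdog(["remote-io-1", "drive-7"])
wd.cycle({"remote-io-1", "drive-7"})
wd.cycle({"remote-io-1"})          # drive-7 misses one cycle
wd.cycle({"remote-io-1"})
wd.cycle({"remote-io-1"})          # third consecutive miss -> failure reported
```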
11.6.4 Communication between Technological Modules
Distributed COM (DCOM) is specified in the PROFInet component view as the common TCP/IP-based application protocol between PROFInet components (Figure 11.14). DCOM represents an enhancement to COM for the distribution of objects and their interaction in a network. DCOM is based on the standardized RPC protocol. DCOM is used for loading of interconnections, reading of diagnostic data, device parameter assignment and configuration, establishing interconnections, and, to some extent, exchanging user data between components in PROFInet. However, DCOM does not have to be used for the exchange of user data between PROFInet components. Whether user data exchange takes place via DCOM or the real-time channel is configured by the user in the engineering system; the devices then negotiate the use of a real-time-capable protocol. This is because communication between such plant or machinery modules can require real-time conditions that cannot be satisfied through TCP/IP and UDP. TCP/IP and DCOM form the common language that can be used to start communication between the devices in all cases. The PROFInet real-time channel is then used for real-time communication between individual stations in time-critical applications. In the configuration tool, the user can select the update time, the so-called quality of service, to determine whether values are transferred between components cyclically during operation or only when a change occurs. A cyclic transfer is more advantageous for high-frequency update times, since plain cyclic sending imposes less processor load than checking for changes and acknowledging them.
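The trade-off between cyclic and change-driven transfer can be illustrated with a short sketch that contrasts the two policies on a sample series of values; it ignores acknowledgments and network details and is not part of any PROFInet API.

```python
# Sketch contrasting the two configurable transfer modes between components:
# send every cycle, or send only when the value changes. For fast-changing
# values, plain cyclic sending avoids the per-sample change detection.
def cyclic_transfer(values):
    return list(values)                      # every sample is transmitted

def on_change_transfer(values):
    sent, last = [], object()
    for v in values:
        if v != last:                        # change detection adds per-sample work
            sent.append(v)
            last = v
    return sent

samples = [10, 10, 10, 12, 12, 13]
print("cyclic:   ", cyclic_transfer(samples))
print("on change:", on_change_transfer(samples))
```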
11.7 Installation Technology for PROFInet
The international standard ISO/IEC 11801 and its European equivalent EN 50173 define an application-neutral, information-oriented standard for networking a complex of buildings. The content of the two standards is essentially identical. Both standards assume buildings similar to an office environment and claim to be application neutral. The following specific requirements for Ethernet networks in the industrial environment are not taken into consideration in the two standards (Figure 11.15):
• Plant-specific cable routing
• Individual degree of networking for each machine/plant
• Linear network structures
• Robust industrial cable and connectors with special requirements for EMC, temperature, moisture, dust, and vibration
For this reason, the "PROFInet Transmission Technology and Cabling" guideline defines an industrial cable installation for the Fast Ethernet application based on the fundamental requirements in IEC 11801 (Figure 11.16).

FIGURE 11.15 Structure of Ethernet networks in office systems.

FIGURE 11.16 Differences in cable installation in office systems and automation systems:
Office area | Production area
Fixed basic installation in a building | Largely system-related cabling
Laid under raised floors | System-related cable routing
Variable device connection at workplace | Connection points are seldom changed
Prefabricated device connection cables | Field-preparable device connections
Tree-shaped network structures | Often line-form network structures and (redundant) ring structures
Large data packets (e.g., images) | Small data packets (measured values)
Medium network availability | Very high network availability
Moderate temperatures (from 0 to 50°C) | Extreme temperatures (from –20 to +70°C)
No moisture | Moisture possible (IP65)
Virtually no vibrations | Vibrating machines
Low EMC burden | High EMC burden
Low mechanical danger | Danger of mechanical damage
Virtually no chemical danger | Chemical burden from oily or aggressive atmospheres
11.7.1 PROFInet Cable Installation
11.7.1.1 PROFInet Cable Installation with Symmetrical Copper Cable
Signals are transmitted over symmetrical copper cable (twisted pair) in accordance with 100BASE-TX at a transmission rate of 100 Mbits/s (Fast Ethernet). The transmission medium is a two-pair, twisted-conductor, shielded copper cable (twisted pair or star quad) with a characteristic impedance of 100 ohms (Figure 11.17). Only shielded cables and connection elements are permitted. The individual components must satisfy the category 5 requirements according to IEC 11801, and the overall transmission line must satisfy the Class D requirements according to IEC 11801. Removable connections are produced using an RJ45 or M12 connector system. Sockets are used as device connections. Connecting cables (device connection cables, marshalling cables) are provided with suitable plugs at both ends. An active network component is used to connect all devices. To ensure that installation is as simple as possible, the transmission cable has been
defined to be identical at both ends. This connecting cable satisfies the function of a cable assembly with two identical ends. The maximum segment length is 100 m.

FIGURE 11.17 Ethernet networks in an industrial environment.

11.7.1.2 PROFInet Cable Installation with Optical Fibers
PROFInet can be operated with multimode or single-mode fiber-optic cables. Signals are transferred over two-strand optical fibers in accordance with 100BASE-FX at a transmission rate of 100 Mbits/s. The optical interfaces comply with the ISO/IEC 9314-3 (multimode) or ISO/IEC 9314-4 (single-mode) specifications. For applications external to the control cabinet, the outer sheath must satisfy the applicable requirements at the point of use (mechanical, chemical, thermal). The maximum segment length is 2 km for multimode and 14 km for single mode.
11.7.2 Plug Connectors
Plug connectors for M12 and RJ45 are available in PROFInet. Cables can easily be cut to length and terminated with them on site (Figure 11.18). The RJ45 type is used inside control cabinets in PROFInet and is compatible with office connectors. Plug connectors external to the control cabinet must notably accommodate industrial requirements; RJ45 types in IP65 or IP67 or the M12 type are utilized in this case (Figure 11.19). Fiber-optic connections are implemented according to ISO/IEC 11801, preferably with a duplex SC plug connector system, which is described in IEC 60874-14. Devices are equipped with the socket, and the connecting cable is equipped with the plug. Alternatively, the fiber-optic plug connector BFOC/2.5 in accordance with IEC 60874-10 can be used.
FIGURE 11.18 Example of an RJ45 plug connector in IP20.
FIGURE 11.19 Example of an RJ45 plug connector in IP67.
FIGURE 11.20 Example of a hybrid plug connector with RJ45 in IP67.
The hybrid plug connector is used in situations where decentralized field modules are connected using a combined plug connector containing data and power supply (Figure 11.20). A plug connector that is completely protected against accidental contact enables the use of plug connectors that are identical at both ends, because the integrated contact protection makes a pin–socket reversal unnecessary. This example involves an RJ45 in IP67 with a two-pair, shielded data cable for communication and four copper conductors for the voltage supply.
11.7.3 Switches as Network Components
PROFInet uses general switches as network components. Switches are devices that are situated in the transmission path between the end devices and that regenerate received signals and forward them selectively. They are used to structure networks. ISO/IEC 15802-3 contains the basic specifications. Switches suitable for PROFInet are sized for Fast Ethernet (100 Mbits/s, IEEE 802.3u) and full-duplex transmission. During full-duplex operation, a switch receives and sends data simultaneously at the same port. No collisions occur when switches are utilized. As a result, bandwidth is not lost to the Ethernet collision process, and network configuration is simplified significantly because route length checking is not required within a collision domain. To ensure compatibility with old plants or individual older end devices and hubs, 10BASE-T (10 Mbits/s, carrier-sense multiple access with collision detection (CSMA/CD)) is supported. Moreover, a PROFInet switch supports prioritized message frames in accordance with IEEE 802.1Q, standardized diagnostic paths, autopolarity exchange, autonegotiation, and autocrossover functions. Port mirroring for diagnostic purposes is optional. As a general rule, office-type switches cannot be used, even if they provide the functionality described above. Special switches are used for industrial applications: they are designed for harsh industrial conditions with respect to their mechanical (IP degree of protection, etc.) and electrical (24-V power supply, etc.) properties, and they must fulfill the EMC requirements for industrial machine applications to enable safe operation.
11.8 IT Integration
In addition to the automation functionalities described, the use of Ethernet as a communication medium in the PROFInet context enables IT functions to be integrated in PROFInet as well. As in the fieldbus world, Ethernet in connection with TCP/IP imposes additional network management requirements. In order to manage all technical aspects of integrating PROFInet devices in such networks, a concept for network management has been specified in PROFInet. Network infrastructure, IP management, network diagnostics, and time-of-day synchronization are the primary elements of the concept. Network management simplifies Ethernet administration and management through the use of standard protocols from the IT world.
The use of Internet technologies in automation systems represents an additional aspect. PROFInet has specified a concept under the scope of Web integration that enables access to a PROFInet component. This access is achieved by using Web utilities based on standard Internet technologies, such as HTTP, XML, and Hypertext Markup Language (HTML).
11.8.1 Network Management
Network management comprises all functions for administering the network, such as configuration (assignment of IP addresses), fault monitoring (diagnostics), and performance optimization.
11.8.1.1 IP Management
The use of TCP/IP in the PROFInet context means that an IP address has to be assigned to the network stations (i.e., PROFInet devices):
• Address assignment with manufacturer-specific configuration systems: This alternative is required because a network management system is not always available. In PROFInet, the Discovery and Basic Configuration Protocol (DCP) is specified, enabling IP parameters to be assigned using manufacturer-specific configuration/programming tools or during plantwide engineering (e.g., in the PROFInet interconnection editor). The use of DCP is mandatory for PROFInet devices. In this way, uniform behavior of PROFInet devices is ensured.
• Automatic address assignment with DHCP: The Dynamic Host Configuration Protocol (DHCP) has been established for assigning and managing IP addresses in office networks with network management systems. PROFInet provides for the use of this standard and describes how DHCP can be applied in a useful way in the PROFInet environment. Implementation of DHCP in PROFInet devices is optional.
11.8.1.2 Diagnostics Management
The reliability of network operation has a very high priority in network management. The Simple Network Management Protocol (SNMP) has been established in existing networks as the de facto standard for maintaining and monitoring network components and their functions. In order for PROFInet devices to be monitored with established management systems, it is useful to implement SNMP. SNMP provides for both read access from (monitoring, diagnostics) and write access to (administration) a device. To begin with, only read access to device parameters was specified in PROFInet. Like the IP management functions, SNMP is optional. When SNMP is implemented in components, only the usual standard information for SNMP is accessed (management information base 2 (MIB-2)). A specific diagnostic for PROFInet components is possible by means of the mechanisms described in the PROFInet specification. In this context, SNMP does not open an additional diagnostic path. Rather, SNMP enables integration in network management systems that do not normally process PROFInet-specific information.
11.8.2 Web Utilities In addition to the use of modern Ethernet-based technologies in PROFInet, it is also possible to access a PROFInet component using Web clients based on standard Internet technologies, such as HTTP, XML, HTML, or scripting. Data are transmitted in standardized format (HTML, XML) and through standardized front ends (browsers such as Netscape, MS Internet Explorer, Opera, etc.). This enables integration of information from PROFInet components into modern, multi-media-supported information systems. The advantages of Web integration in the IT sphere — such as utilization of browsers as uniform user interfaces, access to information on any number of clients from any location, platform independence of clients, and reduced effort for installing and servicing software on the client side — are thus also made available for PROFInet components.
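A very small example of the idea, using only the Python standard library, is an embedded web server that publishes device data as XML to any browser or web client. The URL path and XML layout are invented for this sketch and do not follow the PROFInet web integration specification.

```python
# Sketch of an embedded web server making device data available to standard
# web clients as XML, in the spirit of the web integration described above.
# The URL path and XML layout are invented for this example.
from http.server import BaseHTTPRequestHandler, HTTPServer

DEVICE_STATUS = {"name": "Conveyor", "running": "true", "fault": "none"}

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status.xml":
            self.send_error(404)
            return
        body = ("<device name='{name}'><running>{running}</running>"
                "<fault>{fault}</fault></device>").format(**DEVICE_STATUS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Browse to http://localhost:8080/status.xml to view the device status.
    HTTPServer(("", 8080), StatusHandler).serve_forever()
```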
11.8.2.1 Functional Properties Web integration of PROFInet has been designed with emphasis on commissioning and diagnostics. Webbased concepts can be used very effectively within these application areas. • No special tools are required to access components. Established standard tools can be used. • Global accessibility means that user support can be easily obtained when commissioning is performed by the component manufacturer. • Autodescription of components enables access with a standard tool without configuration information. Possible scenarios for PROFInet Web integration in the areas of commissioning and maintenance include testing and commissioning, overview of device database, device diagnostics, and plant and device documentation. The information provided should be represented in both a human-readable format (e.g., via a browser) and a machine-readable format (e.g., via an XML file). Both variants can be provided in integral form using the PROFInet Web integration. For certain information, the PROFInet Web integration also provides standardized XML schemes. 11.8.2.2 Technical Properties The basic component of Web integration is the Web server. The Web server forms the interface between the PROFInet object model and the basic technologies for Web integration. The PROFInet Web integration can be scaled according to the performance and properties of the Web server. That means that, in addition to a PROFInet device with an MS Internet Information Server or Apache Web Server, small PROFInet devices equipped with only an embedded Web server can also participate in the Web integration with equal rights. The Web integration for PROFInet has been created in such a way that it can be made available optionally for each device. Certain functions are optional and can be used for the device depending on its load capacity. This enables implementation of scalable solutions that are adapted to the respective use case to the maximum extent possible. The PROFInet-specific elements can be integrated seamlessly in an existing component Web implementation. Based on uniform interfaces and access mechanisms, the creator of a technological component can provide his technology-related data via the Web. Through the name space specified in the PROFInet Web integration and the addressing concept, elements of the PROFInet component object model (technological variables) can be referenced by the Web server. In this way, dynamic Web sites configured with current data from the PROFInet component can be created. 11.8.2.3 Scope The basic architecture model for the Web integration concept is shown in Figure 11.21. Web integration is optional for PROFInet.
FIGURE 11.21 Web integration enables Web access to PROFInet components.
Regarding the system architecture of an automation system with PROFInet, all architectural forms are supported by Web integration. In particular, the use of proxies for interfacing to any fieldbus is supported. The specification contains appropriate models that describe the relationships among the PROFInet components, the existing Web components, and the PROFInet Web integration elements. 11.8.2.4 Security The PROFInet Web integration specification has been created in such a way that access to the PROFInet devices is the same whether it occurs from the Internet or an intranet. As a result, all advantages of Web integration can be reaped even if the device itself is not connected to the Internet. In the case of local access, there is little risk for unauthorized access (comparable to today’s HMI systems). For networking within a larger factory or over the Internet, the PROFInet Web integration is based on a graded security concept. PROFInet Web integration recommends a security concept that has been optimized for the specific use case and includes one or more upstream security zones. No structural restrictions are imposed in the Web integration concept because the security measures are always arranged around PROFInet devices. As a result, PROFInet devices are not burdened, and the security concept can be optimized to the changing security requirements for a continuous automation solution. The best-practice recommendations for PROFInet Web integration contain scenarios and examples of how requirement-dependent security measures can be implemented around PROFInet devices. In this way, for example, security mechanisms can be used in the transport protocols (TCP/IP and HTTP). In addition, encoding, authentication, and access management can be scaled in the utilized Web servers. Advanced security elements such as application gateways can be added for Web services, if required.
11.9 OPC
The PROFInet component model and OPC have the same technological basis: DCOM. This yields user-friendly options for data interchange between different plant units. OPC is a widely used interface for data exchange between applications in automation systems. OPC enables flexible selection of stations of different manufacturers and data exchange between the stations without programming. OPC is not object oriented like PROFInet, but rather is tag oriented. That is, the automation objects do not exist as COM objects but as names (tags).
11.9.1 OPC DA (Data Access)
OPC DA (data access) is an industry standard that defines a uniform user interface to access process data. As a result of this standard, the following are harmonized: access to data of process and control devices, the locating of OPC servers, and simple browsing in the name spaces of the OPC servers.
11.9.2 OPC DX (Data Exchange)
OPC DX (data exchange) defines a communication standard for the higher-level exchange of non-time-critical user data on the system level between controllers of different manufacturers and types (e.g., between PROFInet and EtherNet/IP) (Figure 11.22). However, OPC DX does not permit any direct access to the field level of another system. OPC DX represents an expansion of the OPC DA specification and defines interfaces for interoperable data exchange and server-to-server communication in Ethernet networks. OPC DX is commonly used by the following:
• Users and system integrators who integrate devices, controllers, and software of different manufacturers and want to enable access to jointly used data in multivendor systems
• Manufacturers that want to offer products that are based on an open industry standard for interoperability and data exchange
FIGURE 11.22 Cross-system data exchange with OPC DA and OPC DX.
11.9.3 OPC DX and PROFInet
OPC DX was developed with the objective of achieving a minimum degree of interoperability between the different fieldbus systems and Ethernet-based communication protocols. In order to maintain an open connection to other systems, OPC DX was integrated in PROFInet. Integration is accomplished as follows:
• Each PROFInet node can be referenced as an OPC server because the basic capability already exists in the form of the PROFInet runtime implementation.
• Each OPC server can be operated as a PROFInet node by means of a standard adapter. This is accomplished by the OPC objectizer, a software (SW) component that implements a PROFInet device on the basis of an OPC server in a PC. This SW component need only be implemented once and can then be used for all OPC servers.
The functionality and performance of PROFInet are significantly greater than those of OPC. Moreover, PROFInet offers the required real-time capability for automation solutions. On the other hand, OPC exhibits a higher degree of interoperability.
11.10 Integration of Fieldbus Systems
PROFInet offers a model for integrating the existing PROFIBUS and other fieldbus systems. This enables any combination of fieldbus and Ethernet-based subsystems to be established. In this way, a continuous technology transfer from fieldbus-based systems to PROFInet is possible.
11.10.1 Migration Strategies
In view of the large number of existing PROFIBUS systems, it is essential for protection of investment that these systems can be easily integrated into PROFInet (migrated) without any changes (Figure 11.23). The following cases are distinguished:
• The plant operator would like to be able to easily integrate his existing installations into a new PROFInet automation concept that is to be installed.
• The plant construction engineer would like to be able to use his proven and documented range of devices for PROFInet automation projects without any changes.
• The device manufacturer or OEM would like to be able to integrate his existing field devices in PROFInet systems without expending any effort on changes.
FIGURE 11.23 PROFInet offers openness and protection of investment to device manufacturers and end customers.
PROFInet provides two ways to connect fieldbus systems:
• Integration of fieldbus devices by means of proxies
• Integration of fieldbus applications
11.10.2 Integration by Means of Proxies
The proxy concept in PROFInet enables simple, highly transparent integration of existing fieldbus systems (Figure 11.24). The proxy is the representative on the Ethernet for one or more fieldbus devices (e.g., on the PROFIBUS). This representative ensures transparent communication (no protocol tunneling) between the networks. For example, the representative forwards cyclical data to the fieldbus devices transparently. In the case of PROFIBUS DP, the proxy is, on the one hand, the PROFIBUS master that coordinates data exchange among the PROFIBUS stations and, on the other hand, an Ethernet station with PROFInet communication. A proxy can be implemented, for example, as a PLC or PC-based control or purely as a gateway. The DP slaves on the PROFIBUS are handled as IO devices in the PROFInet IO context. In the component view, the intelligent DP slaves are used as stand-alone PROFInet components. Within the PROFInet interconnection editor, such PROFIBUS components cannot be distinguished from the components on the Ethernet.
FIGURE 11.24 Principles of integration of individual fieldbus devices using a proxy: PROFIBUS example.
FIGURE 11.25 Principles of integration of fieldbus applications: PROFIBUS example.
The use of proxies enables transparent communication between devices on different bus systems.
11.10.3 Integration of Fieldbus Applications
An entire fieldbus application can be encapsulated as a PROFInet component within the framework of the component model. This is particularly important when an existing operating plant is to be expanded using PROFInet. Which fieldbus was used to automate the plant unit plays no role in this case (Figure 11.25). To communicate with the existing plant using PROFInet, the fieldbus master in the PROFInet component must be PROFInet capable. Consequently, the existing fieldbus mechanisms (e.g., PROFIBUS DP) are used within the component and the PROFInet mechanisms are used outside the component. This migration option ensures that the investment of the user (plant operators and plant construction engineers) in existing plants and cabling is protected. In addition, existing know-how in the user programs is protected. This enables a seamless transition to new plant units with PROFInet.
11.10.4 PROFInet and Other Fieldbus Systems
The proxy concept can be used to integrate other fieldbus systems besides PROFIBUS in PROFInet (e.g., Foundation Fieldbus, DeviceNet, CC-Link, etc.). A bus-specific image of the component interfaces must be defined and stored in the proxy for all possible data transmissions on each bus. This enables any fieldbus to be integrated in PROFInet with a manageable amount of effort.
11.11 PNO Offer
Optimal support by the PROFIBUS User Organization (PNO) is important for rapid dissemination of PROFInet in the market (Figure 11.26). In order to guarantee this, a comprehensive range of services and products has been established.
11.11.1 Technology Development
11.11.1.1 PROFInet IO
A specification is available for PROFInet IO, which provides a detailed description of the device model and the behavior of a field device in the form of protocols and communication sequences (so-called state machines). This type of description has already proven itself with PROFIBUS DP. The level of detail in the PROFInet IO specification permits the creation of standard protocol stacks by different stack suppliers. It can therefore be anticipated that different implementations will be offered by a number of firms. For example, Siemens offers an implementation in the form of a development package.
FIGURE 11.26 PROFIBUS User Organization (PNO) offer.
11.11.1.2 Component Model
Like PROFInet IO, the PROFInet component technology is available as a detailed specification. The specification includes all aspects: communication, device model, engineering, network management, Web integration, and fieldbus connection. In addition to the specification, the PROFIBUS User Organization is offering PROFInet software (in the form of source code) for the component technology. The PROFInet software includes the entire runtime communication. The combination of a specification and operating-system-independent software (as source code) has created the opportunity for easy, time-saving integration of PROFInet into a wide range of device operating system environments. The PROFInet runtime software is structured in such a way that it supports simple integration of existing application software into the runtime object model. Sample portings for Win32, Linux, VxWorks, and WinCE are already available for PROFInet. The PROFInet runtime software has a modular design and consists of various layers that must be adapted to each system environment. These adaptations are confined to the porting interfaces for the different functional parts of the environment, the operating system (e.g., WinCE), and the device application (e.g., PLC). Instructions for porting are available, enabling the device developer to easily understand the individual porting steps.
11.11.2 Quality Measures
From the start, PNO has developed PROFInet with the assurance that the entire life cycle, from the PROFInet specification to plant engineering, is supported by measures that guarantee a high level of quality in each phase.
11.11.2.1 QA for the Specification and Implementation Processes
The PROFInet specification and software are created in a multicompany working group (the PROFInet Core Team) whose mission is to ensure that the entire development process, ranging from the initial formulation of requirements to the release of the PROFInet software, is conducted under the auspices of quality management (QM). Quality measures are governed by a quality assurance (QA) manual adapted to the boundary conditions of the multicompany development team. This ensures that the source code corresponds to the quality management rules in effect at that time. The QA manual describes the process model to be applied. It defines the terms, methods, and tools that must be applied in the QA measures. It also specifies the responsibilities in the overall QA process.
Defect management represents a major component. Defect management includes a unique classification of errors and clearly understandable communication of error messages.
11.11.2.2 Testing and Certification
To ensure the interaction of all PROFInet devices and a high level of product quality from the start, a certification system has been established. It follows the model that has proven itself for PROFIBUS products. Certification tests by accredited test laboratories authorized by PNO form the core of the process. Testing by these competent test laboratories for the purpose of obtaining a certificate ensures that the products offered conform to the specifications and are free of errors.
11.11.2.3 Defect Database
In order for defects and requests from end customers and device manufacturers to be systematically addressed in the runtime software, a defect database has been set up by PNO. The database contains a record of all defects and their status. The entries in the database comply with the QA process rules.
11.11.3 Technical Support
An important factor for the success of PROFInet is that a sufficient number of PROFInet products from different manufacturers are available on the market within a short time.
11.11.3.1 Competence Center
The PROFInet Competence Center has been established to support the product development process. This ensures that porting to the different operating systems and adaptation to product-specific boundary conditions occur in an optimal manner, particularly in the initial phase when companies lack experience. The Competence Center builds up know-how in interested companies so that their development departments can develop additional products skillfully without additional support. The services of the PROFInet Competence Center also include a telephone hotline and customized workshops.
11.11.3.2 Tools
Device manufacturers require a tool for creating a component description of Ethernet devices in the form of an XML file. The PROFIBUS User Organization offers the PROFInet component editor (similar to the GSD editor in PROFIBUS DP) for download on its Web site (Figure 11.27). In order to prepare newly developed products for certification, PNO offers a PROFInet testing tool for download on its Web site (Figure 11.28). The PROFInet testing tool enables the device manufacturer to perform static tests prior to certification.
FIGURE 11.27 PROFInet component editor.
FIGURE 11.28 PROFInet test tool.
12
Dependable Time-Triggered Communication

Hermann Kopetz
Vienna University of Technology

Günther Bauer
Vienna University of Technology

Wilfried Steiner
Vienna University of Technology

12.1 Introduction
12.2 Fundamental Concepts
Model of Time • State Information vs. Event Information • Temporal Firewalls • Communication Interface
12.3 The Time-Triggered Architecture
Basic Services • The System Protocol TTP/C • The Fieldbus Protocol TTP/A
12.4 Fault Tolerance
Fault Containment • Error Containment in the Temporal Domain • Error Handling in the Value Domain • Virtual Networks
12.5 The Design of TTA Applications
Architecture Design • Component Design • Validation
12.6 Conclusions
Acknowledgments
References
12.1 Introduction
Clean and well-understood concepts are necessary for the design of every complex system. The Time-Triggered Architecture (TTA) covers such concepts and design principles that allow the implementation of large distributed real-time systems in high-dependability environments. Characteristic of the Time-Triggered Architecture is the concept of a sparse-time base that addresses physical time as a first-order quantity: each node in the distributed system holds a local clock, and the set of these local clocks is used to maintain a fault-tolerant global time base of known precision (which is the maximum deviation of any two local clock values of correct nodes). TTA takes advantage of the availability of this global time to simplify the communication and agreement protocols, to perform prompt error detection, to guarantee the timeliness of real-time applications, and to precisely specify the interfaces of the nodes not only in the value, but also in the temporal domain, where other architectures fail to provide the necessary information. Based on these precise interface specifications, TTA provides mechanisms and guidelines to partition a large application into nearly autonomous subsystems and sets up the computing infrastructure in order to control the complexity of the evolving artifact. By defining an architectural style that is observed at all component interfaces, the architecture avoids property mismatches at the interfaces and eliminates the need for unproductive “glue” code.
FIGURE 12.1 Sparse-time base (alternating durations of activity, a, and silence, s, along the real-time axis, delimited by the ticks of the global time).
This chapter aims to give a short, yet concise introduction to the Time-Triggered Architecture in general and time-triggered communication networks in particular. We will discuss the basic concepts of TTA in Section 12.2. Two representatives of time-triggered network protocols will be presented in Section 12.3. Section 12.4 will discuss fault tolerance aspects of TTA. Section 12.5 will elaborate on the design of TTA applications. Finally, the chapter ends with a conclusion in Section 12.6.
12.2 Fundamental Concepts
The time-triggered (TT) model of computation [6] is fundamental to the design of the Time-Triggered Architecture. The following sections will discuss the concepts of this architectural model. A more detailed description of the concepts can be found in [3].
12.2.1 Model of Time
The model of time of TTA is based on Newtonian physics. Real time progresses along a dense timeline, consisting of an infinite set of instants, from the past to the future. A duration (or interval) is a section of the timeline, delimited by two instants. A happening that occurs at an instant (i.e., a cut of the timeline) is called an event. An observation of the state of the world is thus an event. The time stamp of an event is established by assigning the state of the node-local global time to the event immediately after the event occurrence. This global time is established by a fault-tolerant clock synchronization algorithm. Due to the impossibility of synchronizing clocks perfectly and the denseness property of real time, there is always the possibility of the following sequence of events: clock of node j ticks, event e occurs, clock of node k ticks. In such a situation, the single event e is time-stamped by the two clocks j and k with a difference of one tick. In a distributed system, the finite precision of the global time base and the digitization of time make it, in general, impossible to consistently order events on the basis of their global time stamps. TTA solves this problem by the introduction of a sparse-time base [5, p. 55].
In the sparse-time model the continuum of time is partitioned into an infinite sequence of alternating durations of activity and silence, as shown in Figure 12.1. The duration of the activity interval, i.e., a granule of the global time, must be larger than the precision of the clock synchronization. From the point of view of temporal ordering, all events that occur within an interval of activity are considered to happen at the same time. Events that happen in the distributed system at different nodes at the same global clock tick are thus considered simultaneous. Events that happen during different durations of activity and that are separated by the required interval of silence can be consistently temporally ordered on the basis of their global time stamps. The architecture must make sure that significant events, such as the sending of a message, occur only during an interval of activity. The time stamps of events that are outside the control of the distributed computer system (and therefore happen on a dense timeline) must be assigned to an agreed duration of activity by an agreement protocol.
In TTA there exists a uniform external representation of time that is modeled according to the global positioning system (GPS) time representation. The time stamp of an instant is represented by an 8-byte integer, i.e., two words of a 32-bit architecture. The three lower bytes contain the binary fractions of the second, giving a granularity of about 60 ns. This is the accuracy that can be achieved with a precise GPS receiver. The five upper bytes count the full seconds. The external TTA epoch assigns the value 2^38 to the start of the GPS epoch, i.e., 00:00:00 universal time coordinated (UTC) on January 6, 1980. This offset has been chosen in order that instants before January 6, 1980, can also be represented by positive integers
in TTA. Thus, events that occurred between 8,710 years before January 1980 and 26,131 years after January 1980 can be time-stamped with an accuracy of 60 ns. There are different internal time representations in TTA that match the time format to the capabilities of the hardware (8-, 16-, 32-bit architectures) and the requirements of the application. Since not all time stamps are based on a global time with a precision of 60 ns, an attribute field is introduced in the external representation indicating the precision of a time stamp.
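The external time format described above can be made concrete with a short sketch. The following C fragment assumes one plausible reading of the format: the three low bytes hold binary fractions of a second (giving the roughly 60-ns tick), the five high bytes hold full seconds, and the epoch offset of 2^38 is counted in seconds; the function and macro names are illustrative.

#include <stdint.h>
#include <stdio.h>

#define TTA_FRAC_BITS    24                 /* three bytes of binary fractions of a second */
#define TTA_EPOCH_OFFSET (1ULL << 38)       /* seconds assigned to the start of the GPS epoch */

/* Convert seconds relative to the GPS epoch (may be negative) plus
   nanoseconds into the sketched external representation. */
uint64_t tta_time(int64_t sec_since_gps, uint32_t nanosec) {
    uint64_t seconds  = (uint64_t)((int64_t)TTA_EPOCH_OFFSET + sec_since_gps);
    uint64_t fraction = ((uint64_t)nanosec << TTA_FRAC_BITS) / 1000000000ULL;
    return (seconds << TTA_FRAC_BITS) | fraction;
}

int main(void) {
    /* One fraction tick corresponds to about 59.6 ns. */
    printf("tick = %.1f ns\n", 1e9 / (double)(1u << TTA_FRAC_BITS));
    printf("GPS epoch start = 0x%016llx\n", (unsigned long long)tta_time(0, 0));
    return 0;
}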
12.2.2 State Information vs. Event Information
The information that is exchanged across an interface is either state information or event information. Any property of a real-time (RT) entity (i.e., a relevant state variable) that is observed by a node of the distributed real-time system at a particular instant, e.g., the temperature of a vessel, is called a state attribute and the corresponding information state information. A state observation records the value of a state variable at a particular instant, the point of observation. A state observation can be expressed by the atomic triple
< Name, value, time of observation >
Example: State observation: “The position of control valve A was at 75˚ at 10:42 A.M.”
State information is idempotent and requires an at-least-once semantics when transmitted to a client. At the sender, state information is not consumed on sending, and at the receiver, state information requires an update-in-place and a nonconsuming read. State information is transmitted in state messages.
A sudden change of state of an RT entity that occurs at an instant is an event. Information that describes an event is called event information. Event information contains the difference between the state before the event and the state after the event. An event observation can be expressed by the atomic triple
< Name, value difference, time of event >
Example: Event observation: “The position of control valve A changed by 5˚ at 10:42 A.M.”
Event observations require exactly-once semantics when transmitted to a consumer. At the sender, event information is consumed on sending, and at the receiver, event information must be queued and consumed on reading. Event information is transmitted in event messages.
Periodic state observations and sporadic event observations are two alternative approaches for observing a dynamic environment in order to reconstruct the states and events of the environment at the observer. Periodic state observations produce a sequence of equidistant “snapshots” of the environment that can be used by the observer to reconstruct those events that occur with a minimum temporal distance that is longer than the duration of the sampling period. Starting from an initial state, a complete sequence of (sporadic) event observations can be used by the observer to reconstruct the complete sequence of states of the RT entity that occurred in the environment. However, if no minimum duration between events is assumed, the observer and the communication system must be infinitely fast.
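The different semantics of state and event information described above can be summarized in code. The following C fragment is only a sketch; the type and field names are illustrative assumptions and are not part of any TTA specification.

#include <stdint.h>
#include <stdbool.h>

/* State observation: <name, value, time of observation>.
   Non-consuming read, update-in-place at the receiver. */
typedef struct {
    uint16_t name;      /* e.g., identifier of control valve A */
    int32_t  value;     /* e.g., position in tenths of a degree */
    uint64_t t_obs;     /* point of observation (global time) */
} state_msg_t;

static state_msg_t latest;                                  /* update-in-place buffer */
void state_receive(const state_msg_t *m) { latest = *m; }   /* newer value overwrites */
state_msg_t state_read(void) { return latest; }             /* non-consuming read */

/* Event observation: <name, value difference, time of event>.
   Must be queued and consumed exactly once. */
typedef struct {
    uint16_t name;
    int32_t  delta;     /* state after minus state before */
    uint64_t t_event;
} event_msg_t;

#define QLEN 32
static event_msg_t q[QLEN];
static unsigned head, tail;

bool event_receive(const event_msg_t *m) {        /* enqueue at the receiver */
    if ((tail + 1) % QLEN == head) return false;  /* queue overflow */
    q[tail] = *m; tail = (tail + 1) % QLEN; return true;
}
bool event_read(event_msg_t *out) {               /* consuming read */
    if (head == tail) return false;
    *out = q[head]; head = (head + 1) % QLEN; return true;
}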
12.2.3 Temporal Firewalls
An extensible architecture must be based on a small number of orthogonal concepts that are reused in many different situations in order to reduce the mental load required for understanding large systems. In a large distributed system the characteristics of the interfaces between subsystems determine to a large extent the comprehensibility of the architecture. In TTA, the communication network interface (CNI; Figure 12.2) between a host computer and the communication network is the most important interface. The CNI appears in every node of the architecture and separates the local processing within a node from the global interactions among the nodes. The CNI consists of two unidirectional data flow interfaces, one from the host computer to the communication system and the other in the opposite direction.
FIGURE 12.2 Node of TTA (input/output subsystem, host processor with memory, operating system, and application software, and TT communication controller to/from the replicated communication channels, separated by CNIs).
We call a unidirectional data flow interface elementary if there is only a unidirectional control flow [7] across this interface. An interface that supports periodic state messages with error detection at the receiver is an example of such an elementary interface. We call a unidirectional data flow interface composite if even a unidirectional data flow requires a bidirectional control flow. An event message interface with error detection is an example of a composite interface. Composite interfaces are inherently more complex than elementary interfaces, since the correct operation of the sender depends on the control signals from all receivers. This can be a problem in multicast communication, where many control messages are generated for every unidirectional data transfer, and each one of the receivers can affect the operation of the sender.
The basic CNI of TTA as depicted in Figure 12.3 is an elementary interface. The time-triggered transport protocol carries state messages autonomously, driven by its time-triggered schedule, from the sender’s CNI to the receiver’s CNI. The sender can deposit the information into its local CNI memory according to the information push paradigm, while the receiver will pull the information out of its local CNI memory. From the point of view of temporal predictability, information push into a local memory at the sender and information pull from a local memory at the receiver are optimal, since no unpredictable task delays that extend the worst-case execution time occur during the reception of messages. A receiver that is working on a time-critical task is never interrupted by a control signal from the communication system. Since no control signals cross the CNI in TTA (the communication system derives the control signals for the fetch and delivery instants exclusively from the progress of global time and its local schedule), propagation of control errors is prohibited by design. We call an interface that prevents the propagation of control errors by design a temporal firewall [4]. The integrity of the data in the temporal firewall is assured by the nonblocking write (NBW) concurrency control protocol [5, p. 217].
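The nonblocking write protocol referenced above can be illustrated with a minimal sketch. The following C fragment shows the basic idea only (a single writer that never blocks and readers that retry until they obtain a consistent copy); a real implementation would additionally need memory barriers and the actual CNI data layout, and all names here are illustrative.

#include <stdint.h>
#include <string.h>

/* One CNI slot protected by the NBW scheme: a single writer (the
   communication controller) and any number of readers (the host). */
typedef struct {
    volatile uint32_t ccf;   /* concurrency control field: odd while a write is in progress */
    uint8_t data[16];        /* state message payload */
} nbw_slot_t;

/* Writer: never blocked by readers. */
void nbw_write(nbw_slot_t *s, const uint8_t *msg, size_t len) {
    s->ccf++;                        /* becomes odd: write in progress */
    memcpy(s->data, msg, len);
    s->ccf++;                        /* becomes even again: write complete */
}

/* Reader: retries until it has obtained a consistent snapshot. */
void nbw_read(const nbw_slot_t *s, uint8_t *out, size_t len) {
    uint32_t before, after;
    do {
        before = s->ccf;
        memcpy(out, s->data, len);
        after = s->ccf;
    } while ((before & 1u) != 0u || before != after);
}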
12.2.4 Communication Interface
From the point of view of complexity management and composability, it is useful to distinguish between three different types of interfaces of a node: the real-time service (RS) interface, the diagnostic and management (DM) interface, and the configuration and planning (CP) interface [8]. These interface types serve different functions and have different characteristics. For temporal composability, the most important interface is the RS interface.
12.2.4.1 The Real-Time Service Interface
The RS interface provides the timely real-time services to the node environment during the operation of the system. In real-time systems it is a time-critical interface that must meet the temporal specification of the application in all specified load and fault scenarios. The composability of an architecture depends
FIGURE 12.3 Data flow and control flow at a TTA interface (the sender pushes state messages into its CNI memory; the cluster communication system, driven by the global time, transports them; the receiver pulls them from its CNI memory).
on the proper support of the specified RS interface properties (in the value and temporal domains) during operation. From the user’s point of view, the internals of the node are not visible at the CNI, since they are hidden behind the RS interface.
12.2.4.2 The Diagnostic and Management Interface
The DM interface opens a communication channel to the internals of a node. It is used for setting node parameters and for retrieving information about the internals of the node, e.g., for the purpose of internal fault diagnosis. The maintenance engineer that accesses the internals of a node via the DM interface must have detailed knowledge about the internal objects and behavior of the node. The DM interface does not affect temporal composability. Usually, the DM interface is not time critical.
12.2.4.3 The Configuration and Planning Interface
The CP interface is used to connect a node to other nodes of a system. It is used during the integration phase to generate the “glue” between the nearly autonomous nodes. The use of the CP interface does not require detailed knowledge about the internal operation of a node. The CP interface is not time critical.
The CNI of TTA can be directly used as the real-time service interface. On input, the precise interface specifications (in the temporal and value domains) are the preconditions for the correct operation of the host software. On output, the precise interface specifications are the postconditions that must be satisfied by the host, provided the preconditions have been satisfied by the host environment. Since the bandwidth is allocated statically to the host, no starvation of any host can occur due to high-priority message transmission from other hosts. TTA implements an event-triggered communication service on top of the basic TT service to realize the DM and CP interfaces. Since the event-triggered communication is based on (but not executed in parallel to) the time-triggered communication, it is possible to maintain and to use all predictability properties of the basic TT communication service in event-triggered communication.
12.3 The Time-Triggered Architecture
The range of TTA’s services is understood best if put into a broader context: the integrated project Dependable Computer Systems (DECOS) aims to develop technologies to move from federated distributed architectures to integrated distributed architectures [1]. While federated basically means that each application subsystem is placed on an independent node, integrated architectures try to unite several application subsystems on a single node. A schematic overview of the DECOS approach for integrated distributed architectures is depicted in Figure 12.4. An application is divided into different distributed application subsystems (DASs); such subsystems could be, for example, the power train, a braking system, a steering system, etc. In a federated architecture each DAS would be implemented on a single node; an integrated architecture provides services that allow multiple DASs to be implemented on a single node. These services form the platform interface layer (PIL). Examples of PIL services are:
FIGURE 12.4 Structure of the DECOS integrated distributed architecture (distributed application subsystems DAS A to DAS D on top of the platform interface layer (PIL), the basic services, and different implementation platforms and choices).
• Encapsulation services
• Event-triggered communication
• Virtual networks
• Hidden gateways
• Provision of legacy interfaces
• Application diagnosis support
12.3.1 Basic Services
The PIL services rely on a set of validated basic services. TTA is a target architecture that provides the basic services.
12.3.1.1 Predictable Time-Triggered Transmission
The very basic principle of time-triggered communication is that transmission of messages is triggered by the clock rather than by the availability of new information; the so-called time-division multiple-access (TDMA) strategy is used. In an architecture using TDMA, time is split up into (nonoverlapping) pieces of not necessarily equal durations, which are called slots. These slots are grouped into sequences called TDMA rounds, in which every node occupies exactly one slot. The knowledge of which node occupies which slot in a TDMA round is static, available to all components a priori, and equal for all TDMA rounds. When the time of a node’s slot is reached, the node is provided exclusive access to the communications medium for the duration of the slot, t_i^slot, where 0 ≤ i < n (assuming there are n nodes in the system). The sending slot, t_i^slot, of a respective node i is split up into three phases: presend, transmit, and postreceive, where in the first phase preparations for the transmission are done, and the actual sending process is done in the second phase. During the postreceive phase the state of the nodes is updated according to the received messages. Durations between two consecutive transmit phases of succeeding nodes are called interframe gaps. The interframe gaps have to be chosen with respect to the postreceive phase and the different propagation delays of the messages on the channels. After the end of one TDMA round, the next TDMA round starts; that is, after the sending of the node in the last slot of a TDMA round, the node that is allowed to send in the first slot sends again. Consequently, each node sends predictably every t^round time units, where t^round = Σ_{i=0}^{n-1} t_i^slot.
12.3.1.2 Fault-Tolerant Clock Synchronization
It is widely understood that a common agreement on physical time throughout the complete system is necessary for distributed control applications. Since safety-critical systems shall not rely on a single point of failure, each fault-tolerant solution for clock synchronization requires a distributed solution. Typically,
we can distinguish three phases in a fault-tolerant clock synchronization algorithm [5]: In the first phase, each node that participates in clock synchronization acquires information on the local views on the global time in all other nodes. The required message exchange can be implemented either by the exchange of dedicated synchronization messages or by a priori knowledge of the transmission pattern of the regular message flow (implicit synchronization). In the second phase each node executes a convergence function based on the received deviation values from the different nodes. In the third phase a node adjusts its local timer that represents the local view on the global time by the output of the convergence function. The adjustment procedure can be implemented either as state correction, where the local timer is corrected at an instant, or as rate correction, where the local timer is corrected over an interval by accelerating or decelerating the speed of the local clock. More sophisticated clock synchronization algorithms take the stability of the drift of the node’s local clock into account and correct the rate of the clock in advance. Case studies show that a combination of a regular clock synchronization algorithm with a rate correction algorithm yields an impressive quality of the precision in the system.
A crucial phase of clock synchronization is the initial synchronization after power-on, when the nodes within a system are unsynchronized (since the power-on times of different nodes may vary, and thus the local clocks start to run at different points in time). Start-up algorithms have to be used to achieve a sufficient degree of initial synchronization. One possible solution for a start-up algorithm is a variation of a clock synchronization algorithm: after power-on, the local clocks of different nodes may be far apart, but successive rounds of message exchange and convergence should achieve a sufficient precision. However, if the exchange of messages, in particular synchronization messages, itself requires synchronization between the nodes, as is the case in time-triggered protocols, this solution cannot be implemented and a dedicated start-up algorithm has to be constructed.
12.3.1.3 Determinism
A definition of a timely and deterministic multicast channel is given in [9] by the following three properties:
1. Timeliness: Given that a message is sent at the send instant t_send, then the receive instants t_receive at all receivers of the (multicast) message will be in the interval [t_send + d_min, t_send + d_max], where d_min is called the minimum delay and d_max is called the maximum delay. The difference d_max - d_min is called the jitter of the communication channel. d_max and d_min are a priori known characteristic parameters of the given communication channel.
2. Constant order: The receive order of the messages is the same as the send order. The send order among all messages is established by the temporal order of the send instants of the messages as observed by an omniscient observer.
3. Agreed order: If the send instants of n (n > 1) messages are the same, then an order of the n messages will be established in an a priori known manner.
We call a communication channel that fulfills properties 2 and 3 ordinal deterministic. If a communication channel fulfills all properties stated above, this communication channel is temporal deterministic; thus, temporal determinism is a stronger form of determinism than ordinal determinism. We call a communication channel path deterministic if there exists an a priori known route from a sending to a receiving node. Path determinism and temporal determinism are therefore orthogonal properties.
12.3.1.4 Fault Isolation
In the field of fault-tolerant computing the notion of a fault containment region (FCR) is introduced in order to delimit the impact of a single fault. A fault containment region is defined as the set of subsystems that share one or more common resources. A fault in any one of these shared resources can thus impact all subsystems of the FCR; i.e., the subsystems of an FCR cannot be considered to fail independently of each other. In the context of this chapter we consider the following resources that can be impacted by a fault:
• Computing hardware
• Power supply
We call a communication channel path deterministic if there exists an a priori known route from a sending to a receiving node. Path determinism and temporal determinism are therefore orthogonal properties. 12.3.1.4 Fault Isolation In the field of fault-tolerant computing the notion of a fault containment region (FCR) is introduced in order to delimit the impact of a single fault. A fault containment region is defined as the set of subsystems that share one or more common resources. A fault in any one of these shared resources can thus impact all subsystems of the FCR; i.e., the subsystems of an FCR cannot be considered to fail independently of each other. In the context of this chapter we consider the following resources that can be impacted by a fault: • Computing hardware • Power supply
• Timing source
• Clock synchronization service
• Physical space
For example, if two subsystems depend on a single timing source, e.g., a single oscillator or a single clock synchronization algorithm, then these two subsystems are not considered to be independent and therefore belong to the same FCR. Since this definition of independence allows that two FCRs can share the same design, i.e., the same software, software faults are not part of this fault model. In TTA a node is considered to form a single FCR. An architecture for safety-critical systems has to ensure that a fault that affects one FCR is isolated so that it will not cause other FCRs to fail.
12.3.1.5 FCR Diagnosis (Membership)
The failure of an FCR must be reported to all other FCRs in a consistent manner within a short latency [5]. The membership service is a form of concurrent diagnosis that realizes such a detection service.
The time-triggered protocols TTP/C and TTP/A are concrete implementations of TTA services. TTP/C is designed for ultrahigh-dependability systems and thus tolerates either an arbitrary failure of any one of its nodes or a passive arbitrary failure of one of its channels (that means that even a faulty channel will not be able to create a correct TTP/C message itself). Furthermore, TTP/C is equipped with fault tolerance mechanisms that ensure that if the fault assumptions are temporarily violated, the system will be able to recover within a bounded duration after the fault assumptions hold again. To ensure this robustness TTP/C implements all of the listed basic services. The low-cost TTP/A protocol is intended for use as a fieldbus protocol and tolerates only fail-silent components. It implements only the predictable time-triggered transmission service. We discuss the time-triggered protocols TTP/C and TTP/A next.
12.3.2 The System Protocol TTP/C
The time-triggered protocol for Society of Automotive Engineers (SAE) Class C applications (TTP/C) currently supports bus (Figure 12.5a) and star (Figure 12.5b) topologies as well as hybrid compositions of those. The communication medium is replicated to compensate for transmission failures of messages. The communication links are half duplex; that is, a node is able to either transmit or receive via an attached link. Full-duplex links would not bring advancements since the TDMA strategy excludes the possibility of more than one good node transmitting concurrently.* TTP/C realizes the predictable time-triggered transmission service by adhering to an a priori defined communication schedule that organizes communication into TDMA rounds. Several successive TDMA rounds form a cluster cycle. The messages a node may send may differ with respect to the TDMA
FIGURE 12.5 Different TTP/C topologies: (a) bus; (b) star.
*Full-duplex links may bring advancements during the start-up phase of the protocol; the current start-up algorithm, however, is designed for half-duplex links.
round in the cluster cycle. When a cluster cycle is finished, it is restarted, such that the cluster cycle is executed cyclically. The communication schedule, the so-called message description list (MEDL), is stored within the communication controller of each node. In addition to the time-triggered transmission concept described in Section 12.3.1.1, TTP/C also supports multiplexed nodes and shadow nodes. A set of nodes is said to be multiplexed if they share the same slot in a TDMA round. Depending on the TDMA round in a cluster cycle, the single node that is allowed to send in the multiplexed slot is identified (this information is stored in the MEDL as well). A shadow node has a dedicated slot in the TDMA round but will only transmit in this slot if it detects that its primary node fails to send. After recovery of the primary node the former primary will act as shadow node.
A particular message may carry up to 240 bytes of data. The data are protected by a 24-bit cyclic redundancy check (CRC) checksum. In order to achieve high data efficiency, the sender name and message name are derived from the send instant. We distinguish between three different types of messages in TTP/C: I-frames, N-frames, and X-frames. I-frames carry the current controller state (C-state) and can be used by nodes that are out of synchronization to reintegrate into a running system. N-frames are used for regular application data and do not carry C-state information explicitly. However, the sending node calculates the CRC checksum using the N-frame and its internal C-state. A receiving node will calculate the CRC checksum using the received N-frame and its own C-state. Thus, the CRC check will only be successful if both sender and receiver agree on the C-state. Using this form of CRC checksum calculation makes it impossible for a receiver to distinguish a transmission failure from a disagreement on the C-state. X-frames (that is, N-frames that carry the C-state information explicitly) overcome this limitation.
The fault-tolerant clock synchronization of TTP/C exploits the common knowledge of the send schedule: every node measures the difference between the a priori known expected and the actually observed arrival time of a correct message to learn about the difference between the clock of the sender and the clock of the receiver. This information is used by a fault-tolerant average algorithm to calculate periodically a correction term for the local clock in order to keep the clock in synchrony with all other clocks of the cluster. The clock synchronization algorithm has been formally verified in [16]. TTP/C uses a fault-tolerant start-up algorithm that ensures that the system will become synchronized within an upper bound in time, provided that a minimum number of components are awake. The start-up algorithm used in TTP/C is a waiting room algorithm that is based on unique time-outs. Each node i has two unique time-outs, the listen time-out t_i^listen and the cold-start time-out t_i^CS. For each pair of nodes i, j the following relation holds:
t_i^listen > t_j^CS    (12.1)
After power-up, say at t_0, node k starts to listen on the communication channels for t_k^listen time units. If there is already synchronous operation established, node k will receive a frame during this period. If the node was not able to synchronize during t_0 + t_k^listen, it will initiate a cold-start by itself by sending a cold-start frame. After transmission of the cold-start frame, node k listens to the communication channel until t_0 + t_k^listen + t_k^CS. If node k was not able to integrate until this point in time, and no collision occurred, node k will send another cold-start frame. Node k will transmit cold-start frames with a period of t_k^CS until it successfully synchronizes to a received frame. Extensive model-checking studies of the start-up concept, including exhaustive failure simulation, were performed in [18]. As a key lemma of these studies, it was verified that a minimum configuration of three nodes and intelligent central guardians is necessary and sufficient to tolerate one arbitrarily faulty node or one passive arbitrarily faulty central guardian during the start-up sequence.
The membership service employs a distributed agreement algorithm to determine whether the outgoing link of the sender or the incoming link of the receiver has failed. Nodes that have suffered a transmission fault are excluded from the membership until they restart with a correct protocol state. Before each send operation of a node, the clique avoidance algorithm checks if the node is a member of the majority clique. Certain aspects of the TTA group membership service have been formally verified in [15].
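The waiting-room start-up described above can be sketched as follows. This C fragment is a simplified illustration of the listen/cold-start time-out logic only; the externally declared functions are hypothetical placeholders for the underlying controller and channel services, and details of the real TTP/C algorithm (e.g., its exact collision handling) are omitted.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller services (placeholders, not TTP/C API calls). */
extern uint64_t now(void);                        /* current local time */
extern bool frame_received_and_integrated(void);  /* true once the node is synchronized */
extern bool collision_detected(void);
extern void send_coldstart_frame(void);

/* t_listen and t_cs are this node's unique time-outs; for every pair of
   nodes i, j the relation t_listen_i > t_cs_j (Equation 12.1) must hold. */
void startup(uint64_t t_listen, uint64_t t_cs) {
    uint64_t t0 = now();

    /* Listen phase: if a cluster is already running, integrate to a received frame. */
    while (now() < t0 + t_listen) {
        if (frame_received_and_integrated()) return;
    }

    /* No synchronous operation observed: initiate a cold-start. */
    send_coldstart_frame();
    uint64_t deadline = t0 + t_listen + t_cs;
    for (;;) {
        while (now() < deadline) {
            if (frame_received_and_integrated()) return;
        }
        if (!collision_detected()) {
            send_coldstart_frame();       /* retry with period t_cs */
        }
        deadline += t_cs;
    }
}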
The fault tolerance concepts of TTA that are used in TTP/C are discussed in detail in Section 12.4. As in any distributed computing system, the performance of TTA depends primarily on the available communication bandwidth and computational power. Because of physical effects of time distribution and limits in the implementation of the guardians [19], a minimum interframe gap of about 5 µs must be maintained between frames to guarantee the correct operation of the guardians. If a bandwidth utilization of about 80% is intended, then the message send phase must be on the order of about 20 µs, implying that about 40,000 messages can be sent per second within such a cluster. With these parameters, a sampling period of about 250 µs can be supported in a cluster composed of 10 nodes. The precision of the clock synchronization in current prototype systems is below 1 µs. If the interframe gap and bandwidth limits are stretched, it might be possible to implement in such a system a 100-µs TDMA round (corresponding to a 10-kHz control loop frequency), but not much smaller if the system is physically distributed (to tolerate spatial proximity faults). The amount of data that can be transported in the 20-µs window depends on the bandwidth: in a 5-Mbit/s system it is about 12 bytes; in a 1-Gbit/s system it is about 2400 bytes. A prototype implementation of TTP/C using Gigabit Ethernet [17] was developed within the Next TTA project. This prototype implementation uses COTS (commercial off-the-shelf) hardware and was therefore not expected to achieve the limiting performance. The objective of this project was rather to determine the performance that can be achieved without special hardware and to pinpoint the performance bottlenecks faced when using COTS components. TTP/C is commercially available in the form of the automotive-qualified TTP/C-C2 chip [22]. A Federal Aviation Administration (FAA) certification process (DO-178B) is currently under finalization that shall also prove the appropriateness of the hardware for avionics applications. The detailed specification of the TTP/C protocol can be found at [20]. There are several ongoing projects that use TTP/C; examples are a railway signaling system and the cabin pressure control in the Airbus A380. See [21] for a list of projects that employ TTP/C as a commercial product.
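The figures above can be checked with a small back-of-the-envelope computation. The following C fragment simply reproduces the arithmetic with the approximate numbers quoted in the text; it is not derived from any TTP/C implementation.

#include <stdio.h>

int main(void) {
    double gap_us  = 5.0;                 /* minimum interframe gap (approximate) */
    double send_us = 20.0;                /* message send phase (approximate) */
    double slot_us = gap_us + send_us;    /* time consumed per message */

    printf("messages per second: %.0f\n", 1e6 / slot_us);             /* about 40,000 */
    printf("round for 10 nodes:  %.0f us\n", 10.0 * slot_us);         /* about 250 us */
    printf("bytes at 5 Mbit/s:   %.1f\n", 5e6 * send_us * 1e-6 / 8);  /* about 12 */
    printf("bytes at 1 Gbit/s:   %.0f\n", 1e9 * send_us * 1e-6 / 8);  /* about 2500 (the text quotes ~2400) */
    return 0;
}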
12.3.3 The Fieldbus Protocol TTP/A
The TTP/A protocol is the time-triggered fieldbus protocol of TTA. It is used to connect low-cost smart transducers to a node of TTA, which acts as the master of a transducer cluster. In TTP/A the CNI memory element of Figure 12.3 has been expanded at the transducer side to hold a simple interface file system (IFS). Each interface file contains up to 256 records of four bytes each. The IFS forms the uniform name space for the exchange of data between a sensor and its environment (Figure 12.6). The IFS holds the real-time data, calibration data, diagnostic data, and configuration data. The information between the IFS of the smart transducer and the CNI of the TTA node is exchanged by the time-triggered TTP/A protocol, which distinguishes between two types of rounds, the master–slave (MS) round and the multipartner (MP) round. The MS rounds are used to read and write records from the IFS of a particular transducer to implement the DM and CP interfaces. The MP rounds are periodic and transport data from selected IFS records of several transducers across the TTP/A cluster to implement the RS. MP rounds and MS rounds are interleaved, such that the time-critical RS implemented by means of MP rounds and the event-based MS service can coexist. It is thus possible to diagnose a smart transducer
FIGURE 12.6 Interface file system (IFS) in a smart transducer.
or to reconfigure or install a new smart transducer online, without disturbing the time-critical RS of the other nodes. The TTP/A protocol also supports a plug-and-play mode where new sensors are detected, configured, and integrated into a running system online and dynamically. The detailed specification of the TTP/A protocol can be found at [14].
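A minimal C sketch of the IFS structure described above follows. The grouping into four interface files and all identifiers are illustrative assumptions for this sketch; the actual file layout is defined by the TTP/A specification [14].

#include <stdint.h>

typedef struct {
    uint8_t bytes[4];                 /* one IFS record: four bytes */
} ifs_record_t;

typedef struct {
    ifs_record_t records[256];        /* one interface file: up to 256 records */
} ifs_file_t;

typedef struct {
    ifs_file_t realtime;              /* real-time data (RS, transported in MP rounds) */
    ifs_file_t calibration;           /* calibration data */
    ifs_file_t diagnostic;            /* diagnostic data (DM, accessed in MS rounds) */
    ifs_file_t configuration;         /* configuration data (CP, accessed in MS rounds) */
} transducer_ifs_t;

/* MS rounds read or write individual records of a particular transducer. */
ifs_record_t ifs_read(const ifs_file_t *f, uint8_t record) {
    return f->records[record];
}
void ifs_write(ifs_file_t *f, uint8_t record, ifs_record_t value) {
    f->records[record] = value;
}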
12.4 Fault Tolerance
In any fault-tolerant architecture it is important to distinguish clearly between fault containment and error containment. Fault containment is concerned with limiting the immediate impact of a single fault to a defined region, while error containment tries to avoid the propagation of the consequences of a fault, the error. It must be prohibited that an error in one fault containment region propagate into another fault containment region that has not been directly affected by the original fault.
12.4.1 Fault Containment
In TTA, nodes communicate by the exchange of messages across replicated communication channels. Each one of the two channels transports independently its own copy of the message at about the same time from the sending CNI to the receiving CNI. The start of sending a message by the sender is called the message send instant. The termination of receiving a message by the receiver is called the message receive instant. In TTA, the intended message send instants and the intended message receive instants are a priori known to all communicating partners. A message contains an atomic data structure that is protected by a CRC. We make the assumption that a CRC cannot be forged by a fault. A message is called a valid message if it contains a data structure with a correct CRC. A message is called a timely message if it is a valid message and conforms to the temporal specification. A message that does not conform to the temporal specification is an untimely message. A timely message is a correct message if its data structure is in agreement, at both the syntactic and semantic levels, with the specification. We call a message with a message length that differs from its specification or with an incorrect CRC an invalid message.
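The message classification defined above can be summarized in a small sketch. In the following C fragment the CRC, length, and content checks are abstracted into boolean inputs, and all names and the acceptance-window parameter are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

typedef enum { MSG_INVALID, MSG_UNTIMELY, MSG_INCORRECT, MSG_CORRECT } msg_class_t;

typedef struct {
    bool     crc_ok;            /* CRC of the received data structure is correct */
    bool     length_ok;         /* message length matches the specification */
    bool     data_ok;           /* data agree syntactically and semantically with the specification */
    uint64_t receive_instant;   /* actual message receive instant (global time) */
} rx_msg_t;

/* Classify a received message against the a priori known receive instant. */
msg_class_t classify(const rx_msg_t *m, uint64_t expected_instant, uint64_t window) {
    if (!m->crc_ok || !m->length_ok)
        return MSG_INVALID;                               /* not a valid message */
    if (m->receive_instant + window < expected_instant ||
        m->receive_instant > expected_instant + window)
        return MSG_UNTIMELY;                              /* valid but not timely */
    return m->data_ok ? MSG_CORRECT : MSG_INCORRECT;      /* timely; check the content */
}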
12.4.2 Error Containment in the Temporal Domain
An error that is caused by a fault in the sending FCR can propagate to another FCR via a message failure; i.e., the FCR sends a message that deviates from the specification. A message failure can be a message value failure or a message timing failure. A message value failure implies either that a message is invalid or that the data structure contained in a valid message is incorrect. A message timing failure implies that the message send instant or the message receive instant is not in agreement with the specification. In order to avoid error propagation of a sent message, we need error detection mechanisms that are in different FCRs than the message sender. Otherwise, the error detection mechanism may be impacted by the same fault that caused the message failure. In TTA we distinguish between timing failure detection and value failure detection. Timing failure detection is performed by a guardian (Figure 12.7), which is part of TTA. Value failure detection is the responsibility of the host computer.
The guardian is an autonomous unit that has a priori knowledge of all intended message send and receive instants. Each one of the two replicated communication channels has its own independent guardian. A receiving node within TTA judges a sending node as operational if it has received at least one timely message from the sender around the specified receive instant. It is assumed that a guardian cannot forge a CRC and cannot store messages; i.e., it can only output a valid message at one of its output ports if it has received a valid message on one of its input ports within the last d time units. A guardian transforms a message that it judges to be untimely into an invalid message by cutting off its tail. Such a truncated message will be recognized as invalid by all correct receivers and will then be discarded. The guardian may truncate a message either because it detected a message timing failure or because the guardian itself is faulty. In the latter case it is assumed that the sender of the message is correct, and thus the correct message will proceed to the receivers via the replicated channel of TTA.
FIGURE 12.7 TTA star topology with central guardian (the star coupler includes the central guardian and a TTP/C communication controller; the nodes attach via their TTP/C communication controllers and CNIs).
12.4.3 Error Handling in the Value Domain
Detection of value failures is not the responsibility of TTA, but of the host computers. For example, detection and correction of value failures can be performed in a single step by triple modular redundancy (TMR). In this case three replicated senders, placed in three different FCRs, perform the same operations in their host computers. They produce, in the fault-free case, correct messages with the same content that are sent to three replicated receivers that perform a majority vote on these three messages (actually, at the communication level six messages will be transported, one from each sender on each of its two channels).
Detection of value failures and detection of timing failures are not independent in TTA. In order to implement a TMR structure at the application level, the integrity of the timing of the architecture must be assumed. An intact sparse global time base is a prerequisite for the systemwide definition of the distributed state, which again is a prerequisite for masking value failures by voting. The separation of handling timing failures from handling value failures has beneficial implications for resource requirements. In general, it is necessary to implement interactive consistency to solve the Byzantine Generals Problem: a set of nodes has to agree upon a correct value in the presence of faulty nodes that may be asymmetric faulty. A Byzantine-tolerant algorithm that establishes interactive consistency in the presence of k arbitrarily failing nodes requires 3k + 1 nodes and several rounds of message exchange [12]. For clock synchronization, and thus for the maintenance of the sparse global time base, instead of an interactive consistency algorithm, an interactive convergence algorithm [11] can be used that needs only a single round of message exchange. TTA claims to tolerate one arbitrary faulty component (that is, k = 1). Since all nodes of a cluster, independent of their involvement in a particular application system, can contribute to handling timing failures at the architectural level, the lower bound of nodes in a system is 4, which is a relatively small number for real systems. Once a proper global time has been established, TMR for masking of value failures can be implemented using only 2k + 1 synchronized nodes in a particular application subsystem. Two concepts contribute to this fine property: the self-confidence principle and replica determinism. According to the self-confidence principle, a node will consider itself correct until it is accused by a sufficient set of nodes. A set of nodes that operates in a replica-deterministic manner will produce the same outputs at most an a priori specifiable interval d apart [5]. That means that the tolerance of a Byzantine-faulty component does not necessarily
FIGURE 12.8 Virtual CAN on top of TTP/C (two physical CAN buses with their CAN controllers are connected through highly dependable TTP/C communication controllers and form one logical CAN bus).
require a solution to the Byzantine Generals Problem. The Byzantine Generals Problem has to be solved only if values from the environment are received, and the nodes have to come to a consistent view on these values. This separation of timing failures and value failures thus reduces the number of components needed for fault tolerance of an application from 3k + 1 to 2k + 1.
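As an illustration of the 2k + 1 masking discussed above, the following C fragment sketches the majority vote a receiving host computer could apply to the three replicated values of a TMR configuration with k = 1; it is a minimal sketch at the application level, not part of any TTA interface.

#include <stdbool.h>
#include <stdint.h>

/* Majority voter for a TMR (2k + 1, k = 1) configuration: three replicated
 * senders deliver the same application value, and the receiver masks a single
 * value failure by voting. Returns false if no majority exists, i.e., more
 * failures occurred than the fault hypothesis allows. */
bool tmr_vote(int32_t a, int32_t b, int32_t c, int32_t *voted)
{
    if (a == b || a == c) { *voted = a; return true; }
    if (b == c)           { *voted = b; return true; }
    return false;
}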
12.4.4 Virtual Networks Since in a real system the nodes are most likely of mixed criticality, it is economically attractive to provide a communication infrastructure of mixed dependability. Nodes that execute a highly dependable task are of high criticality and communicate via a highly dependable network protocol, while nodes of lower criticality can operate on a less dependable network protocol. TTA provides a mixed-dependability communication infrastructure by virtual networks. Virtual networks provide a logical network structure on top of a physical network structure by emulation. Example: Recent research was concerned with a prototype study of CAN over TTP/C [13]. In this work two physical CAN networks were connected to a TTP/C cluster via two gateway nodes (Figure 12.8). The CAN messages are tunneled through the TTP/C system. Thus, the physically separated CAN buses form one logical CAN bus in a fashion that is transparent to the CAN controllers. With the virtual network approach, it is possible to have low-criticality nodes communicate via a dynamic protocol while highly critical nodes communicate via the highly dependable TTP/C. Furthermore, the logical CAN bus consists of three independent fault containment regions, and thus a babbling CAN controller will only affect the physical part of the CAN bus where it is located. This approach is also scalable with respect to the number of logical CAN buses. To summarize, fault containment and error detection are achieved in TTA in three distinct steps. First, fault containment is achieved by proper architectural decisions concerning resource sharing in order to provide independent fault containment regions. Second, propagation of timing errors is avoided at the architecture level by the guardians. Third, handling of value failures is performed at the application level by voting.
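A gateway node of the kind used in the CAN-over-TTP/C prototype has to pack CAN frames into the payload of its time-triggered slots. The C sketch below shows one plausible encapsulation; the structure, field layout, and sizes are purely illustrative and are not the format used in [13].

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative encapsulation of one CAN frame into a gateway's slot payload. */
typedef struct {
    uint32_t id;        /* 11- or 29-bit CAN identifier */
    uint8_t  dlc;       /* 0..8 data bytes              */
    uint8_t  data[8];
} can_frame_t;

#define SLOT_PAYLOAD 16u

/* Copies one CAN frame into the slot payload; returns the bytes used,
 * or 0 if the frame does not fit or is malformed. */
size_t tunnel_encode(const can_frame_t *f, uint8_t payload[SLOT_PAYLOAD])
{
    if (f->dlc > 8u || (size_t)(5u + f->dlc) > SLOT_PAYLOAD)
        return 0u;
    payload[0] = (uint8_t)(f->id >> 24);
    payload[1] = (uint8_t)(f->id >> 16);
    payload[2] = (uint8_t)(f->id >> 8);
    payload[3] = (uint8_t)(f->id);
    payload[4] = f->dlc;
    memcpy(&payload[5], f->data, f->dlc);
    return 5u + f->dlc;
}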
12.5 The Design of TTA Applications Composability and the associated reuse of nodes and software can only be realized if the architecture supports a two-level design methodology. In TTA such a methodology is supported: TTA distinguishes between the architecture design (cluster design) and the component design (node design).
FIGURE 12.9 Decomposition of a drive-by-wire application (functional units such as vehicle dynamics, brake manager, engine control, steering manager, suspension, and a body gateway, each with its I/O or TTP/A network driver interface, connected through communication controllers and the communication network interface to the replicated broadcast channels).
12.5.1 Architecture Design In the cluster design phase, an application is decomposed into clusters and nodes. This decomposition will be guided by engineering insight and the structure inherent in the application, in accordance with the proven architecture principle of form follows function. For example, in an automotive environment, a drive-by-wire system may be decomposed into functional units, as depicted in Figure 12.9. If a system is developed from scratch (a "greenfield" design), then a top-down decomposition will be pursued. After the decomposition has been completed, the CNIs of the nodes must be specified in the temporal and value domains. The data elements that are to be exchanged across the CNIs are identified, and the precise fetch instants and delivery instants of the data at the CNI must be determined. Given these data, the schedules of the TTP/C communication system can be calculated and verified. At the end of the architecture design phase, the precise interface specifications of the nodes are available. These interface specifications are the inputs and constraints for the node design. Given a set of available nodes with their temporal specifications (nodes that are available for reuse), a bottom-up design approach must be followed. Given the constraints of the nodes at hand (how much time they need to calculate an output from an input), a TTP/C schedule must be found that meets the application requirements and satisfies the node constraints.
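The outcome of the architecture design phase, the per-node CNI specifications with their fetch and delivery instants, can be thought of as a static table. The C sketch below illustrates such a description; all names and numbers are invented for illustration and do not reflect the actual TTP/C message descriptor list.

#include <stdint.h>

/* Illustrative description of one communication round: for each slot, the
 * sending node, the message, and the instants at which data must be delivered
 * to the sender's CNI and may be fetched at the receivers' CNIs. */
typedef struct {
    uint8_t  sender_node;
    uint16_t message_id;
    uint16_t length_bytes;
    uint32_t delivery_instant_us;  /* data must be at the sender's CNI by this time  */
    uint32_t fetch_instant_us;     /* receivers may fetch the data from this time on */
} cni_slot_spec_t;

static const cni_slot_spec_t cluster_round[] = {
    { 1u, 0x010u, 8u,  90u, 150u },   /* vehicle dynamics */
    { 2u, 0x020u, 6u, 340u, 400u },   /* brake manager    */
    { 3u, 0x030u, 8u, 590u, 650u },   /* steering manager */
};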
12.5.2 Component Design During the node design phase, the application software of the host computer is developed. The delivery and fetch instants established during the architecture design phase are the preconditions and postconditions for the temporal validation of the application software. The host operating system can employ any reasonable scheduling strategy, as long as the given deadlines are satisfied and the replica determinism of the host system is maintained. Node testing proceeds bottom up. A new node must be tested with respect to the given CNI specifications in all anticipated load and fault conditions. The composability properties of TTA (stability of prior service achieved by the strict adherence to information pull interfaces) ensure that a property that has been validated at the node level will also hold at the system level. At the system level, testing will focus on validating the emerging services that are a result of the integration.
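A minimal sketch of the temporal precondition implied here, with hypothetical names: the worst-case execution time of a host task must fit between the instant its inputs can be fetched from the CNI and the instant its outputs must be delivered back to the CNI.

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t input_fetch_us;      /* earliest fetch instant of the inputs   */
    uint32_t output_delivery_us;  /* latest delivery instant of the outputs */
    uint32_t wcet_us;             /* worst-case execution time of the task  */
} host_task_t;

/* True if the task can always meet its delivery instant, i.e., its WCET fits
 * into the window fixed during the architecture design phase. */
bool task_schedulable(const host_task_t *t)
{
    return t->output_delivery_us > t->input_fetch_us &&
           (t->output_delivery_us - t->input_fetch_us) >= t->wcet_us;
}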
12.5.3 Validation Today, the integration and validation phases are probably the most expensive phases in the implementation of a large distributed real-time system. TTA has been designed to reduce this integration and validation effort by providing the following mechanisms:
• The architecture provides a consistent distributed computing base to the application and informs the application in case a loss of consistency is caused by a violation of the fault hypothesis. The basic algorithms that provide this consistent distributed computing base (clock synchronization and membership) have been analyzed by formal methods and are implemented once and for all in silicon. The application need not be concerned with the implementation and validation of the complex distributed agreement protocols that are needed to establish consistency in a distributed system.
• The architecture is replica deterministic, which means that any observed deficiency can be reproduced in order to diagnose the cause of the observed problem.
• The interaction pattern between the nodes and the contents of the exchanged messages can be observed by an independent observer without the probe effect. It is thus possible to determine whether a node complies with its preconditions and postconditions without interfering with the operation of the observed node.
• The internal state of a node can be observed and controlled by the DM interface.
• In TTA it is straightforward to provide a real-time simulation test bench that reproduces the environment to any node in real time. Deterministic automatic regression testing can thus be implemented.
12.6 Conclusions The Time-Triggered Architecture is the result of more than 20 years of research in the field of dependable distributed real-time systems. During this period, many ideas have been developed, implemented, evaluated, and finally discarded. What survived is a small set of orthogonal concepts that center around the availability of a dependable global time base. The guiding principle during the development of TTA has always been to take maximum advantage of the availability of this global time, which is part of the world, even if we do not use it. TTA spans the whole spectrum of dependable distributed real-time systems, from the low-cost deeply embedded sensor nodes to high-performance nodes that communicate at gigabits per second speeds, persistently assuming that a global time of appropriate precision is available in every node of TTA. At present, TTA occupies a niche position, since in the experimental as well as in the theoretical realm of main-line computing, time is considered a nuisance that makes life difficult and should be dismissed at the earliest moment [10]. However, as more and more application designers start to realize that real time is an integrated part of the real world that cannot be abstracted away, the future prospects for TTA look encouraging.
Acknowledgments This work was supported by the European IST (Information Society Technologies) project “Next TTA” under project number IST-2001-32111. This document is a revised version of [2].
References [1] Consortium DECOS. DECOS Annex 1: Description of Work, 2003. Contract FP6-511764. [2] H. Kopetz and G. Bauer. Time-triggered communication networks. In Industrial Information Technology Handbook. CRC Press, Boca Raton, FL, 2004. [3] H. Kopetz and G. Bauer. The Time-Triggered Architecture. Proceedings of the IEEE, 91:112–126, 2003. [4] H. Kopetz and R. Nossal. Temporal firewalls in large distributed real-time systems. In Proceedings of the IEEE Workshop on Future Trends in Distributed Computing, 1997, pp. 310–315. [5] H. Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1997.
[6] H. Kopetz. The time-triggered (TT) model of computation. In Proceedings of the 19th IEEE Real-Time Systems Symposium, 1998, pp. 168–177. [7] H. Kopetz. Elementary versus composite interfaces in distributed real-time systems. In Proceedings of the 4th International Symposium on Autonomous Decentralized Systems, 1999, pp. 26–33. [8] H. Kopetz. Software engineering for real-time: a roadmap. In Proceedings of the 22nd International Conference on Software Engineering, 2000, pp. 201–211. [9] H. Kopetz. On the Determinism of Communication Systems. Research Report 48/2003, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2003. [10] E. Lee. What's ahead for embedded software? IEEE Computer, 33:18–26, 2000. [11] L. Lamport and P.M. Melliar-Smith. Synchronizing clocks in the presence of faults. Journal of the ACM, 32:52–78, 1985. [12] L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems, 4:382–401, 1982. [13] R. Obermaisser. An Integrated Architecture for Event-Triggered and Time-Triggered Control Paradigms. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2002. [14] OMG. Smart Transducers Interface. Final adopted specification ptc/2002-10-02, Object Management Group, 2002. Available at http://www.omg.org. [15] H. Pfeifer. Formal verification of the TTP group membership algorithm. In T. Bolognesi and D. Latella, editors, Formal Methods for Distributed System Development: Proceedings of FORTE XIII/PSTV XX 2000, Pisa, Italy, October 2000, pp. 3–18. Kluwer Academic Publishers, Dordrecht, The Netherlands. [16] H. Pfeifer, D. Schwier, and F.W. von Henke. Formal verification for time-triggered clock synchronization. In C.B. Weinstock and J. Rushby, editors, Dependable Computing and Fault Tolerant Systems, Vol. 12, Dependable Computing for Critical Applications — 7, IEEE Computer Society, San Jose, CA, 1999, pp. 207–226. [17] M. Schwarz. Implementation of a TTP/C Cluster Based on Commercial Gigabit Ethernet Components. Master's thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2002. [18] W. Steiner, J. Rushby, M. Sorea, and H. Pfeifer. Model Checking a Fault-Tolerant Startup Algorithm: From Design Exploration to Exhaustive Fault Simulation. Paper presented at the International Conference on Dependable Systems and Networks (DSN2004), June 2004. [19] C. Temple. Enforcing Error Containment in Distributed Time-Triggered Systems: The Bus Guardian Approach. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 1999. [20] TTTech Computertechnik AG. Specification of the TTP/C Protocol. Available at http://www.tttech.com. [21] TTTech Computertechnik AG. TTP in Commercial Production. Available at http://tttech.com/customers/. [22] TTTech Computertechnik AG. TTP/C-C2 Data Sheet. Available at http://www.ttchip.com.
13 Controller Area Network: A Survey

Gianluca Cena, IEIIT-CNR
Adriano Valenzano, IEIIT-CNR

13.1 Introduction ........................................ 13-1
13.2 CAN Protocol Basics ................................. 13-2
     Physical Layer • Frame Format • Access Technique • Error Management • Fault Confinement • Communication Services • Implementation
13.3 Main Features of CAN ............................... 13-12
     Advantages • Drawbacks • Solutions
13.4 Time-Triggered CAN ................................. 13-14
     Main Features • Protocol Specification • Implementation
13.5 CAN-Based Application Protocols ................... 13-16
     CANopen • DeviceNet
References ............................................... 13-20
13.1 Introduction The history of Controller Area Network (CAN) starts more than 20 years ago. At the beginning of the 1980s a group of engineers at Bosch GmbH were looking for a serial bus system suitable for use in passenger cars. The most popular solutions adopted at that time were considered inadequate for the needs of most automotive applications. The bus system, in fact, had to provide a number of new features that could hardly be found in the already existing fieldbus architectures. The design of the new proposal also involved several academic partners and had the support of Intel, as the potential main semiconductor producer. The new communication protocol was presented officially in 1986 with the name of Automotive Serial Controller Area Network at the Society of Automotive Engineers (SAE) congress held in Detroit. It was based on a multimaster access scheme to the shared medium that resembled the well-known carrier-sense multiple-access (CSMA) approach. The peculiar aspect, however, was that CAN adopted a new distributed nondestructive arbitration mechanism to solve contentions on the bus by means of priorities implicitly assigned to the colliding messages. Moreover, the protocol specifications also included a number of error detection and management mechanisms to enhance the fault tolerance of the whole system. In the following years, both Intel and Philips started to produce controller chips for CAN following two different philosophies. The Intel solution (often referred to as FullCAN in the literature) required less host CPU power, since most of the communication and network management functions were carried out directly by the network controller. Instead, the Philips solution (BasicCAN) was simpler but imposed a higher load on the processor used to interface the CAN controller. Since the mid-1990s more than 15 semiconductor vendors, including Siemens, Motorola, and NEC, have been producing and shipping millions of CAN chips mainly to car manufacturers such as Mercedes-Benz, Volvo, Saab, Volkswagen, BMW, Renault, and Fiat. The Bosch specification (CAN version 2.0) was submitted for international standardization at the beginning of the 1990s. The proposal was approved and published as ISO 11898 at the end of 1993 and
contained the description of the network access protocol and the physical layer architecture. In 1995 an addendum to ISO 11898 was approved to describe the extended format for message identifiers. The CAN specification is currently in the process of being revised and reorganized and has been split into four separate parts: [ISO1], [ISO2], and [ISO4] have already been approved as international standards, whereas [ISO3] has reached a stable status and is being finalized. Even though it was conceived for vehicle applications, at the beginning of the 1990s CAN began to be adopted in different scenarios. The standard documents provided satisfactory specifications for the lower communication layers but did not offer guidelines or recommendations for the upper part of the Open Systems Interconnection (OSI) protocol stack, in general, and for the application layer, in particular. This is why the early applications of CAN outside the automotive scenario (i.e., textile machines, medical systems, and so on) adopted ad hoc monolithic solutions. The CAN in Automation (CiA) users' group, founded in 1992, was originally concerned with the specification of a standard CAN application layer. This effort led to the development of the general-purpose CAN application layer (CAL) specification. CAL was intended to fill the gap between the distributed application processes and the underlying communication support, but in practice it was not successful, the main reason being that, since CAL is truly application independent, each user had to develop a suitable profile based on CAL for her or his specific application field. In the same years, Allen-Bradley and Honeywell started a joint distributed control project based on CAN. Although the project was abandoned a few years later, Allen-Bradley and Honeywell continued their work separately and focused on the higher protocol layers. The results of these activities were the Allen-Bradley DeviceNet solution and the Honeywell Smart Distributed System (SDS). For a number of reasons, SDS remained, in practice, an internal solution within Honeywell Microswitch, while DeviceNet was soon handed over to the Open DeviceNet Vendor Association and was widely adopted in a number of U.S. factory automation areas, becoming a serious competitor to widespread solutions such as PROFIBUS-DP and INTERBUS. Besides DeviceNet and SDS, other significant initiatives were focused on CAN and its application scenarios. CANopen was conceived in the framework of the European Esprit project ASPIC* by a consortium led once again by Bosch GmbH. The purpose of CANopen was to define a profile based on CAL, which could support communications inside production cells. The original CANopen specifications were further refined by CiA and released in 1995. Later, both CANopen and DeviceNet became European standards, and they are now widely used, especially in two different areas: factory automation and machine-distributed controls.
13.2 CAN Protocol Basics The CAN protocol architecture is structured according to the layered approach of the International Organization for Standardization (ISO)/OSI model. However, as in most of the currently existing networks conceived for use at the field level in automated manufacturing environments, only a few layers have been considered in its protocol stack. This is to make implementations more efficient and inexpensive. Having few protocol layers, in fact, implies reduced processing delays when receiving and transmitting messages, as well as simpler communication software. The CAN specifications [ISO1] and [ISO2], in particular, include only the physical and the data link layers, as depicted in Figure 13.1. The physical layer is aimed at managing the effective transmission of data over the communication support and tackles the mechanical, electrical, and functional aspects. Bit timing and synchronization, in particular, belong to this layer. The data link layer is split into two separate sublayers: medium access control (MAC) and logical link control (LLC). The purpose of the MAC entity is basically to manage access to the shared transmission support by providing a mechanism aimed at coordinating the use of the bus, so as to avoid unmanageable collisions. The functions of the MAC sublayer include frame encoding and decoding, arbitration, error
*ASPIC, Automation and control Systems for Production units using Installation bus-Concept.
FIGURE 13.1 CAN protocol stack.
checking and signaling, and also fault confinement. The LLC sublayer offers the user (i.e., the application programs running in the upper layers) a proper interface, which is characterized by a well-defined set of communication services, in addition to the ability to decide whether an incoming message is relevant to the node. It is worth noting that the CAN specification is very flexible with respect to both the implementation of the LLC services and the choice of the physical support, whereas there can be no modifications to the behavior of the MAC sublayer. As mentioned earlier, unlike most fieldbus networks, the CAN specification does not include any native application layer. However, a number of such protocols exist that rely on CAN and ease the design and implementation of complex CAN systems.
13.2.1 Physical Layer The features of the physical layer of CAN that are valid for any system, such as those related to the physical signaling, are described in ISO 11898-1 [ISO1]. The medium access units (i.e., the transceivers) are defined in two separate documents: ISO 11898-2 [ISO2] and ISO 11898-3 [ISO3] for high-speed and low-speed communications, respectively. The definition of the medium interface (i.e., the connectors) is usually covered in other documents. 13.2.1.1 Network Topology CAN networks are based on a shared-bus topology. Buses have to be terminated at each end with resistors (the recommended nominal impedance is 120 Ω), so as to suppress signal reflections. For the same reason, the standard documents state that the topology of a CAN network should be as close as possible to a single line. Stubs are permitted for connecting devices to the bus, but their length should be as short as possible. For example, at 1 Mbit/s the length of a stub must be shorter than 30 cm. Several kinds of transmission media can be used:
• Two-wire bus, which enables differential signal transmissions and ensures reliable communications. In this case, shielded twisted pair can be used to further enhance the immunity to electromagnetic interferences.
• Single-wire bus, a simpler and cheaper solution that features lower immunity to interferences and is mainly suitable for use in automotive applications.
• Optical transmission medium, which ensures complete immunity to electromagnetic noise and can be used in hazardous environments. Fiber optics is often adopted to interconnect (through repeaters) different CAN subnetworks. This is done to cover plants that are spread over a large area. Several bit rates are available for the network, the most adopted being in the range of 50 Kbit/s to 1 Mbit/s (the latter value represents the maximum allowable bit rate according to the CAN specifications). The maximum extension of a CAN network depends directly on the bit rate. The exact relation between these two quantities involves parameters such as the delays introduced by transceivers and opto-couplers. Generally speaking, the mathematical product between the length of the bus and the bit rate has to be approximately constant. For example, the maximum extension allowed for a 500 Kbit/s network is about 100 m, and increases up to about 500 m when a bit rate of 125 Kbit/s is considered. Signal repeaters can be used to increase the network extension, especially when large plants have to be covered and the bit rate is low or medium. However, they introduce additional delays on the communication paths; hence the maximum distance between any two nodes is effectively shortened at high bit rates. Using repeaters also achieves topologies different from the bus (trees or combs, for example). In this case, good design could increase the effective area that is covered by the network. It is worth noting that unlike other field networks, such as, for example, PROFIBUS-PA, there is in general no cost-effective way in CAN to use the same wire for carrying both the signal and the power supply. However, an additional pair of wires can be provided inside the bus cable for the power supply. Curiously enough, connectors are not standardized by the CAN specifications. Instead, several companion or higher-level application standards exist that define their own connectors and pin assignment. CiA DS102 [DS102], for example, foresees the use of a SUB-D9 connector, while DeviceNet and CANopen suggest the use of either five-pin ministyle, microstyle, or open-style connectors. In addition, these documents include recommendations for bus lines, cables, and standardized bit rates, which were not included in the original CAN specifications. 13.2.1.2 Bit Encoding and Synchronization In CAN the electrical interface of a node to the bus is based on an open-collector-like scheme. As a consequence, the level on the bus can assume two complementary values, which are denoted symbolically as dominant and recessive. Usually, the dominant level corresponds to the logical value 0 while the recessive level coincides with the logical value 1. CAN relies on the non-return-to-zero (NRZ) bit encoding, which features very high efficiency in that synchronization information is not encoded separately from data. Bit synchronization in each node is achieved by means of a digital phase-locked loop (DPLL), which extracts the timing information directly from the bit stream received from the bus. In particular, the edges of the signal are used for synchronizing the local clocks, so as to compensate tolerances and drifts of the oscillators. To provide a satisfactory degree of synchronization among the nodes, the transmitted bit stream should include a sufficient number of edges. To do this, CAN relies on the so-called bit stuffing technique. 
In practice, whenever five consecutive bits at the same value (either dominant or recessive) appear in the transmitted bit stream, the transmitting node inserts one additional stuff bit at the complementary value, as depicted in Figure 13.2. These stuff bits can be easily and safely removed by the receiving nodes, to obtain the original stream of bits back.
FIGURE 13.2 Bit stuffing technique.
From a theoretical point of view, the maximum number of stuff bits that may be added is one every four bits in the original frame, so the encoding efficiency can be as low as 80% (see, for example, the rightmost part of Figure 13.2, where the original bit stream alternates sequences of four consecutive bits at the dominant level followed by four bits at the recessive level). However, the influence of bit stuffing in real operating conditions is noticeably lower than the theoretical value computed above. Simulations show that, on average, only two to four stuff bits are effectively added to each frame, depending on the size of the identifier and data fields. Despite its being quite efficient, the bit stuffing technique has a drawback: the time taken to send a message over the bus is not fixed; instead, it depends on the content of the message itself. This might cause annoying jitters. Not all fields in a CAN frame are encoded according to the bit stuffing mechanism: it applies only to the initial part of the frames, from the start-of-frame (SOF) bit up to the cyclic redundancy check (CRC) sequence. The remaining fields are of fixed form and are not stuffed.
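The stuffing rule described above can be expressed as a short routine. The C sketch below is purely illustrative (bits are held one per byte for clarity rather than packed as on the real bus); it inserts a complementary stuff bit after every run of five equal bits, exactly as in Figure 13.2.

#include <stddef.h>
#include <stdint.h>

/* Illustrative bit-stuffing encoder: after five consecutive bits of equal
 * value, a stuff bit of the complementary value is inserted. Returns the
 * stuffed length, or 0 if the output buffer is too small. */
size_t can_bit_stuff(const uint8_t *in, size_t n, uint8_t *out, size_t out_max)
{
    size_t  k = 0;            /* output index              */
    size_t  run = 0;          /* length of the current run */
    uint8_t last = 2;         /* impossible value to start */

    for (size_t i = 0; i < n; i++) {
        if (k >= out_max) return 0;
        out[k++] = in[i];
        run = (in[i] == last) ? run + 1 : 1;
        last = in[i];
        if (run == 5) {                    /* five equal bits: insert stuff bit */
            if (k >= out_max) return 0;
            out[k++] = (uint8_t)(1u - last);
            last = out[k - 1];
            run = 1;                       /* the stuff bit starts a new run */
        }
    }
    return k;
}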
13.2.2 Frame Format The CAN specification [ISO1] defines both a standard and an extended frame format. These formats mainly differ in the size of the identifier field and in some other bits in the arbitration field. In particular, the standard frame format (also known as CAN 2.0A format) defines an 11-bit identifier field, which means that up to 2048 different identifiers are available to the applications executing in the same network (many older CAN controllers, however, only support identifiers in the range of 0 to 2031). The extended frame format (identified as CAN 2.0B) instead assigns 29 bits to the identifier, so that up to a half billion different objects could exist (in theory) in the same network. This is a fairly high value, which is virtually sufficient for any kind of application. Using extended identifiers in a network to which 2.0A-compliant CAN controllers are also connected usually leads to unmanageable transmission errors, which effectively make the network unstable. Thus, a third category of CAN controllers was developed, known as 2.0B passive: they correctly manage the transmission and reception of CAN 2.0A frames, while CAN 2.0B frames are simply ignored so that they do not hang the network. It is worth noting that, in most practical cases, the number of different objects allowed by the standard frame format is more than adequate. Since standard CAN frames are shorter than the extended ones (because of the shorter arbitration field), they permit higher communication efficiency (unless part of the payload is moved into the arbitration field). As a consequence, they are adopted in most of the existing CAN systems, and most of the CAN-based higher-layer protocols, such as CANopen and DeviceNet, basically rely on this format. The CAN protocol defines only four kinds of frames: data, remote, error, and overload. Their formats are described in detail below. 13.2.2.1 Data Frame Data frames are used to send information over the network. Each data frame in CAN begins with a start-of-frame (SOF) bit at the dominant level, as shown in Figure 13.3. Its role is to mark the beginning of the frame, as in serial transmissions carried out by means of conventional Universal Asynchronous Receiver/Transmitters (UARTs). The SOF bit is also used to synchronize the receiving nodes. Immediately after the SOF bit there is the arbitration field, which includes both the identifier and the remote transmission request (RTR) bit. As the name suggests, the identifier field uniquely identifies the content of the frame that is being exchanged on the whole network. The identifier is also used by the MAC sublayer to detect and manage the priority of the frame, which is used whenever a collision occurs (the lower the numerical value of the identifier, the higher the priority of the frame). The identifier is sent starting from the most significant bit up to the least significant one. The size of the identifier is different for the standard and the extended frames. In the latter case, the identifier has been split into an 11-bit base identifier and an 18-bit extended identifier, to provide compatibility with the standard frame format.
FIGURE 13.3 Format of data frames.
The RTR bit is used to discriminate between data and remote frames. Since a dominant value of RTR denotes a data frame while a recessive value stands for a remote frame, a data frame has a higher priority than a remote frame having the same identifier. Next to the arbitration field comes the control field. In the case of standard frames, it includes the identifier extension (IDE) bit, which discriminates between standard and extended frames, followed by the reserved bit r0. In the extended frames, the IDE bit effectively belongs to the arbitration field, as well as the substitute remote request (SRR) bit — a placeholder that is sent at recessive value to preserve the structure of the frames. In this case, the IDE bit is followed by the identifier extension and then by the control field, which begins with the two reserved bits r1 and r0. After the reserved bits there is the data length code (DLC), which specifies — encoded on 4 bits — the length (in bytes) of the data field. Since the IDE bit is dominant in the standard frames, while it is recessive in the extended ones, when the same base identifier is considered, standard frames have precedence over extended frames. Reserved bits r0 and r1 must be sent by the transmitting node at the dominant value. Receivers, however, will ignore the value of these bits. For the DLC field, values ranging from 0 to 8 are allowed. According to the last specification, higher values (from 9 to 15) can be used for application-specific purposes. In this case, however, the length of the data field is meant to be 8. The data field is used to store the effective payload of the frame. In order to ensure a high degree of responsiveness and minimize the priority inversion phenomenon, the size of the data field is limited to 8 bytes at most. After the data field there are the CRC and acknowledgment fields. The former field is made up of a cyclic redundancy check sequence encoded on 15 bits, which is followed by a CRC delimiter at the recessive value. The kind of CRC adopted in CAN is particularly suitable to cover short frames (i.e., counting less than 127 bits). The acknowledgment field is made up of two bits: the ACK slot followed by the ACK delimiter. Both of them are sent at the recessive level by the transmitter. The ACK slot, however, is overwritten with a dominant value by each node that has received the frame correctly (i.e., no error was detected up to the ACK field). It is worth noting that, in this way, the ACK slot is actually surrounded by two bits at the recessive level: the CRC and ACK delimiters. By means of the ACK bit, the transmitting node is enabled to discover whether at least one node in the network has received its frame correctly. At the end of the frame there is the end-of-frame (EOF) field, made up of seven recessive bits, which notifies all the nodes of the end of an error-free transmission. In particular, the transmitting node assumes that the frame has been exchanged correctly if no error is detected until the last bit of the EOF field, while in the case of receivers, the frame is valid if there are no errors until the sixth bit of EOF. Different frames are interleaved by the intermission (IMS), which consists of three recessive bits and effectively separates consecutive frames exchanged on the bus. 13.2.2.2 Remote Frames Remote frames are very similar to data frames. The only difference is that they carry no data (i.e., the data field is not present in this case). They are used to request that a given message be sent on the network
by a remote node. It is worth noting that the requesting node does not know who the producer of the related information is. It is up to the receivers to discover the one that has to reply. The DLC field in remote frames is not effectively used by the CAN protocol. However, it should be set to the same value as the corresponding data frame, so as to cope with the situations where several nodes send remote requests with the same identifier at the same time (this is legal in a CAN network). In this case, it is necessary for the different requests to be perfectly identical, so that they will overlap in the case of a collision. It should be noted that because of the way the RTR bit is encoded, if a request is made for an object at the same time the transmission of that object is started by the related producer, the contention is resolved in favor of the data frame. 13.2.2.3 Error Frames Error frames are used to notify the nodes in the network that an error has occurred. They consist of two fields: error flag and error delimiter. There are two kinds of error flag: the active error flag is made up of six dominant bits, while the passive error flag consists of six recessive bits. An active error flag violates the bit stuffing rules or the fixed-format parts of the frame that is currently being exchanged; hence, it enforces an error condition that is detected by all other stations connected to the network. Each node that detects an error condition transmits an error flag on its own. In this way, as a consequence of the transmission of an error flag, there can be from 6 to 12 dominant bits on the bus. The error delimiter is made up of eight recessive bits. After the transmission of an error flag, each node starts sending recessive bits, and at the same time, it monitors the bus level until a recessive bit is detected. At this point the node sends seven more recessive bits, hence completing the error delimiter. 13.2.2.4 Overload Frames Overload frames can be used by the slow receivers to slow down operations on the network. This is done by adding an extra delay between consecutive data and remote frames. Their format is very similar to that of error frames. In particular, it is made up of an overload flag followed by an overload delimiter. Today’s CAN controllers are very fast, and so they make the overload frame almost useless.
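To make the fields discussed above concrete, the following C sketch shows one plausible in-memory representation of a data or remote frame, similar in spirit to (but not identical with) the message objects offered by real CAN controllers; all names are illustrative.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative in-memory representation of a CAN frame as delivered by a
 * controller; not the register layout of any specific device. */
typedef struct {
    uint32_t identifier;   /* 11-bit (standard) or 29-bit (extended) ID */
    bool     extended;     /* IDE: extended (2.0B) frame                */
    bool     remote;       /* RTR: remote frame, data field absent      */
    uint8_t  dlc;          /* data length code, 0..8 for the data field */
    uint8_t  data[8];      /* payload, meaningful only for data frames  */
} can_message_t;

/* Priority comparison as implied by the arbitration rules stated in the text
 * (lower identifier value means higher priority; the RTR and IDE bits, which
 * also take part in arbitration, are ignored here for simplicity). */
static inline bool higher_priority(const can_message_t *a, const can_message_t *b)
{
    return a->identifier < b->identifier;
}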
13.2.3 Access Technique The medium access control mechanism on which CAN relies is basically carrier-sense multiple access (CSMA). When no frame is being exchanged, the network is idle and the level on the bus is recessive. Before transmitting a frame, the nodes have to observe the state of the network. If the network is idle, the frame transmission begins immediately; otherwise, the node must wait for the current frame transmission to end. Each frame starts with the SOF bit at the dominant level, which informs all the other nodes that the network has switched to the busy state. Even though very unlikely, it may happen that two or more nodes start sending their frames exactly at the same time. This is actually possible because the propagation delays on the bus — even though very small — are greater than zero. Thus, one node might start its transmission while the SOF bit of another frame is already traveling on the bus. In this case, a collision will occur. In CSMA networks that are based on collision detection, such as, for example, nonswitched Ethernet, this unavoidably leads to the corruption of all frames involved, which means that they have to be retransmitted. The consequence is a waste of time and a net decrease of the available bandwidth. In high-load conditions, this may lead to congestion: when the number of collisions is so high that the net throughput on the Ethernet network falls below the arrival rate, the network becomes stalled. Unlike Ethernet, CAN is able to resolve the contentions in a deterministic way, so that neither time nor bandwidth is wasted. Therefore, congestion conditions can no longer occur and all the theoretical system bandwidth is effectively available for communications. For the sake of truth, it should be said that contentions in CAN occur more often than one may think. In fact, when a node that has a frame to transmit finds the bus busy or loses the contention, it waits for
the end of the current frame exchange, and immediately after the intermission has elapsed, it starts transmitting. Here, the node may compete with other nodes for which — in the meantime — a transmission request has been issued. In this case, the different nodes synchronize on the falling edge of the first SOF bit that is sensed on the network. This implies that the behavior of a CAN network is effectively that of a network-wide distributed transmission queue where messages are selected for transmission according to a priority order. 13.2.3.1 Bus Arbitration The most distinctive feature of the medium access technique of CAN is the ability to resolve in a deterministic way any collision that should occur on the bus. In turn, this is made possible by the arbitration mechanism, which effectively finds out the most urgent frame each time there is a contention for the bus. The CAN arbitration scheme allows the collisions to be resolved by stopping the transmissions of all frames involved except the one that is characterized by the highest priority (i.e., the lowest identifier). The arbitration technique exploits the peculiarities of the physical layer of CAN, which conceptually provides a wired-AND connection scheme among all the nodes. In particular, the level on the bus is dominant if at least one node is sending a dominant bit; likewise, the level on the bus is recessive if all the nodes are transmitting recessive bits. By means of the so-called binary countdown technique, each node — immediately following the SOF bit — transmits the message identifier serially on the bus, starting from the most significant bit. When transmitting, each node checks the level observed on the bus against the value of the bit that is being written out. If the node is transmitting a recessive value and the level on the bus is dominant, the node understands it has lost the contention and withdraws immediately. In particular, it ceases transmitting and sets its output port to the recessive level so as not to interfere with the other contending nodes. At the same time, it switches to the receiving state to read the incoming (winning) frame. The binary countdown technique ensures that in the case of a collision, all the nodes that are sending lower-priority frames will abort their transmissions by the end of the arbitration field, except for the one that is sending the frame characterized by the highest priority (the winning node does not even realize that a collision has occurred). This implies that no two nodes in a CAN network can be transmitting messages related to the same object (that is to say, characterized by the same identifier) at the same time. If this is not the case, in fact, unmanageable collisions could take place that, in turn, cause transmission errors. Because of the automatic retransmission feature of the CAN controllers, this will lead almost certainly to a burst of errors on the bus, until the stations involved are disconnected by the fault confinement mechanism. This implies that, in general, only one node can be the producer of each object. One exception to this rule is given by the frames without a data field, such as, for example, the remote frames. In this case, should a collision occur among frames with the same identifier, they overlap perfectly and hence no collision effectively occurs. The same is also true for data frames that have a nonempty data field, provided that the content of this field is the same for all the frames sharing the same identifier.
However, it makes no sense in general to send frames with a fixed data field. All nodes that lose the contention have to retry the transmission as soon as the exchange of the current (winning) frame ends. They will all try to send their frames again immediately after the intermission is read on the bus. Here, a new collision could take place that also involves the frames sent by the nodes for which a transmission request was issued while the bus was busy. An example that shows the detailed behavior of the arbitration phase in CAN is outlined in Figure 13.4. Here, three nodes (that have been indicated symbolically as A, B, and C) start transmitting a frame at the same time (maybe at the end of the intermission following the previous frame exchange over the bus). As soon as a node understands it has lost the contention, it switches its output level to the recessive value, so that it no longer interferes with the other transmitting nodes. This event takes place when bit ID5 is being sent for node A, while for node B this happens at bit ID2. Node C manages to send the entire identifier field, and then it can keep on transmitting the remaining part of the frame.
FIGURE 13.4 Arbitration phase in CAN.
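The bitwise arbitration just described can be illustrated with a small simulation. The C sketch below (hypothetical names, standard 11-bit identifiers only) mimics the wired-AND bus level bit by bit for a set of contending nodes and reports the winner, i.e., the node that never reads a dominant level while sending a recessive bit.

#include <stdbool.h>
#include <stdint.h>

#define ID_BITS   11
#define MAX_NODES 32

/* Simulates the arbitration phase for n contending standard identifiers.
 * Dominant (0) overwrites recessive (1) on the bus; a node sending a
 * recessive bit while the bus is dominant withdraws from the contention.
 * Returns the index of the winner, or -1 if no node was given. */
int arbitrate(const uint16_t id[], int n)
{
    bool in_contention[MAX_NODES];
    if (n > MAX_NODES) n = MAX_NODES;
    for (int i = 0; i < n; i++)
        in_contention[i] = true;

    for (int bit = ID_BITS - 1; bit >= 0; bit--) {   /* MSB is sent first */
        int bus = 1;                                 /* recessive unless someone drives 0 */
        for (int i = 0; i < n; i++)
            if (in_contention[i] && ((id[i] >> bit) & 1u) == 0u)
                bus = 0;
        for (int i = 0; i < n; i++)                  /* recessive senders reading 0 withdraw */
            if (in_contention[i] && ((id[i] >> bit) & 1u) == 1u && bus == 0)
                in_contention[i] = false;
    }
    for (int i = 0; i < n; i++)
        if (in_contention[i])
            return i;   /* kept transmitting through the whole identifier */
    return -1;
}

For instance, calling arbitrate() with the identifiers 0x300, 0x2F0, and 0x2F8 returns the index of 0x2F0, the lowest (and therefore highest-priority) identifier of the three.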
13.2.4 Error Management One of the fundamental requirements in the definition of the CAN protocol was the need for a communication system characterized by high robustness, i.e., a system that is able to detect most of the transmission errors. Hence, particular care has been taken in defining error management. The CAN specification provides five different mechanisms to detect transmission errors:
1. Cyclic redundancy check: When transmitting a frame, the originating node adds a 15-bit-wide CRC to the end of the frame itself. Receiving nodes reevaluate the CRC to check if it matches the transmitted one. Generally speaking, the CRC used in CAN is able to discover up to 5 erroneous bits distributed arbitrarily in the frame or error bursts of up to 15 bits.
2. Frame check: The fixed-format fields in the received frames can be easily tested against their expected values. For example, the CRC and ACK delimiters as well as the EOF field have to be at the recessive level. If one or more illegal bits are detected, a form error is generated.
3. Acknowledgment check: The transmitting node checks whether the ACK bit has been set to the dominant value by at least one receiver. If it has not, an acknowledgment error is issued.
4. Bit monitoring: Each transmitting node compares the level on the bus against the value of the bit that is being written. Should a mismatch occur, an error is generated. This does not hold for the arbitration field or the acknowledgment slot. Such an error check is very effective to detect local errors that may occur in the transmitting nodes.
5. Bit stuffing: Each node verifies whether the bit stuffing rules have been violated in the portion of the frames from the SOF bit up to the CRC sequence. If six consecutive bits of identical value are read from the bus, an error is generated.
The residual probability that a corrupted message is not detected in a CAN network — under realistic operating conditions — has been evaluated to be about 4.7 · 10^-11 times the frame error rate or less.
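For item 1, the CRC sequence can be computed bit serially. The C sketch below uses the 15-bit generator polynomial commonly documented for CAN (x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1); the bit-array interface is purely illustrative.

#include <stddef.h>
#include <stdint.h>

/* Bit-serial computation of the 15-bit CRC sequence, fed with the (unstuffed)
 * frame bits from the SOF bit up to the end of the data field, one bit per
 * array element (0 or 1). */
uint16_t can_crc15(const uint8_t *bits, size_t n)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t crcnxt = (uint16_t)(bits[i] ^ ((crc >> 14) & 1u));
        crc = (uint16_t)((crc << 1) & 0x7FFFu);
        if (crcnxt)
            crc ^= 0x4599u;          /* generator polynomial without the x^15 term */
    }
    return crc;                      /* transmitted MSB first in the CRC field */
}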
13.2.5 Fault Confinement To prevent a node that is not operating properly from sending repeatedly corrupted frames, hence blocking the entire network, a fault confinement mechanism has been included in the CAN specification. The fault confinement unit supervises the correct operation of the related MAC sublayer, and should the node become defective, it disconnects that node from the bus. The fault confinement mechanism has been conceived to discriminate, as long as it is possible, between permanent failures and short disturbances that may cause bursts of errors on the bus. According to this mechanism, each node can be in one of the three following states:
• Error active
• Error passive
• Bus off
Error-active and error-passive nodes take part in the communication in the same way. However, they react to the error conditions differently. They send active error flags in the former case and passive error flags in the latter. This is because an error-passive node has already experienced several errors, and hence it should avoid interfering with the network operations (a passive error flag, in fact, does not corrupt the ongoing frame exchange). The fault confinement unit uses two counters to track the behavior of the node with respect to the transmission errors: transmission error count (TEC) and receive error count (REC). The rules by which TEC and REC are managed are actually quite complex. However, they can be summarized as follows: each time an error is detected, the counters are increased by a given amount, whereas successful exchanges decrease them by one. Furthermore, the amount of the increase for the nodes that first detected the error is higher than for the nodes that simply replied to the error flag. In this way, it is very likely that the counters of the faulty nodes increase more quickly than those of the nodes that are operating properly, even when sporadic errors due to electromagnetic noise are considered. When counters exceed the first threshold (127), the node is switched to the error-passive state, to try not to affect the network. When a second threshold (255) is exceeded, the node is switched to the bus-off state. At this point, it can no longer transmit any frame on the network, and it can be switched back to the error-active state only after it has been reset and reconfigured.
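The counter-based state machine just described can be sketched as follows. The exact increment and decrement rules of the specification are more elaborate than this, so the amounts used here (a representative +8 for a detected transmission error, +1 for a receive error, -1 for a successful exchange) are only illustrative.

#include <stdint.h>

typedef enum { ERROR_ACTIVE, ERROR_PASSIVE, BUS_OFF } can_node_state_t;

typedef struct {
    uint16_t tec;   /* transmission error count */
    uint16_t rec;   /* receive error count      */
} error_counters_t;

/* Simplified fault confinement bookkeeping (illustrative increments only). */
void on_transmit_error(error_counters_t *c)      { c->tec += 8; }
void on_receive_error(error_counters_t *c)       { c->rec += 1; }
void on_successful_exchange(error_counters_t *c)
{
    if (c->tec > 0) c->tec--;
    if (c->rec > 0) c->rec--;
}

can_node_state_t node_state(const error_counters_t *c)
{
    if (c->tec > 255)                  return BUS_OFF;        /* reset and reconfiguration needed */
    if (c->tec > 127 || c->rec > 127)  return ERROR_PASSIVE;  /* sends passive error flags        */
    return ERROR_ACTIVE;
}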
13.2.6 Communication Services According to the ISO specification [ISO1], the LLC sublayer of CAN provides two communication services only: L_DATA, which is used to broadcast the value of a specific object over the network, and L_REMOTE, which is used to ask for the value of a specific object to be broadcast by the related remote producer. From a practical point of view, these primitives are implemented directly in the hardware by all currently available CAN controllers. 13.2.6.1 Model for Information Exchanges Unlike most network protocols conceived for use in automated manufacturing environments (which rely on node addressing), CAN adopts object addressing. In other words, messages are not tagged with the address of the destination or originating node. Instead, each piece of information that is exchanged over the network (often referred to as an object) is assigned a unique identifier, which denotes unambiguously the meaning of the object itself in the whole system. This fact has important consequences on the way communications are carried out in CAN. In fact, identifying the objects that are exchanged over the network according to their meaning rather than to the node they are intended for implicitly allows multicasting and makes it very easy for the control applications to manage interactions among devices according to the producer–consumer paradigm. The exchange of information in CAN takes place according to the three phases shown in Figure 13.5:
1. The producer of a given piece of information encodes and transmits the related frame on the bus (the arbitration technique will transparently resolve any contention that should occur).
2. Because of the intrinsically broadcast nature of the bus, the frame is propagated all over the network, and every node reads its content in a local receive buffer.
3. The frame acceptance filtering (FAF) function in each node determines whether the information is relevant to the node itself. If it is, the frame is passed to the upper communication layers (from a practical point of view, this means that the CAN controller raises an interrupt to the local device logic, which will then read the value of the object); if it is not, the frame is simply ignored and discarded.
FIGURE 13.5 Producer–consumer model.
In the sample data exchange depicted in Figure 13.5, node B is the producer of some kind of information that is relevant to (i.e., consumed by) nodes A and D. Node C is not interested in such data, so it is rejected by the filtering function (this is the default behavior of the FAF function). 13.2.6.2 Model for Device Interaction The access technique of CAN makes this kind of network particularly suitable to be used in distributed systems that communicate according to the producer–consumer model. In this case, data frames are used by the producer nodes to broadcast new values over the network, each of which is identified unambiguously by means of its identifier. Unlike the networks based on the producer–consumer–arbiter model, such as the Factory Instrumentation Protocol (FIP), information is sent in CAN as soon as it becomes available from either the control applications or the controlled physical system (by means of sensors), without the need for the intervention of a centralized arbiter. This noticeably improves the responsiveness of the whole system. CAN networks also work equally well when they are used to interconnect devices in systems based on a more conventional master–slave communication model. In this case, the master can use remote frames to ask for some specific information to be remotely sent on the network. The producer of that information, as a consequence of this frame, will reply with a data frame carrying the related object. It is worth noting that this kind of interaction is implemented in CAN in a fairly more flexible way than in the conventional master–slave networks, such as, for example, PROFIBUS-DP. In CAN, in fact, it is not necessary for the reply (data frame) to follow the request (remote frame) immediately. In other words, the network is not kept busy while the device is trying to send the reply. This allows the entire bandwidth to be theoretically available to the applications. Furthermore, the reply containing the requested value is broadcast on the whole network, and hence it can be read by all the interested nodes, in addition to the one that transmitted the remote request.
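The frame acceptance filtering step of this producer–consumer exchange is commonly expressed as a mask-and-match test, which is how many CAN controllers let the host declare which identifiers are relevant. The C sketch below is illustrative only and is not the register layout of any specific device.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative acceptance filter: a received identifier is relevant to the
 * node if, for at least one configured filter, the bits selected by the mask
 * match the filter code ("don't care" bits carry a 0 in the mask). */
typedef struct {
    uint32_t code;
    uint32_t mask;
} acceptance_filter_t;

bool frame_is_relevant(uint32_t id, const acceptance_filter_t *f, int nfilters)
{
    for (int i = 0; i < nfilters; i++)
        if ((id & f[i].mask) == (f[i].code & f[i].mask))
            return true;    /* pass the frame to the upper layers  */
    return false;           /* ignore and discard the frame        */
}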
13.2.7 Implementation According to the internal architecture, CAN controllers can be classified in two different categories: BasicCAN and FullCAN. Conceptually, BasicCAN controllers are provided with one transmit and one receive buffer, as in conventional UARTs. The frame-filtering function, in this case, is generally left to the application programs (i.e., it is under control of the host controller), even though some kind of filtering can be done by the controller. To avoid overrun conditions, a double-buffering scheme based on shadow receive buffers is usually available, which permits a new frame to be received from the bus while the previous one is being read by the host controller. An example of a controller based on the BasicCAN scheme is given by Philips’ PCA82C200. FullCAN implementations foresee a number of internal buffers that can be configured to either receive or transmit some particular messages. In this case, the filtering function is implemented directly in the
CAN controller. When a new frame that is of interest to the node is received from the network, it is stored in the related buffer, where it can then be read by the host controller. In general, new values simply overwrite the previous ones, and this does not lead to an overrun condition (the old value of a variable is superseded by a newer one). The Intel 82526 and 82527 CAN controllers are based on the FullCAN architecture. FullCAN controllers, in general, free the host controller of a number of activities, so they are considered to be more powerful than BasicCAN controllers. However, the most recent CAN controllers embed the operating principles of both of the above architectures, so the above classification is actually in the process of being superseded.
13.3 Main Features of CAN The medium access technique on which CAN relies basically implements a nonpreemptive distributed priority-based communication system, where each node is enabled to compete directly for the bus ownership, so that it can send messages on its own (this means that CAN is a true multimaster system). This can be advantageous for use in event-driven systems.
13.3.1 Advantages CAN is far simpler and more robust than the token-based access schemes (such as, for example, PROFIBUS when used in multimaster configurations). In fact, there is no need to build or maintain the logical ring, nor to manage the circulation of the token around the master stations. In the same way, it is noticeably more flexible than the solutions based on the time-division multiple-access (TDMA) or combined-message approaches — two techniques adopted by SERCOS and INTERBUS, respectively. This is because message exchanges do not have to be known in advance. When compared to schemes based on centralized polling, such as FIP, it is not necessary to have a node in the network that acts as the bus arbiter, which can become a point of failure for the whole system. Since in CAN all the nodes are masters (at least from the point of view of the MAC mechanism), it is very simple for them to notify asynchronous events, such as, for example, alarms or critical error conditions. In all cases where this aspect is important, CAN is clearly better than the above-cited solutions. Thanks to the arbitration scheme, it is certain that no message will be delayed by lower-priority exchanges (this phenomenon is known as priority inversion). Since the CAN protocol is not preemptive (as is the case for almost all existing protocols), a message can still be delayed by a lower-priority one whose transmission has already started. This is unavoidable in any nonpreemptive system. However, as the frame size in CAN is very small (standard frames are 135 bits long at most, including stuff bits), the blocking time experienced by the very urgent messages is in general quite low. This makes CAN a very responsive network, which explains why it is used in many real-time control applications despite its relatively low bandwidth. The above characteristics have to be considered carefully when assigning identifiers to the different objects that have to be exchanged in distributed real-time control applications. From an intuitive point of view, the most urgent messages (i.e., the messages characterized by the tightest deadlines) should be assigned the lowest identifiers (for example, identifier 0 labels the message that has the highest priority in any CAN network). If the period of cyclic data exchanges (and the minimum interarrival time of the acyclic ones) is known in advance, a number of techniques based on either the rate monotonic or deadline monotonic approaches have appeared in the literature [TIN] that can be used to find (if it exists) a suitable assignment of identifiers to the objects, so that the resulting schedule is feasible (i.e., the deadlines of all the objects are always respected).
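One simple way to follow this guideline is a deadline-monotonic assignment: sort the message set by deadline and give the lowest (highest-priority) identifiers to the messages with the tightest deadlines. The C sketch below (hypothetical types, no schedulability test) only performs the assignment; a full analysis along the lines of the techniques cited in [TIN] would still be needed to verify that all deadlines are met.

#include <stdint.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    uint32_t    deadline_us;   /* relative deadline of the message */
    uint16_t    identifier;    /* assigned CAN identifier (output) */
} message_spec_t;

static int by_deadline(const void *a, const void *b)
{
    const message_spec_t *ma = a, *mb = b;
    return (ma->deadline_us > mb->deadline_us) - (ma->deadline_us < mb->deadline_us);
}

/* Deadline-monotonic identifier assignment: tighter deadline -> lower
 * identifier -> higher priority on the bus. */
void assign_identifiers(message_spec_t *msgs, size_t n)
{
    qsort(msgs, n, sizeof msgs[0], by_deadline);
    for (size_t i = 0; i < n; i++)
        msgs[i].identifier = (uint16_t)i;
}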
13.3.2 Drawbacks A number of drawbacks affect CAN, the most important being related to performance, determinism, and dependability. Although they were initially considered mostly irrelevant, over time they have become quite limiting in a number of application fields.
13.3.2.1 Performances Even though inherently elegant, the arbitration technique of CAN poses serious limitations on the performance that can be obtained by the network. In fact, in order for the arbitration mechanism to operate correctly, it is necessary for the signal to be able to propagate from a node located at one end of the bus up to the farthest node (at the other end) and come back before the originating node samples the level on the bus. Since the sampling point is located roughly after the middle of each bit (the exact position can be programmed by means of suitable registers), the end-to-end propagation delay, including the hardware delay of transceivers, must be shorter than about one quarter of the bit time (the exact value depending on the bit timing configuration in the CAN controller). As the propagation speed of signals is fixed (about 200 m/µs on copper wires), this implies that the maximum length allowed for the bus is necessarily limited and depends directly on the bit rate chosen for the network. For example, a 250 Kbit/s CAN network can span at most 200 m. Similarly, the maximum bus length allowed when the bit rate is selected as equal to 1 Mbit/s is only 40 m. This, to some degree, explains why the maximum bit rate allowed by CAN specifications [ISO1] has been limited to 1 Mbit/s. It is worth noting that this limitation depends on physical factors, and hence it cannot be overcome in any way by advances in the technology of transceivers (to make a comparison, at present, several inexpensive communication technologies are available on the market that allow bit rates in the order of tens or hundreds of Mbit/s). Even though this can appear to be a very limiting factor, it will probably not have any relevant impact in the near future for several application areas — including automotive and process control applications — for which cheap and well-assessed technology is more important than performance. However, there is no doubt that CAN will suffer in a couple of years from the higher bit rates of its competitors, i.e., PROFIBUS-DP (up to 12 Mbit/s), SERCOS (up to 16 Mbit/s), INTERBUS (up to 2 Mbit/s), FlexRay (up to 10 Mbit/s), or the networks based on Industrial Ethernet (up to 100 Mbit/s). Such solutions, in fact, are able to provide a noticeably higher data rate, which is necessary for the systems that have a lot of devices and very short cycle times (1 ms or less). 13.3.2.2 Determinism Because of its nondestructive bitwise arbitration scheme, CAN is able to resolve in a deterministic way any collision that might occur on the bus. However, if nodes are allowed to produce asynchronous messages on their own — this is the way event-driven systems usually operate — there is no way to know in advance the exact time a given message will be sent. This is because it is not possible to foresee the actual number of collisions a node will experience with higher-priority messages. This behavior leads to potentially dangerous jitters, which in some kinds of applications, such as, for example, those involved in the automotive field, might affect the control algorithms in a negative way and worsen their precision. In particular, it might happen that some messages miss their intended deadlines. Related to determinism is the problem that composability is not ensured in CAN networks.
This means that when several subsystems are connected to the same network, the overall system may fail to satisfy some timing requirement, even though each subsystem was tested separately and proved to behave correctly. This severely limits the possibility of integrating subsystems from different vendors, and hence makes the design tasks more difficult. 13.3.2.3 Dependability The last drawback of CAN concerns dependability. Whenever safety-critical applications are considered, where a communication error may lead to damage to the equipment or even injuries to human beings, such as, for example, in automotive x-by-wire systems, a highly dependable network has to be adopted. Reliable error detection should be achieved both in the value and in the time domain. In the former case, conventional techniques such as, for example, the use of a suitable CRC are adequate. In the latter case, a time-triggered approach is certainly more appropriate than the event-driven communication scheme provided by CAN. In time-triggered systems all actions (including message exchanges, sampling
of sensors, actuation of commanded values, and task activations) are known in advance and must take place at precise points in time. In this context even the presence (or absence) of a message at a given instant provides significant information (i.e., it enables the discovery of faults). Also related to dependability issues is the so-called babbling idiot problem, from which the CAN system might suffer. In fact, a faulty node that repeatedly transmits a very high priority message on the bus can block the whole network. Such a failure cannot be detected by the fault confinement unit embedded in CAN chips, as it does not depend on physical faults, but is due to logical errors.
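To make the bandwidth/length trade-off discussed in Section 13.3.2.1 concrete, the short C sketch below estimates the maximum bus length from the bit rate, using the figures quoted there (a propagation speed of about 200 m/µs and an allowed one-way propagation delay of roughly one quarter of the bit time). It is a back-of-the-envelope approximation only: it neglects transceiver and controller delays, which is why it returns about 50 m at 1 Mbit/s, whereas the practical limit quoted above is 40 m.

#include <stdio.h>

/* Rough upper bound on the bus length, using the figures quoted in
 * Section 13.3.2.1: signal speed ~200 m/us and an allowed one-way
 * propagation delay of about one quarter of the bit time
 * (transceiver and controller delays are neglected here). */
static double max_bus_length_m(double bit_rate_bps)
{
    const double v = 200e6;                /* ~200 m/us expressed in m/s */
    double bit_time = 1.0 / bit_rate_bps;  /* bit duration in seconds    */
    return v * (bit_time / 4.0);
}

int main(void)
{
    printf("1 Mbit/s   -> ~%.0f m\n", max_bus_length_m(1e6));    /* ~50 m  */
    printf("250 kbit/s -> ~%.0f m\n", max_bus_length_m(250e3));  /* ~200 m */
    return 0;
}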
13.3.3 Solutions Among the possible solutions conceived to enhance the behavior of CAN is the so-called time-triggered CAN (TTCAN) protocol [ISO4], for which the first chips are already available. By adopting a common clock and a time-triggered approach it is possible to reduce jitters and provide a fully deterministic behavior. If asynchronous transmissions are not allowed in the system (which means that the arbitration technique is not actually used), TTCAN effectively behaves like a TDMA system, and thus there is not any particular limitation on the bit rate (which could be increased above the theoretical limit of CAN). However, such a solution is generally not advisable, in that the behavior of the resulting network becomes noticeably different from CAN. Other solutions have appeared in the literature for improving CAN performances, such as, for example, WideCAN [WCAN], that provide higher bit rates and still rely on the conventional CAN arbitration technique. However, at present their interest is mainly theoretical.
13.4 Time-Triggered CAN The time-triggered CAN protocol was introduced by Bosch in 1999 with the aim of making CAN suitable for the new needs of the automotive industry. However, it can be profitably used in those applications characterized by tight timing requirements that demand strictly deterministic behavior. In TTCAN, in fact, it is possible to decide exactly the point in time when safety-critical messages will be exchanged, irrespective of the network load. Moreover, composability is much improved with respect to CAN, so that it is possible to split a system into several subsystems that can be developed and tested separately. The TTCAN specification is now stable and is being standardized by ISO [ISO4]. The main reason that led to the definition of TTCAN was the need to provide improved communication determinism while maintaining the highest degree of compatibility with the existing CAN devices and development tools. In this way, noticeable savings in the investments for the communication technology can be achieved.
13.4.1 Main Features One of the most appealing features of TTCAN is that it allows event-driven and time-triggered operations to coexist in the same network. To ease migration from CAN, TTCAN foresees two levels of implementations that are known as levels 1 and 2, respectively. Level 1 implements basic time-triggered communications over CAN. Level 2, which is a proper extension of level 1, also offers a means for maintaining a global system time across the whole network, irrespective of tolerances and drifts of the local oscillators. This enables high-end synchronization, and hence true time-triggered operations can take place in the system. The TTCAN protocol is placed above the (unchanged) CAN protocol. It allows time-triggered exchanges to take place in a quasi-conventional CAN network. Because TTCAN relies on CAN directly (they adopt the same frame format and the same transmission protocol), it suffers from the same performance limitations of the underlying technology. In particular, it is not practically feasible to increase the transmission speed above 1 Mbit/s. However, because of the time-triggered paradigm it relies on, TTCAN is able to ensure strictly deterministic communications, which means that, for example, it is suitable for the first generation of drive-by-wire automotive systems — which are provided with hydraulic/mechanical backups. However, it will likely be unsuitable for the next generation of steer-by-wire applications. In these cases, in fact, the required bandwidth is noticeably higher.
13.4.2 Protocol Specification TTCAN is based on a centralized approach, where a special node called the time master (TM) keeps the whole network synchronized by regularly broadcasting a reference message (RM), usually implemented as a high-priority CAN message. Redundant time masters can be envisaged to provide increased reliability. Whenever receiving RM, each node restarts its cycle timer, so that a common view of the elapsing time is ensured across the whole network. In practice, every time a SOF bit is read on the bus, a synchronization event is generated in every network controller that causes the local time to be copied in a sync mark register. If the SOF bit is related to a valid reference message, the sync mark register is then loaded into the reference mark register. At this point, the cycle time is evaluated as the difference between the current local time and the reference mark. Two kinds of RM are foreseen: in level 1 implementations RM is 1 byte long, whereas level 2 relies on a 4-byte RM that is backward compatible with level 1 (from a practical point of view, 3 bytes are added for distributing the global time as seen by the time master). Protocol execution is driven by the progression of the cycle time. In particular, a number of time marks are defined in each network controller as either transmission or receive triggers, which are used for sending messages and validating message receptions, respectively. In TTCAN each node does not have to know all the messages in the network. Instead, only details of the messages the node sends or reads are needed. Transmission of data is organized as a sequence of basic cycles (BCs). Each basic cycle begins with the reference message, which is followed by a fixed number of time windows that are configured offline and can be of the following three types: • Exclusive windows: Each exclusive window is statically reserved to a predefined message, so that collisions cannot occur. They are used for safety-critical data that have to be sent deterministically and without jitters. • Arbitration windows: Such windows are not preallocated to any given message; thus, different competing messages will rely on the nondestructive CAN arbitration scheme to resolve any possible collision that might occur. • Free windows: They are reserved for future expansions of TTCAN systems. So that time windows are not exceeded, in TTCAN controllers it should be possible to disable the automatic retransmission feature of CAN when either the contention is lost or transmission errors are detected. The only exception occurs when several adjacent arbitrating windows exist. In this case, they can be merged to provide a single larger window, which can accommodate asynchronously generated messages in a more flexible way. Despite it seamlessly mixing both synchronous (exclusive) and asynchronous (arbitration) messages, TTCAN is very dependable: in fact, should there be a temporary lack of synchronization and more than one node tries to transmit in the same exclusive window, the arbitrating scheme of CAN is used to solve the collision. For increased flexibility, it is possible to have more than one basic cycle. A system matrix can be defined that consists of up to 64 different BCs, which are repeated periodically (see Figure 13.6). Thus, the effective periodicity in TTCAN is given by the so-called matrix cycle. A cycle counter — included in the first byte of RM — is used by every node to determine the current basic cycle. 
It is increased at each basic cycle up to a maximum value (which is selected on a network-wide basis before operation is started), after which it is restarted. It should be noted that the system matrix is highly column oriented. In particular, each BC is made up of the same sequence of time windows; i.e., corresponding windows in different BCs have the same duration. However, they can be used to convey different messages, depending on the cycle counter. In this way, it is possible to have messages in exclusive time windows that are repeated once every given number of BCs. In this case, each message is assigned a repeat factor and a cycle offset, which characterize its transmission schedule. In the same way, it is possible to have more than one exclusive window in the BC allocated to the same message. This is useful either to replicate critical data or to provide a refresh rate for some variables that is faster than the basic cycle.
FIGURE 13.6 System matrix in TTCAN.
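The following C sketch illustrates, in a simplified form, how a level 1 TTCAN controller could derive the cycle time from the reference mark and decide whether a transmit trigger is due; the structure and function names are hypothetical and do not correspond to any specific TTCAN chip.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical registers of a level 1 TTCAN controller. */
typedef struct {
    uint16_t local_time;      /* free-running local time counter        */
    uint16_t sync_mark;       /* local time captured at each SOF bit    */
    uint16_t reference_mark;  /* sync mark of the last valid RM         */
} ttcan_node_t;

/* On reception of a valid reference message, the captured sync mark
 * becomes the new reference mark (see Section 13.4.2). */
static void on_valid_reference_message(ttcan_node_t *n)
{
    n->reference_mark = n->sync_mark;
}

/* The cycle time is the difference between the current local time and
 * the reference mark (wrap-around handled by unsigned arithmetic). */
static uint16_t cycle_time(const ttcan_node_t *n)
{
    return (uint16_t)(n->local_time - n->reference_mark);
}

/* A transmit trigger fires when the cycle time reaches its time mark
 * and the trigger is enabled for the current basic cycle, according to
 * the repeat factor and cycle offset described above (repeat >= 1). */
static bool tx_trigger_due(const ttcan_node_t *n, uint16_t time_mark,
                           uint8_t cycle_count, uint8_t repeat, uint8_t offset)
{
    return cycle_time(n) == time_mark && (uint8_t)(cycle_count % repeat) == offset;
}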
13.4.3 Implementation TTCAN requires slight and inexpensive changes to the current CAN chips. In particular, transmit and receive triggers and a counter for managing the cycle time are needed for ensuring time-triggered operations. Even though level 1 could be implemented in software, a specialized hardware support can reduce noticeably the burden on the processor for managing time-triggered operations. As level 2-compliant controllers should allow drift correction and calibration of the local time, they need modified hardware. The structure of TTCAN modules is very similar to that of conventional CAN modules. In particular, two additional blocks are needed: the trigger memory and the frame synchronization entity. The former is used for storing the time marks of the system matrix. They are linked to the message buffers held in the controller’s memory. The latter is used to control the time-triggered communications. At present, there are several controllers available off-the-shelf that comply with TTCAN specifications, so that this protocol can be readily embedded in new projects.
13.5 CAN-Based Application Protocols To reduce the costs of designing and implementing automated systems, a number of higher-level application protocols have been defined in the past few years that rely on the CAN data link layer to exchange messages among the nodes (all the functions of the data link layer of CAN are implemented directly in hardware in the current CAN controllers, which increases the efficiency and reliability of the data exchanges). The aim of such protocols is to provide a usable and well-defined set of service primitives that can be used to interact with the field devices in a standardized way. At present, two of the most widely available solutions for the process control and automated manufacturing environments are CANopen [COP] and DeviceNet [DNET]. Both of them define an object model that describes the behavior of devices. This permits interoperability and interchangeability among devices coming from different manufacturers. In fact, as long as a device conforms to a given profile, it can be used in place of any other device (of a different brand) that adheres to the same profile.
13.5.1 CANopen CANopen was originally conceived to rely on the communication services provided by the CAN application layer (CAL). However, the latest specifications [DS301] no longer refer explicitly to CAL. Instead, the relevant communication services have been embedded directly in the CANopen documents. In CANopen, information is exchanged by means of communication objects (COBs). A number of different COBs are foreseen, which are aimed at different functions: • Process data objects (PDOs), used for real-time exchanges such as, for example, measurements read from sensors and commanded values sent to the actuators for controlling the physical system • Service data objects (SDOs), used for non-real-time communications, i.e., parameterization of devices and diagnostics • Emergency objects (EMCY), used by devices to notify the control application that some error condition has occurred • Synchronization object (SYNC), used to achieve synchronized and coordinated operations in the system Even though in principle every CAN node is a master, at least from the point of view of the MAC mechanism, CANopen systems often rely on a master–slave approach, so as to simplify system configuration and network management. In most cases, in a CANopen network there is only one application master (which is responsible for actually controlling the operations of the automated system) and up to 127 slave devices (sensors and actuators). Each device is identified by means of a unique 7-bit address, called the node identifier, which lies in the range of 1 to 127. The node identifier 0 is used in general for broadcast communications. To ease network configuration, a predefined master–slave connection set must be provided by every CANopen device. It is a standard allocation scheme of identifiers to COBs that is available directly after initialization, when a node is switched on or reset — provided that no modifications have been stored in a nonvolatile memory of the device. COB identifiers in the predefined connection set are made up of a function code — which takes the four most significant bits of the CAN identifier — followed by the node address. The function code, which largely determines the priority of the COB, is used to discriminate among the different kinds of COBs, that is, PDOs, SDOs, EMCYs, network management (NMT) functions, and so on. 13.5.1.1 Object Dictionary The behavior of any CANopen device is described completely by means of a number of objects, each one tackling a particular aspect related to either the communications on the CAN bus or the functions available to interact with the physical controlled system (for example, there are objects that define the device type, the manufacturer’s name, the hardware and software versions, and so on). All the objects relevant to a given node are stored in the object dictionary (OD) of that node. Entries in the OD are addressed by means of a 16-bit index. Each entry, in turn, can either be represented by a single value or consist of several components that are accessible through an 8-bit subindex (such as the arrays and records). The object dictionary is split into four separate parts, according to the index of entries. Entries below 1000H are used to specify data types. Entries from 1000H to 1FFFH are used to describe communication-specific parameters (i.e., the interface of the device as seen by the CAN network). Entries from 2000H to 5FFFH can be used by manufacturers to extend the basic set of functions of their devices.
Their use has to be considered carefully, in that they could make devices no longer interoperable. Finally, entries from 6000H to 9FFFH are used to describe in a standardized way all aspects related to a specific category of devices (as defined in a device profile). 13.5.1.2 Process Data Objects All the real-time process data involved in controlling a physical system are exchanged in CANopen by means of PDOs. Each PDO is mapped on exactly one CAN frame, so that it can be exchanged quickly
and reliably. As a direct consequence, the amount of data that can be exchanged with one PDO is limited to 8 bytes at most. In most cases, this is more than sufficient to encode an item of process data. According to the predefined connection set, each node in CANopen can define up to four receive PDOs (from the application master to the device) and four transmit PDOs (from the device to the application master). In case more PDOs are needed, the PDO communication parameter entries in the OD of the device can be used to define additional messages — or to change the existing ones. By using the PDO mapping parameter — if supported by the device — it is even possible to define in the configuration phase which application objects (i.e., process variables) will be included in each PDO. The transmission of PDOs from the slave devices can be triggered by some local event taking place on the node — including the expiration of some time-out — or it can be remotely requested from the master. This gives system designers a very high degree of flexibility in choosing how devices interact in the automated system, and enables the features offered by intelligent devices to be exploited better. No additional control information is added to PDOs by CANopen, so that communication efficiency is as high as in CAN. This means that the meaning of each PDO is determined directly by the related identifier. As multicasting is allowed on PDOs, their transmission is unconfirmed; i.e., the producer has no way to determine whether the PDO has been read by all the intended consumers. One noticeable feature of CANopen is that it can provide synchronous operations. In particular, it is possible to configure the transmission type of each single PDO so that its exchanges will be driven by the occurrence of the SYNC message, which is sent regularly by a node known as sync master (which usually is the same node as the application master). Synchronous data exchanges, in this case, take place in periodic communication cycles. When synchronous operations are selected, commanded values are not actuated by devices as soon as they are received, nor are sampled values transmitted immediately. Instead, as depicted in Figure 13.7, each time a SYNC message is read from the network, the PDOs received in the previous communication cycle are actuated by every output device. At the same time, all sensors will sample their input ports and the measured values will be sent as soon as possible in the next cycle. A synchronous window length parameter can be defined that specifies the latest time when it is certain that all commanded values have been made available to devices. After that time the processing of output values can be started. Synchronous operations provide a noticeable improvement for what concerns the effect of jitters: in this case, in fact, system operations and timings are decoupled by the actual times PDOs are exchanged over the network. As the SYNC message is mapped on a high-priority frame, jitters are, at worst, the same as the duration of the longest CAN message.
FIGURE 13.7 Synchronous operation.
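As an illustration of the predefined master–slave connection set described in Section 13.5.1, the following C fragment builds COB identifiers from a 4-bit function code and a 7-bit node identifier. The function code values shown in the example reflect the allocation commonly documented for CANopen (e.g., 0x3 for the first transmit PDO); the normative values are given in [DS301], so the numbers should be treated as illustrative.

#include <stdint.h>
#include <stdio.h>

/* COB-IDs of the predefined connection set: a function code in the four
 * most significant bits of the 11-bit CAN identifier, followed by the
 * 7-bit node ID (see Section 13.5.1). */
static uint16_t cob_id(uint8_t function_code, uint8_t node_id)
{
    return (uint16_t)(((function_code & 0x0F) << 7) | (node_id & 0x7F));
}

int main(void)
{
    uint8_t node = 5;
    /* Function codes below are the commonly documented ones; check the
     * CANopen specification for the normative allocation. */
    printf("EMCY   node %u: 0x%03X\n", node, cob_id(0x1, node)); /* 0x085 */
    printf("TPDO1  node %u: 0x%03X\n", node, cob_id(0x3, node)); /* 0x185 */
    printf("RPDO1  node %u: 0x%03X\n", node, cob_id(0x4, node)); /* 0x205 */
    printf("SDO tx node %u: 0x%03X\n", node, cob_id(0xB, node)); /* 0x585 */
    printf("SDO rx node %u: 0x%03X\n", node, cob_id(0xC, node)); /* 0x605 */
    return 0;
}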
13.5.1.3 Service Data Objects SDOs are used in CANopen for parameterization and configuration, which usually take place at a lower priority than process data (hence, they are effectively considered non-real-time exchanges). In this case a confirmed transmission service has to be provided, which ensures a reliable exchange of information. Furthermore, SDOs are only available on a peer-to-peer communication basis (multicasting is not allowed). A fragmentation protocol has been adopted for SDOs — which derives from the domain transfer services of CAL — so that information of any size can be exchanged. This means that the SDO sender has to split the information into smaller chunks, which are then reassembled at the receiving side. This affects the communication efficiency in a negative way. However, as SDOs are not used for the real-time control of the system, this is not a problem. SDOs are used to access the entries of the object dictionary directly, so that they can be read or modified by the configuration tools. From a practical point of view, two services are provided, which are used to upload and download the content of one subentry of the OD, respectively. According to the predefined connection set, each node must provide SDO server functionalities and has to define a pair of COB IDs for dealing with the OD access, one for each direction of transfer. In a CANopen network only one SDO client at a time is usually allowed (in reality, what is needed is that all SDO connections between clients and servers be defined statically). It is optionally possible to provide dynamic establishment of additional SDO connections by means of a network entity called the SDO manager. 13.5.1.4 Network Management There are two kinds of functions related to network management (NMT): node control and error control. Node control services are used to control the operation of either a single node or the whole network. For example, they can be used to start or stop nodes, to reset their state, or to put a node in configuration (preoperational) mode. Such commands are definitely time critical, and hence they use the highest-priority communication object available in CAN. Error control services are used to monitor the correct operation of the network. Two mechanisms are available: node guarding and heartbeat. In both cases, low-priority messages are exchanged periodically in the background over the network by the different nodes and suitable watchdogs are defined, both in the NMT master and in slave nodes. Should one device cease sending these messages, after a given timeout the network management layer is made aware of the problem and can take the appropriate actions. 13.5.1.5 Device Profiles In order to provide interoperability, a number of device profiles have been standardized in CANopen. Each profile describes the common behavior of a particular class of devices and is usually published as a separate document.
Among the available profiles are the following: • I/O devices [DS401], which include both digital and analog input/output devices • Drives and motion control, which are used to describe digital motion products, such as stepper motors and servo-drives • Human machine interfaces, which describe the use of displays and operator interfaces • Measuring devices and closed-loop controllers, which measure and control physical quantities • IEC 61131-3 programmable device, which describes the behavior of programmable logic controllers (PLCs) and intelligent devices • Encoders, which define incremental/absolute linear and rotary encoders to measure both position and velocity The I/O device profile, for instance, permits the definition of the polarity of each digital input/output port, or the application of a filtering mask for disabling selected bits. Device ports can be accessed in groups of 1, 8, 16, or 32 bits. For analog devices, it is possible to use the raw value or a converted one (after a scaling factor and an offset have been applied), or to define triggering conditions when specific thresholds are exceeded.
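A minimal sketch of the object dictionary structure described in Section 13.5.1.1 is given below: entries are located by a 16-bit index and an 8-bit subindex, with the communication profile area starting at 1000H and the standardized profile area (used by device profiles such as [DS401]) at 6000H. The particular entries shown and the assumption that every value fits in 32 bits are simplifications made for illustration only.

#include <stdint.h>
#include <stddef.h>

/* Minimal sketch of an object dictionary: entries are addressed by a
 * 16-bit index and an 8-bit subindex (see Section 13.5.1.1). The layout
 * and example entries are illustrative, not taken from any profile. */
typedef struct {
    uint16_t index;
    uint8_t  subindex;
    uint32_t value;        /* simplification: every entry fits in 32 bits */
} od_entry_t;

static od_entry_t dictionary[] = {
    { 0x1000, 0x00, 0x00000191 },  /* device type (communication profile area) */
    { 0x1017, 0x00, 500        },  /* heartbeat producer time in ms (assumed)  */
    { 0x6000, 0x01, 0          },  /* digital input block (device profile area) */
};

static od_entry_t *od_find(uint16_t index, uint8_t subindex)
{
    for (size_t i = 0; i < sizeof dictionary / sizeof dictionary[0]; i++)
        if (dictionary[i].index == index && dictionary[i].subindex == subindex)
            return &dictionary[i];
    return NULL;  /* an SDO access to a missing entry would be rejected */
}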
13.5.2 DeviceNet DeviceNet [DNET] is a very flexible protocol to be used at the field level in the automated environments. The implementation of devices that comply with DeviceNet is, in general, slightly more complex than that for CANopen devices. However, DeviceNet offers a number of additional features with respect to CANopen, which can be used, for example, in complex multimaster networks. One appealing feature of DeviceNet is that it is based on the same Control and Information Protocol (CIP) adopted by ControlNet and EtherNet/IP. This means that a good level of interoperability is ensured among these networks, making it possible to interconnect them to provide seamless communications from devices at the plant floor up to the Internet. In addition to the services at the application level, the DeviceNet specification also defines the physical layer in detail, including aspects such as connectors and cables (thin, thick, and flat cables are foreseen). It should be noted that the cable in DeviceNet can be used for both the signal and power supply (by using 4 wires plus ground). Each DeviceNet network can include up to 64 different devices, which means that each node is identified by means of a 6-bit MAC ID. The allowable bit rates are limited to 125, 250, and 500 Kbit/s, which means that the permitted maximum bus extensions lie in the range of 100 to 500 m. 13.5.2.1 Object Model The behavior and functions of each device are described in detail in DeviceNet by means of objects. In particular, three kinds of objects are foreseen: communication, system, and application-specific objects. Two very important objects are the connection object, which defines all aspects related to a connection (including the CAN identifier and the triggering mode), and the application object, which defines the standardized behavior of a class of devices. Data and services made available by each device are addressed by means of a hierarchical addressing scheme that is based on the following components: MAC ID (i.e., the device’s address), class ID, instance ID, attribute ID, and service code. The class, instance, and attribute identifiers are usually specified on 8 bits, while the service code is made up of a 7-bit integer. 13.5.2.2 Communication Model Communication among nodes (either point-to-point or multicast) takes place according to a connectionoriented scheme. By using the standard 11-bit CAN identifier, it is possible to provide an addressing scheme based on four message groups, in decreasing order of priority: • Message group 1 includes the highest-priority identifiers and permits up to 16 different messages per node. • Message group 2 essentially refers to the predefined master–slave connection set. • Message group 3 is similar to message group 1, but it is made up of low-priority frames. • Message group 4 is primarily used for network management. Basically, two kinds of communication are possible: explicit messages and I/O messages. Explicit messages are used for general data exchanges among devices, such as configuration, management, and diagnostics. These kind of exchanges take place on the network at a low priority. I/O messages are used to exchange high-priority real-time messages according to the producer–consumer model. Because the underlying communication system is based on a CAN network, each frame can include 8 bytes at most. Should one item of data exceed this size, a fragmentation protocol has been defined in DeviceNet that manages message splitting and the successive reassembly.
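The hierarchical addressing scheme of DeviceNet described above can be pictured as the following C structure. The field widths follow the text; the example values (Identity Object class 0x01, attribute 7 for the product name, service code 0x0E for a single-attribute read) are the ones commonly documented for CIP and are shown here only as an illustration.

#include <stdint.h>

/* Sketch of the hierarchical address used by DeviceNet services:
 * MAC ID, class ID, instance ID, attribute ID, plus a service code
 * (names and layout are illustrative). */
typedef struct {
    uint8_t  mac_id;       /* 6-bit node address, 0..63                 */
    uint16_t class_id;     /* usually encoded on 8 bits                 */
    uint16_t instance_id;
    uint8_t  attribute_id;
    uint8_t  service_code; /* 7-bit service code                        */
} devicenet_address_t;

/* Example (assumed values): read attribute 7 of the Identity Object
 * (class 0x01, instance 1) of the node with MAC ID 12 using a
 * single-attribute read service. */
static const devicenet_address_t example = {
    .mac_id = 12, .class_id = 0x01, .instance_id = 1,
    .attribute_id = 7, .service_code = 0x0E
};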
References [COP] European Committee for Electrotechnical Standardization, Industrial Communications Subsystem Based on ISO 11898 (CAN) for Controller-Device Interfaces: Part 4: CANopen, EN 503254, 2001.
[DNET] European Committee for Electrotechnical Standardization, Industrial Communications Subsystem Based on ISO 11898 (CAN) for Controller-Device Interfaces: Part 2: DeviceNet, EN 503252, 2000. [DS102] CAN in Automation International Users and Manufacturers Group e.V., CAN Physical Layer for Industrial Applications: Two-Wire Differential Transmission, CiA DS 102, version 2.0, 1994. [DS301] CAN in Automation International Users and Manufacturers Group e.V., CANopen: Application Layer and Communication Profile, CiA DS 301, version 4.02, 2002. [DS401] CAN in Automation International Users and Manufacturers Group e.V., CANopen: Device Profile for Generic I/O Modules, CiA DS 401, version 2.1, 2002. [ISO1] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 1: Data Link Layer and Physical Signalling, ISO 11898-1, 2003. [ISO2] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 2: High-Speed Medium Access Unit, ISO 11898-2, 2003. [ISO3] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 3: Low-Speed, Fault-Tolerant, Medium Dependent Interface, TC 22/SC 3/WG 1, ISO/PRF 11898-3, 2003. [ISO4] International Organization for Standardization, Road Vehicles: Controller Area Network: Part 4: Time-Triggered Communication, ISO 11898-4, 2004. [TIN] Tindell K.W., Burns A., and Wellings A.J., Calculating Controller Area Network (CAN) messages response times, Control Engineering Practice, 3, 1163–1169, 1995. [WCAN] Cena G. and Valenzano A., A multistage hierarchical distributed arbitration technique for priority-based real-time communication systems, in IEEE Transactions on Industrial Electronics, 49, 1227–1239, 2002.
14 The CIP Family of Fieldbus Protocols 14.1 Introduction ......................................................................14-1 14.2 Description of CIP ............................................................14-3 Object Modeling • Services • Messaging Protocol • Communication Objects • Object Library • Device Profiles • Configuration and Electronic Data Sheets • Bridging and Routing • Data Management
14.3 Network Adaptations of CIP..........................................14-18 DeviceNet • ControlNet • EtherNet/IP
14.4 Benefits of the CIP Family .............................................14-50 Benefits for the Manufacturer of Devices • Benefits for the Users of Devices and Systems
14.5 Protocol Extensions under Development......................14-51 CIP Sync • CIP Safety
Viktor Schiffer Rockwell Automation
14.6 Conclusion.......................................................................14-64 References ...................................................................................14-64
14.1 Introduction In the past, typical fieldbus protocols (e.g., Profibus, Interbus-S, FIP (Factory Instrumentation Protocol), P-Net, AS-i (Actuator/Sensor Interface)) have been isolated implementations of certain ideas and functionalities that the inventors thought were best suited to solve a certain problem or do a certain job. This has led to quite effective fieldbuses that do their particular job quite well, but they are optimized for certain layers within the automation pyramid or are limited in their functionality (e.g., strict single master systems running a Master/Slave protocol). This typically results in barriers within the automation architecture that are difficult to penetrate and that require complex gateway devices without being able to fully bridge the gap between the various systems that can be quite different in nature. In contrast, the CIP™* family of protocols (CIP = Common Industrial Protocol) offers a scalable solution that allows a uniform protocol to be employed from the top level of an automation architecture down to the device level without burdening the individual devices. DeviceNet™* is the first member of this protocol family introduced in 1994. DeviceNet is a CIP implementation using the very popular Controller Area Network (CAN) data link layer. CAN in its typical form (ISO 11898 [11]) defines layers 1 and 2 of the OSI seven-layer model [14] only, while DeviceNet covers the rest. The low cost of implementation and the ease of use of the DeviceNet protocol has led to a large number of manufacturers, with many of them organized in the Open DeviceNet Vendor Association (ODVA; see http://www.odva.org). *CIP™ and DeviceNet™ are trademarks of ODVA.
FIGURE 14.1 Relationship between CIP, its implementations, and the ISO/OSI layer model.
ControlNet™,* introduced a few years later (1997), implemented the same basic protocol on new data link layers that allow for much higher speed (5 Mbps), strict determinism, and repeatability while extending the range of the bus (several kilometers with repeaters) for more demanding applications. Vendors and users of ControlNet products are organized within ControlNet International (CI; see http://www.controlnet.org) to promote the use of these products. In 2000, ODVA and ControlNet International introduced the newest member of the CIP family — EtherNet/IP™,† where IP stands for Industrial Protocol. In this network adaptation, CIP runs over TCP/IP and therefore can be deployed over any Transmission Control Protocol (TCP)/Internet Protocol (IP)-supported data link and physical layers, the most popular of which is IEEE 802.3 [12], commonly known as Ethernet. The universal principles of CIP easily lend themselves to possible future implementations on new physical/data link layers, e.g., ATM, USB, or FireWire. The overall relationship between the three implementations of CIP and the ISO/OSI layer model is shown in Figure 14.1. Two significant additions to CIP are currently being worked on: CIP Sync™ and CIP Safety™.‡ CIP Sync allows synchronization of applications in distributed systems through precision real-time clocks in all devices. These real-time clocks are kept in tight synchronization by background messages between clock masters and clock slaves using the new IEEE 1588:2002 standard [24]. A more detailed description of this CIP extension is given in Section 14.5.1. CIP Safety is a protocol extension that allows the transmission of safety-relevant messages. Such messages are governed by additional timing and integrity mechanisms that are guaranteed to detect system flaws to a very high degree, as required by international standards such as IEC 61508 [15]. If anything goes wrong, the system will be brought to a safe state, typically taking the machine to a standstill. A more detailed description of this CIP extension is given in Section 14.5.2. In both cases, ordinary devices can operate with CIP Sync or CIP Safety devices side by side in the same system. There is no need for strict segmentation into Standard, Sync, and Safety networks. It is even possible to have any combination of all three functions in one device.
*ControlNet™ is a trademark of ControlNet International. †EtherNet/IP™ is a trademark of ControlNet International under license by ODVA. ‡CIP Sync™ and CIP Safety™ are trademarks of ODVA.
14.2 Description of CIP CIP is a very versatile protocol that has been designed with the automation industry in mind. However, due to its very open nature, it can be applied to many more areas. The overall CIP Specification is divided into several volumes:
• Volume 1 is the CIP Specification. It contains all general parts of the specification that apply to all the network variants.
• Volume 2 is the EtherNet/IP Specification. It contains the adaptation of CIP to the Ethernet TCP/IP and User Datagram Protocol (UDP)/IP transportation layers and all details that apply specifically to EtherNet/IP, including extensions and any modifications of the CIP Specification.
• Volume 3 is the DeviceNet Specification. It contains the adaptation of CIP to the CAN data link layer and all details that apply specifically to DeviceNet, including extensions and any modifications of the CIP Specification.
• Volume 4 is the ControlNet Specification. It contains the adaptation of CIP to the ControlNet data link layer and all details that apply specifically to ControlNet, including extensions and any modifications of the CIP Specification.
• Volume 5 will contain CIP Safety; it is planned to be published in early 2005.
The CIP Specification [4] is available from ODVA. It is beyond the scope of this handbook to fully describe each and every detail of this specification, but the key features will be presented. The specification is subdivided into several chapters and appendices that describe the following features:
• Object modeling
• Messaging protocol
• Communication objects
• General object library
• Device profiles
• Electronic Data Sheets
• Services
• Bridging and routing
• Data management
There are a few more chapters containing descriptions of further CIP elements, but they are not of significance in the context of this book. A few terms used throughout this section should be described here to ensure they are well understood: • Client: Within a Client/Server architecture, the client is the device that sends a request to a server. The client expects a response from the server. • Server: Within a Client/Server architecture, the server is the device that receives a request from a client. The server is expected to give a response to the client. • Producer: Within a Producer/Consumer architecture, the producing device places a message on the network for consumption by one or several consumers. The produced message is in general not directed to a specific consumer. • Consumer: Within a Producer/Consumer architecture, the consumer is one of potentially several consuming devices that pick up a message placed on the network by a producing device. • Producer/Consumer model: CIP makes use of the Producer/Consumer model as opposed to the traditional Source/Destination message addressing scheme (Figure 14.2). It is inherently multicast. Nodes on the network determine if they should consume the data in a message based on the Connection ID in the packet.
FIGURE 14.2 Source/Destination vs. Producer/Consumer model.
• Explicit Message: Explicit Messages contain addressing and service information that directs the receiving device to perform a certain service (action) on a specific part (e.g., an attribute) of a device. • Implicit (Input/Output (I/O)) Message: Implicit Messages do not carry address or service information; the consuming node(s) already know what to do with the data based on the Connection ID that was assigned when the connection was established. They are called Implicit Messages because the meaning of the data is implied by the Connection ID. Let us now have a look at the individual elements of CIP.
14.2.1 Object Modeling CIP makes use of abstract object modeling to describe: • The suite of available communication services • The externally visible behavior of a CIP node • A common means by which information within CIP products is accessed and exchanged Every CIP node is modeled as a collection of objects. An object provides an abstract representation of a particular component within a product. Anything not described in object form is not visible through CIP. CIP objects are structured into classes, instances, and attributes. A class is a set of objects that represent the same kind of system component. An object instance is the actual representation of a particular object within a class. Each instance of a class has the same attributes, but it has its own particular set of attribute values. As Figure 14.3 illustrates, multiple object instances within a particular class can reside within a CIP node. In addition to the instance attributes, an object class may also have class attributes. These are attributes that describe properties of the whole object class, e.g., how many instances of this particular object exist. Furthermore, both object instances and the class itself exhibit a certain behavior and allow certain services to be applied to the attributes, instances, or whole class. All publicly defined objects that are implemented in a device must follow at least the mandatory requirements of the CIP specification. Vendor-specific objects may also be defined with a set of instances, attributes, and services according to the requirements of the vendor. However, they need to follow certain rules described in Chapter 4 of the CIP Specification [4]. The objects and their components are addressed by a uniform addressing scheme consisting of: • Node Identifier: An integer identification value assigned to each node on a CIP network. On DeviceNet and ControlNet, this is also called MAC ID (Media Access Control Identifier) and is nothing more than the node number of the device. On EtherNet/IP the Node ID is the IP address. • Class Identifier (Class ID): An integer identification value assigned to each object class accessible from the network. • Instance Identifier (Instance ID): An integer identification value assigned to an object instance that identifies it among all instances of the same class.
FIGURE 14.3 A class of objects.
• Attribute Identifier (Attribute ID): An integer identification value assigned to a class or instance attribute. • Service Code: An integer identification value that denotes an action request that can be directed at a particular object instance or object class (see Section 14.2.2). Object Class Identifiers are divided into open objects, defined in the CIP Specifications (ranging from 0x00 to 0x63 and 0x00F0 to 0x02FF), and vendor-specific objects (ranging from 0x64 to 0xC7 and 0x0300 to 0x04FF); all other Class Identifiers are reserved for future use. In some cases, e.g., within the Assembly Object class, Instance Identifiers are divided into open instances, defined in the CIP Specifications (ranging from 0x00 to 0x63 and 0x0100 to 0x02FF), and vendor-specific instances (ranging from 0x64 to 0xC7 and 0x0300 to 0x04FF); all other instance identifiers are reserved for future use. Attribute Identifiers are divided into open attributes, defined in the CIP Specifications (ranging from 0x00 to 0x63), and vendor-specific attributes (ranging from 0x64 to 0xC7); the other Attribute Identifiers are reserved for future use. Vendor-specific objects can be created with a lot of freedom, but they still have to adhere to certain rules specified for CIP; e.g., they can use whatever Instance and Attribute IDs they wish, but their class attributes must follow the CIP Specification. Figure 14.4 shows an example of this object addressing scheme. More details on object modeling can be found in Chapters 1 and 4 of the CIP Specification [4].
FIGURE 14.4 Object addressing example.
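The addressing scheme just described can be summarized by a small C structure together with a helper that tests whether a Class ID falls into the vendor-specific ranges quoted above; the type and field names are illustrative and not part of the CIP Specification.

#include <stdbool.h>
#include <stdint.h>

/* Sketch of the CIP addressing tuple described above. */
typedef struct {
    uint32_t node_id;      /* MAC ID on DeviceNet/ControlNet, IP address on EtherNet/IP */
    uint16_t class_id;
    uint16_t instance_id;
    uint16_t attribute_id;
} cip_address_t;

/* Vendor-specific Class ID ranges quoted in the text:
 * 0x64-0xC7 and 0x0300-0x04FF. */
static bool is_vendor_specific_class(uint16_t class_id)
{
    return (class_id >= 0x64 && class_id <= 0xC7) ||
           (class_id >= 0x0300 && class_id <= 0x04FF);
}

/* The address shown in Figure 14.4: Node ID #4, Object Class #5,
 * Instance #2, Attribute #2. */
static const cip_address_t figure_14_4_example = { 4, 5, 2, 2 };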
14.2.2 Services Service Codes are used to define the action that is requested to take place when an object or parts of an object are addressed through Explicit Messages using the addressing scheme described in Section 14.2.1. Apart from the simple read and write functions, a set of CIP Common Services (totaling 22, currently described in [4]) have been defined. These CIP Common Services are common in nature, which means that they can be used in all CIP networks and that they are useful for a large variety of objects. Furthermore, there are object-specific Service Codes that may have a different meaning for the same code, depending on the class of object. Finally, there is a possibility to define vendor-specific services according to the requirements of the developer. While this gives a lot of flexibility, the disadvantage of vendor-specific services is that they may not be understood universally. Complete details of the CIP Service Codes can be found in Appendix A of the CIP common Specification [4].
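For illustration, a few of the CIP Common Services can be represented as a C enumeration; the numeric values listed below are the ones commonly documented for these services, but the normative assignments are those in Appendix A of the CIP Specification [4] and should be checked there.

/* A small, illustrative subset of the CIP Common Services. */
enum cip_common_service {
    CIP_GET_ATTRIBUTES_ALL   = 0x01,
    CIP_RESET                = 0x05,
    CIP_GET_ATTRIBUTE_SINGLE = 0x0E,
    CIP_SET_ATTRIBUTE_SINGLE = 0x10
};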
14.2.3 Messaging Protocol CIP is a connection-based protocol. A CIP Connection provides a path between multiple application objects. When a connection is established, the transmissions associated with that connection are assigned a Connection ID (CID) (Figure 14.5). If the connection involves a bidirectional exchange, then two Connection ID values are assigned. The definition and format of the Connection ID is network dependent. For example, the Connection ID for CIP Connections over DeviceNet is based on the CAN Identifier field. Since most messaging on a CIP network is done through connections, a process has been defined to establish such connections between devices that are not connected yet. This is done through the Unconnected Message Manager (UCMM) function, which is responsible for the processing of Unconnected Explicit Requests and Responses. The general method to establish a CIP Connection is by sending a UCMM Forward_Open Service Request Message. While this is the method used on ControlNet and EtherNet/IP (all devices that allow Connected Messaging support it), it is rarely used on DeviceNet so far. For DeviceNet, the simplified methods described in Sections 14.3.1.11 and 14.3.1.12 are typically used. DeviceNet Safety™* (see Section 14.5.2), on the other hand, fully utilizes this service. A Forward_Open request contains all information required to create a connection between the originator and the target device and, if requested, a second connection between the target and the originator. In particular, the Forward_Open request contains information on the following:
• Time-out information for this connection
• Network Connection ID for the connection from the originator to the target
• Network Connection ID for the connection from the target to the originator
• Information on the identity of the originator (Vendor ID and Serial Number)
• (Maximum) data sizes of the messages on this connection
FIGURE 14.5 Connections and Connection IDs.
*DeviceNet Safety™ is a trademark of ODVA.
• Trigger mechanisms, e.g., Cyclic, Change of State (COS) • Connection Path for the application object data in the node The Connection Path may also contain a Routing Segment that allows connections to exist across multiple CIP networks. The Forward_Open request may also contain an electronic key of the target device (Vendor ID, Device Type, Product Code, Revision), as well as configuration information that will be forwarded to the Configuration Assembly of the target device. Some networks, like ControlNet and EtherNet/IP, may also make extensive use of Unconnected Explicit Messaging, while DeviceNet uses Unconnected Messaging only to establish connections. All connections in a CIP network can be divided into I/O Connections and Explicit Messaging Connections: • I/O Connections provide dedicated, special-purpose communication paths between a producing application and one or more consuming applications. Application-specific I/O data move through these ports and are often referred to as Implicit Messaging. These messages are typically multicast. • Explicit Messaging Connections provide generic, multipurpose communication paths between two devices. These connections are often referred to as just Messaging Connections. Explicit Messages provide the typical Request/Response-oriented network communications. These messages are typically point-to-point. The actual data transmitted in CIP I/O Messages are the I/O data in an appropriate format — it may be prepended by a Sequence Count value. This Sequence Count value can be used to distinguish old data from new, e.g., if a message has been re-sent as a heartbeat in a COS Connection. The two states Run and Idle can be indicated with an I/O Message either by prepending a Run/Idle header, used for ControlNet and EtherNet/IP, or by sending I/O data (Run) or no I/O data (Idle), mainly used for DeviceNet. Run is the normal operative state of a device; the reaction to receiving an Idle event is vendor-specific and application-specific. Typically, this means bringing all outputs of the device to an Idle state, and that typically means “off,” i.e., de-energized. Explicit Messaging requests, on the other hand, contain a Service Code with path information to the desired object (attribute) within the target device followed by data (if any). The associated responses repeat the Service Code followed by status fields followed by data (if any). DeviceNet uses a condensed format for Explicit Messages, while ControlNet and EtherNet/IP use the full format. More details of the messaging protocol can be found in Chapter 2 of the CIP Specification [4].
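The following C sketch shows how an Explicit Messaging request in the full format could be assembled: the Service Code is followed by a logical path addressing class, instance, and attribute, and then by any service data. The 8-bit logical segment codes used here (0x20 for class, 0x24 for instance, 0x30 for attribute) are the commonly documented ones; buffer handling is simplified and the helper name is hypothetical. A response would echo the Service Code with the reply bit set, followed by status fields and data, as described above.

#include <stdint.h>
#include <stddef.h>

/* Build a full-format explicit request: service code, request path size
 * (in 16-bit words), logical path segments, and (optionally) data.
 * Segment codes 0x20/0x24/0x30 are the commonly documented 8-bit logical
 * class/instance/attribute segments; treat them as assumptions here. */
static size_t build_explicit_request(uint8_t *buf, uint8_t service,
                                     uint8_t class_id, uint8_t instance_id,
                                     uint8_t attribute_id)
{
    size_t n = 0;
    buf[n++] = service;
    buf[n++] = 3;                               /* path size: 3 words      */
    buf[n++] = 0x20; buf[n++] = class_id;       /* logical class segment   */
    buf[n++] = 0x24; buf[n++] = instance_id;    /* logical instance segment*/
    buf[n++] = 0x30; buf[n++] = attribute_id;   /* logical attribute segment */
    return n;
}

/* Example usage: a Get_Attribute_Single (0x0E) request for the Identity
 * Object, instance 1, attribute 1 (Vendor ID):
 *   uint8_t req[16];
 *   size_t len = build_explicit_request(req, 0x0E, 0x01, 0x01, 0x01);   */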
14.2.4 Communication Objects The CIP communication objects manage and provide the runtime exchange of messages. While these objects follow the overall principles and guidelines for CIP objects, the communication objects are unique in a way since they are the focal point for all CIP communication. It therefore makes sense to have a look at them in more detail. Every instance of a communication object contains a link producer part or a link consumer part, or both. I/O Connections may be either producing or consuming or producing and consuming, while Explicit Messaging Connections are always producing and consuming. Figure 14.6 and Figure 14.7 show the typical connection arrangement for CIP I/O Messaging and CIP Explicit Messaging. The Connection Objects hold a set of attributes that describe vital parameters of the connection. First of all, they state what kind of connection this is. They specify whether this is an I/O Connection or an Explicit Messaging Connection, but also the maximum size of the data to be exchanged across this connection, and the source and sink of this data. Note that Explicit Messages are always directed to the Message Router Object. Further attributes define the state of this connection and what kind of behavior this connection is to show. Of particular importance is how messages are triggered (from the application, through Change of State or Change of Data, through Cyclic events or network events) and the timing of the connections
(time-out associated with this connection and predefined action if a time-out occurs). CIP allows multiple connections to coexist in a device, although simple devices, e.g., simple DeviceNet slaves, will typically only have one or two connections alive at any given point in time. Complete details of the communication objects can be found in Chapter 3 of the CIP Specification [4].
FIGURE 14.6 CIP I/O Multicast Connection.
FIGURE 14.7 CIP Explicit Messaging Connection.
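To illustrate the kind of parameters held by a Connection Object, the C fragment below collects a few of them (data sizes, trigger mode, and time-out behavior) into a structure. The enumerators are descriptive names chosen for this sketch and do not correspond to the encoded values defined in the CIP Specification.

#include <stdint.h>

/* Illustrative subset of connection parameters discussed above. */
enum trigger_mode   { TRIGGER_CYCLIC, TRIGGER_CHANGE_OF_STATE, TRIGGER_APPLICATION };
enum timeout_action { TIMEOUT_TRANSITION_TO_TIMED_OUT, TIMEOUT_AUTO_DELETE, TIMEOUT_AUTO_RESET };

typedef struct {
    uint16_t produced_size;           /* max bytes produced on this connection */
    uint16_t consumed_size;           /* max bytes consumed on this connection */
    uint32_t expected_packet_rate_ms; /* basis for the connection watchdog     */
    enum trigger_mode   trigger;
    enum timeout_action on_timeout;
} connection_params_t;

/* Example: a cyclic I/O connection producing 8 bytes every 10 ms. */
static const connection_params_t io_example = {
    .produced_size = 8, .consumed_size = 0,
    .expected_packet_rate_ms = 10,
    .trigger = TRIGGER_CYCLIC,
    .on_timeout = TIMEOUT_TRANSITION_TO_TIMED_OUT
};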
14.2.5 Object Library The CIP family of protocols contains a very large collection of commonly defined objects (currently 48 object classes). The overall set of object classes can be subdivided into three types: • General-use objects • Application-specific objects • Network-specific objects Apart from the objects that are network-specific, all other objects are used in all three CIP network types. Figure 14.8 shows the general-use objects, Figure 14.9 shows a group of application-specific objects, and Figure 14.10 shows a group of network-specific objects. New objects are added on an ongoing basis. The general-use objects can be found in many different devices, while the application-specific objects are typically only found in devices hosting such applications.
© 2005 by CRC Press
• Identity Object, see Section 14.2.5.1
• Parameter Object, see Section 14.2.5.2
• Message Router Object
• Parameter Group Object
• Assembly Object, see Section 14.2.5.3
• Acknowledge Handler Object
• Connection Object, see Section 14.2.4
• Connection Configuration Object
• Connection Manager Object, see Section 14.2.4
• Port Object
• Register Object
• Selection Object
• File Object
FIGURE 14.8 General-use objects.
FIGURE 14.9 Application-specific objects:
• Discrete Input Point Object
• Sequencer Object
• Discrete Output Point Object
• Command Block Object
• Analog Input Point Object
• Motor Data Object
• Analog Output Point Object
• Control Supervisor Object
• Presence Sensing Object
• AC/DC Drive Object
• Group Object
• Overload Object
• Discrete Input Group Object
• Softstart Object
• Discrete Output Group Object
• S-Device Supervisor Object
• Discrete Group Object
• S-Analog Sensor Object
• Analog Input Group Object
• S-Analog Actor Object
• Analog Output Group Object
• S-Single Stage Controller Object
• Analog Group Object
• S-Gas Calibration Object
• Position Sensor Object
• Trip Point Object
• Position Controller Supervisor Object
• S-Partial Pressure Object
• Position Controller Object
FIGURE 14.10 Network-specific objects:
• DeviceNet Object, see Section 14.3.1.4.1
• ControlNet Object, see Section 14.3.2.4.1
• ControlNet Keeper Object, see Section 14.3.2.4.2
• ControlNet Scheduling Object, see Section 14.3.2.4.3
• TCP/IP Interface Object, see Section 14.3.3.5.1
• Ethernet Link Object, see Section 14.3.3.5.2
FIGURE 14.11 Typical device object model. (Diagram: the Application Object(s) sit on top of the required objects, which are the Identity Object, the Message Router, the I/O and Explicit Messaging Connection(s), and a Network Link object for DeviceNet, ControlNet, or Ethernet; optional objects such as the Parameter and Assembly Objects complement them. All of these attach to the CIP Network.)
This looks like a large number of object types, but typical devices only implement a subset of these objects. Figure 14.11 shows the object model of such a typical device. The objects required in a typical device are (a sketch of such a minimal object set follows the list):
• Either a Connection Object or a Connection Manager Object
• An Identity Object
• One or several network link-related objects (depends on network)
• A Message Router Object (at least its function)
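As a rough illustration only, the following C sketch lists such a minimal object set for a small DeviceNet slave. The Identity (0x01), DeviceNet (0x03), Assembly (0x04), Connection (0x05), and Parameter (0x0F) class codes are quoted elsewhere in this chapter; the Message Router class code shown (0x02) is an assumption, and the lookup helper is purely hypothetical.

#include <stdint.h>
#include <stddef.h>

/* Illustrative object registry for a minimal DeviceNet slave. */
typedef struct {
    uint16_t class_id;
    const char *name;
} cip_class_entry_t;

static const cip_class_entry_t required_classes[] = {
    { 0x01, "Identity Object" },
    { 0x02, "Message Router Object (function only)" }, /* assumed class code */
    { 0x03, "DeviceNet Object (network link object)" },
    { 0x05, "Connection Object(s)" },
};

static const cip_class_entry_t optional_classes[] = {
    { 0x04, "Assembly Object" },
    { 0x0F, "Parameter Object" },
};

/* Return a printable name for a class ID, or NULL if this device
 * does not implement that class. */
const char *lookup_class(uint16_t class_id)
{
    for (size_t i = 0; i < sizeof required_classes / sizeof required_classes[0]; i++)
        if (required_classes[i].class_id == class_id)
            return required_classes[i].name;
    for (size_t i = 0; i < sizeof optional_classes / sizeof optional_classes[0]; i++)
        if (optional_classes[i].class_id == class_id)
            return optional_classes[i].name;
    return NULL;
}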
Further objects are added according to the functionality of the device. This allows very good scalability of implementations so that small devices such as a proximity sensor on DeviceNet are not burdened with unnecessary overhead. Developers typically use publicly defined objects (see above list), but can also create their own objects in the vendor-specific areas, e.g., Class IDs 100 to 199. However, it is strongly encouraged to work with the Special Interest Groups (SIGs) of ODVA and ControlNet International to create common definitions for further objects instead of inventing private ones. Out of the general-use objects, several will be described in more detail below. 14.2.5.1 Identity Object (Class Code 0x01) This object is described in more detail for two reasons: (1) being a relatively simple object, it can easily be used to show the general principles, and furthermore, (2) every device must have an Identity Object. Therefore, it is of general interest in this context. The vast majority of devices only support one instance of the Identity Object. Thus, there are typically no requirements for any class attributes that would describe further class details, e.g., how many instances exist in the device; only instance attributes are required in most cases. There are mandatory attributes (Figure 14.12) and optional attributes (Figure 14.13). • The Vendor ID attribute allows an identification of the vendor of every device. This UINT (Unsigned Integer) value (for Data Type descriptions, see Section 14.2.9) is assigned to a specific vendor by ODVA or ControlNet International. If a vendor intends to build products for more than one CIP network, he will get the same Vendor ID for all networks. • The Device Type specifies which profile has been used for this device. It must be one of the Device Types described in Chapter 6 of the CIP Specification [4] or a vendor-specific type (see Section 14.2.6). • The Product Code is a UINT number defined by the vendor of the device. This is used to distinguish multiple products of the same Device Type from the same vendor.
FIGURE 14.12 Mandatory attributes:
• Vendor ID
• Device Type
• Product Code
• Revision
• Status
• Serial Number
• Product Name
FIGURE 14.13 Optional attributes:
• State
• Configuration Consistency Value
• Heartbeat Interval
• Languages Supported
• The Revision is split into two USINT (Unsigned Short Integer) values specifying a Major Revision and a Minor Revision. Any change of the device that results in a modified behavior of the device on the network must be reflected in a change of at least the Minor Revision. Any change in the device that needs a revised Electronic Data Sheet (EDS; see Section 14.2.7) must be reflected in a change of the Major Revision. Vendor ID, Device Type, Product Code, and Major Revision allow an unambiguous identification of an EDS for this device. • The Status attribute provides information on the status of the device, e.g., whether it is owned (controlled by another device), whether it is configured (to something different from the out-of-the-box default), and whether any major or minor faults have occurred. • The Serial Number is used to uniquely identify individual devices in conjunction with the Vendor ID; i.e., no two CIP devices of a vendor may carry the same Serial Number. The 32 bits of the Serial Number allow ample space for a subdivision into number ranges that could be used by different divisions of larger companies. • The Product Name attribute allows the vendor to give a meaningful ASCII name string (up to 32 characters) to the device. • The State attribute describes the state of a device in a single UINT value; it is thus less detailed than the Status attribute. • The Configuration Consistency Value allows a distinction between a configured and an unconfigured device or between different configurations in a device. This helps avoid unnecessary configuration downloads. • The Heartbeat Interval allows enabling of the Device Heartbeat Message and setting the maximum time between two heartbeats to a value between 1 and 255 s. The services supported by the class and instance attributes are either Get_Attribute_Single (typically implemented in DeviceNet devices) or Get_Attributes_All (typically implemented in ControlNet and EtherNet/IP devices). None of the attributes is settable, except for the Heartbeat Interval (if implemented). The only other service that is typically supported by the Identity Object is the reset service. The behavior of the Identity Object is described through a state transition diagram. This and further details of the Identity Object can be found in Chapter 5 of the CIP Specification [4]. 14.2.5.2 Parameter Object (Class Code 0x0F) This object is described in some detail since its concept is referred to in Section 14.2.7, "Configuration and Electronic Data Sheets." This object, when used, comes in two "flavors": a complete object and an abbreviated version (Parameter Object Stub). This abbreviated version is mainly used by DeviceNet
FIGURE 14.14 Parameter Object Stub attributes:
• Parameter Value: This is the actual parameter.
• Link Path Size, Link Path: These two attributes contain information on what application object/instance/attribute the parameter value is retrieved from.
• Descriptor: This describes parameter properties, e.g., read-only, monitor parameter, etc.
• Data Type: This must be one of the Data Types described in Chapter C-6.1 of the CIP Specification, see Section 14.2.9.
• Data Size: Data size in bytes.
devices that only have small amounts of memory available. The Object Stub in conjunction with the Electronic Data Sheet has more or less the same functionality as the full object (see Section 14.2.7). The purpose of this object is to provide a general means to allow access to many attributes of the various objects in the device without a simple tool (such as a handheld terminal) having to know anything about the specific objects in the device. The class attributes of the Parameter Object contain information on how many instances exist in this device and a Class Descriptor indicating, among other properties, whether a full or stub version is supported. Furthermore, they tell whether a Configuration Assembly is used and what language is used in the Parameter Object. Of the instance attributes, the first six are those required for the Object Stub. These are listed in Figure 14.14. These six attributes already allow access, interpretation, and modification of the parameter value, but the remaining attributes make life a lot better: • The next three attributes provide ASCII strings with the name of the parameter, its engineering units, and an associated help text. • Another three attributes contain the minimum, maximum, and default values of the parameter. • The next four attributes that follow allow scaling of the parameter value so that the parameter can be displayed in a more meaningful way, e.g., raw value in multiples of 10 mA, scaled value displayed in amps. • Another four attributes follow that can link the scaling values to other parameters. This feature allows variable scaling of parameters, e.g., percentage scaling to a full range value that is set by another parameter. • Attribute 21 defines how many decimal places are to be displayed if the parameter value is scaled. • Finally, the last three attributes are an international language version of the parameter name, its engineering units, and the associated help text. 14.2.5.3 Assembly Object (Class Code 0x04) Using the Assembly Object gives the option of mapping data from attributes of different instances of various classes into one single attribute (attribute 3) of an instance of the Assembly Object. This mapping is generally used for I/O Messages to maximize the efficiency of the control data exchange on the network. Due to the Assembly mapping, the I/O data are available in one block; thus, there are fewer Connection Object instances and fewer transmissions on the network. The process data are normally combined from different application objects. An Assembly Object can also be used to configure a device with a single data block, rather than having to set individual parameters. CIP makes a distinction between Input and Output Assemblies. Input and output in this context are viewed from the network. An Input Assembly reads data from the application and produces it on the network. An Output Assembly consumes data from the network and writes the data to the application. This data mapping is very flexible; even mapping of individual bits is permitted. Assemblies can also be
FIGURE 14.15 Example of an Assembly mapping.
used to transmit a complete set of configurable parameters instead of accessing them individually. These Assemblies are called Configuration Assemblies. Figure 14.15 shows an example of an Assembly mapping. The data from application objects 100 and 101 are mapped in two instances of the Assembly Object. Instance 1 is set up as an Input Assembly for the input data and instance 2 as an Output Assembly for output data. The data block is always accessed via attribute 3 of the relevant Assembly instance. Attributes 1 and 2 contain mapping information. The I/O Assembly mapping is specified for certain Device Profiles (e.g., Motor Starters) by the ODVA. Device developers can then choose which Assemblies they support in their products. If none of the publicly defined Assemblies fully represent the functionality of the product, a device vendor may implement additional vendor-specific Assemblies (Instance IDs 100 to 199). CIP defines static and dynamic Assembly Objects. Whereas mapping for static Assemblies is permanently programmed in the device (ROM), it can be modified and extended for dynamic mapping (RAM). Most simple CIP devices support only static Assembly Objects. Dynamic Assembly Objects tend to be used in more complex devices.
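The mapping idea can be sketched in a few lines of C. The two application objects, the Assembly layout, and the helper function below are all hypothetical; they merely illustrate how data from several application objects end up in one attribute 3 buffer that is then produced on a single I/O Connection. The sketch assumes the usual CIP byte order (least significant byte first).

#include <stdint.h>

/* Illustrative application data of two hypothetical application objects,
 * loosely following the idea of Figure 14.15. */
typedef struct { uint8_t inputs; uint8_t status; } app_object_100_t;
typedef struct { uint16_t measured_value; }        app_object_101_t;

/* Attribute 3 of an Input Assembly instance: the block of I/O data that is
 * produced on the network in one piece. The layout shown here is a
 * vendor-defined example, not a publicly defined Assembly. */
typedef struct {
    uint8_t data[4];
    uint8_t length;
} assembly_attr3_t;

/* Copy the mapped application data into the Assembly buffer. A real device
 * would do this whenever the Input Assembly is produced on an I/O Connection. */
void build_input_assembly(assembly_attr3_t *a3,
                          const app_object_100_t *o100,
                          const app_object_101_t *o101)
{
    a3->data[0] = o100->inputs;
    a3->data[1] = o100->status;
    a3->data[2] = (uint8_t)(o101->measured_value & 0xFF);        /* low byte  */
    a3->data[3] = (uint8_t)((o101->measured_value >> 8) & 0xFF); /* high byte */
    a3->length  = 4;
}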
FIGURE 14.16 CIP Device Types (profile numbers in parentheses):
• Generic Device (0x00)
• AC Drives (0x02)
• Motor Overload (0x03)
• Limit Switch (0x04)
• Inductive Proximity Switch (0x05)
• Photoelectric Sensor (0x06)
• General Purpose Discrete I/O (0x07)
• Resolver (0x09)
• Communications Adapter (0x0C)
• ControlNet Programmable Logic Controller (0x0E)
• Position Controller (0x10)
• DC Drives (0x13)
• Contactor (0x15)
• Motor Starter (0x16)
• Soft Starter (0x17)
• Human Machine Interface (0x18)
• Mass Flow Controller (0x1A)
• Pneumatic Valve(s) (0x1B)
• Vacuum/Pressure Gauge (0x1C)
• Process Control Valve (0x1D)
• Residual Gas Analyzer (0x1E)
• DC Power Generator (0x1F)
• RF Power Generator (0x20)
• Turbomolecular Vacuum Pump (0x21)
• ControlNet Physical Layer (0x32) (this is not a "normal" profile; it does not contain any objects)
14.2.6 Device Profiles It would be possible to design products using only the definitions of communication links and objects, but this could easily result in similar products having quite different data structures and behavior. To overcome this situation and to make the application of CIP devices much easier, devices of similar functionality have been grouped into Device Types with associated profiles. Such a CIP profile contains the full description of the object structure and behavior. Figure 14.16 shows the Device Types and associated profiles that have been defined so far in the CIP Specification [4] (profile numbers are in parentheses). Device developers must use a profile. Any device that does not fall into the scope of one of the specialized profiles must use the Generic Device profile or a vendor-specific profile. What profile is used and which parts of it are implemented must be described in the user documentation of the device. Every profile consists of a set of objects, some required, some optional, and a behavior associated with that particular type of device. Most profiles also define one or several I/O data formats (Assemblies) that define the meaning of the individual bits and bytes of the I/O data. In addition to the publicly defined object set and I/O data Assemblies, vendors can add objects and Assemblies of their own if they have devices that have additional functionality. If that is still not appropriate, vendors can create profiles of their own within the vendor-specific profile range. They are then free to define whatever behavior and objects are required for their device as long as they stick to some general rules for profiles. Whenever additional functionality is used by multiple vendors, ODVA and ControlNet International encourage coordinating these new features through discussion in the Special Interest Groups (SIGs). They can then create new profiles and additions to the existing profiles for everybody’s use and the benefit of the device users. All open (ODVA/CI defined) profiles carry numbers in the 0x00 through 0x63 or 0x0100 through 0x02FF ranges, while vendor-specific profiles carry numbers in the 0x64 through 0xC7 or 0x0300 through 0x02FF ranges. All other profile numbers are reserved by CIP. Complete details of the CIP profiles can be found in Chapter 6 of the CIP Specification [4].
14.2.7 Configuration and Electronic Data Sheets CIP has made provisions for several options to configure devices:
• A printed data sheet
• Parameter Objects and Parameter Object Stubs
• An Electronic Data Sheet (EDS)
• A combination of an EDS and Parameter Object Stubs
• A Configuration Assembly combined with any of the above methods
When using configuration information collected on a printed data sheet, configuration tools can only provide prompts for service, class, instance, and attribute data and relay this information to a device. While this procedure can do the job, it is the least desirable solution since it does not determine the context, content, or format of the data. Parameter Objects, on the other hand, provide a full description of all configurable data of a device. This allows a configuration tool to gain access to all parameters and maintain a user-friendly interface since the device itself provides all the necessary information. However, this method imposes a burden on a device with full parameter information, which may be excessive for a small device, e.g., a simple DeviceNet slave. Therefore, an abbreviated version of the Parameter Object, called Parameter Object Stub, may be used (see Section 14.2.5.2). This still allows access to the parameter data, but it does not describe any meaning of this data. Parameter Stubs in conjunction with a printed data sheet are usable, but certainly not optimal. On the other hand, an EDS supplies all the information that a full Parameter Object contains in addition to I/O Connection information. The EDS thus provides the full functionality and ease of use of the Parameter Object without imposing an excessive burden on the individual devices. Another value of the EDS is that it provides a means for tools to do offline configuration and download the configuration data to the device at a later point in time. An EDS is a simple ASCII text file that can be generated on any ASCII editor. The CIP Specification lays down a set of rules for the overall design and syntax of an EDS. The main purpose of the EDS is to give information on several aspects of the device’s capabilities, the most important ones being the I/O Connections it supports and what parameters for display or configuration exist within the device. It is highly recommended that all supported I/O Connections are described within the EDS; this makes the application of a device much easier. When it comes to parameters, it is up to the developer to decide which items to make accessible to the user. Let us look at some details of the EDS: EDSs are structured into sections. Every section starts with a section name in square brackets []. The first two sections are mandatory for all EDSs. 
• [File]: Describes the contents and revision of the file • [Device]: Is equivalent to the Identity Object information and is used to match an EDS to a device • [Device Classification]: Describes what network the device can be connected to, optional for DeviceNet, required for ControlNet and EtherNet/IP • [IO_Info]: Describes connection methods and I/O sizes, DeviceNet only • [Variant_IO_Info]: Describes multiple I/O_Info data sets, DeviceNet only • [ParamClass]: Describes class-level attributes of the Parameter Objects • [Params]: Identifies all configuration parameters in the device; follows the Parameter Object definition, further details below • [EnumPar]: Enumeration list of parameter choices to present to the user; old method specified for DeviceNet only • [Assembly]: Describes the structure of data items • [Groups]: Identifies all parameter groups in the device and lists group name and Parameter Object instance numbers • [Connection Manager]: Describes connections supported by the device, typically used in ControlNet and EtherNet/IP • [Port]: Describes the various network ports a device may have • [Modular]: Describes modular structures inside a device • [Capacity]: Brand new EDS section to specify communication capacity of EtherNet/IP and ControlNet devices
With these sections, a very detailed description of a device can be made. Only a few of these details are described here, and further reading is available in [25] and [26]. A tool with a collection of EDSs will first use the device section to try to match an EDS with each device it finds on a network. Once this is done and a particular device is chosen, the tool can then display device properties and parameters and allow their modification (if necessary). A tool may also display what I/O Connections a device may allow and which of these are already in use. EDS-based tools are mainly used for slave or adapter devices; scanner devices are typically too complex to be configured through EDSs. For those devices, the EDS is mainly used to identify the device and then guide the tool to call a matching configuration applet. A particular strength of the EDS approach lies in the methodology of parameter configuration. A configuration tool typically takes all the information supplied by the Parameter Object and an EDS and displays it in a user-friendly manner. This enables the user to configure a device in many cases without the need of a detailed manual; the tool presentation of the parameter information together with help texts allows one to make the right decisions for a complete device configuration, provided, of course, that the developer has supplied all information required. A complete description of what can be done with EDSs goes well beyond the scope of this handbook. For further details, consult [25], [26], and Chapter 7 of the CIP Specification [4].
14.2.8 Bridging and Routing CIP has defined mechanisms that allow the transmission of messages across multiple networks, provided the bridging devices (routers) between the various networks are equipped with a suitable set of capabilities (objects and support of services). Once this is the case, the message will be forwarded from router to router until it has reached its destination node. Here is how it works: For Unconnected Explicit Messaging, the actual Explicit Message to be executed on the target device is wrapped up using another type of Explicit Message service, the so-called Unconnected_Send Message. This Unconnected_Send Message (Service Code 0x52 of the Connection Manager Object) contains complete information on the transport mechanism, in particular time-outs (they may be different while the message is still en route) and path information. The first router device that receives an Unconnected_Send Message will take its contents and forward them to the next network as specified within the path section of the message. Before the message is actually sent, the used part of the path is removed, but remembered by the intermediate router device for the return of any response. This process is executed for every hop until the final destination network is reached. The number of hops is theoretically limited by the message length. Once the Unconnected_Send Message has arrived at the target network, the inner Explicit Message is then sent to the target device, which executes the requested service and generates a response. This response is then routed back through all the routers it has gone through during its forward journey until it has finally reached the originating node. It is important to note in this context that the transport mechanism may have been successful in forwarding the message and returning the response, but the response could still contain an indication that the desired service could not be performed successfully in the target network/device. Through this mechanism, the router devices do not need to know anything about the message paths ahead of time. Thus, no programming of any of the router devices is required. This is often referred to as seamless routing. When a connection (I/O or Explicit) is set up using the Forward_Open service (see Section 14.3.2.10), it may go to a target device on another network. To enable the appropriate setup process, the Forward_Open Message may contain a field with path information describing a route to the target device. This is very similar to the Unconnected_Send service described above. This routing information is then used to create routed connections within the routing devices between the originator and the target of the message. Once set up, these connections automatically forward any incoming messages for this connection to the outgoing port en route to the target device. Again, this is repeated until the message has reached its target node. As with routed Unconnected Explicit Messages, the number of hops is generally limited
FIGURE 14.17 Logical Segment encoding example.
only by the capability of the devices involved in this process. In contrast to routed Unconnected Messages, routed Connected Messages do not carry path information. Since Connected Messages always use the same path for any given connection, the path information that was given to the routing devices during connection setup is held there as long as the connection exists. Again, the routing devices do not have to be preprogrammed; they are self-configured during the connection establishment process.
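The hop-by-hop handling of the path can be pictured with a small sketch. The structures below are purely illustrative and do not reproduce the on-the-wire encoding of the Unconnected_Send service or of Port Segments; they only show the idea of a router consuming the first hop of the remaining path and remembering it for the response.

#include <stdint.h>

/* One hop of a route: which local port to leave through and the link
 * address (e.g., the MAC ID) of the next device on that network.
 * This is an illustrative structure, not the on-the-wire Port Segment format. */
typedef struct {
    uint8_t port;
    uint8_t link_address;
} route_hop_t;

typedef struct {
    route_hop_t hops[8];  /* remaining path to the target         */
    uint8_t     count;    /* number of hops still to be traversed */
} route_path_t;

/* A router receiving an Unconnected_Send removes the first hop (the part of
 * the path it uses itself), remembers it for returning the response, and
 * forwards the embedded request with the shortened path. Returns 0 when no
 * hops remain, i.e., the target network has been reached. */
int consume_next_hop(route_path_t *path, route_hop_t *used)
{
    if (path->count == 0)
        return 0;                         /* destination network reached */
    *used = path->hops[0];                /* remembered for the response */
    for (uint8_t i = 1; i < path->count; i++)
        path->hops[i - 1] = path->hops[i];
    path->count--;
    return 1;                             /* forward to the next network */
}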
14.2.9 Data Management The Data Management part of the CIP Specification describes addressing models for CIP entities and the data structure of the entities themselves. The entity addressing is done by so-called Segments, a method that allows very flexible usage so that many different types of addressing methods can be accommodated. The first byte of a CIP Segment allows a distinction between a Segment Address (0x00 to 0x9F) and a Data Type description (0xA0 to 0xDF). Two uses of this addressing scheme (Logical Segments and Data Types) are looked at in a little more detail here; all of them are described in Appendix C of the CIP Specification [4]. 14.2.9.1 Logical Segments Logical Segments (first byte = 0x20 to 0x3F) are addressing segments that can be used to address objects and their attributes within a device. They are typically structured into [Class ID] [Instance ID] [Attribute ID, if required]. Each element of this structure allows various formats (1, 2, and 4 bytes). Figure 14.17 shows a typical example of this addressing method. This type of addressing is commonly used to point to assemblies, parameters, or any other addressable attribute within a device. It is extensively used in EDSs, but also within Unconnected Messages, to name just a few application areas. A complete list of all Segment types and their encoding can be found in Appendix C of the CIP Specification [4]. 14.2.9.2 Data Types Data Types (first byte = 0xA0 to 0xDF) can be either structured (first byte = 0xA0 to 0xA3) or elementary (first and only byte = 0xC1 to 0xDE) Data Types. Structured Data Types can be arrays of elementary Data Types or any assembly of arrays or elementary Data Types. Of particular importance in the context of this handbook are the elementary Data Types. They are used within EDSs to specify the Data Types of parameters and other entities. Here is a list of commonly used Data Types: • 1 bit (encoded into 1 byte): • Boolean, BOOL, Type Code 0xC1 • 1 byte: • Bit string, 8 bits, BYTE, Type Code 0xD1 • Unsigned 8-bit integer, USINT, Type Code 0xC6 • Signed 8-bit integer, SINT, Type Code 0xC2 • 2 bytes: • Bit string, 16 bits, WORD, Type Code 0xD2
• Unsigned 16-bit integer, UINT, Type Code 0xC7 • Signed 16-bit integer, INT, Type Code 0xC3 • 4 bytes: • Bit string, 32 bits, DWORD, Type Code 0xD3 • Unsigned 32-bit integer, UDINT, Type Code 0xC8 • Signed 32-bit integer, DINT, Type Code 0xC4 The Data Types in CIP follow the requirements of IEC 61131-3 [9]. A complete list of all Data Types and their encodings can be found in Appendix C of the CIP Specification [4]. 14.2.9.3 Maintenance and Further Development of the Specifications Both ODVA and ControlNet International have a set of working groups that have the task of maintaining the specifications and creating protocol extensions, e.g., new profiles or functional enhancements such as CIP Sync and CIP Safety. These groups are called Special Interest Groups (SIGs) for DeviceNet and ControlNet and Joint Special Interest Groups (JSIGs) for EtherNet/IP. JSIGs are called “joint” since it is a combination of ODVA and ControlNet International members that do the work, since the EtherNet/IP technology is jointly administered by both groups. The results of these SIGs are written up as DSEs (DeviceNet Specification Enhancements), CSEs (ControlNet Specification Enhancements), ESEs (EtherNet/IP Specification Enhancements), or CIPSEs (CIP Specification Enhancements), presented to the Technical Review Board (TRB) for approval and then incorporated into the specifications. Only ODVA or ControlNet International members can work within the SIGs, and those participants have the advantage of advance knowledge of technical changes. Participation in one or several SIGs is therefore highly recommended.
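As a small illustration of the Logical Segment addressing of Section 14.2.9.1 and the elementary Data Types listed above, the following C sketch builds a class/instance/attribute path using the common 8-bit Logical Segment forms (0x20 for Class ID, 0x24 for Instance ID, 0x30 for Attribute ID). These segment type bytes are the usual 8-bit encodings; the authoritative definitions, including the 16- and 32-bit variants, are in Appendix C of the CIP Specification [4].

#include <stdint.h>
#include <stddef.h>

/* Build a Logical Segment path of the form [Class ID][Instance ID][Attribute ID]
 * using the common 8-bit segment formats. Returns the number of bytes written.
 * IDs above 255 would need the 16-bit segment forms, which this sketch omits. */
size_t encode_path_8bit(uint8_t *out, uint8_t class_id,
                        uint8_t instance_id, uint8_t attribute_id,
                        int include_attribute)
{
    size_t n = 0;
    out[n++] = 0x20;            /* Logical Segment: Class ID, 8-bit format     */
    out[n++] = class_id;
    out[n++] = 0x24;            /* Logical Segment: Instance ID, 8-bit format  */
    out[n++] = instance_id;
    if (include_attribute) {
        out[n++] = 0x30;        /* Logical Segment: Attribute ID, 8-bit format */
        out[n++] = attribute_id;
    }
    return n;
}

/* Example: the path to attribute 7 (Product Name) of Identity Object
 * instance 1 is encoded as 20 01 24 01 30 07 (hexadecimal). */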
14.3 Network Adaptations of CIP Up to now there are three public derivatives of CIP. These three derivatives are based on quite different data link layers and transport mechanisms, but they maintain the principles of CIP.
14.3.1 DeviceNet DeviceNet was the first public implementation of CIP. As already mentioned in Section 14.2, DeviceNet is based on the Controller Area Network (CAN). The adaptations of CIP are done to accommodate certain limitations of the CAN protocol (up to 8 bytes payload only) and to allow for a simple device with only minimal processing power; for a more detailed description of the CAN protocol and some of its applications, see [10]. DeviceNet uses a subset of the CAN protocol (11-bit identifier only, no remote frames). Figure 14.18 shows the relationship between CIP, DeviceNet, and the ISO/OSI layer model. 14.3.1.1 Physical Layer and Relationship to CAN The physical layer of DeviceNet is an extension of ISO 11898 [11]. This extension defines the following additional details:
• Improved transceiver characteristics that allow the support of up to 64 nodes per network
• Additional circuitry for overvoltage and miswiring protection
• Several types of cables for a variety of applications
• Several types of connectors for open (IP20) and enclosed (IP65/67) devices
These extensions result in a communication system with the following physical layer characteristics:
• Trunkline/dropline configuration
• Support for up to 64 nodes
• Node removal without severing the network
• Simultaneous support for both network-powered (sensors) and separately powered (actuators) devices
FIGURE 14.18 Relationship between CIP and DeviceNet. (Diagram, layers according to ISO/OSI: user Device Profiles such as Semiconductor, Pneumatic valves, Position controller, AC Drives, and other profiles sit on top of the CIP Application Layer with its Application Object Library; CIP Data Management Services and CIP Message Routing and Connection Management handle Explicit and I/O Messages; the transport layer is provided by the DeviceNet transport, the ControlNet transport, or the EtherNet/IP encapsulation over TCP/UDP/IP; the data link layers are CAN CSMA/NBA, ControlNet CTDMA, and Ethernet CSMA/CD, each with its own physical layer; possible future alternatives include ATM, USB, and FireWire.)
• Use of sealed or open-style connectors
• Protection from wiring errors
• Selectable data rates of 125, 250, and 500 kBaud
• Adjustable power configuration to meet individual application needs
• High current capability (up to 16 amps per supply)
• Operation with off-the-shelf power supplies
• Power taps that allow the connection of several power supplies from multiple vendors that comply with DeviceNet standards
• Built-in overload protection
• Power available along the bus: both signal and power lines contained in the cable
The cables described in the DeviceNet Specification have been specially designed to meet minimum propagation speed requirements to make sure they can be used up to the maximum system length. Using the specified cables, in conjunction with suitable transceiver circuits, results in overall systems, as specified in Figure 14.19.
FIGURE 14.19 Data rate vs. trunk and drop length:
Data Rate | Trunk Distance, Thick Cable | Trunk Distance, Thin Cable | Trunk Distance, Flat Cable | Drop Length, Maximum | Drop Length, Cumulative
125 kBaud | 500 meters | 100 meters | 420 meters | 6 meters | 156 meters
250 kBaud | 250 meters | 100 meters | 200 meters | 6 meters | 78 meters
500 kBaud | 100 meters | 100 meters | 75 meters | 6 meters | 39 meters
ODVA has issued a guideline [7] that gives complete details on how to build the physical layer of a DeviceNet network. Developers of DeviceNet devices have the choice of creating DeviceNet circuits with or without physical layer isolation; both versions are fully specified. Furthermore, a device may take some or all of its power
from the bus, thus avoiding extra power lines for devices that can live on the power supplied through the DeviceNet cable. All DeviceNet devices must be equipped with one of the connectors described in the DeviceNet Specification. Hard wiring of a device is allowed, provided the node is removable without severing the trunk. 14.3.1.2 Protocol Adaptations On the protocol side, there are basically two adaptations of CIP (apart from the addition of the DeviceNet Object) that have been made to better accommodate it to the CAN data frame: • Limitation to short messages (8 bytes or less) where possible; introduction of fragmentation for longer messages. • Introduction of the Master/Slave communication profile minimizes connection establishment management (see Section 14.3.1.12). These two features have been created to allow the use of simple and thus inexpensive microcontrollers. This is particularly important for small, cost-sensitive devices like photo-eyes or proximity sensors. As a result of this specialization, the DeviceNet protocol in its simplest form has been implemented in 8-bit microprocessors with as little as 4 kbyte of code memory and 175 bytes of RAM. The fragmentation of messages comes in two varieties: For I/O Messages typically sent with a fixed length, the use of fragmentation is defined through the maximum length of data to be transmitted through a connection. Any connection that has more than 8 bytes to transmit always uses the fragmentation protocol, even if the actual data to be transmitted are 8 bytes or less, e.g., an Idle Message. For Explicit Messaging, the use of the fragmentation protocol is indicated with every message, since the actual message will vary in length. The actual fragmentation protocol is contained in one extra byte within the message that indicates whether the fragment is a start, middle, or end fragment. A modulo 64 rolling fragment counter allows very long fragmented messages, in theory limited only by the maximum Produced or Consumed Connection Sizes (65,535 bytes). In reality, it is the capability of the devices that limits the message sizes. 14.3.1.3 Indicators and Switches DeviceNet devices may be built with indicators or without, but it is recommended to incorporate some of the indicators described in the specification. These indicators allow the user to determine the state of the device and its network connection. Devices may have additional indicators with a behavior not described in the specification. However, any indicators that carry names of those described in the specification must also follow their specified behavior. Devices may be built with or without switches or other directly accessible means for configuration. If switches for MAC ID and baud rate exist, then certain rules apply regarding how these values have to be used at power-up and during the operation of the device. 14.3.1.4 Additional Objects The DeviceNet Specification defines one additional object, the DeviceNet Object. 14.3.1.4.1 DeviceNet Object (Class Code 0x03) A DeviceNet Object is required for every DeviceNet port of the device. The instance attributes of this object contain information on how this device uses the DeviceNet port. In particular, there is information about the MAC ID of the device and the (expected) baud rate of the DeviceNet network this device is attached to. 
Both attributes are always expected to be nonvolatile; i.e., after a power interruption, the device is expected to try to go online again with the same values that were stored in these attributes before the power interruption. Devices that set these values through switches typically override any stored values at power-up. 14.3.1.5 Network Access DeviceNet uses the network access mechanisms described in the CAN specification, i.e., bitwise arbitration through the CAN Identifier for every frame to be sent. This requires a system design that does not
allow multiple uses of any of these identifiers. Since the node number of every device is coded into the CAN Identifier (see Section 14.3.1.10), it is generally sufficient to make sure that none of the node numbers exist more than once on any given network. This is guaranteed through the Network Access algorithm (see Section 14.3.1.6). 14.3.1.6 Going Online Any device that wants to communicate on DeviceNet must go through a Network Access algorithm before any communication is allowed. The main purpose of this process is to avoid duplicate Node IDs on the same network. Every device that is ready to go online sends a Duplicate MAC ID Check Message containing its Port Number, Vendor ID, and Serial Number. If another device is already online with this MAC ID or is in the process of going online, it responds with a Duplicate MAC ID Response Message that directs the checking device to go offline and not communicate any further. If two or more devices with the same MAC ID should happen to try to go online at exactly the same time, all of them will win arbitration at the same time (same CAN ID) and will proceed with their messages. However, since they will exhibit different values in the data field of the message, all devices on the link will flag Cyclic Redundancy Check (CRC) errors and thus cause a repetition of the message. This may eventually result in a Bus-Off condition for these devices, but a situation with duplicate Node ID is safely avoided. 14.3.1.7 Offline Connection Set The Offline Connection Set is a set of messages that have been created to communicate with devices that have failed to go online (see Section 14.3.1.6), e.g., to allow setting a new MAC ID. Full details of these messages can be found in [5] or [10]. 14.3.1.8 Explicit Messaging All Explicit Messaging in DeviceNet is done via connections and the associated Connection Object instances. However, these objects must first be set up in the device. This can be done by using the Predefined Master/Slave Connection Set to activate a static Connection Object already available in the device or by using the Unconnected Message Manager (UCMM) port of a device, through which this kind of Connection Object can be dynamically set up. The only messages that can be sent to the UCMM are Open or Close requests that set up or tear down a Messaging Connection, while the only messages that can be sent to the Master/Slave equivalent are an Allocate or Release request (see also Section 14.3.1.12). Explicit Messages always pass via the Message Router Object to the individual objects (refer to Figure 14.11). As mentioned in Section 14.2.3, Explicit Messages on DeviceNet have a very compact structure to make them fit into the 8-byte frame in most cases. Figure 14.20 shows a typical example of a request message.
FIGURE 14.20 Format of nonfragmented Explicit Request Message using the 8/8 message body format (1 byte for Class ID, 1 byte for Instance ID):
• Byte 0 (message header): Frag [0] (bit 7), XID (bit 6), MAC ID (bits 5-0)
• Byte 1 (message body): R/R [0] (bit 7), Service Code (bits 6-0)
• Byte 2: Class ID
• Byte 3: Instance ID
• Bytes 4-7: Service data (optional)
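A sketch of how a device might assemble such a request frame is shown below. The helper and its names are illustrative; the service code used in the usage note, 0x0E for Get_Attribute_Single, is the common CIP value but is not quoted in this chapter, so treat it as an assumption here. The CAN Identifier itself is the Connection ID of the Explicit Messaging Connection and is not built by this helper.

#include <stdint.h>

/* Assemble the CAN data field of a nonfragmented Explicit Request using the
 * 8/8 message body format of Figure 14.20. Service data longer than 4 bytes
 * would require the fragmentation protocol and is not handled here. */
typedef struct {
    uint8_t data[8];
    uint8_t dlc;       /* CAN data length code */
} can_payload_t;

can_payload_t build_explicit_request(uint8_t mac_id, uint8_t xid,
                                     uint8_t service, uint8_t class_id,
                                     uint8_t instance_id,
                                     const uint8_t *svc_data, uint8_t svc_len)
{
    can_payload_t p = {{0}, 0};
    p.data[0] = (uint8_t)(((xid & 1u) << 6) | (mac_id & 0x3Fu)); /* Frag = 0 */
    p.data[1] = (uint8_t)(service & 0x7Fu);                      /* R/R = 0  */
    p.data[2] = class_id;
    p.data[3] = instance_id;
    for (uint8_t i = 0; i < svc_len && i < 4; i++)               /* bytes 4..7 */
        p.data[4 + i] = svc_data[i];
    p.dlc = (uint8_t)(4 + (svc_len < 4 ? svc_len : 4));
    return p;
}

/* Usage sketch: a Get_Attribute_Single request (service code 0x0E, assumed)
 * for attribute 1 of Identity Object instance 1:
 *   uint8_t attr = 0x01;
 *   can_payload_t req = build_explicit_request(mac_id, 0, 0x0E, 0x01, 0x01,
 *                                              &attr, 1);
 */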
FIGURE 14.21 Format of a nonfragmented 8/8 Explicit Response Message:
• Byte 0 (message header): Frag [0] (bit 7), XID (bit 6), MAC ID (bits 5-0)
• Byte 1 (message body): R/R [1] (bit 7), Service Code (bits 6-0)
• Bytes 2-7: Service data (optional)
The consumer of this Explicit Message responds in the format shown in Figure 14.21. The consumer sets the R/R (Request/Response) bit and repeats the Service Code of the request message. If data are transferred with the response, this is entered in the service data field. Most messages will use the 8/8 format shown in Figure 14.20, since they only need to address Class and Instance IDs up to 255. If there is a need to address any Class/Instance combinations above 255, then this is negotiated between the two communication partners during the setup of the connection. Should an error occur, the receiver responds with the Error Response Message. The Service Code for this message is 0x14; 2 bytes of error code are returned in the service data field. Further details of the message encoding, including the use of fragmentation, can be found in [5] and [10]. 14.3.1.9 I/O Messaging I/O Messages have a very compact structure; only the naked data are transmitted without the Run/Idle header and Sequence Count value used in ControlNet and EtherNet/IP. For messages up to 8 bytes long, the full CAN data field is used for I/O data. I/O Messages that are longer use 1 byte of the CAN data field for the fragmentation protocol (Figure 14.22 and Figure 14.23). I/O Messages without data (i.e., with zero length data) indicate the Idle state of the producing application. Any producing device can do this — master, slave, or peer.
FIGURE 14.22 Format of a nonfragmented I/O Message, 0 to 8 bytes:
• Bytes 0-7: Process data (0 to 8 bytes)
FIGURE 14.23 Format of the fragmented I/O Message:
• Byte 0: Fragmentation protocol
• Bytes 1-7: Process data (0 to 7 bytes)
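The fragmentation byte mentioned in Section 14.3.1.2 can be sketched as follows. The bit split shown, with the fragment type in the two most significant bits and the modulo-64 rolling counter in the lower six bits, reflects the usual DeviceNet encoding; the normative definition is in the DeviceNet Specification [5].

#include <stdint.h>

/* Fragment types of the DeviceNet fragmentation protocol. */
enum frag_type {
    FRAG_FIRST  = 0,
    FRAG_MIDDLE = 1,
    FRAG_LAST   = 2
};

/* Build the fragmentation byte: fragment type in bits 7-6, modulo-64
 * rolling fragment counter in bits 5-0. */
static inline uint8_t frag_header(enum frag_type type, uint8_t counter)
{
    return (uint8_t)(((uint8_t)type << 6) | (counter & 0x3Fu));
}

/* Example: the third fragment (counter = 2) in the middle of a long I/O
 * transfer would carry the header byte frag_header(FRAG_MIDDLE, 2),
 * followed by up to 7 bytes of process data (Figure 14.23). */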
FIGURE 14.24
I/O Messaging Connections.
As already mentioned, I/O Messages are used to exchange high-priority application and process data via the network, and this communication is based on the Producer/Consumer model. The associated I/O data are always transferred from one producing application object to one or more consuming application objects. This is undertaken using I/O Messages via I/O Messaging Connection Objects (Figure 14.24 shows two consuming applications) that must have been previously set up in the device. This can be done in one of two ways by using: • The Predefined Master/Slave Connection Set to activate a static I/O Connection Object already available in the device • An Explicit Messaging Connection Object already available in the device to dynamically create and set up an appropriate I/O Connection Object I/O Messages usually pass directly to the data of the assigned application object. The Assembly Object is the most common application object used with I/O Connections. Refer to Figure 14.11. 14.3.1.10 Using the CAN Identifier DeviceNet is based on the standard CAN protocol and therefore uses an 11-bit message identifier. A distinction can therefore be made between 211 = 2048 messages. Six bits is sufficient to identify a device because a DeviceNet network is limited to a maximum of 64 participants. This 6-bit Device Identifier (node address) is also called MAC ID. The overall CAN Identifier range is divided into four Message Groups of varying sizes (Figure 14.25). In DeviceNet, the CAN Identifier is the Connection ID. This is composed of the ID of the Message Group, the Message ID within this group, and the MAC ID of the device. The source or destination address is possible as the MAC ID. The definition depends on the Message Group and the Message ID. The significance of the message within the system is defined by the Message Group and Message ID. Connection ID = CAN Identifier (bits 10:0)
FIGURE 14.25 Definition of the Message Groups:
• Message Group 1: bit 10 = 0; bits 9-6 = Message ID; bits 5-0 = Source MAC ID
• Message Group 2: bits 10-9 = 1 0; bits 8-3 = MAC ID; bits 2-0 = Message ID
• Message Group 3: bits 10-9 = 1 1; bits 8-6 = Message ID; bits 5-0 = Source MAC ID
• Message Group 4: bits 10-6 = 1 1 1 1 1; bits 5-0 = Message ID
• Invalid CAN Identifiers: bits 10-4 = 1 1 1 1 1 1 1
The four Message Groups are used as follows: • Message Group 1: Assigned 1024 CAN Identifiers (0x0000 to 0x03FF), 50% of all identifiers available. Up to 16 different Message IDs are available to the user per device (node) within this group. The priority of a message from this group is primarily determined by the Message ID (the significance of the message) and only after that by the source MAC ID (the producing device). If two devices transmit at the same time, then the device with a lower Message ID will always win the arbitration. However, if two devices transmit the same Message ID at the same time on the CAN bus, then the device with the lower node number will win. A 16-stage priority system can be set up relatively easily in this manner. The messages of Group 1 are therefore well suited for the exchange of high-priority process data. • Message Group 2: Assigned 512 identifiers (0x0400 to 0x05FF). Most of the Message IDs of this group are optionally defined for what is commonly referred to as the Predefined Master/Slave Connection Set (see Section 14.3.1.12). One Message ID is defined for network management (Section 14.3.1.6). The priority here is primarily determined by the device address, and only after that by the Message ID. If you consider the bit positions in detail, you will see that a CAN controller with an 8-bit mask is able to filter out its Group 2 Messages based on MAC ID. • Message Group 3: Has 448 CAN Identifiers (0x0600 to 0x07BF) and a structure similar to that of Message Group 1. Unlike this group, however, low-priority process data are mainly exchanged. In addition to this, the main use of this group is also the setting up of dynamic Explicit Connections. Seven Message IDs are possible per device; two of these are reserved for what is commonly referred to as the UCMM port (Section 14.3.1.11). • Message Group 4: Has 48 CAN Identifiers (0x07C0 to 0x07EF) and does not include any device addresses, but only Message IDs. The messages of this group are only used for network management. Four Message IDs are currently assigned for services of the Offline Connection Set. The other 16 CAN Identifiers (0x07F0 to 0x07FF) are invalid CAN IDs and thus not permitted in DeviceNet systems. This type of CAN Identifier issuing system means that unused Connection IDs (CAN Identifiers) cannot be used by other devices. Each device has exactly 16 Message IDs in Group 1, 8 Message IDs in Group 2, and 7 Message IDs in Group 3. One advantage of this system is that the CAN Identifiers used in the network can always be clearly assigned to a device. Devices are responsible for managing their own identifiers. This simplifies not only the design but also troubleshooting and diagnosis in DeviceNet systems. A central tool that keeps a record of all assignments on the network is not needed. 14.3.1.11 Connection Establishment As is described in Sections 14.3.1.8 and 14.3.1.9, messages in DeviceNet are always exchanged in a connection-based manner. Communication objects must be set up for this purpose. These are not initially available when a device is switched on; they first have to be created. The only port by which a DeviceNet device can be addressed when first switched on is the Unconnected Message Manager port (UCMM port) or the Group 2 Only Unconnected Explicit Message port of the Predefined Master/Slave Connection Set. Picture these ports like doors to the device. Only one particular key will fit in each lock. 
The appropriate key to this lock is the Connection ID, i.e., the CAN Identifier of the selected port. Other doors in the device can only be opened once the appropriate key is available and other Connection Objects are set up. The setting up of a link via the UCMM port represents a general procedure to be strived for with all DeviceNet devices. Devices that in addition to having the Predefined Master/Slave Connection Set are also UCMM capable are called Group 2 Servers. A Group 2 Server can be addressed by one or more connections from one or more clients. Since UCMM capable devices need a good amount of processing power to service multiple communication requests, a simplified communication establishment and I/O data exchange method has been created for low-end devices. This is called the Predefined Master/Slave Connection Set (see Section 14.3.1.12). This covers as many as five predefined connections that can be activated (assigned) when
accessing the device. The Predefined Master/Slave Connection Set represents a subset of the general connection establishment method. It is limited to pure Master/Slave relations. Slave devices that are not UCMM capable, and only support this subset, are called Group 2 Only Servers in DeviceNet speak. Only the master that allocates it can address a Group 2 Only Server. All messages received by this device are defined in Message Group 2. More details of the connection establishment using UCMM and the Master/Slave Connection Set can be found in [5] and [10]. 14.3.1.12 Predefined Master/Slave Connection Set Establishing a connection via the UCMM port requires a relatively large number of individual steps that have to be conducted to allow for data exchange via DeviceNet. The devices must provide resources to administer the dynamic connections. Because every device can set up a connection with every other device, and the source MAC ID of the devices is contained in the Connection ID, the CAN Identifier (Connection ID) may have to be filtered via software. This depends on how many connections a device supports, and the type and number of screeners (hardware CAN ID filters) of the CAN chip used in the device’s implementation. While this approach provides for taking full advantage of the multicast, peer-to-peer, and Producer/ Consumer capabilities of CAN, a simpler method that needs fewer CPU resources is needed for low-end devices. To that end, the Predefined Master/Slave Connection Set was defined. The Group 2 Only Unconnected Explicit Message port of the Predefined Master/Slave Connection Set therefore provides an interface for a set of five preconfigured connection types in a node. The basis of this model is a 1:n communication structure consisting of one control device and decentralized I/O devices. The central instance of such a system is known as the Master, and the decentralized devices are known as Slaves. Multiple masters are allowed on the network, but a slave can only be allocated to one master at any point in time. The predefined Connection Objects occupy instances 1 to 5 in the Connection Object (Class ID 0x05; see Section 14.2.4): • Explicit Messaging Connection: • Group 2 Explicit Request/Response Message (Instance ID 1) • I/O Messaging Connections: • Polled I/O Connection (Instance ID 2) • Bit-Strobe I/O Connection (Instance ID 3) • Change of State or Cyclic I/O Connection (Instance ID 4) • Multicast Polling I/O Connection (Instance ID 5) The messages to the slave are defined in Message Group 2, and some of the responses from the slave are contained in Message Group 1. The distribution of Connection IDs for the Predefined Master/Slave Connection Set is defined as shown in Figure 14.26. Because the CAN ID of most of the messages the master produces contains the destination MAC ID of the slave, it is imperative that only one master talks to any given slave. Therefore, before a master can use this Predefined Connection Set, he must first allocate it with the device. The DeviceNet Object manages this important function in the slave device. It allows only one master to allocate its Predefined Connection Set, thereby preventing duplicate CAN IDs from appearing on the wire. The two services used are called Allocate_Master/Slave_Connection_Set (Service Code 0x4B) and Release_Group_2_Identifier_Set (Service Code 0x4C). These two services always access Instance 1 of the DeviceNet object (Class ID 0x03) (Figure 14.27). 
Figure 14.27 shows the Allocate Message with 8-bit Class ID and 8-bit Instance ID, a format that is always used when it is sent as a Group 2 Only Unconnected Message. It may also be sent across an existing connection and in a different format if a format other than 8/8 was agreed upon during the connection establishment. The Allocation Choice Byte is used to set which predefined connections are to be allocated (Figure 14.28).
FIGURE 14.26 Connection IDs of the Predefined Master/Slave Connection Set (Connection ID = CAN Identifier, bits 10:0):
Group 1 Messages (bit 10 = 0; bits 9-6 = Group 1 Message ID; bits 5-0 = Source MAC ID):
• Message ID 1100: Slave's I/O Multicast Poll Response
• Message ID 1101: Slave's I/O Change of State or Cyclic Message
• Message ID 1110: Slave's I/O Bit-Strobe Response Message
• Message ID 1111: Slave's I/O Poll Response or COS/Cyclic Ack Message
Group 2 Messages (bits 10-9 = 1 0; bits 8-3 = MAC ID; bits 2-0 = Group 2 Message ID):
• Message ID 000, Source MAC ID: Master's I/O Bit-Strobe Command Message
• Message ID 001, Source MAC ID: Master's I/O Multicast Poll Group ID
• Message ID 010, Destination MAC ID: Master's Change of State or Cyclic Acknowledge Message
• Message ID 011, Source MAC ID: Slave's Explicit/Unconnected Response Messages
• Message ID 100, Destination MAC ID: Master's Explicit Request Messages
• Message ID 101, Destination MAC ID: Master's I/O Poll Command/COS/Cyclic Message
• Message ID 110, Destination MAC ID: Group 2 Only Unconnected Explicit Request Messages
• Message ID 111, Destination MAC ID: Duplicate MAC ID Check Messages
FIGURE 14.27 Allocate_Master/Slave_Connection_Set Request Message:
• Byte 0 (message header): Frag [0] (bit 7), XID (bit 6), MAC ID (bits 5-0)
• Byte 1 (message body): R/R [0] (bit 7), Service Code [0x4B] (bits 6-0)
• Byte 2: Class ID [0x03]
• Byte 3: Instance ID [0x01]
• Byte 4: Allocation Choice
• Byte 5: Allocator's MAC ID (bits 5-0; upper bits 0)
FIGURE 14.28 Format of the Allocation Choice Byte:
• Bit 7: Reserved
• Bit 6: Ack Suppression
• Bit 5: Cyclic
• Bit 4: Change of State
• Bit 3: Multicast Polling
• Bit 2: Bit-Strobe
• Bit 1: Polled
• Bit 0: Explicit Message
The associated connections are activated by setting the appropriate bits. Change of State and Cyclic Connections are mutually exclusive choices. The Change of State/Cyclic Connection may be configured as not acknowledged using acknowledge suppression. The individual connection types are described in more detail below. The allocator’s MAC ID contains the address of the node (master) that wants to assign the Predefined Master/Slave Connection Set. Byte 0 of this message differs from the allocator’s MAC ID if this service has been passed on to a Group 2 Only Server via a Group 2 Only Client (what is commonly referred to as a proxy function).
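Putting Figure 14.27 and Figure 14.28 together, a master could assemble the six data bytes of an Allocate request as sketched below. The helper is illustrative only; transmitting the bytes on the Group 2 Only Unconnected Explicit Request port is not shown, and the header MAC ID normally carries the allocator's own address as described above.

#include <stdint.h>

/* Bits of the Allocation Choice Byte (Figure 14.28). */
#define ALLOC_EXPLICIT_MESSAGE  (1u << 0)
#define ALLOC_POLLED            (1u << 1)
#define ALLOC_BIT_STROBE        (1u << 2)
#define ALLOC_MULTICAST_POLLING (1u << 3)
#define ALLOC_CHANGE_OF_STATE   (1u << 4)
#define ALLOC_CYCLIC            (1u << 5)
#define ALLOC_ACK_SUPPRESSION   (1u << 6)
/* bit 7 is reserved */

/* Build the six data bytes of the Allocate_Master/Slave_Connection_Set
 * request of Figure 14.27 (8/8 format). header_mac is the MAC ID placed in
 * the message header byte (normally the allocator's own MAC ID);
 * allocator_mac is the master that wants to own the slave. */
void build_allocate_request(uint8_t out[6], uint8_t header_mac,
                            uint8_t allocator_mac, uint8_t choice)
{
    out[0] = (uint8_t)(header_mac & 0x3Fu);    /* Frag = 0, XID = 0, MAC ID */
    out[1] = 0x4B;                             /* R/R = 0, Service Code     */
    out[2] = 0x03;                             /* Class ID: DeviceNet Object */
    out[3] = 0x01;                             /* Instance ID 1             */
    out[4] = choice;                           /* Allocation Choice Byte    */
    out[5] = (uint8_t)(allocator_mac & 0x3Fu); /* Allocator's MAC ID        */
}

/* Usage sketch: allocate the Explicit Messaging and Polled I/O Connections
 * (Change of State and Cyclic are mutually exclusive and not requested here):
 *   uint8_t msg[6];
 *   build_allocate_request(msg, master_mac, master_mac,
 *                          ALLOC_EXPLICIT_MESSAGE | ALLOC_POLLED);
 */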
FIGURE 14.29 Polled I/O Connections. (Diagram: the master at MAC ID 0 exchanges Poll Command and Poll Response Messages with slaves at MAC IDs 3, 5, and 10 over point-to-point connections; the Connection IDs are 0x041D/0x03C3 for the slave at MAC ID 3, 0x042D/0x03C5 for MAC ID 5, and 0x0455/0x03CA for MAC ID 10.)
The slave, if not already claimed, responds with a Success Message. The connections are now in configuring status. Setting the Expected_Packet_Rate (EPR) (Set_Attribute_Single service to attribute 9 in the appropriate Connection Object, value in ms) starts the connection’s time-monitoring function. The connection then changes into established state and I/O Messages begin transferring via this connection. The allocated connections can be released individually or collectively through the Release_Group_2_Identifier_Set service (Service Code 0x4C), using the same format as in Figure 14.27, except that the last byte (allocator’s MAC ID) is omitted. The following is an explanation of the four I/O Connection types in the Predefined Master/Slave Connection Set. 14.3.1.12.1 Polled I/O Connection A Polled I/O Connection is used to implement a classic Master/Slave relationship between a control unit and a device. In this setup, a master can transfer data to a slave using the poll request and receive data from the slave using the poll response. Figure 14.29 shows the exchange of data between one master and three slaves in the Polled I/O mode. In a message between a master and a slave using the Polled I/O Connection, the amount of data transferred via this message can be any length. If the length exceeds 8 bytes, the fragmentation protocol is automatically used. A Polled I/O Connection is always a point-to-point connection between a master and a slave. The slave consumes the Poll Message and sends back an appropriate response, normally its input data. The Polled Connection is subject to a time-monitoring function (that can be adjusted) in the device. A Poll Command must have been received within this time (4 × EPR); otherwise, the connection changes over into time-out mode. When a connection times out, the node may optionally go to a preconfigured fault state as set up by the user. A master usually polls all the slaves in a round-robin manner.
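The Connection IDs shown in Figure 14.29 follow directly from the identifier layouts of Figure 14.25 and Figure 14.26; the following small C program reproduces them for the slave at MAC ID 3.

#include <stdint.h>
#include <stdio.h>

/* Group 2 Connection ID: bits 10-9 = 1 0, bits 8-3 = MAC ID, bits 2-0 = Message ID.
 * Message ID 5 is the Master's I/O Poll Command/COS/Cyclic Message. */
static uint16_t group2_cid(uint8_t mac_id, uint8_t message_id)
{
    return (uint16_t)(0x400u | ((mac_id & 0x3Fu) << 3) | (message_id & 0x7u));
}

/* Group 1 Connection ID: bit 10 = 0, bits 9-6 = Message ID, bits 5-0 = Source MAC ID.
 * Message ID 1111 is the Slave's I/O Poll Response. */
static uint16_t group1_cid(uint8_t message_id, uint8_t source_mac)
{
    return (uint16_t)(((message_id & 0xFu) << 6) | (source_mac & 0x3Fu));
}

int main(void)
{
    /* Reproduces the values shown in Figure 14.29 for the slave at MAC ID 3. */
    printf("Poll Command:  0x%03X\n", group2_cid(3, 5));   /* prints 0x41D */
    printf("Poll Response: 0x%03X\n", group1_cid(0xF, 3)); /* prints 0x3C3 */
    return 0;
}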
FIGURE 14.30 Data format of the Bit-Strobe I/O Connection. The 8 output bytes of the Bit-Strobe Command carry one bit per slave: Byte 0 holds the bits for MAC IDs 0 through 7, Byte 1 for MAC IDs 8 through 15, and so on up to Byte 7 for MAC IDs 56 through 63.
A slave's response time to a poll command is not defined in the DeviceNet Specification. This gives slave devices much flexibility to be designed appropriately for their primary applications, but it may also exclude a device from use in higher-speed applications.
14.3.1.12.2 Bit-Strobe I/O Connection
The master's transmission on this I/O Connection is also known as a Bit-Strobe Command. Using this command, a master multicasts one message to reach all of its slaves allocated for the Bit-Strobe Connection. The frame sent by the master with a Bit-Strobe Command is always 8 bytes (or 0 bytes if Idle). Of these 8 bytes, each slave is assigned one bit (Figure 14.30). Each slave can send back as many as 8 data bytes in its response. A Bit-Strobe I/O Connection is thus a multicast connection between one master and any number of strobe-allocated slaves (Figure 14.31). Since all devices in a network receive the Bit-Strobe Command at the same time, they can be synchronized by this command. When the Bit-Strobe Command is received, the slave may consume its associated bit and then send a response of up to 8 bytes.
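The byte/bit mapping of Figure 14.30 (one output bit per slave, packed by MAC ID into the 8 command bytes) can be expressed in a few lines. The helper below is a sketch for illustration only; the function name is not part of the DeviceNet Specification.

```python
def strobe_bit(command_bytes, mac_id):
    """Return the output bit assigned to a slave in an 8-byte Bit-Strobe Command.

    Byte 0 carries the bits for MAC IDs 0..7, byte 1 for MAC IDs 8..15,
    and so on up to byte 7 for MAC IDs 56..63 (Figure 14.30).
    """
    if len(command_bytes) != 8:
        raise ValueError("a Bit-Strobe Command carries exactly 8 data bytes (or 0 if Idle)")
    if not 0 <= mac_id <= 63:
        raise ValueError("DeviceNet MAC IDs range from 0 to 63")
    return (command_bytes[mac_id // 8] >> (mac_id % 8)) & 1

# Example: the master sets only the bit for the slave with MAC ID 10 (byte 1, bit 2).
command = bytes([0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])
print(strobe_bit(command, 10))   # -> 1
print(strobe_bit(command, 3))    # -> 0
```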
FIGURE 14.31 Bit-Strobe I/O Connections. The master (MAC ID 0) produces a single multicast Bit-Strobe Command Message that is consumed by all strobe-allocated slaves (MAC IDs 3, 5, and 10); each slave returns its own Bit-Strobe Response Message on its own Connection ID.
FIGURE 14.32 COS/Cyclic I/O Connections. The master (MAC ID 0) and its slaves (MAC IDs 3, 5, and 10) each produce their COS/Cyclic Messages independently, each on its own Connection ID; acknowledged connections return an Acknowledge Message through an Acknowledge Handler Object in the consuming device.
Since this command uses the source MAC ID in the Connection ID (Figure 14.26), devices that support the Bit-Strobe I/O Connection and use a CAN chip whose screening covers only 8 bits of the 11-bit CAN Identifier must perform software screening of the CAN Identifier.
14.3.1.12.3 Change of State/Cyclic I/O Connection
The COS/Cyclic I/O Connection differs from the other types of I/O Connections in that both endpoints produce their data independently. This can be done in a change-of-state or a cyclic manner. In the former case, the COS I/O Connection recognizes that the application object data indicated by the Produced_Connection_Path have changed. In the latter case, a timer of the Cyclic I/O Connection expires and thereby triggers the transfer of the latest data from the application object. A COS/Cyclic I/O Connection can be set up as acknowledged or unacknowledged. When acknowledged, the consuming side of the connection must set up a defined path to the Acknowledge Handler Object to ensure that retries, if needed, are properly managed. Figure 14.32 shows the various COS/Cyclic I/O Connection possibilities. A COS/Cyclic I/O Connection can also originate from a master; to the slave, such a connection then looks like a Polled I/O Connection. This can be seen in Figure 14.26: the same Connection ID is issued for the master's Polled I/O Message as for the master's COS/Cyclic I/O Message. COS Connections exhibit two additional behaviors. First, the Expected Packet Rate (EPR) is used as a default production trigger: if the data have not changed by the time the EPR timer expires, the data are re-sent as a heartbeat, so the consuming node can tell the difference between a dead node and one whose data simply have not changed. Second, COS Connections have a Production Inhibit timer that prevents a chattering node from using too much bandwidth.
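The production rules just described (transmit on a change of state, re-send as a heartbeat when the EPR timer expires, and suppress transmissions during the Production Inhibit Time) can be summarized in a short sketch. It is a simplified illustration of those rules for an unacknowledged connection, not an implementation of the Connection Object; the names and the millisecond tick model are invented for the example.

```python
class CosProducer:
    """Illustrative production trigger of an (unacknowledged) COS Connection."""

    def __init__(self, epr_ms, production_inhibit_ms):
        self.epr_ms = epr_ms
        self.inhibit_ms = production_inhibit_ms
        self.last_sent_data = None
        self.ms_since_last_production = 0

    def tick(self, now_data, elapsed_ms):
        """Return the data to transmit in this tick, or None if nothing is produced."""
        self.ms_since_last_production += elapsed_ms

        changed = now_data != self.last_sent_data
        heartbeat_due = self.ms_since_last_production >= self.epr_ms

        # The Production Inhibit timer keeps a chattering application from
        # flooding the network, even if the data keep changing.
        if self.ms_since_last_production < self.inhibit_ms:
            return None

        if changed or heartbeat_due:
            self.last_sent_data = now_data
            self.ms_since_last_production = 0
            return now_data   # a heartbeat simply re-sends the unchanged data
        return None

producer = CosProducer(epr_ms=250, production_inhibit_ms=10)
print(producer.tick(b"\x01", elapsed_ms=20))   # change of state -> b'\x01' is produced
print(producer.tick(b"\x01", elapsed_ms=20))   # no change, EPR not expired -> None
print(producer.tick(b"\x01", elapsed_ms=250))  # EPR expired -> heartbeat b'\x01'
```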
14.3.1.12.4 Multicast Polled I/O Connection
This connection is similar to the regular I/O poll except that all of the slaves belonging to a multicast group consume the same output data from the master. Each slave responds with its own reply data. A unique aspect of this connection is that the master picks the CAN ID from one of the slaves in the multicast group and must then set the consumed CAN ID in each of the other slaves to that same value. If that slave's connection times out during runtime, the master must either stop producing its Multicast Poll Command or pick another slave in the group and reset the command CAN ID in all remaining slaves of the group to that value before sending another Multicast Poll Command.
14.3.1.12.5 I/O Data Sharing
Due to the inherent broadcast nature of all CAN frames, applications can be set up to listen to the data produced by other applications. Such a listen-only mode is not described in the DeviceNet Specification, but some vendors have created products that do exactly that, e.g., Shared Inputs in Allen-Bradley scanners.
14.3.1.12.6 Typical Master/Slave Start Sequence
A typical start-up of a DeviceNet network with a scanner and a set of slaves is executed as follows (a sketch of the master's allocation retry loop is given after the summary below):
• All devices run their self-test sequence and then try to go online with the algorithm described in Section 14.3.1.6. Any device that uses an autobaud mechanism to detect the baud rate of a network has to delay its Duplicate Node ID Message until it has seen enough CAN frames to detect the correct baud rate.
• Once online, slave devices do nothing until their master allocates them.
• Once online, a master tries to allocate each slave configured into its scan list by running the following sequence of messages:
  • Try to open a connection to the slave using a UCMM Open Message.
  • If successful, the master can then use this connection for further communication with the slave.
  • If not successful, the master will try again after a minimum wait time of 1 s.
  • If unsuccessful again, the master will try to allocate the slave using the Group 2 Only Unconnected Explicit Request Message (at least for Explicit Messaging).
  • If successful, the master can then use this connection for further communication with the slave.
  • If not successful, the master will try again after a minimum wait time of 1 s.
  • If unsuccessful again, the master will start all over again with the UCMM Message. This process carries on indefinitely or until the master has allocated the slave.
• Once the master has allocated the slave, it may carry out some verification to see whether it is safe to start I/O Messaging with the slave. The master may also apply some further configuration to the connections it has established, e.g., setting the Explicit Messaging Connection to "Deferred Delete."
• Setting the EPR value(s) brings the I/O Connection(s) to the established state so that I/O Messaging can commence.
14.3.1.12.7 Master/Slave Summary
Support for the Predefined Master/Slave Connection Set is a solution that a device manufacturer can implement easily. Simple BasicCAN controllers may be used; software screening of the CAN Identifier is generally not necessary, enabling the use of low-cost 8-bit controllers. This may represent an advantage as far as the devices are concerned, but it entails disadvantages for the system design.
Group 2 Only (i.e., UCMM-incapable) devices permit only one Explicit Connection between client (master) and server (slave), whereas UCMM-capable devices can maintain Explicit Messaging Connections with more than one client at the same time. If a device wants to communicate with an allocated slave that does not support UCMM, the master recognizes this situation and sets up a communication link with the requestor instead. Any communication between the requestor and the slave is then automatically routed via the master. This is called the proxy function. Since this puts an additional burden on the master and on network bandwidth, it is recommended that slave devices support UCMM.
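The alternating UCMM/Group 2 Only allocation attempts from the start sequence in Section 14.3.1.12.6 can be condensed into a simple retry loop, sketched below. The two helper functions stand in for the corresponding request messages and are placeholders rather than the API of any real DeviceNet stack; only the 1 s minimum wait, the two tries per mechanism, and the alternation between UCMM and Group 2 Only allocation are taken from the text above.

```python
import time

def try_ucmm_open(mac_id):
    """Placeholder: send a UCMM Open Message to the slave; return True on success."""
    return False   # pretend the slave is Group 2 Only for the sake of the example

def try_group2_only_allocate(mac_id):
    """Placeholder: send a Group 2 Only Unconnected Explicit Request Message."""
    return True

def allocate_slave(mac_id, retry_wait_s=1.0):
    """Alternate between UCMM and Group 2 Only allocation until one succeeds.

    Each mechanism is tried twice, with a minimum 1 s wait after a failure,
    before falling back to the other one, as in Section 14.3.1.12.6.
    """
    while True:
        for attempt in (try_ucmm_open, try_group2_only_allocate):
            for _ in range(2):
                if attempt(mac_id):
                    return attempt.__name__   # connection established; use it from now on
                time.sleep(retry_wait_s)      # minimum wait time before retrying
        # Neither mechanism worked: start all over again with the UCMM Message.

print(allocate_slave(3, retry_wait_s=0.01))   # -> "try_group2_only_allocate" here
```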
Although not explicitly defined in the DeviceNet Specification, DeviceNet masters can, under certain conditions, automatically configure their scan lists or the devices contained in their scan lists. This functionality simply makes use of the messaging capabilities of masters and slaves, which allow the master to read from a slave whatever information is required to start an I/O communication and to download any configurable parameter that has been communicated to the master via the EDS. This functionality facilitates the replacement of even complex slave devices without the need for a tool, dramatically reducing the downtime of a system.
14.3.1.13 Device Profiles
DeviceNet uses the full set of profiles described in Chapter 6 of the CIP Specification [4].
14.3.1.14 Configuration
EDSs of DeviceNet devices can make full use of all EDS features, but they do not necessarily contain all sections. Typical DeviceNet devices contain (apart from the mandatory sections) at least an IO_Info section. This section specifies which types of Master/Slave connections are supported and which one(s) should be enabled by default. It also declares which I/O Connections may be used in parallel. Chapter 7 of the DeviceNet Specification [5] gives full details of this section of a DeviceNet EDS. A full description of what can be done in DeviceNet EDSs would go well beyond the scope of this handbook, so references [25] and [26] are recommended for further reading.
14.3.1.15 Conformance Test
At an early stage, the ODVA defined test and approval procedures for DeviceNet devices and systems. Manufacturers are given the opportunity to have their devices checked for conformance with the DeviceNet Specification in one of several independent DeviceNet conformance test centers. Only then do two key characteristics of all DeviceNet devices become possible: interoperability and interchangeability. Interoperability means that DeviceNet devices from all manufacturers can be configured to operate with each other on the network. Interchangeability goes one step further by providing the means for devices of the same type (i.e., they comply with the same Device Profile) to be logical replacements for each other, regardless of the manufacturer. The conformance test checks both of these characteristics. The test is divided into three parts:
• A software test to verify the function of the DeviceNet protocol. Depending on the complexity of the device, as many as several thousand messages are transmitted to the device under test (DUT). To allow a test that is closely adapted to the characteristics of the DUT, the manufacturer must provide a formal description of all relevant features of the DUT.
• A hardware test to check conformance with the characteristics of the physical layer. This test checks all requirements of the specification, such as miswiring protection, overvoltage withstand, grounding, and the CAN transceiver. The test may be destructive for noncompliant devices.
• A system interoperability test that verifies that the device can function in a network with more than 60 nodes and a variety of scanners from various manufacturers.
The software test is available from ODVA. It is a Windows-based tool, running on various PC CAN interface cards from a number of suppliers. It is recommended that device developers run this test in their own lab before taking devices to the official ODVA test. The hardware test and the system interoperability test involve more complex test setups that are typically not available to device developers.
When a device passes the test, it is said to be DeviceNet CONFORMANCE TESTED®.* Many DeviceNet users now demand this seal. A device that has not been tested accordingly has a significant market disadvantage. Devices that have passed conformance testing are published on the ODVA Web site.
*DeviceNet CONFORMANCE TESTED® is a registered certification mark of ODVA.
14.3.1.16 Tools
Tools for DeviceNet networks can be divided into three groups:
• Physical layer tools: Tools (hardware and/or software) that verify the integrity and conformance of the physical layer or monitor the quality of the data transmission.
• Configuration tools: Software tools that are capable of communicating with individual devices for data monitoring and configuration purposes. They can range from very basic software operating from handheld devices to powerful PC-based software packages to configure complete networks. Most configuration tools are EDS-based; however, more complex devices like scanners tend to have their own configuration applets that are only partially based on EDSs. Some of these tools support multiple access paths to the network, e.g., via Ethernet and suitable bridging devices, and thus allow remote access. High-level tools also actively query the devices on the network to identify them and monitor their health.
• Monitoring tools: Typically PC-based software packages that can capture and display the CAN frames on the network. A raw CAN frame display may be good enough for some experts, but it is recommended that a tool that allows both raw CAN display and DeviceNet interpretation of the frames be used.
For a typical installation, a configuration tool is all that is needed. However, to ensure the network is operating reliably, a check with a physical layer tool is highly recommended. Experience shows that the overwhelming majority of DeviceNet network problems are caused by inappropriate physical layer installation. Protocol monitoring tools are mainly used to investigate interoperability problems and to assist during the development process. Turn to the DeviceNet product catalog on the ODVA Web site to access a list of vendors that provide tools for DeviceNet.
14.3.1.17 Advice for Developers
Before any development of a DeviceNet product is started, the following issues should be considered in detail:
• What functionality does the product require today and in future applications?
  • Slave functionality
  • Master functionality
  • Peer-to-peer messaging
  • Combination of the above
• What are the physical layer requirements? Is IP 65/67 required or is IP 20 good enough?
• What type of hardware should be chosen for this product?
• What kind of firmware should be used for this product? Will a commercially available communication stack be used?
• Will the development of hardware and software be done internally or will it be designed by an outside company?
• What kind of configuration software should be used for this product? Will a commercially available software package be used; i.e., is an EDS adequate to describe the device or is custom software needed?
• What are the configuration requirements?
• Will the product be tested for conformance and interoperability (highly recommended)?
• What design and verification tools should be used?
• What is an absolute must before the products can be placed on the market (own the specification, have a Vendor ID)?
A full discussion of these issues goes well beyond the scope of this book; see [27] for further reading.
FIGURE 14.33 Relationship between CIP and ControlNet. Mapped against the ISO/OSI layers, the common CIP layers (user Device Profiles such as semiconductor, pneumatic valves, position controller, and AC drives; the CIP Application Object Library; CIP Data Management Services; and CIP Message Routing and Connection Management) sit on top of the network-specific lower layers: the ControlNet transport over CTDMA, the DeviceNet transport over CAN (CSMA/NBA), and the encapsulation over TCP/UDP/IP and Ethernet CSMA/CD, with possible future alternatives such as ATM, USB, or FireWire.
14.3.1.18 DeviceNet Overall Summary
Since its introduction in 1994, DeviceNet has been used successfully in millions of nodes in many different applications. It is a de facto standard in many countries, and this situation is reflected in several national and international standards [16], [17], [18]. Due to its universal communication characteristics, it is one of the most versatile fieldbuses for low-end devices. While optimized for devices with small amounts of I/O, it can easily accommodate larger devices as well. Powerful EDS-based configuration tools allow easy commissioning and configuration of even complex devices without the need to consult manuals. While most applications are of the Master/Slave type, peer-to-peer communication is used in a rising number of applications, greatly simplifying the design, operation, and maintenance of these networks. With the introduction of CIP Safety on DeviceNet, many machine-level applications that today need a set of dedicated networks will soon be accommodated in only one DeviceNet network. Finally, its use of CIP and its object structure allow the blending of DeviceNet networks into an overall CIP network structure that permits seamless communication, just as if it were only one network.
14.3.2 ControlNet
ControlNet is based on a physical layer and a bus access mechanism that were specifically developed for this network to provide absolute determinism. All other features are based on CIP. Figure 14.33 shows the relationship between CIP, ControlNet, and the ISO/OSI layer model.
14.3.2.1 Physical Layer and Frame Structure
The physical layer of ControlNet has been specifically designed for this network; it does not reuse any existing open technology. The basis of the physical layer is a 75-Ω coax trunkline cable (typically of the RG6 type) terminated at both ends. To reduce impedance mismatch, all ControlNet devices are connected to the network through special taps that consist of a coupling network and a specific length of dropline (1 m). There is no minimum distance requirement between any two of these taps, but since every tap introduces some signal attenuation, each tap reduces the maximum length of the trunkline by 16.3 m.
FIGURE 14.34 Coax medium topology limits. The maximum allowable segment length is 1000 m − 16.3 m × (number of taps − 2), falling from 1000 m with 2 taps to roughly 250 m with 48 taps.
This results in a full-length trunkline of 1000 m with only two taps at the ends, while a fully populated physical network with 48 taps allows a trunkline length of 250 m (Figure 14.34). This physical layer limitation was taken into account from the very beginning by including repeaters in the design that can increase the network size without lowering the speed. Therefore, if a network is to be built with a higher number of nodes (up to 99 nodes are possible) or with a topology that goes beyond the single-trunkline limitations, repeaters can be used to create any tree topology, or even a ring topology using a special type of repeater. There are also repeaters for fiber-optic media that can be used either to increase the system size even further or to provide very good isolation of network segments in harsh EMC (electromagnetic compatibility) environments or for high-voltage applications. The number of repeaters in series between any two nodes used to be limited to five; better repeater technology now allows up to 20 repeaters in series. However, whatever media technology is used, the overall length of a ControlNet system (the distance between any two nodes on the network) is limited. This fundamental limit is due to propagation delay; with currently available media, it translates into approximately 20 km.
To better accommodate industry requirements, ControlNet supports redundant media, allowing bumpless transfer from primary to secondary media or vice versa if one of them should fail or deteriorate. Developers are encouraged to support this redundant media feature in their design. For cost-sensitive applications, less expensive device variants may then be created by populating one channel only. Another feature often used in the process industry is the capability to run ControlNet systems in areas with an explosion hazard. The system is fully approved to meet worldwide standards for intrinsic safety (explosion protection).
The connectors used for copper media are of the BNC type; TNC-type connectors have been introduced recently for applications that require IP 67 protection. Devices may also implement a Network Access Port (NAP). This feature takes advantage of the repeater function of the ControlNet application-specific integrated circuits (ASICs). It uses an additional connector (RJ45) with RS 422-based signals that provides easy access to any node on the network for configuration devices. The signal transmitted on the copper media is a 5 Mbit/s Manchester-encoded signal with an amplitude of up to 9.5 V (pk-pk) at the transmitter that can be attenuated down to 510 mV (pk-pk) at the receiving end. The transmitting and receiving circuits, coupled to the cable through transformers, are described in full detail in the ControlNet Specification [3].
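The trunkline limit of Figure 14.34 is a simple linear trade-off between cable length and tap count. The helper below merely restates that formula; the function name and the tap-count bounds are chosen for this example.

```python
def max_segment_length_m(taps: int) -> float:
    """Maximum allowable ControlNet coax segment length (formula of Figure 14.34).

    Every tap beyond the two at the segment ends costs 16.3 m of trunkline.
    """
    if not 2 <= taps <= 48:
        raise ValueError("a coax segment carries between 2 and 48 taps")
    return 1000.0 - 16.3 * (taps - 2)

print(max_segment_length_m(2))    # -> 1000.0 m
print(max_segment_length_m(48))   # -> 250.2 m, i.e., roughly the 250 m quoted above
```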
14.3.2.2 Protocol Adaptation
ControlNet can use all features of CIP. The ControlNet frame is big enough that fragmentation is rarely required. Since ControlNet is not expected to be used in very simple devices, there is no scaling.
14.3.2.3 Indicators and Switches
ControlNet devices must be built with device status and network status indicators, as described in the specification. Devices may have additional indicators, which must not carry any of the names of those described in the specification. Devices may be built with or without switches or other directly accessible means for configuration. If switches for the MAC ID exist, then certain rules apply about how these values have to be used at power-up and during the operation of the device.
14.3.2.4 Additional Objects
The ControlNet Specification defines three additional objects: the ControlNet Object (Class Code 0xF0), the Keeper Object (Class Code 0xF1), and the Scheduling Object (Class Code 0xF2).
14.3.2.4.1 ControlNet Object
The ControlNet Object contains a host of information on the state of the ControlNet link of the device, among them diagnostic counters, data link and timing parameters, and the MAC ID. A ControlNet Object is required for every physical layer attachment of the device. A redundant channel pair counts as one attachment.
14.3.2.4.2 Keeper Object
The Keeper Object (not required for every device) holds, for link scheduling software, a copy of the Connection Originator schedule data for all Connection Originator devices using a network. Every ControlNet network with scheduled I/O traffic must have at least one device with a Keeper Object (typically a Programmable Logic Controller (PLC) or another Connection Originator). If there are multiple Keeper Objects on a link, they perform negotiations to determine which Keeper is the Master Keeper and which Keeper(s) performs Backup Keeper responsibilities. The Master Keeper is the Keeper actively distributing attributes to the nodes on the network. A Backup Keeper is one that monitors Keeper-related network activity and can transition into the role of Master Keeper should the original Master Keeper become inoperable.
14.3.2.4.3 Scheduling Object
The Scheduling Object is required in every device that can originate an I/O Messaging Connection. Whenever a link scheduling tool accesses a Connection Originator on a ControlNet link, an instance of the Scheduling Object is created and a set of object-specific services is used to interface with this object. Once the instance is created, the link scheduling tool can read and write connection data for all connections that originate from this device. After having read all connection data from all Connection Originators, the link scheduling tool can calculate an overall schedule for the ControlNet link and write this data back to all Connection Originators. The scheduling session is ended by deleting the instance of the Scheduling Object.
14.3.2.5 Network Access
The bus access mechanism of ControlNet allows full determinism and repeatability while still maintaining sufficient flexibility for various I/O Message triggers and Explicit Messaging. This bus access mechanism is called Concurrent Time Domain Multiple Access (CTDMA); it is illustrated in Figure 14.35. The time axis is divided into equal intervals called the Network Update Time (NUT). Within each NUT there is a subdivision into a Scheduled Service Time, an Unscheduled Service Time, and the Guardband. Figure 14.36 shows the function of the Scheduled Service.
FIGURE 14.35 Media access through CTDMA.
FIGURE 14.36 Scheduled Service.
Every node up to and including the SMAX node (the maximum node number participating in the Scheduled Service) has a chance to send a message within the Scheduled Service. If a particular node has no data to send, it nevertheless sends a short frame to indicate that it is still alive. If a node fails to send its frame, the next-higher node number steps in after a very short, predetermined waiting time. This ensures that the failure of a node does not lead to an interruption of the NUT cycle. Figure 14.37 shows the function of the Unscheduled Service. Since this service is designed for non-time-critical messages, only one node is guaranteed to get access to the bus during the Unscheduled Service Time. If there is time left, then the other nodes (with higher node numbers) also get a chance to send. As with the Scheduled Service Time, if a node fails to send when it is its turn, the next node steps in. The node number that is allowed to send first within the Unscheduled Service Time is increased by 1 in each NUT; this guarantees an equal chance to all nodes. All node sequencing in this interval wraps: UMAX is followed by the lowest node number (typically 1) on the network. These two service intervals, combined with the Guardband, guarantee determinism and repeatability while still maintaining sufficient freedom to allow for unscheduled message transmissions, e.g., for parameterization.
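The rotation of transmit opportunities within one NUT can be illustrated with a small sketch: every node up to SMAX gets a guaranteed slot in the Scheduled Service, and the node that starts the Unscheduled Service advances by one each NUT, wrapping after UMAX. This is a didactic model of the access order only; it ignores the Guardband, the actual timing, and the step-in behavior of skipped nodes, and the parameter names are chosen for the example.

```python
def ctdma_access_order(nut_index, smax, umax, unscheduled_slots=3):
    """Return the node order for one NUT: scheduled nodes, then unscheduled starters."""
    scheduled = list(range(1, smax + 1))                 # every node up to SMAX transmits once
    start = (nut_index % umax) + 1                       # the starter advances by 1 each NUT
    unscheduled = [((start - 1 + i) % umax) + 1          # wraps after UMAX back to node 1
                   for i in range(unscheduled_slots)]
    return scheduled, unscheduled

for nut in range(3):
    sched, unsched = ctdma_access_order(nut, smax=4, umax=6)
    print(f"NUT {nut}: scheduled {sched}, unscheduled starts with {unsched}")
# NUT 0: scheduled [1, 2, 3, 4], unscheduled starts with [1, 2, 3]
# NUT 1: scheduled [1, 2, 3, 4], unscheduled starts with [2, 3, 4]
# NUT 2: scheduled [1, 2, 3, 4], unscheduled starts with [3, 4, 5]
```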
FIGURE 14.37 Unscheduled Service.
FIGURE 14.38 MAC frame format. A ControlNet MAC frame consists of a preamble (16 bits), a start delimiter (8 bits), the source MAC ID (8 bits), 0 to 510 bytes of Lpackets, a CRC (16 bits), and an end delimiter (8 bits).
14.3.2.6 Frame Description
Every frame transmitted on ControlNet has the format of the MAC frame (Figure 14.38). Within every MAC frame, there is a field of up to 510 bytes that is available for the transmission of data or messages. This field may be populated with one or several Lpackets (link packets). These Lpackets carry the individual messages (I/O or Explicit) of CIP; there are also some specialized Lpackets used for network management. Since every node listens to all MAC frames, any node can consume any of the Lpackets in a frame, whether they are unicast, multicast, or broadcast in nature. This feature allows fine-tuned multicasting of small amounts of data to different sets of consumers without too much overhead. There are two types of Lpacket formats: fixed tag and generic tag. Fixed tag Lpackets are used for Unconnected Messaging and network administration, while generic tag Lpackets are used for all Connected Messaging (I/O and Explicit). Figure 14.39 shows the format of a fixed tag Lpacket. By including the destination MAC ID, this format reflects the fact that these Lpackets are always directed from the requesting device (sending the MAC frame) to the target device (the destination MAC ID).
FIGURE 14.39 Fixed tag Lpacket format: size (1 byte), control (1 byte), service (1 byte), destination MAC ID (1 byte), and 3 to 506 bytes of link data.
FIGURE 14.40 Generic tag Lpacket format: size (1 byte), control (1 byte), Connection ID (3 bytes), and 0 to 504 bytes of link data.
The service byte within the fixed tag Lpacket does not represent the service of an Explicit Message but a more general service type, since the fixed tag Lpacket format can be used for a variety of actions such as network administration. Figure 14.40 shows the format of a generic tag Lpacket. The size byte specifies the number of words within the Lpacket; the control byte gives information on what type of Lpacket this is. The 3-byte Connection Identifier specifies which connection this Lpacket belongs to. These three bytes are the three lower bytes of the 4-byte Connection ID specified in the Forward_Open Message. The ControlNet Specification gives full details on how to assemble the three lower bytes of the Connection ID; the uppermost byte is always zero. For a device that receives the MAC frame, the Connection ID indicates whether to ignore the Lpacket (the device is not part of the connection), to consume the data and forward it to the application (the device is an endpoint of this connection), or to forward the data to another network (the device acts as a bridge in a bridged connection).
14.3.2.7 Network Start-Up
After power-up, every ControlNet device goes through a process of getting access to the ControlNet communication link and learning the current NUT and other timing requirements. This is a fairly complex process typically handled by the commercially available ControlNet ASICs; describing all its details would go beyond the scope of this handbook.
14.3.2.8 Explicit Messaging
Unlike DeviceNet, Explicit Messages on ControlNet can be sent connected or unconnected; both are typically transmitted within the unscheduled part of the NUT. Connected Explicit Messaging requires setting up a connection first (see Section 14.3.2.10). This, of course, means that all resources required for the management of the connection must stay reserved for this purpose as long as the connection exists. To avoid tying up these resources, most Explicit Messages can also be sent unconnected. Every part of an Explicit Message (request, response, acknowledgments) is wrapped into an Lpacket, using the fixed tag Lpacket format for Unconnected Messaging (Figure 14.39) and the generic tag Lpacket format for Connected Messaging (Figure 14.40). The service/class/instance/attribute fields (see Section 14.2.3) of the Explicit Message are contained in the link data field.
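A minimal sketch of the two Lpacket layouts from Figure 14.39 and Figure 14.40 is given below. It assumes, as described above, that the size byte counts 16-bit words (taken here to cover the whole Lpacket) and that the generic tag carries the lower 3 bytes of the Connection ID; little-endian byte order is assumed for that field, as is usual for CIP data, and the tag type is modeled simply as a boolean argument since the exact control-byte encoding is not reproduced in this chapter.

```python
from dataclasses import dataclass

@dataclass
class Lpacket:
    size_words: int
    control: int
    fixed_tag: bool
    service: int | None        # fixed tag only
    dest_mac_id: int | None    # fixed tag only
    connection_id: int | None  # generic tag only (the upper byte of the CIP CID is 0)
    link_data: bytes

def parse_lpacket(raw: bytes, fixed_tag: bool) -> Lpacket:
    """Decode one Lpacket according to Figure 14.39 (fixed tag) or 14.40 (generic tag)."""
    size_words, control = raw[0], raw[1]
    if fixed_tag:
        service, dest = raw[2], raw[3]
        return Lpacket(size_words, control, True, service, dest, None, raw[4:])
    cid = int.from_bytes(raw[2:5], "little")   # lower 3 bytes of the 4-byte Connection ID
    return Lpacket(size_words, control, False, None, None, cid, raw[5:])

# Example: a generic tag Lpacket; size given as 5 words (10 bytes), control left 0 here.
raw = bytes([0x05, 0x00]) + (0x0001AB).to_bytes(3, "little") + b"\x11\x22\x33\x44\x55"
print(parse_lpacket(raw, fixed_tag=False))
```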
FIGURE 14.41 Device levels.
14.3.2.9 I/O Messaging
ControlNet I/O Messaging, like any other CIP I/O Messaging, is done across connections, and it always takes place in the scheduled part of the NUT. Only one MAC frame may be transmitted by any device within its time slot, but this MAC frame may contain multiple Lpackets so that data can be sent to multiple nodes in one NUT. The individual Lpackets may be consumed by one node only or by multiple nodes if they are set up to consume the same data. I/O Messages use the generic tag Lpacket format (Figure 14.40). The link data field contains the I/O data prepended with a 16-bit sequence count number for the packet. I/O data transmission without the Sequence Count Number is possible in principle, but is not used today. Run/Idle can be indicated within a prepended Run/Idle header or by sending the data packet (Run) or no data packet (Idle). Which of the two methods is used is indicated in the connection parameters in the Connection Manager section of the EDS. However, only the Run/Idle header method has been in use for ControlNet up to now.
14.3.2.10 Connection Establishment
All connections on ControlNet are established using a UCMM Forward_Open message (see Section 14.2.3); therefore, all devices must support the UCMM function.
14.3.2.11 Device Levels
While not official categories, it is useful to distinguish among several levels of devices (Figure 14.41); one only has to implement the functionality needed. The minimal device function (level 1) is that of a Messaging Server. It is used for Explicit Messaging applications only and acts as a target for Connected and Unconnected Explicit Messages, e.g., for program upload/download, data collection, status monitoring, etc. The next class of device (level 2) is an I/O Server. It adds I/O Messaging support to a level 1 device and acts as a target for both Explicit and I/O Messages, e.g., simple I/O devices, pneumatic valves, AC drives, etc. These devices are also called adapters. Another class of device is a Messaging Client (level 3). It adds client support to level 1 Explicit Messaging applications and acts as a target and an originator for messaging applications, e.g., computer interface cards or Human-Machine Interface (HMI) devices. Finally, the most powerful class of device is a scanner (level 4). It adds I/O Message origination support to levels 1, 2, and 3 and acts as a target and an originator for Explicit and I/O Messages, e.g., PLCs, I/O scanners, etc.
14.3.2.12 Device Profiles
ControlNet uses the full set of profiles described in Chapter 6 of the CIP Specification [4].
14.3.2.13 Configuration
ControlNet devices typically come with EDSs, as described in Section 14.2.7. For EDS-based configuration tools, the EDS should contain a Connection Manager section to describe the details of the connections that can be made into the device.
This section basically is a mirror of what is contained in the Forward_Open message that a Connection Originator would send to the device. Multiple connections can be specified within an EDS; one or more can then be chosen by the configuration tool. An EDS may also contain individual parameters or a Configuration Assembly with a complete description of all parameters within this Assembly. In many applications, the Configuration Assembly is transmitted as an attachment to the Forward_Open Message.
14.3.2.14 Conformance Test
ControlNet International has defined a conformance test for ControlNet devices. Currently, this test is a protocol conformance test only, since it is expected that most implementations use the commercially available components for transformers and drivers. As many as several thousand messages are transmitted to the DUT, depending on the complexity of the device. To allow a test that is closely adapted to the characteristics of the DUT, the manufacturer must provide a formal description of all relevant features of the DUT. The software test is available from ControlNet International. It is a Windows-based tool, running on a PC interface card through a NAP connection (see Section 14.3.2.1). It is recommended that device developers run this test in their own labs before taking devices to the official ControlNet International test. When a device passes the test, it is said to be ControlNet CONFORMANCE TESTED™.* Many ControlNet users now demand this seal; a device that has not been tested accordingly has a significant market disadvantage. Devices that have passed conformance testing are published on the ControlNet International Web site.
14.3.2.15 Tools
Tools for ControlNet networks can be divided into three groups:
• Physical layer tools: Tools (hardware and software) that verify the integrity and conformance of the physical layer or monitor the quality of the data transmission.
• Configuration tools: Software tools that are capable of communicating with individual devices for data monitoring and configuration purposes. Most configuration tools are EDS-based; however, more complex devices like scanners tend to have their own configuration applets that are only partially based on EDSs. Some of these tools support multiple access paths to the network, e.g., via Ethernet and suitable bridging devices, and thus allow remote access. High-level tools also actively query the devices on the network to identify them and monitor their health. Configuration tools may also be integrated into other packages like PLC programming software.
• Monitoring tools: Typically PC-based software packages that can capture and display the ControlNet frames on the network. A raw ControlNet frame display may be good enough for some experts, but it is recommended that a tool that allows both raw ControlNet frame display and interpreted frames be used.
For a typical installation, a configuration tool is all that is needed. However, to ensure that the network is operating reliably, a check with a physical layer tool is highly recommended. Experience shows that the overwhelming majority of ControlNet network problems are caused by inappropriate physical layer installation. Protocol monitoring tools are mainly used to investigate interoperability problems and to assist during the development process. Turn to the ControlNet product catalog on the ControlNet International Web site to access a list of vendors that provide tools for ControlNet.
*ControlNet CONFORMANCE TESTED™ is a certification mark of ControlNet International.
14.3.2.16 Advice for Developers
Before any development of a ControlNet product is started, the following issues should be considered in detail:
• What functionality (device class; see Section 14.3.2.11) does the product require today and in future applications?
  • Messaging server only
  • Adapter functionality
  • Messaging client
  • Scanner functionality
• What are the physical layer requirements? Is IP 65/67 required or is IP 20 good enough?
• Will the development be based on commercially available hardware components and software packages (recommended) or designed from scratch (possible but costly)?
• What are the configuration requirements?
• Will the product be tested for conformance (highly recommended)?
• What design and verification tools should be used?
• What is an absolute must before products can be placed on the market (own the specification, have a Vendor ID)?
ControlNet chip sets and associated software packages are available from Rockwell Automation and through ControlNet International. Turn to the ControlNet International Web site for a list of companies that can support ControlNet developments.
14.3.2.17 ControlNet Overall Summary
Since its introduction in 1997, ControlNet has been used successfully in hundreds of thousands of nodes in many different applications. It is the network of choice for many high-speed I/O and PLC interlocking applications. Like DeviceNet, ControlNet has been turned into an international standard [19]. Due to its universal communication characteristics, it is one of the most powerful controller-level fieldbuses. The specific strength of ControlNet is its full determinism and repeatability, which make it ideally suited for many high-speed applications while maintaining full Explicit Messaging capabilities without compromising its real-time behavior. Finally, its use of CIP and its object structure allow the blending of ControlNet networks into an overall CIP network structure that permits seamless communication, just as if it were only one network.
14.3.3 EtherNet/IP
EtherNet/IP is the newest member of the CIP family; it is a technology supported by both ODVA and ControlNet International. EtherNet/IP has evolved from ControlNet and is therefore very similar to ControlNet in the way the CIP Specification is applied. Due to the length of the Ethernet frames and the typical multimaster structure of Ethernet networks, there are no particular limitations in the EtherNet/IP implementation of CIP. Basically, all that is required is a mechanism to encode CIP Messages into Ethernet frames. Figure 14.42 shows that there is an encapsulation mechanism (see Section 14.3.3.6) that specifies how I/O and Explicit Messages are wrapped into Ethernet frames. The well-known TCP/IP protocol is used for the encapsulation of Explicit Messages, while UDP/IP is used for the encapsulation of I/O Messages. The use of the very popular TCP/IP and UDP/IP stacks for encapsulation means that many applications will not require extra middleware for this purpose, since these stacks are already in widespread use. Even with the use of certain infrastructure devices (see Section 14.3.3.16), it is difficult to make today's Ethernet fully deterministic. Therefore, many CIP users may prefer ControlNet for applications that require full determinism and repeatability. However, future extensions to CIP such as CIP Sync (see Section 14.5.1) will allow EtherNet/IP to be used in highly synchronous and deterministic applications like coordinated drives.
FIGURE 14.42 Relationship between CIP and EtherNet/IP. Mapped against the ISO/OSI layers, the common CIP layers (user Device Profiles, the CIP Application Object Library, CIP Data Management Services, and CIP Message Routing and Connection Management) sit on top of the EtherNet/IP encapsulation, which runs over TCP and UDP on IP and the Ethernet CSMA/CD data link and physical layers, alongside the ControlNet (CTDMA) and DeviceNet (CAN) lower layers and possible future alternatives such as ATM, USB, or FireWire.
14.3.3.1 Physical Layer Adaptation
Since EtherNet/IP takes the Ethernet protocol to the factory floor, there are some restrictions and further requirements on the physical layer [12] that carries EtherNet/IP in a typical factory automation environment. The actual signaling is left unchanged, but there are some additional specifications on connectors and cabling. For IP 20 applications, the well-known RJ45 connector is used, but for applications that require a higher degree of protection, suitable connectors have been specified. The EtherNet/IP specification lists a sealed connector based on the RJ45 type. A second connector (D-coded M12) is a recent addition for devices that require a more compact connector. This connector has also been specified by a number of other organizations, so it is expected that it will become the de facto standard for field devices. Cat 5E or Cat 6 shielded or unshielded cables are recommended for EtherNet/IP. The use of shielded cables is specifically recommended in applications where adjacent material, such as metal cable ducts, may have a substantial influence on the characteristics of the cable. Copper media may only be used for distances up to 100 m. Fiber-optic media are recommended for longer distances; they may also be advisable for applications with very high electromagnetic disturbances or high-voltage potential differences between devices.
14.3.3.2 Frame Structure
EtherNet/IP uses standard Ethernet TCP/IP and UDP/IP frames as defined by international standards [12]. Therefore, no further frame details are described here.
14.3.3.3 Protocol Adaptation
EtherNet/IP can use all features of CIP. The Ethernet frame is big enough that fragmentation is rarely required. Since EtherNet/IP is not expected to be used in very simple devices, no further scaling than that described in Section 14.3.3.10 is required.
14.3.3.4 Indicators and Switches
EtherNet/IP devices that need to conform to the industrial performance level must have the set of indicators described in Chapter 9 of the EtherNet/IP Specification [6]. Devices may have additional indicators, which must not carry any of the names of those described in the specification. Devices may be built with or without switches or other directly accessible means for configuration.
FIGURE 14.43 Relationship between CIP and Ethernet frames. An Ethernet frame carries the Ethernet header, the IP header, the TCP or UDP header, then the encapsulation header and encapsulation data (the parts described in the EtherNet/IP Specification), followed by the trailer.
14.3.3.5 Additional Objects
The EtherNet/IP Specification defines two additional objects: the TCP/IP Object (Class Code 0xF5) and the Ethernet Link Object (Class Code 0xF6).
14.3.3.5.1 TCP/IP Object
The TCP/IP Interface Object provides a mechanism to configure a device's TCP/IP network interface. Examples of configurable items include the device's IP address, network mask, and gateway address.
14.3.3.5.2 Ethernet Link Object
The Ethernet Link Object maintains link-specific counters and status information for an Ethernet 802.3 communications interface. Each device has exactly one instance of the Ethernet Link Object for each Ethernet 802.3 communications interface. A request to access instance 1 of the Ethernet Link Object always refers to the instance associated with the communications interface over which the request was received.
14.3.3.6 EtherNet/IP Encapsulation
EtherNet/IP is completely based on existing TCP/IP and UDP/IP technologies and uses these principles without any modification. TCP/IP is used mainly for the transmission of Explicit Messages, while UDP/IP is used mainly for I/O Messaging. The encapsulation protocol defines a reserved TCP port number that is supported by all EtherNet/IP devices: all EtherNet/IP devices accept at least two TCP connections on TCP port number 0xAF12. The encapsulation protocol also defines a reserved UDP port number that is supported by all EtherNet/IP devices: all devices accept UDP packets on UDP port number 0xAF12. However, most UDP port assignments in EtherNet/IP are determined by (TCP) Explicit Messages; most EtherNet/IP UDP messages do not, in fact, use port 0xAF12. Since UDP, unlike TCP, does not have the ability to reorder packets, whenever UDP is used to send an encapsulated message, the entire message is sent in a single UDP packet and only one encapsulated message is present in any UDP packet.
14.3.3.6.1 General Use of the Ethernet Frame
Since EtherNet/IP is completely based on Ethernet with TCP/IP and UDP/IP, all CIP-related messages sent on an EtherNet/IP network are based on Ethernet frames with an IP header (Figure 14.43). The Ethernet, IP, and TCP or UDP headers are described through international standards (see Section 14.3.3.2); therefore, details of these headers are only mentioned in the EtherNet/IP Specification when necessary to understand how they are used. The encapsulation header describes the meaning of the encapsulation data. Most encapsulation data use the so-called Common Packet Format. I/O Messages sent in UDP frames do not carry an encapsulation header, but they still follow the Common Packet Format.
14.3.3.6.2 Encapsulation Header and Encapsulation Commands
The overall encapsulation packet has the structure described in Figure 14.44. While the description of some of the encapsulation header details would go beyond the scope of this handbook, the command field needs some more attention here.
FIGURE 14.44 Structure of the encapsulation packet. The encapsulation header consists of the command (2 bytes), length (2 bytes), session handle (4 bytes), status (4 bytes), sender context (8 bytes), and options (4 bytes), followed by 0 to 65,511 bytes of encapsulated data in the Common Packet Format.
FIGURE 14.45 Common Packet Format. An item count (2 bytes) is followed by an Address Item, a Data Item, and optional additional items; each item consists of a type ID (2 bytes), a length (2 bytes), and the data.
However, only those commands that are needed to understand EtherNet/IP are described here, and their descriptions list only the main features. The encapsulated data as such follow the Common Packet Format (see Section 14.3.3.6.2.4).
14.3.3.6.2.1 ListIdentity Command — The ListIdentity command is a broadcast UDP message that tells all EtherNet/IP devices to return a data set with identity information. This command is typically used by software tools to browse a network.
14.3.3.6.2.2 RegisterSession/UnRegisterSession Commands — These two commands are used to register and unregister a CIP Session between two devices. Once such a session is established, it can be used to exchange further messages. Multiple sessions may exist between two devices, but this is not common. The device requesting the session creates a sender context value; the device receiving the session request creates a session handle. Both values are used to identify messages between the two devices.
14.3.3.6.2.3 SendRRData/SendUnitData Commands — The SendRRData command is used for Unconnected Messaging; the SendUnitData command is used for Connected Explicit Messaging.
14.3.3.6.2.4 Common Packet Format — The Common Packet Format is a construct that allows packing of multiple items into one encapsulation frame (Figure 14.45). In most cases, however, only one Address Item and one Data Item are present. All encapsulated messages are then assembled using at least these two items within the Common Packet Format. Full details of this encapsulation can be found in Chapter 2 of the EtherNet/IP Specification [6].
14.3.3.7 IP Address Assignment
Since the initial development of TCP/IP, numerous methods for configuring a device's IP address have evolved. Not all of these methods are suitable for industrial control devices. In the office environment, for example, it is common for a PC to obtain its IP address via the Dynamic Host Configuration Protocol (DHCP), potentially getting a different address each time the PC reboots. This is acceptable because the PC is typically a client device that only makes requests, so there is no impact if its IP address changes.
FIGURE 14.46 UCMM request encapsulation. The encapsulation header carries the SendRRData command (0x6F), the length in bytes, the session handle, a status of 0, the sender context, and options of 0; the encapsulated data consist of an interface handle (0), a timeout, an item count of 2, a null Address Item (type 0, length 0), and a Data Item of type 0x00B2 whose length field precedes the Message Router Request Packet.
However, for an industrial control device that is a target of communication requests, the IP address cannot change at each power-up. If you are talking to a particular PLC, you want that PLC to be at the same address the next time it powers up. To further complicate matters, the only interface common to all EtherNet/IP devices is an Ethernet communications port. Some devices may also have a serial port, a user interface display, hardware switches, or other interfaces, but these are not universally shared across all devices. Since Ethernet is the common interface, the initial IP address must at least be configurable over Ethernet. The EtherNet/IP Specification, via the TCP/IP Interface Object, defines a number of ways to configure a device's IP address. A device may obtain its IP address via the Bootstrap Protocol (BOOTP), DHCP, or an explicit Set_Attribute (single or set-all) service. None of these methods is mandated, however. As a result, vendors could choose different methods for configuring IP addresses. From the user's perspective, it is desirable for vendors to support some common mechanism(s) for IP address configuration. Therefore, ODVA, the Profibus User Organization (PNO), and Modbus/IDA (Interface for Distributed Automation) are currently working on mandating a set of common methods to assign an IP address across the Ethernet link. The current ODVA recommendations on this subject can be downloaded from the ODVA Web site [8].
14.3.3.8 Use of the Encapsulation Data
14.3.3.8.1 Explicit Messaging
Unlike DeviceNet, Explicit Messages on EtherNet/IP can be sent connected or unconnected. Connected Explicit Messaging requires setting up a connection first (see Section 14.3.3.9). This, of course, means that all resources required for the management of the connection must stay reserved for this purpose as long as the connection exists. To avoid tying up these resources, most Explicit Messages can also be sent unconnected. Explicit Messages on EtherNet/IP are sent with a TCP/IP header and use encapsulation with the SendRRData command (unconnected) and the SendUnitData command (connected). As an example, the full encapsulation of a UCMM request is shown in Figure 14.46. The Message Router Request Packet, containing the message as such, follows the general format of Explicit Messages defined in Chapter 2 of the CIP Specification [4].
14.3.3.8.2 I/O Messaging
I/O Messages on EtherNet/IP are sent with a UDP/IP header. No encapsulation header is required, but the message still follows the Common Packet Format (e.g., Figure 14.47). The data field contains the I/O data prepended with a 16-bit Sequence Count Number for the packet. I/O data transmission without the Sequence Count Number is possible in principle, but is not used today. Run/Idle can be indicated within a Run/Idle header or by sending the data packet (Run) or no data packet (Idle).
FIGURE 14.47 I/O Message encapsulation. The Common Packet Format carries an item count of 2, a Sequenced Address Item (type 0x8002, length 8) containing the connection ID and sequence number, and a Connected Data Item (type 0x00B1) whose length field precedes the sequence count value, the Run/Idle header, and the I/O data.
Which of the two methods is used is indicated in the connection parameters of the Connection Manager section of the EDS. However, the Run/Idle header method is recommended for use in EtherNet/IP, and this is what is shown in Figure 14.47. I/O Messages from the originator to the target are typically sent as UDP unicast frames, while those sent from the target to the originator are typically sent as UDP multicast frames. This allows other EtherNet/IP devices to listen to this input data. To avoid these UDP multicast frames propagating all over the network, it is highly recommended that switches that support Internet Group Management Protocol (IGMP) Snooping be used. IGMP (see [41]) is a protocol that allows the automatic creation of multicast groups. Using this functionality, the switch will automatically create and maintain a multicast group consisting of the devices that need to consume these multicast messages. Once the multicast groups have been established, the switch will direct such messages only to those devices that have subscribed to the multicast group of that message.
14.3.3.9 Connection Establishment
All connections on EtherNet/IP are established using a UCMM Forward_Open Message (see Section 14.2.3); therefore, all devices must support the UCMM function.
14.3.3.10 Device Levels (Clients, Servers)
While not official categories, it is useful to distinguish among several levels of devices (Figure 14.48); one only has to implement the functionality needed. The minimal device function (level 1) is that of a Messaging Server. It is used for Explicit Messaging applications only and acts as a target for Connected and Unconnected Explicit Messages, e.g., for program upload/download, data collection, status monitoring, etc. The next class of device (level 2) is an I/O Server. It adds I/O Messaging support to a level 1 device and acts as a target for both Explicit and I/O Messages, e.g., simple I/O devices, pneumatic valves, AC drives, etc. These devices are also called adapters. Another class of device is a Messaging Client (level 3). It adds client support to level 1 Explicit Messaging applications and acts as a target and an originator for messaging applications, e.g., computer interface cards or HMI devices. Finally, the most powerful class of device is a scanner (level 4). It adds I/O Message origination support to levels 1, 2, and 3 and acts as a target and an originator for Explicit and I/O Messages, e.g., PLCs, I/O scanners, etc.
14.3.3.11 Device Profiles
EtherNet/IP uses the full set of profiles described in Chapter 6 of the CIP Specification [4].
14.3.3.12 Configuration
EtherNet/IP devices typically come with EDSs, as described in Section 14.2.7. For EDS-based configuration tools, the EDS should contain a Connection Manager section to describe the details of the connections that can be made into the device. This section basically is a mirror of what is contained in the Forward_Open message that a Connection Originator would send to the device. Multiple connections can be specified within an EDS; one or more can then be chosen by the configuration tool.
FIGURE 14.48 Device levels.
An EDS may also contain individual parameters or a Configuration Assembly with a complete description of all parameters within this Assembly. In many applications, the Configuration Assembly is transmitted as an attachment to the Forward_Open Message. 14.3.3.13 Conformance Test Conformance testing is mandatory for all EtherNet/IP devices. Currently, this test is a protocol conformance test only since it is expected that most implementations use commercially available components for media access and physical attachments. Depending on the complexity of the device, as many as several thousand messages are transmitted to the DUT. To allow a test that is closely adapted to the characteristics of the DUT, a formal description of all relevant features of the DUT must be provided by the manufacturer. The software test is available from ODVA. It is a Windows-based tool, running on a PC with a standard Ethernet card. It is recommended that device developers run this test in their own labs before taking devices to the official ODVA test. When a device passes the test, it is said to be EtherNet/IP CONFORMANCE TESTED™.* Devices that have passed conformance testing are published on the ODVA Web site. 14.3.3.14 Requirements for TCP/IP Support In addition to the various requirements set forth in the EtherNet/IP Specification, all EtherNet/IP hosts are required to have a minimally functional TCP/IP suite and transport mechanism. The minimum host requirements for EtherNet/IP hosts shall be those covered in RFC 1122 [36], RFC 1123 [37], and RFC 1127 [38] and the subsequent documents that may supersede them. Whenever a feature or protocol is implemented by an EtherNet/IP host, that feature shall be implemented in accordance with the appropriate RFC documents, regardless of whether the feature or protocol is considered required or optional by this specification. The Internet and RFCs are dynamic. There will be changes to the RFCs and to the requirements included in this section as the Internet and this specification evolve, and these changes will not always provide for backward compatibility. All EtherNet/IP devices shall at a minimum support:
• Internet Protocol (IP version 4) (RFC 791 [29])
• User Datagram Protocol (UDP) (RFC 768 [28])
• Transmission Control Protocol (TCP) (RFC 793 [31])
• Address Resolution Protocol (ARP) (RFC 826 [32])
• Internet Control Messaging Protocol (ICMP) (RFC 792 [30])
• Internet Group Management Protocol (IGMP) (RFC 1112 [35] and RFC 2236 [41])
• IEEE 802.3 (Ethernet) as defined in RFC 894 [33]
*EtherNet/IP CONFORMANCE TESTED™ is a certification mark of ODVA.
FIGURE 14.49 Relationship of CIP to other typical Ethernet protocols.
Although the encapsulation protocol is suitable for use on other networks besides Ethernet that support TCP/IP, and products may be implemented on these other networks, conformance testing of EtherNet/ IP products is limited to those products on Ethernet. Other suitable networks include: • Point-to-Point Protocol (PPP) (RFC 1171 [39]) • ARCNET (RFC 1201 [40]) • Fiber Distributed Data Interface (FDDI) (RFC 1103 [34]) 14.3.3.15 Coexistence of EtherNet/IP and Other Ethernet-Based Protocols EtherNet/IP devices are encouraged but not required to support other Internet protocols and applications not specified in the EtherNet/IP Specification. For example, they may support Hypertext Transfer Protocol (HTTP), Telnet, File Transfer Protocol (FTP), etc. The EtherNet/IP Specification makes no requirements with regards to these protocols and applications. Figure 14.49 shows the relationship between CIP and other typical Ethernet-based protocol stacks. Since EtherNet/IP, like many other popular protocols, is based on TCP/IP and UDP/IP, coexistence with many other services and protocols is no problem at all and CIP blends nicely into the set of already existing functions. This means that anybody already using some or all of these popular Ethernet services can add CIP without too much of a burden; the existing services like HTTP or FTP may remain as before, and CIP will become another service on the process layer. 14.3.3.16 Ethernet Infrastructure To successfully apply EtherNet/IP to the automation world, the issue of determinism has to be considered. The inherent principle of the Ethernet bus access mechanism whereby collisions are detected and nodes back off and try again after a while cannot guarantee determinism. While Ethernet in its present form cannot be made strictly deterministic, there are ways to improve this situation. First, the hubs typically used in many office environments have to be replaced by the more intelligent switches that will forward only those Ethernet frames that are intended for nodes connected to this switch. With the use of full-duplex switch technology, collisions are completely avoided; instead of colliding, multiple messages sent to the same node at the same time are queued up inside the switch and are then delivered one after another. As already mentioned in Section 14.3.3.8.2, it is highly recommended that switches that support IGMP Snooping be used.
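The practical effect of IGMP Snooping is easiest to see from the consumer side: a device, or a listening tool, only receives multicast I/O data after it has joined the corresponding multicast group, and the membership reports generated by that join are exactly what a snooping switch uses to build its forwarding tables. The sketch below shows a generic UDP consumer joining a group; the group address and the UDP port number are illustrative assumptions and are not taken from the EtherNet/IP Specification.

import socket
import struct

GROUP = "239.192.1.2"   # assumed multicast group chosen by the connection originator
PORT = 2222             # assumed UDP port used for the I/O traffic

# Open a UDP socket and subscribe to the multicast group. The IGMP membership
# report triggered by IP_ADD_MEMBERSHIP is what an IGMP-snooping switch uses to
# decide which of its ports should receive the group's frames.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))
membership = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)

while True:
    data, sender = sock.recvfrom(1500)
    # Each datagram carries the Common Packet Format of Figure 14.47: a sequenced
    # address item followed by the connected data item with the I/O data.
    print(f"{len(data)} bytes of I/O data from {sender[0]}")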
If EtherNet/IP networks are to be connected to a general company network, then this should always be done through a router. The router keeps the UDP multicast messages from propagating into the company network and makes sure that broadcast or multicast office traffic does not congest the control network. Even though the router separates the two worlds, it can be set up to allow the TCP/IP-based Explicit Messages to pass through so that a configuration tool sitting in a PC in the office environment may very well be capable of monitoring and configuring devices on the control network. 14.3.3.17 Tools Tools for EtherNet/IP networks can be divided into four groups: • Physical layer tools: Tools (hardware and software) that verify the integrity and conformance of the physical layer or monitor the quality of the data transmission. • Commissioning tools: All EtherNet/IP devices need an IP address. In some cases, the setting of this address can only be achieved through the Ethernet link (see Section 14.3.3.7). In these cases, a BOOTP/DHCP server tool is required such as the free BOOTP/DHCP routine downloadable from the Rockwell Automation Web site. • Configuration tools: Software tools that are capable of communicating with individual devices for data monitoring and configuration purposes. Most configuration tools are EDS based; however, more complex devices like scanners tend to have their own configuration applets that are only partially based on EDSs. Some of these tools support multiple access paths to the network, e.g., via suitable bridging devices. High-level tools also actively query the devices on the network to identify them and monitor their health. Configuration tools may also be integrated into other packages like PLC programming software. • Monitoring tools: Typically PC-based software packages (called sniffers) that can capture and display the Ethernet frames on the network. A raw Ethernet frame display may be good enough for some top experts, but it is recommended that a tool that allows both raw Ethernet frame display and multiple levels of frame interpretation (IP, TCP/UDP, EtherNet/IP header interpretation) be used. Due to the popularity of Ethernet, a large number of sniffers are available, but not all of them support EtherNet/IP decoding. For a typical installation, a commissioning tool and a configuration tool are all that is needed. Protocol monitoring tools are mainly used to investigate interoperability problems and to assist during the development process. Turn to the EtherNet/IP product catalog on the ODVA Web site to access a list of vendors that provide tools for EtherNet/IP. 14.3.3.18 Advice for Developers Before any development of an EtherNet/IP product is started, the following issues should be considered in detail: • What functionality (device class; see Section 14.3.3.10) does the product require today and in future applications? • Messaging server only • Adapter functionality • Messaging client • Scanner functionality • What are the physical layer requirements? Is IP 65/67 required or is IP 20 good enough? • Will the development be based on commercially available hardware components and software packages (recommended) or designed from scratch (possible but costly)? • What are the configuration requirements? • What design and verification tools should be used?
• What is an absolute must before products can be placed on the market (own the specification, have a Vendor ID, have the product conformance tested)? Ethernet chip sets and associated base software packages are available from a large number of vendors on the market. For support of the EtherNet/IP part of the development, turn to the ODVA Web site for a list of companies that can support EtherNet/IP developments. 14.3.3.19 EtherNet/IP Overall Summary Since its introduction in 2000, EtherNet/IP has shown remarkable growth in many applications that used to be done with traditional fieldbuses. This success is largely attributed to the fact that this TCP/UDP/ IP-based Ethernet system has introduced real-time behavior into the Ethernet domain without giving up any of its highly appreciated features such as company-wide access with standard and specialized tools through corporate networks. The specific strength of EtherNet/IP is the fact that it does not require a modified or highly segregated network; standard switches and routers as known in the office world can be used without modification. At the same time, this means that all existing transport-level or TCP/UDP/IP-level protocols can continue to be used without any need for special bridging devices. The substantially improved real-time behavior of CIP Sync and the introduction of CIP Safety will soon allow EtherNet/IP to be used in applications that today need a set of several dedicated fieldbuses. Finally, its use of CIP and object structure allows the blending of EtherNet/IP networks into an overall CIP network structure that allows seamless communication, just as if it was only one network.
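As an illustration of how the commissioning and monitoring tools of Section 14.3.3.17 can actively query a network to identify the devices on it, the sketch below broadcasts an encapsulation-level identity request and collects the replies. The ListIdentity command code and the registered EtherNet/IP UDP port are not stated in the text above and are used here as assumptions; real tools also parse the returned Identity data and usually repeat the query on every attached interface.

import socket
import struct

LIST_IDENTITY = 0x0063   # assumed encapsulation command code for device discovery
ENIP_PORT = 44818        # assumed registered EtherNet/IP encapsulation port

# A 24-byte encapsulation header with no attached data: command, length,
# session handle, status, sender context, options.
request = struct.pack("<HHII8sI", LIST_IDENTITY, 0, 0, 0, b"\x00" * 8, 0)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.settimeout(2.0)
sock.sendto(request, ("255.255.255.255", ENIP_PORT))

devices = []
try:
    while True:
        reply, sender = sock.recvfrom(4096)   # reply body carries the Identity data
        devices.append(sender[0])
except socket.timeout:
    pass

print("EtherNet/IP devices found:", devices)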
14.4 Benefits of the CIP Family The benefits of the CIP family can be subdivided into two groups: • Benefits for the manufacturer of devices • Benefits for the user of devices and systems
14.4.1 Benefits for the Manufacturer of Devices Major benefits for manufacturers come from the fact that existing knowledge can be reused from one protocol to another. This results in lower training costs for development, sales, and support personnel. Reduced development costs can be achieved since certain parts (e.g., parameters, profiles) of the embedded firmware can be reused from one network to another since they are identical. As long as these parts are written in a high-level language, the adaptation is simply a matter of running the right compiler for the new system. Another very important advantage for manufacturers is the easy routing of messages from one system to another. Any routing device can be designed very easily, since there is no need to invent a translation from one system to another; both systems already speak the same language. Manufacturers also benefit from dealing with the same organizations for support and conformance testing.
14.4.2 Benefits for the Users of Devices and Systems Major benefits for users come from the fact that existing knowledge can be reused from one protocol to another, e.g., Device Profiles and the behavior of devices are identical from one system to another. This results in lower training costs. Technical personnel and users do not have to make very large changes to adapt an application from one type of CIP network to another. The system integrator can choose the CIP network that is best suited to his application without having to sacrifice functionality. A further, very important benefit comes from the ease of bridging and routing within the CIP family. Moving information between noncompatible fieldbuses is always difficult and cumbersome, since it is almost impossible to translate functionality from one fieldbus to another. This is where the full benefits
of CIP can be reaped. Forwarding of data and messages from top to bottom and back again is very easy to implement and uses very little system overhead. There is no need to translate from one data structure to another — they are the same. Services and status codes share the same benefit: these, too, are identical over all CIP networks. Finally, creating a message that runs through multiple hops of CIP networks is simply a matter of inserting the full path from the originating to the target device. Not a single line of code or any other configuration is required in the routing devices. This results in fast and efficient services that are easy to create and maintain. Even though these networks may be used in different parts of the application, messaging from one end to another really works as if there is only one network. Finally, the very efficient Producer/Consumer mechanisms used in all CIP networks result in very fast and efficient use of the transmission bandwidth, with the result that system performance is often much higher than that with other fieldbuses running at higher raw baud rates. Only the data that are really important will be transmitted, instead of repeating old data again and again. Planned and future protocol extension will always be integrated in a manner that allows coexistence of normal devices with enhanced devices like those supporting CIP Sync and CIP Safety. Therefore, no strict segmentation into Standard, CIP Sync, and CIP Safety networks is required unless there is a compelling reason, e.g., unacceptably high response time due to high bus load.
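The "full path from the originating to the target device" mentioned above is expressed in CIP as a string of port segments, each naming the port through which to leave a routing device and the link address of the next device on that port. The sketch below shows the idea for the compact segment form only; the byte layout is reproduced from commonly published CIP material rather than quoted from the specification, so treat it as an assumption.

def port_segment(port, link_address):
    # One hop: leave the current device through `port` and address `link_address`
    # on that link. Only the compact form (port numbers 1-14, one-byte link
    # address) is covered in this sketch.
    if not 1 <= port <= 14:
        raise ValueError("extended port numbers are outside this sketch")
    return bytes([port, link_address])

def route(*hops):
    # Concatenate the hops into the path carried by the routed request.
    return b"".join(port_segment(p, a) for p, a in hops)

# Example: leave the originator through backplane port 1 to the bridge in slot 4,
# then through that bridge's port 2 to the target at link address 17.
path = route((1, 4), (2, 17))   # -> b'\x01\x04\x02\x11'

Because each routing device simply consumes the first remaining segment and forwards the rest, no translation tables or per-route configuration are needed along the way, which is exactly the point made above.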
14.5 Protocol Extensions under Development 14.5.1 CIP Sync 14.5.1.1 General Considerations CIP networks as described in [3], [4], [5], and [6] have a real-time behavior that is appropriate for many applications, but there are a growing number of applications that require much tighter control of certain real-time parameters. Let us have a look at some of them: • Real time: This term is being used in a large number of differing meanings in various contexts. For further use in this section, the following definition is used: A system exhibits real-time behavior when it can react to an external stimulus within a predetermined time. How short or how long this time is depends on the application. Demanding industrial control applications require reactions in the millisecond range, while some process control applications can often live with reaction times of several seconds or more. • Determinism: A deterministic system allows worst-case determination (not a prediction or a probability) of when a certain action takes place. Industrial communication systems may offer determinism to a higher or lesser degree depending on how they are implemented and used. Networks featuring message transmission at a predetermined point in time, such as ControlNet, SERCOS interface, and Interbus-S, are often said to offer absolute determinism. On the other hand, networks such as Ethernet may become undeterministic under certain load conditions, specifically when deployed in half-duplex mode with hubs. However, when Ethernet is deployed with full-duplex high-speed switches, it operates in a highly deterministic manner (see Section 14.3.3.16). • Reaction time: In an industrial control system, the overall system reaction time is what determines the real-time behavior. The communication system is only one of several contributing factors to the overall reaction time. In general, it is the time from an input stimulus to a related output action. • Jitter: The term jitter is used to define the time deviation of a certain event from its average occurrence. Some communication systems rely on a very small message jitter, while most applications only require that a certain jitter is not exceeded for actions at the borders of the system, such as input sampling jitter and output action jitter. • Synchronicity: Distributed systems often require certain actions to take place in a coordinated fashion; i.e., these actions must take place at a predetermined moment in time independent of
where the action is to take place. A typical application is coordinated motion or electronic gearing. Some of these applications require a synchronicity in the microsecond range. • Data throughput: This is the capability of a system to process a certain amount of data within a certain time span. For communication systems, protocol efficiency, the communication model (e.g., Producer/Consumer), and endpoint processing power are most important, while the wire speed only sets the limit of how much raw data can be transmitted across the physical media. CIP Sync is a CIP-based communication principle that enables synchronous low-jitter system reactions without the need for low-jitter data transmission. This is of great importance in systems that do not provide absolute deterministic data transmission or where it is desirable for a variety of higher-layer protocols to run in parallel to the application system protocol. The latter situation is characteristic for Ethernet. Most users of TCP/IP-based Ethernet want to keep using it as before without the need to resort to a highly segregated network segment to run the real-time protocol. The CIP Sync communication principle meets these requirements. 14.5.1.2 Using IEEE 1588 Clock Synchronization The recently published IEEE standard 1588 — Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems [24] — lays the foundation for a precise synchronization of real-time clock in a distributed system. An IEEE 1588 system consists of a Time Master that distributes its system time to Time Slaves in a tree-like structure. The Time Master may be synchronized with another real-time clock higher up in the hierarchy, while the Time Slaves may be Time Masters for other devices below them. A Time Slave that is Time Master to another set of devices (typically in another part of the system) is also called a Boundary Clock. The time distribution is done by multicasting a message with the actual time of the master clock. This message originates in a relatively high layer of the communication stack, and therefore, the actual transmission takes place at a slightly later point in time. Also, there will be a variation of the stack processing time from one message to another. To compensate this delay and its jitter, the actual transmission time can be captured in a lower layer of the communication stack, such as noting the “transmit complete” feedback from the communication chip. This update time capture is then distributed in a follow-up message. The average transmission delay is also determined so that the time offset between master and slave clock can also be compensated. This protocol has been fully defined for Ethernet UDP/IP systems, and the protocol details for further industrial communication systems are to follow. The clock synchronization accuracy that can be achieved with this system largely depends on the precision time capture of the master clock broadcast message. Hardware-assisted time capture systems can reach a synchronization accuracy of 250 ns or less. It is expected that Ethernet chip manufacturers will offer integrated IEEE 1588 hardware support in the very near future. 14.5.1.3 Additional Object CIP Sync will require the addition of a new time synchronization object. This object manages the realtime clock inside a CIP Sync device and provides access to the IEEE 1588 timing information. Figure 14.50 shows the relationship of the additional object required for CIP Sync. 
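The arithmetic behind the master-to-slave synchronization just described is compact enough to show directly. A slave records when a Sync message arrives, learns from the follow-up message when that Sync actually left the master, and combines this with a measured path delay to correct its own clock. The sketch uses the commonly described IEEE 1588 relations; the variable names and the example numbers are illustrative and are not taken from the standard.

def mean_path_delay(t1, t2, t3, t4):
    # t1: master sends Sync, t2: slave receives it,
    # t3: slave sends a delay request, t4: master receives it.
    return ((t2 - t1) + (t4 - t3)) / 2.0

def offset_from_master(t1, t2, path_delay):
    # Positive result: the slave clock is ahead of the master clock.
    return (t2 - t1) - path_delay

# Example with microsecond time stamps. The follow-up message delivers the
# hardware-captured value of t1 that the Sync message itself could only estimate.
t1, t2 = 1_000_000.0, 1_000_130.0   # Sync sent (master time) / received (slave time)
t3, t4 = 1_000_500.0, 1_000_380.0   # delay request sent (slave) / received (master)
delay = mean_path_delay(t1, t2, t3, t4)          # 5.0 microseconds
correction = offset_from_master(t1, t2, delay)   # 125.0 microseconds: slave steps back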
14.5.1.4 CIP Sync Communication Principles Real-time clocks coordinated through the IEEE 1588 protocol on their own do not constitute a real-time system yet. Additional details to show how time stamping is used for input sampling and for the coordination of output actions will be added. Some Device Profiles will be extended as well to incorporate time information in their I/O Assemblies. Details of this activity are under discussion in the ODVA Distributed Motion Control JSIG. 14.5.1.5 Message Prioritization Combining these three elements (Sections 14.5.1.2, 14.5.1.3, and 14.5.1.4) with collision-free infrastructure (see Section 14.3.3.16) is sufficient to build a real-time system. However, it is necessary to consider all traffic within the system and arrange all application messages containing time-critical data in such a
FIGURE 14.50 CIP extensions required for CIP Sync (ISO/OSI view: the Device Profiles and the CIP application layer, including the new CIP Sync Object, sit above CIP data management, message routing, and connection management, which in turn run over the DeviceNet, ControlNet, or EtherNet/IP encapsulation/TCP/UDP/IP transports and their respective data link and physical layers; possible future alternatives include ATM, USB, and FireWire).
way that they are guaranteed to arrive at all consumers in time. When other Ethernet protocols, such as HTTP or FTP with possibly very long frames, need to coexist in the same system, the situation may need careful configuration. Ethernet frames with up to 1500 bytes of payload (approximately 122 µs long in a 100 Mbit/second system) can easily congest the system and delay important messages by an undetermined amount of time, possibly too long for correct functioning of the system. This is where message prioritization becomes an important element. Of the many prioritization schemes in use or proposed for Ethernet today, EtherNet/IP uses message prioritization according to IEEE 802.3:2002 [13]. This is a scheme supported by many switches available today. It allows preferential treatment of Ethernet frames in such a way that those frames with the highest priority will jump the message queues in a switch and will get transmitted first. Messages with high priority will get transmitted, while those with lower priority typically have to wait. Suitable priority assignment for all time-critical messages then guarantees their preferential treatment. Standard EtherNet/IP and other Ethernet messages will get low or no priority and thus have to wait until all higher-priority messages have passed. Once this prioritization scheme is implemented, one full-length frame can be tolerated within every communication cycle consisting of a set of prioritized input (port A through port E) and output (port F) messages. Figure 14.51 illustrates this process. 14.5.1.6 Applications of CIP Sync Typical applications for CIP Sync are time-stamping sensor inputs, distributed time-triggered outputs, and distributed motion such as electronic gearing or camming applications. For example, in motion applications, the sensors sample their actual position at a predetermined time, i.e., in a highly synchronized way, and transmit them to the application master that coordinates the motion. The application master then calculates the new reference values and sends them to the motion drives. Using CIP Sync, it is no longer necessary to have extremely low jitter in the communication system; it is sufficient to transmit all time-critical messages in time, and their exact arrival time becomes irrelevant. The assignment of suitable priorities to CIP Sync communication guarantees that all time-critical messages always get the bandwidth they need and all other traffic is automatically limited to the remaining bandwidth.
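The figure of approximately 122 µs quoted above for a maximum-length frame follows from a few lines of arithmetic, which also shows why one low-priority frame already on the wire bounds the additional delay a prioritized frame can experience. The per-frame overhead values used below are standard Ethernet figures (preamble, header, frame check sequence, and optionally the interframe gap), not numbers taken from the text.

LINK_BIT_RATE = 100e6   # 100 Mbit/s

def frame_time_us(payload_bytes, include_gap=False):
    # Wire time of one Ethernet frame; optionally add the 12-byte interframe gap.
    preamble, header, fcs = 8, 14, 4    # standard Ethernet overhead in bytes
    total = preamble + header + payload_bytes + fcs + (12 if include_gap else 0)
    return total * 8 / LINK_BIT_RATE * 1e6

print(round(frame_time_us(1500), 1))                  # 122.1, the ~122 us quoted above
print(round(frame_time_us(64, include_gap=True), 1))  # 8.2 for a short prioritized frame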
FIGURE 14.51 Ethernet frame prioritization (CIP frames with priority arriving at switch ports A through E are forwarded ahead of an unprioritized Ethernet frame on output port F; the numbers inside the frames indicate their relative arrival time at the switch ports).
As a result of these measures, CIP Sync devices can coexist side by side with other EtherNet/IP devices without any need for network segmentation or special hardware. Even non-EtherNet/IP devices — provided they do not override any of the CIP Sync prioritizations — can be connected without any loss of performance in the CIP Sync application. 14.5.1.7 Expected Performance of CIP Sync Systems As already mentioned, CIP Sync systems can be built to maintain a synchronization accuracy of better than 250 ns, in many cases without the use of Boundary Clocks. The communication cycle and thus the reaction delay to unexpected events is largely governed by the number of CIP Sync devices in a system. Allowing some bandwidth (approximately 40%) for non-CIP Sync messages, as described in Section 14.5.1.5, the theoretical limit (close to 100% wire load) for the communication cycle of a CIP Sync system based on a 100 Mbit/s Ethernet link is around 500 µs for 30 coordinated motion axes, with 32 bytes of data each. 14.5.1.8 CIP Sync Summary CIP Sync based on EtherNet/IP is a natural extension of the EtherNet/IP system into the very fast real-time domain. In contrast to many other proposed or existing real-time extensions, it does not require any strict network segmentation between high-performance real-time sections and other parts of the communication system. This results in truly open systems that can tolerate the vast majority of parallel TCP/IP-based protocols found in today’s industrial communication architecture without compromising performance. In a first phase, the CIP Sync principles will be applied to EtherNet/IP, while an extension to the other CIP implementations will follow at a later time.
14.5.2 CIP Safety CIP Safety, like other safety protocols based on industry standard networks, adds additional services to transport data with high integrity. Unlike other networks, the user of CIP Safety does not have to change his approach when going from one network or media to another. CIP Safety presents a scalable, networkindependent approach to safety network design, where the safety services are described in a well-defined layer. This approach also enables the routing of safety data, allowing the user to create end-to-end safety chains across multiple links without being forced to difficult-to-manage gateways. 14.5.2.1 General Considerations In the past and still today, hardwired safety systems employed safety relays that are interconnected to provide a safety function. Hardwired systems are difficult to develop and maintain for all but the most trivial applications. Furthermore, these systems place significant restrictions in the distance between devices.
FIGURE 14.52 CIP communications layers, including safety.
Because of these issues, as well as distance and cost considerations, it is desirable to allow safety services to be implemented on standard communication networks. The key to the development of safety networks was not to create a network that could not fail, but to create a system where failures in the network would cause safety devices to go to a known state. If the user knew which state the system would go to, he could make his application safe. But this meant that significantly more checking and redundant coding information would be required. To determine the additional safety requirements, an existing railway standard [20] was used and later extended by the German Safety Bus committee [21]. This committee provided design guidelines to safety network developers to allow their networks and safety devices to be certified according to IEC 61508 [15]. Using these results, the Common Industrial Protocol, which allows network-independent routing of standard data, was extended to allow high-integrity safety services. The result is a scalable, routable, network-independent safety layer, thus removing the requirement for dedicated safety gateways. Since all safety devices execute the same protocol, independent of which media they reside on, the user approach is consistent and independent of media or network used. CIP Safety is an extension to standard CIP that has been approved by TÜV Rheinland for use in IEC 61508 SIL 3 and EN 954-1 category 4 applications. It extends the model by adding CIP Safety application layer functionality, as shown in Figure 14.52. The additions include several safety-related objects and Safety Device Profiles. Because the safety application layer extensions do not rely on the integrity (see Section 14.5.2.3) of the underlying standard CIP as described in Section 14.2 and data link layers as described in Sections 14.3.1, 14.3.2, and 14.3.3, single-channel (nonredundant) hardware can be used for the data link communication interface. This same partitioning of functionality allows standard routers to be used to route safety data, as shown in Figure 14.53. The routing of safety messages is possible, because the end device is responsible for ensuring the integrity of the data. If an error occurs in the transmission of data or in the intermediate router, the end device will detect the failure and take an appropriate action. This routing capability allows the creation of DeviceNet Safety cells with quick reaction times to be interconnected with other cells via a backbone network such as EtherNet/IP for interlocking, as shown in Figure 14.54. Only the safety data that are needed are routed to the required cell, which reduces the
FIGURE 14.53 Routing of safety data (CIP Safety application objects and Safety Connections communicating across CIP routing between DeviceNet and EtherNet/IP transport and data link layers).
FIGURE 14.54 Network routing (Safety PLCs on an EtherNet/IP Safety backbone connected through routers to DeviceNet Safety cells 1, 2, and 3, each with safety inputs and outputs).
individual bandwidth requirements. The combination of fast responding local safety cells and the intercell routing of safety data allows users to create large safety applications with fast response times. Another benefit is the ability to multicast safety messages across multiple networks. 14.5.2.2 Implementation of Safety As indicated in Figure 14.52, all CIP Safety devices also have an underlying standard CIP functionality. The extension to the CIP Safety application layer is specified using a Safety Validator Object. This object is responsible for managing the CIP Safety Connections (standard CIP Connections are managed through communication objects) and serves as the interface between the safety application objects and the link layer connections, as shown in Figure 14.55. The Safety Validator ensures the integrity of the safety data transfers by applying the integrity-ensuring measures described in Section 14.5.2.3. • The producing safety application uses an instance of a Client Validator to produce safety data and ensure time coordination. • The client uses a link data producer to transmit the data and a link consumer to receive time coordination messages. • The consuming safety application uses a Server Validator to receive and check data. • The server uses a link consumer to receive data and a link producer to transmit time coordination messages.
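The division of labor described in the bullets above can be summarized in a small structural sketch: the two validators own the safety function, while the link producers and consumers they use are ordinary, non-safety transport objects. The class and attribute names below are invented for illustration and do not come from the CIP Safety object definitions.

class LinkProducer:
    """Plain link-layer producer: carries bytes and has no safety function."""
    def send(self, payload):
        pass  # hand the bytes to the DeviceNet, ControlNet, or EtherNet/IP link

class LinkConsumer:
    """Plain link-layer consumer: delivers bytes and has no safety function."""
    def receive(self):
        return b""

class ClientValidator:
    """Producing side: builds high-integrity safety productions (Section 14.5.2.3)
    and checks the time coordination messages returned by the consumers."""
    def __init__(self):
        self.data_producer = LinkProducer()                # transmits safety data
        self.time_coordination_consumer = LinkConsumer()   # receives time coordination

class ServerValidator:
    """Consuming side: receives and checks safety data; any detected error drives
    the connection to the safe state."""
    def __init__(self):
        self.data_consumer = LinkConsumer()                # receives safety data
        self.time_coordination_producer = LinkProducer()   # returns time coordination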
FIGURE 14.55 Relationship of Safety Validators.
The link producers and consumers have no knowledge of the safety packet and fulfill no safety function. The responsibility for high-integrity transfer and checking of safety data lies within the Safety Validators. 14.5.2.3 Ensuring Integrity CIP Safety does not prevent communication errors from occurring, but it ensures transmission integrity by detecting errors and allowing devices to take appropriate actions. The Safety Validator is responsible for detecting these communication errors. The nine communication errors that must be detected are shown in Figure 14.56 along with the five measures CIP Safety uses to detect these errors, based on reference [21].
FIGURE 14.56 Error detection measures. The matrix maps the communication errors (message repetition, message loss, message insertion, incorrect sequence, message corruption, message delay, increased age of data in a bridge, coupling of safety and safety data, and coupling of safety and standard data) against the measures used to detect them (time expectation via time stamp, ID for send and receive, Safety CRC, redundancy with cross-checking, and diverse measures); the Safety CRC provides additional protection for communication errors in fragmented messages.
FIGURE 14.57 Time stamp (ping-based time coordination: the producer sends a ping at producer count 5 and receives the consumer count 92 in the response, stores the offset 92 − 5 = 87, and stamps later productions with producer count plus offset, e.g., 8 + 87 = 95 and 14 + 87 = 101; the consumer computes the data age as its own count minus the received time stamp, e.g., 98 − 95 = 3 and 104 − 101 = 3).
14.5.2.3.1 Time Expectation via a Time Stamp All CIP Safety data are produced with a time stamp, which allows Safety Consumers to determine the age of the produced data. This detection measure is superior to the more conventional reception timers. Reception timers can tell how much time has elapsed since a message was last received, but they do not convey any information about the actual age of the data. A time stamp allows transmission, media access/arbitration, queuing, retry, and routing delays to be detected. Time is coordinated between producers and consumers using ping requests and ping responses, as shown in Figure 14.57. After a connection is established, the producer will produce a ping request, which causes the consumer to respond with its consumer time. The producer will note the difference between its own time at the ping production and the consumer time returned in the ping response and store this as an offset value. The producer will add this offset value to its producer time for all subsequent data transmissions. This value is transmitted as the time stamp. When the consumer receives a data message, it subtracts the time stamp from its internal clock to determine the data age. If the data age is less than the maximum age allowed, the data are applied; otherwise, the connection goes to the safety state. The device application is notified so that the connection safety state can be appropriately reflected. The ping request-and-response sequence is repeated periodically to correct for any drift in the producer or consumer time bases. 14.5.2.3.2 Production Identifier A Production Identifier (PID) is encoded in each data production of a Safety Connection to ensure that each received message arrives at the correct consumer. The PID is derived from an electronic key, the device Serial Number, and the CIP Connection Serial Number. Any safety device inadvertently receiving a message with the incorrect PID will go to a safety state. Any safety device that does not receive a message within the expected time interval with the correct PID will also go to a safety state. This measure ensures that messages are routed correctly in multilink applications.
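The ping, offset, and age mechanism just described reduces to a few lines of arithmetic on each side of the connection. The sketch below mirrors the numerical example of Figure 14.57 (offset 92 − 5 = 87, maximum allowed age of 3 ticks); the function names are illustrative, and the counts are abstract clock ticks rather than any particular time unit.

SAFE_STATE = "SAFE"

def producer_offset(producer_time_at_ping, consumer_time_in_response):
    # Stored once per ping exchange: maps producer time onto the consumer time base.
    return consumer_time_in_response - producer_time_at_ping

def time_stamp(producer_time, offset):
    # Every data production carries the producer time expressed in consumer ticks.
    return producer_time + offset

def check_age(consumer_time, received_time_stamp, max_age):
    # Consumer side: apply the data only if it is fresh enough.
    age = consumer_time - received_time_stamp
    return "APPLY" if age <= max_age else SAFE_STATE

# Values from Figure 14.57: ping sent at producer count 5, answered with consumer count 92.
offset = producer_offset(5, 92)           # 87
stamp = time_stamp(8, offset)             # 95
print(check_age(98, stamp, max_age=3))    # age 3  -> APPLY
print(check_age(105, stamp, max_age=3))   # age 10 -> SAFE (connection goes to the safety state)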
14.5.2.3.3 Safety CRC All safety transfers on CIP Safety use Safety CRCs to ensure the integrity of the transfer of information. The Safety CRCs serve as the primary measure to detect possible corruption of transmitted data. They provide detection up to a Hamming distance of 4 for each data transfer section, though the overall Hamming distance coverage is greater for the complete transfer due to the redundancy of the protocol. The Safety CRCs are generated in the Safety Producers and checked in the Safety Consumers. Intermediate routing devices do not examine the Safety CRCs. Thus, by employing end-to-end Safety CRCs, the individual data link CRCs are not part of the safety function. This eliminates certification requirements for intermediate devices and helps to ensure that the safety protocol is independent of the network technology. The Safety CRC also provides a strong protection mechanism that allows underlying data link errors such as bit stuffing or fragmentation errors to be detected. The individual link CRCs are not relied on for safety, but they are still enabled. This provides an additional level of protection and noise immunity, by allowing data retransmission for transient errors at the local link. 14.5.2.3.4 Redundancy and Cross-Check Data and CRC redundancy with cross-checking provides an additional measure of protection by detecting possible corruption of transmitted data. They effectively increase the Hamming distance of the protocol. These measures allow long safety data packets, up to 250 bytes, to be sent with high integrity. For short packets of 2 bytes or less, data redundancy is not required; however, redundant CRCs are cross-checked to ensure integrity. 14.5.2.3.5 Diverse Measures for Safety and Standard The CIP Safety protocol is present only in safety devices; this prevents standard devices from masquerading as a safety device. 14.5.2.4 Safety Connections CIP Safety provides two types of Safety Connections: • Unicast • Multicast A unicast connection, as shown in Figure 14.58, allows a Safety Validator client to be connected to a Safety Validator server using two link layer connections.
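A simplified picture of the cross-checked redundancy of Sections 14.5.2.3.3 and 14.5.2.3.4 is sketched below. Python's generic CRC-32 stands in for the Safety CRCs (the real polynomials and packet layout are defined in the CIP Safety specification and are not reproduced here), and the bit-inverted copy is just one plausible way to realize data redundancy; both are assumptions made for illustration. The point is that the consumer re-derives and cross-checks every element and refuses the packet if anything disagrees.

import zlib

def crc(data):
    # Stand-in integrity check; the real Safety CRCs use different polynomials.
    return zlib.crc32(data) & 0xFFFFFFFF

def complement(data):
    return bytes(b ^ 0xFF for b in data)

def produce(data):
    # Producer side: send the data, a redundant (here: complemented) copy, and both CRCs.
    copy = complement(data)
    return data, crc(data), copy, crc(copy)

def consume(packet):
    # Consumer side: cross-check everything before the data is used.
    data, data_crc, copy, copy_crc = packet
    ok = crc(data) == data_crc and crc(copy) == copy_crc and complement(data) == copy
    return data if ok else None   # None -> the connection goes to the safe state

packet = produce(b"\x01\x00\x02\x00")            # four bytes of safety I/O data
assert consume(packet) == b"\x01\x00\x02\x00"
corrupted = (b"\x03\x00\x02\x00",) + packet[1:]  # a single flipped bit in the data section
assert consume(corrupted) is None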
FIGURE 14.58 Unicast connection.
FIGURE 14.59 Multicast connection.
A multicast connection, as shown in Figure 14.59, allows up to 15 Safety Validator servers to consume safety data from a Safety Validator client. When the first Safety Validator server establishes a connection with a Safety Validator client, a pair of link layer connections is established, one for data-and-time correction and one for time coordination. Each new Safety Validator server will use the existing data-and-time correction connection and establish a new time coordination connection with the Safety Validator client. To optimize the throughput on DeviceNet, three data link connections are used for each multicast connection, as shown in Figure 14.60. The data-and-time correction messages are sent on separate connections. This allows short messages to be transmitted on DeviceNet within a single CAN frame and reduces the overall bandwidth, since the time correction and time coordination messages are sent at a much slower periodic interval. When multicast messages are routed off link, the router combines the data-and-time correction messages from DeviceNet and separates them when messages reach DeviceNet. Since the safety message contents are unchanged, the router provides no safety function.
FIGURE 14.60 Multicast connection on DeviceNet.
14.5.2.5 Message Packet Sections CIP Safety has four message sections:
• Data section
• Time-stamp section
• Time correction section
• Time coordination section
The description of these formats would go beyond the scope of this handbook. Reference [23] provides further details. 14.5.2.6 Configuration Before safety devices can be used in a safety system, they must first be configured and connections must be established. The process of configuration requires configuration data from a configuration tool to be placed in a safety device. There are two possible sequences for configuration:
FIGURE 14.61 Configuration transfers ((1) download from the configuration tool to an originator, (2) download from the tool directly to the target, (3) originator-to-target download, (4) SafetyOpen configuration).
• Configuration tool directly to device
• Via an intermediate device
In the configuration tool-to-device case, as shown in Figure 14.61, the configuration tool writes directly to the device to be configured (1 and 2). In the case of intermediate device configuration, the tool first writes to an originator (1) and the originator writes to the target using an originator-to-target download (3) or a Safety_Open service (4). The Safety_Open service (4) is unique in that it allows a Safety Connection to be established at the same time that a device is configured. 14.5.2.7 Connection Establishment CIP provides a connection establishment mechanism, using a Forward_Open service that allows producer-to-consumer connections to be established locally or across multiple links via intermediate routers. An extension of the Forward_Open, called the Safety_Open service, has been created to allow the same multilink connections for safety. There are two types of Safety_Open requests:
• Type 1: With configuration
• Type 2: Without configuration
With the Type 1 Safety_Open request, configuration and connections are established at the same time. This allows rapid configuration of devices with simple and relatively small configuration data. With the Type 2 Safety_Open request, the safety device must first be configured, and the Safety_Open request then establishes a Safety Connection. This separation of configuration and connection establishment allows the configuration of devices with large and complex configuration data. In both cases, the Safety_Open request establishes all underlying link layer connections — across the local link as well as any intermediate links and routers. 14.5.2.8 Configuration Implementation CIP Safety provides the following protection measures to ensure the integrity of configuration:
• Safety Network Number
• Password protection
• Configuration ownership
• Configuration locking
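Pulling together the Safety_Open service of Section 14.5.2.7 and the protection measures just listed, the originator's choice between the two request types comes down to whether configuration data travel with the connection request. The structure below is invented for illustration; the real Safety_Open parameters, including the size of the Safety Network Number, are defined in the CIP Safety specification.

from dataclasses import dataclass
from typing import Optional

@dataclass
class SafetyOpenRequest:
    target_path: bytes               # full route to the target device (Section 14.4.2)
    safety_network_number: bytes     # unique network identifier (Section 14.5.2.8.1)
    configuration: Optional[bytes]   # present -> Type 1, absent -> Type 2

    @property
    def request_type(self):
        return 1 if self.configuration is not None else 2

# Type 1: a small, simple configuration travels with the connection request.
open_simple = SafetyOpenRequest(b"\x01\x04", b"\x00" * 6, configuration=b"\x10\x27")

# Type 2: a tool has already downloaded a large configuration; only open the connection.
open_complex = SafetyOpenRequest(b"\x01\x05", b"\x00" * 6, configuration=None)

assert open_simple.request_type == 1 and open_complex.request_type == 2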
14.5.2.8.1 Safety Network Number The Safety Network Number provides a unique network identifier for each network in the safety system. The Safety Network Number combined with the local device address allows any device in the safety system to be uniquely addressed.
FIGURE 14.62 Safety device objects (application objects, parameters, Safety and standard I/O Assemblies, other objects, Message Router, Identity, Safety Supervisor, Safety Validator, Safety I/O Connections, and Explicit Messaging over a DeviceNet, ControlNet, or Ethernet network link to the CIP network).
14.5.2.8.2 Password Protection All safety devices support the use of an optional password. The password mechanism provides an additional protection measure, prohibiting the reconfiguration of a device without the correct password. 14.5.2.8.3 Configuration Ownership The owner of a CIP Safety device can be specified and enforced. Each safety device can specify that its configuration is configured by a selected originator or that the configuration is only configured by a configuration tool. 14.5.2.8.4 Configuration Locking Configuration Locking provides the user with a mechanism to ensure that all devices have been verified and tested prior to being used in a safety application. 14.5.2.9 Safety Devices The relationship of the objects within a safety device is shown in Figure 14.62. Note that CIP Safety extends the CIP object model, with the addition of Safety I/O Assemblies and Safety Validator and Safety Supervisor Objects. 14.5.2.10 Safety Supervisor The Safety Supervisor Object provides a common configuration interface for safety devices. The Safety Supervisor Object centralizes and coordinates application object state behavior and related status information, exception status indications (alarms and warnings), and defines a behavior model that is assumed by objects identified as belonging to safety devices.
14.5.2.11 CIP Safety Summary The concept presented here demonstrates a scalable, routable network-independent safety protocol based on extensions to the CIP architecture. This concept can be used in solutions ranging from device-level networks such as DeviceNet to higher-level networks such as EtherNet/IP. By designing network independence into CIP Safety, multilink routing of Safety Connections can be supported. Functions such as multilink routing and multicast messaging provide a strong foundation that enable users to create the fast responding local cells and interconnect remote cells that are required for today’s safety applications. The design also enables expansion to future network technologies as they become available.
14.6 Conclusion The CIP family of protocols is a very versatile set of fieldbus protocols that are scalable to allow their use in many applications and many levels of the automation architecture. Due to the universal applicability of the underlying protocol, it is very easy to switch from one system to another. The Producer/Consumer principle together with the open-object architecture used in the CIP family allow very efficient use of the communication bandwidth and ensure that these modern systems can be used for many years to come.
References* [1] CIP Common Specification, Release 1.0, © 2000, 2001 by ControlNet International and Open DeviceNet Vendor Association. [2] DeviceNet Specification, Release 2.0, including Errata 5, March 31, 2002, © 1995–2002 by Open DeviceNet Vendor Association. [3] ControlNet Specification, Release 2.0, including Errata 2, December 31, 1999, © 1998, 1999 by ControlNet International. [4] CIP Common Specification, Edition 2.0, © 2001–2004 by ODVA and ControlNet International. [5] DeviceNet Adaptation of CIP Specification, Edition 1.0, December 15, 2003, © 1994–2004 by Open DeviceNet Vendor Association. [6] EtherNet/IP Specification, Release 1.0, June 5, 2001, © 2000, 2001 by ControlNet International and Open DeviceNet Vendor Association. [7] Planning and Installation Manual, DeviceNet Cable System, Publication PUB00027R1, downloadable from ODVA Web site (http://www.odva.org/). [8] Recommended IP Addressing Methods for EtherNet/IP Devices, Publication PUB00028R0, downloadable from ODVA Web site (http://www.odva.org/). [9] IEC 61131-3:1993, Programmable Controllers: Part 3: Programming Languages. [10] Controller Area Network: Basics, Protocols, Chips and Application, IXXAT Automation, 2001. [11] ISO 11898:1993, Road Vehicles: Interchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication. [12] IEEE 802.3:2000, ISO/IEC 8802-3:2000, Information Technology: Local and Metropolitan Area Networks: Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specification. [13] IEEE 802.3:2002, Information Technology: Telecommunication and Information Exchange between Systems: LAN/MAN: Specific Requirements: Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications. [14] ISO/IEC 7498-1:1994, Information Technology: Open Systems Interconnection: Basic Reference Model. [15] IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, 1998.
*All RFCs are downloadable from http://www.faqs.org/rfcs/.
[16] IEC 62026-3, Low-Voltage Switchgear and Controlgear: Controller-Device Interfaces (CDIs): Part 3: DeviceNet, 2000. [17] EN 50325-2, Industrial Communications Subsystem Based on ISO 11898 (CAN) for ControllerDevice Interfaces: Part 2: DeviceNet, 2000. [18] GB/T 18858 (Chinese national standard), Low-Voltage Switchgear and Controlgear ControllerDevice Interface, 2003. [19] IEC 61158, Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems, 2000. [20] EN 50159-1:2001, Railway Applications, Communication, Signaling and Processing Systems. [21] Draft Proposal Test and Certification Guideline, Safety Bus Systems, BG Fachausschuβ Elektrotechnik, May 28, 2000. [22] IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems. [23] David A. Vasko, Suresh R. Nair, 2003, CIP Safety: Safety Networking for the Future, paper presented at Proceedings of the 9th International CAN Conference. [24] IEEE 1588:2002, Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. [25] Viktor Schiffer, 2003, Modular EDSs and Other EDS Enhancements for DeviceNet, paper presented at Proceedings of the 9th International CAN Conference. [26] Viktor Schiffer, 2003, Device Configuration Using Electronic Data Sheets, ODVA Conference and 9th Annual Meeting, downloadable from ODVA Web site. [27] Viktor Schiffer, Ray Romito, DeviceNet Development Considerations, downloadable from ODVA Web site, 2000. [28] RFC 768, User Datagram Protocol, 1980. [29] RFC 791, Internet Protocol, 1981. [30] RFC 792, Internet Control Message Protocol, 1981. [31] RFC 793, Transmission Control Protocol, 1981. [32] RFC 826, Ethernet Address Resolution Protocol, or Converting Network Protocol Addresses to 48.bit Ethernet Address for Transmission on Ethernet Hardware, 1982. [33] RFC 894, Standard for the Transmission of IP Datagrams over Ethernet Networks, 1984. [34] RFC 1103, Proposed Standard for the Transmission of IP Datagrams over FDDI Networks, 1989. [35] RFC 1112, Host Extensions for IP Multicasting, 1989. [36] RFC 1122, Requirements for Internet Hosts: Communication Layers, 1989. [37] RFC 1123, Requirements for Internet Hosts: Application and Support, 1989. [38] RFC 1127, Perspective on the Host Requirements RFCs, 1989. [39] RFC 1171, Point-to-Point Protocol for the Transmission of Multi-Protocol Datagrams over Pointto-Point Links, 1990. [40] RFC 1201, Transmitting IP Traffic over ARCNET Networks, 1991. [41] RFC 2236, Internet Group Management Protocol, Version 2, 1997.
15 The Anatomy of the P-NET Fieldbus
Christopher G. Jenkins, PROCES-DATA (U.K.) Ltd.
15.1 Background
15.2 The Embodiment of P-NET
15.3 The Communication Skeleton
15.4 Layer 1: The Physical Layer
RS 485 • RS 232 • Light-Link • 4-WIRE P-NET • Ethernet
15.5 Layer 2: The Data Link Layer
Node Address Field • Control Status Field • Info Length Field • Info Field • The Error Detection Field • Master–Slave • Multimaster Bus Access
15.6 Layer 3: The Network Layer
15.7 Layer 4: The Service Layer
15.8 Layer 7: The Application Layer
15.9 Layer 8: The User Layer
15.10 The Intelligent P-NET Node
15.11 The PC and P-NET
15.12 The Appliance of P-NET
15.13 Worldwide Fallout
15.14 P-NET for SMEs
Bibliography
Within a restricted number of pages, there needs to be a compromise between describing a technical concept so briefly as to not do it justice, and delving too deeply into its functionality to leave no room to convey the general essence. With this in mind, this study concentrates on describing the skeleton of P-NET’s structure in as much detail as is necessary to at least highlight the main attributes of this fieldbus type. Hopefully, the reader will still be left with enhanced knowledge of the techniques used when using P-NET within a process automation system.
15.1 Background P-NET was initially conceived in Denmark during the 1980s as a means of transferring measurement and control data between industrial process transmitters and programmable controllers, connected together within a serial communications network — hence the name P-NET, derived from the words describing its use within a process network. Prior to this, data had to be individually transmitted in analogue form using current (e.g., 4 to 20 mA) or voltage (e.g., 0 to 100 mV) or a one-to-one digital link (e.g., 20-mA current loop or RS 232). Therefore, there was little chance of avoiding many point-to-point connections from transmitters to a central controller, involving a multitude of wires, cables, and terminations. This is not the place to discuss the unfavorable
cost comparison between even a small process plant based on point-to-point wiring and a digitally networked system, due to the additional cabling and manpower, a decrease in reliability with an increase of terminations, and lack of expansion flexibility. However, it is worth mentioning that studies and practical implementations since that time have proposed that one can achieve a 40% lifetime cost advantage of the latter over the former. So, in justifying the economic advantages of developing P-NET in the first place, how does this compare with the evolution of other forms of fieldbus available today? Again, this brief study is not intended to be a comparison with, or to cast aspersions upon, any other industrial communications standard. However, the aim is to convey that there are a number of important differences between P-NET and other fieldbus standards, which provide it with its own unique characteristics.
FIGURE 15.1 P-NET segments showing inter-network communication.
15.2 The Embodiment of P-NET Before we dig deeper into the technicalities of the P-NET protocol, it would be as well to briefly describe some of the attributes that give P-NET its particular personality:
• Although P-NET is generically grouped and described as a “Fieldbus for use in Industrial Control Systems,” to which an international standard has been applied (IEC 61158/4), its capabilities extend beyond being purely a single-level bus system. As the name implies, it has built-in networking properties. This provides the means for an industrial control system to be partitioned into a number of autonomous segments (buses), each communicating with others using the same protocol through gateways. This means that if required, such a system can also be structured into hierarchical layers dealing with the lower sensor level, through the device level, up to the control system backbone. P-NET is therefore known as a multinet protocol (Figure 15.1).
• Some fieldbus types only support a single controller or master. This is the device that controls communication with a number of slave nodes connected to the bus. P-NET is a multimaster protocol, which allows up to 32 master devices to communicate with up to 125 other nodes. However, this restriction applies only to a single bus or network segment. By incorporating a P-NET multiport master device onto the bus, that device will act as a transparent bridge between two nets. A further 32 masters, within another 125 nodes, can then be connected to this new bus. It does not take too much thought to realize that by adding more multiport masters to interconnected buses, one will eventually build up a highly complex web network having many segments and layers. While such extended facilities would not perhaps be considered particularly useful for a simple measurement and control system, the flexibility offered by P-NET has allowed it to also be structured as a completely integrated industrial control communication network to run an entire process or manufacturing plant (Figure 15.2).
FIGURE 15.2 A complete network showing a variety of physical media.
15.3 The Communication Skeleton Let us now start discussing the P-NET protocol in more detail. When fully specifying a communications protocol, it is quite common to use the Open Systems Interconnection (OSI) 7 layer reference model as a means of breaking the protocol down into manageable parts. The methodology and generic meaning of these layers is described elsewhere in this book. It is worth mentioning, however, that although most fieldbus types can be described using just layers 1, 2, and 7, P-NET also implements layers 3 and 4, because it offers the additional multinet and gateway features. Since the structure of a complete message can be seen in the lower layers, we shall concentrate mainly on these.
15.4 Layer 1: The Physical Layer

Layer 1 is concerned with how raw bits are transmitted over the bus and network. It specifies the cable or transmission medium, how many nodes are allowed and what topology is used to connect them together, how a 1 and a 0 are represented in terms of voltage level, the timing of each bit, etc. It is worth noting that this is the only layer where the electrical activity at the transmitting and receiving nodes will be the same. There are three electrical standards specified for P-NET in the current IEC standard. However, in recent years, other specifications have evolved, all of which are now being used extensively. If we also consider the local and wide area transport media of Ethernet and wireless local area network (WLAN), together with dial-up networking using the public switched telephone network (PSTN), integrated services digital network (ISDN), and global system for mobile communication (GSM) (including the short message service [SMS]), or the permanent service provided by broadband via the Internet, there are a myriad of possibilities for fully utilizing the networking capabilities of P-NET.
15.4.1 RS 485

RS 485 is an international electrical standard in its own right, which enables multiple communicating devices to be connected together on the same piece of cable (Figure 15.3).
FIGURE 15.3 Recommended interface circuit.
This standard was chosen for P-NET because electrically it uses a balanced-line transmission principle. This means that it has better noise immunity and higher speed and distance capabilities than one where each line is referenced to a fixed voltage. The specification has been extended for P-NET by ensuring that all connected nodes incorporate a galvanically isolated connection. This enables up to 127 nodes to be simultaneously connected to a single cable (Figure 15.4). The topology of choice for this medium is for the screened twisted-pair cable to be connected in a continuous ring having a length of up to 1.2 km. This improves noise immunity by reducing reflections and enhances reliability should there be a single-point break in the cable. However, the classic bus architecture is also permitted if a specific termination network is fitted. The bit rate of standard P-NET was carefully chosen to ensure that any commonly available microcontroller with a serial interface could be used as a P-NET node. At the same time, it needed to be ensured that the line length of a single bus segment would perform over a workable distance without signal degradation.
FIGURE 15.4 Ring topology.

FIGURE 15.5 Bus topology.
Such a standard rate is fixed at 76.8 kbit/s, which is a standard software-selectable baud rate in most microcontroller serial interfaces. This also enables the line length to extend around a complete plant. The consequent data transfer rate is shown in Figure 15.10.
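As a rough, purely illustrative check of the figures quoted later in Figure 15.10 (an estimate, not a calculation taken from the P-NET specification): at 76.8 kbit/s each 11-bit character occupies about 13 µs, and a confirmed floating-point read consists of a short request frame plus a response frame of roughly 18 characters in total, together with the inter-frame gaps described in Section 15.5. That amounts to something in the region of 250 bit periods, or a little over 3 ms per transaction, which is where the quoted cycle time of 3.3 ms and the rate of about 300 confirmed floating-point values per second come from.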
15.4.2 RS 232

This electrical standard is also part of the P-NET standard. It plays an important role in providing gateway possibilities between two P-NET network segments and in linking to equipment that does not possess a P-NET interface. RS 232 is a point-to-point serial link and does not allow more than two transmit–receive devices to be connected together. It is therefore not a multidrop medium like RS 485. Due to the way bits are transmitted, it is more susceptible to noise unless distances are kept to a minimum. However, when used as a P-NET port gateway, the bit rate is selectable, these days ranging from 300 to 230,400 bps, including the standard 76,800 bps. This means that dial-up modems associated with PSTN, ISDN, and GSM can all be utilized to transfer data between P-NET segments that are physically located thousands of miles apart. Of course, if a P-NET device includes an RS 232 port, it can also be connected to printers, bar code readers, and other devices having an RS 232 interface.
15.4.3 Light-Link

Light-link is a medium designed to transfer data via infrared (IR) nonvisible light. It was designed by PROCES-DATA primarily as a means of transferring data using the P-NET protocol throughout a local cluster of individually mounted DIN* rail input/output (I/O) slave modules and multiport programmable master modules. Such a facility negates the need to perform any physical wiring between modules, since the act of mounting adjacent modules automatically connects the light path. The means are also provided for extending the light-link path to other local clusters or individual nodes using standard fiber-optic cable (Figure 15.6 and Figure 15.7).
15.4.4 4-WIRE P-NET

4-WIRE P-NET extends the usability of P-NET devices to areas where it is important to transmit power as well as data within one cable, perhaps where a number of low-power single-channel sensors/actuators need to be distributed around a building or home. This is opposed to connecting an individual power source to each distributed node or cluster, as with classic RS 485. Another major advantage of this medium is that, together with an appropriate barrier, it is also suitable for connecting ATEX†-approved P-NET nodes within hazardous areas. A hazardous area is one where an ignitable gas or material permanently or occasionally exists, and where a spark caused by the connection or disconnection of, or a fault within, electrical equipment could cause an explosion. Electrical equipment therefore has to be designed, approved, and marked for use in such areas. The subject of intrinsic safety and other forms of hazardous-area protection is a wide one, and it would be inappropriate to discuss it in any detail here. Suffice it to say that by using this medium, a number of approved P-NET devices can be used within the industries
*DIN, Deutsches Institut für Normung (the German standards institute). †ATEX, from the French “ATmosphères EXplosibles,” the European directives covering equipment for use in explosive atmospheres.
FIGURE 15.6 Mounting principle of P-NET modules from PROCES-DATA.
FIGURE 15.7 Two clusters of modules joined by extending light-link via fiber optics.
that warrant such safety protection (e.g., oil, petroleum, gas, pharmaceuticals, mining, etc.). It should also be understood that a change from one P-NET medium to another is transparent and operates at the same speed using the same protocol (Figure 15.8).
15.4.5 Ethernet

Ethernet is another electrical standard for the transmission of data, in the same way that RS 485 is. It has gained wide popularity as a means of exchanging data between office equipment, such as PCs, printers, faxes, etc. As an electrical standard, it can be used to transfer data using various protocols, one of which is the packet-oriented Internet Protocol (IP). As the name implies, this protocol enables not only computers within an office environment to talk together, but also equipment separated by vast distances by
FIGURE 15.8 Conversion to 4-WIRE P-NET directly or through ATEX barrier (not shown).
each having a connection to the Internet. Just as a P-NET device with an RS 232 port can communicate with another P-NET device directly or via various modem types, devices incorporating an Ethernet port are also available to connect P-NET devices together locally, or via other devices, including WLAN, to the Internet. Although such a connection through an Ethernet switch would also be transparent to P-NET communicating nodes, in this case each P-NET frame is wrapped within a User Datagram Protocol (UDP)/IP packet (Figure 15.9 and Figure 15.10).
15.5 Layer 2: The Data Link Layer

The task of the data link layer is to:

• Create and recognize frame boundaries and node addresses
• Perform transmission error control
• Control access to the bus, including multimaster access

All communication on a bus is sent within a structured frame. This consists of a series of asynchronously transmitted 9-bit bytes. The important feature of each is that the ninth bit indicates whether the remaining bits are associated with a node address or some other data. Any microcontroller that is to be considered for use as a P-NET node must have the ability in its serial interface Universal Asynchronous Receiver/Transmitter (UART) for this additional bit to be included within the byte.
FIGURE 15.9 Using P-NET across Ethernet and WLAN media.
FIGURE 15.10 Overview of the physical media used by P-NET. (Tabular comparison of P-NET on RS 485, RS 232, Light-Link, 4-WIRE P-NET, and Ethernet in terms of electrical standard, bus structure, medium, bus length, number of nodes per segment, communication speed, cycle time, and accessories. For example, standard P-NET on RS 485 supports a 1200 m ring or 600 m bus with up to 125 nodes, including up to 32 masters, at 76.8 kbit/s, providing about 300 confirmed floating-point values per second with a 3.3 ms cycle time, while P-NET over Ethernet at 10/100 Mbit/s provides about 1000 confirmed messages per second.)
The byte structure is therefore as follows (Figure 15.11):

• One start bit (logical 0)
• Eight data bits with least significant bit (LSB) first (bits 0 to 7)
• One address/data bit
• One stop bit (logical 1)
A frame is divided up into a number of variable- and fixed-length fields as follows (Figure 15.12):

• Node address field — 2 to 24 bytes
• Control/status field — 1 byte
• Info length — 1 byte
• Info field — 0 to 63 bytes
• Error detection field — 1 to 2 bytes
FIGURE 15.11 Structure of byte.
FIGURE 15.12 Structure of frame.
A complete frame sent by a master is followed by an idle interval of no more than 30 bit periods before the response frame begins. At the standard communication speed of 76.8 kbaud, this is equivalent to 390 µs. The start of a frame can always be recognized by the fact that the first byte has the address/data bit set to 1. In addition, the first address-identified byte in the frame having bit 7 set true will contain the node address (bits 0 to 6) of the token-holding master (see below). This introduces the fact that P-NET addressing also includes the requesting node address, as well as the destination node address from which a response is expected. Bit 7 of each address byte is thus used to indicate whether it is associated with the (slave) address from which a response is expected or is being made (bit 7 = 0) or the requesting (master) source address of the transmission to which a response is expected or is being received (bit 7 = 1). In other words, because the first byte of an address field is always a destination address, if its bit 7 is false, then the transmission is a request from a master. Conversely, if bit 7 is true, the transmission is a response from a slave.
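As a purely illustrative sketch (this is not code from the P-NET specification, and the type and function names are hypothetical), the 9-bit character and the bit-7 conventions just described might be expressed in C as follows:

#include <stdbool.h>
#include <stdint.h>

/* One P-NET character: eight data bits plus the ninth "address/data" bit.
   The start and stop bits are handled by the UART hardware itself. */
typedef struct {
    uint8_t data;         /* bits 0 to 7, LSB transmitted first           */
    bool    address_bit;  /* ninth bit: true = address byte, false = data */
} pnet_char_t;

/* A frame always starts with an address byte. */
static bool is_frame_start(pnet_char_t c) { return c.address_bit; }

/* Bits 0 to 6 of an address byte carry the node address itself. */
static uint8_t node_address(pnet_char_t c) { return c.data & 0x7F; }

/* The first byte of an address field is always a destination address, so
   its bit 7 distinguishes request from response:
   0 = request from a master, 1 = response from a slave. */
static bool is_request(pnet_char_t first_address_byte)
{
    return (first_address_byte.data & 0x80) == 0;
}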
15.5.1 Node Address Field

Node addresses (NAs) can extend from 0 to 127 (7 bits):

• Address 0 — Normally used as the default address when shipped from the manufacturer. This guarantees that the node will have a unique address when first incorporated into a network of nodes, before it is automatically set to its system-defined address.
• Addresses 1 to 125 — Range of addresses within an individual P-NET sector, where addresses 1 to 32 are normally reserved for master nodes.
• Address 126 — Broadcast to all nodes without stimulating a response.
• Address 127 — Reserved for test.

Since P-NET has the ability to communicate with nodes in other network segments (up to 10 layers away), the address field is of variable length. For simple connections between nodes on the same bus segment, only two bytes are required. The first is the destination address of the slave node. The second is the source address of the master. This simple address strategy can be recognized by the 0, 1 sequence of bit 7 in the two address bytes. The response from the slave (or master) also uses the same simple format, by reversing the received address fields so that the master’s source address becomes the slave’s destination address (sending it back from where it came) and the master’s destination address becomes the slave’s source address. This address strategy is recognized by the 1, 0 sequence of bit 7 in the two address bytes. This slave response format is always the same, no matter whether the request has been made by a local master or one originating from a remote segment.

For a master node to address a slave node through a gateway into an adjacent P-NET segment or farther, it is necessary to use the services provided by a multiport master. Here the master that is originating the request, being aware of the route map to the node that is to give the response, prepares a series of signposts within the address field of the frame. The technique that P-NET uses to define a complete address consists of node addresses and port numbers, together with an indication of how far the journey is. Also, some kind of identity for this particular communication is included. Node addresses have already been discussed, so it is relevant here to explain what a port number is. Because of the networking capabilities of P-NET, there needs to be a device having the ability to transparently transfer data from one network segment to another. This is called a gateway and consists of two or more input/output ports. This gateway is also a master node, because it has to control the access to the individual buses at the appropriate time. Of course, master devices that only have one port cannot act as gateways. Commonly, multiport masters consist of two or three external ports, but P-NET allows devices to have as many as is practically useful. Each port is numbered from 1 upward, and if it is being used as an output port, it is used within the address to define the complete route. So basically, an address that is to extend across one or more gateways (as opposed to simple local bus addressing) is formed by using the node address of the gateway in the local bus segment, then the port number that is
FIGURE 15.13 Communication path to a remote slave: a master (node 4 of 6) in segment A reaches the slave (node 34) in segment C through gateway 1 (node 5 in segment A) and gateway 2 (node 2 in segment B, node 3 in segment C).
connected to the adjacent segment, followed by the node address of the slave node or gateway to another segment, then the port number, and so on. As previously discussed, a P-NET address is structured in terms of its destination and source addresses, so that the response can find its way back to the requesting node. This concept is obviously a little more complex when we consider a long path through a number of gateways. The fact that each bus segment within a larger network operates both simultaneously and independently means that it is unlikely that the communication path will be completely continuous at a particular instant in time. In other words, due to the equal-priority philosophy of P-NET masters, it may be necessary for a gateway to wait before its port can access the adjacent bus. The more gateways that are involved in the transaction, the more traffic-controlled crossroads can be expected during the journey. In order to help describe the route across these network areas (and back again), two additional pieces of data are incorporated into the address field (those bytes with the address/data bit 8 set to 1). If the address field consists of more than the two bytes used for simple addressing, the third byte is used to define how many extra bytes are to be involved in the address field. In addition, the last byte is used as a requesting master message identity. In simple terms, this is chosen by the master to ensure that when a response message is returned, it recognizes that this response is associated with a particular request. More specifically, it will be a number between 16 ($10) and 127 ($7F) and is in fact associated with a particular task running in the master. It can therefore be deduced that a master could send out a number of requests over various paths, and the responses may not return in the order sent out. It can also be concluded that, more often than not, a master is a programmable device/controller programmed in a multitasking high-level language (e.g., Pascal), having the ability to obtain values from network-based variables (e.g., Process-Pascal).

So, there are two automatic tasks that a gateway node must perform. In order to satisfy the rule that a slave must reply to a request within 30 bit periods, the gateway must reply to the originating master that an “answer comes later.” This is so that this master can now release the token to allow this local segment to continue dealing with other bus transfers, be they local or across gateways from other masters or the same master (see below). The other task of the gateway is to respecify the address structure to enable the next communication stage to be achieved while ensuring that the way back for a response is not lost. In this respect, a number that previously identified an outgoing port is changed to become the equivalent incoming node address.

Let us use an illustrative example to show how a master obtains data from a slave located in a remote network segment. The physical paths could include any of the media types already described (Figure 15.13). The address fields (add/data bit = 1) in Figure 15.14 show how a request is made across the above example of three P-NET segments. The hatched bytes relate to source rather than destination addresses (Figure 15.14). This inaugural address field includes the complete path to the slave. The first source node
FIGURE 15.14 Request from Master to gateway 1.
FIGURE 15.15 Response from gateway 1 to Master.
FIGURE 15.16 Request from gateway 1 to gateway 2.
FIGURE 15.17 Response from gateway 2 to gateway 1.
address indicates that node 5 in segment A currently holds the token (explained below). A complex address is recognized as one having more than three sequential zeros for bit 7 (Figure 15.15).

The first destination and source node addresses have been reversed. The adjacent control/status field will indicate “answer comes later” and no data will be included. This releases node 5 for one of the other five masters to gain access to the bus in segment A (Figure 15.16).

Gateway 1 port has been changed to a source node, and gateway 2 node changed to a source port. The first source node address indicates that node 2 in segment B currently holds the token (Figure 15.17).

The first destination and source addresses have been reversed. The adjacent control/status field will indicate “answer comes later” and no data will be included. This releases node 2 for the one other master in segment B to gain access to the bus (Figure 15.18).

The final part of the request path uses simple addressing as if gateway 2 had originated the request. It is a request because of the 0, 1 sequence of bit 7. The first source node address indicates that node 3 in segment C currently holds the token (Figure 15.19).

The first destination and source addresses have been reversed and indicate a simple address response because of the 1, 0 sequence of bit 7. The following info field will contain the requested data. This releases node 3 for one of the other three masters in segment C to gain access to the bus (Figure 15.20).

Now the source addresses in the request from gateway 1 to gateway 2 have been swapped to destination addresses, and the data from the slave passed on within the attached info field. This is sent
FIGURE 15.18 Request from gateway 2 to slave.
FIGURE 15.19 Response from slave to gateway 2.
FIGURE 15.20 Answer from gateway 2 to gateway 1.
FIGURE 15.21 Answer from gateway 1 to master.
when node 1 in segment B gains access to the bus. No additional response will be generated by gateway 1 to gateway 2, because the final address byte contains zero, meaning that this is an awaited answer (Figure 15.21).

The final part of the request–response cycle is received by the originating node 5 while node 4 is holding the bus token. It indicates that the source of the data is node 34, that the data has traveled across two segments, and that the response is associated with task 20. The attached info field contains the requested data. The generation of any response is not required or expected, for the reasons given above.
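Purely as an illustration (not code from the P-NET specification; the names are hypothetical), the simple same-segment addressing convention of Section 15.5.1 can be summarized in C: the two-byte destination/source field of a request is simply reversed to form the response.

#include <stdint.h>

/* Simple (same-segment) addressing: two address bytes, destination first.
   Bit 7 = 0 marks the slave (destination) address of a request,
   bit 7 = 1 marks the requesting master (source) address. */
typedef struct {
    uint8_t byte[2];
} simple_address_t;

static simple_address_t make_request_address(uint8_t slave_na, uint8_t master_na)
{
    simple_address_t a;
    a.byte[0] = slave_na & 0x7F;                       /* destination: bit 7 = 0 */
    a.byte[1] = (uint8_t)((master_na & 0x7F) | 0x80);  /* source:      bit 7 = 1 */
    return a;
}

/* The slave's response reverses the received address field, producing the
   1, 0 bit-7 sequence by which a response is recognized on the bus. */
static simple_address_t make_response_address(simple_address_t request)
{
    simple_address_t a;
    a.byte[0] = request.byte[1];   /* requesting master, bit 7 still 1 */
    a.byte[1] = request.byte[0];   /* responding slave,  bit 7 still 0 */
    return a;
}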
15.5.2 Control/Status Field

Control/status is the means by which it is possible to communicate an instruction from a master device to a slave during a request, or to reveal status or error information in a response. It consists of just one byte. The three least significant bits (0 to 2) are used as a reduced instruction set during a request:

• 001 Store — Used when transferring data from a master to a slave. The response from the slave contains no data.
• 010 Load — Used to request specific data from slave to master. The response from the slave contains all or part of the requested data.
• 011 AND — Used to AND data from a master with that in a specific location in a slave. The response contains no data.
• 100 OR — As above, except the data is ORed.
• 101 Test and Set — This instruction is used to send data from a master to a slave and, after being ANDed with the contents of the specified location, returned to the master.
• 110 Long Load — Used when it is known that the number of bytes of data requested from a slave will exceed 64. Here the communication will be automatically divided into 56-byte messages, so that other masters can also be given the opportunity to access the bus.
• 111 Long Store — As above, except that the data will be transferred from master to slave.

15.5.2.1 Softwire Numbers

In a request, bit 3 is used to define whether the location address of the data is to be regarded as a symbolic SwNo (0) or as an absolute address (1), which is mainly used for test purposes. This is the first time this
word has been used so far, but it is an important concept in the world of P-NET. A softwire number is a way of symbolically defining the location of a variable within a P-NET node. SoftWiring can be conceptualized as the means of connecting a declared variable in an application to its value held in a remote node. Using softwire numbers means that an ordered set of variables all associated with a particular measurement or collection of data can be defined (a channel), even though these may be scattered throughout a microcontroller’s memory and may even use various memory types. This is opposed to using absolute addressing, where the location of a measurement in a device designed using one microcontroller is likely to be different for the same measurement using another type of controller. In a response, the use of the control/status byte changes to signal various coded qualitative states, varying from “OK,” “busy,” and “answer comes later” to error codes representing “no response,” “data format error,” “node address error,” “write protection,” and “net short circuit.” These are just a brief example of some of the tests and analyses that are performed on a message on its journey to and from a P-NET node. Generally speaking, if bits 0 to 2 are not all zero, then the status given is error-free, although bit 7 is used to show whether any historical error has been flagged in the node. Conversely, if bits 0 to 2 are all zero, then the code relates to various error states.
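By way of illustration only (the constant and function names are hypothetical, and the unspecified upper bits of the byte are simply left clear), the request form of the control/status byte described above could be composed like this:

#include <stdbool.h>
#include <stdint.h>

/* The reduced instruction set carried in bits 0 to 2 of a request. */
enum pnet_instruction {
    PNET_STORE        = 0x1,  /* 001: master to slave, no data in the reply     */
    PNET_LOAD         = 0x2,  /* 010: request data from a slave                 */
    PNET_AND          = 0x3,  /* 011: AND data into a slave location            */
    PNET_OR           = 0x4,  /* 100: OR data into a slave location             */
    PNET_TEST_AND_SET = 0x5,  /* 101: AND with the location and return result   */
    PNET_LONG_LOAD    = 0x6,  /* 110: more than 64 bytes, sent in 56-byte parts */
    PNET_LONG_STORE   = 0x7   /* 111: as above, but master to slave             */
};

/* Bit 3 selects symbolic (SwNo) or absolute addressing of the data. */
static uint8_t make_control_byte(enum pnet_instruction instr, bool absolute_address)
{
    uint8_t ctrl = (uint8_t)instr & 0x07;
    if (absolute_address)
        ctrl |= 0x08;
    return ctrl;
}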
15.5.3 Info Length Field

The info length field defines how many bytes of data are included in this transmission. The single byte is divided into two parts. Bits 0 to 5 define, within the range 0 to 63, how many bytes are included in the info field. Bits 6 and 7 are also encoded to indicate whether the most significant bytes in the data-containing info field are a SwNo of 2 or 4 bytes long and whether an offset is included.
15.5.4 Info Field

The whole purpose of a communication protocol is to send and receive some kind of data or information. It is the info field that is the culmination of all the internal negotiating communications between the different layers within each node, to provide this within a frame container to the outside world. Of course, the frame has to be decoded and travel back up the layers again before the user can see any result, such as a measurement presented on a screen or that a solenoid has been activated. It is this assembling and disassembling from a command block into a packet into a frame and back again that is the essence of a communication protocol. The info field contains the number of data bytes defined by the info length field.
15.5.5 The Error Detection Field

The error detection field provides a means for a check to be made as to whether any corruption of data has occurred during transmission. There are two selectable levels of detection. The first is called normal error detection, and during the formation of the frame, each byte to be sent is exclusive ORed with each of two previously cleared registers. For each byte for transmission, the second register is shifted one bit to the left. When the formation of the info field has been completed, the value of the first register is sent, followed by the value of the second register. The receiving node reverses the process, where each byte received is exclusive ORed with each of two previously cleared byte registers. For each byte received, the second register is shifted one bit to the left. On receipt of the first error detection field byte, the first register should be zero. On receipt of the second error detection field byte, the second register should be zero. If this is not the case, an error detect failure is generated. This method of error detection has a Hamming distance of 4, meaning that up to 3 randomly spaced erroneous bits within 64 will be detected.

The second level of error detection is called reduced error detection. Here the Hamming distance is 2, where any single bit error will be detected. The method involves a single register to which each transmission byte is added without carry. When the info field has been transmitted, the 2’s complement of the contents of the register is sent in the error detection field. The receiving node reverses the process by adding each byte received to a register. When the complete frame has been received, the register should contain zero. If this is not the case, an error detect failure will be generated.
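The following C sketch is written from the prose description above rather than from the formal specification; in particular, whether the second register is shifted before or after the exclusive-OR of each byte is an assumption here. It shows the transmit-side calculation of both error detection fields:

#include <stddef.h>
#include <stdint.h>

/* Normal error detection: every byte of the frame is XORed into two one-byte
   registers; the second register is additionally shifted left once per byte.
   The two register values are appended as the error detection field. */
static void normal_error_field(const uint8_t *bytes, size_t n,
                               uint8_t *reg1, uint8_t *reg2)
{
    uint8_t r1 = 0, r2 = 0;
    for (size_t i = 0; i < n; i++) {
        r1 ^= bytes[i];
        r2 = (uint8_t)((r2 ^ bytes[i]) << 1);   /* assumed order: XOR, then shift */
    }
    *reg1 = r1;
    *reg2 = r2;
}

/* Reduced error detection: all bytes are added without carry into a single
   register and the two's complement of the result is sent, so the receiver's
   running sum over the complete frame ends at zero. */
static uint8_t reduced_error_field(const uint8_t *bytes, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + bytes[i]);        /* modulo-256 addition */
    return (uint8_t)(0u - sum);                 /* two's complement    */
}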
In addition to the two methods of error detection described above, every received byte is checked for a correct start bit, address/data bit, and stop bit. If an error is found, then an overrun/framing error bit is set in the control/status byte. It should be noted that whatever error detection mechanism is chosen, all nodes within the same network segment must use the same method.
15.5.6 Master–Slave

One of the characteristics of P-NET is that it is a multimaster–slave protocol, meaning that only one of perhaps a number of master devices at a time can instigate a request to another node (slave) to send back or receive data. However, it should be remembered that a master can (and often does) also act as a slave when it is not holding a token. To maximize efficiency, it is ensured that a request–reply cycle is completed in the fastest time possible. This is done by arranging for a slave module to prepare an immediate response once the slave has established that the request on the bus is for itself. Since the first byte in a request contains the node address, which is read by all nodes, only the slave with that address needs to perform any further processing beyond that point. All other slaves can “go to sleep” with respect to communication and get on with their own specific functional processing. Once the identified slave establishes that the frame generated by the master is complete, it must start to respond to that master within 11 to 30 bit periods, which is equivalent to a maximum of 390 µs at 76.8 kbit/s. During the period that the slave is transmitting the response frame, it has exclusive access to the bus.
15.5.7 Multimaster Bus Access

As explained previously, P-NET has the versatility to include more than one master node within a single bus segment. In fact, up to 32 masters are permitted (some of which could be common PCs). Since any P-NET segment only utilizes one common communication cable, there is a need for a mechanism to determine which of the master nodes is permitted to access the bus. In communication terms, this uses a technique called token passing. Here, it is arranged that only one master is able to hold the token at any one time, and when it does, it is permitted to access the bus. When that master has finished using the bus, it passes the token to another master. Unlike some token-passing techniques, P-NET uses a rather ingenious method to transfer the right of access, in which no actual data representing the token are communicated between masters. This method is therefore called virtual token passing. It has also been seen previously that the data (info) held in a frame can vary from 0 to 63 bytes. Although this does not prevent large amounts of concatenated data (record, array, database, program, etc.) from being transmitted from one master to another node, it is arranged so that it is fragmented into token-controlled transfers of up to 56 bytes at a time. The consequence is that any master connected to the same bus gets the same level of priority and opportunity to access the bus, and no master can clog up the system when transferring large amounts of data.

So, how is this done? Well, a master needs to incorporate an additional mechanism not required in a slave. The main part consists of two counters: the idle bus bit period counter (IBBPC) and the access counter (AC). The latter is associated with node addressing. Each master is given a node address between 1 and 32 (normally sequentially) as well as a number representing the maximum number of masters that are expected to be connected to the bus (e.g., node 3 of four masters). The IBBPC is designed to increment for each bit period that the bus has not made a transit from a 1 to a 0. For a bus operating at the standard 76.8 kbit/s, this bit period is 13 µs ± 0.1%. If such a change is seen, then the IBBPC is reset to zero. When transmissions are being made, the first byte in a frame that has bit 7 set to 1 will contain the node address of the token-holding master, because it is the only one allowed to start a transmission off. All other masters can see this and will therefore synchronize their access counters to the same number as the node address of the active master. When a slave has finished responding to a request from the token-holding master, there will then be an idle period of 40 bits (Figure 15.22). At this point in time, therefore, all IBBPCs will have a count of 40. This count of 40 is the first value that stimulates an access counter to increment by 1, or to reset to 1 if it thereby becomes greater than
FIGURE 15.22 Sharing bus access between four master nodes.
FIGURE 15.23 A conceptual view of the P-NET communication layers.
the recorded maximum number of masters. The master node that has a node address equal to the value of the access counter is now said to be holding the token. All this has been done without any data having been passed between masters. This master must now use the bus within 2 to 7 bit periods, although there is no obligation to do so if it has nothing to transmit (or indeed if a master with that address is not powered or not even connected). If this is the case after 10 bit periods, then all IBBPCs will have increased by another 10 (i.e., 50, 60, 70 …), which will increase the access counter by 1, thereby giving the next master in the cycle the chance to access the bus (hold the virtual token). If a master does respond within the timescale, the bus will no longer be idle, and all IBBPCs will therefore be reset to zero, awaiting the end of the current acknowledge transmission (Figure 15.23).
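To make the counting rule concrete, here is a small, purely illustrative C sketch of the mechanism as described above, seen from one master node; the structure and function names are hypothetical, and the synchronization of the access counter to an observed transmitting master is only noted in a comment:

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t  own_address;     /* this master's node address, 1 to 32           */
    uint8_t  max_masters;     /* configured number of masters on the segment   */
    uint8_t  access_counter;  /* AC: whose turn it currently is                */
    uint32_t idle_bits;       /* IBBPC: bit periods since the last 1-to-0 edge */
} pnet_master_t;

/* Called once per bit period. Returns true when this master now holds the
   virtual token and may start a transmission. (When a frame is observed on
   the bus, every master also synchronizes its access counter to the node
   address of the transmitting master; that step is omitted here.) */
static bool on_bit_period(pnet_master_t *m, bool bus_made_1_to_0_transit)
{
    if (bus_made_1_to_0_transit) {
        m->idle_bits = 0;                /* the bus is active again */
        return false;
    }
    m->idle_bits++;

    /* The access counter steps when the bus has been idle for 40 bit periods
       and then after every further 10 (50, 60, 70, ...), wrapping back to 1
       once it passes the recorded maximum number of masters. */
    if (m->idle_bits >= 40 && (m->idle_bits - 40) % 10 == 0) {
        m->access_counter++;
        if (m->access_counter > m->max_masters)
            m->access_counter = 1;
    }
    return m->access_counter == m->own_address;
}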
15.6 Layer 3: The Network Layer

The structure and contents of the frame as described in the data link layer (layer 2) have encompassed practically all the important aspects for addressing and communicating data between two P-NET nodes. However, the frame was the result of assembling commands, addresses, and routes from within the other layers of communication activity. In the same way that there is a parallel-to-serial conversion between the byte structure of the frame in layer 2 and the electrical impulses of layer 1, there is also a structure conversion between layer 2 and layer 3, the network layer. The frame is converted into a packet and vice
versa. Its form is not much different: basically, the error detection field in a frame is replaced with a retry timer in a packet. Layer 3 is the P-NET post office, which transfers packets according to the destination address. A message may be required to be sent out of another P-NET port, or into the P-NET service layer (layer 4), or back to the requesting application, or to return a message indicating that an address is unknown. It also performs any action necessary to ensure that a response finds its way back. It may be deduced that a gateway will require greater activity at this level than a single-port master. When considering a slave device, layer 3 is practically transparent.
15.7 Layer 4: The Service Layer

In essence, layer 4 performs two kinds of service, one when the node is acting as a slave, called the P-NET service, and the other when acting as a requesting master, called the program service. One common aspect is that each is now able to access internal data memory and an area called the softwire list. As mentioned before, a softwire number is an important aspect of P-NET in that, together with a node address, it is the means to symbolically address a memory location anywhere within a P-NET system. A softwire list is a memory-based table. It holds details about the device’s own internally held global variables and details of the variables used by this device’s application program, which have been declared to be located in other addressable nodes. In either case, the list index is the SwNo; it maps not only the absolute address of the variable, including the node address if external, but also the data and memory type.

Therefore, if a device acting as a slave has received a request for data, it is this layer that interprets the command instruction (load, store), translates the SwNo into an absolute memory location, retrieves or stores the data, and converts the packet into a response by removing all addresses having bit 7 = 0. This is then sent back to the network layer. If the device requires the sending of data or the requesting of data from another node, a command is received from the application layer. The associated SwNo of the variable is translated into a complete node address, and a task number is attached to identify to which application task this message applies. Together with the command and data, if any, this is all formed into a packet and sent to layer 3. If the amount of data to be sent exceeds 56 bytes, it is this layer that holds all the fragmented packets for release each time the master regains access to the bus. If the application has made a number of requests to which “answer comes later” has been received, this layer is responsible for ensuring that the response is returned to the correct application task.
15.8 Layer 7: The Application Layer

Layer 7 is the means by which application programs begin the process of accessing variables in other nodes and networks. When a task in a running program requires the value of an external global variable to be obtained or changed, the program code is translated into a command block containing a code defining the operation requirement, the SwNos of both the internal and external variables, an expected data length, and a means to relate the block to the calling task. The P-NET protocol is therefore able to deal with multitasking processes, where a number of tasks may be making simultaneous calls on network variables. Such blocks are used in layer 4, by referring to the softwire list to form the complete node address, to be structured into packets for transfer through the other layers. It is the act of declaring variable types and locations in the user program that generates, during compilation, the softwire table for use by the running application(s).
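As an illustration of the data structures implied by the last two sections (the field names and types are hypothetical, not taken from the P-NET specification), a softwire list entry and a layer 7 command block might look like this, with layer 4 resolving the external SwNo through the list before forming a packet:

#include <stdint.h>

/* One softwire list entry, indexed by SwNo: it maps a symbolic variable to
   its absolute location, the owning node's address if it is external, and
   its data and memory type. */
typedef struct {
    uint8_t  node_address;    /* assumed: 0 denotes an internal variable       */
    uint32_t absolute_addr;   /* location within the owning node's memory      */
    uint8_t  data_type;       /* e.g. real, long integer, record, ...          */
    uint8_t  memory_type;     /* e.g. RAM, EEPROM, PROM                        */
} softwire_entry_t;

/* A layer 7 command block, built when an application task reads or writes a
   network-based variable. */
typedef struct {
    uint8_t  operation;       /* load, store, ... (see Section 15.5.2)         */
    uint16_t internal_swno;   /* the local variable involved                   */
    uint16_t external_swno;   /* the variable in the remote node               */
    uint16_t expected_length; /* expected number of data bytes                 */
    uint8_t  task_id;         /* relates the eventual response to its caller   */
} command_block_t;

/* Layer 4 uses the SwNo directly as the list index to obtain the complete
   address information needed to build the packet for layer 3. */
static const softwire_entry_t *resolve_swno(const softwire_entry_t *list,
                                            uint16_t swno)
{
    return &list[swno];
}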
15.9 Layer 8: The User Layer

As far as the traditional OSI seven-layer model is concerned, there is no layer 8 in the analysis of a communication path between two devices. This is quite reasonable since, for example, in describing a
FIGURE 15.24 A standard digital input/output channel. (Table of the channel's 16 registers, including FlagReg, IOTimer, Counter, OutCurrent, Operatingtime, LevelVoltage, MinMaxCurTimer, InputHoldPreset, OutputPreset, ChConfig, MinMaxCurrent, VoltageScale, Maintenance, ChType, and ChError, listing each register's memory type (RAM, EEPROM, or PROM, with its read/write behavior), read-out format, data type, and SI unit.)
telephone network, one does not really need to know what the conversation between the two parties is about once the connection has been made. However, in the current age of object-oriented programming, an individual process measurement object can have many properties associated with it, and within its bounds have various methods applied. The P-NET protocol lends itself very well to this object-oriented concept, and there are certain rules that need to be followed before a device can be said to have P-NET compatibility. The object in this case is a formally structured table of data associated with a particular process or process variable, the index of which has an associated SwNo. We touched on the idea of softwire numbers in layers 4 and 7, but this was to describe how we get the global address of an individual global variable. Here the table, which may just be a small part of the softwire list previously described, incorporates all there is to know about that process variable (its properties), be that its value or state, how long it has been active, how many times it has switched, and its ability to perform automatic functions (its methods). Such a table definition is called a channel and is the basis of a P-NET object.

In basic terms, a channel consists of 16 registers of variables or constants, of any simple or complex data type, that can be derived from unrelated memory locations and technologies (Figure 15.24). It is the channel and register numbers that define the SwNo, which together with the node address of the device provides the complete P-NET address. Any universal P-NET device must have at least one channel that represents the whole device as an object. It is within this channel that properties such as node address, manufacturer, serial number, device type, etc., are found or specified. This is called the service channel and is always identified as channel 0. Since anyone using P-NET knows how to access the identifying properties of a device object, it is a simple matter to set up a node address or to check that a connected node is of the correct type or not exhibiting any errors.

Of course, it is in the slave nodes where the object-oriented structure of channels is highly utilized. A device can have as many channels as thought to be practical, as long as they include a service channel. Each of these can be of the same channel type or be different. A device manufacturer may decide to have an eight-channel module concentrating on digital input/output. Another may wish to mix analogue, digital I/O, and other channel types in the same device. A low-cost transmitter could be made using one special measurement channel and a service channel. Over the years, many standardized channel types have been defined. These are held by the International P-NET User Organization (IPUO) and help to ensure that a device manufacturer of a weight transmitter, for example, can provide a compatible channel structure to one that has already been produced.

It is these predefined object structures that make it easy for the programming user to treat as local variables any variable within the distributed nodes connected throughout a highly complex networked
FIGURE 15.25 A fieldbus node requiring additional protocol hardware and memory.
system. Since such structures hold the address and data type of all device variables, the programmer only has to declare that he is using an instance of this predefined device type and then just specifies its node address. All contained and named variables within this device (also defined in the device’s own softwire list), of which there may be hundreds, are therefore also declared, and can be referred to in high-level languages by means of highly convenient object-oriented identifiers such as Tank1.Temperature2.Value. The programming user, of course, does not have to worry about writing any code associated with the transport of data between nodes, and can instead concentrate on designing the required processes using a meaningful naming and object-oriented strategy.
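As a final illustrative sketch (the exact packing of channel and register numbers into a SwNo is an assumption made here only to show the idea; the names are hypothetical), the channel/register view of an address could be modeled as:

#include <stdint.h>

enum {
    PNET_REGISTERS_PER_CHANNEL = 16,
    PNET_SERVICE_CHANNEL       = 0    /* always present; identifies the device */
};

/* Assumed packing: one channel occupies 16 consecutive register slots, so the
   channel and register numbers together define the SwNo. */
static uint16_t swno_from_channel(uint8_t channel, uint8_t reg)
{
    return (uint16_t)(channel * PNET_REGISTERS_PER_CHANNEL + (reg & 0x0F));
}

/* A complete P-NET reference to a variable is then the pair
   (node address, SwNo); an identifier such as Tank1.Temperature2.Value in the
   application program is compiled down to such a pair via the softwire table. */
typedef struct {
    uint8_t  node_address;
    uint16_t swno;
} pnet_variable_ref_t;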
15.10 The Intelligent P-NET Node

One of the main philosophies behind P-NET is that of distributing processing power, popularly known as intelligence. The idea of controlling a complete system from a central point, even if basic transducers and actuators are networked, is at odds with the concept of segmenting a system into autonomous but intercommunicating parts. Fieldbus was born when integrated microcontrollers became available, providing both programmability and serial communication. Some fieldbus types offer manufacturers an add-on chip to provide the means to communicate with that protocol. Others offer a complete microcontroller with built-in protocol (Figure 15.25). P-NET has been designed so that a wide selection of standard microcontrollers can be used, depending on the application. Not only does this make it unnecessary to interface with additional hardware, but it also allows the user to choose his preferred or most economic supplier (Figure 15.26). The protocol is therefore completely software based, and for a slave at least, it could just about be derived from what has been described here so far. Alternatively, a copy of the international standard will provide the formal structure. However, the most attractive route may be to make a one-off purchase of the code for the specific microcontroller family of your choice. Any of these routes ensures that no extra royalties or levies will be paid to a supplier for special chip purchases.

As mentioned elsewhere in this treatise, one of the first P-NET devices was a magnetic flow meter transmitter. This was, and continues to be, a highly intelligent fieldbus node. Not only does the microcontroller deal with aspects of electronically providing a measurement of flow and converting this into volume, but it is able to present these measurements configured into the unit of measurement the user chooses. Furthermore, by integrating
FIGURE 15.26 A P-NET node where communications protocol is part of the application software.
temperature measurement with digital and analogue I/O, and a PID channel, closed-loop temperature-compensated batch and blend control can be independently undertaken without any intervention from the fieldbus, apart from perhaps initially sending a set point or requesting a flow measurement. This fact alone reduces the requirement for a high bandwidth, because high-speed operations can be performed locally. Since the design of this device, many other specialized sensors and general-purpose digital and analogue input/output modules have been produced (Figure 15.27). Thus, the general trend is that any object-oriented channel incorporated into a module has the intelligence to perform some form of autonomous task or process without the need for constant fieldbus communication. For digital devices, this may include performing high-speed counting, measuring the load current, and automatically switching off if a set point level is reached. (Also, pulse width and duty cycle control can be performed, depending on the setting of individual control bits within a channel register.) For analogue channels, the most likely
FIGURE 15.27 Example of slave module containing a mix of analogue and digital I/O channels and two internal channels.
FIGURE 15.28 A PC as an integral part of a P-NET system.
tasks are scaling a measurement into engineering units for immediate use elsewhere on the network, monitoring signal levels, and setting alarm bits.
15.11 The PC and P-NET

No process automation system based on a fieldbus could yield its full potential without somehow providing the opportunity to incorporate the ubiquitous PC somewhere during its development, commissioning, or operation. P-NET is no exception in this respect, and many PC-based tools have been developed for defining a system structure, setting up node addresses, configuring channel functions, and monitoring individual or lists of variables. Furthermore, software development editors, compilers, and debuggers have been made available for the download of multitasking, high-level-language, object-oriented process automation programs to program channels incorporated into P-NET multiple-port master devices (Figure 15.28). To do this, there must first be an interface between the PC and P-NET, and second, the PC must be running a real-time operating system. The first part is relatively straightforward, in that either a built-in interface card or an external module attached to the parallel port can be used. In fact, the PC serial or local area network port could also be used, if this were directly connected to an RS 232 or Ethernet port on a P-NET master. As far as the operating system is concerned, there is really only one package to consider — VIGO. In the same way that P-NET was derived from a phrase, VIGO was derived from the term virtual global object, to depict that this package has the means to generate and manage objects (Figure 15.29). Since the time P-NET was conceived, VIGO has evolved into the real-time Windows-based PC operating system for P-NET, in the same way that Microsoft Windows is really
FIGURE 15.29 The object-oriented nature of P-NET.
FIGURE 15.30 Conceptual view of the structure of VIGO.
the only choice for Intel-based PCs. VIGO, as well as a great deal of other P-NET hardware and software, has been and continues to be developed by PROCES-DATA. As with Windows, VIGO is constantly evolving and is really an integrated suite of interrelated programs operating around a central core (Figure 15.30). At the time of writing, VIGO is well into the version 5.x level, having first made an appearance in the early 1990s, and is freely available on the Internet. One could regard VIGO as a means to turn a PC into a P-NET multimaster node, and indeed it does just that. You can use a PC as a transparent gateway node between P-NET segments, or between P-NET and other electrical standards and protocols such as RS 232 for Modbus, or Ethernet for linking to local nodes or the Internet. Since VIGO is extensible, additional protocols, including those for other fieldbus types, could be incorporated.

The other exceptional feature of VIGO is that it is also a Windows OLE2 automation server. This means that any standard Windows application designed as an object linking and embedding (OLE) client, such as Excel or Access, can transfer data between such programs and any remotely located P-NET node. Furthermore, customized graphical applications can be written in any language for use on a PC, e.g., Delphi, Visual Basic, C++, etc., and can interface with all declared P-NET variables. One such specific application, included in the P-NET suite, is Visual VIGO. This is a graphical SCADA (supervisory control and data acquisition) application, in which a system is drawn by the system designer using graphical components or objects. Each is allied to a P-NET variable so that, for example, the liquid level in a tank can be displayed as a configurable bar graph, or a picture of a valve can be clicked with the mouse to open and close it (Figure 15.31). Visual VIGO also incorporates data acquisition components, where variables either can be conditionally logged in a P-NET master and then used by Visual VIGO to plot historical trend graphs or tables, or can be directly logged by the PC in real time. In both cases, collected data are held on the PC hard disc in a form suitable for conversion to any other database format (Figure 15.32).
FIGURE 15.31 Visual VIGO display of a mixing system.
15.12 The Appliance of P-NET

From its inception in Denmark, P-NET was, among other things, initially used to link together groups of intelligent hygienic flow meter transmitters, so that they could be configured, provide multiple measurement values already scaled into engineering units, set batches, and control flow with PID loops, all from a central point using only one cable. The early successes with this fieldbus type therefore tended to revolve around industries requiring hygienic liquid measurement, e.g., dairies, breweries, and soft drinks companies. The fact that Denmark is renowned for the quality of its dairy products and beer ensured that the advantages of this emerging fieldbus technology were rapidly adopted nationally during any upgrading or the building of new plants. P-NET was made a national standard and, together with two other open standards from Germany and France, formed a combined European standard: EN 50170.

Another industry of importance to Denmark is intensive pig farming, providing significant exports of bacon all over the world. P-NET provided an economic opportunity to modernize and automate animal feeding systems, to the extent that customized animal feeding systems have also been exported widely. The networking and object-oriented nature of P-NET was recognized by other diverse industries for its systems’ ability to be packaged for export. The manufacture of concrete products, involving weighing, mixing, conveyor belt control, etc., has been one of the successful examples of a customizable system completely removed from the dairy/brewing sector. Another environment within which P-NET has had a major impact has been shipping. Systems for large tankers, container ships, ferries, and luxury yachts involve ballast control, engine management, level measurement, and general heating/lighting/security duties. One of the reasons for its popularity in this sector is that P-NET is multimaster, allowing simultaneous monitoring and distributed control from various locations aboard a ship. In addition, the fact that P-NET is multinet means that there is a naturally built-in redundancy for signal paths, essential in order to gain worldwide insurance and safety approvals.
FIGURE 15.32 Visual VIGO logged data.
P-NET has also been found to suit the requirements of smaller mobile systems, such as delivery/collection trucks. It goes without saying that the flow metering requirements of the dairy/brewing industries have also extended to the collection of ex-farm milk and, in Scandinavia at least, to the delivery of beer in bulk to bars and hotels. Such systems may only require a single multiport master and some slave transducers, but the ability to interface to GPS for customer or supplier location recording, and the use of GSM for the download of collected data, demonstrate the flexible advantages offered by P-NET at the other end of the complexity spectrum.

We cannot leave Denmark without mentioning a huge national project confirming the “green” credentials of this high-tech country. There is legislation in some European countries to levy a charge on drink bottles and cans, to ensure that these are returned when empty for reuse or recycling. This has, in the past, involved additional resources to sort, store for return, and credit the customer. The steady increase in the variety of bottles and the addition of cans into the equation warranted a technological solution to ensure a continued green advantage. Of course, this massive undertaking by up to 10,000 retail shops and supermarkets is being highlighted because the only fieldbus permitted to be used by the machine manufacturers is P-NET. The system consists of at least two machines, each of which can be supplied by different companies. Obviously, the question of compatibility and interoperability is an important consideration here. The use of the P-NET protocol over various physical media is also demonstrated, in that automatic communication between each shop and central depots uses the data communication facilities offered by the GSM mobile phone network. This enables all outlets to be kept up to date with current charges together with the identities of all permitted bottles and cans.
15.13 Worldwide Fallout

The diverse nature and popularity of P-NET applications within its country/region of origin have no doubt had a profound effect on the quantity of product, systems, and knowledge exported to other areas of the world.
FIGURE 15.33 P-NET modules mounted in an electrical distribution box.
In the same way as other fieldbus types, P-NET is an enabling technology, and as additional product has become available through systems and component manufacturers and distributors, system integrators and designers have embraced the business opportunities presented to produce enhanced or new systems in many other countries. Some memorable examples include:
• A well-known engineering company in India is producing tire-manufacturing plants controlled by P-NET.
• A Chinese company is using P-NET to control wind turbines.
• A German company is using an intrinsically safe form of P-NET to produce metering systems for petroleum and domestic heating oil delivery trucks.
• A U.K. company is producing fuel management systems based on P-NET to fuel trains from depots throughout British railways.
• A Canadian company manufactures P-NET-enabled mixing systems for the soft drink industry (Figure 15.33).
Other industrial sectors within which P-NET is represented include:
• Fish farming
• Agricultural systems
• Propane gas container filling systems
• Textile manufacture
• Blood testing equipment manufacture
• Building management systems
• Product selection machines in retail outlets
• Weather stations
• Home automation
15.14 P-NET for SMEs

It could be argued that as a fieldbus type, P-NET is not quite as well known as some of the more publicized protocols. This may be because its diverse utilization within many industrial sectors and world locations has been promoted more by communication between engineers and programmers looking to provide solutions, rather than through the budgets of sales and marketing organizations. However, like its siblings, P-NET is an international standard because it meets and often exceeds the criteria required for it to be so. Its often unsung implementations by small and medium enterprises (SMEs) in many major projects can perhaps be likened to the quiet enthusiasm of certain software engineers for the Linux operating system over Windows, where the well-known use of the latter can often overshadow a technical or commercial advantage of the former.
For those already having some knowledge of fieldbus types, it is hoped that this chapter has helped put P-NET into some kind of comparative perspective. For those new to industrial communication techniques, compiling it will have been worthwhile if there is now at least a conviction that fieldbus technology such as P-NET has much to offer in achieving robust solutions to the challenges presented by modern industrial processing.
Bibliography
[1] The P-NET Fieldbus for Process Automation, International P-NET User Organization.
[2] P-NET 502 058 01, International P-NET User Organization.
[3] Installation Guide, PROCES-DATA A/S.
[4] www.p-net.org. (Various figures and diagrams were obtained from this source.)
[5] www.proces-data.com. (Various figures and diagrams were obtained from this source.)
16 INTERBUS Means Speed, Connectivity, Safety
Jürgen Jasperneite
Phoenix Contact GmbH & Co. KG
16.1 Introduction to Field Communication
16.2 INTERBUS Overview
16.3 INTERBUS Protocol
16.4 Diagnostics
16.5 Functional Safety
16.6 Interoperability, Certification
16.7 Connectivity
16.8 IP over INTERBUS
16.9 Performance Evaluation
16.10 Conclusions
References
16.1 Introduction to Field Communication

The growing degree of automation in machines and systems increases the amount of cabling required for parallel wiring due to the large number of input/output (I/O) points. This brings with it increased configuration, installation, start-up, and maintenance effort. The cable requirements are often high because, for example, special cables are required for the transmission of analog values. Parallel field wiring thus entails serious cost and time factors.

In comparison, the serial networking of components in the field using fieldbus systems is much more cost-effective. The fieldbus replaces the bundle of parallel cables with a single bus cable and connects all levels, from the field to the control level. Regardless of the type of automation device used, e.g., programmable logic controllers (PLCs) from various manufacturers or PC-based control systems, the fieldbus transmission medium networks all components. They can be distributed anywhere in the field and are all connected locally. This provides a powerful communication network for today's rationalization concepts.

There are numerous advantages to a fieldbus system in comparison with parallel wiring: the reduced amount of cabling saves time during planning and installation, while the cabling, terminal blocks, and control cabinet dimensions are also reduced (Figure 16.1). Self-diagnostics minimize downtimes and maintenance times. Open fieldbus systems standardize data transmission and device connection regardless of the manufacturer. The user is therefore independent of any manufacturer-specific standards. The system can be easily extended or modified, offering flexibility as well as investment protection.

Fieldbus systems, which are suitable for networking sensors and actuators with control systems, have represented state-of-the-art technology for some time. The main fieldbus systems are combined under the umbrella of IEC 61158 [1].
FIGURE 16.1 Serial instead of parallel wiring.

TABLE 16.1 The Four Basic Types of Arithmetic Operations for Field Communication
Signal acquisition: quick and easy acquisition of signals from I/O devices
Functional safety: transmission of safety-related information (e.g., emergency stop)
Drive synchronization: quick and precise synchronization of drive functions for distributed closed-loop controllers
Connectivity: creation of seamless communication between business processes and production
This also includes INTERBUS as type 8 of IEC 61158, with an installed base of 6.7 million nodes and more than 1000 device manufacturers. The requirements of these systems can be grouped according to the four basic types of arithmetic operations for field communication shown in Table 16.1.
16.2 INTERBUS Overview

INTERBUS has been designed as a fast sensor–actuator bus for transmitting process data in industrial environments. Due to its transmission procedure and ring topology, INTERBUS offers features such as fast, cyclic, and time-equidistant process data transmission, diagnostics to minimize downtime, and easy operation and installation, as well as meeting the requirements for optimum use of fiber-optic technology.

In terms of topology, INTERBUS is a ring system; i.e., all devices are actively integrated in a closed transmission path (Figure 16.2). Each device amplifies the incoming signal and forwards it, enabling higher transmission speeds over longer distances. Unlike other ring systems, the data forward and return lines in the INTERBUS system are led to all devices via a single cable. This means that the general physical appearance of the system is an open tree structure. A main line exits the bus master and can be used to form seamless subnetworks up to 16 levels deep. This means that the bus system can be quickly adapted to changing applications.

The INTERBUS master–slave system enables the connection of up to 512 devices across 16 network levels. The ring is automatically closed by the last device. The point-to-point connection eliminates the need for termination resistors. The system can be adapted flexibly to meet the user's requirements by adding or removing devices. Countless topologies can be created.
FIGURE 16.2 Topology flexibility.
Branch terminals create branches, which enable the connection and disconnection of devices. The coupling elements between the bus segments enable the connection and disconnection of a subsystem and thus make it possible to work on the subsystem without problems, e.g., in the event of an error or when extending the system.

Unlike in other systems, where data are assigned by entering a bus address using dual in-line package (DIP) or rotary switches on each individual device, in the INTERBUS system data are automatically assigned to devices using their physical location in the system. This plug-and-play function is a great advantage with regard to the installation effort and service friendliness of the system. The problems and errors, which may occur when manually setting device addresses during installation and servicing, are often underestimated. The ability to assign easy-to-understand software names to the physical addresses enables devices to be added or removed without readdressing.

In order to meet the individual requirements of a system, various basic elements must be used (Figure 16.2):
1. Controller board: The controller board is the master that controls bus operation. It transfers output data to the corresponding modules, receives input data, and monitors data transfer. In addition, diagnostic messages are displayed and error messages are transmitted to the host system.
2. Remote bus: The controller board is connected to the remote bus devices via the remote bus. A branch from this connection is referred to as a remote bus branch. Data can be physically transmitted via copper cables (RS-485 standard), fiber optics, optical data links, slip rings, or other media (e.g., wireless). Special bus terminal modules and certain I/O modules or devices such as robots, drives, or operating devices can be used as remote bus devices. Each has a local voltage supply and an electrically isolated outgoing segment. In addition to the data transmission lines, the installation remote bus can also carry the voltage supply for the connected I/O modules and sensors.
3. Bus terminal module: The bus terminal modules, or devices with embedded bus terminal module functions, are connected to the remote bus. The distributed local buses branch out of the bus terminal module with I/O modules, which establish the connection between INTERBUS and the sensors and actuators. The bus terminal module divides the system into individual segments, thus enabling one to switch individual branches on/off during operation. The bus terminal module amplifies the data signal (repeater function) and electrically isolates the bus segments.
4. Local bus: The local bus branches from the remote bus via a bus coupler and connects the local bus devices. Branches are not allowed at this level. The communications power is supplied by the bus terminal module, while the switching voltage for the outputs is applied separately at the output modules. Local bus devices are typically I/O modules.
FIGURE 16.3 The layer 2 summation frame structure of INTERBUS.
16.3 INTERBUS Protocol

INTERBUS recognizes two cycle types: the identification cycle for system configuration and error management, and a data transfer cycle for the transmission of user data. Both cycle types are based on a summation frame structure (Figure 16.3). The layer 2 summation frame consists of a special 16-bit loopback word (preamble), the user data of all devices, and a terminating 32-bit frame check sequence (FCS). As data can be simultaneously sent and received by the ring structure of the INTERBUS system (full-duplex mode), this results in very high protocol efficiency.

The logical method of operation of an INTERBUS slave can be configured between its incoming and outgoing interfaces by the register set shown in Figure 16.4. Each INTERBUS slave is part of a large, distributed shift register ring, whose start and end point is the INTERBUS master. During data transfer, input data, i.e., data that are to be transmitted to the master, are loaded into the input data register. The output data registers and the cyclic redundancy check (CRC) register are switched to the input data register in parallel [8]. The polynomial g(x) = x^16 + x^12 + x^5 + 1 is used for the CRC. After finishing a valid data transfer cycle, output data from the output data register are written to a memory and then accepted by the local application. The CRC registers are used during the frame check sequence to check whether the data have been transmitted correctly. The length of the I/O data registers depends on the number of I/Os of the individual node. The master needs to know which devices are connected to the bus so that it can assign the right I/O data to the right device.
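To make the check sequence principle concrete, the sketch below computes a 16-bit CRC over a byte buffer with the generator polynomial quoted above (0x1021 in hexadecimal). The initial register value and the bit ordering are assumptions chosen only for illustration, and the 32-bit FCS that actually terminates the summation frame is not reproduced; in a real node the check sequence is generated by the protocol silicon rather than in software.

#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC over a byte buffer, using the generator polynomial
 * g(x) = x^16 + x^12 + x^5 + 1 (0x1021) quoted above.
 * Start value and bit ordering are illustrative assumptions. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;                  /* assumed start value */
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;      /* feed next byte, MSB first */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

Comparing the value computed over a received segment with the transmitted check value is the essence of the segment-wise error detection that Section 16.4 builds on.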
FIGURE 16.4 Basic model of an INTERBUS slave node.
Once the bus system has been switched on, the master starts a series of identification cycles, which enable it to detect how many and which devices are connected. Each slave has an identification data register, which holds a 16-bit ID code. The master can use this ID code to assign a slave node to a defined device class (e.g., digital I/O node, analog I/O node) and detect the length of the I/O data registers in a data cycle. The control data registers are switched in parallel to the identification data registers, whereby the individual devices can be managed by the master. Commands are transmitted, e.g., for local resets or outgoing interface shutdown. The identification cycle is also used to find the cause of a transmission error and to check the integrity of the shift register ring. The individual registers are switched into the ring in the different phases of the INTERBUS protocol via the selector.

On layer 1, INTERBUS uses two telegram formats with start and stop bits similar to those of a Universal Asynchronous Receiver/Transmitter (UART):
• The 5-bit status telegram
• The 13-bit data telegram
The status telegram is used to generate defined activity on the medium during pauses in data transmission. The slave nodes use the status telegram to reset their internal watchdogs, which are used to control a fail-safe state. The data telegram is used to transmit a byte of the layer 2 payload. The remaining bits of both telegrams are used to distinguish between data and ID cycles, as well as between the data transfer and FCS phases within a cycle. This information is used by the selector shown in Figure 16.4 to switch the relevant register into the ring.

INTERBUS uses a physical transmission speed of 500 kbps or 2 Mbps. The cycle time, i.e., the time required for I/O data to be exchanged once with all the connected modules, depends on the amount of user data in an INTERBUS system. Depending on the configuration, INTERBUS can achieve cycle times of just a few milliseconds. The cycle time increases linearly with the number of I/O points, as it depends on the amount of information to be transmitted. For more detailed information, refer to the performance evaluation in Section 16.9.

The architecture of the INTERBUS protocol is based on the OSI reference model according to ISO 7498. The protocol architecture of INTERBUS provides the cyclic process data and an acyclic parameter data channel, using the services of the peripheral message specification (PMS), as well as a peripheral network management (PNM) channel (see Figure 16.5). As is typical for fieldbus systems, for reasons of efficiency, ISO layers 3 to 6 are not explicitly used, but are combined in the lower-layer interface (LLI) in layer 7.

The process data channel enables direct access to the cyclically transmitted process data. It is characterized by its ability to transmit process-relevant data quickly and efficiently. From the application point of view, it acts as a memory interface. The parameter channel enables data to be accessed via a service interface. The data transmitted in the parameter channel have a low dynamic response and occur relatively infrequently (e.g., updating text in a display). Network management is used for manufacturer-independent configuration, maintenance, and start-up of the INTERBUS system. Network management is used, for example, to start or stop INTERBUS cycles, to execute a system reset, and for fault management.
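Returning to the identification cycle described at the beginning of this section, the following sketch illustrates how a master might turn the ID codes collected in ring order into a map of the process image. The split of the 16-bit ID code into a class field and a length field in decode_id() is invented for the example; the real encoding is defined by the INTERBUS specification and is evaluated by the controller board, not by application code.

#include <stdint.h>
#include <stddef.h>

#define IB_MAX_DEVICES 512   /* INTERBUS supports up to 512 devices */

/* One entry of the master's device table, filled during the ID cycle. */
struct ib_device {
    uint16_t id_code;        /* raw 16-bit ID code read from the slave    */
    uint8_t  device_class;   /* e.g., digital or analog I/O (assumed)     */
    uint8_t  payload_len;    /* layer 2 payload length in bytes (assumed) */
    size_t   frame_offset;   /* position of this device's data in frame   */
};

/* Hypothetical decoding of the ID code; the bit layout is illustrative only. */
static void decode_id(uint16_t id, uint8_t *cls, uint8_t *len)
{
    *cls = (uint8_t)(id >> 8);
    *len = (uint8_t)(id & 0xFFu);
}

/* Build the device table from the ID codes collected in ring order and
 * return the total summation frame payload in bytes. Because devices are
 * addressed by their physical position, a running offset is all that is
 * needed to locate a device's data within the frame. */
static size_t build_device_table(const uint16_t *ids, size_t count,
                                 struct ib_device *table)
{
    size_t offset = 0;
    if (count > IB_MAX_DEVICES)
        count = IB_MAX_DEVICES;
    for (size_t i = 0; i < count; i++) {
        table[i].id_code = ids[i];
        decode_id(ids[i], &table[i].device_class, &table[i].payload_len);
        table[i].frame_offset = offset;
        offset += table[i].payload_len;
    }
    return offset;
}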
Furthermore, logical connections between devices can be established and aborted via the parameter channel in the form of context management. To transmit parameter data and time-critical process data simultaneously, the data format of the summation frame must be extended by a specific time slot. In several consecutive bus cycles, a different part of the data is inserted in the time slot provided for the addressed devices. The Peripherals Communication Protocol (PCP) performs this task [5]. It inserts a part of the telegram in each summation frame and recombines it at its destination (see Figure 16.6). The parameter channels are activated only if necessary and do not affect the transfer of I/O data. The longer transmission time for parameter data that are segmented over several bus cycles is acceptable, given the relaxed timing requirements placed on the transmission of parameter information.

INTERBUS uses a master–slave procedure for data transmission. The parameter channel follows the client–server paradigm. It is possible to transmit parameter data between two slaves (peer-to-peer communication). This means that both slaves can adopt both the client and server functions.
FIGURE 16.5 Protocol architecture of an INTERBUS node.
FIGURE 16.6 Transmission of parameter data with a segmentation and recombination mechanism.
With this function, layer 2 data are not exchanged directly between the two slaves; the exchange is implemented via the physical master–slave structure, i.e., the data are first transmitted from the client to the master and then forwarded by the master to the server. The server response data are also transmitted via the master. However, this diversion is invisible to slave applications.

The task of a server is described using the model of a virtual field device (VFD). The virtual field device model unambiguously represents that part of a real application process that is visible and accessible through the communication. A real device contains process objects. Process objects include the entire data of an application process (e.g., measured values, programs, or events). The process objects are entered in the object dictionary (OD) as communication objects. The object dictionary is a standardized public list in which communication objects are entered with their properties. To ensure that data are exchanged smoothly in the network, additional items, which can be accessed by each device, must be standardized in addition to the OD. These include device features such as the manufacturer name or defined device functions that are manufacturer independent.
These settings are used to achieve a closed and manufacturer-independent representation of a real device from the point of view of the communication system. This kind of modeling is known as a virtual field device.
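Before moving on to diagnostics, the segmentation and recombination idea behind PCP (Figure 16.6) can be summarized in a short sketch. The slot width of four bytes per summation frame and the way the telegram length is known to the receiver are assumptions made for the example; the real PCP frame format, including its service headers, is defined in [5].

#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Width of the parameter slot reserved for this node in every summation
 * frame; the value is an assumption made for the example. */
#define PCP_SLOT_BYTES 4

/* Sender: copy the next segment of a parameter telegram into the slot of
 * the current bus cycle. Returns the number of bytes placed (0 = done). */
static size_t pcp_next_segment(const uint8_t *telegram, size_t telegram_len,
                               size_t *cursor, uint8_t slot[PCP_SLOT_BYTES])
{
    size_t remaining = telegram_len - *cursor;
    size_t chunk = remaining < PCP_SLOT_BYTES ? remaining : PCP_SLOT_BYTES;

    memset(slot, 0, PCP_SLOT_BYTES);
    memcpy(slot, telegram + *cursor, chunk);
    *cursor += chunk;
    return chunk;
}

/* Receiver: append the slot content of each bus cycle until the expected
 * telegram length has been recombined. Returns 1 when complete. */
static int pcp_recombine(uint8_t *telegram, size_t expected_len,
                         size_t *received, const uint8_t slot[PCP_SLOT_BYTES])
{
    size_t remaining = expected_len - *received;
    size_t chunk = remaining < PCP_SLOT_BYTES ? remaining : PCP_SLOT_BYTES;

    memcpy(telegram + *received, slot, chunk);
    *received += chunk;
    return *received == expected_len;
}

Because only a fixed slot per cycle is consumed, the process data part of the summation frame, and hence the cycle time, is unaffected by the length of the parameter telegram, which is exactly the property emphasized above.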
16.4 Diagnostics

The system diagnostics play an important role in practical applications. In increasingly complex systems, errors must be located quickly using system diagnostics and clearly indicated to the user. In addition to detecting errors, good error diagnostics include reliable error localization. For message-oriented fieldbus systems with a bus structure, only one telegram is ever transmitted to a device at any one time. An error that affects the system via a specific device, or a device nearby, can even destroy telegrams that are not themselves directed at the faulty device but at remote devices. It is therefore virtually impossible to determine the exact error location. INTERBUS uses the CRC procedure in each device to monitor the transmission paths between two devices and, in the event of CRC errors, can therefore determine in which segment the error occurred.

An important criterion for maintaining data communication is the response of the master in the event of the following errors:
• Cable break
• Failure of a device
• Short circuit on the line
• Diagnostics of temporary electromagnetic interference (EMI)
In all fieldbus systems, in the event of a line interrupt, the devices after the interrupt are no longer reached. The error localization capability depends on the transmission system used. In linear systems, telegrams are still sent to all devices. However, these telegrams are lost because the devices are no longer able to respond. After a certain period, the master detects the data loss. However, it cannot precisely determine the error location because the physical position of the devices is not known. The system diagrams must be consulted so that the service or maintenance personnel can determine the probable error location (Figure 16.7).

Unlike linear systems, the individual devices in the INTERBUS system are networked so that each one behaves as a separate bus segment. Following a fatal error, the outgoing interfaces of all devices are fed back internally via a bypass switch.
FIGURE 16.7 The behavior of bus systems and ring systems in the event of a cable break.
In the event of a line interrupt between the devices, the master activates each device in turn. To do this, the master opens the outgoing interface, starting from the first device up to the error location, thus clearly identifying the inaccessible device. The controller board can then clearly assign the error location as well as the station or station name and display it in plain text. This is a huge advantage, particularly in large bus structures with numerous devices, where bus systems are typically used.

If a device fails, the fieldbus behaves in the same way as for a line interrupt. However, the functional capability of the remaining stations differs in linear and ring systems. In a linear system, bus operation cannot be maintained because the condition of physical bus termination using a termination resistor is no longer met. This can lead to reflections within the bus configuration. The resulting interference level means that correct operation is not possible. In an INTERBUS ring system, the termination resistor is opened and closed together with a bypass switch, which ensures that the condition of the closed ring is always met. In the event of a line interrupt or device failure, the master can either place the devices in a safe state or start up the remaining bus configuration autonomously.

Short circuits on the line are a major challenge in a bus system. In the event of a direct or indirect (e.g., via ground) short circuit on the line, the transmission path is blocked for the entire section. In linear systems, the transmission line is used equally for all devices, which means that the master cannot reach segment parts either. This considerably reduces further error localization. In the INTERBUS system, the user is aided by the physical separation of the system into different bus segments. As described for the line interrupt, the devices are activated by the master in turn and the ring is closed prior to the short circuit, which means that subsystems can be started up again. The error location is reported in clear text on the controller board. Linear systems also support a division into different segments. Repeaters, which are placed at specific points, can then perform diagnostic functions. However, a repeater cannot monitor the entire system; it can only cover a defined number of devices per segment. Furthermore, the use of repeaters incurs additional costs, which should not be underestimated, and increased configuration effort.

In summary, the INTERBUS diagnostic features are essentially based on the physical segmentation of the network into numerous point-to-point connections. This feature makes INTERBUS particularly suitable for use with fiber optics, which are used increasingly for data transmission in applications with large drives, welding robots, etc. In linear systems, the use of fiber optics — like bus segmentation — requires expensive repeaters, which simulate a ring structure. The fiber-optic path check in the INTERBUS system is another feature that is not offered by other buses. Here, a test pattern for the fiber-optic cable is transmitted between the interfaces to determine the quality of the connection. If the cable deteriorates due to dirt, loose connections, bending, etc., the transmission power is increased automatically. If a critical value is reached, the system generates a warning message so that the service personnel can intervene before the deterioration leads to expensive downtimes.
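The segment-wise localization strategy described above can be summarized by the following sketch. The function reach_device() stands for the master's ability to close the ring directly behind a given position and test it with an identification cycle; it is a simulated placeholder here, not a call offered by a real controller board.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simulated ring with a cable break behind device 3; in a real system
 * this test would open the outgoing interface of device 'pos' and run
 * an identification cycle on the shortened ring. */
static bool reach_device(size_t pos)
{
    const size_t simulated_break_after = 3;
    return pos < simulated_break_after;
}

/* Activate the devices one after the other, starting at the master.
 * Returns the ring position of the first unreachable device, or 'count'
 * if the complete ring can be closed without error. */
static size_t locate_fault(size_t count)
{
    for (size_t pos = 0; pos < count; pos++) {
        if (!reach_device(pos))
            return pos;   /* fault lies between position pos-1 and pos */
    }
    return count;
}

int main(void)
{
    size_t pos = locate_fault(16);
    printf("first unreachable device at ring position %zu\n", pos);
    return 0;
}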
Studies by the German Association of Electrical and Electronic Manufacturers (ZVEI) and the German Engineering Federation (VDMA) indicate that many bus errors are caused by direct or hidden installation faults. For this reason alone, bus diagnostics simplify start-up and ensure the smooth operation of the system, even in the event of extensions, servicing, and maintenance work. Every bus system should automatically carry out comprehensive diagnostics of all connected bus devices without the need to install and configure additional tools. Additional software tools for system diagnostics often cost several thousand Euro. In the INTERBUS system, all diagnosed states can be displayed directly on the controller board. If the master has a diagnostic display, various display colors can be used so that serious errors are clearly visible even from a distance. In addition, each master has a diagnostic interface, which can be used to transfer all functions to visualization systems or other software tools.
16.5 Functional Safety

In recent years, safety technology has become increasingly important in machine and system production. This is because complex automation solutions require flexible and cost-effective safety concepts, which offer the same advantages that users have come to appreciate in nonsafe areas.
FIGURE 16.8 The safety extension of INTERBUS.
This means that considerable savings can be made, e.g., in terms of both cost and time, by changing from parallel to serial wiring. From the user's point of view, however, various requirements must be taken into consideration. First, safe and nonsafe signals must be separated in order to simplify the programming, operation, and acceptance of safety applications. Second, all components used should be operated on a fieldbus, because a standardized installation concept and standard operation make planning, operation, start-up, and maintenance easier. These requirements have led to the safety extension of INTERBUS.

As the INTERBUS master, the controller board uses an integrated safe control system. The INTERBUS controller board with integrated safe controller is the basic unit in the system (Figure 16.8). It processes all safety-related inputs and confirms them to the standard control system by setting or resetting an output. This method of operation is similar to existing contact-based safety technology. The enabling of output data is programmed with preapproved blocks such as emergency stop, two-hand control, or electrosensitive protective equipment in the IEC 61131-compatible SafetyProg Windows software. The amount of programming required is reduced considerably through the use of blocks and the enable principle.

The safe input and output components form the interface to the connected I/Os. They control, for example, contactors or valves and read the input status of the connected safety sensors, including intelligent sensors. The user employs the parameterization function of INTERBUS to select the settings for the I/O components, such as clock selection, sensor type, and signal type. The INTERBUS Safety system supports safety functions up to category 4 according to EN 954 [6] and safety integrity level 3 (SIL 3) according to IEC 61508 [7]. Depending on the application, the user can choose either a one-cable solution with integrated safety technology or a two-cable solution, where one bus cable is used for standard signals and the other for safety signals.

A safety protocol is used between the safe controller and the connected I/O safety devices. This protocol provides the desired security of data transmission and can only be interpreted by the connected safety devices. The safety data are integrated transparently into the INTERBUS summation frame (Figure 16.9). This feature ensures the simultaneous operation of standard and safety devices in the bus system.
16.6 Interoperability, Certification

The basic aim of open systems is to enable the exchange of information between application functions implemented on devices made by different manufacturers. This includes fixed application functions, a uniform user interface for communication, and a uniform transmission medium. For the user, this profile definition is a useful supplement to standardized communication and provides a generally valid model for data content and device behavior. These function definitions standardize some essential device parameters.
FIGURE 16.9 Safety protocol on top of the INTERBUS summation frame.
As a result, devices from different manufacturers exhibit the same behavior on the communication medium and can be exchanged without altering the application software when these standard parameters are used.

INTERBUS takes a rigorous approach to interoperability using the Extensible Markup Language (XML)-based device description FDCML (Field Device Configuration Markup Language) [9]. Due to the generic device model used, FDCML enables the different views of a field device to be described. Some examples include identification, connectivity, device functions, diagnostic information, and the mechanical description of a device. This electronic device description is used in the configuration software for configuration, start-up, and other engineering aspects. Different applications can use FDCML to evaluate various aspects of a component. For example, Figure 16.10 shows the use of an FDCML file as an electronic data sheet in a Web browser.

To simplify interoperability and interchangeability, the members of the INTERBUS Club compile a set of standard device profiles in several user groups for common devices such as drives (DRIVECOM Profile 22), human–machine interfaces (MMI-COM D1), welding controllers (WELD-COM C0), and I/O devices (Sensor/Actuator Profile 12). These profiles can also be described in FDCML, neutrally with regard to the manufacturer and bus system.

The INTERBUS Club and various partner institutes have offered certification for several years to ensure maximum safety when selecting components. Independent test laboratories carry out comprehensive tests on devices as part of this process. A device only receives the "INTERBUS Certified" quality mark, which is increasingly important among users, if the test object passes all the INTERBUS conformance tests. The conformance test is composed of examinations that are carried out by test laboratories using various tools. It is divided into the following sections:
• Basic function test (mandatory)
• General section (valid for all interface types)
• Fiber optics (for devices with fiber-optic interfaces)
• Burst noise immunity test (mandatory)
• PCP software conformance test (for devices with PCP communication) (dependent)
16.7 Connectivity

As shown in Table 16.1, connectivity is one of the four basic arithmetic operations of field communication. Connectivity is the integration of fieldbus technology into company networks.
FIGURE 16.10 Example of an FDCML device description visualized in a Web browser.
However, there are still no standard concepts for connectivity solutions. This makes it difficult to integrate the field level into the company-wide distributed information system; such integration is only possible through increased programming and parameterization effort. The Internet Protocol (IP) can be used here as an integration tool [2, 3]. IP is increasingly used in automation technology in conjunction with Ethernet and is then frequently referred to as Industrial Ethernet; see, for example, [4]. In many cases, IP is already well suited to the field. This section deals with transparent IP communication at the field level, taking into consideration the real-time requirements. This means that the advantages of fieldbus technology are maintained, and at the same time, the user is provided with a new quality of connectivity. For example, well-known office applications such as browsers can be used to load device descriptions, e-mail can be used to send maintenance messages, or the File Transfer Protocol (FTP) can be used to upload and download files (Figure 16.11).

Ethernet's role in the future of automation is an important current issue. On the one hand, its specification suggests it could solve all of the communication problems in automation applications and supersede fieldbuses. On the other hand, fieldbuses, with their special characteristics, have arisen because the real world does not consist simply of bits and bytes. However, Ethernet and INTERBUS can only be fully integrated if transparent communication beyond the boundaries of the system is possible without complex conventional gateway processes. This is achieved by using the Transmission Control Protocol (TCP)/IP as the standard communication protocol on both systems. While TCP/IP is now standard on Ethernet, this is by no means the case in the factory environment. Virtually all fieldbus organizations and promoters map their fieldbus protocol onto Ethernet TCP/IP in order to protect their existing investments. INTERBUS took a different direction early on and integrated TCP/IP into the hybrid INTERBUS protocol.
TCP/IP standard tools and Internet technologies based on TCP/IP can therefore be readily transferred to the factory environment without additional expense. For example, on INTERBUS, the FTP service can be used to download control programs and other data to a process controller. The use of FTP to upload and download files such as robot programs is just one advantage, since TCP/IP opens up automation to the world of the Internet. Internet browsers will also be the standard user interface of the factory of the future, when all devices have their own integrated Web page. Special configuration tools, now supplied for devices by virtually every manufacturer, will no longer be needed, as in the future these devices will be configured through ActiveX controls or Java applets that are loaded through the network and therefore do not have to be present on the control computer beforehand.
16.8 IP over INTERBUS

Figure 16.12 shows the system architecture for IP tunneling. The known Ethernet TCP/IP structure can be seen on the left, and the extended protocol structure of an INTERBUS device can be seen on the right. An IP router, with the same mechanisms as in the office environment, is used for coupling. This function is best performed in the PLC. IP tunneling is performed by introducing a new data-send-acknowledged (DAS) service in the INTERBUS parameter channel (Figure 16.12 and Figure 16.13).
FIGURE 16.11 Connectivity creates new options such as Web-based management.
FIGURE 16.12 Basic architecture for IP tunneling.

FIGURE 16.13 Protocol architecture of an IP-enabled INTERBUS node. The LLI user services shown are Data-Transfer-Confirmed (DTC), Data-Transfer-Acknowledged (DTA), Data-Send-Acknowledge (DSA), Associate (ASS), and Abort (ABT).
This DAS service enables LLI user protocol data units (PDUs) to be transmitted for unconfirmed, connectionless LLI user services and is used for transparent IP data transmission. These data are transmitted in the same way as the parameter channel (PMS), at the same time as the time-critical process data (PD) exchange.
16.9 Performance Evaluation

This section considers the performance of the concept in relation to the relevant fieldbus parameters, such as the number of I/O modules and the amount of cyclic I/O data. The achievable data throughput for IP tunneling is a key performance indicator. Due to the determinism of the INTERBUS system, the throughput can be easily calculated. The following applies to the INTERBUS medium access control (MAC) cycle time T_IB (in ms):

T_IB = 13 · (6 + N) · (1 / Baud rate) · 1000 + T_SW

where

N = PL_1 + PL_2 + ... + PL_k

The IP throughput (IP_Th) of a device is calculated as follows:

IP_Th = ((M - 1) / T_IB) · 8

where:
N = total payload size: the sum of the user data (in bytes) of all k devices, with N ≤ 512 bytes and k ≤ 512
Baud rate = physical transmission speed of INTERBUS in bit/s (0.5 or 2 Mbps)
T_SW = software runtime (ms) of the master (typically 0.7 ms, depending on the implementation)
PL_i = layer 2 payload of the ith device (in bytes), with 1 ≤ i ≤ k
M = reserved MAC payload (in bytes) for the IP channel of a device (typically M = 8)
IP_Th = throughput (kbps) for IP data telegrams
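As a numerical sketch, the small program below evaluates the two formulas for the medium-size configuration discussed next (N = 125 bytes at 2 Mbps, with the typical values T_SW = 0.7 ms and M = 8 bytes); it reproduces the cycle time of roughly 1.6 ms and the IP throughput of about 36 kbps quoted below.

#include <stdio.h>

/* INTERBUS MAC cycle time in ms, following the formula above
 * (baud rate given in bit/s, software runtime in ms). */
static double interbus_cycle_time_ms(double n_bytes, double baud_bps,
                                     double t_sw_ms)
{
    return 13.0 * (6.0 + n_bytes) * (1.0 / baud_bps) * 1000.0 + t_sw_ms;
}

/* IP throughput in kbps for M bytes of reserved MAC payload per cycle. */
static double ip_throughput_kbps(double m_bytes, double t_ib_ms)
{
    return (m_bytes - 1.0) / t_ib_ms * 8.0;
}

int main(void)
{
    double t_ib = interbus_cycle_time_ms(125.0, 2.0e6, 0.7);
    printf("T_IB  = %.2f ms\n", t_ib);                            /* ~1.55 ms */
    printf("IP_Th = %.1f kbps\n", ip_throughput_kbps(8.0, t_ib)); /* ~36 kbps */
    return 0;
}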
FIGURE 16.14 Performance data: INTERBUS MAC cycle time T_IB (ms) at 2 Mbps and IP throughput IP_Th (kbps), plotted as functions of the total payload N (bytes).
In Figure 16.14, the INTERBUS cycle time at the MAC level and the IP throughput for a baud rate of 2 Mbps are plotted as functions of the payload size N. For a medium-size configuration of N = 125 bytes, an IP throughput of 36 kbps at an INTERBUS cycle time of approximately 1.6 ms is achieved. This roughly corresponds to the quality of an analog modem for Internet access. For smaller configurations, even Integrated Services Digital Network (ISDN) quality can be achieved. The calculated values could be confirmed in a practical application. In this configuration, it should be noted that, in addition to the dedicated process data devices, several IP-compatible devices can be operated at the same time with this throughput.
16.10 Conclusions

The open INTERBUS fieldbus system for modern automation seamlessly connects all the I/O and field devices commonly used in control systems. The serial bus cable can be used to network sensors and actuators, to control machine and system parts, to network production cells, and to connect higher-level systems such as control rooms. After a comprehensive introduction to INTERBUS, IP tunneling for fieldbuses was described for improved connectivity at the field level. An essential requirement is that time-critical process data transport is not affected. The integration of the Internet Protocol in INTERBUS creates a seamless communication platform to enable the use of new IP-based applications, which can make it considerably easier to engineer distributed automation solutions. Analysis has shown that the IP throughput can be as high as ISDN quality.
References
[1] IEC 61158: Digital Data Communication for Measurement and Control: Fieldbus for Use in Industrial Control Systems, IEC, 2001.
[2] Postel, J., RFC 793: Transmission Control Protocol (TCP), USC/Information Sciences Institute, CA, 1981.
[3] Postel, J., RFC 791: Internet Protocol (IP), USC/Information Sciences Institute, CA, 1981.
[4] Feld, J., Realtime communication in PROFINET V2 and V3 designed for industrial purposes, in 5th IFAC International Conference on Fieldbus Systems and Their Applications (FeT 2003), Aveiro, Portugal, July 2003.
[5] Phoenix Contact INTERBUS Slave Implementation Guide: Communication Software PCP, 1997, www.interbusclub.com/itc/sc_down/5193a.pdf.
[6] DIN EN 954-1: Safety of Machinery: Safety-Related Parts of Control Systems: Part 1: General Principles for Design, 1997; German version, 1996.
[7] IEC 61508: Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems: General Requirements, 1999.
[8] INTERBUS Protocol Chip SUPI 3, http://www.interbusclub.com/itc/sc_down/supi3_op_e.pdf, 2004.
[9] Field Device Configuration Markup Language (FDCML), www.fdcml.org, 2004.
17 Data Transmission in Industrial Environments Using IEEE 1394 FireWire

Michael Scholles, Uwe Schelinski, Petra Nauber, Klaus Frommhagen
Fraunhofer Institute of Photonic Microsystems

17.1 Introduction
17.2 IEEE 1394 Basics
17.3 IEEE 1394 System Design
17.4 Industrial Applications of IEEE 1394
17.5 IEEE 1394 Automation Protocol
17.6 Summary
References
17.1 Introduction

Almost every modern distributed technical system requires some kind of digital communication infrastructure with high bandwidth, no matter if it concerns consumer electronics or industrial applications. On one hand, computers and devices for acquisition and reproduction of digital images and sound converge to so-called multimedia systems. On the other hand, industrial control systems incorporate multiple sensors like cameras for optical quality inspection, while control and status information, normally distributed via fieldbuses, run along the same cable as mass data. All these systems have one common characteristic: they require an efficient peer-to-peer data communication mechanism, since they do not inevitably include a computer that can act as a data hub. Even if such a central node exists, sending data from one node to the computer and then forwarding it to another external device is often not the optimum communication pattern.

Modern consumer electronics and communication systems for industrial and factory automation applications have some more features and requirements in common:

High bandwidth: An industrial black-and-white camera with VGA resolution of 640 × 480 pixels, 12-bit color depth per pixel, and a frame rate of 25 Hz, which is often used for automated optical quality inspection, produces a data rate of almost 100 Mbit/s, which can easily be increased by higher geometrical resolution and use of multicamera systems.

Real-time streaming: If the communication system supports a special type of data transfer that ensures bandwidth for real-time streaming of data or a well-defined latency for message transmission, definition of higher-layer protocols is significantly simplified. Standard Ethernet does not fulfill this requirement, which makes it difficult to implement hard real-time applications using this bus technology.
One single cabling: Mass data as well as status and control information shall be exchanged via the same cable that is used for all purposes within the system. In order to reduce the costs for cabling and connectors, a serial data transmission is preferred.

Easy reconfiguration: While plug-and-play capability is self-evident for consumer electronics, this feature can also be very helpful in industrial environments, like a complex measurement system with lots of different signal acquisition and processing devices. If the user just plugs them together without any elaborate setup, setup time is significantly reduced. This requirement also includes that no specific network topology is enforced.

All these requirements are fulfilled by the bus as defined in IEEE 1394 for a high-performance serial bus [1–4], with commercial implementations known as FireWire and i.LINK.* The current version of the standard is made up of three documents: the original standard, IEEE 1394–1995 [1], and the first amendment, IEEE 1394a–2000 [2], are the basis for almost all currently commercially available IEEE 1394 devices. The first amendment remedies flaws in the original standard but provides no new features for the user, whereas the second amendment, IEEE 1394b–2002 [4], completely replaces the lowest, physical layer of the IEEE 1394 protocol stack, a step necessary for extending the maximum speed and adding new media for data transmission.

Because of the features listed above, this bus standard, originally used for consumer electronics and computer peripherals, is becoming more and more popular for industrial and factory automation applications. The following sections describe the fundamental facts of IEEE 1394, a reference design for industrial IEEE 1394 nodes, and some real-world applications for industrial environments. Aspects of new media for data transmission, including optical fibers, will also be covered.

*FireWire is a trademark of Apple Computer, Inc., and the 1394 Trade Association. i.LINK is a trademark of Sony Corporation.
17.2 IEEE 1394 Basics

IEEE 1394 is a serial bus connecting nodes with current data rates up to 800 Mbit/s that can be mixed with older devices running at data rates of 100, 200, and 400 Mbit/s. The standard supports two completely different types of data transfer:
1. Asynchronous transfers are rather short messages that are mainly used for control and setup purposes. Their exchange is controlled by a request-and-response scheme that guarantees data delivery for read or write operations and generates well-defined error codes. This type of communication is used if reliability is more important than timing, as it cannot be exactly determined at which time an asynchronous request of the application is actually sent to the connected node.
2. Isochronous channels are used for mass data that require a fixed, guaranteed bandwidth. The streaming data are divided into packets that are sent every 125 µs. This 8-kHz clock is distributed across the network by a special packet called the cycle start packet. Typical examples of isochronous transfers are real-time video streams, where late data are useless. The reception of the data is not secured; the sender does not even know if any other node is listening to the data.
These two transfer types use a common physical and link layer of the IEEE 1394 protocol stack, as depicted in Figure 17.1. The link layer provides addressing, data checking, and data framing for packet transmission and reception, whereas the physical layer transforms the logical symbols used by the link layer into electrical signals, which includes bus arbitration in case several nodes want to send data at the same time. The isochronous data are directly fed into the link layer; for asynchronous data there is an additional transaction layer as a standard interface for the application on top of the protocol stack. Its main task is to implement the secure protocol using requests and responses for asynchronous transactions. All three layers exchange data with the so-called serial bus management that incorporates some special functions for managing the isochronous bandwidth and optimizing the efficiency of the bus. Normally, the physical and link layers are realized by dedicated chip sets, whereas the transaction layer and serial bus management are implemented in firmware.
FIGURE 17.1 Layered protocol architecture of IEEE 1394.

TABLE 17.1 Supported Media by IEEE 1394
Serial bus management is optional and can normally be omitted in the design of embedded nodes if a host computer with a common operating system is part of the network. But it is necessary that one node with serial bus management capabilities exists in the network; otherwise, no isochronous operation is possible.

In IEEE 1394–1995 and IEEE 1394a–2000, the physical media used for IEEE 1394 were restricted to special shielded twisted-pair (STP) copper cables and IEEE 1394-specific sockets and connectors. For signaling and data exchange between two nodes, a pair of differential signals is used, with additional information coded in the direct current (DC) voltage level. A modified version of the STP cables still exists in IEEE 1394b, but a true differential signaling is now used, and, more important, a number of other media have been added. A matrix with all supported speeds and reach for different kinds of media is shown in Table 17.1.

IEEE 1394b offers two promising options for IEEE 1394 in industrial applications. On one hand, existing CAT5 cabling can be used if a data rate of 100 Mbit/s is sufficient, e.g., if only short status and control messages have to be exchanged. On the other hand, optical media combine the advantages of high data rates with long distance and solve almost all electromagnetic interference (EMI) problems. If the special IEEE 1394 STP cables — for both IEEE 1394a and IEEE 1394b — are used, not only data but also power can be provided. A maximum current of 1.5 A at a typical voltage of 12 V is sufficient for a lot of applications, so that a separate power supply can be omitted for most nodes. IEEE 1394b only affects the physical layer: existing applications that use the original IEEE 1394–1995 standard and the first amendment IEEE 1394a–2000 can easily migrate to IEEE 1394b by just replacing the physical layer; no change of software is required.
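To relate these data rates to the isochronous mechanism described above, the following back-of-the-envelope sketch computes the data rate of the VGA inspection camera mentioned in the introduction and the payload it must place in each 125 µs isochronous cycle. Header and packetization overhead of the isochronous packets are deliberately ignored, so the figures are approximations only.

#include <stdio.h>

int main(void)
{
    /* VGA inspection camera from the introduction:
     * 640 x 480 pixels, 12 bit per pixel, 25 frames per second. */
    const double width = 640.0, height = 480.0;
    const double bits_per_pixel = 12.0, frames_per_second = 25.0;
    const double cycle_s = 125e-6;   /* isochronous cycle, 8 kHz clock */

    double bit_rate = width * height * bits_per_pixel * frames_per_second;
    double bytes_per_cycle = bit_rate * cycle_s / 8.0;

    printf("camera data rate        : %.2f Mbit/s\n", bit_rate / 1e6);
    printf("payload per 125 us cycle: %.0f bytes\n", bytes_per_cycle);
    return 0;
}

The result, roughly 92 Mbit/s or about 1440 bytes per cycle, is well within what an IEEE 1394a link can carry isochronously, which is consistent with the observation later in this chapter that such cameras still use IEEE 1394a.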
All data are transmitted as packets consisting of a header, which describes the type and destination address of the packet, and a payload. The receiver can check the integrity by means of cyclic redundancy check (CRC) values that are also included. The IEEE 1394 standard only specifies how packets are transmitted from one node to another; i.e., it covers only the lower layers of the International Organization for Standardization (ISO)/Open Systems Interconnection (OSI) architecture. The definition of a common interpretation of the payload is left to application-specific transport protocols. Currently, more than 60 additional specifications exist, most of them for audio and video consumer electronics. Only a few of them are actually of interest for industrial applications:
1. Serial Bus Protocol 2 (SBP-2) [5] defines a generic method for asynchronous data exchange between two nodes. Principally, it encapsulates an arbitrary command set, but it is mainly used for the implementation of the SCSI-2 protocol via IEEE 1394. SBP-2 is a very versatile protocol but has significant administrative overhead, so that it is not well suited for extremely high performance applications. This limitation is overcome by a new version of this standard, called SBP-3, that significantly reduces the amount of overhead and also allows isochronous transport of data. The well-known Apple iPod uses SBP-3 for music download.
2. The industrial and instrumentation digital camera (IIDC) 1394-based digital camera specification [6] (DCAM for short) describes how industrial cameras delivering uncompressed video data are accessed by other components, as well as the data format of the image data. The main applications are industrial inspection systems for quality control or the like.
3. The Instrument and Industrial Control Protocol (IICP) specification [7] defines how to implement the IEEE 488 protocol on IEEE 1394.
The situation concerning IEEE 1394 for high-performance industrial control systems is somewhat complicated. Of course, IEEE 1394 is already used for this application area, but existing solutions are based on proprietary protocols. In the meantime, leading European companies selling products for factory automation, motion control, and the like, as well as research institutes working on IEEE 1394, factory automation, and production technologies, have formed the 1394 Automation Association. Their main task was to develop and maintain a standard called 1394AP (1394 Automation Protocol) that allows subsystems from different vendors to communicate with each other via IEEE 1394. Additionally, DCAM-compliant commands and video streams can be embedded into the 1394AP packets. More details and issues concerning the implementation of 1394AP are described in Section 17.5. However, the exact specification of 1394AP will only be accessible to members of the 1394 Automation Association until the first products using 1394AP are out on the market. Afterwards, the responsibility for the protocol will be handed over to the 1394 Trade Association. This process will happen in January 2005.
17.3 IEEE 1394 System Design

This section describes a generic architecture of an embedded IEEE 1394 device, i.e., everything that is not a conventional computer. As stated above, only the lower layers of the IEEE 1394 protocol are realized by dedicated hardware; the upper layers are firmware. Therefore, every IEEE 1394 node must contain some kind of processor that executes the higher layers. This leads to the system architecture depicted in Figure 17.2. Here, an IEEE 1394a solution is still described. However, if IEEE 1394b is necessary, either for speed or because of the necessity of alternative media, only different hardware for the link and physical layers has to be chosen; the overall architecture remains the same.

For the physical layer, any standard PHY chip can be used. A three-port PHY gives the most flexibility for the network topology. IEEE 1394a forbids closed loops within the system, which results in a treelike network topology, whereas IEEE 1394b automatically breaks up loops but also favors treelike networks. A variety of link layer controllers (LLCs) from different manufacturers exists. For industrial applications, the TSB12LV32 GP2Lynx by Texas Instruments is well suited. It provides a memory-mapped interface to a microcontroller for asynchronous data and setup of the IEEE 1394 data transmission, as well as a 16-bit high-speed data port that can send or receive data at the full speed of 400 Mbit/s.
FIGURE 17.2 Hardware architecture of embedded IEEE 1394 device.
Unfortunately, as of this writing (November 2004), no embedded IEEE 1394b LLC capable of handling 800 Mbit/s and dedicated to isochronous transmission is available. Oxford Semiconductor produces a chip called OXUF922 that supports 800 Mbit/s but is mainly intended for mass storage devices like hard drives, as it includes some hardware support for SBP-2 but not for isochronous streaming. One solution might be to use the TSB82AA2 LLC of Texas Instruments, which is IEEE 1394b compliant and has a peripheral component interconnect (PCI) bus interface. If just one application-specific hardware block is connected to the PCI interface, this can be treated as a general-purpose 32-bit-wide interface. The complexity of PCI only arises if it is really used as a bus accessed by numerous components.

An important issue, especially for industrial applications not using the IEEE 1394b standard, is the electrical interface between PHY and LLC. Since in IEEE 1394a DC voltage levels communicate some information between the PHYs, all PHYs are directly connected with each other. If the nodes are powered from an external supply, they might operate at different ground levels, which may produce a current flow on the IEEE 1394 cables. In order to prevent faulty operation of the bus or even damage to the nodes, a galvanic isolation between the LLC and the PHY is strongly recommended. Normally, if the chips include some bus holder circuitry, a capacitor inserted in the data and control lines is sufficient. Of course, the problem of additional galvanic isolation vanishes if IEEE 1394b and optical media are used. Two different kinds of optical media are supported. One solution is plastic optical fibers (POFs), which are relatively easy to install. However, commercially available LED-based optical transceivers that are necessary for inexpensive designs are limited to 200 Mbit/s data rates. For high-performance applications, glass optical fibers (GOFs) have to be used. In this case, Gigabit Ethernet transceivers can be used for IEEE 1394b systems.

A crucial point is the choice of the processor for the IEEE 1394 node. Usually, the data handling is split: the processor executes the IEEE 1394 stack, cares about the asynchronous data, and controls additional low-bandwidth hardware via programmed bit I/O, whereas the isochronous mass data are processed by some application-specific hardware. In this case, a 16-bit microcontroller like the SABC161PI by Infineon gives sufficient computing power. Another processor that has become very popular for embedded IEEE 1394 devices is the ARM7 type. Of course, the necessary amount of memory depends on the application; for the IEEE 1394 stack, 64 KByte of both RAM and ROM are sufficient. The use of a field programmable gate array (FPGA) for the hardware processing of mass data provides flexibility, so that the same design can be used for a broad range of applications. Some buffering of data is required at the high-speed port of the LLC in order to transform the streaming data into isochronous packets or vice versa. Some LLCs have a sufficient amount of internal first in first out (FIFO) memory; others, like the GP2Lynx, require an external buffer realized by a dual-ported RAM or a FIFO. Another solution is to implement the FIFO inside the FPGA.

The architecture of the software implementation of the protocol stack (Figure 17.3) represents the layer structure as defined in the standard. However, two additional layers are inserted that both provide a universal programming interface.
The embedded application closely interacts with the serial bus management, transaction layer, and link layer. In order to ease the coding of the application for the user, a common Application Programming Interface (API) must be provided. API calls include routines for initialization of the bus, the basic asynchronous transactions (read, write, and lock), setup of isochronous transfers, and inquiry of information on the status of the bus and the local node as well as callback
FIGURE 17.3 Embedded IEEE 1394 protocol stack.
functions that are automatically executed in case of external bus events. The latter generate responses to incoming requests without explicit programming in the application. However, the API, transaction layer, and serial bus management shall be independent of the link layer controller in order to be usable for different embedded systems. Therefore, an additional hardware abstraction layer (HAL) is used that translates service requests to the link layer into corresponding hardware accesses. No embedded real-time operating system is required; time-critical tasks are scheduled via timers of the microcontroller.

For proper operation of IEEE 1394 in industrial environments, a number of requirements must be fulfilled. First of all, it is important to follow the rules of the PHY manufacturers for PCB design, which include a short distance between PHY and LLC, etch traces that match the line impedance of the cable, and avoidance of vias between the PHY and the IEEE 1394 connectors. Second, galvanic isolation is self-evident for IEEE 1394a, as described above. Third, EMI protection of the power supply must be applied. Fourth, shielding and grounding have to be accurate. If these rules are observed, stable operation of IEEE 1394 nodes can be guaranteed even under harsh industrial conditions. Some examples are described in the following sections.
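To give a concrete impression of the universal programming interface and the HAL just described, the following C sketch lists the kind of calls such an embedded API might offer. All type and function names are illustrative assumptions and are not taken from any particular vendor's protocol stack.

#include <stdint.h>
#include <stddef.h>

/* Illustrative status codes and bus event types (assumed, not from the standard). */
typedef enum { BUS_OK, BUS_TIMEOUT, BUS_ERROR } bus_status_t;
typedef enum { EVT_BUS_RESET, EVT_REQUEST_RECEIVED, EVT_CYCLE_LOST } bus_event_t;

/* Callback executed automatically on external bus events (see text). */
typedef void (*bus_event_cb_t)(bus_event_t event, void *context);

/* Initialization and inquiry of bus and node status. */
bus_status_t ieee1394_init(void);
bus_status_t ieee1394_get_node_info(uint16_t *local_node_id, uint8_t *node_count);

/* Basic asynchronous transactions: read, write, and lock. */
bus_status_t ieee1394_read (uint16_t node, uint64_t addr, void *buf, size_t len);
bus_status_t ieee1394_write(uint16_t node, uint64_t addr, const void *buf, size_t len);
bus_status_t ieee1394_lock (uint16_t node, uint64_t addr, uint32_t arg, uint32_t *result);

/* Setup of an isochronous transfer on a given channel with a requested payload per cycle. */
bus_status_t ieee1394_iso_setup(uint8_t channel, uint32_t bytes_per_cycle);

/* Registration of callback functions for incoming requests and bus events. */
bus_status_t ieee1394_register_callback(bus_event_cb_t cb, void *context);

In such a design, the hardware abstraction layer would translate each of these calls into register accesses of the concrete link layer controller, so that the API, transaction layer, and serial bus management can remain unchanged when the LLC is exchanged.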
17.4 Industrial Applications of IEEE 1394

Since the most distinguishing feature of IEEE 1394 is its high bandwidth (currently a maximum of 50 MByte/s for IEEE 1394a and 100 MByte/s for IEEE 1394b), it is mainly used in industrial systems that comprise some kind of image sensor. Image sensors produce a huge amount of data that must be transmitted to an image processing system in real time; a typical example is the automated optical quality control of goods. Therefore, the following examples refer to this kind of application. All of them still use IEEE 1394a. The application of IEEE 1394 as a fieldbus replacement is covered in a separate section.

The first example is an industrial camera that contains a CMOS (complementary metal oxide semiconductor) image sensor. By using special readout electronics, a dynamic range of 120 dB per pixel is achieved, which allows the capture of images containing both extremely bright and extremely dark regions, like the welding scene in Figure 17.4. The comparison with an equivalent CCD image (also in Figure 17.4) shows that no artifacts like blooming or smearing occur in the CMOS image. The main applications of this camera will be optical inspection in industrial production, like the welding scene in Figure 17.4, but also automotive applications, such as driver assistance. Here, intense sunlight and shadows produce images with enormous contrast that cannot be acquired with conventional charge coupled device (CCD) cameras. The IEEE 1394 interface of this camera uses the hardware and software described in the previous section. The camera operates in accordance with the DCAM standard. Since a number of interface cards, image processing systems, and software tools that support the DCAM standard are already available, this camera can be used for a broad range of industrial inspection systems. Since another application is driver assistance in passenger cars, the special requirements of automotive electronics must
FIGURE 17.4 Welding scene acquired with conventional CCD camera (left image) and advanced CMOS camera (right image).
be fulfilled. Special IEEE 1394 latching connectors and chip sets for extended temperature ranges are commercially available.

A second application is a so-called data logger for a machine that measures the outer contours of rotationally symmetrical workpieces such as crankshafts. The workpiece is placed between an infrared (IR) lighting unit and a number of CCD line cameras, whose number (a maximum of 10) depends on the diameter of the workpiece. Both the lighting and the cameras are moved along the axis of the workpiece in a synchronous manner, as shown in Figure 17.5. The plug-and-play capability of IEEE 1394 allows an easy reconfiguration of the system for different types of workpieces. Another reason for using IEEE 1394 in this application is bandwidth: each camera has 2048 pixels with 12-bit color depth and operates at a maximum pixel clock of 20 MHz. The resulting data rate exceeds the bandwidth of IEEE 1394a already for one camera, so areas outside the region of interest are clipped using a lookup table and run-length encoded by hardware. This task, together with the transformation of pixel data and additional mechanical data into IEEE 1394 packets, is performed by circuitry implemented in the FPGA. The packets are written into the FIFO and transmitted via the IEEE 1394 subsystem shown in Figure 17.2. Concerning the type of transaction, a trade-off must be made between asynchronous packets, which guarantee a secure transmission at the cost of bandwidth for response packets, and isochronous transfers, which lead to high bus throughput but lack feedback on success or failure. For reasons of transmission reliability, asynchronous transfers in combination with a proprietary transport protocol have been chosen. However, first experiments have shown that although the data logger is able to generate data at up to 30 MByte/s, the connected PC that performs the image analysis cannot handle such an amount of data over a long period. Therefore, it is likely that a second version will use isochronous channels for data transmission.

Prototype systems of the data logger have been realized (Figure 17.6) and successfully tested. Due to mechanical constraints, only a 6-inch-square printed circuit board (PCB) area can be used per camera. The complete system has passed intensive EMC/ESD (electromagnetic compatibility/electrostatic discharge) testing, since all rules mentioned in the previous section have been strictly followed. The machine operates in the immediate vicinity of machine tools, but no faulty operation of the IEEE 1394a bus has been observed so far.

The third device is a so-called color photo scanner for extremely high-resolution image acquisition. It is used for applications in which a short acquisition time is not the most important issue, but rather precise imaging of fine details, as in the optical inspection of printed circuit boards. The design goal has been a geometrical resolution of more than 8000 pixels for the shorter edge of the image area and a color depth
FIGURE 17.5 Working principle of optical measurement system.
FIGURE 17.6 Module for data compression and transmission used in optical measurement machine.
of 12 bits per elementary color. In order to avoid the enormous cost of a corresponding CCD area sensor, a mixed mechanical-optical principle has been chosen. An arbitrary three-dimensional scene is projected onto a focusing screen via a lens, and the resulting two-dimensional image is scanned with a CCD line sensor that is moved by a step motor. This results in an image size of 8,192 × 12,000 pixels. In order to reduce the scanning time to the physical minimum (integration time per line × number of lines), two hardware tasks must operate in parallel: one controls the CCD sensor, the motor, and the analog and digital image processing circuitry, and the other handles the transmission of data via IEEE 1394.

The architecture of the color photo scanner (see the photo of the final product in Figure 17.7) is derived from the reference design described in the previous section. The two hardware tasks are implemented inside the FPGA. Instead of the FIFO, a fast SRAM (static random access memory) organized as a cyclic buffer is used as the interface between these two tasks. The design makes use of the fact that the high-speed port of the LLC can also handle asynchronous packets at high speed, so that the microcontroller does not have to deal with the mass data. It only executes the IEEE 1394 protocol stack and an implementation of the SBP-2 protocol that is used for control and setup of the scanner; the microcontroller provides enough computing power for that purpose. The device, which has become a commercial product, is able to capture a complete image within 90 s, which can only be achieved through the hardware support of the FPGA. All necessary parts fit into the modified housing of a single lens reflex (SLR) camera.
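The cyclic buffer that decouples the two hardware tasks is essentially a ring buffer. The following C sketch is purely illustrative (in the actual device the buffer lives in the external SRAM and is accessed by FPGA logic); all names and the buffer size are assumptions.

#include <stdint.h>
#include <stdbool.h>

#define BUF_SIZE 4096u   /* capacity in bytes; the real size depends on the SRAM device */

typedef struct {
    uint8_t  data[BUF_SIZE];
    volatile uint32_t head;  /* next position written by the acquisition task */
    volatile uint32_t tail;  /* next position read by the transmission task  */
} ring_buffer_t;

/* Producer side: the acquisition task appends one byte of pixel data.
 * Returns false if the buffer is full, i.e., the consumer has fallen behind. */
static bool rb_put(ring_buffer_t *rb, uint8_t byte)
{
    uint32_t next = (rb->head + 1u) % BUF_SIZE;
    if (next == rb->tail)
        return false;            /* full: one slot is deliberately left empty */
    rb->data[rb->head] = byte;
    rb->head = next;
    return true;
}

/* Consumer side: the transmission task removes one byte for packetization.
 * Returns false if no data is available. */
static bool rb_get(ring_buffer_t *rb, uint8_t *byte)
{
    if (rb->tail == rb->head)
        return false;            /* empty */
    *byte = rb->data[rb->tail];
    rb->tail = (rb->tail + 1u) % BUF_SIZE;
    return true;
}

Because one side only ever advances head and the other only tail, the two tasks can operate concurrently without further synchronization, which is what makes this structure attractive as an interface between an acquisition task and a transmission task.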
FIGURE 17.7 Color photo scanner.
17.5 IEEE 1394 Automation Protocol

The previous section describes some industrial applications of IEEE 1394 but leaves real industrial communication networks out of consideration. This application field has been dominated by proprietary solutions in the past. Only recently has the situation changed with the passing of the 1394 Automation Protocol (1394AP). Its theoretical principles are defined in [8].

In industrial automation, a trend toward decentralized systems can be observed over the last few years. In conjunction with the growing capabilities of embedded hardware, most of the system functionality is transferred from the central control unit to distributed control units. A typical example is motion control: the system consists of several input/output modules and decentralized control units that drive the axes. One control node is designated as the host control. This node is responsible for maintaining the network, which includes start-up and parameterization, control and supervision of all system components, and network management. Most of the time, the host control is implemented on an industrial computer, but this task may be assigned to any one of the control units.

Several buses have been established in industrial control in order to connect the decentralized modules, like Ethernet and its derivatives, Profibus, CAN, SERCOS, and others. All these technologies lack one important feature: there is no method for the cyclic, deterministic distribution of control information with guaranteed delivery at fixed intervals, which is a prerequisite for industrial and especially motion control systems. However, exactly this type of information exchange is one of the characteristics of IEEE 1394: its isochronous mode of data transmission. Therefore, a number of companies selling products for factory automation have integrated IEEE 1394 in their equipment, but they are still using proprietary protocols, so that interoperability is not ensured. This limitation shall be overcome by 1394AP. Its specification as well as the development of a reference implementation has been initiated by the European 1394 Automation Association, which consists of more than a dozen factory automation companies and research institutes.

IEEE 1394 matches the requirements on communication in industrial control systems:
• The communication network has to be fast, robust, and inexpensive.
• IEEE 1394 supports distributed architectures like the tree structure. Isochronous and asynchronous transmission schemes permit unrestricted communication between the system components.
• IEEE 1394 itself already provides the cycle clock synchronization, which is necessary in order to control specific data exchange.
• The low jitter of IEEE 1394 is one reason this bus is attractive for industrial control.
• The variety of IEEE 1394 devices guarantees independence from a single supplier and its price policy.

Therefore, IEEE 1394 is an almost ideal choice for industrial communication networks. However, specific features like network topology, application-specific control and status registers, packet payload, network management, and message synchronization have to be defined, and these make up 1394AP. Because
of the special requirements of industrial communication, 1394AP differs from the usual properties of an IEEE 1394-based transport protocol. One node of the network acts as the so-called application master. Most of the time it will be the control PC of the network, but any other node with sufficient computing power may take over this function. The capability of becoming application master is stated by special entries in the configuration ROM of the 1394AP node. Since the application master is responsible for the synchronization of all nodes and the usual IEEE 1394 cycle start packets are used for this purpose, the root of the treelike network shall become the application master. All nonroot nodes are called slaves.

The main task of the application master is the cyclic transfer of input data to the slaves. This information is summarized in the master data telegram (MDT), which is the payload of an IEEE 1394 packet. The update rate of the control variables can be adjusted via the 1394AP-specific application cycle, whose length is based on the requirements of the application. For high-speed applications, the application cycle is identical with the standard IEEE 1394 isochronous cycle of 125 µs. In this case, the MDT forms the payload of isochronous packets, which are sent immediately after the cycle start packet. In the case of reduced performance requirements, the application cycle is spread over several isochronous cycles. Then, for the MDT, both isochronous and asynchronous packets can be used: in the first case, each packet only carries part of the complete MDT; in the second case, asynchronous broadcast write requests are used. The MDT of 1394AP only defines a data structure for the transmission of application-specific variables; it does not specify the meaning of the variables. This is left to the application, which gives flexibility for the use of 1394AP.

While the slaves receive the MDT and extract the data that are necessary for proper operation, they output their own data via device data telegrams (DDTs). The DDTs transfer the output data both to the master and to the other slaves, so that real peer-to-peer data transfer is ensured. The remarks on the application cycle for the MDTs are also valid for DDTs. Each node provides network management services, which include node activation, suspension, configuration, initialization, and reset, as well as error handling. The network master (identical with the application master) controls changes in the state of the nodes.

One of the main reasons why IEEE 1394 has been chosen as a control bus for real-time systems is its deterministic timing via isochronous data transfer. The clocks running in each individual node are synchronized every 125 µs by the cycle start packets. Data to be sent as either MDT or DDT are held in local buffers and marked with a time stamp. The time information contained in the cycle start packets is used as a trigger that releases the data for transmission. One problem arises if isochronous and asynchronous data transfers are mixed: after a cycle start, first all isochronous packets are sent, and the remaining part of the cycle is used for asynchronous data. However, IEEE 1394 does not check whether the transfer of an asynchronous packet can be terminated within 125 µs of the last cycle start. If not, the next cycle start packet will be delayed. For 1394AP, this results in nondeterministic timing, which cannot be accepted.
Therefore, some precalculation of the overall bandwidth has to be carried out, which prevents the delay by managing the limited asynchronous resources among all nodes of the network. Isochronous transport is preferred in any case.

The MDTs and DDTs define the software interface of 1394AP to the application. In order to ease the migration from other bus solutions to 1394AP without having to rewrite major parts of the application software, well-known communication profiles will be added to 1394AP. As a first example, the CANopen Communication Profile [9] has been taken into consideration and implemented in 1394AP.

For first experiments, a communication node that supports features of a preliminary version of the 1394AP protocol in hardware has been designed. It can only be used as a slave node. In principle, it again uses the system design described in Section 17.3, but it comprises a different link layer controller because of the special communication pattern defined in 1394AP. The GP2Lynx is only capable of either sending or receiving isochronous data, but not both simultaneously, as required for MDTs and DDTs. Therefore, the Texas Instruments TSB42AB4 CeLynx device is used in this design. Originally targeted at consumer electronics, it is also well suited for industrial applications and supports full-duplex isochronous streaming.
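To illustrate the bandwidth precalculation mentioned above, the following C sketch shows a deliberately simplified admission check. The byte-per-microsecond figure and the function name are assumptions; a real 1394AP bandwidth manager would also have to account for packet headers, arbitration gaps, and the registers of the isochronous resource manager.

#include <stdint.h>
#include <stdbool.h>

#define CYCLE_US          125u    /* nominal IEEE 1394 isochronous cycle length        */
#define S400_BYTES_PER_US  50u    /* approx. 400 Mbit/s corresponds to 50 bytes per us */

/* Simplified admission check: does the isochronous traffic of one cycle plus one
 * pending asynchronous packet still fit into a 125 us cycle at S400 speed? */
static bool cycle_fits(uint32_t iso_payload_bytes, uint32_t async_packet_bytes)
{
    uint32_t budget_bytes = CYCLE_US * S400_BYTES_PER_US;   /* roughly 6250 bytes */
    return (iso_payload_bytes + async_packet_bytes) <= budget_bytes;
}

An application master could run such a check before granting asynchronous transmission opportunities to slave nodes, so that no cycle start packet is ever delayed.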
FIGURE 17.8 Architecture of node for synchronous industrial communication via 1394AP.
FIGURE 17.9 Node for synchronous industrial communication via IEEE 1394.
The resulting schematic of the node is shown in Figure 17.8. The FPGA filters the data of incoming MDTs and DDTs, so that the microcontroller (an Infineon C167CS) only has to handle the relevant data. Outgoing DDTs are sent as asynchronous packets and are written directly into the link layer controller by the microcontroller. Figure 17.9 shows the industrial communication node. It is a stacked design using two PCBs; therefore, part of the circuitry is not visible in Figure 17.9. Experiments with this embedded system have shown that 1394AP is suited for industrial real-time communications, especially for motion control systems.
17.6 Summary

The four examples in the last two sections show that IEEE 1394 is an emerging and well-suited technology for industrial applications. However, some minor problems exist:
1. Currently, almost all IEEE 1394 devices use IEEE 1394a, which limits the distance between two nodes to 4.5 m, as defined by the standard, while high-quality cables allow up to 10 m. For large industrial systems this still might not be sufficient.
2. The DC coupling of the physical layers in IEEE 1394a implies the danger of disturbance problems, especially in harsh environments.
3. The number of devices on one bus is limited to 63, which may not be enough for certain applications. IEEE 1394 bridges that can connect up to 1023 buses are currently in the specification phase, but it will take some time until the first devices are available.
4. If IEEE 1394 is used inside machines with little space left for cabling, the conventional cables and connectors are not optimal. Smaller IEEE 1394 connections for short distances, for instance via PCB stacks or flexible printed tapes, are desirable.
The first two items will vanish with the widespread use of the IEEE 1394b amendment; the third one is tackled by the upcoming IEEE 1394.1 specification. Unfortunately, the last item remains a field of design experiments, because a standardized solution does not exist at the moment. These items will not prevent the use of IEEE 1394 for industrial and factory automation applications. The basic technology, consisting of IEEE 1394 chip sets, hardware reference designs, and software protocol stacks, is already available, and the first commercial applications have been realized. Therefore, it is very likely that a large number of industrial systems will incorporate IEEE 1394 for data transmission in the near future. The most important limitations of the original standard, namely the DC coupling of nodes and the limited reach of 4.5 m between two nodes via copper cables, have been overcome by the IEEE 1394b amendment. In summary, IEEE 1394 is already more than an emerging technology and can be expected to see widespread industrial use very soon.
References
[1] IEEE, IEEE Standard for a High-Performance Serial Bus 1394–1995, IEEE Press, New York, 1996.
[2] IEEE, IEEE Standard for a High-Performance Serial Bus Amendment 1 1394a–2000, IEEE Press, New York, 2000.
[3] D. Anderson, FireWire System Architecture, 2nd edition, Addison-Wesley, Reading, MA, 1999.
[4] IEEE, IEEE Standard for a High-Performance Serial Bus Amendment 2 1394b–2002, IEEE Press, New York, 2002.
[5] Serial Bus Protocol 2 (SBP-2), ANSI Standard NCITS 325-1998.
[6] IIDC 1394-Based Digital Camera Specification, Version 1.30, available from the 1394 Trade Association (see http://www.1394ta.org/Technology/Specifications/specifications.htm).
[7] 1394TA IICP Specification for the Instrument and Industrial Control Protocol, Version 1.00, available from the 1394 Trade Association (see http://www.1394ta.org/Technology/Specifications/specifications.htm).
[8] G. Beckmann, Ein Hochgeschwindigkeits-Kommunikationssystem für die industrielle Automation, dissertation, Technical University of Braunschweig, 2001 (in German).
[9] CANopen Communication Profile for Industrial Systems, CiA Draft Standard 301, Revision 3.0, available from CAN in Automation e.V.
18 Configuration and Management of Fieldbus Systems

Stefan Pitzek, Vienna University of Technology
Wilfried Elmenreich, Vienna University of Technology

18.1 Introduction
18.2 Concepts and Terms: Configuration vs. Management • Smart Devices • Plug and Play vs. Plug and Participate • State
18.3 Requirements on Configuration and Management
18.4 Interface Separation: The Interface File System Approach
18.5 Profiles, Data Sheets, and Descriptions: Profiles • Electronic Data Sheets
18.6 Application Development
18.7 Configuration Interfaces: Hardware Configuration • Plug and Participate • Application Download
18.8 Management Interfaces: Monitoring and Diagnosis • Calibration
18.9 Maintenance in Fieldbus Systems
18.10 Conclusion
References
18.1 Introduction

Fieldbus systems are often evaluated by their technical merits, like performance, efficiency, and suitability for a particular application. Being designed to perform control applications, most industrial communication networks are well capable of performing their respective application tasks. Besides these ostensible criteria, however, there are some other capabilities a fieldbus system must provide, which in some cases might actually have a greater influence on the usability of a particular system than the technical ability to fulfill the given control requirements. These capabilities deal with configuration and management, i.e., the setup, configuration, monitoring, and maintenance of the fieldbus system.

Powell [32] describes the problematic situation in the past: “Fifteen years ago, a typical process automation plant consisted of various field devices from half a dozen of vendors. Each device had its own setup program with different syntax for the same semantics. The data from the devices often differed in the data formats and the routines to interface each device.” Since that time, many concepts and methods have been devised in order to support these configuration and management tasks. Many of the concepts have been implemented in fieldbus technologies such as HART (highway addressable remote transducer), Profibus, Foundation Fieldbus, LON (local operating network), etc.
It is the objective of this chapter to give an introduction to the state-of-the-art concepts and methods for the configuration and management of fieldbus systems. The remainder of the chapter is organized as follows: Section 18.2 gives definitions of the concepts and terms in the context of configuration and management of fieldbus systems. Section 18.3 investigates the requirements for configuration and management tasks. Section 18.4 analyzes the necessary interfaces of a field device and proposes a meaningful distinction of interface types. Section 18.5 discusses profiles and other representation mechanisms for system properties in several fieldbus systems. Section 18.6 gives an overview of application development methods and their implications for configuration and management of fieldbus networks. Section 18.7 examines the initial setup of a system in terms of hardware and software configuration. Section 18.8 deals with approaches for the management of fieldbus systems, like application download, diagnosis, and calibration of devices. Section 18.9 presents maintenance methods for reconfiguration, repair, and reintegration of fieldbus devices.
18.2 Concepts and Terms

The purpose of this section is to introduce and define some important concepts and terms that are used throughout this chapter.
18.2.1 Configuration vs. Management

The term configuration is used for a wide range of actions. Part of the configuration deals with setting up the hardware infrastructure of a fieldbus network and its nodes, i.e., physically connecting nodes (cabling) and configuring nodes in a network (e.g., by using switches or jumpers). On the other hand, configuration also involves setting up the network on the logical (i.e., software) level. Depending on the network topology and the underlying communication paradigm (and other design decisions), this leads to very different approaches to how configuration mechanisms are implemented.

In contrast, management deals with handling an already built system and includes maintenance, diagnosis, monitoring, and debugging. As with configuration, different fieldbus systems can greatly differ in their support and capabilities for these areas. Often configuration and management are difficult to separate, since procedures such as plug and play (see Section 18.2.3) involve configuration as well as management tasks.
18.2.2 Smart Devices

The term smart or intelligent device was first used in this context by Ko and Fung [21], meaning a sensor or actuator device that is equipped with a network interface in order to support easy integration into a distributed control application. In the context of fieldbus systems, a smart device supports its configuration and management by providing its data via a well-defined network interface [23] or by offering a self-description of its features. The description usually comes in a machine-readable form (e.g., as an Extensible Markup Language (XML) description) that resides either locally at the fieldbus device (e.g., IEEE 1451.2 [17]) or at a higher network level, being referenced by a series number (e.g., OMG Smart Transducer Interface [26]).
18.2.3 Plug and Play vs. Plug and Participate

Plug and play describes a feature for the automatic integration of a newly connected device into a system without user intervention. While this feature works well for personal computers within an office environment, it is quite difficult to achieve this behavior for automation systems, since without user intervention the system would not be able to guess what sensor data should be used and what actuator should be instrumented by a given device. Therefore, in the automation domain the more correct term plug and participate should be used, describing the part of the initial configuration and integration of a new device that can be automated. For example, after connecting a new sensor to a network, it could be automatically detected,
given a local name, and assigned to a communication slot. The task of the human system integrator is then reduced to deciding on the further processing and usage of the sensor data.
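As a purely illustrative sketch of such a plug-and-participate sequence (all function names are hypothetical and do not refer to any particular fieldbus standard), the automatable part might look as follows in C:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers of a configuration tool. */
extern bool    network_detect_new_node(uint64_t *unique_id);   /* e.g., via a periodic scan */
extern uint8_t network_assign_local_name(uint64_t unique_id);  /* short logical address     */
extern uint8_t network_next_free_slot(void);
extern void    network_assign_comm_slot(uint8_t local_name, uint8_t slot);

/* Plug and participate: detect, name, and schedule a newly connected device.
 * Deciding how the new sensor data are used afterwards is left to the human integrator. */
static void plug_and_participate(void)
{
    uint64_t unique_id;
    if (network_detect_new_node(&unique_id)) {
        uint8_t name = network_assign_local_name(unique_id);
        network_assign_comm_slot(name, network_next_free_slot());
    }
}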
18.2.4 State

Zadeh states that the “notion of state of a system at any given time is the information needed to determine the behavior of the system from that time on” [40, p. 3]. In real-time computer systems, we distinguish between the initialization state (i-state) and the history state (h-state) [22]. The i-state encompasses the static data structure of the computer system, i.e., data that are usually located in the static (read-only) memory of the system. The i-state does not change during the execution of a given application; an example is the calibration data of a fieldbus node. The h-state is the “dynamic data structure … that undergoes change as the computation progresses” [22, p. 91]. An example of an h-state is the cached results of a sequence of measurements that are used to calculate the current state of a process variable. The size of the h-state at a given level of abstraction may vary during execution. A good system design will aim at having a ground state, i.e., an instant at which the size of the h-state becomes zero. In a distributed system, this usually requires that no task is active and no messages are in transit.
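As a concrete illustration of the two kinds of state (the names and values below are assumptions, not taken from the cited references), a fieldbus node might hold:

#include <stdint.h>

/* i-state: static configuration that does not change during execution;
 * typically placed in read-only memory (e.g., calibration data). */
static const struct {
    int16_t offset;
    int16_t gain;
} calibration = { .offset = -12, .gain = 1023 };

/* h-state: dynamic data that changes as the computation progresses,
 * e.g., a cache of recent measurements used to smooth a process variable. */
static uint16_t history[8];
static uint8_t  history_count;   /* the node reaches its ground state when this is 0
                                    and no message is in transit on the bus */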
18.3 Requirements on Configuration and Management

The requirements on a configuration and management framework are driven by several factors. We have identified the following points:

• (Semi)automatic configuration: The requirement for a plug-and-play-like configuration can be justified by three arguments:
1. An automatic or semiautomatic configuration saves time and therefore leads to better maintainability and lower costs.
2. The necessary qualification of the person who sets up the system may be lower if the overall system is easier to configure.
3. The number of configuration faults will decrease, since monotonous and error-prone tasks like looking up configuration parameters in heavy manuals are done by the computer.
In most cases, a fully automatic configuration will only be possible if the functionality of the system is reduced to a manageable subset. For more complex applications, consulting the human mind is unavoidable. Thus, we distinguish two cases: (i) The automatic setup of simple subsystems. This use case mostly deals with systems that require an automatic and autonomous (i.e., without human intervention) reconfiguration of network and communication participants in order to adapt to different operating environments. Usually, such systems either use very sophisticated (and often costly) negotiation protocols or work only in closely bounded and well-known application domains. (ii) The computer-supported configuration of large distributed systems. This case is the usual approach.

• Comprehensible interfaces: In order to minimize errors, all interfaces will be made as comprehensible as possible. This includes the uniform representation of data provided by the interfaces and the capability of selectively restricting an interface to the data required by its user. The comprehensibility of an interface can be expressed by the mental load that it puts onto the user. Different users need different specialized interfaces, each with a minimum of mental load. For example, an application developer mostly has a service-centered view of the system. Physical network details and other properties not relevant for the application should be hidden from the developer [27].

• Uniform data structures: The configuration and management of fieldbus systems require representations of system properties that are usable by software tools. In order to avoid a situation where each application deals with the required information in its own way, these representations should be generic, highly structured, and exactly specified.
• Low overhead on the embedded system: Fieldbus systems employ embedded hardware for reasons of cost, size, power consumption, and mechanical robustness. Such embedded hardware usually provides far less memory and processing power than average desktop systems. Currently, typical microcontrollers provide a few hundred bytes of RAM and a few kilobytes of Flash ROM. Clocked by an internal oscillator, these microcontrollers provide about 0.5 to 16 MIPS of processing power. Therefore, the designers of configuration and management tools must take care that there is as little overhead on the embedded system nodes as possible (e.g., static data required for management should be stored in a central repository outside the network).

• Use of standard software/hardware: Computers running standard Windows or Linux operating systems do not provide guaranteed response times for programs, and most hardware interfaces are controlled by the operating system. Since this might violate the special timing requirements of a fieldbus protocol, it is often not possible to directly connect a configuration host computer to the fieldbus network using the fieldbus protocol itself. Instead, a configuration tool must use some other means of communication, such as standard communication protocols or interfaces like Transmission Control Protocol (TCP)/Internet Protocol (IP), RS232, universal serial bus (USB), or standard middleware like CORBA (Common Object Request Broker Architecture). Since fieldbus nodes might not be powerful enough to implement these mechanisms, communication will often be performed using dedicated gateway nodes. In order to reduce the complexity of the involved conversion and transformation steps, the interface to and from the fieldbus node must be comprehensible, structurally simple, and easy to access.
18.4 Interface Separation

If different user groups access a system for different purposes, they should only be provided with interfaces to the information relevant for their respective purposes [33]. Interfaces for different purposes may differ in the accessible information and in the temporal behavior of the access across the interface. Kopetz et al. [23] have identified three interfaces to the transducer nodes of a fieldbus:

1. The configuration and planning (CP) interface allows the integration and setup of newly connected nodes. It is used to generate the “glue” in the network that enables the components of the network to interact in the intended way. Usually, the CP interface is not time critical.
2. The diagnostic and management (DM) interface is used for the parameterization and calibration of devices and to collect diagnostic information to support maintenance activities. For example, a remote maintenance console can request diagnostic information from a certain sensor. The DM interface is usually not time critical.
3. The real-time service (RS) interface is used to communicate the application data, e.g., sensor measurements or set values for an actuator. This interface usually has to fulfill timing constraints such as a bounded latency and a small communication jitter. The RS interface has to be configured by means of the CP (e.g., communication schedules) or DM (e.g., calibration data or level monitors) interface.

The TTP/A (time-triggered protocol for SAE class A applications) fieldbus system [24] uses time-triggered scheduling, which provides a deterministic communication scheme for the RS interface. A specified part of the bandwidth is reserved for arbitrary CP and DM activities. Therefore, it is possible to perform configuration and planning tasks while the system is in operation, without a probe effect on the real-time service [15].
18.4.1 The Interface File System Approach

The concept of the Interface File System (IFS) was introduced by Kopetz et al. [23]. The IFS provides a unique addressing scheme to all relevant data of the nodes in a distributed system. Thus, the IFS maps
real-time data, all kinds of configuration data, self-describing information, and internal state reports for diagnosis purposes. The IFS is organized hierarchically as follows: the cluster name addresses a particular fieldbus network; within the cluster, a specific node is addressed by the node name; the IFS of a node is structured into files and records, where each record is a unit of four bytes of data. The IFS is a generic approach that has been implemented with the TTP/A protocol [24] as a case study for the OMG Smart Transducer Interface. The IFS approach provides good support for the integration and management of heterogeneous fieldbus networks. The IFS provides the following benefits:

• It establishes a well-defined interface between network communication and the local application. The local application uses API (Application Programming Interface) functions to read and write data from or into the IFS. The communication interface accesses the IFS to exchange data across the network.
• The IFS hides network communication from the node application and provides location transparency for a message, since a task does not have to discriminate between data that are locally provided and data that are communicated via the network.
• Since the configuration and management data are also mapped into the IFS, configuration and management tools can directly use the CORBA STI (smart transducer interface) for accessing this information from outside the network.

Figure 18.1 depicts an architecture with configuration and management tools that access the IFS of a fieldbus network from the Internet.
FIGURE 18.1 Architecture for remote configuration and monitoring.
The IFS maps real-time service data, configuration data, and management data all in the same way. In fact, the management interface can be used to define the real-time service data set dynamically (e.g., to select between a smoothed value or a dynamic value as the result from a sensor). While it is required to provide real-time guarantees for communication of real-time data, the access to configuration and management data is not time critical. This enables the employment of Web-based tools for remote maintenance. Tools that interface with the IFS have been implemented using CORBA as middleware. CORBA is an object model managed by the Object Management Group (OMG) that provides transparent communication among remote objects. Objects can be implemented in different programming languages and can run on different platforms. The standardized CORBA protocol IIOP (Internet Inter-ORB Protocol) can be routed over TCP/IP, thus supporting worldwide access to and communication between CORBA objects across the Internet. Alternatively, it is possible to use Web Services as the management interface to a fieldbus network. A case study that implements Web Services on top of the IFS of a fieldbus is described in [36].
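A minimal C sketch of the hierarchical IFS addressing scheme described above is given below; the field widths are assumptions for illustration, since the exact encoding is defined by the OMG STI and TTP/A specifications.

#include <stdint.h>

/* Hierarchical IFS address: cluster, node, file, record (see Section 18.4.1). */
typedef struct {
    uint8_t cluster;   /* selects a particular fieldbus network                 */
    uint8_t node;      /* selects a node within that cluster                    */
    uint8_t file;      /* selects a file in the node's Interface File System    */
    uint8_t record;    /* selects a record; each record holds 4 bytes of data   */
} ifs_address_t;

typedef uint8_t ifs_record_t[4];   /* one record = four bytes */

/* A local application task and the communication system would both use such
 * accessor functions, which is what gives the location transparency mentioned above. */
int ifs_read (ifs_address_t addr, ifs_record_t out);
int ifs_write(ifs_address_t addr, const ifs_record_t in);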
18.5 Profiles, Data Sheets, and Descriptions

In order to build and configure systems, users require information on different properties of the parts of the targeted system. Such information comes in the form of, e.g., hardware manuals or data sheets. Since this information is intended for human consumption, representation and content are typically less formal than would be required for computer processing of this information. For that reason, dedicated computer-readable representations of fieldbus system properties are required, which play a similar role as information sources for a computer-based support framework during configuration and management of a system. These representations allow for establishing common rule sets for developing and configuring applications and for accessing devices and system properties (for configuration as well as management functions). In the following, we examine several representation mechanisms.
18.5.1 Profiles

Profiles are a widely used mechanism to create interoperability in fieldbus systems. We distinguish several types of profiles, i.e., application, functional, or device profiles. Heery and Patel [16] propose a very general and short profile definition that we adopt for our discussion: “Profiles are schemas, which consist of data elements drawn from one or more name spaces [i.e., sources], combined together by implementors, and optimized for a particular local application.” In many cases, a profile is the result of the joint effort of a group of device vendors in a particular area of application. Usually, a task group is founded that tries to identify recurring functions, usage patterns, and properties in their domain and then creates strictly formalized specifications according to these identified parts, resulting in so-called profiles. More specifically, for each device type a profile exactly defines what kind of communication objects, variables, and parameters have to be implemented so that a device conforms to the profile. Profiles usually distinguish several types of variables and parameters (e.g., process parameters, maintenance parameters, user-defined parameters) and provide a hierarchical conformance model that allows for the definition of user-defined extensions of a profile. A device profile need not necessarily correspond to a particular physical device; for example, a physical node could consist of multiple virtual devices (e.g., a multipurpose input/output (I/O) controller), or a virtual device could be distributed over several physical devices. Protocols supporting device, functional, and application profiles are CANopen [7], Profibus, and LON [25] (LonMark functional profiles).

Figure 18.2 depicts, as an example, the visual specification of a LonMark (http://www.lonmark.org) functional profile for an analog input object. The profile defines a set of network variables (in this example only the mandatory ones are shown) and local configuration parameters (none in this example).
FIGURE 18.2 Functional profile for an analog input.
The arrow specifies that this profile outputs a digital representation of an analog value, and the structure of this output is defined with the standardized (in LON) network variable type SNVT_lev_percent (–163.84 to 163.84% of full scale). In addition, a profile also specifies other important properties, such as timing information, valid range, update rate, power-up state, error condition, and behavior (usually as a state diagram). While the approach for creating profiles is comparable for different protocols, the profiles are not always interchangeable between the various fieldbuses, although advancements (at least for process control-related fieldbuses) have been made within IEC 61158 [19]. Block- and class-based concepts, such as function blocks as they are defined for the Foundation Fieldbus or Profibus DP, or component classes in IEEE 1451.1 [18], can be considered implementations of the functional profile concept.
18.5.2 Electronic Data Sheets

Classical data sheets usually provide a detailed description of mostly physical key properties of a device, such as available pins, electrical properties of pins, available amount and layout of memory, processing power, etc. Electronic data sheets play a conceptually similar role, but usually with a different focus, since they often try to abstract from details of physical properties of the underlying system and describe properties of a higher-level system model (e.g., the Institute of Electrical and Electronics Engineers (IEEE) digital transducer interface [17] or the Interface File System [26]). Such electronic data sheets follow strict and formalized specification rules in order to allow computer-supported processing of the represented information.

A generic electronic data sheet format was developed as part of the smart transducer-related IEEE 1451 standards family. IEEE 1451.2 [17] specifies the transducer electronic data sheet (TEDS) and a digital interface to access that data sheet and to read sensors or set actuators. Figure 18.3 depicts the TEDS in the context of the system architecture as defined in IEEE 1451:

• Smart Transducer Interface Module (STIM): A STIM contains from 1 to 255 transducers of various predefined types, together with their descriptions in the form of the corresponding TEDSs.
• Network-capable application processor (NCAP): The NCAP is the interface to the overall network. By providing an appropriate NCAP, the transducer interface is independent of the physical fieldbus protocol.
• Transducer-independent interface (TII): The TII is the interface between the STIM and the NCAP. It is specified as an abstract model of a transducer instrumented over 10 digital communication lines.

TEDSs describe node-specific properties, such as the structure and temporal properties of devices and transducer data. Since the transducer interface in IEEE 1451 is line based, the basic communication
FIGURE 18.3 Smart Transducer Interface Module connected to NCAP.
primitive is a channel. A channel represents a single flow path for digital data or an analog signal. One STIM may contain multiple channels and has an associated meta-TEDS that describes properties of the STIM, such as device identification information, number of implemented channels, command response time, or worst-case timing information. Each channel has an associated channel TEDS that describes channel-related information such as data structure, transducer, data conversion, timing, etc.

IEEE 1451 aims at self-contained nodes. Thus, TEDSs are stored in a memory directly located at the nodes. This requires considerable memory resources, so the representation of the configuration information for such a system must be very compact. IEEE 1451 achieves this goal by providing a large set of predefined transducer types and modes based on enumerated information, where identifiers are associated with more detailed prespecified descriptions (similar to error codes). An instance of a transducer description can be derived from the predefined types, and thus the memory requirements for the transducer description are kept low.

The smart transducer descriptions (STDs), as defined in [28], take a comparable role for describing properties of devices that follow the CORBA Smart Transducer Interface standard (the descriptions themselves are currently not part of the standard), although there are some notable differences between both approaches. Unlike the commonly used enumeration-based description of properties, the STD and related formats use XML [39] as the primary representation mechanism for all relevant system aspects. Together with related standards, such as XML Schema or XSLT, XML provides advanced structuring, description, representation, and transformation capabilities. It is becoming the de facto standard for data representation and has extensive support throughout the industry. Some examples of XML used in applications in the fieldbus domain can be found in [6, 9, 37].

As the name implies, the smart transducer descriptions describe the properties of nodes in the smart transducer network. The STD format is used for describing both static properties of a device family (comparable to classic data sheets) and devices that are configured as part of a particular application (e.g., the local STD also contains the local node address). The properties described in STDs can be divided into the following categories:

• Microcontroller information: This block holds information on the microcontroller and clock of the smart transducer (e.g., controller vendor, clock frequency, clock drift).
• Node information: This block describes properties that are specific to a particular node and mostly consist of identification information, such as vendor name, device name/version, and node identifiers (serial number, local name).
FIGURE 18.4 Example STD element.
• Protocol information: This block holds protocol-specific information, such as the version of the communication protocol, supported baud rates, Universal Asynchronous Receiver/Transmitter (UART) types, and the IFS layout.
• Node service information: The information in this block specifies the behavior and the capabilities of a node. In the current approach, a service plays a role similar to that of a functional profile (see Section 18.5.1) or function block. Such functional units are especially important for supporting the creation of applications. They conform to the interface model of the CORBA STI standard, since a service consists of a service identifier (e.g., name), input and output parameters, configuration parameters, and management parameters [12]. Parameters are specified by data type and multiple constraints (range, precision, minimum interval time, maximum runtime).

Figure 18.4 shows the description of a file in the IFS, consisting of the name of the file, its length (in records), and the location of the data, i.e., the memory type (RAM, Flash, ROM) where the file should be located (in the example, the file is mapped into the internal RAM of the microcontroller). The prefix rodl: is shorthand for an XML name space. Name spaces allow the reuse of element definitions in multiple places. For example, the elements from the rodl (round descriptor list) name space are defined once separately and used in smart transducer descriptions as well as in additional related formats, such as the cluster configuration descriptions (CCDs). While the STD focuses on the nodes, the CCD format deals with system-level aspects.

It is not always possible to store all relevant information outside the node, but by focusing on reducing the amount of required information on the node to the minimum, extensive external meta-information can be used without size constraints. The reference to this external information is the unique combination of series and serial numbers of the node. The series number is identical for all nodes of the same type. The serial number identifies the instance of a node among all nodes of a series. The advantages of this approach are twofold:

1. The overhead at the node is very low. Current low-cost microcontrollers provide internal RAM and EPROM memory of around 256 bytes. This is not sufficient to store more than the most basic parts of data sheets according to, e.g., IEEE 1451.2 without extra hardware like an external memory element. With the proposed description approach, only the memory for storing the series and serial numbers is necessary, which is 8 bytes.
2. Instead of implicitly representing the node information with many predefined data structures mapped to a compact format, it is possible to have an explicit representation of the information in a well-structured and easy-to-understand way. A typical host computer running the configuration and management tools can easily deal with even very extensive generic XML descriptions. Furthermore, XML formats are inherently easy to extend, so the format is open for future extensions of transducer or service types.
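In C terms, the on-node identification could be as small as the following structure; the 4 + 4 byte split is an assumption for illustration, since the text above only fixes the total of 8 bytes.

#include <stdint.h>

/* The only identification that must be stored on the node itself: series number
 * (node type) plus serial number (instance), 8 bytes in total. Everything else
 * is looked up in the external description repository. */
typedef struct {
    uint32_t series;   /* identical for all nodes of the same type        */
    uint32_t serial;   /* identifies the instance within the series       */
} node_identifier_t;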
FIGURE 18.5 Process variable represented with a device description.
Another interesting description mechanism is the device description language (DDL), which has a relatively long history in the fieldbus sector. First drafts emerged around 1990 at Endress+Hauser, where development of a predecessor language, called parameter description language, had already been performed in the late 1980s [1]. DDL was first used with the HART fieldbus [5], but was later adopted for the Foundation Fieldbus [13] and most recently for Profibus (where it is called electronic device description). Unfortunately, the different versions are not fully compatible, since they have been extended within the scope of the respective fieldbus protocols. The syntax of DDL is similar to the syntax of the C programming language, but conceptually the language strongly relates to specialized markup languages like the hypertext markup language (HTML). In addition to these markup capabilities, DDL also provides enhancements like conditional evaluation and loops. DDL serves several purposes in the description of field devices:

• It describes the information items present in the memory of the described devices.
• It supports the representation of the described information on different accessing devices (with different displaying capabilities).
• It supports the detailed specification of device properties, such as labels for parameters, engineering units, display precision, help texts, the relationship of parameters, and the layout of calibration and diagnostic menus.

Unlike the other presented approaches, the device descriptions (DDs) based on DDL play a bigger role in system management, since they not only describe the data in the memory of the devices, but also support defining rich meta-information for improving the interaction with devices. Figure 18.5 depicts a process variable defined with DDL. The example DDL fragment defines the representation of a variable on an access device. It specifies a label to represent the variable on the display, the data type, formatting information for the value to be displayed, and constraints on valid inputs for changing the value. DDs can be stored on the devices themselves (using a compact encoding of the information in the DD), as well as externally (e.g., delivered on a disc together with the device or centrally available in a DD repository).
18.6 Application Development

At the center of a fieldbus system is the actual fieldbus application. In the following, we examine several application development approaches and how they influence system configuration. A widely used development approach for fieldbus applications is model-based development. The basic idea behind this approach is to create a model of the application that consists of components connected via links that represent the communication flow between the components. Different approaches usually differ in what constitutes a component (e.g., function blocks, subsystems, services, functional profiles, physical devices) and in the detailed semantics of a link. Many approaches support the recursive definition of components, which allows for grouping multiple lower-level components into one higher-level component. Figure 18.6 depicts a typical small control application consisting of two analog inputs receiving values from two sensors, two PID (proportional-integral-derivative) control blocks, and one analog output controlling an actuator.
FIGURE 18.6 Example for an application model.
FIGURE 18.7 ANSI/ISA-88.01–1995 hierarchical model (Enterprise – Site – Area – Process Cell – Unit – Equipment – Control Module).
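To make the component-and-link idea of Figure 18.6 concrete, the following C sketch represents the application model as two plain tables. All names are illustrative, and how the two PID outputs are combined at the single output block is an assumption of this sketch.

#include <stdint.h>

typedef enum { FB_ANALOG_IN, FB_PID, FB_ANALOG_OUT } block_type_t;

typedef struct { uint8_t from; uint8_t to; } link_t;   /* indices into blocks[] */

/* Components of the application model in Figure 18.6. */
static const block_type_t blocks[] = {
    FB_ANALOG_IN, FB_ANALOG_IN,   /* 0, 1: sensor inputs   */
    FB_PID,       FB_PID,         /* 2, 3: control blocks  */
    FB_ANALOG_OUT                 /* 4:    actuator output */
};

/* Links representing the communication flow between the components. */
static const link_t links[] = {
    { 0, 2 }, { 1, 3 },   /* analog inputs feed the PID blocks   */
    { 2, 4 }, { 3, 4 }    /* PID outputs feed the analog output  */
};

A configuration tool working on such a model can reason about the application without knowing on which physical node each block will eventually execute, which is exactly the hardware independence discussed below.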
But the model-based approach is not the only application design approach. Another approach used by multiple fieldbus configuration tools is the ANSI/ISA-88.01–1995 procedural control model [20]. This modeling approach enforces a strictly modular, hierarchical organization of the application (Figure 18.7). There should be no or hardly any interaction between multiple process cells, while interaction between components within a process cell is allowed. To make best use of this approach, the structure of the network site and the application should closely correspond to the hierarchy specified by this model. This modeling approach conceptually follows the typical hierarchy of process control applications with multiple locally centralized programmable logic controllers (PLCs) that drive several associated control devices. This eases the transition from predecessor systems and improves overall robustness, since the approach provides fault containment at the process cell level. As a downside, the coupling between the physical properties of the system and the application is rather tight. An example of a fieldbus protocol that supports this modeling approach is Profibus PA, which provides a universal function block parameter for batch identification [4].

Another design approach is two-level design [31], which originated in the domain of safety-critical systems. In this approach, the communication between components must be configured before the devices are configured. While this requires that many design decisions be made very early in the design process, this approach greatly improves the overall composability of the components in the system.

Abstract application models provide several advantages for application development:

• The modular design of applications helps to deal with complexity by applying a divide-and-conquer strategy. Furthermore, it supports the reuse of application components and physical separation.
• The separation of application logic from physical dependencies allows hardware-independent design, which enables application development before the hardware is available, eases migration, and possibly allows the reuse of (parts of) applications.

For configuring a physical fieldbus system from such an application model, we must examine (1) how this application model maps to the physical nodes in the network and (2) how the information flow is maintained in the network.
FIGURE 18.8 Mapping of function blocks to a physical device in Profibus DP: the device is subdivided into modules acting as virtual devices; a slot index and a parameter index select the data item in device memory.
In order to map the application model to actual devices, fieldbuses often provide a model for specifying physical devices as well. For example, in Profibus DP the mapping between function blocks and the physical device is implemented as follows (Figure 18.8). A physical device can be subdivided into several modules that take the role of virtual devices. Each device can have from one (in the case of simple functionality) to many slots. A function block is mapped to a slot; slots may also have associated physical and transducer blocks, which represent physical properties of a fieldbus device. The parameters of a function block are indexed, and the slot number and parameter index together define the mapping to the actual data in the device memory. In contrast, the Foundation Fieldbus (FF) follows an object-oriented design philosophy. All information items related to configuring a device and the application (control strategy) are thus represented as objects; this includes function blocks, parameters, and subelements of parameters. These objects are collected in an object dictionary (OD), and each object is assigned an index. This OD defines the actual mapping to the physical memory on the respective device. In order to understand the methods for controlling the communication flow between the application components, we first examine some recurring communication properties that are important in fieldbus applications:
• The use of state communication as the primary communication mechanism for operating a fieldbus [29], i.e., performing the fieldbus application. State communication usually involves cyclically updating the associated application data.
• Support for asynchronous/sporadic communication (event communication) in order to perform management functions and deal with parts of the application that cannot be performed with state communication.
A common method to achieve these properties is scheduling. There are many scheduling approaches, with vastly different effects on configuration. The following are some commonly used approaches adopted in fieldbus systems:
• Multicycle polling: In this approach, the communication is controlled by a dedicated node that authorizes other nodes to transmit their data [8]. This approach is used, for example, in WorldFIP, FF, and ControlNet. For configuring devices in such a network, the authorizing nodes require at least a list of nodes to be polled; i.e., in the case of a master–slave configuration, only one node must be configured with the timing information in order to control the whole cluster. For better control over the timely execution of the overall application, a time-division multiplexing scheme is used for bus access.
• Time triggered: In a time-triggered communication model, the communication schedule is derived from the progression of physical time. This approach requires a predefined collision-free schedule that defines a priori when a device is allowed to broadcast its data and an agreement on the global time, which requires the synchronization of the local clocks of all participating devices [10]. Some examples of protocols that support time-triggered communication are TTP/A [24], TTP/C [35], and the synchronous part of the FlexRay protocol [14]. In order to configure the communication in these systems, the schedules must be downloaded to all the nodes in the network. • Event triggered: Event-triggered communication implements a push model, where the sender decides when to send a message, e.g., when a particular value has changed more than a given delta. Collisions on the bus are solved by collision detection/retransmission or collision avoidance, i.e., bitwise arbitration protocols such as Controller Area Network (CAN) [34]. Event-triggered communication does not depend on scheduling, since communication conflicts are resolved either by the protocol at the data link layer (e.g., bitwise arbitration) or by the application. The scheduling information is usually stored in dedicated data structures that are downloaded to the nodes in the network in order to be available for use by the network management system functions of the node. The TTP/A protocol deals with both application- and communication-specific configuration information in an integrated way. In this approach, the local communication schedules (called round descriptor lists) as well as the interfaces of application services [12] are mapped onto the same interfacing mechanism, the Interface File System (see Section 18.4.1). For the representation of the overall system, the cluster configuration description format was developed; it acts as a central and uniform data structure that stores all the information pertinent to the fieldbus system. This information includes: • Cluster description meta-information: This description block holds information on the cluster description itself, such as the maintainer, name of the description file, or version of the CCD format. • Communication configuration information: This information includes round sequence lists as well as round descriptor lists, which represent the detailed specification of the communication behavior of the cluster. Additionally, this part of the CCD also includes (partially physical) properties important for communication, such as the UART specification, line driver, and minimum or maximum signal runtimes. • Cluster node information: This block contains information on the nodes in a cluster, whereas nodes are represented with the smart transducer description format.
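Two brief sketches illustrate points made in this section. The first shows, purely as an illustration, how a (slot, parameter index) pair in the style of the Profibus DP mapping described above can be resolved to a location in a device memory image; the table layout, type names, and sizes are invented and are not taken from any real device description (GSD) file.

```c
/* Illustrative only: resolve a (slot, parameter index) pair to an offset in
 * a device memory image. Table contents and sizes are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define MAX_SLOTS  4
#define MAX_PARAMS 8

typedef struct {
    uint16_t offset;   /* byte offset of the data item in device memory */
    uint8_t  size;     /* size of the data item in bytes */
    int      valid;    /* nonzero if this (slot, index) is populated */
} param_entry;

static param_entry map[MAX_SLOTS][MAX_PARAMS];
static uint8_t device_memory[256];

/* Look up the memory location of a parameter; returns NULL if unmapped. */
static uint8_t *resolve(uint8_t slot, uint8_t param_index)
{
    if (slot >= MAX_SLOTS || param_index >= MAX_PARAMS) return NULL;
    if (!map[slot][param_index].valid) return NULL;
    return &device_memory[map[slot][param_index].offset];
}

int main(void)
{
    /* Slot 1, parameter 2 lives at offset 0x10 and is 2 bytes wide. */
    map[1][2] = (param_entry){ 0x10, 2, 1 };
    uint8_t *p = resolve(1, 2);
    printf("mapped: %s\n", p ? "yes" : "no");
    return 0;
}
```

The second sketch illustrates the time-triggered case from the list above: a node checks whether it owns the communication slot that corresponds to the current, already synchronized global time. Slot length, schedule contents, and function names are assumptions for illustration and do not reproduce the actual schedule data structures of TTP/A or TTP/C.

```c
/* Minimal time-triggered transmission check, assuming a static cyclic
 * schedule and an already synchronized global time base (microseconds). */
#include <stdint.h>
#include <stdio.h>

#define SLOT_LEN_US     500u     /* duration of one slot */
#define SLOTS_PER_ROUND 4u

/* schedule[i] holds the identifier of the node allowed to send in slot i */
static const uint8_t schedule[SLOTS_PER_ROUND] = { 1, 3, 2, 1 };

/* Returns nonzero if 'node_id' owns the slot active at 'global_time_us'. */
static int may_transmit(uint8_t node_id, uint64_t global_time_us)
{
    uint32_t slot = (uint32_t)((global_time_us / SLOT_LEN_US) % SLOTS_PER_ROUND);
    return schedule[slot] == node_id;
}

int main(void)
{
    uint64_t now = 1750;   /* 1750 us lies in slot 3 of the round */
    printf("node 1 may transmit: %d\n", may_transmit(1, now));
    return 0;
}
```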
18.7 Configuration Interfaces In the last section we focused on the relation between application and configuration. In the following, we examine aspects of the system configuration that are mostly independent of the application. We will take a brief look at the physical configuration of fieldbus systems, how nodes are recognized by the configuration system, and how the actual application code is downloaded into the fieldbus nodes.
18.7.1 Hardware Configuration The hardware configuration involves a setup of plugs and cables of the fieldbus system. Several fieldbus systems implement means to avoid mistakes, such as connecting a power cable to a sensitive input, which would cause permanent damage to the fieldbus system or even harm people. Moreover, the hardware configuration interfaces such as plugs and clamps are often subject to failure in harsh environments, e.g., on a machine that induces a lot of vibration. For hardware configuration the following approaches can be identified:
• The use of special jacks and cables that support a tight mechanical connection and avoid mistakes in orientation and polarity by their geometry. For example, the actuator–sensor interface* (AS-i) specifies a mechanically coded flat cable that allows the connection of slaves on any position on the cable by using piercing connectors. AS-i uses cables with two wires transporting data and energy via the same line. The piercing connectors support simple connection, safe contacting, and protection up to class IP 67. • Baptizing of devices in order to obtain an identifier that allows addressing the newly connected device. This could be done explicitly by assigning an identifier to the device (e.g., by setting dip switches or entering a number over a local interface) or implicitly by the cabling topology (e.g., devices could be daisy chained and obtain their name subsequently according to the chain). Alternatively, it is possible to assign unique identifiers to nodes in advance. This approach is taken, for example, with Ethernet devices where the medium access control (MAC) address is a worldwide unique identifier, or in the TTP/A protocol that also uses unique node IDs. However, such a worldwide unique identifier will have many digits, so that it is usually not feasible to have the number printed somewhere on the device. To overcome this problem, machine-readable identifiers in the form of bar codes or radio frequency (RF) tags are used during hardware configuration. • Simple configuration procedures, which can be carried out and verified by nonexpert personnel.
18.7.2 Plug and Participate Since the hardware configuration is intended to be simple, a fieldbus system should behave intelligently in order to relieve human personnel of error-prone tasks. During the plug-and-participate stage, the fieldbus system runs an integration task that identifies new nodes, obtains information about these nodes, and changes the network configuration in order to include the new nodes in the communication. Identification of new nodes can be supported with manual baptizing as described in the previous section. Alternatively, it is also possible to automatically search for new nodes and identify them as described in [11]. If there can be different classes of nodes, it is necessary to obtain information on the type of the newly connected nodes. This information will usually be available in the form of an electronic data sheet that can be obtained from the node or from an adequate repository. The changes of the network configuration necessary to include the new node greatly depend on the employed communication paradigm. In the case of a polling paradigm, only the list of nodes to be polled has to be extended. In the case of a time-triggered paradigm, the schedule has to be changed and updated in all participating nodes. In the case of an event-triggered paradigm, only the new node has to be authorized to send data. However, it is very difficult to predict how a new sender will affect the timing behavior of an event-triggered system. In all three cases, critical timing might be affected by a change of the response time, e.g., when the cycle time has to be changed. Thus, in time-critical systems, extensibility must be taken into account during system design, e.g., by reserving initially unused bandwidth or by including spare communication slots.
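For the polling paradigm mentioned above, integrating a new node can be as simple as appending its address to the master's poll list, as the hedged sketch below shows. The list structure and the function name add_polled_node are assumptions made for illustration; a real master also has to verify cycle time and bandwidth before admitting the node.

```c
/* Illustrative poll-list extension for a master-slave fieldbus.
 * Structure and limits are invented; timing checks are omitted. */
#include <stdint.h>
#include <stdio.h>

#define MAX_POLLED 32

typedef struct {
    uint8_t addr[MAX_POLLED];  /* addresses polled each cycle, in order */
    uint8_t count;
} poll_list;

/* Returns 0 on success, -1 if the list is full or the node is already there. */
static int add_polled_node(poll_list *pl, uint8_t new_addr)
{
    for (uint8_t i = 0; i < pl->count; i++)
        if (pl->addr[i] == new_addr) return -1;      /* already integrated */
    if (pl->count >= MAX_POLLED) return -1;          /* no spare entry left */
    pl->addr[pl->count++] = new_addr;
    return 0;
}

int main(void)
{
    poll_list pl = { {0}, 0 };
    add_polled_node(&pl, 0x12);
    printf("nodes polled: %u\n", pl.count);
    return 0;
}
```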
18.7.3 Application Download Some frequently recurring fieldbus applications, like standard feedback control loops, alert monitoring, and simple control algorithms, can often be put in place like building bricks, since these applications are generically available (e.g., PID controllers). For more complex or unorthodox applications, however, it is necessary to implement user-defined applications. These cases require that code be downloaded into the target devices.
*http://www.as-interface.net/.
Ten years ago, the most common method to reprogram a device was to use a socketed EPROM memory chip that was taken out of the circuit, erased under UV radiation, programmed using a dedicated development system, i.e., a PC with a hardware programming device, and then put back into the system. Today, most memory devices and microcontrollers provide an interface for in-system serial programming of Flash and EPROM memory. The hardware interface for in-system serial programming usually consists of a connector with four to six pins that is attached either to an external programming device or directly to the development PC. These programming interfaces are often proprietary to particular processor families, but there also exist some standard interfaces that support a larger variety of devices. For example, the Joint Test Action Group (JTAG) debugging interface (IEEE 1149.1) also supports the download of application code. While the in-system serial programming approach is much more convenient than the socketed EPROM method, both approaches are conceptually quite similar, since it is still necessary to establish a separate hardware connection to the target system. The most advanced approach for downloading applications is in-system application download. In this approach, it is possible to program and configure a device without taking it out of the distributed target system and without using extra cables and hardware interfaces. In-system configuration is supported by state-of-the-art Flash devices, which can reprogram themselves in part by using a boot loader program. This approach is supported, for example, by state-of-the-art TTP nodes. A cluster consists of a set of at least four TTP/C nodes and a monitoring node that is connected to the development system. Whenever a new application has to be set up, the monitoring node sends a signal that causes the nodes to go into the so-called download mode. In this mode, it is possible to download application code via the fieldbus network. During the download phase, the real-time service is inactive. Misconfigurations that lead to a failure of the download function must be corrected by locally connecting a programming tool. Alternatively, application code could be downloaded via the fieldbus into RAM at start-up. In this case, only the boot loader resides in the persistent memory of the device, and the user-defined application code has to be downloaded at start-up. This approach has the advantage of being stateless, so that errors in the system are removed at the next start-up. Thus, engineers can handle many faults by simply restarting the system. On the other hand, this approach depends on the configuration instance at start-up: the system cannot be started if the configuration instance is down. Moreover, the restart time of the system may be considerably longer.
18.8 Management Interfaces The ability to perform remote management operations on distributed fieldbus devices is one of the most important advantages of fieldbus systems. Wollschläger [38, p. 89] states that “in automation systems, engineering functions for administration and optimization of devices are gaining importance in comparison with control functions.” Typical management operations are monitoring, diagnosis, or node calibration. Unlike the primary fieldbus applications, which often require cyclical, multidrop communication, these management operations usually use a one-to-one (client–server) communication style. For this reason, most fieldbus systems support both communication styles. A central question is whether and how this management traffic influences the primary application, the so-called probe effect [15]. System management operations that influence the timing behavior of network communication are especially critical for typical fieldbus applications (e.g., process control loops) that require exact realtime behavior. The probe effect can be avoided by reserving a fixed amount of the bandwidth for management operations. For example, in the Foundation Fieldbus and WorldFIP protocols the application cycle (macrocycle) is chosen to be longer than strictly required by the application, and the remaining bandwidth is free for management traffic.
In order to avoid collisions within this management traffic window, adequate mechanisms for avoiding or resolving such conflicts must be used (e.g., token passing between nodes that want to transmit management information, priority-based arbitration). In TTP/A, the management communication is implemented by interleaving real-time data broadcasts (implemented by multipartner rounds) with so-called master–slave rounds that open a communication channel to individual devices. If management traffic is directly mingled with application data, such as in CAN, LonWorks, or Profibus PA, care must be taken that this management traffic does not influence the primary control application. This is typically achieved by analyzing network traffic and leaving enough bandwidth headroom. For complex systems and safety-critical systems that require certain guarantees on system behavior, this analysis can become very difficult.
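A back-of-the-envelope check of the headroom idea described above: given a macrocycle length and the time consumed by the scheduled cyclic transfers, the remainder is what is left for management traffic. The numbers and names in this sketch are illustrative assumptions, not values taken from Foundation Fieldbus or WorldFIP.

```c
/* Illustrative headroom calculation: how much of the macrocycle remains
 * for acyclic management traffic. All figures are invented examples. */
#include <stdio.h>

int main(void)
{
    double macrocycle_ms    = 100.0;   /* chosen application cycle          */
    double transfer_ms      = 0.8;     /* time for one cyclic data transfer */
    int    cyclic_transfers = 60;      /* scheduled publisher transfers     */

    double used_ms = cyclic_transfers * transfer_ms;
    double free_ms = macrocycle_ms - used_ms;

    printf("cyclic load: %.1f ms (%.0f%% of the macrocycle)\n",
           used_ms, 100.0 * used_ms / macrocycle_ms);
    printf("management : %.1f ms left per macrocycle\n", free_ms);
    return 0;
}
```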
18.8.1 Monitoring and Diagnosis In order to perform passive monitoring of the communication of the application, it is usually sufficient to trace the messages transmitted on the bus. However, the monitoring device must have knowledge of the communication scheme used in the network, in order to be able to understand and decode the data traffic. If this scheme is controlled by physical time, as is the case in time-triggered networks, the monitoring node must also synchronize itself to the network. Some advanced field devices often have built-in self-diagnostic capabilities and can disclose their status to the management system. It depends on the capabilities of the fieldbus system how such information reaches the management framework. Typically, a diagnosis tool or the diagnosis part of the management framework will regularly check the information in the nodes. This method is called status polling. In some fieldbus protocols (e.g., FF), devices can also transmit status messages by themselves (alert reporting). In general, the restrictions from the implementation of the management interface of a fieldbus protocol also apply to monitoring, since in most fieldbus systems the monitoring traffic is transmitted using the management interface. For systems that do not provide this separation of management from application information at the protocol level, other means must be taken to ensure that monitoring does not interfere with the fieldbus application. Since status polling is usually performed periodically, it should be straightforward to reserve adequate communication resources during system design, so that the control application is not disturbed. In the case of alert reporting, the central problem without adequate arbitration and scheduling mechanisms is how to avoid overloading the network in case of “alarm showers,” where many devices want to send their messages at once. It can be very difficult to give timeliness guarantees (e.g., the time between when an alarm occurs and the time it is received by the respective target) in such cases. The typical approach to deal with this problem (e.g., as taken in CAN) is to provide much bandwidth headroom. For in-depth diagnosis of devices, it is sometimes also desirable to monitor operation and internals of individual field devices. This temporarily involves greater data traffic that cannot be easily reserved a priori. Therefore, the management interface must provide some flexibility on the diagnosis data in order to dynamically adjust to the proper level of detail using some kind of pan-and-zoom approach [2].
18.8.2 Calibration The calibration of transducers is an important management function in many fieldbus applications. There is some ambiguity involved concerning the use of this term. Berge [4] strictly distinguishes between calibration and range setting: “Calibration is the correction of sensor reading and physical outputs so they match a standard” [p. 363]. According to this definition, calibration cannot be performed remotely, since the device must be connected to a standardized reference input. Range setting is used to move the value range of the device so that the resulting value delivers the correctly scaled percentage value. It does not require applying an input and measuring an output; thus,
it can be performed remotely. In the HART bus, this operation (range setting) is itself called calibration, whereas what Berge calls calibration is called trim. Fieldbus technology does not influence the way calibration is handled, although information that is required for calibration is stored as part of the properties that describe a device. Such information could be, e.g., the minimum calibration span limit, which is the minimum distance between two calibration points within the supported operation range of a device. Additionally, calibration-related information, i.e., the individual calibration history, can be stored in the devices themselves. This information is then remotely available to management tools in order to check the calibration status of devices. Together with the self-diagnosis capabilities of the field devices, this enables a focused and proactive management strategy.
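The remote range-setting operation described above amounts to a linear rescaling of the measured value between a lower and an upper range value. The sketch below shows this computation; the variable names and the clamping behavior are assumptions made for illustration and do not follow any specific transducer block definition.

```c
/* Illustrative range setting: map a measured value onto 0..100 % between
 * configurable lower/upper range values. Names and clamping are assumed. */
#include <stdio.h>

typedef struct {
    double lower;   /* value that corresponds to 0 %   */
    double upper;   /* value that corresponds to 100 % */
} range_setting;

static double to_percent(const range_setting *r, double measured)
{
    double pct = 100.0 * (measured - r->lower) / (r->upper - r->lower);
    if (pct < 0.0)   pct = 0.0;     /* clamp below range */
    if (pct > 100.0) pct = 100.0;   /* clamp above range */
    return pct;
}

int main(void)
{
    range_setting level = { 0.2, 4.7 };              /* e.g., tank level in meters */
    printf("%.1f %%\n", to_percent(&level, 2.45));   /* prints 50.0 % */
    return 0;
}
```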
18.9 Maintenance in Fieldbus Systems Fieldbus maintenance is the activity of keeping the system in good working order. The extensive management functions provided by fieldbus systems, such as diagnosis, and monitoring greatly help in maintaining systems. There are several different maintenance schemes that influence the way these steps are executed in detail. Choice of a particular maintenance scheme is usually motivated by the application requirements [4]: • Reactive maintenance is a scheme in which a device is only fixed after it has been found to be broken. This case should be avoided in environments where downtimes are costly (such as in factory applications). Thus, designers of such applications will usually choose more active maintenance strategies. Nonetheless, fieldbus systems also provide advantages for this scheme, since they support the fast detection of faulty devices. • Preventive maintenance is a scheme in which devices are serviced in regular intervals even if they are working correctly. This strategy prevents unexpected downtime, thus improving availability. Due to the associated costs, this approach will only be taken in safety-related applications such as in aviation, train control, or where unexpected downtimes would lead to very high costs. • Predictive maintenance is similar to preventive maintenance, differing in a dynamic service interval that is optimized by using longtime statistics on devices. • Proactive maintenance focuses on devices that are expected to require maintenance. Basically, maintenance involves the following steps: • Recognizing a defective device • Repairing (replacing) the defective device • Reintegrating the serviced device In fieldbus systems, faulty devices will usually be recognized via the network. This is achieved by monitoring the fieldbus nodes and the application or with devices that are capable of sending alerts (refer to Section 18.8). After the source of a problem has been found, the responsible node must be serviced. This often requires disconnecting the node from the network. Thus, we require strategies of how the system should deal with disconnecting a node, as well as reconnecting and reintegrating the replacement node. In case the whole system must be powered down for maintenance, a faulty node can be simply replaced and the integration of the new node occurs as a part of the normal initial start-up process. If powering down of the whole system is undesirable or even impossible (in the sense of leading to severe consequences, as in the case of safety-critical applications), this process becomes more complicated. In this case, we have several options: • Implementation of redundancy: This approach must be taken for safety- or mission-critical devices, where operation must be continued after a device becomes defective or during replacement, respectively. A detailed presentation of redundancy and fault-tolerant systems can be found in [30].
• Shutdown of part of the application: In the case of factory communication systems that often are organized as multilevel networks or use modular approaches, it might be feasible to shut down a local subnetwork (e.g., a local control loop or a process cell as defined in the ANSI/ISA-88.01–1995 standard). The replacement node must be configured with individual node data, such as calibration data (these data usually differ between replaced and replacement node), and the state of a node. The state information can include: • Information that is accumulated at runtime (the history state of a system). This information must be transferred from the replaced to the replacement node. • Timing information, so that the node can synchronize with the network. For example, in networks that use a distributed static schedule (e.g., TTP/A), each node must be configured with its part of the global schedule in order to get a network-wide consistent communication configuration. One alternative approach for avoiding transferring of system state is to design a stateless system in the first place. Bauer [3] proposes a generic approach for creating stateless systems from systems with state. Another possibility is to provide well-defined reintegration points where this state is minimized. Since fieldbus applications typically use a cyclical communication style, the start of a cycle is a natural reintegration point.
18.10 Conclusion Configuration and management play an important role in fieldbus systems. The configuration phase can be subdivided into a part that requires local interaction, such as connection of hardware and setting dip switches, and a part that can be done remotely via the fieldbus system. An intelligent design requires that the local part be as simple as possible, so that nonexpert personnel can carry it out, and that both parts be supported by an adequate architecture and tools that assist the system integrator in tedious and error-prone tasks such as adjusting parameters according to the data sheet of a device. Examples of such architectures are, among others, IEEE 1451 and the OMG Smart Transducer Standard, both of which provide machine-readable electronic data sheets. Management encompasses functions like monitoring, diagnosis, calibration, and support for maintenance. In contrast to the configuration phase, most management functions are used concurrently with the real-time service during operation. Some management functions, such as monitoring, may even require real-time behavior themselves. In order to avoid a probe effect on the real-time service, the scheduling of a fieldbus system must be designed to integrate management traffic with real-time traffic.
References [1] Borst Automation. Device description language. The HART Book, 9, May 1999. Available at http: //www.thehartbook.com/. [2] L. Bartram, A. Ho, J. Dill, and F. Henigman. The continuous zoom: a constrained fisheye technique for viewing and navigating large information spaces. In ACM Symposium on User Interface Software and Technology, 1995, pp. 207–215. [3] G. Bauer. Transparent Fault Tolerance in a Time-Triggered Architecture. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2001. [4] J. Berge. Fieldbuses for Process Control: Engineering, Operation, and Maintenance. ISA — The Instrumentation, Systems, and Automation Society, Research Triangle Park, NC, 2002. [5] R. Bowden. HART: A Technical Overview. Fisher-Rosemount, Chanhassen, MN, 1997. [6] D. Bühler. The CANopen Markup Language: representing fieldbus data with XML. In Proceedings of the 26th IEEE International Conference of the IEEE Industrial Electronics Society (IECON 2000), Nagoya, Japan, October 2000.
[7] CAN in Automation e.V. CANopen: Communication Profile for Industrial Systems, 2002. Available at http://www.can-cia.de/downloads/. [8] S. Cavalieri, S. Monforte, A. Corsaro, and G. Scapellato. Multicycle polling scheduling algorithms for fieldbus networks. Real-Time Systems, 25:157–185, 2003. [9] S. Eberle. XML-basierte Internetanbindung technischer Prozesse. In Informatik 2000 Neue Horizonte im neuen Jahrhundert. Springer-Verlag, Heidelberg, 2000, pp. 356–371. [10] W. Elmenreich, G. Bauer, and H. Kopetz. The time-triggered paradigm. In Proceedings of the Workshop on Time-Triggered and Real-Time Communication, Manno, Switzerland, December 2003. [11] W. Elmenreich, W. Haidinger, P. Peti, and L. Schneider. New node integration for master-slave fieldbus networks. In Proceedings of the 20th IASTED International Conference on Applied Informatics (AI 2002), February 2002, pp. 173–178. [12] W. Elmenreich, S. Pitzek, and M. Schlager. Modeling distributed embedded applications using an interface file system. Accepted for presentation at the 7th IEEE International Symposium on ObjectOriented Real-Time Distributed Computing, 2004. [13] Fieldbus Technical Overview: Understanding FOUNDATION fieldbus Technology, 2001. Available at http://www.fieldbus.org. [14] T. Führer, F. Hartwich, R. Hugel, and H. Weiler. FlexRay: The Communication System for Future Control Systems in Vehicles. Paper presented at SAE World Congress 2003, Detroit, MI, March 2003. [15] J. Gait. A probe effect in concurrent programs. Software Practice and Experience, 16:225–233, 1986. [16] R. Heery and M. Patel. Application Profiles: Mixing and Matching Metadata Schemas. Ariadne, September 25, 2000. Available at http://www.ariadne.ac.uk. [17] Institute of Electrical and Electronics Engineers, Inc. IEEE 1451.2–1997, Standard for a Smart Transducer Interface for Sensors and Actuators: Transducer to Micro-Processor Communication Protocols and Transducer Electronic Data Sheet (TEDS) Formats, September 1997. [18] Institute of Electrical and Electronics Engineers, Inc. IEEE 1451.1–1999, Standard for a Smart Transducer Interface for Sensors and Actuators: Network Capable Application Processor (NCAP) Information Model, June 1999. [19] International Electrotechnical Commission (IEC). Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems: Part 1: Overview and Guidance for the IEC 61158 Series, April 2003. [20] ANSI/ISA-88.01, Batch Control Part 1: Models and Terminology, December 1995. [21] W.H. Ko and C.D. Fung. VLSI and intelligent transducers. Sensors and Actuators, 2:239–250, 1982. [22] H. Kopetz, Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer Academic Publishers, Boston, 1997. [23] H. Kopetz, M. Holzmann, and W. Elmenreich. A universal Smart Transducer Interface: TTP/A. International Journal of Computer System Science and Engineering, 16:71–77, 2001. [24] H. Kopetz, et al. Specification of the TTP/A Protocol, Version 2.00. Technical report, Technische Universität Wien, Institut für Technische Informatik, Vienna, Austria, 2002. Available at http:// www.ttagroup.org. [25] D. Loy, D. Dietrich, and H.-J. Schweinzer (Eds.). Open Control Networks. Kluwer Academic Publishing, Boston, 2001. [26] OMG. Smart Transducers Interface, V1.0. Available specification document number formal/200301-01, Object Management Group, Needham, MA, January 2003. Available at http://doc.omg.org/ formal/2003-01-01. [27] S. Pitzek and W. Elmenreich. Managing fieldbus systems. 
In Proceedings of the Work-in-Progress Session of the 14th Euromicro International Conference, June 2002. [28] S. Pitzek and W. Elmenreich. Configuration and management of a real-time smart transducer network. In Proceedings of the 9th IEEE International Conference on Emerging Technologies and Factory Automation, Volume 1, Lisbon, Portugal, September 2003, pp. 407–414. [29] P. Pleinevaux and J.-D. Decotignie. Time critical communication networks: field buses. IEEE Network, 2:55–63, 1998.
[30] S. Poledna. Fault-Tolerant Real-Time Systems: The Problem of Replica Determinism. Kluwer Academic Publishers, Boston, 1995. [31] S. Poledna, H. Angelow, M. Glück, M. Pisecky, I. Smaili, G. Stöger, C. Tanzer, and G. Kroiss. TTP Two Level Design Approach: Tool Support for Composable Fault-Tolerant Real-Time Systems. Paper presented at SAE World Congress 2000, Detroit, MI, March 2000. [32] J. Powell. The “Profile” Concept in Fieldbus Technology. Technical article, Siemens Milltronics Process Instruments Inc., 2003. [33] A. Ran and J. Xu. Architecting software with interface objects. In Proceedings of the 8th Israeli Conference on Computer-Based Systems and Software Engineering, 1997, pp. 30–37. [34] Robert Bosch GmbH, Stuttgart. CAN Specification, Version 2.0, 1991. [35] TTAGroup. Specification of the TTP/C Protocol. TTAGroup, 2003. Available at http://www. ttagroup.org. [36] M. Venzke. Spezifikation von interoperablen Webservices mit XQuery. Ph.D. thesis, Technische Universität Hamburg-Harburg, Hamburg-Harburg, Germany, 2003. [37] M. Wollschläger. A framework for fieldbus management using XML descriptions. In Proceedings of the 2000 IEEE International Workshop on Factory Communication Systems (WFCS 2000), September 2000, pp. 3–10. [38] M. Wollschläger, C. Diedrich, T. Bangemann, J. Müller, and U. Epple. Integration of fieldbus systems into on-line asset management solutions based on fieldbus profile descriptions. In Proceedings of the 4th IEEE International Workshop on Factory Communication Systems, August 2002, pp. 89–96. [39] World Wide Web Consortium (W3C). Extensible Markup Language (XML) 1.0, 2nd ed., October 2000. Available at http://www.w3.org. [40] L.A. Zadeh. The Concept of System, Aggregate, and State in System Theory, Inter-University Electronics Series, Volume 8. McGraw-Hill, New York, 1969, pp. 3–42.
19 Which Network for Which Application 19.1 Introduction ......................................................................19-1 19.2 Production Hierarchies.....................................................19-1 19.3 Process Types .....................................................................19-3 Batch Systems
19.4 Control Systems.................................................................19-3 Time-Triggered Systems • Discrete Event Control Systems
19.5 Communication Systems ..................................................19-5 Communication Models • Temporal Consistency • Spatial Consistency • Event Ordering • Influence of Failures
19.6 Parameters to Consider in a Choice ................................19-7 19.7 An Overview of Some Solutions......................................19-9
Jean-Dominique Decotignie CSEM (Centre Suisse d’Electronique et de Microtechnique)
Actuator Sensor Interface • Controller Area Network • HART • INTERBUS • LON • MIL-STD-1553 • PROFIBUS-FMS • PROFIBUS-DP • SERCOS • WorldFIP • Ethernet-Based Solutions • Solutions from Nonindustrial Markets
19.8 Conclusion.......................................................................19-13 References ...................................................................................19-14
19.1 Introduction Since the advent of industrial communications, dozens of solutions have been designed that all claim to solve the problem posed by this chapter's title. This is true, but usually only within a given context. The reason is that there is no single case in industrial communications. Most of the time industrial processes are organized in a hierarchical manner, and depending on the location in the hierarchy, needs will differ. The way the control software is written also places different requirements on the communication network. To select a communication network, it is thus important to understand the different views the network designers had when they built their solutions. In this chapter, we will first describe the way industrial production is organized. Then the different approaches to architecting control applications will be explained. When introducing a network to support communication between control entities, a number of constraints are added, such as errors and delays. As they have an impact on the system, they will be detailed next. The last section will present selected industrial networks and show which types of applications they may support and how well they do the required job.
19.2 Production Hierarchies Control of a production system is performed by numerous computers organized along several hierarchical levels. Computer networks provide communications between the computers within a level and with some of the computers at the adjacent levels.
TABLE 19.1 Production hierarchy. The table lists the levels and their controllers: level 6, plant management; level 5, factory controller; level 4, cell/line controller; level 3, workstation controller; level 2, automation module controller; level 1, device controller; level 0, sensor or actuator. From top to bottom, the levels are interconnected by public networks, office LANs, industrial LANs, cell networks, backplane/cell networks, and fieldbuses. Typical equipment ranges from lines, machines, and axes on the manufacturing side to plant management, supervisory controllers, distributed control systems, process controllers, and dedicated controllers on the process control side.
Before presenting the different levels in more detail, let us say that the closer an application is to the process, the higher the temporal constraints (ISO, 1994). Conversely, the quantity of transferred data increases with the level in the hierarchy. The position of an application in the production hierarchy also influences the way the application software is built. Applications at the lowest levels often adopt a time-triggered approach (see Section 19.4), while at higher levels an event-driven approach is used in most cases. Numerous factory models, also known as computer-integrated manufacturing (CIM) models, have been described in the literature (for a summary, see Jones, 1989). In terms of hierarchy, these proposals do not differ very much; differences concern mainly the number and labeling of the levels. Table 19.1 depicts a possible summary of these proposals. This table shows that the more profound difference is between process control and manufacturing. The primary purpose of a plant control system, organized in a hierarchical manner as in Table 19.1, is to manage the process as seen through the sensors (level 0) by acting upon it through actuators. Sensors and actuators are linked to the first level of automation (level 1), where each automation device controls a variable of the process in such a way that it stays within limits or follows set-point data given by the next level of automation (level 2). Level 1 corresponds to an axis (control of a single entity) in manufacturing or to a local loop in process control. The variable may be discrete, such as a tool-changing mechanism in a machine tool, or continuous, such as an axis in a robot or a heater in a distillation column. Units at level 2 elaborate the set-point data or the limits for variables that have to be linked or related to ensure proper operation. This is typically the purpose of an interpolator in a computer numerical controller (CNC), a tool magazine controller for a machine tool, or a robot path controller that gives the set-point data for joints in order to follow a given path. These units receive their commands from and return their status to the machine or process level (level 3). Units at level 3 coordinate the actions on, or the operating conditions of, several elements or groups of elements in order to realize or optimize operations or sequences of operations on objects, either solid or fluid. They receive their commands and report to the line or distributed control system (DCS) level (level 4). Units at levels 1 to 3 also have to perform diagnostics on themselves, detect emergency conditions, and make immediate decisions if they have enough information to do so. Otherwise, they report to the next higher level for decisions. Level 4 units are responsible for optimizing the production, that is, scheduling when and where given operations are performed on objects, and for ensuring that all necessary resources are present.
Level 5 normally corresponds to the elaboration of a single product or a family of products on the same devices. That is where process planning for products is performed. Level 6 represents the top level of a plant. Basic functions performed at this level are product design, resource management, high-level production management with inter-area production scheduling, plant policies establishment, etc. At the top, we may find a corporate management level (not shown in Table 19.1) that is linked to the different sites or plants via public or corporate networks. At this level all the nonproduction activities (research, finances, etc.) of the enterprise can be linked. It should be noted that some of these levels might be absent in a given enterprise. Depending on the level of automation and the control structure, level 2 and even level 1 might be included in level 3. Also, levels 5 and 6 might be collapsed into a single level. There is today a clear tendency to reduce the number of levels. However, the functions described above are always present whatever the number of levels.
19.3 Process Types Processes controlled by computers are usually classified according to two paradigms: continuous systems and discrete event systems (Halang, 1992). An example of a continuous process is a temperature regulator that keeps the temperature in a reactor within preset limits around a given set-point value. Such a system reads the temperature as given by a sensor, compares it with the set-point value, computes the necessary correction using an adequate control algorithm, and energizes accordingly a heating actuator. This sequence of operations is repeated at regular intervals, often periodically. The period is set according to the dynamics of the controlled process. On the other hand, the control of an elevator pertains to the class of discrete systems. Let us describe a part of a typical sequence of operations. While the elevator is approaching its destination floor, the system is expecting that the floor deceleration sensor turns on. In this case, the control system has to react by decreasing the speed of the elevator. In the next step, it has to stop the elevator when the floor sensor indicates that the floor has been reached. During this sequence, other events may happen, such as summons requests from external passengers. Such requests may either alter the current sequence of operations or just need to be memorized for later treatment. In addition, a given event may not be interesting at all times. In the above example, the deceleration event is only of interest if the elevator has to stop on the corresponding floor. Event detection needs to be disabled sometimes and enabled again later. An essential characteristic of discrete event systems is that if the action triggered by the occurrence of an event is perfectly known, the point in time at which this event will occur is a priori unknown. Furthermore, the order in which two or more events occur is often very important.
19.3.1 Batch Systems A lot of physical systems present both of these aspects; they are called batch systems.
19.4 Control Systems Continuous systems and discrete event systems implemented by cyclic or periodic polling are often called time-triggered systems or sampled data systems. Discrete event systems implemented using interrupts or internal software events are referred to as event-triggered systems (Kopetz, 1991).
19.4.1 Time-Triggered Systems From the control system viewpoint, continuous systems will be implemented as looping tasks that await the beginning of the period, sample their inputs, compute the new output values, and set the actuators according to the new values. Periodicity is not mandatory but often assumed, as it leads to simpler
algorithms and more stable and secure systems. Most of the algorithms developed with this assumption are very sensitive to variations in the period duration and to jitter in the starting instant. This is especially the case for motor controllers in precision machines. Simultaneous sampling of inputs is also an important stability factor. Let us assume that the controller implements a state space control algorithm in which the state vector includes the position, speed, and acceleration of a moving body and that the actual values are obtained by three sensors: a position encoder, a tachometer, and an accelerometer. The control algorithm assumes that the measured values were acquired at the same instant. For the control system, this translates into what is called simultaneous sampling. The acquisitions are indeed not really simultaneous if only one processor is devoted to the three operations. However, their instants must be as close as possible and must not differ by more than a given limit, which may be roughly estimated at two orders of magnitude below the period.
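A minimal sketch of the time-triggered loop structure just described: wait for the next period, sample the inputs as close together as possible, compute, and actuate. POSIX clock_nanosleep with an absolute deadline is used to keep period jitter low; the sensor and actuator calls are placeholders, and the period, gains, and overall structure are illustrative assumptions rather than a recipe for a real controller.

```c
/* Sketch of a periodic sampled-data control task (POSIX). The I/O calls
 * are stubs; period and control law are illustrative assumptions. */
#define _POSIX_C_SOURCE 200112L
#include <time.h>
#include <stdio.h>

#define PERIOD_NS 10000000L           /* 10 ms control period */

static double read_sensor(void)        { return 0.0; }  /* placeholder */
static void   write_actuator(double v) { (void)v; }     /* placeholder */

int main(void)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (int cycle = 0; cycle < 5; cycle++) {
        /* Sample inputs back to back to approximate simultaneous sampling. */
        double position = read_sensor();
        double speed    = read_sensor();

        double command = -0.5 * position - 0.1 * speed;   /* toy control law */
        write_actuator(command);
        printf("cycle %d done\n", cycle);

        /* Sleep until the absolute start of the next period (low jitter). */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) { next.tv_nsec -= 1000000000L; next.tv_sec++; }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
    return 0;
}
```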
19.4.2 Discrete Event Control Systems Discrete event systems may be implemented as a set of tasks activated by event occurrences. Let us define an event as a significant change (or a sequence of significant changes) of the state of the process (the so-called occurrences). This change, the event, is detected by some input circuitry that continuously monitors the input and is transformed into an interrupt (detection may also be done in software). The event is handled by a task that undertakes the required actions. The time elapsed between the occurrence of the event and the corresponding reaction, often called the reaction time, is bounded and given in the requirements. Reaction times may depend on the kind of event. Reactions to events are normally handled according to the order of arrival of the events. However, some events may be more important than others. The emergency stop in an elevator is clearly more important than a summons request. This translates into priorities in the event handling. In any case, the order in which events occur is important for the application. The above implementation technique is conventional but not the only one. It is always possible to implement a discrete event system as a continuous one. In such a case, all the inputs are sampled at regular intervals by the control software, which detects the changes and undertakes the necessary reactions. This is the way most programmable logic controllers (PLCs) are implemented. With such an implementation, precedence between events may only be asserted if the events are detected during different polling cycles. If two events are detected during the same cycle or period, they will be considered simultaneous. The cycle duration or period should be selected in such a way that all events can be detected. In the elevator example, if the deceleration switch closes for a minimum of 20 ms, the polling period should clearly be shorter in order to detect the closing event. In summary, continuous control systems exhibit four important characteristics:
• They are cyclic and often periodic; period values are set according to the process dynamics.
• Jitter in the period should be limited to a few percent of the period.
• The instants of input acquisition and output setting are known in advance and dictated by the control system.
• All inputs need to be sampled nearly simultaneously.
Discrete event systems may be implemented in the same way as continuous ones. They then exhibit the same characteristics but are not necessarily periodic. However, the cycle time should be kept low enough that all events may be sensed. These systems may also be implemented using interrupts, with the following characteristics:
• Event occurrence instants are not known.
• The reaction time to an event is bounded.
• The order of occurrence of events is important.
• Reactions to some events may have a higher priority.
• Detection of a given event may be temporarily disabled.
• There is a limit on the density of event occurrences that may be handled by the control system.
From the control point of view, a control system is often composed of a time-triggered component and an event-triggered component.
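Before moving on to communication systems, the polled (PLC-style) implementation of a discrete event system described in Section 19.4.2 can be sketched as follows: scan the inputs each cycle and compare them with the previous scan to detect changes. The input source and the two inputs chosen below are illustrative assumptions tied to the elevator example.

```c
/* Sketch of PLC-style event detection by cyclic polling: events are the
 * rising edges found by comparing the current scan with the previous one.
 * Input values here are canned samples instead of real I/O. */
#include <stdint.h>
#include <stdio.h>

#define NUM_INPUTS 2   /* 0: deceleration switch, 1: floor switch (example) */

int main(void)
{
    /* Simulated scans of the two inputs over six polling cycles. */
    const uint8_t scans[6][NUM_INPUTS] = {
        {0,0}, {1,0}, {1,0}, {1,1}, {0,1}, {0,0}
    };
    uint8_t previous[NUM_INPUTS] = {0, 0};

    for (int cycle = 0; cycle < 6; cycle++) {
        for (int i = 0; i < NUM_INPUTS; i++) {
            uint8_t now = scans[cycle][i];
            if (now && !previous[i])        /* rising edge = event occurrence */
                printf("cycle %d: event on input %d\n", cycle, i);
            previous[i] = now;
        }
        /* Inputs changing within the same cycle are treated as simultaneous. */
    }
    return 0;
}
```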
19.5 Communication Systems The communication system is there to support the interaction between the control applications. On a given computer different applications may coexist, some being time triggered and some being event triggered. They may communicate with distant applications of the other kind (a time-triggered application may communicate with an event-triggered application and vice versa). The communication network itself may be built according to the event-triggered paradigm, the time-triggered paradigm (Kopetz, 1994), or a combination of both. Matching a time-triggered application with a time-triggered network is obviously more easily performed than doing so with an event-triggered communication system. The latter requires some additional adaptation. Ideally, the communication system should support both views.
19.5.1 Communication Models Communication models define how the different application processes may cooperate (Thomesse, 1993). The most widely used communication model is the client–server model. In this model, processes interact through requests and responses. The client is the process that requests an action be performed by another process, the server. The server carries out the work and returns a message with the result. Client and server only refer to the roles the processes play during the remote operation. A process that is a client in some operation may become the server for another one. This model is hence clearly a point-to-point model. This model exhibits a number of drawbacks for control applications. First, time is not taken into account. It is impossible to specify a delay between the request and the response and to have some means to check that this delay is respected, because the server application is involved in producing the response. Second, if a client wants to make simultaneous requests to several servers, this may only be performed sequentially, one request after the other. Last, if two clients make the same request to a server, this model will treat the requests in sequence and the answers may be different. For example, two control nodes may request the value of a sensor attached to a server node. The value returned to the first client may differ from the value returned to the second. The last two problems may be solved by adequate algorithms running on top of the client–server interaction. This results in heavy implementations, often with poor performance. However, for a number of remote operations, in particular during the configuration and setup phases, these problems do not appear and the client–server model is a good solution. For real-time operations, industrial networks need to offer efficient and simple solutions that solve the problems of the client–server model. This has led to the producer–consumer model (sometimes called the publisher–subscriber model), which is a multipoint model. This model is restricted to the transfer of information values and as such is well suited to time-triggered systems. In the producer–consumer model, each piece of information has a producer and one or more consumers. When the producer has some information ready, it gives it to the network. The network transfers the information and makes it available to the consumers. This has some advantages over the client–server model:
• The producer and consumers do not need to be synchronized. The consumer does not need to wait for the response of the producer as in the client–server model. If the information has already been transferred, it may use it; otherwise, it considers that no new information is available.
• The same information can be transferred at the same time to all consumers. The network may thus be used in a more efficient way.
• Two or more consumers will work with the same value at a given time.
• Flow control is no longer necessary, as new information overwrites the previous information. This assumes that the production of new information outdates the previous one.
• Synchronization between applications can be implemented as the production of a synchronization variable.
However, this model comes at a price. As the consumer has no way to relate the information to an explicit request (as in the client–server model), it should be able to know the age of the information. Networks implementing the producer–consumer model should therefore be able to tag the information with attributes from which its age can be derived. The producer–distributor–consumer model is an extension of the producer–consumer model that adds an extra level of separation between the production and the transfer of information. The additional role, the distributor, is in charge of transferring the information from the producer site to the consuming sites. In this way, transfers are no longer triggered by production but are done according to rules defined for the distributor. This offers more flexibility in the scheduling of transfers on the network and results in improved efficiency.
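A tiny sketch of the producer–consumer idea with age tagging: the producer overwrites a shared image of the value, the network (not modeled here) would distribute it, and every consumer reads the latest copy together with its production time. All names and the single-process structure are illustrative assumptions.

```c
/* Illustrative producer-consumer image: the newest value overwrites the
 * previous one and carries its production timestamp so consumers can judge
 * its age. Distribution over a real network is not modeled. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    double   value;          /* last produced sample                          */
    uint64_t produced_at_us; /* production instant (global time, microseconds)*/
    int      ever_produced;  /* 0 until the first production                  */
} published_var;

static void produce(published_var *v, double value, uint64_t now_us)
{
    v->value = value;            /* new information outdates the old one */
    v->produced_at_us = now_us;
    v->ever_produced = 1;
}

/* Consumer: use the value only if one exists; also report its age. */
static int consume(const published_var *v, uint64_t now_us, double *out)
{
    if (!v->ever_produced) return -1;          /* no information available */
    printf("age = %llu us\n", (unsigned long long)(now_us - v->produced_at_us));
    *out = v->value;
    return 0;
}

int main(void)
{
    published_var temperature = {0};
    double t;
    produce(&temperature, 21.4, 1000);
    if (consume(&temperature, 1500, &t) == 0)
        printf("consumed %.1f\n", t);
    return 0;
}
```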
19.5.2 Temporal Consistency Temporal consistency has two facets, absolute temporal consistency and relative temporal consistency (Kopetz, 1988). Absolute temporal consistency is related to the age of information. A piece of data is produced at a given time, is later transmitted over the communication system, and is finally used at some point in time. A piece of data is said to be temporally consistent in the absolute sense as long as the difference between the instant of production and the current instant does not exceed its validity duration. In other words, a piece of data should not be used if it is too old. This behavior is easy to ensure when everything takes place on the same computer. When the data are transported over a network, the information about the production instant is often lost. In a time-triggered application, the control application expects that all input data have been acquired at the same time, i.e., that the measurements were made within a given time window. This property is called production time coherence in the Field Instrumentation Protocol (FIP) standard, and more generally, we can speak of time consistency each time some action must be done within a time interval (ISO, 1994). It is called temporal (or relative temporal) consistency (Kopetz and Kim, 1990) and is defined as follows. Let us consider two variables a and b. Let [a, ta, va] and [b, tb, vb] be two observations of a and b, where va and vb are two samples of a and b taken at times ta and tb. The samples are said to be temporally coherent from the production point of view if |ta – tb| < R for a given R, where R is the temporal consistency threshold.
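The two consistency checks described above reduce to simple comparisons on timestamps, as the sketch below shows. The validity duration and the threshold R are illustrative parameters, and the function names are invented.

```c
/* Illustrative checks for absolute and relative temporal consistency,
 * following (t_now - t_prod) <= validity and |ta - tb| < R. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Absolute: the observation is usable as long as it is not older than its
 * validity duration (assumes t_now_us >= t_prod_us). */
static int absolutely_consistent(uint64_t t_prod_us, uint64_t t_now_us,
                                 uint64_t validity_us)
{
    return (t_now_us - t_prod_us) <= validity_us;
}

/* Relative: two observations are temporally coherent if their production
 * instants differ by less than the threshold R. */
static int relatively_consistent(int64_t ta_us, int64_t tb_us, int64_t R_us)
{
    return llabs((long long)(ta_us - tb_us)) < R_us;
}

int main(void)
{
    printf("absolute: %d\n", absolutely_consistent(1000, 4500, 5000)); /* 1 */
    printf("relative: %d\n", relatively_consistent(1000, 1200, 500));  /* 1 */
    return 0;
}
```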
19.5.3 Spatial Consistency As defined by Le Lann (1990), a distributed system is “a computing system whose behavior is determined by algorithms explicitly designed to work with multiple loci of control.” In such systems, there are a number of control nodes that do not work independently but cooperate to perform common tasks. Cooperation may be needed to achieve a certain degree of fault tolerance or because the overall control task is too large to be handled on a single node. Time-triggered distributed systems work periodically, and the various loci of control synchronize their operations according to the passage of time. To do so, they cannot rely on local timers and need to share a common sense of time (Verissimo, 1994). This may be achieved through a distributed clock synchronization algorithm (Kopetz and Ochsenreiter, 1987) or, with some restrictions, using temporal events generated by the networks (He et al., 1990). In such systems, the various loci of control only exchange data, or state information. Obviously, all data necessary for the computations should be available to all loci of control before the synchronization instants; flow control is done statically. A single unit of data may be needed by several control nodes. As these nodes do not share a common memory, the data unit needs to be replicated in each node. This may be done by a broadcast transfer, but we need to have some guarantee that all replicas are identical at the time they are used. This property is called spatial consistency. It may be obtained by a reliable broadcast algorithm (Hadzilacos and Toueg, 1993) or, as in FIP, if only an indication of spatial consistency is necessary (Decotignie and Prasad, 1994).
19.5.4 Event Ordering Event-triggered control systems are often very sensitive to the order in which events occur. Networks do not guarantee that requests for information transfers are handled in the order they are submitted. This means that the applications cannot rely on the order in which they receive the events from the networks to establish the order of occurrence of events in time.
19.5.5 Influence of Failures

Failures cannot be avoided, but their influence must be minimized. In the value domain, possible failures are undefined values or defined values that appear correct but are incorrect with respect to the related input. In the time domain, crash failures, omission failures, and timing (early or late) failures may occur. Coping with failures is the subject of fault-tolerant computing (Le Lann, 1992) and, as such, outside our scope. However, it is worth discussing the impact of the network on the system with regard to failures. With the introduction of a network, faults may occur in a number of additional entities: links, emission and reception circuits, and software. Links may be cut intermittently or permanently and may be subject to perturbations that mutilate the transmitted information. Emission and reception parts in nodes may stop, respond too quickly or too early, or emit when they are not allowed to, or even constantly.

Networks have been designed to resist faults by detecting and correcting errors using three types of redundancy: time redundancy, physical resource redundancy, and information redundancy. For example, redundant information is added to each message transferred on the network in the form of an error detection code, parity, or cyclic redundancy check (CRC). Each code exhibits a given detection capacity, which means that a number of errors may not be detected at the network level. Crash, omission, and timing failures are normally detected through the use of timers. In time-triggered systems, this detection is easy because each node has to transmit periodically. The absence of a message arrival within the period indicates a possible failure, and appropriate countermeasures may be taken in the application. Furthermore, error correction may be based on temporal redundancy, by keeping the previous value and simply waiting for a new value at the next period.

Event-triggered systems are more difficult to handle. Mutilated frames may be detected as previously described and signaled to the sender by a negative acknowledgment from the receiver. The sender waits for the acknowledgment, either positive or negative. Omission failures are detected by the absence of an acknowledgment within a given delay. In case of a negative or absent acknowledgment, the emitter retransmits the same message. This process is repeated until success or until a maximum number of retries has been reached. This means that, to cope with possible faults, acknowledgments are necessary. Furthermore, the time necessary to transfer a message from a sender to a receiver may vary greatly, with a corresponding degradation of the application response time. It also introduces a delay in the transmission of other messages. A second potential problem in event-triggered systems is the difficulty of distinguishing a node crash or an omission failure from an absence of event occurrence. A receiver node, for instance, an actuator node, that does not receive any message may assume either that the control node has failed or that there is no new command. In the first case, the actuator should be put in some safe mode, while in the latter no special treatment is required. This means that liveness messages should be sent at regular intervals in addition to normal messages.

Some failures may prevent the network from functioning if the network has not been designed properly. For instance, if all traffic is ruled by a single node, as in the case of centralized medium access control, a crash or an omission failure of this node causes the network to stop unless this node is duplicated and some recovery protocol is implemented. Some nodes may also start transmitting messages constantly or at a point in time when they are not allowed to do so. This may be the case with some deterministic carrier-sense multiple-access (CSMA) protocols where collision resolution is based on a priority.
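The acknowledgment-and-retry scheme described above for event-triggered systems can be summarized by the following sketch. It is only an illustration of the logic, not the mechanism of any particular network; the function names, the simulated channel, and the attempt limit are assumptions chosen for the example.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_ATTEMPTS 5   /* assumed bound: 1 transmission + 4 retries */

    enum ack_status { ACK_POSITIVE, ACK_NEGATIVE, ACK_TIMEOUT };

    /* Stand-ins for the driver primitives of a real network interface. */
    static void transmit_frame(int frame_id)
    {
        printf("transmitting frame %d\n", frame_id);
    }

    static enum ack_status wait_for_ack(void)
    {
        /* Simulated channel: roughly one attempt in three fails. */
        return (rand() % 3 == 0) ? ACK_NEGATIVE : ACK_POSITIVE;
    }

    /* Send a frame and wait for its acknowledgment, retransmitting on a
     * negative or missing acknowledgment until success or until the
     * attempt limit is reached (a possible crash or omission failure). */
    static bool send_with_ack(int frame_id)
    {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            transmit_frame(frame_id);
            if (wait_for_ack() == ACK_POSITIVE)
                return true;
        }
        return false;   /* report the failure to the application */
    }

    int main(void)
    {
        printf("delivered: %s\n", send_with_ack(42) ? "yes" : "no");
        return 0;
    }

Note that this bounded-retry behavior is exactly what makes the transfer time of event-triggered traffic variable, as discussed above.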
19.6 Parameters to Consider in a Choice

When selecting a network for a given application, there are a number of parameters to consider:
• Communication models: As described above, two main models are used in building an application layer: the client–server model and the producer–distributor–consumer model (sometimes also called the publisher–subscriber model). Distributed applications are likely to use the latter, while hierarchically organized applications may use the former.
• Traffic types: Information transfers may be sporadic (event triggered), cyclic, or periodic. In the last case, the application may impose strict limits on the jitter between consecutive transmissions of the same data.
• Topology and transmission medium: While most solutions permit a tree-like topology, there may be restrictions on the branch lengths and the number of nodes in the tree. Some solutions also offer transmission media other than copper-based twisted pairs, such as optical fibers or radio transmission.
• Immunity to noise, environment: Networks may have to operate in harsh environments, as is often the case in transportation and petrochemical applications. The physical layer may be more or less resistant to such environments. Selecting an inadequate solution may lead to an increased transmission error rate or even, in some cases, a complete loss of communication.
• Errors: As described above (Section 19.5), transmission errors cannot be avoided, even if, with proper cabling and an adequate physical layer, their level can be kept very low. In many applications, temporary errors can be tolerated even if the application is not informed of their occurrence. In some cases, however, this is not acceptable, and the network should provide ways to inform the application that an error has occurred.
• Throughput: The raw bit rate of a given network is a poor measure of the actual throughput as seen from the application. Protocol overheads, response delays in the network stack, medium access control schemes, traffic scheduling algorithms, and delays in the application all reduce the actual throughput. As an example, transferring a 16-bit value requires the actual transfer of around 1000 bits with Ethernet, 450 bits with PROFIBUS-FMS, 200 bits with the Controller Area Network (CAN), and 90 bits with FIP (a small efficiency calculation based on these figures is sketched after this list). Due to the other effects, the actual throughput may be one or two orders of magnitude below the raw bit rate.
• Guarantees: A lot of research work has been devoted to calculating the guarantees offered by industrial networks, as this is of prime importance to the applications. Guarantees give answers to questions such as:
  • Will the network be able to withstand a given traffic load?
  • What will be the maximum transfer time of a sporadic event?
  • What happens in case of network overload?
  • What will be the maximum jitter of a periodic transmission?
  While most solutions offer some guarantees, few answer all of the questions listed above. Furthermore, guarantees are most of the time given under the assumption that no transmission error will occur, which cannot be taken for granted.
• Consistency: As mentioned above, temporal consistency is assumed in most control applications. Unfortunately, most solutions do not offer any support for temporal consistency, let alone spatial consistency. This implies that this aspect has to be added on top of the selected network.
• Horizontal vs. vertical traffic: As described above, a network may be used to support traffic between computers of adjacent levels (vertical traffic). It may also be used to ensure communication between control devices of the same level to guarantee coordination. Traffic patterns and communication relationships are often different in each case, and the network solution should be able to sustain both.
• Services: The application layer of a network provides a number of services to the applications. Services that are not available will need to be implemented in the application. Examples of such services are:
  • Read and write typed variables (or objects) with possible indication of freshness
  • Read sets of data with possible indication of temporal consistency
  • Download of data and programs
  • Remote invocation (start, stop, pause, resume) of programs
  • Synchronization between applications
  • Receive, send, or subscribe to events with possible indication of time of occurrence
• Configuration ease: For most networks, vendors offer tools that allow configuration of the network. This may include setting addresses, initial values of the timers used in the network, periods, priorities, etc. Tools are also available to monitor the network. The availability of such tools may greatly reduce commissioning and management efforts.
• Connection with the Internet: With the widespread use of Web browsers, it is tempting to put a Web server inside each industrial device. This server is used to access non-real-time information such as configuration, the user manual, etc. To implement such a capability, the transport protocol should be able to carry Hypertext Transfer Protocol (HTTP) messages. However, the Transmission Control Protocol (TCP) is not mandatory.
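As a rough illustration of the throughput point in the list above, the following sketch computes the protocol efficiency implied by the bit counts quoted there (around 1000 bits on Ethernet, 450 on PROFIBUS-FMS, 200 on CAN, and 90 on FIP for a single 16-bit value). The figures are taken from the text; the program itself is only a back-of-the-envelope aid, not a model of any particular protocol stack.

    #include <stdio.h>

    /* Bits actually transferred on the wire to carry one 16-bit value,
     * using the approximate figures quoted in the text. */
    struct proto { const char *name; double bits_on_wire; };

    int main(void)
    {
        const double useful_bits = 16.0;
        const struct proto p[] = {
            { "Ethernet",     1000.0 },
            { "PROFIBUS-FMS",  450.0 },
            { "CAN",           200.0 },
            { "FIP",            90.0 },
        };

        for (int i = 0; i < 4; i++) {
            double efficiency = 100.0 * useful_bits / p[i].bits_on_wire;
            printf("%-13s %6.0f bits -> %4.1f%% efficiency\n",
                   p[i].name, p[i].bits_on_wire, efficiency);
        }
        return 0;   /* e.g., Ethernet: about 1.6%; FIP: about 17.8% */
    }

These per-frame efficiencies come on top of the scheduling and stack delays mentioned above, which is why the usable throughput can fall one or two orders of magnitude below the raw bit rate.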
19.7 An Overview of Some Solutions

There are dozens of networks used in the industrial domain. Here, we briefly present some of the best-known solutions. The objective is to highlight the features that make each of them better suited to one type of application than to another.
19.7.1 Actuator Sensor Interface

The actuator sensor interface (ASi) (CENELEC EN 50295, IEC 62026-2) is a communication bus targeted at simple remote inputs and outputs for a single industrial computer. It is based on a low-cost electromechanical multidrop connection system designed to operate over a two-wire cable, over a distance of up to 100 m, or more if repeaters are used. Data and power are transmitted on the same two-wire cable. It is especially suitable for the lower levels of plant automation, where simple — often binary — field devices such as switches need to interoperate in a stand-alone local area automation network controlled by a PLC or PC. The master polls the network by issuing commands and receiving and processing replies from the slaves. Connected slaves are polled cyclically by the master at a data rate of 166.6 kbit/s, which gives a maximum latency of about 5.148 ms on a fully loaded network (31 slaves). Sixty-two slaves can be connected in the extended addressing mode. Each slave may receive or transmit four data bits. There is provision for automatic slave detection. ASi does not define any application layer but provides management functions to configure or reset slaves and detect new slaves.
19.7.2 Controller Area Network

The Controller Area Network (CAN) results from an effort by Robert BOSCH to provide a serial communication multiplexer for the transmission of information inside a vehicle. CAN layers 1 and 2 are standardized by the International Organization for Standardization (ISO) (1993). Strictly speaking, there are a few application layers for CAN, among which CANopen is the best known. CAN is a bus without repeaters, which restricts the possible topologies. Up to 30 stations can be connected. Every message to be transmitted is uniquely identified, and there is no means to directly address a given station unless a unique identifier is attached to this station. Every station may access the bus when no other station is transmitting. If two stations start transmitting at the same time, an arbitration takes place and the message with the lowest identifier wins. This means that the transmission of a message bearing a high identifier value may be deferred for a long period, and time constraints may only be fulfilled statistically unless an additional protocol (time slots or a central access controller) is added. It should be noted that, by using an open-collector connection to the cable, CAN implements an immediate and collective acknowledgment of any transmitted frame. The protocol is hence efficient, and spatial consistency may be easily implemented.
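The bitwise arbitration just described can be pictured with a small simulation: each contending station transmits its identifier bit by bit, a dominant bit (logic 0 on the wire) overrides a recessive bit, and a station that reads back a dominant level while sending a recessive one withdraws. The code below is only a conceptual sketch of that rule for standard 11-bit identifiers, not an implementation of any CAN controller, and the example identifiers are arbitrary.

    #include <stdio.h>

    /* Simulate CAN bitwise arbitration among up to 16 contending 11-bit
     * identifiers. Bit value 0 is dominant; a station sending a recessive
     * (1) bit while the bus carries a dominant (0) bit loses and stops.
     * Returns the index of the winning station. */
    static int arbitrate(const unsigned id[], int n)
    {
        int in_contention[16];
        for (int i = 0; i < n; i++) in_contention[i] = 1;

        for (int bit = 10; bit >= 0; bit--) {          /* MSB first */
            int bus = 1;                               /* recessive by default */
            for (int i = 0; i < n; i++)                /* wired-AND of all bits */
                if (in_contention[i] && ((id[i] >> bit) & 1) == 0)
                    bus = 0;
            for (int i = 0; i < n; i++)                /* losers back off */
                if (in_contention[i] && (int)((id[i] >> bit) & 1) != bus)
                    in_contention[i] = 0;
        }
        for (int i = 0; i < n; i++)
            if (in_contention[i]) return i;
        return -1;
    }

    int main(void)
    {
        const unsigned id[] = { 0x12A, 0x0F3, 0x0F1 }; /* arbitrary example IDs */
        printf("winner: station %d\n", arbitrate(id, 3)); /* lowest ID, 0x0F1 */
        return 0;
    }

Because the outcome is fixed by the identifier values, the winner of every contention is known in advance, which is exactly why high-identifier (low-priority) messages can be deferred indefinitely under heavy load.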
At the application layer, CANopen offers adequate services for sporadic (event) transfers using the producer–consumer model with minimal support for consistency. CAN is not well suited to periodic transfers; in response, time-triggered CAN (TT-CAN) has been developed. Through its deterministic collision resolution mechanism, CAN offers ways to determine the guarantees offered by the network. Its behavior in the presence of overload is known: low-priority messages can no longer access the network. This calls for careful configuration of the network. DeviceNet (IEC 62026-3, 2000b) and Smart Distributed System (SDS) (IEC 62026-5, 2000c) are partly based on CAN.
19.7.3 HART

HART (ROSEMOUNT, 1991) has been designed by ROSEMOUNT to interconnect transmitters in process control applications. HART may run over existing 4- to 20-mA lines in a point-to-point mode or over twisted pairs in a bus configuration. In the first case, process values are usually transmitted by analog means, and HART is used for configuration and tests. In the multipoint (bus) mode, up to 15 devices may be remotely powered from the master station while fulfilling intrinsic safety requirements. If they are powered locally, a much larger number of devices may be connected. All traffic either comes from or goes to the master station. An additional master station (a handheld terminal) may be connected to the bus, mainly for management purposes. HART defines an application layer suited to process control applications and is well established in the field. Outside this scope, HART is restricted to very slow master–slave applications.
19.7.4 INTERBUS

INTERBUS has been developed by Phoenix Contact to implement distributed inputs and outputs for a PLC. It complies with the collapsed three-layer Open Systems Interconnection (OSI) model and has been standardized by CENELEC (EN 50254, 1998a). INTERBUS uses a ring topology with a master station (PLC) and up to 256 slave stations. Twisted pairs are used, but fiber-optic cable may also be used easily. Topologies are slightly restricted by the ring cabling. In INTERBUS, most of the traffic is cyclic or periodic, with a single period for all traffic. The traffic flows from the master to the slaves and vice versa. As such, INTERBUS is targeted at remote inputs and outputs for time-triggered applications. Simultaneous sampling is available, and network guarantees can easily be determined.
19.7.5 LON

LON has been designed by ECHELON, especially with building control in mind. It complies with the full seven-layer OSI model and provides a variety of transmission media, including wireless at limited speed (5 kbit/s). It has been standardized by the Electronic Industries Alliance (EIA-709.1, 1998). On twisted pairs, LON uses the RS 485 standard. Repeaters are permitted, and a rather general topology may be implemented. The medium access control is decentralized and contention based (predictive CSMA). Time constraints may hence only be fulfilled statistically. LON has a network layer, and routers are available to interconnect up to 255 subnetworks. The application layer provides services for sporadic variable and message exchanges without temporal relationship (no support for time stamping). Variables and messages may have different priorities and may be authenticated. The authentication key is assigned to the node. LON has no support for cyclic transfers and cannot ensure periodicity. Variables may be typed using one of the predefined types or a C declaration. In the former case, a remote station will be able to read the variable type. Each node may define a variable as an input from or an output to the network. In the latter case, when the value of the variable is updated, the new value may be automatically sent to all nodes that have declared the variable as an input. Such transmissions seem to
take place in a series of point-to-point transfers and not in a broadcast mode as in WorldFIP. A second difference is that the same variable may be updated (declared as output) by several nodes. Messages may be sent with or without acknowledgment in the point-to-point or multipoint mode. The content of a message is user defined. Messages provide the means to extend the LON functionality. Loss of interoperability is the price to pay.
19.7.6 MIL-STD-1553

MIL-STD-1553 has been developed for the U.S. Air Force and the U.S. Navy (Haverty, 1986). It may be considered the precursor of fieldbuses, and most of its concepts have been used in other proposals. Information is communicated over a bus in Manchester II biphase encoding at a rate of 1 Mbit/s. A maximum of 31 remote terminals (RTs) can be connected to the bus. Each RT can have up to 30 subaddresses. RTs are coupled to the bus controller (BC) either directly or via transformers, over a screened twisted pair. The bus controller initiates all message transfers. It issues a command frame to a given RT to transmit or receive data. The RT responds by sending a data frame if necessary and terminates the transfer with a status frame. The bus controller may simultaneously issue to a given RT a command to receive data and to another RT a command to transmit data. This mechanism allows cross-communication between RTs. Command, data, and status frames consist of 16 data bits, 1 parity bit, and 3 bits of Manchester code violation for synchronization at the beginning of the frame. Very strong requirements are set for cables, coupling transformers, and isolation resistors to ensure very good noise immunity. MIL-STD-1553 does not define any application layer. There have been, however, some attempts to use it in industry. Due to the high interface cost, its success there has been very limited.
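The word format described above (16 data bits plus one parity bit, preceded by a sync pattern) can be illustrated with the small parity computation below. The text does not state the parity sense; the sketch assumes the odd parity commonly associated with MIL-STD-1553 words, and it does not model the 3-bit-time sync pattern or the Manchester encoding.

    #include <stdint.h>
    #include <stdio.h>

    /* Parity bit accompanying the 16 data bits of a word, chosen so that
     * the 17 bits together contain an odd number of ones (assumed odd
     * parity; the sync pattern and line coding are not modeled). */
    static unsigned parity_bit(uint16_t data)
    {
        unsigned ones = 0;
        for (int i = 0; i < 16; i++)
            ones += (data >> i) & 1u;
        return (ones % 2 == 0) ? 1u : 0u;   /* make the total count odd */
    }

    int main(void)
    {
        uint16_t word = 0x1234;             /* arbitrary example payload */
        printf("data 0x%04X -> parity bit %u\n", word, parity_bit(word));
        return 0;
    }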
19.7.7 PROFIBUS-FMS

PROFIBUS is a European standard (EN 50170, 1996b) that adopts a three-layer architecture and includes a full network management part. PROFIBUS has strong analogies with MiniMAP and may be considered a rather inexpensive implementation of that network. It distinguishes between master stations, which may transmit on their own initiative, and slave stations, which may only respond to inquiries from master stations. This allows simple devices to be implemented in an economic manner. A rather general topology is possible, with up to 127 devices connected. Intrinsically safe devices are available from some vendors. Medium redundancy is supported. PROFIBUS relies on a simplified token-passing mechanism that does not offer any guaranteed transfer time. The elaboration of spatial and temporal consistency indications, as well as of the age of data, is not standardized and is left to the users.

The application layer may be considered a subset of the Manufacturing Message Specification (MMS) (the Manufacturing Automation Protocol (MAP) application layer) (CCE-CNMA, 1995b), even if the syntax is somewhat different. Room is left for profiles that are more or less equivalent to companion standards in MMS. In addition, data access protection is defined and applies to all objects (variables, programs, domains) manipulated by the application services. PROFIBUS-FMS introduces the idea of cyclic exchanges of data. However, this has very little impact on the application services. In fact, cyclic data exchanges are added to handle data transfers from slave stations and not for cyclic user data transfers. As a cell network, PROFIBUS fulfills most of the requirements. One may regret the absence of application services for event management and application synchronization (semaphores).
19.7.8 PROFIBUS-DP

PROFIBUS-DP is also covered by the European standard EN 50170. It shares its physical and data link layers with PROFIBUS-FMS. DP is targeted at remote inputs and outputs and normally has one master station and a number of slave stations.
Traffic is essentially cyclic from the master to the slaves and vice versa. In the absence of other master stations, some real-time guarantees can be calculated.
19.7.9 SERCOS

SERCOS results from a common effort of the German machine-tool (VDW) and drive manufacturers' (ZVEI) associations to design a communication means between computerized numerical controllers (CNCs) and motor drives. It has been standardized by IEC (61491, 2002) and CENELEC (1998b). SERCOS is a ring in which the link between consecutive nodes is an optical fiber, which provides high noise immunity. There is a central station in the CNC, and all traffic either comes from or goes to this master station. However, medium access is not directly controlled by the master, which makes the network much more efficient: every slave (drive) has a time slot in which it may transmit information. The selection of the slot is done at start-up and takes into account the response time of each device (drive). Each transmitted frame may contain real-time data as well as messages for configuration. As a maximum of 2 bytes may be sent per frame, messages must be segmented into a large number of segments and reassembled at reception. Even if SERCOS is not really structured according to the OSI model, a large number of application layer services are defined. These services mainly apply to drives. SERCOS provides simultaneous sampling of the inputs and handles periodic traffic with a minimum period of 62.5 µs. Real-time traffic is guaranteed. Up to 254 drives can be connected.
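The start-up slot assignment mentioned above can be sketched as a simple offset calculation: slots are stacked one after another within the cycle, and each drive's transmit offset is shifted by its own response time so that answers never overlap. This is purely an illustration of the idea under those assumptions; the structure names, the timing values, and the policy are invented and do not reproduce the actual SERCOS timing parameters.

    #include <stdint.h>
    #include <stdio.h>

    struct drive {
        uint32_t slot_len_us;      /* time needed to send the drive's data */
        uint32_t response_time_us; /* device-specific processing delay     */
        uint32_t tx_offset_us;     /* computed offset within the cycle     */
    };

    /* Assign transmission offsets at start-up by stacking the slots. */
    static uint32_t assign_slots(struct drive d[], int n, uint32_t cycle_start_us)
    {
        uint32_t offset = cycle_start_us;
        for (int i = 0; i < n; i++) {
            d[i].tx_offset_us = offset + d[i].response_time_us;
            offset = d[i].tx_offset_us + d[i].slot_len_us;
        }
        return offset;             /* end of the last slot in the cycle */
    }

    int main(void)
    {
        struct drive d[] = {                 /* invented example drives */
            { 20, 5, 0 },
            { 20, 8, 0 },
        };
        uint32_t end = assign_slots(d, 2, 0);
        printf("drive 0 at %u us, drive 1 at %u us, busy until %u us\n",
               (unsigned)d[0].tx_offset_us, (unsigned)d[1].tx_offset_us,
               (unsigned)end);
        return 0;
    }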
19.7.10 WorldFIP

The Field Instrumentation Protocol (FIP) is a European standard (CENELEC, 1996a) that complies with the ISO collapsed three-layer model. Its physical layer conforms to the international fieldbus standard (IEC 61158-2) and provides IEC level 3 EMC noise immunity. It includes a full network management part. FIP assumes that most of the traffic is cyclic or periodic. In this case, each transfer is from a producer to a number of consumers. For real-time data, the producer–consumer model is used. Different periods or cycle durations may coexist on the same network. The deterministic nature of the traffic justifies a central medium access controller, the distributor or bus arbiter. For reliability purposes, redundant bus arbiters can be added. In FIP, cyclic traffic scheduling is performed by the bus arbiter based on the needs expressed during the initialization phase by each communicating entity. Sporadic traffic is scheduled based upon the needs expressed at runtime.

FIP supports two transmission media: shielded twisted pair and fiber optics. The topology is very flexible. Medium and line-driver redundancy are supported. Up to 64 stations can be connected without repeaters. As mentioned above, FIP medium access control is centralized. All transfers are under the control of the bus arbiter, which schedules transfers to comply with the timing requirements.

The data link layer provides two types of transmission services: those for variable exchange and those for message transfer. Transfers of variables and messages may take place at the request of any station or cyclically (or periodically), according to the system configuration. Variables are exchanged according to the producer–distributor–consumer model and are identified by a unique 16-bit identifier known to the producer and the consumers. The identifier is not related to any physical address. Messages are transferred from a source station to a single destination station or to all destination stations according to a client–server model. Each message holds its source and destination addresses. These addresses are 24 bits long and identify the segment number and the address of the station on the segment (in fact, the link service access point). Messages are optionally acknowledged.

For real-time data exchange, FIP behaves like a distributed database that is refreshed by the network periodically or upon demand. All the application services related to periodic and sporadic data exchange are called MPS. MPS provides local read and write services (periodic) as well as remote read and write
services (sporadic). Accessed objects are variables or lists of variables. For variables, information on freshness is available. For lists of variables, FIP provides information on temporal and spatial consistency status. In addition, FIP provides a conventional client–server model for messages and events with a subset of MMS as the application layer. The available services are defined according to classes: sensor, actuator, input/output (I/O) concentrator, PLC, operator console, and programming console. The MMS subset covers domain management, program invocation, variable access, semaphore and journal management, and the basics of event management. Syntax and encoding rules conform to Abstract Syntax Notation 1 (ASN.1).
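The centralized scanning performed by the FIP bus arbiter can be pictured with the following sketch: the arbiter walks through a table of variable identifiers built at configuration time and, whenever an entry's period has elapsed, broadcasts that identifier so that the producer answers with the value and all consumers capture it. This is only a conceptual model of the producer–distributor–consumer scan; the structures and the example table are invented, the ID_DAT/RP_DAT naming follows common WorldFIP terminology, and aperiodic windows are omitted.

    #include <stdint.h>
    #include <stdio.h>

    /* One entry of the bus arbiter's scanning table, built at configuration
     * time from the periods declared by the communicating entities. */
    struct scan_entry {
        uint16_t identifier;   /* 16-bit variable identifier              */
        uint32_t period_us;    /* requested production period             */
        uint32_t next_due_us;  /* next instant at which it must be polled */
    };

    /* Stand-in for broadcasting an ID_DAT frame; in the real protocol the
     * producer then answers with an RP_DAT frame that all consumers hear. */
    static void poll_identifier(uint16_t id)
    {
        printf("ID_DAT %04X broadcast; producer answers with RP_DAT\n", id);
    }

    /* One pass of the cyclic scan at time now_us: poll every identifier
     * whose period has elapsed and schedule its next occurrence. */
    static void arbiter_scan(struct scan_entry t[], int n, uint32_t now_us)
    {
        for (int i = 0; i < n; i++) {
            if ((int32_t)(now_us - t[i].next_due_us) >= 0) {
                poll_identifier(t[i].identifier);
                t[i].next_due_us += t[i].period_us;
            }
        }
    }

    int main(void)
    {
        struct scan_entry table[] = {           /* invented example table */
            { 0x0101, 10000, 0 },               /* 10-ms variable         */
            { 0x0202, 50000, 0 },               /* 50-ms variable         */
        };
        for (uint32_t t = 0; t <= 50000; t += 10000)
            arbiter_scan(table, 2, t);
        return 0;
    }

Because the whole schedule is held by the arbiter, the timing requirements of the cyclic traffic can be verified offline against the configured table, which is the basis of the guarantees FIP offers.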
19.7.11 Ethernet-Based Solutions

Ethernet has been used since the beginning of the 1980s in industrial communications (CCE-CNMA, 1995a). Even though a few solutions (Le Lann, 1993) introducing some degree of predictability into this technology have been designed and produced, users in the industrial domain have been reluctant to use Ethernet because of its lack of guarantees. More recently, due to its low cost and wide availability, there has been a revival of interest in using Ethernet (IEEE 802.3) as a communication network in factories.

Ethernet (IEEE 802.3, 2000) defines the lowest two protocol layers of the OSI model. Its topology is a tree in which each node is either a switch or a hub. This decreases cabling flexibility and increases the cost of the solution. The main question is: which protocol should be used over Ethernet? Layer 3 (the network layer) is often the Internet Protocol (IP) (IETF, 1981a), which is adequate for real time. Selecting the Transmission Control Protocol (TCP) (IETF, 1981b) as the transport protocol is not really adequate for real-time use (Decotignie, 2001). Other solutions, such as the Xpress Transfer Protocol (XTP) (Sanders and Weaver, 1990), do a much better job. TCP may still be used for non-real-time traffic. While adequate application layers can be found easily, the missing part is what we could call a real-time layer that can provide indications of consistency and event ordering. For the time being, a number of organizations are pushing for Ethernet-based solutions, including a PROFIBUS version. Unfortunately, there is no compatibility between these solutions. Let us simply state that it is certainly possible to build adequate solutions on top of Ethernet, but this is still to come.
19.7.12 Solutions from Nonindustrial Markets

It is tempting to use, in the industrial domain, solutions that were designed for the consumer market. They are often cheaper, and many tools exist to support them. The Universal Serial Bus (USB) and Firewire (IEEE 1394) are examples of such technologies. They were not designed for computer-to-computer networking but as remote input and output media for a single computer. The main questions are:

• Is the solution able to withstand industrial use?
• Is there a clear advantage over other solutions?

The answer to the first question is case dependent. Successful experiences have been reported with Firewire (Ruiz, 1999). However, the physical layer of both USB and Firewire was not designed for industrial use. The second question has several facets. In terms of throughput, there is no definitive advantage over other high-speed solutions (Dallemagne and Decotignie, 2001). The main problem is that most of the functionality is missing: these networks only define the lowest two layers of the OSI model and completely lack the other layers.
19.8 Conclusion

The selection of a communication system to interconnect industrial applications on different computers depends on the type of application. We have presented taxonomies of control applications and shown that a related taxonomy exists for industrial communication networks. However, distributing industrial applications over several sites connected by a network introduces a number of problems that differ depending on the type of application. Industrial networks should support the resolution of these
problems. Finally, selected examples of networks were briefly described, and their ability to support the different types of applications and to address the problems related to distribution was outlined.
References

CCE-CNMA (1995a), CCE: An Integration Platform for Distributed Manufacturing Applications, Research Report ESPRIT, Springer-Verlag, Berlin.
CCE-CNMA (1995b), MMS: A Communication Language for Manufacturing, Research Report ESPRIT, Springer-Verlag, Berlin.
CENELEC EN 50170 (1996a), General Purpose Field Communication System, Vol. 3/3 (WorldFIP).
CENELEC EN 50170 (1996b), General Purpose Field Communication System, Vol. 2/3 (PROFIBUS).
CENELEC EN 50254 (1998a), High Efficiency Communication Subsystem for Small Data Packages, December 1998.
CENELEC EN 50295 (1999), Low-Voltage Switchgear and Controlgear: Controller and Device Interface Systems: Actuator Sensor Interface (AS-i).
CENELEC EN 61491 (1998b), Electrical Equipment of Industrial Machines: Serial Data Link for Real-Time Communication between Controls and Drives (SERCOS).
Dallemagne P., Decotignie J.-D. (2001), A comparison of USB 2.0 and Firewire in industrial applications, in Proceedings of the FeT 2001 International Conference on Fieldbus Systems and Their Applications, Nancy, France, November 15–16, pp. 16–23.
Decotignie J.-D. (2001), A perspective on Ethernet-TCP/IP as a fieldbus, in Proceedings of the FeT 2001 International Conference on Fieldbus Systems and Their Applications, Nancy, France, November 15–16, pp. 138–143.
Decotignie J.-D., Prasad P. (1994), Spatio-temporal constraints in fieldbus: requirements and current solutions, in the 19th IFAC/IFIP Workshop on Real-Time Programming, Isle of Reichenau, June 22–24, pp. 9–14.
EIA-709.1 (1998), Control Network Specification, Electronic Industries Alliance, Arlington, VA.
Hadzilacos V., Toueg S. (1993), Fault-tolerant broadcasts and related problems, in Distributed Systems, S. Mullender, Ed., ACM Press, New York.
Halang W., Sacha K. (1992), Real-Time Systems, World Scientific, Singapore.
Haverty M. (1986), MIL-STD-1553: a standard for data communications, Communication and Broadcasting, 10, 29–33.
He J., Mammeri Z., Thomesse J.-P. (1990), Clock synchronization in real-time distributed systems based on FIP field bus, in 2nd IEEE Workshop on Future Trends of Distributed Computing Systems, Cairo, Egypt, September 30–October 2, pp. 135–141.
IEC 61158-2 (2000–8), Fieldbus Standard for Use in Industrial Control Systems: Part 2: Physical Layer Specification and Service Definition.
IEC 61491 (2002), Electrical Equipment of Industrial Machines: Serial Data Link for Real-Time Communication between Controls and Drives.
IEC 62026-2 (2000a), Low-Voltage Switchgear and Controlgear: Controller-Device Interfaces (CDIs): Part 2: Actuator Sensor Interface (AS-i).
IEC 62026-3 (2000b), Low-Voltage Switchgear and Controlgear: Controller-Device Interfaces (CDIs): Part 3: DeviceNet.
IEC 62026-5 (2000c), Low-Voltage Switchgear and Controlgear: Controller-Device Interfaces (CDIs): Part 5: Smart Distributed System (SDS).
IEEE 802.3 (2000), Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications, pp. i–1515.
IETF RFC 791 (1981a), Internet Protocol, September 1.
IETF RFC 793 (1981b), Transmission Control Protocol, September 1.
ISO 11898 (1993), Road Vehicles: Exchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication.
ISO (1994), Time Critical Communication Architectures: User Requirements, ISO TR 12178, Geneva.
Jones A., et al. (1989), Issues in the design and implementation of a system architecture for computer integrated manufacturing, International Journal of Computer Integrated Manufacturing, 2, 65–76.
Kopetz H. (1988), Consistency constraints in distributed real time systems, in Proceedings of the 8th IFAC Workshop on Distributed Computer Control Systems, Vitznau, Switzerland, September 13–15, pp. 29–34.
Kopetz H. (1991), Event-triggered versus time-triggered real-time systems, in Operating Systems of the 90s and Beyond, Lecture Notes in Computer Science 563, Springer, Heidelberg.
Kopetz H., Grunsteidl G. (1994), TTP: a protocol for fault tolerant real-time systems, IEEE Computer, 27, 14–23.
Kopetz H., Kim K. (1990), Temporal uncertainties in interactions among real-time objects, in Proceedings of the 9th Symposium on Reliable Distributed Systems, Huntsville, AL, October 9–10, pp. 165–174.
Kopetz H., Ochsenreiter W. (1987), Clock synchronization in distributed real-time systems, IEEE Transactions on Computers, 36, 933–940.
Le Lann G. (1990), Critical issues in the development of distributed real-time computing systems, in 2nd IEEE Workshop on Future Trends of Distributed Computing Systems, Cairo, September 30–October 2, pp. 96–105.
Le Lann G. (1992), Designing real-time dependable distributed systems, Computer Communications, 14, 225–234.
Le Lann G., Rivierre N. (1993), Real-Time Communications over Broadcast Networks: The CSMA-DCR and the DOD-CSMA-CD Protocols, Research Report INRIA 1863.
ROSEMOUNT, Inc. (1991), HART Smart Communications Protocol Specifications, revision 5.1.4, January.
Ruiz L., et al. (1999), Using Firewire as an industrial network, in SCCC '99, Talca, Chile, pp. 201–208.
Sanders R., Weaver A. (1990), The Xpress Transfer Protocol (XTP): a tutorial, Computer Communications Review, 20, 65–80.
Thomesse J.-P. (1993), Time and industrial local area networks, in the 7th Annual European Computer Conference on Computer Design, Manufacturing and Production (COMPEURO '93), Paris-Evry, France, May, pp. 365–374.
Verissimo P. (1994), Ordering and timeliness requirements of dependable real-time programs, Real-Time Systems, 7, 104–128.
II
Ethernet and Wireless Network Technologies

20 Approaches to Enforce Real-Time Behavior in Ethernet (P. Pedreiras and L. Almeida) 20-1
21 Switched Ethernet in Automation Networking (Tor Skeie, Svein Johannessen, and Øyvind Holmeide) 21-1
22 Wireless LAN Technology for the Factory Floor: Challenges and Approaches (Andreas Willig) 22-1
23 Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment (Kirsten Matheus) 23-1
20
Approaches to Enforce Real-Time Behavior in Ethernet

Paulo Pedreiras, Universidade de Aveiro
Luís Almeida, Universidade de Aveiro

20.1 Introduction 20-1
20.2 Ethernet Roots 20-2
20.3 Why Use Ethernet at the Fieldbus Level? 20-3
20.4 Making Ethernet Real Time 20-4
20.5 CSMA/CD-Based Protocols 20-5
    NDDS • ORTE • Traffic Smoothing
20.6 Modified CSMA Protocols 20-8
    Virtual-Time CSMA • Windows Protocol • CSMA/DCR • EQuB
20.7 Token Passing 20-13
    Timed-Token Protocols • RETHER • RT-EP: Real-Time Ethernet Protocol
20.8 Time-Division Multiple Access 20-17
    The MARS Bus • Variable-Bandwidth Allocation Scheme
20.9 Master–Slave Techniques 20-18
    FTT-Ethernet Protocol • ETHERNET Powerlink
20.10 Switched Ethernet 20-20
    EDF Scheduled Switch • EtheReal
20.11 Recent Advances 20-24
20.12 Conclusion 20-25
References 20-26
20.1 Introduction

Nowadays, intelligent nodes, i.e., microprocessor-based nodes with communication capabilities, are extensively used in the lower layers of both process control and manufacturing industries [52]. In these environments, applications range from embedded command-and-control systems to image processing, monitoring, human–machine interfaces, etc. Moreover, the communication between different nodes has specific requirements [5] that are quite different from, and sometimes opposed to, those found in office environments. For instance, predictability is favored over average throughput, and message transmission is typically time and precedence constrained. Furthermore, a lack of compliance with those constraints can have a significant negative impact on the quality of the control action in distributed computer control systems (DCCS), or on the quality of the observation of the system state in distributed monitoring systems (DMS). Therefore, to deliver adequate quality of service, special-purpose networks have been developed, essentially during the last two decades, which are generically called fieldbuses and are particularly adapted to support frequent exchanges of small amounts of data under time, precedence, and dependability
constraints [52]. Probably the best known are PROFIBUS, WorldFIP, P-NET, Foundation Fieldbus, TTP/C, CAN, and CAN-based systems such as DeviceNet.

In the early days of DCCS, network nodes presented simple interfaces and supported limited sets of actions. However, the quantity, complexity, and functionality of nodes in a DCCS have been increasing steadily. As a consequence of this evolution, the amount of information that must be exchanged over the network has also increased, either for configuration or for operational purposes. The increase in the amount of data exchanged among DCCS nodes is reaching limits that are not achievable using traditional fieldbuses due to their limited bandwidth, typically between 1 and 5 Mbps [5]. Machine vision is just one example of a killer application for those systems. Therefore, other alternatives are required to support higher bandwidth demands while retaining the main requirements of a real-time communication system: predictability, timeliness, and bounded delays and jitter.

From the 1980s onward, several general-purpose networks exhibiting higher bandwidth than the traditional fieldbus protocols have also been proposed for use at the field level. For example, two prominent networks, Fiber Distributed Data Interface (FDDI) and ATM, have been extensively analyzed for both hard and soft real-time communication systems [50]. However, due to high complexity, high cost, lack of flexibility, and limited interconnection capacity [50], these protocols have not gained general acceptance.

Another communication protocol that has been evaluated for use at the field level is Ethernet. The main factors that favor the use of this protocol are [5] cheap silicon availability, easy integration with the Internet, a clear path for future expandability, and compatibility with the networks used at higher layers of the factory structure. However, the nondeterministic arbitration mechanism used by Ethernet impedes its direct use at the field level, at least for hard real-time communications. Therefore, in the past, several attempts have been made to allow Ethernet to support time-constrained communications. The methods that have been used to achieve deterministic message transmission over Ethernet range from modifications of the medium access control (MAC) layer (e.g., [28]) to the addition of sublayers over the Ethernet layer that control the time instants of message transmission (e.g., [54]) and therefore avoid collisions. More recently, with the advent of switched Ethernet, and therefore the intrinsic absence of collisions, a new set of works concerning the ability of this topology to carry time-constrained communications has appeared (e.g., [50]).

This chapter presents a brief description of the Ethernet protocol, followed by a discussion of several techniques that have been proposed or used to enforce real-time communication capabilities over Ethernet during the last two decades. The techniques referred to include those that support either probabilistic or deterministic analysis of the network access delay, thus covering diverse levels of real-time requirements, from soft to hard real-time applications. This chapter aims to bring together in one volume different dispersed pieces of related work, trying to provide a global view of this niche area of real-time communication over Ethernet. The presentation is focused more on conceptual grounds than on mathematical formalism, which can be found in the references provided in the text.
Moreover, for the sake of coherency with the original work referred to, several different expressions are used interchangeably in the text, but with similar meaning. This is the case of node, station, and host, referring to a computing element in a distributed system with independent network access, as well as message, frame, and packet, referring to an Ethernet protocol data unit.
20.2 Ethernet Roots

Ethernet was born about 30 years ago, invented by Robert Metcalfe at Xerox's Palo Alto Research Center. Its initial purpose was to connect two products developed by Xerox: a personal computer and a brand new laser printer. Since then, the protocol has evolved in many ways. For instance, concerning the transmission speed, it has grown from the original 2.94 Mbps to 10 Mbps [6, 7, 15–18], then to 100 Mbps [19], and more recently to 1 Gbps [20] and 10 Gbps [21]. Concerning the physical medium and network topology, Ethernet started as a bus topology based initially on thick coaxial cable [15] and afterwards on thin coaxial cable [16]. In the mid-1980s, a more structured and fault-tolerant approach,
based on a star topology, was standardized [17], running, however, at only 1 Mbps. At the beginning of the 1990s, an improvement was standardized [18], running at 10 Mbps over category 5 unshielded twisted-pair cable.

In this evolution process, two fundamental properties have been kept unchanged:

1. Single collision domain; i.e., frames are broadcast on the physical medium, and all the network interface cards (NICs) connected to it receive them.
2. The arbitration mechanism, called carrier-sense multiple access with collision detection (CSMA/CD).

According to the CSMA/CD mechanism (Figure 20.1), a NIC with a message to be transmitted must wait for the bus to become idle, and only then does it start transmitting. However, several NICs may have sensed the bus during the current transmission and then tried to transmit simultaneously thereafter, causing a collision. In this case, all the stations abort the transmission of the current message, wait for a random time interval, and try again. This retry mechanism is governed by the truncated binary exponential backoff (BEB) algorithm, which doubles the randomization interval on every retry, reducing the probability of further collisions. The number of retries is limited to 16 (a minimal sketch of this backoff rule is given at the end of this section).

FIGURE 20.1 Ethernet CSMA/CD simplified state diagram (states: idle, transmit, and wait backoff time; transitions are triggered by sensing the bus idle or busy, by a successful transmission, and by a collision followed by the jam sequence).

The use of a single broadcast domain and the CSMA/CD arbitration mechanism has created a bottleneck when facing highly loaded networks: above a certain threshold, as the submitted load increases, the throughput of the bus decreases — a phenomenon referred to as thrashing. At the beginning of the 1990s, the use of switches in place of hubs was proposed as an effective way to deal with thrashing. A switch creates a single collision domain for each of its ports. If a single node is connected to each port, collisions never actually occur unless they are created on purpose, e.g., for flow control. Switches also keep track of the addresses of the NICs connected to each port by inspecting the source address of incoming messages. This allows incoming messages to be forwarded directly to the respective outgoing ports according to their destination addresses, a mechanism generally known as forwarding. When a match between a destination address and a port cannot be established, the switch forwards the respective message to all ports, a process commonly referred to as flooding. The former mechanism, forwarding, allows a higher degree of traffic isolation, so that each NIC receives only the traffic addressed to it. Moreover, since each forwarding action uses a single output port, several of these actions can be carried out in parallel, resulting in multiple simultaneous transmission paths across the switch and, consequently, in a significant increase in the global throughput.
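The sketch below illustrates the truncated BEB rule referred to above: after the nth collision of a frame, the adapter waits a random number of slot times drawn uniformly from 0 to 2^min(n,10) − 1, and the frame is dropped after 16 failed attempts. It is a simplified model for illustration only; the 512-bit slot time and the cap at 10 doublings are the usual CSMA/CD parameters (they are not stated in the text above), and the code is not that of any real adapter.

    #include <stdio.h>
    #include <stdlib.h>

    #define SLOT_TIME_BITS 512     /* slot time of 10/100-Mbps CSMA/CD       */
    #define ATTEMPT_LIMIT   16     /* frame dropped after this many attempts */
    #define BACKOFF_LIMIT   10     /* randomization range stops growing here */

    /* Backoff delay, in bit times, applied after the nth collision
     * (n = 1, 2, ...): r slot times, with r drawn uniformly from
     * 0 .. 2^min(n, BACKOFF_LIMIT) - 1. Returns -1 when the frame
     * must be dropped and an error reported. */
    static long backoff_bits(int collisions)
    {
        if (collisions >= ATTEMPT_LIMIT)
            return -1;
        int k = (collisions < BACKOFF_LIMIT) ? collisions : BACKOFF_LIMIT;
        long r = rand() % (1L << k);            /* uniform in 0 .. 2^k - 1 */
        return r * SLOT_TIME_BITS;
    }

    int main(void)
    {
        for (int n = 1; n <= 16; n++) {
            long w = backoff_bits(n);
            if (w < 0)
                printf("collision %2d -> frame dropped\n", n);
            else
                printf("collision %2d -> wait %ld bit times\n", n, w);
        }
        return 0;
    }

The random, unbounded-looking growth of these waiting times is precisely what makes the network access delay nondeterministic, which is the central problem addressed in the remainder of this chapter.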
20.3 Why Use Ethernet at the Fieldbus Level?

Operation at the fieldbus level implies the capacity to convey time-constrained traffic associated with sensors, controllers, and actuators. However, as mentioned above, Ethernet was not designed to support that type of traffic, and some of its properties, such as the nondeterministic arbitration mechanism, pose
serious challenges for that purpose. Thus, why use it? Several works address the pros and cons of using Ethernet at the field level (e.g., [5][30][54]). Commonly cited arguments in favor are [5]:

• It is cheap, due to mass production.
• Integration with the Internet is easy (Transmission Control Protocol (TCP)/Internet Protocol (IP) stacks over Ethernet are widely available, allowing the use of application layer protocols such as the File Transfer Protocol (FTP) and the Hypertext Transfer Protocol (HTTP)).
• Steady increases in the transmission speed have happened in the past and are expected to continue in the near future.
• Due to its inherent compatibility with the communication protocols used at higher levels within industrial systems, the information exchange with the plant level becomes easier.
• The bandwidth made available by existing fieldbuses is insufficient to support some recent developments, such as the use of multimedia (e.g., machine vision) at the field level.
• Technicians familiar with this protocol are widely available.
• Test equipment is widely available from different sources.
• It is a mature technology, well specified and with equipment available from many sources, without incompatibility issues.

On the other hand, the most common argument against using Ethernet at the field level is its destructive and nondeterministic arbitration mechanism. A potential remedy is the use of switched Ethernet, which allows bypassing of the native CSMA/CD arbitration mechanism. In this case, provided that a single NIC is connected to each port and the operation is full duplex, no collisions occur. However, just avoiding collisions does not make Ethernet deterministic; for example, if a burst of messages destined for a single port arrives at the switch in a given time interval, their transmission must be serialized. If the arrival rate is greater than the transmission rate, buffers will be exhausted and messages will be lost. Therefore, even with switched Ethernet, some kind of higher-level coordination is required. Moreover, a bounded transmission delay is not the only requirement of a fieldbus. Other important factors commonly referred to in the literature are temporal consistency indication, precedence constraints, and efficient handling of periodic and sporadic traffic, as well as of short packets. Clearly, Ethernet, even with switches, does not provide answers to all those demands.
20.4 Making Ethernet Real Time

The previous section discussed the pros and cons of using Ethernet for real-time communication, particularly for use as a fieldbus. Basically, Ethernet by itself cannot fulfill all the properties that are expected from a fieldbus. Therefore, specifically concerning real-time communication, several approaches have been developed and used. Many of them override Ethernet's CSMA/CD medium access control by setting an upper transmission control layer that eliminates, or at least reduces, the occurrence of collisions at the medium access. Other approaches propose modification of the CSMA/CD medium access control layer so that collisions either seldom occur or, when they do, the collision resolution is deterministic and takes a bounded worst-case time. Moreover, some approaches support such deterministic reasoning on the network access delay, while others allow for a probabilistic characterization only. In the remainder of this chapter, some paradigmatic efforts to improve Ethernet's behavior with respect to meeting deadlines for the network access delay are briefly discussed. For the sake of clarity, they are classified as:

• CSMA/CD-based protocols
• Modified CSMA protocols
• Token passing
• TDMA
• Master–slave techniques
• Switched Ethernet
20.5 CSMA/CD-Based Protocols

This category of protocols, paradoxically, achieves real-time behavior by using standard Ethernet network adapters and relying on the original CSMA/CD contention resolution mechanism. These protocols exploit the fact that the probability of collision upon network access is closely related to the traffic properties, namely, the bus utilization factor, message lengths, and precedence constraints [2][25]. In fact, knowing such traffic properties allows for computing probabilities of either packet losses or deadline misses [25][41].

In many distributed real-time applications the network utilization can be characterized at design time, the traffic load is light, and short messages predominate. In these cases, the expected deadline miss ratio is low. Reference [41] presents a table that contains, for a set of different load, bandwidth, and deadline requirements, the interval of time for which there is a 99% probability of all messages being delivered within their respective deadlines. For instance, in a 100-Mbps network with 1000 messages of 128 bytes of payload generated every second, and a message deadline of 1 ms, this interval is 1140 years. However, for the same load but with 2-ms deadlines on a 10-Mbps network, the interval is reduced to 1 h. This huge reduction in the interval for which all deadlines are met with 99% probability results from an increase in bus utilization from approximately 1 to 10%.

The numbers above show a strong dependence between the network utilization and the deadline miss probability. They also show that by keeping the utilization sufficiently low (1% in that case) and using relatively short messages (128 bytes of payload in the example), the deadline miss probability can be made almost negligible. Although this may seem a significant waste of resources, the fact that Ethernet bandwidth is significantly higher than the requirements of many practical applications makes this a practical option. In fact, there are protocols for real-time communication over Ethernet that rely solely on such a low bandwidth utilization factor with small payloads, as shown in the following subsections on NDDS and ORTE. Achieving higher bandwidth utilization factors together with a low deadline miss probability requires further control over the traffic in order to avoid bursts. This is called traffic smoothing or shaping, and it is explained later in this section.

In both cases, the probability of deadline misses is nonzero but can be made arbitrarily small with a corresponding penalty in bandwidth efficiency. Thus, in principle, these techniques may also be used in hard real-time applications, but their natural target is soft real-time ones, such as multimedia, in which an occasional deadline miss just causes some transitory performance degradation. The application of these techniques to distributed computer control systems is also possible as long as such systems, from the control point of view, are designed to tolerate occasional sample losses.
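To relate the figures above to bus utilization, the following sketch computes the load imposed by the 1000 messages per second of 128-byte payload used in the example, counting only the standard Ethernet per-frame overhead (preamble, header, CRC, and interframe gap). The result, roughly 1.3% of a 100-Mbps bus and 13% of a 10-Mbps bus, is consistent with the approximate 1 to 10% utilization figures quoted in the text; the program is only a back-of-the-envelope check, not the analysis of [41].

    #include <stdio.h>

    /* Per-frame overhead of Ethernet, in bytes: preamble + SFD (8),
     * MAC header (14), CRC (4), and the minimum interframe gap (12). */
    #define FRAME_OVERHEAD_BYTES (8 + 14 + 4 + 12)

    int main(void)
    {
        const double payload_bytes  = 128.0;     /* example from the text */
        const double messages_per_s = 1000.0;
        const double wire_bits   = (payload_bytes + FRAME_OVERHEAD_BYTES) * 8.0;
        const double offered_bps = wire_bits * messages_per_s;

        printf("offered load: %.0f bit/s\n", offered_bps);
        printf("utilization at 100 Mbps: %.1f%%\n", 100.0 * offered_bps / 100e6);
        printf("utilization at  10 Mbps: %.1f%%\n", 100.0 * offered_bps / 10e6);
        return 0;   /* about 1.3% and 13.3%, respectively */
    }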
20.5.1 NDDS

The Network Data Delivery Service (NDDS) [38] is a middleware for distributed real-time applications made by Real-Time Innovations, Inc., based on the Real-Time Publisher–Subscriber model [43]. It is currently in a process of standardization within the Object Management Group (OMG), which has recently released the "Data Distribution Service for Real-Time Systems Specification" [4].

The NDDS architecture is depicted in Figure 20.2. The system architecture is centered on the NDDS database, which holds all the pertinent data concerning groups of publishers and subscribers. This database is accessible to the NDDS library and to a set of NDDS tasks. The former component provides a comprehensive set of services to the user applications in the form of an application programming interface. The NDDS tasks manage subscriptions and services, and send and receive publication updates. The NDDS database is shared among all network nodes, providing them with a holistic view of the communication requirements. Such global knowledge may be used to compute the probabilistic guarantee of deadline misses for the current load. This information is then made available to the application. There is no other mechanism to support traffic timeliness beyond admission control on the current communication load.
FIGURE 20.2 NDDS architecture: the user applications call the NDDS library, which shares the NDDS database with the NDDS tasks, on top of the operating system and network interface.
However, in terms of fault tolerance, which is often a requirement of real-time systems, NDDS provides other mechanisms to support publisher redundancy. Thus, each group may have several subscribers and several replicated publishers of the same entity, e.g., a temperature value, all producing it in parallel. Each publication has two additional associated parameters: strength and persistency. The strength defines the relative weight of a publisher with respect to other publishers of the same entity. The persistency specifies the temporal validity of the publication. Subscribers consider a publication only if its strength is greater than or equal to the strength of the last one they received of that entity. In case the persistency window expires, the first publication of that entity after that instant is always accepted, irrespective of its strength. These mechanisms are complemented with sequence numbers assigned to each publication that allow the detection of missing instances. Publishers keep their publications in a buffer for a specified period. During that period, subscribers that missed a publication may request its retransmission.
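The strength and persistency rules described above can be condensed into the acceptance test run by a subscriber for each arriving publication, sketched below. It is only an illustration of the described behavior, with invented structure and function names, and it omits details such as how a missed publication is actually requested again.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* State kept by a subscriber for one published entity. */
    struct sub_state {
        int      last_strength;     /* strength of the last accepted sample */
        uint32_t last_seqno;        /* sequence number of that sample       */
        double   persistency_s;     /* temporal validity of a publication   */
        double   last_accept_time;  /* when the last sample was accepted    */
    };

    /* Decide whether an arriving publication should be accepted. */
    static bool accept_publication(struct sub_state *s, int strength,
                                   uint32_t seqno, double now)
    {
        bool expired = (now - s->last_accept_time) > s->persistency_s;

        /* After the persistency window expires, the first publication is
         * accepted irrespective of its strength; otherwise the strength
         * must be at least that of the last accepted publication. */
        if (!expired && strength < s->last_strength)
            return false;

        if (seqno != s->last_seqno + 1) {
            /* gap detected: a publication was missed and may be requested
             * again from the publisher's retransmission buffer */
        }

        s->last_strength    = strength;
        s->last_seqno       = seqno;
        s->last_accept_time = now;
        return true;
    }

    int main(void)
    {
        struct sub_state s = { .last_strength = 5, .last_seqno = 10,
                               .persistency_s = 1.0, .last_accept_time = 0.0 };
        /* weaker publication inside the persistency window: rejected */
        printf("%d\n", accept_publication(&s, 3, 11, 0.5));
        /* any strength accepted once the persistency window has expired */
        printf("%d\n", accept_publication(&s, 3, 12, 2.0));
        return 0;
    }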
20.5.2 ORTE

The OCERA Real-Time Ethernet (ORTE) communication protocol [47] has been developed in the scope of an open-source implementation of the Real-Time Publisher–Subscriber protocol [43]. This protocol allows establishing statistical real-time channels on Ethernet based on limiting the bandwidth utilization. The internal architecture of ORTE is presented in Figure 20.3, and it is functionally equivalent to the NDDS architecture (Section 20.5.1). The ORTE layer is composed of manager objects (M), which are responsible for traffic management, and managed applications (MA), which are the objects that represent the user application within the ORTE layer. Publisher redundancy and acknowledged message transmissions are supported by mechanisms equivalent to the ones presented in Section 20.5.1.

FIGURE 20.3 ORTE internal architecture: an API on top of a database of manager (M), managed-application (MA), publication, and subscription objects, with user data and metatraffic carried over UDP.
20.5.3 Traffic Smoothing

Kweon et al. [25] introduced the traffic-smoothing technique. In this work, the authors showed analytically that it is possible to provide a probabilistic guarantee that packets may be successfully transmitted
within a predefined time bound, if the total arrival rate of new packets generated by all the stations on the network is kept below a threshold, called the network-wide input limit. The probabilistic guarantee can be expressed as P(D ≤ Dk*) > 1 − Z, where Z is the loss tolerance and Dk* is the worst-case delay suffered by a packet when it is successfully transmitted at the kth trial. Therefore, if the network average load is kept below a given threshold and bursts of traffic are avoided, a low probability of collisions can be obtained, as well as an estimation of the network-induced delay.

To enforce this behavior, an interface layer called the traffic smoother (Figure 20.4) is placed just above the Ethernet data link layer. This element is in charge of shaping the traffic generated by the respective node according to a desired rate, commonly referred to as the station input limit. The traffic smoother implements a leaky bucket with appropriate depth and rate that captures and smoothes the non-real-time traffic generated by the node. On the other hand, the real-time traffic is, by its nature, nonbursty, thus spaced in time, with short payloads, resulting in a low collision probability. Therefore, it does not need smoothing and is immediately sent to the network, bypassing the leaky bucket.

FIGURE 20.4 Software architecture of traffic smoothing: the application and {TCP,UDP}/IP stack sit above the traffic smoother, in which non-real-time (NRT) traffic passes through a leaky bucket while real-time (RT) traffic bypasses it, on top of the Ethernet layer.
The station input limit, i.e., the parameters of the leaky bucket, can be defined either statically at design time or dynamically according to the current traffic conditions. The original implementation used the static approach, in which the station input limit is assigned at design time. A side effect of this technique is that it may lead to poor network bandwidth utilization whenever, at runtime, one or more stations use less bandwidth than they were assigned. In such circumstances, the unused bandwidth is simply wasted. Moreover, the number of stations must be known a priori to compute the station input limits. This is not adequate for an open system with stations that connect and disconnect at runtime.

Kweon et al. [26] and Lo Bello et al. [29] proposed the dynamic approach, in which the bus load is assessed online and the station input limit may vary within a predefined range. Thus, stations having higher communication requirements at some particular instant in time may reclaim bandwidth that is not being used by other stations at that time, thereby increasing the potential bandwidth utilization. Moreover, as stations dynamically adapt themselves to the traffic conditions, this solution scales well when the number of stations increases.

Lo Bello et al. [31] developed a further evolution of the dynamic approach that consists of estimating the network load online using two parameters, the number of collisions and the throughput, both observed in a given interval. These parameters are then fed to a fuzzy controller to set the instantaneous station input limit. The resulting efficiency is substantially higher than that of both the static and dynamic approaches relying on a single parameter to assess the bus state.
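A minimal sketch of the leaky-bucket gate applied to non-real-time traffic is given below: the station is granted a transmission budget that is refreshed periodically according to the station input limit, non-real-time packets consume it (and are queued when it is exhausted), and real-time packets bypass the bucket entirely. The structure, names, and refresh policy are assumptions made for illustration; they do not reproduce the exact smoother of [25], [26], [29], or [31].

    #include <stdbool.h>
    #include <stdio.h>

    /* Budget-based model of the leaky bucket used by a traffic smoother.
     * credit_bytes is refreshed every refresh period according to the
     * station input limit; depth_bytes bounds the accumulated credit. */
    struct leaky_bucket {
        long credit_bytes;      /* currently available budget            */
        long depth_bytes;       /* maximum accumulated budget (depth)    */
        long refill_bytes;      /* budget added at every refresh period  */
    };

    /* Called once per refresh period. */
    static void bucket_refresh(struct leaky_bucket *b)
    {
        b->credit_bytes += b->refill_bytes;
        if (b->credit_bytes > b->depth_bytes)
            b->credit_bytes = b->depth_bytes;
    }

    /* Returns true if the packet may be passed to the Ethernet layer now;
     * otherwise the caller keeps it queued until more credit is available.
     * Real-time packets bypass the bucket and are always sent at once. */
    static bool smoother_admit(struct leaky_bucket *b, long packet_bytes,
                               bool real_time)
    {
        if (real_time)
            return true;                    /* RT traffic is not smoothed */
        if (b->credit_bytes >= packet_bytes) {
            b->credit_bytes -= packet_bytes;
            return true;
        }
        return false;                       /* NRT packet waits in queue  */
    }

    int main(void)
    {
        struct leaky_bucket b = { .credit_bytes = 0, .depth_bytes = 3000,
                                  .refill_bytes = 1500 };
        bucket_refresh(&b);
        printf("NRT 1000 B: %d\n", smoother_admit(&b, 1000, false)); /* 1 */
        printf("NRT 1000 B: %d\n", smoother_admit(&b, 1000, false)); /* 0 */
        printf("RT   200 B: %d\n", smoother_admit(&b,  200, true));  /* 1 */
        return 0;
    }

In the dynamic variants described above, refill_bytes would be adjusted online according to the observed bus load rather than fixed at design time.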
20.6 Modified CSMA Protocols

As opposed to the previous category, in which the native arbitration mechanism of Ethernet, i.e., CSMA/CD, is used as is, in this category the arbitration protocol is modified, namely its backoff-and-retry mechanism, in order to improve the temporal behavior of the network (e.g., [3][28][46]). The result is still a fully distributed arbitration protocol of the CSMA family that determines when to transmit based only on local information and on the current state of the bus.

There are two common options in this category: delaying transmissions to reduce the probability of collisions, or sorting out collisions in a controlled way. This section presents four protocols, the first of which (virtual-time CSMA) follows the first option by implementing a type of CSMA with collision avoidance (CSMA/CA) that delays message transmissions according to a temporal parameter. The remaining three protocols, Windows, CSMA/DCR, and EQuB, follow the second option, modifying the backoff-and-retry mechanism so that the network access delay for any contending message can be bounded. CSMA/DCR and EQuB support a deterministic bound, while Windows still uses a probabilistic approach to sort out particular situations.
20.6.1 Virtual-Time CSMA The virtual-time CSMA protocol is presented in [33] and [37]. It allows implementing different scheduling policies by assigning different waiting times to messages submitted for transmission. The traffic on the bus is then serialized according to such waiting times, following an order that approximates the chosen scheduling policy. This mechanism is highly flexible in the sense that all common real-time scheduling policies can be implemented, either static priorities based (e.g., rate monotonic and deadline monotonic) or dynamic priorities based (e.g., minimum laxity first and earliest deadline first). One of the most interesting features of this protocol is that its decisions are based on the assessment of the communication channel status only. When the bus becomes idle and a node has a message to transmit, it waits for a given amount of time, related to the scheduling policy implemented. For example, if minimum laxity first (MLF) scheduling is used, the waiting time is derived directly from the laxity using a proportional constant. When this amount of time expires, and if the bus is still idle, the node tries to transmit the message. If a collision occurs, then the scheduler outcome resulted in more than one message having permission to be transmitted at the same time (e.g., when two messages in two nodes have the same laxity in MLF). In this case, the protocol can recalculate the waiting time either using the
FIGURE 20.5 Example of Virtual-Time CSMA operation using MLF. (Legend: A(l,z) is the arrival instant of the lth instance of message z; d(l,z) is the absolute deadline of instance l of message z.)
same rule or using a probabilistic approach. This last option is important to sort out situations in which the scheduler cannot differentiate messages; e.g., messages with the same laxity would always collide. Figure 20.5 shows the operation of the Virtual-Time CSMA protocol with MLF scheduling. During the transmission of message m, messages a and b become ready. Because the laxity of message a (i.e., time to the deadline minus message transmission time) is shorter than the laxity of message b, message a is transmitted first. During the transmission of message a, message c arrives. Messages b and c have the same deadline and the same laxity. Therefore, an attempt will be made to transmit them at the same time, causing a collision. Then the algorithm uses the probabilistic approach, with message b being given a random waiting time lower than that of message c, and thus being transmitted next. When the transmission of message b ends, the waiting time for message c is recomputed, and only after the expiration of this interval is message c finally transmitted. Beyond the advantage of using standard Ethernet hardware, this approach does not require any other global information but the channel status, which is readily available from all NICs. Thus, the protocol can be implemented in a fully distributed and uniform way, with a relatively low computational overhead. Nevertheless, this approach presents some important drawbacks:
• Performance is highly dependent on the proportional constant value used to generate the waiting time, leading to:
  • Excess of collisions if it is too short
  • Large amount of idle time if it is too long
• The proportional constant value depends on the properties of the message set; therefore, online changes to that set can lead to poor performance.
• Due to possible collisions, worst-case transmission time can be much higher than average transmission time, and thus only probabilistic timeliness guarantees can be given.
• When implemented in software using off-the-shelf NICs, the computational overhead grows with the level of traffic on the bus because every transmission or collision raises an interrupt in all nodes to trigger the intervals of waiting time. This can be costly for networks with higher transmission rates, such as Fast or Gigabit Ethernet, mainly when the bus traffic includes many short messages.
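As an illustration of the waiting-time rule discussed above, the sketch below derives an MLF-based waiting time from the laxity using an assumed proportionality constant; the randomized term applied after a collision is one possible realization of the probabilistic retry mentioned in the text, not the exact rule of [33] or [37].

```python
import random

# Assumed proportionality constant mapping laxity (seconds) to waiting time.
K_LAXITY_TO_WAIT = 0.5

def waiting_time(deadline, tx_time, now, after_collision=False):
    """Waiting time before a transmission attempt under MLF-based
    virtual-time CSMA: shorter laxity yields an earlier attempt.  After a
    collision between messages with equal laxity, a random component is
    added to break the tie (the probabilistic retry)."""
    laxity = (deadline - now) - tx_time            # time to deadline minus tx time
    wait = K_LAXITY_TO_WAIT * max(laxity, 0.0)
    if after_collision:
        wait += random.uniform(0.0, K_LAXITY_TO_WAIT * tx_time)
    return wait
```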
20.6.2 Windows Protocol The Windows protocol was proposed for both CSMA/CD and Token Ring networks [33]. Concerning the CSMA/CD implementation, the operation is as follows: The nodes on a network agree on a common time interval referred to as a window. All nodes synchronize upon a successful transmission, restarting the respective window. The bus state is used to assess the number of nodes with messages to be transmitted within the window:
• If the bus remains idle, there are no messages to be transmitted in the window.
FIGURE 20.6 Resolving collisions with the Windows protocol. (Legend: lst x is the latest sending time of message x.)
• If only one message is in the window, it will be transmitted.
• If two or more messages are within the window, a collision occurs.
Depending on the bus state, several actions can be performed:
• If the bus remains idle, the window duration is increased in all nodes.
• In the case of a collision, the time window is shortened in all nodes.
• In the case of a successful transmission, the window is restarted and its duration is kept as it is.
In the first two cases, the window duration is changed but the window is not restarted. Moreover, the window duration varies between a maximum (initial) and minimum value. Whenever there is a sufficiently long idle period in the bus, the window will return to its original maximum length. If a new node enters the system dynamically, it may have an instantaneous window duration different from that of the remaining nodes. This may cause some perturbation during an initial period, with more collisions than expected. However, as soon as an idle period occurs, all windows will converge to the initial length. A probabilistic retry mechanism may also be necessary when the windows are shrunk to their minimum and collisions still occur (e.g., when two messages have the same transmission time). Figure 20.6 shows a possible operational scenario using the Windows protocol implementing MLF message scheduling. The top axis represents the latest send times (lst) of messages A, B, and C. The lst of a message is the latest time instant by which the message transmission must start so that the respective deadline is met. This is equivalent to the laxity of the message as presented in the previous subsection. The first window (step 1) includes the lst of three messages, thus leading to a collision. The intervening nodes detect the collision, and the window is shrunk (step 2). However, the lst of messages A and B are still inside the window, causing another collision. In response to this event, the window size is shrunk again (step 3). In this case only message A has its lst within the window, leading to a successful transmission. This method exhibits properties that are very similar to those of the previous method (virtual-time protocol). It is based on local information only, it supports probabilistic bounds on the network delay, and it can be easily implemented in a fully distributed and uniform way. The computational overhead is also similar to that of the previous case, growing for higher levels of bus traffic when implemented in software. However, this approach is more efficient than virtual time because of its adaptive behavior, which can easily cope with a dynamic number of nodes and dynamic communication requirements. Its efficiency, though, is substantially influenced by the magnitude of the steps in the variations of the window duration.
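The chapter does not prescribe how much the window grows or shrinks at each step, only that the efficiency depends on the magnitude of those steps. The sketch below therefore uses an assumed doubling/halving rule between assumed bounds, purely to illustrate the adaptation logic.

```python
# Assumed bounds on the common window duration (seconds).
WINDOW_MAX = 1e-3   # initial (maximum) window
WINDOW_MIN = 1e-5   # minimum window

def update_window(window, bus_event):
    """Adjust the common window after each slot in the Windows protocol.

    bus_event is one of 'idle', 'collision', or 'success': idle slots grow
    the window, collisions shrink it, and a successful transmission keeps
    the current duration (the window is merely restarted)."""
    if bus_event == "idle":
        return min(WINDOW_MAX, window * 2)   # no lst inside: widen the window
    if bus_event == "collision":
        return max(WINDOW_MIN, window / 2)   # two or more lst inside: narrow it
    return window                            # success: keep the duration
```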
20.6.3 CSMA/DCR In [28], LeLann and Rivierre presented the CSMA/DCR protocol, where DCR stands for deterministic collision resolution. This protocol implements a fully deterministic network access scheme that consists
FIGURE 20.7 Example of tree search with CSMA/DCR. (Legend: c, chronological order; s, channel status; Cn, collided slot (n collisions); V, empty channel slot; X, transmission OK.)

TABLE 20.1 Tree Search Example
Search order    Channel status    Source index
1               C                 2 3 5 12 14 15
2               C                 2 3 5
3               C                 2 3
4               I
5               C                 2 3
6               X                 2
7               X                 3
8               X                 5
9               C                 12 14 15
10              I
11              C                 12 14 15
12              X                 12
13              C                 14 15
14              X                 14
15              X                 15
Note: Channel status: C, collision; I, idle; X, transmission.
of a binary tree search of colliding messages; i.e., there is a hierarchy of priorities in the retry mechanism that allows calculation of the maximum network delay that a message can suffer. During normal operation, the CSMA/DCR follows the standard IEEE 802.3 protocol (random access mode). However, whenever a collision is detected the protocol switches to the epoch mode. In this mode, lower-priority message sources voluntarily cease contending for the bus, and higher-priority ones try again. This process is repeated until a successful transmission occurs (Figure 20.7). After all frames involved in the collision are transmitted, the protocol switches back to the random access mode. Figure 20.7 together with Table 20.1 depicts the CSMA/DCR operation in a situation where six messages collide. Considering that lower indexes correspond to higher priorities, after the initial collision the right branch of the tree (messages 12, 14, and 15) ceases contending for bus access. Since there are still three messages on the left branch, a new collision appears, between messages 2, 3, and 5. Thus, the left subbranch is selected again, leaving message 5 out. In the following slot, messages 2 and 3 will collide again. The subbranch selected after this collision has no active messages, and thus in the following time slot the bus will be idle (step 4). This causes a move to the right subbranch, where messages 2 and 3 reside, resulting in a new collision (step 5). Finally, in step 6, the branch containing only the message with index 2 is selected, resulting in a successful transmission; messages 3 and 5 follow in steps 7 and 8. The algorithm continues this way until all messages are successfully transmitted. Despite ensuring a bounded access time to the transmission medium, this approach exhibits two main drawbacks:
• In some cases (e.g., [28]), the firmware must be modified; therefore, the economy of scale obtained when using standard Ethernet hardware is lost.
• The worst-case transmission time, which is the main factor considered when designing real-time systems, can be substantially longer than the average transmission time. Consequently, all worst-case analyses will be very pessimistic, leading to low bandwidth utilization, at least concerning real-time traffic.
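The deterministic binary tree search can be illustrated with a small behavioral simulation over an assumed index space of 16 sources; running it with the colliding set of Figure 20.7 reproduces the 15 slots of Table 20.1 (collisions, idle slots, and transmissions in the same order). This is only a sketch of the search order; the actual protocol in [28] realizes it in the NIC firmware.

```python
def csma_dcr_resolve(colliding, index_space=16):
    """Simulate the epoch-mode collision resolution of CSMA/DCR.

    Returns the slot-by-slot trace as ('collision'|'idle'|'tx', info)
    tuples, searching the binary tree of source indexes depth first,
    higher-priority (lower-index) half first."""
    trace = []
    pending = set(colliding)
    stack = [(0, index_space)]          # ranges of source indexes still to search
    while stack:
        lo, hi = stack.pop()
        active = sorted(i for i in pending if lo <= i < hi)
        if not active:
            trace.append(("idle", None))
        elif len(active) == 1:
            trace.append(("tx", active[0]))
            pending.remove(active[0])
        else:
            trace.append(("collision", tuple(active)))
            mid = (lo + hi) // 2
            stack.append((mid, hi))     # right (lower-priority) half searched later
            stack.append((lo, mid))     # left (higher-priority) half searched first
    return trace

# Reproduces the slot sequence of Table 20.1:
for step, (status, info) in enumerate(csma_dcr_resolve([2, 3, 5, 12, 14, 15]), 1):
    print(step, status, info)
```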
20.6.4 EQuB Sobrinho and Krishnakumar [49] proposed the EQuB protocol, which allows achieving predictable behavior on shared Ethernet networks. It consists of an overlay mechanism to the native CSMA/CD that allows real-time and non-real-time traffic to coexist on the same network while providing privileged access to the former over the latter, with a first-come, first-served (FCFS) access discipline between contending real-time sources. The collision resolution mechanism for real-time sources requires disabling the native exponential backoff mechanism of Ethernet and the capacity to transmit jamming sequences with predefined durations. Both features must be configured in the network interface card of the respective hosts, but the latter feature is not commonly supported by off-the-shelf NICs. The underlying real-time traffic model assumes that during long intervals of time, called sessions, real-time hosts generate continuously periodic streams of data to be transmitted over the network. This is the case, for example, when a host initiates the transmission of a video stream at a constant bit rate. Collisions involving non-real-time hosts only are sorted out by the native CSMA/CD mechanism of Ethernet. However, when real-time hosts participate in a collision, they transmit a jamming signal that is longer than that specified in the Ethernet MAC protocol, i.e., 32 bit times. These crafted jamming signals are called black bursts, and their maximum duration is set proportional to the time a given host has been waiting to transmit a given message, i.e., the duration of the collision resolution process. During the transmission of a black burst, the bus state is continuously monitored. If, at some moment, a real-time host contending for the bus detects that no other nodes are sending black bursts, it infers that it is the host having the oldest ready message (highest-priority according to FCFS), subsequently aborts the transmission of its own black burst, and transmits the data message immediately after. If a real-time host transmits its complete black burst and still feels the bus jammed, it infers that other hosts having longer black bursts, and consequently longer waiting times, are also disputing the bus. In this circumstance the host backs off, waiting for the bus to become idle for the duration of an interframe space (IFS). At this time, the black burst duration is recomputed to reflect the increased waiting time, and a new attempt is made to transmit the message. Figure 20.8 illustrates the bus arbitration mechanism with two hosts having one real-time message each, 1 and 2, scheduled for transmission at instants t0 and t1, respectively, while a third data message is being transmitted. Since both hosts feel the bus busy, they wait for the end of the current message transmission plus an IFS, which occurs at instant t3. According to EQuB, both nodes attempt to transmit their message at time t3, generating a collision and starting the transmission of black bursts (t4). Since message 2 has a shorter waiting time than message 1, its black burst is completely transmitted, terminating at instant t5, and the respective host backs off, waiting for the bus to become idle again before retrying the message transmission. At that point, the winning host, which has the oldest message, detects that there are no more jamming sequences from other hosts, stops sending its own jamming signal, and immediately initiates the transmission of its data message, which happens at instant t6.
It is important to realize that non-real-time data messages always lose the arbitration against any real-time message because real-time hosts transmit their messages right after the jamming signal without further delay, while the non-real-time messages follow the standard Ethernet backoff-and-retry mechanism (BEB). On the other hand, among real-time messages, the ones with longer waiting times are associated with longer black bursts. Thus, they are transmitted before other real-time messages with shorter waiting times, resulting in the FCFS serialization, as discussed before.
FIGURE 20.8 Black burst contention resolution mechanism. (Legend: NRT data packet, RT data packet, black burst; the timeline shows instants t0 through t8, the waiting and access times of the two RT packets, the interframe spaces, and the maximum black burst durations.)
Moreover, the EQuB protocol also takes advantage of the underlying periodic model of the real-time traffic and schedules the next transmission in each host one period later with respect to the transmission instant of the current instance. Thus, in some circumstances, particularly when the message periods in all real-time hosts are equal or harmonic, the future instances of the respective messages will not collide again, leading to a high efficiency in bus utilization and to a round-robin service of real-time hosts.
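The ordering property behind the black-burst arbitration can be summarized in a few lines. The sketch below assumes that burst lengths are measured in contention slots, one slot per assumed unit of waiting time; the exact encoding and timing rules are defined in [49], so this only captures the FCFS outcome, not the bit-level behavior.

```python
def black_burst_slots(waiting_time, slot_unit):
    """Black burst length proportional to how long the host has waited."""
    return int(waiting_time // slot_unit) + 1

def wins_contention(my_wait, other_waits, slot_unit):
    """A real-time host wins the black-burst contention when its burst is
    the longest one: all shorter bursts end earlier, so at some point the
    host sees no other jamming signal, aborts its own burst, and sends its
    data message (FCFS order among real-time hosts).  Losers complete
    their bursts, still sense the bus jammed, and back off until the next
    idle period."""
    mine = black_burst_slots(my_wait, slot_unit)
    return all(black_burst_slots(w, slot_unit) < mine for w in other_waits)
```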
20.7 Token Passing One well-known medium access control technique suited for shared broadcast bus networks is token passing. According to this technique, there is a single token in the entire network at any instant and only the node having possession of the token is allowed to trigger message transactions. The token is then circulated among all nodes according to an order that is protocol dependent. In the simplest and most common way, the token rotates in a circular fashion, which tends to divide the bandwidth equally among all nodes in high-traffic load conditions. For asymmetrical bandwidth distribution, some protocols allow the token to visit the same node more than once in each token round. In both cases, a basic condition for real-time operation is that the time spent by the token at each node must be bounded. This can be achieved by using a timed-token protocol [32], as in the well-known cases of FDDI, IEEE 802.4 Token Bus, and PROFIBUS. The same technique, i.e., a timed-token protocol, can be used to enforce real-time behavior on Ethernet networks, overriding the native CSMA/CD arbitration. A common pitfall of these approaches is that token losses take time to be detected and recovered from, causing a transitory disruption in the network operation. Also, the bus access is not periodic due to the irregularity of token arrivals, causing a considerable jitter in high-rate periodic message streams. The above pitfalls have been addressed by other protocols, e.g., RETHER and RT-EP, both explained below, that have substantially different token management policies. For example, in RETHER the token rotation is triggered periodically, irrespective of the traffic transmitted in each cycle. On the other hand, in RT-EP the token is first circulated among all nodes to reach an agreement on the highest-priority message ready to be transmitted, and then it is directly sent to the respective transmitting node.
FIGURE 20.9 Extended Ethernet protocol stack for timed-token operation. (The QoS sublayer implementing the token-passing protocol is inserted between the logical link control and the medium access control layers; applications access it directly or through TCP/UDP/IP, including video/audio real-time traffic.)
20.7.1 Timed-Token Protocols In timed-token protocols [32], the token visits all the nodes in a fixed order, without previous knowledge about their states concerning the number or priority of ready messages. Therefore, upon token arrival, a node may have several messages ready for transmission or none. In the former case, the node transmits its ready messages while in possession of the token. In the latter case, the token is forwarded immediately. The crux of the protocol consists of enforcing an upper limit to the interval of time that a node can hold the token before forwarding it, i.e., the token-holding time. This interval of time is set dynamically upon each token arrival according to the difference between the target and the effective token rotation times. The target token rotation time is a configuration parameter with a deep impact on the temporal behavior of the system. For example, it influences directly the worst-case interval between two consecutive token visits. The effective token rotation time is the interval that actually elapses between a token arrival and the arrival of the previous one at the same node. Therefore, during each token visit, a node has more or less time to transmit messages depending on whether the token arrived early or late, respectively. In any case, a minimum transmission capacity is always granted to every node during each token visit to reduce network inaccessibility periods (synchronous bandwidth in FDDI and 802.4, or one high-priority message in PROFIBUS). Knowing the global communication requirements as well as the number of nodes in the network, it is possible to upper bound the time between two consecutive token visits to each node, thus providing an upper bound to the real-time traffic latency. The respective feasibility analysis is shown in [32] for IEEE 802.4 and in [53] for PROFIBUS. Steffen et al. [51] present an implementation of this concept on shared media local area networks. Although aiming particularly at shared Ethernet, the method may also be applied to networks like HomePNA [14] and Powerline [40]. The extended Ethernet protocol stack proposed in [51] is depicted in Figure 20.9. All the nodes connected to the network have a quality-of-service (QoS) sublayer (token-passing protocol in Figure 20.9), which interfaces with the logical link control and the medium access control layers. The QoS sublayer overrides the native arbitration mechanism, controlling the access to the bus via a token-passing mechanism. This protocol defines two distinct types of message streams: synchronous and asynchronous. Synchronous traffic is assumed to be periodic and its maximum latency can be bounded. It is characterized by the message transmission time, period, and deadline. Asynchronous traffic is handled according to a best-effort policy, and thus no real-time guarantees are provided. Asynchronous streams are characterized by the message transmission time and desired average bandwidth.
FIGURE 20.10 Sample network configuration for RETHER. (Six nodes on an Ethernet bus: some nodes hold RT messages, the remaining nodes hold exclusively NRT messages.)
Whenever the token arrives at a node, the synchronous messages are sent first. All nodes are granted at least a predefined synchronous bandwidth in all token visits to send this type of traffic. After the synchronous bandwidth is exhausted, a node can continue to transmit up to the exhaustion of its token-holding time. After that, the token is forwarded to the next node in the circulation list. In [51] Steffen et al. present the adaptation of existing analytical tools to carry out the feasibility analysis of the real-time communication requirements.
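The core timed-token rule, setting the asynchronous holding time from the difference between the target and the effective token rotation times, can be summarized as a minimal sketch; it glosses over the differences between FDDI, IEEE 802.4, and PROFIBUS (which, for instance, grants only one high-priority message when the token is late).

```python
def token_holding_time(ttrt, previous_arrival, current_arrival,
                       synchronous_allocation):
    """Holding time granted to a node upon token arrival in a timed-token
    protocol: the synchronous allocation is always granted, and extra
    transmission time is available only when the token arrives earlier
    than the target token rotation time (TTRT)."""
    effective_trt = current_arrival - previous_arrival  # measured rotation time
    early = max(ttrt - effective_trt, 0.0)              # credit for an early token
    return synchronous_allocation + early
```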
20.7.2 RETHER The RETHER protocol was proposed by Venkatramani and Chiueh [56]. This protocol operates in normal Ethernet CSMA/CD mode until the arrival of a real-time communication request, upon which it switches to Token Bus mode. In Token Bus mode, real-time data are considered to be periodic and the time is divided into cycles of fixed duration. During the cycle duration the access to the bus is regulated by a token. First, the token visits all nodes that are sources of real-time (RT) messages. After this phase, and if there is enough time until the end of the cycle, the token visits the sources of non-real-time (NRT) messages. An online admission control policy assures that all accepted RT requests can always be served and that new RT requests cannot jeopardize the guarantees of existing RT messages. Therefore, in each cycle all RT nodes can send their RT messages. However, concerning the NRT traffic, no timeliness guarantees are granted. Figure 20.10 illustrates a possible network configuration with six nodes. Nodes 1 and 4 are sources of RT messages, forming the RT set. The remaining nodes have no such RT requirements and constitute the NRT set. The token first visits all the members of the RT set and after, if possible, the members of the NRT set. A possible token visit sequence could be cycle i {1 – 4 – 1 – 2 – 3 – 4 – 5 – 6}, cycle i + 1 {1 – 4 – 1 – 2}, cycle i + 2 {1 – 4 – 1 – 2 – 3 – 4}, etc. In the ith cycle the load is low enough so that the token has time to visit the RT set plus all nodes in the NRT set, too. In the following cycle, besides the RT set, the token only visits nodes 1 and 2 of the NRT set, and in the next cycle, only nodes 1 through 4 of the NRT set are visited. This approach supports deterministic analysis of the worst-case network access delay, particularly for the RT traffic. Furthermore, if the NRT traffic is known a priori, it is also possible to bound the respective network access delay, which can be important, for example, for sporadic real-time messages. However, since the bandwidth available for NRT messages is distributed according to the nodes order established in the token circulation list, the first nodes always get precedence over the following ones, leading to potentially very long worst-case network access delays. Moreover, this method involves a considerable communication overhead caused by the circulation of the token.
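The online admission control mentioned above can be illustrated with an assumed simplified test: a new real-time source is accepted only if the per-cycle RT transmissions and the token-passing overhead still fit in the fixed cycle, leaving some reserve for the NRT phase. The actual test in [56] is more elaborate (it accounts, for example, for message periods relative to the cycle), so this is only a sketch of the idea.

```python
def admit_rt_source(rt_tx_times, new_tx_time, cycle_time,
                    token_overhead, nrt_reserve=0.0):
    """Assumed form of a RETHER-like admission test for a new RT source.

    rt_tx_times   : per-cycle transmission times of already admitted RT sources
    new_tx_time   : per-cycle transmission time of the requesting source
    token_overhead: cost of one token visit
    nrt_reserve   : time optionally kept for the NRT phase of each cycle
    """
    total_rt = sum(rt_tx_times) + new_tx_time
    visits = len(rt_tx_times) + 1
    return total_rt + visits * token_overhead + nrt_reserve <= cycle_time
```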
20.7.3 RT-EP: Real-Time Ethernet Protocol The Real-Time Ethernet Protocol (RT-EP) [34][35] is a token-passing protocol that operates over Ethernet and that was designed to be easily analyzable using well-known schedulability analysis techniques.
FIGURE 20.11 RT-EP node architecture. (Application tasks use the Init_Comm, Send_Info, and Recv_Info services; a priority-ordered TX queue and per-application RX queues are handled by the main communication thread, which interfaces with Ethernet.)
An RT-EP network is logically organized as a ring, each node knowing which other nodes are its predecessor and successor. The token circulates from node to node within this logical ring. Access to the bus takes place in two phases: arbitration and application message transmission. In the arbitration phase, the token visits all the nodes arranged in the logical ring to determine the one holding the highest-priority message ready for transmission. For this purpose, the token conveys a priority threshold that is initialized with the lowest priority in the system every time an application message is transmitted. Then, upon token arrival, each node compares the priority of its own ready messages, if any, with the priority encoded in the token. If any of its ready messages has a higher priority than the one encoded in the token, this one is updated. The token also carries the identity of the node that contains the highest-priority message found so far. After one token round the arbitration phase is concluded and the token is sent directly to the node having the highest-priority ready message so that it can transmit it, i.e., the application message transmission phase. After concluding the application message transmission, the same node starts a new arbitration phase. Internally (Figure 20.11), each node has one priority transmission queue, within which all outgoing messages are stored in priority order, and a set of reception priority queues, one for each application requesting the reception of messages. The main communication thread handles all the interaction with the network, carrying out all the protocol-related operations, namely, the arbitration and application message transmission and reception. User applications have access to three services:
• Init_Comm: Performs network initialization
• Send_Info: Places a message in the TX queue for transmission
• Recv_Info: Reads a message (if present) from the application RX queue
RT-EP packets are carried in the data field of the Ethernet frames. There are two distinct types of RT-EP packets: token packets and info packets. The token packets are used during the arbitration phase and contain a packet identifier, specifying the functionality of the packet; priority and station address fields, identifying the highest-priority ready message as well as the respective station ID; and a set of fields used to handle faults. The info packets carry the actual application data and contain a packet identifier field, specifying the packet’s type; a priority field, which contains the priority of the message being conveyed; a channel ID field, identifying the destination queue in the receiver node; a length field, defining the message data size; an info field, carrying the actual message data; and a packet number field, which is a sequence number used for fault tolerance purposes.
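The per-node behavior during the arbitration phase reduces to a single comparison against the priority carried by the token. The sketch below assumes a priority encoding in which a larger value means higher priority and a TX queue sorted with the highest-priority entry first; it is illustrative, not the on-wire token format of [34][35].

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ArbitrationToken:
    """State carried by the RT-EP token during the arbitration phase."""
    best_priority: int = -1            # assumed: larger value = higher priority
    best_station: Optional[str] = None # station holding the best message so far

def on_token_arrival(token: ArbitrationToken, station_id: str, tx_queue):
    """Compare the head of the local priority TX queue with the priority
    carried by the token and update the token if this node holds a better
    candidate; the token is then forwarded to the successor."""
    if tx_queue:                                   # head holds the highest local priority
        head_priority = tx_queue[0]
        if head_priority > token.best_priority:
            token.best_priority = head_priority
            token.best_station = station_id
    return token

# After a full round, the token is sent directly to token.best_station,
# which then transmits its message (application message transmission phase).
```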
The fault tolerance mechanism [35] allows recovering from message losses, including token losses, within a bounded time. This mechanism is based on forcing all stations to permanently listen to the bus. Following any transaction, the predecessor station monitors the bus, waiting for the transmission of the next frame by the receiving station. If the receiving station does not transmit any frame within a given time window, the predecessor station assumes a message loss and retransmits it. After a predefined number of unsuccessful retries, the receiving station is considered a failing station and is excluded from the logical ring. This mechanism may lead to the occurrence of message duplicates. The sequence number field, present both in the token and info packets, is used to discard the duplicate messages at the receiving nodes. This protocol has been implemented on nodes running MaRTE OS, a POSIX-compliant real-time kernel.
20.8 Time-Division Multiple Access Another well-known technique to achieve predictable temporal behavior on shared communication networks is to assign exclusive time slots to distinct data sources, either nodes or devices, in a cyclic fashion. This is known as time-division multiple access (TDMA), and it implies a global synchronization framework so that all nodes agree on their respective transmission slots. Hence, this is also a collision-free medium access protocol that can be used on top of shared Ethernet to override its native CSMA/CD mechanism and prevent the negative impact of collisions. TDMA mechanisms are widely used, mainly in safety-critical applications. Examples of TDMA-based protocols include TTP/C, TT-CAN, SAFEBus, and SWIFTNET. The remainder of this section addresses two particular TDMA implementations on shared Ethernet.
20.8.1 The MARS Bus The MARS bus was the networking infrastructure used in the MARS (Maintainable Real-Time System) architecture [24, 45] developed in the late 1980s. Soon after, the MARS bus evolved into what is nowadays the TTP/C protocol. The MARS architecture aimed at fault-tolerant distributed systems providing active redundancy mechanisms to achieve high predictability and ease of maintenance. In MARS, all activities are scheduled offline, including tasks and messages. The resulting schedule is then used online to trigger the system transactions at the appropriate instants in time. Interactions among tasks, either local or remote, are carried out via MARS messages. It is the role of the MARS bus to convey MARS messages between distinct nodes (cluster components). The MARS bus was based on a 10BASE2 Ethernet using standard Ethernet interface cards. A TDMA scheme was used to override Ethernet’s native CSMA/CD medium access control. The TDMA round consisted of a sequence of slots of equal duration, each assigned to one node in a circular fashion. Moreover, during each slot the tasks in each node were scheduled in a way to prevent contention between tasks on bus access [44].
20.8.2 Variable-Bandwidth Allocation Scheme The variable-bandwidth allocation scheme was proposed for Ethernet networks by Lee and Shin [27]. Basically, it is a TDMA transmission control mechanism in which the slots assigned to each node in the TDMA round can have different durations. This feature allows tailoring the bandwidth distribution among nodes according to their effective communication requirements, and thus it is more bandwidth efficient than other TDMA-based mechanisms relying on equal-duration slots, as was the case in the MARS bus. Nowadays, this feature has been incorporated in most of the existing TDMA-based protocols, e.g., TTP/C and TT-CAN, improving their bandwidth efficiency. Moreover, this technique also encompasses the possibility of changing the system configuration online, namely, adding or removing nodes, a feature that is sometimes referred to as flexible TDMA (FTDMA) [57]. The nomenclature used in [27] uses the expression frame to refer to the TDMA round. Both the frame duration (frame time — F) together with the slot durations (slot times — Hi) are computed according
FIGURE 20.12 The structure of a TDMA frame. (The frame time starts with the control slot Tc, followed by the slot times H1, H2, ..., Hn, separated by interslot times.)
to the specific traffic characteristics. The first slot in each frame (Tc) is reserved for control purposes, such as time synchronization and addition or deletion of nodes. The structure of a TDMA frame is depicted in Figure 20.12. The transmission of the control slot, Tc, as well as the interslot times represent communication overhead. The interslot time must be sufficient to accommodate a residual global clock inaccuracy and to allow nodes to process incoming messages before the start of the following slot. In their work, the authors derive a set of necessary conditions that a given allocation scheme f has to fulfill to compute both the frame (F) and slot durations (Hi) according to the communication requirements, i.e., message transmission times (Ci), periods (Pi), and system overhead (g), that is, f: ({Ci}, {Pi}, g) → ({Hi}, F). Based on those conditions, the authors present an algorithmic approach for carrying out the computation of F and Hi and compare the results of this methodology with other TDMA approaches, namely, MARS. The results obtained show the improvement in bandwidth utilization that may be achieved with this variable-bandwidth allocation scheme.
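A toy instance of the mapping f can make the quantities concrete. The sketch below assumes one periodic stream per node and deadlines equal to periods: the frame time F is set to the shortest period so that every stream is served once per period, and each slot Hi just fits its message. The actual algorithm in [27] is more general and derives F and the Hi from a set of necessary conditions.

```python
def allocate_variable_tdma(tx_times, periods, tc, interslot):
    """Toy instance of f: ({Ci}, {Pi}, g) -> ({Hi}, F) under simplifying
    assumptions (one stream per node, deadlines equal to periods).

    tx_times : message transmission times Ci, one per node
    periods  : message periods Pi, one per node
    tc       : duration of the control slot
    interslot: interslot time (clock inaccuracy plus processing margin)
    Returns (slots, frame_time) or None if this naive rule is infeasible."""
    frame_time = min(periods)                     # one slot per node every period
    slots = list(tx_times)                        # Hi = Ci under this simple policy
    overhead = tc + interslot * (len(slots) + 1)  # control slot plus interslot times
    if sum(slots) + overhead <= frame_time:
        return slots, frame_time
    return None
```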
20.9 Master–Slave Techniques One of the simplest ways of enforcing real-time communication over a shared broadcast bus, including Ethernet, consists of using a master–slave approach, in which a special node, the master, controls the access to the medium of all other nodes, the slaves. The traffic timeliness is then reduced to a problem of scheduling that is local to the master. However, this approach typically leads to a considerable underexploitation of the network bandwidth because every data message must be preceded by a control message issued by the master, resulting in a substantial communication overhead. Moreover, there is some extra overhead related to the turnaround time, i.e., the time that must elapse between consecutive messages, since every node must fully receive and decode the control message before transmitting the respective data message. Nevertheless, it is a rugged transmission control strategy that has been used in many protocols. This section will describe two examples: ETHERNET Powerlink [9] and flexible time-triggered (FTT) Ethernet [39]. The case of FTT-Ethernet deserves particular attention because it implements a variant of the master–slave technique that allows a substantial reduction in the protocol communication overhead. This is called the master–multislave approach [1] according to which the bus time is broken into cycles and the master issues one control message each cycle only, indicating which data messages must be transmitted therein. This mechanism has been developed within the FTT communication framework [8] and has been implemented over different network protocols, such as Controller Area Network [1] and Ethernet [39].
20.9.1 FTT-Ethernet Protocol The FTT-Ethernet protocol [39] combines the master–multislave transmission control technique with centralized scheduling, maintaining both the communication requirements and the message scheduling policy localized in one single node, the master, and facilitating online changes to both, thus supporting a high level of operational flexibility.
FIGURE 20.13 FTT-Ethernet traffic structure. (Each elementary cycle (EC) starts with a trigger message (TM) identifying the synchronous messages (SMx) transmitted in the synchronous window; the remaining time forms the asynchronous window, carrying control (CMx) and non-real-time (NRTx) messages.)
The bus time is divided into fixed-duration time slots called elementary cycles (ECs) that are further decomposed into two phases, the synchronous and asynchronous windows (Figure 20.13), which have different characteristics. The synchronous window carries the periodic time-triggered traffic that is scheduled by the master node. The expression time triggered implies that this traffic is synchronized to a common time reference, which in this case is imposed by the master. The asynchronous window carries the sporadic traffic related to protocol control messages, event-triggered messages, or non-real-time traffic in general. There is a strict temporal isolation between both phases so that the sporadic traffic does not interfere with the time-triggered one. Despite allowing online changes to the attributes of the time-triggered traffic, the FTT-Ethernet protocol enforces global timeliness using online admission control. Due to the global knowledge and centralized control of the time-triggered traffic, the protocol supports arbitrary scheduling policies (e.g., rate monotonic (RM) and earliest deadline first (EDF)) and may easily support dynamic QoS management complementary to admission control. Beyond the flexibility and timeliness properties that this protocol exhibits, there are also some drawbacks that concern the computational overhead required in the master to execute both the message scheduling and the schedulability analysis online. This is, however, confined to one node. As far as the communication protocol is concerned, the computational power required by the slaves is just that needed to decode the trigger message in time and start the due transmissions at the right instants. Finally, in safety-critical applications the master must be replicated, for which there are specific mechanisms to ensure coherency between their internal databases holding the system communication requirements.
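As an illustration of the master's per-EC decision, the following sketch (with an assumed tuple layout and function name) selects, under RM or EDF ordering, the synchronous messages that fit in the synchronous window and would be identified in the trigger message; the actual FTT-Ethernet master additionally performs admission control and manages the asynchronous window.

```python
def build_ec_schedule(ready_msgs, sync_window, policy="RM"):
    """Pick the synchronous messages announced in one trigger message.

    ready_msgs : list of (msg_id, tx_time, deadline, period) tuples
    sync_window: duration of the synchronous window of the next EC
    policy     : 'RM' (shortest period first) or 'EDF' (earliest deadline first)
    """
    key = (lambda m: m[2]) if policy == "EDF" else (lambda m: m[3])
    schedule, used = [], 0.0
    for msg_id, tx_time, _deadline, _period in sorted(ready_msgs, key=key):
        if used + tx_time <= sync_window:
            schedule.append(msg_id)      # identified in the EC trigger message
            used += tx_time
    return schedule
```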
20.9.2 ETHERNET Powerlink ETHERNET Powerlink [9] is a commercial protocol providing deterministic isochronous real-time communication, operating over hub-based Fast Ethernet networks. A recently developed version (version 2.0) also allows operation over switched Ethernet networks, but for applications with more relaxed temporal constraints only. The protocol supports either periodic (isochronous) or event (asynchronous) data exchanges, and when implemented on hubs, it also provides a very tight time synchronization (accuracy better than 1 µs) and fast update cycles (in the order of 500 µs) for the periodic traffic. From architectural and functional points of view, this protocol bears many resemblances to the WorldFIP fieldbus. The ETHERNET Powerlink protocol uses a master–slave transmission control technique, which completely prevents the occurrence of collisions at the bus access [10]. The network architecture is asymmetric, composed of a so-called Powerlink manager (master) and a set of Powerlink controllers (slaves). The former device controls all the communication activities, assigning time slots to all the remaining stations. The latter devices, controllers, are passive bus stations, sending information only after an explicit request from the manager. The Powerlink protocol operates isochronously, with the data exchanges occurring in a cyclic framework based on a microcycle of fixed duration, i.e., the Powerlink cycle. Each cycle is divided into four distinct phases: start, cyclic, asynchronous, and idle periods (Figure 20.14).
FIGURE 20.14 Powerlink cycle structure. (The manager opens the cycle with a start-of-cycle message, exchanges PollRequest/PollResponse messages with the controllers during the cyclic period, signals the end of cycle, handles invite/send transactions in the asynchronous period, and finally inserts an idle period until the cycle time elapses.)
A Powerlink cycle starts with a start-of-cycle message, sent by the manager. This is a broadcast message, which instructs controllers that a new cycle will start, and thus allows them to carry out the preparation of the necessary data. After the start period is the cyclic period, in which the controllers transmit the isochronous traffic. The transactions carried out in this period (window) are fully controlled by the manager, which issues poll requests (PollRequest) to the controllers. Upon reception of a PollRequest, controllers respond by transmitting the corresponding data message (PollResponse). The PollRequest message is a unicast message, directly addressed to the controller node involved in the transaction. The corresponding PollResponse is a broadcast message, thus facilitating the distribution of data among all system nodes that may need it (producer–distributor–consumer communication model). Isochronous messages may be issued every cycle or every given number of cycles according to the application communication requirements. After completing all isochronous transactions of one cycle, the manager transmits an end-of-cycle message, signaling the end of the cyclic period. Asynchronous transactions may be carried out between the end of the cyclic period and the end of the Powerlink cycle. These messages may be asynchronous data messages (invite/send) or management messages, like Ident/AsyncSend, issued by the manager to detect active stations. Since these transactions are still triggered by the Powerlink manager, any node having asynchronous data to send must first notify the manager of that fact. This is performed during an isochronous transaction involving that particular node, using piggybacked signaling in the respective PollResponse message. The manager maintains a set of queues for the different asynchronous request sources and schedules the respective transactions within the asynchronous period, if there is enough time up to the end of the cycle. If there is not enough time to complete a given asynchronous transaction, or there is no scheduled asynchronous transaction, the protocol inserts idle time in the cycle (idle period) in order to strictly respect the period of the start-of-cycle message. ETHERNET Powerlink also handles Ethernet packets with foreign protocols, such as TCP/IP. This traffic is conveyed within the asynchronous period. Powerlink provides a special-purpose device driver that interfaces with such upper-protocol stacks.
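The overall cycle can be summarized in a schematic manager loop. The callback names, message labels, and the time guard used below are assumptions for illustration, not the Powerlink API; the sketch only shows the ordering of the start, cyclic, asynchronous, and idle periods.

```python
import time

def powerlink_manager_cycle(controllers, async_requests, cycle_time, send, recv):
    """One schematic manager cycle: SoC broadcast, PollRequest/PollResponse
    per isochronous controller, EoC, asynchronous transactions while time
    remains, and an idle period so the SoC period is strictly respected."""
    start = time.monotonic()
    send("SoC")                                    # start period
    for controller in controllers:                 # cyclic (isochronous) period
        send(("PollRequest", controller))
        recv()                                     # broadcast PollResponse
    send("EoC")
    while async_requests and time.monotonic() - start < 0.9 * cycle_time:
        send(("AsyncInvite", async_requests.pop(0)))   # asynchronous period
        recv()
    remaining = cycle_time - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)                      # idle period keeps SoC periodic
```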
20.10 Switched Ethernet For roughly one decade, the interest in using Ethernet switches has been growing as a means to improve global throughput and traffic isolation and to reduce the impact of the nondeterministic features of the original CSMA/CD arbitration mechanism. Switches, unlike hubs, provide a private collision domain for each of their ports; i.e., their ports are not directly connected to each other. When a message arrives at a switch port, it is buffered, analyzed concerning its destination, and moved to the buffer of the destination port (Figure 20.15). The packet-handling block in the figure, commonly referred to as the switch fabric, transfers messages from input to output ports. When the arrival rate of messages at each port, either input or output, is greater than the rate of departure, the messages are queued. Currently, most switches are fast enough at handling message arrivals so that queues do not build up at the input ports
FIGURE 20.15 Switch internal architecture. (Frames enter through the input ports and receiving buffers, pass through the packet-handling block, which performs address lookup and traffic classification, and are placed in scheduled output queues at the output ports.)
(these are commonly referred to as nonblocking switches). However, queues may always build up at the output ports whenever several messages arrive in a short interval and are routed to the same port. In such a case, queued messages are transmitted sequentially, normally in FCFS order. This queue-handling policy may, however, lead to substantial network-induced delays because higher-priority or more important messages may be blocked in the queue while lower-priority or less important ones are being transmitted. Therefore, the use of several parallel queues for different priority levels has been proposed (formerly IEEE 802.1p, now integrated within IEEE 802.1D). The number of distinct priority levels is limited to eight, but many current switches that support traffic prioritization offer an even further limited number. The scheduling policy used to handle the messages queued at each port also strongly impacts the network timing behavior [22]. A common misconception is that the use of switches, due to the elimination of collisions, is enough to enforce real-time behavior in Ethernet networks. However, this is not true in the general case. For instance, if a burst of messages destined to the same port arrives at the switch, output queues can overflow, thus losing messages. This situation, despite seeming somewhat unrealistic, can occur with a nonnegligible probability in certain communication protocols based on the producer–consumer model, e.g., Common Industrial Protocol (CIP) and its lower-level protocols such as EtherNet/IP (Industrial Protocol) [36], or based on the publisher–subscriber model, such as RTPS [43] used within Interface for Distributed Automation (IDA). In fact, according to these models, each node that produces a given datum (producer or publisher) transmits it to potentially several nodes (consumers or subscribers) that need it. This model is efficiently supported in Ethernet by means of special addresses, called multicast addresses. Each network interface card can define the multicast addresses related to the information that it should receive. However, the switch has no knowledge about such addresses and thus treats all multicast traffic as broadcasts; i.e., messages with multicast destination addresses are transmitted to all ports (flooding). Therefore, when the predominant type of traffic is multicast/broadcast instead of unicast, one can expect a substantial increase of the peak traffic at each output port that increases the probability of queue overflow, causing degradation of the network performance. Furthermore, in these circumstances, one of the main benefits of using switched Ethernet, i.e., multiple simultaneous transmission paths, can be compromised. A possible way to limit the impact of multicasts is using virtual LANs (VLANs) so that flooding affects only the ports of the respective VLAN [36]. Other problems concerning the use of switched Ethernet are discussed in [5], such as the additional latency introduced by the switch in the absence of collisions as well as the low number of available priority levels, which hardly supports the implementation of efficient priority-based scheduling. These problems are, however, essentially technological and are expected to be eliminated in the near future. Moreover, switched Ethernet does alleviate the nondeterminism inherent to CSMA/CD medium access control and opens the way to efficient implementations of real-time communication over Ethernet. 
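The behavior of a prioritized output port, and the overflow problem discussed above, can be illustrated with a short sketch of strict-priority queuing over the (at most) eight IEEE 802.1D traffic classes. The queue limit is an assumption used only to show how a burst routed to one port drops frames.

```python
from collections import deque

class OutputPort:
    """Output port with up to eight strict-priority FIFO queues.

    Bounded queues illustrate why a burst of multicast/broadcast traffic
    forwarded to the same port can overflow and lose messages."""
    def __init__(self, num_classes=8, queue_limit=64):
        self.queues = [deque() for _ in range(num_classes)]
        self.queue_limit = queue_limit
        self.dropped = 0

    def enqueue(self, frame, traffic_class):
        q = self.queues[traffic_class]
        if len(q) >= self.queue_limit:
            self.dropped += 1            # queue overflow: the frame is lost
        else:
            q.append(frame)

    def dequeue(self):
        """Strict priority: always serve the highest non-empty class."""
        for q in reversed(self.queues):  # class 7 is the highest priority
            if q:
                return q.popleft()
        return None
```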
The remainder of this section presents two protocols that operate over switched Ethernet to support real-time communication.
FIGURE 20.16 System architecture for EDF-based switched Ethernet. (An RT layer with separate RT and NRT queues is added between the TCP/UDP/IP protocols and the Ethernet MAC/PHY in the end nodes as well as in the switch, together with an RT channel management function in the switch.)
20.10.1 EDF Scheduled Switch Hoang et al. [12][13] developed a technique that supports a mix of real-time (RT) and non-real-time (standard IP) traffic coexisting in a switch-based Ethernet network. The RT traffic is scheduled according to the earliest-deadline-first policy, and its timeliness is guaranteed by means of adequate online admission control. The proposed system architecture, depicted in Figure 20.16, requires the addition of a real-time layer (RT-l) to the network components, i.e., the end nodes as well as the switch. The RT-l is responsible for establishing real-time connections, performing admission control, providing time synchronization, and managing the message transmission and reception of both real-time and non-real-time traffic classes. The switch RT channel management layer provides time synchronization by periodically transmitting a time reference message. Moreover, this layer also takes part in the admission control process, by assessing the internal state of the switch, and consequently its ability to fulfill the timeliness requirements of the real-time message streams, and by acting as a broker between the nodes requesting RT channels and the targets of such requests. Finally, this layer also disseminates the internal switch state, namely concerning the status of its queues, to allow flow control of non-real-time traffic on the end nodes. Real-time communication is carried out within real-time channels, point-to-point logical connections with reserved bandwidth. Whenever a node needs to send real-time data, it issues a request to the switch, indicating both the source and destination addresses (both MAC and IP), and the period, transmission time, and deadline of the message. Upon reception of such a request, the switch performs the first part of the admission control mechanism, which consists of evaluating the feasibility of the communication between the source node and the switch (uplink) and between the switch and the target node (downlink). If the switch finds the request feasible, it forwards the request to the destination node. The target node analyzes the request and informs the switch about its willingness to accept the real-time connection or not. The switch then forwards this answer to the originator node. If the RT channel is accepted, it is assigned a system-wide channel ID that uniquely identifies the connection. The real-time layer is composed of two distinct queues, one for real-time traffic and the other for non-real-time traffic. The former is a priority queue, where messages are kept sorted by distance to their deadlines. The non-real-time queue holds the messages in a first-in, first-out scheme. Thus, real-time messages are transmitted according to their deadlines, while non-real-time messages are transmitted according to their arrival instant. The feasibility analysis proposed [13] is derived from EDF task scheduling analysis, but with adaptations to account for some system specifics, such as including the overheads due to control messages and the impact of nonpreemptive message transmission. In the scope of that work, deadlines are defined on an end-to-end basis. Since the traffic is transmitted in two separate steps (uplink and downlink), the analysis must ensure that the total delay induced by these steps together does not exceed the total end-to-end deadline. For a given real-time message stream i, if diu is the deadline for the uplink and did the deadline for the downlink, then the end-to-end deadline
FIGURE 20.17 Connection setup procedure in the EtheReal architecture. (Sender applications on host A issue an RTCD request that traverses EtheReal switches M and N toward host B; the result is propagated back to the requestor.)
diee must be at least as large as the sum of the previous two: diu + did ≤ diee. In [12], the authors assume an end-to-end deadline equal to the period of the respective message stream, and a symmetric partitioning of that deadline between the uplink and downlink. An improvement is presented in [13], where the authors propose an asymmetric deadline partition scheme. Although more complex, this method allows a higher efficiency in bandwidth utilization, because a larger fraction of the deadline can be assigned to more loaded links, thus increasing the overall schedulability level.
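A heavily simplified version of the admission decision illustrates how the two links are checked. The sketch below assumes deadlines equal to periods, a symmetric deadline split, and a plain EDF utilization bound of 1 per link; the analysis in [13] additionally accounts for control-message overhead, non-preemptive transmission, and the asymmetric partitioning mentioned above, so this is only an upper-level sketch.

```python
def admit_rt_channel(uplink_utilization, downlink_utilization, tx_time, period):
    """Simplified admission test for a new RT channel of utilization
    tx_time/period: the channel loads both its uplink and its downlink,
    and neither link may exceed the EDF utilization bound of 1."""
    u = tx_time / period
    return (uplink_utilization + u <= 1.0) and (downlink_utilization + u <= 1.0)
```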
20.10.2 EtheReal The EtheReal protocol [54] is another proposal to achieve real-time behavior on switched Ethernet networks. In this approach, the protocol is supported by services implemented on the switch only, without any changes in the operating system and network layers of end nodes. The switch services are accessible to the end nodes by means of user-level libraries. EtheReal has been designed to support both real-time and non-real-time traffic via two distinct classes. The real-time variable-bit-rate (RT-VBR) service class is meant to support real-time applications. These services use reserved bandwidth and try to minimize the packet delay and packet delay variation (jitter). Applications must provide the desired traffic characteristics during the connection setup, namely, average traffic rate and maximum burst length. If these parameters are violated at runtime, the real-time guarantees do not hold and packets may be lost. The second service class is best effort (BE); it was developed specifically to support existing non-real-time applications like telnet, HTTP, etc., without requiring any modification. No guarantees are provided for this type of traffic. Real-time services in EtheReal are connection oriented, which means that applications have to follow a connection setup protocol before being able to send data to the real-time channels. The connection setup procedure is started by sending a reservation request to a user-level process called real-time communication daemon (RTCD), running on the same host (Figure 20.17). This daemon is responsible for the setup and teardown of all connections in which the host node is engaged. The reservation request for RT connections contains the respective QoS requirements: average traffic rate and maximum burst length. Upon reception of a connection setup request, the RTCD contacts the neighbor EtheReal switch that evaluates whether it has enough resources to meet the QoS requirements of the new RT connection without jeopardizing the existing ones, namely, switch fabric bandwidth, CPU bandwidth for packet scheduling, and data buffers for packet queuing. If it has such resources and if the destination node is directly attached to the same switch, it positively acknowledges the request. If the destination node is in another segment, i.e., connected to another switch, the switch that received the request forwards it to the next switch in the path. A successful connection is achieved if and only if all the switches in the path between the source and target node have enough resources to accommodate the new RT connection. If one switch does not have enough resources, it sends back a reject message, which is propagated down to the requestor node. This procedure serves to notify the requestor application about the result of the
operation, as well as to let the intermediate EtheReal switches de-allocate the resources associated with that connection request. The EtheReal architecture employs traffic shaping and policing, within both hosts and switches. The traffic shaping is performed to smooth the interpacket arrival time, generating a constant-rate flow of traffic. Traffic policing is used to ensure that the declared QoS parameters are met during runtime. Those functions are also implemented on the switches to ensure that an ill-behaved node, due to either malfunction or malicious software, does not harm the other connections on the network. With respect to the packet scheduling inside the switch, the EtheReal architecture employs a cyclic round-robin scheduling algorithm. All real-time connections are served within a predefined cycle. A part of that cycle is also reserved for best-effort traffic, to avoid starvation and subsequent time-outs on the upper-layer protocols. Applications access the real-time services by means of a real-time data transmission/reception (RTTR) library, which provides services for connection setup and teardown and data transmission and reception, beyond other internal functions already referred to, such as traffic shaping and policing. Another interesting feature of this protocol is its scalability and high recovery capability, when compared with standard switches. For example, the spanning tree protocol (IEEE 802.1D) is used in networks of standard switches to allow redundant paths and automatic reconfiguration upon a link/switch failure. However, such reconfiguration may take several tens of seconds with the network down, typically around 30 s, which is intolerable for most real-time applications. On the other hand, the authors claim that EtheReal networks may recover nearly three orders of magnitude faster, within 32 ms [55].
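The cyclic round-robin service with a reserved best-effort share can be sketched as below; the parameter names, the budget units, and the way unused real-time time is handed to best-effort traffic are assumptions made for illustration, not details taken from [54].

```python
def etherreal_cycle(rt_connections, best_effort_queue, cycle_budget,
                    best_effort_share, send):
    """One scheduling cycle inside an EtheReal-like switch: every admitted
    real-time connection receives its reserved quantum, and a fixed
    fraction of the cycle is kept for best-effort traffic to avoid
    starvation of the upper-layer protocols."""
    rt_budget = cycle_budget * (1.0 - best_effort_share)
    for connection in rt_connections:              # reserved-bandwidth RT connections
        quantum = min(connection["reserved"], rt_budget)
        if quantum > 0:
            send(connection["id"], quantum)
            rt_budget -= quantum
    be_budget = cycle_budget * best_effort_share + rt_budget  # unused RT time helps BE
    while best_effort_queue and be_budget >= 1:
        send("best-effort", best_effort_queue.pop(0))
        be_budget -= 1                             # assumed: one budget unit per BE frame
```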
20.11 Recent Advances The interest in Ethernet and its use to support real-time communication continues to grow in the industrial domain, in embedded systems, and even in LANs that support QoS-sensitive distributed applications (e.g., video conferencing, Voice-over-IP (VoIP)). This growing interest also motivates a substantial research effort toward solving current limitations and improving real-time performance of Ethernet-based communication systems. This section summarizes a few of the latest results related to Ethernet technologies that were presented in this chapter. The results to be briefly discussed deal with issues such as protocol stack implementation solutions, switched topologies, traffic scheduling within the switches, and shared Ethernet. The way packets are handled by the protocol software (protocol stack) within the system nodes is one of the issues recently addressed in related literature. Most of the operating systems implement a single queue, usually using a first-come, first-served policy, for both real and non-real-time traffic. This approach causes priority inversions and induces unforeseen delays. A methodology that has been recently proposed to solve this problem is to implement multiple transmit/receive queues [48]. With this approach, the real-time traffic is intrinsically separated from the non-real-time traffic, and the latter is sent/processed only when the real-time queues are empty. It is also possible to build separate queues for each traffic class, providing internal priority-aware scheduling. The topology of a switched Ethernet network is another important issue that has recently received a great deal of attention. In fact, the topology has an impact on the number of switches that messages have to cross before reaching the target. This in turn affects the temporal properties of the traffic. For instance, the line topology proposed in [23], in which each device integrates a simplified switch and all devices are chained in a line, eases the cabling, but this is not the most suitable topology for realtime behavior and fault tolerance. On the other hand, the work in [42], or more recently in [6], proposes using a tree topology with two levels only. An optimization algorithm decides the allocation of nodes to switches so that all time constraints are met, taking into account the whole traffic in each branch of the tree, i.e., in each switch. This topology favors the real-time behavior of the system but leads to a more complex cabling. However, neither of the referred topologies considers redundant paths for improved fault tolerance. This issue is addressed in [55], in which a variant of the spanning
tree protocol is proposed that is capable of managing redundant paths with recovery times on the order of a few tens of milliseconds — a magnitude that is compatible with the time constraints of a large set of practical applications.

Another aspect concerning switch-based Ethernet networks deals with the scheduling policy within the switch itself. Switches support up to eight distinct statically prioritized traffic classes. Different message scheduling strategies have a strong impact on the real-time behavior of the switch [23]. In particular, strategies oriented toward average performance and fairness, which are relevant for general-purpose networks, may have a negative impact on the switch real-time performance. Recent research presented in [13] and [55] addresses the use of different scheduling policies within switches, namely, earliest deadline first (EDF) and modified round-robin with fixed-duration cycles, respectively. Both works also address admission control to provide timeliness guarantees to the current real-time traffic, while supporting online channel setup and teardown.

On the other hand, the interest in shared Ethernet continues, either for applications requiring frequent multicasting, in which case the benefits of using switches are substantially reduced, or for applications requiring precise control of transmission timing, such as high-speed servoing. In fact, switches induce higher delay and jitter in message forwarding than hubs. This is caused by internal mechanisms such as the MAC address to port translation performed during forwarding. In the previous sections, several examples of recent enhancements to shared Ethernet were discussed, such as the work on adaptive traffic smoothing [31] and master–slave techniques, including both the ETHERNET Powerlink [9] and FTT-Ethernet [39] protocols. Nevertheless, these two protocols also operate over switches, in the process incurring additional forwarding delays. In the latter protocol, the switch-based implementation may take advantage of the message queuing in the switch ports to simplify the transmission control. Several existing Ethernet-based industrial protocols, such as Ethernet/IP, also take advantage of switches to improve their real-time capabilities [11], [36]. In particular, Ethernet/IP is receiving considerable support from major international associations of industrial automation suppliers, such as the Open DeviceNet Vendor Association (ODVA), ControlNet International (CNI), the Industrial Ethernet Association (IEA), and the Industrial Automation Open Networking Alliance (IAONA).
20.12 Conclusion

Ethernet is the most popular technology for LANs today. Due to its low cost, high availability, and easy integration with other networks, among other characteristics, Ethernet has become an attractive option in application domains for which it was not originally designed. Some of these application domains, e.g., industrial automation, impose real-time constraints on the communication services that must be delivered to the applications. This requirement conflicts with the medium access control technique originally embedded in the protocol, CSMA/CD, which is nondeterministic and behaves very poorly under medium to high network loads. Therefore, many adaptations and technologies for Ethernet have been proposed to support the desired real-time behavior.

This chapter presented an overview of some paradigmatic techniques, ranging from changes to the bus arbitration, to the addition of transmission control layers, to the use of special networking equipment such as switches. These techniques were described and briefly analyzed for their pros and cons in different types of applications. Then a brief summary of the latest results was presented.

With the current trend of bringing Ethernet into the world of distributed automation systems, it is likely that Ethernet (and its variants based on different solutions to support hard real-time and deterministic behavior, among other requirements) will establish itself as the de facto communication standard for this area. Although its efficiency in terms of bandwidth utilization is still low for short messages, particularly compared with the majority of fieldbuses, its already high and increasing bandwidth seems more than enough to compensate for this deficiency. Ethernet thus has a chance to become a single networking technology within automation systems, supporting the integration of all levels, from the plant floor to management, maintenance, the supply chain, etc.
References

[1] Almeida, L., Pedreiras, P., and Fonseca, J.A. The FTT-CAN protocol: why and how. IEEE Transactions on Industrial Electronics, 49, 2002.
[2] Tanenbaum, A.S. Computer Networks, 4th edition. Prentice Hall, Englewood Cliffs, NJ, 2002.
[3] Court, R. Real-time Ethernet. Computer Communications, 15, 198–201, 1992.
[4] Data Distribution Service for Real-Time Systems Specification, Final Adopted Specification ptc/03-03-07. Object Management Group, Inc., July 2003.
[5] Decotignie, J.-D. A perspective on Ethernet as a fieldbus. In Proceedings of FeT 2001: 4th International Conference on Fieldbus Systems and Their Applications, pp. 138–143. Nancy, France, November 2001.
[6] Divoux, T., Georges, J.P., Krommenacker, N., and Rondeau, E. Designing suitable switched Ethernet architectures regarding real-time application constraints. In Proceedings of INCOM 2004 (11th IFAC Symposium on Information Control Problems in Manufacturing). Salvador, Brazil, April 2004.
[7] DIX Ethernet V2.0 specification, 1982.
[8] FTT Web page. Available at http://www.ieeta.pt/lse/ftt.
[9] ETHERNET Powerlink protocol. Available at www.ethernet-powerlink.org.
[10] ETHERNET Powerlink Data Transport Services White Paper, Ver. 0005. Bernecker + Rainer Industrie-Elektronik Ges.m.b.H., September 2002. Available at http://www.ethernet-powerlink.org.
[11] Ethernet/IP (Industrial Protocol) specification. Available at www.odva.org.
[12] Hoang, H., Jonsson, M., Hagstrom, U., and Kallerdahl, A. Switched real-time Ethernet with earliest deadline first scheduling: protocols and traffic handling. In Proceedings of WPDRTS 2002, the 10th International Workshop on Parallel and Distributed Real-Time Systems. Fort Lauderdale, FL, April 2002.
[13] Hoang, H. and Jonsson, M. Switched real-time Ethernet in industrial applications: asymmetric deadline partitioning scheme. In Proceedings of the 2nd International Workshop on Real-Time LANs in the Internet Age, RTLIA ’03. Porto, Portugal, July 2003.
[14] Home Phoneline Association. Available at http://www.homepna.org.
[15] IEEE 802.3 10BASE5 standard.
[16] IEEE 802.3 10BASE2 standard.
[17] IEEE 802.3c 1BASE5 StarLAN standard.
[18] IEEE 802.3i 10BASE-T.
[19] IEEE 802.3u 100BASE-T.
[20] IEEE 802.3z 1000BASE-T.
[21] IEEE 802.3ae-2002, 10 Gbps.
[22] Jasperneite, J. and Neumann, P. Switched Ethernet for factory communication. In Proceedings of ETFA 2001: 8th IEEE International Conference on Emerging Technologies and Factory Automation. Antibes, France, October 2001.
[23] Jasperneite, J., Neumann, P., Theis, M., and Watson, K. Deterministic real-time communication with switched Ethernet. In Proceedings of WFCS ’02: 4th IEEE Workshop on Factory Communication Systems, pp. 11–18. Västeras, Sweden, August 2002.
[24] Kopetz, H., Damm, A., Koza, C., Mulazzani, M., Schwabl, W., Senft, C., and Zainlinger, R. Distributed fault-tolerant real-time systems: the MARS approach. IEEE Micro, 9, 25–40, 1989.
[25] Kweon, S.-K., Shin, K.G., and Zheng, Q. Statistical real-time communication over Ethernet for manufacturing automation systems. In Proceedings of the 5th IEEE Real-Time Technology and Applications Symposium. June 1999.
[26] Kweon, S.-K., Shin, K.G., and Workman, G. Achieving real-time communication over Ethernet with adaptive traffic smoothing. In Proceedings of RTAS ’00, 6th IEEE Real-Time Technology and Applications Symposium, pp. 90–100. Washington, DC, June 2000.
[27] Lee, J. and Shin, H. A variable bandwidth allocation scheme for Ethernet-based real-time communication. In Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications, pp. 28–33. Tokyo, Japan, October 1995.
[28] LeLann, G. and Rivierre, N. Real-Time Communications over Broadcast Networks: The CSMA-DCR and the DOD-CSMA-CD Protocols, INRIA Report RR1863. 1993.
[29] Lo Bello, L., Lorefice, M., Mirabella, O., and Oliveri, S. Performance analysis of Ethernet networks in the process control. In Proceedings of the 2000 IEEE International Symposium on Industrial Electronics. Puebla, Mexico, December 2000.
[30] Lo Bello, L. and Mirabella, O. Design issues for Ethernet in automation. In Proceedings of FeT 2001: 4th FeT IFAC Conference. Nancy, France, 2001.
[31] Lo Bello, L., Mirabella, O., et al. Fuzzy traffic smoothing: an approach for real-time communication over Ethernet networks. In Proceedings of WFCS 2002, 4th IEEE Workshop on Factory Communication Systems. Västeras, Sweden, August 2002.
[32] Malcolm, N. and Zhao, W. The timed-token protocol for real-time communications. IEEE Computer, 27, 35–41, 1994.
[33] Malcolm, N. and Zhao, W. Hard real-time communications in multiple-access networks. Real-Time Systems, 9, 75–107, 1995.
[34] Martínez, J., Harbour, M., and Gutiérrez, J. A multipoint communication protocol based on Ethernet for analyzable distributed applications. In Proceedings of the 1st International Workshop on Real-Time LANs in the Internet Age, RTLIA ’02. Vienna, Austria, 2002.
[35] Martínez, J., Harbour, M., and Gutiérrez, J. RT-EP: real-time Ethernet protocol for analyzable distributed applications on a minimum real-time POSIX kernel. In Proceedings of the 2nd International Workshop on Real-Time LANs in the Internet Age, RTLIA ’03. Porto, Portugal, July 2003.
[36] Moldovansky, A. Utilization of modern switching technology in Ethernet/IP networks. In Proceedings of the 1st International Workshop on Real-Time LANs in the Internet Age, RTLIA ’02. Vienna, Austria, 2002.
[37] Molle, M. and Kleinrock, L. Virtual time CSMA: why two clocks are better than one. IEEE Transactions on Communications, COM-33, 919–933, 1985.
[38] Pardo-Castellote, G., Schneider, S., and Hamilton, M. NDDS: The Real-Time Publish-Subscribe Middleware. Real-Time Innovations, Inc., Sunnyvale, CA, August 1999. Available at http://www.rti.com/products/ndds/literature.html.
[39] Pedreiras, P., Gai, P., and Almeida, L. The FTT-Ethernet protocol: merging flexibility, timeliness and efficiency. In Proceedings of the 14th Euromicro Conference on Real-Time Systems, pp. 152–160. Vienna, Austria, 2002.
[40] Powerline Alliance. Available at http://www.powerlineworld.com.
[41] Real-Time Innovations, Inc. Can Ethernet Be Real-Time? Available at http://www.rti.com/products/ndds/literature.html.
[42] Rondeau, E., Divoux, T., and Adoud, H. Study and method of Ethernet architecture segmentation for industrial applications. In The 4th IFAC Conference on Fieldbus Systems and Their Applications, pp. 165–172. Nancy, France, November 2001.
[43] RTPS (Real-Time Publisher–Subscriber protocol), part of the IDA (Interface for Distributed Automation) specification. Available at www.ida-group.org.
[44] Schwabl, W., Reisinger, J., and Grunsteidl, G. A Survey of MARS, Research Report 16/89. Vienna University of Technology, Austria, October 1989.
[45] Schutz, W. A Test Strategy for the Distributed Real-Time System MARS, Research Report 1/90. Vienna University of Technology, Austria, January 1990.
[46] Shimokawa, Y. and Shiobara, Y. Real-time Ethernet for industrial applications. In Proceedings of IECON, pp. 829–834. 1985.
[47] Smolik, P., Sebek, Z., and Hanzalek, Z. ORTE: open source implementation of Real-Time Publish-Subscribe Protocol. In Proceedings of the 2nd International Workshop on Real-Time LANs in the Internet Age, RTLIA ’03. Porto, Portugal, July 2003.
[48] Skeie, T., Johannessen, S., and Holmeide, O. The road to an end-to-end deterministic Ethernet. In Proceedings of WFCS ’02: 4th IEEE International Workshop on Factory Communication Systems, pp. 3–9. Västeras, Sweden, August 2002.
[49] Sobrinho, J.L. and Krishnakumar, A.S. EQuB: Ethernet quality of service using black bursts. In Proceedings of the 23rd Conference on Local Computer Networks, pp. 286–296. Boston, MA, October 1998.
[50] Song, Y. Time-constrained communication over switched Ethernet. In Proceedings of FeT 2001: 4th International Conference on Fieldbus Systems and Their Applications, pp. 138–143. Nancy, France, November 2001.
[51] Steffen, R., Zeller, M., and Knorr, R. Real-time communication over shared media local area networks. In Proceedings of the 2nd International Workshop on Real-Time LANs in the Internet Age, RTLIA ’03. Porto, Portugal, July 2003.
[52] Thomesse, J.-P. Fieldbus and interoperability. Control Engineering Practice, 7, 81–94, 1999.
[53] Tovar, E. and Vasques, F. Cycle time properties of the PROFIBUS timed-token protocol. Computer Communications, 22, 1206–1216, 1999.
[54] Varadarajan, S. and Chiueh, T. EtheReal: a host-transparent real-time Fast Ethernet switch. In Proceedings of the 6th International Conference on Network Protocols, pp. 12–21. Austin, TX, October 1998.
[55] Varadarajan, S. Experiences with EtheReal: a fault-tolerant real-time Ethernet switch. In Proceedings of the 8th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), pp. 184–195. Antibes, France, October 2001.
[56] Venkatramani, C. and Chiueh, T. Supporting real-time traffic on Ethernet. In Proceedings of the IEEE Real-Time Systems Symposium. San Juan, Puerto Rico, December 1994.
[57] Willig, A. A MAC protocol and a scheduling approach as elements of a lower layer architecture in wireless industrial LANs. In Proceedings of WFCS ’97 (IEEE International Workshop on Factory Communication Systems). Barcelona, Spain, October 1997.
21 Switched Ethernet in Automation Networking

Tor Skeie, ABB Corporate Research
Svein Johannessen, ABB Corporate Research
Øyvind Holmeide, OnTime Networks

21.1 The Switches Are Not the Complete Network
21.2 Analyzing Switched Fast Ethernet
    The Learning Process inside the Switch
21.3 There Are Always Bottlenecks
    Even Highways Have Queues • Introducing a Standard for Priority and Delivery • High-Priority Packets Get High-Priority Treatment • Bottleneck Conclusions
21.4 Time Synchronization across Switched Ethernet
    The Concept of Time Stamping • Synchronization Requirements in Substation Automation • How to Be Extremely Accurate: IEC Class T3 • Measurements on an Actual Network • Beyond the Speed of Light: Class T5 • Summary and Conclusions
21.5 Introducing Virtual Subnetworks
    Port Group VLANs • Group-Based VLANs: GARP VLAN Registration Protocol • Layer 3-Based VLANs
References
In recent years Ethernet technology has taken several giant evolutionary steps. From the 10 Mbit/s shared cable of 1990, the state of the art is now a switch-based communication technology running at 100 or 1000 Mbit/s. The network switches are also getting more sophisticated in that they have started to include support for packet priority and virtual networks. Since all this power comes at a steadily decreasing cost, it is invading other cost-sensitive areas, most notably automation networks. This chapter will look at some critical aspects of switched Fast Ethernet as an automation network.
21.1 The Switches Are Not the Complete Network

Exchanging the classic shared Ethernet infrastructure for a switched one does wonders for the network infrastructure, but not necessarily for our automation network. The reason for this is illustrated in Figure 21.1, which shows all the stages in an information transfer between an automation controller and an input/output (I/O) node. The purpose of this chapter is to:

• Analyze the advantages, disadvantages, and peculiarities of a switched Ethernet automation network
• Point out the remaining bottlenecks between the controller software and the I/O node software
• Discuss how the characteristics of the switched network affect some specialized automation network tasks

FIGURE 21.1 The complete end-to-end network: controller software, protocol stack, network interface, and drop link on the controller side; the switched network in the middle; and drop link, network interface, protocol stack, and I/O node software on the other side.
21.2 Analyzing Switched Fast Ethernet

Let us start by recapitulating some basic facts about switched Ethernet automation networks:

1. Just as in a hub, an Ethernet switch contains a number of ports. Each port is either active (connected to an active Ethernet node) or passive (disconnected or connected to an inactive Ethernet node).
2. The connection between an active port and its associated Ethernet node is point to point. The connection may be full duplex (send and receive simultaneously) if the associated node supports it; otherwise, it is half duplex.
3. Each port in an Ethernet switch has a table of network addresses associated with that port. These tables are created by inspecting network packets sent from the node and extracting the source address from the packets (a minimal sketch of this learning and forwarding logic follows the list).
4. Ethernet switches use a store-and-forward approach, which means that the complete network packet is received and verified by the switch before it is transferred to the output port.
5. The transfer of a network packet from one port to another inside the Ethernet switch is done by memory-to-memory copy at a very high speed.
6. An Ethernet switch does not use the collision mechanism of classic Ethernet.
7. Several transfers between different ports may take place more or less simultaneously. If an Ethernet switch has N ports, it should also be capable of handling N simultaneous connections running at full link speed (assuming that none of them request the same output port). This means that an Ethernet switch has a potentially much greater data transfer capability than a hub.
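The following C sketch is a didactic model of items 3 to 5 above: the switch learns the source address of every received frame on a per-port basis and forwards a frame to the port its destination was learned on, or floods it to all other ports when the destination is unknown. Real switches implement this with hashed, aged tables in hardware; the names here are ours.

#include <string.h>
#include <stdint.h>

#define NUM_PORTS    16
#define TABLE_SIZE   1024
#define PORT_UNKNOWN (-1)

/* Tiny forwarding database: MAC address -> output port (no hashing, no aging). */
struct fdb_entry { uint8_t mac[6]; int port; int valid; };
static struct fdb_entry fdb[TABLE_SIZE];

static int fdb_lookup(const uint8_t mac[6])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (fdb[i].valid && memcmp(fdb[i].mac, mac, 6) == 0)
            return fdb[i].port;
    return PORT_UNKNOWN;
}

static void fdb_learn(const uint8_t mac[6], int port)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (fdb[i].valid && memcmp(fdb[i].mac, mac, 6) == 0) {
            fdb[i].port = port;                 /* refresh existing entry */
            return;
        }
    }
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!fdb[i].valid) {
            memcpy(fdb[i].mac, mac, 6);
            fdb[i].port = port;
            fdb[i].valid = 1;
            return;
        }
    }
    /* table full: a real switch would age out the oldest entry */
}

/* Forwarding decision for a frame received on in_port; send() stands for
 * the internal memory-to-memory copy toward one output port. */
static void switch_forward(const uint8_t dst[6], const uint8_t src[6],
                           int in_port, void (*send)(int port))
{
    fdb_learn(src, in_port);                    /* learn the source address */
    int out = fdb_lookup(dst);
    if (out == PORT_UNKNOWN || (dst[0] & 0x01)) {
        /* unknown unicast, broadcast, or multicast: flood to all other ports */
        for (int p = 0; p < NUM_PORTS; p++)
            if (p != in_port)
                send(p);
    } else if (out != in_port) {
        send(out);                              /* known unicast destination */
    }
}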
21.2.1 The Learning Process inside the Switch

Item 3 above gave the algorithm the switch uses for the internal transfer of network packets. What happens if the switch does not have the destination node address in any port table? In this case, the switch plays it safe and transfers the packet to every output port in the switch. It reduces the switch performance to that of a hub, but it ensures that if the destination node is anywhere on the network, the packet will get through. In the meantime, the switch has learned the location of the source node and stored it in the correct table. This learning process is usually very fast, since each node only has to answer once in order for it to be associated with the correct port. There are, however, some cases when the learning process fails, the most important being:

• Ethernet broadcast
• Ethernet multicast

21.2.1.1 Broadcast: Use with Care

Ethernet inherited the concept of broadcast (an address that is accepted by every node on the network) from high-level data link control (HDLC). This functionality turned out to be very useful in several network protocols:
• Name announcing in NetBIOS ("Hi everybody, my name is …")
• The Service Advertising Protocol (SAP) in IPX/SPX
• Network address inquiry in ARP ("Hello everybody, who has got this IP address?")
• IP address deployment in BOOTP and DHCP ("I just woke up and I want an IP address!")
The drawbacks inherent in excessive usage of broadcast messages are:

• Broadcast messages cannot be filtered by hardware and must perforce be handled by software, thus consuming CPU resources.
• If broadcast messages are allowed to pass through bridges and routers unchecked, they may create a broadcast storm (broadcast messages propagated in circles and thereby clogging the whole network).
• If a broadcast address is used as a source address, it may create a lot of network problems (this is a well-known way of creating network meltdown).
• It reduces the performance of a switched network to that of a hub-based network.

21.2.1.2 Multicast: A Useful, but Dangerous Concept

A huge number of Ethernet addresses are reserved for multicast usage. The multicast concept addresses the need for creating network groups of some sort. Traditionally, the multicast concept has been popular in some automation contexts, usually associated with a philosophy called publish-and-subscribe. This philosophy allows independently developed distributed applications to be able to exchange information in an event-driven manner without needing to know the source of the data or the network topology. Information producers publish information anonymously. Subscribers anonymously receive messages without requesting them.

Like other broadcast-based models, publish-and-subscribe is efficient when used on a classic network. For example, if the cost of energy changes in a distribution system, only a single transmission is required to update all of the devices dependent on the energy price. (This is, of course, in the best or most optimistic case.) On a switched Ethernet network, the situation changes drastically. For a standard (unmanaged) switch, multicasting actually uses more bandwidth than sending the same message to one recipient after another. This surprising fact is best illustrated with an example. If we have a sixteen-port switch and send the same message to four nodes, the message will occupy the sending port four times and the four receiving ports one time each, for a total of eight message times. Sending it as a multicast means that the message will occupy all sixteen ports for one message time — a total of sixteen message times. The moral in this case is: Know your protocols and your infrastructure — and make sure they work together, not against one another.
21.3 There Are Always Bottlenecks

Referring again to Figure 21.1, we observe that the network infrastructure is just one part of the communication path from the controller application to the I/O node application. The drop links are unchanged from the hub-based network, and the end nodes (the controller and the I/O station) may well be unchanged from the days of the 10-Mbit network — or even older. We shall now proceed to look at the most important bottlenecks in our switched network.
21.3.1 Even Highways Have Queues

21.3.1.1 Queues in the Switches

The nondeterministic behavior of traditional switched Ethernet is caused by unpredictable traffic patterns. At times, packets from several input ports will be destined for the same output port and some of them must perforce be queued up waiting for the output port to be free. The reason is that at times there
will be a lot of low-priority traffic in the system (node status reports, node firmware updates, etc.). If we do not introduce some sort of traffic rules, such a situation will give an unpredictable buffering delay depending on the number of involved packets and their length. In the worst case, packets can be lost when the amount of traffic sent to an output port exceeds the bandwidth of this port for a period that is longer than the output buffer is able to handle. An automation network will have such coexisting real-time traffic (raw data, time sync data, commands, etc.) and noncritical data (Transmission Control Protocol (TCP)/Internet Protocol (IP) — file transfer, etc.). A maximum-size packet (1518 bytes) represents an extra delay of 122 µs for any following queued packets in the case of a 100-Mbit/s network.

21.3.1.2 Queues on the Drop Links

Even if we manage to introduce some sort of priority mechanism in our system, we have one queue mechanism left. Since some standards (International Electrotechnical Commission (IEC) 61850-9 springs to mind) propose to transfer real-time data by using a publish-and-subscribe approach (which is again based on the use of multicast groups), a standard unmanaged Ethernet switch has to route these data packets onto every drop link in the system, adhering to the broadcast paradigm. Since all those real-time packets might have the same priority, they will be put in the same switch queue on all output ports. The processing rate for these queues is equal to the bandwidth of the drop link between the switch and the nodes, and therefore each is effectively a drop link queue. This way, important data packets destined for one node may be delayed by multicast traffic destined for another node. Obviously, we need to introduce some traffic rules/mechanisms for multicast data packets as well in order to reduce the worst-case transfer time.

21.3.1.3 Queues in the Nodes

Introducing traffic rules in the Ethernet switches will improve the worst-case latency across the network (from the Ethernet controller in the source node to the Ethernet controller in the destination node). Inside the node, however, there is usually only one single network task, and just a single hardware queue associated with the Ethernet controller. Since a network packet spends a large percentage of its total end-to-end time inside a node, internal packet prioritization is needed in order to have maximal control over the total packet transfer time (80 to 90% of the end-to-end message latency is spent within the end nodes, at least when adhering to the Internet Engineering Task Force (IETF) protocols, User Datagram Protocol (UDP)/TCP/IP).

• For the transmit operation, one could typically have a situation where multiple maximum-size Ethernet packets, for example, fragments of an FTP transfer, are queued up at the Ethernet driver level. In a standard implementation, real-time packets will be added to the end of the queue. Such behavior will cause a nonpredictable delay in transmission.
• For the receive operation, the protocol stack implementation represents a possible bottleneck in the system. This is mainly due to the first-come, first-served queue at the protocol multiplexer level (the level where the network packets are routed to the appropriate protocol handler software).
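The 122 µs figure quoted above, and the nonpredictable transmit delay behind queued maximum-size frames, follow directly from the frame length and the link speed. A small illustrative helper (our own, not part of any standard API):

/* Transmission (blocking) time of one Ethernet frame on a link.
 * 1518 bytes at 100 Mbit/s gives roughly 121 us, i.e., the extra delay a
 * queued real-time packet suffers behind one maximum-size frame; k such
 * frames already queued add roughly k times that delay.
 * Preamble and inter-frame gap are ignored for simplicity. */
static double frame_tx_time_us(unsigned frame_bytes, double link_mbps)
{
    return (frame_bytes * 8.0) / link_mbps;   /* bits / (Mbit/s) = microseconds */
}

/* Example: frame_tx_time_us(1518, 100.0) is approximately 121.4 */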
21.3.2 Introducing a Standard for Priority and Delivery

21.3.2.1 High-Priority Packets Jump Ahead in Switch Queues

Institute of Electrical and Electronics Engineers (IEEE) 802.1D (see [3]) has been introduced to alleviate the switch queue problem; the standard specifies a layer 2 mechanism for giving mission-critical data preferential treatment over noncritical data. The concept has primarily been driven by the multimedia industry and is based on priority tagging of packets and implementation of multiple queues within the network elements in order to discriminate between packets [1]. For tagging purposes IEEE 802.1Q [4] defines an extra field for the Ethernet medium access control (MAC) header. This field is called the Tag Control Info (TCI) field and is inserted as indicated by Figure 21.2 (a sketch of how the field can be decoded is given after the figure). This field contains three priority bits; thus, the standard defines eight different levels of priority.
FIGURE 21.2 MAC header (layer 2) with tag: destination address, source address, tag type (0x8100), Tag Control Info (3-bit 802.1D priority field, 1-bit canonical flag, 12-bit 802.1Q VLAN identifier), tagged frame type interpretation (16 bit), and FCS.
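As an illustration of the tag layout in Figure 21.2, the following C sketch decodes the priority bits and the VLAN identifier from a tagged frame. The field offsets follow IEEE 802.1Q; the structure and function names are our own.

#include <stdint.h>
#include <stdbool.h>

#define ETH_TPID_8021Q  0x8100u  /* tag type value inserted after the source address */

struct vlan_tag {
    uint8_t  priority;   /* 3-bit 802.1D user priority (0..7) */
    uint8_t  cfi;        /* 1-bit canonical format indicator  */
    uint16_t vid;        /* 12-bit 802.1Q VLAN identifier     */
};

/* Decode the Tag Control Info field of a tagged Ethernet frame.
 * frame points to the start of the MAC header (destination address).
 * Returns false if the frame carries no 802.1Q tag. */
static bool parse_vlan_tag(const uint8_t *frame, struct vlan_tag *tag)
{
    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != ETH_TPID_8021Q)
        return false;                           /* untagged frame */
    uint16_t tci = (uint16_t)((frame[14] << 8) | frame[15]);
    tag->priority = (uint8_t)(tci >> 13);       /* top 3 bits */
    tag->cfi      = (uint8_t)((tci >> 12) & 0x1);
    tag->vid      = (uint16_t)(tci & 0x0fff);
    return true;
}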
21.3.2.2 Multicast Distribution Rules Reduce Drop Link Traffic

Figure 21.2 defines more than some priority bits. If you look closely, you can see something called a "12-bit 802.1Q VLAN identifier." This virtual local area network (VLAN) identifier is a mandatory part of each TCI field and can, when properly used, remove all unnecessary drop link traffic in an automation network based on publish-and-subscribe. We will discuss the handling and usage of this field in the section on VLANs.
21.3.3 High-Priority Packets Get High-Priority Treatment

A network data packet spends a relatively small percentage of its application-to-application transfer time on the physical network. The actual percentage varies with the speed of the network and the performance of the node, but for a 100 Mbit/s Fast Ethernet, the average percentage is between 0.1 and 20%. For this reason, implementing network priority will have little influence on the average application-to-application transfer time (it will, however, have a large influence on the worst-case transfer time). In order to improve the average application-to-application transfer time, the concept of priority must be extended to include the protocol layers in both the sending and receiving ends. In order to accomplish this, we must consider:

• Adjustable process priority for the protocol stack software
• Several instances of the protocol stack software running at different priority levels
• Multiple transmit queues at the Ethernet driver level

21.3.3.1 Matching Network Process Priority to Packet Priority

In most real-time operating systems, the protocol stack runs as a single thread in the context of some networking task. If we want the task priority to depend on the packet priority, we can implement it in a very simple way:

1. At compile time, decide on the task priority that should correspond to each packet priority and to an untagged packet. Put those priorities in a table.
2. Set a high basic networking task priority (the priority it uses when no packet is being processed).
3. When the networking task starts to process a packet, it should extract the packet priority and use it with the priority table described above to find the appropriate task priority.
4. Change the task priority by executing a system call.

21.3.3.2 Multiple Instances of the Protocol Stack Software

There is, however, a problem with the previous solution. Adjusting the task priority to correspond to the packet priority ensures that the processing of high-priority network packets does not get interrupted by less important administrative tasks. Still, the networking task processes incoming messages one at a time in a linear fashion. What we really want is to process high-priority messages before low-priority ones. In fact, if we could suspend the processing of low-priority network packets when a high-priority message arrives, we would have the perfect solution. At first glance, there exists an ideal solution:

1. At compile time, decide on the task priority that should correspond to each packet priority and to an untagged packet. Create one task for each priority and put the task IDs in a table.
2. When a packet arrives, extract the packet priority, use it with the task table described above, send the packet to that task, and send a signal to the task in order for it to start processing.

The problem with this solution is that it supposes that the network software is reentrant, a condition that is seldom fulfilled. Rewriting the stack software to make it reentrant is not hard, just tedious and time-consuming. Of course, it also means that you have to support the rewritten software in the future. Running multiple instances of the protocol software may be the most elegant and efficient solution, but be prepared to spend some time and resources on it.

21.3.3.3 Multiple Receive Queues and Adjustable Priority

If we do not want to spend time and money on making the network software reentrant, there is an alternative solution available. This solution does not suspend the processing of lower-priority packets, but selects the next packet to be processed from a set of criteria based on the packet priority. One possible implementation goes like this:

1. At compile time, decide on the task priority that should correspond to each packet priority and to an untagged packet. Create one queue for each priority and put a pointer to the queue in a table.
2. Whenever the network software is ready to process the next packet, it pulls all packets from the input queue and distributes them to the priority queues.
3. When the input queue is empty, the priority selection algorithm is run. This algorithm may be implemented in several different ways, for example:
   • Always pick the packet from the highest-priority nonempty queue.
   • Pick the packet from the highest-priority nonempty queue a certain number of times. Then move the first packet from each nonempty queue to the next higher-priority queue before picking the packet from the highest-priority nonempty queue again.
   • Introduce a LOW flag. Wait for a packet to appear in the highest-priority queue or for the LOW flag to be set. If the packet appeared in the highest-priority queue, set the LOW flag and process that packet. If the LOW flag was set, reset it and process the packet from the highest-priority nonempty queue.

21.3.3.4 Implementing Multiple Transmit Queues

On a transmit request, a standard Ethernet driver takes a buffer, does some housekeeping, and transfers it to the Ethernet controller hardware. Such a driver is simple and fair (first come, first served), but may be unsuitable for an efficient priority implementation. One or more large low-priority packets, once they are scheduled for transmission, will delay high-priority packets for the time it takes to transfer the low-priority ones. If we want high-priority packets to go to the head of the transmission queue, and the hardware does not support multiple queues, we must do some priority handling at the driver level. The simplest solution is to use two queues: the hardware queue and a software queue. Low-priority packets go into the software queue, and high-priority packets go straight into the hardware queue. From this point onward, there exist several algorithms addressing different needs (both of the following policies are sketched in code after this list).

• If the real-time requirements are moderate, move the first packet in the software queue to the hardware queue whenever the hardware queue is empty.
• If the real-time requirements are strict, move the first packet in the software queue to the hardware queue whenever a high-priority packet has been placed in the hardware queue.
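The following C sketch illustrates the two-queue transmit strategy and both of the policies listed above. The queue and controller primitives (hw_queue_put(), sw_queue_get(), and so on) are assumed placeholders for whatever the actual driver and RTOS provide, not real API calls.

struct packet;                                  /* opaque frame descriptor */

extern void          hw_queue_put(struct packet *p);  /* hand frame to the controller */
extern int           hw_queue_empty(void);
extern void          sw_queue_put(struct packet *p);  /* driver-internal FIFO         */
extern struct packet *sw_queue_get(void);              /* NULL if empty                */

enum tx_policy { TX_MODERATE, TX_STRICT };
static enum tx_policy policy = TX_STRICT;       /* pick one of the two bullets above */

void driver_transmit(struct packet *p, int high_priority)
{
    if (high_priority) {
        hw_queue_put(p);                        /* bypasses all queued low-priority frames */
        if (policy == TX_STRICT) {
            /* strict policy: release one low-priority frame only behind a
             * high-priority one */
            struct packet *next = sw_queue_get();
            if (next)
                hw_queue_put(next);
        }
    } else {
        sw_queue_put(p);                        /* low-priority frames wait in software */
    }
}

/* Called from the transmit-complete interrupt under the moderate policy:
 * refill the hardware queue from the software queue when it runs empty. */
void driver_tx_done(void)
{
    if (policy == TX_MODERATE && hw_queue_empty()) {
        struct packet *next = sw_queue_get();
        if (next)
            hw_queue_put(next);
    }
}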
21.3.4 Bottleneck Conclusions

The conclusion, based on a large set of real-world measurements (see [2]), is that:

1. At 100 Mbit/s, the Ethernet switch and the drop links do not constitute a bottleneck under FTP load.
2. If several switches are interconnected by standard drop link cables, these cables might represent bottlenecks. Most switches have provisions for Gigabit Ethernet (1000 Mbit/s) interconnections, however, thereby removing this possibility.
3. The main communication delays are inside the nodes. A suitable queue strategy for high-priority packets ensures that those will be processed before low-priority packets.
4. For a high-performance node processor, implementing prioritized transmit queues has a greater overall impact on high-priority packet delays than implementing prioritized receive queues.
5. If you want to implement internal receive priority queuing, ensure that it is possible to configure the chosen switch not to remove the priority tagging information.
21.4 Time Synchronization across Switched Ethernet

Now and then you come across measurement problems that are tightly associated with the notion of synchronicity, meaning that things need to happen simultaneously. The usual things that need such synchronicity are data sampling and motion control. In the case of data sampling, you need to know the value of two different quantities measured at the same time (within a narrow tolerance). If the measurement sources are close together, this is fairly easy to accomplish, but if they are far apart and connected to different measurement nodes, it suddenly gets harder. The usual choices are:

1. Use a special hardware signal on a separate cable between the controller and all nodes that need synchronization. If the nodes are far apart and the tolerances are tight, make sure that all cables that carry the synchronization signal have the same length.
2. Add a local clock to each node and use the present automation network to keep them in synchronization. Tell each node how often the measurement sources should be sampled and require the node to time-stamp each measurement.

We shall now take a look at the hardest synchronization requirements for automation purposes and discuss the possibility of implementing class T5 (1 µs) and class T3 (25 µs) synchronization in a multitraffic switched Ethernet environment. Common to both solutions is that they adhere to the same standardized time protocol. Such a step would significantly reduce the cabling and transceiver cost, since costly dedicated (separate) links are used for this purpose today.
21.4.1 The Concept of Time Stamping

Let us start at the very beginning — the concept of time stamping: Time stamping is the association of a data set with a time value. In this context, time may also include date. Why would anybody want to time-stamp anything? The closest example may be on your own PC — whenever you create a document and save it, the document is automatically assigned a date and time value. This value enables you to look for:

• Documents created on a certain date (for example, last Monday)
• Documents created within a certain time span (for example, the last half of 1998)
• The order in which a set of documents was created (for example, the e-mails in your inbox)

If we just look at the examples above, we see that the accuracy we need for the time stamping is about the same as that which we expect from our trusty old wristwatch. This means "within a couple of minutes," but as long as the clock does not stop, it does not really matter much how precise it is.

21.4.1.1 Let Us Synchronize Our Watches

Now we know about time stamping on our own PC. The next step is to connect the PC to a network, maybe even to the Internet, and start exchanging documents and e-mails. What happens if the clock in your PC (the clock that is used for time stamping) is wrong by a significant amount?

• If you have an e-mail correspondence with someone, a reply (which is time-stamped at the other end) might appear to be written before the question (which is time-stamped at your end).
• If you collaborate on some documents, getting the latest version might be problematic.
Therefore, when several PCs are connected together in any sort of network, the PC clocks are still accurate enough, but a new requirement is that they should be synchronized (show the same time at one point in time). Now, we could go around to each PC, look at our wristwatch, and set the PC clock to agree with it. The trouble is that this is a boring and time-consuming job and we should look for a better solution. One solution is to elect one PC to be the time reference, which means that every other PC should get the current time from it at least once a day and set its own clock to agree with that time. This solution works satisfactorily on a LAN, but all PC clocks will lag the time reference by the time it takes a clock value to travel from the time reference to the synchronizing PC. Except for very unusual cases, this lag is less than 1 s and thus good enough for office purposes. Enter the Internet. Suddenly the synchronization problem escalates, since two collaborating PCs may be located in different time zones (remember to compensate for that) and a synchronization message may take a long time to travel from one PC to the other. Fortunately, the Internet Network Time Protocol has a solution to both problems. This protocol involves sending a time-stamped time request message to a timeserver. This timeserver adds an arrival time stamp and a retransmit time stamp before returning the request message to the requesting PC. The requesting PC time-stamps the message when it returns and uses all the time stamps in calculating the correct time. This protocol and its little brother, the Simple Network Time Protocol (SNTP), are able to synchronize computers across the Internet with a precision in the low milliseconds.
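Calling the four time stamps t1 (request sent, client clock), t2 (request received, server clock), t3 (reply sent, server clock), and t4 (reply received, client clock), the client-side calculation defined by NTP (RFC 1305) can be written as the following small C helper; the microsecond time base is chosen here purely for illustration, since NTP itself uses a 64-bit fixed-point format.

#include <stdint.h>

typedef int64_t usec_t;

/* Estimated offset of the server clock relative to the client clock,
 * i.e., the amount to add to the client clock. */
static usec_t ntp_offset(usec_t t1, usec_t t2, usec_t t3, usec_t t4)
{
    return ((t2 - t1) + (t3 - t4)) / 2;
}

/* Round-trip delay through the network, with the server turnaround
 * time (t3 - t2) removed. */
static usec_t ntp_delay(usec_t t1, usec_t t2, usec_t t3, usec_t t4)
{
    return (t4 - t1) - (t3 - t2);
}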
21.4.2 Synchronization Requirements in Substation Automation

In the energy distribution world, a substation is an installation where the energy is combined, split, or transformed. A substation automation (SA) system refers to tasks that must be performed in order to control, monitor, and protect the primary equipment of such a substation and its associated feeders. In addition, the SA system has administrative duties such as configuration, communication management, and software management. The communication within SA systems is crucial, since the functionality demands very time-critical data exchange. These requirements are substantially harder than the corresponding requirements in general automation. This is also true for the required synchronization accuracy of the IEDs'* internal clocks in order to guarantee precise time stamping of current and voltage samples. Various SA protection functions require different levels of synchronization accuracy. IEC has provisionally defined five levels, IEC classes T1 to T5 (IEC 61850-5, Sections 12.6.6.1 and 12.6.6.2):

• IEC class T1: 1 ms
• IEC class T2: 0.1 ms
• IEC class T3: ±25 µs
• IEC class T4: ±4 µs
• IEC class T5: ±1 µs
Since these definitions and classes are not yet frozen, we will refer to them here as class T1, class T2, etc. (without IEC). At this point in time, the substation automation field is also on the verge of migrating toward the usage of switched Fast Ethernet as the automation network infrastructure. The ultimate vision is to achieve interoperability between products from different vendors on all levels within the substation automation field. Proof of this new trend is the upcoming IEC 61850 standard, Communication Networks and Systems in Substations, issued by the responsible IEC technical committee. Invention of de facto standard concepts and adoption of off-the-shelf technologies are the key instruments for reaching the interoperability goal. Figure 21.3 illustrates the communication structure of a future substation adhering to switched Ethernet as a common network concept holding multiple coexisting traffic types [5].
*IED: intelligent electronic device.
FIGURE 21.3 The communication network in a future substation automation system using Ethernet as a common network infrastructure.
We already know that switched Fast Ethernet has sufficient real-time characteristics to meet very demanding automation requirements. What is left to show is that it is possible to implement the various IEC classes of synchronization accuracy over such a network. This is considered to be the final obstacle to fully migrating to Ethernet in substation automation.

21.4.2.1 Time Synchronization Status

There is a plethora of proposed theory and methods for synchronizing clocks in distributed systems. The most prominent public domain synchronization method is the Network Time Protocol (NTP), which is standardized by the IETF in RFC 1305. A subset of NTP (the Simple Network Time Protocol) is also defined and is protocol compatible with NTP. The intended use of NTP is to synchronize computer clocks in the global Internet. For this purpose, it relies on sophisticated mechanisms to access national time, organize time synchronization subnets possibly implemented over various media, and adjust the local clock in each participating peer. SNTP, on the other hand, does not implement the full set of NTP algorithms and targets simpler synchronization purposes. Common to this body of work is that it does not present solutions for low-microsecond accuracy, instead targeting synchronization of LANs and wide area networks (WANs) in the general sense, where a precision of some milliseconds is sufficient.

Looking at the automation field in general and especially at the SA world, we find a diversity of proprietary and patented solutions for achieving highly accurate time synchronization over Ethernet. Interoperability concerns are, however, present.

21.4.2.2 Stating the Problem: Why Network Synchronization Is Difficult

The delays from the time stamping of a time synchronization message in the message source node until it is time-stamped in the message destination node are:

• Message preparation delay
• Communication stack traversal delay (transmission)
• Network access delay
• Network traversal delay
• Communication stack traversal delay (reception)
• Message handling delay

Variations in these delays are due to:

• Real-time operating system (RTOS) scheduling unpredictability
• Network access unpredictability
• Network traversal time variations
Time stamping at the lowest stack level helps eliminate the stack delay variations and RTOS scheduling unpredictability, but introduces some complications in the implementation.
21.4.3 How to Be Extremely Accurate: IEC Class T3

We have mentioned that the precision that may be achieved by traditional NTP/SNTP implementations is 1 ms at best. Basically, this stems from the time stamping of incoming and outgoing NTP/SNTP packets at the NTP/SNTP application layer. As stated above, this makes time stamping a victim of real-time OS scheduling unpredictability. In this section we describe how a high degree of accuracy can be achieved using a tuned SNTP implementation and standard Ethernet switches.

21.4.3.1 A Tuned SNTP Time Protocol Implementation

The NTP/SNTP time synchronization algorithm presented has definite limits to the level of attainable accuracy. Let us recapitulate: the NTP/SNTP algorithm is based on a network packet containing three time stamps:

• t1: The (client) time the packet was generated in the client asking for the current time
• t2: The (server) time the packet arrived at the timeserver
• t3: The (server) time the packet was updated and put into the transmission queue at the server

In addition, the calculations require:

• t4: The (client) time the packet arrived back at the client

Now, t2 and t4 can easily be determined down to the microsecond (and perhaps even better) using hardware or software time stamps based on packet arrival interrupts. The other two have definite problems, however. For full accuracy, t1 and t3 should be the times when the packet left the time client or timeserver, respectively. The problem is that this time stamp is not available until the packet has really left the timeserver or client, and then it is, of course, too late to incorporate it into the packet. Therefore, the time synchronization inaccuracy for an NTP/SNTP setup is the variation in the delay between t1 and the time the packet leaves the time client plus the variation in the delay between t3 and the time the packet leaves the timeserver.

It turns out that there are several ways of determining the time when the synchronization packet actually leaves the time client. One algorithm runs like this:

1. Create the NTP packet and fill in t1 (the originate time stamp) as usual. Transmit the packet to the timeserver.
2. Get hold of the time stamp when the packet leaves the time client, using any appropriate mechanism. Store this time stamp (t11) in a data structure together with t1.
3. When the packet returns from the timeserver, extract t2 from the packet and use t1 to look up the corresponding value of t11. Since t11 and t1 are stored together, there is no chance of confusion here.
4. Substitute t11 for t1 in the packet. (A minimal code sketch of this bookkeeping is given below.)

21.4.3.2 Achieving Precise Time Stamping within a Node

The nature of real-time operating systems (they guarantee a maximum response time for an event but allow for a wide variation below that) introduces a substantial variation in the time spent in the communication stacks. This fact has necessitated interrupt-level time stamping in both the time client and timeserver. The IEC class T3 solution described here adheres to the principle of interrupt-level time stamping of the SNTP request packet when sent from the time client and when received at the timeserver. Moreover, we propose that the synchronization is based on the (corrected) transmit time stamp set by the client (referred to as t1 in SNTP terminology) and the receive time stamp set by the server (referred to as t2).
Usage of a possible low-level transmit time stamping of the corresponding SNTP reply packet (referred to as t3) necessitates novel techniques for controlling the nondeterministic access of an Ethernet packet to an Ethernet bus.
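A minimal sketch of the t11 bookkeeping described in the four-step algorithm above, assuming a low-level hook that reports the actual departure time of each request. The table layout and function names are ours, and the offset formula shown is the simplified t1–t2-only variant discussed in the text.

#include <stdint.h>

typedef int64_t tstamp_t;

/* Pending requests: originate time stamp (t1) written into the packet
 * versus the actual departure time (t11) captured at interrupt level. */
#define MAX_PENDING 8
static struct { tstamp_t t1, t11; int used; } pending[MAX_PENDING];

/* Step 2: called from the low-level transmit hook once the request
 * has really left the time client. */
void sntp_note_departure(tstamp_t t1, tstamp_t t11)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].used) {
            pending[i].t1 = t1;
            pending[i].t11 = t11;
            pending[i].used = 1;
            return;
        }
    }
}

/* Steps 3-4: when the reply carrying (t1, t2) comes back, look up t11 and use
 * it in place of t1. With only t11 and t2 available, the clock offset becomes
 * t2 - t11 - d, where d is the manually calibrated one-way delay (drop-link
 * propagation plus minimum switch latency). */
tstamp_t sntp_offset_t1_t2_only(tstamp_t t1, tstamp_t t2, tstamp_t calibrated_delay)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].used && pending[i].t1 == t1) {
            pending[i].used = 0;
            return t2 - pending[i].t11 - calibrated_delay;
        }
    }
    return 0;   /* no matching request: discard the reply */
}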
FIGURE 21.4 The SNTP time client–server relation using low-level time stamping.
A side effect of using t1 – t2 only is that no mechanism for automatic calibration of the network latency will be available, and therefore a manual calibration of the propagation delays of the drop links and the minimum switch latency (usually given in the switch data sheet) must be performed. Figure 21.4 illustrates the setup of an SNTP time client and timeserver adhering to interrupt-level time stamping.

21.4.3.3 Time Client Implementation Issues

There are several ways of time-stamping a network packet. We shall look at three of them and show that only the first two are suitable for accurate time synchronization:

1. Hardware time stamping in the Ethernet controller.
2. Software time stamping in an Interrupt Service Routine (ISR) outside the RTOS. This ISR should be connected to the Ethernet Interrupt Request signal and have a top hardware priority.
3. Software time stamping in an ISR controlled by the RTOS (Ethernet driver). This ISR is connected to the Ethernet Interrupt Request signal with a normal hardware priority.

Using any of these low-level time-stamping methods is considered an implementation issue and will not cause any incompatibility between a low-level time-stamping client and a standard high-level time-stamping server. In addition to low-level time stamping, the time client must consider the following aspects:

• The interval between time updates
• The specifications of the local time-of-day clock with respect to resolution, accuracy/stability, and the availability of drift and offset correction mechanisms
• The usage of adaptive filtering and time-stamp validation methods in order to remove network delay variations

21.4.3.4 Timeserver Implementation Issues

In order to achieve class T3 accuracy, the timeserver should be able to time-stamp an incoming message with an accuracy of better than 2 µs independently of the network load. The exact time should be taken from a global positioning system (GPS) receiver, and the time parameters distributed from the timeserver should be based on the GPS time representation instead of absolute time (i.e., coordinated universal time (UTC) timing) in order to cope with the leap-second problem. It is also convenient if the timeserver supports full-duplex connectivity in order to avoid a situation where upstream data introduce extra switch latency in downstream data (i.e., time requests).

21.4.3.5 Ethernet Infrastructure Implementation Issues

Preferably, only one switch should be allowed between a time client and a timeserver. Having multiple switch levels will impose increased jitter (i.e., variations in the delay) through the infrastructure, which again might call for more complex filtering at the time client side. The Ethernet switch must also have good switch latency characteristics. The switch latency from the client drop link to the server drop link depends on several parameters:

• General switch load: This means all the network load on the switch except for the packets sent to the timeserver. The variations in the switch latency from the client drop link to the server drop link should be less than 2 µs.
• Timeserver load: This parameter means other packets sent to the timeserver that may introduce extra delay in the transmission of a given SNTP request packet. This delay can be handled at the time client side using various filtering techniques (see the time client requirements above); a minimal filter of this kind is sketched below.
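One simple realization of such filtering (assumed here as an illustration, not prescribed by the chapter) is to keep a short window of recent offset samples together with their observed delay metric and to trust only the minimum-delay sample, since a request that was queued behind other traffic to the timeserver shows up with a larger delay.

#include <stdint.h>

typedef int64_t usec_t;

#define WINDOW 8
static struct { usec_t offset; usec_t delay; } sample[WINDOW];
static int n_samples;

/* Record one synchronization exchange: the offset estimate plus the delay
 * metric used for validation (e.g., the measured round-trip time). */
void sync_add_sample(usec_t offset, usec_t delay)
{
    sample[n_samples % WINDOW].offset = offset;
    sample[n_samples % WINDOW].delay  = delay;
    n_samples++;
}

/* Return the offset of the minimum-delay sample in the window; samples that
 * were queued behind other traffic to the timeserver show a larger delay
 * and are thereby rejected. */
usec_t sync_filtered_offset(void)
{
    int count = n_samples < WINDOW ? n_samples : WINDOW;
    if (count == 0)
        return 0;                      /* no samples collected yet */
    int best = 0;
    for (int i = 1; i < count; i++)
        if (sample[i].delay < sample[best].delay)
            best = i;
    return sample[best].offset;
}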
21.4.4 Measurements on an Actual Network

Extensive tests and measurements regarding time synchronization accuracy on a switched Ethernet network have been undertaken [6]. The conclusions from this body of work are:

• Traffic not destined for the timeserver does not interfere with traffic to the timeserver.
• The switch latency for Ethernet packets to the timeserver depends to a great extent on the other traffic to the timeserver.

The conclusions from the full network measurements are:

• Software time stamping using a sufficiently high priority interrupt (preferably nonmaskable) is for all practical purposes indistinguishable from time stamping using special-purpose hardware.
• Software time stamping using an interrupt under RTOS control needs sophisticated filtering and statistical techniques before it can be used for time synchronization purposes. In that respect, this time-stamping method is not suitable for IEC class T3 synchronization.
• IEC class T3 time synchronization using tuned SNTP over a switched Ethernet has been shown to be eminently feasible.
21.4.5 Beyond the Speed of Light: Class T5

It is now possible to procure industrial-class Fast Ethernet switches fulfilling an extensive list of environment requirements relevant for substation automation applications. Some of these switches can even be delivered with an integrated SNTP timeserver. Since the internal switch logic has full control over input and output ports, time-stamping an SNTP request packet on arrival is no problem. In addition, the switch logic can insert the transmit time stamp whenever the output port is ready for the reply packet (and even adjust the time stamp for the delay between time stamping and actual transmission). Thus, the traditional problem related to the nondeterministic access to the Ethernet is not a problem here due to the tight interaction between the SNTP timeserver and the switch architecture. This time synchronization scheme provides the following:

• Timing synchronization accuracy better than 1 µs if time stamping in the time client is performed in hardware; see "Time Client Implementation Issues."
• Both server time stamps — t2 (receive) and t3 (transmit) — may be used at the time client for synchronization purposes, and the drop link propagation delay can easily be removed based on the calculated round-trip delay.
• The timing accuracy is independent of the network load.
• No clever filtering/erasure techniques are needed in the time client.
21.4.6 Summary and Conclusions

We have presented general solutions for achieving class T5 (1 µs) and class T3 (25 µs) time synchronization over switched Ethernet. The former is based on a dedicated Ethernet switch/timeserver
combination, while the latter relies on standard switches. Common for both solutions is that they adhere to the low-level time-stamp implementation of the Simple Network Time Protocol. Hardware time stamping or low-level software time stamping outside the real-time operating system eliminates client inaccuracy from the error budget of the SNTP time synchronization loop. If the SNTP timeserver relies on the same time-stamping techniques, the only remaining factor to be handled in the error budget is possible time delay variations within the infrastructure. In these settings, class T3 synchronization is possible over the switched Ethernet.
21.5 Introducing Virtual Subnetworks

What is a virtual local area network? Although the concept was made possible by Ethernet switches, the multitude of vendor-specific VLAN solutions and implementation strategies has made it very difficult indeed to define precisely what a VLAN is. Nevertheless, most people would agree that a VLAN might be roughly equated to a broadcast domain.* More specifically, VLANs can be seen as analogous to a group of end stations, perhaps on multiple physical LAN segments, which are not constrained by their physical location and can communicate as if they were on a common LAN. Currently, three types of VLANs are of interest:

• Port group VLANs
• Layer 2 multicast group-based VLANs
• Layer 3-based VLANs

Here we will only give a brief overview of these different types of VLANs. An important aspect to note in this context is the fact that it is now possible to define dynamic VLANs that correspond exactly to a multicast group. This means that multicast frames will only propagate to members of the indicated VLAN and not to anyone else. In particular, the frames will not occupy drop links or CPU resources in nodes that do not belong to the multicast group.

*Broadcast domain: A collection of all nodes that can be reached by a broadcast message from one of them.
21.5.1 Port Group VLANs
Port-based VLANs rely on purely manual configuration of Ethernet switches (bridges) to set up VLAN membership. Many initial VLAN implementations defined VLAN membership by groups of switch ports (for example, ports 1, 2, 3, 7, and 8 on a switch make up VLAN A, while ports 4, 5, and 6 make up VLAN B). Furthermore, in most initial implementations, VLANs could only be supported on a single switch. Second-generation implementations support VLANs that span multiple switches (for example, ports 1 and 2 of switch 1 and ports 4, 5, 6, and 7 of switch 2 make up VLAN A, while ports 3, 4, 5, 6, 7, and 8 of switch 1 combined with ports 1, 2, 3, and 8 of switch 2 make up VLAN B).
Port grouping is still the most common method of defining VLAN membership, and configuration is fairly straightforward. Defining VLANs purely by port group does not allow multiple VLANs to include the same physical segment (or switch port). The primary limitation, however, is that the network manager must reconfigure the VLAN membership whenever a node is moved from one port to another.
21.5.2 Group-Based VLANs: GARP VLAN Registration Protocol
In the 802.1D standard (see [3]), the Generic Attribute Registration Protocol (GARP) is introduced. This is a general protocol that allows network nodes to control selected switch properties. The first two
implementations of this protocol are the GARP Multicast Registration Protocol and the GARP VLAN Registration Protocol (GVRP). GVRP provides a mechanism that allows switches and end stations to dynamically register (and subsequently de-register) VLAN membership information with the Ethernet switches attached to the same LAN segment, and to have that information disseminated across all switches in the LAN that support extended filtering services. Moreover, no manual configuration of the switches is required, as opposed to the port-based solution; on the other hand, the switches must implement extra software as specified by IEEE 802.1D [3]. The operation of GVRP relies upon the services provided by GARP. The information registered, de-registered, and disseminated via GVRP takes the following forms:
1. VLAN membership information indicates that one or more GVRP participants that are members of a particular VLAN (or VLANs) exist; the Ethernet frames carry a 12-bit VLAN identifier (VID; see Figure 21.2) that states the membership. The act of registering/de-registering a VID affects the contents of dynamic VLAN registration entries, indicating the port(s) on which members of the VLAN(s) have been registered.
2. Registration of VLAN membership information allows the Ethernet switches in a LAN to be made aware that frames associated with a particular VID should only be forwarded in the direction of the registered members of that VLAN. In this way, the VLAN membership is propagated through the Ethernet infrastructure, and forwarding of frames associated with the VID therefore occurs only on ports on which such membership registration has been received.
GVRP is a very new protocol concept and is not yet widely supported by industrial Ethernet switches, but it is foreseen to be one of the future technologies for handling the publish-and-subscribe automation network philosophy. One such handling algorithm consists of three steps (a sketch of the resulting frame tagging follows at the end of this subsection):
1. Map every multicast group to one specific VLAN identifier. This is, of course, an offline activity.
2. At system start-up time, each node sends out one small packet for each VLAN (and thereby multicast group) it wants to join. This packet, which is not part of any complicated protocol stack, gives the network infrastructure the information it needs in order to map VLAN identifiers to physical drop links. For simple data source nodes, these packets can be precompiled.
3. Whenever a multicast packet is to be transmitted, it should be tagged with "real-time priority" and the VLAN identifier corresponding to the multicast group.
These simple steps ensure that multicast packets do not block any unnecessary drop links. Below we discuss possible solutions for taking control of the latency within the end nodes.
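To make step 3 of the algorithm concrete, the sketch below builds the standard IEEE 802.1Q tag [4] that would be inserted into a multicast frame. The multicast group names, the VID values, and the choice of priority 7 as the "real-time priority" are illustrative assumptions, not prescribed by any standard.

import struct

# Hypothetical static mapping from multicast group to VLAN identifier,
# corresponding to step 1 of the algorithm above (an offline activity).
MULTICAST_TO_VID = {"turbine_speed": 101, "breaker_status": 102}

def vlan_tag(pcp: int, vid: int, dei: int = 0) -> bytes:
    """Build the 4-byte IEEE 802.1Q tag: TPID 0x8100 followed by the 16-bit TCI.
    TCI layout: 3-bit priority code point, 1-bit DEI, 12-bit VLAN identifier."""
    assert 0 <= pcp <= 7 and 0 <= vid <= 0xFFF
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", 0x8100, tci)

# Step 3: tag a multicast frame with "real-time priority" (7 is an assumption here)
tag = vlan_tag(pcp=7, vid=MULTICAST_TO_VID["turbine_speed"])
print(tag.hex())  # 8100e065 -> TPID 0x8100, PCP=7, DEI=0, VID=0x065 (101)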
21.5.3 Layer 3-Based VLANs
VLANs based on layer 3 information take into account the protocol type (if multiple protocols are supported) or the network layer address (for example, the subnet address for TCP/IP networks) in determining VLAN membership. Although these VLANs are based on layer 3 information, this does not constitute a routing function and should not be confused with network layer routing. Even though a switch inspects a packet's IP address to determine VLAN membership, no route calculation is undertaken, and frames traversing the switch are usually bridged according to the switch's implementation of the Spanning Tree Algorithm (the purpose of this algorithm is to make sure that no network packet is caught in an endless loop between switches).
There are several advantages to defining VLANs at layer 3. First, it enables partitioning by protocol type. This may be an attractive option for network managers who are dedicated to a service- or application-based VLAN strategy. Second, users can physically move their workstations without having to reconfigure each workstation's network address, a benefit primarily for TCP/IP nodes. Third, defining
VLANs at layer 3 can eliminate the need for frame tagging in order to communicate VLAN membership between switches, reducing transport overhead. One of the disadvantages of defining VLANs at layer 3 (vs. MAC- or port-based VLANs) can be performance. Inspecting layer 3 addresses in packets is more time consuming than looking at MAC addresses in frames. For this reason, switches that use layer 3 information for VLAN definition are generally slower than those that use layer 2 information. It should be noted that this performance difference is true for most, but not all, vendor implementations.
References
[1] Ø. Holmeide and T. Skeie, VoIP drives realtime ethernet, in Industrial Ethernet Book, Vol. 5, GGH Marketing Communications, Titchfield, UK, 2001.
[2] T. Skeie, S. Johannessen, and Ø. Holmeide, The road to an end-to-end deterministic Ethernet, in Proceedings of the 4th IEEE International Workshop on Factory Communication Systems (WFCS), September 2002.
[3] IEEE 802.1D, Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Area Networks: Common Specifications: Part 3: Media Access Control (MAC) Bridges, 1998.
[4] IEEE 802.1Q, IEEE Standards for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks, 1998.
[5] T. Skeie, S. Johannessen, and C. Brunner, Ethernet in substation automation, IEEE Control Systems Magazine, 22: 43–51, 2002.
[6] T. Skeie, S. Johannessen, and Ø. Holmeide, Highly accurate time synchronization over switched Ethernet, in Proceedings of the 8th IEEE Conference on Emerging Technologies and Factory Automation (ETFA), Antibes Juan-les-Pins, France, 2001, pp. 195–204.
22
Wireless LAN Technology for the Factory Floor: Challenges and Approaches

Andreas Willig
University of Potsdam

22.1 Introduction
22.2 Wireless Industrial Communications and Wireless Fieldbus: Challenges and Problems
    System Aspects • Real-Time Transmission over Error-Prone Channels • Integration of Wired and Wireless Stations/Hybrid Systems • Mobility Support • Security Aspects and Coexistence
22.3 Wireless LAN Technology and Wave Propagation
    Wireless LANs • Wave Propagation Effects
22.4 Physical Layer: Transmission Problems and Solution Approaches
    Effects on Transmission • Wireless Transmission Techniques
22.5 Problems and Solution Approaches on the MAC and Link Layer
    Problems for Wireless MAC Protocols • Methods for Combating Channel Errors and Channel Variation
22.6 Wireless Fieldbus Systems: State of the Art
    CAN • FIP/WorldFIP • PROFIBUS • Other Fieldbus Technologies
22.7 Wireless Ethernet/IEEE 802.11
    Brief Description of IEEE 802.11 • Real-Time Transmission over IEEE 802.11
22.8 Summary
References
22.1 Introduction
Wireless communication systems have diffused into an ever-increasing number of application areas and achieved wide popularity. Wireless telephony and cellular systems are now an important part of our daily lives, and wireless local area network (WLAN) technologies are increasingly becoming the primary way to access business and personal data. Two important benefits of wireless technology are key to this success: the need for cabling is greatly reduced, and computers as well as users can be truly mobile. This saves
costs and enables new applications. In factory plants, wireless technology can be used in many interesting ways [24, Chap. 2]:
• Provision of communication services for distributed control applications involving mobile subsystems like autonomous transport vehicles, robots, or turntables
• Implementation of distributed control systems in potentially explosive areas or in the presence of aggressive chemicals
• Easing frequent plant reconfiguration, as fewer cables have to be remounted
• Mobile plant diagnosis systems and wireless stations for programming and on-site configuration
However, when adopting WLAN technologies for the factory floor, some problems occur. The first problem is the tension between the hard reliability and timing requirements (hard real time) of industrial applications on the one hand, and the time-variable and sometimes quite high error rates of wireless channels on the other hand. A second major source of problems is the desire to integrate wireless and wired stations into one single network (henceforth called a hybrid system or hybrid network). This integration calls for the design of interoperable protocols for the wired and wireless domains. Furthermore, using wireless technology introduces problems not anticipated in the original design of the (wired) fieldbus protocols: security problems, interference, mobility management, and so on.
In this chapter we survey some issues pertaining to the design and evaluation of protocols and architectures for (integrated) wireless industrial LANs and provide an overview of the state of the art. The emphasis is on aspects influencing the timing and reliability behavior of wireless transmission. However, we discuss not only the problems but also different solution approaches on the physical, medium access control (MAC), or data link layer. These layers are key to the success of wireless fieldbus systems because they have important responsibilities in fulfilling timing and reliability requirements, and furthermore, they are exposed most directly to the wireless link characteristics.
In the second part of this chapter, we focus on technologies and the creation of hybrid systems. On the one hand, there are a number of existing fieldbus standards like Controller Area Network (CAN), Factory Instrumentation Protocol (FIP)/WorldFIP, or PROFIBUS. For these systems we discuss problems and approaches to create hybrid systems. On the other hand, one could start from existing wireless technologies and ask about their capabilities with respect to timeliness and reliability. The most widely deployed WLAN technology is currently the IEEE 802.11 WLAN standard; its suitability for industrial applications is discussed.
This chapter is structured as follows: In Section 22.2 important general considerations and problems of wireless industrial communications and wireless fieldbus systems are presented. In Section 22.3, we discuss some basic aspects of wireless LAN technology and wireless wave propagation. The transmission impairments resulting from certain wave propagation effects and some physical layer approaches to deal with them are presented in Section 22.4. Wireless wave propagation also has some interesting consequences for the operation of the MAC and data link layer; these are discussed in Section 22.5. The following two sections take a more technology-oriented perspective. Specifically, in Section 22.6 we survey the state of the art regarding wireless industrial communication systems and wireless fieldbus systems.
In Section 22.7 we present the important aspects of the IEEE 802.11 WLAN standard with respect to transmission of real-time data. Finally, in Section 22.8 we provide a brief summary. The chapter is restricted to protocol-related aspects of wireless transmission; other aspects like signal processing, analog and digital circuitry, or energy aspects are not considered. There are many introductory and advanced books on wireless networking, for example, [1, 12, 37, 56, 61, 63, 65, 66]. Several separate topics in wireless communications are treated in [25]. Furthermore, this chapter is not intended to serve as an introduction to fieldbus technologies; some background information can be found in [14, 46].
22.2 Wireless Industrial Communications and Wireless Fieldbus: Challenges and Problems
In this section we survey some of the problem areas arising in wireless fieldbus systems.
22.2.1 System Aspects
First, wireless fieldbus systems will operate in environments similar to those of wired ones. Typically, a small to moderate number of stations are distributed over geographically small areas, with no more than 100 m between any pair of stations [33]. Wired fieldbus systems offer bit rates ranging from hundreds of kilobits to (tens of) megabits per second, and wireless fieldbus systems should offer comparable bit rates.
The wireless transceivers have to meet electromagnetic compatibility (EMC) requirements, meaning that they not only have to restrict their radiated power and frequencies, but also should be properly shielded from strong magnetic fields and electromagnetic noise emanating from strong motors, high-voltage electrical discharges, and so on. This may pose a serious problem when off-the-shelf wireless transceivers are used (for example, commercial IEEE 802.11 hardware), since these are typically designed for office environments and have no industrial-strength shielding.
Another problem is that many small fieldbus devices get their energy supply from the same wire as that used for data transmission. If the cabling is to be removed from these devices, there is not only the problem of wireless data transmission, but also the issue of wireless power transmission [31], which requires substantial effort. For battery-driven devices, the need to conserve energy arises. This has important consequences for the design of protocols [18, 27] but is not discussed further in this chapter.
22.2.2 Real-Time Transmission over Error-Prone Channels
In industrial applications hard real-time requirements often play a key role. In accordance with [58] we assume the following important characteristics of hard real-time communications: (1) safety-critical messages must be transmitted reliably within an application-dependent deadline; (2) there should be support for message priorities to distinguish between important and unimportant messages; (3) messages with stringent timing constraints typically have a small size; and (4) both periodic and aperiodic/asynchronous traffic is present. The qualifier hard stems from the fact that losses or deadline misses of safety-critical packets can cost lives or damage equipment. Both periodic and aperiodic messages in fieldbus systems can be subject to hard real-time constraints.
Wireless media tend to exhibit time-variable and sometimes high error rates, which makes it difficult to fulfill hard real-time requirements. As an example, the measurements presented in [82] have shown that in a certain industrial environment, for several seconds no packet gets through the channel. Therefore, seeking deterministic guarantees regarding timing and reliability is not appropriate. Instead, stochastic guarantees become important. An example formulation might be: the percentage of safety-critical messages that are transmitted reliably within a prespecified time bound should be at least 99.x%. Of course, the error behavior limits the application areas of wireless industrial LANs: when deterministic guarantees in the range of 10 to 100 ms are essential, wireless transmission is ruled out (at least at the current state of the art). However, if occasional emergency stop conditions due to message loss or missed deadlines are tolerable, wireless technologies can offer their potential. The goal is to reduce the frequency of losses and deadline misses.
How transmission reliability can be implemented depends on the communication model. In many fieldbus systems (for example, PROFIBUS) packets are transmitted from a sender to an explicitly addressed receiver station without involving other stations. Reliability can be ensured by several mechanisms, for example, retransmissions, packet duplications, or error-correcting codes. On the other hand, systems like FIP/WorldFIP [73] and CAN [35] implement the model of a real-time database where data are identified instead of stations. A piece of data has one producer and potentially many consumers. The producer broadcasts the data and all interested consumers copy the data packet into an internal buffer. This broadcast approach prohibits the use of acknowledgments and packet retransmissions, but error-correcting codes can still be used to increase transmission reliability. Often the data are transmitted periodically, and (repeated) packet losses can be detected by comparing the known period with the time of the last arrival of a data packet. This freshness information can be used by the application to react properly.
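As a numerical illustration of the stochastic guarantee formulation above, the following sketch computes the probability that a message is delivered before its deadline when a bounded number of transmission attempts fits into the deadline. The independence assumption and all numbers are illustrative; real wireless losses are bursty, so the result is optimistic.

def prob_delivered_in_time(p_loss: float, attempts: int) -> float:
    """Probability that at least one of `attempts` transmission attempts
    succeeds, each failing independently with probability `p_loss`
    (an idealization of the real, bursty error behavior)."""
    return 1.0 - p_loss ** attempts

# Hypothetical numbers: 10% loss per attempt, 2 ms per attempt, 10 ms deadline
attempts_before_deadline = 10 // 2          # 5 attempts fit into the deadline
print(prob_delivered_in_time(0.10, attempts_before_deadline))  # approx. 0.99999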
22.2.3 Integration of Wired and Wireless Stations/Hybrid Systems
There is a huge number of existing and productive fieldbus installations, and it is best if wireless stations can be integrated into them. A network with both wireless stations (stations with a wireless transceiver) and wired stations is called a hybrid system. The most important requirements for hybrid systems are:
• Transparency: There should be no need to modify the protocol stack of wired stations.
• Using specifically tailored protocols: Most fieldbus systems are specified on layers 1 (physical layer), 2 (medium access control and link layer), and 7 (application layer). The introduction of a wireless physical layer affects the behavior and performance of both the medium access control and link layer. The existing protocols for wired fieldbus systems are not designed for a wireless environment and should be replaced by protocols specifically tailored for the wireless link. However, this comes at the cost of protocol conversion between wired and wireless protocols.
• Portability of higher-layer software: If the link layer interface is the same for both the wireless and wired protocol stacks, implementations of higher-layer protocols and application software can be used in the same way on both types of stations.
The different approaches to integrating wireless stations into wired fieldbus LANs can be classified according to the layer of the Open Systems Interconnection (OSI) reference model where the integration actually happens [13, 81]. Almost all fieldbus systems are restricted to the physical, data link, and application layers [14]. The classification is as follows:
• Wireless cable replacement approach: All stations are wired stations and thus attached to a cable. A piece of cable can be replaced by a wireless link, and special bridgelike devices translate the framing rules used on the wireless and wired media, respectively. In this approach, no station is aware of the wireless link. A typical application scenario is the wireless interconnection of two fieldbus segments.
• Wireless MAC-unaware bridging approach: The network is composed of both wired and wireless stations, but integration happens solely at the physical layer. Again, a bridgelike device translates the framing rules between wired and wireless media. The wireless stations use merely an alternative physical layer (PHY), but the medium access control (MAC) and link layer protocols remain the same as for wired stations.
• Wireless MAC-aware bridging approach: The LAN is composed of both wired and wireless stations, and integration happens at the MAC and data link layer. There are two different MAC and link layer protocol stacks for wired and wireless stations, but both offer the same link layer interface. The wireless MAC and link layer protocols should be (1) specifically tailored to the wireless medium and (2) easily integrable with the wired MAC and link layer protocols. An intelligent bridgelike device is responsible for both translation of the different framing rules and interoperation of the different MAC protocols.
• Wireless gateway approach: In this approach integration happens at the application layer or even in the application itself. Entirely different protocols can be used on the different media types.
• Some mixture of these approaches.
Any of these approaches requires special coupling devices at the media boundaries. For the wireless cable replacement and the MAC-unaware bridging approaches, these devices can be simple.
The other approaches may require complex and stateful operations. Hence, the issues of failure and redundancy need to be addressed.
22.2.4 Mobility Support
The potential station mobility is one of the main attractions of wireless systems. We can assume that wireless fieldbus systems will be mostly infrastructure based (meaning that there are base stations or access points). A handover must be performed when a mobile station moves from the range of one access point into the
range of another access point. Typically, handover processes involve exchange of signaling packets between the mobile and access points. Ideally, a station can fulfill timing and reliability requirements even during a handover. The applicability and performance of handover schemes depend on the maximum speed of a mobile station. In industrial applications, it is typically forklifts, robots, or moving plant subsystems that are mobile, and it is safe to assume that these devices will have a maximum speed of 20 km/h [33]. A simple consequence of mobility is that stations may enter and leave a network at unforeseeable times. To support this, a protocol stack at a minimum must offer functionalities to make a new station known to the network/the other stations, and sometimes address assignment is also needed. On the other hand, fieldbus systems and their applications often are designed with the assumption that the network is set up once and not changed afterwards. Consequently, some fieldbus systems do not support any dynamics in their set of stations. Consider, for example, the FIP/WorldFIP fieldbus [73]. This system belongs to the class of real-time database systems, and the producers of data items (called process variables) are polled by a dedicated central station, the bus arbiter. The bus arbiter keeps a table of variable identifiers and traverses it cyclically. To include a new station into the system, the arbiter’s table has to be modified by a human operator. It is worth noting that the most widely used fieldbus systems do not offer any support for dynamic address assignment.
22.2.5 Security Aspects and Coexistence
Security played no important role in the initial design of the fieldbus standards. This was reasonable, because physical access to a wire is needed to eavesdrop or inject packets. However, the introduction of wireless media allows an attacker to eavesdrop on packets at some distance, for example, in the factory's parking lot. Even worse, an attacker could generate interference on the operating frequency of a wireless fieldbus system and distort all transmissions (including the time-critical and important ones). An attacker might also try to inject malicious packets into the network, for example, false valve commands. Therefore, security measures (integrity, authentication, authorization) have to be added to wireless fieldbus systems [64].
Noise and interference are not only generated purposely by some attackers, but can also come from co-located wireless systems working in the same frequency band. As an example, both IEEE 802.11 and Bluetooth use the 2.4-GHz ISM (industrial, scientific, and medical) band and create mutual interference. This coexistence problem is explored in [10].
22.3 Wireless LAN Technology and Wave Propagation
In this section we discuss some basic characteristics of WLAN technology and present some of the fundamental wave propagation effects. In Sections 22.4 and 22.5 we discuss physical layer and MAC/link layer approaches to overcome or at least relax some of the problems created by the propagation effects.
22.3.1 Wireless LANs
Wireless LANs are designed for packet-switched communications over short distances (up to a few hundred meters) and with moderate to high bit rates. As an example, the IEEE 802.11 WLAN standard offers bit rates between 1 and 54 Mb/s [54, 69]. Wireless LANs usually use either infrared or radio frequencies. In the latter case, license-free bands like the 2.4-GHz ISM band are particularly attractive, since the only restriction in using this band is a transmit power limit. On the other hand, since anyone can use these bands, several systems have to coexist. Radio waves below 6 GHz propagate through walls and can be reflected off several types of surfaces, depending on both frequency and material. Thus, with radio frequencies non-line-of-sight (NLOS) communication is possible. In contrast, systems based on infrared only allow for line-of-sight (LOS) communication over a short distance. An example is the IrDA (Infrared Data Association) system [79].
Wireless LANs can be roughly subdivided into ad hoc networks [71] and infrastructure-based networks. In the latter case some centralized facilities like access points or base stations are responsible for tasks
like radio resource management, forwarding data to distant stations, mobility management, and so on. In general, stations cannot communicate without the help of the infrastructure. In ad hoc networks there is no prescribed infrastructure and the stations have to organize network operation by themselves.
Infrastructure-based WLANs offer some advantages for industrial applications. Many industrial communication systems already have an asymmetric structure that can be naturally accommodated in infrastructure-based systems; the frequently used master–slave communication scheme is one example. Furthermore, the opportunity to offload certain protocol processing tasks to the infrastructure keeps mobile stations simple and allows efficient centralized decisions. Compared to other wireless technologies like cellular systems and cordless telephony, WLAN technologies seem to offer the best compromise between data rate, geographical coverage, and license-free/independent operation.
22.3.2 Wave Propagation Effects
In the wireless channel waves propagate through the air, which is an unguided medium. The wireless channel characteristics are significantly different from those of guided media, like cables and fibers, and create unique challenges for communication protocols. A transmitted waveform is subjected to phenomena like path loss, attenuation, reflection, diffraction, scattering, adjacent and co-channel interference, thermal or man-made noise, and imperfections in the transmitter and receiver circuitry [8, 61].
The path loss characterizes the loss in signal power when increasing the distance between a transmitter T and a receiver R. In general, the mean received power level E[P_Rx] can be represented as the product of the transmit power P_Tx and the mean path loss E[PL]:

    E[P_Rx] = P_Tx · E[PL]

A typical path loss model for omnidirectional antennas is given by [61, Chap. 4.9]:

    E[PL](d) = C · (d/d0)^(-n)

where d ≥ d0 is the distance between T and R, E[PL](d) is the mean path loss, d0 is a reference distance that depends on the antenna technology, C is a technology- and frequency-dependent scaling factor, and n is the so-called path loss exponent. Typical values for n are between 2 (free-space propagation) and 5 (shadowed urban cellular radio); see also [61, Chap. 4.9].
Reflection occurs when a waveform impinges on a smooth surface with structures significantly larger than the wavelength. Not all signal energy is reflected; some energy is absorbed by the material. The mechanism of diffraction allows a wave to propagate into a shadowed region, provided that some sharp edge exists. Scattering is produced when a wavefront hits a rough surface having structures smaller than the wavelength; it leads to a signal diffusion in many directions.
The most important types of interference are co-channel interference and adjacent channel interference. In co-channel interference a signal transmitted from T to R on channel c1 is distorted by a parallel transmission on the same channel. In the case of adjacent channel interference the interferer I transmits on an adjacent channel c2, but due to imperfect filters R captures frequency components from c2. Alternatively, an interferer I transmitting on channel c2 leaks some signal energy into channel c1 due to imperfect transmit circuitry (amplifier).
Noise can be thermal noise or man-made noise. Thermal noise is created in the channel or in the transceiver circuitry and can be found in almost any communications channel. Man-made noise in industrial environments can have several sources, for example, remote controls, motors, or microwave ovens.
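The following sketch evaluates the log-distance path loss model above in decibel form; the constant C, the reference distance, and the chosen path loss exponent are illustrative assumptions rather than values taken from [61].

import math

def mean_received_power_dbm(p_tx_dbm: float, c_db: float, d: float, d0: float, n: float) -> float:
    """Received power [dBm] = transmit power [dBm] + C [dB] - 10*n*log10(d/d0),
    i.e., the multiplicative model above expressed in decibels. C (in dB) lumps
    antenna- and frequency-dependent factors."""
    return p_tx_dbm + c_db - 10.0 * n * math.log10(d / d0)

# Assumed values: 100 mW transmitter (20 dBm), C = -40 dB at d0 = 1 m, n = 3
for d in (1, 10, 50, 100):
    print(d, "m ->", round(mean_received_power_dbm(20.0, -40.0, d, 1.0, 3.0), 1), "dBm")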
22.4 Physical Layer: Transmission Problems and Solution Approaches
The previously discussed wave propagation effects can lead to channel errors. In general, their impact depends on a multitude of factors, including frequency, modulation scheme, and the current propagation environment. The propagation environment is characterized by the distance between stations, interferers, the number of different paths and their respective losses, and more. These factors can change when a station or parts of the environment move. Consequently, the transmission quality is time variable.
22.4.1 Effects on Transmission
The notion of slow fading refers to significant variations in the mean path loss, as they occur due to significant changes in distance between transmitter T and receiver R or by moving beyond large obstacles. Slow fading phenomena usually occur on longer timescales; they often coincide with human activity like mobility. For short durations in the range of a few seconds, the channel can often be assumed to have constant path loss.
An immediate result of reflection, diffraction, and scattering is that multiple copies of a signal may travel on different paths from T to R. Since these paths usually have different lengths, the copies arrive at different times (delay spread) and with different phase shifts at the receiver and overlap. This has two consequences:
• The overlapping signals can interfere constructively or destructively. Destructive interference may lead to up to a 40-dB loss of received power. Such a situation is often called a deep fade.
• The delay spread leads to intersymbol interference, since signals belonging to neighboring information symbols overlap at the receiver.
If the stations move relative to each other or to the environment, the number of paths and their phase shifts vary in time. This results in a fast fluctuating signal strength at the receiver (called fast fading or multipath fading). It is important to note that these fluctuations are much faster than those caused by slow fading. Fast fading happens on the scale of milliseconds, whereas slow fading happens at scales of seconds or minutes. On the timescale of milliseconds, the mean signal strength is constant. If the delay spread is small relative to the duration of a channel symbol, the channel is called non-frequency selective or flat; otherwise, it is called frequency selective.
These problems translate into bit errors or packet losses. Packet losses occur when the receiver fails to acquire bit synchronization [82]. In the case of bit errors synchronization is successfully acquired, but a number of channel symbols are decoded incorrectly. The bit error rate can, for example, be reduced by using forward error correction (FEC) techniques [11, 45]. The statistical properties of bit errors and packet losses were investigated in a number of studies [16, 52, 82]. While the results are not immediately comparable, certain trends show up in almost every study:
• Both bit errors and packet losses are bursty; they occur in clusters with error-free periods between the clusters. The distributions of the cluster lengths and the lengths of error-free periods often have a large coefficient of variation or even seem to be heavy tailed.
• The bit error rates depend on the modulation scheme; typically schemes with higher bit rates/symbol rates exhibit higher error rates.
• The wireless channel is much worse than wired channels; bit error rates of 10^-3 to 10^-6 can often be observed. Furthermore, the bit error rate can vary over several orders of magnitude within minutes.
Some knowledge about error generation patterns and error statistics can be helpful in designing more robust protocols.
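The clustered loss behavior described above is often reproduced in simulations with a two-state Gilbert-Elliott model; the short sketch below generates such a bursty loss trace. The model choice and all parameter values are illustrative and are not taken from the cited measurement studies.

import random

def gilbert_elliott_losses(n_packets, p_gb=0.01, p_bg=0.1, loss_good=0.001, loss_bad=0.3, seed=1):
    """Generate a bursty packet-loss trace with the classical two-state
    Gilbert-Elliott model. p_gb / p_bg are the per-packet probabilities of
    switching from the good to the bad state and back; loss_good / loss_bad
    are the loss probabilities in each state (all values assumed)."""
    rng = random.Random(seed)
    state_bad = False
    trace = []
    for _ in range(n_packets):
        if state_bad:
            state_bad = rng.random() >= p_bg      # stay bad with probability 1 - p_bg
        else:
            state_bad = rng.random() < p_gb       # enter bad state with probability p_gb
        p_loss = loss_bad if state_bad else loss_good
        trace.append(rng.random() < p_loss)       # True means the packet is lost
    return trace

trace = gilbert_elliott_losses(10000)
print("loss rate:", sum(trace) / len(trace))      # losses appear in clusters, not uniformly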
22.4.2 Wireless Transmission Techniques
A number of different transmission techniques have been developed to combat the impairments of the wireless channel and to increase the reliability of data transmission.
Many types of WLANs (including IEEE 802.11) rely on spread-spectrum techniques [26], where a narrowband information signal is spread to a wideband signal at the transmitter and de-spread back to a narrowband signal at the receiver. By using a wideband signal, the effects of narrowband noise or narrowband interference are reduced. The two most important spread-spectrum techniques are direct-sequence spread spectrum (DSSS) and frequency-hopping spread spectrum (FHSS).
In DSSS systems an information bit is multiplied (XORed) with a finite bipolar chip sequence such that transmission takes place at the chip rate instead of the information bit rate. The chip rate is much higher than the information rate and consequently requires more bandwidth; accordingly, the duration of a chip is much smaller than the duration of a user symbol. The chip rate is chosen such that the average delay spread is larger than the chip duration; thus, the channel is frequency selective. Receivers can exploit this in different ways. To explain the first one, let us assume that the receiver receives a signal S from a line-of-sight path and a delayed signal S′ from another path, such that the delay difference (called lag) between S and S′ is more than the duration of a single chip. The chip sequences are designed such that the autocorrelation between the sequence and a shifted version of it is low for all lags of more than one chip duration. If a coherent matched-filter receiver is synchronized with the direct signal S, the delayed signal S′ appears as white noise and produces only a minor distortion. In the RAKE receiver approach, delayed signal copies are not treated as noise but as a useful source of information [65, Section 10.4]. Put briefly, a RAKE receiver tries to acquire the direct signal and the strongest time-delayed copies and combines them coherently. However, RAKE receivers are much more complex than simple matched-filter DSSS receivers.
In FHSS the available spectrum is divided into a number of subchannels. The transmitter hops through the subchannels according to a predetermined schedule, which is also known to the receiver. The advantage of this scheme is that a subchannel currently subject to transmission errors is used only for a short time before the transmitter hops to the next channel. The hopping frequency is an important parameter of FHSS systems, since high frequencies require fast and accurate synchronization. As an example, the FHSS version of IEEE 802.11 hops with 2.5 Hz and many packets can be transmitted before the next hop. In Bluetooth the hopping frequency is 1.6 kHz and at most one packet can be transmitted before the next hop. Packets are always transmitted without being interrupted by hopping.
Recently there has been considerable interest in orthogonal frequency-division multiplexing (OFDM) techniques [75]. OFDM is a multicarrier technique, where blocks of N different symbols are transmitted in parallel over a number of N subcarriers. Hence, a single symbol has an increased symbol duration N · t, compared to full-rate transmission with symbol duration t. The symbol duration N · t is usually much larger than the delay spread of the channel, thereby combating intersymbol interference and increasing channel quality.
IEEE 802.11a [54] as well as HIPERLAN/2 [20, 21] use an OFDM physical layer.
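The following sketch illustrates the DSSS principle described above: each data bit is mapped to a bipolar chip sequence and recovered at the receiver by correlation, so that a few corrupted chips do not flip the decoded bit. The chip sequence and the error pattern are purely illustrative.

# Illustrative bipolar chip sequence; the real IEEE 802.11 DSSS PHY uses an
# 11-chip Barker sequence, but any sequence with good autocorrelation works here.
CHIPS = [+1, -1, +1, +1, -1, +1, +1, +1, -1, -1, -1]

def spread(bits):
    """Map each data bit to the chip sequence (bit 1) or its inverse (bit 0)."""
    signal = []
    for b in bits:
        signal.extend(c if b else -c for c in CHIPS)
    return signal

def despread(signal):
    """Correlate the received chips against the known sequence, one symbol at a time."""
    n = len(CHIPS)
    bits = []
    for i in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[i:i + n], CHIPS))
        bits.append(1 if corr > 0 else 0)     # the sign of the correlation decides
    return bits

tx = spread([1, 0, 1, 1])
# Flip a few chips to mimic narrowband interference hitting part of the signal
rx = [-c if i in (3, 4, 14) else c for i, c in enumerate(tx)]
print(despread(rx))  # [1, 0, 1, 1] -- the spreading gain absorbs the chip errors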
22.5 Problems and Solution Approaches on the MAC and Link Layer
The MAC and the link layer are the layers most exposed to the error behavior of wireless channels and should do most of the work needed to improve the channel quality. Specifically for hard real-time communications, the MAC layer is a key component: if the delays on the MAC layer are not bounded, the upper layers cannot compensate for this. In general, the operation of the MAC protocol is largely influenced by the properties of the physical layer. Some of the unique problems of wireless media are discussed in this section. For a general discussion of MAC and link layer protocols, refer to [15, 28, 40, 68, 74].
22.5.1 Problems for Wireless MAC Protocols
Several problems arise due to path loss in conjunction with a threshold property: wireless receivers require the signal to have a minimum strength to be recognized. For a given transmit power, this requirement translates into an upper bound on the distance between two stations wishing to communicate; if the distance between two stations is larger, they cannot hear each other's transmissions. (To complicate things, wireless links are not necessarily bidirectional: it may well happen that station A can hear station B but not vice versa.) For MAC protocols based on carrier sensing (carrier-sense multiple access (CSMA)), this property creates the hidden-terminal [70] and exposed-terminal problems.
The hidden-terminal problem is sketched in Figure 22.1: consider three stations A, B, and C with transmission radii as indicated by the circles. Stations A and C are in the range of B, but A is not in the range of C and vice versa. If C starts to transmit to B, A cannot detect this by its carrier-sensing mechanism and considers the medium to be free. Consequently, A also starts packet transmission and a collision occurs at B.

FIGURE 22.1 Hidden-terminal scenario.

The exposed-terminal problem is also a result of false prediction of the channel state at the receiver. An example scenario is shown in Figure 22.2. The four stations A, B, C, and D are placed such that the pairs A/B, B/C, and C/D can hear each other; all other combinations cannot. Consider the situation where B transmits to A, and one short moment later C wants to transmit to D. Station C performs carrier sensing and senses the medium busy due to B's transmission. Consequently, C postpones its transmission. However, C could safely transmit its packet to D without disturbing B's transmission to A. This leads to a loss of efficiency.

FIGURE 22.2 Exposed-terminal scenario.

Two approaches to solve these problems are busy-tone solutions [70] and the RTS/CTS protocol (defined below). In the busy-tone solution two channels are assumed: a data channel and a separate control channel for the busy-tone signals. The receiver of a packet transmits a busy-tone signal on the control channel during packet reception. If a prospective transmitter wants to perform carrier sensing, it listens
on the control channel instead of the data channel. If the control channel is free, the transmitter can start to transmit its packet on the data channel. This protocol solves the exposed-terminal problem. The hidden-terminal scenario is also solved except in those rare cases where A and C start their transmissions simultaneously. However, if the busy tone is transmitted only when the receiver detects a valid packet header, the two colliding stations A and C can abort their transmissions quickly when they perceive the lack of a busy tone. The busy-tone solution requires two channels and two transceivers.
The RTS/CTS protocol attacks the hidden-terminal problem using only a single channel. Here we describe the variant used in the IEEE 802.11 WLAN (there are others). Consider the case that station A has a data packet for B. After A has obtained channel access, it sends a short request-to-send (RTS) packet to B. This packet includes the time duration needed to finish the whole packet exchange sequence, including the final acknowledgment. If B receives the RTS packet properly, it answers with a clear-to-send (CTS) packet, again including the time needed to finish the packet exchange sequence. Station A starts to transmit its data packet immediately after receiving the CTS packet. Any other station C, hearing the RTS or CTS packet, defers its transmissions for the indicated time, thereby not disturbing the ongoing packet transmission. It is a conservative choice to defer on any of the RTS or CTS packets, and in fact the exposed-terminal problem still exists. One solution could be to let C defer only on reception of a CTS frame, but to allow C a packet transmission if it hears an RTS without a corresponding CTS frame (clearly, if C receives a distorted CTS packet, it should defer). The RTS/CTS protocol described here does not prevent collisions of RTS packets, it has significant overhead, and it is still susceptible to subtle variants of the hidden-terminal problem [60].
A significant problem of wireless transceivers is their inability to transmit and receive simultaneously on the same frequency band. Hence, a fast collision detection procedure similar to the CSMA/CD protocol of Ethernet is impossible to implement. Instead, collision detection has to resort to other mechanisms like the busy-tone approach described above (rarely used) or the use of MAC layer acknowledgments (used frequently). Unfortunately, there are fieldbus systems relying on such a feature, for example, the Controller Area Network (CAN) fieldbus [35] with its priority arbitration protocol. In this class of protocols each message is tagged with a priority value, and this value is used to deterministically resolve collisions. In the CAN protocol, all stations are tightly time synchronized and the priority field is always at the start of a packet. All contending stations start packet transmission at the same time. Each contender transmits its priority field bit by bit and reads back the signal from the medium. If the medium state is the same as the transmitted bit, the station continues; otherwise, the station gives up and waits for the next contention cycle (a small simulation sketch of this bitwise arbitration is given at the end of this subsection). This protocol requires not only the ability to transmit and listen simultaneously on the same channel, but also a channel that produces meaningful values from overlapping signals. Alternative implementations are sketched in Section 22.6.1.
Even receiver-based collision detection may not work reliably due to the near–far effect: consider two stations A and B transmitting packets in parallel to a station C.
For simplicity, let us assume that both stations use the same transmit power. Station A is very close to C, whereas station B is far away but still in reach of C. Consequently, A’s signal at C is much stronger than B’s. In this case, it may happen that C successfully decodes a packet sent by A despite B’s parallel transmission. This situation is advantageous for the system throughput but disadvantageous for MAC protocols relying on collision detection or collision resolution.
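The sketch announced in the discussion of CAN above models the bitwise priority arbitration on an idealized wired-AND medium; the station names and identifier values are made up for illustration.

def arbitrate(priority_fields):
    """Bit-by-bit arbitration over a wired-AND medium, as in CAN: a dominant
    bit (0) overrides a recessive bit (1), so the numerically lowest identifier
    wins. `priority_fields` maps a station name to its list of bits (MSB first)."""
    contenders = set(priority_fields)
    n_bits = len(next(iter(priority_fields.values())))
    for i in range(n_bits):
        # The medium carries a dominant (0) level if any contender drives 0.
        bus = min(priority_fields[s][i] for s in contenders)
        # A station that sent recessive (1) but reads dominant (0) backs off.
        contenders = {s for s in contenders if priority_fields[s][i] == bus}
    return contenders  # surviving station(s); identical identifiers would still collide

stations = {
    "A": [0, 1, 1, 0, 1],   # identifier 0b01101 = 13
    "B": [0, 1, 0, 1, 1],   # identifier 0b01011 = 11 -> wins (lowest identifier)
    "C": [1, 0, 0, 0, 0],   # identifier 0b10000 = 16
}
print(arbitrate(stations))  # {'B'}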
22.5.2 Methods for Combating Channel Errors and Channel Variation
A challenging problem for real-time transmission is the error-prone and time-varying channel. There are many possible control knobs for improving the channel quality, for example, transmit power, bit rate/modulation, coding scheme/redundancy scheme, packet length, choice of retransmission scheme (automatic repeat request (ARQ)), postponing schemes and timing of (re)transmissions, diversity schemes [57], and adaptation as a meta-method [22]. In general, adaptation at the transmitter requires
feedback from the receiver. This feedback can be created by using immediate acknowledgment packets after each data packet.
The variations of transmit power and of the bit rate/modulation scheme are both equivalent to varying the energy per bit, which in turn influences the bit error rate [59, 61]. Roughly, higher transmit powers and slower bit rates/modulation schemes increase the probability of successful packet reception [32].
A common way to protect data bits against bit errors is to use redundancy. Example approaches are error-detecting and -correcting codes (also called forward error correction (FEC)) [44] and the transmission of multiple copies of a packet [2]. The latter approach can also be classified as a time-diversity scheme [57]. It is beneficial for the overall throughput to control the amount of redundancy according to the current channel state such that no or only a little redundancy is added when the channel currently shows only few errors [9, 16].
A second standard way to deal with transmission errors is retransmissions and suitable ARQ schemes. For channels with bursty errors, it is not clever to retransmit the same packet immediately on the same channel. Specifically, when the mean length of error bursts is of the same order as or larger than the packet length, both the original packet and its immediate retransmission are likely to be hit by the same error burst. Hence, under these circumstances an immediate retransmission wastes time and energy. The transmitter can postpone the retransmission for a while and possibly transmit packets to other stations or over other channels in the meantime. If the postponing delay is well chosen, the channel has left the error burst and the retransmission is successful. Indeed, it has been demonstrated in [4–6] that such an approach can reduce the number of wasted packets and increase the throughput significantly.
But how should one choose the postponing delay? One option is to adopt some fixed value, which could be based on measurements or on a priori knowledge about the channel. Another option is to occasionally send small probing packets [84] that the receiver has to acknowledge. If the transmitter captures such an acknowledgment, it assumes the channel to be back in a good state and continues data transmission. For real-time systems, the postponing decision should consider not only the channel state but also the deadline of a packet. The authors of [17] describe a scheme that takes both the estimated channel state (for postponing decisions) and the packet deadline into account to select one coding scheme from a suite of available schemes.
Retransmissions do not necessarily need to use the same channel as the original packet. It is well known that wireless channels are spatially diverse: a signal transmitted by station A can be in a deep fade at geographical position p1 and at the same time good enough to be properly received at another position p2. This property is exploited by certain diversity techniques, for example, receiver diversity [61]: the receiver has two antennas and can pick the stronger/better of the two signals it reads from its antennas. If the spacing between the antennas is large enough (at a minimum about 40% of the wavelength [61, Chap. 5], which amounts to 5 to 6 cm in the 2.4-GHz ISM band), the signals appear to be uncorrelated. The spatial diversity of wireless channels can also be exploited at the protocol level: assume that station A transmits a packet to station B. The channel from A to B is currently in a deep fade, but station C successfully captures A's packet.
If the channel from C to B is currently in a good state, the packet can be successfully transmitted over this channel. Therefore, station C helps A with its retransmission. This idea has been applied in [81] to the retransmission of data packets as well as to poll packets in a polling-based MAC protocol. In general, ARQ schemes can be integrated with forward error correction schemes into hybrid error control schemes [45]. Ideally, for industrial applications deadlines should be taken into account when designing these schemes. In [3, 72], retransmissions and FEC are combined with the concept of deadlines by increasing the coding strength with each retransmitted packet as the packet deadline approaches. This is called deadline-dependent coding. Another interesting hybrid error control technique is packet combining [30, 39, 78]. Put briefly, in these schemes the receiver tries to take advantage of the partially useful information contained in already received erroneous copies of a packet. For example, if the receiver has received at least three erroneous copies of a packet, it can try to figure out the original packet by
applying bit-by-bit majority voting. There are other packet-combining techniques, for example, equal-gain combining.
Sometimes the packet error probability (and therefore the need for retransmissions) can be reduced by proper tuning of packet sizes. Intuitively, it is clear that larger packets are more likely to be hit by errors than smaller ones. On the other hand, with smaller packets the fixed-size packet header becomes more dominant and leads to increased overhead. If the transmitter has estimates of the current channel conditions, it can choose the appropriate packet size, giving the desired trade-off between reliability and efficiency [47].
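The following sketch illustrates the packet-size trade-off just described: for a given bit error rate it compares candidate payload sizes by the expected fraction of useful data delivered. The header size, the candidate sizes, and the independent-bit-error assumption are illustrative simplifications.

def efficiency(payload_bytes, header_bytes, ber):
    """Expected useful throughput fraction for a given payload size: the share
    of the frame that is payload, times the probability that the whole frame
    (header + payload) arrives without any bit error. Independent bit errors
    are assumed, which is a simplification for wireless channels."""
    frame_bits = 8 * (header_bytes + payload_bytes)
    p_ok = (1.0 - ber) ** frame_bits
    return (payload_bytes / (header_bytes + payload_bytes)) * p_ok

# Hypothetical 24-byte header; compare candidate payload sizes at two error rates
for ber in (1e-5, 1e-3):
    best = max((32, 64, 128, 256, 512, 1024), key=lambda p: efficiency(p, 24, ber))
    print(f"BER={ber:g}: best payload {best} bytes, "
          f"efficiency {efficiency(best, 24, ber):.2f}")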
22.6 Wireless Fieldbus Systems: State of the Art
Fieldbus systems are designed to deliver hard real-time services under harsh environmental conditions. A wireless fieldbus [13] should be designed to provide as stringent stochastic timing and reliability guarantees as possible over wireless links. However, in most of the literature surveyed in this section this issue is not addressed. Nonetheless, we discuss existing approaches for different popular fieldbus systems.
22.6.1 CAN
As already described in Section 22.5.1, the CAN system [35] uses a priority arbitration protocol on the MAC layer, which cannot be implemented directly on a wireless link. Some approaches have been developed to circumvent this; here we discuss a centralized and two distributed solutions [42].
The distributed WMAC protocol uses a CSMA/CA (carrier-sense multiple access with collision avoidance) scheme with priority-dependent backoffs. A station wishing to transmit a packet uses a carrier-sense mechanism to wait for the end of an ongoing packet transmission. After this, the station picks a backoff time depending on the priority value of the current packet. The station listens on the channel during the backoff time. If no other station starts transmission, the station assumes that it has the highest priority and starts transmitting its own packet. Otherwise, the station defers and starts over after the other packet has been finished.
In another distributed scheme the CAN message priority value is mapped onto the channel using an on–off keying scheme [41]: a station transmits a short burst if the current priority bit is a logical one; otherwise, it switches into receive mode. If the station receives any signal, it gives up; otherwise, it continues with the next bit. The priority bits are considered from the most significant bit to the least significant bit. If the station is still contending after the last bit, it transmits the actual data packet. This approach requires tight synchronization and fast switching between the transmit and receive modes of the radio transceiver, which is a problem for certain WLAN technologies.
The centralized RFMAC protocol leverages the fact that CAN belongs to the class of systems using the real-time database communication model. Data items are identified by unique identifiers. Similar to FIP/WorldFIP, all communication is controlled by a central station broadcasting the variable identifiers and causing the producers of the corresponding data items to transmit the data.
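The toy model below captures the idea behind the priority-dependent backoff of the WMAC scheme described above; the mapping of one backoff slot per priority level is an assumption made purely for illustration.

def wmac_round(pending):
    """Illustrative slot-based model of the WMAC idea: after the medium becomes
    idle, every station waits a number of backoff slots derived from its packet's
    priority (here simply `priority` slots) while listening; whoever hears a
    transmission before its own backoff expires defers to the next cycle."""
    backoffs = {station: priority for station, priority in pending.items()}
    first = min(backoffs.values())                 # lower value = higher priority
    winners = [s for s, b in backoffs.items() if b == first]
    deferred = [s for s, b in backoffs.items() if b > first]
    return winners, deferred

winners, deferred = wmac_round({"A": 3, "B": 1, "C": 5})
print("transmits:", winners, "defers:", deferred)  # transmits: ['B'] defers: ['A', 'C']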
22.6.2 FIP/WorldFIP
The FIP/WorldFIP fieldbus uses a polling table to implement a real-time database [73]. To couple wired and wireless stations, in [49] a wireless-to-wired gateway is introduced, serving as a central station for the wireless part. The wireless MAC protocol uses time-division multiple access (TDMA), and each TDMA slot is used to transmit one data item (also called process variable). In the OLCHFA project, a prototype system integrating wired and wireless FIP stations has been developed. This system works in the 2.4-GHz ISM band using a DSSS physical layer [36]. The available publications put emphasis on the management of configuration data and on distributed algorithms for clock synchronization. The MAC and data link protocols of Factory Instrumentation Protocol were not modified. Since FIP broadcasts the values of process variables periodically, the protocol contains no retransmission scheme for the time-critical data. Instead, the OLCHFA approach is to enhance the FIP
process variable model with so-called time-critical variables, which provide freshness information to the applications. Applications can use this to handle cases of repeated losses.
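A minimal sketch of this freshness idea for time-critical variables is given below; the class name, the staleness threshold of two missed periods, and the example values are assumptions and are not part of the WorldFIP specification.

import time

class TimeCriticalVariable:
    """A consumer-side view of a periodically broadcast process variable that
    flags its value as stale when more than `max_missed` expected updates have
    been missed, so the application can react to repeated losses."""
    def __init__(self, period_s, max_missed=2):
        self.period_s = period_s
        self.max_missed = max_missed
        self.value = None
        self.last_update = None

    def update(self, value, now=None):
        self.value = value
        self.last_update = time.monotonic() if now is None else now

    def is_fresh(self, now=None):
        if self.last_update is None:
            return False
        now = time.monotonic() if now is None else now
        return (now - self.last_update) <= self.max_missed * self.period_s

var = TimeCriticalVariable(period_s=0.010)   # nominally refreshed every 10 ms
var.update(42.0, now=0.0)
print(var.is_fresh(now=0.015))               # True  -- at most one update missed
print(var.is_fresh(now=0.050))               # False -- the application should react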
22.6.3 PROFIBUS
The R-FIELDBUS project (www.rfieldbus.de) evaluated how IEEE 802.11 with DSSS can be used in a PROFIBUS fieldbus system and how such a system can be used for transmission of IP-based multimedia data [33, 62]. Two different architectures have been proposed: the single logical ring and the multiple logical ring solutions, discussed below. Both solutions run the (almost) unmodified PROFIBUS protocol. The PROFIBUS protocol uses token passing on top of a broadcast medium. The token is passed between active stations along a logical ring, and much of the protocol's complexity deals with ring maintenance. The token itself is a small control frame.
In the single logical ring solution all wired and wireless stations are integrated into a single logical token-passing ring. The coupling devices between the wired and wireless domains simply forward all packets. This approach is easy to realize but subjects both data packets and control packets like the token frame to the errors on wireless links. It is shown in [83] for PROFIBUS and in [38] for the similar IEEE 802.4 Token Bus protocol that repeated losses of token frames can create severe problems with the achievable real-time performance. Since there is only a single logical ring, the whole network is affected.
In contrast, in the multiple logical ring solution [23] wireless and wired stations are separated into several logical rings. These rings are coupled by intelligent devices called brouters (a merger of bridge and router). In this solution, transmission problems distort only one ring; the other logical rings remain operational. A second benefit of having multiple rings is traffic segmentation: if the segments are chosen carefully, most of the traffic will be intrasegment, and thus the overall traffic capacity can be increased. A drawback of the multiple logical ring solution, however, is that intersegment traffic is not natively supported by the PROFIBUS protocol and extensions are required.
In [80, 81] a system following the wireless MAC-aware bridging approach is proposed. On the wireless side, specifically tailored polling-based protocols are used, whereas wired stations run the unmodified PROFIBUS protocol stack. The goal is to avoid token passing on wireless segments. It is shown that for bursty channel errors the polling-based protocols achieve substantially better performance in terms of stochastic hard real-time behavior than the PROFIBUS token-passing protocol; for certain kinds of channels the 99% quantile of the delay needed to successfully transmit a high-priority packet is up to an order of magnitude smaller than for the PROFIBUS protocol. To integrate both protocols, the coupling device between wired and wireless media provides a virtual ring extension [80]. In this scheme the coupling device acts on the wired side on behalf of the wireless stations. For example, it creates token frames and executes the ring maintenance mechanisms.
Finally, in [43] a scheme for the integration of wireless nodes into a PROFIBUS-DP network (single master, many slaves, no token passing) is described. An application layer gateway is integrated with a virtual master station. The virtual master acts as a proxy for the wireless stations; it polls them using standard IP and IEEE 802.11 distributed coordination function (DCF) protocols.
22.6.4 Other Fieldbus Technologies

For the International Electrotechnical Commission (IEC) fieldbus [34], which uses a centralized, polling-based access protocol for periodic data and a token-passing protocol for asynchronous data, an architecture is proposed in reference [7] that allows coupling of several fieldbus segments using a wireless backbone based on IEEE 802.11 with the point coordination function (PCF). In [50], it is investigated how the Manufacturing Automation Protocol (MAP)/Manufacturing Message Specification (MMS) application layer protocol can be enhanced with mobility. In the proposed system the IEEE 802.11 WLAN with DCF is used; time-critical transmissions and channel errors are not considered. In [48], the same question was investigated with the digital European cordless telephone (DECT) as the underlying technology.
22.7 Wireless Ethernet/IEEE 802.11

Instead of developing WLAN technology for the factory floor from scratch, existing technologies might serve as a starting point. A good candidate is the IEEE 802.11 WLAN standard [53, 54, 69], since it is the most widely used WLAN technology. Some alternative systems are HIPERLAN [19, 20], Bluetooth [29], and HomeRF [51].
22.7.1 Brief Description of IEEE 802.11

IEEE 802.11 belongs to the IEEE 802.x family of LAN standards. The standard describes architecture, services, and protocols for an Ethernet-like wireless LAN, using a CSMA/CA-based MAC protocol with enhancements for time-bounded services. The protocols run on top of several PHYs: an FHSS PHY, a DSSS PHY offering 1 and 2 Mb/s [69], 5.5 and 11 Mb/s extensions of the DSSS PHY [53], and an OFDM PHY with 54 Mb/s [54].

The standard describes an ad hoc mode and an infrastructure-based mode. In the infrastructure mode all communications are relayed through fixed access points (APs). An access point and the mobile stations associated with it constitute a service set, and mobile stations have to associate with the closest access point. The access points are connected by a distribution system that allows the forwarding of data packets between mobile stations in different cells. In the ad hoc mode, there are neither access points nor a distribution system; stations communicate in a peer-to-peer fashion. A detailed description of IEEE 802.11 can be found in [55].

The basic MAC protocol of 802.11 is called the distributed coordination function (DCF). It is a CSMA/CA protocol using the RTS/CTS scheme described in Section 22.5.1 and different interframe gaps to give control frames (for example, acknowledgments, CTS frames) priority over data frames. However, data frames cannot be differentiated according to priorities. The IEEE 802.11 MAC provides a connectionless and semireliable best-effort service to its users by performing a bounded number of retransmissions. The user cannot specify any quality-of-service requirements for packets; the only choice is between contention-based and contentionless transmission (see below). Furthermore, it is not possible to specify attributes like transmit power, modulation scheme, or the number of retransmissions on a per-packet basis. Such control would be desirable for treating different packet types differently. As an example, one could transmit high-priority packets with high transmit power and low bit rate to increase their reliability.

The enhancement for time-bounded services is called the point coordination function (PCF) and works only in the infrastructure mode. The PCF defines a superframe structure with variable-length superframes of bounded maximum length. A superframe consists of a superframe header followed by a contention-free period (CFP) and a contention period (CP), both of variable length. During the CP all stations operate in the DCF mode, including the access points. To initiate the start of the CFP, the AP (also called the point coordinator (PC)) has to acquire the wireless medium before it can transmit its beacon packet. Therefore, beacon transmissions and contention-free periods are not strictly periodic, and isochronous services are not supported. The beacon indicates the length of the contention-free period, and all stations receiving the beacon are forbidden to initiate transmissions during this time. Instead, they wait to be polled by the point coordinator. If this happens, they can use the medium exclusively for the transmission of a single packet. After the contention-free period ends, the stations return to their usual DCF behavior and can initiate transmissions at will. The AP has a poll list of station addresses. The polling scheme itself is not fully specified. A station that desires to be polled has to signal this to the AP during the association process. The poll list membership ends upon disassociation or when the station reassociates itself without requesting contention-free service.
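The DCF contention behavior described above can be illustrated with a small, simplified sketch; it only shows the growth of the backoff window over retransmission attempts and omits carrier sensing, counter freezing, and the interframe spaces. The numerical values are the DSSS parameters (contention window starting at 31 slots and capped at 1023, 20-µs slots); the helper function is illustrative only.

```python
import random

# Simplified sketch of the DCF backoff rule: after each failed attempt the
# contention window roughly doubles, up to a maximum, and the station waits a
# uniformly drawn number of idle slots. DSSS values are used (CW from 31 up to
# 1023, 20-us slots); carrier sensing and counter freezing are not modeled.

CW_MIN, CW_MAX, SLOT_US = 31, 1023, 20

def backoff_us(retry):
    cw = min((CW_MIN + 1) * 2 ** retry - 1, CW_MAX)
    return random.randint(0, cw) * SLOT_US

for retry in range(5):
    cw = min((CW_MIN + 1) * 2 ** retry - 1, CW_MAX)
    print(f"retry {retry}: backoff up to {cw} slots, drew {backoff_us(retry)} us")
```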
22.7.2 Real-Time Transmission over IEEE 802.11

The PCF is designed to provide time-bounded services. Many studies [67, 76, 77] confirm that indeed packets transmitted during the CFP receive substantially smaller delays than those transmitted during
the CP, but at the cost of substantial overhead: in [77] the authors show a scenario where eight voice calls, each having a data rate of 8 kbit/s, require approximately 50% of the bandwidth of a 1-Mbit/s transmission medium (without channel errors). When transmission has to be both timely and reliable despite channel errors, retransmissions are needed. When a time-critical packet transmitted during the CFP fails, the station can try the retransmission during the following CP or during the next CFP one superframe later (except in the case where multiple entries in the polling list are allocated to the mobile, and thus it receives multiple polls during the same CFP). Hence, retransmissions of important packets receive no priority in 802.11.
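A back-of-the-envelope reading of the numbers quoted from [77] shows how little of the consumed capacity is actual voice payload; the sketch below merely restates that arithmetic (only the 8-kbit/s and 50%-of-1-Mbit/s figures are taken from the scenario above):

```python
# Payload share in the PCF voice scenario of [77]: eight 8-kbit/s calls
# occupying about half of a 1-Mbit/s channel.

voice_rate = 8 * 8_000            # 64 kbit/s of user data in total
consumed = 0.5 * 1_000_000        # ~500 kbit/s of channel capacity used

print(f"payload share: {voice_rate / consumed:.1%}")  # ~12.8%
```

Roughly seven-eighths of the consumed capacity is thus spent on polling, PHY and MAC headers, and acknowledgments rather than on voice data.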
22.8 Summary

This chapter presented some problems and solution approaches to bringing WLAN technology to the factory floor and to benefiting from reduced cabling and mobility. The basic problem is the tension between the hard timing and reliability requirements of industrial applications, on the one hand, and the serious error rates and time-varying error behavior of wireless channels, on the other hand. Many techniques have been developed to improve the reliability and timeliness behavior of lower-layer wireless protocols, but up to now, wireless fieldbus systems have not been deployed on a large scale, as the problem of reliable transmission despite channel errors is not solved satisfactorily. It is not clear which combination of mechanisms and technologies has the potential to bound the number of deadline misses under realistic channel conditions. It seems to be an open question whether just more engineering is needed to make wireless transmission suitable for fulfilling hard real-time and reliability requirements, or whether there is really a limit to what can be achieved.

Fortunately, wireless communications and WLAN technology form a very active field of research and development. New technologies are created and existing technologies are enhanced. As an example, the IEEE 802.11g and IEEE 802.11e working groups are working on delivering higher bit rates and better quality of service to users. It will be exciting to see how industrial applications can benefit from this.
References [1] Lars Ahlin and Jens Zander. Principles of Wireless Communications. Studentlitteratur, Lund, Sweden, 1998. [2] A. Annamalai and Vijay K. Bhargava. Analysis and optimization of adaptive multicopy transmission ARQ protocols for time-varying channels. IEEE Transactions on Communications, 46:1356–1368, 1998. [3] Henrik Bengtsson, Elisabeth Uhlemann, and Per-Arne Wiberg. Protocol for wireless real-time systems. In Proceedings of the 11th Euromicro Conference on Real-Time Systems, York, England, 1999. [4] Pravin Bhagwat, Partha Bhattacharya, Arvind Krishna, and Satish K. Tripathi. Using channel state dependent packet scheduling to improve TCP throughput over wireless LANs. Wireless Networks, 3:91–102, 1997. [5] Richard Cam and Cyril Leung. Multiplexed ARQ for time-varying channels. Part I. System model and throughput analysis. IEEE Transactions on Communications, 46:41–51, 1998. [6] Richard Cam and Cyril Leung. Multiplexed ARQ for time-varying channels. Part II. Postponed retransmission modification and numerical results. IEEE Transactions on Communications, 46:314–326, 1998. [7] S. Cavalieri and D. Panno. On the integration of fieldbus traffic within IEEE 802.11 wireless LAN. In Proceedings of the 1997 IEEE International Workshop on Factory Communication Systems (WFCS ’97), Barcelona, Spain, 1997. [8] James K. Cavers. Mobile Channel Characteristics. Kluwer Academic Publishers, Boston, 2000. [9] R. Chen, K.C. Chua, B.T. Tan, and C.S. Ng. Adaptive error coding using channel prediction. Wireless Networks, 5:23–32, 1999.
[10] Carla-Fabiana Chiasserini and Ramesh R. Rao. Coexistence mechanisms for interference mitigation in the 2.4-GHz ISM band. IEEE Transactions on Wireless Communications, 2:964–975, 2003. [11] Daniel J. Costello, Joachim Hagenauer, Hideki Imai, and Stephen B. Wicker. Applications of errorcontrol coding. IEEE Transactions on Information Theory, 44:2531–2560, 1998. [12] Klaus David and Thorsten Benkner. Digitale Mobilfunksysteme. Informationstechnik. B.G. Teubner, Stuttgart, 1996. [13] Jean-Dominique Decotignie. Wireless fieldbusses: a survey of issues and solutions. In Proceedings of the 15th IFAC World Congress on Automatic Control (IFAC 2002), Barcelona, Spain, 2002. [14] Jean-Dominique Decotignie and Patrick Pleineveaux. A survey on industrial communication networks. Annales des Telecommunications, 48:435ff, 1993. [15] Lou Dellaverson and Wendy Dellaverson. Distributed channel access on wireless ATM links. IEEE Communications Magazine, 35:110–113, 1997. [16] David A. Eckhardt and Peter Steenkiste. A trace-based evaluation of adaptive error correction for a wireless local area network. MONET: Mobile Networks and Applications, 4:273–287, 1999. [17] Moncef Elaoud and Parameswaran Ramanathan. Adaptive use of error-correcting codes for realtime communication in wireless networks. In Proceedings of IEEE INFOCOM 1998, San Francisco, March 1998. [18] Anthony Ephremides. Energy concerns in wireless networks. IEEE Wireless Communications, 9:48–59, 2002. [19] ETSI. High Performance Radio Local Area Network (HIPERLAN): Draft Standard. ETSI, March 1996. [20] ETSI. TR 101 683, HIPERLAN Type 2: System Overview. ETSI, February 2000. [21] ETSI. TS 101 475, BRAN, HIPERLAN Type 2: Physical (PHY) Layer. ETSI, March 2000. [22] Andras Farago, Andrew D. Myers, Violet R. Syrotiuk, and Gergely V. Zaruba. Meta-MAC protocols: automatic combination of MAC protocols to optimize performance for unknown conditions. IEEE Journal on Selected Areas in Communications, 18:1670–1681, 2000. [23] Luis Ferreira, Mario Alves, and Eduardo Tovar. Hybrid wired/wireless PROFIBUS networks supported by bridges/routers. In Proceedings of the 2002 IEEE Workshop on Factory Communication Systems, WFCS 2002, pp. 193–202, Västeras, Sweden, 2002. [24] Funbus-Projektkonsortium. Das Verbundprojekt Drahtlose Feldbusse im Produktionsumfeld (Funbus): Abschlubbericht. INTERBUS Club Deutschland e.V., Postf. 1108, 32817 Blomberg, Bestell-Nr: TNR 5121324, October 2000. Available at http://www.softing.de/d/NEWS/ Funbusbericht.pdf. [25] Jerry D. Gibson, editor. The Communications Handbook. CRC Press/IEEE Press, Boca Raton, FL, 1996. [26] Savo Glisic and Branka Vucetic. Spread Spectrum CDMA Systems for Wireless Communications. Artech House, Boston, 1997. [27] Andrea J. Goldsmith and Stephen B. Wicker. Design challenges for energy-constrained ad hoc wireless networks. IEEE Wireless Communications, 9:8–27, 2002. [28] Ajay Chandra V. Gummalla and John O. Limb. Wireless medium access control protocols. IEEE Communications Surveys and Tutorials, 3, 2000. Availabe at http://www.comsoc.org/pubs/surveys. [29] Jaap C. Haartsen. The Bluetooth radio system. IEEE Personal Communications, 7:28–36, 2000. [30] Bruce A. Harvey and Stephen B. Wicker. Packet combining systems based on the Viterbi decoder. IEEE Transactions on Communications, 42:1544–1557, 1994. [31] Junji Hirai, Tae-Woong Kim, and Atsuo Kawamura. Practical study on wireless transmission of power and information for autonomous decentralized manufacturing system. 
IEEE Transactions on Industrial Electronics, 46:349–359, 1999. [32] Gavin Holland, Nitin Vaidya, and Paramvir Bahl. A rate-adaptive MAC protocol for wireless networks. In Proceedings of the Seventh Annual International Conference on Mobile Computing and Networking 2001 (MobiCom), Rome, Italy, July 2001. [33] Jörg Hähniche and Lutz Rauchhaupt. Radio communication in automation systems: the R-Fieldbus approach. In Proceedings of the 2000 IEEE International Workshop on Factory Communication Systems (WFCS 2000), pp. 319–326, Porto, Portugal, 2000.
[34] IEC (International Electrotechnical Commission). IEC-1158-1, FieldBus Specification, Part 1, FieldBus Standard for Use in Industrial Control: Functional Requirements. IEC. [35] International Organization for Standardization. ISO Standard 11898: Road Vehicle: Interchange of Digital Information: Controller Area Network (CAN) for High-Speed Communication. ISO, 1993. [36] Ivan Izikowitz and Michael Solvie. Industrial needs for time-critical wireless communication and wireless data transmission and application layer support for time-critical communication. In Proceedings of Euro-Arch ’93, Munich, 1993. [37] W.C. Jakes, editor. Microwave Mobile Communications. IEEE Press, Piscataway, NJ, 1993. [38] Hong ju Moon, Hong Seong Park, Sang Chul Ahn, and Wook Hyun Kwon. Performance degradation of the IEEE 802.4 Token Bus network in a noisy environment. Computer Communications, 21:547–557, 1998. [39] Samir Kallel. Analysis of a type-II hybrid ARQ scheme with code combining. IEEE Transactions on Communications, 38:1133–1137, 1990. [40] J.F. Kurose, M. Schwartz, and Y. Yemini. Multiple-access protocols and time-constrained communication. ACM Computing Surveys, 16:43–70, 1984. [41] A. Kutlu, H. Ekiz, M.D. Baba, and E.T. Powner. Implementation of “comb” based wireless access method for control area network. In Proceedings of the 11th International Symposium on Computer and Information Science, pp. 565–573, Antalaya, Turkey, November 1996. [42] A. Kutlu, H. Ekiz, and E.T. Powner. Performance analysis of MAC protocols for wireless control area network. In Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, pp. 494–499, Beijing, China, June 1996. [43] Kyung Chang Lee and Suk Lee. Integrated network of PROFIBUS-DP and IEEE 802.11 wireless LAN with hard real-time requirement. In Proceedings of IEEE 2001 International Symposium on Industrial Electronics, Pusan, Korea, 2001. [44] Shu Lin and Daniel J. Costello. Error Control Coding: Fundamentals and Applications. Prentice Hall, Englewood Cliffs, NJ, 1983. [45] Hang Liu, Hairuo Ma, Magda El Zarki, and Sanjay Gupta. Error control schemes for networks: an overview. MONET: Mobile Networks and Applications, 2:167–182, 1997. [46] Nitaigour Premchand Mahalik, editor. Fieldbus Technology: Industrial Network Standards for RealTime Distributed Control. Springer, Berlin, 2003. [47] Eytan Modiano. An adaptive algorithm for optimizing the packet size used in wireless ARQ protocols. Wireless Networks, 5:279–286, 1999. [48] Philip Morel. Mobility in MAP networks using the DECT wireless protocols. In Proceedings of the 1995 IEEE Workshop on Factory Communication Systems, WFCS ’95, Leysin, Switzerland, 1995. [49] Philip Morel and Alain Croisier. A wireless gateway for fieldbus. In Proceedings of the Sixth International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 95), 1995. [50] Philip Morel and Jean-Dominique Decotignie. Integration of wireless mobile nodes in MAP/MMS. In Proceedings of the 13th IFAC Workshop on Distributed Computer Control Systems DCCS 95, 1995. [51] Kevin J. Negus, Adrian P. Stephens, and Jim Lansford. HomeRF: Wireless networking for the connected home. IEEE Personal Communications, 7:20–27, 2000. [52] Giao T. Nguyen, Randy H. Katz, Brian Noble, and Mahadev Satyanarayanan. A trace-based approach for modeling wireless channel behavior. In Proceedings of the Winter Simulation Conference, Coronado, CA, December 1996. [53] Editors of IEEE 802.11. 
IEEE Standard for Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Networks: Specific Requirements: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher Speed Physical Layer (PHY) Extension in the 2.4 GHz Band. IEEE, 1999. [54] Editors of IEEE 802.11. IEEE Standard for Telecommunications and Information Exchange between Systems: LAN/MAN Specific Requirements: Part 11: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications: High Speed Physical Layer in the 5 GHz Band. IEEE, 1999.
[55] Bob O’Hara and Al Petrick. IEEE 802.11 Handbook: A Designer’s Companion. IEEE Press, New York, 1999. [56] K. Pahlavan and A.H. Levesque. Wireless Information Networks. John Wiley & Sons, 1995. [57] Arogyaswami Paulraj. Diversity techniques. In Jerry D. Gibson, editor, The Communications Handbook, pp. 213–223. CRC Press/IEEE Press, Boca Raton, FL, 1996. [58] Juan R. Pimentel. Communication Networks for Manufacturing. Prentice Hall International, Englewood Cliffs, NJ, 1990. [59] John G. Proakis. Digital Communications, 3rd edition. McGraw-Hill, New York, 1995. [60] C.S. Raghavendra and Suresh Singh. Pamas: power aware multi-access protocol with signalling for ad hoc networks. ACM Computer Communication Review, 27, 1998. [61] Theodore S. Rappaport. Wireless Communications: Principles and Practice. Prentice Hall, Upper Saddle River, NJ, 2002. [62] Lutz Rauchhaupt. System and device architecture of a radio-based fieldbus: the RFieldbus system. In Proceedings of the Fourth IEEE Workshop on Factory Communication Systems 2002 (WFCS 2002), Västeras, Sweden, 2002. [63] Asuncion Santamaria and Francisco J. Lopez-Hernandez, editors. Wireless LAN: Standards and Applications, Mobile Communication Series. Artech House, Boston, 2001. [64] Günter Schäfer. Security in Fixed and Wireless Networks: An Introduction to Securing Data Communications. John Wiley & Sons, Chichester, U.K., 2003. [65] William Stallings. Wireless Communications and Networks. Prentice Hall, Upper Saddle River, NJ, 2001. [66] Ivan Stojmenovic, editor. Handbook of Wireless Networks and Mobile Computing. John Wiley & Sons, New York, 2002. [67] Takahiro Suzuki and Shuji Tasaka. Performance evaluation of video transmission with the PCF of the IEEE 802.11 standard MAC protocol. IEEE Transactions on Communications, E83-B:2068–2076, 2000. [68] Andrew S. Tanenbaum. Computernetzwerke, 3rd edition. Prentice Hall, Munich, 1997. [69] Editors of IEEE 802.11. IEEE Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, November 1997. [70] Fouad A. Tobagi and Leonard Kleinrock. Packet switching in radio channels. Part II. The hidden terminal problem in CSMA and busy-tone solutions. IEEE Transactions on Communications, 23:1417–1433, 1975. [71] Chai-Keong Toh. Ad Hoc Mobile Wireless Networks: Protocols and Systems. Prentice Hall, Upper Saddle River, NJ, 2002. [72] Elisabeth Uhlemann, Per-Arne Wiberg, Tor M. Aulin, and Lars K. Rasmussen. Deadline-dependent coding: a framework for wireless real-time communication. In Proceedings of the International Conference on Real-Time Computing Systems and Applications, pp. 135–142, Cheju Island, South Korea, December 2000. [73] Union Technique de l’Electricit’e. General Purpose Field Communication System, EN 50170, Volume 3, WorldFIP. Union Technique de l’Electricit’e, 1996. [74] Harmen R. van As. Media access techniques: the evolution towards terabit/s LANs and MANs. Computer Networks and ISDN Systems, 26:603–656, 1994. [75] Richard van Nee and Ramjee Prasad. OFDM for Wireless Multimedia Communications. Artech House Publisher, Boston, 2000. [76] Malathi Veeraraghavan, Nabeel Cocker, and Tim Moors. Support of voice services in IEEE 802.11 wireless LANs. In Proceedings of IEEE INFOCOM 2001, Anchorage, AK, April 2001. [77] Matthijs A. Visser and Magda El Zarki. Voice and data transmission over an 802.11 wireless network. In Proceedings of the IEEE Personal, Indoor and Mobile Radio Conference (PIMRC) 95, pp. 648–652, Toronto, Canada, September 1995. 
[78] Xin Wang and Michael T. Orchard. On reducing the rate of retransmission in time-varying channels. IEEE Transactions on Communications, 51:900–910, 2003.
[79] Stuart Williams. IrDA: past, present and future. IEEE Personal Communications, 7: February 2000. [80] Andreas Willig. An architecture for wireless extension of PROFIBUS. In Proceedings of IECON 03, Roanoke, VA, 2003. [81] Andreas Willig. Polling-based MAC protocols for improving realtime performance in a wireless PROFIBUS. IEEE Transactions on Industrial Electronics, 50:806 –817, 2003. [82] Andreas Willig, Martin Kubisch, Christian Hoene, and Adam Wolisz. Measurements of a wireless link in an industrial environment using an IEEE 802.11-compliant physical layer. IEEE Transactions on Industrial Electronics, 49:1265–1282, 2002. [83] Andreas Willig and Adam Wolisz. Ring stability of the PROFIBUS token passing protocol over error prone links. IEEE Transactions on Industrial Electronics, 48:1025–1033, 2001. [84] Michele Zorzi and Ramesh R. Rao. Error Control and energy consumption in communications for nomadic computing. IEEE Transactions on Computers, 46:279–289, 1997.
23
Wireless Local and Wireless Personal Area Network Technologies for Industrial Deployment

Kirsten Matheus
Carmeq GmbH

23.1 Introduction
23.2 WLAN, WPAN, Cellular Networks, and Ad Hoc Networks
23.3 Bluetooth Technology
      Technical Background • Performance
23.4 IEEE 802.11
      Technical Background • Performance
23.5 ZigBee
      Technical Background • Performance
23.6 Coexistence of WPAN and WLAN (Bluetooth and IEEE 802.11b)
23.7 Summary and Conclusions
References
23.1 Introduction

The convenience of true mobility offered by wireless connectivity is the main factor behind the widespread acceptance of wireless technologies. The global system for mobile communication (GSM), a second-generation cellular system designed mainly for mobile telephony, currently has more than one billion users worldwide. Systems like GSM or the third-generation universal mobile telecommunication system (UMTS) nevertheless require extensive infrastructure. The commercial and industrial deployment of systems that function on a smaller scale and do not require costly frequency licensing or infrastructure has become more appealing; such systems include wireless personal area networks (WPANs) and wireless local area networks (WLANs). As a consequence, the Bluetooth and IEEE 802.11 technologies and the newly emerging ZigBee have received a significant amount of public and scientific attention.

Bluetooth, like ZigBee, is a typical WPAN representative that is inexpensive, consumes little power, is small in size, and supports voice and data services. The different IEEE 802.11 variants are WLAN representatives that provide comparably high user data rates at the cost of a higher battery power consumption. ZigBee is limited to small data rates, but at the same time consumes very little power. With their original purposes fulfilled, new areas of deployment are being
developed for these technologies. Bluetooth plays a larger role in markets like warehousing, retailing, and industrial applications [5]. IEEE 802.11 is considered for seamless coverage of complete cities [4, 59].

Depending on the exact application, users of these wireless technologies have certain expectations concerning the quality of the systems. The application requirements have to be considered carefully in order to be able to choose the most suitable technology. The main criteria are generally throughput, delay, and reliability. In addition, cost, power consumption, security, and, last but not least, availability can be important issues. Note, though, that owing to the possibility of interference, adverse radio conditions, or range limits, hard quality-of-service (QoS) guarantees for throughput cannot be provided by wireless systems. In industrial environments the radio conditions can be especially difficult because metal walls have a significant impact on the transmission: metal shields radio transmissions and at the same time causes additional reflections. Systems that require a certain data rate within a strict time window, e.g., because they are security dependent, should not be wireless. In addition to the parameters discussed above, factors like unit density, traffic demand, mobility, environmental changes during deployment, interference, and frequency range determine how well a technology satisfies the requirements. Thus, both the individual link performance and the overall network capacity should be optimized.

This chapter first describes in Section 23.2 the basic differences between WLANs, WPANs, cellular networks, and ad hoc networks. In Sections 23.3, 23.4, and 23.5 the technologies Bluetooth, IEEE 802.11, and ZigBee are described in more detail. Each of those sections provides the technical background on the technology under consideration, as well as investigations on the performance of the systems and their suitability for industrial applications and factory floor environments. Section 23.6 shows how Bluetooth and IEEE 802.11b/g, which are placed in the same frequency band and are possibly used at the same time in the same location, coexist. Section 23.7 provides a summary and the conclusions.
23.2 WLAN, WPAN, Cellular Networks, and Ad Hoc Networks

The expressions wireless local area network (WLAN), wireless personal area network (WPAN), cellular network, and ad hoc network are commonly used, though often without consistency or precision. In the following, a clarification of the terminology is given:

WLAN: A wireless LAN has the same functionality as a wired LAN, with the difference that the wires are replaced by air links. This means that within a restricted area (home, office, hot spot), intercommunication between all devices connected to the network is possible, and the focus is on data communication as well as high data rates. The definition of WLAN says nothing about how the network is organized. Often an infrastructure of mounted access points (APs) enables wireless access to the wired LAN behind the APs, thus representing a cellular network structure. Nevertheless, a wireless LAN can also function on an ad hoc basis.

WPAN: In a wireless PAN all devices are interconnectable. The difference is that all units are somehow associated with someone or something (either because they are yours or because they are shared or public devices you want to use) and are very nearby. A PAN can consist of a variety of devices and can even include different technologies. The applications are therefore not limited to data transmission; voice communication can be used in a PAN as well. While you move within the WLAN, you can generally move with your WPAN. This means that several independent WPANs can coexist in the same area, each being self-sufficient without any infrastructure. Thus, they generally function on an ad hoc basis.

The difference between cellular and ad hoc networks is visualized in Figure 23.1. As can be seen, there are several steps that lead to ad hoc networking: a pure ad hoc network employs neither any infrastructure nor a specific unit (like an access point or base station) for the organization of coverage, synchronization, and services. Nevertheless, a network can be ad hoc even when it supports single hops only. It can be seen that WLAN technologies like IEEE 802.11 in the infrastructure mode or HIPERLAN/2 are in the same classification as typical cellular systems like GSM or UMTS/wireless code-division multiple
access (WCDMA); all are based on infrastructure and use a specific unit for central control. It would thus be correct to call these WLAN systems cellular systems. Despite this, there are distinct differences with regard to coverage. The wide (instead of local) area coverage of cellular systems like GSM has caused cellular systems to be associated with complete coverage and access everywhere, even though this is not correct. The description wireless wide area network (WWAN) gives a better idea of the difference from WLAN systems. The fact that the existing WWAN technologies like GSM and UMTS focus on voice communication, while WLAN technologies focus on data transmission, is not such an important difference. More important is that WWAN technologies are designed to support user mobility (and roaming) up to very high velocities, while WLAN systems support stationary or portable access. Because of the licensing regulations and costs for extensive infrastructure, WWAN systems like GSM and UMTS are not of interest for industrial applications. Thus, they are not discussed any further.

FIGURE 23.1 Classification of wireless technologies between cellular and ad hoc; MANET stands for mobile (or multihop [38]) ad hoc network (designed for Internet applications [31]), PRnet for public radio network [46], and ODMA for opportunity-driven multiple access [1, 54]. A Bluetooth scatternet means that several Bluetooth piconets are interlinked. BS = base station.

From the radio network point of view, the most important technical distinction with respect to the discussed terminology (with far-reaching consequences for system design, network optimization, etc. [52]) does not have to be made between WPAN, WLAN, or WWAN technologies. It has to be made between systems organized in cells and ad hoc systems:

• As cellular networks are systematically laid out, the minimum distance to the next co-channel interferer (i.e., the next uncoordinated user transmitting at the same time on the same frequency) is generally controllable, known, and fixed. In contrast, in ad hoc networks the next co-channel interferer can be very close from one moment to the next, without any possibility of influencing the situation (Figure 23.2).

• The centralized control in cellular networks allows for effective and fair distribution of resources, because of the accurate knowledge of the overall traffic demand and the possibility of a more global resource management. For ad hoc networks or several coexisting WPANs, knowledge of the overall traffic demand is generally not available and the systems compete for the resources.
FIGURE 23.2 Next closest co-channel interferer in cellular and ad hoc networks.
23.3 Bluetooth Technology

23.3.1 Technical Background

Bluetooth (BT) is first of all a cable replacement technology aiming at effortless wireless connectivity in an ad hoc fashion. It supports voice as well as data transmission [7, 8, 10, 11, 26]. Its desired key associations are easy to use, low in power consumption, and low in cost, with the aim that the Bluetooth functionality is integrated in as many devices as possible. To meet these goals, the BT special interest group (SIG) placed the technology in the unlicensed ISM (industrial, scientific, and medical) band at 2.4 GHz. This allows close to worldwide deployment without the need to pay fees for frequency licensing. Nevertheless, it requires complying with the respective sharing rules.* As a consequence, Bluetooth performs a rather fast frequency hopping (FH) over 79 carriers of 1-MHz bandwidth each, such that every Bluetooth packet is transmitted on a newly chosen frequency (which results in a nominal hop rate of 1600 hops/s).

To further reduce cost and support the distribution of the Bluetooth technology, the Bluetooth specification is an open standard that can be used without even needing to pay for the use of its key patents, on the condition that the strict qualification procedure is passed.† The latter is to ensure the acceptance of the technology. For the same purpose, the specification contains application profiles. The profiles describe in detail the implementation of the foreseen applications, thus enabling units of different manufacturers to communicate.

The most important characteristics of the physical layer are as follows: The data are Gaussian frequency shift keying (GFSK) modulated at 1 Mbps and organized in packets consisting of access code, header, and payload. The employment of forward error correction (FEC) for the payload is optional. Interpacket interleaving is not performed. This allows for lower chip prices, because memory can be saved.

Bluetooth uses a master–slave concept in which the unit that initiates a connection is temporarily assigned master status (for as long as the connection is up). The master organizes the traffic of up to seven other active units, called slaves, of this piconet. From the master's device address the identity of each piconet, and with it the frequency-hopping sequence, can be derived. The header of a packet contains the actual addressee, the length of the packet, and other control information. Note that within one piconet a slave can only communicate with the master (and not directly with the other slaves), and this, in the case of data connections, only after having been asked (i.e., polled).‡

The channel is organized in a time-division multiple access (TDMA)/time-division duplex (TDD) [24] scheme (Figure 23.3). It is partitioned into 625-µs time slots. Within this slot grid the master can only start transmission in the odd-numbered slots, while the slaves can only respond in even-numbered ones. When a unit is not already in one of the specific power save modes (sniff, hold, park), the slotting is power consumption friendly, because every unit has to listen only during the first 10 µs of its receive slot to determine whether a packet is arriving (and if not, can close down until the next receive slot**). This means it needs to listen to the channel only 10 µs/(2 · 625 µs) = 0.8% of the time during an active connection in which no packets are sent. Yet another facet of the long battery life is the low basic transmit power of 0 dBm (resulting in a nominal range of about 10 m).
Bluetooth can also be used with up to 20-dBm transmit power. This results in a larger range but requires the implementation of power control to fulfill the Federal Communications Commission (FCC) sharing rules.
*For the U.S. [20, Part 15], for Europe [16], for Japan [48].
†The Bluetooth specification has also been adopted by the IEEE. It can be found under IEEE 802.15.1.
‡Every BT unit can simultaneously be a member of up to four piconets (though it can be master in only one of them). A formation in which several piconets are interlinked in that manner is called a scatternet. Aspects like routing, which are of interest in this constellation, will not be covered in this chapter. The chapter will thus focus on the properties of a single piconet or multiple independent piconets.
**This is quite different from channel access schemes like CSMA, as used in IEEE 802.11 (see Section 23.4). Unless asleep, IEEE 802.11 always has to listen to the channel.
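As a quick, purely illustrative check of the timing figures given above:

```python
# Purely illustrative check of the Bluetooth timing figures quoted in the text.

slot_us = 625                              # duration of one TDD slot
print(1_000_000 / slot_us)                 # 1600.0 -> nominal hop rate in hops/s

listen_us = 10                             # scan window at the start of a receive slot
print(f"{listen_us / (2 * slot_us):.1%}")  # 0.8% listen duty cycle when idle
```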
FIGURE 23.3 Example slot occupation within one piconet consisting of a master and three slaves; to slave 1 there is an SCO link, to slave 2 (best-effort) traffic is transmitted in both directions, and slave 3 currently has nothing to send (but has to respond to the POLL packet with an acknowledgment (ACK)). During the transmission of multislot packets the frequency is not changed.
Bluetooth provides two fundamentally different types of connections: asynchronous connectionless (ACL) links, foreseen for data transmission, and synchronous connection-oriented (SCO) links, foreseen for speech transmission.

For ACL links there are six packet types defined. The packets occupy either one, three, or five (625-µs) time slots, and their payloads are either uncoded (called DH1, DH3, or DH5, respectively) or protected with a 2/3 rate FEC using a (15, 10) shortened Hamming block code without any interleaving (called DM1, DM3, DM5, respectively). An automatic repeat request (ARQ) scheme initiates the retransmission of a packet in case the evaluation of the cyclic redundancy check (CRC) included in each ACL payload shows inconsistencies. This secures error-free reception of the transmitted information. Table 23.1 gives an overview of the throughput values achievable with ACL connections. The maximum (unidirectional) Bluetooth throughput is 723 kbps.

As speech transmission is delay sensitive, the original SCO links support three different packet types that are transmitted at fixed intervals. These types were designed to transport continuously variable slope delta (CVSD) encoded speech at 64 kbps. The packet types always occupy just one (625-µs) time slot, but they are differentiated by their payload FEC. The packet payloads are either unprotected (called HV3), 2/3 rate FEC encoded (HV2), or protected with a 1/3 rate repetition code (HV1). For an HV3 connection, a packet is transmitted every sixth slot (Figure 23.3); for HV2, every fourth slot; and for HV1, every second slot (meaning that with one HV1 connection no other traffic can be transmitted in the piconet). Up to the Bluetooth Specification 1.1 [9] there was no ARQ scheme for SCO links. In case of an erroneous reception of the packet overhead, the SCO packet was replaced by an erasure pattern. In case noncorrectable bit errors occurred in the payload only, these errors were forwarded to the speech decoder. The latest specification, Bluetooth Specification 1.2 [10], includes an enhanced SCO link. This link allows very flexible deployment of the SCO link, providing for a reserved bandwidth for several transmission rates and a limited number of retransmissions.

TABLE 23.1 Throughput Values for ACL Connections
                               Max. No. of     Unidirectional Throughput     Bidirectional Throughput
Name    Slots    FEC?          User Bytes      Forward       Reverse         Forward       Reverse
DH1       1      No                27          172.8k        172.8k          172.8k        172.8k
DH3       3      No               183          585.6k         86.4k          390.4k        390.4k
DH5       5      No               339          723.2k         57.6k          433.9k        433.9k
DM1       1      2/3               17          108.8k        108.8k          108.8k        108.8k
DM3       3      2/3              121          387.2k         54.4k          258.1k        258.1k
DM5       5      2/3              224          477.8k         36.3k          286.7k        286.7k

Note: The reverse link in the unidirectional case transmits DH1 or DM1 packets, depending on whether the forward link uses a DH or DM packet type.
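The entries of Table 23.1 follow directly from the slot structure described above; the following minimal sketch (illustrative only) reproduces them:

```python
# Illustrative sketch reproducing the Table 23.1 figures from the slot structure:
# a forward packet of n slots is answered by a return packet (one slot in the
# unidirectional case, n slots in the symmetric case); each slot lasts 625 us.

SLOT_S = 625e-6
PAYLOAD_BYTES = {"DH1": 27, "DH3": 183, "DH5": 339, "DM1": 17, "DM3": 121, "DM5": 224}
SLOTS = {"DH1": 1, "DH3": 3, "DH5": 5, "DM1": 1, "DM3": 3, "DM5": 5}

def unidirectional_bps(ptype):
    # forward data packet plus a one-slot return packet carrying no user data
    return PAYLOAD_BYTES[ptype] * 8 / ((SLOTS[ptype] + 1) * SLOT_S)

def bidirectional_bps(ptype):
    # the same packet type used in both directions
    return PAYLOAD_BYTES[ptype] * 8 / (2 * SLOTS[ptype] * SLOT_S)

print(f"{unidirectional_bps('DH5') / 1e3:.1f} kbps")  # 723.2
print(f"{bidirectional_bps('DH5') / 1e3:.1f} kbps")   # 433.9
```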
To further improve coexistence with other systems in the ISM band, Bluetooth version 1.2 includes the possibility to perform adaptive frequency hopping (AFH), i.e., to exclude carrier frequencies used by other systems from the hop sequence. With AFH the nominal hop rate will be halved, because the specification has been changed such that the slave responds on the same frequency on which it received the packet from the master [10].

Security is supported in Bluetooth by the specification of authentication and encryption. For the future, a high-rate mode is envisioned that allows direct slave-to-slave communication at roughly a 10-fold transmission rate. The transmission takes place on 4-MHz channels that are chosen at particularly good locations within the 79-MHz bandwidth.
23.3.2 Performance

On the factory floor Bluetooth can be used as a wireless add-on to wired systems or as a replacement of existing cabling. It can cover machine-to-machine communication, wireless/remote monitoring, or tracking and some type of positioning of moving entities [5, 23]. Considering the comparably short range of Bluetooth and the likely association with a specific unit (represented by a machine, person, or task), it is possible that several independently active Bluetooth piconets coexist and overlap in space. The use of frequency hopping helps to mitigate the effects of interference among these piconets. When assuming more or less time-synchronized piconets, a worst-case approximation of the loss rate can be made with Equation 23.1. It calculates the probability P(x, n) that of x other piconets, n hop onto the same frequency as the considered piconet:

$$P(x,n) = \binom{x}{n}\left(\frac{1}{79}\right)^{n}\left(\frac{78}{79}\right)^{x-n} \qquad (23.1)$$
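As a numerical illustration, the following minimal sketch evaluates Equation 23.1 and the derived worst-case probability that at least one of the x other piconets collides (assuming roughly slot-synchronized piconets, as in the text):

```python
from math import comb

CHANNELS = 79

def p_exactly(x, n):
    """Equation 23.1: exactly n of the x other piconets hop onto the same carrier."""
    return comb(x, n) * (1 / CHANNELS) ** n * ((CHANNELS - 1) / CHANNELS) ** (x - n)

def p_at_least_one(x):
    """Worst-case probability that at least one other piconet collides, 1 - P(x, 0)."""
    return 1 - p_exactly(x, 0)

for x in (5, 10, 30):
    print(x, f"{p_at_least_one(x):.3f}")   # 0.062, 0.120, 0.318
```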
The probability that at least one of the other x piconets transmits on the same frequency is then P(x) = 1 – P(x, 0). The smaller the number of interfering piconets, the better the approximation offered by this approach, because for larger numbers, the distances to some of the interferers are likely to be too large to be harmful.

In [52, 61, 62] a more sophisticated approach has been chosen, and Bluetooth–Bluetooth coexistence results have been obtained with the help of detailed radio network simulations that include traffic, distribution, and fading models, as well as adjacent channel effects. All results have been obtained for an office of 10 × 20 m², assuming an average master–slave distance of 2 m. Naturally, a factory floor is likely to be significantly larger than 10 × 20 m². Nevertheless, the increased delay spread on the factory floor does not really affect Bluetooth, due to its small range (which is different for WLAN technologies; see Section 23.4.2). On the factory floor it is thus possible to place the Bluetooth units with the same density as in the investigated office scenario without loss in performance. Because a factory floor is larger than the investigated office, the overall number of piconets that can be used simultaneously on the factory floor is larger too. Additionally, location and traffic of the factory floor units are likely to be more predictable. Directive antennas also help to improve the performance. The results of the aforementioned publications thus give a good idea of what performance is achievable:

• A 10 × 20 m² room supports 30 simultaneous HV3 speech connections with an average packet loss rate of 1% (a limit that still allows for acceptable quality).

• HV3 packet types are preferable to HV2 and HV1. The subjective quality will not increase with additional payload coding. Using a coded HV packet just increases (unnecessarily) the interference in the network and the power consumption.

• One hundred simultaneous World Wide Web (WWW) sessions (bursty traffic with an average data rate of 33.2 kbps each) in the 10 × 20 m² room result in only a 5% degradation of the aggregate throughput.

• The maximum aggregate throughput in the room is 18 Mbps (at 50 fully loaded piconets). These piconets then transmit at a unidirectional data rate of 360 kbps each.
• Long and uncoded packets are preferable to shorter and coded ones. It takes 60 interfering piconets using the same packet type, 10 interfering HV1 connections (worst case), or a link distance of 27 m (which is far beyond the envisioned range of 10 m) before another packet type yields a larger throughput than DH5 [52]. It is advisable not to use the optional FEC (DM packet types). As the coding is not appropriate to handle the almost binary character of the Bluetooth (ad hoc) transmission channel,* the additional power that would be needed for the coding can be saved.

Bluetooth is inexpensive and consumes significantly less power than IEEE 802.11 systems. The ACL link is reliable with best-effort traffic (with a maximum throughput of 723 kbps). The SCO link has reserved bandwidth, though the packets might contain residual bit errors (even when using the enhanced SCO link). In principle, Bluetooth is very robust against other Bluetooth interference, and good performance can be achieved even in very dense environments. Note that customized implementations, which cannot be based on existing profiles, might be difficult to realize, as the regulations do not allow the implementation of proprietary solutions. The specification of new profiles, though, can be quite time-consuming.

*The reasons are manifold. Without interference, the channel varies already due to the hopping over 79 relatively narrowband channels. Additionally, with the wavelength used in Bluetooth, even small changes in position can cause large changes in the received signal strength. When there is interference, the effect becomes more pronounced. The existence or nonexistence of a close co-channel interferer can make the channel change from very good to very bad within the fraction of a moment (Figure 23.2).
23.4 IEEE 802.11

23.4.1 Technical Background

IEEE 802.11 includes a number of specifications that define the lower layers (mainly the physical (PHY) and medium access control (MAC) layers) for WLANs [32–34, 37, 53]. Being part of the IEEE 802 group means that an interface (IEEE 802.2) can be used to connect to the higher layers, which are then not aware that the network actually transporting the data is, with IEEE 802.11, a wireless one. The key intentions of IEEE 802.11 are thus to provide a high-throughput and continuous network connection like that available in wired LANs. To encourage the wide employment of the technology, the use of IEEE 802.11 does not incur frequency licensing fees. IEEE 802.11 either uses infrared (IR) or transmits in the unlicensed ISM band at 2.4 GHz (like Bluetooth) or in 5-GHz bands that are license-exempt in Europe and unlicensed in the U.S. (UNII bands). In contrast to Bluetooth, the companies holding key patents within IEEE 802.11 can charge developers of IEEE 802.11 products for using the patents “on reasonable terms” [36].

In principle, it is possible to have an IEEE 802.11 WLAN consisting of mobile stations (MSs) only. It is more likely, though, that IEEE 802.11 is used as a wireless access technology to a wired LAN to which the connection is made by IEEE 802.11 access points (APs). Should the access point only employ the distributed coordination function,† the MAC layer supports collision avoidance by employing carrier-sense multiple access (CSMA). This means that before transmitting a packet, the respective unit has to listen for the availability of the channel.‡

†Which is likely and assumed in the remainder of this chapter. In theory, the standard also provides the use of a centralized point coordination function.
‡The implementor can choose whether the units react (1) on just other IEEE 802.11 traffic, (2) on just other IEEE 802.11 traffic above a certain receive signal strength, or (3) on any signal above a certain receive signal strength [34, Section 18.4.8.4].
**The random backoff period is entered only when the channel was busy before. Otherwise, the unit will transmit at once after DIFS.

If the channel is sensed free after having been busy, the unit waits a certain period (called DIFS) and then enters a random backoff period** of
$$\operatorname{random}\Bigl(0 \,\ldots\, \underbrace{\min\bigl(2^{\,n_{PHY}+n_r}-1,\;1023\bigr)}_{CW}\Bigr)\cdot t_{slot} \qquad (23.2)$$
with nPHY a parameter depending on the type of physical layer chosen, nr the index of the retransmission of the packet, t_slot the slot duration, and CW the contention window (with CWmin = 2^nPHY − 1). If the channel is available after this period, the unit transmits its packet (consisting of a PHY header, MAC header, and payload). Upon correct reception, the addressee responds with an ACK packet a short period (called SIFS) later (Figure 23.4). The ARQ mechanism realized in this way ensures reliable data transfer.

Obviously, the IEEE 802.11 WLAN MAC concept was designed for best-effort data traffic. Services for which strict delay requirements exist, like speech, are not supported well by the current IEEE 802.11 specifications. To be able to provide QoS in the future, there is an ongoing activity within the IEEE that extends the MAC protocol with the necessary parameters (see Table 23.3). At the moment, QoS is difficult to provide, especially when multiple units coexist in the network. The IEEE 802.11 MAC concept also includes a mechanism to solve the hidden-terminal problem. Whether this ready-to-send/clear-to-send (RTS/CTS) packet exchange saves more bandwidth (due to avoided retransmissions) than it needs depends on the terminal density and payload packet length [6]. As the RTS/CTS mechanism is optional and consumes additional bandwidth, it will be assumed in the following that the RTS/CTS mechanism is not used.

IEEE 802.11 has a significantly larger power consumption than Bluetooth. Note that this is due not only to the higher transmit power (20 dBm in Europe, 30 dBm in the U.S.) but also to the CSMA concept. IEEE 802.11 units not specifically in sleep status have to listen to the channel all the time (unlike Bluetooth, which listens only at the beginning of the receive slot). Naturally, the higher transmit power allows for a larger range of about 50 m (with 20 dBm).

There are six different options for the physical layer implementation of IEEE 802.11:

IR: The infrared mode transmits near-visible light at 850- to 950-nm wavelength. The data are pulse position modulated at 1 or 2 Mbps. In principle, the signal needs line of sight (LOS) and cannot go through walls. This and the nonvisibility of IEEE 802.11 IR products are the reasons for not covering the IR mode further in this chapter.

FHSS: The frequency-hopping spread-spectrum mode is placed (like Bluetooth) in the 2.4-GHz ISM band. The data are GFSK modulated using two levels for the 1-Mbps and four for the 2-Mbps modulation rates. The FHSS mode divides the 79 hop frequencies into three distinct sets with 26 different sequences each. The hopping rate can be as slow as 2.5 hops/s. Despite its comparably good interference robustness [56], the popularity of the FHSS mode is limited due to its comparably low transmission rates. Note, though, that its principles have been incorporated in the HomeRF standard [11, 29].

DSSS: The direct-sequence spread-spectrum mode is also used in the 2.4-GHz ISM band. The nominal bandwidth of the main lobe is 22 MHz. The transmit power reduction in the first and residual side lobes is supposed to be 30 and 50 dB, respectively (see Figure 23.5 for a measured spectrum).
FIGURE 23.4 Principle time behavior of IEEE 802.11 under the distributed coordination function; note that the random backoff timer from nontransmitting units continues after the next DIFS with the remaining number of slots.
FIGURE 23.5 Measured spectrum of an IEEE 802.11b WLAN PCMCIA card.
In principle, 11/13 (U.S./Europe) center frequencies are available for the DSSS system. Nevertheless, using several systems in parallel requires a spacing of 25/30 MHz (U.S./Europe), which consequently only allows three systems to be used simultaneously. The DSSS mode includes the original version (specified in IEEE 802.11) and a high-rate extension (specified in IEEE 802.11b). In the original mode the chipping of the baseband signal is performed with 11 MHz, employing an 11-chip pseudorandom code (Barker sequence). For the 1-Mbps modulation rate, a 1-bit DBPSK symbol is spread with the Barker sequence; for the 2-Mbps modulation rate, a 2-bit DQPSK symbol is spread with the same sequence.*

*Note that in contrast to a typical CDMA system like UMTS, all users use the same spreading code.

The high-rate extension IEEE 802.11b was, at the time of writing, the most popular and widespread WLAN technology. For the PHY header IEEE 802.11b uses the same 1- and 2-Mbps modulations as the plain DSSS mode. Note, though, that a shortened header of 96 µs can be used. For the IEEE 802.11b PHY payload (consisting of the MAC header and the user data), a 5.5- and 11-Mbps complementary code keying (CCK) modulation is used. The CCK employs a variation of M-ary orthogonal signaling (complex Walsh–Hadamard functions) with eight complex chips in each spreading code word. For the 5.5-Mbps modulation rate, 4 bits are mapped onto 8 chips, and for 11 Mbps, 8 bits are mapped onto 8 chips.

PBCC: The packet binary convolution code (PBCC) physical layer is one of three additional possibilities standardized in IEEE 802.11g as an even higher rate extension to IEEE 802.11b in the 2.4-GHz band. In this optional PHY a single-carrier modulation scheme is used that encodes the payload using a 256-state PBCC. The foreseen modulation rates are 22 and 33 Mbps.

OFDM: The orthogonal frequency-division multiplexing (OFDM) physical layer was originally designed for 5-GHz bands (also referred to as IEEE 802.11a) but has now been adopted for the 2.4-GHz band (as part of IEEE 802.11g). The parameters of the IEEE 802.11 OFDM PHY had at the time
of standardization been harmonized with those of HIPERLAN/2.* Eight modes are defined, ranging from BPSK with a rate R = 1/2 FEC (lowest modulation rate, 6 Mbps) to 64-QAM with a rate R = 3/4 FEC (highest modulation rate, 54 Mbps; Table 23.2). The OFDM technique is based on a 64-point IFFT/FFT, while only using 52 of the subcarriers (48 for user data, 4 for pilot carriers). The subcarrier spacing is f = 20 MHz/64 = 0.3125 MHz. Note that full OFDM symbols always have to be transmitted. This means that they possibly have to be filled up with dummy bits. To transmit one OFDM symbol, t_sym = 1/f + (1/4) · 1/f = 4 µs is needed, with the latter part representing the time used for the guard interval to combat multipath propagation. For synchronization, channel estimation, and equalization, a training sequence is transmitted, which consists of ten repeated short and two repeated long OFDM symbols [35, 58].

DSSS-OFDM: This optional physical layer format of IEEE 802.11g combines the DSSS PHY with the OFDM PHY such that for the header DSSS is used while the payload employs OFDM (including the OFDM preamble).

Table 23.2 compares the theoretical maximum throughput (TP) values of the different IEEE 802.11 PHY versions after the MAC. The maximum payload length is 4095 bytes (which has to include the 34-byte MAC header). Fifteen hundred bytes is the common length of an Ethernet packet (plus 34 bytes for the MAC header and checksum), 576 bytes is a typical length for a Web-browsing packet, and 60 bytes is the length of a Transmission Control Protocol (TCP) acknowledgment. The throughput TP is calculated as follows:

$$TP = \frac{PayBytes \cdot 8}{\underbrace{DIFS}_{2t_{slot}+SIFS} \;+\; \underbrace{\tfrac{CW_{min}}{2}\, t_{slot}}_{\text{average back-off}} \;+\; t_{data\,packet} + SIFS + t_{ACK}} \qquad (23.3)$$
The durations needed to transmit the data packet, t_data packet, and the acknowledgment, t_ACK, vary depending on the physical layer chosen. For the FHSS and DSSS modes they are calculated as follows:

$$t_{data\,DSSS/FHSS} = t_{PHYh} + \underbrace{\frac{34 \cdot 8}{ModRate}}_{\text{MAC header}} + \frac{PayBytes \cdot 8}{ModRate}\,; \qquad t_{ACK\,DSSS/FHSS} = t_{PHYh} + \frac{14 \cdot 8}{ModRate} \qquad (23.4)$$
For the OFDM physical layer in the 5-GHz bands, Equation 23.5 needs to be calculated. For the OFDM mode in the 2.4-GHz band, an additional 6-µs signal extension has to be added to both. “Ceil” stands for the next larger integer.

$$t_{data\,OFDMa} = t_{PHYh} + t_{sym}\,\mathrm{ceil}\!\left(\frac{16 + (34 + PayBytes)\cdot 8 + 6}{ModRate/12\,\mathrm{Mbps}\cdot 48}\right)$$

$$t_{ACK\,OFDMa} = t_{PHYh} + t_{sym}\,\mathrm{ceil}\!\left(\frac{16 + 14\cdot 8 + 6}{ModRate/12\,\mathrm{Mbps}\cdot 48}\right) \qquad (23.5)$$
The packet and acknowledgment durations for the DSSS-OFDM PHY are calculated quite similarly to Equation 23.5:
*Originally HIPERLAN/2 was intended to be the WLAN technology for the European market, while IEEE 802.11 was its counterpart for North America. Owing to delays in development, HIPERLAN/2 lost the chance to establish itself on the market, despite its better overall network performance (from which the user would have benefited through higher user data rates). Publications on HIPERLAN/2 include [17, 18, 19, 24, 30, 40, 41, 43, 45, 47, 49, 57].
TABLE 23.2 Comparison of Different Achievable Maximum Throughput Rates (in Mbps) for the Different IEEE 802.11 PHY Modes (except for the optional DSSS-OFDM and PBCC)

PHY parameters:

  Mode    Frq. Band         tslot            SIFS            CWmin     tPHYh
  FHSS    2.4 GHz           50 µs            28 µs           15        128 µs
  DSSS    2.4 GHz           20 µs            10 µs           31        192 µs / 96 µs (long/short preamble)
  OFDM    5 GHz (2.4 GHz)   9 µs (9/20 µs)   16 µs (10 µs)   15 (31)   20 µs; tsym = 4 µs

Throughput TP (Mbps) for PayBytes = 60 / 576 / 1500 / 4061:

  Modulation        ModRate (Mbps)   TP @ 60            TP @ 576           TP @ 1500          TP @ 4061
  GFSK (2-level)    1                0.29               0.79               0.91               0.96
  GFSK (4-level)    2                0.39               1.40               1.72               1.89
  DBPSK             1                0.30               0.80               0.91               0.97
  DQPSK             2                0.40               1.42               1.72               1.89
  CCK (QPSK)        5.5              0.67               3.14               4.26               4.97
  CCK (QPSK)        11               0.75               4.54               7.11               9.16
  BPSK, R = 1/2     6                1.51 (1.24/0.83)   4.57 (4.29/3.64)   5.37 (5.2/4.8)     5.76 (5.68/5.49)
  BPSK, R = 3/4     9                1.81 (1.44/0.91)   6.32 (5.81/4.67)   7.79 (7.43/6.64)   8.51 (8.35/7.96)
  QPSK, R = 1/2     12               2.02 (1.55/0.96)   7.86 (7.05/5.44)   10.0 (9.45/8.21)   11.2 (10.9/10.3)
  QPSK, R = 3/4     18               2.25 (1.70/1.01)   10.4 (8.97/6.53)   14.1 (13.0/10.8)   16.3 (15.8/14.4)
  16-QAM, R = 1/2   24               2.38 (1.76/1.03)   12.3 (10.3/7.22)   17.6 (15.9/12.7)   21.2 (20.2/18.1)
  16-QAM, R = 3/4   36               2.59 (1.86/1.07)   15.2 (12.3/8.14)   23.7 (20.8/15.6)   30.3 (28.4/24.3)
  64-QAM, R = 2/3   48               2.64 (1.89/1.07)   17.1 (13.7/8.7)    28.5 (24.3/17.5)   38.4 (35.4/29.3)
  64-QAM, R = 3/4   54               2.70 (1.92/1.08)   17.9 (14.2/8.90)   30.8 (26.0/18.3)   42.2 (38.6/31.4)

Note: For the OFDM rows, the first TP value refers to the 5-GHz band (IEEE 802.11a); the values in parentheses refer to the 2.4-GHz OFDM mode (IEEE 802.11g) with the parameters given in parentheses above (9-µs and 20-µs slot times, respectively).
TABLE 23.3 Overview of Activities within 802.11 Group

  TG   Subject                                                    Status
  a    PHY in the 5 GHz bands                                     Completed
  b    High rate mode in 2.4 GHz band                             Completed
  c    Extensions for specific MAC procedures                     Completed
  d    Supplements for new regulatory regions                     Completed
  e    Enhancements for QoS                                       Ongoing
  f    To achieve multivendor access point interoperability       Completed
  g    Enhancements of 802.11b data rates                         Completed
  h    Extensions for channel selection for 802.11a               Almost completed
  i    Enhancements for security and authentication algorithms    Ongoing
  j    Enhancements for the use of 802.11a in Japan               Ongoing
  k    Definition of radio resource management measurements       Ongoing, initialized in 2003
  l    Nonexistent                                                -
  m    Maintenance of 802.11-1999                                 Ongoing, initialized in 2003
t_{data,DSSS\text{-}OFDM} = t_{PHYh,DSSS} + t_{Preamble,OFDM} + t_{sym} \cdot \mathrm{ceil}\!\left(\frac{16 + (34 + PayBytes) \cdot 8 + 6}{ModRate/12\ \mathrm{Mbps} \cdot 48}\right) + 6\ \mu s

t_{ACK,DSSS\text{-}OFDM} = t_{PHYh,DSSS} + t_{Preamble,OFDM} + t_{sym} \cdot \mathrm{ceil}\!\left(\frac{16 + 14 \cdot 8 + 6}{ModRate/12\ \mathrm{Mbps} \cdot 48}\right) + 6\ \mu s    (23.6)

For the PBCC mode at 22 Mbps the data packet and acknowledgment durations are given in Equation 23.7. In the case of 33 Mbps, a 1-µs clock switch time has to be added to both:

t_{data,PBCC22} = t_{PHYh} + \frac{(34 + PayBytes + 1) \cdot 8}{ModRate}; \quad t_{ACK,PBCC22} = t_{PHYh} + \frac{14 \cdot 8}{ModRate}    (23.7)
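For readers who want to reproduce the numbers in Table 23.2, the following sketch evaluates Equations 23.3 to 23.5 for two representative cases. It is an illustration only; the parameter values (slot time, SIFS, CWmin, PHY header) are taken from Table 23.2, and the short 96-µs PHY header is assumed for IEEE 802.11b.

```python
import math

def tp_dsss(paybytes, modrate_mbps, t_phyh=96, t_slot=20, sifs=10, cw_min=31):
    """Throughput (Mbps) of the DSSS/802.11b PHY after Equations 23.3 and 23.4 (times in us)."""
    t_data = t_phyh + (34 * 8 + paybytes * 8) / modrate_mbps
    t_ack = t_phyh + 14 * 8 / modrate_mbps
    difs = 2 * t_slot + sifs
    return paybytes * 8 / (difs + cw_min / 2 * t_slot + t_data + sifs + t_ack)

def tp_ofdm(paybytes, modrate_mbps, t_phyh=20, t_sym=4, t_slot=9, sifs=16, cw_min=15):
    """Throughput (Mbps) of the 802.11a OFDM PHY after Equations 23.3 and 23.5 (times in us)."""
    bits_per_sym = modrate_mbps / 12 * 48
    t_data = t_phyh + t_sym * math.ceil((16 + (34 + paybytes) * 8 + 6) / bits_per_sym)
    t_ack = t_phyh + t_sym * math.ceil((16 + 14 * 8 + 6) / bits_per_sym)
    difs = 2 * t_slot + sifs
    return paybytes * 8 / (difs + cw_min / 2 * t_slot + t_data + sifs + t_ack)

print(round(tp_dsss(1500, 11), 2))   # ~7.11 Mbps, cf. Table 23.2
print(round(tp_ofdm(1500, 54), 2))   # ~30.8 Mbps, cf. Table 23.2
```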
For small payload sizes (60 and 576 bytes) the throughput values are not very good. When considering Ethernet packets, the highest theoretical throughput rates are 7.11 Mbps for IEEE 802.11b and 30.8/26.0 Mbps for the OFDM modes. Naturally, these wireless throughput rates are smaller than the wired ones (where 70 to 80 Mbps is possible), but at least for the higher modulation rates with 1500-byte Ethernet packets the throughput values are reasonably good. Note that the real-life throughput values for IEEE 802.11b systems are still smaller than the theoretically possible ones: values around 5 Mbps have been measured [50, 51]. This is because, in actual implementations, higher protocol layers like TCP/IP cause additional overhead and delays. For security, IEEE 802.11 WLANs support several authentication processes, which are listed in the specification (none are mandatory).* Table 23.3 lists the standardization activities involving IEEE 802.11.
23.4.2 Performance In addition to aspects like individual link throughput, network capacity, and interference robustness, the transmission environment has to be taken into consideration when contemplating the use of IEEE 802.11 on the factory floor. Because the scenarios envisioned for IEEE 802.11 were placed primarily in homes and offices, some differences occur when looking at the delay spread. While in homes and offices the delay spread is assumed to be <50 and <100 ns, respectively, it takes on values of 200 to 300 ns on factory floors [53].
*Neither Bluetooth nor IEEE 802.11 is renowned for its security concepts, and both have been criticized. To improve the situation, the IEEE 802.11i standardization activities were initiated.
In case of IEEE 802.11b, a conventional RAKE receiver supports (only) about a 60-ns delay spread in the 11-Mbps mode and 200 ns in the 5.5-Mbps version [58]. When employing an IEEE 802.11b system with such a conventional receiver on a factory floor, intersymbol and interchip/codeword interferences (ISI and ICI) are likely to degrade the performance. Nevertheless, with more complex receiver algorithms like those presented in [13, 44], IEEE 802.11b can compensate well for delay spreads of 1 µs and even mobility of the user. When wanting to use IEEE 802.11b on the factory floor, the specific performance concerning the used equipment and environment needs to be measured. Another option, of course, is to use IEEE 802.11a or g. Because of the guard interval inherent in the OFDM technology, delay spreads of several hundred nanoseconds can be easily supported without paying attention to the receiver algorithms implemented [58]. When considering the overall network performance and not just the individual link performance (or interference performance, as discussed in Section 23.6), note first of all that the number of publications presenting well-founded results is very limited. It seems to be general thinking that capacity in an IEEE 802.11-only network is not an issue. The few publications that do exist do not at all support this. In [42] it is shown that co-channel interference with a carrier-to-interference ratio (CIR) of 5 dB still results in a packet loss rate of 10 to 20% (bit error rate [BER] = 10^-5), and that with a frequency offset of 5 MHz, still CIR = 3 dB is required to achieve the same result. References [6, 55] present how the aggregate throughput in a single network decreases with the number of users, due to either hidden- or exposed-terminal problems or additional RTS/CTS overhead. With only 10 stations [6] or a hidden node probability of 5% [55], the system throughput is about halved in the case of 1500-byte payloads. Only when there are more than 25 stations is the RTS/CTS implementation justified, while the throughput is still reduced. Thus, when installing IEEE 802.11 in a cellular fashion, some kind of frequency planning should be performed. For IEEE 802.11 systems a respective mechanism has to be added (refer, e.g., to [2] for frequency allocation algorithms); for IEEE 802.11a a mechanism is currently being standardized in IEEE 802.11h. Note that the most relevant parameter for WLAN frequency planning is the number of mobile terminals that have to be served. From it the optimum number of access points (APs) and distance between APs can be determined. If it is desired that mobile stations can seamlessly change between APs (due to mobile deployment), handover algorithms have to be added. IEEE 802.11 provides reliable best-effort traffic. The (theoretical) maximum transmission rates for IEEE 802.11a, g, and b are 30.8, 26.0, and 7.11 Mbps, respectively. To achieve network capacity values anywhere near three times these values (three parallel systems are possible for IEEE 802.11b and g; for IEEE 802.11a network capacity is not critical due to the larger frequency band at 5 GHz), appropriate receivers and some sophisticated frequency planning are needed. The WLAN systems of IEEE 802.11 allow for higher data rates than the WPAN technology Bluetooth. Nevertheless, IEEE 802.11 consumes (significantly) more power than Bluetooth and is not really suitable for speech connections. Currently, IEEE 802.11b chips are also larger in size and more expensive than Bluetooth chips.
23.5 ZigBee 23.5.1 Technical Background The idea behind ZigBee* was to create a very low cost, very low power, two-way wireless communication solution that meets the unique requirements of sensors and control devices needed in consumer electronics, home and building automation, industrial controls, PC peripherals, medical sensor applications, toys, and games. To allow long battery lives of 2 years and more (and thus minimizing the efforts in maintenance), ZigBee supports low data rates at low-duty cycles. Owing to its nominal range of 10 m, ZigBee is considered a WPAN technology [3, 14, 39]. *Note that the technology includes two specifications: the specifications published under IEEE 802.15.4 for the physical and MAC layers as well as the ZigBee specification, which covers the upper layers from network to application layers [12].
TABLE 23.4 IEEE 802.15.4 Parameters for the Different Frequency Bands

  Parameter                      868 MHz               915 MHz               2.4 GHz
  Location                       Europe                North America         Worldwide
  Number of center frequencies   1                     10                    16
  Carrier spacing                -                     2 MHz                 5 MHz
  Gross bit rate                 20 kbps               40 kbps               250 kbps
  Bit modulation                 BPSK                  BPSK                  16-ary orthogonal
  Symbol rate                    20 ksymbols/s         40 ksymbols/s         62.5 ksymbols/s
  Spreading                      15-chip M-sequence    15-chip M-sequence    32-chip PN code
  Chip modulation                BPSK                  O-QPSK                O-QPSK
  Chip rate                      300 kchips/s          600 kchips/s          2 Mchips/s
FIGURE 23.6 Optional ZigBee superframe.
To encourage deployment, ZigBee is also placed in unlicensed frequency bands. Like Bluetooth and the 802.11b and g systems, ZigBee can be used almost globally in the 2.4-GHz band. Additionally, ZigBee has been specified for the ISM bands at 868 MHz in Europe and 915 MHz in North America. To comply with the respective sharing rules and to allow simple analogue circuitry, ZigBee uses DSSS. Table 23.4 gives an overview of the respective physical layer parameters. Note that the maximum user data rate is about 128 kbps. A ZigBee network can have a star or a peer-to-peer topology. Each network needs a PAN coordinator unit, which can handle up to 255 devices in the case of a 16-bit address pad, and even more in the case of 64-bit addressing. Of the two types of ZigBee devices, full function device (FFD) and reduced function device (RFD), only FFD can function as a network coordinator. Direct communication between two RFDs is not possible. RFD packets have to be passed to an FFD first. For medium access, there are in principle three possibilities: CSMA/CA without beacons, CSMA/CA in the contention period of a beacon system, or a guaranteed data rate in the contention-free period of a beacon system. In the case of a beacon system, there are so-called superframes that can be between 15.36 ms and 251.65 s long. If desired, this period is divided between a contention and a contention-free period of limited length (Figure 23.6). The beacons are used for synchronization, for identification, and to describe the structure of the superframe. For security, ZigBee provides authentication, encryption, and integrity services. The developer can choose between no security, an access control list, and a 32- to 128-bit advanced encryption standard (AES) with authentication.
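The superframe limits quoted above (15.36 ms to 251.65 s) can be reproduced from the beacon interval formula of IEEE 802.15.4. The sketch below is only an illustration and assumes the 2.4-GHz PHY figures (62.5 ksymbols/s, i.e., a 16-µs symbol period) and a base superframe length of 960 symbols, with the beacon order ranging from 0 to 14:

```python
# Beacon interval of an IEEE 802.15.4/ZigBee superframe (illustrative sketch).
# Assumes the 2.4-GHz PHY: 62.5 ksymbols/s -> 16 us per symbol, base superframe = 960 symbols.

SYMBOL_PERIOD = 16e-6          # seconds per symbol
BASE_SUPERFRAME_SYMBOLS = 960  # assumed base superframe duration in symbols

def beacon_interval(beacon_order):
    """Superframe/beacon interval in seconds for a given beacon order (0..14)."""
    return BASE_SUPERFRAME_SYMBOLS * SYMBOL_PERIOD * 2 ** beacon_order

print(beacon_interval(0))    # 0.01536 s = 15.36 ms (shortest superframe)
print(beacon_interval(14))   # ~251.66 s, i.e., the 251.65 s quoted above (longest superframe)
```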
23.5.2 Performance Because of its newness, not much information is yet available on the performance of ZigBee. When comparing the system design choices made for ZigBee with those made for Bluetooth, it can be expected that Bluetooth will have superior interference behavior over ZigBee because of Bluetooth’s spectral selectivity and purity [27]. Furthermore, the frequency-hopping approach of Bluetooth will be responsible for a different range performance. While for Bluetooth there is — with increasing range — likely to be a smooth degradation in throughput, a loss of connection at a certain distance is more probable for ZigBee. Nevertheless, ZigBee has been developed (and should thus be the best choice) for applications
in which the network is static and has many devices of infrequent use that only want to transmit small data packets.
23.6 Coexistence of WPAN and WLAN (Bluetooth and IEEE 802.11b) It would be most desirable for every mobile unit to connect effortlessly using whatever technology is most suitable at the time. Multiple wireless technologies will in the near future coexist: within an enterprise, for example, WLAN technologies could be used for flexible access to large corporate databases, while WPAN technologies could handle specific tasks (and cellular systems the voice communication). This is generally not a problem unless the technologies used are placed in the same frequency band or are linked with each other by the application (e.g., someone uses a Bluetooth headset with the mobile telephone). The two technologies that are placed in the same frequency band* and that are, due to their popularity, quite likely to be used in such a scenario (and possibly even in the same device) are Bluetooth and IEEE 802.11b/g.† Numerous publications cover the mutual interference and performance impairments of the systems (e.g., [15, 22, 25, 28, 50, 51, 63]). Depending on the investigated scenarios, the assessment of the situation varies from "good reliability even in fairly dense environments" to "the effects of interference can be quite severe." There is, however, broad agreement on the causes that determine the systems' performances: link distances (BT–BT, 802.11b–802.11b, BT transmitter–802.11b receiver, BT receiver–802.11b transmitter), traffic load, Bluetooth packet type, density of units, and local propagation conditions. Some of the results are briefly summarized below. IEEE 802.11b requires a CIR of about 10 dB to cope with a (narrowband) Bluetooth hop in its (wide) passband. IEEE 802.11b has the disadvantage that its backoff procedure was designed to optimize the IEEE 802.11b WLAN performance but not to handle external interferers: each loss of a packet due to a collision with Bluetooth will increase the backoff window size by a factor of 2 (causing an unnecessary throughput reduction). Furthermore, the protocols overlaying WLAN often incorporate TCP, which includes the risk that packet losses on the air link are mistaken for network congestion, which then might initiate a slow start. In contrast, the main disadvantage of Bluetooth is that its transmit power is 20 dB below that of IEEE 802.11b. Another disadvantage is that the BT reverse link packet, which contains the acknowledgment, is transmitted on a different frequency than the forward link packet. This increases the packet loss probability in case of (frequency-static) IEEE 802.11b interference. The packet loss rate for Bluetooth, PLR_BT, is then

PLR_{BT} = PLR_{forward} + \underbrace{(1 - PLR_{forward}) \cdot PLR_{header}}_{PLR_{reverse}}    (23.8)
It would otherwise — were the forward and reverse links to use the same hop frequency — be PLR_BT ≈ PLR_forward (depending on the IEEE 802.11b system load).

There are in principle three different approaches to assist otherwise interfering systems in coexisting: separation in time, separation in frequency, and separation in space:‡

• Separation in the frequency domain:
1. IEEE 802.11b can be used with an improved transmit filter that reduces the interference power on the side lobe frequencies and thus enhances the separation on those carriers.

*Next to other technologies like HomeRF networks [29], RF-ID systems [21], microwave ovens, etc.
†Investigations on the coexistence between Bluetooth, IEEE 802.11b, IEEE 802.11g, and ZigBee are not yet available.
‡The approaches of separation through code (keyword CDMA), through the channel (keyword MIMO), or through the modulation (I vs. Q) allow the unlinking of several users of the same system. It is not obvious, though, how to apply any of these latter methods to improve the coexistence of different systems, which is what is needed here.
2. Bluetooth can perform adaptive frequency hopping (AFH), i.e., exclude the most heavily interfered frequencies from its hop sequence. Note that in many realistic situations, where the IEEE 802.11b and Bluetooth units are not in the same device, AFH is sufficient to combat the interference effects. AFH has thus become part of Bluetooth Specification 1.2 [10] (a simplified sketch of the idea follows this list).

• Separation in the time domain:
1. The IEEE 802.11b carrier-sensing algorithm could consider Bluetooth signals,* and Bluetooth could be extended with a carrier-sensing algorithm [60]. The principal problem of carrier sensing, though, is that to be really effective, the transmitter has to sense the situation at the receiver correctly; i.e., the correlation between transmitter and receiver has to be high. In an uncoordinated WLAN–WPAN scenario, the hidden- and exposed-terminal problems are likely to cancel out any advantage there might be.
2. A joint scheduler can allot alternating transmit time-shares to both systems in a (to be specified) fair way. This, of course, only works (and, with AFH available, is only necessary) when both systems are in one device, or even on one chip.

• Separation in space:
1. Should IEEE 802.11b and Bluetooth coexist in the same unit, intelligent antenna design and placement can optimize the isolation between the Bluetooth and IEEE 802.11b antennas. This will not prevent collisions, but will minimize their impact by maximizing the CIRs for the two systems.†
2. Antenna diversity can help each of the two technologies to individually improve its performance.
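To make the adaptive frequency hopping idea from the list above concrete, the following simplified sketch removes channels recently classified as "bad" from the hop set. It is only an illustration of the principle; the actual channel classification and hop sequence generation of Bluetooth 1.2 are more elaborate, and the minimum of 20 remaining channels as well as the WLAN channel mapping used here are assumptions for the example.

```python
import random

BT_CHANNELS = list(range(79))   # Bluetooth hops over 79 channels of 1 MHz in the 2.4-GHz band
N_MIN = 20                      # assumed minimum number of channels kept in the adaptive hop set

def adaptive_hop_set(bad_channels):
    """Return the hop set with heavily interfered channels removed (simplified AFH)."""
    good = [ch for ch in BT_CHANNELS if ch not in bad_channels]
    # If too many channels are marked bad, some of them have to be kept anyway.
    if len(good) < N_MIN:
        good += random.sample(sorted(bad_channels), N_MIN - len(good))
    return sorted(good)

# Example: an IEEE 802.11b network on channel 6 occupies roughly 22 MHz around 2437 MHz,
# i.e., approximately Bluetooth channels 24 to 46 (Bluetooth channel k sits at 2402 + k MHz).
wlan_blocked = set(range(24, 47))
print(len(adaptive_hop_set(wlan_blocked)))  # 56 channels remain in the hop set
```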
23.7 Summary and Conclusions When deciding on a wireless technology to use, you must first clarify the characteristics of the foreseen application: Do you want to move with or within the network? Do you have mobility within a small range or in a larger area? At what speed do the units move? Do you need access to large databases or just locally? Is battery life a critical issue? What maximum distance should the wireless link cover? What distance does it cover on average? As a next step, the existing technologies can be reviewed for their applicability. Bluetooth is a quite power-efficient WPAN technology. Like all wireless systems, it cannot provide hard throughput guarantees. Bluetooth is nevertheless quite robust for best-effort traffic in coexistence environments. One hundred Bluetooth piconets can transmit at an average data rate of 95% · 33.2 kbps in an area of 10 × 20 m². Fifty fully loaded piconets can transmit at an average, unidirectional transmission rate of 360 kbps. Similar results are likely to be achieved on larger factory floors (provided the piconet density is comparable), as the disadvantage of a larger number of units can be outweighed by the more structured and predictable unit location. Of the discussed technologies, Bluetooth is the only one that supports voice in addition to data transmission. IEEE 802.11 is a WLAN technology that enables higher data rates but is not particularly power efficient. Additionally, in a relatively dense network scenario the maximum aggregate throughput of 3 times 7.11/30.8 Mbps is likely to be seriously impaired. To aid the WLAN performance on a factory floor, it is thus advisable to take the following two measures: apply means to combat the increased delay spread (in case of IEEE 802.11b) and (for all IEEE 802.11 systems) carefully plan the frequency layout and access point placement. *Power control (when applied by the interfering unit) of course helps to minimize the interference power. Nevertheless, even though it can be recommended to implement power control (not least to save power [52]), in real-life situations it cannot be relied upon that the interfering unit can indeed live with less power. †As has been mentioned before, the IEEE 802.11b specification already provides for the possibility that IEEE 802.11b senses and does not transmit if systems other than IEEE 802.11b are active [34, Section 18.4.8.4]. Apart from the fact that most implementations do not seem to support this option, the principal problem is that IEEE 802.11b will refrain from transmission when it senses that Bluetooth transmits. To improve coexistence, though, IEEE 802.11b should refrain from transmission when nearby Bluetooth units receive.
The newly emerging ZigBee technology has been designed specifically for sensor data and control information at low data rates. ZigBee supports long battery lives. Little can be said yet about the robustness and effectiveness of ZigBee in the field. Still, for low-bit-rate applications in industrial environments, ZigBee seems to be a promising approach. The most efficient measure to aid the coexistence of Bluetooth and IEEE 802.11b is to use Bluetooth with adaptive frequency hopping. If this is not sufficient, hardware-related improvements and a common scheduler have to be added.
References [1] 3rd Generation Partnership Project, Technical Specification Group Access Network. Opportunity Driven Multiple Access, 3G TR 25.924, version 1.0.0. December 1999. [2] A. Hamid Aghvami. Resource Allocation in Hierarchical Cellular Systems. Artech House Publishers, Boston, 2000. [3] Venkat Bahl. ZigBee and Bluetooth: Competitive or Complementary? ZigBee Alliance, September 2002. [4] Daniel Beaumont. Citywide 802.11b networks gain momentum. Planet Wireless, July 2002. [5] Daniel Beaumont. More Bluetooth products but key profiles delayed. Planet Wireless, July 2002. [6] Giuseppe Bianchi. Performance analysis of the IEEE 802.11 distributed coordination function. IEEE Journal on Selected Areas in Communications, 18:535–547, 2000. [7] Chatschik Bisdikian. An overview of the Bluetooth wireless technology. IEEE Communications Magazine, 39:86–94, 2001. [8] http://www.bluetooth.com. [9] Bluetooth Special Interest Group. Specification of the Bluetooth System, Version 1.1. December 1999. [10] Bluetooth Special Interest Group. Bluetooth 1.2 Core Specification. November 2003. [11] Jennifer Bray and Charles F. Sturman. Bluetooth: Connect without Cables. Prentice Hall, Eaglewood Cliffs, NJ, 2000. [12] Ed Callaway. Low Power Consumption Features of the IEEE 802.15.4/ZigBee LR-WPAN Standard. Presentation slides, November 2003. [13] Martin V. Clark, Kin K. Leung, Bruce McNair, and Zoran Kostic. Outdoor IEEE 802.11 cellular networks: radio link performance. In Proceedings of the IEEE International Conference on Communication (ICC), May 2002. [14] Cambridge Consultants. Unleashing Renenue with ZigBee. Presentation slides, 2003. [15] Greg Ennis. Impact of Bluetooth on 802.11 Direct Sequence, Technical Report IEEE 802.11-98/319. IEEE, September 1998. Available at http://grouper.ieee.org/groups/802/Regulatory/Regulatory%20 Filings/831947a-Impact-of-Bluetooth-on-80211.pdf. [16] European Telecommunications Standards Institute (ETSI). ETSI EN 300 328-1, V1.2.2. July 2000. [17] European Telecommunications Standards Institute (ETSI). HIPERLAN Type 2: Data Link Control Specification: Part 1: Basic Data Transport Functions. November 2000. [18] European Telecommunications Standards Institute (ETSI). HIPERLAN Type 2: Data Link Control Specification: Part 2: Radio Link Control (RLC) Sub-Layer. April 2001. [19] European Telecommunications Standards Institute (ETSI). HIPERLAN Type 2: Physical (PHY) Layer. February 2001. [20] Federal Communications Comission (FCC). Code of Federal Regulations. 2002. [21] Klaus Finkenzeller. RFID-Handbuch. Hanser, Munich, 2000. [22] David Fumolari. Link performance of an embedded Bluetooth personal area network. In Proceedings of the IEEE International Conference on Communication (ICC), Helsinki, June 2001. [23] Jeremy Green, Rob Gear, and Nick Harman. Bluetooth: users, applications and technologies. Ovum Report, June 2001. [24] Ajay Chandra V. Gummalla and John O. Limb. Wireless medium access control protocols. IEEE Communications Surveys and Tutorials, 3: 2000. Available at http://www.comsoc.org/pubs/surveys.
[25] N. Golmie, R.E. van Dyck, and A. Soltanian. Bluetooth and 802.11b Interference: Simulation Model and System Results, Technical Report IEEE802.15-01/195R0. IEEE, April 2001. Available at http://grouper.ieee.org/groups/802/15/pub/2001/May01/01195r0P802-15_TG2-BT802-11-Model-Results.pdf. [26] Jaap Haartsen. Bluetooth: the universal radio interface for ad hoc, wireless connectivity. Ericsson Review, 3:110–117, 1998. [27] Jaap Haartsen. ZigBee OR Bluetooth or ZigBee AND Bluetooth. Presentation slides, June 2003. [28] Ivan Howitt. IEEE 802.11 and Bluetooth Coexistence Analysis Methodology. In Proceedings of the IEEE Vehicular Technology Conference (VTC), Rhodes, May 2001. [29] http://www.homerf.org. [30] Jörg Huschke and Gerd Zimmermann. Impact of decentralized adaptive frequency allocation on the system performance of HIPERLAN/2. In Proceedings of the IEEE Vehicular Technology Conference (VTC), Tokyo, May 2000. [31] http://www.ietf.org/html.charters/manet-charter.html. [32] http://grouper.ieee.org/groups/802/11/index.html. [33] Institute of Electrical and Electronic Engineering. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, ANSI/IEEE 802.11. September 1999. [34] Institute of Electrical and Electronic Engineering. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher-Speed Physical Layer Extension in the 2.4 GHz Band, IEEE 802.11b-1999. September 1999. [35] Institute of Electrical and Electronic Engineering. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher-Speed Physical Layer in the 5 GHz Band, IEEE 802.11a. September 1999. [36] IEEE-SA Standards Board Bylaws. September 2002. Approved by Standards Association Board of Governors. [37] Institute of Electrical and Electronic Engineering. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, Amendment 4: Further Higher Data Rate Extension in the 2.4 GHz Band, ANSI/IEEE 802.11. June 2003. [38] L. Ji, M. Ishibashi, and M.S. Corson. An approach to mobile ad hoc network protocol kernel design. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, September 1999. [39] Patrick Kinney. ZigBee technology: wireless control that simply works. Paper presented at the Communication Design Conference, October 2003. [40] Jamshid Khun-Jush, Peter Schramm, Udo Wachsmann, and Fabian Wenger. Structure and performance of the HIPERLAN/2 physical layer. In Proceedings of the IEEE Vehicular Technology Conference (VTC), pp. 2667–2671, Amsterdam, Fall 1999. [41] Arndt Kadelka, Erkan Yidirim, and Bernhard Wegmann. Serving IP mobility with HIPERLAN/2. In Proceedings of the European Mobile Communications Conference (EPMCC), Vienna, February 2001. [42] Magnus Lindgren. Physical Layer Simulations of the IEEE 802.11b Wireless LAN-Standard. Master’s thesis, Lulea Technical University, Sweden, 2001. [43] Hui Li, Jan Lindskog, Göran Malmgren, Gyorgy Miklos, Fredrik Nilsson, and Gunnar Rydnell. Automatic repeat request (ARQ) mechanism in HIPERLAN/2. In Proceedings of the IEEE Vehicular Technology Conference (VTC), Tokyo, May 2000. [44] Kin K. Leung, Bruce McNair, Leonard J. Cimini, and Jack H. Winters. Outdoor IEEE 802.11 cellular networks: MAC protocol design and performance. In Proceedings of the IEEE International Conference on Communication (ICC), New York, April–May 2002. [45] Hui Li, Göran Malmgren, Mathias Pauli, Jürgen Rapp, and Gerd Zimmermann. 
Performance of the radio link protocol of HIPERLAN/2. In Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communication (PIMRC), London, 2000.
[46] David A. Maltz. On-Demand Routing in Multi-Hop Wireless Mobile Ad Hoc Networks. Ph.D. thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 2001. Available at http://reports-archive.adm.cs.cmu.edu/anon/2001/CMU-CS-01-130.pdf. [47] Kirsten Matheus. Wireless local area networks and wireless personal area networks (WLANs and WPANs). In Richard Zurawski, editor, Industrial Information Technology Handbook. CRC Press, Boca Raton, FL, 2004, chap. 38. [48] Ministry of Telecommunications (MKK), Japan. RCR STD-33A. 2002. [49] Göran Malmgren, Jamshid Khun-Jush, Peter Schramm, and Johan Torsner. 6:3 HIPERLAN type 2: an emerging world wide WLAN standard. In Proceedings of the International Symposium on Services and Local Access, Stockholm, June 2000. [50] Mobilian. Wi-Fi TM (802.11b) and Bluetooth TM : An Examination of Coexistence Approaches, 2001. Available at http://www.mobilian.com. [51] Kirsten Matheus and Stefan Zürbes. Co-existence of Bluetooth and IEEE 802.11b WLANs: results from a radio network testbed. In Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communication (PIMRC), Lissabon, Portugal, September 2002. [52] Kirsten Matheus, Stefan Zürbes, Rakesh Taori, and Sverker Magnusson. Fundamental properties of ad-hoc networks like Bluetooth: a radio network perspective. In Proceedings of the IEEE Vehicular Technology Conference (VTC), Orlando, FL, September 2003. [53] Bob O’Hara and Al Petrick. IEEE 802.11 Handbook: A Designer’s Companion. Standards Information Network, IEEE Press, New York, 1999. [54] T.S. Rouse, I.W. Band, and S. McLaughlin. Capacity and power analysis of opportunity driven multiple access (ODMA) networks in CDMA systems. In Proceedings of the IEEE International Conference on Communication (ICC), pp. 3202–3206, 2002. [55] Shreyas Sadalgi. A Performance Analysis of the Basic Access IEEE 802.11 Wireless LAN MAC Protocol (CSMA/CA), May 2000. Available at http://paul.rutgers.edu/~sadalgi/network.pdf. [56] André Stranne, Fredrik Florén, Ove Edfors, and Bengt-Arne Molin. Throughput of IEEE 802.11 FHSS networks in the presence of strongly interfering Bluetooth networks. In Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio Commununication (PIMRC), Lissabon, Portugal, September 2002. [57] Johan Torsner and Göran Malmgren. Radio Network Solutions for HIPERLAN/2. In Proceedings of the IEEE Vehicular Technology Conference (VTC), pp. 1217–1221, Houston, Spring 1999. [58] Richard van Nee, Geert Awater, Masahiro Morikura, Hitoshi Takanashi, Mark Webster, and Karen W. Halford. New high-rate wireless LAN standards. IEEE Communications Magazine, 37:82–88, 1999. [59] U.S. city on the verge of being covered completely with wireless Internet. Available at www.web.de, message of February 12, 2003. [60] Bin Zhen, Yongsuk Kim, and Kyunghun Jang. The analysis of coexistence mechanisms of Bluetooth. In Proceedings of IEEE Vehicular Technology Conference (VTC), Spring 2000. [61] Stefan Zürbes, Wolfgang Stahl, Kirsten Matheus, and Jaap Haartsen. Radio network performance of Bluetooth. In Proceedings of the IEEE International Conference on Communication (ICC), New Orleans, June 2000. [62] Stefan Zürbes. Considerations on link and system throughput of Bluetooth networks. In Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communication (PIMRC), pp. 1315–1319, London, September 2000. [63] Jim Zyren. 
Extension of Bluetooth and 802.11 Direct Sequence Interference Model, Technical Report IEEE 802.11-98/378. IEEE, November 1998. Available at http://www.ieee802.org/15/pub/ SG.html.
III Linking Factory Floor with the Internet and Wireless Fieldbuses
24 Linking Factory Floor and the Internet (Thilo Sauter)
25 Extending EIA-709 Control Networks across IP Channels (Dietmar Loy and Stefan Soucek)
26 Interconnection of Wireline and Wireless Fieldbuses (Jean-Dominique Decotignie)
24 Linking Factory Floor and the Internet
Thilo Sauter, Austrian Academy of Sciences

24.1 Introduction and Historical Background
24.2 Interconnection Approaches: Protocol Tunneling • Gateways
24.3 Control Data Tunneling: Network Parameters for Control Data Tunneling • Performance and Application Classes • Services • Gateway Structure • Data Representation
24.4 Gateway Access from the Internet: SNMP • LDAP • Web-Based Approaches
24.5 The Role of Industrial Ethernet
24.6 Summary
References
24.1 Introduction and Historical Background If one believes contemporary advertisements and marketing articles in various automation domains, the interconnection of fieldbus systems and the Internet is the topic in automation. More specifically, but less appropriate for marketing purposes, we are talking about a linkage between arbitrary automation systems — which may also involve dedicated automation networks — and networks based on the Internet Protocol (IP). For the sake of simplicity, we will stick with the familiar terms fieldbus and Internet; however, keep in mind that the fieldbus is not a must and that the Internet can also be an intranet or any other network governed by IP and the protocol suite on top of it. What is being advertised as the big benefit of such an interconnection essentially boils down to two major, but nonetheless interwoven aspects: remote access to automation systems and the promise of easy integration of automation data in a user-friendly environment. In other words, there are two reasons why a fieldbus–Internet connection can be beneficial: • To extend the physical dimensions of an automation system. Mostly for historical reasons, the extension of a typical automation network is rather limited due to length restrictions in fieldbus segments and the lack of routing capabilities. If an Internet infrastructure is available, it can be used as a kind of backbone to connect distant segments of the installation. • To provide vertical integration. In the automation domain, this widely used term essentially means bringing information from an automation system into a framework used in the office domain, where it can be used not only for data acquisition, but also for strategic operations such as system management or resource planning. Especially the second aspect is currently the focus of substantial marketing efforts connected to the growing use of Ethernet in industrial automation. However, it must be stated that the idea of vertical integration is nothing entirely new. The roots date back to the 1980s when the computer-integrated
FIGURE 24.1 CIM pyramid and network types used in the various automation levels.
manufacturing (CIM) concept was developed. The hierarchical model with typically five levels (Figure 24.1) was an early attempt to structure the information flow within a company. There are many ways to draw this pyramid, and the names of the individual levels differ according to the application area. But the ultimate goal has always been the same: to provide transparent data exchange between the various levels. The first attempts to implement the CIM concept were more or less futile, for a number of mostly technological reasons. On the one hand, the protocols designed for the communication inside the automation system — essentially the Manufacturing Automation Protocol (MAP) as a full-fledged implementation of the International Organization for Standardization (ISO)/Open Systems Interconnection (OSI) model — were too complex. On the other hand, the progress of microelectronics as a technological backbone had not yet been far enough to provide sufficient computing resources at a reasonable price. Therefore, integration of CIM on the field level turned out to be nearly impossible. In addition, fieldbus systems as low-level automation networks were only at a very early stage of their evolution; hence there was, from a networking point of view, a missing link between the actual source of the process data and the already existing company networks. Figure 24.2 shows a few milestones in the evolution of both IT and automation networks and demonstrates how the situation changed over the years [1]. Partly driven by the insight that MAP was too clumsy to use (but still following its basic ideas) and partly driven by application-specific needs, fieldbus systems as dedicated automation networks emerged. They filled the networking gap at the lowest levels of the automation pyramid, providing an anchor point for subsequent integration efforts. The great leap forward for integration itself came with the success of the Internet and, more precisely, with the invention of the World Wide Web (WWW). While in the fieldbus domain, a large variety of approaches still exist even after a lengthy and cumbersome standardization process [2, 3]; the office world is dominated by the Internet Protocol suite and a small number of well-known, widely used applications. The reason why the old idea of vertical integration was revived in recent years was merely a psychological one. As the timeline in Figure 24.2 shows, the Internet and its basic technologies have been available Computer Networks
FIGURE 24.2 Evolution of networks in automation and information technology.
(and used) for a long time. However, it took the sheer simplicity of the Web browser as a ubiquitous tool to boost the acceptance of the Internet and to secure the dominant role of the IP suite. From the user’s point of view, a Web browser allows access to distant data in a nearly trivial manner. Hence, it is no wonder that the easy navigating through hypertext documents was soon adopted as a model for the remote access to automation data. Consequently, many solutions of fieldbus–Internet connectivity rely on WWW technology and a Web browser interface. Still, the impression that the Internet naturally entails a uniform way of remote access to automation networks is deceptive. The user interface is only one aspect; the underlying mechanisms and data structures are another. In fact, when it comes to implementing an interconnection between IP-based networks and an automation system fieldbus, there is a surprising variety of possibilities even in the so much “standardized” Internet environment. In particular, the Web-based approach is by no means the only one.
24.2 Interconnection Approaches From an architectural point of view, there are two main possibilities to achieve a fieldbus–Internet interconnection, both of which are being used in practice: • Tunneling of one protocol over the other • Providing protocol and service translation via a gateway As far as the network topology is concerned, both approaches are very similar. In both cases, a central access point between the networks is either required or at least reasonable. What differs is the way the information is processed by this link or, more accurately, how the processing is distributed between the devices involved in the communication.
24.2.1 Protocol Tunneling In communication networks, tunneling essentially means that data frames of one protocol are simply wrapped into the payload data of another protocol without any modification or translation. The tunneling approach in the automation context falls into two categories. The currently more common solution is to encapsulate IP packets in fieldbus messages, which also opens a channel for the upper-layer protocols required for, e.g., direct Web access to the field devices [4]. While in most cases not foreseen in the beginning, this possibility has recently been included in several fieldbus systems — a move that has to be seen mainly in connection with the discussion about Ethernet in automation. At first glance, IP tunneling does provide an easy way to achieve vertical integration, as Figure 24.3 shows. The field devices run an IP-based service such as a Web server providing data for a respective application in the Internet, which can directly access the device or process data. A second look, however, reveals that the integration is not all that easy. There are a few critical points to be considered to properly design such a system: Computing resources at the field device: First, it is evident that the field device as the endpoint of a tunnel must be able to run the complete IP stack with all additional protocols required for the particular application. In addition, memory space is required for the actual application program handling the data that are to be accessed via IP. With today’s embedded systems, all this may not be overly problematic, as there are very lightweight protocol implementations available (typically at the expense of detailed error handling). At any rate, this often requires additional hardware at the device and might be a cost factor, especially for simple devices like sensors or actuators. Traffic handling at the access point: The node that connects the IP-based network and the fieldbus has to encapsulate the IP packets and forward them to the appropriate fieldbus node, where they are unwrapped and handed over to the IP stack. To this end, the access point has to act as an IP router. More than that, it has to map the IP connections to the respective fieldbus communication channels. This requires an address translation on the fieldbus side of the access point. On the
FIGURE 24.3 Structure of IP tunneling over a fieldbus.
Internet side of the node, an additional translation can become necessary if the IP addresses in the fieldbus are not public ones, but taken from a private address range, which is the usual network configuration. The access point thus acts as a simple masquerading firewall with network address translation (NAT). Unlike the case where the IP addresses in the fieldbus are public, the fieldbus as a whole is accessible only via one single IP address, and the distinction between the individual devices (e.g., Web servers) hidden behind the firewall can only be done via port forwarding. This means that the services are not reachable through their well-known standard ports, but via dedicated ones that have, of course, to be configured appropriately at the client side in the Internet. Performance issues: IP packets tend to be rather large, whereas fieldbus data frames are usually optimized for the transmission of only small pieces of data at a time. Squeezing IP packets in the small payload data fields of a fieldbus message therefore requires segmentation, which in turn increases the transmission time. An extreme example is Interbus, where IP traffic can be transported in the parameter channel [5]. This channel has only a small capacity of, say, eight bytes per cycle in order not to interfere with the real-time process data, and one byte is needed for control purposes. Depending on the size of the network and the selected baud rate, one cycle may take a few milliseconds. As an IP packet usually is of the size of several hundred bytes, it is clear that the performance is rather limited. In other fieldbus systems, the data frames are large enough to avoid excessive segmentation. At any rate, the efficiency of IP over fieldbus tunneling depends strongly on the fieldbus. The alternative way of tunneling works the other way around. Fieldbus messages are wrapped into IP packets (or those of other protocols such as Transmission Control Protocol (TCP), depending on how the tunnel is set up) and sent to a distant network node (Figure 24.4). From the viewpoint of vertical integration, the disadvantage of this approach is immediately visible. The fieldbus data must be interpreted at the client side, which requires a fieldbus-specific application or at least an experienced user with detailed knowledge about the particular automation system. This approach does not fulfill the requirements of user-friendliness posed by the idea of vertical integration. In fact, fieldbus tunnels over the Internet are rather used to connect remote segments of an installation, as mentioned in the introduction. A wide field of applications for this concept are networkbased control systems, for instance, in building automation [6]. It has to be noted, though, that the interconnection of two fieldbus segments over an Internet tunnel is not fully transparent, but has to cope with timing problems introduced by the processing of the data packets and the transmission delays.
FIGURE 24.4 Structure of a fieldbus tunnel over the Internet.
Therefore, this concept is useful only if the fieldbus protocol provides some routing functionality, where timing requirements can be relaxed.
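The segmentation penalty described above for tunneling IP over a cyclic fieldbus can be estimated with a few lines of code. The sketch below uses the Interbus example from the text (eight bytes of parameter channel per cycle, one of which is needed for control) and treats the cycle time as a free parameter; both figures are merely the illustrative values quoted above, not a statement about a particular installation.

```python
import math

def tunnel_transfer_time(ip_packet_bytes, payload_per_cycle=7, cycle_time_ms=4.0):
    """Estimate how long an IP packet needs through a cyclic fieldbus parameter channel.

    payload_per_cycle: usable tunnel bytes per fieldbus cycle (8-byte channel minus 1 control byte).
    cycle_time_ms:     assumed fieldbus cycle time in milliseconds.
    Returns (number of cycles, total transfer time in ms).
    """
    cycles = math.ceil(ip_packet_bytes / payload_per_cycle)
    return cycles, cycles * cycle_time_ms

print(tunnel_transfer_time(576))    # (83, 332.0)  -> roughly a third of a second for a 576-byte packet
print(tunnel_transfer_time(1500))   # (215, 860.0) -> nearly a second for a full Ethernet frame
```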
24.2.2 Gateways The alternative approach to achieve fieldbus–Internet connectivity is to design the central access point as a gateway. Like with the tunneling approach, the gateway is a full member of the fieldbus on the one side and can be accessed through IP-based mechanisms via the Internet (Figure 24.5). What is different, though, is the way the information exchange is handled. In the IP tunneling approach, the client connects directly to the server running on the field device. The access point just forwards the IP traffic, but is not further involved in the data transfer. In the gateway approach, the access point takes the role of a proxy representing the fieldbus and its data to the outside world. It fetches the data from the field devices using the usual fieldbus communication methods and is the communication partner addressed by the client. Unlike the tunneling approach, where each field device has to provide and maintain its data on an individual basis, information processing for the client in the Internet is centralized here. This enables the gateway to set up a comprehensive and consistent view of the automation system, which is undoubtedly a benefit. In addition, the field devices need no special equipment to process IP-based data and services. Client
FIGURE 24.5 Network topology for a gateway-based interconnection.
FIGURE 24.6 EIA-852 system configuration.
For the sake of completeness, it should be noted that a gateway would in principle also allow for the reversal of the communication direction. Therefore, field devices could also access resources on the Internet, e.g., different fieldbus systems connected to the local area network (LAN) via compatible gateways. Although there have been efforts to foresee this possibility by providing harmonized interfaces (as in the NOAH project [7]), it is irrelevant in practice.
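As an illustration of the proxy role described in this subsection, the sketch below shows a gateway that periodically polls values from the fieldbus side and serves the cached, consistent snapshot to Internet clients over HTTP. Everything here is hypothetical: the fieldbus read function, the variable names, the port, and the JSON representation are placeholders chosen only to show how the processing is centralized in the access point.

```python
# Hypothetical gateway proxy: poll fieldbus data, cache it, expose it via HTTP (illustrative only).
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

cache = {}  # consistent snapshot of the fieldbus data presented to the Internet side

def read_fieldbus_variable(name):
    """Placeholder for the fieldbus-specific read service (e.g., a cyclic or acyclic data request)."""
    return 42.0  # dummy value

def poll_fieldbus(variables, period_s=1.0):
    """Cyclically refresh the cache using the normal fieldbus communication methods."""
    while True:
        cache.update({name: read_fieldbus_variable(name) for name in variables})
        time.sleep(period_s)

class GatewayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(cache).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    threading.Thread(target=poll_fieldbus, args=(["temperature", "valve_position"],), daemon=True).start()
    HTTPServer(("", 8080), GatewayHandler).serve_forever()
```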
24.3 Control Data Tunneling A special case of protocol tunneling is the interconnection of two remote fieldbus segments over an IPbased backbone. As stated before, this has become popular in recent years in control engineering [8, 9]. In this type of application, the performance of the backbone is essential. Specifically, the way the IP channel connecting the control nodes or networks is set up has a direct influence on the system behavior, and modeling of the network is not a simple task [10]. To implement the tunnel and exchange data between the endpoints requires a special protocol. Apart from proprietary solutions and those designed in the context of Industrial Ethernet, one standard that has been designed for this purpose and enjoys a growing user base is EIA-852 [11]. It describes the system components, the communication protocol, and management requirements to establish an IP channel with control devices and routers. A typical EIA-852 topology is depicted in Figure 24.6. The devices in an EIA-852 system can be either purely IP based or regular control network devices connected to the IP network through control network/IP (CN/IP) tunneling routers. EIA-852 control devices communicate directly in a peer-to-peer fashion using a native control network protocol without the need of gateways. The native protocol data units are encapsulated in User Datagram Protocol (UDP) frames and routed to the respective recipients on the IP channel. The choice of UDP transport over TCP has several advantages: First, because of the connectionless nature of control network communication, it would incur unnecessary overhead to manage connection setup in TCP. Second, TCP ensures delivery through retransmissions. Because of the (soft) real-time requirements of control applications, however, retransmissions will most likely miss the deadline, which makes them useless. Third, control network protocols typically implement their own retransmission scheme where necessary. So, on top of the actual IP-based network, this tunneling protocol is a viable solution.
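The encapsulation principle described for EIA-852 can be illustrated with a few lines of socket code: the native control network frame is simply placed into the payload of a UDP datagram addressed to the peer router. The port number, the peer address, and the frame contents below are placeholders, and the real EIA-852 packet carries additional tunneling and management information that is omitted here; this is a sketch of the idea, not of the standardized packet format.

```python
# Simplified illustration of control-data tunneling over UDP (not the actual EIA-852 frame format).
import socket

TUNNEL_PORT = 1628                           # placeholder port number for the IP channel
PEER_ROUTER = ("192.0.2.10", TUNNEL_PORT)    # documentation address standing in for the remote CN/IP router

def send_cn_frame(cn_frame: bytes):
    """Wrap a native control network frame into a UDP datagram and send it to the peer router."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(cn_frame, PEER_ROUTER)   # UDP: connectionless, no retransmission by the transport

def receive_cn_frames():
    """Unwrap incoming datagrams; the payload would be forwarded to the local control network segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", TUNNEL_PORT))
        while True:
            frame, sender = sock.recvfrom(2048)
            yield frame

# Example usage: send_cn_frame(b"\x01\x02\x03")  # a dummy three-byte control network PDU
```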
24.3.1 Network Parameters for Control Data Tunneling On the lower layers of the tunnel, i.e., regarding the IP-based network, quality-of-service (QoS) parameters can be used to describe requirements imposed on the network behavior. The origin of QoS definitions was the usage of the Internet to transport multimedia services, specifically the difficulty of using a packet-switched, statistical multiplexing network to transmit stream-based data that normally require directly switched connections. In automation systems, and especially in network-based control, the situation is comparable: formerly direct connections are being replaced by packet-oriented networks that influence the quality of control (QoC) of the implemented application. QoS parameters are a reasonable means to quantify this influence. In principle, QoS refers to the capability to provide resource assurance and service differentiation in a network [12]. In a narrower sense, timeliness and reliability properties
are often referred to as network QoS. Some typical QoS parameters relevant for the interconnection of distributed control systems by IP-based tunnels are discussed below. 24.3.1.1 End-to-End Delay This is an absolute measure for the total delay packets suffer from the emission by the sender until the reception over the network. It is also known as transit delay. A requirement on the end-to-end delay Di of each packet i can be defined such that Di must not exceed a maximum end-to-end delay Dmax with a probability greater than or equal to Zmin: P(Di ≤ Dmax) ≥ Zmin
(24.1)
This statistical bound on the end-to-end delay can be easily converted into a deterministic bound by defining Zmin = 1. 24.3.1.2 Delay Jitter The delay jitter Jk is the variance of the end-to-end delay Dk and is a measure for the stochastic part of the delay. It is often defined as Dk = Dmin + Jk
(24.2)
where the deterministic part of the delay from Equation 24.1 is represented by Dmin and the stochastic part by Jk. Delay jitter is a function of different processing times in intermediate nodes and different queue lengths, and is typically a result of network congestion, which in turn depends on the various types of data traffic, like parallel and cross traffic running over the same network [10]. As a result, a packet stream transmitted at equidistant instants in time will be distorted while it traverses the network. Applications may want to bound the delay jitter Ji to a maximum value Jmax with a probability equal to or greater than Umin: P(Ji ≤ Jmax) ≥ Umin
(24.3)
24.3.1.3 Throughput The throughput defines how many bits per time unit can be transferred over a given network path. In an end-to-end view, the delay can be seen as a direct function of the instantaneous throughput. The practical notion of throughput, however, implicitly includes a certain time interval. If A(t) denotes the aggregate amount of transferred data up to a time t, the throughput can be defined as

Q = \frac{A(t + \Delta t) - A(t)}{\Delta t}    (24.4)
Throughput is especially important for applications transmitting constant data streams. A requirement for the throughput can be written as P(Q ≥ Qmin) ≥ zmin
(24.5)
Since control network applications usually deal with small data transfers only, this requirement will not be as important as it is for multimedia applications. However, throughput might become an issue over dial-up or other low-speed links. 24.3.1.4 Loss Rate The drop probabilities for different packets on the individual hops can be aggregated into a global loss probability or average loss rate. The most common reason for high loss rates is again network congestion. The complement of this probability can be defined as the reliability of the network path. The reliability defines
how reliable packet transmission should be on the network. A requirement can define at which minimum probability Wmin a packet should be received, thus P(packet received) ≥ Wmin
(24.6)
24.3.1.5 Packet Ordering Some services rely on order preservation of a sequence of packets. Let ⟨s_k⟩ be the sequence of transmitted packets and ⟨r_k⟩ the sequence of received packets; then the network is order preserving if and only if

s_k = r_k, \quad \forall k    (24.7)
Typically order preservation is a property depending on delay jitter and network topology. If the network provides multiple possible paths connecting any two nodes, control packets may traverse the network in any of those paths. Given that the packets suffer different delays in the paths, they may be reordered even if the intermediate nodes obey a strict first-in, first-out (FIFO) policy. More complex queuing policies (e.g., priority queues) implemented in QoS-enabled routers may even lead to reordering on a single path, due to changing packet priorities.
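The QoS requirements in Equations 24.1 to 24.6 can also be checked empirically against measured per-packet delays. The sketch below is given purely as an illustration: it takes a list of delays (with None marking lost packets) and evaluates the statistical delay bound, the delay jitter, and the loss rate; the numeric thresholds and the sample trace are arbitrary example values.

```python
from statistics import pvariance

def qos_report(delays_ms, d_max=20.0, z_min=0.99):
    """Evaluate simple QoS figures from measured packet delays (None = packet lost)."""
    received = [d for d in delays_ms if d is not None]
    loss_rate = 1 - len(received) / len(delays_ms)              # cf. Equation 24.6 (reliability)
    p_within_bound = sum(d <= d_max for d in received) / len(received)
    return {
        "meets_delay_bound": p_within_bound >= z_min,           # P(D_i <= D_max) >= Z_min, Eq. 24.1
        "delay_jitter_var": pvariance(received),                # variance of the end-to-end delay, Eq. 24.2
        "loss_rate": loss_rate,
    }

samples = [4.1, 5.0, 3.8, None, 4.4, 7.9, 4.0, 4.2]             # made-up example trace
print(qos_report(samples))
```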
24.3.2 Performance and Application Classes Another possibility to assess the performance of an IP-based tunnel is from the application point of view. Depending on the application's needs, the network must satisfy certain requirements. With respect to the packet delay as the most important parameter, we can distinguish several performance classes and derive characteristic features for the delay density f(D). 24.3.2.1 Hard Real Time If an application has hard real-time requirements, any given deadline must be met under all circumstances. Missing a deadline will inevitably result in a catastrophic failure of the system. Although the definition of what actually constitutes a catastrophic failure is not always clear, we can at least find a concise requirement for the delay, which must be upper bounded, i.e., f(D) ≡ 0 for D ≥ Dmax
(24.8)
24.3.2.2 Soft Real Time If missing a deadline does not lead to a catastrophic failure, the application imposes only soft real-time requirements. For our purpose, this definition is too loose and therefore not useful. In fact, either of the following two classes is a subset of soft real time. 24.3.2.2.1 Guaranteed Delivery For this requirement, the actual delay of a packet is irrelevant. What counts is that the packet is definitely received by the addressed node. There are no particular constraints for the delay density; it may not even be bounded (if it were, we would effectively have a hard real-time situation). 24.3.2.2.2 Best Effort This scenario is the weakest one. Here, the network only attempts to deliver the packets as fast as possible, without making guarantees concerning deadlines or successful delivery at all. Formally, this comes down to treating the delay density f(D) as conditional distribution, f(D|X) = f(D) p(X)
(24.9)
where X denotes the successful delivery of the packet, which is independent of the delay distribution. For the simulation model, the conditional distribution has the convenient property

∫_{−∞}^{+∞} f(D | X) dD = p(X = true) < 1
(24.10)

so that the residual probability that a packet is lost in the best-effort scenario is implicitly included.
Distributed applications implemented on automation networks typically cope with process data acquisition, processing, and signaling. They can be categorized into several application classes, which can be mapped onto the performance classes introduced above (Figure 24.7).

FIGURE 24.7 Application classes (monitoring, open-loop control, closed-loop control, file transfer) and the possible requirements they impose on the network. RT = hard real time, GD = guaranteed delivery, BE = best effort.

24.3.2.2.3 Monitoring/Logging
This application class allows the viewing or recording of data within a control application. Typically, selected process parameters are sent to a monitoring station where they can be tracked or logged. The QoS requirements can range from hard real time to no requirements at all (meaning an entirely unreliable network is acceptable as well). Examples are remote power meters, weather sensors, and occupancy detectors.

24.3.2.2.4 Open-Loop Control
Open-loop control appears where the process characteristics and disturbances are well known, so that feedback paths are not required. The network should provide at least reliable data transfer, but the timing requirements can range from non-real time to hard real time. Multidimensional open-loop controls can pose additional timing requirements originating from the time dependencies of the output values. Control data are typically sent periodically, in which case the overhead for providing guaranteed delivery, for example, by packet retransmission, can be avoided. Examples are light switches, intelligent power outlets, and dot matrix displays.

24.3.2.2.5 Closed-Loop Control
Closed-loop control applications are commonly found in automation tasks where control quality and error compensation are required. The controller does not need to be part of the application; it can be an external system such as a human operator. Timing variations influence the stability and performance of a closed-loop control, so this application class demands at least soft real-time behavior. Again, multidimensional control is possible and can enforce stricter timing requirements. Examples are heating, industrial control, and power management.

24.3.2.2.6 File Transfer
All distributed automation systems need in one form or another the transmission of files, or at least larger data blocks, as opposed to plain process data. The typical application is to configure or parameterize field devices, either at system start-up or later during runtime. Sometimes even executable code is transferred, e.g., to update programs running on controllers. The minimum requirement for this kind of data transmission is reliable and guaranteed data transfer. In rare cases, when the upload of a device configuration is time critical, hard real-time requirements may also apply.

24.3.2.3 Design Considerations for a Gateway
There are no general guidelines on how to implement a gateway between a fieldbus and the Internet. From the user’s perspective, however, there is a clear-cut requirement for the integration: as the automation
data are brought into the office world, the ways and means employed must comply with the standards used there. In particular, this requires some sort of abstraction for both data and functionality. There is a good reason for this: within the scope of vertical integration, the automation data are accessed from a strategic level (rather than the control level), so the user wants a clear picture of the most relevant data. Too many fieldbus-specific details can obscure this picture and be distracting. After all, the user is typically not a fieldbus expert, let alone the one who set up the automation system.
Another argument in favor of an abstract view of the automation system is the desired user-friendliness of the interface. Typically, the user wants to retrieve data from different systems with only one tool. Therefore, the user interface should be as independent of the underlying system as possible. It is important to note that the way data and functionality abstraction can actually be achieved also depends on the protocols and mechanisms used on the Internet side of the gateway.
In the sequel, we list and discuss a few design issues. They may vary with the concrete target application. The model for the present discussion is a simple and cost-effective residential gateway for linking a home network to the Internet [13].
24.3.3 Services
The basic function of a gateway evidently is to provide read and write access to process and device data on the fieldbus for monitoring and control operations, but also for network management purposes. However, as the gateway has to accomplish protocol (and maybe data) conversion and therefore needs a certain amount of computational intelligence anyway, additional services can be included. With the gateway acting as a proxy server for the automation data, there are a number of autonomous functions it can (or should) provide. Some of these add-on services can in an abstract way be described as events. Events are triggered by predefined criteria: if a criterion is fulfilled, an action is taken. Criteria can be the expiration of timers (time-based events) or the crossing of thresholds by the value of a data point (value-based events). The actions to be performed upon the occurrence of an event can be manifold.

24.3.3.1 Data Logging
The storage of data point values based on either time or the recognition of certain trigger conditions enables the gateway to perform stand-alone operations independently of the availability of an online connection to the high-level network. Especially for remote access via unreliable channels (e.g., wireless links or heavily congested service providers), the possibility of creating log files (and retrieving them at a later time) may be an essential requirement of the application. A management station can then retrieve logs (fully or incrementally) to perform subsequent analysis or feed specialized software with control system data.

24.3.3.2 Asynchronous Notifications
Contrary to the predominant client–server communication model of the Internet, where the client has to continuously poll the server (the gateway) to get information, the gateway should become active as a response to certain events. The availability of such a mechanism is an important criterion for the choice of a suitable protocol on the Internet side of the gateway. In practical implementations, the lack of native asynchronous notification mechanisms is usually circumvented by the use of services based on some sort of e-mail.
Two other types of services are independent of the fieldbus data and cannot be modeled as events.

24.3.3.3 Security
Fieldbus systems are traditionally security unaware. When a connection to the Internet is provided, however, the security of protocols that are able to communicate with the gateway is vital for both the field area networks that the gateway is connected to and the gateway itself [14]. The gateway has to take care of access control, the minimum level of security being authentication.
24.3.3.4 Network Structure Updates
A last point that might be relevant for the gateway, depending on the desired application area, is some sort of plug-and-play functionality. By this we understand that the gateway is capable of recognizing changes in the network structure autonomously and adapting its internal data point structure. This does not necessarily eliminate the need for preconfiguration, but it may help to keep the gateway operational. A typical application for this feature is residential gateways that, once installed, have to be as autonomous as possible and must cope on their own with devices joining or leaving the field area network. In this case, the underlying fieldbus must of course also support basic plug-and-play features. A key issue here is the ability of the high-level protocol on the Internet side to reflect structural changes in the field area network: nodes in a field area network can be removed, added, or replaced. This can influence the mapping between protocol addresses and fieldbus addresses. It has to be taken into account how a protocol deals with such changes in the network structure and how objects that existed before a change can be consistently addressed after a change.
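The event mechanism described at the beginning of this section, with time-based and value-based triggers driving add-on services such as data logging or asynchronous notifications, can be sketched in a few lines of Python. The data point name, the threshold, and the logging action are hypothetical; a real gateway would add persistence and a concrete notification transport (e.g., e-mail).

    import time

    class ValueEvent:
        """Value-based event: fires when a data point crosses a threshold."""
        def __init__(self, point, threshold, action):
            self.point, self.threshold, self.action = point, threshold, action
            self.above = False
        def update(self, name, value):
            if name != self.point:
                return
            if value >= self.threshold and not self.above:
                self.above = True
                self.action(name, value)          # e.g., append to a log, notify
            elif value < self.threshold:
                self.above = False

    class TimeEvent:
        """Time-based event: fires whenever its timer expires."""
        def __init__(self, period_s, action):
            self.period, self.action = period_s, action
            self.next_due = time.time() + period_s
        def poll(self):
            if time.time() >= self.next_due:
                self.next_due += self.period
                self.action()

    log = []
    overtemp = ValueEvent("room_temperature", 28.0,
                          lambda name, value: log.append((time.time(), name, value)))
    heartbeat = TimeEvent(60.0, lambda: log.append((time.time(), "heartbeat", None)))

    # In the gateway main loop, every data point update is fed to the value-based
    # events, and the time-based events are polled periodically.
    overtemp.update("room_temperature", 29.5)   # hypothetical update; logs one entry
    heartbeat.poll()                            # not yet due; nothing logged
    print(log)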
24.3.4 Gateway Structure
There is no single best architecture for a gateway. Its primary function is to act as an abstract interface to an automation system, independent of both the fieldbus protocol (because it operates on the application level) and the fieldbus-specific coding of the data. Therefore, the gateway can provide unified access to possibly more than one fieldbus simultaneously. The software structure of the gateway implementation can reflect this demand for versatility and exhibit a modular structure [13, 15]. Alternatively, the architecture can also be monolithic and tailored to one specific case. A modular approach will have to provide abstraction as a matter of course; the three-level approach shown in Figure 24.8 is just one possibility.
What has to be considered separately is the low-level access to the fieldbus. One degree of freedom is the strategy used to fetch the data. It can be based on a caching mechanism, where the gateway autonomously retrieves the data with a predefined refresh rate. Upon a request from the client, the response is taken from the cache and returned without further delay. This approach provides immediate answers and can reduce the fieldbus traffic, especially in situations where concurrent access from different clients needs to be handled. For proper operation, time stamping should be provided, so that the client can keep track of the age of the data values. The alternative is to get the data from the fieldbus only when requested to do so by the client. This avoids cyclic updates as well as time stamps, but the gateway may have to cope with substantial delays of the response from the field device. The connection to the client has to be kept alive in the meantime. Which strategy can be used depends on both the properties of the fieldbus and the capabilities of the interface used. To avoid excessive bus load in the caching scheme, the gateway can passively listen to the network traffic.
FIGURE 24.8 Modular three-level gateway structure: Internet front-end(s) for high-level access; services and data processing with a database; and FAN connections to the fieldbus.
Provided the fieldbus interface supports such a monitoring mode, this approach is convenient for tracking cyclic data transfer (in systems like Interbus or Profibus-DP) or event-driven data transfer (in fieldbuses based on the producer–consumer model, like Controller Area Network (CAN) and European Installation Bus (EIB)). Unfortunately, many of today’s integrated fieldbus controllers perform low-level data filtering, inhibiting straightforward monitoring. In these cases, the gateway must actively retrieve the data from the bus. The same applies to data that are not normally transferred on a regular basis, such as diagnostic data (e.g., in Profibus-DP). Even worse, the transmission of such specialized information often requires the execution of dedicated commands that cannot be mapped to simple read and write commands. Such fieldbus-specific communication sequences (but also other peculiarities, like remote program invocation or special network management functions) must be handled differently. As the high-level access via the Internet most likely will not provide methods for direct command execution, such special functions can be triggered by an ordinary read or write access to specially assigned data points.
Another important restriction may be the medium access control scheme employed by the fieldbus. There are no problems with multimaster systems; here, any node can host the gateway. Single-master systems, on the other hand, imply that the gateway must either be the fieldbus master or cooperate with it, in the sense that there is a dedicated and typically nonstandard communication channel through which the gateway can tell the bus master which data values are to be read or written. Otherwise, writing data points will inevitably lead to inconsistencies, when the gateway sets a value and the bus master overwrites it during a regular update cycle.
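A minimal sketch of the caching strategy discussed above looks as follows: the gateway refreshes the data point values cyclically (or fills the cache by passively listening to the bus, where the interface supports it), stores each value together with a time stamp, and serves client requests from the cache. The low-level read function and the refresh period are placeholders; a real implementation would run the refresh in a background task and respect the bus load limits of the underlying fieldbus.

    import time

    class CachingGateway:
        """Serve client reads from a cache that is refreshed cyclically."""
        def __init__(self, read_from_fieldbus, refresh_period_s=1.0):
            self._read = read_from_fieldbus    # placeholder for low-level bus access
            self._period = refresh_period_s
            self._cache = {}                   # data point -> (value, time stamp)

        def refresh(self, data_points):
            """Cyclic update, e.g., called from a background task."""
            for dp in data_points:
                self._cache[dp] = (self._read(dp), time.time())

        def read(self, data_point):
            """Immediate answer from the cache, including the age of the value
            so that the client can judge its freshness."""
            value, ts = self._cache[data_point]
            return {"value": value, "age_s": time.time() - ts}

    # Hypothetical use with a dummy fieldbus access function.
    gw = CachingGateway(read_from_fieldbus=lambda dp: 21.5, refresh_period_s=0.5)
    gw.refresh(["boiler/temperature"])
    print(gw.read("boiler/temperature"))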
24.3.5 Data Representation
A crucial point in the design of a gateway is the way the fieldbus data are represented. A data point is the smallest information unit available via the gateway. To achieve fieldbus independence, the data points have to be arranged in a rather general, high-level way. The question of how to find a unified view and coding for the sometimes extremely heterogeneous fieldbus objects is an interesting topic on its own and will be left aside here. More relevant for our purpose is the way the complete set of data points is structured inside the gateway. Basically, there are two different possibilities. The first is a function-oriented approach. In this case, emphasis is put on the content of the individual data point, and the underlying network structure is completely irrelevant. The gateway thus offers a simple flat list of data points [15]. The second approach is a structure-oriented one. Here, the data points are arranged according to the structure of the field area network. The gateway offers a list of nodes, and each node can contain one or several data points [13]. In addition to its simple form, a data point can also contain a set of values instead of only a single value. This type of data point is referred to as an aggregate data point. Data points can also contain multiple values of different types (structures). The advantage of the structure-oriented solution is that properties of nodes (and not only properties of data points) can also be modeled on the gateway. Such properties are, for example, a self-identification of the node or location information that describes the position of a node in the field area network. Which of the two possibilities is actually used is a matter of taste and also depends on the target application. Furthermore, the choice might be limited by the high-level access protocol and its capabilities for describing and managing data.
Every object that is available to a client needs an address that uniquely identifies the object. Protocols typically define their own addressing scheme; the gateway, on the other hand, also defines an addressing scheme that may be similar to or completely abstracted from the fieldbus network addressing. These two schemes need to be mapped onto each other to make data points available. Another point to be observed is how the information of a data point is encoded. This again might be limited by the capabilities of the way the data are transferred from the gateway to the client. Not every protocol may support special data types or encodings used in a particular fieldbus, so there must be some form of data translation and, necessarily, abstraction. This might not be too problematic for scalar data points like temperatures (although there are obviously many possibilities to encode a temperature value), but it will get increasingly difficult for such common things as date and time, not to mention aggregate data points available only as a structure on the fieldbus level. For all these more
complex data points, no one-to-one mapping is possible, and there are basically only two options to cope with the situation: (1) find a compromise in the form of abstract data representations that omit excess information, or (2) collapse the complete data point into a flexible but unspecific data format such as a string, with the disadvantage that the client needs to take the string apart and interpret it, which again requires fieldbus-specific knowledge.
In addition to the actual value, a data point may also need some attributes to facilitate interaction with the client. Such attributes can be a description of the data point type and encoding (to tell the client how the data are to be handled) and maybe a string providing a textual description of the data point. In some fieldbus systems, such a self-description is foreseen for data objects anyway. Last but not least, there can be a specification of the access mode, i.e., what the client is allowed to do with the data point.
It should be noted that although the restrictions concerning data representation have been discussed for the gateway approach, they also apply to the IP tunneling solution. In this case, the list of data points has to be maintained by every field device on its own. Typically, the implementation is up to the vendor, and unless there are very stringent rules, this may lead to substantial inhomogeneity stemming from the many degrees of freedom mentioned above.
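The structure-oriented representation and the data point attributes discussed above can be pictured as a small object model: nodes own data points, and each data point carries its value plus descriptive attributes (type, encoding, textual description, access mode) and the mapping to its fieldbus address. The sketch below is only an illustration; all names, addresses, and attribute values are hypothetical.

    class DataPoint:
        """Smallest information unit offered by the gateway."""
        def __init__(self, name, fieldbus_address, dp_type, encoding,
                     description="", access="rw"):
            self.name = name
            self.fieldbus_address = fieldbus_address   # mapping to the fieldbus side
            self.dp_type = dp_type                     # e.g., "float", "struct"
            self.encoding = encoding                   # tells the client how to decode
            self.description = description             # human-readable self-description
            self.access = access                       # "r", "w", or "rw"
            self.value = None

    class Node:
        """Structure-oriented view: a field device with its data points."""
        def __init__(self, node_id, location=""):
            self.node_id = node_id
            self.location = location
            self.data_points = {}
        def add(self, dp):
            self.data_points[dp.name] = dp

    # Hypothetical node with one scalar data point.
    node = Node("heating_controller_1", location="basement, cabinet 3")
    node.add(DataPoint("flow_temperature", fieldbus_address="2/7",
                       dp_type="float", encoding="degrees Celsius, IEEE 754",
                       description="supply water temperature", access="r"))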
24.4 Gateway Access from the Internet
The primary goal of vertical integration is to bring automation data into an IT context, and consequently, there are many ways to incorporate data provided by a gateway into office-level applications. To this end, it is not necessary to employ full-fledged application-level protocols on top of IP/UDP/TCP. A very common approach used in contemporary SCADA (supervisory control and data acquisition) systems is object linking and embedding (OLE) for Process Control (OPC), which relies on the Windows-specific mechanisms OLE and component object model (COM)/distributed COM (DCOM) [16]. With the extension OPC DX, it is in principle even possible to interconnect different fieldbus systems, because OPC DX also permits server-to-server communication over an Ethernet network [17]. A more unconventional idea is to treat the gateway as a database and use the structured query language (SQL) to retrieve the data [18].
The protocols discussed in the sequel have been evaluated with respect to their feasibility for home network access, specifically for remote meter reading [14]. The criteria for the evaluation were described in the previous section: a reasonable way of structuring and handling data, a possibility to send asynchronous notifications, and, of course, security provisions. In addition, the protocol (together with its data representation) should be able to cope with changes in the network structure. This requirement is a direct consequence of the target application in an intended mass market with a high number of installations, where updates calling for specially trained personnel are not affordable. Likewise, the overall costs of such a gateway should be low, which typically means limited computing power and memory space. Compact implementations on possibly embedded devices are therefore a must.
24.4.1 SNMP
Simple Network Management Protocol (SNMP) is a comparatively old protocol and has been designed for remote administration of networked devices [19]. Data are represented in a tree structure called the management information base (MIB), where each branch and each leaf has a unique identifier, called the object identifier (OID). SNMP managers (i.e., the clients that connect to an SNMP-enabled device) use this MIB to get extended information about the managed agent (i.e., the device that allows SNMP access, in the present case the gateway). The MIB is a static representation of all the objects available at the managed agent. As SNMP does not provide mechanisms to download the MIB from the agent, the information of which OID stands for which data point needs to be transferred to the manager by mechanisms that are out of the scope of SNMP (usually this is done by the administrator of the system during setup).
SNMP is based on the connectionless User Datagram Protocol (UDP) running on top of IP. This is rather efficient because of the low overhead, but unlike TCP, UDP does not guarantee delivery of packets, so packets can be lost on the network without being noticed by the sender.
The major commands provided by SNMP are GET and GETNEXT for reading data (GETNEXT allows the manager to traverse an unknown MIB tree and to retrieve the OIDs) and SET for writing to data points. In contrast to the standard manager-driven communication (in typical client–server fashion), SNMP also defines the unacknowledged TRAP command (and, from version 2 on, the acknowledged INFORM command), which can be sent by an agent to asynchronously inform a manager about special conditions in the agent. This possibility to reverse the client–server communication is a rare exception among the investigated protocols and, together with the good protocol efficiency, is a strong argument in favor of SNMP.
One of the big problems of SNMP is the static structure of the MIB together with the lack of native transfer mechanisms between agent and manager, which does not allow dynamic updates of the network (or data point) structure. Therefore, the list of nodes and data points needs to be stored in an SNMP table rather than as a set of separate objects in the MIB [20]. To access individual objects, the manager needs the OID of the table, which is fixed once and for all, and the number of the row containing the object. If the structure of the field area network is changed, the table can be updated by adding or removing the rows that represent the corresponding nodes. This is a critical issue in that SNMP requires the rows in an SNMP table to be numbered continuously; therefore, the insertion or removal of a row changes the numbering of all subsequent rows.
The workaround of using tables to add a somewhat dynamic behavior also raises problems if the gateway is accessed by more than one client simultaneously. It is possible that data in a table are modified by one client (e.g., rows are deleted or added) while another client is in the process of reading the table. This results in an inconsistent set of data. A possible solution used in the case study was to introduce a sequence number for every table. This sequence number changes every time data in the table change. The workaround for achieving data consistency is to read the sequence number, read the table, and afterwards read the sequence number again to check if it changed. If it remained unchanged, the table data are consistent; otherwise, the whole process has to be repeated. For write access, the protocol had to be slightly changed so that every SNMP SET command contains a sequence number that uniquely identifies the version of the table to which a data point is written.
Another problem of SNMP is security. The commonly used version of SNMP is version 1, which does not contain any considerable security features. In the most recent version, version 3, security mechanisms have been introduced that are sufficient (though not widely used at present) [21]. Still, difficulties arise from the connectionless UDP used as the transport protocol. UDP traffic is usually not permitted to pass through a typical firewall. But even if communication is possible, the underlying UDP does not guarantee packet delivery. This means that an SNMP manager communicating with an agent cannot rely on its last action on the agent having completed successfully. Neither can an agent be sure about the status of a communication with a manager. Hence, error monitoring must be implemented within the manager application.
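The sequence-number workaround for consistent table reads can be written down as a short retry loop. The sketch below deliberately abstracts from any particular SNMP library: snmp_get() and snmp_get_table() are assumed helper functions wrapping the actual read operations, and only the consistency logic is shown.

    def read_table_consistently(snmp_get, snmp_get_table, seq_oid, table_oid,
                                max_retries=5):
        """Read an SNMP table together with its sequence number and retry until
        the sequence number is identical before and after the table read, i.e.,
        no other client modified the table in the meantime."""
        for _ in range(max_retries):
            seq_before = snmp_get(seq_oid)
            rows = snmp_get_table(table_oid)
            seq_after = snmp_get(seq_oid)
            if seq_before == seq_after:
                return seq_after, rows            # consistent snapshot obtained
        raise RuntimeError("table kept changing; no consistent read obtained")

    # A subsequent SET would carry the sequence number returned here, so that the
    # agent can reject writes referring to an outdated version of the table.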
24.4.2 LDAP
The Lightweight Directory Access Protocol (LDAP) [22] is the lightweight counterpart to the X.500 Directory Services. It is oriented toward directories that are organized in a tree, the directory information tree (DIT). The DIT is a hierarchically structured organization of data items, the directory entries. Every entry consists of a collection of attributes. Every attribute has an attribute name, an attribute type, and a value associated with it. One special attribute that is mandatory for every tree entry determines the position of the entry in the tree: the distinguished name (DN), which is used as a unique address for the entry. It specifies the path by listing all parent entries up to the root entry. The directories contain mostly (but not necessarily) small pieces of data, which are mainly read and searched for. Writing of data is intended to occur only rarely, as are changes in the tree structure.
LDAP offers a set of data querying and manipulation commands for accessing the directory service. The SEARCH command is very powerful and allows the specification of search filters that are applied to each entry. It is possible to request just a subset of the available attributes, and in order to narrow a search even
more, the scope can be limited to a certain level of the directory tree (search only the current level, one level deeper, or the whole subtree). With the commands ADD, DELETE, and MODIFY, single entries can be added, deleted, or modified, respectively. The COMPARE command is used for the comparison of two entries.
Since LDAP uses TCP as the transport protocol, there are no problems with firewalls. Also, all security extensions designed for TCP-based protocols, such as transport layer security (TLS) [23], can be used. LDAP version 3 also provides appropriate authentication mechanisms in combination with those already defined in TLS [24]. Dynamic update of the directory information tree is also not particularly problematic, although the changes are not automatically propagated to the client. The DIT as such is not confined to a static structure like the SNMP MIB, which facilitates the adding and deleting of entries even at runtime. Consequently, there is also no problem with the consistency of data, because the address of a data point can be added as an attribute to the directory entries to unambiguously identify the data points. As operations on a DIT entry are always atomic, simultaneous and consistent access to a single data point from multiple clients is inherently handled by the built-in mechanisms of the protocol.
A general difficulty of LDAP is that its original application idea does not exactly fit the purpose of fieldbus gateway access. The main application of LDAP is access to data related to humans, e.g., telephone numbers, employee data, or classical white pages services. The client support reflects this field of application, and although clients exist, they are not widely used and differ greatly in their performance with respect to network load. A second point is that access to white pages directories is typically read-only. The powerful search command can therefore provide a convenient means to retrieve fieldbus data. Writing, on the other hand, is more difficult and likely becomes a bottleneck. Hence, LDAP is not so well suited for control tasks where intensive write access is required. Another problem is the lack of an asynchronous notification mechanism, so that workarounds (like e-mails or the like) have to be used.
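To give an impression of how fieldbus data points could be retrieved from a gateway that exposes them in a DIT, the sketch below issues a search using the ldap3 Python library. The server address, the DIT layout, and the attribute names (dataPointValue, dataPointAddress) are purely hypothetical; a real gateway would define its own schema and would preferably require TLS.

    from ldap3 import Server, Connection, SUBTREE

    # Hypothetical gateway exposing one subtree per field area network node.
    server = Server("ldap://gateway.example.net")
    conn = Connection(server, user="cn=reader,dc=gateway", password="secret",
                      auto_bind=True)

    # Server-side filtering: fetch all data points of one node, requesting
    # only the attributes that are actually needed.
    conn.search(search_base="cn=node17,ou=fan,dc=gateway",
                search_filter="(objectClass=dataPoint)",
                search_scope=SUBTREE,
                attributes=["cn", "dataPointValue", "dataPointAddress"])

    for entry in conn.entries:
        print(entry.cn, entry.dataPointValue, entry.dataPointAddress)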
24.4.3 Web-Based Approaches
For many solution providers in the area of fieldbus–Internet connectivity, Web-based approaches seem to be the only conceivable solution; at the very least, they are strongly favored. However, unlike the protocols described so far, methods based on the WWW are very heterogeneous. They include communication and data representation using Web browsers as clients. The common transport protocol is the Hypertext Transfer Protocol (HTTP), which is typically used to transmit Hypertext Markup Language (HTML) documents. HTML documents can contain embedded objects and script languages (e.g., JavaScript) and can be created by multiple technologies on the server side (such as CGI, ASP, and JSP*). They all have in common that the client receives a predefined view of a set of data. This view can be modified by the user (e.g., by selecting specific objects in a page and filtering out the others), which requires the creation of a new document on the server side. This process is usually not very responsive because of the necessary communication and computation overhead.
*CGI, Common gateway interface; ASP, active server pages; JSP, JavaServer pages.
The two main commands that HTTP supports are GET (for sending requests to a server to retrieve data) and POST (for sending data to the server). This, however, does not cover the whole spectrum of possibilities built on top of this mechanism. In many cases, though, the user is presented with a view based on a template, which is filled with data by some server mechanism. A definite strength of Web technologies is the availability of many different clients. A lot of Internet browsers for multiple platforms exist, leaving hardly any white space on the map of supported systems.
The main problem of pure HTML is the response behavior imposed by the necessity to refresh (i.e., completely reload) a page in order to receive current data (this can be circumvented by advanced mechanisms that retrieve, for example, only the data point value). There is also another performance problem with this approach: as the view on the data depends on HTML documents, typically several data points will be collected on one page to give a better overview. Therefore, even for very small amounts of data (for example, the value of just one data point), a complete document needs to be created and
transmitted. This causes a considerable amount of both computational overhead and network traffic. Likewise, it is not possible to collect data from multiple servers in a single document, which considerably limits the use of this approach. In an IP tunneling scheme, for example, every device providing data needs to maintain a set of HTML pages, and the client needs an instance of the browser for each device. An integrative view of the entire installation, let alone automatic data acquisition, is not feasible and has to be realized manually.
A positive point about the Web approach is the comparatively unproblematic security issue. As HTTP is based on TCP, TLS can be used. Furthermore, HTTP is conveniently routable through firewalls, which is an important point in an environment becoming increasingly aware of security, although it has to be noted that this easy routability also opens a way to circumvent security. A drawback is that there is no way for the server to send data to the client without a request. As with all other protocols except SNMP, other ways have to be found to accomplish such notifications (e.g., special applets or e-mail-based services). Also very poor is the support of different data types. In fact, the data have to be sent as strings and processed appropriately by the client. This lack of ample data type support ideally supports (and also demands) pragmatic, proprietary solutions, but is a severe obstacle to standardized, interoperable implementations.
A possible way out of this dilemma is the increasing use of the Extensible Markup Language (XML). If the data are based on (standardized) XML descriptions, they are not only readable by a human operator but, unlike HTML, also machine parsable, which makes gateway access based on Web technologies finally also suited for automatic data acquisition. A further step in this direction could be the XML-based Simple Object Access Protocol (SOAP), which defines methods for data exchange, typically used over HTTP. A tremendous benefit is the utmost platform independence, as both HTTP and XML are available everywhere. However, as SOAP is a protocol of its own, client applications are still needed; a Web browser alone is not sufficient today. A potential weak point is, as with the somewhat related technology CORBA (Common Object Request Broker Architecture), the relatively complex structure, which does not lend itself to a lightweight implementation on the resource-limited devices that today make up a significant portion of a typical automation system.
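The difference between screen-oriented HTML and machine-parsable XML can be illustrated with a few lines of client code: if the gateway returns the value of a data point as a small XML document, a polling client can extract it without any screen scraping. The URL and the XML element and attribute names are hypothetical; only Python standard library modules are used.

    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical gateway URL returning, for example:
    #   <datapoint name="flow_temperature" unit="degC"><value>48.2</value></datapoint>
    URL = "http://gateway.example.net/datapoints/flow_temperature.xml"

    def poll_data_point(url):
        """Fetch and parse one data point; polling is still required, since plain
        HTTP offers no server-initiated notifications."""
        with urllib.request.urlopen(url, timeout=5) as response:
            doc = ET.parse(response).getroot()
        return {"name": doc.get("name"),
                "unit": doc.get("unit"),
                "value": float(doc.findtext("value"))}

    print(poll_data_point(URL))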
24.5 The Role of Industrial Ethernet
A discussion of fieldbus–Internet connectivity is incomplete without consideration of the growing use of Ethernet in automation. After all, Industrial Ethernet is often praised as the solution to vertical integration, overcoming all problems encountered in the design of heterogeneous network interconnections. Beyond all well-sounding marketing slogans, it is a matter of fact that the dominant use of Ethernet in local area networks has renewed efforts to employ Ethernet also in the automation domain. These efforts date back to the early days of Ethernet development, but only recent technological advances in full-duplex operation and switching techniques made high data rates possible without the old drawbacks of the carrier-sense multiple access with collision detection (CSMA/CD) medium access control. Although it is still disputed whether Ethernet really fits the requirements of a field-level network, the main benefit is clear: if Ethernet were used as a replacement for the fieldbus, and if IP plus the common transport protocols were used on top of it (which is not necessarily the case in all Industrial Ethernet approaches), Internet technologies would be the natural way of accessing automation data. It is thus not astonishing that many contemporary concepts of vertical integration rely strongly on Ethernet as an automation network. For the old CIM pyramid, the penetration of LAN technologies into the classical fieldbus domain has the ultimate consequence that intermediate levels become obsolete. The hierarchy thus gradually turns into a flatter structure with only two, maybe three, levels.
How do the facts of Industrial Ethernet compare with the characteristics of vertical integration discussed so far? It is, of course, shortsighted to postulate that the use of Ethernet alone entails interoperability with the office world, although this is often done. In fact, Ethernet is but a transport medium. Using Ethernet for both office and automation applications may allow for coexistence on a shared medium (within certain limits posed by, e.g., bandwidth considerations), but does not mean integration. What is much more important to this end is the protocol stack employed on top of Ethernet. Indeed, most
solutions rely on the Internet and LAN standards TCP/UDP/IP as a transport layer; however, this is already where compatibility ends. The higher layers of all serious approaches are in fact fieldbus protocols:
• Ethernet/IP (IP standing for Industrial Protocol) uses the Control and Information Protocol (CIP) already known from ControlNet and DeviceNet [25]. This application layer protocol is simply sent over TCP or UDP, depending on whether configuration or process data have to be transmitted.
• The high-speed Ethernet (HSE) variant of Foundation Fieldbus is in fact an application of the existing Fieldbus Foundation H1 protocol wrapped in UDP/IP packets [26].
• Modbus/TCP [27] is based on standard Modbus frames encapsulated in TCP frames.
• PROFInet is rather a new protocol development [28]. Its integrability into the IT environment relies on the COM/DCOM mechanism used for communication between devices. Legacy fieldbus systems can be included via so-called proxies (which are essentially gateways in our sense).
In light of the taxonomy sketched in Section 24.2, all these approaches (with maybe the exception of PROFInet) essentially implement some sort of fieldbus protocol tunneling. Of course, thanks to the possibility of using the same medium for office and automation tasks, they have the benefit of a tight integration from a network point of view. Unless a traffic separation seems reasonable for performance or security reasons, no dedicated connection node is necessary. On the application level, however, integration is by no means so tight. Even though the inclusion of automation data into office applications is simplified by, e.g., object-oriented techniques borrowed from the IT world, there is still a substantial amount of fieldbus-specific information to be handled. Consequently, an abstract view of the automation data (one of the requirements for vertical integration) is not a matter of course. Nor is an automatic inclusion of Industrial Ethernet into the office world easy to achieve. In fact, the effort that would otherwise be spent on a gateway design has in this case to be put into the development of appropriate nonstandard application-level tools. The only exception where Ethernet together with standard protocols definitely brings an advantage is a Web-based human–machine interface used, e.g., for configuration purposes. But this is only one small aspect of automation.
24.6 Summary
The classical fieldbus systems have been devised as highly specialized communication systems for the factory floor. Their development, though innovative at the time, was aimed at conventional, self-contained automation applications. Today’s automation tasks, however, are usually no longer stand-alone systems. At least supervisory control and data acquisition are required in many cases, with the respective SCADA software running in a typical office environment as regards hardware and software platform or network environment. More advanced demands are remote data access for monitoring or maintenance, or the inclusion of automation data into company-wide management systems. All these functions are in fact not part of the factory floor or the process control level; they belong to a higher, more strategic level of information processing, which is governed by office networks.
This circumstance calls for an interconnection between automation and office networks. The possible realizations are as different as the goals of interconnection. Although the systems on either side are standardized, the link is not, not even in a coarse architectural sense. One reason is the still existing plethora of fieldbus systems. But also on the Internet side, the Internet Protocol suite is just a least common denominator of an equally large set of possible applications serving as targets for the interconnection.
The preceding sections have highlighted the basic system architectures and the many design options to be considered. Evidently, the properties of the networks together with the applications have an important influence on these options and, in fact, limit the designer’s flexibility with respect to the implementation. Contrary to an initial guess, the architectural approach (i.e., gateway or tunneling) does not seem to affect the overall system complexity very much. As the discussion showed, it rather influences the distribution of the implementation efforts among the involved entities: field devices, clients on the Internet side, and a possible central access node. The particular choice must largely depend on the
intended application; there is no golden rule for it. In fact, all approaches have their own strengths and shortcomings. Even though the major part of this article discussed the subject from the viewpoint of a gateway solution, this does not imply that the tunneling approach is inferior. It is simply suited for slightly different types of applications, and most of the points described for a gateway are equally valid.
The overall complexity of the interconnection is, according to practical experience, chiefly determined by the desired level of data and functionality abstraction and the degree of vertical integration to be achieved. A direct forwarding of fieldbus data packets “as is,” with the interpretation left up to the client (and, ultimately, the user), is easier to achieve than a reasonable data and protocol translation with the aim of a smooth integration into a non-fieldbus-specific application. True vertical integration and abstraction are not always really an issue. But if they are, the price to be paid is necessarily a loss of precision, mostly from a functional point of view. The higher the abstraction level, the fewer functions and specific pieces of information can be mapped to the other side without problems. Timing precision or real-time behavior should not be a requirement at all, because timing relations are typically lost in the node linking the two different networks. Time stamps can help, but only to a certain extent.
The great flexibility of fieldbus–Internet connections sketched in this article is a bit deceptive. In practical implementations, one has to live with a number of inconveniences. Timing is just one. Especially if one goes for vertical integration, the high-level target application and the underlying protocol may not support services provided by the automation system. This frequent lack of a one-to-one relation between the two networks requires some creativity from the designer, especially on the Internet side. A good example is provided by the basic gateway services discussed in Section 24.3. Vital mechanisms such as asynchronous notifications rarely exist as native protocol features and have to be replaced by workarounds outside the actual application. In a number of case studies, it was found that none of the investigated high-level protocols really satisfies all needs. In this respect, the large number of different interconnection approaches is not astonishing, but only reflects the fact that one ideal solution for all occasions does not exist.
References
[1] D. Dietrich and T. Sauter, Evolution potentials for fieldbus systems, in IEEE International Workshop on Factory Communication Systems, Porto, Portugal, September 2000, pp. 343–350.
[2] M. Felser and T. Sauter, The fieldbus war: history or short break between battles?, in IEEE International Workshop on Factory Communication Systems (WFCS), Västerås, Sweden, August 2002, pp. 73–80.
[3] J.P. Thomesse, Fieldbuses and interoperability, Control Engineering Practice, 7, 81–94, 1999.
[4] M. Wollschlaeger, Framework for Web integration of factory communication systems, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Antibes Juan-les-Pins, France, October 2001, pp. 261–265.
[5] M. Volz, Quo vadis Layer 7?, The Industrial Ethernet Book, Issue 5, Spring 2001, http://ethernet.industrial-networking.com/articles/i05quovadislayer7.asp.
[6] S. Soucek, T. Sauter, and T. Rauscher, A scheme to determine QoS requirements for control network data over IP, in 27th Annual Conference of the IEEE Industrial Electronics Society (IECON), Denver, CO, November–December 2001, pp. 153–158.
[7] U. Döbrich and P. Noury, ESPRIT Project NOAH: introduction, in Fieldbus Technology, Springer, Vienna, 1999, pp. 414–422.
[8] A. Ray, Introduction to networking for integrated control systems, IEEE Control Systems Magazine, 9, 76–79, 1989.
[9] F.L. Lian, J.R. Moyne, and D.M. Tilbury, Performance evaluation of control networks: Ethernet, ControlNet and DeviceNet, IEEE Control Systems Magazine, 21, 66–83, 2001.
[10] S. Soucek, T. Sauter, and G. Koller, Effect of delay jitter on quality of control in EIA-852-based networks, in 29th Annual Conference of the IEEE Industrial Electronics Society (IECON), Roanoke, VA, November 2003.
[11] EIA-852 draft, Tunneling of Component Network Data over IP Channels, April 2000.
[12] Zheng Wang, Internet QoS: Architectures and Mechanisms for Quality of Service, Morgan Kaufmann Publishers, San Francisco, 2001.
[13] G. Pratl, M. Lobachov, and T. Sauter, Highly modular gateway architecture for fieldbus/Internet connections, in IFAC International Conference on Fieldbus Systems and Their Applications (FeT 2001), Nancy, France, November 2001, pp. 267–273.
[14] P. Palensky and T. Sauter, Security considerations for FAN-Internet connections, in IEEE International Workshop on Factory Communication Systems, Porto, Portugal, September 2000, pp. 27–35.
[15] M. Kunes and T. Sauter, Fieldbus-Internet connectivity: the SNMP approach, IEEE Trans. Ind. Electronics, 48, 1248–1256, 2001.
[16] http://www.opcfoundation.org.
[17] F. Iwanitz and J. Lange, OPC: Fundamentals, Implementation and Application, 2nd ed., Hüthig, Heidelberg, 2002.
[18] M. Lobashov, G. Pratl, and T. Sauter, Applicability of Internet protocols for fieldbus access, in IEEE International Workshop on Factory Communication Systems (WFCS), Västerås, Sweden, August 2002, pp. 205–213.
[19] W. Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, 3rd ed., Addison-Wesley, Reading, MA, 1999.
[20] D. Perkins and E. McGinnis, Understanding SNMP MIBs, Prentice Hall, Upper Saddle River, NJ, 1997.
[21] U. Blumenthal and B. Wijnen, RFC 2574: User-Based Security Model (USM) for Version 3 of the Simple Network Management Protocol (SNMPv3), The Internet Society, April 1999, http://www.rfc-editor.org.
[22] S. Kille, W. Yeong, and T. Howes, RFC 1777: Lightweight Directory Access Protocol, The Internet Society, March 1995, http://www.rfc-editor.org.
[23] T. Dierks and C. Allen, RFC 2246: The TLS Protocol Version 1.0, The Internet Society, January 1999, http://www.rfc-editor.org.
[24] J. Hodges, R. Morgan, and M. Wahl, RFC 2830: Lightweight Directory Access Protocol (v3): Extension for Transport Layer Security, The Internet Society, May 2000, http://www.rfc-editor.org.
[25] P. Brooks, Ethernet/IP: Industrial Protocol, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Antibes Juan-les-Pins, France, October 2001, pp. 505–514.
[26] http://www.fieldbus.org/.
[27] Schneider Automation, Modbus Messaging on TCP/IP Implementation Guide, May 2002, http://www.modbus.org/.
[28] http://www.profibus.com/.
25
Extending EIA-709 Control Networks across IP Channels

Dietmar Loy, LOYTEC electronics GmbH
Stefan Soucek, LOYTEC electronics GmbH

25.1 Introduction and Overview
25.2 EIA-852 Standard
25.3 System Components
25.4 Data Communication
    Encapsulation • Packet Sequencing • Packet Aggregation • Stale Packet Detection • EIA-852 Data Packet Routing
25.5 Management
25.6 Security
25.7 Applications
25.8 Conclusions
References
25.1 Introduction and Overview
Control networks, also known as fieldbus systems, started to emerge rapidly in the early 1990s [2, 5]. Back then, control networks were designed to be used in closed systems exchanging data packets on dedicated network channels. These channels included twisted-pair media, power-line networks, wireless communication, and infrared communication. Dedicated network channels have more or less well-defined properties for bit error rates, propagation delays, and average and maximum response times, and they guarantee to maintain the packet order on the network.
With the Internet hype of the late 1990s, requests were made to extend control networks to not only span local communication within relatively small areas, but to cover networks that are spread out over cities, countries, and even continents. In building automation, facility managers wanted to connect facilities to a central location for monitoring, logging, alarming, trending, and remote maintenance [1]. This meant the engineers had to either go back to the drawing board and redesign the control network protocol to make it work in large networks, or use existing technology such as the Internet Protocol (IP) service available on the Internet to transport control network data packets across continents [3]. Of course, there are still restrictions regarding defined maximum response times, proper packet ordering, and defined packet loss: all properties that are easily manageable on dedicated network channels, but can be more or less unknown or undefined in IP networks. When making the transition from dedicated networks to IP channels, the major engineering challenge is to introduce a layer of software that hides the different behavior of the IP network and ensures that certain control network properties still hold, e.g., by filtering invalid packets (such as stale packets).
FIGURE 25.1 A dedicated software layer named ANSI/EIA-852 hides limitations of the IP transport service from the control network protocol stack and encapsulates the control network data packet into an IP frame. (The figure shows applications on top of a control network protocol stack, e.g., ANSI/EIA-709, connected on one side directly to twisted-pair transceiver hardware and on the other side through ANSI/EIA-852 and TCP/UDP/IP to an Ethernet MAC/PHY.)
Figure 25.1 shows the software architecture for a typical fieldbus node connected to a dedicated network channel on the left side and to a port using the IP transport service on the right side. One can see the software layer called ANSI/EIA-852 [4], which is added between the control network protocol stack and the Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) socket interface. This layer is responsible for abstracting the IP network. It implements the necessary mechanisms to encapsulate control network packets in order to convey them transparently over the IP network. In this chapter, we concentrate on the ANSI/EIA-709 control network protocol [1]. However, the technique described here can also be used for EIA-600 and other control network protocols. The challenge in tunneling control network data packets through IP networks is to minimize the effect of the unpredictable IP timing parameters on the control network protocol stack. This is a fundamentally different approach from that of gateways, which try to abstract some application-oriented part of the fieldbus [7, 8].
25.2 EIA-852 Standard
The EIA-852 standard [4] was created to ensure interoperability of control network (CN) devices that communicate over an IP network using a native control network protocol. The basic idea is to exchange CN packets over an IP network by embedding them in IP packets. This technique is known as tunneling. In EIA-852, there are no application layer transformations involved, as in other approaches that use gateways. The tunneling approach is beneficial in systems where a number of CN devices communicate and use the IP network purely as another medium. One major design criterion of EIA-852 is to be generic enough to apply it to different CN networks. Currently, the usage for EIA-709 (LonWorks) and EIA-600 (CEBus) is defined. Most EIA-852 implementations on the market today are LonWorks based. Therefore, the description of the standard focuses on its application to EIA-709.
The network elements defined in the context of EIA-852 are CN devices. These CN devices are computing systems (e.g., PCs, embedded systems) equipped with a TCP/IP stack, the EIA-852 software, and a LonWorks protocol stack. A number of CN devices are connected over the IP network, where they form a logical network known as the IP channel. The IP channel functions as a LonWorks communication channel, and CN devices exchange data over the IP channel. The CN devices that are part of the IP channel are referred to as channel members. It should be noted that there may exist more than one IP channel on the same IP network. Although the CN devices are connected to the same physical network in this case (e.g., Ethernet), the IP channels are still isolated in the LonWorks domain.
CN devices can have different functions. Depending on their primary function, the EIA-852 standard defines the following CN device types:
• CN nodes: They operate solely as nodes in a distributed control application and are members of one IP channel. They have the same functionality as LonWorks nodes on their native communication channels and are connected to the IP channel only.
• CN/IP routers: These CN devices are LonWorks routers that connect one native LonWorks channel with one IP channel. These devices are available on the market as stand-alone LonWorks/IP routers or EIA-852 routers, e.g., the L-IP by LOYTEC or the i.LON 1000 by Echelon Corp.
• IP/IP routers: These CN devices are LonWorks routers that connect two IP channels. These devices are less common on the market today.
• CN proxies: These are CN devices that function as LonWorks proxy nodes to establish cross-domain communication, for example.
The EIA-852 addressing scheme defines two address types: (1) IP addresses associated with each CN device on the channel and (2) logical CN addresses based on the LonWorks address space. The unique ID is equivalent to the LonWorks node ID, or Neuron ID, and functions as a unique hardware address. Each CN device is accessible by its unique ID. Logical network addresses include a subnet/node (S/N) scheme. In this addressing scheme, subnet numbers are assigned to individual channels and node addresses to the CN devices on those channels. As a convention, subnets must not span multiple channels if the CN routers are used as configured LonWorks routers. The group address identifies a number of nodes independently of subnet addresses. Consequently, groups can span different channels. Finally, those logical addresses are local to a domain.
The mechanisms in EIA-852 are designed to take care of typical CN protocol properties, which differentiate them from common IP application protocols such as e-mail, Web access, or multimedia streaming. The key characteristics of control network traffic are:
• Low throughput
• Small packet sizes
• Higher sensitivity to packet loss and latency
While low throughput allows CN devices to perform more lengthy per-packet calculations, small packet sizes will lead to significant overhead in the encapsulation process. The implications of packet loss and latency on the IP network are twofold in EIA-852 systems:
• The control application has to be designed to cope with timing parameters that differ from native CN channels. This has an impact on the soft real-time properties of some applications. It is important to know the time constants in the application and determine whether the application still operates correctly when run over an IP channel [9, 10].
• The EIA-852 standard has to ensure the functionality of the CN protocol itself. LonWorks, for instance, makes strict assumptions on the maximum end-to-end delay and on packet ordering.
The resulting functional domains defined in EIA-852 to implement a CN protocol tunnel over an IP-based network can be summarized as follows:
• Ensuring correct and efficient data communication: CN packets must be encapsulated and decapsulated, packet ordering must be ensured, message overhead must be minimized, timeliness must be checked, and optional security measures can be used.
• Routing CN payload to the correct destination on the IP network: EIA-852 defines two ways to choose destination CN devices, either by using IP multicasting or by performing selective forwarding.
• Managing the IP channel members: configuring the CN devices, distributing information between the CN devices, performing access control on the IP channel, and retrieving statistics information.
25.3 System Components
In addition to assigning logical addresses to the network node as specified in the control network protocol, network nodes on an IP channel must also have an IP address. Thus, when an IP-based node needs to transmit data packets to another node on the network, it must know not only the control network address but also the IP address of the target node. A system component is required that manages the relationship between the control network address and the IP address for a logical IP channel. In the EIA-852 standard, this component is the configuration server (CS). Figure 25.2 outlines the system components required to operate and manage an IP channel.
The network in Figure 25.2 is partitioned into three parts: two traditional EIA-709 networks with subnet numbers 2 and 3 and an IP channel with subnet number 1. The IP channel consists of four CN/IP client devices and a configuration server. Two IP client devices are routers between the traditional EIA-709 network channels and the IP-852 channel (CN/IP routers). The other two devices are native IP-based CN nodes.
Let us assume that the node with subnet/node address 1/1 on the IP-852 channel wants to send a data packet to the network node with subnet/node address 1/2, which is also connected to the IP-852 channel. In order for node 1/1 to send out a packet to node 1/2, it must know its peer IP address, 192.168.1.102. This information is maintained by the CS hosted on IP node 192.168.1.100 and distributed to all nodes on the IP-852 channel. Node 1/1 queries this information from the CS and therefore knows the mappings between subnet/node addresses and IP addresses for each node on the IP-852 channel. In our example, node 1/1 generates a UDP packet that is sent to node 1/2 at IP address 192.168.1.102. If this packet is sent using acknowledged service, node 1/2 replies with an acknowledgment packet to node 1/1. This is the same behavior as on a traditional EIA-709 channel, except that node 1/2 must not only know the source subnet/node address to send the reply to, but also under which IP address this subnet/node address is reachable.
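The address mapping maintained by the configuration server can be pictured as a simple lookup table distributed to all channel members; selecting the peer IP address for a given EIA-709 subnet/node address then amounts to a table lookup, as the following sketch shows. The table contents mirror the example of Figure 25.2; the function name is not part of the standard.

    # Channel routing information as distributed by the configuration server:
    # EIA-709 subnet/node address -> IP address of the channel member.
    channel_routing = {
        (1, 1): "192.168.1.101",
        (1, 2): "192.168.1.102",
        (1, 3): "192.168.1.103",   # CN/IP router toward subnet 2
        (1, 4): "192.168.1.104",   # CN/IP router toward subnet 3
    }

    def resolve_destination(subnet, node):
        """Channel routing: pick the IP address to which the encapsulated
        packet for the given subnet/node address must be sent."""
        try:
            return channel_routing[(subnet, node)]
        except KeyError:
            raise LookupError("no channel member registered for %d/%d" % (subnet, node))

    print(resolve_destination(1, 2))   # 192.168.1.102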
FIGURE 25.2 System components constituting an IP channel, including the IP channel, configuration server, and IP-based nodes acting as configuration clients.
The reply packet must be addressed to the IP address of node 1/1, which is 192.168.1.101. To summarize the above, the process of selecting an appropriate IP address corresponding to a specific EIA-709 destination address is called channel routing. Each node that joins an IP channel must send its assigned EIA-709 address information to the CS, which associates this address information with the IP address. This is called channel routing information. Whenever the channel routing information changes, a client node on the IP channel must create a new channel routing packet and send it to the CS, which in turn distributes the new information to all other members on the IP channel. CN/IP router devices bridge the gap between traditional EIA-709 networks and IP channels, as shown in Figure 25.2. For example, if node x on subnet 2 wants to send a message to node y on subnet 3, then the router with IP address 192.168.1.103 forwards the packet to the router with IP address 192.168.1.104, which forwards the packet to the destination node y on subnet 3. One can see that even though there is an IP channel on the communication path between node x on subnet 2 and node y on subnet 3, the source node x on subnet 2 does not even notice this IP channel in between the source and the destination node. As a result, it is now possible to connect any two local control networks with IP channels that can span either intranets or the Internet. The IP channel can be operated on different network media that offer the IP service. Commonly, 10BaseT or 100BaseT Ethernet networks are used for the IP channel. But emerging wireless technologies such as 802.11b or 802.11a and Bluetooth, as well as Plain Old Telephony Service (POTS) and Integrated Services Digital Network (ISDN) lines, cable networks, and fiber-optic cables, are also used. This opens up application areas for existing control network protocols that could not be served by the traditional technology.
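To make the role of the CS-distributed address mapping concrete, the following Python sketch shows how a CN device might look up the peer IP address for a destination subnet/node address before sending a UDP frame to the reserved EIA-852 data port. The table contents mirror the example of Figure 25.2; the function and variable names are hypothetical, and the frame is sent unencapsulated for brevity.

import socket

# Hypothetical mapping distributed by the configuration server (CS):
# EIA-709 subnet/node address -> IP address of the CN device (Figure 25.2).
channel_routing_table = {
    (1, 1): "192.168.1.101",
    (1, 2): "192.168.1.102",
    (1, 3): "192.168.1.103",   # CN/IP router toward subnet 2
    (1, 4): "192.168.1.104",   # CN/IP router toward subnet 3
}

EIA852_DATA_PORT = 1628  # reserved port for EIA-852 data communication

def send_cn_packet(dest_subnet_node, cn_frame):
    """Resolve the destination subnet/node address and send the frame via UDP."""
    ip = channel_routing_table[dest_subnet_node]
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # In a full implementation the frame would first be wrapped in an
        # EIA-852 data packet (see Section 25.4.1); here it is sent as-is.
        sock.sendto(cn_frame, (ip, EIA852_DATA_PORT))
    finally:
        sock.close()

# Example: node 1/1 sends a frame to node 1/2.
send_cn_packet((1, 2), b"example-cn-frame")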
25.4 Data Communication CN devices exchange CN data packets over the IP channel. This is referred to as data communication to distinguish the traffic from management communication between the CS and CN devices. There is always a source of a CN data packet and a sink. In the following, we shall refer to the CN device that acts as a source as the sender and to the CN device that acts as the sink as the receiver. It should be noted that this sender–receiver relation is only valid in the context of a single data packet, or a series of data packets, in the same direction. Generally, each CN device can act as a sender as well as a receiver on the IP channel at the same time. CN nodes are true sources and sinks of CN data. CN/IP routers are intermediary devices between the actual sources and sinks of the communication. This detail shall be neglected as the CN/IP routers appear on the IP channel as senders and receivers. The diagram in Figure 25.3 shows the functional blocks in the sender and receiver. When a data packet is generated in the CN node or has to be passed on to the IP channel by a CN/IP router, the sender packs the CN packet in an EIA-852 data packet, adds a sequence number and a time stamp, routes the EIA-852 packet to the appropriate channel members, and finally may collect a bunch of packets before actually sending them on the wire (bunching). The receiver performs the reverse operations: it separates the aggregated packets (debunch), runs them through a sequencer and a stale packet detector, and unpacks the original CN frames. These functions are explained in more detail in the following sections.
25.4.1 Encapsulation The process of embedding EIA-709 frames in EIA-852 data packets is referred to as encapsulation. The native EIA-709 packets are encapsulated in UDP frames and routed to the respective CN devices on the IP channel. EIA-852 uses the reserved port number 1628 for EIA-852 communication on CN devices. The configuration server uses port 1629. This makes it possible to combine a configuration server and a CN device on the same system. The choice of UDP transport over TCP has several advantages: First, because of the connectionless nature of LonWorks communication, it would incur unnecessary overhead to manage connection setup in TCP.
FIGURE 25.3 Functional blocks in EIA-852 data communication: sender (pack CN packet, add sequence, add time stamp, route, bunch) and receiver (debunch, sequencer, stale filter, unpack CN packet).
Second, TCP ensures delivery through retransmissions. Because of the (soft) real-time requirements of control applications, retransmissions will most likely miss the deadlines. Third, the LonWorks protocol implements its own retransmission scheme where necessary. Figure 25.4 shows the format of the native LonWorks packet. The preamble bits at the start of the frame and the code violation (CV) after the cyclic redundancy check (CRC) are used for bus arbitration and end-of-frame marking on native EIA-709 channels. Not all parts of the native frame are subject to encapsulation. The fields starting with the control field and ending with the CRC information are taken from the native LonWorks frame and encapsulated in the UDP frame. Figure 25.5 shows the EIA-852 common header encapsulation format, which is the payload of a UDP frame. Data fields in EIA-852 packets are defined in network order, transmitting the most significant byte first. The data packet length specifies the complete length of the EIA-852 packet, including the two length bytes. The version number is currently set to 1. The packet type specifies how the EIA-852 payload after the common header is to be interpreted. An EIA-852 data packet, which encapsulates the CN frame, is of type 0x01. For more packet types, refer to Table 25.1. The extended header size can be used to add more fields to the common header. It specifies them in quantities of four bytes. This ensures that the common header is always a multiple of four bytes. The protocol flags define which CN protocol is to be tunneled (0x00 for LonWorks). The vendor code allows implementers to include their own vendor-specific extensions, which can be transported along with standard EIA-852 messages.
FIGURE 25.4 Native EIA-709 frame format (Preamble, Control, Source Address, Destination Address, User Data, CRC, CV).

FIGURE 25.5 EIA-852 common header format (Data Packet Length, Version, Packet Type, Ext. Header Size, Protocol Flags, Vendor Code, Session ID, Sequence Number, Time Stamp, Packet Specific Data).
TABLE 25.1 EIA-852 Packet Types

Packet Name                               Packet Type   Function
Data packet                               0x01          Encapsulates the CN frame
Device configuration request (DC-REQ)     0x63          Requests a DC from the CN device
Device registration (DEVREG)              0x03          Device registers with the CS
Device configuration (DC)                 0x71          DC requested from the device or unsolicited DC sent by the CS
Channel list request (REQ-CM)             0x64          Device requests CM from the CS
Channel list (CM)                         0x04          CS responds with the CM
Send list request (REQ-SL)                0x66          Request the send list from the CS
Send list (SL)                            0x06          CS responds with the send list
Channel routing request (REQ-CR)          0x68          CN device requests a CR from the CS for a specific member
Channel routing packet (CR)               0x08          CS responds to a CR request
Acknowledge (ACK)                         0x07          Acknowledgment message from device or CS
Segment (SEG)                             0x7F          Segment of a large data structure
Statistic request (REQ-STAT)              0x60          Request statistics from a CN device
Statistic response (STAT)                 0x70          CN device responds with statistic data
The session ID is a 32-bit value that a CN device keeps at random. After a CN device reboots or resets, the session ID must be different from the preceding one. The sequence number field is valid for EIA-852 data packets and is used to detect out-of-sequence packets. The time stamp is also used for EIA-852 data messages and is a millisecond value. Therefore, it wraps around every 49 days.
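As an illustration of the common header described above, the sketch below packs the fields in network byte order. The field order and the assumed 20-byte fixed size are inferred from Figure 25.5 and the surrounding text rather than taken from the standard [4], so the exact layout should be checked against the specification.

import struct, time

def pack_common_header(packet_type, session_id, sequence, payload_len,
                       version=1, ext_hdr_size=0, protocol_flags=0x00,
                       vendor_code=0x0000):
    """Pack an EIA-852 common header (field order and size assumed from Figure 25.5)."""
    timestamp_ms = int(time.time() * 1000) & 0xFFFFFFFF  # wraps every ~49 days
    header_len = 20                       # assumed size of this fixed header
    total_len = header_len + payload_len  # packet length includes the length bytes
    return struct.pack(
        "!HBBBBHIII",        # network byte order (most significant byte first)
        total_len,           # data packet length
        version,             # currently 1
        packet_type,         # 0x01 = data packet (Table 25.1)
        ext_hdr_size,        # extended header size, in 4-byte units
        protocol_flags,      # 0x00 = LonWorks
        vendor_code,         # vendor-specific extensions
        session_id,          # random 32-bit value chosen per boot/reset
        sequence,            # per-destination sequence number
        timestamp_ms,        # millisecond time stamp
    )

cn_frame = b"example-encapsulated-cn-frame"   # placeholder payload
packet = pack_common_header(0x01, session_id=0x1234ABCD,
                            sequence=1, payload_len=len(cn_frame)) + cn_frame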
25.4.2 Packet Sequencing Packet sequencing in EIA-852 is based on the pair session ID and sequence number, which are both part of the EIA-852 common header. An EIA-852 receiver remembers the session ID–sequence number pair for each sending CN device (i.e., for each source IP address). If EIA-852 data packets from a given source are received with the same session ID, they are in-sequence provided their sequence numbers are increasing. If data packets get reordered or are dropped on the network, there appear gaps in the sequence numbers. A receiver basically has two choices: (1) continue and drop the missing packets later or (2) wait for the missing packets. If the receiver waits, it has to keep data packets after such a gap in escrow until a defined escrow timeout (ETO). If the missing packets arrive before this time-out, the packets are passed on in their correct sequence. If they do not arrive in time, they will be dropped later. Without escrowing, all out-of-sequence packets are dropped immediately. This may cause unnecessary packet loss on IP channels where packets may get reordered in a small time window. If the session ID changes, the receiver has to assume that the source has been reset and it accepts the first packet with that new session ID as in-sequence, clearing the sequence numbers from the previous session. This is a design choice of EIA-852 and can cause problems with replay attacks. See Section 25.6 for more detail on this issue. The ETO is a CN device parameter in most implementations. This time constant has to be considered when designing a control application. It is always a trade-off between unnecessarily losing reordered packets and delaying escrowed packets for too long when packets actually are lost on the IP link. If the ETO is big, more packets can be reordered because the wait time for them is longer. For the same reason, if a packet is actually lost on the IP link, the wait time can approach the ETO in the worst case. Applications that primarily rely on request–response style communication on LANs should choose a small ETO (e.g., 5 ms) or disable it. Applications that are based on a continuous unidirectional sample stream over a wide area network (WAN) link should choose a large enough ETO. A typical default value on most implementations is 64 ms.
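The following minimal Python sketch illustrates the session ID/sequence number bookkeeping described above, in the variant without escrowing (out-of-sequence packets are simply rejected); the class and method names are illustrative only.

class SimpleSequencer:
    """Track the session ID / sequence number pair per source (escrow omitted)."""

    def __init__(self):
        self.state = {}        # source IP -> (session_id, highest sequence seen)

    def in_sequence(self, source, session_id, seq):
        sid, last = self.state.get(source, (None, None))
        if sid != session_id:
            # New or changed session ID: assume the source was reset and
            # accept this packet as the first one of the new session.
            self.state[source] = (session_id, seq)
            return True
        if seq > last:
            self.state[source] = (session_id, seq)
            return True
        return False           # out of sequence: drop (or escrow until the ETO)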
25.4.3 Packet Aggregation The native message sizes of typical CN traffic are small, around 10 to 20 bytes in LonWorks. Encapsulating each CN packet into a UDP frame adds the UDP, IP, and Ethernet headers (or other media-specific headers), which account for more than 40 bytes. This incurs unnecessary overhead, which may lead to noticeable performance problems on embedded implementations. Those systems typically suffer from an IP transmission bottleneck. A technique to reduce this overhead is to aggregate more than one EIA-852 data packet in a single UDP frame (packet bunching). Clearly, the more packets that are bunched, the less the per-CN packet overhead that is added by the encapsulation. When aggregated, EIA-852 data messages are concatenated in the UDP frame. The UDP packet is then transmitted to the destination IP address. Therefore, messages can only be aggregated when they share the same destination. The sender may thus be configured to keep data packets and aggregate them until an aggregation time-out (ATO) before sending the UDP frame on the IP link. The ATO value is a CN device parameter in most implementations. The effect of packet bunching has to be considered in the design of control applications. There is a trade-off between overhead reduction and extra delay. In the worst case, an EIA-852 data packet is delayed for the full ATO until it finally gets on the wire. For highly responsive applications with low packet rates on LANs, packet bunching should be disabled. For applications with high packet rates over WANs, packet bunching should be enabled. A typical default value in most implementations is 16 ms.
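A possible shape of such an aggregator is sketched below; the callback-based interface and the polling loop are assumptions made for illustration, not the structure of any particular implementation.

import time

class PacketAggregator:
    """Bunch EIA-852 data packets per destination until the ATO expires."""

    def __init__(self, send, ato=0.016):         # 16 ms default ATO
        self.send = send                          # callback: send(dest, udp_payload)
        self.ato = ato
        self.pending = {}                         # dest -> (first arrival, [packets])

    def submit(self, dest, eia852_packet):
        first, packets = self.pending.setdefault(dest, (time.monotonic(), []))
        packets.append(eia852_packet)

    def poll(self):
        """Call periodically: flush destinations whose oldest packet exceeds the ATO."""
        now = time.monotonic()
        for dest in [d for d, (t, _) in self.pending.items() if now - t >= self.ato]:
            _, packets = self.pending.pop(dest)
            # Aggregated messages are simply concatenated into one UDP payload;
            # each carries its own length field, so the receiver can debunch them.
            self.send(dest, b"".join(packets))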
25.4.4 Stale Packet Detection Packet delay is a major issue on IP channels. Packet delays can be highly random with a large standard deviation on wide area IP links [13]. There are two problems with this: (1) the control application may miss some of its deadlines, and (2) the CN protocol may break if packets are received after they are delayed too long. For example, the LonWorks duplicate detection mechanism is sensitive to high delay variance on the network. While the first problem must be solved by carefully designing the control application [6], the second problem can be eliminated by discarding packets that are delayed beyond a certain time limit (stale packets). The mechanism is called stale packet detection and is based on one-way packet delay measurements. In order to make such measurements, CN devices must synchronize their clocks. Clock synchronization in EIA-852 is performed via the Simple Network Time Protocol (SNTP) [11]. The sender generates a millisecond time stamp and puts it into the EIA-852 common header. If the measured delay of a packet exceeds the channel time-out (CTO), the packet is called a stale packet and the receiver discards it in the stale packet filter. The CTO is a channel parameter and is distributed among the CN devices. Applications on LANs can disable stale packet detection because the delay variance is typically very low and comparable with native LonWorks channels. On WAN links, it is recommended to enable the CTO. Typical values are the average ping delay on the IP channel plus aggregation time-out. The lower bound of the CTO is restricted by the resolution of the clock synchronization. On certain systems (e.g., Windows), the resolution may drop to 50 ms. Practical CTO values start at 200 ms. If certain CN devices are Windows implementations, it must be ensured that the Windows system clock is synchronized. Dropped packets due to stale packet detection can be observed in the EIA-852 statistics of stale packets (Table 25.2). Stale packet filtering also plays an important role in the context of security. Refer to Section 25.6 for more details.
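A stale packet check might look like the following sketch, which assumes SNTP-synchronized clocks and 32-bit millisecond time stamps as described above; the constant and function names are hypothetical.

import time

CHANNEL_TIMEOUT_MS = 200          # CTO: a practical lower bound, per the text above

def is_stale(packet_timestamp_ms, cto_ms=CHANNEL_TIMEOUT_MS):
    """Return True if the one-way delay exceeds the channel time-out (CTO).

    Assumes both devices are SNTP-synchronized and that time stamps are
    32-bit millisecond values, so the comparison must tolerate wrap-around.
    """
    now_ms = int(time.time() * 1000) & 0xFFFFFFFF
    delay = (now_ms - packet_timestamp_ms) & 0xFFFFFFFF   # modulo-2^32 difference
    return delay > cto_ms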
25.4.5 EIA-852 Data Packet Routing One of the EIA-852 CN device tasks is to route EIA-852 data packets to the appropriate IP channel members. In this context, it is important to notice that each channel member is associated with an IP address. If a given CN data packet needs to be routed to multiple IP addresses, it is copied into several EIA-852 data packets, each sent to one of the selected CN devices on the channel. It should be noted that sequence numbers are associated per destination address.
TABLE 25.2 EIA-852 Statistics Data

Data                        Explanation
Seconds since cleared       Number of seconds since the statistics have been reset
Date/time of clear (GMT)    Date and time when the statistics have been reset
Number of members           Current number of channel members
LT packets received         CN packets received from LonWorks
LT packets sent             CN packets sent to LonWorks
IP packets sent             CN packets sent on to the IP channel
IP bytes sent               Aggregate number of bytes in CN packets sent on to the IP channel
IP packets received         Number of CN packets received from the IP channel
IP bytes received           Aggregate number of bytes in CN packets received from the IP channel
LT stale packets            Number of packets dropped because they are stale
RFC packets sent            Number of management data packets sent to the CS
RFC packets received        Number of management data packets received from the CS
Avg. aggregation to IP      Average aggregation of transmitted CN packets on the IP channel
Avg. aggregation from IP    Average aggregation of received CN packets from the IP channel
UDP packets sent            Total number of UDP packets sent (including both data and management)
Multicast packets sent      Total number of multicast packets sent (only for send lists)
If CN data packets are routed to different IP addresses, the sequence numbers need not necessarily be the same among all destinations. This stems from the fact that some messages are routed to a specific destination and others are not. Depending on the available channel data, three variants of routing the CN data packet are possible:
1. Send list routing: A send list (SL) is an optimized list of IP addresses to reach all channel members. Typically, send lists are used when IP multicast addresses are assigned to groups of channel members. In the best case, all channel members are in the same IP multicast group. In this case, the send list contains this one multicast address. CN devices send EIA-852 data packets to all destinations in the SL.
2. Channel routing: Channel routing is used when only unicast IP addresses are available. The basic idea is to select those destination IP addresses that will accept the packet based on the LonWorks address information. To advertise which LonWorks addresses a CN device accepts, it publishes channel routing (CR) information to the other channel members. Each channel member collects all the available CR information and can route outgoing CN data packets in an optimized way. It should be noted that with channel routing, all data packets have to be copied multiple times if there are multiple advertised recipients. Especially for large groups and broadcast addresses, send lists appear much more efficient. For subnet/node or NID-addressed packets, channel routing is beneficial because the packets are routed to a single unicast address only, not burdening other CN devices with packets that they do not want.
3. Brute force: Neither send list nor channel routing information is available. The CN device resorts to the channel list and transmits every data packet to all channel members using their unicast addresses. This mode is the least efficient.
The channel routing algorithm is based on the CR information provided by the CN devices on the channel. The channel routing information for each channel member contains the following fields:
• CN device IP address and port
• CN broadcast flag
• CN device type (router, node, IP/IP router, proxy)
• CN router mode (configured, bridge, repeater)
• List of node IDs
• List of subnet/node addresses
• List of domain items (including subnet and group forwarding flags)
I.   If (CN broadcast flag is set and address == FMT_BCAST) or (router type == REPEATER and there is a domain item) then route to member.
II.  Route to CN router
     (i)   If domain is incorrect skip to III.
     (ii)  If address == FMT_GROUP and group forward flag is set for GROUP route to member.
     (iii) If SUBNET == 0 route to member.
     (iv)  If subnet forward flag is set for SUBNET route to member.
III. Route to CN device
     (i)   If address == NID and NID matches route to member.
     (ii)  If DOMAIN does not match skip to IV.
     (iii) If address == FMT_SN and subnet/node item matches route to member.
     (iv)  If address == FMT_BCAST and SUBNET == 0 route to member.
     (v)   If address == FMT_BCAST and SUBNET matches subnet/node item route to member.
IV.  Go to next member CR information.

FIGURE 25.6 Channel routing algorithm.
The domain items deserve a more thorough discussion. First, they contain a domain number. Then they specify forwarding flags for both subnets and groups. If the n-th subnet flag is set, the CN device accepts all packets destined for subnet n. If the m-th group flag is set, the CN device accepts all group packets for group m. CN routers need to specify the subnet and group forwarding flags, while CN nodes only specify the group flags. This is because CN nodes do not forward any packets, but they can be members of multiple groups. Figure 25.6 summarizes the channel routing algorithm implemented in most CN/IP router implementations. The channel routing algorithm iterates over all channel members CR information for each EIA852 data packet. § I checks the CN broadcast flag and routes all broadcast packets if set. Further, § I routes all packets if the router type is a repeater and there exists at least one domain item. The CN broadcast flag is set when the channel member is unconfigured and does not yet have a valid domain. Otherwise, those devices could not be reached over the IP channel by broadcasts. Leaving the special case of unconfigured CN devices aside, there are two types of devices on a channel: (1) CN nodes, which accept only CN packets addressed to them (e.g., their subnet/node address, NID, group), and (2) CN routers, which forward packets to other channels behind them. The channel routing algorithm therefore distinguishes those two cases. In § II, the packet is routed by looking at the channel member as a CN router (e.g., configured router, bridge). This step uses information contained in the domain items of the CR information. Therefore, a forwarding device must define at least one domain item in its CR information. § II(i) skips the channel routing for a certain domain item if the domain does not match. § II(ii) is responsible for forwarding group-addressed messages if the corresponding group forwarding flag is set in the domain item. § II(iii) checks the destination subnet part for S/N, NID, and broadcast-addressed messages. If they are domainwide (i.e., subnet equals 0), the message is forwarded. Otherwise, those messages are only forwarded in § II(iv) if the corresponding subnet forwarding flag is set in the domain item. As a consequence, CN routers must supply domain items for all domains in which they need to receive packets. In § III, the message is routed by looking at the channel member as a CN node on the IP channel. The IP channel is assigned a separate individual subnet number, and all nodes on that channel get their
individual node numbers. Therefore, devices that represent nodes on the IP channel must supply at least one subnet/node, one NID, and one domain entry in their CR information. § III(i) checks if a NID-addressed message matches with one of the NID items in the member CR information. If the NID matches, the message is forwarded to that member, ignoring any domain information. Therefore, unconfigured nodes on the IP channel can be reached via NID-addressed messages. For all other messages, § III(ii) checks if the domain matches with one domain item. If it does not, the algorithm skips to § IV. In § III(iii), subnet/node-addressed messages are routed. A subnet/node-addressed message is forwarded to a specific member if that member has a matching subnet/node entry in its CR information. § III(iv) routes all domain-wide broadcasts to a node on the IP channel. § III(v) routes all subnet broadcasts to the channel member if one of its subnet/node items has a matching subnet number. § IV iterates to the next member in the channel until the channel routing algorithm has gone through all channel members. The channel routing mechanism has the following implications for the CR information, which CN devices must provide:
• Unconfigured channel members must set the CN broadcast flag in order to get configured by LonWorks network management tools (e.g., LonMaker).
• CN routers (configured router, bridges, etc.) must provide corresponding domain items in the CR for all domains in which they want to exchange packets.
• CN nodes must provide corresponding subnet/node, NID, and domain items.
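The sketch below renders the decision logic of Figure 25.6 in Python for a single packet. The dictionary layout of the address and CR information is entirely hypothetical, and several details of the address formats are simplified, so it should be read as an illustration of the control flow rather than a conformant implementation.

# Address format constants (names are illustrative, not taken from the standard).
FMT_BCAST, FMT_GROUP, FMT_SN, FMT_NID = "bcast", "group", "sn", "nid"

def route_packet(addr, members):
    """Return the channel members that should receive the packet (cf. Figure 25.6).

    'addr' is a dict describing the destination address of the CN packet;
    each entry of 'members' is a dict holding that member's CR information.
    """
    targets = []
    for m in members:
        # I. Unconfigured members (CN broadcast flag) and repeaters.
        if (m["cn_broadcast_flag"] and addr["fmt"] == FMT_BCAST) or \
           (m["router_mode"] == "repeater" and m["domains"]):
            targets.append(m)
            continue
        routed = False
        # II. Route to the member viewed as a CN router.
        for d in m["domains"]:
            if d["domain"] != addr["domain"]:
                continue                                   # II(i): wrong domain
            if addr["fmt"] == FMT_GROUP and addr["group"] in d["group_fwd"]:
                routed = True                              # II(ii): group forwarding
            elif addr.get("subnet") == 0:
                routed = True                              # II(iii): domain-wide
            elif addr.get("subnet") in d["subnet_fwd"]:
                routed = True                              # II(iv): subnet forwarding
        # III. Route to the member viewed as a CN node on the IP channel.
        if not routed:
            if addr["fmt"] == FMT_NID and addr["nid"] in m["node_ids"]:
                routed = True                              # III(i): NID match
            elif any(d["domain"] == addr["domain"] for d in m["domains"]):
                if addr["fmt"] == FMT_SN and addr["sn"] in m["subnet_nodes"]:
                    routed = True                          # III(iii): subnet/node match
                elif addr["fmt"] == FMT_BCAST and (
                        addr["subnet"] == 0 or
                        any(s == addr["subnet"] for s, _ in m["subnet_nodes"])):
                    routed = True                          # III(iv)/(v): broadcasts
        if routed:
            targets.append(m)                              # IV: next member
    return targets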
25.5 Management Apart from data communication, a big part of the EIA-852 functionality is devoted to the management of IP channels. Management functions include controlling which CN devices can be members of a specific IP channel, configuring individual CN devices, distributing information between CN devices, and retrieving statistics information about CN devices. All relevant configuration data for a CN device is controlled by a device configuration (DC) structure. The EIA-852 management concept is based on a client–server model. A configuration server (CS) is the central part in the system, which is responsible for managing an IP channel. CN devices act as configuration clients (CCs) and request data structures from the CS. The CS only actively sends the DC to CN devices if the IP channel has been updated and it needs to announce that change. Therefore, the DC is an unsolicited message from the CS. If no channel changes are pending, the CN devices can operate without a CS. As soon as the IP channel needs to be updated, e.g., a CN device is added or recommissioned, the CS must be available again to distribute the change. The management data structures include the per-device DC, a channel membership (CM) list specifying the members on the IP channel, a send list (SL) for send list routing, which is optional, and a per-device CR information. All management data in EIA-852 is versioned. The version is a 32-bit value, which increases for newer versions. The version is usually date and time information in Network Time Protocol (NTP) second format, which may or may not be synchronized to UTC (universal time coordinated). Hence, the version is called datetime and is nonzero for valid data structures. Local data structures are checked against published datetime versions and requested if newer data are available. The CM, SL, and per-device DC are versioned by the CS, whereas the CR data are versioned by the individual CN devices. The CN devices, which constitute the IP channel, are defined in the channel list on the CS. Each CN device is uniquely identified by its IP address in the channel list. An important restriction in EIA-852 is that there cannot be two CN devices on a channel that share the same IP address but have different IP ports. Each CN device in the channel list is configured by a device configuration. The DC is entered on the CS, e.g., through a serial console or a Web interface. The following information needs to be supplied in order to add a CN device to an IP channel: • IP address and port of the CN device • Name of the CN device (typically 15 characters)
FIGURE 25.7 CS configuration data distribution.
Figure 25.7 depicts the message flow when adding or configuring a CN device. The CS adds a CN device to the channel by sending an updated and unsolicited DC message to the device. Once a CN device is added or has accepted a new configuration, it registers with the CS by sending a device registration (DEVREG). Knowing the channel is updated, the CS sends a new unsolicited DC to all the other members to announce the channel update. The other CN devices in turn request the updated data structures from the CS starting to request the channel list (REQ-CM). If the CN device finds out that the CM advertises new CR information for certain members, it requests them from the CS (REQ-CR) one by one. If a CN device is not available when it is added on the CS, it is usually marked “unregistered” in the channel list. In this case, the CS tries to recontact the CN device using an exponential backoff for the retry interval. It starts at 1 s, doubles each time, and is limited to 32 s. As a consequence, it may take up to 32 s for a device to become functional on the IP channel when it was not reachable at the time of adding it. CS implementations usually display CN devices that have registered successfully as “registered.” If a device has been registered once but is not responding to a channel update, a CS may display it as “not responding.” The operator then can take appropriate action for those devices. Access control on the CS is also defined through the channel list. If a CN device tries to register with a CS that does not include this particular device in its channel list, the CN device is rejected from the IP channel. Rejected CN devices cannot send or receive CN packets over the IP channel and usually indicate this condition as an error, e.g., a red status lamp. Some CS implementations collect information about those CN devices in a so-called orphan list. The operator then can select an orphan device and add it manually to a given IP channel. An important part of EIA-852 is the routing of CN packets to the correct CN devices over the IP network. As described in the previous section, there are two possibilities: (1) creating a send list or (2) distributing channel routing information. The CS generates the send list (SL) by looking at the IP address information in the channel list. By appropriately grouping IP multicast addresses and unicast addresses, the send list typically gets much shorter than the channel list. The send list is also versioned and distributed to the CN devices by sending an updated DC to the members. The other possibility is to distribute channel routing (CR) information. The CR, however, does not originate at the CS. It is generated by the individual CN devices because they are commissioned and have knowledge of their CN address information. Each CN device sends a CR update to the CS if its CR changes. The CS in turn generates a new channel list, which reflects the updated CR datetime version of
the device. The other devices will request the new channel list and discover that a specific device has a new CR. The other devices then request this CR from the CS as described before. If the information has changed for more than one CR, the CN devices request all newer CR data from the CS. This is an important detail because CN devices never request the new CR from the original device directly. Therefore, the CS must be available if any CN device is recommissioned. The management part of EIA-852 also defines a statistics information structure. This structure can be requested outside of the client–server model. Any CN device or the CS can request the statistics information from a CN device. The statistics data supplied are listed in Table 25.2.
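The retry behavior mentioned above (recontact intervals starting at 1 s, doubling each time, and capped at 32 s) can be sketched as follows; the send_dc callback is a hypothetical stand-in for transmitting the unsolicited DC message to the device.

import time

def recontact_intervals(start=1.0, limit=32.0):
    """Yield retry intervals for recontacting an unregistered CN device:
    start at 1 s, double each time, and cap at 32 s (as described above)."""
    interval = start
    while True:
        yield interval
        interval = min(interval * 2, limit)

def try_register(send_dc, max_attempts=10):
    """Keep retrying until the device acknowledges the DC (send_dc is hypothetical)."""
    for _, wait in zip(range(max_attempts), recontact_intervals()):
        if send_dc():            # returns True when the device registers (DEVREG)
            return True
        time.sleep(wait)
    return False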
25.6 Security Security is a growing area of concern for IP networks. This also includes EIA-852 traffic. While native LonWorks channels are local (e.g., to a building) and can be accessed only locally, IP channels are established on an open medium that may even span across WAN links. Typical security issues are privacy, authenticity, and integrity [14]. Security measures defined in EIA-852 refer to the problem of authenticity only. They ensure that a received data packet is actually sent by an authentic channel member and not by an attacker. The principle is based on secure message authentication codes and a shared secret. The secure hash function algorithm MD5 [12] is used to create a unique fingerprint of an EIA-852 message. Included in the fingerprint calculation is a secret key, which is not transmitted over the wire. The receiver can perform the same calculation, supplying its own copy of the secret key, and compare the hash codes. If they match, the message is authentic. The MD5 hash is transmitted right after the EIA-852 message. Its fixed length is 128 bits and is not included in the EIA-852 packet size field. Instead, channel members are configured for secure operation (MD5 mode) and expect the extra 128 bits after each EIA-852 packet. This is especially important for bunched UDP frames. If packets are bunched, the next EIA-852 message follows the MD5 hash. The calculation principle is depicted in Figure 25.8. The MD5 hash following the EIA-852 message is set to zero (MBZ). The MD5 algorithm is run over the EIA-852 message, the zero hash placeholder, and the 128-bit shared secret. The calculated hash value is copied into the placeholder and the message is transmitted excluding the shared secret. The receiver performs the same calculation. It just saves the MD5 hash and then sets it to zero before calculating the MD5 hash locally. If the saved and locally computed hash values match, the message contents are authentic. To prevent replay attacks, stale packet detection must be enabled. The reason for this is the sequencing algorithm in EIA-852. An attacker may record a sequence of authenticated messages known to trigger a specific action. These messages will share a common session ID. At a later point in time, the CN device may have chosen a different session ID (e.g., after a power outage). At this moment, the recorded sequence can be replayed successfully because the CN device would accept the old session ID as a new one and start over with counting the sequence numbers. Stale packet detection solves this problem because the EIA-852 messages also include time stamps. Since the CN devices are synchronized to UTC, a playback at a later time is not possible. The CN devices would drop those packets as stale packets.

FIGURE 25.8 MD5 authentication of EIA-852 packets.
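A minimal sketch of this calculation, using Python's hashlib, is shown below. It only illustrates the principle of hashing the message, a zeroed 128-bit placeholder, and the shared secret; framing details such as bunched packets are ignored, and the example secret is a placeholder.

import hashlib

def md5_authenticate(eia852_message, shared_secret):
    """Append the MD5 authentication trailer to an outgoing EIA-852 message.

    The digest covers the message, a zeroed 128-bit placeholder (MBZ), and the
    128-bit shared secret; the secret itself is never transmitted.
    """
    placeholder = bytes(16)                               # 128-bit MBZ field
    digest = hashlib.md5(eia852_message + placeholder + shared_secret).digest()
    return eia852_message + digest                        # secret is not sent

def md5_verify(received, shared_secret):
    """Check the trailer of a received message; returns (authentic, message)."""
    message, received_hash = received[:-16], received[-16:]
    expected = hashlib.md5(message + bytes(16) + shared_secret).digest()
    return expected == received_hash, message

secret = bytes(16)                                        # example 128-bit secret
ok, msg = md5_verify(md5_authenticate(b"example EIA-852 packet", secret), secret)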
FIGURE 25.9 Simple NAT configuration.

FIGURE 25.10 Complex NAT configuration.
Another security aspect in modern networks concerns firewalls and network address translators (NATs). Firewalls, in principle, only filter traffic. Therefore, CN devices can be operated behind a firewall if the filter rules are set accordingly, allowing UDP traffic on ports 1628 and 1629 to pass. The case with NAT routers is different. NATs actually alter IP addresses and port numbers. This is a problem for EIA-852 because channel members on the IP channel are uniquely identified by their IP address. If that address is changed in the public realm, they cannot be part of the IP channel. Some implementations of CN/IP routers therefore allow for a special NAT mode (e.g., the L-IP by LOYTEC). In this mode, a CN device is configured with the public address of its NAT router. It represents this public address in its device configuration and other members will see its public address in the channel list and CR information. The NAT itself must be configured with a port-forwarding rule to route all EIA-852 packets from the public interface to the CN device in the private realm. Figure 25.9 illustrates this setup. NAT1 contains a port-forwarding rule to route all packets it receives at IP address 80.41.6.3 on ports 1628 and 1629 to the private IP address 192.168.1.250 on the respective ports. The drawback of the current solution is that only one CN device per IP channel can be operated behind a NAT. This is because there would be only one public IP address visible for multiple private CN devices. This violates the rule that each CN device must have a unique IP address in the channel list. System designers have to split the IP channels if they want to operate more CN devices in the private realm. Figure 25.10 shows a possible solution where one IP channel is established over the public realm having one CN device behind each NAT. The CN device can be either an IP/IP router or two back-to-back CN/IP routers to transfer the CN packets to the private IP channel. In any case, NAT setups require careful, handcrafted configuration of the NAT routers.
25.7 Applications So far we have learned what the new IP tunneling technology following the ANSI/EIA-852 standard can do in theory. Here we show the application scenarios that utilize this new technology. We have seen that it is possible to connect control networks via intranets, and this is exactly what is starting to emerge in larger buildings.
FIGURE 25.11 The IP channel can utilize the existing IP infrastructure in a building to connect several control networks to a large building automation network.
Typically, office buildings have a sophisticated IP infrastructure; therefore, it is beneficial to utilize this IP infrastructure for services other than PC networking. As shown in Figure 25.11, control-network-to-IP routers like the L-IP from LOYTEC or the i.LON by Echelon can utilize the existing IP infrastructure to form a high-speed backbone network in order to connect the control networks on different floors. This setup allows transparent communication between offices on different floors and to a central SCADA (supervisory control and data acquisition) system using the IP backbone. Another steadily growing application area is internetworking between different buildings that are connected via either an intranet or the Internet. Once the building infrastructure is online, it is only a small step to remote facility management and remote maintenance. Figure 25.12 shows a typical scenario of how building complexes are networked to form one large control network. A management PC connected anywhere to the network allows remote network maintenance, remote trending, alarming, logging, and preventive system repairs. Another advantage, which was initially not so obvious, is the fact that network management tools and network troubleshooting tools can be used together with the new IP technology. This means that thousands of installers and system integrators who have been trained over the past 10 years will use this knowledge, with a little bit of additional training, to install networks that span intranets and the Internet. Once the control network is able to use the IP service as a transport medium, modern wireless technologies can be utilized to create new application areas. One can, for example, walk around a building with a wireless PDA and control the building infrastructure, such as lighting, heating, AC, and blinds, and even carry out network maintenance tasks on the spot. It is now possible to monitor and control remote wastewater pump stations that are connected to the control center via an RF link.
FIGURE 25.12 Connecting different buildings over an intranet or the Internet allows transparent communication of control network data between nodes on either network.
25.8 Conclusions The world is getting more and more connected, and one of the driving factors is the Internet Protocol. Connecting PCs to the Internet was the hype of the 1990s; connecting everyday devices to the Internet is the challenge of the new millennium. The groundwork has been laid to provide technologies that will satisfy those demands. It is now a matter of time until your bathroom fan can be controlled from your cell phone. In the meantime, facility managers are concentrating on applications that save money, protect environmental resources, and improve the reliability of the network. We have shown that existing control network protocols are far from the end of their life cycle, but are instead experiencing a revival driven by the push of IP technology.
References
[1] D. Dietrich, D. Loy, H.-J. Schweinzer, LON-Technologie, Verteilte Systeme in der Anwendung, 2. Auflage, Hüthig Verlag, Heidelberg, 1999.
[2] D. Dietrich, P. Neumann, H. Schweinzer, Fieldbus Technology: System Integration, Networking, and Engineering, Springer, New York, 1999.
[3] A.S. Tanenbaum, Computer Networks, 2nd ed., Prentice Hall, Englewood Cliffs, NJ, 1989.
[4] EIA/CEA-852, Tunneling Component Network Protocols over Internet Protocol Channels, November 2001.
[5] J.-P. Thomesse, M. Leon Chavez, Main Paradigms as a Basis for Current Fieldbus Concepts, in Fieldbus Technology, Springer, New York, 1999, pp. 2–15.
[6] S. Soucek, T. Sauter, G. Koller, Effect of delay jitter on quality of control in EIA-852-based networks, in Proceedings of the 29th Annual Conference of the IEEE Industrial Electronics Society, Virginia, November 2003.
[7] T. Sauter, P. Palensky, A closer look into Internet-fieldbus connectivity, in Proceedings of WFCS ’00, IEEE, Porto, Portugal, 2000.
[8] P. Neumann, F. Iwanitz, Integration of fieldbus systems into distributed object-oriented systems, in Proceedings of WFCS ’97, IEEE, 1997, pp. 247–253.
[9] S. Soucek, T. Sauter, T. Rauscher, A scheme to determine QoS requirements for control network data over IP, in 27th Annual Conference of the IEEE Industrial Electronics Society (IECON), Denver, CO, November–December 2001, pp. 153–158.
[10] S. Soucek, T. Sauter, G. Koller, Impact of QoS parameters on Internet-based EIA-709.1 control applications, in Proceedings of the 28th Annual Conference of the IEEE Industrial Electronics Society, Seville, Spain, November 2002, pp. 3176–3181.
[11] D. Mills, RFC 2030, Simple Network Time Protocol (SNTP) Version 4 for IPv4, IPv6 and OSI, University of Delaware, October 1996.
[12] R. Rivest, RFC 1321, The MD5 Message-Digest Algorithm, April 1992.
[13] V. Paxson, S. Floyd, Wide-area traffic: the failure of Poisson modeling, IEEE/ACM Transactions on Networking, 3, 226–244, 1995.
[14] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd ed., John Wiley & Sons, New York, 1995.
26 Interconnection of Wireline and Wireless Fieldbuses

Jean-Dominique Decotignie
CSEM (Centre Suisse d'Electronique et de Microtechnique)

26.1 Introduction
26.2 Context and Definitions
    Fieldbus Requirements • Important Radio Transmission Properties • Definitions
26.3 Interconnection Means
    Repeaters • Bridges • Routers • Gateways
26.4 Major Design Alternatives
26.5 Solutions for the Interconnection
    Repeater-Based Solutions • Bridge-Based Solutions • Gateway-Based Solutions
26.6 Amenability to Comply with the Fieldbus Requirements
26.7 Conclusion
References
26.1 Introduction Networking at the factory floor is now commonplace. Solutions are available at different levels: sensory, cell, plant, etc. In this chapter, we focus on the sensory level where networks usually called fieldbuses (Pleinevaux and Decotignie, 1988) link sensors and actuators to the first level of automation. This kind of network is no longer exclusive to factories and finds its way into cars, planes, and buildings. Since the first solutions were designed in the early 1980s, a number of proposals have flourished and the field is well established in terms of both research and industrial use. Most solutions use wired transmission (twisted pairs, coaxial cables, optical fibers). Very early, the need for mobile nodes and the difficulty of installing cables pushed for wireless solutions based on radio or light transmission. Today, most solutions use radio transmission. Due to the special properties of wireless transmission, it is not desirable to have all nodes of the fieldbus be wireless. On the other hand, the wired devices need to communicate with the wireless nodes, thus creating the need for interconnection means. However, the differences between the properties of wired and wireless transmissions introduce constraints in this interconnection. This is exacerbated by the special requirements put on fieldbuses. The chapter is organized as follows. Section 26.2 gives some definitions that will be used in the rest of the chapter. It also explains the relevant properties of wireless transmission. Section 26.3 details the different interconnecting mechanisms that can be used. Section 26.4 presents the different architectural options that are used when designing an interconnection architecture. Section 26.5 details
how the various options are used in the different proposals and the associated difficulties. Section 26.6 uses the model to derive solutions that have not yet been proposed and discusses their applicability. Some conclusions and possible future work are given in the last section.
26.2 Context and Definitions As mentioned above, the chapter concentrates on field-level devices that communicate through a mix of wireless and wireline transmission. Although this is a very important topic, in this chapter, we do not consider the issue of power consumption in the wireless nodes.
26.2.1 Fieldbus Requirements Requirements for fieldbuses are numerous and cover all the life cycle activities. Here we restrict ourselves to those properties that are particularly relevant for the interconnection (for a detailed explanation, refer to Chapter 19, “Which Network for Which Application”): • Handle periodic traffic with different period durations. In many fieldbus solutions, this requirement is translated into some cyclic traffic. The real need is to be able to transport the information well before the end of the period at which the data has been sampled. • Handle sporadic traffic with bounded latency. • Allow for quasi-simultaneous sampling of a number of inputs on different network nodes. • Provide indication for temporal consistency. The fact is that control or acquisition systems expect that different sensed values correspond to sampling instants, which should be within a few percent of the sampling period. The network should thus provide ways to know if a set of values exhibits this property, named relative temporal consistency. Sometimes the age (time elapsed since sampling) of data, also called absolute temporal consistency (Kopetz, 1988), is also important to its users. • For sporadic traffic, provide ways to know the order in which events have occurred. An application will make different decisions depending on the order in which events have occurred. As the events are potentially detected on different nodes of the network, there should be a way to find out the order. • Transfer data from one node to another or from one node to a number of others. • Rugged solutions in terms of resistance to interference, vibrations, etc. It is important to notice the special temporal requirement, which will have a direct effect on the interconnection.
26.2.2 Important Radio Transmission Properties Wireless transmissions come in two main categories: radio- and light-based systems. They both exhibit characteristics that make them very different from cable-based transmission. For radio transmission, the main properties are:
• Property 1: Compared to cables, radio transmissions suffer from bit error rates (BERs) that are some orders of magnitude higher. BERs of 10^-3 to 10^-4 are usual, whereas in cables one may expect BERs ranging from 10^-7 to 10^-9. Error detection schemes should thus be enhanced accordingly. This is especially a concern for token-based systems because the token recovery takes quite a long time compared to the temporal constraints.
• Property 2: Spatial reuse is low, as spectrum is limited. This means that either coexistence of several systems in the same area should be planned (code or frequency allocation) or the medium access control (MAC) should be designed in a way that takes care of the interference between systems.
• Property 3: Perturbing systems can easily jam radio transmission. This is especially true in the ISM (industrial, scientific, and medical) bands. For instance, in the 2.4-GHz band, high-power medical devices are allowed. They may completely suppress all communications for long periods.
• Property 4: Transmission distances are smaller. Typical ranges are a few tens of meters indoors and up to 300 m outdoors. Obstacles may further limit this distance.
• Property 5: Collisions cannot be detected while emitting. The power of remote emitters is much lower than the power of the transmitter emission that masks the others.
• Property 6: A transceiver needs a longer time (up to a few milliseconds) to switch from emission to reception and vice versa. This has to be taken into account when designing the protocol. In particular, protocols that require immediate answers to incoming requests cannot be used.
• Property 7: Radio transmissions suffer from frequency-selective multipath fading. Waves may follow different paths that interfere destructively at the receiver site. Communication may be impossible at some points.
Optical waves may be used as an alternative to radio transmission. Here we will restrict the discussion to infrared transmission, as it is the most commonly used. Its special properties are:
• Property 8: Transmission operates only in line of sight. An emitter and a receiver should have direct visibility. This constraint is often relaxed by using satellite-like techniques. A special device that acts as a repeater is located in a place where it “sees” all the other devices. This location is often on the ceiling of the room and thus is named satellite. However, communication may be impossible at some points.
• Property 9: Sources of heat (sun, machines, heaters, etc.) interfere with transmission and induce errors.
• Property 10: Spectrum reuse is limited as all systems share the same wavelength.
In summary, wireless transmission means exhibit properties that are significantly different from those of wires. This will have an impact on the solutions that can be used to interconnect wired and wireless nodes.
26.2.3 Definitions In order to clarify the different architectural options, it is useful to define some terms: • Data circuit (ISO/IEC 7498:1, 1996) — “A common path in the physical media for OSI among two or more physical-entities together with the facilities necessary in the physical layer for the transmission of bits on it.” • Subnetwork (ISO/IEC 7498:1, 1996) — “An abstraction of a real subnetwork.” • Real subnetwork (ISO/IEC 7498:1, 1996) — “A collection of equipments and physical media which form an autonomous whole and which can be used to interconnect real systems for the purpose of data transfers.” • Data link (ISO/IEC 8802.2, 1998) — “An assembly of two or more terminal installations and the interconnecting communication channel operating according to a particular method that permit information to be exchanged. In this context, the term terminal installation does not include the data source and the data sink.” • LAN — A data link using the same physical layer and medium access control protocols. • Segment — Synonymous with data circuit when the nodes are connected through wires. • Cell — Synonymous with data circuit but in the case of a wireless medium.
26.3 Interconnection Means Generally speaking, networks may be interconnected in various ways — repeaters, bridges, routers, and gateways (Perlman, 2000).
26.3.1 Repeaters Repeaters operate at the physical layer. Conventionally, they work bit by bit, receiving the input signal, regenerating the signal, and emitting it on the other side. In the last decade, we have seen the blossoming
of repeaters with multiple ports that are called hubs (Ethernet, Universal Serial Bus (USB)). In the context of wireline to wireless (or vice versa) repeaters, this may imply changing the encoding scheme (i.e., NRZI to Manchester). However, theoretically, repeaters operate transparently to the protocols above the physical layer. As wireless transmission is more error-prone than transmission on cables, a different kind of repeater may be designed (Morel, 1996), namely, a word repeater. Instead of repeating the incoming signal bit by bit (bit repeater), the word repeater waits until a number of bits have been received from the wired side, calculates a forward error correction (FEC) code, and transmits the bits together with the error correction code. A different modulation scheme (for instance, spread spectrum) may be used to improve the spectral efficiency. When receiving from the wireless side, the word repeater uses the FEC code to correct possible errors and retransmits the (possibly) corrected information bits to the wire side. This kind of repeater introduces a longer delay, although shorter than the delay of a bridge, router, or gateway. It reduces the error rate on the wireless side and brings it closer to the wired side bit error rate (Morel, 1996). Some practical considerations limit the use of repeaters. Repeaters should in principle forward what comes from one port to the other port and vice versa. This would be possible with full-duplex lines with one line for each direction of transfer. However, all popular wireline fieldbuses use a single line for both directions (half duplex). The repeater must thus switch from one direction to the other one according to the flow of data. This may be performed using additional lines but is not practical. In practice, wireline repeaters use some mechanism to sense the direction of travel (Murdock and Goldie, 1989) and autonomously switch accordingly. When an incoming signal is sensed on one side, the repeater is switched so that the signal is regenerated and emitted on the other side. The opposite happens when the signal is sensed on the other side. When signals are sensed on both sides, the repeater is put in isolation mode. This behavior is adequate as long as the medium access control does not rely on collisions. In such a case, when a signal is sensed on both sides, the repeater must emit a jamming signal on both sides so that the collision can be detected on both sides. It must also send this jamming signal on the input port when, after switching to one direction of travel, it senses a collision on the output port. A repeater is thus not completely independent of the medium access control scheme used in the overlying network. Using radio communications on one side of the repeater does not much change this picture unless the medium access control uses contention. Remember that collisions cannot be detected directly on the wireless side (property 5). The repeater thus has no means to detect the collision and propagate it on the other side (assumed to be wired). This precludes the use of protocols exclusively based on collisions. Property 6 will also lengthen the delay incurred in the repeater. It is important to notice that, if the bit rate is different on both sides of a repeater, the repeater must buffer the information. As repeaters are usually bidirectional, the buffering policy will differ from one direction to the other. Let us assume that the bit rate is higher on side A than on side B. 
As soon as information is received from side A, emission may start on side B. However, bits are emitted more slowly on side B and the repeater must buffer the incoming bits before they can be emitted on side B. Conversely, when something is received from side B, the repeater cannot start relaying the information on side A immediately. It must wait until a sufficient number of bits have been received to emit the complete packet at the side A bit rate. This means that the repeater must know the maximum size of a packet.
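Under the idealized assumptions of constant bit rates and no processing or turnaround overhead, the minimum wait before relaying a packet from the slow side to the fast side can be estimated as in the sketch below; the rates used in the example are hypothetical.

def min_repeater_wait(packet_bits, fast_rate_bps, slow_rate_bps):
    """Minimum time (s) the repeater must wait before re-emitting, on the fast
    side, a packet that arrives on the slow side, so that the emission never
    runs out of buffered bits (idealized: constant rates, no overhead)."""
    return packet_bits * (1.0 / slow_rate_bps - 1.0 / fast_rate_bps)

# Example with hypothetical rates: a 1000-bit packet received at 78 kbit/s and
# relayed at 1.25 Mbit/s must be delayed by roughly 12 ms.
wait = min_repeater_wait(1000, fast_rate_bps=1_250_000, slow_rate_bps=78_000)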
26.3.2 Bridges A bridge is a data link layer relay. A bridge receives a complete MAC or logical link control (LLC) frame, checks it, and possibly forwards it to the other side. Contrary to repeaters, bridges filter the information that is relayed from one local area network (LAN) to another. The different types of interconnected LANs lead to various categories of bridges (Varghese and Perlman, 1990):
• Pass-through bridges can be used when the LANs on both sides offer identical data link layer functions and addressing. The frames can then be passed unchanged. Ethernet switches are a recent example of this category of bridges.
• Translation bridges are used when both LANs have data link layer functions and addressing that are sufficiently similar to allow a direct translation of data link layer protocol data units (PDUs). • Encapsulation (or tunneling) bridges may be used when the translation is not possible. An incoming frame is encapsulated in the data link layer format on the other link before being forwarded. This is the kind of bridge used when two or more LANs of identical technology need to be interconnected through another one of a different kind. A bridge participates as a node in each of the LANs it interconnects. It receives a copy of all frames transmitted on each one. Obviously, not all frames need to be relayed. For instance, a frame received on side A, whose destination is also on side A, does not need to be relayed on side B. The bridge thus has to learn which frames should be relayed and which ones should not. In particular, special care should be taken to avoid frame looping when several paths exist between two nodes that communicate (ISO/IEC 7498:1, 1996).
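A minimal sketch of this learning and filtering behavior for a two-port bridge (an illustration of the principle, not of any particular product; flooding of broadcast traffic and the spanning tree needed to avoid frame looping are left out):

```python
class LearningBridge:
    """Toy two-port filtering bridge: learns source addresses per port and
    relays a frame only if the destination is unknown or on the other port."""

    def __init__(self):
        self.port_of = {}          # address -> port ("A" or "B")

    def frame_received(self, port: str, src: str, dst: str) -> bool:
        """Return True if the frame must be forwarded to the other port."""
        self.port_of[src] = port   # learn (or refresh) the location of src
        known_port = self.port_of.get(dst)
        return known_port is None or known_port != port

bridge = LearningBridge()
print(bridge.frame_received("A", src="n1", dst="n2"))   # True: n2 unknown, forward
print(bridge.frame_received("A", src="n2", dst="n1"))   # False: n1 known on A, filtered
print(bridge.frame_received("B", src="n3", dst="n1"))   # True: n1 is on the other port
```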
26.3.3 Routers Routers operate at the network layer. The main difference between bridges and routers is that the latter are not transparent. Routers modify the packets they forward, in particular their address fields. Routers exchange information among themselves in order to find a route on which the packet is conveyed. They can thus find an optimum path between two nodes, whereas bridges only use a subset of the available topology. As the network layer is absent in most fieldbuses, we will not deal further with this case.
26.3.4 Gateways Gateways relay the information on top of the application layer. When a gateway node receives some application service indication, it converts it into a service request on the other LAN to which it is connected. When the corresponding confirmation is received, it transforms it into a reply on the other side. Depending on the available services, an invocation may correspond to more than one request. Similarly, the reply may be constructed from a number of confirmations. In order to overcome the additional delay in the reply to an indication on the LAN on one side, some gateways may make requests on the other side in advance. This is especially true in fieldbuses, where the gateway may keep an image of all inputs on one side so that, when a read indication is received on the other side, the cached value is returned in the response. We call this kind of gateway a proxy gateway.
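A minimal sketch of the proxy idea, with invented names purely for illustration: a background cycle keeps an image of the inputs on one side up to date, and read indications from the other side are answered immediately from that image instead of triggering a new transaction.

```python
import time

class ProxyGateway:
    """Toy proxy gateway: keeps an image of side B process values and answers
    side A read indications from that cache instead of forwarding each read."""

    def __init__(self):
        self._image = {}           # variable name -> (value, timestamp)

    def update_from_side_b(self, name: str, value: float) -> None:
        """Called by the background cycle that refreshes values from side B."""
        self._image[name] = (value, time.monotonic())

    def read_indication_from_side_a(self, name: str):
        """Immediate response on side A; no side B transaction is started."""
        value, stamp = self._image[name]
        age = time.monotonic() - stamp          # lets the client judge freshness
        return value, age

gw = ProxyGateway()
gw.update_from_side_b("temperature_7", 21.4)    # background refresh from side B
print(gw.read_indication_from_side_a("temperature_7"))
```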
26.4 Major Design Alternatives While the choice of the interconnection means (repeater, bridge, router, or gateway) is the prime one, there are other degrees of freedom in the architecture of a mixed wired–wireless network: • Single vs. multiple wireline segments. The wireline portion of the system may be composed of a single fieldbus segment to which all the devices are hooked, or it may be made of several segments that need to be interconnected. • Single vs. multiple wireless cells. As in the previous case, the nodes that are connected through wireless means may be organized in a single cell or in multiple cells. • Separate or integrated wireless and wireline subnetworks. In the first case, the various nodes connected through wireless means form a single subnetwork that is interconnected to the wireline network at some points. This is very similar to the integration of the wireless cellular network with the telephone system. The other solution is to have one or more subnetworks in which wireless and wired nodes are mixed. • Single point or multiple points of interconnection. The interconnection between wireless and wireline transmissions can be done at a single point or at a number of points.
• Ad hoc, single-base station or multiple-base stations for the wireless subnetwork. In the first case, all nodes cooperate without any one playing a different role than the others. In the second option, a single node acts as coordinator for the traffic between the various wireless nodes. A third option is to have a number of these stations. And yet another option is whether the base station is the node that serves as the interconnection point. Wireless networks vary along other parameters such as: • Total or partial interconnection. All wireless nodes can see each other when total interconnection is achieved. • Multiple hops or single hop. In the case of multiple hops, traffic from a node addressed to a node that cannot be seen (out of transmission range) is routed by other nodes to the destination. • Satellite. The absence of full visibility can be compensated by a satellite node that retransmits on a different frequency all the emissions from the nodes. This is equivalent to a network of mobile nodes using a satellite to relay the communications. However, we believe that these additional parameters are not prime parameters in an architectural model. We will not consider them further. Although a given interconnection solution is a combination of these options, not all combinations lead to a feasible system. For instance, a wireless cell (data circuit) and a wired segment cannot be interconnected through more than a single repeater per cell. In the next paragraph, we will detail and discuss a number of proposals. When available, relevant references will be mentioned and commented upon.
26.5 Solutions for the Interconnection In this section, we explore the design space and look at the feasibility of various solutions. The practicality of each solution is discussed, and references to existing work are given when available. Solutions based on routers will not be explored, as most fieldbuses do not offer any network layer, thus excluding this option. The only notable exception is LON (EIA-709.1, 1998), which has a seven-layer stack and provides routing between wired segments and wireless nodes.
26.5.1 Repeater-Based Solutions A solution based on a repeater gives the impression that all nodes share the same medium. The same medium access control scheme must be used on the wireless part and on the wired part. This kind of interconnection has a number of advantages. There is no need for a base station in the wireless cells. It is easier to ensure periodicity because a single MAC scheme is used. The same applies to the guarantees for the latency bounds. However, the latency may increase as an effect of a higher number of nodes sharing the same medium. Furthermore, the higher BER on the wireless part has a direct impact on the frame error rate on the network. This causes higher latency bounds because of the necessary retransmissions. Finally, repeaters introduce some restrictions in the design freedom. Between one segment and another or a wireless cell, only a single repeater can be used. Furthermore, loops are excluded from the topology (for instance, S6 could not be a repeater in Figure 26.2). This precludes using multiple interconnection points. The properties of wireless communications impact in the same way all the possible solutions. Assuming that the same bit rate is used on both sides of the repeaters, properties 1, 3, and 7 will limit the throughput and increase the latency. Property 5 imposes restrictions on the type of MAC protocol. All protocols that need collision detection cannot be used directly. Finally, property 6 will increase the latency and response time for request–response protocols. According to the degrees of freedom explained above, several solutions are possible: • RPS1: Single wireless cell and single wired segment. This is an interesting solution to interconnect a number of isolated wireless nodes to a single segment. An example of this solution is given in
Morel et al. (1996) (Figure 26.1). The solution can support multiple wireless cells as long as the wireless nodes in two different cells do not receive the radio signals of each other. • RPS2: Single wireless cell and multiple wired segments. When there is no isolated wireless node, this solution can be used to interconnect two or more distant wired segments when no cable can be installed in between. • RPS3: Multiple wireless cells and single wired segment. The wired segment may be considered a backbone for the wireless cells. Alternatively, the cells may form a single subnetwork (using wireless-to-wireless repeaters) connected to the wired segment through a single repeater. For this, the wireless cells must overlap, and it is necessary to use a different channel for each cell. When wireless nodes may move and traverse different cells, they have to change the frequency of emission and reception. To trigger this process, special messages are sent regularly by a given node of the network. A mobile station that does not receive such a message starts to listen on other channels until it gets one. More sophisticated procedures are also used, in particular to reduce the time necessary to find the right channel (Rauchhaupt, 2002). • RPS4: Multiple wireless cells and multiple wired segments. This solution is basically a generalization of the previous one. The integrated wireless and wireline case is the base of the R-Fieldbus approach (Rauchhaupt, 2002) (Figure 26.2). R-Fieldbus uses a Token Bus medium access control and is thus very sensitive to token losses. Because tokens go through the wireless cells, it is necessary to reduce the BER on the wireless part. This has been done using a different encoding, direct-sequence spread spectrum, and special countermeasures on the receiver side (RAKE receiver). As the bit rate is not the same on the wired part (1.5 Mbit/s) and on the wireless part (2 Mbit/s), buffering is necessary, as explained above. Peter (1999) gives another example of this architecture.

FIGURE 26.1 Repeater-based single-segment multiple-cell architecture.

FIGURE 26.2 R-Fieldbus architecture. M = master station; S = slave station; H = repeater.
26.5.2 Bridge-Based Solutions Bridge-based solutions may be used when repeaters cannot be used. For instance, in the presence of a large number of nodes, a repeater-based solution may lead to unacceptable periods or latencies. Using a bridge will partition the nodes into two data links. The number of nodes that share each link will hence be reduced, and the latency will be reduced accordingly. Bridge-based solutions may offer a significant reduction in latency compared to repeater-based architectures. This is particularly the case when most of the traffic remains on each data link (cell or segment) and the traffic that goes through the bridge is minimal. Bridges also remove the need to use similar bit rates on the wired part and on the wireless one. However, they are not without drawbacks. In a bridge-based solution, a frame emitted on the segment (or cell) on one side is retransmitted on the other using the medium access control protocol of the latter. There is hence an additional waiting time before the frame is actually emitted on the second segment. The maximum delay depends on the medium access control protocol used. With a token-passing protocol, the maximum delay is at minimum the token rotation time. If a master–slave configuration is used, the maximum delay is the maximum time that may elapse between any two successive polls. In many cases, this delay has to be doubled. Many fieldbuses proceed by transactions in which one station, the initiator, sends a frame and the responder replies with another frame that can be an acknowledgment or contain some data. In order to bound the transfer latency, the delay that may elapse between the initiator frame and the response is bounded by an upper limit. This is also used to set a timer that will trigger a retry if it times out. The bridge introduces an additional delay in the initiator frame as well as in the response when the initiator and responder are on different segments or cells. These two delays must be added to the time-out limit, increasing accordingly the maximum latency of the fieldbus. In other words, bridges have a negative impact on the real-time guarantees. The properties of wireless communication have a lower impact than on repeater-based solutions. Property 1 may be mitigated by using forward error correction on the wireless cell. Using error detection and consecutive retries would not be a good solution because this would further increase the maximum additional delay in the bridges. Imagine, for instance, the architecture depicted in Figure 26.3. A station on the wired segment initiates a transaction. The responder is a wireless node. One of the bridges retransmits the request over the corresponding wireless cell after some medium access delay.
The responder replies before the maximum response delay. The response is not received correctly by the bridge, which retries, thus extending the maximum delay before the response is sent to the initiator. The same happens if the reply is lost as a result of property 7 or 9.

FIGURE 26.3 Isolated node approach.

FIGURE 26.4 Interconnection of two wired fieldbuses through a wireless cell using tunneling bridges.

According to the degrees of freedom explained above, several solutions are possible: • BRS1: Single wireless cell and single wired segment. The bridge may be used as the base station for the wireless cell. In the office world, this is typically what is obtained using an 802.11 cell coordinated by an access point connected to an 802.3 segment. In the fieldbus domain, this architecture would be an alternative to RPS1 that relaxes the constraints of the repeater but is likely to increase the achievable latencies and periods. • BRS2: Single wireless cell and multiple wired segments. This solution has been suggested to interconnect two fieldbuses that cannot be linked through wires (Cavalieri and Panno, 1998). It is depicted in Figure 26.4. In the proposed solution, both segments use the same protocols and the connection is made through tunneling bridges. The solution offers better reliability than solution RPS2 because the wireless part may be made more rugged. Buchholz et al. (1991) report a similar approach in the in-building context. The system is made of a number of Ethernet segments that are interconnected through a number of wireless cells. The options are identical with two exceptions: there may be more than a single wireless cell, and due to the higher bit error rate, each Ethernet frame is segmented before it is encapsulated in an LLC frame of the wireless protocol. • BRS3: Multiple wireless cells and single wired segment. This architecture comes in two flavors. Each cell may use a different channel, as presented by Leung (1992). In this approach, a number of wireless nodes are connected to a wired segment through a number of bridges. The set of wired nodes (including the bridges) uses a different protocol than the wireless protocol, but the functions of the data link layer are identical and the addressing is common to all nodes, whether wireless or wired. The bridges are thus of the transparent or translating categories. To avoid duplicate transmission, bridges have to learn the topology so as to decide whether to forward a received frame. A second option is to use the same channel for all cells and rely on time or code division to separate the traffic. • BRS4: Multiple wireless cells and multiple wired segments. Contrary to RPS4, in this solution there may exist more than a single path between any two nodes. Bridge protocols will ensure that no message looping will occur. Bridges are sometimes interesting solutions either when repeaters cannot be used or to partition the network into smaller domains. Note that in general a bridge requires that the upper layers (network to application) are identical on both sides.
26.5.3 Gateway-Based Solutions With gateways, operation of each segment or cell may be governed by completely different protocols at all Open System Interconnection (OSI) levels. This is useful when the wireline segment and the wireless cell are built around protocols that are not compatible at the data link layer, precluding the deployment of bridges. For instance, a protocol based on a client–server model such as Profibus (CENELEC EN 50170, 1996a) will not include the same information at the data link layer as a protocol based on producer–consumer on top of broadcast source addressing such as WorldFIP (CENELEC EN 50170, 1996b; Solvie, 1994). The first one will include the address of the source node and the address of the destination node. The second will only include the identification of the data. It is clearly impossible to use a bridge in such a case. A gateway is also necessary when the application layers differ on each side. A wireless WorldFIP (Roberts, 1993) on one side and a wired CANopen (CENELEC EN 50325-4, 2002) on the other, despite their compatible medium access control frames, cannot be bridged. As both networks operate asynchronously, in terms of latency, gateway-based solutions suffer from the same drawbacks as the bridge-based solutions. The latency is even a little higher because of the additional overhead introduced by higher protocol layers. Except in special cases (see below), the worst-case latencies (worst-case response times) may be estimated as the sum of the values for the individual networks traversed, increased by the penalty introduced by the gateways themselves. This may or may not be worse than using bridges, although as a rule of thumb, worse is more likely. Let us give a case in which the worst-case latency is better. In a bridge-based solution, the time-out length for transactions will be increased to take into account the longer response times of nodes that are on another data link (segment or cell). If we take into account these new values for time-out and the higher probability of loss due to the wireless part, the worst-case latency may be increased significantly. Basing the approach on a gateway means that the original values for time-outs are preserved. The higher loss probability is accounted for only on the wireless part (not on all transactions). The result is that the bridge-based solution will give a higher worst-case latency than the solution using a gateway. An interesting solution to improve the worst-case latency or decrease the sampling period for process values is to have a gateway that acts as a proxy. The gateway responds on one of its sides as if it were a node of the network on its other side. An example of this approach is illustrated in Morel and Croisier (1995) (Figure 26.5). A wireless cell is connected through a single gateway to a WorldFIP (CENELEC EN 50170, 1996b) fieldbus. The gateway acts as a base station for the wireless cell. The gateway approach has been selected because of the strict temporal constraints in WorldFIP. This fieldbus requires that a maximum of 70 bit times separate the last bit of a request and the first bit of the corresponding response. This requirement cannot be fulfilled using bridges. It is very difficult to fulfill using repeaters. The gateway acts as a proxy representing the wireless nodes to the wireline part. The wireless cell is exploited in a round-robin manner according to a constant cycle. At the beginning of the cycle, the gateway broadcasts to all wireless nodes their update values in a single message.
The wireless nodes use this message as a sampling order, capture the values, and prepare their responses. Each node sends its response after the previous one. No explicit polling is done because of property 6, which would increase the response time.
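A back-of-the-envelope check shows why the proxy was preferred to a bridge here. All figures other than the 70-bit-time limit are illustrative assumptions (the frame sizes, radio rate, and turnaround time are not taken from the cited work): even under optimistic assumptions, relaying the request over the air and waiting for the radio response cannot fit into 70 bit times of the wired segment, whereas the proxy answers from its local image.

```python
def worldfip_response_budget_s(bus_rate_bps: float, bit_times: int = 70) -> float:
    """Maximum gap (s) allowed between the last bit of a request and the first
    bit of the corresponding response on the wired WorldFIP segment."""
    return bit_times / bus_rate_bps

# Illustrative assumptions only:
radio_rate_bps = 2.0e6        # 2 Mbit/s radio, as in the R-Fieldbus example above
request_bits = 80             # assumed over-the-air frame sizes
response_bits = 80
turnaround_s = 100e-6         # assumed receive/transmit turnaround (property 6)

budget = worldfip_response_budget_s(1.0e6)     # assuming a 1 Mbit/s wired segment -> 70 us
relayed = request_bits / radio_rate_bps + turnaround_s + response_bits / radio_rate_bps
print(budget, relayed, relayed <= budget)      # ~7e-05 s, ~1.8e-04 s, False
```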
26.6 Amenability to Comply with the Fieldbus Requirements Table 26.1 gives a summary of the capability to satisfy the fieldbus requirements (see Section 26.2.1) with the different architectures. For events and messages, latency is what matters. This issue has been discussed at length above, and the results are just summarized in the table. The objective of having periodic sampling and exchange of process data can be easily satisfied using some broadcast mechanism for sampling and an adequate control of the medium access (Fonseca and Almeida, 1999) when repeaters are used. With bridges and gateways, the same level of control is possible in each individual segment or cell, but the combination does not give the required periodicity and simultaneous sampling. The solution lies in the definition of a distributed clock synchronization algorithm.
FIGURE 26.5 Interconnection using a proxy gateway.

TABLE 26.1 Ability to Help in Fulfilling the Fieldbus Requirements: Degree of Fulfillment Using Repeaters, Bridges, and Gateways

Bounded latency
• Repeaters: Easier to prove if the MAC is adequate; longer delay due to the higher number of stations in the same link.
• Bridges: Possible but more complex to calculate than with repeaters.
• Gateways: Possible; sum of the figures for each individual network.

Periodicity
• Repeaters: Easier, as the same MAC scheme is used.
• Bridges: Difficult to achieve unless synchronized clocks are used.
• Gateways: Difficult to achieve unless synchronized clocks are used.

Simultaneous sampling
• Repeaters: Easily implemented through broadcast.
• Bridges: Requires synchronized clocks.
• Gateways: Requires synchronized clocks.

Consistency indication
• Repeaters: Simple to implement (e.g., CENELEC EN 50170, 1996b).
• Bridges: May be obtained using time stamps.
• Gateways: May be obtained using time stamps.

Event ordering
• Repeaters: May be based on frame ordering.
• Bridges: Requires synchronized clocks.
• Gateways: Requires synchronized clocks.

Rugged
• Repeaters: Higher BER in the wireless part will increase the overall frame error rate and latency.
• Bridges: Better than repeaters because the wireless part may use a different protocol.
• Gateways: Similar to the bridge-based case.
Simultaneous sampling is then based on this clock. Accuracy is highly dependent on the medium access control protocol, the network adapter interface hardware, and the software implementation. Absolute and relative temporal consistency may be obtained easily using mechanisms like those of WorldFIP (CENELEC EN 50170, 1996b) when repeaters are used. They fail when bridges and routers are used. Here again, synchronized clocks can be used, each piece of information being stamped with its production instant, the stamp being transported with the data. By comparing the stamp values, the two kinds of consistency may be evaluated. Event ordering may use the distributed clock. When repeaters are used, one can save the burden of implementing such protocols and use the frame sequencing, provided the required granularity is not too small. Finally, the repeater-based solution is more sensitive to the higher bit error and frame drop rates (properties 1, 3, and 7 to 9), unless the wireless link is made more robust by some form of forward error correction or by improving the receiver design (Rauchhaupt, 2002).
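A minimal sketch of how the production time stamps could be evaluated at the consumer, assuming the synchronized clock discussed above; the validity and window parameters are application choices, not values taken from the text.

```python
def absolute_consistency(production_stamp: float, now: float, validity_s: float) -> bool:
    """A value is absolutely consistent if it is not older than its validity window."""
    return (now - production_stamp) <= validity_s

def relative_consistency(stamps, window_s: float) -> bool:
    """A set of values is relatively consistent if all of them were produced
    within the given window of one another."""
    return (max(stamps) - min(stamps)) <= window_s

now = 1000.000                                   # seconds on the synchronized clock
speed_stamp, torque_stamp = 999.996, 999.993     # production instants carried with the data
print(absolute_consistency(speed_stamp, now, validity_s=0.010))          # True: 4 ms old
print(relative_consistency([speed_stamp, torque_stamp], window_s=0.002)) # False: 3 ms apart
```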
26.7 Conclusion There are numerous architectures that combine wireless cells and wireline segments. The choice of a solution for a given problem depends on the protocols used, the kind of guarantees offered, and the constraints, in particular with regard to time. Here we have explored the solution space and discussed the applicability of the various options, as well as their pros and cons. This study has also shown that some architectural choices may not be available for some fieldbus protocols and some combinations of protocols. In general, the lower the interconnection is placed in the OSI model, the better the performance. We have seen that this general rule must be carefully checked in each particular case, as can be verified in the examples cited.
References

Buchholz D., et al., Wireless in-building network architecture and protocols, IEEE Network Magazine, 5, 31–38, 1991.
Cavalieri S., Panno D., On the integration of Fieldbus traffic within IEEE 802.11 wireless LAN, in Proceedings of the 1997 IEEE International Workshop on Factory Communication Systems, 1997, pp. 131–138.
Cavalieri S., Panno D., A novel solution to interconnect Fieldbus systems using IEEE wireless LAN technology, Computer Standards and Interfaces, 20, 9–23, 1998.
CENELEC EN 50170, General Purpose Field Communication System, Vol. 2/3 (Profibus), 1996a.
CENELEC EN 50170, General Purpose Field Communication System, Vol. 3/3 (WorldFIP), 1996b.
CENELEC EN 50325-4, Industrial Communications Subsystem Based on ISO 11898 (CAN) for Controller-Device Interfaces: Part 4: CANopen, 2002.
EIA-709.1, Control Network Specification, March 1998.
Fonseca J.A., Almeida L.M., Using a planning scheduler in the CAN network, in 7th IEEE International Conference on Emerging Technologies and Factory Automation, October 1999, pp. 815–821.
ISO/IEC 7498:1, Information Processing Systems–Open Systems Interconnection, Basic Reference Model: The Basic Model, 1996.
ISO/IEC 8802.2, Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Area Networks: Specific Requirements: Part 2: Logical Link Control, 1998.
Kopetz H., Consistency Constraints in Distributed Real Time Systems, in Proceedings of the 8th IFAC Workshop on Distributed Computer Control Systems, Vitznau, Switzerland, September 1988, pp. 29–34.
Leung V., Diversity interconnection of wireless terminals to local area networks via radio bridges, Electronics Letters, 28, 489–490, 1992.
Morel Ph., Intégration d'une liaison radio dans un réseau industriel, Ph.D. thesis 1571, Swiss Federal Institute of Technology (EPFL), Lausanne, 1996.
Morel Ph., Croisier A., A wireless gateway for fieldbus, in Sixth IEEE International Symposium on Personal, Indoor and Mobile Radio Communications PIMRC '95, 1995, pp. 105–109.
Morel Ph., Croisier A., Decotignie J.D., Requirements for wireless extensions of a FIP fieldbus, in Proceedings of 1996 IEEE Conference on Emerging Technologies and Factory Automation EFTA '96, 1996, pp. 116–122.
Murdock G., Goldie J., Build a direction-sensing bidirectional repeater, Electronic Design, 37, 105–108, 110, 1989.
Perlman R., Interconnections: Bridges and Routers, 2nd edition, Addison-Wesley, Reading, MA, 2000.
Peter M., The use of radio technology in the fieldbus area: using Interbus as an example, in Proceedings of FeT '99, Magdeburg, Germany, September 1999, pp. 55–60.
Pleinevaux P., Decotignie J.D., Time critical communication networks: field busses, IEEE Network Magazine, 2, 55–63, 1988.
Rauchhaupt L., System and device architecture of a radio based fieldbus: the R-Fieldbus system, in Proceedings of the 2002 IEEE International Workshop on Factory Communication Systems, 2002, pp. 185–192.
Roberts D., OLCHFA: a distributed time-critical fieldbus, in IEE Colloquium on Safety Critical Distributed Systems, 1993, pp. 6/1–6/3.
Saba G., Mammeri Z., Thomesse J.P., Some solutions for FIP networks interconnection, in Proceedings of WFCS '95, Leysin, Switzerland, October 1995, pp. 13–20.
Solvie M., Configuration of distributed time-critical fieldbus systems, in Proceedings of 2nd International Workshop on Configurable Distributed Systems, 1994, p. 211.
Varghese G., Perlman R., Transparent interconnection of incompatible local area networks using bridges, IEEE Journal on Selected Areas in Communications, 8, 42–48, 1990.
IV
Security and Safety Technologies in Industrial Networks

27 Security Topics and Solutions for Automation Networks (Christian Schwaiger and Albert Treytl)
28 PROFIsafe: Safety Technology with PROFIBUS (Wolfgang Stripf and Herbert Barthel)
27
Security Topics and Solutions for Automation Networks

Christian Schwaiger, Austria Card GmbH
Albert Treytl, Vienna University of Technology

27.1 Introduction
27.2 Basic Security Measures
    Security System Life Cycle • Common Security Measures
27.3 FAN Security
    Security for Controlled FANs • Security for Uncontrolled FANs
27.4 Security for FAN Connections: Two Examples
    Overview of the Security Architecture • Implementation of a Secure External FAN Connection • Implementation of a Secure FAN Node
27.5 Conclusion
References
27.1 Introduction During the past 20 years the main focus in the development of field area networks (FANs, or fieldbus systems) was on meeting the technical requirements of different application areas, resulting in a wide diversity of systems. Parallel standardization efforts, which still continue, have led to today’s widely accepted conclusion that in order to successfully cope with the requirements found in the application areas of interest, different coexisting FAN solutions are needed [1]. Also conceived in the past two decades were the technologies that make up today’s Internet, which ultimately led to the enormous surge in Internet usage in the 1990s together with the wide adoption of the Internet Protocol (IP) family for use in local area networks (LANs). This exponential growth of the Internet also sparked new interest in vertical communication flows according to the computer-integrated management (CIM) pyramid. The logical next step, therefore, is to connect FANs to the Internet. This can be done either via a tunneling approach or through gateways based on either Web technologies such as Hypertext Transfer Protocol (HTTP), Java, and Extensible Markup Language (XML) [2, 3] or higher-level protocols such as Simple Network Management Protocol (SNMP) or Lightweight Directory Access Protocol (LDAP) [4, 5]. While it is not clear today which, if any, of these options would provide the best solution to FAN–Internet connectivity, the notoriously bad security reputation of the Internet (which in its original form provided no security at all) mandates an awareness of FAN security issues. In this respect, it is necessary not to limit oneself to studying security issues arising within the context of FAN–Internet connections. A more general approach should be taken instead, one that includes security measures on the FAN level itself where appropriate. Ultimately, security characteristics should become
accepted properties used in the selection of a FAN for a certain application. Security would therefore become as important as other properties such as topology, structure, bus access, and safety features. The rest of the chapter is structured as follows: Section 27.2 introduces the important notion of a security policy and outlines the most basic security measures applicable to a wide range of systems. Section 27.3 gives an overview of where security measures can be applied at the FAN level. Section 27.4 covers security for remote connections to FANs and offers two possible solutions. The first deals with a FAN–Internet gateway (which might be the security topic that is of most interest to the FAN community), while the second provides a proof of concept for a secure FAN node. Finally, Section 27.5 raises several topics of immediate interest should secure FAN technology become a reality.
27.2 Basic Security Measures Security measures for IT systems in any organization aim to achieve three basic security goals: confidentiality, integrity, and availability (CIA). These measures protect data from unauthorized entities and unauthorized manipulation and ensure that data are accessible when needed. In addition, another security goal that is often desired is nonrepudiation, which binds an entity to its transacted commitments. Solutions implemented to achieve these goals are typically based on a security policy that: • Explicitly states the security objectives and the scope of the policy • States who is responsible for implementing the policy Any such security policy must be backed by the organization’s management and be communicated to users of the IT system in question. A good, informal starting point for developing a security policy is the “Site Security Handbook” [6], which is geared toward systems connected to the Internet. Another good introduction is the IT Baseline Protection Manual [7], which aids in the establishment of a security policy and additionally offers common best practices for the implementation of security measures similar to British Standard 7799 (later adopted as ISO/IEC 17799 [8]). Unfortunately, no guidelines for developing a security policy for FANs exist. Below, we discuss security in the context of the life cycle of IT systems and give an overview of the most common technical means to enforce security goals that are also applicable to FANs.
27.2.1 Security System Life Cycle While security activities can be started at any time in the life cycle of a system, it is usually desirable (though often not possible) to integrate them in the whole life cycle [9], where five stages are defined* (Figure 27.1).
FIGURE 27.1 Basic waterfall life cycle model: initiation, development/acquisition, implementation, operation and maintenance, disposal.

*The activities that follow can also be applied (in a slightly modified way) to other life cycle models like, e.g., the spiral model by Boehm [39].
During the initiation phase, when the system is designed, a basic security design is established by means of a sensitivity assessment. This assessment results in an estimation of the data to be handled by the system and its impact on security. In the development stage, the security requirements as well as the methods with which to implement them are selected and included in the overall system specification. Then follows the implementation phase, during which the security features of the system are activated and tested. In the operation and maintenance phase, security-related operations such as key exchanges take place. Also in this stage, system audits and monitoring may lead to system changes if new security risks that need to be addressed are found. Finally, the disposal of information, hardware, and software needs to be handled. Here, the long-term storage of cryptographic keys is a major task.
27.2.2 Common Security Measures The notion of security today is most often associated with the Internet. Most security measures presently available — with the exception of cryptography — are in some way tailored to the Internet, although they are also applicable to a wider range of systems. Without going into precise detail, this section introduces the following important security aspects:
• Cryptography
• Authentication and access control
• Firewalls and intrusion detection
• Security evaluations
Cryptography is appropriate for implementing services where confidentiality and integrity are needed. The major building blocks of cryptography are: • Cryptographically secure pseudorandom number generators (CSPRNGs), used to generate seemingly random entities such as secret keys • Cryptographic hash functions, used either in conjunction with electronic signatures or to generate message authentication codes (MACs) to protect data integrity • Symmetric and asymmetric encryption/signature functions used to protect the confidentiality of data or to construct MACs or electronic signatures Cryptography is discussed in Stinson [10] (with the exception of randomness) and in Menezes et al. [11], which covers all of the above topics and which is a suitable reference for deeper investigation. Authentication measures are used to establish the identity of an entity, e.g., that of a system user. A clear distinction must be made between weak authentication (as found in password-based systems such as UNIX or Windows) and strong authentication, which is usually based on challenge–response protocols that rely on either symmetric or asymmetric cryptography like Kerberos [12] and X.509 [13], respectively. Authenticated entities can be subjected to access control mechanisms that allow the determination of the legality or otherwise of an entity’s desired action. Apart from simple models such as access control lists and other discretionary models, mandatory models exist that are mainly found in military systems. Both models can be implemented by the newer role-based access control (RBAC) schemes [14]. Firewalls [15–17] and intrusion detection systems (IDSs) [18, 19] are complementary technologies usually associated with LAN-FAN connections to the Internet, although both are also useful in a pure LAN environment or where a LAN is connected to a FAN and the LAN is shared by different organizational entities. Firewalls try to prevent illegal access to inbound and outbound connections by, for example, packet filtering, while IDSs monitor and analyze network access patterns and try to recognize security breaches or attacks, which are then referred to a system administrator (and responded to). The building of a secure system is a complicated task involving a variety of different skills and knowledge, yet it still might not succeed even if the necessary care is taken [20]. A prominent example
of a failed security system is Wired Equivalent Privacy (WEP), which was designed to protect 802.11 wireless LANs but which failed miserably to do so [21]. Third-party security evaluation is a method with which to build confidence in a system beyond the security statements of the manufacturer. The Common Criteria 2.0 (CC) [22], for example, is an international standard that regulates security evaluations, and a positive evaluation of a security target (i.e., the product) against the CC results in an internationally acknowledged certificate. Besides this assurance, which relates to the efforts put into security engineering, the certificate itself contains a description of the security functionality that was assessed during the process. Additionally, in order to help prepare for certification, there exist so-called protection profiles (PPs) that describe implementation-independent sets of security requirements for specific security areas, e.g., firewalls. There are, however, currently no PPs for FANs available.
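To make the distinction between weak and strong authentication concrete, the following is a minimal sketch of a symmetric challenge–response exchange using only standard-library primitives (a CSPRNG for the challenge and an HMAC as the response). It illustrates the principle only; it is not a description of Kerberos or X.509.

```python
import hmac, hashlib, secrets

SHARED_KEY = secrets.token_bytes(16)        # pre-shared between verifier and node

def make_challenge() -> bytes:
    """Verifier: fresh, unpredictable challenge (CSPRNG) to prevent replay."""
    return secrets.token_bytes(16)

def respond(challenge: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Claimant: proves knowledge of the key without revealing it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

def verify(challenge: bytes, response: bytes, key: bytes = SHARED_KEY) -> bool:
    """Verifier: recompute the expected response and compare in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = make_challenge()
print(verify(challenge, respond(challenge)))            # True: legitimate key holder
print(verify(challenge, respond(challenge, b"x" * 16))) # False: wrong key
```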
27.3 FAN Security A hypothetical way to secure a FAN would be the adoption of well-known security principles and protocols from the Internet environment. Unfortunately, the reality is not so straightforward. One reason is that security protocols for the Internet such as secure sockets layer (SSL)/transport layer security (TLS) [23] and the newer IPSec protocol suite [24] operate on top of or at network layers (according to International Organization for Standardization (ISO)/Open Systems Interconnection (OSI)). These layers are not defined in most FANs where, for reasons of speed and simplicity, one normally finds only layers 1 and 2. Specifications of layer 7 and above (sometimes referred to as layer 8 or user layer) may also occasionally be found, but are outside of the OSI model. Another problem in applying Internet security standards is that no provisions for real-time support are made, which could lead to problems if applied to a FAN. Furthermore, because of the assumption that communicating parties do not know each other in advance, many Internet security protocols (including those listed above) rely heavily on asymmetric cryptography. A notable exception is the Kerberos authentication protocol, which was designed with an educational institutional scale in mind. Finally, in considering FAN security, one may face a situation where one or all of the communicating nodes are not under the control of their rightful user(s), which is an unusual situation in conventional LANs. Although security of FAN systems per se has seldom been discussed in literature, secure FAN nodes [25] and an overview of possible security classes for FANs that have different security implications [26] have been addressed. The major problems that have to be faced when FANs are to be made secure are: • FANs have no built-in provisions for security and are usually designed as lightweight communication systems. When security services are added to an existing application area, suboptimization of the protocol will most probably occur. • Cryptographic operations such as encryption needed to secure FAN communication might be prohibitive because of the limited processing power of current FAN nodes, even more so if realtime needs have to be addressed. • The FAN nodes as well as the communication media might be under total control of the adversary. • The number of FAN systems is overwhelming, and securing all of them is an unfeasible task. On the other hand, some traits of FANs ease security engineering compared with other systems. Most notably these are: • FAN nodes have a very limited complexity. In comparison to the general-purpose computers linked to the Internet, FAN nodes are designed to solve one special problem that makes them easier to understand and analyze. • The protocols used in FAN systems are also less complex than those found in other network systems, which can be attributed to their restriction to only a few layers or to the usage of a single communication protocol. Most FANs only implement the lower layers of the protocol stack and the application layer, and in cases where this does not hold, the limitation to a single protocol (contrary to the protocol
proliferation found on the Internet) contributes to the simplicity of the system. Additionally, the simpler a protocol, the more manageable it is for efficient implementation, which is desirable should real-time behavior be of concern.
• As a consequence of the above, the operating systems (OS) found in FAN nodes need not be as complicated as a typical computer OS.
The following discussion uses the data listed in Table 27.1, which shows basic security services and the layers in the OSI model [27] where they may possibly be implemented. The following subsections attempt to classify typical FAN application scenarios with respect to their impact on security, according to [26]. Essentially, a distinction can be made between controlled and uncontrolled FANs.

TABLE 27.1 OSI Security Services in the Different Layers

Security Service: Possible Layers
Peer entity authentication: 3, 4, 7
Data origin authentication: 3, 4, 7
Access control service: 3, 4, 7
Connection confidentiality: 1–4, 6, 7
Connectionless confidentiality: 2–4, 6, 7
Selective field confidentiality: 6, 7
Traffic flow confidentiality: 1, 3, 7
Connection integrity with recovery: 4, 7
Connection integrity without recovery: 3, 4, 7
Selective field connection integrity: 7
Connectionless integrity: 3, 4, 7
Selective field connectionless integrity: 7
Nonrepudiation origin: 7
Nonrepudiation delivery: 7
27.3.1 Security for Controlled FANs A controlled FAN (CFAN) environment is a FAN installation where physical access to FAN nodes, and sometimes access to the underlying communication media, is controlled by the owner of the FAN system. This situation is comparable to the situation found in the usual Internet environment, where one computer or LAN of an organization is connected to the Internet. For a CFAN, one can further distinguish three different, mutually inclusive scenarios (Figure 27.2). The stand-alone FAN (SAF) consists of a FAN-only installation that is not connected to the outside. The LAN-integrated FAN (LIF) and Internet-integrated FAN (IIF) correspond to a FAN installation that can be accessed via a connected LAN or the Internet, respectively.
FIGURE 27.2 Three controlled FAN scenarios.
The inclusion relationships, shown in Figure 27.2, correspond to the actual security threats and solutions (STS) that are applicable to the different scenarios. While the inclusion relationship always holds, it has to be noted that the set STS is highly dependent upon the underlying security policy. This shows the importance of deriving a security policy for the specific system on hand. Addressing the CIA security goals mentioned in Section 27.2, we now focus on possible solutions for confidentiality and integrity. Provisions to ensure availability in a FAN environment (apart from the provisions already designed into the FAN) will be hard to achieve. In the following discussion, the placement of security functions according to ISO/OSI (Table 27.1) will be helpful. Moreover, the fact that most FAN systems only implement OSI layers 1, 2, and 7 is of importance, as the reduced OSI stack of modern FANs is usually the result of efficiency considerations. In application domains such as in building automation, however, where time requirements are more relaxed and extensive networking capabilities are needed, an implementation of the full OSI stack may be found. Concerning the placement of confidentiality services in the network stack, the majority of these are placed at layers 1, 2, and 7, although placement above layer 7 is also possible. Integrity services can be placed at layers 3, 4, and 7 of the OSI stack or, again, above layer 7. However, the restriction to layers 1, 2, and 7, often necessary in a FAN environment, limits the application of integrity services to layer 7 or above. Taking into account that in a FAN environment one often finds dedicated microcontrollers designed to handle a specific FAN interface and protocol, a security solution that is targeted above layer 7 appears to be a sensible choice in all cases where one tries to secure an established FAN system. Moreover, in FAN systems where one finds an abstraction layer above layer 7, an inclusion of security services in this layer would be of special interest. Such a solution would have the special benefit that the user layer of a FAN system is typically targeted at the standardization of interoperable services or application profiles. The process of standardizing such services is carried out by a user group (usually comprised of the organizations that have an interest in the FAN), leading to a broad acceptance in industry. Apart from the expected acceptance, this solution is also of security interest as it provides end-to-end security in its narrowest sense: it secures all communication between two nodes of a FAN at the application level. In the absence of a widely accepted user layer into which security services could be integrated, or in instances where a specific user group sees no necessity to add such services (as may often presumably be the case), a generic solution above layer 7 can still be devised. The drawback, though, would be a loss of interoperability with other solutions, resulting in increased development efforts and, consequently, increased costs. In both approaches, security services could be provided by a wrapping mechanism. This mechanism takes an engineering value (i.e., a measured input value that has been either sampled with a sensor or first sampled and then preprocessed at the node) and applies this value to either an integrity mechanism or both an integrity and a confidentiality mechanism. 
Figure 27.3 shows an example of the second case, where the field header contains information about the security transformations applied. The MAC field contains integrity check information.
FIGURE 27.3 Wrapping an engineering value (header, encrypted engineering value, MAC).
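A minimal sketch of such a wrapper in the integrity-only mode, using a keyed hash over a small header and the engineering value; the header layout, key handling, and MAC truncation are invented for illustration, and the variant of Figure 27.3 would additionally encrypt the value with a cipher before the MAC is appended.

```python
import hmac, hashlib, struct

KEY = b"0123456789abcdef"   # 16-byte pre-shared key, distributed out of band (illustration only)

def wrap(engineering_value: float, seq: int, key: bytes = KEY) -> bytes:
    """Header (version, sequence number) + value + truncated MAC.
    The sequence number lets the receiver reject replayed frames."""
    header = struct.pack("!BH", 1, seq)                       # 1-byte version, 2-byte sequence
    body = header + struct.pack("!f", engineering_value)
    mac = hmac.new(key, body, hashlib.sha256).digest()[:8]    # truncated to respect small payloads
    return body + mac

def unwrap(frame: bytes, key: bytes = KEY) -> float:
    body, mac = frame[:-8], frame[-8:]
    expected = hmac.new(key, body, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(mac, expected):
        raise ValueError("integrity check failed")
    return struct.unpack("!f", body[3:])[0]

frame = wrap(21.375, seq=42)
print(len(frame), unwrap(frame))   # 15-byte wrapped frame, original value recovered
```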
The integration of both integrity and confidentiality mechanisms into most of today’s FANs could be prohibitive as they require processing power that might be out of range for the typical low-cost 8-bit microprocessors found in typical FAN environments. Another drawback of solutions such as that depicted in Figure 27.3 is that the limited payload of FAN messages might not allow for the message expansion needed to add confidentiality or encryption. This is a problem especially for integrity mechanisms, as they always add additional data to the transmitted message. For the confidentiality-only mode, one could simply encrypt the engineering value and omit the header of the message. In this scenario, the two communicating nodes would need to know in advance whether the data to be sent or received was in encrypted form. If missing processing power in the nodes is a concern, one could also think of applying the necessary security services to the lower layers in the OSI stack. Although a first look at Table 27.1 shows that only confidentiality services can be added at the lower level, it is still possible to add the required services at layer 2 if the necessity arises. This has actually been the case with Ethernet where Institute of Electrical and Electronics Engineers (IEEE) [28] augments the services found in the OSI security model to also include integrity, authentication, and access control. While this could be a feasible solution, one has to keep in mind that the resulting protocol will also need to support non-layer 2 management functionalities handling, for example, key exchange. A solution for providing confidentiality or integrity would, however, still leave a major security threat unanswered: network management. In today’s FANs, this functionality is typically achieved in an unauthenticated manner, an approach that is tantamount to an invitation to an informed attacker who has gained access to the FAN. Management functionality might even include the possibility of totally reprograming a node by, for example, directly overwriting parts of the node’s memory. This would enable an attacker to gain complete knowledge about the programming of the node by reading out configuration and application data from the memory. It is not difficult to imagine situations where such behavior is not acceptable. The minimum functionality needed to achieve security in this area would be the provision of an access control mechanism, a task that could be supported by integrity mechanisms that allow the authentication of configuration communication messages. The implementation of access control for network management should pose few problems. Although the data transmitted might be larger than usual FAN messages, and maybe even multipart messages might be needed, such operations are usually not time critical. Restrictions as to the acceptable maximum length of a message and the maximum time a transmission may take are also usually very relaxed.
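A minimal sketch of such an access control check for configuration traffic, combining message authentication with a per-originator permission list; the key table, command names, and frame layout are invented for illustration.

```python
import hmac, hashlib

# Per-originator keys and the management operations each originator may invoke
# (illustrative table; in practice this would be part of the node configuration).
KEYS = {"maintenance_tool": b"k" * 16, "vendor": b"v" * 16}
PERMISSIONS = {"maintenance_tool": {"read_config"}, "vendor": {"read_config", "write_config"}}

def accept_mgmt_frame(originator: str, command: str, payload: bytes, mac: bytes) -> bool:
    """Accept a management frame only if it is authentic and the authenticated
    originator is allowed to perform the requested command."""
    key = KEYS.get(originator)
    if key is None or command not in PERMISSIONS.get(originator, set()):
        return False
    expected = hmac.new(key, command.encode() + payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, mac)

good = hmac.new(KEYS["vendor"], b"write_config" + b"\x01", hashlib.sha256).digest()
print(accept_mgmt_frame("vendor", "write_config", b"\x01", good))             # True
print(accept_mgmt_frame("maintenance_tool", "write_config", b"\x01", good))   # False: not permitted
```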
27.3.2 Security for Uncontrolled FANs An uncontrolled FAN (UFAN) environment is a FAN installation where physical access to the communication nodes that make up the FAN, as well as access to the underlying communication media, is controlled by a user (or a group of users) who is not the owner of the system. Given these circumstances, such users have to be considered adversaries. Common UFAN examples are a utility company's remote metering device deployed at a customer's site and a FAN deployed in a car. Another comparable situation that has been well studied, but where neither security mechanisms nor attack results are openly published, is the transmission of secure pay-per-view programs via cable or satellite, where only entitled subscribers should be able to unscramble the program. These examples make it clear that for a UFAN environment, one needs to distinguish between different kinds of access that the owner/manufacturer has after the FAN has been deployed. In a remotely maintainable FAN (RMF), there exists a legitimate system owner who has the possibility to communicate regularly with the system, whether periodically or on an ad hoc basis. With remote metering, one can imagine monthly communication between the utility company and the meter in the
case of a service contract between the user and the company. Alternatively, if the user prepays for energy, communication could be on a per-payment basis. Nonmaintainable FANs (NMFs) are those systems where the rightful system owner has no regular possibility to communicate with the system after its deployment, although some communication could become possible in unforeseen circumstances such as a system failure. An example NMF is a FAN deployed in a car. Comparison of the RMF and NMF scenarios outlined above, through looking at the set of possible security threats and solutions, shows that both scenarios are different. Although they do overlap in some parts, e.g., the adversary has access to the physical parts of the FAN, they are not mutually inclusive. For example, the RMF is exposed to outside threats that are adversarial third parties. These third parties are neither the system owner nor the entity where the system is deployed. They are also not proper sub- or supersets of the SAF, LIF, and IIF scenarios described in Section 27.3.1, mainly because the whole FAN infrastructure is accessible by the adversary. But one can again see that the sets are highly dependent on the underlying security policy and security assumptions. Overall, UFAN systems seem to be more at risk than CFAN systems because the adversary has more possibilities to circumvent possible security measures. Additionally, analysis and results of security issues for such configurations have not been extensively dealt with yet. As far as ways and means to secure UFANs are concerned, one natural train of thought leads to the investigation of physical measures to secure the different nodes of a FAN as well as the sensors and actuators connected to them. Depending on the security policy, this can lead to simple solutions where seals are attached to the devices (as is often done today to prevent the manipulation of power meters). In this case, it is the legal implication associated with the removing of the seal that acts as a deterrent. Technologically stronger solutions attempt to construct tamper-proof devices that either notify the system owner if they are tampered with (in the case of an RMF system) or deactivate themselves (NMF system). In the latter case, apart from trying to prevent physical access to the nodes, one usually finds special security modules such as smart cards that are useful to protect sensitive data as well as to recognize attempted tampering. If it is not possible or feasible to physically protect the nodes, their programs, and their memory, then security considerations regarding lower layers of the network stack are in vain: the protection possibly achievable at those layers does nothing to prevent possible attacks at higher levels. Together with physical node protection, lower-layer security is, of course, of interest again and might — as in the CFAN scenario — help to remedy problems that could arise with the use of integrity services. Continuing considerations regarding physical security and the automatic deactivation of nodes in case of tampering, the possibility to include intrusion detection mechanisms in a node seems to be an interesting topic of research for the future. Such a mechanism could become feasible in UFANs that remain unchanged after deployment, meaning that no nodes are added or removed and that the communication patterns between the nodes remain unchanged or only change in a way that can be anticipated. 
If such an analysis were possible, a node that detects an intrusion could either contact the system owner or deactivate itself. The actual action taken would depend on the system configuration; e.g., in the case of an RMF, a failure to contact the system owner after detecting an intrusion would also result in deactivation. It would, however, also be necessary to determine the correct point in time for deactivation because, for example, it would be both inconvenient and dangerous if a UFAN in a car were to stop working while the car was being driven. Finally, a critical point in UFAN security is again network management. In an RMF scenario it is easy to see the absolute necessity for a point-to-point security solution where it is conceivable that the same mechanisms that may be utilized in a CFAN would be helpful. Conversely, in the NMF scenario one would like to prevent any management activities that concern the configuration of the nodes as well as the programs that are executed on them. Configuration data that reside in a ROM (or maybe in a once writeable WORM) and that are therefore unchangeable would be the easiest solution in this case, although such an approach might be prohibitive in a situation where, due to maintenance reasons, single nodes needed to be exchanged or updated.
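A minimal sketch of the intrusion detection idea raised above, for a UFAN whose traffic pattern is fixed at commissioning time; the pattern representation and the reaction policy are illustrative assumptions, with the RMF/NMF distinction deciding whether the node reports or schedules its own deactivation.

```python
class NodeIntrusionMonitor:
    """Toy intrusion detector for a node in a UFAN whose communication pattern
    is fixed after deployment: anything outside the commissioned pattern is flagged."""

    def __init__(self, expected_pattern, can_reach_owner: bool):
        self.expected = set(expected_pattern)     # {(source_address, message_type)}
        self.can_reach_owner = can_reach_owner    # RMF: True, NMF: False

    def observe(self, source: str, message_type: str) -> str:
        if (source, message_type) in self.expected:
            return "ok"
        # Deviation from the commissioned pattern: report if possible, otherwise
        # (or if reporting fails) deactivate at a safe point in time.
        return "notify_owner" if self.can_reach_owner else "schedule_deactivation"

monitor = NodeIntrusionMonitor({("meter_1", "reading"), ("gateway", "time_sync")},
                               can_reach_owner=True)
print(monitor.observe("meter_1", "reading"))        # ok
print(monitor.observe("laptop_x", "write_config"))  # notify_owner
```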
FIGURE 27.4 External connection to a FAN (the FAN is reached from the intranet/Internet through a gateway).
27.4 Security for FAN Connections: Two Examples When a FAN is connected to an intranet or the Internet (either directly or indirectly), a situation as shown in Figure 27.4 arises. Access to the FAN is typically achieved via a central access point. In many cases, this is a gateway translating between the FAN protocol and the protocol used to access the gateway, which is subsequently assumed to be part of the Transmission Control Protocol (TCP)/IP suite. A common implementation is a Web-based solution where the gateway translates FAN data to accommodate a transport format such as XML [29] over HTTP. Yet this structure also allows for implementations other than the pure gateway approach. Other possible solutions tunnel FAN data over the external network or tunnel IP over the FAN, handling it in the node that is the sender or recipient of a message. But whereas the former solution is commonly used, the latter appears very awkward if one compares the size of IP messages to that of typical FAN messages. Access to the gateway will usually be achieved either through a firewall, which might be incorporated into the gateway itself, or via a dial-up connection. (More information about the different possibilities of where to place a firewall and an accompanying IDS and how to configure it can be found in the references of Section 27.2.) The configuration, of course, heavily depends on the IT environment at hand. It is important to recognize that the secure connection of a FAN to an external IP network is more than just the application of any IP security protocol found on the Internet, as can be seen in the example presented below. Contrary to common belief, there is no need to rely on any of the very time-consuming asymmetric security primitives that are often assumed when Internet–FAN connections are considered. The reason for this lies in the fact that all parties involved in the communication are previously known to each other. This is a very typical property of Internet–FAN connections and often reduces the system design effort. The remainder of this section gives examples of security architectures for a residential gateway and a secure FAN node. Here, a need for a very high level of security and a limited demand for bandwidth are assumed. The target application is a residential gateway and a FAN node primarily used to remotely access energy meters in order to facilitate simplified and automatic billing. It may include the application of flexible (time- and consumption-dependent) tariff schemes, and for add-on services, the gateway could also be connected to a home automation network (see [30] for a detailed description of such a gateway). Such an application would be a UFAN, specifically the RMF subtype. From a security point of view, this configuration presents a worst-case scenario: the property owner has an interest in paying as little as possible for the energy consumed and therefore has to be considered an adversary. Worse still, the property owner has unlimited physical access to the entire installation, unless gateway and energy meter are locked in a sealed cabinet and there is no in-home network at all. Therefore, security demands are high.
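To make the gateway idea more concrete, the following minimal sketch shows how such a Web-based translation might look: a FAN reading is wrapped in XML and served over HTTP using only the Python standard library. The function and element names (read_fan_datapoint, fanData, the /meter path) are illustrative assumptions, not part of any FAN or gateway standard.

```python
# Minimal sketch of a FAN-to-Web gateway: a FAN reading is wrapped in XML
# and served over HTTP. All names (read_fan_datapoint, /meter) are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.etree.ElementTree import Element, SubElement, tostring

def read_fan_datapoint():
    # Placeholder for the FAN-specific protocol access (e.g., polling a meter node).
    return {"node": "meter-01", "value": 1234, "unit": "Wh"}

class GatewayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/meter":
            self.send_error(404)
            return
        data = read_fan_datapoint()
        root = Element("fanData")
        for key, val in data.items():
            SubElement(root, key).text = str(val)
        body = tostring(root, encoding="utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), GatewayHandler).serve_forever()
```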
27.4.1 Overview of the Security Architecture It is important to note that, in this example, the relationship between the gateway and a client in the Internet is not just one to one. In fact, the utility company desiring remote access to the energy meters
FIGURE 27.5 Architecture of the secure FAN–Internet gateway (an off-line key distribution center issues smart cards to the security clients and the security administrator; on-line, these access the gateway, whose security server, access control component, and FAN module mediate access to the FAN).
has numerous customers as well as more than one data acquisition client. In addition, the whole concept must take into account that the communication infrastructure can be used by third-party service providers, who need to be granted access to the gateway as well, although only to different sets of data. The security architecture therefore becomes rather complicated and consists of an offline and an online part, as shown in Figure 27.5. As can be seen, the key distribution center, which operates offline for security reasons, generates the necessary keys, which reside securely on tamper-proof smart cards [31]. Such smart cards are also used for the execution of cryptographic algorithms. The online participants are the security clients and administrators (who may also be one entity) that access a remote FAN over the gateway, using the smart cards to provide the necessary keys and security algorithms. While the clients are only entitled to access the FAN for monitoring and configuration purposes, administrators may additionally change access control settings or the configuration of the security server. On the gateway itself, the security server accepts and handles connections from the Internet, using an access control component to enforce access restrictions to the FAN. For the actual FAN access, a FAN-specific module translates between the different protocols.
27.4.2 Implementation of a Secure External FAN Connection The secure communication between clients or administrators and the gateway requires a minimal set of information to be carried in a message. In this example, the message format shown in Figure 27.6 was chosen because it not only provides integrity and confidentiality, but also illustrates the elements typically found in secure communication.
FIGURE 27.6 The communication message format: UID (user ID), MODE, TIME, MAC (message authentication code), LEN (length of data), and the encrypted REQUEST consisting of RECEIVER and REQDATA (request data); the scope of authentication spans the whole message, while the scope of encryption spans the REQUEST.
FIGURE 27.7 Key hierarchy: the random, unique domain key K_DOM (level 0) is used with f_K_DOM(RND) to derive the unique level 1 master keys K_AGA/K_AGE (administrator–gateway) and K_GUA/K_GUE (gateway–user); at level 2, the gateway keys K_GA/K_GE are derived via f from the GID and the user keys K_UA/K_UE via f from the UID.
Apart from the user identifier (UID) that indicates the sender of the message, the MODE field allows the implementation and use of different security primitives. This can become necessary, for example, if the security of one of the used primitives is broken. The inclusion of a time field (TIME), containing the (notion of) time at the gateway involved in the communication, allows the definition of a time window of validity for a message and also prevents replay attacks of previously sent messages. In messages from the gateway, this field is used to update the local notion of the gateway's time. The MAC protects the integrity of the message and is calculated over the whole message, while the LEN field indicates the length of the following encrypted REQUEST, which is split into the actual request data and the receiver of the data. The receiver is either the security server in the case of configuration messages or the underlying FAN. The RECEIVER field can also facilitate the support of multiple FANs connected to the gateway. The security mechanism used to provide data encryption is the remotely keyed encryption scheme (RKES) [32], which utilizes the secret keys of the smart card to initialize the encryption. The bulk of the work is then done by the gateway processor, which calculates faster than the smart card and is also able to cope with bigger messages. The authentication mechanism uses the secure hash function SHA-1 [33] and sends the resulting data to the smart card, where they are encrypted using 3-DES (triple data encryption standard) [34]. The result of this operation is then used as the MAC of the message. Figure 27.7 shows the key hierarchy needed in the system. Because we foresee that more than one gateway would be required, we propose a domain system where users and administrators of a domain can exchange messages with all gateways of the domain. In order to achieve this functionality, a random and unique key K_DOM is generated in the key distribution center for each domain. This key is subsequently used to derive all other keys of that domain. Looking at the left part of Figure 27.7, one can see that the keys K_AGA and K_AGE are derived from the master key using a randomly chosen input at level 1 of the key hierarchy. These two new keys are the respective master keys for authentication and encryption of administrator-to-gateway messages. They are also used to derive the keys K_GA and K_GE of each domain gateway from the unique gateway identifiers (GIDs). This scheme allows an administrator to communicate with every gateway because she can derive every level 2 gateway key by applying the key derivation function f to the GID of the requested gateway using the master keys of level 1. Additionally, if the secret keys of a gateway are compromised, the system can continue to work because each gateway has its own unique set of keys.
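The MAC construction just described can be illustrated with a short sketch: the message is hashed with SHA-1 outside the card, and the digest is then encrypted with 3-DES inside the smart card. The sketch assumes the pycryptodome package; the padding of the 20-byte digest to the 3-DES block size, the key handling, and the exact message coverage are simplifications, the authoritative definitions being those of [32] to [34].

```python
# Sketch of the described MAC construction: hash the message with SHA-1 on the
# gateway/client side, then encrypt the digest with 3-DES on the smart card.
# Assumes pycryptodome; padding and key handling are simplifications.
import hashlib
from Crypto.Cipher import DES3
from Crypto.Util.Padding import pad

CARD_MAC_KEY = bytes(range(24))  # stand-in for the secret 3-DES key stored on the smart card

def compute_mac(message: bytes, key: bytes = CARD_MAC_KEY) -> bytes:
    digest = hashlib.sha1(message).digest()          # done by the host/gateway processor
    cipher = DES3.new(key, DES3.MODE_ECB)            # executed inside the smart card
    return cipher.encrypt(pad(digest, DES3.block_size))

# Example: MAC over the authenticated part of a message (UID, MODE, TIME, LEN, REQUEST).
msg = b"UID=17;MODE=1;TIME=1099999999;LEN=24" + b"<encrypted REQUEST>"
print(compute_mac(msg).hex())
```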
The right part of Figure 27.7 shows the same procedure for the generation of keys that are used in the communication between a gateway and a user. Function f, which is used to derive the keys of levels 1 and 2 in Figure 27.7, is the application of SHA-1 to the argument followed by an application of 3-DES in EDE (encrypt-decrypt-encrypt) mode, which finally yields the key:

f_KEY(x) = 3-DES-EDE_KEY(SHA-1(x))    (27.1)
The index KEY in Equation 27.1 designates the master key used as a parameter in the DES algorithm. To derive, for example, the secret encryption key of a user with the UID X at a gateway, one needs to calculate

K_UE = f_K_GUE(X) = 3-DES-EDE_K_GUE(SHA-1(X))    (27.2)
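A minimal sketch of this derivation, assuming pycryptodome, is shown below. The padding of the SHA-1 digest to the 3-DES block size and the length of the derived keys are assumptions; the stand-in master keys merely illustrate the structure of Figure 27.7.

```python
# Sketch of the key derivation f_KEY(x) = 3-DES-EDE_KEY(SHA-1(x)) from Equation 27.1
# and its use to derive per-user keys (right part of Figure 27.7).
# Assumes pycryptodome; digest padding and derived-key length are assumptions.
import hashlib
from Crypto.Cipher import DES3
from Crypto.Util.Padding import pad

def f(master_key: bytes, argument: bytes) -> bytes:
    digest = hashlib.sha1(argument).digest()
    cipher = DES3.new(master_key, DES3.MODE_ECB)     # 3-DES in EDE mode
    return cipher.encrypt(pad(digest, DES3.block_size))

# Level 1 master keys of a domain (normally derived from K_DOM in the key
# distribution center); here just stand-in byte strings.
K_GUE = bytes(range(24))          # master key for user-to-gateway encryption
K_GUA = bytes(range(1, 25))       # master key for user-to-gateway authentication

# Level 2: per-user keys derived from the user identifier, as in Equation 27.2.
uid = b"X"
K_UE = f(K_GUE, uid)
K_UA = f(K_GUA, uid)
print(K_UE.hex(), K_UA.hex())
```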
The final functionality needed in the system is an access control mechanism. The easiest way to realize this functionality is the implementation of an access matrix operating on the FAN node level that allows read, write, and create access to data. A Boolean flag can be assigned to each entry, indicating whether the access rights listed for the specific user constitute an allow list or a deny list. The former will usually be used for normal users, whereas the latter will typically be applied to an administrator.
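The following sketch illustrates such an access matrix with the allow/deny flag per entry; all user and node identifiers are made up for the example.

```python
# Minimal sketch of the described access matrix: per (user, FAN node) entry,
# a Boolean flag says whether the listed rights form an allow list or a deny list.
from dataclasses import dataclass

RIGHTS = {"read", "write", "create"}

@dataclass
class Entry:
    allow_listed: bool      # True: listed rights are granted; False: listed rights are denied
    rights: set

ACCESS_MATRIX = {
    # normal user: explicit allow list
    ("user17", "meter-01"): Entry(allow_listed=True,  rights={"read"}),
    # administrator: deny list (everything except "create" is allowed)
    ("admin1", "meter-01"): Entry(allow_listed=False, rights={"create"}),
}

def is_permitted(user: str, node: str, right: str) -> bool:
    entry = ACCESS_MATRIX.get((user, node))
    if entry is None or right not in RIGHTS:
        return False                       # default deny
    return (right in entry.rights) == entry.allow_listed

print(is_permitted("user17", "meter-01", "read"))    # True
print(is_permitted("admin1", "meter-01", "write"))   # True
print(is_permitted("admin1", "meter-01", "create"))  # False
```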
27.4.3 Implementation of a Secure FAN Node Whereas the previous subsection introduced security measures for a client- or administrator-to-gateway connection, this subsection presents a practical implementation of security measures within the FAN, i.e., between the FAN nodes. In the given example, LonWorks* FAN nodes are extended by smart cards to implement security services for a secure end-to-end communication. Nevertheless, the main results are applicable to many other FAN systems. A schematic of this setup is shown in Figure 27.8. Since the whole protocol stack of LonWorks is implemented in hard- and firmware within a dedicated microprocessor, the security measures have to be implemented above layer 7 of the LonWorks protocol (see also Section 27.3.1). Another argument for this decision is that such a system still fully complies
FIGURE 27.8 Schematic of the secure FAN node (the FAN node, with S0 interface, additional I/Os, power switch, RAM, and the network interface to the fieldbus network, is extended by an RS-232 smart card reader and driver; data held in the node itself are not secured, while data passed through the smart card are secured).
*LonWorks is a FAN developed by Echelon, designed for control tasks with a particular focus on building automation.
with the LonWorks FAN standard, although interoperability is confined to secured nodes by defining special user-defined structures. If an attacker has appropriate tools and physical access, FAN nodes in general and LonWorks nodes in particular are not well protected. Properties of a node, from simple configuration settings to the whole application and data storage, can be read and often changed with common administrative tools. Given the assumptions made in Section 27.3, some kind of security token must therefore be introduced that securely stores the secret keys and can execute cryptographic operations in a secure manner. The first step to achieve this goal was the implementation of a smart card interface for the FAN node. Although all interfaces to a smart card are well defined [35], it is generally not possible to integrate any of the ISO 7816 protocols in software, due to limitations in memory and computing power of the node. Hence, an appropriate smart card reader has to be selected, which is a complicated task since only a few readers offer a lightweight protocol for low-end microprocessors. In keeping with the underlying scenario (secure transmission of data from a power meter over the network), the FAN node was also equipped with an S0 power meter input to read data from common pulse power meters. In a second step, during operation of the FAN, the measured power value is preprocessed by the FAN node and transmitted to the smart card, which secures the data to be transmitted over the FAN. Similar to the message format shown in Figure 27.6, the integrity of the data is secured by the hashed message authentication code (HMAC) [36] algorithm using SHA-1 as the hash function, and the data are encrypted using the 3-DES algorithm. It should be emphasized at this point that the design of a proprietary security algorithm is usually a bad idea and highly error-prone. Instead, one should use algorithms that have been scrutinized by the security community for a substantial amount of time and are considered to be secure. In a third step, the secured data are passed to the network protocol stack and transmitted to the recipient's node, where the procedure is executed in reverse. The selection of the applied security measures is determined not only by the strength of cryptographic algorithms but also by the restrictions of the FAN. One of the most important restrictions is the limited packet size of FAN protocols, because often packet segmentation and reassembly (SAR) mechanisms are not available. In the presented setup, the packet size is limited to 31 ASCII-encoded characters. According to prior research [37], this is no limitation for the processed 2-byte energy value and should be sufficient for most other data to be transmitted. However, this limitation could cause problems with asymmetric encryption schemes like RSA (Rivest, Shamir, and Adleman) [38], which requires a minimum packet size of 128 bytes to achieve adequate security. Another important issue is the additional delay introduced by transmitting data to and processing the data within the smart card. Whereas such delays can usually be neglected in the normal operation of building automation control systems, the impacts on FANs with real-time requirements, such as in industrial automation systems, must be analyzed much more carefully.
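The node-side data path can be sketched as follows, assuming pycryptodome: the 2-byte energy value is protected with HMAC-SHA-1, encrypted with 3-DES, and checked against the 31-character packet limit. The truncation of the HMAC tag and the hex encoding are assumptions made for the example, not part of the implementation described in [37].

```python
# Sketch of the secure FAN node data path: a 2-byte energy value is integrity-
# protected with HMAC-SHA-1 and encrypted with 3-DES before being handed to the
# network stack. Assumes pycryptodome; tag truncation and ASCII packaging are
# assumptions made to respect the 31-character packet limit.
import hmac, hashlib
from Crypto.Cipher import DES3
from Crypto.Util.Padding import pad

ENC_KEY = bytes(range(24))   # stand-in for the 3-DES key held by the smart card
MAC_KEY = b"smart-card-hmac-key"
MAX_PAYLOAD_CHARS = 31       # packet size limit of the FAN in the presented setup

def secure_energy_value(value: int) -> str:
    raw = value.to_bytes(2, "big")                                  # 2-byte energy value
    tag = hmac.new(MAC_KEY, raw, hashlib.sha1).digest()[:4]         # truncated HMAC-SHA-1
    cipher = DES3.new(ENC_KEY, DES3.MODE_ECB)
    ciphertext = cipher.encrypt(pad(raw + tag, DES3.block_size))    # 8 bytes after padding
    payload = ciphertext.hex()                                      # ASCII-encoded characters
    assert len(payload) <= MAX_PAYLOAD_CHARS, "exceeds FAN packet size"
    return payload

print(secure_energy_value(4711))
```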
The main goals achieved by the described setup of a secure FAN node are, on the one hand, a proof of concept for security services at the FAN level and, on the other hand, increased transmission efficiency and encryption services for LonWorks. Nonetheless, further research is needed and two important topics are FAN node protection and key distribution. The first unresolved matter is the fact that the node’s memory is still unprotected, therefore allowing the possibility of data being read before they are encrypted (Figure 27.8, left-hand side). Solutions to this problem would most likely need additional hardware or the redesign of existing FAN nodes to supply protected areas. The second important issue is key distribution and regular key changes. In the present setup, these are done by physically exchanging smart cards, but future solutions should allow for automatic key updates. Indeed, cost constraints mean it is likely that a security token similar to smart cards will be integrated either directly into the node or on the circuit board of the node.
27.5 Conclusion The necessity for security at the FAN level still has to be brought to the attention of the greater part of the FAN community. Although awareness of security issues in the connection of FANs to the Internet is starting to grow, mainly because of the frightening security record of the Internet, it will be of the utmost importance to incorporate security considerations into the next generation of FAN systems. The ultimate goal is to alert the community to the fact that security provisions should be an important factor in the decision about which FAN system to use for a specific task. It is to be hoped that, contrary to the Internet, where it took more than 20 years (and the commercialization of the Internet) until people realized this fact, this process will be faster, especially considering the safety-critical applications in which FAN systems are being deployed in increasing numbers. The realization of FAN–Internet connections has, thanks to the gateway approach, profited immensely (and continues to do so) from the experience gained in attempts to secure the Internet. Still, one must realize that FAN-level security needs different and new approaches. This is due to the different specialized protocols as well as to the very low end processors typically used in this area. Another big field for further research and development, where approaches are not very mature, can be found in the UFAN scenarios. We have discussed some potential security solutions (and pitfalls) for today's FAN systems that are worth investigating where the need for security arises. However, the discussion also shows that this can only be an intermediate solution. The application of reasonable security strategies to non-security-aware systems has to overcome significant obstacles, in some cases to an extent where meaningful operation is no longer practical. It seems that there is no alternative to the design of a new, secure FAN infrastructure. This must include all necessary security services as well as the necessary administration framework. The ultimate aim of such efforts has to be a standardized framework that is widely accepted.
References [1] W. Kriesel, T. Heimbold, D. Telschow, Bustechnologien für die Automation, 2nd ed., Hüthig Verlag, Heidelberg, 2000. [2] M. Wollschlaeger, Framework for Web integration of factory communication systems, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Antibes Juanles-Pins, France, October 2001, pp. 261–265. [3] Alfred C. Weaver, Internet-based factory monitoring, in 27th Annual Conference of the IEEE Industrial Electronics Society (IECON), Denver, CO, November–December 2001, pp. 1784–1788. [4] M. Kunes, T. Sauter, Fieldbus-Internet connectivity: the SNMP approach, IEEE Trans. Ind. Electronics, 48, 1248–1256, 2001. [5] T. Sauter, M. Lobashov, G. Pratl, Lessons learnt from Internet access to fieldbus gateways, in 28th Annual Conference of the IEEE Industrial Electronics Society (IECON), Sevilla, Spain, November 2002, pp. 2909–2914. [6] B. Fraser, Ed., RFC 2196, Site Security Handbook, 1997, available at http://www.ietf.org/rfc/ rfc2196.txt?number=2196. [7] Bundesamt für Sicherheit in der Informationstechnik, IT Baseline Protection Manual, Bundesanzeiger-Verlag, Köln, 2003, available at http://www.bsi.bund.de/gshb/english/etc/index.htm. [8] International Standards Organisation, ISO/IEC 17799, Information Technology: Code of Practice for Information Security Management, 2000. [9] National Institute of Standards and Technology, An Introduction to Computer Security: The NIST Handbook, NIST Special Publication 800-12, 1995. [10] Douglas R. Stinson, Cryptography, Theory and Practice, 2nd ed., Chapman & Hall/CRC, Boca Raton, FL, 2002. [11] Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, 1996.
[12] J.G. Steiner, B. Clifford Neuman, J.I. Schiller. Kerberos: an authentication service for open network systems, in Proceedings of the Winter 1988 Usenix Conference, Dallas, TX, February 1988, pp. 191–202. [13] International Telecommunication Union, ITU-T Recommendation X.509 (1997 E): Information Technology: Open Systems Interconnection: The Directory: Authentication Framework, June 1997. [14] R. Sandhu, E.J. Coyne, H.L. Feinstein, C.E. Youman, Role based access control models, IEEE Computers, 29, 38–47, 1996. [15] Bill Cheswick, The design of a secure Internet gateway, in Proceedings of the Usenix Summer 1990 Technical Conference, Anaheim, CA, June 1990, pp. 233–238. [16] John Wack, Ken Cutler, Jamie Pole, Recommendations of the National Institute of Standards and Technology Guidelines on Firewalls and Firewall Policy, NIST Special Publication 800-41, 2002. [17] Marcus Goncalves, Firewalls Complete, McGraw-Hill, New York, 1997. [18] Josef von Helden, Stefan Karsch, BSI-Studie: Intrusion Detection Systeme Grundlagen, Forderungen und Marktübersicht für Intrusion Detection Systeme (IDS) und Intrusion Response Systeme (IRS), Bundesamt für Sicherheit in der Informationstechnik, Bonn, 1998, available at http://www.bsi.de/ literat/studien/ids/ids-stud.htm. [19] Rebecca Bace, Peter Mell, Intrusion Detection Systems, NIST Special Publication 800-31, 2001. [20] Ross Anderson, Security Engineering, Wiley, New York, 2001. [21] Nikita Borisov, Ian Goldberg, David Wagner, Intercepting mobile communications: the insecurity of 802.11, in Proceedings of the Seventh ACM SIGMOBILE Annual International Conference on Mobile Computing and Networking, Rome, Italy, July 2001, pp. 180–189. [22] International Standards Organisation, ISO/IEC 15408-1, Information Technology: Security Techniques: Evaluation Criteria for IT Security, 1999. [23] T. Dierks, C. Allen, RFC 2246, The TLS Protocol, Version 1.0, January 1999, available at http:// www.ietf.org/rfc/rfc2246.txt. [24] Pete Loshin, Comp., Big Book of IPsec RFCs: Internet Security Architecture, Morgan Kaufmann, San Francisco, 1999. [25] P. Palensky, T. Sauter, Security considerations for FAN-Internet connections, in IEEE International Workshop on Factory Communication Systems, Porto, Portugal, September 2000, pp. 27–35. [26] C. Schwaiger, T. Sauter, Security strategies for field area networks, in 28th Annual Conference of the IEEE Industrial Electronics Society (IECON), Sevilla, Spain, November 2002, pp. 2915–2920. [27] International Organization for Standardization, Basic Reference Model for Open System Interconnection: Part 2: Security Architecture, 1989. [28] Institute of Electrical and Electronics Engineers, IEEE 802.10, IEEE Standards for Local and Metropolitan Area Networks: Standard for Interoperable LAN/MAN Security (SILS), 1998. [29] The XML CD Bookshelf, O’Reilly & Associates, Inc., Sebastopol, 2002. [30] C. Schwaiger, T. Sauter, A secure architecture for fieldbus/Internet gateways, in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Antibes Juan-les-Pins, France, October 2001, pp. 279–286. [31] W. Rankl, W. Effing, Smart Card Handbook, 2nd ed., John Wiley & Sons, New York, 2000. [32] M. Blaze, J. Feigenbaum, M. Naor, A formal treatment of remotely keyed encryption, advances in cryptology, in EUROCRYPT ’98, International Conference on the Theory and Application of Cryptographic Techniques, Espoo, Finland, May–June 1998, pp. 251–265. 
[33] National Institute of Standards and Technology, Secure Hash Standard, Federal Information Processing Standards Publication 180-1, 1995. [34] National Institute of Standards and Technology, Data Encryption Standard (DES), Federal Information Processing Standards Publication 46-3, 1999. [35] International Standardization Organization, ISO 7816, Identification Cards: Integrated Circuit(S) Cards with Contacts: Parts 1–10, 1996–2002.
[36] H. Krawczyk, M. Bellare, R. Canetti, RFC 2104, HMAC: Keyed-Hashing for Message Authentication, February 1997, available at http://www.ietf.org/rfc/rfc2104.txt. [37] C. Schwaiger, A. Treytl, Smart card based security for fieldbus systems, in 2003 IEEE Conference on Emerging Technologies and Factory Automation, Lisbon, Portugal, September 2003, pp. 398–406. [38] R. Rivest, A. Shamir, L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Communications of the ACM, 21, 120–126, 1978. [39] B. Boehm, A spiral model of software development and enhancement, IEEE Computer, 21, 61–72, 1988.
28 PROFIsafe: Safety Technology with PROFIBUS
Wolfgang Stripf, Siemens AG
Herbert Barthel, Siemens AG
28.1 Why Do We Need Safety in Automation?
28.2 Dichotomy of Standard and Safety Automation
28.3 Motivation and Objectives for PROFIBUS
28.4 PROFIsafe, the Solution
Concept • Black Channel • Possible Transmission Errors and Their Remedies • The SIL Monitor • PROFIBUS Messages with PROFIsafe Frames • Synchronization Means (Finite State Machines)
28.5 Beyond Safe Transmission
Safety-Related Programmable Control Logic • Commissioning and Repair • Availability • Status of Profile Guidelines • Standards Catching Up
28.6 Peculiarities for Different Industries
Factory Automation • Process Automation • Environmental Conditions • Test and Certification • Development Tools and Support
28.7 Products
28.8 Prospects
Abbreviations
References
28.1 Why Do We Need Safety in Automation?
Any active industrial process is more or less associated with the following risks:
• To injure or kill people
• To destroy nature
• To damage investments
With most of the processes, it is quite easy to avoid risk without special requirements imposed on automation systems. However, there are typical applications associated with high risk, e.g., presses, saws, tooling machines, robots, conveying and packing systems, chemical processes, high-pressure operations, offshore technology, fire and gas sensing, burners, cable cars, etc. Those applications need special care and technology. Over time, the market balances out the reliability and availability of standard automation technology to a certain economic cost level. That means the failure or error rate of standard automation technology under normal circumstances is just acceptable for normal operations but not sufficient for the abovementioned applications.
The situation may be compared with a public mail system. While normal letter delivery is expected to be as affordable as possible at a certain level of reliability, everybody will use special mail services for important messages.
28.2 Dichotomy of Standard and Safety Automation In the past, microcontrollers, software, personal computers, and communication networks dramatically influenced the means of standard automation and thus led to cost reduction, higher flexibility, and availability. With respect to safety, existing standards and regulations prohibited any usage of those means. Safety automation had to be hardwired and based on relay technology. This dichotomy or gap is quite natural due to the fact that safety relies on trusted technology or material, trust on experience, and experience on time. But adding classical safety to modern automation solutions always led to disproportionate cost due to additional wiring and engineering, to less flexibility and availability than expected, and to other disadvantages. In the meantime, the situation has changed dramatically. Microcontrollers and software have been proven in use in millions of applications. And the preconditions for their usage in safety applications have been in place since the introduction of the international standard IEC 61508 [1].
28.3 Motivation and Objectives for PROFIBUS During its lifetime of more than 10 years, PROFIBUS has emerged as one of the most important fieldbus systems in the world and has recently been standardized within IEC 61158 and IEC 61784 [2]. It enables decentralized applications in factory and process automation with its variety of appropriate transmission technologies like RS485, MBP-IS,* and fiber optics. Thus, it was merely a matter of time to integrate the necessary means for safety applications into PROFIBUS DP in a seamless manner and provide similar flexibility and availability also for powerful safety devices like remote input/output (I/O), laser scanners, light curtains, level switches, shutdown valves, drives, robots, and the like. Back in 1998, when the PROFIBUS organization started its project "safe communication across PROFIBUS DP," more than 25 renowned safety companies decided to follow the vision of connecting the above safety devices to the same transmission line as the standard devices and letting them communicate with an additional programmable safety-related controller (F-Host in Figure 28.1). No cables, application-specific integrated circuits (ASICs), layer stack software, or other communication devices like repeaters, links, and couplers were to be changed. Configuration, parameterization, programming, and diagnostic means should be as familiar to the user as possible in order to simplify the usage of safety: "What he or she is using, he or she will not be losing." Fortunately, the working group was able to base its design and development efforts on the new IEC 61508 and on the spadework of EN 50159-1, now IEC 62280-1 [3], entitled "Railway Applications: Communication, Signalling and Processing Systems: Part 1: Safety Related Communication in Closed Transmission Systems." PROFIBUS is such a closed transmission system, as it only allows communication between configured and well-known participants. In contrast, open transmission systems are, for example, the public phone system or the Internet. Already during the first sessions of the working group in 1998 it became apparent that the pure definition of the safe transmission of messages via the standard PROFIBUS cable would not prove sufficient for the new generation of safety devices that had to be connected to it. What sense would it make to a sensor manufacturer, for example, if merely the shutdown signals could be transported via the bus, while parameter value assignment or diagnosis in the event of a failure still required a time-consuming PC connection on site via the RS-232 interface? There was a pending request for the system manufacturers to support rapid device replacement and to integrate the commissioning and diagnosis software of the field device into the engineering software of the system manufacturer, for a joint utilization of the communication paths and the project storage (Figure 28.2). Only such
*MBP-IS (Manchester coded, bus powered, and intrinsically safe) replaces the previous PROFIBUS name IEC 1158-2. Further development has listed additional procedures in the corresponding IEC standard, so that creating an unambiguous designation has become necessary.
FIGURE 28.1 The PROFIsafe vision: failsafe devices (F-sensors, F-actuators, F-field devices, F-I/O, F-gateways to other safety buses) and standard devices share the same PROFIBUS DP transmission line, with an F-Host and a standard host as masters, DP/PA couplers and repeaters in between, an engineering tool, and a PG/ES with secure access (e.g., firewall) via TCP/IP (F = failsafe, PG = programmer, ES = engineering station).
FIGURE 28.2 Requirements and tasks for distributed safety technology (wiring and space availability; system, device, and production failures; flexibility for product variants and failures; usability with a uniform look and feel; failsafe communication; intelligent field devices; decentralized failsafe automation; system support; uniform engineering).
integration permits the requests of customers for more production flexibility (e.g., program-controlled parameter value assignment) and for increased availability (predictive or faster diagnosis and preventive maintenance) to be satisfied. However, the working group agreed to strive for solutions that keep the device manufacturers as independent as possible from the system manufacturers. From the very beginning it was planned to add safety gateways to other safety bus systems like ASi Safety-at-Work. Safe communication could be achieved by using redundant transmission lines. However, the working group decided to look for a single-channel solution for safety applications such that redundancy can still be added to the system as an option to provide additional availability/reliability (Table 28.1). Eventually, the safety solution for PROFIBUS was called PROFIsafe and a sign was created.
28.4 PROFIsafe, the Solution 28.4.1 Concept As we know, PROFIBUS DP, like most of the fieldbus systems, only employs layers 1, 2, and 7 of the International Organization for Standardization (ISO)/Open Systems Interconnection (OSI) model [4].
TABLE 28.1 Safety and Redundancy Options
• PROFIBUS DP: suitable for all kinds of distributed automation.
• PROFIsafe: factory and process automation (presses, robots, level switches, shutdown valves, as well as burner control and cable cars); avoids hazards (required by laws or insurances).
• Redundancy: process automation (chemical or pharmaceutical productions, refineries, offshore); no downtimes at best (fault tolerance); redundancy by itself does not provide safety.
• PROFIsafe and Redundancy: process automation (chemical or pharmaceutical productions, refineries, offshore); no downtimes at best (fault tolerance); avoids hazards (required by laws or insurances).
Making no changes to any of these layers means adding the safety measures as a safety layer on top of the PROFIBUS layer 7, thus increasing the size of the OSI application layer. Since this safety layer is merely responsible for the transport of safety-relevant user or process data, it is up to the rest of the application layer to look after the acquisition and processing of these data. These higher layers can be provided, for example, in a safe field device (e.g., light curtain) by the technology firmware of this device. Usually, parts of this firmware are of a safety-oriented design anyway, typically a redundant hardware/software structure, in which the PROFIsafe functionality can be embedded (see Figure 28.17). Like in standard mode, the process data (signals or process values) are packed into the protocol data unit (PDU) of a PROFIBUS message frame. In the case of PROFIsafe, the raw process data are simply supplemented by additional information, as we will see in the next sections. The completed safety-relevant process data are called a PROFIsafe frame. Ideally, a PROFIsafe frame shall be passed completely unmodified from a (safety) sender to a (safety) receiver no matter what kind of transmission system both had been using. Thus, the safety measures are encapsulated in the communicating end devices.
28.4.2 Black Channel Such a communication system is called a black channel, analogous to a black box. This means first that the chosen communication technology does not matter, except for a few basic constraints to be defined later on in this chapter. Second, this means that none of the error detection mechanisms of the chosen communication technology are taken into account to guarantee the integrity of the transferred process data. Basically, there are no restrictions with respect to transmission rate, number of bus devices, or transmission technology, as long as its parameters are tolerated by the required reaction times of a given safety application. The example in Figure 28.3 shows that PROFIsafe also employs the black channel principle for complex PROFIBUS structures. Safety-related sensor signals from a modular slave (F-Slave), i.e., a PROFIBUS device that can be equipped with several safety-related modules with input/output channels, are routed via the PROFIBUS slave node to one of the two DP (decentralized peripherals) master nodes of a controller. From here they go via a local backbone bus to the F-Host, the programmable safety-related system (e.g., a safety PLC). After the safe logic operation, a corresponding output signal is routed via the local bus to a second DP master node and to a second PROFIBUS segment. The transmission rate is reduced there in a DP-PA link, and the intrinsically safe transmission technology (MBP-IS) is employed for routing the signal to the safety-related PA (process automation) slave. At no point on its way has the signal employed a redundant communication path. In other words, this is a single-channel transfer. Up to now, we simply dealt with the communication paths for safety-related message frames. Thus, it is still open who is responsible for the transfer and when the transfer will take place. Here, too, a PROFIBUS standard mechanism is used: the master–slave operation. A master, which may be assigned here to an F-Host, cyclically exchanges PROFIBUS telegrams with all its configured slaves one after the other. This means that there is always a 1:1 relationship between master and slave. The polling operation has the advantage that any failed device will be detected immediately, which is one of the basic principles of
FIGURE 28.3 Complete communication paths for PROFIsafe frames: an F-Host with two DP master nodes on a backbone bus reaches a modular F-Slave over RS485 (1.5 Mbaud, partly via fiber optics) on one segment and, via a DP-PA link, a compact PA slave over MBP-IS (31.25 kbaud) on the other; only OSI layers 1, 2, and 7 are used (DP-M = DP master, F-DI = failsafe digital input, F-AI = failsafe analog input, F-DO = failsafe digital output, FO = fiber optics, F-Host = failsafe programmable controller).
28.4.3 Possible Transmission Errors and Their Remedies Various errors may occur when messages are transferred in network topologies of the described complexity, due to hardware failures, extraordinary electromagnetic interference, or other influences. A telegram can be lost, occur repeatedly, be inserted from somewhere else, appear delayed or in an incorrect sequence, or show corrupted data. In the case of safety-related communication, there may also be incorrect addressing — i.e., a standard message erroneously appears at a safety device and pretends to be a safety message. Different transmission rates may additionally cause storage effects to occur. Out of the numerous remedies known from literature, PROFIsafe concentrates on those presented in the matrix shown in Figure 28.4. These include: • • • •
The consecutive numbering of the PROFIsafe frames (sign of life) A time expectation with acknowledgment (watchdog) A codename between sender and receiver (password) Data integrity checks (CRC = cyclic redundancy check)
Using the consecutive number, a receiver can see whether it received the PROFIsafe frames completely and within the correct sequence. When it returns a PROFIsafe frame with the consecutive number only as an acknowledgment to the sender, the sender too will be assured. Basically, a simple toggle bit would have proven sufficient. Due to the storing bus elements (e.g., routers), however, a counter from 0 to 255 has been selected for PROFIsafe. Zero is an exception in this procedure reserved for the start-up transitions. In safety technology, it matters not only that a message transfers the correct process signals or values, but also that updated actual values arrive within a fault tolerance time, thus enabling the respective device to automatically initiate safety reactions on site, if necessary (e.g., stop movement). For this purpose, the
FIGURE 28.4 Transmission error types and remedies: the matrix maps the error types repetition, loss, insertion, incorrect sequence, data corruption, delay, masquerade (a standard message mimics a failsafe one), and FIFO errors in intermediate routers (no acknowledgment from routers) to the measures consecutive number (sign of life), time-out with acknowledgment, codename for sender and receiver, and data integrity (CRC).
devices use a watchdog timer that is restarted whenever a new PROFIsafe frame with an incremented consecutive number arrives. The 1:1 relationship between the master and a slave facilitates the detection of misdirected message frames. Master and slave must simply have an identification (password) that is unique in the network and can be used for verifying the authenticity of a PROFIsafe frame. Detecting corrupted data bits through an additional CRC plays a key role. The necessary probabilistic examination can benefit from the definitions within IEC 61508 [1] that consider the probability of failure of entire safety functions. PROFIsafe follows this approach (Figure 28.5). Accordingly, a safety circuit includes all sensors, actuators, transfer elements, and logic processes that are involved in a safety function. IEC 61508 defines overall values for the probability of failure for different safety integrity levels. For SIL3, for example, this is 10^-7/h. For the transmission, PROFIsafe merely takes up 1%. This means that the permissible probability of failure is 10^-9/h. This permits suitable CRC polynomials to be determined for the intended PROFIsafe frame lengths. The resulting residual error probabilities of undetected corrupted PROFIsafe frames guarantee the required order of magnitude (see below for details). The quality of the chosen CRC polynomials is such that in the case of PROFIsafe, we are no longer depending on the basic error detection of standard PROFIBUS DP using a frame-checking sequence (FCS) and parity check. Thus, proof is not required that the error detection probability of the basic PROFIBUS DP mechanisms is independent of the additional PROFIsafe CRC mechanism. Among other things, we need so-called proper polynomials. A polynomial is called proper when the residual error rate over an increasing bit error rate does not show a curve with pronounced peaks (i.e.,
FIGURE 28.5 Safety function according to IEC 61508 (sensor, binary/analog input, control logic, binary output, actuator; the figure apportions the overall failure probability among these elements, with 1% allotted to the transmission).
FIGURE 28.6 Example of an improper polynomial: the residual error probability Pue, plotted over the bit error probability epsilon (0.0005 to 0.1) for generator g = 16199999331 and frame length n = 1056, varies between roughly 10^-12 and 10^-9 and shows pronounced peaks instead of a monotonic rise.
when the curve rises continuously and monotonically). This has been proven mathematically for the polynomials employed for PROFIsafe [5]: 14EABh for the 16-bit CRC and 1F4ACFB13h for the 32-bit CRC. Figure 28.6 shows an example of an improper polynomial. Epsilon on the x axis stands for the bit error probability. The residual error probability (y axis) becomes problematic at a high bit error probability (i.e., with a very large number of corrupted bits in a PROFIsafe frame). In order to avoid any residual uncertainty, PROFIsafe uses a patented procedure: the safety integrity level (SIL) monitor.
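For illustration, the following generic bit-serial CRC routine uses the 16-bit generator 14EABh quoted above. The initial register value, bit ordering, and the exact data covered by the PROFIsafe CRC are defined in the profile [5]; the sketch simply shows how a CRC over a small frame is computed with such a polynomial.

```python
# Generic bit-serial CRC over the 16-bit generator polynomial 14EABh quoted above.
# Initial value, bit order, and the data actually covered by the PROFIsafe CRC are
# defined in the profile [5]; this sketch uses the simplest convention (init 0).
GEN_16 = 0x14EAB   # 17-bit generator value for a 16-bit CRC

def crc16_over_generator(data: bytes, gen: int = GEN_16) -> int:
    reg = 0
    for byte in data + b"\x00\x00":          # append 16 zero bits to flush the register
        for i in range(7, -1, -1):
            top = (reg >> 15) & 1
            reg = ((reg << 1) | ((byte >> i) & 1)) & 0xFFFF
            if top:
                reg ^= gen & 0xFFFF
    return reg

frame = bytes([0x12, 0x34, 0x56, 0x01, 0x7F])   # example F process data + status + consecutive number
print(hex(crc16_over_generator(frame)))
```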
28.4.4 The SIL Monitor As mentioned before, PROFIsafe is not relying on the basic data integrity checks of PROFIBUS DP. The entire error detection that is necessary for attaining the required category or SIL is advantageously implemented in the additional PROFIsafe protocol. Being dependent on the basic safety structure would require complex verifications for all possible bus configurations. Thus, a mechanism was created that guarantees the compliance with SIL levels over the service life of a distributed safety-related automation solution, irrespective of the employed components and configuration (Figure 28.7). Here, all influences that may corrupt the PROFIsafe frames are looked at as a frequency fw, whatever the cause of the corruption may be: hardware failures, electromagnetic interference (EMI), etc. For each safety function, the F-Host collects the number of detected corrupted PROFIsafe frames reported to it from the associated F-Slaves via the status byte (see Figure 28.9). It also sums up the number of detected corrupted PROFIsafe frames it received on its side. Once the frequency of corrupted PROFIsafe frames exceeds a certain limit (e.g., high bit error rate), the F-Host causes the safe state in the safety function to be activated. This requires monitoring intervals whose lengths depend on the SIL level. For example, T = 5 h for SIL3 and T = 0.5 h for SIL2. The values are calculated such that a maximum of one corrupted PROFIsafe frame is tolerated inside the monitoring interval T. The basis for the calculation is the following formula:

Λ_HW+EMI+other = fw · {P_UB(typ) · P_US + P_US(typ)} = (1/T) · {P_UB(typ) · P_US + P_US(typ)} < 10^-9/h

P_US is the maximum residual error probability for the 16- or 32-bit CRC, at a bit error rate of 0 … 0.5. The SIL monitor is not a separate component. It is implemented as part of the PROFIsafe driver software within the F-Host.
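The behavior of the SIL monitor can be sketched as a simple counter over a sliding monitoring interval; the class below is an illustration only (the event bookkeeping and the interface are assumptions, not the PROFIsafe driver implementation).

```python
# Minimal sketch of the SIL-monitor idea: the F-Host accumulates the corrupted
# PROFIsafe frames reported by its F-Slaves (status byte) plus the ones it detects
# itself, and forces the safe state once more than one corruption falls inside the
# monitoring interval T. Interval lengths follow the text (5 h for SIL3, 0.5 h for SIL2).
import time

MONITOR_INTERVAL_H = {"SIL2": 0.5, "SIL3": 5.0}

class SILMonitor:
    def __init__(self, sil: str, now=time.time):
        self.window_s = MONITOR_INTERVAL_H[sil] * 3600.0
        self.now = now
        self.events = []          # timestamps of detected corrupted PROFIsafe frames

    def report_corrupted_frame(self) -> bool:
        """Record one corrupted frame; return True if the safe state must be activated."""
        t = self.now()
        self.events = [e for e in self.events if t - e < self.window_s]
        self.events.append(t)
        return len(self.events) > 1   # at most one corrupted frame tolerated per interval

monitor = SILMonitor("SIL3")
print(monitor.report_corrupted_frame())   # False: the first corruption is tolerated
print(monitor.report_corrupted_frame())   # True: second corruption within T -> safe state
```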
FIGURE 28.7 The SIL monitor: corrupted messages arriving at frequency fw from the raw channel (hardware failures, EMI, and other causes, including the backbone bus) pass a first filter, the bus code with residual error probability P_UB(typ), and a second filter, the PROFIsafe code with residual error probabilities P_US and P_US(typ); the F-Host counts the corrupted messages recognized by every participant over the time period T and activates the safe state if the limit is exceeded (fw = frequency of corrupted messages, EMI = electromagnetic interference, HD = Hamming distance, C = probability of occurrence, T = measurement time period).
There are two groups of safety-related parameters. The F parameters are associated with the PROFIsafe layer and the i (individual) parameters with the individual safety-related technology firmware of a safety slave. The above-mentioned CRC protection is used not only for cyclically ensuring the data integrity of the process signals and process values, but also for ensuring the data integrity of the safety-related parameters stored in the individual slaves (such as code name, watchdog time, etc.). For this purpose, a separate CRC value of these parameters is generated at longer time intervals (8 h). This CRC value is then the new start value of the cyclic CRC generation. The same procedure executes in the F-Host, where the parameters of the slaves are stored.
28.4.5 PROFIBUS Messages with PROFIsafe Frames Up to now, we discussed the processes PROFIsafe employs for safely transporting the PROFIsafe frames. This section now deals with the concrete mapping onto the PROFIBUS DP communication. Figure 28.8 shows the structure of a PROFIBUS DP message frame. In this case, simply the data unit, the parity bit (PB), and the FCS are of interest. The configuration part of an engineering tool uses the electronic generic station description (GSD) of a particular slave device to arrange for the format of the net process data to be transferred within the data unit of a message between a PROFIBUS DP master and its slave device. The same happens in the safety case. Due to the encapsulation via PROFIsafe layers, control and status information must be exchanged between an F-Host and an F-Slave to synchronize the sequences of the state machines. This is necessary, for example, in case an F-Slave requires i-Parameter sets or if the user program within the F-Host wants to change i-Parameter values. Furthermore, the F-Slave can report the detection of an incorrect PROFIsafe frame to the F-Host in order to support the SIL monitor function (Figure 28.7). For the described information exchange, PROFIsafe features a byte that follows the F process data (Figure 28.9). Another additional byte contains the above-mentioned consecutive number. This number, which is entered by the sender of a PROFIsafe frame (source-based counter), is checked by the receiver and returned to the sender in the acknowledgment PROFIsafe frame. Counting is performed in a cycle from 1 to 255. Zero is reserved for the system start. We have already discussed the fact that factory automation and process automation place different requirements upon a safety system. The former deals with short (bit) signals that must be processed at a very high speed; the latter involves longer (floating-point) process values that may take a little more
FIGURE 28.8 PROFIBUS DP message structure: after the sync time (33 bit times), a standard message with variable data length consists of SD (start delimiter, here SD2 = 68H), LE (length of process data), LEr (repetition of the length, not covered by the FCS), SD again, DA (destination address), SA (source address), FC (function code/message type), the data unit carrying standard or failsafe process data (1 to 244 bytes), FCS (frame checking sequence across the data within LE), and ED (end delimiter, 16H); each character cell is 11 bits long, consisting of a start bit, eight character bits, an (even) parity bit, and a stop bit (TBit = 1/baud rate).
FIGURE 28.9 PROFIsafe frame structure: within the standard message frame, the F process data (max. 12 or 122 bytes) are followed by a status/control byte (1 byte), the consecutive number (1 byte, a source-based counter), and a CRC2 of 2 bytes (for up to 12 bytes of F data) or 4 bytes (for up to 122 bytes of F data) calculated across the F process data and the F-Parameters; the remaining space of the max. 244 bytes of DP process data may carry standard process data.
time. PROFIsafe therefore offers two different F process data lengths that require CRC protection of different complexity. One length is limited to a maximum of 12 bytes; it requires a 2-byte CRC following the consecutive number. The other length is limited to a maximum of 122 bytes; it requires a 4-byte CRC. The remaining space in the PDU of the message may be used for standard data or other PROFIsafe frames. This results in a very efficient communication with modular F-Slaves or F-Gateways to other safety buses, for example (Figure 28.9). The consecutive number enables the receiver to monitor the vitality of the sender and communication links. Using the acknowledgment mechanism, it is also employed for monitoring the propagation times between sender and receiver (Figure 28.10).
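The byte layout just described can be illustrated with a small sketch that assembles an F data block, the status/control byte, the consecutive number, and a CRC field of the appropriate width. The CRC itself is left as a placeholder here (see the CRC sketch in Section 28.4.3); any packing details beyond those named in the text are assumptions.

```python
# Sketch of the PROFIsafe frame layout described above (Figure 28.9): F process data,
# one status/control byte, one consecutive-number byte (1..255, 0 reserved for start-up),
# then a 2-byte CRC for up to 12 bytes of F data or a 4-byte CRC for up to 122 bytes.
# compute_crc is a stand-in for the profile-defined CRC2.
import struct

def compute_crc(data: bytes, width_bytes: int) -> bytes:
    return bytes(width_bytes)        # placeholder only; see the CRC sketch in Section 28.4.3

def build_profisafe_frame(f_data: bytes, status: int, consecutive_no: int) -> bytes:
    if not 1 <= consecutive_no <= 255:
        raise ValueError("consecutive number runs from 1 to 255; 0 is reserved for start-up")
    if len(f_data) <= 12:
        crc_len = 2
    elif len(f_data) <= 122:
        crc_len = 4
    else:
        raise ValueError("F process data limited to 122 bytes")
    body = f_data + struct.pack("BB", status, consecutive_no)
    return body + compute_crc(body, crc_len)

frame = build_profisafe_frame(b"\x01\x02", status=0x00, consecutive_no=17)
print(frame.hex(), len(frame))
```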
28.4.6 Synchronization Means (Finite State Machines) In distributed systems, a potential failure of the supply voltage of the devices and their subsequent restart are of particular importance. This hazard potential has been covered using detailed state diagrams and interaction diagrams in the PROFIsafe guidelines [5].
FIGURE 28.10 Monitoring propagation times: the F-Host and the F-Slave (output) exchange PROFIsafe frames carrying the consecutive numbers n, n+1, n+2, and so on; on each side a time monitor is restarted with every new number, spanning the CPU and DP cycle times.
FIGURE 28.11 Combined standard and safety controls: standard programming software plus a failsafe programming supplement produce the failsafe application program (failsafe FBs with time diversity) running on a standard CPU with failsafe hardware; standard remote I/O is supplemented with failsafe I/O modules and connected via standard PROFIBUS DP with PROFIsafe.
28.5 Beyond Safe Transmission 28.5.1 Safety-Related Programmable Control Logic The PROFIsafe guidelines specify how to connect any type of F-Host to PROFIsafe communications, but they do not dictate how safety-related signals and process values are to be processed within an F-Host. It turned out, however, that programming safety circuits in graphic languages, such as ladder diagram (LAD) or function block diagram (FBD) according to IEC 61131-3, should be preferred to any textual language. In the process world, the representation in continuous function chart (CFC) has proven helpful. Approved relay-based safety circuits can now be offered as a standard certified software library [6]. There are manufacturers offering the option of retrofitting safety-related user programs for some of their standard programmable logic controllers, which fulfill the necessary requirements for safety-related applications. The safety software is of a time diversity structure; it executes in protected areas where it cannot be influenced by the standard application program. Special F I/O modules can supplement the associated standard remote I/O field devices (Figure 28.11). CNC control systems, too, provide the connection for PROFIsafe field devices.
28.5.2 Commissioning and Repair In order to be able to address the individual signals or process values during programming, the fieldbus must be configured using an engineering tool. In this configuration, the fieldbus participants arrange for
their network address, the input signals, the output signals, the data formats, the transmission rate, etc. In the case of F-Slaves, additional steps are required that supply parameters to the PROFIsafe layer: code name, watchdog time, etc. For the computer-aided calculation of total reaction times in particular, and consequently the determination of watchdog times, the device manufacturers specify the processing time of sensors and actuators in their electronic data sheets (GSD). During start-up of the fieldbus network, the host/master sequentially provides all slaves with the necessary parameters, including F-Parameters, before it starts cyclic operation [7]. Driven by the wide acceptance of microcontrollers, various new devices such as laser scanners and light curtains have appeared, even for safety applications, that feature a certain complexity and require system support to develop their full functionality. Examples are teach-in, rapid device replacement in the event of a malfunction, predictive diagnostics, self-testing and trend analysis, and more. In its profile guidelines, PROFIsafe has shown ways of mastering these tasks: the so-called three-component model that separates the responsibilities of device manufacturers and system manufacturers by using suitable interfaces. In the simplest case of an F-Slave that does not require any individual (i) parameters, providing the safety device with a PROFIsafe connection and a corresponding electronic data sheet (GSD file) proves sufficient. A possible example is a level switch. Slightly more effort is required in the case of a safety device that can be connected to a laptop via a separate RS-232 interface and that uses a special parameterization and diagnostics program, for example, to perform diagnostics. In this case, the laptop or PC communication can also be routed via the PROFIBUS, using acyclic services that are accessible via the new PNO standard FDT/DTM interface technology [8]. Typical representatives of such devices are emergency shutdown (ESD) valves. A device manufacturer would have to provide the safety device, a GSD file, and a DTM program. The DTM program corresponds to the special parameterization and diagnostics program, but it uses the standard FDT interface instead. The most demanding devices are the ones that require user program-controlled parameter value assignment in addition to the above-mentioned features. This requires an additional so-called proxy function block (FB) in the controller (e.g., F-Host). Typical representatives of this device class are the laser scanners, for which we want to "play" a user scenario (Figure 28.12). DTM represents the parameterization and diagnostic program of the laser scanner manufacturer. The FDT framework of the engineering tool arranges for the acyclic connection between the DTM and the safety field device. A user can carry out the parameter value assignment of the laser scanner (configuration of the protection field, for example) just as he or she is used to doing it via the previously
FIGURE 28.12 Parameter handling with powerful safety devices: the engineering tool with its FDT framework hosts the device manufacturer's DTM (administration of access levels, parameterization and diagnostic tool, manual parameter handling, diagnosis displays, teach-in of parameters), which talks via the DP master (Prm and MS1 records) to the F-Slave with PROFIsafe and technology firmware (e.g., a laser scanner); in the F-Host/PLC, the customer program (IEC 61131-3) uses a proxy FB (representative of the field device, controlled parameter handling, diagnosis data, teach-in, standard programming) together with a communication FB; F-Parameters (SIL, time, etc.) come from the GSD, while i-Parameters (individual device parameters) can also be obtained via teach-in.
28-12
The Industrial Communication Technology Handbook
The safety measures in the software, such as diverse representations and various authenticity mechanisms, can remain the same. During each upload or download, the i (individual safety-related) parameters are CRC-protected block-wise. The following steps are new: with the help of the DTM, the user triggers the upload of the i-Parameters from the device into the F-Host. To this end, the proxy FB generates instance data for accommodating the i-Parameters. To ensure correct addressing and correct i-Parameters, the DTM, with the help of the engineering tool, reads the instance data of the proxy FB and compares it with the i-Parameters read directly from the safety device. From here on, the proxy FB is able to load the i-Parameters automatically into the laser scanner in the event of a device replacement. By checking the device type and production number, the proxy FB determines whether the connected device is the correct one or a different one.
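The block-wise comparison described above can be pictured with a short sketch. The code below is illustrative only: the block size, the CRC-16/CCITT polynomial, and the function names are assumptions made for this example and do not reproduce the actual PROFIsafe record layout or polynomials.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Illustrative CRC-16/CCITT (polynomial 0x1021); the real PROFIsafe
   profile defines its own polynomials and record formats. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

#define IPAR_BLOCK_SIZE 32u   /* assumed block size for this sketch */

/* Compare the i-Parameter image stored in the proxy FB instance data
   with the image read directly from the safety device, block by block,
   using the block CRCs as integrity evidence. */
int ipar_images_match(const uint8_t *proxy_img, const uint8_t *device_img,
                      size_t len)
{
    for (size_t off = 0; off < len; off += IPAR_BLOCK_SIZE) {
        size_t blk = (len - off < IPAR_BLOCK_SIZE) ? (len - off) : IPAR_BLOCK_SIZE;
        if (crc16_ccitt(proxy_img + off, blk) != crc16_ccitt(device_img + off, blk))
            return 0;                         /* CRC mismatch: reject  */
        if (memcmp(proxy_img + off, device_img + off, blk) != 0)
            return 0;                         /* value mismatch: reject */
    }
    return 1;                                 /* images agree          */
}
```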
28.5.3 Availability

Availability plays a significant role in safety technology; it is simply a precondition. There is too great a risk that frequent, and apparently groundless, trips of the safety devices cause the system to be shut down (Bhopal effect) [9]. Five areas mainly influence the availability of a system:

• Design/construction of the components
• Facility layout
• Operating conditions
• MTBF of the components (bathtub curve)
• Requirements for operator actions
In the meantime, most standard automation components and systems on the market have reached a design and construction quality that ensures high reliability (high mean time between failures [MTBF]). Standards, test institutes, and certifications are among the causes that led to this situation. However, the availability (uptime in relation to downtime plus uptime) may deteriorate in safety systems that are based on standard systems. Redundant standard components (e.g., two or three controllers) that require synchronization can cause this: the safety function trips if one component takes longer than the other due to an external event. The same risk exists in redundant implementations of sensors whose measured values diverge briefly (discrepancy). PROFIsafe therefore endeavors to support certified, safety-integrated F-Slaves that take this risk away from the user as much as possible. Standard fieldbus systems, even proven in use, may not be considered safe without additional mechanisms like PROFIsafe or a special overall risk assessment of the application.

In order to prevent electromagnetic interference from affecting the operation of a system, PROFIBUS International has published installation guidelines [10] whose use in bidding procedures is recommended, thus creating a clear situation right from the outset. They specify aspects such as grounding, shielding, and lightning protection. PROFIsafe makes compliance with these installation guidelines a prerequisite for operation. An additional redundant structure of the entire system provides a significant increase in availability; PROFIsafe has taken this mode of operation into account right from the outset.

The operating conditions of a system have a strong influence on reliability and availability. As is well known, a temperature rise of 10 K reduces the service life of an electronic device by 50%. Here, PROFIsafe can merely give recommendations: devices should be designed for usage in one of the following zones: office environment, enclosed-type operation within racks, or process participation inside and outside a building.

Predictive diagnosis is used to avoid unexpected late failures toward the end of the service life of a field device. The PROFIsafe concept provides proxy FBs for this purpose: they can execute test programs in the field device automatically and periodically in order to determine its state and the trend. This enables units at risk to be replaced at a time when the process is stopped anyway.
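The two rules of thumb quoted above translate directly into numbers. The sketch below computes the availability from MTBF and repair time and applies the halving-per-10-K service-life rule; the concrete figures (MTBF, repair time, reference service life) are invented for illustration and are not taken from the PROFIsafe guidelines.

```c
#include <stdio.h>
#include <math.h>

/* Availability = uptime / (uptime + downtime); in steady state the
   uptime per failure is the MTBF and the downtime per failure the MTTR. */
static double availability(double mtbf_h, double mttr_h)
{
    return mtbf_h / (mtbf_h + mttr_h);
}

/* Rule of thumb from the text: every 10 K of temperature rise halves
   the service life of an electronic device. */
static double derated_service_life(double life_h_at_ref, double temp_rise_k)
{
    return life_h_at_ref * pow(0.5, temp_rise_k / 10.0);
}

int main(void)
{
    double mtbf = 200000.0;  /* assumed MTBF in hours */
    double mttr = 8.0;       /* assumed mean time to repair in hours */
    printf("Availability: %.6f\n", availability(mtbf, mttr));
    printf("Service life at +20 K: %.0f h (from 100000 h)\n",
           derated_service_life(100000.0, 20.0));
    return 0;
}
```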
The availability of a system is also reduced if users are unable to handle it properly because of its complexity or variety. PROFIsafe therefore currently strives, within PROFIBUS International, to support all activities aimed at standardizing device performance characteristics and software user interfaces [11].
28.5.4 Status of Profile Guidelines

At the Hanover Fair 1999, the PROFIBUS User Organization published version 1.0 of the PROFIsafe guidelines. It was confirmed by positive reports from the German Institute for Occupational Safety and Health (BIA) and the worldwide operating Safety Assessment Organization (TÜV). In the meantime, these guidelines have been enhanced with the requirements for integrating any given safety-related control system (e.g., computerized numerical controls). Since most F-Host manufacturers currently employ diverse hardware, the PROFIsafe working group also deals with possible connections to these systems. The most common data types and the rules for their arrangement have been defined. Via the information in the GSD file of a field device, the corresponding PROFIsafe layer driver in the F-Host can now be set up. Version 1.20 of the PROFIsafe guidelines was published in the fall of 2002. The guidelines and current notes can be found at http://www.profibus.com.
28.5.5 Standards Catching Up

The development of PROFIsafe was based on the new IEC 61508, which recently also became a European standard and, among other things, covers software development procedures. Various existing standards do not yet address these new solution possibilities and are therefore being revised or newly created. This includes IEC 61511, which deals with the particular problems of process safety on the basis of IEC 61508. For safety-relevant machine controls with programmable electronics, IEC 62061 also refers to IEC 61508. Figure 28.13 shows how this standard is embedded within other safety-relevant standards. In the U.S., existing restrictions against programmable electronics for emergency stop functions disappeared with the new version of NFPA 79 published at the end of 2002.

FIGURE 28.13 Relationship between relevant standards for machinery (ISO/FDIS 12100-1 and ISO 14121 for design principles and risk assessment; IEC 60204-1 for general electrical safety of machinery; IEC 61508 for the design of complex subsystems; IEC 62061 for the design of safety-related electrical, electronic, and programmable electronic control systems (SRECS), using the safety integrity level (SIL) as quantitative index; ISO 13849-1/-2 for the design of safety-related parts of machinery control systems (SRPCS), using categories/performance levels).

FIGURE 28.14 Safety on one transmission line or separately (combined standard and failsafe communication on one bus, or separate standard and failsafe segments connected via a DP-DP coupler).
28.6 Peculiarities for Different Industries

28.6.1 Factory Automation

Within factory automation, the discussion about the need for a separate safety bus in addition to the standard fieldbus for implementing distributed safety technology comes up time and again. For PROFIBUS and PROFIsafe, this vexatious question of a second bus is not a problem: the single-cable solution, with combined standard and safety-related control in one CPU, and the solution with separate transmission lines and separate CPUs can both be implemented using the same means (Figure 28.14). Compared to a heterogeneous solution with two different bus systems, this homogeneous approach provides advantages through the use of the same technology, the same engineering tools, and uniform handling (Figure 28.11). Many users are following a migration strategy: first, within existing facilities, the new PROFIsafe technology replaces the relay-based safety technology via a second fieldbus segment; at the next opportunity, and after some experience has been gained, it is planned to move to the combined solution because of its many obvious advantages.
28.6.2 Process Automation

In the meantime, a lot of activity has taken place on the process automation side. As we know, safety applications in process engineering require considerations that go beyond functional safety. Pressure and temperature in a process, for example, cannot always be monitored independently of each other, and a high availability of sensor functions requires a large degree of design skill and experience. The topic of proven in use plays a large role. Up to now, the protection equipment, which mostly represents cases similar to SIL2, has been realized with standard field devices and a 4- to 20-mA transmission technique. Communication faults, such as line interruptions or short circuits, were recognizable, the devices were proven in use, and safety monitoring was performed in the F-Host system by interpreting redundant signals and voters. For microprocessor-equipped devices, guidelines like the NAMUR recommendation NE79 defined the necessary prerequisites. These include proof that the software/firmware has been developed according to generally recognized quality standards. Furthermore, the user expects facilities such as a watchdog timer, memory tests, and protection against inadvertent parameter changes.
FIGURE 28.15 PROFIsafe in field devices for process automation (controller manufacturer domain: engineering tool and F-Host/safety PLC with inputs and outputs; device manufacturer domain: PROFIsafe devices according to NE79 or according to IEC 61508; safety by means of NE79 and proven in use can be switched on/off and reaches a maximum of AK4 (approximately SIL2), or AK6 with redundant devices and a comparator; safety confirmed by means of certification reaches up to and including SIL3).
The new recommendation NE97, which NAMUR published at the end of 2002, deals with the subject of fieldbuses for safety tasks. Among other things, this recommendation shows two possible solutions that can easily be implemented with PROFIsafe. A proven-in-use field device developed according to NE79 and connected to PROFIBUS could actually be used for applications up to SIL2 if the communication faults that are known in bus operation (e.g., address corruption, loss, delay, etc.) could be eliminated. As we already know, this is possible if a PROFIsafe protocol implemented in a single-channel manner is employed. In the future, this solution permits proven-in-use standard PROFIBUS field devices to be operated in the standard operation mode or optionally in the safety mode, just by changing a single parameter. The user remains at liberty to employ certified products (see Figure 28.15). In many systems, the installed proven-in-use field devices are retained and additional fieldbus wiring is installed via so-called remote I/O; for this operating mode, too, NE97 provides a solution.

Process automation field devices are usually subject to a strong standardization of the device model and device parameters, and parameter value assignment is also expected to be as uniform and consistent as possible. To take this into account, the PROFIBUS User Organization has started to integrate PROFIsafe communication compatibly into the existing PA device model [12]. The related specification [13] deals especially with the following topics:

• Switching between the standard and safety modes
• Preparation and commissioning phases
• New parameters in the physical block of the PA device model
• Safety-related data structures for the cyclic data exchange
• Configuration data
As already mentioned, the field devices should be usable in both the standard and safety modes. It is planned to achieve this switch by corresponding parameter settings in the related GSD file, and thus during the start-up of the field device. The required SIL is then communicated via an F-Parameter (F_SIL). Since parameters may have to be edited during operation, a state machine has been designed for the individual commissioning phases. The three states are Non-Safety Mode (S1), Safe Commissioning (S2), and Safe Operation (S3). This extended state machine provides the option of using additional states that permit existing parameter assignment tools based on electronic device descriptions (EDDs) to be used [14]. In this case, the parameter assignment is usually performed offline and the device is sealed by soldering or similar procedures. Using safe parameter value assignment procedures also permits i-Parameter value assignment according to the PROFIsafe profile guidelines [5] to be performed at runtime directly in the safe state S3 using FDT/DTM.
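The commissioning state machine can be illustrated with a small transition function over the three states named above. The sketch below is only a schematic reading of the text: the event names and the guard on the device-individual write-protection password are assumptions for this example and do not reproduce the actual PA profile specification.

```c
/* Commissioning states as described in the text. */
typedef enum { S1_NON_SAFETY, S2_SAFE_COMMISSIONING, S3_SAFE_OPERATION } pa_state_t;

/* Illustrative events; the real profile defines its own triggers. */
typedef enum { EV_ENABLE_SAFETY, EV_PARAMS_ACCEPTED,
               EV_EDIT_PARAMS,   EV_DISABLE_SAFETY } pa_event_t;

/* One transition step; password_ok models the device-individual
   write-protection password mentioned in the text. */
pa_state_t pa_step(pa_state_t s, pa_event_t ev, int password_ok)
{
    switch (s) {
    case S1_NON_SAFETY:
        if (ev == EV_ENABLE_SAFETY && password_ok) return S2_SAFE_COMMISSIONING;
        break;
    case S2_SAFE_COMMISSIONING:
        if (ev == EV_PARAMS_ACCEPTED) return S3_SAFE_OPERATION;
        if (ev == EV_DISABLE_SAFETY)  return S1_NON_SAFETY;
        break;
    case S3_SAFE_OPERATION:
        if (ev == EV_EDIT_PARAMS && password_ok) return S2_SAFE_COMMISSIONING;
        if (ev == EV_DISABLE_SAFETY)             return S1_NON_SAFETY;
        break;
    }
    return s;  /* any other event is ignored in this sketch */
}
```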
FIGURE 28.16 The PROFIsafe certification procedure (tests for EMI immunity, PROFIBUS conformance, and application safety according to IEC 61508; the test of the PROFIsafe slave device comprises a DP-V1 test, a PROFIsafe layer test, and an interoperability test with a reference F-Host; successfully passed tests lead to a certificate from the PNO, and a successful assessment leads to a certificate from a notified body).
Additional parameters are required in such a safety-related field device for mastering the new tasks. Currently, there is one parameter for setting the SIL level, one parameter for displaying the operating states S1 through S3, and a device-individual password for write protection. The data formats of the PROFIsafe frame have been supplemented with the standardized data formats of the PA device model for cyclic data exchange, and the corresponding configuration identifiers have been defined.
28.6.3 Environmental Conditions

In the meantime, PROFIBUS International has published further specifications, since most safety standards ignore bus operation. One of these defines the boundary conditions for safe communication [15]. It deals with the topics of installation guidelines, electrical safety (overvoltage/shock protection, SELV/PELV), power supply units, and immunity to electromagnetic interference. Another describes a policy for all PROFIBUS member companies and providers of safety-related devices, dealing with responsibilities and the quality of products and services [16].
28.6.4 Test and Certification

The PROFIsafe mechanisms are based on finite state machines. It is thus possible, via a validation tool for finite state machines, to prove mathematically that PROFIsafe works correctly, even in cases where more than two independent errors or failures may occur. This is achieved systematically by generating all possible test-to-pass and test-to-fail situations. These have been extracted as test cases for a fully automated PROFIsafe layer tester, which is used to check the PROFIsafe conformance of safety-related devices. It is part of a three-step procedure within the overall certification process (Figure 28.16) [17].
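The idea of deriving test-to-pass and test-to-fail cases from a state machine can be sketched as follows. The transition table below is a toy example, not the actual PROFIsafe layer specification; it merely shows how every defined transition yields a pass case and every undefined state/event pair yields a fail case.

```c
#include <stdio.h>

#define N_STATES 3
#define N_EVENTS 3
#define UNDEF   -1

/* Toy transition table: next_state[state][event], UNDEF where no
   transition is defined. A real tester would use the PROFIsafe layer
   state machine from the test specification [17]. */
static const int next_state[N_STATES][N_EVENTS] = {
    {     1, UNDEF, UNDEF },
    {     2,     0, UNDEF },
    { UNDEF,     0,     2 },
};

int main(void)
{
    for (int s = 0; s < N_STATES; s++) {
        for (int e = 0; e < N_EVENTS; e++) {
            if (next_state[s][e] != UNDEF)
                printf("test-to-pass: state %d, event %d -> state %d\n",
                       s, e, next_state[s][e]);
            else
                printf("test-to-fail: state %d, event %d must be rejected\n",
                       s, e);
        }
    }
    return 0;
}
```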
28.6.5 Development Tools and Support

Using PROFIBUS/PROFIsafe in factory automation, with its short reaction times, on the one hand, and in hazardous areas in process automation, with the lowest possible power dissipation, on the other hand, leads to contradictory requirements. Since most modern devices use microprocessors, implementing the PROFIsafe procedure in software is an obvious solution. The operating conditions determine the selection of the microprocessor with respect to high performance or low power dissipation; PROFIsafe adapts to either case, since it requires merely a few kilobytes of free memory: no additional power supply and no additional space in the confined enclosures are needed. PROFIsafe becomes a part of the device-specific safety software.
FIGURE 28.17 Generic PROFIsafe driver for slaves (two synchronized microprocessors, each running a PROFIsafe driver on top of a DP-stack interface, achieve SIL3 or Category 4; data are exchanged over standard PROFIBUS via a DP-stack on a PROFIBUS ASIC).

FIGURE 28.18 Audit trailing for secured safety parameters (the data of a parameterization session is secured with a SIGNATURE transferred via the acyclic MS2 services).
Immediately after completion of the specification, a company consortium started the joint development of a generic PROFIsafe driver software for slaves. Generic in this context means that the driver has been implemented in ANSI-C, according to the coding rules for safety technology, and for different microprocessors and C compilers. Although it should preferably be used in the dual-channel mode (see Figure 28.17), it may also be used in the single-channel mode (see Figure 28.15). For specifications, the approval authorities merely issue so-called positive technical reports; with the acceptance of the software on concrete hardware, there is now a TÜV certificate with the number Z2 01 12 20411 008.

Without simplifying, step-by-step development equipment, the development of safety-related fieldbus devices has turned out to be difficult because of the safety measures (consecutive numbering, watchdog timers, etc.) that have to be fulfilled at all times in a real environment. The members of PROFIBUS International are now providing development boards and PC software (a PROFIsafe monitor) to develop step by step from PROFIBUS DP up to the PROFIsafe communication level.
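The safety measures listed above (consecutive numbering, watchdog supervision, CRC protection) form the heart of such a driver's receive path. The sketch below is a highly simplified illustration of that idea, not the certified driver: the frame layout, field widths, and function names are assumptions made for this example, and the real PROFIsafe frame format and CRC differ.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified safety container; the real PROFIsafe frame layout and CRC
   are defined in the profile [5] and differ from this sketch. */
typedef struct {
    uint8_t  consecutive_number;  /* sign-of-life counter            */
    uint8_t  status;              /* device status bits              */
    uint16_t crc;                 /* CRC over data + counter (assumed) */
    uint8_t  data[8];             /* process values                  */
} safety_frame_t;

typedef struct {
    uint8_t  expected_number;     /* next expected consecutive number */
    uint32_t watchdog_ms;         /* configured watchdog time (F-Parameter) */
    uint32_t last_valid_ms;       /* timestamp of last accepted frame */
} safety_channel_t;

extern uint16_t safety_crc(const safety_frame_t *f);  /* assumed helper */

/* Accept a frame only if the watchdog has not expired and the CRC and
   consecutive number are correct; otherwise the application must fall
   back to its failsafe substitute values. */
bool safety_receive(safety_channel_t *ch, const safety_frame_t *f, uint32_t now_ms)
{
    if (now_ms - ch->last_valid_ms > ch->watchdog_ms)
        return false;                              /* watchdog expired */
    if (safety_crc(f) != f->crc)
        return false;                              /* corrupted frame  */
    if (f->consecutive_number != ch->expected_number)
        return false;                              /* lost or repeated */

    ch->expected_number = (uint8_t)(ch->expected_number + 1u);
    ch->last_valid_ms   = now_ms;
    return true;                                   /* frame accepted   */
}
```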
28.7 Products

Currently available products include safety-related programmable controllers and numerical controllers, remote I/O for IP20 and IP67, laser scanners, light curtains, motor starters, frequency converters, drives, gas and fire sensors, etc. For further details, please see www.profibus.com.
28.8 Prospects

The sophisticated safety-related devices mentioned in this chapter, with their many possibilities of using i-Parameters to adapt flexibly to all kinds of safety applications, create a requirement for continuous automatic monitoring of security, safety, and data integrity. Regulations like 21 CFR 11 [18] therefore define rules for the use of secure, computer-generated, time-stamped audit trails that independently record the date and time of operator entries and actions that create, modify, or delete electronic records. Record changes shall not obscure previously recorded information. Such audit trail documentation shall be retained for a period at least as long as that required for the subject electronic records and shall be available for agency review and copying.

In its device-independent profile guideline Part 1, “Identification and Maintenance Functions” [19], PROFIBUS suggests using asset identification parameters and the parameter SIGNATURE to allow parameterization tools to store a security code as a reference for a particular parameterization session. Audit trailing tools may retrieve this code for integrity checks at any time. Together with the asset identification parameters MANUFACTURER_ID, ORDER_ID, and SERIAL_NUMBER, the SIGNATURE parameter makes the session data unambiguous.

Due to its black channel approach, the PROFIsafe principle will be easily portable to PROFINET once this new technology is well established on the market. In the meantime, safety device manufacturers are well advised to start with a PROFIBUS DP/PROFIsafe version and find an already existing, fast-growing market that is receptive to the new safety technologies of PROFIsafe.
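An audit trail entry of the kind described above can be pictured as a record that binds the asset identification parameters to the session signature and a time stamp. The structure below is only an illustration of that idea; the field sizes and the handling of the signature are assumptions for this sketch and are not prescribed by the profile guideline.

```c
#include <stdint.h>
#include <time.h>
#include <string.h>
#include <stdio.h>

/* One audit trail entry binding a parameterization session to a device.
   Field widths are assumed for this sketch. */
typedef struct {
    uint16_t manufacturer_id;   /* MANUFACTURER_ID                 */
    char     order_id[20];      /* ORDER_ID                        */
    char     serial_number[16]; /* SERIAL_NUMBER                   */
    uint32_t signature;         /* SIGNATURE stored by the tool    */
    time_t   timestamp;         /* when the session was closed     */
    char     operator_name[32]; /* who performed the change        */
} audit_entry_t;

/* Record a session; in a real system this entry would go to secure,
   append-only storage so that changes cannot obscure earlier records. */
void record_session(audit_entry_t *e, uint16_t man_id, const char *order,
                    const char *serial, uint32_t signature, const char *who)
{
    memset(e, 0, sizeof(*e));
    e->manufacturer_id = man_id;
    snprintf(e->order_id, sizeof(e->order_id), "%s", order);
    snprintf(e->serial_number, sizeof(e->serial_number), "%s", serial);
    e->signature = signature;
    e->timestamp = time(NULL);
    snprintf(e->operator_name, sizeof(e->operator_name), "%s", who);
}

/* Integrity check: the signature read back from the device must match
   the reference stored in the audit trail for the same device identity. */
int session_unchanged(const audit_entry_t *e, uint32_t signature_from_device)
{
    return e->signature == signature_from_device;
}
```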
Abbreviations

ASi       Actuator sensor interface
ASIC      Application-specific integrated circuit
BIA       German Institute of Occupational Safety and Health
CPU       Central processing unit
CRC       Cyclic redundancy check
DP        Decentralized peripherals
DTM       Device type manager
EMI       Electromagnetic interference
EN, prEN  European standard, preliminary European standard
F         Failsafe
FB        Function block
FDT       Field device tool
GSD       Generic station description (electronically readable data sheet)
HW        Hardware
IEC       International Electrotechnical Commission
I/O       Input/output
IP        Ingress protection, e.g., IP20
ISO/OSI   International Organization for Standardization/Open Systems Interconnection (reference model)
MS1/MS2   Acyclic master–slave communication services of PROFIBUS DP
MTBF      Mean time between failures
NAMUR     Association of users of process control technology
NFPA      National Fire Protection Association
PA        Process automation
PELV      Protective extra low voltage
PLC       Programmable logic controller
SELV      Safety extra low voltage
SW        Software
TÜV       Safety Assessment Organization
References

[1] IEC 61508: Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems.
[2] IEC 61158/61784: Digital Data Communications for Measurement and Control: Fieldbus for Use in Industrial Control Systems.
[3] IEC 62280/EN 50159-1: Railway Applications: Communication, Signalling and Processing Systems: Part 1: Safety-Related Communication in Closed Transmission Systems.
[4] ISO/OSI Model, ISO/IEC 7498, Information Technology: Open Systems Interconnection: Basic Reference Model.
[5] PROFIsafe, Profile for Safety Technology, V1.20, October 2002, PROFIBUS Order 3.092.
[6] www.plcopen.org.
[7] M. Popp, The Rapid Way to PROFIBUS DP, PROFIBUS Order 4.072.
[8] Specification for PROFIBUS Device Description and Device Integration: Volume 3: FDT Interface Specification, V1.2, PROFIBUS Order 2.162.
[9] P. Gruhn and H. Cheddie, Safety Shutdown Systems: Design, Analysis and Justification, ISA, Research Triangle Park, NC, 1998.
[10] Installation Guideline for PROFIBUS DP/FMS, V1.0, September 1998, PROFIBUS Order 2.112.
[11] DTM Style Guide, V1.1, PROFIBUS Order 2.172.
[12] PROFIBUS PA Profile for Process Control Devices, V3.0, October 1999, PROFIBUS Order 3.042.
[13] PROFIsafe for PA Devices, working draft, not yet published.
[14] Specification for PROFIBUS Device Description and Device Integration: Volume 2: Electronic Device Description, V1.1, January 2001, PROFIBUS Order 2.152.
[15] PROFIsafe: Requirements for Installation, Immunity, and Electrical Safety, V1.0, PROFIBUS Order 2.232.
[16] PROFIsafe Policy, V1.3, PROFIBUS Order 2.282.
[17] PROFIsafe: Test Specification for Safety-Related PROFIBUS DP Slaves, V1.0, PROFIBUS Order 2.242.
[18] Title 21 Code of Federal Regulations, U.S. Food and Drug Administration, available at www.fda.gov.
[19] PROFIBUS Profile Guidelines: Part 1: Identification and Maintenance Functions, V1.1, PROFIBUS Order 3.502.
V Applications of Networks and Other Technologies

Automotive Communication Technologies
29 Design of Automotive X-by-Wire Systems
   Cédric Wilwert, Nicolas Navet, Ye Qiong Song, and Françoise Simonot-Lion
30 FlexRay Communication Technology
   Dietmar Millinger and Roman Nossal
31 The LIN Standard
   Antal Rajnák
32 Volcano: Enabling Correctness by Design
   Antal Rajnák

Networks in Building Automation
33 The Use of Network Hierarchies in Building Telemetry and Control Applications
   Edward Koch
34 EIB: European Installation Bus
   Wolfgang Kastner and Georg Neugschwandtner
35 Fundamentals of LonWorks/EIA-709 Networks: ANSI/EIA-709 Protocol Standard (LonTalk)
   Dietmar Loy

Manufacturing Message Specification in Industrial Automation
36 The Standard Message Specification for Industrial Automation Systems: ISO 9506 (MMS)
   Karlheinz Schwarz
37 Virtual Factory Communication System Using ISO 9506 and Its Application to Networked Factory Machine
   Dong-Sung Kim and Zygmunt J. Haas
Motion Control
38 The SERCOS interface™
   Scott C. Hibbard, Peter Lutz, and Ronald M. Larsen

Train Communication Network
39 The IEC/IEEE Train Communication Network
   Hubert Kirrmann and Pierre A. Zuber

Smart Transducer Interface
40 A Smart Transducer Interface Standard for Sensors and Actuators
   Kang Lee

Energy Systems
41 Applying IEC 61375 (Train Communication Network) to Data Communication in Electrical Substations
   Hubert Kirrmann

SEMI
42 SEMI Interface and Communication Standards: An Overview and Case Study
   A.M. Fong, K.M. Goh, Y.G. Lim, K. Yi, and O. Tin
29 Design of Automotive X-by-Wire Systems

Cédric Wilwert, PSA Peugeot–Citroën
Nicolas Navet, LORIA
Ye Qiong Song, LORIA
Françoise Simonot-Lion, LORIA

29.1 Why X-by-Wire Systems?
   Steer-by-Wire System • Brake-by-Wire Systems
29.2 Problem, Context, and Constraints for the Design of X-by-Wire Systems
   General Constraints • Dependability Constraints • Real-Time Constraints
29.3 Fault-Tolerant Services for X-by-Wire
   Overview on the Communication Services • Main Time-Triggered Protocols for Automotive Industry • Operating Systems and Middleware Services
29.4 Steer-by-Wire Architecture: A Case Study
   Functional Description of a Steer-by-Wire System • Dependability and Real-Time Properties • Operational Architecture • Dependability Issues
29.5 Conclusion
References
29.1 Why X-by-Wire Systems?

Embedded electronics, and more precisely embedded software, is a fast-growing area, and software-based systems are increasingly replacing mechanical and hydraulic ones. The reasons for this evolution are technological as well as economical. On the one hand, the cost of hardware components is decreasing while their performance and reliability are increasing. On the other hand, electronic technology facilitates the introduction of new functions whose development would be costly, or not even feasible, using mechanical or hydraulic systems alone. This evolution, formerly confined to functions such as motor control, wipers, lights, or door controls, now affects all car domains, even critical functions such as throttle, brake, or steering control. This trend has resulted in the concept of X-by-Wire, in which the mechanical or hydraulic systems embedded in an automotive application are replaced by fully electric/electronic ones.

Historically, the first critical X-by-Wire function was Throttle-by-Wire, implemented in a Chevrolet Corvette series in 1980 to replace the cable-based throttle. Today, this function is present in most vehicles, for example, the Peugeot 307. Shift-by-Wire systems, also known as Gear-by-Wire, are also implemented in some high-end vehicles such as the BMW 5 and 7 series. However, mechanical systems are still necessary for most currently used X-by-Wire systems, either working in conjunction with the electronic system or serving as a backup (e.g., the electrohydraulic braking system; semiactive suspensions, as in the Mercedes Adaptive Damping System; the electronic camshaft in the BMW Valvetronic technology; and the robotized gearbox). It is interesting to note that the robotized gearbox is an option proposed by all carmakers in the world today.
One of the main obstacles to general acceptance of X-by-Wire systems is the difficulty of proving that all the necessary safety measures are followed. It is enough to note that a dysfunction of a Steer-by-Wire, Brake-by-Wire, or Throttle-by-Wire system would jeopardize the safety of the occupants. As seen before, a number of X-by-Wire systems have already been implemented in certain series of vehicles; however, Steer-by-Wire and Brake-by-Wire systems still always include a mechanical backup. The concern for safety is certainly a major factor. Another obstacle is that customer demand is not very great at the moment: customers do not yet realize the technical advantages and only see a higher price tag. But the advantages of this technology can be very attractive for both carmakers and customers (see Sections 29.1.1 and 29.1.2), which explains why carmakers are investing in this domain.
29.1.1 Steer-by-Wire System

The first advantage lies in the decreased risk of the steering column entering the cockpit in the event of a frontal crash. Furthermore, the variable steering ratio of a Steer-by-Wire system brings remarkably increased comfort to the driver: this function enables the steering ratio between the handwheel and the wheels to adapt to the driving conditions. In parking and urban driving, this ratio should be smaller in order to reduce the amplitude of the handwheel rotation. Another facility for the steering functionality, brought by software-based technology, is µ-split braking support, which consists of applying a dissymmetric torque to the wheels in the case of dissymmetric adherence. Finally, the steering column is one of the heaviest components of the vehicle, and removing it significantly decreases the weight of the vehicle and thus reduces fuel consumption.

Among the drawbacks, the electrical power needed to power the front axle requires the use of 42-V technology. At the Society of Automotive Engineers (SAE) Conference in 2003 [1], it was stated that this technology would not be mature before 2010. This announcement has considerably reduced the emphasis put on X-by-Wire developments in general. Furthermore, the safety issues have not yet been fully defined.
29.1.2 Brake-by-Wire Systems

A Brake-by-Wire system implemented with one microcontroller and one actuator per wheel can significantly increase the quality of braking, in particular by reducing the stopping distance. Moreover, this technology provides more precise braking by adapting to the pressure the driver puts on the pedal. As with Steer-by-Wire, removing the hydraulic braking system significantly decreases the weight of the vehicle and therefore the costs. Finally, Brake-by-Wire helps to protect the environment because no braking fluid is necessary. Unfortunately, as with the Steer-by-Wire system, the 42-V problem is a barrier to its deployment at present. Another hindrance comes from the attitude of customers: the customer is faced with a new technology for a critical function that will be more expensive in the beginning and whose benefits he or she does not clearly see.

The first technology in Brake-by-Wire was the introduction of electrohydraulic braking (EHB). The main difference between the EHB system and the classic braking system is that each wheel has an independent braking subsystem: the hydraulic pressure is applied independently on each wheel. However, a classic hydraulic circuit is still implemented from the pedal to the front wheels for safety reasons. The EHB system was factory installed for the first time in 2001 in the Mercedes Roadster SL. Today, Toyota proposes a regenerative EHB in its Prius, which uses the energy dissipated during deceleration to charge the battery.
29.2 Problem, Context, and Constraints for the Design of X-by-Wire Systems

29.2.1 General Constraints

As explained before, some X-by-Wire systems are already standard in certain series. However, while the implementation of Brake-by-Wire and Steer-by-Wire is possible with a 14-V battery in low-weight
vehicles, this is not the case for heavier ones. As the first Brake-by-Wire and Steer-by-Wire systems will be costly in the beginning, they have greater chances of being implemented in high-end vehicles. Consequently, 42-V technology has to be mature before X-by-Wire systems can be mass-produced. Moreover, fuel cell technology, with fully electric energy sources, seems well positioned to replace combustion engines, which significantly reduces the necessity of 42-V technology in the long run [1]. In addition, the size of X-by-Wire systems and their cost are major constraints for carmakers; electronics-based systems already account for 30% of the total cost of current vehicles.
29.2.2 Dependability Constraints

An analysis of the requirements for X-by-Wire systems was published in the conclusions of the X-by-Wire Esprit project [19]. In general, for a critical X-by-Wire system, it must be ensured that [19]:

• A system failure does not lead to a state in which human life, economics, or the environment is endangered.
• A single failure of one component must not lead to a failure of the whole X-by-Wire system.

The system shall memorize intermittent failures, and it shall signal a critical failure to the driver, for example, through a warning light on the dashboard. Moreover, it is required that the system be at least able to tolerate one major critical fault without loss of functionality for a time long enough to reach a safe parking area. This requirement is very constraining, because if only two redundant components are used to provide a critical function, then in case of failure of one of these components the driver must immobilize the vehicle. This requirement will have to be confronted with the availability requirements in the future.

In terms of the criticality of the involved functions, automotive X-by-Wire systems can reasonably be compared with Flight-by-Wire systems in the avionics field. According to [19], the probability of encountering a critical safety failure shall not exceed 5 × 10⁻¹⁰ per hour and per system, but other studies have been carried out with a maximal bound of 10⁻⁹. This quantification can be translated in terms of safety integrity level (SIL) [2]: a maximal bound of 10⁻⁹ corresponds to a SIL4 system (in fact, SIL4 conformance is reached below 10⁻⁸). Up to now, it has been a challenge to reach such dependability because of the lack of experience in the automotive industry with X-by-Wire systems and because of the complexity of the problem. In particular, the environment (electromagnetic interference (EMI), temperature peaks, etc.) may reduce the predictability of the system, and the design is subject to heavy cost and weight constraints. It is likely that, as in the aeronautics industry, for legal and technical reasons, the design process (e.g., software development, methods and tools for validation, etc.) will have to be certified in the future. Finally, one objective that must be reached is that an X-by-Wire system offers the same availability and the same maintainability as its mechanical/hydraulic counterparts. The challenge is to prove that a given X-by-Wire system adheres to all these requirements.
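The probability bounds quoted above can be checked mechanically against the IEC 61508 bands for the probability of a dangerous failure per hour in high-demand or continuous mode (SIL4: 10⁻⁹ to below 10⁻⁸, SIL3: 10⁻⁸ to below 10⁻⁷, and so on). The sketch below only reproduces that table lookup; the example target values are chosen to match the figures cited in the text.

```c
#include <stdio.h>

/* IEC 61508 bands for the probability of a dangerous failure per hour
   (high-demand/continuous mode). */
static int sil_from_pfh(double pfh)
{
    if (pfh < 1e-8) return 4;   /* includes targets below 1e-9 */
    if (pfh < 1e-7) return 3;
    if (pfh < 1e-6) return 2;
    if (pfh < 1e-5) return 1;
    return 0;                   /* does not reach SIL1 */
}

int main(void)
{
    double targets[] = { 5e-10, 1e-9, 5e-9 };  /* example system targets */
    for (int i = 0; i < 3; i++)
        printf("failure probability %.1e per hour -> SIL%d\n",
               targets[i], sil_from_pfh(targets[i]));
    return 0;
}
```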
29.2.3 Real-Time Constraints

Belonging to the chassis domain, Steer-by-Wire and Brake-by-Wire systems are intrinsically real-time distributed systems. They implement complex multivariable control laws and deliver real-time information to intelligent devices that are physically distant (for example, the four wheels). They have to respect stringent time constraints, such as sampling periods of only a few milliseconds. End-to-end response times shall also be bounded; for example, the time between a request from the driver and the response of the physical system must be lower than a few tens of milliseconds (see the Steer-by-Wire example in Section 29.4). An excessive end-to-end response time of a control loop may not only induce performance degradation but also cause instability of the vehicle. Although these constraints may differ according to the driving conditions, they all have to be respected whatever the situation. So, in general, the worst-case scenario must be taken into consideration, and these real-time constraints must be met with a high probability, even if the system is subject to random perturbations, because of the safety-critical nature of X-by-Wire applications.
29.3 Fault-Tolerant Services for X-by-Wire

29.3.1 Overview on the Communication Services

The communication system has to provide services that are pertinent with respect to the dependability objectives of the application (see Section 29.2). For instance, knowledge of the stations that are operational at a given time is usually needed by X-by-Wire applications, and thus a membership service that furnishes this information will ease the development of application-level software and improve its accuracy.

29.3.1.1 Time-Triggered Communication

Among communication networks, one distinguishes between time-triggered protocols, where activities are driven by the progress of time, and event-triggered protocols, where activities are driven by the occurrence of events. Both types of communication have advantages, but it is generally considered that dependability is much easier to ensure using a time-triggered bus (refer, for instance, to [3] for a discussion on this topic). This reference explains that, currently, only time-triggered communication systems are being considered for use in X-by-Wire applications. In this category, multiaccess protocols based on time-division multiple access (TDMA) are particularly well suited: they provide deterministic access to the medium (the order of the transmissions is defined statically at design time and organized in rounds), and thus bounded response times. Moreover, their regular message transmissions can be used as heartbeats for detecting station failures. The two TDMA-based networks that are candidates for supporting X-by-Wire applications are the Time-Triggered Protocol TTP/C (see Section 29.3.2.1) and FlexRay (see Section 29.3.2.2). At the time of writing, FlexRay, which is backed by the major players of the European automotive industry, seems to be in a strong position to become the standard in the industry, although the specifications are not yet finalized.

29.3.1.2 Fault-Tolerant Unit

To achieve fault tolerance, that is to say, the capacity of a system to deliver its service even in the presence of faults, certain nodes are replicated and clustered into fault-tolerant units (FTUs). An FTU is a set of several stations that perform the same function, and each node of an FTU possesses its own slot in the round, so that the failure of one or more stations in the same FTU can be tolerated. The role of FTUs is actually twofold. First, they make the system resilient in the presence of transmission errors (some frames of the FTU may still be correct, while others are corrupted). Second, they provide a means to fight against measurement and computation errors occurring before transmission (some nodes may send the correct values, while others may make errors). The stations forming an FTU will be called replicas from here on. The use of fail-silent nodes (see Section 29.3.1.3) greatly simplifies the design of a fault-tolerant architecture; however, the fail-silent property is generally not easy to verify. The number of replicas per FTU that is required to tolerate k faulty components depends on the behavior of the individual components [4]. For instance, if the failure of k nodes must be tolerated, then the least necessary number of replicated nodes is k + 1 when all nodes are fail-silent.

29.3.1.3 Fail-Silent Node

As previously explained, fail-silent nodes greatly decrease the complexity of the design of a critical application. A node is said to be fail-silent if:
1. It sends frames at the correct point in time (correctness in the time domain) and the correct value is transmitted (correctness in the value domain), or
2. It sends detectably incorrect frames (e.g., with a wrong cyclic redundancy check (CRC)) in its own slot, or no frame at all.

A communication system such as TTP/C is able to provide very reliable support for these requirements (which, once fulfilled, provide the so-called fail-silence in the temporal domain), especially through the bus guardian concept, while the value domain is mainly the responsibility of the application. Refer to [4, 5, 6] for good starting points on the difficult problem of ensuring fail-silence.
29.3.1.4 Bus Guardian

When communications are multiplexed, it may happen that a faulty electronic control unit (ECU), transmitting outside its specification (for instance, at the wrong time or with a larger frame), perturbs the correct functioning of the whole network. One well-known manifestation is the so-called babbling idiot: a node that transmits continuously. To avoid this situation, a component called the bus guardian restricts the controller's ability to transmit by allowing transmission only when the node exhibits the specified behavior. Ideally, the bus guardian should have its own copy of the schedule, be physically separated from the controller, possess its own power supply, and be able to construct the global time itself. Due to the strong pressure from the automotive industry concerning costs, these assumptions are not fulfilled in general, which reduces the efficiency of the bus guardian strategy.
29.3.2 Main Time-Triggered Protocols for Automotive Industry

29.3.2.1 TTP/C

TTP/C (Time-Triggered Protocol), which is defined in [7], is a central part of the Time-Triggered Architecture (TTA) [8], and it possesses numerous features and services related to dependability, such as the bus guardian [5], the group membership algorithm [9], and support for mode changes [10]. TTA and TTP/C have been designed and extensively studied at the Vienna University of Technology. Hardware implementations of TTP/C, as well as software tools for the design of the application, are commercialized by the TTTech company [11] and are available today.

On a TTP/C network, the transmission support is replicated and each channel transports its own copy of the same message. Although electromagnetic interference (EMI) is likely to affect both channels in quite a similar manner, the redundancy provides some resilience to transmission errors (see Section 29.4.4.4). TTP/C can be implemented with a bus topology or a star topology. The latter provides better fault tolerance, since the star can act as a central bus guardian and protect against errors that cannot be avoided by a local bus guardian, such as spatial proximity faults. For instance, a star topology is more resilient to spatial proximity faults and to faults due to the desynchronization of an ECU (see Section 29.3.1.4). To avoid a single point of failure, a dual-star topology should be used, but with the drawback that the length of the cables is significantly increased.

At the medium access control (MAC) level, TTP/C implements a synchronous TDMA scheme: the stations (or nodes) have access to the bus in a strict deterministic sequential order, and each station possesses the bus for a constant period called a slot, during which it has to transmit one frame. The sequence of slots such that all stations have accessed the bus once is called a TDMA round. An example of a round is shown in Figure 29.6. The size of the slot is not necessarily identical for all stations in the TDMA round, but successive slots belonging to the same station are of the same size. Consecutive TDMA rounds may differ in the data transmitted during the slots, and the sequence of all TDMA rounds is the cluster cycle, which repeats itself cyclically.

TTP/C includes powerful but complex algorithms for easing and speeding up the design of fault-tolerant applications, and some of them have been formally verified (for instance, [9] and [12]). In particular, TTP/C implements a clique avoidance algorithm and a membership algorithm that also provides data acknowledgment. The fault hypothesis used for the design of TTP/C is well specified and also quite restrictive (two successive faults must occur at least two rounds apart). Situations outside the fault hypothesis are treated using “never give up” (NUP) strategies [3], which aim to continue operation in a degraded mode. For example, a usual method is that each node switches to local control according to the information still available, while trying to return to the normal mode.

29.3.2.2 FlexRay

A consortium of major companies from the automotive field is currently developing the FlexRay protocol. The core members are BMW, Bosch, DaimlerChrysler, General Motors, Motorola, Philips, and Volkswagen. The specifications of the FlexRay protocol are not publicly available or finalized at the time of writing; however, material describing the protocol is available on the FlexRay Web site [13].
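The TDMA round and cluster cycle described above for TTP/C are, in essence, a table known to every node at design time. The sketch below only illustrates that idea; the slot table, node names, and durations are invented for the example and are not taken from the TTP/C or FlexRay specifications.

```c
#include <stdint.h>
#include <stdio.h>

/* One slot in the static schedule: which node owns it and how long it lasts. */
typedef struct {
    const char *owner;        /* transmitting node              */
    uint32_t    length_us;    /* slot duration in microseconds  */
} slot_t;

/* Example TDMA round with four nodes; a whole cluster cycle would be a
   sequence of such rounds that may differ in the data transmitted. */
static const slot_t round_table[] = {
    { "NodeA", 200 }, { "NodeB", 150 }, { "NodeC", 200 }, { "NodeD", 150 },
};
#define N_SLOTS (sizeof round_table / sizeof round_table[0])

/* Given the time elapsed since the start of the round, return the slot
   index whose owner is allowed to transmit (the same rule a bus guardian
   would enforce locally). */
static int current_slot(uint32_t t_us, uint32_t round_len_us)
{
    uint32_t t = t_us % round_len_us, acc = 0;
    for (int i = 0; i < (int)N_SLOTS; i++) {
        acc += round_table[i].length_us;
        if (t < acc) return i;
    }
    return -1; /* unreachable when round_len_us equals the sum of the slots */
}

int main(void)
{
    uint32_t round_len = 0;
    for (int i = 0; i < (int)N_SLOTS; i++) round_len += round_table[i].length_us;
    int s = current_slot(480, round_len);
    printf("t = 480 us -> slot %d (%s)\n", s, round_table[s].owner);
    return 0;
}
```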
FIGURE 29.1 Example of a FlexRay communication cycle with four nodes, A, B, C, and D (a TDMA static segment of equal-sized static slots followed by an FTDMA dynamic segment divided into mini-slots).

FIGURE 29.2 Example of message scheduling in the dynamic segment of the FlexRay communication cycle (the slot counters of channels 1 and 2 advance mini-slot by mini-slot; frames with identifiers n, n + 1, n + 2, n + 4, and n + 5 are transmitted independently on the two channels).
The FlexRay network is very flexible with regard to topology and transmission support redundancy. It can be configured as a bus, a star, or multiple stars, and it is not mandatory that each station possess replicated channels, even though this should be the case for X-by-Wire functions.

At the MAC level, FlexRay defines a communication cycle as the concatenation of a time-triggered (or static) window and an event-triggered (or dynamic) window. In each communication window, whose size is set statically at design time, a different protocol is applied, and the communication cycles are executed periodically. The time-triggered window uses a TDMA MAC protocol; the main difference with TTP/C is that a station might possess several slots in the time-triggered window, but the size of all slots is identical (Figure 29.1). In the event-triggered part of the communication cycle, the protocol is FTDMA (flexible time-division multiple access): the time is divided into so-called mini-slots, each station possesses a given number of mini-slots (not necessarily consecutive), and it can start the transmission of a frame inside each of its own mini-slots. A mini-slot remains idle if the station has nothing to transmit. An example of a dynamic window is shown in Figure 29.2: on channel B, frame n starts to be transmitted in mini-slot n, while mini-slot n + 1 is not used. It is noteworthy that frame n + 4 is not received simultaneously on channels A and B since, in the dynamic window, transmissions are independent on the two channels.

The FlexRay MAC protocol is much more flexible than the TTP/C MAC, since in the static window nodes are assigned as many slots as necessary (up to 4095 for each node) and since frames are transmitted only if necessary in the dynamic part of the communication cycle. Compared to TTP/C, the structure of the communication cycle is not statically stored in the nodes; it is instead revealed during the start-up phase. However, unlike TTP/C, mode changes with a different communication schedule for each mode are not possible. From the dependability point of view, all services and functionalities of FlexRay, except the bus guardian and the clock synchronization, are not currently well documented, nor is the fault hypothesis used for the design. It seems that most features will have to be implemented in software or hardware layers on top of FlexRay, with the drawback that efficient implementations might be more difficult to achieve.
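The mini-slot mechanism of the dynamic segment can be pictured with a small simulation: the slot counter advances by one empty mini-slot when the owner of the current identifier has nothing to send, and by the whole frame duration when it transmits. The code below is an illustrative and simplified model only; the frame identifiers, durations, and queue contents are invented and do not follow the FlexRay timing parameters.

```c
#include <stdio.h>
#include <stdbool.h>

#define N_IDS 8   /* frame identifiers competing in the dynamic segment */

/* has_frame[id] tells whether the node owning identifier id wants to send;
   frame_minislots[id] is the frame length expressed in mini-slots. */
static void dynamic_segment(const bool has_frame[N_IDS],
                            const int frame_minislots[N_IDS],
                            int segment_minislots)
{
    int used = 0;                       /* mini-slots consumed so far */
    for (int id = 0; id < N_IDS; id++) {
        if (has_frame[id]) {
            if (used + frame_minislots[id] > segment_minislots) {
                printf("frame %d does not fit, postponed to the next cycle\n", id);
                break;                  /* simplification: stop the segment */
            }
            printf("frame %d sent, occupying %d mini-slots\n",
                   id, frame_minislots[id]);
            used += frame_minislots[id];
        } else {
            used += 1;                  /* empty mini-slot, counter advances */
        }
        if (used >= segment_minislots) break;
    }
}

int main(void)
{
    bool has_frame[N_IDS]       = { true, false, true, false, true, false, false, false };
    int  frame_minislots[N_IDS] = { 3, 1, 2, 1, 4, 1, 1, 1 };
    dynamic_segment(has_frame, frame_minislots, 10);
    return 0;
}
```

Lower identifiers are served first, which reflects the priority given to low frame IDs in the dynamic segment.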
FIGURE 29.3 Example of a TTCAN basic cycle (a reference message transmitted by the master node is followed by time windows for messages: exclusive windows using TDMA, an arbitration window using standard CAN arbitration, and a free window).
29.3.2.3 TTCAN

TTCAN (time-triggered Controller Area Network) [14] is a communication protocol developed by Robert Bosch GmbH on the basis of the CAN physical and data link layers. Time-triggered communication is built upon the standard CAN protocol, but the controllers must be able to disable automatic retransmission and to provide the application with the time at which the first bit of a frame was sent or received [15]. The bus topology of the network, the characteristics of the transmission support, the frame format, and the maximum data rate (1 Mbit/s) are imposed by the CAN protocol. Channel redundancy is possible, but not standardized, and no bus guardian is implemented in the nodes.

A key idea is to propose, as with FlexRay, a flexible time-triggered/event-triggered protocol. As illustrated in Figure 29.3, TTCAN defines a basic cycle (the equivalent of the FlexRay communication cycle) as the concatenation of one or several time-triggered (or exclusive) windows and one event-triggered (or arbitrating) window. Exclusive windows are devoted to time-triggered transmissions (i.e., periodic messages), while the arbitrating window is ruled by the standard CAN protocol: transmissions are dynamic, and bus access is granted according to the priority of the frames. Several basic cycles that differ in their organization into exclusive and arbitrating windows, and in the messages sent inside the exclusive windows, can be defined. The list of successive basic cycles is called the system matrix, and the matrix is executed in loops. Interestingly, the protocol enables the master node, the node that initiates the basic cycle through the transmission of the reference message, to stop the functioning in TTCAN mode and to resume in standard CAN; later, the master node can switch back to TTCAN mode by sending a reference message.

TTCAN is built on CAN, a well-mastered and cheap technology, but, as defined by the standard, it does not provide important dependability services such as the bus guardian, a membership service, and reliable acknowledgment. It is, of course, possible to implement some of these mechanisms at the application or middleware level, but with reduced efficiency. It seems that carmakers may consider the use of TTCAN for some systems during a transition period until the FlexRay technology is fully mature.
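A TTCAN system matrix can be thought of as a small table that tells each node, for every time window of every basic cycle, whether the window is exclusive (and for which message), arbitrating, or free. The sketch below is only a schematic rendering of that idea; the window layout and message identifiers are invented for the example.

```c
#include <stdio.h>

typedef enum { WIN_EXCLUSIVE, WIN_ARBITRATING, WIN_FREE } window_kind_t;

typedef struct {
    window_kind_t kind;
    int           msg_id;   /* CAN identifier sent in an exclusive window, -1 otherwise */
} window_t;

#define N_WINDOWS 4
#define N_CYCLES  2

/* Example system matrix: two basic cycles of four windows each, every
   cycle starting implicitly with the master's reference message. */
static const window_t system_matrix[N_CYCLES][N_WINDOWS] = {
    { {WIN_EXCLUSIVE, 0x100}, {WIN_EXCLUSIVE, 0x200}, {WIN_ARBITRATING, -1}, {WIN_FREE, -1} },
    { {WIN_EXCLUSIVE, 0x100}, {WIN_EXCLUSIVE, 0x300}, {WIN_ARBITRATING, -1}, {WIN_FREE, -1} },
};

int main(void)
{
    for (int c = 0; c < N_CYCLES; c++) {
        printf("basic cycle %d: reference message", c);
        for (int w = 0; w < N_WINDOWS; w++) {
            const window_t *win = &system_matrix[c][w];
            if (win->kind == WIN_EXCLUSIVE)
                printf(" | exclusive 0x%03X", win->msg_id);
            else if (win->kind == WIN_ARBITRATING)
                printf(" | arbitrating");
            else
                printf(" | free");
        }
        printf("\n");
    }
    return 0;
}
```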
29.3.3 Operating Systems and Middleware Services

In the context of automotive applications, middleware is a software layer located above the platform (hardware, operating system, protocols) that aims to offer high-level services to the application in order to reduce the time to market and improve the overall quality of the system. The main purpose of middleware is to hide the distribution of the functions and the heterogeneity of the platform (ECU, network, CPU, OS, etc.). Another benefit of middleware is the provision of high-level services; for X-by-Wire applications, services related to dependability are needed. Several projects aimed at the development of automotive middleware layers have been undertaken (EAST, www.east-eea.net; AUTOSAR, http://www.autosar.org/). To the best of our knowledge, the only results publicly available have been produced in the context of the OSEK/VDX consortium (detailed information can be obtained at http://www.osek-vdx.org), which is a project of the automotive industry whose objective is to build a standard architecture for in-vehicle control units. Among the results of the
OSEK/VDX group, two are of particular interest for X-by-Wire: the OSEKTime operating system and the fault-tolerant communication layer.

29.3.3.1 OSEKTime

OSEKTime OS (OSEK/VDX time-triggered operating system) [16] is a small operating system designed to offer the basic services needed by time-triggered applications. In particular, OSEKTime OS offers services for task management, interrupt processing, and error handling. An offline-generated dispatcher table, termed a dispatcher round, activates the tasks in a determined order and repeats itself as long as the system is running. Several different dispatcher tables, corresponding, for instance, to different functioning modes of the system, can be defined, but the switch from one table to the next can only take place at the end of a round. Tasks cannot block while waiting for an external event, but they can be preempted: a running task will always be preempted by a task that is activated, and the designer must take resource contention into account in this case. An interesting feature of OSEKTime OS is the deadline monitoring that can be performed for specified tasks: when such a task is still not finished at its deadline, a specific application error-handling routine is invoked and the operating system is reinitialized. In OSEKTime OS, the rate at which interrupts can occur is bounded, at a rate specified when the system is configured, in order to keep the system predictable. As for communication cycles in time-triggered networks, the configuration of the dispatcher round can be done offline through a software tool that will ensure the correctness of the system.

29.3.3.2 FTCom

OSEK/VDX FTCom (fault-tolerant communication) [17] is a proposal for a software layer that provides services facilitating the development of fault-tolerant applications on top of time-triggered networks. One important function of FTCom, with respect to X-by-Wire, is to manage the redundancy of data needed for achieving fault tolerance (see Section 29.3.1.2). From an implementation point of view, it is sometimes preferable to present only one copy of the data to the application in order to simplify the application code and to keep it independent of the level of redundancy (i.e., the number of nodes composing an FTU). In OSEK/VDX terminology, the algorithm responsible for the choice of the value that will be transmitted to the application is termed the agreement algorithm. Many agreement strategies are possible: pick-any (for fail-silent nodes), average value, pick-a-particular-one, majority vote, etc. OSEK FTCom provides a generic way of specifying the agreement strategy for replicated data. Two other important services of FTCom are (1) management of the packing and unpacking of messages [18], which is needed if the use of network bandwidth has to be optimized, and (2) provision of message-filtering mechanisms for passing only significant data to the application.
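The agreement step can be illustrated with two of the strategies named above, pick-any and majority vote, applied to the replicated copies received from one FTU. The code below is a generic sketch of those strategies and is not the OSEK/VDX FTCom API; the types and function names are assumptions made for this example.

```c
#include <stdint.h>
#include <stdbool.h>

/* One received replica of a message from an FTU member. */
typedef struct {
    bool    valid;   /* frame passed CRC/membership checks       */
    int32_t value;   /* application value carried by the frame   */
} replica_t;

/* Pick-any: with fail-silent replicas, any valid copy is correct. */
bool agree_pick_any(const replica_t *r, int n, int32_t *out)
{
    for (int i = 0; i < n; i++)
        if (r[i].valid) { *out = r[i].value; return true; }
    return false;                       /* no valid replica received */
}

/* Majority vote: the value reported by more than half of the replicas
   wins; useful when replicas are not assumed to be fail-silent. */
bool agree_majority(const replica_t *r, int n, int32_t *out)
{
    for (int i = 0; i < n; i++) {
        if (!r[i].valid) continue;
        int count = 0;
        for (int j = 0; j < n; j++)
            if (r[j].valid && r[j].value == r[i].value) count++;
        if (2 * count > n) { *out = r[i].value; return true; }
    }
    return false;                       /* no majority found */
}
```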
29.4 Steer-by-Wire Architecture: A Case Study

A Steer-by-Wire system aims to provide two main services: controlling the wheel direction according to the driver's request and providing a mechanical-like force feedback to the handwheel. In this section, we present the functional point of view of such a system, the real-time and dependability properties that have to be observed, and a realistic operational Steer-by-Wire system used as a reference architecture for evaluation purposes. Finally, we will focus on the real-time requirements, and after proposing a way to model failures, we will show how each component or subsystem of the reference architecture can reach the dependability objective.
29.4.1 Functional Description of a Steer-by-Wire System In a Steer-by-Wire system, two main services have to be provided: the front-axle actuation and the handwheel force feedback. So, from a functional point of view, this implies two main functions that are
not completely independent. However, in the following discussion, in order to simplify their description, we will not take into account the interdependencies between these functions. 29.4.1.1 Front-Axle Control This function computes the orders that are given to the motor of the front axle, mainly according to the state of this front axle and the commands given by the driver through the handwheel. The driver's requests are translated through:
• Handwheel angle
• Handwheel torque
• Handwheel speed
29.4.1.2 Handwheel Force Feedback This function computes the order that will be given to the handwheel motor, in particular according to:
• Speed of the vehicle
• Front-axle position
• Front tie rod force
The elaboration of these orders requires the execution of filtering algorithms and complex control laws under stringent sampling periods of a few milliseconds. The main property to ensure is that the end-to-end response time between a new command from the driver and the effect on the front axle is bounded.
29.4.2 Dependability and Real-Time Properties As stated in Section 29.2.2, a Steer-by-Wire system must comply with safety integrity level 4 (SIL4) [2]. This means that the system shall be able to tolerate a single failure and to ensure that the probability of encountering a safety-critical failure does not exceed 10–9 per hour. An important issue is to formally determine the relation between the distributed system that supports the steering control function and the failure occurrences at the steering system level. Section 29.4.4.5 aims to propose a few solutions to this problem. Besides these dependability constraints, any steering system has to ensure certain performances. Specifically, let us consider the front-axle control function. According to the vehicle technical requirements stated in [19], whatever the system is (i.e., mechanical/hydraulic or by-wire), the maximum angles of the front wheels should be at least ±40˚ (±90˚ for upcoming systems). This leads to a specific performance property imposing the ability to control the steering with a velocity of at least 40˚ per second (90˚). Furthermore, some real-time properties are derived from control requirements. In particular, some control laws implemented in the case study required a sampling period equal to 2 ms (i.e., 500 sampled wheel angle values per second). Each sample was treated through filtering and control algorithms and led to a value that is used by an actuator in order to reach the desired steering position. Obviously, a delay (named end-to-end response time) that cannot be neglected exists between one sampling and its corresponding reaction on the actuator. This delay is mainly due to the execution of the algorithm on the ECU and to the transfer of the sampled value through the communication systems. In order to guarantee vehicle stability, this delay has to be less than a given bound that depends on the type of vehicle as well as on the driving condition (velocity, wheel angle, etc.). It is the carmakers’ responsibility to be able to compute the limit of the tolerated delay for any given situation the vehicle may be in. Notice that the occasional absence of samples or out-of-bound delays at the controller or actuator level, for instance, due to frame loss, does not necessarily lead to vehicle instability, but degrades steering performance (or quality of service). This is because most of the control laws that are used are designed with specific delay and absence of sampling data compensation mechanisms, thus tolerating perturbations under a given threshold. In Section 29.4.4.5.1, we will show an example of how to evaluate such a threshold value using a Matlab/Simulink model of the system.
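As a rough illustration of such a compensation mechanism, the C sketch below counts the consecutive cluster cycles in which the actuator ECU receives no fresh command and falls back on the previous command while the count stays under a tolerated threshold. All names are invented, and the threshold value anticipates the bound derived in Section 29.4.4.5.2.2; a real implementation would derive it from vehicle-level tests and simulation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch only: identifiers are invented; the threshold comes
   from vehicle-level analysis (here, the N = 7 bound of Section 29.4.4.5.2.2). */
#define N_MAX_MISSED 7   /* consecutive cycles without fresh data still tolerated */

typedef struct {
    float    last_command;   /* command applied during the previous cycle  */
    uint32_t missed_cycles;  /* consecutive cycles without a fresh command */
} actuator_state_t;

/* Called once per cluster cycle on the actuator ECU. Returns false when the
   tolerated perturbation threshold is exceeded (vehicle may become unstable). */
bool apply_cycle(actuator_state_t *s, bool fresh_data, float command)
{
    if (fresh_data) {
        s->last_command  = command;
        s->missed_cycles = 0;
    } else {
        /* Degraded mode: reuse (or extrapolate from) the previous command. */
        s->missed_cycles++;
    }
    /* actuate(s->last_command);  placeholder for the motor command output */
    return s->missed_cycles <= N_MAX_MISSED;
}
```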
[Figure 29.4: handwheel sensors as1, as2, and as3 and handwheel motors HW Motor 1 and HW Motor 2 are attached to HW ECU1 and HW ECU2; rotor position sensors rps1, rps2, and rps3 and front-axle motors FAA Motor 1 and FAA Motor 2 are attached to FAA ECU1 and FAA ECU2. The four ECU nodes communicate over a redundant TDMA network; sensors and actuators are connected by point-to-point links. Legend: as = handwheel sensors, rps = rotor position sensors, HW motor = handwheel actuators for force feedback, FAA motor = front-axle actuators, ECU = nodes.]
FIGURE 29.4 Steer-by-Wire operational architecture.
29.4.3 Operational Architecture An operational architecture, which is a solution for the implementation of the functions presented in Section 29.4.1, is described in this section. Figure 29.4 illustrates the hardware architecture on which the operational one is based. It includes four electronic control units (microcontrollers): HW ECU1 (handwheel ECU1), HW ECU2 (handwheel ECU2), FAA ECU1 (front-axle actuator ECU1), and FAA ECU2 (front-axle actuator ECU2). Each node is connected to the two TDMA-based communication channels (BUS1 and BUS2). Three sensors — as1, as2, and as3 — placed near the handwheel measure the requests of the driver in a similar way, the latter being translated into a 3-tuple: (handwheel angle, handwheel torque, handwheel speed). Three other sensors — rps1, rps2, and rps3 — are dedicated to the measurement of the front-axle position. Finally, two motors (FAA Motor1 and FAA Motor2), configured in active redundancy, act on the front axle, while two other motors (HW Motor1 and HW Motor2) realize the force feedback control on the handwheel. Sensors as1, as2, and as3 (sensors rps1, rps2, and rps3, respectively) are connected by point-to-point links to both HW ECU1 and HW ECU2 (FAA ECU1 and FAA ECU2, respectively). 29.4.3.1 Implementation of the Front-Axle Control Function The requests of the driver are measured by the three replicated sensors as1, as2, and as3 and sent to both HW ECU1 and HW ECU2. Each ECU performs a majority vote on the three received values and transmits secure data on both communication channels BUS1 and BUS2. The two ECUs, FAA ECU1 and FAA
ECU2, placed behind the front axle, consume these data, as well as the last wheel position, in order to elaborate the commands that are to be applied to FAA Motor1 and FAA Motor2. 29.4.3.2 Implementation of the Force Feedback Control Function In a way similar to the previous function, measurements taken by rps1, rps2, and rps3 are transmitted to both FAA ECU1 and FAA ECU2. Each of these ECUs elaborates information transmitted on the network. The consumers of this information are both HW ECU1 and HW ECU2, which compute the command transmitted to HW Motor1 and HW Motor2. The replication of algorithms on several ECUs, of measurements from several similar sensors, and of information transmission over redundant buses is used extensively in this operational architecture. The choices made in terms of redundancy and diversification (see Section 29.4.4.3) are constrained by dependability, cost, and dimension requirements. Alternative Steer-by-Wire architectures presented in the literature (e.g., [19]) use, in addition, two central ECUs located between the handwheel and the front axle, but from an economic point of view, it is of course preferable to use only four ECUs if dependability criteria are met.
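Because the three handwheel sensors deliver analog readings that are never exactly equal, one simple way to vote over them is to select the median value rather than to require strict equality. The C sketch below is a minimal illustration of that idea with invented names; it is not taken from the case-study software, which may use a different voting rule.

```c
/* Median-of-three voter: tolerates one arbitrarily wrong sensor reading,
   since the median always lies between two correct values. */
float vote_median3(float a, float b, float c)
{
    if (a > b) { float t = a; a = b; b = t; }   /* ensure a <= b          */
    if (b > c) { b = c; }                       /* b = min(max(a,b), c)   */
    return (a > b) ? a : b;                     /* median of the three    */
}
```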
29.4.4 Dependability Issues In this section, after recalling which failures are taken into account, we justify choices that are made for the specification of the hardware architecture. 29.4.4.1 Failure Model The terms fault, error, and failure are currently used in system engineering. In [20], these terms are clearly defined. Let us consider that a system has to deliver a service. A system failure is an event that occurs when the delivered service deviates from the expected one. An error is the part of the state of the system that may cause a subsequent failure, and a fault is the adjudged cause of an error. A fault is active when it produces an error; otherwise, it is dormant. Note that if we consider that a system is composed of components, we can observe possible causal relations between the failure of one or several components and the failure of the system (Figure 29.5). Usually, two classes of faults can be distinguished according to their effects inside a system: Byzantine faults and coherent faults. These faults are caused by a failure of one or several components of the system. A Byzantine fault is a fault whose effect can be perceived differently by different observers. The effect of a coherent fault is seen the same by all observers. Moreover, the property of fail-silence, assumed for some components, leads to a third class of fault. A component is said to be fail-silent if, each time it is not silent, we can conclude that it is functioning properly. Note that at a system level, the silence of this class of component is seen as a fault when it occurs. A second classification of faults relies on the duration of a fault. In this case, we consider two types of faults according to their effect on the whole system. In our context, a transient fault is a fault whose duration is such that the system does not reach a “not safe state.” A permanent fault is a nontransient one. The main issue is to evaluate the delay after which a transient fault becomes a permanent one. In Section 29.4.4.5.1, we present a method for the evaluation of the worst tolerable delay. Following the approach proposed in [21], we define the system dependability requirement by a triplet FM(b, c, f), named flexible failure model, where b is the maximum number of Byzantine failing sources, c the maximum number of coherent failing sources, and f the maximum number of fail-silent sources that the system must be able to tolerate. In this case study, we consider a failure model defined by FM(1, 1, 1): the system must always tolerate, at a given time, one Byzantine fault or one coherent fault or a fault due to one fail-silence of a component.
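The failure model can also be carried around explicitly in design tools and used to size redundancy. The C sketch below encodes FM(b, c, f) and the classical lower bounds recalled in the next section (3b + 1 redundant components to tolerate b Byzantine faults, 2c + 1 for c coherent faults, and f + 1 when the components are assumed fail-silent). The function name and the simplifications are ours; this is a rough design aid under the assumptions stated in the text, not part of the architecture.

```c
#include <stdbool.h>

/* Flexible failure model FM(b, c, f) of [21]: maximum number of Byzantine,
   coherent, and fail-silent failing sources to be tolerated. */
typedef struct { int b; int c; int f; } failure_model_t;

/* Minimum number of redundant ECUs, following the rules recalled in
   Section 29.4.4.2. If the ECUs are assumed fail-silent, Byzantine and
   coherent ECU faults are excluded by construction and f + 1 ECUs suffice. */
int min_ecus(failure_model_t fm, bool assume_fail_silent)
{
    if (assume_fail_silent)
        return fm.f + 1;               /* e.g., two handwheel ECUs in the case study */
    int n = 3 * fm.b + 1;              /* Byzantine bound (Lamport et al. [22])      */
    if (2 * fm.c + 1 > n)
        n = 2 * fm.c + 1;              /* coherent-fault bound                       */
    return n;
}

/* Example: FM(1, 1, 1) with non-fail-silent ECUs yields 4, the aeronautic
   solution mentioned in Section 29.4.4.2.1; with fail-silent ECUs it yields 2. */
```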
[Figure 29.5: a fault activates an error inside a component, which leads to a failure of the component; at the system level this component failure acts as a fault, producing an error and then a failure of the system.]
FIGURE 29.5 Failure propagation.
29.4.4.2 Operational Architecture vs. Dependability Requirements In this section, we give some rules that we applied for designing the architecture. The system under study has to provide two main services: control of the front axle according to the driver requests and furnishing of a force feedback to the driver. We focus on the former. A similar rationale can be used for the second one. In both cases, the main question is the evaluation of the minimum number of redundant components that contribute to meet the dependability requirement. Dependability analyses are generally based on a strong hypothesis assuming that, in the whole system, more than n simultaneous component failures can never occur for any set of redundant components (set of redundant handwheel sensors, set of redundant handwheel ECUs, etc.). In this case study, we suppose that n is equal to 1. Note that in [22], Lamport et al. state that 3n + 1 redundant components are necessary to tolerate n Byzantine faults. In order to tolerate n coherent faults, it is sufficient to have 2n + 1 redundant components. 29.4.4.2.1 ECU Redundancy Two functions need to be implemented in ECUs: front-axle control and force feedback control. To avoid costly and numerous wires, ECUs have to be placed close to the sensors, and communication between ECUs has to be multiplexed. A handwheel ECU (a front-axle ECU, respectively) is a consumer of information sampled from the handwheel (front axle, respectively) and is a producer of information used by the front-axle control function (force feedback control function, respectively) implemented in a front-axle ECU (handwheel ECU, respectively). According to the rule given by Lamport, the minimum number of redundant handwheel ECUs (front-axle ECUs, respectively) should be four. This solution is mainly used in the aeronautic domain. But automotive requirements are completely different in terms of cost and space. Therefore, a classical solution is to use fail-silent ECUs. In this case, obviously, only two handwheel ECUs (front-axle ECUs, respectively) are necessary. However, we have to ensure the fail-silence property. To do this, several techniques based on Petri net analysis [23], C model simulation [24], or fault injection [25] are used. 29.4.4.2.2 Handwheel Sensor Redundancy A handwheel sensor produces information for two handwheel ECUs. Three handwheel sensors are necessary for ensuring that each handwheel ECU, assumed to provide a voting algorithm, is able to tolerate one Byzantine fault (and consequently one coherent fault or one fail-silent sensor). 29.4.4.2.3 Actuator Redundancy Actuators are mechanical components without any calculating ability, and a single actuator can take charge of piloting the front axle. Furthermore, according to the inherent reliability properties of these actuators, we guarantee that an actuator can never wrongly apply an order received by a front-axle ECU. Under these assumptions and taking into account the formerly stated fail-silence property of a front-axle ECU, only two ECU/actuator couples are necessary for the tolerance of, at most, one fault. 29.4.4.3 Redundancy and Diversification According to the dependability requirements, presented in Section 29.4.2, and to the assumption made on the ECUs (fail-silence in our case), and according to the failure occurrence model (Section 29.3.1.3), a certain level of redundancy has to be implemented. If the chosen fault tolerance strategy is failure recovery, redundant ECUs will work only in the case of the primary ECU failing.
In this case, failure detection must be quick and reliable. Otherwise, if the strategy is failure compensation, as with the frontaxle motors, redundant ECUs will be placed in parallel and work simultaneously. Because of the stringent real-time constraint, our architecture must provide failure compensation. It is worth noting that the redundancy of identical ECUs does not prevent the architecture from common mode failures: the hardware of redundant ECUs should be furnished by different suppliers and their software realized by different teams. The Ariane 501 explosion is a good example to show the importance of diversification. Both backup and active inertial reference systems failed for the same reason [26]. If software and hardware had been diversified, one of the two inertial reference systems would have
remained safe. But for cost and maintenance reasons, it is not always possible to implement diversified components and technologies. 29.4.4.4 Configuration of the Communication Protocol Communication is driven by a TDMA-based protocol with two replicated channels. More precisely, the network that is used in this case study is TTP/C because of the availability of the protocol specification and the components. However, the same analysis is valid for any time-triggered protocol such as TTCAN and FlexRay. For reliability reasons, the same frame is transmitted on the two replicated channels. In order to avoid common mode failures (EMI, temperature, etc.), channels should be placed as far as possible from each other in the vehicle. 29.4.4.4.1 Slot Allocation Strategy to Maximize the Robustness of the Transmission In TTP/C, the transmission order inside a round can be freely chosen by the application designer. Among the criteria for constructing the TDMA round, applicative constraints like computation time and sampling rates can be taken into account. But as shown in [27, 28], the robustness of a TDMA-based system against transmission errors heavily depends on the location of the slots inside the round. In automotive systems, one observes that transmission errors are highly correlated: there occur perturbations that corrupt several consecutive frames (so-called bursts of errors). Should two frames that belong to the same FTU (see Section 29.3.1.2) be transmitted just one after the other, then a single perturbation could corrupt both frames. The objective to pursue depends on the status of the FTU with regard to the concept of fail-silence (see Section 29.3.1.3). For FTUs composed of a set of fail-silent nodes, the successful transmission of one single frame for the whole set of replicas is sufficient since the value carried by the frame is necessarily correct. In this case, the objective to achieve with regard to the robustness against transmission errors is the minimizing of the probability that all frames of the FTU (carrying data corresponding to the same production cycle) be corrupted. This probability is denoted P_all in [27]. In practice, replicated sensors may return slightly different observations, and without extra communication for an agreement, replicated nodes of a same FTU may transmit different data. If a decision, such as a majority vote, has to be taken by a node with regard to the value of the transmitted data, the objective is to maximize probability that at least one frame of each FTU is successfully transmitted during a production cycle. If the production cycle is equal to one round, then it comes back to minimizing P_one, the probability that one or more frames of an FTU have become corrupted. It has been shown in [27], with some reasonable assumptions on the error model, that the optimal solution to minimize P_all is to “spread” the different frames of a same FTU uniformly over the TDMA round. An algorithm that ensures the optimal solution is provided for the case where the FTUs have, at most, two different cardinalities (for instance, one FTU is made of two replicas and other FTUs are made of three replicas). For the other cases, a low-complexity heuristic is proposed [27], and it was proven to be close to the optimal on simulations that were performed. 
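A minimal C sketch of this spreading guideline is given below: it places the k replicated frames of a single FTU evenly over a round of s slots, ignoring collisions with the slots of other FTUs. This is only an illustration of the idea; the exact optimal algorithm and the heuristic for the general case are those of [27].

```c
/* Evenly spread the k replicated frames of one FTU over a TDMA round of
   s slots (collisions with other FTUs are ignored in this sketch). */
void spread_ftu(int first_slot, int k, int s, int slot_of_replica[])
{
    for (int j = 0; j < k; j++)
        slot_of_replica[j] = (first_slot + (j * s) / k) % s;
}

/* Example: with k = 2 replicas and s = 4 slots starting at slot 0, the
   replicas land in slots 0 and 2, i.e., half a round apart, which matches
   the placement of the handwheel ECU frames in Figure 29.6. */
```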
In [28], it was demonstrated that under very weak assumptions on the error model, and whatever the number of FTUs and their cardinalities, the clustering together of the transmission of all the frames of an FTU minimizes P_one when the production cycle of the data sent is equal to the length of a TDMA round. These two results, for the fail-silent case and non-fail-silent case, provide simple guidelines to the application designer for designing the schedule of transmission. In our case study, since all ECUs are fail-silent, our requirement is to minimize the probability of losing all replicas in the TDMA round, and thus the redundant frames have to be spread over time. 29.4.4.4.2 Allocation of the Slots in the Round Let us consider the architecture shown in Figure 29.4 with the following characteristics for the production of data:
[Figure 29.6: one TDMA round (2 ms) per cluster cycle (2 ms), composed of four slots of 0.5 ms each in the order HW ECU1, FAA ECU1, HW ECU2, FAA ECU2, repeated cyclically.]
FIGURE 29.6 Placement of the slots in the cluster cycle.
HW ECU1/HW ECU2 — production of two pieces of data packed in a single frame:
• HW angle every 2 ms
• HW torque every 4 ms
FAA ECU1/FAA ECU2 — production of two pieces of data packed in a single frame:
• FAA position every 2 ms
• Tie rod force every 4 ms
The size of the TDMA round is set to the minimal production period, i.e., 2 ms. Since the frames are composed of the same information, whatever the round, the size of the cluster cycle is equal to one round, which is possible with the latest version of the specification [7]. According to all these considerations, the location of the slots inside the round is shown in Figure 29.6. The slot duration is set equal for every slot. This is not a constraint imposed by TTP/C; in this case, the choice is justified by application-level constraints (deadlines on tasks, etc.). 29.4.4.5 Evaluation of the Behavioral Reliability of the Architecture For a given mean time to failure (MTTF) of the components (sensors, computers, network links, actuators) and their redundancy, the use of classic reliability analysis methods, for instance, fault tree analysis (FTA) and failure mode and effect analysis (FMEA), can provide an estimation of the reliability of a Steer-by-Wire architecture (see, for instance, [29]). However, neither transient faults nor real-time performances are taken into account in these kinds of evaluation, and thus these evaluations are clearly not sufficient in our context. Under normal conditions, the use of a time-triggered scheduling of tasks and messages allows the reception of the sensor data at regular intervals, thus providing bounded end-to-end delays. However, random environmental perturbations (e.g., EMI) could make the communication system unavailable during some periods. For example, consecutive transmission errors can create a period during which the controller or the actuators do not receive any sensor data. The concept of behavioral reliability, defined in [30], is used for determining the probability with which the Steer-by-Wire system violates the end-to-end delay constraints for 1 h under stochastic perturbations. Our objective is to ensure this probability will be less than 10^-9, which is more stringent than the SIL4 requirement (see Section 29.2.2). The end-to-end delay for the front-axle control function is composed of the so-called pure delay (delay induced by the system before the driver's command is given to the actuators) and the mechatronic delay introduced by the actuators (electric motors in our case). The mechatronic delay can be bounded by a constant Tmec. In what follows, we will only focus on the analysis of the pure delay, which is denoted by Tp. Systems that are not able to ensure a pure delay Tp lower than a tolerable upper bound Tmax are considered to be unstable. The value of Tmax can be estimated by tests in vehicles and simulations. The behavioral reliability is estimated by the probability that the pure delay is greater than the maximum tolerable bound: PBR = P[Tp > Tmax]. When Tp is equal to or lower than Tmax, only the quality of service is degraded, while the vehicle is considered to be potentially unstable when Tp > Tmax. 29.4.4.5.1 Evaluation of Tmax To illustrate the method, only the function “turning the wheels according to driver’s request” is considered. The evaluation of Tmax can be performed either by testing in a vehicle or by using Matlab/Simulink. The
TABLE 29.1 QoS Score vs. Perturbation Time

Configuration of the Steering System    Tp (ms)    Score S
Mechanical steering system                 0        11.23
Steer-by-Wire                              3.6      11.21
Steer-by-Wire                              5.6      11.19
Steer-by-Wire                              9.6      11.15
Steer-by-Wire                             11.6      11.13
Steer-by-Wire                             13.6      11.10
Steer-by-Wire                             15.6      11.05
Steer-by-Wire                             17.6      11
Steer-by-Wire                             19.6      10.90
Steer-by-Wire                             29.6      10.45
method we adopted was using Matlab/Simulink first and confirming the results with testing in vehicles. The software framework used was composed of a Matlab/Simulink model of the Steer-by-Wire architecture and a vehicle environment model. The architecture presented in Section 29.4.3 was simulated according to a handwheel angle utilization profile (positions of the handwheel over time). The impact of the variation of Tp on performance, mainly the stability of the vehicle and the time needed to reach the requested wheel angle, was evaluated and translated in terms of the quality-of-service score, denoted by S. Table 29.1 shows an example of the relation between the score S given to the system and the perturbation time. It corresponds to an instantaneous rotation of the handwheel from 0 to 45˚ at 100 km/h. From Table 29.1, with a minimum tolerable score of 11, one sees that 17.6 ms is the critical limit (figures in italic) for this perturbation time; beyond this limit, the vehicle becomes unstable and the safety of the driver can be at risk. The different values of Tp in Table 29.1 correspond to the cases where, during 1, 2, 4, 5, 6, 7, 8, 9, and 14 consecutive cluster cycles, the front-axle actuators receive nothing (caused, for instance, by environmental perturbations). In practice, even without receiving any sampling data, the actuator still performs, but the turning of the front axle is made, for instance, on the basis of the command of the previous period, or with an estimation based on the commands of several previous periods. So, in this case, Tp + Tmec is no longer the end-to-end delay strictly speaking, but the delay during which the system has not been able to take into account the current handwheel angle in order to compute the commands for the actuators. 29.4.4.5.2 Quantification of the Behavioral Reliability With the use of the TTP/C network, communication cycles are predefined and cyclic. However, as will be shown by the analysis of the example in Figure 29.7, Tp can be greater than a cluster cycle because of a possible desynchronization between the sampling period and the cluster cycle. The evaluation of the behavioral reliability should be based on the worst-case Tp but not the nominal Tp because of the safetycritical nature of the system. Moreover, with transient failures due to perturbations, Tp becomes a random variable. Therefore, in this section, we first evaluate Tp and then the behavioral reliability. 29.4.4.5.2.1 Worst-Case Pure Delay without Transient Failures — Figure 29.7 shows the temporal characteristics of the front-axle control function and the relationship with the cluster cycles. The fault-tolerant communication layer (Section 29.3.3.2) has been configured so that the front-axle ECUs (FAA ECU1 and FAA ECU2) have to wait for all replicas of data before consuming them. Cluster cycles are numbered by index i = 1, 2, … The ith cluster cycle starts at ti. For the front-axle control function, the worst-case pure delay, denoted by TpWC, appears when the ith handwheel sensor sampling period starts just after ti. This slight desynchronization leads to a situation where, during the ith cluster cycle, only the data elaborated using the sample of the (i – 1)th cluster cycle are transmitted to the front-axle ECUs. In fact, in TTP/C, data kept in the buffer of each HW ECU are transmitted at the beginning of each slot. In Figure 29.7, at the beginning of each HWA ECU slot, data in the buffer have
not yet been refreshed and only data corresponding to the previous sample are transmitted.

[Figure 29.7: timeline of one cluster cycle (TMA) showing handwheel sampling, treatment of the sample in HW ECU1/HW ECU2, transmission of the corresponding data in the TDMA round, treatment of the data in FAA ECU1/FAA ECU2, and the actuator action, together with the resulting pure delay (including TNET and TT) and the mechatronic delay.]
FIGURE 29.7 Temporal characteristics of the function “turning the wheels according to the driver’s request.”
So, the worst-case delay between the ith HWA sample and the beginning of the ith actuation is given by

TpWC = TMA + TNET + TT    (29.1)
where TMA is the duration of a cluster cycle, TNET corresponds to the delay between the beginning of a cluster cycle and the arrival of all the replicas to the FAA ECU2, and TT is the treatment time of the data within an FAA ECU. Although in our case we have only one TDMA round per cluster cycle, in general one can find several TDMA rounds per cluster cycle. In this latter case, assuming that there is one computation per data reception, Equation 29.1 takes the form

TpWC = nTr + TNET + TT    (29.2)
where Tr is the duration of a TDMA round and n is the number of TDMA rounds between two HWA data emissions (n could be less than or equal to the number of TDMA rounds in a cluster cycle). This result should be used for system dimensioning at the design step in order to ensure that TpWC is smaller than the tolerable upper bound Tmax (e.g., 17.6 ms for the example presented in Section 29.4.4.5.1). For the slot placement given in Figure 29.7, we obtain
TpWC = 1 × 2 + 1.4 + 0.2 = 3.6 ms

This is to say that in failure-free conditions, the pure delay is bounded by 3.6 ms. 29.4.4.5.2.2 Pure Delay under Transient Failures and Behavioral Reliability Evaluation — When perturbations occur, the pure delay can be longer than TpWC, but how the perturbations will influence the pure delay depends on the failure occurrence model. Establishing a realistic failure model according to perturbation occurrences is a complex statistical work that is beyond the scope of this chapter. A more realistic failure model of EMI perturbations is proposed in [31]. In what follows, to illustrate the evaluation method of behavioral reliability (PBR), we use a simplified failure model, stationary in time and where failures are independent from each other. The granularity of the failure model is the cluster cycle (one failure leads to one erroneous or one empty cluster cycle). A failure can happen at any step detailed in Figure 29.7. Whichever step has a failure, we consider that no information is transmitted to the actuators during this cluster cycle (the information is lost or destroyed before the command is given to the actuators). In the worst case, each time a failure occurs, the actuator has to wait TpWC plus one cluster cycle TMA to receive refreshed information. Let TpWC,ERR denote the maximum delay with N consecutive erroneous cluster cycles:

TpWC,ERR = N·TMA + TpWC = N·nTr + TpWC    (29.3)
Behavioral reliability is then calculated by PBR = P[TpWC,ERR > Tmax] = P[N(nTr) + TpWC > Tmax]. So, we have:

PBR = P[N > [(Tmax - TpWC)/(nTr)]]    (29.4)
This probability can be directly used to determine the SILs. In this case study, the requirement is that PBR < 10^-9. In our simplified failure model, as the failures occur following a stationary and uniform probability distribution, PBR also gives the probability of failures per hour. For the studied architecture, the maximum tolerable number of erroneous cluster cycles is given by

N = [(Tmax - TpWC)/(nTr)] = [(17.6 - 3.6)/2] = 7

So, we must here have PBR = P[N > 7] < 10^-9 (failure/hour). As explained before, the chosen error model is a simplified one: erroneous or empty communication cycles are assumed independent events and the probability of losing one cluster cycle is ER. So, the probability of losing x consecutive cluster cycles before one successful transmission is P[N = x] = (1 - ER)(ER)^x (geometric law):

PBR = P[N > x] = 1 - Σ_{k=0}^{x} P[N = k] = (ER)^{x+1}    (29.5)
The proposed operational architecture meets the dependability requirements with ER < 0.075 under a geometrical failure model. In practice, it is necessary to use a more realistic error model, constructed on the basis of measurements taken from a prototype. Indeed, the effects of transient failures and external perturbations, such as EMI or temperature peaks, are not negligible and will become even more problematic when the 42-V technology [32] is used.
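The whole dimensioning argument of this section reduces to a few lines of arithmetic. The C program below replays the case-study numbers (Tmax = 17.6 ms, TpWC = 3.6 ms, nTr = 2 ms) under the simplified geometric failure model and recovers both the bound N = 7 and the requirement ER < 0.075; it is a design-time verification aid, not embedded code.

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double t_max  = 17.6;  /* maximum tolerable pure delay [ms] (Table 29.1)  */
    const double tp_wc  = 3.6;   /* worst-case pure delay without failures [ms]     */
    const double n_tr   = 2.0;   /* n * Tr: time between two HWA data emissions [ms]*/
    const double target = 1e-9;  /* required P_BR per hour (SIL4-derived bound)     */

    /* Maximum tolerable number of consecutive erroneous cluster cycles (Eq. 29.4). */
    int n_max = (int)floor((t_max - tp_wc) / n_tr);              /* = 7             */

    /* Under the geometric model, P_BR = E_R^(n_max + 1) (Eq. 29.5), so the
       per-cycle loss probability must satisfy E_R < target^(1/(n_max + 1)). */
    double er_max = pow(target, 1.0 / (n_max + 1));              /* ~ 0.075         */

    printf("N_max = %d, required E_R < %.3f\n", n_max, er_max);
    return 0;
}
```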
29.5 Conclusion X-by-Wire is a clear trend in automotive development due to the advantages of electronic components for enhancing safety and functionality and for reducing cost. In this chapter, after having examined the real-time and dependability constraints of X-by-Wire systems, we reviewed the fault-tolerant services and the communication protocols (TTP/C, FlexRay, and TTCAN) that are needed for such systems. Methods for designing a dependable X-by-Wire system were described, and a Steer-by-Wire system based on TTP/C was then used as a case study. We showed how to build a fault-tolerant architecture by choosing the necessary redundant components and the schedule of transmission. A method for evaluating the probability that the real-time constraints would be violated under a simple perturbation model was also proposed. This method can be used to predict whether the architecture meets the SIL4 requirement. Even if the dependability of an X-by-Wire system can be evaluated, assuming that one can establish a realistic failure model, the certification organizations still remain to be convinced. At the time of writing, the legislation in some countries does not authorize fully X-by-Wire cars to circulate. The use of X-by-Wire systems in mass-production cars in the future also depends on other factors such as advances in the 42-V technology.
References [1] Society of Automotive Engineers (SAE) public discussion: 42 Volt Electrical Systems and Fuel Cells: Harmonious Marriage or Incompatible Partners? SAE (N. Traub), General Motors (Ch. BorroniBird, Director of Design and Technology Fusion), Delphi (J. Botti, Innovation Center), DaimlerChrysler (T. Moore, Vice President, Liberty and Technical Affairs), UTC Fuels Cells (F.R. Preli, Vice President, Engineering), SAE 2003 World Congress and Exhibition, Detroit, 2003. [2] IEC 61508-1, Functional Safety of Electrical Electronic Programmable Electronic Safety-Related Systems: Part 1: General Requirements, IEC/SC 65A, 1998. [3] J. Rushby, A Comparison of Bus Architectures for Safety-Critical Embedded Systems, Technical Report, Computer Science Laboratory SRI International, 2003. [4] E. Dilger, T. Führer, B. Müller, S. Poledna, The X-by-Wire Concept: Time-Triggered Information Exchange and Fail Silence Support by New System Services, Technische Universität Wien, Institut für Technische Informatik, no. 7/1998; also available as SAE Technical Paper 98055, 1998. [5] C. Temple, Avoiding the babbling-idiot failure in a time-triggered communication system, in International Symposium on Fault-Tolerant Computing (FTCS), Munich, Germany, 1998. [6] S. Poledna, P. Barrett, A. Burns, A. Wellings, Replica determinism and flexible scheduling in hard real-time dependable systems, IEEE Transactions on Computers, 49, 100–111, 2000. [7] Time-Triggered Protocol TTP/C, High-Level Specification Document, Protocol Version 1.1, 2003. [8] H. Kopetz, Real-Time Systems: Design Principles for Distributed Embedded Applications, Kluwer Academic Publishers, Dordrecht, 1997. [9] H. Pfeifer, Formal verification of the TTP group membership algorithm, in FORTE/PSTV Euroconference, Pisa, Italy, 2000. [10] H. Kopetz, R. Nossal, R. Hexel, A. Krüger, D. Millinger, R. Pallierer, C. Temple, M. Krug, Mode handling in the Time-Triggered Architecture, Control Engineering Practice, 6, 61–66, 1998. [11] TTTech Computertechnik AG, http://www.tttech.com/, 2004. [12] G. Bauer, M. Paulitsch, An investigation of membership and clique avoidance in TTP/C, in 19th IEEE Symposium on Reliable Distributed Systems, Nuremberg, Germany, 2000. [13] FlexRay Consortium, http://www.flexray.com, 2004. [14] ISO 11898-4, Road Vehicles: Controller Area Network (CAN): Part 4: Time Triggered Communication. [15] Bosch, Time Triggered Communication on CAN, http://www.can.bosch.com/content/ TT_CAN.html, 2004.
[16] OSEK Consortium, OSEK/VDX Time-Triggered Operating System, Version 1.0, available at http: //www.osek-vdx.org/, 2001. [17] OSEK Consortium, OSEK/VDX Fault-Tolerant Communication, Version 1.0, available at http:// www.osek-vdx.org/, 2001. [18] N. Tracey, Comparing OSEK and OSEKTime, in Embedded System Conference (ESC) Europe, Stuttgart, Germany, 2001. [19] X-by-Wire Project, Brite-EuRam 111 Program, X-By-Wire: Safety Related Fault Tolerant Systems in Vehicles, Final Report, 1998. [20] A. Avizienis, J.-C. Laprie, B. Randell, Fundamental concepts of dependability, in 3rd Information Survivability Workshop, Boston, MA, pp. 7–12, 2000. [21] J.A. Garay, K.J. Perry, A continuum of failure models for distributed computing, in 6th Distributed Algorithm International Workshop (WDAG), Haifa, Israel, 1992. [22] L. Lamport, R. Shostak, M. Pease, The Byzantine Generals Problem, ACM Transactions on Programming Language and Systems, 4, 382–401, 1982. [23] G. Grünsteidl, H. Kantz, H. Kopetz, Communication reliability in distributed real-time systems, in 10th IFAC Workshop on Distributed Computer Control Systems, Semmering, Austria, 1991. [24] P. Herout, S. Racek, J. Hlavicka, Model-based dependability evaluation method for TTP/C based systems, in EDCC-4: Fourth European Dependable Computing Conference, Toulouse, France, 2002. [25] R. Hexel, FITS: a fault injection architecture for time-triggered systems, in 26th Australian Computer Science Conference (ACSC2003), Adelaide, Australia, 2003. [26] Report by the Inquiry Board, Ariane 501 Flight Failure, available at http://www.mssl.ucl.ac.uk/ www_plasma/missions/cluster/about_cluster/cluster1/ariane5rep.html, 1996. [27] B. Gaujal, N. Navet, Maximizing the Robustness of TDMA Networks with Application to TTP/C, Technical Report RR-4614, INRIA, 2002. [28] B. Gaujal, N. Navet, Optimal replica allocation for TTP/C based systems, in 5th FeT IFAC Conference (FeT 2003), Aveiro, Portugal, July 2003. [29] R. Hammett, P. Babcock, Achieving 10-9 dependability with drive-by-wire systems, in SAE 2003 World Congress and Exhibition, Detroit, MI, 2003. [30] C. Wilwert, Y.Q. Song, F. Simonot-Lion, T. Clément, Evaluating quality of service and behavioral reliability of steer-by-wire systems, in 9th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Lisbon, Portugal, 2003. [31] N. Navet, Y.Q. Song, F. Simonot, Worst-case deadline failure probability in real-time applications distributed over CAN (Controller Area Network), Journal of Systems Architecture, 46, 607–617, 2000. [32] H. Kopetz, H. Kantz, G. Gründsteidl, P. Puschner, J. Reisinger, Tolerating transient fault in MARS, in 20th Symposium of Fault Tolerant Computing, Newcastle upon Tyne, U.K., 1990.
30 FlexRay Communication Technology

Dietmar Millinger
DECOMSYS (Dependable Computer Systems)

Roman Nossal
DECOMSYS (Dependable Computer Systems)

30.1 Introduction
30.2 Automotive Requirements
Cutting Costs • Future Proof
30.3 What Is FlexRay?
Media Access • Clock Synchronization • Start-Up • Coding and Physical Layer • Bus Guardian • Protocol Services • FlexRay Current State
30.4 System Configuration
Development Models
30.5 Standard Software Components
Standardized Interfaces
References
30.1 Introduction New electronic technologies have dramatically changed cars and the way we experience driving. Antiblocking system (ABS), electronic stability program (ESP), air bags, and many more applications have made cars a lot more convenient, comfortable, and — above all — safer. This trend of the past decade has been rather pleasant for the consumer — and a tedious task for the automotive industry. The reasons for this drawback lie not only in the need for higher integration of the involved technologies. The very nature of the deployed communication technologies makes the task of integration itself a lot more complex, as well as the design of fault-tolerant systems on top of these communication technologies rather difficult. These limitations, on the one hand, as well as requirements and anticipated challenges of future automotive applications, on the other hand, motivated OEMs and suppliers to join forces. The goal of the FlexRay consortium [1], founded in 2001, is to establish one standard for a high-performance communication technology in the automotive industry.
30.2 Automotive Requirements Since OEMs and suppliers were the founding fathers of the FlexRay consortium, it was clear from the very beginning of the work on the new de facto communication standard that FlexRay would have to meet the requirements of the automotive industry. Therefore, two key issues have driven the development work for the communication protocol: the need for a technological basis and solution for future safety-related applications and the need to keep costs down.
30.2.1 Cutting Costs The cost factor is a key driver for many requirements for the communication system, as the push for systematic reuse of existing components in multiple car platforms proves. Due to this approach, subsets of components related to a specific function can be reused in multiple platforms without changes inside the components. This elegant and cost-saving solution, however, is only possible if the communication system offers two decisive qualities: 1. It must be standardized and provide a stable interface to the components. 2. It has to provide a deterministic communication service to the components. This communication determinism is the solution for the problem of interdependencies between components, which is a major problem and cost factor in today’s automotive distributed systems. Since any change in one component can change the behavior of the entire system, integration and testing are of utmost importance, and therefore extremely difficult and expensive, in order to ensure the needed system reliability. A deterministic communication system significantly reduces this integration and test effort because it guarantees that the cross-influence is completely under control of the application and not introduced by the communication system. 30.2.1.1 Migration A new technology such as FlexRay does not make all predecessors obsolete at once. It rather replaces the traditional systems gradually and builds on proven solutions. Therefore, existing components and applications have to be migrated into new systems. In order to make this migration path as smooth and efficient as possible, FlexRay has integrated some key qualities of existing communication technologies, e.g., dynamic communication. 30.2.1.2 Scalability Communication determinism and reuse are also key enablers for scalability, which obviously is yet another cost-driven requirement. Scalability, however, calls not only for communication determinism and reuse, but also for the support of multiple network topologies and network architectures, as well as the applicability of the communication technology in different application domains like power train, chassis control, backbone architectures, or driver assistance systems.
30.2.2 Future Proof Keeping costs down is only one side of the coin. The automotive industry has visions of the future car and applications. The most obvious developments are active safety functions like electronic braking systems, driver convenience functions like active front steering, and the fast-growing domain of driver assistance systems like active cruise control or the lane departure warning function. These automotive applications demand a high level of reliability and safety from the network infrastructure in the car in order to provide the required level of safety at the system level. Therefore, the communication technology has to meet requirements such as redundant communication channels, a deterministic media access scheme, high robustness in case of transient faults, a distributed fault-tolerant agreement about the protocol state, and extensive error detection and reporting toward the application. The most stringent particular requirement arises from the deterministic media access scheme. In time-division multiple-access (TDMA) schemes for networks, all participating communication partners need a common understanding of the time used in order to control access to the communication medium. Typically, a fault-tolerant distributed mechanism for clock synchronization is required. Additionally, the safety requirement introduces the need to protect individual communication partners from faults of other partners by means of guardians. Otherwise, errors of one partner could cross-influence other partners, thus violating safety demands. Specific automotive issues complete the broad range of requirements forming the framework for and of FlexRay. These issues include the use of automotive components like crystals, automotive electromagnetic
compatibility (EMC) requirements, support for power management to conserve battery power, support for electrical and optical physical layers, and a high-bandwidth demand of at least 2 × 10 Mbit/s.

[Figure 30.1: a FlexRay network of six nodes (A through F) attached to two communication channels, Channel 0 and Channel 1.]
FIGURE 30.1 FlexRay network.

[Figure 30.2: ECU building blocks: host, host interface, communication controller (CC), bus guardian (BG), bus driver (BD), and power supply, attached to Channel 0 and Channel 1.]
FIGURE 30.2 ECU architecture.
30.3 What Is FlexRay? Before the development of FlexRay was started, a comprehensive evaluation of the existing technologies took place. The results showed that none of the existing communication technologies could fulfill the requirements to a satisfactory degree. Thus, the development of a new technology was started. The resulting communication protocol FlexRay is an open, scalable, deterministic, and high-performance communication technology for automotive applications. A FlexRay network consists of a set of electronic control units (ECUs) with integrated communication controllers (Figure 30.1). Each communication controller connects the ECU to one or more communication channels via a communication port, which in turn links to a bus driver. The bus driver connects to the physical layer of the communication channel and can contain a guardian unit that monitors the TDMA access of the controller (the architecture of an ECU is depicted in Figure 30.2). A communication channel can be as simple as a single bus wire or as complex as active or passive star configurations. FlexRay supports the operation of a communication controller with single or redundant communication channels. In the case of a single communication channel configuration, all controllers are attached to the communication channel via one port. In the case of redundant configuration, controllers can be attached to the communication channels via one or two ports. Controllers that are connected to two channels can be configured to transmit data redundantly on two channels at the same time. This redundant transmission allows the masking of a temporary fault of one communication channel and thus constitutes a powerful fault tolerance feature of the protocol. A second fault tolerance feature related to transient faults can be constructed by the redundant transmission of data over the same channels with a particular time delay between the redundant transmissions. This delayed transmission allows the toleration of transient faults on both channels under particular preconditions.
[Figure 30.3: the communication cycle consists of the network communication time, made up of a static segment of static slots, a dynamic segment of minislots, and an optional symbol window, followed by the network idle time.]
FIGURE 30.3 FlexRay communication cycle.
30.3.1 Media Access The media access strategy of FlexRay is basically a TDMA scheme with some very specific properties. The basic element of the TDMA scheme is a communication cycle. A communication cycle contains a static segment, a dynamic segment, and two protocol segments called symbol window and network idle time (Figure 30.3). Communication cycles are executed periodically from start-up of the network until shutdown. Two or more communication cycles can form an application cycle. The static segment consists of slots with fixed duration. The duration and number of slots are determined by configuration parameters of the FlexRay controllers. These parameters must be identical in all controllers of a network. They form a so-called global contract. Each slot is exclusively owned by one FlexRay communication controller for transmission of a frame. This ownership only relates to one channel. On other channels, in the same slot, either the same or another controller can transmit a frame. The identification of the transmitting controllers in one slot is also determined by configuration parameters of the FlexRay controllers. This piece of information is local to the sending controller. The receiving controllers do not possess any knowledge on the transmitter of a frame; they are configured solely to receive in a specific slot. Hence, the content of a frame is determined by its positions in the communication cycle. The static segment provides deterministic communication timing, since it is exactly known when a frame is transmitted on the channel, giving a strong guarantee for the communication latency. This strong guarantee in the static segment comes with a trade-off of fixed-bandwidth reservation. The dynamic segment has fixed duration, which is subdivided into so-called minislots. A minislot has a fixed length that is substantially shorter than that of a static slot. The length of a minislot is not sufficient to accommodate a frame; a minislot only defines a potential start time of a transmission in the dynamic segment. Similar to static slots, each minislot is exclusively owned by one FlexRay controller for the transmission of a frame. During the dynamic segment, all controllers in the network maintain a consistent view about the current minislot. If a controller wants to transmit in a minislot, the controller accesses the medium and starts transmitting the frame. This is detected by all other controllers, which interrupt the counting of minislots. Thus, the minislot is expanded to a real slot, which is large enough to accommodate a frame transmission. It is only after the end of the frame transmission that counting of the minislots continues. The expansion of a minislot reduces the number of minislots available in this dynamic segment. The operation of the dynamic segment is illustrated in Figure 30.4: Figure 30.4a shows the situation before minislot 4 occurs. Each of the channels offers 16 minislots for transmission. The owner of minislot 4 on channel 0 — in this case controller D — has data to transmit. Hence, the minislot is expanded as shown in Figure 30.4b. The number of available minislots in the dynamic segment on channel 0 is reduced to 13. If there are no data to transmit by the owner of a minislot, it remains silent. The minislot is not expanded and slot counting continues with the next minislot. Because no minislot expansion occurred, no additional bandwidth beyond the minislot itself is used; hence, other, lower-priority minislots have more bandwidth available. 
This dynamic media access control scheme produces a priority and demand-driven access pattern that optimally uses the reserved bandwidth for dynamic communication. A controller that owns an earlier
minislot, i.e., a minislot that has a lower number, has higher priority.

[Figure 30.4: a communication cycle on Channel 0 and Channel 1 with static segment, dynamic segment, symbol window, and network idle time. In (a) the dynamic segment offers 16 minislots per channel; in (b) the owner of minislot 4 on Channel 0 (controller D) transmits frame D1, the minislot is expanded into a full slot, and only 13 minislots remain on that channel.]
FIGURE 30.4 FlexRay dynamic segment.
The further back in the dynamic segment a minislot is situated, the higher is the probability that it will not be in existence in a particular cycle due to the expansion of higher-priority slots. A minislot is only expanded and its bandwidth used if the owning controller has data to transmit. As a consequence, the local controller configuration has to ensure that each minislot is configured only once in a network. The minimum duration of a minislot is mainly determined by physical parameters of the network (delay) and by the maximum deviation of the clock frequency in the controllers. The duration of a minislot and the length of the dynamic segment are global configuration parameters that have to be consistent within all controllers in the network.

The symbol window is a time slot of fixed duration, in which special symbols can be transmitted on the network. Symbols are used for network management purposes.

The network idle time is a protocol-specific time window in which no traffic is scheduled on the communication channel. The communication controllers use this time window to execute the clock synchronization algorithm. The offset correction (see below) that is done as a consequence of clock synchronization requires that some controllers correct their local view of the time forward and others have to correct backward. The correction is done in the network idle time. Hence, no consistent operations of the media access control can be guaranteed, and thus silence is required. Since this duration has to be subtracted from net bandwidth, it is kept as small as possible. The minimum length is largely determined by the maximum deviation between the local clocks after one communication cycle. The duration of the network idle time is a global parameter that has to be consistent between all controllers in a network.
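Returning to the dynamic segment, its minislot arbitration can be illustrated with a small simulation. In the C sketch below, each minislot has one owner, an owned minislot is expanded into a full transmission only when data are pending, and an expansion consumes part of the fixed segment budget, so that late (low-priority) minislots may not occur at all in a given cycle. The per-frame cost in minislots and all identifiers are assumptions made for the example, not values from the FlexRay specification.

```c
#include <stdbool.h>
#include <stdio.h>

#define MINISLOTS_PER_SEGMENT 16
#define FRAME_COST_MINISLOTS   4   /* assumed cost of one expanded transmission */

/* pending[i] is true when the owner of minislot i has a frame queued.
   Lower minislot numbers have higher priority. */
static void run_dynamic_segment(const bool pending[MINISLOTS_PER_SEGMENT])
{
    int budget = MINISLOTS_PER_SEGMENT;   /* minislots left in this segment */
    int slot   = 0;                       /* current minislot number        */
    while (budget > 0 && slot < MINISLOTS_PER_SEGMENT) {
        if (pending[slot] && budget >= FRAME_COST_MINISLOTS) {
            printf("minislot %d expanded: owner transmits\n", slot);
            budget -= FRAME_COST_MINISLOTS;   /* expansion consumes bandwidth */
        } else {
            budget -= 1;                      /* silent minislot              */
        }
        slot++;
    }
    /* Owners of minislots >= slot do not get to transmit in this cycle. */
}

int main(void)
{
    bool pending[MINISLOTS_PER_SEGMENT] = { false };
    pending[3] = pending[10] = pending[15] = true;
    run_dynamic_segment(pending);   /* minislot 15 never gets a chance here */
    return 0;
}
```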
[Figure 30.5: three cycle layouts. Pure static configuration: static segment (at least two static slots) plus optional symbol window. Mixed configuration: static segment (at least two static slots), dynamic segment, and optional symbol window. Pure dynamic configuration: degraded static segment (two static slots), dynamic segment, and optional symbol window.]
FIGURE 30.5 FlexRay configurations.
While at least a minimal static segment and the network idle time are mandatory parts of a communication cycle, the symbol window and dynamic segment are optional parts. This results in basically three reasonable configurations (Figure 30.5): • The pure static configuration, which contains only static slots for transmission. In order to enable clock synchronization, the static segment must consist of at least two slots, which are owned by different controllers. If a fault-tolerant clock synchronization should be maintained, the static segment must contain at least four static slots. • The mixed configuration with a static segment and a dynamic segment, where the ratio between static bandwidth and dynamic bandwidth can vary in a broad range. • Finally, a pure dynamic configuration with all bandwidth assigned to dynamic communication. This configuration also requires a so-called degraded static segment, which has two static slots. Considering the most likely application domains, mixed configurations will be dominant. Depending on the actual configuration, FlexRay can achieve a best-case bandwidth utilization of about 70%, with the average around 60%.
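The three configurations differ only in a handful of global parameters that every controller must share. The C sketch below captures such a global contract with invented field names (they are not the FlexRay parameter names) and applies the minimal design-time checks stated above: at least two static slots, four if fault-tolerant clock synchronization is to be maintained, and at least two distinct synchronization senders.

```c
#include <stdbool.h>

/* Simplified global contract for one communication cycle; every controller in
   the network must be configured with identical values (field names invented). */
typedef struct {
    int  static_slots;   /* number of static slots                            */
    int  minislots;      /* number of minislots in the dynamic segment        */
    bool symbol_window;  /* optional symbol window present                    */
    int  sync_senders;   /* controllers that transmit synchronization frames  */
} cycle_config_t;

/* Design-time sanity check based on the rules stated in the text. */
bool cycle_config_is_valid(const cycle_config_t *c, bool fault_tolerant_sync)
{
    int min_static = fault_tolerant_sync ? 4 : 2;   /* degraded segment needs 2 */
    return c->static_slots >= min_static && c->sync_senders >= 2;
}
```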
30.3.2 Clock Synchronization The media access scheme of FlexRay relies on a common and consistent view about time that is shared between all communication controllers in the network. It is the task of the clock synchronization service to generate such a time view locally inside each communication controller. For the detailed description of the clock synchronization, first the representation of time inside a communication controller is described. The physical basis for each time representation is the tick of the local controller oscillator. This clock signal is divided by an integer multiple to form a clock signal called microtick. An integer number of microticks forms a time unit called macrotick. Minislots and static slots are set up as integer multiples of macroticks. The number of microticks that constitute a macrotick is statically configured. However, for adjustment of the local time, the clock synchronization service can temporarily adjust this ratio in order to accelerate or decelerate the macrotick clock. The clock synchronization service is a distributed control system that produces local macroticks with a defined precision in relation to the local macroticks of the other controllers of a network. The control system takes some globally visible reference events that represent the global time ticks, measures the deviation of the local time ticks from the global ticks, and computes the local adjustments in order to minimize the deviation of the local clock from the global ticks. Due to the distributed nature of the FlexRay system, no explicit global reference event exists. The only event that is globally observable on the communication channel is a frame transmission. The start of a transmission is triggered by the local time base of the sending controller. Each controller can collect these reference events to form a virtual
global time base by computing a fault-tolerant mean value of the deviation between the local time and the perceived events. The reasoning behind this approach is based on the assumption that the majority of local clocks in the network are correct. Correctness of a local clock is given when the local clock does not deviate from every other by more than the precision value. A controller with a locally correct clock sees only deviation values within the precision value. By computing the median value of deviations, the actual temporal deviation of the local clock from the virtual reference tick is formed. Next, the local clock is adjusted such that the local deviation is minimized. Since all nodes in the network perform this operation, all local clocks move their ticks toward the tick of the virtual global time. The operation of observation, computing, and correction is performed in every communication cycle. In the case of wrong transmission times of faulty controllers on the communication channel, things get more complicated. Here, a special part of the fault-tolerant median value algorithm takes over. This algorithm uses only the best of the measured deviation values. All other values are discarded. This algorithm ensures that the maximum influence of a faulty controller to the virtual global time is strictly bound. Additionally, the protocol requires the marking of particular synchronization frames that can be used for deviation measurement. The reasoning behind this mechanism is twofold: First, it is used to pick exactly one frame from a controller in order to avoid monopolization of the global time by one controller with many transmit frames. Second, particular controllers can be excluded from clock synchronization, either because the crystal is not trustworthy or, more likely, because there are system configurations in which a controller is not available. In case of a faulty local clock, the faulty controller perceives only deviation values that exceed a particular value. This value can be derived from the precision value. This condition is checked by the synchronization service and an error is reported to the application. A specific extension of the clock synchronization services handles the compensation of permanent deviations of one node. In case such a permanent deviation is detected, a permanent correction is applied. The detection and calculation of such permanent deviations are executed less frequently than the correction of temporal deviations. The error handling of the protocol follows a strategy that identifies every problem as fast as possible, but keeps the controller alive as long as possible. Problem indicators are frames that are received outside their expected arrival intervals or when the clock synchronization does not receive sufficient synchronization frames. The automatic reaction of the controller is to degrade the operation from a sending mode to a passive mode where reception is still possible. At the same time, the problem is indicated to the application, which can react in an application-specific manner to the detected problem. This strategy gives the designer of a system maximum flexibility.
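As an illustration of the deviation-based correction described above, the following sketch sorts the measured deviations, discards extreme values, and derives an offset correction. It is a simplified stand-in, not the FlexRay algorithm itself; the number of discarded values, the midpoint-style estimate, and the sign convention are assumptions.

#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

/* deviations[i]: measured difference (in microticks) between the expected and
 * the observed arrival time of synchronization frame i. Returns the offset
 * correction to be applied during the network idle time; a positive measured
 * deviation is assumed to mean the local clock is ahead, hence the minus sign. */
int offset_correction(int *deviations, int n, int discard_per_side)
{
    qsort(deviations, n, sizeof(int), cmp_int);
    int lo = discard_per_side;                 /* drop the most extreme values */
    int hi = n - 1 - discard_per_side;         /* on both sides                */
    if (lo > hi) { lo = 0; hi = n - 1; }       /* too few samples: keep all    */
    return -(deviations[lo] + deviations[hi]) / 2;  /* midpoint of the rest    */
}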
30.3.3 Start-Up

The preceding protocol description handled only the case of an already running system. To reach this state, the protocol includes a start-up service. Its purpose is to establish a common view of the global time and of the position in the communication cycle. Generally, the start-up service has to handle two different cases. The cold-start case is a start-up of all nodes in the network, while the reintegration case is the integration of a starting controller into an already running set of controllers (Figure 30.6). During cold start, the algorithm has to ensure that a cold-start situation is really given. Otherwise, the starting controller might disturb an already running set of controllers. For this reason, the starting controller has to listen for traffic on the communication channel for a considerable amount of time, the so-called listen timeout. In case no traffic is detected, the controller assumes a cold-start situation and starts to transmit frames for a limited number of rounds. If another controller responds with frames that fit the slot counter of the cold-start node, start-up was successful. In case traffic is detected during the observation period, the controller changes into the reintegration mode. In this mode, the controller has to synchronize its slot counter with the frames seen on the channel. Therefore, the controller receives frames from the channel and sets the slot counter accordingly.
FIGURE 30.6 Start-up process.
For a certain period, the controller checks the plausibility of the received frames in relation to the internal slot counter. If there is a match, the controller enters the normal mode, in which active transmission of frames is allowed.
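A highly simplified view of this decision logic is sketched below; the states, inputs, and transitions are assumptions chosen to mirror the description above, not the state machine defined by the FlexRay specification.

/* Simplified start-up decision sketch; names and transitions are assumptions. */
typedef enum { STARTUP_LISTEN, STARTUP_COLDSTART, STARTUP_REINTEGRATE,
               STARTUP_NORMAL } startup_state_t;

startup_state_t startup_step(startup_state_t s,
                             int traffic_seen,     /* frames observed on the bus */
                             int listen_expired,   /* listen timeout elapsed     */
                             int counters_match)   /* slot counter is plausible  */
{
    switch (s) {
    case STARTUP_LISTEN:
        if (traffic_seen)   return STARTUP_REINTEGRATE;  /* join running cluster */
        if (listen_expired) return STARTUP_COLDSTART;    /* assume cold start    */
        return STARTUP_LISTEN;
    case STARTUP_COLDSTART:     /* transmit for a limited number of rounds       */
        return counters_match ? STARTUP_NORMAL : STARTUP_COLDSTART;
    case STARTUP_REINTEGRATE:   /* adopt the slot counter from observed frames   */
        return counters_match ? STARTUP_NORMAL : STARTUP_REINTEGRATE;
    default:
        return STARTUP_NORMAL;
    }
}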
30.3.4 Coding and Physical Layer

The frame format for data transmission contains three sections: the header, payload (body section), and trailer (Figure 30.7). The header contains protocol control information like the synchronization frame flag, the frame ID, a null frame indicator, the frame length, and a cycle counter. The payload section contains up to 254 bytes of data. In case the payload does not contain any data, the null frame indicator is set. Optionally, the data section can contain a message ID, which identifies the type of information transported in the frame. The trailer section contains a 24-bit cyclic redundancy check (CRC) that protects the complete frame.

FIGURE 30.7 FlexRay frame format: a 40-bit header section, a body section consisting of an optional 16-bit message ID and the data, and a trailer section with a 24-bit CRC; the complete frame comprises 5 + (0 ... 254) + 3 bytes.

The existing FlexRay communication controllers support communication bit rates of up to 10 Mbps on two channels over an electrical physical layer. The physical layer is connected to the controller via a transceiver component. This physical layer supports bus topologies, star topologies, cascaded star topologies, and bus stubs connected to star couplers, as shown in Figure 30.8. This multitude of topologies allows a maximum of scalability and flexibility of electronic architectures in automotive applications.

FIGURE 30.8 Topologies: bus, active star, active cascaded stars, and active stars with bus extension.

Besides transforming bit streams between the communication controller and physical layer, the transceiver component also provides a set of very specific services for an automotive network. The major services are alarm handling and wake-up control. Alarm signals are a very powerful mechanism for diverse information exchange between a sender controller and receiver controllers. A sender transmits an alarm symbol on the bus parallel to alarm information in a frame. A receiver ECU receives the alarm information in the frame like normal data. Additionally, the communication controller receives the alarm symbol on the physical layer and indicates this symbol to the ECU. Thus, the ECU has two highly independent indicators for an alarm to act on. This scheme can be used for the validation of critical signals like an air bag fire command. The second type of service provided by the symbol mechanism is the wake-up function. A wake-up service is required in automotive applications where electronic components have a sleep mode, in which power consumption is extremely reduced. The wake-up service restarts normal operation in all sleeping ECU components. In a network, the wake-up service uses a special signal that is transmitted over the network. In FlexRay, this function relies on the ability of a transceiver component to identify a wake-up symbol and to signal this event to the communication controller and the remaining components of the ECU to wake these components up.
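Based only on the byte counts quoted above (5-byte header section, 0 to 254 payload bytes, 3-byte trailer), a trivial helper can compute the frame size; the macro and function names are assumptions for this sketch.

/* Sketch based on the sizes quoted above; the header's internal field layout
 * is not modeled here. */
#define FLEXRAY_HEADER_BYTES   5u
#define FLEXRAY_TRAILER_BYTES  3u
#define FLEXRAY_MAX_PAYLOAD    254u

/* Returns the total frame size in bytes, or 0 for an invalid payload length. */
unsigned flexray_frame_bytes(unsigned payload_bytes)
{
    if (payload_bytes > FLEXRAY_MAX_PAYLOAD)
        return 0;
    return FLEXRAY_HEADER_BYTES + payload_bytes + FLEXRAY_TRAILER_BYTES;
}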
30.3.5 Bus Guardian

The media access strategy relies completely on the cooperative behavior of every communication controller in a network. The protocol mechanisms inside a controller ensure this behavior with a considerably high level of confidence. However, for safety-relevant applications, the controller-internal mechanisms do not provide a sufficiently high level of confidence. An additional and independent component is required to ensure that no controller can disturb the media access mechanism of the network. This additional component is called a bus guardian. In FlexRay, the bus guardian is a component with an independent clock and is constructed such that an error in the controller cannot influence the guardian and vice versa. The bus guardian is configured with its own set of parameters. These are independent of the parameters of the controller, although both parameter sets represent the same communication cycle and slot pattern. During runtime, the bus guardian receives synchronization signals from the controller in order to keep track of the communication cycle. Using its own clock, the bus guardian checks the plausibility of these synchronization signals, so that it cannot be misled by a faulty controller. Typically, the bus guardian will be combined with the transceiver component. Optionally, a central bus guardian located inside a star coupler can be used.
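The guardian's basic slot-window idea can be sketched as follows; the data structure and function are illustrative assumptions and not part of the FlexRay specification or of any particular guardian device.

/* Illustrative sketch: the guardian, using its own schedule copy and clock,
 * allows transmission only during slots owned by the local controller. */
typedef struct {
    const unsigned *own_slots;   /* slot numbers owned by this node */
    unsigned        n_own_slots;
} guardian_config_t;

int guardian_tx_enabled(const guardian_config_t *g, unsigned current_slot)
{
    for (unsigned i = 0; i < g->n_own_slots; i++)
        if (g->own_slots[i] == current_slot)
            return 1;            /* inside an owned slot: transmission allowed */
    return 0;                    /* outside own slots: block the bus driver    */
}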
30.3.6 Protocol Services Application information is transmitted by the communication controller inside of frames. A frame contains one or more application signals. A controller provides an interface for frame transmission and reception that consists of buffers. A buffer consists of a control/status section and the data section. These sections have different semantics for receive and transmit frames and for static and dynamic slots. The control section of transmit buffers for frames in the static segment contains the slot ID and channel in which the frame is transmitted. Once a buffer of this type is configured and the communication is
started, the controller periodically transmits the data in the data section in the slot configured in the slot ID. When the application changes the data in the buffer, the subsequent transmission contains the new data. A special control flag allows modifying this behavior so that in case the application does not update the data in the buffer, a null frame is transmitted, signaling the failure to update to other controllers. The control section of a receive buffer for frames in the static segment defines the slot ID and channel from which the frame should be loaded into the buffer. The status section contains the frame receive status and the null frame indicator. One special flag indicates that a new frame has been received. It is important to note that the slot ID and channel selection for slots in the static segment cannot be changed during operation. Buffer status and control sections for frames in the dynamic segment are similar to the buffers in the static segment. Differences result from the fact that the slot ID and channel can be changed during normal operation and that multiple buffers can be grouped together to form a first-in, first-out (FIFO) for frame reception from the dynamic section. The communication controller provides a set of timers that run clocked by the synchronized time of the network. Several different conditions can be used to generate interrupts based on these timers. These interrupts are efficient means to synchronize the application with the timing on the bus.
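A buffer of the kind described above can be pictured with a small descriptor structure; the field names and the flag set are assumptions for illustration, not the register layout of an actual FlexRay controller.

/* Illustrative buffer descriptor; fields are assumptions, not a real device. */
typedef struct {
    unsigned slot_id;        /* slot in which the frame is sent or received   */
    unsigned channel;        /* channel selection                             */
    unsigned is_transmit;    /* 1 = transmit buffer, 0 = receive buffer       */
    unsigned new_data;       /* status: new frame received / data updated     */
    unsigned null_frame;     /* status: null frame indicator                  */
    unsigned char data[254]; /* data section (up to 254 bytes, see above)     */
} frame_buffer_t;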
30.3.7 FlexRay Current State

At the time of writing, the protocol was in its final stage of development; the protocol specification had the version number 1.9. The first public release of the protocol — specification version 2.0 — was released in July 2004. FlexRay controllers are currently available from Freescale. These controllers are based on intermediate versions of the protocol specification; the latest device, named MFR4200, implements the protocol versions 1.0 and 1.1. Apart from Freescale, Bosch and NEC have announced that in the near future they will also offer FlexRay controllers. The special physical layer for FlexRay is provided by Philips. It offers support for the topologies mentioned above and a data rate of 10 Mbit/s on one channel. There will be two versions of the bus drivers: one with an integrated bus guardian and the other without this unit.
30.4 System Configuration

With the advent of the TDMA communication technologies, especially in the automotive application domain, the offline configuration of networks has become increasingly important. Offline configuration means that the configuration parameters of the communication controllers are not generated during the runtime of the system, but are determined during the development of the system. The processes for system development are mainly determined by the applied technology, but are also driven by industry-specific technical or organizational constraints. In the following section, the background for such a design process is described by first defining a model for the information used and then explaining the information processing. The information model categorizes information into eight information domains. This categorization is comprehensive in the sense that each and every piece of development information belongs to one of the information domains (Figure 30.9). The functional domain defines entities called functions and the communication relations between them. Functions describe the functionality of the entire system, creating a hierarchy from very abstract high-level functions down to specific, tailored ones. A system is normally composed of more than one function. In a vehicle, for example, the functional hierarchy would feature chassis functions on the top level, steering functions, and braking functions, the latter being broken down even more into basic braking function, antilock brake functions, and so on. A communication relation between functions or within a function starts at a sender function, connects the sender function with receiver functions, and has an assigned signal.
FIGURE 30.9 Information domains (functional, signal, message, network, TDMA, architecture, process, and dispatching domains).
Signals are defined in the signal domain. A signal definition contains a signal name, the signal semantics, and the value range. Signals are assigned to messages. The message domain defines messages, i.e., packages of signals that should be transmitted together. A message has a name and a fixed setup of the signals it contains. The network domain determines which ECU will send a frame containing a message as well as which ECUs should receive this frame. The TDMA domain establishes the exact points in time when frames are transmitted. Finally, the architecture domain defines the physical structure of a system, with all ECUs, communication systems, and the connections of the ECUs to them. The above six information domains are of importance for the entire system, i.e., for all ECUs of the system. Hence, they are considered global information. Two additional domains complement the message, network, TDMA, and architecture domains. These two domains are the process domain and the dispatching domain. The process domain describes the software architecture of the system. It lists all processes and their interactions like mutual exclusion, as well as the assignment of processes to functions. Processes are information processing units of an application. They have timing parameters assigned to them that define the period and the time offset of their execution. The dispatching domain is the ECU counterpart of the TDMA domain. It determines the application timing, i.e., which process is executed at which point in time. Implicitly, this also defines the preemption of processes, i.e., when a process is interrupted by the execution of another process. The latter two information domains feature information that is relevant for only one ECU. Hence, they are considered to be local information (local referring to one ECU rather than the entire system). The categorization of information given by the information domain model leads the way to the development process. It is an organizational constraint in the automotive development processes that the knowledge related to the system to be developed is distributed among the process participants. These participants are typically the automobile manufacturer (OEM) and one or more suppliers. The OEM possesses the information on the intended system functions, the envisioned system architecture, and the
allocation of functions to architectural components, i.e., to ECUs. Thus, the OEM's knowledge covers three of the six global information domains. The supplier, on the other hand, is the expert on function implementation and ECU design. This means he provides the knowledge on the process domain, i.e., the software architecture underlying the function implementation. Each function of the system or each part thereof is implemented by a set of interacting processes. The functionality of the ECU relies not only on the software architecture but also on the execution pattern. The supplier has to define the timing of each process executed in his developed ECU. Hence, the dispatching domain is supplier knowledge as well. With five of the eight information domains assigned, the open question is who is responsible for the three remaining information domains. As described in the previous section, these domains — the message, network, and TDMA domains — define the communication behavior on the system level. They specify at which point in time which message is transferred from which sending ECU to which receiving ECUs. These domains obviously affect all ECUs of the system; hence, no single supplier should be able to define these domains. For this reason, the DECOMSYS development process for collaborative system development between an OEM and several suppliers assigns the message, network, and TDMA domains to the OEM. So far, each piece of development information has been assigned to a process participant, which results in a static structure. In the following, the appropriate dynamic structure and development process will be described in brief. The proposed OEM–supplier development process takes a two-phase approach (Figure 30.10).

FIGURE 30.10 OEM–supplier development process.

In the first phase, the OEM has to cover all global aspects; subsequently, the suppliers deal with the local aspects. The process builds on the functional model of the system, which belongs to the functional domain and the signal domain, and the architectural model, belonging to the architecture domain. The functional model describes the functions of the system and their structure, consisting of subfunctions. As subfunctions have to exchange information in order to provide the intended function output, the functional model inherently also defines the signals that are transferred from one subfunction to others. The functional model is complemented by the architectural model, which defines the system topology and the ECUs that are present in the system. The mapping between these two models results in a concrete system architecture for a particular system in a particular vehicle. This is described in the distributed functional model. Based on the distributed functional model, the OEM performs communication scheduling. During this operation, signals are packed into messages, which in turn are scheduled for transmission at a specific point in time. Communication scheduling concludes the global design steps and thus the OEM's tasks. The suppliers base their local design steps on the global information given by the OEM. The so-called split of the distributed functional model tells the supplier which functions or subfunctions the ECU has to perform, that is, his responsibility. The supplier conducts software architecture design and
creates the software model for each function or subfunction. The resulting list of processes is scheduled for the ECU, taking into account local constraints as well as the global constraints defined in the communication schedule. Note that for performing the local design, the supplier requires only parts of the global information created by the OEM, together with his own knowledge of the ECU. In principle, the suppliers do not influence each other.
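To make the separation of the global information domains tangible, the following toy data model sketches how signal, message, network, and TDMA information might be represented; all names and fields are assumptions for illustration and do not reflect the DECOMSYS tooling.

/* Toy data model for the global information domains; illustrative only. */
typedef struct { const char *name; int min; int max; } signal_t;            /* signal domain  */
typedef struct { const char *name; const signal_t *signals; int n_signals; } message_t; /* message domain */
typedef struct { const message_t *message; const char *sender_ecu;
                 const char **receiver_ecus; int n_receivers; } network_entry_t;        /* network domain */
typedef struct { const network_entry_t *entry; unsigned slot; } tdma_entry_t;           /* TDMA domain    */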
30.4.1 Development Models The use of development models with a clear purpose and information content is the answer to the challenge of reuse of components for different car lines. Each model focuses on a certain type of information. The full picture of the system consists of these individual models and the mappings between them. When developing a new car, only those models affected by the differences between the previous version and the new vehicle have to be adapted, while the other models remain unchanged. To be more specific, the reuse of system parts calls for the separation of the architectural and functional models. The functional model, i.e., the functions to be executed by the distributed system, is primarily independent of a specific car model and can be reused in different model ranges. The architectural model, on the other hand, i.e., the concrete number of ECUs and their properties in a certain car, varies between model ranges. It is decisive for a useful development process to allow the separate development of the functional model and the architectural model. At the same time, the process must support the mapping of the functions to a concrete hardware architecture. The DECOMSYS OEM–supplier development process meets this requirement.
30.5 Standard Software Components Standard software components within an ECU deliver a set of services to the actual application. The application itself can make use of these services without implementing any of them. For example, if a transport layer is part of the standard software components used in a project, the application does not need to take into account the segmentation and reassembly of data that exceed the maximum message size. In order to reuse the application code in another project, the services offered by the standard software in the new project should be the same. If the standard software provides less functionality, the code has to be changed, as missing services have to be added. In the optimal case, the standardization effort covers all OEMs and suppliers. Only then can reuse of existing code be guaranteed, thus creating a win–win situation for all participants: the OEMs can purchase tested software that has proven its function and reliability in other projects; suppliers, on the other hand, have the possibility to sell this software, which they have created with considerable effort, to other OEMs.
30.5.1 Standardized Interfaces Standard software components are not the only answer to the challenge of reuse. The standard software components and their standardized services must be complemented by standardized interfaces, through which the application software can access these services. Standardizing interfaces for software means providing one operating system API (Application Programming Interface) for the software to access communication as well as other resources, like analog digital converters (ADCs). Similarly, standardized network interfaces allow the reuse of entire ECUs in different networks. The hardware of a distributed system can have a standardized interface represented by an abstract description of the network communication. Standardization of software components as well as interfaces, and thus a system architecture, is not a competitive issue. Depending on where the interfaces are set, there is ample room for each participating company to use its strengths effectively to achieve its purpose. In our opinion, the real competitive issues are the functions in an ECU or the overall system functionality that is realized by the interaction of ECU functions. The special behavior of an electronic power-steering system as perceived by the car driver is
mainly determined by the control algorithms and their application data, rather than by the type of communication interface used for integration. With respect to standardization efforts, the industry is currently moving in the right direction. Initiatives like the OSEK/VDX consortium [2], HIS [3], and many others attempt to standardize certain software components and interfaces. The FIBEX group that is now part of the ASAM consortium [4] develops a standardized exchange format between tools based on Extensible Markup Language (XML), which is able to hold the complete specification of a distributed system. Many of these initiatives and projects are now united in the AUTOSAR development partnership [5] with the goal to generate an industry-wide standard for the basic software infrastructure.
References

[1] www.flexray.com.
[2] www.osek-vdx.org.
[3] www.automotive-his.de.
[4] www.asam.de.
[5] www.autosar.de.
31
The LIN Standard

Antal Rajnák
Volcano Communications Technologies AG

31.1 Introduction ......................................................................31-1
31.2 The Need ............................................................................31-1
31.3 History ...............................................................................31-2
31.4 Some LIN Basics ................................................................31-3
        The LIN Physical Layer • The LIN Protocol
31.5 Design Process and Work Flow ........................................31-4
        System Definition Process • Debugging
31.6 Future .................................................................................31-6
31.7 Volcano LIN Tool Chain ...................................................31-6
        LIN Network Architect • Requirement Capturing • LIN Target Package • LIN Spector: Test Tool
31.8 Summary ..........................................................................31-11
Acknowledgments ......................................................................31-12
Additional Information .............................................................31-13
31.1 Introduction LIN is much more than just another protocol. It is defining a straightforward design methodology, tool interfaces, and a signal-based API (Application Programming Interface) in a single package. The LIN (local interconnect network) is an open communication standard, enabling fast and cost-efficient implementation of low-cost multiplex systems. It supports encapsulation for model-based design and validation, leading to front-loaded development processes that are faster and more cost efficient than traditional development methods. The LIN standard not only covers the definition of the bus protocol, but also expands its scope into the domain of application and tool interfaces, reconfiguration mechanisms, and diagnostic services — thus offering a holistic communication solution for automotive, industrial, and consumer applications. In other words, it is systems engineering at its best, enabling distributed and parallel development processes. Availability of dedicated tools to automate the design and system integration process is a key factor for the success of LIN.
31.2 The Need The car industry today is implementing an increasing number of functions in software. Complex electrical architectures using multiple networks, with different protocols, are the norm in modern high-end cars. The software industry in general is handling software complexity through best practices such as: • Abstraction: Hiding the unnecessary level of detail.
• Composability: Partitioning a solution into a set of separately specified, developed, and validated modules, easily combined into a larger structure inheriting the validity of its components — without the need for revalidation. • Parallel processes: State-of-the-art development processes such as the rational unified process (RUP) are based on parallel and iterative development where the most critical parts are developed and tested in the first iterations. The automotive industry is under constant pressure to reduce cost and lead time, while still providing increasing amounts of functionality. This must be managed without sacrificing quality. It is not uncommon today for a car project to spend half a billion U.S. dollars on development, and perhaps as much as $150 million on prototypes. By shortening lead time, the carmaker creates benefits in several ways; typically both development cost and capital costs are reduced. At the same time, an earlier market introduction creates better sales volumes and therefore better profit. One way of reducing lead time is by eliminating traditional prototype loops requiring full-size cars, and rather relying on virtual development, replacing traditional development and testing methods by Computer Aided Engineering (CAE). To reduce development time while maintaining quality, a reduction in lead time must occur in a coordinated fashion for all major subsystems of a car, such as body, electrical, chassis, and engine. With improved tools and practices for other subsystems, and increasing complexity of the electrical system, more focus must be placed on the electrical development process, as it may determine the total lead time and quality of the car. These two challenges — lead time reduction and handling of increased software complexity — will put growing pressure on the industry to handle development of electrical architectures in a more purposeful manner.
31.3 History

The LIN consortium started in late 1998, initiated by five car manufacturers (Audi, BMW, DaimlerChrysler, Volvo, and Volkswagen), the tool manufacturer VCT, and the semiconductor manufacturer Motorola. The work group focused on specification of an open standard for low-cost local interconnect networks in vehicles where the bandwidth and versatility of the Controller Area Network (CAN) are not required. The LIN standard includes the specification of the transmission protocol, the transmission medium, the interface between development tools, and the interfaces for application software programming. LIN promotes scalable architectures and interoperability of network nodes from the viewpoint of hardware and software, and a predictable electromagnetic compatibility (EMC) behavior. LIN complements the existing portfolio of automotive multiplex networks. It will be the enabling factor for the implementation of hierarchical vehicle networks, in order to gain further quality enhancement and cost reduction of vehicles. It addresses the needs of increasing complexity and of implementation and maintenance of software in distributed systems by providing for a highly automated tool chain. The main properties of the LIN bus are:
• Single master–multiple slaves structure
• Low-cost silicon implementation based on common Universal Asynchronous Receiver/Transmitter (UART)/Serial Communications Interface (SCI) hardware, an equivalent in software, or as a pure state machine
• Self-synchronization without a quartz or ceramics resonator in the slave nodes
• Deterministic signal transfer entities, with signal propagation time computable in advance
• Signal-based API
A LIN network is composed of one master and one or more slave nodes. The medium access is controlled by a master node — no arbitration or collision management in the slaves is required. Worst-case latency of signal transfer is guaranteed.
FIGURE 31.1 Logical states and corresponding voltage levels on a LIN bus: the recessive state (logic 1) and the dominant state (logic 0), with receiver thresholds around 60% and 40% of the supply voltage and controlled slopes in between.
31.4 Some LIN Basics LIN is a low-cost, single-wire network. The starting point of the physical layer design was the ISO 9141 standard. In order to meet EMC requirements, the slew rates are controlled. The protocol is a simple master–slave protocol based on the common UART format. In order to enable communication between nodes clocked by low-cost resistance capacitor (RC) oscillators, synchronization information is transmitted by the master node on the bus. Slave nodes will synchronize with the master clock, which is regarded to be accurate. The speed of the LIN network is up to 20 kbit/s, and the transmission is protected by a checksum. The LIN protocol is message identifier based. The identifiers do not address nodes directly, but denote the meaning of the messages. This way, any message can have multiple destinations (multicasting). The master sends out the message header consisting of a synchronization break (serving as a unique identifier for the beginning of the frame), a synchronization field carrying the clock information, and the message identifier, which denotes the meaning of the message. Upon reception of the message identifier, the nodes on the network will know exactly what to do with the message. One of the nodes sends out the message response and the others either listen or do not care. Messages from the master to the slave(s) are carried out in the same manner — in this case, the slave task incorporated into the master node sends the response. LIN messages are scheduled in a time-triggered fashion. This provides a model for the accurate calculation of latency times, thus supporting fully predictable behavior. Since the master sends out the headers, it is in complete control of the scheduling and is also able to swap between a set of predefined schedule tables, according to the specific requirements/modes of the applications running in the subsystem.
31.4.1 The LIN Physical Layer The transport medium is a single-line, wired-AND bus supplied via a termination resistor from the positive battery node (VBAT, nominally 12 V). The bus line transceiver is an enhanced ISO 9141 implementation. The bus can take two complementary logical values: the dominant value, with an electrical voltage close to ground and representing a logical 0, and the recessive value, with an electrical voltage close to the battery supply and representing a logical 1 (Figure 31.1). The bus is terminated by a pull-up resistor with a value of 1 kOhm in the master node and 30 kOhm in a slave node. A diode in series with the resistor is required to prevent the electronic control unit (ECU) from being powered by the bus in case of a local loss of battery. The termination capacitance is typically CSlave = 220 pF in the slave nodes, while the capacitance of the master node is higher in order to make the total line capacitance less dependent on the actual number of slave nodes in a particular network. The maximum signaling rate is limited to 20 kbit/s. This value is a practical compromise between the conflicting requirements of high slew rates for the purpose of easy synchronization and slower slew rates
for electromagnetic compatibility. The minimum baud rate is 1 kbit/s — helping to avoid conflicts with the practical implementation of time-out periods.

FIGURE 31.2 Frame structure.
31.4.2 The LIN Protocol The entities that are transferred on the LIN bus are frames. One message frame is formed by the header and the response (data) part. The communication in a LIN network is always initiated by the master task sending out a message header, which includes the synchronization break, the synchronization byte, and the message identifier. One slave task is activated upon reception and filtering of the identifier and starts the transmission of the message response. The response is composed of one to eight data bytes and is protected by one checksum byte. The time it takes to send a frame is the sum of the time to send each byte, plus the response space and the interbyte space. The interbyte space is the period between the end of the stop bit of a byte and the start bit of the following byte. The interframe space is the time from the end of a frame until the start of the next frame. A frame is constructed of a break followed by 4 to 11 byte fields. The structure of a frame is shown in Figure 31.2. In order to allow the detection of signaling errors, the sender of a message is required to monitor the transmission. After transmission of a byte, the subsequent byte may only be transmitted if the received byte was correct. This allows proper handling of bus collisions and time-outs. Signals are transported in the data field of a frame. Several signals can be packed into one frame as long as they do not overlap each other. Each signal has exactly one producer; i.e., it is always written by the same node in the cluster. Zero, one, or multiple nodes may subscribe to the signal. A key property of the LIN protocol is the use of schedule tables. A schedule table makes it possible to ensure that the bus will never be overloaded. It is also the key component to guarantee timely delivery of signals to the subscribing applications. Deterministic behavior is made possible by the fact that all transfers in a LIN cluster are initiated by the master task. It is the responsibility of the master to ensure that all frames relevant in a certain mode of operation are given enough time to be transferred.
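The checksum byte protecting the response is, in the classic form, the inverted eight-bit sum with carry over the data bytes; later LIN revisions additionally define an enhanced checksum that also covers the identifier byte. The following sketch computes the classic form.

#include <stdint.h>

/* Classic LIN checksum: 8-bit addition where each carry is added back in,
 * and the final sum is inverted before transmission. */
uint8_t lin_classic_checksum(const uint8_t *data, int len)
{
    uint16_t sum = 0;
    for (int i = 0; i < len; i++) {
        sum += data[i];
        if (sum > 0xFF)          /* fold the carry back into the low byte */
            sum -= 0xFF;
    }
    return (uint8_t)(~sum);      /* inverted sum is the checksum byte     */
}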
31.5 Design Process and Work Flow Regardless of the protocol, a network design process includes three major elements: • Requirement capturing (signal definitions and timing requirements) • Network configuration/design • Network validation The holistic concept of LIN supports the entire development, configuration, and validation of a network by providing definitions of all necessary interfaces.
The LIN work flow allows for the implementation of a seamless chain of design and development tools, enhancing speed of development and the reliability of the resulting LIN cluster. The LIN configuration language allows description of a complete LIN network and also contains all information necessary to monitor the network. This information is sufficient to make a limited emulation of one or multiple nodes if they are not available. The LIN description file (LDF) can be one component used to generate software for an electronic control unit (ECU), which shall be part of the LIN network. An API has been defined by the LIN standard to provide a uniform, abstract way to access the LIN network from applications. The syntax of a LIN description file is simple and compact enough to be handled manually, but use of computer-based tools is encouraged. Node capability files, as described in the LIN node capability language specification, provide one way to (almost) automatically generate LIN description files.
31.5.1 System Definition Process

Defining optimal signal packing and a schedule table that fulfills the signaling needs in varying modes of operation, with consideration of the capabilities of the participating nodes, is called the system definition process. Typically, it will result in generation of the LDF, written by hand for simple systems or generated by high-level network design tools that reuse existing, preconfigured slave nodes to create a cluster of them; starting from scratch is not that convenient. This is especially true if the defined system contains node address conflicts or frame identifier conflicts. The LIN node capability language, which is a new feature in LIN 2.0, provides a standardized syntax for specification of off-the-shelf slave nodes. This will simplify procurement of standard nodes as well as provide possibilities for tools that automate cluster generation. The availability of such nodes is expected to grow rapidly. If accompanied by a node capability file, it will be possible to generate both the LIN configuration file and the initialization code for the master node. Thus, true plug and play with nodes in a cluster will become a reality. By receiving a node capability file (NCF) with every existing slave node, the system definition step is automatic: just add the NCFs to your project in the system definition tool and it produces the LDF together with C code to configure a conflict-free cluster. The configuration C code shall, of course, be run in the master node during start-up of the cluster. If you want to create new slave nodes as well, the process becomes somewhat more complicated. The steps to perform will depend on the system definition tool being used, which is not part of the LIN specification. A useful tool will allow for entering of additional information before generating the LDF. (It is always possible to write a fictitious NCF for the nonexistent slave node, and thus it will be included.) An example of the intended work flow is depicted in Figure 31.3. The slave nodes are connected to the master forming a LIN cluster. The corresponding node capability files are parsed by the system defining tool to generate an LDF in the system definition process. The LDF is parsed by the system generator to automatically generate LIN-related functions in the desired nodes (the master and slave 3 in the example shown in Figure 31.3). The LDF is also used by a LIN bus analyzer/emulator tool to allow for cluster debugging.
FIGURE 31.3 Work flow.
If the setup and configuration of any LIN cluster are fully automatic, a great step toward plug-and-play development with LIN will be taken. In other words, it will be just as easy to use distributed nodes in a LIN cluster as it is to use a single node with the physical devices connected directly to the node. It is worth noting that the generated LDF reflects the configured network; any preexisting conflicts between nodes or frames must have been resolved before activating cluster traffic.
31.5.2 Debugging Debugging and node emulation are based on the LDF produced during system definition. Emulation of the master adds the requirement that the cluster must be configured to be conflict-free. Hence, the emulator tool must be able to read reconfiguration data produced by the system definition tool. One example of a comprehensive tool chain built around the open interface definitions of the LIN standard is presented below.
31.6 Future

The driving ideas and resulting technology behind the success of LIN — especially in the area of the structured approach toward the system design process — will most likely migrate to other areas of automotive electronics. LIN itself will find its way to applications outside of the automotive world due to its low cost and versatility. The LIN specification will evolve further to cover upcoming needs. For example, the future 42-V power supply will require a new physical layer. There will be a broad supply of components that are made for LIN. Because of high production volumes, these products can be used cost-effectively in many applications, enhancing the functionality of vehicles in a cost-effective manner.
31.7 Volcano LIN Tool Chain

The Volcano LIN tool chain process is illustrated in Figure 31.4.
1. LIN network requirements are entered into LNA.
2. Automatic frame compilation and schedule table generation are done by LNA.
3. The LIN description file is generated by LNA.
4. The LIN configuration generator tool converts the LIN description file and private file to target-dependent ".c" and ".h" codes.
5. Application code is compiled with target-dependent configuration code and linked to the LIN target package library.
6. Analysis and emulation are performed with LIN Spector using the generated LIN description file.

FIGURE 31.4 The Volcano tool-chain for LIN.
31.7.1 LIN Network Architect The LIN Network Architect (LNA) is built for design and management of LIN networks. Starting with the entry of basic data such as signals, encoding types, and nodes, LNA takes the user through all stages of network definition.
31.7.2 Requirement Capturing There are two types of data administered by LNA: • Global objects (signals, encoding types, and nodes) • Project-related data (network topology, frames, and schedule tables) Global objects shall be created first and can then be reused in any number of projects (Figure 31.5).
FIGURE 31.5 Definition of global objects in LNA.
FIGURE 31.6 Network definition and frame packing.
They can be defined manually or imported by using a standardized Extensible Markup Language (XML) input file (based on FIBEX revision 1.0). Future versions of the tool will be able to import data directly from the standardized node capability file (NCF). Comprehensive version and variant handling is supported. The systems integrator combines subsets of these objects to define networks. Consistency checks are continuously performed during this process. This is followed by automatic packing of signals into frames (Figure 31.6). The last task to be completed is that of generating the schedule table in accordance with the timing requirements captured earlier in the process (Figure 31.7). The optimization considers several factors such as bandwidth and memory usage. Based on the allocation of signals to networks via node interfaces, the tool will automatically identify gateway requirements between subnetworks, regardless of whether they are LIN to LIN or LIN to CAN. The transfer of signals from one subnetwork to another will become completely transparent to the application of the automatically selected gateway node. The tool uses a publish–subscribe model. A signal can only be published by one node, but it can be received by any number of other nodes. Different nodes may have different end-to-end timing requirements (Figure 31.8). The max_age is the most important timing parameter defined in the Volcano timing model. This parameter describes the maximum allowed time between the generation and consumption of a signal involved in a distributed function.
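As a purely illustrative reading of the max_age constraint, the following sketch bounds the end-to-end latency of a signal by the sum of the stages it passes through; the decomposition and the field names are assumptions for this example, not the Volcano timing model itself.

/* Toy end-to-end latency check in the spirit of max_age; illustrative only. */
typedef struct {
    unsigned producer_period_ms;  /* how often the sender refreshes the signal   */
    unsigned frame_period_ms;     /* schedule-table period of the carrying frame */
    unsigned frame_time_ms;       /* time to transmit the frame on the bus       */
    unsigned consumer_period_ms;  /* how often the receiver reads the signal     */
} signal_path_t;

/* Pessimistic bound: the signal may just miss each stage once. */
int max_age_satisfied(const signal_path_t *p, unsigned max_age_ms)
{
    unsigned worst = p->producer_period_ms + p->frame_period_ms +
                     p->frame_time_ms + p->consumer_period_ms;
    return worst <= max_age_ms;
}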
FIGURE 31.7 Manual or automatic schedule table generation.
FIGURE 31.8 LIN timing model.
Changes can be introduced in a straightforward manner, with frame definitions and schedule tables automatically recalculated to reflect the changed requirements. When the timing analysis has been performed and the feasibility of the individual subnetworks has been established, LDFs will automatically be created for each network (Figure 31.9). Textual reports can be generated as well, to enhance the readability of information for all parties involved in the design, verification, and maintenance process.
FIGURE 31.9 LDF generation.
31.7.3 LIN Target Package

The LIN target package (LTP) represents the embedded software portion of the Volcano tool chain for LIN. The LTP is distributed as a precompiled and fully validated object library, also including associated documentation and a command line configuration utility (LCFG) with automatic code generation capability, generating the configuration-specific code and set of data structures. Implementing the LTP with an application program is a simple process. The LDF created by the offline tool contains the communication-related network information. In addition, a target file, an ASCII-based script, defines low-level microcontroller information such as memory model, clock, SCI, and other node specifics to the LTP. These two files are run as input through the command line utility LCFG. It converts them both into target-dependent code usable by the microcontroller. The output contains all relevant configuration information formatted into compiler-ready C source code. The target-dependent source code is added to the module build system along with the precompiled object library. After compilation the LTP gets linked to the application functionality to form the target image, which is ready for download. The application programmer can interface to the LTP and therefore to the LIN subnetwork through the standardized LIN API (Figure 31.10).

FIGURE 31.10 LIN API structure.

API calls include signal-oriented read and write calls, signal-related flag control (but also node-related initialization), interrupt handling, and timed task management. The low-level details of communication are hidden from the application programmer using the LTP. Specifics about signal allocation within frames, frame IDs, and others are carried within the LDF so that applications can be reused simply by linking to different configurations described by different LDFs. As long as signal formats do not get changed, a reconfiguration of the network only requires repeating the process described above, resulting in a new target image without impact on the application. When the node allows for reflashing, the configuration can even be adapted without further supplier involvement, allowing for end-of-line programming or after-sales adaptations in case of service. LTPs are created and built for a specific microcontroller and compiler target platform. A number of ports to popular targets are available, and new ports can be made at the customer's request.
31.8 Summary LIN is an enabling factor for the implementation of a hierarchical vehicle network to achieve higher quality and reduction of costs for automotive makers. This is enabled by providing best practices of software development to the industry: abstraction and composability. LIN allows for reduction of the many existing low-end multiplex solutions and for cutting the costs of development, production, service, and logistics in vehicle electronics.
FIGURE 31.11 LIN Spector — diagnostics and emulation tool for LIN.
The growing number of car lines equipped with LIN and the ambitious plans for the next generation of cars are probably the best proof for the success of LIN. The simplicity and completeness of the LIN specification, combined with a holistic networking concept allowing for a high degree of automation, have made LIN the perfect complement to CAN as the backbone of in-vehicle communication. Some of the market growth even resides in the downsizing of parts of the vehicle network from MS CAN to LIN, where limited communication requirements allow for such downsizing. The release of LIN 2.0 has further enhanced the reuse of components across car manufacturers and has added a higher degree of automated design capability by introduction of node capability description files and by defining mechanisms for reconfigurability of identical LIN devices in the same network. VCT is offering the corresponding and highly automated tool chain to guarantee correctness by design. This shortens design cycles and — as a conceptual approach — allows for integration into higher-level tools. LIN solutions provide a means for the automotive industry to drive new technology and functionality in all classes of vehicles.
Acknowledgments I thank Hans-Christian von der Wense of Freescale Semiconductor Munich and István Horváth and Thomas Engler of Volcano Automotive Group for their contributions to this chapter.
FIGURE 31.12 LIN Go — graphical objects.
Additional Information

ISO 9141, Road Vehicles: Diagnostics Systems: Requirement for Interchange of Digital Information, 1st edition, 1989.
LIN Consortium, LIN Specification, Version 2.0, www.lin-subbus.org, September 2003.
Dr. Günter Reichart, LIN: a subbus standard in an open system architecture, in 1st International LIN Conference, Ludwigsburg, Germany, September 2002.
J.W. Specks, A. Rajnák, LIN: protocol, development tools, and software interfaces for local interconnect networks in vehicles, in 9th International Conference on Electronic Systems for Vehicles, Baden-Baden, Germany, October 2000.
W. Specks, A. Rajnák, The scaleable network architecture of the Volvo S80, in 8th International Conference on Electronic Systems for Vehicles, Baden-Baden, Germany, October 1998, pp. 597–641.
The LIN specification package and further background information about LIN and the LIN consortium are available via the URL http://www.lin-subbus.org. Information about LIN products referred to in this chapter is available via the URL http://www.volcanoautomotive.com.
32
Volcano: Enabling Correctness by Design

Antal Rajnák
Volcano Communications Technologies AG

32.1 Introduction ......................................................................32-1
32.2 Volcano Concepts ..............................................................32-3
        Volcano Signals and the Publish–Subscribe Model • Frames • Network Interfaces • The Volcano API • Timing Model • Capture of Timing Constraints
32.3 Volcano Network Architect ............................................32-10
        The Car OEM Tool Chain: One Example • VNA: Tool Overview
32.4 Volcano Software in an ECU ..........................................32-15
        Volcano Configuration • Work Flow
Acknowledgments ......................................................................32-17
Reference ....................................................................................32-18
More Information ......................................................................32-18
32.1 Introduction

Volcano is a holistic concept defining a protocol-independent design methodology for distributed real-time networks in vehicles. The concept deals with both technical and nontechnical entities (i.e., partitioning of responsibilities into well-defined roles in the development process). The vision of Volcano is enabling correctness by design. By taking a strict systems engineering approach and focusing resources on design, a majority of system-related issues can be identified and solved early in a project. The quality is designed into the vehicle, not tested out. Minimized cost, increased quality, and a high degree of configuration and reconfiguration flexibility are the trademarks of the Volcano concept. The Volcano approach is particularly beneficial as the complexity of vehicles is increasing very rapidly and as projects will have to cope with new functions and requirements throughout their lifetime. A unique feature of the Volcano concept is a solution called post-compile-time reconfiguration flexibility, where the network configuration (signal-to-frame mapping, ID assignment, and frame period) is located in a configurable flash area of the electronic control unit (ECU) and can be changed without the need for touching the application software, thus eliminating the need for revalidation and saving both costs and lead time. The concept's origins can be traced back to a project at Volvo Car Corporation in 1994–1998 when development of Volvo's new large platform took place. It reuses solid industrial experience and takes into account recent findings from real-time research (Figure 32.1).

FIGURE 32.1 The Volvo S80 main networks.

The concept is characterized by three important features:
• Ability to guarantee the real-time performance of the network already at the design stage, thus significantly reducing the need for testing
FIGURE 32.1 The Volvo S80 main networks.
• Built-in flexibility, enabling the vehicle manufacturer to upgrade the network in the preproduction phase of a project as well as in the aftermarket • Efficient use of available resources The actual implementation of the concept consists of two major parts: • The offline tool set for requirement capturing and automated network design (covering multiple protocols and gateway configuration). It provides strong administrative functions for variant and version handling, needed during the complete life cycle of a car project. • The target part — represented by a highly efficient and portable embedded software package offering a signal-based Application Programming Interface (API), handling of multiple protocols, integrated gateway functionality, and post-compile-time reconfiguration capability, together with a PC-based generation tool. Even though the implementation originally supported the Controller Area Network (CAN) and Volcano lite* protocols, it has successfully been extended to other emerging network protocols as well. Local interconnect network (LIN) was added first, to be followed by the FlexRay and Media Oriented Systems Transport (MOST) protocols. The philosophy behind this is that communications have to be managed in one single development environment, covering all protocols used, in order to ensure end-to-end timing predictability, while still providing the necessary architectural freedom to choose the most economic solution for the task. The computing industry has discovered over the last 40 years that certain techniques are needed in order to manage complex software systems. Two of these techniques are abstraction (where unnecessary information is hidden) and composability (if software components proven to be correct are combined, then the resulting system will be correct as well). Volcano makes heavy use of both of these techniques. The automotive industry is implementing an increasing number of functions in software. The introduction of protocols like MOST for multimedia and FlexRay for active chassis systems results in highly complex electrical architectures. Finally, all these complex subnetworks are linked through gateways. The behavior of the entire car network has a crucial influence upon the car’s performance and reliability. Managing software development that involves many suppliers, hundreds of thousands of lines of code, and thousands of signals requires a structured systems engineering approach. Inherent in the concept of systems engineering is a clear partitioning of the architecture, requirements, and responsibilities.
*A low-speed, Serial Communications Interface (SCI)-based proprietary master–slave protocol used by Volvo.
A modern vehicle includes a number of microprocessor-based components called electronic control units (ECUs), provided by a variety of suppliers. CAN provides an industry standard solution for connecting ECUs together using a single broadcast bus. A shared broadcast bus makes it much easier to add desired functionality: ECUs can be added easily, and they can communicate data easily and cheaply (adding a function may be “just software”). But increased functionality leads to more software and greater complexity. Testing a module for conformance to timing requirements is the most difficult of the problems. With a shared broadcast bus, the timing performance of the bus might not be known until all the modules are delivered and the bus usage of each is known. Testing for timing conformance can only then begin (which is often too far into the development of a vehicle to find and correct major timing errors). The supplier of a module can only do limited testing for timing conformance: it does not have a complete picture of the final load placed on the bus. This is particularly important when dealing with the CAN bus: arrivals of frames from the bus may cause interrupts on a module wishing to receive the frames, and so the load on the microprocessor in the ECU is partially dependent on the bus load. It is often thought that CAN is somehow unpredictable and the latencies for lower-priority frames in the network are unbounded. This is untrue, and in fact, CAN is a highly predictable communications protocol. Furthermore, CAN is well suited to handle large amounts of traffic with differing time constraints. However, with CAN there are a few particular problems: • The distribution of identifiers: CAN uses identifiers for two purposes: distinguishing different messages on the bus and assigning relative priorities to those messages — the latter being often neglected. • Limited bandwidth: Due to a low maximum signaling speed of 1 Mbit/s, further reduced by significant protocol overhead. Volcano was designed to provide abstraction, composability, and identifier distribution reflecting true urgencies, while at the same time providing the most efficient utilization of the protocol.
32.2 Volcano Concepts The Volcano concept is founded on the ability to guarantee the worst-case latencies of all frames sent in a multiprotocol network system. This is a key step because it gives the following: • A way of guaranteeing that there are no communications-related timing problems. • A way of maximizing the amount of information carried on the bus. This is important for reduced production costs. • The possibility to develop highly automated tools for design of optimal network configurations. The timing guarantee for CAN is provided by mathematical analysis developed from academic research [1]. Other protocols like FlexRay are predictable by design. For this reason, some of the subjects discussed below are CAN specific and others are independent of the protocol used. The analysis is able to calculate the worst-case latency for each frame sent on the bus. This latency is the longest time from placing a frame in a CAN controller at the sending side to the time the frame is correctly received at all receivers. The analysis needs to make several assumptions about how the bus is used. One of these assumptions is that there is a limited set of frames that can access the bus and that time-related attributes of these frames are known (e.g., frame size, frame periodicity, queuing jitter, and so on). Another important assumption is that the CAN hardware can be driven correctly: • The internal message queue within any CAN controller in the system is organized (or can be used) such that the highest-priority message will be sent out first if more than one message is ready to be sent. (The hardware slot position-based arbitration is OK as long as the number of sent frames is less than the number of transmit slots available in the CAN controller.)
• The CAN controller should be able to send out a stream of scheduled messages without releasing the bus in the interframe space between two messages. Such devices will arbitrate for the bus right after sending the previous message and will only release the bus in case of lost arbitration. A third important assumption is the error model: the analysis can account for retransmissions due to errors on the bus, but requires a model for the number of errors in a given time interval. The Volcano software running in each ECU controls the CAN hardware and accesses the bus so that all these assumptions are met, allowing application software to rely on all communications taking place on time. This means that integration testing at the automotive manufacturer can concentrate on functional testing of the application software. Another important benefit is that a large amount of communications protocol overhead can be avoided. Examples of how protocol overheads are reduced by obtaining timing guarantees are: • There is no need to provide frame acknowledgment within the communications layer, dramatically reducing bus traffic. The only case where an ECU can fail to receive a frame via CAN is if the ECU is off the bus, a serious fault that is detected and handled by network management and onboard diagnostics. • Retransmissions are unnecessary. The system-level timing analysis guarantees that a frame will arrive on time. Time-outs only happen after a fault, which can be detected and handled by network management or the onboard diagnostics. A Volcano system never suffers from intermittent overruns during correct operation because of the timing guarantees, and therefore achieves these efficiency gains.
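In outline, the analysis referred to above [1] computes a worst-case response time for each frame m, from the point it is queued to the completion of its reception at all receivers. A simplified form of that analysis (the notation here is ours, not taken from the Volcano documentation) is:

R_m = J_m + w_m + C_m
w_m = B_m + \sum_{k \in hp(m)} \left\lceil \frac{w_m + J_k + \tau_{bit}}{T_k} \right\rceil C_k

Here C_m is the worst-case transmission time of frame m (including stuff bits), J_m its queuing jitter, B_m the blocking due to the longest lower-priority frame that may already occupy the bus, hp(m) the set of frames with higher priority than m and with periods T_k, and \tau_{bit} one bit time. The recurrence for the queuing delay w_m is solved iteratively, and an additional error-model term can be included to account for retransmissions. The frame meets its requirement if R_m never exceeds its deadline.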
32.2.1 Volcano Signals and the Publish–Subscribe Model The Volcano system provides signals as the basic communication object. Signals are small data items that are sent between ECUs. The publish–subscribe model is used for defining signaling needs. For a given ECU there is a set of signals that are published (i.e., made available to the system integrator) and a number of subscribed signals (i.e., signals that are required as inputs to the ECU). The signal model is provided directly to the programmer of ECU application software, and the Volcano software running in each ECU is responsible for translation between signals and CAN frames. An important design requirement for the Volcano software was that the application programmer is unaware of the bus behavior: all the details of the network are hidden and the programmer only deals with signals through a simple API. This is crucial because a major problem with alternative techniques is that the application software makes assumptions about the CAN behavior, and therefore changing the bus behavior becomes difficult. In Volcano there are three types of signals: • Integer signals: These represent unsigned numbers and are of a static size between 1 and 16 bits. So, for example, a 16-bit signal can store integers in the range of 0 to 65,535. • Boolean signals: These represent truth conditions (true/false). Note that this is not the same as a 1-bit integer signal (which stores the integer values 0 or 1). • Byte signals: These represent data with no Volcano-defined structure. A byte signal consists of a fixed number of bytes, between 1 and 8. The advantage of Boolean and integer signals is that the values of a signal are independent of processor architecture (i.e., the values of the signals are consistent regardless of the “endian-ness” of the microprocessors in each ECU). For published signals, Volcano internally stores the value of these signals and, in the case of periodic signals, will send them to the network according to a pattern defined offline by the system integrator. The system integrator also defines the initial value of a signal. The value of a signal persists until updated by the application program via a write call or until Volcano is reinitialized.
For subscribed signals, Volcano internally stores the current value of each signal. The system integrator also defines the initial value of a signal. The value of a subscribed signal persists until: • It is updated by receiving a new value from the network • Volcano is reinitialized • A signal refresh time-out occurs and the value is replaced by a substitute value defined by the application programmer In the case where new signal values are received from the network, these values will not be reflected in the values of subscribed signals until a Volcano input call is made. A published signal value is updated via a write call. The latest value of a subscribed signal is obtained via a read call. A write call for a subscribed signal is not permitted. The last written value of a published signal may be obtained via a read call. 32.2.1.1 Update Bits The Volcano concept permits placement of several signals with different update rates into the same frame. It provides a special mechanism — named update bit — to indicate which signals within the frame have actually been updated; i.e., the ECU generating the signal wrote a fresh value of the signal since the last time it was transmitted. The Volcano software on an ECU transmitting a signal automatically clears the update bit when it has been sent. This ensures that a Volcano-based ECU on the receiving side will know each time the signal has been updated (the application can see this update bit by using flags tied to an update bit; see below). Using update bits to their full extent requires that the underlying protocol is secure. (Frames cannot be lost without being detected.) The CAN protocol is regarded as such, but not the LIN protocol. Therefore, the update bit mechanism is limited to CAN within Volcano. 32.2.1.2 Flags A flag is a Volcano object purely local to an ECU. It is bound to one of two things: • The update bit of a received Volcano signal; the flag is set when the update bit is set. • The containing frame of a signal; the flag is set when the frame containing the signal is received (regardless of whether an update bit for the signal is set). Many flags can be bound to each update bit, or the reception of a containing frame. Volcano sets all the flags bound to an object when the occurrence is seen. The flags are cleared explicitly by the application software. 32.2.1.3 Time-Outs A time-out is, like the flags, a Volcano object purely local to an ECU. The time-out is declared by the application programmer and is bound to a subscribed signal. A time-out condition occurs when the particular signal was not received within the given time limit. In this case, the signal (and a number of other signals) is set to a value specified as part of the declaration of the time-out. As with the flags, the time-out reset mechanism can be bound to either: • The update bit of a received Volcano signal • The frame carrying a specific signal
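As a small illustration of how an application might use a flag bound to an update bit, consider the C sketch below. The function and constant names are hypothetical placeholders: the text names the mechanism but not the exact Volcano calls or their signatures, so this is only a sketch under those assumptions.

#include <stdint.h>

/* Hypothetical stand-ins for the Volcano flag and signal read calls;
 * the real API names and signatures are not given in the text. */
#define FLAG_SPEED_UPDATED  0            /* assumed flag handle, bound offline to the
                                            update bit of the subscribed speed signal */
extern int      v_flag_test(int flag);    /* returns nonzero if the flag is set        */
extern void     v_flag_clear(int flag);   /* flags are cleared explicitly by the app   */
extern uint16_t v_read_speed(void);       /* read call for the subscribed signal       */

void check_vehicle_speed(void)
{
    /* The flag is set by Volcano whenever the received frame carries the
       signal with its update bit set, i.e., the publisher wrote a fresh value. */
    if (v_flag_test(FLAG_SPEED_UPDATED)) {
        uint16_t speed = v_read_speed();      /* latest stored signal value */
        v_flag_clear(FLAG_SPEED_UPDATED);     /* application clears the flag */
        /* ...react only to freshly updated values... */
        (void)speed;
    }
}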
32.2.2 Frames A frame is a container capable of carrying a certain amount of data (0 to 8 bytes for CAN and LIN). Several signals can be packed into the available data space and transmitted together in one frame on the network. The total size of a frame is determined by the protocol. A frame can be transmitted periodically or sporadically. Each frame is assigned a unique identifier. The identifier serves two purposes in the CAN case:
• Identifies and filters a frame on reception at an ECU • Assigns a priority to a frame 32.2.2.1 Immediate Frames Volcano normally hides the existence of network frames from the application designer. However, under certain cases there is a need to send and receive frames with very short processing latencies. In these cases, direct application support is required. Such frames are designated immediate frames. There are two Volcano calls to handle immediate frames: • A transmit call, which immediately sends the designated frame to the network • A receive call, which immediately processes the designated incoming frame if that frame is pending There is also a read-update-bit call to test the update bit of a subscribed signal within an immediate frame. The signals packed into an immediate frame can be accessed with normal read and write function calls in the same way as all other normal signals. The application programmer is responsible for ensuring that the transmit call is made only when the signal values of published signals are consistent. 32.2.2.2 Frame Modes In Volcano one is allowed to specify different frame modes for an ECU. A frame mode is a description of an ECU working mode, where a set of frames (signals) can be active (input and output). The frames can be active in one or many frame modes. The timing properties of frames do not have to be the same for different frame modes supporting the same frame.
32.2.3 Network Interfaces A network interface is the device used to send and receive frames to and from networks. A network interface connects a given ECU to the network. In the CAN case, more than one network interface (CAN controller) on the same ECU may be connected to the same network. Likewise, an ECU may be connected to more than one network. The network interfaces in Volcano are protocol specific. The protocols currently supported are CAN and LIN; FlexRay and MOST are under implementation. The network interface is managed by a standard set of Volcano calls. These allow the interface to be initialized or reinitialized, connected to the network (i.e., begin operating the defined protocol), and disconnected from the network (i.e., take no further part in the defined protocol). There is also a Volcano call to return the status of the interface.
32.2.4 The Volcano API The Volcano API provides a set of simple calls to manipulate signals and to control the CAN/LIN controllers. There are also calls to control Volcano sending to and receiving from networks. To manipulate signals there are read and write calls. A read call returns to the caller the latest value of a signal; a write call sets the value of a signal. The read and write calls are the same regardless of the underlying network type. 32.2.4.1 Volcano Thread of Control There are two Volcano calls that must be called at the same fixed rate: v_input() and v_output(). If the v_gateway() function is used, the same calling rate should be used as for the v_input() and v_output() functions. The v_output() call places the frames into the appropriate controllers. The v_input() call takes received frames and makes the signal values available to read calls. The v_gateway() call copies values of signals in frames received from the network to values of signals in frames sent to the network. The v_sb_tick() call handles transmitting and receiving frames for subbuses.
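The sketch below shows one possible arrangement of these calls in an ECU's periodic communication task. The v_input(), v_gateway(), and v_output() names come from the text; their (assumed) parameterless signatures, the ordering relative to the application code, and the signal accessor names are illustrative assumptions, not the documented Volcano API.

#include <stdint.h>

/* Periodic processing calls named in the text (Volcano target library). */
extern void v_input(void);
extern void v_output(void);
extern void v_gateway(void);

/* Hypothetical signal read/write helpers; the actual Volcano read and
 * write calls are not named in the text. */
extern uint16_t read_engine_speed(void);       /* subscribed integer signal */
extern void     write_shift_lamp(uint8_t on);  /* published integer signal  */

/* Invoked once per Volcano processing period (for example, every 5 ms). */
void comm_task(void)
{
    v_input();     /* make newly received signal values available to read calls  */
    v_gateway();   /* copy gatewayed signal values, if gatewaying is configured  */

    /* Application work between input and output: read subscribed signals,
       compute, and write published signals. */
    uint16_t rpm = read_engine_speed();
    write_shift_lamp(rpm > 6000u ? 1u : 0u);

    v_output();    /* pack published signals and place due frames in the controllers */
}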
FIGURE 32.2 The Volcano timing model. (Six points between generation and consumption of a signal value: notional generation; first v_output at which the new value is available; frame enters arbitration; transmission completed; first v_input at which the signal is available; notional consumption. The intervals between them are T_PL, T_BT, T_T, T_AT, and T_SL, and max_age spans the whole chain.)
Volcano also provides a very low latency communication mechanism in the form of the immediate frame API. This is a view of frames on the network that allows transmission and reception from and to the Volcano domain without the normal Volcano input and output latencies, or mutual exclusion requirements with the v_input() and v_output() calls. There are two communication calls in the immediate signal API: v_imf_rx() and v_imf_tx(). The v_imf_tx() call copies values of immediate signals into a frame and places the frame in the appropriate CAN controller for transmission. The v_imf_rx() call takes a received frame containing immediate signals and makes the signal values available to read calls. A third call, v_imf_queued(), allows the user to see if an immediate frame has really been sent on the network. The controller calls allow the application to initialize, connect to, and disconnect from networks, and to place the controllers into sleep mode, among other operations. 32.2.4.2 Volcano Resource Information The ambition of the Volcano concept is to provide a fully predictable communications solution. In order to achieve this, the resource usage of the Volcano embedded part has to be determined. Resources of special interest are memory and execution time. 32.2.4.2.1 Execution Time of Volcano Processing Calls In order to bound processing time, a budget for the v_input() call (i.e., the maximum number of frames that will be processed by a single call to v_input()) has to be established. A corresponding budget applies to transmitted frames as well.
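A brief sketch of the immediate frame calls named earlier in this subsection follows; the call names come from the text, but the parameter (a frame handle) and the surrounding helper names are assumptions made for illustration.

#include <stdint.h>

typedef int v_frame_handle;               /* assumed handle type                  */
#define IMF_STEERING_CMD  3               /* assumed handle of an immediate frame */

/* Immediate frame calls named in the text; the signatures are assumptions. */
extern void v_imf_tx(v_frame_handle f);       /* pack and send immediately         */
extern void v_imf_rx(v_frame_handle f);       /* process a pending received frame  */
extern int  v_imf_queued(v_frame_handle f);   /* nonzero while still queued        */

/* Hypothetical write call for a signal packed into the immediate frame. */
extern void write_steering_cmd(uint16_t value);

void send_steering_command(uint16_t value)
{
    write_steering_cmd(value);       /* publish a consistent signal value first      */
    v_imf_tx(IMF_STEERING_CMD);      /* bypasses the periodic v_output() path        */

    if (v_imf_queued(IMF_STEERING_CMD)) {
        /* frame has not yet left the controller; the application may retry
           later or record the condition */
    }
}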
32.2.5 Timing Model The Volcano timing model covers end-to-end timing (from button press to activation). A timing model is used to put into context the signal timing information needed to analyze a network configuration of signals and frames. This section defines the information that an application programmer must provide so that the end-to-end timing requirements can be guaranteed. A Volcano signal is transported over a network within a frame. Figure 32.2 identifies six time points between the generation and consumption of a signal value: 1. Notional generation (signal generated) — either by hardware (e.g., switch pressed) or software (e.g., time-out signaled). The user can define this point to best reflect his system. 2. First v_output() (or v_imf_tx() for an immediate frame) at which a new value is available. This is the first such call after the signal value is written by a write call. 3. The frame containing the signal is first entered for transmission (arbitration on a CAN bus).
4. Transmission of the frame completes successfully (i.e., the subscriber’s communication controller receives the frame from the network). 5. v_input() (or v_imf_rx() for an immediate frame) makes the signal available to the application. 6. Notional consumption — the user application consumes the data. The user can define this point to best reflect his system. The max_age of the signal is the maximum age, measured from notional generation, at which it is acceptable for notional consumption. The max_age is the overall timing requirement on a signal. T_PL (publish latency) is the time from notional generation to the first v_output() call when the signal value is available to Volcano (a write call has been made). It will depend on the properties of the publishing application. Typical values might be the frame_processing_period (if the signal is written fresh every period, but this is not synchronized with v_output()), the offset between the write call and v_output() (if the two are synchronized), or the sum of the frame_processing_period and the period of some lower-rate activity that generates the value. This value must be given by the application programmer. T_SL (subscribe latency) is the time from the first v_input that makes the new value available to the application to the time when the value is consumed. The consumption of a signal is a user-defined event that will depend on the properties of the subscribing function. As an example, it can be a lamp being lit or an actuator starting to move. This value must be given by the application programmer. The intervals T_BT, T_T, and T_AT are controlled by the Volcano 5 configuration and are dependent upon the nature of the frame in which the signal is transported. The value T_BT is the time before transmission (the time from the v_output call until the frame enters arbitration on the bus). T_BT is a per-frame value that depends on the type of frame carrying the signal (see later sections). This time is shared by all signals in the frame and is common to all subscribers to those signals. The value T_AT is the time after transmission (the time from when the frame has been successfully transmitted on the network until the next v_input call). T_AT is a per-frame value that may be different for each subscribing ECU. The value T_T is the time required to transmit the frame (including the arbitration time) on the network. 32.2.5.1 Jitter The application programmer at the supplier must also provide information about the jitter to the systems integrator. This information is as follows: The input_jitter and output_jitter refer to the variability in the time taken to complete the v_input() and v_output() calls, measured relative to the occurrence of the periodic event causing Volcano processing to be done (i.e., calls to v_input(), v_gateway(), and v_output() to be made). Figure 32.3 shows how the output_jitter is measured. In Figure 32.3, E marks the earliest completion time of the v_output() call and L marks the latest completion time, relative to the start of the cycle. The output_jitter is therefore L – E. The input_jitter is measured according to the same principles. If a single-thread system is used, without interrupts, the calculation of the input_jitter and output_jitter is straightforward: the earliest time is the best-case execution time of all the calls in the cycle (including the v_output() call), and the latest time is the worst-case execution time of all the calls.
The situation is more complex if interrupts can occur or the system consists of multiple tasks, since the latest time must take into account preemption from interrupts and other tasks.
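Reading Figure 32.2 end to end, the intervals must fit within the signal's overall requirement; in simplified form (the jitter terms discussed above are folded into the per-frame analysis and are ignored here):

T_{PL} + T_{BT} + T_T + T_{AT} + T_{SL} \le \mathit{max\_age}

For example, with purely illustrative values T_PL = 10 ms, T_BT = 12 ms, T_T = 0.3 ms, T_AT = 5 ms, and T_SL = 10 ms, the end-to-end latency is 37.3 ms, which would satisfy a max_age of 50 ms.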
32.2.6 Capture of Timing Constraints The declaration of a signal in a Volcano fixed configuration file provides syntax to capture the following timing-related information: • Whether a signal is state or state change — (info_type) • Whether a signal is sporadic or periodic — (generation_type)
• The latency • The min_interval • The max_interval • The max_age
FIGURE 32.3 Measurement of output jitter. (Within one frame processing period, E and L mark the earliest and latest completion times of the v_output call, measured relative to the periodic event that initiates the Volcano processing calls; the remainder of the period is occupied by other computation.)
The first two (together with whether the signal is published or subscribed to) provide signal properties that determine the kind of signal. A state signal carries a value that completely describes the signaled property (e.g., the current position of a switch). A subscriber to such a signal need only observe the signal value when the information is required for the subscriber’s purposes (e.g., signal values can be missed without affecting the usefulness of later values). A state change signal carries a value that must always be observed in order to be meaningful (e.g., distance traveled since last signal value). A subscriber must observe every signal value. A sporadic signal is one that is written by the application in response to some event (e.g., a button press). A periodic signal is one that is written by the application at regular intervals. The latency of a signal is the time from notional generation to being available to Volcano (for a published signal), or from being made available to the application by Volcano to notional consumption (for a subscribed signal). Note that immediate signals (those in immediate frames) include time taken to move frames to and from the network in these latencies. The min_interval has different interpretations for published and subscribed signals. For a published signal, it is the minimum time between any pair of write calls to the signal (this allows, for example, the calculation of the maximum rate at which the signal could cause a sporadic frame carrying it to be transmitted). For a subscribed signal, it is the minimum acceptable time between arrivals of the signal. This is optional: it is intended to be used if the processing associated with the signal is triggered by arrival of a new value, rather than periodic. In such a case, it provides a constraint that the signal should not be connected to a published signal with a faster rate. The max_interval has different interpretations for published and subscribed signals. For a published signal, the interesting timing information is already captured by min_interval and publish latency. For a subscribed signal, it is the maximum interval between notional consumptions of the signal (i.e., it can be used to determine that signal values are sampled quickly enough that none will be missed). The max_age of a signal is the maximum acceptable age of a signal at notional consumption, measured from notional generation. This value is meaningful for subscribed signals.
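As an illustrative calculation (the numbers are ours, not taken from the text): a published sporadic signal with min_interval = 20 ms, carried alone in a sporadic frame whose worst-case transmission time at 500 kbit/s is roughly C ≈ 0.27 ms (8 data bytes, standard identifier, worst-case bit stuffing), can load the bus by at most

U = \frac{C}{\mathit{min\_interval}} \approx \frac{0.27\ \text{ms}}{20\ \text{ms}} \approx 1.4\%

which is exactly the kind of bound the offline tools need in order to guarantee worst-case latencies.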
In addition to the signal timing properties described above, the Volcano fixed configuration file provides syntax to capture the following additional timing-related information: • Volcano processing period • Volcano jitter time The Volcano processing period defines the nominal intervals between successive v_input() calls on the ECU and between successive v_output() calls (i.e., the rates of the calls are the same, but v_input() and v_output() are not assumed to become due at the same instant). For example, if the Volcano processing period is 5 ms, then each v_output() call becomes due 5 ms after the previous one became due. The Volcano jitter defines the time by which the actual call may lag behind the time at which it became due. Note that becomes due refers to the start of the call, and jitter refers to the completion of the call.
32.3 Volcano Network Architect To manage increasing complexity in electrical architectures, a structured development approach is believed essential to ensure correctness by design. The Volcano Automotive Group has developed a network design tool, Volcano Network Architect (VNA), to support a development process, based on strict systems engineering principles. Gatewaying of signals between different networks is automatically handled by the VNA tool and the accompanying embedded software. The tool supports partitioning of responsibilities into different roles, such as system integrator and function owner. Third-party tools may be used for functional modeling. These models can be imported into VNA. VNA is the top-level tool in the Volcano Automotive Group’s tool chain for designing vehicle network systems. The tool chain supports important aspects of systems engineering such as: • Use of functional modeling tools • Partitioning of responsibilities • Abstracting away from hardware- and protocol-specific details providing a signal-based API for the application developer • Abstracting away from the network topology through automatic gatewaying between different networks • Automatic frame compilation to ensure that all declared requirements are fulfilled (if possible), that is, delivering correctness by design • Reconfiguration flexibility by supporting post-compile-time reconfiguration capability The VNA tool supports network design and makes management and maintenance of distributed network solutions more efficient. The tool supports capturing of requirements and then takes a user through all stages of network definition.
32.3.1 The Car OEM Tool Chain: One Example Increasing competition and complex electrical architectures demand enhanced processes. Function modeling has proved to be a suitable tool to capture the functional needs in a vehicle. Tools such as Rational Rose provide a good foundation to capture all the different functions, and other tools (Statemate, Simulink) model them in order to allocate objects and functionality in the vehicle. Networking is essential since the functionality is distributed among a number of ECUs in the vehicle. Substantial parts of the outcome from the function modeling are highly suitable for use as input to a network design tool such as VNA. The amount of information required to properly define the networks is vast. To support input of data, VNA provides an automated import from third-party tools through an Extensible Markup Language (XML)-based format.
It is the job of the signal database administrator/system integrator to ensure that all data entered into the system are valid and internally consistent. VNA supports this task through a built-in multilevel consistency checker that verifies all data. In this particular approach, the network is designed by the system integrator in close contact with the different function owners in order to capture all necessary signaling requirements — functional and nonfunctional (including timing). When the requirements are agreed upon and documented in VNA, the system integrator uses VNA to pack all signals into frames; this can be done manually or automatically. The algorithm used by VNA handles gatewaying by partitioning end-to-end timing requirements into requirements per network segment. All requirements are captured in the form of a Microsoft Word document called software requirement specification (SWRS) that is generated by VNA and sent to the different node owners as a draft copy to be signed off. When all SWRSs have been signed off, VNA automatically creates all necessary configuration files used in the vehicle, along with a variety of files for third-party analysis and measurement tools. The network-level (global) configuration files are used as input to the Volcano configuration tool and Volcano back-end tool in order to generate a set of downloadable binary configuration files for each node. The use of reconfigurable nodes makes the system very flexible since the Volcano concept separates application-dependent information and network-dependent information. A change in the network by the system integrator can easily be applied to a vehicle without having to recompile the application software in the nodes. The connection between function modeling and VNA provides good support for iterative design. It verifies network consistency and timing up front, to ensure a predictable and deterministic network.
32.3.2 VNA: Tool Overview 32.3.2.1 Global Objects The work flow in VNA ensures that all relevant information about the network is captured. Global objects are created first and then (re)used in several projects. The VNA user works with objects of types such as signals, nodes, interfaces, etc. These objects are used to build up the networks used in a car. Signals are defined by name and type and can have logical or physical encoding information attached. Interfaces detailing hardware requirements are defined, leading to the description of actual nodes on a network. For each node, receive and transmit signals are defined, and timing requirements are provided for the signals. This information is intended for global use, that is, across car variants, platforms, etc. 32.3.2.2 Project- or Configuration-Related Data When all global data have been collected, the network will be designed by connecting the interfaces in a desired configuration. VNA has strong project and variant handling. Different configurations can selectively use or adapt the global objects, for example, by removing a high-end feature from a low-end car model. This means that VNA can manage multiple configurations, designs, and releases, with version and variant handling. The release handling ensures that all components in a configuration are locked. It is, however, still possible to reuse the components in unchanged form. This makes it possible to go back to any released configuration at any point in time. 32.3.2.3 Database All data objects, both global and configuration specific, are stored in a common database (Figure 32.4). The VNA tool was designed to have one common multiuser database per car OEM. In order to secure the highest possible performance, all complex and time-consuming VNA operations are performed against a local RAM mirror of the database. A specially designed database interface ensures consistency in the local mirror. Operations that are not time critical, such as database management, operate directly on the database.
FIGURE 32.4 The database is a central part of the VNA system. In order to ensure highest possible performance, each instance of VNA accesses a local mirror of the database that is continuously synchronized with its parent. (The figure shows the database GUI, consistency checker, frame compiler, and generators producing, among other outputs, SWRS specification documents, FIBEX/XML exports, LIN description files (.ldf), and the fixed, target, and network Volcano configuration files.)
The built-in multiuser functionality allows multiple users to access all data stored in the database simultaneously. To ensure that a data object is not modified by more than one user, the object must be locked before any modification; while an object is locked for modification, read access is of course allowed for all users. 32.3.2.4 Version and Variant Handling The VNA database implements functionality for variant and version handling. Most of the global data objects, e.g., signals, functions, and nodes, may exist in different versions, but only one version of an object can be used in a specific project or configuration. The node objects can be seen as the main global objects, since hierarchically they include all other types of global objects. The node objects can exist in different variants, but only one object can be used from a variant folder in a particular project or configuration. 32.3.2.5 Consistency Checking Extensive functionality for consistency checking is built into the VNA tool. The consistency check can be manually activated when needed, but it also runs continuously to check user input and give immediate feedback on any suspected inconsistency. The consistency check ensures that the network design follows predefined rules and generates errors when appropriate. 32.3.2.6 Timing Analysis/Frame Compilation The Volcano concept is based on a foundation of guaranteed message latency and a signal-based publish–subscribe model. This provides abstraction by hiding the network and protocol details, allowing the developer to work in the application domain with signals, functions, and related timing information. Much effort has been spent on developing and refining the timing analysis in VNA. The timing analysis is built upon a scheduling model called DMA (deadline monotonic analysis) and calculates the worst-case latency for each frame among a defined set of frames sent on the bus. Parts of this functionality have been built into the consistency check routine as described above, but the real power of the VNA tool is found in the frame packer/frame compiler functionality.
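The deadline monotonic idea can be sketched in a few lines: frames with tighter deadlines receive numerically smaller, i.e., higher-priority, identifiers. The C code below only illustrates that ordering, not VNA's actual frame compiler; the structure and field names are assumptions.

#include <stdlib.h>

struct frame_req {
    const char *name;
    unsigned    deadline_ms;  /* tightest latency requirement among its signals        */
    unsigned    priority;     /* 0 = highest; would fill the identifier's priority bits */
};

static int by_deadline(const void *a, const void *b)
{
    unsigned da = ((const struct frame_req *)a)->deadline_ms;
    unsigned db = ((const struct frame_req *)b)->deadline_ms;
    return (da > db) - (da < db);
}

/* Deadline monotonic assignment: sort frames by deadline and number them,
 * so that the shortest deadline gets the highest priority on the bus. */
void assign_priorities(struct frame_req *frames, size_t n)
{
    qsort(frames, n, sizeof frames[0], by_deadline);
    for (size_t i = 0; i < n; i++)
        frames[i].priority = (unsigned)i;
}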
FIGURE 32.5 A CAN identifier on an extended CAN network. The network clause has defined the CAN identifiers to have 7 priority bits and 13 filter bits. The least significant bit of the value corresponds with the bit of the identifier transmitted last. Only legal CAN identifiers can be specified: identifiers with the seven most significant bits equal to 1 are illegal according to the CAN standard.
The frame packer/compiler attempts to create an optimal packing of signals into frames and then calculates proper identifiers for every frame, ensuring that all the timing requirements captured earlier in the process are fulfilled (if possible). This automatic packing of multiple signals into each frame makes more efficient use of the data bus, by amortizing some of the protocol overheads involved, thus lowering bus load. The combined effect of multiple signals per frame and perfect filtering results in a lower interrupt and CPU load, which means that the same performance can be obtained at lower cost. The frame packer can create the best solution if all nodes are reconfigurable. To handle carryover nodes that are not reconfigurable (ROM based), these nodes and their associated frames can be classed as fixed. Frame packing can also be performed manually if desired. Should changes to the design be required at a later time, the process allows rapid turnaround of design changes, rerunning of the frame compiler, and regeneration of the configuration files. The VNA tool can be used to design network solutions that are later realized by embedded software from any provider. However, the VNA tool is designed with the Volcano embedded software (VTP) in mind, which implements the expected behavior in the different nodes. To get the full benefits of the tool chain, VNA and VTP should be used together. 32.3.2.7 Volcano Filtering Algorithm A crucial aspect of network configuration is how to choose identifiers so as to minimize the CPU load caused by interrupts for frames that are of no interest to a particular node: most CAN controllers have only limited filtering capabilities. The Volcano filtering algorithm is designed to achieve this. An identifier is split into two parts: priority bits and filter bits. All frames on a network must have unique priority bits; for real-time performance, the priority setting of a frame should reflect the relative urgency of the frame. The filter bits are used to determine if a CAN controller should accept or reject a frame. Each ECU that needs to receive frames by interrupts is assigned a single filter bit; the hardware filtering in the CAN controller is set to “must match 1” for the filter bit and “don’t care” for all other bits. The filter bits of a frame are set for each ECU that needs to see the frame. So a frame that is broadcast to all ECUs on the network is assigned filter bits all set to 1. For a frame sent to a single ECU on the network, just one filter bit is set. Figure 32.5 illustrates this; the frame shown is sent to four ECUs. If an ECU takes an interrupt for just the frames that it needs, then the filtering is said to be perfect. In some systems there may be more ECUs needing to receive frames by interrupt than there are filter bits in the network; in this case, some ECUs will need to share a bit. If this happens, then Volcano will
filter the frames in software, using the priority bits to uniquely identify the frame and discarding unwanted frames. The priority bits are the most significant bits. They indicate priority and uniquely identify a frame. The number of priority bits must be large enough to uniquely identify a frame in a given network configuration. The priority bits for a given frame are set by the relative urgency (or deadline) of the frame. This is derived from how urgently each subscriber of a signal in the frame needs the signal (as described earlier). In most systems, 5 to 10 priority bits are sufficient. The filter bits are the remaining least significant bits and are used to indicate the destination ECUs for a given frame. They are treated as a target mask: each ECU (or group of ECUs) is assigned a single filter bit. The filtering for a CAN controller in the ECU is set up to accept only frames where the corresponding filter bit in the identifier is set. This can give perfect filtering: an interrupt is raised if and only if the frame is needed by the ECU. Perfect filtering can dramatically reduce the CPU load compared to filtering in software. Indeed, perfect filtering is essential if the system integrator needs to connect ECUs with slow 8-bit CPUs to high-speed CAN networks (if filtering were implemented in software, the CPU would spend most of its available processing time handling interrupts and discarding unwanted frames). The filtering scheme also allows for broadcast of a frame to an arbitrary set of ECUs. This can reduce the traffic on the bus since frames do not need to be transmitted several times to different destinations (Figure 32.6).
FIGURE 32.6 VNA screen.
Because the system integrator is able to define the configuration data and because those data define the complete network behavior of an ECU, the in-vehicle networks are under the control of the system integrator.
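The identifier construction just described can be illustrated with the small C sketch below, using the 7 priority bits and 13 filter bits of the Figure 32.5 example. The bit-layout details (placement of unused bits, helper names) are assumptions made for illustration, not the exact Volcano encoding.

#include <stdint.h>

#define ID_BITS        29u  /* extended CAN identifier                            */
#define PRIORITY_BITS   7u  /* most significant bits: unique, reflect urgency     */
#define FILTER_BITS    13u  /* least significant bits: one per receiving ECU      */

/* Compose an identifier from a priority value and a destination filter mask.
 * Any bits between the two fields are left at zero (unused).  Note that a
 * priority field of all ones would make the seven most significant identifier
 * bits all 1, which the CAN standard forbids. */
static uint32_t make_id(uint32_t priority, uint32_t filter_mask)
{
    return ((priority & ((1u << PRIORITY_BITS) - 1u)) << (ID_BITS - PRIORITY_BITS))
         |  (filter_mask & ((1u << FILTER_BITS) - 1u));
}

/* Hardware acceptance filtering for one ECU: "must match 1" on its own filter
 * bit, "don't care" everywhere else; perfect filtering means an interrupt is
 * raised only when this returns nonzero. */
static int ecu_accepts(uint32_t id, unsigned filter_bit /* 0..FILTER_BITS-1 */)
{
    return (id & (1u << filter_bit)) != 0u;
}

/* Example: priority 12, broadcast to the ECUs holding filter bits 0, 3, and 5: */
/*   uint32_t id = make_id(12u, (1u << 0) | (1u << 3) | (1u << 5));             */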
32.3.2.8 Multiprotocol Support The existing version of VNA supports the complementary, contemporary network protocols of CAN and LIN. The next version will also have support for the FlexRay protocol. A prototype version of VNA with partial MOST support is currently under construction. As network technology continues to advance into other protocols, VNA will also move to support these advances. 32.3.2.9 Gatewaying A network normally consists of multiple network segments using different protocols. Signals may be transferred from one segment to another through a gateway node. As implemented throughout the whole tool chain of the Volcano Automotive Group, gatewaying of data even across multiple protocols is automatically configured in VNA. In this way, VNA allows any node to subscribe to any signal generated on any network without needing to know how this signal is gatewayed from the publishing node. Handling of timing requirements over one or more gateways is also handled by VNA. The Volcano solution requires no special gatewaying hardware and therefore provides the most cost-efficient solution to signal gatewaying. 32.3.2.10 Data Export and Import The VNA tool enables the OEMs to achieve close integration between VNA and functional modeling tools and to share data between different OEMs and subcontractors, e.g., node developers. Support of emerging standards such as FIBEX and XML will further simplify information sharing and become a basis for configuration of third-party communication layers.
32.4 Volcano Software in an ECU The Volcano tool chain includes networking software running in each ECU in the system. This software uses the configuration data to control the transmission and reception of frames on one or more buses and to present signals to the application programmer. One view of the Volcano network software is as a communications engine under the control of the system integrator. The view of the application programmer is different: the software is a black box into which published signals are placed, and out of which subscribed signals can be retrieved. The main implementation goals for Volcano target software are as follows: • Predictable real-time behavior — No data loss under any circumstances • Efficiency — Low RAM usage, fast execution time, small code size • Portability — Low cost of moving to a new platform
32.4.1 Volcano Configuration Building a configuration is a key part of the Volcano concept. As already mentioned, a configuration is based on details such as how signals are mapped into frames, how identifiers are allocated, and what processing intervals are used. For each ECU, there are two authorities acting in the configuration process: the system integrator and the ECU supplier. The system integrator provides the Volcano configuration for the ECU regarding the network behavior at the system level, and the supplier provides the Volcano configuration data for the ECU in terms of the internal behavior.
32.4.1.1.1 Fixed Information The fixed information is the most important in achieving a working system. It consists of a complete description of the dependencies between the ECU and the network. This includes a description of the signals the ECU needs from the network, how often Volcano calls will be executed, and so on. The information also includes a description of the CAN controller(s) and possible limitations regarding reception and transmission boundaries and supported frame modes. The fixed information forms a “contract” between the supplier and the system integrator: the information should not be changed without both parties being aware of the changes. The fixed information file is referred to as the FIX file. 32.4.1.1.2 Private Information The private file contains additional information for Volcano that does not affect the network: time-out values associated with signals and what flags are used by the application. The private information file is referred to as the PRI file. 32.4.1.1.3 Network Information The network information specifies the network configuration of the ECU. The system integrator must define the number of frames sent from and received by the ECU, the frame identifier and length, and details of how the signals in the agreed-upon information are mapped into these frames. Here, the vehicle manufacturer also defines the different frame modes used in the network. The network information file is referred to as the NET file. 32.4.1.1.4 Target Information The target information contains information about the resources that the supplier has allocated to Volcano in the ECU. It describes the ECU’s hardware (e.g., used CAN controllers and where those are mapped in memory). The target information file is referred to as the TGT file.
32.4.2 Work Flow The Volcano system identifies two major roles in the development of a network of ECUs: the application designer (which may include the designer of the ECU system or the application programmer) and the system integrator. The application designer is typically located at the organization developing the ECU hardware and application software. The system integrator is typically located at the vehicle manufacturer. The interface between the application designer and the system integrator is carefully controlled, and the information owned by each side is strictly defined. The Volcano tool chain implementation clearly reflects this partitioning of roles. The Volcano system includes a number of tools to help the system integrator in defining a network configuration. The Network Architect is a high-level design tool, with a database containing all the publish–subscribe information for each ECU available, as described in the previous sections. After mapping the signaling needs onto a particular network architecture, thus defining the connections between the published and subscribed signals, an automatic frame compiler will be run. The frame compiler tool uses the requirements captured earlier to build a configuration that meets those requirements. There are many possibilities to optimize the bus behavior. The frame compiler includes the CAN bus timing analysis and LIN schedule table generation and will not generate a configuration that violates the timing requirements placed on the system. The frame compiler also uses the analysis to answer “What if?”-type questions and guide the user in building a valid and optimized network configuration. The output of the frame compiler is used to build configuration data specific to each ECU. This is used by the Volcano target software in the ECU to properly configure and use the hardware resources. The Volcano configuration data generator tool set (V5CFG/V5BND) is used to translate this ASCII text information to executable binary code in the following way: • When the supplier executes the tool, it reads the FIX, PRI, and TGT files to generate compile-time data files. These data files are compiled and linked with the application program and with the Volcano library supplied for the specific ECU system.
FIGURE 32.7 Volcano target code configuration process. (On the ECU supplier side, the V5CFG configuration tool reads the fixed, private, and target files, and V5BND generates compile-time data that is compiled and linked with the application program and the Volcano 5 target library. On the vehicle manufacturer side, V5CFG reads the fixed, network, and target files, and V5BND generates the binary data downloaded into the ECU's Volcano 5 NVRAM configuration memory.)
• When the vehicle manufacturer executes the tool, it reads the FIX, NET, and TGT files to generate the binary data that are to be located in the ECU’s Volcano configuration memory (known as the Volcano NVRAM). An ECU is then configured (or reconfigured) by downloading the binary data to the ECU’s memory. It is vital to realize that changes to either the FIX or TGT file cannot be made without coordination between the system integrator and the ECU supplier. The vehicle manufacturer can, however, change the NET file without informing the ECU supplier. In the same way, the ECU supplier can change the PRI file without informing the system integrator. Figure 32.7 shows how the Volcano target code for an ECU is configured by the supplier and the system integrator. The Volcano concept and related products have been successfully used in production since 1996. Present car OEMs using the entire tool chain are Aston Martin, Jaguar, Land Rover, MG Rover, Volvo Cars, and Volvo Bus Corporation.
Acknowledgments I acknowledge the contributions of my colleagues at Volcano Automotive Group — in particular, István Horváth, Niklas Amberntsson, and Mats Ramnefors for their contributions to this chapter.
Reference [1] K. Tindell and A. Burns, Guaranteeing message latencies on Controller Area Network (CAN), in Proceedings of the 1st International CAN Conference, 1994, pp. 2–11.
More Information
http://www.VolcanoAutomotive.com.
K. Tindell, H. Hansson, and A.J. Wellings, Analysing real-time communications: Controller Area Network (CAN), in Proceedings of the 15th IEEE Real-Time Systems Symposium, San Juan, Puerto Rico, 1994, pp. 259–265.
K. Tindell, A. Rajnák, and L. Casparsson, CAN Communications Concept with Guaranteed Message Latencies, SAE Paper, 1998.
L. Casparsson, K. Tindell, A. Rajnák, and P. Malmberg, Volcano: A Revolution in On-Board Communication, Volvo Technology Report, 1998.
W. Specks and A. Rajnák, The scaleable network architecture of the Volvo S80, in 8th International Conference on Electronic Systems for Vehicles, Baden-Baden, Germany, October 1998, pp. 597–641.
33 The Use of Network Hierarchies in Building Telemetry and Control Applications

Edward Koch
Akua Control

33.1 Introduction: Background • Distributed Building Control Applications • General Requirements
33.2 Hierarchical Communications Infrastructure: DAN Communications • LAN Communications • WAN Communications • DAN Routers and Gateways
33.3 Distributed Building Automation Architectures: System-Wide Logic Distribution • Application-Level Communications Interfaces • The Role of Distributed Objects
33.4 Conclusions
References
33.1 Introduction Within a building the range of communications technologies that are available is vast. There are literally thousands of different physical media and protocol combinations that can be used. Rather than enumerate the countless possibilities, this chapter will focus on the common factors that encompass the development of a hierarchical networking infrastructure that can be used for the monitoring and control of a building’s systems. The described network hierarchy is an integration of the control networks and IT networks inside a building. By integrating, the following benefits can be achieved: • Lower costs by reusing existing IT network infrastructures • Increased accessibility of information on the control networks • Easier integration of IT applications with control applications • More easily expanded control network communications
Building monitoring and control applications fall within the general area of building automation, which entails the computerized monitoring and control of a building’s systems such as lighting and HVAC. Some systems are simply monitored, both from within the facility and remotely, while others are controlled in some automated fashion. Historically, large commercial buildings were the only facilities to benefit from automation, but the lowering cost and higher availability of the enabling technologies have spawned an increased rate of automation in both small commercial and residential buildings. This chapter will therefore present a
communications and networking model that is applicable to a wide range of buildings, from residential to large commercial. The networking model presented consists of heterogeneous networks that are integrated to form a hierarchy for the flow of data and control information.
33.1.1 Background Historically, buildings were automated by wiring sensors and actuators back to a central controller or computer that would provide the monitoring and control functionality. With the advent of cheaper and more available networking technology, the trend over time has been to use communications and embed more intelligence in the building’s systems. Initially, the large building control companies provided end-to-end building automation solutions and used proprietary communications technology and protocols, but over time, there have emerged more standardized communications infrastructures within buildings that have allowed system integrators to use off-the-shelf components from a number of different manufacturers. While the lower cost of embedded computers and communications technology has allowed more of a building’s systems to be networked together, the emergence of packet-based networking standards has had the largest impact on how computers in general communicate. More specifically, the following set of standards has had the largest influence over the years: • IEEE 802 suite of standards [1] • Emergence of Internet Protocol (IP) [2] By studying the IEEE 802 suite of standards, one can gain an appreciation for how computer network communications have evolved over the years from the early days of Ethernet (802.3) to the more recent wireless (802.11x). Likewise, IP has become pervasive and is now the predominant protocol used for computer communications of all kinds. As such, it will play an important role in the network hierarchy described in this chapter.
33.1.2 Distributed Building Control Applications

The systems and applications within a building that lend themselves to automation include the following:
• Lighting
• Energy consumption
• Heating, ventilation, and air conditioning (HVAC)
• Access control
• Air quality monitoring
• Appliance and equipment monitoring and control
• Safety
• Security
• Health monitoring
Because these systems are typically spread throughout the facility, the task of monitoring and controlling them is distributed by nature. Often, these systems are monitored and controlled independently, while sometimes they are interrelated as part of a higher-level control strategy. The best way to model this situation is to view the entire building as a single control system in which the various subsystems and applications are integrated together using some sort of communications network infrastructure. Note that although there are significant differences between residential and commercial buildings in the scale of the facility, and thus the number and complexity of the systems being automated, they both share all the applications listed above in one form or another.
33.1.3 General Requirements

The following are the general requirements of a communications infrastructure to support building automation.
33.1.3.1 Throughput

Compared to general-purpose computer networks, the volume of information exchanged between a building's subsystems is relatively small. On the other hand, the number of such subsystems can be quite large. In short, the throughput requirement of an individual subsystem is small, while the throughput requirement of the entire network can be large.

33.1.3.2 Latency

Some applications, such as lighting and access control, require that when a human initiates an action (e.g., turns on a switch), there is a resulting action within a reasonable period. These requirements are typically in the hundreds of milliseconds. Unlike many industrial automation applications, there are usually no high-speed closed-loop control algorithms that rely on the exchange of information over the network infrastructure. The requirements on both latency and predictability of the network traffic are therefore low compared to some industrial automation applications.

33.1.3.3 Reliability

High reliability is obviously desirable in any communications network, but most building automation applications are not life threatening, and if a subsystem goes down, it usually does not have the same financial impact as a failed manufacturing line. Some exceptions to this general rule are in the areas of safety and security, where the reliability requirements can be quite high.

33.1.3.4 Cost

Traditionally, the subsystems being automated are relatively low cost. Any cost in adding intelligence and enabling communications must be kept as low as possible. This is probably the most significant reason that the subsystems being automated generally do not use the same networking technology as the general-purpose computing devices in the building. For example, enabling embedded devices with IP communications can be relatively expensive, although this is changing over time.

33.1.3.5 Security

In general, the information being exchanged in a building automation system is not of a sensitive nature. On the other hand, it is highly desirable that systems not be controlled or accessed by unauthorized users. Most security is therefore targeted toward authentication.

33.1.3.6 Operations

The departments and personnel that manage the building automation systems and networks are usually distinct from the IT department that manages the general computing networks. This can often lead to operational issues when trying to integrate the two networks.
33.2 Hierarchical Communications Infrastructure

Typically, there exist a number of different communications networks within a building, each of which has its own set of requirements and primary applications that it supports. Some of the networks are designed and installed specifically to support building automation, while others exist primarily to support the networking of computers for data processing and the sharing of resources such as printers and disk storage. In most cases, the networking infrastructure can support applications beyond its intended targets. Therefore, for the sake of this analysis, it is useful to roughly classify the types of communications within a building into the following two categories:
• Information technology (IT) communications — Supports data processing and general computing applications
• Control communications — Supports monitoring and control of sensors and actuators in a building
FIGURE 33.1 Hierarchical communications network.
Note that this classification is based upon the intended application of the communications and not on the specific content or nature of the messages themselves. In many cases, the same information could be communicated in both categories. Traditionally, networks are classified by their geographic extent, such as LANs and WANs. It is useful to extend these classifications to segment the networking infrastructure within the building in the following way:
• Wide area network (WAN) — IT network whose scope supports communications outside and remote to the facility
• Local area network (LAN) — IT network whose scope is within the facility where the control systems exist
• Device area network (DAN) — Communications network used by devices within the facility
As shown in Figure 33.1, these three segments can be integrated to form a hierarchical communications network for building telemetry and control applications. Special attention will be given to the DAN segment, but each segment will be analyzed in the role that it may play in supporting building automation.
33.2.1 DAN Communications

All building automation systems consist of a collection of sensors and actuators. In general, the DAN is used to collect information from sensors and send commands to actuators. In some cases, the technology used to implement a DAN is IP based and uses the same technology as that used for IT communications, although more typically it is different because DANs have different constraints in terms of cost and performance.
As in other engineering domains, there is always a trade-off among cost, complexity, functionality, etc. Some DANs are very simple, with limited functionality, while others are general purpose and complex in nature, requiring sophisticated configuration tools to commission and manage the devices on them. There are numerous technologies and protocols available for the implementation of DANs, and it is common for more than one DAN technology to be used in a single building. As shown in Figure 33.1, DAN technologies can be broken down into one of four general categories based upon the physical nature of the DAN. It should be noted that this classification is somewhat arbitrary and is done merely as a convenience to enable a broad introduction to the types of DANs in use today. It should also be noted that there exist DAN protocols, such as EIA-709 [3], that span more than one of these categories.

33.2.1.1 Special-Purpose Networks

There exist special-purpose networks that are targeted at specific applications. Special-purpose networks tend to provide the best performance/price ratio for the applications they are designed for. An example of one such network is the Digital Addressable Lighting Interface (DALI) [4]. DALI was developed specifically for lighting control and has begun to gain acceptance in the lighting industry in recent years. Its main benefits are cost and industry acceptance; its applicability outside lighting control is limited. Another example of a specialized network is the Integrated Building Environmental Communications System (IBECS) being developed at Lawrence Berkeley National Laboratory [5]. It is primarily focused on lighting and environmental controls and has a wider scope of applicability than the more narrowly focused DALI networks.

33.2.1.2 General Wired Networks

The most common type of network used in building automation is the so-called wired network. Wired networks entail running a physical cable between all the devices on the DAN. The topology can be star, daisy chain, or multidrop. The electrical signaling can be single-ended or differential and may be based upon a multitude of standards, such as RS-485. Speeds range from a few kilobits to many megabits per second. The most common cabling scheme in use is twisted-pair cable. On top of that, there are innumerable protocols in use today. Regardless of the vast combination of technologies and protocols outlined above, all wired DANs share the following set of features and issues:
• Wired networks are generally the most reliable and have the best performance.
• Wired networks have the lowest cost per device to enable the device to communicate on the DAN, but the highest cost to install and commission because of all the cabling.
• Special care must usually be taken when running the cables to avoid signal degradation from line reflections due to unterminated cables or too many stubs.
• Physically reconfiguring the location of devices on the network is costly.

33.2.1.3 Power-Line Communications Networks

Power-line communications (PLC) uses existing power lines for communication. This means that no cables need to be run, which overcomes many of the issues with wired networks. On the other hand, PLC tends to be more expensive to implement at the device level and is much less reliable than wired networks. In many commercial applications, PLC is not reliable enough to be used. In general, there are two classes of PLC, which can be roughly classified as high speed and low speed.
High-speed PLC has a throughput in the megabit range and is targeted at the networking of PCs in residences. The standards surrounding high-speed PLC are driven by the HomePlug alliance [6]. Low-speed PLC has a much lower throughput, in the range of a few baud to many tens of kilobaud. Low-speed PLC is considered a DAN technology, while high-speed PLC is considered a LAN technology. Low-speed PLC suffers from a lack of accepted industry standards, and there are a number of companies that offer PLC technology. Examples of PLC technology include X10 and EIA-709.2 [7]. The relatively high cost of PLC at the device level and its unreliability make it inappropriate for many applications.
33.2.1.4 Radio Frequency (RF) Networks

RF technologies hold the promise of lower installation costs and may be the best choice for retrofit applications. Certain types of sensor devices, such as occupancy and light-level sensors, definitely benefit from the wireless aspect of RF technologies, since it is desirable to be able to easily move or add these types of sensors. There have been a number of RF technologies developed of late that are potentially applicable. Some of the standards being developed include IEEE 802.15.4 [8], Zigbee [9], Bluetooth [10], and potentially Ultra Wideband (UWB; IEEE 802.15.3 [11]). There are numerous companies developing RF technologies that are suitable for DANs. The developers of these RF technologies have focused on the following requirements:
• Low power consumption (multiyear battery life)
• Low cost (less than $5 per node)
• Low data rates (10 to 100 kbaud)
• Short range (10 to 100 m)
• Ad hoc mesh networking
To date, the most promising standards revolve around IEEE 802.15.4 and Zigbee, with a large number of companies announcing support for these two complementary standards. Current UWB efforts are primarily targeted toward high-speed short-range consumer applications, but UWB may be the best long-term RF solution since it also supports the ability to geographically locate the devices.
33.2.2 LAN Communications

The LAN segment of the network typically exists to support IT communications. In relation to building automation, it performs the following functions:
• Supports SCADA (supervisory control and data acquisition) and other system-wide building automation applications
• Provides a means to connect DAN subnets together
The most prevalent LAN technology in use today is twisted-pair Ethernet, although WiFi is gaining acceptance. The SCADA applications that provide building-wide control often run on general-purpose workstations connected to the LAN. As such, these applications must communicate in some way with the building control actuators and sensors. As shown in Figure 33.1, this is done via gateways or routers that link the DAN networks with the LAN networks. Integrating the communications technologies of the LAN and the DAN is only part of what is required. In addition, there must be some sort of semantic integration between the applications on the LAN and the devices on the DAN. This typically means that the data models used by the LAN applications must be consistent with the data models used by the DAN devices. Sometimes this can be accomplished directly between a LAN application and a DAN device by doing protocol translation, but often it is necessary to insert an intermediary agent, such as a gateway, that can perform data translation and mapping between the LAN application and the DAN device's interface.
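To make the idea of data translation and mapping more concrete, the following sketch shows the kind of conversion a gateway might perform. The raw encoding (a signed 16-bit reading in tenths of a degree Celsius), the point name, and the Point record are illustrative assumptions rather than part of any particular DAN protocol or LAN data model; the example is written in Java purely for illustration.

public class TemperatureGateway {

    // LAN-side data model assumed for this sketch: a named point carrying an
    // engineering-unit value. A real system would use a standardized model
    // (e.g., BACnet objects or IEEE 1451 transducer models) instead.
    public record Point(String name, double value, String unit) {}

    // Translate a raw DAN reading (assumed to be in 0.1 degree C units) into
    // the representation expected by LAN applications.
    public static Point toLanModel(String pointName, short rawTenthsCelsius) {
        double celsius = rawTenthsCelsius / 10.0;
        return new Point(pointName, celsius, "degC");
    }

    public static void main(String[] args) {
        // A raw value of 215 corresponds to 21.5 degrees Celsius.
        System.out.println(toLanModel("ConferenceRoom/Temperature", (short) 215));
    }
}

In practice, this kind of mapping logic is configured per device type rather than hard coded, but the essential operation — renaming, scaling, and attaching units — is the same.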
33.2.3 WAN Communications

The WAN segment typically exists to support IT communications, although in some cases there may be a dedicated control or telemetry WAN. In relation to building automation, it performs the following functions:
• Provides remote access to the building control systems
• Links together the control systems of multiple buildings, i.e., a campus
WANs span large geographic areas and are typically an amalgamation of different technologies and protocols. There is a wide range of technologies that can be used to interface a building to the WAN, including T1 [12], DSL [13], cable [14], or even dial-up over plain old telephone service (POTS). The most prevalent WAN is the Internet, and its use in building automation systems is growing rapidly. The most important characteristic of a WAN connection is whether the building's link to it is always on or sometimes on. Always-on connections allow communication on demand without any delay to establish a connection, while a sometimes-on connection requires significant overhead and time to establish a connection. Furthermore, sometimes-on connections are typically not left in a connected state; once the network exchange is completed, the connection is dropped. An example of a sometimes-on connection is dial-up over POTS, which is still the most prevalent WAN connection in use today, especially in residential applications. If remote access is required, then the nature of the WAN connection can have a significant impact on the overall design of the building automation system. As shown in Figure 33.1, some sort of gateway or router provides the interface between the WAN and the LAN and DAN networks inside the building. In many cases, a typical IT router will be used. In other cases, the router or gateway may have special functionality to support the building automation applications. This will be discussed in more detail later.
33.2.4 DAN Routers and Gateways

Routers and gateways are the main integration components between the DAN and the LAN and WAN segments of a hierarchical network. Routers and gateways perform the following functions:
• Facilitate DAN communications between devices on different network segments
• Facilitate application-level communications
• Provide local control functions with devices on the DAN
For the purposes of this discussion, routers are devices that transport DAN packets between network segments without performing significant protocol translation, while gateways may perform a variety of protocol translations as well as implement various local control functions on the DAN. Below is a more detailed discussion of the various functions performed by routers and gateways.

33.2.4.1 Facilitate DAN Communications

Routers move native DAN packets over LANs without doing any significant protocol translation or data mapping. An example of this type of routing can be found in EIA-852 [15], a standard that allows the tunneling of DAN-type packets over IP networks. Another example is BACnet [16], an emerging building automation protocol that has a similar standard for tunneling BACnet packets over IP networks. It should be noted that this is different from BACnet/IP, which uses IP as the BACnet transport protocol. Tunneling routers are depicted in Figure 33.2A, which shows two control subnets (A and B) in the DAN segment. Each of the control subnets uses the same communications protocol. The role of the tunneling routers is to take DAN packets from a control subnet (subnet A), encapsulate them into IP packets, and forward them to other tunneling routers on the IP LAN, where they are subsequently extracted from the IP packets and transmitted on the other DAN subnet (subnet B). The IP-based protocol used to encapsulate the DAN packets is known as the tunneling protocol. As already stated, EIA-852 is an example of a tunneling protocol for DAN-type packets. Depending upon the requirements of the DAN protocol, the tunneling protocol may be very simple or quite complex. Complexities usually arise when the DAN protocol has timing and transaction space constraints that are difficult to maintain in IP networks. The scenario discussed above assumes that control subnets A and B use the same DAN protocol, but it is also possible to transmit DAN packets between two subnets that do not use the same DAN protocol. In this case, the gateways involved perform both protocol translation and packet tunneling between the DAN subnets. If the differences between the DAN protocols are significant, it may not be possible to perform a simple protocol translation.
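The following sketch illustrates the basic encapsulation step performed by a tunneling router: a native DAN frame is wrapped in a UDP/IP packet and sent to a peer router, which would unwrap it and retransmit it on its own DAN subnet. The one-byte tunnel header, the port number, and the peer address are assumptions made for the example; real tunneling protocols such as EIA-852 define their own framing, session management, and timing rules.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class TunnelingRouter {

    private static final int TUNNEL_PORT = 50000;     // arbitrary port chosen for the example
    private static final byte DATA_PACKET = 0x01;     // assumed frame-type marker

    // Encapsulate a raw DAN frame in a UDP datagram and send it to a peer router.
    public static void forward(DatagramSocket socket, InetAddress peer, byte[] danFrame)
            throws java.io.IOException {
        byte[] payload = new byte[danFrame.length + 1];
        payload[0] = DATA_PACKET;                      // minimal tunnel header
        System.arraycopy(danFrame, 0, payload, 1, danFrame.length);
        socket.send(new DatagramPacket(payload, payload.length, peer, TUNNEL_PORT));
    }

    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] danFrame = {0x10, 0x2A, 0x00, 0x7F};   // stand-in for a native DAN frame
            forward(socket, InetAddress.getByName("192.0.2.10"), danFrame);
        }
    }
}

The reverse path (receiving the datagram, stripping the header, and retransmitting the frame on the local control subnet) mirrors this code; the difficult parts in practice are the duplicate suppression, ordering, and timing guarantees mentioned above, which this sketch deliberately omits.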
FIGURE 33.2 (A) Tunneling routers connecting DAN subnets over an IP network. (B) Application-level gateway.
Tunneling DAN packets in this fashion is an effective way to utilize the LAN (or WAN) as a backbone for expanding the DAN communications throughout the facility, but it can also be used to communicate with workstation-based applications on the LAN/WAN. For this to be possible, the LAN/WAN application must speak both the tunneling protocol and the specific DAN protocol of each subnet it wants to communicate with. The need for the LAN/WAN application to speak so many different DAN protocols makes it more difficult to interface with a variety of DAN subnets. In addition, it must also adopt the data models and interfaces of each device on each DAN segment.

33.2.4.2 Facilitate Application-Level Communications

An alternative to using tunneling routers is to use a gateway that communicates with the LAN/WAN applications using a single standardized protocol and data model and that translates the various DAN protocols and data models to those used by the LAN/WAN applications. This is depicted in Figure 33.2B. This approach has the advantage of defining a single protocol and data model for LAN/WAN applications, thus reducing the number of standards and protocols that these applications must adopt. In addition, if a suitable LAN/WAN protocol and data model can be chosen, there may be a better chance for industry adoption and interoperability. Standards such as BACnet, the Open Services Gateway Initiative (OSGi) [17], or IEEE 1451 [18] can play a significant role in this process. In many cases, it is desirable to leverage an existing IT middleware infrastructure to accomplish this type of integration. Middleware infrastructures are discussed in more detail in Section 33.3.

33.2.4.3 Local Logic and Control

In addition to facilitating communications, the gateway can also perform local logic functions. If so enabled, it can be programmed to interface with the devices on the DAN to read values and issue commands. Thus, the system-wide logic that might typically be executed by centrally located LAN/WAN applications can be distributed to a multitude of gateways. This may be done for a variety of reasons, including:
• Reduce network traffic on the LAN/WAN
• Reduce latencies in control loops that utilize the devices on the DAN
• Enable functions that are not possible in WAN applications because of sometimes-on connections
• Eliminate single points of failure in the LAN/WAN applications
• Provide for redundancy in the control algorithms
The following classes of functionality may be implemented in the gateway:
• Log information from the DAN
• Perform control loops by reading sensor values and issuing actuator commands
• Monitor values on the DAN and issue alarms for exceptional conditions
• Perform configuration and commissioning functions for devices on the DAN
Typically, these functions are programmed into the gateway using workstation-based tools or applications on the LAN/WAN. There are various levels of gateway programmability, as described below:
1. The first level is a fixed set of functionality built into the gateway that has been parameterized, such that the gateway is programmed simply by using an application programming interface (API) to set the parameters from the LAN application. This is the simplest method if the applications and functionality can be defined in advance, but it suffers from being somewhat inflexible.
2. The second level is to design the gateway with the ability to run interpreted programs that can be loaded into it from the LAN applications. This provides more flexibility, but can be more difficult to use. The programming language can be a simple scripting language such as Tcl or Basic, or something as complete as Java. Standards such as OSGi exploit this model to provide a standard framework for programming the gateway in this fashion.
3. The final level is to allow native code to be developed and loaded into the gateway. This has the maximum flexibility, but is by far the most difficult to execute and the most prone to introducing bugs that require the gateway to be replaced. This level of programming is typically not used by LAN/WAN applications, except in cases where it is necessary to upgrade the core software of the gateway.
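As an illustration of the second level of programmability, the sketch below shows a small control function packaged as an OSGi bundle that a gateway framework could load, start, and stop at run time. Only the BundleActivator life cycle interface comes from the OSGi specification; the DanAccess service interface, the point names, and the five-second poll interval are hypothetical assumptions for this example.

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;

public class LightingLogicBundle implements BundleActivator {

    // Hypothetical gateway service giving access to DAN points (assumption).
    public interface DanAccess {
        int read(String pointName);
        void write(String pointName, int value);
    }

    private Thread worker;

    @Override
    public void start(BundleContext context) {
        ServiceReference<DanAccess> ref = context.getServiceReference(DanAccess.class);
        DanAccess dan = context.getService(ref);
        worker = new Thread(() -> {
            // Simple local rule: switch the lights off when no occupancy is reported.
            while (!Thread.currentThread().isInterrupted()) {
                if (dan.read("Floor3/Occupancy") == 0) {
                    dan.write("Floor3/Lights", 0);
                }
                try {
                    Thread.sleep(5000);            // evaluate the rule every five seconds
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();
    }

    @Override
    public void stop(BundleContext context) {
        if (worker != null) {
            worker.interrupt();                    // stop the control loop when the bundle is unloaded
        }
    }
}

Because the bundle is installed and removed through the framework, the control logic can be changed without replacing the gateway firmware, which is exactly the flexibility the second level of programmability is meant to provide.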
33.3 Distributed Building Automation Architectures

One can view the entire building as a single system consisting of sensors, actuators, and controllers. For the purposes of automation, the following functions are performed at various places within the system:
• Control processes that monitor sensors and activate actuators
• Alarm processes that monitor sensors and provide notifications for exceptional conditions
• Monitoring processes with human machine interfaces (HMIs) to allow operators to view and manually control various subsystems
• Logging processes that record system information for later analysis
There is no definitive way to distribute this functionality, but with the advent of intelligent sensors, actuators, and pervasive networking technology, there is increased flexibility to consider all the factors discussed in Section 33.1.2 to engineer the right solution for each building under consideration. This section explores some of the issues involved in distributing the above-mentioned functionality across the building control system.
33.3.1 System-Wide Logic Distribution

Engineers who put together building control systems are referred to as system integrators. This is indicative of the fact that they take an amalgamation of existing technologies and off-the-shelf products and integrate them to engineer the building control system. Using these components, aspects of each of the functions described above can be instantiated in the following places:
• Within intelligent devices on the DAN
• Within intelligent DAN gateways, as described in Section 33.2.4
• Within LAN-based devices and applications
• Within remote WAN-based devices and applications
Without a specific set of requirements, it is impossible to determine how best to distribute the logic of a building control system. In some cases, it is best to put as much intelligence as possible into the devices on the DAN. This tends to be more robust and can produce a better-performing system as a whole. It also tends to be more expensive and to create a more complex system. Instead of simply programming a small number of centralized controllers with logic, it is necessary to program a large number of distributed devices. This requires sophisticated programming and commissioning tools and can be very difficult to maintain. One factor that has a significant effect on how the logic of the building control system is distributed is the concept of system-level logic. System-level logic is logic that transcends individual devices. It involves control rules that encompass multiple sensor and actuator subsystems throughout the building. If the state of sensors on multiple devices is monitored and used to control multiple actuators on multiple devices, then the issue becomes: where do the rules that perform the desired functions reside? In an ideal world, all sensors and actuators would publish their state information in a fashion that could be read by all other devices, and the control logic for the actuators would be embedded in each of them. In fact, this is not practical for the following reasons:
• It consumes large amounts of network bandwidth, since all state information is potentially distributed throughout the entire network.
• All actuators must be able to understand the semantics and syntax of all the state information published.
• All the actuators must possess enough intelligence and functionality to implement potentially complex control algorithms.
• There must exist very sophisticated programming tools that would enable programmers to specify the system-level logic in a manner that ultimately gets decomposed and distributed to the devices in the network.
• There must exist very sophisticated diagnostic tools that allow engineers to debug problems that might exist.
The fact is that a truly distributed system that treats all the devices as equal peers is by its nature a very complex system. A more practical approach is to apply the tried-and-true engineering principle of decomposing the system into a hierarchy of well-defined and more manageable subsystems. This is, in fact, what is done in practice. In the case of implementing system-level logic, this means that controllers are placed at strategic points in the architecture. A controller is a device that can monitor sensor and actuator states and issue the appropriate commands to actuators. Note that the concept of a controller is very general, and in practice it may be instantiated in any of the following:
• A dedicated controller device on the DAN
• One of the sensors or actuators on the DAN
• A gateway on the DAN
• An application running on the LAN/WAN
In practice, instantiation of controllers goes hand in hand with the network infrastructure hierarchy. Sometimes it makes sense to place controllers based upon the existing network infrastructure, while other times it may make sense to engineer the network infrastructure based upon the location of controllers and the system-level logic. An example of this is the situation where there exist sometimes-on connections between the WAN and the building. In such cases, it would not make sense to put system-level logic that required frequent monitoring of sensors and actuators on the WAN. It would make more sense to put a controller on the LAN or DAN that could monitor sensors and actuators at a higher frequency, and then communicate with WAN applications at a frequency that matched the sometimes-on connection.
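The following sketch illustrates this placement decision in code: a controller hosted on a LAN or DAN gateway samples a sensor at a high rate for its local logic, but forwards a summary value to a WAN application only at a much lower rate suited to a sometimes-on connection. The Sensor and WanReporter interfaces and the chosen intervals are assumptions made for the example.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LocalController {

    public interface Sensor { int read(); }                   // hypothetical DAN sensor access
    public interface WanReporter { void report(int value); }  // hypothetical WAN upload service

    private final AtomicInteger latest = new AtomicInteger();

    public void run(Sensor sensor, WanReporter reporter) {
        ScheduledExecutorService exec = Executors.newScheduledThreadPool(2);

        // Fast local loop: sample the sensor every second and cache the value.
        exec.scheduleAtFixedRate(() -> latest.set(sensor.read()), 0, 1, TimeUnit.SECONDS);

        // Slow WAN loop: forward only the latest cached value every five minutes,
        // a rate a sometimes-on connection can reasonably support.
        exec.scheduleAtFixedRate(() -> reporter.report(latest.get()), 5, 5, TimeUnit.MINUTES);
    }

    public static void main(String[] args) {
        // Dummy sensor and reporter so the sketch runs on its own.
        new LocalController().run(() -> 21, v -> System.out.println("WAN update: " + v));
    }
}

Moving the fast loop onto the WAN side instead would multiply traffic over the slow link and break down entirely when the connection is not established, which is why the controller is placed locally in this scenario.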
33.3.2 Application-Level Communications Interfaces

Up to this point, it has been established that there are devices and applications that are able to exchange information via DANs, LANs, and WANs. The various protocols used by the network segments allow messages to be exchanged, but this does not fully address the following issues:
• Application-level semantics and syntax of information content
• Information exchange middleware and APIs
• Frequency of information exchange
• The binding of physical to logical constructs
33.3.2.1 Semantics and Syntax

Each device and application has a set of data constructs that it uses. These constructs include basic syntactic and semantic elements such as data structures, types, and units. For information to be exchanged effectively, there must be agreement among the devices and applications as to the semantics and syntax of the information being exchanged. Usually, this is done by convention. Many control protocols, such as EIA-709, include implicit data types with the various types of messages that are exchanged. To keep complexity and costs to a minimum, devices on the DAN are typically manufactured with a relatively fixed set of data constructs. This can be problematic when it is necessary to integrate information from two such devices whose data constructs are not compatible. Since it is not practical to change the data constructs in the devices themselves, there must be some sort of translation elsewhere in the system. What is required is a data translation proxy that can reformat information from one device domain to the other. This sort of functionality is typically embedded in intermediate controllers, gateways, or SCADA applications.
Sometimes these proxy agents have preprogrammed dictionaries that allow them to translate information from one domain to the other. In other cases, the proxies are general purpose and can be programmed dynamically to perform these translations. It should be noted that the data translation problem also applies to the integration between devices and SCADA applications. Intermediate translation proxies can sometimes become communications bottlenecks in the system, since it may be necessary for all message traffic from one device to another to pass through them. Their use needs to be carefully engineered or the overall system performance could suffer. On the other hand, proxies add flexibility to the system and allow a wider range of devices and applications to be integrated. In many cases, it is simply not possible to design a system without them. It is worth noting that many standards efforts have working groups whose main objective is to foster data compatibility among device manufacturers. Recently, Extensible Markup Language (XML) [19] has gained a lot of acceptance as the lingua franca for encoding data structures and content, and many industry consortiums are developing XML schemas to help solve the data compatibility problem.

33.3.2.2 Information Exchange Middleware and APIs

In the IT realm, middleware is loosely defined as the software between applications and the network that facilitates the exchange of information. Middleware is very useful for developing large distributed systems, especially where integration is required between distributed IT applications that already use some sort of middleware platform. Examples of commercial middleware platforms include:
• CORBA (Common Object Request Broker Architecture) [20]
• Microsoft products (.NET), including DCOM (Distributed Component Object Model), Microsoft Message Queuing (MSMQ), and Web services (SOAP) [21]
• J2EE application servers, including Java Remote Method Invocation (RMI), Java Message Service (JMS), and Web services (SOAP) [22]
• Publish–subscribe platforms
In terms of communications, middleware typically supports the following types of services:
• Remote procedure or method calling
• Asynchronous messaging
• Publish–subscribe messaging
Remote procedure calls (RPCs) refer to invoking a function or method on a machine that is usually different from the one doing the calling. It is a synchronous process in which a message representing the function call is sent and a message representing the function's return value is returned. Because of this relationship, this is often referred to as client–server. DCOM, RMI, CORBA, and SOAP are technologies that support this mechanism. It is most appropriate to use RPC when it is necessary to maintain some sort of synchronization between processes or to determine the result of a function invocation. Applications using this type of mechanism are considered tightly coupled. An alternative to remote procedure calls is asynchronous messaging. Messages from one process are sent to a queue, while other processes retrieve messages from the queue at their convenience. The processes do not need to exist on the same machine, and in distributed applications they usually do not. There is no explicit synchronization between the sending process and the receiving process. Examples of messaging technologies include JMS, MSMQ, and CORBA asynchronous messaging.
This programming model can be simpler to use for many types of interactions because the message sender and receiver do not need to know anything about each other. On the other hand, it can be difficult to implement synchronization through this mechanism. Applications using this type of mechanism are considered loosely coupled. Publish–subscribe is a special case of asynchronous messaging in which messages are not sent to a specific address or queue but are published to a specific topic (e.g., conference room temperature), as illustrated by the sketch below.
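A minimal in-process sketch of this publish–subscribe pattern follows. A production system would rely on middleware such as JMS or a commercial publish–subscribe platform rather than this hand-rolled broker; the topic name and message type are illustrative only.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class TopicBroker {

    private final Map<String, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    // Register interest in a topic; the callback runs for every published message.
    public void subscribe(String topic, Consumer<Object> callback) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(callback);
    }

    // Deliver a message to every subscriber of the topic; the publisher does not
    // know how many subscribers exist, or whether there are any at all.
    public void publish(String topic, Object message) {
        subscribers.getOrDefault(topic, List.of()).forEach(c -> c.accept(message));
    }

    public static void main(String[] args) {
        TopicBroker broker = new TopicBroker();
        broker.subscribe("conferenceRoom/temperature",
                value -> System.out.println("HVAC controller received: " + value));
        broker.publish("conferenceRoom/temperature", 21.5);
    }
}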
Processes that wish to receive the conference room temperature would subscribe to that topic, and every time a message is published to that topic, it would be received by all the subscribers. Each of the services described above has an appropriate application depending upon the system being engineered, and most systems will use RPC for some process interactions and asynchronous messaging for others. The middleware described here is a fairly heavyweight software infrastructure and is typically not supported by the devices on the DAN. Therefore, in some systems DAN/LAN gateways implement an interface to the middleware and provide a mapping from information in the devices on the DAN to objects or data structures that can be accessed from LAN/WAN applications via the middleware. There exist gateways today that do this using CORBA, SOAP, and Java RMI.

33.3.2.3 Information Exchange Frequency

In general, devices contain state that is transmitted to other devices and LAN/WAN applications in the system. For example, a device with a temperature sensor contains the temperature reading as part of its state. An important issue to be resolved in the design of a control system is when a device's state information is transmitted to other devices and LAN/WAN applications. In general, two mechanisms are used: either upon request or asynchronously whenever the state changes. It is not uncommon to use both for a particular data point in a device. The advantage of transmitting the information asynchronously is that the latency of the information is kept to a minimum and all changes to the data are transmitted. On the other hand, if the information is not needed by the receiving device or application, then valuable network bandwidth is wasted. This can be particularly troublesome if the data value changes with a high frequency. The advantage of getting values on request (polling) is that the amount of network bandwidth consumed can be controlled by the processes that consume the data. The disadvantage is that there is inherent latency in the polling loop, and data must be requested at a rate that ensures the maximum latency requirements are met. If the data value does not change very frequently, then network bandwidth will again be wasted. An additional problem is that the data value may change faster than the polling rate, and depending upon the nature of the information, this may be unacceptable. Trade-offs between latency and network bandwidth are a common engineering problem, and if this issue is not dealt with properly, it can lead to subtle incorrect behaviors in the control system that are very difficult to track down. One technique for dealing with this issue is to segment the network with DAN gateways that poll or receive data from a device at some high frequency and then process and store the state information to be transmitted to other processes on other network segments at a different frequency.

33.3.2.4 Binding

One aspect of DANs that is somewhat more complicated than IT networks is the so-called binding problem, which arises during the commissioning of devices on the DAN. In general, devices on an IT network are general purpose and do not require special considerations based upon where the computer is located or what function it is performing. Devices on a DAN, on the other hand, are subject to the binding problem, wherein logical data constructs in the device are bound to logical data constructs in other devices or LAN/WAN applications.
The binding problem probably contributes more to the complexity and overall cost of DANs and building automation than any other single factor. As an example, there may be numerous temperature sensors in a building, but somewhere during the commissioning of the temperature-sensing devices there must be some way of addressing and identifying device XYZ so that it is known that the information coming from it corresponds to the temperature in the conference room. There are numerous ways of addressing the binding problem, but they all entail physically handling the device in question and doing something special with it. This ranges from programming application-specific information into the device to recording information from the device (e.g., its communications address) to be entered into some commissioning application. Sometimes the binding problem can be handled procedurally when the device is installed, by using commissioning tools and correlating the device with application-specific data constructs via the tools.
But even in that case, some manual step must be performed. The binding problem becomes more complicated when an existing device must be replaced with a new one while maintaining the same logical binding to the automation applications. It also becomes more complicated if the DAN protocol has additional parameters to configure during the commissioning process. In short, there exist methods for automatically identifying when devices of a certain type appear on a DAN, but there is no way to automatically correlate a device with application-specific data without first recording some specific information related to that device or data. The most promising DAN research for alleviating the binding problem hinges on the fact that in building automation the control applications are more concerned with sensors and actuators at specific geographic locations within the building than with devices at specific network addresses. In short, if a device's identity or address were its location, and that location could be obtained in some automated fashion, then the binding problem would be much simpler. Toward this end, there is research into the development of devices and protocols that are location aware and use location as the logical addressing scheme for communications.
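The sketch below shows the kind of binding table a commissioning tool or gateway might maintain to correlate a device's network address with the application-level point it represents, including the device-replacement case mentioned above. The address strings and point names are purely illustrative; real DAN protocols define their own addressing and binding mechanisms.

import java.util.HashMap;
import java.util.Map;

public class BindingTable {

    private final Map<String, String> addressToPoint = new HashMap<>();

    // Recorded manually during commissioning (e.g., from a label on the device).
    public void bind(String deviceAddress, String logicalPoint) {
        addressToPoint.put(deviceAddress, logicalPoint);
    }

    // Replace a failed device while keeping the existing logical binding.
    public void replaceDevice(String oldAddress, String newAddress) {
        String point = addressToPoint.remove(oldAddress);
        if (point != null) {
            addressToPoint.put(newAddress, point);
        }
    }

    // Resolve incoming data to the application-level point it belongs to.
    public String resolve(String deviceAddress) {
        return addressToPoint.get(deviceAddress);
    }

    public static void main(String[] args) {
        BindingTable table = new BindingTable();
        table.bind("subnetA/0x3F", "ConferenceRoom/Temperature");
        table.replaceDevice("subnetA/0x3F", "subnetA/0x52");
        System.out.println(table.resolve("subnetA/0x52"));   // ConferenceRoom/Temperature
    }
}

The manual step the chapter describes is hidden in the bind() call: someone still has to establish, by physical handling or procedure, which network address corresponds to which location before the table can be populated.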
33.3.3 The Role of Distributed Objects

Software engineers in the IT realm have increasingly adopted an object-oriented approach to designing and implementing large systems. Objects are entities that have both state and behavior and thus are a natural fit for control systems that consist of devices with both state and behavior. The same design and programming models can therefore be applied to the design of the building control system. Objects contain methods that may be invoked by other objects. At the lowest level, a method invocation is a function call on a specific instance of an object. In the case of distributed objects, it is possible to invoke methods on objects that do not reside on the same machine, typically using some sort of RPC mechanism, as described above. Languages such as Java and C# support distributed objects as part of their specifications. Other languages, such as C/C++, must rely on middleware platforms such as CORBA or DCOM to provide the object mapping and remote method invocation plumbing. SOAP is yet another remote method invocation mechanism that is gaining acceptance in the distributed object realm, and many commercial packages support it.

The object paradigm is a natural fit for building control systems. The sensor and actuator devices on the DAN typically have a very specific and limited set of functionality that can easily be represented as an object in a distributed system. Standards such as IEEE 1451 and BACnet identify conventions for how objects are defined, accessed, and managed. In some cases, a very simple RPC mechanism can be defined even in DANs with simple protocols. Although almost all devices can be thought of logically as objects in a building control system, they often do not have the RPC capabilities necessary to allow them to be accessed directly from other distributed software objects. In this case, an intermediate DAN gateway can serve as a proxy by defining an object that is mapped to the devices on the DAN in such a way that calls to object methods in the DAN gateway are translated into the appropriate messages and commands to the devices on the DAN. By doing this, there also exists the possibility of creating virtual devices in the DAN gateway. A virtual device is a logical object representation of some sort of system (e.g., all the lights on a floor) that is in reality composed of numerous devices (e.g., individual lights and switches) on the DAN. Since these devices and their objects are just software constructs in the DAN gateway, it is possible to define a number of different objects that use the same devices on the DAN. Furthermore, these virtual devices and objects in the DAN gateway can be customized to suit a particular application. This provides a tremendous amount of flexibility and allows object-oriented programming techniques to be used for developing the building control system. It implies that part of the development process is to define and develop the appropriate objects that will exist in the DAN gateway and program them into that device. Standards such as OSGi are specifically targeted toward defining a gateway infrastructure that allows just this sort of functionality.
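As a concrete illustration of a virtual device, the sketch below defines a gateway-hosted object representing all the lights on a floor; a single logical method call fans out into commands to the individual DAN devices. The DanLink interface and the device addresses are hypothetical stand-ins for whatever protocol stack and addressing the gateway actually uses, and the remote-access plumbing (RMI, CORBA, or SOAP) that would expose the object to LAN/WAN applications is omitted.

import java.util.List;

public class FloorLighting {

    // Hypothetical low-level interface to the DAN (assumption for this sketch).
    public interface DanLink {
        void sendCommand(String deviceAddress, int value);
    }

    private final DanLink dan;
    private final List<String> lightAddresses;

    public FloorLighting(DanLink dan, List<String> lightAddresses) {
        this.dan = dan;
        this.lightAddresses = lightAddresses;
    }

    // One logical call on the virtual device fans out to every physical light.
    public void setLevel(int percent) {
        for (String address : lightAddresses) {
            dan.sendCommand(address, percent);
        }
    }

    public static void main(String[] args) {
        // A printing DanLink stands in for the real protocol stack.
        FloorLighting floor3 = new FloorLighting(
                (addr, value) -> System.out.println(addr + " <- " + value),
                List.of("lamp/3.01", "lamp/3.02", "lamp/3.03"));
        floor3.setLevel(60);   // dim all floor 3 lights to 60 percent
    }
}

Several such virtual devices can be defined over the same underlying lamps (per room, per facade, per tenant), which is the flexibility the chapter attributes to gateway-hosted objects.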
33.4 Conclusions

Buildings typically contain a number of different networks that are installed for a variety of applications. With the use of appropriate routers and gateways, these heterogeneous networks can be integrated to implement a building-wide control system that allows IT applications and control devices and applications to interoperate. A variety of issues must be resolved to integrate the various networks, devices, and applications, but as discussed in this chapter, the use of intermediate gateways can address a wide range of them. The end result is a system that is less expensive, more flexible, and generally provides greater functionality than can be achieved using a single control network.
References

[1] IEEE 802.1D (2004), IEEE Standards for Local and Metropolitan Area Networks: Media Access Control Bridges. IEEE 802.1F (1993), IEEE Standards for Local and Metropolitan Area Networks: Common Definitions and Procedures for IEEE 802 Management Information. ISO/IEC 8802-2 (1998), Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Area Networks: Specific Requirements: Part 2: Logical Link Control. ISO/IEC 8802-3 (2000), Information Technology: Telecommunications and Information Exchange between Systems: Local and Metropolitan Area Networks: Specific Requirements: Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications.
[2] Internet Engineering Task Force RFC Index, www.ietf.org.
[3] EIA-709.1 (March 1998), Control Network Specification, Electronic Industries Alliance, Arlington, VA.
[4] DALI, Performance Standard: IEC 929/EN 60929, Annex E (Control Interface for Controllable Ballasts).
[5] Rubinstein, F.M., S. Treado, P. Pettler (2003), Standardizing communications between lighting control devices: a role for IEEE P1451, in Proceedings of the 2003 IEEE-IAS Annual Conference, Salt Lake City, UT.
[6] HomePlug 1.0 Specification, HomePlug® Powerline Alliance, www.homeplug.org.
[7] ANSI/EIA-709.2 (2000), Control Network Powerline (PL) Channel Standard, Electronic Industries Alliance, Arlington, VA.
[8] IEEE 802.15.4 (2003), IEEE Standard for Telecommunications and Information Exchange between Systems: LAN/MAN Specific Requirements: Part 15: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low Rate Wireless Personal Area Networks (WPAN).
[9] Zigbee Specification 1.0 (to be ratified Q4/2004), Zigbee Industry Alliance, www.zigbee.org.
[10] IEEE 802.15.1 (2002), IEEE Standard for Telecommunications and Information Exchange between Systems: LAN/MAN Specific Requirements: Part 15: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Wireless Personal Area Networks (WPANs).
[11] IEEE 802.15.3 (2003), IEEE Standard for Telecommunications and Information Exchange between Systems: LAN/MAN Specific Requirements: Part 15.3: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for High Rate Wireless Personal Area Networks (WPAN).
[12] ANSI T1.102 (1993) (R 1999), Telecommunications: Digital Hierarchy: Electrical Interfaces.
[13] Starr, T., J. Cioffi, P. Silverman (1999), Understanding Digital Subscriber Line Technology, Prentice Hall, Englewood Cliffs, NJ.
[14] DOCSIS 2.0 Interface Specifications, CableLabs, www.cablelabs.com.
[15] EIA-852 (November 2001), Tunneling Component Network Protocols over Internet Protocol Channels, Electronic Industries Alliance, Arlington, VA.
[16] ANSI/ASHRAE 135 (2001), BACnet: A Data Communication Protocol for Building Automation and Control Networks.
[17] OSGi Service Platform, Release 3 (March 2003), OSGi Alliance, www.osgi.org.
[18] IEEE 1451.1 (1999), IEEE Standard for a Smart Transducer Interface for Sensors and Actuators: Network Capable Application Processor Information Model 2000. IEEE 1451.3 (2003), Standard for a Smart Transducer Interface for Sensors and Actuators: Digital Communication and Transducer Electronic Data Sheet (TEDS) Formats for Distributed Multidrop Systems. IEEE P1451.4 (D3.0), Draft Standard for a Smart Transducer Interface for Sensors and Actuators: Mixed-Mode Communication Protocols and Transducer Electronic Data Sheet (TEDS).
[19] CABA XML/Web Services Guideline Committee, CABA Standards Committee, CABA, www.caba.org.
[20] Common Object Request Broker Architecture (CORBA/IIOP) Specification, Version 3.0.3 (March 2004), Object Management Group, www.omg.org.
[21] Microsoft Corporation (2003), Enterprise Solution Patterns Using Microsoft .NET, Microsoft Press, Redmond, WA.
[22] Collection of J2EE 1.4 Specifications, Sun Microsystems, www.sun.com/1.4/docs/index.html.
34 EIB: European Installation Bus

34.1 Introduction ......................................................................34-1
Home and Building Automation • EIB: Key Features • Competitors
34.2 Group Communication ....................................................34-4 Group Objects and Shared Variables • Interworking
34.3 Twisted-Pair EIB................................................................34-7 Topology • Standard Message Cycle • Format of the Standard Data Frame • Other Frame Formats
34.4 Medium Independent Layers..........................................34-10 Overview • Horizontal Runtime Communication • Network Management • Interface Objects
34.5 Development, Configuration, and Integration .............34-13
Configuration Tools • Node Development • Tunneling and Gateways
34.6 The Future of EIB: KNX.................................................34-17
References ...................................................................................34-18
Further Reading .........................................................................34-18

Wolfgang Kastner
Vienna University of Technology

Georg Neugschwandtner
Vienna University of Technology
34.1 Introduction

In building automation, the exchange of control data is a key issue. Small amounts of data have to be transmitted infrequently, but robustly and over long distances. A number of field area networks (FANs) cater to this domain. The European Installation Bus (EIB) is an open* and well-established representative of this group, expressly aimed at enhancing electrical installations in buildings.
34.1.1 Home and Building Automation

Although automation scenarios regarding buildings of all sizes and purposes are frequently referred to using the popular catchall term home and building automation, one should be aware that requirements and expectations differ significantly across this spectrum. Within this chapter, home refers to a small-scale installation in the residential context. It is most often associated with an owner-occupied detached house or flat. Building, on the other hand, is taken as shorthand for a functional building, implying the presence of a significantly more complex automation system.
*Consequently, EIB is discussed in a manufacturer-independent fashion in this chapter. For a list of manufacturers and vendors of certified EIB devices and solutions, please contact Konnex Association.
Electric and electronic devices in both homes and buildings that could potentially be integrated into an automation system can be divided into several groups according to their function:
• Lighting and window blinds
• HVAC (heating, ventilation, and air conditioning) systems, including water heating
• White goods (household appliances), like a washing machine or stove
• Brown goods (audio/video or home theater equipment, game consoles)
• Communications equipment, both on and off premises (intercom system, telephone)
• Information processing and presentation equipment (PCs, tablet PCs, PDAs)
• Security and access control
• Safety alarm systems
• Elevators and sundry special domains
These device groups are given different priorities for integration into an automation system in homes and buildings, since the application scenarios also differ in importance. As an example, consider integration of the home theater system with blinds and lighting vs. escalation management of technical alarms. Which of the above fields is selected for enhancement to make a home or building "smart" seems to be a matter of opinion. Regarding homes, the focus has shifted from home control (i.e., lighting scenes and the like) more toward information (and "infotainment") systems, like the "smart fridge."

Homes and buildings differ in other aspects as well. Functions that can be provided by a single device in the home are taken over by entire systems as buildings get larger (and thus the complexity of these tasks increases). Also, the reasons for opting for an automation system are different. In the home, automation is chosen for an increase in personal comfort, safety, and security. In buildings, economic utility is the main driver: the ability to control and optimize the use of a large number of energy-consuming devices allows considerable and immediate financial savings.

One should not be confused by the fact that some of the devices listed above handle high-bandwidth data, like media streams. For automation purposes, only control information is relevant. Control data exchange involves only small information items, yet these have to be collected from and distributed to many dispersed locations. Field area networks, which are specifically designed for narrowband control applications, are ideally suited to this task. Eventually, Internet Protocol (IP) networks will probably reach the field level, but this cannot be expected to happen anytime soon. Thus, there will remain a market for narrowband FANs targeting the installation domain for a considerable time.
34.1.2 EIB: Key Features

EIB is a field area network designed to enhance electrical installations in homes and buildings. Thus, its technical concept is focused on incorporating devices that are installed in a fixed place and permanently connected. Main application areas of EIB solutions are lighting, control of window blinds, and HVAC systems. The design of EIB is powerful enough to create tailor-made complex installations for large buildings. Nevertheless, it is also being marketed for upscale home automation. Rather high component prices, however, limit its spread in the latter field. The technology is well established in central Europe, with a great variety of components being readily available from multiple companies. Although their products are compatible, companies use different brand names to refer to EIB technology in their product ranges.

The clear alignment of EIB toward electric installations in buildings is visible in a number of properties. Most obviously, devices come in housings appropriate for this field. Rail mount housings, for example, allow the devices to be mounted in a distribution box easily. Here, the standard 35 mm top hat profile rail can be extended to provide bus connectivity in addition to being a mechanical fixture. This is accomplished by a piece of printed circuit board inserted into the recess of the rail. The copper tracks that this data rail provides remove the need for stringing bus wires between devices sharing a rail.
Also, many flush mount devices are available. While rail mount housings are frequently used for switching and dimming actuators, flush mounting is especially suited to sensors, like push buttons or a temperature adjustment knob. With EIB, such devices are frequently split into two parts: bus coupling units (BCUs) and application modules. While BCUs are standard modules providing bus connectivity, which fit into a standard installation socket, the different application modules contain the user-visible part. The physical external interface (PEI), which connects both, is standardized as well. This principle is very similar to the one used with conventional flush mount switches, where the same switch mechanics can be used with various cover plates. These standardized housing form factors ensure that integrators have a choice of interchangeable components from different manufacturers for most applications. This not only offers increased flexibility when creating an EIB system, but also proves beneficial in maintenance because of the dramatically reduced risk of vendor lock-in.

The bus wiring was designed to comply with the demands of professional electricians as well. EIB uses cables that can be threaded into prelaid conduiting and allows arbitrary branching, just like a standard mains distribution lead. Nonetheless, a fully fledged EIB installation can cover up to several thousand meters and contain tens of thousands of devices, with data traffic still remaining reasonably robust against interference. For renovation tasks and situations where no cables can be tolerated at all, powerline communication and radio transmission are available.* All media can be mixed and matched within an installation as needed. They all operate at data rates below 10 kbit/s, which is sufficient for the applications envisaged.

EIB is a peer-to-peer network system. Access to the communication medium is handled using a distributed algorithm. Likewise, the mapping between sensor inputs and desired actuator actions is maintained in a decentralized way. By avoiding the need for a central station controlling these aspects, such a design eliminates a potential bottleneck and a single point of failure. It allows functionality to degrade gracefully in case of node failures. This is a desirable quality in the functional areas covered by EIB. For instance, the failure of a single light can in most cases be easily tolerated, but complete loss of lighting due to anything but a main power failure will not be acceptable.

The definitive and binding specification for EIB is centrally maintained as a consortium standard by Konnex Association. Konnex Association recently took over this task from EIBA (EIB Association), which was founded in 1990 by European installation technology companies involved in the development of EIB. The specification encompasses the complete network protocol as well as application-level interworking profiles. It also regulates basic system components like transceiver integrated circuits (ICs) and BCUs. The specification is openly available to interested parties at nondiscriminatory conditions. Konnex Association not only coordinates further development of the specification, but is also involved in the management of intellectual property rights (IPR). This activity is necessary since implementations of the standard will necessarily touch IPR (i.e., patents) of companies that have contributed material to the common specification.
Konnex members automatically gain the necessary IPR licenses with no additional expenses besides the fixed membership fee. Konnex Association also oversees certification of devices and installer training.†
34.1.3 Competitors In the domain of field area networks for home and building automation, only a small number of different technologies exist. Some of these further confine themselves to specific aspects of the field, like Digital Addressable Lighting Interface (DALI) [7], or only allow systems of very limited size, like X10. When narrowing the choice to universally applicable peer-to-peer fieldbus systems comparable to EIB in flexibility and performance, LON and LCN emerge as main contenders. *Radio transmission is already defined under the umbrella of KNX, the follow-up standard to EIB (cf. Section 34.6 The Future of EIB: KNX). Nevertheless, it is backward compatible. †Cf. http://www.konnex.org/Graphic_guidelines.zip, Section 3.2.2: Writing the association name.
LON (Local Operating Network) was designed with utmost flexibility in mind. It should be applicable to all kinds of control systems where fieldbus technology could potentially be of benefit. For example, it offers a comprehensive range of network media, data rates, and transmission ranges. On the protocol level, aspects like the selection of services to be supported, the presentation of data, or even the address length can be freely chosen by the system designer. Its communication protocol, called LonTalk, is disclosed via U.S. and European formal standards [3, 5]. To ensure interoperability in spite of these numerous degrees of freedom, LonMark profiles prescribe a specific subset of LON technology to be used, as well as behavioral aspects. None of these requirements are laid down as a formal standard, however. Under this umbrella, a number of interoperable solutions for building automation have emerged. Still, LON-based solutions generally require higher competence from system integrators. They are therefore considered to be better suited for complex applications where far-reaching control of technical aspects is required. As a result, LON technology is very rarely applied in home environments. Nonetheless, it has definitely gained a foothold in building automation. LCN (Local Control Network) is mainly present in EIB’s German home market. In contrast to LON, it was developed with a very specific focus on home and building automation. Consequently, it shares some properties with EIB, but takes quite different technical approaches in places. For example, LCN uses a free strand in standard main distribution cables for communication. Assessing the potential of LCN technology is difficult since the specifications are proprietary and not disclosed. LCN also lacks the backing of large component manufacturers. It is practically a single-vendor system.
34.2 Group Communication EIB group communication allows the passing of a piece of information to an arbitrary number of receivers by way of a single message. This is achieved by making use of a publisher–subscriber model. The sender uses a logical group address as its destination address. Receiving stations know which group (or groups) they belong to, and accordingly, they either ignore or process incoming messages. Thus, a sender does not require information about which nodes will actually be receivers of its message. Any node can elect to subscribe to a group without the publisher knowing. Actually, it is neither necessary nor possible for a node to determine which other nodes will act as publishers or subscribers to a specific group address. The knowledge concerning which nodes participate in a certain group communication relationship is distributed over all nodes in the system. Figure 34.1 illustrates the resulting communication model. It shows how an additional publisher and subscriber can enter a group communication relationship without any modification to the original communication partners.* Obviously, the setup presented would not be possible when using maintained contact switches. Their rockers would remain in position even if the state of the lights was changed using the other switch, creating an inconsistency. Therefore, intelligent building installations generally use momentary contacts in switch sensors to avoid this kind of restriction when influencing the state of an actuator from multiple places. This includes not only obvious cases like corridor lighting, but also every kind of central control functionality. Now, one might argue that switches need not consist of two separate halves for issuing on and off commands, as they do in the example. A single button toggling the state of the light at every press by alternately transmitting 1 and 0 messages would suffice. Although this is correct, such a solution introduces additional complexity. Such an approach requires the switch to keep track of the controlled actuator’s state, even if the latter is changed by another node. Otherwise, it would exhibit inconsistent behavior, requiring two presses whenever the state it assumes the actuator to be in does not correspond with reality.
*In practice, one usually would not install a second actuator for the sole purpose of controlling two lamps in parallel, but of course connect them both to the power output of the first.
FIGURE 34.1 EIB regular operation: the publisher–subscriber model.
34.2.1 Group Objects and Shared Variables EIB uses a shared variable model to express the functionality of individual nodes and combine them into a working system.* Every device publishes several application-related variables that expose specific aspects of its functionality. They will usually be either data sources providing information to other devices or data sinks that carry out certain actions according to the information received. Examples for the former would be the position of a switch; for the latter, the control input of a relay. These application-related variables are referred to as group (communication) objects. It should be noted that these represent a single value only and thus are loosely related at best to the notion of an object in object-oriented programming. Group objects of various devices are grouped at setup time to form network-wide shared variables. The values of all group objects assigned to the same group will be held consistent by the nodes’ system software. Continuing the above example, this would allow the switch to control the relay by linking the states of their respective group objects. Group membership is defined individually for each group object of a node. Group objects can also belong to multiple groups. This is a key feature of EIB, which allows elegant realization of central functions. By including all light-switching actuators in the house in a common group besides their standard control group, a master switch by the door can easily turn them all off when the owner is leaving the house. In this case, however, the illusion of a purely state-based model can no longer be maintained. Since a group object cannot assume two — possibly different — shared values at once, it will keep the state of the last update. No limitations exist concerning the semantics associated with the individual group objects. This binding is entirely within the local responsibility of a node application. Thus, the same message from a master switch may turn off a light as well as cause a sun blind to go down. Every group of communication objects is assigned a unique group address. This address is used to handle all network traffic pertaining to the shared value. Usually, data sources will actively publish new values, although a query mechanism is provided as well. Thanks to the publisher–subscriber principle, the group address is all a node needs to know about its communication partners. Installations will often run into large numbers of group addresses. Therefore, it is useful to assign these using a scheme indicating the functionality associated. Usual practice is to have the 4 most significant bits form the main group, while the 11 (or alternatively 8) least significant bits describe the subgroup. In the case of an 8-bit subgroup, the remaining bits form the middle group. The group address 32767 therefore reads 15/2047 in two-field or 15/7/255 in three-field notation. This division, however, is relevant *Despite the state-based semantics of this model, the underlying communication is event-based.
FIGURE 34.2 EIB regular operation: end-to-end overview.
exclusively for visual presentation, not operation. Any field values (or entire group addresses, for that matter) can be chosen freely. Figure 34.2 illustrates what was discussed so far by showing which steps are involved in turning on a light using EIB. Simple user applications, like those for polling a switch and operating a relay, are usually hosted by the BCU in addition to the standard implementation of the EIB network stack. They can use the BCU’s Physical External Interface for exchanging arbitrary signals with the application module. The lookup process the network stack performs concerning the association of group objects and group addresses is simplified for the purpose of this illustration: actually, no such integrated lookup table exists, but multiple ones are used, as will be detailed in Section 34.4.
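Since this division is purely presentational, both notations can be derived from the raw 15-bit address by simple bit slicing. The following sketch is illustrative only; the function name and error handling are ours, not part of any EIB tool or API. It reproduces the 15/2047 and 15/7/255 readings of address 32767 mentioned above.

```python
def group_address_notations(addr: int) -> tuple:
    """Render a 15-bit group address in the two- and three-field notations
    described above: 4-bit main group plus 11-bit subgroup, or 4-bit main
    group, 3-bit middle group, and 8-bit subgroup."""
    if not 0 <= addr <= 0x7FFF:
        raise ValueError("group addresses are 15 bits wide")
    main = (addr >> 11) & 0xF                        # 4 most significant bits
    two_field = f"{main}/{addr & 0x7FF}"             # 11-bit subgroup
    three_field = f"{main}/{(addr >> 8) & 0x7}/{addr & 0xFF}"
    return two_field, three_field

print(group_address_notations(32767))                # ('15/2047', '15/7/255')
```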
34.2.2 Interworking For the EIB network stack, the contents of shared variables are opaque octet strings only. Yet obviously, all group objects of a group need to use a common encoding to ensure that the shared value will be interpreted in a consistent way. For this purpose, the EIB interworking standard (EIS) defines a standardized bit-level syntax for various variable types, including Boolean values, signed and unsigned integers of multiple widths, and time/date and floating-point values. Yet, type checking is only performed at setup time. No information concerning the type of a shared variable is ever exchanged over the network. EIB devices interact by exchanging trigger information only, mostly Boolean values. The same Boolean value may directly control the state of a relay, but it may be used to trigger a time delay switch as well — without the publisher knowing. Consequently, EIB relies on qualified integrators to build working solutions. For the most part, EIS therefore does not standardize any behavioral aspects. It does, however, define interactions between controllers and actuators for light dimmers and sun blind drives.
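To make the idea of a standardized bit-level syntax concrete, the sketch below decodes the 2-octet floating-point format used by EIS for values such as temperatures. The layout assumed here (1 sign bit, 4 exponent bits, and a 12-bit two's complement mantissa whose low 11 bits are stored explicitly, with value = 0.01 · mantissa · 2^exponent) is the commonly documented one that was later carried over into KNX; treat it as an illustration rather than a normative decoder.

```python
def decode_eis_float(octets: bytes) -> float:
    """Decode the 2-octet EIS floating-point format as commonly documented:
    bit 15 is the sign, bits 14..11 the exponent, and the sign bit together
    with bits 10..0 forms a 12-bit two's complement mantissa."""
    raw = (octets[0] << 8) | octets[1]
    exponent = (raw >> 11) & 0x0F
    mantissa = raw & 0x07FF
    if raw & 0x8000:                     # negative value: two's complement
        mantissa -= 0x0800
    return 0.01 * mantissa * (1 << exponent)

print(decode_eis_float(bytes([0x8A, 0x24])))   # -30.0
print(decode_eis_float(bytes([0x0C, 0x1A])))   # 21.0
```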
TABLE 34.1 EIB Example Device: Switching Actuator

Application Program: Switch
  Parameters: Logic operation (none/AND/OR); Contacting behavior (make/break); Power-on behavior (off/on/last state)
  Group Objects: Switch (in); Logic input (in); Feedback (out)

Application Program: Stairway lighting switch
  Parameters: Logic operation (none/AND/OR); Contacting behavior (make/break); Stairway lighting duration (minutes)
  Group Objects: Switch (in); Logic input (in); Feedback (out)

Application Program: Delayed switching
  Parameters: Logic operation (none/AND/OR); Contacting behavior (make/break); Switch-on delay (minutes); Release delay (minutes)
  Group Objects: Switch (in); Logic input (in); Feedback (out)
For illustration, Table 34.1 shows how the behavior of a switching actuator may be controlled. Since most devices support multiple applications, the integrator first selects the one appropriate for the purpose at hand. The available parameters and group objects depend on this choice. Application parameters are then set during setup. Note that only values that are expected to change frequently are made available for runtime exchange via group objects.
34.3 Twisted-Pair EIB The mainstay physical medium of EIB is twisted-pair cabling. A single wire pair (2 × 0.8 mm diameter) is used for both data and power transmission. Although recommended, shielding is not mandatory. Proper sheathing and resistance against mechanical and thermal stress are required for EIB-certified cables to allow them to be threaded into prelaid conduiting, in the immediate vicinity of the main wiring. Still, SELV (safety extra-low voltage) conditions are maintained on the EIB network itself.
34.3.1 Topology The physical segments of an EIB network are called lines. A line can accommodate up to 256 devices in free topology. Loops are permissible, but should be avoided. Terminating resistors are not required. The maximum allowable accumulated cable length of a line is 1000 m. Since EIB networks provide link power, one bus power supply unit (BPSU) per line is required. BPSUs contain a 30-V direct-current (DC) supply and a choke for signal shaping. Since the original EIB specification (referred to as TP64) allowed only 64 devices per electrical segment, devices with two-segment connections forwarding all messages in both directions were introduced. Thus, a line would be made up of up to four electrical segments. Although more of a bridge in Open Systems Interconnection (OSI) terminology since they buffer telegrams, these devices were termed line repeaters. With the current TP256 specification, lines coincide with electrical segments. Line repeaters can still be useful in special cases to extend the overall length of a line up to 4000 m, or even further if nonstandard — yet compatible — solutions like fiber-optic transmission are used, which are offered by certain manufacturers. Lines can be connected by routers (referred to as couplers) to form a tree structure, as illustrated in Figure 34.3. Up to 15 lines (in this case frequently referred to as sublines) can be connected by a main line to form a zone. A maximum of 15 zones can in turn be coupled by a backbone line. No loops are allowed.* Usually, levels of this hierarchy will be matched to the division of a building (or complex of buildings) into its structural elements, like individual buildings, staircases, floors, offices, apartments, or rooms. Overall, an EIB installation can accommodate more than 60,000 end devices. *Strictly speaking, though, the hierarchy is only treelike since sublines may be attached to the backbone directly. This rather odd feature should not be used in practice and has been omitted from the illustration.
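The position of a device in this hierarchy is reflected in its individual address (cf. Section 34.3.3). Assuming the common 4-bit zone, 4-bit line, 8-bit device layout of the 16-bit individual address, which is our reading rather than a detail spelled out here, packing and unpacking such an address could look as follows.

```python
def pack_individual_address(zone: int, line: int, device: int) -> int:
    """Pack zone.line.device into a 16-bit individual address, assuming a
    4/4/8-bit split. Zero values in the fields designate couplers and
    main/backbone lines, as explained in Section 34.3.3."""
    if not (0 <= zone <= 15 and 0 <= line <= 15 and 0 <= device <= 255):
        raise ValueError("zone and line are 4-bit fields, device is 8 bits")
    return (zone << 12) | (line << 8) | device

def unpack_individual_address(addr: int) -> tuple:
    return (addr >> 12) & 0xF, (addr >> 8) & 0xF, addr & 0xFF

print(unpack_individual_address(pack_individual_address(1, 1, 10)))  # (1, 1, 10)
```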
FIGURE 34.3 EIB network topology.
Couplers will only forward messages that would not reach their destination otherwise. Thus, locality of information is exploited to reduce the overall load on the network by segmentation. This is not sufficient, however, if central monitoring and control of the entire installation are desired. In this case, the amount of traffic flowing to (and possibly from) the monitor station will quickly exceed the load limit of the EIB network on the backbone line. The obvious solution is to switch to a higher-speed backbone connection. For this purpose, methods for tunneling of EIB data frames over Ethernet (EIBnet) [6] and arbitrary networks using the Internet Protocol (EIBnet/IP) are included with the EIB specifications.
34.3.2 Standard Message Cycle Stations share the communication medium by way of asynchronous time-division multiplexing. The contention protocol used is carrier-sense multiple access (CSMA) with bit-wise arbitration. EIB utilizes balanced baseband signal coding. A logical 0 is encoded as a negative voltage imprinted on the DC supply voltage, created by a current pulse within the sender. It is always followed by a positive compensation pulse supplied by the BPSU choke, yielding a signal cycle that is free of any DC component overall. A logical 1 is identical to the idle state of the transmission medium. Thus, the transmission of a logical 0 will always dominate the medium. A sending station will always examine the state of the bus line in parallel (listen-while-talk). In case it encounters a 0 value it did not transmit, it assumes a collision and cancels the transmission. The other station can continue its transmission undisturbed. Data are transmitted at a rate of 9.6 kbit/s, yielding a bit time of 104 µs. Figure 34.4 illustrates the timing of the message cycle. Data frames are of variable length. After a pause of 15 bit times, the receiving node responds with an acknowledgment frame. It consists of a single character containing information about whether reception was successful, erroneous, or unsuccessful because the receiver was too busy to process it. Before any station may attempt to transmit another frame, the line has to be kept idle for at least 50 bit times. Stations intending to send a message with standard or low priority have to wait for three additional bit times, giving precedence to messages with system or high priority and frames repeated due to negative or missing acknowledgment. In the latter case, it is usual practice to resend the frame up to three times.
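Because a transmitted 0 dominates the bus, simultaneous transmissions resolve themselves bit by bit: a station that reads back a 0 it did not send backs off, while the others continue undisturbed. The toy model below is our own simplification of this listen-while-talk behavior and ignores timing, framing, and error handling.

```python
def arbitrate(frames):
    """Toy bit-wise arbitration: the bus carries the wired-AND of all bits
    transmitted in a slot (0 dominates), and a station that sent a 1 but
    reads back a 0 cancels its transmission. Frames are equal-length bit
    lists assumed to start in the same slot. Returns the surviving index."""
    active = set(range(len(frames)))
    for bit in range(len(frames[0])):
        bus = min(frames[i][bit] for i in active)       # wired AND of the slot
        active -= {i for i in active if frames[i][bit] != bus}
        if len(active) == 1:
            break
    return active.pop()    # arbitrary pick if the remaining frames are identical

# The frame with the earlier dominant bit wins, which is why higher-priority
# frames are given control-octet bit patterns with dominant bits further to the front.
print(arbitrate([[1, 0, 1, 1], [1, 0, 0, 1]]))          # -> 1
```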
FIGURE 34.4 Medium access timing for twisted-pair EIB.
FIGURE 34.5 EIB standard data frame.
Couplers will positively acknowledge every frame they pass on and autonomously repeat its transmission on the other interface, if necessary.
34.3.3 Format of the Standard Data Frame Twisted-pair EIB makes use of character-oriented asynchronous start–stop signaling. Characters contain eight data bits, even parity, and one stop bit (8e1) and are always followed by two idle bit times. The format of the standard data frame is shown in Figure 34.5. The character corresponding to octet 0 is sent first (corresponding to the direction of reading). Within every octet, however, individual bits are sent in ascending order of significance, i.e., from right to left when looking at the illustration. The illustration also specifies the OSI layers associated with the functionality supported by its various fields. Octet 0 contains the frame priority, which was already mentioned. Priority bit patterns are chosen so that frames with higher priority will win the arbitration cycle over ones with lower priority. The repeat flag marks frames that are retransmitted due to the sender not receiving a positive acknowledgment. Without this information, a receiver could not tell a repeated message from a mere duplicate in case it received both messages intact. This can, for example, happen when the acknowledgment frame is destroyed. EIB uses two distinct types of station addresses: individual (also referred to as physical) and group. The frame source address will always be an individual address, while its destination may be specified as an individual or group address. The individual address of an EIB node is closely related to its position within the topological structure of the network. It specifies the number of the zone and (sub)line the device resides in, as well as its device number within the line. Zero values in any of these fields designate special entities: line couplers (device number is zero), main lines (line number is zero), backbone couplers (line and device numbers are zero), and the backbone line (zone and line numbers are zero). Since individual addresses are unique within an installation, all collisions will be resolved after the transmission of the frame source address. The individual addressing scheme is maintained on other physical media supported by EIB as well to ensure a consistent topological view. Especially on open media like powerline, however, this topology will only be logical, instead of both logical and physical, as with twisted-pair cabling. Group addresses are purely logical identifiers. Stations privately maintain a list of groups they are members of, comparing destination addresses of incoming frames against it to determine whether they are addressed. Every device is a member of group 0, which is used for broadcast messages. In case of a group as the destination address, multiple acknowledgment frames will be returned. Since they will be superimposed on one another, acknowledgment character bit patterns are chosen specifically to ensure that negative acknowledgments will overwrite positive ones. Identification of duplicate frames by way of the repeat flag is especially important in this case, since a single addressee (which may have been busy) returning a nonpositive acknowledgment will cause the sender to retransmit. Yet, all others may have received the frame in perfect condition. The routing counter ensures that a frame will appear in seven physical segments at maximum to avoid frames circulating endlessly due to errors in the network setup. Its initial value (usually six) will be decremented by every coupler (and line repeater) passing on the frame. Frames whose routing counter
is at zero will not leave the segment. The special value of seven is never decremented, allowing the disabling of this mechanism for diagnostic purposes. Apart from properly initializing the routing counter, the OSI network layer is empty for EIB end devices (i.e., devices other than couplers and line repeaters). Finally, the length field specifies the size of the network layer service data unit (N-SDU), which can be 1 to 16 octets. The frame is concluded by the check octet, which contains an odd horizontal parity value calculated over all preceding frame octets. Together with the even vertical parity information contained within the protocol characters, it enables the receiver to perform cross-parity checking. This way, arbitrary double-bit errors can be detected. In principle, correction of single-bit errors is also possible, but not performed by current communication controllers.
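Reading odd horizontal parity in the usual way (every bit column, check octet included, contains an odd number of 1s), the check octet is simply the bitwise complement of the XOR over all preceding octets. The sketch below illustrates this together with the even vertical parity of the 8e1 character framing; the frame bytes are arbitrary example values, not a recorded telegram.

```python
from functools import reduce

def check_octet(frame_octets: bytes) -> int:
    """Odd horizontal (longitudinal) parity: complement of the XOR over all
    preceding frame octets, so every bit column ends up with an odd 1-count."""
    return 0xFF ^ reduce(lambda acc, octet: acc ^ octet, frame_octets, 0)

def even_vertical_parity_bit(octet: int) -> int:
    """Even parity bit appended to each character by the 8e1 framing."""
    return bin(octet).count("1") & 1

frame = bytes([0xBC, 0x11, 0x01, 0x0A, 0x03, 0xE1, 0x00, 0x81])  # example octets
print(hex(check_octet(frame)))           # 0x3a for these example octets
print(even_vertical_parity_bit(0xE1))    # 0, since 0xE1 contains four 1 bits
```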
34.3.4 Other Frame Formats The standard message frame presented, although by far the most frequent, is not the only format possible. All formats use the same control octet, but with different values for the two most significant bits. 00 at this position identifies an extended data frame, which extends the maximum permissible frame length beyond the 23-character limit and contains additional type bits for future protocol extensions. Poll requests (11) allow a station to request status data from up to 15 other nodes with minimal protocol overhead. All participating devices have to share the same physical segment, and the data returned are limited to one character each. Responses have to be sent in a narrow time slice following the request frame (the message cycle can be roughly likened to a standard data frame with multiple subsequent acknowledgment characters). Polling groups are especially useful for high-frequency liveness checking of sensors in security applications and are only supported on twisted-pair EIB.
34.4 Medium Independent Layers The EIB protocol stack is aligned with the OSI model. Its structure is illustrated in Figure 34.6. The session and presentation layers are left empty, as is frequently the case with field area networks. Beyond the OSI application layer, EIB offers a standardized application environment, referred to as user layer.
FIGURE 34.6 EIB network stack and application environment.
34.4.1 Overview The design of EIB is based on the assumption that regular operation will frequently include addressing groups of communications partners simultaneously. Consequently, logical group addressing based on the publisher–subscriber principle is provided on the lowest layer possible, making it a native protocol feature. Medium access control and frame format on twisted-pair cabling, as well as the publisher–subscriber principle, have already been introduced in Sections 34.2 and 34.3. The necessary table of group addresses a node is subscribed to is maintained by the data link layer as well. Incoming frames will be passed to upper layers when their destination addresses appear in this table or match the individual address of the node. The transport layer provides broadcast, multicast, and unicast unreliable datagram services as well as reliable point-to-point connections. Multicast and broadcast services are implemented on top of the group addressing facility of layer 2, while the remaining services use individual addressing. All datagram services rely on the best-effort semantics offered by layer 2 and are restricted by its maximum frame length. While layers 1 to 4 implement protocol functionality, the purpose of the application layer is to define end-to-end semantics of services for maintenance and regular operation, which will be detailed in the following passages. Service requests by and responses from the layer 7 user are mapped to protocol data units (PDUs) and passed to the peer application layer of the receiver by way of appropriate transport layer services. Likewise, incoming PDUs are mapped to appropriate indications and confirmations. The user layer places an additional level of abstraction between layer 7 and the user application. On the one hand, it eases the handling of shared variables, allowing the user application to use them in a way similar to that of local variables, while being notified on external updates. On the other hand, it autonomously handles incoming network management calls, allowing application programmers to concentrate on the actual application. Conceptually, this functionality is broken up into a number of server entities.
34.4.2 Horizontal Runtime Communication This part of layer 7 provides two services for the communication between sensors and actuators according to the shared variable model (as described in Section 34.2). They use the transport layer multicast facility (which in turn is based on layer 2 group addressing). GroupValue_Read is a confirmed service to retrieve the current value of a shared variable from the network. GroupValue_Write is an unconfirmed service to indicate to group members that the value of a shared variable has changed and to provide them with the new value. Application layer primitives refer to group objects only, which therefore act as application layer service access points (A-SAPs). The association between group objects and group addresses is maintained by layer 7.* This clearly separates network management issues (i.e., communication relationships) from application functionality. In the following, the service primitives involved are introduced: GroupValue_Write.req causes the network stack to transmit the value associated with an A-SAP for perusal by interested subscribers. It is usually invoked whenever this value has changed. GroupValue_Write.ind informs the layer 7 user that a value update for a certain A-SAP was received. GroupValue_Read.req causes the network stack to transmit a poll request for the value associated with an A-SAP. Any number of responses (including none) can be expected. These may also contain differing values.† The system integrator is responsible for ensuring useful responses by selecting a single station — which can be assumed to be in possession of the correct value — to answer. For this purpose, the layer 7 users may ignore GroupValue_Read indications. *Actually, the application layer does not directly operate on group addresses, but on transport layer SAPs. These identifiers are mapped one to one to group addresses by layer 4. This additional level of indirection is necessary for implementation reasons only. †This is due to the fact that group objects may be associated with multiple groups, as discussed in Section 34.2 Group Communication.
GroupValue_Read.con informs the layer 7 user about the value associated with an A-SAP as seen by another group member. It corresponds to GroupValue_Write.ind, but does not imply that new information is available. The GroupValue_Read.req, which caused the activation of this primitive, need not have been issued locally. GroupValue_Read.ind informs the layer 7 user that a request for the current value associated with an A-SAP was made by another node. It is free to answer (or ignore) the request. GroupValue_Read.res corresponds to GroupValue_Write.req, but is not sent unsolicited (i.e., only in response to GroupValue_Read.ind).
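A small toy model can make the interplay of these primitives more tangible. The class below is purely illustrative: the real services are implemented inside the BCU network stack and are not exposed as a Python API, and the method names merely mirror the primitives listed above.

```python
from collections import defaultdict

class ToyGroupBus:
    """Illustrative publisher-subscriber model of the GroupValue services."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # group address -> group objects

    def subscribe(self, group_addr, on_write, on_read=None):
        """Register a group object (A-SAP) for GroupValue_Write.ind and,
        optionally, as a designated responder to read requests."""
        self._subscribers[group_addr].append((on_write, on_read))

    def group_value_write(self, group_addr, value: bytes):
        """GroupValue_Write.req: unconfirmed multicast of a new shared value."""
        for on_write, _ in self._subscribers[group_addr]:
            on_write(value)                      # delivered as GroupValue_Write.ind

    def group_value_read(self, group_addr):
        """GroupValue_Read.req: collect responses from willing responders."""
        return [on_read() for _, on_read in self._subscribers[group_addr]
                if on_read is not None]          # each one is a GroupValue_Read.res

bus = ToyGroupBus()
bus.subscribe(0x0A03, lambda v: print("relay group object set to", v),
              on_read=lambda: b"\x01")
bus.group_value_write(0x0A03, b"\x01")           # a wall switch publishing "on"
print(bus.group_value_read(0x0A03))              # [b'\x01']
```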
34.4.3 Network Management Commonly, the term network management is used to describe the making of all necessary provisions for transmitting data actually useful from a user’s perspective. In the EIB management model,* a distinction is made between management procedures whose semantics are sufficiently defined by the application layer services and others that demand specific knowledge about the internal structure of the device being managed. The latter are referred to as device management functions. Implementation-independent network management is effectively restricted to the assignment of individual addresses. IndividualAddress_Write allows the setting of the individual address of a node. The desired value is transmitted via broadcast. Usually, the node to be affected is designated manually by pressing a button on this device, which will cause it to enter the so-called programming mode. Devices not in programming mode will ignore the telegram. A variant of this service identifies the target node by including its unique serial number in the telegram. Corresponding services exist to read out the individual address of devices currently in programming mode or possessing a certain serial number. The purpose of Memory_Read and Memory_Write (on unicast connection-oriented communication relationships) is to respectively read and modify the content of memory locations within the communication controller. DeviceDescriptor_Read provides a unique identifier for the implementation structure of a target device, which specifically includes memory locations of management resources like the group address table. In cases where the user application is not hosted by the BCU, the services UserMemory_Read and UserMemory_Write can be used to access memory in an external application processor. By way of these services, a logical address space of 64 KB can be addressed. Mapping of these logical memory locations to actual memory locations is within the responsibility of the application processor. UserManufacturerInfo_Read identifies the associated memory map in this case. Traditionally, management resources were solely accessed by directly writing to the memory of the communication controller using these services. On current communication controllers, these resources can be accessed in a more structured way through the use of interface objects.
34.4.4 Interface Objects EIB interface objects serve to provide access to management resources in an implementation-independent way. Instead of requiring data points (called properties) to reside at specific memory locations, they allow a property client to refer to them via standardized IDs. This will typically be a PC-based tool or controller, but could as well be the local user application. Properties of related functionality are grouped into objects. The resulting structure is shown in Figure 34.7.† Every property ID implies certain semantics associated with the property value. For example, property ID 12 holds the manufacturer of the device. This information is encoded as an 8-bit unsigned integer, which is uniquely associated with the name of the manufacturer.
*Although being aligned with the OSI model, EIB — like all networks in widespread use today — does not attempt to implement the OSI network management model. †Note that property and type IDs shown are partially fictitious and for illustration only.
FIGURE 34.7 EIB interface objects.
The value of a property is retrieved by passing its object index and property ID to the PropertyValue_Read service. To determine which interface objects are available, the property client iterates through object indices, retrieving the object type by requesting the value of the property with the well-known ID 1. While the meaning of ID 1 obviously has to be independent of the object type, other property IDs may be unique to a specific object type. For every object type, a basic set of mandatory properties is specified, which the property client can immediately access by stating their ID. Modifications are possible using the PropertyValue_Write service, which works in analogy to PropertyValue_Read. For information about additional properties available, the property client can step through property indices, passing them to the PropertyDescription_Read service. Meta data returned include both the property ID and a type identifier, which allows the correct display of the value data even if the property ID is unknown. This self-description mechanism reduces the a priori knowledge necessary to interact with a yet unknown device, and thus significantly improves ad hoc management. Access to properties can be secured by assigning them access levels. Every access level is associated with a single, static 32-bit secret key, which has to be presented by a property client before access is granted. In this case, connection-oriented access is necessary; otherwise, properties can be accessed on connectionless communication relationships as well. System interface objects are related to system management and include the device object (holding general information like the device serial number), the address table object, the association table object, and the application program object. In addition, every device can provide any number of application interface objects related to the behavior of the user application. On the one hand, their properties can hold application parameters that are normally modified during setup time only. On the other hand, they can contain runtime values normally accessed through group objects. Although this facility could, in principle, also be used for horizontal communication (i.e., between sensors and actuators), no situation is conceivable where this would be of benefit. Instead, this mode of communication is intended for vertical access. For example, it allows a central monitoring station to retrieve parts of the process image spontaneously. When using group objects, this demand would have to be prepared at setup time by establishing a proper group communication relationship.
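The self-description mechanism boils down to two nested enumeration loops on the client side. The sketch below is hypothetical: the two callables stand in for whatever access layer (for example, a serial BCU link or the Falcon component) actually issues the PropertyValue_Read and PropertyDescription_Read telegrams, and signaling "no such object/property" by returning None is our assumption.

```python
OBJECT_TYPE_PROPERTY_ID = 1        # well-known ID holding the object type

def discover_interface_objects(property_value_read, property_description_read):
    """Walk a device's interface objects: step through object indices, read
    property ID 1 to learn the object type, then enumerate the property
    descriptions (ID plus type identifier) until none are left."""
    objects = []
    obj_index = 0
    while True:
        obj_type = property_value_read(obj_index, OBJECT_TYPE_PROPERTY_ID)
        if obj_type is None:                       # no further objects
            break
        descriptions = []
        prop_index = 0
        while True:
            desc = property_description_read(obj_index, prop_index)
            if desc is None:                       # no further properties
                break
            descriptions.append(desc)
            prop_index += 1
        objects.append((obj_index, obj_type, descriptions))
        obj_index += 1
    return objects
```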
34.5 Development, Configuration, and Integration Its vendor-neutral approach to planning and configuration is a distinctive feature of EIB. A common tool ensures uniform handling of this task for every certified EIB component. This makes multivendor systems
usable in practice, as it allows an integrator to mix and match products from the entire selection available without incurring excessive effort in setting up the system. The benefit of having a wide range of applicable components is further multiplied by the fact that bindings of group objects can be determined individually. This high degree of freedom already offers sufficient flexibility for the vast majority of applications. Even more complex interactions can be easily realized using PLCs with an EIB interface, which are available from various manufacturers. Standardized communication controllers (BCUs) have a key role in enabling uniform handling of devices by the setup tool. Therefore, they are highly relevant to system developers as well. Even if many appliances integrate EIB natively and considerable improvements can be achieved through its use alone, integrating it with other networks offers further evident benefits. These include tunneling to support various remote access scenarios, routing to extend both range and bandwidth, and gateways.
34.5.1 Configuration Tools For planning an EIB network and properly setting up the participating devices, EIBA* maintains and distributes a single, official standard PC software package. This Microsoft Windows-based software is called EIB Tool Software (ETS). ETS assists with defining the project in a structured way and is able to configure certified EIB devices of any manufacturer regarding appropriate behavior and communication relationships over the bus. This includes loading application programs and setting application parameters. ETS also provides bus monitoring functions for troubleshooting. Compliance with this tool is a certification requirement for EIB devices. Manufacturers are required to supply the necessary device descriptions along with their hardware. Tools for their creation are provided by EIBA as well. ETS can also be extended by way of plug-ins, should the configuration of a device require it. Only a minor number of devices need further setup tools in addition to ETS. For interacting with such external tools, ETS supports the export of project files. Besides its obvious advantage when considering multivendor systems, a vendor-neutral configuration tool also facilitates the standardization of training, as well as actually holding the training. It also introduces a stability factor, offering a predictable long-term perspective. For all its benefits, the common tool approach definitely involves trade-offs as well. Obviously, it closes the door on competition as a possible source for innovation and improvement. Such improvements may include alternate modes of interaction, like a simplified mode for beginners offering fewer options — and thus fewer possibilities for erroneous configuration. In trying to broaden the market, various manufacturers attempt to fill this gap by offering different proprietary configuration approaches. These allow the circumvention of ETS when restricting the system to the specific manufacturer’s subset of EIB devices. Further configuration abilities can still be accessed by using ETS, if needed. Of these proprietary alternatives, one is a software tool closely modeled on ETS. Another adopts a more graphical approach with dramatically reduced configuration possibilities. Other solutions entirely do without a PC, delegating configuration to a controller component, which is integrated in a device providing certain runtime functionality. In one case, this is a rail mount actuator, which as a controller will even automatically detect connected sensors and actuators, while the other is a simple control desk.
34.5.2 Node Development When targeting the development of a new field device (sensor or actuator), BCUs and bus interface modules (BIMs) provide a convenient platform. They implement the network stack, including the user layer, and can also co-host simple user applications. This allows the creation of new applications without
*Despite the transition to KNX, with Konnex Association taking over all other tasks, EIBA retains responsibility for specification, marketing, and sales of the software tool set for organizational reasons. Konnex Association is required to lay the necessary foundation regarding standardization and certification requirements to actively promote the single tool.
getting involved with the details of network communication more than absolutely necessary. Both commercial and open-source integrated development environments (IDEs) are available. BCUs and BIMs are based on MC68HC05 and MC68HC11 family microcontrollers, which are preloaded with EIB system software. Their digital inputs/outputs and analog inputs are connected to the standardized 10-pin PEI connector. While BCUs come with housing and shielding against electromagnetic interference, BIMs do not, allowing for tighter integration into manufacturer-specific solutions. Available versions vary in EEPROM and RAM size (which, for example, affects the maximum size of the group address table) and execution speed. Also, two system software variants exist, which mainly differ in that system 1 does not support interface objects. In addition, system 2 employs more complex load control procedures for management resources, like the application program. For high-volume designs, the transceiver ICs and microcontrollers used in BCUs and BIMs are also available separately. Since the processing power of a BCU is limited, more complex user applications have to be run on a separate microprocessor. In this case, the application processor can access all EIB-related functionality provided by the BCU (including support for group objects) using the PEI as a serial interface. Alternatively, the processor can implement the entire communication stack down to the MAC (medium access control) sublayer itself, using a standard transceiver IC for connection to the EIB medium. However, layer 2 is especially complex to implement, not least due to its tight timing requirements. This also entails high certification costs. Therefore, TP-UART (Universal Asynchronous Receiver/Transmitter) ICs offer a more convenient solution. They interact with the application processor on the level of layer 2 frames via an asynchronous interface (UART). Of the layer 2 services, only the determination of whether its node is being addressed is left to the application controller. Certified software implementations of the remaining EIB protocol stack are available for various microcontrollers. For applications with even higher demands on processing power or on the human–machine interface, PC-based solutions come into play. Connection to the EIB network is usually accomplished using serial communication with a BCU. Additionally, Universal Serial Bus (USB) interfaces are available for legacy-free PCs. As an alternative, access by way of an intermediate network gains importance, as will be detailed below. For Microsoft Windows-based systems, EIBA offers a certified software component called Falcon that provides a high-level API for accessing functionality throughout the network stack. For the Linux operating system, which provides an interesting perspective toward cost-effective embedded platforms, both commercial and open-source kernel-level drivers for BCU access as well as TP-UART-based serial interfaces are available. Finally, the Eiblet API [9], with an implementation available free of charge, allows Java applications to interact with an EIB system at multiple levels of abstraction.
34.5.3 Tunneling and Gateways As a field area network, EIB is well suited for transporting narrow-bandwidth control data. Yet, large building automation systems will generate lots of data, which are to be collected in a single place for data acquisition and central control. Since high-volume data transfer is out of scope of FANs, other network technologies have to step in and act as a backbone to smaller FAN segments. A straightforward approach is to use them as an alternative transport medium, encapsulating FAN protocol frames in the host protocol. This method, called tunneling, can be used for connecting FAN segments with a high-performance backbone to overcome limitations of the FAN protocol regarding bandwidth, range, or both. It also offers an alternative to physically decouple applications like remote configuration or visualization from the bus line. Although tunneling can perfectly cover certain fields of application, it has two main drawbacks. First, the fact that the host network remains transparent to FAN end devices will inevitably cause timing problems when the latency of the former exceeds the range acceptable for the FAN protocol. Second, it is only applicable to scenarios where all participants can handle the FAN protocol. In cases where this requirement is not convenient or acceptable, information has to be converted at the application level using gateways.
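The encapsulation idea is independent of any particular host protocol. The sketch below wraps an opaque EIB frame into a UDP datagram with a small made-up header; the header layout, the port number, and the function names are our own illustration and deliberately do not mimic the EIBnet or EIBnet/IP formats discussed next.

```python
import socket
import struct

TUNNEL_PORT = 50505                  # arbitrary example port, not an assigned one
HEADER = struct.Struct("!BBH")       # toy header: version, flags, payload length

def send_tunneled_frame(eib_frame: bytes, host: str) -> None:
    """Encapsulate an opaque link-layer frame in a UDP datagram (toy format)."""
    packet = HEADER.pack(1, 0, len(eib_frame)) + eib_frame
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, TUNNEL_PORT))

def receive_tunneled_frame(sock: socket.socket) -> bytes:
    """Strip the toy header and hand the embedded frame to the local stack."""
    packet, _addr = sock.recvfrom(4096)
    _version, _flags, length = HEADER.unpack_from(packet)
    return packet[HEADER.size:HEADER.size + length]
```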
FIGURE 34.8 EIBnet/IP.
Regarding IP connectivity, official tunneling support for EIB has been in existence for years. Since it was marketed as extending ETS for remote access (iETS), the server component on the EIB side is called the iETS server. Falcon can use this transparently as an alternative bus access method, instead of a locally connected BCU. Currently, a working draft for IP tunneling and routing (EIBnet/IP), which is incompatible with iETS, is awaiting voting for inclusion in the EIB specification. With EIBnet/IP routing, EIBnet/IP routers take the place of backbone or line couplers. IP multicast is used by an EIBnet/IP client to discover EIBnet/IP servers and for routing. Retrieval of further descriptive information on an EIBnet/IP server and tunneling is handled via IP unicast. Although EIBnet/IP leverages the practical omnipresence of IP technology, it will in most cases be advisable to keep the automation network logically separated. EIBnet/IP explicitly does not address security issues in order to keep the protocol lean and reduce the processing power required in EIBnet/IP routers. This is done on the assumption that the backbone IP network will be tightly supervised and highly restricted or closed to network traffic from the outside. Using EIBnet/IP only within an intranet or over a Virtual Private Network (VPN) connection is suggested as the key countermeasure against these threats. Besides allowing more fine-grained handling of security issues, such separation also simplifies performance considerations. It may be achieved by routing or firewalling, with authorized users able to open VPN tunnels, as shown in Figure 34.8. It is the task of the firewall to handle authentication, authorization, and encryption to avoid burdening EIBnet/IP servers with these tasks. For future releases, the integration of gateway functionality into EIBnet/IP with the capability to handle both regular and maintenance operation is planned. Until then, a variety of proprietary implementations for interfacing not only with IP but with all kinds of other networks (including Integrated Services Digital Network [ISDN], POTS, PROFIBUS-DP, and Bluetooth) are available on the market. As a general rule, however, these are limited to group communication. For BACnet [1], a network protocol dedicated to building automation and control, the situation is different. The mapping between EIB and BACnet objects is laid down in both standards [2, 6]. Together with the specification of EIB tunneling on Ethernet (EIBnet), a
defined integration of these systems is ensured. Still, semantic differences remain to be bridged by the system integrator through proper gateway setup. For high-demand visualization and facility management, various PC software packages exist that will connect to the EIB network locally or via tunneling, if necessary through multiple ports. Besides standard soft PLC, logging, and alert escalation functionality, some products will also automatically dispatch work orders and handle room management. Additionally, any OLE for Process Control (OPC)-enabled software can be used by way of one of the OPC servers for EIB available from various vendors (including EIBA). Still another way of connecting is through one or more gateways via an intermediate automation-level network. A variety of compact, rail-mounted embedded IP gateways are available, which will additionally provide other features like:
• ISDN terminal adaptor (TA) or POTS modem in addition to Ethernet interface (for dial-up Internet access or dial-in system access)
• Hypertext Transfer Protocol (HTTP)/Wireless Application Protocol (WAP) server
• Various messaging possibilities, including escalation management (e-mail, fax, cellular text messaging, voice call)
• Logic functions (including timer functions and lighting scenarios), partially with visual setup
• Logging (data acquisition)
• Time servers
• Visualization
• Media integration (brown goods)
Which of these functions are actually implemented in a particular device is related to the intended sales market. Some of these functions are not relevant for large-scale building automation, since they will be handled by the central server (like visualization and data acquisition), yet they are nonetheless of interest for smaller buildings. In the home environment, entirely different functionality comes to the forefront, like, for instance, being able to make a voice call to the gateway to receive a one-time PIN, which can be used to access the gateway from an Internet café without fear of eavesdropping. For home use, outsourcing of visualization functionality to data processing centers is offered as well. Besides personalized portal sites, these functions will provide remote services like dispatching voice calls in response to an alert condition.
34.6 The Future of EIB: KNX In 2002, EIB merged with BatiBus [5] and EHS (European Home System) [5] to form the new KNX standard. Although EIB is now correctly known by the name KNX TP1/PL110 S-Mode, EIB will definitely remain the label for a specific subset of KNX functionalities for quite some time. Besides the KNX Handbook [8] provided by Konnex Association, KNX is documented through formal standards as well. The relevant family of European standards [4] is maintained by CENELEC TC 205 (Home and Building Electronic Systems) in cooperation with Konnex Association. The normative process is also coordinated with CEN TC 247, which is concerned with building controls and has in the past also defined standards covering EIB technology. From the perspective of EIB, the transition to KNX brings about certain extensions to the well-known protocol stack, none of which are mandatory to implement. Regarding physical media, an additional twisted pair (TP0), an additional powerline (PL132), and the RF medium already mentioned in Section 34.1 were added. For both TP media, a specification that allows bus power to be provided in a distributed way by any device, instead of requiring dedicated BPSUs, was introduced. KNX also offers new configuration modes. While compliance with the single tool ETS (referred to as S-Mode [system mode] configuration) will be a requirement for all KNX devices, additional mechanisms are available in parallel.
E-Modes (easy modes) aim to provide installers with easier and less error-prone (albeit less powerful and flexible) ways of linking devices without the need for ETS. This is accomplished by:
• Push buttons: The installer pushes special buttons on the devices that are to work together.
• Logical tags: Devices that have the same tag number set (for example, via a code wheel) will cooperate.
• Central controllers: This mode, making use of a dedicated controller device, is similar to the proprietary approaches outlined in Section 34.5.
While the E-Modes like EIB/S-Mode are targeted at the installation domain, A-Mode (automatic mode) is intended for the integration of loose goods, in particular white goods. A-Mode devices will autonomously integrate themselves into the network in a plug-and-play fashion, establishing the necessary communications links without user intervention. Since E- and A-Modes cannot rely on an integrator’s competence to combine the data points of node applications into a properly working system, interworking issues are given ample room in the KNX specification. Functional blocks for various application domains are being specified, which describe sets of group objects and interface object properties with clearly defined semantics. Instead of the free binding possible in the S-Mode, E- and A-Modes always link data points at the granularity of functional blocks. Devices are not expected to implement the KNX specification in its entirety. Rather, they are expected to select a subset (profile) appropriate for the sales market intended. Consequently, certification is also done against specific profiles. With simpler configuration modes and other aspects aimed at accommodating resource-limited devices within the KNX standard, KNX opens up paths into the mass market. Also, endeavors to clarify and restructure the specification can be identified. Among others, this includes issues of terminology. The present chapter already uses the new terminology wherever applicable.
References [1] ANSI/ASHRAE 135 (2001): BACnet: A Data Communication Protocol for Building Automation and Control Networks. [2] ANSI/ASHRAE, Addendum d to ANSI/ASHRAE Standard 135-2001 (2004). [3] ANSI/EIA-709.1 (1999): Control Networking Standard. [4] EN 50090 (1994–2004): Home and Building Electronic Systems (HBES). [5] ENV 13154-2 (1998): Data Communication for HVAC Applications: Field Net: Part 2: Protocols. [6] ENV 13321-2 (1997): Data Communication for HVAC Applications: Automation Net: Part 4: EIB. [7] IEC/EN 60929, Annex E (2003): Control Interface for Controllable Ballasts. [8] Konnex Association (2004): KNX Specifications, Version 1.1. [9] Robert Ott and Heinrich Reiter (1999), Connecting EIB components to distributed Java applications, in Proceedings of the 7th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA ’99), Vol. 1, pp. 23–26.
Further Reading Konnex Association: http://www.konnex.org. EIB Association: http://www.eiba.com. Thilo Sauter, Dietmar Dietrich, and Wolfgang Kastner, editors, EIB: Installation Bus System, Wiley-VCH, Weinheim, 2001. European Home Systems Association: http://www.ehsa.com. BatiBus Club International: http://www.batibus.com. ISO/IEC JTC 1 SC 25 WG 1, Home Electronic System Standards: http://hes-standards.org. LonMark International: http://www.lonmark.org.
35 Fundamentals of LonWorks/EIA-709 Networks: ANSI/EIA-709 Protocol Standard (LonTalk)
Dietmar Loy LOYTEC electronics GmbH
35.1 Node Architecture
35.2 Network Infrastructure
35.3 Interoperability and Profiles
35.4 Development and Integration Tools
35.5 New Developments for ANSI/EIA-709 Networks
References
One of the characteristics of LonWorks® is its unique seven-layer protocol implementation called LonTalk® or LonWorks protocol. Unlike other fieldbus protocols, all seven layers of the International Organization for Standardization (ISO)/Open Systems Interconnection (OSI) reference model are actually defined and implemented in every single network node. One characteristic is its media-independent OSI layer 2, which supports various communication media like twisted-pair cables, power-line communication, radio frequency channels, infrared connections, fiber-optic channels, and Internet Protocol (IP) connections based on the EIA-852 protocol standard. The basic frame layout for twisted-pair media is sketched in Figure 35.1. The preamble allows for bit synchronization at the receiver, and the start bit (SB) indicates the beginning of the byte boundary. The header field carries information about the importance of the data packet expressed in a priority bit and an estimation of the number of expected data packets on the bus in the near future. This estimation is used to adjust the bus arbitration algorithm. The source address holds the subnet/node address of the transmitting node, and the destination address holds the address of the receiving node(s), followed by the user data, which can be between 1 and 228 bytes long. Finally, a 16-bit cyclic redundancy check (CRC) (CCITT polynomial) protects the frame from bit errors (Hamming distance 4), and the code violation (CV) indicates the end of the frame. After the CV, a new round of bus arbitration can take place if nodes have data to be sent over the network [1]. On twisted-pair cables a p-persistent carrier-sense multiple-access (CSMA) bus arbitration scheme is used. For other communication channels, e.g., IP over Ethernet, the arbitration scheme defined for that medium is used by the EIA-709 protocol stack. Another characteristic is its elaborated OSI layer 3, which supports a variety of different addressing schemes and advanced routing capabilities, as shown in Figure 35.2. Every node in the network can be identified with a unique 48-bit physical address (Neuron ID) and with a logical address composed of
FIGURE 35.1 LonTalk frame layout on a twisted-pair cable.
three address elements: the domain identifier, subnet number, and node number. Node numbers can be in the range of 1 to 127, and subnet numbers between 1 and 255; hence, a maximum of 127 × 255 = 32,385 nodes can be addressed within a single domain and up to 2^48 domains can be addressed. Domain gateways can be built between logical domains in order to allow communication across domain boundaries. Groups can be formed in order to send a single data packet to a group of nodes using a multicast addressed message. Optionally, routers can be used to keep local traffic within the subnet and only forward packets that are addressed to another subnet, as shown in Figure 35.2. An EIA-709 node can send a unicast addressed message to exactly one node using either the physical Neuron ID or the logical subnet/node (S/N) address, or a multicast addressed message to either a group of nodes (group address) or all nodes in the subnet or all nodes in the entire domain (broadcast address). OSI layer 4 supports four service types. The unacknowledged service simply transmits the data packet from the sender to the receiver and hopes that the message does not get lost or destroyed. The unacknowledged repeated service transmits the same data packet a number of times and assumes that at least one data packet will get through to the receiver. The number of retries is programmable. The acknowledged service transmits the data packet and expects an acknowledgment to come back from the receiver. If the acknowledgment is not received, the transmitter retransmits the same data packet. The number of retries is again programmable. The request–response service sends a request message to the receiver and the receiver must respond with a response message. This service type can be used to retrieve, e.g., statistics information from a network node. Authentication can be used to authenticate the sender at the receiver side. The interfaces to the application program are the so-called network variables (NVs). NVs are variables (like variables in a C program) that can be propagated over the network. NVs have a type, e.g.,
FIGURE 35.2 Addressing elements in EIA-709 networks. Up to 32,385 nodes can be addressed within one domain and up to 2^48 domains are possible.
NVs have a type, e.g., SNVT_temp to represent a temperature in degrees Celsius or SNVT_amp to represent an electric current in amperes or milliamperes. SNVT is the acronym for standard network variable type. Each SNVT has a valid range defined by upper and lower boundaries, a resolution, and an SI (International System of Units) unit. Network nodes communicate with each other by exchanging NVs. Another way to communicate between nodes is explicit messages. These messages do not use predefined variable types but can use any combination of bits and bytes assembled in the data packet.
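The frame protection described above relies on a 16-bit CRC with the CCITT generator polynomial x^16 + x^12 + x^5 + 1. As a minimal illustration, the following C sketch computes such a CRC over an arbitrary byte buffer; the initial register value of 0xFFFF and the bit ordering are assumptions made for this example, since the exact parameters used on the wire are fixed by the EIA-709 specification, not by this sketch.

#include <stdint.h>
#include <stddef.h>

/* Bit-serial CRC-16 with the CCITT polynomial 0x1021 (x^16 + x^12 + x^5 + 1).
 * The 0xFFFF preset is illustrative; see the EIA-709 specification for the
 * exact parameters used in LonTalk frames. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}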
35.1 Node Architecture
For the last 10 years the Neuron® chip manufactured by Motorola, Toshiba, and Cypress was the only microcontroller that supported the LonTalk protocol. Only in the last few years, with the adoption of the LonTalk protocol as a European and ANSI standard, have some platform-independent implementations become available on the market [21]. The traditional LonWorks node is shown in Figure 35.3. The Neuron chip executes the LonTalk protocol and the application program that interfaces to sensors and actuators through the input/output (I/O) block. Two types of Neuron chips are available. The 3150 provides an external memory bus interface to expand the internal EEPROM and SRAM with external Flash and SRAM. The 3120 is optimized for low-node-cost applications and has a built-in 10-kByte ROM, 2-kByte EEPROM, and 2-kByte SRAM, but has no external memory interface [17]. An ANSI C derivative language called Neuron C is used to program the Neuron chip. Neuron C uses language extensions to schedule application events and to react to incoming data packets (network variables) from the network interface.
LonWorks supports a variety of different communication media. The most popular media today are 78-kbps (EIA-709.3) or 1.25-Mbps twisted-pair communication, 4-kbps power-line communication (EIA-709.2), a 1.25-Mbps fiber-optic interface, and RS-485 interfaces at various bit rates between a few hundred bps and 1.25 Mbps. A brand new channel is the EIA-852 IP channel for tunneling EIA-709 data packets through IP (intranet, Internet) networks.
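Real applications for the chip are written in Neuron C itself; purely as an illustration of the event-driven pattern just described, the following sketch in ordinary ANSI C approximates what a Neuron C when clause expresses: the scheduler polls for an updated network variable and dispatches an application handler. All names in this sketch are hypothetical.

#include <stdio.h>
#include <stdbool.h>

/* Hypothetical application-side view of one network variable (NV). */
typedef struct {
    const char *name;   /* NV name, e.g. "nviSetPoint" */
    int value;          /* last received value */
    bool updated;       /* would be set by the protocol stack on reception */
} NetworkVariable;

static void on_nv_update(NetworkVariable *nv)
{
    /* Application reaction, e.g. drive an actuator output. */
    printf("%s changed to %d\n", nv->name, nv->value);
}

int main(void)
{
    NetworkVariable nviSetPoint = { "nviSetPoint", 0, false };

    /* Simulate one incoming update; on a real node the stack sets the flag. */
    nviSetPoint.value = 21;
    nviSetPoint.updated = true;

    /* Scheduler loop: poll and dispatch, analogous to a when clause. */
    for (int tick = 0; tick < 3; tick++) {
        if (nviSetPoint.updated) {
            nviSetPoint.updated = false;
            on_nv_update(&nviSetPoint);
        }
    }
    return 0;
}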
FIGURE 35.3 Typical LonWorks network node architecture.
35.2 Network Infrastructure
Great care must be taken when designing the network infrastructure for an EIA-709 network. There are a number of requirements that must be met in order to guarantee reliable communication between all the nodes in the network. The maximum cable length must be met, the network cables must be properly terminated, the number of nodes on each physical channel must not exceed the specification, the expected network traffic must be estimated, and a network topology must be found that keeps local traffic local, so that only data packets that need to travel across network segments leave the local segment. The bandwidth utilization must be kept below a certain limit, and the number of CRC errors due to collisions and noise on the network media must be measured on each channel and kept within well-defined boundaries. Only a reliable network infrastructure with built-in diagnostics and troubleshooting functionality can guarantee reliable operation of the building or factory.
Depending on the network media and the network transceiver, a variety of network topologies are possible with LonWorks nodes. Traditional bus, ring, and star topologies, as well as free topology, are supported. Complex networks require networking elements, available for EIA-709, to separate local traffic from traffic crossing segment boundaries. Such networking elements include layer 1 repeaters to extend the physical length of a network cable, layer 2 bridges and layer 3 routers to decouple individual segments from a networking backbone, and gateways to bridge between the levels of hierarchy in the automation pyramid (Figure 35.2). Modern network infrastructure products [21] have built-in network diagnostics capabilities and can monitor the health state of the network 24 hours a day, 7 days a week. They immediately report network malfunctions or deviations from normal operation to the system operator. All monitoring and reporting can be done either locally on site or remotely through an intranet or Internet connection.
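The two health criteria mentioned above, bandwidth utilization and CRC error rate, lend themselves to a simple automated check. The C sketch below shows the idea; the limit values are arbitrary placeholders chosen for illustration and are not taken from the EIA-709 specification or from any product documentation.

#include <stdio.h>

typedef struct {
    double utilization;    /* fraction of channel bandwidth in use, 0.0..1.0 */
    double crc_error_rate; /* CRC errors per received packet, 0.0..1.0 */
} ChannelStats;

/* Placeholder limits for illustration only. */
#define MAX_UTILIZATION    0.50
#define MAX_CRC_ERROR_RATE 0.01

static int channel_healthy(const ChannelStats *s)
{
    return s->utilization <= MAX_UTILIZATION &&
           s->crc_error_rate <= MAX_CRC_ERROR_RATE;
}

int main(void)
{
    ChannelStats ch = { 0.37, 0.002 };   /* example measurement */
    printf("channel %s\n", channel_healthy(&ch) ? "OK" : "needs attention");
    return 0;
}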
35.3 Interoperability and Profiles
Open standards allow individual companies to build single pieces of a bigger puzzle and enable system integrators (SIs) to design complete systems consisting of those building blocks. Node developers must follow certain rules in order to make the bigger picture work. These rules are defined in interoperability guidelines and in profiles. Compatible does not necessarily mean interoperable, and certainly not plug and play. More elaborate guidelines need to be established for plug-and-play, interchangeable, and interworkable products. Different committees have set guidelines to define those terms and to establish the rules. The LonMark organization, for example, has published interoperability guidelines for nodes that use the LonTalk protocol [18]. It is important to note that interoperability must be guaranteed on all seven OSI layers. Interoperability on all seven OSI layers is still not a guarantee for having interworkable products. A typical node uses only a subset of all the possible protocol features. Profiles define these subsets based on the intended use of the node. Task groups within LonMark define functional profiles for analog input, analog output, temperature sensor, humidity sensor, CO2 sensor, Variable Air Volume (VAV) controller, fan coil unit, chiller, thermostat, damper actuator, etc. As of this writing, LonMark hosts different task groups for fire, home/utility, petrol station, HVAC, lighting, sun blind, and security-type applications.
35.4 Development and Integration Tools
In order to program the Neuron chip, the user has two tool options. For node manufacturers, it might be sufficient to buy the NodeBuilder from Echelon; the more advanced and more expensive tool is the LonBuilder from Echelon. Both tools allow writing Neuron C programs, compiling and linking them, and downloading the final application into the target node hardware. LonBuilder supports simultaneous debugging of multiple nodes, whereas NodeBuilder only supports debugging of one node at a time. LonBuilder has a built-in protocol analyzer and a network binder to create communication relationships between network nodes.
FIGURE 35.4 Platform-independent ANSI/EIA-709 implementation. Simulation environment with virtual nodes linked to a physical network.
System integrators need tools to design the network and define the communication relationships before the nodes are actually installed in the field. Similar to creating a schematic of resistors, capacitors, integrated circuits (ICs), and diodes, the SI creates a floor plan of all the nodes in the network. After physically installing the nodes in the network, a network management tool is required to configure (also called commission) the nodes. During commissioning, nodes get a logical address (subnet/node/domain) assigned and the binding is created. Binding is the name of the process to establish the communication relationships between sensors, actuators, and controller nodes. Protocol analyzers are used to debug communication problems and to gather traffic and error statistics. During the maintenance phase of the network, the network management tool also supports replacing faulty nodes and extending the network with additional nodes and changes in the communication relationships.
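Conceptually, a binding is a table entry that ties a source network variable to a destination address and a transport service. The C sketch below is a hypothetical representation of such an entry, built only from the addressing modes and layer 4 service types introduced earlier in this chapter; the field and type names are illustrative and are not taken from any network management tool.

#include <stdio.h>
#include <stdint.h>

/* Addressing modes described earlier: subnet/node, group, broadcast, Neuron ID. */
typedef enum { ADDR_SUBNET_NODE, ADDR_GROUP, ADDR_BROADCAST, ADDR_NEURON_ID } AddressMode;

/* The four OSI layer 4 service types of EIA-709. */
typedef enum { SVC_UNACKD, SVC_UNACKD_RPT, SVC_ACKD, SVC_REQUEST_RESPONSE } ServiceType;

typedef struct {
    uint16_t    source_nv_index;  /* selector of the NV on the sending node */
    AddressMode mode;
    uint8_t     subnet;           /* 1..255, for subnet/node and broadcast */
    uint8_t     node;             /* 1..127, for subnet/node addressing */
    uint8_t     group;            /* group number, for group addressing */
    ServiceType service;
    uint8_t     retries;          /* for repeated/acknowledged services */
} BindingEntry;

int main(void)
{
    /* Example: bind NV 4 to node 12 in subnet 3 using the acknowledged service. */
    BindingEntry b = { 4, ADDR_SUBNET_NODE, 3, 12, 0, SVC_ACKD, 3 };
    printf("NV %u -> subnet %u, node %u\n",
           (unsigned)b.source_nv_index, (unsigned)b.subnet, (unsigned)b.node);
    return 0;
}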
35.5 New Developments for ANSI/EIA-709 Networks
As mentioned earlier, until 1999 the Neuron chip was the only vehicle to execute the LonTalk protocol. Due to the limited resources available on the Neuron chip (CPU performance, memory space, I/O capabilities, etc.), many applications could not be solved with the existing technology. Recent hardware platform-independent implementations of the ANSI/EIA-709 protocol allow for good scalability of the CPU performance and memory requirements tailored to the specific application. Furthermore, simulation of the network behavior before actual hardware exists is now possible, as shown in Figure 35.4. Different application programs 1 to n run on top of the ANSI/EIA-709 protocol stack, which talks to an L-Chip Driver Interface (LDI) layer, which in turn communicates with the virtual network devices. As shown on the right-hand side in Figure 35.4, the LDI can also talk to a real physical network; hence, existing nodes in a network can be part of the system simulation. The application programs on top do not see a difference if the node already exists as a physical device or if it is still a virtual device simulated on a PC or workstation [19][20]. This architecture is fully portable between different hardware platforms and different operating systems and allows cross-development, e.g., on a Windows platform for an ARM7-based target hardware. Since the Application Programming Interface (API) on top of the EIA-709 protocol stack is the same on all hardware platforms and under all operating systems, the final application programs can easily be ported between different target systems. This approach allows full hardware–software codesign, since the application programs can be developed while the hardware development of the network node is still going on.
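The portability argument rests on a thin link-driver layer with a fixed calling interface that forwards packets either to real hardware or to a simulated channel. The plain-C sketch below illustrates this idea with a function-pointer table; the names used here (LinkDriver, send, recv) are illustrative and do not reproduce the actual LDI of any product.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Link-driver interface: the protocol stack sees only these two calls. */
typedef struct LinkDriver {
    int (*send)(struct LinkDriver *self, const uint8_t *pkt, size_t len);
    int (*recv)(struct LinkDriver *self, uint8_t *buf, size_t maxlen);
} LinkDriver;

/* A "virtual" driver that loops packets back, standing in for a simulated
 * channel; a physical driver would talk to the real transceiver instead. */
typedef struct {
    LinkDriver base;
    uint8_t    loopback[256];
    size_t     stored;
} VirtualDriver;

static int v_send(LinkDriver *self, const uint8_t *pkt, size_t len)
{
    VirtualDriver *v = (VirtualDriver *)self;
    if (len > sizeof v->loopback) return -1;
    memcpy(v->loopback, pkt, len);
    v->stored = len;
    return 0;
}

static int v_recv(LinkDriver *self, uint8_t *buf, size_t maxlen)
{
    VirtualDriver *v = (VirtualDriver *)self;
    size_t n = v->stored < maxlen ? v->stored : maxlen;
    memcpy(buf, v->loopback, n);
    return (int)n;
}

int main(void)
{
    VirtualDriver vd = { { v_send, v_recv }, { 0 }, 0 };
    LinkDriver *ldi = &vd.base;            /* the stack holds only this pointer */
    uint8_t out[] = { 0x01, 0x02, 0x03 }, in[256];

    ldi->send(ldi, out, sizeof out);
    int n = ldi->recv(ldi, in, sizeof in);
    printf("received %d bytes back from the virtual channel\n", n);
    return 0;
}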
With the new ANSI/EIA-852 standard that defines how EIA-709 packets are tunneled through IP (Internet Protocol) networks, standard hardware platforms with an Ethernet interface like PCs or embedded controllers can be used to develop LonWorks-based network nodes.
References
[1] D. Dietrich, D. Loy, H.-J. Schweinzer, LON-Technologie, Verteilte Systeme in der Anwendung, 2. Auflage, Hüthig Verlag, Heidelberg, 1999.
[2] D. Dietrich, P. Neumann, H. Schweinzer, Fieldbus Technology: System Integration, Networking, and Engineering, Springer, New York, 1999.
[3] A.S. Tanenbaum, Computer Networks, 2nd edition, Prentice Hall, Englewood Cliffs, NJ, 1989.
[4] A.S. Tanenbaum, Distributed Operating Systems, Prentice Hall, Englewood Cliffs, NJ, 1994.
[5] C.J. Koomen, The Design of Communicating Systems, Kluwer Academic Publishers, Boston, 1991.
[6] H. Kopetz, Design Principles for Distributed Embedded Applications, Kluwer Academic Publishers, Boston, 1997.
[7] Pradeep K. Sinha, Distributed Operating Systems, Institute of Electrical and Electronics Engineers, New York, 1997.
[8] G.H. Gürtler, Fieldbus standardization, the European approach and experiences, in Feldbustechnik in Forschung, Entwicklung und Anwendung, Springer, New York, 1997, pp. 2–11.
[9] J.-P. Thomesse and M. Leon Chavez, Main paradigms as a basis for current fieldbus concepts, in Fieldbus Technology, Springer, New York, 1999, pp. 2–15.
[10] D. Dietrich, T. Sauter, Evolution potentials for fieldbus systems, in Proceedings of WFCS ’00, Porto, Portugal, 2000.
[11] T. Sauter, P. Palensky, A closer look into Internet-fieldbus connectivity, in Proceedings of WFCS ’00, Porto, Portugal, 2000.
[12] Th. Bangemann, R. Dübner, A. Neumann, Integration of fieldbus objects into computer-aided network facility management systems, in Proceedings of FeT ’99, D. Dietrich, P. Neumann, H. Schweinzer (eds.), Springer, New York, 1999, pp. 180–187.
[13] S. Rüping, H. Klugmann, K.-H. Gerdes, S. Mirbach, Modular OPC-server connecting different fieldbus systems and Internet Java applets, in Proceedings of FeT ’99, D. Dietrich, P. Neumann, H. Schweinzer (eds.), Springer, New York, 1999, pp. 240–246.
[14] P. Neumann, F. Iwanitz, Integration of fieldbus systems into distributed object-oriented systems, in Proceedings of WFCS ’97, 1997, pp. 247–253.
[15] U. Döbrich, P. Noury, ESPRIT Project NOAH: introduction, in Fieldbus Technology, Springer, New York, 1999, pp. 414–422.
[16] P. Palensky, The convergence of intelligent software agents and field area networks, in Proceedings of ETFA ’99, Barcelona, 1999, pp. 917–922.
[17] Motorola, LonWorks Technology Device Data, DL159, Rev. 4, Q4/97.
[18] LonMark, Application Layer Interoperability Guidelines, Version 3, LonMark Interoperability Association, 1996.
[19] A. Bauer, LC-SIM, Loytec electronics GmbH, www.loytec.com, March 2000.
[20] A. Bauer, S. Soucek, Simulation of ANSI/EIA709 Networks, LonWorld, October 2000.
[21] H. Schweinzer, VENUS-Vienna Embedded Networking Utility Suite, Loytec electronics GmbH, www.loytec.com, October 1999.
36 The Standard Message Specification for Industrial Automation Systems: ISO 9506 (MMS)

Karlheinz Schwarz
Schwarz Consulting Company (SCC)

36.1 Introduction
36.2 MMS Client–Server Model
36.3 Virtual Manufacturing Device
    MMS Models and Services
36.4 Locality of the VMD
36.5 Interfaces
36.6 Environment and General Management Services
36.7 VMD Support
36.8 Domain Management
    What Is the Domain Scope?
36.9 Program Invocation Management
    Program Invocation Services
36.10 MMS Variable Model
    Access Paths • Objects of the MMS Variable Model • Unnamed Variable • MMS Address of the Unnamed Variable • Services for the Unnamed Variable Object • Explanation of the Type Description • Named Variable • Access to Several Variables • Services
36.11 Conclusion
References
Further Resources
36.1 Introduction The international standard Manufacturing Message Specification (MMS) [1, 2] is an Open Systems Interconnection (OSI) application layer messaging protocol designed for the remote control and monitoring of devices such as remote terminal units (RTUs), programmable logic controllers (PLCs), numerical controllers (NCs), or robot controllers (RCs). It provides a set of services allowing the remote manipulation of variables, programs, semaphores, events, journals, etc. MMS offers a wide range of services satisfying both simple and complex applications.
For years, the automation of technical processes has been marked by increasing requirements with regard to flexible functionalities for the transparent control and visualization of any kind of process [7, 8]. Mere cyclic data exchange will more and more be replaced by systems that join together independent yet coordinated systems — like communication, processing, open- and closed-loop control, quality protection, monitoring, configuring, and archiving systems. These individual systems are interconnected and work together. As a common component, they require a suitable real-time communication system with adequate functions. The MMS standard defines common functions for distributed automation systems.
The expression manufacturing, which stands for the first M in MMS, was badly chosen. The MMS standard does not contain any manufacturing-specific definitions. The application of MMS is as general as the application of a personal computer. MMS offers a platform for a variety of applications.
The first version of the MMS documents was published in 1990 by ISO TC 184 (Industrial Automation) as an outcome of the GM initiative Manufacturing Application Protocols (MAP). The current version was published in 2003:
• Part 1: ISO 9506-1 Services: Describes the services that are provided to remotely manipulate the MMS objects. For each service, a description is given of the parameters carried by the service primitives. The services are described in an abstract way that does not imply any particular implementation.
• Part 2: ISO 9506-2 Protocol: Specifies the MMS protocol in terms of messages. The messages are described with ASN.1, which gives the syntax.
Today, MMS is being implemented — unlike the practice 15 years ago and unlike the supposition still partly found today — on all common communication networks that support the safe transport of data. These can be networks like Transmission Control Protocol (TCP)/Internet Protocol (IP) or International Organization for Standardization (ISO)/OSI on Ethernet, a fieldbus, or simple point-to-point connections like high-level data link control (HDLC), RS 485, or RS 232. MMS is independent of a seven-layer stack. Since MMS was originally developed in the MAP environment, it was generally believed earlier that MMS could be used only in connection with MAP.
MMS is the basis of the international project Utility Communication Architecture (UCA™, IEEE TR 1550) [13], IEC 60870-6-TASE.2 (Inter-Control Center Communication) [9–12], IEC 61850 (Communication Networks and Systems in Substations) [16–20], and IEC 61400-25 (Communications for Monitoring and Control of Wind Power Plants) [15, 21]. This chapter introduces the basic concepts of MMS applied in the above-mentioned standards.
36.2 MMS Client–Server Model
MMS describes the behavior of two communicating devices by the client–server model (Figure 36.1). The client can, for example, be an operating and monitoring system, a control center, or another intelligent device. The server represents one or several real devices or whole systems. MMS uses an object-oriented modeling method with object classes (named variable, domain, named variable list, journal, etc.), instances of the object classes, and methods (services like read, write, information report, download, read journal, etc.).
The standard is comprehensive. This does not at all mean that an MMS implementation must be complex or complicated. If only a simple subset is used, then the implementation can also be simple. MMS implementations are meanwhile available in the third generation. They allow the use of MMS on both PC platforms and embedded controllers.
The MMS server represents the objects that the MMS client can access. The virtual manufacturing device (VMD) object represents the outermost “container” in which all other objects are contained. Real devices can play both roles (client and server) simultaneously. A server in a control center can, for its part, be a client with respect to a substation. MMS basically describes the behavior of the server.
FIGURE 36.1 MMS client–server model.
The server contains the MMS objects, and it also executes services. MMS can be regarded as server-centric. In principle, a system contains more devices that function as servers (for example, controllers and field devices) than devices that act as clients (e.g., PCs and workstations). The calls that the client sends to the server are described in ISO 9506-1 (services). These calls are processed and answered by the server. The services can also be referred to as remote calls, commands, or methods. Using these services, the client can access the objects in the server. It can, for example, browse through the server, i.e., make visible all available objects and their definitions (configurations). The client can define, delete, change, or access objects via reading and writing.
An MMS server models real data (e.g., a temperature measurement, a counted measurand, or other data of a device). These real data and their implementation are concealed or hidden by the server. MMS does not define any implementation details of the servers. It is only defined how the objects behave and represent themselves to the outside (from the point of view of the wire) and how a client can access them. MMS provides only very generic object classes. The named variable, for example, allows the structuring of any information provided for access by an application. The content (the semantics of the exchanged information) of the named variables is outside of the MMS standard.
Several other standards define common and domain-specific information models. IEC 61850 defines the semantics of many points in electric substations. For example, “Atlanta26/XCBR3.Pos.stVal” is the position of the third circuit breaker in substation Atlanta26. The names XCBR, Pos, and stVal are standardized names. The forthcoming standard IEC 61400-25 (Communication for Wind Power Plants) defines a comprehensive list of named points specific to wind turbines. For example, “Tower24/WROT.RotSpd.mag” is the (deadbanded) measured value of the rotor speed of Tower24. “RotSpd.avgVal” is the average value (calculated based on a configuration attribute “avgPer”). These information models are based on common data classes like measured value, three-phase value (delta and Y), and single-point status.
36.3 Virtual Manufacturing Device
According to Figure 36.2, the real data and devices are represented — in the direction of a client — by the virtual manufacturing device. In this regard, the server represents a standard driver that maps the real world to a virtual one. The following definitions help to clarify the modeling in the form of a virtual device:

If it is there and you can see it, it is REAL.
If it is there and you cannot see it, it is TRANSPARENT.
If it is not there and you can see it, it is VIRTUAL.
If it is not there and you cannot see it, it is GONE.
(Roy Wills)
The VMD can represent, for example, a variable “Measurement3” whose value may not permanently exist in reality; only when the variable is being read will a measurand transducer get started in determining
FIGURE 36.2 Hiding real devices in the VMD.
the value. All objects in a server can already be contained in a device before the delivery of a device. The objects are predefined in this case. Independent of the implementation of a VMD, data and the access to data are always treated in the same way. This is completely independent of the operating system, the programming language, and memory management. Like printer drivers for a standard operating system hide the various real printers, so a VMD also hides real devices. The server can be understood as a communication driver that hides the specifics of real devices. From the point of view of the client, only the server with its objects and its behavior is visible. The real device is not visible directly. MMS merely describes the server side of the communication (objects and services) and the messages that are exchanged between client and server. The VMD represents the behavior of a real device as far as it is visible “over the wire.” It contains, for example, an identification of manufacturer, device type, and version. The virtual device contains objects like variables, lists, programs, data areas, semaphores, events, journals, etc. The client can read the attributes of the VMD (Figure 36.3); i.e., it can browse through a device. If the client does not have any information about the device, it can view all the objects of the VMD and their attributes by means of the different “get” services. With that, the client can perform a first plausibility check on a just-installed device by means of a “get(object-attribute)” service. It learns whether the installed device is the ordered device with the right model number (model name) and the expected issue number (revision). All other attributes can also be checked (for example, variable names and types). The attributes of all objects represent a self-description of the device. Since they are stored in the device itself, a VMD always has the currently valid and thus consistent configuration information of the respective device. This information can be requested online directly from the device. In this way, the client always receives up-to-date information. MMS defines some 80 functions: • Browsing functions about the contents of the virtual device: Which objects are available? • Functions for reading, reporting, and writing of arbitrarily structured variable values • Functions for the transmission of data and programs, for the control of programs, and many other functions The individual groups of the MMS services and objects are shown in Figure 36.4. MMS describes such aspects of the real device that shall be open, i.e., standardized. An open device must behave as described by the virtual device. How this behavior is achieved is not visible, nor is it relevant to the user that accesses the device externally. MMS does not define any local, specific interfaces in the real systems. The interfaces are independent of the functions that shall be used remotely. Interfaces in connection with MMS are always understood in the sense that MMS quasi-represents an interface between the devices
FIGURE 36.3 VMD attributes.
FIGURE 36.4 MMS objects and services.
and not within the devices. This interface could be described as an external interface. Of course, interfaces are also needed for implementations of MMS functions in the individual real devices. These shall not and cannot be defined by a single standard. They are basically dependent on the real systems — and these vary to a great extent.
36.3.1 MMS Models and Services

36.3.1.1 ISO 9506-1 (Part 1): Service Specification

36.3.1.1.1 Environment and General Management Services
Two applications that want to communicate with each other can set up, maintain, and close a logical connection (initiate, conclude, abort).
36.3.1.1.2 VMD Support
The client can query the status of a VMD, or the status is reported to it (unsolicited status); the client can query the different lists of the objects (get name list), the attributes of the VMD (identify), or change the names of objects (rename).

36.3.1.1.3 Domain Management
Using a simple flow control (download, upload, delete domains, etc.), programs and data of arbitrary length can be transmitted between client and server and also a third station (and vice versa). In the case of simple devices, the receiver of the data stream determines the speed of the transmission.

36.3.1.1.4 Program Invocation Management
The client can create, start, stop, and delete modularly structured programs (start, stop, resume, kill, delete, etc.).

36.3.1.1.5 Variable Access
This service allows the client to read and write variables that are defined in the server, or a server is enabled to report the contents to a client without being requested (information report). The structures of these data range from simple (octet string) to arbitrarily complex (structure of array of …). In addition, data types and arbitrary variables can be defined (read, write, information report, define variable, etc.). The variables constitute the core functionality of every MMS application; therefore, the variable access model will be explained in detail below.

36.3.1.1.6 Event Management
This allows an event-driven operation; i.e., a given service (e.g., read) is only carried out if a given event has occurred in the server. An alarm strategy is integrated. Alarms will be reported to one or more clients if certain events occur. These clients have the possibility to acknowledge the alarms later (define, alter event condition monitoring, get alarm summary, event notification, acknowledge event notification, etc.). This model is not explained further.

36.3.1.1.7 Semaphore Management
The synchronization of several clients and the coordinated access to the resources of real devices are carried out hereby (define semaphore, take/relinquish control, etc.). This model is not explained further.

36.3.1.1.8 Operator Communication
Simple services for communication with operating consoles integrated in the VMD (input and output). This model is not explained further.

36.3.1.1.9 Journal Management
Several clients can enter data into journals (archives, logbooks), which are defined in the server. These data can then be retrieved selectively through filters (write journal, read journal, etc.). This model is not explained further.

36.3.1.2 ISO 9506-2 (Part 2): Protocol Specification
If a client invokes a service, then the server must be informed about the requested type of service. For a “read” service, e.g., the name of the variables must be sent to the server. This information, which the server needs for the execution, is exchanged in so-called protocol data units (PDUs). The set of all the PDUs that can be exchanged between client and server constitutes the MMS protocol. In other words, the protocol specification — using ISO 8824 (Abstract Syntax Notation One, ASN.1) and ISO 8825 (ASN.1 BER, the basic encoding rules for ASN.1) — describes the abstract and concrete syntax of the functions defined in Part 1. The syntax is illustrated by example below.
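As a small, self-contained illustration of what BER encoding means in practice, the following C sketch encodes a signed integer as an ASN.1 tag-length-value triplet (universal tag 0x02, short-form definite length, minimal two's-complement contents). It is not an excerpt of the MMS PDU definitions, only a demonstration of the basic encoding rules referred to above.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Encode a signed 32-bit integer as a BER INTEGER (tag 0x02).
 * Returns the number of bytes written to out; out must hold at least 6 bytes. */
static size_t ber_encode_int(int32_t value, uint8_t out[6])
{
    uint8_t content[4];
    for (int i = 0; i < 4; i++)
        content[i] = (uint8_t)((uint32_t)value >> (8 * (3 - i)));

    /* Strip redundant leading 0x00/0xFF octets (minimal encoding). */
    size_t start = 0;
    while (start < 3 &&
           ((content[start] == 0x00 && (content[start + 1] & 0x80) == 0) ||
            (content[start] == 0xFF && (content[start + 1] & 0x80) != 0)))
        start++;

    size_t clen = 4 - start;
    out[0] = 0x02;                 /* universal tag: INTEGER */
    out[1] = (uint8_t)clen;        /* short-form length      */
    for (size_t i = 0; i < clen; i++)
        out[2 + i] = content[start + i];
    return 2 + clen;
}

int main(void)
{
    uint8_t buf[6];
    size_t n = ber_encode_int(300, buf);   /* expected octets: 02 02 01 2C */
    for (size_t i = 0; i < n; i++)
        printf("%02X ", buf[i]);
    printf("\n");
    return 0;
}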
FIGURE 36.5 Location of VMDs.
36.4 Locality of the VMD
VMDs are virtual descriptions of real data and devices (e.g., protection devices, transducers, wind turbines, and any other automation device or system). Regarding the implementation of a VMD, there are three very different possibilities where a VMD can be located (Figure 36.5):
1. In the end device: One or several VMDs are in the real device, which is represented by the VMD. The implementations of the VMD have direct access to the data in the device. The modeling can be carried out in such a way that each application field in the device is assigned to its own VMD. The individual VMDs are independent of each other.
2. In the gateway: One or several VMDs are implemented in a separate computer (a so-called gateway or agent). In this case, all MMS objects that describe the access to real data in the devices are at a central location. While being accessed, the data of a VMD can be in the memory of the gateway — or they must be retrieved from the end device only after the request. The modeling can be carried out in such a way that for each device or application, a VMD of its own will be implemented. The VMDs are independent of each other.
3. In a file: One or several VMDs are implemented in a database on a computer, on a File Transfer Protocol (FTP) server, or on a CD-ROM (the possibilities under 1 and 2 are also valid here). Thus, all VMDs and all included objects with all their configuration information can be entered directly into engineering systems. Such a CD-ROM, which represents the device description, could also be used, for example, to provide a monitoring system with the configuration information: names, data types, object attributes, etc. Before devices are delivered, the engineering tools can already process the accompanying device configuration information (electronic data sheet). The configuration information can also be read later online from the respective VMDs via corresponding MMS requests.
The VMD is independent of its location. Besides supporting configuration, this also allows, for example, several VMDs to be installed for testing purposes on a computer other than the final system (Figure 36.6). Thus, the VMDs of several large robots can be tested in the laboratory or office. The VMDs will be installed on one or several computers (the computers emulate the real robots). Using a suitable communication channel (for example, an intranet or a simple RS 232 connection — available on every PC), the original client (a control system that controls and supervises the robots) can now access and test the VMDs in the laboratory. This way, whole systems can be tested beforehand regarding the interaction of individual devices (for example, monitoring and control system).
FIGURE 36.6 VMD testing using PC in an office environment.
If the Internet is used instead of the intranet, global access is possible to any VMD that is connected to the Internet. The author tested the access from Germany to a VMD that was implemented on a PLC in the U.S. Through standards like MMS and open transmission systems, it has become possible to set up global communication networks for the real-time process data exchange. The previous statements about the VMD are also fully valid for all standards that are based on MMS.
36.5 Interfaces
The increasing distribution of automation applications and the exploding amount of information require more and more interfaces for operation and monitoring, and increasingly complex ones. Complex interfaces turn into complicated interfaces very fast. Interfaces “cut” components into two pieces; through this, interactions between the resulting subcomponents — which were hidden inside one component before — become visible. An interface discloses which functions are carried out in the individual subcomponents and how they act in combination. Transmitters and receivers of information must likewise be able to understand these definitions. The request “Read T142” must be formulated understandably, transmitted correctly, and understood unambiguously (Figure 36.7). The semantics (named terms that represent something) of the services and the service parameters are defined in MMS. The content, e.g., of named variables is defined in domain-specific standards like IEC 61850.
FIGURE 36.7 Sender and receiver of information.
FIGURE 36.8 Internal and external interfaces.
Interfaces occur in two forms:
• Internal program-to-program interfaces, or application programming interfaces (APIs)
• External interfaces over a network (wide area network (WAN), local area network (LAN), fieldbus, etc.)
Both kinds of interface affect each other. MMS defines an external interface. The necessity of complex interfaces (complex because of the necessary functionality, not as an end in itself) is generally known and accepted. To keep the number of complex interfaces as small as possible, they are defined in standards or industry standards — mostly as open interfaces. Open interfaces are in the meantime integral components of every modern automation system. In mid-1997 it was explained in [22] that the trend in automation engineering obviously leads away from proprietary solutions to open, standardized interfaces — i.e., to open systems.
The reason why open interfaces are complicated is not that they were standardized. Proprietary interfaces tend even more toward being complicated or very complicated. The major reasons for the latter observation are found in the permanent improvement of the interfaces, which expresses itself in quick version changes and in the permanent development of new — apparently better — interfaces. Automation systems of one manufacturer often offer — for identical functions — a variety of complicated interfaces that are incompatible with each other.
Interfaces can first be divided into two classes (Figure 36.8): internal interfaces (for example, in a computer) and external interfaces (over a communication network). The following consideration is strongly simplified because, in reality, several interfaces can lie one above the other both internally and externally. However, it nevertheless shows the differences in principle that must be paid attention to. MMS defines an external interface. Many understand MMS in such a way that it offers — or at least also offers — an internal interface. This notion results in completely false ideas. Therefore, the following consideration is very helpful.
The left-hand side of the figure shows the case with a uniform internal interface and varying external interfaces. This uniform internal interface allows many applications to access the same functions with the same parameters and perhaps the same programming language — independent of the external interface. Uniform internal interfaces basically allow the portability of the application programs over different external interfaces. The right-hand side of the figure shows the case with the external interface being uniform. The internal interfaces vary (since the programming languages or the operating systems, for example, vary). The uniform external interface is independent of the internal interface. The consequence of this is that devices whose local interfaces differ and are implemented in diverse environments can communicate with one another.
Differences can result, for example, from an interface being integrated into the application in a certain device, but being available explicitly in another device. The essential feature of this uniform external interface is the interoperability of different devices. The ISO/OSI reference model is aimed at exactly this feature.
The (internal) MMS interface in a client (perhaps $READ (Par. 1, Par. 2, … Par. N)) depends on the manufacturer, the operating system, and the programming language. MMS implementations are available for UNIX or Windows NT. On the one hand, this is a disadvantage because applications that want to access an MMS server would have to support, depending on the environment, various real program interfaces. On the other hand, the MMS protocol is completely independent of the fast-changing operating system platforms. Standardized external interfaces like MMS offer a high degree of stability because, in the first place, the communication can hardly be changed arbitrarily by a manufacturer and, in the second place, it can survive several design cycles of devices. Precisely this stability of the communication, as it is defined in MMS, also offers a stable basis for the development of internal interfaces on the various platforms, such as Windows 95, NT, or UNIX environments.
In the ISO/OSI world, openness describes the interface on the wire. The protocol of this external interface executes according to defined, standardized rules. For an interaction of two components, these rules have to be taken into account on both sides; otherwise, the two will not understand each other.
36.6 Environment and General Management Services
MMS uses a connection-oriented mode of operation. That is to say, before a computer can read a value from a PLC for the first time, a connection must be set up between the two. MMS connections have particular quality features such as:
• Exclusive allocation of computer and memory resources to a connection. This is necessary to guarantee that all services (for example, five “reads,” etc.) allowed to be carried out simultaneously find sufficient resources on both sides of the connection.
• Flow control in order to avoid blockages and vain transmissions if, e.g., the receive buffers are full.
• Segmentation of long messages.
• Routing of messages over different networks.
• Supervision of the connection if no communication takes place.
• Acknowledgment of the transmitted data.
• Authentication, access protection (password), and encoding of the messages.
Connections are generally established once and then remain established as long as a device is connected (at least during permanently necessary communication). If, for example, a device is only seldom accessed by a diagnostics system, a connection then does not need to be established permanently (waste of resources). It suffices to establish a connection and later to close it to release the needed resources again. The connection can remain established for rare but time-critical transmissions. The subordinate layers supervise the connection permanently. Through this the interruption of a connection is quickly recognized.
The MMS services for the connection management are:
• Initiate: Connection setup
• Conclude: Orderly connection teardown — waiting requests are still being answered
• Abort: Abrupt connection teardown — waiting requests are deleted
Besides these services that are all mapped to the subordinate layers, there are two other services:
• Cancel
• Reject
After the MMS client has sent a read request to the MMS server, for example, it may happen that the server leaves the service in its request queue and, for whatever reason, does not process it.
Using the cancel service, the client can now delete the request in the server. On the other hand, it may occur that the server is asked to carry out a service with invalid parameters. Using reject, it rejects the faulty request and reports this back to the client.
Although MMS was originally developed for ISO/OSI networks, a number of implementations are available in the meantime that also use other networks, such as the well-known TCP/IP network. From the point of view of MMS, this is insignificant as long as the necessary quality of the connection is guaranteed.
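To make the connection-oriented usage pattern concrete, the sketch below shows the typical call sequence of a client session. The API used here (mms_initiate, mms_read, mms_conclude) is purely hypothetical and only mirrors the service names of the standard; as discussed in Section 36.5, every real MMS toolkit defines its own programming interface.

#include <stdio.h>

/* Hypothetical toolkit calls, named after the MMS services they would map to. */
typedef struct { int connected; } MmsConnection;

static int mms_initiate(MmsConnection *c, const char *server)
{
    printf("initiate -> %s\n", server);
    c->connected = 1;
    return 0;
}

static int mms_read(MmsConnection *c, const char *domain, const char *var, int *out)
{
    if (!c->connected) return -1;
    printf("read %s/%s\n", domain, var);
    *out = 42;                      /* dummy value standing in for the reply */
    return 0;
}

static void mms_conclude(MmsConnection *c)
{
    printf("conclude\n");
    c->connected = 0;
}

int main(void)
{
    MmsConnection conn = { 0 };
    int value = 0;
    if (mms_initiate(&conn, "10.0.0.5") == 0) {            /* set up the association */
        if (mms_read(&conn, "Motor_2", "Status_155", &value) == 0)
            printf("Status_155 = %d\n", value);
        mms_conclude(&conn);                               /* orderly teardown */
    }
    return 0;
}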
36.7 VMD Support
The VMD object consists of 12 attributes. The key attribute identifies the executive function. The executive function corresponds directly with the entity of a VMD. A VMD is identified by a presentation address:

Object: VMD
Key attribute: Executive function
Attribute: Vendor name
Attribute: Model name
Attribute: Revision
Attribute: Logical status (STATE CHANGES ALLOWED, NO STATE CHANGES ALLOWED, LIMITED SERVICES SUPPORTED)
Attribute: List of capabilities
Attribute: Physical status (OPERATIONAL, PARTIALLY OPERATIONAL, INOPERABLE, NEEDS COMMISSIONING)
Attribute: List of program invocations
Attribute: List of domains
Attribute: List of transaction objects
Attribute: List of upload state machines (ULSMs)
Attribute: List of other VMD-specific objects
The attributes vendor name, model name, and revision provide information about the manufacturer and the device. The logical status defines which services may be carried out. The status “limited services supported” allows that only such services may be executed that have read access to the VMD. The physical status indicates whether the device works in principle. Two services are used to get the status unsolicited (unsolicited status) or explicitly requested (status). Thus, a client can recognize whether a given server — from the point of view of the communication — works at all. The list of capabilities offers clients and servers a possibility to define application-specific agreements in the form of features. The available memory of a device, for example, could be a capability. Through the “get capability list” service the current value can be queried. The remaining attributes contain the lists of all the MMS objects available in a VMD. The VMD therefore contains an object dictionary in which all objects of a VMD are recorded. The following three services complete the VMD: • Identify supplies the VMD attributes vendor name, model name, and revision. With that, a plausibility check can be carried out from the side of the client. • Get name list returns the names of all MMS objects. It can be selectively determined from which classes of objects (for example, named variable or event condition) the names of the stored objects shall be queried. Let us assume that a VMD was not known to the client until now (because it is, for example, a maintenance device); the client can then browse through the VMD and systematically query all names of the objects. Using the get services, which are defined for every object
class (e.g., get variable access attributes), the client can get detailed knowledge about a given object (for example, the named variable T142).
• Rename allows a client to rename an object.
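A client-side plausibility check of the kind described above combines identify with get name list. The sketch below again uses hypothetical wrapper functions and hard-coded reply values; only the service semantics follow the standard, while the API, the device strings, and the expected model and revision are illustrative.

#include <stdio.h>
#include <string.h>

typedef struct {
    char vendor[32];
    char model[32];
    char revision[16];
} VmdIdentity;

/* Hypothetical wrappers around the identify and get name list services. */
static int mms_identify(VmdIdentity *id)
{
    strcpy(id->vendor, "ExampleVendor");
    strcpy(id->model, "RTU-4000");
    strcpy(id->revision, "2.1");
    return 0;
}

static int mms_get_name_list(const char *object_class, char names[][32], int max_names)
{
    (void)object_class;
    if (max_names < 1) return 0;
    strcpy(names[0], "T142");
    return 1;                        /* number of names returned */
}

int main(void)
{
    VmdIdentity id;
    char names[16][32];

    /* Plausibility check: is this the ordered device in the expected revision? */
    if (mms_identify(&id) == 0 &&
        strcmp(id.model, "RTU-4000") == 0 && strcmp(id.revision, "2.1") == 0)
        printf("device %s %s rev %s accepted\n", id.vendor, id.model, id.revision);

    int n = mms_get_name_list("named-variable", names, 16);
    for (int i = 0; i < n; i++)
        printf("variable: %s\n", names[i]);
    return 0;
}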
36.8 Domain Management
Domains are to be viewed as containers that represent memory areas. Domain contents can be interchanged between different devices. The object type domain with its 12 attributes and 12 direct operations, which create, manipulate, delete a domain, etc., are part of the model. The abstract structure of the domain object consists of the following attributes:

Object: Domain
Key attribute: Domain name
Attribute: List of capabilities
Attribute: State (LOADING, COMPLETE, INCOMPLETE, READY, IN USE)
Constraint: State (LOADING, COMPLETE, INCOMPLETE)
Attribute: Assigned application association
Attribute: MMS deletable
Attribute: Sharable (TRUE, FALSE)
Attribute: Domain content
Attribute: List of subordinate objects
Constraint: State (IN USE)
Attribute: List of program invocation references
Attribute: Upload in progress
Attribute: Additional detail
The domain name is an identifier of a domain within a VMD. Domain content is a placeholder for the information that is contained within a domain. The contents of the data to be transmitted can be coded transparently or according to certain rules agreed upon beforehand. With the 2003 version of MMS, the data stream can by default be coded in such a way that a VMD can be transmitted completely, including all MMS object definitions that it contains. This means, on the one hand, that the contents of a VMD can be loaded from a configuration tool into a device (or saved from a device) and, on the other hand, that the contents can be stored on a disk by default.
Using a visible string, the list of capabilities describes which resources are to be provided — by the real device — for the domain of a VMD. MMS deletable indicates whether this domain may be deleted by means of an MMS operation. Sharable indicates whether a domain may be used by more than one program invocation. List of program invocation references lists those program invocation objects that use this domain. List of subordinate objects lists those MMS objects (no domains or program invocations) that are defined within this domain: objects that were created (1) by the domain loading, (2) dynamically by a program invocation, (3) dynamically by MMS operations, or (4) locally. State describes one of the ten states in which a domain can be. Upload in progress indicates whether the content of this domain is being copied to the client at the moment.
MMS defines loading in two directions:
• Data transmission from the client to the server (download)
• Data transmission from the server to the client (upload)
Three phases can be distinguished during loading:
• Open transmission
• Segmented transmission, controlled by the data sink
• Closed transmission
FIGURE 36.9 MMS domain transfer.
Transmission during download and upload is initiated by the client. If the server initiates transmission, then it has the possibility to initiate the transmission indirectly (Figure 36.9). For this purpose, the server informs the client that the client shall initiate the loading. Even a third station can initiate the transmission by informing the server, which then informs the client.
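The segmented transfer can be pictured as a loop in which the data sink pulls one segment after another until the source signals that no more data follow; this is the flow control mentioned above. The C sketch below imitates that pattern with a hypothetical in-memory data source; it does not reproduce the actual MMS download services or their PDUs.

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical data source standing in for the station holding the domain
 * content; each call returns one segment and whether more data follow. */
static bool next_segment(char *buf, size_t buflen, size_t *len, size_t *cursor)
{
    const char *content = "domain content transferred in small pieces";
    size_t total = strlen(content);
    size_t n = total - *cursor;
    if (n > buflen) n = buflen;
    memcpy(buf, content + *cursor, n);
    *len = n;
    *cursor += n;
    return *cursor < total;                 /* true: more segments follow */
}

int main(void)
{
    char seg[8];
    size_t len = 0, cursor = 0;
    bool more = true;

    /* The data sink pulls segments at its own pace (flow control). */
    while (more) {
        more = next_segment(seg, sizeof seg, &len, &cursor);
        printf("received segment of %zu bytes%s\n", len, more ? "" : " (last)");
    }
    return 0;
}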
36.8.1 What Is the Domain Scope?
Further MMS objects can be defined within a domain: variable objects, event objects, and semaphore objects. A domain forms a scope (validity range) within which the names of MMS objects are unique. MMS objects can be defined in three different scopes, as shown in Figure 36.10. Objects with VMD-specific scope (for example, the variable Status_125) can be addressed directly through this name by all clients. If an object has a domain-specific scope, such as the object Status_155, then it is identified by two identifiers: the domain identifier Motor_2 and the object identifier Status_155. A third scope is defined by the application association. The object Status_277 is part of the corresponding connection. This object can only be accessed through this connection. When the connection is closed, all objects in this scope are deleted.
FIGURE 36.10 VMD and domain scope.
MMS objects can be organized using the different scopes. The object names (with or without domain scope) are identifiers of 1 to 32 characters, and they must not start with a number. The object names can be structured by agreement in a further standard or other specification. Many standards that reference MMS make much use of this possibility. This way, all named variables with the prefix “RWE_” and similar prefixes, for example, could indicate that the data (in a trans-European information network) belong to a specific utility of an interconnected operation.
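A minimal check of the naming rule just stated (1 to 32 characters, not starting with a number) could look as follows. The accepted character set beyond that rule is an assumption made for this sketch, since the full MMS identifier alphabet is not reproduced in this chapter.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if name satisfies the rule quoted in the text: 1 to 32 characters,
 * not starting with a digit. Letters, digits, and '_' are assumed legal here. */
static int valid_mms_identifier(const char *name)
{
    size_t len = strlen(name);
    if (len < 1 || len > 32) return 0;
    if (isdigit((unsigned char)name[0])) return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_') return 0;
    }
    return 1;
}

int main(void)
{
    printf("%d %d %d\n",
           valid_mms_identifier("RWE_Status_155"),   /* 1: valid              */
           valid_mms_identifier("42_bad"),           /* 0: starts with digit  */
           valid_mms_identifier(""));                /* 0: too short          */
    return 0;
}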
36.9 Program Invocation Management
A program invocation object is a dynamic element that corresponds with the program executions in multitasking environments. Program invocations are created by linking several domains. They are either predefined or created dynamically by MMS services or created locally. A program invocation object is defined by its name, its status (idle, starting, running, stopping, stopped, resuming, unrunnable), the list of the domains to be used, and nine operations.

Object: Program invocation
Key attribute: Program invocation name
Attribute: State (IDLE, STARTING, RUNNING, STOPPING, STOPPED, RESUMING, RESETTING, UNRUNNABLE)
Attribute: List of domain references
Attribute: MMS deletable (TRUE, FALSE)
Attribute: Reusable (TRUE, FALSE)
Attribute: Monitor (TRUE, FALSE)
Constraint: Monitor = TRUE
Attribute: Event condition reference
Attribute: Event action reference
Attribute: Event enrollment reference
Attribute: Execution argument
Attribute: Additional detail

Program invocations are structured flatly, though several program invocations can reference the same domains (shared domains). The contents of the individual domains are absolutely transparent both from the point of view of the domain and from the point of view of the program invocations. What is semantically connected with the program invocations is outside the scope of MMS. The user of the MMS objects must therefore define the contents; the semantics result from this context. If a program invocation connects two domains, then the domain contents must define what these domains will do together — MMS actually only provides a wrapper.
The program invocation name is a clear identifier of a program invocation within a VMD. State describes the status in which a program invocation can be. Altogether, seven states are defined. List of domains contains the names of the domains that are combined with a program invocation. This list also includes such domains that are created by the program invocation itself (this can be a domain into which some output is written). MMS deletable indicates whether this program invocation may be deleted by means of an MMS operation. Reusable indicates whether a program invocation can be started again after the program execution. If it cannot be started again, then the program invocation can only be deleted. Monitor indicates whether the program invocation reports a transition to the client when exiting the running status. Start argument contains an application-specific character string that was transferred to a program invocation during the last start operation; e.g., this string could indicate which function has started the program last. Additional detail allows the companion standards to make application-specific definitions [3–6].
36.9.1 Program Invocation Services
Create program invocations: This service arranges an operational program, which consists of the indicated domains, in the server. After installation, the program invocation is in the status idle, from where it can be started. The monitor and monitor type indicate whether and how the program invocation shall be monitored.
Delete program invocation: Deletable program invocations are deleted through this service. Primarily, the resources bound to a program invocation are released again.
Start: The start service causes the server to transfer the specified program invocation from the idle into the running state. Further information can be transferred to the VMD through a character string in the start argument. A further parameter (start detail) contains additional information that can be defined by companion standards.
Stop: The stop service changes a specified program invocation from the running to the stopped state.
Resume: The resume service changes a specified program invocation from the stopped to the running state.
Reset: The reset service changes a specified program invocation from the running or stopped to the idle state.
Kill: The kill service changes a specified program invocation from arbitrary states to the unrunnable state.
Get program invocation attribute: Through this service the client can read all attributes of a certain program invocation.
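The services above drive a small state machine on the program invocation object. The C sketch below encodes the main transitions named in the text (start, stop, resume, reset, kill) as a reading aid; it deliberately omits the intermediate states and is not a reproduction of the full state table of ISO 9506.

#include <stdio.h>

typedef enum { PI_IDLE, PI_RUNNING, PI_STOPPED, PI_UNRUNNABLE } PiState;
typedef enum { OP_START, OP_STOP, OP_RESUME, OP_RESET, OP_KILL } PiService;

/* Simplified transitions; starting, stopping, resuming, resetting omitted. */
static PiState pi_apply(PiState s, PiService op)
{
    switch (op) {
    case OP_START:  return (s == PI_IDLE)    ? PI_RUNNING : s;
    case OP_STOP:   return (s == PI_RUNNING) ? PI_STOPPED : s;
    case OP_RESUME: return (s == PI_STOPPED) ? PI_RUNNING : s;
    case OP_RESET:  return (s == PI_RUNNING || s == PI_STOPPED) ? PI_IDLE : s;
    case OP_KILL:   return PI_UNRUNNABLE;
    }
    return s;
}

int main(void)
{
    PiState s = PI_IDLE;
    s = pi_apply(s, OP_START);    /* idle -> running    */
    s = pi_apply(s, OP_STOP);     /* running -> stopped */
    s = pi_apply(s, OP_RESUME);   /* stopped -> running */
    printf("final state: %d\n", (int)s);
    return 0;
}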
36.10 MMS Variable Model MMS named variables are addressed using identifiers made up of the domain name and the named variable name within the domain. Components of an MMS named variable may also be individually addressed using a scheme called alternate access. The alternate access address of a component consists of the domain name and the named variable name, along with a sequence of enclosing component names of the path down to the target component. The variable access services contain an extensive variable model, which offers the user a variety of advanced services for the description and offers access to arbitrary data of a distributed system. A wide variety of process data are processed by automation systems. The data and their definitions and representation are usually oriented at the technological requirements and at the available automation equipment. The methods the components employ for the representation of their data and the access to them correspond to the way of thinking of their implementers. This has resulted in a wide variety of data representations and access procedures for one and the same technological datum in different components. If, e.g., a certain temperature measurement shall be accessed in different devices, then a huge quantity of internal details must generally be taken into account for every device (request, parameter, coding of the data, etc.). As shown in Figure 36.11, the number of the protocols for the access of a client (on the left in the figure) to the data from n servers (S1 – Sn) can be reduced to a single protocol (on the right in the figure). Through this, the data rate required for the communication primarily in central devices can be reduced drastically. In programs, variables are declared; i.e., they get a name, a type, and a value. Described in a simplified way, both the name and the type are converted by the compiler into a memory location and into a reference that is only accessible to the compiled program. Without any further measures, the data of the variable are not identifiable outside the program. It is concealed for the user of the program how a compiler carries out the translation into the representation of a certain real machine. The data are stored in different ways depending on the processor; primarily, the data are stored in various memory locations. During the runtime of the program, only this representation is available.
FIGURE 36.11 Unified protocols. [figure: on the left, a client C reaches the servers S1 to Sn through many different protocols (Protocol 1, Protocol 2, ..., Protocol n); on the right, the same client reaches all servers through a single standard protocol, and each server maps between its internal and the external representation.]
These data are not visible from the outside. They must first be made visible for access from the outside. To enable this, an entity must be provided in the implementation of the application. It is insignificant here whether this entity is separated from or integrated into the program. This entity acts on behalf of all data that shall be accessible from the outside.

The following consideration is helpful in explaining the MMS variable model: what do the protocols through which process data are accessed have in common? Figure 36.12 shows the characteristics in principle. On the right is the memory with the real process data, which shall be read. The client must be able to identify the data to be read (gray shade in the figure). For this purpose, the pointer (start address) and the length of the data must be known. By means of this information, the data can be identified in the memory. Yet how can a client know the pointer of the data and what the length of the data is? It could have this information somehow and indicate it when reading. Yet if the data should move, then the pointer of the data is not correct anymore. There can also be the case where the data do not exist at all at the time of reading but must first be calculated; in this case, there is no pointer. To avoid this, references to the data, which are mapped to the actual pointers (by a table or an algorithm), are used in most cases. In our case, the reference B is mapped by a table to the corresponding pointer and length. The pointer and the length are stored in the type description in the table. The pointer is a system-specific value that generally is not visible on the outside. The length depends on the internal representation of the memory and on the type. An individual bit can, for example, be stored as an individual bit in the memory or as a whole octet. However, this is not relevant from the point of view of the communication.

FIGURE 36.12 Data access principle. [figure: an MMS variable object with name, type description, and data description; a request "read B" is resolved through the table entries A, B, C to the start address and length of the data in the memory on the right, which holds the real data.]

The data themselves and their descriptions are important for the message response of the read service. The question of the external representation (for example, an individual bit encoded as a bit or as an octet) is, unlike the internal representation, of special importance here. The various receivers of the data must be able to interpret the data unambiguously. For this purpose, they need the representation, which is a substantial component of the MMS variable model. The data description is therefore derived from the type description.

For a deeper understanding of the variable model, three aspects have to be explained more exactly:
• Objects and their functions
• Services (read, write, etc.) that access the MMS variable objects
• The data description for the transmission of the data

The object model of an MMS variable object is conceptually different from a variable in a programming language. The MMS objects describe the access path to structured data. In this sense, they do not themselves have a variable value.
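The mapping from a reference to pointer and length that Figure 36.12 describes can be pictured with a minimal sketch. The memory contents, the offsets, and the helper names below are invented for the illustration; only the idea of resolving a reference through a table comes from the text.

# Simplified device memory and a table that maps a reference (here the
# name 'B') to the start address and length of the data it stands for.
memory = bytearray(64)
memory[8:12] = (1234).to_bytes(4, "big")      # the real process datum

type_table = {
    "B": {"start": 8, "length": 4},           # pointer and length from the type description
}

def read(reference):
    """Resolve the reference via the table and return the raw data."""
    entry = type_table[reference]
    return bytes(memory[entry["start"]:entry["start"] + entry["length"]])

raw = read("B")   # the client only knows the reference 'B', never the pointer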
36.10.1 Access Paths

The access path represents an essential feature of the MMS variable model. We will consider the concept starting from a somewhat more complex hierarchical structure. An abstract and extremely simplified example was deliberately chosen; here we are merely concerned with the principle. Certain data of a machine shall be modeled using MMS methods. The machine has a tool magazine with n similar tools. A tool is represented, according to Figure 36.13, by three components (tool type, number of blades, and remaining use time). The machine with its tool magazine M is outlined on the right of the figure. The magazine contains three tools: A, B, and C. The corresponding data structure of the magazine is shown on the left. The structure is treelike; the root M is drawn as the topmost small circle (node). M has three components (branches): A, B, and C, which are also represented as circles. These components in turn have three components (branches) each. In this case, the branches end in leaves (represented in the form of squares). Leaves represent the endpoints of the branches.

FIGURE 36.13 Data of a machine. [figure: the machine with its magazine M and the tools A, B, and C on the right; on the left, the tree structure of the magazine, where each tool has the leaves T (tool type), A (number of blades), and R (remaining use time).]

FIGURE 36.14 Access and partial access. [figure: top left, reading the complete object M; top right, the partial read M.A.R/.B.R/.C.R selecting only the R leaf of each tool; bottom, reading the two objects M1 and M5 in a single request.]
For each tool, three leaves are shown: T (tool type), A (number of blades), and R (remaining use time). The leaves represent the real data. The hierarchically ordered nodes are merely introduced for reasons of clustering. Leaves can occur at all nodes. A leaf with the information "magazine full/not full" could, for example, be attached to the topmost node.

With this structure, the MMS features are explained in more detail. The most essential aspect is the definition of the access path. The access to the data (and their use) can be carried out according to various task definitions:
1. Selecting all leaves for reading, writing, etc.
2. Selecting certain leaves for reading, writing, etc.
3. Selecting all leaves as components of a higher-level structure, for example, "machine" with the components magazine M and drilling machine
4. Selecting certain leaves as components of a higher-level structure

Examples of cases 1 and 2 are shown in Figure 36.14. The case in which the complete structure is read is shown in the top left corner (the selected nodes, branches, and leaves are drawn in bold lines or squares). All nine data are transmitted in the response (3 × (T + A + R)). The representation during transmission is deliberately left aside here and in the following examples. In the top right corner of the figure, only a part of the data is read: only the leaf R of each of the three components A, B, and C. The notation for the description of the subset, M.A.R/.B.R/.C.R, is chosen arbitrarily. The subset M.A.R represents an (access) path that leads from the root to a leaf. It can also be said that one or several paths represent a part of a tree. The read message contains three paths that must be described completely in the request. A path can, for example, also end at A; all three components of A will be transmitted in this case. Besides the possibility of describing every conceivable subset, MMS also supports reading several objects, M1 and M5, in one read request simultaneously (see the lower half of the figure). Of course, each object can also be read only partly (not represented in the figure).

An example of case 4 is shown in Figure 36.15 (case 4 has to be understood as the generalization of case 3). Here a new structure was defined using two substructures. The object "machine" contains only the R component of all six tools of the two magazines M1 and M5. The object "machine" shall not be mixed up with a list (named variable list). Reading "machine" supplies the six individual R values. The component names, such as A, B, or C, need to be unambiguous only below a node, i.e., only at the next lower level. Thus, R can always stand for the remaining use time; the position in the tree indicates the remaining use time of a particular tool. The new structure "machine" has all the features that were described in the previous examples for M1 and M5.
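The path selection just described can be illustrated with a short sketch. The magazine is modeled as a nested Python dictionary and a dotted string such as "A.R" stands for one access path below M; the names are taken from the example above, while the dictionary values and the helper function are invented for the illustration and are not MMS syntax.

# Tool magazine M with the tools A, B, C; each tool has the leaves
# T (tool type), A (number of blades), and R (remaining use time).
magazine_m = {
    "A": {"T": "drill", "A": 2, "R": 120},
    "B": {"T": "mill",  "A": 4, "R": 45},
    "C": {"T": "drill", "A": 3, "R": 80},
}

def select(tree, path):
    """Follow one access path (e.g. 'A.R') down to a node or leaf."""
    node = tree
    for component in path.split("."):
        node = node[component]
    return node

# Case 1: read the complete structure (all nine leaves).
whole_tree = magazine_m

# Case 2: read only the R leaf of the tools A, B, and C
# (corresponds to the request M.A.R/.B.R/.C.R in the text).
remaining_times = {p: select(magazine_m, p) for p in ("A.R", "B.R", "C.R")}

# A path may also end at an inner node: all components of tool A.
tool_a = select(magazine_m, "A")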
FIGURE 36.15 Partial trees. [figure: the new object "machine" built from the two substructures M1 and M5; it contains only the R leaves of the tools A, B, and C of both magazines, and "Read Machine" delivers the six R values.]

FIGURE 36.16 Partial trees used for read requests. [figure: the same result obtained without a new object, by enclosing the path descriptions for the R components of the objects M1 and M5 directly in the read request.]
These features of the MMS variable objects can be applied to (1) the definition of new variable objects and (2) the access to existing variable objects. The second case is also interesting. As shown in Figure 36.16, the same result as in Figure 36.15 can be reached by enclosing the description (only the R components of the tools shall be read from the two objects M1 and M5) in the read request every time. The results (read response) are absolutely identical in the two cases. Every possibility, to assemble hierarchies from other hierarchies or to read parts of a tree during the access, has its useful application. The first case is important in order to avoid enclosing the complete path description every time an extensive part of the tree is read. The second case offers the possibility of constructing complex structures based on standardized basic structures (for example, the structure "tool data" consisting of the components T, A, and R) and of using them for the definition of new objects.

Summarizing, it can be stated that the access paths accomplish two tasks:
• Description of a subset of nodes, branches, and leaves of objects during reading, writing, etc.
• Description of a subset of nodes, branches, and leaves of objects during the definition of new objects

In conclusion, this may be expressed in the following way: path descriptions describe the way or ways to a single datum (leaf) or to several data (leaves). A client can read the structure description (complete tree) through the MMS service "get variable access attributes."

Another aspect is of special importance too. Until now, we have not considered the description of leaves. Every leaf has one of the following MMS basic data types:
• Boolean
• Bit string
• Integer
• Unsigned
• Floating point
• Real
• Octet string
• Visible string
• Generalized time
• Binary time
• BCD

FIGURE 36.17 Application of path descriptions. [figure: the path description appears in the type description of MMS variable objects and MMS named types, in the access services (read, write, define), in the data description during read and get attribute services, and in information reports.]
Every node in a tree is either an array or a structure. Arrays have n elements (0 to n - 1) of the same type (this can be a basic type, an array, or a structure). When describing a part of a tree, any number of array elements can be selected (e.g., one element, two adjacent elements, two arbitrary elements, etc.). Structures consist of one or several components. Every component can be marked by its position and, if necessary, by an Identifier, the component name. This component name (e.g., A) is used for access to a component. Parts of trees can describe every subset of the structure. The path description contains the following three elements:
• All of the possibilities for describing the structures (individual and composite paths in the type description of the MMS variables) are defined in the form of an extensive abstract syntax.
• For every leaf of a structure (these are MMS basic data types), the value range (size) is also defined besides the class. The value range of the class "integer" can comprise one, two, or more octets. A value range of four octets (often represented as Int32), e.g., indicates that the value cannot exceed these four octets. On the other hand, with ASN.1 BER coding (explained later) and the value range Int32, the decimal value 5 will be transmitted in only one octet (not in four); that is, only the length needed for the current value is transmitted (see the sketch after this list).
• The aspect of the representation of the data and their structuring during transmission on the line (communication in the original meaning) is dealt with below in the context of the encoding of messages.

The path description is used in five ways (see Figure 36.17):
• During access to and during the definition of variable objects
• In the type description of variable objects
• In the description of data during reading, for example
• During the definition of named type objects (an object name of its own is assigned to the type description, i.e., to one or several paths, of those objects)
• When reading the attributes of variables and named type objects
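The remark above, that an integer with the value range Int32 is transmitted with only as many octets as the current value needs, can be made concrete with a small sketch of the minimal-length two's-complement content encoding used by BER for integers. This is an illustration of the principle only; a complete BER encoder also has to emit the tag and length octets.

def ber_integer_content(value):
    """Minimal-length two's-complement content octets of an ASN.1 INTEGER."""
    length = 1
    while not -(1 << (8 * length - 1)) <= value < (1 << (8 * length - 1)):
        length += 1
    return value.to_bytes(length, "big", signed=True)

assert ber_integer_content(5) == b"\x05"                # one octet, although the type allows Int32
assert ber_integer_content(70000) == b"\x01\x11\x70"    # three octets for a larger value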
36.10.2 Objects of the MMS Variable Model

The five objects of the MMS variable model are:

Description of simple or complex values:
• Unnamed variable
• Named variable

List of several unnamed variables or named variables:
• Named variable list
• Scattered access (not explained here)

Description of the structure by means of a user-defined name:
• Named type
36.10.3 Unnamed Variable

The unnamed variable object describes the assignment of an individual MMS variable to a real variable that is located at a definite address in the device. An unnamed variable object can never be created or deleted. The unnamed variable has the following attributes:

Object: Unnamed variable
Key attribute: Address
Attribute: MMS deletable (FALSE)
Attribute: Access method (PUBLIC)
Attribute: Type description
36.10.3.1 Address
The address is used to reference the object. There are three different kinds:
1. Numeric address (nonnegative integer values)
2. Symbolic address (character string)
3. Unconstrained address (implementation-specific format)
Even though in the second kind the address is represented by a character string, this kind of addressing has to be strictly distinguished from the object name of a named variable (see Section 36.8.1 and the explanations below).

36.10.3.2 MMS Deletable
The attribute is always FALSE here.

36.10.3.3 Access Method
The attribute is always PUBLIC here.

36.10.3.4 Type Description
The attribute points to the inherent abstract type of the subordinate real variable as it is seen by MMS. It specifies the class (bit string, integer, floating point, etc.), the range of the values, and the grouping of the real variable (arrays, structures). The attribute type description is completely independent of the addressing.

Figure 36.18 gives a rough sketch of the unnamed variable. The unnamed variable with the address 62 (MMSString) has three components with the names value, quality, and time. These component names are required only if individual components (specifying the path, for example, 62/Value) shall be accessed.
36.10.4 MMS Address of the Unnamed Variable
The MMS address is a system-specific reference that is used by the system for internal addressing; it is, so to speak, released for access via MMS. The address can assume one of three forms (here ASN.1 notation is deliberately used for the first time):
FIGURE 36.18 Unnamed variable object. [figure: a VMD with the unnamed variables 62, 63, 64, 65, ...; the unnamed variable 62 has the built-in type description value (Int32), quality (good/bad), and time (Time32) and is mapped to a real temperature in the device; "Read 62" returns Int32, good, Time32.]
Address ::= CHOICE {
    numericAddress        [0] IMPLICIT Unsigned32,
    symbolicAddress       [1] MMSString,
    unconstrainedAddress  [2] IMPLICIT OCTET STRING }
The definition above has to be read as follows: Address is defined as (::=) a selection (keyword CHOICE) of three possibilities. The possibilities are numbered here from [0] to [2] so that they can be distinguished. The keyword IMPLICIT is discussed later. The numeric address is defined as Unsigned32 (four octets). Thus, addresses can be defined as an index with a value range of up to 2^32. Since only the actually required length (e.g., only one octet for the value 65) is transmitted for an Unsigned32, an index can be as short as a single octet; already 255 objects (of arbitrary complexity) can be addressed with one octet. The symbolic address can transmit an arbitrarily long MMSString (for example, DB5_DW6). The unconstrained address represents an arbitrarily long octet string (for example, 24FE23F2A1 hex). The meaning and structure of these addresses are outside the scope of the standard.

These addresses can be used in MMS unnamed variable and named variable objects and in the corresponding services. MMS can neither define nor change these addresses. The address offers a possibility to reference objects by short indexes. The addresses can be structured arbitrarily. Unnamed variables could, for example, contain measurements in the address range [1000 to 1999], status information in the address range [3000 to 3999], limit values in the address range [7000 to 7999], etc.
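As a small illustration of the CHOICE above, the following sketch hand-encodes a numeric address with its context-specific tag [0] and an unconstrained address with tag [2]. It is a deliberately simplified sketch (short-form length octet only, unsigned content octets without the leading zero that strict BER would add for values with the top bit set), not a general BER encoder, and the address values are arbitrary examples.

def encode_numeric_address(value):
    """numericAddress: context tag [0], IMPLICIT, minimal-length content."""
    content = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    return bytes([0x80, len(content)]) + content        # tag, length, value

def encode_unconstrained_address(octets):
    """unconstrainedAddress: context tag [2], IMPLICIT OCTET STRING."""
    return bytes([0x82, len(octets)]) + bytes(octets)

assert encode_numeric_address(65) == b"\x80\x01\x41"      # index 65 fits into one octet
assert encode_numeric_address(1000) == b"\x80\x02\x03\xe8"
assert encode_unconstrained_address(b"\x24\xfe") == b"\x82\x02\x24\xfe"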
36.10.5 Services for the Unnamed Variable Object

36.10.5.1 Read
This service uses the "variable get" (V-Get) function to transmit the current value of the real variable, which is described by the unnamed variable object, from a server to a client. V-Get represents the internal, system-specific function through which an implementation gets the actual data and provides them for the communication.

36.10.5.2 Write
This service uses the V-Put function to replace the current value of the real variable, which is described by the unnamed variable object, by the enclosed value.
36.10.5.3 Information Report
Like "read," but without a prior request by the client: only the read.response is sent by the server to the client without being asked. The information report corresponds to a spontaneous message. The application itself determines when the transmission is to be activated.

36.10.5.4 Get Variable Access Attributes
Through this operation, a client can query the attributes of an unnamed variable object.
36.10.6 Explanation of the Type Description
Features of the structure description of MMS variable objects were explained in principle above. For those interested in the details, the formal definition of the MMS type specification is explained according to Figure 36.19.

TypeSpecification ::= CHOICE {
    typeName            [0]  ObjectName,
    array               [1]  IMPLICIT SEQUENCE {
        packed              [0] IMPLICIT BOOLEAN DEFAULT FALSE,
        numberOfElements    [1] IMPLICIT Unsigned32,
        elementType         [2] TypeSpecification },
    structure           [2]  IMPLICIT SEQUENCE {
        packed              [0] IMPLICIT BOOLEAN DEFAULT FALSE,
        components          [1] IMPLICIT SEQUENCE OF SEQUENCE {
            componentName       [0] IMPLICIT Identifier OPTIONAL,
            componentType       [1] TypeSpecification } },
    -- simple types (the comment gives the class; the size follows the tag)
    boolean             [3]  IMPLICIT NULL,         -- BOOLEAN
    bit-string          [4]  IMPLICIT Integer32,    -- BIT-STRING
    integer             [5]  IMPLICIT Unsigned8,    -- INTEGER
    unsigned            [6]  IMPLICIT Unsigned8,    -- UNSIGNED
    floating-point      [7]  IMPLICIT SEQUENCE {
        format-width        Unsigned8,              -- number of bits in fraction plus sign
        exponent-width      Unsigned8 },            -- size of exponent in bits
    real                [8]  IMPLICIT SEQUENCE {
        base                [0] IMPLICIT INTEGER(2|10),
        exponent            [1] IMPLICIT INTEGER,   -- max number of octets
        mantissa            [2] IMPLICIT INTEGER }, -- max number of octets
    octet-string        [9]  IMPLICIT Integer32,    -- OCTET-STRING
    visible-string      [10] IMPLICIT Integer32,    -- VISIBLE-STRING
    generalized-time    [11] IMPLICIT NULL,         -- GENERALIZED-TIME
    binary-time         [12] IMPLICIT BOOLEAN,      -- BINARY-TIME
    bcd                 [13] IMPLICIT Unsigned8,    -- BCD
    objId               [15] IMPLICIT NULL }

FIGURE 36.19 MMS type specification.
The description in ASN.1 was deliberately selected here too. The type specification is a CHOICE (selection) of 15 possibilities (tags [0] to [13] and [15]). The tags identify the selected possibility. The first possibility is the specification of an object name, i.e., of a named type object. If we recall that a named type object describes one or several paths, then the use is obvious: the path description referenced by the name can be used to define a named variable object, or, if the path must be specified during reading, it can be referenced through a named type object in the server. Note that the ASN.1 definitions in MMS are comparable to Extensible Markup Language (XML) schemas. ASN.1 BER provides very efficient message encoding compared to XML documents. The forthcoming standard IEC 61400-25 applies ASN.1 as well as XML schema for the specification of messages.

The next two possibilities (array and structure) have a common feature. Both refer, through their element type or component type, back to the beginning of the complete definition (TypeSpecification). This recursive definition allows the definition of arbitrarily complex structures. Thus, an element of a structure can in turn be a structure or an array. Arrays are defined by three features: packed defines whether the data are stored in an optimized (packed) form, number of elements indicates the number of elements of the array, which are all of the same type (element type). The data of structures can also be stored packed. Structures consist of a series of components (components [1] IMPLICIT SEQUENCE OF SEQUENCE). This series is marked by the keyword SEQUENCE OF, which describes a repetition of the following definition, a SEQUENCE of componentName and componentType describing the individual component. Since the SEQUENCE OF (repetition) can be arbitrarily long, the number of components at a node is also arbitrary.

Then follow the simple data types, starting at tag [3]. Typical for the simple data types is the specification of their length. For example, integers of different lengths can be defined; the length (size) is defined as Unsigned8, which allows integers with a length of up to 255 octets. It should be mentioned that, in the ASN.1 description of the MMS syntax, expressions written in lowercase letters, such as "integer," are replaced by another definition (in this case, by tag [5] with the IMPLICIT Unsigned8 definition). An initial capital letter indicates that the definition terminates here; it is not replaced any further and is a basic definition.

Figure 36.20 shows an example of an object defined in IEC 61850-7-4. The circuit breaker class is instantiated as XCBR1. The hierarchical components of the object are mapped to MMS (according to IEC 61850-8-1). The circuit breaker is defined as one comprehensive MMS named variable. The components of the hierarchical model can be accessed by the description of the alternate access: XCBR1, component ST, component Pos, component stVal. Another possibility, mapping the hierarchy to flat names, is depicted in Figure 36.21: each path is defined as a character string.
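Figure 36.21 (below) obtains the flat names by joining the path components with "$". The following sketch only illustrates that joining rule with the component names taken from the figure; the nested dictionary and the traversal helper are invented for the illustration and are not part of MMS or IEC 61850-8-1.

def flat_names(prefix, node):
    """Yield '$'-joined flat names for every node of a nested dict."""
    for name, child in node.items():
        full = f"{prefix}${name}"
        yield full
        if isinstance(child, dict):
            yield from flat_names(full, child)

xcbr1 = {"CO": {"Pos": {"ctlVal": None, "operTim": None,
                        "origin": None, "ctlNum": None}}}

print(list(flat_names("XCBR1", xcbr1)))
# ['XCBR1$CO', 'XCBR1$CO$Pos', 'XCBR1$CO$Pos$ctlVal', 'XCBR1$CO$Pos$operTim',
#  'XCBR1$CO$Pos$origin', 'XCBR1$CO$Pos$ctlNum']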
36.10.7 Named Variable

The named variable object describes the assignment of a named MMS variable to a real variable. Only one named variable object should be assigned to a real variable. The attributes of the object are as follows:

Object: Named variable
Key attribute: Variable name
Attribute: MMS deletable
Attribute: Type description
Attribute: Access method (PUBLIC, etc.)
Constraint: Access method = PUBLIC
Attribute: Address

36.10.7.1 Variable Name
The variable name unambiguously defines the named variable object in a given scope (VMD specific, domain specific, or application association specific). The variable name can be 32 characters long (plus 32 characters if the object has a domain scope).
FIGURE 36.20 Example MMS named variable. [figure: the IEC 61850 view of XCBR1 with the data object Pos and attributes such as ctlVal, operTim, stVal, q, t, pulseConfig, origin, ctlNum, d, ctlModel, sboTimeout, sboClass, and BlkOpn, mapped to the MMS view of XCBR1, where the attributes are grouped under the functional components CO, ST, CF, and DC.]

FIGURE 36.21 Example MMS named variable. [figure: the same MMS view mapped to flat names, e.g., XCBR1$CO, XCBR1$CO$Pos, XCBR1$CO$Pos$ctlVal, XCBR1$CO$Pos$operTim, ...]
FIGURE 36.22 Address and variable name of named variable objects. [figure: a named variable with the name MeasurementTIC13, the address 23 24 hex, and the type description value (Int32), quality (good/bad), time (Time32); "Read MeasurementTIC13" requires access via the name table, "GetVariableAccessAttributes MeasurementTIC13" returns the address 23 24 hex, and "Read 23 24 hex" allows fast access without the table.]
36.10.7.2 MMS Deletable
This attribute shows whether the object may be deleted using a service.

36.10.7.3 Type Description
This attribute describes the abstract type of the subordinate real variable as it presents itself to the external user. Unlike with the unnamed variable object, this attribute is not inherent in the system; i.e., the type description can be defined from the outside.

36.10.7.4 Access Method
This attribute contains the information that a device needs to identify the real variable. It contains values that are necessary and adequate to find the memory location. The contents lie outside MMS. One special method, the method PUBLIC, is standardized. In the case of PUBLIC, the attribute address is also available. This is the address that identifies an unnamed variable object. Named variables can thus be addressed by the name and by the address (see Figure 36.22).

36.10.7.5 Address
See Section 36.10.3. Defining a named variable object does not allocate any memory, because the real variable must already exist; it is assigned to the named variable object with the corresponding name.

Altogether, six operations are defined on the object:

Read: The service uses the V-Get function to retrieve the current value of the real data described by the object.
Write: The service uses the V-Put function to replace the current value of the real data described by the object with the enclosed value.
Information report: Like read, but the data are sent by the server without a prior request from the client.
Define named variable: This service creates a new named variable object, which is assigned to real data.
Get variable access attributes: Through this operation, a client can query the attributes of a named variable object.
Delete variable access: This service deletes a named variable object if the attribute MMS deletable is TRUE.

Figure 36.22 shows the possibilities to reference a named variable object by name and, if required, also by address (the optimal access reference of a given system). For a given name, a client can query the address by means of the service "get variable access attributes." This allows access through technological names (MeasurementTIC13) or through the optimal (index) address 23 24 hex.

As shown in Figure 36.23, an essential feature of the VMD is the possibility for the client application to define named variable objects in the server, on request, via the communication. This includes the definition of the name, the type, and the structure. The name by which the client would like to reference the named variable later is TIC42 here. The first component, "value," is of the type Integer32, the second is "quality" with the values good or bad, and the third is "time" of the type Time32. The type of the data value object can be arbitrarily simple (flat) or complex (hierarchical). As a rule, the data value objects are implicitly created by the local configuration or programming of the server (they are predefined).

FIGURE 36.23 Client-defined named variable object. [figure: the request "Define TIC42 at 22 31 hex" creates in the VMD a named variable TIC42 with the address 22 31 hex and the type description value (Int32), quality (good/bad), time (Time32), mapped to the real temperature at address 22 31; "Read TIC42" then returns Int32, good, Time32.]

The internal assignment of the variable to the real temperature measurement is made by a system-specific, optimal reference. This reference, whose structure and contents are transparent to MMS, must nevertheless be known when defining the named variable. The reference can, e.g., be a relative memory address (for example, DB5 DW15 of a PLC). This allows quick access to the data.

The named variable object describes how data are modeled, accessed, encoded, and transmitted for the communication. What is transmitted is described independently of the function. From the point of view of the communication, it is not relevant where the data in the server actually come from, where in the client they actually go to, or how they are managed; this is deliberately concealed.

Figure 36.24 shows the concrete encoding of the information report message. The message is encoded according to ASN.1 BER. The encoding using XML would be several times longer than using ASN.1 BER. These octets are packed into further messages that add lower-layer-specific control and address information, e.g., the TCP header, IP header, and Ethernet frames. The receiving device is able to interpret the report message according to the identifiers, lengths, names, and other values. The interpretation of the message requires the same stack, i.e., knowledge of all layers involved, including the definitions of IEC 61850-7-4, IEC 61850-7-3, IEC 61850-7-2, and IEC 61850-7-1.
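Returning to the client-defined variable described above, the information a client has to supply for TIC42 (a name, an opaque system-specific address, and a type description) can be pictured with a small sketch. The class names, field names, and the simplified type kinds below are invented for the illustration; they are not the MMS protocol encoding.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TypeDescription:
    kind: str                                   # e.g. "structure", "integer", "boolean"
    size: Optional[int] = None                  # value range in octets for simple types
    components: List[Tuple[str, "TypeDescription"]] = field(default_factory=list)

# Type of TIC42: value (Int32), quality (good/bad), time (Time32).
tic42_type = TypeDescription("structure", components=[
    ("value",   TypeDescription("integer", size=4)),
    ("quality", TypeDescription("boolean")),
    ("time",    TypeDescription("binary-time", size=4)),
])

@dataclass
class DefineNamedVariableRequest:
    name: str                 # "TIC42"
    address: bytes            # system-specific reference, here 22 31 hex
    type_description: TypeDescription

request = DefineNamedVariableRequest("TIC42", bytes.fromhex("2231"), tic42_type)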
36.10.8 Access to Several Variables

36.10.8.1 Named Variable List
The named variable list allows the definition of a single name for a group of references to arbitrary MMS unnamed variables and named variables. Thus, the named variable list offers a grouping for frequently repeated access to several variables (Figure 36.25). Although simultaneous access to several MMS variables can also be carried out in a single service (read or write), the named variable list offers a substantial advantage. When several variables are read in one read request, the individual names, and the internal access parameters (pointers and lengths) corresponding to the names in the request, must be looked up in the server. This lookup can take some time if there are many names or if the processor performance is low. By using the named variable list object, this lookup is not required, except for the lookup of a single name (the name of the named variable list object), provided the references have been entered into the list in a system-specific, and thus optimal, form. Once the name of the list has been found, the appropriate data can be provided quickly.
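The optimization just described, resolving the member references once when the list is defined instead of on every read, can be sketched as follows. The object and method names are invented for the illustration (using the names LIST_1 and NV_1 to NV_4 from Figure 36.25); only the idea of storing pre-resolved references in the list object comes from the text.

class Server:
    def __init__(self, variables):
        self.variables = variables        # name -> callable returning the current value
        self.named_variable_lists = {}    # list name -> pre-resolved member references

    def define_named_variable_list(self, list_name, member_names):
        # The (possibly slow) name lookup happens only once, at definition time.
        self.named_variable_lists[list_name] = [self.variables[n] for n in member_names]

    def read_list(self, list_name):
        # Only one name has to be looked up per read request.
        return [fetch() for fetch in self.named_variable_lists[list_name]]

server = Server({"NV_1": lambda: 1, "NV_2": lambda: 2, "NV_3": lambda: 3, "NV_4": lambda: 4})
server.define_named_variable_list("LIST_1", ["NV_1", "NV_2", "NV_3", "NV_4"])
print(server.read_list("LIST_1"))   # [1, 2, 3, 4]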
FIGURE 36.24 MMS information report (spontaneous message). [figure: the roughly 80 octets of a BER-encoded information report (44 bytes of payload) listed next to the MMS syntax (written in ASN.1) defined in ISO 9506-2; each element is encoded as identifier (tag), length, and contents, e.g., 85 01 01 for the integer stVal = 1; the receiver interprets the message by matching the received tag values against the ASN.1 syntax (schema).]

FIGURE 36.25 MMS named type and named variable. [figure: a named variable list LIST_1 holding local/optimized references to the named variables NV_1 to NV_4; "Read LIST_1" delivers the same data D1 to D4 as reading NV_1, NV_2, NV_3, and NV_4 individually.]
Thus, the named variable list object provides optimal access features for the applications. This object class is used very intensively in the known applications of MMS. The structure of the named variable list object is as follows:

Object: Named variable list
Key attribute: Variable list name
Attribute: MMS deletable (TRUE, FALSE)
Attribute: List of variable
Attribute: Kind of reference (NAMED, UNNAMED, SCATTERED)
Attribute: Reference
Attribute: Access description

36.10.8.2 Variable List Name
The variable list name unambiguously identifies the named variable list object in a given scope (VMD specific, domain specific, or application association specific). See also MMS object names in Section 36.8.1.

36.10.8.3 MMS Deletable
This attribute shows whether the object may be deleted.

36.10.8.4 List of Variable
A list can contain an arbitrary number of objects (unnamed variable, named variable, or scattered access objects).

36.10.8.5 Kind of Reference
Lists can refer to three object classes: named variables, unnamed variables, and scattered access. Named variable lists cannot be included.

36.10.8.6 Reference
An optimal internal reference to the actual data is assigned to every element of the list. If a referenced object is not (or no longer) available, the entry in the list will indicate this. When the list is accessed, for example by "read," an error indication instead of data is then transmitted to the client for this element.

36.10.8.7 Access Description
Each variable in the list may be the complete variable. The access description may reduce the referenced variable, i.e., only a part of the variable is made visible through the named variable list.
36.10.9 Services

36.10.9.1 Read
This service reads the data of all objects that are part of the list (unnamed variable, named variable, and scattered access objects). For objects that are not defined, an error is reported in the corresponding place of the list of returned values.

36.10.9.2 Write
This service writes the data from the write request into the objects that are part of the list (unnamed variable, named variable, and scattered access objects). For objects that are not defined, an error is reported in the corresponding place of the list of returned values.

36.10.9.3 Information Report
This is just like the "read" service, where the read data are sent by the server to the client without a prior request (read request) by the client, i.e., as if only a "read response" would be transmitted.

36.10.9.4 Define Named Variable List
Using this service a client can create a named variable list object.

36.10.9.5 Get Named Variable List Attributes
This service queries the attributes of a named variable list object.

36.10.9.6 Delete Named Variable List
This service deletes the specified named variable list object.
FIGURE 36.26 Inheritance of type of the MMS named type objects. [figure: the request "Define Named Variable TIC42 at 22 31 hex" refers to the named type MWert with the type description value (Int32), quality (good/bad), time (Time32); the new named variable TIC42 inherits only the type description, not the name MWert.]
36.10.9.7 Named Type Object
The named type object merely describes structures. The object model is very simple:

Object: Named type
Key attribute: Type name
Attribute: MMS deletable (TRUE, FALSE)
Attribute: Type description
The essential attribute is the type description, which was already discussed above for named and unnamed variables. On the one hand, TASE.2 standard data structures can be specified by means of named types; this is the most frequent application of the named type objects. On the other hand, named types can be used for access to the server: the read request can refer to a named type object, or the named type object can be used to define named variables.

Figure 36.26 describes the application of the named type objects for the definition of a named variable. A variable is created by the request "define named variable." It shall have the name TIC42, the address 22 31 hex, and the type that is defined in the named type object MWert. The variable inherits the type from the named type object. The inheritance has the consequence that the variable takes over only the type, not the name of the named type object. This inheritance was defined so strictly in order to avoid that, by deleting the named type, the type of the variable becomes undefined, or that, by the subsequent definition of a differently structured named type with the old type name MWert, the type of the variable is changed (the new type description would be referenced by the old name).

One might object that this strict inheritance has the consequence that the type would have to be saved for each variable, even though many variables have the same type description. Since these variables can internally be implemented in whatever way the programmer likes, they can refer through an internal index to a single type description; the implementer must only make sure that this type description is not deleted. If the accompanying named type object is deleted, the referenced type description must remain preserved for these variables. The disadvantage that the name of the "structure mother," i.e., the named type, is no longer known as an attribute of the variables has been eliminated in the MMS revision.

36.10.9.8 Define Named Type
This service creates a named type object.
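The strict inheritance described above, where the new variable takes over a copy of the type description rather than a reference to the named type, can be sketched as follows. The object names MWert and TIC42 are taken from Figure 36.26; everything else (the dictionaries, the function names, the flat type description) is an invented illustration.

import copy

named_types = {}
named_variables = {}

def define_named_type(name, type_description):
    named_types[name] = type_description

def define_named_variable(name, address, type_name):
    # The variable inherits only the type description, not the name of the named type.
    named_variables[name] = {
        "address": address,
        "type": copy.deepcopy(named_types[type_name]),
    }

define_named_type("MWert", {"value": "Int32", "quality": "good/bad", "time": "Time32"})
define_named_variable("TIC42", bytes.fromhex("2231"), "MWert")

del named_types["MWert"]                      # deleting the named type ...
print(named_variables["TIC42"]["type"])       # ... leaves the variable's type intact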
36.10.9.9 Get Named Type Attribute
This service delivers all attributes of a named type object.

Read, write, define named variable, define scattered access, define named variable list, and define named type use the type description of the named type object when carrying out their tasks.
36.11 Conclusion

MMS is a standard messaging specification (comparable to Web services) that is widely implemented by industrial device manufacturers such as ABB, Alstom, General Electric, and Siemens. It solves the problems of heterogeneity so often found in automation applications; MMS is the lingua franca of industrial devices. MMS provides much more than TCP/IP, which essentially offers a transfer stream of bytes: MMS transfers commands with parameters between machines. MMS allows a user to concentrate on the applications and on making the application data accessible, and not on communication problems, which are already solved. It provides a basis for the definition of common and domain-specific semantics. Examples are the standards IEC 60870-6 TASE.2, IEC 61850, and IEC 61400-25.
References

[1] ISO 9506-1, Manufacturing Message Specification (MMS): Part 1: Service Definition, 2003.
[2] ISO 9506-2, Manufacturing Message Specification (MMS): Part 2: Protocol Definition, 2003.
[3] ISO/IEC 9506-3, Manufacturing Message Specification (MMS): Part 3: Companion Standard for Robotics, 1991.
[4] ISO/IEC 9506-4, Manufacturing Message Specification (MMS): Part 4: Companion Standard for Numerical Control, 1992.
[5] ISO/IEC CD 9506-5, Manufacturing Message Specification (MMS): Part 5: Companion Standard for Programmable Controllers, 1993.
[6] ISO/IEC 9506-6, Manufacturing Message Specification (MMS): Part 6: Companion Standard for Process Control, 1993.
[7] ESPRIT Consortium CCE-CNMA, Preston, U.K. (Editors), MMS: A Communication Language of Manufacturing, Berlin: Springer-Verlag, 1995.
[8] ESPRIT Consortium CCE-CNMA, Preston, U.K. (Editors), CCE: An Integration Platform for Distributed Manufacturing Applications, Berlin: Springer-Verlag, 1995.
[9] Inter-control center communication, IEEE Transactions on Power Delivery, 12, 607–615, 1997.
[10] IEC 60870-6-503, Telecontrol Equipment and Systems: Part 6: Telecontrol Protocols Compatible with ISO Standards and ITU-T Recommendations: Section 503: Services and Protocol (ICCP Part 1), 1997.
[11] IEC 60870-6-802, Telecontrol Equipment and Systems: Part 6: Telecontrol Protocols Compatible with ISO Standards and ITU-T Recommendations: Section 802: Object Models (ICCP Part 4), 1997.
[12] März, W. and Schwarz, K., Powerful and open communication platforms for the operation of interconnected networks, in Proceedings of ETG-Tage/IEEE PES, Berlin, 1997.
[13] IEEE Technical Report 1550, Utility Communications Architecture, UCA, http://www.nettedautomation.com/standardization/IEEE_SCC36_UCA, 1999.
[14] Becker, G., Gärtner, W., Kimpel, T., Link, V., März, W., Schmitz, W., and Schwarz, K., Open Communication Platforms for Telecontrol Applications: Benefits from the New Standard IEC 60870-6 TASE.2 (ICCP), Report 32, VDE-Verlag, Berlin, 1999.
[15] Wind Power Communication: Verification Report and Recommendation, Elforsk rapport 02:14, Stockholm, April 2002, www.nettedautomation.com/download/02_14_rapport.pdf.
[16] IEC 61850-7-1, Communication Networks and Systems in Substations: Part 7-1: Basic Communication Structure for Substation and Feeder Equipment: Principles and Models, 2003.
[17] IEC 61850-7-2, Communication Networks and Systems in Substations: Part 7-2: Basic Communication Structure for Substation and Feeder Equipment: Abstract Communication Service Interface (ACSI), 2003.
[18] IEC 61850-7-3, Communication Networks and Systems in Substations: Part 7-3: Basic Communication Structure for Substation and Feeder Equipment: Common Data Classes, 2003.
[19] IEC 61850-7-4, Communication Networks and Systems in Substations: Part 7-4: Basic Communication Structure for Substation and Feeder Equipment: Compatible Logical Node Classes and Data Classes, 2003.
[20] IEC 61850-8-1, Communication Networks and Systems in Substations: Part 8-1: Specific Communication Service Mapping (SCSM): Mappings to MMS (ISO/IEC 9506-1 and ISO/IEC 9506-2) and to ISO/IEC 8802-3, 2004.
[21] IEC CD 61400-25, Wind Turbines: Part 25: Communications for Monitoring and Control of Wind Power Plants, 2004.
Further Resources

See the following Web pages for additional information:
http://www.livedata.com
http://www.sisconet.com
http://www.tamarack.com
http://www.scc-online.de
http://www.nettedautomation.com
http://www.nettedautomation.com/qanda/iec61850/information-service.html# (IEC 61850 circuit breaker model)
37
Virtual Factory Communication System Using ISO 9506 and Its Application to Networked Factory Machine

Dong-Sung Kim
Kumoh National Institute of Technology

Zygmunt J. Haas
Cornell University

37.1 Introduction
37.2 MMS on Top of TCP/IP
37.3 Design of a Virtual Factory Communication System Using MMS
37.4 Virtual Factory Communication System Using MMS-CS
     MMS Companion Standard • MMS Internet Monitoring System
37.5 Practical Application of MMS-CS and VFCS
37.6 Conclusions
Acknowledgment
References
37.1 Introduction

For interconnection purposes, a factory automation (FA) system can be combined with various sensors, controllers, and heterogeneous machines using a common message specification. In particular, interconnection of heterogeneous networked machines through a common message specification promotes flexibility and interoperability. The Manufacturing Message Specification (MMS) standard has been developed to facilitate such interconnection. The standard specifies a set of communication primitives and communication protocols for the factory communication environment.

In particular, the MMS standard specifies various functionalities of the different FA devices in a compatible way. Thus, users of MMS applications have to use functions from only one unique set of functionalities to operate various kinds of automation machines. Moreover, the different automation machines can communicate among themselves through the standard automation language. This enables transition from the traditional centralized control to the distributed networked control systems.
The MMS standard is composed of two types of standards: the common standards [1], [2] and the MMS companion standard (MMS-CS). The MMS-CS includes device-dependent specifications for various factory machines: a robot [3], a numerical controller (NC) [4], a programmable logic controller (PLC) [5], and a process control (PC) [6]. In its initial stage, the MMS standard was developed as the application layer of the Manufacturing Automation Protocol (MAP) [7]. Nowadays, however, MMS is also implemented on top of the Transmission Control Protocol (TCP)/Internet Protocol (IP) suite. MMS is used as a reference model for industrial networks and other message protocols, such as the home network protocol [8]. In particular, MMS is implemented in a reduced form as the application layer of Profibus (process fieldbus) [9] and of the Field Instrumentation Protocol (FIP) [10]. It is also used as the application layer of Utility Communication Architecture 2.0 (UCA 2.0) and of the Inter-Control Center Communication Protocol (ICCP) [11].

A virtual factory environment can be used to examine the correctness of an implementation and as a test solution for an FA system prior to installing the system in a real factory communication environment [12], [13]. Furthermore, by using a virtual factory communication system (VFCS), development time and costs can be reduced. In particular, a VFCS can be used as a training tool for MMS users and developers.

Research in MMS technology includes the application of factory devices, middleware platforms, and multimedia communication in FA systems. In [14], a real NC machine was implemented using an MMS-enabled application program. In [15], MMS with the Common Object Request Broker Architecture (CORBA) was studied. A modified architecture was proposed for multimedia communication in an FA system [16], [17]. The performance evaluation and analysis methodology of the MMS standard have been studied as well [18], [19], [20].

In this chapter, we describe a VFCS implementation using the MMS and MMS-CS standards. In addition, we discuss the implementation of a practical FA system based on the developed VFCS. This chapter is organized as follows. In Section 37.2, the architecture of MMS communication on top of TCP/IP is described. In Section 37.3, the design procedure of a VFCS using the MMS standard is described. The developed VFCS and its applications are presented in Section 37.4. In Section 37.5, a practical application of the VFCS is introduced. Finally, a summary and conclusions are presented in Section 37.6.
37.2 MMS on Top of TCP/IP

Two types of protocol structures for MMS communication on top of TCP/IP, referred to as MOTIP, have been studied [21]. One method, which is based on Request for Comments (RFC) 1006, uses the Open Systems Interconnection (OSI) seven-layer model. The other method is based on N578 and defines a direct mapping between the M-services and the TCP functions. In particular, we chose to use the N578 method in the implemented system because of its simplicity.

The differences between the structures of full MAP, mini-MAP, and MOTIP are presented in Figure 37.1. In the case of mini-MAP, the IEEE 802.4 Token Bus network is used instead of the Ethernet (IEEE 802.3) subnetwork. As shown in Figure 37.1, the full MAP structure has a scheme similar to that of MOTIP based on the RFC 1006 method, and the mini-MAP structure is similar to that of MOTIP based on the N578 method.

FIGURE 37.1 Various types of the MMS protocol structures (full MAP, mini-MAP, and MOTIP). [figure: the full MAP stack carries MMS and ACSE over the presentation, session, transport, network, and data link layers; the mini-MAP stack places MMS directly over the data link layer of the IEEE 802.4 Token Bus; the MOTIP stack places MMS over TCP, IP, and Ethernet.]

FIGURE 37.2 Protocol stack of MOTIP. [figure]

The protocol stack of MOTIP using the N578 rule is shown in Figure 37.2. Once an MMS function is requested at the MMS client side, either the synchronous service or the asynchronous service can be used for communication. Using the synchronous service, the MMS client receives a confirmation through a response from the MMS server. Using the asynchronous service, the functions are executed according to a received event through the response of the MMS server. Most of the implemented MMS services are executed in the synchronous mode in the developed system.
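The difference between the two modes can be sketched independently of any concrete MMS library: a synchronous service blocks until the confirmation arrives, while an asynchronous service returns immediately and delivers the confirmation through a callback. The class, function, and parameter names below are invented for the illustration and do not belong to the SNU MMS library or to MOTIP.

import queue
import threading

class IllustrativeClient:
    def __init__(self, send_request):
        self.send_request = send_request          # transmits a request and returns the response

    def call_sync(self, request):
        """Synchronous service: block until the confirmation (response) arrives."""
        return self.send_request(request)

    def call_async(self, request, on_confirmation):
        """Asynchronous service: return immediately, deliver the response via a callback."""
        def worker():
            on_confirmation(self.send_request(request))
        threading.Thread(target=worker, daemon=True).start()

# Example with a dummy transport that simply echoes the request.
client = IllustrativeClient(send_request=lambda req: {"status": "ok", "request": req})
print(client.call_sync({"service": "Status"}))

done = queue.Queue()
client.call_async({"service": "Read"}, on_confirmation=done.put)
print(done.get())                                  # wait for the asynchronous confirmation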
37.3 Design of a Virtual Factory Communication System Using MMS

Large factory communication systems, such as those in an automated factory or process plant (e.g., nuclear or refinery processes), are composed of several distributed and spatially dispersed field devices or machines. Some benefits of an interconnection of these field devices or machines through networking include increased reliability, increased flexibility, and a user-friendly interface design. For these reasons, a VFCS implemented using MMS can be a useful tool for engineers prior to a practical installation of a factory communication system. It can be used to test an implemented communication program prior to a practical installation on real machines. In particular, MMS emphasizes vendor-independent communication means and easily expandable services.

In this section, we introduce an efficient and practical implementation of a VFCS using MMS. The implemented VFCS is based on the description of the practical test plant shown in Figure 37.3 and Figure 37.4. These figures show an overview (Figure 37.3) and the layout (Figure 37.4) of a practical test plant implemented using MMS communication on top of TCP/IP.
FIGURE 37.3 Overview photo of the referenced practical test plant.

FIGURE 37.4 Reference layout of the virtual factory. [figure: a cell controller coordinating a TONGIL CNC machine (drilling), a cleaning machine with a POSCON PLC and moving part, a Samsung assembly robot, an HHI robot, an AGV, and an AS/RS.]

37.4 Virtual Factory Communication System Using MMS-CS

In this section, we introduce the developed VFCS using MMS-CS and describe its operation. The virtual factory can be embodied by integrating each virtual real machine (VRM), the MMS-CS program, and the MOTIP programs. Communication with each VRM can be carried out using the developed MMS-CS program based on the Seoul National University (SNU) MMS library [23]. The MOTIP central server program can be located in the control room of a factory plant. The MOTIP central server can monitor the networked manufacturing machines and issue MMS service commands to them.

The implemented MMS services are as follows: the virtual manufacturing device (VMD) service for the status acquisition of a networked machine, the domain service for file transfer, the program invocation (PI) service for the remote execution of a program, and miscellaneous services for reading and writing device status and operations. The PI services include creation, initialization, termination, and deletion of a remotely executed program. A PI service can be carried out together with other services. For example, the remote execution of a specific program can be performed after the status has been obtained through the VMD service and the program segments have been downloaded through the domain service (Figure 37.5).
37.4.1 MMS Companion Standard

The MMS-CS includes device-dependent specifications for several factory machines such as the robot, NC, PLC, and PC. The implementations of the MMS-CS VRMs are based on the MMS service specifications in [3]–[6]. Specifically, the services of the implemented MMS-CS VRMs consist of VMD, domain, PI, and miscellaneous services. The implemented services are as follows:

• VMD services (three services): status, identify, and unsolicited status
• Domain services (five services): initiate download, download segment, terminate download, initiate upload segment, and terminate upload
• PI services (four services): create, start, stop, and delete
• Variable services (two services): read and write
FIGURE 37.5 Screen shot of the developed virtual factory.
The following services have been implemented only partially: semaphore, operator, event, and journal.

Figure 37.5 shows the overall screen shot of the developed virtual factory. The VFCS is operated by controlling each VRM through the MOTIP-based communication programs. Figure 37.6 shows one of the reference models of the practical machines, and Figure 37.7 shows the NC-CS VRM. The application programs of the robot-CS VRMs are modeled after a six-axis robot and a four-axis SCARA robot. The six-axis robot is used for conveying the material with a pallet throughout the implemented virtual system. The four-axis robot takes part in a bolt-assembling task of the overall process. The NC-CS VRM is modeled after an NC machine (TNV-40) (Figure 37.6) and is responsible for drilling a hole in the sample material. The application program of the PLC-CS VRM is modeled after a cleaning machine with an air compressor and clamps; it is controlled by the PLC, which is installed under the cleaning machine. The PLC-CS VRM takes part in the cleaning of the assembled parts after the drilling process performed by the NC-CS VRM (Figure 37.7). The PC-CS VRM program handles the scheduling of all the networked machines together with the autonomous guided vehicle (AGV) movement (Figure 37.8). The CS application program includes all the VRMs, the common MMS services, and the MMS-CS services. All networked virtual machines are operated in accordance with their scheduled operation scenarios [21], and the operation scenarios can be modified by the MMS user (Figure 37.8). The overall scenario can be scheduled by the MMS user. For reference, the source of the VRMs and the MOTIP program can be downloaded from the site at [23].
FIGURE 37.6 Practical model of an NC machine.

FIGURE 37.7 Screen shot of the NC-CS VRM program.

FIGURE 37.8 Screen shot of PC-CS VRM: AGV moves from station 1 to station 4 for unloading a processed part.

37.4.2 MMS Internet Monitoring System

The MMS Internet Monitoring System (MIMS) is designed for simultaneous monitoring and controlling of the status of multiple clients based on the VFCS. MIMS transfers the operational information of the virtual factory to an MMS administrator and to external MMS users. Therefore, MIMS can concurrently monitor the status of each networked VRM and of several MMS-CS services. The structure of MIMS is shown in Figure 37.9. MIMS is composed of the MOTIP application program, the application program using MMS-CS, and each networked VRM. The MMS server monitoring program in the server area can display the messages from external MMS users through online monitoring of the status of the VRMs.
FIGURE 37.9 Structure of the MIMS system.
MIMS is designed for up to 100 simultaneous MMS users. MMS users can download the MMS-CS and the client-specific program from the site at [22]. Whenever MMS clients use the downloaded program based on MMS-CS, MIMS can monitor the operation status through the user information and the communication data exchanged between the MMS client and server. The administrator can monitor all the messages from the multiple clients and check for problems in the networked VRMs based on MOTIP. The MMS user can then fix his technical problem through online advisories from the MMS server administrator. Therefore, the developed system can serve as a training tool for MMS users.
37.5 Practical Application of MMS-CS and VFCS
In this section, we introduce a practical system that uses MMS-CS on TCP/IP together with its VRM. The applicability of the described networked factory machine using MMS and its CS part can be extended to several factory machines using MOTIP. Figure 37.10 shows a specific application program for the NC interface, implemented as an MMS-enabled application program. The NC machine can be connected via RS232, Ethernet, or a vendor-specific interface such as DNC2 [24]. This constitutes an experimental platform that implements an MMS-CS-based manufacturing communication system for the NC machine. The platform has been deployed in a working molding plant to support CAD/CAM data transfer and remote operation of the NC machine using the MMS-enabled communication system (Figure 37.11). For practical FA implementation, a file transfer between the CAD/CAM part and the device control and monitoring part has been designed and developed. In addition, an NC code transformation program and a tool position detection program using hardware systems were developed to integrate the heterogeneous NC machine.
FIGURE 37.10 Screen shot of the NC-specific program.
FIGURE 37.11 Overall communication architecture of the NC machine: a file transfer server and a CAD/CAM local server communicate over MOTIP and TCP/IP through a switching hub, and the middle-sized NC machine on the practical field is connected via DNC2 and RS232C.
The FA implementation demonstrates the design of a fully networked FA system that achieves a high quality of service for remote operation. It allows results to be obtained from pretests on the developed VFCS test bed and the MOTIP-based application program to be verified.
37.6 Conclusions
This chapter presents an implementation of a VFCS using the general MMS and its CS part. In particular, the robot, NC, PLC, and PC VRMs and MIMS were designed and implemented. The described VFCS can be extended to an MMS implementation of a networked practical plant. Additionally, the implemented system can be used as a design framework of an MMS system for a practical plant. Furthermore, the developed VFCS and the VRMs can also be used to verify the operation of a real FA environment.
Regarding the future direction of this work, one may consider the application of wireless communication in the implementation of a VFCS for a practical FA system. Wireless communication networks can be used for online monitoring and control, together with a wired network in an FA system, especially where installing wired connections alone would be difficult and expensive.
Acknowledgment
The authors thank the editor and the ERC-OTT members of ERC-ACI at Seoul National University for their valuable and insightful comments.
References
[1] ISO 9506-1: Industrial Automation Systems: Manufacturing Message Specification: Part 1: Service Definition, 2002.
[2] ISO 9506-2: Industrial Automation Systems: Manufacturing Message Specification: Part 2: Protocol Specification, 2002.
[3] ISO 9506-3: Manufacturing Message Specification: Companion Standard: Protocol Specification of Robot, 2000.
[4] ISO 9506-4: Industrial Automation Systems: Manufacturing Message Specification: Companion Standard: Protocol Specification of NC, 2000.
[5] ISO 9506-5: Industrial Automation Systems: Manufacturing Message Specification: Companion Standard: Protocol Specification of PLC, 2000.
[6] ISO 9506-6: Industrial Automation Systems: Manufacturing Message Specification: Companion Standard: Protocol Specification of Process Control, 2000.
[7] MAP 3.0 Specification, 1993 release.
[8] D.S. Kim, K.Y. Cho, W.H. Kwon, Y.I. Kwan, and Y.H. Kim, Home network message specification of white goods and its application, IEEE Transactions on Consumer Electronics, 48, 1–10, 2002.
[9] DIN 19 245, Profibus Standard, Profibus Trade Organization, 1993.
[10] prEN 50170, General Purpose Field Communication System, WorldFIP, 1995.
[11] EPRI, Fundamentals of Utilities Communication Architecture, IEEE Computer Applications in Power, 14, 15–21, 2001.
[12] D. Brugali, D. Dragomirescu, et al., A customizable software infrastructure for virtual factories development, in Proceedings of the IEEE International Conference on Robotics and Automation, Vol. 3, 1999, pp. 2440–2445.
[13] F. Florent, Experiences with a control architecture for the virtual factory, in IEEE International Workshop on Robot and Human, 1999, pp. 387–393.
[14] Brahim Maref, Communication system for industrial automation, in IEEE International Symposium on Industrial Electronics, 1997, pp. 1286–1291.
[15] V. Glavinic, B. Motik, Object-oriented interface to MMS services, in Mediterranean Electrotechnical Conference, Vol. 2, 1998, pp. 1375–1379.
[16] V. Matraix, J. Tordera, Video transmission on industrial processes over MAP networks using MMS Sepere, in IEEE International Conference on Emerging Technologies and Factory Automation, Vol. 1, 1999, pp. 423–433.
[17] Q. Wu, T. Divoux, F. Lepage, Integrating multimedia communications into an MMS environment, Computer Communications, 22, 907–918, 1999.
[18] B. Maaref, S. Nasri, P. Sicard, Performance evaluation of an MMS/ATM implementation, in Factory Communication Systems Proceedings, IEEE International Workshop, 1997, pp. 239–244.
[19] J. Akazan, Z. Mammeri, On task synchronizations with the MMS protocol, Journal of Real Time Systems, 2, 213–234, 1995.
[20] T. Watanabe, Y. Maeda, et al., Total simulation system for steel plate manufacturing based on the virtual factory concept, in IEEE Conference on Industrial Electronics, Vol. 2, 2000, pp. 1280–1285.
[21] D.S. Kim, W.H. Kwon, Z.J. Hass, Virtual industrial communication system using Manufacturing Message Specification, in The Handbook of Industrial Information Technology, CRC Press, Boca Raton, FL, 2004.
[22] Virtual Demo System Using MOTIP, Engineering Research Center for Advanced Control Instrumentation, Seoul National University, 2003, available from http://icat.snu.ac.kr:4444/mms/demo_frame.html.
[23] SNU MMS Library, Engineering Research Center for Advanced Control Instrumentation, Seoul National University, 2003, available from http://www.icat.or.kr:8086/mms.
[24] FANUC CIMPLICITY i CELL, Fanuc Corporation, 2003, available from http://www.fanuc.co.jp/en/product/cimplicity/icell/icell.html.
38 The SERCOS interface™*

Scott C. Hibbard, Bosch Rexroth Corporation
Peter Lutz, Interests Group SERCOS interface e.V.
Ronald M. Larsen, SERCOS North America

38.1 Description: Three Generations of the SERCOS interface • SERCOS interface Is an International Standard • How the SERCOS interface Works • Functions of the SERCOS interface • Ideal for Distributed Multiaxis Control Systems • Advantages
38.2 Brief History of the SERCOS interface: The ±10-Volt Analog Drive Standard • First- and Second-Generation SERCOS interface Criteria • First- and Second-Generation SERCOS interface: A New World Standard
38.3 The SERCOS interface Is Not a Fieldbus
38.4 How the SERCOS interface Communicates: SERCOS interface Topology • Fiber Optics • SERCOS interface ASICs • Timing • Interface Placement • SERCOS interface IDNs • System of Units and Variable Format • SERCOS interface Cycle Times • Cyclic Operation • Error Correction/Diagnostics • System Safety
38.5 Features and Operation of SERCOS-III: Introduction • Distinct Features of SERCOS-III • Topology • Direct Communication between Slaves • IP Channel • Motion Control Communication for Centralized and Decentralized Drive Concepts • Synchronization of Motion Controls • SERCOS-III Performance • SERCOS-III Hardware • SERCOS-III Compared to Other Ethernet-Based Real-Time Communication Systems • SERCOS-III Availability • Functionality and Features of the Three Generations of SERCOS interface
38.6 SERCANS
38.7 SoftSERCANS: What SoftSERCANS Is • What SoftSERCANS Is Not • Performance • Resources Required
38.8 Conformance Testing: SERCOS interface Conformance Classes • Conformance Test Descriptions • Conformance Test Environment
38.9 The SERCOS interface in Packaging Applications
38.10 The SERCOS interface in Machine Tool Applications: Application Note: Contouring Control System
*Much of the text in this document is included by permission from www.sercos.com, the Web site of the SERCOS North America Trade Association.
38.11 Future Technical Advancements
Acknowledgments
Bibliography
Sources for More Information
38.1 Description
SERCOS is an acronym for Serial Real-time COmmunications System, a digital motion control bus that interconnects motion controls, drives, input/output (I/O), and sensors. It is an open controller-to-intelligent digital device interface, designed for high-speed serial communication of standardized closed-loop data in real time over a noise-immune, fiber-optic cable or Ethernet cable.
The SERCOS interface™ was created in the 1980s by the German ZVEI and VDW organizations to specify a digital open interface that would ease the transition from analog to digital drive technology. It was originally intended to be a drive interface, and in its beginning was mainly used for advanced machine tool applications. It has now become a universal motion control interface, accepted worldwide in a myriad of industries.
The SERCOS interface not only is a real-time communication system, but also offers more than 500 standardized parameters that describe the interplay of drives and controls in terms independent of any manufacturer. It offers advanced motion control capabilities, and it includes features for I/O control that often allow a machine builder to dispense with the need for a separate I/O bus.
The SERCOS interface is the only internationally standardized open digital interface with the performance required to synchronize high-performance multiaxis motion control systems. Like most digital buses, the SERCOS interface greatly reduces connectivity problems in control systems. The SERCOS interface can connect up to 254 drives to a control using one fiber-optic cable ring or a single Ethernet cable, compared to a traditional analog servo system with eight axes of motion that may require over 100 wires between the drive and control. This reduces system cost, eliminates many types of noise problems, and helps machine designers get motion control systems up and running quickly.
The proper trademarked name for the interface is SERCOS interface™, with SERCOS all caps and interface lowercase. Because of trademark considerations, the interface should not be referred to merely as "SERCOS."
38.1.1 Three Generations of the SERCOS interface
There are three generations of the SERCOS interface. The first two utilize an application-specific integrated circuit (ASIC) as a hardware processing platform, fiber-optic transmitters and receivers, and a fiber-optic cable as the transmission medium.
• The first generation operated at 2 and 4 Mbit/s, using the SERCON410B ASIC and fiber-optic cabling. The SERCON410B is no longer produced.
• The second (current) generation operates at 2, 4, 8, and 16 Mbit/s, using the SERCON816 ASIC and fiber-optic cabling.
• The third generation, dubbed SERCOS-III, operates at up to 100 Mbit/s utilizing a field programmable gate array (FPGA) or a general-purpose communication controller (GPCC) and is based on standard Ethernet hardware. SERCOS-III is expected to be available in 2005.
More than 50 control manufacturers and 30 drive manufacturers worldwide offer first- and second-generation SERCOS interface products, with hundreds of thousands of systems installed. Several manufacturers offer SERCOS interface I/O stations plus hardware and software development tools.
In 2004, work was begun on a third generation of the SERCOS interface. It involves linking the existing high-performance SERCOS interface with Industrial Ethernet to form a new generation of SERCOS, named SERCOS-III. The new version combines the determinism of the original SERCOS interface with the high bandwidth of Industrial Ethernet for the best of both worlds.
SERCOS-III maintains backward compatibility with previous versions in regard to profiles, synchronization, and message structures. It retains the set of more than 500 standard parameters that describe all aspects of real-time motion and I/O control. This chapter first describes the basic structure of the SERCOS interface as implemented in generations I and II. It then provides a complete description of the changes and enhancements provided by SERCOS-III.
38.1.2 SERCOS interface Is an International Standard
The SERCOS interface is a set of standard specifications that may be incorporated into any company's products, with each control and drive maintaining its own functions and features. Because it is an international standard (IEC/EN 61491), it allows any manufacturer's SERCOS interface-compatible digital control to talk to any other SERCOS interface-compatible digital servo drive, digital spindle drive, hydraulic system, digital I/O, or sensors over a well-defined fiber-optic link or standard Ethernet technology (SERCOS-III). Controls and drives conforming to the standard comply with a standard medium for transmission, topology, connection techniques, signal levels, message (telegram) structures, timing, and data formats.
Most other digital drive interfaces on the market are either proprietary, usable only with the manufacturer's own drives, or have been "opened" but are still developed and controlled by a single manufacturer, with only a small number of other firms providing products that use the bus. The SERCOS interface is the only motion control interface that is standardized, guaranteeing both a standardized hardware platform and a standardized protocol open to use by all manufacturers.
38.1.3 How the SERCOS interface Works
The first-generation SERCOS interface operated at 2 and 4 Mbit/s, using the SERCON410B ASIC, now obsolete. The second-generation SERCOS interface operates at 2, 4, 8, and 16 Mbit/s, using the SERCON816 ASIC. SERCOS-III operates at up to 100 Mbit/s, using an FPGA or a GPCC and standard Industrial Ethernet hardware.
ASIC-, FPGA-, or GPCC-based SERCOS interface controllers are normally integrated into master motion controls as well as drives, amplifiers, and I/O modules. They simplify the task of the designer by automatically handling most SERCOS interface communication functions. A set of over 500 standard software functions (called IDNs (identification numbers)) defines standard motion and I/O functions. In addition, the interface allows for manufacturer-specific IDNs, which can be used to define unique functions not addressed by the standard IDN set.
In a SERCOS interface system, all servo loops are normally closed in the drive. This reduces the computational load on the motion controller, allowing it to synchronize more motion axes than it otherwise could. In addition, closing all the servo loops in the drive reduces the effect of the transport delay between the motion control and drive.
38.1.4 Functions of the SERCOS interface
• The SERCOS interface exchanges data between controls and drives, transmitting command and actual values with extremely short cycle times.
• It guarantees an exact synchronization for precise coordinated moves with as many axes as required.
• The interface includes a service channel for noncyclic data transmission, used for the display and input of all control internal parameters, data, and diagnostics. Drive parameters can be downloaded and uploaded for storage via the service channel.
• The SERCOS interface supports four operating modes: torque, velocity, position control, and block modes (a simple representation of these modes is sketched after this list).
• The interface enables the use of controls and drives from different manufacturers in a system, by standardizing all data, parameters, commands, and feedback exchanged between drives and controls.
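As a minimal illustration of the operating modes listed above, the fragment below simply enumerates them and shows where a drive would branch on the active mode; the names are this sketch's own, not identifiers from the standard.

```c
/* Illustrative only: the four operating modes named in the text. */
typedef enum {
    OP_MODE_TORQUE,
    OP_MODE_VELOCITY,
    OP_MODE_POSITION,
    OP_MODE_BLOCK          /* drive executes a pre-loaded motion block autonomously */
} op_mode_t;

/* A drive's cyclic task would interpret the command value received in the
   master data telegram according to the active mode. */
void apply_command(op_mode_t mode, long command_value) {
    (void)command_value;
    switch (mode) {
    case OP_MODE_TORQUE:   /* command_value is a torque setpoint   */ break;
    case OP_MODE_VELOCITY: /* command_value is a velocity setpoint */ break;
    case OP_MODE_POSITION: /* command_value is a position setpoint */ break;
    case OP_MODE_BLOCK:    /* command_value selects a motion block */ break;
    }
}
```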
38.1.5 Ideal for Distributed Multiaxis Control Systems
The SERCOS interface is a foundation for building distributed multiaxis control systems for a myriad of applications. Distributed control improves machine flexibility by moving processing power and decision making from the Computer Numerical Control (CNC) or motion control down into the drives and sensors. These devices then become intelligent building blocks that can easily be added to a machine or production line without major changes in hardware and software.
The SERCOS interface is well suited for distributed control because it allows placement of axis-dependent control functions, such as loop closures, interpolation, and registration, in the drives, not in the motion controller. Thus, motion controllers can concentrate on motion control profiles and tool paths independent of the axes. When designed in this manner, the control issues a position command to the drive, which then closes its own loops and microinterpolates its trajectory, based on previously downloaded parameters. The SERCOS interface also integrates input/output functions such as limit switches, push buttons, and various sensors.
38.1.6 Advantages
The SERCOS interface provides machine manufacturers with the flexibility to configure multivendor control systems with plug-and-play interoperability. Even more importantly, the SERCOS interface has facilitated great advances in machine productivity. A prominent U.S. food manufacturer states, "Our SERCOS applications play an important role in our plan for a dynamic enterprise that can flexibly adapt manufacturing for every new product size, shape and packaging configuration. Along with this flexibility, we have realized incredible timesaving advantages — faster delivery, faster installation, faster setup, faster product changeover, and faster production speeds."*
The SERCOS interface allows manufacturers to create intelligent digital drives with vastly improved capabilities and flexibility. A single drive can be designed to handle multiple types of motors, such as permanent magnet servomotors, high-horsepower induction servomotors (vector drives), and linear motors, with the configuration set up parametrically. The SERCOS interface is also used with stepper motors, hydraulic drives, and I/O systems.
38.2 Brief History of the SERCOS interface†
38.2.1 The ±10-Volt Analog Drive Standard
For 30 years, the motion control industry relied on a de facto ±10-V interface standard (RS-431) between analog drives and controls, where 10 V equals full speed of the drive and the ± sign determines the direction of motor rotation. Although it was fine for analog systems, this interface proved to be totally inadequate for the new, more complex digital technologies that were coming on the scene in the 1980s. As servos began to evolve into digital-based devices, they soon exceeded the capabilities of the ±10-V command.
To circumvent the restrictions of the analog velocity command interface, various proprietary CNC–drive interfaces emerged. Although they each solved some of the problems, they also created new ones. Foremost was that the openness and portability of RS-431 was lost. Products were developed that depended upon not just a particular type of servo technology, but even a particular vendor's product. The situation clearly demanded a new standard.
*Hershey Finds SERCOS Best Path to Enterprise-Wide Compatibility, Keith Campbell, Instrumentation and Control Systems, January 1999.
†Illustrations and text on the evolution of digital drives and the SERCOS interface adapted by permission from the technical paper "Open Drive Interfaces for Advanced Machining Concepts," by Scott Hibbard of Bosch Rexroth Corporation, January 1995.
38.2.2 First- and Second-Generation SERCOS interface Criteria
Based on past experiences, a list of attributes was produced for an analog interface replacement:
1. It must be an open system. Users of the motion control products and interface must not be restricted to a given supplier for both drives and controls, as no one supplier can provide all solutions in all arenas. Likewise, designers of custom controls should have an open solution to design into their products, so that they may make use of the widest possible family of solutions. Such a system also should permit the development of controls and drives independently of one another.
2. The interface must be technology independent, permitting use of various drive technologies such as brushless DC, vector control, Variable Frequency Drives (VFD), stepper, hydraulics, or pneumatics.
3. It must be economical. The total package cost should be no more for a control system with a next-generation drive–control interface than for an analog interface system.
4. It must support high-speed and high-resolution operations.
5. It must support access to internal data in a standardized format — variables as well as diagnostics.
6. It should support single-axis and multiaxis systems. The cost should not affect single-axis systems, yet the interface should support synchronized multiaxis systems.
7. It should support distributed control.
8. It must afford the same or better troubleshooting aids as the analog drive.
38.2.3 First- and Second-Generation SERCOS interface: A New World Standard
Based upon pressures from its member companies, in 1987 the VDW (German Machine Tool Builders Association) formed a joint working group with the ZVEI (German Electrical Standards Association). Their charter was to develop a next-generation open interface between drives and controls that included the expanded opportunities afforded by digital technology. The interface was intended to permit continued drive technology independence, meaning drive technology should have no impact on CNC and motion control technology, and vice versa. After over 40 man-years of combined effort, the working group published a specification for the SERCOS interface scheme.
From the outset, the goal was to create an international standard, usable by any manufacturer. In 1990, a draft international standard was submitted to the International Electrotechnical Commission (IEC). The standardization process moves slowly, so in 1995, the standard was approved as IEC 1491. A few years later, the IEC changed its numbering convention and the standard became IEC 61491. In 1998, the standard was also approved as European Standard EN 61491. A few typos had been corrected from the IEC to the EN version, but the two were essentially the same. And in 2001, a minor update to IEC 61491 made the wording of the two identical.
ASIC development occurred in parallel with the standardization process. In 1993, SGS Thomson (now STMicroelectronics) developed the SERCON410B ASIC, operating at 2 and 4 Mbit/s. This made implementation of the SERCOS interface simpler and less expensive. In 1999, STMicroelectronics produced the SERCON816 ASIC, operating at 2, 4, 8, and 16 Mbit/s, quadrupling available interface speeds. It provides backward compatibility to the SERCON410B. In 2000, products from SERCOS interface association members began shipping with the SERCON816.
38.3 The SERCOS interface Is Not a Fieldbus
It is important to understand that the SERCOS interface is first and foremost a motion control bus, with I/O capabilities. It does not compete with device buses and fieldbuses, which are primarily intended to control low-level devices that require on/off control or an asynchronous command value.
FIGURE 38.1 Flexible system with multiple interfaces: generations I and II. A controller (master) connects to an industrial PC running Windows-based MMI software and to the factory network (Ethernet), to machine I/O through a gateway to a fieldbus, and to the digital drives (slaves) of a multiaxis machine over the SERCOS interface fiber-optic loop, which operates at up to 16 Mbit/s and carries position, velocity, and torque commands as well as feedback, diagnostics, and registration inputs; the position, velocity, and current loops are closed in the drives, with motor power and resolver/encoder feedback wired locally at each drive.
In fact, many manufacturers use a high-level bus such as Ethernet for factory control, a fieldbus such as Profibus or DeviceNet for I/O control, and the SERCOS interface for motion control. However, in some applications the I/O requirements are appropriate for utilizing the SERCOS interface as both the motion and I/O interfaces. Although the SERCOS interface was not designed primarily for I/O, it includes the capability to decentralize I/O via multiple I/O nodes that connect to the field devices. The decision of whether to use a fieldbus, SERCOS interface, or both depends on the application. If only loose coupling between motion axes is required, a series of single-axis position drives connected via a fieldbus may be adequate. DeviceNet or ControlNet, for example, can be used for simple motion control applications, but they are not truly deterministic and do not possess a clock tick or synchronization signal to ensure that multiple nodes operate in perfect time with each other. Synchronized motion applications require a higher level of real-time control (Figure 38.1). Only the SERCOS interface (or a proprietary digital interface) allows the motion control to calculate the position commands for all axes in a servo system and transmit them with guaranteed synchronization to fully coordinate multiple axes of motion. In high-power applications, such as flexible machine systems, both a fieldbus and SERCOS interface exist in the same control. Today’s trend is for fieldbus and motion bus development (including SERCOS-III) to utilize Industrial Ethernet as a common physical layer. It is important to note that there are still significant differences among these buses in the protocols, synchronization mechanisms, and profiles, so that the coexistence of different technologies will still be necessary for most applications.
FIGURE 38.2 SERCOS interface generations I and II topology. A control unit may contain several masters (master 1, master 2), each controlling its own ring (ring 1, ring 2) of up to 254 slaves, which may be single drives or groups of drives. The master controls the ring and assigns time slots to ensure deterministic and collision-free access for all drives; messages travel unidirectionally around the ring, and each slave extracts or adds its data and then passes the message downstream.
38.4 How the SERCOS interface Communicates*
38.4.1 SERCOS interface Topology
Components in generations I and II SERCOS interface-based motion control systems are connected via fiber-optic rings using a master–slave configuration (Figure 38.2). A typical system may include several rings, with up to 254 devices per ring. A SERCOS interface master controls each ring, assigning time slots to ensure deterministic and collision-free access for all slaves. SERCOS-III uses Industrial Ethernet and offers both a line and ring structure.
38.4.1.1 Communications Structure
In order to ensure strict synchronization of multiple axes and a predictable update time at each axis, the SERCOS interface utilizes a master–slave communication structure, where the motion control acts as a master and the drives as slaves. The drives are only permitted to respond to queries from the motion control. Note that the new SERCOS-III supports slave-to-slave data transfer.
38.4.2 Fiber Optics
With generations I and II SERCOS interfaces, one fiber-optic ring is used to exchange full 32-bit data between controllers, drives, I/O, and sensors. This includes commands, status, parameters, and diagnostics. In SERCOS-III, standard Industrial Ethernet is normally used for this purpose. Note that Ethernet fiber optics can be used for communication with SERCOS-III.
Fiber optics provides inherent noise immunity and eliminates the immense requirements for conduit, wiring, and terminations normally required with the analog interface. The ring architecture reduces the number of components required on a motion control. Adding an additional motion axis often requires nothing more than opening the ring and placing the new drive in the ring.
A SERCOS interface ring is composed of a number of fiber-optic segments, as illustrated in Figure 38.2. Each device in a system receives signals via a fiber-optic receiver with an F-SMA† connector and transmits the signals to the next device via a fiber-optic transmitter, again with a standard F-SMA connector. Both plastic and glass fiber are specified, with the wavelength specified at 650 nm.
*Illustrations and text adapted by permission from the technical paper "Open Drive Interfaces for Advanced Machining Concepts," by Scott Hibbard of Bosch Rexroth Corporation, January 1995.
†F-SMA Connector, Fiber Optic SubMiniature version A Connector.
TABLE 38.1 Performance of SERCOS interface Rings: Generations I and II

Cycle Time | Data Record per Drive (MDT + AT) | Transmission Rate | Number of Drives | Data Rate (Noncyclic Data) | Remaining Cycle Time
2 ms | 32 bytes | 2 Mbit/s | 8 | 8 kbit/s (2 bytes) | 390 µs
1 ms | 32 bytes | 4 Mbit/s | 8 | 16 kbit/s (2 bytes) | 125 µs
1 ms | 36 bytes | 8 Mbit/s | 15 | 32 kbit/s (4 bytes) | 208 µs
0.5 ms | 36 bytes | 16 Mbit/s | 14 | 128 kbit/s (8 bytes) | 113 µs
2 ms | Standard telegram 2, 3, 4 | 16 Mbit/s | 112 | 8 kbit/s (2 bytes) | 330 µs
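As a rough cross-check of how the columns in Table 38.1 trade off against one another, the sketch below estimates whether a given number of drives fits into one communication cycle. The 12 bytes of framing assumed per telegram is a placeholder for illustration only, not a figure taken from IEC 61491.

```c
/* Rough feasibility check for a ring configuration (assumed overheads, illustrative only). */
#include <stdio.h>

int fits_in_cycle(double cycle_s, double bitrate_bps,
                  int n_drives, int bytes_per_drive, int overhead_bytes_per_telegram)
{
    /* One MST + one AT per drive + one MDT per cycle (see Section 38.4.9). */
    double payload_bits  = 8.0 * n_drives * bytes_per_drive;
    double framing_bits  = 8.0 * (n_drives + 2) * overhead_bytes_per_telegram;
    double transfer_time = (payload_bits + framing_bits) / bitrate_bps;
    return transfer_time <= cycle_s;
}

int main(void)
{
    /* Example in the spirit of Table 38.1: 14 drives, 36 bytes each, 16 Mbit/s, 0.5-ms cycle. */
    printf("%s\n", fits_in_cycle(0.0005, 16e6, 14, 36, 12) ? "fits" : "does not fit");
    return 0;
}
```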
Maximum cable lengths are:
• Plastic optical fiber (POF), 1 mm diameter: node to node, 40 m; maximum ring length (254 nodes), 10,000+ m
• Hard Clad Silica (glass) fiber (HCS), 200 µm diameter: node to node, 200 m; maximum ring length (254 nodes), 50,000+ m
The maximum number of drops per fiber-optic ring is 254. However, the number of drives that can be serviced per ring depends on three application requirements:
1. The communication cycle time
2. The volume of operational data
3. The communications speed required
Table 38.1 illustrates performance for generations I and II, showing examples of the number of drives per ring at various transmission rates and cycle times. Performance is improved in SERCOS-III; see Table 38.2.
The number of rings that can be synchronized together is limited only by the controller, which will have a limit to the number of axes it can process. SERCOS interface-compliant controllers can have a much higher axis limit than their predecessors, since the SERCOS interface supports distributed processing, which relieves the controller of many time-intensive tasks that are now handled by the drives.
38.4.2.1 Fiber-Optic Transmitters and Receivers
Fiber-optic transmitters and receivers from Agilent Technologies and Honeywell have been tested and approved by the SERCOS Technical Working Group. The SERCOS N.A. and Interests Group SERCOS interface e.V. (IGS) Web sites (www.sercos.com and www.sercos.de) provide links to downloadable data sheets for these products. Signal levels are defined in the IEC 61491 standard.
38.4.3 SERCOS interface ASICs
The SERCON ASICs are integrated circuits for SERCOS interface (generations I and II) real-time communication systems. A SERCOS interface communication system consists of one master and several slaves, connected by a fiber-optic ring that starts and ends at the master. While their use is not mandatory, the ASICs contain all the hardware-related functions of the SERCOS interface, and considerably reduce the hardware costs for a design.
Unlike other bus interface devices, the SERCON ASIC features the capability to perform all sequencing and synchronization tasks, relieving the host microprocessor of these time-intensive functions. The ASIC is the direct link between the fiber-optic receiver and transmitter and the microprocessor that executes the control algorithms in a SERCOS interface device. The same SERCON ASIC is used both for SERCOS interface masters and slaves.
The current SERCOS interface ASIC is the SERCON816 (see Figure 38.3), manufactured by STMicroelectronics in 0.5 µm/5 V HCMOS technology. It operates at 2, 4, 8, and 16 Mbit/s and is downward compatible with the now obsolete SERCON410B (2 and 4 Mbit/s) ASIC.
FIGURE 38.3 SERCON816 ASIC block diagram.
Features of this device are*:
• Single-chip controller for the SERCOS interface; available in 100-pin flat-pack QFP100 and TQFP100 packages. Physically compatible with the SERCON410B.
• Faster and lower cost than the SERCON410B.
• Interface to the microprocessor with a data bus width of 8 or 16 bits and with control lines per Intel or Motorola standards.
• Serial transmission rate of 2, 4, 8, and 16 Mbit/s with an internal clock. The chosen bit rate can be selected by pins or software. The ASIC can be run in SERCON410B-compatible mode by tying pin SBAUD16 to a logical 1.
• Data communication is available via fiber-optic rings or RS-485 rings and RS-485 buses. RS-485 is seldom used.
• Dual-port RAM with 2048 16-bit words for control and communication data (a sketch of host access to this RAM follows the list).
• Data and clock regeneration; the repeater for ring topologies and the serial transmitter and receiver are integrated. The signals are monitored and test signals are generated.
• Full-duplex operation.
• Modulation of power of the optical transmitter diode.
• Telegram processing for automatic transmission and monitoring of synchronous and data telegrams. The transmission of service channel information over several communication cycles is executed automatically.
• Flexible RAM configuration; communication data stored in RAM (single or double buffer) or transferred via Direct Memory Access (DMA).
• Timing control signals.
• Automatic service channel transmission.
• Watchdog to monitor software and external synchronization signals. The reset value for repeater mode of the serial interface is configurable through input pins.
*Feature list and Figure 38.3 from the SERCON816 SERCOS interface controller manual, courtesy of Interests Group SERCOS interface e.V., Stuttgart, Germany, 2001.
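Host software typically exchanges cyclic data with the ASIC through the dual-port RAM mentioned in the feature list above. The fragment below is only a schematic sketch of that access pattern; the base address and word offsets are hypothetical placeholders that would in practice come from the board design and the SERCON816 documentation.

```c
/* Hypothetical memory-mapped access to the SERCON816 dual-port RAM (2048 x 16 bit).
   Base address and offsets are placeholders for whatever the board/ASIC manual specifies. */
#include <stdint.h>

#define SERCON_DPRAM_BASE   ((volatile uint16_t *)0x40000000u)  /* board-specific, assumed */
#define SERCON_DPRAM_WORDS  2048u

/* Read one 16-bit word of cyclic data that the ASIC has placed in its RAM. */
static inline uint16_t sercon_read_word(unsigned word_offset)
{
    return (word_offset < SERCON_DPRAM_WORDS) ? SERCON_DPRAM_BASE[word_offset] : 0;
}

/* Write one 16-bit word (e.g., part of a command value) for the next MDT. */
static inline void sercon_write_word(unsigned word_offset, uint16_t value)
{
    if (word_offset < SERCON_DPRAM_WORDS)
        SERCON_DPRAM_BASE[word_offset] = value;
}
```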
38.4.4 Timing
Timing is critical in serial networks because motion controls cannot accurately reconstruct the state of the machine unless everything is precisely synchronized: measurements, transmissions, and replies. Many of the messages that controls and drives send each other are for timing and synchronization.
In CNC position loop software, the position error is determined in a software routine that takes time. The system works because the position error updates are designed to occur at predictable points in time. The same logic is applied in the SERCOS interface. Methods are specified to keep the jitter on the serial link down to a low level; then an internal timing sequence is used to ensure that all drives in a loop act upon their command signal at the exact same moment, and all acquire their feedback information at the exact same moment. The result is transparent to the user.
38.4.5 Interface Placement
The SERCOS interface supports position, velocity, and torque mode control, which maintains technology independence between drives and motion controls (Figure 38.4). This independence was one of the SERCOS interface design goals, to ensure that the interface restricted the design of neither controls nor drives, but allowed both to proceed independently.
For this reason, the power stage interface controlling commutation/phase current was ruled out. A power stage interface is highly hardware dependent, requiring the vendor to design motion controls around specific motors, limiting the use of the control to that type of motor. It also involves transmitting high-speed commutation signals over long distances with microsecond update rates. Thus, the demands for speed on the SERCOS interface are less than those for competitive interfaces that offer power stage control.
FIGURE 38.4 Interface placement.
FIGURE 38.5 Standard IDN format: each IDN carries a name (abbreviation), a function/description, its length in bytes, minimum and maximum input values, scaling/resolution, and units.
S-0-0047, Position command value: during the position control drive operation mode, the position command values are transferred from the control unit to the drive according to the time pattern of the control unit cycle; length 4 bytes; minimum ≥ −2^31; maximum ≤ +2^31 − 1; scaling type IDN 00076, scaling factor IDN 00077, scaling exponent IDN 00078, rotational position resolution IDN 00079.
S-0-0051, Position feedback value 1 (motor feedback): the position feedback value 1 is transferred from the drive to the control unit so that the control unit can perform block stepping and display position information if necessary; length 4 bytes; minimum ≥ −2^31; maximum ≤ +2^31 − 1; scaling type IDN 00076, scaling factor IDN 00077, scaling exponent IDN 00078, rotational position resolution IDN 00079.
FIGURE 38.6 IDNs for position command value and motor feedback value.
38.4.6 SERCOS interface IDNs
All motion control and drive communication is performed via a set of SERCOS interface telegrams, which each have an identification, or "ident," number (IDN). All parametric data, such as scaling and loop gains, and real-time loop closure information are set up this way. This allows the SERCOS interface to standardize the most common interface data.
The IEC 61491 interface standard allocates a block of 32,767 IDN numbers for standard commands. Over 500 of these are presently used to define a comprehensive set of motion control and I/O commands. All SERCOS interface products must incorporate a subset of these, but do not necessarily need to include all IDNs. The format used for an IDN in the IEC standard document is shown in Figure 38.5.
38.4.6.1 IDNs for Real-Time Data
The standard defines a number of IDNs for real-time data. For example, in each communication cycle, the master transmits a master data telegram (MDT) that contains a series of IDNs specifying real-time operating data and drive commands for the addressed drive. A slave responds with an amplifier telegram (AT) containing 1 to 16 IDNs reporting data such as speed, torque, and position measurements to the master, effectively closing the control loop. For example, IDN 00047 transfers the specified position command value as a 32-bit value to the drive. IDN 00051 communicates the actual position feedback value 1 (motor feedback) back to the control, as illustrated in Figure 38.6.
38.4.6.2 IDNs for Parameters
Figure 38.7 illustrates a SERCOS interface drive system used to control a machine tool axis. It shows some of the control-specific and machine-specific parameter IDNs that would be used for control of such an axis and lists the function of each IDN.
38.4.6.3 Manufacturer-Defined IDNs
In order to avoid restricting drive and control development, the IEC standard defines 32,767 vendor-defined IDNs that can be used by individual vendors to incorporate special features in their products.
FIGURE 38.7 SERCOS interface drive for machine tool axis with representative IDNs: the CNC is connected to the drive over the fiber-optic ring, and the IDNs listed below parameterize the axis (slide, feedback, gearing, and travel limits). (Illustration from SERCOS interface Technical Short Description, Interests Group SERCOS interface e.V., Stuttgart, Germany, 1998.)

Representative Parameters for Control and Drive Adaptation

Control-specific application parameters:
S-0-0032 Primary mode of operation
S-0-0033 Secondary operation mode 1
S-0-0034 Secondary operation mode 2
S-0-0035 Secondary operation mode 3
S-0-0044 Scaling of velocity data
S-0-0076 Position data scaling type
S-0-0079 Rotational position resolution
S-0-0086 Scaling type for torque force data
S-0-0160 Scaling type for acceleration data

Machine-specific application parameters:
S-0-0041 Homing velocity
S-0-0042 Homing acceleration
S-0-0049 Positive position limit value
S-0-0050 Negative position limit value
S-0-0054 Actual position feedback 2
S-0-0115 Position feedback type
S-0-0116 Resolution of rotational feedback 1
S-0-0118 Resolution of linear feedback
S-0-0121 Load gear input revolutions
S-0-0122 Load gear output revolutions
S-0-0123 Feed constant
S-0-0147 Homing parameter
S-0-0151 Reference offset 2
S-0-0165 Distance coded reference dimension 1
S-0-0166 Distance coded reference dimension 2
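On the drive side, each supported parameter ultimately maps to a record with the attributes of Figure 38.5. The structure below sketches such a table for a few of the IDNs listed above; the field layout, the unit strings, and the limits shown are illustrative assumptions of this sketch, not the element structure or values defined by IEC 61491.

```c
/* Illustrative IDN parameter table (layout, units, and limits are assumptions of
   this sketch, not the attribute encoding defined by IEC 61491). */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t    idn;        /* e.g., 41 for S-0-0041 */
    const char *name;
    const char *unit;       /* placeholder unit strings */
    int32_t     min;
    int32_t     max;
    int32_t     value;      /* operation data */
} idn_record_t;

static idn_record_t idn_table[] = {
    { 41, "Homing velocity",               "rpm", 0,         INT32_MAX, 0 },
    { 47, "Position command value",        "deg", INT32_MIN, INT32_MAX, 0 },
    { 49, "Positive position limit value", "deg", INT32_MIN, INT32_MAX, 0 },
    { 51, "Position feedback value 1",     "deg", INT32_MIN, INT32_MAX, 0 },
};

/* Linear search is enough for a sketch; a real drive indexes the full IDN set. */
static idn_record_t *find_idn(uint16_t idn)
{
    for (size_t i = 0; i < sizeof idn_table / sizeof idn_table[0]; i++)
        if (idn_table[i].idn == idn)
            return &idn_table[i];
    return NULL;
}
```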
The standard provides that these vendor-defined IDNs only enable additional functionality not already handled by standard functions, thus ensuring interoperability.
One of the values of the vendor-defined IDNs has proven to be their use in incorporating new features into the SERCOS interface. A number of vendors have utilized the vendor-defined IDNs to introduce and field-prove some new control and drive features, such as block mode, in which the controller sends the drive a final destination, a velocity, a ramp, and possibly a jerk command, and then the drive operates independently until the destination is reached. Many such vendor-defined features have been brought to the SERCOS Technical Working Group, where they have been incorporated into the standard IDN set for use in a standardized manner by all manufacturers.
FIGURE 38.8 Standard telegram format: an administrative delimiter (01111110), the address field, the command and feedback data field, administrative error checking, and a closing delimiter (01111110).
38.4.6.4 I/O Functions
A set of I/O functions is defined and accepted as a part of the SERCOS interface, but because the SERCOS interface was originally developed as a drive interface, these were not originally included in the formal IEC 61491 standard. The most up-to-date SERCOS specification, including these functions, is available from the SERCOS member organizations.
38.4.7 System of Units and Variable Format
Wherever possible, tables of acceptable units such as revolutions, inches, millimeters, etc., are defined for standardized telegrams. The format of the byte values is also defined. One of the more valuable results of the standards effort was the application of a predictable system of units for up-to-then unstandardized values, such as loop gains, feedback values, etc. This is another step toward predictable operation of any servo system a machine builder may choose.
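The value of such standardized scaling is that a control can convert raw cyclic data to engineering units with one generic routine. The helper below assumes a linear scaling of the form raw × factor × 10^exponent, in the spirit of the scaling IDNs referenced in Figure 38.6 (IDN 00076 through 00079); the exact conversion rules are those defined by the standard, not this sketch.

```c
/* Illustrative scaling helper (assumed linear form raw * factor * 10^exponent;
   the authoritative rules are the scaling IDNs of IEC 61491). */
#include <math.h>
#include <stdint.h>

double to_engineering_units(int32_t raw, uint32_t scaling_factor, int scaling_exponent)
{
    return (double)raw * (double)scaling_factor * pow(10.0, (double)scaling_exponent);
}
```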
38.4.8 SERCOS interface Cycle Times
The SERCOS interface cycle time is specified in a flexible format of 62.5, 125, 250, 500 µs, and then multiples of 1 ms. (SERCOS-III cycle times begin at 31.25 µs.) The amount and type of data contained in a cycle are also variable. This flexibility permits a designer to vary cycle time, content, and number of drives to achieve a particular project's requirements. More data can be sent faster to a smaller number of drives. Slowing the rate down permits a higher density of drives per ring.
For example, with generations I and II, four to eight axes are common on a ring in machine tool applications where intense communication is required to control a high-speed tool path. Up to 40 axes may be controlled on a ring in packaging applications, and up to 100 axes may be found in Web-fed printing applications, where electronic line shafting is utilized to synchronize a number of print cylinders and auxiliary axes. When necessary, multiple rings can be employed in an application.
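A master configuration tool can reject illegal cycle times up front; the small check below encodes exactly the values quoted in this section.

```c
/* Valid SERCOS interface cycle times: 62.5, 125, 250, or 500 microseconds,
   or an integer multiple of 1 ms; SERCOS-III additionally allows 31.25 microseconds. */
#include <stdbool.h>
#include <stdint.h>

bool valid_cycle_time_ns(uint64_t t_ns, bool sercos3)
{
    if (sercos3 && t_ns == 31250)
        return true;
    if (t_ns == 62500 || t_ns == 125000 || t_ns == 250000 || t_ns == 500000)
        return true;
    return t_ns >= 1000000 && t_ns % 1000000 == 0;
}
```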
38.4.9 Cyclic Operation
SERCOS interface devices communicate and stay in sync by sending each other a series of telegrams. A telegram is a rigidly defined bit stream carrying data and timing information. The telegrams are:
• Master synchronization telegram (MST): establishes the timing for the system
• Amplifier (drive) telegram (AT): response by the drives to the MST
• Master data telegram (MDT): provides data records for all drives in the loop
38.4.9.1 Telegram Format
All SERCOS interface telegrams consist of five major fields, as illustrated in Figure 38.8. Telegrams are transmitted in the non-return-to-zero-inverted (NRZI) format. The transmitted signal remains at a 0 or 1 as long as the bit is a 0 or 1. In order to ensure edge synchronization and to prevent the delimiter pattern from reoccurring, the master forces a signal change every six bits by utilizing a bit-stuffing technique of inserting a zero after five consecutive ones.
38.4.9.2 Telegram Sequence
A cycle involves three different types of telegrams, as illustrated in Figure 38.9:
FIGURE 38.9 Timing diagram for cyclic operation: within each cycle time tcyc (set during initialization), the MST synchronization signal is followed by the drive telegrams AT1 through ATn (speed, torque, or position measurements), then the MDT (speed, torque, or position commands), and finally the next MST.
1. It begins with the transmission of a master synchronization telegram (MST) from the master (CNC or motion control). The MST is used as a time mark for all slaves (drives) to determine when to talk on the bus, when to acquire feedback signals, and so forth.
2. At a predetermined time after the end of the MST, the first drive in the system places its data on the bus in an amplifier (drive) telegram (AT). Each drive follows in turn, all synchronized off the MST. During a SERCOS interface initialization phase, the drives are instructed when they should transmit their message with respect to the MST.
3. After the last drive has placed its data on the bus, the master sends out a master data telegram (MDT). The MDT is one long message with space set aside for each drive in the ring. The drives have been previously instructed where their data are located within the MDT. As the MDT is received by a drive, it "fast forwards" to the start location for its information and retrieves the data.
After the MDT is sent, another MST is transmitted by the master control, signaling the beginning of another cycle, as illustrated above.
38.4.9.3 Master Synchronization Telegram
The SERCOS interface master begins a communication cycle by transmitting an MST, as detailed in Figure 38.10. The MST sets the timing and synchronization for the network. Each slave device (drive or I/O) synchronizes its clock from the MST, then uses this timing to calculate when to acquire feedback signals and perform other processing.
The MST data field also contains information. Upon start-up, the SERCOS interface loops go through an extensive initialization procedure, where drives are assigned an operating mode (speed, torque, or position) and the exact configuration of their cyclic data. They are also told to synchronize and phase-lock their internal loop closure frequency, eliminating the risk of harmonics. Cyclic communication does not begin until this initialization is complete. The MST data field indicates whether the loop is in one of the initialization phases or in the communication cycle.
FIGURE 38.10 Master synchronization telegram structure: BOF, beginning of frame (01111110); ADR, target address (11111111 is the broadcast address); INFO, information field (one byte indicating the communication phase); FCS, frame check sequence; EOF, end of frame (01111110). Master synchronization telegrams last only 30 µs, but they are extremely important because they set the timing and synchronization for the entire loop.
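The delimiters in Figures 38.8 and 38.10 are the reserved pattern 01111110, which contains six consecutive ones; the bit-stuffing rule of Section 38.4.9.1 guarantees that this pattern can never occur inside the data. A minimal sketch of that rule, operating on one bit per array element for clarity:

```c
/* Bit stuffing as described in Section 38.4.9.1: after five consecutive ones, a zero
   is inserted so that the 01111110 delimiter can never appear inside the data field. */
#include <stddef.h>
#include <stdint.h>

/* Returns the number of output bits written to 'out' (out must hold at least
   n + n/5 elements to cover the worst case). */
size_t bit_stuff(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t w = 0;
    int ones = 0;
    for (size_t i = 0; i < n; i++) {
        out[w++] = in[i];
        if (in[i] == 1) {
            if (++ones == 5) {      /* five consecutive ones: stuff a zero */
                out[w++] = 0;
                ones = 0;
            }
        } else {
            ones = 0;
        }
    }
    return w;
}
```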
FIGURE 38.11 Amplifier (drive) telegram structure: BOF, ADR, data record, FCS, EOF. The data record consists of a fixed part (status, drive service info) and a configurable part (operation data IDNs).
FIGURE 38.12 Amplifier telegram example: BOF, ADR, data record (status, service info, velocity feedback value (IDN 00040)), FCS, EOF.
38.4.9.4 Amplifier (Drive) Telegram
Drives respond to the master synchronization telegrams via an amplifier telegram sent to the master. At a predetermined time after the MST, the first drive in the system places its data in its predetermined slot. Each drive follows in succession, all synchronized off the MST.
A drive telegram is composed of five main fields, as illustrated in Figure 38.11:
1. Beginning of frame (BOF)
2. Drive address (ADR)
3. Data record
4. Frame check sequence (FCS)
5. End of frame (EOF)
The data record is composed of both fixed and variable data in three fields:
1. Status: The eight-bit status field indicates whether the drive is ready and verifies that it is in the correct operating mode.
2. Drive service info: This two-byte service channel field contains non-time-critical data such as torque limits, travel limits, time constants, and gains.
3. Operation data: This is the most important field, containing from 1 to 16 IDNs reporting data such as speed, torque, and position measurements to the master, effectively closing the control loop.
For example, if the drive is operating in velocity mode, meaning it receives velocity information from the motion control, it may be configured with the current actual velocity in the drive telegram, as illustrated in Figure 38.12. Because all drives take their measurements at the same time, the control can construct an instantaneous snapshot of all axes of motion.
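For discussion purposes, the three fields just listed can be pictured as a simple in-memory record; the struct below is only a sketch, and the actual on-the-wire layout is defined by IEC 61491.

```c
/* Illustrative in-memory view of an amplifier telegram data record, following the
   three fields described above (not the on-the-wire encoding of IEC 61491). */
#include <stdint.h>

#define MAX_CYCLIC_IDNS 16          /* the AT may carry 1 to 16 operation-data IDNs */

typedef struct {
    uint8_t  status;                 /* eight-bit drive status field */
    uint16_t service_info;           /* two-byte service channel piece */
    uint8_t  idn_count;              /* how many operation-data items follow */
    int32_t  operation_data[MAX_CYCLIC_IDNS];  /* e.g., velocity feedback (IDN 00040) */
} at_data_record_t;
```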
FIGURE 38.13 Structure of the master data telegram: BOF, ADR, data records 1 through K, FCS, EOF. Each data record k, addressed to drive xx, consists of a fixed part (control, master service info) and a configurable part (operation data IDNs for that drive).
FIGURE 38.14 Example of a master data telegram: BOF, ADR, data records 1 through K, each carrying control, master service info, and a velocity command value (IDN 00036) for its drive, followed by FCS and EOF.
38.4.9.5 Master Data Telegram
The master data telegram (MDT) is transmitted by the master after the amplifier telegrams. It consists of one long message, with space allocated for each drive on the ring, as shown in Figure 38.13. All drives have previously been instructed where their data are located within the MDT. As the MDT is received by a drive, it "fast forwards" to the starting location for its information and retrieves it.
The master data telegram is structured much the same as the amplifier telegram, except that the data field includes a record for each drive on the ring (Figure 38.14). Like the AT, each data record in the MDT consists of a fixed and variable data portion in three fields:
1. Control field: This eight-bit field enables or disables the drive and configures the service channel.
2. Master service info: Implements setup parameters and special functions, such as homing, probing, and coordinate offsets.
FIGURE 38.15 Service channel operation: on the master side, service channel management and timing control place master service info into the MDT transmission and extract drive service status info from the incoming AT; on the drive side, the corresponding functions extract the master service info from the incoming MDT and place drive service status info into the AT transmission. These data transmissions occur every cycle.
3. Operation data for drive xx: This field contains a series of IDNs specifying real-time operating data and drive commands for the addressed drive. The chosen idents depend on the application. Using the previous example, in velocity mode this data record would contain the velocity commands for the drives.
38.4.9.6 Service Channel
The service channel allows high-level, non-time-critical data transfers without disturbing the synchronous transfer of data. For example, items such as diagnostics and loop gains do not need to be transmitted repeatedly, so it is not necessary to impact loop performance by setting aside bandwidth for them. Long messages are broken into two-byte chunks, sent over the service channel in the amplifier telegram, and reassembled on the other end.
Via the service channel, operators can retrieve diagnostic and status information, reprogram or display setup parameters, and change operating data on the fly (Figure 38.15). Some manufacturers' drives include a software oscilloscope function that captures data on drive performance, sends the data over the service channel two bytes at a time, and formats the information into an oscilloscope-type display on the operating terminal.
38.4.9.7 Example of Cyclic Operation: SERCOS interface Generations I and II
The best way to imagine how a system works is to take a real-world example such as a system with 14 drives and a 500-µs cycle rate using generation I or II. Under these conditions, the following information can be transferred every cycle:
From control to each drive:
• 32-bit command value (e.g., velocity or position)
• 16-bit limit value (e.g., torque)
From each drive to control:
• 32-bit feedback value (e.g., velocity or position)
• 16-bit feedback value (e.g., torque)
This means that every 500 µs one SERCOS interface ring can transport a very high resolution velocity command signal to 14 drives, plus a torque limit command to each of those drives that can be enabled at will. In addition, each cycle returns two feedback values from each drive. This reduces the customary feedback wiring to the control to nothing.
In addition to all this real-time data being exchanged, the service channel provides the equivalent of a 128-kbaud serial port to each and every drive on the ring. This can be used to set and archive performance variables, to read diagnostics, etc. Note that this example does not state that the SERCOS interface performance is limited to 500 µs or 14 drives. Neither is the case. Flexibility exists in the update rate (down to 62.5 microseconds, 31.25 for SERCOS-III), the number of drives per ring (up to 254), the amount of data exchanged, and even the number of rings. Performance for SERCOS-III is even better, as shown in Table 38.2.
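The 128-kbaud figure follows directly from the Table 38.1 allocation of 8 bytes of service-channel data per drive in a 0.5-ms cycle at 16 Mbit/s; the snippet below just spells out that arithmetic.

```c
/* 8 service-channel bytes per drive per 0.5-ms cycle = 128 kbit/s per drive. */
#include <stdio.h>

int main(void)
{
    double bytes_per_cycle = 8.0;
    double cycle_s = 0.0005;                                 /* 500 microseconds */
    double kbit_per_s = bytes_per_cycle * 8.0 / cycle_s / 1000.0;
    printf("service channel throughput: %.0f kbit/s\n", kbit_per_s);   /* prints 128 */
    return 0;
}
```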
38.4.10 Error Correction/Diagnostics
If a SERCOS interface loop detects an error, it continues to operate for one cycle using previously valid data. However, if a second consecutive error is detected, the drives are shut down in a predefined, orderly fashion and a diagnostic message is issued.
The SERCOS interface offers extensive diagnostic reporting and produces detailed reports on fault conditions. For example, IDNs 00200 through 00205 report drive and motor temperature warning and shutdown conditions, IDN 00309 indicates spindle synchronization error, IDN 00323 specifies target position outside travel range, and there are many more. This allows vendors to incorporate detailed context-sensitive troubleshooting procedures in their motion controls. An operator can go to his control panel and acquire instant diagnosis of every axis motor and drive at every section of the machine. Failures can be immediately pinpointed by the control system and quickly and easily be repaired, as opposed to hit-and-miss troubleshooting.
A number of SERCOS interface vendors include built-in soft oscilloscope functions in their products to simplify troubleshooting. Data can be uploaded from the drives using the service channel, allowing operators to display and analyze torque, velocity, and position. For predictive maintenance purposes, a snapshot of operating status can be saved, then compared to another snapshot taken at a later time to determine operation/failure trends.
38.4.11 System Safety

Digital intelligent drives with the SERCOS interface offer outstanding protection against uncontrolled drive movements and excessive velocities. Using position values, command values, actual values, and drive parameters, a SERCOS interface drive monitors itself and provides error warnings or error shutdowns, including forced orderly shutdown in the event of a malfunction or failure of the drive processor. Excessive velocities, overtravel, or axis runaway due to faulty or incorrectly transmitted position commands can be eliminated by logical monitoring by the drive of the values it receives from the control. Safety redundancy can be achieved by the control as it monitors the actual value data fed back from the drive via the SERCOS interface. This internal monitoring can ensure safe shutdown in the event of a communications failure.
38.5 Features and Operation of SERCOS-III*

38.5.1 Introduction

Industrial Ethernet has become the de facto standard for manufacturing information networking, and the market is requesting Ethernet connectivity for servo drives. Ethernet is characterized by high bandwidth and low hardware costs, but it is not deterministic. The SERCOS interface, on the other hand, is optimized for high-speed deterministic motion control, where the exact synchronization of multiple drives is required. The SERCOS interface operates at up to 16 Mbit/s, which is more than adequate for most applications.

*Information adapted by permission from "SERCOS-III: Innovation by Combining SERCOS interface™ and Ethernet," published on the SERCOS N.A. Web site.
But there is a need, sometimes more perceived than real, for higher speeds. The SERCOS interface has been most effective in drives of 1 kW and above, because the cost of its optical components has limited its adoption in lower-power drives, a large segment of the market. The challenge in creating the third generation of the SERCOS interface was to combine the best of both worlds: the deterministic performance of SERCOS with the low cost and high bandwidth of nondeterministic industrial Ethernet. Many technologies have been added to the Ethernet stack simply by adding a new protocol. Rather than put motion under TCP/IP (Transmission Control Protocol/Internet Protocol), which would result in less determinism and the added expense of switching devices, the SERCOS solution was to put standard Ethernet TCP/IP under control of the motion bus and to use Ethernet hardware with its lower-cost twisted-pair copper cable. (Ethernet fiber optics will also be supported by SERCOS-III.) This maintains the deterministic motion control of SERCOS, allows links to the existing manufacturing communications infrastructure, provides for the possibility of new features, and lowers hardware costs. Noise immunity should not be a major problem, as Ethernet's twisted-pair cable is generally extremely noise immune because it uses a differential driver and receiver.
38.5.2 Distinct Features of SERCOS-III

SERCOS-III offers the following advantages:
• Protection of investment due to high compatibility with the previous SERCOS interface (topology, profiles, telegram structures, synchronization)
• Reduction of hardware costs for a SERCOS-III interface connection down to the level of an analog interface
• Integration of Internet protocols
• Cross communication between slaves
• Synchronization of several motion controls
• Fault tolerance in case of a break in the ring
38.5.3 Topology

SERCOS-III has a ring structure like the current-generation single-ring SERCOS interface. Owing to the Ethernet physical layer, however, it is a double-ring rather than a single-ring structure. In addition to the ring structure, a linear structure is also possible (Figure 38.16). The double-ring structure offers the possibility of redundant data transfer.
FIGURE 38.16 SERCOS-III topologies: ring structure (double ring, with hardware redundancy) and line structure (single ring, without hardware redundancy).
FIGURE 38.17 Cyclic and noncyclic communication. MDT, AT, and IP channel telegrams are embedded in standard Ethernet frames.
With SERCOS-III, in case of a break at any point in the ring, the protocol automatically switches over to the line structure, so that communication and manufacturing continue while the integrated diagnostics tool signals the break, which can then be repaired without interfering with the plant's performance. The line structure does not offer the redundancy advantage, but it does save a wire connection, which can save money when the SERCOS interface bus runs the length of a long machine or system. SERCOS-III does not use the star topology of standard Ethernet; no hubs or switches are needed.
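The automatic ring-to-line switchover can be sketched in a few lines of C. The port-status fields and the detection logic below are illustrative assumptions; the actual redundancy handling is defined by the SERCOS-III specification.

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { TOPOLOGY_RING, TOPOLOGY_LINE } topology_t;

typedef struct {
    bool       port1_link_up;   /* link status of the two Ethernet ports (assumed fields) */
    bool       port2_link_up;
    topology_t topology;
} sercos3_node_t;

/* Evaluate the link states once per cycle: on a break anywhere in the ring the
   nodes keep communicating over the remaining line and raise a diagnostic.     */
static void evaluate_topology(sercos3_node_t *n)
{
    bool ring_closed = n->port1_link_up && n->port2_link_up;

    if (!ring_closed && n->topology == TOPOLOGY_RING) {
        n->topology = TOPOLOGY_LINE;
        printf("diagnostic: ring break detected, operating as line\n");
    } else if (ring_closed && n->topology == TOPOLOGY_LINE) {
        n->topology = TOPOLOGY_RING;
        printf("diagnostic: ring restored, redundancy available again\n");
    }
}

int main(void)
{
    sercos3_node_t node = { true, true, TOPOLOGY_RING };
    evaluate_topology(&node);            /* intact ring: nothing to report  */
    node.port2_link_up = false;          /* cable break on one side         */
    evaluate_topology(&node);            /* switch to line operation        */
    node.port2_link_up = true;           /* cable repaired                  */
    evaluate_topology(&node);            /* redundancy restored             */
    return 0;
}
```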
38.5.4 Direct Communication between Slaves

Direct communication between slaves is not possible with the generation I and II SERCOS interfaces. However, this feature is advantageous in some motion control applications, and the Ethernet physical layer enables such slave-to-slave data transfer, so SERCOS-III supports this feature.
38.5.5 IP Channel

Generations I and II of the SERCOS interface include a service channel, which can be used for the transfer of communication data as well as parameter or diagnostic data. To maintain downward compatibility, SERCOS-III also has the service channel. An additional optional IP channel can be added for the transfer of real-time and non-real-time data via standard Ethernet frames, as illustrated in Figure 38.17. The cyclic channel and the IP channel are configurable.
38.5.6 Motion Control Communication for Centralized and Decentralized Drive Concepts

SERCOS-III halves the minimum cycle time of the current SERCOS interface from 62.5 µs down to 31.25 µs. Because of the greater bandwidth of the Ethernet physical layer, it is still possible to connect an adequate number of slaves despite the short cycle time. Thus, it is possible to implement both decentralized drive concepts, where all control loops are closed in the drive, and centralized signal processing concepts, where only the current loop is closed in the drive and all other loops are implemented in the central control electronics. SERCOS-III is the only open motion control system that supports both structures.
38.5.7 Synchronization of Motion Controls

The trend toward modularization requires motion control concepts that offer the possibility of synchronizing several machine modules (Figure 38.18). Some proprietary solutions, based on SERCOS interface physics, already exist. The aim of SERCOS-III is to standardize this type of communication by defining a profile for the synchronization and communication between motion controls.
38.5.8 SERCOS-III Performance

SERCOS-III has a substantial increase in transmission speed. Some typical values are listed in Table 38.2.
FIGURE 38.18 Synchronization of machine modules with SERCOS-III.

TABLE 38.2 SERCOS-III Performance

Cycle Time | Number of Drives | Size of Cyclic Data | Type of Cyclic Data
31.25 µs | 8 | 8 bytes | Torque command, actual position
250 µs | 70 | 12 bytes | Speed command and actual value, position command and actual value
1 ms | 150 | 32 bytes | Numerous command and actual values
1 ms | 254 (maximum number of drives) | 16 bytes | Numerous command and actual values
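A rough plausibility check of the entries in Table 38.2 can be done with a few lines of C. Because the cyclic data of all drives share the MDT and AT telegrams embedded in standard Ethernet frames, the framing overhead below is charged once per cycle; its value is an assumption for illustration, not a figure from the specification.

```c
#include <stdio.h>

/* Does n drives x payload plus an assumed telegram overhead fit into one
   SERCOS-III communication cycle at 100 Mbit/s?                            */
static double cycle_load(int drives, int payload_bytes_per_drive, double cycle_s)
{
    const double bit_rate            = 100e6;  /* SERCOS-III speed, bit/s    */
    const int    telegram_overhead_b = 1600;   /* assumed total framing bits */

    double bits_needed    = 8.0 * drives * payload_bytes_per_drive + telegram_overhead_b;
    double bits_per_cycle = bit_rate * cycle_s;
    return bits_needed / bits_per_cycle;       /* < 1.0: the cycle has room  */
}

int main(void)
{
    printf("  8 drives,  8 bytes, 31.25 us: load %.2f\n", cycle_load(8,    8, 31.25e-6));
    printf(" 70 drives, 12 bytes, 250 us  : load %.2f\n", cycle_load(70,  12, 250e-6));
    printf("150 drives, 32 bytes, 1 ms    : load %.2f\n", cycle_load(150, 32, 1e-3));
    printf("254 drives, 16 bytes, 1 ms    : load %.2f\n", cycle_load(254, 16, 1e-3));
    return 0;
}
```

All four rows come out well below full utilization under this assumption, which is consistent with bandwidth remaining for the IP channel.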
38.5.9 SERCOS-III Hardware

An important goal of SERCOS-III was the reduction of cost per node. No communications ASIC is required, as in generations I and II. Instead, lower-cost standard modules such as a field-programmable gate array (FPGA) or communication controllers are used. Ethernet couplings are used instead of the current fiber-optic coupling, which means lower costs for connectors. This should allow the SERCOS interface to be implemented in very cost-sensitive products, such as sub-1-kW systems. Because a more flexible hardware solution is implemented in SERCOS-III, the development of a SERCOS core (SERCOS-III IP) is necessary. With this, manufacturers of components and systems can combine the SERCOS-III hardware and their own logic components in one common FPGA. In addition, there are plans to integrate the SERCOS core into a general-purpose communication controller (GPCC). Such controllers are able to support various Industrial Ethernet protocols. It is thus possible to implement control and drive devices that can easily be adjusted to the respective Ethernet protocol simply by using the appropriate driver software. This is a huge advantage, and not just for the manufacturers of components: the machine builder needs to handle only one type of wiring, and the end user profits from not having to use different hardware configurations if he is using several different Industrial Ethernet protocols in his plant.
38.5.10 SERCOS-III Compared to Other Ethernet-Based Real-Time Communication Systems

Figure 38.19 illustrates the types of communication needed by the automation industry, classified into four real-time types. The figure shows the coverage provided by the ODVA (DeviceNet), PNO (Profibus), and IGS (SERCOS) solutions. The SERCOS-III standard covers all relevant real-time areas.
FIGURE 38.19 Coverage of real-time requirements: cycle-time classes ranging from noncyclic standard Ethernet communication, through positioning drives and controller-to-controller synchronization, down to coordinated drives (250 µs to 4 ms) and multiaxis drive concepts with centralized signal processing (31.25 to 125 µs).
Standard Ethernet communication is also included if the IP channel is used without the cyclic real-time channel.
38.5.11 SERCOS-III Availability

First prototypes of the SERCOS-III hardware became available in 2004, with first products envisioned for 2005.
38.5.12 Functionality and Features of the Three Generations of SERCOS interface

Table 38.3 compares the features and functions of the three generations of the SERCOS interface. New features in SERCOS-III are highlighted in boldface.
38.6 SERCANS
The SERCOS interface has extensive mechanisms that ensure the functional capabilities of systems incorporating various manufacturers' products. A SERCOS interface master must request various data from the ring subscribers at initialization and then conduct a time-slot calculation. This complex process led to the development of the SERCANS active SERCOS interface master card, which hides the complex initialization process from the user. The installed software initializes and manages the SERCOS interface ring by itself, streamlining the interface between the ring and the motion control to a dual-port RAM (DPR) interface. This in no way restricts the operating system of the control. The SERCOS interface can be used as long as the time grid, which is defined by the cyclic interrupt, is complied with and the DPR addresses and their definitions are known. The SERCANS module has been successfully used in more than 70 different controls globally since 1996.
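The DPR coupling can be pictured with a small sketch. The memory layout, field names, and command law below are purely hypothetical; they illustrate the idea of exchanging cyclic data through shared memory under a cyclic interrupt and do not reproduce the actual SERCANS register map.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical dual-port RAM layout: the SERCANS card keeps this region up to
   date on the SERCOS side, while the motion control reads and writes it from
   the host side inside the cyclic interrupt.                                  */
typedef struct {
    volatile int32_t  velocity_command;   /* written by the control            */
    volatile int32_t  position_feedback;  /* written by the interface card     */
    volatile uint16_t status_word;
    volatile uint16_t control_word;
} dpr_axis_t;

#define AXES 4
static dpr_axis_t dpr[AXES];   /* in a real system this would be a mapped
                                  address window on the master card            */

/* Called from the cyclic interrupt defined by the SERCOS time grid.           */
static void cyclic_interrupt_handler(void)
{
    for (int axis = 0; axis < AXES; axis++) {
        int32_t actual = dpr[axis].position_feedback;  /* read feedback        */
        /* ... run interpolator / position loop here ...                       */
        dpr[axis].velocity_command = actual / 8;       /* toy command law      */
    }
}

int main(void)
{
    dpr[0].position_feedback = 8000;    /* simulate one feedback value         */
    cyclic_interrupt_handler();
    printf("axis 0 command: %d\n", (int)dpr[0].velocity_command);
    return 0;
}
```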
TABLE 38.3 Three Generations of SERCOS interface

Functionality/Feature | SERCOS-I | SERCOS-II | SERCOS-III
Date implemented | 1987 | 1999 | Projected for 2005
Over 500 standardized parameters, defining single and multiaxis motion control | Yes | Yes | Yes
Communication type | Serial | Serial | Serial
Physical media | Fiber optics | Fiber optics | Ethernet twisted-pair copper or fiber optics
Network topology | Ring | Ring | Line or ring
Transmission speed | 2 and 4 Mbit/s | 2, 4, 8, and 16 Mbit/s | 100 Mbit/s
Operating modes | Velocity, current, and position loops closed in the drive | Velocity, current, and position loops closed in the drive | Velocity, current, and position loops closed in the drive, or position and velocity loops closed in the control with only the current loop closed in the drive
Cycle times | Selectable from 62.5 µs and up | Selectable from 62.5 µs and up | Selectable from 31.25 µs and up
Synchronization | Same | Same | Same
Message structures | Same | Same | Same
Slave-to-slave communication | No | No | Yes
Service channel | Yes | Yes | Yes
Optional IP channel | No | No | Yes
Jitter | <1 µs | <1 µs | <1 µs
Determinism | Yes | Yes | Yes
Number of masters | One per ring | One per ring | One per ring/line
Maximum number of nodes per network | 254 per ring, multiple rings allowed | 254 per ring, multiple rings allowed | 254 per ring, multiple rings/lines allowed
Physical and control standard | IEC 61491/EN 61491 | IEC 61491/EN 61491 | IEC 61491/EN 61491
Support supervisory controls and additional I/O | Yes | Yes | Yes
Hot pluggable | No | No | Yes
Maximum cable length | Plastic: 40 m node to node, 10,000-m maximum ring length; glass: 800 m node to node, 200,000+ m maximum ring length | Plastic: 40 m node to node, 10,000-m maximum ring length; glass: 800 m node to node, 200,000+ m maximum ring length | Copper: 100 m; fiber optic: 2000 m
Communication modes | Cyclic: deterministic time slice for managing real-time communications; speed depends on number of nodes on the ring; cyclic data can be transmitted at rates as low as 62.5 µs. Noncyclic: used for non-real-time data transfers such as status and diagnostic messages | Cyclic: deterministic time slice for managing real-time communications; speed depends on number of nodes on the ring; cyclic data can be transmitted at rates as low as 62.5 µs. Noncyclic: used for non-real-time data transfers such as status and diagnostic messages | Cyclic: deterministic time slice for managing real-time communications; speed depends on number of nodes on the ring; cyclic data can be transmitted at rates as low as 31.25 µs. Noncyclic: used for non-real-time data transfers such as status and diagnostic messages

Note: New features in SERCOS-III are highlighted in boldface.
38.7 SoftSERCANS*

SoftSERCANS is the software equivalent of the SERCANS hardware, eliminating the need for an active SERCOS interface master card and replacing it with a passive card, which has a simpler structure and a lower cost.
38.7.1 What SoftSERCANS Is

SoftSERCANS is a software interface master function for SERCOS interface-based drives that runs on VenturCom's RTX Real-Time Windows® software. SoftSERCANS is a SERCOS interface driver for PC-based soft motion control, much as a printer driver is for a PC. With SoftSERCANS, the SERCOS interface accomplishes the complete unbundling of software and hardware. (Note that SoftSERCANS is intended to control servo drives from any manufacturer, provided those drives have passed the SERCOS interface conformance test.) SoftSERCANS initializes the SERCOS interface loop and manages the data communication on the loop, handling all services such as real-time data communication, the service channel, and diagnostics. A real-time software (e.g., SoftCNC) based on VenturCom's RTX establishes the communication with SoftSERCANS via a Dynamic Link Library (DLL). Features of the SERCOS interface allow SoftSERCANS to make Real-Time Windows a hard deterministic motion control platform. SoftSERCANS enables Real-Time Windows to perform read and write functions within a relatively large time frame, defined by the time-slot configuration to compensate for jitter. The SERCOS interface internal clock and the synchronization mechanism establish a rigid submicrosecond time lattice, giving the data communication within the SERCOS interface loop the determinism necessary to satisfy the highest requirements of today's servo drive technology. SoftSERCANS allows the internal mechanisms of the SERCOS interface to remain a black box to motion control development engineers. It can be downloaded at no charge from www.sercos.de. The only hardware requirement is a passive SERCOS interface card (available from several vendors for approximately $250).
38.7.2 What SoftSERCANS Is Not

SoftSERCANS is an important software component of SoftCNC or SoftMotion Control, but it is not a complete control package that can be used as a stand-alone product. The SoftCNC/SoftMotion developer must still create his human machine interface (HMI) and application packages, using SoftSERCANS to provide the black box interface to SERCOS interface drives and SERCOS interface I/O devices. SoftSERCANS is written in C++. The SoftCNC/SoftMotion developer should be familiar with C++ and Windows programming, have experience with real-time applications, and have a basic familiarity with the SERCOS interface.
38.7.3 Performance

SoftSERCANS was designed to support a maximum of 40 axes and a minimum SERCOS interface cycle time of 500 µs. A lab test shows that SoftSERCANS can support eight axes with a typical telegram configuration for a machine tool at a cycle time of 1 ms, when a 200-MHz Pentium and a 4 Mbit/s data transfer rate are used, while consuming less than 10% of the CPU's resources. Because SoftSERCANS is software only, its performance improves as hardware performance improves. It checks and dynamically adapts to the performance of the processor during installation. Utilizing the 16 Mbit/s SERCON816 ASIC of generation II or the FPGA of SERCOS-III and increasing the speed of the processor will further expand performance and the number of axes supported for a given cycle time.

*Adapted from Thriving in the Age of IT, Dr. Mahito Ando, Motion Control, April/May 2000, and descriptions from www.sercos.com.
Determinism of the system is ensured by the mechanism of the SERCOS interface, which is independent of the data transfer speed and provides submicrosecond synchronization accuracy for all axes in the system.
38.7.4 Resources Required

SoftSERCANS requires:
• Microsoft Windows NT, version 4.0 (or higher)
• Minimum of 32 MB of RAM (64 MB recommended)
• VenturCom RTX real-time operating system, version 4.2 or higher
• Passive SERCOS interface card; cards are available from Automata GmbH and Hilscher GmbH (check the links on www.sercos.com)
• SoftSERCANS Test Program, version 1.0, an aid to check functions of SoftSERCANS without real-time access
• Recommended: Microsoft Visual C++, version 5.0 or higher (for testing)
• Recommended: Numega SoftICE Debugger, version 3.2 or higher
38.8 Conformance Testing

Conformance testing of products carrying the SERCOS interface logo is required. Any manufacturer can purchase the IEC 61491 standard and design controls and drives according to the standard. However, the SERCOS interface logo is the property of Interests Group SERCOS interface e.V. and may only be used on SERCOS interface products that have passed a conformance test. This is important to assure customers of the compatibility and interoperability of products from various manufacturers. Conformance testing of SERCOS interface products is done at the Institute for Control Engineering of Machine Tools and Manufacturing Units (ISW) at the University of Stuttgart. The intent of the conformance test is to ensure the compatibility and interoperability of devices from different vendors in multiple-vendor environments. Tests to various SERCOS interface conformance class levels are available. Manufacturers seeking conformance certification for their products submit their SERCOS interface products to ISW for testing. Once a product has passed the conformance test, the manufacturer can apply to the IGS for a conformance certificate and can apply the SERCOS interface logo to that product.
38.8.1 SERCOS interface Conformance Classes

Appendix D of the SERCOS interface International Standard (IEC 61491)* defines three compliance classes (or levels) and the operational functions that a product must exhibit to achieve compliance in each class:

Class A: Defines communication only. A product that conforms to Class A can "close a ring."

Class B: Products at this level meet all the requirements for Class A, plus they permit operation of a drive in one or more of the three accepted modes of operation defined in the SERCOS interface specification: torque control, velocity control, and position control.

Class C: Products at this level meet all the requirements for Classes A and B, plus they meet the requirements for real-time bit support, data scaling parameters, secondary switchable operation modes, and external feedback support (at the drive).

Additional functions are defined in the SERCOS interface specification that, if adhered to, can be called out individually. Examples are slave feedback, parameter set switching, probe cycle, and a host of spindle functions.
*Electrical Equipment of Industrial Machines: Serial Data Link for Real-Time Communication between Controls and Drives, IEC 61491, International Electrotechnical Commission, revised 2002.
Note that the SERCOS community has defined a standard subset of the SERCOS interface IDNs for packaging applications (see Section 38.9) and new conformance classes (Communication Classes A to C and Application Classes). These may become part of a future edition of IEC/EN 61491.
38.8.2 Conformance Test Descriptions

A link for downloading conformance Class A test descriptions for both SERCOS interface master and slave devices in PDF format is available at www.sercos.de.
38.8.3 Conformance Test Environment

The conformance test environment for SERCOS interface devices consists of Conformizer test software running on a standard PC with an enhanced SERCOS interface card, an optical analyzer, and a digital oscilloscope. There are three facets to the testing:
• Logical test: per the script files in the SERCOS interface Conformizer package
• Physical test: measures optical signals
• Time measurement: handled by special test hardware
The Conformizer is an easy-to-use package that includes a high-level scripting language with control flow statements and over 80 functions, variables, and input/output programming features. It includes a command shell, parameter browser, script editor, function browser, script browser, protocol window, and error code browser. The test results can be directly processed in Rich Text Format (RTF) or Hypertext Markup Language (HTML) format. The Conformizer software runs under Windows NT, 2000, and XP. The Conformizer test software is available for purchase by any SERCOS interface vendor who wishes to pretest his products in his own facilities prior to submitting them for the conformance test. This provides 90% assurance of passing the formal test. The Conformizer is sold as a complete test and development environment for SERCOS interface master and slave devices and includes a passive PCI-SERCOS interface card. It operates under the VenturCom RTX system, running Windows NT, 2000, or XP. A runtime license for VenturCom RTX 5.5 is included with the package. The user's guide for the SERCOS interface slave Conformizer is available for download in PDF format from the ISW Web site.
38.9 The SERCOS interface in Packaging Applications

Packaging applications have benefited greatly from the adoption of the SERCOS interface in controls and drives. Many packaging machinery manufacturers are shipping machines today that were designed from the ground up using servo technology, in which hundreds of mechanical components have been eliminated, costs have been reduced, and flexibility has been maximized. The Packaging Machinery Working Group of the North American OMAC (Open Modular Architecture Controls) Users Group published a recommendation in November 2001 to use the SERCOS interface (IEC 61491) as an open drive interface for packaging machines. In this context, the group challenged the SERCOS organization to further improve the multivendor interoperability of servo controls and drives on the basis of a packaging profile. The SERCOS interface already specifies more than 500 standardized parameters that define the interaction between controls and drives in a vendor-independent manner. The SERCOS specification supports the interoperability of controls, drives, and I/O devices from different manufacturers on the basis of an open real-time communication system. However, in practice, issues arise that make the idea of plug and play more difficult to implement. Only a subset of all existing parameters is needed for most applications, as some parameters have been designed for very specific use only (e.g., spindle positioning in machine tools).
FIGURE 38.20 Example of mapping SERCOS interface parameters in different profile types.

FIGURE 38.21 Structure of basic pack profiles.
Another issue is that motion control functions can be integrated centrally in the control as well as decentrally in the drives. This depends on the functionality of the servo drives, which may vary from manufacturer to manufacturer. Machine builders have different philosophies and preferences with regard to control architecture. Thus, depending on the drive functions that are used by a control, a different set of SERCOS parameters is required. To keep this complexity manageable and to ensure the highest interoperability of controls and drives, it was decided to implement packaging application-specific profiles as subsets to the SERCOS specification, as illustrated in Figure 38.20. For simple servo drives and frequency converters, a basic profile was defined (see Figure 38.21). This profile consists of Communication Class A covering the following functions:
• Ring configuration for phase run-up and timing
• Cyclic communication with standard telegrams
• Diagnostic information
• Status and control signals
The Basic Pack Profile A defines the parameters that are required to send cyclical position command values to a simple servo drive (telegram type 4). The data scaling of the position command values is rotational, measured in degrees, and is processed as either an absolute or a modulo value. The position loop is closed in the drive; the position feedback value is transmitted back to the control. The profile allows limit values to be defined for torque/force, velocity, and position. The scaling of velocity data is in revolutions per minute. Torque and force data are scaled in percentage.

The Basic Pack Profile B defines the parameters that are required to send cyclical velocity command values to a frequency converter (telegram type 6). The profile allows limit values to be defined for torque/force and velocity. The scaling of velocity data is in revolutions per minute. Torque and force data are scaled in percentage.

An extended profile was defined for intelligent servo drives, as shown in Figure 38.22. This is similar to the basic profile for simple servo drives and frequency converters, as described above. It also defines additional functions that are realized in the drive and can be used by higher-level controls. Drives that support this extended profile can be commanded with position, velocity, or torque values. For acceleration and torque data, a preferred scaling is supported, whereas the scaling of position and velocity data can be configured very flexibly. Additional parts of the profile are drive- and NC-controlled homing, as well as a powerful function for high-speed probing. The extended pack profile requires SERCOS Communication Class B, which contains all functions of Class A, plus the following functions:
• Configurable telegram (telegram type 7)
• Extended diagnostic information
• Configurable real-time control and status bits

In addition to the mandatory parameters defined by the profiles, optional parameters and additional functions (e.g., automatic baud rate recognition, detection of the physical order, and firmware up- or download) can be supported per the SERCOS specification. The profiles were published as version 1.1 in December 2003 and are currently implemented within the controls and drives of several suppliers. In parallel, the SERCOS Conformizer has been extended so that the conformance of control and drive devices to the packaging profile can easily be tested and verified.
FIGURE 38.22 Structure of the extended pack profile.
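The notion of a profile as a mandatory subset of IDNs can be illustrated in C. The IDN selection below loosely follows the examples in Figure 38.20 and is meant only as a sketch of a conformance-style check, not as the normative parameter list of the packaging profiles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* A profile is treated here simply as a set of mandatory IDNs.             */
typedef struct {
    const char *name;
    const int  *mandatory_idns;
    size_t      count;
} profile_t;

/* Illustrative selection only: IDN 32 (primary operation mode), 47 (position
   command value), 51 (position feedback value 1), as pictured in Figure 38.20. */
static const int basic_pack_a[] = { 32, 47, 51 };

static bool supports_idn(const int *supported, size_t n, int idn)
{
    for (size_t i = 0; i < n; i++)
        if (supported[i] == idn) return true;
    return false;
}

static bool conforms(const profile_t *p, const int *supported, size_t n)
{
    for (size_t i = 0; i < p->count; i++) {
        if (!supports_idn(supported, n, p->mandatory_idns[i])) {
            printf("missing mandatory IDN %05d\n", p->mandatory_idns[i]);
            return false;
        }
    }
    return true;
}

int main(void)
{
    const profile_t profile = { "Basic Pack Profile A (sketch)", basic_pack_a, 3 };
    const int drive_idns[] = { 32, 36, 40, 47, 51, 53 };   /* hypothetical drive */

    printf("%s: %s\n", profile.name,
           conforms(&profile, drive_idns, 6) ? "pass" : "fail");
    return 0;
}
```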
FIGURE 38.23 Parallel kinematics machine driven by Andronic 2060 control.
38.10 The SERCOS interface in Machine Tool Applications

The SERCOS interface was originally developed to meet the needs of the machine tool industry; thus, it incorporates a rich set of parameters for machine tool functions, such as probing, backlash compensation, feed forward, lead screw error compensation, adaptive positioning, move to positive stop, and torque monitoring. The interface provides all parameters necessary for control of machine tool axes, including spindle axis control.
38.10.1 Application Note: Contouring Control System*

andron GmbH of Wasserburg, Germany, has been using the SERCOS interface as a digital drive interface on its control units since 1992. The andronic 2060 is currently regarded as one of the fastest contouring control systems for use in complex multiaxis contouring interpolation applications. In contrast to the general rule that speed decreases as the number of axes increases, block change times on this controller remain below 250 µs even with 16 interpolating axes. At a transmission speed of 16 Mbit/s, the system delivers a guaranteed cycle time of 250 µs regardless of the number of axes present in the ring. This means that all axes that are simultaneously involved in an interpolation receive new position data every 250 µs. In combination with high-performance position controllers in the servo amplifier (position control cycle ≤250 µs), the system can deliver high path velocities (even at a high resolution of 0.1 µm, for example) as well as high precision. The stability and predictability of the SERCOS interface are indispensable, as any timing error or even a transmission repetition can have fatal consequences at the workpiece. Tool grinding machines (for complete machining of drills and cutters) can have up to 11 axes. In practice, this normally means simultaneous interpolation of five axes. With two SERCOS interface rings, the cycle time is 500 or 250 µs at machining speeds of >1 m/min, a resolution of better than 0.1 µm, and a precision of 1 µm. Similar performance can be achieved during camshaft grinding and grinding of knives and scissors.
*Application note and photograph (Figure 38.23) reprinted courtesy of andron GmbH, Wasserburg, Germany and Interests Group SERCOS interface e.V., Stuttgart, Germany.
For high-speed cutting applications using Cartesian machines and parallel kinematics machines (Figure 38.23), high path speeds in combination with high contour accuracy and finish quality are required. Processing speeds of 30 m/min at 1/10-µm resolutions and accelerations of up to 8 g are common. During gearwheel honing operations, the machines must meet stringent requirements for synchronization of several drives in the start phase and during workpiece machining. This means axis synchronization in the position controller with a cycle time of 500 µs and a position control cycle rate of 250 µs, using eight machining axes, five of which are interpolating.
38.11 Future Technical Advancements

Updates to the SERCOS interface specification are defined by multicompany technical working groups (TWGs) in both North America and Europe, which ensure that the latest enhancements proposed by member companies are incorporated into the specification. At the time of this writing, the SERCOS Technical Working Groups are involved in specifying requirements for SERCOS-III and for advanced conformance tests for SERCOS interface products. The SERCOS Technical Working Groups are also working on an update to IEC 61491. This consists of a number of updates and new features agreed upon since the original standard was introduced in 1995.
Acknowledgments

The authors gratefully acknowledge input for this article from various members of the SERCOS trade associations. This chapter was prepared using published materials (see footnotes and the additional list below) and information from SERCOS N.A., Interests Group SERCOS interface e.V., and various SERCOS interface vendor Web sites. Information is believed to be correct, but is subject to change without notice. Please address any comments to [email protected].
Bibliography

Ronald Larsen (SERCOS North America), One bus in constant motion, Worldbus Journal, ISA, Research Triangle Park, NC, October 2001, pp. 30–31.
SoftSERCANS SERCOS interface Master Connection for PC-Based Control Systems: Functional Description, Bosch Rexroth, Lohr am Main, Germany, 1999.
Lawrence Berardinis, SERCOS Lights the Way for Digital Drives, Penton Publishing, Cleveland, August 22, 1994.
Sources for More Information

SERCOS interface Standard (IEC 61491)
Available from Global Engineering Documents (Englewood, CO), a source for technical standards: http://global.ihs.com
Available from the IEC (International Electrotechnical Commission) in Geneva, Switzerland: www.iec.ch
Note that the title of the standard is "Electrical Equipment of Industrial Machines: Serial Data Link for Real-Time Communication between Controls and Drives." Searching for the word SERCOS will not locate the standard.

Support Organizations
A number of support and development tools for the SERCOS interface are available, both from the SERCOS interface trade associations and from various vendors. See the SERCOS association Web sites:
Interests Group SERCOS interface e.V.: www.sercos.de
SERCOS North America: www.sercos.com
39
The IEC/IEEE Train Communication Network

Hubert Kirrmann
ABB Corporate Research

Pierre A. Zuber
DaimlerChrysler Rail Systems (NA) Inc.

39.1 Introduction
39.2 General Architecture
39.3 Wire Train Bus
39.4 Multifunction Vehicle Bus
39.5 Common Protocols
    Data Traffic • Medium Access Control for Periodic and Sporadic Traffic • Process Variable Transmission • Message Transfer • Network Management • Conformance Testing • State of the Work
39.6 Conclusion
References
39.1 Introduction

Automatic coupling of railway vehicles has existed since the mechanical Jenny coupler at the turn of the 19th century. The railway industry's challenge at the dawn of the 21st century was automatic coupling of a vehicle's electronic equipment through a data bus. This required a worldwide standardization of onboard data communication. A joint effort by the International Railways Union (Union Internationale des Chemins de Fer (UIC)), Utrecht, Netherlands, and the International Electrotechnical Commission (IEC), Geneva, Switzerland, has laid the ground for this standardization:
• The International Railways Union groups all national rail operators worldwide and ensures cross-border traffic by standardizing track profiles, pneumatic hoses, traction voltages, operating procedures, etc.
• The International Electrotechnical Commission is well known to IEEE members for its impressive collection of standards in the electric world, and as the "electric sister" of the International Organization for Standardization (ISO).
Deputies from over 20 countries, including many European nations, the U.S., Japan, and China, representing major railway operators and manufacturers, worked several years within Working Group 22 (WG22) on the definition of the Train Communication Network (TCN). The TCN was adopted as the international standard IEC 61375 in 1999 [1]. The IEEE Rail Transit Vehicle Interface Standards Committee Working Group 1 contributed to this work in the late phase and adopted TCN as IEEE 1473–1999 Type T with no modifications the same year [2].
FIGURE 39.1 Train communication network.
An international standardization of data communication is necessary at both the train and vehicle levels. Trains with varying composition during daily service — such as metros or suburban and international trains — need a standard form of data communication for train control, diagnostics, and passenger information. Such communication should configure itself automatically when vehicles are coupled on the track. At the vehicle level, a standard attachment of equipment serves manufacturers, suppliers, and operators. Manufacturers assemble pretested units, such as doors manufactured by subcontractors, which include their own computers. Parts suppliers who interface with different assemblers reduce development costs by adhering to one standard. Railroad operators reduce spare parts and simplify maintenance and part replacement.
39.2 General Architecture

The TCN architecture addresses all relevant configurations used in rail vehicles. It comprises the train bus connecting the vehicles and the vehicle bus connecting the equipment aboard a vehicle or group of vehicles, as shown in Figure 39.1. A vehicle may carry none, one, or several vehicle buses. The vehicle bus may span several vehicles, as in the case of mass-transit train sets (multiple units) that are not separated except in the works. In closed train sets, where the train bus needs no sequential numbering of nodes, the vehicle bus may serve as a train bus, as shown in Figure 39.2.
FIGURE 39.2 (a) Open train with the multifunction vehicle bus as the vehicle bus (in some vehicles) and the wire train bus as the train bus. (b) Train sets connected with the wire train bus as the train bus and vehicles interconnecting with the multifunction vehicle bus. (c) Closed train, such as a tilting train, with the multifunction vehicle bus as both a train bus and vehicle bus — a nonstandard bus can also be integrated as a vehicle bus.
FIGURE 39.3 Wire train bus.
39.3 Wire Train Bus

To respond to the demand for train-level standardization, WG22 specified the wire train bus (WTB) as part of the TCN architecture. The WTB interconnects vehicles over hand-plug jumper cables or automatic couplers, as shown in Figure 39.3. WG22 considered several media. It rejected coaxial cable because of its poor mechanical resistance to shocks and vibrations. Optical fiber (a fiber train bus) was also dismissed at the train level because of difficulties in building automatic couplers that would withstand shocks and vibrations, as well as harsh weather conditions. Therefore, as its name implies, the WTB uses a twisted shielded wire pair, which has demonstrated its reliability in several European trains. Originally, the WTB shared the established UIC cable with the wires carrying the signals for controlling lights, loudspeakers, and doors in international vehicles. Due to these wires' limited bandwidth and in view of future requirements, the UIC decided to add to its wires a dedicated, twisted shielded wire pair capable of carrying data at 1 Mbit/s. The WTB layout is in principle redundant; one cable runs on each side of the vehicle, as shown in Figure 39.4. The WTB can span 860 m, a distance corresponding to 22 standard UIC vehicles, without a repeater. This requirement allows connecting older vehicles not equipped with the new data bus onto a train. It also allows bypassing of vehicles with a low battery, a major concern because batteries discharge when vehicles rest in the marshalling yard. The WTB may have to operate under very harsh environmental conditions where contacts can oxidize. To clean oxidized connectors or contacts, fritting (a direct-current (DC) voltage to break the oxide layer) is superimposed on the lines. The WTB operates according to the high-level data link control (HDLC) protocol (ISO 3309), but it does not transmit data in the classic sequence of logic 1's and 0's, technically known as non-return to zero inverted (NRZI). Instead, the WTB uses Manchester encoding, which combines clock and data in a synchronous bit stream. This signal is decoded by a quadrature/phase demodulator, allowing the bridging of 860 m of cables and 21 connectors at 1 Mbit/s.
WTB cable vehicle
UIC lines
Line B
classic
1
1 jumper
UIC lines Line A
WTB node
WTB node
2
Line B classic
WTB node
jumper
Line A
top view UIC data cable
FIGURE 39.4 WTB cabling (top view).
© 2005 by CRC Press
2
39-4
The Industrial Communication Technology Handbook
Intermediate node(s)
End node
End node
Trunk cable Terminators (inserted)
+
Terminators (inserted)
Jumper cable
−
− +
+
−
− +
+ −
− +
+
−
Bus controllers
Bus controllers
Bus controllers
Bus controllers
Two channels active
One channel active
One channel active
Two channels active
FIGURE 39.5 Detailed view of wire train bus.
The WTB’s most salient feature (and a unique trait in the industry) is that it automatically numbers nodes in sequential order and allows all nodes to recognize the train’s right and left sides, as well as the aft and fore directions. Each time the train composition changes, that is, after adding or removing vehicles, the train bus nodes execute the inauguration procedure, which connects electrically and assigns a sequential address to each node. In general, there is one node per vehicle, but as shown in Figure 39.3, there may be more than one node or none at all. At the end of the inauguration, all vehicles recognize the train topography, including: • • • •
Their own address, orientation (right and left), and position with respect to the bus master Other vehicles’ number and position in the train Other vehicles’ type and version (locomotive, coach, and so on) and their supported functions Their own and other vehicles’ dynamic properties (for example, the presence of a driver)
For inauguration, each node is composed of two HDLC channels, one for each direction (forward, backward), as shown in Figure 39.5. During operation, the end nodes insert their termination resistors to close the bus, while the intermediate nodes establish a chain of continuity between the end nodes. On the end nodes, two channels are active, one for the bus traffic and one for detecting additional nodes. On the intermediate nodes, only one channel is active; the other is isolated to reduce the bus load. When a train composition consisting of N nodes is operating, both of its end nodes send a “We are N nodes” frame every 50 ms toward the open extremity. The rest of the time, the end nodes listen to detect additional nodes. When a second composition consisting of M nodes is coupled to the first, the end node of the first composition detects the “We are M nodes” sent by the end node of the second composition, while the second composition detects the “We are N nodes” of the first. What follows depends on the respective number of nodes: the compositions compare the respective strengths (M, N) and the smaller composition disbands. If both compositions have the same strength, the disbanding decision is random. The winning composition integrates the nodes of the disbanded composition one by one. Each time the winning bus segment integrates a new node, this node receives its address and becomes the next end node, while the former end node switches to the intermediate position. The principle is simple, but inauguration is complex since it requires correct node numbering in many situations. For instance, nodes with a low battery recharging may wake up from the low-power sleep mode to the active mode in the middle of an already inaugurated composition. A connector may be damaged on one of the redundant lines (Figure 39.4 shows both lines, Figure 39.5 only one), but this should not affect numbering. Nodes could start operating as backup in case a working node fails. In case the current bus master fails, recovery must take place in less than 1 s for 32 nodes in the worst case. To allow this, every node may become master; mastership of a failed node automatically transfers to a neighboring node. The dining car, for example, can become the bus master, but since all TCN traffic is slave to slave, it will not control the train. In Figure 39.3, for instance, one of the passenger cars controls the bus.
© 2005 by CRC Press
39-5
The IEC/IEEE Train Communication Network
Radio Train bus
Power line
Cockpit
Vehicle Bus
Diagnosis
Brakes
Motors Power electronics
Track signals
FIGURE 39.6 MVB layout in a locomotive.
Once inauguration is finished, the nodes broadcast their configurations to each other, indicating, for instance, that they represent a locomotive, a motor coach, or a driver coach. They also broadcast properties such as the length between buffers and weight (for calculating the brake parabola). This requires a strict definition of the data exchanged and relies on railway experts. The WTB data traffic and the exact meaning of each variable and each bit are standardized in UIC leaflet 556 [3].
39.4 Multifunction Vehicle Bus

To address assembly, commissioning, and subsystem reuse, the TCN architecture specifies the multifunction vehicle bus (MVB) as a vehicle bus. The MVB connects equipment within a vehicle or within different vehicles in closed train sets. Figure 39.6 shows what subsystems the MVB could connect in a locomotive. The MVB operates at a speed of 1.5 Mbit/s and over the following media:
• Optical fibers are recommended for distances over 200 m and for environments sensitive to electromagnetic interference (in locomotives). The MVB specifies 240-µm fibers, which are more robust against cracks and vibrations than standard telecom fibers.
• Transformer-coupled, 120-ohm twisted-wire pairs are used for distances of up to 200 m to connect two or three vehicles in a train set. These specifications resemble those of IEC 61158 but use 120-ohm cables for better robustness and low attenuation.
• RS-485 with 120-ohm cables provides cost-effective device connection within the same cabinet or on the same backplane with no galvanic separation. When galvanically separated, this cable can connect equipment in different vehicles in closed train sets.
These different media can be directly interconnected with repeaters, since they operate at the same speed with the same signaling. The MVB is based on the bus pioneered on the Swiss Locomotive 460 and is used in over 600 vehicles worldwide. The MVB enables considerable savings and increased reliability with respect to conventional wiring. A dedicated master controls the MVB. For higher availability, several devices can become bus master when needed. The MVB bus controller provides redundancy at the physical layer: a device transmits simultaneously on both redundant lines; receivers listen to and monitor both lines. Other features include high integrity against data corruption and, due to its robust Manchester encoding and checksums, fulfillment of the IEC 60870-5 FT2 class. The achieved Hamming distance is 8 when using fiber optics [4].
39.5 Common Protocols

Despite differences at the physical and link layers, the WTB and MVB adhere to the same operating principles.

39.5.1 Data Traffic

TCN buses transport two types of data: process variables and messages.
FIGURE 39.7 Alternating periodic and sporadic data transmissions lets a single bus transmit both types of data. Process variables are transmitted at regular intervals (e.g., 1 ms). After a periodic phase, the bus checks for sporadic traffic requests and transmits, if requested, a message packet, provided there is sufficient time until the next periodic phase (guard time is respected).
Process variables reflect the train’s state, such as speed, motor current, and operator’s commands. The transfer time for process variables must be short and deterministic. The railways required the Train Communication Network to guarantee less than 100 ms of delivery delay from a device on a first vehicle bus to a device on a second vehicle bus, both vehicle buses being connected by the train bus. Traction control over the vehicle bus requires delivery from application to application for all critical variables within less than 16 ms. To guarantee these delays, the Train Communication Network transmits all process variables periodically. Message data carry infrequent, but possibly lengthy information, such as diagnostics or passenger information. Message length varies between a few bytes and some kilobytes. Message transmission delay must be short on average, but the application tolerates delays up to some seconds. This slack timing requirement allows the TCN to transmit messages on demand.
39.5.2 Medium Access Control for Periodic and Sporadic Traffic

All buses pertaining to the TCN provide two basic data transmission services:
• Periodic (for process variables)
• Sporadic (for on-demand data traffic, such as messages)
Periodic and sporadic data traffic share the same bus, but devices treat each separately. One device acting as master controls periodic and sporadic data transmission, which guarantees deterministic medium access. To accomplish this, the master alternates periodic and sporadic phases, as shown in Figure 39.7. Traffic is divided into basic periods of fixed duration: 1 or 2 ms on the MVB and 25 ms on the WTB. At the start of a period, the master polls the process variables in sequence during a certain time (the periodic phase) according to a predefined poll list. To reduce traffic, urgent data are transmitted every period, and less urgent variables are transmitted with an individual period every second, fourth, eighth, etc., basic period, with the longest period being 1024 ms. After transmitting the process variables, the bus master checks for sporadic data to transmit. On the WTB, a flag in the periodic data signals that a node has sporadic data pending. On the MVB, an arbitration procedure ensures that one of several requesting devices gets serviced. If there are no sporadic data to transmit, the sporadic phase remains unused. If there are data, the master checks that sufficient time remains to start the next period on time (it respects the guard time) and, if so, invites a device to transmit its sporadic data. A highly precise start of the next period is required because the first master frame of a period serves to synchronize all clocks with a jitter of some microseconds.
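A simplified bus-master schedule built on these rules might look as follows. The structures and the guard-time value are illustrative assumptions; the actual poll-list handling and sporadic arbitration are defined by the TCN standard.

```c
#include <stdio.h>

/* One poll-list entry: a process variable (data set) with an individual
   period expressed as a multiple of the basic period (1, 2, 4, ... 1024).  */
typedef struct {
    int identifier;        /* address sent in the master frame              */
    int period_multiple;   /* 1 = every basic period, 2 = every second, ... */
} poll_entry_t;

static const poll_entry_t poll_list[] = {
    { 0x101, 1 },          /* urgent variable, polled every basic period    */
    { 0x102, 2 },
    { 0x103, 4 },
};
#define POLL_ENTRIES (int)(sizeof poll_list / sizeof poll_list[0])

static int sporadic_request_pending(void) { return 1; }   /* stub           */

/* One basic period: periodic phase first, then a sporadic phase only if
   enough time remains before the next period (guard time respected).       */
static void basic_period(int cycle_counter, double time_left_us)
{
    for (int i = 0; i < POLL_ENTRIES; i++)
        if (cycle_counter % poll_list[i].period_multiple == 0)
            printf("  poll identifier 0x%X\n", poll_list[i].identifier);

    const double guard_time_us = 10.0;     /* assumed value                 */
    if (sporadic_request_pending() && time_left_us > guard_time_us)
        printf("  sporadic phase: transmit one message packet\n");
}

int main(void)
{
    for (int cycle = 0; cycle < 4; cycle++) {
        printf("basic period %d:\n", cycle);
        basic_period(cycle, 200.0 /* time remaining, illustrative */);
    }
    return 0;
}
```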
39.5.3 Process Variable Transmission

In the first phase of process variable transmission, the master broadcasts a frame to trigger transmission of a certain variable without specifying the source device. In a second phase, the source device answers by broadcasting a frame containing the requested value to all devices. Each device interested in this variable picks up the value, as shown in Figure 39.8.
FIGURE 39.8 Source-addressed broadcast. In the first phase (a), the master broadcasts a short master frame with the identifier of a variable, taken from its poll list. In the second phase (b), the source device sends the variable’s value in a slave frame; all devices interested in that value receive it. The master is normally neither source nor sink of the variables.
To increase efficiency, slave frames carry numerous variables having the same period, called a data set. A data set contains values and check bits, but no addresses. Each variable is identified by its offset from the data set's beginning; each data set is identified by the address in the master frame. To maintain determinism, the configuration tools define the frame format and the poll list before operation starts. After this, the traffic pattern cannot change. On the MVB, each device can subscribe (as either a source or sink) to up to 4096 data sets. On the WTB, each node has only one data set to broadcast, but it can receive up to 32 data sets from other nodes. The source or sink data sets with the values of the variables are stored in a shared memory called the traffic store. The application processor and the bus controller can access the traffic store on a device simultaneously (no semaphores are needed). The traffic stores conjointly implement a distributed database, as shown in Figure 39.9, which the bus keeps synchronized in the background. Application programmers see only the traffic stores and do not care about bus traffic.
FIGURE 39.9 Traffic stores as a distributed database kept up to date by the bus.
FIGURE 39.10 Map of logically addressed application functions that are typical in a railway car.
Source-addressed broadcast lets applications and the bus operate independently. The application processor is only interrupted on reception or transmission of a special time synchronization data set. End-to-end determinism is ensured by the periodic operation of the application processes (programmed, e.g., in IEC 61131) and of the bus. Since the master periodically requests the transmission of process variables, there is no need for an explicit retransmission after an occasional loss. To cope with persistent faults, each bus controller maintains a counter for each variable, indicating how long ago the bus refreshed the variable. In addition, the application can transmit a check variable along with a critical variable in the same data set to certify the timely and correct production of that variable. The application accesses process variables either individually or (more efficiently) by clusters. The process data application layer marshals transmitted data to the individual application variables. It also converts data types to and from the representation used by the application. The gateway between the WTB and MVB copies variables from one bus to the other and can synchronize the cycles. The gateway can also combine variables; for example, it can build a compound variable indicating that all doors are closed in its vehicle.
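As an illustration of the last point, the following sketch shows how a gateway might derive an "all doors closed" compound variable from per-door status bits, treating any value whose freshness counter is too old as not closed. The data layout, the staleness threshold, and the names are assumptions made for the example.

```c
/* Illustrative gateway logic: combine per-door status bits from the
 * vehicle bus into one "all doors closed" variable for the train bus.
 * The data layout, freshness threshold, and names are assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_DOORS   8
#define MAX_AGE_MS  200   /* treat older values as invalid (assumption) */

struct door_status {
    bool     closed;      /* value taken from the MVB traffic store      */
    uint16_t age_ms;      /* freshness counter maintained per variable   */
};

static bool all_doors_closed(const struct door_status doors[], int n)
{
    for (int i = 0; i < n; i++) {
        if (doors[i].age_ms > MAX_AGE_MS)   /* stale value: fail safe */
            return false;
        if (!doors[i].closed)
            return false;
    }
    return true;
}

int main(void)
{
    struct door_status doors[NUM_DOORS] = {
        { true, 10 }, { true, 20 }, { true, 15 }, { true, 30 },
        { true, 12 }, { true,  8 }, { true, 25 }, { true, 18 },
    };
    /* The result would be written into the vehicle's WTB data set. */
    printf("all doors closed: %s\n",
           all_doors_closed(doors, NUM_DOORS) ? "yes" : "no");
    return 0;
}
```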
39.5.4 Message Transfer

Applications exchange messages transparently over the TCN. An application accesses another application the same way, whether its peer resides on the same station, on the same bus, or anywhere else on the network. To cope with a variety of vehicles and equipment, the TCN uses logical addresses for messages. Every node of the train bus supports several application functions, as shown in the map in Figure 39.10. From an outside node on the WTB, the internal organization of the vehicle is not visible; it appears as if the train bus node executes all the functions. One or more devices on the vehicle bus, or the train bus node itself, can execute the application functions. A device might execute several functions, and a given function may be executed by different devices. The same principle applies to functions communicating over the vehicle bus: the application need not know where the other functions reside.

Applications communicate on a remote procedure call basis. A conversation consists of a call message sent by the client and a reply message sent by the remote server. The network retains no memory of a conversation once a transmission has succeeded or timed out. This is more efficient in terms of memory and timers than Transmission Control Protocol (TCP)-like stream-oriented protocols for the kind of traffic encountered in rail vehicles. The communication layer divides these call or reply messages into small packets for transmission. Each packet carries a full address, which identifies its source and destination. The train bus nodes route the packets using a function directory that indicates which device executes which function. This function directory is static. Dynamic updating would have been closer to plug and play, but it would have caused long recovery delays in case of a fault. A classical sliding window retransmission protocol implements flow control and error recovery. Only caller or replier devices execute this transport protocol; router nodes only intervene in exceptional cases (during inauguration, for instance).
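The routing step can be illustrated with a minimal table lookup: a static function directory maps logical function identifiers to the devices that execute them. The identifiers and table contents below are invented for the example and do not come from the standard.

```c
/* Illustrative sketch of routing by a static function directory
 * (identifiers and table contents are invented for the example). */
#include <stdio.h>

struct directory_entry {
    unsigned function_id;   /* logical address used by applications       */
    unsigned device_id;     /* physical device that executes the function */
};

/* Configured before operation; the standard keeps this directory static. */
static const struct directory_entry function_directory[] = {
    { 1, 0x21 },   /* e.g., door control        */
    { 2, 0x22 },   /* e.g., air conditioning    */
    { 3, 0x21 },   /* one device, two functions */
};

static int route(unsigned function_id, unsigned *device_id)
{
    for (size_t i = 0;
         i < sizeof function_directory / sizeof function_directory[0]; i++) {
        if (function_directory[i].function_id == function_id) {
            *device_id = function_directory[i].device_id;
            return 0;
        }
    }
    return -1;   /* unknown function: packet cannot be delivered */
}

int main(void)
{
    unsigned dev;
    if (route(2, &dev) == 0)
        printf("packets for function 2 go to device 0x%02X\n", dev);
    return 0;
}
```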
39.5.5 Network Management

Network management helps configure, commission, and maintain the TCN. A network manager can connect to the TCN, for instance, as a vehicle bus device. The network manager has access to all devices, in any vehicle, connected to the TCN. The network manager can inspect and modify other devices through an agent (an application task running in each station). The agent has local access to managed objects such as process variables, protocols, memory, tasks, and clocks. The standard specifies the management services to read and write the managed objects, along with the formatting of the network manager messages.
39.5.6 Conformance Testing

Interoperation will only succeed if manufacturers can validate that their devices conform to the TCN specifications. Conformance testing guidelines let manufacturers test their products against the standard. In particular, this requirement applies to WTB nodes, which must operate without adjustment when vehicles of any origin are coupled. The MVB has similar requirements when it comes to plug-in interchangeability. To address conformance testing, WG22 developed a set of guidelines. This is only the first step toward a full program of conformance testing that an independent agency, such as the European Railways Research Institute, would perform.
39.5.7 State of the Work

Standardization has prompted numerous railway manufacturers to support TCN-compliant product development. Applications include signaling, radio communication, and Web access to rolling stock.

39.5.7.1 Development

A joint development project by a group of manufacturers (Adtranz, Siemens, and Firema-Ercole Marelli) supports the TCN and has helped to demonstrate its technical capabilities. The joint development project members combined forces to develop a complete TCN with all the necessary hardware and software. This group committed itself toward the IEC to make these components available to any interested party under reasonable conditions.

39.5.7.2 ERRI Test Train

Although the working group derived the WTB and MVB from existing, railway-proven solutions, important modifications made in response to user requirements demanded a complete test of the TCN. The UIC, through the European Railways Research Institute (ERRI), sponsored a full-scale TCN implementation from May 1994 to September 1995. It tested the TCN in the lab and on an existing track. The test train was equipped by different manufacturers (Adtranz, Siemens, Firema, and Holec) with coaches from Italy, Switzerland, Germany, and the Netherlands. The ERRI put this train into revenue service between Interlaken, Switzerland, and Amsterdam, Netherlands. This test validated the interoperation of a mixed system and confirmed the completeness of the standard documents. The valuable experience gained on this train improved the standard, especially with respect to its impact on existing systems and on operational issues (e.g., the need for the personnel to verify that both cables are plugged in).

39.5.7.3 Standardization

Although the technical standard work was nearly complete after the ERRI test train, it took four more years to meet the quality requirements for a standard. This long delay is not uncommon in standards work: while the original documents focused on the technical problem, the final documents focus on specifying the interface to ensure that standard-compliant devices are interoperable. This approach differs from the current tendency to base standards on product specifications, allowing several variants, profiles,
and incompatible options. For instance, the conformance test lists several properties that a device must exhibit to bear the name of IEC 61375. These properties range from the connector screws to the type of messages that the device must send and receive. The result is that the IEC passed this standard in 1999 with nearly unanimous approval. The IEEE Rail Transit Vehicle Interface Standards Committee adopted the TCN as IEEE 1473 for onboard data communication. Here, the focus of standardization was defining the TCN application domain, assuming that other bus types may exist aboard the same vehicle. The IEC shares this focus, stating that the WTB and MVB should not be mandatory in applications that do not require interoperability and interchangeability.

39.5.7.4 Eurocab Project

A full-size test rig in Brussels demonstrated TCN application to safety tasks, in particular automatic train operation. Here, signaling equipment from different manufacturers with different safety philosophies interoperated on a simulated train. This test took place within the Eurocab project and was part of the larger European Rail Traffic Management System (ERTMS). The network's deterministic nature let the safety analysis focus on hardware failures and disturbances [5].

39.5.7.5 ROSIN Project

A common communication protocol is necessary but not sufficient to ensure interoperation. Applications must also have standardized objects to exchange and to access equipment and subassemblies regardless of the manufacturer. To address this need, the European Union set up the 3-year ROSIN (Railways Open System Interconnection Network) project [6]. About 20 different firms collaborated on this project to define device profiles for different applications, such as:
• Passenger trains with locomotives
• Freight trains
• Mass transit
• Equipment interfaces (for propulsion, brakes, doors, air conditioning, etc.)
• Radio links
• Signaling
ROSIN defined the exchanged data down to the individual bit level. Standardized data representation definitions exist, but it would be unrealistic to force all existing equipment to switch to a common data-encoding scheme. Rather, project members defined a notation (ROSIN Notation or Retrofit Notation) that describes arbitrary bit fields. For instance, say that certain equipment exports the vehicle speed; ROSIN recommends using SI units, e.g., representing speed in meters per second as a 32-bit real number in big-endian format. A manufacturer may, however, use a different representation in its device description file, such as speed_mph, by which a 16-bit integer expresses speed from 0 to 200 mph in little-endian format. By looking at this specification, the user of the device knows how to convert the variable to the expected representation.

The ROSIN project concluded with a demonstration of Web access to the vehicles, called RoMain (ROSIN Maintenance), with the participation of ABB Corporate Research and the Swiss Federal Institute of Technology, Lausanne. The group equipped a local commuter train between France and Spain with a radio link and a Web server. This demonstrated that a PC-based Web server could understand the data traffic on the MVB just by using the equipment description files. The demonstration was impressive: users could inspect vehicle data with a standard browser from anywhere in the world while the train was running, via the system architecture shown in Figure 39.11. The main challenge was database management. Because of the radio links' limited bandwidth, the devices' static information is not located on the devices themselves, but on the railway operator's Web server. This arrangement arose from the fear that mergers and sales among device manufacturers would rapidly make Web links to manufacturers' sites obsolete. It makes updating handbooks and maintenance manuals easy, but requires a rather high administrative effort.
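As a concrete illustration of the speed_mph example above, the sketch below converts the hypothetical 16-bit little-endian field into the recommended SI representation (meters per second). The field name comes from the example in the text, not from an actual ROSIN profile.

```c
/* Converting the chapter's hypothetical speed_mph field (16-bit integer,
 * little-endian, 0..200 mph) into the recommended SI representation (m/s).
 * The raw-bytes-to-value step follows the stated encoding; the field name
 * is taken from the text's example, not from a real ROSIN profile. */
#include <stdint.h>
#include <stdio.h>

#define MPH_TO_MPS 0.44704   /* exact: 1 mile = 1609.344 m, 1 h = 3600 s */

static double speed_mps_from_device(const uint8_t raw[2])
{
    uint16_t mph = (uint16_t)(raw[0] | (raw[1] << 8));  /* little-endian */
    return mph * MPH_TO_MPS;
}

int main(void)
{
    uint8_t raw[2] = { 0x64, 0x00 };   /* 100 mph as transmitted */
    printf("speed = %.1f m/s\n", speed_mps_from_device(raw));  /* ~44.7 */
    return 0;
}
```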
FIGURE 39.11 RoMain architecture for Web access to moving trains.
39.5.7.6 U.S. Initiatives

After the IEEE Rail Transit Vehicle Interface Standards Committee Working Group 1 accepted IEEE 1473, work continued in Working Group 9 to define the equipment interface, considering the experience of the American Public Transportation Association and the Transit Communication Interface Profile project of the Department of Transportation. In parallel, Working Group 1 proposed to develop an open TCN stack as a clean-room implementation. The first WTB-equipped train in the U.S. should be New Jersey's Comet 5 train.
39.6 Conclusion

The IEC/UIC standardization of the TCN provides a solid base for current and future developments, such as the TrainCom project (www.traincom.org). The number of TCN-equipped vehicles is growing rapidly. All new projects by Adtranz, Firema, Siemens, and several other manufacturers are TCN based. In the Swiss tilting train ICN, a TCN controls the tilting of the coaches in curves (Figure 39.12).
FIGURE 39.12 The TCN team in front of the ICN at Zurich main station.
Several railways are now specifying TCN conformance in their public bids. The standardization of application functions is an indispensable further step to achieve plug-in interchangeability of equipment and vehicles. The TCN technology has spread outside of the railways community. It is used in high-voltage substations and in newspaper printing machines, where real-time constraints are as demanding as in railways.
References

[1] IEC 61375, Train Communication Network, International Electrotechnical Commission, Geneva, 1999, http://www.iec.ch.
[2] IEEE 1473-1999, IEEE Standard for Communication Protocol Aboard Trains, IEEE, Piscataway, NJ, 1999.
[3] UIC B 108.3, Information Transfer in Trains, Leaflet 556, 11th ed., Union Internationale des Chemins de Fer, Utrecht, Netherlands, 1996.
[4] Philip Koopman and Tridib Chakravarty, Analysis of the Train Communication Network Protocol Error Detection Capabilities, ECE Department and ICES, Carnegie Mellon University, Pittsburgh, PA, 2000.
[5] Bernhard Eschermann et al., Fail-Safe On-Board Data Bus for Automatic Train Protection, Railtech, Birmingham, 1994, http://www.cordis.lu/transport/src/eurosig.htm.
[6] Telematics Applications for Transport Research Information, 4th Framework Research Projects ROSIN TR 1045, Railway Open System Interconnection Network, European Union, Brussels, http://www.traincom.org.
40
A Smart Transducer Interface Standard for Sensors and Actuators

Kang Lee
National Institute of Standards and Technology

40.1 Introduction
40.2 A Smart Transducer Model
40.3 Networking Smart Transducers
40.4 Establishment of the IEEE 1451 Standards
40.5 Goals of IEEE 1451
40.6 The IEEE 1451 Standards
    The IEEE 1451 Smart Transducer Model • IEEE P1451.0 Common Functionality • IEEE 1451.1 Smart Transducer Information Model • IEEE 1451.2 Transducer-to-Microprocessor Interface • IEEE P1451.3 Distributed Multidrop Systems • IEEE P1451.4 Mixed-Mode Transducer Interface • IEEE P1451.5 Wireless Transducer Interface • IEEE 1451 Family • Benefits of IEEE 1451
40.7 Example Application of IEEE 1451.2
40.8 Application of IEEE 1451-Based Sensor Network
40.9 Summary
Acknowledgments
References
40.1 Introduction

Sensors are used in many devices and systems to provide information on the parameters being measured or to identify the states of control. They are good candidates for increased built-in intelligence. Microprocessors can make smart sensors or devices a reality. With this added capability, it is possible for a smart sensor to communicate measurements directly to an instrument or a system.

In recent years, the concept of computer networking has gradually migrated into the sensor community. Networking transducers (sensors or actuators) in a system and communicating transducer information by digital means rather than analog cabling facilitates distributed measurement and control. In other words, intelligence and control, which were traditionally centralized, are gradually migrating to the sensor level. Networked smart transducers can provide flexibility, improve system performance, and ease system installation, upgrade, and maintenance. Thus, the trend in industry is moving toward distributed control with an intelligent sensing architecture. These enabling technologies can be applied to aerospace, automotive, industrial automation, military and homeland defense, manufacturing process control, smart buildings and homes, and smart toys and appliances for consumers.
FIGURE 40.1 A smart transducer model.
As examples, (1) in order to reduce the number of personnel needed to run a naval ship from 400 to less than 100, as required by the reduced-manning program, the U.S. Navy needs tens of thousands of networked sensors per vessel to enhance automation, and (2) Boeing needs to network hundreds of sensors for monitoring and characterizing airplane performance. Sensors are used across industries and are going global [1]. The sensor market is extremely diverse, and it is expected to grow to $43 billion by 2008.

The rapid development and emergence of smart sensor and field network technologies have made the networking of smart transducers a very economical and attractive solution for a broad range of measurement and control applications. However, with a multitude of incompatible networks and protocols in existence, the number of sensor interfaces and the amount of hardware and software development required to support this variety of networks are enormous for sensor producers and users alike. The reason is that a sensor interface customized for a particular network will not necessarily work with another network, and it appears that a variety of networks will coexist to serve their specific industries. Sensor manufacturers are uncertain about which network(s) to support and therefore hold back from full-scale smart sensor product development. This situation has impeded the widespread adoption of smart sensor and networking technologies despite a great desire to build and use them. Clearly, a sensor interface standard is needed to help alleviate this problem [2].
40.2 A Smart Transducer Model

In order to develop a sensor interface standard, a smart transducer model should first be defined. As defined in IEEE 1451.2 [3], "a smart transducer is a transducer that provides functions beyond those necessary for generating a correct representation of a sensed or controlled quantity. This functionality typically simplifies the integration of the transducer into applications in a networked environment." Thus, let us consider the functional capability of a smart transducer. A smart transducer should have:
• Integrated intelligence closer to the point of measurement and control
• Basic computation capability
• The capability to communicate data and information in a standardized digital format

Based on this premise, a smart transducer model is shown in Figure 40.1. It applies to both sensors and actuators. The output of a sensor is conditioned and scaled, then converted to a digital format through an analog-to-digital converter. The digitized sensor signal can then be easily processed by a microprocessor using a digital application control algorithm. The output, after being converted to an analog signal via a digital-to-analog converter, can then be used to control an actuator. Any of the measured or calculated parameters can be passed on to any device or host in a network by means of a network communication protocol.

The different modules of the smart transducer model can be grouped into functional units as shown in Figure 40.2. The transducers and the signal conditioning and conversion modules can be grouped into a building block called a Smart Transducer Interface Module (STIM). Likewise, the application algorithm and network communication modules can be combined into a single entity called a network-capable application processor (NCAP). With this functional partitioning, transducer-to-network interoperability can be achieved in the following ways:
FIGURE 40.2 Functional partitioning.
FIGURE 40.3 An integrated networked smart transducer.
1. STIMs from different sensor manufacturers can plug and play with NCAPs from a particular sensor network supplier.
2. STIMs from a sensor manufacturer can plug and play with NCAPs supplied by different sensor or field network vendors.
3. STIMs from different manufacturers can be interoperable with NCAPs from different field network suppliers.

Using this partitioning approach, a migration path is provided to those sensor manufacturers who want to build STIMs with their sensors, but do not intend to become field network providers. Similarly, it applies to those sensor network builders who do not want to become sensor manufacturers. As technology becomes more advanced and microcontrollers become smaller relative to the size of the transducer, integrated networked smart transducers that are economically feasible to implement will emerge in the marketplace. In this case, all the modules are incorporated into a single unit, as shown in Figure 40.3. Thus, the interface between the STIM and NCAP is not exposed for external access and separation. The only connection to the integrated transducer is through the network connector. The integrated smart transducer approach simplifies the use of transducers by merely plugging the device into a sensor network.
40.3 Networking Smart Transducers

Until recently, sensors were connected to instruments or computer systems by means of a point-to-point or multiplexing scheme. These techniques involve a large amount of cabling, which is bulky and costly to implement and maintain. With the emergence of computer networking technology, transducer manufacturers and users alike are finding ways to apply this networking technology to their transducers for monitoring, measurement, and control applications [4]. Networking smart sensors provides the following features and benefits:
• Enable peer-to-peer communication and distributed sensing and control
• Significantly lower the total system cost by simplified wiring
• Use prefabricated cables instead of custom laying of cables for ease of installation and maintenance
• Facilitate expansion and reconfiguration
• Allow time stamping of sensor data
• Enable sharing of sensor measurement and control data
• Provide Internet connectivity, meaning global access to sensor information from anywhere
40.4 Establishment of the IEEE 1451 Standards

As discussed earlier, a smart sensor interface standard is needed in industry. In view of this situation, the Technical Committee on Sensor Technology of the Institute of Electrical and Electronics Engineers' (IEEE) Instrumentation and Measurement Society sponsored a series of projects for establishing a family of IEEE 1451 standards [5]. These standards specify a set of common interfaces for connecting transducers to instruments, microprocessors, or field networks. They cover digital, mixed-mode, distributed multidrop, and wireless interfaces to address the needs of different sectors of industry.

A key concept in the IEEE 1451 standards is the transducer electronic data sheet (TEDS), which contains manufacturer-related information about the sensor, such as manufacturer name, sensor type, serial number, and calibration data, in a standardized data format. The TEDS has many benefits:
• Enables self-identification of sensors or actuators: A sensor or actuator equipped with the IEEE 1451 TEDS can identify and describe itself to the host or network by sending the TEDS.
• Provides long-term self-documentation: The TEDS in the sensor can be updated and stored with information such as the location of the sensor, recalibration date, repair record, and other maintenance-related data.
• Reduces human error: Automatic transfer of TEDS data to the network or system eliminates the entering of sensor parameters by hand, which could introduce errors under various conditions.
• Eases field installation, upgrade, and maintenance of sensors: This helps to reduce life cycle costs because a less skilled person can perform the task using simple plug and play.

IEEE 1451, designated Standard Transducer Interface for Sensors and Actuators, consists of six document standards. The current status of their development is as follows:
1. IEEE P1451.0,* Common Functions, Communication Protocols, and Transducer Electronic Data Sheet (TEDS) Formats — in progress
2. IEEE 1451.1, Network Capable Application Processor (NCAP) Information Model for Smart Transducers [6] — published standard
3. IEEE 1451.2, Transducer to Microprocessor Communication Protocols and Transducer Electronic Data Sheet (TEDS) Formats — published standard
4. IEEE P1451.3, Digital Communication and Transducer Electronic Data Sheet (TEDS) Formats for Distributed Multidrop Systems — balloted and awaiting IEEE approval in September 2003
5. IEEE P1451.4, Mixed-Mode Communication Protocols and Transducer Electronic Data Sheet (TEDS) Formats — balloted and expected to be submitted to IEEE for approval in December 2003
6. IEEE P1451.5, Wireless Communication and Transducer Electronic Data Sheet (TEDS) Formats — in progress
40.5 Goals of IEEE 1451

The goals of the IEEE 1451 standards are to:
• Develop network-independent and vendor-independent transducer interfaces
• Define TEDS and standardized data formats
• Support general transducer data, control, timing, configuration, and calibration models
• Allow transducers to be installed, upgraded, replaced, and moved with minimum effort by simple plug and play
• Eliminate error-prone, manual entering of data and system configuration steps
• Ease the connection of sensors and actuators by wireline or wireless means

*P1451.0: The P designation means P1451.0 is a draft standard development project. Once the draft document is approved as a standard, the P will be dropped.

FIGURE 40.4 The block diagram for IEEE P1451.0.
40.6 The IEEE 1451 Standards

40.6.1 The IEEE 1451 Smart Transducer Model

The IEEE 1451 smart transducer model parallels the smart transducer model discussed in Figure 40.2. In addition, the IEEE 1451 model includes the TEDS. The model for each of the IEEE 1451.X standards is discussed in the following.
40.6.2 IEEE P1451.0 Common Functionality

Several standards in the IEEE 1451 family share certain characteristics, but there is no common set of functions, communication protocols, and TEDS formats that facilitate interoperability among these standards. The IEEE P1451.0 standard provides that commonality and simplifies the creation of future standards with different physical layers that will facilitate interoperability in the family. This project defines a common functionality for the family of IEEE P1451 smart transducer interface standards. This functionality is independent of the physical communications media. It includes the basic functions required to control and manage smart transducers, common communication protocols, and media-independent transducer electronic data sheet formats. The block diagram for IEEE P1451.0 is shown in Figure 40.4. P1451.0 defines functional characteristics, but it does not define any physical interface.
40.6.3 IEEE 1451.1 Smart Transducer Information Model

The IEEE 1451.1 standard defines a common object model for the components of a networked smart transducer and the software interface specifications to these components [7]. Some of the components are the NCAP block, function block, and transducer block.
FIGURE 40.5 Conceptual view of IEEE 1451.1.
The networked smart transducer object model provides two interfaces:
1. The interface to the transducer block encapsulates the details of the transducer hardware implementation within a simple programming model. This makes the sensor or actuator hardware interface look like an input/output (I/O) driver.
2. The interface to the NCAP block and ports encapsulates the details of the different network protocol implementations behind a small set of communication methods.

Application-specific behavior is modeled by function blocks. To produce the desired behavior, the function blocks communicate with other blocks both on and off the smart transducer. This common network-independent application model has two main advantages:
1. It establishes a high degree of interoperability between sensors/actuators and networks, thus enabling plug-and-play capability.
2. It simplifies the support of multiple sensor/actuator control network protocols.

A conceptual view of the IEEE 1451.1 NCAP is shown in Figure 40.5, which uses the idea of a backplane or card cage to explain the functionality of the NCAP. The NCAP centralizes all system and communications facilities. Network communication can be viewed as a port through the NCAP, and the communication interfaces support both client–server and publish–subscribe communication models. Client–server is a tightly coupled, point-to-point communication model in which a specific object, the client, communicates in a one-to-one fashion with a specific server object. The publish–subscribe communication model, on the other hand, provides a loosely coupled mechanism for network communications between objects, where the sending object (the publisher) does not need to be aware of the receiving objects (the subscribers). The loosely coupled publish–subscribe model is used for one-to-many and many-to-many communications. A function block containing application code or a control algorithm is plugged in as needed. Physical transducers are mapped into the NCAP using transducer block objects via the hardware interface, for example, the IEEE 1451.2 interface.

The IEEE 1451 logical interfaces are illustrated in Figure 40.6. The transducer logical interface specification defines how the transducers communicate with the NCAP block object via the transducer block. The network protocol logical interface specification defines how the NCAP block object communicates with any network protocol via the ports.
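The loosely coupled publish–subscribe model can be illustrated with a minimal sketch in which subscribers register callbacks on a topic and the publisher delivers a value without knowing who is listening. This is not the IEEE 1451.1 API; the topic and callback scheme below is a generic illustration of the pattern.

```c
/* Minimal publish-subscribe illustration of the loosely coupled model
 * described for IEEE 1451.1 ports. This is NOT the 1451.1 API; the
 * topic/callback scheme here is a generic sketch. */
#include <stdio.h>

#define MAX_SUBSCRIBERS 4

typedef void (*subscriber_cb)(double value);

struct topic {
    const char   *name;
    subscriber_cb subs[MAX_SUBSCRIBERS];
    int           count;
};

static int subscribe(struct topic *t, subscriber_cb cb)
{
    if (t->count >= MAX_SUBSCRIBERS)
        return -1;
    t->subs[t->count++] = cb;
    return 0;
}

/* The publisher only knows the topic, not who is listening. */
static void publish(const struct topic *t, double value)
{
    for (int i = 0; i < t->count; i++)
        t->subs[i](value);
}

static void logger(double v)  { printf("log:  %.2f\n", v); }
static void control(double v) { printf("ctrl: %.2f\n", v); }

int main(void)
{
    struct topic temperature = { .name = "temperature" };
    subscribe(&temperature, logger);
    subscribe(&temperature, control);
    publish(&temperature, 23.5);   /* one-to-many delivery */
    return 0;
}
```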
40.6.4 IEEE 1451.2 Transducer-to-Microprocessor Interface

The IEEE 1451.2 standard defines a TEDS, its data format, and the digital interface and communication protocols between the STIM and the NCAP [8]. A block diagram and a detailed system diagram of IEEE 1451 are shown in Figure 40.7 and Figure 40.8, respectively. The STIM contains the transducer(s) and the TEDS, which is stored in a nonvolatile memory attached to a transducer. The TEDS contains fields that describe the type, attributes, operation, and calibration of the transducer. Only 179 bytes of the TEDS are mandatory; the rest of the TEDS specification is optional. A transducer integrated with the TEDS provides a unique feature that makes possible the self-description of transducers to the system or network.
FIGURE 40.6 IEEE 1451 logical interfaces.
Since the manufacturer-related data in the TEDS always go with the transducer, and this information is electronically transferred to an NCAP or host, human errors associated with manually entering sensor parameters into the host are eliminated. Because of this distinctive feature of the TEDS, upgrading transducers with higher accuracy and enhanced capability, or replacing transducers for maintenance purposes, becomes simple plug and play.

Eight different types of TEDS are defined in the standard. Two of them are mandatory and six are optional. They are listed in Table 40.1. The TEDS are divided into two categories. The first category contains data in a machine-readable form that is intended to be used by the NCAP. The second category contains data in a human-readable form. The human-readable TEDS may be represented in multiple languages using a different encoding for each language.
FIGURE 40.7 Block diagram of IEEE 1451.
FIGURE 40.8 Detailed system block diagram of IEEE 1451 smart transducer interface.
TABLE 40.1 Different Types of TEDS

TEDS Name                            Type               Optional/Mandatory
Meta-TEDS                            Machine readable   Mandatory
Channel TEDS                         Machine readable   Mandatory
Calibration TEDS                     Machine readable   Optional
Generic extension TEDS               Machine readable   Optional
Meta-identification TEDS             Human readable     Optional
Channel identification TEDS          Human readable     Optional
Calibration identification TEDS      Human readable     Optional
End-user application-specific TEDS   Human readable     Optional
The meta-TEDS contains the data that describe the whole STIM. It contains the revision of the standard, the version number of the TEDS, the number of channels in the STIM, and the worst-case timing required to access these channels. This information will allow the NCAP to access the channel information. In addition, the meta-TEDS includes the channel groupings that describe the relationships between channels.

Each transducer is represented by a channel. Each channel in the STIM contains a channel TEDS. The channel TEDS lists the actual timing parameters for each individual channel. It also lists the type of transducer, the format of the data word being output by the channel, the physical units, the upper and lower range limits, the uncertainty or accuracy, whether a calibration TEDS is provided, and where the calibration is to be performed.

The calibration TEDS contains all the necessary information for the sensor data to be converted from the analog-to-digital converter raw output into the physical units specified in the channel TEDS. If actuators are included in the STIM, it also contains the parameters that convert data in the physical units into the proper output format to drive the actuators. Additionally, it contains the calibration interval and last calibration date and time. This allows the system to determine when a calibration is needed. A general calibration algorithm is specified in the standard.

The generic extension TEDS is provided to allow industry groups to provide additional TEDS in a machine-readable format.
FIGURE 40.9 Block diagram of IEEE P1451.3.
The meta-identification TEDS is human-readable data that the system can retrieve from the STIM for display purposes. This TEDS contains fields for the manufacturer's name, the model number and serial number of the STIM, and a date code.

The channel identification TEDS is similar to the meta-identification TEDS. When transducers from different manufacturers are built into a STIM, this information is very useful for the identification of channels. The channel identification TEDS provides information about each channel, whereas the meta-identification TEDS provides information for the STIM.

The calibration identification TEDS provides details of the calibration in the STIM. This information includes who performed the calibration and what standards were used.

The end-user application-specific TEDS is not defined in detail by the standard. It allows the user to insert information such as installation location, the time it was installed, or any other desired text.

The STIM module can contain a combination of sensors and actuators of up to 255 channels, signal conditioning/processing, an analog-to-digital converter (A/D), a digital-to-analog converter (D/A), and the digital logic to support the transducer-independent interface (TII). Currently, the P1451.2 working group is considering an update to the standard to include a popular serial interface, such as RS-232, in addition to the TII for connecting sensors and actuators.
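The kind of information carried by the machine-readable TEDS can be illustrated with the structures below, which hold fields of the sort listed for the meta-TEDS and channel TEDS. The field names, widths, and the unit code are assumptions for illustration, not the binary TEDS layout defined by IEEE 1451.2.

```c
/* Illustrative structures holding the kind of information the text lists
 * for the meta-TEDS and channel TEDS. Field names, widths, and the unit
 * code are assumptions, not the binary TEDS layout of IEEE 1451.2. */
#include <stdint.h>
#include <stdio.h>

struct meta_teds {
    uint8_t  standard_revision;
    uint8_t  teds_version;
    uint8_t  num_channels;        /* up to 255 channels per STIM       */
    uint32_t worst_case_delay_us; /* worst-case channel access timing  */
};

struct channel_teds {
    uint8_t  transducer_type;     /* e.g., 0 = sensor, 1 = actuator    */
    uint8_t  data_word_bits;      /* format of the channel's data word */
    uint16_t physical_unit_code;  /* encodes the unit of the reading   */
    float    lower_range_limit;
    float    upper_range_limit;
    float    uncertainty;
    uint8_t  has_calibration_teds;
};

int main(void)
{
    struct meta_teds meta = { 1, 1, 2, 500 };
    struct channel_teds ch0 = { 0, 16, 1 /* assumed code for kelvin */,
                                233.0f, 398.0f, 0.5f, 1 };

    /* A host reading these TEDS can identify the device and scale its
     * readings without any manually entered configuration. */
    printf("STIM with %u channels, channel 0 range %.0f..%.0f, +/-%.1f\n",
           (unsigned)meta.num_channels, ch0.lower_range_limit,
           ch0.upper_range_limit, ch0.uncertainty);
    return 0;
}
```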
40.6.5 IEEE P1451.3 Distributed Multidrop Systems

IEEE P1451.3 defines a transducer bus for connecting transducer modules to an NCAP in a distributed multidrop fashion. A block diagram is shown in Figure 40.9. The physical interface for the transducer bus is based on the Home Phoneline Networking Alliance (HomePNA) specification. Both power and data run on a twisted pair of wires. Multiple transducer modules, called transducer bus interface modules (TBIMs), can be connected to an NCAP via the bus. Each TBIM contains transducers, signal conditioning/processing, A/D, D/A, and the digital logic to support the bus, and it can accommodate large arrays of transducers for synchronized access at up to 128 Mbps with HomePNA 3.0, and up to 240 Mbps with extensions. The TEDS is defined in the Extensible Markup Language (XML).
40.6.6 IEEE P1451.4 Mixed-Mode Transducer Interface

IEEE P1451.4 defines a mixed-mode transducer interface (MMI), which is used for connecting transducer modules, called mixed-mode transducers (MMTs), to an instrument, a computer, or an NCAP. The block diagram of the system is shown in Figure 40.10. The physical transducer interface is based on the Maxim/Dallas Semiconductor one-wire protocol.
FIGURE 40.10 Block diagram of IEEE P1451.4.
It also supports up to four wires for bridge-type sensors. It provides simple, low-cost connectivity for analog sensors with a very small TEDS: 64 bits mandatory and 256 bits optional. The mixed-mode interface supports a digital interface for reading and writing the TEDS by the instrument or NCAP. After the TEDS transaction is completed, the interface switches into analog mode, and the analog sensor signal is sent straight to the instrument or NCAP, which must be equipped with an A/D converter to read the sensor data.
40.6.7 IEEE P1451.5 Wireless Transducer Interface

Wireless communication is emerging, and low-cost wireless technology is on the horizon. Wireless communication links could replace the costly cabling used for sensor connectivity and could greatly reduce sensor installation costs. Industry would like to apply wireless technology to sensors; however, the interoperability problem among wireless sensors, equipment, and data needs to be solved. In response to this need, the IEEE P1451.5 working group is defining a wireless sensor communication interface standard that will leverage existing wireless communication technologies and protocols [9]. A block diagram of IEEE P1451.5 is shown in Figure 40.11. The working group seeks to define the wireless message formats, data/control model, security model, and TEDS so that they are scalable to meet the needs of both low-cost and sophisticated sensor or device manufacturers. The standard allows for a minimum of 64 sensors per access point. Intrinsic safety is not required, but the standard would allow for it. The physical communication protocols being considered by the working group are IEEE 802.11 (WiFi), IEEE 802.15.1 (Bluetooth), IEEE 802.15.4 (WPAN-LR), and Ultra Wideband.
40.6.8 IEEE 1451 Family

Figure 40.12 summarizes the family of IEEE 1451 standards. The IEEE 1451.X standards are designed to work with one another, but each can also stand on its own. For example, IEEE 1451.1 can work without any IEEE 1451.X hardware interface. Likewise, an IEEE 1451.X interface can be used without IEEE 1451.1, provided that software with similar functionality delivers the sensor data and information to each network.
40.6.9 Benefits of IEEE 1451

IEEE 1451 defines a set of common transducer interfaces, which will help to lower the cost of designing smart sensors and actuators because designers would only have to design to a single set of standardized digital interfaces. Thus, the overall cost to make networked sensors will decrease.
FIGURE 40.11 Block diagram of IEEE P1451.5 wireless transducer.
FIGURE 40.12 Family of IEEE P1451 standards.
Incorporating the TEDS with the sensors will enable self-description of sensors and actuators, eliminating error-prone, manual configuration.

40.6.9.1 Sensor Manufacturers

Sensor manufacturers can benefit from the standard because they only have to design to a single standard physical interface. A standard calibration specification and data format help them design and develop multilevel products based on the TEDS with minimum effort.

40.6.9.2 Application Software Developers

Application developers can benefit from the standard as well because standard transducer models for control and data support and facilitate distributed measurement and control applications. The standard also provides support for multiple languages, which is useful for international developers.

40.6.9.3 System Integrators

Sensor system integrators can benefit from IEEE 1451 because sensor systems become easier to install, maintain, modify, and upgrade. Quick and efficient transducer replacement is achieved by simple plug and play.
The TEDS can also provide a means to store installation details, and self-documenting of hardware and software is done via the TEDS. Best of all is the ability to choose sensors and networks based on merit.

40.6.9.4 End Users

End users can benefit from a standard interface because sensors will be easy to use through simple plug and play. Based on the information provided in the TEDS, software can automatically provide the physical units, readings with the significant digits defined in the TEDS, and installation details such as instructions, identification, and the location of the sensor.

40.6.9.5 Plug and Play of Sensors

IEEE 1451 enables plug and play of transducers to a network, as illustrated in Figure 40.13. In this example, IEEE P1451.4-compatible transducers from different companies are shown to work with a sensor network. IEEE 1451 also enables plug and play of transducers to a data acquisition or instrumentation system, as shown in Figure 40.14. In this example, various IEEE P1451.4-compatible transducers, such as an accelerometer, a thermistor, a load cell, and a linear variable differential transformer (LVDT), are shown to work with a LabVIEW-based system.*
FIGURE 40.13 IEEE 1451 enables plug and play of transducers to a network.
FIGURE 40.14 IEEE 1451 enables plug and play of transducers to data acquisition/instrumentation system.
*Certain commercial products are identified in this paper in order to describe the system. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products identified are necessarily the best or only ones available for the purpose.
FIGURE 40.15 NCAP-based condition monitoring system.
FIGURE 40.16 Three-axis vertical machining center.
40.7 Example Application of IEEE 1451.2

An IEEE 1451-based sensor network consisting of sensors, STIMs, and NCAPs was designed and built into a cabinet, as shown in Figure 40.15. There were a total of four STIM and NCAP network nodes, as indicated in the figure. Thermistor sensors were used for the temperature measurements. They were calibrated in the laboratory to generate an IEEE 1451.2-compliant calibration TEDS for all four STIMs and NCAPs. The thermistors were mounted on the spindle motor housing, bearings, and axis drive motors of a three-axis vertical machining center, which is shown in Figure 40.16. Since each NCAP has a built-in micro Web server, a custom Web page was constructed using the Web tool provided with the NCAP. Thus, remote monitoring of the machine's thermal condition was easily achieved via the Ethernet network and the Internet using a readily available Web browser. The daily trend chart of the temperatures of the spindle motor (top trace) and Z-axis drive motor (bottom trace) is shown in Figure 40.17. The temperature rise tracks the working of the machine during the day, and the temperature fall indicates that the machine is cooling off after the machine shop has closed.
40.8 Application of IEEE 1451-Based Sensor Network

A distributed measurement and control system can be easily implemented based on the IEEE 1451 standards [10]. An application model of IEEE 1451 is shown in Figure 40.18. Three NCAP/STIM pairs are used to illustrate distributed control, remote sensing or monitoring, and remote actuating. In the first scenario, a sensor and an actuator are connected to the STIM of NCAP 1, and application software running in the NCAP can perform a locally distributed control function, such as maintaining a constant temperature for a bath. The NCAP reports measurement data, process information, and control status to a remote monitoring station or host.
FIGURE 40.17 Temperature trend chart.
FIGURE 40.18 Application model of IEEE 1451.
This frees the host from the processor-intensive, closed-loop control operation. In the second scenario, only sensors are connected to NCAP 2, which can perform remote process or condition monitoring functions, such as monitoring the vibration level of a set of bearings in a turbine. In the third scenario, based on the broadcast data received from NCAP 2, NCAP 3 activates an alarm when the vibration level of the bearings exceeds a critical set point. As illustrated in these examples, an IEEE 1451-based sensor network can easily facilitate peer-to-peer communications and distributed control functions.
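The third scenario can be sketched as follows: the alarm node reacts to each broadcast vibration value by comparing it with a critical set point. The set point, units, and function names are invented for the illustration; they are not part of the standard.

```c
/* Sketch of the third scenario: the alarm node (NCAP 3) reacts to the
 * vibration level broadcast by the monitoring node (NCAP 2). The set
 * point, units, and function names are invented for the illustration. */
#include <stdbool.h>
#include <stdio.h>

#define VIBRATION_SET_POINT 7.1   /* mm/s RMS, assumed critical level */

static bool alarm_active = false;

/* Called whenever a broadcast vibration value arrives from NCAP 2. */
static void on_vibration_update(double mm_per_s_rms)
{
    bool exceed = mm_per_s_rms > VIBRATION_SET_POINT;
    if (exceed != alarm_active) {
        alarm_active = exceed;
        /* NCAP 3 would drive its actuator STIM channel here. */
        printf("alarm %s (%.1f mm/s)\n", exceed ? "ON" : "off", mm_per_s_rms);
    }
}

int main(void)
{
    double samples[] = { 2.3, 4.8, 7.5, 8.1, 3.0 };
    for (int i = 0; i < 5; i++)
        on_vibration_update(samples[i]);
    return 0;
}
```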
40.9 Summary

The IEEE 1451 smart transducer interface standards are defined to allow a transducer manufacturer to build transducers of various performance capabilities that are interoperable within a networking system.
The IEEE 1451 family of standards has provided the common interface and enabling technology for the connectivity of transducers to microprocessors, field networks, and instrumentation systems using wired and wireless means. The standardized TEDS allows the self-description of sensors, which turns out to be a very valuable tool for condition-based maintenance. The expanding Internet market has created a good opportunity for sensor and network manufacturers to exploit Web-based and smart sensor technologies. As a result, users will greatly benefit from many innovations and new applications.
Acknowledgments

The author sincerely thanks the IEEE 1451 working groups for the use of the materials in this paper. Through its program in smart machine tools, the Manufacturing Engineering Laboratory of the National Institute of Standards and Technology has contributed to the development of the IEEE 1451 standards.
References

[1] Amos, Kenna, Sensor market goes global, InTech: The International Journal for Measurement and Control, 40–43, 1999.
[2] Bryzek, Janusz, Summary report, in Proceedings of the IEEE/NIST First Smart Sensor Interface Standard Workshop, Gaithersburg, MD, March 1994, pp. 5–12.
[3] IEEE 1451.2, Standard for a Smart Transducer Interface for Sensors and Actuators: Transducer to Microprocessor Communication Protocols and Transducer Electronic Data Sheet (TEDS) Formats, IEEE, Piscataway, NJ, 1997.
[4] Eidson, J., Woods, S., A research prototype of a networked smart sensor system, in Proceedings of the Sensors Expo, Boston, May 1995.
[5] URL http://ieee1451.nist.gov.
[6] IEEE 1451.1, Standard for a Smart Transducer Interface for Sensors and Actuators: Network Capable Application Processor (NCAP) Information Model, IEEE, Piscataway, NJ, 1999.
[7] Warrior, Jay, IEEE-P1451 network capable application processor information model, in Proceedings of the Sensors Expo, Anaheim, April 1996, pp. 15–21.
[8] Woods, Stan, et al., IEEE-P1451.2 smart transducer interface module, in Proceedings of the Sensors Expo, Philadelphia, PA, October 1996, pp. 25–38.
[9] Lee, K.B., Gilsinn, J.D., Schneeman, R.D., Huang, H.M., First Workshop on Wireless Sensing, NISTIR 02-6823, NIST, February 2002.
[10] Lee, Kang, Schneeman, Richard, Distributed measurement and control based on the IEEE 1451 smart transducer interface standards, in Instrumentation and Measurement Technology Conference 1999, Venice, Italy, May 1999.
41 Applying IEC 61375 (Train Communication Network) to Data Communication in Electrical Substations
Hubert Kirrmann
ABB Corporate Research

41.1 Introduction
41.2 MVB
41.3 Clock Synchronization
41.4 Compensating Transmission Delays
41.5 Connecting MVBs
41.6 Conclusions
References
41.1 Introduction

One of the most demanding requirements for real-time data communication arises from protection tasks in electrical substations (Figure 41.1). Figure 41.2 shows the basic arrangement of a substation with four bays connecting four grids. The high-voltage lines are connected over a bus bar, which is a copper rod or cable. Some lines feed the bus bar and others draw energy from it, depending on the generator and load balance of the connected grids. Each bay has its own protection against overcurrent and other disturbances. To protect the bus bar itself, however, the bays must collaborate to trip all high-power circuit breakers when a short circuit affects the bus bar; a short circuit in one bay should not cause the loss of the whole bus bar. The abnormal condition is detected by sensors measuring the current of all bays (i1 … i4 in Figure 41.2). According to Kirchhoff's law, the sum of the currents over a closed surface (the dotted line in Figure 41.2) must be zero; otherwise, a short circuit exists. Building the sum of the currents requires the field devices (called PISAs) to sample the instantaneous value of the current with a very small phase difference (a few microseconds) and to transmit the value to the protection device within a few milliseconds. The circuit breakers should trip within some 30 ms of the disturbance.
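The bus bar check described above can be sketched as a simple sum of the synchronously sampled bay currents. The trip threshold and the plain (unrestrained) sum are simplifications assumed for the example; a real bus bar protection uses a restrained (percentage) differential criterion and works phase by phase.

```c
/* Illustration of the bus bar check described above: the synchronously
 * sampled bay currents should sum to (nearly) zero. The threshold and the
 * plain sum are simplifications; a real relay uses a restrained
 * (percentage) differential criterion and phase-segregated quantities. */
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_BAYS        4
#define TRIP_THRESHOLD 50.0   /* amperes, assumed for the example */

static bool busbar_fault(const double bay_current[NUM_BAYS])
{
    double residual = 0.0;
    for (int i = 0; i < NUM_BAYS; i++)
        residual += bay_current[i];        /* signed: infeed +, outfeed - */
    return fabs(residual) > TRIP_THRESHOLD;
}

int main(void)
{
    double healthy[NUM_BAYS] = { 400.0, -250.0, -100.0, -50.0 };  /* sums to 0 */
    double faulted[NUM_BAYS] = { 400.0, -250.0, -100.0, 300.0 };  /* short circuit */

    printf("healthy: trip=%d\n", busbar_fault(healthy));
    printf("faulted: trip=%d\n", busbar_fault(faulted));
    /* On a trip, the protection would command all bay circuit breakers
     * to open within some 30 ms of the disturbance. */
    return 0;
}
```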
FIGURE 41.1 Substation. (Courtesy of ABB Schweiz AG, Power Technologies, Baden, Germany.)
FIGURE 41.2 Substation layout.
41.2 MVB

The substation requirements for communication are similar to the high-speed drive requirements in railways [1]. The multifunction vehicle bus (MVB), standardized as part of the Train Communication Network (IEC 61375) [2], was designed for this type of application. It transmits process data cyclically with a period of 1 ms. At 1.5 Mbit/s, it can transmit the current samples of several bays on a single bus, using about 80 µs for each three-phase current sample. The physical medium in substations is fiber optics, due to the extreme electromagnetic fields. Fibers can cover the whole distance of a substation with no repeaters, but require a star coupler. In addition, dependability requires that the bus be duplicated. Figure 41.3 shows a typical structure of a redundant MVB layout. Protection devices can be used as concentrators for several buses. Figure 41.4 shows how MVB is used in a bay protection arrangement with redundant protection. Here, it makes sense to lay out completely separated buses to achieve full protection device redundancy.
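As a rough, illustrative budget (not a figure taken from the standard): at about 80 µs per three-phase current sample, a 1 ms basic period can carry at most 1000 µs / 80 µs ≈ 12 such samples, and in practice fewer, since part of each period must remain free for the sporadic phase and the guard time.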
41.3 Clock Synchronization

Precise synchronization for the sampling of the current and voltage values is indispensable. To that effect, a master clock, e.g., one receiving a GPS or long-wave radio signal from an atomic clock, periodically synchronizes the slave clocks in the PISAs. Figure 41.5 shows the clock synchronization scheme of the MVB, which is executed by hardware in the bus controller.
FIGURE 41.3 Redundant fiber layout in a substation: duplicated star couplers A and B with fiber links A and B, copper buses A and B serving the rack devices and PISAs, and a bus administrator with a redundant bus administrator.

FIGURE 41.4 Bay arrangement: redundant protection devices (X and Y) for feeders 1 and 2, each receiving current and voltage measurements and controlling the circuit breakers (CB) over two separate buses.
FIGURE 41.5 Clock synchronization: the bus master's periodic list includes a sync port; the master clock value is broadcast on the MVB as the sync port variable, and each bus controller raises an interrupt request toward its application processor and adjusts its slave clock.
A dedicated variable broadcasts the UTC time every second. The variable source may be one device dedicated to a clock (polled by the master) or the bus master itself, as in Figure 41.5, in which case the exact moment of sending is very precise. The reception of that variable triggers an interrupt and at the same time synchronizes the slave clocks.
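As a rough illustration of this scheme, the sketch below shows what a slave device could do when the synchronizing variable arrives: compare the broadcast UTC value with its local clock and step or slew the clock accordingly. The interrupt hook, the threshold, and the clock representation are hypothetical placeholders; in MVB this adjustment is performed in the bus controller hardware.

#include <cstdint>
#include <cstdio>

// Hypothetical local clock kept by a slave device, in microseconds.
struct SlaveClock {
    int64_t local_time_us = 0;

    // Called from the interrupt raised when the periodic sync variable
    // (carrying UTC, broadcast once per second) is received.
    void onSyncVariable(int64_t broadcast_utc_us)
    {
        int64_t offset = broadcast_utc_us - local_time_us;
        if (offset > 1000 || offset < -1000)
            local_time_us = broadcast_utc_us;   // large error: step the clock
        else
            local_time_us += offset / 2;        // small error: slew gently
        std::printf("sync: offset was %lld us\n", static_cast<long long>(offset));
    }
};

int main()
{
    SlaveClock clock;
    clock.local_time_us = 999'800;        // local clock runs slightly slow
    clock.onSyncVariable(1'000'000);      // 1 s UTC tick arrives
    clock.local_time_us += 999'850;       // roughly one second of local ticks
    clock.onSyncVariable(2'000'000);      // next tick
}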
FIGURE 41.6 Connecting MVBs: a synchronizer device between bus administrator 1 (MVB 1, holding the master clock) and bus administrator 2 (MVB 2) propagates the time to the slave clocks of the slave devices on both buses.
41.4 Compensating Transmission Delays

When the substation is large and repeaters have to be used, transmission delays must be compensated, since sampling should be synchronized within a few microseconds. Two solutions have been found:
1. Use a separate fiber for synchronization. This is the simplest and most accurate method, but it requires laying out a third fiber from the clock device to all PISAs, and it is therefore costly.
2. Calculate the bus delays and let the bus controller application-specific integrated circuit (ASIC) compensate the delays. To this effect, the receivers assume that two consecutive frames from the same source suffer the same delay. This method is similar to the one that was later standardized as IEEE 1588 [3].
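The second solution can be pictured with a small calculation. Assuming, as stated above, that two consecutive frames from the same source see the same delay, the master can send a sync frame followed by a frame carrying the time at which the sync frame was actually transmitted; a receiver that also has an estimate of the link delay can then correct its clock. The structure and names below are illustrative only and are not taken from IEC 61375 or IEEE 1588.

#include <cstdint>
#include <cstdio>

// Illustrative delay compensation (not the actual TCN/MVB ASIC logic).
struct SyncObservation {
    int64_t sync_arrival_local_us;   // local time when the sync frame arrived
    int64_t master_send_time_us;     // carried by the follow-up frame
};

int64_t clockCorrection(const SyncObservation& obs, int64_t link_delay_us)
{
    // The sync frame left the master at master_send_time_us and needed
    // link_delay_us (repeaters, fiber length) to reach us, so at the moment
    // of arrival the master clock read master_send_time_us + link_delay_us.
    int64_t master_time_at_arrival = obs.master_send_time_us + link_delay_us;
    return master_time_at_arrival - obs.sync_arrival_local_us;  // add to local clock
}

int main()
{
    SyncObservation obs{ /*sync_arrival_local_us=*/1'000'037,
                         /*master_send_time_us=*/1'000'000 };
    int64_t correction = clockCorrection(obs, /*link_delay_us=*/12);
    std::printf("apply correction of %lld us\n", static_cast<long long>(correction));
}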
41.5 Connecting MVBs

When the substation becomes so large that several MVB buses must be used, a device connected between the buses synchronizes the clocks of the different segments (Figure 41.6).
41.6 Conclusions

Although MVB was developed in the 1980s, it has, after 20 years, not yet found a successor. Variants of Industrial Ethernet have been discussed, such as IEC 61850, with commercial products expected. There was no urgent need for a next generation, since MVB solved the issues regarding real time, such as determinism, redundancy, and clock synchronization, which commercial Ethernets could not. Also, substation structures do not change, and there was little incentive to increase data rates, except for easier configuration and maintenance. Experience taught that critical devices such as PISAs must be kept very simple to meet the overall reliability goals. It is interesting to observe that the Industrial Ethernets that are proposed follow the same patterns as MVB and require similar special hardware for real-time operation.
References

[1] Hubert Kirrmann et al., The IEC/IEEE Train Communication Network, IEEE Micro, March/April 2001.
[2] IEC 61375, Train Communication Network, International Electrotechnical Commission, Geneva, 1999.
[3] Hubert Kirrmann and Georg zur Bonsen, Method of Synchronizing Devices to a Computer Bus, European Patent EP1061454.
42
SEMI Interface and Communication Standards: An Overview and Case Study

A.M. Fong, K.M. Goh, Y.G. Lim, K. Yi, and O. Tin
Singapore Institute of Manufacturing Technology
42.1 Introduction to SEMI .......................................................42-1
42.2 The SEMI Equipment Communication Standard ..........42-1
     The Four Main SEMI Communications/Control Standards • SEMI Equipment Communication Standard I • High-Speed SECS Message Services Standard • SEMI Communication Standard II • Generic Equipment Model • Limitations of SECS/GEM • Emerging and Evolving New Standards
42.3 A Survey of SECS/GEM-Compliant Tool Kits ..............42-17
     Overview of Equipment Integration Software • SEMI Equipment Communication Software • An Analysis of Current Available SECS/GEM Solutions
42.4 Conclusion .......................................................................42-23
Acknowledgments ......................................................................42-23
References ...................................................................................42-26
Web References ..........................................................................42-27
42.1 Introduction to SEMI

Semiconductor Equipment and Materials International (SEMI), founded in the U.S. in 1970, is a global industry association with a worldwide membership of 2500 companies representing semiconductor-related industries. SEMI has 17 subcommittees that include, among others, the Subcommittees for Automated Test Equipment; Environment, Health and Safety (EHS); and Information and Control. This chapter will give an overview of the fundamentals of the SEMI Equipment Communication Standard, commonly referred to as SECS, its interpretation, the available software tools, and case study applications.
42.2 The SEMI Equipment Communication Standard

The SEMI Equipment Communication Standard (SECS) defines computer-to-computer communication protocols intended to assist in the automation of electronics manufacturing facilities. SECS enables the automation, communications, and control of semiconductor process equipment, from a variety of vendors, using a set of standards and reliable protocols. These standards have the capacity to resolve/define the physical layers, signals, block semantics, etc., thus ensuring repeatable, nondeadlocked communications from simple to extended-hierarchy networks.

TABLE 42.1 SEMI Equipment Communication Standard
Protocol   Date of Launch   Defines
SECS-I     1978             Communication protocol and physical definition
SECS-II    1982             Message format
GEM        1992             Additional SECS-II sequences
HSMS       1994             Supersedes SECS-I for TCP/IP compatibility

In its basic form, SECS provides a one-to-one link between a particular unit of equipment and a host computer. The equipment can be in the form of wafer processing equipment, test and metrology equipment, die/package assembly equipment, surface mount equipment, and others. Similarly, the host computer can range from minicomputers, small dedicated desktop PCs, and laptops, to a network of embedded systems. In many cases, the host computer system manages and coordinates equipment on the factory floor via cell controllers, line controllers, and material controllers. These must be compatible with current Manufacturing Execution Systems and Manufacturing Legacy Systems (MLS).1
42.2.1 The Four Main SEMI Communications/Control Standards SEMI Equipment Communications are based on four main standards: SEMI Equipment Communication Standard I (SECS-I), SEMI Equipment Communication Standard II (SECS-II), High-Speed SECS Message Services Standard (HSMS), and Generic Equipment Model Standard (GEM), as shown in Table 42.1. Both SECS-I and HSMS define the message transport layer of the protocol. SECS-II defines the message format, and GEM defines the equipment behavior. 42.2.1.1 The Components for Executing the SECS Protocol Host systems are linked to different equipment that have to be SECS compliant (i.e., based on SECS). The connectivity to the host can be through serial or Transmission Control Protocol (TCP)/Internet Protocol (IP) ports. GEM is incorporated in some of the newer equipment. Older equipment is generally not GEM compliant, having only serial connectivity. An example of SECS Implementation Architecture is shown in Figure 42.1. 42.2.1.1.1 Host System The main function of the host system is to manage the connectivity to a group of equipment via a configuration file, recipe management, equipment load balancing, equipment monitoring, and connection to enterprise-level applications. The host systems that manage the same type of equipment are referred to as cell controllers. Similarly, the host systems that manage and coordinate different equipment are referred to as line controllers. 42.2.1.1.2 Terminal Server This is a serial hub (also sometimes called a serial device server) that allows equipment with serial port connections to connect to this hub and provides information sharing among other equipment through Ethernet-ready TCP/IP connections. 42.2.1.1.3 Equipment 1 (Non-GEM Compliant) This is normally the older semiconductor equipment that is not GEM compliant. Most of the older equipment will only provide serial connections to the host system. 42.2.1.1.4 Equipment 2 (HSMS Compliant) This is typically the newest semiconductor equipment that has point-to-point communication protocol for TCP/IP Ethernet communications. Some of the equipment is already GEM compliant for host/ equipment communications.
FIGURE 42.1 The components for the execution of the SECS protocol: an enterprise-level system above a host system, which connects through SECS-I/SECS-II over RS-232 (directly or via a terminal server on the network) and through HSMS/SECS-II over TCP/IP to the equipment. Equipment 1 is non-GEM compliant (serial SECS-I only), Equipment 2 is HSMS compliant, and Equipment 3 is GEM and HSMS compliant; TS = terminal server.
42.2.1.1.5 Equipment 3 (GEM and HSMS Compliant)
These are typically the most up-to-date models of semiconductor equipment, which are both GEM and HSMS compliant and have both serial and TCP/IP connections. Some of this equipment can even be serial port and TCP/IP capable simultaneously. Nevertheless, connectivity to the Manufacturing Legacy System usually remains a problem that cannot easily be ignored.
42.2.2 SEMI Equipment Communication Standard I

SECS-I is also called SEMI E4. It was launched in 1978 and released in 1980. It describes a transmission interface for passing messages between the equipment and a host over an RS-232 link, as shown in Figure 42.2. This standard defines point-to-point communication of data, utilizing a subset of the international standard EIA RS-232-C for the connector and voltage levels. The communication is bidirectional and asynchronous, but half duplex. The communication speed normally ranges from 9600 to 19,200 baud. The SECS-I protocol establishes multiblock transfers based on blocks of 256 bytes.
FIGURE 42.2 Three-wire serial data configuration for SECS-I (with software handshake): on the 25-pin D connectors, pin 2 (transmit data) and pin 3 (receive data) are crossed between host and equipment, pin 7 carries the signal, and pins 1, 4, 5, 6, and 8 to 25 are unused.

TABLE 42.2 Layer Definition of the SECS-I Standard
SECS-I Layer        Definitions for
Transaction layer   Primary message
Message layer       Block header
Block layer         Block format, block protocol, packet format, packet protocol
Physical layer      Physical connector, voltage, bit coding, baud rate
SECS-I includes specifications for the electrical connection between two communicating entities. It defines the message header structure, the block transfer protocol, and the message transfer protocol. The block transfer protocol is the procedure used by the serial line to establish the direction of communication and to provide the environment for passing messages. The message transfer protocol details how the block transfer protocol is used to send and receive messages. Table 42.2 shows the layer definition of SECS-I. The physical layer defines the physical connectors, electrical characteristics, bit and character coding, and baud rate. The block layer defines the block or packet format and the protocols that are used to transfer a single block. The message layer defines the block header. The transaction layer defines how primary messages and their corresponding reply messages are related.

42.2.2.1 SECS-I Message Structure
Figure 42.3 shows the structure of the SECS-I message. This message structure allows multiblock messages with one stream and function. The labels in Figure 42.3 can be described as follows:
a. The first byte represents the length of the message.
b. Bytes 0 and 1 of the header represent the device ID. The device ID identifies the equipment in the factory; there is no device ID for the host system. Bit 8 of the first byte is the reserved bit, or R-Bit, which indicates the direction of the message, i.e., host to equipment or equipment to host. If the R-Bit is 0, the message is from the host to the equipment; if it is 1, the message is from the equipment to the host.
c. Byte 2 of the header represents the stream number of the message. The stream represents the category of the messages, as defined in SECS-II.
d. Byte 3 of the header represents the function number within the stream.
FIGURE 42.3 Structure of the SECS-I message: a 1-byte length (LTH), a 10-byte header (device ID, stream, function, block number, and system bytes), 0 to 244 bytes of text, and a 2-byte checksum.
e. Bytes 4 and 5 of the header represent the block number of the message. If the message is longer than 244 bytes, it is broken into multiple blocks, and these two bytes identify the sequence of the blocks.
f. Bytes 6 to 9 of the header represent the system bytes. The purpose of the system bytes is to avoid duplication of a message.

42.2.2.2 Synchronization Mechanism
SECS-I uses software-based time-outs to handle communication synchronization between the host system and the equipment. As shown in Figure 42.4, time-outs (T1 to T4) are used in SECS-I for managing the transmission of blocks between the sender and receiver. Each time-out has a special function, as explained below:
T1 defines the intercharacter time-out in the block transfer protocol. It limits the time between characters transferred within the same block; the T1 time-out is activated if the receiver does not get the next character of a block within the defined time.
T2 defines the protocol time-out. It limits the time allowed between the following messages:
• Request to send (ENQ) and request to receive (EOT)
• EOT and the receipt of the block of characters
• Sending a block of characters and correct reception (ACK)
T3 defines the time-out for the reply message. It sets the maximum waiting time in message communications: if the expected reply is not received within the specified time, the T3 time-out is triggered.
FIGURE 42.4 Establishing synchronization via time-outs (T1 to T4) between host and equipment. ENQ = request to send; EOT = request to receive; ACK = correct reception; NAK = incorrect reception.
TABLE 42.3 Time-Outs (T1–T4) Definition
Time-Out   Description
T1         Intercharacter time-out
T2         Protocol waiting time-out
T3         Time-out for reply message
T4         Interblock communication time-out
FIGURE 42.5 Harel's notation for state models: states, transitions, default entry points, conditional selectors (C), and history connectors (H*).
T4 defines the time-out for interblock communications. It sets the time limit to receive the next block of messages within the multiblock communications. If the next block of messages is not received within the specific time, T4 will be activated. SECS-I provides a mechanism for synchronization and software deadlock resolution via four timeouts (T1 to T4), as shown in Table 42.3.
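To make the block transfer and time-out rules concrete, here is a much-simplified sender-side sketch. It assumes the conventional ASCII control codes for the handshake characters (ENQ = 0x05, EOT = 0x04, ACK = 0x06, NAK = 0x15), uses an arbitrary T2 value, and relies on a hypothetical SerialPort interface; a real implementation must also honor T1 between characters, retry limits, and the contention rules, all of which are omitted here.

#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical serial-port interface; not part of any SECS tool kit.
struct SerialPort {
    virtual ~SerialPort() = default;
    virtual void sendByte(uint8_t b) = 0;
    // Returns true and stores a byte if one arrives before the deadline.
    virtual bool receiveByte(uint8_t* b, std::chrono::milliseconds timeout) = 0;
};

namespace secs1 {
constexpr uint8_t ENQ = 0x05, EOT = 0x04, ACK = 0x06, NAK = 0x15;
constexpr auto T2 = std::chrono::milliseconds(10'000);  // protocol time-out (illustrative value)

// Checksum sketch: arithmetic sum of the header and text bytes, carried in
// the two checksum bytes after the block (the length byte is not included).
uint16_t checksum(const std::vector<uint8_t>& headerAndText)
{
    uint16_t sum = 0;
    for (uint8_t b : headerAndText) sum += b;
    return sum;
}

// Simplified block send: ENQ -> wait for EOT (T2) -> length, header + text,
// checksum -> wait for ACK (T2).  headerAndText must hold the 10-byte header
// plus up to 244 text bytes.
bool sendBlock(SerialPort& port, const std::vector<uint8_t>& headerAndText)
{
    uint8_t reply = 0;
    port.sendByte(ENQ);                                          // request to send
    if (!port.receiveByte(&reply, T2) || reply != EOT) return false;

    port.sendByte(static_cast<uint8_t>(headerAndText.size()));   // length byte
    for (uint8_t b : headerAndText) port.sendByte(b);

    uint16_t sum = checksum(headerAndText);
    port.sendByte(static_cast<uint8_t>(sum >> 8));               // checksum bytes
    port.sendByte(static_cast<uint8_t>(sum & 0xFF));

    return port.receiveByte(&reply, T2) && reply == ACK;         // correct reception
}
}  // namespace secs1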
42.2.3 High-Speed SECS Message Services Standard

With the increasing trend toward network communication for high-performance components and hardware, HSMS was introduced in 1995. HSMS (also referred to as SEMI E37) defines a communication interface suitable for the exchange of messages between computers via TCP/IP. HSMS uses TCP/IP streams to provide reliable two-way simultaneous transmission of streams of adjoining bytes. It can be used as a replacement for SECS-I communication as well as in other more advanced communication environments. It was intended as an alternative for applications where higher-speed communication is needed or when a simple point-to-point topology is insufficient. Most new SECS/GEM implementations use this standard for the message transport layer. HSMS was designed with the intention that existing SECS-II- and GEM-compliant applications could be easily converted to use HSMS instead of SECS-I. HSMS also provides the means for independent equipment manufacturers to produce their own SECS-II/GEM implementations while still maintaining interequipment compatibility.

42.2.3.1 HSMS State Model
The state model is a methodology developed by David Harel and adapted by SEMI to describe the expected functionalities in the standard. As shown in Figure 42.5, a state is a static condition of an operation, and a state can have multiple substates; the arrows represent the transitions from one state to another. In Figure 42.6, the round black dot is the start of the process. When HSMS first starts, it is in the "TCP/IP not connected" state. If the TCP/IP connection is successful, the process moves to the "TCP/IP connected" state. Within the "TCP/IP connected" state, there are two substates: "not selected" and "selected." The system can be connected on TCP/IP while HSMS has not yet been selected. When HSMS is selected, it goes into the "selected" substate and stays there until a T3 time-out occurs. When the connection terminates or selection fails, the state goes back to "TCP/IP not connected."

42.2.3.2 HSMS Message Structure
Figure 42.7 shows the structure of the HSMS message. The message structure is similar to that of SECS-I, but it is intended for TCP/IP network communication. The labels in Figure 42.7 can be described as follows:
FIGURE 42.6 State model of HSMS: from "TCP/IP not connected," a successful TCP/IP connection leads to "TCP/IP connected," which contains the "not selected" and "selected" substates; a T3 time-out, a selection failure, or connection termination returns the state to "TCP/IP not connected."

FIGURE 42.7 Structure of the HSMS message: a 4-byte length (LTH), a 10-byte header (session ID, header bytes, P-type, S-type, and system bytes), and 0 bytes to 7.9 MB of text.
a. The first through fourth bytes represent the length of the message; HSMS can handle about 8 MB of data. The length defines the total length of the header plus the actual message.
b. Bytes 0 and 1 of the header represent the session ID. The session ID is used to identify whether the communication message is a control message or subsequent data information.
c. Bytes 2 and 3 of the header represent the stream and function number, or the status code, of the message.
d. Byte 4 of the header represents the presentation type (P-type), which indicates how the message is encoded. A value of 0 means the message text is encoded as SECS-II. Values 1 to 127 are reserved for subsidiary standards. Values 128 to 255 are reserved but not used.
e. Byte 5 of the header represents the session type (S-type). A value of 0 means that the session is sending data information. Other values indicate that the message is control information.
f. Bytes 6 to 9 of the header represent the system bytes. The purpose of the system bytes is to avoid duplication of a message.

42.2.3.3 HSMS Subclassification
SEMI has subclassified HSMS into single session (HSMS-SS) and general session (HSMS-GS), as shown in Figure 42.8. HSMS-SS defines a single-session equipment connection so that the implementation does not have to consider other equipment or knowledge of other systems. It is similar to the SECS-I connection, which is a one-to-one connection; the HSMS generic definition and structure still apply. HSMS-GS is used for complex systems that have many independent pieces of equipment as subsystems. HSMS-GS defines the convention for identifying independent system entities.
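A minimal sketch of packing the 14 bytes that precede the message text (the 4-byte length and the 10-byte header fields just listed) is shown below. The multi-byte fields are written high byte first, and the wait bit of a data message is packed as the top bit of the stream byte, as is conventional; the helper name and structure are ours, not part of any HSMS tool kit.

#include <cstdint>
#include <vector>

// Illustrative HSMS framing helper.  Layout, per the description above:
// 4-byte message length (header plus text), then the 10-byte header:
//   bytes 0-1  session ID
//   bytes 2-3  stream (with wait bit) and function, for data messages
//   byte  4    P-type (0 = SECS-II encoding)
//   byte  5    S-type (0 = data message)
//   bytes 6-9  system bytes (transaction identifier)
std::vector<uint8_t> buildHsmsDataMessage(uint16_t sessionId,
                                          uint8_t stream, uint8_t function,
                                          bool expectReply, uint32_t systemBytes,
                                          const std::vector<uint8_t>& secs2Text)
{
    const uint32_t length = 10 + static_cast<uint32_t>(secs2Text.size());
    std::vector<uint8_t> out;
    out.reserve(4 + length);

    for (int shift = 24; shift >= 0; shift -= 8)               // length, high byte first
        out.push_back(static_cast<uint8_t>(length >> shift));

    out.push_back(static_cast<uint8_t>(sessionId >> 8));       // session ID
    out.push_back(static_cast<uint8_t>(sessionId & 0xFF));
    out.push_back(static_cast<uint8_t>((expectReply ? 0x80 : 0x00) | (stream & 0x7F)));
    out.push_back(function);
    out.push_back(0x00);                                       // P-type: SECS-II
    out.push_back(0x00);                                       // S-type: data message
    for (int shift = 24; shift >= 0; shift -= 8)               // system bytes
        out.push_back(static_cast<uint8_t>(systemBytes >> shift));

    out.insert(out.end(), secs2Text.begin(), secs2Text.end()); // SECS-II item data
    return out;
}

// Example: an S1,F1 "are you there" request carries no text at all.
// auto frame = buildHsmsDataMessage(0x0001, 1, 1, /*expectReply=*/true, 42, {});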
FIGURE 42.8 Subclassification of HSMS: SEMI E37 (High-Speed SECS Message Services, generic) with SEMI E37.1 (HSMS-SS, single session) and SEMI E37.2 (HSMS-GS, general session).

42.2.4 SEMI Communication Standard II

SECS-II was released in 1982. SECS-II (also referred to as E5) defines the message content passing between the equipment and the host. SECS-II specifies the format of the text and the minimal unit of semantics, but it does not specify the exact action that a recipient machine should take. With the definition of SECS-II, the host and equipment software may be constructed with minimal knowledge of each other. In addition, SECS-II allows the creation of user-specific messages.
SECS-II defines the method of conveying information between equipment and host in the form of messages. Every SECS-II message is identified by a stream and a function number. Each of the streams represents a category of service, as shown in Table 42.4. A stream contains specific messages called functions, and it also defines the request for information and the corresponding data transmission. Each combination of stream and function represents a distinct message classification. In every stream, function 0 is reserved for aborting transactions, and the other functions are constructed in pairs: as Table 42.5 shows, each odd-numbered function code is reserved for a primary message and the following even-numbered function code for its reply, or secondary message. If the primary message does not require a reply, the even-numbered message is discarded.

TABLE 42.4 Streams Defined in SECS-II
Stream   Description
1        Equipment status
2        Equipment control and diagnostics
3        Material status
4        Material control
5        Exception handling
6        Data collection
7        Process program management
8        Control program transfer
9        System errors
10       Terminal services
11       Host file services
12       Wafer mapping
13       Data set transfer
14       Object services
15       Recipe management
16       Processing management
17       Equipment control and diagnostics

TABLE 42.5 Odd/Even Message Numbers of SECS-II Functions
Stream   Function   Description
1        0          Abort transaction (S1 F0)
1        1          Are you there request
1        2          Online data
1        3          Select equipment status request
1        4          Select equipment status data
1        5          Formatted status request
1        6          Formatted status data
…        …          …
FIGURE 42.9 Data structures of SECS-I, SECS-II, and HSMS: the SECS-I/HSMS block layer carries a length (LTH), header, message data, and checksum; the SECS-II message data consist of data items, each made up of a format code (data item type plus the number of length bytes), one to three length bytes, and the data.
42.2.4.1 Data Structure
The messages sent over the physical layer, through serial (RS-232) or TCP/IP (Ethernet) connections, are broken down according to the SECS-I or HSMS definitions, yielding the length, header, and data blocks. SECS-II defines the structure of messages in terms of entities called items and lists of items. This structure allows for a self-describing data format to guarantee proper interpretation of the message. The interchange of messages is governed by a set of rules for handling messages called the transaction protocol. SECS-II further divides the data into a multiple-block structure, yielding the format code, length, and data item. All messages are packed into multiple data blocks with headers and checksums, as shown in Figure 42.9.

42.2.4.2 Streams and Functions in SECS-II
Table 42.4 shows the categories of the various SECS-II messages that are classified under the different streams. As an example, all messages in stream 1 are part of the equipment status category. This categorization helps the user and integrator in understanding, designing, and developing their SECS solutions. Each stream can specify multiple functions, as shown in Table 42.5, where an example for stream 1 is given. Function 0 is reserved for abort purposes. The remaining functions are operated upon in pairs of requests (odd function, or primary message) and replies/acknowledgments (even function, or secondary message).
The primary message is the request message sent by either the host or the equipment. All the odd-numbered functions, e.g., S1 F1, S2 F1, etc., are examples of primary messages. The user can indicate whether a reply is needed for the primary message using the wait bit (W-Bit). The W-Bit is the first bit of the header byte and is used to indicate that the sender of a primary message expects a reply. The secondary message is the reply message. All the even-numbered functions, e.g., S1 F2, S2 F2, etc., are examples of secondary messages. For any message that cannot be processed by the equipment, the appropriate error message on stream 9 is used. Upon detection of a transaction time-out, the equipment sends S9 F9 to the host. Upon receipt of function 0 as a reply to a primary message, the related transaction is terminated; no error message should be sent to the host by the equipment.
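As a concrete illustration of the item structure in Figure 42.9, the fragment below encodes an item header (format code plus one to three length bytes) followed by the data. The format-code values used (0 for a list, 0x10 for ASCII, before the two-bit shift) are the ones commonly used by SECS-II implementations; the helper itself is an illustrative sketch, not a complete codec.

#include <cstdint>
#include <string>
#include <vector>

// Minimal SECS-II item encoder following the structure in Figure 42.9:
// one format byte (item type in the upper six bits, number of length bytes
// in the lower two), then the length bytes, then the item data.
namespace secs2 {

void appendItemHeader(std::vector<uint8_t>& out, uint8_t formatCode, uint32_t length)
{
    // Choose the smallest number of length bytes (1, 2, or 3).
    uint8_t lenBytes = (length <= 0xFF) ? 1 : (length <= 0xFFFF) ? 2 : 3;
    out.push_back(static_cast<uint8_t>((formatCode << 2) | lenBytes));
    for (int i = lenBytes - 1; i >= 0; --i)
        out.push_back(static_cast<uint8_t>(length >> (8 * i)));
}

void appendAscii(std::vector<uint8_t>& out, const std::string& text)
{
    appendItemHeader(out, 0x10, static_cast<uint32_t>(text.size()));  // ASCII item
    out.insert(out.end(), text.begin(), text.end());
}

void appendListHeader(std::vector<uint8_t>& out, uint32_t elementCount)
{
    appendItemHeader(out, 0x00, elementCount);   // list length counts elements, not bytes
}

}  // namespace secs2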
42.2.4.3 Conversation Protocols
A conversation is a series of one or more related SECS-II messages used to complete a specific task. A conversation should include all transactions necessary to accomplish the task and leave both the originator and the interpreter free of resource commitments at its conclusion. The main types of conversations are discussed below.
42.2.4.3.1 Primary Message with No Reply
When the W-Bit is set to zero, the primary message does not require a reply. This is the simplest conversation: the primary message sends a single block of information without reply. It is used where the originator can do nothing if the message is rejected.
42.2.4.3.2 Primary Message with Data Return (Request/Data Conversation)
The originator needs to get information from the interpreter. The W-Bit has to be set to 1, and the data are returned as a secondary reply message.
42.2.4.3.3 Send/Acknowledge Conversation
The originator sends data in a single block and expects an acknowledgment.
42.2.4.3.4 Inquire, Grant, Send, and Acknowledge Conversation
The originator must ask for permission from the interpreter. The interpreter grants permission and also sets the time-out for getting the information. If the wait time is longer than the time-out, S9 F13 is sent out. Once the information is received, the acknowledgment is sent. This type of conversation is commonly used in equipment-to-host conversations, and only the equipment should send error messages to the host.
42.2.4.3.5 Unformatted Data Set Conversation
Stream 13 is used for unformatted data conversations. Unformatted data could be in the form of a bitmap file or a binary file.
42.2.4.3.6 Material Handling Conversation
Stream 4 is used for material control and the handling of material between equipment.
42.2.4.3.7 Conversation with Delay
The originator may request information from the interpreter that requires some time to obtain. The interpreter can collect the information immediately and return it, or it can indicate that the information has to be obtained and returned in a subsequent transaction.
Messages can be transferred as single-block messages or multiblock messages. The SECS-I maximum message length for a single block is 244 bytes. The maximum message length for HSMS is 7.9 Mbytes. For longer messages, SECS-II has to use multiblock messages: messages that are longer than 244 bytes have to be sent as multiple blocks, or packets. Each packet has a block number so that the receiver is able to join all the blocks together.

42.2.4.4 Sample of SECS-II Message
42.2.4.4.1 Stream 1 Function 1
S1, F1 Are You There Request (R) (a)
S, H <> E, Reply (b)
Description (c): To establish if the equipment is online. A function 0 response to this message means the communication is inoperative. In the equipment, a function 0 is equivalent to a time-out of the receive timer after issuing S1, F1 to the host.
Structure: Header only (d)
The above example is from the SEMI standard E5.

S1, F2 Online Data (D)
S, H <> E
Description: Data signifying that the equipment is alive.
Structure (e):
L, 2
  1. <MDLN>
  2. <SOFTREV>
Exception: The host sends a zero-length list to the equipment.

(a) Shows the stream (S), function (F), and title of the function (R).
(b) Shows the type of message. S represents a single-block message; a multiblock message is represented by M. H <> E means the message can be sent from either the equipment (E) or the host (H); other alternatives are E > H (equipment sends to host) and H > E (host sends to equipment). The message requires a reply; other alternatives are "optional to reply" or blank, which means no reply is needed.
(c) The description of the message.
(d) Shows the structure of the message. In this case, it is just the header of the message and there are no data.
(e) Shows the structure of S1, F2. L, 2 means that it has two listed items. The first listed item is <MDLN>, which represents the equipment model type, with a maximum of six bytes in length. The second item is <SOFTREV>, which represents the software revision, also with a maximum of six bytes.

As shown in Table 42.6, SECS-II only uses streams 1 to 14 and functions 1 to 63. In order to provide for expansion of features, streams 15 to 63 are reserved for future SEMI standards. Streams 64 to 127 and functions 64 to 255 are available for custom messages. While SECS-II makes room for vendor-independent compatibility, problems arise when a vendor applies different semantics to the same message pairs. This creates intervendor incompatibility. GEM was created to define generic equipment features so that intervendor compatibility can be established.
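Continuing the encoding sketch from Section 42.2.4.1, the S1, F2 body above, a two-element list whose items are the model name and software revision, would serialize to the bytes below. The model and revision strings are made-up placeholder values, and the byte layout follows the common SECS-II item conventions rather than any particular vendor's tool kit.

#include <cstdint>
#include <vector>

// SECS-II text of an S1,F2 reply: L,2 { <MDLN> <SOFTREV> }.
// "EQP001" and "REV01A" are hypothetical six-character values.
const std::vector<uint8_t> s1f2Body = {
    0x01, 0x02,                                // L,2  (list, 1 length byte, 2 elements)
    0x41, 0x06, 'E', 'Q', 'P', '0', '0', '1',  // <A "EQP001">  = MDLN
    0x41, 0x06, 'R', 'E', 'V', '0', '1', 'A',  // <A "REV01A">  = SOFTREV
};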
42.2.5 Generic Equipment Model

The Generic Equipment Model Standard (also referred to as E30) was published in 1992. It defines which SECS-II messages should be used, in which situation, and what the resulting activity should be. GEM is intended to specify the following:
• A behavior model of semiconductor equipment in a SECS-II communication environment
• A description of information and control functions needed in a semiconductor manufacturing environment
• A definition of the basic SECS-II communication capabilities

TABLE 42.6 Status of Streams and Functions (Reserved/Unreserved)
Stream     Status
1–14       Currently defined in the standard
15–63      Reserved for standard messages
64–127     Reserved for user-defined messages

Function   Status
1–63       Reserved for standard messages
64–255     Reserved for user-defined messages

FIGURE 42.10 GEM requirements. Fundamental GEM requirements (basic): state models, equipment processing states, host-initiated connection (S1, F13/F14), event notification, error messages, documentation, and control (operator initiated). Additional GEM capabilities (advanced): establish communication, dynamic event report configuration, variable data collection, trace data collection, status data collection, alarm management, remote control, equipment constants, process program management, material movement, equipment terminal services, clock, limit monitoring, spooling, and control (host initiated).
• A single consistent means of accomplishing an action when SECS-II provides multiple possible methods
• Standard message dialogues that are necessary to achieve useful communication

The GEM standard contains two types of requirements:
• Fundamental (or foundation) requirements
• Requirements for additional GEM capabilities

As shown in Figure 42.10, the fundamental GEM requirements form the foundation of the GEM standard; they are basic requirements for all types of semiconductor equipment. The additional GEM capabilities provide the functionality required for some instances of factory automation, or functionality that is applicable to specific types of equipment. Capabilities are operations performed by semiconductor manufacturing equipment. Each capability consists of a statement of purpose, pertinent definitions, a detailed description, requirements, and various scenarios. A scenario is a group of SECS-II messages arranged in a sequence to perform a function; scenarios handle communications that involve more than one pair of messages.

42.2.5.1 GEM Fundamental Requirements
42.2.5.1.1 State Models
State models describe the behavior of the equipment from a host perspective. The same model can be used in different equipment for identical functions. Fundamental GEM specifies the state models of process (equipment specific), communication, alarm, control, material movement, and spooling. Figure 42.11 shows a sample state model for equipment processing, with step numbers defined in the diagram. The state model can be interpreted as follows:
Step 1: The process is initialized (INIT). After being initialized, the process can be in the "idle" state.
Step 2: The next state is "processing active." It will be in the "process" substate, performing a "setup" operation. The setup operation can be semimanual, as the operator needs to log on and fine-tune the equipment.
Step 3: After setup, the equipment is in the "ready" state for operation.
Step 4: In normal processing, the equipment should be either in the "executing" state, producing a part or component, or in step 5.
Step 5: The equipment is idle, waiting for the next part or component.
FIGURE 42.11 Equipment processing states (processing state diagram): INIT, IDLE, and the PROCESSING ACTIVE state with its PROCESS substates SETUP, READY, and EXECUTING, plus a PAUSE state; the numbered transitions (1 to 10) are described in the text.

TABLE 42.7 Example of GEM Transactions for Event-Reporting Setup
Step 1: Host → Equipment: Send report definitions (S2, F33); Equipment → Host: Send acknowledgment (S2, F34)
Step 2: Host → Equipment: Send event and report link definitions (S2, F35); Equipment → Host: Send acknowledgment (S2, F36)
Step 3: Host → Equipment: Send event-enabling information (S2, F37); Equipment → Host: Send acknowledgment (S2, F38)
Step 6: The equipment could be in "processing active" and receive a "STOP" command from the operator or host system. This puts the equipment back into the "idle" state.
Step 7: The equipment could be in "processing active" and receive an "ABORT" command from the operator or host system. This also puts the equipment back into the "idle" state.
Step 8: The equipment may receive an alarm, and it will then be in the "pause" state.
Step 9: The equipment may receive a pause command from the operator or host system.
Step 10: The equipment may receive a resume command from the operator or host system.

An example of the GEM transactions for event-reporting setup is illustrated in Table 42.7 (a host-side code sketch follows the list below).
Step 1: The host defines the reports and requests them.
• The host initiates a report definition by sending S2, F33. Stream 2 is the equipment control and diagnostics stream, and function 33 is "define report." This message allows the host system to define and request the group of reports available from the equipment. A unique ID identifies each report.
• The equipment receives and acknowledges by replying with S2, F34. If the primary message has errors or the report ID is not found, the primary message is rejected.
Step 2: The host sets an ID for the required reports.
• The host initiates an "event and report" link definition by sending S2, F35. Function 35 links only the reports that are requested by the host, with an ID. The link is disabled until the equipment is ready to send.
• The equipment receives and acknowledges by replying with S2, F36. Again, if there is any error condition, the whole message is rejected.
Step 3: The host enables the sending of the reports.
• The host sends the event-enabling definition with S2, F37. At this stage, the equipment knows which reports the host requires, and the host enables the sending process.
• The equipment receives and acknowledges by replying with S2, F38. After the equipment acknowledges the message from the host system, it will then proceed to send the reports to the host system.
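The three transactions in Table 42.7 can be summarized in host-side pseudo-code. The SecsDriver interface below is a stand-in for whatever SECS/GEM tool kit is actually used (it is not a real API), and the message bodies are assumed to have been encoded separately, for example with the item encoder sketched earlier.

#include <cstdint>
#include <vector>

// Hypothetical stand-in for a SECS/GEM host tool kit; not a real API.
struct SecsDriver {
    virtual ~SecsDriver() = default;
    // Sends primary message Sx,Fy with the given SECS-II body, waits for the
    // paired even-numbered reply, and returns true if it was accepted.
    virtual bool transact(int stream, int function,
                          const std::vector<uint8_t>& body) = 0;
};

// Host-side event-report setup following Table 42.7.
bool setUpEventReports(SecsDriver& host,
                       const std::vector<uint8_t>& defineReportBody,   // for S2,F33
                       const std::vector<uint8_t>& linkEventsBody,     // for S2,F35
                       const std::vector<uint8_t>& enableEventsBody)   // for S2,F37
{
    // Step 1: define the reports (equipment acknowledges with S2,F34).
    if (!host.transact(2, 33, defineReportBody)) return false;

    // Step 2: link each collection event to its reports (ack: S2,F36).
    if (!host.transact(2, 35, linkEventsBody)) return false;

    // Step 3: enable the events (ack: S2,F38); afterwards the equipment
    // sends S6,F11 reports, which the host acknowledges with S6,F12.
    return host.transact(2, 37, enableEventsBody);
}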
42.2.5.1.2 Host Initiates S1, F13; S1, F14
It is a fundamental requirement in GEM that the equipment be able to accept a connect request from the host. The host can request a connection by sending S1, F13 to the equipment. Upon receipt of the message, the equipment sends the S1, F14 "connect acknowledge" reply.
42.2.5.1.3 Event Notification
This capability provides data to the host at specified points during equipment operation. The equipment sends S6, F5 to get permission from the host. The host replies with S6, F6 to grant permission. Once the permission is granted, the equipment sends the report using S6, F11, and the host acknowledges the report using S6, F12.
42.2.5.1.4 Online Identification
Online identification is the most basic SEMI requirement. The originator (which could be the host or the equipment) sends S1, F1 to ask if the interpreter (which could be the equipment or the host) is there. The interpreter then replies using S1, F2, which carries the information on the equipment type and revision.
42.2.5.1.5 Error Messages
Fundamental GEM requires the system to be able to detect communication link errors in SECS-I and to detect SECS-II and GEM format errors. It also needs to support all stream 9 messages and provide appropriate error-handling routines.
42.2.5.1.6 Documentation
Fundamental GEM requires that the equipment be able to provide a detailed specification document for the SECS–GEM interface on the equipment.
42.2.5.1.7 Operator-Initiated Control
Fundamental GEM requires that the equipment have operator-initiated control-related capabilities to allow the configuration and manipulation of the control state model. In this way, the host or user may modify the equipment's control-related behavior. The operator at the equipment sends S1, F1 to the host. The host sends S2, F18 to grant an online link to the equipment. The equipment sends a control state local event report using S6, F11, and the host acknowledges the report with S6, F12.

42.2.5.2 Additional GEM Requirements
"Additional GEM" requires that the system be able to support the communication state model, operator communication state display, operator enable and disable commands, enabled or disabled power-up states, and establish-communications timer equipment constants.
42.2.5.2.1 Establish Communications
The "establish communications" capability provides a means of formally establishing communications upon system initialization or upon any loss of communication between communication partners. Notification of the period of noncommunication is then possible. The host system initiates communication by sending S1, F13, and the equipment replies using S1, F14.
42.2.5.2.2 Event Notification
This capability provides data to the host at specified points during equipment operation. The equipment asks the host for permission to send by using the S6, F5 message, and the host gives permission using the S6, F6 message. Once the permission is given, the equipment sends the event report using S6, F11, and the host acknowledges the report using S6, F12.
42.2.5.2.3 Dynamic Event Report Configuration
This capability provides the data-reporting flexibility required in some manufacturing environments. The host sends S2, F39 to request permission from the equipment and gets the permission from the equipment with S2, F40. Having received the permission, the host sends S2, F33 to define the report requirements, and the equipment acknowledges the report requirements with S2, F34. The host sends S2, F35 to collect the report from the equipment, and the equipment then sends the report using S2, F36.
42.2.5.2.4 Variable Data Collection
This capability allows the host to query for the equipment data variables and is useful during initialization and synchronization. The host sends S2, F19 to ask for the variable data, and the equipment sends the data using S6, F20.
42.2.5.2.5 Trace Data Collection
Trace data collection provides a method of sampling data on a periodic basis. The host initiates the trace of certain data using S2, F23. The equipment initializes the trace data and acknowledges using S2, F24.
42.2.5.2.6 Status Data Collection
This capability allows the host to query the equipment for selected information and is useful in synchronizing with the equipment status. The host sends S1, F11 to request the variable equipment status, and the equipment replies with the variable status using S1, F12.
42.2.5.2.7 Alarm Management
This capability provides for host notification and management of alarm conditions occurring on the equipment. The host asks for alarm data and text using S5, F5, and the equipment sends the alarm data to the host using S5, F6.
42.2.5.2.8 Remote Control
This capability provides the host with a level of control over equipment operations. The host sends the remote command using S2, F41. The equipment acknowledges using S2, F42 and sends the data report using S6, F11. The host acknowledges the report using S6, F12.
42.2.5.2.9 Equipment Constants
This capability provides a method for the host to read and change the value of selected equipment constants on the equipment. These equipment constants include nonvolatile storage, validation/verification of equipment constants, host equipment constants name, list request messages, etc. The host sends a new equipment constant using S2, F15, and the equipment acknowledges the equipment constant using S2, F16. The host can also send S2, F13 to request equipment constants, and the equipment sends the equipment constants using S2, F14.
42.2.5.2.10 Process Program Management
This capability provides a means to transfer process programs and to share the management of those process programs between the host and the equipment. The host asks for the equipment process program using S7, F19, and the equipment sends the process program data using S7, F20. To upload the process program, the host sends S7, F5 and the equipment sends the process program using S7, F6. To download the process program, the host sends S7, F1 and the equipment grants permission using S7, F2. Once permission is received, the host sends the process program using S7, F3 and the equipment acknowledges with S7, F4.
42.2.5.2.11 Material Movement
This capability includes the physical transfer of material among equipment, buffers, and storage facilities. The transfer of material can be performed by the operator, AGV (autonomous guided vehicle) robots, tracks, or dedicated fixed material-handling equipment. The equipment sends a collection event report using S6, F11, and the host acknowledges with S6, F12.
42.2.5.2.11.1 Equipment Terminal Services
This capability allows the host to display information on the equipment's display device, or the operator of the equipment to send information to the host. The host sends textual information to the equipment using S10, F3, and the equipment displays the information and acknowledges with S10, F4.
42.2.5.2.12 Clock
The clock capability enables host management of time-related activities and occurrences associated with the equipment and across multiple pieces of equipment. The equipment sends S2, F17 to request time information from the host, and the host responds with a time value using S2, F18.
42.2.5.2.13 Limits Monitoring
This capability relates to the monitoring of selected equipment variables. The host gets the equipment's current limit-monitoring status using S2, F47, and the equipment sends the status using S2, F48.
42.2.5.2.14 Spooling
Spooling is a capability whereby the equipment can queue a message intended for the host during times of communication failure and subsequently deliver this message when communication is restored. The host defines the messages that need spooling in case of communication failure using S2, F43. The equipment resets the spooling message and acknowledges with S2, F44.
42.2.5.2.15 Host-Initiated Control
The control-related capabilities allow for configuration and manipulation of the control state model. In this way, the host or user may modify the equipment's control-related behavior. The host requests the equipment to go offline using S1, F15. The equipment uses S1, F0 to abort the transaction, sends S1, F16 to acknowledge, and uses S6, F11 to send the event report. The host acknowledges with S6, F12.
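To show how one of these capability scenarios looks on the equipment side, the sketch below handles a host remote command (Section 42.2.5.2.8): it acknowledges the S2, F41, performs the action, and reports the resulting state change as a collection event. The interfaces, command names, and acknowledge codes are simplified placeholders for whatever tool kit is actually used, not a real API.

#include <cstdint>
#include <string>

// Invented equipment-side interfaces for illustration only.
struct EquipmentControl {
    virtual ~EquipmentControl() = default;
    virtual bool execute(const std::string& remoteCommand) = 0;  // e.g. "START", "STOP"
};

struct GemLink {
    virtual ~GemLink() = default;
    // Reply to the pending S2,F41 with S2,F42 and the given acknowledge code.
    virtual void sendS2F42(uint8_t ackCode) = 0;
    // Send an S6,F11 event report; the host answers with S6,F12.
    virtual void sendS6F11(uint32_t collectionEventId) = 0;
};

// Called when an S2,F41 host command message arrives.
void onRemoteCommand(const std::string& command, uint32_t stateChangeEventId,
                     EquipmentControl& control, GemLink& gem)
{
    if (!control.execute(command)) {
        gem.sendS2F42(/*ackCode=*/1);        // command rejected (illustrative code)
        return;
    }
    gem.sendS2F42(/*ackCode=*/0);            // command accepted (illustrative code)
    gem.sendS6F11(stateChangeEventId);       // report the resulting state change
}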
42.2.6 Limitations of SECS/GEM

The limitations faced by the currently defined SECS/GEM standards are:
a. Manual integration cannot be totally avoided. The factory automation system cannot query SECS/GEM software to determine its full capabilities, because different vendors have customized the same message pairs for different applications. Therefore, SECS requires significant manual intervention/customization effort for each application function.
b. Weak security mechanism. SECS is weak in its security mechanism. There is no concept of client authentication or access permission in SECS. Issues of security are becoming important operational considerations because of the need for remote equipment engineering and remote diagnostics.
c. Single-client architecture. The implementation of SECS is based on a point-to-point link, and there is only one software process (i.e., one point) on the factory side. This is a critical limitation, because newly developed factory applications such as advanced process control (APC), factory automation, and advanced planning and scheduling (APS) systems require a steady stream of accurate real-time data, even in the form of peer-to-peer communications, to facilitate a decision support process.
42.2.7 Emerging and Evolving New Standards

Industry associations such as SEMI and SEMATECH, as well as universities, have tried to develop a new standard that can handle the various applications needed by today's semiconductor manufacturing environment. SEMI communication and interface standards have adopted popular technologies such as object-oriented programming, the Extensible Markup Language (XML), and distributed computing techniques, but these are still not very commonly used. Table 42.8 shows the trend in the SEMI communication standards.
TABLE 42.8 Trend of the SEMI Standard

First Generation
Characteristics: Simple text-based reporting
Standards: SECS-I, de facto, proprietary
Issues: Limited solution opportunities; highest integration cost; longest development

Second Generation
Characteristics: Some recipe management; some equipment performance matrices; low standardization
Standards: SECS-I/SECS-II, GEM, HSMS
Technology: Serial, TCP/IP
Issues: Highest integration cost; long lead time; semiconductor-specific interfaces

Third Generation
Characteristics: Full recipe management; equipment performance matrices; SEMI standardization
Standards: Object-based standards
Technology: Serial, TCP/IP

Next Generation
Characteristics: Intelligent control; remote diagnostics/monitoring
Standards: Mainstream distributed computing standards
Technology: Middleware
Issues: Non-semiconductor-specific PC-to-PC communication; lower integration cost; object-oriented approach
42.3 A Survey of SECS/GEM-Compliant Tool Kits

SECS/GEM-compliant software comes in different layers. In this section, a survey of the tool kits available on the market is presented together with their features. The features provided by each vendor are summarized in Table 42.9 to highlight their respective capabilities.
42.3.1 Overview of Equipment Integration Software

Equipment integration software is an important part of semiconductor manufacturing or wafer fabrication. The factory automation software must accommodate various factory applications, such as equipment configuration, online and remote diagnostics, remote control, and equipment simulation. These applications are essential functionalities in today's semiconductor wafer fabrication. Most software vendors would like to provide solutions across the span of process control, system automation, production data acquisition, and communication systems. The factory automation software can be divided into two levels: the application level and the communication-enabling level. The function of the application level is to carry out the various applications, such as remote diagnostics, data collection, tracking of materials, testing and simulation, and more. The function of the communication enabler is to interface the factory networks or host with the equipment, which may be SECS/GEM-compatible or non-SECS/GEM-compatible equipment, such as programmable logic controller (PLC)-controlled machines. Figure 42.12 gives an overview of the factory automation software. The software tool kits come with different features and functionalities. They can be conveniently classified as tool kits for manufacturing execution system (MES) applications, factory automation software/solutions, and SECS/GEM drivers.
TABLE 42.9 The Survey of the Available Software. (The comparison covers support for SECS-I, SECS-II, HSMS, GEM, and OBEM, the operating system, language, API, recipe management, ARAM, testing tools, middleware, user interface, and GEM-to-XML features. Vendors surveyed include Yokogawa, CIMETRIX, ABAKUS, Kinesys, ErgoTech, SDI, and ALPS; products listed include EQBrain, EQBrain 300, and CellBrain (marked as having an auto code generator), CIM Connect, GEM Host Manager, AHEAD ActiveX GEM Interface, SCI Spy, GEMBox, Tran SECS, JAVA SECS, sdiRelayer, EDAGateway, sdiStation, Host Developer, SMS Developer, TESTConnect, CIMConnect, and CIM300. The listed technologies include Win32, Windows, UNIX, and Linux platforms; C++, C, VC++, VB, .NET, Java, and JavaBeans; and DLL, ActiveX/COM, DCOM, CORBA, TIBCO, SOAP, XML, ACE, and TAO.)
FIGURE 42.12 Application and communication layers of factory automation software. Application layer: MES applications, online and remote diagnostics, remote control of equipment, configuration, and equipment simulation, built on factory automation software/solutions and a SECS/GEM driver. Communication layer: the SECS/GEM interface and control logic of the equipment.
42.3.2 SEMI Equipment Communication Software

Some of the commercially available factory automation software is described in this section. These products are described from company-provided information. The survey is based on several important features listed in Table 42.9. A few software vendors provide the complete solution, but most provide solutions for the application level. They generally use the communication-enabler component from prominent players in the market, such as Cimetrix, in conjunction with their application package.
42.3.3 An Analysis of Current Available SECS/GEM Solutions

An analysis of the SECS/GEM solutions is given below, based on the features of the software and on the intended application.

42.3.3.1 The Analysis Based on the Features of the Software
The analysis of the currently available solutions is consolidated from public information provided by the companies. Feedback from vendors is also used to compile this analysis.

42.3.3.2 An Analysis Based on the Intended Application
Because SECS/GEM communication software tends to be tailored to a variety of applications, it is necessary to map such applications to the various communication levels, as shown in Figure 42.13 and Table 42.10. For example, the software can be used by the equipment manufacturer for developing the host interface, just as it can be used by the semiconductor manufacturer to communicate with the equipment, to test the equipment's interface, etc. Figure 42.13 shows one example that highlights the application of the SECS/GEM software, and Table 42.10 compares the application intent of the software.

FIGURE 42.13 The functionality of SECS/GEM software: on the host side, level I is the host SECS/GEM driver, level II comprises tools such as application enablers, and level III comprises middleware applications (e.g., CORBA, DCOM); on the equipment side, level I is the SECS/GEM equipment interface, level II comprises tools such as the control logic, and level III is real-time middleware that facilitates queries by external applications.

42.3.3.3 Case Study: Design and Implementation of SECS/GEM for a New Semiconductor Back-End Machine (Implementation of SECS/GEM Compliance on a Virgin Machine)
This section presents a case study in the design and implementation of SECS and GEM for a case where a back-end semiconductor machine is to be built with SECS/GEM compliance. We shall see how easy
it is to implement SECS-II and GEM for a PC-based equipment controller by leveraging commercially available software. We have chosen a package from GW, Inc. We will show the steps toward the implementation of SECS/GEM for the case where the equipment initiates the communication with the host system.
a. A Statement of the Specifications and Requirements
The equipment maker is required to develop GEM-compliant equipment. This equipment is required to be directly integrated into an existing SECS-II-enabled host (cell control) system. In detail, the equipment must be able to:
1. Connect to the host system over the network with both the SECS-II and HSMS protocols
2. Upload and download process programs for equipment configuration
3. React to the host's requests, based on the process state of the equipment
4. Send event or alarm reports to the host system when predefined events or alarms occur
5. Time-synchronize with the host system on equipment power-up
6. Implement the GEM fundamental requirements
b. System Architecture and Software Tools
Figure 42.14 shows the recommended system architecture and communication interfaces. The SECS/GEM software package from GW, Inc., is utilized in this example. It includes:
• GCD file
• GWGEM daemon
• GWGEM extension task
• GWGEM primary message handler task
• GWGEM Application Programming Interface (API) in C++
As mentioned earlier, in order to run the GEM application that is built on the above five major GWGEM components, the SDR (SECS Driver) must be installed and run as a message driver (HSMS) to communicate with the host system.
TABLE 42.10 A Survey of the Software Based on Communication Levels. (The surveyed software is mapped to host levels I to III and equipment levels I to III; entries include EQBrain, EQBrain 300, CellBrain, CIM Connect, GEM Host Manager, GEMBox, Tran SECS, JAVA SECS, sdiRelayer, EDAGateway, sdiStation, Host Developer, SMS Developer, AHEAD ActiveX GEM Interface, SCI Spy, and resale/integration of Cimetrix TESTConnect, CIMConnect, and CIM300, from vendors such as Yokogawa, CIMETRIX, Kinesys, ErgoTech, SDI, and ABAKUS.)
FIGURE 42.14 System architecture for SEMI interface creation: user GUIs, a database, the equipment control logic, and Windows message handling built on the GW C++ APIs, the message handler, the GW extension, and the GW daemon, together with the GCD file, I/O threads, the GW SDR driver, and the hardware.
c. Functional Mapping of Modules The GWGEM daemon process coordinates all the communications between user applications, GWGEM extension, and GWGEM message handler tasks. The GCD (GEM configuration data) file defines all SECS variables such as status variables, equipment constants, and data variables. The extension task is a process that contains various GWGEM extension routines that are called by the GWGEM daemon process from time to time. Extension routines are user-written application programs that provide application-specific or equipment-specific handling of GEM messages. d. Message Partition A primary message handler task is used to process a particular type of incoming SECS primary message from the host system. It must provide for the following conditions: • Messages that are beyond the GEM standard set • Messages that vary from one equipment type to another • The ability to override GWGEM built-in handling of a particular incoming primary message • The ability to scan (peek at) an incoming SECS-II message without actually processing it e. Levels of Implementation In this project, the following GEM fundamental requirements are implemented: 1. State model 2. Equipment processing status 3. Host-initiated S1F13/F14 scenario 4. Event notification 5. Online identification 6. Error message 7. Documentation 8. Control (operator initiated) By implementing the GEM fundamental requirements, the equipment is GEM compliant, which means that it is able to communicate with any GEM-compliant equipment no matter which hardware vendor it comes from. In addition to the above, some additional GEM capabilities are also implemented. These include: 1. Initialization and establishment of communications 2. Alarm management 3. Equipment terminal services
4. Process program management
5. Clock
GWGEM C++ encapsulates the GEM requirements in several classes, so the developer treats the GEM functionalities as objects. GWGEM C++ for Windows is packaged as a Windows dynamic link library (DLL) and takes advantage of Windows programming facilities for Win32-based applications, such as the Microsoft Foundation Classes (MFC). GWGEM C++ uses the Windows event notification and message services to notify or trigger the Win32 application to handle incoming SECS-II primary messages.
f. Integrate GWGEM C++ with an MFC-Based Equipment Controller
Normally, the development team for GEM-compliant semiconductor equipment is made up of mechanical engineers, control engineers, and SECS/GEM consultants. The first step for the consultant is to work with the mechanical and control engineers to collect equipment, status, event, and alarm data. The consultant then translates these data, which describe the equipment's operation, into variables defined in the GCD file (see the GCD file sample in Figure 42.15a). Once the control engineer completes the code, including the graphical user interface (GUI) portions and the I/O control logic, it is time to integrate GWGEM C++ with the control program. To improve real-time performance, GWGEM C++ maps events such as equipment state changes, process state changes, or the arrival of SECS-II primary messages onto Windows message handlers, so that these events are handled concurrently without performance trade-offs (a sketch of one such mapping appears at the end of this section). Owing to limited space, the code samples in Figure 42.15 show only the necessary steps for the declaration, initialization, and deletion of GWGEM objects in an MFC-based application, based on the hierarchy of messages in Figure 42.16.
42.3.3.3.1 Code Samples
Using the GCD compiler, the equipment constants in the GCD file are converted into a standard C++ header file that is recognized by standard C++ compilers. Note that these header files are provided with the GWGEM C++ package for the Windows platform. With GWGEM C++ for Windows, there is no need for the developer to code the low-level communication details: the GWGEM daemon and its supporting components, such as the GWGEM extension tasks, process the incoming SECS-II messages. As a result, the development cycle is significantly shortened and the number of software bugs is reduced. Typically, after two months of development and testing, the equipment passes the GEM tests and is declared fit for delivery.
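As a rough illustration of the event-to-message mapping just described, the fragment below shows one way a GWGEM control-state notification could be routed through the Windows message queue into an MFC dialog. It is a sketch under stated assumptions rather than vendor code: WM_GEM_CONTROL_STATE, CEquipmentDlg, and IDS_CONTROLSTATE are hypothetical names, and the GemConfControlStateChange declaration is only a stand-in mirroring the fields used in Figure 42.15(d).

/* Sketch only: hypothetical MFC routing of a GWGEM control-state notification.
   Only OnGemConfControlStateChange and the NewControlState/ControlStateName
   fields follow the sample in Figure 42.15(d); everything else is illustrative. */
#include <afxwin.h>                           /* MFC core classes */

#define WM_GEM_CONTROL_STATE  (WM_APP + 1)    /* hypothetical user-defined message ID */
#define IDS_CONTROLSTATE      1001            /* hypothetical dialog control ID */

struct GemConfControlStateChange {            /* stand-in for the GWGEM structure */
    int     NewControlState;                  /* numeric control state */
    CString ControlStateName;                 /* display name of the state */
};

class CEquipmentDlg : public CDialog {
public:
    afx_msg LRESULT OnGemConfControlStateChange(WPARAM wParam, LPARAM lParam);
    DECLARE_MESSAGE_MAP()
};

BEGIN_MESSAGE_MAP(CEquipmentDlg, CDialog)
    ON_MESSAGE(WM_GEM_CONTROL_STATE, &CEquipmentDlg::OnGemConfControlStateChange)
END_MESSAGE_MAP()

/* Runs on the GUI thread when the notification is posted, so SECS/GEM traffic
   is handled asynchronously and does not block the equipment control loop. */
LRESULT CEquipmentDlg::OnGemConfControlStateChange(WPARAM /*wParam*/, LPARAM lParam)
{
    GemConfControlStateChange* pChange =
        reinterpret_cast<GemConfControlStateChange*>(lParam);
    SetDlgItemText(IDS_CONTROLSTATE, pChange->ControlStateName);   /* update the GUI */
    delete pChange;                                                /* release the notification */
    return 0;
}

The actual notification mechanism and the structure passed in lParam should be taken from the GWGEM C++ documentation; the point here is simply that GEM events arrive as ordinary Windows messages and can therefore be handled with the normal MFC message map.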
42.4 Conclusion
This chapter has given a brief overview of the SEMI SECS-I, SECS-II, and GEM standards. While keeping to the special (and at times strange) nomenclature and conventions adopted by SEMI, the authors have provided ample comments and interpretations of the pertinent points to enable the reader to understand the complexity and integrity of SECS equipment. A survey of existing SECS-compliant software has also been presented. A case study illustrates how SECS/GEM compliance can be implemented on virgin machines. Presently, there are no systematic alternatives to the SEMI standards. Although the SEMI communication standards are still evolving, they remain critical as a means of sustaining maximum functionality and availability in the face of new semiconductor processes and new semiconductor manufacturing automation.
Acknowledgments The authors express their sincere thanks to the following people for various technical discussions and suggestions leading toward the completion of this chapter: Dr. Lim Khiang Wee, Executive Director, SIMTech; Phua Geok Hong; William Tan; and Prof. Sim Siang Kok.
(a)
/* GWGEMCPP Base-line GCD Source File */
Constant fixup GemOfflineSubstate =       /* data type is 1-byte unsigned integer */
    vid = 43                              /* variable ID is 43 */
    name = "OFFLINESUBSTATE"              /* 1: offline/equipment offline */
    units = ""                            /* no unit */
    min = 1                               /* minimum value is 1 */
    max = 3                               /* maximum value is 3 */
    default = 1                           /* default value is 1 */
/* event names */
event ProcessRunningEvent
    ceid = 122                            /* constant event ID is 122 */
event ProcessAbortEvent
    ceid = 123
status AramsStateVariable =               /* character array with null value */
    vid = 500                             /* variable ID is 500 */
    name = "AramsStateVariable"           /* status name */
    units = ""
(b)
#include "gwgemcpptemplate.h"             /* template classes */
#include "gemcppbase.h"                   /* base class definitions */
#include "GemVirtImp.h"                   /* virtual classes for application-specific overrides */
#include "GemMessages.h"                  /* SECS-II message header structures, etc. */
#include "gwcontrol.h"                    /* equipment control state */
(c)
/* This is a sample program to be used in conjunction with the GWGEM class library to
   demonstrate how to create, initialize, access, and remove a GWGEM object in MFC code. */
CGWGemCPP* pGem = new CGWGemCPP;          /* create a CGWGemCPP instance */
pGlobalGem = pGem;                        /* set up the global pointer */
(d)
/* Create a control state message, set the initial condition states, and call the message handler. */
pGlobalGem->GWControl().GetControlState(ControlState);               /* get current control state */
GemConfControlStateChange* pControlStateChange = new GemConfControlStateChange;
pControlStateChange->NewControlState = ControlState;                 /* set state */
pControlStateChange->ControlStateName = getStrControlState(ControlState);
OnGemConfControlStateChange(0, (LPARAM)pControlStateChange);         /* notify state change */
FIGURE 42.15 (a) Sample GCD file. (b) C++ sample, header files. (c) C++ sample; create GWGEM object. (d) C++ sample; set up control state. (e) C++ sample; set up communication state. (f) C++ sample; set up spooling state. (g) C++ sample; remove GWGEM object. (h) C++ sample; fire an event. (i) C++ sample; disable the communication link. (j) C++ sample; send S1, F13 to host.
(e)
pGlobalGem->GWLink().GetLinkState(LinkState);                  /* get initial communications state */
SetDlgItemText(IDS_LINKSTATE, getStrLinkState(LinkState));     /* display on GUI */
(f)
pGlobalGem->GWSpool().GetSpoolState(SpoolState);               /* get the spool state */
SetDlgItemText(IDS_SPOOLSTATE, getStrSpoolState(SpoolState));  /* display */
(g)
delete pGem;                                                   /* remove the GWGEM object */
(h)
int status = pGlobalGem->GWEvent().Send(EventID);              /* send event with ID = EventID */
(i)
pGlobalGem->GWLink().Disable();                                /* disable the communication link */
(j)
/* The following code sends S1F13 to the host system. */
SDRMSG msg;                                     /* message structure declared in gwmessage.h (a stack instance is assumed here) */
PSDRMSG pmsg = &msg;                            /* pointer used by the GWSdr calls below */
SDRTKT tkx = 0;                                 /* set SDR ticket value to 0 */
unsigned char buffer[512];                      /* message text buffer */
unsigned char ModelNum[7] = "SDR";              /* set model number */
unsigned char SoftRev[7] = "Rev10";             /* software version */
pmsg->stream = 1;                               /* set stream to 1 */
pmsg->function = 13;                            /* set function to 13 */
pmsg->wbit = 1;                                 /* request a reply */
pmsg->buffer = buffer;                          /* pointer to message buffer */
pmsg->length = sizeof(buffer);
/* fill up the SECS-II message */
pGlobalGem->GWSdr().SdrItemInitO(pmsg);
pGlobalGem->GWSdr().SdrItemOutput(pmsg, GWS2_L, NULL, (SDRLENGTH)2);
pGlobalGem->GWSdr().SdrItemOutput(pmsg, GWS2_STRING, ModelNum, (SDRLENGTH)6);
pGlobalGem->GWSdr().SdrItemOutput(pmsg, GWS2_STRING, SoftRev, (SDRLENGTH)6);
int status = pGlobalGem->GWSdr().SdrRequest(0, pmsg, &tkx);    /* send S1F13 out */
FIGURE 42.15 Continued.
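For readers unfamiliar with the SECS-II item encoding that the SdrItemOutput calls in Figure 42.15(j) build up, the fragment below hand-encodes the same S1F13 body: a two-element list containing the 6-byte ASCII items MDLN and SOFTREV. It is a self-contained illustration of the standard SECS-II item format (format code in the upper six bits of the first byte, number of length bytes in the lower two); it is independent of the GWGEM API, and the padded strings are our own assumption about how the 6-byte items would be filled.

/* Illustrative only: hand-encoding of the S1F13 body <L[2] <A[6] MDLN> <A[6] SOFTREV>>
   as raw SECS-II bytes; GWGEM's SdrItemOutput calls produce an equivalent encoding internally. */
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

/* Append one ASCII item: format byte 0x41 (ASCII, one length byte), the length, then the data. */
static void appendAscii(std::vector<uint8_t>& out, const std::string& text)
{
    out.push_back(0x41);                               /* ASCII item with 1 length byte */
    out.push_back(static_cast<uint8_t>(text.size()));  /* item length in bytes */
    out.insert(out.end(), text.begin(), text.end());
}

int main()
{
    std::vector<uint8_t> body;
    body.push_back(0x01);            /* list item with 1 length byte */
    body.push_back(0x02);            /* the list holds two elements */
    appendAscii(body, "SDR   ");     /* MDLN, padded to 6 characters (assumed padding) */
    appendAscii(body, "Rev10 ");     /* SOFTREV, padded to 6 characters (assumed padding) */

    for (uint8_t b : body)           /* print the encoded bytes for inspection */
        std::printf("%02X ", b);
    std::printf("\n");
    return 0;
}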
FIGURE 42.16 Design of intercommunication process. (The diagram shows the host application with its SECS-II message handler and HSMS driver communicating over TCP/IP with the equipment's HSMS driver, SECS-II message handler, and equipment state model, which connects through an RS-232 link to the PLC driving the actuators, sensors, and mechanical hardware.)
References
[1] Tin O., Competitive Analysis and Conceptual Design of SEMI Equipment Communication Standards and Middleware Technology, Master of Science (Computer Integrated Manufacturing) dissertation, Nanyang Technological University, 2003.
[2] SEMATECH, Generic Equipment Model (GEM) Specification Manual: The GEM Specification as Viewed from Host, Technology Transfer 97093366A-XFR, 2000, pp. 4–39.
[3] SEMATECH, High Speed Message Services (HSMS): Technical Education Report, Technology Transfer 95092974A-TR, 1999, pp. 11–34.
[4] GW Associates, Inc., Solutions for SECS Communications, product training (PowerPoint slides), 1999.
[5] SEMI International Standards, CD-ROM, SEMI, 2003.
[6] Semiconductor Equipment and Materials International, Equipment Automation/Software, Volumes 1 and 2, SEMI, 1995.
[7] SEMATECH, CIM Framework Architecture Guide 1.0, 97103379A-ENG, 1997, pp. 1–31.
[8] SEMATECH, CIM Framework Architecture Guide 2.0, 1998, pp. 1–24.
[9] SEMI, Standard for the Object-Based Equipment Model, SEMI Draft Document 2748, 1998, pp. 1–52.
[10] SEMI E98, Provisional Standard for the Object-Based Equipment Model.
[11] Weiss M., Increasing Productivity in Existing Fabs by Simplified Tool Interconnection, Semiconductor FABTECH, 12th edition, 2001, pp. 21–24.
[12] Yang H.-C., Cheng F.-T., and Huang D., Development of a Generic Equipment Manager for Semiconductor Manufacturing, paper presented at the 7th IEEE International Conference on Emerging Technologies and Factory Automation, Barcelona, October 1996, pp. 727–732.
[13] Feng C., Cheng F.-T., and Kuo T.-L., Modeling and Analysis for an Equipment Manager of the Manufacturing Execution System in Semiconductor Packaging Factories, 1998, pp. 469–474.
[14] ControlPRO™ Developer Guide, Realtime Performance, Inc., 1996.
[15] Kaufmann T., The Paradigm Shift for Manufacturing Execution Systems in European Projects and SEMI Activities, Semiconductor FABTECH, 8th edition, 2002, pp. 17–25.
[16] GW Associates, Inc., SECSIMPro GEM Compliance Scripts User's Guide, 2001.
[17] GW Associates, Inc., SECSIMPro SSL Reference Guide, 2001.
[18] GW Associates, Inc., SECSIMPro User's Guide, 2001.
[19] SEMATECH, SEMASPEC GEM Purchasing Guidelines 2.0, Technology Transfer 93031573B-STD, 1994, pp. 10–30.
Web References
1. www.cimetrix.com: Home page of Cimetrix Software
2. www.abakus-soft.de: Home page of Abakus Software
3. www.kinesysinc.com: Home page of Kinesys Software (The GEM Box)
4. www.secsandgem.com: Home page of ErgoTech Software
5. www.sdiusa.com: Home page of SDI Software
6. www.yokogawa.com.sg: Home page of Yokogawa Software
7. www.asyst.com: Home page of Asyst Software
8. www.siautomation.com: Home page of SI Automation Software
9. www.ais-dresden.de: Home page of VECC Product
10. www.agilent.com: Home page of Agilent Software